Using LaTeX (and Markdown) in scientific writing
After this chapter, students can evaluate the benefits of LaTeX in different kinds of scientific writing tasks and can find instructions on how to learn and start using it.
Creating scientific (or any formatted) documents consists of two parts: writing the contents and defining the layout of the final product (also known as typesetting). Widely used software such as Word, LibreOffice, Pages and Docs combine these two, and many people think that this is how it is supposed to be. However, sending out .docx or .odt documents as the final output is ill-judged as these documents are meant for text processing and to be modified: there are no guarantees that the documents aren’t changed (either by mistake or on purpose) by the recipient or that different versions of text-processing software show the document in the same way (a simple change from “A4” to “letter”-sized pages may change the formatting). The final output should always be shared as a PDF or in another static format that displays in identical fashion on everyone’s screen.
Professional content producers (book publishers, newspapers etc.) separate the content creation and typesetting, and there exist free software systems for amateurs to do the same. These systems use a markup language to define the different types of content (title, author name, 1st-, 2nd-, 3rd-level header, body text, list, numbered list, figure legend, footnote, citation etc.) and then do the typesetting using specific formatting rules for each content type. The central dogma of these systems is that the user should focus on the content and let the program ensure that this content is nicely set in the final output. The benefit of this is that, once the content is ready, it is very easy to change the typesetting rules and thus the appearance of the final output. The downside is that it can be difficult to force the output to a specific style not provided by the pre-defined formats.
The most widely used markup system in scientific writing is LaTeX (pronounced either “LAY-tek” or “LAH-tek”). With that, the document is first written using a markup language and this “source” file is converted to the final output (nowadays PDF; it used to be PostScript in the past) by the LaTeX compiler. There are also systems that allow writing the document using a simpler markup system (such as Markdown) and that then first convert the primary document to LaTeX and compile that to PDF.
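As a rough illustration of what such a source file looks like, here is a minimal sketch of a LaTeX document. The title text and the file name mentioned below are placeholders chosen for this example; the class and the commands are standard LaTeX.

```latex
% Minimal LaTeX source: the markup names the role of each piece of content,
% and the document class decides how that content is typeset.
\documentclass[a4paper,11pt]{article}

\title{A placeholder title}        % placeholder text for this sketch
\author{Matti Meik\"al\"ainen}
\date{}                            % empty date: no date is printed

\begin{document}
\maketitle

\section{Introduction}
Body text is written as plain paragraphs.

A blank line starts a new paragraph.

\subsection{A second-level header}
\begin{itemize}
  \item a list item
  \item another list item\footnote{A footnote attached to this item.}
\end{itemize}

\end{document}
```

Saved as, say, minimal.tex (a hypothetical name), this should compile with pdflatex minimal.tex. Adding twocolumn to the class options, or changing 11pt to 12pt, changes the layout without touching the content.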
Local installation of LaTeX
LaTeX can be used on most personal computers. At the minimum, one needs the LaTeX compiler package, which can either be installed from the system’s software repository or downloaded from the net. One also needs a text editor to write the markup document: I prefer “integrated writing environments” such as TeXstudio that combine the editor and the compiler, and provide shortcuts and visual menus to insert special symbols or formatting commands. The benefit of the stand-alone system is that everything is done locally and it doesn’t require an internet connection.
The lighter markup languages typically utilise Pandoc in the format conversion and LaTeX in the final typesetting. Pandoc is available for the most popular operating systems at https://pandoc.org/installing.html or (at least on Linux) in the software repository.
One of the latest tools in scientific writing is Quarto. It is especially powerful in combining text and data visualisation, and allows generating figures directly from raw data at the time of compilation. Quarto works with Jupyter and R, but it can also be used with Microsoft’s Visual Studio Code editor. That is a bit surprising as VS Code was originally meant for writing code in programming languages such as C and Java. However, its capabilities can be extended with plugins: I have installed e.g. “Quarto” and “Grammarly” and have then written both web pages (like this course) and scientific articles using the Markdown format.
Web-interface for LaTeX
The easiest way to test LaTeX is to use Overleaf.com, an online interface to LaTeX. The basic functionality is free of charge but additional features require a monthly subscription.
One can test its features by creating an “Example Project”.
This creates a new project with some example content. One can then edit this project to one’s liking, learning the basics of LaTeX on the fly.
With “Code Editor” selected (in the top panel), the page looks like this:
Here, the left window shows the LaTeX markup code and the right window shows the compiled output. One can now make edits in the left window and recompile the document by clicking “Recompile” or pressing Ctrl+Enter.
Alternatively, one can use the “Visual Editor” (in the top panel), and the page looks like this:
Overleaf provides an easy introduction to LaTeX and extensive documentation of its features at https://www.overleaf.com/learn.
The great benefit of Overleaf is collaborative writing, similar to that in Google Docs. (This is even better with the subscription version.) The downside of online tools is that they require an internet connection and may not work e.g. when travelling. Overleaf does allow easy downloading of the source files (as a zip package) so that one can continue working on the document using a stand-alone LaTeX installation. The files can also be synced via GitHub.
Writing an MSc thesis with Overleaf
On the Kumpula campus, LaTeX and Overleaf are widely used for writing Master’s theses. I have edited their thesis template for the EEB and GMB Master’s programmes.
One can download the template files through these links:
Ecology and Evolutionary Biology: https://version.helsinki.fi/aloytyno/eeb_msc_thesis_template/-/raw/main/EEB_MSc_Thesis_Template.zip
Genetics and Molecular Biosciences: https://version.helsinki.fi/aloytyno/gmb_msc_thesis_template/-/raw/main/GMB_MSc_Thesis_Template.zip
In Overleaf, one can then select “Upload Project” and select one of these zip-files.
This then opens the template with basic settings:
The template supports theses in Finnish, Swedish and English. The selection is made at the very beginning:
\documentclass[english,oneside]{UH_EEB_MSc}
where english can be replaced by finnish or swedish.
The template includes comments that should help in filling in the right details and in formatting the thesis correctly. However, a basic understanding of LaTeX is required even when using Overleaf.
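To give an idea of the kind of basics meant here, the following is a small sketch of chapters, sections and automatic cross-references. It uses the standard report class and not the thesis template itself (whose class may organise things differently), and the label names are hypothetical.

```latex
% Sketch with the standard "report" class; the thesis template defines
% its own class (UH_EEB_MSc) and may organise things differently.
\documentclass[a4paper,12pt]{report}

\begin{document}
\tableofcontents

\chapter{Introduction}
\label{ch:intro}   % hypothetical label name
Background and aims of the thesis.

\chapter{Materials and methods}
\section{Study area}
\label{sec:area}   % hypothetical label name
Chapters and sections are numbered automatically.

\chapter{Results}
The aims were stated in Chapter~\ref{ch:intro} and the sampling
sites are described in Section~\ref{sec:area}.

\end{document}
```

The numbers printed by \ref follow the chapters and sections even if they are later reordered. The cross-references and the table of contents typically need two compilation runs to resolve.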
Writing documents using Markdown and Pandoc
Pandoc can convert Markdown to LaTeX and then to PDF. Producing highly sophisticated documents requires some experience but the basic usage is relatively easy to learn. One could use the default Markdown for everything and specify the title and the author using regular headings. However, the Markdown syntax has extensions to define details of the document in a metadata header (a YAML block) at the top of the file. Pandoc understands some of these and Quarto some more.
We can write the first draft of a manuscript:
---
title: "Going out in Helsinki: "
author: Matti Meikäläinen, University of Helsinki, Finland
documentclass: article
classoption:
- twocolumn
papersize: a4
geometry: margin=2cm
linestretch: 1.15
fontsize: 11pt
mainfont: Arial
csl: pubs/nature.csl
abstract: "Regular physical exercise in the outdoors is crucial for one's physical and mental health and general well-being. However, the current weather has a major impact on one's decision to get out for daily exercise and the resilience to keep up with the regular exercise routine. Given this, choosing the right day of the week for one's outdoor activity plays a major role in one's life and should be performed only after a rigorous analysis and earnest consideration. Using weather data from a twenty-year time span, we show that Sunday is the optimal day for outdoors activities in the city of Helsinki, Finland. The result is surprisingly clear and robust for different optimisation criteria."
---
# Introduction
The current weather has been shown to have a major impact on the physical activity of children [@remmers2017daily; @ylvisaaker2022role], adolescents [@belanger2009influence] as well as adults [@tucker2007effect]. The decisions on keeping up with the regular outdoors routines have been studied in great detail [@chan2009assessing], both in specific geographic and spatial contexts [@spinney2011weather] and across different temporal ranges [@wagner2019impact], including the emerging challenges due to climate change [@brocherie2015emerging]. In addition to the beneficial impacts of weather on one's physical and mental well-being and health through outdoors exercise, the current weather and decisions on time usage naturally also affect the work motivation and the national economic output [@lee2014rainmakers].
Given the huge significance -- both for an individual's health and for national economics -- of these decisions, it is surprising that no national guidelines are provided to support citizens in choosing the optimal day of the week for their outdoors activities. Naturally, any guidelines should be based on a rigorous scientific analysis using large collections of data. To rectify this shortage, we studied the long-term weather recordings in the city of Helsinki and provide recommendations for choosing the day for one's outdoor activities. We show that the results are surprisingly robust for different optimisation criteria.
# Results
TBD. Remember to refer to Table \ref{table:counts}.
\begin{table}[h]
\caption{The counts for each day within a week being optimal for outdoors activities under optimisation criteria 'Most sunshine', 'Highest temperature' and 'Smallest amount of rain'.}
\centering
\medskip
\begin{tabular}{lrrr}
\hline
Day & Shine & Temp & Rain \\
\hline
Monday & 174 & 222 & 456 \\
Tuesday & 128 & 142 & 448 \\
Wednesday & 159 & 131 & 472 \\
Thursday & 136 & 117 & 456 \\
Friday & 146 & 140 & 470 \\
Saturday & 167 & 131 & 482 \\
Sunday & 200 & 218 & 487 \\
\hline
\end{tabular}
\label{table:counts}
\end{table}
# Discussion
TBD.
# Materials and methods
TBD. Summarise the methods here. Upload the README.md file, source data and the scripts to GitHub and give link to that. The analyses are fully reproducible on any system running Bash!
# References
::: {#refs}
:::
and then compile this with Pandoc as:
> pandoc --bibliography refs.bib --csl nature.csl --citeproc -o manuscript_nature.pdf manuscript.md
(Pandoc recognises the input and output formats from file suffixes. By using standard suffixes, options -f (--from) and -t (--to) can be omitted.)
This produces the output:
Versions of Pandoc older than 2.11 do not have the option --citeproc; they process the citations automatically when --bibliography is given. If your version gives an error with the above command, you can try leaving that option out.
If we now decide that we don’t want to submit this to Nature but to Science, we can recompile the same document with the new citation-formatting:
> pandoc --bibliography refs.bib --csl science.csl --citeproc -o manuscript_science.pdf manuscript.md
This produces the output:
Changing the citation style to that of Climate Research:
> pandoc --bibliography refs.bib --csl climate-research.csl --citeproc -o manuscript_climate_research.pdf manuscript.md
produces the output:
CSL-files – that define the citation format – for a huge number of scientific journals can be downloaded from https://paperpile.com/guides/resources/citation-styles/.
This example is simplistic but it gives an idea of Pandoc – and of Quarto that builds on top of that.
Those with sharp eyes may have noticed that I made the table using LaTeX syntax and not with Markdown (as was shown in a previous section). The reason for that is the much greater power and flexibility of LaTeX over plain Markdown. As Markdown and LaTeX can be freely mixed, one can use Markdown where it works (and is simpler) and LaTeX where it is definitely needed. Experienced users can make nice-looking tables and figures with LaTeX and probably prefer that over the simple versions of Markdown.
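As one example of what LaTeX offers for tables, the sketch below typesets the same counts with the booktabs package, which provides horizontal rules of varying weight and spacing. It is written as a stand-alone LaTeX file rather than as a part of the Markdown manuscript, and the choice of booktabs is an assumption of this example, not something the manuscript above used.

```latex
% Stand-alone sketch: the counts table typeset with booktabs rules.
\documentclass[a4paper,11pt]{article}
\usepackage{booktabs}   % provides \toprule, \midrule and \bottomrule

\begin{document}

\begin{table}[h]
  \centering
  \caption{The counts for each day within a week being optimal for
    outdoors activities under optimisation criteria `Most sunshine',
    `Highest temperature' and `Smallest amount of rain'.}
  \label{table:counts}
  \medskip
  \begin{tabular}{lrrr}
    \toprule
    Day       & Shine & Temp & Rain \\
    \midrule
    Monday    & 174 & 222 & 456 \\
    Tuesday   & 128 & 142 & 448 \\
    Wednesday & 159 & 131 & 472 \\
    Thursday  & 136 & 117 & 456 \\
    Friday    & 146 & 140 & 470 \\
    Saturday  & 167 & 131 & 482 \\
    Sunday    & 200 & 218 & 487 \\
    \bottomrule
  \end{tabular}
\end{table}

\end{document}
```

The column specification lrrr left-aligns the day names and right-aligns the numbers, so the digits line up. Only the three rule commands and the package line differ from the \hline version.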
The Markdown format for adding figures is pretty simple. If we download some graphics:
> wget https://upload.wikimedia.org/wikipedia/commons/5/58/Botanic_garden_Kaisaniemi_2008-001.jpg
we can then add this to the document as ![Figure caption text](figure_file), or here (placed just above # Results):
![Helsinki-Kaisaniemi weather station is located in the Kaisaniemi botanical garden.](Botanic_garden_Kaisaniemi_2008-001.jpg)
By default, the figures are page- or column-wide. One can change that by using LaTeX or the “column” environments of the Markdown format.
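A sketch of the LaTeX route could look like the following. It is a stand-alone two-column document, it assumes the photograph downloaded above sits in the same folder, and the label names are hypothetical.

```latex
% Stand-alone sketch: controlling figure width in a two-column layout.
% Assumes the downloaded image file is in the same folder as this file.
\documentclass[twocolumn,a4paper,11pt]{article}
\usepackage{graphicx}   % provides \includegraphics

\begin{document}

Some body text in the first column.

% A figure scaled to 60% of the column width
\begin{figure}[t]
  \centering
  \includegraphics[width=0.6\columnwidth]{Botanic_garden_Kaisaniemi_2008-001.jpg}
  \caption{Helsinki-Kaisaniemi weather station is located in the
    Kaisaniemi botanical garden.}
  \label{fig:station-narrow}   % hypothetical label name
\end{figure}

% The starred environment spans both columns
\begin{figure*}[t]
  \centering
  \includegraphics[width=\textwidth]{Botanic_garden_Kaisaniemi_2008-001.jpg}
  \caption{The same photograph set across the full page width.}
  \label{fig:station-wide}     % hypothetical label name
\end{figure*}

\end{document}
```

The figure environments can in principle be pasted as raw LaTeX into the Markdown source, provided that the graphicx package ends up loaded in the LaTeX that Pandoc generates. Note that figure* floats are usually placed at the top of a later page rather than where they appear in the source.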
LaTeX uses the BibTeX (.bib) reference format; entries in this format can be exported e.g. from Google Scholar.
The bib-file for my document was the following:
@article{chan2009assessing,
title={Assessing the effects of weather conditions on physical activity participation using objective measures},
author={Chan, Catherine B and Ryan, Daniel A},
journal={International journal of environmental research and public health},
volume={6},
number={10},
pages={2639--2654},
year={2009},
publisher={Molecular Diversity Preservation International (MDPI)}
}
@article{lee2014rainmakers,
title={Rainmakers: Why bad weather means good productivity.},
author={Lee, Jooa Julia and Gino, Francesca and Staats, Bradley R},
journal={Journal of Applied Psychology},
volume={99},
number={3},
pages={504},
year={2014},
publisher={American Psychological Association}
}
@article{belanger2009influence,
title={Influence of weather conditions and season on physical activity in adolescents},
author={B{\'e}langer, Mathieu and Gray-Donald, Katherine and O'loughlin, Jennifer and Paradis, Gilles and Hanley, James},
journal={Annals of epidemiology},
volume={19},
number={3},
pages={180--186},
year={2009},
publisher={Elsevier}
}
@article{brocherie2015emerging,
title={Emerging environmental and weather challenges in outdoor sports},
author={Brocherie, Franck and Girard, Olivier and Millet, Gr{\'e}goire P},
journal={Climate},
volume={3},
number={3},
pages={492--521},
year={2015},
publisher={MDPI}
}
@article{remmers2017daily,
title={Daily weather and Children's physical activity patterns},
author={Remmers, Teun and Thijs, Carel and Timperio, Anna and Salmon, JO and Veitch, Jenny and Kremers, Stef PJ and Ridgers, Nicola D},
journal={Medicine and science in sports and exercise},
volume={49},
number={5},
pages={922--929},
year={2017},
publisher={Lippincott Williams \& Wilkins}
}
@article{spinney2011weather,
title={Weather impacts on leisure activities in Halifax, Nova Scotia},
author={Spinney, Jamie EL and Millward, Hugh},
journal={International journal of biometeorology},
volume={55},
pages={133--145},
year={2011},
publisher={Springer}
}
@article{tucker2007effect,
title={The effect of season and weather on physical activity: a systematic review},
author={Tucker, Patricia and Gilliland, Jason},
journal={Public health},
volume={121},
number={12},
pages={909--922},
year={2007},
publisher={Elsevier}
}
@article{wagner2019impact,
title={The impact of weather on summer and winter exercise behaviors},
author={Wagner, Abram L and Keusch, Florian and Yan, Ting and Clarke, Philippa J},
journal={Journal of sport and health science},
volume={8},
number={1},
pages={39--45},
year={2019},
publisher={Elsevier}
}
@article{ylvisaaker2022role,
title={The role of weather conditions on time spent outdoors and in moderate-to-vigorous physical activity among Norwegian preschoolers},
author={Ylvis{\aa}ker, Einar and Nilsen, Ada Kristine Ofrim and Johannessen, Kjersti and Aadland, Eivind},
journal={Journal of sports sciences},
volume={40},
number={1},
pages={73--80},
year={2022},
publisher={Taylor \& Francis}
}
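The entries above can also be cited directly from a LaTeX document, without Pandoc. The sketch below is a minimal example with classic BibTeX. It assumes the entries are saved as refs.bib in the same folder, and the plain bibliography style is picked only for illustration.

```latex
% Stand-alone sketch: citing entries of refs.bib with classic BibTeX.
\documentclass[a4paper,11pt]{article}

\begin{document}

Weather affects the physical activity of
adolescents~\cite{belanger2009influence} and
adults~\cite{tucker2007effect}.

\bibliographystyle{plain}   % a numeric style shipped with LaTeX
\bibliography{refs}         % reads refs.bib from the same folder

\end{document}
```

When compiling locally, the usual sequence is to run pdflatex, then bibtex, and then pdflatex twice more so that the citation numbers settle. Here the citation style is set by \bibliographystyle rather than by a CSL file, and only the cited entries appear in the reference list.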
If the recipient still demands a Microsoft Word file
Pandoc can also convert from and to Microsoft Word’s docx format. When converting from Markdown to docx, we can utilise the citation and reference processing within Pandoc with the parameter --citeproc and thus create a Word document with citations embedded in the correct format and a matching reference list added at the end of the document.
> pandoc --bibliography refs.bib --csl nature.csl --citeproc -o manuscript_nature.docx manuscript.md
In LibreOffice, the output looks like this:
The citation format can be changed with the CSL-file.
Note that the table didn’t convert from Markdown to docx. This could be expected as the table was written using LaTeX, not Markdown, and here we were converting from Markdown to Word docx. We could get the table included by converting the document first to LaTeX and then to docx, but even then not all features are correctly preserved.
> pandoc --bibliography refs.bib --csl nature.csl --citeproc -o manuscript_nature.tex manuscript.md
> pandoc -o manuscript_nature_latex.docx manuscript_nature.tex
Doing everything in plain Markdown or combining different bits from various documents could do the job.
It would be a lie to claim that learning LaTeX is really easy, but it is not impossible either, and the language is widely used in life sciences. Overleaf makes the first steps of LaTeX easier and provides an excellent platform for scientific collaborative writing. The documentation provided at https://www.overleaf.com/learn is extensive and suitable for both beginners and experienced users.
One should never write the reference list or convert the list to another format manually. Computers do it much better and faster.