January 20, 2020
Using R Markdown to Share Analysis
You have some R code that has performed some analysis. How can you share the analysis without having to spend even more time creating a slideshow or document in some other tool? Enter R Markdown, an R package that allows you to embed code snippets in Markdown documents.
R is a statistical programming language that is often used for data analysis and visualizations. R is open source and has a very robust community that shares content and ideas.
RStudio is a popular Integrated Development Environment (IDE) for R.
An R package is a group of functions created by the community that can be installed using R code or through an IDE such as RStudio.
R Markdown is a package for R and the center of this post.
Markdown is a language that uses plain text to format documents.
There are four easy steps to designing and publishing an R Markdown file. First, open a file with the .rmd extension. Second, write your text using the R Markdown syntax. Third, add R code that generates output for the report. Lastly, render the document into a slideshow, HTML, PDF, or Word file.
For this walkthrough, we will be using RStudio, which can be downloaded from here. If you do not have the R language installed as well, you can get that from here. In RStudio, you can install packages by clicking the Install button on the Packages window. Alternatively, you can use the following code to install a package directly.
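The install step can be sketched as below. This is a minimal example that assumes an internet connection and the default CRAN mirror; the package name rmarkdown is the one this post is about.

```r
# Install the rmarkdown package from CRAN (only needed once per R installation).
install.packages("rmarkdown")

# Load the package so its functions are available in the current session.
library(rmarkdown)
```

After the first install, only the library() call is needed in later sessions.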
To create an R Markdown file, go to File>New File>R Markdown… in RStudio. A window will appear where you can select the title, author, and output format of the file. This method will create an .rmd file with these sections already filled out as well as some instructions and examples of proper syntax for inserting code and generating the document. Alternatively, you can save any file using the .rmd extension.
An R Markdown file has many syntax options to transform your code into a descriptive and well-formatted document. The YAML header at the top is optional but can contain information such as the title, author, date, and much more. From there, you can begin adding formatted text and code snippets. The most useful and unique part of R Markdown is the ability to control how each code snippet behaves through chunk options. For each block of code, you can choose whether it executes (eval), whether the code itself is displayed (echo), and whether anything from the block appears in the document at all (include). Some use cases are executing code that needs to be referenced later in the script without showing it in the resulting document, executing code and showing only the results, or executing code and showing both the code and its output.
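As a rough illustration of the YAML header and the chunk options described above, the sketch below writes a small R Markdown file from plain R. The title, chunk labels, sample data, and file name are all made up for the example; the options shown (include, echo, eval) are standard knitr chunk options.

```r
# Three backticks open and close a code chunk. They are built with strrep()
# only so that this example can itself sit inside a fenced block.
fence <- strrep("`", 3)

rmd_lines <- c(
  # Optional YAML header (the title here is a placeholder).
  "---",
  "title: \"Chunk Options Demo\"",
  "output: html_document",
  "---",
  "",
  "## Setup",
  "",
  # include=FALSE: run the code, but show neither the code nor its output.
  paste0(fence, "{r setup, include=FALSE}"),
  "scores <- c(82, 91, 77, 68)",
  fence,
  "",
  # echo=FALSE: run the code and show only the output.
  paste0(fence, "{r results-only, echo=FALSE}"),
  "summary(scores)",
  fence,
  "",
  # Default options: show both the code and its output.
  paste0(fence, "{r code-and-results}"),
  "mean(scores)",
  fence,
  "",
  # eval=FALSE: show the code without running it.
  paste0(fence, "{r not-run, eval=FALSE}"),
  "install.packages(\"rmarkdown\")",
  fence
)

# Save the document to a temporary folder; the file name is a placeholder.
rmd_path <- file.path(tempdir(), "chunk-options-demo.Rmd")
writeLines(rmd_lines, rmd_path)
```

In everyday use you would type these lines straight into the .rmd file in RStudio; generating them from R here just keeps the example self-contained.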
Beyond code snippets, there are many text formatting options to improve the professionalism and readability of your document. You can use custom templates, apply a CSS file, and add images, web links, scientific notation, bibliographies, and more. For a full formatting guide, check out the R Markdown Reference Guide and cheat sheet. In addition to R, an R Markdown file can contain code snippets from other programming languages such as Python and SQL.
The wide variety of options for publishing content makes R Markdown scripts very useful and versatile. You can share your code as a document, slideshow, website, book, interactive document, and more. There are two ways to publish your script using RStudio. The first is calling the render function directly. The first parameter is the path to the .rmd file to be rendered, and the second parameter is the output type. If the output type parameter is left blank, render uses the output type identified in the YAML header. The second way to publish content is to select the Knit button in the RStudio window, which renders the script using the output that was selected in the YAML header. The output types include popular formats such as Word, PowerPoint, PDF, GitHub-flavored Markdown, and HTML. The output type used in the following code example is “html_document”.
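A minimal sketch of calling render directly is shown below. It assumes the rmarkdown package and Pandoc are available (RStudio bundles Pandoc). The tiny input document and its temporary file name are invented so the example can run on its own; in practice you would pass the path to your own .rmd file.

```r
library(rmarkdown)

# Write a minimal, made-up R Markdown file to a temporary location
# so the example does not depend on an existing file.
input_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Render Demo\"",
  "output: html_document",
  "---",
  "",
  "The mean of 1 through 10 is `r mean(1:10)`."
), input_file)

# First argument: the input .Rmd file. Second: the output format.
out_explicit <- render(input_file, output_format = "html_document", quiet = TRUE)

# With output_format omitted, render() uses the format from the YAML header.
out_default <- render(input_file, quiet = TRUE)

# render() returns the path of the generated file.
print(out_explicit)
```

Swapping "html_document" for another format such as "word_document" produces a different file type from the same source; a PDF output additionally needs a LaTeX installation.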
This example walks through the entire process of gathering data, performing simple data modeling tasks, creating some visualizations, and turning the code into an R Markdown document. The example R script demonstrates the entire process of data analysis using a public data source. The script has comments explaining each step: importing the dataset, cleaning and reformatting the fields, creating combination and subset tables based on what is to be analyzed, and finally, creating the visualizations to display the analysis. The R Markdown script example uses the code from the R script but presents it in a format for non-programmers to consume. The document created by the R Markdown script has descriptions of each visual while hiding the underlying code used to create them.
R Markdown is a simple way to share R code results, with several options for output types. The improved output allows users to understand and take action from the results quickly and efficiently. Programmers will find it easy to use and non-programmers will marvel (this is a great joke if you looked at the code examples) at the quality of your slides, document, web page, or interactive content. To learn more about R Markdown, check out the R Markdown from RStudio website and a free book, R Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, and Garrett Grolemund.