Vojtech Huser and Laura Wiley, November 15, 2015
The R statistical programming language provides powerful tools for manipulating data and attracts many non-programmers. R offers a unique package management system and powerful data visualization packages. This tutorial will provide an introduction to the language, R installation (free software), and the use of RStudio, a free integrated development environment built for R. In the first part, we will cover R solutions for basic challenges facing data scientists, such as wrangling, cleaning, and visualizing data in reproducible ways. We will focus on the most recent R packages, such as dplyr (data manipulation), ggplot2 (publication-ready plots), and shiny (interactive web-based reports). In the second part, we will use several case studies (using publicly available data from the International Warfarin Pharmacogenomics Consortium (IWPC), Drugs@FDA, ClinicalTrials.gov, and RxNorm) to demonstrate R in action on biomedical informatics datasets. We will demonstrate how the previously introduced packages for data cleaning and visualization can be applied to a dataset that combines clinical and genomic data, and to a range of informatics resources. All work will be demonstrated using reproducible reporting tools (e.g., RMarkdown) that combine code and analysis output in a single file (HTML, DOCX, or PDF). We will conclude with a summary of the latest trends in the R language, a comparison of R to other languages commonly used for data science (such as Python, Java, Julia, C, or SAS), and a general Q&A session.
Project Homepage
GitHub Repository
Laura Wiley, August 10, 2015
Laura Wiley (PhD Candidate, Vanderbilt) will walk us through ggplot and the "grammar of graphics." The talk will give a high-level overview of ggplot, then go through some specific visualizations and an example of how to code them in R.
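For readers new to the "grammar of graphics," the following is a minimal sketch of the idea and is not taken from the talk itself. It assumes the ggplot2 package is installed and uses the mtcars dataset that ships with R; the variable choices and labels are illustrative only. A plot is built by adding layers (data and aesthetic mappings, geometries, statistics, labels) with `+`.

```r
library(ggplot2)

# Data and aesthetic mappings: which columns map to x, y, and colour.
# mtcars is a built-in R dataset, used here purely for illustration.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  # Geometry layer: draw one point per car.
  geom_point(size = 2) +
  # Statistical layer: a fitted linear trend per colour group, without the confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Labels for the axes, legend, and title.
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    title = "Fuel economy versus weight"
  ) +
  theme_minimal()

# Printing the object renders the plot.
print(p)
```

Because each component is a separate layer, swapping `geom_point()` for another geometry or changing the mappings in `aes()` alters the plot without rewriting the rest of the specification.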
Laura Wiley, May 12, 2015
Laura Wiley, March 20, 2015