This report is a portion of the AMIA 2015 Tutorial on Using R for Healthcare Data Science. All code and data are available on my GitHub page.


This report walks you through the data scientist’s workflow and shows how recent R packages make data science easier and more intuitive. First, a couple of disclaimers:

  1. This tutorial is meant to give you a sense of what is possible with R and to motivate you to learn more, not to teach you every detail of the code or the packages available.
  2. We will not spend much time on data modeling. This tutorial works through data janitor tasks and reporting, which in my experience are some of the most time-consuming tasks in data science.

To illustrate how packages released over the past few years have made these tasks easier, we will walk through an entire analysis plan using data published by the International Warfarin Pharmacogenomics Consortium, which is available on the PharmGKB website.


Starting a new patient on warfarin can be a complicated process, as many providers select a starting dose based on complex clinical algorithms. We know that genetics plays a role in the final warfarin dose, and many groups have started to include genomic markers in the algorithms they use to recommend a starting dose.

Our ultimate goal is to create a web app that a provider could use to enter clinical and genetic data about a patient and get back a recommended starting dose of warfarin. One group that has already completed this task is the IWPC (International Warfarin Pharmacogenomics Consortium).