R for Beginners

Barbara Fulda


What lies ahead?

  • The why question
  • The what question
  • The how question

Why R?

  • A programming language designed by and for statisticians

Why not stay with Excel?

  • R code makes it easy to document and reproduce your analysis
  • This makes sharing and collaborating on a project much easier
  • Almost any form of data can be used (.csv files, data scraped from websites, etc.)
  • R can do more than Excel: thousands of add-on packages are available, and new ones are developed continuously
  • Data visualization: Create easily customizable and beautiful graphs

What are the essential ingredients of R?

  • Input: Write code in the script window
  • Output: The results appear in the output (console) window
  • Graphs are displayed in another window

We will quickly try this out...

  • Open the R script labelled “beginners.R”
  • Select the lines of code below the line “Example”
  • Press “Run”

What is an R editor?

  • Here you use RStudio
    -> An easy-to-use, friendly interface

  • There are many other options, e.g.:
    -> Tinn-R
    -> Vim
    -> Emacs, etc.

What does this mean for analysis?

  • Numbers and characters
1 + 1
[1] 2
"Hello, World!"
[1] "Hello, World!"
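R treats numbers and text (character strings) as different kinds of values. A small sketch using the two values above: `class()` reports the kind of value, and only numbers take part in arithmetic.

```r
# Numbers and text are different kinds of values in R
class(1 + 1)            # "numeric"
class("Hello, World!")  # "character"

# Arithmetic works on numbers; text has its own functions
2 * 3                   # 6
nchar("Hello, World!")  # 13 characters
```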

Wait! Before we begin, we store the commands in a file

Where are we?

getwd()

Where do we want to be? (example path: use your own folder)

setwd("/home/jim/psych/risk/adol")

As with any data analysis software, we can store our commands in a file, called an "R script"

Comment it for later use: R ignores everything after #, e.g. # Hello
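A minimal sketch of what such a commented script might look like. As an assumption for illustration, `tempdir()` stands in for a real project folder like the one above, so the example runs on any computer; the script returns to the starting folder at the end.

```r
# beginners.R: a tiny example script
# Everything after a # is a comment and is ignored by R.

start_dir <- getwd()   # where are we?
setwd(tempdir())       # move to a folder (a temporary one stands in for your project folder)
getwd()                # check where we are now
setwd(start_dir)       # go back to where we started
```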

Basics (1)

  • Objects (variables etc.)
x <- 2

What is x again?

x
[1] 2

“Ah, typed it wrong…”

x <- 5

Basics (2)

y <- 4
z <- x + y
z
[1] 9
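One difference from an Excel cell: an R object stores a value, not a formula. The sketch below reuses the objects from this slide; `<-` is the assignment operator most R users prefer, although `=` also works at the top level.

```r
x <- 5
y <- 4
z <- x + y   # z is computed once, right now: 9

x <- 100     # change x afterwards...
z            # ...z is still 9; rerun the line above to update it
```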

Load packages for specialized use

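  • Some packages come preinstalled with R
  • Any other package you install yourself (once), then load it in each session

    install.packages("ggplot2")
    library(ggplot2)
    
  • Why not install everything?
    R stays lean: only the packages you need take up disk space, and R starts quickly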
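A common pattern is to install a package only when it is missing and then load it. This is a sketch: `use_package()` is a hypothetical helper name, not part of R; `requireNamespace()`, `install.packages()` and `library()` are standard functions. The demo uses `stats`, which is preinstalled, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # download and install (needs internet)
  }
  library(pkg, character.only = TRUE)   # load it for this session
  invisible(TRUE)
}

use_package("stats")   # preinstalled, so this only loads it
```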

When I am stuck... What to do?

help(sum)      # open the help page for sum()
example(min)   # run the examples from the help page of min()

min> require(stats); require(graphics)

min>  min(5:1, pi) #-> one number
[1] 1

min> pmin(5:1, pi) #->  5  numbers
[1] 3.141593 3.141593 3.000000 2.000000 1.000000

min> x <- sort(rnorm(100));  cH <- 1.35

min> pmin(cH, quantile(x)) # no names
[1] -2.8148556 -0.7282596 -0.1301134  0.7611575  1.3500000

min> pmin(quantile(x), cH) # has names
        0%        25%        50%        75%       100% 
-2.8148556 -0.7282596 -0.1301134  0.7611575  1.3500000 

min> plot(x, pmin(cH, pmax(-cH, x)), type = "b", main =  "Huber's function")

[Plot: Huber's function]


min> cut01 <- function(x) pmax(pmin(x, 1), 0)

min> curve(      x^2 - 1/4, -1.4, 1.5, col = 2)

[Plot: curve of x^2 - 1/4]


min> curve(cut01(x^2 - 1/4), col = "blue", add = TRUE, n = 500)

min> ## pmax(), pmin() preserve attributes of *first* argument
min> D <- diag(x = (3:1)/4) ; n0 <- numeric()

min> stopifnot(identical(D,  cut01(D) ),
min+           identical(n0, cut01(n0)),
min+           identical(n0, cut01(NULL)),
min+           identical(n0, pmax(3:1, n0, 2)),
min+           identical(n0, pmax(n0, 4)))
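Two more ways to get unstuck when you do not know a function's exact name or arguments, sketched with base R functions (`??keyword` is a shorthand for `help.search()` when you want to search all help pages):

```r
# Find functions whose names contain "mean" (case is ignored)
apropos("mean")

# Show which arguments a function takes
args(pmin)
```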

Load data for analysis

  • We use example datasets that ship with R and its packages
library(MASS)   # package that contains the Animals dataset
data()          # list the available datasets
  • We will use the “ToothGrowth” dataset
data(ToothGrowth)
  • You can load several datasets at the same time and use them side by side
data(Animals)
  • Look at the Environment pane!
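Before calculating anything, it helps to check what a dataset contains. A short sketch using standard functions on the two datasets loaded above:

```r
library(MASS)          # Animals lives in the MASS package
data(ToothGrowth)
data(Animals)

dim(ToothGrowth)       # 60 rows, 3 columns
dim(Animals)           # 28 rows, 2 columns
str(ToothGrowth)       # column names and types: len, supp, dose
head(Animals)          # first six rows
```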

Simple calculations with R

x <- mean(ToothGrowth$len)
x
[1] 18.81333
y <- mean(ToothGrowth$dose)
y
[1] 1.166667
x / y
[1] 16.12571
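The same calculation can be written without repeating `ToothGrowth$` by using `with()`; `summary()` then gives several descriptive statistics at once. A brief sketch:

```r
# Same ratio as above, computed inside the data frame
ratio <- with(ToothGrowth, mean(len) / mean(dose))
round(ratio, 2)           # 16.13

summary(ToothGrowth$len)  # minimum, quartiles, mean and maximum in one go
```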

Using both datasets

ToothGrowth$dose / Animals$body   # 60 values vs. 28 values: R recycles the shorter one and warns
 [1] 3.703704e-01 1.075269e-03 1.376273e-02 1.807664e-02 4.807692e-01
 [6] 4.273504e-05 1.963094e-04 2.672368e-03 9.596929e-04 5.000000e-02
[11] 3.030303e-01 1.890359e-03 4.830918e-03 1.612903e-02 1.502855e-04
[16] 1.063830e-04 1.470588e-01 2.857143e-02 8.333333e+00 4.347826e+01
[21] 8.000000e-01 3.603604e-02 2.000000e-02 3.834356e-02 7.142857e+00
[26] 2.298851e-05 1.639344e+01 1.041667e-02 1.481481e+00 4.301075e-03
[31] 1.376273e-02 1.807664e-02 4.807692e-01 4.273504e-05 1.963094e-04
[36] 2.672368e-03 9.596929e-04 5.000000e-02 1.515152e-01 9.451796e-04
[41] 4.830918e-03 1.612903e-02 1.502855e-04 1.063830e-04 1.470588e-01
[46] 2.857143e-02 8.333333e+00 4.347826e+01 4.000000e-01 1.801802e-02
[51] 2.000000e-02 3.834356e-02 7.142857e+00 2.298851e-05 1.639344e+01
[56] 1.041667e-02 1.481481e+00 4.301075e-03 5.505092e-02 7.230658e-02
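The calculation above works only because R recycles the shorter vector: `Animals$body` has 28 values and `ToothGrowth$dose` has 60, so the body weights are reused from the start, and R issues a warning because 60 is not a multiple of 28. The numbers therefore carry no real meaning; the slide only demonstrates that two datasets can be used together. A small sketch of recycling:

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 100)
a * b                    # b is reused: 10 200 30 400 50 600

c(1, 2, 3) + c(10, 20)   # 11 22 13, plus a warning: 3 is not a multiple of 2
```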

How to access your data

  • What's the content of a variable?
ToothGrowth$len
 [1]  4.2 11.5  7.3  5.8  6.4 10.0 11.2 11.2  5.2  7.0 16.5 16.5 15.2 17.3
[15] 22.5 17.3 13.6 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5
[29] 23.3 29.5 15.2 21.5 17.6  9.7 14.5 10.0  8.2  9.4 16.5  9.7 19.7 23.3
[43] 23.6 26.4 20.0 25.2 25.8 21.2 14.5 27.3 25.5 26.4 22.4 24.5 24.8 30.9
[57] 26.4 27.3 29.4 23.0
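`$` is only one way to reach into a data frame. A sketch of a few other common forms of indexing on the same dataset:

```r
ToothGrowth$len[1]                          # first value: 4.2
ToothGrowth[["len"]][1:3]                   # same column, selected by name: 4.2 11.5 7.3
ToothGrowth[1:3, ]                          # first three rows, all columns
ToothGrowth[ToothGrowth$dose == 2, "len"]   # lengths for the highest dose only
```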

Plotting your data (1)

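  • The Effect of Vitamin C on Tooth Growth in Guinea Pigs
  • Each of 60 guinea pigs received vitamin C as orange juice (OJ) or ascorbic acid (VC), at one of three doses (0.5, 1 or 2 mg/day)
  • The response (len) is the length of odontoblasts, the cells responsible for tooth growth
  • How much tooth growth do we see when the guinea pigs are given orange juice?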

[Plot: ToothGrowth data]
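The exact code behind the plot is not part of these slides; one possible way to answer the orange juice question with base graphics is sketched below: keep only the OJ group, then compare the three doses in a boxplot.

```r
oj <- subset(ToothGrowth, supp == "OJ")   # orange juice group only (30 animals)
mean(oj$len)                              # about 20.66

boxplot(len ~ dose, data = oj,
        xlab = "Vitamin C dose (mg/day)",
        ylab = "Odontoblast length (len)",
        main = "Guinea pigs given orange juice")
```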

Plotting your data (2)

  • Cars: the distance they need to stop, depending on their speed (built-in cars dataset)

[Plot: cars data, stopping distance vs. speed]
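A sketch of how such a plot can be made, assuming it uses the built-in `cars` dataset (speed in mph, stopping distance in ft). As an extra step not shown on the slide, a straight line fitted with `lm()` is added to the scatterplot.

```r
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

fit <- lm(dist ~ speed, data = cars)   # straight-line fit
abline(fit, col = "blue")              # add the fitted line to the plot
coef(fit)                              # intercept and slope
```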

Thank you!

Questions?

Where to continue...

Here are some websites to continue learning R

Introductory courses:
  • tryr.codeschool.com
  • https://www.coursera.org/course/compdata
  • https://www.coursera.org/course/datasci
  • http://sentimentmining.net/StatisticsWithR/