# R for Beginners

Barbara Fulda


• The why question
• The what question
• The how question

### Why R?

• A programming language designed by and for statisticians

### Why not stay with Excel?

• R code makes it easy to document and reproduce your analysis
• This makes sharing and collaborating on a project much easier
• Many kinds of data can be used (.csv files, data scraped from websites, etc.)
• R can do more than Excel: about 5000 packages exist, and more are being developed all the time
• Data visualization: create beautiful, easily customizable graphs

### What are the essential ingredients of R?

• Input: Write code in the script window
• Output: The results are in the output window
• Graphs are displayed in another window

### We will quickly try this out...

• Open the R script labelled “beginners.R”
• Select the lines of code below the line “Example”
• Press “Run”

### What is an R editor?

• Here we use RStudio
-> A friendly, easy-to-use interface

• There are many other options

• Tinn-R

• Vim

• Emacs etc.

### What does this mean for analysis?

• Numbers and characters
1+1

[1] 2

 "Hello, World!"

 [1] "Hello, World!"
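To see how R tells numbers and characters apart, you can ask it directly. This is a minimal sketch; the object names `a_number` and `a_text` are made up for illustration and are not part of the course script.

```r
# A number and a piece of text (a "character string")
a_number <- 1 + 1
a_text <- "Hello, World!"

# class() tells you what kind of value R is holding
class(a_number)  # "numeric"
class(a_text)    # "character"

# nchar() counts the characters in a string
nchar(a_text)    # 13
```

Text always goes in quotes; without them R looks for an object with that name.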


### Wait! Before we begin, we store the commands in a file

Where are we?

 getwd()


Where do we want to be?

 setwd("/home/jim/psych/risk/adol")


As with any data analysis software, we can store our commands in a file, called an “R script”

Comment it for later use -> # Hello
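A safe way to try out `getwd()` and `setwd()` is to move to a scratch folder and back again. This is only a sketch: the name `old_dir` is made up here, and `tempdir()` stands in for whatever project folder you would really use.

```r
# Remember where we are, so we can come back later
old_dir <- getwd()

# Move to another folder; tempdir() is a scratch folder R creates for each session
setwd(tempdir())
getwd()

# Go back to where we started
setwd(old_dir)
getwd()
```

Everything after a `#` on a line is a comment: R ignores it, but you will be glad to find it later.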

### Basics (1)

• Objects (variables etc.)
x<-2


What is x again?

x

[1] 2


“Ah, typed it wrong…”

x<-5


### Basics (2)

y<-4

z<-x+y
z

[1] 9
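One thing that often surprises beginners: an object keeps the value it was given until you assign to it again. The sketch below reuses `x`, `y` and `z` from the slides; the value 10 is chosen only for illustration.

```r
x <- 5
y <- 4
z <- x + y   # z is computed once, from the current values: 9

x <- 10      # overwrite x with a new value
z            # still 9; z does not change by itself

z <- x + y   # run the line again to update z
z            # now 14
```

This is why it pays to keep commands in a script: you can rerun them in order after changing a value.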


### Load packages for specialized use

• Some packages are preinstalled
• Any other package must be installed by you

install.packages("ggplot2")

• Why?
It keeps R lean. Packages you do not need take up no storage, and calculations stay quick
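Installing and loading are two separate steps: you install a package once, but you load it in every new session. The following is a hedged sketch; the name `has_ggplot2` is made up for illustration, and the install line is commented out so that nothing is downloaded by accident.

```r
# Installing downloads a package from the internet; do this once per computer.
# install.packages("ggplot2")

# Check whether the package is already installed (TRUE or FALSE)
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
has_ggplot2

# Loading makes the package's functions available; do this in every session
if (has_ggplot2) {
  library(ggplot2)
}
```

If `library()` complains that there is no package with that name, the package has not been installed yet.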

### When I am stuck... What to do?

help(sum)

example(min)


min> require(stats); require(graphics)

min>  min(5:1, pi) #-> one number
[1] 1

min> pmin(5:1, pi) #->  5  numbers
[1] 3.141593 3.141593 3.000000 2.000000 1.000000

min> x <- sort(rnorm(100));  cH <- 1.35

min> pmin(cH, quantile(x)) # no names
[1] -2.8148556 -0.7282596 -0.1301134  0.7611575  1.3500000

min> pmin(quantile(x), cH) # has names
0%        25%        50%        75%       100%
-2.8148556 -0.7282596 -0.1301134  0.7611575  1.3500000

min> plot(x, pmin(cH, pmax(-cH, x)), type = "b", main =  "Huber's function")



min> cut01 <- function(x) pmax(pmin(x, 1), 0)

min> curve(      x^2 - 1/4, -1.4, 1.5, col = 2)



min> curve(cut01(x^2 - 1/4), col = "blue", add = TRUE, n = 500)

min> ## pmax(), pmin() preserve attributes of *first* argument
min> D <- diag(x = (3:1)/4) ; n0 <- numeric()

min> stopifnot(identical(D,  cut01(D) ),
min+           identical(n0, cut01(n0)),
min+           identical(n0, cut01(NULL)),
min+           identical(n0, pmax(3:1, n0, 2)),
min+           identical(n0, pmax(n0, 4)))
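Besides `help()` and `example()`, a few other built-in helpers are handy when you only half remember a function name. This is a small sketch of some of them; the search terms are arbitrary examples.

```r
# Short form of help(sum): a question mark opens the same help page
# ?sum

# Two question marks search all help pages for a phrase
# ??"standard deviation"

# Find functions whose names start with "mean"
apropos("^mean")

# Show which arguments a function accepts
args(mean)
```

The two help lines are commented out because they open a help page instead of returning a value.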


• We use the example data sets available within R
library(MASS)
data()

• We will use the “ToothGrowth” dataset
data(ToothGrowth)

• You can load several datasets at the same time and use them simultaneously
data(Animals)

• Look at the environment!
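If you prefer typing to clicking, the same information shown in the environment pane can be requested from the console. This is a minimal sketch using only the “ToothGrowth” dataset.

```r
data(ToothGrowth)

ls()                # names of the objects currently in your environment
head(ToothGrowth)   # first six rows
dim(ToothGrowth)    # number of rows and columns: 60 3
names(ToothGrowth)  # variable names: "len" "supp" "dose"
str(ToothGrowth)    # compact overview of all variables
```

`help(ToothGrowth)` opens the description of the dataset and its variables.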

### Simple calculations with R

x<-mean(ToothGrowth$len)
x

[1] 18.81333

y<-mean(ToothGrowth$dose)
y

[1] 1.166667

x/y

[1] 16.12571
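To see what `mean()` does behind the scenes, you can compute the same number by hand. In this sketch the object names `len`, `by_hand` and `built_in` are made up for illustration.

```r
len <- ToothGrowth$len

# The mean is the sum of all values divided by how many values there are
by_hand <- sum(len) / length(len)
by_hand

# The built-in function gives the same result
built_in <- mean(len)
built_in
```

Both lines print 18.81333, the value shown on the slide.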


### Using both datasets

ToothGrowth$dose/Animals$body

 [1] 3.703704e-01 1.075269e-03 1.376273e-02 1.807664e-02 4.807692e-01
[6] 4.273504e-05 1.963094e-04 2.672368e-03 9.596929e-04 5.000000e-02
[11] 3.030303e-01 1.890359e-03 4.830918e-03 1.612903e-02 1.502855e-04
[16] 1.063830e-04 1.470588e-01 2.857143e-02 8.333333e+00 4.347826e+01
[21] 8.000000e-01 3.603604e-02 2.000000e-02 3.834356e-02 7.142857e+00
[26] 2.298851e-05 1.639344e+01 1.041667e-02 1.481481e+00 4.301075e-03
[31] 1.376273e-02 1.807664e-02 4.807692e-01 4.273504e-05 1.963094e-04
[36] 2.672368e-03 9.596929e-04 5.000000e-02 1.515152e-01 9.451796e-04
[41] 4.830918e-03 1.612903e-02 1.502855e-04 1.063830e-04 1.470588e-01
[46] 2.857143e-02 8.333333e+00 4.347826e+01 4.000000e-01 1.801802e-02
[51] 2.000000e-02 3.834356e-02 7.142857e+00 2.298851e-05 1.639344e+01
[56] 1.041667e-02 1.481481e+00 4.301075e-03 5.505092e-02 7.230658e-02
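The two columns do not have the same length: there are 60 tooth measurements, and the output above starts repeating after 28 values. R handles this by reusing the shorter vector from its beginning ("recycling"), and it warns you when the longer length is not a multiple of the shorter one. The sketch below shows the same behaviour with two tiny made-up vectors, `long` and `short`.

```r
long  <- c(10, 20, 30, 40)
short <- c(1, 2)

# The shorter vector is reused: 10/1, 20/2, 30/1, 40/2
long / short   # 10 10 30 20

# Check the lengths before combining columns from different datasets
length(ToothGrowth$dose)   # 60
```

So the division shows that datasets can be combined, but the numbers themselves have no meaning here, because the rows of the two datasets do not belong together.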


### How to access your data

• What's the content of a variable?
ToothGrowth$len

 [1]  4.2 11.5  7.3  5.8  6.4 10.0 11.2 11.2  5.2  7.0 16.5 16.5 15.2 17.3
[15] 22.5 17.3 13.6 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5
[29] 23.3 29.5 15.2 21.5 17.6  9.7 14.5 10.0  8.2  9.4 16.5  9.7 19.7 23.3
[43] 23.6 26.4 20.0 25.2 25.8 21.2 14.5 27.3 25.5 26.4 22.4 24.5 24.8 30.9
[57] 26.4 27.3 29.4 23.0
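Often you want only part of a variable. Square brackets pick out values by position. This is a short sketch, and the positions are arbitrary examples.

```r
ToothGrowth$len[1]        # first value: 4.2
ToothGrowth$len[1:5]      # first five values
ToothGrowth[1, ]          # first row, all variables
ToothGrowth[1:3, "len"]   # rows 1 to 3 of the variable len
max(ToothGrowth$len)      # longest tooth: 33.9
```

Inside the brackets of a dataset the part before the comma selects rows and the part after it selects variables.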


• The effect of vitamin C on tooth growth in guinea pigs
• They were given orange juice or ascorbic acid
• How long do their teeth grow when they are given orange juice?
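One possible way to answer the orange juice question is to keep only the rows where the supplement is orange juice and take the mean of those. This is a sketch and not necessarily the approach in “beginners.R”; the names `got_oj` and `group_means` are made up for illustration.

```r
# TRUE for the animals that received orange juice ("OJ" in the variable supp)
got_oj <- ToothGrowth$supp == "OJ"

# Mean tooth length for the orange juice group only
mean(ToothGrowth$len[got_oj])

# Mean tooth length for each supplement type side by side
group_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
group_means
```

In the variable `supp`, "OJ" stands for orange juice and "VC" for ascorbic acid.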

• Cars and the distance they travel depending on speed
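The built-in `cars` dataset can be explored with the same tools. This is a hedged sketch of a first look; the correlation is just one simple way, among many, to describe how two variables move together, and the name `speed_dist_cor` is made up.

```r
head(cars)      # two variables: speed and dist (stopping distance)
nrow(cars)      # number of observations: 50

# Correlation between speed and stopping distance (between -1 and 1)
speed_dist_cor <- cor(cars$speed, cars$dist)
speed_dist_cor
```

A positive value means that faster cars tend to need a longer distance to stop.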

## Thank you!

Questions?

### Where to continue...

Here are some websites to continue learning R

Introductory courses:
tryr.codeschool.com
https://www.coursera.org/course/compdata
https://www.coursera.org/course/datasci
http://sentimentmining.net/StatisticsWithR/