1 Getting Started with R

In this session we will cover the following:

  1. What is R?
  2. Installing R and RStudio
  3. RStudio Overview
  4. Working in the Console
  5. Arithmetic Operators
  6. Logical Operations
  7. Functions
  8. Getting Help in R and Quitting RStudio
  9. Summary


How to use this guide

Throughout this guide, concepts are explained by entering commands in R. You will learn the concepts best if you follow along. Whenever you see something like the example below (in a different font), type those commands into R. In the R console, you type each command (the input) after the > prompt; do not type the > itself. The examples in this guide leave the prompt out. The output from R is shown on the line below the command, beginning with ## [1]. You do not type this line: it is the output generated by R, and the ## simply marks it as output in this guide.

2+7
## [1] 9

Abbreviations

BEDMAS: Brackets ( ), Exponents ^, Division / and Multiplication *, Addition + and Subtraction -

GUI: Graphical User Interface

IDE: Integrated Development Environment

1. What is R?

R is a free software environment for statistical computing and graphics. “The R language is widely used among statisticians and data miners for developing statistical software and data analysis” (“R Programming Language” 2018, para. 1).

What is R and what does R stand for?

R is an implementation of the S programming language. S was created by John Chambers while at Bell Labs, and R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. R is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S (“R Programming Language” 2018, para. 2).

“R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity” (R Core Team 2016, para. 2).

“One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control” (R Core Team 2016, para. 3).

“R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS” (R Core Team 2016, para. 4).

2. Installing R and RStudio

First install R, then install RStudio.

R

You will need R version 4.0.3 or later installed (version 4.0.3 was released in October 2020). R is freely distributed online, and you can download it at: https://cran.rstudio.com/

New versions of R are released several times a year, and updates are free.

RStudio

The version of R that you downloaded is pretty bare-bones: it requires you to work in a console window (see Figure 1.1).


Figure 1.1: R console interface


R is the underlying statistical language, not the graphical user interface (GUI) that you use to interact with R. The GUI we will use is RStudio, an integrated development environment (IDE) that provides a professional interface to R and is much easier to work with. Like R, RStudio is free software.

Note: You must have R installed prior to installing RStudio.

Download RStudio from http://www.RStudio.com (Download IDE for R). When you click the download button on the homepage, you will be asked to choose between the desktop version and the server version. Select the desktop version. You’ll be directed to a page that suggests the correct download for your system. Once the download finishes, open the installer file to install RStudio.

3. RStudio Overview

After RStudio has finished installing, you can start R by opening RStudio. See Figure 1.2 for a screenshot of RStudio.


Figure 1.2: RStudio IDE running on the Mac OSX 10.12.5


For more on how to use RStudio, see the documentation at: https://support.RStudio.com/hc/en-us/categories/200035113-Documentation

4. Working in the console

The lower-left quadrant of the screen is called the console pane or window (see Figure 1.3). When RStudio starts up, the console displays a short message that describes the R project and gives some guidance on how to get help and learn more.


Figure 1.3: Console pane in RStudio


We can begin using R as a simple calculator.

The > is known as the command prompt. After the > we can type a command in the console window and press enter. Pressing enter executes the command we type in.

TRY IT: Type 32 + 78 at the R command prompt and press enter.

32 + 78
## [1] 110

So what happened?

Well, R gave you a response (output) to your input (32 + 78). That response came after you pressed the enter key: [1] 110. Clearly, 110 is the answer to 32 + 78. But what does the [1] mean? For now you can ignore it. Technically, it is the index (position) of the first item printed on that line. Sometimes R prints a result across many lines, and the number in brackets at the start of each line tells you where in the sequence that line begins.
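To see what the [1] refers to, compare a one-element result with a longer one. This sketch uses two things not covered until later: the length( ) function, which counts the elements in a result, and the colon operator (:), which creates a sequence of whole numbers. The line breaks in the output of 1:30 depend on the width of your console, so no output is shown for it.

```r
# 32 + 78 produces a single value, so the result has length 1
length(32 + 78)
## [1] 1

# 1:30 produces the whole numbers 1 to 30; if they wrap onto several
# lines, each line starts with the index of its first number in brackets
1:30
```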

Rules for working in the console

All programming languages, including R, have a set of rules that need to be followed. Some rules are strict (such as case sensitivity), and others, such as spacing, are less strict.

R can tell when you are not done. When a command is incomplete, R shows a + prompt instead of >. Note the dual use of the + symbol. It is the addition operator, and, as in the case below, it is also the prompt R uses to tell you that it is still waiting for the rest of the command.

For example, type 2 + and press enter.

2 +
+

R responds with a + prompt. This means that R is waiting for you to complete the command. R cannot do anything with 2 + on its own, because there is nothing to add to 2. If you type 3 after the + prompt and press enter, R completes the command and returns the answer.

2 +
+ 3
## [1] 5

Alternatively, to escape the + prompt and return to the > command prompt, you can always press the Esc key on your keyboard.

R is flexible with spacing: it ignores extra spaces between numbers and operators.

2 +     7
## [1] 9

is the same as

2 + 7
## [1] 9

R is case sensitive. You will often reference variables, functions, and different data structures by name, and R treats a variable named X as different from a variable named x. We’ll create some of our own variables later in this session.
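A quick way to see case sensitivity in action is to create two variables whose names differ only in case. This sketch uses the assignment operator <-, which is introduced in Section 6, and x and X are just example names. Function names follow the same rule: sqrt( ) exists, but typing Sqrt(16) would stop with an error saying that R could not find a function called "Sqrt".

```r
# Two different variables whose names differ only in case
x <- 5
X <- 10
x
## [1] 5
X
## [1] 10

# Function names are case sensitive too: sqrt() exists, but typing
# Sqrt(16) would stop with an error because R cannot find a function
# called Sqrt
sqrt(16)
## [1] 4
```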

R complains when it cannot run a command. For example, if you type:

1 + d
Error: object 'd' not found

you’ll get an error saying that object ‘d’ was not found. This means that you never defined d in R, so R doesn’t know what d means. When this happens, it’s often because of a typo. Perhaps you meant to type 1 + 3, not 1 + d.

1 + 3
## [1] 4

Clearing the console. The console window can become cluttered at times. To clear it, press Ctrl + L (the Control key and the letter L).

5. Simple calculations with R: Arithmetic operators

There are many arithmetic operators in R (see Table 1). The first five are used very frequently in R and throughout this course.

Operation                                  Operator   Example Input   Example Output
Addition                                   +          100 + 2         102
Subtraction                                -          93 - 3          90
Multiplication                             *          10 * 10         100
Division                                   /          100 / 5         20
Power                                      ^          8^2             64
Integer Division                           %/%        65 %/% 10       6
Modulus (remainder of integer division)    %%         65 %% 10        5

Table 1. Arithmetic Operators


Try it. Practice using these operators at the R command prompt by typing in the example input from Table 1.
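Integer division and modulus are easiest to understand together. %/% gives the whole number of times one number goes into another, and %% gives what is left over. As a quick check, multiplying the quotient by the divisor and adding the remainder should give back the original number, as this sketch shows for the Table 1 example.

```r
# 65 divided by 10 is 6 with a remainder of 5
65 %/% 10
## [1] 6
65 %% 10
## [1] 5

# Quotient times divisor, plus remainder, gives back 65
(65 %/% 10) * 10 + 65 %% 10
## [1] 65
```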

Order of operations

Consider the following examples:

ex. 1

3 + 3 * 20

ex. 2

(3+3) *20

In ex. 1 the answer is 63 and in ex. 2 the answer is 120. Why? R follows the order of operations, where precedence follows the BEDMAS order: Brackets ( ), Exponents ^, Division / and Multiplication *, Addition + and Subtraction -. In ex. 1, the multiplication 3 * 20 = 60 is done first, and then 3 is added to give 63.

In ex. 2 we used the brackets to force R to compute the sum of 3 + 3 prior to executing the multiplication operation.

Let’s look at another example:

ex. 3

42 / 2 * 4

What is the answer in this case? It’s 84, right? When operators have the same priority in the order of precedence, such as division and multiplication, R evaluates them from left to right. In ex. 3, 42 / 2 = 21 is computed first, and then 21 * 4 = 84. In ex. 4, the multiplication comes first from the left, so it is evaluated before the division: 40 * 10 = 400, and then 400 / 2 = 200.

ex. 4

40 * 10 / 2
## [1] 200

To learn more about how R assigns precedence to all of its operators, see the help page by typing:

?Syntax
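The examples below make the left-to-right rule concrete and show how brackets change the result. They also show one exception listed on the ?Syntax help page: the exponent operator ^ is evaluated from right to left, and it binds more tightly than a leading minus sign.

```r
# Division and multiplication share a precedence level, so R works left to right
42 / 2 * 4
## [1] 84

# Brackets override the left-to-right rule
42 / (2 * 4)
## [1] 5.25

# Exponents are the exception: ^ is evaluated right to left, so this is 2^9
2^3^2
## [1] 512
(2^3)^2
## [1] 64

# ^ is applied before a leading minus sign, so this is -(2^2)
-2^2
## [1] -4
```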

6. Logical operations

Logical operators are used to evaluate the “truth” of a statement. See Table 2 for a list of logical operators. For example, suppose you asked the question: does 1 equal 1? You may be thinking, why would I ever ask such an obvious question? I know the two values are equal.

In R, the answer to the question would not be a yes or no, but a true or false response. Obviously, in this case, the answer is TRUE.

You would write the question like this:

1 == 1
## [1] TRUE

When using logical operators such as ==, R returns a Boolean value of either TRUE or FALSE. Did you notice that we used two equals signs to test for equality? Why didn’t we use just a single equals sign? Because = denotes assignment. For example, suppose we wanted to create a variable called temperature and give it a value of 45 (we’ll learn much more about variables in session 2). We could simply assign it a value using the assignment operator:

temperature = 45

Note: the conventional assignment operator in R is <-. See the example below.

temperature <- 45

We can check that our new variable temperature holds the value we assigned by simply typing the variable name at the console.

temperature
## [1] 45

= denotes assignment, not equality.

<- is the preferred operator for assignment.

== tests for equality.
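The short sketch below contrasts assignment with an equality test. It also shows one place where spacing does matter in R, despite the general rule that extra spaces are ignored. Without spaces, x<-2 is read as an assignment rather than as "x is less than negative 2". The variable name x is just an example.

```r
# Assignment: store 45 in temperature (nothing is printed)
temperature <- 45

# Equality test: compare temperature with 45
temperature == 45
## [1] TRUE

# Spacing around <- matters
x <- 5
x < -2     # comparison: is x less than negative 2?
## [1] FALSE
x<-2       # assignment: x now holds 2
x
## [1] 2
```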

Logical operators are useful for evaluating conditions. Let’s think about a conceptual example. Suppose that you created an amazing App that remotely controls the thermostat of your house, and you want the App to adjust the thermostat based on the temperature of the house. To do this, you would compare the house temperature against a threshold temperature to decide whether to raise the thermostat, lower the thermostat, or do nothing. To begin programming this problem in R, we would need logical operators such as >= (greater than or equal to). We would ask R whether the house temperature is >= the threshold temperature. If the result is FALSE, we would take action and set the thermostat to a certain value.

temperature <- 45
temperature >= 50
## [1] FALSE

Operation                   Operator   Example Input     Example Output
Less Than                   <          4 < 10            TRUE
Less Than or Equal To       <=         4 <= 4            TRUE
Greater Than                >          11 > 12           FALSE
Greater Than or Equal To    >=         4 >= 4            TRUE
Equal To                    ==         3 == 2            FALSE
Not Equal To                !=         3 != 2            TRUE
Not                         !          !(3==3)           FALSE
Or                          |          (3==3) | (4==7)   TRUE
And                         &          (3==3) & (4==7)   FALSE

Table 2. Logical Operators in R


Evaluating inequality

There are times where you will want to evaluate if two values are not equal. This is done using the != operator.

4 !=4
## [1] FALSE

In the example above, we are asking R if 4 is not equal to 4. R evaluates this statement as FALSE since 4 is equal to 4.
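The != operator always gives the opposite answer to ==. You can see this by wrapping an equality test in the Not operator ! from Table 2, as this sketch does.

```r
# != asks "not equal?", the opposite of ==
4 != 4
## [1] FALSE
!(4 == 4)
## [1] FALSE

4 != 5
## [1] TRUE
```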

Evaluating multiple conditions at once

The & (and) and | (or) operators let you evaluate multiple conditions at once. For example, suppose you wanted to do something if the thermostat in your house was set to 50 or higher and the house temperature was 65 or higher. If both conditions are TRUE, you want to set the thermostat to 50 degrees.

For this example, we need to “hard-code” our variables so we have something to test. Below, we simply set the variables thermostat and temperature to fixed values.

thermostat <- 55
temperature <- 70
Note: the OR operator | is known as the pipe character. On most US keyboards it is on the same key as the backslash (\).

Next, let’s write our evaluative statement

(thermostat >= 50) & (temperature >= 65)
## [1] TRUE
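To see how & and | combine two conditions, the sketch below evaluates them on plain TRUE and FALSE values. It then tries an OR version of the thermostat check. The threshold of 80 in the last line is a made-up value for illustration.

```r
# & (and) is TRUE only when both sides are TRUE
TRUE & TRUE
## [1] TRUE
TRUE & FALSE
## [1] FALSE

# | (or) is TRUE when at least one side is TRUE
TRUE | FALSE
## [1] TRUE
FALSE | FALSE
## [1] FALSE

# Hard-coded values from the example above
thermostat <- 55
temperature <- 70

# Only the first condition is TRUE, which is enough for |
(thermostat >= 50) | (temperature >= 80)
## [1] TRUE
```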

This program is not quite complete, because we haven’t done anything based on the result of the statement. Later in this course, we’ll learn how to write conditional statements (such as if/else) and loops (such as while and for) that execute commands based on the evaluation of logical operators.
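As a preview only (if statements are covered properly later in the course), here is one way the thermostat program could be finished so that it acts on the result. Setting the thermostat to 50 follows the scenario described above.

```r
# Hard-coded values from the example above
thermostat <- 55
temperature <- 70

# The code inside { } runs only when the condition evaluates to TRUE
if ((thermostat >= 50) & (temperature >= 65)) {
  thermostat <- 50
}
thermostat
## [1] 50
```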

Try out the other examples listed in Table 2 to become familiar with all of the logical operators.

7. Functions

There are many functions that help you perform calculations. As you have seen, we can already use arithmetic and logical operators to perform calculations and evaluate data. These operators are technically functions.
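To see that operators really are functions, you can call one the same way you would call sqrt( ). Wrap the operator in backticks (`) and pass its two inputs as arguments. This is only to illustrate the point; in everyday code you would simply write 32 + 78.

```r
# + is a function named `+`; the backticks let R treat the symbol as a name
`+`(32, 78)
## [1] 110

# The same works for logical operators such as >=
`>=`(4, 4)
## [1] TRUE
```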

To do more advanced calculations, manipulate data, and perform actual statistics we need to use functions that go beyond the basic use of operators.

Some examples of functions include:

Description                              Function   Example Input     Example Output
Square Root                              sqrt( )    sqrt(144)         12
Absolute Value                           abs( )     abs(-21)          21
Round                                    round( )   round(3.432, 2)   3.43
Logarithm in Base 10                     log10( )   log10(1000)       3
Logarithm in Base 2                      log2( )    log2(8)           3
Exponential (e^x, e = Euler's number)    exp( )     exp(3)            20.08554

Table 3. Simple functions in R


Every function name is followed by parentheses ( ). The parentheses are where you pass in the data you want the function to work on. For example, to get the square root of 144, we pass the value 144 into the function: sqrt(144). Mathematically, the square root of 144 is the same as 144 raised to the power 0.5 (144 ^ 0.5), and both give 12.

sqrt(144)
## [1] 12

The absolute value function simply converts negative numbers to positive numbers and leaves positive numbers alone. Mathematically, the absolute value of x is written |x| or sometimes abs(x). In R, it’s very simple:

abs(-21)
## [1] 21

The round( ) function rounds a number to a specified number of decimal places or to a whole number. To round to a whole number:

round(10.5352)
## [1] 11

To round to 2 decimal places, give round( ) two arguments, as in the example below: 10.5352 and 2. The first argument is the number to be rounded. The second argument specifies the number of decimal places to round to.

round(10.5352, 2)
## [1] 10.54
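The second argument of round( ) has a name, digits, and writing it out can make a call easier to read. The sketch below also shows that digits defaults to 0. It illustrates one behaviour described on the ?round help page as well: a number exactly halfway between two whole numbers is rounded to the even one.

```r
# Naming the second argument makes the call self-explanatory
round(10.5352, digits = 2)
## [1] 10.54

# digits defaults to 0, so these two calls give the same result
round(10.5352)
## [1] 11
round(10.5352, digits = 0)
## [1] 11

# Halfway cases go to the even whole number
round(2.5)
## [1] 2
round(3.5)
## [1] 4
```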

Try it. Try entering the example input from Table 3 to see how these basic functions work.
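Table 3 describes exp( ) in terms of e, Euler's number. The sketch below shows e itself and its partner, the natural logarithm log( ), which is not in Table 3. It also shows that function calls can be nested, with R evaluating the innermost call first.

```r
# exp(1) is e itself (printed to 7 significant digits)
exp(1)
## [1] 2.718282

# log() with no base is the natural logarithm (base e), so it undoes exp()
log(exp(3))
## [1] 3

# Nested calls: abs(-144) is evaluated first, then sqrt() of the result
sqrt(abs(-144))
## [1] 12
```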

8. Getting help in R and Quitting RStudio

Help

To learn more about the functions discussed in this lesson, you can always use R’s built-in help. Typing ? before any function name opens a help page with details about the function and examples.

For example, if you wanted to remember how to use the round function you could simply type:

?round

This will launch the help window in RStudio in the lower right quadrant (see Figure 1.4). Here you can find details about various syntax and functions in R.


Figure 1.4: The Help Window in RStudio

Quitting RStudio

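To quit R from the console, type q(). R then asks whether you want to save your workspace (RStudio may show the same question in a dialog box instead):

q()
Save workspace image? [y/n/c]: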

The y/n/c is short for yes / no / cancel. Type y if you want to save, n if you don’t, and c if you changed your mind and don’t want to quit R.

Saving your workspace image in R just means that you can store all of the variables you created (if you created any) in a default data file, which will automatically reload for you next time you open R. I’d recommend that you do not save your workspace image because there are better ways to save the variables you created. We’ll learn how to do this later in the course when it’s most relevant.

9. Summary

R is a free software environment for computing and graphics. Free software is also known as “open-source” because the source code used to produce the software is open to the public. RStudio is an integrated development environment that provides a graphical user interface for using R to work with data.

There are some basic rules you need to follow when working in R. R is case sensitive: you will often reference variables by name, and X is different from x, just as Y is different from y.

R is flexible with spacing. R ignores redundant spacing.

R can tell when you are not done. A plus sign (+) shown in place of the > prompt indicates that a command has not been fully entered. The + symbol is also used as a mathematical operator, to denote addition. Note that, unlike in some other languages, + cannot be used to join pieces of text in R.

R complains when it cannot run a command. The cause is usually a typo by the user: a misspelling, or a reference to a function, variable, data set, etc. that has not been defined in R.

Order of operations. R follows the BEDMAS order of operations. When operators have the same priority in the order of precedence, such as division and multiplication, R evaluates them from left to right. Exponents (^) are the exception and are evaluated from right to left.

R Commands

• To enter a command, enter it after the R command prompt >

• Press the enter key to execute a command you entered

• Clearing the console. Ctrl + l clears the console window

• Arithmetic operators include +, -, *, /, ^, %/%, and %%

• Logical operators include >, >=, ==, <, <=, !=, !, &, and |

• = denotes assignment, not equality.

• <- denotes assignment, use it instead of =.

• == denotes equality

• Basic functions in R include abs( ), round( ), and sqrt( ).

References and Resources

R Programming Language. Wikipedia. http://en.wikipedia.org/wiki/R_(programming_language)

R. http://cran.r-project.org

How to use RStudio https://support.RStudio.com/hc/en-us/categories/200035113-Documentation

R for Data Science. http://r4ds.had.co.nz/index.html

1.1 Exercise 1.1

Compute the following:

  1. (123 - 45) / 4 + 4 * (72 / 2.34 - 3)

  2. Absolute value of -88

  3. Base 10 logarithm of 72

  4. e^1.45 - 2.612

  1. Assign a variable year_born to 1984
  2. Assign a variable year_current to 2014
  3. Assign a variable age and compute it
  4. Return TRUE / FALSE for whether the person is eligible to vote in the US (age greater than or equal to 18)

  1. Given: the formula for the area of a circle is pi * r^2, and Area = 100
      1. Write a statement to find r. (Hint: use the sqrt( ) function and the built-in constant pi.)
  2. Given: you went to lunch and the pre-tax bill was $45.90
      1. Compute the subtotal: add NYC tax of 8.875%
      2. Compute a 15% tip on the subtotal
      3. Compute a 20% tip on the subtotal

1.1.1 Code walkthrough
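One possible solution sketch for Exercise 1.1 follows. The variable names area, r, bill, subtotal, tip_15, and tip_20 are our own choices, not part of the exercise. Outputs are omitted where the answer has many decimal places, so run the code to see them. Try the exercise yourself before reading on.

```r
# 1. Arithmetic, following the order of operations
(123 - 45) / 4 + 4 * (72 / 2.34 - 3)

# 2. Absolute value of -88
abs(-88)
## [1] 88

# 3. Base 10 logarithm of 72
log10(72)

# 4. e^1.45 - 2.612
exp(1.45) - 2.612

# Voting eligibility
year_born <- 1984
year_current <- 2014
age <- year_current - year_born
age
## [1] 30
age >= 18
## [1] TRUE

# Radius from area: area = pi * r^2, so r = sqrt(area / pi)
area <- 100
r <- sqrt(area / pi)
r

# Lunch bill: add 8.875% NYC tax, then compute tips on the subtotal
bill <- 45.90
subtotal <- bill * (1 + 0.08875)
tip_15 <- subtotal * 0.15
tip_20 <- subtotal * 0.20
subtotal
tip_15
tip_20
```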

1.2 Exercise 1.2

  1. Compute the following:
     (((20*3)-14)^3)
  2. Round the square root of 50 to the fourth decimal place.

  1. Assign a variable customers to 500
  2. Assign a variable pizza_price to $20

Task:

  1. Assign a variable todays_revenue (customers * pizza_price) and compute today's revenue.

  2. Is today's revenue greater than yesterday’s revenue of $7,000 and less than tomorrow’s projected revenue of $11,000? Show the code that answers this question.

1.2.1 Code walkthrough
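One possible solution sketch for Exercise 1.2 follows, using the variable names given in the exercise. Try the exercise yourself before reading on.

```r
# 1. Work from the innermost brackets outwards: 20*3 = 60, 60 - 14 = 46, 46^3
(((20*3)-14)^3)
## [1] 97336

# 2. Square root of 50, rounded to four decimal places
round(sqrt(50), 4)
## [1] 7.0711

# Today's revenue
customers <- 500
pizza_price <- 20
todays_revenue <- customers * pizza_price
todays_revenue
## [1] 10000

# Is it above yesterday's $7,000 AND below tomorrow's projected $11,000?
(todays_revenue > 7000) & (todays_revenue < 11000)
## [1] TRUE
```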

1.3 Assignment 1

Answer the following questions:

  1. Which of the following is a logical operator?
     / | - ^
  2. What value does R return in the statement below?
     3 >= 4
  3. What is the result of this calculation?
     (45 + 3) * 43 + 3^2
  4. How would R evaluate the following?
     carspeed = 70
     speedlimit = 65
     carspeed > speedlimit
  5. How would R evaluate the following?
     (2+2 == 4) | (2+2 == 5)
  6. How would R evaluate the following?
     !FALSE
  7. What is the result of this function?
     round(33.2321435452, 2)
  8. What is the result of this function?
     sqrt (64)
  9. What is the result of this statement?
     sqrt(64) == 64 ^.5
  10. What is the result of this statement?
     abs(-32)
  11. Which of the following is an arithmetic operator?
     *, |, &, !
  12. What is wrong with this code?
     2 +     3 *4 + sqrt[100]

References

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

“R Programming Language.” 2018. Wikipedia. https://en.wikipedia.org/wiki/R_(programming_language).