Lesson 5 Conditionals and Controls

by Kristen Sosulski

Welcome to lesson 5. In this lesson, you will be introduced to comparison operators, logical operators, conditional statements, iteration, and looping.

Follow along with this tutorial by creating a new ipython notebook named lesson05.ipynb and entering the code snippets presented.

Outline

  • Comparison operators
  • Logical operators
  • Conditional statements
  • Conditionals and DataFrames
  • Control structures
  • Summary
  • Exercise 5.1
  • Exercise 5.2
  • Assignment 5

5.1 Comparison operators

Early on in lesson 1, we learned about different types of mathematical operators, including +, -, *, and /. Aside from these operators, there are comparison and logical operators, which are used to evaluate conditions and produce bool values. Boolean values are either True or False.

Let’s start with understanding comparison operations as presented in the table below:

Operation                 Operator  Example Input  Answer
Less than                 <         4 < 10         True
Less than or equal to     <=        4 <= 4         True
Greater than              >         11 > 12        False
Greater than or equal to  >=        4 >= 4         True
Equal to                  ==        3 == 2         False
Not equal to              !=        3 != 2         True

Less than

For numeric comparisons, the less than operator < evaluates to see if the first value is less than the second value. If it is, the value of True is returned. If it is not, the value of False is returned.

What will be returned from the following statement?

33 < 90

If you answered True you are correct.

Less than or equal to

For numeric comparisons, the less than or equal to operator <= evaluates to see if the first value is less than or equal to the second value. If it is, the value of True is returned. If it is not, the value of False is returned. When using <= do not put a space between the < and the = (e.g. < =). This will produce a SyntaxError: invalid syntax.

33 <= 33
True

Greater than

For numeric comparisons, the greater than operator > evaluates to see if the first value is greater than the second value. If it is, the value of True is returned. If it is not, the value of False is returned.

33 > 90
False

Greater than or equal to

For numeric comparisons, the greater than or equal to operator >= evaluates to see if the first value is greater than or equal to the second value. If it is, the value of True is returned. If it is not, the value of False is returned. When using >= do not put a space between the > and the = (e.g. > =). This will produce a SyntaxError: invalid syntax.

33 >= 33
True

Equal to

For numeric comparisons, the equality operator == evaluates to see if the first value is equal to the second value. If it is, the value of True is returned. If it is not, the value of False is returned. When using == do not put a space between the first = and the second = (e.g. = =). This will produce a SyntaxError: invalid syntax.

33 == 33
True

Note that the equal sign and the double equal sign have VERY different meanings in Python. The = denotes assignment, as in x = 4, which means that the value 4 is assigned to the variable x. The == denotes equality and compares two values to see if they are equal.
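The difference can be seen in a minimal sketch. The variable names x, is_four, and is_five are chosen only for illustration.

```python
# Assignment: the name x now refers to the value 4.
x = 4

# Equality: compares two values and returns a bool.
is_four = (x == 4)
is_five = (x == 5)

print(is_four)  # True
print(is_five)  # False
```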

Not equal to

For numeric comparisons, the inequality operator != evaluates to see if the first value is not equal to the second value. If the two values differ, the value of True is returned. If they are equal, the value of False is returned. When using != do not put a space between the ! and the = (e.g. ! =). This will produce a SyntaxError: invalid syntax.

33 != 33
False

5.2 Logical operators

In addition to the six comparison operators (>, >=, <, <=, ==, and !=), there are three logical operators. These are explained in the table below.

Operation  Operator  Example Input        Answer
Or         or        (3==3) or (4==7)     True
And        and       (3==3) and (4==7)    False
Not        not       not(3==3)            False

Or

The or operator returns True if at least one of the expressions is true. For example, we can evaluate the two expressions in parentheses as shown below. If at least one of them is true, then the value of True is returned.

(5**2 == 25) or (5!=5)  
True

However, if neither of the expressions evaluates to True, the value of False is returned.

And

In contrast, if we alter the expression above and replace or with and, the result evaluates to False. This is because both expressions need to be true when the and operator is used.

(5**2 == 25) and (5!=5)  
False

Not

The not operator reverses the result of an expression. It returns False if the result is True and True if the result is False.

not(5==5) 
False
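
The behavior of all three operators can be summarized by listing every combination of two Boolean values. This is a small sketch; the names a, b, and rows are used only for illustration.

```python
# Collect every combination of two Boolean values
# together with the results of and, or, and not.
rows = []
for a in (True, False):
    for b in (True, False):
        rows.append((a, b, a and b, a or b, not a))

# Columns: a, b, a and b, a or b, not a
for row in rows:
    print(row)
```

Only the first row, where both values are True, has True in the "and" column, and only the last row, where both are False, has False in the "or" column.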

Important usage note: When working with DataFrames and Series, you will have to use the &, |, and ~ operators instead of and, or, and not, respectively.
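
As a rough sketch of the element-wise operators on a pandas Series, assuming pandas is installed: the Series ranks and its values are made up for illustration, and in pandas the element-wise "not" is written ~.

```python
import pandas as pd

# A small made-up Series; the values are for illustration only.
ranks = pd.Series([1, 5, 12, 20])

top_five = ranks <= 5        # element-wise comparison
bottom = ranks >= 15

either = top_five | bottom   # element-wise "or"
both = top_five & bottom     # element-wise "and"
not_top = ~top_five          # element-wise "not"

print(either.tolist())   # [True, True, False, True]
print(both.tolist())     # [False, False, False, False]
print(not_top.tolist())  # [False, False, True, True]
```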

5.3 Conditional statements

Conditional statements are a nice way to make decisions by evaluating a condition to see if it is True. Oftentimes, we may want to make a decision based on one or more conditions. This can be achieved with the use of comparison and/or logical operators together with a simple if statement.

if statements

Here’s an example that uses a conditional if statement to ask whether 1 < 2 and 4 > 2.

if 1 < 2 and 4 > 2:
    print("The first condition is met")  # runs only if the condition is True; otherwise nothing is printed
The first condition is met

The structure for an if statement is very particular. The first line of the if is left aligned and communicates the condition. The condition must be one that returns a True or False value. The condition must be followed by a colon, :.

Next, if the condition evaluates to True, then the lines below the if statement will be executed.

See the prototype (or pseudo code) below:

if CONDITION :
    # do something such as execute a print statement
    print("The first condition is met")

Indent every statement that belongs to the if block. Otherwise, an IndentationError will be raised.

For example, the code below raises this error: IndentationError: expected an indented block

if 1 < 2 and 4 > 2:
print("The first condition met")
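
One way to see the error without stopping a notebook cell is to hand the same source to Python's built-in compile() function and catch the exception. This is only a sketch; the names bad_source and error_name are hypothetical.

```python
# The broken example, stored as a string so that the error
# can be caught instead of halting the program.
bad_source = 'if 1 < 2 and 4 > 2:\nprint("The first condition met")\n'

try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__

print(error_name)  # IndentationError
```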

Multiple if statements

You can ask multiple questions in a single code chunk. Each if statement is evaluated separately and is unrelated to the others. This means that whether the first condition is met or unmet, the second condition is still evaluated, and so on.


if 1 < 2 and 4 > 2:
    print("The first condition met")
The first condition met
if 1 > 2 and 4 < 10:
    print("The second unrelated condition is met.")

if 4 < 10 or 1 < 2:
    print("The third unrelated condition is met.")
The third unrelated condition is met.

if / elif statements

To make one condition depend on another, for example to check the second condition only if the first is not met and the third only if the second is not met, we use the if and elif statements together. elif is short for else if, and the first elif in a block of code must be preceded by an if statement. There can be multiple elif statements.

Examine the code below. Take note of how the colon is used at the end of the if statement and each elif statement.

Also, note the indentation of the print statement under the if and under each elif. The indentation rules are important to follow.


if 1 < 2 and 4 > 2:
    print("The first condition is met")  # printed if the first condition is met
elif 1 > 2 and 4 < 10:
    print("The second condition is met")  # printed if the second condition is met and the first is not
elif 4 < 10 or 1 < 2:
    print("The third condition is met")  # printed if the third condition is met and the first two are not
else:
    print("No conditions were met.")  # printed only if none of the three conditions is met
The first condition is met

This example illustrates how you can evaluate one condition and, if that is not met, evaluate a second condition, and so on. The else: block runs only when every preceding condition evaluates to False. In this example the first condition is True, so the elif conditions are never checked and the else: block is skipped.
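
The contrast between separate if statements and an if / elif / else chain can be sketched with two small functions. The function names separate_ifs and chained and the thresholds are hypothetical, chosen only for illustration.

```python
def separate_ifs(n):
    """Every if is checked, so several messages can be collected."""
    messages = []
    if n > 0:
        messages.append("positive")
    if n > 10:
        messages.append("greater than 10")
    return messages


def chained(n):
    """Only the first branch whose condition is True runs."""
    if n > 10:
        return "greater than 10"
    elif n > 0:
        return "positive"
    else:
        return "zero or negative"


print(separate_ifs(25))  # ['positive', 'greater than 10']
print(chained(25))       # greater than 10
print(chained(-3))       # zero or negative
```

With separate if statements, the value 25 satisfies both conditions and produces two messages. In the chain, it stops at the first condition that is True.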

5.4 Conditionals and DataFrames

Conditional statements are very useful when working with DataFrames.

Let’s do a quick review of how to import data and select columns with the mba data set.

First, we import the pandas library as pd. Then we import the data by passing the full URL of the .csv file to the read_csv function.

#first import the data
import pandas as pd
mydata = pd.read_csv("http://becomingvisual.com/python4data/mba.csv", header=0) 

Then, preview the data.

mydata.head(5)
   Rank             School  ... Total Tuition ($)  Duration (Months)
0     1    Chicago (Booth)  ...            106800                 21
1     2   Dartmouth (Tuck)  ...            106980                 21
2     3  Virginia (Darden)  ...            107800                 21
3     4            Harvard  ...            107000                 18
4     5           Columbia  ...            111736                 20

[5 rows x 11 columns]

Next, to select columns we can use a simple syntax of DataFrameName.ColumnName.

For example,

mydata.Rank
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
12    13
13    14
14    15
15    16
16    17
17    18
18    19
19    20
20    21
21    22
22    23
23    24
24    25
Name: Rank, dtype: int64

This is a handy shortcut. The alternative is to use the indexing operator as shown below.

mydata["School"]
0                   Chicago (Booth)
1                  Dartmouth (Tuck)
2                 Virginia (Darden)
3                           Harvard
4                          Columbia
5     California At Berkeley (Haas)
6                       MIT (Sloan)
7                          Stanford
8                              IESE
9                               IMD
10                 New York (Stern)
11                           London
12           Pennsylvania (Wharton)
13                        HEC Paris
14                Cornell (Johnson)
15                  York (Schulich)
16         Carnegie Mellon (Tepper)
17                            ESADE
18                           INSEAD
19           Northwestern (Kellogg)
20                 Emory (Goizueta)
21                               IE
22                  UCLA (Anderson)
23                  Michigan (Ross)
24                             Bath
Name: School, dtype: object

Now that we’ve reviewed how to select columns in a pandas DataFrame, let’s select some data based on a condition.

Specifically, what if we want to filter our mba DataFrame to show only the schools ranked in the top 10, that is, with a Rank less than or equal to 10?

To do that, we take a column from the DataFrame and apply a Boolean condition to it. Here’s an example of a Boolean condition:

#Boolean condition
condition01 = mydata.Rank <= 10
condition01
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
Name: Rank, dtype: bool

A Series of Boolean values is returned. Rows with the value True meet the condition. You can confirm that a Series is returned by using the type() function.

type(condition01)
<class 'pandas.core.series.Series'>

Let’s write another condition to see which schools have an average starting salary of greater than or equal to $125,000.

#Boolean condition
condition02 = mydata.AvgSalary >=125000
condition02
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7      True
8      True
9      True
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24     True
Name: AvgSalary, dtype: bool

These data are informative, but probably not that useful. What if you wanted to return the data for just those schools that met your conditions?

This is where we place the Boolean condition inside the indexing operator [ ] of the DataFrame to return the result set. Let’s try doing this with condition02.

#Boolean condition that returns the result set
condition02 = mydata[mydata.AvgSalary >=125000]
condition02
    Rank    School  ... Total Tuition ($)  Duration (Months)
7      8  Stanford  ...            114600                 21
8      9      IESE  ...             95610                 19
9     10       IMD  ...             67416                 11
24    25      Bath  ...             36057                 12

[4 rows x 11 columns]

This returns a DataFrame object. You can see this by using the type() function.

type(condition02)
<class 'pandas.core.frame.DataFrame'>

The alternative syntax is:

condition02 = mydata[mydata['AvgSalary'] >=125000]
type(condition02)
<class 'pandas.core.frame.DataFrame'>
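
The two steps, building a Boolean Series and then using it to select rows, can be sketched on a tiny DataFrame, assuming pandas is installed. The DataFrame df and its values are made up for illustration and are not taken from mba.csv.

```python
import pandas as pd

# Made-up values for illustration; not taken from mba.csv.
df = pd.DataFrame({
    "School": ["A", "B", "C"],
    "AvgSalary": [110000, 130000, 125000],
})

mask = df["AvgSalary"] >= 125000   # Boolean Series, one value per row
selected = df[mask]                # DataFrame with only the True rows

print(mask.tolist())                # [False, True, True]
print(selected["School"].tolist())  # ['B', 'C']
```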

We can ask more complex questions of our data using the logical operators | for “or” and & for “and”.

Let’s filter the DataFrame to show only those schools where the average salary is greater than or equal to $125,000 and the tuition is less than or equal to $100,000.

#Boolean condition that returns the result set
condition03 = mydata[(mydata['AvgSalary'] >=125000) & (mydata['Total Tuition ($)'] <= 100000)]
condition03 #type DataFrame
    Rank School  ... Total Tuition ($)  Duration (Months)
8      9   IESE  ...             95610                 19
9     10    IMD  ...             67416                 11
24    25   Bath  ...             36057                 12

[3 rows x 11 columns]

Take note of the use of square brackets and parentheses. We need to group each comparison in parentheses so that Python knows how to evaluate the conditional.

I chose the index operator [ ] over the . to access the columns. This is because some column names, such as Total Tuition ($), contain spaces, which cause an error with the dot syntax. This is why single-word names for columns make coding more efficient.

Also, take note of the use of the & operator. If and were used, a ValueError would be raised. This is because the logical operators (and, or, and not) do not operate element by element on a pandas Series or DataFrame. Use the element-wise operators (&, |, and ~) when working with DataFrames and Series.
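
A short sketch of this behavior, assuming pandas is installed: the Series salary and tuition hold made-up values, not values from mba.csv.

```python
import pandas as pd

# Made-up values for illustration; not taken from mba.csv.
salary = pd.Series([110000, 130000, 125000])
tuition = pd.Series([90000, 95000, 120000])

try:
    combined = (salary >= 125000) and (tuition <= 100000)
    outcome = "no error"
except ValueError:
    # "and" asks for a single True/False for the whole Series,
    # which pandas refuses to guess.
    outcome = "ValueError"

# The element-wise operator compares row by row instead.
combined = (salary >= 125000) & (tuition <= 100000)

print(outcome)            # ValueError
print(combined.tolist())  # [False, True, False]
```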

The pandas isin() method

The pandas isin() method is used to filter DataFrames. It returns a Boolean result that marks which rows contain one of the listed values, and this result can be used to select a subset of the data. In the example below, we check whether the Country field is either US or France.

#Boolean condition that returns the result set
condition04 = mydata[mydata.Country.isin(['US', 'France'])]
condition04 #type DataFrame
    Rank                         School  ... Total Tuition ($)  Duration (Months)
0      1                Chicago (Booth)  ...            106800                 21
1      2               Dartmouth (Tuck)  ...            106980                 21
2      3              Virginia (Darden)  ...            107800                 21
3      4                        Harvard  ...            107000                 18
4      5                       Columbia  ...            111736                 20
5      6  California At Berkeley (Haas)  ...            106792                 21
6      7                    MIT (Sloan)  ...            116400                 22
7      8                       Stanford  ...            114600                 21
10    11               New York (Stern)  ...             96640                 20
12    13         Pennsylvania (Wharton)  ...            107852                 21
13    14                      HEC Paris  ...             66802                 16
14    15              Cornell (Johnson)  ...            107592                 21
16    17       Carnegie Mellon (Tepper)  ...            108272                 21
19    20         Northwestern (Kellogg)  ...            113100                 22
20    21               Emory (Goizueta)  ...             87200                 22
22    23                UCLA (Anderson)  ...            105160                 21
23    24                Michigan (Ross)  ...            105500                 20

[17 rows x 11 columns]
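
As a sketch on a tiny DataFrame, assuming pandas is installed, isin() can also be combined with the ~ operator to keep the rows that are not in the list. The DataFrame df and its rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration; not taken from mba.csv.
df = pd.DataFrame({
    "School": ["A", "B", "C", "D"],
    "Country": ["US", "Spain", "France", "UK"],
})

in_list = df[df["Country"].isin(["US", "France"])]
not_in_list = df[~df["Country"].isin(["US", "France"])]

print(in_list["School"].tolist())      # ['A', 'C']
print(not_in_list["School"].tolist())  # ['B', 'D']
```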

5.5 Control structures

The while loop

The while statement allows you to repeatedly execute a block of statements as long as a condition is True. The while control structure is commonly called a loop because the block runs over and over again for as long as the test expression (condition) evaluates to True. A while statement can have an optional else clause.

We generally use this type of control structure (rather than the for loop) when we don’t know beforehand how many times the loop will iterate.

Read through the following example.

What is the expected output?

counter = 0

while counter < 3:
    print("Inside loop")
    counter = counter + 1
else:
    print("Inside else")

The code runs through the following steps:

  • Initialize the variable counter by assigning it the value zero.
  • Evaluate the condition counter < 3.
  • If the condition is True, execute the print statement and increment the counter by 1, then evaluate the condition again.
  • If the condition counter < 3 evaluates to False, go to the else clause, execute its print statement, and exit the loop.

The output of the code is below.

Inside loop
Inside loop
Inside loop
Inside else

A common use case for the while loop is collecting input from a user until they are done. The code below prompts the user for one line at a time.

print("Tell me about yourself. Enter one sentence at a time and press enter. When you are done type the word logout and press enter.")

while True:
    line = input('>')
    if line.startswith("#"):  # skip lines that begin with #; also safe when the line is empty
        continue
    if line == "logout":  # stop asking for input
        break
    print(line)

print("Thank you for sharing! Have a nice day.")

Try out this code. See if you can understand the role of the break and continue statements.

  • When the continue statement is executed, it ends the current iteration, jumps back to the while statement, and starts the next iteration. This occurs when the line the user types begins with #.
  • When the break statement is executed, the loop is exited. This occurs only when the user types logout.
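
The same control flow can be traced without typing anything by replacing input() with a list of prepared lines. This is a sketch under that assumption; the list typed_lines and the names position and echoed are hypothetical.

```python
# Stand-in for what a user might type, one entry per prompt.
typed_lines = [
    "I like data.",
    "# a private note",
    "I live in New York.",
    "logout",
    "never reached",
]

echoed = []
position = 0
while position < len(typed_lines):
    line = typed_lines[position]
    position = position + 1
    if line.startswith("#"):
        continue   # skip this entry and start the next iteration
    if line == "logout":
        break      # leave the loop entirely
    echoed.append(line)

print(echoed)  # ['I like data.', 'I live in New York.']
```

The entry that begins with # is skipped by continue, and nothing after logout is processed because break ends the loop.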

The for loop

The for statement is another looping statement. It iterates over a sequence of objects, that is, it goes through each item in a sequence (list, tuple, string) or other iterable object.

Let’s look at two examples:

Example 1

What is the output produced by this loop?

mytuple =(1, 5,9)

for i in mytuple:
    print(i)
else:
    print('The for loop is over')

Try running the code yourself to observe how the for loop functions.

The for loop presents a shortcut for writing the following code:

mytuple =(1, 5,9)
i=0
while  i < len(mytuple): 
    print (mytuple[i])
    i=i+1
else:
    print('The loop is over')

Essentially, a for loop is a condensed version of a while loop. It avoids explicitly initializing the index variable, incrementing the index variable, and the condition of len(mytuple) is implied.

Example 2

In this example, we find the sum of all the numbers stored in a list.


# List of numbers
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

# variable to store the sum (named total so that it does not hide the built-in sum function)
total = 0

# iterate over the list
for val in numbers:
    total = total + val

# Output: The sum is 48
print("The sum is", total)
The sum is 48
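
As a quick check, the loop result can be compared with Python's built-in sum() function. This is a small sketch; the names total and builtin_total are chosen for illustration.

```python
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

# Accumulate the values one at a time with a for loop.
total = 0
for val in numbers:
    total = total + val

# The built-in sum() adds the same values in one call.
builtin_total = sum(numbers)

print(total)          # 48
print(builtin_total)  # 48
```

If a variable named sum exists in your notebook, it hides the built-in function and the call to sum() fails. Running del sum restores the built-in.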

5.6 Summary

  • Conditional statements may use comparison and logical operators to evaluate the truth of an expression.
  • We can ask more complex questions of our data using the logical operators | for “or” and & for “and”.
  • Pandas isin() method is used to filter DataFrames.
  • for and while are looping statements that iterate over a sequence of objects or block of code, respectively.

Exercise 5.1

  1. Filter the mba DataFrame to show only those schools where the average salary is less than $100,000 and the tuition is greater than or equal to $100,000.

Exercise 5.2

  1. Create a new DataFrame called schoolOptions that pulls in the top 10 schools where average salary is greater than or equal to $100,000.

  2. Create a new column in the above DataFrame that calculates the monthly tuition. Call this column monthlyTuition. Sort the new DataFrame from smallest to largest (hint: use dataframe.sort_values(by=["columnName"])).

Assignment 5

  1. Load the Big Mac Index data for July 2019, published by The Economist on GitHub. It’s available at: http://becomingvisual.com/python4data/bigmac.csv

The value of a Big Mac in the United States is $1. Create a new DataFrame called inflatedBurger for all countries whose Big Mac price in dollars (dollar_price) was greater than purchasing power parity (dollar_ppp).

  2. Create a for loop that prints “Do not buy a Big Mac when you visit _________” for each country in your new table.

  3. Create a list of all country names where dollar price is greater than dollar purchasing power parity (dollar_ppp).

  4. Remove Euro Area from your list since it is not a country.

  5. Why was the index used not 6?