Lesson 5 Conditionals and Controls

Welcome to lesson 5. In this lesson, you will be introduced to comparison operators, logical operators, conditional statements, iteration, and looping.

Follow along with this tutorial by creating a new ipython notebook named lesson05.ipynb and entering the code snippets presented.

Outline

  • Comparison operators
  • Logical operators
  • Conditional statements
  • Conditionals and DataFrames
  • Control structures
  • Summary
  • Exercise 5.1
  • Exercise 5.2
  • Assignment 5

5.1 Comparison operators

Early in lesson 1, we learned about different types of mathematical operators including +, -, *, and /. Aside from these operators, there are comparison and logical operators, which evaluate conditions and return bool values. Boolean values are either True or False.

Let’s start with understanding comparison operations as presented in the table below:

Operation                 Operator  Example Input  Answer
Less than                 <         4 < 10         True
Less than or equal to     <=        4 <= 4         True
Greater than              >         11 > 12        False
Greater than or equal to  >=        4 >= 4         True
Equal to                  ==        3 == 2         False
Not equal to              !=        3 != 2         True

Less than

For numeric comparisons, the less-than operator < evaluates to see if the first value is less than the second value. If it is, the value of True is returned. If it is not, the value of False is returned.

What will be returned from the following statement?

33 <  90

If you answered True you are correct.

Less than or equal to

For numeric comparisons, the less than or equal to operator <= evaluates to see if the first value is less than or equal to the second value. If it is, the value of True is returned. If it is not, the value of False is returned. When using <= do not put a space between the < and the = (e.g. < =). This will produce a SyntaxError: invalid syntax.

33 <= 33
True

Greater than

For numeric comparisons, the greater than operator > evaluates to see if the first value is greater than the second value. If it is, the value of True is returned. If it is not, the value of False is returned.

33 > 90
False

Greater than or equal to

For numeric comparisons, the greater than or equal to operator >= evaluates to see if the first value is greater or equal to the second value. If it is, the value of True is returned. If it is not, the value of False is returned. When using >= do not put a space between the > and the = (e.g. > =). This will produce a SyntaxError: invalid syntax.

33 >= 33
True

Equal to

For numeric comparisons, the equality operator == evaluates to see if the first value is equal to the second value. If it is, the value of True is returned. If it is not, the value of False is returned. When using == do not put a space between the first = and the second = (e.g. = =). This will produce a SyntaxError: invalid syntax.

33 == 33
True

Note that the equal sign and the double equal sign have VERY different meanings in Python. The = denotes assignment: in x = 4, the value 4 is assigned to the variable x. The == denotes equality and compares two values to see if they are equal.
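The difference between = and == can be seen in a short sketch. The names is_four and is_five are illustrative only and do not appear elsewhere in the lesson.

```python
# Assignment: bind the name x to the value 4 (nothing is compared).
x = 4

# Equality: compare two values; the result is a bool.
is_four = (x == 4)
is_five = (x == 5)

print(is_four)        # True
print(is_five)        # False
print(type(is_four))  # <class 'bool'>
```

Note that x still holds 4 after both comparisons, because == never changes the values it compares.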

Not equal to

For numeric comparisons, the inequality operator != evaluates to see if the first value is not equal to the second value. If it is, the value of True is returned. If it is not, the value of False is returned. When using != do not put a space between the ! and the = (e.g. ! =). This will produce a SyntaxError: invalid syntax.

33 != 33
False
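As a recap, the six comparison operators can be tried side by side. This is a minimal sketch; the variables a and b and the result names are arbitrary choices, not part of the lesson's data.

```python
a = 33
b = 90

less = a < b               # True
less_or_equal = a <= b     # True
greater = a > b            # False
greater_or_equal = a >= b  # False
equal = a == b             # False
not_equal = a != b         # True

print(less, less_or_equal, greater, greater_or_equal, equal, not_equal)
```

Try changing b to 33 and predicting each result before running the cell again.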

5.2 Logical operators

In addition to the six comparison operators (>, >=, <, <=, ==, and !=), there are three logical operators. These are explained in the table below.

Operation  Operator  Example Input      Answer
Or         or        (3==3) or (4==7)   True
And        and       (3==3) and (4==7)  False
Not        not       not(3==3)          False

Or

The or operator returns True if at least one of the expressions it joins is true. For example, we can evaluate the two expressions in parentheses as shown below. Because the first of them is true, the value of True is returned.

(5**2 == 25) or (5 != 5)
True

However, if neither of the expressions evaluates to True the value of False is returned.

And

By contrast, if we replace or with and in the expression above, the result is False. This is because both expressions must be true for the and operator to return True.

(5**2 == 25) and (5 != 5)
False

Not

The not operator reverses the result of the statements. It returns False if the result is true and True if the result is false.

not(5==5) 
False
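Logical operators are most useful when they join comparisons on named values. The sketch below assumes a single hypothetical school; the variable names and the two dollar amounts are made up for illustration.

```python
avg_salary = 130000  # hypothetical average starting salary
tuition = 95000      # hypothetical total tuition

high_salary = avg_salary >= 125000  # True
low_tuition = tuition <= 100000     # True

both = high_salary and low_tuition   # True only if both are True
either = high_salary or low_tuition  # True if at least one is True
not_high = not high_salary           # reverses True to False

print(both, either, not_high)  # True True False
```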

Important note of usage: When working with DataFrames and Series, you will have to use the &, |, and ~ operators instead of and, or, and not, respectively.
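To preview how the element-wise operators behave, here is a minimal sketch on a small pandas Series. The four rank values are hypothetical and are not taken from the mba data set.

```python
import pandas as pd

ranks = pd.Series([1, 5, 12, 20])  # hypothetical ranks

top_ten = ranks <= 10     # True, True, False, False
above_three = ranks > 3   # False, True, True, True

both = top_ten & above_three    # element-wise "and"
either = top_ten | above_three  # element-wise "or"
negated = ~top_ten              # element-wise "not"

print(both.tolist())     # [False, True, False, False]
print(either.tolist())   # [True, True, True, True]
print(negated.tolist())  # [False, False, True, True]
```

Each operator compares the two Series position by position and returns a new Boolean Series of the same length.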

5.3 Conditional statements

Conditional statements let a program make decisions by evaluating a condition to see if it is True. Often we want to make a decision based on one or more conditions. This can be achieved by using comparison and/or logical operators together with a simple if statement.

if statements

Here’s an example that uses an if statement to ask whether 1 < 2 and 4 > 2.

if 1 < 2 and 4 > 2:
    print("The first condition is met")  # if the condition is true, this line prints; otherwise nothing is printed
The first condition is met

The structure for an if statement is very particular. The first line of the if is left aligned and communicates the condition. The condition must return a True or False value. The condition must be followed by a colon, :.

Next, if the condition evaluates to True, then the indented lines below the if statement will be executed.

See the prototype (or pseudo code) below:

if CONDITION :
    # do something such as execute a print statement
    print("The first condition is met")

Indent every statement that belongs to the if block (four spaces is the usual convention). Otherwise, an IndentationError will be thrown.

For example, the code below will return this error: IndentationError: expected an indented block

if 1 < 2 and 4 > 2:
print("The first condition met")

Multiple if statements

You can ask multiple questions in a single code chunk. Each if statement is evaluated separately and is unrelated to the others. This means that whether or not the first condition is met, the second condition is still evaluated, and so on.

if 1 < 2 and 4 > 2:
    print("The first condition met")
The first condition met
if 1 > 2 and 4 < 10:
    print("The second unrelated condition is met.")

if 4 < 10 or 1 < 2:
    print("The third unrelated condition is met.")
The third unrelated condition is met.

if / elif statements

To make one condition depend on another, for example the second on the first and the third on the second, we use the if and elif statements together. elif is short for else if, and the first elif in a block of code must be preceded by an if statement. There can be multiple elif statements.

Examine the code below. Take note of how the colon is used at the end of the if statement and each elif statement.

Also, note the indentation of the statement under the if, each elif, and the else. The indentation rules are important to follow.

if 1 < 2 and 4 > 2:
    print("The first condition is met") #printed if condition is met
elif 1 > 2 and 4 < 10:
    print("The second condition is met") #printed if condition is met but not the first
elif 4 < 10 or 1 < 2:
    print("The third condition is met") #printed if condition is met but not the second or first
else: 
    print("No conditions were met.") #printed only if none of the 3 conditions were met. 
The first condition is met

This example illustrates how you can test one condition and, if that is not met, test a second condition, and so on. Only the first branch whose condition is True is run; here the first condition is met, so the remaining elif and else branches are skipped. The else: block is run only when every preceding condition evaluates to False.
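The "first match wins" behavior is easier to see when several conditions are true at once. In the sketch below, the rank value and the tier labels are hypothetical.

```python
rank = 8  # hypothetical school rank

if rank <= 5:
    label = "top 5"
elif rank <= 10:
    label = "top 10"   # first condition that is True, so this branch runs
elif rank <= 25:
    label = "top 25"   # also True for rank 8, but never reached
else:
    label = "outside the top 25"

print(label)  # top 10
```

Although 8 <= 25 is also true, Python stops checking as soon as one branch has run.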

5.4 Conditionals and DataFrames

Conditional statements are very useful when working with DataFrames.

Let’s do a quick review of how to import data and select columns and rows with the mba data set.

First, we import the pandas library as pd. Then we import the data by passing the full URL of the .csv file to the read_csv function.

#first import the data
import pandas as pd
mydata = pd.read_csv("http://becomingvisual.com/python4data/mba_2021.csv", header=0) 

Then, preview the data.

mydata.head(5)
   Rank                         School  ... Total Tuition ($)  Duration (Months)
0     1                         INSEAD  ...             83832                 12
1     2                            LBS  ...            119359                 21
2     3  University of Chicago (Booth)  ...            111855                 21
3     4                           IESE  ...            107049                 21
4     5                           Yale  ...            104752                 21

[5 rows x 11 columns]

Next, to select columns we can use a simple syntax of DataFrameName.ColumnName.

For example,

mydata.Rank
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
12    13
13    14
14    15
15    16
16    17
17    18
18    19
19    20
20    21
21    22
22    23
23    24
24    25
Name: Rank, dtype: int64

This is a handy shortcut. The alternative is to use the indexing operator as shown below.

mydata["School"]
0                                               INSEAD
1                                                  LBS
2                        University of Chicago (Booth)
3                                                 IESE
4                                                 Yale
5                               Northwestern (Kellogg)
6                                                Ceibs
7                                            HEC Paris
8                              Duke University (Fuqua)
9                             Dartmouth College (Tuck)
10                     University of Virginia (Darden)
11                    SDA Bocconi School of Management
12                         New York University (Stern)
13    National University of Singapore Business School
14                                   Cornell (Johnson)
15                                   Cambridge (Judge)
16                              Georgetown (McDonough)
17                                       Oxford (Said)
18                                                 IMD
19                                               Esade
20                                     Michigan (Ross)
21                                               HKUST
22                           Indian School of Business
23                                      USC (Marshall)
24                                        WASHU (Olin)
Name: School, dtype: object

Now that we’ve reviewed how to select columns and rows in a pandas DataFrame, let’s select some data based on a condition.

Specifically, what if we want to filter our mba DataFrame to show only schools ranked in the top 10, that is, with a rank less than or equal to 10?

To do that, we take a column from the DataFrame and apply a Boolean condition to it. Here’s an example of a Boolean condition:

#Boolean condition
condition01 = mydata.Rank <= 10
condition01
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
Name: Rank, dtype: bool

A Series of Boolean values is returned. Rows with the value True meet the condition. You can see that a Series is returned by using the type() function.

type(condition01)
<class 'pandas.core.series.Series'>

Let’s write another condition to see which schools have an average starting salary of greater than or equal to $125,000.

#Boolean condition
condition02 = mydata.AvgSalary >=125000
condition02
0     True
1     True
2     True
3     True
4     True
5     True
6     True
7     True
8     True
9     True
10    True
11    True
12    True
13    True
14    True
15    True
16    True
17    True
18    True
19    True
20    True
21    True
22    True
23    True
24    True
Name: AvgSalary, dtype: bool
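One quick use of a Boolean Series like these is counting how many rows meet a condition, since True counts as 1 and False as 0. The sketch below uses five hypothetical rank values rather than the mba data.

```python
import pandas as pd

ranks = pd.Series([3, 8, 10, 11, 25])  # hypothetical ranks

in_top_ten = ranks <= 10

count = in_top_ten.sum()   # number of True values
share = in_top_ten.mean()  # fraction of True values

print(count)  # 3
print(share)  # 0.6
```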

These data are informative, but probably not that useful. What if you wanted to return the data for just those schools that met your conditions?

This is where we place the Boolean condition inside the square brackets of the indexing operator to return the result set. Let’s try doing this with condition02.

#Boolean condition that returns the result set
condition02 = mydata[mydata.AvgSalary >=125000]
condition02
    Rank  ... Duration (Months)
0      1  ...                12
1      2  ...                21
2      3  ...                21
3      4  ...                21
4      5  ...                21
5      6  ...                22
6      7  ...                18
7      8  ...                16
8      9  ...                21
9     10  ...                24
10    11  ...                21
11    12  ...                18
12    13  ...                22
13    14  ...                17
14    15  ...                21
15    16  ...                12
16    17  ...                21
17    18  ...                22
18    19  ...                12
19    20  ...                18
20    21  ...                22
21    22  ...                16
22    23  ...                18
23    24  ...                20
24    25  ...                24

[25 rows x 11 columns]

This returns a DataFrame object. You can see this by using the type() function.

type(condition02)
<class 'pandas.core.frame.DataFrame'>

The alternative syntax is:

condition02 = mydata[mydata['AvgSalary'] >=125000]
type(condition02)
<class 'pandas.core.frame.DataFrame'>

We can ask more complex questions of our data using logical operators | for “or” and & for “and”.

Let’s filter the DataFrame to show only those schools where the average salary is greater than or equal to $125,000 and the tuition is less than or equal to $100,000.

#Boolean condition that returns the result set
condition03 = mydata[(mydata['AvgSalary'] >=125000) & (mydata['Total Tuition ($)'] <= 100000)]
condition03 #type DataFrame
    Rank  ... Duration (Months)
0      1  ...                12
8      9  ...                21
10    11  ...                21
11    12  ...                18
13    14  ...                17
15    16  ...                12
16    17  ...                21
17    18  ...                22
19    20  ...                18
20    21  ...                22
22    23  ...                18
24    25  ...                24

[12 rows x 11 columns]

Take note of the use of square brackets and parentheses. Each comparison must be wrapped in parentheses because & and | bind more tightly than comparison operators such as >= and <=; without the parentheses, Python would group the expression incorrectly.

I chose the index operator [ ] over the . to access the columns here because a column name that contains spaces, such as Total Tuition ($), cannot be accessed with dot notation. This is why single-word column names make coding more efficient.

Also, take note of the use of the & operator. If and were used, a ValueError would be thrown. This is because the logical operators (and, or, and not) do not operate element by element on a pandas Series or DataFrame. Use the element-wise (bitwise) operators &, |, and ~ when working with DataFrames and Series.
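The ValueError is easy to reproduce on a small example. The sketch below builds a hypothetical three-row DataFrame; the column names mirror the mba data, but the schools and values are made up.

```python
import pandas as pd

# Hypothetical data for illustration only.
schools = pd.DataFrame({
    "School": ["A", "B", "C"],
    "AvgSalary": [130000, 120000, 140000],
    "Total Tuition ($)": [90000, 80000, 110000],
})

high_salary = schools["AvgSalary"] >= 125000
low_tuition = schools["Total Tuition ($)"] <= 100000

message = None
try:
    schools[high_salary and low_tuition]  # "and" needs a single True/False
except ValueError as error:
    message = str(error)
    print("ValueError:", message)

filtered = schools[high_salary & low_tuition]  # element-wise "and"
not_high = schools[~high_salary]               # element-wise "not"

print(filtered["School"].tolist())  # ['A']
print(not_high["School"].tolist())  # ['B']
```

The and operator asks pandas to reduce a whole Series to one True or False value, which is ambiguous, so pandas raises the error instead of guessing.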

The pandas isin() method

The pandas isin() method is useful for filtering DataFrames. Called on a column, it marks each row as True when the value is present in a given list of values; used inside the indexing operator, it returns the matching subset of rows. In the example below, we check whether the Country field is either US or France.

#Boolean condition that returns the result set
condition04 = mydata[mydata.Country.isin(['US', 'France'])]
condition04 #type DataFrame
    Rank                           School  ... Total Tuition ($)  Duration (Months)
2      3    University of Chicago (Booth)  ...            111855                 21
4      5                             Yale  ...            104752                 21
5      6           Northwestern (Kellogg)  ...            111658                 22
7      8                        HEC Paris  ...            116713                 16
8      9          Duke University (Fuqua)  ...             97554                 21
9     10         Dartmouth College (Tuck)  ...            118047                 24
10    11  University of Virginia (Darden)  ...             99856                 21
12    13      New York University (Stern)  ...            121541                 22
14    15                Cornell (Johnson)  ...            101685                 21
16    17           Georgetown (McDonough)  ...             95063                 21
20    21                  Michigan (Ross)  ...             97566                 22
23    24                   USC (Marshall)  ...            102964                 20
24    25                     WASHU (Olin)  ...             69461                 24

[13 rows x 11 columns]
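Because isin() produces a Boolean Series, it can be combined with the ~ operator to select the rows that are not in the list. A minimal sketch on a hypothetical four-row DataFrame (the school names and countries are placeholders):

```python
import pandas as pd

# Hypothetical data for illustration only.
schools = pd.DataFrame({
    "School": ["School A", "School B", "School C", "School D"],
    "Country": ["US", "France", "Spain", "UK"],
})

in_us_or_france = schools["Country"].isin(["US", "France"])

selected = schools[in_us_or_france]  # rows where Country is US or France
others = schools[~in_us_or_france]   # every other row

print(selected["School"].tolist())  # ['School A', 'School B']
print(others["School"].tolist())    # ['School C', 'School D']
```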

5.5 Control structures

The while loop

The while statement allows you to repeatedly execute a block of statements as long as a condition is True. Commonly, the while control structure is called a loop because it does something over and over again for as long as the given condition evaluates to True. A while statement can have an optional else clause, which runs once when the condition evaluates to False (it is skipped if the loop is exited with break).

We generally use this type of control structure (rather than the for loop) when we don’t know beforehand how many times the loop will iterate.

Read through the following example.

What is the expected output?

counter = 0

while counter < 3:
    print("Inside loop")
    counter = counter + 1
else:
    print("Inside else")

The code proceeds as follows:

  • Initialize the variable counter by assigning it the value zero.
  • Evaluate the condition counter < 3.
  • If the condition is True, execute the print statement, increment the counter by 1, and evaluate the condition again.
  • If the condition counter < 3 evaluates to False, go to the else clause and execute its print statement. Exit the loop.

The output of the code is below.

Inside loop
Inside loop
Inside loop
Inside else
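The earlier point about not knowing the number of iterations beforehand can be sketched with a savings example. The starting balance, interest rate, and target below are hypothetical values chosen for illustration.

```python
balance = 1000.0  # hypothetical starting amount
rate = 0.05       # hypothetical yearly interest rate
target = 2000.0   # stop once the balance has at least doubled

years = 0
while balance < target:
    balance = balance * (1 + rate)  # add one year of interest
    years = years + 1

print(years)  # 15
```

We could not easily say in advance that the loop body would run 15 times; the condition balance < target decides when to stop.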

A common use case for the while loop is the circumstance where you want to collect input from a user until they are done. The code below asks the user a question.

print("Tell me about yourself. Enter one sentence at a time and press enter. When you are done type the word logout and press enter.")

while True:
    line = input('>')
    if line.startswith("#"):  # skip lines that begin with #; also safe when the line is empty
        continue
    if line == "logout":
        break
    print(line)

print("Thank you for sharing! Have a nice day.")  

Try out this code. See if you can understand the role of the break and continue statements.

  • When the continue statement is executed, it ends the current iteration, jumps back to the while statement, and starts the next iteration. This occurs when the user types a line that begins with #.
  • When the break statement is executed, the loop is exited. This occurs only when the user types logout.
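To trace break and continue without typing at a prompt, the same logic can be run against a fixed list of entries. This is a sketch only: the list entries, and the names kept and position, are hypothetical stand-ins for what a user might type.

```python
# Hypothetical entries standing in for what a user might type.
entries = ["I like data.", "# a note to skip", "I study Python.", "logout", "never reached"]

kept = []
position = 0

while True:
    line = entries[position]
    position = position + 1
    if line.startswith("#"):
        continue  # skip this entry and go to the next iteration
    if line == "logout":
        break     # leave the loop; later entries are never read
    kept.append(line)
    print(line)

print(kept)  # ['I like data.', 'I study Python.']
```

The entry that starts with # is skipped by continue, and the entry after logout is never read because break has already ended the loop.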

The for loop

The for statement is another looping statement. It iterates over a sequence of objects, that is, it goes through each item in a sequence (list, tuple, string) or other iterable object.

Let’s look at two examples:

Example 1

What is the output produced by this loop?

mytuple = (1, 5, 9)

for i in mytuple:
    print(i)
else:
    print('The for loop is over')

Try running the code yourself to observe how the for loop functions.

The for loop is a shortcut for writing the following code:

mytuple = (1, 5, 9)
i = 0
while i < len(mytuple):
    print(mytuple[i])
    i = i + 1
else:
    print('The loop is over')

Essentially, a for loop is a condensed version of a while loop. It avoids explicitly initializing and incrementing the index variable, and the stopping condition i < len(mytuple) is implied.
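When the position of each item is also needed, Python's built-in enumerate() provides it without managing an index variable by hand. A minimal sketch, reusing the tuple from Example 1; the names index, value, and pairs are illustrative.

```python
mytuple = (1, 5, 9)

pairs = []
for index, value in enumerate(mytuple):
    print(index, value)
    pairs.append((index, value))

# Printed output:
# 0 1
# 1 5
# 2 9
```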

Example 2

In this example, we find the sum of all numbers stored in a list.

# List of numbers
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

# variable to store the running total (named total so the built-in sum() is not overwritten)
total = 0

# iterate over the list
for val in numbers:
    total = total + val

# Output: The sum is 48
print("The sum is", total)
The sum is 48
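Loops and conditional statements are often combined. As a sketch, the loop below adds up only the even numbers in the same list. The name even_total is illustrative, and the % operator gives the remainder of a division.

```python
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

even_total = 0

for val in numbers:
    if val % 2 == 0:  # remainder 0 means the value is even
        even_total = even_total + val

print("The sum of the even numbers is", even_total)  # 24
```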

5.6 Summary

  • Conditional statements may use comparison and logical operators to evaluate the truth of an expression.
  • We can ask more complex questions of our data using logical operators | for “or” and & for “and”.
  • Pandas isin() method is used to filter DataFrames.
  • for and while are looping statements that iterate over a sequence of objects or block of code, respectively.

Exercise 5.1

  1. Filter the mba DataFrame to show only those schools where the average salary is less than $100,000 and the tuition is greater than or equal to $100,000. If no schools match, ensure the output returned is interpretable.

Exercise 5.2

  1. Create a new DataFrame called schoolOptions that pulls in the top 10 schools where the average salary is greater than or equal to $100,000.

  2. Create a new column in the above DataFrame that calculates the monthly tuition. Call this column monthlyTuition. Then sort the new DataFrame from smallest to largest (hint: use dataframe.sort_values(by=['columnName'])).

  3. OPTIONAL. Create a new column called AvgsalFormatted and format all values with dollar signs and thousands-separators.

Assignment 5

  1. Load the Big Mac Index data from The Economist (available on GitHub) for July 2019. It’s available at: http://becomingvisual.com/python4data/bigmac.csv

The value of a Big Mac in the United States is $1. Create a new DataFrame called inflatedBurger for all countries whose Big Mac price in dollars (dollar_price) was greater than purchasing power parity (dollar_ppp).

  2. Create a for loop that prints “Do not buy a Big Mac when you visit _________” for each country in your new table.

  3. Create a list of all country names where dollar price is greater than dollar purchasing power parity (dollar_ppp).

  4. Remove Euro Area from your list since it is not a country.