Lesson 5 Conditionals and Controls
Welcome to lesson 5. In this lesson, you will be introduced to comparison operators, logical operators, conditional statements, iteration, and looping.
Follow along with this tutorial by creating a new ipython notebook named lesson05.ipynb and entering the code snippets presented.
Outline
- Comparison operators
- Logical operators
- Conditional statements
- Conditionals and DataFrames
- Control structures
- Summary
- Exercise 5.1
- Exercise 5.2
- Assignment 5
5.1 Comparison operators
Early in lesson 1, we learned about different types of mathematical operators, including +, -, *, and /. Aside from these operators, there are comparison and logical operators, which evaluate conditions and return bool values. Boolean values are either True or False.
Let’s start with understanding comparison operations as presented in the table below:
Operation | Operator | Example Input | Answer |
---|---|---|---|
Less than | < | 4 < 10 | True |
Less than or equal to | <= | 4 <= 4 | True |
Greater than | > | 11 > 12 | False |
Greater than or equal to | >= | 4 >= 4 | True |
Equal to | == | 3 == 2 | False |
Not equal to | != | 3 != 2 | True |
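The examples in the table can be checked directly in a notebook cell. A minimal sketch that evaluates each one and prints the result (the dictionary name `examples` is only for illustration):

```python
# Each comparison evaluates to a bool (True or False).
examples = {
    "4 < 10": 4 < 10,
    "4 <= 4": 4 <= 4,
    "11 > 12": 11 > 12,
    "4 >= 4": 4 >= 4,
    "3 == 2": 3 == 2,
    "3 != 2": 3 != 2,
}

for expression, result in examples.items():
    print(expression, "->", result)
```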
Less than
For numeric comparisons, the less-than operator < evaluates whether the first value is less than the second value. If it is, the value True is returned. If it is not, the value False is returned.
What will be returned from the following statement?
If you answered True, you are correct.
Less than or equal to
For numeric comparisons, the less than or equal to operator <= evaluates whether the first value is less than or equal to the second value. If it is, the value True is returned. If it is not, the value False is returned. When using <=, do not put a space between the < and the = (e.g. < =). This will produce a SyntaxError: invalid syntax.
True
Greater than
For numeric comparisons, the greater than operator > evaluates whether the first value is greater than the second value. If it is, the value True is returned. If it is not, the value False is returned.
False
Greater than or equal to
For numeric comparisons, the greater than or equal to operator >= evaluates whether the first value is greater than or equal to the second value. If it is, the value True is returned. If it is not, the value False is returned. When using >=, do not put a space between the > and the = (e.g. > =). This will produce a SyntaxError: invalid syntax.
True
Equal to
For numeric comparisons, the equality operator == evaluates whether the first value is equal to the second value. If it is, the value True is returned. If it is not, the value False is returned. When using ==, do not put a space between the first = and the second = (e.g. = =). This will produce a SyntaxError: invalid syntax.
True
Note that the equal sign and the double equal sign have very different meanings in Python. The = denotes assignment, as in x = 4, which assigns the value 4 to the variable x. The == denotes equality and compares two values to see if they are equal.
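A short sketch of the difference between the two, using the variable x from the sentence above:

```python
x = 4          # assignment: the name x now refers to the value 4
print(x == 4)  # equality test: compares x with 4 and prints True
print(x == 5)  # prints False; x itself is unchanged
print(x)       # prints 4
```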
Not equal to
For numeric comparisons, the inequality operator != evaluates whether the first value is not equal to the second value. If the values differ, the value True is returned. If they are equal, the value False is returned. When using !=, do not put a space between the ! and the = (e.g. ! =). This will produce a SyntaxError: invalid syntax.
False
5.2 Logical operators
In addition to the six comparison operators (>, >=, <, <=, ==, and !=), there are three logical operators. These are explained in the table below.
Operation | Operator | Example Input | Answer |
---|---|---|---|
Or | or | (3==3) or (4==7) | True |
And | and | (3==3) and (4==7) | False |
Not | not | not(3==3) | False |
Or
The or operator returns True if at least one of the expressions is true. For example, we can evaluate the result of the two expressions in parentheses as shown below. If at least one of them is true, then the value True is returned.
True
However, if neither of the expressions evaluates to True, the value False is returned.
And
In contrast, if we altered the expressions above and replaced the or with and, the result would evaluate to False. This is because both expressions need to be true when the and operator is used.
False
Not
The not operator reverses the result of the expression. It returns False if the result is true and True if the result is false.
False
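The three operators can be tried together on the expressions from the table. A minimal sketch, where the names a and b are only for illustration:

```python
a = (3 == 3)   # True
b = (4 == 7)   # False

print(a or b)    # True: at least one operand is True
print(a and b)   # False: both operands must be True
print(not a)     # False: not reverses True
print(b or b)    # False: neither operand is True
```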
Important usage note: When working with DataFrames and Series, you will have to use the &, |, and ~ operators instead of and, or, and not, respectively.
5.3 Conditional statements
Conditional statements are a way to make decisions by evaluating a condition to see if it is True. Often, we want to make a decision based on one or more conditions. This can be achieved with comparison and/or logical operators together with a simple if statement.
if statements
Here’s an example that uses a conditional if statement to ask whether 1 < 2 and 4 > 2.
if 1 < 2 and 4 > 2:
    print("The first condition is met")  # if the condition is true this line prints out, otherwise nothing is printed
The first condition is met
The structure for an if statement is very particular. The first line of the if statement is left aligned and states the condition. The condition must evaluate to True or False, and it must be followed by a colon (:).
Next, if the condition evaluates to True, the indented lines below the if statement are executed.
See the prototype (or pseudo code) below:
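As a rough sketch of that shape, here is a small runnable version. The variable temperature and its value are hypothetical, chosen only to illustrate the layout:

```python
temperature = 75  # hypothetical value, for illustration only

if temperature > 70:              # condition, followed by a colon
    print("It is warm.")          # indented: runs only when the condition is True
    print("No jacket needed.")    # same indentation: part of the same block
print("Done.")                    # not indented: always runs
```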
Place each statement of the body on its own indented line. If the body is not indented, an IndentationError will be raised.
For example, the code below will return this error: IndentationError: expected an indented block
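One way to see this error without stopping the notebook is to hand the badly indented code to Python's built-in compile() function as a string and catch the exception. This is only a sketch; the names bad_code and error_name are illustrative, and the exact wording of the message varies between Python versions:

```python
# The body of the if statement is deliberately NOT indented.
bad_code = 'if 1 < 2:\nprint("The first condition is met")\n'

try:
    compile(bad_code, "<example>", "exec")
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__
    print(error_name + ":", err.msg)
```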
Multiple if statements
You can ask multiple questions in a single code chunk. Each if statement is evaluated separately and independently of the others. This means that whether or not the first condition is met, the second condition is still evaluated, and so on.
The first condition met
if 1 > 2 and 4 < 10:
    print("The second unrelated condition is met.")
if 4 < 10 or 1 < 2:
    print("The third unrelated condition is met.")
The third unrelated condition is met.
if / elif statements
To make each condition depend on the ones before it, so that the second condition is checked only if the first is not met, and the third only if the second is not met, we use the if and elif statements together. elif is short for else if, and the first elif in a block of code must always be preceded by an if statement. There can be multiple elif statements.
Examine the code below. Take note of how the colon is used at the end of the if statement and each elif statement.
Also, note the indentation of the statement under the if and under each elif. The indentation rules are important to follow.
if 1 < 2 and 4 > 2:
    print("The first condition is met")  # printed if condition is met
elif 1 > 2 and 4 < 10:
    print("The second condition is met")  # printed if condition is met but not the first
elif 4 < 10 or 1 < 2:
    print("The third condition is met")  # printed if condition is met but not the second or first
else:
    print("No conditions were met.")  # printed only if none of the 3 conditions were met.
The first condition is met
This example illustrates how you can evaluate one condition and, if that is not met, evaluate a second condition, and so on. The else: block runs only when every condition above it, through the last condition elif 4 < 10 or 1 < 2:, evaluates to False.
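The same pattern is often wrapped in a function so it can be reused. A minimal sketch, assuming a hypothetical function tuition_band with arbitrary thresholds chosen only for illustration:

```python
def tuition_band(tuition):
    """Return a label for a tuition amount (thresholds are illustrative)."""
    if tuition < 90000:
        return "lower"
    elif tuition < 110000:   # checked only if the first condition was False
        return "middle"
    else:                    # runs only if every condition above was False
        return "higher"

for amount in (83832, 104752, 119359):
    print(amount, tuition_band(amount))
```

Only the first branch whose condition is True runs, so the order of the conditions matters.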
5.4 Conditionals and DataFrames
Conditional statements are very useful when working with DataFrames.
Let’s do a quick review of how to import data and select columns and rows with the mba data set.
First, we import the pandas library as pd. Then we import the data by passing the full URL of the .csv file to the read_csv function.
#first import the data
import pandas as pd
mydata = pd.read_csv("http://becomingvisual.com/python4data/mba_2021.csv", header=0)
Then, preview the data.
Rank School ... Total Tuition ($) Duration (Months)
0 1 INSEAD ... 83832 12
1 2 LBS ... 119359 21
2 3 University of Chicago (Booth) ... 111855 21
3 4 IESE ... 107049 21
4 5 Yale ... 104752 21
[5 rows x 11 columns]
Next, to select columns we can use a simple syntax of DataFrameName.ColumnName.
For example,
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 13
13 14
14 15
15 16
16 17
17 18
18 19
19 20
20 21
21 22
22 23
23 24
24 25
Name: Rank, dtype: int64
This is a handy shortcut. The alternative is to use the indexing operator as shown below.
0 INSEAD
1 LBS
2 University of Chicago (Booth)
3 IESE
4 Yale
5 Northwestern (Kellogg)
6 Ceibs
7 HEC Paris
8 Duke University (Fuqua)
9 Dartmouth College (Tuck)
10 University of Virginia (Darden)
11 SDA Bocconi School of Management
12 New York University (Stern)
13 National University of Singapore Business School
14 Cornell (Johnson)
15 Cambridge (Judge)
16 Georgetown (McDonough)
17 Oxford (Said)
18 IMD
19 Esade
20 Michigan (Ross)
21 HKUST
22 Indian School of Business
23 USC (Marshall)
24 WASHU (Olin)
Name: School, dtype: object
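Both ways of selecting a column can be tried without downloading the file. The sketch below builds a small stand-in DataFrame (named sample here, an assumption for illustration) from the first three rows of the preview above:

```python
import pandas as pd

# Small stand-in for the mba data, built from the first three rows of the preview.
sample = pd.DataFrame({
    "Rank": [1, 2, 3],
    "School": ["INSEAD", "LBS", "University of Chicago (Booth)"],
    "Total Tuition ($)": [83832, 119359, 111855],
})

print(sample.Rank)                    # dot syntax works for simple column names
print(sample["School"])               # indexing operator works for any column name
print(sample["Total Tuition ($)"])    # required here: the name contains spaces
```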
Now that we’ve reviewed how to select columns and rows in a pandas DataFrame, let’s select some data based on a condition.
Specifically, what if we want to filter our mba DataFrame to show only schools ranked in the top 10, that is, with a rank less than or equal to 10?
To do that, we take a column from the DataFrame and apply a Boolean condition to it. Here’s an example of a Boolean condition:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
Name: Rank, dtype: bool
A Series of Boolean values is returned. Rows with the value True meet the condition. You can confirm that a Series is returned by using the type() function.
<class 'pandas.core.series.Series'>
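A sketch of such a condition on a small stand-in DataFrame. The names sample and condition01 are assumptions for illustration; the three schools and their ranks are taken from the listings above:

```python
import pandas as pd

sample = pd.DataFrame({
    "Rank": [1, 10, 11],
    "School": ["INSEAD", "Dartmouth College (Tuck)", "University of Virginia (Darden)"],
})

condition01 = sample.Rank <= 10   # one True/False value per row
print(condition01)
print(type(condition01))          # a pandas Series
```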
Let’s write another condition to see which schools have an average starting salary of greater than or equal to $125,000.
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
13 True
14 True
15 True
16 True
17 True
18 True
19 True
20 True
21 True
22 True
23 True
24 True
Name: AvgSalary, dtype: bool
These data are informative, but probably not that useful. What if you wanted to return the data for just those schools that met your conditions?
This is where we would pass the Boolean condition inside the indexing operator (square brackets) to return the result set. Let’s try doing this with condition02.
#Boolean condition that returns the result set
condition02 = mydata[mydata.AvgSalary >=125000]
condition02
Rank ... Duration (Months)
0 1 ... 12
1 2 ... 21
2 3 ... 21
3 4 ... 21
4 5 ... 21
5 6 ... 22
6 7 ... 18
7 8 ... 16
8 9 ... 21
9 10 ... 24
10 11 ... 21
11 12 ... 18
12 13 ... 22
13 14 ... 17
14 15 ... 21
15 16 ... 12
16 17 ... 21
17 18 ... 22
18 19 ... 12
19 20 ... 18
20 21 ... 22
21 22 ... 16
22 23 ... 18
23 24 ... 20
24 25 ... 24
[25 rows x 11 columns]
This returns a DataFrame object. You can see this by using the type() function.
<class 'pandas.core.frame.DataFrame'>
The alternative syntax is:
<class 'pandas.core.frame.DataFrame'>
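The text does not show which alternative it has in mind. Two common equivalents are selecting the column with the indexing operator instead of the dot, and using .loc. The sketch below compares them on a stand-in DataFrame whose school names and salary figures are hypothetical:

```python
import pandas as pd

# Hypothetical salary figures, for illustration only.
sample = pd.DataFrame({
    "School": ["A", "B", "C"],
    "AvgSalary": [150000, 120000, 130000],
})

by_dot = sample[sample.AvgSalary >= 125000]          # dot syntax for the column
by_bracket = sample[sample["AvgSalary"] >= 125000]   # indexing operator for the column
by_loc = sample.loc[sample["AvgSalary"] >= 125000]   # explicit label-based selection

print(by_loc)
print(type(by_loc))
```

All three return the same DataFrame of matching rows.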
We can ask more complex questions of our data using logical operators | for “or” and & for “and”.
Let’s filter the DataFrame to show only those schools where the Average salary is greater than or equal to $125,000 and the tuition is less than or equal to $100,000.
#Boolean condition that returns the result set
condition03 = mydata[(mydata['AvgSalary'] >=125000) & (mydata['Total Tuition ($)'] <= 100000)]
condition03 #type DataFrame
Rank ... Duration (Months)
0 1 ... 12
8 9 ... 21
10 11 ... 21
11 12 ... 18
13 14 ... 17
15 16 ... 12
16 17 ... 21
17 18 ... 22
19 20 ... 18
20 21 ... 22
22 23 ... 18
24 25 ... 24
[12 rows x 11 columns]
Take note of the use of square brackets and parentheses. We need to group each comparison in parentheses, because & binds more tightly than the comparison operators and Python would otherwise evaluate the conditional in the wrong order.
I chose to use the indexing operator [ ] over the dot (.) to access the columns. This is because the names have spaces, which cause an error with dot syntax. This is why single-word names for variables make coding more efficient.
Also, take note of the use of the & operator. If and were used, a ValueError would be raised. This is because the logical operators (and, or, and not) do not operate element by element on pandas Series and DataFrames. Use the bitwise operators (&, |, and ~) when working with DataFrames and Series.
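A sketch of the three element-wise operators, and of the ValueError that and produces, on a stand-in DataFrame. The figures and the names high_salary and low_tuition are hypothetical:

```python
import pandas as pd

# Hypothetical figures, for illustration only.
sample = pd.DataFrame({
    "AvgSalary": [150000, 120000, 130000],
    "Total Tuition ($)": [83832, 95000, 119359],
})

high_salary = sample["AvgSalary"] >= 125000
low_tuition = sample["Total Tuition ($)"] <= 100000

print(sample[high_salary & low_tuition])   # both conditions hold
print(sample[high_salary | low_tuition])   # at least one condition holds
print(sample[~high_salary])                # the condition does not hold

try:
    sample[high_salary and low_tuition]    # `and` needs a single True/False value
except ValueError as err:
    print("ValueError:", err)
```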
The pandas isin() method
The pandas isin() method is used to filter DataFrames. It returns True for each row whose value in a column is present in a given list, and that result can be used to select a subset of the data. In the example below, we check whether the Country field is US or France.
#Boolean condition that returns the result set
condition04 = mydata[mydata.Country.isin(['US', 'France'])]
condition04 #type DataFrame
Rank School ... Total Tuition ($) Duration (Months)
2 3 University of Chicago (Booth) ... 111855 21
4 5 Yale ... 104752 21
5 6 Northwestern (Kellogg) ... 111658 22
7 8 HEC Paris ... 116713 16
8 9 Duke University (Fuqua) ... 97554 21
9 10 Dartmouth College (Tuck) ... 118047 24
10 11 University of Virginia (Darden) ... 99856 21
12 13 New York University (Stern) ... 121541 22
14 15 Cornell (Johnson) ... 101685 21
16 17 Georgetown (McDonough) ... 95063 21
20 21 Michigan (Ross) ... 97566 22
23 24 USC (Marshall) ... 102964 20
24 25 WASHU (Olin) ... 69461 24
[13 rows x 11 columns]
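It can help to look at the Boolean Series that isin() produces before it is used as a filter. A sketch on a stand-in DataFrame; the Country values, in particular "UK", are assumptions for illustration:

```python
import pandas as pd

# Stand-in rows; the Country values are assumptions for illustration.
sample = pd.DataFrame({
    "School": ["Yale", "HEC Paris", "LBS"],
    "Country": ["US", "France", "UK"],
})

in_us_or_france = sample["Country"].isin(["US", "France"])
print(in_us_or_france)             # Boolean Series: True, True, False
print(sample[in_us_or_france])     # rows for Yale and HEC Paris
print(sample[~in_us_or_france])    # ~ inverts the mask: row for LBS
```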
5.5 Control structures
The while loop
The while statement allows you to repeatedly execute a block of statements as long as a condition is True. A while statement can have an optional else clause. Commonly, the while control structure is called a loop, because it does something over and over again for as long as a given condition evaluates to True.
We generally use this type of control structure (over the for loop) when we don’t know beforehand the number of times the loop will iterate.
Read through the following example.
What is the expected output?
counter = 0
while counter < 3:
    print("Inside loop")
    counter = counter + 1
else:
    print("Inside else")
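- Initialize the variable counter by assigning it the value zero.
- Evaluate the condition counter < 3.
- If the condition is True, execute the indented block under while (print "Inside loop" and add 1 to counter), then evaluate the condition again.
- If the condition counter < 3 evaluates to False, go to the else clause and execute its block (print "Inside else").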
The output of the code is below.
Inside loop
Inside loop
Inside loop
Inside else
A common use case for the while loop is the circumstance where you want to collect input from a user until they are done. The code below asks the user a question.
print("Tell me about yourself. Enter one sentence at a time and press enter. When you are done type the word logout and press enter.")
while True:
    line = input('>')
    if line.startswith("#"):  # startswith also handles an empty line safely
        continue
    if line == "logout":
        break
    print(line)
print("Thank you for sharing! Have a nice day.")
Try out this code. See if you can understand the role of the break and continue statements.
- When the continue statement is executed, it ends the current iteration, jumps back to the while statement, and starts the next iteration. This occurs when the user types a line that starts with #.
- When the break statement is executed, the loop is exited. This occurs only when the user types logout.
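To watch break and continue without typing anything, the input can be scripted. In this sketch the list typed_lines (a hypothetical stand-in for what a user might enter) replaces the calls to input():

```python
# Scripted stand-in for what a user might type, one entry per prompt.
typed_lines = ["I like data.", "# a private note", "I live in New York.",
               "logout", "never reached"]

remaining = list(typed_lines)
echoed = []
while True:
    line = remaining.pop(0)       # stands in for input('>')
    if line.startswith("#"):
        continue                  # skip the rest of this iteration
    if line == "logout":
        break                     # leave the loop entirely
    echoed.append(line)
    print(line)

print("Thank you for sharing! Have a nice day.")
```

The line starting with # is skipped, and nothing after logout is processed.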
The for loop
The for statement is another looping statement. It iterates over a sequence of objects, that is, it goes through each item in a sequence (list, tuple, string) or other iterable object.
Let’s look at two examples:
Example 1
What is the output produced by this loop?
Try running the code yourself to observe how the for loop functions.
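The loop for this example is not shown in the text. A for loop that matches the while version given in this section, assuming the same mytuple, might look like this (the loop variable name item is an assumption):

```python
mytuple = (1, 5, 9)

for item in mytuple:
    print(item)
else:
    print('The loop is over')   # runs once the tuple is exhausted
```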
The for loop presents a shortcut for writing the following code:
mytuple = (1, 5, 9)
i = 0
while i < len(mytuple):
    print(mytuple[i])
    i = i + 1
else:
    print('The loop is over')
Essentially, a for loop is a condensed version of a while loop. It avoids explicitly initializing and incrementing the index variable, and the condition based on len(mytuple) is implied.
Example 2
In this example, we find the sum of all numbers stored in a list.
# List of numbers
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]
# variable to store the sum (named total to avoid shadowing the built-in sum)
total = 0
# iterate over the list
for val in numbers:
    total = total + val
# Output: The sum is 48
print("The sum is", total)
The sum is 48
5.6 Summary
- Conditional statements may use comparison and logical operators to evaluate the truth of an expression.
- We can ask more complex questions of our data using logical operators | for “or” and & for “and”.
- The pandas isin() method is used to filter DataFrames.
- for and while are looping statements that iterate over a sequence of objects or block of code, respectively.
Exercise 5.1
- Filter the mba DataFrame to show only those schools where the average salary is less than 100,000 and the tuition is greater than or equal to 100,000. If no schools match, ensure the output returned is interpretable.
Exercise 5.2
- Create a new DataFrame called schoolOptions that pulls in the top 10 schools where the average salary is greater than or equal to $100,000.
- Create a new column in the above DataFrame that calculates the monthly tuition. Call this column monthlyTuition, and sort the new DataFrame from smallest to largest (hint: use dataframe.sort_values(by=['columnName'])).
- OPTIONAL. Create a new column called AvgsalFormatted and format all values with dollar signs and thousands separators.
Assignment 5
- Load the Big Mac Index data from The Economist (published on GitHub) for July 2019. It’s available at: http://becomingvisual.com/python4data/bigmac.csv
- The value of a Big Mac in the United States is $1. Create a new DataFrame called inflatedBurger for all countries whose Big Mac price in dollars (dollar_price) was greater than purchasing power parity (dollar_ppp).
- Create a for loop that prints “Do not buy a Big Mac when you visit _________” for each country in your new table.
- Create a list of all country names where dollar price is greater than dollar purchasing power parity (dollar_ppp).
- Remove Euro Area from your list since it is not a country.