Lesson 5 Conditionals and Controls
by Kristen Sosulski
Welcome to lesson 5. In this lesson, you will be introduced to comparison operators, logical operators, conditional statements, iteration, and looping.
Follow along with this tutorial by creating a new ipython notebook named lesson05.ipynb and entering the code snippets presented.
Outline
- Comparison operators
- Logical operators
- Conditional statements
- Conditionals and DataFrames
- Control structures
- Summary
- Exercise 5.1
- Exercise 5.2
- Assignment 5
5.1 Comparison operators
Early in lesson 1, we learned about different mathematical operators, including +, -, *, and /. Aside from these operators, there are comparison and logical operators that are used to evaluate conditions and produce bool values. Boolean values are either True or False.
Let's start with understanding comparison operations as presented in the table below:
Operation | Operator | Example Input | Answer |
---|---|---|---|
Less than | < | 4 < 10 | True |
Less than or equal to | <= | 4 <= 4 | True |
Greater than | > | 11 > 12 | False |
Greater than or equal to | >= | 4 >= 4 | True |
Equal to | == | 3 == 2 | False |
Not equal to | != | 3 != 2 | True |
Less than
For numeric comparisons, the less than operator < checks whether the first value is less than the second value. If it is, True is returned. If it is not, False is returned.
What will be returned from the following statement?
33 < 90
If you answered True, you are correct.
Less than or equal to
For numeric comparisons, the less than or equal to operator <= checks whether the first value is less than or equal to the second value. If it is, True is returned. If it is not, False is returned. When using <=, do not put a space between the < and the = (e.g. < =). This will produce a SyntaxError: invalid syntax.
33 <= 33
True
Greater than
For numeric comparisons, the greater than operator > checks whether the first value is greater than the second value. If it is, True is returned. If it is not, False is returned.
33 > 90
False
Greater than or equal to
For numeric comparisons, the greater than or equal to operator >= checks whether the first value is greater than or equal to the second value. If it is, True is returned. If it is not, False is returned. When using >=, do not put a space between the > and the = (e.g. > =). This will produce a SyntaxError: invalid syntax.
33 >= 33
True
Equal to
For numeric comparisons, the equality operator == checks whether the first value is equal to the second value. If it is, True is returned. If it is not, False is returned. When using ==, do not put a space between the first = and the second = (e.g. = =). This will produce a SyntaxError: invalid syntax.
33 == 33
True
Note that the equal sign and the double equal sign have VERY different meanings in Python. The = denotes assignment, as in x = 4, which means the value 4 is assigned to the variable x. The == denotes equality and compares two values to see if they are equal.
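The difference between assignment and equality can be seen in a short sketch. The variable names x, is_four, and is_five are made up for illustration.

```python
# Assignment: store the value 4 under the (hypothetical) name x.
x = 4

# Equality: compare the value of x with another value.
is_four = (x == 4)
is_five = (x == 5)

print(is_four)        # True
print(is_five)        # False
print(type(is_four))  # <class 'bool'>
```

The comparison never changes x. It only produces a bool value that can be stored or tested.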
Not equal to
For numeric comparisons, the inequality operator != checks whether the first value is not equal to the second value. If the two values differ, True is returned. If they are equal, False is returned. When using !=, do not put a space between the ! and the = (e.g. ! =). This will produce a SyntaxError: invalid syntax.
33 != 33
False
5.2 Logical operators
In addition to the six comparison operators (>, >=, <, <=, ==, and !=), there are three logical operators. These are explained in the table below.
Operation | Operator | Example Input | Answer |
---|---|---|---|
Or | or | (3==3) or (4==7) | True |
And | and | (3==3) and (4==7) | False |
Not | not | not(3==3) | False |
Or
The or operator returns True if at least one of the statements is true. For example, we can evaluate the two expressions in parentheses as shown below. If at least one of them is true, then the value True is returned.
(5**2 == 25) or (5!=5)
True
However, if neither of the expressions evaluates to True, the value False is returned.
And
By contrast, if we altered the expression above and replaced or with and, the result would evaluate to False. This is because both expressions need to be true when the and operator is used.
(5**2 == 25) and (5!=5)
False
Not
The not operator reverses the result of a statement. It returns False if the result is true and True if the result is false.
not(5==5)
False
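As a minimal sketch, the three logical operators can also be applied to values stored in variables. The names age and has_ticket are hypothetical and used only for illustration.

```python
# Hypothetical values used only for illustration.
age = 30
has_ticket = False

both = (age >= 18) and has_ticket    # False: the second operand is False
either = (age >= 18) or has_ticket   # True: the first operand is True
negated = not (age >= 18)            # False: not reverses True

print(both, either, negated)  # False True False
```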
Important note on usage: When working with DataFrames and Series, you will have to use the &, |, and ~ operators instead of and, or, and not, respectively.
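A small sketch of the element-wise operators on a pandas Series follows. The Series scores and its values are invented for illustration. Note that pandas uses ~ for element-wise negation, and that each comparison is wrapped in parentheses.

```python
import pandas as pd

# A small, made-up Series used only for illustration.
scores = pd.Series([55, 72, 90, 38])

passed = scores >= 60                        # element-wise comparison
mid_range = (scores >= 60) & (scores < 80)   # element-wise "and"
extreme = (scores < 40) | (scores >= 90)     # element-wise "or"
failed = ~passed                             # element-wise "not"

print(mid_range.tolist())  # [False, True, False, False]
print(extreme.tolist())    # [False, False, True, True]
print(failed.tolist())     # [True, False, False, True]
```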
5.3 Conditional statements
Conditional statements let a program make decisions by evaluating a condition to see if it is True. Often, we want to make a decision based on one or more conditions. This can be achieved by using comparison and/or logical operators together with a simple if statement.
if statements
Here's an example that uses a conditional if statement to ask whether 1 < 2 and 4 > 2.
if 1 < 2 and 4 > 2:
    print("The first condition is met")  # printed only if the condition is True; otherwise nothing is printed
The first condition is met
The structure of an if statement is very particular. The first line of the if statement is left aligned and states the condition. The condition must be one that returns a True or False value. The condition must be followed by a colon, :.
Next, if the condition evaluates to True, then the indented lines below the if statement will be executed.
See the prototype (or pseudo code) below:
if CONDITION:
    # do something such as execute a print statement
    print("The first condition is met")
Indent the statements that belong to the if block. Otherwise, an IndentationError will be raised.
For example, the code below, in which the print statement is not indented, will raise this error: IndentationError: expected an indented block
if 1 < 2 and 4 > 2:
print("The first condition met")
Multiple if statements
You can ask multiple questions in a single code chunk. Each if statement is evaluated separately and is unrelated to the others. This means that whether or not the first condition is met, the second condition is still evaluated, and so on.
if 1 < 2 and 4 > 2:
    print("The first condition met")
The first condition met
if 1 > 2 and 4 < 10:
    print("The second unrelated condition is met.")
if 4 < 10 or 1 < 2:
    print("The third unrelated condition is met.")
The third unrelated condition is met.
if / elif statements
To make one condition depend on another, for example, the second condition on the first and the third on the second, we use the if and elif statements together. elif is short for else if, and the first elif in a block of code must always be preceded by an if statement. There can be multiple elif statements.
Examine the code below. Take note of how the colon is used at the end of the if statement and each elif statement.
Also, note the indentation of the statement nested under the if statement and under each elif. The indentation rules are important to follow.
if 1 < 2 and 4 > 2:
    print("The first condition is met")  # printed if this condition is met
elif 1 > 2 and 4 < 10:
    print("The second condition is met")  # printed if this condition is met but not the first
elif 4 < 10 or 1 < 2:
    print("The third condition is met")  # printed if this condition is met but not the second or first
else:
    print("No conditions were met.")  # printed only if none of the 3 conditions were met
The first condition is met
This example illustrates how you can test one condition and, if it is not met, test a second condition, and so on. As soon as one condition evaluates to True, its block runs and the remaining conditions are skipped. The else: block runs only when every preceding condition, including the last one, elif 4 < 10 or 1 < 2:, evaluates to False.
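To see a branch other than the first one run, the conditions can depend on a variable. The following is a minimal sketch. The variable rank, its value, and the tier labels are hypothetical.

```python
# Hypothetical ranking value, used only for illustration.
rank = 12

if rank <= 10:
    tier = "top 10"      # skipped: 12 <= 10 is False
elif rank <= 20:
    tier = "top 20"      # runs: first condition that is True
elif rank <= 25:
    tier = "top 25"      # never tested, an earlier branch already ran
else:
    tier = "unranked"

print(tier)  # top 20
```

Changing rank to 30 would make every condition False, so only the else block would run.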
5.4 Conditionals and DataFrames
Conditional statements are very useful when working with DataFrames.
Let's do a quick review of how to import data and select columns and rows with the mba data set.
First, we import the pandas library as pd. Then we import the data by passing the full URL of the .csv file to the read_csv function.
#first import the data
import pandas as pd
mydata = pd.read_csv("http://becomingvisual.com/python4data/mba.csv", header=0)
Then, preview the data.
mydata.head(5)
Rank School ... Total Tuition ($) Duration (Months)
0 1 Chicago (Booth) ... 106800 21
1 2 Dartmouth (Tuck) ... 106980 21
2 3 Virginia (Darden) ... 107800 21
3 4 Harvard ... 107000 18
4 5 Columbia ... 111736 20
[5 rows x 11 columns]
Next, to select columns we can use a simple syntax of DataFrameName.ColumnName.
For example,
mydata.Rank
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 13
13 14
14 15
15 16
16 17
17 18
18 19
19 20
20 21
21 22
22 23
23 24
24 25
Name: Rank, dtype: int64
This is a handy shortcut. The alternative is to use the indexing operator as shown below.
mydata["School"]
0 Chicago (Booth)
1 Dartmouth (Tuck)
2 Virginia (Darden)
3 Harvard
4 Columbia
5 California At Berkeley (Haas)
6 MIT (Sloan)
7 Stanford
8 IESE
9 IMD
10 New York (Stern)
11 London
12 Pennsylvania (Wharton)
13 HEC Paris
14 Cornell (Johnson)
15 York (Schulich)
16 Carnegie Mellon (Tepper)
17 ESADE
18 INSEAD
19 Northwestern (Kellogg)
20 Emory (Goizueta)
21 IE
22 UCLA (Anderson)
23 Michigan (Ross)
24 Bath
Name: School, dtype: object
Now that we've reviewed how to select columns and rows in a pandas DataFrame, let's select some data based on a condition.
Specifically, what if we want to filter our mba DataFrame to show only schools ranked in the top 10, that is, with a Rank less than or equal to 10?
To do that, we take a column from the DataFrame and apply a Boolean condition to it. Here's an example of a Boolean condition:
#Boolean condition
condition01 = mydata.Rank <= 10
condition01
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
Name: Rank, dtype: bool
A Series of Boolean values is returned. Rows with the value True meet the condition. You can confirm that a Series is returned by using the type() function.
type(condition01)
<class 'pandas.core.series.Series'>
Let's write another condition to see which schools have an average starting salary of greater than or equal to $125,000.
#Boolean condition
condition02 = mydata.AvgSalary >=125000
condition02
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 True
8 True
9 True
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 True
Name: AvgSalary, dtype: bool
These data are informative, but probably not that useful. What if you wanted to return the data for just those schools that met your conditions?
This is where we would place the Boolean condition inside the indexing operator [ ] to return the result set. Let's try doing this with condition02.
#Boolean condition that returns the result set
condition02 = mydata[mydata.AvgSalary >=125000]
condition02
Rank School ... Total Tuition ($) Duration (Months)
7 8 Stanford ... 114600 21
8 9 IESE ... 95610 19
9 10 IMD ... 67416 11
24 25 Bath ... 36057 12
[4 rows x 11 columns]
This returns a DataFrame object. You can see this by using the type() function.
type(condition02)
<class 'pandas.core.frame.DataFrame'>
The alternative syntax is:
condition02 = mydata[mydata['AvgSalary'] >=125000]
type(condition02)
<class 'pandas.core.frame.DataFrame'>
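The two steps, building a Boolean Series and then using it to select rows, can be separated in a small sketch. The DataFrame schools below is made up; only the column name AvgSalary mirrors the lesson's data.

```python
import pandas as pd

# Made-up data; the column names mirror the lesson but the values are invented.
schools = pd.DataFrame({
    "School": ["A", "B", "C", "D"],
    "AvgSalary": [130000, 98000, 125000, 110000],
})

mask = schools["AvgSalary"] >= 125000   # Boolean Series, one value per row
high_salary = schools[mask]             # DataFrame with only the True rows

print(type(mask).__name__)             # Series
print(type(high_salary).__name__)      # DataFrame
print(high_salary["School"].tolist())  # ['A', 'C']
```

Storing the mask in its own variable makes it easy to inspect which rows will be kept before filtering.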
We can ask more complex questions of our data using the logical operators | for "or" and & for "and".
Let's filter the DataFrame to show only those schools where the average salary is greater than or equal to $125,000 and the tuition is less than or equal to $100,000.
#Boolean condition that returns the result set
condition03 = mydata[(mydata['AvgSalary'] >=125000) & (mydata['Total Tuition ($)'] <= 100000)]
condition03 #type DataFrame
Rank School ... Total Tuition ($) Duration (Months)
8 9 IESE ... 95610 19
9 10 IMD ... 67416 11
24 25 Bath ... 36057 12
[3 rows x 11 columns]
Take note of the use of square brackets and parentheses. We need to group each comparison in parentheses so Python knows how to evaluate the conditional.
I chose to use the index operator [ ] instead of the . to access the columns. This is because some column names contain spaces, which cause an error with the . syntax. This is why single-word names for variables make coding more efficient.
Also, take note of the use of the & operator. If and were used, a ValueError would be raised. This is because the logical operators (and, or, and not) do not operate element by element on a pandas Series or DataFrame. Use the bit-wise operators (&, |, and ~) when working with DataFrames and Series.
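The error can be reproduced safely on a tiny DataFrame. This is a sketch with invented data; the column name Tuition is a hypothetical stand-in for the lesson's tuition column.

```python
import pandas as pd

# Made-up data used only for illustration.
df = pd.DataFrame({
    "AvgSalary": [130000, 98000, 125000],
    "Tuition": [95000, 60000, 110000],
})

# The keyword `and` asks Python for a single True/False value
# for an entire Series, which pandas refuses to guess.
try:
    df[(df["AvgSalary"] >= 125000) and (df["Tuition"] <= 100000)]
    error_name = None
except ValueError as err:
    error_name = type(err).__name__

print(error_name)  # ValueError

# The element-wise operator & compares the two Series row by row.
result = df[(df["AvgSalary"] >= 125000) & (df["Tuition"] <= 100000)]
print(len(result))  # 1
```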
The pandas isin() method
The pandas isin() method is used to filter DataFrames. Using isin() returns a subset of the data based on the presence of a value in a column. In the example below, we check whether the Country field contains US or France.
#Boolean condition that returns the result set
condition04 = mydata[mydata.Country.isin(['US', 'France'])]
condition04 #type DataFrame
Rank School ... Total Tuition ($) Duration (Months)
0 1 Chicago (Booth) ... 106800 21
1 2 Dartmouth (Tuck) ... 106980 21
2 3 Virginia (Darden) ... 107800 21
3 4 Harvard ... 107000 18
4 5 Columbia ... 111736 20
5 6 California At Berkeley (Haas) ... 106792 21
6 7 MIT (Sloan) ... 116400 22
7 8 Stanford ... 114600 21
10 11 New York (Stern) ... 96640 20
12 13 Pennsylvania (Wharton) ... 107852 21
13 14 HEC Paris ... 66802 16
14 15 Cornell (Johnson) ... 107592 21
16 17 Carnegie Mellon (Tepper) ... 108272 21
19 20 Northwestern (Kellogg) ... 113100 22
20 21 Emory (Goizueta) ... 87200 22
22 23 UCLA (Anderson) ... 105160 21
23 24 Michigan (Ross) ... 105500 20
[17 rows x 11 columns]
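The same idea can be tried on a few invented rows. In this sketch only the column name Country comes from the lesson; the rest is hypothetical. It also shows that ~ selects the rows that are not in the list.

```python
import pandas as pd

# Made-up rows; only the Country column name comes from the lesson.
df = pd.DataFrame({
    "School": ["A", "B", "C", "D"],
    "Country": ["US", "Spain", "France", "UK"],
})

in_us_or_france = df["Country"].isin(["US", "France"])

selected = df[in_us_or_france]    # rows whose Country is in the list
others = df[~in_us_or_france]     # rows whose Country is not in the list

print(selected["School"].tolist())  # ['A', 'C']
print(others["School"].tolist())    # ['B', 'D']
```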
5.5 Control structures
The while loop
The while statement allows you to repeatedly execute a block of statements as long as a condition is True. A while statement can have an optional else clause. Commonly, the while control structure is called a loop, because you are doing something over and over again for as long as a given condition evaluates to True.
We generally use this type of control structure (rather than the for loop) when we don't know beforehand the number of times the loop will iterate.
Read through the following example.
What is the expected output?
counter = 0
while counter < 3:
print("Inside loop")
counter = counter + 1
else:
print("Inside else")
- Initialize the variable counter by assigning it the value zero.
- Evaluate the condition counter < 3.
- If the condition is True, execute the indented block under the while statement, then evaluate the condition again.
- If the condition counter < 3 evaluates to False, go to the else clause and execute its block.
The output of the code is below.
Inside loop
Inside loop
Inside loop
Inside else
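A while loop is most useful when the number of iterations is not obvious in advance. The following sketch uses hypothetical values: a balance that grows by 10% per pass until it has at least doubled.

```python
# Hypothetical starting balance and goal, used only for illustration.
balance = 100.0
years = 0

# Grow the balance by 10% per year until it has at least doubled.
while balance < 200.0:
    balance = balance * 1.10
    years = years + 1

print(years)  # 8
```

The loop stops on its own as soon as the condition balance < 200.0 becomes False.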
A common use case for the while loop is collecting input from a user until they are done. The code below asks the user a question.
print("Tell me about yourself. Enter one sentence at a time and press enter. When you are done type the word logout and press enter.")
while True:
    line = input('>')
    # startswith() also handles an empty line, where line[0] would raise an IndexError
    if line.startswith("#"):
        continue
    if line == "logout":
        break
    print(line)
print("Thank you for sharing! Have a nice day.")
Try out this code. See if you can understand the role of the break and continue statements.
- When the continue statement is executed, it ends the current iteration, jumps back to the while statement, and starts the next iteration. This occurs when the user types a line that begins with #.
- When the break statement is executed, the loop is exited. This occurs only when the user types logout.
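The behavior of continue and break can also be traced without typing anything. In this sketch a made-up list named pending stands in for the user's input, and pending.pop(0) plays the role of input('>').

```python
# Simulated user input so the example runs without typing anything.
# These lines are made up for illustration.
pending = ["I like data.", "# a private note", "", "I live in NYC.", "logout", "never reached"]
echoed = []

while True:
    line = pending.pop(0)      # stands in for input('>')
    if line.startswith("#"):
        continue               # skip this line and start the next iteration
    if line == "logout":
        break                  # leave the loop entirely
    echoed.append(line)

print(echoed)  # ['I like data.', '', 'I live in NYC.']
```

The line that starts with # is skipped, and nothing after logout is processed.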
The for loop
The for statement is another looping statement. It iterates over a sequence of objects, that is, it goes through each item in a sequence (list, tuple, string) or other iterable object.
Let's look at two examples:
Example 1
What is the output produced by this loop?
mytuple = (1, 5, 9)
for i in mytuple:
    print(i)
else:
    print('The for loop is over')
Try running the code yourself to observe how the for loop functions.
The for loop presents a shortcut for writing the following code:
mytuple = (1, 5, 9)
i = 0
while i < len(mytuple):
    print(mytuple[i])
    i = i + 1
else:
    print('The loop is over')
Essentially, a for loop is a condensed version of a while loop. It avoids explicitly initializing and incrementing the index variable, and the condition i < len(mytuple) is implied.
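As a rough sketch, the same for syntax works for different kinds of iterable objects. The values below are made up, and range(3) is a built-in that yields the integers 0, 1, and 2.

```python
# Made-up values used only for illustration.
collected = []

for ch in "abc":        # a string yields one character at a time
    collected.append(ch)

for n in [10, 20]:      # a list yields one element at a time
    collected.append(n)

for i in range(3):      # range(3) yields 0, 1, 2
    collected.append(i)

print(collected)  # ['a', 'b', 'c', 10, 20, 0, 1, 2]
```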
Example 2
In this example, we are finding the sum of all numbers stored in a list.
# List of numbers
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]
# variable to store the sum (named total so it does not hide the built-in sum function)
total = 0
# iterate over the list
for val in numbers:
    total = total + val
# Output: The sum is 48
print("The sum is", total)
The sum is 48
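As a quick check, the loop's result can be compared with Python's built-in sum() function. Naming a variable sum hides that built-in for the rest of the notebook, so this sketch uses the name total instead.

```python
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

total = 0
for val in numbers:
    total = total + val

# Python's built-in sum() computes the same result in one call.
print(total)                  # 48
print(total == sum(numbers))  # True
```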
5.6 Summary
- Conditional statements may use comparison and logical operators to evaluate the truth of an expression.
- We can ask more complex questions of our data using the logical operators | for "or" and & for "and".
- The pandas isin() method is used to filter DataFrames.
- for and while are looping statements that iterate over a sequence of objects or a block of code, respectively.
Exercise 5.1
- Filter the mba DataFrame to show only those schools where the average salary is less than $100,000 and the tuition is greater than or equal to $100,000. If no schools match, be sure the output returned is interpretable.
Exercise 5.2
- Create a new DataFrame called schoolOptions that pulls in the top 10 schools where average salary is greater than or equal to $100,000.
- Create a new column in the above DataFrame that calculates the monthly tuition. Call this column monthlyTuition, and sort the new DataFrame from smallest to largest (hint: use dataframe.sort_values(by=['columnName'])).
- OPTIONAL. Create a new column called AvgsalFormatted and format all values with dollar signs and thousands separators.
Assignment 5
- Load the Big Mac Index data for July 2019, which comes from The Economist's data on GitHub. It's available at: http://becomingvisual.com/python4data/bigmac.csv
- The value of a BigMac in the United States is $1. Create a new DataFrame called inflatedBurger for all countries whose Big Mac price in dollars (dollar_price) was greater than purchasing power parity (dollar_ppp).
- Create a for loop that prints “Do not buy a Big Mac when you visit _________” for each country in your new table.
- Create a list of all country names where dollar price is greater than dollar purchasing power parity (dollar_ppp).
- Remove Euro Area from your list since it is not a country.