# Lesson 5 Conditionals and Controls

by Kristen Sosulski

Welcome to lesson 5. In this lesson, you will be introduced to comparison operators, logical operators, conditional statements, iteration, and looping.

Follow along with this tutorial by creating a new ipython notebook named lesson05.ipynb and entering the code snippets presented.

Outline

• Comparison operators
• Logical operators
• Conditional statements
• Conditionals and DataFrames
• Control structures
• Summary
• Exercise 5.1
• Exercise 5.2
• Assignment 5

## 5.1 Comparison operators

Early on in lesson 1, we learned about different types of mathematical operators including `+`, `-`, `*`, and `/`. Aside from these operators, there are comparison and logical operators that are used to evaluate conditions and return `bool` values. Boolean values are either `True` or `False`.

Let’s start with understanding comparison operations as presented in the table below:

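| Comparison | Operator | Example | Result |
| --- | --- | --- | --- |
| Less than | `<` | `4 < 10` | `True` |
| Less than or equal to | `<=` | `4 <= 4` | `True` |
| Greater than | `>` | `11 > 12` | `False` |
| Greater than or equal to | `>=` | `4 >= 4` | `True` |
| Equal to | `==` | `3 == 2` | `False` |
| Not equal to | `!=` | `3 != 2` | `True` |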

Less than

For numeric comparisons, the less than operator `<` evaluates to see if the first value is less than the second value. If it is, the value of `True` is returned. If it is not, the value of `False` is returned.

What will be returned from the following statement?

``33 <  90``

If you answered `True` you are correct.

Less than or equal to

For numeric comparisons, the less than or equal to operator `<=` evaluates to see if the first value is less than or equal to the second value. If it is, the value of `True` is returned. If it is not, the value of `False` is returned. When using `<=` do not put a space between the `<` and the `=` (e.g. `< =`). This will produce a `SyntaxError: invalid syntax`.

``33 <=33``
``True``

Greater than

For numeric comparisons, the greater than operator `>` evaluates to see if the first value is greater than the second value. If it is, the value of `True` is returned. If it is not, the value of `False` is returned.

``33 > 90``
``False``

Greater than or equal to

For numeric comparisons, the greater than or equal to operator `>=` evaluates to see if the first value is greater than or equal to the second value. If it is, the value of `True` is returned. If it is not, the value of `False` is returned. When using `>=` do not put a space between the `>` and the `=` (e.g. `> =`). This will produce a `SyntaxError: invalid syntax`.

``33 >= 33``
``True``

Equal to

For numeric comparisons, the equality operator `==` evaluates to see if the first value is equal to the second value. If it is, the value of `True` is returned. If it is not, the value of `False` is returned. When using `==` do not put a space between the first `=` and the second `=` (e.g. `= =`). This will produce a `SyntaxError: invalid syntax`.

``33 == 33``
``True``

Note that the equal sign and the double equal sign have VERY different meanings in Python. The `=` denotes assignment, as in `x = 4`, which assigns the value 4 to the variable `x`. The `==` denotes equality and compares two values to see if they are equal.
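As a minimal sketch of this difference, the snippet below first assigns a value with `=` and then compares it with `==`. The variable names `is_four` and `is_five` are made up for this illustration.

```python
# Assignment: store the value 4 in the variable x
x = 4

# Equality: compare the value of x with another value; the result is a bool
is_four = (x == 4)
is_five = (x == 5)

print(is_four)
print(is_five)
```

The first comparison is `True` and the second is `False`, while `x` itself still holds the value 4.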

Not equal to

For numeric comparisons, the inequality operator `!=` evaluates to see if the first value is not equal to the second value. If the two values differ, the value of `True` is returned. If they are equal, the value of `False` is returned. When using `!=` do not put a space between the `!` and the `=` (e.g. `! =`). This will produce a `SyntaxError: invalid syntax`.

``33 != 33``
``False``
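Comparisons also work on variables, and their result can be stored like any other value. Here is a small sketch, assuming two made-up variables `a` and `b`, which shows that the result of a comparison is of type `bool`.

```python
a = 33
b = 90

# Store the result of a comparison in a variable
result = a < b
print(result)        # True, because 33 is less than 90
print(type(result))  # the result is a bool

# The inequality operator on the same two variables
not_equal = a != b
print(not_equal)     # True, because 33 and 90 differ
```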

## 5.2 Logical operators

In addition to the six comparison operators (`>`, `>=`, `<`, `<=`, `==`, and `!=`), there are three logical operators. These are explained in the table below.

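| Operation | Operator | Example | Result |
| --- | --- | --- | --- |
| Or | `or` | `(3==3) or (4==7)` | `True` |
| And | `and` | `(3==3) and (4==7)` | `False` |
| Not | `not` | `not(3==3)` | `False` |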

Or

The `or` operator returns `True` if at least one of the expressions is true. For example, we can evaluate the two expressions in parentheses as shown below. If at least one of them is true, then the value of `True` is returned.

``(5**2 == 25) or (5!=5)  ``
``True``

However, if neither of the expressions evaluates to `True`, the value of `False` is returned.
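To illustrate the `False` case, here is a short sketch that changes the first expression so that neither side is true. The variable names are made up for this illustration.

```python
# Neither expression is true: 5**2 is 25 (not 24), and 5 is equal to 5
neither_true = (5**2 == 24) or (5 != 5)
print(neither_true)

# For comparison: only the first expression is true, which is enough for `or`
one_true = (5**2 == 25) or (5 != 5)
print(one_true)
```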

And

In contrast, if we alter the expression above and replace `or` with `and`, the result evaluates to `False`. This is because both expressions need to be true when the `and` operator is used.

``(5**2 == 25) and (5!=5)  ``
``False``

Not

The `not` operator reverses the result of an expression. It returns `False` if the result is true and `True` if the result is false.

``not(5==5) ``
``False``

Important usage note: When working with DataFrames and Series, you will have to use the `&`, `|`, and `~` operators instead of `and`, `or`, and `not`, respectively.

## 5.3 Conditional statements

Conditional statements are a nice way to make decisions by evaluating a condition to see if it is `True`. Oftentimes, we may want to make a decision based on one or more conditions. This can be achieved with the use of comparison and/or logical operators together with a simple `if` statement.

`if` statements

Here’s an example that uses a conditional `if` statement to ask whether 1 < 2 and 4 > 2.

``````
if 1 < 2 and 4 > 2:
    # if the condition is true this line prints out, otherwise nothing is printed
    print("The first condition is met")
``````
``The first condition is met``

The structure for an `if` statement is very particular. The first line of the `if` is left aligned and communicates the condition. The condition must be one that returns a `True` or `False` value. The condition must be followed by a colon, `:`.

Next, if the condition evaluates to `True`, then the indented lines below the `if` statement will be executed.

See the prototype (or pseudo code) below:

``````
if CONDITION:
    # do something such as execute a print statement
    print("The first condition is met")
``````

Indent every statement that belongs to the `if` block. Otherwise, an `IndentationError` will be thrown.

For example, the code below will return this error: `IndentationError: expected an indented block`

``````
if 1 < 2 and 4 > 2:
print("The first condition met")  # not indented, so Python raises an IndentationError
``````
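If you would like to see this error without stopping your notebook, one possible approach is sketched below. It uses Python's built-in `compile()` function and a `try` / `except` block, which go beyond this lesson; the names `bad_source` and `message` are made up for this illustration, and the exact wording of the error message varies between Python versions.

```python
# The unindented code from above, stored as text so it can be checked safely
bad_source = 'if 1 < 2 and 4 > 2:\nprint("The first condition met")\n'

message = None
try:
    # compile() checks the syntax of the text without running it
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    message = "IndentationError: " + err.msg

print(message)
```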

Multiple if statements

You can ask multiple questions in a single code chunk. Each `if` statement is evaluated separately, independent of the others. This means that whether or not the first condition is met, the second condition is still evaluated, and so on.

``````
if 1 < 2 and 4 > 2:
    print("The first condition met")
``````
``The first condition met``
``````
if 1 > 2 and 4 < 10:
    print("The second unrelated condition is met.")

if 4 < 10 or 1 < 2:
    print("The third unrelated condition is met.")
``````
``The third unrelated condition is met.``
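To see that separate `if` statements really are evaluated independently, here is a small sketch. The variable `value`, the list `messages`, and the thresholds are made up for this illustration.

```python
value = 15
messages = []

# Each if statement is checked on its own, regardless of the others
if value > 5:
    messages.append("greater than 5")

if value > 10:
    messages.append("greater than 10")

if value > 20:
    messages.append("greater than 20")

print(messages)
```

Two of the three conditions are met, so two messages are collected.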

`if` / `elif` statements

To make one condition depend on another, for example to evaluate the second condition only if the first is not met, and the third only if the second is not met, we use the `if` and `elif` statements together. `elif` is short for “else if”, and the first `elif` in a block of code must be preceded by an `if` statement. There can be multiple `elif` statements.

Examine the code below. Take note of how the colon is used at the end of the `if` statement and each `elif` statement.

Also, note the indentation of the statement under each `if`, `elif`, and `else` line. The indentation rules are important to follow.

``````
if 1 < 2 and 4 > 2:
    print("The first condition is met") #printed if condition is met
elif 1 > 2 and 4 < 10:
    print("The second condition is met") #printed if condition is met but not the first
elif 4 < 10 or 1 < 2:
    print("The third condition is met") #printed if condition is met but not the second or first
else:
    print("No conditions were met.") #printed only if none of the 3 conditions were met.
``````
``The first condition is met``

This example illustrates how you can evaluate one condition and, if that is not met, evaluate a second condition, and so on. Because the first condition is `True` here, the `elif` conditions are never evaluated. The `else:` block runs only when every preceding condition, including the last one, `elif 4 < 10 or 1 < 2:`, evaluates to `False`.
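The same structure becomes more useful when the conditions depend on a variable. The sketch below assumes a made-up variable `rank` and made-up tier labels; it is only meant to show that at most one branch runs.

```python
rank = 12  # hypothetical rank, not taken from the mba data

if rank <= 10:
    tier = "top 10"
elif rank <= 20:
    tier = "top 20"  # reached only because the first condition was False
elif rank <= 30:
    tier = "top 30"  # skipped: an earlier condition was already met
else:
    tier = "outside the top 30"

print(tier)
```

Even though `rank <= 30` is also true for 12, only the first matching branch is executed.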

## 5.4 Conditionals and DataFrames

Conditional statements are very useful when working with DataFrames.

Let’s do a quick review of how to import data and select columns and rows with the `mba` data set.

First, we import the pandas library as `pd`. Then we import the data by passing the full URL of the .csv file to the `read_csv` method.

``````
#first import the data
import pandas as pd
# then assign the result of pd.read_csv(<full URL of the .csv file>) to mydata
``````

Then, preview the data.

``mydata.head(5)``
``````   Rank             School  ... Total Tuition ($)  Duration (Months)
0     1    Chicago (Booth)  ...            106800                 21
1     2   Dartmouth (Tuck)  ...            106980                 21
2     3  Virginia (Darden)  ...            107800                 21
3     4            Harvard  ...            107000                 18
4     5           Columbia  ...            111736                 20

[5 rows x 11 columns]``````

Next, to select columns we can use a simple syntax of DataFrameName.ColumnName.

For example,

``mydata.Rank``
``````0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
12    13
13    14
14    15
15    16
16    17
17    18
18    19
19    20
20    21
21    22
22    23
23    24
24    25
Name: Rank, dtype: int64``````

This is a handy short cut. The alternative is to use the indexing operator as shown below.

``mydata["School"]``
``````0                   Chicago (Booth)
1                  Dartmouth (Tuck)
2                 Virginia (Darden)
3                           Harvard
4                          Columbia
5     California At Berkeley (Haas)
6                       MIT (Sloan)
7                          Stanford
8                              IESE
9                               IMD
10                 New York (Stern)
11                           London
12           Pennsylvania (Wharton)
13                        HEC Paris
14                Cornell (Johnson)
15                  York (Schulich)
16         Carnegie Mellon (Tepper)
19           Northwestern (Kellogg)
20                 Emory (Goizueta)
21                               IE
22                  UCLA (Anderson)
23                  Michigan (Ross)
24                             Bath
Name: School, dtype: object``````

Now that we’ve reviewed how to select columns in a pandas DataFrame, let’s select some data based on a condition.

Specifically, what if we want to filter our `mba` DataFrame to show only the top 10 schools, that is, schools with a rank less than or equal to 10?

To do that, we take a column from the DataFrame and apply a Boolean condition to it. Here’s an example of a Boolean condition:

``````#Boolean condition
condition01 = mydata.Rank <= 10
condition01``````
``````0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
Name: Rank, dtype: bool``````

A `Series` of Boolean values is returned. Those with the value of `True` meet the condition. You can see that a `Series` is returned by using the `type()` function.

``type(condition01)``
``<class 'pandas.core.series.Series'>``

Let’s write another condition to see which schools have an average starting salary of greater than or equal to \$125,000.

``````#Boolean condition
condition02 = mydata.AvgSalary >=125000
condition02``````
``````0     False
1     False
2     False
3     False
4     False
5     False
6     False
7      True
8      True
9      True
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24     True
Name: AvgSalary, dtype: bool``````

These data are informative, but probably not that useful. What if you wanted to return the data for just those schools that met your conditions?

This is where we place the Boolean condition inside the square brackets of the indexing operator to return the result set. Let’s try doing this with condition02.

``````#Boolean condition that returns the result set
condition02 = mydata[mydata.AvgSalary >=125000]
condition02``````
``````    Rank    School  ... Total Tuition ($)  Duration (Months)
7      8  Stanford  ...            114600                 21
8      9      IESE  ...             95610                 19
9     10       IMD  ...             67416                 11
24    25      Bath  ...             36057                 12

[4 rows x 11 columns]``````

This returns a `DataFrame` object. You can see this by using the `type()` function.

``type(condition02)``
``<class 'pandas.core.frame.DataFrame'>``

The alternative syntax is:

``````condition02 = mydata[mydata['AvgSalary'] >=125000]
type(condition02)``````
``<class 'pandas.core.frame.DataFrame'>``
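The two steps, building the Boolean `Series` and then using it to select rows, can also be written separately. The sketch below uses a tiny made-up DataFrame called `toy` (the school names and salaries are invented and are not the `mba` data) so that it runs on its own.

```python
import pandas as pd

# Made-up toy data for illustration only
toy = pd.DataFrame({
    "School": ["A", "B", "C", "D"],
    "AvgSalary": [130000, 110000, 125000, 90000],
})

# Step 1: the condition produces a Boolean Series, one value per row
mask = toy["AvgSalary"] >= 125000
print(mask.tolist())

# Step 2: indexing with that Series keeps only the rows marked True
high_salary = toy[mask]
print(high_salary)
```

Here `mask` is a `Series` and `high_salary` is a `DataFrame`, matching the types shown above.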

We can ask more complex questions of our data using the logical operators `|` for “or” and `&` for “and”.

Let’s filter the DataFrame to show only those schools where the average salary is greater than or equal to \$125,000 and the tuition is less than or equal to \$100,000.

``````#Boolean condition that returns the result set
condition03 = mydata[(mydata['AvgSalary'] >= 125000) & (mydata['Total Tuition ($)'] <= 100000)]
condition03 #type DataFrame``````
``````    Rank School  ... Total Tuition ($)  Duration (Months)
8      9   IESE  ...             95610                 19
9     10    IMD  ...             67416                 11
24    25   Bath  ...             36057                 12

[3 rows x 11 columns]``````

Take note of the use of square brackets and parentheses. We need to make sure to group evaluations with parentheses so Python knows how to evaluate the conditional.

I chose to use the index operator `[ ]` over the `.` to access the column names. This is because a name such as `Total Tuition ($)` contains spaces, which the dot notation cannot handle. This is why single-word column names make coding more efficient.

Also, take note of the use of the `&` operator. If `and` were used, a `ValueError` would be thrown. This is because the logical operators (`and`, `or`, and `not`) do not operate the same way on pandas Series and DataFrames. Use the bit-wise operators (`&`, `|`, and `~`) when working with DataFrames and Series.
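Here is a small sketch of all three operators, plus the error raised by `and`. It assumes a made-up DataFrame called `toy` with invented values (not the `mba` data), and it uses a `try` / `except` block only to show the error without stopping the notebook.

```python
import pandas as pd

# Made-up toy data for illustration only
toy = pd.DataFrame({
    "School": ["A", "B", "C", "D"],
    "AvgSalary": [130000, 110000, 125000, 90000],
    "Total Tuition ($)": [95000, 80000, 120000, 110000],
})

high_salary = toy["AvgSalary"] >= 125000
low_tuition = toy["Total Tuition ($)"] <= 100000

both = toy[high_salary & low_tuition]    # "and": both conditions are met
either = toy[high_salary | low_tuition]  # "or": at least one condition is met
not_high = toy[~high_salary]             # "not": the condition is not met

print(both["School"].tolist())
print(either["School"].tolist())
print(not_high["School"].tolist())

# Using the keyword `and` on two Series raises a ValueError
error_name = None
try:
    toy[high_salary and low_tuition]
except ValueError as err:
    error_name = type(err).__name__
print(error_name)
```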

The pandas `isin()` method

The pandas `isin()` method is used to filter DataFrames. It returns a subset of the data based on whether the value in a column is one of a given list of values. In the example below, we are evaluating to see if the `Country` field is either US or France.

``````#Boolean condition that returns the result set
condition04 = mydata[mydata.Country.isin(['US', 'France'])]
condition04 #type DataFrame``````
``````    Rank                         School  ... Total Tuition ($)  Duration (Months)
0      1                Chicago (Booth)  ...            106800                 21
1      2               Dartmouth (Tuck)  ...            106980                 21
2      3              Virginia (Darden)  ...            107800                 21
3      4                        Harvard  ...            107000                 18
4      5                       Columbia  ...            111736                 20
5      6  California At Berkeley (Haas)  ...            106792                 21
6      7                    MIT (Sloan)  ...            116400                 22
7      8                       Stanford  ...            114600                 21
10    11               New York (Stern)  ...             96640                 20
12    13         Pennsylvania (Wharton)  ...            107852                 21
13    14                      HEC Paris  ...             66802                 16
14    15              Cornell (Johnson)  ...            107592                 21
16    17       Carnegie Mellon (Tepper)  ...            108272                 21
19    20         Northwestern (Kellogg)  ...            113100                 22
20    21               Emory (Goizueta)  ...             87200                 22
22    23                UCLA (Anderson)  ...            105160                 21
23    24                Michigan (Ross)  ...            105500                 20

[17 rows x 11 columns]``````
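The result of `isin()` is itself a Boolean `Series`, so it can be reversed with `~` to select everything that is not in the list. A minimal sketch, assuming a made-up DataFrame called `toy` with invented values:

```python
import pandas as pd

# Made-up toy data for illustration only
toy = pd.DataFrame({
    "School": ["A", "B", "C", "D"],
    "Country": ["US", "France", "Spain", "UK"],
})

in_list = toy["Country"].isin(["US", "France"])

us_or_france = toy[in_list]   # rows whose Country is in the list
elsewhere = toy[~in_list]     # rows whose Country is not in the list

print(us_or_france["School"].tolist())
print(elsewhere["School"].tolist())
```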

## 5.5 Control structures

The while loop

The `while` statement allows you to repeatedly execute a block of statements as long as a condition is `True`. A `while` statement can have an optional `else` clause, which runs once the condition becomes `False`. Commonly, the `while` control structure is called a loop, because it does something over and over again for as long as the test expression (condition) evaluates to `True`.

We generally use this type of control structure (over the `for` loop) when we don’t know beforehand the number of times the loop will iterate.

What is the expected output?

``````
counter = 0

while counter < 3:
    print("Inside loop")
    counter = counter + 1
else:
    print("Inside else")
``````
• Initialize the variable `counter` by assigning it the value zero.
• Evaluate the condition `counter < 3`.
• If the condition is `True`, execute the `print` statement, increment the counter by 1, and evaluate the condition again.
• If the condition `counter < 3` evaluates to `False`, go to the `else` clause and execute its `print` statement. Exit the loop.

The output of the code is below.

``````Inside loop
Inside loop
Inside loop
Inside else``````
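The counter example runs a fixed number of times. As a sketch of the case where the number of iterations is not known beforehand, the loop below keeps going until a value passes a threshold. The starting amount, the 5% growth rate, and the variable names are all made up for this illustration.

```python
balance = 1000.0  # hypothetical starting amount
years = 0

# Keep looping until the balance has at least doubled
while balance < 2000.0:
    balance = balance * 1.05  # grow by 5% each pass through the loop
    years = years + 1

print(years)
```

We did not have to work out the number of passes ourselves; the condition decides when the loop stops.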

A common use case for the `while` loop is the circumstance where you want to collect input from a user until they are done. The code below asks the user a question.

``````
print("Tell me about yourself. Enter one sentence at a time and press enter. When you are done type the word logout and press enter.")

while True:
    line = input('>')
    if line == "#":
        continue  # skip the rest of this iteration
    if line == "logout":
        break  # exit the loop
    print(line)

print("Thank you for sharing! Have a nice day.")
``````

Try out this code. See if you can understand the role of the `break` and `continue` statements.

• When the `continue` is executed it ends the current iteration and jumps back to the while statement and starts the next iteration. This occurs when the user types `#`.
• When the `break` statement is executed, the loop is exited. This occurs only when the user types `logout`.
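To trace `continue` and `break` without typing anything, here is a sketch that replaces `input()` with a made-up list of lines. The names `typed_lines`, `position`, and `echoed` are assumptions for this illustration.

```python
# Made-up lines standing in for what a user might type
typed_lines = ["I like data.", "#", "I use Python.", "logout", "never reached"]

echoed = []
position = 0

while True:
    line = typed_lines[position]
    position = position + 1
    if line == "#":
        continue  # skip the rest of this iteration, so "#" is not echoed
    if line == "logout":
        break     # exit the loop, so later lines are never read
    echoed.append(line)

print(echoed)
```

Only the two sentences are collected: the `#` line is skipped by `continue`, and everything after `logout` is cut off by `break`.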

The for loop

The `for` statement is another looping statement. It iterates over a sequence of objects, i.e., it goes through each item in a sequence (list, tuple, string) or other iterable object.

Let’s look at two examples:

Example 1

What is the output produced by this loop?

``````
mytuple = (1, 5, 9)

for i in mytuple:
    print(i)
else:
    print('The for loop is over')
``````

Try running the code yourself to observe how the `for` loop functions.

The `for` loop presents a shortcut for writing the following code:

``````
mytuple = (1, 5, 9)
i = 0
while i < len(mytuple):
    print(mytuple[i])
    i = i + 1
else:
    print('The loop is over')
``````

Essentially, a for loop is a condensed version of a while loop. It avoids explicitly initializing and incrementing the index variable, and the condition `i < len(mytuple)` is implied.
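The same pattern works for the other sequence types mentioned above. Here is a brief sketch that loops over a string and over a list; the variable names and the short list of school names are chosen just for this illustration.

```python
# Looping over a string visits one character at a time
letters = []
for ch in "abc":
    letters.append(ch)
print(letters)

# Looping over a list visits one item at a time
name_lengths = []
for name in ["Harvard", "Columbia", "Stanford"]:
    name_lengths.append(len(name))
print(name_lengths)
```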

Example 2

In this example, we are finding the sum of all numbers stored in a list.

``````
# List of numbers
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

# variable to store the sum (named total so it does not hide Python's built-in sum function)
total = 0

# iterate over the list
for val in numbers:
    total = total + val

# Output: The sum is 48
print("The sum is", total)
``````
``The sum is 48``
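Loops and conditionals can be combined. As a sketch, the variation below adds up only the even numbers in the same list by placing an `if` statement inside the `for` loop. The name `even_total` is made up, and the `%` operator (remainder after division) is used here to test for even numbers.

```python
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

even_total = 0

for val in numbers:
    # val % 2 is the remainder after dividing by 2; it is 0 for even numbers
    if val % 2 == 0:
        even_total = even_total + val

print("The sum of the even numbers is", even_total)
```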

## 5.6 Summary

• Conditional statements may use comparison and logical operators to evaluate the truth of an expression.
• We can ask more complex questions of our data using the logical operators `|` for “or” and `&` for “and”.
• The pandas `isin()` method is used to filter DataFrames.
• `for` and `while` are looping statements that iterate over a sequence of objects or block of code, respectively.

## Exercise 5.1

1. Filter the `mba` DataFrame to show only those schools where the average salary is less than \$100,000 and the tuition is greater than or equal to \$100,000.

## Exercise 5.2

1. Create a new DataFrame called `schoolOptions` that pulls in the top 10 schools where average salary is greater than or equal to \$100,000.

2. Create a new column in the above DataFrame that calculates the monthly tuition. Call this column `monthlyTuition`, then sort the new DataFrame from smallest to largest (hint: use `dataframe.sort_values(by=["columnName"])`).

## Assignment 5

1. Load the Big Mac Index data from The Economist (published on GitHub) for July 2019. It’s available at: http://becomingvisual.com/python4data/bigmac.csv

The value of a BigMac in the United States is \$1. Create a new DataFrame called `inflatedBurger` for all countries whose Big Mac price in dollars (`dollar_price`) was greater than purchasing power parity (`dollar_ppp`).

1. Create a for loop that prints “Do not buy a Big Mac when you visit _________” for each country in your new table.

2. Create a list of all country names where dollar price is greater than dollar purchasing power parity (dollar_ppp).

3. Remove Euro Area from your list since it is not a country.

4. Why was the index used not 6?