Lesson 2 Data types and variables

Welcome to lesson 2. In this lesson we’ll focus on creating variables and understanding the different data types. Follow along with this tutorial by creating a new notebook in jupyterhub named lesson02.ipynb and entering the code snippets presented.

  • Variable assignment
  • Data types
  • Variable reassignment
  • Casting

2.1 Variable assignment

One of the most important things to be able to do in any programming language is store information in variables. You can think of a variable as a label for one or more pieces of information. When doing analysis in python, all of your data (the variables you measured in your study) will be stored as variables. You can also create variables for other things, which we will learn later on in the course.

Let’s create a few simple variables to represent customer demographics:

Variable        Value
customer id     12345
first name      Wanda
last name       Wonderly
gender          Female
married         False
annual income   240,000

We’ll begin with customer id. As a rule, a variable name must be a single word without spaces, otherwise python will raise a SyntaxError. An underscore (_) works well in place of a space to join words in a variable name.

It’s also best practice to use lowercase letters for variable names and to use names that are meaningful. For example, customer_id is more descriptive than ci.

Now that we have some basic rules down, let’s create the variable customer_id. We do this by using a special operator called the assignment operator, denoted by a single equals sign, =. The name of the variable is on the left side of the equals sign and the value being assigned is on the right.

customer_id = 12345

To see the value referenced in a variable, type in the variable name.

customer_id
12345
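
If you are unsure whether a name follows the rules above, you can ask python directly. The snippet below is a small sketch using the built-in string check isidentifier(); the name 2nd_customer is a made-up example, not part of our customer data.

```python
# isidentifier() reports whether a piece of text could be used as a variable name.
with_underscore = "customer_id".isidentifier()
with_space = "customer id".isidentifier()
leading_digit = "2nd_customer".isidentifier()  # hypothetical name for illustration

print(with_underscore)  # True
print(with_space)       # False: names cannot contain spaces
print(leading_digit)    # False: names cannot start with a digit
```

Note that isidentifier() only checks the spelling rules. It does not warn you about reserved words such as True or if.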

2.2 Data types

You can also determine the data type of the variable. “A data type is a particular kind of data item, as defined by the values it can take, the programming language used (in our case, python), or the operations that can be performed on it”.

Numeric data types: int and float

The data type for customer_id is determined by the value it is assigned. Since it is assigned 12345, a whole number, the data type is a numeric data type called an integer, denoted as int. If it were -12345, would it still be of type int? Yes, since integers are positive or negative whole numbers (or zero). However, if it were assigned 12345.0, python would consider it another type of numeric data called a float. A float is a floating point number, positive or negative, that has a decimal point. In short, if a value is written with a decimal point, then it is a float. If it is a whole number without one, then it is an int.

To determine the data type of an existing variable use the type() function and pass in the variable name.

Try it:

type(customer_id)
<class 'int'>
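
To see the int and float rules side by side, here is a short sketch. The variable names are made up for this illustration, and isinstance() is a built-in function that answers True or False when asked whether a value is of a given type.

```python
whole_positive = 12345
whole_negative = -12345
with_decimal = 12345.0

print(isinstance(whole_positive, int))    # True
print(isinstance(whole_negative, int))    # True: negative whole numbers are ints too
print(isinstance(with_decimal, float))    # True: the decimal point makes it a float
print(whole_positive == with_decimal)     # True: same value, different data types
```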

Let’s create a variable named salary and assign it the value of 240,000.

salary=240,000 # this will cause an error.
salary
(240, 0)

What is the error? You’ll notice that python still executed the code without complaint. This is a logical error: the code ran, but the result was not what we intended. When we check the data type as shown below, it is of type tuple. This is a data structure we’ll learn more about in lesson 3.

type(salary)
<class 'tuple'>

What caused this mistake?

Well, it was the thousands separator (the comma) in 240,000. Python interprets the assignment to salary as two values separated by a comma, 240 and 000, and stores them together as a tuple.

To correct it, remove the thousands separator.

salary = 240000
type(salary)
<class 'int'>
salary
240000
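
If you miss the visual grouping that the comma gave you, one option is an underscore inside the number. This is a sketch that assumes Python 3.6 or later, where underscores between digits are allowed and ignored. The variable mistaken is a made-up name used only to repeat the comma mistake.

```python
salary = 240_000   # the underscore is only for human readers
print(salary)                 # 240000
print(type(salary) is int)    # True

mistaken = 240,000  # the comma still creates a tuple of two values
print(type(mistaken) is tuple)  # True
```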

However, a long number without separators can be hard to read in reports and graphs. Consider using the .format() function to make your output more readable.

Number formatting

For example,

salary=240000
print("${:,.0f}".format(salary))
$240,000

Inside the curly braces, the colon starts the format code, the comma adds a thousands separator, and the .0f displays the number with zero decimal places. This is followed by the .format() function that passes in the variable to format.
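
The same format code also works inside an f-string, which is another way to write this in Python 3.6 or later. The sketch below is offered as an alternative, not as the approach this lesson relies on; the names with_format and with_fstring are made up for the comparison.

```python
salary = 240000

with_format = "${:,.0f}".format(salary)
with_fstring = f"${salary:,.0f}"   # the variable goes inside the braces

print(with_fstring)                 # $240,000
print(with_format == with_fstring)  # True
```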

The following table shows various ways to format numbers using Python’s str.format() function, including examples for both float formatting and integer formatting (Katz, 2012).

To run the examples, use print("FORMAT".format(NUMBER)). So to get the output of the first example, you would run: print("{:.2f}".format(3.1415926)) (Katz, 2012, para. 3).

Number        Format     Output       Description
3.1415926     {:.2f}     3.14         2 decimal places
3.1415926     {:+.2f}    +3.14        2 decimal places with sign
-1            {:+.2f}    -1.00        2 decimal places with sign
2.71828       {:.0f}     3            No decimal places
5             {:0>2d}    05           Pad number with zeros (left padding, width 2)
5             {:x<4d}    5xxx         Pad number with x’s (right padding, width 4)
10            {:x<4d}    10xx         Pad number with x’s (right padding, width 4)
1000000       {:,}       1,000,000    Number format with comma separator
0.25          {:.2%}     25.00%       Format percentage
1000000000    {:.2e}     1.00e+09     Exponent notation

Use the format code syntax {field_name:conversion}, where field_name specifies the index number of the argument to the str.format() method, and conversion refers to the conversion code of the data type.
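
To make the field_name part concrete, here is a small sketch that passes three values to str.format() and refers to them by index number. The sentence itself is made up for illustration; only the customer values come from our table.

```python
# {0}, {1} and {2} pick the first, second and third argument of format().
# An index can be reused, and a format code can follow the colon.
message = "{0} {1} earns ${2:,.0f} per year. Welcome, {0}!".format(
    "Wanda", "Wonderly", 240000
)
print(message)
# Wanda Wonderly earns $240,000 per year. Welcome, Wanda!
```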

String data types

Next, let’s create the other variables and assign them to the following values:

Variable        Value     Status
customer id     12345     Created as customer_id
first name      Wanda
last name       Wonderly
gender          Female
married         False
annual income   240,000

Below, I’ve attempted to create the variable first name and assign it to Wanda. There are two mistakes in this code. Can you find them?

first name = Wanda #will throw an error 

First, the variable first name has a space in it. We know variables cannot contain spaces. Let’s fix it.

firstname = Wanda #will throw an error 

What’s still wrong with the code above?

If you don’t know, try to run it. What type of error is thrown?

You will likely receive an error message that looks like the following:

NameError: name 'Wanda' is not defined

What do you think is wrong?

Well, it has something to do with Wanda. Without quotation marks, python reads Wanda as the name of a variable, and no variable with that name has been defined. When we assign values that contain letters, or letters AND numbers, those values need to be enclosed in double (or single) quotation marks. (There are some exceptions, like True and False. More on that soon!)

We can fix this code by putting Wanda in double quotes:

firstname = "Wanda"

The type of data assigned to firstname is called a string. Strings are sequences of character data. The string type in Python is called str.

String literals may be delimited using either single or double quotes. All the characters between the opening delimiter and matching closing delimiter are part of the string. For example, when we used the print() function in lesson 1, we enclosed the text in double quotes. We could print out the name Wanda three ways.

Way 1: Using double quotes:

print("Wanda")
Wanda

Way 2: Using single quotes:

print('Wanda')
Wanda

Way 3: Referencing the variable firstname

print(firstname)
Wanda
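
Having two kinds of quotes is handy when the text itself contains a quotation mark. The sketch below uses made-up example text; the backslash in the last assignment is an escape character that tells python the apostrophe is part of the string.

```python
# Double quotes on the outside let us use an apostrophe inside.
account_label = "Wanda's account"
# Single quotes on the outside let us use double quotes inside.
greeting = 'She said "hello"'
# A backslash escapes a quote that matches the delimiters.
escaped_label = 'Wanda\'s account'

print(account_label)                   # Wanda's account
print(greeting)                        # She said "hello"
print(account_label == escaped_label)  # True
```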

We can determine the data type of firstname and see that the type str is returned, which is a string data type.

type(firstname)
<class 'str'>

Let’s move on to creating the two other string variables: lastname and gender and print them out. You’ll notice in the code below that I’ve used the print() function to print out both a string literal and the value referenced in the string variable together in a single statement.

lastname = "Wonderly"
gender = "Female"

print("I just created two more variables. lastname =", lastname, "and gender=", gender)
I just created two more variables. lastname = Wonderly and gender= Female
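
When print() receives several values, Python 3 joins them with a single space. Here is a short sketch of two ways to control that spacing, assuming Python 3: the sep argument of print(), and building the whole message first with str.format(). The variable summary is a made-up name.

```python
lastname = "Wonderly"
gender = "Female"

# By default print() puts one space between its arguments.
print("lastname =", lastname)          # lastname = Wonderly

# The sep argument replaces that space with something else.
print("lastname", lastname, sep="=")   # lastname=Wonderly

# str.format() builds one string, so the spacing is exactly what we typed.
summary = "lastname = {}, gender = {}".format(lastname, gender)
print(summary)                         # lastname = Wonderly, gender = Female
```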

There are many functions in python that act exclusively on strings. For example, if you wanted to change a string variable to all upper or lowercase letters you could use the upper() or lower() functions, respectively.

The format is variablename.upper() and variablename.lower().

Let’s print out lastname and gender using these functions to change the output.

lastname.upper()
'WONDERLY'
gender.lower()
'female'

Boolean data types

Has lastname changed its referenced value to uppercase WONDERLY?

You can test this by using the isupper() function as shown below.

lastname.isupper()
False

Are you surprised by the result?

You’ll notice that the value of False is returned. False is a Boolean value. Boolean values can only be True or False. The function isupper() returns a Boolean value. The value returned in this example is False, indicating that the value of lastname is not in uppercase lettering.

This may be confusing to you. Previously, you used the upper() function to print out lastname in uppercase lettering. However, upper() returns a new string and leaves the original unchanged, and we did not update lastname with the result of lastname.upper(). We’ll learn how to update variables in the section below.
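
A quick way to convince yourself of this is to keep the uppercase result under a separate name and compare the two. In this sketch, shouted is a made-up variable name.

```python
lastname = "Wonderly"
shouted = lastname.upper()   # the uppercase copy is stored under a new name

print(lastname)              # Wonderly  (the original is untouched)
print(shouted)               # WONDERLY
print(lastname.isupper())    # False
print(shouted.isupper())     # True
```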

For now, let’s continue the discussion on Boolean data types.

In our demographic data we have a variable named married. Let’s create the variable and assign it the value of False.

married = False
type(married)
<class 'bool'>

You’ll notice that the type returned for the married variable is bool, meaning a Boolean data type. When using Boolean data types, never put double or single quotes around the values. True and False are special reserved words in python. When used without quotes they indicate that values are Boolean.
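
Here is a sketch of what goes wrong if the quotes sneak in. The variable married_text is a made-up name holding the quoted version. The bool() function used here converts a value to a Boolean, and it treats any non-empty string as True, even the text "False".

```python
married = False
married_text = "False"   # quoted, so this is a string, not a Boolean

print(type(married) is bool)       # True
print(type(married_text) is str)   # True
print(married == married_text)     # False: a Boolean is not equal to text
print(bool(married_text))          # True: a non-empty string counts as True
```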

In the upcoming lessons we’ll use Boolean variables with conditional statements and logical operators.

For example, we can use an if statement to see if our customer is married or not. Note the precise syntax of the statements below, including the colons and indentation.

if married:
    print("This customer is married.")
else:
    print("This customer is not married.")
This customer is not married.

This statement just tests to see if married is True. Since it is False, the print statement below the else is the one that runs.
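
As a small preview of logical operators, the sketch below uses not, which flips a Boolean value. The names single and message are made up for this illustration; we will cover these operators properly in the upcoming lessons.

```python
married = False
single = not married   # not turns False into True, and True into False
print(single)          # True

if single:
    message = "This customer is not married."
else:
    message = "This customer is married."
print(message)         # This customer is not married.
```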

2.3 Variable reassignment

Updating a variable is easy. To save the result from lastname.isupper(), we can simply assign it back to lastname as shown below.

lastname = lastname.isupper()
print(lastname) #prints out the result
False

Did you know that updating a variable with a different type of data will change its data type?

For example, the variable lastname is now of type bool. We can prove it using the type() function.

type(lastname)
<class 'bool'>

If we reassign lastname to the value of 20 (without double or single quotes around the value), it would be of type int.

lastname = 20
type(lastname)
<class 'int'>

Let’s go back and reassign lastname the correct value.

lastname = "Wonderly"
type(lastname)
<class 'str'>
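
The whole journey can be replayed in one cell. This sketch records the name of the data type after each reassignment; first_type, second_type and third_type are made-up names, and type(...).__name__ simply gives the short name of a type as text.

```python
lastname = "Wonderly"
first_type = type(lastname).__name__    # text, so str

lastname = 20
second_type = type(lastname).__name__   # whole number, so int

lastname = "Wonderly"
third_type = type(lastname).__name__    # back to str

print(first_type, second_type, third_type)  # str int str
```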

2.4 Changing data types

If you wanted salary to be of type float you could simply reassign salary the value of 240000.00 or “cast” the variable to a different type.

Option 1: reassign and use the .format() function.

salary = 240000.00
type(salary)
<class 'float'>
print("${:,.2f}".format(salary))
$240,000.00

Option 2: cast the variable to type float using the float() function and use the .format() function.

salary = 240000
salary = float(salary)
type(salary)
<class 'float'>
print("${:,.2f}".format(salary))
$240,000.00

You can change from one data type to another using functions such as:

Function   Description                                                                         Example Input   Example Output
int()      Converts an argument to an integer value from a number or a string                  int("5")        5
float()    Converts an argument to a floating point decimal value from a number or a string    float(5)        5.0
str()      Converts an argument to a string value                                              str(5)          '5'
bool()     Converts an argument to a Boolean value                                             bool(0)         False
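
The examples in the table can be run as written. The sketch below repeats them and adds two conversions that often surprise newcomers; all variable names are made up for this illustration.

```python
as_int = int("5")      # string to integer
as_float = float(5)    # integer to float
as_str = str(5)        # integer to string
as_bool = bool(0)      # zero converts to False

print(as_int, as_float, repr(as_str), as_bool)  # 5 5.0 '5' False

truncated = int(3.99)  # int() drops the decimal part, it does not round
nonzero = bool(-2)     # any non-zero number converts to True
print(truncated, nonzero)  # 3 True
```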

While there are functions to change data from one type to another, the value must be able to be converted to the right type.

For example, it’s easy to convert x = "23" to x = 23 by using x = int(x). However, you cannot convert name = "Kelly" to an integer using int(name). Python raises a ValueError, simply because the name Kelly cannot be converted to a number.
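
Here is a sketch of both cases. It uses try and except, which we have not covered yet: the code under try is attempted, and if a ValueError occurs the code under except runs instead of stopping the notebook. The variable conversion_failed is a made-up name.

```python
x = "23"
x = int(x)       # works: the text "23" represents a number
print(x + 1)     # 24

name = "Kelly"
conversion_failed = False
try:
    int(name)    # fails: the text "Kelly" does not represent a number
except ValueError:
    conversion_failed = True

print(conversion_failed)  # True
```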

Here’s a good reference to learn more about data types: https://realpython.com/python-data-types/

2.5 Summary

  • The equal sign = denotes assignment in python.
  • We use the assignment operator to assign values to a variable.
  • Variables must be named as a single word without spaces.
  • Variable names should be descriptive and written in lowercase.
  • The value referenced in a variable can be accessed by typing out the variable name in a code chunk. The value will be returned.
  • Variables that reference values that are non-numeric (with some exceptions) must be assigned values that are enclosed in double or single quotes and will be of type str.
  • The output of numbers can be formatted using the format() function.
  • A bool data type is Boolean or logical data with the value of True or False.
  • Variables can be reassigned new values. Reassigning a variable from one data type to another, such as from an int value to a str value, will change the data type of the reassigned variable.
  • Casting converts a variable from one data type to another. The value must be compatible and convert-able to the target data type.
  • Functions to cast to different data types include int(), float(), str(), and bool().

2.6 Quiz

1. How would you assign the integer value of 138 to a variable named weight?

  1. 138 = weight
  2. weight = 138
  3. weight = 138.00
  4. weight ="138"
  5. weight ='138'

2. What data type would python assign to distance_run=3.25?

  1. bool
  2. int
  3. str
  4. float
  5. complex

3. What function would you use to determine the data type of a variable?

  1. str()
  2. print(datatype())
  3. data.type()
  4. class()
  5. type()

4. Which of the following functions work only with strings?

  1. upper()
  2. lower()
  3. isupper()
  4. 1 and 2
  5. 1, 2, and 3

5. What is the data type of this variable: x=True

  1. bool
  2. str
  3. int
  4. char
  5. float

6. Which statement is correct given that you want to print the value of the following variable: name ="Kelly Jones" followed by “is a customer in our system”.

  1. print("Kelly Jones is a customer in our system")
  2. print("Kelly Jones is a", name, "customer in our system")
  3. print('Kelly Jones is a', name, 'customer in our system')
  4. print(name, 'is a customer in our system')
  5. print(name "is customer in our system"")

7. How would you update the following variable named Present to the Boolean value of False?

  1. present = False
  2. present = "false"
  3. Present = False
  4. Present ='FALSE'
  5. present = 'False'

8. Which of the following statements will result in an error:

  1. int("3")
  2. string(3)
  3. myvariable = int("3")
  4. str(3)
  5. str("mystring")

9. True or False

variable01="44" and variable02="33".

If you wanted to add variable01 to variable02 and get the result of 77, you would first need to cast variable01 and variable02 to an integer using the int() function and update the variables with the new integer values.

  1. True
  2. False

10. True or False

The following statement would result in a logical error if you wanted to assign the variable annual_rent to a value of $3,000.

annual_rent=3,000

  1. True
  2. False

2.7 Exercises

2.7.1 Exercise 2.1

1. Building upon exercise 1.2, create variable buy and variable sell for Jen.

2. Restructure your percent return formula using your new variables and assign this to its own ‘tot_return’ variable.

3. What data type is returned by the output (answer) from question 2?

2.7.2 Exercise 2.2

  1. You are working for Facebook’s Operations Intelligence team and are tasked with building a calculator to find the best ad sales representative each month, based on the largest (hint: max) monthly ad revenue. There are 5 sales representatives listed below with this month’s revenue. Set up the formula so that each month you simply change each representative’s revenue. Call the calculator rep_calculator.

Joe: $1,500,000
Sarah: $2,750,000
Jack: $560,000
Mark: $1,975,000
Shawn: $2,200,000

Answer:

Joe=1500000
Sarah=2750000
Jack=560000
Mark=1975000
Shawn=2200000

rep_calculator=max(Joe, Sarah, Jack, Mark, Shawn)

  2. Print out the following statement using your calculated value: This month’s Best Ad Sales Representative made: ____________ dollars!

Answer:

print("This month's Best Ad Sales Representative made:", rep_calculator, "dollars!")

2.8 Assignment 2

1. You work in baseball operations at the NY Mets and are purchasing baseball caps from NewEra for next season. NewEra sells two types of caps, a snapback and a flex cap. The price points for each hat are listed below. Based on forecasted demand for next season you plan to buy 40,000 snapbacks and 70,000 flex hats.

  • Flex=$12.45
  • Snapback=$10.60

Create variables for: snap_num (the number of snap caps), flex_num (the number of flex caps), along with the corresponding prices: snap_px, flex_px.

2. Create a total_cost variable that represents the total cost of your purchase of both hat types.

3. After running some analysis you realize that snapbacks are even less popular this year and you decide to reduce the number of snapbacks by 10,000 and increase the flex hats by 10,000. Adjust your variables for this change and calculate a new total cost.

2.9 Exam questions

1. What data type is the following variable?

aapl=202.59

  1. int
  2. float
  3. Str
  4. tuple
  5. num

2. The following code is entered into Python:

goog=500
amzn=100
cat=350
xom=550
jpm=600
max_shares=max(goog, amzn, cat, xom, jpm)
print(max_shares)

What is the output of the Python code above?

  1. SyntaxError
  2. Jpm
  3. 600
  4. NameError

3. Which statement below calculates the average jersey price?

Rivera=134.99
Alonso=124.99
Harper=119.99
Bregman=119.99
Betts=124.99

  1. nd(((Rivera+Alonso+Harper+Bregman+Betts)/5),2)
  2. round(((Rivera+Alonso+Harper+Bregman+Betts)//5),2)
  3. rnd(((Rivera+Alonso+Harper+Bregman+Betts)//5),2)
  4. round(((Rivera+Alonso+Harper+Bregman+Betts)/5),2)

4. How would you format the variable savings=23000.23 to print out as Your savings is: $23,000.23

  1. print("${:,.2f}".format(savings)) [CORRECT]
  2. print("Your savings is: $" savings )
  3. print("{:,.1f}".format(savings))
  4. print("$(:,.2f)".format{savings})

References

Katz, M. (2012). Python String Format Cookbook. Available at: https://mkaz.blog/code/python-string-format-cookbook/