Lesson 2 Data types and variables
Welcome to lesson 2. In this lesson we’ll focus on creating variables and understanding the different data types. Follow along with this tutorial by creating a new notebook in JupyterHub named lesson02.ipynb and entering the code snippets presented.
- Variable assignment
- Data types
- Variable reassignment
- Casting
2.1 Variable assignment
One of the most important things to be able to do in any programming language is store information in variables. You can think of a variable as a label for one or more pieces of information. When doing analysis in Python, all of your data (the variables you measured in your study) will be stored in variables. You can also create variables for other purposes, which we will learn about later in the course.
Let’s create a few simple variables to represent customer demographics:
Variable | Value |
---|---|
customer id | 12345 |
first name | Wanda |
last name | Wonderly |
gender | Female |
married | False |
annual income | 240,000 |
We’ll begin with customer id. As a rule, variable names must be single words without spaces; otherwise Python will raise a `SyntaxError`. Underscores (`_`) work well for joining words in variable names.
It’s also best practice to use lowercase letters for variable names and to use names that are meaningful. For example, `customer_id` is more descriptive than `ci`.
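To check a candidate name against these rules, you can use the built-in string method `str.isidentifier()` together with `keyword.iskeyword()` from Python’s standard `keyword` module. This is a minimal sketch: the candidate names are hypothetical, and the checks only test whether a name is legal, not whether it is lowercase or descriptive.

```python
import keyword

# Hypothetical candidate names, made up for illustration
candidates = ["customer_id", "customer id", "ci", "2nd_customer", "class"]

results = {}
for name in candidates:
    # isidentifier(): letters, digits, and underscores only, no leading digit
    # iskeyword(): reserved words such as class or True cannot be variable names
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", "valid" if results[name] else "invalid")
```

Notice that `ci` passes both checks: it is a legal name, just not a descriptive one.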
Now that we have some basic rules down, let’s create the variable `customer_id`. We do this using a special operator called the assignment operator, denoted by a single equals sign, `=`. The name of the variable is on the left side of the equals sign, and the value being assigned is on the right.
customer_id = 12345
To see the value referenced by a variable, type the variable name as the last line of a code cell and run it.
customer_id
12345
2.2 Data types
You can also determine the data type of the variable. “A data type is a particular kind of data item, as defined by the values it can take, the programming language used (in our case, python), or the operations that can be performed on it”.
Numeric data types: int and float
The data type of `customer_id` is determined by the value it is assigned. Since it is assigned `12345`, a positive whole number, its data type is a numeric data type called an integer, denoted as `int`. If it were assigned `-12345`, would it still be of type `int`? Yes, since integers are positive or negative whole numbers. However, if it were assigned `12345.0`, Python would consider it another type of numeric data called a `float`. A `float` is a floating-point number: a positive or negative number with a decimal point. In short, if your data has decimal points, it is a `float`; if it is only whole numbers, it is an `int`.
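The snippet below is a quick sketch of these rules using a few illustrative values. It prints each value alongside the name of its type (`type(value).__name__` gives just the name, such as `int`). Note that a negative number with a decimal point, like -3.5, is still a `float`.

```python
# Python infers the data type from the value that is assigned
examples = [12345, -12345, 12345.0, -3.5]

for value in examples:
    # e.g. "12345 is of type int", "-3.5 is of type float"
    print(value, "is of type", type(value).__name__)
```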
To determine the data type of an existing variable, use the `type()` function and pass in the variable name.
Try it:
type(customer_id)
<class 'int'>
Let’s create a variable named `salary` and assign it the value of 240,000.
salary=240,000 # this causes a logical error
salary
(240, 0)
What is the error? Notice that Python still executed the code. This is a logical error: the code ran, but the output was not what we expected. When we check the data type, as shown below, we see that `salary` is of type `tuple`, a data structure we’ll learn more about in lesson 3.
type(salary)
<class 'tuple'>
What caused this mistake?
The thousands separator (the comma) in 240,000. Python interprets the comma as separating two values, 240 and 000 (which it reads as 0), and stores them together as a tuple.
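The sketch below makes this visible by checking how many values the comma produced. It also shows one alternative that goes slightly beyond this lesson: in Python 3.6 and later, underscores can be used as visual digit separators inside numeric literals, so `240_000` is simply the integer 240000. This assumes your notebook runs Python 3.6 or newer.

```python
# The comma tells Python to build a tuple of two separate integers
salary = 240,000
print(salary)       # (240, 0)
print(len(salary))  # 2 -> the values 240 and 000 (read as 0)

# Python 3.6+: underscores are allowed as digit separators in numeric literals
salary = 240_000
print(salary)                 # 240000
print(type(salary).__name__)  # int
```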
To correct it, remove the thousands separator.
salary=240000
type(salary)
<class 'int'>
salary
240000
However, a plain 240000 can be hard to read in reports and graphs. Consider using the `.format()` method to make your output more readable.
Number formatting
For example,
salary=240000
print("${:,.0f}".format(salary))
$240,000
Inside the braces, the colon starts the format specification: `,` adds a comma as the thousands separator, and `.0f` formats the number with zero decimal places. The `$` outside the braces is printed as-is. The `.format()` method then passes in the variable to be formatted.
The following table shows various ways to format numbers using Python’s `str.format()` method, including examples of both float formatting and integer formatting (Katz, 2012). To run the examples, use `print("FORMAT".format(NUMBER))`. For example, to get the output of the first row, you would run `print("{:.2f}".format(3.1415926))` (Katz, 2012, para. 3).
Number | Format | Output | Description |
---|---|---|---|
3.1415926 | {:.2f} | 3.14 | 2 decimal places |
3.1415926 | {:+.2f} | +3.14 | 2 decimal places with sign |
-1 | {:+.2f} | -1.00 | 2 decimal places with sign |
2.71828 | {:.0f} | 3 | No decimal places |
5 | {:0>2d} | 05 | Pad number with zeros (left padding, width 2) |
5 | {:x<4d} | 5xxx | Pad number with x’s (right padding, width 4) |
10 | {:x<4d} | 10xx | Pad number with x’s (right padding, width 4) |
1000000 | {:,} | 1,000,000 | Number format with comma separator |
0.25 | {:.2%} | 25.00% | Format percentage |
1000000000 | {:.2e} | 1.00e+09 | Exponent notation |
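If you’d like to run every row of the table at once, the loop below is one way to do it. It pairs each format string with its number from the table and prints the result of `str.format()`; the variable name `examples` is just for illustration.

```python
# Each pair is (format string, number), taken from the table above
examples = [
    ("{:.2f}", 3.1415926),
    ("{:+.2f}", 3.1415926),
    ("{:+.2f}", -1),
    ("{:.0f}", 2.71828),
    ("{:0>2d}", 5),
    ("{:x<4d}", 5),
    ("{:x<4d}", 10),
    ("{:,}", 1000000),
    ("{:.2%}", 0.25),
    ("{:.2e}", 1000000000),
]

for fmt, number in examples:
    # str.format() fills the {} placeholder using the spec after the colon
    print(fmt, "->", fmt.format(number))
```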
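Placeholders follow the syntax `{field_name:format_spec}`, where `field_name` is the index (or name) of the argument passed to the `str.format()` method, and `format_spec` controls how that value is displayed: alignment, padding, width, thousands separator, precision, and a type code such as `d`, `f`, `%`, or `e`. When `field_name` is left empty, as in the table above, the arguments are used in order.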
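As a small illustration of explicit field names, the sketch below reuses values from this lesson. `{0}` and `{1}` refer to the first and second arguments passed to `format()`, and `,.0f` after the colon is the format specification applied to the second one. The variable name `message` is hypothetical.

```python
name = "Wanda"
salary = 240000

# {0} and {1} are field names: positions of the arguments passed to format()
# ",.0f" after the colon is the format spec for argument 1 (salary)
message = "{0} earns ${1:,.0f} per year.".format(name, salary)
print(message)  # Wanda earns $240,000 per year.
```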
String data types
Next, let’s create the other variables and assign them to the following values:
Variable | Value | Status |
---|---|---|
customer id | 12345 | Created as customer_id |
first name | Wanda | |
last name | Wonderly | |
gender | Female | |
married | False | |
annual income | 240,000 | |
Below, I’ve attempted to create the variable first name and assign it the value Wanda. There are two mistakes in this code. Can you find them?
first name = Wanda #will throw an error
First, the variable name first name has a space in it, and we know variable names cannot contain spaces. Let’s fix it.
firstname = Wanda #will throw an error
What’s still wrong with the code above?
If you don’t know, try to run it. What type of error is thrown?
You will likely receive an error message that looks like the following:
NameError: name 'Wanda' is not defined
What do you think is wrong?
Well, it probably has something to do with Wanda. Without quotation marks, Python treats Wanda as the name of a variable, and because no variable named Wanda exists, it raises a `NameError`. Values that contain letters, or letters and numbers, need to be enclosed in double (or single) quotation marks. (There are some exceptions, like `True` and `False`. More on that soon!)
We can fix this code by putting Wanda in double quotes:
firstname = "Wanda"
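To compare the broken and fixed versions side by side, the sketch below runs the unquoted assignment inside a `try`/`except` block so the notebook keeps going after the `NameError`. The `try`/`except` wrapper is only there for demonstration; you won’t need it for ordinary assignments, and the variable `message` is a hypothetical name for holding the error text.

```python
# Without quotes, Python looks for a variable named Wanda, which does not exist
try:
    firstname = Wanda
except NameError as err:
    message = str(err)
    print("NameError:", message)  # NameError: name 'Wanda' is not defined

# With quotes, "Wanda" is a string literal, so the assignment works
firstname = "Wanda"
print(firstname)  # Wanda
```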
The type of data assigned to `firstname` is called a string. Strings are sequences of character data. The string type in Python is called `str`.
String literals may be delimited using either single or double quotes. All the characters between the opening delimiter and the matching closing delimiter are part of the string. For example, when we used the `print()` function in lesson 1, we enclosed the text in double quotes. We could print out the name Wanda in three ways.
Way 1: Using double quotes:
print("Wanda")
Wanda
Way 2: Using single quotes:
print('Wanda')
Wanda
Way 3: Referencing the variable firstname
print(firstname)
Wanda
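A quick check confirms that the choice of quote does not change the string itself. One practical reason to prefer double quotes is text that contains an apostrophe; the variable `note` below is a hypothetical example.

```python
# Single and double quotes produce exactly the same string
print('Wanda' == "Wanda")  # True

# Double quotes let the text contain an apostrophe without escaping it
note = "Wanda's account"
print(note)  # Wanda's account
```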
We can check the data type of `firstname` and see that `str`, the string data type, is returned.
type(firstname)
<class 'str'>
Let’s move on to creating the two other string variables, `lastname` and `gender`, and print them out. You’ll notice in the code below that I’ve used the `print()` function to print string literals and the values referenced by the string variables together in a single statement.
lastname = "Wonderly"
gender = "Female"
print("I just created two more variables. lastname =",lastname, "and gender=",gender)
I just created two more variables. lastname = Wonderly and gender= Female
There are many functions in Python, called string methods, that work only on strings. For example, if you wanted to change a string variable to all uppercase or all lowercase letters, you could use the `upper()` or `lower()` methods, respectively. The format is `variablename.upper()` and `variablename.lower()`.
Let’s print out lastname and gender using these functions to change the output.
lastname.upper()
'WONDERLY'
gender.lower()
'female'
Boolean data types
Has `lastname` changed its referenced value to uppercase WONDERLY?
You can test this by using the `isupper()` method as shown below.
lastname.isupper()
False
Are you surprised by the result?
You’ll notice that the value `False` is returned. `False` is a Boolean value; Boolean values can only be `True` or `False`. The `isupper()` method returns a Boolean value, and here it returns `False`, indicating that the variable `lastname` is not in uppercase lettering.
This may be confusing. Previously, you used the `upper()` method to display `lastname` in uppercase letters. However, `upper()` returns a new string and leaves the original unchanged. Because we did not update `lastname` with the result of `lastname.upper()`, it still references Wonderly. We’ll learn how to update variables in the section below.
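The sketch below shows the difference explicitly: `upper()` hands back a new string, which we store in a separate, hypothetically named variable `upper_lastname`, while `lastname` keeps its original value.

```python
lastname = "Wonderly"

# upper() returns a NEW string; lastname itself is unchanged
upper_lastname = lastname.upper()
print(upper_lastname)            # WONDERLY
print(lastname)                  # Wonderly
print(upper_lastname.isupper())  # True
print(lastname.isupper())        # False
```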
For now, let’s continue the discussion on Boolean data types.
In our demographic data we have a variable named `married`. Let’s create the variable and assign it the value `False`.
married =False
type(married)
<class 'bool'>
You’ll notice that the type returned for the `married` variable is `bool`, meaning a Boolean data type. When using Boolean values, never put double or single quotes around them. `True` and `False` are special reserved words (keywords) in Python; written without quotes, they are Boolean values. Written with quotes, as in "False", they are just strings.
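To see why the quotes matter, the sketch below compares the Boolean `False` with the string "False". It uses `bool()`, which is covered in section 2.4: any non-empty string converts to `True`, so a quoted "False" would behave like `True` in an `if` statement. The name `married_text` is hypothetical.

```python
married = False          # Boolean: no quotes
married_text = "False"   # quoted, so this is a string

print(type(married).__name__)       # bool
print(type(married_text).__name__)  # str

# Any non-empty string converts to True, even the text "False"
print(bool(married_text))  # True
```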
In the upcoming lessons we’ll use Boolean variables with conditional statements and logical operators.
For example, we can use an `if` statement to check whether our customer is married. Note the precise syntax of the statements below, including the colons and indentation.
if married:
    print("This customer is married.")
else:
    print("This customer is not married.")
This customer is not married.
This statement tests whether `married` is `True`. Since it is `False`, the `print()` call under `else` runs instead.
2.3 Variable reassignment
Updating a variable is easy. To save the result of `lastname.isupper()`, we can simply assign it back to `lastname`, as shown below.
lastname = lastname.isupper()
print(lastname) #prints out the result
False
Did you know that updating a variable with a different type of data will change its data type?
For example, the variable `lastname` is now of type `bool`. We can prove it using the `type()` function.
type(lastname)
<class 'bool'>
If we reassign `lastname` the value 20 (without double or single quotes around the value), it will be of type `int`.
lastname =20
type(lastname)
<class 'int'>
Let’s go back and reassign `lastname` the correct value.
lastname ="Wonderly"
type(lastname)
<class 'str'>
2.4 Changing data types
If you wanted `salary` to be of type `float`, you could simply reassign `salary` the value 240000.00, or “cast” the variable to a different type.
Option 1: reassign the value and use the `.format()` method.
salary=240000.00
type(salary)
<class 'float'>
print("${:,.2f}".format(salary))
$240,000.00
Option 2: cast the variable to type `float` using the `float()` function and use the `.format()` method.
salary=240000
salary=float(salary)
type(salary)
<class 'float'>
print("${:,.2f}".format(salary))
$240,000.00
You can change from one data type to another using functions such as:
Function | Description | Example Input | Example Output |
---|---|---|---|
int() | Converts a number or a string to an integer value | int("5") | 5 |
float() | Converts a number or a string to a floating-point value | float(5) | 5.0 |
str() | Converts an argument to a string value | str(5) | '5' |
bool() | Converts an argument to a Boolean value | bool(0) | False |
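Running the table’s examples directly is a good way to check them. The sketch below also shows one detail worth knowing before you cast: `int()` applied to a float drops the decimal part instead of rounding, so use `round()` if you want the nearest whole number.

```python
# The examples from the table above
print(int("5"))      # 5
print(float(5))      # 5.0
print(repr(str(5)))  # '5' (repr() shows the quotes)
print(bool(0))       # False

# int() truncates a float toward zero; round() rounds to the nearest integer
print(int(3.99))     # 3
print(round(3.99))   # 4
```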
While there are functions to change data from one type to another, the value must be convertible to the target type.
For example, it’s easy to convert `x = "23"` to the integer 23 using `x = int(x)`. However, you can’t convert `name = "Kelly"` to an integer using `int(name)`, because the text Kelly cannot be interpreted as a number; Python raises a `ValueError` instead.
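The sketch below shows both cases. The successful cast turns "23" into a number you can do arithmetic with, while the failed cast is wrapped in `try`/`except` only so the cell keeps running; the name `error_message` is hypothetical.

```python
x = "23"
x = int(x)    # "23" contains only digits, so the cast succeeds
print(x + 1)  # 24

name = "Kelly"
try:
    int(name)  # "Kelly" cannot be read as a number
except ValueError as err:
    error_message = str(err)
    print("Cannot convert:", error_message)
    # Cannot convert: invalid literal for int() with base 10: 'Kelly'
```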
Here’s a good reference to learn more about data types: https://realpython.com/python-data-types/
2.5 Summary
- The equals sign `=` denotes assignment in Python. We use this assignment operator to assign values to variables.
- Variables must be named as a single word without spaces.
- Variable names should be descriptive and written in lowercase.
- The value referenced by a variable can be displayed by typing the variable name in a code cell.
- Non-numeric values (with some exceptions, such as `True` and `False`) must be enclosed in double or single quotes when assigned; such values are of type `str`.
- The output of numbers can be formatted using the `format()` method.
- A `bool` data type holds Boolean (logical) data with the value `True` or `False`.
- Variables can be reassigned new values. Reassigning a variable from one data type, such as an `int`, to another, such as a `str`, changes the data type of the variable.
- Casting converts a variable from one data type to another. The value must be convertible to the target data type.
- Functions to cast to different data types include `int()`, `float()`, `str()`, and `bool()`.
2.6 Quiz
1. How would you assign the integer value 138 to a variable named weight?
138 = weight
weight = 138
weight = 138.00
weight ="138"
weight ='138'
2. What data type would Python assign to distance_run=3.25?
bool
int
str
float
complex
3. What function would you use to determine the data type of a variable?
str()
print(datatype())
data.type()
class()
type()
4. Which of the following functions work only with strings?
a) upper()
b) lower()
c) isupper()
d) a and b
e) a, b, and c
5. What is the data type of this variable: x=True
bool
str
int
char
float
6. Which statement is correct, given that you want to print the value of the following variable: name ="Kelly Jones" followed by “is a customer in our system”?
print("Kelly Jones is a customer in our system")
print("Kelly Jones is a", name, "customer in our system")
print('Kelly Jones is a', name, 'customer in our system')
print(name, 'is a customer in our system')
print(name "is customer in our system"")
7. How would you update the following variable named Present to the Boolean value False?
present = False
present = "false"
Present = False
Present ='FALSE'
present = 'False'
8. Which of the following statements will result in an error?
- int("3")
- string(3)
- myvariable = int("3")
- str(3)
- str("mystring")
9. True or False
Given variable01="44" and variable02="33": if you wanted to add variable01 to variable02 and get the result 77, you would first need to cast variable01 and variable02 to integers using the int() function and update the variables with the new integer values.
- True
- False
10. True or False
The following statement would result in a logical error if you wanted to assign the variable annual_rent to a value of $3,000.
annual_rent=3,000
- True
- False
2.7 Exercises
2.7.1 Exercise 2.1
1. Building upon exercise 1.2, create the variables buy and sell for Jen.
2. Restructure your percent return formula using your new variables and assign this to its own ‘tot_return’ variable.
3. What data type is returned by the output (answer) from question 2?
2.7.2 Exercise 2.2
- You are working for Facebook’s Operations Intelligence team and are tasked with building a calculator that finds the best ad sales representative each month, based on the largest (hint: max) monthly ad revenue. The 5 sales representatives and this month’s revenue are listed below. Set up the formula so that each month you simply change each representative’s revenue. Call the calculator rep_calculator.
Joe: $1,500,000 Sarah: $2,750,000 Jack: $560,000 Mark: $1,975,000 Shawn: $2,200,000
Answer:
Joe=1500000
Sarah=2750000
Jack=560000
Mark=1975000
Shawn=2200000
rep_calculator=max(Joe, Sarah, Jack, Mark, Shawn)
- Print out the following statement using your calculated value:
This month’s Best Ad Sales Representative made: ____________ dollars!
Answer:
print("This month's Best Ad Sales Representative made:", rep_calculator, "dollars!")
2.8 Assignment 2
1. You work in baseball operations at the NY Mets and are purchasing baseball caps from NewEra for next season. NewEra sells two types of caps, a snapback and a flex cap. The price points for each hat are listed below. Based on forecasted demand for next season you plan to buy 40,000 snapbacks and 70,000 flex hats.
- Flex=$12.45
- Snapback=$10.60
Create variables for snap_num (the number of snapback caps), flex_num (the number of flex caps), and the corresponding prices, snap_px and flex_px.
2. Create a total_cost variable that represents the total cost of your purchase of both hat types.
3. After running some analysis you realize that snapbacks are even less popular this year and you decide to reduce the number of snapbacks by 10,000 and increase the flex hats by 10,000. Adjust your variables for this change and calculate a new total cost.
2.9 Exam questions
**1. What data type is the following variable?**
aapl=202.59
- int
- float
- Str
- tuple
- num
**2. The following code is entered into Python:**
goog=500
amzn=100
cat=350
xom=550
jpm=600
max_shares=max(goog, amzn, cat, xom, jpm)
print(max_shares)
What is the output of the Python code above?
- SyntaxError
- Jpm
- 600
- NameError
3. Which statement below calculates the average jersey price?
Rivera=134.99
Alonso=124.99
Harper=119.99
Bregman=119.99
Betts=124.99
nd(((Rivera+Alonso+Harper+Bregman+Betts)/5),2)
round(((Rivera+Alonso+Harper+Bregman+Betts)//5),2)
rnd(((Rivera+Alonso+Harper+Bregman+Betts)//5),2)
round(((Rivera+Alonso+Harper+Bregman+Betts)/5),2)
4. How would you format the variable savings=23000.23 to print out Your savings is: $23,000.23?
- print("Your savings is: ${:,.2f}".format(savings)) [CORRECT]
- print("Your savings is: $" savings )
- print("{:,.1f}".format(savings))
- print("$(:,.2f)".format{savings})
References
Katz, M. (2012). Python String Format Cookbook. Available at: https://mkaz.blog/code/python-string-format-cookbook/