Lesson 2 Data types and variables

Welcome to lesson 2. In this lesson we will learn how to create variables and understand different types of data. Follow along with this tutorial by creating a new IPython notebook named lesson02.ipynb and entering the code snippets presented.

  • Variable assignment
  • Data types
  • Variable reassignment
  • Changing data types
  • Summary
  • Exercise 2.1
  • Exercise 2.2
  • Assignment 2

2.1 Variable assignment

One of the most important things to be able to do in any programming language is store information in variables. You can think of a variable as a label for one or more pieces of information. When doing analysis in Python, all of your data (the variables you measured in your study) will be stored as variables. You can also create variables for other things, which we will learn later on in the course.

Let’s create a few simple variables to represent customer demographics:

Variable         Value
customer id      12345
first name       Wanda
last name        Wonderly
gender           Female
married          False
annual income    240,000

We’ll begin with customer id. As a rule, all variable names need to be single words without spaces; otherwise Python will throw a SyntaxError. Use underscores _ instead of spaces.

It’s also best practice to use lowercase letters for variable names and to choose names that are meaningful. For example, customer_id is more descriptive than ci and uses an underscore instead of a space.
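One quick way to test a candidate name against the no-spaces rule is the built-in string method isidentifier(). This is only a small sketch and not part of the original lesson; the candidate names are examples.

```python
# isidentifier() returns True when the text is a syntactically valid name
print("customer_id".isidentifier())   # True: underscore instead of a space
print("customer id".isidentifier())   # False: contains a space
print("ci".isidentifier())            # True: valid, though not descriptive
```

Note that isidentifier() only checks the spelling rules. It also returns True for reserved words, so it does not replace the keyword list below.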

Also, avoid using the following reserved Python words as variable names. Python 3.7 and later reserve async and await as well.

Keyword
and del from None True
as elif global nonlocal try
assert else if not while
break except import or with
class False in pass yield
continue finally is raise
def for lambda return
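If you would rather not memorize the table, the standard library's keyword module can list the reserved words for the Python version you are running. A minimal sketch:

```python
import keyword

# keyword.kwlist holds every reserved word for the running Python version
print(keyword.kwlist)

# keyword.iskeyword() tests a single name
print(keyword.iskeyword("class"))        # True
print(keyword.iskeyword("customer_id"))  # False
```

The exact contents of keyword.kwlist depend on your Python version, so the printed list may be slightly longer than the table.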

Now that we have some basic rules down, let’s create the variable customer_id. We do this by using a special operator called the assignment operator, denoted by a single equals sign, =. The name of the variable is on the left side of the equals sign and the value being assigned is on the right.

customer_id = 12345

To see the value referenced in a variable, type in the variable name.

customer_id
12345

2.2 Data types

You can also determine the data type of the variable. “A data type is a particular kind of data item, as defined by the values it can take, the programming language used (in our case, python), or the operations that can be performed on it”.

Numeric data types: int and float

The data type for customer_id is determined by the value it is assigned. Since it is assigned 12345, a whole number, the data type is a numeric data type called an integer, denoted as int. If it was -12345, would it still be of type int? Yes, since integers are positive or negative whole numbers. However, if it were assigned 12345.0, Python would consider it a float. A float is simply a floating point decimal. In short, if your data has decimal points, then it is a float. If it is only whole numbers, then it is an int.

To determine the data type of an existing variable use the type() function and pass in the variable name.

Try it:

type(customer_id)
<class 'int'>
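To check the int and float rules described in this section for yourself, you can also pass values straight into type(). A short sketch using the values discussed in the text:

```python
customer_id = 12345

print(type(customer_id))  # <class 'int'>
print(type(-12345))       # <class 'int'>: negative whole numbers are still integers
print(type(12345.0))      # <class 'float'>: the decimal point makes it a float
```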

Let’s create a variable named salary and assign it the value of 240,000.

salary = 240,000  # this will cause an error (a logic error, not an error message)
salary
(240, 0)

What is the error? You’ll notice that Python still executed the code without complaint. This is a logic error: the code ran, but the result was not what we intended. When we check the data type as shown below, it is of type tuple, a data structure we’ll learn more about in lesson 3.

type(salary)
<class 'tuple'>

What caused this mistake?

The thousands separator (the comma) in 240,000 caused it. Python interpreted the assignment to salary as two values, 240 and 000, and stored them together as the tuple (240, 0).

To correct it, remove the thousands separator.

salary = 240000
type(salary)
<class 'int'>
salary
240000
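As an aside that this lesson does not otherwise cover: Python 3.6 and later accept underscores inside numeric literals, so you can keep a visual thousands separator in your code without creating a tuple. A minimal sketch:

```python
salary = 240_000     # the underscore is ignored by Python

print(salary)        # 240000
print(type(salary))  # <class 'int'>
```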

However, plain output such as 240000 is undesirable for reports and graphs. Consider using the .format() method to make your output more readable.

Number formatting

For example,

salary = 240000
print("${:,.0f}".format(salary))
$240,000

The $ adds the dollar sign. You can add any currency symbol, e.g. £, €, ¥.

The :, adds a comma as a thousands separator, and the .0f formats the number with zero decimal places. The .format() method that follows passes in the value to format.
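To illustrate the point about currency symbols, here is the same format specification with different symbols in front. This is only a sketch; it changes the symbol and does not convert between currencies.

```python
salary = 240000

# Same format specification, different currency symbol in front of it
print("£{:,.0f}".format(salary))
print("€{:,.0f}".format(salary))
print("¥{:,.0f}".format(salary))
```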

The following table shows various ways to format numbers using Python’s str.format() method, including examples for both float formatting and integer formatting (Katz 2015).

To run examples use print("FORMAT".format(NUMBER)). See the table below for the usage.

The first example illustrates how to format data to 2 decimal places using: print("{:.2f}".format(3.1415926))

Number        Format     Output       Description
3.1415926     {:.2f}     3.14         2 decimal places
3.1415926     {:+.2f}    +3.14        2 decimal places with sign
-1            {:+.2f}    -1.00        2 decimal places with sign
2.71828       {:.0f}     3            No decimal places
5             {:0>2d}    05           Pad number with zeros (left padding, width 2)
5             {:x<4d}    5xxx         Pad number with x’s (right padding, width 4)
10            {:x<4d}    10xx         Pad number with x’s (right padding, width 4)
1000000       {:,}       1,000,000    Number format with comma separator
0.25          {:.2%}     25.00%       Format percentage
1000000000    {:.2e}     1.00e+09     Exponent notation
32            ${:.2f}    $32.00       Adding a currency symbol
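A good way to get comfortable with these format codes is to run a few of them yourself. The sketch below tries several rows using the print("FORMAT".format(NUMBER)) pattern described above.

```python
print("{:.2f}".format(3.1415926))    # 3.14
print("{:+.2f}".format(-1))          # -1.00
print("{:0>2d}".format(5))           # 05
print("{:x<4d}".format(5))           # 5xxx
print("{:,}".format(1000000))        # 1,000,000
print("{:.2%}".format(0.25))         # 25.00%
print("{:.2e}".format(1000000000))   # 1.00e+09
```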

Use the format code syntax {field_name:format_spec}, where field_name specifies the index number of the argument passed to the str.format() method, and format_spec is the code that describes how the value should be presented (for example ,.0f).
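To see how the index number in front of the colon works, here is a small sketch that formats two values in one string. The wording of the message is just an example.

```python
customer_id = 12345
salary = 240000

# {0} picks the first argument; {1:,.0f} picks the second and formats it
message = "Customer {0} earns ${1:,.0f} per year".format(customer_id, salary)
print(message)   # Customer 12345 earns $240,000 per year
```

If you leave the index numbers out, as in the earlier examples, Python fills the braces in order from left to right.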

String data types

Next, let’s create the other variables from our very first example of customer demographics and assign them to the following values:

Variable         Value       Status
customer id      12345       Created as customer_id
first name       Wanda
last name        Wonderly
gender           Female
married          False
annual income    240,000

Below, I’ve attempted to create the variable first name and assign it the value Wanda. There are two mistakes in this code. Can you find them?

first name = Wanda #will throw an error 

First, the variable first name has a space in it. We know variables cannot contain spaces. Let’s fix it.

firstname = Wanda #will throw an error 

What’s still wrong with the code above?

If you don’t know, try to run it. What type of error is thrown?

You will likely receive an error message that looks like the following:

NameError: name 'Wanda' is not defined

What do you think is wrong?

Well, it probably has something to do with Wanda. When we assign values that contain letters, or letters AND numbers, those values need to be enclosed in double (or single) quotation marks. (There are some exceptions, like True and False. More on that soon!) Without the quotes, Python reads Wanda as the name of a variable that does not exist, which is why it raises the NameError shown above.

We can fix this code by putting Wanda in double quotes:

firstname = "Wanda"

The type of data assigned to firstname is called a string. Strings are sequences of character data. The string type in Python is called str.

String literals may be delimited using either single or double quotes. All the characters between the opening delimiter and matching closing delimiter are part of the string. For example, when we used the print() function in lesson 1, we enclosed the text in double quotes. We could print out the name Wanda three ways.

Way 1: Using double quotes:

print("Wanda")
Wanda

Way 2: Using single quotes:

print('Wanda')
Wanda

Way 3: Referencing the variable firstname

print(firstname)
Wanda

We can check the data type of firstname and see that str, the string data type, is returned.

type(firstname)
<class 'str'>
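Since either quote style creates the same kind of string, the choice usually comes down to which characters the text itself contains. A short sketch, with made-up example sentences:

```python
# Single- and double-quoted literals create the same string
print("Wanda" == 'Wanda')      # True

# Use one quote style outside to keep the other inside the text
print('She said "hello"')      # double quotes are part of the string
print("It's a string")         # the apostrophe is part of the string
```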

Let’s move on to creating the two other string variables, lastname and gender, and print them out. You’ll notice in the code below that I’ve used the print() function to print string literals and the values referenced in the string variables together in a single statement. I’ve also set the sep parameter of print() to sep="'" so that a single quote, rather than the default space, is placed between each pair of values passed to print().

lastname = "Wonderly"
gender = "Female"

print("I just created two more variables named lastname with the value of", lastname, "and gender with the value of",gender, sep="'")
I just created two more variables named lastname with the value of'Wonderly'and gender with the value of'Female
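To see the effect of sep in isolation, it helps to print the same two values with different separators. A minimal sketch:

```python
lastname = "Wonderly"
gender = "Female"

print(lastname, gender)             # Wonderly Female   (default: a space)
print(lastname, gender, sep="'")    # Wonderly'Female   (a single quote)
print(lastname, gender, sep=", ")   # Wonderly, Female  (a comma and a space)
```

Notice that the separator only appears between values, never before the first or after the last one.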

There are many methods in Python that act exclusively on strings. For example, if you wanted to change a string value to all uppercase or all lowercase letters, you could use the upper() or lower() methods, respectively.

The format is variablename.upper() and variablename.lower().

Let’s print out lastname and gender using these functions to change the output.

lastname.upper()
'WONDERLY'
gender.lower()
'female'

Boolean data types

Has lastname changed its referenced value to uppercase WONDERLY?

You can test this by using the isupper() function as shown below.

lastname.isupper()
False

Are you surprised by the result?

You’ll notice that the value of False is returned. False is a Boolean value. Boolean values can only be True or False. The function isupper() returns a Boolean value. The value returned in this example is False, indicating that the variable lastname is not in uppercase lettering.

This may be confusing to you. Previously, you used the upper() function to print out the value referenced in lastname in uppercase lettering. However, upper() returns a new uppercase string and leaves the original value untouched, and we did not update lastname with the result of lastname.upper(). We’ll learn how to update variables in the section below.
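One way to convince yourself that upper() leaves the original alone is to store its result under a separate name. In this sketch the name shouted is hypothetical and used only for illustration.

```python
lastname = "Wonderly"

shouted = lastname.upper()   # the uppercase copy goes into a new variable

print(shouted)               # WONDERLY
print(lastname)              # Wonderly (the original is unchanged)
print(lastname.isupper())    # False
```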

For now, let’s continue the discussion on Boolean data types.

In our demographic data we have a variable named married. Let’s create the variable and assign it the value of False.

married = False
type(married)
<class 'bool'>

You’ll notice that the type returned for the married variable is bool, meaning a Boolean data type. When using Boolean data types, never put double or single quotes around the values. True and False are special reserved words in python. When used without quotes they indicate that values are Boolean. The words TRUE, true, FALSE, and false are not Boolean values in python and will result in a NameError.
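The difference between False, "False" and false is easy to check. The sketch below uses try and except, which this lesson has not covered, only so that the NameError does not stop the code from running.

```python
married = False

print(type(married))    # <class 'bool'>
print(type("False"))    # <class 'str'>: the quotes turn it into a string

try:
    married = false     # lowercase false is treated as an undefined variable name
except NameError as error:
    print("NameError:", error)
```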

In the upcoming lessons we’ll use Boolean variables with conditional statements and logical operators.

For example, we can use an if statement to see if our customer is married or not. Note the precise syntax of the statements below, including the colons and indentation.

if married:
    print("This customer is married.")
else:
    print("This customer is not married.")
This customer is not married.

This statement evaluates married to see if it is True or False. Since the value is False, the print statement below the else is executed.
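To see the other branch run, we can try the same statement with married set to True. This is a sketch with a hypothetical change to the customer's status; the message is stored in a variable first so that it is printed only once.

```python
married = True   # hypothetical change, only to trigger the first branch

if married:
    message = "This customer is married."
else:
    message = "This customer is not married."

print(message)   # This customer is married.
```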

2.3 Variable reassignment

Updating a variable is easy. To save the result of lastname.isupper(), we can simply assign it back to lastname as shown below.

lastname = lastname.isupper()
print(lastname) #prints out the result
False

Did you know that updating a variable with a different type of data will change its data type?

For example, the variable lastname is now of type bool. We can prove it using the type() function.

type(lastname)
<class 'bool'>

If we reassign lastname to the value of 20 (without double or single quotes around the value), it would be of type int.

lastname = 20
type(lastname)
<class 'int'>

Let’s go back and reassign lastname the correct value.

lastname = "Wonderly"
type(lastname)
<class 'str'>
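Reassignment is also how you would make the uppercase version of a string permanent. A minimal sketch, assuming you actually want lastname stored in uppercase:

```python
lastname = "Wonderly"
lastname = lastname.upper()   # assign the uppercase result back to the same name

print(lastname)               # WONDERLY
print(lastname.isupper())     # True
print(type(lastname))         # <class 'str'>: the type did not change this time
```

If you run this in your notebook, reassign lastname = "Wonderly" afterwards to keep following along with the original value.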

2.4 Changing data types

If you wanted salary to be of type float, you could simply reassign salary the value of 240000.00 or cast the variable to a different type. Casting is the process of converting a value to a different data type.

Option 1: reassign and use the .format() method.

salary = 240000.00
type(salary)
<class 'float'>
print("${:,.2f}".format(salary))
$240,000.00

Option 2: cast the variable to type float using the float() function and use the .format() method.

salary = 240000
salary = float(salary)
type(salary)
<class 'float'>
print("${:,.2f}".format(salary))
$240,000.00

You can change from one data type to another using the following functions:

Function    Description                                                                          Example Input    Example Output
int()       Converts an argument to an integer value from a number or a string                   int("5")         5
float()     Converts an argument to a floating point decimal value from a number or a string     float(5)         5.0
str()       Converts an argument to a string value                                               str(5)           '5'
bool()      Converts an argument to a Boolean value                                              bool(0)          False
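You can try each of the table's examples directly. The sketch below also adds two extra cases that are not in the table, to show how bool() and int() treat other numbers.

```python
print(int("5"))         # 5
print(float(5))         # 5.0
print(str(5))           # prints 5; the value itself is the string '5'
print(bool(0))          # False

# Extra cases, not in the table above
print(bool(1))          # True: any non-zero number converts to True
print(int(240000.99))   # 240000: int() drops the decimal part, it does not round
```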

While there are functions to change data from one type to another, the value must be able to be converted to the right type.

For example, it’s easy to convert x = "23" to x = 23 by using x = int(x). However, you cannot convert name = "Kelly" to an integer using int(name): Python raises a ValueError because the text Kelly does not represent a number.
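Both cases can be sketched in a few lines. The try and except statements are not covered in this lesson; they are used here only so the failed conversion prints a message instead of stopping the code.

```python
x = "23"
x = int(x)       # works: the text "23" represents a whole number
print(x + 1)     # 24

name = "Kelly"
try:
    name_as_number = int(name)   # fails: "Kelly" is not a number
except ValueError as error:
    print("Conversion failed:", error)
```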

Here’s a good reference to learn more about data types: https://realpython.com/python-data-types/ (Sturtz 2019).

2.5 Summary

  • The equal sign = denotes assignment in python.
  • We use the assignment operator to assign values to a variable.
  • Variables must be named as a single word without spaces.
  • Variable names should be descriptive and written in lowercase.
  • The value referenced in a variable can be accessed by typing out the variable name in a code chunk. The value will be returned.
  • Variables that reference values that are non-numeric (with some exceptions) must be assigned values that are enclosed in double or single quotes and will be of type str.
  • The output of numbers can be formatted using the str.format() method.
  • A bool data type is Boolean or logical data with the value of True or False.
  • Variables can be reassigned new values. Variable reassignment from one data type, such as an int value, to another, such as a str value, will change the data type of the reassigned variable.
  • Casting converts a variable from one data type to another. The value must be compatible with, and convertible to, the target data type.
  • Functions to cast to different data types include int(), float(), str(), and bool().

Exercise 2.1

  1. Building upon exercise 1.2, create variable buy and variable sell for Jen.

  2. Restructure your percent return formula using your new variables and assign this to its own ‘tot_return’ variable.

  3. What data type is returned by the output (answer) from question 2?

Exercise 2.2

  1. You are working for Facebook’s Operations Intelligence team and are tasked with building a calculator to find the best ad sales representative each month, based on the largest (hint: max) monthly ad revenue. There are 5 sales representatives listed below with this month’s revenue. Set up the formula so that each month you simply change each representative’s revenue. Call the calculator rep_calculator.
  • Joe: $1,500,000
  • Sarah: $2,750,000
  • Jack: $560,000
  • Mark: $1,975,000
  • Shawn: $2,200,000
  2. Print out the following statement using your calculated value: This month’s Best Ad Sales Representative made: ____________ dollars!

Assignment 2

  1. You work in baseball operations at the NY Mets and are purchasing baseball caps from NewEra for next season. NewEra sells two types of caps, a snapback and a flex cap. The price points for each hat are listed below. Based on forecasted demand for next season you plan to buy 40,000 snapbacks and 70,000 flex hats.
  • Flex=$12.45
  • Snapback=$10.60

Create variables for: snap_num (the number of snap caps), flex_num (the number of flex caps), along with the corresponding prices: snap_px, flex_px.

  2. Create a total_cost variable that represents the total cost of your purchase of both hat types.

  3. After running some analysis you realize that snapbacks are even less popular this year and you decide to reduce the number of snapbacks by 10,000 and increase the flex hats by 10,000. Adjust your variables for this change and calculate a new total cost.

References

Katz, M. 2015. “Python String Format Cookbook.” https://mkaz.blog/code/python-string-format-cookbook/.

Sturtz, John. 2019. “Basic Data Types in Python.” https://realpython.com/python-data-types/.