Lesson 2 Data types and variables

Welcome to lesson 2. In this lesson we will learn how to create variables and understand different types of data. Follow along with this tutorial by creating a new IPython notebook named lesson02.ipynb and entering the code snippets presented.

  • Variable assignment
  • Data types
  • Characters
  • Variable reassignment
  • Changing data types
  • Reading user input
  • Summary
  • Exercise 2.1
  • Exercise 2.2
  • Assignment 2

2.1 Variable assignment

One of the most important things to be able to do in any programming language is store information in variables. You can think of a variable as a label for one or more pieces of information. When doing analysis in Python, all of your data (the variables you measured in your study) will be stored as variables. You can create variables for other things too, which we will learn about later in the course.

Let's create a few simple variables to represent customer demographics:

Variable        Value
customer id     12345
first name      Wanda
last name       Wonderly
gender          Female
married         False
annual income   240,000

We'll begin with customer id. As a rule, a variable name must be a single word without spaces; otherwise Python will raise a SyntaxError. Use underscores _ instead of spaces.

It's also best practice to use lowercase letters for variable names and to choose names that are meaningful. For example, customer_id is more descriptive than ci and uses an underscore instead of a space.

Also, the following reserved Python words cannot be used as variable names. (Python 3.7 and later also reserve async and await.)

Keyword
and       del      from    None      True
as        elif     global  nonlocal  try
assert    else     if      not       while
break     except   import  or        with
class     False    in      pass      yield
continue  finally  is      raise
def       for      lambda  return
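
The exact list of reserved words varies slightly between Python versions, so it can help to ask Python itself. The following is a minimal sketch using the standard-library keyword module; the variable names are illustrative and not part of the lesson's customer data.

```python
import keyword

# Full list of reserved words for the Python version you are running
reserved_words = keyword.kwlist
print(reserved_words)

# Check candidate names before using them as variables
is_class_reserved = keyword.iskeyword("class")              # a reserved word
is_customer_id_reserved = keyword.iskeyword("customer_id")  # an ordinary name
print(is_class_reserved, is_customer_id_reserved)
```

A name for which iskeyword() returns False is safe to use as far as reserved words are concerned, although it should still follow the naming rules above.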

Now that we have some basic rules down, let's create the variable customer_id. We do this by using a special operator called the assignment operator denoted by a single equal sign, =. The name of the variable is on the left side of the equals sign and the value being assigned is on the right.

customer_id = 12345

To see the value referenced in a variable, type in the variable name.

customer_id
12345

2.2 Data types

You can also determine the data type of the variable. "A data type is a particular kind of data item, as defined by the values it can take, the programming language used (in our case, python), or the operations that can be performed on it".

Numeric data types: int and float

The data type for customer_id is determined by the value it is assigned. Since it is assigned 12345, a whole number, its data type is a numeric type called an integer, denoted int. If it were -12345, would it still be of type int? Yes, since integers are positive or negative whole numbers (or zero). However, if it were assigned 12345.0, Python would consider it a float. A float is a floating-point number, that is, a number written with a decimal point. In short, if a value is written with a decimal point, it is a float. If it is a whole number without one, it is an int.

To determine the data type of an existing variable use the type() function and pass in the variable name.

Try it:

type(customer_id)
<class 'int'>
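
To see the int and float distinction side by side, here is a small sketch. The variable names positive_whole, negative_whole, and with_decimal are made up for this illustration.

```python
# Illustrative values only; they are not part of the customer record
positive_whole = 12345
negative_whole = -12345
with_decimal = 12345.0

print(type(positive_whole))   # <class 'int'>
print(type(negative_whole))   # <class 'int'>
print(type(with_decimal))     # <class 'float'>

# The two numbers compare as equal even though their types differ
same_value = positive_whole == with_decimal
print(same_value)             # True
```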

Let's create a variable named salary and assign it the value of 240,000.

salary = 240,000 # this will cause a logical error
salary
(240, 0)

What is the error? You'll notice that Python still executed the code. A logical error occurred: the code ran, but the output was unexpected. When we check the data type as shown below, it is of type tuple. This is a data structure we'll learn more about in lesson 3.

type(salary)
<class 'tuple'>

What caused this logical error?

The thousands separator (the comma) in 240,000. Python interpreted the assignment to salary as two values, 240 and 000, and 000 is simply 0.

To correct it, remove the thousands separator.

salary = 240000
type(salary)
<class 'int'>
salary
240000
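
If you miss having a visual separator in your code, one option (an addition to this lesson, available in Python 3.6 and later) is to place underscores inside the number. The sketch below contrasts this with the accidental tuple; the names accidental_tuple and readable_salary are illustrative.

```python
# A comma between values builds a tuple, even without parentheses
accidental_tuple = 240, 000
print(accidental_tuple)        # (240, 0)
print(len(accidental_tuple))   # 2

# Underscores inside a numeric literal are ignored by Python
readable_salary = 240_000
print(readable_salary)         # 240000
print(type(readable_salary))   # <class 'int'>
```

The underscores only affect how the number reads in your code. The stored value is still the plain integer 240000.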

However, this output format is undesirable for reports and graphs. Consider using the .format() method of strings to make your output more readable.

Number formatting

For example,

salary=240000
print("${:,.0f}".format(salary))
$240,000

The $ adds the dollar sign. You can add any currency symbol, e.g. £, €, ¥.

The :, adds a comma as a thousands separator, and the .0f displays the number with zero decimal places. The .format() method that follows receives the value to format.

The following table shows various ways to format numbers using Python’s str.format() function, including examples for both float formatting and integer formatting (Katz (2015)).

To run examples use print("FORMAT".format(NUMBER)). See the table below for the usage.

The first example illustrates how to format a number to 2 decimal places using: print("{:.2f}".format(3.1415926))

Number      Format    Output     Description
3.1415926   {:.2f}    3.14       2 decimal places
3.1415926   {:+.2f}   +3.14      2 decimal places with sign
-1          {:-.2f}   -1.00      2 decimal places with sign
2.71828     {:.0f}    3          No decimal places
5           {:0>2d}   05         Pad number with zeros (left padding, width 2)
5           {:x<4d}   5xxx       Pad number with x’s (right padding, width 4)
10          {:x<4d}   10xx       Pad number with x’s (right padding, width 4)
1000000     {:,}      1,000,000  Number format with comma separator
0.25        {:.2%}    25.00%     Format percentage
1000000000  {:.2e}    1.00e+09   Exponent notation
32          ${:.2f}   $32.00     Adding a currency symbol

Use the format code syntax {field_name:conversion}, where field_name specifies the index number of the argument to the str.format() method, and conversion refers to the conversion code of the data type.
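
The sketch below runs a handful of these format codes so you can compare the results yourself. It also shows field_name used as an index into the arguments of .format(). All variable names are illustrative.

```python
# Each line applies one format code to one number
pi_text = "{:.2f}".format(3.1415926)
padded_zero = "{:0>2d}".format(5)
padded_x = "{:x<4d}".format(5)
with_commas = "{:,}".format(1000000)
as_percent = "{:.2%}".format(0.25)

print(pi_text)       # 3.14
print(padded_zero)   # 05
print(padded_x)      # 5xxx
print(with_commas)   # 1,000,000
print(as_percent)    # 25.00%

# field_name as an index: {0} is the first argument, {1} is the second
sentence = "{1} earns ${0:,.0f}".format(240000, "Wanda")
print(sentence)      # Wanda earns $240,000
```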

String data types

Next, let's create the other variables from our very first example of customer demographics and assign them to the following values:

Variable        Value      Status
customer id     12345      Created as customer_id
first name      Wanda
last name       Wonderly
gender          Female
married         False
annual income   240,000

Below, I've attempted to create the variable first name and assign it to Wanda. There are two mistakes in this code. Can you find them?

first name = Wanda #will throw an error 

First, the variable first name has a space in it. We know variables cannot contain spaces. Let's fix it.

firstname = Wanda #will throw an error 

What's still wrong with the code above?

If you don't know, try to run it. What type of error is thrown?

You will likely receive an error message that looks like the following:

NameError: name 'Wanda' is not defined

What do you think is wrong?

Well, it has something to do with Wanda. Python reads the unquoted word Wanda as the name of another variable, and no variable with that name exists. When we assign values that contain letters, or letters and numbers, those values need to be enclosed in double (or single) quotation marks. (There are some exceptions, such as True and False. More on that soon!)

We can fix this code by putting Wanda in double quotes:

firstname = "Wanda"

The type of data assigned to firstname is called a string. Strings are sequences of character data. The string type in Python is called str.

String literals may be delimited using either single or double quotes. All the characters between the opening delimiter and matching closing delimiter are part of the string. For example, when we used the print() function in lesson 1, we enclosed the text in double quotes. We could print out the name Wanda three ways.

Way 1: Using double quotes:

print("Wanda")
Wanda

Way 2: Using single quotes:

print('Wanda')
Wanda

Way 3: Referencing the variable firstname

print(firstname)
Wanda

We can determine the variable data type of firstname and see that the type str is returned, which is a string data type.

type(firstname)
<class 'str'>

Let's move on to creating the two other string variables, lastname and gender, and print them out. You'll notice in the code below that I've used the print() function to print both string literals and the values referenced by the string variables in a single statement. I've added the sep parameter to the print() function as well. Here sep="'" makes the separator placed between the printed values a single quote; the default for sep is a space. Because the separator only goes between values, no quote follows the final value.

lastname = "Wonderly"
gender = "Female"

print("I just created two more variables named lastname with the value of", lastname, "and gender with the value of",gender, sep="'")
I just created two more variables named lastname with the value of'Wonderly'and gender with the value of'Female

There are many methods in Python that act exclusively on strings. See Severance (2016), chapter 6. For example, if you wanted to change a string value to all uppercase or all lowercase letters, you could use the upper() or lower() methods, respectively.

The format is variablename.upper() and variablename.lower().

Let's print out lastname and gender using these functions to change the output.

lastname.upper()
'WONDERLY'
gender.lower()
'female'

A method call is called an invocation; in this case, we would say that we are invoking .upper on lastname and .lower on gender.

Boolean data types

Has lastname changed its referenced value to uppercase WONDERLY?

You can test this by using the isupper() method as shown below.

lastname.isupper()
False

Are you surprised by the result?

You'll notice that the value of False is returned. False is a Boolean value. Boolean values can only be True or False. The method isupper() returns a Boolean value. The value returned in this example is False, indicating that the value referenced by lastname is not in uppercase lettering.

This may be confusing. Previously, you used the upper() method to print out the value referenced in lastname in uppercase lettering. The variable itself is unchanged because we did not update lastname with the result of lastname.upper(). We'll learn how to update variables in the section below.
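
One way to convince yourself that upper() leaves the original variable alone is to store its result under a different name and compare the two. This is a small sketch; the name shouted is hypothetical and chosen only for this illustration.

```python
lastname = "Wonderly"
shouted = lastname.upper()   # the result is stored under a different name

print(shouted)               # WONDERLY
print(lastname)              # Wonderly (unchanged)
print(shouted.isupper())     # True
print(lastname.isupper())    # False
```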

For now, let's continue the discussion on Boolean data types.

In our demographic data we have a variable named married. Let's create the variable and assign it the value of False.

married = False
type(married)
<class 'bool'>

You'll notice that the type returned for the married variable is bool, meaning a Boolean data type. When using Boolean data types, never put double or single quotes around the values. True and False are special reserved words in Python. When used without quotes they indicate that values are Boolean. The words TRUE, true, FALSE, and false are not Boolean values in Python and, unless you have defined variables with those names, will result in a NameError.
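
The sketch below illustrates both pitfalls: quoting a Boolean turns it into a string, and a lowercase true is treated as an undefined variable name. It uses try and except, which this course covers later, only so that the example keeps running after the error; the names quoted and message are illustrative.

```python
married = False
print(type(married))      # <class 'bool'>

quoted = "False"          # quotes make this a string, not a Boolean
print(type(quoted))       # <class 'str'>

# Lowercase true is read as a variable name that does not exist
try:
    married = true
except NameError as error:
    message = str(error)
    print("NameError:", message)
```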

In the upcoming lessons we'll use Boolean variables with conditional statements and logical operators.

For example, we can use an if statement to see if our customer is married or not. Note the precise syntax of the statements below, including the colons and indentation.

if married:
    print("This customer is married.")
else:
    print("This customer is not married.")
This customer is not married.

This statement evaluates married to see if it is True or False. Since the value is False, the print statement below the else is executed.

2.3 Characters

Each character that you can type on your keyboard has a numeric representation. For example the value of "A" is 65 in ASCII. "ASCII stands for American Standard Code for Information Interchange. Computers can only understand numbers, so an ASCII code is the numerical representation of a character such as 'a' or '@'"((“ASCII Table and Description” 2020), para 1).

In Python, "A" is still considered data of type string. However, knowing the numeric equivalent of each character is important when comparing characters to each other.

For example, given that "A" is 65 in ASCII, the value of "a" is 97. This means that "a" is greater than "A", in computer speak. How could you prove this (and not just take my word for it)? Well, use the max() function to see which one is greater.

Try it

max("A", "a")
## 'a'

You'll see that 'a' is returned because it has the higher ASCII value.

However, you still haven't proved the ASCII values of "A" or 'a'. Let's find the ASCII values of these characters to prove this example out. To do this, we can use ord(), a built-in Python function that accepts a single character (a string of length 1) as its argument and returns the Unicode code point for that character. Since the first 128 Unicode code points are the same as the ASCII values, we can use this function to find the ASCII value of any ASCII character.

your_first_character = "a"
your_second_character = "A"
print("The ASCII value of " + your_first_character + " is: ",ord(your_first_character))
## The ASCII value of a is:  97
print("The ASCII value of " + your_second_character + " is: ",ord(your_second_character))
## The ASCII value of A is:  65

Then you can check the data types of the returned ASCII values by using type(ord(your_first_character)) and type(ord(your_second_character)).

Try it

type(ord(your_first_character))
## <class 'int'>
type(ord(your_second_character))
## <class 'int'>

It's possible to return the character value from the ASCII value using the chr() function. See below:


num = 65
print(chr(num))
## A
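
Putting ord(), chr(), and comparison together, the following sketch checks that comparing two characters gives the same answer as comparing their code points, and that chr() reverses ord(). The variable names are illustrative.

```python
code_upper = ord("A")
code_lower = ord("a")
print(code_upper, code_lower)      # 65 97

# Comparing characters compares their code points
letters_compare = "a" > "A"
codes_compare = code_lower > code_upper
print(letters_compare, codes_compare)   # True True

# chr() reverses ord()
round_trip = chr(ord("A"))
print(round_trip)                  # A
```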

For the full ASCII table and character equivalents go to: https://simple.wikipedia.org/wiki/ASCII

2.4 Variable reassignment

Updating a variable is easy. To save the result from lastname.isupper(), we can simply assign it back to lastname as shown below.

lastname = lastname.isupper()
print(lastname) #prints out the result
False

Did you know that updating a variable with a different type of data will change its data type?

For example, the variable lastname is now of type bool. We can prove it using the type() function.

type(lastname)
<class 'bool'>

If we reassign lastname to the value of 20 (without double or single quotes around the value), it would be of type int.

lastname = 20
type(lastname)
<class 'int'>

Let's go back and reassign lastname the correct value.

lastname = "Wonderly"
type(lastname)
<class 'str'>

2.5 Changing data types

If you wanted salary to be of type float, you could simply reassign salary the value of 240000.00 or cast the variable to a different type. Casting is the process of converting a value to a different data type.

Option 1: reassign and use the .format function.

salary = 240000.00
type(salary)
<class 'float'>
print("${:,.2f}".format(salary))
$240,000.00

Option 2: cast the variable to type float using the float() function and use the .format function.

salary = 240000
salary = float(salary)
type(salary)
<class 'float'>
print("${:,.2f}".format(salary))
$240,000.00

You can change from one data type to another using the following functions:

Function  Description                                                                        Example Input  Example Output
int()     Converts an argument to an integer value from a number or a string                 int("5")       5
float()   Converts an argument to a floating point decimal value from a number or a string   float(5)       5.0
str()     Converts an argument to a string value                                             str(5)         '5'
bool()    Converts an argument to a Boolean value                                            bool(0)        False

While there are functions to change data from one type to another, the value must be able to be converted to the right type.

For example, it's easy to convert x = "23" to x = 23 by using x = int(x). However, you cannot convert name = "Kelly" to an integer using int(name). Python raises a ValueError, simply because the name Kelly cannot be interpreted as a number.
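
Here is a hedged sketch of both cases. It uses try and except, which appear later in the course, so the failed conversion does not stop the program. The name name_as_number and the choice to fall back to None are assumptions made for this illustration.

```python
x = "23"
x = int(x)          # works: the text "23" is a valid whole number
print(x + 1)        # 24

name = "Kelly"
try:
    name_as_number = int(name)
except ValueError as error:
    name_as_number = None   # illustrative fallback when conversion fails
    print("Could not convert:", error)

# A related surprise: any non-empty string converts to True
quoted_false = bool("False")
print(quoted_false)  # True
```

The last two lines are worth remembering: bool() does not read the text inside a string, it only checks whether the string is empty.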

Here's a good reference to learn more about data types: https://realpython.com/python-data-types/ (Sturtz 2019).

2.6 Reading user input

Up until now, you've been writing a few lines of code to solve a specific problem. These short programs did not require any input from a user. Rather, you were both the coder and the user of the programs you've written thus far (in this course).

We'll go through an example that asks the user of our program for input, such as their name and how many hours they exercise per week.

Then, we can present our user with a result that calculates the number of hours they work out per year; this is all personalized based on their input!

Let's try it using the input function to read in the user input.

input("Enter your name and press the enter key.")

This will literally prompt the user to enter their name. Try it.

Did you try it out? What happened? Well, it's probably a bit underwhelming. The input you typed in as the user was just printed back to the screen.

Let's modify our code and store the result of the user's response in a variable.

username = input("Enter your name and press the enter key.\n")

Now, the name entered by the user can be easily referenced through the variable username. Also, did you notice the \n that I included at the end of my prompt? It adds a line break after the prompt, so the box for user input appears on a new line. \n is the syntax for a newline.

Try it.

Use the newline character when you print strings to the screen and you want to include line breaks.

Next, let us ask the user our second question:

How many hours do you exercise per week?

How would you do this? Well, the same way we did with the first question, using the input function and assigning the value entered by the user to a variable.

workout_hours = input("How many hours on average do you exercise per week?\n")

Our next task is to print out a nice statement to the user that calculates the number of hours they exercise per year.

print("Hi", username, ". You have spent", (workout_hours)*52, "hours working out this year.")

Try it.

What was the result? Was the output expected?

This is an example of another logical error. Our code ran, but it produced an unexpected result. Any idea why?

The answer is simple, but it may not be obvious. The input function always returns data of type string, regardless of what the user types. Multiplying a string by 52 repeats the text 52 times instead of performing arithmetic.

This is a perfect example of where we need to convert from one data type to another. We need to convert our variable workout_hours to a float. Why a float and not an int? Well, it's possible our user exercises an average of 5.5 hours per week, which is a non-integer value. If we try to convert a string such as "5.5" to an integer, we get an error. Try it yourself.

hours = "5.5"
print(int(hours))

Did you see the error? Python raises a ValueError because the text "5.5" is not a valid whole number.

Here is the final solution:

print("Hi", username, ". You have spent", float(workout_hours)*52, "hours working out this year.")
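
You can see the difference between the broken and the corrected calculation without typing anything at a prompt. In this sketch the string "5" stands in for what input() would return if the user typed 5; the names repeated and total are illustrative.

```python
workout_hours = "5"             # stand-in for the text returned by input()

repeated = workout_hours * 52   # string repetition, not arithmetic
print(repeated)                 # the character 5 printed 52 times
print(len(repeated))            # 52

total = float(workout_hours) * 52   # convert first, then multiply
print(total)                    # 260.0
```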

When we plan and write programs for others we have to account for all types of input.

What other types of user input would we want to plan for? As we continue in the course we will do more "error handling" to account for human input, which, unlike computer input, can take many unexpected forms.
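
As a preview of that error handling, here is one possible sketch. It wraps the conversion in a function so it can be tried with different inputs. The function name yearly_hours and the decision to return None for unusable input are assumptions made for this example, and functions and try/except are covered properly in later lessons.

```python
def yearly_hours(raw_hours):
    """Convert the text typed by a user into hours per year.

    Returns None when the text cannot be read as a number.
    """
    try:
        weekly = float(raw_hours)
    except ValueError:
        return None
    return weekly * 52

print(yearly_hours("5.5"))    # 286.0
print(yearly_hours("five"))   # None
```

In a real program you would pass the result of input() to yearly_hours() and ask the user again if None comes back.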

2.7 Advanced: Finding variable names

There may be times when you actually want to know the name of a variable you are using in one of your Python programs.

There is a Python library called varname with a function called nameof() that we can use to programmatically identify the name of a given variable.

Here's an example you can try:

pip install python-varname #first install library

from varname import varname, nameof #import library and method

a=1 # create a variable
aname=nameof(a) #store the name of the variable in another variable 
print(aname) #print variable a's name to the screen

2.8 Summary

  • The equal sign = denotes assignment in python.
  • We use the assignment operator to assign values to a variable.
  • Variables must be named as a single word without spaces.
  • Variable names should be descriptive and written in lowercase.
  • The value referenced in a variable can be accessed by typing out the variable name in a code chunk. The value will be returned.
  • Variables that reference values that are non-numeric (with some exceptions) must be assigned values that are enclosed in double or single quotes and will be of type str.
  • The output of numbers can be formatted using the .format() method.
  • A bool data type is Boolean or logical data with the value of True or False.
  • Variables can be reassigned new values. Reassigning a variable from one data type to another, such as from an int value to a str value, changes the data type of the variable.
  • Casting converts a value from one data type to another. The value must be compatible with and convertible to the target data type.
  • Functions to cast to different data types include int(), float(), str(), and bool().
  • The input function allows us to prompt the user for input. The input can be referenced in a variable.
  • The input captured from the input function is always of type str.

Exercise 2.1

  1. Building upon exercise 1.2, create the variables buy and sell for Jen and assign them the appropriate values as shown below. Be sure to give your variables descriptive names, in lowercase lettering, and without spaces.

  2. Restructure your percent return formula using your new variables and assign this to its own tot_return variable. Print out the variables and format to 2 decimal places.

  3. What data type is returned by the output (answer) from question 2? Use the appropriate function to display the data type.

  4. Provide the output for the following expressions and explain why.

  1. max("adbED")

  2. max(4234234,234234)

  3. max(1,000)

  4. min("}34#&")

Exercise 2.2

  1. You are working for Facebook’s Operations Intelligence team and are tasked with building a calculator to find the best ad sales representative each month, based on the largest (hint: max) monthly ad revenue. There are 5 sales representatives listed below with this month's revenue. Set up the formula so that each month you simply change each representative's revenue. Call the calculator rep_calculator.
  • Joe: $1,500,000
  • Sarah: $2,750,000
  • Jack: $560,000
  • Mark: $1,975,000
  • Shawn: $2,200,000
  2. Print out the following statement using your calculated value. Be sure to format the data using .format.

This month's Best Ad Sales Representative made: ____________ dollars!

  3. Thinking question: How might you find out the name of the Ad Sales Representative who earned the most (answer in words, not code)?

Assignment 2

  1. You work in baseball operations at the NY Mets and are purchasing baseball caps from NewEra for next season. NewEra sells two types of caps, a snapback and a flex cap. The price points for each hat are listed below. Based on forecasted demand for next season you plan to buy 40,000 snapbacks and 70,000 flex hats.
  • Flex=$12.45
  • Snapback=$10.60

Create variables for: snap_num (the number of snap caps), flex_num (the number of flex caps), along with the corresponding prices: snap_px, flex_px.

  2. Create a total_cost variable that represents the total cost of your purchase of both hat types. Print out the value to the screen.

  3. After running some analysis you realize that snapbacks are even less popular this year, and you decide to reduce the number of snapbacks by 10,000 and increase the flex hats by 10,000. Adjust your variables for this change and calculate a new total cost. Print out the value to the screen in the following sentence, where the total cost is a variable that is computed and printed: The total cost is $1,314,000.00.

References

“ASCII Table and Description.” 2020. http://www.asciitable.com/.

Katz, M. 2015. “Python String Format Cookbook.” https://mkaz.blog/code/python-string-format-cookbook/.

Severance, Charles. 2016. Python for Everybody: Exploring Data in Python 3.

Sturtz, John. 2019. “Basic Data Types in Python.” https://realpython.com/python-data-types/.