Lesson 3 Data Structures I

Welcome to lesson 3. In this lesson we will begin to learn how data is stored and organized in Python.

Follow along with this tutorial by creating a new Jupyter (IPython) notebook named lesson03.ipynb and entering the code snippets presented. Note: this lesson and its exercises and assignment will take you more time to complete than the previous lessons.

Outline

  • Tuples
  • Lists
  • Dictionaries
  • Summary
  • Exercise 3.1
  • Exercise 3.2
  • Assignment 3

In lesson 2, we learned how to create and update variables that hold a single value. When we want to store more than one value in a single variable, we use a data structure. "In any programming language, the data structures available play an extremely important role. They decide how your data will be stored in the memory and how you can retrieve the data when required" (Gupta (2018), para 1).

A data structure defines how data is organized and what attributes it has. To make this concept more concrete, you can think of an Excel spreadsheet as a data structure composed of rows and columns. There are many types of data structures in Python, each with a set of features and functions that you can use to manipulate or act on your data. Examples include the tuple, list, and dictionary, which are built into Python, and the Series and DataFrame, which come from the pandas library.

The syntax for each one of these data structures is different.

We'll begin by working with tuples and lists. These "are arguably Python's most versatile, useful data types. You will find them in virtually every nontrivial Python program" (Sturtz (2018), para 1).

3.1 Tuples

The first simple data structure I'll introduce is the tuple. A tuple is an ordered collection of values, and it can contain heterogeneous data. This means you can store data of different types (e.g. str, int, float) in the same structure. Compared with a list, a tuple is generally slightly faster to create and uses less memory. However, once a tuple is created you cannot add, remove, or update its values: it is immutable, so its length and contents are fixed. Use tuples when your data will not change.
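To check the memory difference between tuples and lists yourself, the sketch below stores the same three values as a tuple and as a list and compares their sizes with sys.getsizeof(), which reports the size of the container object in bytes. The exact numbers depend on your Python version and platform; on CPython the tuple is typically the smaller of the two.

```python
import sys

# The same three monthly values stored as a tuple and as a list
values_tuple = (230, 250, 345)
values_list = [230, 250, 345]

# sys.getsizeof() returns the size of the container object itself, in bytes
# (it does not include the size of the values the container refers to)
tuple_bytes = sys.getsizeof(values_tuple)
list_bytes = sys.getsizeof(values_list)

print("tuple:", tuple_bytes, "bytes")
print("list:", list_bytes, "bytes")
```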

We can best understand how tuples work through an example. Let's assume we tracked our monthly spending on consumable goods on Amazon for a 12-month period. We have 12 values, one for each month. The objective is to store this table of information in variables.

Month Value
January 230
February 250
March 345
April 290
May 890
June 321
July 232
August 456
September 213
October 299
November 123
December 566

If we used a variable for each value, we'd have to create 12 variables. For example,

January = 230
February = 250
# and so on

Instead, because these values are fixed (we can't go back in time and change how much we spent each month, and we'll assume no returns), we can store them all in a single variable of type tuple. Although a tuple can hold different types of data, here we'll store only the values for each month. We do this by giving our tuple object a name, e.g. spend. Next, we use the assignment operator (=) to assign the 12 values to our variable spend. The values must be enclosed in parentheses and separated by commas.

spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

The variable spend is a data object of type tuple. We can confirm this with the type() function, which returns the object's type:

type(spend)
<class 'tuple'>

To see the length of the tuple (how many elements it contains), we can use the len() function.

len(spend)
12

If we wanted to add up all of the values in our tuple named spend we could use a function called sum().

Similarly, we can determine the minimum and maximum value in a tuple using the min() and max() functions, respectively.

spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

print(sum(spend))
4215
print(min(spend))
123
print(max(spend))
890
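Because sum() and len() both work on tuples, you can combine them, for example to get the average monthly spend. This is a small illustrative sketch; the names total, months, and average are just choices for this example.

```python
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

total = sum(spend)          # add up all 12 monthly values
months = len(spend)         # number of values in the tuple
average = total / months    # / always returns a float in Python 3

print(total, months, average)  # 4215 12 351.25
```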

To access a given value in a tuple, we use indexing. This means that we reference a value by its position within the tuple.

For example, if we wanted to access the value for January, the first value in the tuple, we would use the expression spend[0]. We place the position of the value (its index) inside square brackets after the tuple's name.

This expression says: return the first value in the tuple. But why did we use 0 instead of 1?

Let's look at an example.

spend[1]
250

What value is returned when we pass in 1? It's the second value in the tuple, 250. We can deduce that Python uses zero-based indexing: it starts counting at 0, not 1. In contrast, the R programming language uses one-based indexing; it starts counting at 1.

The table below shows the index for each value in our tuple.

Index Value Expression
0 230 spend[0]
1 250 spend[1]
2 345 spend[2]
3 290 spend[3]
4 890 spend[4]
5 321 spend[5]
6 232 spend[6]
7 456 spend[7]
8 213 spend[8]
9 299 spend[9]
10 123 spend[10]
11 566 spend[11]
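Because counting starts at 0, the last valid index is always one less than the length of the tuple. The sketch below computes it with len() and shows what happens if you ask for an index that does not exist; the try/except lines (covered in a later lesson) simply catch the error so the script can keep running.

```python
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

last_index = len(spend) - 1           # 12 values, so the last index is 11
print(last_index, spend[last_index])  # 11 566

try:
    spend[12]                         # one position past the end
except IndexError as err:
    print("IndexError:", err)         # IndexError: tuple index out of range
```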

Now that you understand how python indexes values in a tuple, let's return the value for January from our tuple spend.

spend[0]
230

Now, let's create a second tuple, called month, to store the names of the months that correspond to our monthly spending.

month=("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

To print the values of our new tuple month, use the print() function or, in a notebook, type the name of the tuple.

print(month) 
('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')
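Because month and spend list their values in the same order, the same index picks out a matching month and amount from each tuple. The sketch below uses that to print April's spend and, with the tuple method index(), to find the month with the highest spend.

```python
month = ("January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December")
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

# Index 3 refers to the fourth position in both tuples
print(month[3], spend[3])       # April 290

# index() returns the position of the first matching value
top = spend.index(max(spend))   # max is 890, at index 4
print(month[top], spend[top])   # May 890
```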

Tuples are interesting data structures because they can store heterogeneous data. This means we can store data of type int, str, float, and bool all in one data structure.

Let's use the demographic data from lesson 2 to create a single tuple.

Variable Value
customer id 12345
first name Wanda
last name Wonderly
gender Female
married False
annual income 240,000

Try it.

demographics = (12345, "Wanda", "Wonderly", "Female", False, 240000)

type(demographics) #returns the type of data structure
<class 'tuple'>
type(demographics[0]) #returns the data type of the 1st value
<class 'int'>
type(demographics[1]) #returns the data type of the 2nd value
<class 'str'>

Tuples are useful when you do not want the data to change. A defining characteristic of a tuple is that it is immutable. For example, if Wanda Wonderly's annual income changed we would not be able to update our tuple.

To access the marital status element, we use square brackets [ ]. Marital status is the 5th value, which is at index 4 (remember, we start counting at zero).

demographics[4]
False

This returns the marital status.

With a mutable data structure, such as a list, we could update Wanda's annual income (the value at index 5) if her salary changed, like this:

demographics[5] = 250000

However, because demographics is a tuple, running this code raises a TypeError:

TypeError: 'tuple' object does not support item assignment

By now you can probably explain why this error occurred. If not, reread this section on tuples.

3.2 Lists

Another type of data structure is the list. Lists have specific features that differ from tuples. The most important feature for our purposes is that lists are mutable, which means we can update the values in a list. Many programming languages use arrays for this purpose. Python has no built-in array type (typed arrays are available through the standard library's array module and the NumPy library), so lists are commonly used instead.

Let's begin by creating a list.

mylistofskills = ["python", "R", "SQL"]

You'll immediately notice that to construct a list named mylistofskills we place the list values inside square brackets [ ] rather than parentheses. This is how Python distinguishes between the two data structures: parentheses create a tuple, and square brackets create a list.

To access a single element in a list, use the square brackets and pass in the position number.

mylistofskills[0] # returns the first value
'python'
mylistofskills[1] # returns the second value
'R'
mylistofskills[2] # returns the third value
'SQL'

To determine how many values you have in your list use the len() function, just as you did for determining the length of a tuple.

len(mylistofskills) 
3

Slicing lists

There are times when you will want to access a subset of your data. You can do this by providing the index of the first element you want to return, followed by a colon, then 1 + the index of the last element you want to return. The range is exclusive of the end index you provide; for example, [1:3] returns the elements at indexes 1 and 2, but not 3.

An example should make this clearer. Let's slice mylistofskills and only return the first two elements.


mylistofskills[0:2]
['python', 'R']

This will return elements 0 and 1. It is inclusive of the first element and exclusive of the last element.
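A few more slices make the start-inclusive, end-exclusive rule concrete. The sketch below also slices the spend tuple from earlier (slicing works on tuples too and returns a new tuple) to total the first quarter. Leaving out the start index, as in [:2], is shorthand for starting at 0.

```python
mylistofskills = ["python", "R", "SQL"]

print(mylistofskills[0:2])  # ['python', 'R']
print(mylistofskills[1:3])  # ['R', 'SQL']
print(mylistofskills[:2])   # start omitted, defaults to 0: ['python', 'R']

spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)
q1 = spend[0:3]             # January to March: indexes 0, 1, and 2
print(q1, sum(q1))          # (230, 250, 345) 825
```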

Modifying a Single List Value

A single value in a list can be replaced by using indexing and simple assignment.

To replace a skill in mylistofskills, assign the new value to the third element in the list (index 2).

mylistofskills[2] = "Java"
mylistofskills
['python', 'R', 'Java']

Adding new list items

To add a new skill to mylistofskills, use the += operator with the new value wrapped in square brackets. The square brackets matter: += adds each item of the right-hand side to the list separately, and a string is iterable, meaning it is a sequence of individual characters. Without the brackets, every letter would be added as a separate list item. Wrapping the string in brackets, as in ["SQL"], turns it into a one-item list, so the whole string is added as a single item. See below.


mylistofskills += ["SQL"] # correct way of adding a string to list.
mylistofskills
['python', 'R', 'Java', 'SQL']

mylistofskills += "SQL" # incorrect way of adding a string to a list
mylistofskills # each letter was added as a separate list element
['python', 'R', 'Java', 'SQL', 'S', 'Q', 'L']

An alternative is to use the append() method, which adds a new element to the end of the list. append() adds its argument as a single element, even if that argument is iterable (such as a string), so there is no need to wrap the value in square brackets.


mylistofskills.append("Tableau") #adds Tableau as a single element
mylistofskills
['python', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']
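To compare the three approaches side by side, the sketch below starts each one from a fresh two-item list. It also shows the flip side of append(): because it adds its argument as one element, appending a list nests that whole list inside yours.

```python
with_brackets = ["python", "R"]
with_brackets += ["SQL"]             # adds one element: 'SQL'

without_brackets = ["python", "R"]
without_brackets += "SQL"            # adds each character: 'S', 'Q', 'L'

with_append = ["python", "R"]
with_append.append("SQL")            # adds one element: 'SQL'

nested = ["python", "R"]
nested.append(["Excel", "Tableau"])  # adds the whole list as ONE element

print(with_brackets)     # ['python', 'R', 'SQL']
print(without_brackets)  # ['python', 'R', 'S', 'Q', 'L']
print(with_append)       # ['python', 'R', 'SQL']
print(nested)            # ['python', 'R', ['Excel', 'Tableau']]
```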

If you want to insert a new element at a specific location, use the insert() method. The first argument is the position (the index) at which to insert. The second argument is the value to be inserted. The existing elements from that position onward shift one place to the right.

mylistofskills.insert(1,"Excel") #adds Excel as the second element
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']

Removing list items

There are a few ways to remove an item from a list. I'll demonstrate three of them using the mylistofskills list. Let's look at the list to remind ourselves which value corresponds to which index.

mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']
  • Way 1 - use del to remove using the index number
del mylistofskills[5] # removes the value at index 5, which is 'S'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Q', 'L', 'Tableau']
  • Way 2 - use .remove() to remove using the element value
mylistofskills.remove("L") # removes the first occurrence of the value 'L'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Q', 'Tableau']
  • Way 3 - use .pop() to remove using the index number (it also returns the removed value)
mylistofskills.pop(5) # removes and returns the value at index 5, which is 'Q'
'Q'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Tableau']
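A few details are worth knowing about these three approaches: remove() deletes only the first matching value, pop() with no argument removes and returns the last element, and remove() raises a ValueError if the value is not in the list. The sketch below uses a small, separate list to show each behavior.

```python
skills = ["python", "R", "SQL", "R"]

skills.remove("R")    # removes only the FIRST "R"
print(skills)         # ['python', 'SQL', 'R']

last = skills.pop()   # no index: removes and returns the last element
print(last, skills)   # R ['python', 'SQL']

del skills[0]         # del removes by index and returns nothing
print(skills)         # ['SQL']

try:
    skills.remove("Java")   # "Java" is not in the list
except ValueError as err:
    print("ValueError:", err)
```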

Converting data structures

There will be instances where you begin working with your data in tuple form and then want to convert it to a list so you can update the data. Remember, tuples are immutable, so a list is the data structure of choice if you need to update values.

To convert a tuple to a List you can use the list() function.

mytuple = (4, 5) # creates a tuple named mytuple
type(mytuple) #confirms that mytuple is a tuple
<class 'tuple'>
print(mytuple)
(4, 5)
listnumbers = list(mytuple) # converts mytuple to a list called listnumbers
type(listnumbers) # confirms that listnumbers is a List
<class 'list'>
print(listnumbers)
[4, 5]

To convert a List to a tuple you can use the tuple() function.

mylist = [6,3,7,4]
type(mylist)
<class 'list'>
print(mylist)
[6, 3, 7, 4]
tupleofnumbers = tuple(mylist) # converts a list to a tuple
type(tupleofnumbers)
<class 'tuple'>
print(tupleofnumbers)
(6, 3, 7, 4)
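Putting the two conversions together gives a common workaround for the immutable demographics tuple from the tuples section: convert it to a list, update the value, and convert it back. Note that this builds a new tuple and rebinds the name demographics to it; the original tuple object itself is never modified.

```python
demographics = (12345, "Wanda", "Wonderly", "Female", False, 240000)

demo_list = list(demographics)   # tuple -> list (a mutable copy)
demo_list[5] = 250000            # update annual income at index 5
demographics = tuple(demo_list)  # list -> new tuple

print(demographics)  # (12345, 'Wanda', 'Wonderly', 'Female', False, 250000)
```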

3.3 Dictionaries

A third type of data structure is the dictionary. A dictionary is a mutable collection of key-value pairs: each value is looked up by its key rather than by its position. (Since Python 3.7, dictionaries also remember the order in which items were inserted.) In Python, dictionaries are written with curly brackets { }.

The main characteristics of a dictionary are:

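  • A dictionary is like a list, but more general: a list is indexed by integer positions, while a dictionary is indexed by keys, which are usually strings or numbers
  • Each key is unique and maps to a value
  • These are called key-value pairs (or items)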

Example 1

The code below creates a dictionary named garage containing three key-value pairs enclosed in curly brackets. The keys are brand, model, and year, and each is followed by a colon (:). The values for the corresponding keys follow the colon: Tesla, Model X, and 2020.


garage = {
  "brand": "Tesla",
  "model": "Model X",
  "year": 2020
}
print(garage)
{'brand': 'Tesla', 'model': 'Model X', 'year': 2020}

It's important to note that every key in a dictionary is associated (or mapped) with a value. The values of a dictionary can be of any Python data type. For example, notice that the value for the key year is the number 2020, not the string "2020".
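Because year is stored as an int, you can do arithmetic with it directly; had it been stored as the string "2020", adding 1 would raise a TypeError. The sketch below checks the type and also uses len(), which counts a dictionary's key-value pairs.

```python
garage = {
    "brand": "Tesla",
    "model": "Model X",
    "year": 2020
}

print(type(garage["year"]))  # <class 'int'>
print(garage["year"] + 1)    # 2021
print(len(garage))           # 3 key-value pairs
```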

Example 2

The code below shows a simple English-to-Spanish look-up dictionary named translator. Note the key-value pairs. For example, the key one corresponds to the value uno.


translator = {"one":"uno","two":"dos", "three":"tres", "four":"cuatro", "five": "cinco"}

print(translator["four"])
cuatro

Referencing dictionary values

To obtain a value from a dictionary, use the name of the dictionary followed by square brackets containing the key, as in translator["four"] above.

If the key does not exist in the dictionary, Python raises a KeyError exception:

print(translator["seven"])

KeyError: 'seven'
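If you are not sure a key exists, one common alternative to square brackets is the dictionary's get() method, which returns None (or a default value you supply as the second argument) instead of raising a KeyError. A minimal sketch:

```python
translator = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro", "five": "cinco"}

print(translator.get("four"))          # cuatro
print(translator.get("seven"))         # None: missing key, no KeyError
print(translator.get("seven", "n/a"))  # n/a: the default we supplied
```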

Adding an item (key-value pair) to a dictionary

To add an item to a dictionary, assign a value to a new key using square brackets, as in translator["six"] = "seis".


translator = {"one":"uno","two":"dos", "three":"tres", "four":"cuatro", "five": "cinco"}

translator["six"]="seis"
print(translator["six"])
seis

Updating a value in a dictionary

Updating a value is a simple reassignment. For example, to update the value for the key one, assign a new value to that key, as in translator["one"] = "uno uno". The full code is below:

translator = {"one":"uno","two":"dos", "three":"tres", "four":"cuatro", "five": "cinco"}
translator["six"]="seis" # from the previous example
translator["one"]="uno uno"
print(translator)
{'one': 'uno uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco', 'six': 'seis'}

Deleting a value in a dictionary

To delete an item from a dictionary, use the del statement and reference the dictionary[key] you wish to delete.

del translator["one"]
print(translator)
{'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco', 'six': 'seis'}

Using the in operator to interrogate dictionary keys and values

In the example below, we check whether "five" exists as a key in our dictionary named translator by using the in operator. A Boolean value, True or False, is returned.

translator = {"one":"uno","two":"dos", "three":"tres", "four":"cuatro", "five": "cinco"}

print("five" in translator)
True

It takes slightly more work to check the values in a dictionary, because in checks only the keys. To search the values of our translator dictionary, we use the .values() method.

translator = {"one":"uno","two":"dos", "three":"tres", "four":"cuatro", "five": "cinco"}

print("cinco" in translator.values())
True
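The sketch below contrasts the two checks side by side, using a value ("uno") that appears in the dictionary but is not a key. The .keys() line is an equivalent, more explicit way of writing a key check.

```python
translator = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro", "five": "cinco"}

print("uno" in translator)           # False: "uno" is a value, not a key
print("uno" in translator.values())  # True
print("one" in translator.keys())    # True, same result as "one" in translator
```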

Using iteration with keys

While we won't cover iteration until a later lesson, it may be helpful to see how you can iterate through a dictionary and print out its key-value pairs.

translator = {"one":"uno","two":"dos", "three":"tres", "four":"cuatro", "five": "cinco"}

for key in translator: # for every key...
    print(key, translator[key])
one uno
two dos
three tres
four cuatro
five cinco
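An equivalent and commonly used form loops over the dictionary's items() method, which yields each key-value pair as a tuple that can be unpacked into two variables. This sketch prints the same pairs with an arrow between them:

```python
translator = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro", "five": "cinco"}

for key, value in translator.items():  # each item is a (key, value) tuple
    print(key, "->", value)            # e.g. one -> uno
```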

Understanding nested dictionaries

Let's return to our garage example. Imagine we want to keep an inventory of every spot in our garage. In other words, our dictionary needs to hold multiple cars.

Here's our original simple dictionary named garage.

garage = {
  "brand": "Tesla",
  "model": "Model X",
  "year": 2020
}
print(garage)
{'brand': 'Tesla', 'model': 'Model X', 'year': 2020}

If we wanted our dictionary to hold multiple cars, we could create an outer dictionary of spots and assign each spot a dictionary with that car's information.


mygarage = { 'spot01':{"brand": "Tesla","model":"Model X","year": 2020}, 'spot02': {"brand": "Tesla","model": "Model Y","year": 2019}}

print(mygarage)
{'spot01': {'brand': 'Tesla', 'model': 'Model X', 'year': 2020}, 'spot02': {'brand': 'Tesla', 'model': 'Model Y', 'year': 2019}}
print(mygarage["spot01"])
{'brand': 'Tesla', 'model': 'Model X', 'year': 2020}
print(mygarage["spot01"]["model"])
Model X
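Adding a car or changing one detail works the same way as with a flat dictionary: use one pair of square brackets to reach a spot and a second pair to reach a field inside it. The spot03 car and the changed year below are hypothetical data used only for illustration.

```python
mygarage = {
    "spot01": {"brand": "Tesla", "model": "Model X", "year": 2020},
    "spot02": {"brand": "Tesla", "model": "Model Y", "year": 2019},
}

# Add a new spot whose value is another dictionary (hypothetical car)
mygarage["spot03"] = {"brand": "Tesla", "model": "Model 3", "year": 2021}

# Update one field inside an existing spot (hypothetical correction)
mygarage["spot02"]["year"] = 2020

print(len(mygarage))               # 3
print(mygarage["spot02"]["year"])  # 2020
```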

Let's look at another nested dictionary example. There are two people, each identified by a number (a key in the outer dictionary) and described by an inner dictionary holding their name, age, and sex.


people = {1: {'name': 'John', 'age': '27', 'sex': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}}
print(people[1]['name'])
John
print(people[1]['age'])
27
print(people[1]['sex'])
Male

3.4 Try it.

Build a nested dictionary of items you want to sell (such as old iPhones or clothing) and print out the key-value pairs.

Here's a great resource: https://www.geeksforgeeks.org/python-nested-dictionary/

3.5 Example

forsale = {'iphone5': {'price': 90, 'condition': 'screen cracked'},
          'macbookair': {'price': 150,'condition': 'fair', 'memory': '8GB'}}
          
print(forsale['iphone5']['price'])
90
print(forsale['macbookair']['price'], forsale['macbookair']['memory'])
150 8GB
for key in forsale:
    print(key, forsale[key])
iphone5 {'price': 90, 'condition': 'screen cracked'}
macbookair {'price': 150, 'condition': 'fair', 'memory': '8GB'}
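To print the nested details more readably, you can loop over items() and pull the inner values out by key inside an f-string. This is one possible format, sketched below; every item has a price and a condition, but only macbookair has a memory key, so the sketch prints only the shared fields.

```python
forsale = {'iphone5': {'price': 90, 'condition': 'screen cracked'},
           'macbookair': {'price': 150, 'condition': 'fair', 'memory': '8GB'}}

listings = []
for item, details in forsale.items():
    # details is the inner dictionary for this item
    listings.append(f"{item}: ${details['price']} ({details['condition']})")

for line in listings:
    print(line)
# iphone5: $90 (screen cracked)
# macbookair: $150 (fair)
```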

3.6 Summary

  • A tuple is a Python data object. Tuples are immutable.
  • Tuples and lists can hold data of different types (i.e. heterogeneous data).
  • Lists are mutable, which means you can change their values and add or remove elements. Tuples are immutable and have a fixed length.
  • Python uses zero-based indexing. This means we start counting the elements of a tuple, list, etc. at zero rather than 1. R, for example, uses one-based indexing.
  • To update a list, reference the list name followed by the index and assign it a new value (e.g. x[1] = "Hello").
  • To add a new value to the end of a list, use the += operator with the new value in square brackets (e.g. x += ["Hello"]) or the .append() method (e.g. x.append("Hello")).
  • To remove a value from a list, you can use del, .pop(), or .remove().
  • To extract a subset of a list (i.e. slicing), use a colon to delineate the range (e.g. x[2:5] returns the elements at indexes 2, 3, and 4).
  • Dictionaries allow you to store heterogeneous data in the form of key-value pairs.
  • The in operator checks whether a key exists in a dictionary; combine it with .values() to check whether a value exists.
  • Deleting a key-value pair from a dictionary follows the form del dictionary[key].

Exercise 3.1

  1. Create two lists with the following data. Show how you create the lists. Then, use the type() function to verify that each list is a list object. Finally, use the len() function to show how many values are in each list.
Month Value
January 230
February 250
March 345
April 290
May 890
June 321
July 232
August 456
September 213
October 299
November 123
December 566
  2. Update the value for April from 290 to 302 and verify by displaying the contents of the list.

  3. Use indexing to reference April from your month list and the corresponding value of 302 from your value list. Print those values in a sentence that reads:

You spent $302 in April.

  4. Add 2 new values to your lists. The new value for the month list should be total, and the new value for the value list should be the sum of all values in that list.

Print those new values in a sentence that reads:

The total annual spend on consumable goods was $4,227.

  5. Now, create a dictionary with the following data. Show how you create the dictionary and print out its contents. Then, use the type() function to verify that you have an object of type dictionary, and use the len() function to show how many values there are in the dictionary. Next, update the amount for April from 290 to 302 and verify by displaying the contents of the dictionary using a for loop. Finally, sum all the values.
Month Amount
January 230
February 250
March 345
April 290
May 890
June 321
July 232
August 456
September 213
October 299
November 123
December 566

Exercise 3.2

  1. Create 2 tuples for the following data: date and precipitation. Use the isinstance() function to show that your tuples are indeed tuples. Note: You'll need to do some research on how to use this function. Ensure both tuples have the same number of values by comparing their lengths. Then, print out the values of both tuples.
Date Precipitation
21-Jul Sun
22-Jul Rain
23-Jul Rain
24-Jul Sun
25-Jul Sun
26-Jul Sun
27-Jul Sun
28-Jul Sun
29-Jul Sun
30-Jul Sun
31-Jul Rain
1-Aug Rain
2-Aug Rain
3-Aug Rain
  2. Convert both tuples into lists named date_list and precipitation_list. Use the type() function to prove that each is now a list.

  3. Count the number of days included in these lists and write a pretty sentence to communicate this information.

  4. Split each list into two lists, one for week 1 and one for week 2 (a total of 4 lists).

  5. Using indexing on week2_precipitation, what will the weather be on July 30th? Return your answer in a pretty sentence.

  6. Create a dictionary named my_weathertracker from the two lists using the following syntax: dict(zip(y,x))

Note: This may require you to research this function. =)

Assignment 3

Part 1

Using the data from Exercise 3.2, replace each value of Rain with 1 and each value of Sun with 0.

  1. How many days in the two-week time span will it rain, and how many days will it be sunny?

  2. The weather can now be predicted for August 4th and 5th: Aug 4th will be sunny and Aug 5th will bring rain. Add both dates to the data.

Print the following statements using indexing:

On 4-Aug we will see the sun. On 5-Aug we will see rain.

Part 2

Refer to the code below and complete the following:

  1. Add two more people to the dictionary people.
  2. Modify the name of person 1 from John to Jonathan.
  3. Delete person 2 from the dictionary.
  4. Show all key-value pairs from the dictionary using a for loop. Be sure to format the output of all of the key-value pairs nicely.
people = {1: {'name': 'John', 'age': '27', 'sex': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}}

Part 3

Back to our friends…

You are working for Facebook’s Operations Intelligence team and are tasked with building a calculator that finds the best ad sales representative each month, based on the largest (hint: max) monthly ad revenue. The 5 sales representatives and their revenue for this month are listed below. Call the calculator rep_calculator.

  • Joe: $1,500,000
  • Sarah: $2,750,000
  • Jack: $560,000
  • Mark: $1,975,000
  • Shawn: $2,200,000

Complete the following:

  1. Build a dictionary
  2. Return the key with the max value
  3. Print a pretty sentence: Sarah has the highest sales! She reached $2,750,000.00 this month

References

Gupta, Kitty. 2018. “Python List Versus Array Versus Tuple - Understanding the Differences.” https://www.freelancinggig.com/blog/2018/05/17/python-list-vs-array-vs-tuple-understanding-the-differences/.

Sturtz, John. 2018. “Lists and Tuples in Python.” https://realpython.com/python-lists-tuples/#python-lists.