Lesson 3 Data Structures I

Welcome to lesson 3. In this lesson, we will begin to learn the ways in which data is stored and organized in Python.

Follow along with this tutorial by creating a new Jupyter (IPython) notebook named lesson03.ipynb and entering the code snippets presented.

Outline

  • Tuples
  • Lists
  • Summary
  • Exercise 3.1
  • Exercise 3.2
  • Assignment 3

In lesson 2, we learned how to create and update variables that hold a single value. When we want to store more than one value in a single variable, we use a data structure. “In any programming language, the data structures available play an extremely important role. They decide how your data will be stored in the memory and how you can retrieve the data when required” (Gupta (2018), para 1).

A data structure defines the attributes of the data and how the data is organized. To make this concept more concrete, you can think of an Excel spreadsheet as a data structure composed of rows and columns. There are many types of data structures in Python, each with a set of features and functions that you can use to manipulate or act on your data. Some examples include the tuple, list, Series, array, and DataFrame (the last three come from libraries: Series and DataFrame from pandas, and array from NumPy or the standard library's array module).

The syntax for each one of these data structures is different.

We’ll begin working with tuples and lists. These “are arguably Python’s most versatile, useful data types. You will find them in virtually every nontrivial Python program” (Sturtz (2018), para 1).

3.1 Tuples

The first data structure I’ll introduce is the tuple. A tuple can contain heterogeneous data, which means you can store data of different types (e.g., str, int, float) in the same structure. Compared with a list holding the same values, a tuple typically uses less memory and can be slightly faster to create. However, you cannot update the values in a tuple: once created, it cannot be changed, which is what we mean when we say it is immutable. A tuple also has a fixed length, so you cannot add or remove elements. Use tuples when your data will not change.
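The memory claim can be checked with a short sketch. It assumes CPython, where sys.getsizeof() reports an object's size in bytes; the exact numbers vary by Python version and platform, so only the comparison matters. The variable names are just for illustration.

```python
import sys

# The same three values stored as a tuple and as a list
values_tuple = (230, 250, 345)
values_list = [230, 250, 345]

# On CPython, the tuple object is smaller than the equivalent list
tuple_size = sys.getsizeof(values_tuple)
list_size = sys.getsizeof(values_list)
print(tuple_size, list_size)
print(tuple_size < list_size)  # True on CPython

# A list can be changed in place; a tuple cannot
values_list[0] = 240
print(values_list)  # [240, 250, 345]
```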

We can best understand how tuples work through an example. Let’s assume we tracked our monthly spending on consumable goods on Amazon over a 12-month period. We have 12 values, one for each month. The objective is to store this table of information in variables.

Month      Value ($)
January    230
February   250
March      345
April      290
May        890
June       321
July       232
August     456
September  213
October    299
November   123
December   566

If we used a variable for each value, we’d have to create 12 variables. For example,

January = 230
February = 250
# and so on

Instead, since these values are fixed (we can’t go back in time and change how much we spent each month, assuming no returns), we can store them in a single variable of type tuple. A tuple can hold different types of data, but for now let’s store just the value for each month. We do this by giving our tuple object a name, e.g. spend. Next, we use the assignment operator (=) to assign the 12 values to our variable spend. The values are enclosed in parentheses and separated by commas to delimit one value from the next.

spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)
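It is actually the commas, not the parentheses, that make a tuple. This matters most when a tuple holds a single value, where a trailing comma is required. A small sketch (variable names are illustrative):

```python
# Parentheses around a single value are just grouping; the result is an int
not_a_tuple = (230)
print(type(not_a_tuple))  # <class 'int'>

# A trailing comma turns a single value into a one-element tuple
one_month = (230,)
print(type(one_month))  # <class 'tuple'>
print(len(one_month))   # 1

# Parentheses are optional when the commas are present
also_a_tuple = 230, 250, 345
print(type(also_a_tuple))  # <class 'tuple'>
```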

The variable spend is a data object of type tuple. We can confirm this by using the type() function. You’ll observe that the object type returned is a tuple.

type(spend)
<class 'tuple'>

To see the length of the tuple (how many elements it contains), we can use the len() function.

len(spend)
12

To add up all of the values in our tuple spend, we can use the sum() function.

Similarly, we can determine the minimum and maximum values in a tuple using the min() and max() functions, respectively.

sum(spend)
4215
min(spend)
123
max(spend)
890
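These functions combine naturally with len(). As one example (the variable names here are illustrative), the average monthly spend is the total divided by the number of months:

```python
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

total = sum(spend)        # 4215
months = len(spend)       # 12
average = total / months  # / always returns a float
print(f"Average monthly spend: ${average}")  # Average monthly spend: $351.25
```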

To access a given value in a tuple, we use indexing. This means that we reference a value by its position within the tuple.

For example, to access the value for January, the first value in the tuple, we would use the expression spend[0]. We place the position of the value (its index) inside square brackets.

This expression says: return the first value in the tuple. But why did we use 0 instead of 1?

Let’s look at an example.

spend[1]
250

What value is returned when we pass in 1? It’s the second value in the tuple, 250. From this we can deduce that Python uses zero-based indexing: it starts counting at 0, not 1. In contrast, the R programming language uses one-based indexing; it starts counting at 1.

The table below shows the index for each value in our tuple.

Index  Value  Expression
0      230    spend[0]
1      250    spend[1]
2      345    spend[2]
3      290    spend[3]
4      890    spend[4]
5      321    spend[5]
6      232    spend[6]
7      456    spend[7]
8      213    spend[8]
9      299    spend[9]
10     123    spend[10]
11     566    spend[11]
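You do not have to build this table by hand. A minimal sketch using the built-in enumerate(), which yields each index alongside its value, produces the same three columns (the rows list is just an illustrative name):

```python
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

rows = []
for index, value in enumerate(spend):
    # enumerate() counts from 0, matching Python's zero-based indexing
    rows.append(f"{index} {value} spend[{index}]")

print("\n".join(rows))
```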

Now that you understand how Python indexes values in a tuple, let’s return the value for January from our tuple spend.

spend[0]
230

Now, let’s create a second tuple, called month, to store the names of the months that correspond to our monthly spending.

month=("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

To print out the values of our new tuple month just use the print() function or type the name of the tuple.

print(month) 
('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')
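Because month and spend are in the same order, the same index retrieves a month and its spending. As a small illustration, the tuple method index() finds the position of the largest value, and that position is then used to look up the month (peak is a hypothetical name):

```python
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)
month = ("January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December")

# Position of the highest monthly spend (index() returns the first match)
peak = spend.index(max(spend))  # 4

# The same index looks up the matching month
print(f"Highest spend: {month[peak]} (${spend[peak]})")  # Highest spend: May ($890)
```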

Tuples are interesting data structures because they can store heterogeneous data. That means we can store data of type int, str, float, and bool all in one data structure.

For example, we can store the demographic data from lesson 2 in a single tuple.

Variable       Value
customer id    12345
first name     Wanda
last name      Wonderly
gender         Female
married        False
annual income  240,000

Try it.

demographics = (12345, "Wanda", "Wonderly", "Female", False, 240000)

type(demographics) #returns the type of data structure
<class 'tuple'>
type(demographics[0]) #returns the data type of the 1st value
<class 'int'>
type(demographics[1]) #returns the data type of the 2nd value
<class 'str'>
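Rather than checking one element at a time, a short loop can report the type of every element. This is a sketch; element_types is an illustrative name:

```python
demographics = (12345, "Wanda", "Wonderly", "Female", False, 240000)

# Collect the name of each element's type, in order
element_types = []
for item in demographics:
    element_types.append(type(item).__name__)

print(element_types)  # ['int', 'str', 'str', 'str', 'bool', 'int']
```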

Tuples are useful when you do not want the data to change. A defining characteristic of a tuple is that it is immutable. For example, if Wanda Wonderly’s annual income changed, we would not be able to update the value in our tuple.

To access the marital status element, we use square brackets [ ]. We want the 5th value, which is at index 4 (remember, we start counting at zero).

demographics[4]
False

This returns the marital status.

With a mutable data structure such as a list, we could update Wanda’s income (stored at index 5) if her salary changed, in the following way:

demographics[5] = 250000

However, if you try to run this code on our tuple, Python raises a TypeError:

TypeError: 'tuple' object does not support item assignment
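If you want to see the error without stopping your notebook, you can wrap the assignment in a try/except block. This sketch catches the TypeError, prints its message, and shows that the tuple is unchanged (message is an illustrative name):

```python
demographics = (12345, "Wanda", "Wonderly", "Female", False, 240000)

try:
    demographics[5] = 250000  # tuples do not support item assignment
except TypeError as error:
    message = str(error)
    print("Could not update the tuple:", message)

print(demographics[5])  # 240000, the tuple is unchanged
```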

3.2 Lists

Another type of data structure is the list. Lists have specific features that differ from tuples. The most important feature for our purposes is that lists are mutable, which means we can update the values in a list. Many programming languages use arrays where Python programmers would use a list. Python has no built-in array type of that kind (arrays are available through the standard library’s array module and the NumPy library), so lists are commonly used instead.

Let’s begin by creating a list.

mylistofskills = ["python", "R", "SQL"]

You’ll immediately notice that to construct the list mylistofskills, we enclose the values in square brackets [ ] rather than parentheses. This is how Python distinguishes between the two data structures.

To access a single element in a list, use square brackets and pass in its index (position number).

mylistofskills[0]  # returns the first value
'python'
mylistofskills[1]  # returns the second value
'R'
mylistofskills[2]  # returns the third value
'SQL'

To determine how many values you have in your list use the len() function, just as you did for determining the length of a tuple.

len(mylistofskills) 
3

Slicing lists

There are times when you will want to access a subset of your data. You can do this by slicing: inside the square brackets, provide the index of the first element you want, followed by a colon, then the index one past the last element you want (that is, 1 + the index of the last value).

An example should make this clearer. Let’s slice mylistofskills and only return the first two elements.


mylistofskills[0:2]
['python', 'R']

This returns the elements at indices 0 and 1: a slice includes the element at the start index and excludes the element at the stop index.
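A few more slices, as a sketch. Leaving out the start index means "from the beginning", leaving out the stop index means "to the end", and the same syntax works on tuples (first_quarter is an illustrative name):

```python
mylistofskills = ["python", "R", "SQL"]

print(mylistofskills[:2])  # ['python', 'R']  (start defaults to 0)
print(mylistofskills[1:])  # ['R', 'SQL']     (stop defaults to the end)

# A slice from start to stop contains stop - start elements
print(len(mylistofskills[0:2]))  # 2

# Slicing a tuple returns a new tuple: the first quarter of spending
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)
first_quarter = spend[0:3]
print(first_quarter)  # (230, 250, 345)
```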

Modifying a Single List Value

A single value in a list can be replaced by using indexing and simple assignment.

For example, to replace a skill in mylistofskills, assign the new value to the third element in the list (index 2).


mylistofskills[2] = "Java"
mylistofskills
['python', 'R', 'Java']

Adding new list items

To add a new skill to mylistofskills, you can use the += operator with the new value wrapped in square brackets. When the left-hand side is a list, += adds every item from the right-hand side one at a time. Strings are iterable: Python can break a string into its individual characters. So if you use += with a bare string, each letter is added as a separate list item. Wrapping the string in square brackets makes it a one-item list, so the whole string is added as a single item. See below.


mylistofskills += ["SQL"]  # correct way of adding a string to a list
mylistofskills
['python', 'R', 'Java', 'SQL']

mylistofskills += "SQL"  # incorrect way of adding a string to a list
mylistofskills  # each letter was added as a separate list element
['python', 'R', 'Java', 'SQL', 'S', 'Q', 'L']

An alternative is the append() method, which adds a new element to the end of the list. append() does not iterate over its argument, so a string is added as a single element, with no need to wrap it in square brackets [ ].


mylistofskills.append("Tableau") #adds Tableau as a single element
mylistofskills
['python', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']
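The difference between += and append() is easiest to see when the new value is itself a list. In this sketch (the list names are illustrative), += adds each item, while append() adds the whole list as one nested element:

```python
skills_a = ["python", "R"]
skills_b = ["python", "R"]

# += iterates over the right-hand side and adds each item
skills_a += ["SQL", "Tableau"]
print(skills_a)  # ['python', 'R', 'SQL', 'Tableau']

# append() adds its argument as a single element, here a nested list
skills_b.append(["SQL", "Tableau"])
print(skills_b)  # ['python', 'R', ['SQL', 'Tableau']]
print(len(skills_a), len(skills_b))  # 4 3
```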

If you want to insert a new element at a specific position, use the insert() method. The first argument is the index at which to insert, and the second argument is the value to be inserted. The existing elements from that position onward shift one place to the right.

mylistofskills.insert(1,"Excel") #adds Excel as the second element
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']

Removing list items

There are a few ways to remove an item from a list. I’ll demonstrate three ways using the mylistofskills list. Let’s look at the list to remind ourselves which value corresponds to which index.

mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']
  • Way 1 - use del to remove using the index number
del mylistofskills[5] #removes the value at the given index 'S'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Q', 'L', 'Tableau']
  • Way 2 - use .remove() to remove using the element value (it removes the first matching element)
mylistofskills.remove("L")  # removes the first element equal to 'L'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Q', 'Tableau']
  • Way 3 - use .pop() to remove using the index number (it also returns the removed value)
mylistofskills.pop(5)  # removes and returns the value at index 5, 'Q'
'Q'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Tableau']
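Two details are worth knowing. .remove() deletes only the first matching value (and raises a ValueError if the value is not in the list), and .pop() with no argument removes and returns the last element. A short sketch on a separate, illustrative list:

```python
tools = ["SQL", "Excel", "SQL", "Tableau"]

tools.remove("SQL")  # only the first "SQL" is removed
print(tools)         # ['Excel', 'SQL', 'Tableau']

last = tools.pop()   # no index: removes and returns the last element
print(last, tools)   # Tableau ['Excel', 'SQL']

try:
    tools.remove("Java")  # "Java" is not in the list
except ValueError:
    print("Java was not found")
```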

Converting data structures

There will be instances where you begin working with your data in tuple form and then want to convert it to a list so you can update the data. Remember, tuples are immutable, so lists are the data structure of choice if you need to update values.

To convert a tuple to a List you can use the list() function.

mytuple = (4, 5)  # creates a tuple named mytuple
type(mytuple)  # confirms that mytuple is a tuple
<class 'tuple'>
print(mytuple)
(4, 5)
listnumbers = list(mytuple)  # converts mytuple to a list called listnumbers
type(listnumbers)  # confirms that listnumbers is a list
<class 'list'>
print(listnumbers)
[4, 5]

To convert a List to a tuple you can use the tuple() function.

mylist = [6, 3, 7, 4]
type(mylist)
<class 'list'>
print(mylist)
[6, 3, 7, 4]
tupleofnumbers = tuple(mylist)  # converts a list to a tuple
type(tupleofnumbers)
<class 'tuple'>
print(tupleofnumbers)
(6, 3, 7, 4)
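Putting the two conversions together gives a common pattern for "updating" a tuple: convert it to a list, change the list, and convert it back. This sketch updates Wanda's income from the earlier demographics example (demographics_list is an illustrative name). Note that the result is a new tuple; the original tuple object itself is never modified.

```python
demographics = (12345, "Wanda", "Wonderly", "Female", False, 240000)

# Convert to a list so the value can be changed
demographics_list = list(demographics)
demographics_list[5] = 250000  # index 5 holds annual income

# Convert back to a tuple and rebind the name to the new tuple
demographics = tuple(demographics_list)
print(demographics)  # (12345, 'Wanda', 'Wonderly', 'Female', False, 250000)
```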

3.3 Summary

  • A tuple is a Python data object. Tuples are immutable.
  • Tuples and lists can hold data of different types (i.e., heterogeneous data).
  • Lists are mutable, which means you can change their values and modify their structure. Tuples are immutable, and a tuple has a fixed length.
  • Python uses zero-based indexing. This means we start counting the elements in a tuple, array, list, etc. at zero, rather than 1. R, for example, uses one-based indexing.
  • To update a list, reference the list name followed by the index and assign it a new value (e.g. x[1] = "Hello").
  • To add a new value to the end of a list, use the += operator followed by the new value in square brackets (e.g. x += ["Hello"]) or the append() method (e.g. x.append("Hello")).
  • To remove a value from a list, you can use del or .pop() (by index) or .remove() (by value).
  • To extract a subset of a list (i.e. slicing), use a colon to delineate the range (e.g. x[2:5] returns the elements at indices 2, 3, and 4).

Exercise 3.1

  1. Create two lists with the following data.
Month      Value
January    230
February   250
March      345
April      290
May        890
June       321
July       232
August     456
September  213
October    299
November   123
December   566
  2. Update the value for April from 290 to 302.

  3. Use indexing to reference April in your month list and its corresponding value of 302 in your value list. Print those values in a sentence that reads:

You spent $302 in April.

  4. Add 2 new values to your lists. The new value for the month list should be "Total", and the new value for the value list should be the sum of all values in that list.

Print those new values in a sentence that reads:

The total annual spend on consumable goods was $4,227.

Exercise 3.2

  1. Create two tuples, date and precipitation, for the following data.
Date    Precipitation
21-Jul  Sun
22-Jul  Rain
23-Jul  Rain
24-Jul  Sun
25-Jul  Sun
26-Jul  Sun
27-Jul  Sun
28-Jul  Sun
29-Jul  Sun
30-Jul  Sun
31-Jul  Rain
1-Aug   Rain
2-Aug   Rain
3-Aug   Rain
  2. Convert both tuples into lists named date_list and precipitation_list.

  3. Count the number of days included in the lists.

  4. Split each list into two lists, one for week 1 and one for week 2 (four lists in total, e.g. week1_date, week1_precipitation, week2_date, and week2_precipitation).

  5. Using indexing on week2_precipitation, what will the weather be on July 30th?

Assignment 3

Using the data from Exercise 3.2, replace each value of Rain with 1 and each value of Sun with 0.

  1. How many days in the 2 week time span will it rain and how many days will it be sunny?

  2. The weather can now be predicted for August 4th and 5th: Aug 4th will be sunny and Aug 5th will be rainy. Add both dates and their weather values to the data.

Print the following statements using indexing:

On 4-Aug we will see the sun. On 5-Aug we will see rain.

References

Gupta, Kitty. 2018. “Python List Versus Array Versus Tuple - Understanding the Differences.” https://www.freelancinggig.com/blog/2018/05/17/python-list-vs-array-vs-tuple-understanding-the-differences/.

Sturtz, John. 2018. “Lists and Tuples in Python.” https://realpython.com/python-lists-tuples/#python-lists.