Lesson 3 Data Structures I
Welcome to lesson 3. In this lesson we will begin to learn the ways in which data is stored and organized in Python.
Follow along with this tutorial by creating a new ipython notebook named lesson03.ipynb and entering the code snippets presented.
- Exercise 3.1
- Exercise 3.2
- Assignment 3
In lesson 2, we learned how to create and update variables with a single value. When we want to store more than one value in a variable object we create a data structure for the object. “In any programming language, the data structures available play an extremely important role. They decide how your data will be stored in the memory and how you can retrieve the data when required” (Gupta (2018), para 1).
A data structure defines attributes of the data and the organization of the data. To make this concept more concrete, you can think of an Excel spreadsheet as a data structure composed of rows and columns. There are many types of data structures in Python, each with a set of features and functions that you can use to manipulate or act on your data. Some examples include the tuple, List, Series, array, and DataFrame.
The syntax for each one of these data structures is different.
We’ll begin working with tuples and Lists. These “are arguably Python’s most versatile, useful data types. You will find them in virtually every nontrivial Python program”(Sturtz (2018), para 1).
The first simple data structure that I’ll introduce you to is the tuple. A tuple is a data object in Python that can contain heterogeneous data. This means you can store data of different types (e.g. str, int, float) in the same structure. Compared with a list, a tuple is generally faster to create and consumes less memory. However, you cannot update the values in a tuple once it is created; it is immutable. A tuple therefore has a fixed length. Use tuples if your data will not change.
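The memory claim can be checked informally with sys.getsizeof from the standard library. This is a minimal sketch that assumes the CPython interpreter; the exact byte counts vary by Python version and platform, and the names as_tuple and as_list are illustrative only.

```python
import sys

# The same three values stored in a tuple and in a list (illustrative names).
as_tuple = (230, 250, 345)
as_list = [230, 250, 345]

# getsizeof reports the size in bytes of the container object itself.
tuple_size = sys.getsizeof(as_tuple)
list_size = sys.getsizeof(as_list)

print("tuple:", tuple_size, "bytes")
print("list: ", list_size, "bytes")
print("tuple is smaller:", tuple_size < list_size)
```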
We can best understand how tuples work through an example. Let’s assume we tracked our monthly spend on consumable goods for a 12 month period on Amazon. We have 12 values, one for each month. The objective is to store this table of information as variables.
If we used a variable for each value, we’d have to create 12 variables. For example,
January = 230
February = 250 # and so on
Instead, since these values are fixed (we can’t go back in time and change how much we spent per month, assuming no returns), we can store them in a single variable of type tuple. A tuple can store different types of data, but here we’ll just store the value for each month. We do this by giving our tuple object a name, i.e. spend. Next we use the assignment operator (i.e. =) to assign 12 values to our variable spend. The values must be enclosed in parentheses and separated by commas to delimit one value from the next.
spend=(230,250,345,290, 890, 321, 232,456,213,299, 123, 566)
The variable spend is a data object of type tuple. We can prove this by using the type() function. You’ll observe that the object type that is returned is a tuple.
To see the length (or how many elements are in the tuple) we can use the len() function.
If we wanted to add up all of the values in our tuple named spend we could use a function called sum().
Similarly, we can determine the minimum and maximum value in a tuple using the min() and max() functions, respectively.
sum(spend)
min(spend)
max(spend)
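Gathered into a single notebook cell, the checks described above might look like the following sketch. The print() calls are added here only so that every result is displayed; the expected results are shown as comments.

```python
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

print(type(spend))  # <class 'tuple'>
print(len(spend))   # 12
print(sum(spend))   # 4215
print(min(spend))   # 123
print(max(spend))   # 890
```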
To access a given value in a tuple, we use indexing. This means that we reference a value by its location within the tuple.
For example, if we wanted to access the value for January, the first value in the tuple, we would use the expression spend[0]. We pass the position of the value inside square brackets.
This expression is basically saying return the first value in the tuple. However, why did we use 0 instead of 1?
Let’s look at an example.
spend[1]
What value is returned when we pass in 1? It’s the second value in the tuple, 250. We can deduce that Python uses zero indexing. This means that Python starts counting at zero, not 1. In contrast, the R programming language uses one-based indexing; it starts counting at 1.
The table below shows the index for a given value in our tuple.
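The index of each value can also be listed in code. The sketch below uses the built-in enumerate() function and a for loop, neither of which is covered in this lesson; they appear here for illustration only.

```python
spend = (230, 250, 345, 290, 890, 321, 232, 456, 213, 299, 123, 566)

# enumerate() pairs each value with its zero-based index.
index_table = list(enumerate(spend))

for index, value in index_table:
    print(index, value)  # first line printed: 0 230, last line: 11 566
```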
Now that you understand how Python indexes values in a tuple, let’s return the value for January from our tuple using spend[0].
Now, let’s create a second tuple, called month, to store the actual months that correspond to our monthly spending.
month=("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
To print out the values of our new tuple month, just use the print() function or type the name of the tuple.
('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')
Tuples are interesting data structures because they can store heterogeneous data. That means we can store data of types such as str, int, float, and bool all in one data structure.
For example, we can store the demographic data from lesson 2 all in a single tuple.
demographics=(12345,"Wanda","Wonderly","Female", False, 2400000)
type(demographics) #returns the type of data structure
type(demographics[0]) #returns the data type of the 1st value
type(demographics[1]) #returns the data type of the 2nd value
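To see the type of every value at once, one option is the sketch below. It uses a list comprehension, which this lesson does not cover, and the name type_names is illustrative only.

```python
demographics = (12345, "Wanda", "Wonderly", "Female", False, 2400000)

# Collect the type name of each value in the tuple, in order.
type_names = [type(value).__name__ for value in demographics]

print(type_names)  # ['int', 'str', 'str', 'str', 'bool', 'int']
```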
Tuples are useful when you do not want the data to change. A defining characteristic of a tuple is that it is immutable. For example, if Wanda Wonderly’s annual income changed we would not be able to update our tuple.
To access the marriage element we use the square brackets [ ]. We want to access the 5th value, which is at index 4 (remember we start counting at zero).
demographics[4]
This returns the marriage status.
With other data structures, we could usually update the income value if Wanda’s salary changed, in the following way:
demographics[5] = 250000
However, if you try to run this code it returns a TypeError:
TypeError: 'tuple' object does not support item assignment
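The failed assignment can be reproduced safely with the sketch below. It uses try/except, which this lesson does not cover, only so that the cell keeps running after the error and can show that the tuple is unchanged.

```python
demographics = (12345, "Wanda", "Wonderly", "Female", False, 2400000)

try:
    demographics[5] = 250000  # attempt to overwrite the income value
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

print(demographics[5])  # still 2400000, the tuple was not modified
```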
Another type of data structure is called a List. Lists have specific features that differ from tuples. The most important feature for our purposes is that Lists are mutable. This means we can update values in a list. In many programming languages, arrays are used instead of Lists. In Python, however, arrays are not a core built-in type (they are available through the standard library’s array module or through NumPy), so Python lists are typically used instead.
Let’s begin by creating a list.
mylistofskills = ["python", "R", "SQL"]
You’ll immediately notice that to construct a list named mylistofskills we pass in the list values using square brackets [ ] rather than parentheses. This is how Python distinguishes between the two data structures.
To access a single element in a list, use the square brackets and pass in the position number.
mylistofskills[0] #returns the first value
mylistofskills[1] #returns the second value
mylistofskills[2] #returns the third value
To determine how many values you have in your list use the len() function, just as you did for determining the length of a tuple.
There are times when you will want to access a subset of your data. You can do this by providing the index of the first element you wish to return, followed by a colon, then 1 + the index of the last value you want to return.
An example should make this clearer. Let’s slice mylistofskills and only return the first two elements.
mylistofskills[0:2]
This will return elements 0 and 1. The slice is inclusive of the first index and exclusive of the last index.
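A short sketch of the slicing rule follows. The names first_two and last_two are illustrative only.

```python
mylistofskills = ["python", "R", "SQL"]

first_two = mylistofskills[0:2]  # indexes 0 and 1; index 2 is excluded
print(first_two)                 # ['python', 'R']

last_two = mylistofskills[1:3]   # indexes 1 and 2; index 3 is excluded
print(last_two)                  # ['R', 'SQL']

print(mylistofskills)            # ['python', 'R', 'SQL'], the original list is unchanged
```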
Modifying a Single List Value
A single value in a list can be replaced by using indexing and simple assignment.
For example, to replace a skill in mylistofskills, assign the new value to the third element in the list (index 2).
mylistofskills[2] = "Java"
mylistofskills
['python', 'R', 'Java']
Adding new list items
To add a new skill to mylistofskills, assign the new value using the += operator, with the new value enclosed in square brackets [ ]. The square brackets are needed because += expects an iterable on its right-hand side and adds each of its items to the list. This matters most for strings, because strings are iterable, not atomic. This means that a string can be broken down into smaller components (its individual characters), and each letter would be added as a separate list item. Therefore we need to wrap the string in its own list before adding it to the existing list. See below.
mylistofskills += ["SQL"] # correct way of adding a string to list.
mylistofskills
['python', 'R', 'Java', 'SQL']
mylistofskills += "SQL" # incorrect way of adding a string to list.
mylistofskills # prints each letter as a list element.
['python', 'R', 'Java', 'SQL', 'S', 'Q', 'L']
An alternative is to use the append() method to add a new element to the end of the list. When using append(), the value passed in is not treated as an iterable, and therefore it is added as a single element to the list (without the need to create a new list using the square brackets [ ]).
mylistofskills.append("Tableau") #adds Tableau as a single element
mylistofskills
['python', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']
If you wanted to insert a new element at a specific location, you would use the insert() method. The first argument is the location (the specified index). The second argument is the value to be inserted. The elements from that index onward are pushed to the right.
mylistofskills.insert(1,"Excel") #adds Excel as the second element
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']
Removing list items
There are a few ways to remove an item from a list. I’ll demonstrate three ways using the mylistofskills list. Let’s look at the list to remind ourselves of which value corresponds to which index.
['python', 'Excel', 'R', 'Java', 'SQL', 'S', 'Q', 'L', 'Tableau']
- Way 1 - use del to remove using the index number
del mylistofskills[5] #removes the value at index 5, 'S'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Q', 'L', 'Tableau']
- Way 2 - use .remove() to remove using the element value
mylistofskills.remove("L") #removes the first occurrence of the value 'L'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Q', 'Tableau']
- Way 3 - use .pop() to remove using the index number
mylistofskills.pop(5) #removes and returns the value at index 5, 'Q'
mylistofskills
['python', 'Excel', 'R', 'Java', 'SQL', 'Tableau']
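The three approaches differ in what they take and what they give back: del and .pop() work by index, .remove() works by value, and only .pop() returns the removed item. The sketch below shows this on a small hypothetical list named skills, which is not part of the lesson data.

```python
skills = ["python", "Excel", "R", "Java"]  # hypothetical list for illustration

del skills[1]            # removes 'Excel' by index; nothing is returned
skills.remove("R")       # removes the first occurrence of the value 'R'
removed = skills.pop(0)  # removes the item at index 0 and returns it

print(removed)  # python
print(skills)   # ['Java']
```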
Converting data structures
There will be instances where you begin working with your data in tuple form and then you’d like to convert it to a list so you can update the data. Remember, tuples are immutable. Therefore, Lists are the data structure of choice if you need to update values.
To convert a tuple to a List you can use the list() function.
mytuple = (4,5) # creates a tuple named mytuple
type(mytuple) #confirms that mytuple is a tuple
listnumbers = list(mytuple) #converts mytuple to a list called listnumbers
type(listnumbers) # confirms that listnumbers is a List
To convert a List to a tuple you can use the tuple() function.
mylist = [6,3,7,4]
type(mylist) #confirms that mylist is a List
mylist
[6, 3, 7, 4]
tupleofnumbers = tuple(mylist) #converts a list to a tuple
type(tupleofnumbers) #confirms that tupleofnumbers is a tuple
tupleofnumbers
(6, 3, 7, 4)
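Putting the two conversions together gives a way to "update" tuple data: convert to a list, change the value, and convert back. The sketch below uses a hypothetical three-month tuple and an invented replacement value.

```python
spend = (230, 250, 345)    # hypothetical three-month tuple

spend_list = list(spend)   # convert to a list so it can be changed
spend_list[1] = 275        # update the second value (invented figure)
spend = tuple(spend_list)  # convert back to a tuple

print(spend)        # (230, 275, 345)
print(type(spend))  # <class 'tuple'>
```

Note that this creates a new tuple and rebinds the name spend to it; the original tuple object is never modified.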
- A tuple is a Python data object. Tuples are immutable.
- Tuples and Lists can hold data of different types (e.g. heterogeneous data)
- Lists are mutable which means you can change their data value and modify their structures. Tuples are immutable. A tuple has a fixed length.
- Python uses a zero index. This means we start counting elements in a tuple, array, list, etc. at zero, rather than 1. R, for example, starts its index at 1.
- To update a List, reference the list name followed by the index and assign it a new value (e.g. mylistofskills[2] = "Java").
- To append a new value to a list use the += operator followed by the new value in square brackets (e.g. mylistofskills += ["SQL"]).
- To remove a value from a list you can use del, .remove(), or .pop().
- To extract a subset of a List (i.e. slicing) use the colon to delineate the range (e.g. mylistofskills[0:2]).
- Create two Lists with the following data.
Update the value for April from
Use indexing to reference April in monthlist and its corresponding value in valuelist. Print those values in a sentence that reads:
You spent $302 in April.
- Add 2 new values to your Lists. The new value for the monthlist should be total and the new value to be added to the valuelist is the sum of all values in the list.
Print those new variables out in a sentence that reads:
The total annual spend on consumable goods was $4,215.
- Create 2 tuples for the following data:
Convert both tuples into lists
Count the number of days included in this list.
Separate the lists into two lists, one for week1 and one for week2 (total of 4 lists).
Using indexing for week2_precipitation, what will the weather be on July 30th?
Using the data from Exercise 3.2, replace each value of Rain with 1 and each value of Sun with 0.
How many days in the 2 week time span will it rain and how many days will it be sunny?
The weather can now be predicted for August 4th and 5th. August 4th will be sunny and August 5th will be rainy. Add both dates to the data.
Print the following statements using indexing:
On 4-Aug we will see the sun.
On 5-Aug we will see rain.
Gupta, Kitty. 2018. “Python List Versus Array Versus Tuple - Understanding the Differences.” https://www.freelancinggig.com/blog/2018/05/17/python-list-vs-array-vs-tuple-understanding-the-differences/.
Sturtz, John. 2018. “Lists and Tuples in Python.” https://realpython.com/python-lists-tuples/#python-lists.