Exercise 4: Messy Data

By Kristen Sosulski

Solve the Tasks in Order to Get Hands-On with the Concepts
What is the exercise about?

The exercise presents a few tasks based on data management. Solving these 4 tasks will not only brush up your concepts but also give you practical exposure to dealing with messy data using R. All the best!

Task 1: Getting to know the data

a. Import the data named “summer_winter_olympics.csv”
b. View the data
c. Look at the column names
d. Look at the dimensions of the data (rows and columns)
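
One possible way to approach Task 1 is sketched below. So that the sketch runs on its own, it first writes a tiny made-up CSV to a temporary file; the column headers (c1 to c4) and all values are hypothetical stand-ins, not the contents of the real “summer_winter_olympics.csv”. With the real file, only the path passed to read.csv() should need to change.

```r
# Hypothetical stand-in for the real file: a tiny made-up CSV written to a
# temporary location. Headers and values are invented for illustration only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "c1,c2,c3,c4",
  "Country A,10,5,3",
  "Country B,20,0,1",
  "Country C,3,0,0"
), csv_path)

# a. Import the data (with the real file: read.csv("summer_winter_olympics.csv"))
olympics <- read.csv(csv_path, stringsAsFactors = FALSE)

# b. View the data: head() prints the first rows in the console;
#    View(olympics) opens a spreadsheet-style viewer in an interactive session
head(olympics)

# c. Look at the column names
names(olympics)

# d. Look at the dimensions: number of rows, then number of columns
dim(olympics)
```

str(olympics) is another quick check at this stage, since it shows the type of each column alongside a few values.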

Task 2: Dealing with Data

a. Look at the column names and change them to more meaningful names
b. The data represent, in order:
1. Country
2. Number of summer games played, gold, silver, bronze, total
3. Number of winter games played, gold, silver, bronze, total
4. Total (Winter + Summer) games, gold, silver, bronze, total
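
The column order listed above suggests 16 columns in total: one for the country and five for each of the three groups. A minimal sketch of the renaming step follows, assuming that layout holds for the real file. The toy data frame, its values, and the chosen names (such as summer_gold or combined_total) are illustrative assumptions; any consistent naming scheme would do.

```r
# Toy stand-in with 16 columns and uninformative default names.
# All values are made up for illustration.
olympics <- data.frame(c("Country A", "Country B"), matrix(1:30, nrow = 2))
names(olympics)  # the defaults say nothing about the contents

# Build the new names from the documented column order:
# country, then games/gold/silver/bronze/total for summer, winter, combined.
groups   <- c("summer", "winter", "combined")
measures <- c("games", "gold", "silver", "bronze", "total")
new_names <- c(
  "country",
  paste(rep(groups, each = 5), rep(measures, times = 3), sep = "_")
)

# Guard against a mismatch before overwriting the names
stopifnot(length(new_names) == ncol(olympics))
names(olympics) <- new_names

names(olympics)
```

Generating the names with paste() rather than typing all 16 by hand keeps the scheme consistent and makes a typo less likely.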

Task 3: Summary

a. Use table() to find the frequency of total summer games played
b. Explore the data with other variables
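
As a rough illustration of Task 3, the sketch below applies table() to a small made-up data frame. The column names (summer_games, winter_games) and the values are assumptions for the example; substitute whatever names you chose in Task 2.

```r
# Made-up data for illustration; column names are assumed, not from the real file
olympics <- data.frame(
  country      = c("A", "B", "C", "D", "E"),
  summer_games = c(10, 20, 10, 3, 20),
  winter_games = c(5, 0, 5, 1, 8)
)

# a. Frequency of each distinct value of summer games played
games_freq <- table(olympics$summer_games)
games_freq

# Same counts, most common value first
sort(games_freq, decreasing = TRUE)

# b. Exploring another variable: a five-number summary plus the mean
summary(olympics$winter_games)
```

table() counts how many countries share each value, so it is most readable for variables with relatively few distinct values; summary() is often a better first look at variables with many.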

Task 4: Graphs

a. Create a histogram of summer games (total)
b. Create a histogram of winter games (total)
c. Put the above two histograms on one page
d. Create two histograms on one page: total summer and total winter medals won
e. Is there a correlation between the number of medals given out in winter and in summer? (make a plot)
f. How about the number of games each country competes in? Is there a correlation between winter and summer?
g. Look at the distribution of each type of medal, by season (6 histograms on one page)
h. Recreate g with a different number of bins (10 instead of 20)
i. Explore the data on your own
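
The plotting tasks can be approached with base R graphics along the following lines. This is only a sketch on randomly generated, made-up data, and the column names are assumptions; it shows the mechanics (several plots on one page, a scatter plot with a correlation, a bin-count setting), not the patterns you will see in the real data.

```r
# Made-up data for illustration only; column names are assumed
set.seed(1)
n <- 30
olympics <- data.frame(
  summer_games  = sample(1:27, n, replace = TRUE),
  winter_games  = sample(0:22, n, replace = TRUE),
  summer_gold   = rpois(n, 15), summer_silver = rpois(n, 15),
  summer_bronze = rpois(n, 15), winter_gold   = rpois(n, 4),
  winter_silver = rpois(n, 4),  winter_bronze = rpois(n, 4)
)
olympics$summer_total <- with(olympics, summer_gold + summer_silver + summer_bronze)
olympics$winter_total <- with(olympics, winter_gold + winter_silver + winter_bronze)

# a-c. Two histograms on one page: mfrow = c(rows, columns)
old_par <- par(mfrow = c(1, 2))
hist(olympics$summer_games, main = "Summer games", xlab = "Games played")
hist(olympics$winter_games, main = "Winter games", xlab = "Games played")
par(old_par)  # restore the previous layout

# e. Scatter plot and correlation of summer vs. winter medal totals
plot(olympics$summer_total, olympics$winter_total,
     xlab = "Summer medals", ylab = "Winter medals")
medal_cor <- cor(olympics$summer_total, olympics$winter_total)
medal_cor

# g-h. Six histograms on one page; set n_bins to 10 to redo them with fewer bins.
# Note: a single number passed to breaks is a suggestion; R picks "pretty" cut points.
medal_cols <- c("summer_gold", "summer_silver", "summer_bronze",
                "winter_gold", "winter_silver", "winter_bronze")
n_bins <- 20
old_par <- par(mfrow = c(2, 3))
for (col in medal_cols) {
  hist(olympics[[col]], breaks = n_bins, main = col, xlab = "Medals")
}
par(old_par)
```

Tasks d and f follow the same pattern as the blocks for a-c and e, with the medal-total and games-played columns swapped in respectively.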