Python CSV data (import csv)

CSV stands for “comma-separated values.”

Example of what CSV data looks like:

Joy,15,1025
Alfred,14,900
Jane,16,800

Notice that each line has multiple items of data separated by commas.
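To see how those lines turn into rows and items, here is a minimal sketch that parses the sample text with Python’s csv module. The io.StringIO wrapper, which lets a string behave like an open file, is used only so the example runs without a file on disk; reading an actual file works the same way.

```python
import csv
import io

# The three sample lines from above, stored in a string for illustration
text = "Joy,15,1025\nAlfred,14,900\nJane,16,800\n"

# io.StringIO makes the string behave like an open file
reader = csv.reader(io.StringIO(text))
rows = list(reader)
print(rows)
# [['Joy', '15', '1025'], ['Alfred', '14', '900'], ['Jane', '16', '800']]
```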

You can open a .csv file in Excel to see the data laid out in a grid, or in any text editor, such as Notepad++ or Sublime Text, to see the raw lines.

You can create a .csv file from Excel or Google Sheets by choosing CSV as the file type: in Excel, use Save As instead of the regular Save command; in Google Sheets, use File > Download and pick the .csv option.
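If you don’t have a spreadsheet program handy, you can also create the file from Python. This is a sketch using csv.writer from the standard library; the file name people.csv and the rows are simply the example data above.

```python
import csv

# The example data from above
rows = [
    ['Joy', 15, 1025],
    ['Alfred', 14, 900],
    ['Jane', 16, 800],
]

# newline='' is what the csv documentation recommends;
# without it, an extra \r can end up on each line on Windows
with open('people.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # writes one line per inner list
```

The numbers are written as text, so reading the file back gives strings such as '15', not integers.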

Loading CSV data

Let’s say our data file is called ‘people.csv’ and it’s located in the working directory.

import csv
# newline='' is the setting the csv module documentation recommends
with open('people.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

The program creates a file object called f.

Next, it passes f to csv.reader(), which returns a reader object called reader.

You can use a for loop to iterate through this reader object. The output of the program above looks like this:

['Joy', '15', '1025']
['Alfred', '14', '900']
['Jane', '16', '800']

Each row is a list that contains three items, and every item is a string, even the numbers. You can use these lists in a variety of ways; depending on what you’re trying to do with the data, you’ll want to transform it in different ways.
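One transformation you will almost always need: csv.reader returns every item as a string, even numbers like 15. A minimal sketch of converting the numeric columns with int(), assuming the age and score columns always hold whole numbers (io.StringIO stands in for people.csv):

```python
import csv
import io

# Stand-in for people.csv, holding the same three rows as above
data = io.StringIO("Joy,15,1025\nAlfred,14,900\nJane,16,800\n")

converted = []
for row in csv.reader(data):
    # csv.reader always yields strings, so convert the numeric columns
    name = row[0]
    age = int(row[1])
    score = int(row[2])
    converted.append([name, age, score])

print(converted)
# [['Joy', 15, 1025], ['Alfred', 14, 900], ['Jane', 16, 800]]
```

int() raises ValueError if a cell is empty or not a whole number, so real-world files may need extra checks.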

One row at a time

You can get just the next row of data from a reader object using the next() function:

import csv
f = open('people.csv', newline='')
reader = csv.reader(f)
x = next(reader)   # first row
print(x)
x = next(reader)   # second row
print(x)
f.close()

If you run the example above, the first call to next() returns the first row and the second call returns the second row: each call continues where the previous one left off.
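A small sketch of the same idea, which also shows what happens when the rows run out. The built-in next() accepts an optional default that it returns instead of raising StopIteration; the two-row io.StringIO data is a stand-in for a file.

```python
import csv
import io

# Two-row stand-in for a CSV file
data = io.StringIO("Joy,15,1025\nAlfred,14,900\n")
reader = csv.reader(data)

first = next(reader)
second = next(reader)
# No rows are left, so next(reader) alone would raise StopIteration;
# with a default of None it returns None instead
third = next(reader, None)

print(first)   # ['Joy', '15', '1025']
print(second)  # ['Alfred', '14', '900']
print(third)   # None
```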

You don’t usually need to call next() yourself, because a for loop calls it automatically when you iterate through a reader object as described above. However, it does provide a quick way to skip the first row of a file, which is handy when the file starts with a header row followed by the data rows.
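For example, assuming a hypothetical version of people.csv whose first line is a header (name,age,score), one call to next() before the loop skips it. io.StringIO again stands in for the file.

```python
import csv
import io

# Hypothetical file contents with a header row
data = io.StringIO("name,age,score\nJoy,15,1025\nAlfred,14,900\nJane,16,800\n")

reader = csv.reader(data)
header = next(reader)   # consumes the first row only

data_rows = []
for row in reader:      # the loop continues from the second row
    data_rows.append(row)

print(header)     # ['name', 'age', 'score']
print(data_rows)  # [['Joy', '15', '1025'], ['Alfred', '14', '900'], ['Jane', '16', '800']]
```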

Creating separate lists from the data

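import csv

names = []
ages = []
scores = []
with open('people.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # each row is [name, age, score]
        names.append(row[0])
        ages.append(row[1])
        scores.append(row[2])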

Empty lists for each column of data are created first (before the for loop); then each item from the row is appended to one of the lists. The index used for each list has to match that column’s position in the row: row[0] is the name, row[1] the age, and row[2] the score.

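This is useful if you want to make a data visualization with matplotlib’s pyplot module, because its plotting functions take a separate list of values for each variable (for example, one list for the x-axis and one for the y-axis). Convert the ages and scores to numbers first, for example with int(); otherwise pyplot treats them as text categories rather than numeric values.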
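As an alternative to appending inside the loop, the built-in zip() can regroup a list of rows into columns. This sketch assumes every row has exactly three items; zip() returns tuples of strings, so the numeric columns are converted to int before they would be handed to a plotting function. io.StringIO stands in for people.csv.

```python
import csv
import io

data = io.StringIO("Joy,15,1025\nAlfred,14,900\nJane,16,800\n")
rows = list(csv.reader(data))

# zip(*rows) groups the first items, the second items, and so on
names, ages, scores = zip(*rows)

# Turn the tuples of strings into lists, converting the numeric columns
names = list(names)
ages = [int(a) for a in ages]
scores = [int(s) for s in scores]

print(names)   # ['Joy', 'Alfred', 'Jane']
print(ages)    # [15, 14, 16]
print(scores)  # [1025, 900, 800]
# These lists could then go to a pyplot call such as plt.bar(names, scores)
```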

List of Dictionaries

Sometimes it’s nice to set up each data row as a dictionary, because then you don’t have to remember that index 1 holds the age, and so on. Here’s an example:

import csv

people = []
with open('people.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # build a new dictionary for this row
        person = {}
        person['name'] = row[0]
        person['age'] = row[1]
        person['score'] = row[2]
        people.append(person)

Once the data is a list of dictionaries, you can store and print values by row index and key, as in these examples:

n1 = people[2]['name']   # name from the third row
print(people[0]['age'])  # age from the first row
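The standard library can also build these dictionaries for you. csv.DictReader yields one dictionary per row; it normally takes the keys from the file’s first line, but since people.csv has no header row, this sketch passes them in with the fieldnames argument (the key names are the ones chosen above, and io.StringIO stands in for the file).

```python
import csv
import io

data = io.StringIO("Joy,15,1025\nAlfred,14,900\nJane,16,800\n")

# No header row in the data, so the keys are supplied explicitly
reader = csv.DictReader(data, fieldnames=['name', 'age', 'score'])
people = list(reader)

print(people[2]['name'])  # Jane
print(people[0]['age'])   # 15
```

If your file does start with a header row, leave out fieldnames and DictReader uses that row as the keys instead of returning it as data.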

Your data can be stored in memory in whatever way is most convenient for you. That could be a list of separate dictionaries, a dictionary of lists, a dictionary of dictionaries, a list of lists, and so on.
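For instance, a dictionary of dictionaries keyed by name lets you look a person up directly. This sketch assumes every name is unique (a repeated name would overwrite the earlier entry) and that each row has exactly three items; io.StringIO stands in for people.csv.

```python
import csv
import io

data = io.StringIO("Joy,15,1025\nAlfred,14,900\nJane,16,800\n")

# Key each person's record by name, unpacking the three items of each row
by_name = {}
for name, age, score in csv.reader(data):
    by_name[name] = {'age': int(age), 'score': int(score)}

print(by_name['Alfred']['score'])  # 900
```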

If you know how to define a Python class, you can store your data as a list of objects.
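A minimal sketch using a dataclass, which saves writing an __init__ method by hand (dataclasses require Python 3.7 or later). The Person class and its field names are hypothetical, chosen for this example, and io.StringIO stands in for people.csv.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical class; field names chosen for this example
    name: str
    age: int
    score: int


data = io.StringIO("Joy,15,1025\nAlfred,14,900\nJane,16,800\n")

people = []
for row in csv.reader(data):
    # Convert the numeric columns before building each object
    people.append(Person(row[0], int(row[1]), int(row[2])))

print(people[1].name, people[1].age)  # Alfred 14
```

Attribute access such as people[1].age reads a little more cleanly than people[1]['age'], and the class documents which fields each row has.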