Python 2.7 Tutorial


With Videos by mybringback.com

Pickling


On this page: pickle module, pickle.dump(), pickle.load(), cPickle module
Pickling: the Concept
Suppose you just spent the better part of your afternoon working in Python, processing many data sources to build an elaborate, highly structured data object. Say it is a dictionary of English words with their frequency counts, translations into other languages, etc. And now it's time to close your Python program and go eat dinner. Obviously you want to save this object for future use, but how?

You *could* write the data object out to a text file, but that's not optimal. Once written out, it is just plain text, meaning the next time you read it in you will have to parse the text and process it back into your original data structure.

What you want, then, is a way to save your Python data object as itself, so that next time you need it you can simply load it up and get your original object back. Pickling and unpickling let you do that. A Python data object can be "pickled" as itself, which then can be directly loaded ("unpickled") as such at a later point; the process is also known as "object serialization".
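
To make the idea concrete, here is a minimal sketch of a round trip done entirely in memory. It uses pickle.dumps() and pickle.loads(), the string-based counterparts of the file-based functions covered below; the words dictionary and its entries are made-up illustration data, not part of the original example.

```python
import pickle

# A nested object similar to the word dictionary described above
# (the entries are made-up illustration data).
words = {
    'cat': {'count': 12, 'translations': ['gato', 'chat']},
    'dog': {'count': 7, 'translations': ['perro', 'chien']},
}

# Serialize the object into a string of bytes, then rebuild it from that string.
pickled = pickle.dumps(words)
restored = pickle.loads(pickled)

# The restored object is an equal but separate copy of the original.
print(restored == words)   # True
print(restored is words)   # False
```

No parsing code is needed: the nested dictionaries, lists, and integers come back with their original types.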
How to Pickle/Unpickle
Pickling functions are part of the pickle module. You will first need to import it. And, pickling/unpickling obviously involves file IO, so you will have to use the file writing/reading routines you learned in the previous tutorial.

Below, grades, a small dictionary data object, is being pickled. pickle.dump() is the function that saves the data out to the designated pickle file, which usually has the .p or .pkl extension.

    grades = {'Bart':75, 'Lisa':98, 'Milhouse':80, 'Nelson':65}

    import pickle                    # import module first
    f = open('gradesdict.p', 'w')    # pickle file is newly created in the current working directory
    pickle.dump(grades, f)           # dump data to f
    f.close()
foo1.py

Unpickling works as follows. Again you start by importing pickle. You then open the pickle file for reading, load the content into a new variable, and close the file. Loading is done through the pickle.load() function. The dictionary now has a different name, mydict, but the content is the same.

    import pickle                    # import module first
    f = open('gradesdict.p', 'r')    # 'r' for reading; can be omitted
    mydict = pickle.load(f)          # load file content as mydict
    f.close()

    print mydict
    # prints {'Lisa': 98, 'Bart': 75, 'Milhouse': 80, 'Nelson': 65}
foo2.py

Pickling in the Binary
The default pickling routine shown above saves the data as an ASCII text file, albeit in a Python-specific data format. This means that your pickle file is going to be large. For improved efficiency, it is recommended to use a binary protocol instead. This is achieved by specifying a third, optional "protocol level" argument while dumping, e.g., pickle.dump(grades, f, -1). "-1" means the highest available binary protocol. In addition, file IO will have to be done in binary mode: you need to use 'wb' ('b' for binary) when opening the file for writing and 'rb' when opening it for reading.

    grades = {'Bart':75, 'Lisa':98, 'Milhouse':80, 'Nelson':65}

    import pickle
    f = open('gradesdict.p', 'wb')   # 'wb' instead of 'w' for binary file
    pickle.dump(grades, f, -1)       # -1 specifies highest binary protocol
    f.close()
foo1.py

    import pickle
    f = open('gradesdict.p', 'rb')   # 'rb' for reading binary file
    mydict = pickle.load(f)
    f.close()

    print mydict
    # prints {'Lisa': 98, 'Bart': 75, 'Milhouse': 80, 'Nelson': 65}
foo2.py

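The binary dump and load steps above can also be written with the with statement, which closes the file for you. This is only a sketch of an alternative style, not the tutorial's own script: the scratch directory created with tempfile.mkdtemp() is an assumption added here so that running it does not overwrite an existing gradesdict.p.

```python
import os
import pickle
import tempfile

grades = {'Bart': 75, 'Lisa': 98, 'Milhouse': 80, 'Nelson': 65}

# Hypothetical location: a scratch directory, so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'gradesdict.p')

# The file is closed automatically at the end of each 'with' block.
with open(path, 'wb') as f:
    pickle.dump(grades, f, -1)     # -1: highest available binary protocol

with open(path, 'rb') as f:
    mydict = pickle.load(f)

print(mydict == grades)            # True
```

Because the file is closed even when an error occurs inside the block, there is no f.close() call to forget.
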
One caveat of having the binary protocol option is that, for a particular pickle file, you might not remember whether it was pickled in binary mode or not. For this reason, you should pick one pickling mode, use it routinely, and stick with it. In practice, you should always use the binary protocol.
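
The size difference between the ASCII and binary protocols can be checked directly. The sketch below pickles the same object both ways in memory with pickle.dumps() and compares the lengths; the counts dictionary is made-up data, and the exact byte counts will vary with your Python version, so none are quoted here. It also shows that pickle.loads() needs no protocol argument, since the protocol is detected from the data itself; what you do have to keep consistent is the file mode.

```python
import pickle

# Made-up data: 1000 words with frequency counts.
counts = dict(('word%d' % i, i) for i in range(1000))

text_pickle = pickle.dumps(counts, 0)      # protocol 0: the ASCII format
binary_pickle = pickle.dumps(counts, -1)   # highest available binary protocol

print(len(text_pickle))                    # size in bytes of the ASCII pickle
print(len(binary_pickle))                  # size in bytes of the binary pickle

# Loading takes no protocol argument: both pickles restore the same object.
print(pickle.loads(text_pickle) == pickle.loads(binary_pickle))   # True
```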
cPickle: Pickling Faster
Once you start crunching some serious volumes of data, you will find that pickle operations move at a crawl. Python gives you an alternative for this: cPickle. This module is very much like the standard pickle module, only much faster (up to 1000 times!), because it was written in C.

To use this module instead, all you have to do is remember to import cPickle. All other operations are the same. And pickle data files written by either module are compatible with either, meaning a pickle file written by cPickle can be unpickled by pickle and vice versa, as long as the same pickling protocol was used.

    import cPickle
    f = open('gradesdict.p', 'rb')
    mydict = cPickle.load(f)
    f.close()
foo3.py
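
The compatibility between the two modules can be sketched as follows: data pickled by one module is unpickled by the other, in both directions, using the same protocol. The try/except around the import is an assumption added here, not part of the original example; it lets the sketch run on installations where cPickle is not available by falling back to pickle (in which case both names refer to the same module).

```python
try:
    import cPickle                 # the C implementation
except ImportError:
    import pickle as cPickle       # assumption: fall back if cPickle is missing
import pickle

grades = {'Bart': 75, 'Lisa': 98, 'Milhouse': 80, 'Nelson': 65}

# Pickled by cPickle, unpickled by pickle.
data = cPickle.dumps(grades, -1)
mydict = pickle.loads(data)
print(mydict == grades)            # True

# Pickled by pickle, unpickled by cPickle.
data2 = pickle.dumps(grades, -1)
mydict2 = cPickle.loads(data2)
print(mydict2 == grades)           # True
```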