Python 3 Notes (LING 1330/2330)
Pickling
On this page: pickle module, pickle.dump(), pickle.load(), cPickle module
Pickling: the Concept
Suppose you just spent the better part of your afternoon working in Python, processing many data sources to build an elaborate, highly structured data object. Say it is a dictionary of English words with their frequency counts, translations into other languages, etc. And now it's time to close your Python program and go eat dinner. Obviously you want to save this object for future use, but how?

You *could* write the data object out to a text file, but that's not optimal. A text file is just a sequence of characters: the next time you read it in, you will have to parse the text and rebuild your original data structure from it. What you want, then, is a way to save your Python data object as itself, so that next time you need it you can simply load it up and get your original object back. Pickling and unpickling let you do that. A Python data object can be "pickled" as itself and then loaded directly ("unpickled") at a later point; the process is also known as "object serialization".

How to Pickle/Unpickle
Pickling functions are part of the pickle module, so you will first need to import it. And, pickling/unpickling obviously involves file IO, so you will have to use the file writing/reading routines you learned in the previous tutorial. Below, grades, a small dictionary data object, is being pickled. pickle.dump() is the method for saving the data out to the designated pickle file, usually with the .p or .pkl extension. Its counterpart, pickle.load(), reads a pickle file back in and returns the original object.
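Here is a minimal sketch of the full round trip, pickling a dictionary and then unpickling it. It assumes a hypothetical `grades` dictionary and an illustrative file name `grades.p`; neither comes from the original notes. The file goes into the system temporary directory (via `tempfile`) only so the sketch doesn't clutter your working folder; in your own code a plain name like `'grades.p'` works just as well.

```python
import os
import pickle
import tempfile

# Hypothetical example data: a small dictionary of student grades.
grades = {'Alice': 89, 'Bob': 72, 'Charles': 87}

# Illustrative file name; .p (or .pkl) is the customary extension.
path = os.path.join(tempfile.gettempdir(), 'grades.p')

# Pickling: open the file for writing in binary mode and dump the object into it.
with open(path, 'wb') as f:
    pickle.dump(grades, f)

# ... later, possibly in a brand-new Python session ...

# Unpickling: open the same file for reading in binary mode and load the object back.
with open(path, 'rb') as f:
    grades_loaded = pickle.load(f)

print(grades_loaded)            # {'Alice': 89, 'Bob': 72, 'Charles': 87}
print(grades_loaded == grades)  # True
print(type(grades_loaded))      # <class 'dict'>
```

Note that `grades_loaded` is a new dictionary equal to the original, not the very same object in memory: unpickling rebuilds the object from the file.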
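Pickling in the Binary
Pickle files come in several formats, called "protocols". The oldest, protocol 0, saves the data as ASCII text, albeit in a Python-specific data format, which makes the pickle file large; it was the default in Python 2. For improved efficiency, it is recommended to use a binary protocol instead. You choose the protocol by specifying a third, optional "protocol level" argument while dumping, e.g., pickle.dump(grades, f, -1). "-1" means the highest available protocol (the same as pickle.HIGHEST_PROTOCOL). In Python 3 the default protocol is already binary (protocol 3 in Python 3.0 to 3.7, protocol 4 from 3.8 on), so passing -1 mainly guarantees the newest, typically most compact format. In addition, file IO has to be done in binary mode: use 'wb' ('b' for binary) when writing the file and 'rb' when reading it. Because pickle in Python 3 always produces bytes, a file opened in plain 'w' text mode raises a TypeError.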
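The sketch below compares the ASCII protocol 0 with the highest binary protocol on a hypothetical word-frequency dictionary (`freq`, made up for illustration), then saves and reloads it in binary mode and shows what happens with a text-mode file. The file names `freq.p` and `freq_text.p` are likewise illustrative. Exact byte counts vary across Python versions, so none are claimed here.

```python
import os
import pickle
import tempfile

# Hypothetical data: a word-frequency dictionary, big enough for sizes to differ.
freq = {'word' + str(i): i for i in range(1000)}

# Protocol 0 is the old ASCII text format; -1 selects pickle.HIGHEST_PROTOCOL.
text_bytes = pickle.dumps(freq, protocol=0)
binary_bytes = pickle.dumps(freq, protocol=-1)
print('protocol 0:', len(text_bytes), 'bytes')
print('highest protocol:', len(binary_bytes), 'bytes')
print('default:', pickle.DEFAULT_PROTOCOL, 'highest:', pickle.HIGHEST_PROTOCOL)

path = os.path.join(tempfile.gettempdir(), 'freq.p')

# Binary pickling: open with 'wb' and pass -1 as the third (protocol) argument.
with open(path, 'wb') as f:
    pickle.dump(freq, f, -1)

# Unpickling: open with 'rb'; load() detects the protocol automatically.
with open(path, 'rb') as f:
    freq_loaded = pickle.load(f)
print(freq_loaded == freq)  # True

# In Python 3, pickle always writes bytes, so a text-mode ('w') file fails.
text_mode_failed = False
text_path = os.path.join(tempfile.gettempdir(), 'freq_text.p')
try:
    with open(text_path, 'w') as f:
        pickle.dump(freq, f, -1)
except TypeError as err:
    text_mode_failed = True
    print('TypeError:', err)
```

Since `pickle.load()` figures out the protocol on its own, you only have to choose a protocol when dumping; reading is the same `'rb'` call regardless.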
|