Pickling, or serialization, is the process of converting a Python object into a byte stream. Unpickling, or deserialization, is the reverse: re-creating an equivalent Python object in memory from that byte stream (not necessarily at the same memory address).
Python's pickle module has the necessary methods to pickle and unpickle Python object hierarchies.
The pickle module:
- is part of the Python standard library
- converts most in-memory Python objects to/from byte streams
- Those byte streams can be:
  - saved to a file opened in binary mode (bytes cannot be written to a file opened in text mode) for later retrieval.
    eg., save the progress of some activity so the activity can be paused and resumed
    - OR -
  - sent over network between Python end-points that expect binary data
It is possible to pickle a variety of data types, including built-in types: numeric types (integer, float, complex numbers), sequence types (lists, tuples), the text sequence type (strings), binary sequence types (bytes, bytearray), set types (set), and mapping types (dictionary), as well as functions (built-in and user-defined) and classes defined at the top level of a module. Containers such as lists, tuples, sets, and dictionaries are picklable only if everything they contain is picklable.
Any attempt to pickle an unpicklable object raises an exception, typically PicklingError; some built-in objects, such as open file handles, raise TypeError instead.
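To make the list of picklable types concrete, here is a minimal sketch that round-trips one value of each built-in type mentioned above through pickle and checks that the restored value equals the original. The name samples is made up for this illustration.

```python
import pickle

# One sample value per built-in type mentioned above (illustrative values only).
samples = [
    42, 3.14, 2 + 3j,                   # numeric types
    [1, 2, 3], (4, 5),                  # sequence types
    'text',                             # text sequence type
    b'raw', bytearray(b'buf'),          # binary sequence types
    {1, 2},                             # set type
    {'name': 'Gary', 'id': 12345},      # mapping type
    len,                                # built-in function, pickled by reference
]

for obj in samples:
    restored = pickle.loads(pickle.dumps(obj))
    # The restored object is an equal copy of the original.
    print(type(obj).__name__, 'round-trips:', restored == obj)
```

Functions and classes are pickled by reference (their qualified name), not by value, so the unpickling side must be able to import the same name.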
A couple of gotchas:
- Pickle is specific to Python, so pickled objects are best used within the Python ecosystem.
  - When dealing with applications written in different programming languages, it is best to avoid Pickle, as non-Python applications may not be able to reconstruct pickled Python objects. The same caution applies within the Python ecosystem when different versions of Python are involved, since an older Python version cannot read data pickled with a protocol newer than it supports.
  - Alternatives: consider data formats that are ideal for interoperability, such as JSON or XML
- Given the nature of binary data, pickled Python objects are not human-readable unless the earliest protocol (version 0, which is ASCII-based) is used to serialize the data.
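The two gotchas above can be seen side by side in a short sketch. It pickles the same dictionary with protocol 0 and with the highest protocol, then serializes it as JSON for comparison. The variable names (record, text_pickle, binary_pickle, as_json) are made up for this illustration.

```python
import json
import pickle

record = {'name': 'Gary', 'id': 12345}

# Protocol 0 is the original ASCII-based format, so it can be decoded and read.
text_pickle = pickle.dumps(record, protocol=0)
print(text_pickle.decode('ascii'))

# Newer protocols are binary; printing shows raw bytes with escape sequences.
binary_pickle = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
print(binary_pickle)

# JSON is plain text and can be parsed by non-Python applications.
as_json = json.dumps(record)
print(as_json)
```

JSON only covers basic types (strings, numbers, booleans, lists, dictionaries with string keys), so it trades generality for interoperability.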
A trivial example demonstrating the calls to pickle (save data to a binary file) and unpickle (load data from the binary file) a Python data structure.
#!/usr/bin/env python3
import pickle

EMP = {}
EMP['name'] = 'Gary'
EMP['id'] = 12345

# pickle: the file must be opened in binary write mode
with open('employee.db', 'wb') as f:
    pickle.dump(EMP, f, pickle.HIGHEST_PROTOCOL)
print('Pickled data, EMP', EMP)

# unpickle: the file must be opened in binary read mode
with open('employee.db', 'rb') as f:
    EMP_REC = pickle.load(f)
print('Unpickled data, EMP_REC', EMP_REC, '\n')

# the unpickled object is an equal copy, not the same object
print('(EMP_REC is EMP)? :', EMP_REC is EMP)
print('(EMP_REC == EMP)? :', EMP_REC == EMP)
Running the above code shows the following on stdout.
Pickled data, EMP {'name': 'Gary', 'id': 12345}
Unpickled data, EMP_REC {'name': 'Gary', 'id': 12345}

(EMP_REC is EMP)? : False
(EMP_REC == EMP)? : True
The dump() function takes a serializable Python object as the first argument and writes the pickled representation of the object (the serialized object) to a file. The second argument is the file handle of a file opened for binary writing. The rest of the arguments are optional.
The third argument, if specified, is the protocol to use. pickle.HIGHEST_PROTOCOL tells the pickle module to use the highest protocol version available. When working with a mix of old and new Python versions, choosing an earlier protocol version that every Python version involved supports may ease or eliminate some of the potential compatibility issues.
As highlighted earlier, the protocol used by the pickle module is Python-specific, so watch out for cross-language compatibility issues when working in heterogeneous environments.
The load() function reads a pickled object representation (serialized data) from a file and returns the reconstructed object. The protocol version is detected automatically, so it is not necessary to specify it when unpickling.
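As a small illustration of the protocol argument and of automatic protocol detection, the sketch below dumps the same object with several protocol versions and loads each one back without naming a protocol. It uses io.BytesIO as an in-memory stand-in for a file opened in binary mode; the names record and results are made up for this example.

```python
import io
import pickle

record = {'name': 'Gary', 'id': 12345}
results = {}

# Protocol 2 is readable by older Python versions; the other two values
# depend on the Python version running this script.
for protocol in (2, pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL):
    buffer = io.BytesIO()              # behaves like a file opened with 'wb'
    pickle.dump(record, buffer, protocol)
    size = buffer.tell()               # number of bytes written
    buffer.seek(0)                     # rewind before reading
    restored = pickle.load(buffer)     # no protocol argument needed here
    results[protocol] = (restored == record)
    print('protocol', protocol, '->', size, 'bytes, equal:', restored == record)
```

The byte count usually differs between protocols, while the restored object is the same in every case.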
In-Memory Pickling/Unpickling Operations
If persistence is not a requirement, the dumps() and loads() functions in the pickle module can be used to serialize (pickle) and deserialize (unpickle) a Python object in memory. This is useful when sending Python objects over network between compatible applications.
#!/usr/bin/env python3
import pickle

EMP = {}
EMP['name'] = 'Gary'
EMP['id'] = 12345

# in-memory pickling: dumps() returns a bytes object
x = pickle.dumps(EMP, pickle.HIGHEST_PROTOCOL)
print('Pickled data, EMP', EMP)

# in-memory unpickling: loads() rebuilds the object from bytes
EMP_REC = pickle.loads(x)
print('Unpickled data, EMP_REC', EMP_REC, '\n')

print('(EMP_REC is EMP)? :', EMP_REC is EMP)
print('(EMP_REC == EMP)? :', EMP_REC == EMP)
Running the above code shows output identical to the output produced by the previous code listing - just that there is no file involved this time.
Pickled data, EMP {'name': 'Gary', 'id': 12345}
Unpickled data, EMP_REC {'name': 'Gary', 'id': 12345}

(EMP_REC is EMP)? : False
(EMP_REC == EMP)? : True
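For the network use case mentioned above, one possible sketch is shown below. It uses socket.socketpair() to get two connected sockets inside a single process, standing in for two Python end-points, and prefixes each pickled payload with its length so the receiver knows how many bytes to read. The helper names send_obj, recv_exact and recv_obj, and the 4-byte length prefix, are assumptions made for this example rather than part of the pickle module. It also assumes both ends trust each other, since unpickling runs whatever the sender put in the byte stream.

```python
import pickle
import socket
import struct

def send_obj(sock, obj):
    """Pickle obj and send it with a 4-byte big-endian length prefix."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    sock.sendall(struct.pack('!I', len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError('socket closed before full message arrived')
        chunks.append(chunk)
        n -= len(chunk)
    return b''.join(chunks)

def recv_obj(sock):
    """Read one length-prefixed pickle from sock and unpickle it."""
    (length,) = struct.unpack('!I', recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))

message = {'name': 'Gary', 'id': 12345}
sender, receiver = socket.socketpair()   # two connected sockets in one process
try:
    send_obj(sender, message)
    received = recv_obj(receiver)
finally:
    sender.close()
    receiver.close()

print('Received', received)
```

The length prefix matters because a stream socket does not preserve message boundaries; without it the receiver cannot tell where one pickle ends and the next begins.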
Exceptions
As mentioned earlier, any attempt to pickle or unpickle objects that are not appropriate for serialization fails with an exception. Therefore, it is appropriate to safeguard the code with try-except blocks to handle unexpected failures.
Here is another trivial example demonstrating a pickling exception.
#!/usr/bin/env python3
import pickle

try:
    # an open file handle is not picklable
    with open('dummy.txt', 'a') as f:
        x = pickle.dumps(f)
    print('Pickled file handle')
except Exception as e:
    print('Caught', e.__class__.__name__, '-', str(e))
Running the above code raises a TypeError. The exact wording of the message varies between Python versions; a typical message is shown below.
Caught TypeError - cannot pickle '_io.TextIOWrapper' object
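Catching the specific exception types is usually preferable to a bare except Exception. A minimal sketch of that idea follows; the helper names try_pickle and try_unpickle are hypothetical, and the set of exception types caught is an assumption that covers the common cases rather than every possible failure.

```python
import pickle

def try_pickle(obj):
    """Return the pickled bytes, or None if obj cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError) as e:
        print('Cannot pickle:', e.__class__.__name__, '-', e)
        return None

def try_unpickle(data):
    """Return the unpickled object, or None if data is empty or malformed."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, EOFError) as e:
        print('Cannot unpickle:', e.__class__.__name__, '-', e)
        return None

good = try_pickle({'name': 'Gary', 'id': 12345})   # a picklable dictionary
bad = try_pickle(lambda: 0)                        # lambdas are not picklable
empty = try_unpickle(b'')                          # empty input raises EOFError
print(try_unpickle(good), bad, empty)
```

Returning None on failure keeps the sketch short; real code may prefer to log the error and re-raise it.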
(Credit: Various Sources including Python Documentation)