Python: can't pickle module objects error

I can reproduce the error message this way:

import cPickle

class Foo(object):
    def __init__(self):
        self.mod=cPickle

foo=Foo()
with file('/tmp/test.out', 'w') as f:
    cPickle.dump(foo, f) 

# TypeError: can't pickle module objects

Do you have a class attribute that references a module?


Python's inability to pickle module objects is the real problem. Is there a good reason? I don't think so. Having module objects unpicklable contributes to the frailty of python as a parallel / asynchronous language. If you want to pickle module objects, or almost anything in python, then use dill.

Python 3.2.5 (default, May 19 2013, 14:25:55) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import os
>>> dill.dumps(os)
b'\x80\x03cdill.dill\n_import_module\nq\x00X\x02\x00\x00\x00osq\x01\x85q\x02Rq\x03.'
>>>
>>>
>>> # and for parlor tricks...
>>> class Foo(object):
...   x = 100
...   def __call__(self, f):
...     def bar(y):
...       return f(self.x) + y
...     return bar
... 
>>> @Foo()
... def do_thing(x):
...   return x
... 
>>> do_thing(3)
103 
>>> dill.loads(dill.dumps(do_thing))(3)
103
>>> 

Get dill here: https://github.com/uqfoundation/dill


Recursively Find Pickle Failure

Inspired by wump's comment: Python: can't pickle module objects error

Here is some quick code that helped me find the culprit recursively.

It checks the object in question to see if it fails pickling.

Then iterates trying to pickle the keys in __dict__ returning the list of only failed picklings.

Code Snippet

import pickle

def pickle_trick(obj, max_depth=10):
    output = {}

    if max_depth <= 0:
        return output

    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as e:
        failing_children = []

        if hasattr(obj, "__dict__"):
            for k, v in obj.__dict__.items():
                result = pickle_trick(v, max_depth=max_depth - 1)
                if result:
                    failing_children.append(result)

        output = {
            "fail": obj, 
            "err": e, 
            "depth": max_depth, 
            "failing_children": failing_children
        }

    return output

Example Program

import redis

import pickle
from pprint import pformat as pf


def pickle_trick(obj, max_depth=10):
    output = {}

    if max_depth <= 0:
        return output

    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as e:
        failing_children = []

        if hasattr(obj, "__dict__"):
            for k, v in obj.__dict__.items():
                result = pickle_trick(v, max_depth=max_depth - 1)
                if result:
                    failing_children.append(result)

        output = {
            "fail": obj, 
            "err": e, 
            "depth": max_depth, 
            "failing_children": failing_children
        }

    return output


if __name__ == "__main__":
    r = redis.Redis()
    print(pf(pickle_trick(r)))

Example Output

$ python3 pickle-trick.py
{'depth': 10,
 'err': TypeError("can't pickle _thread.lock objects"),
 'fail': Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>,
 'failing_children': [{'depth': 9,
                       'err': TypeError("can't pickle _thread.lock objects"),
                       'fail': ConnectionPool<Connection<host=localhost,port=6379,db=0>>,
                       'failing_children': [{'depth': 8,
                                             'err': TypeError("can't pickle _thread.lock objects"),
                                             'fail': <unlocked _thread.lock object at 0x10bb58300>,
                                             'failing_children': []},
                                            {'depth': 8,
                                             'err': TypeError("can't pickle _thread.RLock objects"),
                                             'fail': <unlocked _thread.RLock object owner=0 count=0 at 0x10bb58150>,
                                             'failing_children': []}]},
                      {'depth': 9,
                       'err': PicklingError("Can't pickle <function Redis.<lambda> at 0x10c1e8710>: attribute lookup Redis.<lambda> on redis.client failed"),
                       'fail': {'ACL CAT': <function Redis.<lambda> at 0x10c1e89e0>,
                                'ACL DELUSER': <class 'int'>,
0x10c1e8170>,
                                .........
                                'ZSCORE': <function float_or_none at 0x10c1e5d40>},
                       'failing_children': []}]}

Root Cause - Redis can't pickle _thread.lock

In my case, creating an instance of Redis that I saved as an attribute of an object broke pickling.

When you create an instance of Redis it also creates a connection_pool of Threads and the thread locks can not be pickled.

I had to create and clean up Redis within the multiprocessing.Process before it was pickled.

Testing

In my case, the class that I was trying to pickle, must be able to pickle. So I added a unit test that creates an instance of the class and pickles it. That way if anyone modifies the class so it can't be pickled, therefore breaking it's ability to be used in multiprocessing (and pyspark), we will detect that regression and know straight away.

def test_can_pickle():
    # Given
    obj = MyClassThatMustPickle()

    # When / Then
    pkl = pickle.dumps(obj)

    # This test will throw an error if it is no longer pickling correctly

Tags:

Python

Pickle