Insert a Pandas Dataframe into mongodb using PyMongo
I doubt there is a method that is both quick and simple. If you don't worry about data conversion, you can do:
>>> import json
>>> import datetime
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
>>> df
                           A
1 2013-11-23 21:14:34.118531
>>> records = json.loads(df.T.to_json()).values()
>>> db.myCollection.insert(records)
But in case you try to load data back, you'll get:
>>> df = read_mongo(db, 'myCollection')
>>> df
                     A
0  1385241274118531000
>>> df.dtypes
A    int64
dtype: object
so you'll have to convert column 'A' back to datetime, as well as any fields in your DataFrame that are not int, float or str. For this example:
>>> df['A'] = pd.to_datetime(df['A'])
>>> df
                           A
0 2013-11-23 21:14:34.118531
The odo library can also do it, using:
odo(df, db.myCollection)
If your DataFrame has missing data (i.e. None or NaN) and you don't want null key values in your documents:
db.insert_many(df.to_dict("records"))
will insert keys with null values. If you don't want the empty key values in your documents, you can use a modified version of pandas' .to_dict("records") code, shown below:
# Note: _maybe_box_datetimelike is a private pandas internal; it has been
# moved/removed in newer pandas versions, so this import is version-dependent.
from pandas.core.common import _maybe_box_datetimelike

my_list = [dict((k, _maybe_box_datetimelike(v)) for k, v in zip(df.columns, row) if v != None and v == v) for row in df.values]
db.insert_many(my_list)
The if v != None and v == v checks I've added make sure the value is not None or NaN (NaN is the only value that is not equal to itself) before putting it in the row's dictionary. Now your .insert_many will only include keys with values in the documents (and no null data types).
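Because _maybe_box_datetimelike is a private pandas internal that has moved between versions, here is a version-independent sketch of the same idea, using only the public to_dict('records') API; the helper name records_without_nulls is my own:

```python
import pandas as pd

def records_without_nulls(df):
    # One dict per row; keys whose value is None or NaN are omitted,
    # so they never become null fields in the MongoDB documents.
    # (NaN is the only value for which v == v is False.)
    return [
        {k: v for k, v in row.items() if v is not None and v == v}
        for row in df.to_dict('records')
    ]
```

You would then call db.insert_many(records_without_nulls(df)) as before, and only the keys that actually have values end up in each document.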
Here is the very quickest way, using the insert_many method from PyMongo 3 and the 'records' parameter of the to_dict method:
db.collection.insert_many(df.to_dict('records'))
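For reference, to_dict('records') simply produces one dictionary per row, which is exactly the shape insert_many expects:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
records = df.to_dict('records')
# records is a list of per-row dicts, ready for collection.insert_many(records)
```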