How can I load data from mongodb collection into pandas' DataFrame?
You can load your MongoDB data to pandas DataFame using this code. It works for me.
import pymongo
import pandas as pd
from pymongo import Connection
connection = Connection()
db = connection.database_name
input_data = db.collection_name
data = pd.DataFrame(list(input_data.find()))
Use:
df=pd.DataFrame.from_dict(collection)
Comprehend the cursor you got from the MongoDB before passing it to DataFrame
import pandas as pd
df = pd.DataFrame(list(tweets.find()))
If you have data in MongoDb like this:
[
{
"name": "Adam",
"age": 27,
"address":{
"number": 4,
"street": "Main Road",
"city": "Oxford"
}
},
{
"name": "Steve",
"age": 32,
"address":{
"number": 78,
"street": "High Street",
"city": "Cambridge"
}
}
]
You can put the data straight into a dataframe like this:
from pandas import DataFrame
df = DataFrame(list(db.collection_name.find({}))
And you will get this output:
df.head()
| | name | age | address |
|----|---------|------|-----------------------------------------------------------|
| 1 | "Steve" | 27 | {"number": 4, "street": "Main Road", "city": "Oxford"} |
| 2 | "Adam" | 32 | {"number": 78, "street": "High St", "city": "Cambridge"} |
However the subdocuments will just appear as JSON inside the subdocument cell. If you want to flatten objects so that subdocument properties are shown as individual cells you can use json_normalize without any parameters.
from pandas.io.json import json_normalize
datapoints = list(db.collection_name.find({})
df = json_normalize(datapoints)
df.head()
This will give the dataframe in this format:
| | name | age | address.number | address.street | address.city |
|----|--------|------|----------------|----------------|--------------|
| 1 | Thomas | 27 | 4 | "Main Road" | "Oxford" |
| 2 | Mary | 32 | 78 | "High St" | "Cambridge" |