MongoDB schema design for multiple auth user accounts
I hope this helps someone with similar needs.
Schema Design and Data Modelling in MongoDB
SQL databases have a fixed, strict schema, whereas NoSQL databases such as MongoDB have a dynamic, flexible schema, i.e. they do not enforce a document structure by default.
MongoDB has two types of data model:
- Embedded data modelling:
  - stores related data in a single document structure; these are referred to as denormalized models
  - supports document-level atomic operations
  - makes CRUD operations easy to perform
  - is used in roughly 70% of cases and performs well for read operations
  - can easily reach the maximum document size of 16 MB and is prone to high redundancy
  - is recommended for one-to-one and one-to-many relationships
- Referential or linked data modelling:
  - mimics the normalized tables of an SQL database to reduce data duplication and redundancy
  - uses a reference (the `_id`) to point to another document, similar to joining tables in SQL with primary and foreign keys
  - is used in roughly 30% of cases
  - is recommended for many-to-many relationships
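To get a feel for how an embedded array pushes a document toward the 16 MB limit, here is a minimal sketch in plain JavaScript. The `post` document, its `comments` array, and the helper `approxBytes` are hypothetical, and the JSON length is only a rough stand-in for the real BSON size that MongoDB enforces.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

// Rough size estimate: UTF-8 length of the JSON form. This is an assumption
// for illustration; the real limit applies to the BSON encoding.
function approxBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Hypothetical blog post that embeds every comment in one array.
const post = { _id: 1, title: "Hello", comments: [] };
for (let i = 0; i < 1000; i++) {
  post.comments.push({ author: "user" + i, text: "x".repeat(200) });
}

const size = approxBytes(post);
console.log(size, size < MAX_DOC_BYTES);
```

If an estimate like this keeps growing without bound, that is a hint that the array may belong in its own collection.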
Perspectives of Data Modelling
- Conceptual Data Modelling: the big picture of the functionality and services, which also includes:
  - ER Data Modelling: a graphical approach to database design.
  - Schema Designing
- Logical Data Modelling: the conceptual model is converted into a logical model (the program) using a programming language, tables, etc. (server code)
- Physical Data Modelling: putting the logical model into practice, where actual data is inserted by its users (the database)
Types of Data Models
- Flat, star, hierarchical, relational, object-relational
Model Relationships between the documents
- In general, you should structure your schema so your application receives all of its required information in a single read operation.
Model One-to-One Relationships with Embedded Documents
- In a referenced (normalized) data model, if one document frequently refers to data in another document, embedding both into a single document makes a better data model.
- If a single document grows too large, it is better to split the data into a referential model; the most frequently accessed portion of the data should go in the collection that the application loads first.
```js
// one person and one address
{
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake Street",
    city: "Faketon",
    state: "MA",
    zip: "12345"
  }
}
```
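The advice about splitting a large document can be sketched in plain JavaScript, with no database involved. The fields `bio` and `orderHistory` and the helper `splitPerson` are hypothetical, chosen only for illustration; the idea is that the small, frequently read part stays in the collection the application loads first, and the bulky remainder moves to a second collection keyed by the same `_id`.

```javascript
// Hypothetical oversized document: `bio` and `orderHistory` are assumed fields.
const person = {
  _id: "joe",
  name: "Joe Bookreader",
  address: { street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" },
  bio: "A very long biography...",
  orderHistory: [{ order: 1 }, { order: 2 }]
};

// Split a document into a frequently accessed summary and a rarely read
// details document. Both keep the same _id so they can be matched up later.
function splitPerson(doc, hotFields) {
  const summary = { _id: doc._id };
  const details = { _id: doc._id };
  for (const [key, value] of Object.entries(doc)) {
    if (key === "_id") continue;
    if (hotFields.includes(key)) summary[key] = value;
    else details[key] = value;
  }
  return { summary, details };
}

const { summary, details } = splitPerson(person, ["name", "address"]);
console.log(Object.keys(summary)); // fields for the collection loaded first
console.log(Object.keys(details)); // fields for the secondary collection
```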
Model One-to-Many Relationships with Embedded Documents
- The same concept as one-to-one applies here: embed the related documents, this time as an array.
```js
// one person and multiple addresses
{
  "_id": "joe",
  "name": "Joe Bookreader",
  "addresses": [
    {
      "street": "123 Fake Street",
      "city": "Faketon",
      "state": "MA",
      "zip": "12345"
    },
    {
      "street": "1 Some Other Street",
      "city": "Boston",
      "state": "MA",
      "zip": "12345"
    }
  ]
}
```
Model One-to-Many Relationships with Document References
- In some cases a referential model performs better, as in the example below.
- To avoid repetition of the publisher data, use references and keep the publisher information in a separate collection from the book collection.
```js
// publisher document holding an array of book references
{
  _id: 'some string',
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA",
  books: [123456789, 234567890, ...]
}

// book documents
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English"
}
```
- If there is a chance that the books array will grow very large, it is better to store the publisher reference inside each book document.
- Note how the publisher's `_id` is now referred to in the book collection as `publisher_id`.
```js
// publisher document (no books array)
{
  _id: "oreilly",
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA"
}

// book documents, each pointing back to its publisher
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}
{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher_id: "oreilly"
}
```
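Because each book carries a `publisher_id`, the relationship can be resolved in either direction. The sketch below does this in memory, with plain JavaScript arrays standing in for the two collections; the names `publishers`, `books`, `booksByPublisher`, and `publisherOf` are illustrative and not part of MongoDB, and only a few fields are kept for brevity.

```javascript
// In-memory stand-ins for the publisher and book collections (assumption:
// real code would query MongoDB instead of filtering arrays).
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media", founded: 1980, location: "CA" }
];
const books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", publisher_id: "oreilly" }
];

// One-to-many direction: all books that reference a given publisher.
function booksByPublisher(publisherId) {
  return books.filter((book) => book.publisher_id === publisherId);
}

// Many-to-one direction: follow a book's reference back to its publisher.
function publisherOf(book) {
  return publishers.find((p) => p._id === book.publisher_id) || null;
}

const titles = booksByPublisher("oreilly").map((book) => book.title);
console.log(titles);
console.log(publisherOf(books[0]).name);
```

Against a real database, the first lookup would correspond to a query such as `db.books.find({ publisher_id: "oreilly" })`, assuming the collection is named `books`.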
1) There are three strategies that you might take to structure your data in MongoDB:
- a) Array of embedded documents
- b) Array of embedded references
- c) Expanded into the parent document
Strategy (a) is the first one you describe, where the Profile document contains an array of Account sub-documents.
Strategy (b) is similar to strategy (a), but you'd use an array of references to other documents (typically in an Account collection) rather than embedding the actual documents.
Strategy (c) is the one you describe as "having all data in a single model".
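To make the three strategies concrete, here is a rough sketch of the same profile stored each way, as plain JavaScript objects. The `_id` values, the `accountCollection` array, and the field names `facebook_uid` and `twitter_uid` are assumptions made for illustration, not a prescribed schema.

```javascript
// (a) Array of embedded documents: accounts live inside the profile.
const embedded = {
  _id: "p1",
  firstname: "Fred",
  accounts: [
    { kind: "facebook", uid: "fred.rogers" },
    { kind: "twitter", uid: "fredr" }
  ]
};

// (b) Array of references: the profile stores only account ids, and the
// accounts live in their own collection (an array stands in for it here).
const accountCollection = [
  { _id: "a1", kind: "facebook", uid: "fred.rogers" },
  { _id: "a2", kind: "twitter", uid: "fredr" }
];
const referenced = { _id: "p1", firstname: "Fred", accounts: ["a1", "a2"] };

// (c) Expanded into the parent: one flat field per account type.
const expanded = {
  _id: "p1",
  firstname: "Fred",
  facebook_uid: "fred.rogers",
  twitter_uid: "fredr"
};

// Reading the Facebook uid under each strategy.
const fromEmbedded = embedded.accounts.find((a) => a.kind === "facebook").uid;
const fromReferenced = accountCollection
  .filter((a) => referenced.accounts.includes(a._id))
  .find((a) => a.kind === "facebook").uid;
const fromExpanded = expanded.facebook_uid;

console.log(fromEmbedded, fromReferenced, fromExpanded);
```

Strategy (b) needs a second lookup to reach the account data, while (c) needs a new field on the profile for every new account type.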
2) It's generally considered best practice to use an array of embedded documents, especially if the information in them can vary. If it makes your life easier, you can use a key to distinguish the type of account, like so:
```js
{
  firstname: 'Fred',
  lastname: 'Rogers',
  email: '[email protected]',
  accounts: [
    { kind: 'facebook',
      uid: 'fred.rogers'
    },
    { kind: 'internal',
      username: 'frogers',
      password: '5d41402abc4b2a76b9719d911017c592'
    },
    { kind: 'twitter',
      uid: 'fredr'
    }
  ]
}
```
3) MongoDB allows you to search on an embedded document, so you would write the following query (JavaScript syntax):
```js
db.profile.find(
  { email: '[email protected]', 'accounts.kind': 'facebook' }
);
```
With appropriate indexes, this query will be quite fast.
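To show what the dotted `'accounts.kind'` condition means, the sketch below mimics the query in memory with plain JavaScript. The `profiles` array, the placeholder email addresses, and the helper `findProfiles` are assumptions for illustration; a real application would run the query against MongoDB.

```javascript
// In-memory stand-in for the profile collection, with placeholder emails.
const profiles = [
  {
    email: "fred@example.com",
    accounts: [
      { kind: "facebook", uid: "fred.rogers" },
      { kind: "twitter", uid: "fredr" }
    ]
  },
  {
    email: "jane@example.com",
    accounts: [{ kind: "internal", username: "jane" }]
  }
];

// Mimics { email: <email>, 'accounts.kind': <kind> }: a document matches
// when the email is equal and at least one array element has that kind.
function findProfiles(email, kind) {
  return profiles.filter(
    (p) => p.email === email && p.accounts.some((a) => a.kind === kind)
  );
}

console.log(findProfiles("fred@example.com", "facebook").length);
console.log(findProfiles("fred@example.com", "internal").length);
```

One candidate for an appropriate index is a compound index such as `db.profile.createIndex({ email: 1, 'accounts.kind': 1 })`; whether it actually helps depends on the real workload.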