MongoDB says: "A good rule of thumb for structuring data in MongoDB is to prefer embedding data inside documents to breaking it apart into separate collections, unless you have a good reason"
But why...?
As an ex-Oracle Database Administrator, coming from a relational world, my instinct is always to break data apart into separate tables (or, in MongoDB terms, separate collections), à la relational databases.
But here are the reasons why, with MongoDB (and other similar databases), we're advised to prefer embedding data inside documents over breaking it apart into separate collections:
* Performance: MongoDB is designed for high-performance, document-oriented data storage. When data is embedded within a document, accessing and querying that data becomes more efficient because it can typically be retrieved by reading one document. This reduces the need for the joins that a relational database would use, or for the $lookup stages and multiple queries that separate collections would need in MongoDB.
* Atomicity: MongoDB provides atomic operations at the document level. When data is embedded within a document, updates to that data can be performed atomically, ensuring consistency. In contrast, if data were stored in separate collections, updating related data would require multiple operations (or a multi-document transaction), making it more challenging to maintain atomicity.
* Data Locality: By embedding related data within a document, MongoDB takes advantage of data locality. When a document is accessed, all the required data is available in one place, which can be more efficient for read operations. In contrast, if data were stored in separate collections, it would require additional network round-trips and potentially slower performance.
* Schema Flexibility: MongoDB is schema-flexible, meaning that documents within a collection do not need to have the same structure. This allows for easy updates and evolution of the data model over time. By embedding data within documents, you can easily accommodate changes to the data structure without impacting other parts of the application.
* Simplified Development: Embedding data within documents can simplify application development by reducing the need for complex joins and relationships. Developers can retrieve all the required data in a single query, leading to cleaner and more straightforward code.
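To make the difference concrete, here is a minimal sketch in plain Python. Nothing here talks to a real MongoDB server: the dictionaries and list are in-memory stand-ins for collections, and the customer/address data and both function names are made up for illustration.

```python
# In-memory stand-ins for MongoDB collections (hypothetical data, no server needed).

# Embedded design: one document holds the customer and their addresses.
customers_embedded = {
    1: {
        "_id": 1,
        "name": "Ada",
        "addresses": [
            {"city": "London", "type": "home"},
            {"city": "Paris", "type": "work"},
        ],
    }
}

# Referenced design: addresses live in their own "collection" and point back
# to the customer via customer_id, much as rows in a child table would.
customers_referenced = {1: {"_id": 1, "name": "Ada"}}
addresses = [
    {"_id": 10, "customer_id": 1, "city": "London", "type": "home"},
    {"_id": 11, "customer_id": 1, "city": "Paris", "type": "work"},
]


def cities_embedded(customer_id):
    """One lookup: the addresses arrive with the customer document."""
    doc = customers_embedded[customer_id]
    return [a["city"] for a in doc["addresses"]]


def cities_referenced(customer_id):
    """Two lookups: fetch the customer, then scan the addresses collection."""
    customer = customers_referenced[customer_id]
    return [a["city"] for a in addresses if a["customer_id"] == customer["_id"]]
```

Both functions return the same cities, but the embedded version touches one document while the referenced version needs a second pass over another collection. Against a real server that second pass would be another query or a join-like stage.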
So that's why I need to fight my instincts from time to time and put all the related data together in a single document! However, I still think that there are cases where breaking data apart into separate collections can surely be beneficial. If the embedded data is too large, for example, it just becomes a nightmare to work with in a single document (and MongoDB caps a single document at 16 MB in any case). Or, if we need to enforce strict relationships and integrity constraints between entities (as is my instinct...!) then separate collections would surely be worth considering.
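As a rough way to spot the "too large" case, here is a small Python sketch. It relies on MongoDB's documented 16 MB maximum document size. Everything else is an assumption for illustration: the helper names, the 50% budget, the sample documents, and the use of JSON length as a stand-in for the real BSON size (the two are not the same, so treat the result as a ballpark only).

```python
import json

# MongoDB's documented maximum BSON document size is 16 MB.
MAX_BSON_BYTES = 16 * 1024 * 1024


def approx_size_bytes(doc):
    """Rough size estimate: UTF-8 JSON length, not the real BSON size."""
    return len(json.dumps(doc).encode("utf-8"))


def fits_comfortably(doc, budget_fraction=0.5):
    """True if the estimate stays under a chosen fraction of the limit.

    budget_fraction is an arbitrary illustrative threshold, not a MongoDB setting.
    """
    return approx_size_bytes(doc) < MAX_BSON_BYTES * budget_fraction


# A small, bounded embedded array: a natural fit for embedding.
small_order = {"_id": 1, "items": [{"sku": "A1", "qty": 2}]}

# A hypothetical unbounded array, e.g. every event ever logged for one user.
huge_log = {"_id": 2, "events": [{"msg": "x" * 100}] * 200_000}
```

Here `fits_comfortably(small_order)` is true, while `huge_log` fails the check, which is the kind of ever-growing array that is usually better moved into its own collection.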