MongoDB — Hadoop and NoSQL Part 3

March 09, 2018

MongoDB is another type of NoSQL database that boasts high performance, high availability, and easy scalability. It is based on the idea of collections and documents.

A database in the MongoDB world is a physical container for collections. A collection is a group of documents, roughly the equivalent of a table in a relational database. A document is a set of key-value pairs.

Documents have a dynamic schema, meaning that documents in the same collection do not need to share the same structure. Even fields with the same name can hold different types of data. This makes MongoDB very flexible. A dynamic schema is sometimes called schema-less because a document can contain pretty much anything.
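To make the dynamic schema idea concrete, here is a minimal sketch in Python. It models a collection as a plain list of dicts rather than talking to a real MongoDB server, and the field names and values beyond the "Cat Videos" post are made up for illustration.

```python
# A "collection" modeled as a plain Python list of dicts.
# This is only an illustration, not a real MongoDB collection.
posts = [
    {"postTitle": "Cat Videos", "likes": 4000000},
    # Same field name "likes", but a different type, plus an extra field.
    {"postTitle": "Dog Photos", "likes": "4 million", "tags": ["dogs", "cute"]},
    # A document with a completely different set of fields.
    {"author": "hypothetical_user"},
]

# Collect every value type seen for each field across the collection.
field_types = {}
for doc in posts:
    for key, value in doc.items():
        field_types.setdefault(key, set()).add(type(value).__name__)

for key in sorted(field_types):
    print(key, sorted(field_types[key]))
```

Nothing stops the three documents from living side by side, and the "likes" field ends up holding both an integer and a string, which is exactly the flexibility described above.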

Let’s talk about a couple of niceties that MongoDB offers. One is the flexibility that you get with the schema-less architecture that we touched on above. Another is that a single object’s structure is very clear: you can tell exactly what it is just by looking at it. You also don’t have to worry about complex joins in MongoDB! We will get to why in a second. Finally, MongoDB comes with a document-based query language that makes querying documents easy. Let’s take a quick look at a sample document that could be in MongoDB.

{
    _id: ObjectId(23f20918g201),
    postTitle: "Cat Videos",
    likes: 4000000,
    shares: 100000000,
    postMetadata: [
        {
            timePosted: 10929382,
            clicksOnPost: 923091208,
            usersReached: 99302981029
        }
    ]
}

In the piece of JSON above, we have some information about a Facebook post with the title Cat Videos, from a certain Facebook user, that got a ton of likes and shares. Inside of the post metadata we have more key-value pairs, showing that these can be nested.

MongoDB is a little different because each document is stored as a JSON-like object (under the hood it uses BSON, a binary form of JSON). This is really cool! This is how you can have a schema-less architecture: no two JSON objects have to look the same. However, you’re looking at the example above and thinking that it looks pretty structured. You are right. The actual document inside of MongoDB is structured, but the collection doesn’t care what the JSON looks like, so MongoDB is still schema-less.

The document query language allows you to dive deep into the JSON to get the data that you need out of it. If you got fired up about no complex joins, this is why. When all of the data that you need is inside of the JSON object, you don’t have to go from table to table to get the data that you want.
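As a rough illustration of diving into a nested document, the sketch below walks a dotted path through a Python dict. MongoDB’s query language uses a similar dotted-path notation to reach into embedded documents and arrays, but `get_path` here is a hypothetical helper written for this post, not part of MongoDB’s API, and the `postMetadata` key name is an assumption.

```python
# The sample post as a plain Python dict.
post = {
    "postTitle": "Cat Videos",
    "likes": 4000000,
    "shares": 100000000,
    # "postMetadata" is an assumed key name for the nested data.
    "postMetadata": [
        {
            "timePosted": 10929382,
            "clicksOnPost": 923091208,
            "usersReached": 99302981029,
        }
    ],
}


def get_path(document, path):
    """Follow a dotted path such as 'postMetadata.0.clicksOnPost'
    through nested dicts and lists (hypothetical helper)."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            # Inside a list, the path segment is treated as an index.
            current = current[int(part)]
        else:
            current = current[part]
    return current


print(get_path(post, "postTitle"))
print(get_path(post, "postMetadata.0.clicksOnPost"))
```

Everything about the post, including its metadata, comes back from one object with no second lookup in another table, which is the point behind "no complex joins".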

How to pick the right database?

Picking a database can be one of the hardest and most important decisions that you make. It’s hard because there are so many database options, and it’s important because the database is an essential part of the technology stack.

The first step in finding the right database is knowing what kind of data your application will be creating. Here are some questions that can help guide you toward the right database:

• Is this data structured?
• Does the structure of the data change?
• How much data is there going to be?
• How do you want to use this data (analytics or transactional)?
• Does the database need to be up all the time?
• Does the database need ACID properties?

Once you have some of these questions answered, you should have a good idea of what you need.

If you have structured data and are using this data for a transactional system, a relational database might be the right fit for you.

If your data doesn’t have a schema and you have a lot of it, HDFS might be the right “database” to use (in quotes because we know HDFS is really a filesystem).

If you have structured data and the database should be up all the time, maybe Cassandra will work for you.

If you need a database that allows for multiple types of schemas, MongoDB might be an option for you.
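The rules of thumb above can be sketched as a small Python function. The function name and parameters are made up for illustration, and the rules are only the ones stated in this post, so treat the result as a starting point rather than a verdict.

```python
def suggest_database(structured, transactional=False, large_volume=False,
                     always_up=False, varied_schemas=False):
    """Return candidate data stores based on this post's rules of thumb.

    Hypothetical helper: the parameters mirror the guiding questions above.
    """
    suggestions = []
    if structured and transactional:
        suggestions.append("relational database")
    if not structured and large_volume:
        suggestions.append("HDFS")
    if structured and always_up:
        suggestions.append("Cassandra")
    if varied_schemas:
        suggestions.append("MongoDB")
    return suggestions


# Structured, transactional data points toward a relational database.
print(suggest_database(structured=True, transactional=True))

# Lots of data with no schema points toward HDFS.
print(suggest_database(structured=False, large_volume=True))
```

More than one suggestion can come back, and sometimes none will, which is where your own experience and judgement take over.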

There are so many options out there that your experience and judgement are super important. It’s also important that you have the ability to fail fast if you’re not sure which database you need. Failing fast lets you try a couple of databases and find out which one really works best for you. You might think that MongoDB is perfect for your use case, then try MySQL and find out that it works a little bit better. There is no shame in changing; just do it early and fast.