In this article, we will look at the different types of data that flow through a system. We will also look, at a high level, at some of the components of a Big Data system.
There are three main types of data that you will see: structured, semi-structured, and unstructured data. Let’s look at each type along with some examples.
If you have any experience with relational databases, you know exactly what structured data is. Structured data is data that has a clear organization and can be queried using basic algorithms. So...
The Hadoop Distributed File System (HDFS) is one of the two components that make up the backbone of Hadoop. As the name suggests, HDFS is based on the distributed file system architecture, with a few extra features.
A Distributed File System (DFS) is a file system that uses network protocols to store and manage files on a server or set of servers. The goal of a DFS is to allow clients to access data as if it were on their local machine. A DFS also allows data to be stored and shared among multiple users in a secure and convenient way. The servers that store the data have full control over it and decide which clients of the DFS may access it. This is why direct access to the servers that are part of a distributed file system is limited: the data must be retrieved through an API.
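To make the idea of "access through an API" concrete, here is a minimal in-memory sketch in Python. It is not how HDFS or any real DFS is implemented. The names `FileServer`, `DfsClient`, `put`, `read`, and `read_file` are hypothetical, invented for this illustration, and the "network" is replaced by plain method calls. It only shows the two ideas from the paragraph above: the client asks for a path as if the file were local, and the server that holds the data decides whether that client may read it.

```python
class FileServer:
    """Toy stand-in for one DFS server (hypothetical, in-memory only).

    The server owns the data and decides who may read it.
    """

    def __init__(self):
        self._files = {}  # path -> file contents (bytes)
        self._acl = {}    # path -> set of user names allowed to read

    def put(self, path, data, readers):
        # Store a file together with the users who may read it.
        self._files[path] = data
        self._acl[path] = set(readers)

    def read(self, user, path):
        # The only way to get data out of the server is this call.
        if path not in self._files:
            raise FileNotFoundError(path)
        if user not in self._acl[path]:
            raise PermissionError(f"{user} may not read {path}")
        return self._files[path]


class DfsClient:
    """Toy client: hides which server actually holds a file."""

    def __init__(self, user, servers):
        self.user = user
        self.servers = servers

    def read_file(self, path):
        # Ask each server in turn; the caller never needs to know
        # where the file lives.
        for server in self.servers:
            try:
                return server.read(self.user, path)
            except FileNotFoundError:
                continue
        raise FileNotFoundError(path)


# Two servers, each holding a different file.
s1, s2 = FileServer(), FileServer()
s1.put("/logs/a.txt", b"alpha", readers={"alice"})
s2.put("/logs/b.txt", b"beta", readers={"alice", "bob"})

alice = DfsClient("alice", [s1, s2])
bob = DfsClient("bob", [s1, s2])

print(alice.read_file("/logs/b.txt"))  # alice is allowed to read this file
```

In this sketch, `bob.read_file("/logs/a.txt")` raises `PermissionError`, because the server holding that file did not list `bob` as a reader. A real DFS would add networking, replication, and authentication on top of this basic shape.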
HDFS was designed based on the distributed file syst...