Apache Hadoop is an open-source framework for processing massive amounts of data in a distributed fashion across a cluster of commodity hardware. By design, Hadoop scales from a single node to thousands of servers, and every node added brings more computation power and storage.
Hadoop is written in Java and exposes a simple programming model. Two main components make up the backbone of the framework: MapReduce and the Hadoop Distributed File System (HDFS). Over the years, companies and open source communities have continued to expand the number of applications and products that interact with the Hadoop ecosystem.
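To make the MapReduce idea concrete, here is a minimal word-count sketch in plain Python. This is a conceptual illustration of the three phases (map, shuffle, reduce), not Hadoop's actual Java API; all function names below are made up for this example.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by their key (the word).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the grouped counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))

print(word_count(["big data", "big cluster"]))
# → {'big': 2, 'data': 1, 'cluster': 1}
```

In a real Hadoop job, the map and reduce functions run in parallel on many nodes, and the framework handles the shuffle, fault tolerance, and data locality for you.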
Now that you know what Hadoop is, let’s talk about what Hadoop isn’t. Hadoop is not a replacement for relational databases such as Postgres that handle millions of transactions a day. Relational databases are organized into tables and normalized to optimize SQL queries, which often run in milliseconds.
Hadoop really prefers to work with big data that is ...