MapReduce is a programming framework that allows us to perform parallel processing on large data sets in a distributed environment. It is a data processing paradigm for condensing large volumes of data into useful aggregated results.
As the name suggests, Map and Reduce functions are performed on the data. Map function processes a block of data and generates a set of intermediate key/value pairs. Reduce function merges all the intermediate values associated with the same intermediate key into a single value.
The two operations are each designed such that efficient implementation is possible in distributed systems. Moreover, the two types of operations together provide a lot of flexibility and expressibility, so that a variety of real-world tasks can be implemented within the MapReduce paradigm.