In the last lecture, we learned how useful scatter plots are for analyzing 2D data. But scatter plots cannot capture every essential detail of the data, and they do not always lead to the correct answer when the data is noisy.
For example, consider the given table:
Ans. If we use the approach from the first lecture, we get that the cost of the house is:
But can we say with certainty that this result is correct? Only if we assume that the cost depends linearly on the size and that the data actually satisfies this rule.
On observing the above plot, we can see that the cost and size points do not lie on a straight line: there is a deviation. This deviation is known as noise.
When we want to ignore these deviations and understand the bigger picture behind the graph, we can use bar graphs.
Bar graphs help in plotting groups of data and in finding the global trends the data follows. The presence of a few exceptions will not prevent us from understanding the bigger picture represented by the data.
We group the data into bins and then plot again.
Each pair of points, (1300, 1400), (1600, 1900) and (2100, 2300), is represented by a bin with the range (1000-1500), (1500-2000) and (2000-2500) respectively.
From the bar chart, it is easier to see that the data follows a linear trend. For smaller datasets we may not see the need for bar charts, but for larger datasets with thousands of points, bar charts become necessary.
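The binning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sizes and bin ranges come from the text, but the costs are made-up placeholder values, and the text does not say how the bar heights are computed, so the sketch assumes each bar is the mean cost of the points in its bin. The helper names `bin_range` and `mean_cost_per_bin` are hypothetical.

```python
from collections import defaultdict

# Sizes are the ones mentioned in the text.
sizes = [1300, 1400, 1600, 1900, 2100, 2300]
# Costs are HYPOTHETICAL placeholder values, used only to show the idea.
costs = [88000, 72000, 94000, 86000, 112000, 98000]

BIN_START = 1000  # lower edge of the first bin
BIN_WIDTH = 500   # every bin covers 500 units of size


def bin_range(size):
    """Return the (low, high) range of the bin that contains `size`."""
    low = BIN_START + ((size - BIN_START) // BIN_WIDTH) * BIN_WIDTH
    return (low, low + BIN_WIDTH)


def mean_cost_per_bin(sizes, costs):
    """Group costs by size bin and average them (assumed bar height)."""
    grouped = defaultdict(list)
    for size, cost in zip(sizes, costs):
        grouped[bin_range(size)].append(cost)
    return {rng: sum(vals) / len(vals) for rng, vals in sorted(grouped.items())}


bar_heights = mean_cost_per_bin(sizes, costs)
for (low, high), height in bar_heights.items():
    print(f"{low}-{high}: {height:.0f}")
```

Even though the placeholder costs zig-zag from point to point, the per-bin averages rise steadily from one bin to the next, which is the kind of global trend the bar chart is meant to expose.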
Histograms are a special case of bar charts defined over 1D data. Here, the y-axis represents the frequency of the data, which helps us count the values present in every bin of the histogram.
From the histogram given above, we can see that the frequency of numbers in the range 1300-1600 is 2.
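Counting values per bin is all a histogram does, and it can be sketched as follows. The assumptions here are that the histogram is drawn over the six sizes mentioned earlier, that bins are 300 wide starting at 1300, and that each bin includes its lower edge but not its upper edge. None of this is confirmed by the original figure, and the `histogram` helper is hypothetical.

```python
from collections import Counter

# 1D data: the sizes mentioned earlier in the text (assumed to be the
# values behind the histogram).
sizes = [1300, 1400, 1600, 1900, 2100, 2300]

BIN_START = 1300  # assumed lower edge of the first bin
BIN_WIDTH = 300   # assumed bin width


def histogram(values, start, width):
    """Count how many values fall in each half-open bin [low, low + width)."""
    counts = Counter()
    for v in values:
        low = start + ((v - start) // width) * width
        counts[(low, low + width)] += 1
    return dict(sorted(counts.items()))


freq = histogram(sizes, BIN_START, BIN_WIDTH)
for (low, high), count in freq.items():
    print(f"{low}-{high}: {count}")
```

Under these assumptions the 1300-1600 bin holds the values 1300 and 1400, giving a frequency of 2. The value 1600 falls into the next bin because the upper edge is excluded.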
1) Bar charts and histograms work by grouping data and help in identifying global trends.
2) Bar charts work on 2D data and histograms on 1D data.
I would love to receive feedback in the comments below.