Statistics is the study of data. Given data, we try to make sense of it. Its primary use is in decision making: given a huge quantity of information, we need to decide what action to take based on it.
It has numerous applications, ranging from data mining and machine learning to business, political science, biology, physics, and astronomy.
"Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid." - Larry Wasserman
We will begin with one of the most basic ways of representing data:
Scatter plot
Let's imagine that you want to buy a new house. Before you pay, you would like to check a few things. Are you paying too much for the house? When is the best time to buy a house? Which is the best location for the house?
To make this decision, you would like to look at some house sizes (in ft²) and their corresponding prices (in $):
What will be the cost of a house of size 1300 ft²?
Ans. This one is easy, because a house of this size is present in the table. The corresponding price is $104000.
What will be the cost of a house of size 2100 ft²?
Ans. This is slightly trickier. We don't have any house of size 2100 ft² in the table. But we know that 2100 lies exactly halfway between 1800 and 2400. So, the price of a 2100 ft² house will be the average of the prices of the 1800 ft² and 2400 ft² houses:
price(2100) = (price(1800) + price(2400)) / 2
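The averaging step can be sketched in a few lines of Python. This is only an illustration: the prices used for the 1800 ft² and 2400 ft² houses are assumed values (size times the $80 per ft² rate implied by the $104000, 1300 ft² house), and `interpolate_price` is a hypothetical helper name. It uses linear interpolation, which reduces to the plain average when the target size is exactly halfway between the two known sizes.

```python
def interpolate_price(size, low, high):
    """Linearly interpolate a price between two known (size, price) points."""
    (size_low, price_low), (size_high, price_high) = low, high
    # Fraction of the way from the smaller house to the larger one.
    fraction = (size - size_low) / (size_high - size_low)
    return price_low + fraction * (price_high - price_low)


# Assumed prices for illustration: $80 per square foot times the size.
smaller = (1800, 144_000)
larger = (2400, 192_000)

estimate = interpolate_price(2100, smaller, larger)
average = (smaller[1] + larger[1]) / 2

print(estimate)  # estimated price of the 2100 square foot house
print(average)   # plain average of the two neighbouring prices
```

Because 2100 sits exactly in the middle, both numbers agree. For a size that is not halfway, such as 2000 ft², only the interpolated value would be appropriate.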
Given this information, we can also point out one more interesting property of the data:
What is the cost of the house per square foot?
Ans. It is easy to see the relationship between the cost and the size of the house: cost per square foot = price / size = $104000 / 1300 ft² = $80 per ft².
All the entries in the data follow this wonderful property.
This type of relationship is known as a linear relationship, and the data is said to follow the property of linearity.
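One way to check this property is to divide every price by its size and see whether the result is always the same. The sketch below does that in Python. Only the 1300 ft² row comes from the example above. The other two rows are assumed values chosen to be consistent with it, and the variable names are illustrative.

```python
import math

# (size in square feet, price in dollars)
# The first row is from the example; the other two are assumed for illustration.
houses = [(1300, 104_000), (1800, 144_000), (2400, 192_000)]

# Price per square foot for every house.
rates = [price / size for size, price in houses]

# The data is directly proportional if every house has the same rate.
is_proportional = all(math.isclose(rate, rates[0]) for rate in rates)

print(rates)
print(is_proportional)
```

If even one house had a different rate, `is_proportional` would be `False`, and a single price-per-square-foot figure would no longer describe the whole data set.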
One easy way to see this relationship is through a scatter plot:
It is a 2D representation of the data on coordinate axes, where the x-variable is the size and the y-variable is the price.
The points fall on a straight line!
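To get a feel for how a scatter plot is built, here is a minimal, dependency-free sketch that draws one as text. It is not how a plotting library works internally. `ascii_scatter` is a hypothetical helper, the grid size is arbitrary, and the prices for 1800 ft² and 2400 ft² are assumed values consistent with the 1300 ft² house.

```python
def ascii_scatter(points, width=40, height=10):
    """Draw (x, y) points on a text grid; assumes at least two distinct x and y values."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column or row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 of the grid is the top line, so flip the y direction.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# x = size in square feet, y = price in dollars (last two prices are assumed).
houses = [(1300, 104_000), (1800, 144_000), (2400, 192_000)]
plot = ascii_scatter(houses)
print(plot)
```

The stars climb from the bottom-left corner to the top-right corner along a diagonal, which is the text version of the straight line in the graph. In practice you would use a plotting library or an online graphing tool instead.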
Key Takeaways:
- Statistics is the study and analysis of data, which is very important in decision making.
- A scatter plot is a simple tool for analyzing 2D data. The data does not need to have a directly proportional relationship for a scatter plot to be useful.
This is only my second article. I would love to receive your feedback on how I can improve my explanations, as well as on any typos or mistakes.