An introduction to data science: what it is, and examples of data science around us. The introductory article also covers the key components of data science: programming, data, statistics, machine learning and big data.
This 35-part course consists of tutorials, quizzes, hands-on assignments and real-world projects for learning data science, as well as advanced Python tools for data science. You can think of this list as a "Free Online Nano Book".
Trillions of gigabytes of data are produced every year, and the amount is still growing exponentially. It is estimated that by 2020, 1.7 megabytes of data will be produced every second for every person.
Our society is becoming increasingly data-dependent. Data is only a raw material, and extracting information from it requires further work. Data science helps us make sense of data.
In this section we cover the mathematics background required for data science. The first 2 tutorials (+1 quiz) are a review of linear algebra and statistics. The 3rd tutorial (+1 quiz) is a review of probability.
Feel free to skip / skim if you are already comfortable with these topics. These tutorials might feel rushed if you haven't encountered these topics before, since they are meant to be a short review.
We'll introduce the important Python libraries for data science. NumPy (Numerical Python) provides vector and matrix primitives in Python. Pandas introduces the DataFrame, which is basically a table (like a database table or an Excel sheet, but in Python). Matplotlib is used for plotting and visualizing data.
This is followed by an end-to-end example illustrating the use of all of these libraries to solve a data science problem. Although some of the machine learning concepts might be unclear, the example will tie together what we have learnt from the previous three lessons.
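As a small taste of what these libraries look like in practice, here is a minimal sketch. The numbers and the column names city and sales are invented for illustration and are not taken from the course material.

```python
import numpy as np
import pandas as pd

# NumPy: vectors and matrices as first-class objects
v = np.array([1.0, 2.0, 3.0])
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
product = M @ v  # matrix-vector product

# pandas: a DataFrame is a labelled table (toy data, made up for this sketch)
df = pd.DataFrame({
    "city": ["A", "B", "A", "B"],
    "sales": [10, 20, 30, 40],
})
mean_sales = df.groupby("city")["sales"].mean()  # average sales per city

print(product)
print(mean_sales)
```

With Matplotlib installed, calling mean_sales.plot(kind="bar") would draw the per-city averages as a bar chart.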
In this section, we first discuss Hypothesis Testing, probably the most popular application of Data Science. Given multiple options (say different drugs for a disease, or different user interfaces for a website), it allows us to identify which one is better and by how much.
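To make the idea concrete, the sketch below compares two hypothetical website interfaces with a permutation test, which is only one of several possible hypothesis tests and not necessarily the one used in the tutorial. The function name permutation_test, the sign-up counts and the number of resamples are all assumptions made for illustration.

```python
import random

def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in means (b minus a)."""
    rng = random.Random(seed)
    observed = sum(b) / len(b) - sum(a) / len(a)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled data and split it into two new groups.
        rng.shuffle(pooled)
        new_a = pooled[:len(a)]
        new_b = pooled[len(a):]
        diff = sum(new_b) / len(new_b) - sum(new_a) / len(new_a)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples

# Hypothetical data: 1 = visitor signed up, 0 = visitor did not
interface_a = [1] * 20 + [0] * 180  # 10% of 200 visitors signed up
interface_b = [1] * 40 + [0] * 160  # 20% of 200 visitors signed up

effect, p_value = permutation_test(interface_a, interface_b)
print("difference in sign-up rate:", effect)
print("approximate p-value:", p_value)
```

The returned difference answers "by how much", while a small p-value suggests the difference is unlikely to be due to chance alone.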
The next tutorial (+quiz) covers techniques for network analysis. This will include understanding and implementing the famous PageRank algorithm, which Google Search was originally built on.
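The core of PageRank can be sketched as a simple iterative update. The version below is a minimal illustration, not the tutorial's implementation: the four-page graph is hypothetical, the damping factor of 0.85 and the 50 iterations are conventional choices assumed here, and every linked page is assumed to appear as a key in the dictionary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share of rank
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's rank equally among the pages it links to
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Page C comes out on top because three of the four pages link to it, while page D, which nothing links to, keeps only the baseline share.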
In this section, we'll introduce machine learning and take a look at a number of machine learning algorithms. These will include regression, classification and clustering algorithms commonly used in data science projects. We'll also introduce recommendation systems (such as those used by Netflix and Amazon). After learning each algorithm, we'll do hands-on projects that apply it to tasks such as handwritten digit recognition, sentiment classification, diabetes diagnosis, and movie recommendations.