# Learn Data Science with Python

March 23, 2018

This 45-part course consists of tutorials, quizzes, hands-on assignments and real-world projects for learning data science, along with advanced Python tools for data science.

# Key Features of the Course

- 10+ portfolio projects and 150+ exercises to give you a lot of practice and build fluency.
- Most of the tutorials are available in three different formats — video, long article and bite-sized cards — so you can learn the way that works best for you.
- Articles **and videos** have code execution built in. You can play the instructor’s code right inside the video!
- Exceptional content quality. We teach you the real thing, with no dumbing things down and no limiting ourselves to the easiest use case.
- We collect ratings on every tutorial and project. Anything with an average rating below 4.5 is sent back for revision.

# What is Data Science? Why Data Science?

Trillions of gigabytes of data are being produced yearly, and the number is still growing exponentially. However, data is only a raw material, and Data Science is the field which enables us to extract information from it.

Data Scientists are in demand in virtually every company to drive strategic decisions and power the business. Data Scientist was ranked the #1 job by Glassdoor, with an average salary of over $120,000.

**Growth of Data.** Source: Patrick Cheesman

# Primary Objectives of the Course

- Become familiar with all the key components of data science — programming, statistics and probability, data analysis and exploration, and machine learning.
- Learn advanced Python data science libraries such as NumPy, Pandas, Matplotlib, Seaborn and Scikit-learn.
- Understand machine learning concepts and algorithms. Apply them to problems in image classification, sentiment classification, movie recommendations, etc.
- Become comfortable with the end-to-end workflow of a typical data science project - starting from data cleaning and analysis and going all the way to building machine learning models and interpreting results.
- Attain fluency and build a portfolio by implementing many real-world projects using the above techniques.

**Prerequisites:** Python and Linear Algebra, Statistics and Probability Review.

**Related course:** Machine Learning.

**Enroll** to add this course to the top of your Home Page. **Get started** with the first tutorial below.

# Introduction to Data Science

An introduction to **data science**: what it is, and examples of data science around us. The introduction article also covers the **key components** of data science, namely programming, data, statistics, machine learning and big data.

# NumPy Library

In the next few sections, you’ll learn about various Python libraries for data science. This section teaches you NumPy (**Num**erical **Py**thon), which provides vector and matrix primitives in Python.

- Introduction to NumPy
- NumPy: Operations
- NumPy: Manipulating arrays
- Hands-on Assignment: Data Exploration with NumPy
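
As a rough taste of what "vector and matrix primitives" means in practice, here is a minimal sketch. The array values and variable names are made up for illustration and are not taken from the course material.

```python
import numpy as np

# A vector (1-D array) and a matrix (2-D array); values are illustrative only.
v = np.array([1.0, 2.0, 3.0])
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

# Operations: arithmetic applies to every element without an explicit loop.
scaled = v * 2

# Matrix-vector product: a (2 x 3) matrix times a length-3 vector
# gives a length-2 vector.
product = M @ v

# Manipulating arrays: rearrange the six matrix entries into 3 rows of 2.
reshaped = M.reshape(3, 2)

print(scaled)          # [2. 4. 6.]
print(product)         # [7. 5.]
print(reshaped.shape)  # (3, 2)
```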

# Pandas Library

The Pandas library introduces the DataFrame, which is basically a table (like a database table or an Excel sheet, but in Python). Pandas is the go-to library for working with structured tabular datasets.

- Pandas Introduction: DataFrames
- Pandas: Creating Series and DataFrames
- Pandas: Indexing and Slicing
- Pandas: Criteria Based Selection
- Pandas: Modifying DataFrames
- Hands-on Assignment: Data Exploration and Preparation with Pandas
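
The topics listed above can be previewed with a short sketch. The table contents and column names below are invented for illustration; the tutorials themselves may use different datasets.

```python
import pandas as pd

# Creating a DataFrame from a dict of columns (made-up data).
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal", "Dee"],
    "age": [23, 35, 41, 29],
    "city": ["Lima", "Oslo", "Lima", "Pune"],
})

# Indexing and slicing: one column by label, the first two rows by position.
ages = df["age"]
first_two = df.iloc[:2]

# Criteria-based selection: keep only the rows where the condition holds.
over_30 = df[df["age"] > 30]

# Modifying a DataFrame: add a column derived from an existing one.
df["age_in_months"] = df["age"] * 12

print(over_30["name"].tolist())  # ['Ben', 'Cal']
```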

# Data Visualization with Matplotlib

In this section, we will learn about data visualization. We will mostly be using the `matplotlib` library.

- Introduction to Data Visualization with Matplotlib
- Hands-on Assignment: Data Visualization with Matplotlib
- Histograms
- Box plots
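
For a sense of what histograms and box plots look like in code, here is a small sketch using randomly generated data. The distribution parameters, bin count and figure size are arbitrary choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 500 draws from a normal distribution (illustrative only).
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)

# Two panels side by side: a histogram and a box plot of the same values.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

ax_hist.hist(values, bins=20)
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

ax_box.boxplot(values)
ax_box.set_title("Box plot")

fig.tight_layout()
# fig.savefig("distribution.png")  # uncomment to write the figure to disk
```

The histogram shows how the values are distributed across bins, while the box plot summarizes the same data by its median, quartiles and outliers.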

# Data Cleaning and Analysis

In this section, we will learn about some advanced data cleaning and analysis methods — including combining multiple datasets, data transformations, and handling duplicate data, missing values, and outliers. The section ends with a detailed project for end-to-end data cleaning and analysis.

- Pandas: DataFrame Merge and Concatenation
- Data Transformations [Under Construction]
- Handling Duplicates, Missing values and Outliers [Under Construction]
- End-to-End Project: Data Cleaning and Analysis [Under Construction]
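
The cleaning steps named in this section (handling duplicates, filling missing values, combining datasets) might look roughly like the sketch below. The two tables are invented, and filling with the median is just one common strategy, not necessarily the one the tutorials recommend.

```python
import pandas as pd

# Made-up order data with one exact duplicate row and one missing amount.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "customer_id": [10, 11, 11, 12],
    "amount": [25.0, 30.0, 30.0, None],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "country": ["PE", "NO", "IN"],
})

# Handling duplicates: drop rows that repeat an earlier row exactly.
deduped = orders.drop_duplicates()

# Handling missing values: fill the gap with the median of the observed amounts.
median_amount = deduped["amount"].median()
cleaned = deduped.fillna({"amount": median_amount})

# Combining datasets: attach each customer's country with a left merge.
merged = cleaned.merge(customers, on="customer_id", how="left")

print(merged)
```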

# Advanced Data Analysis

In this section, we will learn about some advanced methods for data analysis using the `pandas` and `seaborn` libraries. This includes getting aggregate statistics after grouping the data based on some variable, analyzing time series data, and performing multivariate analysis.

- Pandas: Apply functions and GroupBy
- Pandas: Working with Dates and Times
- Correlation Analysis: Two Variables
- Correlation Analysis: Multivariable [Under Construction]
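
A compact sketch of the three ideas in this section: grouped aggregates, date handling, and two-variable correlation. The sales figures are fabricated purely for the example, and only `pandas` is used here.

```python
import pandas as pd

# Made-up sales records; the date column is parsed into real timestamps.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-05", "2018-01-20", "2018-02-03", "2018-02-17"]),
    "store": ["A", "B", "A", "B"],
    "units": [10, 20, 30, 40],
    "revenue": [100.0, 210.0, 290.0, 405.0],
})

# GroupBy: aggregate statistics per group (mean units sold per store).
mean_units = sales.groupby("store")["units"].mean()

# Dates and times: group by calendar month using the .dt accessor.
monthly_revenue = sales.groupby(sales["date"].dt.month)["revenue"].sum()

# Correlation between two variables (Pearson by default).
corr = sales["units"].corr(sales["revenue"])

print(mean_units)
print(monthly_revenue)
print(round(corr, 3))
```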

# Machine Learning

In this section, we’ll introduce **machine learning** and take a look at a number of machine learning algorithms. Algorithms will include **regression, classification and clustering algorithms** commonly used in data science projects. After learning each algorithm, we’ll do **hands-on projects** and apply them to various applications such as **handwritten digit recognition** and **diabetes diagnosis**.

- What is Machine Learning? Why Machine Learning?
- K-nearest neighbors
- Hands-on Project: Digit classification with K-Nearest Neighbors and Data Augmentation
- Linear Regression
- Gradient Descent
- Hands-on Assignment: Implementing Linear Regression with Gradient Descent
- Logistic Regression
- Quiz: Logistic Regression
- K-Means Clustering
- Types of Learning Algorithms: Supervised, Unsupervised and Reinforcement Learning
- Quiz: Types of Machine Learning problems
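
To make the pairing of linear regression and gradient descent concrete, here is a minimal sketch that fits a line to noise-free synthetic data. The learning rate and iteration count are arbitrary values that happen to work for this toy problem; this is not the assignment's reference solution.

```python
import numpy as np

# Synthetic data generated from y = 2x + 1, so the true answer is known.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Model: y_hat = w * x + b, starting from zero parameters.
w, b = 0.0, 0.0
learning_rate = 0.05  # assumed value, chosen for this example

for _ in range(5000):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Step against the gradient to reduce the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 4), round(b, 4))
```

After enough iterations, `w` and `b` approach the slope and intercept used to generate the data.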

# End-to-End Data Science Projects

This section consists of two **end-to-end data science projects**. The last tutorial contains a list of 10 project ideas (including datasets and suggested algorithms). It is recommended that you do at least one **end-to-end project** as part of the course.

- End-to-End Example: Using Logistic Regression for predicting Diabetes
- Predicting Titanic Survivors with Machine Learning (Detailed End-to-End Example)
- Data Science and Machine Learning Project Ideas

# Natural Language Processing

Natural language processing comprises a set of computational techniques to understand natural languages such as English, Spanish, Chinese, etc. In this section, you’ll see many popular NLP applications, such as search engines, finding related articles, sentiment classification, text classification and topic modeling.

- Introduction to Natural Language Processing
- TF-IDF: Vector representation of Text
- Quiz: TF-IDF (+ search engines, related articles recommender)
- Hands-on Project: Implementing a Search engine from scratch
- Text Classification (Topic Categorization, Spam filtering, etc)
- Hands-on Assignment: Sentiment Classification with Naive Bayes
- Topic modeling with LDA
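
As an illustration of TF-IDF as a vector representation of text, the sketch below uses one common variant: term frequency as a share of the document's tokens, and inverse document frequency as `log(N / document_frequency)`. The three toy documents and the whitespace tokenizer are assumptions for this example; the tutorial may use a different weighting.

```python
import math
from collections import Counter

# Toy corpus (illustrative only), tokenized by splitting on whitespace.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [doc.split() for doc in docs]

# Document frequency: in how many documents does each term appear?
doc_freq = Counter()
for tokens in tokenized:
    doc_freq.update(set(tokens))

def tfidf(tokens):
    """Return a {term: weight} mapping for one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        term: (count / total) * math.log(len(tokenized) / doc_freq[term])
        for term, count in counts.items()
    }

vectors = [tfidf(tokens) for tokens in tokenized]

# "cat" occurs in only one document, so it outweighs the more common "the".
print(vectors[0]["cat"] > vectors[0]["the"])  # True
```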

# Other topics in Data Science

This section introduces **databases and SQL**, which are used to store and manage data in computer systems. We’ll also look at **MapReduce**, a programming model that allows us to perform parallel processing on large data sets in a distributed environment. Again, our tutorials will be interleaved with **quizzes and hands-on assignments**.
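
The map-reduce programming model can be sketched on a single machine with the classic word-count example. The function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical helpers written for this illustration; a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Made-up input; each line could be processed independently by a mapper.
lines = ["big data big ideas", "data science"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'science': 1}
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across workers, which is what makes the model suitable for large data sets.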