In this hands-on assignment, we’ll use the Pandas Python library to explore a dataset and then prepare it for machine learning algorithms. The dataset is a medical one containing patient measurements such as glucose and insulin levels, along with other metrics related to diabetes. The primary objectives for this assignment are to (a) practice Pandas on a realistic task, (b) learn how to get a feel for a large dataset (also known as data cleaning and data exploration), (c) see the distinction between Pandas and NumPy, and (d) learn the typical flow for data preparation.
To get started, download the dataset from this link: diabetes.csv. Open the file in your favorite text editor and have a look at its contents.
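Once you have looked at the raw file, the same first look can be taken from Python. The following is only a minimal sketch, not the assignment's required solution: `first_look` is a hypothetical helper name, and it assumes nothing about the columns in diabetes.csv beyond the file being a regular comma-separated file with a header row.

```python
import pandas as pd


def first_look(csv_source, n_rows=5):
    """Load a CSV into a DataFrame and gather a few quick summaries.

    csv_source may be a file path or an open file-like object.
    (Hypothetical helper for illustration; not part of Pandas.)
    """
    df = pd.read_csv(csv_source)
    return {
        "shape": df.shape,             # (number of rows, number of columns)
        "columns": list(df.columns),   # column names taken from the header row
        "head": df.head(n_rows),       # the first few rows, still a DataFrame
        "missing": df.isna().sum(),    # count of missing values per column
        "values": df.to_numpy(),       # the same data as a plain NumPy array
    }


# Example usage, assuming diabetes.csv sits in the working directory:
# summary = first_look("diabetes.csv")
# print(summary["shape"])
# print(summary["head"])
```

The `"head"` entry keeps Pandas' labeled rows and columns, while `"values"` is the bare NumPy array with the labels stripped away. Comparing the two gives an early glimpse of the Pandas versus NumPy distinction mentioned in the objectives.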
The quiz will guide you through the rest of the assignment. Make sure you finish each step before moving on to the next step. Let’s get started!