Hands-on Assignment: Data Exploration and Preparation with Pandas

June 27, 2018

In this hands-on assignment, we’ll use the Pandas Python library to explore a dataset and then prepare it for machine learning algorithms. The dataset is a medical one, with per-patient measurements such as glucose and insulin levels and other metrics related to diabetes. The primary objectives for this assignment are to (a) practice Pandas on a realistic task, (b) learn how to get a feel for a large dataset (through data exploration and data cleaning), (c) see the distinction between Pandas and NumPy, and (d) learn the typical flow for data preparation.

Dataset description

To get started, first download the dataset from this link: diabetes.csv. Open the file in your favorite text editor and have a look.
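Once you have looked at the raw file, a first pass in Pandas might look like the sketch below. It is only an illustration: the inline sample stands in for diabetes.csv, and its column names and values are made up here, so they may not match the real file. The helper `load_and_summarize` is a hypothetical name, not part of Pandas.

```python
import io

import pandas as pd

# Hypothetical stand-in for diabetes.csv. The column names and values are
# invented for illustration and may differ from the real dataset.
SAMPLE_CSV = """Glucose,Insulin,BMI,Outcome
148,0,33.6,1
85,0,26.6,0
183,0,23.3,1
89,94,28.1,0
"""


def load_and_summarize(source):
    """Read a CSV into a DataFrame and collect a few basic facts about it."""
    df = pd.read_csv(source)
    summary = {
        "shape": df.shape,                      # (number of rows, number of columns)
        "columns": list(df.columns),            # column names, in file order
        "missing": df.isna().sum().to_dict(),   # count of missing values per column
    }
    return df, summary


# With the real file you would pass its path instead, e.g. "diabetes.csv".
df, summary = load_and_summarize(io.StringIO(SAMPLE_CSV))

print(df.head())      # first few rows
print(df.describe())  # count, mean, std, min, quartiles, max per numeric column
print(summary)
```

Checking the shape, the column names, and the missing-value counts first gives a quick sense of what the file contains before any cleaning or transformation.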

Quiz time!

The quiz will guide you through the rest of the assignment. Make sure you finish each step before moving on to the next step. Let’s get started!

