In this hands-on assignment, we'll use the matplotlib Python library to visualize a dataset. The dataset is a medical one containing patient measurements such as glucose and insulin levels, along with other diabetes-related metrics. The assignment serves two primary objectives: (a) practice matplotlib on a realistic task, and (b) learn how to visualize and present a dataset.
Project Template on Google Colaboratory
Work on this project directly in your browser via Google Colaboratory. The link above is a starter template that you can save to your own Google Drive and work on. Google Colab is a free tool that lets you run small machine learning projects through your web browser. If you're unfamiliar with Google Colaboratory, read this one-minute tutorial first. Note that, for this project, you'll have to upload the dataset linked below to Google Colab after saving your own copy of the notebook.
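As a rough sketch of the upload step, the helper below checks whether the dataset is already present and, if not, asks Colab for it. The function name `ensure_dataset` is hypothetical and not part of the starter template. The sketch assumes that `google.colab.files.upload()` saves the chosen file into the notebook's working directory, and that you pick a file named diabetes.csv.

```python
import os


def ensure_dataset(path="diabetes.csv"):
    """Return True if `path` exists, prompting for a Colab upload when it does not.

    Hypothetical helper for illustration; not part of the starter template.
    """
    if os.path.exists(path):
        return True
    try:
        # This module is only importable inside a Google Colab runtime.
        from google.colab import files
    except ImportError:
        print(f"{path} not found; place it next to this notebook.")
        return False
    # Opens a file picker in the notebook (assumption: the uploaded file
    # is written to the current working directory).
    files.upload()
    return os.path.exists(path)
```

Calling `ensure_dataset()` in the first cell of the notebook, before `pd.read_csv`, would surface a missing file early rather than as a read error later.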
Getting started
To get started, first download the dataset from this link: diabetes.csv. Open the file in your favorite text editor and have a look.
First, we'll import numpy, pandas and matplotlib. Then, we'll load the dataset, clean it, and also create a normalized dataset.
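import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset downloaded above.
dataset = pd.read_csv('diabetes.csv')

# A zero in these columns is not a plausible measurement, so treat it as missing.
for column in ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]:
    bad = (dataset[column] == 0)
    dataset.loc[bad, column] = np.nan

# Summary statistics, then column types and non-null counts.
dataset.describe()
dataset.info()

# Standardize every column to zero mean and unit standard deviation.
normalized = (dataset - dataset.mean()) / dataset.std()
# Outcome was standardized too; turn it back into a boolean label.
normalized["Outcome"] = (normalized["Outcome"] > 0.0)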
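To see what the cleaning and normalization steps do, here is a minimal sketch on a made-up five-row table. The `toy` DataFrame and its values are invented for illustration and are not taken from diabetes.csv. It shows zeros becoming missing values, and why comparing the standardized Outcome column against 0.0 recovers the original label: an Outcome of 1 lies above the column mean and 0 lies below it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for diabetes.csv (values are made up).
toy = pd.DataFrame({
    "Glucose": [148.0, 85.0, 0.0, 89.0, 137.0],
    "BMI": [33.6, 26.6, 23.3, 0.0, 43.1],
    "Outcome": [1, 0, 1, 0, 1],
})

# Treat zeros in the measurement columns as missing values.
for column in ["Glucose", "BMI"]:
    toy.loc[toy[column] == 0, column] = np.nan

# Standardize: subtract the column mean, divide by the column standard deviation.
# pandas skips NaN entries when computing mean() and std().
normalized = (toy - toy.mean()) / toy.std()

# Outcome 1 ends up positive (above the mean), Outcome 0 negative (below it),
# so thresholding at 0.0 restores a boolean label.
normalized["Outcome"] = normalized["Outcome"] > 0.0

print(toy.isna().sum())  # number of missing values per column
print(normalized)
```

The missing entries stay missing after standardization, so later plots and statistics have to either skip them or fill them in explicitly.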