Pandas is an open-source Python library whose central data structure is the two-dimensional DataFrame.
A DataFrame is the pandas equivalent of a spreadsheet or SQL table, with rows of records and labeled columns, and it is the most commonly used pandas object. It offers many methods for working with tabular data, which is a routine task in data science and machine learning.
In this tutorial, we will learn about the pandas DataFrame and explore some of its most useful methods.
We will need the pandas and NumPy libraries for this tutorial, so let us import them at the beginning.
type=codeblock|id=pd_rc_import|show_output=0|autocreate=datascience
import pandas as pd
import numpy as np
Note: import pandas as pd is the standard convention for importing pandas. It is followed almost universally within the machine learning and data science communities.
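To check that both imports work and to see what a DataFrame looks like, here is a minimal sketch. The column names and values are hypothetical toy data invented for illustration. They are not part of the dataset used in this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
# Each dictionary key becomes a column label; each value holds that column's data.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "age": [36, 45, 28],
        "score": np.array([91.5, 88.0, 79.25]),
    }
)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one data type per column
print(df.head())  # the first rows, like the top of a spreadsheet
```

Each column has its own data type, which is what makes a DataFrame behave like a spreadsheet or SQL table and not like a plain NumPy array.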
The next step is to load our dataset.