Real-world datasets contain numbers, text, alphanumeric values, timestamps, and various other unstructured data types, so how does one store, retrieve, and easily manipulate such multidimensional data? The answer is a data science library like Pandas! Pandas is a powerful data analysis toolkit with high-performance, easy-to-use data structures. Compared with Excel and SQL, it carries a host of useful tools and methods that set it apart when it comes to row-wise and column-wise data manipulation. We will visit these features in this tutorial.
In [7]: from IPython.display import Image
        from IPython.core.display import HTML
        Image(url="https://raw.githubusercontent.com/Back2Basics/Pandas-Talks/master/Python-datasci.jpg", width=400, height=300)
Out[7]: <IPython.core.display.Image object>
In this article, we will walk through what pandas is useful for: starting with data exploration, followed by data analysis, and concluding with data visualization. Pandas is often used to prepare data for machine learning, so we will take a quick look at that at the very end.
You are encouraged to follow along with the tutorial and play around with Pandas, trying various things and making sure you're getting the hang of it. Let's get started!
Nearly everything that follows will require a basic understanding of the fundamental data structures: the Series, DataFrame, and Index.
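As a first taste of these three structures, here is a minimal sketch. The fruit prices and stock counts are made-up illustration data, and the variable names (`prices`, `fruit`, `labels`) are placeholders, not part of any dataset used later.

```python
import pandas as pd

# A Series is a one-dimensional labeled array of values.
# (Hypothetical data, chosen only for illustration.)
prices = pd.Series(
    [1.5, 2.0, 3.25],
    index=["apple", "banana", "cherry"],
    name="price",
)

# A DataFrame is a two-dimensional table; each column is a Series,
# and all columns share the same row labels.
fruit = pd.DataFrame(
    {
        "price": prices,
        "stock": pd.Series([10, 0, 25], index=prices.index),
    }
)

# An Index is the object that holds those row (or column) labels.
labels = fruit.index

print(prices["banana"])              # look up one value by label
print(fruit.loc["cherry", "stock"])  # look up one cell by row and column label
print(list(labels))                  # the row labels as a plain list
```

Notice that the same labels drive lookups in both the Series and the DataFrame, which is why the Index matters as much as the other two structures.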