# Pandas: Creating Series and DataFrames

April 18, 2019

# Introduction

In the previous tutorial, you used the **DataFrame**, a 2-dimensional data structure supported by Pandas that looks and behaves like a *table*.

In this tutorial, you will learn about **Series** which is a 1-dimensional data structure supported by Pandas. In fact, in Pandas, each column of a DataFrame is a Series.

You will also learn in depth how to create DataFrames, add rows to a DataFrame, and convert DataFrames to other formats, such as NumPy arrays or Python lists and dictionaries.

# Imports

Before we start, let’s import the Pandas and NumPy libraries, as we will need them throughout the tutorial.

```
import pandas as pd
import numpy as np
```

# Series

Just like the DataFrame, **Series** is another useful data structure provided by the Pandas library.

A Series is a labeled 1-dimensional array. It can hold any data type, such as integers, strings, floating point numbers, Python objects, etc.

Unlike DataFrames, which have an index (row labels) and columns (column labels), **Series objects have only one set of labels**.

# Creating a Series

Series can be created with the following syntax:

`pandas.Series(data=None, index=None)`

Here’s a description of each of the parameters:

- `data`: the data to store in the Series. It can be a list, a 1D NumPy array, a dict or a scalar value.
- `index`: the index labels. If no index is passed, the default index is the range `0` to `n-1`. Index values must be hashable and have the same length as the data (a scalar `data` value is repeated for every label).

Additionally, if a `dict` of *key-value pairs* is passed as `data` and no `index` is passed, the dict keys are used as the `index`.

Let us look at some examples of Series construction.

```
# From a list, without passing any index
s1 = pd.Series([1, 'tom', 32, 'qualified'])
print(s1)
```

```
# From a list, with an index
s2 = pd.Series([1, 'tom', 32, 'qualified'], index=['number', 'name', 'age', 'status'])
print(s2)
```

```
# From a list of integer values, with an index
s3 = pd.Series([1, 345, 14, 24, 12], index=['first', 'second', 'third', 'fourth', 'fifth'])
print(s3)
```

```
# From a dict of key-value pairs
s4 = pd.Series({'number':1, 'name':'tom', 'age':32, 'status':'qualified'})
print(s4)
```
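The parameter description also allows a scalar `data` value and a `dict` combined with an explicit `index`. Here is a minimal sketch of both cases; the labels and values are made up for illustration. With a scalar, the value is repeated for every index label. With a dict plus an `index`, the labels select and order the dict keys, and a label with no matching key gets `NaN`.

```python
import pandas as pd

# Scalar data: the single value is repeated once per index label
s5 = pd.Series(7, index=['a', 'b', 'c'])
print(s5)

# Dict data plus an explicit index: the index picks and orders the keys;
# 'height' is not a key of the dict, so its value becomes NaN
s6 = pd.Series({'number': 1, 'name': 'tom', 'age': 32},
               index=['age', 'name', 'height'])
print(s6)
```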

You will notice that the data type (`dtype`) of the Series is inferred from the elements passed to the Series. It is chosen automatically so that all elements in the Series are of the same `dtype` (or a sub-type of that `dtype`).

For example, if all the values are integers, the `dtype` will be `int64`. If the values are a mix of integers and floats, the `dtype` of the Series will be `float64`. Similarly, if any of the values is a string, the Series’ `dtype` will be `object`.
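You can check these inference rules directly by printing `dtype`. Here is a minimal sketch with small illustrative lists:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])           # all integers
mixed_num = pd.Series([1, 2.5, 3])    # integers and floats
mixed_obj = pd.Series([1, 'two', 3])  # numbers and a string

print(ints.dtype)       # int64
print(mixed_num.dtype)  # float64
print(mixed_obj.dtype)  # object
```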

# Creating a DataFrame

Now we will look at the different ways of creating DataFrames. We use the following syntax to create a DataFrame:

`pandas.DataFrame(data=None, index=None, columns=None)`

Here’s a description of each of the parameters:

- `data`: a 2D array, or a dict (of 1D arrays, Series, lists or dicts).
- `index`: row index values for the DataFrame that will be created. If not specified, row index values default to the range *0* to *n-1* (where *n* is the number of rows).
- `columns`: column labels of the DataFrame that will be created. If not specified, column labels default to the range *0* to *c-1* (where *c* is the number of columns), or to the dict keys when `data` is a dict.

There are many other ways to pass information about the row index values and the column labels as well, which we will see soon in this section.
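A quick way to see these defaults is to build a DataFrame from the same small array twice, once without labels and once with them. This is a minimal sketch; the array and the labels (`r1`, `x`, etc.) are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns: [[0, 1, 2], [3, 4, 5]]

# No index or columns passed: they default to 0..n-1 and 0..c-1
df_default = pd.DataFrame(arr)
print(df_default)

# Explicit row index values and column labels
df_labeled = pd.DataFrame(arr, index=['r1', 'r2'], columns=['x', 'y', 'z'])
print(df_labeled)
```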

## Creating DataFrame from ndarray

In this section, we will see how to create a DataFrame using a NumPy `ndarray` as `data`, and a `list` of column labels.

We will:

- Create a random `ndarray` with 7 rows and 5 columns: `np.random.rand(7, 5)`.
- Create a list of 5 column labels: `['col1','col2','col3','col4','col5']`.
- Create the DataFrame.

Let us put these steps together and create a DataFrame:

```
# Create a random 7 x 5 numpy ndarray
np.random.seed(42) # set a seed so that the same random numbers are generated each time
np_array = 10 * np.random.rand(7, 5)
# Create a list of 5 column labels
cols = ['col1', 'col2', 'col3', 'col4', 'col5']
# Create the DataFrame
ndf = pd.DataFrame(data=np_array, columns=cols)
# Display dataframe
print(ndf)
```
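When you pass an `ndarray` together with `columns`, the number of column labels has to match the number of array columns. Otherwise, Pandas raises a `ValueError`. The sketch below reuses the same seeded array and deliberately passes a two-label list to trigger the error.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
np_array = 10 * np.random.rand(7, 5)

# Only 2 labels for 5 array columns: Pandas rejects the shape mismatch
shape_error = None
try:
    pd.DataFrame(data=np_array, columns=['col1', 'col2'])
except ValueError as err:
    shape_error = err
    print('ValueError:', err)

# With 5 labels for 5 columns the DataFrame is built as expected
ndf = pd.DataFrame(data=np_array, columns=['col1', 'col2', 'col3', 'col4', 'col5'])
print(ndf.shape)  # (7, 5)
```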

## Creating DataFrame from dict

Next, let’s see how to create a DataFrame using a `dict` of Series.

First, we will create a `dict` of key-value pairs, where the values are Pandas Series. This will be the `data` parameter.

```
# make three Series
s1 = pd.Series([10, 20, 30, 40, 50])
s2 = pd.Series(['a', 'b', 'c', 'd', 'e'])
s3 = pd.Series(['one', 'two', 'three', 'four', 'five'])
# create a dict
data_dict = {'col1': s1, 'col2': s2, 'col3': s3}
```

If we don’t explicitly pass a list of names to the `columns` parameter, the column labels in the constructed DataFrame will be the dict keys, in order.

```
# create dataframe
df = pd.DataFrame(data=data_dict)
# display dataframe
print(df)
```

If we pass a list to the `columns` parameter, then only the dictionary keys which **match the list of column labels** are kept in the DataFrame.

So, first we must create a list of columns which matches some of the dict keys.

`cols = ['col1', 'col2']`

Then, we can create the DataFrame with these columns:

```
# data_dict same as defined earlier
# create a list of column labels
cols = ['col1', 'col2']
# create DataFrame
df = pd.DataFrame(data=data_dict, columns=cols)
# display DataFrame
print(df)
```
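Column selection also works the other way: a label in `columns` that is not a key of the dict still becomes a column, filled with `NaN`. Here is a minimal sketch, with `col9` as a hypothetical label that does not exist in the dict:

```python
import pandas as pd

data_dict = {'col1': pd.Series([10, 20, 30]),
             'col2': pd.Series(['a', 'b', 'c'])}

# 'col9' is not a key of data_dict, so it becomes an all-NaN column;
# 'col2' is dropped because it is not listed in columns
df_sel = pd.DataFrame(data=data_dict, columns=['col1', 'col9'])
print(df_sel)
```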

## DataFrame from multiple lists as rows

Finally, let’s see an example where we use a list of lists to create a DataFrame. This is similar to creating a DataFrame using a 2D NumPy array.

Let us start with the following lists of rows:

```
a1 = ['one', 1, 'up', 'top', 'beauty']
a2 = ['zero', 0, 'down', 'bottom', 'charm']
```

Then, we combine both lists into a single list of lists.

`l = [a1, a2]`

Let’s see the full code:

```
# create multiple lists (one per row)
a1 = ['one', 1, 'up', 'top', 'beauty']
a2 = ['zero', 0, 'down', 'bottom', 'charm']
# combine the data into a single list
l = [a1, a2]
# create a list of column names
col = ['col1', 'col2', 'col3', 'col4', 'col5']
# create the DataFrame
df2 = pd.DataFrame(data=l, columns=col)
# display DataFrame
print(df2)
```
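A closely related option, not covered above, is a list of dicts: each dict is one row, and its keys are the column labels. Here is a minimal sketch with illustrative values. A key missing from one row shows up as `NaN`.

```python
import pandas as pd

# Each dict is one row; its keys are column labels
rows = [
    {'col1': 'one', 'col2': 1, 'col3': 'up'},
    {'col1': 'zero', 'col2': 0},  # no 'col3' key in this row
]
df_rows = pd.DataFrame(rows)
print(df_rows)
```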

Awesome! We saw several different ways to create DataFrames. Depending on how your data is stored initially, or where you are getting it from, some of these methods will be more convenient than others.

# Creating or adding rows

In this section we will learn how to add rows to an existing DataFrame.

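To do this, we will use the `.append()` DataFrame method. This method allows us to add a single row or multiple rows to the DataFrame. (`append()` was deprecated in Pandas 1.4.0 and removed in Pandas 2.0; on newer versions, `pd.concat()` is its replacement.)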

Let us see the syntax:

`DataFrame.append(other, ignore_index=False)`

where,

- `other`: can be a Series or a dictionary. If you want to pass multiple rows, you can also pass a list of Series, a list of dictionaries, or even a DataFrame.
- `ignore_index`: if **True**, it ignores the index of the object passed in `other`, and the result is given a new `0` to `n-1` index instead. If **False** (the default value), it preserves the index from `other`. A dict, or a Series without a `name`, can only be appended with `ignore_index=True`.

There are a few other parameters which give us more options, but we won’t be going over them in this tutorial.

`append()` returns a new DataFrame with the added row(s). It does not modify the original DataFrame.

Note: If the columns of the DataFrame and the `other` object don’t match, the result contains the columns of both, and any missing values are filled with `NaN`. We will avoid doing this for now and assume that the `other` object we pass has exactly the same columns as the DataFrame.

Let us see a few examples.

We will use the last DataFrame that we created in the previous section. Let’s take a look at it:

```
# display DataFrame
print(df2)
```

**Dict as new row**

Let us use a dict with key-value pairs as a new row.

The *keys* denote the column names and *values* denote the values of corresponding columns in the new row.

We will first declare the dictionary and then pass it as `other`.

```
# declare a dict whose keys match the column labels of df2
row = {'col1':'two', 'col2':2, 'col3':'blue', 'col4':'green', 'col5':'red'}
# pass it to the append() method
new_df = df2.append(row, ignore_index=True)
# display the new DataFrame
print(new_df)
```
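On Pandas 2.0 and later, `DataFrame.append()` no longer exists, so the example above raises an `AttributeError`. Here is a minimal sketch of the same step with `pd.concat()`, assuming Pandas 2.x: wrap the dict in a one-row DataFrame and concatenate it. The block rebuilds `df2` from the previous section so it runs on its own.

```python
import pandas as pd

# Rebuild df2 from the previous section
df2 = pd.DataFrame([['one', 1, 'up', 'top', 'beauty'],
                    ['zero', 0, 'down', 'bottom', 'charm']],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])

row = {'col1': 'two', 'col2': 2, 'col3': 'blue', 'col4': 'green', 'col5': 'red'}

# pd.DataFrame([row]) turns the dict into a one-row DataFrame;
# ignore_index=True renumbers the result 0..n-1, like append(..., ignore_index=True)
new_df = pd.concat([df2, pd.DataFrame([row])], ignore_index=True)
print(new_df)
```

As with `append()`, `df2` itself is left unchanged.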

**Series as new row**

Next, we append a Series as a new row, where the index of the Series is the same as the column names of `df2`.

We will first declare the Series with the required index and then pass it to the append method.

```
# create a series with column labels of "df2" as index
row = pd.Series(['three', 3, 'black', 'white', 'grey'],
                index=df2.columns)
# pass it to the append() method
new_df = df2.append(row, ignore_index=True)
# display the new DataFrame
print(new_df)
```
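On Pandas 2.0 and later, a Series row can be added with `pd.concat()` by first turning it into a one-row DataFrame. `row.to_frame().T` does this: its columns are the Series index, which here are the column labels of `df2`. This is a minimal sketch, assuming Pandas 2.x; since the Series mixes strings and numbers, `col2` in the result ends up with `object` dtype.

```python
import pandas as pd

# Rebuild df2 from the previous section
df2 = pd.DataFrame([['one', 1, 'up', 'top', 'beauty'],
                    ['zero', 0, 'down', 'bottom', 'charm']],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])

row = pd.Series(['three', 3, 'black', 'white', 'grey'], index=df2.columns)

# to_frame() gives a one-column DataFrame; .T transposes it into one row
new_df = pd.concat([df2, row.to_frame().T], ignore_index=True)
print(new_df)
```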

## Adding multiple rows

We can also append multiple rows to the DataFrame by passing a list of Series or Python dictionaries.

Let us look at an example using Series. We will pass a list with multiple Series to the append method.

```
# create two series with column labels of "df2" as index
row1 = pd.Series(['four', 4, 'left', 'right', 'center'],
                 index=df2.columns)
row2 = pd.Series(['five', 5, 'Winterfell', 'Eyrie', 'Sunspear'],
                 index=df2.columns)
# pass it to the append() method
new_df = df2.append([row1, row2], ignore_index=True)
# display the new DataFrame
print(new_df)
```

Great! The new DataFrame now has two more rows.
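On Pandas 2.0 and later, the multi-row example can be written with `pd.concat()` instead: collect the Series into a DataFrame first (one row per Series), then concatenate. Here is a minimal sketch, assuming Pandas 2.x and rebuilding `df2` so the block runs on its own:

```python
import pandas as pd

# Rebuild df2 from the previous section
df2 = pd.DataFrame([['one', 1, 'up', 'top', 'beauty'],
                    ['zero', 0, 'down', 'bottom', 'charm']],
                   columns=['col1', 'col2', 'col3', 'col4', 'col5'])

row1 = pd.Series(['four', 4, 'left', 'right', 'center'], index=df2.columns)
row2 = pd.Series(['five', 5, 'Winterfell', 'Eyrie', 'Sunspear'], index=df2.columns)

# A list of Series becomes a DataFrame with one row per Series
extra_rows = pd.DataFrame([row1, row2])
new_df = pd.concat([df2, extra_rows], ignore_index=True)
print(new_df)
```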

# Converting a DataFrame to other formats

Earlier in the tutorial, you have seen how to create DataFrames using NumPy arrays, lists, dictionaries, etc.

Often, it is also useful to convert a DataFrame back to one of these forms. In this section, we will see how to convert DataFrames to NumPy arrays, Python dictionaries, etc.

## Converting to an `ndarray` (NumPy array)

First, let’s see how to convert a DataFrame to a NumPy `ndarray`. There are two ways to do this:

- using the Pandas DataFrame `.values` attribute
- using the Pandas DataFrame `.to_numpy()` method

**`values` attribute**

The `values` attribute extracts all the values from a Pandas DataFrame in the form of a NumPy `ndarray`. Its syntax is as follows:

`DataFrame.values`

This returns a NumPy `ndarray`. The data type (`dtype`) of the `ndarray` will be chosen such that it can preserve and accommodate all the values from the DataFrame.

For example, if the DataFrame contains integers and floats, the `dtype` of the `ndarray` will be `float64`. But if the DataFrame has numeric as well as non-numeric values, then the `dtype` will be `object`.

Let us see an example. We will reuse the `ndf` DataFrame we created earlier in the tutorial.

```
# print DataFrame
print('The DataFrame')
print(ndf)
print("")
# use the values attribute to return an ndarray
print('Using values attribute')
print(ndf.values)
print("")
```
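To see the `dtype` rule for `.values` on its own, here is a sketch with two tiny made-up DataFrames, one purely numeric and one mixing numbers and strings:

```python
import pandas as pd

numeric = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})  # integers and floats
mixed = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})    # numbers and strings

print(numeric.values.dtype)  # float64
print(mixed.values.dtype)    # object
print(numeric.values)        # the integers are stored as floats
```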

**`to_numpy()` method**

We can also use the `to_numpy()` method to convert a DataFrame to a NumPy `ndarray`. The syntax is as follows:

`DataFrame.to_numpy(dtype=None)`

In this case too, the `dtype` of the `ndarray` is chosen such that it can preserve and accommodate all the values from the DataFrame. But with this method, we can also specify the `dtype` explicitly by passing the `dtype` parameter.

Let’s see an example. First, we will use the method without arguments, so the result is the same as before. Then, we will specify `dtype='int'`, which drops the decimal part of each value.

```
# use the to_numpy() method to convert to ndarray
print('Using to_numpy() method')
print(ndf.to_numpy())
print("")
# use to_numpy() method with explicit dtype
print('Using to_numpy() method with dtype="int"')
print(ndf.to_numpy(dtype='int'))
print("")
```

Note: The `to_numpy()` method was added in Pandas version 0.24.0. If you are following this tutorial on your own computer, make sure your Pandas is at least this version before using the method. You can check your Pandas version with `pandas.__version__`.
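Casting to an integer `dtype` does not round. The decimal part is dropped, which is truncation toward zero, as with NumPy's `astype`. A small sketch with made-up values, including negative ones, makes this visible:

```python
import pandas as pd

df = pd.DataFrame({'a': [1.7, 2.2], 'b': [-1.7, 3.9]})

# Default: dtype inferred from the data (float64 here)
print(df.to_numpy())

# Explicit integer dtype: each float is truncated toward zero
as_int = df.to_numpy(dtype='int64')
print(as_int)
```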

## Converting to a `dict` or a `list`

We can also convert a DataFrame to a Python `dict`. We will use the `to_dict()` method to do this. The syntax is as follows:

`DataFrame.to_dict(orient='dict')`

The `orient` parameter can take a number of arguments, but we will concentrate on four of them: `"dict"`, `"list"`, `"series"` and `"records"`.

- For the first three arguments (`"dict"`, `"list"` and `"series"`), the method returns a `dict` of key-value pairs, where the keys are the *column labels* of the DataFrame. The data structure of the values is as specified by the `orient` parameter (`dict`, `list` or `Series`):
  - `"dict"`: this is the default argument. The values of the returned dict are dicts themselves, with the row index as keys and the elements of the column as values.
  - `"list"`: the values of the dict are lists of the corresponding column elements.
  - `"series"`: the values of the dict are Series of column elements, with the row index as the index labels of each Series. The dtype of each Series is inferred from the data.
- For the last argument, `"records"`, the method returns a `list` with one `dict` corresponding to each *row* in the DataFrame.

Let’s convert the DataFrame `new_df` from the “Adding multiple rows” section into these various formats. We’ll use Python’s pretty print library, `pprint`, to print the results in a nicely formatted way so that they are easier to read.

```
import pprint
# print the actual dataframe
print('The dataframe')
print(new_df)
print('')
print('to_dict() with orient="dict"')
pprint.pprint(new_df.to_dict(orient='dict'))
print('')
print('to_dict() with orient="list"')
pprint.pprint(new_df.to_dict(orient='list'))
print('')
print('to_dict() with orient="series"')
pprint.pprint(new_df.to_dict(orient='series'))
print('')
print('to_dict() with orient="records"')
pprint.pprint(new_df.to_dict(orient='records'))
print('')
```
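The output above is long, so here is a smaller sketch with a made-up two-row DataFrame where the structures are easy to compare. It also shows two related conversions: `to_numpy().tolist()`, which gives a plain list of row lists, and passing the `"records"` output back to `pd.DataFrame()` to rebuild the table.

```python
import pandas as pd

small = pd.DataFrame({'name': ['tom', 'ann'], 'age': [32, 28]})

as_dict = small.to_dict(orient='dict')        # {column: {row index: value}}
as_list = small.to_dict(orient='list')        # {column: [values]}
as_records = small.to_dict(orient='records')  # [{column: value}, ...] one per row
as_rows = small.to_numpy().tolist()           # [[row values], ...]

print(as_dict)
print(as_list)
print(as_records)
print(as_rows)

# A list of records can be turned back into an equivalent DataFrame
rebuilt = pd.DataFrame(as_records)
print(rebuilt.equals(small))
```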

# Summary

- **Series** is a 1-dimensional data structure supported by Pandas.
- Series objects have only one set of labels.
- Each column of a Pandas DataFrame is a Series.
- We can **create DataFrames** from ndarrays, dicts, etc. and also **convert a DataFrame** back to these formats.
- We can **add rows** to a DataFrame using `append()` (or `pd.concat()` in Pandas 2.0 and later).

# Reference

Creating a Series:

```
pandas.Series(data=None, index=None)
# data is usually 1D numpy array, list or dict
```

Creating a DataFrame:

```
pandas.DataFrame(data=None, index=None, columns=None)
# data is usually a 2D array, or
# a dict where each key-value pairs represent columns.
# the values can be 1D arrays, Series, lists or dicts
```

Append rows:

```
DataFrame.append(other, ignore_index=False)
# other can be 1D array, Series, list, dict, DataFrame
# or a list of 1D arrays, Series, lists, dicts, DataFrames
# note: deprecated in Pandas 1.4.0 and removed in 2.0; use pd.concat() instead
```

Converting DataFrame to other formats:

```
# to ndarray
DataFrame.values
DataFrame.to_numpy(dtype=None)
# to Python dict with key-value pairs representing columns
DataFrame.to_dict(orient='dict') # orient can also be "list" or "series"
# to Python list with each element representing rows
DataFrame.to_dict(orient='records')
```