Hierarchical Indexing with Pandas

A guide on how to use hierarchical indexing (MultiIndex) with Pandas.

Tirendaz AI
Level Up Coding


Hierarchical indexing allows us to use multiple index levels on a single axis. It is also known as multi-level indexing, and Pandas implements it with the MultiIndex object. In this post, I’ll show how to use hierarchical indexing. In short, I’ll cover the following topics:

  • What is MultiIndex?
  • What is the unstack?
  • What is Hierarchical Indexing?
  • Selecting in Hierarchical Indexing
  • What is Swaplevel?
  • Sorting in Hierarchical Indexing
  • Summary Statistics in Hierarchical Indexing
  • Hierarchical Indexing in The Data Frame

Let’s get started.

First, I’m going to import NumPy and Pandas.

Let’s generate random data from the normal distribution.
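Here is a minimal sketch of this step. The seed, the labels a, b, c and the inner labels 1 to 3 are my assumptions for illustration, so the numbers will not match the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

# Eight draws from the standard normal distribution with a two-level index:
# the outer level holds letters, the inner level holds numbers.
data = pd.Series(
    rng.standard_normal(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2]],
)
print(data)
```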

What is MultiIndex?

MultiIndex allows you to label each row or column with more than one key, so a single axis can carry several index levels. To understand MultiIndex, let’s see the index of the data.
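A quick sketch of inspecting the index, assuming a two-level Series like the one described above (the labels are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.Series(
    rng.standard_normal(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2]],
)

# The index is a MultiIndex made of (outer, inner) tuples.
print(data.index)

# Each level stores its unique labels.
print(data.index.levels)
```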

MultiIndex is the Pandas index type that stores several levels of labels on one axis, for both Series and DataFrames. Our dataset has two levels. You can obtain subsets of the data using the outer labels. For example, let’s take a look at the values with index a.
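As a sketch, selecting by an outer label could look like this (same illustrative Series as before):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.Series(
    rng.standard_normal(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2]],
)

# Selecting an outer label drops that level and leaves the inner labels.
subset_a = data.loc["a"]
print(subset_a)
```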

You can slice the data.
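For instance, a label slice on the outer level might look like the sketch below. Label slices include both endpoints, and they need the index to be sorted, which holds for this illustrative Series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.Series(
    rng.standard_normal(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2]],
)

# Everything from outer label "a" through "b", endpoints included.
sliced = data.loc["a":"b"]
print(sliced)
```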

You can select more than one outer label at once. For example, let’s take a look at the values with indexes a and c.
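A sketch of picking several outer labels with a list (illustrative data again):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.Series(
    rng.standard_normal(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2]],
)

# A list of outer labels keeps both index levels in the result.
picked = data.loc[["a", "c"]]
print(picked)
```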

You can also select values by the inner index. Let’s take a look at the values whose inner index is 1.
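One way to do this, sketched on the same illustrative Series, is to pass a full slice for the outer level and a label for the inner level:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.Series(
    rng.standard_normal(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2]],
)

# All outer labels, inner label 1: the result is indexed by the outer level.
inner_first = data.loc[:, 1]
print(inner_first)
```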

What is the unstack?

The stack method moves column labels into the innermost index level, and the unstack method moves an index level (the innermost one by default) into the columns. With the unstack method, you can see the two-level data as a table.
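A sketch of unstack on the illustrative Series. Since there is no value for (c, 3) in my example data, that cell comes out as NaN.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.Series(
    rng.standard_normal(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2]],
)

# The inner level (1, 2, 3) becomes the columns; the outer level stays as rows.
table = data.unstack()
print(table)
```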

To restore the dataset, you can use the stack method.
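The round trip could be sketched as follows. I add dropna() as a precaution: depending on the Pandas version, stack may or may not keep the NaN cell that unstack introduced for the missing (c, 3) entry in my example data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.Series(
    rng.standard_normal(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2]],
)

table = data.unstack()

# stack moves the columns back into the inner index level;
# dropna removes the filler row for the missing (c, 3) entry, if present.
restored = table.stack().dropna()
print(restored)
```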

What is Hierarchical Indexing?

Hierarchical indexing organizes a dataset into nested groups: each outer label contains its own set of inner labels. Data frames can have hierarchical indexes too. To show this, let me create a dataset.
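Here is a sketch of such a dataset. The values and the labels (a, b, num, comp, math, stat, geo) are assumptions on my part, chosen to fit the level names used later in the post.

```python
import numpy as np
import pandas as pd

# Two-level row index and two-level column index.
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["num", "num", "comp"], ["math", "stat", "geo"]],
)
print(df)
```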

Notice that in this dataset, both the rows and the columns have hierarchical indexes. You can also name the levels. Let’s show this.
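Naming the levels might look like this. The names class, exam and field come from the post; lesson is a name I assume for the inner column level.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["num", "num", "comp"], ["math", "stat", "geo"]],
)

# Assign one name per level, outer level first.
df.index.names = ["class", "exam"]
df.columns.names = ["field", "lesson"]
print(df)
```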

Selecting in Hierarchical Indexing

You can select subgroups of data by their outer label. For example, let’s select the group labelled num.
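In my illustrative dataset, num is an outer column label, so the selection could be sketched as:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["num", "num", "comp"], ["math", "stat", "geo"]],
)
df.index.names = ["class", "exam"]
df.columns.names = ["field", "lesson"]

# Selecting an outer column label returns the columns nested under it.
num = df["num"]
print(num)
```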

What is Swaplevel?

Sometimes, you may want to swap two index levels. You can use the swaplevel method for this. The swaplevel method takes two levels and returns a new object, leaving the original unchanged. For example, let’s swap the class and exam indexes in the dataset.
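A sketch of the swap on my illustrative dataset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["num", "num", "comp"], ["math", "stat", "geo"]],
)
df.index.names = ["class", "exam"]
df.columns.names = ["field", "lesson"]

# exam becomes the outer row level and class the inner one; the data is untouched.
swapped = df.swaplevel("class", "exam")
print(swapped)
```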

Sorting in Hierarchical Indexing

To sort the indexes by level, you can use the sort_index method. For example, let’s sort the dataset by level 1.
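Sketched on the same illustrative dataset, where level 1 of the row index is exam:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["num", "num", "comp"], ["math", "stat", "geo"]],
)
df.index.names = ["class", "exam"]
df.columns.names = ["field", "lesson"]

# Sort rows by the exam level first; ties are ordered by the remaining level.
sorted_df = df.sort_index(level=1)
print(sorted_df)
```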

Summary Statistics in Hierarchical Indexing

Summary statistics on a Series or DataFrame can be computed per index level. If your data has more than one level, you can group by a level and aggregate within each group. For example, let’s see the sum values according to the exam level in the dataset.
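Older Pandas versions accepted a level argument directly in sum, but that was removed in Pandas 2.0, so this sketch uses groupby on the level instead. The data is my illustrative dataset, not the original notebook's.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["num", "num", "comp"], ["math", "stat", "geo"]],
)
df.index.names = ["class", "exam"]
df.columns.names = ["field", "lesson"]

# Add up the rows that share the same exam label.
by_exam = df.groupby(level="exam").sum()
print(by_exam)
```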

Let’s see the total values according to the field level.
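In my illustrative dataset, field is a column level. Grouping along the columns with axis=1 is deprecated in recent Pandas, so this sketch transposes, groups, and transposes back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["num", "num", "comp"], ["math", "stat", "geo"]],
)
df.index.names = ["class", "exam"]
df.columns.names = ["field", "lesson"]

# Transpose so field becomes a row level, sum within each field, then flip back.
by_field = df.T.groupby(level="field").sum().T
print(by_field)
```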

Hierarchical Indexing in The Data Frame

You can move the DataFrame’s columns to the row index. To show this, let’s create a dataset.
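A sketch of such a dataset. The post only tells us that it has columns a and b, so the name frame, the columns x and y, and all of the values are assumptions for illustration.

```python
import pandas as pd

# x and y hold values; a and b are the columns that will later become the row index.
frame = pd.DataFrame({
    "x": range(7),
    "y": range(7, 0, -1),
    "a": ["one", "one", "one", "two", "two", "two", "two"],
    "b": [0, 1, 2, 0, 1, 2, 3],
})
print(frame)
```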

Let’s transform columns a and b of this dataset into a row index.
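With the illustrative frame from before (the name frame and the columns x and y are my assumptions), this step could be sketched as:

```python
import pandas as pd

frame = pd.DataFrame({
    "x": range(7),
    "y": range(7, 0, -1),
    "a": ["one", "one", "one", "two", "two", "two", "two"],
    "b": [0, 1, 2, 0, 1, 2, 3],
})

# Columns a and b become a two-level row index and leave the column set.
data2 = frame.set_index(["a", "b"])
print(data2)
```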

By default, the set_index method removes the columns that it moves into the row index. You can pass drop=False to keep those columns in the DataFrame as well.
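A sketch of the drop=False variant on the same illustrative frame (frame, x and y are assumed names):

```python
import pandas as pd

frame = pd.DataFrame({
    "x": range(7),
    "y": range(7, 0, -1),
    "a": ["one", "one", "one", "two", "two", "two", "two"],
    "b": [0, 1, 2, 0, 1, 2, 3],
})

# a and b are used as the row index and also stay as regular columns.
kept = frame.set_index(["a", "b"], drop=False)
print(kept)
```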

Let’s first take a look at data2 to demonstrate the reset_index method.

You can use the reset_index method to restore the dataset.
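Here I assume data2 is the result of the set_index step on my illustrative frame. Note that reset_index puts the former index levels in front of the other columns.

```python
import pandas as pd

frame = pd.DataFrame({
    "x": range(7),
    "y": range(7, 0, -1),
    "a": ["one", "one", "one", "two", "two", "two", "two"],
    "b": [0, 1, 2, 0, 1, 2, 3],
})
data2 = frame.set_index(["a", "b"])

# The index levels a and b move back into ordinary columns.
restored = data2.reset_index()
print(restored)
```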

That’s it. In this post, I explained hierarchical indexing with Pandas. I hope you enjoyed this post. Thanks for reading. You can find the notebook here.
