How to Fix Missing Data in Pandas

Tirendaz AI
Artificial Intelligence in Plain English
5 min read · Feb 18, 2021


A guide on how to handle missing data with a toy dataset.

Photo by Alejandro Escamilla on Unsplash

Real-world data is dirty, so it is important to preprocess it before analysis. In this post, I’ll talk about missing data. Let’s get started.

Loading Data

First of all, let’s import the libraries.
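
A minimal setup, assuming pandas and NumPy are the only libraries the examples in this post need:

```python
import numpy as np
import pandas as pd
```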

In pandas, missing data is represented by the value NaN (Not a Number).
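
As a quick illustration, here is a toy Series named data with one missing entry. The values are made up for this sketch.

```python
import numpy as np
import pandas as pd

# A toy Series; the values are illustrative, not from a real dataset.
data = pd.Series([1, np.nan, 3, 4])
print(data)  # the second entry is shown as NaN
```

Because NaN is a floating-point value, the whole Series is stored as float64.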

You can use the isnull method to see which values in data are missing.
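
A minimal sketch, assuming the same kind of toy Series named data:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 3, 4])  # toy values, assumed for illustration

mask = data.isnull()     # True where a value is missing
n_missing = mask.sum()   # True counts as 1, so this is the number of NaNs
print(mask)
print(n_missing)
```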

The notnull method does the opposite of the isnull method. Let me show that.
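
Here is a small sketch with an assumed toy Series. The mask returned by notnull can also be used to keep only the values that are present.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 3, 4])  # toy values, assumed for illustration

present = data.notnull()   # True where a value is NOT missing
print(present)
print(data[present])       # boolean indexing keeps only the non-missing values
```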

Let’s assign None to a value in data. Pandas treats None as missing data too.
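
A minimal sketch, assuming a float Series named data with toy values. Which position gets overwritten is an arbitrary choice made for this example.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 3, 4])  # toy values, assumed for illustration

data.iloc[0] = None    # in a float Series, None is stored as NaN
print(data.isnull())   # the first two entries are now reported as missing
```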

If you want to remove the missing data, you can use the dropna method.
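
A short sketch on an assumed toy Series. Note that dropna returns a new Series and leaves data itself unchanged.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 3, 4])  # toy values, assumed for illustration

cleaned = data.dropna()   # new Series without the missing entry
print(cleaned)            # index labels 0, 2 and 3 remain
```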

Handling Missing Data in a DataFrame

Now, let’s show how to remove missing data from a DataFrame. First, let’s import nan from NumPy.

Let me create a data frame named df.
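
A minimal sketch of such a data frame. The values are assumed for illustration: one complete row, one partially missing row, and one row that is entirely missing.

```python
import pandas as pd
from numpy import nan

# Toy data: row 0 is complete, row 1 has one gap, row 2 is all missing.
df = pd.DataFrame([[1, 2, 3],
                   [4, nan, 5],
                   [nan, nan, nan]])
print(df)
```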

By default, the dropna method removes rows with missing data.
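
A short sketch on an assumed toy df: with the default settings, a single missing value is enough to drop a row.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [4, nan, 5],
                   [nan, nan, nan]])  # toy values, assumed for illustration

complete_rows = df.dropna()   # default: drop every row that has any NaN
print(complete_rows)          # only row 0 remains
```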

The how = "all" argument removes only the rows in which every value is missing.
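
A small sketch with an assumed toy df: with how="all", a row is dropped only when every value in it is missing, so a partially filled row survives.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [4, nan, 5],
                   [nan, nan, nan]])  # toy values, assumed for illustration

kept = df.dropna(how="all")   # drops only the all-NaN row
print(kept)                   # rows 0 and 1 remain
```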

Let’s take a look at df.

The axis = 1 argument makes dropna work on columns instead of rows. Combined with how = "all", it removes the columns in which every value is missing. First, let’s set every value in the second column of df to missing.

Let’s remove the columns in which all values are missing.
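
Both steps can be sketched as follows, assuming a toy df with the default integer column labels, so that label 1 is the second column:

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [4, nan, 5],
                   [nan, nan, nan]])  # toy values, assumed for illustration

df[1] = nan                                    # make the second column all missing
no_empty_cols = df.dropna(axis=1, how="all")   # drop columns that are entirely NaN
print(no_empty_cols)                           # columns 0 and 2 remain
```

Without how="all", axis=1 would drop every column that contains at least one missing value.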

Let’s take a look at the df dataset again.

If you want to keep only the rows that have at least a certain number of non-missing values, you can use the thresh argument.
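
A minimal sketch, assuming a toy df whose rows contain 3, 2, 1 and 0 non-missing values:

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [4, nan, 5],
                   [nan, nan, 6],
                   [nan, nan, nan]])  # toy values, assumed for illustration

kept = df.dropna(thresh=2)   # keep rows with at least 2 non-missing values
print(kept)                  # rows 0 and 1 remain
```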

If you want to replace missing data with another value, you can use the fillna method.

By passing a dictionary to the fillna method, you can fill the missing data in each column with a different value. Let me show that.
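
A short sketch on an assumed toy df. The fill value 0 is an arbitrary choice for illustration.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [4, nan, 5],
                   [nan, nan, nan]])  # toy values, assumed for illustration

filled = df.fillna(0)   # every NaN becomes 0; df itself is unchanged
print(filled)
```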
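
A minimal sketch, assuming a toy df with the default column labels 0, 1 and 2. The dictionary keys are column labels, and the fill values 10, 20 and 30 are made up for this example.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [4, nan, 5],
                   [nan, nan, nan]])  # toy values, assumed for illustration

# Keys are column labels, values are the per-column fill values.
filled = df.fillna({0: 10, 1: 20, 2: 30})
print(filled)
```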

Let’s take a look at the df dataset.

By default, fillna returns a new object. If you want to modify the existing object instead, you can use the inplace argument.
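
A short sketch on an assumed toy df. With inplace=True, the change is applied to df itself, so there is no need to assign the result to a new name.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [4, nan, 5],
                   [nan, nan, nan]])  # toy values, assumed for illustration

df.fillna(0, inplace=True)   # modifies df directly
print(df)
```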

Let’s create a new df dataset.

If you want to fill missing data with the value from the row above, you can use method = “ffill” (forward fill). In recent pandas versions, the ffill method does the same thing and is preferred.

If you want to limit how many consecutive missing values are filled, you can use the limit option together with method = “ffill”. For example, let’s fill only one missing value after each valid value.
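
A minimal sketch of forward filling on an assumed toy df. It calls df.ffill() rather than fillna(method="ffill"), because the method argument of fillna is deprecated in recent pandas releases.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [nan, nan, nan],
                   [nan, nan, nan],
                   [4, 5, 6]])  # toy values, assumed for illustration

filled = df.ffill()   # each NaN takes the last valid value above it
print(filled)         # rows 1 and 2 become copies of row 0
```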
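
Here is a sketch of the limit option on an assumed toy df, again using df.ffill(). With limit=1, only the first missing row after a valid row is filled.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame([[1, 2, 3],
                   [nan, nan, nan],
                   [nan, nan, nan],
                   [4, 5, 6]])  # toy values, assumed for illustration

limited = df.ffill(limit=1)   # fill at most one consecutive NaN per gap
print(limited)                # row 1 is filled, row 2 stays missing
```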

Now, let’s replace missing data with the mean value. To show this, let’s first create a Series.

Let’s fill the missing data with the mean value.
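
A minimal sketch, assuming a toy Series named data. The mean method skips missing values by default, so the mean here is computed from 1, 3 and 8.

```python
import pandas as pd
from numpy import nan

data = pd.Series([1, nan, 3, nan, 8])  # toy values, assumed for illustration

mean_value = data.mean()            # NaNs are ignored: (1 + 3 + 8) / 3 = 4.0
filled = data.fillna(mean_value)    # each NaN is replaced by 4.0
print(filled)
```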

Let’s take a look at the df dataset.

Let’s fill the missing data in each column with that column’s mean.
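
A short sketch on an assumed toy df. The column names a and b are hypothetical, chosen only for this example. df.mean() returns one mean per column, and fillna matches them to the columns by label.

```python
import pandas as pd
from numpy import nan

# Hypothetical columns "a" and "b" with toy values.
df = pd.DataFrame({"a": [1, nan, 3],
                   "b": [10, 20, nan]})

filled = df.fillna(df.mean())   # column a uses 2.0, column b uses 15.0
print(filled)
```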

That’s it. I hope you enjoy this post. You can access the notebook here.
