
How to Fill NaN Values in Pandas

By: Adam Richardson

Using .fillna() method to fill NaN values

One of the most direct ways to fill NaN (Not a Number) values in a Pandas DataFrame is the .fillna() method. It accepts a single scalar, or a dict, Series, or DataFrame that maps each column to its own fill value. It does not accept a function; to fill with a computed value such as a column mean, compute the value first and pass the result to .fillna().

The basic syntax for using the .fillna() method in recent pandas versions is roughly as follows (older versions also accept method and downcast, both deprecated since pandas 2.1):

DataFrame.fillna(value=None, *, axis=None, inplace=False, limit=None)

Below are some examples of using the .fillna() method to fill in missing data with different parameters:

Example 1: Fill NaN values with a single value

import pandas as pd

data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, None, None, 34],
        'Height': [174.5, 180.3, 167.8, None, 162.1],
        'Score': [80, 92, 88, 75, None]}

df = pd.DataFrame(data)

df.fillna(0, inplace=True)

In the above example, all NaN values in the DataFrame are filled with the value 0.
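A single fill value is not always appropriate for every column. The sketch below passes a dict to .fillna() so that each column gets its own replacement; the small DataFrame and the chosen fill values (0 and -1) are illustrative assumptions, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Peter', 'Amy'],
    'Age': [25, None, 31],
    'Score': [80, 92, None],
})

# Map each column to its own fill value; columns not listed are left untouched.
fill_values = {'Age': 0, 'Score': -1}

# Without inplace=True, fillna returns a new DataFrame and leaves df unchanged.
filled = df.fillna(value=fill_values)

print(filled)
```

Returning a new DataFrame instead of using inplace=True keeps the original data available for comparison.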

Example 2: Fill NaN values with Forward Fill method

import pandas as pd

data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, None, 28, 34],
        'Height': [174.5, 180.3, 167.8, None, 162.1],
        'Score': [80, 92, 88, None, None]}

df = pd.DataFrame(data)

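# fillna(method='ffill') is deprecated since pandas 2.1; ffill() is the direct equivalent
df.ffill(inplace=True)

In this example, forward fill is used to fill in the missing data. It propagates the last valid observation in each column forward into the NaN values that follow it. A NaN at the very start of a column has no earlier value to copy and stays NaN.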
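To make the direction of propagation concrete, here is a minimal sketch on a small Series (the values are made up for illustration). It compares forward fill, forward fill capped with limit, and backward fill.

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# Forward fill: each NaN takes the last valid value before it.
forward = s.ffill()                 # 1.0, 1.0, 1.0, 4.0, 4.0

# limit=1 fills at most one consecutive NaN per gap.
forward_limited = s.ffill(limit=1)  # 1.0, 1.0, NaN, 4.0, 4.0

# Backward fill: each NaN takes the next valid value after it.
# The trailing NaN has nothing after it, so it stays NaN.
backward = s.bfill()                # 1.0, 4.0, 4.0, 4.0, NaN

print(pd.DataFrame({'ffill': forward,
                    'ffill_limit_1': forward_limited,
                    'bfill': backward}))
```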

Example 3: Fill NaN values with Custom Function

import pandas as pd
import numpy as np

data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, None, None, 34],
        'Height': [174.5, 180.3, 167.8, None, 162.1],
        'Score': [80, 92, 88, 75, None]}

df = pd.DataFrame(data)

def fill_avg(column):
    # Series.mean() skips NaN by default, so the mean uses only the valid values
    return column.fillna(column.mean())

df['Age'] = fill_avg(df['Age'])
df['Height'] = fill_avg(df['Height'])
df['Score'] = fill_avg(df['Score'])

In the above example, a custom function fill_avg() is created to fill in the missing data with the mean of the column. The function is then applied to the relevant columns.
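The same result can be reached without a helper function, because .fillna() accepts a Series whose labels match the column names. The following is a sketch under that assumption; the data is invented so the means are easy to verify by hand.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Peter', 'Amy', 'David'],
    'Age': [20.0, 40.0, None, 30.0],
    'Score': [80.0, None, 90.0, 70.0],
})

# Mean of every numeric column, ignoring NaN; the 'Name' column is skipped.
column_means = df.mean(numeric_only=True)

# Each NaN is filled with the mean of its own column.
filled = df.fillna(column_means)

print(filled)
```

Here the missing Age becomes 30.0 (the mean of 20, 40, and 30) and the missing Score becomes 80.0 (the mean of 80, 90, and 70).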

Using .replace() method for filling NaN values

Another way to fill NaN values in a Pandas DataFrame is the .replace() method. It is particularly useful when missing data is encoded as a specific value (NaN itself, or a placeholder such as -999) that needs to be swapped for another value.

The basic syntax for using the .replace() method in recent pandas versions is roughly as follows (older versions also accept limit and method, both deprecated since pandas 2.1):

DataFrame.replace(to_replace=None, value=<no_default>, *, inplace=False, regex=False)

Below are some examples of using the .replace() method to fill in missing data with different parameters:

Example 1: Replace NaN values with a single value

import pandas as pd
import numpy as np

data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, 34],
        'Height': [174.5, 180.3, 167.8, 155.2, 162.1],
        'Score': [80, None, 88, 75, None]}

df = pd.DataFrame(data)

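# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
df.replace(np.nan, -1, inplace=True)

In the above example, every NaN in the DataFrame (one in Age and two in Score) is replaced with the value -1. The None entries in Score are stored as NaN, so they are replaced as well.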

Example 2: Replace NaN values across several columns with a single value

import pandas as pd
import numpy as np

data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, np.nan],
        'Height': [174.5, np.nan, 167.8, 155.2, 162.1],
        'Score': [80, np.nan, 88, 75, None]}

df = pd.DataFrame(data)

df.replace(np.nan, -1, inplace=True)

In this example, NaN values spread across the Age, Height, and Score columns are all replaced with the value -1 in a single call.
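.replace() is also handy in the opposite direction: when a dataset marks missing entries with a placeholder number, the placeholder can be converted to NaN first and then filled. The sketch below assumes a hypothetical sentinel of -999 and fills with the column median; both choices are illustrative.

```python
import pandas as pd
import numpy as np

# Hypothetical data where -999 marks a missing measurement.
df = pd.DataFrame({
    'Age': [25.0, -999.0, 31.0],
    'Score': [80.0, 92.0, -999.0],
})

# Step 1: turn the sentinel into real NaN so pandas treats it as missing.
cleaned = df.replace(-999, np.nan)

# Step 2: fill each NaN with the median of its own column.
filled = cleaned.fillna(cleaned.median())

print(filled)
```

Converting the sentinel first matters because statistics such as the median would otherwise include -999 as if it were a real observation.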

Example 3: Replace NaN values with a calculated value

import pandas as pd
import numpy as np

data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, 34],
        'Height': [174.5, 180.3, 167.8, 155.2, 162.1],
        'Score': [80, None, 88, 75, None]}

df = pd.DataFrame(data)

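# Apply replace to the Age column only; calling it on the whole DataFrame
# would also fill the NaN values in Score with the mean age.
df['Age'] = df['Age'].replace(np.nan, df['Age'].mean())

In this example, the NaN value in the Age column is replaced with the mean of that column (34.0). The NaN values in Score are left untouched because .replace() is applied to the Age column only.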

Using .interpolate() method for NaN values

Another powerful way to fill missing data in a Pandas DataFrame is the .interpolate() method. It is particularly useful for ordered numeric data such as time series, where a missing value can be estimated from the valid values on either side of it.

The basic syntax for using the .interpolate() method in recent pandas versions is roughly as follows (older versions also accept downcast, deprecated since pandas 2.1):

DataFrame.interpolate(method='linear', *, axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, **kwargs)

Below are some examples of using the .interpolate() method to fill in missing data with different parameters:

Example 1: Interpolating missing values using Linear Method

import pandas as pd
import numpy as np

data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, 34],
        'Height': [174.5, 180.3, 167.8, 155.2, 162.1],
        'Score': [80, 85, np.nan, 75, 82]}

df = pd.DataFrame(data)

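# Interpolate the numeric columns only; 'Name' holds strings and cannot be interpolated
numeric_cols = ['Age', 'Height', 'Score']
df[numeric_cols] = df[numeric_cols].interpolate(method='linear')

In the above example, the linear method fills each missing value by drawing a straight line between its neighbours, treating the rows as equally spaced. The missing Age becomes 38.5 (midway between 47 and 30) and the missing Score becomes 80.0 (midway between 85 and 75).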
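When a gap spans more than one row, linear interpolation spaces the filled values evenly between the two valid endpoints. A minimal sketch on an invented Series:

```python
import pandas as pd
import numpy as np

s = pd.Series([10.0, np.nan, np.nan, 40.0])

# Two missing values between 10 and 40 are filled in equal steps of 10.
filled = s.interpolate(method='linear')

print(filled.tolist())  # [10.0, 20.0, 30.0, 40.0]
```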

Example 2: Interpolating missing values using Time Method

import pandas as pd
import numpy as np

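data = {'Value': [1, 2, np.nan, 4, 5]}

# method='time' requires a DatetimeIndex, so the dates go in the index rather than in a column
df = pd.DataFrame(data, index=pd.date_range('20210101', periods=5))

df.interpolate(method='time', inplace=True)

In this example, the time method fills the missing value in proportion to the time elapsed between the index entries on either side of it. Because the dates here are evenly spaced (one per day), the result matches linear interpolation and the missing value becomes 3.0.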
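The difference between the linear and time methods only shows up when the timestamps are unevenly spaced. The sketch below uses three invented dates with a one-day gap followed by a three-day gap.

```python
import pandas as pd
import numpy as np

# Irregular timestamps: 1 day, then 3 days.
idx = pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-05'])
s = pd.Series([0.0, np.nan, 4.0], index=idx)

# 'linear' ignores the index and treats rows as equally spaced: midpoint is 2.0.
by_position = s.interpolate(method='linear')

# 'time' weights by elapsed time: 1 day out of 4 gives 1.0.
by_time = s.interpolate(method='time')

print(by_position.iloc[1], by_time.iloc[1])
```

For data sampled at irregular intervals, the time method is usually the more faithful choice, since it accounts for how far apart the observations really are.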

Example 3: Setting Limit for Interpolation

import pandas as pd
import numpy as np

data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, 34],
        'Height': [174.5, 180.3, 167.8, 155.2, 162.1],
        'Score': [np.nan, 85, 88, 75, np.nan]}

df = pd.DataFrame(data)

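# Interpolate the numeric columns only; 'Name' holds strings and cannot be interpolated
numeric_cols = ['Age', 'Height', 'Score']
df[numeric_cols] = df[numeric_cols].interpolate(method='linear', limit=1)

In this example, limit=1 caps the number of consecutive NaN values filled in each gap at one. With the default forward direction, the missing Age becomes 38.5 and the trailing NaN in Score is filled with the last valid value (75), while the leading NaN in Score stays NaN because there is no earlier value to interpolate from.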
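The effect of limit is easier to see on a gap longer than one value. The following sketch, on an invented Series with three consecutive NaN values, also shows how limit_direction changes which side of the gap gets filled.

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Default direction is forward: only the first NaN after a valid value is filled.
forward_one = s.interpolate(method='linear', limit=1)
# 1.0, 2.0, NaN, NaN, 5.0

# limit_direction='both' fills one NaN from each end of the gap.
both_one = s.interpolate(method='linear', limit=1, limit_direction='both')
# 1.0, 2.0, NaN, 4.0, 5.0

print(pd.DataFrame({'forward': forward_one, 'both': both_one}))
```

The filled values still lie on the straight line between 1 and 5; limit only controls how many of them are kept.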

Summary

In this article, we explored different methods to fill in the missing data in a Pandas DataFrame. We discussed how to use the .fillna() method to fill NaN values with a single value, with forward fill, and with a value computed by a custom function. We also looked at how to use the .replace() method to replace NaN values with a specific value or a calculated value. Finally, we examined the .interpolate() method, which estimates missing values from their neighbours. The right choice depends on the nature of the data and the goals of your analysis: a constant fill suits categorical or count-like data, forward fill suits values that persist until they change, and interpolation suits ordered numeric measurements.
