Using astype() Function for Data Type Conversion in Pandas

By: Adam Richardson

Overview of astype() function

The Pandas library in Python provides numerous data manipulation functions to help users manage data efficiently. One such function is astype(), which changes the data type of a Pandas Series or DataFrame.

In some cases, data type conversion is necessary for proper data analysis. For example, when your data is imported as string (object) datatype and you need to perform mathematical or statistical operations on it, you’ll need to convert it into a numeric data type.

Here’s the general syntax for the astype(new_type) function:

import pandas as pd
df = pd.DataFrame({'col1': ['1', '2', '3', '4'], 'col2': ['4.1', '5.2', '6.3', '7.4']})
print(df.dtypes)  # Output: col1    object, col2    object, dtype: object
df["col1"] = df["col1"].astype(int)
df["col2"] = df["col2"].astype(float)
print(df.dtypes)  # Output: col1 int64 (int32 on some Windows setups), col2 float64, dtype: object

In this example, we created a DataFrame with two columns, ‘col1’ and ‘col2’, both holding numbers stored as strings. We changed ‘col1’ from ‘object’ to an integer type and ‘col2’ from ‘object’ to ‘float64’ using astype(). The exact integer type depends on the platform: int64 on most systems, int32 on some Windows setups. Printing df.dtypes before and after the conversion shows the change.

astype() accepts either a single data type or a dictionary. The data type can be given as a Python type such as int, a NumPy dtype such as np.float32, or a string such as 'float64'. If a single data type is passed, astype() tries to convert every column of the DataFrame to it. If a dictionary is passed, you can set the target type for individual columns, and any column not listed is left unchanged.
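To make the difference between the two argument forms concrete, here is a minimal sketch. The DataFrame and the variable names all_float and only_col1 are made up for illustration.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings.
df = pd.DataFrame({"col1": ["1", "2"], "col2": ["3", "4"]})

# A single dtype is applied to every column.
all_float = df.astype("float64")

# A dictionary converts only the listed columns; col2 keeps its original dtype.
only_col1 = df.astype({"col1": "int64"})

print(all_float.dtypes)
print(only_col1.dtypes)
```

Note that astype() returns a new object, so the original df is unchanged unless you assign the result back.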

Overall, astype() is an important tool in the Pandas library for keeping data types consistent and preparing data for analysis.

Using astype() on a single column

A Pandas Series is a one-dimensional labeled array that can hold any data type, such as integer, float, or string. You can use astype() to convert a Series to another data type. As an example, let’s take a column of numeric strings, convert it to float, and then convert the float values to integers.

import pandas as pd
df = pd.DataFrame({'col1': ['1', '2', '3', '4'], 'col2': ['4.1', '5.2', '6.3', '7.4']})
df['col2'] = df['col2'].astype(float)
df['col2'] = df['col2'].astype(int)
print(df['col2'])

In this example, we first created a DataFrame with two columns, ‘col1’ and ‘col2’. Then we changed ‘col2’ from ‘object’ to ‘float64’ using astype(). In the next line, we converted ‘col2’ from ‘float64’ to an integer type (int64 on most systems, int32 on some Windows setups). The float-to-integer step drops the fractional part, so 4.1 becomes 4.
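Converting floats with astype(int) truncates toward zero instead of rounding. If rounding is what you want, one option is to call round() first. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical values chosen to show the difference.
s = pd.Series([4.1, 5.9, -6.7])

truncated = s.astype(int)        # fractional part is dropped
rounded = s.round().astype(int)  # round to the nearest integer first

print(truncated.tolist())
print(rounded.tolist())
```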

If you want to convert several columns of a DataFrame, call astype() on the DataFrame itself and pass a dictionary that maps each column name to its target data type.

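import pandas as pd
df = pd.DataFrame({'col1': ['1', '2', '3', '4'], 'col2': [4.1, 5.2, 6.3, 7.4], 'col3': ['True', 'False', 'True', 'False']})
# astype(bool) treats every non-empty string (including 'False') as True, so map the text to real booleans first
df['col3'] = df['col3'].map({'True': True, 'False': False})
df = df.astype({'col1': int, 'col2': int, 'col3': bool})
print(df.dtypes)

In this example, we created a DataFrame with three columns, ‘col1’, ‘col2’, and ‘col3’, and used astype() to set the data type of each. ‘col1’ and ‘col2’ become an integer type (int64 on most systems), with the float values in ‘col2’ truncated, and ‘col3’ becomes ‘bool’. The strings in ‘col3’ are mapped to real booleans before the conversion because astype(bool) does not parse text: it would turn the string 'False' into True.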
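One caveat with string columns that hold 'True' and 'False': astype(bool) tests truthiness instead of parsing the text, so every non-empty string becomes True. A minimal sketch of the difference, using a hypothetical Series s stored as object dtype:

```python
import pandas as pd

# Hypothetical column of boolean-looking strings.
s = pd.Series(["True", "False", "True", "False"], dtype=object)

naive = s.astype(bool)                          # non-empty strings are all truthy
mapped = s.map({"True": True, "False": False})  # explicit text-to-bool mapping

print(naive.tolist())
print(mapped.tolist())
```

Mapping explicitly also makes unexpected values visible, since anything missing from the dictionary comes back as NaN.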

Keep in mind that the data must be valid for the target type when you use astype(). Otherwise, it raises an error, typically a ValueError. In particular, a column that contains missing values (NaN or pd.NA) cannot be converted to a NumPy integer type such as int64. Use the nullable ‘Int64’ type instead, or fill or drop the missing values first.
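The interaction between missing values and integer types can be sketched as follows. The Series and the fill value 0 are assumptions made for illustration; whether filling is appropriate depends on your data.

```python
import pandas as pd

# Hypothetical column with one missing value (stored as float64 with NaN).
s = pd.Series([1.0, 2.0, None])

failed = False
try:
    s.astype("int64")  # NumPy integers cannot represent NaN
except ValueError as exc:
    failed = True
    print(f"astype('int64') failed: {exc}")

# The nullable extension type keeps the gap as <NA>.
nullable = s.astype("Int64")

# Once the missing values are filled, a plain int64 conversion works.
filled = nullable.fillna(0).astype("int64")

print(nullable)
print(filled)
```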

Using astype() on multiple columns

Sometimes, you may have to change the data types of multiple columns in a Pandas DataFrame at once. In such cases, you can use the astype() function with a dictionary of column names as the keys, and the target data type as the values.

import pandas as pd
df = pd.DataFrame({'col1': ['1', '2', '3', '4'], 'col2': ['4.1', '5.2', '6.3', '7.4'], 'col3': ['True', 'False', 'True', 'False']})
# Map the text to real booleans first; astype(bool) alone would turn 'False' into True
df['col3'] = df['col3'].map({'True': True, 'False': False})
df = df.astype({'col1': int, 'col2': float, 'col3': bool})
print(df.dtypes)

In the above example, we created a DataFrame with three columns, ‘col1’, ‘col2’, and ‘col3’, and converted all of them by passing a dictionary to astype(). Each key in the dictionary is a column name and each value is the target data type. ‘col1’ was converted to an integer type (int64 on most systems), ‘col2’ to ‘float64’, and ‘col3’ to ‘bool’. The strings in ‘col3’ are mapped to booleans beforehand because astype(bool) treats any non-empty string as True.

The astype() function is also useful when you’re working with large datasets that have many columns. A single call with a dictionary converts all of the listed columns at once, and choosing a smaller type, such as float32 instead of float64, halves the memory those columns use.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10000000, 3), columns=['col1', 'col2', 'col3'])
df = df.astype({'col1': np.float32, 'col2': np.float32, 'col3': np.float32})
print(df.dtypes)

In this example, we created a DataFrame with 10,000,000 rows and 3 columns of random float values using the NumPy library. Then we used astype() to convert all three columns from ‘float64’ to ‘float32’. The trade-off is precision: float32 keeps roughly 7 significant decimal digits, compared with about 15 to 16 for float64.
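To see the effect of the smaller type, you can compare memory usage before and after the conversion. This sketch uses a much smaller DataFrame (1,000 rows, an arbitrary choice) so it runs instantly; the names before, after, and small are illustrative.

```python
import numpy as np
import pandas as pd

# Smaller stand-in for the large DataFrame; random values are float64 by default.
df = pd.DataFrame(np.random.rand(1000, 3), columns=["col1", "col2", "col3"])

before = df.memory_usage(index=False).sum()  # bytes used by the float64 columns
small = df.astype(np.float32)
after = small.memory_usage(index=False).sum()  # bytes used by the float32 columns

# Largest difference introduced by the lower precision.
max_error = (df - small.astype(np.float64)).abs().to_numpy().max()

print(before, after, max_error)
```

When every column gets the same type, passing the type directly, as in df.astype(np.float32), is equivalent to listing each column in a dictionary.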

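Keep in mind that astype() does not clean your data. If a column contains values you did not expect, the conversion either raises an error (for example, a non-numeric string converted to int) or succeeds with results you may not want (for example, floats truncated to integers). So it’s always good practice to check the source data carefully before using this function.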
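One possible way to check a column before converting it is to run it through pd.to_numeric() with errors="coerce" and look at which entries could not be parsed. This is a sketch on a made-up Series raw, not the only approach:

```python
import pandas as pd

# Hypothetical raw column with one entry that is not a number.
raw = pd.Series(["10", "20", "n/a", "40"])

# Unparseable entries become NaN instead of raising an error.
parsed = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the input but failed to parse.
bad = raw[parsed.isna() & raw.notna()]
print(bad)

# Here we assume the bad rows can simply be dropped before converting.
clean = raw.drop(bad.index).astype(int)
print(clean.tolist())
```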

Summary

In summary, the astype() function is a useful tool for converting the data types of Pandas Series and DataFrames. You can use it on a single column, or on multiple columns by passing a dictionary. By using astype(), you can keep data types consistent and prepare data for efficient analysis. If you are working with larger datasets, converting multiple columns in one call is the recommended approach. Just be sure to check your source data before using the function to avoid errors. From my personal experience, it’s always best practice to analyze a dataset thoroughly before manipulating it.
