In this tutorial, we’ll demonstrate how to concatenate DataFrames in Pandas with Python examples and common use cases. If you regularly work with data, merge datasets, or handle large amounts of information, learning DataFrame concatenation in Pandas will make your data analysis tasks a lot easier.
Prerequisites
Before we start, ensure that you have Pandas installed. If you don’t have it installed, you can use the following command:
pip install pandas
Now, let’s learn how to concatenate DataFrames using Pandas and the different scenarios that revolve around this topic.
Understanding Concatenation
Concatenation is the process of combining DataFrames along a particular axis. In Pandas, the concat function is used for this purpose. It allows you to stack DataFrames vertically or horizontally. The key parameter is axis, where axis=0 stacks DataFrames vertically (along rows), and axis=1 stacks them horizontally (along columns).
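To make the two axis values concrete, here is a minimal sketch that compares the shapes of the results. The frames top and bottom and their columns x and y are made-up names used only for illustration.

```python
import pandas as pds

# Two small frames with the same columns and the same row labels (0 and 1).
# The names top, bottom, x, and y are illustrative only.
top = pds.DataFrame({"x": [1, 2], "y": [3, 4]})
bottom = pds.DataFrame({"x": [5, 6], "y": [7, 8]})

# axis=0 adds rows: 2 + 2 rows, still 2 columns.
stacked = pds.concat([top, bottom], axis=0)

# axis=1 adds columns: still 2 rows, 2 + 2 columns.
side_by_side = pds.concat([top, bottom], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```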
Concatenating DataFrames Vertically
Let’s create two simple DataFrames, df1 and df2, to demonstrate vertical concatenation.
import pandas as pds
# Create DataFrame 1
df1 = pds.DataFrame({
    'Name': ['Soumya', 'Versha'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles']
})
# Create DataFrame 2
df2 = pds.DataFrame({
    'Name': ['Kavya', 'Sena'],
    'Age': [22, 28],
    'City': ['Chicago', 'Houston']
})
# Concatenate the DataFrames vertically (row-wise)
res_vt = pds.concat([df1, df2], axis=0)
# Display the result
print("Concatenated DataFrame Vertically:")
print(res_vt)
In this example:
- We create two DataFrames, df1 and df2, with similar column names and structures.
- The pds.concat() function combines these DataFrames vertically, creating a new DataFrame named res_vt.
- The result is then displayed.
Run this script to see the concatenated DataFrame:
Concatenated DataFrame Vertically:
     Name  Age         City
0  Soumya   25     New York
1  Versha   30  Los Angeles
0   Kavya   22      Chicago
1    Sena   28      Houston
Each DataFrame keeps its original row labels, so the index of the result repeats (0, 1, 0, 1) rather than running consecutively.
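The index labels in the output above are worth a closer look: the labels 0 and 1 each appear twice. The short sketch below (using trimmed-down copies of df1 and df2, without the City column, to stay brief) shows how this affects label-based lookups.

```python
import pandas as pds

# Trimmed-down versions of the tutorial's df1 and df2.
df1 = pds.DataFrame({"Name": ["Soumya", "Versha"], "Age": [25, 30]})
df2 = pds.DataFrame({"Name": ["Kavya", "Sena"], "Age": [22, 28]})

res_vt = pds.concat([df1, df2], axis=0)

print(res_vt.index.tolist())   # [0, 1, 0, 1]
print(res_vt.index.is_unique)  # False

# A label-based lookup returns every row labelled 0, one from each input.
rows_labelled_0 = res_vt.loc[0]
print(rows_labelled_0["Name"].tolist())  # ['Soumya', 'Kavya']
```

If code downstream expects one row per label, repeated labels like these can lead to surprising results.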
Concatenating DataFrames Horizontally
Now, let’s explore horizontal concatenation. We’ll modify the script to concatenate DataFrames df1 and df2 horizontally.
# Concatenate DataFrames horizontally
res_hr = pds.concat([df1, df2], axis=1)
# Display the result
print("Concatenated DataFrame Horizontally:")
print(res_hr)
Run this modified script to see the horizontally concatenated DataFrame:
Concatenated DataFrame Horizontally:
     Name  Age         City   Name  Age     City
0  Soumya   25     New York  Kavya   22  Chicago
1  Versha   30  Los Angeles   Sena   28  Houston
The resulting DataFrame now has columns from both df1 and df2 side by side.
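Horizontal concatenation matches rows by index label rather than by position. The example above works cleanly because both DataFrames use the labels 0 and 1. The following sketch uses two hypothetical frames, left and right, whose labels only partly overlap, to show what happens otherwise.

```python
import pandas as pds

# Hypothetical frames with partly overlapping row labels.
left = pds.DataFrame({"Name": ["Soumya", "Versha"]}, index=[0, 1])
right = pds.DataFrame({"City": ["Chicago", "Houston"]}, index=[1, 2])

# Rows are paired by label; labels missing on one side get NaN.
aligned = pds.concat([left, right], axis=1)
print(aligned)

print(aligned.index.tolist())  # [0, 1, 2]
```

Only label 1 exists in both inputs, so it is the only row with a value in both columns.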
Handling Index Reset
After concatenation, the resulting DataFrame may have duplicate index values. To address this, you can reset the index using the ignore_index parameter.
# Concatenate DataFrames vertically with index reset
res_set_index = pds.concat([df1, df2], axis=0, ignore_index=True)
# Display the result
print("Concatenated DataFrame with Reset Index:")
print(res_set_index)
In the script, ignore_index=True ensures that the resulting DataFrame has a new sequential index:
Concatenated DataFrame with Reset Index:
     Name  Age         City
0  Soumya   25     New York
1  Versha   30  Los Angeles
2   Kavya   22      Chicago
3    Sena   28      Houston
Now, the index values are reset, providing a cleaner structure.
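If the DataFrames have already been concatenated without ignore_index, the index can still be renumbered afterwards with reset_index(drop=True). The sketch below, using shortened copies of df1 and df2, suggests that both routes lead to the same result.

```python
import pandas as pds

# Shortened copies of the tutorial's df1 and df2.
df1 = pds.DataFrame({"Name": ["Soumya", "Versha"], "Age": [25, 30]})
df2 = pds.DataFrame({"Name": ["Kavya", "Sena"], "Age": [22, 28]})

# Option 1: renumber the rows while concatenating.
via_param = pds.concat([df1, df2], ignore_index=True)

# Option 2: concatenate first, then discard the old labels.
via_method = pds.concat([df1, df2]).reset_index(drop=True)

print(via_param.index.tolist())      # [0, 1, 2, 3]
print(via_param.equals(via_method))  # True
```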
Concat DataFrames with Different Columns
What if your DataFrames have different columns? When you stack them vertically, the concat function keeps every column it finds and fills the gaps with NaN.
# Create a DataFrame whose columns differ from df1
df3 = pds.DataFrame({
    'Name': ['Dave', 'Tim'],
    'Job': ['Doctor', 'Engineer']
})
# Concatenate DataFrames with different columns
result = pds.concat([df1, df3], axis=0)
# Display the result
print("Concatenated DataFrame with Different Columns:")
print(result)
The output will look like this:
Concatenated DataFrame with Different Columns:
     Name   Age         City       Job
0  Soumya  25.0     New York       NaN
1  Versha  30.0  Los Angeles       NaN
0    Dave   NaN          NaN    Doctor
1     Tim   NaN          NaN  Engineer
df1 has no Job column and df3 has no Age or City column, so those cells are filled with NaN. The Age values appear as floats (25.0, 30.0) because NaN is a floating-point value.
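If the NaN-filled columns are not wanted, the join parameter of concat offers an alternative. The default, join="outer", keeps every column, while join="inner" keeps only the columns present in all inputs. A minimal sketch with the same df1 and df3 data:

```python
import pandas as pds

df1 = pds.DataFrame({
    "Name": ["Soumya", "Versha"],
    "Age": [25, 30],
    "City": ["New York", "Los Angeles"],
})
df3 = pds.DataFrame({
    "Name": ["Dave", "Tim"],
    "Job": ["Doctor", "Engineer"],
})

# join="inner" drops any column that is not shared by every input.
shared_only = pds.concat([df1, df3], axis=0, join="inner", ignore_index=True)

print(shared_only.columns.tolist())  # ['Name']
print(shared_only)
```

Here Name is the only column the two DataFrames share, so it is the only one that survives.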
Concat DataFrames with Common Columns
When DataFrames share the same columns, a vertical concatenation lines those columns up automatically, but the result no longer shows which DataFrame each row came from. The pds.concat function provides the keys parameter to label each input.
# Concatenate data frames with common columns
result = pds.concat([df1, df2], axis=0, keys=['First', 'Second'])
# Display the result
print("Concatenated DataFrame with Common Columns:")
print(result)
Here, we use the keys parameter to create a hierarchical index:
Concatenated DataFrame with Common Columns:
            Name  Age         City
First  0  Soumya   25     New York
       1  Versha   30  Los Angeles
Second 0   Kavya   22      Chicago
       1    Sena   28      Houston
This hierarchical index allows you to distinguish between the original DataFrames.
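One practical use of the hierarchical index is pulling a single source back out by its key. The sketch below also passes the names parameter of concat to label the two index levels. The level names Source and Row are arbitrary choices for this example.

```python
import pandas as pds

df1 = pds.DataFrame({
    "Name": ["Soumya", "Versha"],
    "Age": [25, 30],
    "City": ["New York", "Los Angeles"],
})
df2 = pds.DataFrame({
    "Name": ["Kavya", "Sena"],
    "Age": [22, 28],
    "City": ["Chicago", "Houston"],
})

result = pds.concat([df1, df2], keys=["First", "Second"])

# Select all rows that came from the second DataFrame.
second = result.loc["Second"]
print(second)

# Select one cell using (key, original row label).
print(result.loc[("First", 1), "City"])  # Los Angeles

# Optionally name the index levels (names chosen for illustration).
labelled = pds.concat(
    [df1, df2], keys=["First", "Second"], names=["Source", "Row"]
)
print(list(labelled.index.names))  # ['Source', 'Row']
```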
Concat DataFrames with Duplicate Columns
In some cases, your DataFrames may have columns with identical names, and a horizontal concatenation then produces repeated column labels. Note that pds.concat() has no suffixes parameter (that option belongs to merge()), so passing it raises a TypeError. Instead, rename the columns with add_suffix() before concatenating.
# Create a DataFrame with the same column names as df1
df4 = pds.DataFrame({
    'Name': ['Shiv', 'Som'],
    'Age': [26, 35],
    'City': ['Miami', 'Seattle']
})
# Add a suffix to each DataFrame's columns, then concatenate horizontally
result = pds.concat([df1.add_suffix('_left'), df4.add_suffix('_right')], axis=1)
# Display the result
print("Concatenated DataFrame with Duplicate Columns:")
print(result)
The output will look like this:
Concatenated DataFrame with Duplicate Columns:
   Name_left  Age_left    City_left  Name_right  Age_right  City_right
0     Soumya        25     New York        Shiv         26       Miami
1     Versha        30  Los Angeles         Som         35     Seattle
The suffixes _left and _right help distinguish between the duplicate columns.
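Another way to keep same-named columns apart, sketched here as one possible approach, is to pass keys together with axis=1. The column names stay unchanged and are grouped under a top-level label for each DataFrame. The labels left and right are arbitrary.

```python
import pandas as pds

df1 = pds.DataFrame({
    "Name": ["Soumya", "Versha"],
    "Age": [25, 30],
    "City": ["New York", "Los Angeles"],
})
df4 = pds.DataFrame({
    "Name": ["Shiv", "Som"],
    "Age": [26, 35],
    "City": ["Miami", "Seattle"],
})

# With axis=1, keys become the outer level of a two-level column index.
wide = pds.concat([df1, df4], axis=1, keys=["left", "right"])

print(wide.columns.tolist())

# Select one source's columns, or a single column by (key, name).
print(wide["right"])
print(wide[("left", "Age")].tolist())  # [25, 30]
```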
Frequently Asked Questions (FAQ)
Here are a few frequently asked questions about concatenating DataFrames in Pandas.
Q1: Can I concatenate DataFrames with different column names?
A: Yes, you can combine DataFrames with different column names. The result will have all columns, with missing entries filled with NaN.
Q2: How do I deal with repeated column names when combining DataFrames?
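A: pds.concat() has no suffixes option; that parameter belongs to merge(). Rename the columns before concatenating, for example with add_suffix('_left') and add_suffix('_right'), or pass keys together with axis=1 to group the columns under a label for each DataFrame.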
Q3: What if my index values become jumbled after combining?
A: Set ignore_index=True in pds.concat() to give your DataFrame a fresh, sequential index.
Q4: Can I combine DataFrames with varying column numbers?
A: Absolutely. Combining DataFrames with different column counts fills in gaps with NaN.
Q5: Is there a way to join DataFrames without doubling up common columns?
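A: Yes. A vertical concatenation already lines up common columns instead of duplicating them. Use keys in pds.concat() to add a nested index that records which DataFrame each row came from, keeping things clear.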
Q6: Are there other ways to put DataFrames together in Pandas?
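A: Certainly! You can use DataFrame.merge() for SQL-style joins on key columns, or DataFrame.join() to combine DataFrames on their index. The older DataFrame.append() method was removed in pandas 2.0, so use pds.concat() to add rows instead.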
Feel free to add more questions that you might have via the comment box. We want to ensure the tutorial’s completeness and help you as much as possible.
Conclusion
In this tutorial, we covered the essentials of concatenating DataFrames in Pandas. We explored both vertical and horizontal concatenation. In addition, you saw more scenarios such as resetting the index, dealing with different and common columns, and managing duplicate columns. Understanding the concat() function gives you a convenient way to combine and modify datasets in your Python programming assignments.
As you continue working with Pandas, practicing techniques such as concatenating DataFrames will help you carry out more sophisticated analyses and make your data manipulation tasks more efficient.
Happy coding,
Team TechBeamers