Concat DataFrames in Pandas Explained in Python With Examples

In this tutorial, we’ll demonstrate how to concatenate DataFrames in Pandas with Python examples and use cases. If you regularly work with data, combine datasets, or handle large amounts of information, learning DataFrame concatenation in Pandas will make your data analysis tasks a lot easier.

Contents

Prerequisites
Understanding Concatenation
Concatenating DataFrames Vertically
Concatenating DataFrames Horizontally
Handling Index Reset
Concat DataFrames with Different Columns
Concat DataFrames with Common Columns
Concat DataFrames with Duplicate Columns
Frequently Asked Questions (FAQ)
Q1: Can I concatenate DataFrames with different column names?
Q2: How do I deal with repeated column names when combining DataFrames?
Q3: What if my index values become jumbled after combining?
Q4: Can I combine DataFrames with varying column numbers?
Q5: Is there a way to join DataFrames without doubling up common columns?
Q6: Are there other ways to put DataFrames together in Pandas?

Prerequisites

Before we start, ensure that you have Pandas installed. If you don’t have it installed, you can use the following command:

pip install pandas

Now, let’s learn how to concatenate DataFrames using Pandas and look at the different scenarios that revolve around this topic.

Understanding Concatenation

Concatenation is the process of combining data frames along a particular axis. In Pandas, the concat function is used for this purpose. It allows you to stack DataFrames vertically or horizontally. The key parameter is axis, where axis=0 stacks DataFrames vertically (along rows), and axis=1 stacks them horizontally (along columns).
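To make the effect of the axis parameter concrete, here is a minimal sketch. The frames left and right and their values are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pds

# Two tiny frames with the same shape and column names (illustrative data)
left = pds.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pds.DataFrame({'A': [5, 6], 'B': [7, 8]})

# axis=0: the rows of right are placed under the rows of left
stacked = pds.concat([left, right], axis=0)

# axis=1: the columns of right are placed beside the columns of left
side_by_side = pds.concat([left, right], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

Comparing the two shapes is a quick way to confirm which direction a concatenation went.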

Concatenating DataFrames Vertically

Let’s create two simple DataFrames, df1 and df2, to demonstrate vertical concatenation.

import pandas as pds

# Create DataFrame 1
df1 = pds.DataFrame({
    'Name': ['Soumya', 'Versha'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles']
})

# Create DataFrame 2
df2 = pds.DataFrame({
    'Name': ['Kavya', 'Sena'],
    'Age': [22, 28],
    'City': ['Chicago', 'Houston']
})

# Concatenate DataFrames vertically using Pandas
res_vt = pds.concat([df1, df2], axis=0)

# Display the result
print("Concatenated DataFrame Vertically:")
print(res_vt)

In this example:

We create two DataFrames, df1 and df2, with similar column names and structures.
The pds.concat() function combines these DataFrames vertically, creating a new DataFrame named res_vt.
The result is then displayed.

Run this script to see the concatenated DataFrame:

Concatenated DataFrame Vertically:
      Name  Age           City
0    Soumya   25       New York
1    Versha   30    Los Angeles
0     Kavya   22        Chicago
1      Sena   28        Houston

The rows of df2 are stacked below the rows of df1. Note that each DataFrame keeps its original index labels, so the result's index reads 0, 1, 0, 1 rather than 0 through 3.
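Because concat() keeps each frame's original row labels by default, the stacked result carries the labels 0, 1, 0, 1. The sketch below shows one way to detect this, using shortened versions of df1 and df2 (the City column is left out for brevity). The verify_integrity parameter is an optional safeguard not used elsewhere in this tutorial.

```python
import pandas as pds

df1 = pds.DataFrame({'Name': ['Soumya', 'Versha'], 'Age': [25, 30]})
df2 = pds.DataFrame({'Name': ['Kavya', 'Sena'], 'Age': [22, 28]})

res_vt = pds.concat([df1, df2], axis=0)

print(res_vt.index.tolist())   # [0, 1, 0, 1]
print(res_vt.index.is_unique)  # False
print(len(res_vt.loc[0]))      # 2: label 0 matches one row from each frame

# verify_integrity=True makes concat raise instead of silently repeating labels
try:
    pds.concat([df1, df2], axis=0, verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

Repeated labels are harmless for display, but label-based lookups such as .loc[0] then return several rows, which can surprise later code.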

Concatenating DataFrames Horizontally

Now, let’s explore horizontal concatenation. We’ll modify the script to concatenate DataFrames df1 and df2 horizontally.

# Concatenate DataFrames horizontally
res_hr = pds.concat([df1, df2], axis=1)

# Display the result
print("Concatenated DataFrame Horizontally:")
print(res_hr)

Run this modified script to see the horizontally concatenated DataFrame:

Concatenated DataFrame Horizontally:
     Name  Age         City   Name  Age     City
0  Soumya   25     New York  Kavya   22  Chicago
1  Versha   30  Los Angeles   Sena   28  Houston

The resulting data frame now has columns from both df1 and df2 side by side.
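Horizontal concatenation pairs rows by index label, not by position. The example above lines up neatly only because both frames use the labels 0 and 1. The sketch below uses two hypothetical frames, left and right, with partly different labels to show what happens otherwise.

```python
import pandas as pds

# Illustrative frames whose row labels only partly overlap
left = pds.DataFrame({'Name': ['Soumya', 'Versha']}, index=[0, 1])
right = pds.DataFrame({'City': ['Chicago', 'Houston']}, index=[1, 2])

# Default join='outer': the result covers the union of the row labels,
# and cells with no matching label are filled with NaN
aligned = pds.concat([left, right], axis=1)
print(aligned)

# Resetting both indexes first pairs the rows by position instead
by_position = pds.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(by_position)
```

If the rows of your frames correspond by position, resetting the indexes before concatenating avoids unexpected NaN values.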

Handling Index Reset

After concatenation, the resulting data frame may have duplicate index values. To address this, you can reset the index using the ignore_index parameter.

# Concatenate DataFrames vertically with index reset
res_set_index = pds.concat([df1, df2], axis=0, ignore_index=True)

# Display the result
print("Concatenated DataFrame with Reset Index:")
print(res_set_index)

In the script, ignore_index=True ensures that the resulting data frame has a new sequential index:

Concatenated DataFrame with Reset Index:
      Name  Age           City
0   Soumya   25       New York
1   Versha   30    Los Angeles
2    Kavya   22        Chicago
3     Sena   28        Houston

Now, the index values are reset, providing a cleaner structure.
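If you already have a concatenated DataFrame with repeated labels, you do not need to concatenate again. A minimal sketch, assuming shortened versions of df1 and df2, is to call reset_index(drop=True) on the existing result:

```python
import pandas as pds

df1 = pds.DataFrame({'Name': ['Soumya', 'Versha'], 'Age': [25, 30]})
df2 = pds.DataFrame({'Name': ['Kavya', 'Sena'], 'Age': [22, 28]})

# A result that still carries the labels 0, 1, 0, 1
res_vt = pds.concat([df1, df2], axis=0)

# drop=True discards the old labels instead of keeping them as a column
res_clean = res_vt.reset_index(drop=True)

print(res_clean.index.tolist())  # [0, 1, 2, 3]
```

Both routes give the same rows and index. Passing ignore_index=True is simply shorter when you control the concat() call.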

Concat DataFrames with Different Columns

What if your DataFrames have different columns? The concat function can handle this by filling in missing values with NaN.

# Create data frames with diff columns
df3 = pds.DataFrame({
    'Name': ['Dave', 'Tim'],
    'Job': ['Doctor', 'Engineer']
})

# Concatenate data frames with different columns (stacked vertically)
result = pds.concat([df1, df3], axis=0)

# Display the result
print("Concatenated DataFrame with Different Columns:")
print(result)

The output will look like this:

Concatenated DataFrame with Different Columns:
     Name   Age         City       Job
0  Soumya  25.0     New York       NaN
1  Versha  30.0  Los Angeles       NaN
0    Dave   NaN          NaN    Doctor
1     Tim   NaN          NaN  Engineer

Each DataFrame lacks at least one column that the other has, so the cells for those columns are filled with NaN. Age is shown as 25.0 and 30.0 because a default integer column cannot hold NaN, so Pandas converts it to float.
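When you stack frames whose columns differ, the join parameter of concat() controls which columns survive. The sketch below rebuilds df1 and df3 and compares the default join='outer' with join='inner'. Using ignore_index=True here is a choice made for this sketch.

```python
import pandas as pds

df1 = pds.DataFrame({
    'Name': ['Soumya', 'Versha'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles']
})
df3 = pds.DataFrame({
    'Name': ['Dave', 'Tim'],
    'Job': ['Doctor', 'Engineer']
})

# join='outer' (the default) keeps every column and fills the gaps with NaN
outer = pds.concat([df1, df3], axis=0, ignore_index=True)
print(outer.columns.tolist())  # ['Name', 'Age', 'City', 'Job']

# join='inner' keeps only the columns present in every input
inner = pds.concat([df1, df3], axis=0, join='inner', ignore_index=True)
print(inner.columns.tolist())  # ['Name']
```

An inner join is useful when NaN-filled columns would only get in the way of later processing.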

Concat DataFrames with Common Columns

When DataFrames share the same columns, a vertical concat aligns them by name, so each column appears only once in the result. What gets lost is the record of which DataFrame each row came from. The keys parameter of pds.concat restores it by attaching a label to each input.

# Concatenate data frames with common columns
result = pds.concat([df1, df2], axis=0, keys=['First', 'Second'])

# Display the result
print("Concatenated DataFrame with Common Columns:")
print(result)

Here, we use the keys parameter to create a hierarchical index:

Concatenated DataFrame with Common Columns:
              Name   Age           City
First  0    Soumya   25        New York
       1    Versha   30     Los Angeles
Second 0     Kavya   22         Chicago
       1      Sena   28         Houston

This hierarchical index allows you to distinguish between the original DataFrames.
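Once the keys are in place, you can use them to pull one original DataFrame back out, or turn them into an ordinary column. The sketch below uses shortened versions of df1 and df2. The level names 'Source' and 'Row' are assumptions chosen for this example and are passed through the names parameter.

```python
import pandas as pds

df1 = pds.DataFrame({'Name': ['Soumya', 'Versha'], 'Age': [25, 30]})
df2 = pds.DataFrame({'Name': ['Kavya', 'Sena'], 'Age': [22, 28]})

# names labels the two index levels (the level names are illustrative)
result = pds.concat(
    [df1, df2], axis=0, keys=['First', 'Second'], names=['Source', 'Row']
)

# Select only the rows that came from the first DataFrame
first_part = result.loc['First']
print(first_part)

# Move the outer level into a regular column named 'Source'
flat = result.reset_index(level='Source')
print(flat)
```

Keeping the source as a column is often handy when you later group or filter by origin.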

Concat DataFrames with Duplicate Columns

In some cases, your DataFrames may have columns with identical names. Stacked vertically, such columns simply line up. Placed side by side with axis=1, however, the result contains repeated column names. Note that pds.concat() has no suffixes parameter (that option belongs to merge()), so rename the columns before concatenating, for example with add_suffix().

# Create data frames with duplicate columns
df4 = pds.DataFrame({
    'Name': ['Shiv', 'Som'],
    'Age': [26, 35],
    'City': ['Miami', 'Seattle']
})

# Add a suffix to each frame's column names, then concatenate horizontally
result = pds.concat([df1.add_suffix('_left'), df4.add_suffix('_right')], axis=1)

# Display the result
print("Concatenated DataFrame with Duplicate Columns:")
print(result)

The output will look like this:

Concatenated DataFrame with Duplicate Columns:
  Name_left  Age_left    City_left Name_right  Age_right City_right
0    Soumya        25     New York       Shiv         26      Miami
1    Versha        30  Los Angeles        Som         35    Seattle

The suffixes _left and _right help distinguish between the duplicate columns.
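A further option for telling same-named columns apart in a side-by-side concatenation is to pass keys together with axis=1. This groups each frame's columns under its own top-level label. The labels 'left' and 'right' below are assumptions chosen for this sketch.

```python
import pandas as pds

df1 = pds.DataFrame({
    'Name': ['Soumya', 'Versha'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles']
})
df4 = pds.DataFrame({
    'Name': ['Shiv', 'Som'],
    'Age': [26, 35],
    'City': ['Miami', 'Seattle']
})

# With axis=1, keys become the outer level of the column labels
wide = pds.concat([df1, df4], axis=1, keys=['left', 'right'])
print(wide)

# Each column is now addressed by a (key, column name) pair
print(wide[('right', 'Age')].tolist())  # [26, 35]

# Selecting a key returns that frame's columns as a group
print(wide['left'].columns.tolist())    # ['Name', 'Age', 'City']
```

This keeps the original column names intact, at the cost of working with two-level column labels.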

Frequently Asked Questions (FAQ)

Here are a few frequently asked questions about concatenating DataFrames in Pandas.

Q1: Can I concatenate DataFrames with different column names?

A: Yes, you can concatenate DataFrames with different column names. When stacked vertically, the result contains every column, and cells with no data are filled with NaN.

Q2: How do I deal with repeated column names when combining DataFrames?

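A: pds.concat() has no suffixes option; that parameter belongs to merge(). With concat(), rename the columns first (for example with add_suffix()), or pass keys with axis=1 to group each DataFrame's columns under its own label.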

Q3: What if my index values become jumbled after combining?

A: Set ignore_index=True in pds.concat() to give your DataFrame a fresh, organized index.

Q4: Can I combine DataFrames with varying column numbers?

A: Absolutely. Combining DataFrames with different column counts fills in gaps with NaN.

Q5: Is there a way to join DataFrames without doubling up common columns?

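A: Yes. When you stack vertically (axis=0), columns with the same name are aligned automatically, so they are not duplicated. Add keys in pds.concat() if you also want a nested index that records which DataFrame each row came from.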

Q6: Are there other ways to put DataFrames together in Pandas?

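A: Certainly! Use merge() for database-style joins on key columns, or join() to combine DataFrames on their index. The older DataFrame.append() method was deprecated in pandas 1.4 and removed in pandas 2.0, so use pds.concat() to add rows.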
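To illustrate how merge() differs from concat(), here is a minimal sketch. The frames people and jobs and their values are hypothetical and chosen only for this example.

```python
import pandas as pds

# Illustrative data: jobs lists the names in a different order
# and has no entry for Kavya
people = pds.DataFrame({
    'Name': ['Soumya', 'Versha', 'Kavya'],
    'Age': [25, 30, 22]
})
jobs = pds.DataFrame({
    'Name': ['Versha', 'Soumya'],
    'Job': ['Engineer', 'Doctor']
})

# merge matches rows by the values in 'Name', not by index or position;
# how='left' keeps every row of people even when jobs has no match
merged = people.merge(jobs, on='Name', how='left')
print(merged)
```

Unlike concat(), which lines rows up by index label, merge() matches them by the values in the key column, so the row order in jobs does not matter.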

Feel free to add more questions that you might have via the comment box. We want to ensure the tutorial’s completeness and help you as much as possible.

Conclusion

In this tutorial, we covered the essentials of concatenating DataFrames in Pandas. We explored both vertical and horizontal concatenation. In addition, you saw more scenarios such as resetting the index, dealing with different and common columns, and managing duplicate columns. Understanding concat() gives you a convenient way to combine and reshape datasets in your Python programs.

As you continue working with Pandas, practicing techniques like DataFrame concatenation will make your data analysis and manipulation tasks more efficient.

Happy coding,
Team TechBeamers
