In our previous tutorial, you learned how to merge multiple CSV files using Python built-in functions. Today, we’ll demonstrate how to use Pandas to merge CSV files and walk through a fully working example.
We’ll start with what Pandas is for. It is a Python library for data munging and analysis. It provides highly optimized data structures and high-performing functions for working with data.
Pandas handles data in the 100 MB to 1 GB range quite efficiently. For much larger CSV files, its read_csv() function accepts a chunksize parameter so that the data can be read in smaller chunks instead of all at once.
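To make the chunk size idea concrete, here is a minimal sketch of reading a CSV file piece by piece. The helper name count_rows_in_chunks and the sample file are our own illustrations, and the tiny temporary file only stands in for a much larger one.

```python
import os
import tempfile

import pandas as pdlib


def count_rows_in_chunks(csv_path, chunk_size=1000):
    """Count data rows by reading the file one chunk at a time."""
    total_rows = 0
    # With chunksize set, read_csv yields DataFrames of up to chunk_size rows
    for chunk in pdlib.read_csv(csv_path, chunksize=chunk_size):
        total_rows += len(chunk)
    return total_rows


# Demo on a small temporary file (hypothetical data, not from this tutorial)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    sample = pdlib.DataFrame({"id": range(2500), "value": range(2500)})
    sample.to_csv(sample_path, index=False)
    print(count_rows_in_chunks(sample_path, chunk_size=1000))
```

Only one chunk is held in memory at a time, so the same loop should work for files that would not fit in memory if loaded whole.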
Using Pandas to Merge CSV Files in Python
Python script to merge CSV using Pandas
Include required Python modules
In our Python script, we’ll use the following core modules:
- OS module – Provides functions for working with the file system, such as changing the working directory and creating or deleting files and directories.
- Glob module – Provides the glob() function to list files and directories that match a pattern.
- Pandas – Provides functions to read and merge multiple CSV files quickly.
The snippet below loads the required modules and sets the working directory for our testing.
"""
 Python Script:
  Combine/Merge multiple CSV files using the Pandas library
"""
from os import chdir
from glob import glob
import pandas as pdlib

# Move to the path that holds our CSV files
csv_file_path = 'c:/temp/csv_dir/'
chdir(csv_file_path)
Prepare a list of all CSV files
In this step, we need the list of all CSV files. We’ll use the glob() function and give it the “*.csv” pattern so that it returns only the files with that extension.
Below is a piece of code to list all files matching the “*.csv” pattern.
# List all CSV files in the working dir
# Note: the extension has no leading dot, so the pattern becomes '*.csv'
file_pattern = "csv"
list_of_files = glob('*.{}'.format(file_pattern))
print(list_of_files)
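One thing to watch: the script writes ConsolidateOutput.csv into the same working directory, so a second run would pick up the previous output as an input. The sketch below is one possible way to guard against that and to get a predictable file order. The helper name list_csv_inputs and the sample file names are our own, not part of the original script.

```python
import os
import tempfile
from glob import glob


def list_csv_inputs(directory, output_name="ConsolidateOutput.csv"):
    """Return sorted CSV file names in `directory`, skipping the output file."""
    pattern = os.path.join(directory, "*.csv")
    names = [os.path.basename(path) for path in glob(pattern)]
    # glob() does not guarantee any order, so sort for repeatable results
    return sorted(name for name in names if name != output_name)


# Demo with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["b.csv", "a.csv", "ConsolidateOutput.csv", "notes.txt"]:
        open(os.path.join(tmp_dir, name), "w").close()
    print(list_csv_inputs(tmp_dir))  # ['a.csv', 'b.csv']
```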
Check out – Python to List All Files in a Directory
Concatenate to produce a consolidated file
This is the last step: we call Pandas concat() to stack the data read from each file into one consolidated object. After that, we write the result out as a single CSV file in the current working directory.
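Before running it on real files, it may help to see what concat() does with two small in-memory tables. This is only an illustrative sketch: the column names and values are made up, and ignore_index=True is an optional extra that the tutorial's script does not use.

```python
import pandas as pdlib

# Two hypothetical tables that share only the "id" column
first = pdlib.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
second = pdlib.DataFrame({"id": [3], "city": ["Oslo"]})

# Rows are stacked; columns missing from one table are filled with NaN
combined = pdlib.concat([first, second], ignore_index=True)
print(combined)
```

Without ignore_index=True, each table keeps its own row labels, so the combined index would repeat (0, 1, 0). That does not matter when the result is saved with index=False, but it can be surprising if you keep working with the object in memory.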
Let’s check out the final piece of code that does our task.
"""
 Function:
  Produce a single CSV after combining all files
"""
def produceOneCSV(list_of_files, file_out):
    # Consolidate all CSV files into one object
    result_obj = pdlib.concat([pdlib.read_csv(file) for file in list_of_files])
    # Convert the above object into a csv file and export
    result_obj.to_csv(file_out, index=False, encoding="utf-8")

file_out = "ConsolidateOutput.csv"
produceOneCSV(list_of_files, file_out)
Full script code
"""
 Python Script:
  Combine/Merge multiple CSV files using the Pandas library
"""
from os import chdir
from glob import glob
import pandas as pdlib

# Produce a single CSV after combining all files
def produceOneCSV(list_of_files, file_out):
    # Consolidate all CSV files into one object
    result_obj = pdlib.concat([pdlib.read_csv(file) for file in list_of_files])
    # Convert the above object into a csv file and export
    result_obj.to_csv(file_out, index=False, encoding="utf-8")

# Move to the path that holds our CSV files
csv_file_path = 'c:/temp/csv_dir/'
chdir(csv_file_path)

# List all CSV files in the working dir
# Note: the extension has no leading dot, so the pattern becomes '*.csv'
file_pattern = "csv"
list_of_files = glob('*.{}'.format(file_pattern))
print(list_of_files)

file_out = "ConsolidateOutput.csv"
produceOneCSV(list_of_files, file_out)
Must Read: Convert Python Dictionary to DataFrame
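The full script reads every file into memory before writing the result. For inputs too large for that, one option is to stream each file in chunks and append them to the output. The sketch below is a hypothetical variant rather than part of the script above: merge_csv_streaming and its parameters are names we made up, and it assumes every input file has the same columns in the same order.

```python
import os
import tempfile

import pandas as pdlib


def merge_csv_streaming(input_paths, output_path, chunk_size=100_000):
    """Append each input to output_path chunk by chunk; return rows written."""
    rows_written = 0
    write_header = True
    for path in input_paths:
        for chunk in pdlib.read_csv(path, chunksize=chunk_size):
            chunk.to_csv(
                output_path,
                # The first chunk creates/overwrites the file, later ones append
                mode="w" if write_header else "a",
                # Write the header row only once, at the top of the output
                header=write_header,
                index=False,
                encoding="utf-8",
            )
            write_header = False
            rows_written += len(chunk)
    return rows_written


# Demo with two tiny temporary files standing in for large ones
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for i in range(2):
        path = os.path.join(tmp_dir, "part{}.csv".format(i))
        pdlib.DataFrame({"id": [i * 2, i * 2 + 1]}).to_csv(path, index=False)
        paths.append(path)
    out_path = os.path.join(tmp_dir, "merged.csv")
    print(merge_csv_streaming(paths, out_path, chunk_size=1))
```

Unlike concat(), this approach does not align columns by name, so it is only safe when the files really do share one layout.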
Summary
We hope that you now know how to use the Pandas library to merge CSV files and can write a fully working Python script that combines multiple files quickly.
Python for Data Science
Check this Beginner’s Guide to Learn Pandas Series and DataFrames.
Best,
TechBeamers