• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
PythonForBeginners.com

PythonForBeginners.com

Learn By Example

  • Home
  • Learn Python
    • Python Tutorial
  • Categories
    • Basics
    • Lists
    • Dictionary
    • Code Snippets
    • Comments
    • Modules
    • API
    • Beautiful Soup
    • Cheatsheet
    • Games
    • Loops
  • Python Courses
    • Python 3 For Beginners
You are here: Home / Basics / Merge DataFrames in Python

Merge DataFrames in Python

Author: Aditya Raj
Last Updated: October 12, 2022

Python provides us with the pandas dataframes to handle tabular data. In this article, we will discuss how we can merge two dataframes in python.

How to Merge Two DataFrames in Python?

Suppose that we have a dataframe that contains the names of some students, their roll number, and the class they have opted to study as shown below.

     Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl

Also, we have a dataframe that contains the roll number of the students and the marks obtained by them as shown below.

    Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A

Now, we have to merge the given dataframes such that the resultant dataframe contains the name, roll number, grade, class, and the corresponding marks for each student as shown below.

    Roll  Marks Grade  Class      Name
0    11     85     A      1    Aditya
1    12     95     A      1     Chris
2    13     75     B      1       Sam
3    14     75     B      1      Joel
4    16     78     B      1  Samantha
5    15     55     C      1       Tom
6    20     72     B      1      Tina
7    24     92     A      1       Amy

To obtain the output, we will use the merge() method defined in the pandas module. The syntax of the merge() method is as follows.

pd.merge(df1,df2, on) 

Here, 

  • df1  denotes the first dataframe.
  • The parameter df2 denotes the second dataframe that is to be merged.
  • The parameter ‘on’ is the column name of the dataframes that is used to compare the columns of the given dataframes. Rows from two dataframes are merged together if they have the same value in the column corresponding to the ‘on’ parameter.

The merge() method also takes several other parameters. You can have a look at them in this documentation.

To merge the given dataframes, we will invoke the merge() method first dataframe as the first input argument. Subsequently, we will pass the dataframe containing the marks as the second input argument to the merge() method. To the ‘on’ parameter, we will pass the ‘Roll’ column name. In this way, the rows corresponding to the same roll numbers from the given dataframes will be merged and the resultant dataframe will be produced as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_on="Roll")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Roll  Marks Grade  Class      Name
0    11     85     A      1    Aditya
1    12     95     A      1     Chris
2    13     75     B      1       Sam
3    14     75     B      1      Joel
4    16     78     B      1  Samantha
5    15     55     C      1       Tom
6    20     72     B      1      Tina
7    24     92     A      1       Amy

If the second dataframe has rows that do not correspond to any row in the first dataframe, those rows are omitted from the result. Similarly, if the first dataframe has rows that don’t correspond to the second dataframe, those rows will be omitted from the result. You can observe this in the above example.

Merge DataFrames Using Outer Join in Python

To include the omitted rows, we can use the parameter ‘how’ in the merge() method. The ‘how’ parameter has the default value ‘inner’. Due to this, only those rows are included in the resultant dataframe that are present in both the input dataframes. To included the omitted dataframes, you can pass the value ‘outer’ to the ‘how’ parameter as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="outer",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
    Roll  Marks Grade  Class      Name
0     11   85.0     A    1.0    Aditya
1     12   95.0     A    1.0     Chris
2     13   75.0     B    1.0       Sam
3     14   75.0     B    1.0      Joel
4     16   78.0     B    1.0  Samantha
5     15   55.0     C    1.0       Tom
6     19   75.0     B    NaN       NaN
7     20   72.0     B    1.0      Tina
8     24   92.0     A    1.0       Amy
9     25   95.0     A    NaN       NaN
10    17    NaN   NaN    1.0     Pablo
11    30    NaN   NaN    1.0    Justin
12    31    NaN   NaN    1.0      Karl

Suggested Reading: If you are into machine learning, you can read this article on regression in machine learning. You might also like this article on k-means clustering with numerical example.

Merge DataFrames Using Left and Right Join in Python

If you want to include omitted rows only from the first dataframe, you can pass the value ‘left’ to the parameter ‘how’.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="left",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Roll  Marks Grade  Class      Name
0    11     85     A    1.0    Aditya
1    12     95     A    1.0     Chris
2    13     75     B    1.0       Sam
3    14     75     B    1.0      Joel
4    16     78     B    1.0  Samantha
5    15     55     C    1.0       Tom
6    19     75     B    NaN       NaN
7    20     72     B    1.0      Tina
8    24     92     A    1.0       Amy
9    25     95     A    NaN       NaN

Similarly, if you want to include omitted rows only from the second dataframe, you can pass the value ‘right’ to the ‘how’ parameter as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="right",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
    Roll  Marks Grade  Class      Name
0     11   85.0     A      1    Aditya
1     12   95.0     A      1     Chris
2     13   75.0     B      1       Sam
3     14   75.0     B      1      Joel
4     15   55.0     C      1       Tom
5     16   78.0     B      1  Samantha
6     17    NaN   NaN      1     Pablo
7     20   72.0     B      1      Tina
8     24   92.0     A      1       Amy
9     30    NaN   NaN      1    Justin
10    31    NaN   NaN      1      Karl

Conclusion

In this article, we have discussed how to merge two dataframes in python using the merge() method. To learn more about python programming, you can read this article on dictionary comprehension in python. You might also like this article on list comprehension in python.

Related

Recommended Python Training

Course: Python 3 For Beginners

Over 15 hours of video content with guided instruction for beginners. Learn how to create real world applications and master the basics.

Enroll Now

Filed Under: Basics Author: Aditya Raj

More Python Topics

API Argv Basics Beautiful Soup Cheatsheet Code Code Snippets Command Line Comments Concatenation crawler Data Structures Data Types deque Development Dictionary Dictionary Data Structure In Python Error Handling Exceptions Filehandling Files Functions Games GUI Json Lists Loops Mechanzie Modules Modules In Python Mysql OS pip Pyspark Python Python On The Web Python Strings Queue Requests Scraping Scripts Split Strings System & OS urllib2

Primary Sidebar

Menu

  • Basics
  • Cheatsheet
  • Code Snippets
  • Development
  • Dictionary
  • Error Handling
  • Lists
  • Loops
  • Modules
  • Scripts
  • Strings
  • System & OS
  • Web

Get Our Free Guide To Learning Python

Most Popular Content

  • Reading and Writing Files in Python
  • Python Dictionary – How To Create Dictionaries In Python
  • How to use Split in Python
  • Python String Concatenation and Formatting
  • List Comprehension in Python
  • How to Use sys.argv in Python?
  • How to use comments in Python
  • Try and Except in Python

Recent Posts

  • Count Rows With Null Values in PySpark
  • PySpark OrderBy One or Multiple Columns
  • Select Rows with Null values in PySpark
  • PySpark Count Distinct Values in One or Multiple Columns
  • PySpark Filter Rows in a DataFrame by Condition

Copyright © 2012–2025 · PythonForBeginners.com

  • Home
  • Contact Us
  • Privacy Policy
  • Write For Us