PythonForBeginners.com

Select Specific Columns in Pandas Dataframe

Author: Aditya Raj
Last Updated: January 27, 2023

While working with dataframes in Python, we often need only part of the data. For this, we select one or more columns, which may or may not be contiguous. I have already discussed how to select multiple columns in a pandas dataframe. This article discusses different ways to select specific columns in a pandas dataframe.

Table of Contents
  1. Select Specific Columns in Pandas Dataframe Using Column Names
  2. Select Specific Columns in Pandas Dataframe Using the Column Positions
  3. Select Specific Columns in a Dataframe Using the iloc Attribute
  4. Specific Columns in a Dataframe Using the loc Attribute
  5. Conclusion

Select Specific Columns in Pandas Dataframe Using Column Names

To select specific columns from the pandas dataframe using the column names, you can pass a list of column names to the indexing operator as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df[["Maths", "Physics"]]
print(columns)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Maths  Physics
0    100       80
1     80      100
2     90       80
3    100      100
4     90       90
5     80       70

In this example, we first converted a list of dictionaries to a dataframe using the DataFrame() function. Then, we selected the "Maths" and "Physics" columns from the dataframe using the list ["Maths", "Physics"].
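One detail worth checking when selecting by name is the number of brackets. The following is a minimal sketch, using a small made-up dataframe rather than the one above, which suggests that a single column name returns a Series while a list containing one name returns a dataframe.

```python
import pandas as pd

# Small illustrative dataframe (values are made up for this sketch).
df = pd.DataFrame({"Roll": [1, 2, 3],
                   "Maths": [100, 80, 90],
                   "Physics": [80, 100, 80]})

# A single column name inside one pair of brackets returns a Series.
single = df["Maths"]

# A list with one column name (two pairs of brackets) returns a DataFrame.
single_as_frame = df[["Maths"]]

print(type(single).__name__)
print(type(single_as_frame).__name__)
print(single_as_frame.shape)
```

If you need the result to stay a dataframe, for example to select further columns from it later, the list form is usually the safer choice.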

Select Specific Columns in Pandas Dataframe Using the Column Positions

If you don’t know the column names and only have the positions of the columns, you can use the columns attribute of the pandas dataframe to select specific columns. For this, we will use the following steps.

  • First, we will get a list of column names from the dataframe using the columns attribute.
  • Then, we will extract the names of the specific columns that we want to select. For this, we will use the list of column names and list comprehension.
  • After obtaining the list of specific column names, we can use it to select specific columns in the dataframe using the indexing operator.

You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
column_names=df.columns
required_indices=[0,2,3]
required_columns=[column_names[index] for index in required_indices]
print("The column names are:")
print(required_columns)
print("The columns are:")
columns=df[required_columns]
print(columns)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The column names are:
['Roll', 'Physics', 'Chemistry']
The columns are:
   Roll  Physics  Chemistry
0     1       80         90
1     2      100         90
2     3       80         70
3     4      100         90
4     5       90         80
5     6       70         70

In this example, we had to select the columns at positions 0, 2, and 3. For this, we created a variable required_indices with the list [0, 2, 3] as its value. Then, we used list comprehension and the Python indexing operator to get the column names at the specified positions from the list of column names. We stored these column names in the required_columns variable. Finally, we used the indexing operator to select the specific columns from the dataframe.
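As a possible shortcut, the object returned by the columns attribute can itself be indexed with a list of positions, so the list comprehension is not strictly needed. The sketch below assumes a small made-up dataframe, and the variable names are illustrative only.

```python
import pandas as pd

# Small illustrative dataframe (values are made up for this sketch).
df = pd.DataFrame({"Roll": [1, 2],
                   "Maths": [100, 80],
                   "Physics": [80, 100],
                   "Chemistry": [90, 90]})

required_positions = [0, 2, 3]

# Indexing the column Index with a list of positions returns the matching names.
required_names = df.columns[required_positions]

# The resulting Index can be passed to the indexing operator like a list of names.
selected = df[required_names]

print(list(required_names))
print(selected)
```

Both approaches first translate positions into names and then select by name, so they should give the same columns.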

Select Specific Columns in a Dataframe Using the iloc Attribute

The iloc attribute of a pandas dataframe is used to select rows or columns by position. The iloc attribute of a dataframe returns an _iLocIndexer object. We can use this _iLocIndexer object to select columns from the dataframe. To select columns at specific positions using the iloc object, we will use the following syntax.

df.iloc[start_row:end_row, list_of_column_positions]

Here,

  • df is the input dataframe.
  • The start_row variable contains the start position of the rows that we want to include in the output.
  • The end_row variable contains the position at which the row selection stops. The row at this position is not included in the output.
  • The list_of_column_positions variable contains the positions of the specific columns that we want to select from the dataframe.

As we want to select all the rows and specified columns, we will keep start_row and end_row empty. We will just pass the list containing the position of specific columns to the list_of_column_positions variable for selecting the columns from the dataframe as shown in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
list_of_column_positions=[0,2,3]
columns=df.iloc[:,list_of_column_positions]
print(columns)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Roll  Physics  Chemistry
0     1       80         90
1     2      100         90
2     3       80         70
3     4      100         90
4     5       90         80
5     6       70         70

In this example, we used the iloc attribute to select columns at positions 0, 2, and 3 in the dataframe.
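The iloc attribute can also restrict the rows while selecting columns. The following is a minimal sketch on a small made-up dataframe. It illustrates that the end position of a row slice is excluded and that negative positions count from the end.

```python
import pandas as pd

# Small illustrative dataframe (values are made up for this sketch).
df = pd.DataFrame({"Roll": [1, 2, 3, 4],
                   "Maths": [100, 80, 90, 100],
                   "Physics": [80, 100, 80, 100],
                   "Chemistry": [90, 90, 70, 90]})

# Rows at positions 1 and 2 (position 3 is excluded), columns at positions 0 and 2.
subset = df.iloc[1:3, [0, 2]]
print(subset)

# Negative positions count from the end: -1 refers to the last column.
last_column = df.iloc[:, [-1]]
print(list(last_column.columns))
```

Here, the slice 1:3 yields two rows, which matches how slicing works for Python lists.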

Specific Columns in a Dataframe Using the loc Attribute

The loc attribute of a pandas dataframe is used to select rows by index label and columns by column name. The loc attribute of a dataframe returns a _LocIndexer object. We can use this _LocIndexer object to select columns from the dataframe using the column names. To select specific columns using the loc object, we will use the following syntax.

df.loc[start_row_index:end_row_index, list_of_column_names]

Here,

  • df is the input dataframe.
  • The start_row_index variable contains the start index of the rows that we want to include in the output.
  • The end_row_index variable contains the index of the last row that we want to include in the output. 
  • The list_of_column_names variable contains the name of specific columns that we want to select from the dataframe. 

As we want to select all the rows and specified columns, we will keep start_row_index and end_row_index empty. We will just pass the list of specific column names to list_of_column_names for selecting the columns from the dataframe as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df.loc[:,["Maths", "Physics"]]
print(columns)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Maths  Physics
0    100       80
1     80      100
2     90       80
3    100      100
4     90       90
5     80       70

In this example, we have selected specific columns from the dataframe using a list of column names and the loc attribute.
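Unlike iloc, a slice passed to loc works on labels and includes the end label. The sketch below, again on a small made-up dataframe with the default integer index, shows this for a row slice and for a slice of column names.

```python
import pandas as pd

# Small illustrative dataframe (values are made up for this sketch).
df = pd.DataFrame({"Roll": [1, 2, 3, 4],
                   "Maths": [100, 80, 90, 100],
                   "Physics": [80, 100, 80, 100],
                   "Chemistry": [90, 90, 70, 90]})

# The row slice uses index labels and includes the end label 3,
# so rows labelled 1, 2, and 3 are returned.
subset = df.loc[1:3, ["Maths", "Physics"]]
print(subset)

# Column names can be sliced as well; both end labels are included.
label_slice = df.loc[:, "Maths":"Chemistry"]
print(list(label_slice.columns))
```

With the default index, the same slice 1:3 therefore returns three rows with loc but only two rows with iloc.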

Conclusion

In this article, we have discussed different ways to select specific columns in a pandas dataframe.
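As a quick recap, the following sketch selects the same three columns with each of the four approaches and compares the results. It assumes a small made-up dataframe, and the variable names are illustrative only.

```python
import pandas as pd

# Small illustrative dataframe (values are made up for this sketch).
df = pd.DataFrame({"Roll": [1, 2, 3],
                   "Maths": [100, 80, 90],
                   "Physics": [80, 100, 80],
                   "Chemistry": [90, 90, 70]})

names = ["Roll", "Physics", "Chemistry"]
positions = [0, 2, 3]

by_name = df[names]                                      # indexing operator with names
by_lookup = df[[df.columns[i] for i in positions]]       # positions translated to names
by_iloc = df.iloc[:, positions]                          # iloc with positions
by_loc = df.loc[:, names]                                # loc with names

# equals() compares the values, the index, and the column labels.
print(by_name.equals(by_lookup))
print(by_name.equals(by_iloc))
print(by_name.equals(by_loc))
```

All three comparisons should print True, so the choice between the approaches mostly depends on whether you have column names or column positions at hand.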

To learn more about Python programming, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

Copyright © 2012–2023 · PythonForBeginners.com
