Pandas Dataframe Index in Python - PythonForBeginners.com

Pandas dataframes are one of the most used data structures for data analysis and machine learning tasks in Python. In this article, we will discuss how to create and delete an index from a pandas dataframe. We will also discuss multilevel indexing in a pandas dataframe and how we can access elements from a dataframe using dataframe indices.

Table of Contents

What Is a Pandas Dataframe Index?
Create an Index While Creating a Pandas Dataframe
Create Dataframe Index While Loading a CSV File
Create an Index After Creating a Pandas Dataframe
Convert Column of a DataFrame into Index
Change Index of a Pandas Dataframe
Create Multilevel Index in a Pandas Dataframe
1. Create a Multilevel Index While Creating a Dataframe
2. Create a Multilevel Index After Creating a Dataframe
Remove Index From a Pandas Dataframe
Conclusion

What Is a Pandas Dataframe Index?

Just as a dataframe has column names, you can think of an index as a set of row labels. When we create a dataframe without specifying an index, its rows are assigned integer indices from 0 to the number of rows minus one, as shown below.

import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
    A   B   C
0   1   2   3
1   3  55  34
2  12  32  45
The index is:
[0, 1, 2]
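
As a quick check of the default labels, the following sketch inspects the index object directly instead of converting it to a list. It rebuilds the same dataframe as above. That the default index is a RangeIndex holds for current pandas releases and is an assumption about the installed version, not something the example above prints.

```python
import pandas as pd

# Same data as in the example above.
myDf = pd.DataFrame([[1, 2, 3], [3, 55, 34], [12, 32, 45]], columns=["A", "B", "C"])

# The default index holds integer labels from 0 to len(myDf) - 1.
default_index = myDf.index
print(type(default_index).__name__)
print(list(default_index) == list(range(len(myDf))))
```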

Create an Index While Creating a Pandas Dataframe

You can also create custom indices while creating a dataframe. For this, you can use the index parameter of the DataFrame() function. The index parameter takes a list of values and assigns the values as indices of the rows in the dataframe. You can observe this in the following example.

import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"],index=[101,102,103])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
      A   B   C
101   1   2   3
102   3  55  34
103  12  32  45
The index is:
[101, 102, 103]

In the above example, we have created the index of the dataframe using the list [101, 102, 103] and the index parameter of the DataFrame() function.
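
Once a dataframe has custom labels, rows can be looked up by those labels. The following is a minimal sketch using the same data; the variable names row_by_label, row_by_position, and cell are made up for illustration.

```python
import pandas as pd

myDf = pd.DataFrame(
    [[1, 2, 3], [3, 55, 34], [12, 32, 45]],
    columns=["A", "B", "C"],
    index=[101, 102, 103],
)

# .loc selects by index label, so 102 refers to the row labelled 102.
row_by_label = myDf.loc[102]
# .iloc selects by position, so 1 refers to the second row.
row_by_position = myDf.iloc[1]
# A single cell can be read by giving both the row label and the column name.
cell = myDf.loc[103, "B"]

print(row_by_label.tolist())
print(cell)
```

Here, loc[102] and iloc[1] return the same row because the label 102 happens to sit at position 1.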

Here, you need to make sure that the number of elements in the list passed to the index parameter is equal to the number of rows in the dataframe. Otherwise, the program will run into a ValueError exception as shown below.

import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"],index=[101,102,103,104])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

ValueError: Length of values (3) does not match length of index (4)

In the above example, you can observe that we have passed a list of four elements to the index parameter. However, the dataframe has only three rows. Hence, the program runs into a ValueError exception.
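
If the labels come from an external source and their number is not guaranteed, one possible way to handle the mismatch is to catch the exception. The sketch below is only an illustration; the fallback of trimming the label list is an assumption about what a program might want, and the exact message text can differ between pandas versions.

```python
import pandas as pd

rows = [[1, 2, 3], [3, 55, 34], [12, 32, 45]]
labels = [101, 102, 103, 104]  # one label too many

try:
    myDf = pd.DataFrame(rows, columns=["A", "B", "C"], index=labels)
except ValueError as error:
    # The exact message text can differ between pandas versions.
    print("Could not create the dataframe:", error)
    # Fall back to using only as many labels as there are rows.
    myDf = pd.DataFrame(rows, columns=["A", "B", "C"], index=labels[: len(rows)])

print(list(myDf.index))
```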

Create Dataframe Index While Loading a CSV File

If you are creating a dataframe from a csv file and you want to use a column of the csv file as the dataframe index, you can use the index_col parameter in the read_csv() function.

The index_col parameter takes the name of the column as its input argument. After execution of the read_csv() function, the specified column is assigned as the index of the dataframe. You can observe this in the following example.

import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col="Class")
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
       Roll      Name
Class                
1        11    Aditya
1        12     Chris
1        13       Sam
2         1      Joel
2        22       Tom
2        44  Samantha
3        33      Tina
3        34       Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]

You can also pass the position of a column instead of its name to the index_col parameter. For instance, if you want to use the first column of the csv file as the index, you can pass 0 to the index_col parameter in the read_csv() function as shown below.

import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=0)
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
       Roll      Name
Class                
1        11    Aditya
1        12     Chris
1        13       Sam
2         1      Joel
2        22       Tom
2        44  Samantha
3        33      Tina
3        34       Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]

Here, the Class column is the first column in the csv file. Hence, it becomes the index of the dataframe.
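
The examples above read samplefile.csv from disk. To try index_col without that file, the following sketch passes an in-memory string to read_csv() instead. The csv text is an assumption reconstructed from the printed outputs, not the actual file.

```python
import io
import pandas as pd

# Assumed contents of samplefile.csv, reconstructed from the printed outputs.
csv_text = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""

# Selecting the index column by name and by position gives the same result here.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Class")
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(by_name.index.name)
print(by_name.equals(by_position))
```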

The index_col parameter also accepts multiple values as its input. We discuss this in the section on multilevel indexing below.

Create an Index After Creating a Pandas Dataframe

When a dataframe is created, the rows of the dataframe are assigned indices starting from 0 till the number of rows minus one. However, we can create a custom index for a dataframe using the index attribute.

To create a custom index in a pandas dataframe, we will assign a list of index labels to the index attribute of the dataframe. After execution of the assignment statement, a new index is created for the dataframe as shown below.

import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf.index=[101,102,103,104,105,106,107,108]
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
     Class  Roll      Name
101      1    11    Aditya
102      1    12     Chris
103      1    13       Sam
104      2     1      Joel
105      2    22       Tom
106      2    44  Samantha
107      3    33      Tina
108      3    34       Amy
The index is:
[101, 102, 103, 104, 105, 106, 107, 108]

Here, you can see that we have assigned a list containing the numbers from 101 to 108 to the index attribute of the dataframe. Hence, the elements of the list become the indices of the rows in the dataframe.

Remember that the total number of index labels in the list should be equal to the number of rows in the dataframe. Otherwise, the program will run into a ValueError exception.
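
One way to guard against this mismatch, sketched here with a small made-up dataframe, is to compare the lengths before the assignment. The variable name new_labels is hypothetical.

```python
import pandas as pd

myDf = pd.DataFrame({"Roll": [11, 12, 13], "Name": ["Aditya", "Chris", "Sam"]})
new_labels = [101, 102]  # only two labels for three rows

# Checking the length first avoids the ValueError raised by the assignment.
if len(new_labels) == len(myDf):
    myDf.index = new_labels
else:
    print("Expected", len(myDf), "labels but got", len(new_labels))

# With the right number of labels the assignment succeeds.
myDf.index = [101, 102, 103]
print(list(myDf.index))
```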

Convert Column of a DataFrame into Index

We can also use a column as the index of the dataframe. For this, we can use the set_index() method. The set_index() method, when invoked on a dataframe, takes the column name as its input argument. After execution, it returns a new dataframe with the specified column as its index as shown in the following example.

import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index("Class")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
       Roll      Name
Class                
1        11    Aditya
1        12     Chris
1        13       Sam
2         1      Joel
2        22       Tom
2        44  Samantha
3        33      Tina
3        34       Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]

In the above example, we have used the set_index() method to create the index from an existing column of the dataframe instead of a new sequence.

Change Index of a Pandas Dataframe

You can change the index column of a dataframe using the set_index() method. For this, you just need to pass the column name of the new index column as input to the set_index() method as shown below.

import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index("Class")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
print("The modified dataframe is:")
newDf=myDf.set_index("Roll")
print(newDf)
print("The index is:")
index=list(newDf.index)
print(index)

Output:

The dataframe is:
       Roll      Name
Class                
1        11    Aditya
1        12     Chris
1        13       Sam
2         1      Joel
2        22       Tom
2        44  Samantha
3        33      Tina
3        34       Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
The modified dataframe is:
          Name
Roll          
11      Aditya
12       Chris
13         Sam
1         Joel
22         Tom
44    Samantha
33        Tina
34         Amy
The index is:
[11, 12, 13, 1, 22, 44, 33, 34]

If you want to assign a sequence as the new index to the dataframe, you can assign the sequence to the index attribute of the pandas dataframe as shown below.

import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index("Class")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
print("The modified dataframe is:")
myDf.index=[101, 102, 103, 104, 105, 106, 107, 108]
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
       Roll      Name
Class                
1        11    Aditya
1        12     Chris
1        13       Sam
2         1      Joel
2        22       Tom
2        44  Samantha
3        33      Tina
3        34       Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
The modified dataframe is:
     Roll      Name
101    11    Aditya
102    12     Chris
103    13       Sam
104     1      Joel
105    22       Tom
106    44  Samantha
107    33      Tina
108    34       Amy
The index is:
[101, 102, 103, 104, 105, 106, 107, 108]

When we change the index column of a dataframe, the existing index column is deleted from the dataframe. Therefore, you should first store the index column into a new column of the dataframe before changing the index column. Otherwise, you will lose data stored in the index column from your dataframe.

import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index("Class")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
print("The modified dataframe is:")
myDf["Class"]=myDf.index
myDf.index=[101, 102, 103, 104, 105, 106, 107, 108]
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
       Roll      Name
Class                
1        11    Aditya
1        12     Chris
1        13       Sam
2         1      Joel
2        22       Tom
2        44  Samantha
3        33      Tina
3        34       Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
The modified dataframe is:
     Roll      Name  Class
101    11    Aditya      1
102    12     Chris      1
103    13       Sam      1
104     1      Joel      2
105    22       Tom      2
106    44  Samantha      2
107    33      Tina      3
108    34       Amy      3
The index is:
[101, 102, 103, 104, 105, 106, 107, 108]

Here, you can observe that we have first stored the index into the Class column before changing the index of the dataframe. In the previous example, we hadn’t done that. Due to this, the data in the Class column was lost.
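
Another way to keep the old index, sketched here as an alternative to the manual copy above, is to call reset_index() before set_index(). The small dataframe is a made-up subset of the sample data.

```python
import pandas as pd

myDf = pd.DataFrame(
    {"Class": [1, 1, 2], "Roll": [11, 12, 1], "Name": ["Aditya", "Chris", "Joel"]}
).set_index("Class")

# reset_index() turns the Class index back into a regular column,
# so it is still present after Roll becomes the new index.
newDf = myDf.reset_index().set_index("Roll")

print(list(newDf.columns))
print(list(newDf.index))
```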

Create Multilevel Index in a Pandas Dataframe

You can also create a multilevel index in a dataframe. Multilevel indices help you access hierarchical data such as census data that have different levels of abstraction. We can create multilevel indices while creating the dataframe as well as after creating the dataframe. This is discussed as follows.

Create a Multilevel Index While Creating a Dataframe

To create a multilevel index using different columns of a dataframe, you can use the index_col parameter in the read_csv() function. The index_col parameter takes a list of columns that have to be used as indices. The order of the column names in the list given to the index_col parameter from left to right is from highest to lowest level of index. After execution of the read_csv() function, you will get a dataframe with multilevel index as shown in the following example.

import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=["Class","Roll"])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
                Name
Class Roll          
1     11      Aditya
      12       Chris
      13         Sam
2     1         Joel
      22         Tom
      44    Samantha
3     33        Tina
      34         Amy
The index is:
[(1, 11), (1, 12), (1, 13), (2, 1), (2, 22), (2, 44), (3, 33), (3, 34)]

In the above example, the Class column contains the first level of the index and the Roll column contains the second level. To access a single row of the dataframe, you need to know its index label at both levels.
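
Access through a two-level index can be sketched as follows. The dataframe is a made-up subset of the sample data, built in memory so that the snippet does not depend on samplefile.csv.

```python
import pandas as pd

myDf = pd.DataFrame(
    {
        "Class": [1, 1, 2, 2],
        "Roll": [11, 12, 1, 22],
        "Name": ["Aditya", "Chris", "Joel", "Tom"],
    }
).set_index(["Class", "Roll"])

# A tuple with one label per level identifies a single row.
name = myDf.loc[(1, 12), "Name"]
# Giving only the first level returns every row under that label.
class_two = myDf.loc[2]

print(name)
print(list(class_two.index))
```

The tuple (1, 12) means Class 1 and Roll 12, matching the order of the index levels.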

Instead of using the column names, you can also pass the positions of the columns to the index_col parameter. For instance, you can use the first and second columns of the csv file as the index as shown below.

import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=[0,1])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
                Name
Class Roll          
1     11      Aditya
      12       Chris
      13         Sam
2     1         Joel
      22         Tom
      44    Samantha
3     33        Tina
      34         Amy
The index is:
[(1, 11), (1, 12), (1, 13), (2, 1), (2, 22), (2, 44), (3, 33), (3, 34)]

Create a Multilevel Index After Creating a Dataframe

You can also create a multilevel index after creating a dataframe using the set_index() method. For this, you just need to pass a list of column names to the set_index() method. Again, the order of the column names in the list given to the set_index() method from left to right is from highest to lowest level of index as shown below.

import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index(["Class","Roll"])
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
                Name
Class Roll          
1     11      Aditya
      12       Chris
      13         Sam
2     1         Joel
      22         Tom
      44    Samantha
3     33        Tina
      34         Amy
The index is:
[(1, 11), (1, 12), (1, 13), (2, 1), (2, 22), (2, 44), (3, 33), (3, 34)]

Keep in mind that the set_index() method removes the existing index. If you want to keep the data stored in the existing index, you should copy it into another column before creating the new index.
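
The copy step described above might look like the following sketch. The dataframe and the column name "Id" are made up for illustration.

```python
import pandas as pd

myDf = pd.DataFrame(
    {"Class": [1, 1, 2], "Roll": [11, 12, 1], "Name": ["Aditya", "Chris", "Joel"]},
    index=[101, 102, 103],
)

# Copy the current labels into a column ("Id" is a made-up name) so they
# survive when set_index() replaces the index.
myDf["Id"] = myDf.index
myDf = myDf.set_index(["Class", "Roll"])

print(list(myDf.columns))
print(list(myDf.index))
```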

Remove Index From a Pandas Dataframe

To remove the index from a pandas dataframe, you can use the reset_index() method. The reset_index() method, when invoked on a dataframe, returns a new dataframe that has the default integer index starting from 0. If the existing index was created from one or more columns, those columns are converted back into normal columns as shown below.

import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=[0,1])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
myDf=myDf.reset_index()
print("The modified dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)

Output:

The dataframe is:
                Name
Class Roll          
1     11      Aditya
      12       Chris
      13         Sam
2     1         Joel
      22         Tom
      44    Samantha
3     33        Tina
      34         Amy
The index is:
[(1, 11), (1, 12), (1, 13), (2, 1), (2, 22), (2, 44), (3, 33), (3, 34)]
The modified dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The index is:
[0, 1, 2, 3, 4, 5, 6, 7]
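
If the old labels are not needed at all, reset_index() also accepts a drop parameter. The following sketch, using a made-up dataframe with an unnamed custom index, compares the two behaviours.

```python
import pandas as pd

myDf = pd.DataFrame(
    {"Roll": [11, 12, 13], "Name": ["Aditya", "Chris", "Sam"]},
    index=[101, 102, 103],
)

# Default behaviour: the old labels are kept in a new column named "index".
kept = myDf.reset_index()
# drop=True discards the old labels entirely.
dropped = myDf.reset_index(drop=True)

print(list(kept.columns))
print(list(dropped.columns))
```

In both cases the returned dataframe has the default integer index, and the original dataframe is left unchanged.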

Conclusion

In this article, we have discussed how to create an index for a pandas dataframe. We have also created multilevel indices and learnt how to remove the index from a pandas dataframe. To learn more about Python programming, you can read this article on list comprehension in Python. If you are into machine learning, you can read this article on regular expressions in machine learning.

Stay tuned for more informative articles.

Happy Learning!
