Data analysis often involves text that must be cleaned before useful information can be extracted from it. Cleaning may mean extracting or removing certain text, or replacing certain symbols and terms with other text. In this article, we will look at punctuation marks and at several methods for removing them from Python strings.
What is a punctuation mark?
English uses several symbols, such as the comma, hyphen, question mark, dash, exclamation mark, colon, semicolon, parentheses and brackets, which are called punctuation marks. They serve grammatical purposes in written English, but when we process text in Python we often need to remove them from our strings. We will now look at different methods to remove punctuation marks from a string in Python.
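As a point of reference, Python's standard library already lists the ASCII punctuation characters in the constant string.punctuation. The short sketch below only prints that constant and tests two characters against it. It covers ASCII punctuation only, so typographic quotes and other non-ASCII marks are not included.

```python
import string

# string.punctuation is a constant holding the 32 ASCII punctuation characters.
print(string.punctuation)
print(len(string.punctuation))    # 32

# Membership testing classifies a single character.
print("!" in string.punctuation)  # True
print("a" in string.punctuation)  # False
```

The constant does not contain a space, so using it for removal leaves the spacing between words intact.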
Removing punctuation marks from string using for loop
In this method, we first create an empty string that will hold the output. We then iterate through each character of the input string and check whether it is a punctuation mark. If it is, we skip it. Otherwise, we append it to the output string using string concatenation.
For example, in the code given below, all the punctuation marks are stored in a string named punctuation. We iterate through the input string myString using a for loop and check whether each character is present in the punctuation string. If it is not present, the character is appended to the output string newString. Note that the punctuation string used here also contains a space, so spaces in the input would be removed as well.
# Raw string, so the backslash is kept literally
punctuation = r'''!()-[]{};:'"\, <>./?@#$%^&*_~'''
print("The punctuation marks are:")
print(punctuation)
myString = "Python.:F}or{Beg~inn;ers"
print("Input String is:")
print(myString)
newString = ""
for x in myString:
    # Keep only the characters that are not punctuation marks
    if x not in punctuation:
        newString = newString + x
print("Output String is:")
print(newString)
Output
The punctuation marks are:
!()-[]{};:'"\, <>./?@#$%^&*_~
Input String is:
Python.:F}or{Beg~inn;ers
Output String is:
PythonForBeginners
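The same character-by-character filter can also be written with a generator expression and str.join(), which builds the result in one step instead of concatenating inside the loop. This is a minimal sketch, not the article's own code: the function name remove_punctuation is hypothetical, and it defaults to the standard library's string.punctuation rather than the hand-written string above.

```python
import string


def remove_punctuation(text, marks=string.punctuation):
    """Hypothetical helper: return text without any character found in marks."""
    # A set gives fast membership checks for each character.
    to_drop = set(marks)
    # join() assembles the kept characters into a single new string.
    return "".join(ch for ch in text if ch not in to_drop)


print(remove_punctuation("Python.:F}or{Beg~inn;ers"))  # PythonForBeginners
print(remove_punctuation("Hello, World!"))             # Hello World
```

Because string.punctuation contains no space, the space in "Hello, World!" is preserved.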
Remove punctuation marks from python string using regular expressions
We can also remove punctuation marks from strings in Python using regular expressions. For this we will use the re module, which provides functions for processing strings using regular expressions.
In this method, we use the re.sub() function to replace every character that is neither a word character nor a whitespace character with an empty string. Word characters are letters, digits and the underscore, so this removes punctuation marks other than the underscore.
The syntax is re.sub(pattern, replacement, input_string), where pattern is a regular expression describing the text to be replaced. In our case, the pattern [^\w\s] matches any single character that is not a word character or whitespace. replacement is the string substituted for each match; in our case it is an empty string, since we only want to remove the punctuation marks. input_string is the string to be processed.
Example:
import re
myString = "Python.:F}or{Beg~inn;ers"
print("Input String is:")
print(myString)
emptyString = ""
# Replace everything that is not a word character or whitespace
newString = re.sub(r'[^\w\s]', emptyString, myString)
print("Output String is:")
print(newString)
Output
Input String is:
Python.:F}or{Beg~inn;ers
Output String is:
PythonForBeginners
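One detail of this pattern is easy to miss: \w matches the underscore, so [^\w\s] leaves underscores in place. The sketch below shows the difference on a made-up sample string, and one possible way to remove underscores too by adding "_" as an alternative in the pattern.

```python
import re

text = "snake_case, kebab-case!"

# \w matches underscores, so [^\w\s] leaves them in place.
print(re.sub(r"[^\w\s]", "", text))    # snake_case kebabcase

# Adding "_" as an alternative removes underscores as well.
print(re.sub(r"[^\w\s]|_", "", text))  # snakecase kebabcase
```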
Remove punctuation marks from python string using replace() method
The replace() method, when invoked on a string, takes a substring to search for and a replacement string, and returns a new string in which every occurrence of the first is replaced by the second.
We can use replace() to remove punctuation from a Python string by replacing each punctuation mark with an empty string. We iterate over the punctuation marks one by one and replace each with an empty string in our text.
The syntax is replace(character1, character2), where character1 is the text to be replaced and character2 is the text that replaces it. In our case, character1 will be a punctuation mark and character2 will be an empty string.
punctuation = r'''!()-[]{};:'"\, <>./?@#$%^&*_~'''
myString = "Python.:F}or{Beg~inn;ers"
print("Input String is:")
print(myString)
emptyString = ""
# Replace each punctuation mark in turn with an empty string
for x in punctuation:
    myString = myString.replace(x, emptyString)
print("Output String is:")
print(myString)
Output
Input String is:
Python.:F}or{Beg~inn;ers
Output String is:
PythonForBeginners
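Since strings are immutable, each call to replace() returns a new string rather than changing the original. A minimal sketch of the same loop wrapped in a function makes this visible, because the caller's string stays untouched. The name strip_with_replace is hypothetical, and the default set of marks is assumed to be string.punctuation.

```python
import string


def strip_with_replace(text, marks=string.punctuation):
    """Hypothetical helper: remove every character in marks using str.replace()."""
    for mark in marks:
        # replace() returns a new string; rebind the local name to it.
        text = text.replace(mark, "")
    return text


original = "Python.:F}or{Beg~inn;ers"
cleaned = strip_with_replace(original)
print(original)  # Python.:F}or{Beg~inn;ers
print(cleaned)   # PythonForBeginners
```

This approach scans the text once per punctuation mark, so it does more work than a single pass over the text.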
Remove punctuation marks from python string using translate() method
The translate() method returns a copy of the string in which each character has been mapped through the translation table passed as its argument. The table specifies which characters are replaced by which characters, and which are deleted. A character with no entry in the table is left unchanged.
The syntax is translate(translation_dictionary), where translation_dictionary is a Python dictionary that maps characters of the input string (as Unicode code points) to their replacements, or to None for characters that are to be deleted.
To create the translation table, we can use the str.maketrans() method. It takes a string of characters to be replaced, a string of replacement characters and, optionally, a string of characters to be deleted, and returns a Python dictionary that works as the translation table.
The syntax is maketrans(pattern1, pattern2, optional_pattern). Here pattern1 is a string containing all the characters to be replaced, and pattern2 is a string containing their replacements; the two strings must have the same length. optional_pattern is a string containing the characters to be deleted from the input text.
To remove punctuation from a Python string, we pass empty strings as pattern1 and pattern2 and pass the string of punctuation marks as optional_pattern. The resulting table maps every punctuation mark to None, so translate() deletes all of them from the output string.
Example
punctuation = r'''!()-[]{};:'"\, <>./?@#$%^&*_~'''
myString = "Python.:F}or{Beg~inn;ers"
print("Input String is:")
print(myString)
# The third argument lists the characters to delete
translationTable = str.maketrans("", "", punctuation)
newString = myString.translate(translationTable)
print("Output String is:")
print(newString)
Output
Input String is:
Python.:F}or{Beg~inn;ers
Output String is:
PythonForBeginners
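To make the translation table less abstract, the sketch below prints a small table built by str.maketrans() and then shows the two-argument form, which replaces characters instead of deleting them. The sample strings are made up for illustration.

```python
# Third argument: characters to delete. Each one maps to None in the table.
table = str.maketrans("", "", "!?")
print(table)                        # {33: None, 63: None}
print("Really?!".translate(table))  # Really

# First two arguments: one-to-one replacements (strings of equal length).
to_spaces = str.maketrans("-_", "  ")
print("well-known snake_case".translate(to_spaces))  # well known snake case
```

The keys are Unicode code points (33 is "!" and 63 is "?"), which is why the table is a dictionary of integers and not of characters.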
Conclusion
In this article, we have seen how to remove punctuation marks from strings in Python using a for loop, regular expressions and built-in string methods such as replace() and translate(). Stay tuned for more informative articles.