When working with Python strings, we often need to remove certain characters, such as punctuation. This comes up in data preprocessing in the data science domain and also in day-to-day programming. Let's discuss several ways to perform this task in Python.
Method 1: Remove Punctuation from a String with Translate
The str.translate method takes a translation table built with str.maketrans. The first two arguments to str.maketrans are empty strings, and the third is a string of the punctuation characters that should be removed (here string.punctuation). Every character listed in the third argument is mapped to None, which tells translate to delete it. This is one of the fastest ways to strip punctuation from a string.
Python3
import string

test_str = 'Gfg, is best: for ! Geeks ;'

# Map every punctuation character to None, then apply the table
test_str = test_str.translate(str.maketrans('', '', string.punctuation))
print(test_str)
Output:
Gfg is best for  Geeks
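To make the role of the translation table more concrete, here is a minimal sketch that builds the table once and reuses it. The names `PUNCT_TABLE` and `strip_punctuation` are hypothetical and not part of the article. Note that `string.punctuation` only covers ASCII punctuation, so characters such as curly quotes would be left untouched.

```python
import string

# Build the table once; each punctuation code point maps to None (delete).
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def strip_punctuation(text):
    """Return text with all ASCII punctuation removed (hypothetical helper)."""
    return text.translate(PUNCT_TABLE)


# The table is a dict keyed by code point; a value of None means "delete".
print(PUNCT_TABLE[ord(',')])
print(strip_punctuation('Gfg, is best: for ! Geeks ;'))
```

Building the table outside the function avoids recreating it on every call, which matters if the helper is applied to many strings.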
Method 2: Remove Punctuation from a String with Python loop
This is the brute-force approach. We define a raw string containing the punctuation characters, walk through the input, and whenever a character appears in that string we remove every occurrence of it with replace.
Python3
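test_str = "Gfg, is best : for ! Geeks ;"
print("The original string is : " + test_str)

# Punctuation characters to strip (assumed set; the literal is missing in the source)
punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

# Remove every occurrence of each punctuation character found in the input
for ele in test_str:
    if ele in punc:
        test_str = test_str.replace(ele, "")

print("The string after punctuation filter : " + test_str)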
Output:
The original string is : Gfg, is best : for ! Geeks ;
The string after punctuation filter : Gfg is best  for  Geeks
Method 3: Remove Punctuation from a String with regex
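Punctuation can also be removed with a regular expression. The pattern [^\w\s] matches any character that is neither a word character (letter, digit, or underscore) nor whitespace, and re.sub replaces every match with an empty string. Note that underscores are kept, because \w matches them.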
Python3
import re

test_str = "Gfg, is best : for ! Geeks ;"
print("The original string is : " + test_str)

# Drop everything that is not a word character or whitespace
res = re.sub(r'[^\w\s]', '', test_str)

print("The string after punctuation filter : " + res)
Output:
The original string is : Gfg, is best : for ! Geeks ;
The string after punctuation filter : Gfg is best  for  Geeks
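The way this pattern treats underscores is easy to miss, so here is a small check. The second pattern is an assumed variant, not taken from the article, that removes underscores as well; the sample string is made up for illustration.

```python
import re

text = "snake_case, is best : for ! Geeks ;"

# Pattern from the article: underscores survive because \w matches "_".
keeps_underscore = re.sub(r'[^\w\s]', '', text)

# Assumed variant: also treat "_" as punctuation.
drops_underscore = re.sub(r'[^\w\s]|_', '', text)

print(keeps_underscore)
print(drops_underscore)
```

Which behavior is right depends on the data: identifiers and file names usually want the underscore kept, while free text often does not.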
Method 4: Using for loop, punctuation string and not in operator
Python3
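test_str = "Gfg, is best : for ! Geeks ;"
print("The original string is : " + test_str)

# Punctuation characters to strip (assumed set; the literal is missing in the source)
punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

# Start from an empty string and keep only non-punctuation characters
res = ""
for ele in test_str:
    if ele not in punc:
        res += ele

print("The string after punctuation filter : " + res)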
Output:
The original string is : Gfg, is best : for ! Geeks ;
The string after punctuation filter : Gfg is best  for  Geeks
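Time and space complexity are the same for Methods 1, 3, and 4, where n is the length of the string:
Time Complexity: O(n)
Auxiliary Space: O(n)
Method 2 can be slower in the worst case, because it calls replace, which rescans the whole string, once for each punctuation character it meets.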
From an efficiency perspective, you're not going to beat this (Python 2):
s.translate(None, string.punctuation)
In Python 3, use the following code instead:
s.translate(str.maketrans('', '', string.punctuation))
It performs raw string operations in C with a lookup table. Not much will beat that, short of writing your own C code.
If speed isn't a concern, another option is:
exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)
This is faster than s.replace with each char, but won't perform as well as non-pure python approaches such as regexes or string.translate, as you can see from the below timings. For this type of problem, doing it at as low a level as possible pays off.
Timing code (Python 2):
import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("", "")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

print "sets      :", timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :", timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :", timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :", timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)
This gives the following results (total seconds for 1,000,000 calls each):
sets : 19.8566138744
regex : 6.86155414581
translate : 2.12455511093
replace : 28.4436721802