How do you replace special characters in python?

I am using urllib to get a string of html from a website and need to put each word in the html document into a list.

Here is the code I have so far. I keep getting an error. I have also copied the error below.

import urllib.request

url = input("Please enter a URL: ")

z=urllib.request.urlopen(url)
z=str(z.read())
removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./?\|`~-=_+", " ")

words = removeSpecialChars.split()

print("Words list: ", words[0:20])

Here is the error.

Please enter a URL: //simleyfootball.com
Traceback (most recent call last):
  File "C:\Users\jeremy.KLUG\My Documents\LiClipse Workspace\Python Project 2\Module2.py", line 7, in <module>
    removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./?\|`~-=_+", " ")
TypeError: replace() takes at least 2 arguments (1 given)
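The error comes from calling replace on the str type itself instead of on a string value. When a method is called through the class, the first argument is used as the string to operate on (self), so two arguments leave replace() one short. A minimal sketch of the difference; the sample text and variable names are made up for illustration:

```python
# Hypothetical sample text standing in for the downloaded HTML
text = "a+b"

# Calling the method on an instance: text is passed implicitly as self
via_instance = text.replace("+", " ")

# Calling it on the class: the string must be passed explicitly as the first argument
via_class = str.replace(text, "+", " ")

print(via_instance, via_class)  # both values are "a b"

# With only two arguments, the first one is consumed as self,
# so replace() is left with a single argument and raises TypeError
try:
    str.replace("+", " ")
except TypeError as exc:
    print("TypeError:", exc)
```

The exact wording of the TypeError message differs between Python versions.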

asked Jun 2, 2014 at 13:47

One way is to use re.sub; that's my preferred way.

import re
my_str = "hey th~!ere"
my_new_string = re.sub(r'[^a-zA-Z0-9 \n\.]', '', my_str)
print(my_new_string)

Output:

hey there

Another way is to use re.escape:

import string
import re

my_str = "hey th~!ere"

chars = re.escape(string.punctuation)
print(re.sub(r'['+chars+']', '', my_str))

Output:

hey there
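For the question's actual goal (a list of words), replacing each punctuation character with a space instead of deleting it keeps neighbouring words from being glued together. A sketch using the same re.escape(string.punctuation) character class; the sample sentence and variable names are made up:

```python
import re
import string

# Character class matching any single ASCII punctuation character
punct_class = "[" + re.escape(string.punctuation) + "]"

sample = "hey,there how-are you"

deleted = re.sub(punct_class, "", sample)   # punctuation removed outright
spaced = re.sub(punct_class, " ", sample)   # punctuation turned into spaces

print(deleted.split())  # ['heythere', 'howare', 'you']
print(spaced.split())   # ['hey', 'there', 'how', 'are', 'you']
```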

A small style tip: PEP 8 recommends snake_case for variable names, so remove_special_chars rather than removeSpecialChars.

Also, the pattern [^a-zA-Z0-9 \n\.] keeps spaces, because the space is inside the negated class. If you want to remove spaces as well, change it to [^a-zA-Z0-9\n\.]
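A side-by-side sketch of the two patterns on the same made-up input, showing that the only difference is whether spaces survive:

```python
import re

my_str = "hey th~!ere"

# Space is inside the negated class, so spaces are kept
keep_spaces = re.sub(r"[^a-zA-Z0-9 \n\.]", "", my_str)

# Space removed from the class, so spaces are stripped as well
drop_spaces = re.sub(r"[^a-zA-Z0-9\n\.]", "", my_str)

print(keep_spaces)  # hey there
print(drop_spaces)  # heythere
```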

answered Jun 2, 2014 at 14:01

Kobi K

str.replace is the wrong function for what you want to do (apart from it being used incorrectly). You want to replace any single character from a set with a space, not the whole set as one substring (the latter is what replace does). You can use translate like this:

removeSpecialChars = z.translate({ord(c): " " for c in "!@#$%^&*()[]{};:,./?\|`~-=_+"})

This builds a mapping from every character in your list of special characters to a space, then calls translate() on the string, which replaces every occurrence of any of those characters with a space.
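The same table can also be built with str.maketrans, which additionally accepts a third argument listing characters to delete outright. A sketch; the SPECIAL constant and the sample string are illustrative, not from the answer:

```python
# Characters treated as "special" (the backslash is written as "\\")
SPECIAL = "!@#$%^&*()[]{};:,./?\\|`~-=_+"

sample = "one,two;three(four)"

# Map every special character to a space (same effect as the dict comprehension)
to_space = str.maketrans(SPECIAL, " " * len(SPECIAL))

# Third argument: characters that translate() should delete
to_delete = str.maketrans("", "", SPECIAL)

print(sample.translate(to_space).split())  # ['one', 'two', 'three', 'four']
print(sample.translate(to_delete))         # onetwothreefour
```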

answered Jun 2, 2014 at 14:02

rassahah

You need to call replace on z and not on str, since you want to replace characters in the string variable z:

removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./?\|`~-=_+", " ")

But this will not work as intended, because replace looks for that exact substring. You will most likely need the regular expression module re and its sub function:

import re
# Inside [...], the characters [ ] \ and - are escaped so they are matched literally
removeSpecialChars = re.sub(r"[!@#$%^&*()\[\]{};:,./?\\|`~\-=_+]", " ", z)

Don't forget the surrounding [], which mark this as a set of characters to be replaced, and escape the characters that are special inside a set (such as ], \ and -).
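Escaping the set by hand is easy to get wrong. A sketch that lets re.escape do the escaping instead; SPECIAL and the sample value of z are illustrative assumptions:

```python
import re

SPECIAL = "!@#$%^&*()[]{};:,./?\\|`~-=_+"

# re.escape backslash-escapes regex metacharacters, so the resulting set
# matches each character in SPECIAL literally
pattern = re.compile("[" + re.escape(SPECIAL) + "]")

z = "Hello, world! [test] a-b_c"
print(pattern.sub(" ", z).split())  # ['Hello', 'world', 'test', 'a', 'b', 'c']
```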

answered Jun 2, 2014 at 13:58

Danny M

replace operates on a specific string, so you need to call it like this:

removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./?\|`~-=_+", " ")

But this is probably not what you need, since it looks for a single substring containing all those characters in that exact order. You can do it with a regex instead, as Danny Michaud pointed out.
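A tiny sketch of that behaviour on made-up strings: replace only fires when the characters appear together, in order.

```python
text = "a!b@c"

# replace() searches for the exact substring "!@", which does not occur here
print(text.replace("!@", " "))    # a!b@c  (unchanged)

# It only matches when the characters are adjacent and in that order
print("a!@b".replace("!@", " "))  # a b
```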

As a side note, you might want to look at BeautifulSoup, a library for parsing the messy HTML you usually get from scraping websites.
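BeautifulSoup is a third-party package. If you would rather stay in the standard library, here is a rough sketch that swaps in html.parser.HTMLParser to pull out the text between tags before splitting it into words. It also decodes the bytes from urlopen(...).read() rather than wrapping them in str(), which would give the "b'...'" representation. The TextExtractor class and the html_bytes sample are hypothetical, UTF-8 is an assumed encoding, and unlike a full parser setup this does not skip script or style contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper that collects the text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags
        self.chunks.append(data)


# Stand-in for urllib.request.urlopen(url).read(), which returns bytes
html_bytes = b"<html><body><h1>Hi there</h1><p>Some <b>bold</b> text.</p></body></html>"

# Decode the bytes; UTF-8 is an assumption, real pages may declare another charset
html_text = html_bytes.decode("utf-8", errors="replace")

parser = TextExtractor()
parser.feed(html_text)
parser.close()

words = " ".join(parser.chunks).split()
print(words)  # ['Hi', 'there', 'Some', 'bold', 'text.']
```

Punctuation such as the trailing period is still attached, so one of the replacement techniques above would still be applied to the extracted text.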

answered Jun 2, 2014 at 13:51

Pavel

You can replace the special characters with the desired characters as follows:

specialCharacterText = "H#y #@w @re &*]?"
inCharSet = "!@#$%^&*()[]{};:,./?\\|`~-=_+\""
# One output character per character in inCharSet; here every one becomes a space
outCharSet = " " * len(inCharSet)
# Python 3: str.maketrans replaces Python 2's string.maketrans
splCharReplaceList = str.maketrans(inCharSet, outCharSet)
splCharFreeString = specialCharacterText.translate(splCharReplaceList)
print(splCharFreeString)

answered Feb 12, 2015 at 16:08

surendran

Translate seems faster:

N=100000, 30 special characters, string length=70

replace:   0.3251810073852539
re.sub:    0.2859320640563965
translate: 0.12320685386657715
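The answer does not show how these numbers were measured. A rough sketch of a comparable benchmark with timeit; the sample text, the SPECIAL character set, the function names and N = 10_000 (smaller than the answer's 100000, to keep it quick) are assumptions, and absolute timings will vary by machine and Python version.

```python
import re
import timeit

SPECIAL = "!@#$%^&*()[]{};:,./?\\|`~-=_+"
text = "Some text, with (special) chars; like #hash & ~tilde~ and more!"


def with_replace(s):
    # One str.replace pass per special character
    for ch in SPECIAL:
        s = s.replace(ch, " ")
    return s


PATTERN = re.compile("[" + re.escape(SPECIAL) + "]")


def with_re_sub(s):
    # A single regex pass over the string
    return PATTERN.sub(" ", s)


TABLE = str.maketrans(SPECIAL, " " * len(SPECIAL))


def with_translate(s):
    # A single pass using a precomputed translation table
    return s.translate(TABLE)


# All three approaches should produce the same string
assert with_replace(text) == with_re_sub(text) == with_translate(text)

N = 10_000
for name, func in [("replace", with_replace), ("re.sub", with_re_sub), ("translate", with_translate)]:
    seconds = timeit.timeit(lambda: func(text), number=N)
    print(f"{name}: {seconds:.3f} s")
```

Precompiling the pattern and the translation table outside the timed functions keeps setup cost out of the measurement.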

answered Sep 2 at 17:35

Yano

How do you replace special characters?

To insert special characters directly into the Find and Replace dialog box:
Select the Home tab in the ribbon.
Select Replace in the Editing group.
Ensure that the special character appears in the Find what text box in the Find and Replace dialog box. ...
Enter the replacement character in the Replace with text box.

How do I remove special characters from a text in Python?

You can remove special characters from a string in Python using the str.isalnum() method. str.isalnum() returns True if every character in the string is alphanumeric (and the string is not empty), meaning the string contains no special characters.
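A few quick checks of how str.isalnum() behaves on made-up inputs; note that spaces count as non-alphanumeric and that non-ASCII letters count as alphanumeric:

```python
print("abc123".isalnum())   # True
print("abc 123".isalnum())  # False: the space is not alphanumeric
print("abc!".isalnum())     # False
print("".isalnum())         # False: an empty string has no characters to test
print("é".isalnum())        # True: Unicode letters count as alphanumeric
```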

How do I remove special characters from a data in Python?

Python's string method .isalnum() returns True if the string consists only of alphanumeric characters and False otherwise. You can use it to loop over a string and append only the alphanumeric characters to a new string.
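A sketch of that loop. It collects the kept characters in a list and joins them once, a common alternative to repeated string concatenation; the keep_alnum name and its keep_spaces option are hypothetical additions, not part of the text above.

```python
def keep_alnum(text, keep_spaces=True):
    """Drop every character that is not alphanumeric (hypothetical helper)."""
    kept = []
    for ch in text:
        # Keep letters and digits; optionally keep plain spaces too
        if ch.isalnum() or (keep_spaces and ch == " "):
            kept.append(ch)
    # Join once at the end instead of building many intermediate strings
    return "".join(kept)


print(keep_alnum("hey th~!ere"))                     # hey there
print(keep_alnum("hey th~!ere", keep_spaces=False))  # heythere
```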

How do you replace special characters in a string?

Example of removing special characters using the replaceAll() method:

public class RemoveSpecialCharacterExample1 {
    public static void main(String args[]) {
        String str = "This#string%contains^special*characters&.";
        str = str.replaceAll("[^a-zA-Z0-9]", " ");
        System.out.println(str);
    }
}
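The Python counterpart of that Java call is re.sub with the same pattern. A sketch using the same sample string; repr() is used only to make the trailing spaces visible:

```python
import re

text = "This#string%contains^special*characters&."

# Same pattern as the Java example: anything that is not an ASCII letter or digit
cleaned = re.sub(r"[^a-zA-Z0-9]", " ", text)

print(repr(cleaned))  # 'This string contains special characters  '
```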
