Guide: BeautifulSoup remove HTML tags

How can I simply strip all tags from an element I find in BeautifulSoup?

Hugo

asked Apr 25, 2013 at 4:26

Signal et Communication
Ingénierie Réseaux et Télécommunications

The expected result is:

Signal et Communication
Ingénierie Réseaux et Télécommunications

Here is the source code:

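#!/usr/bin/env python3
from bs4 import BeautifulSoup

text = '''
Signal et Communication

Ingénierie Réseaux et Télécommunications
'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(text, 'html.parser')
# get_text() returns only the text content, with all tags removed
print(soup.get_text())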
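As a small illustration of the get_text approach, the sketch below shows the `separator` and `strip` arguments, which help when the text of adjacent elements would otherwise run together. The sample markup (a table cell holding two links) is an assumption made for illustration, not the asker's actual HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, for illustration only
html = (
    '<td><a href="#">Signal et Communication</a><br/>'
    '<a href="#">Ingénierie Réseaux et Télécommunications</a></td>'
)

soup = BeautifulSoup(html, "html.parser")

# Default: text nodes are concatenated with nothing between them
plain = soup.get_text()

# separator is placed between text nodes; strip=True trims each node
lines = soup.get_text(separator="\n", strip=True)

print(lines)
```

With a newline as the separator, each link's text lands on its own line, which matches the shape of the expected result in the question.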

answered Jul 20, 2015 at 16:37

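import bs4

# Sample markup assumed from the BeautifulSoup documentation's example link
soup = bs4.BeautifulSoup('<a href="http://example.com/">I linked to <i>example.com</i></a>', 'html.parser')
# Iterate over a copy of the children, because decompose() modifies the tree
for a in list(soup.find('a').children):
    if isinstance(a, bs4.element.Tag):
        a.decompose()
print(soup)

Out: <a href="http://example.com/">I linked to </a>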
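Note that `decompose()` deletes a nested tag together with its text. If the goal is to drop the tags but keep what they contain, `Tag.unwrap()` may be a better fit. A minimal sketch, using an assumed sample string:

```python
from bs4 import BeautifulSoup

# Assumed sample markup, for illustration only
html = "<p>I linked to <i>example.com</i> and <b>more</b></p>"
soup = BeautifulSoup(html, "html.parser")

# find_all(True) matches every tag under <p>;
# unwrap() replaces each tag with its own children
for tag in soup.p.find_all(True):
    tag.unwrap()

result = str(soup)
print(result)
```

Here the `<i>` and `<b>` wrappers disappear while their text stays in place inside the paragraph.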

answered Oct 17, 2013 at 22:37

danblack

Code to simply get the contents as text instead of html:

'html_text' is the HTML string you pass in to get its text:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_text, 'lxml')
text = soup.get_text()
print(text)
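The same idea can be wrapped in a small reusable function. The sketch below is one possible shape, not part of the answer: the name `html_to_text` is hypothetical, it uses the built-in `html.parser` so that `lxml` need not be installed, and it removes `<script>` and `<style>` elements first, on the assumption that their contents are not wanted in the output.

```python
from bs4 import BeautifulSoup


def html_to_text(html_text: str) -> str:
    """Return the text of an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Drop <script> and <style> so their contents cannot end up in the text
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the remaining text nodes with single spaces, trimming each one
    return soup.get_text(separator=" ", strip=True)


# Assumed sample input, for illustration only
sample = (
    "<style>p {color: red}</style><p>Hello</p>"
    "<script>var x = 1;</script><p>world</p>"
)
print(html_to_text(sample))
```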

answered May 18, 2020 at 8:53

It looks like this is the way to do it, as simple as that.

This line joins together all the text parts within the current element:

''.join(htmlelement.find_all(text=True))
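A minimal sketch of joining text nodes, assuming a small sample element. It uses `find_all`, since `find` returns only the first match, and the `string=` argument, which is the newer spelling of `text=` (available from BeautifulSoup 4.4.0 onward).

```python
from bs4 import BeautifulSoup

# Assumed sample markup, for illustration only
html = "<div>Hello <b>big</b> <i>world</i></div>"
element = BeautifulSoup(html, "html.parser").div

# find_all(string=True) returns every text node under the element
joined = "".join(element.find_all(string=True))

# stripped_strings yields the same text nodes with whitespace trimmed,
# skipping nodes that are empty after trimming
words = list(element.stripped_strings)

print(joined)
print(words)
```

For this input the joined string is the same as what `element.get_text()` returns, so the join is mainly useful when the individual text nodes need filtering first.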

answered Apr 25, 2013 at 4:46

Daniele B

Here is the source code. It fetches the page at the given URL and prints exactly the text it contains:

import requests
import bs4

URL = ''
page = requests.get(URL)
# Parse the response body and keep only its text
text = bs4.BeautifulSoup(page.content, 'html.parser').get_text()
print(text)

answered Mar 10, 2020 at 15:08
