I'm new to Python. Here is my code, which works on Python 2.7.5:
import urllib2
import sys
url = "mydomain.com"
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
print data
This fetches the HTML markup, and it works.
What I want is to extract the value inside a tag. For example, I need the Data value from this example:
<font class="big">Data</font>
How can I do this?
asked Sep 6, 2013 at 11:38
You can use an HTML parser module such as BeautifulSoup:
import urllib2
from bs4 import BeautifulSoup as BS
url = "mydomain.com"
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
soup = BS(data)
# Find the first <font> tag with class="big" and print its text
print soup.find('font', {'class': 'big'}).text
This finds a <font> tag with class="big" and then prints its text content.
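In Python 3 the same idea might look like the sketch below. It parses an inline HTML string rather than a fetched page, and the helper name `first_text` is hypothetical. Because `find()` returns `None` when nothing matches, the helper checks for that before reading the text.

```python
from bs4 import BeautifulSoup


def first_text(html, tag, css_class):
    """Return the text of the first matching tag, or None if there is no match."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(tag, class_=css_class)
    # find() returns None when no tag matches, so guard before using it
    return node.get_text() if node is not None else None


# Hypothetical sample markup standing in for the downloaded page
sample = '<html><body><font class="big">Data</font></body></html>'
print(first_text(sample, "font", "big"))    # prints: Data
print(first_text(sample, "font", "small"))  # prints: None
```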
answered Sep 6, 2013 at 11:39
TerryA
Using lxml:
import urllib2
import lxml.html
url = "mydomain.com"
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
for font in lxml.html.fromstring(data).cssselect('font.big'):
    print font.text
>>> import lxml.html
>>> root = lxml.html.fromstring('<font class="big">Data</font>')
>>> [font.text for font in root.cssselect('font.big')]
['Data']
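In current lxml releases, `cssselect()` relies on the separate `cssselect` package. If that package is not installed, an XPath query is one alternative. This is a minimal sketch with made-up markup, assuming an exact match on the class attribute is good enough: unlike the CSS selector, `@class="big"` does not match `class="big red"`.

```python
import lxml.html

# Hypothetical sample markup; the second <font> has no class and is skipped
root = lxml.html.fromstring(
    '<div><font class="big">Data</font><font>Other</font></div>'
)

# XPath: every <font> whose class attribute is exactly "big"
texts = [font.text for font in root.xpath('//font[@class="big"]')]
print(texts)  # prints: ['Data']
```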
answered Sep 6, 2013 at 11:40
falsetru
Prerequisites: BeautifulSoup
In this article, we will discuss how BeautifulSoup can be used to find a tag with a given attribute value in an HTML document.
Approach:
- Import the module.
- Scrape data from a web page.
- Parse the scraped string as HTML.
- Use the find() function to locate the tag by its attribute.
- Print the result.
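The steps above can be sketched as follows. The function names `fetch_html` and `find_by_id` are hypothetical, and an inline sample stands in for a scraped page so the sketch runs without network access.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Scrape step: download a page and return its body as raw bytes."""
    with urlopen(url) as response:
        return response.read()


def find_by_id(html, element_id):
    """Parse the HTML, then return the first tag with the given id (or None)."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(id=element_id)


# Inline sample used instead of calling fetch_html(), to keep the sketch offline
html = '<html><body><div id="container">Div Content</div></body></html>'
tag = find_by_id(html, "container")
print(tag.name)        # print step: the tag name
print(tag.get_text())  # and its text content
```

Keeping the download separate from the parsing makes the parsing half easy to try out on a literal string before pointing it at a live page.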
Syntax: find(attr_name="value")
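Keyword arguments only work for attribute names that are valid Python identifiers. For names such as `data-id`, BeautifulSoup accepts an `attrs` dictionary, and for `class` (a reserved word in Python) it offers the `class_` keyword. A short sketch with made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
markup = (
    '<ul>'
    '<li class="item" data-id="7">First</li>'
    '<li class="item" data-id="8">Second</li>'
    '</ul>'
)
soup = BeautifulSoup(markup, "html.parser")

# "data-id" is not a valid keyword name, so pass it through attrs
by_data = soup.find(attrs={"data-id": "8"})
print(by_data.get_text())   # prints: Second

# "class" is reserved in Python, so BeautifulSoup uses class_ instead;
# find() returns only the first match
by_class = soup.find("li", class_="item")
print(by_class.get_text())  # prints: First
```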
Below are some implementations of the above approach:
Example 1:
Python3
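from bs4 import BeautifulSoup

# Assumption: the original markup string is missing here; this stand-in is
# hypothetical and only needs a tag with id="container".
markup = '<html><body><div id="container">Div Content</div></body></html>'
soup = BeautifulSoup(markup, 'html.parser')

# Find the first tag whose id attribute is "container"
div_bs4 = soup.find(id="container")
print(div_bs4.name)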
Output:
div
Example 2:
Python3
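from bs4 import BeautifulSoup

# Assumption: the markup string and the find() call are missing from this
# example; the <a> tag and its href value below are hypothetical stand-ins
# chosen to match the output.
markup = '<html><body><a href="https://example.com/">Link</a></body></html>'
soup = BeautifulSoup(markup, 'html.parser')

# Find the first tag whose href attribute equals the given value
div_bs4 = soup.find(href="https://example.com/")
print(div_bs4.name)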
Output:
a
Example 3:
Python3
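from bs4 import BeautifulSoup

# Assumption: the original markup string is missing here; this stand-in is
# hypothetical and only needs a <p> tag with class="gfg".
markup = '<html><body><p class="gfg">Paragraph content</p></body></html>'
soup = BeautifulSoup(markup, 'html.parser')

# class is a reserved word in Python, so BeautifulSoup uses class_
div_bs4 = soup.find(class_="gfg")
print(div_bs4.name)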
Output:
p