How do I read the contents of a website in Python?

The following works when I paste it into the browser:

http://www.somesite.com/details.pl?urn=2344

But when I try reading the URL with Python nothing happens:

 link = 'http://www.somesite.com/details.pl?urn=2344'
 f = urllib.urlopen(link)
 myfile = f.readline()
 print myfile

Do I need to encode the URL, or is there something I'm not seeing?

Martin Thoma


asked Feb 28, 2013 at 14:55

For Python 3 users, to save time, use the following code:

from urllib.request import urlopen

link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"

f = urlopen(link)
myfile = f.read()
print(myfile)

I know there are other threads about the error "NameError: name 'urlopen' is not defined", but I thought this might save time.

Asclepius


answered Aug 25, 2017 at 17:38

i.n.n.m

None of these answers are very good for Python 3 (tested on the latest version at the time of this post).

This is how you do it...

import urllib.error
import urllib.request

try:
    with urllib.request.urlopen('https://www.python.org/') as f:
        print(f.read().decode('utf-8'))
except urllib.error.URLError as e:
    print(e.reason)

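The above assumes the response body is encoded as UTF-8. If you remove .decode('utf-8'), read() returns the raw bytes unchanged; Python does not guess the encoding for you, so you have to pick one yourself before treating the result as text.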

Documentation: https://docs.python.org/3/library/urllib.request.html#module-urllib.request
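If you would rather not hard-code UTF-8, one option is to read the charset the server declares in its Content-Type header and fall back to UTF-8 when none is given. This is only a sketch: decode_body and fetch_text are hypothetical helper names, the fallback choice is an assumption, and the demonstration uses a hand-built header object so it runs without a network connection.

```python
from email.message import Message
from urllib.request import urlopen


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset declared in Content-Type, if any."""
    # get_content_charset() returns None when no charset parameter is present
    charset = headers.get_content_charset() or fallback
    # errors="replace" substitutes undecodable bytes instead of raising
    return raw.decode(charset, errors="replace")


def fetch_text(url):
    """Hypothetical helper: download a page and return its body as str."""
    with urlopen(url) as response:
        return decode_body(response.read(), response.headers)


# Offline demonstration with a hand-built header object (no request is made)
demo_headers = Message()
demo_headers["Content-Type"] = "text/html; charset=iso-8859-1"
demo_text = decode_body(b"caf\xe9", demo_headers)
print(demo_text)
```

For a real page you would call fetch_text with the URL. Note that a server can declare a charset name Python does not know, in which case decode() raises LookupError.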

answered May 24, 2019 at 14:50

Freddie

A solution that works with both Python 2.x and Python 3.x uses the Python 2 and 3 compatibility library six:

from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)

answered Jan 20, 2015 at 8:17

Martin Thoma

We can read a website's HTML content as follows:

from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)

answered Mar 8, 2018 at 9:21

Akash Kinwad

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on Python 3 and Python 2.
# For servers that do not require a User-Agent header.

import sys
from contextlib import closing

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen

# closing() is used because Python 2's urlopen() result is not a context manager.
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()

print(data)

# For servers that reject requests without a browser-like User-Agent header.
# Works on Python 3.

import urllib.request

user_agent = (
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) '
    'Gecko/2009021910 Firefox/3.0.7'
)

url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()
print(data)

answered Aug 24, 2019 at 7:14

The URL should be a string:

# Python 2 only: in Python 3 this function lives at urllib.request.urlopen
import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.readline()
print myfile

answered Feb 28, 2013 at 14:58

ATOzTOA

I used the following code:

# Python 2 only: in Python 3 this function lives at urllib.request.urlopen
import urllib

def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file

read_text()

answered Aug 22, 2017 at 11:00

# Retrieving data from a URL
# Python 3 only

import urllib.request

def main():
    url = "https://docs.python.org"

    # Retrieve data from the URL
    webUrl = urllib.request.urlopen(url)
    print("Result code: " + str(webUrl.getcode()))

    # Print the data from the URL
    print("Returned data: -----------------")
    data = webUrl.read().decode("utf-8")
    print(data)

if __name__ == "__main__":
    main()

answered Nov 27, 2019 at 7:37

from urllib.request import urlopen

# If the page contains Chinese (or other non-ASCII) text, decode the bytes with decode()
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)

codedge


answered May 16, 2020 at 7:59


Can Python pull data from a website?

When scraping data from websites with Python, you're often interested in particular parts of the page. By spending some time looking through the HTML document, you can identify tags with unique attributes that you can use to extract the data you need.
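As a rough illustration of picking out one part of a page by a unique attribute, the sketch below uses only the standard library's html.parser. The class name TextById, the sample markup, and the id "title" are all made up for the example; a real page would need its own selector, found by inspecting its HTML.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.tag = None      # tag name of the element being captured, if any
        self.depth = 0       # nesting depth of same-named tags inside it
        self.done = False    # True once the first match has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.tag is None:
            if not self.done and dict(attrs).get("id") == self.target_id:
                self.tag, self.depth = tag, 1
        elif tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.tag is not None and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.tag = None
                self.done = True

    def handle_data(self, data):
        if self.tag is not None:
            self.chunks.append(data)

    def text(self):
        return "".join(self.chunks).strip()


# Made-up sample markup standing in for a downloaded page
sample_html = (
    '<html><body><h1 id="title">Movie <em>Quotes</em></h1>'
    '<p id="intro">Hello</p></body></html>'
)
parser = TextById("title")
parser.feed(sample_html)
title_text = parser.text()
print(title_text)
```

Third-party parsers offer more convenient selectors; this version is kept to the standard library so it runs anywhere.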

How do you read data from a website?

The process roughly follows these steps:
Inspect the HTML of the website that you want to crawl.
Access the URL of the website using code and download all the HTML contents on the page.
Format the downloaded content into a readable format.
Extract the useful information and save it in a structured format.
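The steps above can be sketched with the standard library alone. Everything named here is an assumption made for illustration: the helpers download, extract_links and to_csv, the choice of links as the "useful information", and CSV as the structured format. The demonstration parses a small inline HTML string instead of calling download, so it runs offline.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the text and absolute URL of every <a href=...> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"text": text, "url": self._href})
            self._href = None


def download(url):
    """Download the page and decode it (UTF-8 is assumed here)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Parse the HTML and return a list of {'text', 'url'} dicts."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def to_csv(rows):
    """Serialise the extracted rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["text", "url"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Inline sample standing in for download("https://example.com/")
sample_html = '<p><a href="/doc/">Docs</a> and <a href="https://example.com/x">X</a></p>'
rows = extract_links(sample_html, "https://example.com/")
csv_text = to_csv(rows)
print(csv_text)
```

With a real site, rows would come from extract_links(download(url), url), and csv_text could be written to a file.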

How do I read a page in Python?

Python - Reading HTML Pages
Install Beautiful Soup. Use the Anaconda package manager to install the required package and its dependent packages. ...
Reading the HTML file. In the below example we make a request to a URL to be loaded into the Python environment. ...
Extracting Tag Value. ...
Extracting All Tags.

How do I extract text from a URL in Python?

URLs can be extracted from a text file with a regular expression: the expression returns every piece of text that matches the pattern. Only the re module is needed for this.
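A minimal sketch of that idea follows. The pattern is a deliberately simple assumption that only handles http and https URLs and will miss or over-match edge cases; extract_urls is a hypothetical helper name, and the sample string stands in for the contents of a text file.

```python
import re

# Simple pattern: scheme, then everything up to whitespace, quotes or angle brackets
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return every http(s) URL found in text, in order of appearance."""
    # Trailing punctuation usually belongs to the sentence, not the URL
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


# Sample text standing in for open("some_file.txt").read()
sample = "See https://docs.python.org/3/library/re.html, or http://example.com/a?b=1."
urls = extract_urls(sample)
print(urls)
```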
