Brad's PyNotes

Urllib Module

TL;DR

The urllib package provides URL handling through four submodules: request (fetch URLs), parse (manipulate URLs), error (exceptions), and robotparser (robots.txt).
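As a quick illustration of the least familiar of the four, robotparser, here is a minimal sketch that checks URLs against robots.txt rules. The rules are an inline sample made up for this example, and `MyApp/1.0` is just a placeholder user agent; a real crawler would normally load the file with `set_url()` and `read()` instead.

```python
from urllib.robotparser import RobotFileParser

# Sample rules, invented for illustration (not taken from a real site)
rules = [
    'User-agent: *',
    'Disallow: /private/',
]

parser = RobotFileParser()
parser.parse(rules)  # parse() accepts an iterable of robots.txt lines

print(parser.can_fetch('MyApp/1.0', 'https://example.com/index.html'))    # True
print(parser.can_fetch('MyApp/1.0', 'https://example.com/private/data'))  # False
```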

Interesting!

Python ships an HTTP client in the standard library, so simple requests need no third-party packages!

Making HTTP Requests

python code snippet start

import urllib.parse
import urllib.request

# Simple GET request
with urllib.request.urlopen('https://httpbin.org/json') as response:
    data = response.read().decode('utf-8')
    print(data)

# POST request: supplying data makes urlopen send a POST instead of a GET
form_data = urllib.parse.urlencode({'key': 'value'}).encode('utf-8')
req = urllib.request.Request('https://httpbin.org/post', data=form_data)
with urllib.request.urlopen(req) as response:
    print(response.read().decode('utf-8'))

python code snippet end
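Many APIs expect a JSON body rather than form-encoded data. The sketch below shows one way to build such a request. The helper names `build_json_request` and `post_json` are hypothetical (they are not part of urllib), and the 10-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def build_json_request(url, payload):
    """Build a POST request whose body is the JSON-encoded payload (hypothetical helper)."""
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def post_json(url, payload, timeout=10.0):
    """Send the payload and decode a JSON response (needs network access)."""
    req = build_json_request(url, payload)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.load(response)
```

Splitting the request construction from the sending step lets you inspect the request without touching the network, for example with `req.get_method()` or `req.data`.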

URL Parsing

python code snippet start

from urllib.parse import urlparse, urljoin, quote

# Parse URL components
url = 'https://example.com:8080/path?query=value#fragment'
parsed = urlparse(url)
print(parsed.hostname)  # example.com
print(parsed.port)      # 8080
print(parsed.query)     # query=value

# Join URLs
base = 'https://example.com/api/'
endpoint = 'users/123'
full_url = urljoin(base, endpoint)  # https://example.com/api/users/123

# URL encoding
safe_string = quote('hello world!')  # hello%20world%21

python code snippet end
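The query string itself can be taken apart and rebuilt as well. Here is a small sketch using `parse_qs` and `urlencode`; the URL is a variation of the one above with a repeated `tag` parameter added for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = 'https://example.com:8080/path?query=value&tag=a&tag=b#fragment'
parts = urlsplit(url)

# parse_qs maps each key to a list, since keys may repeat
params = parse_qs(parts.query)
print(params)  # {'query': ['value'], 'tag': ['a', 'b']}

# Add a parameter and rebuild the URL; doseq=True expands list values
params['page'] = ['2']
new_url = parts._replace(query=urlencode(params, doseq=True)).geturl()
print(new_url)
# https://example.com:8080/path?query=value&tag=a&tag=b&page=2#fragment
```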

Error Handling

python code snippet start

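import urllib.request
from urllib.error import HTTPError, URLError

try:
    with urllib.request.urlopen('https://httpbin.org/status/404') as response:
        data = response.read()
except HTTPError as e:
    # HTTPError subclasses URLError, so it has to be caught first
    print(f"HTTP Error: {e.code}")
except URLError as e:
    # No HTTP response at all (DNS failure, refused connection, ...)
    print(f"URL Error: {e.reason}")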

python code snippet end
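In practice it can be handy to wrap this pattern in a small function and add a timeout, since `urlopen` may otherwise wait a long time on an unresponsive server. The helper below is a sketch only: the name `fetch_text`, the 10-second default, and the choice to return `None` on failure are all assumptions made for this example.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_text(url, timeout=10.0):
    """Return the response body as text, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except HTTPError as e:
        # The server answered, but with an error status
        print(f"HTTP Error: {e.code} {e.reason}")
    except URLError as e:
        # The request never produced a usable response
        print(f"URL Error: {e.reason}")
    return None
```

Returning `None` keeps call sites short, but it also hides which error occurred, so re-raising may be the better choice in larger programs.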

Custom Headers and Authentication

python code snippet start

import urllib.request

# Add headers (the URL and token below are placeholders)
req = urllib.request.Request('https://api.example.com/data')
req.add_header('User-Agent', 'MyApp/1.0')
req.add_header('Authorization', 'Bearer token123')

with urllib.request.urlopen(req) as response:
    print(response.read().decode('utf-8'))

python code snippet end
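Headers can also be passed as a dict when the request is created. The sketch below wraps that in a hypothetical helper, `authorized_request`, and shows one quirk worth knowing: `Request` stores header names via `str.capitalize()`, so `User-Agent` is looked up as `User-agent`.

```python
import urllib.request


def authorized_request(url, token, user_agent='MyApp/1.0'):
    """Build a request carrying a bearer token (hypothetical helper)."""
    return urllib.request.Request(
        url,
        headers={
            'User-Agent': user_agent,
            'Authorization': f'Bearer {token}',
        },
    )


req = authorized_request('https://api.example.com/data', 'token123')
print(req.get_header('User-agent'))     # MyApp/1.0
print(req.get_header('Authorization'))  # Bearer token123
print(req.has_header('User-Agent'))     # False: the stored name is 'User-agent'
```

In real code the token would usually come from configuration or an environment variable rather than a string literal.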

urllib covers the common HTTP operations without external dependencies, which makes it a good fit for simple web requests and URL manipulation!

While urllib covers basic HTTP needs, it pairs well with JSON handling for API work and asyncio for concurrent requests in high-performance applications. For robust error handling, see exception patterns and logging HTTP operations.
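Because `urlopen` is a blocking call, one way to combine it with asyncio is to run each request in a worker thread. This is a minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`; the function names `fetch` and `fetch_all` are made up for this example.

```python
import asyncio
import urllib.request


def fetch(url, timeout=10.0):
    """Blocking fetch that returns the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


async def fetch_all(urls):
    """Fetch several URLs concurrently, one worker thread per request."""
    tasks = [asyncio.to_thread(fetch, url) for url in urls]
    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*tasks)
```

Calling `asyncio.run(fetch_all(urls))` with a list of URLs returns the bodies as a list of bytes objects. For large numbers of requests, a dedicated async HTTP library is likely a better fit than threads.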

Reference: urllib — URL handling modules