Urllib Module
TL;DR
The urllib package provides URL handling through four submodules: request (fetch URLs), parse (manipulate URLs), error (exceptions), and robotparser (robots.txt).
Interesting!
Unlike languages that need a third-party library for HTTP, Python ships a capable HTTP client right in the standard library!
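Of the four submodules, robotparser is the only one without an example further down, so here is a minimal sketch of it. The robots.txt rules and the user agent string `MyApp/1.0` are made up for illustration, and the rules are fed in from a string so that nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline instead of downloaded
rules = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse() accepts an iterable of lines

# Ask whether a given user agent may fetch a given URL
allowed = parser.can_fetch("MyApp/1.0", "https://example.com/public/page.html")
blocked = parser.can_fetch("MyApp/1.0", "https://example.com/private/data.html")
delay = parser.crawl_delay("MyApp/1.0")

print(allowed, blocked, delay)
```

Against a live site, the usual pattern is `parser.set_url(...)` followed by `parser.read()`, which downloads and parses the file in one step.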
Making HTTP Requests
python code snippet start
import urllib.parse
import urllib.request

# Simple GET request; the with block closes the connection on exit
with urllib.request.urlopen('https://httpbin.org/json') as response:
    data = response.read().decode('utf-8')
    print(data)

# POST request: passing data= switches the method to POST, and the body must be bytes
data = urllib.parse.urlencode({'key': 'value'}).encode('utf-8')
req = urllib.request.Request('https://httpbin.org/post', data=data)
with urllib.request.urlopen(req) as response:
    print(response.read().decode('utf-8'))
python code snippet end
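The POST example above sends form-encoded data. Many APIs expect a JSON body instead, so the following is a rough sketch of how such a request could be built. The helper name `build_json_request` is an assumption made for this example, not part of urllib, and the code only constructs the request object without sending it.

```python
import json
import urllib.request


def build_json_request(url, payload):
    """Return a POST Request carrying payload as a JSON body (nothing is sent)."""
    body = json.dumps(payload).encode("utf-8")  # the request body must be bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_json_request("https://httpbin.org/post", {"key": "value"})
print(req.get_method())  # POST
print(req.data)          # b'{"key": "value"}'
```

To actually send it, pass the object to `urllib.request.urlopen(req, timeout=10)`. Setting a timeout is usually a good idea, because without one a stalled server can block the call indefinitely.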
URL Parsing
python code snippet start
from urllib.parse import urlparse, urljoin, quote
# Parse URL components
url = 'https://example.com:8080/path?query=value#fragment'
parsed = urlparse(url)
print(parsed.hostname) # example.com
print(parsed.port) # 8080
print(parsed.query) # query=value
# Join URLs
base = 'https://example.com/api/'
endpoint = 'users/123'
full_url = urljoin(base, endpoint) # https://example.com/api/users/123
# URL encoding
safe_string = quote('hello world!') # hello%20world%21
python code snippet end
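Two details of the parsing functions above are easy to trip over: editing a single query parameter, and the effect of the trailing slash on urljoin. The sketch below illustrates both. The helper `set_query_param` is a hypothetical name introduced here, not something urllib provides.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlparse, urlunparse


def set_query_param(url, key, value):
    """Return url with query parameter key set to value (hypothetical helper)."""
    parts = urlparse(url)
    params = parse_qs(parts.query)   # maps each name to a list of values
    params[key] = [value]
    new_query = urlencode(params, doseq=True)  # doseq expands the value lists
    return urlunparse(parts._replace(query=new_query))


updated = set_query_param("https://example.com/search?q=python&page=1", "page", "2")
print(updated)  # https://example.com/search?q=python&page=2

# Without a trailing slash, the last path segment of the base is replaced
no_slash = urljoin("https://example.com/api", "users/123")
print(no_slash)  # https://example.com/users/123
```

Note that parse_qs drops parameters with blank values by default; pass `keep_blank_values=True` if those need to survive the round trip.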
Error Handling
python code snippet start
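import urllib.request
from urllib.error import HTTPError, URLError

try:
    with urllib.request.urlopen('https://httpbin.org/status/404') as response:
        data = response.read()
except HTTPError as e:
    # HTTPError is a subclass of URLError, so it has to be caught first
    print(f"HTTP Error: {e.code}")
except URLError as e:
    # Raised when no HTTP response arrived (DNS failure, refused connection, ...)
    print(f"URL Error: {e.reason}")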
python code snippet end
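The order of the except clauses matters because HTTPError inherits from URLError. The sketch below makes that relationship visible without touching the network by constructing the exceptions by hand. The function name `describe_failure` and the sample URL are assumptions for illustration only.

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError


def describe_failure(exc):
    """Turn a urllib exception into a short message (hypothetical helper)."""
    # Check the subclass first; an HTTPError would also match URLError
    if isinstance(exc, HTTPError):
        return f"server answered with status {exc.code}: {exc.reason}"
    if isinstance(exc, URLError):
        return f"could not reach server: {exc.reason}"
    return f"unexpected error: {exc!r}"


# Build the exceptions manually so the example runs offline
http_error = HTTPError(
    "https://example.com/missing", 404, "Not Found", Message(), io.BytesIO(b"")
)
url_error = URLError("connection refused")

print(issubclass(HTTPError, URLError))  # True
print(describe_failure(http_error))
print(describe_failure(url_error))
```

In real code the same function could be called from a single `except URLError as e:` block, since that clause catches both exception types.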
Custom Headers and Authentication
python code snippet start
import urllib.request

# Add headers
req = urllib.request.Request('https://api.example.com/data')
req.add_header('User-Agent', 'MyApp/1.0')
req.add_header('Authorization', 'Bearer token123')  # placeholder token
with urllib.request.urlopen(req) as response:
    print(response.read().decode('utf-8'))
python code snippet end
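The example above uses a bearer token. For HTTP Basic authentication, the header value is the base64 encoding of `username:password`. Here is a minimal sketch of building it by hand. The helper name `basic_auth_request` and the credentials are made up for this example, and no request is sent.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Return a Request with a Basic Authorization header (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req


# Placeholder credentials for illustration only
req = basic_auth_request("https://api.example.com/data", "alice", "s3cret")
print(req.get_header("Authorization"))
```

urllib also offers `HTTPBasicAuthHandler` together with `build_opener` for cases where credentials should only be sent after the server challenges for them. Setting the header directly, as sketched here, sends them with every request.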
urllib covers common HTTP operations without external dependencies, which makes it a good fit for simple web requests and URL manipulation!
While urllib covers basic HTTP needs, it pairs well with JSON handling for API work and asyncio for concurrent requests in high-performance applications. For robust error handling, see exception patterns and logging HTTP operations.
Reference: urllib — URL handling modules