Owning the Internet with Python and Requests
Python's standard library can send HTTP requests, but its urllib APIs are clumsy, and even simple tasks take considerable effort. In this article, learn how to access Web APIs the easy way, with the Requests library.
Python is a scripting language so powerful that it's used as a general-purpose programming language. It is so natural that it's easy to learn, and so high-level that complex tasks take just a few lines of code. And yet, to interact with the Internet using the standard library, you have to jump through many hoops.
Surely, interacting with the Internet in Python could be a lot easier, a lot more 'Pythonic'? Well, there's a library called Requests, and it does just that.
Getting Started
Before using Requests, you need to download it. If you're in a virtual environment, you can just use pip to do the job for you, as follows:
$: pip install requests
To install Requests system-wide, check if your distro has a package. Requests is available for both Python 2 and Python 3, so download the correct package.
Let's start with the basics. To import Requests, run the following command:
>>> import requests
Start with a simple GET request. You will get Google's home page:

>>> r = requests.get("https://www.google.com")
>>> r.status_code
200
>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'
>>> r.encoding
'ISO-8859-1'
>>> r.text
'<!doctype html><html itemscope=""...
It's that easy. To make a request, simply use the method that corresponds to the HTTP verb you want. So to make a GET request, use requests.get(); to make a POST request, use requests.post(); and so on. Requests currently supports GET, POST, PUT, HEAD, OPTIONS, DELETE and PATCH. Let's break down the above code snippet, bit by bit:
>>> r = requests.get("https://www.google.com")
r is now a Response object, which is pretty powerful. Let's see what it can give us:

>>> r.status_code
200
200 means 'OK', so the HTTP request went perfectly and the server dispatched whatever data you wanted. So let's see what kind of data you can expect in the response body:

>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'
Right, so it's an HTML page, and it's not utf-8 but plain old ISO-8859-1. Hey, while we're at it, why not see all the headers that the server sent down? Here's the dump:

>>> r.headers
CaseInsensitiveDict({'cache-control': 'private, max-age=0',
'content-type': 'text/html; charset=ISO-8859-1',
'x-xss-protection': '1; mode=block', 'server': 'gws',
'transfer-encoding': 'chunked',
'date': 'Sat, 23 Nov 2013 19:12:11 GMT', ...})
So it's a special kind of dictionary: a case-insensitive one. The HTTP specification says that header names are case-insensitive, hence this. It means you can do the following:

>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'
And you can also do what follows:

>>> r.headers["Content-Type"]
'text/html; charset=ISO-8859-1'
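To see why both lookups return the same value, here is a minimal sketch of how a case-insensitive mapping might work; the class name and internals are illustrative, not the actual Requests implementation.

```python
class CaseInsensitiveDict:
    """A minimal sketch of a case-insensitive mapping, similar in
    spirit to the structure Requests uses for response headers."""

    def __init__(self, data=None):
        self._store = {}
        for key, value in (data or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Index by the lowercased key, but remember the original casing.
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]


headers = CaseInsensitiveDict({"Content-Type": "text/html; charset=ISO-8859-1"})
# Any casing of the key reaches the same entry.
assert headers["content-type"] == headers["Content-Type"]
```

The trick is simply to normalise keys on both write and read, which is all the HTTP header rules require.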
And both work the same. Remember that the response was encoded in ISO-8859-1. Well, I want utf-8. Requests can handle this:

>>> r.encoding
'ISO-8859-1'
>>> r.encoding = "utf-8"
>>> r.encoding
'utf-8'
And from now on, Requests will decode the response body as utf-8 whenever I access r.text, on the fly.
Anyway, we've played around with the response. Now it's time to see the actual data that the server beamed down to us. We can do it in one of three ways. The simplest is to access the response body as text:

>>> r.text
'<!doctype html><html itemscope=""...
If I'm downloading an image, there won't be any text to see. You're going to have to access the raw binary data, like this:

>>> r.content
b'<!doctype html><html itemscope=""...
Requests will automatically decompress the response if it's been encoded with gzip or deflate; r.content gives you the decompressed raw byte stream.
This is all good for small data: a JSON response or a Web page. What if you're downloading a multi-gigabyte file? Requests supports streaming responses, so you can do the following (note the stream=True argument, which stops Requests from reading the whole body up front):

>>> r = requests.get("http://example.com/really-big-file.bin", stream=True)
>>> with open("local-file.bin", "wb") as fd:
...     for chunk in r.iter_content(chunk_size=1024):
...         fd.write(chunk)
That’ll save really-big-file.bin as local-file.bin, and since it’s using HTTP streaming downloads, it won’t cache the entire file in memory before dumping it into the local file.
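The chunked loop above is easy to reason about in isolation. Here is a sketch of the same idea using only the standard library, with an in-memory io.BytesIO standing in for the streamed response body; iter_chunks is a hypothetical helper, not part of Requests.

```python
import io


def iter_chunks(fileobj, chunk_size=1024):
    """Yield fixed-size chunks from a file-like object, roughly what
    r.iter_content() does with a streamed response body."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        yield chunk


# A 3000-byte "response body"; only one chunk is in memory at a time.
source = io.BytesIO(b"x" * 3000)
chunks = list(iter_chunks(source))
assert [len(c) for c in chunks] == [1024, 1024, 952]
```

Because each chunk is written out and discarded before the next one is read, peak memory use stays at roughly one chunk regardless of the file size.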
Requests has an awesome trick that makes it the perfect choice when you're writing REST API clients. It can automatically decode JSON responses into Python objects, as follows:

>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.headers["Content-Type"]
'application/json; charset=utf-8'
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
Currently, Requests only tries to get and automatically convert the response data into a Python object if the response mimetype is application/json.
Parameters
Let's make a request with a couple of URL parameters. We can always construct the request URL manually, like this:

>>> url = "http://example.com/param?arga=one&argb=2"
>>> r = requests.get(url)
This works, but there’s a simpler way. We can create an argument payload (basically a dictionary), as shown below:
>>> args = {"arga": "one", "argb": 2}
And then make a request as follows:
>>> r = requests.get("http://example.com/param", params=args)
If you want to verify the URL that the request was made to, print r.url (where r is the response object), and you'll see that the URL was correctly constructed:

>>> r.url
'http://example.com/param?arga=one&argb=2'
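The query string that Requests builds from the params dictionary can be reproduced with the standard library, which is a handy way to see exactly what will be sent:

```python
from urllib.parse import urlencode

args = {"arga": "one", "argb": 2}

# Requests builds the query string from the params dict roughly like this;
# note that non-string values such as 2 are converted to strings for you.
query = urlencode(args)
url = "http://example.com/param?" + query
assert url == "http://example.com/param?arga=one&argb=2"
```

urlencode also percent-escapes characters that aren't safe in URLs, so you never have to escape values by hand.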
If you want to make a POST request to the same endpoint with the same data, you can use the following code:
>>> r = requests.post("http://example.com/param", data=args)
Note that with POST, PUT, DELETE and the rest, you use the data argument. As the data is a dictionary, Requests will form-encode it and send it. If you don't want that, you can simply pre-encode the data into a string and pass that instead. So to send a JSON request body (such as when you're making an API call), you can do the following:

>>> r = requests.post("http://example.com/param", data=json.dumps(args))
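The difference between the two request bodies is worth seeing side by side. A sketch using only the standard library, showing roughly what each form of the data argument produces:

```python
import json
from urllib.parse import urlencode

args = {"arga": "one", "argb": 2}

# Passing data=args gets form-encoded into a body like this:
form_body = urlencode(args)
assert form_body == "arga=one&argb=2"

# Passing data=json.dumps(args) sends a pre-encoded JSON string instead:
json_body = json.dumps(args)
assert json.loads(json_body) == {"arga": "one", "argb": 2}
```

The server sees two very different payloads, which is why JSON APIs also expect a Content-Type header (covered below) so they know which one arrived.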
Let's upload a file. Requests makes it easy to do multipart-encoded uploads; all you have to do is provide a file-like object. Something like what follows should do nicely:

>>> url = 'http://example.com/file-upload'
>>> files = {"file": ("report.xls", open("report.xls", "rb"))}
>>> r = requests.post(url, files=files)
I should explain the files dictionary a bit. The keys of the dictionary are the names of the file-input form fields that you would create in an HTML form. The value tuple has two values inside it: the first is the file name you want the server to see, and the second is the file-like object that Requests reads the data from. Easy enough? Let's try sending custom headers. Suppose we have an API endpoint that accepts data serialised as both JSON and XML. Then we would need to specify what format we're sending by populating the Content-Type request header with the appropriate mimetype.
Now, let's send some JSON data to the aforementioned endpoint:

>>> import simplejson as json
>>> url = "https://example.com/api_endpoint"
>>> payload = {"pi": 3.14159, "e": 2.71828}
>>> headers = {"Content-Type": "application/json"}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)
That's it! All you need to do is populate the headers argument with a dictionary containing your headers. Requests will automatically turn it into a case-insensitive dictionary, so you don't have to go digging for CaseInsensitiveDict yourself.
Let’s try something different now…something tasty!
Cookies
You might need to deal with HTTP cookies in your applications. Requests makes it pretty easy to do so. Every response object has a cookies property that holds all the cookies the server passed down to you. Let's see what it looks like:

>>> r.cookies
<<class 'requests.cookies.RequestsCookieJar'>[Cookie(version=0, name='NID', value='67…
To access individual cookies, run the following commands:

>>> r.cookies.get("NID")
'67=BbB9rqQYqjGgH…
>>> r.cookies["NID"]
'67=BbB9rqQYqjGgH…

Right, so it behaves like a dictionary. Sending cookies is equally easy. All you need to do is load up the cookies argument to the request method with a dictionary of strings, as shown below:

>>> url = "http://example.com/cookies"
>>> cookies = {"the_answer_to_life_the_universe_and_everything": "42"}
>>> r = requests.get(url, cookies=cookies)
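On the wire, that cookies dictionary simply becomes a Cookie request header. A sketch of that translation; cookies_to_header is a hypothetical helper for illustration, not a Requests function:

```python
def cookies_to_header(cookies):
    """Sketch of how a dict of cookies becomes the Cookie request
    header: name=value pairs joined by '; '."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


cookies = {"the_answer_to_life_the_universe_and_everything": "42"}
header = cookies_to_header(cookies)
assert header == "the_answer_to_life_the_universe_and_everything=42"
```

Requests (via its cookie jar) also takes care of details like domains and paths, but the header it ultimately sends has this shape.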
Let’s try something a little more mundane now.
Automatic redirects and history
Notice that if you try to get Google's homepage through www.google.com, Requests gives you a homepage. But www.google.com doesn't really serve you a page; it simply redirects you to the country-specific Google homepage for the country you're browsing from. What's going on here?
Well, Requests automatically follows all HTTP redirects for all verbs except HEAD. That means GET, PUT, POST, DELETE and PATCH are covered. And just because Requests does it automatically for these verbs doesn’t mean you can’t turn it off, or that you can’t turn it on for HEAD.
Let's see what's going on here:

>>> r = requests.get("https://www.google.com")
>>> r.url
'https://www.google.co.in/'
>>> r.status_code
200
So we queried www.google.com, but got back www.google.co.in, with a 200 OK status code. What happened to the request to www.google.com?

>>> r.history
[<Response [302]>]
That’s the history of all the requests that had to be completed to get us to the 200 OK response. There’s only one of them here for this request, but if Requests had to go through more than one redirection, this list would be sorted from the oldest to the most recent request.
Let's try something else:

>>> r.history[0].url
'https://www.google.com/'
>>> r.history[0].status_code
302
So the history list is actually a list of full-fledged Response objects. You could inspect all of them and grab whatever data you wanted (status codes, headers, URLs) from them.
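The redirect-resolution loop itself is simple enough to sketch. Here, follow_redirects is a hypothetical function and the responses dictionary stands in for a real server, mapping each URL to a (status_code, location) pair:

```python
def follow_redirects(start_url, responses, max_redirects=30):
    """Sketch of how a client resolves redirects, accumulating a
    history list ordered from oldest to most recent hop."""
    history = []
    url = start_url
    while True:
        status, location = responses[url]
        if status in (301, 302, 303, 307, 308) and location:
            if len(history) >= max_redirects:
                raise RuntimeError("TooManyRedirects")
            history.append((url, status))  # record the hop we're leaving
            url = location
            continue
        return url, status, history


responses = {
    "https://www.google.com/": (302, "https://www.google.co.in/"),
    "https://www.google.co.in/": (200, None),
}
final_url, status, history = follow_redirects("https://www.google.com/", responses)
assert final_url == "https://www.google.co.in/"
assert status == 200
assert history == [("https://www.google.com/", 302)]
```

The max_redirects guard mirrors the 30-hop limit mentioned below: without it, two URLs redirecting to each other would loop forever.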
What if you didn't want the redirections to be followed? You could do the following:

>>> r = requests.get("https://www.google.com", allow_redirects=False)
>>> r.url
'https://www.google.com/'
>>> r.status_code
302
And if you wanted your HEAD request to follow all redirects, you'd do as follows:

>>> r = requests.head("https://www.google.com", allow_redirects=True)
>>> r.url
'https://www.google.co.in/'
>>> r.status_code
200
By default, Requests will resolve 30 redirects before giving up with a TooManyRedirects error. You can change that, but you’d have to create a Session first, which I’ll go into later. Let’s do some authentication now.
Authentication
HTTP was built with protected resources in mind, and REST relies heavily on authentication. Requests makes HTTP authentication almost too easy.
Let's start with HTTP Basic Authentication which, as long as it's done over SSL, is reasonably secure and pretty common. So common, in fact, that Requests provides a handy shorthand for it:

>>> requests.get("https://api.github.com/user", auth=("user", "pass"))
<Response [200]>
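What that shorthand actually puts on the wire is a single Authorization header containing the base64-encoded credentials. A sketch with the standard library; basic_auth_header is a hypothetical helper for illustration:

```python
import base64


def basic_auth_header(user, password):
    """Sketch of the Authorization header that HTTP Basic auth sends:
    'Basic ' followed by base64("user:password")."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return "Basic " + token


assert basic_auth_header("user", "pass") == "Basic dXNlcjpwYXNz"
```

This also makes it obvious why Basic auth is only safe over SSL: base64 is an encoding, not encryption, and anyone on the wire can decode it.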
The Requests authentication system is modular, and authentication modules can be plugged in per request or per session. The long way to do HTTP Basic Authentication is as follows:

>>> from requests.auth import HTTPBasicAuth
>>> requests.get("https://api.github.com/user", auth=HTTPBasicAuth("user", "pass"))
<Response [200]>
HTTPBasicAuth is an authentication plug-in, and it does HTTP Basic Authentication. There are plenty more of these authentication plug-ins, and some of them come with Requests itself. For example, to do HTTP Digest Authentication, you'd use the HTTPDigestAuth plug-in, as shown below:

>>> from requests.auth import HTTPDigestAuth
>>> url = "http://example.com/my_digest_endpoint"
>>> requests.get(url, auth=HTTPDigestAuth("user", "pass"))
<Response [200]>
There are third party libraries that allow other authentication methods. You can do Kerberos authentication by installing the requests-kerberos package (from PyPI), OAuth 1 and OAuth 2 by installing the requests-oauthlib package, NTLM by installing the requests_ntlm package, and there’s even a package for AWS authentication (the PyPI package is called requests-aws).
Sessions
Sessions are a cool feature of Requests. They allow you to persist authentication data and cookies across requests. You can create a Session as follows:

>>> s = requests.Session()
>>> s.auth = HTTPDigestAuth("user", "pass")
>>> s.headers.update({"X-Deliver-Pizza-To": "Home"})
>>> s.max_redirects = 500
Now, instead of using requests.get(), requests.post() and the rest, you use the methods offered by the Session object, like s.get(), s.post() and so on, as shown below:
>>> s.get("http://example.com/my_digest_endpoint")
You can also override the Session's headers for an individual request, as follows:

>>> s.get("http://example.com/my_digest_endpoint",
...       headers={"X-Deliver-Pizza-To": None, "X-Add-Chicken-Chunks": "Yes"})
This will remove the X-Deliver-Pizza-To header and add an X-Add-Chicken-Chunks header. To remove a header, just set its value to None. To override the header, give it a new value. It’s that easy.
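The merge rule is simple, and worth sketching: per-request headers win, and a value of None deletes the header. merge_headers below is a hypothetical helper for illustration, not the actual Requests internals.

```python
def merge_headers(session_headers, request_headers):
    """Sketch of how per-request headers combine with session
    headers: None removes a header, any other value overrides or adds."""
    merged = dict(session_headers)
    for key, value in (request_headers or {}).items():
        if value is None:
            merged.pop(key, None)  # None means: drop this header
        else:
            merged[key] = value    # otherwise override or add
    return merged


session_headers = {"X-Deliver-Pizza-To": "Home"}
merged = merge_headers(
    session_headers,
    {"X-Deliver-Pizza-To": None, "X-Add-Chicken-Chunks": "Yes"},
)
assert merged == {"X-Add-Chicken-Chunks": "Yes"}
```

Note that the Session itself is untouched: the override applies only to the one request, and the next s.get() will again send X-Deliver-Pizza-To.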
Sessions are pretty useful. Many authentication libraries (requests-oauthlib, for one) don't return authentication plug-ins, but ask for your credentials and return a Requests Session pre-populated with the authentication data. This makes things much easier, both for the library programmer and the user.
Next
This should be enough to get you started with Requests. There is a lot more that Requests can do, and reading the official documentation should help you familiarise yourself with that. You should go ahead and read the documentation for requests-oauthlib too, since OAuth is what most REST APIs use nowadays.
Writing custom authentication libraries for Requests is a breeze, so if you want to hack on Requests, this is a good place to start. Requests’ transport layer (the bit that handles the actual traffic over the Internet) is built on urllib3, but it is modular and can be replaced. The author of Requests specifically wants a non-blocking backend to Requests; so if you’re up for it, you can start hacking on that too.
So let me go off and make something awesome with Requests. Till the next time, readers! Describing himself as a 'retard by choice', the author believes that madness is a cure-all for whatever is wrong or right with society. A social media enthusiast, he can be reached at @BaloneyGeek on Twitter.