OpenSource For You

Owning the Internet with Python and Requests

Python modules for sending HTTP requests can be cumbersome. While the standard library has HTTP capabilities, the problem lies with its clunky APIs, and it takes tremendous effort to work around them. In this article, learn how to access Web APIs with the Requests library.

- By: Boudhayan Gupta

Python is a scripting language with so much power that it's actually used as a general-purpose programming language. It is so natural that it's easy to learn, and so high-level that complex tasks take just a few lines of code. And yet, to interact with the Internet using the standard library, you have to jump through many hoops.

Surely, interacting with the Internet in Python could be a lot easier, a lot more ‘Pythonic’? Well, there's a library called Requests, and it does just that.

Getting Started

Before using Requests, you need to download it. If you're in a virtual environment, you can just use pip to do the job for you, as follows:

$: pip install requests

For installing Requests system-wide, you can check if your distro has a package. ‘Requests’ is available for both Python 2 and Python 3, so download the correct package.

Let's start with the basics. To import Requests, run the following command:

>>> import requests

Start with a simple GET request. You will get Google's home page:

>>> r = requests.get("https://www.google.com")
>>> r.status_code
200
>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'
>>> r.encoding
'ISO-8859-1'
>>> r.text
'<!doctype html><html itemscope=""...

It’s that easy. To make a request, simply use the method that corresponds to the verb you want to use. So to make a GET request, use requests.get(); to make a POST request, use requests.post() and so on. ‘Requests’ currently supports GET, POST, PUT, HEAD, OPTIONS, DELETE and PATCH. Let’s break down the above code snippet, bit by bit:

>>> r = requests.get("https://www.google.com")

r is now a Response object, which is pretty powerful. Let’s see what it can give us:

>>> r.status_code
200

200 means ‘OK’, so the HTTP request went perfectly and the server dispatched whatever data you wanted. So let’s see what kind of data you can expect to see in the response body:

>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'

Right, so it’s an HTML page and it’s not utf-8, but plain old ISO-8859-1. Hey, while we’re at it, why not see all the headers that the server sent down? Here’s the dump:

>>> r.headers
CaseInsensitiveDict({'cache-control': 'private, max-age=0', 'content-type': 'text/html; charset=ISO-8859-1', 'x-xss-protection': '1; mode=block', 'server': 'gws', 'transfer-encoding': 'chunked', 'date': 'Sat, 23 Nov 2013 19:12:11 GMT', ...})

So it’s a special kind of a dictionary - a case-insensitive one. The HTTP specification says that HTTP header names are case insensitive, hence this. It means you can do the following:

>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'

And you can also do what follows:

>>> r.headers["Content-Type"]
'text/html; charset=ISO-8859-1'

And both work the same. Remember that the response was encoded in ISO-8859-1. Well, I want utf-8. Requests can handle this:

>>> r.encoding
'ISO-8859-1'
>>> r.encoding = "utf-8"
>>> r.encoding
'utf-8'

From now on, r.text will decode the raw response bytes as utf-8, on the fly.

Anyway, we’ve played around with the response. Now, it’s time to see the actual data that the server beamed down to us. We can do it in one of three ways. The simplest is to simply access the response body as text:

>>> r.text
'<!doctype html><html itemscope=""...

If I’m downloading an image, there won’t be any text to see. You’re going to have to access the raw binary data, like this:

>>> r.content
b'<!doctype html><html itemscope=""...

Requests will automatically decompress the response if it's been encoded with gzip or deflate; r.content will give you the uncompressed raw byte stream.

This is all good for small data - a JSON response or a Web page. What if you’re downloading a multi-gigabyte file? ‘Requests’ supports streaming responses - pass stream=True and iterate over the content in chunks, as follows:

>>> r = requests.get("http://example.com/really-big-file.bin", stream=True)
>>> with open("local-file.bin", "wb") as fd:
...     for chunk in r.iter_content(chunk_size=1024):
...         fd.write(chunk)

That’ll save really-big-file.bin as local-file.bin, and since it’s using HTTP streaming downloads, it won’t cache the entire file in memory before dumping it into the local file.

‘Requests’ has an awesome trick that makes it the perfect choice when you’re writing REST API clients. It can automatically decode JSON responses into Python objects, as follows:

>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.headers["Content-Type"]
'application/json; charset=utf-8'
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...

Currently, Requests only tries to get and automatically convert the response data into a Python object if the response mimetype is application/json.

Parameters

Let’s make a request with a couple of URL parameters. We can always construct the request URL manually, like this:

>>> url = "http://example.com/param?arga=one&argb=2"
>>> r = requests.get(url)

This works, but there’s a simpler way. We can create an argument payload (basically a dictionary), as shown below:

>>> args = {"arga": "one", "argb": 2}

And then make a request as follows:

>>> r = requests.get("http://example.com/param", params=args)

If you want to verify the URL that the request was made to, print r.url (where r is the response object), and you’ll see that the URL was correctly constructed:

>>> r.url
'http://example.com/param?arga=one&argb=2'
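Incidentally, you can check this URL construction without any network traffic at all: requests.Request plus its prepare() method builds the final request object locally. A small sketch (the URL is just a placeholder):

```python
import requests

# Build the request locally instead of sending it; prepare() constructs
# the final URL from the base URL plus the params dictionary.
args = {"arga": "one", "argb": 2}
req = requests.Request("GET", "http://example.com/param", params=args)
prepared = req.prepare()

print(prepared.url)  # http://example.com/param?arga=one&argb=2
```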

If you want to make a POST request to the same endpoint with the same data, you can use the following code:

>>> r = requests.post("http://example.com/param", data=args)

Note that with POST, PUT, DELETE and the rest, you use the data argument. If data is a dictionary, Requests will form-encode it and send it. To send something else, you can pre-encode the data into a string and pass that instead. So to send a JSON request body (such as when you’re making an API call), you can simply do the following:

>>> import json
>>> r = requests.post("http://example.com/param", data=json.dumps(args))
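Note that newer versions of Requests (2.4.2 and later) can do the JSON serialisation for you via the json argument, which also sets the Content-Type header. A quick sketch, built locally with prepare() so nothing is actually sent:

```python
import requests

# The json= argument serialises the dict to a JSON body and sets the
# Content-Type header to application/json automatically.
args = {"arga": "one", "argb": 2}
req = requests.Request("POST", "http://example.com/param", json=args)
prepared = req.prepare()

print(prepared.headers["Content-Type"])  # application/json
```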

Let’s upload a file. Requests makes it easy to do multipart-encoded uploads (the same encoding an HTML file-upload form uses) - all you have to do is provide a file-like object. Something like what follows should do nicely:

>>> url = 'http://example.com/file-upload'
>>> files = {"file": ("report.xls", open("report.xls", "rb"))}
>>> r = requests.post(url, files=files)

I should explain the files dictionary a bit. The keys to the dictionary are the names of the file-input form fields that you would create in an HTML form. The value tuple has two values inside it - the first one is the file name you want the server to see, and the second is the file-like object that Requests reads the data from. Easy enough?

Let’s try sending custom headers. Suppose we have an API endpoint that can accept data serialised as both JSON and XML. Then, we would need to specify what format we’re sending by populating the Content-Type request header with the appropriate mimetype.
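Going back to the file upload for a moment: the value tuple can optionally take a third element, an explicit content type for the file. Here's a sketch using an in-memory file (io.BytesIO) and a locally prepared request, so it's self-contained and nothing is sent:

```python
import io
import requests

# A three-element value tuple: (filename, file-like object, content type).
files = {"file": ("report.xls", io.BytesIO(b"fake spreadsheet data"),
                  "application/vnd.ms-excel")}
req = requests.Request("POST", "http://example.com/file-upload", files=files)
prepared = req.prepare()

# Requests has multipart-encoded the body and set the boundary for us.
print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
```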

Now, let’s send some JSON data to the aforementioned endpoint:

>>> import simplejson as json
>>> url = "https://example.com/api_endpoint"
>>> payload = {"pi": 3.14159, "e": 2.71828}
>>> headers = {"Content-Type": "application/json"}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)

That’s it! All you need to do is populate the headers argument with a dictionary containing your headers. Requests will automatically turn it into a case-insensitive dictionary, so you don’t have to go digging for CaseInsensitiveDict yourself.
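That said, the case-insensitive dictionary is importable from requests.structures if you ever want one of your own:

```python
from requests.structures import CaseInsensitiveDict

# The same class Requests uses for request and response headers.
headers = CaseInsensitiveDict({"Content-Type": "application/json"})

print(headers["content-type"])  # application/json
print(headers["CONTENT-TYPE"])  # application/json
```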

Let’s try something different now…something tasty!

Cookies

You might need to deal with HTTP cookies in your applications. ‘Requests’ makes it pretty easy to do so. Every response object has a cookies property that holds all the cookies the server passed down to you. Let’s see what it looks like:

>>> r.cookies
<RequestsCookieJar[Cookie(version=0, name='NID', value='67…

To access individual cookies, run the following commands:

>>> r.cookies.get("NID")
'67=BbB9rqQYqjGgH…
>>> r.cookies["NID"]
'67=BbB9rqQYqjGgH…

Right, so it behaves like a dictionary. Sending cookies is equally easy. All you need to do is load up the cookies argument to the request method with a dictionary (cookie values should be strings), as shown below:

>>> url = "http://example.com/cookies"
>>> cookies = {"the_answer_to_life_the_universe_and_everything": "42"}
>>> r = requests.get(url, cookies=cookies)
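If you want a full cookie jar rather than a plain dictionary, Requests can build one for you; a small local sketch (the cookie name here is shortened for illustration):

```python
from requests.cookies import cookiejar_from_dict

# Build a RequestsCookieJar from a plain dict; cookie values should be
# strings, so 42 is passed as "42".
jar = cookiejar_from_dict({"the_answer": "42"})

print(jar.get("the_answer"))  # 42
print(jar["the_answer"])      # 42
```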

Let’s try something a little more mundane now.

Automatic redirects and history

Notice that if you try to get Google’s homepage through www.google.com, Requests gives you the homepage. But www.google.com doesn’t really serve you a page; it simply redirects you to Google’s country-specific homepage for the country you’re accessing it from. What’s going on here?

Well, Requests automatically follows all HTTP redirects for all verbs except HEAD. That means GET, PUT, POST, DELETE and PATCH are covered. And just because Requests does it automatically for these verbs doesn’t mean you can’t turn it off, or that you can’t turn it on for HEAD.

Let’s see what’s going on here:

>>> r = requests.get("https://www.google.com")
>>> r.url
'https://www.google.co.in/'
>>> r.status_code
200

So we queried for www.google.com, but got back www.google.co.in, with a 200 OK status code. What happened to the request to www.google.com?

>>> r.history
[<Response [302]>]

That’s the history of all the requests that had to be completed to get us to the 200 OK response. There’s only one of them here for this request, but if Requests had to go through more than one redirection, this list would be sorted from the oldest to the most recent request.

Let’s try something else:

>>> r.history[0].url
'https://www.google.com/'
>>> r.history[0].status_code
302

So the history list is actually a list of full-fledged Response objects. You could inspect all of them and grab whatever data you wanted (status codes, headers, URLs) from them.

What if you didn’t want the redirections to be followed? You could do the following:

>>> r = requests.get("https://www.google.com", allow_redirects=False)
>>> r.url
'https://www.google.com/'
>>> r.status_code
302

And if you wanted your HEAD request to follow all redirects, you’d do as follows:

>>> r = requests.head("https://www.google.com", allow_redirects=True)
>>> r.url
'https://www.google.co.in/'
>>> r.status_code
200

By default, Requests will resolve 30 redirects before giving up with a TooManyRedirects error. You can change that, but you’d have to create a Session first, which I’ll go into later. Let’s do some authentication now.

Authentication

HTTP has built-in support for protected resources, and REST relies heavily on authentication. Requests makes HTTP authentication almost too easy.

Let’s start with HTTP Basic Authentication, which apart from being easy is, when done over SSL, pretty secure and pretty common. So common, in fact, that Requests provides a handy shorthand to do it:

>>> requests.get("https://api.github.com/user", auth=("user", "pass"))
<Response [200]>

The Requests authentication system is modular, and authentication modules can be plugged in per request or per session. The long way to do HTTP Basic Authentication is as follows:

>>> from requests.auth import HTTPBasicAuth
>>> requests.get("https://api.github.com/user", auth=HTTPBasicAuth("user", "pass"))
<Response [200]>

HTTPBasicAuth is an authentication plug-in and it does HTTP Basic Authentication. There are plenty more of these authentication plug-ins, and some of them come with Requests itself. For example, to do HTTP Digest Authentication, you’d use the HTTPDigestAuth plug-in, as shown below:

>>> from requests.auth import HTTPDigestAuth
>>> url = "http://example.com/my_digest_endpoint"
>>> requests.get(url, auth=HTTPDigestAuth("user", "pass"))
<Response [200]>

There are third party libraries that allow other authentication methods. You can do Kerberos authentication by installing the requests-kerberos package (from PyPI), OAuth 1 and OAuth 2 by installing the requests-oauthlib package, NTLM by installing the requests_ntlm package, and there’s even a package for AWS authentication (the PyPI package is called requests-aws).

Sessions

‘Sessions’ is a cool feature of Requests. It allows you to persist authentication data and cookies across requests. You can create a Session by using the following commands:

>>> s = requests.Session()
>>> s.auth = HTTPDigestAuth("user", "pass")
>>> s.headers.update({"X-Deliver-Pizza-To": "Home"})
>>> s.max_redirects = 500

Now, instead of using requests.get(), requests.post() and the rest, you use the methods offered by the Session object, like s.get(), s.post() and so on, as shown below:

>>> s.get(“http://example.com/my_digest_endpoint”)

You can add or remove headers from the Session, as follows:

>>> s.get("http://example.com/my_digest_endpoint", headers={"X-Deliver-Pizza-To": None, "X-Add-Chicken-Chunks": "Yes"})

This will remove the X-Deliver-Pizza-To header and add an X-Add-Chicken-Chunks header. To remove a header, just set its value to None. To override the header, give it a new value. It’s that easy.
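This header merging can be seen without a network round trip, by letting the Session prepare a request locally; a sketch:

```python
import requests

# Session-level headers merge with per-request headers; a per-request
# value of None removes the header from the merged result.
s = requests.Session()
s.headers.update({"X-Deliver-Pizza-To": "Home"})

req = requests.Request("GET", "http://example.com/",
                       headers={"X-Deliver-Pizza-To": None,
                                "X-Add-Chicken-Chunks": "Yes"})
prepared = s.prepare_request(req)

print("X-Deliver-Pizza-To" in prepared.headers)  # False
print(prepared.headers["X-Add-Chicken-Chunks"])  # Yes
```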

Sessions are pretty useful. Many authentication libraries (requests-oauthlib, for one) don’t return authentication plug-ins, but ask for your credentials and return a Requests Session pre-populated with the authentication data. This makes things much easier, both for the library programmer and the user.

Next

This should be enough to get you started with Requests. There is a lot more that Requests can do, and reading the official documentation should help you familiarise yourself with that. You should go ahead and read the documentation for requests-oauthlib too, since OAuth is what most REST APIs use nowadays.

Writing custom authentication libraries for Requests is a breeze, so if you want to hack on Requests, this is a good place to start. Requests’ transport layer (the bit that handles the actual traffic over the Internet) is built on urllib3, but it is modular and can be replaced. The author of Requests specifically wants a non-blocking backend for Requests; so if you’re up for it, you can start hacking on that too.
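To give a flavour of how easy it is: a custom authentication plug-in is just an AuthBase subclass whose __call__ receives the prepared request and modifies it. The token header name below is made up for illustration, and the request is prepared locally so nothing is sent:

```python
import requests
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    """Attach a (hypothetical) X-Api-Token header to every request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # r is the PreparedRequest; modify it in place and return it.
        r.headers["X-Api-Token"] = self.token
        return r

req = requests.Request("GET", "http://example.com/", auth=TokenAuth("s3cret"))
prepared = req.prepare()

print(prepared.headers["X-Api-Token"])  # s3cret
```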

So let me go off and make something awesome with Requests. Till the next time, readers!

Describing himself as a 'retard by choice', the author believes that madness is a cure-all for whatever is wrong or right with society. A social media enthusiast, he can be reached at @BaloneyGeek on Twitter.

