Owning the Internet with Python and Requests

Python modules for sending HTTP requests can be cumbersome. While the built-in Python modules have HTTP capabilities, the problem lies with their broken APIs, and it takes tremendous effort to overcome this drawback. In this article, learn how to access Web APIs with Requests.

OpenSource For You - By: Boudhayan Gupta

Python is a scripting language with so much power that it's actually used as a general-purpose programming language. It is so natural that it is easy to learn, and so high-level that you can perform complex tasks with just a few lines of code. And yet, to interact with the Internet using the standard library, you have to jump through many hoops.

Surely, interacting with the Internet in Python could be a lot easier, a lot more 'Pythonic'? Well, there's a library called Requests, and it does just that.

Getting Started

Before using Requests, you need to install it. If you're in a virtual environment, you can just let pip do the job for you, as follows:

$ pip install requests

To install Requests system-wide, check whether your distro has a package for it. Requests is available for both Python 2 and Python 3, so make sure you install the package for the Python version you use.

Let's start with the basics. To import Requests, run the following command:

>>> import requests

Start with a simple GET request. You will get Google's home page:

>>> r = requests.get("https://www.google.com")
>>> r.status_code
200
>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'
>>> r.encoding
'ISO-8859-1'
>>> r.text
'<!doctype html><html itemscope=""...

It's that easy. To make a request, simply use the function that corresponds to the HTTP verb you want. So to make a GET request, use requests.get(); to make a POST request, use requests.post(), and so on. Requests currently supports GET, POST, PUT, HEAD, OPTIONS, DELETE and PATCH. Let's break down the above code snippet, bit by bit:

>>> r = requests.get("https://www.google.com")

r is now a Response object, which is pretty powerful. Let's see what it can give us:

>>> r.status_code
200

200 means 'OK', so the HTTP request went through perfectly and the server sent back whatever data you asked for. So let's see what kind of data to expect in the response body:

>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'

Right, so it's an HTML page, and it's not UTF-8 but plain old ISO-8859-1. While we're at it, why not look at all the headers that the server sent down? Here's the dump:

>>> r.headers
CaseInsensitiveDict({'cache-control': 'private, max-age=0', 'content-type': 'text/html; charset=ISO-8859-1', 'x-xss-protection': '1; mode=block', 'server': 'gws', 'transfer-encoding': 'chunked', 'date': 'Sat, 23 Nov 2013 19:12:11 GMT', ...})

So it's a special kind of dictionary: a case-insensitive one. The HTTP specification says that header names are case-insensitive, hence this. It means you can do the following:

>>> r.headers["content-type"]
'text/html; charset=ISO-8859-1'

And you can also do this:

>>> r.headers["Content-Type"]
'text/html; charset=ISO-8859-1'

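Both work the same. Remember that the response was encoded in ISO-8859-1? Well, I want UTF-8, and Requests can handle this:

>>> r.encoding
'ISO-8859-1'
>>> r.encoding = "utf-8"
>>> r.encoding
'utf-8'

From now on, r.text will decode the raw response bytes as UTF-8 instead of ISO-8859-1. Note that this doesn't convert the data itself. It only changes the codec Requests uses to decode it, so it should match what the server actually sent.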
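To see what switching r.encoding changes without touching the network, here's a minimal standard-library sketch. It assumes the simplification that r.text is essentially r.content decoded with r.encoding. The byte string is made up for illustration.

```python
# Hypothetical response body: the word "café" as a server might send it, in UTF-8.
raw = "café".encode("utf-8")  # the bytes r.content would hold

# Decoding with the codec the headers claimed (ISO-8859-1) garbles the accented letter...
as_latin1 = raw.decode("iso-8859-1")

# ...while decoding with the right codec (what r.encoding = "utf-8" asks for) recovers it.
as_utf8 = raw.decode("utf-8")

print(as_latin1)  # cafÃ©
print(as_utf8)    # café
```

The bytes never change. Only their interpretation does, which is why the encoding you set should match the real one.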

Anyway, we've played around with the response. Now it's time to see the actual data that the server beamed down to us. There are three ways to do it. The simplest is to access the response body as text:

>>> r.text
'<!doctype html><html itemscope=""...

If I'm downloading an image, there won't be any text to see, so I'd have to access the raw binary data instead, like this:

>>> r.content
b'<!doctype html><html itemscope=""...

Requests automatically decompresses the response if it has been encoded with gzip or deflate, so r.content gives you the uncompressed raw byte stream.

This is all fine for small data, such as a JSON response or a Web page. But what if you're downloading a multi-gigabyte file? Requests supports streaming responses, so you can do the following:

>>> r = requests.get("http://example.com/really-big-file.bin", stream=True)
>>> with open("local-file.bin", "wb") as fd:
...     for chunk in r.iter_content(chunk_size=8192):
...         fd.write(chunk)
...

That'll save really-big-file.bin as local-file.bin. Because stream=True tells Requests not to download the response body up front, the data is read from the network in chunks of up to 8,192 bytes as you iterate over r.iter_content(). The entire file is never held in memory before it is written to the local file.
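Here's a self-contained sketch of the same streaming pattern that you can run offline. It spins up a throwaway local HTTP server, which stands in for example.com, and that server returns 1 MiB of random bytes. The script then downloads those bytes in chunks with stream=True. The download() helper, the payload and the port are all hypothetical. Setting trust_env = False on the Session (Sessions are covered later) only keeps proxy settings in your environment from intercepting the localhost request.

```python
import http.server
import os
import tempfile
import threading

import requests

# Hypothetical payload standing in for really-big-file.bin: 1 MiB of random bytes.
PAYLOAD = os.urandom(1024 * 1024)


class BigFileHandler(http.server.BaseHTTPRequestHandler):
    """Throwaway local server that returns PAYLOAD for every GET."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the demo quiet


def download(session, url, path, chunk_size=64 * 1024):
    """Stream url into path chunk by chunk and return the number of bytes written."""
    written = 0
    # stream=True defers reading the body until we iterate over iter_content().
    with session.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        with open(path, "wb") as fd:
            for chunk in r.iter_content(chunk_size=chunk_size):
                fd.write(chunk)
                written += len(chunk)
    return written


# Serve on a free local port in a background thread.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), BigFileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/really-big-file.bin" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment for this demo

local_path = os.path.join(tempfile.mkdtemp(), "local-file.bin")
size = download(session, url, local_path)
server.shutdown()
server.server_close()
print(size == len(PAYLOAD))  # True
```

Only one chunk of at most chunk_size bytes is held in memory at a time. Memory use therefore stays flat however large the file is.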

Requests has an awesome trick that makes it the perfect choice for writing REST API clients: it can decode JSON responses into Python objects, as follows:

>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.headers["Content-Type"]
'application/json; charset=utf-8'
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...

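Note that r.json() doesn't look at the response's mimetype. It simply tries to parse the response body as JSON, and raises a ValueError (or a subclass of it) if the body isn't valid JSON. So it's worth checking that the server actually sent application/json before relying on it.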

Parameters

Let's make a request with a couple of URL parameters. We can always construct the request URL manually, like this:

>>> url = "http://example.com/param?arga=one&argb=2"
>>> r = requests.get(url)

This works, but there's a simpler way. We can create an argument payload (basically a dictionary), as shown below:

>>> args = {"arga": "one", "argb": 2}

And then make a request as follows:

>>> r = requests.get("http://example.com/param", params=args)

If you want to verify the URL that the request was made to, print r.url (where r is the response object). You'll see that the URL was correctly constructed:

>>> r.url
'http://example.com/param?arga=one&argb=2'
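You can also check how Requests builds the URL without sending anything, by preparing a request object instead of calling requests.get(). Here's a minimal sketch. example.com is just a placeholder, and the second, list-valued parameter is a hypothetical addition showing how repeated keys are encoded.

```python
import requests

args = {"arga": "one", "argb": 2}

# Request(...).prepare() builds the final URL and headers but sends nothing.
prepared = requests.Request("GET", "http://example.com/param", params=args).prepare()
print(prepared.url)  # http://example.com/param?arga=one&argb=2

# A list value is encoded as a repeated key.
multi = requests.Request(
    "GET", "http://example.com/param", params={"tag": ["python", "http"]}
).prepare()
print(multi.url)  # http://example.com/param?tag=python&tag=http
```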

If you want to make a POST request to the same endpoint with the same data, you can use the following code:

>>> r = requests.post("http://example.com/param", data=args)

Note that with POST, PUT, DELETE and the rest, you use the data argument to send a request body. Since the data here is a dictionary, Requests will form-encode it before sending it. To get around this, you can simply pre-encode the data into a string and pass that instead. So to send a JSON request body (such as when you're making an API call), you can do the following:

>>> import json
>>> r = requests.post("http://example.com/param", data=json.dumps(args))
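Preparing the requests without sending them shows the difference in the bodies and headers that Requests produces. This is a minimal sketch against the placeholder example.com URL. The last case uses the json= argument, which later Requests releases accept. It is included on the assumption that your installed version has it.

```python
import json

import requests

args = {"arga": "one", "argb": 2}
url = "http://example.com/param"  # placeholder endpoint; nothing is sent

# A dict passed as data= is form-encoded, and Requests sets the form Content-Type.
form = requests.Request("POST", url, data=args).prepare()
print(form.body)                     # arga=one&argb=2
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded

# A pre-encoded string is sent as-is, and no Content-Type is added for you.
pre_encoded = requests.Request("POST", url, data=json.dumps(args)).prepare()
print(pre_encoded.body)                      # {"arga": "one", "argb": 2}
print("Content-Type" in pre_encoded.headers)  # False

# Later releases also accept json=, which encodes the body and sets the header.
js = requests.Request("POST", url, json=args).prepare()
print(js.headers["Content-Type"])  # application/json
```

The missing Content-Type in the second case is why you would set the header yourself when sending a pre-encoded JSON string.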

Let's upload a file. Requests makes multipart-encoded uploads (the multipart/form-data encoding that HTML forms use for file uploads) easy: all you have to do is provide a file-like object. Something like this should do nicely:

>>> url = 'http://example.com/file-upload'
>>> files = {"file": ("report.xls", open("report.xls", "rb"))}
>>> r = requests.post(url, files=files)
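If you're curious what such an upload looks like on the wire, you can prepare the request without sending it. In this minimal sketch, an in-memory io.BytesIO object stands in for open("report.xls", "rb"), and its contents are made up.

```python
import io

import requests

# An in-memory stand-in for open("report.xls", "rb"); the contents are made up.
fake_file = io.BytesIO(b"pretend spreadsheet bytes")
files = {"file": ("report.xls", fake_file)}

# Prepare (but don't send) the upload to inspect the generated body.
prepared = requests.Request("POST", "http://example.com/file-upload", files=files).prepare()

print(prepared.headers["Content-Type"].split(";")[0])  # multipart/form-data
print(b'filename="report.xls"' in prepared.body)       # True
print(b"pretend spreadsheet bytes" in prepared.body)   # True
```

The Content-Type header also carries a randomly generated boundary string that separates the parts of the body.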

I should explain the files dictionary a bit. The keys of the dictionary are the names of the file-input form fields that you would create in an HTML form. Each value is a tuple with two items: the first is the file name you want the server to see, and the second is the file-like object that Requests reads the data from.

Easy enough? Let's try sending custom headers. Suppose we have an API endpoint that accepts data serialised as either JSON or XML. We would then need to specify which format we're sending by populating the Content-Type request header with the appropriate mimetype.

Now, let's send some JSON data to the aforementioned endpoint:

>>> import json
>>> url = "https://example.com/api_endpoint"
>>> payload = {"pi": 3.14159, "e": 2.71828}
>>> headers = {"Content-Type": "application/json"}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)

That's it! All you need to do is populate the headers argument with a dictionary containing your headers. Requests will automatically turn it into a case-insensitive dictionary, so you don't have to go digging for CaseInsensitiveDict yourself.
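You can confirm this by preparing the same request without sending it. In this minimal sketch the URL is only a placeholder.

```python
import json

import requests
from requests.structures import CaseInsensitiveDict

url = "https://example.com/api_endpoint"  # placeholder; nothing is sent
payload = {"pi": 3.14159, "e": 2.71828}
headers = {"Content-Type": "application/json"}  # a plain dict

prepared = requests.Request("POST", url, data=json.dumps(payload), headers=headers).prepare()

# The plain dict has become a CaseInsensitiveDict on the prepared request.
print(isinstance(prepared.headers, CaseInsensitiveDict))  # True
print(prepared.headers["content-type"])                   # application/json
print(prepared.body)                                      # {"pi": 3.14159, "e": 2.71828}
```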

Let's try something different now... something tasty!

Cookies

You might need to deal with HTTP cookies in your applications, and Requests makes it pretty easy to do so. Every response object has a cookies property that holds all the cookies the server passed down to you. Let's see what it looks like:

>>> r.cookies
<<class 'requests.cookies.RequestsCookieJar'>[Cookie(version=0, name='NID', value='67…

To access individual cookies, run the following commands:

>>> r.cookies.get("NID")
'67=BbB9rqQYqjGgH…
>>> r.cookies["NID"]
'67=BbB9rqQYqjGgH…

Right, so it behaves like a dictionary. Sending cookies is equally easy. All you need to do is pass a dictionary as the cookies argument to the request method, as shown below:

>>> url = "http://example.com/cookies"
>>> cookies = {"the_answer_to_life_the_universe_and_everything": "42"}
>>> r = requests.get(url, cookies=cookies)
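Both halves of this can be tried offline. The cookie jar can be built by hand, and the outgoing request can be prepared without being sent. In this minimal sketch, the NID value and the example.com domain are made up, and the cookie values are strings because cookie values are text.

```python
import requests
from requests.cookies import RequestsCookieJar

# A cookie jar behaves like a dict (the cookie value and domain here are made up).
jar = RequestsCookieJar()
jar.set("NID", "67=example-value", domain="example.com", path="/")
print(jar.get("NID"))  # 67=example-value
print(jar["NID"])      # 67=example-value
print(jar.get_dict())  # {'NID': '67=example-value'}

# A plain dict passed as cookies= ends up in the Cookie request header.
cookies = {"the_answer_to_life_the_universe_and_everything": "42"}
prepared = requests.Request("GET", "http://example.com/cookies", cookies=cookies).prepare()
print(prepared.headers["Cookie"])  # the_answer_to_life_the_universe_and_everything=42
```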

Let's try something a little more mundane now.

Automatic redirects and history

Notice that when you request Google's homepage through www.google.com, Requests gives you the homepage. But www.google.com doesn't really serve you a page. It simply redirects you to Google's country-specific homepage for the country you're accessing it from. What's going on here?

Well, Requests automatically follows HTTP redirects for all verbs except HEAD, which means GET, OPTIONS, POST, PUT, PATCH and DELETE are covered. And just because Requests does it automatically for these verbs doesn't mean you can't turn it off, or that you can't turn it on for HEAD.

Let's see what's going on here:

>>> r = requests.get("https://www.google.com")
>>> r.url
'https://www.google.co.in/'
>>> r.status_code
200

So we queried www.google.com, but got back www.google.co.in with a 200 OK status code. What happened to the request to www.google.com?

>>> r.history
[<Response [302]>]

That's the history of all the requests that had to be completed to get us to the 200 OK response. There's only one of them here, but if Requests had to go through more than one redirection, this list would be sorted from the oldest to the most recent request.

Let's try something else:

>>> r.history[0].url
'https://www.google.com/'
>>> r.history[0].status_code
302

So the history list is actually a list of full-fledged Response objects. You could inspect each of them and grab whatever data you wanted from it: status codes, headers, URLs and so on.

What if you didn't want the redirections to be followed? You could do the following:

>>> r = requests.get("https://www.google.com", allow_redirects=False)
>>> r.url
'https://www.google.com/'
>>> r.status_code
302

And if you wanted your HEAD request to follow all redirects, you'd do as follows:

>>> r = requests.head("https://www.google.com", allow_redirects=True)
>>> r.url
'https://www.google.co.in/'
>>> r.status_code
200
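Since Google's redirect depends on where you are, here's a reproducible sketch of the same behaviour against a throwaway local server. The server is a hypothetical stand-in that answers "/" with a 302 pointing at "/home". The sketch goes through a Session (covered later) only so that trust_env = False can keep environment proxy settings away from the localhost requests.

```python
import http.server
import threading

import requests


class RedirectHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for www.google.com: "/" answers 302 -> "/home"."""

    def _reply(self, send_body):
        if self.path == "/":
            self.send_response(302)
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"home page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if send_body:
            self.wfile.write(body)

    def do_GET(self):
        self._reply(send_body=True)

    def do_HEAD(self):
        self._reply(send_body=False)  # HEAD gets headers only

    def log_message(self, *args):
        pass  # silence per-request logging


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d/" % server.server_port

s = requests.Session()
s.trust_env = False  # ignore environment proxy settings for this localhost demo

followed = s.get(base, timeout=5)                             # GET follows redirects
not_followed = s.get(base, allow_redirects=False, timeout=5)
head_default = s.head(base, timeout=5)                        # HEAD does not, by default
head_followed = s.head(base, allow_redirects=True, timeout=5)
server.shutdown()
server.server_close()

print(followed.status_code, followed.url.endswith("/home"))  # 200 True
print([h.status_code for h in followed.history])             # [302]
print(not_followed.status_code, not_followed.history)        # 302 []
print(head_default.status_code, head_followed.status_code)   # 302 200
```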

By default, Requests will follow up to 30 redirects before giving up with a TooManyRedirects error. You can change that, but you'd have to create a Session first, which I'll go into later. Let's do some authentication now.

Authentication

HTTP was built to support protected resources, and REST relies heavily on authentication. Requests makes HTTP authentication almost too easy.

Let's start with HTTP Basic Authentication, which is easy, very common, and reasonably secure as long as it's done over SSL (Basic Authentication only encodes the credentials; it doesn't encrypt them). It's so common, in fact, that Requests provides a handy shorthand for it:

>>> requests.get("https://api.github.com/user", auth=("user", "pass"))
<Response [200]>

The Requests authentication system is modular, and authentication modules can be plugged in per request or per session. The long way to do HTTP Basic Authentication is as follows:

>>> from requests.auth import HTTPBasicAuth
>>> requests.get("https://api.github.com/user", auth=HTTPBasicAuth("user", "pass"))
<Response [200]>
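Preparing both forms without sending them shows that they produce the same Authorization header. It also shows why SSL matters here. In this minimal sketch, "user" and "pass" are placeholder credentials, and nothing is sent to GitHub.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "https://api.github.com/user"  # nothing is actually sent below

# The tuple shorthand and the explicit plug-in produce the same header.
short = requests.Request("GET", url, auth=("user", "pass")).prepare()
explicit = requests.Request("GET", url, auth=HTTPBasicAuth("user", "pass")).prepare()
print(short.headers["Authorization"])  # Basic dXNlcjpwYXNz
print(short.headers["Authorization"] == explicit.headers["Authorization"])  # True

# The credentials are merely base64-encoded, so anyone who sees the header can
# read them back, which is why Basic Authentication belongs on SSL/TLS.
token = short.headers["Authorization"].split(" ", 1)[1]
print(base64.b64decode(token).decode("ascii"))  # user:pass
```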

HTTPBasicAuth is an authentication plug-in that does HTTP Basic Authentication. There are plenty more of these plug-ins, and some of them come with Requests itself. For example, to do HTTP Digest Authentication, you'd use the HTTPDigestAuth plug-in, as shown below:

>>> from requests.auth import HTTPDigestAuth
>>> url = "http://example.com/my_digest_endpoint"
>>> requests.get(url, auth=HTTPDigestAuth("user", "pass"))
<Response [200]>

There are third-party libraries that add other authentication methods. You can do Kerberos authentication by installing the requests-kerberos package (from PyPI), OAuth 1 and OAuth 2 by installing the requests-oauthlib package, and NTLM by installing the requests_ntlm package. There's even a package for AWS authentication (the PyPI package is called requests-aws).

Sessions

Sessions are a cool feature of Requests. They allow you to persist authentication data, cookies and other settings across requests. You can create a Session by using the following commands:

>>> s = requests.Session()
>>> s.auth = HTTPDigestAuth("user", "pass")
>>> s.headers.update({"X-Deliver-Pizza-To": "Home"})
>>> s.max_redirects = 500

Now, instead of using requests.get(), requests.post() and the rest, you use the methods offered by the Session object, such as s.get(), s.post() and so on, as shown below:

>>> s.get("http://example.com/my_digest_endpoint")

You can also add or remove headers for an individual request made through the Session, as follows:

>>> s.get("http://example.com/my_digest_endpoint", headers={"X-Deliver-Pizza-To": None, "X-Add-Chicken-Chunks": "Yes"})

This will remove the X-Deliver-Pizza-To header and add an X-Add-Chicken-Chunks header for this request only, while the Session's own headers stay as they were. To remove a header, just set its value to None. To override a header, give it a new value. It's that easy.
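Session.prepare_request() performs the same merging of session and per-request settings without sending anything. That makes it a convenient way to check the behaviour described above. In this minimal sketch the pizza headers and the example.com URL are the article's playful placeholders.

```python
import requests
from requests.auth import HTTPDigestAuth

s = requests.Session()
default_limit = s.max_redirects  # 30 unless you change it
s.auth = HTTPDigestAuth("user", "pass")
s.headers.update({"X-Deliver-Pizza-To": "Home"})
s.max_redirects = 500

# prepare_request() merges session settings with per-request ones, without sending.
req = requests.Request(
    "GET",
    "http://example.com/my_digest_endpoint",
    headers={"X-Deliver-Pizza-To": None, "X-Add-Chicken-Chunks": "Yes"},
)
prepared = s.prepare_request(req)

print(default_limit)                             # 30
print("X-Deliver-Pizza-To" in prepared.headers)  # False: removed for this request
print(prepared.headers["X-Add-Chicken-Chunks"])  # Yes
print(s.headers["X-Deliver-Pizza-To"])           # Home: the session is unchanged
```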

Sessions are pretty useful. Many authentication libraries (requests-oauthlib, for one) don't just hand you authentication plug-ins. They ask for your credentials and return a Requests session pre-populated with the authentication data. This makes things much easier, both for the library programmer and the user.

Next

This should be enough to get you started with Requests. There is a lot more that Requests can do, and reading the official documentation will help you familiarise yourself with it. You should also read the documentation for requests-oauthlib, since OAuth is what most REST APIs use nowadays.

Writing custom authentication plug-ins for Requests is a breeze, so if you want to hack on Requests, this is a good place to start. Requests' transport layer (the bit that handles the actual traffic over the Internet) is built on urllib3, but it is modular and can be replaced. The author of Requests specifically wants a non-blocking backend for Requests, so if you're up for it, you can start hacking on that too.
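As a starting point, here's a minimal sketch of a custom plug-in built on requests.auth.AuthBase. An auth plug-in is a callable that receives the prepared request, modifies it and returns it. The TokenAuth class, its "Token" scheme and the example.com URL are hypothetical, so a real API will document its own header format.

```python
import requests
from requests.auth import AuthBase


class TokenAuth(AuthBase):
    """Hypothetical plug-in that adds an "Authorization: Token <token>" header."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # Requests hands the plug-in the PreparedRequest; modify it and return it.
        r.headers["Authorization"] = "Token " + self.token
        return r


prepared = requests.Request("GET", "https://example.com/api", auth=TokenAuth("s3cr3t")).prepare()
print(prepared.headers["Authorization"])  # Token s3cr3t
```

You would use it just like the built-in plug-ins, either per request with requests.get(url, auth=TokenAuth("...")) or for a whole Session by setting s.auth.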

So let me go off and make something awesome with Requests. Till next time, readers!

Describing himself as a 'retard by choice', the author believes that madness is a cure-all for whatever is wrong or right with society. A social media enthusiast, he can be reached at @BaloneyGeek on Twitter.
