An Introduction to NumPy

Numerical Python, or NumPy, is a library for the Python programming language that supports large, multi-dimensional arrays and matrices, and comes with a vast collection of high-level mathematical functions to operate on these arrays.

OpenSource For You

By: Sharon Sunny

The author is an assistant professor at Amaljyothi College of Engineering, Kerala. She can be reached at ssharon099@gmail.com.

Matrix-sig, a special interest group, was founded in 1995 with the aim of developing an array computing package compatible with the Python programming language. In the same year, Jim Hugunin developed a generalised matrix implementation package, Numeric. Later, in 2005, Travis Oliphant incorporated features of NumArray and a C-API into the Numeric code to create NumPy (Numerical Python), which was developed as a part of the SciPy project. NumPy was subsequently separated from SciPy so that users would not have to install the large SciPy package just to get an array object.

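In the rest of this article, ‘>>>’ represents the Python interpreter prompt; lines without this prompt show the output of the code. The outputs shown were produced with Python 2. Under Python 3, a few representations differ: for example, type() reports <class 'numpy.ndarray'>, and arrays built from ordinary string literals get a Unicode ‘U’ dtype rather than ‘S’.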

In­stal­la­tion

NumPy is a BSD-new licensed library for the Python programming language. It comes with Python distributions like Anaconda, Enthought Canopy and Pyzo. It can also be installed using package managers like dnf or pip. On Ubuntu and Debian systems, use the following command to install NumPy (the package is named python3-numpy for Python 3; older releases shipped the Python 2 build as python-numpy):

sudo apt-get install python3-numpy

Once NumPy is in­stalled in the sys­tem, we need to im­port it to use the func­tion­al­i­ties pro­vided by it:

import numpy as n

The above statement creates the alias ‘n’ while importing the package (many NumPy users write ‘np’ instead; this article uses ‘n’ throughout). Once you import the package, it stays available until you exit the interpreter. If the programs are saved in files, you must import NumPy in each file.

NumPy, SciPy, Pan­das and Scikit-learn

In Python, many packages provide support for scientific applications. NumPy, like MATLAB, is used for efficient array computation. It also provides vectorised mathematical functions like sin() and cos(). SciPy assists us in scientific computing by providing methods for integration, interpolation, signal processing, statistics and linear algebra. Pandas helps in data analysis, statistics and visualisation. We use Scikit-learn when we want to train a machine learning algorithm. NumPy may seem modest next to these packages, but the beauty is that all of them build on NumPy internally.

ndar­ray

The soul of NumPy is ‘ndar­ray’, an n-di­men­sional ar­ray.

You can see from the code seg­ment given be­low that when­ever you ap­ply the func­tion ‘type’ on any NumPy ar­ray, it will re­turn the type numpy.ndar­ray ir­re­spec­tive of the type of data stored in it.

>>> import numpy as n
>>> a = n.array([1, 2, 3])
>>> type(a)
<type 'numpy.ndarray'>
>>> b = n.array(['a', 'b', 'c'])
>>> type(b)
<type 'numpy.ndarray'>

Un­like the list data struc­ture in Python, ndar­ray holds el­e­ments of the same data type only. A few at­tributes of the ndar­ray ob­ject are listed be­low. Ex­am­ples shown with the de­scrip­tion of each at­tribute re­fer to the ar­ray ‘a’, whose def­i­ni­tion is as fol­lows:

>>> a = n.array([[1, 2, 3], [3, 4, 5]])

ndarray.ndim: This displays the number of dimensions of the array. The array ‘a’ in the example is two-dimensional, and hence the output is as follows:

>>> a.ndim
2

ndarray.shape: This displays the dimensions of the array, as shown below:

>>> a.shape
(2, 3)

ndarray.size: This displays the total number of elements in the array, as shown below:

>>> a.size
6

ndarray.dtype: This displays the data type of the elements, which depends on the data stored in the array. Built-in types include int, bool, float, complex, bytes, str, unicode and buffer; all others are referred to as objects. array() infers the dtype from the data, while creation functions such as ones() and empty() default to float64 when no dtype is given.

>>> a.dtype
dtype('int64')

In the case of strings, the dtype will be displayed as dtype(‘S#’), where # represents the length of the longest string in the array:

>>> c = n.array(['sarah', 'serin', 'jaden'])
>>> c.dtype
dtype('S5')

ndarray.itemsize: This displays the size of each element in bytes. In the example, ‘a’ contains 64-bit integers, so each element occupies 8 bytes. Hence, the output is as follows:

>>> a.itemsize
8
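The attributes described above can be inspected together. The following is a minimal sketch for Python 3 using the article's alias ‘n’; the dictionary name ‘summary’ is only illustrative, and ‘nbytes’ is an additional ndarray attribute not covered in the text. The dtype and itemsize printed depend on the platform's default integer size.

```python
import numpy as n

a = n.array([[1, 2, 3], [3, 4, 5]])

# Collect the attributes discussed above in one place.
summary = {
    "ndim": a.ndim,          # number of dimensions
    "shape": a.shape,        # length along each dimension
    "size": a.size,          # total number of elements
    "dtype": a.dtype,        # element type (platform dependent for ints)
    "itemsize": a.itemsize,  # bytes per element
}
for name, value in summary.items():
    print(name, value)

# size is the product of the shape entries, and the total memory used by
# the elements (nbytes) is size * itemsize.
assert a.size == a.shape[0] * a.shape[1]
assert a.nbytes == a.size * a.itemsize
```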

Cre­at­ing ndar­rays

We have al­ready seen in the above ex­am­ples how ndar­rays are cre­ated from a Python list us­ing ar­ray(). A tu­ple can also be used in place of a list. It is pos­si­ble to spec­ify ex­plic­itly the data type of el­e­ments in the ar­ray, as shown be­low:

>>> e = n.array([[1, 2, 3], [3, 4, 5]], dtype='S2')
>>> e
array([['1', '2', '3'],
       ['3', '4', '5']], dtype='|S2')
>>> e.dtype
dtype('S2')

There are other ways of generating arrays too. A few of them are listed below, each with its arguments shown in the function signature.

ones(shape[, dtype, order]): This returns an array of the given dimensions and type, filled with 1s. ‘order’ specifies whether multi-dimensional data is stored in memory row by row (‘C’, the default) or column by column (‘F’). An example is given below.

>>> a = n.ones(3, dtype='S1')
>>> a
array(['1', '1', '1'],
      dtype='|S1')

If the specified dtype is Sn, then whatever the value of ‘n’, the array generated by ones() will contain strings of length 1. Later, however, any ‘1’ can be replaced with a string of length up to ‘n’.

empty(shape[, dtype, order]): This returns an array of the given dimensions and type without initialising the entries, so its initial contents are arbitrary. In the code segment given below, the specified dtype is ‘S1’. Hence the array ‘a’ may be modified later to store strings of length 1.

>>> a = n.empty(3, dtype='S1')
>>> a
array(['', '', ''],
      dtype='|S1')

full(shape, fill_value[, dtype, order]): This returns an array of the given dimensions filled with ‘fill_value’. If dtype is not explicitly specified, recent NumPy releases infer it from ‘fill_value’; some older releases generated a float array with a warning message instead. A sample statement is given below:

>>> a = n.full((3, 3), 2, dtype=int)
>>> a
array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])
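The three creation functions above can be compared side by side. This is a small sketch for Python 3 and a recent NumPy release; the variable names are illustrative, and the comment on full() assumes a NumPy version that infers the dtype from the fill value.

```python
import numpy as n

# ones(): dtype defaults to float64 when it is not given.
ones_default = n.ones((2, 3))
ones_int = n.ones((2, 3), dtype=int)

# empty(): memory is allocated but not initialised, so assign before reading.
scratch = n.empty((2, 2))
scratch[:] = 0.0

# full(): recent NumPy releases infer the dtype from the fill value
# (an integer here); pass dtype explicitly to avoid relying on that.
filled = n.full((3, 3), 2)
filled_float = n.full((3, 3), 2, dtype=float)

print(ones_default.dtype, ones_int.dtype, filled.dtype, filled_float.dtype)
```

Passing dtype explicitly, as the article's own full() example does, keeps the result independent of the installed NumPy version.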

fromstring(string[, dtype, count, sep]): This returns a 1-D array initialised with ‘string’. NumPy takes ‘count’ elements of type ‘dtype’ from ‘string’ and generates an array. ‘string’ is interpreted as binary data if ‘sep’, a string, is not provided, and as ASCII text otherwise. Newer NumPy releases deprecate the binary mode in favour of frombuffer(). I would like to add a bit more about fromstring(). In binary mode, this function needs the input string size to be a multiple of the element size. Unless specified, the array created using this function will be of dtype ‘float64’, which requires 8 bytes per element. Consider the example given below:

>>> a = n.fromstring('123')

The above statement intends to generate an array from the string ‘123’. But it will generate an error message since the string's length, 3 bytes, is not a multiple of 8.

>>> a = n.fromstring('12345678')

The above statement will successfully generate an array, as given below:

>>> a
array([  6.82132005e-38])

Consider another example:

>>> a = n.fromstring('123456', dtype='S2', count=2)

Here, dtype is specified as ‘S2’. So the array ‘a’ will contain elements of length 2.

>>> a
array(['12', '34'],
      dtype='|S2')

We can see that ‘a’ contains only two elements since the count given in fromstring() is 2.
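For readers on Python 3, the same ideas can be sketched with the text mode of fromstring() and with frombuffer() for raw bytes. This is an illustrative alternative rather than the article's original code; frombuffer() is assumed here as the replacement for the binary mode, and the variable names are made up for the example.

```python
import numpy as n

# Text mode: a non-empty sep makes NumPy parse the string as text.
text_values = n.fromstring("1 2 3 4", dtype=int, sep=" ")

# Raw bytes: frombuffer() reads the buffer directly. With count=2 and a
# 2-byte element type, only the first 4 of the 6 bytes are used.
pairs = n.frombuffer(b"123456", dtype="S2", count=2)

print(text_values)   # the integers 1 to 4
print(pairs)         # two 2-byte strings: b'12' and b'34'
```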

loadtxt(fname[, dtype][, comments][, skiprows][, delimiter][, converters][, usecols] ...): This returns an array containing elements formed from the data in the file. The contents of an input file, say loadtxt.txt, are given below:

#this is comment line
abc def ghi
jkl mno pqr

Use the function as shown below:

>>> n.loadtxt('/home/abc/Desktop/loadtxt.txt', dtype='S3')
array([['abc', 'def', 'ghi'],
       ['jkl', 'mno', 'pqr']], dtype='|S3')

We can see in the output that the comment line has automatically been eliminated. Before applying this function, we must make sure that all rows in the file contain an equal number of strings. The comments option of loadtxt() specifies which character marks the beginning of a comment; by default, it is the ‘#’ symbol. The skiprows option skips the first ‘skiprows’ lines in the input file.
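The comments and skiprows options can be tried without creating a file, since loadtxt() also accepts file-like objects. The sketch below, for Python 3, uses io.StringIO as a stand-in for loadtxt.txt; the second sample, with its ‘%’ comment marker and header line, is invented purely for illustration.

```python
import io
import numpy as n

# In-memory stand-in for the loadtxt.txt file described above.
sample = io.StringIO(
    "#this is comment line\n"
    "abc def ghi\n"
    "jkl mno pqr\n"
)
rows = n.loadtxt(sample, dtype=str)   # the '#' line is dropped by default

# Hypothetical second file: one header line to skip and '%' as the
# comment marker instead of '#'.
sample2 = io.StringIO("header line\n% note\n1 2 3\n4 5 6\n")
numbers = n.loadtxt(sample2, comments="%", skiprows=1)

print(rows)
print(numbers)
```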

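arange([start,] stop[, step][, dtype]): This returns an array of evenly spaced values starting at ‘start’ (0 by default) and going up to, but not including, ‘stop’, in increments of ‘step’ (1 by default).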
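A few calls illustrate how the arguments of arange() interact. This is a minimal sketch using the article's alias ‘n’; the variable names are illustrative.

```python
import numpy as n

first_five = n.arange(5)        # start defaults to 0, step to 1
stepped = n.arange(2, 10, 3)    # 2, 5, 8 (stop itself is excluded)
halves = n.arange(0, 2, 0.5)    # a float step gives a float array

print(first_five)
print(stepped)
print(halves)
```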

There are many more func­tions that help in gen­er­at­ing ar­rays. They are de­tailed in the of­fi­cial site scipy.org.

Func­tions as­so­ci­ated with ar­rays

We have all written programs to find the trace of a matrix, sort elements, find the indices of non-zero elements, multiply two matrices, and so on, and we know how lengthy these programs are if written in C. Each of these tasks can be finished with a single statement using NumPy. A few of the functions used are described below. A majority of the functions associated with an array return an array.

nonzero(a): This re­turns a tu­ple con­tain­ing the in­dices of non-zero el­e­ments in the ar­ray.

>>> a = n.array([[0, 0, 2], [3, 0, 0], [0, 0, 0]])
>>> n.nonzero(a)
(array([0, 1]), array([2, 0]))

We can see in the definition of ‘a’ that the indices of non-zero elements in it are [0,2] and [1,0]. Each element in the result of nonzero() is an array containing the index positions of the non-zero elements in that dimension. In this case, the first array contains row numbers and the second array contains column numbers of non-zero elements. If we had a third dimension in the input array, the tuple would have contained one more element showing the positions of non-zero elements in that dimension.

>>> a[n.nonzero(a)]
array([2, 3])

The above code shows us how to retrieve the non-zero elements from the array.
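The per-dimension arrays returned by nonzero() can be paired up to recover (row, column) coordinates. The following sketch shows one way of doing so; the names ‘rows’, ‘cols’ and ‘coords’ are illustrative.

```python
import numpy as n

a = n.array([[0, 0, 2], [3, 0, 0], [0, 0, 0]])

# nonzero() returns one index array per dimension.
rows, cols = n.nonzero(a)

# Pair them up to get the coordinates of each non-zero element.
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)

# The same index arrays select the non-zero values themselves.
values = a[rows, cols]
print(values)
```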

transpose(a[, axes]): This returns the array with its dimensions permuted; by default, the order of the dimensions is reversed. The result is a view of the original data wherever possible, rather than a copy. The code segment given below shows a 3D array and its transpose.

>>> a = n.array((((1, 2, 3), (4, 5, 6)), ((7, 8, 9), (0, 1, 2))))
>>> a
array([[[1, 2, 3],
        [4, 5, 6]],

       [[7, 8, 9],
        [0, 1, 2]]])
>>> n.transpose(a)
array([[[1, 7],
        [4, 0]],

       [[2, 8],
        [5, 1]],

       [[3, 9],
        [6, 2]]])
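The optional ‘axes’ argument chooses the permutation explicitly. The sketch below compares the default transpose with one that swaps only the first two dimensions; the variable names are illustrative.

```python
import numpy as n

a = n.array((((1, 2, 3), (4, 5, 6)), ((7, 8, 9), (0, 1, 2))))

# Default: the order of the dimensions is reversed, (2, 2, 3) -> (3, 2, 2).
default_t = n.transpose(a)

# Explicit permutation: swap the first two dimensions, keep the last one.
swapped = n.transpose(a, (1, 0, 2))

print(a.shape, default_t.shape, swapped.shape)

# Element [i, j, k] of the swapped array is element [j, i, k] of the original.
assert swapped[0, 1, 2] == a[1, 0, 2]
```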

sum(a[, axis][, dtype][, out][, keepdims]): This returns the sum of elements along the given axis. ‘out’ in the option list is an ndarray into which the result should be written. keepdims is a Boolean value which, if set to ‘True’, keeps each reduced axis in the result as a dimension of size one. An example with a 3D array is given below. In this case, the axis takes values from 0 to 2.

>>> a = n.array((((1, 2, 3), (4, 5, 6)), ((7, 8, 9), (0, 1, 2))))
>>> a
array([[[1, 2, 3],
        [4, 5, 6]],

       [[7, 8, 9],
        [0, 1, 2]]])
>>> n.sum(a, axis=0)
array([[ 8, 10, 12],
       [ 4,  6,  8]])
>>> n.sum(a, axis=1)
array([[ 5,  7,  9],
       [ 7,  9, 11]])
>>> n.sum(a, axis=2, keepdims=True)
array([[[ 6],
        [15]],

       [[24],
        [ 3]]])

prod(a [, axis][,dtype][,out][,keep­dims]): This re­turns the prod­uct of el­e­ments along the given axis.
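prod() takes the same options as sum(). A short sketch on a 2D array, with illustrative variable names:

```python
import numpy as n

a = n.array([[1, 2, 3], [4, 5, 6]])

total = n.prod(a)                          # product of all elements
by_column = n.prod(a, axis=0)              # multiply down each column
by_row = n.prod(a, axis=1, keepdims=True)  # keep the reduced axis as size 1

print(total)
print(by_column)
print(by_row)
```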

There are func­tions like argmax, min, argmin, ptp, clip, conj, round, trace, cum­sum, mean, var, std, cumprod, all and any, which make sci­en­tific com­pu­ta­tions eas­ier. There are func­tions for ar­ray con­ver­sion, shape ma­nip­u­la­tion, item se­lec­tion and ma­nip­u­la­tion too. If one wishes to dig deep, please visit the of­fi­cial SciPy site.

Op­er­a­tions on ar­rays

Arith­metic op­er­a­tions: Arith­metic op­er­a­tors like ‘+’, ‘-’, ‘*’, ‘/’ and ‘%’ can be ap­plied di­rectly on NumPy ar­rays. It is to be noted that all op­er­a­tions are el­e­ment-wise op­er­a­tions. The re­sult of 2D ar­ray mul­ti­pli­ca­tion is shown be­low:

>>> c = n.array([[1, 2], [3, 4]])
>>> d = n.array([[1, 3], [2, 1]])
>>> c*d
array([[1, 6],
       [6, 4]])

If you wish to perform matrix multiplication, use the function dot() as shown below, or generate matrices using the matrix function and use the ‘*’ operator on them. In Python 3.5 and later, the ‘@’ operator on 2D arrays gives the same result as dot().

>>> c = n.array([[1, 2], [3, 4]])
>>> d = n.array([[1, 3], [2, 1]])
>>> c
array([[1, 2],
       [3, 4]])
>>> d
array([[1, 3],
       [2, 1]])
>>> n.dot(c, d)
array([[ 5,  5],
       [11, 13]])
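The difference between element-wise and matrix multiplication can be checked directly. This sketch assumes Python 3.5 or later, where the ‘@’ operator is available as an alternative spelling of matrix multiplication for 2D arrays.

```python
import numpy as n

c = n.array([[1, 2], [3, 4]])
d = n.array([[1, 3], [2, 1]])

elementwise = c * d      # multiplies matching positions
matrix_dot = n.dot(c, d) # rows of c times columns of d
matrix_at = c @ d        # same result as dot() for 2D arrays

print(elementwise)
print(matrix_dot)
assert n.array_equal(matrix_dot, matrix_at)
```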

Re­la­tional op­er­a­tions: NumPy al­lows one to com­pare two ar­rays us­ing re­la­tional op­er­a­tors. The re­sult will be a Boolean ar­ray, i.e., an el­e­ment in a re­sul­tant ar­ray is set to ‘True’ only if the con­di­tion is sat­is­fied. An ex­am­ple is shown be­low:

>>> c = n.array([[1, 2], [3, 4]])
>>> d = n.array([[1, 3], [2, 1]])
>>> c
array([[1, 2],
       [3, 4]])
>>> d
array([[1, 3],
       [2, 1]])
>>> c==d
array([[ True, False],
       [False, False]], dtype=bool)

Log­i­cal op­er­a­tions: Log­i­cal op­er­a­tions can be per­formed on ar­rays us­ing built-in func­tions sup­ported by NumPy. Func­tions like log­i­cal_or(), log­i­cal_not(), log­i­cal_and(), etc can be used for this pur­pose. The code seg­ment given be­low shows the re­sults of the XOR op­er­a­tion.

>>> c = n.array([[0, 1], [2, 3]])
>>> d = n.array([[1, 3], [2, 0]])
>>> n.logical_xor(c, d)
array([[ True, False],
       [False,  True]], dtype=bool)
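The other logical functions named above behave the same way: every non-zero element counts as ‘True’. A sketch on the same two arrays, with illustrative variable names:

```python
import numpy as n

c = n.array([[0, 1], [2, 3]])
d = n.array([[1, 3], [2, 0]])

both = n.logical_and(c, d)   # True where both elements are non-zero
either = n.logical_or(c, d)  # True where at least one is non-zero
not_c = n.logical_not(c)     # True where c is zero

print(both)
print(either)
print(not_c)
```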

In­dex­ing and slic­ing ar­rays

The NumPy array index starts at 0. Let ‘a’ be a 2D array. ‘a[i][j]’ represents the (j+1)th element in the (i+1)th row. Equivalently, you can write it as ‘a[i,j]’. ‘a[3,:]’ represents all elements in the 4th row. ‘a[i:i+2, :]’ represents all the elements in the (i+1)th and (i+2)th rows, since the end of a slice is excluded.
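These indexing forms can be tried on a small array. The sketch below builds a 4x3 array with arange() and reshape() purely for illustration; the variable names are not from the article.

```python
import numpy as n

# A 4x3 array holding 0..11, so each row is easy to recognise.
a = n.arange(12).reshape(4, 3)

single = a[1, 2]        # same element as a[1][2]
fourth_row = a[3, :]    # all elements of the 4th row

i = 1
two_rows = a[i:i+2, :]  # rows i and i+1 only; the slice end is excluded

print(single)
print(fourth_row)
print(two_rows)
assert a[1][2] == a[1, 2]
```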

I am now going to explain an attractive feature of NumPy arrays: support for Boolean indexing. The example given below illustrates it.

>>> c = n.array([1, 4, 7, 8, 2])
>>> d = c<5
>>> d
array([ True,  True, False, False,  True], dtype=bool)
>>> c[d]
array([1, 4, 2])

Here, ‘d’ is a Boolean array whose element is set to ‘True’ if the corresponding element in ‘c’ has a value less than 5. Accessing array ‘c’ using ‘d’, i.e., c[d], will fetch an element in ‘c’ only if the element in the corresponding index position in ‘d’ is ‘True’. I will give one more example. An array ‘a’ is defined as follows:

>>> a = n.array([1, 2, 3, 4, 5, 6, 7])

We can see that the statement given below will retrieve all elements in array ‘a’ which are even numbers.

>>> a[a%2==0]
array([2, 4, 6])
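Boolean masks can also be combined and used for assignment. The following is a small sketch with illustrative names; note that conditions are combined with ‘&’ and each one needs its own parentheses.

```python
import numpy as n

a = n.array([1, 2, 3, 4, 5, 6, 7])

# Combine two conditions element-wise: even AND greater than 2.
mask = (a % 2 == 0) & (a > 2)
selected = a[mask]

# A mask on the left-hand side assigns to the matching positions only.
b = a.copy()
b[b > 5] = 0

print(selected)
print(b)
```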

In­te­ger over­flow in Python

Python 2 supports two types of integers: int and long. int is backed by a C integer type, which allows only a fixed range of values, while long has arbitrary precision, with a maximum value limited only by the available memory. If int is not enough, the value is automatically promoted to long. In Python 3, the single int type has arbitrary precision. So there is no question of overflow in integer operations in pure Python. But we cannot restrict our use to pure Python, since scientific computation needs packages in the PyData stack (e.g., NumPy, Pandas, SciPy, etc). The PyData stack uses C type integers, which have fixed precision. With the 64-bit integers used here, the maximum value an integer can take is 2^63-1. The overflow condition is shown below:

>>> a = n.array([2**63-1, 4], dtype=int)
>>> a
array([9223372036854775807, 4])
>>> a+1
array([-9223372036854775808, 5])
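The limits of a fixed-width integer type can be queried with iinfo(), and one possible way to avoid the wrap-around is to store Python integers in an object array, at the cost of speed. The sketch below is illustrative only; it requests int64 explicitly so that it does not depend on the platform's default integer size.

```python
import numpy as n

limits = n.iinfo(n.int64)
print(limits.min, limits.max)

# Fixed-width arithmetic wraps around silently at the upper limit.
a = n.array([2**63 - 1, 4], dtype=n.int64)
wrapped = a + 1

# With dtype=object the elements are ordinary Python integers, so the
# addition is exact, but the fast C loops are no longer used.
exact = n.array([2**63 - 1, 4], dtype=object) + 1

print(wrapped)
print(exact)
```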

To con­clude, NumPy not only makes com­pu­ta­tion eas­ier, but also makes the pro­gram run faster. It pro­vides mul­ti­di­men­sional ar­rays and tools to play with ar­rays.
