fastText: Incredibly Fast Text Classification

fastText is a state-of-the-art tool dedicated to superfast text classification, with accuracy on par with deep learning tools. It is a library designed to help build scalable solutions for text representation and classification.

OpenSource For You - Developers - By: Krishna Modi
The author has a B. Tech degree in computer engineering from NMIMS University, Mumbai and an M. Tech in cloud computing from VIT University, Chennai. He has rich and varied experience at various reputed IT organisations in India. He can b

As the volume of online data keeps growing, making sense of it becomes increasingly important, and machine learning tools are used for this purpose. A great deal of effort has gone into classifying data using deep learning tools, but unfortunately these are highly complicated procedures that consume vast CPU resources and time to produce results. fastText is one of the best available text classification libraries, offering blazing fast model training and fairly accurate classification results.

Text classification is a significant task in natural language processing (NLP), as it helps solve essential problems like spam filtering, Web search, page ranking, document classification, tagging and even sentiment analysis. Let us explore fastText in detail.

Why fastText?

fastText is an open source tool developed by the Facebook AI Research (FAIR) lab. It is a library dedicated to representing and classifying text at scale, and it offers faster performance than most other available tools. It is written in C++ but also has interfaces for other languages like Python and Node.js.

According to Facebook, “We can train fastText on more than one billion words in less than 10 minutes using a standard multi-core CPU, and classify half a million sentences among 312K classes in less than a minute.” Classification at that scale would generally take hours with most other machine learning tools.

Deep learning tools perform well on small data sets, but tend to be very slow on large data sets, which limits their use in production environments.

At its core, fastText uses the ‘bag of words’ approach, disregarding the order of words. It also uses a hierarchical classifier instead of a flat one, which reduces the cost of scoring the classes from linear to logarithmic in the number of classes and makes it much more efficient on large data sets with a high category count.
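To make these two ideas concrete, here is a minimal sketch in plain Python. It is not fastText's implementation: the function names are made up for illustration, and the tree is assumed to be a balanced binary tree, whereas fastText builds a Huffman tree over the labels. The sketch shows that a bag of words ignores word order, and compares how many scoring steps a flat classifier and a tree-based one would need for 312K classes.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count the tokens of a text, ignoring the order in which they appear."""
    return Counter(text.lower().split())


def flat_cost(num_classes):
    """A flat classifier scores every class once: linear in the class count."""
    return num_classes


def tree_cost(num_classes):
    """A balanced binary tree needs one decision per level: logarithmic."""
    return math.ceil(math.log2(num_classes))


# Two sentences with the same words in a different order give the same bag.
print(bag_of_words("the cat sat") == bag_of_words("sat the cat"))  # True

# Scoring steps for the 312K classes mentioned in the Facebook quote.
print(flat_cost(312_000), tree_cost(312_000))  # 312000 19
```

Under these assumptions, a prediction over 312,000 classes needs about 19 decisions instead of 312,000 scores, which is where the speed-up on large label sets comes from.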

Comparison and statistics

To test the fastText predictions, we used an already trained model built from 9000 Web articles of more than 300 words each, with eight class labels. We plugged this model into a Python API built with the asyncio framework, which works in an asynchronous fashion similar to Node.js. We then used the Apache benchmarking tool to evaluate the response time. The input was lorem ipsum text of about 500 lines, submitted as a single document for classification. No caching was used in any of the modules, to keep the test results sane. We performed 1000 requests, with 10 concurrent requests at a time, and got the results shown in Figure 1.
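The load pattern of this test (1000 requests, 10 in flight at a time) can be imitated in a few lines of asyncio. This is only a sketch under stated assumptions: classify() is a hypothetical stand-in that sleeps for a millisecond instead of calling a real model, so the timings it prints say nothing about fastText itself and are not the figures reported in this article.

```python
import asyncio
import time


async def classify(text):
    """Hypothetical stand-in for the real prediction endpoint."""
    await asyncio.sleep(0.001)  # simulated model latency, not a measurement
    return "__label__example"


async def timed_request(semaphore, text, timings):
    """Run one request and record its duration in milliseconds."""
    async with semaphore:  # caps the number of requests in flight
        start = time.perf_counter()
        await classify(text)
        timings.append((time.perf_counter() - start) * 1000)


async def benchmark(total=1000, concurrency=10):
    """Issue `total` requests, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)
    timings = []
    await asyncio.gather(
        *(timed_request(semaphore, "lorem ipsum", timings) for _ in range(total))
    )
    return sum(timings) / len(timings), max(timings)


if __name__ == "__main__":
    mean_ms, max_ms = asyncio.run(benchmark())
    print(f"mean {mean_ms:.1f} ms, max {max_ms:.1f} ms")
```

Replacing the body of classify() with a call to a real model or HTTP endpoint would turn this into a rough client-side latency check.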

The results show an average response time of 8 milliseconds and a maximum response time of 11 milliseconds. Table 1 shows the training time required and the accuracy achieved by fastText compared to other popular deep learning tools, as per the data presented by Facebook in one of its case studies.

With a recent update to the fastText library, FAIR has introduced compressed text classification models, which make it possible to use the library even on devices with little memory, such as mobile phones and the Raspberry Pi. This technique allows models that use gigabytes of memory to be reduced to only a few hundred kilobytes, while largely maintaining performance and accuracy.

Now that we know how well fastText can perform, let’s set it up.

Configuration and usage

It is quite simple to set up fastText. There are two ways to do this: either get the source and build it yourself, or install the Python interface and get started. Let’s look at both methods.

Building from the source code: You just need to get the source code from the Git repository, https://github.com/facebookresearch/fastText.git. Then go to the directory and enter make, which should compile the code and generate the fastText executable for you. The output should be as shown in Figure 2.

Installation using the Python interface: This is the recommended method, as you can use it later for training and prediction purposes in the same Python script.

The Python module for fastText requires Cython to be installed. Execute the following commands to install Cython and fastText:

pip install cython
pip install fasttext

And you are done! Just import fasttext and use a pretrained model to start predicting classes.

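import fasttext

# Load a previously trained classifier (model.bin must already exist)
model = fasttext.load_model('model.bin')
texts = ['fastText is really amazing', 'I love fastText']
# Predict the most likely class label for each text
labels = model.predict(texts)
print(labels)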
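The prediction snippet assumes that model.bin already exists. As a rough sketch of how such a model could be produced, the helper below writes training examples in the plain-text format that fastText's supervised mode reads, where each line starts with the label prefixed by __label__. The helper names, the example data and the file name train.txt are made up for illustration. The training call in the trailing comments assumes the current official fasttext module, where it is named train_supervised; older releases of the PyPI package used different function names.

```python
from pathlib import Path


def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line."""
    # Collapse newlines and repeated spaces so each example stays on one line.
    return f"__label__{label} {' '.join(text.split())}"


def write_training_file(examples, path):
    """Write (label, text) pairs to `path` and return the number of lines."""
    lines = [to_fasttext_line(label, text) for label, text in examples]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Illustrative data only; a real model needs far more examples per class.
examples = [
    ("sports", "The team won the final"),
    ("tech", "New phone released today"),
]
write_training_file(examples, "train.txt")

# With the fasttext module installed, training and saving would then be:
#   import fasttext
#   model = fasttext.train_supervised(input="train.txt")
#   model.save_model("model.bin")
```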

You can also refer to the Python fastText module documentation at https://pypi.python.org/pypi/fasttext for more details.

With the latest machine learning tools like fastText for text classification, you can certainly expect amazing products that utilise these capabilities, particularly in the field of artificial intelligence.

Figure 1: Benchmarking with fastText

Figure 2: Output
