In this month's column, we continue our discussion of the machine reading comprehension task.


The machine reading comprehension (MRC) task falls under the broader class of question-answering systems, as we discussed in last month's column. Given a passage of text and a set of questions, the task is to find the answers to those questions in the passage. In particular, we will focus on the simpler problem of answer extraction, where we assume that the answer to each question is present in the passage. Our task is then to identify the span, that is, the contiguous range of text positions that contains the answer.
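To make the span convention concrete, here is a minimal sketch (the passage and span values are made up for illustration): an answer is an inclusive pair of word indices, so extraction reduces to slicing the token list.

```python
# Hypothetical example: an "answer" is a span of contiguous word
# positions in the passage, so extraction reduces to slicing.
passage = "the quick brown fox jumps over the lazy dog".split()

def extract_answer(tokens, span):
    """Return the answer text for an inclusive (start, end) word span."""
    start, end = span
    return " ".join(tokens[start:end + 1])

print(extract_answer(passage, (3, 4)))  # -> "fox jumps"
```

The model's job is to predict the (start, end) pair; turning that pair back into answer text is this trivial slice.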

In the approach we discussed in last month's column, we proposed to create a fixed-length representation of the passage P and a fixed-length representation of the question Q, combine these two representations using an encoder, and then use the encoded representation as the input to a decoder that predicts the answer span in the original passage. Can you identify the issues with this approach? While it is simple, its disadvantage is that the entire passage, however large, gets compressed into a single fixed-length vector representation.

For instance, let us assume that our passage consists of 100 sentences, each containing approximately 30 words, so the passage contains around 3,000 words in total. Suppose we have two questions, Q1 and Q2, where the answer span for Q1 is <10, 15> and the answer span for Q2 is <2900, 2910>. Note that an answer span is specified in terms of the word indices of the passage. The passage is converted into a fixed-length representation by passing it through a recurrent neural network. As standard recurrent neural networks suffer from the exploding/vanishing gradient problem on long sequences of text, the standard practice is to use a gated variant of recurrent neural networks, such as LSTMs or GRUs.
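The word-index arithmetic above can be sketched as follows (a toy illustration using the column's 100-sentence, 30-words-per-sentence setup; the `locate` helper is hypothetical):

```python
# Toy illustration of the word-index convention: with 100 sentences of
# 30 words each, global word index i maps to sentence i // 30 and
# position i % 30 within that sentence.
WORDS_PER_SENTENCE = 30
NUM_SENTENCES = 100
total_words = WORDS_PER_SENTENCE * NUM_SENTENCES

def locate(word_index):
    """Map a global word index to (sentence, word-within-sentence)."""
    return word_index // WORDS_PER_SENTENCE, word_index % WORDS_PER_SENTENCE

print(total_words)    # 3000
print(locate(10))     # (0, 10): span <10, 15> sits in the very first sentence
print(locate(2900))   # (96, 20): span <2900, 2910> sits near the end
```

The point is that Q1's answer lies near the start of the sequence while Q2's lies near the end, which matters once the whole sequence is squeezed through a recurrent encoder.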

Let us assume that we use an LSTM to encode this long passage of around 3,000 words. Even with LSTMs, we find that predicting information that depends on individual words occurring much earlier in the sequence becomes difficult, as the information in the sequence gets compressed into a fixed-length vector. Given that we are compressing a 3,000-word sequence into a fixed-length vector of size 256, for instance, we lose vital word-level information, which impacts the ability to predict the answer span correctly. How do we then overcome the limitation of a fixed-length vector representation of the passage? Instead of using a vector, can we use a matrix to encode the text sequence?
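The shape mismatch at the heart of the problem can be sketched in a few lines (a simplification: mean-pooling over random embeddings stands in for the LSTM's final state, since both collapse the sequence to one vector):

```python
import numpy as np

# Sketch of the shape problem: mean-pooling stands in here for the
# LSTM's final hidden state; both compress the sequence to one vector.
rng = np.random.default_rng(0)
T, d = 3000, 256                    # 3000 words, 256-dim representations
passage = rng.standard_normal((T, d))

fixed_vector = passage.mean(axis=0)  # everything squeezed into 256 numbers
print(fixed_vector.shape)            # (256,): individual word positions are gone

matrix = passage                     # one vector per word is retained
print(matrix.shape)                  # (3000, 256): a span like <2900, 2910>
                                     # can still address individual words
```

With the fixed vector, there is no longer any per-word slot for the decoder to point at; with the matrix, row i still corresponds to word i.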

There are multiple ways in which we can create a matrix representation of the text sequence.

Let us assume that, as we pass the sequence of words in the passage through the LSTM, we extract the output of the LSTM after each time-step. We then have as many output vectors as there are words in the text sequence. We can stack these vectors together to construct a matrix that represents the passage. This can be further enhanced by passing the passage through a bi-directional LSTM, and concatenating the forward and backward output vectors at each time-step to construct the encoded passage matrix.
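The steps above can be sketched as follows. This is a minimal stand-in, with a plain tanh RNN instead of an LSTM (the dimensions and weights are arbitrary); the point is that keeping every time-step's output and running both directions yields a T x 2h matrix.

```python
import numpy as np

# Minimal sketch: a plain tanh RNN stands in for the LSTM. We keep the
# output of every time-step and stack them into a matrix.
rng = np.random.default_rng(1)
T, d, h = 12, 8, 5                       # sequence length, input dim, hidden dim
x = rng.standard_normal((T, d))
Wx = rng.standard_normal((d, h)) * 0.1   # input-to-hidden weights
Wh = rng.standard_normal((h, h)) * 0.1   # hidden-to-hidden weights

def run_rnn(inputs):
    """Return one hidden-state vector per time-step (shape T x h)."""
    state = np.zeros(h)
    outputs = []
    for x_t in inputs:
        state = np.tanh(x_t @ Wx + state @ Wh)
        outputs.append(state)
    return np.stack(outputs)

forward = run_rnn(x)                     # left-to-right pass
backward = run_rnn(x[::-1])[::-1]        # right-to-left pass, realigned
encoded = np.concatenate([forward, backward], axis=1)
print(encoded.shape)                     # (12, 10): one 2h-dim row per word
```

Note the re-reversal of the backward pass, so that row t of the final matrix combines both directions' views of word t.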

By representing the passage as a matrix instead of a single vector, we reduce the loss of information associated with a fixed-length vector representation. Just as we represented the passage as a matrix, we can also represent the question text as a matrix. Given these two matrix representations, we can combine them using our choice of encoder, and feed the encoded representation to a decoder to predict the answer span.
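One common choice for combining the two matrices, offered here as an illustrative assumption rather than the column's prescription, is attention: score every passage word against every question word, and summarise the question for each passage position.

```python
import numpy as np

# Hypothetical combination step: dot-product attention between the
# passage matrix P and the question matrix Q (random stand-in data).
rng = np.random.default_rng(2)
Tp, Tq, h = 3000, 12, 256
P = rng.standard_normal((Tp, h))          # passage matrix: one row per word
Q = rng.standard_normal((Tq, h))          # question matrix: one row per word

scores = P @ Q.T                          # (Tp, Tq) similarity matrix
scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax over question words

attended = weights @ Q                    # (Tp, h): question summary per word
fused = np.concatenate([P, attended], axis=1)
print(fused.shape)                        # (3000, 512): per-word input to a
                                          # decoder scoring start/end positions
```

The fused matrix keeps one row per passage word, so a decoder can score each position as a potential span start or end.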

Sandya Mannarswamy
