OpenSource For You


In this month’s column, we continue our discussion of the machine reading comprehension task.


The machine reading comprehension (MRC) task falls under the broader class of question-answering systems, as we discussed in last month’s column. Given a passage of text and a set of questions, the task is to find the answers to the questions from the passage. In particular, we will focus on the simple problem of answer extraction, where we assume that the answer to the question is present in the passage. Our task is to identify the span, i.e., the contiguous text locations that contain the answer.

In the approach we discussed in last month’s column, we proposed to create a fixed-length representation of the passage P and a fixed-length representation of the question Q, combine these two representations using an encoder, and then use the encoded representation as input to a decoder that predicts the answer span in the original passage. Can you identify the issues associated with this approach? While it is simple, the disadvantage is that the entire large passage of text gets compressed into a single fixed-length vector representation.
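The fixed-length-vector approach above can be sketched as follows in PyTorch. All module names, embedding sizes and the two-linear-head span decoder are illustrative assumptions for the purpose of the sketch, not the column’s actual implementation:

```python
# Sketch of the fixed-length-vector approach: encode passage and question
# each into ONE vector, combine them, and predict start/end positions.
# All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

EMBED_DIM, HIDDEN_DIM, MAX_LEN = 64, 256, 3000

passage_encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
question_encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)

passage = torch.randn(1, MAX_LEN, EMBED_DIM)   # embedded 3000-word passage
question = torch.randn(1, 12, EMBED_DIM)       # embedded 12-word question

# The final hidden state is the fixed-length representation: the whole
# 3000-word passage is squeezed into one 256-dimensional vector.
_, (p_vec, _) = passage_encoder(passage)
_, (q_vec, _) = question_encoder(question)

# Combine the two vectors and score every passage position as a
# candidate answer start and answer end.
combined = torch.cat([p_vec.squeeze(0), q_vec.squeeze(0)], dim=-1)  # (1, 512)
span_decoder = nn.Linear(2 * HIDDEN_DIM, 2 * MAX_LEN)
start_logits, end_logits = span_decoder(combined).split(MAX_LEN, dim=-1)
```

Note that no matter how long the passage is, everything the decoder sees about it is the single 256-dimensional vector `p_vec`, which is exactly the bottleneck discussed above.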

For instance, let us assume that our passage consists of 100 sentences, with each sentence containing approximately 30 words. So the passage contains 3000 words in total. Let us assume that we have two questions, Q1 and Q2, where the answer span for question Q1 is <10, 15> and the answer span for Q2 is <2900, 2910>. Note that the answer span is specified in terms of the word indices of the passage of text. The sequence of this passage of text is converted into a fixed-length representation by passing it through a recurrent neural network. As standard recurrent neural networks suffer from the exploding/vanishing gradient problem on long sequences of text, the standard practice is to use a gated variant of recurrent neural networks such as LSTMs or GRUs.
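To make the span notation concrete, here is a toy illustration of answer spans as inclusive word indices into a passage. The ten-word passage and the span are made-up stand-ins for the 3000-word example above:

```python
# Answer spans are inclusive <start, end> word-index pairs into the passage.
# This toy passage stands in for the 3000-word example in the text.
passage_words = ["the", "quick", "brown", "fox", "jumps", "over",
                 "the", "lazy", "dog", "today"]

def extract_span(words, start, end):
    """Return the answer text for an inclusive word-index span <start, end>."""
    return " ".join(words[start:end + 1])

print(extract_span(passage_words, 2, 3))  # -> "brown fox"
```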

Let us assume that we use an LSTM to encode this long passage of text, which contains around 3000 words. Even with LSTMs, we find that predicting pieces of information that depend on individual words occurring much earlier in the sequence becomes difficult, as the information in the sequence gets compressed into a fixed-length vector. Given that we are compressing a 3000-word sequence into a fixed-length vector of size 256, for instance, we are losing vital word-level information, which could impact the capability to predict the answer span correctly. How do we then overcome the issue of the fixed-length vector representation of the passage? Instead of using a vector, can we use a matrix to encode the text sequence?

There are multiple ways in which we can create a matrix representation of the text sequence.

Let us assume that as we pass the sequence of words in the passage of text through the LSTM, we extract the output of the LSTM after each time-step. Hence, we will have as many output vectors as there are words in the text sequence. We can concatenate these vectors together to construct a matrix which represents the passage of text. This can be further enhanced by passing the passage of text through a bi-directional LSTM, and concatenating the forward and backward output vectors at each time-step to construct the encoded passage matrix.
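The bi-directional variant described above can be sketched in a few lines of PyTorch. The hidden size of 128 per direction is an illustrative assumption:

```python
# Matrix representation of the passage: keep the bi-directional LSTM
# output at EVERY time-step instead of only the final hidden state.
# Sizes are illustrative assumptions.
import torch
import torch.nn as nn

EMBED_DIM, HIDDEN_DIM = 64, 128
passage = torch.randn(1, 3000, EMBED_DIM)  # batch of one 3000-word passage

bilstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True, bidirectional=True)

# `outputs` already concatenates the forward and backward vectors at each
# time-step, so it IS the encoded passage matrix: one 256-dim row per word.
outputs, _ = bilstm(passage)
passage_matrix = outputs.squeeze(0)  # shape (3000, 256)
```

Each of the 3000 rows now carries word-level information for that position, rather than everything being funnelled through one vector.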

By representing the passage as a matrix instead of a single vector, we manage to reduce the loss of information associated with a fixed-length vector representation. Just as we represented the passage of text as a matrix, we can also represent the question text as a matrix. Given that we now have these two matrix representations, we can then combine them based on our choice of encoder, and feed the encoded representation to a decoder to predict the answer span.
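One possible way to combine the two matrices, offered here as an assumption rather than the column’s prescription, is dot-product attention from each passage word over the question words, followed by a linear layer that scores each passage position as a span start or end:

```python
# Sketch: combine passage matrix P and question matrix Q via dot-product
# attention, then score each passage position as a span start/end.
# This particular combination scheme is an illustrative assumption.
import torch
import torch.nn as nn

HIDDEN = 256
P = torch.randn(3000, HIDDEN)  # passage matrix: one row per passage word
Q = torch.randn(12, HIDDEN)    # question matrix: one row per question word

# Attention weights of every passage word over every question word.
scores = P @ Q.t()                       # (3000, 12)
weights = torch.softmax(scores, dim=-1)
q_aware = weights @ Q                    # question-aware passage, (3000, 256)

# Score each passage position as a candidate answer start and end.
span_scorer = nn.Linear(2 * HIDDEN, 2)
logits = span_scorer(torch.cat([P, q_aware], dim=-1))  # (3000, 2)
start = logits[:, 0].argmax().item()
end = logits[:, 1].argmax().item()
```

Because the decoder now scores positions directly against the per-word rows of the matrix, an answer near word 2900 is no harder to point at than one near word 10.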

Sandya Mannarswam­y
