
CODE SPORT


In this month’s column, we discuss how language models lack common sense knowledge and what can be done to address this.

In last month’s column, we discussed how natural language processing (NLP) techniques can be applied to programming and software development. We covered how ‘Big Code’ data sets enable NLP models to be used for bug detection and comment generation, and how language models can be enhanced for source code modelling. NLP models are also being applied to a wide range of other fields that require deep natural language understanding (NLU) and, in certain cases, reasoning capabilities. In this month’s column, we will look at why deep NLU requires common sense knowledge, what is missing in today’s large pre-trained language models, and how this can be addressed.

In human-to-human communication, the text used for communication does not always explicitly state all the information needed to process it. Consider the following example from the task of natural language inference (NLI). In NLI, you are given a premise and a hypothesis, and the task is to predict one of three classes: (a) entailment – the hypothesis follows from the premise; (b) contradiction – the hypothesis contradicts the premise; and (c) neutral – the hypothesis can neither be inferred from nor contradicted by the premise. An example is given below.

Premise: The moon was shining brightly and the sun has not risen.

Hypothesis: It was night.

For human beings, this is an easy inference: since the sun has not yet risen, it must be night. Hence the correct output class is entailment. However, even state-of-the-art neural NLI models get this example wrong. They wrongly predict the output class as contradiction and, in some cases, as neutral, whereas human annotators correctly label it as entailment. Why do even sophisticated neural NLP models, including those built on top of the BERT family, make this mistake?

This is because human beings have the essential common sense world knowledge that if the sun has not risen, it must be night. However, this is not a common sense fact that NLP models would have learnt, unless it was present in the training data.

You may well ask: given that large language models like BERT, RoBERTa, etc, are trained on enormous amounts of real-world text, how is it possible that they have not learnt this common sense fact? The reason is what is known as ‘common background knowledge’. Because all humans share the background knowledge that ‘if the sun has not risen, it is night’, this fact is almost never stated explicitly in communication between two human beings. Since the NLP model rarely, if ever, sees this information stated explicitly in the data it is trained on, it does not encode it in its internal knowledge representation. Hence, it gets this example wrong.
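If you would like to try this NLI example yourself, the short sketch below shows one way to run the premise/hypothesis pair through an off-the-shelf NLI model using the Hugging Face transformers library. The checkpoint name roberta-large-mnli is just one publicly available MNLI-trained model chosen for illustration; any similar NLI checkpoint can be substituted, and the exact probabilities you see will vary.

# A minimal sketch; assumes the 'transformers' and 'torch' packages are installed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-large-mnli"  # one publicly available NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The moon was shining brightly and the sun has not risen."
hypothesis = "It was night."

# NLI models take the premise and hypothesis together as a sentence pair.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1).squeeze()

# Print the probability assigned to each class; label names come from
# the checkpoint's own configuration.
for idx, prob in enumerate(probs):
    print(model.config.id2label[idx], round(float(prob), 3))

Whether entailment comes out on top will depend on the particular checkpoint; the point of the exercise is simply to see how such a premise/hypothesis pair is scored.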

Let us consider one more example, this time from the BERT masked language model. Given the sentence, ‘The colour of the dove sitting on the branch is …..’, BERT quite often ends up predicting the missing word as ‘red’. You can try this example yourself; because sampling from language models is stochastic, you may get different outputs, but after a few tries you should see this answer. BERT knows that the answer needs to be a word in the category ‘colour’. Based on its distributional knowledge, it knows that colours can be red, blue, grey, green, etc, and it tends to pick the colour word most commonly associated with the word ‘dove’. However, it does not know that red doves are rare, so it ends up occasionally choosing the word ‘red’ when completing the above sentence.
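You can reproduce this experiment with the fill-mask pipeline of the Hugging Face transformers library; the sketch below uses the bert-base-uncased checkpoint purely for illustration, and the completions you get may differ.

# A minimal sketch: ask BERT for its top completions of the masked word.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The colour of the dove sitting on the branch is [MASK]."
for prediction in fill(sentence, top_k=5):
    # token_str is the predicted word; score is its probability.
    print(prediction["token_str"], round(prediction["score"], 3))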

Consider another linguistic task that can be used to test for common sense. Here, the task is to fill in the blank with one of the two choices ‘always’ and ‘never’. For example, given the sentence ‘Hens ….. have horns’, the model needs to select between ‘always’ and ‘never’. Neural language models often make commonsensical mistakes on such questions. They sometimes end up choosing ‘Hens always have horns’, indicating gaps in their knowledge of the physical world. Remember that language models learn purely from text. The two concepts ‘hens’ and ‘horns’ are not explicitly connected in language text; it is rare to find an explicit statement saying that ‘hens do not have horns’. Hence, models do not know that ‘hens have horns’ is a common sense fallacy.
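This probe can also be made targeted: instead of looking at the top completions, we can directly compare the probability a masked language model assigns to ‘always’ versus ‘never’ in the blank position. The sketch below again uses the bert-base-uncased checkpoint purely for illustration.

# A minimal sketch: score the two candidate words at the masked position.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

sentence = f"Hens {tokenizer.mask_token} have horns."
inputs = tokenizer(sentence, return_tensors="pt")
# Locate the position of the mask token in the input.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits
probs = logits[0, mask_pos].softmax(dim=-1)

for word in ["always", "never"]:
    token_id = tokenizer.convert_tokens_to_ids(word)
    print(word, round(float(probs[token_id]), 4))

If ‘always’ scores comparably to, or higher than, ‘never’, that is exactly the kind of gap being discussed here.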

One way of overcoming the above issue is to use multimodal neural models, which combine vision and text inputs during pre-training. The knowledge that ‘hens do not have horns’ is represented in images. Once a model gathers knowledge of the visual concept of ‘horns’ from images, it can learn that the ‘hen’ image does not have the ‘horn’ concept associated with it, which allows it to answer the question correctly. Hence, multimodal pre-trained models are one way of addressing the gaps in common sense knowledge, by combining information from both visual images and textual representations.
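To give a flavour of how a vision-and-text model can be queried about such a visual attribute, the sketch below scores two candidate captions against an image using the publicly available CLIP checkpoint openai/clip-vit-base-patch32. The file name hen.jpg is a hypothetical local image used only for illustration; the sketch shows the mechanics of image-text scoring rather than making any claim about what the model will actually predict.

# A minimal sketch of image-text scoring with a multimodal model.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("hen.jpg")  # hypothetical local image of a hen
captions = ["a photo of a hen with horns", "a photo of a hen without horns"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the similarity of the image to each caption.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, prob in zip(captions, probs[0]):
    print(caption, round(float(prob), 3))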

There are different kinds of common sense knowledge. Physical common sense knowledge is associated with physical concepts. Based on the physical laws of nature, some examples of such facts are: (a) night is the time when the sun has not risen; (b) sea water is salty; and (c) any device that runs on electricity needs to be plugged in or must have an alternative source of energy such as a battery. Factual common sense knowledge comprises facts associated with past historical events or knowledge of objects/persons/things. This is what is commonly referred to as encyclopaedic factual knowledge, such as ‘Dante is a famous Italian poet who was born in Florence’, ‘President Obama was born in Hawaii and has a Kenyan grandfather’, or ‘the Second World War took place between 1939 and 1945’.

There is also another kind of common sense knowledge, known as social common sense knowledge. For instance, the knowledge that human beings are happy to dine in public but prefer to bathe in private is social behavioural knowledge based on human habits. Similarly, human beings often leave closet doors open but would rather not knowingly leave the fridge door open. Such knowledge, grounded in social norms and typical human behaviour, is what we call social common sense. Because much of this knowledge is implied and understood between human beings, it is rarely stated explicitly in communication, making it difficult for models to learn it directly. This leads to what is known as reporting bias: the frequency with which certain types of situations or occurrences are reported in text need not match the relative frequency with which they actually occur, or are believed to occur, in the real world.

Reporting bias was well demonstrated in a 2013 paper titled ‘Reporting bias and knowledge acquisition’, available at https://openreview.net/pdf?id=AzxEzvpdE3Wcy. The authors found that the information ‘human beings have eyes’ was reported a million times in text, whereas the information ‘human beings have a spleen’ was reported fewer than 1500 times. The fact that many other human body parts are not often mentioned in text does not mean that those body parts are any less common. Since language models acquire their knowledge through distributional semantics, they place greater confidence in information seen at higher frequencies in text. So their confidence that human beings have eyes would be much higher than their confidence that human beings have a spleen, even though, in reality, both facts are equally true. Reporting bias is a major hurdle in knowledge acquisition by neural models.

Given all these gaps in the common sense knowledge of pre-trained language models, we need to think about how they can be addressed. If pre-trained language models cannot automatically acquire complete common sense knowledge from the texts they are trained on, what can be done? One way to augment the common sense knowledge learnt by language models is to explicitly inject it from external knowledge sources. For factual knowledge, sources such as Wikipedia can be used to augment the knowledge that models learn automatically from text. However, for situational, physical and social common sense knowledge, such ready-made sources are often not available.

Knowledge sources can represent their knowledge either in structured form or as unstructured text. Wikipedia is an example of knowledge represented in natural language text form. On the other hand, a well known source of structured knowledge is ConceptNet. Originally started as a crowd-sourced effort by the MIT Media Lab in 1998, it has evolved into a semantic network that captures the attributes and relations of various concepts, representing common sense knowledge about everyday human life. ConceptNet has been widely used as a source of external knowledge in neural NLP models.
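ConceptNet also exposes a public REST API at api.conceptnet.io, which makes it easy to inspect the relations stored for a given concept. The sketch below pulls a few edges for the concept ‘hen’; it assumes the ‘requests’ package is installed and that the API’s JSON structure (edges carrying ‘start’, ‘rel’ and ‘end’ fields) is as currently documented.

# A minimal sketch: fetch a few ConceptNet edges for the concept 'hen'.
import requests

response = requests.get("http://api.conceptnet.io/c/en/hen", params={"limit": 10})
data = response.json()

for edge in data["edges"]:
    # Each edge links two concepts through a relation such as IsA or CapableOf.
    start = edge["start"]["label"]
    relation = edge["rel"]["label"]
    end = edge["end"]["label"]
    print(f"{start} --{relation}--> {end}")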

While we can use external knowledge sources for information, the question still remains as to how the knowledge from them can be incorporated into a neural model. We will cover this question in next month’s column. Meanwhile, there is an excellent beginner’s tutorial on common sense knowledge representation in NLP from last year’s ACL conference, available at https://www.aclweb.org/anthology/2020.acl-tutorials.7.pdf. Please go through it if you would like more details.

Feel free to reach out to me over LinkedIn/email if you need any help in your coding interview preparation. If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Wishing all our readers happy coding until next month! Stay healthy and stay safe.

Sandya Mannarswamy
