CODE SPORT
In this month’s column, we discuss why language models lack common sense knowledge and what can be done to address this.
In last month’s column, we discussed how natural language processing (NLP) techniques can be applied to programming and software development. We covered how ‘Big Code’ data sets enable NLP models to be used for bug detection and comment generation, and how language models can be enhanced for source code modelling. NLP models are also being applied to a wide range of other fields that require deep natural language understanding (NLU) and, in certain cases, reasoning capabilities. In this month’s column, we will look at how deep NLU requires common sense knowledge, what is missing in today’s large pre-trained language models, and how this can be addressed.
In human-to-human communication, the text used does not always explicitly state all the information needed to understand it. Consider the following example from the task of natural language inference (NLI). In NLI, you are given a premise and a hypothesis, and the task is to predict one of three classes, namely: (a) entailment – the hypothesis follows from the premise; (b) contradiction – the hypothesis contradicts the premise; and (c) neutral – the hypothesis can neither be inferred nor contradicted based on the premise. An example is given below.
Premise: The moon was shining brightly and the sun has not risen.
Hypothesis: It was night.
For human beings, this is an easy example to infer: since the sun has not yet risen, it must be night. Hence the correct output class is entailment. However, even state-of-the-art neural NLI models get this example wrong. They wrongly predict the output class as contradiction and, in some cases, as neutral, whereas human annotators correctly label it as entailment. Why do even sophisticated neural NLP models, including those built on top of the BERT family, make this mistake?
This is because human beings have the essential common sense world knowledge that if the sun has not risen, it must be night. However, this is not a common sense fact that NLP models would have learnt, unless it was present in the training data.
You may ask: given that large language models like BERT, RoBERTa, etc, are trained on vast amounts of real-world text, how is it possible that they have not learnt this common sense fact? This happens because of what is known as ‘common background knowledge’ shared between all human beings. Because humans already share the background knowledge that ‘if the sun has not risen, it is night’, this fact is almost never stated explicitly in communication between two human beings. Since the NLP model never sees this information explicitly in the data it is trained on, it cannot encode it in its internal knowledge representation. Hence, it gets this example wrong.
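To make the NLI setup concrete, here is a minimal sketch in plain Python of how a classifier turns raw scores into one of the three labels via a softmax and an argmax. The logits below are made up for illustration, chosen to mimic a model that wrongly favours ‘contradiction’ on the moon/night example; no real pretrained model is involved.

```python
import math

# The three NLI output classes described above.
LABELS = ["entailment", "neutral", "contradiction"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Pick the highest-probability NLI label."""
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))]

# Made-up logits mimicking a model that lacks the
# 'sun has not risen => night' background fact.
logits = [0.4, 0.9, 2.1]
print(predict(logits))  # → contradiction
```

A model that had encoded the background fact would instead assign the highest score to the first position and output ‘entailment’.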
Let us consider one more example, taken from the BERT language model. Given the sentence, “the colour of the dove sitting on the branch is …..,” BERT quite often predicts the missing word as ‘red’. You can try this example yourself; because language models are stochastic, you may get different outputs, but after a few tries you should see this answer. BERT knows that the answer needs to be a word in the category ‘colour’. From its distributional knowledge, it knows that colours can be red, blue, grey, green, etc, and it tends to pick the colour word most commonly associated with ‘dove’. However, it does not know that red doves are rare, so it occasionally ends up choosing the word ‘red’ when completing the above sentence.
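The distributional intuition behind this failure can be sketched with a toy co-occurrence table. The counts below are entirely invented for illustration; the point is only that a purely frequency-based guess picks whichever colour word co-occurs most with ‘dove’ in text, with no grounding in what doves actually look like.

```python
# Invented co-occurrence counts between 'dove' and colour words,
# standing in for the distributional statistics a language model learns.
cooccurrence = {
    "red": 120,   # e.g. inflated by idioms and song lyrics in the corpus
    "white": 95,
    "grey": 60,
    "blue": 15,
}

def most_likely_colour(counts):
    """A purely frequency-based guess, with no real-world grounding."""
    return max(counts, key=counts.get)

print(most_likely_colour(cooccurrence))  # → red
```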
Consider another linguistic task that can be used to test for common sense. Here, the task is to fill in the blank with one of the two choices ‘always’ and ‘never’. For example, given the sentence ‘Hens ….. have horns’, the model needs to select between ‘always’ and ‘never’. Neural language models often make commonsensical mistakes on such questions. They end up choosing ‘Hens always have horns’, indicating gaps in their knowledge of the physical world. Remember that language models learn purely from text, and the two concepts ‘hens’ and ‘horns’ are not explicitly connected in language text. It is rare to find an explicit statement saying that ‘hens do not have horns’. Hence, models do not know that ‘hens have horns’ is a common sense fallacy.
One way of overcoming this issue is to use multimodal neural models, which combine vision and text inputs in their pretraining. The knowledge that ‘hens do not have horns’ is represented in images: once a model learns the visual concept of ‘horns’ from images, it can learn that images of hens never carry the ‘horns’ concept, which allows it to answer the question correctly. Hence, multimodal pretrained models are one way of addressing gaps in common sense knowledge, by combining information from visual images and textual representations.
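The idea can be illustrated with a toy sketch. Suppose a vision pipeline has already extracted a set of visual concepts per object from labelled images (the dictionary below is invented for illustration, not the output of any real model); a grounded model can then check attribute membership directly instead of relying on textual co-occurrence.

```python
# Invented visual-attribute sets, standing in for concepts a
# vision model might extract from labelled images.
visual_attributes = {
    "hen": {"beak", "feathers", "wings", "comb"},
    "goat": {"horns", "hooves", "fur", "beard"},
}

def has_attribute(obj, attribute):
    """Answer a physical common sense query from visual knowledge."""
    return attribute in visual_attributes.get(obj, set())

print(has_attribute("hen", "horns"))   # → False
print(has_attribute("goat", "horns"))  # → True
```

Grounding in images supplies exactly the negative facts (‘hens do not have horns’) that text alone rarely states.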
There are different kinds of common sense knowledge. Physical common sense knowledge is associated with physical concepts. For example, based on the physical laws of nature, some common sense facts are: (a) night is the time when the sun has not risen, (b) sea water is salty, and (c) any device that runs on electricity needs to be plugged in or powered by an alternative energy source such as a battery. Factual common sense knowledge comprises facts associated with past historical events or knowledge of objects/persons/things. This is what is commonly referred to as factual encyclopaedic knowledge, such as ‘Dante is a famous Italian poet who was born in Florence’, ‘President Obama was born in Hawaii and had a Kenyan father’, or ‘the Second World War was fought between 1939 and 1945’.
There is also another kind of common sense knowledge, known as social common sense knowledge. For instance, the knowledge that human beings are comfortable dining in public but bathe in private is social behavioural knowledge based on human habits. Similarly, human beings often leave closet doors open but would rather not knowingly leave the fridge door open; this is based on social norms and human behaviour. Because much of this knowledge is implicitly understood between human beings, it is rarely stated in explicit communication, making it difficult for models to learn it directly from text. This gives rise to what is known as reporting bias: the frequencies at which situations and facts are reported in text need not match their actual frequencies in the real world.
Reporting bias was well demonstrated in a 2013 paper titled ‘Reporting bias and knowledge acquisition’, available at https://openreview.net/pdf?id=AzxEzvpdE3Wcy. The authors found that the fact ‘human beings have eyes’ was reported around a million times in text, whereas the fact ‘human beings have a spleen’ was reported fewer than 1,500 times. That many other human body parts are rarely mentioned in text does not mean those body parts are less common. Since language models acquire their knowledge through distributional semantics, they place greater confidence in information seen at higher frequencies in text. So their confidence that human beings have eyes is much higher than their confidence that human beings have a spleen, even though both facts are equally true. Reporting bias is thus a major hurdle in knowledge acquisition by neural models.
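The effect is easy to see numerically. The sketch below uses the approximate mention counts quoted above and a naive confidence measure that is simply proportional to reporting frequency; it is an illustration of the bias, not how any particular model computes confidence.

```python
# Approximate mention counts from the reporting-bias discussion above.
mentions = {
    "humans have eyes": 1_000_000,
    "humans have a spleen": 1_500,
}

def relative_confidence(fact, counts):
    """Naive confidence proportional to how often a fact appears in text."""
    return counts[fact] / sum(counts.values())

for fact in mentions:
    print(fact, round(relative_confidence(fact, mentions), 4))
# Both facts are equally true, yet the frequency-based confidence in
# 'humans have eyes' is hundreds of times higher.
```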
Given all these gaps in the common sense knowledge of pretrained language models, how can they be addressed? Since pretrained language models cannot automatically acquire complete common sense knowledge from the texts they are trained on, what can be done? One way to augment the common sense knowledge learnt by language models is to explicitly inject it from external knowledge sources. For factual knowledge, sources such as Wikipedia can be used to augment the knowledge that models capture automatically from text. However, for situational, physical and social common sense knowledge, such ready-made sources are often not available.
Knowledge sources can represent their knowledge either in structured form or as unstructured text. Wikipedia is an example of knowledge represented as natural language text. On the other hand, a well-known source of structured knowledge is ConceptNet. Originally started as a crowd-sourced effort by the MIT Media Lab in the late 1990s, it has evolved into a semantic network that captures the attributes and relations of various concepts representing common sense knowledge in human life. ConceptNet has been widely used as a source of external knowledge in neural NLP models.
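ConceptNet stores knowledge as (start, relation, end) edges, such as (hen, CapableOf, lay eggs). The sketch below builds a tiny in-memory semantic network in that triple style and queries it; the edges are invented for illustration and are not fetched from the real ConceptNet API.

```python
# A tiny in-memory semantic network in ConceptNet's
# (start, relation, end) triple style. Edges are illustrative.
edges = [
    ("hen", "IsA", "bird"),
    ("hen", "HasA", "beak"),
    ("hen", "CapableOf", "lay eggs"),
    ("goat", "HasA", "horns"),
    ("night", "HasProperty", "dark"),
]

def query(start, relation):
    """Return all end concepts linked to 'start' via 'relation'."""
    return [e for (s, r, e) in edges if s == start and r == relation]

print(query("hen", "HasA"))   # → ['beak'], with no 'horns' edge for hen
print(query("goat", "HasA"))  # → ['horns']
```

A neural model augmented with such structured edges can consult them directly, rather than hoping the same facts were stated somewhere in its training text.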
While we can use external knowledge sources for information, the question still remains as to how we can incorporate the knowledge from them into the neural model. We will cover this question in next month’s column. Meanwhile, there is an excellent beginner’s tutorial, ‘Commonsense Reasoning for Natural Language Processing’, from last year’s ACL conference, available at https://www.aclweb.org/anthology/2020.acl-tutorials.7.pdf. Please go through it if you would like more details.
Feel free to reach out to me over LinkedIn/email if you need any help in your coding interview preparation. If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Wishing all our readers happy coding until next month! Stay healthy and stay safe.