Google open sources ‘Parsey McParseface’
‘The World’s Most Accurate Parser’ has been made available by the search giant for anyone to download.
Google (which makes no secret of its research into machine learning) recently open-sourced SyntaxNet, a neural network framework implemented in TensorFlow (Google’s open source software library for machine learning). Google says SyntaxNet can be used as a foundation for Natural Language Understanding (NLU) systems.
The release includes all the code needed to train new SyntaxNet models as well as ‘Parsey McParseface’, an English parser trained by Google that can be used to analyse text. The name is a nod towards the recent controversy over the naming competition for the UK research ship RRS Sir David Attenborough (the name that won the popular vote being ‘Boaty McBoatface’). Parsey McParseface is touted as the most accurate parser in the world and is built on machine learning algorithms that learn to analyse the linguistic structure of language and explain the functional role of each word in a given sentence.
Slav Petrov, a Senior Research Scientist at Google, gave examples of how SyntaxNet takes sentences and determines the syntactic relationships between the words in them. He also described how longer sentences might have thousands of different possible structures, one source of which is known somewhat brain-bendingly as ‘prepositional phrase attachment ambiguity’. Humans resolve such ambiguity easily, but SyntaxNet relies on neural networks to deal with it (http://bit.ly/SyntaxNetOpenSourced).
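To make the attachment ambiguity concrete, the sketch below stores a dependency parse as a list of head positions, one per word, and compares two readings of an illustrative sentence. The representation, the variable names and the helper `differing_attachments` are assumptions made for illustration; they are not SyntaxNet's API or output format.

```python
# Toy representation (an assumption, not SyntaxNet's format):
# heads[i] is the 1-based position of the word that word i+1 attaches to,
# with 0 marking the root of the sentence.
words = ["Alice", "drove", "down", "the", "street", "in", "her", "car"]

# Reading 1: "in her car" attaches to the verb "drove" (she was in the car).
heads_verb_attach = [2, 0, 2, 5, 3, 2, 8, 6]

# Reading 2: "in her car" attaches to the noun "street" (the street is in the car).
heads_noun_attach = [2, 0, 2, 5, 3, 5, 8, 6]


def differing_attachments(words, heads_a, heads_b):
    """Return (word, head_in_a, head_in_b) for each word whose head differs."""
    def head_word(index):
        return "ROOT" if index == 0 else words[index - 1]

    return [
        (word, head_word(a), head_word(b))
        for word, a, b in zip(words, heads_a, heads_b)
        if a != b
    ]


print(differing_attachments(words, heads_verb_attach, heads_noun_attach))
```

The two readings disagree only on where the word ‘in’ attaches, which is the kind of choice a parser has to make for every prepositional phrase.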
Google claims that Parsey correctly identifies dependencies between words over 94% of the time on standard test data (sentences from news wire), which approaches human levels of performance; on more free-form data this drops to 90%. The team want to carry on developing this approach in order to incorporate real-world knowledge and have Parsey become able to understand natural language across all languages and contexts. The paper detailing how all this works is at arXiv.org (http://bit.ly/GNTNN) while the code is available on GitHub (http://bit.ly/SyntaxNetCode).
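As a rough illustration of what an accuracy figure such as 94% measures, assuming it refers to the share of words assigned the correct head word (often called an attachment score), the calculation can be sketched as follows. The function name and the toy head lists are hypothetical and do not come from Google's evaluation code.

```python
def attachment_score(predicted_heads, gold_heads):
    """Fraction of words whose predicted head matches the reference head."""
    if len(predicted_heads) != len(gold_heads):
        raise ValueError("predicted and gold parses must cover the same words")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(p == g for p, g in zip(predicted_heads, gold_heads))
    return correct / len(gold_heads)


# Toy example: one head position per word, 0 marks the root.
gold = [2, 0, 2, 5, 3, 2, 8, 6]       # reference parse
predicted = [2, 0, 2, 5, 3, 5, 8, 6]  # parser output with one wrong head

print(attachment_score(predicted, gold))  # 7 of 8 heads match: 0.875
```

In practice such a score would be computed over every word in a whole test set rather than a single sentence.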