Modern-day computers are some of the most powerful objects we have created. They can take us to space, measure our heart rate on a run and help us create graphics that are lifelike. However, while computers are great at a lot of things, reading isn’t one of them.
Artificial Intelligence (AI) that can read would certainly be a revolution – it would save us hours that we could dedicate to more complex tasks. In medicine, for example, several thousand papers are published every day; no doctor or researcher can possibly read them all. Drug discovery gets delayed because the information is locked up in unread literature. Similarly, lawyers often spend hours poring over volumes of case papers to tease out the smallest of details. What if a computer could do all that?
Google Tried, But It Wasn’t Very Successful
When it comes to AI, few companies can match Google’s investment and brainpower. In early 2018, futurist and inventor Ray Kurzweil (currently a director of engineering at Google) took the stage at TED’s annual conference to unveil Google Talk to Books (GTB). Using semantic search, the system would be able to search for ideas and concepts, rather than just phrases, in books and papers. In a blog post, Kurzweil said: “Semantic search is based on searching meaning, rather than on keywords or phrases… Developed with machine learning, it uses ‘natural language understanding’ of words and phrases.”
The result, though, wasn’t exactly the great success Google hoped for. While some results were pretty accurate, others weren’t. For example, when they asked GTB “how do you cope with anxiety”, it threw up passages like:
- Because of this vulnerability, I carefully monitor my response to stress, as people at high risk for stroke monitor their blood pressure. I meditate, do yoga and deep breathing exercises, and exercise daily. I see a therapist and take antidepressant medication if I need extra help.
- with anxiety. Initially, clients usually require external direction, but the overall goal is to teach them to recognize and manage their own anxiety while it is “young (still of short duration) to prevent major anxiety attacks, as well as to avert attacks if they occur.
- Saeed SA, Antonacci DJ, Bloch RM. Exercise, yoga, and meditation for depressive and anxiety disorders. Am Fam Physician
While the results relate to anxiety, GTB mixed fiction with non-fiction, and suggestions from medical books with the titles of those books. We didn’t get the exact answer we were looking for, but at least GTB didn’t throw us off track entirely. To test its abilities with fiction, we asked GTB “where did Voldemort die?” The responses:
- seen in The Goblet of Fire, the fourth instalment of the enormously popular Harry Potter series. Near the climactic end of the book, Harry sees the spirits of his parents and friend Cedric, who were killed by the evil Voldemort. Harry’s loved ones offer him comfort in a terrifying moment and even help him reach safety.
- Olympia to ask the help of the giants Abroad, without success. In 95/96 Albus gets slandered by the MoM and eventually loses his job, but is saved by Voldemort’s appearance in the MoM, proving he has been right all along in saying Voldemort is in Britain, alive and corporeal. For now, the MoM and Albus are on the same side.
- In fact, that defeat almost kills Voldemort. Third, by the end of Harry Potter and the Goblet of Fire, Voldemort has been revived by the boy’s blood (642-43). Because it is the father or father figure who has attempted to destroy the child, there must come a time of reckoning between the hero and his attacker.
In the top 10 results, there was only one mention of ‘Harry Potter and the Deathly Hallows’, and GTB failed to bring up a single passage from the book. Anyone who has read the books or seen the movies knows that Voldemort was killed at Hogwarts. To get a sense of why robust machine reading is still such a distant prospect, it helps to appreciate, in detail, what is required even to comprehend a children’s story.
Understanding the Text, and Context
To comprehend a story, it is essential to understand both the text and the context. By doing so, you build a chain of reasoning for why events happen in the story. This reasoning rests on knowledge of events, characters and even objects. You also need to know how the world works – even a fictional universe has its own set of rules. This knowledge set is broad rather than deep: comprehension doesn’t rely on one specific skill, but on several.
In the Harry Potter example, you need to know who Voldemort is, what his goal was, and why his death is important. You also need to know that the Battle of Hogwarts took place, and that Voldemort was killed there by Harry. To understand how he died, you need knowledge of the Harry Potter world – spells, wands and so on.
Current AI development does not enable such learning. Instead of representing knowledge, systems represent probabilities – mainly of how often words tend to co-occur in different contexts. That lets them generate strings of words that sound humanlike, but with no real coherence behind them. You can see for yourself: head to Talk to Transformer, a state-of-the-art AI text-generation system that guesses what comes next.
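The co-occurrence idea can be sketched in a few lines. The toy bigram model below is only an illustration of the statistical principle – real systems like Talk to Transformer use large neural networks – and the corpus is made up for the example. It predicts the next word purely from how often word pairs appear together:

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; the model's only "knowledge" is word adjacency.
corpus = ("harry watched as voldemort died at hogwarts . "
          "everyone knew voldemort died in the battle").split()

# Count how often each word follows each other word (bigram statistics).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most probable next word, purely from co-occurrence counts."""
    counts = following[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("voldemort"))  # ('died', 1.0) on this corpus
```

The model happily continues “voldemort” with “died” because that pair is frequent in its corpus – but it has no idea who Voldemort is, where Hogwarts is, or what dying means.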
No matter what you enter, you will notice that while the words are fluent, the ideas are incoherent. The output does not make sense in the larger scheme of things, because the AI has no context for the text. There are two reasons for this mess. The first is that we have been developing narrow AI – systems that can handle a single task, like driving a car or playing chess, but not diverse skill sets. The second lies in the technique we use to train AI models: deep learning.
Deep Learning Doesn’t Go Wide Enough
Current AI systems are trained using a statistical technique called deep learning. It is a great way to learn correlations, because it lets the system learn from comparisons. But it also makes the AI narrowly task-focused: the system does not learn how the different parts relate to one another – in this case, the context of the story.
What such systems miss is what linguists call compositionality: the way we construct the meaning of sentences from the meaning of their individual parts. For example, in the sentence “The moon is 240,000 miles from the Earth,” the word moon means one specific astronomical object, Earth means another, mile means a unit of distance, and 240,000 means a number. By virtue of the way phrases and sentences compose in English, 240,000 miles means a particular length, and the sentence asserts that the distance between the two heavenly bodies is that particular length.
Compositionality and context are not part of standard AI training models, so there is no way to incorporate background knowledge. GTB may have registered that Voldemort died, but it has no way of understanding that he is a villain, and that in literature it is common for villains to die. When we read, by contrast, we build a cognitive model of the text based not just on the words but on the larger context, and we can recall those ideas whenever we need to.
In the end, statistical analysis and deep learning are a far cry from the real-world understanding we have built over centuries. There is a fundamental mismatch between the kind of statistical computation that powers current AI programs and the cognitive-model construction that would be required for systems to actually comprehend what they read.
If we want to build computers that can genuinely understand and read, adding more computing power isn’t going to cut it. Rather, we need a new approach, one which is centred around reasoning and cognitive psychology. Basically, we need to create a machine version of common sense. Reading isn’t just about statistics, it’s about synthesizing knowledge: combining what you already know with what the author is trying to tell you. Kids manage that routinely; machines still haven’t.