The real takeaway from the Eugene Goostman saga

Has a chatbot ever passed the Turing test? It turns out, opinions vary.

Some developers are striving to make their bot the first to pass it. Others say that it’s already happened.

At the centre of this controversy lie Eugene Goostman, artificial stupidity, and the question of what it means to pass the famous test.

Here, we explore the arguments around the bot that supposedly beat the Turing test: Eugene Goostman. But rather than rule on whether the bot passed or not, we look at the lessons to be learned from the Eugene Goostman saga.


Introducing Eugene

Eugene Goostman is a chatbot — a computer program that chats to you. It’s been (rightly or wrongly) credited with being the first bot to pass the Turing test.

Vladimir Veselov, Eugene Demchenko and Sergey Ulasen created Eugene in 2001. Since its creation, the bot has competed in various Turing test competitions as well as the Loebner Prize. The bot came first in one such Turing test competition in 2012.

Then, in 2014, Eugene Goostman took part in another contest, the result of which would spark some controversy. Organised by the University of Reading and held at the Royal Society in London on 7 June, the test saw the chatbot convince 33% of the judges that it was human. And thus, supposedly, pass the Turing test.


A note about the Turing test

To understand the relevance of this, it’s important to understand what the Turing test is.

Named after its creator, Alan Turing, the Turing test was devised in 1950 to test a machine’s intelligence. Specifically, it looks for human-like intelligence, which it measures through natural language conversations.

Turing’s test took inspiration from a party game known as ‘the imitation game’. In this game, a player would try to imitate the opposite gender while a judge worked out who’s who.

Turing introduced his test idea in his paper Computing Machinery and Intelligence. He wrote that by the turn of the millennium, an average interrogator would “not have more than 70 per cent chance” of making the right identification after five minutes of questioning.

This has since been interpreted to mean that the Turing test is ‘passed’ if the machine can fool at least 30% of the judges.


The controversy

Eugene Goostman successfully passed as a human for 10 of the 30 judges at the Royal Society event. By the organisers’ criteria, then, the chatbot passed the Turing test.
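The headline result boils down to a ratio checked against the organisers’ threshold. A minimal sketch of that arithmetic (variable names are illustrative, not from the event itself):

```python
# Judges fooled by Eugene Goostman at the 2014 event
fooled = 10
total_judges = 30

pass_rate = fooled / total_judges  # ~0.333

# The 30% bar is the organisers' reading of Turing's "70 per cent" remark,
# not a figure Turing himself set as a pass mark.
threshold = 0.30

print(f"Pass rate: {pass_rate:.1%}")  # Pass rate: 33.3%
print("Passed" if pass_rate > threshold else "Not passed")  # Passed
```

A near miss, in other words: three fooled judges fewer and the claim would never have been made.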

The argument goes that Eugene ‘passed’ the test by cheating. Rather than display a human-like ability to adapt to different conversations and questions, Eugene tricked judges by explaining odd or confusing answers away. All it took was claiming a foreign mother tongue and a young age. No one would hold a foreign child to the same standard as a native, educated adult, after all.

The bot, then, might be ‘believable’ but isn’t intelligent. It’s displaying artificial stupidity. It lowered the bar it needed to reach, mimicking the ‘lower end’ of the human intelligence spectrum. That is, someone who hasn’t yet had the opportunity to learn, who is at a language disadvantage, and whose brain is still developing.

Another argument discrediting Eugene’s win is that other chatbots had outperformed it in the past. Cleverbot, for instance, convinced 59% of its judges in 2011. The 2014 event has also faced scrutiny, with some unconvinced that it was rigorous enough to count.

It’s a bit of a Schrödinger’s chatbot situation. Until people can agree on their interpretation of the Turing test, and the requirements to pass it, Eugene has simultaneously passed and not passed.


The questions surrounding the pass

There are a few considerations surrounding the success of Eugene Goostman, many of which fall under the overarching question: did Eugene really pass the Turing test? For example:

Does Eugene embody the intelligence envisaged by Alan Turing?

Surely, being able to convincingly fool judges denotes a level of intelligence.

Maybe it’s not Eugene that we should question, but the Turing test itself. Is the Turing test a real measure of ‘intelligence’? Eugene’s controversial victory suggests that the test was never really about intelligence. Instead, it was about effective mimicry of human behaviour.

Is the 30% threshold enough for a passing grade?

It’s also worth noting that the 30% threshold for passing was not something that Turing explicitly stated. Rather, the organiser of the event, Kevin Warwick, chose to interpret Turing’s paper that way. Although this seems to be a common interpretation, the Eugene Goostman saga calls into question whether it’s really a high enough bar to pass the test.

But, perhaps the most important question to ask is this one:

Have we gotten so used to ‘bad bots’ that instead of striving for better, we’re willing to accept their flaws under the guise of a plausible backstory?

This question shows just how pervasive and convincing artificial stupidity can become. Rather than demanding better, the Eugene Goostman saga shows that chatbots can stay dumb as long as they cleverly explain away their shortcomings.


The key takeaway: what we want from chatbots

Eugene Goostman mimicked a human with limited knowledge. In doing so, the task became about mimicry and deception, not intelligence. The outcry against Eugene revolves largely around this fact.

And so, the key takeaway here is what it tells us we’re truly striving for when it comes to chatbots. We want a certain level of ‘intelligence’ from our bots. Further, ‘human’ is not a viable metric for measuring that.

‘Human intelligence’ is a vast spectrum. It could mean the pinnacle of human academia: someone with a higher-than-average IQ who’s spent years studying and thinking. It could just as easily mean a teenager who communicates through text speak and memes.

But we want more — we want an ‘intelligent’ chatbot to do more than mimic humans. We want them to be better — to have a certain degree of education and domain knowledge. If we ask it a question, we don’t want a slick reason why it doesn’t know. We want the answer.

Chatbots are tools. And we want these tools to raise our knowledge and hold coherent conversations. (Not merely provide a jarring feedback loop of excuses.)


Bonus note: the power of expectation management

The Eugene Goostman saga also shows how we can encourage users to forgive chatbot shortfalls. (While, that is, developers are still refining, improving, and developing the bot in question.)

As long as there is a seemingly genuine reason why the chatbot struggles, we’re more lenient with any misunderstandings. We can adapt our expectations if we have a guide to help us do so. Again, the judges lowered their expectations for Eugene Goostman because they knew what to expect of a thirteen-year-old still learning English.

If we can find a way to transpose that level of understanding to our bots, we could reduce the frustration felt when they hit a bump or get stuck. (Without, that is, the dubious ethics of having a chatbot lie about who they are to users.)


The Eugene Goostman saga

Whether you agree that Eugene Goostman passed the Turing test or not, it’s undeniable that chatbots still have a long way to go. The journey isn’t over.

The saga has shown us that we need to recognise that human intelligence is a spectrum. And one that isn’t necessarily compatible with chatbots and their services.

The key takeaway of the Eugene Goostman saga is that we don’t want ‘human-level intelligence’ from our bots. We want functional bots with their own brand of intelligence. Honest bots that can hold a coherent conversation with us, answer our questions and solve our problems.


Useful links

Could chatbots ever become engaging conversationalists?

Why your standalone chatbot is failing

The problem with generalist chatbots