Join top executives in San Francisco on July 11-12 to hear how leaders are integrating and optimizing AI investments for success.
Today OpenAI-rival AI21 Labs released the results of a social experiment, an online game called “Human or Not,” which found that a whopping 32% of people can’t tell the difference between a human and an AI bot.
The game, which the company said is the largest-scale Turing Test to date, paired players for two-minute conversations with either another human or an AI bot based on leading large language models (LLMs) such as OpenAI’s GPT-4 and AI21 Labs’ Jurassic-2, and ultimately analyzed more than a million conversations and guesses.
The results were eye-opening: For one thing, the test revealed that people found it easier to identify a fellow human — when talking to humans, participants guessed right 73% of the time. But when talking to bots, participants guessed right just 60% of the time.
Educating participants on LLM capabilities
But beyond the numbers, the researchers noted that participants used several popular approaches and strategies to determine if they were talking to a human or a bot. For example, they assumed bots don’t make typos or grammar mistakes and don’t use slang, even though most models in the game were trained to make these types of mistakes and to use slang words.
Participants also frequently asked personal questions, such as “Where are you from?”, “What are you doing?” or “What’s your name?”, believing that AI bots would not have a personal history or background, and that their responses would be limited to certain topics or prompts. However, the bots were mostly able to answer these types of questions, since they were trained on a lot of personal stories.
After the two-minute conversations, users were asked to guess whether they had been speaking with a human or a bot. After more than a month of play and millions of conversations, the results showed that 32% of people can’t tell the difference between a human and an AI.
And in an interesting philosophical twist, some participants assumed that if their discussion partner was too polite, they were probably a bot.
But the purpose of ‘Human or Not’ goes far beyond a simple game, Amos Meron, game creator and creative product lead at the Tel Aviv-based AI21 Labs, told VentureBeat in an interview.
“The idea is to have something more meaningful on several levels: first is to educate and let people experience AI in this [conversational] way, especially if they’ve only experienced it as a productivity tool,” he said. “Our online world is going to be populated with a lot of AI bots, and we want to work towards the goal that they’re going to be used for good, so we want to let people know what the technology is capable of.”
AI21 Labs has used game play for AI education before
This isn’t AI21 Labs’ first go-round with game play as an AI educational tool. A year ago, it made mainstream headlines with the release of ‘Ask Ruth Bader Ginsburg,’ an AI model that predicted how Ginsburg would respond to questions. It is based on 27 years of Ginsburg’s legal writings on the Supreme Court, along with news interviews and public speeches.
‘Human or Not’ is a more advanced version of that game, said Meron, who added that he and his team were not terribly surprised by the results.
“I think we assumed that some people wouldn’t be able to tell the difference,” he said. What did surprise him, however, was what it actually teaches us about humans.
“The outcome is that people now assume that most things humans do online may be rude, which I think is funny,” he said, adding the caveat that people experienced the bots in a very specific, service-like manner.
Why policymakers should take note
Still, with U.S. elections coming down the pike, whether humans can tell the difference between another human and an AI is important to consider.
“There are always going to be bad actors, but what I think can help us prevent that is knowledge,” said Meron. “People should be aware that this technology is more powerful than what they have experienced before.”
That doesn’t mean that people need to be suspicious online because of bots, he emphasized. “If it’s a human phishing attack, or a human with a [convincing alternate] persona online, that’s dangerous,” he said.
Nor does the game tackle the issue of sentience, he added. “That’s a different discussion,” he said.
But policymakers should take note, he said.
“We need to make sure that if you’re a company and you have a service using an AI agent, you need to clarify whether this is a human or not,” he said. “This game would help people understand that this is a discussion they need to have, because by the end of 2023 you can assume that any product could have this kind of AI capability.”
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.