Researchers developed an artificial intelligence that writes so well it scares even them
Researchers have developed an artificial intelligence text prediction tool so advanced, it could take this sentence you’re reading and basically finish the article.
And the avowedly open-source organization that developed it is so concerned by its potential for “malicious applications” that it is not releasing its full research.
They’re worried that, in the wrong hands, it could be the new frontier for fake news.
The group OpenAI announced the development of the model, called GPT-2, in a blog post on Thursday.
They trained it to produce human-like writing by feeding it eight million web pages. Its function, the group said, is to “predict the next word, given all of the previous words within some text.”
What this means, practically, is that if you feed it only a sentence or two, it can continue the writing with startlingly intuitive results.
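That loop, predict the next word from everything so far, append it, and repeat, can be sketched with a toy stand-in for the real model. The bigram table and every word in it below are invented purely for illustration; GPT-2 itself uses a large neural network trained on those eight million pages, not a lookup table.

```python
import random

# Invented toy "model": for each word, the plausible next words.
# This stands in for GPT-2's neural network purely to show the loop.
BIGRAMS = {
    "the": ["cat", "dog"],
    "cat": ["sat"],
    "sat": ["down"],
}

def continue_text(prompt, max_words=5, seed=0):
    """Repeatedly predict the next word and append it to the prompt."""
    rng = random.Random(seed)
    words = prompt.lower().split()
    for _ in range(max_words):
        choices = BIGRAMS.get(words[-1])
        if not choices:  # no prediction available for this word: stop
            break
        words.append(rng.choice(choices))
    return " ".join(words)

print(continue_text("the cat"))  # continues the prompt word by word
```

The real system does exactly this at vastly greater scale: each pass conditions on the entire preceding text, which is why a one-sentence prompt can grow into paragraphs.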
We've trained an unsupervised language model that can generate coherent paragraphs and perform rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training: https://t.co/sY30aQM7hU pic.twitter.com/360bGgoea3
— OpenAI (@OpenAI) February 14, 2019
In one example used in OpenAI’s blog post, the model took just a single sentence written by a human about Miley Cyrus – “Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.” – and finished a short news article from it.
It continued: “The 19-year-old singer was caught on camera being escorted out of the store by security guards. The singer was wearing a black hoodie with the label ‘Blurred Lines’ on the front and ‘Fashion Police’ on the back.”
In another example, it invented both a quote from the U.S. energy secretary and a Department of Energy news release in a 172-word article it wrote about nuclear materials being stolen in Cincinnati.
All it was fed was: “A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its whereabouts are unknown.”
In the article the invented energy secretary, Tom Hicks, gave this remarkably realistic quote: “The safety of people, the environment and the nation’s nuclear stockpile is our highest priority. We will get to the bottom of this and make no excuses.”
“The model is chameleon-like,” the researchers wrote. “It adapts to the style and content of the conditioning text … our model is capable of generating samples from a variety of prompts that feel close to human quality and show coherence over a page or more of text.”
They did find that it sometimes fails, producing repetitive text, nonsensical phrases such as “fires happening under water,” or unnatural topic switches.
But about half the time, at least on subjects popular enough to wind up in the dataset more frequently and give it better context (like, say, Miley Cyrus), it produced “reasonable samples.”
It can be jarringly better than “reasonable,” too. Fed just a sentence from “Lord of the Rings,” for example, it produced whole paragraphs that, while a little clunky, mostly continue the story plausibly.
The model produced, in a way, its own take on "Lord of the Rings."
It also wrote a short essay when asked to complete a homework assignment describing the reasons for the Civil War.
And The Guardian was able to feed it the opening line of George Orwell’s “1984” – just 14 words, “It was a bright cold day in April, and the clocks were striking thirteen.” – and get back original prose about driving in Seattle and teaching in China with a “vaguely futuristic tone and the novelistic style,” as the paper put it, consistent with Orwell’s work.
It’s also capable of reading comprehension, translation, question answering and summarization.
The nonprofit, funded in part by Elon Musk, was startled by its own success.
Its institutional purpose is to release its full research whenever it develops a new tool or makes an advance, to contribute to the larger AI research community.
In this case, however, the possibility that the tool could be used not only to create alarmingly deceptive fake news articles but also to impersonate people or automate sophisticated social media trolling campaigns and spam or phishing attacks led the group to release only a scaled-down version of the model for public analysis.
“Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code,” they wrote.
Their blog post noted, however, that others in the community could probably take even this limited release and reproduce the more sophisticated version of the model.
They wrote that they hope their cautious approach at least “gives the AI community more time to have a discussion about the implications of such systems.”
In a more optimistic vision, they see it potentially improving translation tools, speech recognition systems and writing assistant programs.
Either way, it seems inevitable that models like this will shape how we communicate and consume information.
Fittingly, here is a quote GPT-2 wrote to end another article posted to the OpenAI blog:
“They seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization.”