January 2, 2025 – In today’s digital age, artificial intelligence (AI) writing tools have become a convenient way to generate content such as papers, rap songs, or scripts with just a few keystrokes. However, a recent study suggests that, despite their convenience, these AI-generated works still lack the original creativity of a Shakespeare.
According to a report in Science, researchers have developed a new program to measure the creativity of AI output. Mirco Musolesi, a computer scientist studying AI creativity at University College London, points out that evaluating creativity is “a complex and interesting challenge.” He believes the new method is particularly well suited to measuring linguistic novelty.
Since the advent of generative AI and large language models, the scientific community has doubted their creative writing abilities. While these AI systems can quickly produce text that appears human-like, some scholars argue that they do not truly innovate but simply reorganize content from their training corpus. Critics have likened them to “stochastic parrots” that blindly repeat known text.
Quantifying creativity, however, is not an easy task. Traditionally, scientists have relied on two methods: using computers to detect plagiarism, which catches copying but cannot confirm originality, and relying on human raters to judge fluency and creativity, a subjective and time-consuming process.
To address this, Ximing Lu, a computer scientist at the University of Washington, and colleagues developed a tool called DJ Search. The tool captures subtle differences objectively: it extracts phrase fragments from AI-generated text and searches for similar content in a large reference database. It looks not only for exact matches but also for semantically similar expressions, using embedding vectors produced by AI algorithms that analyze word meanings to recognize synonyms. DJ Search then scores the novelty of the output as the proportion of text left unmatched.
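The core idea can be illustrated with a minimal sketch: slide an n-gram window over the text, check each n-gram against a reference corpus, and report the fraction that finds no match. This is an illustrative toy, not the authors' implementation; the real DJ Search searches massive corpora and uses learned embeddings for semantic matching, which the hypothetical `synonyms` dictionary here merely stands in for.

```python
from typing import Dict, List, Set, Tuple

def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def novelty_score(text: str, corpus: List[str], n: int = 3,
                  synonyms: Dict[str, Set[str]] = None) -> float:
    """Fraction of n-grams in `text` with no (near-)match in `corpus`.

    `synonyms` maps a canonical word to equivalent words; it is a toy
    stand-in for the embedding-based similarity the real tool uses.
    """
    synonyms = synonyms or {}

    def normalize(word: str) -> str:
        # Map a word to its canonical form so near-synonyms still match.
        word = word.lower()
        for canon, alts in synonyms.items():
            if word == canon or word in alts:
                return canon
        return word

    # Index every n-gram of the reference corpus.
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in corpus:
        corpus_grams.update(ngrams([normalize(w) for w in doc.split()], n))

    grams = ngrams([normalize(w) for w in text.split()], n)
    if not grams:
        return 1.0  # nothing to match: treat as fully novel
    unmatched = sum(1 for g in grams if g not in corpus_grams)
    return unmatched / len(grams)

corpus = ["the quick brown fox jumps over the lazy dog"]
print(novelty_score("the quick brown fox naps under the old oak", corpus))
# text reusing a corpus phrase with a synonym swapped in still matches:
print(novelty_score("the swift brown fox jumps over the lazy dog", corpus,
                    synonyms={"quick": {"swift"}}))  # → 0.0
```

A real system would replace the synonym table with cosine similarity between embedding vectors and would need an efficient index (e.g. a suffix automaton or inverted index) to search web-scale corpora.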
The study found that humans significantly outperform AI in poetry, novels, and speeches, with margins of 80%, 100%, and 150%, respectively. DJ Search can also compare human works, revealing, for instance, that the language originality of “The Hunger Games” is 35% higher than that of “Twilight.” Lu compares AI to a DJ, stating, “They remix text like DJs remix music. It’s impressive, but it can’t replace the composer.”
Nanyun Violet Peng, a computer scientist at the University of California, Los Angeles, suggests that future assessments should evaluate the originality of the overall narrative, not just language-level novelty. This points to the need for more comprehensive metrics to truly gauge AI’s creative potential.