Google, OpenAI, DeepSeek, et al. are nowhere near achieving AGI (Artificial General Intelligence), according to a new benchmark.
The Arc Prize Foundation, a nonprofit that measures AGI progress, has a new benchmark that is stumping the leading AI models. The test, called ARC-AGI-2, is the second edition of the ARC-AGI benchmark, which tests models on general intelligence by challenging them to solve visual puzzles using pattern recognition, context clues, and reasoning.
According to the ARC-AGI leaderboard, OpenAI's most advanced model, o3-low, scored 4 percent. Google's Gemini 2.0 Flash and DeepSeek R1 both scored 1.3 percent. Anthropic's most advanced model, Claude 3.7 with an 8K token limit (which refers to the number of tokens used to process an answer), scored 0.9 percent.
The question of how and when AGI will be achieved remains as heated as ever, with various factions bickering about the timeline or whether it's even possible. Anthropic CEO Dario Amodei said it could take as little as two to three years, and OpenAI CEO Sam Altman said "it's achievable with current hardware." But experts like Gary Marcus and Yann LeCun say the technology isn't there yet, and it doesn't take an expert to see how fueling AGI hype is advantageous to AI companies seeking major investments.
The ARC-AGI benchmark is designed to challenge AI models beyond specialized intelligence by avoiding the memorization trap: spewing out PhD-level responses without any understanding of what they mean. Instead, it focuses on puzzles that are relatively easy for humans to solve because of our innate ability to take in new information and make inferences, thus revealing gaps that can't be resolved by simply feeding AI models more data.
"Intelligence requires the ability to generalize from limited experience and apply knowledge in new, unexpected situations. AI systems are already superhuman in many specific domains (e.g., playing Go and image recognition)," read the announcement.
"However, these are narrow, specialized capabilities. The 'human-ai gap' reveals what's missing for general intelligence - highly efficiently acquiring new skills."
To get a sense of AI models' current limitations, you can take the ARC-AGI test for yourself. And you might be surprised by its simplicity. There's some critical thinking involved, but the ARC-AGI test wouldn't be out of place next to the New York Times crossword puzzle, Wordle, or any of the other popular brain teasers. It's challenging but not impossible, and the answer is there in the puzzle's logic, which is something the human brain has evolved to interpret.
OpenAI's o3-low model scored 75.7 percent on the first edition of ARC-AGI. By comparison, its 4 percent score on the second edition shows how difficult the test is, and how much work remains to reach human-level intelligence.