For those of us not quite willing to bow down before our new AI overlords (to say nothing of the Idiocracy we’ll all be saddled with for at least the next four years), here is a somewhat encouraging story from Live Science about how even the most advanced AI models have completely flunked out when it comes to solving the most complex problems in the rarefied realm of higher mathematics:
Mathematicians have stumped the most advanced generative artificial intelligence (AI) models with a series of mind-bending new math problems.
These problems typically require doctorate-level mathematicians hours to days to solve, according to the research institute Epoch AI. But in the new tests, the most advanced AI models on the market got correct answers on less than 2% of these problems.
Most of these benchmarks are geared toward testing AI's ability to do high-school and college-level math, Elliot Glazer, a mathematician at Epoch AI, and colleagues wrote in a new paper posted on the preprint database arXiv. (The paper has not yet been peer-reviewed or published in a scientific journal.)
The problems were also unique — a step taken to ensure that none of the problems were already in the AI models' training data. When complex reasoning problems are included in the training data, the AI may appear to solve the problems, but in reality, it already has a "cheat sheet," since it has been trained on the answers.
"[E]ven when a model obtained the correct answer, this does not mean that its reasoning was correct," the paper authors wrote. "For instance, on one of these problems running a few simple simulations was sufficient to make accurate guesses without any deeper mathematical understanding. However, models' low overall accuracy shows that such guessing strategies do not work on the overwhelming majority of FrontierMath problems."
The findings show that right now, AI models don't possess research-level math reasoning, Epoch AI's collaborators concluded. However, as AI models advance, these benchmark tests will provide a way to find out if their reasoning abilities are deepening.
Who would have thought that higher-reasoning math skills would still be the apparent Achilles’ heel of advanced AI models at this point? I wonder if Elon is paying any attention.