Why AI often struggles with math

In the school year that ended recently, one class of learners stood out as a seeming puzzle. They are hardworking, improving and remarkably articulate. But curiously, these learners — artificially intelligent chatbots — often struggle with math.

Chatbots such as OpenAI’s ChatGPT can write poetry, summarize books and answer questions, often with human-level fluency. These systems can do math based on what they have learned, but the results can vary and are sometimes wrong. They are fine-tuned for determining probabilities, not for doing rules-based calculations. Likelihood is not accuracy, and language is more flexible, and more forgiving, than math.

“The AI chatbots have difficulty with math because they were never designed to do it,” said Kristian Hammond, a computer science professor and artificial intelligence researcher at Northwestern University.

That, on the face of it, is a sharp break with computing’s past. Since the first electronic computers appeared in the 1940s, a good summary definition of computing has been “math on steroids.” Computers have been tireless, fast, accurate calculating machines. Crunching numbers has long been what computers are really good at, far exceeding human performance.

Traditionally, computers have been programmed to follow step-by-step rules and retrieve information in structured databases. They were powerful but brittle. So, past efforts at AI hit a wall.

Yet, more than a decade ago, a different approach broke through and began to deliver striking gains. The underlying technology, called a neural network, is loosely modeled on the human brain.

This kind of AI is not programmed with rigid rules, but learns by analyzing vast amounts of data. It generates language, based on all the information it has absorbed, by predicting what word or phrase is most likely to come next — much as humans do.
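To make “predicting what word is most likely to come next” concrete, here is a deliberately tiny sketch in Python. It is a toy built on stated assumptions, not how ChatGPT or any production chatbot works: real systems use neural networks over word fragments and weigh long stretches of preceding text. This hypothetical bigram model (the names `train_bigrams`, `predict_next` and `complete` are invented for illustration) only counts which word follows which in a made-up five-sentence corpus, and it looks at nothing but the last word of a prompt.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, how often each other word directly follows it."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the word seen most often after `word`, or None if `word` is unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def complete(follows, prompt):
    """Predict the next word of `prompt` from its last word only (a toy simplification)."""
    return predict_next(follows, prompt.split()[-1])

# Hypothetical toy corpus, invented for this illustration.
corpus = [
    "the cat sat on the mat",
    "the cat chased the dog",
    "two plus two is four",
    "one plus three is four",
    "two plus three is five",
]
model = train_bigrams(corpus)

print(predict_next(model, "the"))            # cat  (2 of the 4 words seen after "the")
print(complete(model, "two plus three is"))  # four (wrong: "is" precedes "four" twice, "five" once)
```

Because the toy model picks the most frequent continuation instead of doing the sum, it answers “two plus three is” with “four”: likely, fluent and wrong. It is an exaggerated version of the gap between likelihood and accuracy.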

“This technology does brilliant things, but it doesn’t do everything,” Hammond said. “Everybody wants the answer to AI to be one thing. That’s foolish.”

At times, AI chatbots have stumbled over simple arithmetic and over math word problems that require multiple steps to reach a solution, as some technology reviewers have recently documented. The AI’s proficiency is improving, but math remains a shortcoming.

Speaking at a recent symposium, Kristen DiCerbo, chief learning officer of Khan Academy, an education nonprofit that is experimenting with an AI chatbot tutor and teaching assistant, introduced the subject of math accuracy. “It is a problem, as many of you know,” DiCerbo told the educators.

A few months ago, Khan Academy made a significant change to its AI-powered tutor, called Khanmigo. It sends many numerical problems to a calculator program instead of asking the AI to solve the math. While waiting for the calculator program to finish, students see the words “doing math” on their screens and a Khanmigo icon bobbing its head.

For more than a year, ChatGPT has used a similar workaround for some math problems. For tasks such as large-number division and multiplication, the chatbot summons help from a calculator program.
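The hand-off to a calculator can be sketched in a few lines of Python. This is a hypothetical illustration, not Khan Academy’s or OpenAI’s actual mechanism, which the article does not describe: the names `answer`, `calculate`, `looks_like_arithmetic` and the stand-in `language_model` are invented, and the routing rule (send a message to the calculator only if it consists solely of digits, parentheses and + - * /) is a simplifying assumption.

```python
import ast
import operator
import re
from fractions import Fraction

# Arithmetic operators the hypothetical calculator tool supports.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Messages made only of digits, whitespace, parentheses and + - * /.
ARITHMETIC_ONLY = re.compile(r"[0-9\s()+\-*/]+")

def calculate(expression: str) -> Fraction:
    """Evaluate integer arithmetic exactly by walking Python's syntax tree (no eval)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval"))

def looks_like_arithmetic(text: str) -> bool:
    """True if the message is pure arithmetic with at least one operator."""
    return bool(ARITHMETIC_ONLY.fullmatch(text)) and any(op in text for op in "+-*/")

def language_model(prompt: str) -> str:
    """Stand-in for the chatbot; a real system would generate a reply here."""
    return f"[chatbot reply to {prompt!r}]"

def answer(prompt: str) -> str:
    """Route pure arithmetic to the calculator, everything else to the chatbot."""
    text = prompt.strip()
    if looks_like_arithmetic(text):
        try:
            return str(calculate(text))
        except (SyntaxError, ValueError, ZeroDivisionError):
            pass  # malformed or undefined math falls through to the chatbot
    return language_model(prompt)

print(answer("123456789 * 987654321"))  # 121932631112635269
print(answer("10 / 4"))                 # 5/2
print(answer("Write a poem about fractions"))
```

Python integers have arbitrary precision and `Fraction` keeps division exact, so large-number multiplication and division come back exactly right. That exactness is the reason to hand such problems off rather than have a model predict the digits one by one.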

Math is an “important ongoing area of research,” OpenAI said in a statement, and a field where its scientists have made steady progress. Its new version of GPT achieved nearly 64% accuracy on a public database of thousands of problems requiring visual perception and mathematical reasoning, the company said. That is up from 58% for the previous version.

The AI chatbots often excel when they have consumed vast quantities of relevant training data — textbooks, drills and standardized tests. The effect is that the chatbots have seen and analyzed very similar, if not the same, questions before.

Yann LeCun, Meta’s chief AI scientist, has said that large language models have little grasp of logic and lack common-sense reasoning. What’s needed, he insists, is a broader approach, which he calls “world modeling”: systems that can learn how the world works much as humans do. And it may take a decade or so to achieve.

In the meantime, though, Meta is building AI-powered smart assistant software, based on its large language model, LLaMA, into its social media services, including Facebook, Instagram and WhatsApp. The current models may be flawed, but they still do a lot.
