It's a model mismatch, not an inherent impossibility. A calculator needs an adaptive number of intermediate steps. Our models usually have fixed depth, but in auto-regressive modelling the tape can grow as long as the stepwise algorithm needs. Recent work shows LMs can do arithmetic, symbolic math, and common-sense reasoning step by step with chain-of-thought, reaching much higher accuracies.
In other words, we can't reliably do three-digit multiplication in our heads either, but we can do it much better on paper, step by step. The problem you mention comes from the wrong approach: like us, LMs need intermediate reasoning steps to get from problem to solution. We just need to ask them to produce the whole reasoning chain.
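To make the "on paper, step by step" analogy concrete, here's a toy sketch (plain Python, not a language model) of three-digit multiplication done the way chain-of-thought prompting asks an LM to: emitting each partial product explicitly instead of jumping straight to the answer. The function name and trace format are my own illustration, not from the papers below.

```python
def multiply_with_steps(a: int, b: int) -> tuple[int, list[str]]:
    """Long multiplication, recording each partial product as a 'reasoning' step."""
    steps = []
    total = 0
    # Walk b's digits from least to most significant, as in schoolbook multiplication.
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * 10 ** place
        steps.append(f"{a} x {digit} x 10^{place} = {partial}")
        total += partial
    steps.append(f"sum of partials = {total}")
    return total, steps

result, trace = multiply_with_steps(123, 456)
for line in trace:
    print(line)
```

Each intermediate line is short and mechanically checkable; the fixed-depth failure mode only shows up when the model is forced to compress all of these steps into a single forward pass.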
- Chain of Thought Prompting Elicits Reasoning in Large Language Models https://arxiv.org/abs/2201.11903
- Deep Learning for Symbolic Mathematics https://arxiv.org/abs/1912.01412