Mathematical meaning is not captured by meaning-as-use.
November 11, 2024
by Phoebe Klett and W. J. Zeng
“The meaning of a word is its use in the language.” - Wittgenstein, Philosophical Investigations
Large language models take Wittgenstein’s insight to heart. They learn the meaning of words from their context, i.e. how the words are used in the text (and other data) that form their training sets. The recent successes of LLMs seem like a victory for this meaning-as-use approach. The extent to which LLMs can or will understand meaning is the extent to which meaning can be reduced to use-in-context.
But there are languages where LLMs still clearly struggle, namely languages for mathematical and symbolic reasoning. Why are use and context not sufficient to understand this kind of meaning? What meaning of the tokens “5+3 = 8” is not captured in the collection of texts in which they appear?
We propose that meaning-as-use is not sufficient for mathematical meaning. Language is not the real world itself. It is a representation of the world, and so any meaning that LLMs determine is a meaning that comes from compressing and correlating representations. We argue that mathematical reasoning is instead about the world itself (the signified) and not just its representations (the signs). Importantly, our position is directly opposed to the (sometimes common) view that mathematical meaning is captured purely by syntactical manipulations.
Mathematical meaning is encoded in the physical world
We suggest that the non-use meaning of mathematics is encoded in the physical world. Let’s consider a motivating example. If you pick two large numbers at random and ask a language model to add them, then you are likely to have chosen an instance that it hasn’t seen during training. Likely too, the LLM will give the wrong answer. If you ask a person the same question, they’ll be able to add the numbers mentally up to some size, and then at some point, say 10^2 digits in each number, they’ll reach for pen and paper. At this point, the response to the addition prompt becomes an interaction between the person and the physical world. This response becomes a computation where the properties of the world (the reliable memory of ink on paper, the muscle memory of a biological arm, the conservation of pen ink, etc.) support the meaning of the computation. If you choose really big numbers, say 10^10 digits, then we reach for computers, where we find the algorithms and meaning of addition in the complicated physics of silicon transistors.
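To make the pen-and-paper step concrete, here is a minimal sketch of the schoolbook procedure a person would follow: work right to left, one column at a time, carrying into the next column. The function name `add_by_hand` is our own illustrative choice, and the list of written digits stands in for the ink on paper. Note that every step depends on what was written earlier still being there.

```python
def add_by_hand(a: str, b: str) -> str:
    """Add two non-negative decimal integers given as digit strings,
    column by column, the way one would on paper."""
    # Pad the shorter number with leading zeros so the columns line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    written = []  # digits "written down", least significant first
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        written.append(str(total % 10))  # digit for this column
        carry = total // 10              # carried into the next column
    if carry:
        written.append(str(carry))
    return "".join(reversed(written))


print(add_by_hand("999", "1"))  # 1000
```

The procedure never needs to have seen a particular pair of numbers before. It only needs the written digits to persist from one step to the next.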
Similarly, we argue that LLMs need access to the physical world either through embodiment (e.g. robots in the physical world) or through simulation (e.g. automated theorem provers, or Turing-machines encoded on computers) if they are to understand mathematical meaning.
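As a small illustration of the simulation route, here is how the statement “5+3 = 8” looks in a proof assistant. This is a sketch in Lean 4, chosen by us as one example of a theorem prover. It is not a claim about how an LLM should be connected to one. The proof `rfl` succeeds only because the checker actually evaluates both sides on a physical machine and finds them identical.

```lean
-- Accepted because the kernel computes 5 + 3 and compares the result with 8.
example : 5 + 3 = 8 := rfl
```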
You may object that one does not need to actually add 10^10-digit numbers in order to understand addition. We would agree. Addition was understood by humans long before we built computers capable of practically doing those long computations. However, important to this understanding is the principle that, given sufficient time and sufficient apparatus, those computations could be done by hand. The mathematical understanding relies on the assumption of a stable physical reality to support the computation. This isn’t a statement about the resource complexity of the computation. We don’t in principle need enough food to feed the humans who do the computation. We do in principle need a stable physical reality.
Let’s consider what would happen if our assumptions about a stable physical world were violated. Perhaps, unknown to us, there is some fundamental time in the future, say 10 minutes from now, at which the universe resets back to the big bang. The addition that seems natural in that universe is different from the one that seems natural to us. Beings in that universe would model it differently, perhaps putting addition into a cyclic group. What mathematics might we have learned if we were embodied in this world with periodic resets?
Of course, our natural number addition would still exist in that cyclic world, but it would seem less appropriate for the everyday counting of its inhabitants.
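The contrast between the two notions of addition can be sketched in a few lines. The period of 600 (the seconds in a 10-minute cycle) and the function names are hypothetical choices of ours for illustration. The thought experiment above does not fix a unit of counting.

```python
PERIOD = 600  # hypothetical: seconds in a 10-minute reset cycle


def add_natural(a: int, b: int) -> int:
    """Addition as it seems natural in our world: counts grow without bound."""
    return a + b


def add_cyclic(a: int, b: int, period: int = PERIOD) -> int:
    """Addition in the cyclic group of order `period`: counts wrap at the reset."""
    return (a + b) % period


print(add_natural(599, 3))  # 602
print(add_cyclic(599, 3))   # 2
```

Both functions agree whenever the sum stays below the period, so the two worlds differ only in which one serves as the default for counting.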
While this example doesn’t take us to an entirely different mathematical meaning, it shows that the underlying reality does affect the character of mathematical meaning. Switching between the cyclic time reality and our reality switches which mathematical notion is natural and which is abstract and constructed.¹ Fundamentally, mathematics is about the signified and not just the sign. This may be part of the explanation for the “unreasonable effectiveness” of mathematics in modeling the world.²
We aren’t suggesting that neural networks are incapable of learning addition or other mathematical functions. Neural networks can, in theory, approximate any computable function. We are suggesting that the reason they haven’t yet learned addition is that they are not trained properly for this task. The computable function of addition is not wholly contained in the representations of meaning-as-use that comprise today’s training sets. Bigger training sets of math literature won’t get us there either as they are still the wrong type: representations. Mathematical meaning also requires the world itself.
We propose that mathematical meaning is distinct from meaning-as-use because mathematical meaning is encoded in the world itself and not its representations. If we want systems that understand mathematical reasoning, then we must give these systems access to physical reality: not physics and mathematics literature, but actual physics in action via experiments and simulators on physical substrates.