Working notes on language and the representation of meaning
Presented with apologies.
All meaning is use.
Put another way: there is nothing real between syntax and pragmatics.
Attempts to axiomatize the in-between place miss the point: this is not
where our reasoning needs to be sound.
Which is not to say there aren’t useful abstractions at this level, or
models with latent variables that correspond to “semantics” in the
classical sense—only that a map from strings to database queries has
not yet reached “meaning”.
Which, in turn, is not to say that the problem is with the database
itself. Certainly we need some level of abstraction above individual pixel
intensities—this is what perception does for us. But in telling me that
some subset of all utterances corresponds to logical forms, you haven’t
yet explained how I will interpret any utterance in this subset, nor
explained what to do with all the utterances outside of it.
Many standard problems in the philosophy of language (“the morning star
is the evening star”, etc.) arise only because we insist on ascribing
to sentences meanings independent of their speakers’ and hearers’ mental
states.
Grice’s timeless meanings are useful, but they are not what we ultimately
want to model.
I regard the early chapters of Philosophical Investigations as basically
persuasive on this view.
Or, from a different angle, a colorful anecdote: One of my old professors
is at a
dinner party with Elizabeth Anscombe, who says “Look—I will prove to you
that a speaker-independent sentence meaning cannot be built up
compositionally from its parts.” As they are leaving the party, she goes
to the host, shakes his hand, and says “Fuck you very much for dinner.”
The host replies “You’re very welcome.”
She means to express thanks; she chooses a set of words which in other
contexts would convey exactly the opposite; but her thanks are received
and correctly interpreted. This is only possible if G.E.M.A. has a very
precise model of how the listener will respond to any set of sounds she
produces (including corrections by his internal language model). And this
is all she needs! No reason to worry about how to say “fuck you” in the
fluent calculus. Can we formalize this?
Q: If logical propositions are not enough, what abstraction do we
use for the representation of meaning? A: A decision process.
More specifically: think of the world you live in as approximated by a
POMDP. Language forms both a class of observations (when produced by
others) and a class of actions (when produced by you).
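A minimal sketch of this framing, in Python; the field names and types are mine, chosen for illustration rather than fixed by anything above:

    # Sketch: utterances appear both in the observation space (speech we hear)
    # and in the action space (speech we produce), alongside ordinary percepts
    # and motor acts. All names here are illustrative.

    from dataclasses import dataclass
    from typing import Callable, FrozenSet, Hashable

    State = Hashable
    Action = Hashable        # motor acts and produced utterances
    Observation = Hashable   # percepts and heard utterances

    @dataclass(frozen=True)
    class POMDP:
        states: FrozenSet[State]
        actions: FrozenSet[Action]
        observations: FrozenSet[Observation]
        transition: Callable[[State, Action, State], float]  # P(s' | s, a)
        observe: Callable[[State, Observation], float]       # P(o | s)
        reward: Callable[[State, Action], float]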
Qua observation in a POMDP, all speech tends to alter the listener’s belief
state.
With the exception of performatives, the only first-order effect language
can have on the world is to alter belief states.
Of course, nobody would ever communicate if language didn’t have second-order
effects: altering others’ behavior by altering their belief states.
Language cannot be understood without a model of the speaker’s belief
state and value function. Language cannot be produced without a model of
the listener’s belief state and value function.
This is not quite a private language argument: even if we don’t want to
talk about public meanings, communication is only possible if there is a
shared inventory of symbols—an equilibrium of a coordination game.
If you want a compact description of an utterance’s meaning, relativized
to an individual speaker, use this: the update that will be made to their
belief state when it is communicated to them. (This belief state has
memory, and includes a distribution over all histories consistent with the
present.)
If we ignore memory, in a discrete state space this might look like a draw
from a Dirichlet distribution.
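A sketch of this in code, ignoring memory and assuming a small discrete state space. The likelihood function (the listener’s model of which states make an utterance probable) is an assumption of mine, not something these notes pin down:

    # Sketch: the meaning of an utterance, relativized to one listener, as the
    # update it induces in that listener's belief state. `likelihood(u, s)` --
    # roughly P(utterance | state) under the listener's model -- is assumed.

    from typing import Callable, Dict, Hashable

    State = Hashable
    Belief = Dict[State, float]

    def hear(prior: Belief, utterance: str,
             likelihood: Callable[[str, State], float]) -> Belief:
        # Treat the utterance as a POMDP observation and condition on it.
        unnorm = {s: p * likelihood(utterance, s) for s, p in prior.items()}
        z = sum(unnorm.values()) or 1.0
        return {s: p / z for s, p in unnorm.items()}

    def meaning_for(prior: Belief, utterance: str,
                    likelihood: Callable[[str, State], float]) -> Dict[State, float]:
        # The "meaning" relative to this listener is just the change made.
        posterior = hear(prior, utterance, likelihood)
        return {s: posterior[s] - prior[s] for s in prior}

    # Toy example with a hypothetical two-state world:
    prior = {"raining": 0.5, "clear": 0.5}
    lik = lambda u, s: 0.9 if (u == "bring an umbrella") == (s == "raining") else 0.1
    print(meaning_for(prior, "bring an umbrella", lik))  # raining up, clear down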
Currently, most interesting semantics work looks like transforming questions
into database queries. How do we build a question answering system that
bottoms out in a POMDP?
System zero receives an utterance, transforms it into a database query,
executes the query, and prints the result. All mechanically, because it is
programmed to do so.
System one receives an utterance, and knows that it will receive a
reward for returning the output of the query corresponding to the
utterance.
System two receives an utterance from a user, and knows that it will
receive a reward for altering that user’s mind to include a particular
true belief associated with the utterance. This is what we really want.
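One way to make the contrast concrete is as three reward structures. This is a sketch; parse, execute_query, and user_believes are hypothetical components, not anything defined here:

    # Sketch of the three systems as reward structures.

    def system_zero(utterance, parse, execute_query):
        # No reward involved: a fixed pipeline, run because it is programmed to.
        return execute_query(parse(utterance))

    def reward_system_one(utterance, answer, parse, execute_query):
        # Rewarded for returning the output of the query corresponding to the
        # utterance.
        return 1.0 if answer == execute_query(parse(utterance)) else 0.0

    def reward_system_two(user_state_after, target_belief, user_believes):
        # Rewarded for the effect on the user: their belief state now includes
        # the particular true belief associated with the utterance.
        return 1.0 if user_believes(user_state_after, target_belief) else 0.0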
Let’s generalize this:
The level-zero speaker mechanically produces linguistic
representations of its belief state.
The level-zero hearer mechanically updates its belief state in
response to language.
The level-zero language-user is both a level-zero speaker and a level-zero hearer. Think of this
as a pre-theory-of-mind language-learner: whenever an event happens in the
world, nature also produces some structured sequence of noises (a
linguistic description of the event).
The level-one speaker produces language in order to form
appropriate beliefs in a level-zero hearer; the level-one hearer interprets
language as though it were produced by a level-one speaker, and so on.
The level-zero speaker obeys the Gricean maxim of quality. The level-one
speaker obeys the maxim of quantity. Higher levels obey the maxims of
relation and manner.
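These definitions can be written down directly in a rational-speech-acts style. A sketch over a toy two-state world; the literal lexicon and the softmax temperature are my own assumptions:

    # Sketch of the speaker/hearer hierarchy over a toy world. `literal[u][s]`
    # says whether utterance u is literally true of state s; this lexicon and
    # the temperature are assumptions for illustration.

    import math

    states = ["truffles", "no_truffles"]
    utterances = ["I am eating truffles", "I am not eating truffles"]
    literal = {
        "I am eating truffles":     {"truffles": 1.0, "no_truffles": 0.0},
        "I am not eating truffles": {"truffles": 0.0, "no_truffles": 1.0},
    }

    def normalize(d):
        z = sum(d.values()) or 1.0
        return {k: v / z for k, v in d.items()}

    def hearer_0(u):
        # Level-zero hearer: mechanically condition on literal truth.
        return normalize({s: literal[u][s] for s in states})

    def speaker_1(s, temperature=1.0):
        # Level-one speaker: choose words so that a level-zero hearer ends up
        # believing the right state.
        scores = {u: hearer_0(u)[s] for u in utterances}
        return normalize({u: math.exp(v / temperature) for u, v in scores.items()})

    def hearer_1(u):
        # Level-one hearer: invert the level-one speaker by Bayes' rule
        # (uniform prior over states assumed).
        return normalize({s: speaker_1(s)[u] for s in states})

    print(speaker_1("truffles"))
    print(hearer_1("I am eating truffles"))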
Observation: I am a level-one speaker and hearer in most of my interactions
with people, but operate at higher levels when playing Mafia. I am not clever
enough to go much beyond that.
The first thing we need is a direct mapping from language to paths
through a state space, and back again. The second thing is a good-enough
model of other language users.
Does reward-chasing explain all language learning behavior?
Speculation: A pre-verbal child cannot use language for anything, and
doesn’t know that it will eventually be able to do so. Yet by the time it
has the ability to make sounds, it has already learned the names of
things. See also: baby sign language, Pavlov’s dogs.
Maybe level-zero language use is best explained in terms of some innate
correlation-learning behavior (“monkey see, monkey do”), and only higher
levels as learning to maximize rewards.
I can walk up to a stranger on the street, ask for the time of day, and be
reasonably assured of receiving a correct answer.
What do I mean by “ask for the time of day”? Produce a set of sounds that,
given my model of the listener (I have a strong prior on the behavior of
strangers on the street), will produce an appropriate update to his belief
about my mental state.
This is not the same as “produce a set of sounds that, in another speaker,
I would interpret as the speaker’s attempt to update my belief about his
mental state”. I might be in a foreign country, and have memorized the
appropriate sequence of muscle movements, but be unable to recognize them
when spoken to me. Symmetry is not necessary.
What are my assumptions about the stranger? It is easier to ask under what
conditions this procedure fails (i.e. the conditions under which I wind up
with a false belief about the time, or no belief at all). Possibilities: the
stranger doesn’t speak the language I use, the stranger is a sociopath, or
the stranger’s watch is broken (or nonexistent). The first two are
interesting.
How much of our language behavior depends on people being mostly altruistic
most of the time? (Possibly altruistic in a stronger sense than the Gricean one.)
A group of two adversaries won’t bother to communicate
with each other, but three will. (Think of chess for three players, or
Diplomacy for two.) Invention of language is a Nash equilibrium here, and
a better one than if nobody talks.
The interaction between language and a “database” model of the world is
complicated.
This world is basically a collection of discrete entities, each with a
set of properties (inc. part–whole relationships, etc.).
Color is one of these properties—not just real-valued, but discretized.
But we know color is slightly Whorfian—my willingness to classify
objects as orange or brown, or blue or green, is at least partly
a function of my native language.
Similarly, I do not notice parts of some things until they are given names
distinct from their wholes.
So the schema of this database is certainly not fixed ahead of time.
But structure doesn’t only flow from language to schema—also in the
other direction. Sometimes I notice a new regularity in the world, or a
new part. In order to incorporate it into my interior monologue, I have to
give it a name. I do this reflexively.
Again, think of the underlying representation of meaning as a POMDP. What
does language do in this world?
Let’s limit ourselves to a literal listener. This means declaratives and
imperatives, but no questions.
It would be formally convenient if language only talked about the
probability of being in states.
Let our state representation encode the action currently executed by
every entity in the world.
“I am eating truffles”—sharpen my belief around states where this is
true.
“I will eat truffles tomorrow”—increase the posterior of paths including
that (single) edge; that is, increase transition probabilities into
truffle-eating states.
“I will buy truffles every Thursday”—increase transition probabilities
along all such edges.
“I might buy truffles tomorrow”—as above, but less so.
“You must buy truffles tomorrow”—increase an estimate of value.
We could do many of the above things by altering single transition
probabilities, rather than all transitions in or out of a state.
We could probably replace value with reward above.
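A sketch of the update machinery these examples call for, over tabular belief, transition, and value estimates; the strengths and the choice to renormalize each transition row are my own:

    # Sketch of the three update types used in the examples above.

    def normalize(d):
        z = sum(d.values()) or 1.0
        return {k: v / z for k, v in d.items()}

    # "I am eating truffles": sharpen belief around states where this is true.
    def sharpen_belief(belief, predicate, strength=4.0):
        return normalize({s: p * (strength if predicate(s) else 1.0)
                          for s, p in belief.items()})

    # "I will eat truffles tomorrow" / "every Thursday": increase transition
    # probabilities into states satisfying the predicate (renormalizing each
    # row). "Might" is the same update with a smaller strength.
    def boost_transitions(trans, predicate, strength=2.0):
        return {s: normalize({s2: p * (strength if predicate(s2) else 1.0)
                              for s2, p in row.items()})
                for s, row in trans.items()}

    # "You must buy truffles tomorrow": raise the value estimate of the
    # (state, action) pairs the sentence picks out.
    def boost_value(value, predicate, delta=1.0):
        return {(s, a): v + (delta if predicate(s, a) else 0.0)
                for (s, a), v in value.items()}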
An agent’s behavior in an MDP is (over)specified by a model (which is a
transition function and a reward function) and a policy (which is at least
a value estimate for (state, action) pairs). If we want to learn the
meaning of language automatically, maybe we should allow language to
update any of these three things. I.e. this graphical model:
text -> update type -> update
 |                       ^
 \_______________________/
If everything is covered in here somewhere, we don’t need to worry about
cleanly separating these updates according to pre-existing linguistic
theory.
Because all states remember their history, this has the form of a
(minimax) tree search rather than a graph search.
This makes it easy (formally, not computationally) to incorporate
information about long sequences not localized to single
transitions—just add weight to all move sequences consistent with the
high-level information.
So: at time t I maintain a belief over current states in the
tree, derived from a belief about how I got to them, and a belief about
what transitions will be probable in the future. Coupled with knowledge of
my own (deterministic) behavior, this gives a distribution over sequences
from the beginning of time to the horizon.
My behavior, in turn, is specified by a (heuristic!) estimate of the value
of every state.
We can say: declarative sentences increase or decrease the probability
of individual paths in this tree (i.e. my beliefs).
We can say: imperative sentences increase or decrease my estimates of the
value along individual paths in this tree.
So the meaning of a sentence is: 1. an update type (probably just declarative
vs. imperative); 2. a sequence-recognizing predicate; 3. a delta (whose sign
matters).
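A sketch of that three-part representation, with histories encoded as state sequences from the start of time to the horizon; the encoding and the delta scale are illustrative choices, and the declarative case assumes delta > -1:

    # Sketch: a meaning is (update type, a predicate over paths, a signed delta).
    # Declaratives reweight beliefs over paths; imperatives shift value
    # estimates along the same paths.

    from dataclasses import dataclass
    from typing import Callable, Dict, Tuple

    Path = Tuple[str, ...]   # a state sequence from the start of time to the horizon

    @dataclass
    class Meaning:
        update_type: str                     # "declarative" or "imperative"
        recognizes: Callable[[Path], bool]   # which paths the sentence is about
        delta: float                         # signed strength of the update

    def apply_meaning(m: Meaning,
                      path_belief: Dict[Path, float],
                      path_value: Dict[Path, float]):
        if m.update_type == "declarative":
            reweighted = {p: pr * (1.0 + m.delta) if m.recognizes(p) else pr
                          for p, pr in path_belief.items()}
            z = sum(reweighted.values()) or 1.0
            path_belief = {p: pr / z for p, pr in reweighted.items()}
        elif m.update_type == "imperative":
            path_value = {p: v + m.delta if m.recognizes(p) else v
                          for p, v in path_value.items()}
        return path_belief, path_value

    # "I will eat truffles tomorrow", as a declarative with positive delta:
    m = Meaning("declarative", lambda p: "eating_truffles" in p, delta=1.0)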