Working notes on language and the representation of meaning

Presented with apologies.

  1. All meaning is use.
    1. Put another way: there is nothing real between syntax and pragmatics. Attempts to axiomatize the in-between place miss the point: this is not where our reasoning needs to be sound.
    2. Which is not to say there aren’t useful abstractions at this level, or models with latent variables that correspond to “semantics” in the classical sense—only that a map from strings to database queries has not yet reached “meaning”.
    3. Which, in turn, is not to say that the problem is with the database itself. Certainly we need some level of abstraction above individual pixel intensities—this is what perception does for us. But in telling me that some subset of all utterances correspond to logical forms, you haven’t yet explained how I will interpret any utterance in this subset, nor explained what to do with all the utterances outside of it.
    4. Many standard problems in the philosophy of language (“the morning star is the evening star”, etc.) arise only because we insist on ascribing to sentences meanings independent of their speakers’ and hearers’ mental states.
    5. Grice’s timeless meanings are useful, but they are not what we ultimately want to model.
    6. I regard the early chapters of Philosophical Investigations as basically persuasive on this view.
    7. Or, from a different angle, a colorful anecdote: One of my old professors is at a dinner party with Elizabeth Anscombe, who says “Look—I will prove to you that a speaker-independent sentence meaning cannot be built up compositionally from its parts.” As they are leaving the party, she goes to the host, shakes his hand, and says “Fuck you very much for dinner.” The host replies “You’re very welcome.”
    8. She means to express thanks; she chooses a set of words which in other contexts would convey exactly the opposite; and yet her thanks are received and correctly interpreted. This is only possible if G.E.M.A. has a very precise model of how the listener will respond to any set of sounds she produces (including corrections by his internal language model). And this is all she needs! No reason to worry about how to say “fuck you” in the fluent calculus. Can we formalize this?
  2. Q: If logical propositions are not enough, what abstraction do we use for the representation of meaning? A: A decision process.
    1. More specifically: think of the world you live in as approximated by a POMDP. Language forms both a class of observations (when produced by others) and a class of actions (when produced by you).
    2. Qua observation in a POMDP, all speech tends to alter the listener’s belief state.
    3. With the exception of performatives, the only first-order effect language can have on the world is to alter belief states.
    4. Of course, nobody would ever communicate if it didn’t have second-order effects: altering others’ behavior by altering their belief states.
    5. Language cannot be understood without a model of the speaker’s belief state and value function. Language cannot be produced without a model of the listener’s belief state and value function.
    6. This is not quite a private language argument: even if we don’t want to talk about public meanings, communication is only possible if there is a shared inventory of symbols—an equilibrium of a coordination game.
    7. If you want a compact description of an utterance’s meaning, relativized to an individual speaker, use this: the update that will be made to their belief state when it is communicated to them. (This belief state has memory, and includes a distribution over all histories consistent with the present.)
    8. If we ignore memory, in a discrete state space this might look like a draw from a Dirichlet distribution.
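    9. A minimal sketch of the last two items, assuming a toy three-state POMDP in which an utterance is treated as just another observation; every state, transition, and likelihood below is invented for illustration, and the “meaning” of the utterance is nothing more than the gap between the prior and the posterior it induces:

        # Toy belief update in a discrete POMDP, with an utterance treated as an
        # ordinary observation. States, transitions, and likelihoods are made up.

        STATES = ["eating_truffles", "buying_truffles", "doing_nothing"]

        # P(next state | state), ignoring actions for simplicity.
        TRANSITIONS = {
            "eating_truffles": {"eating_truffles": 0.6, "buying_truffles": 0.1, "doing_nothing": 0.3},
            "buying_truffles": {"eating_truffles": 0.5, "buying_truffles": 0.2, "doing_nothing": 0.3},
            "doing_nothing":   {"eating_truffles": 0.1, "buying_truffles": 0.2, "doing_nothing": 0.7},
        }

        # A hypothetical likelihood P(utterance | state): how likely the speaker is
        # to say this in each state. This is where a model of the speaker lives.
        UTTERANCE_LIKELIHOOD = {
            "I am eating truffles": {"eating_truffles": 0.9, "buying_truffles": 0.2, "doing_nothing": 0.05},
        }

        def update_belief(belief, utterance):
            """One step of Bayesian filtering: predict, then condition on the utterance."""
            predicted = {s2: sum(belief[s1] * TRANSITIONS[s1][s2] for s1 in STATES) for s2 in STATES}
            likelihood = UTTERANCE_LIKELIHOOD[utterance]
            unnormalized = {s: predicted[s] * likelihood[s] for s in STATES}
            z = sum(unnormalized.values())
            return {s: p / z for s, p in unnormalized.items()}

        if __name__ == "__main__":
            prior = {s: 1.0 / len(STATES) for s in STATES}
            posterior = update_belief(prior, "I am eating truffles")
            print(posterior)  # mass concentrates on eating_truffles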
  3. Currently, most interesting semantics work looks like transforming questions into database queries. How do we build a question answering system that bottoms out in a POMDP?
    1. System zero receives an utterance, transforms it into a database query, executes the query, and prints the result. All mechanically, because it is programmed to do so.
    2. System one receives an utterance, and knows that it will receive a reward for returning the output of the query corresponding to the utterance.
    3. System two receives an utterance from a user, and knows that it will receive a reward for altering that user’s mind to include a particular true belief associated with the utterance. This is what we really want.
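    4. A toy contrast between system zero and system two, just to show where the reward signal attaches; the “database”, the one-rule “parser”, and the simulated user below are all invented stand-ins:

        # System zero: a mechanical pipeline. System two: rewarded for the change
        # it induces in the user's beliefs, not for the string it prints.

        DATABASE = {"capital_of_france": "Paris"}

        def parse(utterance):
            # Hypothetical one-rule "semantic parser".
            return "capital_of_france" if "capital of France" in utterance else None

        def system_zero(utterance):
            """Utterance -> query -> result, with no notion of reward at all."""
            return DATABASE[parse(utterance)]

        def system_two_reward(user_belief_before, user_belief_after, true_answer):
            """Reward for moving the user's belief toward the true answer."""
            return user_belief_after[true_answer] - user_belief_before[true_answer]

        if __name__ == "__main__":
            print(system_zero("what is the capital of France"))  # Paris
            before = {"Paris": 0.3, "Lyon": 0.7}   # simulated user, before the answer
            after = {"Paris": 0.9, "Lyon": 0.1}    # simulated user, after the answer
            print(system_two_reward(before, after, "Paris"))  # ~0.6: belief moved toward truth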
  4. Let’s generalize this:
    1. The level-zero speaker mechanically produces linguistic representations of its belief state.
    2. The level-zero hearer mechanically updates its belief state in response to language.
    3. The level-zero language-user is both a level-zero speaker and a level-zero hearer at once. Think of this as a pre-theory-of-mind language-learner: whenever an event happens in the world, nature also produces some structured sequence of noises (a linguistic description of the event).
    4. The level-one speaker produces language in order to form appropriate beliefs in a level-zero hearer; the level-one hearer interprets language as though it were produced by a level-zero speaker; and so on up the hierarchy.
    5. Successive levels pick up the Gricean maxims roughly in order: quality first, then quantity, then relation and manner.
    6. Observation: I am a fairly low-level speaker and hearer in most of my interactions with people, but operate at higher levels when playing Mafia. I am not clever enough to go much higher than that.
    7. The first thing we need is a direct mapping from language to paths through a state space, and back again. The second thing is a good-enough model of other language users.
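    8. A toy rendering of this hierarchy, in the spirit of iterated speaker/hearer (“rational speech acts”) models; the two-state world and two-word lexicon below are invented, and each level-one agent reasons about a level-zero agent exactly as above:

        # Level-zero agents are mechanical; level-one agents model level-zero ones.
        MEANINGS = ["some-but-not-all", "all"]
        # Literal truth conditions of each utterance in each state of the world.
        TRUE_IN = {"some": {"some-but-not-all": True, "all": True},
                   "all":  {"some-but-not-all": False, "all": True}}

        def speaker0(meaning):
            """Level-zero speaker: mechanically produce some literally true description."""
            truthful = [u for u in TRUE_IN if TRUE_IN[u][meaning]]
            return {u: (1.0 / len(truthful) if u in truthful else 0.0) for u in TRUE_IN}

        def hearer0(utterance):
            """Level-zero hearer: mechanically shift belief onto states where the words are true."""
            consistent = [m for m in MEANINGS if TRUE_IN[utterance][m]]
            return {m: (1.0 / len(consistent) if m in consistent else 0.0) for m in MEANINGS}

        def speaker1(meaning):
            """Level-one speaker: choose words by how strongly a level-zero hearer
            would come to believe the intended meaning."""
            scores = {u: hearer0(u)[meaning] for u in TRUE_IN}
            z = sum(scores.values())
            return {u: s / z for u, s in scores.items()}

        def hearer1(utterance):
            """Level-one hearer: interpret as though the words came from a level-zero
            speaker (with a uniform prior over meanings)."""
            scores = {m: speaker0(m)[utterance] for m in MEANINGS}
            z = sum(scores.values())
            return {m: s / z for m, s in scores.items()}

        if __name__ == "__main__":
            print(hearer0("some"))   # literal: consistent with both states
            print(hearer1("some"))   # already leans toward some-but-not-all
            print(speaker1("all"))   # prefers "all": it pins the hearer down completely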
  5. Does reward-chasing explain all language learning behavior?
    1. Speculation: A pre-verbal child cannot use language for anything, and doesn’t know that it will eventually be able to do so. Yet by the time it has the ability to make sounds, it has already learned the names of things. See also: baby sign language, Pavlov’s dogs.
    2. Maybe the level-zero language-user is best explained in terms of some innate correlation-learning behavior (“monkey see, monkey do”), and only the higher levels as learning to maximize rewards.
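    3. A toy version of that correlation learner: no rewards, just co-occurrence counts between heard words and visible objects, over a handful of invented “scenes”:

        from collections import Counter, defaultdict

        # Each scene pairs the set of visible objects with an overheard utterance.
        SCENES = [
            ({"ball", "dog"}, "look at the ball"),
            ({"ball", "cup"}, "the ball is red"),
            ({"dog", "cup"},  "the dog wants the cup"),
            ({"dog"},         "good dog"),
        ]

        counts = defaultdict(Counter)
        for objects, utterance in SCENES:
            for word in utterance.split():
                for obj in objects:
                    counts[word][obj] += 1

        def best_referent(word):
            """Guess the object most strongly correlated with the word."""
            return counts[word].most_common(1)[0][0]

        if __name__ == "__main__":
            print(best_referent("ball"))  # 'ball'
            print(best_referent("dog"))   # 'dog'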
  6. I can walk up to a stranger on the street, ask for the time of day, and be reasonably assured of receiving a correct answer.
    1. What do I mean by “ask for the time of day”? Produce a set of sounds that, given my model of the listener (I have a strong prior on the behavior of strangers on the street), will produce an appropriate update to his belief about my mental state.
    2. This is not the same as “produce a set of sounds that, in another speaker, I would interpret as the speaker’s attempt to update my belief about his mental state”. I might be in a foreign country, and have memorized the appropriate sequence of muscle movements, but be unable to recognize them when spoken to me. Symmetry is not necessary.
    3. What are my assumptions about the stranger? It is easier to ask under what conditions this procedure fails (i.e. conditions under which I wind up with a false belief about the time, or no belief at all). Possibilities: the stranger doesn’t speak the language I use, the stranger is a sociopath, or the stranger’s watch is broken (or nonexistent). The first two are interesting.
    4. How much language behavior depends on people being mostly altruistic most of the time? (Possibly in a stronger sense than Gricean.)
    5. A group of two adversaries won’t bother to communicate with each other, but three will. (Think of chess for three players, or Diplomacy for two.) Invention of language is a Nash equilibrium here, and a better one than the equilibrium in which nobody talks.
  7. The interaction between language and a “database” model of the world is complicated.
    1. This world is basically a collection of discrete entities, each with a set of properties (inc. part–whole relationships, etc.).
    2. Color is one of these properties—not just real-valued, but discretized.
    3. But we know color is slightly Whorfian—my willingness to classify objects as orange or brown, or blue or green, is at least partly a function of my native language.
    4. Similarly, I do not notice parts of some things until they are given names distinct from their wholes.
    5. So the schema of this database is certainly not fixed ahead of time.
    6. But structure doesn’t only flow from language to schema—also in the other direction. Sometimes I notice a new regularity in the world, or a new part. In order to incorporate it into my interior monologue, I have to give it a name. I do this reflexively.
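    7. A toy version of this mutable “database”: entities are just bags of named properties that grow when something new gets a name, and color is discretized by language-dependent boundaries (the hue cutoffs and the “grue” language below are invented):

        # Language-dependent discretization of a continuous property (hue, 0-360).
        COLOR_BOUNDARIES = {
            "english":  [(30, "orange"), (90, "yellow"), (180, "green"), (270, "blue"), (360, "red")],
            "toy-grue": [(30, "orange"), (90, "yellow"), (270, "grue"), (360, "red")],
        }

        def color_name(hue, language):
            for upper_bound, name in COLOR_BOUNDARIES[language]:
                if hue < upper_bound:
                    return name
            return COLOR_BOUNDARIES[language][-1][1]

        class Entity:
            """A bag of named properties; naming a new part or regularity
            amounts to adding a key the schema didn't have before."""
            def __init__(self, **properties):
                self.properties = dict(properties)

            def name_property(self, name, value):
                self.properties[name] = value  # the schema grows here

        if __name__ == "__main__":
            print(color_name(200, "english"))   # 'blue'
            print(color_name(200, "toy-grue"))  # 'grue'
            cup = Entity(kind="cup", hue=200)
            cup.name_property("handle", True)   # noticing (and naming) a part
            print(cup.properties)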
  8. Again, think of the underlying representation of meaning as a POMDP. What does language do in this world?
    1. Let’s limit ourselves to a literal listener. This means declaratives and imperatives, but no questions.
    2. It would be formally convenient if language only talked about the probability of being in states.
    3. Let our state representation encode the action currently executed by every entity in the world.
    4. “I am eating truffles”—sharpen my belief around states where this is true.
    5. “I will eat truffles tomorrow”—increase the posterior of paths including that (single) edge; that is, increase transition probabilities into truffle-eating states.
    6. “I will buy truffles every Thursday”—increase transition probabilities along all such edges.
    7. “I might buy truffles tomorrow”—as above, but less so.
    8. “You must buy truffles tomorrow”—increase an estimate of value.
    9. We could do many of the above things by altering single transition probabilities, rather than all transitions in or out of a state.
    10. We could probably replace value with reward above.
    11. An agent’s behavior in an MDP is (over)specified by a model (a transition function and a reward function) and a policy (at least a value estimate for (state, action) pairs). If we want to learn the meaning of language automatically, maybe we should allow language to update any of these three things, i.e. this graphical model (a toy version is sketched at the end of this list):
          text -> update type -> update
            |                      ^
            \----------------------/
      
    12. If everything is covered in here somewhere, we don’t need to worry about cleanly separating these updates according to pre-existing linguistic theory.
    13. Because all states remember their history, this has the form of a (minimax) tree search rather than a graph search.
    14. This makes it easy (formally, not computationally) to incorporate information about long sequences not localized to single transitions—just add weight to all move sequences consistent with the high-level information.
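    15. A toy rendering of the “text -> update type -> update” picture above, with the parses hard-coded; the only point is that different update types touch different components of the agent (a transition probability, a reward, or a value estimate), and all the sentences, targets, and deltas below are invented:

        class Agent:
            def __init__(self):
                # transition[s][a][s2] = P(s2 | s, a); reward[s]; value[(s, a)]
                self.transition = {"home": {"go_shop": {"shop": 0.5, "home": 0.5}}}
                self.reward = {"shop": 0.0}
                self.value = {("home", "go_shop"): 0.0}

            def apply(self, update_type, target, delta):
                if update_type == "transition":   # declaratives about what will happen
                    s, a, s2 = target
                    self.transition[s][a][s2] = min(1.0, self.transition[s][a][s2] + delta)
                    # (renormalization over s2 omitted in this toy)
                elif update_type == "reward":     # statements about what is good
                    self.reward[target] += delta
                elif update_type == "value":      # imperatives: direct advice about what to do
                    self.value[target] += delta

        # Hard-coded "parses": sentence -> (update type, target, delta).
        UTTERANCE_UPDATES = {
            "I will buy truffles tomorrow":   ("transition", ("home", "go_shop", "shop"), 0.3),
            "You must buy truffles tomorrow": ("value", ("home", "go_shop"), 1.0),
            "Truffles are delicious":         ("reward", "shop", 0.5),
        }

        if __name__ == "__main__":
            agent = Agent()
            for sentence, (kind, target, delta) in UTTERANCE_UPDATES.items():
                agent.apply(kind, target, delta)
            print(agent.transition, agent.reward, agent.value)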
  9. So: at any given time I maintain a belief over current states in the tree, derived from a belief about how I got to them, and a belief about what transitions will be probable in the future. Coupled with knowledge of my own (deterministic) behavior, this gives a distribution over sequences from the beginning of time to the horizon.
    1. My behavior, in turn, is specified by a (heuristic!) estimate of the value of every state.
    2. We can say: declarative sentences increase or decrease the probability of individual paths in this tree (i.e. my beliefs).
    3. We can say: imperative sentences increase or decrease my estimates of the value along individual paths in this tree.
    4. So the meaning of a sentence is: 1. an update type (probably just a choice between imperative and declarative); 2. a sequence-recognizing predicate; 3. a delta (sign matters).
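    5. A toy rendering of that triple, applied to an explicitly enumerated (and tiny) set of path hypotheses; the paths, predicates, and deltas below are invented, with declaratives reweighting beliefs over paths and imperatives reweighting their values:

        # Each path is a sequence of events from past to horizon, carrying a
        # belief weight and a value estimate.
        PATHS = [
            {"seq": ["wake", "buy_truffles", "eat_truffles"], "belief": 1.0, "value": 0.0},
            {"seq": ["wake", "work", "sleep"],                "belief": 1.0, "value": 0.0},
        ]

        def apply_meaning(paths, update_type, predicate, delta):
            """Meaning = (update type, sequence-recognizing predicate, signed delta)."""
            key = "belief" if update_type == "declarative" else "value"
            for path in paths:
                if predicate(path["seq"]):
                    path[key] += delta
            total = sum(p["belief"] for p in paths)   # keep beliefs a distribution
            for p in paths:
                p["belief"] /= total
            return paths

        if __name__ == "__main__":
            # "I will eat truffles tomorrow": declarative over truffle-eating paths.
            apply_meaning(PATHS, "declarative", lambda seq: "eat_truffles" in seq, 2.0)
            # "You must buy truffles": imperative over truffle-buying paths.
            apply_meaning(PATHS, "imperative", lambda seq: "buy_truffles" in seq, 1.0)
            for path in PATHS:
                print(path)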

— 27 February 2014