Hi, welcome. I'm Aaron Mart. Thank you so much, those of you who have never come across Washington Road before; welcome to Chancellor Green, or East Pyne and Chancellor Green. This is the last of the four LLM events that the Center for Digital Humanities has been hosting this year with the generous support of the Humanities Council and, more recently, the Princeton Language and Intelligence initiative. This project was born out of the brains of many people here. Dan, who I just saw over there, said: wouldn't it be great if we could get all of the folks who are thinking about these problems and issues together, from across disciplines, and talk about them? Natasha and I said: yes, absolutely, that sounds great; what should we do? Dan said, let me talk to Dache, and then Dache and I got together and wrote the grants. Then Dache created a new life and had to get busy taking care of that little bundle of joy all year, so she hasn't been around, but she's here in spirit, and she also had the whole Princeton Language and Intelligence effort to take care of. We've been so excited to have these events, and we're so excited that Tal is here to finish us off with, thank goodness, understanding, because we still have not very much of it. Just to recap: in the very first event we had Whitfer here to talk about society, in conversation with our colleague vi Mariana from computer science; then we had hitch Dim here to talk about language and culture; and our last session was Simon DeDeo, speaking with our own Arthur Spirling about models. I don't know if we reached an understanding after that session, but I'm sure we will after this one. I'm now going to pass the stage over to Brandon Stewart, our colleague in sociology, who is going to introduce today's speakers. Thank you to those of you who have been with us the whole time; we really appreciate it.

Okay, all right. Thank you so much for coming. Today's topic for the LLM forum is understanding: understanding of language, society, culture, and theory of mind. I definitely don't envy the speakers in trying to figure out how to say things about this, but I know they're up to the task. Professor Tal Linzen earned his PhD in linguistics from NYU in 2015, followed by a two-year postdoc at the École Normale Supérieure. He then joined Johns Hopkins University, where he was appointed in Cognitive Science, Computer Science, and the Center for Language and Speech Processing. In 2020, he moved back home to NYU, where he is now an associate professor of Linguistics and Data Science, and he also works as a research scientist. There he directs the Computation and Psycholinguistics Lab, which publishes in computer science conferences on deep learning, in computational linguistics venues, and in journals in psychology, neuroscience, and linguistics; he covers a large range of fields. I always love a good title when I'm prepping intros, and he has one of my favorites in a while: "Colorless Green Recurrent Networks Dream Hierarchically." So he's a fantastic choice for today's topic. He has done influential and highly cited work on the limits of what LLMs can understand, including the paper "Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference," which identified three syntactic heuristics that natural language inference models fall back on when doing entailment, and the contexts in which these break down. His work was recently funded through an NSF CAREER award, and he has had many other great accolades.
And after he speaks, we will have our discussant, our very own Professor Tania Lombrozo. She is the Arthur W. Marks '19 Professor of Psychology here at Princeton and directs the Program in Cognitive Science. After completing her PhD at Harvard, she taught at Berkeley before coming to join us here at Princeton. She directs the Concepts and Cognition Lab, where she has done path-breaking work on causal inference, explanation, and epistemology. Personally, I'm a big fan of her work on explanation and abductive inference, and for those who have tried to write down what an explanation is, it is maddeningly difficult. As she notes at the start of her lab's research statement, studying explanation often, well... requires explanation. Not to be outdone on great titles, she has some fantastic ones: "Tell me your cognitive budget and I'll tell you what your values are," and "Explanation is effective because it is selective," which has a nice rhyme and starts with a section called "picky with a purpose," which I just really enjoy. She has also received many accolades, including the Stanton Prize from the Society for Philosophy and Psychology and awards from the NSF. And I will say, at a personal level, one thing that strikes me from knowing Tania is her intellectual generosity: she reads very broadly and engages with scholars across a wide range of fields. So I'm sure she will give us an excellent discussion of Tal's excellent talk.

Okay. Thank you for this introduction, probably the longest one that I've had; perhaps that's what you get speaking in a humanities forum. I really enjoyed the idea of looking for the best titles in someone's CV; I might steal that from you. Anyway, as Brandon said, it's quite the prompt here: to talk about understanding. So I couldn't just recycle an existing talk; I had to actually think about how to make a new one. I titled it provocatively, "Do language models understand?", and I'm not going to fully answer that question here. First of all, I'm going to start by telling you a little bit about what language models are, or the way in which I'm going to use that term. It has a specific technical meaning in machine learning: a language model is a system that assigns probabilities to sequences of words. If it does its job well, it will tell us that "the boy is running" is a better sequence of English than "the boy are running," which shows that it has learned something about the grammar of English implicitly. Or it will tell us that "the boy is running" is more probable than "the table is running." And it will also assign specific numbers to each of those: it could tell us that the probability of "the boy is running" is 0.007 and "the table is running" is 0.0001; I don't know, something like that. So that is all that we mean by a language model. Usually we train it, in a machine learning setting, to mimic the distribution of sequences in a text corpus. In some cases people use auxiliary data like images and audio, but I'm not going to talk about that setup today. We want the sequences that actually occurred in the corpus to be assigned higher probability than sequences that didn't occur in the corpus, and sequences that occurred many times should be assigned higher probability than sequences that occurred fewer times. That all makes sense. And we can also use those probabilities to generate new sequences of words.
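To make the idea of assigning probabilities to word sequences concrete, here is a minimal sketch, not from the talk itself, of how one might score those example sentences with an off-the-shelf model; it assumes the Hugging Face transformers library and the small GPT-2 model as a stand-in for the systems being discussed.

```python
# Minimal sketch (not from the talk): scoring word sequences with a small
# off-the-shelf language model. Assumes `pip install torch transformers`.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sequence_log_prob(text: str) -> float:
    """Total log-probability the model assigns to the token sequence."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token,
    # so multiply by the number of predicted tokens and flip the sign.
    return -out.loss.item() * (ids.shape[1] - 1)

for sentence in ["The boy is running.", "The boy are running.", "The table is running."]:
    print(f"{sentence!r}: log p = {sequence_log_prob(sentence):.2f}")

# The same distribution can be sampled from, which is the "generate new
# sequences" use of a language model mentioned above.
prompt_ids = tokenizer("The boy", return_tensors="pt").input_ids
sample = model.generate(prompt_ids, do_sample=True, max_new_tokens=10,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(sample[0]))
```

On a reasonably trained model, the grammatical and plausible sentence should typically receive the highest log-probability of the three, which is the "implicit grammar" point above; the exact numbers will of course differ from the illustrative 0.007 in the talk.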
So if we generated 1,000 sequences, most of them would be ones that the model assigns higher probability, but once in a while it will generate a sequence with lower probability. That's what I mean by a language model. Now, what do I mean by understanding? We have interactions with language models like ChatGPT, and our experience of those interactions is that the models appear to understand us and produce meaningful language. There are two questions that this brings up. The first is an empirical question: setting aside the impressionistic takeaway we have from those interactions, can we actually say, empirically, that they understand language in all of its variety? We need to do more scientific experiments to make sure that their behavior is consistent with the behavior of a system that we think understands English. The second question is more conceptual: could language models in principle understand language? That question has a much larger philosophical than empirical component. I'm going to try to do half of each question in this talk, see how far I get, and then we can talk more later. The appearance of understanding, of course, is an impression that can be misleading. We've all seen the ELIZA bot, a psychotherapy bot created more than 50 years ago. It's based on a set of very simple rules that are meant to trick people into thinking they're talking to a person, but it's clearly not a system that really understands language. We know exactly how it works, and we can be pretty confident that the way it works is very different from any intuitive notion of language understanding. So we need to evaluate the system systematically. The notion of systematicity, borrowed from Fodor and Pylyshyn, is that we will only grant that a system understands a sentence if it also understands trivial modifications of that sentence. If a system reacts appropriately to "John likes Mary" but does something weird with "Mary likes John," then we wouldn't say it understands either of those sentences; it's almost like it got the meaning of "John likes Mary" by chance, a fluke. To actually understand what the meaning of liking is, you need to be able to apply it to all names. It seems kind of silly to be able to understand the first and not the second. What's even more crucial is generalization beyond the training set. We could have a system that just memorizes the appropriate responses for a very long list of prompts, but then, if you give it a prompt it hasn't seen before, it does something unpredictable and bizarre. So the crucial test for understanding is not the more common interactions that may have been in the system's training set, but generalization to new ones that we know were not in the training set. This has become increasingly difficult in the last few years, because language models are trained on enormous amounts of words, and they can do very well in basic interactions without generalizing at all. In machine learning this is technically called data contamination: the evaluation doesn't actually test for generalization, it just tests for memorization of a specific interaction or sequence that happened to be in the training set. So we need to construct targeted, systematic evaluations that really convince us that the model understands a certain aspect of language systematically and not just haphazardly. Okay.
So, for example, one thing we can do is take a word that occurred in the training set but did not occur in a specific context, and then test whether the model can understand it in that new context. This is an example of an evaluation dataset that Najoung and I constructed a few years ago that follows this procedure, where we make sure that the examples we test on were not seen in the training set. But this is very difficult to do when we evaluate a model like ChatGPT, which is trained on trillions of words that we, as people who don't work at OpenAI, don't have access to. Okay. The specific task I'm going to focus on in this talk, and it will come back in the second half, is natural language inference. This is a really useful task, I think, for evaluating understanding. The idea is that we get two sentences, a premise and a hypothesis, and we have to determine whether the hypothesis, the second sentence, always follows from the first one: every time the first sentence is true, the second sentence also has to be true. So if you have a picture of "a soccer game with multiple males playing," that also means that "some men are playing a sport"; here the correct label is entailment. If the premise is "the man inspects the uniform of a figure" and the hypothesis is "the man is sleeping," then it's a contradiction, because you don't expect someone who is inspecting a uniform to also be sleeping. You can see that this is a more expansive notion than just logical entailment: you also have to know a little bit about the world to do this. So it's a pretty challenging task in the general case; we can frame basically any AI task as an inference task of this sort. We wanted to make sure that models are able to do this in the general case, and we focused on one specific concern about how models could be cheating, appearing to understand language without actually understanding it, and that is the case of syntactic heuristics. The horse here is Clever Hans; maybe some of you know about this guy, and if not, I can tell you about him later. HANS is also an acronym, for our dataset. An example of a heuristic a system could use is to assume that a premise entails every hypothesis that is constructed from words appearing in the premise. So if the premise is "the judge was paid by the lawyer," that entails "the lawyer paid the judge." This is a heuristic that, if you look at the typical tests for entailment, actually works pretty well: a lot of the time it really is the case that a hypothesis fully contained in the premise is entailed. But of course we can construct counterexamples, like "the doctor was paid by the actor," which does not entail "the doctor paid the actor," even though all of the words in the hypothesis occur in the premise. That's just not going to come up very often, so you might not notice it if you are interacting with the system in a more casual way. We also looked at a couple of other heuristics; a more special case is assuming that a premise entails all of its subsequences. Okay. So what we do is look at a bunch of models and compare their performance on entailment detection between sentence pairs where the heuristic makes the correct prediction and sentence pairs where the heuristic makes an incorrect prediction.
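As an illustration of the evaluation design just described (and not the actual HANS code or data), here is a small hypothetical sketch in which the lexical-overlap heuristic itself plays the role of the system being probed; `lexical_overlap_predict` and the two tiny example lists are made up for illustration, and any real NLI model could be dropped in instead.

```python
# Illustrative sketch of the evaluation design (not the actual HANS code):
# split examples into those where the lexical-overlap heuristic happens to be
# right and those where it is wrong, then compare accuracies on the two sets.

heuristic_consistent = [
    # (premise, hypothesis, gold label)
    ("The judge was paid by the lawyer.", "The lawyer paid the judge.", "entailment"),
]
heuristic_inconsistent = [
    ("The doctor was paid by the actor.", "The doctor paid the actor.", "non-entailment"),
]

def lexical_overlap_predict(premise: str, hypothesis: str) -> str:
    """The heuristic itself: say 'entailment' whenever every word of the
    hypothesis also appears in the premise."""
    words = lambda s: set(s.lower().strip(".").split())
    return "entailment" if words(hypothesis) <= words(premise) else "non-entailment"

def accuracy(predict, examples):
    return sum(predict(p, h) == gold for p, h, gold in examples) / len(examples)

# A model that truly parses the sentences should be near 100% on both subsets;
# a model leaning on the heuristic looks like this instead:
print(accuracy(lexical_overlap_predict, heuristic_consistent))    # 1.0
print(accuracy(lexical_overlap_predict, heuristic_inconsistent))  # 0.0
```

The point of the split is exactly the prediction stated next: near-ceiling accuracy on both subsets if the system really understands the sentences, and this lopsided pattern if it is relying on the heuristic.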
Our prediction here is that if the model is doing the right thing, actually understanding the structure of the sentences, then its accuracy will be close to 100% in both cases. But if the model is using the heuristic, its accuracy will be very high on the cases at the top, where the heuristic makes the correct prediction, and very low on the cases at the bottom, where it makes the incorrect prediction. This is work we did five years ago, and the models did very, very poorly here: they did extremely well on the cases where the heuristic was correct and very poorly on the cases where it was incorrect. That is very strong evidence that the model is not understanding the sentence, that it is doing something that's a proxy for the correct thing it's supposed to do. And this is something people hadn't noticed before we constructed this dataset, which is quite striking. People said this model, which was a popular model a few years ago, is very good at entailment because its overall accuracy is something like 85%. But if you test it on the right cases, you see that it does something pretty funny. So that's why we need these systematic evaluations. But then you might ask: well, that was five years ago; what about GPT-4? That's usually the question that comes up in any talk I give, so I'm preempting it and telling you that we tried to ask GPT-4 these questions. Here we interact with it the way you would with the chatbots. We asked: if the lawyer was advised by the actor, is it definitely true that the lawyer advised the actor? And GPT-4 said yes, which is not the correct answer. Its performance was not as bad as BERT's a few years ago. I'm not going to talk about all of these numbers, but the important ones are at the far right (I guess the laser pointer doesn't actually work, okay, that's fine): the important numbers are the 64 and 58. That's very low accuracy, close to just guessing, basically. This is from a very recent paper we just did, so it looks like the problem still persists, which is kind of interesting. The important takeaway here is that we need to evaluate understanding in a systematic way, and this kind of casual interaction with the model is not necessarily telling about whether the model understands language systematically.

The second set of topics I want to talk about is the more conceptual one, and that's going to be pretty fun. Here the question is: could language models in principle understand? Maybe the specific models we have now don't understand, but we add more GPUs to training and eventually they will. The important thing here is to distinguish two notions of understanding; there are at least two notions, probably a hundred, but two very important ones. The definition of understanding that I feel comfortable with is that a system understands language if it behaves as if it understands language. If it's useful for us to describe a system as something that understands language, I'm comfortable ascribing understanding to that system. That is a view associated with Daniel Dennett. I'm going to bring in a lot of very superficial philosophy in this talk; I'm not a philosopher, but hopefully I'm not misrepresenting the philosophers. So that's what I'm comfortable talking about. Now, a lot of people feel that this notion is too deflationary, or weak.
What they are interested in is not behavioral understanding but something more like phenomenological understanding. On that view, we will say that a system understands language if it has the experience that a human has when they understand language. When I tell you that John likes Mary, that maybe evokes all of your high school crushes or something, and that's part of the meaning of liking for you. It's going to be very hard to prove that a language model also has experiences like that. In general, if you want to ascribe experiences to a language model, you would likely also need to argue that it's conscious. Now, I know that's a very controversial topic. I don't find it so offensive to argue that language models are conscious, or at least could in principle be conscious if we made them better than they are now; David Chalmers wrote a nice paper about this a few months ago. I'm only slightly convinced, and it's tricky, because he thinks that everything is conscious, including chairs. Philosophers have weird beliefs; every philosopher has at least one extremely weird belief, I've realized from talking to them. But I'll mostly focus on the first, more empirical notion of understanding here. One of the most important objections to why language models couldn't in principle understand language is that, even setting aside the empirical issues, the language that language models produce is not grounded in experience with the world. They just learn the co-occurrences between words, how often this word occurs next to that one, and so on, whereas a crucial aspect of understanding is a connection between language and the external world. That's something that we as humans, who actually interact with the world, have, and that language models couldn't have in principle, because they remain internal to language and never come out of language. That actually suggests that if the model had, I don't know, arms and legs, like a robot, and interacted with the world, then maybe it could understand language; maybe that's what grounding proponents would argue. I'm associating this view with Emily Bender just as an example, but there are other people who implicitly or explicitly advocate for it. We have argued, in a half-empirical, half-theoretical paper, that you actually can learn a lot about meaning just from form. The example here is the Rosetta Stone: the task of the language model is very similar to the task of a human encountering the Rosetta Stone for the first time. Let's say we don't have a translation; we only have one part, one of the languages, and we need to reconstruct the grammar of the language just from that text. So how much of natural language meaning can we learn just from form, specifically just language? I will argue that we can describe a lot of linguistic meaning in terms of inferential relations; the entailment task I mentioned at the beginning actually characterizes almost everything we know about language meaning. I'll say more about this in a bit. And if we assume that speakers produce language in accordance with Gricean communicative principles, we can infer a lot from the distribution of sentences in a corpus about which sentences entail each other. That was a long sentence; we'll unpack it in a bit. The second part is reference, right?
We were concerned about the connection between the words that the language model is producing and the external world, because the language model doesn't have experience with the external world. I would argue that this worry is misguided, because humans also lack experience with the referents of almost all of the words they produce, and we are still willing to ascribe meaning and reference to those words. I'll say more about that. When do we need to end? I didn't keep track; like ten minutes, or five, or fifteen? Okay, very good. I think I can just go through everything; I just wasn't sure when we started. So no one has anything after this, right? Great. We have a dinner reservation at eight, right? Yeah, that's right. Okay. As I said, a sentence entails another one if the second sentence is true whenever the first one is true. There's actually a school of philosophy that says that everything we know about meaning can be captured by these entailment relations between sentences. That's a bit of an extreme position, but I think it's true that most of what we know about meaning can be captured by the entailments between sentences. The argument here is that I understand what "the apple is red" means if I know, for example, that "the apple is red" entails "the apple has a color," that "the apple is dark red" entails "the apple is red," and so on. Being able to classify red things versus non-red things is actually not so important, and it's also not sufficient for us to say that a speaker actually understands the word red. We can think of very clear examples of this, for instance many scientific concepts, or disease concepts: I wouldn't be able to diagnose someone with a disease, but I can still know what the disease term means, because I trust other people in society to know more about it than I do. So it's not that I have no idea what the word arthritis means just because I don't fully understand the mechanisms that cause arthritis, or what the symptoms are, or how to cure someone of arthritis, and so on. The question we ask in this empirical paper is whether there is evidence for entailment semantics in form, or whether we need an independent source of understanding. Do we need some sort of explicit connection to the external world? Do we need someone to teach us that sentence one entails sentence two? We think that in the distribution of words in language there is actually a lot of evidence for entailment. One maybe obvious source of evidence is that we often explicitly use words like "therefore" or "because," which tell us something about the relationship between sentences. But we're going to give a different argument here, which is that the sequences of sentences we encounter in a corpus are not just random sequences of sentences. They're produced by speakers who are human and rational and are usually trying to convey something to another person. And as the philosopher Paul Grice argued in the 1970s, speakers tend to follow the cooperative principle: they tend to say what they think will advance the goals of the conversation, something that is actually useful to the other person. We don't just produce sequences of things that happen to be true of the world.
The gist of it is that utterances should be true, relevant, and informative. Crucially, we shouldn't just say something we have already said, or something that is obvious. Just as an example: the first sentence someone says is "I have two cats." What does the second sentence say? Saying "I have at least one cat" is true and consistent with the first one, but it's not informative; normal people would not say that. "I don't have a cat" is really bizarre, because it's not even truthful. Something like "one of them is orange" is great: it's informative, truthful, and relevant. That's the kind of sequence of sentences you would find in a corpus. Okay. So this is what we argue in this theorem. The formula itself doesn't really matter; I just thought I should have at least one equation in this talk. The way to read it is that the dollar sign just means the speaker decides to stop speaking after the sentence, whereas xy means the speaker says sentence y after sentence x. The bottom line, the takeaway, is that we think a sentence y is entailed by a sentence x if speakers are as likely to say that sentence after x as they are to just repeat themselves. The idea is that producing a sentence that's entailed by something we've already said is tantamount to just repeating ourselves: we wouldn't be repeating the exact same words, but we would be conveying the same information in different words. That's a waste of everyone's time; we wouldn't do that. So that's another source of information that we think implicitly encodes entailment relations between sentences in corpora. And the crucial thing is that the corpora, the texts, are produced by people. The fact that the corpora are produced by people grounds a lot of the meaning of the words: they're not just internal to the process that generates the words, but connect to the world by virtue of the fact that people, who are connected to the world, produced them in a particular sequence. This will come up again in the reference part. So the grounding here comes not from the language model's experience with the world, but from the fact that the texts were produced by people who, either themselves or through other people, are connected to the world. There's a chain here: the language model is at the end of the chain, and somewhere there's someone who connected the sequence of sentences to a specific goal in the outside world. Okay. The specific theoretical result I showed before, on the last slide, has a lot of limitations that we are actively working on; I'm happy to talk about the more technical parts later. One big practical limitation is that we don't really know whether language models can reliably estimate the very low probabilities we need for that theorem to go through. And in practice, the no-redundancy assumption, the assumption that you wouldn't say something you've already said, is way too strong. If you look at actual expository writing, people often repeat themselves because they want to make sure the listener understands: maybe the first time they didn't quite get it, so you paraphrase yourself to make sure they get it the second time. The most extreme example of this is a mathematical proof: the conclusion is entailed by the premises, so by this standard the entire proof is redundant; everything is already entailed by the premises.
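Since the slide's equation is only described in words here, the following is a rough symbolic rendering of that criterion, a paraphrase based on the reading of the notation given above rather than the exact theorem from the paper:

```latex
% Rough paraphrase of the criterion as described in words above; not the
% paper's exact statement, which rests on further idealizing assumptions.
% Here $\$$ marks the speaker choosing to stop, and $p$ is the cooperative
% speaker's distribution over continuations (what the language model estimates).
\[
  x \models y
  \quad\Longleftrightarrow\quad
  p(\, y \,\$ \mid x \,) \;\approx\; p(\, x \,\$ \mid x \,)
\]
% In words: following x with an entailed sentence y and then stopping is about
% as (un)likely as repeating x itself and then stopping, since both are redundant.
```

The right-hand side is the "just repeating yourself" event, which is why the no-redundancy worry about expository repetition and proofs matters so much for whether this criterion survives contact with real corpora.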
And yet the point of the proof is to convey to the other person the thought process that led you to believe the conclusion. So in a new study we looked at a lot of corpus data and tried to characterize what people actually do; we need a much more complex model of pragmatics to really make this theory work. But the crux of the theory, the takeaway I want you to leave with, is that texts are generated by speakers and follow the pragmatic concerns of those speakers. This is exactly what I just said: text is produced by pragmatic speakers who have listeners in mind, and it may be possible to recover these entailment relationships just from the probabilities that the language model assigns to sequences of sentences. So again: a sentence entailed by something we just said is going to be assigned a probability very similar to just repeating ourselves. By the way, I just repeated myself. Okay, so the last part is back to reference. There are people who think that inferential role semantics, just knowing which sentences entail which, is enough to capture the meaning of language. But most people think that we also need another kind of word-to-world connection, which is reference: the words refer to things in the world. So then there's a question: do the words produced by a language model refer? You would think they obviously do, but actually it's not that clear, and some people think they don't. We can imagine two kinds of situations. In one, there are ants crawling in the sand, and they accidentally spell out the words "Peano proved the incompleteness theorem" in the sand. That's a very low probability event, but it could happen, and some people would argue that that's what language models do: the text only appears to be meaningful but isn't, in fact, meaningful. Now here's another situation: Luke, a logic student, says "Peano proved the incompleteness theorem." Which one of those is more like a language model? A language model is more like the ants in some ways; probably it's not conscious, I don't know, but it's not a human. Luke is a different kind of entity: he's human, he's conscious, and so on. So maybe that's what we need for the words to refer to the world. The intuition that we probably all share is that the ants are not referring to anything: the words the ants produce are just random patterns in the sand that we can impose meaning on, but they do not refer in themselves. In the second case, though, Luke was in fact referring to Peano. The crucial thing to note here, which maybe is not obvious to everyone, is that Peano did not in fact prove the incompleteness theorem; famously, Gödel proved it. So this is a false statement, and yet we are referring to Peano when we say "Peano proved the incompleteness theorem." It's possible that we know almost nothing about him, and yet we would still be referring to him when we say that. Just as, if everything I believe about Shakespeare is false, maybe I think that he was born in the 17th century and that he composed oratorios, I can still say "Shakespeare" and refer to Shakespeare. And obviously I wouldn't be able to visually distinguish Peano from other people; I've never seen Peano, I don't know what he looked like.
And yet, when I say the names of those people, I am in fact still referring to them. I hope you agree with that intuition; I don't know if everyone does, but the intuition most people have is that sensory grounding doesn't seem to matter here. I can refer to people I've never met, whose appearance I don't know, about whom I have no relevant beliefs. This view is called externalism about meaning. Names refer by virtue of a historical chain of use that traces a name back to its first use. I can use the word "Shakespeare" meaningfully because the people I learned about Shakespeare from used it meaningfully, and why did they use it meaningfully? There's a whole chain of generations along which the name and its reference were passed, and supposedly it goes back to some mythical baptism where someone pointed at Shakespeare and said "this is Shakespeare," maybe Shakespeare's mom or something. This argument can also be applied to things that are not names, like kinds, or diseases like arthritis: I don't necessarily know anything about arthritis, but the word still means something, because other people use it in a meaningful way. And clearly the externalist argument about meaning applies to language models just as it applies to humans; there's really no reason to think that it doesn't. So word reference is relatively easy, and we think the idea that a language model's words refer should be taken just as seriously as the idea that my words refer when I talk about someone I have no experience with. Now, there's another question that seems almost identical but is actually much more difficult: can we say that the language model itself refers? There are people who would argue, and it makes a lot of sense, that when I, as a speaker, refer to someone, I have the intention to refer. So we go into a lot of discussion in this paper about what kind of intention is required. It can't just be the intention to refer to a specific person in the world, because we just said that we can refer to someone even if we know nothing about that person; so that's not the relevant kind of intention. It seems to be some sort of general intention to refer to something, or to the world. But it still might be the case that it's weird to ascribe that intention to language models, and that starts to get back to the question of consciousness. Are you willing to use the word "intention" in the interpretationist way, where you ascribe an intention to something if it's a useful way to describe that thing, or do you require that the thing be conscious, that only conscious entities can have intentions? That again goes back to philosophy of mind arguments that are above my pay grade. So why do these things matter? I'm almost done. There are two points of view here that I think are both worth keeping in mind, and that maybe connect to the question of understanding in society, because so far this has been a very linguistics-and-philosophy-of-language talk.
From a philosophical perspective, it might make a lot of sense to ascribe intentions and meaningfulness to language models, because it's really not so clear that we can come up with criteria that distinguish them in many cases, especially if you think about certain humans who don't have the self-reflective abilities that we have, such as three-year-old children; we still tend to ascribe intentions to them and say that their words mean things. So from a philosophical perspective, we want to make sure we're not setting an implausibly high bar for the language models. But there's another perspective to take on this, which I'm not sure is the perspective I would take, but it's very pertinent here, and that is the political perspective. Some people would argue that ascribing mental predicates like understanding, intentions, and beliefs to language models may overstate the capabilities of those models, and may perhaps lead some people to absolve themselves of responsibility for what the language models do, if the models are treated as their own conscious entities. Maybe some people will start advocating for language model rights; maybe language models will have rights equal to human rights. So there are political stakes here that are arguably much more important than the philosophical stuff. I don't really know what to think about it, but I'm bringing it up as a topic for discussion. Just to summarize: the focus of this talk was on behavioral, not phenomenological, understanding. I think that if a language model behaves as if it understands when subjected to difficult enough tests, the kind that we as cognitive scientists and linguists subject it to, then I would be satisfied in saying that it understands in this weaker, behavioral sense. The phenomenological sense I leave to the philosophers. We argued that, in principle, language models could learn entailment patterns without needing additional grounding experience with the world. And we also argued that word reference is relatively easy: we are fine ascribing reference to the words that language models produce and saying that those words are meaningful. Thank you; that's what I wanted to say.

Thank you so much, Tal. How long should we go, about half an hour? I have 138 questions, so we'll see how far we get. All right. Well, thank you so much for a really fascinating talk. You've raised so many issues that I want to ask about, but I want to start with the biggest topic for today, which is understanding. There's a distinction that comes up frequently, especially in the epistemology literature that tries to characterize what understanding is, where people differentiate two kinds of approaches. One approach is that having understanding is about having the right kind of knowledge, or the right kind of representations, and then it's about saying: okay, what kind of knowledge constitutes understanding, or what kind of representations do you need in order to understand? Another approach thinks about understanding as an ability: to understand is to be able to do something. And what I think is interesting is that both of these fall on the behavioral side rather than the phenomenological side, so I take them both to fall within the scope of what you're taking your arguments to bear on.
I'm curious whether you think one of those is a more promising approach for thinking about this question of whether, or how, LLMs understand language. Okay, so just to make sure that I understood the two kinds: one is about knowing things, and the other is about doing things in the world. Yeah, representations or abilities, right? If you thought it was about representations, you might think the right kinds of questions are: let's look under the hood and figure out what kinds of representations the system has. If you thought it was about abilities, you might ask: well, what can it do? Yeah. I mean, those things are going to be related. And you might think that one or the other really constitutes understanding. Yeah, that's a really good question. I think I'm usually drawn to the latter definition of understanding: even if we treated the system as a black box and had no access to its internal representations, we would still expect it to behave in a certain way, in the judgments it gives us in the entailment tests or any other tests. If we have a conversation with the system, we expect it to behave in a coherent way that reflects understanding, and that would be enough for me. I'm definitely interested in seeing what happens under the hood, but for me, if a representation cannot be inferred from behavior, then I'm not sure I'm interested. Because often, when we look inside these systems, we find a lot of competing representations. In the inference example, in those simple cases where we change the word order, where the system thinks that "the lawyer saw the doctor" and "the doctor saw the lawyer" kind of mean the same thing, you can often find, inside the system, some internal representation that does distinguish those two meanings. But the model isn't using it. It's interesting, I just had a conversation with a student about this today; it's interesting when that happens, but it looks like sometimes these models construct all the possible representations you can imagine constructing and then use one of them, more or less arbitrarily. So from my perspective, that doesn't reflect understanding. It's interesting you raise that, because something that comes up in the context of education is what people sometimes call inert knowledge, right? You often confront this when you know that a student did in fact learn something and will produce it in one particular context, but in other contexts they fail to produce it in the right way. So yeah, I'm curious whether you've thought about that. One idea that comes up in a lot of different accounts of understanding is that having understanding is about having things connected to each other in the right way. So do you think that in these cases, where maybe there's a relevant representation but it's not being used, part of what's going on is a failure of understanding, because it wasn't connected appropriately to the other things related to it? Yeah, that's interesting. I mean, I think the inferential semantics we're talking about is very related to the idea of figuring out how things are connected to each other.
Knowing that red is a color, knowing which kinds of objects are red: all those things are what it means to understand the word red. But the second part of it, yeah, I don't know; I'll have to think about it. Yeah, and I was going to say there's another link, I think, to pedagogy, that I was thinking about in the context you talked about. You talked about this Clever Hans problem, where you might have a system that's right but for the wrong reasons. And part of what you developed are these tools for trying to figure out not only whether something gets the right answers, but whether it gets them for the right reasons. That strikes me as something we face in thinking about how to assess a student's understanding, or a child's understanding. I think a lot of what clever assessments do is try to figure out the right generalization question you can ask somebody, the one that will help you see that they're not just getting it right in these cases for the wrong reason, for example because they memorized that example, or used an alternative strategy that is effective in that case but not in general. So I was curious to what extent, in your approach to designing these studies, you were thinking about these sorts of strategies for assessing understanding in the human case. And then, on the flip side, do you think that some of what you've done here will be usefully exportable to other kinds of contexts where we're trying to assess human understanding? Yeah, the back story of the study was that we wanted to see if models made the same mistakes in understanding as humans do. So in some pretty complex sentences, this kind of sentence called a garden path sentence, you read the first few words and think it means one thing, and then realize it means another thing. An example of that is "When Mary bathed the baby cried." People read it and think that Mary bathed the baby, and then realize that, no, actually Mary bathed herself and the baby cried. People sometimes misunderstand that sentence: you ask them "did Mary bathe the baby?" and they will say yes, even though that's not what the sentence means. So we wanted to look at these complex sentences and see if the errors are similar between humans and models. And then we realized the models kept making a lot of very basic mistakes, and we kept making the sentences simpler and simpler, and the models were still extremely bad at them. So we ended up with those very simple sentences, like "the lawyer saw the judge" versus "the judge saw the lawyer." But we do also have the garden path sentences as part of that dataset. In this case the model struggles with the difficult sentences, but also with the simple ones that humans find trivial, so using it as a cognitive model would not be fair. Makes sense. I wanted to ask you about the middle part of your talk, where you were talking about learning meaning from form. In particular, you have, I think, a really interesting idea: because human language use follows these Gricean norms, we can actually use natural language to extract the kinds of inferential relationships that are going to be crucial for inferential role semantics. That seems really cool and powerful. I guess I want to ask you about some possible limitations.
The sorts of things I was thinking about are: what kinds of information about inferential role are not going to be contained within language, even granting that it follows Gricean norms? Here are two kinds of examples I thought of; I'm curious whether you've thought about them and what you'd say. One is that it seems very crucial to human understanding that inferential roles don't just occur within language and linguistic representations. In fact, it seems like some of the cases where we get the most understanding come from linking different kinds of representations to each other. An example I like: when I first learned the Pythagorean theorem, I'm pretty sure I was shown an algebraic proof of it. And then maybe a couple of years later I was shown a diagram that allows you to just geometrically, visually understand it in a different way. That's an interesting case, because I didn't learn anything new in the sense that I already knew it was true; I could give a proof of it. Yet getting a distinct proof, in this case a geometric proof, seems to give me a new sense of understanding. At least one way to understand what's going on is that you have something like a more algebraic representation and something like a more geometric representation; each has its own internal inferential roles, but there's something extra you get when you can relate them to each other in a systematic way. And one thing that's going to be very different is that, even if I grant that the LLMs have inferential roles within a certain kind of linguistic representation, humans have these other modalities of representation, and a lot of what happens in the human case, the inferential roles and the understanding that comes with them, is cross-modal. So I'm curious what you think: does that seem right to you, and is that going to be a limitation on the kind of understanding that the LLMs can achieve? Okay, I have two answers to this. Going back to this almost behaviorist view that I have: the question is whether that link between the two sets of representations has any behavioral consequences. Does it make generalization to certain cases possible, or better? Probably, yeah. So I think I agree with that position. The other side of it is whether that is the only way to get there. There's a question of efficient learning versus what is learnable in principle, and I think that language models can often learn things about the world in a very roundabout way that we, as people who have sensory experience or mental imagery, can learn very quickly; those are really useful shortcuts, potentially. But you see that language models learn quite surprising things about the visual world. They are pretty good at knowing which colors are similar to each other, directions, how south and north are related spatially, and so on. They're not learning it in the most efficient way, but there's a surprising amount of evidence in language for all of those things. If you train them on enough words, trillions of words, eventually they will learn a lot of those things, even ones that are not often verbalized. And I think you might end up finding representations that are isomorphic to those geometric representations.
Eventually, if the model were able to do those tasks in a systematic way, my guess is that somewhere inside you could find representations that are basically geometric. Yeah, that is interesting. I was going to say, another case I was thinking about, where maybe the relevant inferential structure is at least not so obviously contained in language, but I'm curious what you think, is something where there's an unstated inference, and part of the communicative act is that it's kind of crucial that it stay unstated. The sort of thing I have in mind is an implied threat. Say we're in some sort of hostile negotiation and I say to you: "that's a really nice water bottle; it would be terrible if something happened to it." Now, there's an inference there that I intend for you to get, for you as a human to be able to draw, but it's unstated. And it's at least not obvious to me that that's going to be extractable merely from human-generated speech. So I guess I'm curious if you could say, more generally, what the limits are to the kind of inferential information that can be extracted. You talked about how Gricean norms seem really compellingly reflected in the kinds of data you have, but are there going to be other features of pragmatics and communication that you don't expect to be able to extract? I don't know anymore. If you had asked me a long time ago, about ten years ago, I'd have said you probably couldn't extract that from a corpus. But as I said, we've seen that so many things can be extracted from text. We do have work on presuppositions, which are a wonkier linguistic parallel to what you're describing, a kind of unstated assumption. When you say "the bottle is on the table," there's an unstated assumption that there's just one bottle, and it would be weird to say "the bottle is on the table" when there are three bottles on the table. But no one ever says "there is exactly one bottle, and that bottle is on the table"; we just infer that. That sort of presupposition, language models can learn pretty well, even though it goes unstated. How they learn it is interesting; we can speculate. But I guess there will be contexts where the implied threat would actually be interpreted as a threat, and the person who was threatened would respond as if it were a threat, and then I can infer from what they said that they have just been threatened. That would be, yes, yeah. All right, so I want to ask you a question about territory that I think you wanted to avoid, getting into the philosophy, but this is relevant to the part of your talk that had to do with reference. And I thought it might be helpful to have a very concrete case here. Suppose that I say "bats have two wings." I think we want to evaluate that statement as true if by "bats" I mean the mammals, and false if by "bats" I mean the tools used in sports games, right? Now, what is it that makes it the case that the sentence is true or false, that it should be understood as having one reference versus the other? I think a very natural idea in the human case is that it has something to do with my communicative intention, and you don't have to think it's the kind of intention that involves phenomenology and consciousness and so on.
You could just think it's something about intending to relate my utterance to one particular historical chain of reference versus a different historical chain of reference. I'm curious whether you've thought about cases like that in the LLM case. So the LLM produces "bats have two wings." If you think LLMs refer, or that the words have reference, it seems like we ought to be able to say something about whether that sentence is true or false. What do you think the similarities or differences are going to be in the way we tell that story for the LLM case? I mean, certainly in the LLM case there's something you can say about whether it's talking about mammals or sporting equipment, right? But how does that relate to the story about reference? I think that in the human case, the sequence of words can refer to both things; the speaker can refer to only one of them. So if the language model produced that sequence, it would just be an ambiguous sequence of words, and whether the language model intended to refer to the animal depends on whether it has intentions; that's the stronger case. I would infer that it had those intentions from whatever it said before and after "bats have two wings." If it said something consistent with thinking that baseball bats have wings, then I guess I would think that it referred to baseball bats. But I don't have anything else to say. All right, but it sounds like there might really be no fact of the matter about what the large language model is referring to, at least unless you develop some story about what it means for it to have an intention. Yeah, well, that's the distinction between word reference and speaker reference. I think the words in isolation would refer to both. Would you want to say that in the human case as well? Oh, I see what you mean; so the word reference could inherit the speaker reference. Yes, in the human case I think that's natural, right? If I say "bats have two wings," it seems pretty natural to say there's a fact of the matter about the reference of "bats." Yeah, I agree. All right, in the interest of time, I'm going to ask you just one more question, so that we can then see what people in the audience are thinking about. I love the way you framed your question for the talk, which had to do with whether or not large language models understand us. And I'm curious whether you think we can understand something about how we understand each other by working out whether or not LLMs understand us. In other words, are there lessons we can draw for the psychology and cognitive science questions about interpersonal understanding, human cases of understanding, by confronting this case of a sort of alien intelligence trying to understand us, and us trying to understand it? Okay, that's a question. I think, from my experience of being impressed and then wildly disappointed with language models, repeatedly, I've become very suspicious of whether someone really understands; basically, I guess it makes me want to make extra sure that I'm precise about what I say. More generally, I guess the other side of that, maybe, is that it changes the way I think about humans: it's easy to appear as if you have things like theory of mind, but then if you probe further, you realize that the model does not.
Theory of mind is basically having hypotheses in your head about what goes on in other people; sorry, I'm probably butchering this, but that's what I mean by it. So it looks like there are all of these shortcuts that models use, and maybe people do too. Sorry, I don't mean to be so pessimistic about human communication; I guess it has made me a little more skeptical. Psychologists are also pretty skeptical; we want to make sure. But I think often people understand just enough to get by, which is probably a much more modest target than what we think we achieve most of the time. Yeah. All right, great. Well, I'd love to know what the audience is thinking about, so I think we can open it up to general Q&A. Do we need the mics? You guys are so good at your job, popping up with your mics for the Zoom recording. Thanks for the great talk. In the earlier BERT evaluation, you had a frozen, finalized model. In the current era of closed-source large language models that are constantly evolving, can you reach any final conclusions, even if you had a perfect benchmark, or do you have to do it on an ongoing basis: every six months we test it and see if it still holds? Okay. I had to put the GPT-4 results out there because someone was going to ask about GPT-4, but I try to avoid working with models like that, because it's often unclear what we can learn from them. We don't know how they were trained, and it's possible, for example, that they had the entire dataset in the training data. Actually, I'd be surprised if they didn't; our dataset is out there on the internet, so they probably do have it in the training data, and still they make a lot of mistakes, which is interesting. But anyway, we don't know what happens under the hood with OpenAI models. It's possible that there's a guy in Kenya who looks at the query and decides whether the first sentence entails the second one; probably not, but who knows? That's why it's very hard to do science with those models. Fortunately, we have more and more big models that are openly available. I think we had a fairly narrow window, maybe a year, where the best models were the closed models. I mean, they might still be a little bit better than the best open models we have, but I think the gap is shrinking. So, as you said, for reproducibility reasons it's best to use models that are fixed artifacts you can download, so that another person can also download, say, the Llama version from June 2023 and reproduce your experiment. Hello, thank you for the great talk. I wanted to get a bit more out of you about this idea of looking at behavior versus representations in trying to assess understanding, specifically in the case of understanding people. And I also had a concrete example in mind. I'm pretty extroverted, and let's say a friend of mine assumes that I'd like to go to a club, and that's in fact false: I hate clubs, for various reasons. But it's very true that if you had the right kind of model of me as a person, you would make that inference. So technically your performance is off, you're making a mistake, but you're making a mistake for the right kind of reason, and in that case I would want to attribute understanding of me to my friend. But you could never look at their performance and get at that. Maybe across a lot of inferences you could.
But anyway, it seemed to me from that example that there's something important about having the right kind of representation or process when assessing understanding. So I'm just curious to hear your thoughts.

Yeah. I think that buried in there was the key point, which is that you have to test it systematically, and eventually the pattern across all the inferences will be consistent with that representation. It's hard to know what to conclude from a single example, but if you tried all the possible cases, eventually the pattern that emerged would be the one that's only consistent with having the right representations. How do we know that humans have the right representations? I guess that's the other side of this question. We have the phenomenological experience of having the right representations, but who knows whether we're right about that. I think the real evidence we have for representations is human behavior. Does that answer your question?

Sure. There's so much more that could be said about the nature of understanding.

Over here. Thanks so much. A really illuminating talk, and it raises so many key issues. I'm intrigued by the ant and Gödel, in the sense that under any rational, believable scenario, the ant does not understand that Peano or Gödel or anybody else devised the incompleteness theorem. Ants do not understand that. From what you've laid out, what the ant does is simply randomized; I don't even know if it's probabilistic. If it's completely randomized, then it's the old Shakespeare-typewriter story: if some animal typed for a trillion years, all the plays would eventually be produced. That's a classic example from sixty or seventy years ago. And it highlights the problem: if the LLM is based not on something probabilistic but on something simply randomized that, given enough time, turns out some kind of sentence, then I don't see where there is understanding. With Gödel there is clearly understanding, I think we would agree, but I don't think there is with the ant. And is the ant analogous to the LLM? That's the question.

Yeah. So I think that here, again, you have to look at the systematic patterns, and if once in a trillion years you produce a sentence that happens to look meaningful, and all the other sentences you produce are not meaningful, then it's a case of right for the wrong reasons, and I wouldn't ascribe meaning even to the one sentence that appears to be meaningful. It needs to be a pattern. So the ants most of the time would just trace random shapes, and then once in a million years they trace something that looks like an English sentence, but it's not an English sentence to me. The other part of it is that in this part of the talk I was not defending the idea that language models understand what they say, just that the words they produce refer to the outside world. That's a weaker claim. I would be willing to defend the other claim as well, but it's harder. Okay.

Hi. Thank you for the talk. I wanted to ask a bit about the theory of mind stuff, which I know you were referencing before as well. For something like an LLM, the training dataset is usually just going to be whatever text you find online. Compare that to people, especially students: when you're learning something, you don't just read. One of the biggest ways you learn is to do practice sets, and in a sense I feel like that's a person acting as an adversarial network to themselves.
But when training LLMs, I haven't really heard of anything similar to that. There is one instance, I think, in coding: with this new coding tool someone made a month ago, they put a loop on the outside, so you generate the code, you run the actual code, it comes back with "it's not working," and then the model is told, okay, generate the code again. But that's really externalized; you're already using the LLM as pre-trained and everything. So I wanted to ask whether there has been any work looking independently into each of the connections and how they work.

Okay, each of the connections... can you repeat what the connections are?

Oh, yeah, sorry. I mean the connections within, I guess, the attention units or something, whether they're adversarial, so that they can distill the information down in a way that references something you would understand, meaning-wise.

You're asking whether there are internal representations that correspond to concepts that we understand as humans, or...?

I guess so. Because, for example, when you're running a test, one time you say A does B, and another time you say C does D. If the test is saying that these two are basically the same thing concept-wise, and you assume A does B but then get "C does D" wrong, then from that you're able to learn that C does D is the same as A does B, and that generalizes to anything where C stands in for A and D stands in for B. That way you can create an actual policy instead of just a representation.

I think we might have to talk about this afterwards; maybe stick around, I haven't fully got it. Okay.

Thanks, Tal, interesting talk. I have a question about the following, which may be awkward: the idea that a language model could understand something and be bad at it, or, conversely, be much better at it than a human and simultaneously not understand it. Abstracting away from language, think about something like a chess engine. We might say, well, chess engines don't really understand the game of chess, but they're really good at it, right? Whereas a human might really understand the game of chess but be very poor at it. Obviously I understand why you're talking about language, but is there something special about language such that we say, yeah, they get this lawyer-and-actor business wrong, and that's an example of their ability being low, and therefore their understanding is low? Because in other aspects of human-computer interaction, that's not a requirement we impose.

Okay. Yeah. All right. So what does it mean to say that a person understands chess? What are the criteria for saying that someone understands chess? So the person you have in mind is someone who knows the rules of the game but not the clever strategies for winning. Yeah. I don't think I'm opposed to saying that the chess engine understands chess in the behavioral sense that I described here.
I know it's a bit counterintuitive, but I think it's good to think about examples like that, because it's more intuitive for us to ascribe understanding to something like ChatGPT than to a chess engine. Yeah, I think I'm willing to bite that bullet.

We also have a reception, so we'll have time to continue there; we just want to take a couple more questions.

Hi, Tal. Thank you for the talk. I just wanted to probe more on whether you think you can have understanding, language understanding I guess, without a theory of mind, to go back to your earlier theme. I would think that pragmatic skills and so on are actually impossible without being able to represent someone else. You talked about when you would actually repeat something someone said: it really depends on what you know about the other person; you have to represent what knowledge they actually have. Similarly for entailment, you would sometimes state the entailment, and it only makes sense if the other person might not already know it. And I guess I have a strong feeling that you can't have anything close to language understanding without being able to represent other people. A deeper question is that it sort of depends on your theory of language. If you think that language is fundamentally just about representing things, referring to things, maybe to yourself, and linking representations, that's one story. But if your theory of language is that it's fundamentally about communication, about conveying information to other people, then it seems like you need the other-people part of it.

Yeah, I agree with that. And I think that to generate language in the believable, coherent, helpful way that ChatGPT does, you need to have an implicit theory of the listener, and the way you get that is by having an implicit theory of the speaker as well. I think the language model would not be able to assign the right probabilities to sequences of sentences in the training set if it didn't acquire an implicit theory of what speakers tend to say. So the way it does it, I think, is that it inhabits a persona, not a person exactly, but it has some representation of what someone might say, and then it produces sentences in accordance with that internal representation of an agent. Whether you want to say that the language model itself is an agent is a more philosophical question, but I think that in order to be effective it has to have some sort of general representation of an agent. That goes back to the behaviorist point again: if the behavior is consistent with simulating an agent, then it's probably simulating an agent.

Over here. Thank you, great talk. It's really fascinating that you showed how much we can learn about language from the way language is used, with the entailments and the groundings and so on. I'm wondering how we push the limits of what we can learn from language use. For example, sometimes I feel really understood when there are some vague feelings in me and I hear a very precise description of them: the moment I hear the description, I realize it's a really precise description of what I had in my mind, even though I was not able to generate that description myself. You can find this in literature and poetry, and sometimes in good philosophy papers and things like that.
A language model trained only on the linguistic data I have ever produced or listened to in my life would not be able to generate that description, because I struggle to generate it myself. But there's the possibility that when I see it, I immediately recognize it as a very precise description. Do you think it's safe to generalize from that intuition and say that if you throw all of human linguistic data at a large language model, it will do very well at whatever can be learned from those regularities, but can it go beyond that and give us new insights into the way we think about things or say things?

I'm not sure I followed the last sentence. Well, okay. It's true that there are certain things we don't talk about and that are sometimes hard to verbalize. But as long as they're not impossible to verbalize, eventually they'll turn up in the corpus; that's why having a very large training corpus is helpful. The second part, whether the language model could be truly creative, I haven't thought about enough. I would say that it would probably be as creative... no, I don't know. I need to think about it more. That's a good question. I think it starts to go back to the point of whether only a conscious being can be creative, and whether everything that's not conscious is just mixing together pieces it has already seen before. I don't know. Yeah. That's a great question.

Thank you. Let's thank our speaker. And can we also please thank Aaron and Jane and all the CDH staff who made all of these events possible.