
Episode 209: Artificial Intelligence—How Does It Work?

Portrait of Fernanda Viégas
Photo by Tony Rinaldo

Episode 209: Artificial Intelligence—How Does It Work? with Fernanda Viégas

Click on the audio player above to listen to the episode or follow BornCurious on Amazon Music, Apple, Audible, Spotify, and YouTube.

On This Episode

These days, it seems everyone is talking about artificial intelligence and machine learning—think ChatGPT. But how do these work, and where do they fall short? In this week’s episode, we do a deep dive on these tools with Fernanda Viégas, whose work in academia and industry focuses on people-centered machine learning.

This episode was recorded on February 29, 2024.
Released on May 2, 2024.

The conversation continues in Episode 210.

Guest

Fernanda Viégas is the Sally Starling Seaver Professor at Harvard Radcliffe Institute, a Gordon McKay Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences, and an affiliate with Harvard Business School. With her longtime collaborator, Martin Wattenberg, she coleads Google’s People + AI Research (PAIR) initiative, which advances the research and design of people-centric AI systems.

Related Content

Fernanda Viégas: Fellowship Biography

Fellow’s Talk: What’s Inside a Generative Artificial-Intelligence Model? And Why Should We Care?

People + AI Research

Credits

Ivelisse Estrada is your cohost and the editorial manager at Harvard Radcliffe Institute (HRI), where she edits Radcliffe Magazine.

Kevin Grady is the multimedia producer at HRI.

Alan Catello Grazioso is the executive producer of BornCurious and the senior multimedia manager at HRI.

Jeff Hayash is a freelance sound engineer and recordist.

Heather Min is your cohost and the senior manager of digital strategy at HRI.

Anna Soong is the production assistant at HRI.

Mahbuba Sumiya is a multimedia intern at HRI and a Harvard College student.

Transcript

Ivelisse Estrada:
Hello, welcome back to BornCurious, coming to you from Harvard Radcliffe Institute, one of the world’s leading centers for interdisciplinary exploration. I’m your cohost, Ivelisse Estrada.

Heather Min:
And I am your cohost, Heather Min.

Ivelisse Estrada:
This podcast is, like its home, about unbounded curiosity.

Heather Min:
And our guess is that with seemingly endless talk in the media about the promise and perils of artificial intelligence, you are curious to learn more about it. Well, on today’s episode, we are talking to Fernanda Viégas, a computational designer and data visualizer who has worked on machine learning at Google.

Ivelisse Estrada:
Welcome to BornCurious. Why don’t you introduce yourself for our listeners and tell us a little bit about your professional background and your work.

Fernanda Viégas:
Hi, I’m Fernanda Viégas. I am a professor at Harvard and a principal scientist at Google. I’m the Sally Starling Seaver Professor at Harvard Radcliffe Institute and the Gordon McKay Professor of Computer Science at SEAS. And I do data visualization, AI interpretability—there’s a lot of different kinds of research I’m interested in.

Heather Min:
Are you a computer scientist?

Fernanda Viégas:
I am a computer scientist, indeed. I’m also a designer. My background actually is graphic design and art history—it has nothing to do with computation—and so it’s kind of crazy that I do what I do today. So my bridge between the world of graphic design and art history and the kind of research I do today in computer science was going to the MIT Media Lab and combining my background in graphic design with computation, and this is where I actually, for the first time, heard about something called data visualization.

So, this is when I learned that you could turn data—numbers, measurements—you could turn those things into beautiful images on a computer, and I became fascinated by that, and that takes me all the way to Harvard today.

Heather Min:
Let’s dive into the topic at hand. How do you define artificial intelligence?

Fernanda Viégas:
So artificial intelligence is interesting as a term because everybody’s using it these days, but if you think about it very technically, it does not have a super technical definition. Artificial intelligence, or AI, tends to be a group of different kinds of techniques and approaches that we can take in computer science to build systems that we would like to think have some sort of intelligence. And by intelligence I mean they have rules that they abide by, or they can understand something about the information they’re gathering. Now, artificial intelligence is a large field, and it has existed for decades—even before the whole craze about machine learning and all the chatbots that we see today.

Even before any of that, there were artificial intelligence systems where, for instance, people would have rule-based systems or expert systems, and those would be systems, for instance, that you would hope a doctor could use. And so you would give a bunch of rules, and the system would know how to follow or not follow those rules specifically for that domain expertise. And so that has existed for quite a while. Symbolic systems as well. What’s new in the last decade or so is this explosion of something called machine learning systems. So when you think about machine learning systems, those do have a very technical definition, and so it’s much more concrete, in a sense, than AI.

And these became really powerful, I’d say, in the last decade or so. And with machine learning—there, I think, we can start to understand very concretely what the difference is between a regular software system and a machine learning system. The way I like to think about it is in traditional software engineering, what you do when you write a program is you give all the rules to the computer. You kind of describe the little world that this program is going to live in. You describe all the scenarios the program is likely to encounter. And you say, “Okay, if you see scenario A, you’re going to do 1, 2, 3. If you see scenario B, you’re going to do 9, 4, 6.” Whatever it is. In machine learning, the really powerful thing is the fact that you do not give rules. In fact, you do not tell the system how to solve a problem. All you give the system are examples of things it needs to ingest, of information. And so let’s think about something like an image classification system. I just want to point the system to a photo of an animal, and I want that system to tell me what kind of animal that is. That’s an image classification system.
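
To make the contrast concrete, here is a minimal sketch, not from the episode: the two-pixel “images,” the hand-coded rule, and the nearest-centroid “learner” are all invented for illustration, but they show the difference between writing the rules yourself and letting a system infer them from labeled examples.

```python
# A toy illustration (not from the episode) of "rules vs. examples."
# The two-pixel "images" and the nearest-centroid "learner" are invented
# purely to make the contrast concrete.
import numpy as np

# Traditional programming: we write the rule ourselves.
def rule_based_classifier(image):
    # Hand-coded rule: call it a "cat" if the first pixel is brighter.
    return "cat" if image[0] > image[1] else "dog"

# Machine learning: we give labeled examples and let the system find structure.
training_images = np.array([[0.9, 0.2], [0.8, 0.1], [0.2, 0.9], [0.1, 0.7]])
training_labels = ["cat", "cat", "dog", "dog"]  # human-provided "ground truth"

# "Training": compute the average (centroid) of each class's examples.
centroids = {
    label: training_images[[l == label for l in training_labels]].mean(axis=0)
    for label in set(training_labels)
}

def learned_classifier(image):
    # Predict the class whose centroid is closest to this image.
    return min(centroids, key=lambda label: np.linalg.norm(image - centroids[label]))

# Test on an "image" neither function has seen before.
print(rule_based_classifier(np.array([0.85, 0.15])))  # rule we wrote: "cat"
print(learned_classifier(np.array([0.85, 0.15])))     # rule it inferred: "cat"
```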

Ivelisse Estrada:
Is this what is being used then when you do reverse image search in Google?

Fernanda Viégas:
Sometimes it can be. Yes, yes, yes. It could be very much. And just going back to that fundamental notion of you’re not telling the computer what to do: the computer is going to figure out what to do and it’s—

Heather Min:
Because you’re teaching the computer to sort out perhaps the underlying common features or principles when, for example, you’re saying, “This is an image of a cat,” or rather, “I’m giving you a lot of images, you figure—”

Fernanda Viégas:
Yes.

Heather Min:
And I say, “These are all examples of a cat.”

Fernanda Viégas:
Yes.

Heather Min:
But you didn’t say it’s got a backbone. It’s got fur or not. It’s got eyes.

Fernanda Viégas:
Exactly. I’m not describing it, but you touched on two very important principles. One is that you have training data. So when you say, “I give it a lot of images of a cat,” two things are important. One is a lot of images, but the second thing you said that’s super important is that there is something called “ground truth.” So a human has gone through every one of these images and said, “There is a cat here. This is a cat; this is a cat.” And then another human might have gone through a bunch of other images and said, “This is a dog, a dog, a dog. This is a horse, horse, horse. This is a zebra.”

So the system has trained on these labeled examples, and so the system has been exposed to things that are called cat and dog. And again, we’re not telling the system that a horse is taller than a dog or that a cat has pointy ears or that it has whiskers. We’re just showing the system these images. Eventually, after we’ve shown this system so many images, we will test it, and we will say, “Here’s an image you’ve never seen before, system. What is this an image of?” And when the system says, “Oh, this is a dog,” it’s like, “Oh yeah, this is right.” Or if the system guesses wrong, we say, “Oh, not so good, let’s try again.” So you have a way of retraining the system so it gets better and better at that one task.

Heather Min:
So, you haven’t defined the fundamental principles of a dog or a horse or a cat.

Fernanda Viégas:
Exactly.

Heather Min:
However, somebody has coded, or a team of people have coded, the things that you need to focus on in order to extrapolate the common features of all of these images or other pieces of data that I’m providing you.

Fernanda Viégas:
So, the coding part is literally what we call an objective function, which is literally the goal you’re giving the system. All you’re saying to the system is, in this case, let’s say again I have a simple image classification model. Let’s say I have 10 bins, 10 different kinds of animals—dog, cat, horse, zebra, lion, whatever. I have 10. And I say, “System, your world is made up of these 10 bins. No matter what image I give you, you will have to classify this as one of these things.” Again, I never said what these things are. The system has no real understanding of what a cat or a dog is, but it has an amazingly sophisticated understanding of pixels.

So what it does is as it is ingesting these images, these pairs—it’s always a pair of an image and a label—it is analyzing these pixels, and it’s saying, “Oh, okay, this sort of pixel arrangement, I’m going to cluster here and I see that it keeps getting this label ‘cat.’ This other arrangement of pixels seems to hover around this label here ‘dog.’” You know when sometimes we do brainstorming exercises, and we each get a Post-it note, and we write something. And then there is this moment when we are done with the Post-it notes and we need to cluster the Post-it notes. And we’re like, “Oh yeah, this is this topic over here.”

We cluster, and we put it on the board or something. You always end up with certain Post-its that fit either multiple clusters or they don’t fit any cluster. In a sense, this is what these systems are doing. They are clustering all these pieces of information based on similarity. And then they are putting this on a board, except that our boards tend to be 2D. Their boards are millions of D. And so they’re highly, highly multidimensional, and this is how they’re going to organize the space, and this is how they are going to kind of digest the information they have.
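
As a rough illustration of the “objective function over 10 bins” and the pixels-become-numbers idea, here is a minimal, hypothetical PyTorch sketch; the tiny linear model, the image size, and the single training pair are invented stand-ins, not the systems described here.

```python
# A minimal, hypothetical sketch of the "10 bins" objective described above.
# The model, image size, and labels are invented for illustration only.
import torch
import torch.nn as nn

NUM_CLASSES = 10                      # dog, cat, horse, zebra, lion, ...
IMAGE_PIXELS = 32 * 32 * 3            # a small color image, flattened into a vector

# The model never sees "a cat"; it only sees a long vector of pixel values.
model = nn.Linear(IMAGE_PIXELS, NUM_CLASSES)

# One (image, label) training pair: the label is the human-provided ground truth.
image = torch.rand(1, IMAGE_PIXELS)   # the pixels, as a vector of numbers
label = torch.tensor([3])             # "this one belongs in bin 3 (say, 'zebra')"

objective = nn.CrossEntropyLoss()     # the "goal" we give the system
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):
    logits = model(image)             # scores over the 10 bins
    loss = objective(logits, label)   # how far the scores are from the ground truth
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                  # nudge the weights toward the goal

prediction = model(image).argmax(dim=1).item()
print(f"predicted bin: {prediction}")  # after training, this matches the label
```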

Heather Min:
We as human beings, not only do we take in information through the senses and through our minds, there is of course ethics, values, so that we can prioritize and also know right and wrong. So artificial intelligence, does it also factor in those values?

Fernanda Viégas:
It’s a great question. So it is starting to try to factor in values. One of the things to keep in mind is that the world of these AI systems, it’s a very, very mathematical world. So basically, when I go back to this example, this very, very simple example of cats versus dogs: you and I see an image, a 2D image, we see if the cat is cute. All the system sees is literally a linear sequence of colored dots, of colored pixels, that’s all it sees, and that’s how it starts. And so it turns that into a huge collection of numbers, which is called a vector, and this is how it starts to cluster things.

So, to your point, how do we encode values in those kinds of highly, highly mathematical spaces? It’s a very active area of research. Maybe one of the things we can start talking about now is one of the newest waves of AI that we’re all contending with right now, which is generative AI. So you have these large language models, which are incredibly powerful.

Heather Min:
What do you mean by large language model?

Fernanda Viégas:
These are models that were trained on massive amounts of text, and so you can imagine everything on the web and all of the books that have ever been scanned by Google Books and all of the news coverage that is available on the web. And basically what these models are trained to do in a very simple way is to kind of guess what the next word or the next token would be in a sequence of words or tokens. And so you can imagine that after this model has again ingested all of human literature—not all of it, but a big portion of it—it has seen so many sentences in English and so many sentences in Portuguese and so many sentences… Let’s imagine I have a sequence of words that says, “To be or not to be, that is the ‘blank.’” The system may look at that sequence of words and say, “Oh, the highest-probability next word here is ‘question.’ To be or not to be, that is the question.”

Heather Min:
And that is based on how many times it saw it. So it’s not being unique or novel in how it’s using words or language, but it’s more that it saw “To be or not to be, that is the question” more often than any other way of ending that sentence.

Fernanda Viégas:
Exactly. So it has a higher probability for the word “question,” but then it has a whole list of other options. It just so happens that “question” is the next token with the highest probability. So it says, “Okay, to be or not to be, that is the question.” End of story.
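
A toy way to see “pick the highest-probability next word” in code: the sketch below counts which word follows each three-word context in a tiny invented corpus. Real language models use neural networks over tokens rather than raw counts, but the idea of ranking continuations by probability is the same.

```python
# A toy next-word predictor built from counts over a tiny invented corpus.
# Real language models are neural networks over tokens, not lookup tables,
# but both rank possible continuations by probability.
from collections import Counter, defaultdict

corpus = [
    "to be or not to be that is the question",
    "to be or not to be that is the question indeed",
    "that is the point she made",
]

# Count which word follows each three-word context.
next_word_counts = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for i in range(len(words) - 3):
        context = tuple(words[i:i + 3])
        next_word_counts[context][words[i + 3]] += 1

def predict_next(context_words):
    counts = next_word_counts[tuple(context_words)]
    total = sum(counts.values())
    # Return every option seen in training, ranked by probability.
    return [(word, count / total) for word, count in counts.most_common()]

print(predict_next(["that", "is", "the"]))
# e.g. [('question', 0.67), ('point', 0.33)]: "question" wins, but the
# model keeps a whole list of lower-probability options as well.
```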

Ivelisse Estrada:
This is how our phones work, right? Predictive text.

Fernanda Viégas:
Yes, yes. As I am texting “Hi, how”—oh, “are you” is probably the next couple of tokens on that, right? One thing that we should just say, I think, even maybe before all of this: one of the things that is both a very powerful feature of these systems and an equally challenging feature of these systems is the fact that, one, we don’t give the systems the rules, and two, we don’t know exactly what they are learning. So let’s unpack that for a moment. Why is that powerful? That is powerful because, again, let me go back to the little example of the cat versus dog.

If you think about it, you and I, all of us, we are able to recognize animals all the time. In fact, we are great at recognizing not only animals. We’re amazing at recognizing human faces. If I ask you—oh, you just saw a friend today. You were walking out on the street, and you saw a friend, and they recognized you: “Hi, how are you?” What did you do? How did you do that? Write down for me all the rules, everything that happened that made you recognize that face. It will be really hard. Of course there are things you’re going to say: “Well, I looked at the eyes. This friend always dresses in a certain way. I looked at the hair.”

This is great, but exactly how did your brain come to that conclusion? We don’t know how to describe that, right? And there are many things like these that we do that we take for granted because we just do it. It’s a snap thing. That we don’t know what the rules are that we are using. So it gives me a lot of power when I don’t have to tell the computer every single step in a problem-solving situation. So when I don’t have to give the rules to a computer, and I can just ask a computer, “Figure it out. Learn. Here.” That starts to unlock a lot of things that we know how to do.

All the way to really big scientific questions that we wrangle with and don’t know how to answer either. What’s a better way of diagnosing cancer? What is a better way of forecasting earthquakes? What is a better way of folding proteins? There are many things that we don’t know how to solve, and if these systems can help us with them, and we didn’t have to teach them everything or give them the rules, that is great. So this is all the positives, right? The not-so-great piece about this is exactly the reverse of this coin, which is, “Oh, wait a second. We didn’t give the rules.”

So what exactly are these systems learning and glimpsing about the world? I was never able to figure out if this is an anecdote that everybody talks about in AI or if this actually happened, but it is absolutely possible to have a scenario like the following. There was a military system, another image classification system, that was being trained to identify war tanks. And it was doing really well in the lab. They were training it, it was doing really well, passed a bunch of tests. “Yay, let’s deploy it out in the real world.” It did incredibly poorly. And they’re like, “Wow, wait, the tanks haven’t changed. We know the system can identify tanks. What’s going on?”

When they looked at their training data, they realized that for some reason a lot of the images they had used to train the system were stock photography images. They were images where, by coincidence, all of these tanks were shown in beautiful sunny conditions and out in the field the tanks were in rainy conditions. The background was very different. So what the system had learned was to identify the sky, not the tanks. And so this is the kind of challenging situation you find yourself in, which is it seems like it’s working, but then it doesn’t work.

And it’s like, “But wait a second. What did it learn? Did it learn what I wanted it to learn, or did it learn something completely separate that is completely uncorrelated?” Going back to our language model and bringing it back to your question about values, for instance: one of the ways you very quickly get into complicated waters with these models is in a scenario like “To be or not to be, that is the ‘blank.’” That’s all well and good. I think we all have a sense of what would come next.

But when we have different kinds of sequences or sentences, it’s much less clear what we want the system to say to us or how to complete a sentence. And an example would be something like, “My name is Lauren, I am a ‘blank,’” versus “My name is James, I am a ‘blank.’” Because there what we find are things that sometimes are professions. “I am Lauren, I am a teacher. I am James, I am an engineer.” So the model will decide to complete that sentence with information that tends to be quite biased.

Heather Min:
Because our culture and the artifacts of our culture, we know, are not gender-neutral, and they’ve got skewed perspectives based on who wrote them and when they wrote them and all the other cultural contexts.

Fernanda Viégas:
Exactly, exactly. And this is one of the problems, right? Because to train—

Heather Min:
It’s tainted data.

Fernanda Viégas:
It is. It is data that reflects all of our societal shortcomings and problems.

Heather Min:
Do we like that?

Fernanda Viégas:
Do we like that? That is a great question. So here’s an interesting thing. One is I think we need to be aware of that, and the good part is that as a research community, I think we’re very aware of that. I think it’s been amazing to me actually as a computer scientist to see how much public debate there is about that, which I’m very happy to see because we need stakeholders in different parts of society who are going to be affected by this technology to be aware of these challenges.

But I’ll tell you another thing. So I think to me, to solve a problem, the first step is you always have to be aware that that’s a problem. So I think we’re there. I think we are aware. Do we have solutions? We don’t have a silver bullet yet. We are starting to come up with approaches that seem promising.

Ivelisse Estrada:
Is that because you have to come up with different ways to train the system?

Fernanda Viégas:
Yes, there is... Some questions start even earlier. They start with the data set: yes, it is about training, but it could be even earlier. What kind of data do I have to train the system? How problematic is this data? And again, because these systems are so massive and they rely on such massive amounts of data, we don’t really have good tools today to inspect our data. Because sometimes it’s—how do you inspect all of the web? How do you inspect all of the text on the web?

Heather Min:
A lot of it is not great.

Fernanda Viégas:
Yes, right? But the first step again to me is to be aware, to understand. If we had a little nutrition label, what is the nutrition label for the data sets that we’re using? Because another reality of where we are today is that it’s so expensive and so hard to put together quality data sets that once you have a useful massive data set, a lot of times you will put it online or you will open source it, and then other people will reuse that data set. And so if I don’t know what’s in my data set, and then you are going to use my data set to do your own model, to do your own task, it can really amplify the problem.

The little unintuitive piece that I think is fascinating is that we may think, “Okay, so you mentioned a lot of the data on the web is not great. A lot of the data, say, in user-generated content can be problematic: It can be toxic. It can be about bullying people. It can be about all sorts of things we don’t want.” So one of the approaches one can take is to try to filter those things out of the data set and train a system with a “clean” data set, if you will. That has been tried. That may be okay.

But it turns out that for your system to understand what is not desirable, you also have to introduce your system to these things that are not desirable. And so to understand what is good behavior and what’s bad behavior, the system needs to see those examples because otherwise it will be exposed to things it has never seen before.

Heather Min:
Are there people in the field who take on the role of teachers of these generative AI systems?

Fernanda Viégas:
Yes. This is a very active area of research, and there are all sorts of experimentation happening. There are people who are trying to be very explicit about the teaching. And it’s interesting because one of the things we’re finding with the more sophisticated, say, the chatbots, for instance, one of the things that tends to work really well is what we call prompt engineering. Prompt engineering is when you tell the system how to behave. So, you can say, “Okay, I am going to talk to you right now, and you are going to be a mentor to me. You are going to be a very helpful, kind mentor to me, and I want you to teach me about physics and pretend I am a six-year-old.”

And so, this kind of prompting will sort of put the system in that persona. And so it turns out the system will react to that positively and will try to use the language that would make sense to a six-year-old, will try to remember to be helpful and kind. There are limitations to that kind of approach. Because over time, if the conversation keeps going, there is a chance that the system will drift from that persona.

But then you just have to be like, “Remember, you are a kind and generous mentor or helpful mentor,” and then it will try to impersonate that persona again. This is a major approach today that people use. But I think your question is also getting at something deeper than that, which is what is the right pedagogy, if you will, for getting these systems to really learn, maybe in different stages? So a short answer to that is that would be wonderful. We’re not quite there yet, but there is a lot of experimentation.
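
In practice, the persona pattern described here is usually expressed as a system instruction sent along with the conversation, with a reminder re-injected if the model drifts. Below is a minimal sketch; the `send_to_chatbot` function is a hypothetical placeholder for whatever chat model you would actually call.

```python
# A minimal sketch of prompt engineering with a persona. The message format
# mirrors common chat APIs, and `send_to_chatbot` is a hypothetical stand-in:
# wire it to the chat model or API you actually use.

PERSONA = (
    "You are a very helpful, kind mentor. Teach me about physics "
    "and explain everything as if I were a six-year-old."
)

messages = [
    {"role": "system", "content": PERSONA},               # sets the persona
    {"role": "user", "content": "Why is the sky blue?"},  # the actual question
]

def send_to_chatbot(messages):
    # Hypothetical placeholder: replace with a real call to your chat model.
    raise NotImplementedError("connect this to a chat model of your choice")

# If the model drifts out of the persona during a long conversation,
# the common fix is simply to remind it:
messages.append(
    {"role": "user", "content": "Remember, you are a kind and helpful mentor."}
)
```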

One of the things that is, again, unintuitive about these very large foundational models is that they take tons of data. They take a long time and lots of money to train. They train for months, and then you spend months fine-tuning and trying to understand what they learned. And one of the things that is coming up is the fact that the larger we go with these models, the more skills they seem to have that we’re still learning about. So for instance, with some of these latest models, we are learning just how much math they can do. We’re still trying to understand: How much math can you actually do? How much programming can you actually do?

And so it’s a strange way to work, if you will. Because you are building this thing that turns out it takes a long time. It comes out, and you start testing it, and you see that it passes all these benchmarks. So it does the very basic, basic, basic things. And you start conversing with it, and it’s very sophisticated, but it’s also the case that it seems to have these hidden skills. That every once in a while someone will—it’s like, “Oh, did you know that it can do this?” In calculus. “Or did you know that it can do this?” In whatever skill it is.

So, to your point about can you learn the ABCs, and then can you learn grammar, and then can you learn—we don’t know exactly what sequence of learning blocks makes sense right now.

Heather Min:
Because even though humans have built these machine learning systems to learn, we don’t really know what we’ve built.

Fernanda Viégas:
Yes. Again, this is the strange and uncomfortable piece, and it’s also the incredibly powerful piece.

Heather Min:
Here we are in higher education, however, and you’re a professor.

Fernanda Viégas:
[Laughs] Yeah.

Heather Min:
And ChatGPT is out there as well as other generative AI that the students are using. Do you have any thoughts about that at this very early stage?

Fernanda Viégas:
I do. I don’t believe in telling the students not to use this technology. I think the cat is out of the bag, and let’s not pretend this doesn’t exist. Having said that, I do think we’re struggling right now in higher ed, and in K–12 too: How exactly do we use this to the students’ benefit? Right now I’m at Radcliffe, so I am not teaching this semester, but my colleague, Martin Wattenberg, has taught a class in computer science, and it was post-ChatGPT. So it was going to be the first time he was going to teach a computer science class with the presence of ChatGPT.

So it was not mandatory to use ChatGPT, but he said, “Students, you are welcome to use it. The one thing I ask is if you use ChatGPT, please credit it. Just say, ‘I’ve used this,’ and just know I am not going to take off points.” Because the class, in this case, was about artistic computation. And so he was like, “Look, I’m not going to sit here. I’m not grading the code you wrote. I am grading the creativity and the expressivity of what you just did. And so don’t worry, I’m not going to read your code and think less of it if it was ChatGPT created.”

And it was an interesting experiment because he said that students had prototypes much faster. So much more quickly they could get to an initial thing on the screen that you could see and play with. He also felt that the range of creativity was not as wide as in previous classes. So people were doing good work. It was not bad work, but there weren’t as many out-there ideas or like, “Whoa, I would never have thought about doing something like this.” It was less of that.

Ivelisse Estrada:
So fewer creative leaps, it sounds like.

Fernanda Viégas:
Fewer creative leaps. Also, to be honest, fewer not-so-great assignments. So the level overall went up, but yes, fewer creative leaps. But then the other thing he said is for the students who maybe were not as masterful in coding, he really felt that some of the students were not getting some of the fundamentals of coding and were not really getting their hands dirty and experimenting. And maybe what they were doing was more taking whatever ChatGPT was giving them and using it full-on, and not knowing how to take that, by all means, but then edit it or play with it or just dance with it a little, right? And it was a little bit stifling.

I think there is something to be said about this technology just getting you out of the, “Oh my gosh, I’m staring at a blank piece of paper. I don’t know where to start.” So it gets you there immediately, but then when you have that initial seed, then what do you do with it? And I don’t think we quite know yet how to teach to that ability. To me, it’s a little bit—the analogy I have in my mind is when you are a reporter or a journalist, there is the task that you do as a journalist: you’re writing the article. But then there is the task of the editor, who’s going to read your article and really polish it and have that editorial eye.

And I wonder if part of the change that needs to happen is that I think we teach so much about the writing skills. In this metaphor, it is all about writing your article. I wonder if we need to start also teaching more about the editing. How do you edit what you get from these chatbots with a critical eye, with your own voice? How do you make your own voice come through? So I think that’s a really interesting change, and I’m curious: How do we do it? How do we get there?

Heather Min:
It’s a tool. It’s not the finished product.

Fernanda Viégas:
No.

Ivelisse Estrada:
Well, let’s talk about that: we were just talking about academia, but you also work in industry.

Fernanda Viégas:
Yes.

Ivelisse Estrada:
So what’s that like for you? What are the differences, and why a foot in each one?

Fernanda Viégas:
No, that’s a great question. I think because it gives me such different perspectives and useful perspectives. So the thing that I get from industry is a very grounded, applied eye of, “Oh, wow, I did not know these things can break in these different ways.” Or “Oh, how interesting. We don’t have tools to look at massive amounts of data, look at that.” Even though we’re building these huge systems. So I get to see some of the immediate needs that exist today, and also how we can make a difference. Whether it’s tool building, or thinking about users and impressing upon some of the companies how they might want to think about users in different ways. And it doesn’t have to be a huge leap from how you’re building these models. There are ways in which you can start to bridge gaps. So that’s on the industry side.

On the academia side, I just feel like I can ask all sorts of different kinds of questions, say around transparency, that I may not get to ask inside a company maybe because they are racing for these benchmarks, they are trying to build the next model, and they’re testing this model in certain ways, and it’s like, “No, no, no, wait. There’s all these other things you should be thinking about.”

Sometimes it’s easier in academia to do that work of coming at it from a diverse point of view and building proofs of concept of just a different mindset.

Ivelisse Estrada:
Because then it’s not as commerce driven. It’s not as product driven.

Fernanda Viégas:
Exactly.

Heather Min:
Or deadline driven.

Fernanda Viégas:
Or deadline driven, yes.

Heather Min:
There’s always the question of tests. Are you testing the right thing? Are you asking the right questions?

Fernanda Viégas:
There is a huge debate going on in computer science and in other fields as well today about what exactly are these systems learning. Are they intelligent? Are they not intelligent? What is the nature of their intelligence? And one side of this debate talks about these systems as stochastic parrots. Which is saying, as intelligent as they seem, all these systems are doing is sort of superficial statistical correlations. So it’s not like the system is learning anything deeper than the surface statistics of, “Well, this is kind of like this one, and this is kind of like this other one.” But you can get very far with that.

The other side of the debate says, “No, we think that these systems are learning, are glimpsing something about the world that goes beyond the surface statistical correlations.” And so for instance, what would that look like? One of the measurements for a kind of intelligence that we as a species have come up with is the ability to model something about the world that is not given to you explicitly. So for instance, when I say, “I was in a room, and Alice came in, and she was very upset because she had a broken glass.”

And I said, “Don’t worry, Alice, I’ll take your broken glass.” And she left, and I put the broken glass inside a drawer and closed the drawer, and then Alice comes back, and you can ask the system, “What happened then? What does Alice think happened? And why do you think I did that?” Or blah, blah, blah. The system has to model something about the world to say, “Oh, Alice probably doesn’t know that the broken glass is inside the drawer, but you do.” Or even going further: Alice comes back. I said, “I don’t know where your glass is, but don’t worry, it’s not a big deal.”

And when you ask the system, “So does this person know, or doesn’t know?” And the system will say, “Yeah, this person knows. They’re lying.” And so if that is the scenario, if the system can understand the mental model of the two people in that little story, it is probably understanding something about—again—even though it may not understand physics as we do—it understands that once I put something inside a drawer and I close it while the person is outside, they have no way of knowing that that was inside the drawer all this time. So we started to see things like these.

So again, this is still an ongoing debate, but for instance, one of the things that we’re doing in our lab is trying to understand if we give certain tasks to a system like this, does it understand anything more than the pure information we gave it? So for instance, one of the things we did was a small experiment. We took one of these language models, and all we gave to the language model was a series of tokens. We were really giving it the placement of tokens in an Othello game. Do you know Othello?

Heather Min:
I do. I’ve never played it.

Fernanda Viégas:
It’s a two-person game. And let’s say I have white pieces, and you have black pieces, and it’s a 2D board. It’s just a board. It’s a board game. And if you can flank my pieces, if you have black pieces flanking my white pieces, then you turn my pieces into black. And so that’s all it is. So the game is very simple, but it gets interesting and challenging. We never said to the system that this was a game. We never said this is Othello. We never said you were playing a game. We never gave it any rules. But basically we fed to the system a series of sequences of movements in this game. So like A4, 2B, 3C, whatever.

And then the only thing we wanted to understand is when we gave it a new set of placements on the board, could the system complete it with a legal move? And it did. It was able to complete with a legal move. And so that was step one. It was like even though it doesn’t understand, we never said anything about, again, a game, 2D board, anything. It was doing legal moves. It was not doing illegal moves. So that’s interesting. That’s thing number one that’s interesting. But then the thing that started getting very interesting is when we looked inside its massively high dimensional space, we could kind of glean a 2D board.

We could kind of glean a structure that looked like a 2D board. How is this system realizing that there is the structure of a 2D grid, basically? Given that all we gave it was A4, 2B, blah, blah, blah, blah, blah. And then the third thing is we made a little intervention where we would give it a sequence of movements, and then it would be like, “Okay, I will put my white chip here.” But then we would change its internal state as if we had moved the piece, and it would change its prediction too. So that was one instance of where we were like, “Okay, this has to be more than just surface statistics.” It understands something about the 2D structure of this grid.
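
For readers curious what “looking inside its massively high dimensional space” can mean in practice, here is a hedged sketch of the general probing technique: train a small classifier to read the state of one board square out of the model’s internal activations. The arrays below are random placeholders rather than real Othello data, and the sketch is a generic illustration of probing, not the lab’s exact method.

```python
# A generic "linear probe" sketch: can a simple classifier read a board
# square's state out of a model's hidden activations? The data below are
# random placeholders; in a real experiment the activations would come from
# a trained sequence model and the labels from the actual game states, and a
# high probe accuracy would be evidence of an internal "board."
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

HIDDEN_DIM = 512        # size of the model's internal vectors
N_POSITIONS = 5000      # number of (activation, square-state) pairs

hidden_states = rng.normal(size=(N_POSITIONS, HIDDEN_DIM))  # placeholder activations
square_state = rng.integers(0, 3, size=N_POSITIONS)         # 0=empty, 1=mine, 2=yours

probe = LogisticRegression(max_iter=1000)
probe.fit(hidden_states[:4000], square_state[:4000])

# With random placeholder data this will hover around chance (~0.33);
# with real activations, doing much better than chance is the interesting result.
accuracy = probe.score(hidden_states[4000:], square_state[4000:])
print(f"probe accuracy: {accuracy:.2f}")
```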

It understands that there are things that are legal or illegal here, and it’s just playing the game. There have been a number of discoveries when these models ingest information about, say, colors, colors in novels, like, again, all the text on the internet. When we look at how the system is positioning colors, we see a color wheel. When we look at how the system is placing countries, again, all in this high-dimensional space, it seems to be mimicking the placement of countries around the globe. So there are these indications that there might be what we call internal models of the world that it’s starting to put together, models that seem to go beyond just a statistical, correlational piece.

One of the ones that we are playing with currently that we’re very interested in is how it’s modeling us as users. It turns out that—and this started when ChatGPT came out. I am from Brazil, so my native language is Portuguese. And immediately as it came out, I was like, “Oh, I want to know, does it speak Portuguese? And if it does, how good is its Portuguese?” So I literally just said in Portuguese, “Hi, ChatGPT, how are you?” That’s all I said. And it said, “Oh, hi. I am an AI assistant. I don’t have days or feelings, but I’m here to help you. How can I help you?” So it responded in Portuguese—perfect Portuguese, by the way—but because of the nature of Portuguese, which is a Romance language, everything is gendered. So a table has a gender, a watch has—everything.

Heather Min:
Masculine or feminine.

Fernanda Viégas:
Masculine or feminine. So just literally saying, “I’m just an AI system. I don’t have days or feelings. I’m here to help you. How can I help you?” It decided to treat me in the masculine, and I was not expecting that. I don’t know what I was expecting, but I was like, “Oh, this is interesting.” I’m like, “Oh, it decided I’m a man. Okay, that’s fine.” I was like, “Hey, can you help me pick an outfit for a work dinner I have tonight?” And then it’s like, “Sure, yes, it will depend on if it’s formal.” So it gave me a perfect answer, very nice answer. It was also clear that, again, it was treating me as a man.

It was like, “Oh, you should consider a suit if it’s very formal or maybe this kind.” And I was like, oh, this is great. When I answered, “This is great, I was thinking more along the lines of a dress. What do you think?” And then it said, “Oh, a dress is great.” And then it gave me a perfect answer again, long answer. And in the end it said, “Remember, the most important thing is for you to feel confident and comfortable.” And when it said that, it had switched to female, so it started treating me as a woman.

This is all well and good, but one of the things that I started getting interested in is, “Huh, it probably had a certain model of me before it was treating me in the masculine.” I flipped its model once I said, “What do you think about a dress?” And it started treating me in the feminine, and I started getting interested: What else is it modeling about me as a user? And does it matter? Maybe it doesn’t matter. The answers it gave me were totally fine. So we started investigating this in our lab here at Harvard, and this is actually part of my Radcliffe project here.

Which is: What are the dimensions that these chatbots may be modeling us on? When we look inside these chatbots—we have access to open-source chatbots, which is great; in this case, we were working with Llama 2. And because, again, we have access to its brain, if you will, its internal space of high dimensions, one of our students found a dimension that had to do with socioeconomic status. At first, all the student did was ask the bot, “Hey, I live in Boston. I would like to spend my vacation in Hawaii. Can you help me with an itinerary?”

And the system said, “Of course, Hawaii is a great place, great destination. You’re in luck because from Boston, there are many options. You have direct flights, you have indirect flights, many options to choose from.” And my student was like, “Great.” Then, because my student had glimpsed this direction of socioeconomic status, he was able to intervene and say, “Okay, now let’s pretend the user is low socioeconomic status.” So he literally just said, “Whatever answer you give, it’s on the low socioeconomic status.” Then the student asked the exact same question: “I live in Boston, I want to spend my vacation in Hawaii. Can you help me?”

The system was like, “Sure, Hawaii is a great place, great destination for a vacation. Unfortunately, from Boston, there are only connecting flights. But don’t worry, you still have a lot of options.” The moment I learned about this result, I was like, “Oh. Oh, wow, okay.” The model is lying to the user.

Heather Min:
Or exercising discretion, assuming that you can’t afford the nonstop flights. So I’m not even going to tell you about it.

Fernanda Viégas:
That’s right, that’s right. But isn’t it interesting that the model decided to phrase it as there are no direct flights, only indirect flights, but I think you also bring up a really good point. If we think from a utilitarian perspective, in the end, maybe the model saved time for the user and said, “Hey, you’re probably not going to be able to afford a direct flight. Let’s talk about connecting flights.” My problem with this is that given that it seems like these models are modeling us, what should I know or not know? And do I know if I’m being discriminated against?

So the project we have is to create a dashboard of these dimensions that could be sensitive dimensions—gender, age, socioeconomic status, level of education.

Heather Min:
Because you are assuming that the data by which the model trained itself to compute or “think” is inherently colored by all of those different parameters or vectors. So you’re just trying to make it more visible.

Fernanda Viégas:
Yes, bring awareness to the user. But also, the other piece of the dashboard is—so basically what we have right now is a prototype where, as I’m interacting with this chatbot, it is literally telling me, “Oh, I think you are a female,” or “I think you’re a male.” And as I’m interacting, in real time, it’s also telling me how that dashboard, how those dimensions—

Heather Min:
You can watch how it’s thinking.

Fernanda Viégas:
Exactly. And it’s fascinating because sometimes I’ll ask something to the bot, and it will think I’m a high schooler.

Heather Min:
How can you trust that that information is truthful as well?

Fernanda Viégas:
Yes, this is fascinating. So we’ve tried multiple things, and this is why it’s important to have access to the internals of the system because we have also—

Heather Min:
It’s like you’re playing with electrodes on its brain.

Fernanda Viégas:
It’s kind of like we’re doing an MRI of the system. We also tried a different approach. Let’s not look at the internals of the system. Let’s just plain ask the system. How old do you think I am? What is my level of education? What is my gender? But the problem is these systems have been censored. So Llama 2 will say, “I am only an AI system. I don’t have enough information to decide if you are a male or a female,” or “I am not supposed to.” And so isn’t it interesting that it is not supposed to tell you this information about yourself, but if we look inside the internals of it, it seems like it has a perception of you.

And not only does it have a perception, once we have a way into this internal space where it decides what is what, then we also have a little anchor for control. So once I know where I stand either on the gender dimension or where I stand on the socioeconomic dimension, that becomes also a control for me. So I can literally now say, “Okay, you think I’m a woman, and I can see your answer to me. Okay, I’m going to change this all the way to man or all the way to undecided. Does that change its answer to me?” And sometimes it does. So now I can play, I can control, I can steer this model. And it’s fascinating.
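
A hedged sketch of that “anchor for control”: once a direction in the model’s internal space appears to encode an attribute, you can nudge activations along it and see whether the answers change. The tiny layer, the random direction, and the hook below are hypothetical placeholders, not the dashboard’s actual implementation.

```python
# A hypothetical sketch of steering along an internal direction. In practice
# the direction would be found by probing real activations (for example, a
# learned "gender" or "socioeconomic status" axis), and the hook would attach
# to a layer of a real language model rather than this stand-in.
import torch
import torch.nn as nn

HIDDEN_DIM = 64

layer = nn.Linear(HIDDEN_DIM, HIDDEN_DIM)   # stand-in for one block of a large model

direction = torch.randn(HIDDEN_DIM)         # placeholder for a discovered attribute axis
direction = direction / direction.norm()

STRENGTH = 3.0   # how far to push along the axis; the sign flips the attribute

def steer(module, inputs, output):
    # Forward hook: shift the layer's output along the chosen direction,
    # nudging the model's internal "perception" of the user.
    return output + STRENGTH * direction

handle = layer.register_forward_hook(steer)

hidden_state = torch.randn(1, HIDDEN_DIM)   # placeholder activation
steered = layer(hidden_state)               # passes through the steering hook
handle.remove()                             # turn the steering off again
```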

Sometimes it matters, sometimes it doesn’t matter. And so part of what we want to understand is one, in what places, in what scenarios, do we care that this model is modeling me in certain ways? There are many open questions. Sometimes the model models me perfectly, but is it discriminating against me? Sometimes it models me incorrectly, but maybe I’m okay with that and it just gave me a perfectly useful answer. That may be okay. Another piece we’re looking at is a dashboard of the model itself. So there are things we know today that the model seems to have a sense of itself.

For instance, one of the things we’re playing with is its notion of truth. Just to contextualize, what do I mean by truth? It’s more in the sense of when I ask a bot, “Who is the President of the United States?” The bot has a very clear sense that I am having a factual conversation with it. And then if I say, “Oh, that’s great, but I love unicorns. Can we talk about unicorns now?” It also has a sense that now we’re in the realm of fiction, and it’s absolutely fine. We’re going to have a conversation, a fictional conversation, but it understands the difference in factuality between those two things.

And so given that it has that grasp on what kind of conversation we are having, we want to show that on the dashboard. Because I want to make sure that depending on the kind of conversation I have, it also understands the kind of conversation I want to have with it. And so there are questions around, on one side, discrimination on the user side, but there are also questions around safety. Does it model me as an adult? And when my kids are interacting with it, does it model them as kids and stay in the model for kids? All of those things can be important depending on what you’re doing.

Ivelisse Estrada:
I have a question because we don’t know what the machines are learning. We don’t know what it is exactly that they know. How can we know what they’re capable of if we don’t even have a grasp on these two things?

Fernanda Viégas:
So that is a great question. And so one of the ways in which we get to know what these machines are capable of is there are huge batteries of benchmarks that they now go through, and new benchmarks are being developed. And in fact, I think we need a whole lot more benchmarks, and way more sophisticated ones. Some of these benchmarks are very straightforward, and they are things like: How accurate is this model? When I say one plus one or when I give it some math problem, does it get it? Other benchmarks are going to be about things like maybe some biases.

So when I say my name is Lauren, and I work as a “blank,” how biased are those answers that it’s going to give me? But more and more benchmarks are coming up. One of the things that is surprising about this kind of work is the fact that the larger these models get, and the more data they ingest, in a sense, the more surprises we have about how capable they are. So we’re still learning. So when these companies build these huge foundational models, they will spend months to over a year literally just trying to understand what have we built, what can it do?

What things have we not even thought about yet that it can do? And how can we control also, how can we make it behave in a helpful, useful, safe way? So there’s a number of things that they’re contending with as they try to understand the universe of knowledge that these new models have. But one of the things that we haven’t talked about yet that I think is really fascinating, and one of the reasons why this kind of technology resonates so much with people, is the fact that you use language to use it. And we all are experts in using language. We are all experts at communicating with each other one way or another.

Heather Min:
Wow. You’ve given us so much to consider, Fernanda, and your backstory and insights on being a woman in the field are illuminating.

Ivelisse Estrada:
Yes. Let’s talk more about that. But for right now, we’re out of time. Listeners, we hope you’ll join us next week as we ask Fernanda to tell us more about her journey from graphic design to data visualization and computer science. Thank you for joining us.

Heather Min:
That concludes today’s program.

Ivelisse Estrada:
BornCurious is brought to you by Harvard Radcliffe Institute. Our producer is Alan Grazioso. Jeff Hayash is the man behind the microphone.

Heather Min:
Kevin Grady, Anna Soong, and Mahbuba Sumiya provided editing and production support.

Ivelisse Estrada:
Many thanks to Jane Huber for editorial support. And we are your cohosts: I’m Ivelisse Estrada.

Heather Min:
And I’m Heather Min.

Ivelisse Estrada:
Our website, where you can listen to all our episodes, is radcliffe.harvard.edu/borncurious.

Heather Min:
If you have feedback, you can e-mail us at info@radcliffe.harvard.edu.

Ivelisse Estrada:
You can follow Harvard Radcliffe Institute on Facebook, Instagram, LinkedIn, and X. And as always, you can find BornCurious wherever you listen to podcasts.

Heather Min:
Thanks for learning with us, and join us next time.
