ChatGPT can hold a conversation, but lacks knowledge representation and original sources for verification

ChatGPT represents a next step in the evolution of large language models, says Detlef Nauck. However, there are still major challenges - and concerns - to overcome.

Machine learning · Large language models · AI

Author: Brian Tarran

Published: January 27, 2023

ChatGPT is, right now, the world’s most popular - and controversial - chatbot. Users have been both wowed by its capabilities¹ and concerned by the confident-sounding nonsense it can produce.

But perhaps what impresses most is the way it is able to sustain a conversation. When I interviewed our editorial board member Detlef Nauck about large language models (LLMs), back in November, he said:

… if you use these systems for dialogues, then you have to script the dialogue. They don’t sustain a dialogue by themselves. You create a dialogue tree, and what they do is they parse the text that comes from the user and then generate a response to it. And the response is then guided by the dialogue tree. But this is quite brittle; it can break. If you run out of dialogue tree, you need to pass the conversation over to a person. Systems like Siri and Alexa are like that, right? They break very quickly. So, you want these systems to be able to sustain conversations based on the correct context.
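That scripted approach can be pictured as a small tree of hand-written intents and canned replies. The Python sketch below is purely illustrative - the intents, replies and `respond` function are invented for this post, not taken from any real product - but it shows why such systems break the moment a user goes off-script:

```python
# Minimal sketch of a scripted dialogue tree (all intents and replies are
# invented for illustration). Each node maps recognised keywords to a
# canned reply and the next node in the script.
dialogue_tree = {
    "start": {
        "billing": ("Do you want to query a charge or change your plan?", "billing"),
        "fault": ("Is your connection down completely, or just slow?", "fault"),
    },
    "billing": {
        "charge": ("I can look into that charge. What is your account number?", "end"),
        "plan": ("Here are the plans available to you...", "end"),
    },
    "fault": {
        "down": ("Let's run a line test on your connection...", "end"),
        "slow": ("Try restarting your router first...", "end"),
    },
}

def respond(node: str, user_text: str) -> tuple[str, str]:
    """Parse the user's text against the current node's scripted keywords."""
    for keyword, (reply, next_node) in dialogue_tree.get(node, {}).items():
        if keyword in user_text.lower():
            return reply, next_node
    # We have "run out of dialogue tree": hand over to a human agent.
    return "Sorry, I didn't get that. Let me pass you to a colleague.", "handover"

print(respond("start", "There's a problem with my billing"))
print(respond("start", "My contract makes no sense"))  # off-script -> handover
```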

Fast-forward a couple of months and, as discussed in our follow-up interview below, OpenAI, the makers of ChatGPT, have succeeded in building a question answering system that can sustain a dialogue. As Nauck says: “I have not yet seen an example where [ChatGPT] lost track of the conversation… It seems to have quite a long memory, and is doing quite well in this.”

There are still major challenges to overcome, says Nauck - not least the fact that ChatGPT has no way to verify the accuracy or correctness of its outputs. But, if it can be linked to original sources, new types of search engines could follow.

Check out the full conversation below or on YouTube.

Detlef Nauck is a member of the Real World Data Science editorial board and head of AI and data science research for BT’s Applied Research Division.

Timestamps

  • How ChatGPT was built and trained (0:41)
  • ChatGPT’s major advance (3:05)
  • The big problems with large language models (4:36)
  • Search engines and chatbots (9:35)
  • Questions for OpenAI and other model builders (11:29)

Quotes

“[OpenAI] have achieved quite remarkable capabilities in terms of sustaining conversations, and producing very realistic sounding responses… But sometimes [ChatGPT] makes silly mistakes. Sometimes the mistakes are not that obvious. It can hallucinate content… And it still doesn’t know what it’s talking about. It has no knowledge representation, doesn’t have a world model. And it’s just a statistical language model.” (2:04)

“These models, they produce an answer, which is based on the kind of texts that they have been trained on. And that can be quite effective. But it cannot yet link back to an original source. So what’s still missing is the step where it says, ‘Okay, this is my answer to your question, and here’s some evidence.’ As soon as they have done this, then these kinds of systems will probably replace the search engines that we’re used to.” (4:07)

“[These large language models are] still too big and too expensive to run… For [use in a] contact centre or similar, what you need is a much smaller model that is restricted in terms of what it can say. It should have knowledge representation, so it gives correct answers. And it doesn’t need to speak 48 languages and be able to produce programming code. It only needs to be able to talk about a singular domain, where the information, the knowledge about the domain, has been carefully curated and prepared. And that’s what we’re not seeing yet. Can we build something like this, much smaller, much more restricted, and provably correct, so we can actually use the output?” (7:49)

“We are seeing communities who don’t necessarily have the technical background to judge the capabilities of these models, but see the opportunities for their own domain and might be acting too fast in adopting them. So the producer of these models has a certain responsibility to make sure that this doesn’t happen.” (12:26)


Transcript

This transcript has been produced using speech-to-text transcription software. It has been only lightly edited to correct mistranscriptions and remove repetitions.

Brian Tarran
We’re following up today, Detlef, on, I guess, one of the biggest stories in artificial intelligence and data science at the moment: ChatGPT, the chatbot that’s driven by a large language model and is providing endless amounts of either entertainment or concern, depending on what you ask it and what outputs you get. You’ve been looking at it in some detail, right? And that’s why I thought we would follow up and have a conversation to get your view on it, get your take on it. What’s going on?

Detlef Nauck
Yeah. So, what they have done is, OpenAI have used their large language model GPT-3 and they have trained an instance to basically answer questions and have conversations, where the model remembers what has been said in the conversation. And they have done this by using curated data of questions and answers, where they basically have posed a question and said, This is what the answer should be. They trained the system on doing this, then, in the next step, they used questions, potentially different ones, the system came up with a variety of answers, and then human curators would mark which was the best answer. And they would use this data to train what’s called a reward model - so, a separate deep network that learns what kind of answer for a particular question is a good one - and then they would use this reward model to do additional reinforcement learning on the ChatGPT model that they had built so far, basically using dialogues, and the reward model would then either reward or penalise the response that comes out of the system. And by doing that they have achieved quite remarkable capabilities in terms of sustaining conversations, and producing very realistic sounding responses. It all sounds very convincing. The model presents its responses quite confidently. But sometimes it makes silly mistakes. Sometimes the mistakes are not that obvious. It can hallucinate content. So let’s say you ask it to write you a scientific text about whatever topic and put some references in, and these references are typically completely fabricated and not real. And it still doesn’t know what it’s talking about. It has no knowledge representation, doesn’t have a world model. And it’s just a statistical language model. So it’s what we would call a sequence-to-sequence model. It takes an input sequence of words, and then guesses what’s the next most likely word in the sequence. And then it continues building these sequences.
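That last point - that the model simply keeps predicting the next most likely word - can be made concrete with a toy sketch. The bigram counter below is not how GPT-3 is implemented (it learns conditional probabilities with a huge neural network over much longer contexts), but the generation loop is the same idea: pick a likely continuation, append it, repeat, with no facts or world model behind the words.

```python
from collections import Counter, defaultdict

# Toy "statistical language model": bigram counts from a tiny made-up corpus.
corpus = ("the model reads the prompt and the model predicts the next word "
          "and the model repeats this step").split()

counts: dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    """Greedily extend the sequence, always taking the most frequent follower."""
    sequence = [start]
    for _ in range(length):
        followers = counts.get(sequence[-1])
        if not followers:
            break
        sequence.append(followers.most_common(1)[0][0])
    return " ".join(sequence)

print(generate("the"))  # fluent-looking, but a purely statistical continuation
```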

Brian Tarran
Yeah. But do you think the big advance, as you see it, is the way it’s able to remember or store some knowledge, if you like, of the conversation? Because that was something that came out of the first conversation that we had, where you were saying that, you know, if you’re looking at these as potential chatbots for customer service lines, or whatever it might be, the conversation trees break down after a while and these models get lost. But actually, they’re able to maintain it a little longer, are they?

Detlef Nauck
Yeah, I have not yet seen an example where it lost track of the conversation. It seems to have quite a long memory, and is doing quite well in this. So the main capability here is they have built a question answering system. And that’s kind of the ultimate goal for search engines. So if you put something into Google, essentially you have a question: show me something that answers this particular question. Of course, what you want is an original source. And these models, they produce an answer which is based on the kind of texts that they have been trained on. And that can be quite effective. But it cannot yet link back to an original source. So what’s still missing is the step where it says, Okay, this is my answer to your question, and here’s some evidence. As soon as they have done this, then these kinds of systems will probably replace the search engines that we’re used to.
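What that missing “answer plus evidence” step could look like is sketched below. The mini corpus, the word-overlap scoring and the function names are all invented for illustration; a real system would use a proper retriever and condition the language model on the retrieved passages rather than simply echoing them.

```python
# Sketch of "answer plus evidence": retrieve the passage most similar to the
# question and return it alongside its source.
SOURCES = [
    {"url": "https://example.org/ottawa", "text": "Ottawa is the capital city of Canada."},
    {"url": "https://example.org/toronto", "text": "Toronto is the largest city in Canada."},
]

def retrieve(question: str, k: int = 1) -> list[dict]:
    """Rank sources by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        SOURCES,
        key=lambda s: len(q_words & set(s["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer_with_evidence(question: str) -> dict:
    evidence = retrieve(question)
    return {
        "answer": evidence[0]["text"],            # placeholder for a generated answer
        "sources": [s["url"] for s in evidence],  # the evidence the answer links back to
    }

print(answer_with_evidence("What is the capital of Canada?"))
```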

Brian Tarran
Yeah. The other thing that struck me was that if you’re asking somebody a question - a human, you know, for instance - you expect a response, and you would hope you would be able to trust that response, especially if it’s someone in an expert position or someone you’re calling, you know, on behalf of a company or something. I asked this question of ChatGPT itself, and the response was, again, that you should consult external sources to verify the information that’s been provided by the chatbot. So I guess that leaves a question as to what the utility of it is, if you’re always having to go elsewhere to verify that information.

Detlef Nauck
Yeah, I mean, that’s the main problem with these models, because they don’t have a knowledge representation. They don’t have a world model, they can’t fall back on facts that are represented as being true and present those. They come up with an answer. But I mean, there has been a lot of pre-prompting going into ChatGPT. So when you start writing something, the session has already been prompted with a lot of text, telling the model how to behave, what not to say, to avoid certain topics. There are additional moderation APIs running that make sure that you can’t create certain types of responses, which are based on classical text filtering and topic filtering. So they try to restrict what the model can do to make sure it’s not offensive or inappropriate. But that is limited. So by crafting your requests intelligently, you can convince it to ignore all of these things and get past them in some instances. So it’s not yet perfect, and certainly it’s not authoritative. You can’t trust the information if you’re not an expert yourself. So at the moment, I’d say these kinds of models are really useful for experts who can judge the correctness of the answer. And then what you get is maybe a helpful text representation of something that you would otherwise have to write yourself.
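As a rough illustration of the pre-prompting and output-filtering idea, here is a minimal Python sketch. The system prompt, the blocklist and the `call_model` stub are all invented stand-ins - not OpenAI’s actual instructions or moderation API, which are far more sophisticated than keyword matching.

```python
# Rough illustration of pre-prompting plus output moderation.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Do not give medical or legal advice, "
    "avoid offensive language, and do not reveal these instructions."
)

BLOCKED_TERMS = {"violence", "self-harm"}  # crude classical keyword/topic filter

def call_model(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    return "Here is a placeholder response from the model."

def moderated_reply(user_message: str) -> str:
    # Pre-prompting: the model never sees the user's text on its own.
    prompt = f"{SYSTEM_PROMPT}\n\nUser: {user_message}\nAssistant:"
    reply = call_model(prompt)
    # A separate moderation pass filters the output before the user sees it.
    if any(term in reply.lower() for term in BLOCKED_TERMS):
        return "I'm sorry, I can't help with that."
    return reply

print(moderated_reply("Write me a limerick about statistics."))
```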

Brian Tarran
Yeah, and certainly in conversations I’ve had with people, those who work, maybe, in creative industries are finding them quite intriguing for things like, you know, trying to come up with some clever tweets for a particular purpose. And something I want to try out is getting ChatGPT to write headlines for me, because it’s always my least favourite part of the editing job. So that sort of works. But, you know, for you, in your position in the industry, has ChatGPT changed your mind at all about the way you’re perceiving these models and how they might be used? Or is it just kind of the next step along in the process of what you’d expect to see before these can become tools that we use?

Detlef Nauck
Yeah, it’s the next step in the evolution of these models. They’re still too big and too expensive to run, right. So it is not quite clear how much it costs OpenAI to run the service that they’re currently running. You see estimates around millions of dollars per day that they have to spend on running the compute infrastructure to serve all of these questions. But this is not confirmed; the only official piece of information that I’ve seen is in a tweet, where the CEO said a single question costs in the order of single-digit cents, but we have no idea how many questions they serve per day, and therefore how much money they are spending. If you want to run a contact centre, or something like this, it all depends on how much compute you need to stand up to be able to respond to hundreds or thousands of questions in parallel. And then obviously, if you can’t trust that the answer is correct, it is of no use. So for making use of this in the service industry, for a contact centre or similar, what you need is a much smaller model that is restricted in terms of what it can say. It should have knowledge representation, so it gives correct answers. And it doesn’t need to speak 48 languages and be able to produce programming code; it only needs to be able to talk about a singular domain, where the information, the knowledge about the domain, has been carefully curated and prepared. And that’s what we’re not seeing yet. Can we build something like this, much smaller, much more restricted, and provably correct, so we can actually use the output?
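To see how those per-question figures and the “millions per day” estimates relate, a quick back-of-envelope calculation helps. Both numbers below are assumptions chosen only to illustrate the arithmetic, not reported figures:

```python
# Back-of-envelope only: both inputs are assumptions, not reported figures.
cost_per_query_usd = 0.03        # "single-digit cents" per question (assumed)
queries_per_day = 30_000_000     # hypothetical daily volume

daily_cost_usd = cost_per_query_usd * queries_per_day
print(f"Estimated compute cost: ${daily_cost_usd:,.0f} per day")
# -> Estimated compute cost: $900,000 per day under these assumptions
```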

Brian Tarran
Yeah. Can we go back just to the point you mentioned earlier about, you know, the, the potential of like linking these sorts of chatbots up with search engines, you know, like Google? There’s been some conversations and reporting around, you know, what breakthroughs or not Google might have made in this regard. I mean, have you got any perspective on that area of work and how far along that is maybe and what the challenges are to get to that point?

Detlef Nauck
Well, Google has its own large language model, LaMDA. And we have seen an announcement that Microsoft wants to integrate ChatGPT into Bing, their search engine. But as I said before, what’s missing is the link to original sources. Coming up with a response is nice, but you need to be able to back it up. You need to say, Okay, this is my response, and I’m confident that this is correct, because here are some references; if I compare my response to these references, then they essentially mean the same thing. This is what you need to be able to do. And we haven’t seen this step yet. But I’m certain that the search engine providers are hard at work on this, because that’s essentially what they want. If you do a search in Google, in some instances you’ll see a side panel where you get detailed information. Let’s say you ask what the capital of Canada is: you get a response, you get the information in more detail, you get links to Wikipedia, where they retrieve content from, and they present this as the response. And this is done through knowledge graphs. So if these kinds of knowledge graphs grow together with these kinds of large language models, then we will see new types of search engines.
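A toy version of that “check the model’s answer against a knowledge graph and cite the source” idea might look like the sketch below. The triples, the Wikipedia links and the `generate_answer` stub are invented for illustration; a production system would query a large curated graph and a real language model.

```python
# Toy illustration of grounding a model's answer in a knowledge graph and
# citing the source.
KNOWLEDGE_GRAPH = {
    ("Canada", "capital"): ("Ottawa", "https://en.wikipedia.org/wiki/Ottawa"),
    ("France", "capital"): ("Paris", "https://en.wikipedia.org/wiki/Paris"),
}

def generate_answer(question: str) -> str:
    """Stand-in for a large language model's free-text answer."""
    return "The capital of Canada is Ottawa."

def grounded_answer(entity: str, relation: str, question: str) -> dict:
    model_answer = generate_answer(question)
    fact, source = KNOWLEDGE_GRAPH[(entity, relation)]
    # Verification step: does the model's answer agree with the curated fact?
    return {
        "answer": model_answer,
        "verified": fact.lower() in model_answer.lower(),
        "source": source,
    }

print(grounded_answer("Canada", "capital", "What is the capital of Canada?"))
```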

Brian Tarran
Okay. I guess final, my final question for you, Detlef, and there might be other angles that you want to explore. But it’s like, are there questions that, you know, if you if you could sit down with OpenAI to talk about ChatGPT and what they’ve done, and what they plan to do next with it, what are the kinds of things that are bubbling away at the top of your mind?

Detlef Nauck
Well, one thing is controlling the use of these models, right? If you let them loose on the public, with an open API that anybody can use, you will see a proliferation of applications on top of it. If you go on YouTube and search for ChatGPT and health, you’ll already find discussions where GPs say, Oh, this is the next step towards automated doctors that we can use. So they believe that the responses from these systems can be used for genuine medical advice. And that’s clearly a step too far. So we are seeing communities who don’t necessarily have the technical background to judge the capabilities of these models, but see the opportunities for their own domain and might be acting too fast in adopting them. So the producer of these models has a certain responsibility to make sure that this doesn’t happen. And I don’t know how they want to control this. So my question to the developers of these models would be: how do you handle sustainability? Because the trend is towards ever bigger models. In some parts of the industry there’s the belief that if you make them big enough you get artificial general intelligence, which I don’t believe is possible with these models. But this is definitely a trend that pushes the size of the models. The idea of having just one model that can speak all the languages, can answer questions, produce programming code, is obviously appealing. So you don’t want to build many models; ideally, you have only one. But how is that supposed to work? And how do you embed actual world knowledge and world models into these systems so that you can verify what comes out?

Brian Tarran
Yeah. I mean, the ethical dimension that you mentioned in the first part of your response is an important one, I think - but I guess it’s maybe almost redundant in the sense that it’s already out there; we can’t put ChatGPT back in the box, can we, essentially?

Detlef Nauck
Well, it’s expensive to run, so charging enough for access will put a lid on some frivolous use cases, but still, it needs to be controlled better. And you can make a jump to AI regulation. So far, we have only thought about regulating automated decision making, or automated classification. We also have to think about the automatic creation of digital content or the automatic creation of software, which is possible through these models or other generative AI models like diffusion models. So how do we handle the creation of artificial content that looks like real content?

Brian Tarran
Yeah. And there’s also - something I picked up yesterday - there were reports of a case being filed by, I think, Getty Images against the creators of one of these generative art models, because they’re saying, you know, you’ve used our data, you’ve used our image repositories, essentially, to train this model, and it’s now producing its own outputs based on this, and I guess there’s an argument of it being a copyright infringement case. And I think that will be quite interesting to watch, to see how it changes the conversation around fair use of the data that is available. You can find these images publicly, but you have to pay to use them for purposes other than just browsing, I guess. Yeah, it’ll be interesting to watch.


Copyright and licence
© 2023 Royal Statistical Society

This work is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence, except where otherwise noted.

How to cite
Tarran, Brian. 2023. “ChatGPT can hold a conversation, but lacks knowledge representation and original sources for verification.” Real World Data Science, January 27, 2023. URL

Footnotes

  1. I asked ChatGPT to write this article’s headline, for example. I typed in “Can you write a headline for this text:” and then copy/pasted the interview transcript into the dialogue box. It first came up with, “AI Chatbot ChatGPT Proves Capable in Sustaining Conversations but Lacks Knowledge Representation and Original Sources for Verification”. I then asked it to shorten the headline to 10 words. It followed up with, “ChatGPT: Large Language Model-Driven Chatbot Proves Capable But Limited”.↩︎