AI series: What is “best practice” when working with AI in the real world?

Working with AI in real world conditions can be quite a different proposal to the idealised settings often discussed “in theory”. Guest editor Anna Demming speaks to a panel of experts about how to meet “best practice” aspirations within real world constraints, and how to avoid common pitfalls.

large language models
machine learning

Anna Demming


June 4, 2024

Over the course of the Real World Data Science AI series, we’ve had articles laying out the nitty gritty of what AI is, how it works, or at least how to get an explanation for its output as well as burning issues around the data involved, evaluating these models, ethical considerations, and gauging societal impacts such as changes in workforce demands. The ideas in these articles give a firm footing for establishing what best practice with AI models should look like but there is often a divide between theory and practice, and the same pitfalls can trip people up again and again. Here we discuss how to wrestle with real world limitations and flag these common hazards.

Our interviewees, in order of appearance, are:

Ali Al-Sherbaz, academic director in digital skills at the University of Cambridge in the UK

Janet Bastiman, Napier chief data scientist and chair of the Royal Statistical Society Data Science & AI Section

Jonathan Gillard, professor of statistics/data science at Cardiff University, and a member of the Real World Data Science Board

Fatemeh Torabi, senior research officer and data scientist, health data science at Swansea University, and also a member of the Real World Data Science board

It is often said that while almost everybody is now trying to leverage AI in their projects, most AI projects fail. What nuggets of wisdom do the panel have for swelling that minority that succeed with their AI projects, and what should you do before you start doing anything?

Ali Al-Sherbaz: It’s not easy to start, especially for people who are not aware how AI works. My advice is, first, they have to understand the basics of how AI works because the expectation could be overpromising, and that is a danger. Just 25 years ago, a master dissertation might be about developing a simple – we call it simple now but it was a master’s project 25 years ago – a simple model with a neural network of a combination of nodes to classify data. Whatever the data is – it could be drawing shapes, simple shapes, square, circle triangle – just classifying them was worth an MSc. Now, kids can do it. But that is not the same as understanding what the neural network or the AI is. It’s a matrix of numbers, and actually, for the learning process each does multiple iterations to find the best combination of these numbers – product of sum; sum of product – to classify, to do something, and train them for a certain situation, and that is a supervised learning. Over the last 25 years – especially in the last 10 years – the computational power is getting better, so AI is now working better.

There are other things people have to learn. There’s the statistics as well, and of course people who would like to work in AI and data science must understand the data, and they should also be experts in the data itself. For instance, I can talk about cybersecurity, I can talk about networking and other things, but if it comes to something regarding health data, or financial services, or stock markets, I’m not an expert in the data. So I’m not going to be actively working on those things even if I use the same AI tools. This is in a nutshell why I think some people fail sometimes using AI, or they succeed using AI. And we should emphasise the human value. The AI is there, and it exists to help us to make a better more accurate decision, but the human value is still there. We have to insist on that.

Janet Bastiman: I would just like to build on all of that great stuff that Ali’s just said. When you look at basically the non-data scientist side of it, you often get businesses who think AI can solve a certain problem. They might go out and hire a team – whether that’s directly or indirectly – and get them to try and solve a problem that, as Ali said, they may not have the domain expertise for. The business might not even have the right data for it, and AI might not even be the right way of solving that problem. I think that’s one of the fundamental things to think about – really understanding what you’re trying to solve, and how you’re going to solve it before you start throwing complex tools, and potentially very expensive teams at the problem.

When you look at a lot of the failures, it’s been because businesses have just gone, we can solve this problem, I’m just going to hire a team and let these intelligent people look at something. And then they’re restricted on the data that they’ve got, which won’t even answer the question; they’re restricted on the resources they have; and even restricted in terms of wider buy in from the company. So really understanding what is it that you want to solve? What are you trying to do? Is AI the right thing? And can you even do it with the resources you have available? And I think that’s, that’s a fundamental starting point. Because, you can have wonderful experts, who have that domain knowledge, who understand the statistics, and all that essential stuff that Ali just said. But then if from a business point of view, if you don’t give them the right data to work on, or you don’t let them do their job and tell you when they can’t do their job, then again, you’re going to be doomed to failure.

Jonathan Gillard: Explainability is a big issue when it comes to AI models, as well. They are at the moment, very largely “black box” – data goes in, then these models get trained on dumb data and answers get popped out. And when it works, well, it works fabulously well. And we’ve seen lots of examples of that happening. But often for business, industry or real life, we want to learn. We want to understand the laws of the universe, and to understand the reasons why this answer came about. Because this explainability piece is missing – because everything is hidden away almost – I think that’s a big issue in successful execution. And particularly when it comes to industries where there’s a degree of regulation there as well, if you can’t explain how a particular input arose to a particular output, then how can you justify to regulatory bodies that what you’ve got is satisfactory, ethical, and that you’re learning and you’re doing things in the right way?

There have been efforts at trying to get explanations from these models. How do you think things are progressing there?

JG: Yeah, that’s a good question. I think where we are with explainability is in very simple scenarios, very simple models. This is where traditional statistical models do very well. There’s an explicit model which says if you put these things inside then you’ll get this output. So [for today’s AI] I think we’re actually very far away from having that complete explainability picture, particularly as we fetishise more and more grand models. The AI models are only getting bigger, more complex, and that makes the explainability per se even more challenging. And that’s why I think, as Ali says, at the moment, the human in the loop is absolutely crucial.

What AI does share with classical statistics (or classical data science if you want to call it that) is it can still only be as good as the data that’s put into it, that’s still a fundamental truth. I think a lot of the assumptions currently with AI models – and this is where there could be a few trip ups is that it can create something from nothing. It’s “artificial intelligence” – almost the wording suggested it’s artificial. But fundamentally, we still need a robust and reliable comprehensive source of data there in order to train these models in the first place.

In terms of having outsourced expertise for these projects– does that make more problems if you’re then trying to understand what this AI has done?

JB: Oh, hugely. Let’s say that domain expertise – that’s something Ali touched on –you’ve got to understand your data. Because even that fundamental initial preparation of data before you try and train anything is absolutely crucial – really looking at where are the gaps? Where are the assumptions? How is this data even being collected? Has it been manipulated before you got to it? If you don’t understand your industry, well enough you won’t know where those pitfalls might be – and a lot of teams do this, they just take the data, and then they just put it in, turn the handle and out comes something and it looks like it’s okay. What they’re really missing there – because they’re not putting that effort in to really understand those inputs, what the models are doing, they’re just turning the handle until they get something that feels about right – what they miss out is where it goes wrong. And there are some industries, where the false positives and false negatives from classification or the bad predictions from running things really have a severe human impact. And if you don’t understand what’s going in, and the potential impact of what comes out, then it’s very, very easy to just churn these things out and go, “it’s 80% accurate, but that’s fine” without really understanding the human impact of the 20% [that it gets wrong].

Going back to what Jon said about that explainability, it’s so crucial. It is challenging, and it is difficult, but going from these opaque systems to more transparent systems – we need that for trust. As humans, we divulge our trust very differently, depending on the impact. One of the examples I use all the time is, you know, sort of weather prediction stuff, you know, we don’t really care too much, because it’s not got a huge impact. But when you look at sort of financials or medicals, we really, really want to know that that output is good, and how we got to that output. The Turing Institute’s come out with some great research that says, as humans, if we want to understand why when another human has told us something, then we want the same thing from the models, and that can vary from person to person. So building that explainable level into everything we do, has to be one of the things we think about upfront. But you’ve got to really, truly deeply understand that data. And it’s not just a question of offloading a data set to a generalist who can turn that handle, otherwise you will end up with huge, huge problems.

Fatemeh Torabi: I very much agree with all the points that my colleagues raised. I also think it’s very important that we know why we are doing things. Having those incremental stages in our planning for any project, and then having a vision of where we see AI can contribute into this process and can give us further efficiency – and how – is very important. If we don’t have defined measures to see how this AI algorithm is contributing to this specific element of the project, we can get really lost bringing these capabilities on board. Yes, it might generate something, but how we are going to measure that something is very important. I think, as members of the scientific community, we must all view AI as a valuable tool. However, it has its own risks and benefits.

For example, in healthcare when we use AI for risk predictions, it can be a really great tool to aid clinicians to save time. However, in each stage, we need to assess the data quality, how these data are fed into the algorithm, what procedures, what models, and how we generate those models. And then which discriminative models do we use to balance the risk and eventually predict the risk of outcomes in patients? It’s very much a balance between risks and benefits for usefulness of these tools in practice. We have all these brilliant ideas of what best practice is. But in real terms, sometimes it’s a little bit tricky to follow through.

Could you give us some thoughts on the sort of best practice with data, for example, that doesn’t quite turn out to be quite so easy to follow in practice, and what you might do about it?

FT: We always call these AI algorithms, data hungry algorithms, because the models that we fit require us to see patterns in the data that we feed into them so that the learning happens. And then the discriminative functions come in place to balance and kind of give a score to wherever the learning is happening and give an evaluation of each step. However, the data that we put into these algorithms comes first – the quality of that data. Often in healthcare, because of its sensitivity, the data is held within a secure environment. So we cannot, at this point in time, expose an AI algorithm to a very diverse example, specifically for investigating rare diseases or rare conditions. And above that, there is also complexities in the data itself. We need to evaluate and clean the data before we feed it into these algorithms. We need to evaluate the diversity of the data itself – for example, the tabular data, the imaging data, the genomic data – and each one requires its own specific or tailored approach in data cleaning stages.

The panel. Clockwise from top left: Ali Al-Sherbaz, Janet Bastiman, Fatemeh Torabi and Jonathan Gillard
Figure 1: The panel. Clockwise from top left: Ali Al-Sherbaz, Janet Bastiman, Fatemeh Torabi and Jonathan Gillard

We also have another level that is now being discovered in the health data science community, which is the generation of synthetic data. We can give AI models access to these synthetic versions of the data that we hold. However, that also has its own challenges because it requires reading the patterns from real data, and then creating those synthetic versions of data.

For example, Dementia Platforms UK is one of the pioneers in developing this. We hold a range of cohort data, patients’ data, genomics data and imaging data. In each one of these when we try to develop those processing algorithms, there are specific tailored approaches that we need to consider to ensure we are actually creating a low fidelity level of data that is holding some of the patterns in it for the AI algorithm to allow the learning to happen. However, we also need to consider whether it is safe enough so that we can ensure the data provided are secure to be released for use at a lower governance level compared to the actual data. So there are quite a lot of challenges, and we captured a lot of it in our article.

A A-S: I can talk about the cybersecurity and other relevant data network security, the point being the amount of data we receive to analyse. It’s really huge. And when I say huge I mean about one gigabyte, probably in a couple of hours, or one terabyte in a week – that’s huge. One gigabyte of a text file – if I printed out this file with A4 – that would leave me with a stack of A4 paper, three times the Eiffel Tower.

Now, if I have cyber traffic, and try to detect any cyber attack, AI helps with that. However, if we train this model properly, they have to detect cyber attacks in real time – when I say real time, we’re talking about within microseconds or a millisecond – and the decision has to be correct. AI alone doesn’t work, doesn’t help. Humans should also intervene, but rather than having 100,000 records to check for a suspected breach, AI can reduce that to 100. A human can interact with that. And then in terms of the authentication or verification, humans alongside AI can learn whether this is a false positive, or a real attack or a false negative. This is a challenge in the cybersecurity area.

JB: I just wanted to dive in from the finance side – again the data is critical, and we have very large amounts of data. However in addition – and I think we probably suffer from the same sort of problem that Ali does in this – when I’m trying to detect things, there are people on the other side actively working against what I’m trying to detect, which I suppose is a problem that maybe Fatemeh doesn’t have in healthcare.

When you’re trying to build models to look for patterns, and those patterns are changing underneath you, it can be incredibly difficult. I have an issue that all of my client’s data legally has to be kept separated – some of it has to be kept in certain parts of the world so we can’t put that into one place. We can try and create synthetic data that has the same nuances of the snapshots that we can see at any one point in time, and we can try and put that together in one place, but what we can detect now will very quickly not be what we need to detect in a month’s time. As soon as transactions start getting stopped, as soon as suspicious activity reports are raised, and banks are fined, everything switches and how all of that financial crime occurs, changes. And it’s changing, on a big scale worldwide, but also subtly because, there are a team of data scientists on the other side trying desperately to circumvent the models that me and my team are building. It’s absolutely crazy. So while I would love to be able to pull all of the data that I have access to in one place and get that huge central visual view, legally I can’t do that because of all the worldwide jurisdictional laws around data and keeping it in certain places.

Then I’ve also got the ethical side of it, which is something that Fatemeh touched on. If I get it wrong, that can have a material impact on usually some of the most marginalised in society. The profile of some of the transactions that are highly correlated with financial crime are also highly correlated with people in borderline poverty, even in Western countries. So false positives in my world have a huge, huge ethical impact. But at the same time, we’re trying really hard to minimise those false negatives – that balance is critical, and the data side of it is such a problem.

Fatemeh mentioned the synthetic side of it. There’s a huge push, particularly in the UK to get good synthetic data to really showcase some of these things that we’re trying to detect. But by the time you get that pooling, and the synthesising of data that you can ethically use and share around without fear of all the legal repercussions, what we’re trying to detect has already moved on. So we’re constantly several steps behind.

I imagine Ali has similar problems in the cybercrime space in that as soon as things are detected, the ways in which they work move on. So there’s an awful lot I think that, as an industry, although we have different verticals, we can share best practices on.

Is there a demand for new types of expertise?

A A-S: There is a huge gap in the in the UK, at least and worldwide about finding people working as a data scientist or working with the data. So we created a course in Cambridge, which we call the data science career accelerator for people who work in data, and would like to move on and learn more. We did market research, and we interviewed around 50 people between CEO and head of security and head of data scientists, in science departments and in industry, to tell us – what kind of skills are you after? What problems do you currently have? And then we designed this course.

We found that first of all there are people who don’t know from where to start – what kind of data they need, what tools they have to learn with… Even if they learn the tools, they still need to learn what kind of machine learning process to use. And then suddenly, we have ChatGPT turned out, and the LLM [large language model] development – all of that in one course, it is a real challenge.

The course has started now, the first cohort. The big advice from industry we have is that during the course they have to work on real world case studies, on scenarios with data that nobody has touched before – that is, it’s new, not public. We teach them on a public data, but companies also have their own data, and we get consent from them to use that data for the students so we can test the skills they learned on virgin data that nobody has touched before.

We just started this month, and the students are going to start with the first project now. They are enjoying the course but that is the challenge we have now. How did we handle that? It’s to work together with the industry side by side, even during the delivery. We have an academic from Cambridge, and we have experts from the industry to support the learners to learn to get the best of both worlds.

The industry has changed so much in the last couple of years. Does that mean that the expertise and demands are also changing very quickly or is there a common thread that you can work with?

A A-S: Well, there is a common thread, but having new tools – I mean, Google just released Gemini, and that’s a new skill they have learnt and been tested on, and looked into how others feel about it and compared it to ChatGPT, or Claude 3 or Copilot. That’s all happened in the last 12 months. And then, of course, reacting on that, reflecting on the material, teaching the material – it’s a challenge. It’s not easy and you need to find the right person. Of course, people who have this kind of experience are in demand, and it’s hard to secure these kinds of human resources as well as to deliver the course. So there are challenges and we have to act dynamically and be adaptive.

What are your thoughts on the evaluation of these models, and how to manage the risk of something that you haven’t thought of before, and the role of regulation.

JG: I think a lot of our discussions at the moment are assuming that we’ve got well meaning, well intentioned people and well meaning, well intentioned companies and industries, who are trying to seek to do their best ethically and regulatorily and with appropriate data, and so on. But there is a space here for bad actors in the system.

Unfortunately, digital transformation of human life will happen in a good and bad way – unfortunately, I think there are going to be those two streams to this. Individuals are very capable now of making their own large language models by following a video guide if they wanted to, and having that data is, of course going to enable them maybe to do bad things with it.

Data is already a commodity in quite a strong way, but I do think we have to visit data security, and even the risks of open data as well. We live in a country, which I think does very well in producing lots of publicly available data. But that could be twisted in a way that we might not expect. And when I speak of those things, we’re usually thinking of groundwork – writing and implementing your own large language models – but there were recent examples of where just by using very clever prompting of existing large language models, you could get quite dangerous material, shall we say, which circumnavigated inbuilt existing safeguards. Again, that’s an emerging thing that we have to have to try and address as it comes on.

I think my final point with ethics and regulation is it will rapidly evolve, and it will rapidly change. And a story which I think can illustrate that is, when the first motorcar was introduced into the UK, it was law for a human to walk in front of the motorcar with a large red flag to warn passers-by of the incoming car because people weren’t really familiar with it. Now, of course, that’s in distant memory, right? We don’t have people with red flags, walking in front of cars. I do wonder, in 20 years or 50 years, what will the ethical norms regarding AI and its use be? Likewise, will we have deregulation? That seems to be the common theme in history that when we get more familiar with things, we deregulate because we’re more comfortable with their existence. That makes me quite curious about what the future holds.

FT: Jon raised a very interesting point and Janet touched upon keeping financial data in silos but we are facing this in healthcare as well. Data has to be checked within a trusted research environment or secure data environment that’s making the data silos. However, efforts at this point in time are on enhancing these digital platforms to bring data and federal data together. Alongside what is happening in terms of our progression towards development of a new ethical or legal requirement, is documenting what is being practised at the moment, because at the moment there are quite a lot of bubbles. Each institution has their own data and applies their own rules to it. So understanding what it is that we are currently working on – the data flows that are flowing into the secure environments – is building the basis of developments that are going on in terms of developing standardisation and common frameworks. A lot of projects have been focused on understanding the current to develop on it for the future.

We know for example, the Data Protection Act, put forward some specific requirements, but that was developed in 2018, before we had this massive AI consideration. In my academic capacity as well, we are facing what Jon mentioned, in terms of the diversity of assessments for students. For example, when we ask these questions, even if the data is provided within the course and within this defined governance, we know that the answers can possibly be aided by AI – a model. So we are defining more diverse assessment methods in academic practice to ensure that we have a way to evaluate the outcome that we are receiving by the human eye, rather than being blinded by what we receive from AI, and then calling it high quality output, whether in research practice or in academic practice. So there’s quite a lot of consideration of these issues, I think that is bringing our past knowledge to the current point where we now have to balance between human and machine interactions in every single process that we are facing.

How does this change the skill set required of data scientists, as AI is getting more and more developed?

A A-S: Regarding the terminology of data scientists, when we talk about data we immediately link that with statistics, and statistics is an old topic. There has been an accumulation of expertise for 100 years, to the best of my knowledge or more in statistics, and people who are new to data analysis or data, have to learn about this legacy. And when we develop the course, we should mention these skills in statistics and build this knowledge on top, that is, when we reach the right point, then we talk about learning or machine learning, supervised and unsupervised, and about LLM – these are the new skills they have to learn. As I mentioned, it’s tricky when we teach learners about it, we have to provide them with simple datasets to teach them something complex in statistics because it’s a danger to teach both [data and statistics at the same time] – we will lose them, they will lose concentration and it’s hard to follow up. So, a little bit of statistics – they have to learn the basics like normal distribution, the distribution, the type, and what does it mean when we have these distributions, the meaning of the data – and that is the point I made earlier about how people should have a sense for the numbers. What does it mean, when I say 0.56 in healthcare? Is that a danger? 60% – is that OK? In cybersecurity, if the probability of attack today is 60% should I inform the police? Should I inform someone; is that important? Or for example, for the stock market? Say we have dropped off 10% – Is that something we have to worry about? So making sense of the numbers is part of it.

That is part of personalised learning because it depends on their background or what they have learned – it’s not straightforward, and it has to be personalised not just for people taking the course now, for instance for someone who is 18 years old coming from their A levels. No, it’s for a wide range. People from diverse courses like to approach this data science course. And now we are in the era of people who are in social science, and engineering, doctors, journalism, art, they are all interested in learning a little bit of data science, and utilising AI for their benefit. So there is no one answer.

You emphasise that people still need to be able to make sense of numbers. We’re often told that AI will devalue knowledge and devalue experience – it sounds like you don’t feel that’s the case.

A A-S: I have to stick with the following: human value is just that – value. AI without humans is worth nothing. I have one example: In 1997, some software was developed for chess, to play against a human, and for the first time, that computer programme (called AI now) beat Kasparov. Guess what happened? Did chess disappear? No, we still value human to human competition. The value of the human is the same for art and for music. So we still have human value, and we have to maintain that for the next generation. They shouldn’t lose this human value, and handover to AI value, which I feel is zero without the human.

J B: I think one of the things we are seeing is that diversity in people’s backgrounds coming into data science, which is fantastic, because I think that really helps with the understanding of when things can go wrong, and how things can be misused. If you have this cookie cutter set of people that have all got a degree from the same place and all had the same experience, which is very similar – this happens a lot in the financial industry where there’s like five universities that all feed into the banks – they all think and solve problems in the same way because that’s how they’ve been trained. But as soon as you start bringing in people with different backgrounds, they’re the ones that say, hang on, this is a problem. So having those different backgrounds is really useful.

But then as Ali said there’s so many people who call themselves a data scientist that don’t understand data, or science. And I think he was absolutely right. If you’ve got a probability of 60%, or you’ve got a small standard deviation, when is that an issue? What do you really understand about that based on your industry, and based on your statistical knowledge? That’s so so key. And it’s something that a lot of people who are self-trained and call themselves data scientists have missed out on. So coming back to your original question about is it harder or is it easier, in some respects, it’s a lot harder, because someone who calls himself a data scientist now needs to do everything from basically fundamental research, trying to make models better, you’ve got to understand statistics, you’ve got to understand machine learning, engineering, production, isolation, efficiencies, effectiveness, ethics – it’s this huge, huge sphere. And it’s too much for one person. So you’ve really got to have well balanced teams and support. Because you can’t keep on top of your game across all of those. It’s just not possible. So I think that becomes really difficult. When I look at how things have changed, there’s so many basic principles from, you know, the 80s and 90s, in standard, good quality computer programming and testing. And I think the one thing that we’re really missing as an industry is a specialist AI testing role. Someone who understands enough about how models work and how they can go wrong and can do the same thing for AI solutions, as good QA analysts can do for standard software engineering models. Someone who can really test them to extremes with what happens when I put the wrong data in.

We saw this – there were a couple of days under COVID, where all the numbers went wrong, because the data hadn’t been delivered correctly, or not enough of it had been delivered. There were no checks in place to say, actually, we’ve only got 10% of what we were expecting, so don’t automatically publish these results. It’s things like that, that we really need to make sure are built into the systems because those are the things that, again, could cause problems. As soon as you get a model that’s not doing the right thing – going back to our original question – when they do go wrong, you can then find a company pulls that model even though it could be easily fixed. And then they’re disillusioned with AI, and won’t use it. That’s that whole project, and all of the expense and investment on that just thrown away when a bit more testing and understanding could have saved it.

Explore more data science ideas

About the authors
Anna Demming is a freelance science writer and editor based in Bristol, UK. She has a PhD from King’s College London in physics, specifically nanophotonics and how light interacts with the very small, and has been an editor for Nature Publishing Group (now Springer Nature), IOP Publishing and New Scientist. Other publications she contributes to include The Observer, New Scientist, Scientific American, Physics World and Chemistry World..
Copyright and licence
© 2024 Royal Statistical Society

This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence.

How to cite
Demming, Anna. 2024. “What is “best practice” when working with AI in the real world?.” Real World Data Science, June 4, 2024. URL