Trusted AI: translating AI ethics from theory into practice

Data scientists can act as critical enablers of ethical AI when they have the right knowledge and toolkits at their disposal. Maxine Setiawan and Mira Pijselman review three key attributes of trustworthy AI – transparency, explainability, and fairness – and ways to put these into practice.

AI ethics

Maxine Setiawan and Mira Pijselman


July 3, 2023

With artificial intelligence (AI) becoming increasingly prevalent across sectors, so too have conversations about AI ethics. AI ethics provides a repeatable and comprehensive way to assess what we should and should not be doing with AI, and sets out how we ought to design, use, and govern AI products in accordance with key principles. Ethical frameworks are essential to derive sustainable value from AI products and services and build trust.

A myriad of AI tools that leverage automated or semi-automated decision-making processes have raised important questions that have become foundational in the AI ethics community, such as ‘What does it mean for an algorithm to be fair?’ As an example, AI tools that are used in recruitment may perpetuate biases arising from historical training data. If a model used to generate a shortlist of applicants has been trained on data from past candidates, say, and those candidates – both successful and unsuccessful – are predominantly men, historical patterns that contain various biases will perpetuate to become algorithmic biases that form the model’s decisions. Thus, the model may algorithmically discriminate against women or gender minorities, as individuals from these groups are not well represented in the training data.

To ensure the safe and responsible use of AI, the focus moving forward needs to be on the operationalisation of AI ethics into the day-to-day development lifecycle. But, what does this look like in practice? And how might you get started as an ethical AI practitioner? In this article, we unpack these questions and give you, the data scientist, a foundation to begin your journey towards trusted AI. Read along to get an overview of key principles that you should be aware of, what they mean, their underlying technical grounding, and what implementation might look like practically.

Ethical AI principles

You have likely heard of several principles in relation to ethical AI, such as fairness or transparency. The context in which you’ve encountered such principles is most probably due to their inclusion in a broader ethical framework. Some of the most popular ethical AI frameworks include the National Institute of Standards and Technology’s AI Risk Management Framework, the UK Data Ethics Framework, and the European Commission’s Ethics Guidelines for Trustworthy AI. Among these and many other frameworks, we can run into what Floridi and Cowls (2019) call “principle proliferation,” whereby it becomes overwhelming for those contributing to AI programmes to know where to begin with ethics due to an excess of choice (p. 2).

At the time of writing, there is no single universally accepted standard that dictates which essential ethical AI values or principles should be adhered to during AI development and deployment. However, there are common themes that emerge. In our organisation, EY, we’ve learned from the variety of principles, frameworks, and white papers in the AI ethics community and developed our own Trusted AI Framework comprising five key attributes that we believe assure the trustworthiness of AI:

  • Transparent
  • Explainable
  • Unbiased
  • Resilient
  • High-performing

In this article, we take a deeper dive into the first three attributes – transparency, explainability, and unbiasedness (or fairness). These are areas where data scientists can act as critical enablers of ethical AI when they have the right knowledge and toolkits at their disposal.


Transparency is the ability to provide meaningful information to foster awareness and understanding of an AI system. It starts with documenting AI systems in a way that is accessible for a broad audience with a spectrum of technical abilities. It is a simple yet powerful way to build trust in AI. It empowers non-technical stakeholders to critically evaluate AI development decisions, thereby unlocking multi-disciplinary insights that can mitigate reputational or performance risks. Further, it also builds trust with society, as it can enable everyday users to interrogate AI design decisions, product capabilities, and system limitations, thereby permitting users to make informed judgements about technology. Unfortunately, transparency is often misunderstood as disclosing trade secrets or proprietary information, such as source code and datasets. However, transparency can be achieved without disclosing such technically complex information. Instead, it can be as simple as disclosing where and when an AI system is being used, or for what purposes a model should be employed.

But what exactly does “documenting AI systems” look like? Documentation should consist of a mix of technical components (system architecture, dataset selection determination, model selection techniques, etc.) and non-technical components (business case, product purpose of use, alignment to overall AI strategy, etc.). The research community has recommended AI documentation standards, such as datasheets for datasets and model cards for model reporting. You can liken datasheets or model cards to the importance placed upon commenting your code – the more information there is available around decisions throughout model development, the greater the certainty that these artefacts will be understood and used as intended moving forward. Proper documentation and governance will help ensure accountability, improve internal and external oversight, and initiate discussions around model optimisation goals and their trade-offs, such as including fairness and accuracy in optimisation objectives.

With upcoming AI regulations, transparency requirements will become more integral. For example, the European Union (EU) AI Act introduces specific transparency obligations, such as bot disclosures, for both users and providers of AI systems, which would allow users to opt out of interacting with an AI system. Furthermore, in higher risk use cases, specific technical documentation is needed, which would include details of a system’s intended purpose and descriptions of its development process.


Once transparency is enabled, explainability is a natural next step, especially when an AI product is implemented in a more regulated or high-risk environment. Explainability is the ability to express why an AI system reached a particular decision or understand the features that affect model behaviour. Explainability is a key concern within the field of explainable AI, which, as a discipline, strives to improve trustworthiness by enabling a better understanding of a system’s underlying logic via a suite of technical methods.

Fundamentally, different model architectures mean that some models are more interpretable than others, as the steps used to evaluate their predictions are easier for humans to comprehend. Decision trees, for example, have more human-interpretable characteristics than deep learning models. Different model architectures also mean that there are interpretation tools that are only applicable to certain models, such as regression weights in a linear model.

Another approach to consider, then, is model-agnostic interpretation, which encompasses both global interpretability (explanation of an entire model’s behaviour) and local interpretability (explanation of a single prediction). While there are fast-developing techniques and tools for model-agnostic interpretability, let’s take a look at two of the more popular methods available:

  • Local interpretable model-agnostic explanations (LIME)
    This is an explanation technique that trains local surrogate models, using explainable models such as Lasso or decision trees, to approximate the predictions of a model that is not interpretable by design in order to explain individual model predictions. The idea is to use interpretable features from the surrogate models to create human-friendly explanations where the underlying model cannot. For example, in an image classification model that detects a flower in an image, LIME is able to highlight the parts of the image that explain why the model classifies the image as a flower (see illustration below). This provides an interpretable explanation between the input variable and prediction, which is an essential part of interpretability.

Illustration of explainable AI processes using LIME on an image classification AI system. In this example, an image classification system receives an image of a sunflower and classifies it as a flower with 70% likelihood. The LIME approach then sees parts of the input image perturbed, or masked, leading to different classification likelihoods from the AI system. From this, a model is able to determine the parts of the input image that best explain the initial classification of 'flower'.

Illustration of explainable AI processes using LIME on an image classification AI system. Adapted from “Local Interpretable Model-Agnostic Explanations (LIME): An Introduction” and O’Reilly.
  • Shapley Additive Explanations (SHAP)
    SHAP (GitHub repo) uses tools and theoretical foundations from game theory, one of which is Shapley values. It works by assigning each feature an importance value for a particular prediction to numerically explain the contribution of various features to a model’s output. For example, in a model that predicts flu, SHAP calculates the importance of sneezing as a feature by removing and adding the subset of other features, leading to different combinations of features that contribute to the prediction. This method provides interpretable solutions for more complex models similar to the equivalent of “weights” in linear models.


The area of AI ethics that is central to impending AI regulations, such as the EU AI Act and the New York City AI Law,1 is fairness. AI models are inherently biased because of their underlying training data.2 Thus, when we speak of fairness in the context of AI ethics, we are referring to a combination of technical and non-technical ways to minimise the impacts of algorithmic bias.

Let’s begin with the technical approaches to fairness. To achieve equitable, reliable, and fair decisions, a diverse and balanced set of examples is needed in training datasets. However, data often contains disparities that, if left unchecked, can perpetuate algorithmic biases and harms. There are various approaches to detect sources of bias, guarantee fairness, or “debias” models. To strive for algorithmic fairness, many papers have proposed various quantitative measures of fairness, with some based on unstated assumptions about fairness in society. Unfortunately, these assumptions are often mutually incompatible, making it difficult to compare fairness metrics to one another – consider, for example, the longstanding debate between equality of outcome and equality of treatment.

Although metrics incompatibilities exist, fairness broadly focuses on equality of opportunity (group fairness), and equality of outcome (individual fairness) to prevent discrimination against certain attributes. Drawing definitions from legal frameworks, the term “protected attribute” refers to the characteristics that are often protected under anti-discrimination laws, such as gender or race. Mathematically, the following metrics are often used to demonstrate scores that support fairness:

  • Statistical parity
    This measure seeks to uncover whether a model is fair towards protected attributes by measuring the difference between the majority and protected class in receiving a favourable outcome. A value of 0 demonstrates the model to be fair.
  • Disparate impact
    This compares the percentage of favourable outcomes for the monitored group to the percentage of favourable outcomes for a reference group. The groups compared can be the majority group and minority group, and this score will highlight in whose direction decisions are biased. For example, if a model grants loans to 60% of people in a middle-aged group and only 50% for those of other age cohorts, then the disparate impact is 0.8, which indicates a positive bias towards the middle-aged group and an adverse impact on the remaining cohorts.
  • Equality of odds
    This measures the balance of the true positive rate and false positive rate between protected and unprotected groups, which seeks to uncover whether a model performs similarly for the two groups.

It is important to remember that statistics are only one side of the fairness problem for machine learning, and one that treats the symptoms of bias as opposed to the underlying causes. In addition to the aforementioned technical approaches, there are a variety of non-technical measures that teams developing AI systems can adopt to augment fairness and inclusion:

  • Definition of fairness
    Organisations that develop or use AI systems need to define, practically, what it means to be fair. Although there are various quantitative fairness measures, these are based on assumptions of fairness in society, which could be defined for each specific use case.
  • Diversity on teams
    There’s been a sharpened focus on the value of team diversity to areas such as productivity and creativity. The same is true for ethics. Ensuring that product teams are composed of a broad cross-section of identities can help to organically drive fairness through diversity of thought and experience.
  • Education and self-reflection
    Developing knowledge within individuals and teams about the socio-technical aspects of AI – that is, the ways in which AI shapes our social, political, economic, and environmental lives. The more critical a person can be as a data scientist in questioning why something is being built, the more likely they are to proactively recognise risks surrounding fairness.
  • Consider the end user
    Imagine that you are on a development team building an AI solution for a problem in the agricultural sector pertaining to livestock health. Who is best suited to solving the problem: a data scientist or a farmer? As a data scientist, you may have the tools to develop a solution, but given your distance from the end user, you are unlikely to intimately understand the problem in the same way a farmer would. If you cannot understand the problem, you cannot hope to find a solution, much less an ethical one. Recognising the importance of consulting individuals that are representative of end users is key to ensure that your design is fair.
  • AI ethics review boards
    Data science teams should not operate in isolation. Increasingly, organisations are establishing AI ethics review boards or similar forums that are intended to act as checks on the design decisions made throughout AI development. Does your organisation have one?

In conclusion

These three areas – transparency, explainability, and fairness – are the starting points to embed and operationalise AI ethics in technical development. Transparency relies on both technical and non-technical documentation to facilitate discussions with non-technical stakeholders, as well as to create and enforce accountabilities. Explainability helps to build trust in AI output by vesting us with an ability to explain “why”. Finally, adopting both technical and non-technical measures of fairness can ensure that AI products in development do not adversely impact certain groups.

In addition to these three areas of AI ethics, within EY we have two other focus areas – resilience and high-performance – that form part of our Trusted AI Framework. We will discuss these in a future article. We’re also keen to explore topics such as generating trust in generative AI! Until then, please share your stories of developing ethical AI projects in the comments below. How are you translating AI ethics from theory into practice?

For further technical reading, we suggest:

For further socio-technical reading on AI and data ethics, we suggest:

  • The Age of Surveillance Capitalism, by Shoshana Zuboff
  • Invisible Women, by Caroline Criado Pérez
  • Race after Technology, by Ruha Benjamin
  • Algorithms of Oppression, by Safiya Noble
  • Atlas of AI, by Kate Crawford
  • Weapons of Math Destruction, by Cathy O’Neil
  • Data Feminism, by Catherine D’Ignazio and Lauren Klein

Explore more data science ideas

About the authors
Maxine Setiawan is a social data scientist specialising in trusted AI, and AI and data risk in EY UK&I. With her multi-disciplinary background, she works to help clients understand and manage risks from their data and AI systems, and to ensure AI governance that is fair, accountable, and trustworthy. Maxine holds an MSc in social data science from the University of Oxford.

Mira Pijselman is the digital ethics lead for EY UK&I, where she focuses on the responsible governance of key emerging technologies, including artificial intelligence, quantum technologies, and the metaverse. A social scientist and philosopher by training, she helps clients to map, understand, secure, and capitalise on their data and technology potential safely. Mira holds an MSc in the social science of the internet from the University of Oxford.

Copyright and licence
© 2023 Maxine Setiawan and Mira Pijselman

This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence.

Thumbnail image by Alexa Steinbrück / Better Images of AI / Explainable AI / Licenced by CC-BY 4.0.

How to cite
Setiawan, Maxine and Mira Pijselman. 2023. “Trusted AI: translating AI ethics from theory into practice.” Real World Data Science, July 3, 2023. URL


  1. New York City Local Law 144.↩︎

  2. Not statistical bias (usually known as bias-variance trade-off), which compares the training data and target value to approximate errors.↩︎