<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Real World Data Science</title>
<link>https://realworlddatascience.net/foundation-frontiers/</link>
<atom:link href="https://realworlddatascience.net/foundation-frontiers/index.xml" rel="self" type="application/rss+xml"/>
<description></description>
<image>
<url>https://realworlddatascience.net/images/rwds-logo-150px.png</url>
<title>Real World Data Science</title>
<link>https://realworlddatascience.net/foundation-frontiers/</link>
<height>83</height>
<width>144</width>
</image>
<generator>quarto-1.9.37</generator>
<lastBuildDate>Fri, 10 Apr 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Inside ‘RSS: Data Science and Artificial Intelligence’ with Neil Lawrence</title>
  <dc:creator>Annie Flynn</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2026/04/02/neil-lawrence-interview.html</link>
  <description><![CDATA[ 





<p>Real World Data Science recently had the opportunity to sit down with <a href="https://www.cst.cam.ac.uk/people/ndl21">Professor Neil Lawrence</a>, Editor-in-Chief of the Royal Statistical Society’s new journal, <a href="https://academic.oup.com/rssdat">RSS: Data Science and Artificial Intelligence</a>. Neil, who is the DeepMind Professor of Machine Learning at the University of Cambridge, a Senior AI Fellow at the <a href="https://www.turing.ac.uk/">Alan Turing Institute</a>, and a Visiting Professor at the University of Sheffield, is a leading voice in machine learning and AI. He has previous experience as Director of Machine Learning at Amazon and research interests spanning probabilistic models and real-world applications in health and developing economies. He is also passionate about public engagement—he co-hosts the <a href="https://www.thetalkingmachines.com/">Talking Machines</a> podcast and is the author of <a href="https://www.penguin.co.uk/books/455130/the-atomic-human-by-lawrence-neil-d/9781802062106">The Atomic Human</a>.</p>
<p>We recently published a <a href="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/11/21/uncertainty.html">Data Science Bite</a> breaking down <a href="https://academic.oup.com/rssdat/article/1/1/udaf002/8317136">the first position paper</a> of the newly launched journal, and had the opportunity to speak to its lead author <a href="https://realworlddatascience.net/foundation-frontiers/posts/2026/01/29/beyond-quantification-delacroix-interview.html">Professor Sylvie Delacroix</a> about its themes: how AI can better support human judgment, why it is crucial to recognise forms of uncertainty that can’t be reduced to numbers, and how participatory design can make AI a true partner, rather than a replacement, for professionals.</p>
<p>In this conversation, Neil discusses the paper and how it aligns with the journal’s vision, plus the importance of bridging machine learning and related fields to keep the human element at the heart of AI systems.</p>
<p>Watch the full interview below and scroll down for key takeaways and some analysis.</p>
<hr>
<section id="interview" class="level2">
<h2 class="anchored" data-anchor-id="interview">Interview</h2>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/VV_FnGQXWlM" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<hr>
</section>
<section id="key-takeaways-at-a-glance" class="level2">
<h2 class="anchored" data-anchor-id="key-takeaways-at-a-glance">Key Takeaways at a Glance</h2>
<section id="the-journal-aims-to-convene-not-conclude" class="level3">
<h3 class="anchored" data-anchor-id="the-journal-aims-to-convene-not-conclude">1. The journal aims to convene, not conclude</h3>
<p>The first paper is intentionally a position paper: an invitation to discussion rather than a definitive answer. Lawrence emphasises that solutions to these challenges are distributed across the community. Progress depends on creating spaces—like the RSS journal—for thoughtful, cross-disciplinary exchange grounded in real-world practice.</p>
</section>
<section id="data-scientists-must-reassess-habits-not-just-adopt-new-tools" class="level3">
<h3 class="anchored" data-anchor-id="data-scientists-must-reassess-habits-not-just-adopt-new-tools">2. Data scientists must reassess habits, not just adopt new tools</h3>
<p>While AI can dramatically increase technical efficiency, Lawrence warns against using that efficiency to simply “do more of the same.” Instead, practitioners should reinvest time in understanding the broader human, societal, and institutional implications of their work.</p>
</section>
<section id="overconfidence-and-lack-of-accountability-in-ai-systems-pose-real-risks" class="level3">
<h3 class="anchored" data-anchor-id="overconfidence-and-lack-of-accountability-in-ai-systems-pose-real-risks">3. Overconfidence and lack of accountability in AI systems pose real risks</h3>
<p>As the journal’s position paper highlights, AI systems, unlike human stakeholders, do not carry social or reputational stakes. This can lead to overconfident outputs without accountability—particularly dangerous in high-stakes domains like healthcare, law, and education. Without better interfaces for uncertainty, professionals risk being distanced from the information they need to make sound judgments.</p>
</section>
<section id="conversational-uncertainty-is-now-central-to-real-world-ai-use" class="level3">
<h3 class="anchored" data-anchor-id="conversational-uncertainty-is-now-central-to-real-world-ai-use">4. “Conversational uncertainty” is now central to real-world AI use</h3>
<p>In many professional settings, decisions are not made through formal statistical outputs alone, but through dialogue—between clinicians, experts, or increasingly, humans and machines. Understanding how uncertainty is communicated and interpreted in these conversational settings is critical, especially as large language models become more influential.</p>
</section>
<section id="bridging-qualitative-and-quantitative-thinking-is-essential" class="level3">
<h3 class="anchored" data-anchor-id="bridging-qualitative-and-quantitative-thinking-is-essential">5. Bridging qualitative and quantitative thinking is essential</h3>
<p>A recurring theme is the need to close the long-standing divide between quantitative methods and qualitative insight. Many real-world decisions are inherently qualitative, yet current AI systems—and much of data science—are optimised for quantification. Failing to integrate these perspectives risks repeating past mistakes where “the numbers” were treated as unquestionable truth.</p>
</section>
<section id="participatory-approaches-lead-to-better-long-term-decisions" class="level3">
<h3 class="anchored" data-anchor-id="participatory-approaches-lead-to-better-long-term-decisions">6. Participatory approaches lead to better long-term decisions</h3>
<p>Although slower upfront, participatory and deliberative processes—bringing together diverse expertise and perspectives—can prevent costly mistakes and misaligned systems. In the long run, they are more effective than purely efficiency-driven approaches.</p>
</section>
</section>
<section id="join-the-conversation" class="level2">
<h2 class="anchored" data-anchor-id="join-the-conversation">Join the conversation</h2>
<p>This conversation touches on a theme we often explore here at Real World Data Science: the idea that the future of data science and AI will not be defined by technical capability alone, but by how well we integrate human judgment, context, and responsibility into our systems. The position paper—and RSS: Data Science and AI more broadly—is an open invitation to engage with these questions. Whether through research, case studies, or reflections from practice, there is a clear call for contributions that connect technical work with real-world impact.</p>
<p>As Neil suggests, the answers are unlikely to come from any single discipline or organisation. They will emerge from a broader conversation across the data science community.</p>
<p>Now is the time to be part of that conversation: answer RSS: Data Science and AI’s <a href="https://academic.oup.com/rssdat/pages/call-for-papers-uncertainty-in-the-era-of-ai">call for submissions</a>.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">

</div>
</div>


</div>
</section>

 ]]></description>
  <category>Interviews</category>
  <category>AI</category>
  <category>Ethics</category>
  <category>Uncertainty</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2026/04/02/neil-lawrence-interview.html</guid>
  <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2026/04/02/images/thumb1.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Why Great Models (Still) Fail</title>
  <dc:creator>Jennifer Hall</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2026/02/25/why_great_models_still_fail.html</link>
  <description><![CDATA[ 





<p>In the field of data science and AI, it’s easy to assume that technical excellence is the ultimate goal. Performance can be quantified in ROC curves, accuracy scores, and other metrics, but a model can be technically brilliant and still deliver no real-world impact.</p>
<p>In our earlier article <a href="https://realworlddatascience.net/applied-insights/case-studies/posts/2026/01/12/why-95-percent-of-ai-projects-fail.html">“Why 95% of AI Projects Fail”</a>, Lee Cleweley examined the strategic and organisational reasons AI struggles to deliver value. This piece takes that conversation to the ground level, offering a practitioner’s guide to designing models that succeed in real‑world use.</p>
<p>Success in practice goes far beyond code and algorithms. It comes down to solving the right problem, in the right way, for the right people. No matter how elegant a technical solution is, it must address real problems for real users. Achieving that requires more than strong technical workflows—it also demands an understanding of how the model and technical solution fit into the bigger picture. To do that, data science and AI practitioners, when designing their solution, need to see how it will sit within broader processes, including how end users will actually interact with and use it.</p>
<p>The importance of this skill emerged repeatedly in the “10 Key Questions to Data Science and AI Practitioners” interview series, run by the Data Science and AI Section of the Royal Statistical Society. The series gathers perspectives from practitioners at various career stages, from those starting their career to senior leaders. By posing the same ten questions, it uncovers motivations, challenges, and visions for the future while highlighting the breadth of career paths in the field. When asked what they considered the most undervalued skill, many participants highlighted the importance of something non-technical — the ability to understand the organisational context and the needs of users.</p>
<p>The importance of these skills for data science and AI practitioners is further evidenced by their emphasis in government and professional standards. The UK Government’s <a href="https://ddat-capability-framework.service.gov.uk/role/data-scientist">DDaT Capability Framework</a> highlights that data science practitioners, especially at higher levels, are expected to “design and manage processes to gather and establish user needs”. Similarly, the Royal Statistical Society, in <a href="https://rss.org.uk/RSS/media/File-library/Membership/Prof%20Dev/AdvDSP-Guidance-Notes-2024.pdf">The Alliance for Data Science Professionals Certification Guidance and Process: Advanced Data Science Professional</a>, lists as a key skill “engaging stakeholders, demonstrating the ability to clearly define a problem and agree on solutions”, including being able to “Identify and elicit project requirements”. Together, these frameworks show that engaging directly with users and stakeholders is not optional—it is a core professional expectation for data science and AI practitioners.</p>
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p><strong>The Case of the Vanishing Model</strong></p>
<p>Consider a fictional scenario that may be painfully familiar to practitioners. A practitioner is asked to “build a model to predict which customers are likely to leave.”</p>
<p>They get to work: sourcing data, engineering features, and testing a range of algorithms. After three months, they deliver a model with 94% accuracy on retrospective data. It’s an elegant solution, built with a technically sophisticated approach, and they are justifiably proud.</p>
<p>Then comes the handover presentation:</p>
<ul>
<li><strong>Marketing:</strong> “How do we act on this? We already run retention campaigns—will this actually improve them?”</li>
<li><strong>Commercial:</strong> “It will cost £X per month to operate. What return should we expect?”</li>
<li><strong>Operations:</strong> “There’s no process for plugging these predictions into the CRM. Who is meant to action this?”</li>
</ul>
<p>The project stalls. Despite strong performance metrics, the model never makes it into production. The lesson is clear: even the most technically impressive solution will fail if it isn’t designed with real-world context in mind. The model simply “vanishes” and all that hard work goes to waste.</p>
<p>This example is deliberately simplified. In some organisations, practitioners may work alongside business partners, product owners, or domain leads who help shape requirements and maintain alignment with broader goals. Yet this support does not remove the practitioner’s responsibility: technical success still depends on their own clear understanding of the business requirement and a recognition that their technical solution may be a small but integral cog in a larger machine. For the machine to work effectively, all the parts must work together. A model is not just a mathematical construct; it is a product that must operate within the complex, resource-limited realities of an organisation.</p>
</div>
</div>
</div>
<section id="start-with-what-we-are-trying-to-achieve" class="level2">
<h2 class="anchored" data-anchor-id="start-with-what-we-are-trying-to-achieve">Start with What We Are Trying to Achieve</h2>
<p>Too often, data science projects begin with vague aims such as “build a model” or “forecast sales.” These are activities, not outcomes. What matters is the result the organisation is striving for—for example, increasing upsell revenue by £2M this quarter or preventing 500 contract cancellations per month through timely intervention. Asking the right questions early is essential for designing solutions that can actually be implemented. For instance, a retention model might flag 1,000 customers at high risk of leaving, but if capacity allows only 50 calls per week, the key question becomes: which 50 should be prioritised, and does contacting them actually improve retention compared to a control group?</p>
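<p>To make that concrete, here is a minimal sketch of the prioritisation step, with invented data and hypothetical column names (<code>churn_prob</code>, <code>annual_value</code>): ranking by expected value at risk, rather than raw churn probability, makes the capacity constraint explicit.</p>
<pre class="sourceCode python"><code>import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical scored customer base: 1,000 customers flagged by a churn model.
# Column names and distributions are illustrative assumptions only.
customers = pd.DataFrame({
    "customer_id": np.arange(1_000),
    "churn_prob": rng.uniform(0.5, 0.95, 1_000),    # model's churn probability
    "annual_value": rng.gamma(2.0, 1_500.0, 1_000), # contract value in GBP
})

# Rank by expected value at risk, not raw probability: a 60%-risk customer
# worth £20k may matter more than a 90%-risk customer worth £500.
customers["expected_loss"] = customers["churn_prob"] * customers["annual_value"]
weekly_capacity = 50
call_list = customers.nlargest(weekly_capacity, "expected_loss")
print(call_list.head())
</code></pre>
<p>Whether contacting those 50 actually improves retention is a separate question, best answered with a control group, as discussed below.</p>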
<p>Before writing a single line of code, it is essential to gather as much context as possible:</p>
<ul>
<li>What problems is the business actually solving?</li>
<li>How does the model fit into the wider business process?</li>
<li>Who will use the outputs, and what actions will follow?</li>
<li>How will success be measured—commercially, operationally, behaviourally?</li>
<li>What trade-offs are acceptable in cost, complexity, or speed?</li>
<li>How will performance be monitored over time?</li>
<li>What are the operational constraints?</li>
</ul>
<p>Once the essentials are understood (to the extent they can be), the vision for the project and the success metrics must be agreed collectively. All key stakeholders—technical, operational, financial, and strategic—need to be involved in defining what success looks like. Without this shared vision, each group risks optimising for its own priorities rather than the organisation’s overall goals. Crucially, the vision should extend beyond performance metrics: it should tell the story of the problem being solved and what success will mean in practice. This shared narrative becomes the project’s guiding star. To keep it on course, data science and AI teams, working with stakeholders, must guard against scope creep and shifting success criteria, ensuring that any new requests fit within the agreed scope. Flexibility still has a place—experimentation and design changes are healthy—but only when they remain consistent with the original vision and aligned with the agreed success metrics.</p>
</section>
<section id="the-power-of-test-and-learn" class="level2">
<h2 class="anchored" data-anchor-id="the-power-of-test-and-learn">The Power of Test-and-Learn</h2>
<p>Evaluation and monitoring must be built in from the beginning. Doing so ensures that systems are designed to capture the right metrics for monitoring, rather than scrambling to measure impact after the fact. This means defining not only technical performance measures but also organisational impact measures, all aligned to clear, measurable success metrics. These metrics should be developed collaboratively with stakeholders, and while data scientists may not set them alone, they play a critical role in shaping and challenging them where needed.</p>
<p>A test-and-learn approach is particularly powerful because it generates direct evidence of what works under real-world conditions. For example, a simple test-and-control design, splitting customers into two groups, one acted on and one left as business-as-usual, provides evidence of incremental benefit that is far more persuasive than retrospective accuracy scores. Unlike abstract metrics, this method shows whether interventions truly drive the desired outcomes, and it allows organisations to learn, adapt, and refine strategies over time.</p>
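<p>As a minimal sketch of that design, with invented outcomes and assumed retention rates standing in for real pilot data, a randomised split plus a two-proportion z-test is often enough to show whether the intervention delivers incremental benefit:</p>
<pre class="sourceCode python"><code>import numpy as np
import pandas as pd
from scipy.stats import norm

rng = np.random.default_rng(7)

# Hypothetical pilot: randomise flagged customers between treatment
# (retention call) and control (business as usual). All numbers are invented.
flagged = pd.DataFrame({"customer_id": np.arange(400)})
flagged["group"] = rng.permutation(np.repeat(["treatment", "control"], 200))

# After the pilot window, record outcomes (simulated here for illustration).
flagged["retained"] = np.where(
    flagged["group"] == "treatment",
    rng.binomial(1, 0.78, size=400),  # assumed retention with intervention
    rng.binomial(1, 0.70, size=400),  # assumed business-as-usual retention
)

rates = flagged.groupby("group")["retained"].mean()
uplift = rates["treatment"] - rates["control"]

# Two-proportion z-test: is the incremental retention real or noise?
pooled = flagged["retained"].mean()
se = np.sqrt(pooled * (1 - pooled) * (1 / 200 + 1 / 200))
z = uplift / se
p_value = 2 * norm.sf(abs(z))
print(f"uplift: {uplift:.1%}  z = {z:.2f}  p = {p_value:.3f}")
</code></pre>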
<p>Ultimately, evaluation is about measuring decision performance in practice, while monitoring ensures that impact remains robust as circumstances evolve.</p>
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>In our fictional case, the practitioner was told simply “Predict which customers are likely to leave.”</p>
<p>Had the brief been framed instead as: “Identify the top 50 customers most likely to leave and integrate this into daily retention calls, aiming to save £1M/year in lost contracts,”</p>
<p>– the project would have taken a very different path. From the outset, the practitioner could have:</p>
<ul>
<li>Focused on the right features (e.g.&nbsp;time since last contact, usage trends).</li>
<li>Defined the appropriate technical workflows to meet the business vision, such as how best to process the predictions (e.g.&nbsp;daily batches).</li>
<li>Set evaluation criteria, and how these will be measured and monitored over time, not just for accuracy but for contracts saved and revenue retained. For example, is a dashboard needed to monitor technical and/or business metrics over time?</li>
</ul>
</div>
</div>
</div>
<p>Map the current business process end to end, noting all user interactions and data collection points. Then overlay where the model will integrate into that process—the inputs into the model pipeline, who receives the model outputs, how they are acted on, and how outcomes flow back into the system. This makes clear both the operational impact of the model and what changes are needed for it to deliver value.</p>
</section>
<section id="design-for-value-not-novelty" class="level2">
<h2 class="anchored" data-anchor-id="design-for-value-not-novelty">Design for Value, Not Novelty</h2>
<p>Data science is not about building impressive models for their own sake. It is about solving valuable problems in ways that make business sense.</p>
<p>If a model improves accuracy by two percent but costs ten times more to run, is it worth it? The answer depends on whether those extra points translate into measurable financial impact.</p>
<p>Ask:</p>
<ul>
<li>Could a simpler model deliver “good enough” accuracy at lower cost?</li>
<li>What is the marginal value of added complexity?</li>
<li>Does the design reflect operational constraints?</li>
</ul>
<blockquote class="blockquote">
<p>Here, the product mindset for data science and AI practitioners becomes critical. Treating an AI solution as a product reframes the goal from “building a model” to “delivering value.” Like any product, an AI system has costs to design, build, deploy, and maintain. Its worth lies not in technical elegance but in whether the return justifies those costs. That means asking early: is the investment worth it?</p>
</blockquote>
<p>One practical way to answer that question is by forecasting scenarios. Before scaling, estimate the expected impact under different conditions: a base case, a best case, and a worst case. For example, in a retention project, you might forecast incremental revenue by combining churn rates, average customer value, intervention costs, and expected uplift. This makes assumptions explicit and gives decision-makers a clear view of risk and upside. A solution is rarely a guaranteed win, but scenario planning allows stakeholders to judge whether the likely outcomes justify the investment.</p>
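<p>A back-of-the-envelope version of that scenario forecast fits in a few lines. All figures below are illustrative assumptions, not benchmarks:</p>
<pre class="sourceCode python"><code># Hypothetical best/base/worst forecast for a retention pilot.
# "uplift" is the assumed reduction in churn among contacted customers.
scenarios = {
    "worst": {"uplift": 0.01, "value": 2_000, "contacts": 2_500, "cost_per_contact": 15},
    "base":  {"uplift": 0.03, "value": 2_000, "contacts": 2_500, "cost_per_contact": 15},
    "best":  {"uplift": 0.06, "value": 2_000, "contacts": 2_500, "cost_per_contact": 15},
}

for name, s in scenarios.items():
    retained_revenue = s["uplift"] * s["contacts"] * s["value"]  # incremental revenue kept
    run_cost = s["contacts"] * s["cost_per_contact"]             # intervention cost
    net = retained_revenue - run_cost
    print(f"{name:>5}: net £{net:,.0f} (return {retained_revenue / run_cost:.1f}x cost)")
</code></pre>
<p>Making the assumptions explicit in this way also gives stakeholders something concrete to challenge, which is usually more productive than debating a single point estimate.</p>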
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Consider again the retention example. A complex ensemble might squeeze out a few extra percentage points of accuracy, but a straightforward logistic regression—fast, interpretable, and low-cost—might enable daily scoring and immediate action. Even if slightly less accurate, its ease of deployment and alignment with operational capacity could make it far more valuable overall. Simplicity, in many cases, is the shortest route to measurable business outcomes.</p>
<p>A product mindset also changes how value is communicated. Technical performance metrics—“87% recall with XGBoost”—speak to specialists but mean little to decision-makers. A product framing translates performance into outcomes: “This model could reduce service costs by £800k annually by targeting at-risk customers more effectively.” Such claims should be grounded in defendable assumptions: average customer value, historic retention rates, intervention costs, and expected uplift. Framing matters. Commercial cares about ROI, operations about efficiency and capacity, marketing about campaign effectiveness, and leadership about growth and risk. Lead with the “why,” not the “how,” so the role of the model in delivering value is unmistakable.</p>
<p>In our fictional retention project, the gap wasn’t the algorithm—it was the absence of product-minded, value-first design. A better path would have been to:</p>
<ul>
<li>Co-define the decision and action with Marketing: which customers will be contacted, via which channel, on what cadence.</li>
<li>Quantify a credible return on investment with Commercial by building a simple model using actual retention rates, average customer value, contact costs, and expected uplift—then present best/base/worst cases with explicit assumptions. From there, translate the ROI targets into the model performance thresholds (e.g., precision/recall, lift) required to meet them and the agreed success metrics.</li>
<li>Choose the fastest viable baseline—such as logistic regression—to enable daily scoring and interpretability, and document the marginal value required to justify moving to a more complex ensemble. Factor in time investment and run costs, align these with the ROI calculations above, and use that alignment to communicate and justify the investment. This approach also provides a clear benchmark: if the baseline model cannot meet the agreed success metrics, it helps build the case for investing in more complex methods.</li>
<li>Run a time-boxed pilot with a holdout: a four-to-six-week test-and-control experiment; measure incremental saves, revenue impact, and operational load before scaling.</li>
<li>Set guardrails and monitoring: track decision KPIs (contacts made, saves, £ retained) alongside model KPIs; agree thresholds for retraining and a rollback plan.</li>
</ul>
</div>
</div>
</div>
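<p>For reference, the kind of baseline the callout describes can be genuinely minimal. A hedged sketch, with invented features and simulated labels standing in for the real pipeline:</p>
<pre class="sourceCode python"><code>import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Invented training data; in practice the features (e.g. days since last
# contact, usage trend, tenure) come from the agreed data pipeline.
X = rng.normal(size=(500, 3))
y = rng.binomial(1, 0.3, size=500)  # churned within window (simulated)

baseline = make_pipeline(StandardScaler(), LogisticRegression())
baseline.fit(X, y)

# Coefficients are directly inspectable, one reason a simple baseline is
# easy to explain, monitor, and justify before adding complexity.
print(baseline.named_steps["logisticregression"].coef_)
</code></pre>
<p>If this baseline cannot meet the agreed success metrics, that shortfall itself becomes the documented case for investing in something more complex.</p>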
</section>
<section id="build-for-adoption" class="level2">
<h2 class="anchored" data-anchor-id="build-for-adoption">Build for Adoption</h2>
<p>Adoption must be planned from the start. Trust develops gradually, and regular check-ins with stakeholders help sustain it by keeping the project aligned with its agreed vision. These sessions are not box-ticking exercises but chances to test assumptions, surface blockers, gather continuous feedback and make timely adjustments. Ultimately, a model succeeds only if people use it — so adoption depends on seamless integration into existing processes while delivering something users can see a clear benefit from.</p>
<p>Instead of starting with purely technical questions—such as “will I need to export this to a CSV?”—it is often more effective to begin by considering the user journey. For example, if the end goal is for users to view the results in a dashboard, that should frame the discussion from the outset. Once the user’s needs are clear, the practitioner can then work with the data engineering team to determine the most appropriate technical solution, such as the optimal data format or storage approach.</p>
<p>Hence it is important to ask early:</p>
<ul>
<li>Where will predictions appear (CRM, dashboard, alert)?</li>
<li>Will outputs be delivered in tools people already use?</li>
<li>What training or support is required?</li>
<li>How will impact be made visible to leadership?</li>
<li>How best should the outputs of the model be presented to ensure they are usable and actionable for the next stage of the business process?</li>
</ul>
<p>Thinking about these questions early prevents the familiar fate of a technically brilliant model that sits idle.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2026/02/25/images/body.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%"></p>
</figure>
</div>
<p>Adoption is strongest when development is iterative. Rather than disappearing into a three-month build, teams should work in cycles: release a minimum viable product (MVP), test it with users, gather feedback, and refine. The first iteration of the MVP should be the simplest form of the product while testing the core principle of what the project is trying to achieve. An MVP could be as simple as a weekly spreadsheet with a risk score; if it proves valuable, the team can then invest in automation, dashboards, or more advanced models. This staged approach reduces risk, delivers value early, and builds trust among stakeholders. Crucially, reaching an MVP quickly lets both technical and business teams see what works—and what doesn’t—in practice, instead of relying on endless planning meetings where edge cases are difficult to anticipate.</p>
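<p>To make the MVP idea concrete: a first release really can be this small. The sketch below assumes a hypothetical scored table and invents the file name:</p>
<pre class="sourceCode python"><code>import pandas as pd

# Minimal MVP: a weekly CSV of at-risk customers that Marketing can open
# in a spreadsheet. Columns and scores are illustrative assumptions.
scores = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "churn_prob": [0.91, 0.84, 0.79],
    "suggested_action": ["call", "call", "email"],
})

# One file per week; automation, dashboards, or CRM integration come later,
# once the pilot shows the scores are actually used and useful.
scores.sort_values("churn_prob", ascending=False).to_csv(
    "retention_scores_2026-W09.csv", index=False
)
</code></pre>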
<p>Communication is critical. Just as one study on doctor–patient interactions found that 91% of patients preferred doctors who avoided jargon [1], stakeholders respond more positively when practitioners present results in plain language. Clear explanations build understanding, and understanding builds trust. It is also important to explain, in accessible terms, how a model or tool works “under the hood,” so users can better grasp how decisions are being made. Adoption can be further strengthened by having champions within the business—trusted and respected leaders in the business area who engage end users, promote new tools, and support day-to-day use through training and guidance.</p>
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>In the retention case, adoption failed because the model was delivered as a finished artefact, with no path to use. A better approach would have been to:</p>
<ul>
<li>Deliver an MVP: a simple risk score in a spreadsheet, tested with Marketing on a small pilot group while establishing a continuous feedback loop through feedback forms or stakeholder updates.</li>
<li>Work iteratively with data engineering to integrate predictions into the CRM step by step, rather than aiming for a big-bang deployment: define the CRM fields, score push schedule, ownership of follow-up, and SLAs; confirm who acts on the scores and how outcomes are recorded.</li>
<li>Run a test-and-control pilot to prove incremental benefit, building an evidence base for expansion.</li>
<li>Set up a lightweight KPI dashboard so everyone can see early wins in terms of contracts saved and revenue retained.</li>
<li>Create champions by involving stakeholders at every stage, so they own and advocate for the solution.</li>
</ul>
<p>Had the project taken an iterative, MVP-first approach, the practitioner would have avoided months of sunk effort and built momentum for adoption as trust grew over time. Adoption is not an afterthought—it is the decisive factor that turns technical excellence into sustained impact.</p>
</div>
</div>
</div>
</section>
<section id="the-bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="the-bottom-line">The Bottom Line</h2>
<blockquote class="blockquote">
<p>Great models rarely fail because of poor algorithms; they fail because they are disconnected from the goals, workflows, strategies, and people they are meant to serve.</p>
</blockquote>
<p>To avoid the fate of the Vanishing Model, projects must begin with a clear vision — one that is co-created with stakeholders and sustained through regular check-ins. Frame every project around measurable business outcomes and define success before writing a single line of code.</p>
<p>Prove value under real-world conditions with well designed and measurable evaluation plans such as test-and-control approaches. Weigh technical ambition against practical trade-offs—cost, complexity, deployment speed, and maintainability. Translate precision, recall, and ROC curves into outcomes the business understands: contracts retained, revenue gained, costs reduced. And above all, plan for adoption from day one, so that predictions are not just accurate but usable, trusted, and embedded in daily decisions.</p>
<p>In the end, the mark of a great model is not the elegance of its algorithm but its ability to have a positive impact.</p>
<p><em>For a broader, strategic view of why organisations struggle to realise value from AI—and how leadership and structure can change the odds—check out <a href="https://realworlddatascience.net/applied-insights/case-studies/posts/2026/01/12/why-95-percent-of-ai-projects-fail.html">“Why 95% of AI Projects Fail.”</a></em></p>
<p><strong>Sources:</strong> [1] Allen, K. A., Charpentier, V., Hendrickson, M. A., Kessler, M., Gotlieb, R., Marmet, J., Hause, E., Praska, C., Lunos, S., &amp; Pitt, M. B. (2023). Jargon Be Gone – Patient Preference in Doctor Communication. Journal of Patient Experience, 10, Article 23743735231158942. DOI: 10.1177/23743735231158942.</p>
<div class="article-btn">
<p><a href="../../../../../applied-insights/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the author:</dt>
<dd>
<a href="https://www.linkedin.com/in/jennifer--hall/">Jennifer Hall</a> is a Senior Analytics Manager at Barclays and Co-Vice Chair of the Royal Statistical Society’s Data Science and AI Section. She is an <a href="https://rss.org.uk/resources/resources-for-educators/rss-william-guy-lecturers/">RSS William Guy Lecturer (2025–2026)</a>; this year’s theme, Statistics and AI, aims to inspire young people to understand how statistical thinking underpins AI and shapes the world around them. Jennifer has extensive experience applying data science and advanced analytics to real-world challenges across finance, travel, healthcare, and insurance. This breadth of experience has strengthened her commitment to delivering responsible, data-driven solutions that create meaningful impact for both businesses and society.
</dd>
</dl>
<div class="g-col-12 g-col-md-6">
<p><strong>Copyright and licence</strong> : © 2026 Jennifer Hall<br>
<a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img style="height:22px!important;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1"> <img style="height:22px!important;margin-left:3px;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1"> </a> This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">International licence</a>.</p>
</div>
<div class="g-col-12 g-col-md-6">
<p><strong>How to cite</strong> :<br>
Hall, Jennifer. 2026. “<strong>Why Great Models (Still) Fail</strong>.” <em>Real World Data Science</em>, 2026. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2026/02/25/why_great_models_still_fail.html">URL</a></p>
</div>
</div>


</div>
</div>
</section>

 ]]></description>
  <category>AI</category>
  <category>Data Science</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2026/02/25/why_great_models_still_fail.html</guid>
  <pubDate>Wed, 25 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2026/02/25/images/thumb.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Beyond Quantification: Interview with Professor Sylvie Delacroix on Navigating Uncertainty with AI</title>
  <dc:creator>Annie Flynn</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2026/01/29/beyond-quantification-delacroix-interview.html</link>
  <description><![CDATA[ 





<p>We recently published a <a href="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/11/21/uncertainty.html"><em>Data Science Bite</em></a> breaking down the first position paper of the newly launched journal, <a href="https://academic.oup.com/rssdat"><em>RSS: Data Science and Artificial Intelligence</em></a>. The paper, <a href="https://academic.oup.com/rssdat/article/1/1/udaf002/8317136"><em>Beyond Quantification: Navigating Uncertainty in Professional AI Systems</em></a>, argues that if AI is truly to support professional decision-making in high-stakes fields, we must move beyond probabilistic measures and use participatory approaches that allow experts to collectively express and navigate non-quantifiable forms of uncertainty.</p>
<p><em>Real World Data Science</em> recently had the opportunity to speak to the paper’s lead author, Professor Sylvie Delacroix, about how AI can better support human judgment, why it is crucial to recognise forms of uncertainty that can’t be reduced to numbers, and how participatory design can make AI a true partner, rather than a replacement, for professionals.</p>
<p>Watch the full interview below and scroll down for key takeaways and some analysis.</p>
<hr>
<section id="interview-beyond-quantification-and-uncertainty-in-ai" class="level2">
<h2 class="anchored" data-anchor-id="interview-beyond-quantification-and-uncertainty-in-ai">Interview: Beyond Quantification and Uncertainty in AI</h2>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/tJDy293oqPk" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<hr>
</section>
<section id="key-takeaways-at-a-glance" class="level2">
<h2 class="anchored" data-anchor-id="key-takeaways-at-a-glance">Key Takeaways at a Glance</h2>
<section id="not-all-uncertainty-is-measurable" class="level3">
<h3 class="anchored" data-anchor-id="not-all-uncertainty-is-measurable">Not all uncertainty is measurable</h3>
<p>AI often focuses on quantifiable uncertainty, like probabilities or confidence scores, but ethical and contextual uncertainties are equally important in professions like healthcare, education, and justice.</p>
<blockquote class="blockquote">
<p>“The problem is that if we design these systems in a way that means they’re only capable of communicating these quantifiable types of uncertainty, we risk systematically undermining the significance and importance of non-quantifiable types of uncertainty… which are fundamentally ethical and contextual.”</p>
</blockquote>
</section>
<section id="participatory-ai-matters" class="level3">
<h3 class="anchored" data-anchor-id="participatory-ai-matters">Participatory AI matters</h3>
<p>Systems should let professionals shape how uncertainty is expressed, supporting collaboration and collective judgment rather than replacing human decision-making.</p>
<blockquote class="blockquote">
<p>“The intervention that we want is ideally one that means the systems are mouldable by the users over time… that’s what we mean by participatory interfaces.”</p>
</blockquote>
</section>
<section id="the-goal-is-to-support-and-foster-human-intelligence-not-replace-it" class="level3">
<h3 class="anchored" data-anchor-id="the-goal-is-to-support-and-foster-human-intelligence-not-replace-it">The goal is to support and foster human intelligence, not replace it</h3>
<p>The most valuable AI tools help professionals reflect, reason, and intuitively navigate complex situations, rather than just process more data faster.</p>
</section>
<section id="real-world-ai-is-already-in-use" class="level3">
<h3 class="anchored" data-anchor-id="real-world-ai-is-already-in-use">Real-world AI is already in use</h3>
<p>GPs, teachers, and other professionals are using AI in sensitive ways, sometimes for informal “sense-making” conversations that influence moral judgments.</p>
</section>
<section id="small-refinements-have-big-impact" class="level3">
<h3 class="anchored" data-anchor-id="small-refinements-have-big-impact">Small refinements have big impact</h3>
<p>Features like expressing incompleteness, ethical uncertainty, or alternative perspectives can significantly strengthen professional agency when developed with participatory input.</p>
<blockquote class="blockquote">
<p>“You could imagine a GP flagging an output and saying… it turns out the output could have been very dangerous because it didn’t include key diagnostic tools… and you could then imagine an interesting conversation with other GPs to figure out together how incompleteness should be expressed.”</p>
</blockquote>
</section>
<section id="efficiency-should-not-undermine-judgment" class="level3">
<h3 class="anchored" data-anchor-id="efficiency-should-not-undermine-judgment">Efficiency should not undermine judgment</h3>
<p>AI can save time, but systems must preserve the dynamic, normative nature of the professional practices within which they are deployed to ensure long-term effectiveness.</p>
</section>
<section id="the-time-to-act-is-now" class="level3">
<h3 class="anchored" data-anchor-id="the-time-to-act-is-now">The time to act is now</h3>
<p>Professionals, designers, and regulators need to collectively shape AI tools before design choices are frozen, ensuring they support human-centred, ethical practice.</p>
<blockquote class="blockquote">
<p>“If professionals just wait for regulation to intervene, there’s a risk that regulation will arrive only when design choices are frozen… we all have agency in this; we can’t afford to be passive.”</p>
</blockquote>
<hr>
</section>
</section>
<section id="join-the-conversation" class="level2">
<h2 class="anchored" data-anchor-id="join-the-conversation">Join the conversation</h2>
<p><a href="https://academic.oup.com/rssdat"><em>RSS: Data Science and Artificial Intelligence</em></a> has an open <a href="https://academic.oup.com/rssdat/pages/call-for-papers-uncertainty-in-the-era-of-ai">call for submissions</a> responding to the paper.</p>
<p>Sylvie Delacroix’s work is a call to action for data scientists, designers, and professionals alike. We have a window of opportunity to shape AI systems that encourage humans to keep re-articulating the values they care about.</p>
<p>We want to hear from you. As AI tools become more integrated into high-stakes professions, how can we ensure that systems support human judgment in all its facets rather than simply optimising for efficiency?</p>
<p>Read the full paper <a href="https://academic.oup.com/rssdat/article/1/1/udaf002/8317136">here</a>, or our accessible digest <a href="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/11/21/uncertainty.html">here</a>, and join the conversation about building AI tools that truly serve people, not just processes.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the speaker:</dt>
<dd>
<a href="https://delacroix.uk/">Professor Sylvie Delacroix</a>is the Inaugural Jeff Price Chair in Digital Law at Kings College London. She is also the director of the <a href="https://www.kcl.ac.uk/research/centre-for-data-futures">Centre for Data Futures</a> and a visiting professor at Tohoku University. Her research focuses on the role played by habit within ethical agency, the social sustainability of the data ecosystem that makes generative AI possible and bottom-up data empowerment.
</dd>
</dl>
</div>
</div>


</div>
</section>

 ]]></description>
  <category>Interviews</category>
  <category>AI</category>
  <category>Ethics</category>
  <category>Uncertainty</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2026/01/29/beyond-quantification-delacroix-interview.html</guid>
  <pubDate>Thu, 29 Jan 2026 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2026/01/29/images/thumb.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Why Data Quality Is the New Competitive Edge For Data Scientists</title>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2026/01/27/data-qual-is-competitive-edge.html</link>
  <description><![CDATA[ 





<section id="discipline-of-data-science" class="level2">
<h2 class="anchored" data-anchor-id="discipline-of-data-science">Discipline of Data Science</h2>
<p>Data science is a relatively recent field compared to the disciplines that it consists of, namely statistics and computer science. New knowledge and disciplines often arise as combinations of existing fields and data science is no exception. Born into a world with increasingly large datasets, data science combined statistical methods with the tools of computer science to analyze large amounts of data.</p>
<p>As a discipline, data science gained popularity in industry and academia through the 2000s and early 2010s, in part due to an influential paper published in the Harvard Business Review <span class="citation" data-cites="davenport2012sexiest">(1)</span>. This paper carried the provocative title of “Data Scientist: The Sexiest Job of the 21st Century”. After its publication, universities developed data science programs, governments poured funding into Big Data initiatives and organizations built data science teams to tackle problems. Powerful machine learning algorithms such as neural nets, random forests and ensemble methods provided ways to develop complex models that could be used on these large datasets.</p>
<p>As Hoerl noted in an earlier piece <span class="citation" data-cites="hoerl2025future">(2)</span>:</p>
<blockquote class="blockquote">
<p>“There was a hiring rush for data scientists, not just in technology companies, but in virtually all sectors of the economy. For example, GE hired a new Chief Digital Officer from Oracle, Bill Ruh, in 2011. Ruh opened a new Software Center (later renamed “GE Digital”) in San Ramon, California in 2012, and by 2016 had hired 1,400 data scientists there.”</p>
</blockquote>
</section>
<section id="current-state-of-play-with-generative-ai" class="level2">
<h2 class="anchored" data-anchor-id="current-state-of-play-with-generative-ai">Current State of Play with Generative AI</h2>
<p>While text-based data analyses such as NLP (natural language processing) have been around for a while, a new type of approach has quickly become widespread with the rapid rise of generative AI methods. These models, known as LLMs (large language models), have entered the everyday vernacular as all organizations grapple with how to use these tools. Since the debut of ChatGPT in late 2022, LLMs have become increasingly sophisticated, trained on a wider variety of data sources and optimized for a variety of different scenarios.</p>
<p>The accuracy of these models depends on the quality of the training data used. While LLMs are known to hallucinate or give inaccurate answers, they tend to perform better in situations where the training data is precise and exact, with less nuance than language often carries. As a result, LLMs can produce large amounts of computer code, having been trained on large corpora of accurate code. Some specialized tools (e.g.&nbsp;<a href="https://claude.ai/">Claude Code</a>) have been specifically designed to generate accurate code that can access, clean, combine, analyze and visualize data. While LLMs are not perfect in code generation, they can increase the efficiency of an experienced coder.</p>
<p>As a result of these LLMs, less knowledge is required to analyze and work with large datasets. A user can provide a specific prompt on the business question or research objective, upload the relevant data and have an LLM provide a relevant data analysis, complete with the underlying code used to generate that analysis.</p>
</section>
<section id="the-new-data-scientist" class="level2">
<h2 class="anchored" data-anchor-id="the-new-data-scientist">The “New” Data Scientist</h2>
<p>As LLMs improve in their ability to quickly generate vast amounts of accurate code, what does that mean for a discipline which has prided itself on its data wrangling skills? As Davenport and Patil noted, “data scientists’ most basic, universal skill is the ability to write code” <span class="citation" data-cites="davenport2012sexiest">(1)</span>. Within data science, coding has long been seen as a viable career path.</p>
<p>When a skill becomes accessible to a wider variety of people and can be automated, how does one distinguish oneself in an organization? When coding can be done by AI tools, what happens to those who are known for their coding abilities? For a discipline to be recognized as a discipline, it must have some distinguishing characteristic that sets it apart from other disciplines. In addition, disciplines become more valuable and prominent as their contribution to society grows.</p>
<p>So, what are the skills that data scientists have that can’t be done well by AI? While AI capabilities are rapidly increasing, we do believe there are things that may be beyond the reach of AI for the time being.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2026/01/27/images/infographic.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%"></p>
</figure>
</div>
<p>We believe that the greatest factor limiting AI is data quality. There seems to be a growing consensus on the importance of data quality, as noted in Davenport, Hoerl, and Redman <span class="citation" data-cites="davenport2025unstructured">(3)</span>, Davenport and Tiwari <span class="citation" data-cites="davenport2024generative">(4)</span>, and Redman <span class="citation" data-cites="redman2020source">(5)</span> — as well as <a href="https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/data-detectives.html">recent Real World Data Science pieces</a>. As the adage goes, “garbage in, garbage out”. With poor, inaccurate data used in training, the resulting AI output will also be poor and inaccurate. A related fear is that, as AI models start to use AI-generated content as a data source, a recursive loop emerges that degrades the quality of any AI output <span class="citation" data-cites="shumailov2024collapse">(6)</span>.</p>
<p>Discussions of data quality are limited in books, university courses and training programs. When they do occur, they are often restricted to the question of “are the data right?” and to data cleaning. Data cleaning is often focused on eliminating outliers or invalid points. While there is often good reason to remove invalid points, those outliers can sometimes be the source of valuable insights.</p>
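<p>A small illustration of the difference, using invented sensor readings and an assumed sentinel code for missing values: flag suspect points for review rather than silently dropping them, so potential insights are preserved.</p>
<pre class="sourceCode python"><code>import numpy as np
import pandas as pd

# Invented readings; -999.0 is an assumed "missing value" sentinel code.
readings = pd.DataFrame({"sensor_id": [1, 2, 3, 4, 5],
                         "value": [21.3, 22.1, 150.0, 20.8, -999.0]})

# Robust outlier rule (median absolute deviation), used only to *flag*,
# never to delete: an unusual value may be an error, or the insight itself.
median = readings["value"].median()
mad = (readings["value"] - median).abs().median()
readings["flag"] = np.where(
    readings["value"] == -999.0, "invalid_code",
    np.where((readings["value"] - median).abs() > 5 * mad,
             "outlier_review", "ok"),
)
print(readings)
</code></pre>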
<p>However, data quality is much more than data cleaning or checking to make sure the data are accurate. There is an element of contextual understanding and process knowledge that enables the data scientist to properly prepare the data for analysis. We are skeptical of AI’s ability to fully understand context and the nuances of assumptions that go into data analysis. In an earlier piece, Jensen provided some examples of the limitations of AI when it comes to proper data cleaning <span class="citation" data-cites="jensen2024cleaning">(7)</span>. For any set of data, subject matter knowledge of how the data were collected and what they represent is crucial to a proper analysis.</p>
<p>This creates an opportunity for data scientists to become more valuable. By employing probing questions to better understand the context of the data, they will be in a better position to identify data quality issues and ways to improve the data quality, thus leading to better model output.</p>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>As coding becomes easier in an AI-enabled world, where anyone can code and analyze data, the skill set of a data scientist becomes less unique. Data scientists were once in high demand because they set themselves apart as coding wizards who could wrangle large datasets and extract insights. To remain successful and continue to deliver value, data scientists must now pivot their skillset. The real limiting factor in successful data science is data quality. A renewed focus on owning, improving and governing data quality will not only strengthen outcomes but also provide future job security and increase the value data scientists bring to organisations.</p>
<p><em>Note that this article is based on the following paper and contains some of the same ideas: Hoerl, Roger W. 2025. <a href="https://www.tandfonline.com/doi/full/10.1080/08982112.2025.2556222">“The Future of Statistics in an AI Era.”</a> Quality Engineering, published September 10, 2025.</em></p>
<p>You can find out more about the <em>Real World Data Science</em> stance on data quality from our article <a href="https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/data-detectives.html">‘Why We Should All Be Data Quality Detectives’</a>.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the authors:</dt>
<dd>
<a href="https://www.linkedin.com/in/roger-hoerl-6b6b3a5/">Roger Hoerl</a> is Brate-Peschel Professor of Statistics at Union College, after previously heading the Applied Statistics Laboratory at <a href="https://www.ge.com/news/reports/tag/ge%20global%20research">GE Global Research</a> for many years. He has been elected to the International Statistical Institute and the International Academy for Quality, recieved numerous statistic awards, and authored five books in the areas of statistics and business improvements.
</dd>
<dd>
<a href="https://www.linkedin.com/in/willis-jensen-305bba6/">Willis Jensen</a> is data and analytics expert, currently Senior Manager of People Analytics and Business Intelligence at <a href="https://chghealthcare.com/">CHG Healthcare</a>. He is an Adjunct Professor of Statistics at Brigham Young University, <a href="https://willisjensen.substack.com/">writes on Substack</a> and is a member of the Real World Data Science <a href="https://realworlddatascience.net/the-pulse/editors-blog/posts/2022/10/18/meet-the-team.html">editorial board</a>.
</dd>
</dl>
<div class="g-col-12 g-col-md-6">
<p><strong>Copyright and licence</strong> : © 2026 Roger Hoerl and Willis Jensen<br>
<a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img style="height:22px!important;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1"> <img style="height:22px!important;margin-left:3px;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1"> </a> This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">International licence</a>.</p>
</div>
<div class="g-col-12 g-col-md-6">
<p><strong>How to cite</strong> :<br>
Hoerl, Roger, and Willis Jensen. 2026. “<strong>Why Data Quality Is the New Competitive Edge for Data Scientists</strong>.” <em>Real World Data Science</em>, 2026. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2026/01/27/data-qual-is-competitive-edge.html">URL</a></p>
</div>
</div>
</div>


</div>

</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body">
<div id="ref-davenport2012sexiest" class="csl-entry">
<div class="csl-left-margin">1. </div><div class="csl-right-inline">Davenport TH, Patil DJ. Data scientist: The sexiest job of the 21st century. Harvard Business Review. 2012 Oct;70–6.</div>
</div>
<div id="ref-hoerl2025future" class="csl-entry">
<div class="csl-left-margin">2. </div><div class="csl-right-inline">Hoerl RW. The future of statistics in an AI era. Quality Engineering. 2025. doi:<a href="https://doi.org/10.1080/08982112.2025.2556222">10.1080/08982112.2025.2556222</a></div>
</div>
<div id="ref-davenport2025unstructured" class="csl-entry">
<div class="csl-left-margin">3. </div><div class="csl-right-inline">Davenport TH, Hoerl RW, Redman TC. To create value with AI, improve the quality of your unstructured data. Harvard Business Review [Internet]. 2025 May. Available from: <a href="https://hbr.org/2025/05/to-create-value-with-ai-improve-the-quality-of-your-unstructured-data">https://hbr.org/2025/05/to-create-value-with-ai-improve-the-quality-of-your-unstructured-data</a></div>
</div>
<div id="ref-davenport2024generative" class="csl-entry">
<div class="csl-left-margin">4. </div><div class="csl-right-inline">Davenport TH, Tiwari P. Is your company’s data ready for generative AI? Harvard Business Review [Internet]. 2024 Mar. Available from: <a href="https://hbr.org/2024/03/is-your-companys-data-ready-for-generative-ai">https://hbr.org/2024/03/is-your-companys-data-ready-for-generative-ai</a></div>
</div>
<div id="ref-redman2020source" class="csl-entry">
<div class="csl-left-margin">5. </div><div class="csl-right-inline">Redman TC. To improve data quality, start at the source. Harvard Business Review [Internet]. 2020 Feb. Available from: <a href="https://hbr.org/2020/02/to-improve-data-quality-start-at-the-source">https://hbr.org/2020/02/to-improve-data-quality-start-at-the-source</a></div>
</div>
<div id="ref-shumailov2024collapse" class="csl-entry">
<div class="csl-left-margin">6. </div><div class="csl-right-inline">Shumailov I, Shumaylov Z, Zhao Y, Papernot N, Anderson R, Gal Y. AI models collapse when trained on recursively generated data. Nature. 2024;631(8022):755–9.</div>
</div>
<div id="ref-jensen2024cleaning" class="csl-entry">
<div class="csl-left-margin">7. </div><div class="csl-right-inline">Jensen WA. Can data cleaning be automated? [Internet]. 2024. Available from: <a href="https://willisjensen.substack.com/p/can-data-cleaning-be-automated">https://willisjensen.substack.com/p/can-data-cleaning-be-automated</a></div>
</div>
</div></section></div> ]]></description>
  <category>Data quality</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2026/01/27/data-qual-is-competitive-edge.html</guid>
  <pubDate>Tue, 27 Jan 2026 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2026/01/27/images/thumb.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Keeping the Science in Data Science</title>
  <dc:creator>Willis Jensen, Fatemeh Torabi, Monnie McGee, Isabel Sassoon</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2025/12/03/scienceindatascience.html</link>
  <description><![CDATA[ 





<p>Have you ever run an elegant ML model that landed flat with those who were supposed to use the insights? Do you find yourself deep into building hundreds of features for your model without knowing exactly what they all mean? Do you spend the bulk of your time tweaking your algorithms while aiming for incremental improvements in accuracy? If so, you might be focused more on the “data” aspects of “data science” than the “science” aspects.</p>
<section id="two-foundational-elements-of-data-science" class="level2">
<h2 class="anchored" data-anchor-id="two-foundational-elements-of-data-science">Two Foundational Elements of Data Science</h2>
<p>“Data Science” contains two essential components, “data” and “science”, and the field requires holding both in equilibrium. Data is the raw material molded in the service of Science: Data may come first in the name, but Science is no less important. Data is the foundation, and Science gives it purpose.</p>
<p>What do we mean by Science? We’re referring specifically to the scientific method as an approach to gaining knowledge: the process of formulating ideas and hypotheses about the world around us and collecting data to determine the validity of those ideas. By hypotheses we don’t mean only strict statistical hypothesis tests, but the general process of formulating a research question, gathering appropriate data and advancing human knowledge, regardless of the statistical techniques or machine learning algorithms employed. Science, at its core, is about using data to gain insight and understanding about the complex universe we inhabit.</p>
<p>The scientific method has a long history and is generally defined in terms of steps such as these, <a href="https://en.wikipedia.org/wiki/Scientific_method">noted in Wikipedia</a> (Scientific Method, 2025):</p>
<ol type="1">
<li><p>Characterizations (observations, definitions, and measurements of the subject of inquiry)</p></li>
<li><p>Hypotheses (theoretical, hypothetical explanations of observations and measurements of the subject)</p></li>
<li><p>Predictions (inductive and deductive reasoning from the hypothesis or theory)</p></li>
<li><p>Experiments (tests of all of the above)</p></li>
</ol>
<p>The relationship between Data and Science is cyclical. Performing good science requires gathering good data, informed by proper experimental design techniques, which in turn require appropriate analysis and interpretation in the context of the science. And good research (science) often generates more questions than it answers, giving way to the need to gather more data, and so on. As such, data science is more than just confirming hypotheses or generating insights; it becomes the application of the scientific method.</p>
<p>The scientific method should be the scaffold supporting what data scientists do. While data scientists come from a variety of backgrounds, many have more training in computer science than in statistical methodology, and more experience with software tools than with executing the scientific method.</p>
<p>In an influential, relevant paper, Shmueli (2010) described two major types of statistical modeling: (i) explanatory models, which attempt to determine causal effects, and (ii) predictive models, which seek accurate predictions. While predictive models can lead to understanding and possibly to explanatory models, explanatory models tend to be preferred by those seeking scientific explanations for phenomena.</p>
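<p>To make the contrast concrete, here is a minimal sketch of the two flavours in scikit-learn (our illustration with a stock dataset, not Shmueli’s example): an interpretable logistic regression whose coefficients invite explanation, alongside a flexible random forest judged mainly on held-out accuracy.</p>
<pre><code># A minimal sketch of Shmueli's explanatory/predictive distinction,
# using scikit-learn (our illustration, not from the paper).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Explanatory flavour: a small, interpretable model whose standardized
# coefficients can be examined (causal claims still need careful design).
explain = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
explain.fit(X_train, y_train)
coefs = explain.named_steps["logisticregression"].coef_[0]
print(sorted(zip(X.columns, coefs), key=lambda t: -abs(t[1]))[:3])

# Predictive flavour: a flexible algorithmic model judged on held-out accuracy.
predict = RandomForestClassifier(n_estimators=300, random_state=0)
predict.fit(X_train, y_train)
print("logistic accuracy:", round(explain.score(X_test, y_test), 3))
print("forest accuracy:  ", round(predict.score(X_test, y_test), 3))</code></pre>
<p>Neither model is “right”: the two answer different questions, and the scientific aim should determine which is fitted.</p>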
</section>
<section id="data-modeling-culture-versus-algorithmic-culture" class="level2">
<h2 class="anchored" data-anchor-id="data-modeling-culture-versus-algorithmic-culture">Data Modeling Culture Versus Algorithmic Culture</h2>
<p>Leo Breiman, <a href="https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.full">in a famous paper</a>, described two paradigms: (i) the data modelling approach, which assumes a model for the process that generates the data, and (ii) the algorithmic approach, which relies on flexible methods that make no assumptions about an underlying data-generating model. Breiman (2001a) felt that the statistics discipline was missing out on opportunities by focusing on data modelling and making too little use of algorithmic approaches. He practiced what he preached, developing new algorithmic methods and encouraging the field to increase its focus on algorithms. For example, he introduced <a href="https://link.springer.com/article/10.1023/A:1010933404324">Random Forests</a>, starting a cascade of more algorithmic approaches to modelling (Breiman, 2001b).</p>
<p>This wise counsel from Breiman encouraged those working with data to build more expertise in algorithms, promoting the algorithmic culture as a way to harness the power of new computational techniques. We would equate Breiman’s algorithmic approach with a greater focus on the Data side of Data Science, and the data modelling approach with a greater focus on the Science side.</p>
</section>
<section id="balancing-data-and-science" class="level2">
<h2 class="anchored" data-anchor-id="balancing-data-and-science">Balancing Data and Science</h2>
<p>Just as pendulums swing back and forth, the field’s pendulum has swung too hard towards predictive accuracy (the Data side) at the expense of contextual interpretation (the Science side). This swing is evidenced by the growing demand for explainable ML methods (see Alangari et al.&nbsp;2023 for an example). One such method uses Shapley values to elicit and rank the most important features in an ML model (see Rozemberczki et al.&nbsp;2022 for an introduction). It seems ironic that, in the rush to gain accuracy with sophisticated models containing hundreds of features, end users still want something they can understand and explain. In other words, they still want scientific knowledge and an understanding of cause and effect, even for complicated problems.</p>
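<p>For readers who want to see what this looks like in practice, here is a hedged sketch (ours, with an arbitrary public dataset) of computing Shapley values with the <code>shap</code> package and turning them into a global feature ranking:</p>
<pre><code># A sketch of ranking features by Shapley values with the `shap` package
# (our example; any tree ensemble and dataset could be substituted).
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # shape: (n_samples, n_features)

# Mean absolute Shapley value per feature gives a global importance ranking.
importance = abs(shap_values).mean(axis=0)
for name, score in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")</code></pre>
<p>Such a ranking is a starting point for scientific questions about <em>why</em> those features matter, not an end in itself.</p>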
<p>So what is the best approach from a scientific perspective? Throw as many features into a model as you can think of and see which ones turn out to be the most important? Or is there some thought and care that can go into feature selection, considering what might matter given your knowledge of the science behind the problem?</p>
<p>We’re not suggesting that it is bad to include many features in a model; we’re suggesting that the context of the problem can provide insight into which features might matter. Of course, we don’t want to jump to conclusions about what we think is important and miss opportunities to learn. We seek a balance: drawing on previous knowledge and experience without increasing the risk of confirmation bias in the feature selection process.</p>
<p>In software engineering, there is a well-known warning: “premature optimisation is the root of all evil” (see Hyde 2009). The same applies in Data Science. Too often, teams rush to optimise models, tuning hyper-parameters, stacking architectures, and searching for marginal gains, before clearly defining the scientific question or validating whether the data and assumptions are appropriate. This tendency leads to models that are mathematically elegant but scientifically ungrounded. Optimisation should follow understanding, not precede it. A model that captures the right question with moderate accuracy is far more valuable than one that optimises the wrong target to perfection. This limitation of models is reflected in the famous aphorism “All models are wrong, but some are useful,” most commonly associated with the British statistician George Box, who wrote (Box 1976):</p>
<p><em>“Since all models are wrong, the scientist cannot obtain a ‘correct’ one by excessive elaboration. On the contrary, following William of Occam, he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist, so overelaboration and overparameterisation is often the mark of mediocrity.”</em></p>
<p>Is it okay to use black box models where accuracy is paramount and to ignore explainability? Yes, for some problems. But should we use black box models for all problems? No.&nbsp;The key to being a good data scientist/statistician is to recognize when one approach provides more value than another and to use the best approach for the problem at hand.</p>
<p>So how does one give more attention to the Science side of Data Science? It starts with more attention to the question of interest. The type of question matters less than its presence: it might be a research question, a business problem to solve, or something sparked by curiosity. And often it is not a single question but a series of cascading questions, each one digging deeper towards root causes. To manage this complexity effectively, it helps to adopt a modular approach, structuring analytical work into well-defined, interlinked components that mirror the scientific process. Each module focuses on a specific purpose: formulating and refining hypotheses, understanding data provenance and quality, developing and validating models, and translating findings into meaningful actions. Such modularisation keeps the process transparent and iterative, prevents premature optimisation, and ensures that model development remains anchored to the underlying scientific inquiry rather than drifting towards technical over-engineering. With this increased attention, we believe sampling methods and experimental design will remain fundamental.</p>
<p>Here’s one example, loosely based on our work experience. A business executive has reports showing an increase in turnover at their organization, which is driving up hiring and recruiting costs and making the company less profitable. We find that turnover is higher among those newer to the company, which leads to the question: why are these newer employees leaving? This suggests a hypothesis that newer employees are not getting the leadership support they need, which leads to questions about the effectiveness of leadership training programs, which in turn leads to questions about how we measure that effectiveness. By continuing to ask questions, we can target root causes more precisely and thus increase the impact of our work.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/12/03/images/businessexec.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%"></p>
</figure>
</div>
<p>With the availability of so many algorithms and approaches able to process large amounts of data, it can be tempting to gravitate towards them. When teaching analysis and applied statistics courses, it is important to look beyond the methods and consider the overall aim. Several long-standing frameworks can help in finding the right balance. One example is PPDAC (Problem, Plan, Data, Analysis, Conclusions; MacKay and Oldford 2000), which emphasises all the steps beyond the modelling itself and the importance of considering them all. Using such frameworks can help decide whether a black box approach is suitable in a given situation or whether it won’t achieve the overall intended aim.</p>
</section>
<section id="finding-balance" class="level2">
<h2 class="anchored" data-anchor-id="finding-balance">Finding Balance</h2>
<p>So how do we ensure a good balance between Data and Science in Data Science?</p>
<p>One way is to ask “so what?” of any analysis you do and any model you build. Ideally, you would ask that at the beginning of a project to reduce wasted effort, but at any stage it should be clear how the analysis output will be used. “So that we can publish the output in a paper” is not a sufficient answer. You have to think about the impact of the analysis. Will it change a decision that is being made? Does it create a new insight that can be acted upon? Does it lead to a process improvement or a new product innovation? Does it lead to a new way of running an organization? Data science that doesn’t lead to some action or insight is just computation for computation’s sake. Vance et al.&nbsp;(2022) provide additional resources and advice on how to ask good questions.</p>
<p>A second way is to consider the potential explanations and meaning behind any model. Don’t become so enamored with the predictive accuracy of a model (which isn’t inherently a bad thing) that you stop asking whether there are potential scientific explanations based on the features it uses. Use a predictive model as a starting point for digging deeper and finding a smaller set of features that provide deeper insight into potential causal relationships to explore. Sometimes a simpler model that is easier to “explain”, or one that uses trusted data, provides more value than the latest and greatest algorithm.</p>
<p>A third way is a continuing emphasis on the reproducibility of results. Clean code, documentation of results, version control of analysis code, and open sharing of the code with its underlying assumptions are best practices that help ensure others can replicate the findings of any data science output. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2023/11/06/how-to-open-science.html">Sassoon (2023)</a> provides additional guidance for ensuring reproducibility and transparency of results.</p>
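<p>As a small illustration of these habits (our sketch, not a prescription from any single source), an analysis script can fix its sources of randomness and write a manifest of the computing environment alongside its results, so that others can rerun it under the same conditions:</p>
<pre><code># A sketch of two reproducibility habits: fixing randomness and
# recording the environment alongside the results.
import json
import platform
import random
import sys

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)  # any stochastic step below now reruns identically

# Save a manifest so others can replicate the findings.
manifest = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "seed": SEED,
}
with open("run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)</code></pre>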
<p>A final way to find balance is to better understand how the data are generated. Hoerl (2025) touched on this issue in calling for statisticians to focus more on data quality. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/data-detectives.html">We believe this advice to be equally relevant for data scientists.</a> By recognizing the crucial importance of the data generation process, data scientists will be better able to use the right data for the problem of interest and to push for changes as needed to ensure high-quality data.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/12/03/images/balance.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%"></p>
</figure>
</div>
</section>
<section id="conclusions" class="level2">
<h2 class="anchored" data-anchor-id="conclusions">Conclusions</h2>
<p>We encourage data scientists to live up to their name and become experts in both Data and Science. As they find the proper balance between the two, their impact and influence will increase, and their rigorous work will stand up to scrutiny and advance human knowledge.</p>
<p><strong>References</strong></p>
<p>Alangari, N., Menai, M. E. B., Mathkour, H., &amp; Almosallam, I. (2023). Exploring evaluation methods for interpretable machine learning: A survey. <em>Information</em>, <em>14</em>(8), 469.</p>
<p>Box, G. E. P. (1976). Science and statistics. <em>Journal of the American Statistical Association</em>, <em>71</em>(356), 791–799. doi:10.1080/01621459.1976.10480949</p>
<p>Breiman, L. (2001a). Statistical modeling: The two cultures (with comments and a rejoinder by the author). <em>Statistical Science</em>, <em>16</em>(3), 199–231.</p>
<p>Breiman, L. (2001b). Random forests. <em>Machine Learning</em>, <em>45</em>, 5–32.</p>
<p>Hoerl, R. W. (2025). The future of statistics in an AI era. <em>Quality Engineering</em>. Advance online publication.</p>
<p>Hyde, R. (2009). The fallacy of premature optimization. <em>Ubiquity</em>, <em>2009</em>(February).</p>
<p>MacKay, R. J., &amp; Oldford, R. W. (2000). Scientific method, statistical method and the speed of light. <em>Statistical Science</em>, <em>15</em>(3), 254–278.</p>
<p>Rozemberczki, B., Watson, L., Bayer, P., Yang, H. T., Kiss, O., Nilsson, S., &amp; Sarkar, R. (2022). The Shapley value in machine learning. In <em>The 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence</em> (pp.&nbsp;5572–5579). International Joint Conferences on Artificial Intelligence Organization.</p>
<p>Sassoon, I. (2023, November 6). How to ‘open science’: A brief guide to principles and practices. <em>Real World Data Science</em>. https://realworlddatascience.net/foundation-frontiers/posts/2023/11/06/how-to-open-science.html</p>
<p>Scientific method. (2025, November). In <em>Wikipedia</em>.</p>
<p>Shmueli, G. (2010). To explain or to predict? <em>Statistical Science</em>, <em>25</em>(3), 289–310.</p>
<p>Vance, E. A., Trumble, I. M., Alzen, J. L., &amp; Smith, H. S. (2022). Asking great questions. <em>Stat</em>, <em>11</em>(1), e471.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the authors</dt>
<dd>
This article was authored by some of our editorial board members. You can find their bios <a href="https://realworlddatascience.net/the-pulse/editors-blog/posts/2022/10/18/meet-the-team.html">on our team page</a>.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2025 Willis Jensen, Fatemeh Torabi, Monnie McGee, Isabel Sassoon.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<p>: Jensen, Willis. Torabi, Fatemeh. McGee, Monnie. Sassoon, Isabel. “Keeping the Science in Data Science,” Real World Data Science, December 04, 2025. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2025/11/27/scienceindatascience.html">URL</a></p>
</div>


</div>
</div>
</section>

 ]]></description>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2025/12/03/scienceindatascience.html</guid>
  <pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2025/12/03/images/thumbnail2.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>AI for Social Good: Interview with the Founder of Mike Hudson Foundation</title>
  <dc:creator>Annie Flynn</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2025/11/27/MHF-interview.html</link>
  <description><![CDATA[ 





<p>Since the emergence of generative AI such as OpenAI’s ChatGPT at the end of 2022, data science practitioners have watched large language models (LLMs) become both a transformative capability and a source of organisational anxiety. Few people sit closer to this intersection than <strong>Mike Hudson</strong>, a former fintech entrepreneur who now leads initiatives deploying AI for social good through the <a href="https://www.mikehudsonfoundation.org/">Mike Hudson Foundation</a>. MHF’s current projects include <a href="https://www.knowbot.uk/">Knowbot</a>, an LLM-powered tool designed to make complex websites more accessible. Knowbot uses LLMs to read and distill curated information to answer users’ complex questions.</p>
<p>Previously, MHF created <a href="https://www.testramp.org/">TestRAMP</a>, an ambitious effort during the COVID-19 pandemic to mobilise private lab PCR capacity for public benefit which also identified <a href="https://www.smf.co.uk/publications/testramp-marketplace-covid-testing/">potential lessons for future crisis situations</a>.</p>
<p>In this interview, Mike speaks candidly about the challenges and opportunities of deploying AI in nonprofit settings, and the lessons learned from building responsible, high-impact technology during moments of rapid change.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/11/27/images/simplequestion.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%"></p>
</figure>
</div>
<section id="the-appetite-for-ai-in-the-third-sector" class="level2">
<h2 class="anchored" data-anchor-id="the-appetite-for-ai-in-the-third-sector">The Appetite for AI in the Third Sector</h2>
<p><strong>Q: What led you to start working on Knowbot in the first place?</strong></p>
<p>I used to be a business entrepreneur with several tech and fintech businesses. That came to an end when I sold those businesses and then, when Covid started, it seemed like there was an opportunity to give something back. I founded TestRAMP, my first nonprofit, and that experience was so enjoyable and so productive. Once the pandemic eased, we had a foundation in place but no clear mission. Then ChatGPT launched, the world got excited about large language models, and we thought: could we use our tech backgrounds and some funding to build something for the social good?</p>
<p>Knowbot grew out of a simple question: Is there an appetite within the world of nonprofits for LLMs — and is there a use case for them? We needed a very simple, low-risk, easy-to-grasp AI use case that could act as a gateway for organisations cautiously exploring LLMs. Knowbot became that gateway.</p>
<p>We pushed it out to some of the non-profits we were already working with and it’s been interesting. There is some appetite for it, and the appetite is increasing. We use Knowbot as a jumping-off point to start a conversation about AI.</p>
<p><strong>Q: Are there patterns in which charities are more open to adopting AI?</strong></p>
<p>The biggest differentiator isn’t size or budget — it’s culture. We are finding it most productive to work with medium-to-large nonprofits with a scientific or research-oriented culture, where the internal decision-makers tend to “get” AI more quickly. Because our single biggest challenge, far and away above any technical or scaling challenges, or anything to do with IT or LLM, has been accessing the right decision-makers inside organisations.</p>
<p>The sector is understandably cautious and we are new kids on the block — Knowbot didn’t exist a year ago. And we have realised that word of mouth recommendations will be key to our growth: credibility has to be built one relationship at a time with the right nonprofit partners.</p>
</section>
<section id="designing-for-maximum-ease" class="level2">
<h2 class="anchored" data-anchor-id="designing-for-maximum-ease">Designing for Maximum Ease</h2>
<p><strong>Q: How did you approach technical deployment?</strong></p>
<p>From day one, we knew getting anything onto a nonprofit’s website would be difficult. So, technically speaking, we’ve made it as easy as we possibly can. Knowbot runs almost entirely on our servers. On a partner’s website, it appears as a JavaScript button in the corner of the user’s screen which, when clicked, loads a small Knowbot window where they can ask questions. Knowbot loads its interface from our servers in Frankfurt. There’s no client-side coding or backend integration required. That was deliberate, to allow a very straightforward deployment, and it should take new partners only a couple of hours to implement. We’ve recently added an even simpler option: some partners now just link to a branded page that looks like their website but actually lives on our servers.</p>
<p>Behind the scenes, Knowbot is written mostly in Python and uses LLMs from Anthropic, OpenAI, Meta, Google, and Perplexity. We don’t develop our own models — few organisations on Earth have the budget for that. Instead, we “build the car around the engine,” and it’s a slightly different car for each non-profit that we work with.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/11/27/images/thumbnailsocial.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%"></p>
</figure>
</div>
</section>
<section id="ethics-by-design-domain-restriction-as-a-safety-mechanism" class="level2">
<h2 class="anchored" data-anchor-id="ethics-by-design-domain-restriction-as-a-safety-mechanism">Ethics by Design: Domain Restriction as a Safety Mechanism</h2>
<p><strong>Q: As an AI-for-good nonprofit, how do you approach issues of risk, bias, and governance?</strong></p>
<p>With nonprofits, we are hyper-aware of these issues. Some of the organisations we work with handle sensitive information, so we need to be sure we’re not introducing new risks. With a non-profit that is working in healthcare, for example, if there is a possibility that Knowbot may be used for medical-adjacent questions, then we need to think carefully about whether that is something we should be doing and, if so, whether we can do it safely.</p>
<p>One major choice we made early on was domain restriction. Whereas most of the big answer engines out there, like ChatGPT, search the full public internet, Knowbot will only use its internal knowledge and the specific website(s) on which it’s based (i.e.&nbsp;the non-profit’s own website(s)). That means the knowledge is curated and the nonprofit knows exactly what information Knowbot can draw on, which dramatically reduces the risk of hallucination, misinformation, or unsafe advice.</p>
<p>We also adjust our prompts continually based on feedback. For example, we hadn’t anticipated that users would ask questions like “Who are you?” or “What is Knowbot?” Because the model had no context, it responded unpredictably. So we now require partners to include a “What is Knowbot?” page on their site, which Knowbot can reference.</p>
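<p><em>Editor’s note:</em> Knowbot’s code is not public, but the domain restriction Mike describes can be sketched in a few lines of Python. Everything below (names, URLs, structure) is a hypothetical illustration, not Knowbot’s implementation:</p>
<pre><code># Hypothetical sketch of domain-restricted retrieval (not Knowbot's actual code).
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"example-nonprofit.org"}  # the partner's own site(s)

def allowed(url: str) -> bool:
    """Keep only passages sourced from the curated allow-list."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

def build_context(passages):
    """passages: iterable of (url, text) pairs from a retrieval index."""
    return "\n\n".join(text for url, text in passages if allowed(url))

# Only the filtered context (never the open web) is placed in the LLM
# prompt, so answers can draw solely on curated content.</code></pre>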
</section>
<section id="evolving-with-the-technology" class="level2">
<h2 class="anchored" data-anchor-id="evolving-with-the-technology">Evolving with the Technology</h2>
<p><strong>Q: What technical challenges have you encountered?</strong></p>
<p>I think we have been really lucky in terms of timing. To do what we’re doing now five years ago would have been completely impossible, because the LLMs weren’t there. From an infrastructure perspective, newer cloud hosting tools such as Render now let us deploy servers in minutes. That’s just a breath of fresh air. It takes away a lot of the operational heartache. And coding has become dramatically easier—AI assistance means we can build things now that would have taken us weeks before.</p>
<p>We’ve also been lucky in that LLM technology has improved fast enough that we’ve been able to incorporate new functionality almost as quickly as our nonprofit partners have requested it. For example, we now allow partners to restrict Knowbot to particular sections or topics within a website, or to sit across multiple websites. This has only become practical as models and retrieval tools have matured. Development has become faster, too: for example, we can now submit an entire codebase into our coding LLMs and ask questions about it. By comparison, when ChatGPT first launched we could only submit a small section of a program at a time.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/11/27/images/impact.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%"></p>
</figure>
</div>
</section>
<section id="impact-what-knowbot-is-changing" class="level2">
<h2 class="anchored" data-anchor-id="impact-what-knowbot-is-changing">Impact: What Knowbot Is Changing</h2>
<p><strong>What impact have you seen so far?</strong></p>
<p>There hasn’t been anybody who has stopped and decided that it’s not for them, which is encouraging. Most of our nonprofit partners have started small — monitoring answers, getting staff comfortable — and have then expanded deployment. Feedback has been invaluable and very positive.</p>
<p>We can measure impact partly by usage volume, but more importantly by value. Getting 1,000 questions about “where is the ice cream stall?” on a venue website is fine, but the real impact is when researchers or decision-makers can extract complex information from, say, a conservation charity or a healthcare resource site. That’s where AI becomes transformational.</p>
<p>We want to add more value with more nonprofits and other for-good actors. We think there are some obvious things that should be done more generally. Whether we do them through Knowbot or whether we flag other suitable tech partners isn’t clear yet, and probably isn’t that important. But, for example, long-term, we would like to see tools like Knowbot become standard on sites like NHS.uk, gov.uk, NICE, and others. These sites hold high-quality knowledge but can be very difficult to navigate due to the sheer volume of information they contain. Traditional on-website search tools simply aren’t up to the job and LLM-based retrieval is a natural upgrade. The sooner these sites start adopting tools such as Knowbot, the better information people are going to get. Whether that means members of the public, or whether it means professionals that are looking for technical advice, the biggest wins come from the right information helping people make valuable decisions.</p>
</section>
<section id="advice-for-nonprofits-considering-llms" class="level2">
<h2 class="anchored" data-anchor-id="advice-for-nonprofits-considering-llms">Advice for Nonprofits Considering LLMs</h2>
<p><strong>What advice would you give nonprofits thinking about adopting LLMs?</strong></p>
<p>Start small. Don’t try to design a global solution from day one. There is so much hype around AI — some justified, some not — that it feels like a high-risk decision. LLM AI is still new and evolving rapidly. Understand that you don’t know what you don’t know, and be prepared to experiment on a small scale, before building out.</p>
<p>Use low-impact tools first. Let different teams build familiarity. Learn what the models can and can’t do in your context. Build internal confidence gradually.</p>
<p>And for practitioners inside nonprofits facing resistance: the conversation is the same one we have externally. Be clear about risks, mitigation, and the fact that this is a learning process. Build trust.</p>
<p><strong>Q: Where do you see the next big opportunity for AI in the public interest?</strong></p>
<p>We’re at the beginning of another new chapter in generative AI. Until now, most LLMs have been about retrieval and synthesis. The next transformational phase is agentic AI: systems that can do things, taking actions autonomously or semi-autonomously. That will be incredibly consequential for society, with new risks and huge potential benefits. Getting that right is absolutely essential. Future technology aside, there remain enormous public interest opportunities even for today’s tech. There is always a lag between new technology and its adoption and implementation.</p>
<p><strong>Q: Anything you’d like readers to know about Knowbot?</strong></p>
<p>Yes! Come and talk to us. Every conversation teaches us something new, whether or not the organisation ends up using Knowbot. We’re eager to collaborate with nonprofits and with tech companies building the next generation of models.</p>
<p><em>Mike Hudson is an entrepreneur in technology &amp; electronic markets. He now uses his expertise to help solve social problems. Mike founded TestRAMP, a pandemic nonprofit social market described as a “major contribution to Covid PCR testing &amp; genomic sequencing”, and donated its £2.4mn profits to charity. Mike is a Fellow of ZSL &amp; adviser to its CEO. He is an honorary Research Fellow at City, University of London. Mike is a member of the Responsible AI Institute. He is a Foundation Fellow at St Antony’s College, University of Oxford.</em></p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the author</dt>
<dd>
<strong>Annie Flynn</strong> is Head of Content at the RSS.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2025 Annie Flynn
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Flynn, Annie. 2025. “AI for Social Good: Interview with the Founder of the Mike Hudson Foundation,” Real World Data Science, November 27, 2025. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2025/11/27/MHF-interview.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>

 ]]></description>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2025/11/27/MHF-interview.html</guid>
  <pubDate>Thu, 27 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2025/11/27/images/thumbnailsocial.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Beyond Quantification: Navigating Uncertainty in Professional AI Systems</title>
  <dc:creator>Annie Flynn</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/11/21/uncertainty.html</link>
  <description><![CDATA[ 





<div class="callout callout-style-default callout-note callout-titled" style="margin-top: 0rem;">
<div class="callout-header d-flex align-content-center" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="true" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>About the paper and this post
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse show">
<div class="callout-body-container callout-body">
<p><strong>Title:</strong> Beyond Quantification: Navigating Uncertainty in Professional AI Systems</p>
<p><strong>Author(s) and year:</strong> Sylvie Delacroix, Diana Robinson, Umang Bhatt, Jacopo Domenicucci, Jessica Montgomery, Gaël Varoquaux, Carl Henrik Ek, Vincent Fortuin, Yulan He, Tom Diethe, Neill Campbell, Mennatallah El-Assady, Søren Hauberg, Ivana Dusparic and Neil D. Lawrence (2025)</p>
<p><strong>Status:</strong> Published in <em>RSS: Data Science and Artificial Intelligence</em>, open access: <a href="https://academic.oup.com/rssdat/article/1/1/udaf002/8317136">HTML</a></p>
</div>
</div>
</div>
<p>As artificial intelligence systems—especially large language models (LLMs)—become woven into everyday professional practice, they increasingly influence sensitive decisions in healthcare, education, and law. These tools can draft medical notes, comment on student essays, propose legal arguments, and summarise complex documents. But while AI can now answer many questions confidently, professionals know that confidence is not always what matters most.</p>
<p>Consider a doctor who suspects a patient may be experiencing domestic abuse, or a teacher trying to distinguish between a student’s misunderstanding and a culturally shaped interpretation of a text. These are situations where uncertainty isn’t just about missing data—it’s about interpretation, ethics, and human judgment.</p>
<p>Yet much current AI research focuses on quantifying uncertainty: assigning probability scores, confidence levels, or error bars. The authors of this paper argue that while such numbers help in some cases, they miss the forms of uncertainty that truly matter in professional decision-making. If AI systems rely only on numeric confidence, they risk eroding the very expertise they aim to support.</p>
<p>This paper asks a simple but transformative question: <strong>What if uncertainty isn’t always something to quantify, but something to communicate?</strong></p>
<section id="why-quantification-isnt-enough" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="why-quantification-isnt-enough">Why quantification isn’t enough</h2>
<p>The authors highlight a fundamental mismatch between the way today’s AI systems handle uncertainty and the way real professionals experience it. They distinguish between:</p>
<ul>
<li><p>Epistemic uncertainty – when we simply don’t know enough yet (e.g., missing data, incomplete measurements). <em>This can often be quantified.</em></p></li>
<li><p>Hermeneutic uncertainty – when a situation allows multiple legitimate interpretations, often shaped by culture, ethics, or context. <em>This cannot meaningfully be reduced to a percentage.</em></p></li>
</ul>
<div class="column-page">
<p><img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/11/21/images/uncertaintykinds.png" class="img-fluid"></p>
</div>
<p>Professional judgment often depends on this second kind. Teachers, doctors, and lawyers rely on tacit skills: subtle perceptions, ethical intuitions, and context-sensitive interpretation. AI systems trained on statistical patterns struggle to reflect this nuance.</p>
<p>When an AI model gives a probability score — “I’m 70% sure this infection is bacterial” — it communicates something useful. But if the real uncertainty stems from ethical or contextual complexity (e.g., whether asking a patient certain questions might put them at risk), probability scores offer a false sense of clarity.</p>
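<p>To make the quantifiable half of this distinction concrete, here is a minimal sketch (ours, not the paper’s): epistemic uncertainty is often estimated from the disagreement among an ensemble of models trained on resampled data. Where the members disagree, more data would help; hermeneutic uncertainty has no analogous statistic.</p>
<pre><code># A minimal sketch of quantifying epistemic uncertainty via ensemble
# disagreement (our illustration, not the paper's method).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
ensemble = BaggingClassifier(LogisticRegression(max_iter=1000),
                             n_estimators=25, random_state=0).fit(X, y)

# Each member's predicted probability of class 1 for a few examples.
probs = np.stack([m.predict_proba(X[:5])[:, 1] for m in ensemble.estimators_])

# High spread across members = high epistemic uncertainty (collect more data);
# low spread with probability near 0.5 looks more like irreducible noise.
print("mean prob:", probs.mean(axis=0).round(2))
print("spread:   ", probs.std(axis=0).round(2))</code></pre>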
<p>The paper gives practical examples:</p>
<ul>
<li><p>A medical AI might be highly confident about symptoms but blind to the social dynamics suggesting abuse.</p></li>
<li><p>An educational AI may accurately flag grammar issues but miss culturally sensitive interpretations in a student essay.</p></li>
</ul>
<p>In both cases, the most important uncertainties are precisely the ones that cannot be captured by numbers.</p>
</section>
<section id="why-this-matters-now" class="level2">
<h2 class="anchored" data-anchor-id="why-this-matters-now">Why this matters now</h2>
<p>The authors warn that the problem becomes even more serious as we move toward agentic AI systems—multiple AI agents interacting and making decisions together. If one system miscommunicates uncertainty, the error may ripple through an entire network.</p>
<p>To address this, the authors propose shifting away from trying to algorithmically “solve” uncertainty, and instead enabling professionals themselves to shape how AI expresses it.</p>
</section>
<section id="takeaways-and-implications" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="takeaways-and-implications">Takeaways and implications</h2>
<p><strong>1. Uncertainty expression is part of professional expertise, not just a technical feature</strong></p>
<p>AI should not simply output probabilities. It should help preserve and enhance the ways professionals reason through complex, ambiguous situations. That means:</p>
<ul>
<li><p>highlighting when interpretation is required</p></li>
<li><p>surfacing multiple plausible perspectives</p></li>
<li><p>signalling when ethical judgment is involved</p></li>
<li><p>encouraging expanded inquiry rather than false certainty</p></li>
</ul>
<p>For example, instead of producing a diagnosis score, an AI assistant might say: “This pattern warrants attention to social context. Consider asking open-ended questions to understand the patient’s circumstances.”</p>
<p>This kind of prompting respects and supports professional judgment.</p>
<p><strong>2. Professionals—not engineers—must define how uncertainty is communicated</strong></p>
<p>The authors propose participatory refinement, a process where communities of practitioners (teachers, doctors, judges, etc.) collectively shape:</p>
<ul>
<li><p>the categories of uncertainty that matter in their field</p></li>
<li><p>the language and formats AI systems should use</p></li>
<li><p>how these systems should behave in ethically sensitive scenarios</p></li>
</ul>
<p>This differs from typical user feedback loops. Instead of individuals clicking “thumbs down,” whole professions deliberate on what kinds of uncertainty an AI system should express and how.</p>
<p><strong>3. This requires new technical and organisational approaches</strong></p>
<div class="column-page">
<p><img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/11/21/images/futureai.png" class="img-fluid"></p>
</div>
<p>To make participatory refinement possible, future AI systems need:</p>
<ul>
<li><p>architectures that can incorporate community-defined uncertainty frameworks</p></li>
<li><p>interfaces designed for collective sense-making, not just individual use</p></li>
<li><p>institutional support (e.g., workshops, governance processes, professional committees)</p></li>
</ul>
<p>While this takes more time than simply deploying an AI system “out of the box,” the authors argue that in fields like healthcare or law, these deliberative processes are essential, not optional.</p>
<p><strong>4. Preserving “productive uncertainty” is key for ethical, adaptive professional practice</strong></p>
<p>If AI tools flatten complex uncertainty into simple numbers, they may unintentionally narrow the space for professional judgment and ethical debate. The authors suggest that sustained ambiguity—open questions, competing interpretations, ethical reflection—is not a flaw in human reasoning but a feature of high-quality professional work.</p>
<p>Well-designed AI should help maintain that reflective space, not close it down.</p>
<p><strong>Further reading</strong></p>
<p>For readers interested in exploring more:</p>
<ul>
<li><p>David Spiegelhalter – The Art of Uncertainty (accessible introduction to uncertainty in science)</p></li>
<li><p>Iris Murdoch – The Sovereignty of Good (on moral perception)</p></li>
<li><p>Participatory AI frameworks such as STELA (Bergman et al., 2024)</p></li>
<li><p>Visual analytics research on human-in-the-loop data interpretation</p></li>
<li><p>Discussions of agentic AI systems and coordinated AI in healthcare</p></li>
<li><p>Delacroix’s work on LLMs in ethical and legal decision-making</p></li>
</ul>
</section>
<section id="in-summary" class="level2">
<h2 class="anchored" data-anchor-id="in-summary">In summary</h2>
<p>This paper argues that if AI is to genuinely assist professionals, it must go beyond quantification. Numbers alone cannot capture the ethical, interpretive, and contextual uncertainties that define professional practice. Instead, AI should help preserve and enrich human judgment by communicating uncertainty in ways co-designed with the communities who rely on it. AI should not just be <em>accurate</em> —it should be <em>appropriately uncertain</em>.</p>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-6">
<dl>
<dt>About the author</dt>
<dd>
<strong>Annie Flynn</strong> is Head of Content at the RSS.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>About DataScienceBites</dt>
<dd>
<a href="../../../../../../foundation-frontiers/datasciencebites/index.html"><strong>DataScienceBites</strong></a> is written by graduate students and early career researchers in data science (and related subjects) at universities throughout the world, as well as industry researchers. We publish digestible, engaging summaries of interesting new pre-print and peer-reviewed publications in the data science space, with the goal of making scientific papers more accessible. Find out how to <a href="../../../../../../contributor-docs/datasciencebites.html">become a contributor</a>.
</dd>
</dl>
</div>
</div>
</div>


</section>

 ]]></description>
  <guid>https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/11/21/uncertainty.html</guid>
  <pubDate>Fri, 21 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/11/21/images/thumb.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Why We Should All Be Data Quality Detectives</title>
  <dc:creator>A. Rosemary Tate and Roger Halliday</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/data-detectives.html</link>
  <description><![CDATA[ 





<p>At the <a href="https://rss.org.uk/news-publication/news-publications/2025/general-news/president-s-blog-reflections-on-a-record-breaking/">2025 Royal Statistical Society conference</a> in Edinburgh, a lively group of statisticians and data scientists gathered to tackle a quietly critical issue: data quality. Our workshop, titled “Why we should all be data quality detectives”, drew around 40 participants into a dynamic conversation about why data quality is often overlooked and what we can do to change that.</p>
<div id="thumbnail.png" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/images/thumbnail.png" class="img-fluid figure-img" alt="The workshop drew 40 participants."></p>
<figcaption>The workshop drew 40 participants.</figcaption>
</figure>
</div>
<section id="the-case-for-data-quality" class="level2">
<h2 class="anchored" data-anchor-id="the-case-for-data-quality">The Case for Data Quality</h2>
<p>If you search for “data quality disasters” on any search engine, you will find many results. Similarly, literature on data quality measures offers abundant advice. But within the scientific research community, data quality is often ignored. For example, how often have you encountered the term “data quality” in the guidelines when submitting or reviewing an academic paper? We would venture to say, hardly ever (or never).</p>
<p>This is puzzling, because high-quality data (i.e., data that is fit for purpose) is essential; without it, results become almost meaningless. Data serves as the foundation of our work. So why isn’t its quality given the prominence it deserves? Why aren’t we, as statisticians and data scientists, advocating data quality more vocally?</p>
<p>Recent publications may shed some light: it appears that “Everyone wants to do the model work, not the data work”<sup>1</sup>, and that statisticians may feel uneasy with elements that are not easily quantifiable<sup>2</sup>. Or perhaps we are all guilty of “premature enumeration” (as Tim Harford puts it), rushing into data analysis without having a good look at the data first. Whatever the case, data quality work or “data cleaning/wrangling” is not seen as fun.</p>
<p>For us, as self-confessed “data quality detectives”, the reverse is true, and we began the workshop by reframing data quality not as a tedious chore, but as an empowering and even enjoyable part of the analytical process. We spend hours looking at the data, enjoying the delayed gratification of finally getting to (trustable) results.</p>
<p>In Rosemary’s case, her attitude was shaped by key experiences early in her statistical career. Her doctoral research focused on developing methods to automatically classify magnetic resonance spectra of human leg adipose tissue based on diet—specifically distinguishing between vegans and omnivores. The study recruited 33 vegans, while the control group included 34 omnivores and 8 vegetarians, primarily staff from the MRI unit at Hammersmith Hospital. With limited experience at the time, she began experimenting with various techniques, starting with k-means cluster analysis. Although she hoped the clusters would reflect dietary groups, the analysis instead produced two distinct clusters—one containing just two spectra and the other containing the rest. After consulting colleagues, she learned that the two outlier spectra had been acquired using a different protocol and were mistakenly included in the dataset. While she might have identified the error later, catching it early saved her several weeks of work — and won her some kudos with colleagues.</p>
<div id="kudos.png" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/images/kudos.png" class="img-fluid figure-img" alt="Catching it early saved her several weeks of work."></p>
<figcaption>Catching it early saved her several weeks of work.</figcaption>
</figure>
</div>
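<p>Rosemary’s cluster surprise is easy to re-create. In the toy sketch below (simulated numbers standing in for the spectra), k-means isolates two points generated under a “different protocol” into their own tiny cluster, exactly the kind of result that should send an analyst back to the data’s provenance:</p>
<pre><code># Toy re-creation of the anecdote: k-means exposing two out-of-protocol
# observations as their own tiny cluster (simulated data, not the original spectra).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
spectra = rng.normal(0, 1, size=(73, 20))    # in-protocol "spectra"
outliers = rng.normal(8, 1, size=(2, 20))    # acquired under a different protocol
X = np.vstack([spectra, outliers])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(labels))  # e.g. [73 2]; the tiny cluster is the red flag</code></pre>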
</section>
<section id="detective-work-at-the-tables" class="level2">
<h2 class="anchored" data-anchor-id="detective-work-at-the-tables">Detective Work at the Tables</h2>
<p>During the workshop, we split into six groups to investigate two questions: Why does data quality get overlooked? What strategies can raise its profile?</p>
<p>The discussions were rich and revealing. Many pointed to organisational gaps — no clear strategy, limited training, and confusion over who is responsible for data quality. Others highlighted cultural issues: time pressures, lack of curiosity, and a tendency to assume someone else has already checked the data.</p>
<p>Simple Excel errors are also common. We heard of one case: a study comparing a new, advanced imaging machine with an older model, with the results presented in a spreadsheet containing several measurements. As expected, the correlation matrix showed strong correlations between most columns—except for the first, the main measure of interest. It quickly became apparent that the sort function had been applied to that column in isolation, scrambling the values relative to their rows and rendering them effectively random. Unfortunately, the researcher had not kept a backup of the original data, so the entire experiment was compromised. During the COVID-19 pandemic, a similar technical mistake involving Excel led to <a href="https://www.bbc.co.uk/news/technology-54423988#:~:text=The%20badly%20thought%2Dout%20use,than%20a%20third%2Dparty%20contractor.">thousands of positive cases being omitted from the UK’s official daily figures</a>. These are the kinds of simple issues that could have been caught with a basic data check.</p>
<div id="excel-errors.png" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/images/excel-errors.png" class="img-fluid figure-img" alt="Simple Excel errors are also common."></p>
<figcaption>Simple Excel errors are also common.</figcaption>
</figure>
</div>
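<p>A check of this kind takes one line with pandas. In the hypothetical sketch below, scrambling a single column (as a stray sort would) makes its correlations with every other column collapse towards zero:</p>
<pre><code># Hypothetical sketch: a scrambled column betrays itself in the correlation matrix.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
base = rng.normal(size=200)
df = pd.DataFrame({f"m{i}": base + rng.normal(scale=0.3, size=200) for i in range(4)})

df["m0"] = rng.permutation(df["m0"].to_numpy())  # simulate the stray sort

print(df.corr().round(2))  # m0's row is now near zero: time to find the backup</code></pre>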
<p>Other examples were given of data quality issues arising when datasets were used for a specific research focus, and the quality checks applied were tailored too narrowly to that focus. Additional problems only became apparent when the same data was later used for a different research purpose. Conclusion: you can’t be complacent about the quality of the data you’re using.</p>
</section>
<section id="strategies-for-change" class="level2">
<h2 class="anchored" data-anchor-id="strategies-for-change">Strategies for Change</h2>
<p>The second question sparked even more ideas. Suggestions ranged from embedding data quality education early (even at school level) to implementing cultural changes that lead to greater transparency. Participants called for:</p>
<ul>
<li>Training and upskilling across roles</li>
<li>Transparent reporting of errors and limitations</li>
<li>Positive feedback loops for data collectors</li>
<li>Rewarding quality work and error detection</li>
<li>Modernising systems and improving interoperability</li>
<li>Using AI and automation to support quality checks</li>
<li>Publications including recommendations for more transparent reporting of “initial data analysis” in their guidelines.</li>
</ul>
<p>One standout idea: organisations could promote a “data amnesty” culture where errors can be acknowledged without blame. This is something Roger experienced during his time as Chief Statistician for the Scottish Government. There, he occasionally encountered serious data quality issues that required official statistics to be revised or delayed. Being transparent with users about these issues was a key principle of the Code of Practice for Official Statistics. A conscious effort was made, through training and through the way such situations were handled, to foster a culture of openness and accountability. Staff were supported to create and implement plans to address the problems, learn from them, and communicate clearly with users. This transparency was essential to maintaining trust in both our processes and the statistics we produced.</p>
</section>
<section id="a-call-to-action" class="level2">
<h2 class="anchored" data-anchor-id="a-call-to-action">A Call to Action</h2>
<p>We walked away from the workshop with a clear conclusion: data quality needs a culture shift. It’s not enough to care — we need to prioritise and celebrate the work of those who keep our data trustworthy, while educating other stakeholders about what it involves.</p>
<p>Shaping the next steps will require keeping this conversation going within the data community and Real World Data Science can play an integral role in that. As a direct result of this piece, we have <a href="https://realworlddatascience.net/contributor-docs/call-for-contributions.html">updated our submission guidelines</a> to include recommendations for transparent data reporting and we would like to publish more stories of data disasters – or disasters averted through careful attention to data quality.</p>
<p>As one attendee put it, “We need to challenge the data and learn best practice from the get-go.” It’s time to embrace our inner data detectives; the integrity of our insights depends on it.</p>
<p>Please share your own data disaster stories in the comments, or in the <a href="mailto:rwds@rss.org.uk">Real World Data Science inbox</a>.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the authors</dt>
<dd>
<strong>Rosemary Tate</strong> is a Chartered Biostatistician and Computer Scientist with over 30 years of experience in medical research and statistical consulting. She has a BSc in Mathematics, a DPhil in Computer Science and AI, and an MSc in Medical Statistics. She has been scientific manager of a large EU-funded project and held lectureships at the Institutes of Child Health and Psychiatry. An independent statistical consultant since 2016, she now spends most of her time as a “Data Quality Agent Provocateur”.
</dd>
<dd>
<strong>Roger Halliday</strong> is CEO at Research Data Scotland, providing leadership to improve public wellbeing by transforming how data is used in research, innovation and insight. Roger was Scotland’s Chief Statistician from 2011 to 2022. During that time he was also Scottish Government Chief Data Officer (2017–20), and jointly led the Scottish Government Covid Analytical Team during the pandemic. Before that, he worked at the Department of Health in England as a policy analyst managing evidence for decision making across NHS issues. He became an honorary Professor at the University of Glasgow in 2019.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2025 A. Rosemary Tate and Roger Halliday
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Tate, Rosemary A. and Halliday, Roger. 2025. “Why We Should All Be Data Quality Detectives.” Real World Data Science, October 30, 2025. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/10/31/data-detectives.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">References</h2>

<ol>
<li id="fn1"><p>Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M Aroyo. Everyone wants to do the model work, not the data work: Data cascades in High-Stakes AI. In proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–15, 2021.↩︎</p></li>
<li id="fn2"><p>Thomas Redman and Roger Hoerl. Data quality and statistics: Perfect together? Quality Engineering, 35(1):152–159, 2023.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/data-detectives.html</guid>
  <pubDate>Thu, 30 Oct 2025 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2025/10/30/images/thumbnail.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Code, Calculate, Change - How Statistics Fuels AI’s Real World Impact: EICC Live</title>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2025/09/17/EICC_Live.html</link>
  <description><![CDATA[ 





<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/CeZpkZzWcuo" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>Artificial Intelligence (AI) is transforming how we live, work, and make decisions every day – from the content we see on social media to how we’re hired, how we navigate to work, and how spam is filtered from our inboxes. But what exactly is AI? How does it work, where did it come from, and where is it taking us?</p>
<p>Dr Sophie Carr, chair of the <em>Real World Data Science</em> <a href="https://realworlddatascience.net/the-pulse/editors-blog/posts/2022/10/18/meet-the-team.html">board of editors</a>, was joined by a panel of expert speakers for a public lecture in Edinburgh at the beginning of the month. <em>Real World Data Science</em> <a href="https://realworlddatascience.net/the-pulse/posts/2025/07/28/NHS-foundation-AI.html">contributor</a> Will Browne delivered “a hitchhiker’s guide to the history of AI”, taking us from the first ever algorithm (coded by “poetical scientist” Ada Lovelace) to today’s large language models, via a counting horse and a US naval invention.</p>
<p>Parwez Diloo, a data scientist at <a href="https://baysconsulting.co.uk/">Bays Consulting</a>, talked about how to balance technology with a human touch in recruitment processes (and the difference between maths and magic!).</p>
<p>And Amy Wilson, a lecturer in industrial mathematics at the University of Edinburgh, spoke about graphical modelling for decision-making in criminal contexts, touching on the failures of probabilistic reasoning in high-profile legal cases like those of Lucy Letby and Amanda Knox.</p>
<p>The talk was rounded off by a lively Q&amp;A session which covered the viability of AI-designed graphical models, remedies for inappropriate uses of AI, and collective action we can take to ensure AI bias does not entrench existing inequalities.</p>
<p>This talk was part of the <a href="https://www.eicc.co.uk/eicc-live/">EICC Live</a> programme, a series of free public talks held by the <a href="https://www.eicc.co.uk/">EICC</a> as part of a commitment to community engagement and quality education. It took place during the RSS 2025 International Conference and was filmed by EICC; it is published here with thanks to them.</p>
<div class="g-col-12 g-col-sm-6">
<div class="nav-btn">
<p><a href="../../../../../foundations-frontiers/index.qmd">Back to Foundations &amp; Frontiers</a></p>
</div>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">

</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2025 EICC
</dd>
</dl>
</div>


</div>
</div>

 ]]></description>
  <category>AI</category>
  <category>Communication</category>
  <category>Skills</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2025/09/17/EICC_Live.html</guid>
  <pubDate>Wed, 17 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2025/09/17/images/ELgraphic.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>All Creatures, Great, Small, and Artificial</title>
  <dc:creator>Robyn Lowe and Edward Rochead</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2025/08/26/new veterinary medicine.html</link>
  <description><![CDATA[ 





<p>This article had its genesis when co-author Ed’s dog, Sparkle, was treated for pneumonia in the summer of 2024. Ed, a mathematician and chair of the <a href="https://alliancefordatascienceprofessionals.com/">Alliance for Data Science Professionals</a>, was intrigued by the surgery’s use of data in Sparkle’s treatment and decided to find out more about the use of data and AI in veterinary medicine. His exploration led to a guest appearance on the <a href="https://www.vetvoices.co.uk/podcasts">Vet Voices on Air</a> podcast hosted by co-author Robyn. She is a registered veterinary nurse (RVN) and the director of <a href="https://www.vetvoices.co.uk/">Veterinary Voices UK</a>. Inspired by that conversation, this article explores the ways veterinary professionals are currently applying data science principles and how professions adapt and evolve in the face of these developments.</p>
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Sparkle. Credit: Edward Rochead.">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/08/26/images/Sparkle.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Sparkle. Credit: Edward Rochead.">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Sparkle. Credit: Edward Rochead
</figcaption>
</figure>
</div>
<p>The use of AI, and of the data science that underpins and enables it, is increasingly ubiquitous, and one area embracing these approaches is veterinary medicine.</p>
<p>Unlike human medicine in the UK, veterinary medicine is not organised under a centralised system such as the National Health Service (NHS). Instead, veterinary care is delivered through a variety of business structures, including joint venture, independent, corporate, and charity practices. These structures differ not only in ownership and funding but also in the scope of services that the practices provide. In many cases, the availability of more specialised care may depend on the expertise of individuals within the practice. Broadly speaking, practices tend to cover farm animals, exotics, equine, small animals, or a mix of these. Some practices will also take on zoological, conservation and invertebrate work, among other specialties.</p>
<p>Veterinary surgeons and RVNs are also employed in academia, in applied research in industry or government, and as advisors in government agencies.</p>
<section id="data-in-the-veterinary-profession-challenges-and-opportunities" class="level2">
<h2 class="anchored" data-anchor-id="data-in-the-veterinary-profession-challenges-and-opportunities">Data in the Veterinary Profession: Challenges and Opportunities</h2>
<p>If Artificial Intelligence (AI) is to be used in any sphere, it needs to be trained on data, and that data should be relevant, complete, structured, accurate, consistently formatted, and labelled. Achieving this standard is a challenge not only in veterinary medicine but also in many other fields where data are fragmented and inconsistently recorded. Unlike centralised NHS data, veterinary data are often stored in individual practices or farms. These may use different formats and scales (such as imperial or metric), US or UK date formats, and twelve- or twenty-four-hour clocks. Records may also fail to follow the animal if it is sold or moves to a new practice. Such inconsistencies mirror the difficulties faced in other domains, and can make the adoption of AI in veterinary medicine particularly complex.</p>
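<p>To make the harmonisation problem concrete, here is a minimal Python sketch, with invented field names, of the kind of normalisation a single practice-level record might need before it could be pooled for training. It is illustrative only; real records vary far more than this.</p>
<pre><code class="language-python">from datetime import datetime

LB_TO_KG = 0.45359237

def normalise_record(record):
    """Harmonise one record to metric units and ISO dates.
    Field names are hypothetical, for illustration only."""
    weight = record["weight_value"]
    if record["weight_unit"] == "lb":  # imperial to metric
        weight *= LB_TO_KG
    # Practices may record dates month-first (US) or day-first (UK).
    fmt = "%m/%d/%Y" if record["date_style"] == "US" else "%d/%m/%Y"
    visit = datetime.strptime(record["visit_date"], fmt).date()
    return {"weight_kg": round(weight, 2), "visit_date": visit.isoformat()}

print(normalise_record({"weight_value": 22.0, "weight_unit": "lb",
                        "visit_date": "08/26/2025", "date_style": "US"}))
# {'weight_kg': 9.98, 'visit_date': '2025-08-26'}
</code></pre>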
<p>On the other hand, animal data has fewer constraints than human data. Article 4 of the <a href="https://www.gov.uk/data-protection">UK General Data Protection Regulation</a> (GDPR) makes it clear that the regulation applies to ‘personal data’ and specifies that ‘an identifiable natural person is one who can be identified’, which means that there is potentially more freedom to use data related to animals than to humans. (It is worth noting that the GDPR would apply to the farmer, pet owner, or veterinary staff involved, so some consideration might still be required.) Given that this data is an asset, it is worth considering whether it is owned by the animal’s owner or the veterinary professional (or their employer) in any given circumstance.</p>
</section>
<section id="how-ai-is-already-transforming-veterinary-practice" class="level2">
<h2 class="anchored" data-anchor-id="how-ai-is-already-transforming-veterinary-practice">How AI is Already Transforming Veterinary Practice</h2>
<p>AI is becoming an affordable and widely used tool in veterinary medicine. It’s now commonly applied in areas like diagnostics, treatment, and disease monitoring and prediction, despite the misconception that it’s rarely used. Preventative healthcare has always been a key aim within veterinary medicine. The obligation to ensure that both animal health and welfare and public health are accounted for is reflected by point 6.1 of the <a href="https://www.rcvs.org.uk/setting-standards/advice-and-guidance/code-of-professional-conduct-for-veterinary-surgeons/#public">Code of Professional Conduct for Veterinary Surgeons</a> and <a href="https://www.rcvs.org.uk/setting-standards/advice-and-guidance/code-of-professional-conduct-for-veterinary-nurses/#public">RVNs</a>: ‘6.1 Veterinary surgeons must seek to ensure the protection of public health and animal health and welfare’.</p>
<p><strong>Diagnostics</strong><br>
Diagnosis and prediction of diseases is one key area where AI is being used in veterinary medicine in farm animals, companion animals and beyond.</p>
<p>For example, in companion animals AI has been used to assist in the diagnosis of canine <a href="https://pubmed.ncbi.nlm.nih.gov/32006871/">hypoadrenocorticism</a>, an endocrine disease. Machine learning algorithms also show potential for improving the <a href="https://pubmed.ncbi.nlm.nih.gov/40440642/">prediction and diagnosis</a> of leptospirosis, an infectious zoonotic disease. By combining MRI data with facial image analysis, an AI tool can assist in predicting the likelihood of <a href="https://onlinelibrary.wiley.com/doi/full/10.1111/jvim.15621">Chiari-like malformation (CM) and syringomyelia (SM)</a> from images of the dog’s head obtained via an owner’s smartphone. AI can also assist with faecal analysis: images are analysed by proprietary <a href="https://dugganvet.ie/ovacyte/">Artificial Intelligence models</a> which reference them against the reference library of Telenostic, a company specialising in parasitology diagnostic solutions. The image recognition software identifies each specific parasite species and the number of parasitic eggs or oocysts present.</p>
<p>These are just a few current examples of AI use in companion animals.</p>
<p><strong>Disease Monitoring and Prediction</strong><br>
Disease monitoring and prediction are exciting because they can help us act earlier—sometimes even preventing illness. This not only improves animal health and welfare, but also supports <a href="https://www.skeptic.org.uk/2024/05/agr-tech-will-technology-help-or-hinder-food-production-and-animal-welfare/">antimicrobial stewardship</a> by reducing unnecessary treatments, helping to combat antimicrobial resistance—a serious global threat to both animals and humans.</p>
<p>An area that demonstrates compelling evidence of these positive outcomes is <a href="https://www.skeptic.org.uk/2024/05/agr-tech-will-technology-help-or-hinder-food-production-and-animal-welfare/">farming and agriculture</a>, where farmers are able to use AI to monitor herds and act promptly to treat disease before it would be evident to human monitoring. Examples, which will be explored in more detail below, include body condition technology, lameness technology, disease recognition, grazing, land and pasture management, and biosensors and biochips, among others.</p>
<div id="fig-2" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Technology that measures body condition each time the cow passes under the camera, reporting the changes in Body Condition Score directly to the farmer via app and online portal, helping to support individual cow treatment, group rationing and herd management.">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/08/26/images/herdvision1.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Technology that measures body condition each time the cow passes under the camera, reporting the changes in Body Condition Score directly to the farmer via app and online portal, helping to support individual cow treatment, group rationing and herd management.">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Technology that measures body condition each time the cow passes under the camera, reporting the changes in Body Condition Score directly to the farmer via app and online portal, helping to support individual cow treatment, group rationing and herd management.
</figcaption>
</figure>
</div>
<p><strong>Body Condition Technology</strong><br>
The agricultural industry typically relies on subjective visual observation, human recording and manual reporting of all the key health and welfare traits, including Body Condition Score (BCS). Although the individuals doing this are highly skilled professionals, inevitable human error, paired with the constraints of busy farm management, can mean cases get picked up later in the disease process. BCS is a major indicator of metabolic performance in dairy cows and is directly related to fertility performance and health traits. Technologies such as <a href="https://herd.vision/">Herdvision</a> use a 2D and 3D camera system to monitor BCS, resulting in improvements in cattle health and fertility, less premature culling, and savings on feeding costs.</p>
<p><strong>Lameness Technology</strong><br>
Lameness is considered one of the <a href="https://www.frontiersin.org/articles/10.3389/fvets.2019.00094/full">top cattle health and welfare challenges</a>. A <a href="https://www.sciencedirect.com/science/article/abs/pii/S1871141313001698">2013 study</a> noted that almost 70% of the dairy farmers expressed an intention to take action for improving dairy cow foot health. Cattle naturally mask the signs of pain, and as with body condition scoring we have relied on subjective visual observation, human recording and manual reporting of all the key health and welfare traits. Technology that can pick up lameness earlier, with more objectivity and with less labour intensity is hugely beneficial to both the animals’ health and welfare and the farm’s profitability.</p>
<div id="fig-3" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Images that produce prioritisation list for vets and hoof trimmers, ranking cows according to severity of immobility and identifying small changes in mobility and BCS before they are visible to the human eye.">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2025/08/26/images/herdvision2.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Images that produce prioritisation list for vets and hoof trimmers, ranking cows according to severity of immobility and identifying small changes in mobility and BCS before they are visible to the human eye.">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Images that produce prioritisation list for vets and hoof trimmers, ranking cows according to severity of immobility and identifying small changes in mobility and BCS before they are visible to the human eye.
</figcaption>
</figure>
</div>
<p><strong>Disease Recognition</strong><br>
As with lameness assessments, the monitoring of pain and detection of disease in the UK pig industry rely on human observation, either in person or via video footage.</p>
<p>An interdisciplinary team at Newcastle University has <a href="https://www.ukri.org/who-we-are/how-we-are-doing/research-outcomes-and-impact/bbsrc/ai-based-monitoring-aids-on-farm-disease-detection/">used artificial intelligence to develop automated systems</a> to analyse and monitor pig behaviour and health. The algorithm was tested in a controlled environment where infection and disease were present, assessing footage of pigs captured by cameras and pinpointing and quantifying changes in behaviours to identify links to disease.</p>
<p>Other computer vision and AI-based approaches have allowed the <a href="https://www.sciencedirect.com/science/article/abs/pii/S0168169920300673">automatic scoring</a> of pigs in relation to posture, aggressive episodes, tail-biting episodes, fouling, diarrhoea, stress prediction in piglets, weight estimation, and body size – all giving farmers greater insight into the health of their animals.</p>
<p><strong>Grazing, Land and Pasture Management</strong><br>
AI has enabled more efficient pasture and grazing management, allowing livestock to be moved onto new pastures when grazing quality and quantity fall below a certain threshold.</p>
<p>There are numerous methods of using Agri-Tech to monitor animals, such as the <a href="https://www.mdpi.com/1424-8220/19/3/603">SheepIT</a> project, an initiative where an automated IoT-based system controls grazing sheep. Typically, such solutions are split into two main groups: location monitoring and behaviour and activity monitoring. Location monitoring allows farmers to keep track of animals, inferring preferred pasturing areas and grazing times, and even detecting absent animals. Behaviour and activity monitoring focuses on detecting the type and duration of an animal’s activities – for example resting, eating or running - based on accelerometry and audiometry.</p>
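<p>As a sketch of how the behaviour and activity monitoring described above typically works (a generic illustration, not the SheepIT system itself): the accelerometer signal is cut into short windows, simple summary features are computed per window, and a classifier maps each window to an activity. The thresholds below are invented; real systems learn them from labelled data.</p>
<pre><code class="language-python">import numpy as np

def window_features(accel, window=50):
    """Split a 1-D acceleration-magnitude signal into fixed windows
    and compute simple per-window features (mean and variance)."""
    n = len(accel) // window
    chunks = accel[: n * window].reshape(n, window)
    return np.column_stack([chunks.mean(axis=1), chunks.var(axis=1)])

def classify(features):
    """Toy rule-based classifier: low movement variance suggests
    resting, medium suggests eating, high suggests running.
    Thresholds are hypothetical, for illustration only."""
    labels = []
    for _mean, var in features:
        if var &lt; 0.05:
            labels.append("resting")
        elif var &lt; 0.5:
            labels.append("eating")
        else:
            labels.append("running")
    return labels

signal = np.abs(np.random.default_rng(0).normal(1.0, 0.3, size=500))
print(classify(window_features(signal))[:3])
</code></pre>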
<p><strong>Biosensors and Biochips</strong><br>
In human medicine, advances in molecular medicine and cell biology have <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3270855/">driven interest in electrochemical systems to detect disease biomarkers</a> and therapeutic compounds (medications, for example). In the human literature, implantable biosensors have been reported for <a href="https://www.sciencedirect.com/science/article/abs/pii/S0956566305003544">glucose monitoring</a>, <a href="https://ieeexplore.ieee.org/abstract/document/4118162/">DNA detection</a> and <a href="https://www.sciencedirect.com/science/article/abs/pii/S0003269708007264">cultures</a>, among others. Microelectronic technology offers powerful circuits and systems for building innovative, miniaturised biochips that sense at the molecular level. These have numerous applications in veterinary medicine, from hormone detection to monitoring of pathogenic microorganisms and infections to surveillance of homeostatic mechanisms (homeostasis being the body’s regulatory processes that control many functions and maintain stability), such as <a href="https://www.anl.gov/article/biochips-to-investigate-cattle-disease-win-entrepreneurial-challenge">pathogen detection</a> in cattle mastitis.</p>
<p>Paul Horwood, Farm Vet and Founder of <a href="https://www.bsas.org.uk/events/ailive/">AI(Live)</a>, a conference on the development of AI applications in the livestock industry, sees this as a time of opportunity for the profession:</p>
<p><em>“The farm vet’s role continues to evolve from “problem-solver after the fact” to “strategic advisor at the heart of herd health planning.” Technology is helping us get there by giving us earlier insights, better data, and stronger evidence for the decisions we make every day. We’re at a pivotal moment. The technology is here. The challenge is knowing how to use it and how to lead with it. As a farming nation, we have always been innovative; as farm animal veterinary surgeons, we can either wait to be brought in at the end of the conversation, or step forward now to shape how AI is used on UK farms. Let’s choose the latter.”</em></p>
<p><strong>Shared Frontiers: Common Threads in AI Adoption Across Sectors</strong></p>
<p>The veterinary sector, like every other industry, is on a journey when it comes to the use of artificial intelligence, and many of the themes that are emerging are common to other sectors.</p>
<p>For veterinary professionals this includes:</p>
<ul>
<li>The need to radically change training of new vets and RVNs to ensure that they are prepared to embrace the new opportunities that AI will bring.<br>
</li>
<li>The need to upskill existing vets and RVNs to enable them to use these new opportunities.</li>
<li>Working with stakeholders, such as, in this case, farmers and pet owners, to evolve the business model to ensure that all parties benefit from the change.</li>
<li>A change in the attitude to data, in which it becomes seen as a business asset when it is well managed, with the ultimate benefit in this sector of promoting the wellbeing of animals.</li>
</ul>
<p>These recurring patterns offer a blueprint for understanding how professions evolve in response to developments in the field, and a reminder that AI isn’t just transforming high-tech labs and Fortune 500 boardrooms – it is quietly revolutionising industries across every sector. By looking at how specific professions, like veterinary medicine, are navigating this shift, we can better understand the broader dynamics at play when machine learning meets existing practice.</p>
</section>
<section id="bridging-disciplines-unlocking-value-through-interdisciplinary-collaboration" class="level2">
<h2 class="anchored" data-anchor-id="bridging-disciplines-unlocking-value-through-interdisciplinary-collaboration">Bridging Disciplines: Unlocking Value Through Interdisciplinary Collaboration</h2>
<p>This is a pivotal moment where the intersection of data science and veterinary medicine offers a unique opportunity for cross-sector collaboration, driving progress in both fields.</p>
<p>The field of data science has much to offer industries currently experiencing these inflection points. Although many data scientists come from ‘traditional’ backgrounds such as statistics, mathematics, or computer science, many more diverse routes into data science roles now exist. These routes include people in other professions who use data science in their working lives without necessarily calling themselves data scientists, having upskilled through training or even trial and error. The authors are already aware of veterinary professionals who are skilled data scientists, even if they may not identify as such, applying data science to veterinary research in academia or industry. The RSS, other professional bodies within the Alliance for Data Science Professionals, and data science departments in universities may find it worthwhile to offer Continuing Professional Development opportunities to the veterinary profession. Certainly, the RSS’s Data Science Practitioner and Advanced Data Science Practitioner certifications are open to veterinary professionals who have developed such skills.</p>
<p>The veterinary profession may also benefit the data science community, by providing data sets that can be applied in many ways without major GDPR issues, as well as opportunities to showcase the benefits of data science to society and to animal health and welfare through examples similar to those above.</p>
<p>One vehicle for more cross-pollination could be joint conferences. A near-term opportunity is <a href="https://www.ailive.farm/">AI (Live)</a> in September 2025, which aims to start the debate and establish the principles by which AI and livestock farming can together derive the maximum benefit, with a focus on education, governance and application.</p>
<p>By fostering collaboration across disciplines, we can ensure that the benefits of this data revolution are shared—by all creatures, great, small, and artificial. And we are happy to report that Sparkle, whose illness sparked this article, has made a full recovery and in fact recently celebrated her ninth birthday!</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the author</dt>
<dd>
<a href="https://www.linkedin.com/in/robyn-lowe-7274a596/"><strong>Robyn Lowe, BSc (Hons), Dip AVN (Surgery, Medicine, Anaesthesia), Dip HE CVN, RVN</strong></a>, is a registered veterinary nurse and Director of <a href="https://www.vetvoices.co.uk/">Veterinary Voices UK</a>, a community of veterinary professionals fostering public understanding of veterinary and animal welfare issues. She hosts the organisation’s <a href="https://open.spotify.com/show/2DcdmAMJrwRf2RdgUPcYCP">Vet Voices on Air</a> podcast.<br>

</dd>
<dd>
<a href="https://www.linkedin.com/in/prof-edward-r-17768847/"><strong>Professor Edward Rochead, M.Math (Hons), PGDip, CMath, FIMA</strong></a> is a mathematician employed by the government, currently leading work on STEM Skills and Data. Ed is chair of the Alliance for Data Science Professionals, a Visiting Professor at Loughborough University, an Honorary Professor at the University of Birmingham, Chartered Mathematician, and Fellow of the IMA and RSA.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2025 Robyn Lowe and Edward Rochead.
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img style="height:22px!important;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1"><img style="height:22px!important;margin-left:3px;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1"></a> Text, code, and figures are licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">International licence</a>, except where otherwise noted. Thumbnail image by <a href="https://www.shutterstock.com/image-photo/cattle-cow-animal-farm-veterinary-agriculture-1463752661">Shutterstock/g/fotopanorama360</a> <a href="https://creativecommons.org/licenses/by/4.0/">Licenced by CC-BY 4.0</a>.</p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Lowe, Robyn and Rochead, Edward. 2025. “All Creatures, Great, Small, and Artificial.” Real World Data Science, August 22, 2025. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2025/08/22/veterinary-medicine.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>

 ]]></description>
  <category>AI</category>
  <category>Algorithms</category>
  <category>Machine Learning</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2025/08/26/new veterinary medicine.html</guid>
  <pubDate>Tue, 26 Aug 2025 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2025/08/26/images/vet.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>RSS: Data Science and Artificial Intelligence - showcase your research</title>
  <link>https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/02/05/DSAI-journal.html</link>
  <description><![CDATA[ 





<p><img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/02/05/images/RSS-DSAI-Logo-blue.png" class="img-fluid" style="width:80.0%" alt="RSS Data Science and AI logo"><br>
</p>
<p><em>RSS: Data Science and Artificial Intelligence</em> provides a new forum for research of interest to a broad readership, spanning the data science fields. Created in recognition of the growing importance of data science and artificial intelligence in science and society, the new journal aims to fill the need for a venue that truly spans the relevant fields.</p>
<div class="img-float">
<p><img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/02/05/images/RSS-DSAI-cover.jpg" class="img-fluid" style="float: left; margin-right: 25px;;width:25.0%"></p>
</div>
<p>This new open access journal joins the RSS family of world class statistics journals and is published by Oxford University Press.</p>
<section id="scope-and-type-of-papers" class="level2">
<h2 class="anchored" data-anchor-id="scope-and-type-of-papers">Scope and type of papers</h2>
<p><em>RSS: Data Science and Artificial Intelligence</em> is seeking high-quality papers from across the breadth of these disciplines, which encompass statistics, machine learning, deep learning, econometrics, bioinformatics, engineering, computational social sciences and beyond.</p>
<p>As well as three primary paper types – method papers, applications papers and behind-the-scenes papers – <em>RSS: Data Science and Artificial Intelligence</em> will publish editorials, op-eds, interviews, and reviews/perspectives, in line with its goal to become a primary destination for data scientists.</p>
</section>
<section id="why-publish" class="level2">
<h2 class="anchored" data-anchor-id="why-publish">Why Publish?</h2>
<p><em>RSS: Data Science and Artificial Intelligence</em> offers an exciting open access venue for your work, with a broad reach and peer review overseen by editors esteemed in their fields. Discover more about <a href="https://academic.oup.com/rssdat/pages/why-publish" target="_blank">why the new journal is the ideal platform for showcasing your research</a>.</p>
</section>
<section id="submit-a-paper" class="level2">
<h2 class="anchored" data-anchor-id="submit-a-paper">Submit a paper</h2>
<p>Find out how to <a href="https://academic.oup.com/jrsssa/pages/general-instructions" target="_blank">prepare your manuscript</a> for submission and visit our submission site to <a href="https://mc.manuscriptcentral.com/rssdat" target="_blank">submit your paper</a>.</p>
<div class="keyline">
<hr>
</div>
</section>
<section id="editors" class="level2">
<h2 class="anchored" data-anchor-id="editors">Editors</h2>
<p>&nbsp;</p>
<div class="grid">
<div class="g-col-12 g-col-md-4">
<p><img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/02/05/images/Mukherjee_Sach.jpg" class="img-fluid" alt="Photo of Mukherjee, Director of Research in Machine Learning for Biomedicine at the MRC"></p>
<p><strong>Sach Mukherjee</strong> is Director of Research in Machine Learning for Biomedicine at the Medical Research Council (MRC) Biostatistics Unit, University of Cambridge, and Head of Statistics and Machine Learning at the German Center for Neurodegenerative Diseases.</p>
</div>
<div class="g-col-12 g-col-md-4">
<p><img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/02/05/images/silvia-chiappa.jpeg" class="img-fluid" alt="Silvia Chiappa, Research Scientist at Google DeepMind"></p>
<p><strong>Silvia Chiappa</strong> is a Research Scientist at <a href="https://deepmind.com/" target="_blank">Google DeepMind</a> London, where she leads the Causal Intelligence team, and Honorary Professor at the <a href="https://www.ucl.ac.uk/computer-science/" target="_blank">Computer Science Department</a> of University College London.</p>
</div>
<div class="g-col-12 g-col-md-4">
<p><img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/02/05/images/neil-lawrence.png" class="img-fluid" alt="Neil Lawrenece, DeepMind Professor of Machine Learning at the University of Cambridge"></p>
<p><strong>Neil Lawrenece</strong> is the inaugural DeepMind Professor of Machine Learning at the University of Cambridge. He has been working on machine learning models for over 20 years. He recently returned to academia after three years as Director of Machine Learning at Amazon.</p>
</div>
</div>
<p><br>
</p>
<p><strong>View the full editorial board here:</strong> <a href="https://academic.oup.com/rssdat/pages/editorial-board" target="_blank">Editorial Board | RSS Data Science | Oxford Academic (oup.com)</a></p>
</section>
<section id="open-access" class="level2">
<h2 class="anchored" data-anchor-id="open-access">Open Access</h2>
<p><em>RSS: Data Science and Artificial Intelligence</em> is fully open access (OA) and is published by Oxford University Press (OUP). Your research will be free to read and can be accessed globally. An OA license increases the visibility of your research and creates more opportunities for fellow researchers to read, share, cite, and build upon your findings.</p>
<p>The cost of publishing Open Access may be covered under a Read and Publish agreement between OUP and the corresponding author’s institution. <a href="https://academic.oup.com/pages/open-research/read-and-publish-agreements/participating-journals-and-institutions" target="_blank">Find out if your institution is participating</a>. Members of the Royal Statistical Society can submit papers at a reduced cost.</p>
<p>Explore the journal’s website now: <a href="https://www.academic.oup.com/rssdat" target="_blank">www.academic.oup.com/rssdat</a></p>
<div class="article-btn">
<p><a href="../../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2023 Royal Statistical Society
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img style="height:22px!important;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1"><img style="height:22px!important;margin-left:3px;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1"></a> This article is licensed under a Creative Commons Attribution (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">International licence</a>.</p>
</div>
<div class="g-col-12 g-col-md-6">

</div>
</div>
</div>


</section>

 ]]></description>
  <category>AI</category>
  <category>Data Science</category>
  <category>Machine learning</category>
  <category>Deep learning</category>
  <category>Econometrics</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/02/05/DSAI-journal.html</guid>
  <pubDate>Wed, 05 Feb 2025 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2025/02/05/images/RSS-DS-AI-cover.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>The machine learning victories at the 2024 Nobel Prize Awards and how to explain them</title>
  <dc:creator>Anna Demming</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2024/10/31/machine-learning-nobel-prizes.html</link>
  <description><![CDATA[ 





<p>Few saw it coming when, on 8 October 2024, the Nobel Committee awarded the <a href="https://www.nobelprize.org/prizes/physics/2024/prize-announcement/">2024 Nobel Prize for Physics</a> to John Hopfield for his Hopfield networks and Geoffrey Hinton for his Boltzmann machines, seminal developments towards machine learning with statistical physics at their heart. The next day, machine learning, albeit using a different architecture, bagged half of the <a href="https://www.nobelprize.org/prizes/chemistry/2024/prize-announcement/">Nobel Prize for Chemistry</a> as well, with the award going to Demis Hassabis and John Jumper for the development of an algorithm that predicts protein folding conformations. The other half of the Chemistry Nobel was awarded to David Baker for successfully building new proteins.</p>
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Close-up of a copy of the Nobel Prize Medal. Photographed on the floor of the Nobel Museum in Old Town, Stockholm. Machine learning came up a winner in both the Physics and Chemistry Nobel Prizes for 2024. Credit: Shutterstock">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/10/31/images/Nobelpic-shutterstock-991.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Close-up of a copy of the Nobel Prize Medal. Photographed on the floor of the Nobel Museum in Old Town, Stockholm. Machine learning came up a winner in both the Physics and Chemistry Nobel Prizes for 2024. Credit: Shutterstock">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Close-up of a copy of the Nobel Prize Medal. Photographed on the floor of the Nobel Museum in Old Town, Stockholm. Machine learning came up a winner in both the Physics and Chemistry Nobel Prizes for 2024. Credit: Shutterstock
</figcaption>
</figure>
</div>
<p>While the AI takeover at this year’s Nobel announcements for Physics and Chemistry came as a surprise to most, there has been keen interest in how these apparently different approaches to machine learning might actually reduce to the same thing, revealing new ways of extracting some fundamental explainability from the generative AI algorithms that have so far been considered effectively “black boxes”. The “transformer architectures” behind the likes of ChatGPT and AlphaFold are incredibly powerful but offer little explanation as to how they reach their solutions, so people have resorted to querying the algorithms and adding to them in order to extract information that might offer some insights. “This is a much more conceptual understanding of what’s going on,” says Dmitry Krotov, now a researcher at IBM Research in Cambridge, Massachusetts, who, working alongside John Hopfield, took some of the first steps towards bringing the two types of machine learning algorithm together.</p>
<section id="collective-phenomena" class="level2">
<h2 class="anchored" data-anchor-id="collective-phenomena">Collective phenomena</h2>
<p>Hopfield networks brought to neural networks some of the mathematical toolbox long applied to extract “collective phenomena” from vast numbers of essentially identical parts, such as atoms in a gas or atomic spins in magnetic materials. Although there may be too many particles to track each individually, properties like temperature and magnetic field can be extracted using statistical physics. Hopfield showed that, similarly, a useful phenomenon he described as “associative memory” could be constructed from large numbers of artificial neurons by defining an “energy” that describes the network of neurons. The energy is determined by connections between neurons, which store information about patterns. Thus the network can retrieve the memorised patterns by minimising that energy, just as stable conformations of atomic spins might be found in a magnetic material<sup>1</sup>. As the energy of the network is minimised, the pattern gets closer to the one that was memorised, just as when recalling a word or someone’s name we might first run through similar sounding words or names.</p>
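<p>A minimal Python sketch may help make the energy picture concrete. This is an illustration of the classical construction, not code from any of the papers discussed: patterns are stored in Hebbian weights, and flipping one neuron at a time towards lower energy pulls a corrupted input back to the nearest stored pattern.</p>
<pre><code class="language-python">import numpy as np

def store(patterns):
    """Hebbian learning: the weight matrix encodes +/-1 patterns."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)  # no self-connections
    return W

def energy(W, s):
    """Hopfield energy; asynchronous updates never increase it."""
    return -0.5 * s @ W @ s

def recall(W, s, steps=200, seed=0):
    """Repeatedly flip single neurons in the direction of lower
    energy until the state settles into a stored minimum."""
    s = s.copy()
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s &gt;= 0 else -1
    return s
</code></pre>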
<p>These Hopfield networks proved a seminal step in progressing AI algorithms, enabling a kind of pattern recognition from multiple stored patterns. However, it turned out that the number of patterns that could be stored was fundamentally limited due to what are known as “local” minima. You can imagine a ball rolling down a hill – it will reach the bottom of the hill fine so long as there are no dips for it to get stuck in en route. Algorithms based on Hopfield networks were prone to getting stuck in such dips, or undesirable local minima, until Hopfield and Krotov put their heads together to find a way around it. Krotov describes himself as “incredibly lucky” that his research interests aligned so well with Hopfield’s. “He’s just such a smart and genuine person, and he has been in the field for many years,” he tells Real World Data Science. “He just knows things that no one else in the world knows.” Together they worked out they could address the problem of local minima by toggling the “activation function”.</p>
<div id="fig-2" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Energy Landscape of a Hopfield Network, highlighting the current state of the network (up the hill), an attractor state to which it will eventually converge, a minimum energy level and a basin of attraction shaded in green. Note how the update of the Hopfield Network is always going down in Energy. Credit: Mrazvan22/wikimedia">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/10/31/images/Energy_landscape.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Energy Landscape of a Hopfield Network, highlighting the current state of the network (up the hill), an attractor state to which it will eventually converge, a minimum energy level and a basin of attraction shaded in green. Note how the update of the Hopfield Network is always going down in Energy. Credit: Mrazvan22/wikimedia">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Energy Landscape of a Hopfield Network, highlighting the current state of the network (up the hill), an attractor state to which it will eventually converge, a minimum energy level and a basin of attraction shaded in green. Note how the update of the Hopfield Network is always going down in Energy. Credit: Mrazvan22/wikimedia
</figcaption>
</figure>
</div>
<p>In a Hopfield network all the neurons are connected to all the other neurons. Originally, however, the algorithm only considered interactions between two neurons at a time – i.e.&nbsp;the interaction between neuron 1 and neuron 2, neuron 1 and neuron 3, and neuron 2 and neuron 3, but not the interaction among all three together. By including such “higher order” interactions between more than two neurons, Krotov and Hopfield found they made the basins of attraction for the true minimum energy states deeper. You can think of it a little like the ball rolling down a steeper hill, so that it picks up more momentum along the slope of the main hill and is less prone to falling into little dips en route. This way Krotov and Hopfield increased the memory of Hopfield networks in what they called Dense Associative Memory, which they described in 2016<sup>2</sup>.</p>
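<p>In symbols, the change can be written as a modified energy function. Where the classical network’s energy is quadratic in the overlap between the state and each stored pattern, dense associative memory applies a rapidly growing interaction function to that overlap; the notation below follows the spirit of the 2016 paper rather than reproducing it exactly:</p>
<p>\[ E(\mathbf{s}) = -\sum_{\mu=1}^{K} F\big(\boldsymbol{\xi}^{\mu} \cdot \mathbf{s}\big), \qquad F(x) = x^{n}, \]</p>
<p>where \(\boldsymbol{\xi}^{\mu}\) are the \(K\) stored patterns and \(n = 2\) recovers the classical Hopfield network; larger \(n\) (or an exponential \(F\)) sharpens the basins of attraction and increases storage capacity.</p>
<p>Long before then, however, Geoffrey Hinton had found a different tack to follow to increase the power of this kind of neural network.</p>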
</section>
<section id="generative-ai" class="level2">
<h2 class="anchored" data-anchor-id="generative-ai">Generative AI</h2>
<p>Geoffrey Hinton showed that by defining some neurons as a hidden layer and some as a visible layer (a Boltzmann machine<sup>3</sup>), and limiting the connections so that neurons are only connected with neurons in other layers (a restricted Boltzmann machine<sup>4</sup>), finding the most likely network would generate configurations with meaningful similarities to the data – a type of generative AI. This and many other contributions by Geoffrey Hinton proved incredibly useful in the progress of machine learning. However, the generative AI algorithms grabbing headlines today have actually been devised using a “transformer” architecture, which differs from Hopfield networks and Boltzmann machines – or so it seemed initially.</p>
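<p>For reference, the standard formulation of a restricted Boltzmann machine (notation ours) assigns every joint configuration of visible units \(\mathbf{v}\) and hidden units \(\mathbf{h}\) an energy, with more probable configurations having lower energy:</p>
<p>\[ E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h}, \qquad p(\mathbf{v}, \mathbf{h}) \propto e^{-E(\mathbf{v}, \mathbf{h})}, \]</p>
<p>where the “restriction” is that the weight matrix \(W\) only connects visible units to hidden units, never units within the same layer, which is what makes sampling and training tractable.</p>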
<p>Transformer algorithms first emerged as a type of language model and were defined by a characteristic termed “attention”. “They say that each word represents a token, and essentially the task of attention is to learn long-range correlations between those tokens,” Krotov explains, using the word “bank” as an example. Whether the word means the edge of a river or a financial institution can only be ascertained from the context in which it appears. “You learn these long-range correlations, and that allows you to contextualize and understand the meaning of every word.” The approach was first reported in 2017 in a paper titled “Attention is all you need”<sup>5</sup> by researchers at Google Brain and Google Research.</p>
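<p>The mechanism at the heart of that paper is scaled dot-product attention: each token’s query vector is compared against every token’s key vector, and the resulting weights, normalised by a softmax, blend the value vectors into a context-aware representation:</p>
<p>\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V, \]</p>
<p>where \(Q\), \(K\) and \(V\) are the query, key and value matrices and \(d_k\) is the key dimension.</p>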
<p>It was not long before people figured out that the approach would enable powerful algorithms for tasks beyond language manipulation – including Demis Hassabis and John Jumper at DeepMind, as they worked to figure out an algorithm that could predict the folding conformations of proteins. The algorithm they landed on in 2020 – AlphaFold2 – was capable of protein conformation prediction with 90% accuracy, way ahead of any other algorithm at the time, including DeepMind’s previous attempt, AlphaFold, which, although streets ahead of the field when it was developed in 2018, still only achieved an accuracy of 60%. It was for the extraordinary predictive powers for protein conformations achieved by AlphaFold2 that Hassabis and Jumper were awarded half the 2024 Nobel Prize for Chemistry.</p>
</section>
<section id="connecting-the-dots" class="level2">
<h2 class="anchored" data-anchor-id="connecting-the-dots">Connecting the dots</h2>
<p>Transformer architectures are undoubtedly hugely powerful, but how they operate can seem something of a dark art: although computer scientists know how they are programmed, even they cannot tell how the networks reach their conclusions in operation. Instead they query the algorithm and add to it to try to get some pointers as to what the trail of logic might have been. Here Hopfield networks have an advantage, because people can hope to get a grasp on what energy minima they are converging to, and in that way get a handle on their working out. However, in their paper “Hopfield networks is all you need”<sup>6</sup>, researchers in Austria and Norway showed that the activation function, which Hopfield and Krotov had toggled to make Hopfield networks store more memories, can also link them to transformer architectures – essentially, if the function is exponential they can reduce to the same thing.</p>
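<p>The correspondence can be stated compactly. With the stored patterns collected as the columns of a matrix \(X\) and an exponential interaction function, the update rule of the modern Hopfield network for a query state \(\boldsymbol{\xi}\) becomes a softmax over pattern similarities (our paraphrase of the paper’s result):</p>
<p>\[ \boldsymbol{\xi}^{\text{new}} = X\,\mathrm{softmax}\big(\beta\, X^{\top} \boldsymbol{\xi}\big), \]</p>
<p>which has exactly the form of the attention formula above, with the stored patterns playing the role of keys and values and \(\beta\) playing the role of the \(1/\sqrt{d_k}\) scaling.</p>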
<p>“We think about attention as learning long-range correlations, and this dense associative memory interpretation of attention tells you that each word creates a basin of attraction,” Krotov explains. “Essentially, the contextualization of the unknown word happens through the attraction to these different memories,” he adds. “That kind of lens of thinking about transformers through the prism of energy landscapes – it’s opened up this whole new world where you can think about what transformers are doing computationally, and how they perform that computation.”</p>
<p>“I think it’s great that the power of these tools is being recognised for the impact that they can have in accelerating innovation in new ways,” says Janet Bastiman, RSS Data Science and AI Section Chair and Chief Data Scientist at financial crime compliance solutions company Napier AI, commenting on the Nobel Prize awards. Bastiman’s most recent work has been on adding explanation to networks. She notes how the “Hopfield networks is all you need” paper highlights “the difference that layers can have on the final outcomes for specific tasks and a clear need for understanding some of the principles of the layers of networks in order to validate results and be aware of potential difficulties and ‘best’ scenarios for different use cases.”</p>
<p>Krotov also points out that since Hopfield networks are rooted in neurobiological interpretations, it helps to find “neurobiological ways of interpreting their computation” for transformer algorithms too. As such the vein Hopfield and Hinton tapped into with their seminal advances is proving ever richer in what Krotov describes as “the emerging field of the physics of neural computation”.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the author</dt>
<dd>
<strong>Anna Demming</strong> is a freelance science writer and editor based in Bristol, UK. She has a PhD from King’s College London in physics, specifically nanophotonics and how light interacts with the very small, and has been an editor for Nature Publishing Group (now Springer Nature), IOP Publishing and New Scientist. Other publications she contributes to include The Observer, New Scientist, Scientific American, Physics World and Chemistry World.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Anna Demming
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img style="height:22px!important;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1"><img style="height:22px!important;margin-left:3px;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1"></a> Text, code, and figures are licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">International licence</a>, except where otherwise noted. Thumbnail image by <a href="https://www.shutterstock.com/image-photo/mute-key-on-neat-white-keyboard-1832448097">Shutterstock/Park Kang Hun</a> <a href="https://creativecommons.org/licenses/by/4.0/">Licenced by CC-BY 4.0</a>.</p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Demming, Anna. 2024. “The machine learning victories at the 2024 Nobel Prize awards and how to explain them” Real World Data Science, October 31, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/10/31/machine-learning-nobel-prizes.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">References</h2>

<ol>
<li id="fn1"><p>Hopfield J J Neural networks and physical systems with emergent collective computational abilities <em>PNAS</em> <strong>79</strong> 2554-2558 (1982) <a href="https://www.pnas.org/doi/pdf/10.1073/pnas.79.8.2554">https://www.pnas.org/doi/pdf/10.1073/pnas.79.8.2554</a>↩︎</p></li>
<li id="fn2"><p>Krotov D and Hopfield J J Dense Associative Memory for Pattern Recognition <em>NIPS</em> (2016)<a href="https://papers.nips.cc/paper_files/paper/2016/hash/eaae339c4d89fc102edd9dbdb6a28915-Abstract.html">https://papers.nips.cc/paper_files/paper/2016/hash/eaae339c4d89fc102edd9dbdb6a28915-Abstract.html</a>↩︎</p></li>
<li id="fn3"><p>Ackley D H, Hinton G E and Sejnowski T E A learning algorithm for boltzmann machines <em>Cognitive Science</em> <strong>9</strong> 147-169 (1985) <a href="https://www.sciencedirect.com/science/article/pii/S0364021385800124">https://www.sciencedirect.com/science/article/pii/S0364021385800124</a>↩︎</p></li>
<li id="fn4"><p>Salakhutdinov R, Mnih A and Hinton G Restricted Boltzmann machines for collaborative filtering <em>ICML ’07: Proceedings of the 24th international conference on Machine learning</em> 791-798 (2007) <a href="https://dl.acm.org/doi/10.1145/1273496.1273596">https://dl.acm.org/doi/10.1145/1273496.1273596</a>↩︎</p></li>
<li id="fn5"><p>Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I Attention is all you need <em>NIPS</em> (2017)<a href="https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html">https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html</a>↩︎</p></li>
<li id="fn6"><p>Ramsauer H, Schäfl B, Lehner J, Seidl P, Widrich M, Adler T, Gruber L, Holzleitner M, Pavlović M, Kjetil Sandve G, Greiff V, Kreil D, Kopp M, Klambauer G, Brandstetter J and Hochreiter S <em>arXiv</em> (2020) <a href="https://arxiv.org/abs/2008.02217">https://arxiv.org/abs/2008.02217</a>↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>AI</category>
  <category>Algorithms</category>
  <category>Machine Learning</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2024/10/31/machine-learning-nobel-prizes.html</guid>
  <pubDate>Thu, 31 Oct 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2024/10/31/images/Nobelpic-shutterstock-991.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Are we at risk of muting the female voice in the digital world?</title>
  <dc:creator>Anna Demming</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2024/09/17/digital-gender-gap.html</link>
  <description><![CDATA[ 





<p>Knowledge is power, and today a lot of that knowledge – not just what you know but who you know – is online. In 2015 the UN General Assembly laid out 17 Sustainable Development Goals (SDGs) that aim to end poverty and other deprivations while improving the welfare of both people and the planet. One of the <a href="https://sdgs.un.org/goals/goal5#targets_and_indicators">SDGs deals with gender equality</a> and emphasises the importance of digital technology for empowering women. Online, a woman can engage in commercial, social, business or networking transactions without needing to be absent from care responsibilities at home, to maintain traditional 9-5 working hours or, in some instances, even to reveal that she is a woman at all – all potentially transformative features of online engagement<sup>1</sup>. Yet whether digital technology can in fact empower women is by no means clear cut.</p>
<p>‘For me, whether digital technologies are able to empower women was fundamentally an empirical question,’ says <a href="https://www.sociology.ox.ac.uk/people/ridhi-kashyap">Ridhi Kashyap</a>, professor of demography and computational data science at Oxford University. She adds that in order to ask these questions of impact, you first need to be able to measure inequalities in digital access. However, the pace of technological change has been a lot faster than the rate at which national censuses – or other kinds of surveys useful to social scientists – update their questions, so they shed little light on the demographics around digital technologies.</p>
<p>Since then, progress in accruing data on digital access has revealed some stark gender inequalities. However, access is not the only fly in the ointment when it comes to the potential for digital technology to help towards gender equality. ‘The most harmful illegal online content disproportionately affects women and girls,’ says the <a href="https://www.gov.uk/government/publications/online-safety-act-explainer/online-safety-act-explainer#how-the-act-protects-women-and-girls">explainer for the UK’s 2023 Online Safety Act</a>. A <a href="https://www.turing.ac.uk/news/publications/understanding-gender-differences-experiences-and-concerns-surrounding-online">study by the Turing Institute</a> published earlier this year has added nuance to this picture, but confirmed that many women feel particularly vulnerable online, suggesting women may be losing a seat at the table as debate and discourse increasingly move online.</p>
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Muted. In the absence of proactive intervention, the shift of debate and discourse online risks muting women and girls as multiple factors exclude them from engaging there as productively as male counterparts. Copyright: Park Kang Hun/Shutterstock." data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/09/17/images/Minoan-Illustration.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Muted. In the absence of proactive intervention, the shift of debate and discourse online risks muting women and girls as multiple factors exclude them from engaging there as productively as male counterparts. Copyright: Park Kang Hun/Shutterstock.">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Muted. In the absence of proactive intervention, the shift of debate and discourse online risks muting women and girls as multiple factors exclude them from engaging there as productively as male counterparts. Copyright: Park Kang Hun/Shutterstock.
</figcaption>
</figure>
</div>
<p>The digital gender gap has an estimated cost of US$126 billion for the 32 low- and low-to-middle-income countries analysed by the Alliance for Affordable Internet (A4AI)<sup>2</sup>, owing to the ‘untold wealth of cultural, social, and scientific knowledge lost because of the exclusion of women’s and girls’ voices from the online world.’ Focus on this issue has brought a little more clarity to the size of the problem. However, while the UK’s Online Safety Act marks some progress, questions remain as to what can be done, and whether the hope of digital technologies helping towards gender equality is still justified.</p>
<section id="gender-disparities-in-internet-access" class="level2">
<h2 class="anchored" data-anchor-id="gender-disparities-in-internet-access">Gender disparities in internet access</h2>
<p>A turning point in the conversation around digital technology and gender equality came in 2018 with work by Kashyap and collaborators, at the time based in the US and Qatar. They found that where traditional survey-based data on internet and mobile gender gaps was available, it correlated well with the gender gap on Facebook, using data extracted from Facebook’s ad platform: when Facebook’s aggregate user counts did not show women, it provided a good signal that women in those countries were largely not online. As such, the work revealed a potentially useful proxy for gauging the digital gender gap in countries where little traditional survey data was available<sup>3</sup>. <a href="https://www.digitalgendergaps.org/">The results</a> revealed an unexpectedly large gender gap, particularly in parts of South Asia and certain countries in Africa, where men were up to twice as likely as women to have access to the internet.</p>
<p>‘In some sense it was perhaps not surprising,’ says Kashyap, highlighting that having a mobile phone or similar device that grants access to the internet amounts to a kind of asset ownership, and studies of other assets indicate women are less likely to own them. ‘This is broadly reflective of economic gender inequality,’ she adds. Perhaps more surprising is that the gaps have changed very little in the five years since <a href="https://www.digitalgendergaps.org/">their website, which monitors the digital gender gap</a>, was first released, particularly in view of the pace of technological progress in general and the importance placed on closing the gap. Citing India as an example, Kashyap points out that in 2019 the ratio of women’s to men’s internet access was 0.619 – fewer than two women had access for every three men with access. In the subsequent half decade this digital gender gap has closed by just 7.1%, to a ratio of 0.663.</p>
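<p>To make those figures concrete, here is a minimal sketch of the arithmetic in Python, using only the two India ratios quoted above (the variable names are illustrative, not taken from the Digital Gender Gaps codebase):</p>
<pre><code class="language-python"># Minimal sketch: relative closure of India's digital gender gap,
# using the two female-to-male internet access ratios quoted above.
# An index like this is typically built from gendered penetration
# rates, e.g. (online_women / all_women) / (online_men / all_men);
# values below 1 indicate women are underrepresented online.
ratio_2019 = 0.619    # fewer than 2 women online for every 3 men
ratio_latest = 0.663  # roughly half a decade later

relative_change = (ratio_latest - ratio_2019) / ratio_2019
print(f"Gap closed by {relative_change:.1%}")  # ~7.1%
</code></pre>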
<div id="fig-2" class="quarto-float quarto-figure quarto-figure-center anchored" alt="The digital gender gap. Ratio of female-to-male internet use estimated using the Facebook Gender Gap Index^[Leasure D R, Yan J, Bondarenko M, Kerr D, Fatehkia M, Weber I &amp; Kashyap R. Digital Gender Gaps Web Application, v1.0.0. Zenodo, GitHub (2023) [doi:10.5281/zenodo.7897491](https://github.com/OxfordDemSci/dgg-www)]" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/09/17/images/Digital gender gap.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="The digital gender gap. Ratio of female-to-male internet use estimated using the Facebook Gender Gap Index^[Leasure D R, Yan J, Bondarenko M, Kerr D, Fatehkia M, Weber I &amp; Kashyap R. Digital Gender Gaps Web Application, v1.0.0. Zenodo, GitHub (2023) [doi:10.5281/zenodo.7897491](https://github.com/OxfordDemSci/dgg-www)]">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: The digital gender gap. Ratio of female-to-male internet use estimated using the Facebook Gender Gap Index<sup>4</sup>
</figcaption>
</figure>
</div>
<p>In countries where the gender disparity in internet access is large, there is evidence that those women who do have access come from the more affluent echelons of society. Analysis of the type of device used, which can also be retrieved from the Facebook ad platform, showed that where women are less likely to be online, the relative proportion of iOS users tends to be higher among women than among men – and, as Kashyap points out, ‘iOS users are on average wealthier’. Fortunately, mobile network providers are among the stakeholders starting to see the benefit of closing the gender gap in internet access, and are looking to tap into this part of the market through incentives and discounts on SIMs for women. However, it is unclear to what extent such schemes ultimately help close the wider gap.</p>
<p>Kashyap and her colleagues also found that a key predictor of the digital gender gap was the gender gap in educational attainment. ‘I think that’s quite telling, because it’s showing that accessing education and going to educational institutions is also a pathway to becoming more digitally integrated,’ says Kashyap, flagging that schools and educational institutions are where women and girls often access computers and digital technologies. She highlights that beyond giving people a device ‘more of the challenge’ is helping them make good use of it by ‘giving people skills to feel that this is actually meaningful for them, and allows them to do things that they wouldn’t be able to do otherwise, and feeling confident and safe and secure.’ She emphasises the importance of men valuing gender equality, highlighting work from South Asia that shows that even when women have a device, their use of it may be curtailed or scrutinised by male members of the household, sometimes on the grounds of <a href="https://www.gsma.com/solutions-and-impact/connectivity-for-good/mobile-for-development/blog/the-mobile-gender-gap-in-south-asia-is-now-widening/">doubts over women’s safety online</a>.</p>
</section>
<section id="gender-disparities-in-fears-of-online-harms" class="level2">
<h2 class="anchored" data-anchor-id="gender-disparities-in-fears-of-online-harms">Gender disparities in ‘fears’ of online harms</h2>
<p>Safety can be a knotty issue when it comes to enabling women to have a voice online. A study by the Alan Turing Institute<sup>5</sup> earlier this year suggested just 23% of women feel comfortable expressing political opinions online, compared with 40% of men. This might be down to women being exposed to online violence more than men, as previous studies of online harms have suggested. Indeed, a key takeaway from the Alan Turing Institute’s study was that women reported greater fears of exposure for all categories of harm, including types of harm that women reported experiencing less frequently than men.</p>
<p>Previous studies have largely surveyed women-only samples, so their conclusions were drawn without data on men for comparison. In contrast, the researchers at the Alan Turing Institute, including researcher <a href="https://www.turing.ac.uk/people/researchers/tvesha-sippy">Tvesha Sippy</a>, conducted a nationally representative survey of 2,000 men and women. They asked respondents whether they had been exposed to various types of online harms, about their fears surrounding such exposure, the psychological impact of those experiences, their tendency to use protective tools for digital activities, and how comfortable they felt with online behaviours such as expressing opinions and sharing information. The study revealed that women were significantly more likely than men to report experiencing some harms, such as online misogyny, cyberflashing, cyberstalking, image-based abuse and eating disorder content. However, there were several harms that men reported being the direct targets of more often than women, such as hate speech, misinformation, trolling and threats of physical violence.</p>
<p>By using a representative cohort, the Alan Turing Institute study tells a more nuanced story than those sampling women only, and it highlights the challenges of making similar assessments for minority groups. For example, those identifying as non-binary were excluded from the analysis because – although, as Sippy emphasises, ‘We do want to look at minoritised genders’ – there were not sufficient respondents in this category within the nationally representative survey for any meaningful analysis. Ultimately, a higher budget enabling larger samples would allow analysis of minority groups as well.</p>
<p>As for the greater fears of all online harms reported by women, ‘it’s a very complex phenomenon,’ Sippy tells Real World Data Science, highlighting the need for further research. She points to several possible explanations, such as differences in the impacts of the harms experienced more by women versus men, as well as fearfulness carried over from the offline world shaping behaviour online. Sippy also highlights differences in how men and women experience online harms, which may offer clues. Women were more likely to report that their fears stem from the experience of a public figure (35% of the women surveyed compared with 26% of the men) or a female friend (37% of the women compared with 27% of the men). Furthermore, the experience of a male friend was much less often cited as the source of online fears by both groups (8% of the women and 14% of the men). There is also the possibility that women’s adaptive behaviours leave them less exposed to future online harms than men, since women were more likely to use protective tools, from disabling location-sharing on a device to limiting who can engage with images, posts and tweets, or even find their profile. While protective, such adaptive behaviours could also dampen the influence women have in online discourse.</p>
<p>Rather than relying on adaptive behaviour for self-protection, it would seem a lot of people are keen to see more action from social media companies and governments to help people to feel safer online. In 2023, researchers at the Turing Institute led by senior research associate Florence Enock published a study investigating <a href="https://www.turing.ac.uk/news/publications/experiences-online-harms">attitudes to online interventions</a>. They found that 79% thought social media platforms should ban or suspend users who create harmful content and 73% thought that platforms should remove harmful content. According to the report ‘this was consistent across age, gender, educational background, income and political ideology.’</p>
<p>There are complications for social media companies, which need to balance privacy with protection, and to find the resources required to handle multilingual posts when investigating what action to take. However, Sippy feels there remains a need for a civil remedy, so that a user can ask a platform to take down harmful content without having to pursue criminal proceedings and involve the police. Where the additional resources needed for corrective action and the lack of a business incentive pose an obstacle for social media companies, government legislation may help. The same study into attitudes to online interventions also reported that more than 70% of respondents felt the government should be able to issue large fines to platforms that fail to deal with harmful content online, and 66% thought that legal action should be taken.</p>
<p>‘The Online Safety Act is a really good start,’ adds Sippy, also highlighting the importance of proposals by the previous UK government to criminalise the creation of sexually explicit deepfakes. She points to a 2019 report by AI firm Deeptrace suggesting that of 15,000 deepfake videos found online, 96% constituted nonconsensual pornography, with women disproportionately targeted<sup>6</sup>. In a recent Alan Turing Institute survey, 90% of respondents expressed concerns about deepfakes increasing misogyny and online violence against women and girls<sup>7</sup>. ‘I do see there’s more advocacy, but it remains to be seen what approach the new Government will take.’</p>
</section>
<section id="gender-disparities-for-making-an-impact-online" class="level2">
<h2 class="anchored" data-anchor-id="gender-disparities-for-making-an-impact-online">Gender disparities for making an impact online</h2>
<p>Challenges to women being heard online go beyond safety issues. Recent research by Kashyap, collaborators at the University of Oxford, and collaborators in Iran and Germany has also highlighted differences in how influential women’s professional networks are relative to men’s<sup>8</sup>. In previous work with Florianne Verkroost, also at the University of Oxford, Kashyap had investigated gender gaps in who has a LinkedIn profile and how they vary across industries<sup>9</sup>. They found that use of the platform broadly mirrors female-to-male ratios of representation in technical and managerial professions. In the later study, they investigated what insights LinkedIn data might provide into the causes of some of the gender disparities in these professions – and ultimately why women are not progressing in technical and professional jobs as well as their male counterparts.</p>
<p>‘One argument is that that’s often because they don’t have advantageous networks,’ says Kashyap, adding that women may be restricted by the need to resume care commitments at home instead of staying for drinks after work or travelling to attend conferences. One might expect online avenues for networking to mitigate such obstacles. In fact, studies of LinkedIn data did suggest that although women are less likely to be in professional and technical occupations, as reflected in the platform’s data, in some instances their representation on the platform exceeded their offline numbers. Kashyap suggests this could be ‘where they’re using online platforms to make themselves more visible, because other offline forms of networking are less available, or they have less time for it.’ Indeed, women who were on LinkedIn were more likely to report a promotion than their male counterparts, suggesting an element of positive selection among the female LinkedIn user population. However, the potential equalising impact of moving professional networking online seems to have its limits.</p>
<p>Their study of LinkedIn data showed women were less likely to report relocating for work, which Kashyap suggests ‘is a sign that the work-family trade-off is probably still remaining acute for this highly selected group.’ In another 2023 study, based on bibliometric data from over 33 million Scopus publications, Kashyap and colleagues had also reported lower mobility for women among published scientists, researchers and academics<sup>10</sup>. In addition, when Kashyap and her colleagues looked at women on LinkedIn working in the tech sector, they found these women had a lower chance than men of being connected to people at one of the “big five” tech firms, when not working in one themselves. ‘One way to interpret that is to say that they have maybe less influential online social networks, right, even when they are on the platform.’</p>
<div id="fig-4" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Leaky pipeline. The proportion of women working in science decreases towards the mid and senior career stages" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-4-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/09/17/images/shutterstock_1215562669-h350.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Leaky pipeline. The proportion of women working in science decreases towards the mid and senior career stages">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-4-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Leaky pipeline. The proportion of women working in science decreases towards the mid and senior career stages.
</figcaption>
</figure>
</div>
<p>Kashyap suggests several reasons why women may have less influential networks online. For one, online networks are still likely to be influenced by the scenarios playing out offline, since referrals on these networks are based on the people you already know. The difference may also stem from the types of companies women tend to work in and the positions they hold. For instance, women are more likely to work in IT service support than in programming-intensive occupations, and here once again Kashyap suggests the work-family trade-off plays a role in women seeking less intensive or more flexible jobs. She highlights that girls equal or exceed the achievement of their male counterparts through school and continue to match them in their early careers, before their numbers start to drop off dramatically. ‘I think now there’s a growing recognition that this is actually a real conflict, the work family conflict,’ she tells Real World Data Science. Today’s young women are socialised to have ‘high achieving aspirations’, which can be hard to reconcile with ‘regressive norms’ for women to shoulder the bulk of caring responsibilities, particularly when starting a family.</p>
</section>
<section id="real-world-gender-disparities-in-career-development" class="level2">
<h2 class="anchored" data-anchor-id="real-world-gender-disparities-in-career-development">Real world gender disparities in career development</h2>
<p>Neuroscientist Joanne Kenney has also been following data on the gender gap in the science and tech sectors, and co-authored ‘A Snapshot of Female Representation in Twelve Academic Psychiatry Institutions Around the World’<sup>11</sup> with Elisabetta del Re, Assistant Professor of Psychiatry at Harvard Medical School. The figures published there show that globally women represent a large majority of early career scientists, but their numbers steadily decrease towards the mid and senior career stages – a negative correlation between career stage and female presence in science often referred to as the ‘leaky pipeline’ or ‘sticky floor’. ‘You don’t always hear their stories or the reasons why they’ve left,’ says Kenney, who notes that in her experience exit interviews in academia are rare. Just 24% of the UK’s total tech sector workforce are women, while black women account for only 0.7% of IT professionals, according to the 2024 UN Women UK and Kearney Consulting report <a href="https://www.kearney.com/about/diversity-equity-and-inclusion/gap-to-gateway">‘Gap to Gateway: diversity in tech as the key to the future’</a>, for which Kenney was an external collaborator. Kenney is currently working on another project, with a team of scientists from Europe, Africa, and North and South America led by del Re, to gather stories from women and other underrepresented groups in academic institutions around the world through focus groups aimed at better understanding their experiences of working in science.</p>
<p>For those who stick at it, the career path appears to be a steeper hike for women than their male counterparts. There is a citation-bias favouring male-authored articles<sup>12</sup>. Women also take on average nine years to transition to senior author whereas men take five<sup>13</sup>, and women are less likely to be promoted to leadership positions<sup>14</sup>. While women in science bear a measurably unequal career impact on entering parenthood<sup>15</sup>, some of these inequalities may also stem from sexism, which can range from fewer opportunities for mentorship and collaboration to outright harassment<sup>16</sup>.</p>
<p>‘I think a lack of mentorship and sponsorship are two big ones,’ says Kenney of the key discouraging factors for women at the mid-career point in tech and academia. In AI in particular, less than 3% of venture capital funding deals involving AI startups go to women-founded companies. The gender pay gap, which at 16% in the sector exceeds the overall pay gap of 11.6%, may be another disincentive.</p>
<p>In short, there is evidence of patriarchal subcultures at play – in the tech and science sectors and in the world in general – that can still pose a significant disadvantage to women. As Sippy points out, ‘Those subcultures also translate to the online world.’ Ultimately, while digital technologies may offer creative loopholes for side-stepping some aspects of gender bias and disadvantage, gender inequality needs to be tackled in both spaces in tandem.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the author</dt>
<dd>
<strong>Anna Demming</strong> is a freelance science writer and editor based in Bristol, UK. She has a PhD from King’s College London in physics, specifically nanophotonics and how light interacts with the very small, and has been an editor for Nature Publishing Group (now Springer Nature), IOP Publishing and New Scientist. Other publications she contributes to include The Observer, New Scientist, Scientific American, Physics World and Chemistry World.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Anna Demming
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img style="height:22px!important;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1"><img style="height:22px!important;margin-left:3px;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1"></a> Text, code, and figures are licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">International licence</a>, except where otherwise noted. Thumbnail image by <a href="https://www.shutterstock.com/image-photo/mute-key-on-neat-white-keyboard-1832448097">Shutterstock/Park Kang Hun</a> <a href="https://creativecommons.org/licenses/by/4.0/">Licenced by CC-BY 4.0</a>.</p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Demming, Anna. 2024. “Are we at risk of muting the female voice in the digital world?” Real World Data Science, September 17, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/09/17/digital-gender-gap.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">References</h2>

<ol>
<li id="fn1"><p>Sicat M, Xu A, Mehetaj E, Ferrantino M &amp; Chemutai V Leveraging ICT Technologies in Closing the Gender Gap World Bank <em>World Bank Group, Washington DC</em> (2020) <a href="https://documents1.worldbank.org/curated/en/891391578289050252">https://documents1.worldbank.org/curated/en/891391578289050252</a>↩︎</p></li>
<li id="fn2"><p>Web Foundation. The Costs of Exclusion: Economic Consequences of the Digital Gender Gap. Alliance for Affordable Internet (2021) <a href="https://a4ai.org/report/the-costs-of-exclusion-economic-consequences-of-the-digital-gender-gap/">https://a4ai.org/report/the-costs-of-exclusion-economic-consequences-of-the-digital-gender-gap/</a>↩︎</p></li>
<li id="fn3"><p>Fatehkia M, Kashyap R &amp; Ingmar Weber I Using Facebook ad data to track the global digital gender gap <em>World Development</em> <strong>107</strong> 189-209 (2018) <a href="https://www.sciencedirect.com/science/article/pii/S0305750X18300883">https://www.sciencedirect.com/science/article/pii/S0305750X18300883</a>↩︎</p></li>
<li id="fn4"><p>Leasure D R, Yan J, Bondarenko M, Kerr D, Fatehkia M, Weber I &amp; Kashyap R. Digital Gender Gaps Web Application, v1.0.0. Zenodo, GitHub (2023) <a href="https://github.com/OxfordDemSci/dgg-www">doi:10.5281/zenodo.7897491</a>↩︎</p></li>
<li id="fn5"><p>Stevens F, Enock F E, Sippy T, Bright J, Cross M, Johansson P, Wajcman J, Margetts H Z Understanding gender differences in experiences and concerns surrounding online harms: A nationally representative survey of UK adults Alan Turing Institute (2024) <a href="https://www.turing.ac.uk/news/publications/understanding-gender-differences-experiences-and-concerns-surrounding-online">https://www.turing.ac.uk/news/publications/understanding-gender-differences-experiences-and-concerns-surrounding-online</a>↩︎</p></li>
<li id="fn6"><p>Ajder H, Patrini G, Cavalli F &amp; Cullen L The State of Deepfakes: Landscape, Threats, and Impact, (2019) <a href="https://regmedia.co.uk/2019/10/08/deepfake_report.pdf">https://regmedia.co.uk/2019/10/08/deepfake_report.pdf</a>↩︎</p></li>
<li id="fn7"><p>Sippy T, Enock F E, Bright J &amp; Margetts H Z Behind the Deepfake: 8% Create; 90% Concerned Alan Turing Institute (2024) <a href="https://www.turing.ac.uk/news/publications/behind-deepfake-8-create-90-concerned">https://www.turing.ac.uk/news/publications/behind-deepfake-8-create-90-concerned</a>↩︎</p></li>
<li id="fn8"><p>Kalhor G, Gardner H, Weber I, Kashyap R <em>Proceedings of the Eighteenth International AAAI Conference on Web and Social Media</em> <strong>18</strong> (2024) <a href="https://ojs.aaai.org/index.php/ICWSM/article/view/31353">https://ojs.aaai.org/index.php/ICWSM/article/view/31353</a>↩︎</p></li>
<li id="fn9"><p>Kashyap R &amp; Verkroost F C J Analysing global professional gender gaps using LinkedIn advertising data EPJ Data Science <strong>10</strong> 39 (2021) <a href="https://epjds.epj.org/articles/epjdata/abs/2021/01/13688_2021_Article_294/13688_2021_Article_294.html">https://doi.org/10.1140/epjds/s13688-021-00294-7</a>↩︎</p></li>
<li id="fn10"><p>Zhao X , Akbaritabar A, Kashyap R &amp; Zagheni E A gender perspective on the global migration of scholars <em>PNAS</em> <strong>120</strong> e2214664120 <a href="https://www.pnas.org/doi/10.1073/pnas.2214664120">https://doi.org/10.1073/pnas.2214664120</a>↩︎</p></li>
<li id="fn11"><p>Kenney J, Ochoa S, Alnor M A, Ben-Azu B, Diaz-Cutraro L, Folarin R, Hutch A, Luckhoff H K, Prokopez C R, Rychagov N, Surajudeen B, Walsh L, Watts T, Del Re E C A Snapshot of Female Representation in Twelve Academic Psychiatry Institutions Around the World <em>Psychiatry Research</em> (2021) <a href="https://pubmed.ncbi.nlm.nih.gov/34986430/">doi: 10.1016/j.psychres.2021.114358</a>↩︎</p></li>
<li id="fn12"><p>Dworkin J D, Linn K A, Teich E G, Zurn P, Shinohara R T &amp; Bassett D S The extent and drivers of gender imbalance in neuroscience reference lists <em>Nature</em> <strong>23</strong> 918-926 (2020) <a href="https://www.nature.com/articles/s41593-020-0658-y">https://www.nature.com/articles/s41593-020-0658-y</a>↩︎</p></li>
<li id="fn13"><p>Bearden C E Accelerating the Bending Arc Toward Equality: A Commentary on Gender Trends in Authorship in Psychiatry Journals <em>Biological Psychiatry</em> <strong>86</strong> 575-576 (2019)<a href="https://www.biologicalpsychiatryjournal.com/article/S0006-3223(19)31588-4/abstract">https://www.biologicalpsychiatryjournal.com/article/S0006-3223(19)31588-4/abstract</a>↩︎</p></li>
<li id="fn14"><p>Clark J &amp; Horton R A coming of age for gender in global health <em>The Lancet</em> <strong>393</strong> p2367-2369 (2019) <a href="https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30986-9/abstract">https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30986-9/abstract</a>↩︎</p></li>
<li id="fn15"><p>Morgan A C, Way S F, Hoefer M J D, Larremore D B, Galesic M &amp; Clauset A The unequal impact of parenthood in academia <em>Science Advnaces</em> <strong>7</strong> eabd1996 <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7904257/">doi: 10.1126/sciadv.abd1996</a>↩︎</p></li>
<li id="fn16"><p>O’Connor P s gendered power irrelevant in higher educational institutions? Understanding the persistence of gender inequality *Interdisciplinary Science Reviews” <strong>48</strong> 669-686 (2023) <a href="https://www.tandfonline.com/doi/full/10.1080/03080188.2023.2253667#d1e144">https://doi.org/10.1080/03080188.2023.2253667</a>↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>gender equality</category>
  <category>skills</category>
  <category>ethics</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2024/09/17/digital-gender-gap.html</guid>
  <pubDate>Tue, 17 Sep 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2024/09/17/images/Minoan-Illustration-991.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Nowcasting upgrade for better real time estimation of GDP and inflation</title>
  <dc:creator>Atmajitsinh Gohil</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2024/6/25/nowcasting-3step.html</link>
  <description><![CDATA[ 





<p>Governments, policymakers and central banks across the world are wrestling to keep rising prices under control using monetary policies such as interest rate increases. The effectiveness of such policy changes should be assessed by monitoring inflation data as well as studying the impact on real GDP, making timely and accurate access to key economic indicators crucial for policy planning. The delay in publishing economic indicators such as real GDP, inflation and other labour-related series makes this real-time assessment of the economy particularly challenging. Now Menzie Chinn at the University of Wisconsin, Baptiste Meunier at the European Central Bank and Sebastian Stumpner at the Banque de France report an approach to “nowcasting”, built on previous research, that develops a framework using different machine learning techniques and is more flexible and adaptable than traditional methods<sup>1</sup>. They report the accuracy of their 3-step framework for nowcasting global trade volumes, showing how it can outperform traditional methods, and highlight that the 3-step framework can be extended beyond world trade data.</p>
<p>Nowcasting, an amalgamation of the terms “now” and “forecasting”, provides a methodology for assessing the current state of the economy by predicting the current value of inflation or real GDP. The <a href="https://www.newyorkfed.org/research/policy/nowcast#/overview">Federal Reserve Bank of New York</a> and <a href="https://www.atlantafed.org/cqer/research/gdpnow">Federal Reserve Bank of Atlanta</a> have used nowcasting to publish real-time GDP estimates for the USA. Similarly, the <a href="https://www.clevelandfed.org/indicators-and-data/inflation-nowcasting">Federal Reserve Bank of Cleveland estimates real-time inflation</a> using nowcasting methods.</p>
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="GDP digital drawing. Credit: Shutterstock, Vink Fan">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2024/6/25/images/GDPshutterstock_2302082265-991.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="GDP digital drawing. Credit: Shutterstock, Vink Fan">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Growth of GDP with statistical graph, 3d rendering. Digital drawing. Credit: Shutterstock, Vink Fan
</figcaption>
</figure>
</div>
<p>The basic principle of nowcasting is to utilise information that is published early, such as data published at higher frequency, survey data, financial indicators or economic indicators. For example, the running estimate of real GDP (aka GDPNow) that the Federal Reserve Bank of Atlanta provides is updated six or seven times a month, on weekdays when one of the seven input data sources is released. Similarly, the real GDP growth estimate that the Federal Reserve Bank of New York provides is based on data releases in categories such as housing and construction, manufacturing, surveys, retail and consumption, income, labour, international trade, prices and others.</p>
<p>Traditional methods of nowcasting do not provide an integrated framework: forecasters need to know which variables to use, then select methods for factor extraction and regression. Chinn, Meunier and Stumpner propose a sequential framework that first selects the most important predictors. The selected variables are then summarised using Principal Component Analysis (PCA), and the resulting factors are used as explanatory variables in the regression. Although traditional methods of nowcasting also utilised many of these techniques, the authors test various combinations of pre-selection, factor extraction and regression techniques and propose a combination that improves model accuracy.</p>
<section id="model-framework-improved-flexibility-and-accuracy" class="level2">
<h2 class="anchored" data-anchor-id="model-framework-improved-flexibility-and-accuracy">Model framework improved flexibility and accuracy:</h2>
<p>The framework’s three steps are performed in sequence. The first is pre-selection of the independent variables with the highest predictive power. In the second step, the selected variables are summarised into a few factors using a factor extraction method. The final step uses the factors from step 2 as explanatory variables in a regression.</p>
<div id="fig-2" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="The various methods that can be employed in the 3 step framework in Chinn et al (2024). Credit: National Bureau of Economic Research.">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2024/6/25/images/3step-framework-methods-big.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="The various methods that can be employed in the 3 step framework in Chinn et al (2024). Credit: National Bureau of Economic Research.">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: The various methods that can be employed in the 3 step framework in Chinn et al (2024). Credit: National Bureau of Economic Research.
</figcaption>
</figure>
</div>
<p>Figure&nbsp;2 summarises the various methods that can be employed at each step of the 3-step framework. In their report, Chinn, Meunier and Stumpner aim to identify the best technique for each of pre-selection, factor extraction and regression. Their preferred 3-step framework performs pre-selection using Least Angle Regression (LARS) and factor extraction using Principal Component Analysis (PCA), and employs a Macroeconomic Random Forest (MRF) machine learning technique for the nowcasting regression.</p>
<p>The performance of MRF is compared with traditional methods using Root Mean Square Error (RMSE), a measure of the deviation between actual and predicted values. The accuracy of the 3-step framework is tested by holding pre-selection and factor extraction fixed, to isolate the impact of the regression technique.</p>
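<p>To make the sequence concrete, the sketch below chains the three steps using scikit-learn building blocks: LARS for pre-selection, PCA for factor extraction, and a standard random forest standing in for the authors’ Macroeconomic Random Forest, which scikit-learn does not provide. This is a minimal sketch on synthetic data; the variables, parameter choices and stand-in regressor are illustrative, not those of the paper:</p>
<pre><code class="language-python"># Minimal sketch of the 3-step nowcasting framework, assuming a
# standard RandomForestRegressor as a stand-in for the authors'
# Macroeconomic Random Forest (MRF); data here is synthetic.
import numpy as np
from sklearn.linear_model import Lars
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # 50 candidate monthly indicators
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = X[:160], X[160:], y[:160], y[160:]

# Step 1: pre-select the indicators with the highest predictive power
lars = Lars(n_nonzero_coefs=10).fit(X_train, y_train)
selected = np.flatnonzero(lars.coef_)

# Step 2: summarise the selected indicators into a few factors
pca = PCA(n_components=3).fit(X_train[:, selected])
F_train = pca.transform(X_train[:, selected])
F_test = pca.transform(X_test[:, selected])

# Step 3: regress the target on the extracted factors
forest = RandomForestRegressor(n_estimators=500, random_state=0)
forest.fit(F_train, y_train)

# Compare models on RMSE, the deviation between actual and predicted
rmse = mean_squared_error(y_test, forest.predict(F_test)) ** 0.5
print(f"RMSE: {rmse:.3f}")
</code></pre>
<p>Holding steps 1 and 2 fixed while swapping the step 3 regressor, as the authors do in their comparisons, isolates the contribution of the regression technique to the RMSE.</p>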
<div id="fig-3" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Bar chart comparing the accuracy of different methods in terms of RMSE. Credit: National Bureau of Economic Research.">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2024/6/25/images/method-accuracy-big.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Bar chart comparing the accuracy of different methods in terms of RMSE. Credit: National Bureau of Economic Research.">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Bar chart comparing the accuracy of different methods in terms of RMSE. Credit: National Bureau of Economic Research.
</figcaption>
</figure>
</div>
<p>Figure&nbsp;3 compares the RMSE of traditional methods, tree-based machine learning models and machine learning regression models for backcasting (t-2 and t-1), nowcasting (t) and forecasting (t+1). It highlights the greater accuracy of MRF and gradient boosting compared with traditional models and other tree models across backcasting, nowcasting and forecasting.</p>
</section>
<section id="whats-next" class="level2">
<h2 class="anchored" data-anchor-id="whats-next">What’s Next?</h2>
<p>Organisations such as <a href="https://nowcastinglab.org/map">The Nowcasting Lab</a> provide GDP estimates for European countries. Nowcasting techniques have also been employed by humanitarian agencies, including the United Nations Refugee Agency (UNHCR), which uses nowcasting to estimate the forcibly displaced population. Nowcasting techniques, dashboards and tools have been implemented and accepted as reliable sources of information for policymaking at government organisations, central banks and financial organisations. The 3-step framework proposed by Chinn, Meunier and Stumpner is easily adaptable, flexible and more accurate, which will be valuable to the range of fields employing nowcasting.</p>
<div class="article-btn">
<p><a href="../../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the author</dt>
<dd>
<strong>Atmajitsinh Gohil</strong> is an independent researcher in the field of AI and ML, specifically managing AI and ML risk. He has worked with a consulting firm, assisting clients in model risk management. He graduated from SUNY Buffalo with an MS in Economics.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Atmajitsinh Gohil
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img style="height:22px!important;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1"><img style="height:22px!important;margin-left:3px;vertical-align:text-bottom;" src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1"></a> Text, code, and figures are licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">International licence</a>, except where otherwise noted. Thumbnail image by <a href="https://www.shutterstock.com/image-illustration/growth-gdp-statistical-graph-3d-rendering-2302082265">Shutterstock Van Fink</a>.</p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Gohil, Atmajitsinh. 2024. “Nowcasting upgrade for better real time estimation of GDP and inflation.” Real World Data Science, June 25, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/25/nowcasting-3step.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">References</h2>

<ol>
<li id="fn1"><p>Nowcasting World Trade with Machine Learning: a Three-Step Approach Chinn, M. D., Meunier, B. &amp; Stumpner, S. <em>NBER</em> <a href="https://www.nber.org/papers/w31419">DOI 10.3386/w31419</a>) ↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>Forecasting</category>
  <category>Machine learning</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2024/6/25/nowcasting-3step.html</guid>
  <pubDate>Tue, 25 Jun 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/datasciencebites/posts/2024/6/25/images/GDPshutterstock_2302082265-991.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>AI series: Ensuring new AI technologies help everyone thrive</title>
  <dc:creator>Anna Demming</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2024/06/11/ai-series-7.html</link>
  <description><![CDATA[ 





<p>“There’s some beautiful stories in clinical notes,” said Mark Sales, global strategy leader of the cloud technology company Oracle Life Sciences. He was speaking to delegates at the 2024 London Biotechnology Show about “unlocking health data and artificial intelligence within life sciences”, where opportunities abound, such as exploiting large language models (LLMs) to process some of the detailed information currently hidden in clinical notes into more structured data to inform fields like oncology. Oracle are also looking into using AI to take some of the luck out of connecting the right patients with clinical trials that might help them. The AI in Medicine and Surgery group at the University of Leeds, headed by Sharib Ali, has demonstrated the potential to reduce the number of times patients need to go through <a href="https://www.sciencedirect.com/science/article/pii/S0016508521030870">uncomfortable procedures like oesophageal scans</a> for Barrett’s syndrome, and is working on the potential to provide haptic feedback for robot-mediated surgery. The London Biotechnology Show delegates had already heard about all these opportunities. Nonetheless, Sales’s talk had opened with a note of caution: “There’s a lot more we could do, and there’s a lot more we probably shouldn’t do.”</p>
<p>It is an increasingly familiar caveat. “In the best scenario, AI could widely enrich humanity, equitably equipping people with the time, resources, and tools to pursue the goals that matter most to them,” suggest the <a href="https://partnershiponai.org/paper/shared-prosperity/">Partnership on AI</a>, a non-profit partnership of academic, civil society, industry, and media organizations. The goal of the partnership is to ensure AI brings a net positive contribution to society as a whole, not just a lucky minority – which, they suggest, will not necessarily be the case if we rely on chance and market forces to direct progress. While people working on developing and deploying AI tackle the burgeoning <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/04/22/ai-series-1.html">size and complexity of their models</a>, as well as the myriad <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/ai-series-3.html">requirements of testing and training data</a>, <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/21/ai-series-4.html">establishing whether a model is fit for purpose</a>, and <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/04/ai-series-6.html">dodging the numerous pitfalls that cause most AI projects to fail</a>, perhaps the greatest challenge remains the range of <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/14/ai-series-2.html">ethical considerations</a>: inclusiveness and fairness, robustness and reliability, transparency and accountability, privacy and security, and general forethought and design. The scope of societal impact can reach far beyond the immediate sphere of interaction with the model, or the interests of the companies deploying it, suggesting the need for some form of governance.</p>
<p>However, technology is moving fast in a lot of different directions. Even with agreed, sound values that all technological developments should respect, there is still space for companies to deploy AI models without supplying the resources and expertise needed for the roll-out to meet ethical and societal expectations. This expertise can range from the statistical skills required to ensure appropriate representation in training datasets, to the social science understanding needed to anticipate how people’s behaviour may change when interacting with the technology.</p>
<p>Although the right checks and balances to avoid potential negative societal impacts have been slower to develop than the technologies they should be regulating, some guiding principles are emerging from organisations labouring to assess with greater clarity what the real immediate and longer term hazards are, what has worked well in other sectors, and the impact of government actions so far. There is an element of urgency in the challenge. As the Partnership on AI put it, “Our current moment serves as a profound opportunity — one that we will miss if we don’t act now.”</p>
<section id="high-stakes" class="level2">
<h2 class="anchored" data-anchor-id="high-stakes">High stakes</h2>
<p>When OpenAI publicised their Voice Engine’s ability to clone human voices from just 15 seconds of audio, they too flagged the potential benefit for people with serious health conditions, since those with deteriorating speech could find a means to <a href="https://www.euronews.com/next/2024/04/01/openai-unveils-ai-voice-cloning-tech-that-only-needs-a-15-second-sample-to-work">have their speech restored</a>. However, voice clones had already been used to make robocalls to voters imitating the voice of President Joe Biden and <a href="https://news.sky.com/story/fake-ai-generated-joe-biden-robocall-tells-people-in-new-hampshire-not-to-vote-13054446">telling voters to stay at home</a>.</p>
<p>“The question you have to ask there is what’s the societal benefit of that tool? And what are the risks?” associate director at the Ada Lovelace Institute Andrew Strait told <em>Real World Data Science</em>. “They thankfully decided to not fully release it,” he adds, highlighting how the timing “right before an election year with 40 democracies across the world” could have made the release particularly problematic.</p>
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Themis, goddess of justice. External governance is required to ensure the outcomes of AI deployment are safe and just. Credit Shutterstock, Michal Bednarek">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/11/images/shutterstock_2436413315.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Themis, goddess of justice. External governance is required to ensure the outcomes of AI deployment are safe and just. Credit Shutterstock, Michal Bednarek">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Themis, goddess of justice. External governance is required to ensure the outcomes of AI deployment are safe and just. Credit Shutterstock, Michal Bednarek.
</figcaption>
</figure>
</div>
<p>While OpenAI’s voice engine might have made voice cloning more accessible had they proceeded with a full release, voice cloning is clearly still well within reach for some already. Strait cites the experiences of hundreds of performing artists in the UK over the past few months that have been brought to the attention of the Ada Lovelace Institute. “They’re brought into a room; they’re asked to record their voice and have their face and likeness scanned; and that’s the end of their career,” says Strait. The sums paid to artists on these transactions are not large either. “They are never going to be asked to come back for audition again, because they [the companies] can generate their likeness, that voice doing anything that a producer wants without any sense of attribution, further payments, or consent to be used in that way.”</p>
<p>Customer service is another sector where jobs have been threatened with replacement by generative AI chatbots. However, the technology can run into problems, since gen-AI is known to <a href="https://realworlddatascience.net/the-pulse/editors-blog/posts/2022/11/23/LLM-content-warning.html">“hallucinate”</a>, generating false information. Air Canada recently lost a case defending its use of a chatbot that misinformed a customer that they could apply for a bereavement fare retroactively, which is not the case according to Air Canada’s bereavement fare policy. In their defence, Air Canada flagged that the chatbot had supplied a link to a webpage with the correct information, but the court ruled that there was no reason to believe the webpage information over the chatbot, or for the customer to double-check the information they had been supplied. While there are <a href="https://mitsloanedtech.mit.edu/ai/basics/addressing-ai-hallucinations-and-bias/">ways to mitigate problems</a> with gen-AI with the right teams in place, other industries have also hit problems with the accuracy and reliability of gen-AI, which may dampen the impact AI has on the labour market. All in all, the wider picture of how AI deployment may affect jobs is largely a matter of speculation. Here a US pilot scheme may soon provide a framework for a more data-informed approach to <a href="https://realworlddatascience.net/ideas/posts/2024/05/28/ai-series-5.html">tackling AI’s impact on the workforce</a>.</p>
<p>Strait highlights that conversations that centre around efficiency when weighing up the possible advantages of introducing AI can be ill informed. “If we’re talking about an allocation of resources in which we’re spending an increasing amount of money on automating certain parts of the NHS, or healthcare or the education system, or public sector services, how are we making the decisions that are determining if that is worth the value for money? Instead of investing in more doctors, more teachers, more social workers?” He tells Real World Data Science that these are the questions he and his colleagues at the Ada Lovelace Institute are often pushing governments to try to answer and evidence rather than to just assume the benefits will accrue. When it comes to measures of success of an AI model, Strait says “It’s often defined in terms of how many staff can be cut and still deliver some kind of service…This is not a good metric of success,” he adds. “We don’t want to just get rid of as many jobs as we can, right, we want to actually see improvements in care, improvements in service.”</p>
<p>Michael Katell, ethics fellow in the Turing’s Public Policy Programme and a visiting senior lecturer at the Digital Environment Research Institute (DERI) at Queen Mary University of London, suggests the problems may go deeper still when looking at the use of generative AI in the creative industries. “There are definitely parallels with prior waves of disruption,” he says, citing as an example the move to drum-based and eventually laser printing, as opposed to manual typesetting. “A key difference, though, is that, in the creative arts, we’re talking about contributions to culture, and culture is something that, I think we often take for granted.” He highlights the often overlooked role that cultural practices which enable and empower shared experiences play in holding society together. These may come in various forms, from works of art to theatre, and the working and living practices of the wider community may play an important role too. While acknowledging there may be interesting and fascinating uses of AI in art to explore, Katell adds, “If we’re not attending to maintaining some aspects, or trying to manage the changes that are happening in our culture, I think we’ll see societal level effects that are much greater than the elimination of some jobs.”</p>
</section>
<section id="the-need-for-legislation" class="level2">
<h2 class="anchored" data-anchor-id="the-need-for-legislation">The need for legislation</h2>
<p>These stakes all highlight the need for regulatory interventions. However, most governments, bar China and the EU, have so far favoured “voluntary commitments” towards AI safety, which would seem to fall short of the kind of governance over the sector that can be robustly enforced. In a recent blog, Strait, alongside the Ada Lovelace Institute’s UK public policy lead Matt Davies and associate director (Law &amp; Policy) Michael Birtwhistle, “evaluate the evaluations” of the UK’s AI Safety Institute for companies that have opted in to <a href="https://www.adalovelaceinstitute.org/blog/safety-first/">these voluntary commitments</a>. They highlight that, on the whole, the companies planning to release a product hold too much control over how the evaluation can take place, ultimately empowering them to direct tests in their favour, which inhibits efforts at robust monitoring. Furthermore, there is usually no avenue for the necessary scrutiny of training datasets. Even setting aside these limitations, Davies, Strait and Birtwhistle conclude that “conducting evaluations and assessments is meaningless without the necessary enforcement powers to block the release of dangerous or high-risk models, or to remove unsafe products from the market.”</p>
<p>The reluctance to implement firmer regulation might be attributed in some part to the perceived benefits to the state when its AI companies succeed. One perceived benefit is that the profits these companies accrue may buoy the economies of the societies they operate within. There is also the matter of national competitiveness in “AI prowess”, which stems from the potential for AI-based technology to underpin all aspects of society, prompting what has been described as an <a href="https://ainowinstitute.org/publication/a-lost-decade-the-uks-industrial-approach-to-ai">“AI arms race”</a>. Here the UK may well regret allowing Google to acquire DeepMind, whose output is responsible for bolstering the “UK’s share” of citations in the top 100 recent AI papers from 1.9% to 7.2%. However, a lack of robust regulation may prove as much a disservice to the companies releasing AI products as it is to society as a whole.</p>
<p>“The medicine sector here [in the UK] is thriving, not in spite of regulation, but because of regulation,” says Strait. “People trust that the products you develop here are safe.” Katell highlights the impact of pollution legislation on the automotive industry. “It jumped forward invention and discovery in automotive technology,” he tells <em>Real World Data Science</em>. “It seems prosaic in hindsight, but it wasn’t, it was a major innovation that was promoted by regulators, promoted by legislators.” The UK government’s chief scientific advisor Angela McLean seems to agree. “Good regulation is good for innovation,” she replied when asked about balancing regulation with favourable conditions for a flourishing AI sector at an Association of British Science Writers’ event in May. “We’re not there yet,” she added. The challenge is pinning down what good regulation looks like.</p>
</section>
<section id="regulatory-ecosystems" class="level2">
<h2 class="anchored" data-anchor-id="regulatory-ecosystems">Regulatory ecosystems</h2>
<p>As has been emphasised throughout the series, making a success of an AI project requires a unique skillset that <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/04/ai-series-6.html">combines expertise in AI with the domain expertise for the sector</a> the project is contributing to, and there is often a dearth of people who straddle both camps. The same hunt for “unicorns” – people with useful expertise spanning the tech sector and policymaking – can also be an obstacle to developing “good regulation”. One solution is to bring people from the different disciplines together to develop legislation collaboratively, as was arguably the case with the rollout of the General Data Protection Regulation (GDPR) in 2018. “Policymakers and academics, they worked very closely together in the crafting of that law,” says Katell. “It was one of those rare moments in which we saw the boundaries really dissolve between policy and academia in a way that delivered something that I think we can agree was largely a positive outcome.”</p>
<p>When it comes to AI, an obstacle to that kind of collaboration has been the lack of a common language. In “Defining AI in Policy versus Practice”, published in 2020<sup>1</sup>, Katell, alongside Peaks Krafft at the University of Oxford and co-authors, found that AI researchers favoured definitions of AI that “emphasise technical functionality”, whereas policymakers tended towards definitions that “compare systems to <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/04/29/gen-ai-human-intel.html">human thinking and behavior”, which AI systems remain far from achieving</a>. Strait also highlights a recurring tendency among those without experience of actually building AI systems to oversell AI capabilities, suggesting it will “help solve climate change” or “cure cancer”. “How are you measuring that?” he asks. “How are we making a clear sense of the efficacy, the proof behind those kinds of statements? Where are the case studies that actually work, and how are we determining that’s working?”</p>
<div id="fig-2" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Safety first. External governance is required to ensure the outcomes of AI deployment are safe and knock on effects have been considered. Credit Shutterstock, 3rdtimeluckystudio">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/11/images/shutterstock_2180417651.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Safety first. External governance is required to ensure the outcomes of AI deployment are safe and knock on effects have been considered. Credit Shutterstock, 3rdtimeluckystudio">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Safety first. External governance is required to ensure the outcomes of AI deployment are safe and knock-on effects have been considered. Credit: Shutterstock. Photo by 3rdtimeluckystudio.
</figcaption>
</figure>
</div>
<p>As Krafft <em>et al.</em> point out in their 2020 paper, such exaggerated perceptions of AI capabilities can also hamper regulation. “As a result of this gap,” they write, “ethical and regulatory efforts may overemphasise concern about future technologies at the expense of pressing issues with existing deployed technologies.” Here a better <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/04/22/ai-series-1.html">understanding of what AI is</a> can help focus attention on the problems that exist now – not just the potential <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/28/ai-series-5.html">workforce impact</a>, but the carbon cost of training large language models, activities like nonconsensual gen-AI porn aggravating online gender inequality, and a widening digital divide disadvantaging pupils, workers and citizens who cannot afford the latest AI tools.</p>
<p>Fortunately, there has already been progress in bridging the language divide between policymakers and the tech sector. “The current definitions [championed in policy circles] say things like technologies that can perform tasks that require intelligence when humans do them,” says Katell, which he describes as a far more sober and realistic definition than likening technologies to the way humans think and work. “This is really important,” he adds. “Because some of the problems that we see with AI now are symptomatic of the fact that they’re not humans and that they don’t have the same experience of the world.” As an example he describes someone driving a car with a child in the car seat, calling on all their training and experience of road use to navigate roads and other traffic, while juggling their attention between driving and the child. “Things that AI is too brittle to accomplish,” he adds, highlighting how a simple model may identify school buses in images quite impressively until it is presented with an image of a bus upside down. “The flexibility and adaptability, the softness of human reason, is actually its strength, its power.”</p>
<p>Getting everybody on the same page can also help provide a more multi-layered approach to governance. Empowering independent assessors of AI product safety prior to release is one thing, but as Strait points out, “It could be more like the environmental sector, where we have a whole ecosystem of environmental impact assessments, organisations and consultancies that do this kind of work for different organisations and companies.” Internal teams within companies can play an important role too, so long as they operate with sufficient independence from the companies themselves. When set up with the right balance of expertise, they can be better placed to understand, and hence assess, the technology and the practical elements of its implementation. Although such teams can be expensive, getting the technical evaluation and the consideration of ethical issues right can offer a competitive advantage for the companies themselves as well as providing a more thorough safeguard for society at large. Nonetheless, there are also obvious advantages in having external regulatory bodies, which do not need to take into account a company’s profit margins or shareholders’ needs. An ideal setup might incorporate both approaches. In fact, in their appraisal of the current UK AI Safety Institute arrangement, Davies, Strait and Birtwhistle first highlight the need to integrate the AI Safety Institute “into a regulatory structure with complementary parts that can provide appropriate, context-specific assurance that AI systems are safe and effective for their intended use.”</p>
</section>
<section id="prosperity-for-all" class="level2">
<h2 class="anchored" data-anchor-id="prosperity-for-all">Prosperity for all</h2>
<p>With all the precedents in other sectors, from environmental impact checks to pharmacology, an organised framework or ecosystem for robust, independent and meaningful evaluation of AI product safety seems imperative, albeit potentially expensive. (Davies, Strait and Birtwhistle cite £100 million a year as a typical cost for safety-driven regulatory systems in the UK<sup>2</sup>, and the expertise demands of AI could further increase costs.) However, such regulatory reform will likely slow the pace of technological development and lengthen the route to market. While the breathing space to adjust to the societal changes these technologies bring may be welcomed by some, the delay can be quite unpopular in a tech sector famed for its “move fast and break things” ethos. As Katell points out, that ethos is based on the notion that the things being broken are unimportant – when it’s vulnerable people and societies, that is “unacceptable breakage”.</p>
<p>Strait also highlights the cultural mismatch between the companies developing AI products – where the research-to-market pipeline is extremely fast – and the sectors those tools are intended to serve, such as social care, education and health. Although OpenAI eventually decided against full release of the Voice Engine, when it comes to the ethos of some AI technology companies, “The default is to put things out there and to not think through the ethical and societal implications,” says Strait, who has past experience of working for a company producing AI tools. “I think it’s so critical for data scientists and ethicists to explore, and do that translation and interrogation of what are the ethics of the sector that we’re working in?”</p>
<p>Katell voices a concern shared by many: that at present AI is under the control of a small handful of very large, powerful technology companies, and as a result the AI releases making the most impact target the agendas of the companies releasing them and their current and anticipated customer base, as opposed to the needs of society. The potential for such large tech agents to become too big to fail poses additional regulatory challenges. While many may lament the tension between a demand for open source data sets for testing AI models and the need to respect data privacy, security and confidentiality, there have already been widely reported instances where certain companies may <a href="https://www.bloomberg.com/news/articles/2024-04-04/youtube-says-openai-training-sora-with-its-videos-would-break-the-rules?embedded-checkout=true">not have met expectations for respecting copyright and terms of service</a>. In fact, the tech giants are not the only ones developing AI models, and the open source community has been known to provide valuable competition that may temper the tendency for AI to concentrate power in the hands of a few<sup>3</sup>. However, open source developers can also pose a certain amount of <a href="https://datainnovation.org/2024/03/the-eus-ai-act-creates-regulatory-complexity-for-open-source-ai/">regulatory complexity</a>.</p>
<p>There is also an argument that these efforts should broaden their scope beyond baseline AI safety and focus AI development on tools that actively promote greater wellbeing and prosperity for the many. “We need to bring in other values like fairness, justice, and simple things like explainability, gender equity, racial equity,” says Katell, highlighting some of the other qualities that demand attention. Taking explainability as an example, there is increasing awareness of the need to understand how certain outputs are reached in order for people <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/04/ai">to feel comfortable with the technology</a>, and the outputs requiring explanations differ from person to person. Although it can be hard to explain AI outputs, progress is being made in this direction. As Katell says, “We’re not helpless in managing these types of disruptions. It’s a matter of societies coming together and deciding that they can be managed.”</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the authors</dt>
<dd>
<strong>Anna Demming</strong> is a freelance science writer and editor based in Bristol, UK. She has a PhD from King’s College London in physics, specifically nanophotonics and how light interacts with the very small, and has been an editor for Nature Publishing Group (now Springer Nature), IOP Publishing and New Scientist. Other publications she contributes to include The Observer, New Scientist, Scientific American, Physics World and Chemistry World.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Royal Statistical Society
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" style="height:22px!important;vertical-align:text-bottom;"><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" style="height:22px!important;margin-left:3px;vertical-align:text-bottom;"></a> This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> International licence</a>. <!-- Add thumbnail image credit and any licence terms here --></p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Demming, Anna. 2024. “Ensuring new AI technologies help everyone thrive.” Real World Data Science, June 11, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/11/ai-series-7.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">References</h2>

<ol>
<li id="fn1"><p>Krafft, P. M., Young, M., Katell, M., Huang, K. &amp; Bugingo, G. <a href="https://dl.acm.org/doi/abs/10.1145/3375627.337583">Defining AI in Policy versus Practice</a> <em>AIES ’20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society</em> 72-78 (2020)↩︎</p></li>
<li id="fn2"><p>Smakman, J, Davies, M. &amp; Birtwhistle, M. <a href="https://www.adalovelaceinstitute.org/policy-briefing/ai-safety/">Mission critical</a> <em>Ada Lovelace Policy Briefing</em> (2023)↩︎</p></li>
<li id="fn3"><p><a href="https://www.semianalysis.com/p/google-we-have-no-moat-and-neither">Google “We Have No Moat, And Neither Does OpenAI</a> <em>semianalysis.com</em> (2023) (semianalysis.com)↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>AI</category>
  <category>AI ethics</category>
  <category>Regulation</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2024/06/11/ai-series-7.html</guid>
  <pubDate>Tue, 11 Jun 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/11/images/shutterstock_2436413315.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>AI series: What is “best practice” when working with AI in the real world?</title>
  <dc:creator>Anna Demming</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2024/06/04/ai-series-6.html</link>
  <description><![CDATA[ 





<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/SoOoj9iUTM0" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>Over the course of the Real World Data Science AI series, we’ve had articles laying out the nitty-gritty of what AI is and how it works – or at least how to get an explanation for its output – as well as burning issues around the data involved, evaluating these models, ethical considerations, and gauging societal impacts such as changes in workforce demands. The ideas in these articles give a firm footing for establishing what best practice with AI models should look like, but there is often a divide between theory and practice, and the same pitfalls can trip people up again and again. Here we discuss how to wrestle with real world limitations and flag these common hazards.</p>
<p>Our interviewees, in order of appearance, are:</p>
<p><strong>Ali Al-Sherbaz</strong>, academic director in digital skills at the University of Cambridge in the UK</p>
<p><strong>Janet Bastiman</strong>, chief data scientist at Napier and chair of the Royal Statistical Society Data Science &amp; AI Section</p>
<p><strong>Jonathan Gillard</strong>, professor of statistics/data science at Cardiff University, and a member of the Real World Data Science Board</p>
<p><strong>Fatemeh Torabi</strong>, senior research officer and data scientist, health data science at Swansea University, and also a member of the Real World Data Science board</p>
<p><strong>It is often said that while almost everybody is now trying to leverage AI in their projects, most AI projects fail. What nuggets of wisdom do the panel have for swelling the minority that succeed with their AI projects, and what should you do before you start doing anything?</strong></p>
<p><strong>Ali Al-Sherbaz</strong>: It’s not easy to start, especially for people who are not aware how AI works. My advice is, first, they have to understand the basics of how AI works, because the expectation could be overpromising, and that is a danger. Just 25 years ago, a master’s dissertation might be about developing a simple – we call it simple now, but it was a master’s project 25 years ago – a simple model with a neural network of a combination of nodes to classify data. Whatever the data is – it could be drawing shapes, simple shapes: square, circle, triangle – just classifying them was worth an MSc. Now, kids can do it. But that is not the same as understanding what the neural network or the AI is. It’s a matrix of numbers, and the learning process runs multiple iterations to find the best combination of these numbers – sums of products – to classify, to do something, and to train them for a certain situation; that is supervised learning. Over the last 25 years – especially in the last 10 years – computational power has got better, so AI now works better.</p>
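<p>To make that concrete, here is a minimal sketch – our illustration, not Al-Sherbaz’s – of a neural network as “a matrix of numbers” tuned by repeated iteration, on an invented toy task (classifying whether 2D points fall inside a circle):</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))              # toy 2D inputs
y = ((X**2).sum(axis=1) &lt; 0.5).astype(float)       # label: inside a circle?

# The network really is just matrices of numbers:
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for step in range(2000):                           # iterative updates
    H = np.tanh(X @ W1 + b1)                       # sums of products
    p = sigmoid(H @ W2 + b2).ravel()               # predicted probability
    d2 = (p - y)[:, None] / len(X)                 # cross-entropy gradient
    dW2, db2 = H.T @ d2, d2.sum(0)
    d1 = (d2 @ W2.T) * (1 - H**2)                  # backpropagate through tanh
    dW1, db1 = X.T @ d1, d1.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2                 # nudge the numbers
    W1 -= lr * dW1; b1 -= lr * db1

print(f"training accuracy: {((p &gt; 0.5) == y).mean():.2f}")</code></pre>
<p>Twenty-five years ago this was dissertation material; today it is a few lines of numpy. Running it, as Al-Sherbaz says, is not the same as understanding it.</p>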
<p>There are other things people have to learn. There’s the statistics as well, and of course people who would like to work in AI and data science must understand the data – they should be experts in the data itself. For instance, I can talk about cybersecurity, I can talk about networking and other things, but if it comes to something regarding health data, or financial services, or stock markets, I’m not an expert in the data. So I’m not going to be actively working on those things even if I use the same AI tools. In a nutshell, this is why I think some people fail using AI and others succeed. And we should emphasise the human value. The AI is there, and it exists to help us make a better, more accurate decision, but the human value is still there. We have to insist on that.</p>
<p><strong>Janet Bastiman</strong>: I would just like to build on all of that great stuff that Ali’s just said. When you look at the non-data scientist side of it, you often get businesses who think AI can solve a certain problem. They might go out and hire a team – whether that’s directly or indirectly – and get them to try and solve a problem that, as Ali said, they may not have the domain expertise for. The business might not even have the right data for it, and AI might not even be the right way of solving that problem. I think that’s one of the fundamental things to think about – really understanding what you’re trying to solve, and how you’re going to solve it, before you start throwing complex tools and potentially very expensive teams at the problem.</p>
<p>When you look at a lot of the failures, it’s been because businesses have just gone, we can solve this problem, I’m just going to hire a team and let these intelligent people look at something. And then they’re restricted on the data that they’ve got, which won’t even answer the question; they’re restricted on the resources they have; and even restricted in terms of wider buy-in from the company. So really understand: what is it that you want to solve? What are you trying to do? Is AI the right thing? And can you even do it with the resources you have available? I think that’s a fundamental starting point. Because you can have wonderful experts, who have that domain knowledge, who understand the statistics, and all that essential stuff that Ali just said. But then from a business point of view, if you don’t give them the right data to work on, or you don’t let them do their job and tell you when they can’t do their job, then again, you’re going to be doomed to failure.</p>
<p><strong>Jonathan Gillard</strong>: Explainability is a big issue when it comes to AI models, as well. They are, at the moment, very largely “black box” – data goes in, these models get trained on that data, and answers get popped out. And when it works, well, it works fabulously well. And we’ve seen lots of examples of that happening. But often for business, industry or real life, we want to learn. We want to understand the laws of the universe, and to understand the reasons why this answer came about. Because this explainability piece is missing – because everything is hidden away almost – I think that’s a big issue in successful execution. And particularly when it comes to industries where there’s a degree of regulation as well, if you can’t explain how a particular input led to a particular output, then how can you justify to regulatory bodies that what you’ve got is satisfactory, ethical, and that you’re learning and you’re doing things in the right way?</p>
<p><strong>There have been efforts at trying to get explanations from these models. How do you think things are progressing there?</strong></p>
<p><strong>JG</strong>: Yeah, that’s a good question. I think where we are with explainability is in very simple scenarios, very simple models. This is where traditional statistical models do very well. There’s an explicit model which says if you put these things inside then you’ll get this output. So [for today’s AI] I think we’re actually very far away from having that complete explainability picture, particularly as we fetishise more and more grand models. The AI models are only getting bigger, more complex, and that makes the explainability per se even more challenging. And that’s why I think, as Ali says, at the moment, the human in the loop is absolutely crucial.</p>
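<p>As a small illustration of that contrast (our example, not Gillard’s), a classical linear model makes the input-to-output mapping explicit – each fitted coefficient is a directly readable statement about the model. The variables and values here are invented:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(1)
n = 200
age = rng.uniform(20, 70, n)                # hypothetical predictors
dose = rng.uniform(0, 10, n)
outcome = 0.5 * age - 2.0 * dose + rng.normal(0, 3, n)

# Ordinary least squares: an explicit, inspectable model
X = np.column_stack([np.ones(n), age, dose])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

print(f"intercept       : {beta[0]:+.2f}")
print(f"effect per year : {beta[1]:+.2f}  (true value +0.50)")
print(f"effect per dose : {beta[2]:+.2f}  (true value -2.00)")</code></pre>
<p>No comparable one-line reading exists for the billions of weights in a large neural model, which is the gap Gillard describes.</p>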
<p>What AI does share with classical statistics (or classical data science if you want to call it that) is it can still only be as good as the data that’s put into it – that’s still a fundamental truth. I think one of the assumptions currently made about AI models – and this is where there could be a few trip-ups – is that they can create something from nothing. It’s “artificial intelligence” – the wording almost suggests it’s artificial. But fundamentally, we still need a robust, reliable and comprehensive source of data in order to train these models in the first place.</p>
<p><strong>In terms of having outsourced expertise for these projects– does that make more problems if you’re then trying to understand what this AI has done?</strong></p>
<p><strong>JB</strong>: Oh, hugely. Take domain expertise – that’s something Ali touched on – you’ve got to understand your data. Because even that fundamental initial preparation of data before you try and train anything is absolutely crucial – really looking at: where are the gaps? Where are the assumptions? How is this data even being collected? Has it been manipulated before you got to it? If you don’t understand your industry well enough, you won’t know where those pitfalls might be – and a lot of teams do this, they just take the data, put it in, turn the handle, and out comes something that looks like it’s okay. Because they’re not putting that effort in to really understand those inputs and what the models are doing – they’re just turning the handle until they get something that feels about right – what they miss is where it goes wrong. And there are some industries where the false positives and false negatives from classification, or the bad predictions, really have a severe human impact. If you don’t understand what’s going in, and the potential impact of what comes out, then it’s very, very easy to just churn these things out and go, “it’s 80% accurate, but that’s fine” without really understanding the human impact of the 20% [that it gets wrong].</p>
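<p>That “80% accurate” trap is easy to demonstrate with invented numbers – a sketch of ours, not Bastiman’s: a lazy model can post a respectable headline accuracy while missing nearly every case that matters.</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(2)
y_true = (rng.random(1000) &lt; 0.10).astype(int)   # 10% of cases are positive
y_pred = (rng.random(1000) &lt; 0.05).astype(int)   # near-constant "model"

accuracy = (y_true == y_pred).mean()
fn = ((y_pred == 0) &amp; (y_true == 1)).sum()       # real cases missed
fp = ((y_pred == 1) &amp; (y_true == 0)).sum()       # false alarms

print(f"headline accuracy : {accuracy:.0%}")     # looks respectable...
print(f"false negatives   : {fn} of {y_true.sum()} real cases missed")
print(f"false positives   : {fp} cases wrongly flagged")</code></pre>
<p>In a sector where each false negative is a missed disease or an undetected fraud, the headline number says almost nothing about the human impact.</p>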
<p>Going back to what Jon said about that explainability, it’s so crucial. It is challenging, and it is difficult, but going from these opaque systems to more transparent systems – we need that for trust. As humans, we divulge our trust very differently, depending on the impact. One of the examples I use all the time is weather prediction – we don’t really care too much, because it’s not got a huge impact. But when you look at financials or medicals, we really, really want to know that that output is good, and how we got to that output. The Turing Institute has come out with some great research that says: as humans, if we want to understand why another human has told us something, then we want the same thing from the models, and that can vary from person to person. So building that explainable level into everything we do has to be one of the things we think about upfront. But you’ve got to really, truly, deeply understand that data. And it’s not just a question of offloading a data set to a generalist who can turn that handle, otherwise you will end up with huge, huge problems.</p>
<p><strong>Fatemeh Torabi</strong>: I very much agree with all the points that my colleagues raised. I also think it’s very important that we know why we are doing things. Having those incremental stages in our planning for any project, and then having a vision of where we see AI contributing to this process and giving us further efficiency – and how – is very important. If we don’t have defined measures to see how this AI algorithm is contributing to this specific element of the project, we can get really lost bringing these capabilities on board. Yes, it might generate something, but how we are going to measure that something is very important. I think, as members of the scientific community, we must all view AI as a valuable tool. However, it has its own risks and benefits.</p>
<p>For example, in healthcare, when we use AI for risk predictions, it can be a really great tool to aid clinicians and save time. However, at each stage, we need to assess the data quality, how these data are fed into the algorithm, what procedures and what models we use, and how we generate those models. And then, which discriminative models do we use to balance the risk and eventually predict the risk of outcomes in patients? It’s very much a balance between risks and benefits that determines the usefulness of these tools in practice. We have all these brilliant ideas of what best practice is. But in real terms, sometimes it’s a little bit tricky to follow through.</p>
<p><strong>Could you give us some thoughts on the sort of best practice with data, for example, that doesn’t quite turn out to be quite so easy to follow in practice, and what you might do about it?</strong></p>
<p><strong>FT</strong>: We always call these AI algorithms “data-hungry” algorithms, because the models that we fit need to see patterns in the data that we feed into them so that the learning happens. And then the discriminative functions come into place to balance and give a score to wherever the learning is happening, and give an evaluation of each step. However, the data that we put into these algorithms comes first – the quality of that data. Often in healthcare, because of its sensitivity, the data is held within a secure environment. So we cannot, at this point in time, expose an AI algorithm to a very diverse example, specifically for investigating rare diseases or rare conditions. And above that, there are also complexities in the data itself. We need to evaluate and clean the data before we feed it into these algorithms. We need to evaluate the diversity of the data itself – for example, tabular data, imaging data, genomic data – and each one requires its own tailored approach at the data cleaning stage.</p>
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="The panel. Clockwise from top left: Ali Al-Sherbaz, Janet Bastiman, Fatemeh Torabi and Jonathan Gillard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/04/images/panel-991.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="The panel. Clockwise from top left: Ali Al-Sherbaz, Janet Bastiman, Fatemeh Torabi and Jonathan Gillard">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: The panel. Clockwise from top left: Ali Al-Sherbaz, Janet Bastiman, Fatemeh Torabi and Jonathan Gillard
</figcaption>
</figure>
</div>
<p>We also have another level that is now being explored in the health data science community, which is the generation of synthetic data. We can give AI models access to synthetic versions of the data that we hold. However, that also has its own challenges, because it requires reading the patterns from real data and then recreating those patterns in synthetic versions of the data.</p>
<p>For example, Dementias Platform UK is one of the pioneers in developing this. We hold a range of cohort data, patients’ data, genomics data and imaging data. For each of these, when we try to develop those processing algorithms, there are specific tailored approaches that we need to consider to ensure we are creating low-fidelity data that holds enough of the patterns in it for the AI algorithm to allow the learning to happen. However, we also need to consider whether it is safe enough to be released for use at a lower governance level compared to the actual data. So there are quite a lot of challenges, and we captured a lot of them in our <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/ai-series-3.html">article</a>.</p>
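<p>A deliberately crude sketch of the trade-off described here – enough pattern for learning, little enough detail for safety. Real health-data pipelines are far more sophisticated; the variables and numbers below are invented:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(3)
# Stand-in for a sensitive tabular dataset:
age = rng.normal(65, 10, 500)
bp = 60 + 0.9 * age + rng.normal(0, 8, 500)   # blood pressure, linked to age
real = np.column_stack([age, bp])

# Low-fidelity synthesis: resample each column independently.
# Marginal distributions survive; cross-column patterns do not,
# which lowers disclosure risk but also analytic utility.
synthetic = np.column_stack(
    [rng.choice(real[:, j], size=500) for j in range(real.shape[1])]
)

print("means, real      :", real.mean(axis=0).round(1))
print("means, synthetic :", synthetic.mean(axis=0).round(1))
print("age-bp correlation, real      :", round(np.corrcoef(real.T)[0, 1], 2))
print("age-bp correlation, synthetic :", round(np.corrcoef(synthetic.T)[0, 1], 2))</code></pre>
<p>Choosing how much structure to preserve – marginals only, or joint patterns too – is exactly the fidelity-versus-governance balance Torabi describes.</p>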
<p><strong>A A-S</strong>: I can talk about cybersecurity and the related network security data – the point being the amount of data we receive to analyse. It’s really huge. And when I say huge, I mean about one gigabyte in a couple of hours, or one terabyte in a week – that’s huge. One gigabyte of a text file – if I printed this file out on A4, it would leave me with a stack of paper three times the height of the Eiffel Tower.</p>
<p>Now, if I have cyber traffic and try to detect any cyber attack, AI helps with that. However, if we train these models properly, they have to detect cyber attacks in real time – when I say real time, we’re talking about within microseconds or a millisecond – and the decision has to be correct. AI alone doesn’t work, doesn’t help. Humans should also intervene, but rather than having 100,000 records to check for a suspected breach, AI can reduce that to 100. A human can interact with that. And then, in terms of the authentication or verification, humans alongside AI can learn whether this is a false positive, a real attack, or a false negative. This is a challenge in the cybersecurity area.</p>
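<p>The triage pattern Al-Sherbaz describes – shrinking 100,000 records to a human-sized short list – might look something like this sketch, assuming scikit-learn and entirely synthetic “traffic records”:</p>
<pre><code class="language-python">import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)
# 100,000 synthetic traffic records, three made-up features each
records = rng.normal(0, 1, size=(100_000, 3))
records[:50] += 6                            # plant 50 out-of-pattern records

model = IsolationForest(random_state=0).fit(records)
scores = model.score_samples(records)        # lower score = more anomalous

shortlist = np.argsort(scores)[:100]         # top 100 for the human analyst
print(f"analyst reviews {len(shortlist)} of {len(records)} records")
print(f"planted anomalies in the shortlist: {(shortlist &lt; 50).sum()}/50")</code></pre>
<p>The human verification loop – confirming false positives and real attacks – would sit downstream of a shortlist like this one.</p>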
<p><strong>JB</strong>: I just wanted to dive in from the finance side – again the data is critical, and we have very large amounts of data. However in addition – and I think we probably suffer from the same sort of problem that Ali does in this – when I’m trying to detect things, there are people on the other side actively working against what I’m trying to detect, which I suppose is a problem that maybe Fatemeh doesn’t have in healthcare.</p>
<p>When you’re trying to build models to look for patterns, and those patterns are changing underneath you, it can be incredibly difficult. I have an issue that all of my clients’ data legally has to be kept separated – some of it has to be kept in certain parts of the world, so we can’t put it all in one place. We can try and create synthetic data that has the same nuances as the snapshots we can see at any one point in time, and we can try and put that together in one place, but what we can detect now will very quickly not be what we need to detect in a month’s time. As soon as transactions start getting stopped, as soon as suspicious activity reports are raised and banks are fined, everything switches, and how all of that financial crime occurs changes. And it’s changing on a big scale worldwide, but also subtly, because there is a team of data scientists on the other side trying desperately to circumvent the models that my team and I are building. It’s absolutely crazy. So while I would love to be able to pull all of the data that I have access to into one place and get that huge central view, legally I can’t do that because of all the worldwide jurisdictional laws around data and keeping it in certain places.</p>
<p>Then I’ve also got the ethical side of it, which is something that Fatemeh touched on. If I get it wrong, that can have a material impact on, usually, some of the most marginalised in society. The profile of some of the transactions that are highly correlated with financial crime is also highly correlated with people in borderline poverty, even in Western countries. So false positives in my world have a huge, huge ethical impact. But at the same time, we’re trying really hard to minimise those false negatives – that balance is critical, and the data side of it is such a problem.</p>
<p>Fatemeh mentioned the synthetic side of it. There’s a huge push, particularly in the UK, to get good synthetic data to really showcase some of these things that we’re trying to detect. But by the time you get that pooling and synthesising of data that you can ethically use and share around without fear of all the legal repercussions, what we’re trying to detect has already moved on. So we’re constantly several steps behind.</p>
<p>I imagine Ali has similar problems in the cybercrime space in that as soon as things are detected, the ways in which they work move on. So there’s an awful lot I think that, as an industry, although we have different verticals, we can share best practices on.</p>
<p><strong>Is there a demand for new types of expertise?</strong></p>
<p><strong>A A-S</strong>: There is a huge gap, in the UK at least and worldwide, in finding people to work as data scientists or to work with data. So we created a course in Cambridge, which we call the data science career accelerator, for people who work in data and would like to move on and learn more. We did market research, and we interviewed around 50 people, from CEOs to heads of security and heads of data science, in science departments and in industry, to tell us: what kind of skills are you after? What problems do you currently have? And then we designed this course.</p>
<p>We found that, first of all, there are people who don’t know where to start – what kind of data they need, what tools they have to learn with… Even if they learn the tools, they still need to learn what kind of machine learning process to use. And then suddenly ChatGPT turned up, and LLM [large language model] development – fitting all of that in one course is a real challenge.</p>
<p>The course has started now, with its first cohort. The big advice we have from industry is that during the course the students have to work on real world case studies, on scenarios with data that nobody has touched before – that is, it’s new, not public. We teach them on public data, but companies also have their own data, and we get consent from them to use that data for the students, so we can test the skills they have learned on virgin data that nobody has touched before.</p>
<p>We just started this month, and the students are going to start on the first project now. They are enjoying the course, but that is the challenge we have now. How do we handle it? By working side by side with industry, even during delivery. We have an academic from Cambridge, and we have experts from industry to support the learners, so they get the best of both worlds.</p>
<p><strong>The industry has changed so much in the last couple of years. Does that mean that the expertise and demands are also changing very quickly or is there a common thread that you can work with?</strong></p>
<p><strong>A A-S</strong>: Well, there is a common thread, but there are new tools – I mean, Google just released Gemini, and that’s a new skill the students have had to learn and be tested on, looking into how others feel about it and comparing it to ChatGPT, or Claude 3, or Copilot. That’s all happened in the last 12 months. And then, of course, reacting to that, reflecting on the material, teaching the material – it’s a challenge. It’s not easy, and you need to find the right person. Of course, people who have this kind of experience are in demand, and it’s hard to secure these kinds of human resources as well as to deliver the course. So there are challenges, and we have to act dynamically and be adaptive.</p>
<p><strong>What are your thoughts on the evaluation of these models, how to manage the risk of something that you haven’t thought of before, and the role of regulation?</strong></p>
<p><strong>JG</strong>: I think a lot of our discussions at the moment are assuming that we’ve got well meaning, well intentioned people and well meaning, well intentioned companies and industries, who are trying to seek to do their best ethically and regulatorily and with appropriate data, and so on. But there is a space here for bad actors in the system.</p>
<p>Unfortunately, digital transformation of human life will happen in good and bad ways – I think there are going to be those two streams to this. Individuals are very capable now of making their own large language models by following a video guide if they want to, and having that data is, of course, maybe going to enable them to do bad things with it.</p>
<p>Data is already a commodity in quite a strong way, but I do think we have to revisit data security, and even the risks of open data as well. We live in a country which, I think, does very well in producing lots of publicly available data. But that could be twisted in a way that we might not expect. And when I speak of those things, we’re usually thinking of groundwork – writing and implementing your own large language models – but there were recent examples where, just by using very clever prompting of existing large language models, you could get quite dangerous material, shall we say, which circumvented inbuilt safeguards. Again, that’s an emerging thing that we have to try and address as it comes on.</p>
<p>I think my final point with ethics and regulation is it will rapidly evolve, and it will rapidly change. And a story which I think can illustrate that is, when the first motorcar was introduced into the UK, it was law for a human to walk in front of the motorcar with a large red flag to warn passers-by of the incoming car because people weren’t really familiar with it. Now, of course, that’s in distant memory, right? We don’t have people with red flags, walking in front of cars. I do wonder, in 20 years or 50 years, what will the ethical norms regarding AI and its use be? Likewise, will we have deregulation? That seems to be the common theme in history that when we get more familiar with things, we deregulate because we’re more comfortable with their existence. That makes me quite curious about what the future holds.</p>
<p><strong>FT</strong>: Jon raised a very interesting point, and Janet touched upon keeping financial data in silos, but we are facing this in healthcare as well. Data has to be checked within a trusted research environment or secure data environment, and that creates data silos. However, efforts at this point in time are focused on enhancing these digital platforms to bring data together and federate data. Alongside progress towards new ethical or legal requirements, we are documenting what is being practised at the moment, because at the moment there are quite a lot of bubbles: each institution has its own data and applies its own rules to it. So understanding what we are currently working with – the data flows into the secure environments – is building the basis of developments in standardisation and common frameworks. A lot of projects have been focused on understanding the current situation in order to develop on it for the future.</p>
<p>We know, for example, that the Data Protection Act put forward some specific requirements, but that was developed in 2018, before AI became this massive consideration. In my academic capacity as well, we are facing what Jon mentioned, in terms of the diversity of assessments for students. For example, when we ask these questions, even if the data is provided within the course and within this defined governance, we know that the answers can possibly be aided by an AI model. So we are defining more diverse assessment methods in academic practice, to ensure that we have a way to evaluate the outcome we are receiving by the human eye, rather than being blinded by what we receive from AI and then calling it high quality output, whether in research practice or in academic practice. So there’s quite a lot of consideration of these issues, which I think is bringing our past knowledge to the current point, where we now have to balance human and machine interactions in every single process that we are facing.</p>
<p><strong>How does this change the skill set required of data scientists, as AI is getting more and more developed?</strong></p>
<p><strong>A A-S</strong>: Regarding the terminology of data scientists: when we talk about data, we immediately link that with statistics, and statistics is an old topic. There has been an accumulation of expertise in statistics for 100 years or more, to the best of my knowledge, and people who are new to data analysis have to learn about this legacy. When we develop the course, we should cover these skills in statistics and build this knowledge on top – that is, when we reach the right point, then we talk about learning or machine learning, supervised and unsupervised, and about LLMs – these are the new skills they have to learn. As I mentioned, it’s tricky: when we teach learners, we have to provide them with simple datasets to teach them something complex in statistics, because it’s a danger to teach both [data and statistics at the same time] – we will lose them, they will lose concentration, and it’s hard to follow up. So, a little bit of statistics – they have to learn the basics, like the normal distribution, the types of distribution, and what it means when we have these distributions: the meaning of the data. And that is the point I made earlier about how people should have a sense for the numbers. What does it mean when I say 0.56 in healthcare? Is that a danger? 60% – is that OK? In cybersecurity, if the probability of attack today is 60%, should I inform the police? Should I inform someone; is that important? Or, for example, for the stock market, say we have a drop of 10% – is that something we have to worry about? So making sense of the numbers is part of it.</p>
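<p>One way to formalise that sense-making (our sketch, with invented costs): the same 60% means different things in different sectors because the threshold for acting depends on the relative costs of a false alarm and a missed case.</p>
<pre><code class="language-python">def act_threshold(cost_false_alarm, cost_missed_case):
    """Acting minimises expected cost when the predicted
    probability exceeds C_fa / (C_fa + C_miss)."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_case)

p = 0.60  # the same model output in every sector

for action, c_fa, c_miss in [
    ("weather: carry an umbrella", 1, 2),
    ("cybersecurity: escalate the alert", 1, 50),
    ("healthcare: order a follow-up test", 1, 200),
    ("finance: freeze the account", 10, 5),
]:
    t = act_threshold(c_fa, c_miss)
    print(f"{action:36s} threshold {t:.2f}  act? {p &gt; t}")</code></pre>
<p>On these invented costs, 60% comfortably justifies an umbrella, an alert and a follow-up test, but not freezing an account – where a false alarm itself does real harm.</p>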
<p>That is part of personalised learning, because it depends on their background and what they have learned – it’s not straightforward, and it has to be personalised. It is not just for someone who is 18 years old coming from their A levels; it’s for a wide range. People from diverse backgrounds like to approach this data science course. And now we are in an era where people in social science, engineering, medicine, journalism and art are all interested in learning a little bit of data science and utilising AI for their benefit. So there is no one answer.</p>
<p><strong>You emphasise that people still need to be able to make sense of numbers. We’re often told that AI will devalue knowledge and devalue experience – it sounds like you don’t feel that’s the case.</strong></p>
<p><strong>A A-S</strong>: I have to stick with the following: human value is just that – value. AI without humans is worth nothing. I have one example: in 1997, software was developed to play chess against a human, and for the first time that computer programme (which we would call AI now) beat Kasparov. Guess what happened? Did chess disappear? No, we still value human-to-human competition. The value of the human is the same for art and for music. So we still have human value, and we have to maintain that for the next generation. They shouldn’t lose this human value and hand over to AI value, which I feel is zero without the human.</p>
<p><strong>J B</strong>: I think one of the things we are seeing is diversity in the backgrounds of people coming into data science, which is fantastic, because I think that really helps with understanding when things can go wrong, and how things can be misused. If you have this cookie cutter set of people who have all got a degree from the same place and all had very similar experience – this happens a lot in the financial industry, where there are about five universities that all feed into the banks – they all think and solve problems in the same way, because that’s how they’ve been trained. But as soon as you start bringing in people with different backgrounds, they’re the ones that say, hang on, this is a problem. So having those different backgrounds is really useful.</p>
<p>But then, as Ali said, there are so many people who call themselves data scientists who don’t understand data, or science. And I think he was absolutely right. If you’ve got a probability of 60%, or you’ve got a small standard deviation, when is that an issue? What do you really understand about that, based on your industry and based on your statistical knowledge? That’s so, so key. And it’s something that a lot of people who are self-trained and call themselves data scientists have missed out on. So coming back to your original question about whether it is harder or easier: in some respects, it’s a lot harder, because someone who calls themselves a data scientist now needs to do everything from fundamental research and trying to make models better, to statistics, machine learning, engineering, productionisation, efficiencies, effectiveness, ethics – it’s this huge, huge sphere. And it’s too much for one person. So you’ve really got to have well-balanced teams and support, because you can’t keep on top of your game across all of those. It’s just not possible. So I think that becomes really difficult. When I look at how things have changed, there are so many basic principles from, you know, the 80s and 90s, in standard, good quality computer programming and testing. And I think the one thing that we’re really missing as an industry is a specialist AI testing role: someone who understands enough about how models work and how they can go wrong, and can do the same thing for AI solutions as good QA analysts do for standard software engineering. Someone who can really test them to extremes – what happens when I put the wrong data in?</p>
<p>We saw this under COVID – there were a couple of days where all the numbers went wrong, because the data hadn’t been delivered correctly, or not enough of it had been delivered. There were no checks in place to say, actually, we’ve only got 10% of what we were expecting, so don’t automatically publish these results. It’s things like that that we really need to make sure are built into the systems, because those are the things that, again, could cause problems. As soon as you get a model that’s not doing the right thing – going back to our original question – you can find a company pulls that model even though it could easily be fixed. And then they’re disillusioned with AI, and won’t use it. That’s the whole project, and all of the expense and investment in it, thrown away when a bit more testing and understanding could have saved it.</p>
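<p>The missing check Bastiman describes could be as simple as this sketch – a guard that holds publication when a batch is suspiciously incomplete. The function name and the 90% floor are our inventions:</p>
<pre><code class="language-python">def safe_to_publish(records_received, records_expected, min_fraction=0.9):
    """Refuse automatic publication when the input volume
    falls far below what was expected."""
    if records_expected &lt;= 0:
        raise ValueError("expected record count must be positive")
    return records_received &gt;= min_fraction * records_expected

for received in (98_000, 10_000):
    ok = safe_to_publish(received, records_expected=100_000)
    print(received, "of 100000 received:",
          "publish" if ok else "hold for human review")</code></pre>
<p>A specialist AI tester’s job, in the sense described above, would be to make sure guards like this exist wherever bad input can silently become published output.</p>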
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the authors</dt>
<dd>
<strong>Anna Demming</strong> is a freelance science writer and editor based in Bristol, UK. She has a PhD from King’s College London in physics, specifically nanophotonics and how light interacts with the very small, and has been an editor for Nature Publishing Group (now Springer Nature), IOP Publishing and New Scientist. Other publications she contributes to include The Observer, New Scientist, Scientific American, Physics World and Chemistry World.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Royal Statistical Society
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" style="height:22px!important;vertical-align:text-bottom;"><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" style="height:22px!important;margin-left:3px;vertical-align:text-bottom;"></a> This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> International licence</a>. <!-- Add thumbnail image credit and any licence terms here --></p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Demming, Anna. 2024. “What is ‘best practice’ when working with AI in the real world?” Real World Data Science, June 4, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/04/ai-series-6.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>



 ]]></description>
  <category>AI</category>
  <category>large language models</category>
  <category>machine learning</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2024/06/04/ai-series-6.html</guid>
  <pubDate>Tue, 04 Jun 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2024/06/04/images/panel-991.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>AI series: Meeting the unprecedented challenges AI poses in the labour market</title>
  <dc:creator>Julia Lane, Lesley Hirsch, and Adam Leonard</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2024/05/28/ai-series-5.html</link>
  <description><![CDATA[ 





<p>Roughly $280 billion of new funding was authorized to boost research and production of semiconductors in the US under the CHIPS and Science Act in 2022 – an amount greater than the inflation-adjusted initial spending to create the US Interstate Highway System. The legislation was just one of multiple acts engineered to subsidise and support emerging technologies in the US that are bound to have seismic impacts on the labor market. It signifies how swift changes in new and emerging technologies have the potential to profoundly change the demand for skills and the structure of work. Here AI has the potential to be more disruptive than any other technological development since the industrial revolution.</p>
<p>The US is not alone. Countries across the globe are trying to understand the potential for AI to affect their workforce and economic activity. Ipsos Group SA, a multinational market research company with headquarters in France, recently attempted to gauge people’s feelings towards AI across the world through a survey across 31 countries and interviews with a small cohort of AI leaders (<a href="https://www.ipsos.com/sites/default/files/ct/news/documents/2023-07/Ipsos%20Global%20AI%202023%20Report-WEB_0.pdf">Global Views on AI 2023</a>). However, although extensive, the data retrieved shares the limitations common to all surveys. The OECD’s most recent <a href="https://www.oecd-ilibrary.org/sites/08785bba-en/index.html?itemId=/content/publication/08785bba-en">Employment Outlook</a> devotes six of its seven chapters to understanding the impact of AI on the workforce. But the OECD also notes that “No comprehensive method exists by which to track and compare AI R&amp;D funding across countries and agencies.”<sup>1</sup> Not surprisingly, the inability to track, let alone compare, AI R&amp;D funding means that it is difficult to make predictions about the R&amp;D-induced global labor market consequences.</p>
<p>The lack of a comprehensive method, and the resultant uncertainty about impact, is a clarion call to action. There are many challenges that need to be addressed. A partial list would include the following: a) a lack of a common definition of AI; b) a lack of information about the needed AI capabilities and how they will change; c) the difficulty of mapping AI capabilities to occupational skills; and d) an inability to measure the impact of AI on job replacement or job augmentation.</p>
<p>Fortunately, there is hope, with new partnerships being established in the US by universities and federal and state agencies. A new data infrastructure is being developed at the Institute for Research on Innovation and Science (IRIS) at the University of Michigan, jointly with Ohio State University, funded by the federal US National Science Foundation (NSF). The pilot joins up existing data from university and state sources to trace how scientific innovation translates to the labor market<sup>2</sup>. The NSF, which has been charged with the regional implementation of the CHIPS and Science investments, is funding the pilot precisely because it needs “innovative tools to accurately assess the impact of these investments across the U.S.”<sup>3</sup></p>
<section id="how-bad-is-the-problem" class="level2">
<h2 class="anchored" data-anchor-id="how-bad-is-the-problem">How bad is the problem?</h2>
<p>The lack of data results in conflicting information. Some reports have warned of apocalyptic takeovers of the job market for many professions. Indeed, a heavily cited report by Goldman Sachs<sup>4</sup> predicted that AI could replace 300 million jobs. But the same BBC report that cited the Goldman Sachs prediction quoted the future-of-work director at Oxford University, Carl Benedikt Frey, as saying, “The only thing I am sure of is that there is no way of knowing how many jobs will be replaced by generative AI”. Simply put, as the former US Federal CIO, Suzette Kent, said, “we lack useful information for informing strategic decisions for national workforce matters.”</p>
<p>So just how much of a problem is it that there is no information on how investments in science and technology affect the labor market? Why should we worry if we cannot accurately predict the impact of AI on workers, firms, and jobs? One reason is to avoid the mistakes of the past, in which both workers and firms have borne the consequences of bad information. In recent history, digitization and globalization resulted in a devastating loss of jobs in many countries. And geographic inequality soared as jobs in the midwestern and northeastern urban centers were lost and a service economy on the coasts burgeoned. Efforts to reduce the loss of jobs and earnings came too little, too late <sup>5</sup> <sup>6</sup>. Another reason is to make evidence-based policy recommendations. For example, the US National AI Research Resources Taskforce, which was directly charged by the President and Congress with recommending ways to invest in AI research to strengthen and democratize the U.S. AI innovation ecosystem, did not have joined-up data between science investment and the workforce to inform its final recommendations. <sup>7</sup></p>
<p>In other words, governments need more timely, local, and actionable data so that they can understand changes in the tasks that employers need performed, which types of jobs and firms will be affected, and where. Concomitantly, data will be needed about the effects of AI on different population groups and different geographic areas so that the costs of change are not unfairly distributed. Armed with such information, policy makers can make investments that mitigate or counteract negative impacts and workers can be trained in the new necessary skills and matched with the firms that need them. But the swift pace of change in AI means that the urgency to create timely, local, and actionable labor market information to guide these investments has never been greater.</p>
</section>
<section id="a-new-approach" class="level2">
<h2 class="anchored" data-anchor-id="a-new-approach">A new approach</h2>
<p>The IRIS approach, called the “Industry of Ideas”, builds on the “economics of ideas” framework for which Paul Romer received the 2018 Nobel Prize in Economics <sup>8</sup> <sup>9</sup>. People who create ideas – new technologies that can be reused – form the foundations of new industries. In other words, “the discovery of new ideas lie at center of economic growth…” (Charles Jones describing Paul Romer’s conceptual framework) <sup>10</sup>.</p>
<p>The project recognizes that, as Robert Oppenheimer said, “the best way to transmit knowledge is to wrap it up in a human being”. <sup>11</sup> It uses people-centric methods for following the movement of ideas from investments in research into the marketplace. The approach identifies businesses that employ people with deep skills in AI and other emerging technology areas, and develops early, never-before-available indicators that can provide alerts about potential impacts on the current and future workforce. Initially focused on the artificial intelligence and electric vehicle industries in Ohio, the pilot is creating a data system that can be expanded and applied to other industries and other states across the country.</p>
<p>The new tools are innovative because they build on new opportunities to produce usable information that is local, that concerns relevant industries, and that directly ties investments in new technologies, such as AI, to labor market impacts.</p>
<p>Another key aspect of the NSF-piloted “Industry of Ideas” is the focus on tying innovation at its source - individual data on university research activities - to the local workforce data reported by firms to their state departments of labor. Local data is critical because many labor markets are local, not national, in scope. Even in a global economy, many businesses and workers are locally based – as are the training providers that work to ensure that labor demand and supply are well matched. Thus the Industry of Ideas pilot provides policy makers, workers, firms, and educational institutions with access to an array of local, timely, granular, actionable resources to help them make decisions. That way, local leaders who need labor market data do not need to rely on national unemployment figures, which are reported only once a month.</p>
</section>
<section id="connecting-science-investments-with-jobs" class="level2">
<h2 class="anchored" data-anchor-id="connecting-science-investments-with-jobs">Connecting science investments with jobs</h2>
<p>The Industry of Ideas approach directly connects investment in science and the labor market, moving beyond the current approach of evaluating investment by studying scientific papers and publications <sup>12</sup> <sup>13</sup>, which are disconnected from workers and jobs. The data seeds were sown almost two decades ago. President Bush’s Science Advisor, John Marburger III, who, quite sensibly, was unconvinced of the scientific and practical value of relying primarily on document-based, bibliometric approaches to studying science to understand its practical effects, called for a “Science of Science Policy” <sup>14</sup> <sup>15</sup>.</p>
<p>The Industry of Ideas is testing the potential to securely combine university and state data to measure the impact of federal investments in AI on local and regional economies. It uses people-centric data generated by the administrative processes at universities and firms. With these data the Industry of Ideas project can capture the organization of people in science at multiple levels (e.g.&nbsp;individuals, teams, projects, and institutions), their multiple sources of funding (federal scientific and programmatic agencies, philanthropic foundations, industry, and state and local government), inputs into science from vendors (such as computing services, instruments, and biological specimens), as well as the dynamics of their careers across time (individual career earnings and employment trajectories).</p>
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="The Industry of Ideas Infrastructure (provided by Jason Owen-Smith, IRIS, University of Michigan)">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/28/images/Industry-of-ideas991-724.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="The Industry of Ideas Infrastructure (provided by Jason Owen-Smith, IRIS, University of Michigan)">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: The Industry of Ideas Infrastructure (provided by Jason Owen-Smith, IRIS, University of Michigan)
</figcaption>
</figure>
</div>
<p>The IRIS infrastructure, developed over the past decade, provides administrative records on more than 41% of U.S. total R&amp;D spending at universities <sup>16</sup>. The infrastructure also provides links to survey data, as well as data from private sector suppliers <sup>17</sup>, and can trace the flows of university-funded researchers into the private sector <sup>18</sup> by joining up the university administrative data with state workforce data.</p>
</section>
<section id="tying-information-about-ai-to-skills-needs" class="level2">
<h2 class="anchored" data-anchor-id="tying-information-about-ai-to-skills-needs">Tying information about AI to skills needs</h2>
<p>How is it possible to tie changes in AI to changing needs for skills? State leaders in workforce and education agencies have identified new ways to collaborate, build staff capacity, and develop solutions, services, and products that respond to local need. An example of how to use data to get better information that more accurately connects workers with firms in a swiftly changing labor market is the New Jersey Career Navigator. It provides job seekers with recommendations on new careers, available job postings, and relevant training programs, based on skills similarity, labor market demand, and wage impacts observed in the underlying data. These recommendations, which are themselves generated by AI, show how AI technology can be used to navigate the labour market changes that AI may cause. The New Jersey Career Navigator draws on millions of wage records, providing earnings and industry information on all workers covered by unemployment insurance in New Jersey firms; employment and wage outcomes from hundreds of thousands of graduates of occupational skills training programs in New Jersey; several years of online job postings from the National Labor Exchange Research Hub (NLx); and the resumes of 400,000 New Jersey residents.</p>
<p>In other words, as the Industry of Ideas pilot evolves, new ideas from states like New Jersey can be used not only to trace the flows of ideas from academia to the workplace but also to develop a new system that targets reskilling efforts once the type and location of skills needs have been identified. The new joined-up data and evidence can be used to address challenges such as low labor force participation, and to supply education and training providers with the data they need to align their programs with the needs of the labor market. Such a system would help government, business, educators, and workers adjust regional talent pipelines continuously in response to changes in AI, and enable workers to successfully navigate the changes that it brings.</p>
</section>
<section id="new-approaches-to-classifying-industries-industries-of-ideas" class="level2">
<h2 class="anchored" data-anchor-id="new-approaches-to-classifying-industries-industries-of-ideas">New approaches to classifying industries: “Industries of Ideas”</h2>
<p>An important outcome of the new NSF pilot is the potential to transform the way in which we classify firms into industries. The current industry classifications are rule-based. They were designed for the economy as it was organized 40 years ago, so they are not equipped to describe AI. A case in point is the state of Texas – a state that anecdotally has generated a lot of high-tech jobs. Current industry data for Texas is limited because firms are grouped into industries defined by what they produce, or how they produce it, rather than by what new technology is being developed or utilized by those firms. As a result, the main source of labor market data in Texas gives an implausibly low picture of AI activity <sup>19</sup>.</p>
<p>The Industries of Ideas approach could provide states with a new way to classify firms, based on the new ideas that underpin how firms do business, grouping firms by the people who created and use the technologies they adopt <sup>20</sup>. Examples just for Ohio include funding to use AI to improve the ways in which medicine is delivered, and advancing digital agriculture, which includes things like precision livestock farming, or precision agriculture that reduces waste and improves productivity more generally. The clustering of university researchers – and the ideas embodied in them – alongside the farms that adopt those ideas as they interact with farmers represents this new type of industry cluster. Such a classification framework is a sea change from earlier industrial classifications based on what goods are physically produced - like manufacturing and agriculture <sup>21</sup>.</p>
</section>
<section id="the-future" class="level2">
<h2 class="anchored" data-anchor-id="the-future">The Future</h2>
<p>Such a bottom-up classification and analysis system, based on local links between researchers and firms, could be designed locally but scaled nationally. It could address the challenges identified at the beginning of this piece. The definition of AI firms could evolve, defined by the links between AI researchers and the firms with which they work. The lack of information about needed AI capabilities would be resolved by directly mapping firms’ skill demand and hiring patterns, as exemplified in New Jersey. The same New Jersey mapping could tie AI capabilities to occupational skills. And the direct impact of AI on job replacement or job augmentation could be mapped from the joined-up university and workforce data.</p>
<p>Of course, much needs to be done. Implementation will depend on the success of the pilot and the ability to build on existing assets. Not all states and universities have the capacity to build a similar system, but the fact that 30 universities and 15 state agencies are participating in advisory boards for the NSF Industry of Ideas pilot is grounds for hope. Indeed, a new generation of data leaders is showing the way, not only in local and regional government but also at universities and professional associations (Advisory Committee on Data for Evidence Building) <sup>22</sup>.</p>
<p>We began this paper by noting that the urgency to create timely, local, and actionable labor market information has never been greater. We close by arguing that our capacity to fundamentally change the way in which we can use data and information to understand the demand for skills and the structure of work has also never been greater. The opportunity is ours for the taking.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the authors</dt>
<dd>
<strong>Julia Lane</strong> is a Professor at New York University’s Wagner Graduate School of Public Service. She was a senior advisor in the Office of the Federal CIO at the White House, supporting the implementation of the Federal Data Strategy. She recently served on two White House committees: the Advisory Committee on Data for Evidence Building and the National AI Research Resources Task Force.
</dd>
<dd>
<strong>Adam Leonard</strong> is the Chief Analytics Officer &amp; Director of the Division of Information Innovation &amp; Insight (I|3) for the Texas Workforce Commission (TWC). Adam envisioned and founded I|3 to help TWC leverage its most important untapped resource - its data – to help the agency and its partners better help employers, individuals, families, and communities achieve &amp; maintain prosperity.
</dd>
<dd>
<strong>Lesley Hirsch</strong> is the Assistant Commissioner of Research and Information at the New Jersey Department of Labor and Workforce Development. Her vision for the department is to bring cutting-edge digital tools to bear to deliver labor market intelligence to the department’s internal and external customers where, when, and how they need it and to mine every data source so it can tell its full story.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Royal Statistical Society
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" style="height:22px!important;vertical-align:text-bottom;"><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" style="height:22px!important;margin-left:3px;vertical-align:text-bottom;"></a> This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> International licence</a>. <!-- Add thumbnail image credit and any licence terms here --></p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Lane, J., Hirsch, L. and Leonard, A. 2024. “Meeting the unprecedented challenges AI poses in the labour market.” Real World Data Science, May 28, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/13/ai-series-5.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>A new approach to measuring government investment in AI-related R&amp;D. Galindo-Rueda, F. &amp; Cairns, S. <em>oecd.ai</em> (2021)↩︎</p></li>
<li id="fn2"><p><a href="https://www.aei.org/research-products/report/the-industry-of-ideas-measuring-how-artificial-intelligence-changes-labor-markets/">The Industry of Ideas: Measuring How Artificial Intelligence Changes Labor Markets</a> Lane, J. AEI (2023)↩︎</p></li>
<li id="fn3"><p><a href="https://www.aei.org/research-products/report/the-industry-of-ideas-measuring-how-artificial-intelligence-changes-labor-markets/">NSF launches pilot to assess the impact of strategic investments on regional jobs</a> *new.nsf.gov (2023)↩︎</p></li>
<li id="fn4"><p><a href="https://www.bbc.co.uk/news/technology-65102150">AI could replace equivalent of 300 million jobs - report</a> Vallance, C. <em>BBC news</em> (2023)↩︎</p></li>
<li id="fn5"><p><a href="https://www.aeaweb.org/articles?id=10.1257/aer.103.5.1553">The Growth of Low-Skill Service Jobs and the Polarization of the US Labor Market</a> Autor, D. H. &amp; Dorn, D. <em>American Economic Review</em> <strong>103</strong> pp.&nbsp;1553-97 (2013)↩︎</p></li>
<li id="fn6"><p><a href="https://www.aeaweb.org/articles?id=10.1257/aer.104.8.2509">Explaining Job Polarization: Routine-Biased Technological Change and Offshoring</a> Goos, M., Manning, A. &amp; Salomons, A. <em>American Economic Review</em> <strong>104</strong> 2509-26 (2014)↩︎</p></li>
<li id="fn7"><p><a href="https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf">Strengthening and Democratizing the U.S. Artificial Intelligence Innovation Ecosystem</a> Office of Science and Technology Policy (2023)↩︎</p></li>
<li id="fn8"><p><a href="https://paulromer.net/deep_structure_growth/">The Deep Structure of Economic Growth</a> Romer, P. <em>paulromer.net</em> (2019)↩︎</p></li>
<li id="fn9"><p><a href="https://hdsr.mitpress.mit.edu/pub/zgu2u8y6/release/2">Interview With Paul Romer</a> Romer, P. &amp; Lane, J. (2022)↩︎</p></li>
<li id="fn10"><p><a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/sjoe.12370">Paul Romer: Ideas, nonrivalry, and endogenous growth</a>(Jones, C. I. <em>The Scandinavian Journal of Economics</em> <strong>121</strong> 859-883 (2019)↩︎</p></li>
<li id="fn11"><p><a href="https://www.science.org/doi/10.1126/science.aac5949">Wrapping it up in a person: Examining employment and earnings outcomes for Ph.D.&nbsp;recipients</a> Zolas, N. <em>et al.</em> <em>Science</em> **350 1367-1371 (2015)↩︎</p></li>
<li id="fn12"><p><a href="https://www.nature.com/articles/464488a">Let’s make science metrics more scientific</a> Lane, J. <em>Nature</em> <strong>464</strong> 488–489 (2010)↩︎</p></li>
<li id="fn13"><p><a href="https://issues.org/democratizing-government-data-lane/">A Vision for Democratizing Government Data</a> Lane, J. <em>Issues in Science and Technology</em> <strong>XXXIX</strong> (2022)↩︎</p></li>
<li id="fn14"><p><a href="https://www.nature.com/articles/464488a">Let’s make science metrics more scientific</a> Lane, J. <em>Nature</em> <strong>464</strong> 488–489 (2010)↩︎</p></li>
<li id="fn15"><p><a href="https://www.science.org/doi/10.1126/science.1114801">Wanted: Better Benchmarks</a> Marburger III, J. H. <em>Science</em> <strong>308</strong> p1087(2005)↩︎</p></li>
<li id="fn16"><p><a href="https://iris.isr.umich.edu/research-data/2022datarelease-summarydoc/">The Institute for Research on Innovation &amp; Science (IRIS). Summary Documentation for the IRIS UMETRICS 2022 Data Release</a> Nicholls, N., Brown, C. A., Ku, R. L. and Owen-Smith, J. D. <em>Ann Arbor, MI: The Institute for Research on Innovation &amp; Science</em> (2022) doi: 10.21987/df2a-ha30↩︎</p></li>
<li id="fn17"><p><a href="https://hdsr.mitpress.mit.edu/pub/u073rjxs/release/3">A Linked Data Mosaic for Policy-Relevant Research on Science and Innovation: Value, Transparency, Rigor, and Community</a> Chang, W.-Y., Garner, M., Basner, J., Weinberg, B. and Owen-Smith, J. <em>Harvard Data Science Review</em> (2022) doi: 10.1162/99608f92.1e23fb3f↩︎</p></li>
<li id="fn18"><p><a href="https://www.aei.org/research-products/report/the-industry-of-ideas-measuring-how-artificial-intelligence-changes-labor-markets/">The Industry of Ideas: Measuring How Artificial Intelligence Changes Labor Markets</a> Lane,J. <em>American Enterprise institute</em> (2023)↩︎</p></li>
<li id="fn19"><p>[Outside of the Box Use of Administra4ve and Wage Data in Texas] (https://digitaleconomy.stanford.edu/wp-content/uploads/2024/03/Adam-Leonard.pdf) Leonard, A. <em>digitaleconomy.standford.edu</em> (2024)↩︎</p></li>
<li id="fn20"><p><a href="https://www.aei.org/research-products/report/the-industry-of-ideas-measuring-how-artificial-intelligence-changes-labor-markets/">The Industry of Ideas: Measuring How Artificial Intelligence Changes Labor Markets</a> Lane,J. <em>American Enterprise institute</em> (2023)↩︎</p></li>
<li id="fn21"><p><a href="https://www.bea.gov/system/files/papers/P2007-7.pdf">Converting historical industry time series data from SIC to NAICS. The Federal Committee on Statistical Methodology</a> Yuskavage, R. <em>Federal Committee on Statistical Methodology</em> (2007)) – or by how services and goods are produced – like the delivery of health, financial, and investment services <a href="https://www.jstor.org/stable/23487551">The Statistics Corner: The NAICS Is Coming. Will We Be Ready?</a> Haver, M. A. <em>Business Economics</em> <strong>32</strong> 63-65 (1997)↩︎</p></li>
<li id="fn22"><p><a href="https://www.bea.gov/system/files/2022-10/supplemental-acdeb-year-2-report.pdf">Year 2 Report Supplemental Information</a> Advisory Committee on Data for Evidence Building (ACDEB) (2022)↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>AI</category>
  <category>Data management</category>
  <category>Forecasting</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2024/05/28/ai-series-5.html</guid>
  <pubDate>Tue, 28 May 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/28/images/Industry-of-ideas991-724.png" medium="image" type="image/png" height="105" width="144"/>
</item>
<item>
  <title>AI series: Evaluation essentials for safe and reliable AI model performance</title>
  <dc:creator>Isabel Sassoon</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2024/05/21/ai-series-4.html</link>
  <description><![CDATA[ 





<p>It took just sixteen hours for <a href="https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/">Microsoft’s shiny new chatbot</a> Tay to be shut down for profanity. The chatbot had been released on the social media platform X, then known as Twitter, following extensive evaluation and stress testing under different conditions to ensure that interacting with the chatbot would be a positive experience. Unfortunately, the testing plan had not bargained on a coordinated attack exploiting the chatbot’s vulnerability when exposed to a torrent of offensive material. Tay soon began tweeting wildly inappropriate words and images and was taken offline within hours.</p>
<p>The chatbot’s failure highlights just how hard, yet imperative, it can be to test and evaluate a model before real-world deployment. With the recent influx of accessible “off-the-shelf” machine learning algorithms, building AI models, in particular generative AI models, is now relatively straightforward. However, the ease with which models can be deployed belies the complexity of evaluating them. Deploying a model anywhere outside the data and context it has been trained on is risky if its performance has not been evaluated. The evaluation process requires clear definitions of good performance, as well as an account of the potential risks, and can throw up unexpected requirements for the test data. Not only are the subtle nuances of the initial evaluation requirements important; once the model is deployed, a process needs to be in place so that the algorithm can be monitored over time.</p>
<section id="know-your-goals" class="level2">
<h2 class="anchored" data-anchor-id="know-your-goals">Know your goals</h2>
<p>The first point to note is that checking how well the output from an AI model matches the data in the training set is not an adequate indication of how well it will perform once deployed on other data. Consider a simple model based on the equation that best fits a training data set. Data values are inevitably subject to measurement uncertainties and local conditions that add various types of noise, so taking the line defined by that equation and measuring how closely it matches the training data falls short of adequate evaluation: the more perfectly a model matches this noisy data, the less well it will fit an alternative set of data, a scenario described as “overfitting”. Similarly, what a machine learning or AI model learns when it optimises its fit to the training data may not be generalisable.</p>
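<p>As a minimal sketch of this effect (assuming numpy and scikit-learn are available; the data are made up, drawn from a straight line plus noise), a degree-9 polynomial can match fifteen noisy training points almost perfectly, yet it predicts fresh data from the same underlying line worse than a simple straight-line fit:</p>
<pre><code class="language-python">import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15).reshape(-1, 1)
x_test = np.linspace(0, 1, 100).reshape(-1, 1)

def true_line(x):
    return 2.0 * x + 1.0   # the underlying relationship generating the data

y_train = true_line(x_train).ravel() + rng.normal(0, 0.3, 15)    # noisy observations
y_test = true_line(x_test).ravel() + rng.normal(0, 0.3, 100)     # fresh noisy data

for degree in (1, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    # Training error falls as the degree rises; test error rises: overfitting
    print(degree, round(train_mse, 3), round(test_mse, 3))
</code></pre>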
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Reliable deployment of an algorithm requires identifying metrics, risks and rigorous, ongoing evaluation. Image created by Isabel Sassoon using firefly to show a technical report process flow of statistical model performance and a huge numbers chart.">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/21/images/Evaluation_thumbnail.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Reliable deployment of an algorithm requires identifying metrics, risks and rigorous, ongoing evaluation. Image created by Isabel Sassoon using firefly to show a technical report process flow of statistical model performance and a huge numbers chart.">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Evaluation plots. The kinds of charts of monitored performance and risk metrics that are plotted to evaluate an AI model. Reliable deployment of an algorithm requires identifying appropriate metrics for performance and risk as well as rigorous, ongoing evaluation. Image created by Isabel Sassoon using Adobe Firefly to show a technical report process flow of statistical model performance and a huge numbers chart.
</figcaption>
</figure>
</div>
<p>There are a number of possible approaches and factors to take into account when sourcing test data, but the first thing to consider when drawing up a process for evaluating an AI model is its objective. With this objective in mind it is then possible to pin down an appropriate measure of performance, which will shape how the test data is used to evaluate the model. Different measures suit different objectives: some are appropriate when the objective is to classify (e.g.&nbsp;is a patient high or low risk based on health data?), while others are useful for models that estimate or predict (e.g.&nbsp;what is the estimated height of a child given their parents’ heights?).</p>
<p>Classification model performance can be measured using accuracy, confusion matrices, sensitivity, specificity and the receiver operating characteristic (ROC). Classification accuracy summarises the performance of a classification model as the number of cases the model classifies correctly divided by the total number of cases in the test set. However, this can be a blunt tool, as there are cases where the cost or consequence differs depending on the direction of the error. Confusion matrices are helpful for exploring how well the model classifies each of the classes. The confusion matrix sums up the number of cases the model classifies correctly within each class, for example how many actual high-risk cases are correctly classified as high risk by the model. The cases the model classifies as high risk, for example, that are not in fact high risk are referred to as False Positives. In the context of medical tests (e.g.&nbsp;the COVID lateral flow tests), testing positive for a condition that is not actually there is potentially less damaging than testing negative when the condition is there.</p>
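<p>A minimal sketch of these two measures, assuming scikit-learn and illustrative labels (1 for high risk, 0 for low risk), might look like this:</p>
<pre><code class="language-python">from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # ground-truth labels in the test set
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # the model's classifications

print(accuracy_score(y_true, y_pred))     # correct classifications / total cases: 0.8

# confusion_matrix returns counts per class; .ravel() unpacks the binary case
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(fp)   # False Positives: low-risk cases flagged as high risk
print(fn)   # False Negatives: high-risk cases the model missed
</code></pre>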
<div id="fig-2" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="The receiver operating characteristic can provide a helpful means of visualising performance. Credit: shutterstock .">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/21/images/shutterstock_2377152411.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="The receiver operating characteristic can provide a helpful means of visualising performance. Credit: shutterstock .">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: The receiver operating characteristic can provide a helpful means of visualising performance. Credit: shutterstock
</figcaption>
</figure>
</div>
<p>Additionally, the sensitivity and specificity can provide a more detailed look at model performance. The sensitivity is the proportion of cases labelled as positive that are classified as positive by the model, whereas the specificity is the proportion labelled as negative that it classifies as negative. It is useful to visualise model performance, and the receiver operating characteristic (ROC) provides a method to do just that. The ROC plots the True Positive rate against the False Positive rate for the model. This can be further summarised in one value as the area under the curve (AUC). The larger the AUC, the better the model is performing.</p>
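<p>The sketch below (again assuming scikit-learn; the scores stand in for a model’s predicted probabilities on a test set) computes sensitivity and specificity at a 0.5 threshold, then summarises performance across all thresholds with the AUC:</p>
<pre><code class="language-python">from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.35, 0.6, 0.2, 0.4, 0.1, 0.3, 0.7, 0.45]

y_pred = [1 if s &gt;= 0.5 else 0 for s in y_score]   # classify at a 0.5 threshold
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
print(tp / (tp + fn))   # sensitivity (true positive rate): 0.8
print(tn / (tn + fp))   # specificity (true negative rate): 1.0

# The ROC traces the trade-off as the threshold varies; the AUC summarises it
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))   # closer to 1 is better
</code></pre>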
<p>Deciding whether accuracy is enough, or whether there is a need to delve into the directions of the errors, depends on the context of the model’s deployment. Other examples in medicine include the risk models that were developed to assess an individual’s risk of a specific medical condition, such as <a href="https://qrisk.org/">QRISK</a>, <sup>1</sup> which calculates a person’s risk of developing a heart attack or stroke over the next 10 years. Here model performance needs to go beyond accuracy and consider the direction of the errors the model makes. A good overview of performance evaluation is given by Flach (2019) <sup>2</sup>. Is it better to tell someone they may be at risk of disease X, run a blood test and rule it out (False Positive) than to tell them they are not at risk and not check (False Negative)? All this needs to be considered and factored into the validation of the model. It is worth noting that a systematic direction for its errors can also cause an algorithm to hit <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/14/ai-series-2.html">ethical problems</a>.</p>
<p>When evaluating the performance of models that estimate a numerical value (e.g.&nbsp;the height of a child from the heights of the parents), the measures used are based on how far the model’s estimate falls from the actual value (which is known for testing data). There are then a multitude of ways of summarising that quantity. The mean square error (MSE) is computed by taking the average squared difference between the estimated values from the model and the actual values in the data. Other variations include the root mean square error (RMSE) and the mean absolute error (MAE). The RMSE is computed in the same way as the MSE, but the value is then square rooted. The MAE takes a different approach, averaging the absolute errors (i.e.&nbsp;the error magnitudes). Each of these measures involves dividing by the number of rows in the data. Depending on the context, one of these measures may be better suited than the others. For example, the MSE is sensitive to outliers so can be easily skewed by a small number of extreme values, which may be useful if extreme errors need highlighting, whereas the RMSE has the advantage of being measured in the same units as the original variable the model is designed to estimate.</p>
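<p>The three summaries are quick to compute directly; a minimal sketch, assuming numpy and illustrative height data in centimetres:</p>
<pre><code class="language-python">import numpy as np

actual    = np.array([150.0, 160.0, 145.0, 170.0, 155.0])   # known test-set values
estimated = np.array([152.0, 158.0, 150.0, 168.0, 175.0])   # the model's estimates

errors = estimated - actual
mse  = np.mean(errors ** 2)       # mean square error: penalises outliers heavily
rmse = np.sqrt(mse)               # root mean square error: same units as the heights
mae  = np.mean(np.abs(errors))    # mean absolute error: average error magnitude
print(mse, rmse, mae)
</code></pre>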
<p>Large Language Models (LLMs, e.g.&nbsp;Gemini, ChatGPT) are also models trained on a data set and as such also need to be evaluated and monitored. Whereas the models discussed so far have standard metrics, evaluating LLMs is more challenging, as there is a multitude of benchmarks and metrics<sup>3</sup>. When LLMs are used to answer questions (when you ask a chatbot a question), monitoring the performance of the model (the trained LLM) can involve many dimensions. Is the answer correct? Is the answer clear? Is the answer biased? The possible metrics are varied and not as simple to capture in one measure. It is also possible to use an LLM to evaluate or score another LLM’s answer to a question. However, this adds its own risk, as LLMs are not 100% accurate or consistent themselves, and they can <a href="https://realworlddatascience.net/the-pulse/editors-blog/posts/2022/11/23/LLM-content-warning.html">hallucinate</a>.</p>
</section>
<section id="getting-data-right" class="level2">
<h2 class="anchored" data-anchor-id="getting-data-right">Getting data right</h2>
<p>Not only is separate test data needed for an evaluation, but care is needed to ensure the test data is suitably representative. The same requirements apply to test data as to the original training data: the data set must be representative of the context the model will be deployed in. For instance, if an algorithm is being developed to handle photos from the UK, training and testing it on photos where the sun always shines may cause problems. The model needs to be trained and tested on a set of photos that includes rain and clouds; otherwise it cannot be assumed it will reliably classify such photos if they appear during deployment in the real world. Getting the training and test data sets right may mean using a smaller, more curated set rather than simply one that contains everything available.</p>
<p>These data sets also need reliable labelling, i.e.&nbsp;the labels attached to the rows of data need to be accurate, so that the model’s performance can be assessed objectively against a trusted “ground truth”. For example, if we want to evaluate the performance of a fraud transaction classification model using accuracy as the performance metric, then we need a reliable test data set with true fraud transactions to evaluate how good the model is at detecting them. A data set with a list of transactions that are not accurately identified (or labelled) as fraud or not is not helpful. Given how some commercial LLMs are trained on all the data on the “internet”, it is worth asking whether a smaller, more curated and specific training set would be better for model performance, as well as being more ethical and safer.</p>
<p>Several approaches for generating test data sets take training and test data as distinct subsets of the same initial data set <sup>4</sup>. There are different ways of doing this to make the most of the data and to evaluate the model as systematically and exhaustively as possible. Perhaps the simplest is the hold-out set, which involves setting aside a random subset of all the available data for testing the model. Depending on how much data is available, this can be 50% or less.</p>
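<p>In scikit-learn, for example, a hold-out set is a single call; a minimal sketch with made-up arrays (the 30% test fraction is an assumption, not a rule):</p>
<pre><code class="language-python">import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)   # 20 rows of feature data
y = np.arange(20)                  # matching labels

# Hold out a random 30% of rows; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))   # 14 training rows, 6 held out for evaluation
</code></pre>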
<p>A slightly more sophisticated approach is k-fold cross validation, which involves splitting all the available data into k subsets and then doing k iterations: in each iteration a different kth of the data is used as the testing data to evaluate a model built by training on the remaining (k−1)/k of the data. This is repeated k times, each time using a different one of the k subsets for testing. The performance of the model can then be averaged over the k iterations. (The measure of performance can be, say, accuracy or sensitivity, depending on the context.) For example, if k is 3 then the data is split into three, and each iteration takes a different two thirds of the data as training data to build the model, and the remaining third as testing data to evaluate it.</p>
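<p>A short sketch of 3-fold cross validation, assuming scikit-learn and one of its bundled toy data sets:</p>
<pre><code class="language-python">from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each of the 3 iterations trains on 2/3 of the data and tests on the remaining 1/3
scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
print(scores)          # one accuracy value per fold
print(scores.mean())   # performance averaged over the k iterations
</code></pre>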
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="K fold cross validation can indicate how sensitive a model is to the test data. Credit: Fabian Flöck CC-BY-AS-3.0">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/21/images/k-fold-cross-validation.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="K fold cross validation can indicate how sensitive a model is to the test data. Credit: Fabian Flöck CC-BY-AS-3.0">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: K fold cross validation can indicate how sensitive a model is to the test data. Credit: Fabian Flöck CC-BY-AS-3.0
</figcaption>
</figure>
</div>
<p>The bootstrap is a more computationally intensive approach: it involves creating multiple samples by randomly sampling with replacement from the original data. Typically, hundreds or thousands of such samples are generated, and each will be different. These multiple samples provide multiple versions of the training and testing data, so the model can be evaluated on all these variations. Because the bootstrap samples with replacement, a given row of the original data can appear multiple times in one sample’s training or test data and not appear at all in another. As with k-fold cross validation, the performance of the model can then be averaged over these multiple iterations. It is important that the bootstrap does not rely on only a handful of iterations. Both the bootstrap and cross validation offer an opportunity to see how sensitive the model’s performance is to the characteristics of the test data, but when the data sets available are small, the bootstrap provides a more robust way of estimating the model’s performance.</p>
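<p>A hedged sketch of bootstrap evaluation, assuming scikit-learn and one of its toy data sets; 200 resamples keep the example quick, though real uses often run thousands:</p>
<pre><code class="language-python">import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

X, y = load_breast_cancer(return_X_y=True)
n = len(y)
scores = []
for seed in range(200):
    # Sample row indices with replacement: some rows repeat, others are left out
    idx = resample(np.arange(n), replace=True, random_state=seed)
    oob = np.setdiff1d(np.arange(n), idx)   # the left-out rows become the test set
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X[idx], y[idx])
    scores.append(accuracy_score(y[oob], model.predict(X[oob])))
print(np.mean(scores), np.std(scores))   # average performance and its variability
</code></pre>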
<p>An approach that can be useful for testing whether the performance of the model is sensitive to time is the time-based split. This takes a “sliding window” that splits the data into back-to-back time periods, ensuring that the data the model is trained on is separate from, and precedes, the data it is tested on.</p>
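<p>scikit-learn’s TimeSeriesSplit offers one way to do this; a minimal sketch in which 12 rows stand in for 12 consecutive time periods, with max_train_size capping the training window to give the sliding-window behaviour described above:</p>
<pre><code class="language-python">import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3, max_train_size=6)
for train_idx, test_idx in tscv.split(X):
    # Training periods always precede the test periods and never overlap them
    print("train:", train_idx, "test:", test_idx)
</code></pre>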
</section>
<section id="maintained-monitoring" class="level2">
<h2 class="anchored" data-anchor-id="maintained-monitoring">Maintained monitoring</h2>
<p>Once an algorithm has been let loose it can be difficult to maintain rigorous monitoring, but the challenge of ongoing monitoring is worth taking on, and there are promising approaches to it. Many of the same metrics will apply in keeping a handle on the myriad issues that could arise. These range from the banal, such as data input errors, to the complex, as in the case of model drift.</p>
<p>In the first case, if a model makes use of data that is fed into it from another system (e.g.&nbsp;a billing system), any update to this other system can affect model performance. Identifying this involves checking that the characteristics of the data used to train the model and the latest data fed into the model are not too dissimilar, since a change in the data, such as an increase by a factor of 10 or 100, can cause the algorithm to fail. The magnitude of acceptable change in the data will depend on the context. Such a step change (due to a source-system update) in one of the model inputs can be identified and is potentially an easy fix.</p>
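<p>A hedged sketch of such an input check, assuming only numpy: flag any feature whose recent mean has moved far from its training-time mean (the factor-of-10 case above would fail immediately; the threshold is an assumption to be set per context):</p>
<pre><code class="language-python">import numpy as np

def check_inputs(train_X, live_X, max_ratio=3.0):
    """Return indices of features whose live mean has drifted beyond max_ratio."""
    train_means = np.abs(train_X.mean(axis=0)) + 1e-9   # small offset avoids /0
    live_means = np.abs(live_X.mean(axis=0)) + 1e-9
    ratio = np.maximum(live_means / train_means, train_means / live_means)
    return np.where(ratio &gt; max_ratio)[0]

train_X = np.random.default_rng(0).normal(100, 10, size=(1000, 3))
live_X = train_X.copy()
live_X[:, 1] *= 10                     # simulate a source-system update scaling one input
print(check_inputs(train_X, live_X))   # flags feature 1
</code></pre>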
<p>Model drift is more complex, as real-world data evolves over time. There are two types of model drift: data drift and concept drift. Data drift refers to the change that can occur to data over time, whilst concept drift<sup>5</sup> is a deterioration or change in the relationship between the target variable and the input variables of a model. An example of data drift, in the context of billing data, could be the addition of new price plans or phones to the data, whilst concept drift arises when the relationship between an outcome (for instance, leaving one mobile phone provider for another) and its underlying factors changes. In the mobile phone market, concept drift might mean that leaving for another provider is no longer dictated so much by price sensitivity as by the type of network. Both types of drift lead to a deterioration in the performance of the model as time goes by. Performance monitoring of the model is key to detecting model drift, but differentiating between data and concept drift requires additional specialist approaches. Some of these are outlined by Rotalinti et al. (2022)<sup>6</sup> and Davis et al. (2020)<sup>7</sup>.</p>
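<p>Data drift in a single feature can be screened for with a two-sample Kolmogorov-Smirnov test, as in the sketch below (assuming scipy; the significance level is an assumption, not a universal rule). Concept drift, by contrast, usually surfaces through tracking the performance metrics above over time rather than through the inputs alone:</p>
<pre><code class="language-python">import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
training_feature = rng.normal(50.0, 5.0, 5000)   # feature values seen at training time
incoming_feature = rng.normal(55.0, 5.0, 5000)   # recent live values with a shifted mean

stat, p_value = ks_2samp(training_feature, incoming_feature)
if p_value &lt; 0.01:   # assumed significance level
    print("data drift suspected: the feature's distribution has changed")
</code></pre>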
<p>In some cases, refreshing a model to account for a change in the underlying data (both training and test) can be quick and easy. However, if concept drift is detected, then it may take more than just a model refresh, as the relationship between the variable we are trying to model and the explanatory data has changed. This may involve finding new data sources and could lead to significant changes in the model, for example moving from a regression model to a neural network. Deciding to rebuild or retrain a model can also, in some cases, have an environmental impact (particularly for the more resource-intensive models such as deep learning and LLMs). Either way, where models are subject to peer review or some form of governance, this can be a more onerous task.</p>
<p>Even with every step of a model’s evaluation stringently adhered to, it is also important to assess the context of its deployment for risks and rogue scenarios that might break the model or, as in the case of Tay, despoil it. And, like all other stages of the evaluation, this should happen not just at the time of deployment but over time. When models (machine learning or other) are used to inform or make important decisions, providing information on how and when the model was evaluated, and how it is monitored, should be standard practice: not just to avoid the wasted expense of another broken AI model left on the shelf but, more importantly, to safeguard the welfare of those who come into contact with it.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the author</dt>
<dd>
<strong>Isabel Sassoon</strong> is senior lecturer in the Department of Computer Science, Brunel University London.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Royal Statistical Society
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" style="height:22px!important;vertical-align:text-bottom;"><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" style="height:22px!important;margin-left:3px;vertical-align:text-bottom;"></a> This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> International licence</a>. <!-- Add thumbnail image credit and any licence terms here --></p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Sassoon, Isabel. 2024. “Evaluation essentials for safe and reliable AI model performance.” Real World Data Science, May 21, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/06/ai-series-4.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">References</h2>

<ol>
<li id="fn1"><p>Hippisley-Cox, J., Coupland, C. and Brindle, P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study.<em>BMJ</em> (2017) <a href="https://www.bmj.com/content/357/bmj.j2099">doi: https://doi.org/10.1136/bmj.j2099</a>.↩︎</p></li>
<li id="fn2"><p>Flach, P. (2019). Performance evaluation in machine learning: the good, the bad, the ugly, and the way forward. <em>Proceedings of the AAAI conference on artificial intelligence</em> pp.&nbsp;9808-9814 (2019) <a href="https://ojs.aaai.org/index.php/AAAI/article/view/5055">doi: https://doi.org/10.1609/aaai.v33i01.33019808</a>.↩︎</p></li>
<li id="fn3"><p>Chang, Y. <em>et al.</em> A survey on evaluation of large language models. <em>ACM Transactions on Intelligent Systems and Technology</em> (2023) <a href="https://dl.acm.org/doi/10.1145/3641289">doi: https://doi.org/10.1145/3641289</a>.↩︎</p></li>
<li id="fn4"><p>Witten, I. H., Frank, E. and Hall, M. A. Data mining: Practical machine learning tools and techniques. Morgan Kaufmann (2011).↩︎</p></li>
<li id="fn5"><p>Bayram, F., Ahmed, B. S. and Kassler A. From concept drift to model degradation: An overview on performance-aware drift detectors. <em>Knowledge-Based Systems</em> (2022) <a href="https://www.sciencedirect.com/science/article/pii/S0950705122002854">doi: https://doi.org/10.1016/j.knosys.2022.108632</a>.↩︎</p></li>
<li id="fn6"><p>Rotalinti, Y., Tucker, A., Lonergan, M., Myles, P. and Branson, R. Detecting drift in healthcare AI models based on data availability. <em>Joint European Conference on Machine Learning and Knowledge Discovery in Databases</em> 243-258 (2022) Springer Nature Switzerland. <a href="https://link.springer.com/chapter/10.1007/978-3-031-23633-4_17">doi: https://doi.org/10.1007/978-3-031-23633-4_17</a>↩︎</p></li>
<li id="fn7"><p>Davis, S. E., Greevy Jr, R. A., Lasko, T. A., Walsh, C. G. and Matheny, M. E. Detection of calibration drift in clinical prediction models to inform model updating. <em>Journal of biomedical informatics</em> (2020) <a href="https://www.sciencedirect.com/science/article/pii/S1532046420302392">doi: https://doi.org/10.1016/j.jbi.2020.103611</a>.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>AI</category>
  <category>Machine Learning</category>
  <category>Large Language Models</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2024/05/21/ai-series-4.html</guid>
  <pubDate>Tue, 21 May 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/21/images/Evaluation_thumbnail.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>AI series: On AI ethics - influencing its use in the delivery of public good</title>
  <dc:creator>Olivia Varley-Winter</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2024/05/14/ai-series-2.html</link>
  <description><![CDATA[ 





<p>Criminal <a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing">sentencing biased by race</a> in the US, <a href="https://www.theguardian.com/education/2021/feb/18/the-student-and-the-algorithm-how-the-exam-results-fiasco-threatened-one-pupils-future">students systematically downgraded</a> in UK public examinations with no process for appeal, and <a href="https://orientblackswan.com/details?id=9789352875429#:~:text=Dissent%20on%20Aadhaar%20argues%20that,surveillance%20and%20commercial%20data%2Dmining">decisions to rescind food welfare</a> in India riddled with errors and discrepancies are all instances where AI algorithms have hit the headlines. When Bill Gates wrote that the age of <a href="https://www.linkedin.com/pulse/age-ai-has-begun-bill-gates/">AI has begun</a> and “will change the way people work, learn, travel, get health care, and communicate with each other,” those probably weren’t the changes he had in mind. Nor need they be an inevitable side effect of living with AI.</p>
<p>A number of points require consideration to work safely with AI, from the potential for bias in input and training data, and consent over data use, to the transparency and fairness of applying an algorithm – who has decided the problem, or set of problems, it is to solve? The steps that are taken to explain and involve an organisation’s stakeholders in the conclusions that AI reaches also require ethical consideration, as does ethical development of AI. Its use for social policies and services highlights an additional set of problems.</p>
<p>As AI becomes more active in society, AI ethics involves not only defining the objectives for data scientists, researchers and technologists to work on. It involves governing bodies, regulators, policy makers, businesses and organisations, the media, and civil society, working to handle and communicate AI’s benefits and mitigate its harms. Organisations with international clout – such as the United Nations Educational, Scientific and Cultural Organization (<a href="https://www.unesco.org/en/artificial-intelligence/recommendation-ethics">UNESCO</a>) and the Organisation for Economic Co-operation and Development (<a href="https://www.oecd.org/gov/ethics/ethicscodesandcodesofconductinoecdcountries.htm">OECD</a>) – have prominently set out ethical principles that can broadly apply. Nonetheless, a lot can go wrong.</p>
<section id="bias-in-bias-out" class="level2">
<h2 class="anchored" data-anchor-id="bias-in-bias-out">Bias in bias out</h2>
<p>In 2016, when ProPublica launched an investigation into potential biases in a ‘risk assessment’ algorithm used by the US criminal justice system, it was the first independent investigation of its kind. This was despite the widespread use of the algorithm and its power to influence a judge’s sentence, in one instance <a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing">doubling the duration while increasing the severity</a> of the imprisonment. On examining 7,000 risk assessment scores and the records detailing whether the subjects of those scores had reoffended in the subsequent two years, ProPublica found: “Only 20 percent of the people predicted to commit violent crimes actually went on to do so”. Even when the full range of crimes was taken into account, “the algorithm was somewhat more accurate than a coin flip” at 61%. Part of the enthusiasm for these algorithms had been the expectation that they might bypass the prejudices and unconscious biases of human judges, enabling fairer justice. However, while many might baulk at the thought of tossing a coin to determine someone’s prison sentence, it turns out this might be a fairer approach than the algorithm, which was found to “falsely flag black defendants as future criminals” at twice the rate of white defendants.</p>
<div id="Eleanor_Roosevelt-991724" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/14/images/Eleanor_Roosevelt-991724.jpg" class="img-fluid figure-img" alt="Eleanor Roosevelt reads the Universal Declaration of Human Rights in 1949; FDR Presidential Library &amp; Museum 64-165 CC-BY-2.0"></p>
<figcaption>Eleanor Roosevelt reads the Universal Declaration of Human Rights in 1949; FDR Presidential Library &amp; Museum 64-165 CC-BY-2.0</figcaption>
</figure>
</div>
<p>Since ProPublica’s investigation there have been multiple reports highlighting problems with algorithms trained on historic data for use in the criminal justice system. The risk illustrated here, which can be generalised, is that such algorithms will tend to propagate social biases. In this case it means that those from ethnic minorities and lower socioeconomic backgrounds receive harsher sentences. Compounding the problem was the proprietary nature of the algorithms involved, which made it difficult to launch independent investigations. However, in the case of the algorithm investigated by ProPublica, the input data, which is taken from questions put to the defendant and their prison records, did provide clues as to the scope for unfair outcomes. Although race is not explicitly identified, it likely correlates with other data that is used as input. This means that the outcomes would be biased with respect to race all the same. A lot more work is needed to mitigate the effects of historical social injustices in how the criminal justice system uses data. Innovators in this area need to have confidence in what will be affected by their evidence base, as well as support from independent legal and ethical reviewers, and from regulators, to determine what will make a good innovation, and what will not.</p>
</section>
<section id="consent-human-rights-and-data-provenance" class="level2">
<h2 class="anchored" data-anchor-id="consent-human-rights-and-data-provenance">Consent, human rights, and data provenance</h2>
<p>The testing and training of AI algorithms can also run into other ethical questions about the ratio of public to private benefits from data and who governs those benefits. On the eve of the UK’s AI Summit in 2023, <a href="https://www.linkedin.com/pulse/ai-beneath-surface-pivotal-role-data-smart-data-research-uk-betwe/">Joe Cuddeford of Smart Data Research UK wrote</a>: “Many AI systems rely on data collected passively from individuals, raising questions about transparency, privacy, and who benefits from these data-driven advancements.”</p>
<p>Large scale AI models, such as generative AI models, are <a href="https://www.washingtonpost.com/technology/2023/10/25/data-provenance/">often trained on web-scraped data</a> from online platforms. This leads to questions about the fairness of internet data, the ownership of it (e.g.&nbsp;<a href="https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html">potential violation of copyright law</a>), and methods for users’ consent and human rights to be embedded and respected. There are, once again, questions about accuracy and bias: what do algorithms “learn” from data scraped from the internet, and is the information appropriately curated for use?</p>
<p>Similar civil liberties concerns arise when people are compelled to give up data about themselves by powerful arms of the state. For example, the national Facial Verification Testing program run by the National Institute of Standards and Technology, a part of the U.S. Department of Commerce, has held and <a href="https://slate.com/technology/2019/03/facial-recognition-nist-verification-testing-data-sets-children-immigrants-consent.html">made use of images of vulnerable individuals</a> to test and validate the performance of commercialised AI technologies. The data used by the agency for testing include ‘mugshots’ or facial images from arrests or from other encounters with law enforcement. <sup>1</sup> An additional programme focuses on testing the performance of facial recognition algorithms against an image database of sexually exploited children (<a href="https://www.nist.gov/programs-projects/chexia-face-recognition">CHEXIA-FACE</a>). Having statistics from this kind of agency testing has clear commercial benefit: it helped win the case for the vendors who could match those statistics when the <a href="https://www.met.police.uk/SysSiteAssets/media/downloads/force-content/met/advice/lfr/other-lfr-documents/lfr-accuracy-and-demographic-differential.pdf">London Metropolitan police purchased live facial recognition technology</a>. However, the interests of the people who have been documented do not come up for discussion in this form of data governance. There are many participatory methods that could be used for more ethical stewardship of the data that people are compelled to give. <sup>2</sup></p>
<p>To give minorities and vulnerable groups a part to play in data collection, data scientists can adopt strategies that consciously address bias in how data is collected. Eun Seo Jo (of Stanford University) and Timnit Gebru (formerly of Google) have suggested library and archival approaches. In “Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning”, <sup>3</sup> they note that internet data is subject to historical and representative biases. Recognising and mitigating biases will “start with a statement of commitment to collecting the cultural remains of certain concepts, topics, or demographic groups.” A public mission statement, which highlights the interests of the communities and minorities they plan to support, “forces [archival] researchers to reckon with their data composition.”</p>
<p>These strategies also need to be supported by good management of data collection and curation. A 2021 report by the Royal Academy of Engineering, <a href="https://reports.raeng.org.uk/datasharing/implications-for-policy">Towards trusted data sharing: implications for policy and practice</a>, highlights that, to support the use of data for research, good data management must exist among data owners. Strong relationships with data owners, predicated on data quality and ethics, help researchers to specify what datasets they are looking for and how these can best be curated for AI purposes. Good data management helps not only AI developers but all potential users (as well as the public) to understand the scope and quality of what is being shared. “Defining the requirements for data quality, and ensuring these requirements are delivered, remains a central challenge.” (<a href="https://reports.raeng.org.uk/datasharing/implications-for-policy">RAE report</a>)</p>
<p>Advocates for accurate and fair data and machine learning have worked hard to clarify what good data management and sharing look like, which is cause for optimism. However, there is a sense that this is the area in which AI has the furthest to go, as currently available datasets fall far short of the standards their work recommends. Nonetheless, the rise of Trusted Research Environments, Data Safe Havens and other methods of transparency enables more AI innovators to disclose their sources without placing any of the personal information they use at further risk, as <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/ai-series-3.html">discussed previously in the AI series</a>. Leadership on ethical standards for data sharing may yet help to improve the <a href="https://oecd.ai/en/dashboards/ai-principles/P8">robustness, security, safety, and fairness of AI tools</a>, which the OECD advocates as key principles for AI.</p>
</section>
<section id="openness-explainability-and-the-scope-to-challenge-ai-decisions" class="level2">
<h2 class="anchored" data-anchor-id="openness-explainability-and-the-scope-to-challenge-ai-decisions">Openness, explainability, and the scope to challenge AI decisions</h2>
<p>A principle that many data science communities have been working towards is ensuring the transparency and explainability of AI (<a href="https://oecd.ai/en/dashboards/ai-principles/P7">OECD AI Principle</a>). In OECD parlance, that is in part “to ensure that people understand when they are engaging with [artificial intelligence] and can challenge outcomes.” Acknowledging that some AI applications make this disclosure harder and less appealing, the OECD suggests that the fact that AI is in use should be disclosed “with proportion to the importance of the outcome … so that consumers, for example, can make more informed choices”. The OECD emphasises the importance of the “explainability” of the algorithms, which it defines as “enabling people affected by the outcome of an AI system to understand how it was arrived at. … notably – to the extent practicable – the factors and logic that led to an outcome.”</p>
<p>The <a href="https://www.oii.ox.ac.uk/research/projects/a-fairwork-foundation-towards-fair-work-in-the-platform-economy/">tens of millions of digital ‘platform workers’</a> now living all over the world are a case in point for where explainability is needed. They perform short-term, freelance, or temporary work through digital platforms or apps in the “gig economy”. There is little transparency about how algorithms and AI influence outcomes for gig workers, or about whether platform algorithms contribute systematically to unfair outcomes. <a href="https://www.oii.ox.ac.uk/research/projects/a-fairwork-foundation-towards-fair-work-in-the-platform-economy/">Platform workers themselves have come together</a> to share their data and understand more about the outcomes of the algorithms, or AI, that are shaping their lives.</p>
<p>It follows that where the use of an AI system does not affect outcomes for people, there may be less of a demand to publicly justify how AI arrived at its outcomes. For example, where AI is used to simulate something, or to research a decision, rather than to make a decision, there could be less weight placed on explaining the model publicly.</p>
<div id="Aerial_view_of_Silion_Valley991" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/14/images/Aerial_view_of_Silicon_Valley991.jpg" class="img-fluid figure-img" alt="Aerial view of tech cluster in Silicon Valley, taken on 29 March 2013, courtesy of Patrick Nouhailler CC-BY-3.0"></p>
<figcaption>Aerial view of tech cluster in Silicon Valley, taken on 29 March 2013, courtesy of <a href="https://www.flickr.com/photos/patrick_nouhailler/">Patrick Nouhailler</a> CC-BY-3.0</figcaption>
</figure>
</div>
<p>François Candelon, Theodoros Evgeniou, and David Martens, writing for the Harvard Business Review have outlined that their preference is for <a href="https://hbr.org/2023/05/ai-can-be-both-accurate-and-transparent">accuracy as well as explainability</a>. Often, to strike this balance, they will prefer ‘white box’ models which are transparent and interpretable. But not always. “In [complex] applications such as face-detection for cameras, vision systems in autonomous vehicles, facial recognition, image-based medical diagnostic devices, illegal/toxic content detection, and most recently, generative AI tools like ChatGPT and DALL-E, a black box approach may be advantageous or even the only feasible option.”</p>
<p>Even where an algorithm is too large and complicated to be interpretable, work like that conducted by the Alan Turing Institute in <a href="https://www.turing.ac.uk/research/research-projects/project-explain">Project ExplAIn</a> finds ways of extracting some form of explanation, for instance by building explanatory layers into the model’s code. The case for opening up AI in this way has to be balanced against concerns for intellectual property, information security and privacy: there can be <a href="https://www.tripwire.com/state-of-security/ai-transparency-why-explainable-ai-essential-modern-cybersecurity">cybersecurity issues</a> with making the different layers of an AI model more open to interrogation. Nonetheless, experiments with transparent and explainable models enable developers to advance their understanding of AI, and to consider whether its use for decision-making is ethically sound. The OECD principles make clear that AI must not elude human insight, checks and balances. As Andrew Ng highlighted in the <a href="https://youtu.be/nIIPMmZaK-s?si=T6ahpP6R1QjuUIsq">RSS fireside chat in 2021</a>: “AI is increasing concentration of power like never before…governments and regulators need to look at that and think of what to do.”</p>
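<p>To make the idea of extracting explanations more concrete, below is a minimal sketch of one widely used post-hoc technique: fitting a small, interpretable “white box” surrogate model to mimic a black-box model’s predictions. The dataset, models and parameters here are illustrative assumptions, not the methods of Project ExplAIn or of the authors cited above.</p>
<pre><code># A minimal "global surrogate" sketch: approximate a black-box model with a
# shallow decision tree trained on the black box's *predictions*, then read
# the tree as a human-interpretable summary of the model's logic.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))   # fit to predictions, not true labels

# Fidelity: how often the surrogate agrees with the black box.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"Surrogate fidelity: {fidelity:.2%}")
print(export_text(surrogate, feature_names=list(X.columns)))</code></pre>
<p>A high-fidelity surrogate offers affected people and reviewers a tractable, if approximate, account of the factors and logic behind an outcome; low fidelity signals that any simple explanation of the black box may be misleading.</p>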
</section>
<section id="appropriate-human-centred-governance" class="level2">
<h2 class="anchored" data-anchor-id="appropriate-human-centred-governance">Appropriate, human-centred governance</h2>
<p>When school exams in England were cancelled during the Covid-19 pandemic, the government’s Department for Education decided that an algorithm should be used to allot grades to A-Level students, partly as a measure to counter grade inflation (a trend in which the grades awarded for the same standard of work tend to rise year on year). Algorithms had been used in previous years to adjust the marks awarded for exams and coursework. Here, instead of exams and coursework, the input data was drawn from Ofqual’s historical records of how each school’s pupils had performed in previous years, together with grade estimates from teachers. Efforts had been made at transparency about how the new algorithm would arrive at its decisions (it was a relatively simple, white-box algorithm), but ‘outliers’ were acknowledged in the model even prior to deployment. Coupled with the widespread downgrading of teacher-estimated grades to fit a curve that would avoid grade inflation, there was no clear process by which students and schools could appeal their grades. <a href="https://www.theguardian.com/education/2021/feb/18/the-student-and-the-algorithm-how-the-exam-results-fiasco-threatened-one-pupils-future">Dissatisfaction with the grades awarded</a> in the absence of exams or coursework was rife, as young people regarded as academically talented by their schools fell short of the grades their teachers had predicted, and lost university places.</p>
<p>In the resulting furore, the Department for Education determined that its original policy was wrong and adopted the teacher-estimated grades, with an appeal process in place. The incident demonstrates that functional transparency of an algorithm is only one step in due process. A policy may be controversial precisely because it uses an algorithm to apportion losses across a population (e.g.&nbsp;to try to reduce grade inflation) in ways that individuals find abhorrent.</p>
<p>Vested interests also surfaced during the investigation of an algorithm brought into use to <a href="https://orientblackswan.com/details?id=9789352875429#:~:text=Dissent%20on%20Aadhaar%20argues%20that,surveillance%20and%20commercial%20data%2Dmining.">tackle fraud in India’s welfare system</a>. From 2014 to 2019, the government of Telangana “<a href="https://www.aljazeera.com/economy/2024/1/24/how-an-algorithm-denied-food-to-thousands-of-poor-in-indias-telangana">cancelled more than 1.86 million existing food security cards</a> and rejected 142,086 fresh applications without any notice,” Al Jazeera reported in January 2024. Despite the government’s initial claims that the cancelled food security cards were fraudulent, critical data scholarship in India and elsewhere has established discrepancies and errors in the algorithms used, such as confusing the records of a valid claimant with those of a car-owning citizen of the same name. (Under the government’s policies, SUV owners cannot receive food aid.) Further investigation revealed that at least 7.5 per cent of the food security cards were wrongly cancelled. The investigations highlight a common problem: a focus on reducing the costs of welfare programmes tends to lead services to generate false positives (genuine claimants wrongly flagged as fraudulent) rather than to worry about false negatives. Efforts to correct sloppy data may therefore meet resistance if they lead to fewer “frauds” being identified, even when citizens bring evidence to challenge a decision.</p>
<p>There is a similar example in the <a href="https://www.computerweekly.com/feature/Post-Office-Horizon-scandal-explained-everything-you-need-to-know">UK’s Post Office scandal</a>, in which many sub-postmasters were wrongfully prosecuted for false accounting after the Post Office adopted accounting software that contained significant bugs, which were covered up for many years. It likewise shows how far organisations can pursue wrongful judgements, and how life-changing the consequences can be.</p>
<p>The <a href="https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai">EU’s new AI Act</a> advocates a risk-based approach, to balance the desire to minimise the burden of compliance while ensuring the safety of people who may be affected by the implementation of AI algorithms. Systems assessed as <a href="https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai">high risk according to specific criteria</a> are then “subject to strict obligations before they can be put on the market”.</p>
<p>Governments across the industrialised world have raised their hopes for AI that will help drive increases in productivity, and do so safely: fairly constructed, making use of legitimate data sources, and with fair outcomes for society. The work of data scientists is integral to the foundations on which AI can be used for social good, from establishing protocols for data management and sharing, to understanding the workings of complex algorithms and the use of large, unstructured data sources. Data scientists and researchers are getting closer to understanding what good looks like, not just in terms of the ethical values to uphold but in the technicalities of the code and data involved. However, upholding the ideal of ‘AI ethics’ takes sustained effort beyond data work alone: support for well-established ethical and legal rights and principles, meaningful involvement of the people affected by AI use in the relevant policies, and development of data governance and infrastructure. It is always possible that, in working on AI ethics, we find fairer and more ethical approaches that should precede the use of AI.</p>
<p>“AI development raises a range of ethical questions for data practitioners, whether they are data scientists, econometricians, analysts, or statisticians,” Daniel Gibbons, Vice Chair of the Royal Statistical Society’s Data Ethics and Governance Section, told <em>Real World Data Science</em>. Today, many data scientists would urge that ethical considerations precede the development of an AI algorithm and inform its design and use, particularly for processes that significantly affect people, to ensure it does not propagate errors and injustices.</p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the author</dt>
<dd>
<strong>Olivia Varley-Winter</strong> Olivia is an experienced policy manager who has worked for the Royal Statistical Society, the Open Data Institute, Open Data Charter, the Nuffield Foundation, and the Alan Turing Institute. She was part of the Ada Lovelace Institute’s founding team from 2018 to 2020 and has since supported the development of other policy-related programmes and partnerships relating to data, AI and ethics. She is presently working for Smart Data Research UK on matters pertaining to ethics and responsible data governance. She has an MSc in Nature, Society, and Environmental Policy from the University of Oxford.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Royal Statistical Society
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" style="height:22px!important;vertical-align:text-bottom;"><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" style="height:22px!important;margin-left:3px;vertical-align:text-bottom;"></a> This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> International licence</a>. <!-- Add thumbnail image credit and any licence terms here --></p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Varley-Winter, O. 2024. “On AI ethics - influencing its use in the delivery of public good.” Real World Data Science, May 14, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/14/ai-series-2.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Grother, P., Ngan, M. &amp; Hanaoka K. Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects NISTIR 8280 (2019) https://nvlpubs.nist.gov/nistpubs/ir/2019/NIST.IR.8280.pdf↩︎</p></li>
<li id="fn2"><p>Participatory data stewardship (2021) Ada Lovelace Institute https://www.adalovelaceinstitute.org/wp-content/uploads/2021/11/ADA_Participatory-Data-Stewardship.pdf ↩︎</p></li>
<li id="fn3"><p>Jo, E. S. &amp; Gebru T. Lessons from Archives: Strategies for Collecting SocioculturalData in Machine Learning Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (2020) https://dl.acm.org/doi/epdf/10.1145/3351095.3372829↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>AI</category>
  <category>AI ethics</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2024/05/14/ai-series-2.html</guid>
  <pubDate>Tue, 14 May 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/14/images/Eleanor_Roosevelt-991724.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>AI series: Healthy datasets for optimised AI performance</title>
  <dc:creator>Fatemeh Torabi, Lewis Hotchkiss, Emma Squires, Prof. Simon E. Thompson and Prof. Ronan A. Lyons</dc:creator>
  <link>https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/ai-series-3.html</link>
  <description><![CDATA[ 





<p>In Charles Babbage’s <a href="https://www.gutenberg.org/files/57532/57532-h/57532-h.htm">Passages from the Life of a Philosopher</a>, he recalls two incidents in which he was asked, “Pray, Mr.&nbsp;Babbage, if you put into the machine wrong figures, will the right answers come out?” Reflecting on these incidents he comments, “I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” Accurate, clean data is likewise at the core of a functional AI model. However, ensuring the accuracy of input data, so that no “wrong figures” creep into the datasets used to train AI models, demands meticulous attention during data wrangling, cleaning, and curation.</p>
<p>This necessity is particularly pronounced when dealing with the vast datasets used to train the machine learning algorithms at the core of AI models. The predictive power of these models is highly dependent on the quality of the training data. <sup>1</sup> The most obvious errors, which often require meticulous attention at the data processing stages, are duplication, missingness and data imbalance (Figure&nbsp;1). Any of these errors can have multifaceted impacts in both the training and testing stages of the machine learning algorithms at the core of an AI model, and the challenge does not end there. The provenance, content, format and structure of the data require attention as well: even data that is essentially correct may be “wrong” for a particular dataset.</p>
<section id="obviating-the-obvious-errors" class="level2">
<h2 class="anchored" data-anchor-id="obviating-the-obvious-errors">Obviating the obvious errors</h2>
<p>Duplicated records can mask existing diversities within the data, diminishing the representativeness of important subgroups and leading to a biased training set and model outcomes. If duplication originates from data labelling issues, it can lead to fundamental challenges during the training of supervised models. <sup>2</sup> In healthcare data, this issue can arise when linking data across multiple sources where each source holds different labels for the same data. <sup>3</sup></p>
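<p>As a concrete illustration, here is a minimal Python sketch of deduplication during data curation. The records and column names are hypothetical, and real cross-source record linkage, where the same person carries different labels in different systems, needs dedicated tooling.</p>
<pre><code># Toy deduplication with pandas: exact duplicates first, then duplicates
# hiding behind inconsistently formatted labels.
import pandas as pd

records = pd.DataFrame({
    "patient_id": [101, 101, 102, 103],
    "name":       ["A. Jones", "a jones", "B. Patel", "C. Evans"],
    "diagnosis":  ["stroke", "stroke", "diabetes", "asthma"],
})

# Exact duplicates are the easy case (there are none in this toy data):
deduped = records.drop_duplicates()

# Near-duplicates need a normalised key before dropping:
records["name_key"] = records["name"].str.lower().str.replace(r"\W+", "", regex=True)
deduped = records.drop_duplicates(subset=["patient_id", "name_key", "diagnosis"])
print(deduped)</code></pre>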
<p>Missingness directly causes a loss of the information needed to train algorithms on various real-world scenarios. It is typically addressed via two primary routes: deletion of the rows containing missing values, or imputation. Deleting rows reduces the sample size and can introduce bias. When it comes to health data, for instance, using electronic medical records from a single provider, such as general practice, may give rise to a lot of missingness in other aspects of an individual’s health, such as their hospital records, pathology testing data or medical imaging. On the one hand, structured missingness can serve as an informative feature to explore within the data. On the other, where missing data pixelates the comprehensive health picture we are attempting to construct, it often conceals an underlying narrative. <sup>4</sup> For instance, the COVID-19 response involved many initiatives across the AI community; however, during the early stages of the pandemic, the partial availability of data pixelated the picture and hampered models’ predictive ability, resulting in minimal improvement according to a report from the Alan Turing Institute, the UK’s national institute for data science and AI. <sup>5</sup></p>
<p>Imputing missing values can preserve the whole sample. However, noise introduced during the imputation process may compromise the quality of fitted models, depending on the proportion of missing records. One way to offset this may be to use ever-larger datasets that present a fuller training picture to the algorithm, just as adding dots to a pixelated image makes what is depicted clearer.</p>
<p>It is often claimed that some models, particularly deep learning models such as neural networks, can handle missing values themselves, without explicit imputation or deletion. <sup>6</sup> Is this really true in a real-world scenario? Are these models advanced enough to achieve optimised performance even when data quality is not optimal? <sup>7</sup> Köse et al.&nbsp;(2020) investigated the effect of two conventional imputation approaches on the performance of deep learning models: multiple imputation by chained equations (MICE) <sup>8</sup> and factor analysis of mixed data (FAMD). <sup>9</sup> Their study endorsed the use of such explicit imputation approaches, showing an enhancement in model performance. <sup>10</sup></p>
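<p>By way of illustration, the following minimal sketch applies scikit-learn’s IterativeImputer, a chained-equations (MICE-style) imputer, to toy data with values knocked out at random. The data and parameters are our own assumptions, not those of the studies cited.</p>
<pre><code># Explicit imputation with a MICE-style imputer before model training.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]          # correlation gives the imputer signal to use

mask = rng.random(X.shape) &lt; 0.2  # remove ~20% of values at random
X_missing = X.copy()
X_missing[mask] = np.nan

imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X_missing)

print("RMSE of imputed values:",
      np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2)))</code></pre>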
<p>Data imbalance arises within datasets that hold a disproportionate amount of information about a specific aspect. When imbalanced data, rich in focused information, is used to train an AI model, the model becomes adept at learning about that specific aspect but may struggle to generalise its findings to diverse scenarios. This fosters overfitting, where the model predicts accurately on the training data but loses accuracy on any new test dataset.</p>
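<p>The sketch below illustrates one simple counter-measure on synthetic imbalanced data: class weighting, which penalises errors on the minority class more heavily. The 85/15 split loosely echoes the stroke-type imbalance discussed below; resampling methods are another common option.</p>
<pre><code># Counteracting class imbalance with class weighting in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights errors inversely to class frequency,
# so the minority class is not drowned out during training.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))</code></pre>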
<div id="fig-1" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Stages involved in AI model development" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/images/Stages-of-model-development-724.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Stages involved in AI model development">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Stages involved in AI model development.
</figcaption>
</figure>
</div>
<p>Overfitting severely undermines the predictive performance of AI models on data beyond their training set, defeating the primary purpose of these models. For instance, of all strokes that occur, approximately 85% are ischaemic, caused by blockage of the blood supply to part of the brain, and 15% are haemorrhagic, caused by bleeding into the brain. The development of machine-learning and AI-based stroke prediction models is therefore affected by this natural imbalance between the two types of stroke. <sup>11</sup> The same kind of imbalance exists in population-wide studies, where stroke itself is present in only a small subsection of a healthy population. <sup>12</sup></p>
<p>The circumstances of data collection can lead to bias, so care is needed at the early stages to ensure that datasets are representative of the real world. These types of error can be picked up at an initial quality assurance (QA) stage, conducted to reveal any unexpected errors in data used by AI models. QA checks often involve basic checks on the values presented, to ensure they are of the right data type, fall within the expected range, and have the expected temporal coverage.</p>
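<p>To make this concrete, here is a minimal sketch of such QA checks with pandas; the columns, expected range, and study window are hypothetical.</p>
<pre><code># Basic QA: data types, value ranges, and temporal coverage.
import pandas as pd

# Hypothetical cohort extract; in practice this would come from source data.
df = pd.DataFrame({
    "age": [34, 152, 47],  # 152 should trip the range check
    "visit_date": pd.to_datetime(["2019-05-01", "2021-07-14", "1998-03-02"]),
})

problems = []
if not pd.api.types.is_numeric_dtype(df["age"]):
    problems.append("age is not numeric")
if not df["age"].between(0, 120).all():
    problems.append("age outside expected range 0-120")
window = (pd.Timestamp("2005-01-01"), pd.Timestamp("2024-12-31"))
if df["visit_date"].min() &lt; window[0] or df["visit_date"].max() &gt; window[1]:
    problems.append("visit_date outside expected study window")

print("QA issues:", problems or "none")</code></pre>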
<p>Finally, the choice of features included in a dataset requires consideration, since this too can have implications for an algorithm. Taking another example from the COVID-19 pandemic, a group of researchers trained an algorithm on radiological imaging of COVID-19 patients in which the position of the patient during radiology was present as a feature. Because more severe cases were lying down and less severe cases were standing up, this feature produced an algorithm that predicted COVID-19 risk from the position of the patient. Here, although the data included was correct, its inclusion in the dataset proved to be “wrong”.</p>
</section>
<section id="things-get-complicated" class="level2">
<h2 class="anchored" data-anchor-id="things-get-complicated">Things get complicated</h2>
<p>When AI algorithms encounter complex, unstructured data, the task of quality assurance suddenly balloons beyond tackling the three main errors highlighted above. Such circumstances require some kind of quality enhancement procedure, in which datasets in the form of images or unstructured text go through a curation process that enhances their quality and standardises their format to the level required for integration into AI algorithms. This standardisation of data is paramount across various domains, especially in healthcare, where complex, unstructured health data holds transformative potential for AI-driven advances in the diagnosis, treatment, and prognosis of diseases. From electronic health records to magnetic resonance imaging (MRI) scans and genetic sequences, this data offers a wealth of insights for AI models to learn from. Adopting standardised formats not only facilitates the seamless integration of diverse datasets but also streamlines the development and deployment of AI models. Unlocking this potential, however, requires a strong foundation of high-quality, processed data, and that begins with standardisation.</p>
<p>One category of complex health data is neuroimaging, of which a prime example is MRI. Different institutions often employ different acquisition protocols and different ways of collecting and storing neuroimaging data. Not only can this make the data very difficult to integrate into existing workflows and processing pipelines, it also makes it challenging to understand, compare and combine with other datasets. To address these challenges, the neuroimaging community has adopted the Brain Imaging Data Structure (BIDS) <sup>13</sup> – a standardised format for organising and naming MRI data which allows compliant data to integrate smoothly with existing workflows and AI models, streamlining processing and analysis. By embracing standardisation, we pave the way for common processing tools that can generate AI-ready data.</p>
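<p>To give a flavour of what naming standardisation involves, here is a simplified Python sketch that checks anatomical MRI filenames against a BIDS-like pattern. The regex is a toy assumption covering only a few entities; the official bids-validator tool does the real job.</p>
<pre><code># Toy check of BIDS-style anatomical filenames, e.g. sub-01_ses-01_T1w.nii.gz
import re

BIDS_ANAT = re.compile(
    r"^sub-[a-zA-Z0-9]+"          # subject label (required)
    r"(_ses-[a-zA-Z0-9]+)?"       # optional session
    r"(_acq-[a-zA-Z0-9]+)?"       # optional acquisition label
    r"_(T1w|T2w|FLAIR)"           # modality suffix
    r"\.nii(\.gz)?$"              # NIfTI extension
)

for name in ["sub-01_ses-01_T1w.nii.gz", "patient1_scan.nii.gz"]:
    status = "OK" if BIDS_ANAT.match(name) else "not BIDS-compliant"
    print(f"{name}: {status}")</code></pre>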
<p>Next comes pre-processing. Sticking with the neuroimaging example, MRI scans are susceptible to various forms of noise and artifacts, which can appear as blurring or distortions and which, without proper processing, can be misinterpreted by AI models. Pre-processing typically includes steps for spatial normalisation and image registration: aligning brain images from different individuals into a common reference model, and aligning different images of the same subject to a common template. This standardisation facilitates inter-subject and inter-study comparisons, enabling AI models to generalise effectively across diverse datasets. However, the multi-layer nature of this process means that aligning data to a common template depends on the choice of template, which can itself introduce bias if the template brain does not accurately reflect the patient’s anatomy (due to age, ethnicity, or disease, for example).</p>
<div id="fig-2" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Neuroimaging data" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/images/neuroimaging.jpg" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="Neuroimaging data">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Neuroimaging data.
</figcaption>
</figure>
</div>
<p>Once pre-processing is complete, you may want to combine datasets to increase the sample size for your AI model. This is where harmonisation techniques <sup>14</sup> <sup>15</sup> come in, to deal with inconsistencies and variations in acquisition which can add noise and bias to a model. A typical technique for harmonisation in neuroimaging, known as ComBat, <sup>16</sup> works by modelling the data using a hierarchical Bayesian model, followed by empirical Bayes estimation to infer the distribution of the unknown site effects. The method is borrowed from genomics but is applicable wherever multiple features of the same type are collected for each participant, whether expression levels for genes or imaging-derived measures such as volumes of different brain regions. This is a crucial step when combining datasets: it enables AI models to focus on learning the actual relationships within the data rather than struggling with inconsistencies across datasets, and it leads to models that generalise better on unseen data.</p>
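<p>To convey the intuition, below is a deliberately simplified sketch of the location-and-scale adjustment at the heart of ComBat. Full ComBat additionally pools estimates across features with empirical Bayes and preserves covariates of interest, so a dedicated implementation should be used for real analyses.</p>
<pre><code># Naive per-site harmonisation: remove each site's (batch's) shift in mean
# and scale for every feature, mapping all sites onto the pooled distribution.
# This omits ComBat's empirical Bayes pooling and covariate preservation.
import numpy as np

def naive_harmonise(data: np.ndarray, batch: np.ndarray) -&gt; np.ndarray:
    """data: (n_subjects, n_features); batch: (n_subjects,) site labels."""
    out = data.astype(float).copy()
    grand_mean = data.mean(axis=0)
    grand_std = data.std(axis=0)
    for b in np.unique(batch):
        rows = batch == b
        site = data[rows]
        # Standardise within the site, then re-express on the pooled scale:
        out[rows] = (site - site.mean(axis=0)) / site.std(axis=0)
        out[rows] = out[rows] * grand_std + grand_mean
    return out</code></pre>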
</section>
<section id="feeding-hungry-algorithms" class="level2">
<h2 class="anchored" data-anchor-id="feeding-hungry-algorithms">Feeding hungry algorithms</h2>
<p>The public good is at the heart of AI-driven approaches: the aim is to develop models with optimised predictive ability that can be generalised to many scenarios. Achieving this requires a large and diverse training source, which is why these are often referred to as data-hungry algorithms. To provide large amounts of enriched training data for optimised model development, two main approaches have been explored: federated analytics and synthetic data.</p>
<p>Federation is when data from multiple sources is made available for the training and analysis of models designed to run on data that is not held in a single place, known as distributed models. It provides the opportunity to test algorithms in different populations and settings to ensure generalisability. In the context of patient-level health data, the data is often held institutionally. Enabling federation and trustworthy sharing of these datasets requires extensive attention to governance models, and a common governance model between the organisations involved is a known catalyst of this process. <sup>17</sup> <sup>18</sup></p>
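<p>A minimal sketch of the idea, under the assumption of three hypothetical data holders: each site fits a model locally and shares only its coefficients, which a coordinating server combines as a size-weighted average. Real federated systems iterate this over many rounds and add privacy safeguards on top.</p>
<pre><code># One-shot federated averaging of locally trained model coefficients.
# Raw records never leave a site; only fitted parameters are shared.
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_fit(X, y):
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return np.hstack([model.coef_.ravel(), model.intercept_]), len(y)

rng = np.random.default_rng(0)
true_beta = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
sites = []
for n in (300, 500, 200):                    # three hypothetical data holders
    X = rng.normal(size=(n, 5))
    y = (X @ true_beta + rng.normal(size=n) &gt; 0).astype(int)
    sites.append(local_fit(X, y))

weights = np.array([n for _, n in sites], dtype=float)
params = np.array([p for p, _ in sites])
global_params = (params * weights[:, None]).sum(axis=0) / weights.sum()
print("Federated coefficients:", global_params.round(2))</code></pre>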
<p>Generating synthetic data <sup>19</sup> from original data sources is a resource-intensive process. It requires developing models on the real data to learn the patterns, formats, and statistical properties within it, from which synthetic versions of the data can then be generated. When working with sensitive data such as health records, keeping patient data safe and secure is covered by information governance; depending on how close the synthetic data is to the original, the same governance level may still apply when bringing individual or patient data from multiple sources together. A suggested solution to the governance challenges of synthetic data is to use a <a href="https://www.adruk.org/news-publications/news-blogs/accelerating-public-policy-research-with-easier-safer-synthetic-data/">low-fidelity version</a> of the original data, meaning that a level of bias has been added during the synthesis process to ensure the safety and security of individual-level data. <sup>20</sup> While low-fidelity data sources are generated from real data, it is worth noting that the rise of generative AI also raises a concern about data pollution, particularly where AI tools such as Gretel.ai <sup>21</sup> are used to generate synthetic data which may in turn be used to train AI models – the problematic case of AI training AI!</p>
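<p>As a toy illustration of what “low fidelity” can mean, the sketch below resamples each column of a (hypothetical) real dataset independently: marginal distributions are preserved while cross-column correlations are deliberately broken, one crude way of trading fidelity for lower disclosure risk. Production synthetic-data generators use far more sophisticated models.</p>
<pre><code># Low-fidelity synthesis: independent resampling of each column's values.
import numpy as np
import pandas as pd

def low_fidelity_synthetic(real: pd.DataFrame, n: int, seed: int = 0) -&gt; pd.DataFrame:
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        col: rng.choice(real[col].to_numpy(), size=n, replace=True)
        for col in real.columns
    })

real = pd.DataFrame({"age": [34, 71, 52, 45, 60],
                     "sbp": [118, 150, 135, 122, 141]})
synth = low_fidelity_synthetic(real, n=1000)
print(synth.describe())        # marginals resemble the real data
print(synth.corr().round(2))   # correlation structure is destroyed</code></pre>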
<p>When using patients’ sensitive health data, a further layer is in place to ensure secure access: Trusted Research Environments (<a href="https://www.hdruk.ac.uk/access-to-health-data/trusted-research-environments/">TREs</a>), secure technology infrastructures which play a crucial role in consolidating disparate data collections into a centralised repository and facilitating researcher access to data for scientific exploration. However, integrating data from various sources into AI models poses challenges, because differences in data collection methods and formats hinder computational analysis. In response, the FAIR (Findable, Accessible, Interoperable, Reusable) data principles were introduced in 2016 to enhance the reusability of scientific datasets by humans and machines. <sup>22</sup> Adopting FAIR principles within TREs ensures well-documented, curated, and harmonised datasets, addressing issues raised above such as duplicated records and missing data. <sup>23</sup> Additionally, preprocessing pipelines within TREs streamline data standardisation, creating “AI research-ready” datasets. <sup>24</sup></p>
<p>Access to real-world healthcare data remains challenging, prompting the development of AI models on open-source or synthetic datasets. However, these models often exhibit performance discrepancies when applied to real-world data. <sup>25</sup> It is therefore imperative to provide researchers with secure access to real-world healthcare data within TREs, bolstered by robust governance and support mechanisms. Initiatives like the GRAIMATTER study <sup>26</sup> and AI risk evaluation workshops <sup>27</sup> exemplify efforts to facilitate AI model development and translation from TREs to clinical settings. By establishing governance guidelines and promoting FAIR datasets, TREs aim to become important resources for the AI research community. Providing standardised, curated, data-rich repositories on which AI models can be developed is a top priority for UK TREs. Given their well-defined and secure governance environments, TREs may also provide the basis for federated data analysis, allowing researchers to combine datasets across TREs and data environments. In this way they can provide the large numbers that data-hungry algorithms require, while avoiding the wide-ranging and myriad ways that data in a specific dataset can be “wrong”.</p>
</section>
<section id="also-in-the-ai-series" class="level2">
<h2 class="anchored" data-anchor-id="also-in-the-ai-series">Also in the AI series:</h2>
<p><a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/04/22/ai-series-1.html">What is AI? Shedding light on the method and madness in these algorithms</a> <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/04/29/gen-ai-human-intel.html">Generative AI models and the quest for human-level artificial intelligence</a></p>
<div class="article-btn">
<p><a href="../../../../../foundation-frontiers/index.html">Explore more data science ideas</a></p>
</div>
<div class="further-info">
<div class="grid">
<div class="g-col-12 g-col-md-12">
<dl>
<dt>About the authors</dt>
<dd>
<strong>Fatemeh Torabi</strong> is Senior Research Officer and Data Scientist, at Swansea University and works on Health Data Science and Population Data Science for the Dementias Platform UK.
</dd>
<dd>
<strong>Lewis Hotchkiss</strong> is a Research Officer in Neuroimaging at Swansea University and works on Population Data Science for the Dementias Platform UK.
</dd>
<dd>
<strong>Emma Squires</strong> is the Data Project Manager for Dementias Platform UK, based at Swansea University, and works on Population Data Science.
</dd>
<dd>
<strong>Prof.&nbsp;Simon E. Thompson</strong> is Deputy Associate Director of the Dementias Platform UK.
</dd>
<dd>
<strong>Prof.&nbsp;Ronan A. Lyons</strong> is the Associate Director of the Dementias Platform UK based at Swansea University and works on Population Data Science.
</dd>
</dl>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>Copyright and licence</dt>
<dd>
© 2024 Royal Statistical Society
</dd>
</dl>
<p><a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> <img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" style="height:22px!important;vertical-align:text-bottom;"><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" style="height:22px!important;margin-left:3px;vertical-align:text-bottom;"></a> This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) <a href="http://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;"> International licence</a>. <!-- Add thumbnail image credit and any licence terms here --></p>
</div>
<div class="g-col-12 g-col-md-6">
<dl>
<dt>How to cite</dt>
<dd>
Torabi, Fatemeh, Hotchkiss, Lewis, Squires, Emma, Thompson, Simon E. and Lyons, Ronan A. 2024. “Getting the data right for optimised AI performance.” Real World Data Science, May 7, 2024. <a href="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/ai-series-3.html">URL</a>
</dd>
</dl>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Li, P. et al.&nbsp;CleanML: A Benchmark for Joint Data Cleaning and Machine Learning [Experiments and Analysis]↩︎</p></li>
<li id="fn2"><p>Azeroual, O. et al.&nbsp;A Record Linkage-Based Data Deduplication Framework with DataCleaner Extension. Multimodal Technol. Interact. 2022, Vol. 6, Page 27 6, 27 (2022).↩︎</p></li>
<li id="fn3"><p>Rajpurkar, P., Chen, E., Banerjee, O. &amp; Topol, E. J. AI in health and medicine. Nat. Med. 2022 281 28, 31–38 (2022).↩︎</p></li>
<li id="fn4"><p>Mitra, R. et al.&nbsp;Learning from data with structured missingness. (2023).↩︎</p></li>
<li id="fn5"><p>Alan Turing Institution. Data science and AI in the age of COVID-19. 2020 https://www.turing.ac.uk/sites/default/files/2021-06/data-science-and-ai-in-the-age-of-covid_full-report_2.pdf↩︎</p></li>
<li id="fn6"><p>Han, J. &amp; Kang, S. Dynamic imputation for improved training of neural network with missing values. Expert Syst. Appl. 194, 116508 (2022).↩︎</p></li>
<li id="fn7"><p>Köse, T. et al.&nbsp;Effect of Missing Data Imputation on Deep Learning Prediction Performance for Vesicoureteral Reflux and Recurrent Urinary Tract Infection Clinical Study. Biomed Res. Int. 2020, (2020).↩︎</p></li>
<li id="fn8"><p>Azur, M. J., Stuart, E. A., Frangakis, C. &amp; Leaf, P. J. Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatr. Res. 20, 40–49 (2011).↩︎</p></li>
<li id="fn9"><p>Audigier, V., Husson, F. &amp; Josse, J. A principal component method to impute missing values for mixed data. Adv. Data Anal. Classif. 10, 5–26 (2016).↩︎</p></li>
<li id="fn10"><p>Köse, T. et al.&nbsp;Effect of Missing Data Imputation on Deep Learning Prediction Performance for Vesicoureteral Reflux and Recurrent Urinary Tract Infection Clinical Study. Biomed Res. Int. 2020, (2020).↩︎</p></li>
<li id="fn11"><p>Liu, T., Fan, W. &amp; Wu, C. A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical dataset. Artif. Intell. Med. 101, 101723 (2019).↩︎</p></li>
<li id="fn12"><p>Kokkotis, C. et al.&nbsp;An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data. Diagnostics 2022, Vol. 12, Page 2392 12, 2392 (2022).↩︎</p></li>
<li id="fn13"><p>Gorgolewski, K. J. et al.&nbsp;BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods. PLoS Comput. Biol. 13, (2017).↩︎</p></li>
<li id="fn14"><p>Bauermeister, S. et al.&nbsp;Research-ready data: the C-Surv data model. Eur. J. Epidemiol. 38, 179–187 (2023).↩︎</p></li>
<li id="fn15"><p>Abbasizanjani, H. et al.&nbsp;Harmonising electronic health records for reproducible research: challenges, solutions and recommendations from a UK-wide COVID-19 research collaboration. BMC Med. Inform. Decis. Mak. 23, 1–15 (2023).↩︎</p></li>
<li id="fn16"><p>Orlhac, F. et al.&nbsp;A Guide to ComBat Harmonization of Imaging Biomarkers in Multicenter Studies. J. Nucl. Med. 63, 172 (2022).↩︎</p></li>
<li id="fn17"><p>Toga, A. W. et al.&nbsp;The pursuit of approaches to federate data to accelerate Alzheimer’s disease and related dementia research: GAAIN, DPUK, and ADDI. Front. Neuroinform. 17, 1175689 (2023).↩︎</p></li>
<li id="fn18"><p>Torabi, F. et al.&nbsp;A common framework for health data governance standards. Nat. Med. 2024 1–4 (2024) doi:10.1038/s41591-023-02686-w.↩︎</p></li>
<li id="fn19"><p>Tucker, A., Wang, Z., Rotalinti, Y. &amp; Myles, P. Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. npj Digit. Med. 2020 31 3, 1–13 (2020)↩︎</p></li>
<li id="fn20"><p>Tucker, A., Wang, Z., Rotalinti, Y. &amp; Myles, P. Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. npj Digit. Med. 2020 31 3, 1–13 (2020)↩︎</p></li>
<li id="fn21"><p>Noruzman, A. H., Ghani, N. A. &amp; Zulkifli, N. S. A. Gretel.ai: Open-Source Artificial Intelligence Tool To Generate New Synthetic Data. MALAYSIAN J. Innov. Eng. Appl. Soc. Sci. 1, 15–22 (2021).↩︎</p></li>
<li id="fn22"><p>Wilkinson, M. D. et al.&nbsp;The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016 31 3, 1–9 (2016).↩︎</p></li>
<li id="fn23"><p>Chen, Y. et al.&nbsp;A FAIR and AI-ready Higgs boson decay dataset. Sci. Data 9, (2021).↩︎</p></li>
<li id="fn24"><p>Esteban, O. et al.&nbsp;fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2018).↩︎</p></li>
<li id="fn25"><p>Alkhalifah, T., Wang, H. &amp; Ovcharenko, O. MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning. Artif. Intell. Geosci. 3, 101–114 (2022).↩︎</p></li>
<li id="fn26"><p>Jefferson, E. et al.&nbsp;GRAIMATTER Green Paper: Recommendations for disclosure control of trained Machine Learning (ML) models from Trusted Research Environments (TREs). doi:10.5281/ZENODO.7089491.↩︎</p></li>
<li id="fn27"><p>DARE UK Community Working Group - DARE UK. https://dareuk.org.uk/dare-uk-community-working-groups/dare-uk-community-working-group-ai-risk-evaluation-working-group/.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>AI</category>
  <category>Data management</category>
  <category>Data science education</category>
  <guid>https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/ai-series-3.html</guid>
  <pubDate>Tue, 07 May 2024 00:00:00 GMT</pubDate>
  <media:content url="https://realworlddatascience.net/foundation-frontiers/posts/2024/05/07/images/Stages-of-model-development-724.png" medium="image" type="image/png" height="105" width="144"/>
</item>
</channel>
</rss>
