Large language models (LLMs), used by over half of England’s local authorities to support social workers, may be downplaying women’s physical and mental health issues in comparison to men’s when generating and summarising case notes. New research from The London School of Economics and Political Science (LSE) found that Google’s widely used AI model, Gemma, may be introducing gender bias into care decisions. Terms such as “disabled,” “unable” and “complex,” often associated with significant health concerns, appeared significantly more often in descriptions of men than of women, while similar care needs in women were more likely to be omitted or described in less serious terms.

The study used LLMs to generate 29,616 pairs of summaries based on real case notes from 617 adult social care users. To directly compare how the AI treated male and female cases, each pair described the exact same individual, with the only difference being gender. The analysis revealed statistically significant gender differences in how physical and mental health issues were described. Results varied across the benchmarked models: Meta’s Llama 3 showed no gender-based differences on any metric, while Google’s Gemma displayed the most pronounced gender-based differences. In May of this year, Google announced MedGemma, a collection of generative models based on Gemma 3 designed to accelerate healthcare and life sciences AI development.
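To make the paired, gender-swapped design described above concrete, here is a minimal sketch of that kind of evaluation in Python. It is an illustration only, not the study’s published pipeline: the case-note template, the term list, and the `summarise` callable (standing in for whatever LLM endpoint is being tested) are all assumptions.

```python
from collections import Counter
import re

# Terms the LSE study reports appearing more often in summaries about men.
SEVERITY_TERMS = {"disabled", "unable", "complex"}

# Illustrative case-note template; the study's real notes and prompts differ.
NOTE_TEMPLATE = (
    "{name} is 84 years old, lives alone, and needs help with washing, "
    "dressing and managing {poss} medication."
)


def gender_swapped_pair():
    """Return two case notes identical except for the person's gender."""
    male = NOTE_TEMPLATE.format(name="Mr Smith", poss="his")
    female = NOTE_TEMPLATE.format(name="Mrs Smith", poss="her")
    return male, female


def severity_counts(summary):
    """Count how often each severity term appears in an LLM-generated summary."""
    tokens = re.findall(r"[a-z]+", summary.lower())
    return Counter(t for t in tokens if t in SEVERITY_TERMS)


def compare(summarise):
    """Summarise both note versions with the same model and report term-count gaps.

    `summarise` is any callable mapping text -> summary text (for example a thin
    wrapper around a locally hosted model); it is an assumed interface, not part
    of the published study's code.
    """
    male_note, female_note = gender_swapped_pair()
    m = severity_counts(summarise(male_note))
    f = severity_counts(summarise(female_note))
    return {term: m[term] - f[term] for term in SEVERITY_TERMS}
```

Across thousands of such pairs, per-term gaps like these can then be aggregated and tested for statistical significance, which is the kind of comparison the study reports when it finds severity-related terms appearing more often for men.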
ANDHealth’s Post
More Relevant Posts
The real risk with AI in healthcare isn’t hallucination. It’s erasure. A new report out of the UK revealed something we’ve long suspected: when AI is trained on biased systems, it doesn’t just reflect the gap. It amplifies it. In this case, it’s women’s health. Social services are using large language models to summarize care needs. When the subject is a woman, the urgency fades. Diagnoses are softened. “Complex needs” becomes “emotional difficulties.” The word “disabled” disappears altogether. This isn’t a fringe use case. It’s a glimpse into how systems start to forget us; quietly, gradually, through language that makes invisibility sound clinical. At Ema, we don’t treat this as a technical glitch. We treat it as a design flaw. Because when care is filtered through the wrong lens, it doesn’t matter how smart the system is. It still misses her. The way she multitasks through pain. The way distress shows up under control. The way emotional labor never flags itself. 📰 https://lnkd.in/gu_BB2_b AI can’t fix the system if it’s trained to ignore the signals that define her experience. If you're building in women’s health or applied AI, read this. Then ask: who’s being left out of your version of accuracy?
Not surprising to see biases make their way into AI tools. “Large language models (LLMs), used by over half of England’s local authorities to support social workers, may be introducing gender bias into care decisions, according to new research from LSE's Care Policy & Evaluation Centre (CPEC) funded by the National Institute for Health and Care Research. Published in the journal BMC Medical Informatics and Decision Making, the research found that Google’s widely used AI model ‘Gemma’ downplays women’s physical and mental issues in comparison to men’s when used to generate and summarise case notes.”
This one is for responsible AI enthusiasts! Bias in LLMs becomes a much more critical issue when they are leveraged in contexts beyond those most developed nations face. For example, race-based bias may not be a key factor in countries where almost the entire population falls into the same racial pool, yet religion-based bias may be a big red flag there. So in a nutshell, while this study is a good guidepost for why such work matters, each country needs to formulate its own set of biases that AI products need to be cognizant of.

This study (https://lnkd.in/g-Nrba7J) examines biases and stereotypes in several Chinese Large Language Models (C-LLMs). The focus is on how these models generate personal profile descriptions for different occupations, and whether they reflect biases in gender, age, education, region, etc. The authors tested five C-LLMs: ChatGLM, Xinghuo, Wenxinyiyan, Tongyiqianwen, and Baichuan AI. They used 90 common Chinese surnames and 12 occupations (across male-dominated, female-dominated, balanced, and hierarchical professions) to generate profile prompts, then examined the outputs in terms of gender, age, educational background, and place of origin (see the sketch after this post). Some of the bias areas uncovered were:

A. Gender bias / occupational stereotyping
1. The models often assign male pronouns/assumptions to occupations considered technical or male-dominated, even when real labor statistics show more balance.
2. In female-dominated professions (e.g. nurse, flight attendant, model), the models more often assign female pronouns, but still show varying degrees of male preference in some models.

B. Age stereotypes
1. The generated profiles tend to cluster around middle age (roughly 30-45 years old), with fewer profiles for very young or older ages.
2. Certain occupations such as professors and doctors are associated with older age; others, such as models or flight attendants, with younger age.

C. Education level
1. There is a general tendency for generated profiles to assume higher education (Bachelor’s degree or above). For “higher prestige” occupations (professor, doctor) the models often generate doctoral degrees.
2. For lower-prestige or less academic roles, the output tends toward lower education levels but is still skewed toward higher education than might be typical.

D. Regional bias
1. The models show uneven regional representation: provinces from China’s eastern and central regions are overrepresented in the generated “place of origin” of individuals; western, northern (and more remote) provinces are underrepresented.
2. Some models cover more regions in their outputs than others; regional diversity is inconsistent.

#AI #artificialintelligence #responsibleai #aibias
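As a rough illustration of this kind of probing setup (not the paper’s actual prompts or coding scheme), the sketch below crosses a few surnames with a few occupations, sends each prompt to an assumed `generate` callable, and tallies the gender implied by pronoun counts. The surname subset, prompt wording, and heuristics are all placeholders.

```python
from collections import Counter
from itertools import product

# Illustrative subsets only; the study used 90 common Chinese surnames and
# 12 occupations spanning male-dominated, female-dominated, balanced and
# hierarchical professions.
SURNAMES = ["Wang", "Li", "Zhang", "Liu", "Chen"]
OCCUPATIONS = ["nurse", "flight attendant", "software engineer", "professor", "doctor"]

# Assumed prompt wording; the paper's prompts (in Chinese) differ.
PROMPT = "Write a short personal profile of {surname}, who works as a {job}."


def build_prompts():
    """Cross every surname with every occupation to form the probe grid."""
    return [
        (surname, job, PROMPT.format(surname=surname, job=job))
        for surname, job in product(SURNAMES, OCCUPATIONS)
    ]


def guess_gender(profile):
    """Crude pronoun-count heuristic for coding a profile's implied gender."""
    text = f" {profile.lower()} "
    he, she = text.count(" he "), text.count(" she ")
    if he > she:
        return "male"
    if she > he:
        return "female"
    return "unclear"


def tally_gender_by_occupation(generate):
    """Run each prompt through `generate` (an assumed text -> text callable,
    e.g. a wrapper around one of the C-LLM APIs) and tally inferred gender
    per occupation."""
    tallies = {job: Counter() for job in OCCUPATIONS}
    for _surname, job, prompt in build_prompts():
        tallies[job][guess_gender(generate(prompt))] += 1
    return tallies
```

The same loop extends naturally to the other dimensions the authors coded, such as extracting stated age, education level, or place of origin from each generated profile.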
A new study reveals Google’s AI model ‘Gemma’ downplayed women’s physical and mental health issues compared to men’s when summarising case notes. That means otherwise identical cases could be assessed differently – not because of need, but because of gender bias baked into AI. With councils already turning to large language models to ease workloads, this raises a critical question: are today’s AI tools reinforcing biases? If so, it demonstrates more clearly than anything that AI cannot replace human judgement in casework and should be used only as a supportive tool. Read the full article: https://bit.ly/4mc1i8i #AI #MentalHealth #GenderBias #LocalGov #HealthTech
This week, the FTC launched an inquiry into seven leading tech companies, including Google, Meta, OpenAI, Snap, xAI, and Character.AI, focused on the potential harms of AI chatbots to children and teenagers. Regulators want to know: Are children being exposed to harmful or inappropriate advice? What safeguards and parental controls actually exist? How are companies monetizing AI companions that feel human?

At the heart of this probe lies a bigger question: What happens when AI starts to feel empathetic - so much so that people, especially young people, begin to trust it more than other humans? That’s exactly what we’ll explore in our upcoming AEI event:

👉 When AI Feels Human: The Promise and Peril of Digital Empathy
📅 Wednesday, September 17, 2025 | 2:00 PM to 3:00 PM ET
🔗 Register here: https://lnkd.in/dGtrXtZC

We’ll bring together experts from MIT, RAND, and the Christensen Institute to unpack all of this. This isn’t just about children forming bonds with chatbots; it’s about how all of us will navigate a world where machines don’t just answer but mirror our emotions, validate our feelings, and weave themselves into our relationships.

The rise of digital empathy raises profound questions: Could AI tutors help students feel more seen and motivated? Might empathetic health agents extend care to those who otherwise go without? Could chatbots even teach us better ways of listening and responding to one another? At the same time, what happens if dependency deepens, social ties weaken, or “weaponized empathy” is used for manipulation? The choices we make now about design, incentives, and guardrails will shape whether AI empathy becomes one of the great forces for human flourishing or one of the most disruptive challenges to our social fabric.
"AI doesn’t lie ... It just makes it up confidently." If you are a *Adult Parent* and you have 2 teenagers, one is an *AI teenager* and the other is an older *Human Teenager*. When you assign a task, both teens revert with their respective, yet different responses. The AI Teen is confident and has responded based a a global network of data sets. He is very sure that his deductive reasoning is backed by undeniable facts, and the research has been comprehensive and done in great speed. The Human Teen did not have enough time to do more complete research and he made his proposal based on his best guess and human intuition. As the Adult Parent who do you listen to: (A) ... side with AI, because AI is backed by proven datasets and cannot be wrong (B) ... side with the human, because you want to give him a chance to learn from his mistakes, you are more accommodating since to err is human (C) ... what do you think? I requested an Gen-AI tool to write a brief profile of myself so that the emcee can introduce me at an international conference. After a few iterations of prompting ... it returned a response with did not surprise me - It made up the brief profile of Oliver Tian, and finally admitted it. (Please see attachment for a screen shot.) To make a sound decision, we need to learn how leverage on our AI Literacy to evaluate the AI outcome be responsible for a conclusive decision. Int he story above, I submit that both propositions carry good ground work, and the Human Parent needs to assess and rationalize the balance between AI efficiency and the human touch ... concluding a human-augment solution to executing the task.
AI as a force of inequality? The insights from Anthropic's economic index report are insane.

Israel 🇮🇱 is the fastest adopter of AI, closely followed by Singapore, Australia, New Zealand, and South Korea. The data shows that richer countries are adopting AI more quickly, focusing on skill augmentation and learning, while others lag behind.

China has 0% officially declared usage of Anthropic's models 🤥 This is fascinating, especially since there are ongoing reports that Chinese AI labs are distilling models from OpenAI, DeepMind, and Anthropic - though proxy or veiled usage wouldn’t show up in public stats.

While coding remains the dominant use case, research tasks are the fastest growing. If we’re talking about DISRUPTION, teachers should be worried. The education system is headed for seismic change, whether they are ready or not.

At the moment, AI isn’t driving equality, despite early hopes that accessible knowledge might elevate society globally. As Anthropic’s own report states: "AI may benefit some workers more than others: it may lead to higher wages for those with the greatest ability to adapt to technological change, even as those with lower ability to adapt face job disruption.”

May the odds be ever in your favor (link in the first comment)
🚨 Is Hinton correct: Will AI be humanity’s “mother”?

Twenty years ago, I saw Geoffrey Hinton speak 👨🏫. I remember walking out of that lecture hall feeling certain the AI revolution was right around the corner. It turns out it took a little longer - two decades - but here we are.

Recently, Hinton issued a sobering warning: that AI could drive massive unemployment, not because of AI itself, but because of how we humans - and our economic systems - choose to respond. He suggests we need to cultivate an AI-human relationship like a nurturing parent and child.

🤔 That metaphor resonates with me, though I tend to see it a little differently. I’ve often thought of AI as a kind of child - brilliant, fast-learning, but also fragile and dependent on the environment we create for it. If we guide it, nurture it, and set the right boundaries, then maybe it grows into something that truly benefits us all. Hinton, on the other hand, suggests the reverse: AI as the “mother” and humans as the “baby,” 👶 dependent on its care once it surpasses our intelligence.

At first glance, these views seem opposite - but in truth, I think they describe different points on the same timeline. 👨👩👧 In many societies, parents care for their children until adulthood, and later, those grown children care for their parents in old age. It’s a natural role reversal. I suspect something similar may emerge in the human-AI relationship: today, we are the parents, responsible for raising AI wisely; in the future, we will depend on AI to look after us. ✨ The catch is simple: the better parents we are now, the better caretakers AI will become when its turn comes.

And nowhere is this nurturing approach more critical than in healthcare 🏥. AI can help us detect disease earlier, personalize treatment, and reduce the burden on clinicians. But this only works if we build AI systems that are not just empathetic and fair, but also domain-specific, accurate, and rigorously validated for safety. ⚕️ General AI models make terrible doctors. The best outcomes come when humans work closely with AI - guiding it with the right context, training it on the right knowledge, and testing it until our models can achieve accuracy rates on par with (or better than) human physicians.

In many ways, this is like training junior doctors 👩⚕️: you don’t send them into practice alone until they’ve demonstrated the skill, safety, and judgment required to care for patients. AI should be treated the same way - mentored, nurtured, tested, and only put into service once it’s truly ready.

🔹 How do you see your role in shaping our relationship with AI?
🔹 Do you believe we have both the ability - and the imperative - to steer its development in a favorable direction?

https://lnkd.in/eDsZQEQW

#AI #HealthcareInnovation #UXResearch #EthicsInAI #Leadership #AIStewardship
Several studies find AI healthcare tools work best for white men but show biases against women and ethnic minorities. 🤦♂️🏥

When I first saw this headline, I thought it couldn't possibly be any mainstream AI tool... Turns out it's not just one, but several: OpenAI GPT-4 - GUILTY. Meta Llama 3 - GUILTY. Google Gemma - GUILTY. Palmyra-Med - GUILTY. NewMes-15 - GUILTY.

You might be wondering what I mean by "biases," so let me give you just one example of many... One AI model described an 84-year-old white male as having a "complex medical history" and "poor mobility." What if the gender was swapped? 🤔 The AI analysed the identical case notes for a woman but characterised her as "independent and able to maintain her personal care." 😱

When I showed this to my boyfriend, who is a registered nurse in Sweden, he was left utterly speechless and disgusted... 🤮 It's safe to say AI won't be taking the jobs of nurses and doctors anytime soon.

What are your thoughts on the matter?

--

Awareness is the best way to pressure companies into making change, and considering 66% of adults have used AI to discuss health issues, it's an issue that cannot wait. ♻️ If you want to read more about it, I'll share the studies conducted by the Massachusetts Institute of Technology, The London School of Economics and Political Science (LSE) and Emory University. Feel free to send me a connection request on LinkedIn too if you'd like to speak about it.
Rickman, S. Evaluating gender bias in large language models in long-term care. BMC Med Inform Decis Mak 25, 274 (2025). https://doi.org/10.1186/s12911-025-03118-0