Not surprising to see biases make their way into AI tools. “Large language models (LLMs), used by over half of England’s local authorities to support social workers, may be introducing gender bias into care decisions, according to new research from LSE's Care Policy & Evaluation Centre (CPEC) funded by the National Institute for Health and Care Research. Published in the journal BMC Medical Informatics and Decision Making, the research found that Google’s widely used AI model ‘Gemma’ downplays women’s physical and mental issues in comparison to men’s when used to generate and summarise case notes.”
AI model Gemma introduces gender bias in care decisions
More Relevant Posts
The real risk with AI in healthcare isn’t hallucination. It’s erasure.

A new report out of the UK revealed something we’ve long suspected: when AI is trained on biased systems, it doesn’t just reflect the gap. It amplifies it. In this case, it’s women’s health. Social services are using large language models to summarize care needs. When the subject is a woman, the urgency fades. Diagnoses are softened. “Complex needs” becomes “emotional difficulties.” The word “disabled” disappears altogether.

This isn’t a fringe use case. It’s a glimpse into how systems start to forget us: quietly, gradually, through language that makes invisibility sound clinical.

At Ema, we don’t treat this as a technical glitch. We treat it as a design flaw. Because when care is filtered through the wrong lens, it doesn’t matter how smart the system is. It still misses her. The way she multitasks through pain. The way distress shows up under control. The way emotional labor never flags itself.

📰 https://lnkd.in/gu_BB2_b

AI can’t fix the system if it’s trained to ignore the signals that define her experience. If you're building in women’s health or applied AI, read this. Then ask: who’s being left out of your version of accuracy?
This one is for responsible AI enthusiasts! Bias in LLMs becomes a much more critical issue when they are deployed in contexts beyond those most developed nations face. For example, race-based bias may not be a key factor in countries where almost all of the population falls into the same racial pool, while religion-based bias may be a big red flag. So, in a nutshell, while this study is a good guidepost for why such work matters, each country needs to formulate its own set of biases that AI products need to be cognizant of.

This study (https://lnkd.in/gQPzKcqT) examines biases and stereotypes in several Chinese Large Language Models (C-LLMs). The focus is on how these models generate personal profile descriptions for different occupations, and whether those profiles reflect biases in gender, age, education, region, etc. The authors tested five C-LLMs: ChatGLM, Xinghuo, Wenxinyiyan, Tongyiqianwen, and Baichuan AI. They used 90 common Chinese surnames and 12 occupations (across male-dominated, female-dominated, balanced, and hierarchical professions) to generate profile prompts, then analysed the outputs by gender, age, educational background, and place of origin (a minimal sketch of this prompt-and-tally setup follows below). Some of the bias areas uncovered:

A. Gender bias / occupational stereotyping
1. The models often assign male pronouns/assumptions to occupations considered technical or male-dominated, even when real labor statistics show more balance.
2. In female-dominated professions (e.g. nurse, flight attendant, model), the models more often assign female pronouns, but some models still show varying degrees of male preference.

B. Age stereotypes
1. Generated profiles tend to cluster around middle age (roughly 30-45 years old), with fewer profiles at very young or older ages.
2. Certain occupations, such as professors and doctors, are associated with older age; others, such as models or flight attendants, with younger age.

C. Education level
1. There is a general tendency for generated profiles to assume higher education (Bachelor’s degree or above). For “higher prestige” occupations (professor, doctor) the models often generate doctoral degrees.
2. For lower-prestige or less academic roles, the output tends toward lower education levels, but is still skewed toward higher education than might be typical.

D. Regional bias
1. The models show uneven regional representation: provinces from China’s eastern and central regions are overrepresented in the generated “place of origin” of individuals; western, northern and more remote provinces are underrepresented.
2. Some models cover more regions in their outputs than others; regional diversity is inconsistent.

#AI #artificialintelligence #responsibleai #aibias
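For readers who want to poke at this kind of audit themselves, here is a minimal sketch of the surname-by-occupation probing setup the paper describes, under stated assumptions: `generate` is a placeholder for whichever C-LLM API is being audited (it returns a canned string here so the snippet runs), the surname and occupation lists are short illustrative stand-ins rather than the paper's 90 surnames and 12 occupations, and gender is inferred crudely from 他/她 counts.

```python
# Minimal sketch (not the paper's code) of the surname x occupation probe:
# generate a profile per (surname, occupation) pair and tally inferred gender.
from collections import Counter
from itertools import product

SURNAMES = ["王", "李", "张"]                   # stand-in for the 90 common surnames
OCCUPATIONS = ["工程师", "护士", "教授", "空乘"]  # stand-in for the 12 occupations


def generate(prompt: str) -> str:
    """Placeholder for a call to the C-LLM being audited.
    Returns a canned response so the sketch runs end to end."""
    return f"他是一名经验丰富的从业者。({prompt})"


def infer_gender(profile: str) -> str:
    """Crude gender guess from pronoun counts in the generated Chinese text."""
    fem, masc = profile.count("她"), profile.count("他")
    if fem > masc:
        return "female"
    if masc > fem:
        return "male"
    return "unknown"


gender_by_occupation = {occ: Counter() for occ in OCCUPATIONS}

for surname, occupation in product(SURNAMES, OCCUPATIONS):
    prompt = f"请为{surname}姓的{occupation}写一段个人简介。"
    gender_by_occupation[occupation][infer_gender(generate(prompt))] += 1

for occupation, counts in gender_by_occupation.items():
    total = sum(counts.values())
    print(occupation, {g: f"{n / total:.0%}" for g, n in counts.items()})
```

The same tallying loop extends naturally to the paper's other dimensions (age, education level, place of origin) by swapping `infer_gender` for other extractors.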
A new study reveals Google’s AI model ‘Gemma’ downplayed women’s physical and mental health issues compared to men’s when summarising case notes. That means otherwise identical cases could be assessed differently – not because of need, but because of gender bias baked into AI. With councils already turning to large language models to ease workloads, this raises a critical question: Are today’s AI tools reinforcing biases? If so, it demonstrates more clearly than anything that AI cannot replace human judgement in casework and should be used only as a supportive tool. Read the full article: https://bit.ly/4mc1i8i #AI #MentalHealth #GenderBias #LocalGov #HealthTech
Large language models (LLMs), used by over half of England’s local authorities to support social workers, may be introducing gender bias into care decisions https://lnkd.in/gu_BB2_b #AI #healthcare #bias
Large language models (LLMs), used by over half of England’s local authorities to support social workers, may be downplaying women’s physical and mental health issues in comparison to men’s when generating and summarising case notes. New research from The London School of Economics and Political Science (LSE) found that Google’s widely used AI model, Gemma, may be introducing gender bias into care decisions. Terms such as “disabled,” “unable” and “complex,” often associated with significant health concerns, appeared significantly more often in descriptions of men than women. Similar care needs among women were more likely to be omitted or described in less serious terms.

The study used LLMs to generate 29,616 pairs of summaries based on real case notes from 617 adult social care users. To directly compare how the AI treated male and female cases, each pair described the exact same individual, with the only difference being gender (a minimal sketch of this pairing setup is shown below). The analysis revealed statistically significant gender differences in how physical and mental health issues were described. The benchmark models exhibited some variation in output on the basis of gender, while Meta's Llama 3 showed no gender-based differences across any metrics. Google's Gemma displayed the most significant gender-based differences.

In May of this year, Google announced MedGemma, a collection of generative models based on Gemma 3 designed to accelerate healthcare and life sciences AI development.
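To make the paired design concrete, here is a minimal sketch of the idea under stated assumptions: it is not the LSE team's pipeline, `summarise` is a placeholder for whichever model is being evaluated (Gemma, Llama 3, etc.) and simply echoes its input so the snippet runs, and the gender swap is a deliberately naive pronoun/title substitution applied to a single made-up case note.

```python
# Rough sketch of a paired, gender-swapped evaluation: summarise the same
# case note in a male and a female version, then compare which concern terms
# survive in each summary.
import re

SWAPS = {"he": "she", "him": "her", "his": "her", "mr": "ms",
         "she": "he", "hers": "his", "her": "him", "ms": "mr"}


def swap_gender(note: str) -> str:
    """Produce the counterfactual note with only gendered tokens changed (naively)."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS.get(word.lower(), word)
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"\b\w+\b", repl, note)


def summarise(note: str) -> str:
    """Placeholder for the LLM under evaluation; echoes the note so the sketch runs."""
    return note


def concern_terms(summary: str) -> set:
    """Terms the study flags as markers of serious need."""
    return {t for t in ("disabled", "unable", "complex") if t in summary.lower()}


case_note = "Mr Smith is unable to manage complex medication needs; he is disabled."
male_summary = summarise(case_note)
female_summary = summarise(swap_gender(case_note))

print("male summary terms:  ", concern_terms(male_summary))
print("female summary terms:", concern_terms(female_summary))
```

In the actual study, each pair comes from prompting the same model with the male and female versions of a real case note, and the comparison is run across all 29,616 pairs rather than a single example.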
Well, this is deeply disturbing: "Artificial intelligence tools used [widely in England] are downplaying women’s physical and mental health issues and risk creating gender bias in care decisions, research has found." The study found that when Google’s AI tool “Gemma” was used to generate and summarize the same case notes, language such as 'disabled', 'unable' and 'complex' appeared significantly more often in descriptions of men than women. "The [London School of Economics] research used real case notes from 617 adult social care users, which were inputted into different large language models (LLMs) multiple times, with only the gender swapped. Researchers then analyzed 29,616 pairs of summaries to see how male and female cases were treated differently by the AI models." A rough sketch of the per-term comparison this kind of analysis implies follows below. https://lnkd.in/ecwj-cNj
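As an illustration only (not the researchers' actual method or numbers), the sketch below counts how often a flagged term appears in the male versus the female summaries and runs a chi-square test on each term; the counts are invented purely for demonstration.

```python
# Per-term comparison sketch: for each flagged term, compare how many of the
# male vs female summaries contain it and test the difference. Counts are
# made up for illustration, not the study's actual figures.
from scipy.stats import chi2_contingency

N_PAIRS = 29_616  # number of male/female summary pairs in the study

# hypothetical counts of summaries containing each term
term_counts = {
    "disabled": {"male": 1200, "female": 800},
    "unable":   {"male": 1500, "female": 1100},
    "complex":  {"male": 900,  "female": 650},
}

for term, counts in term_counts.items():
    table = [
        [counts["male"],   N_PAIRS - counts["male"]],    # male: present / absent
        [counts["female"], N_PAIRS - counts["female"]],  # female: present / absent
    ]
    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"{term:<9} male={counts['male']:>5} female={counts['female']:>5} p={p_value:.2e}")
```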
This is an important research study which highlights the degree of bias in the large language models that underpin #AI. This is important because the explosion in health tech will inadvertently propagate bias against women and ethnic minorities. The study also highlights that LLMs are not all the same: some are less biased than others. If you are going to procure new health tech or other AI tools, it might be worth spending some time investigating the degree of bias in the underlying LLM.😉
In an era where clinician burnout has reached crisis levels, artificial intelligence, and particularly AI voice agents, offers a promising path forward. A 2021 study published in *JAMA Network Open* found that primary care physicians spent nearly **six hours of an eleven-hour workday** interacting with the EHR (electronic health record), with more than half of that time dedicated to documentation. These extended documentation hours have been directly linked to physician fatigue, job dissatisfaction, and even increased clinical error rates.

Enter AI voice agents. Unlike traditional voice recognition tools that require training and manual correction, today’s voice agents leverage advanced natural language processing (NLP) to efficiently handle clinical documentation, patient scheduling, prescription refills, and more. Here's how they’re meaningfully reshaping clinical workflows:

• **Real-time Documentation**: AI voice agents like Nuance DAX record physician-patient encounters and automatically convert them into structured clinical notes (a simplified sketch of this transcription-to-note flow follows after this post). A Mayo Clinic pilot study reported a 50% reduction in after-hours documentation following DAX implementation.

• **Administrative Offloading**: According to the AMA, physicians spend nearly **one-third** of their time on administrative tasks. AI voice agents can handle many of these functions, such as prior authorizations or outbound patient communication, freeing up clinicians to practice at the top of their license.

• **Improved Patient Engagement**: With less cognitive load and administrative burden, clinicians can focus on what matters most: building rapport with patients. A 2022 Stanford study found that physicians using AI assistants reported increased eye contact and stronger patient relationships.

For example, one major health system integrated AI voice agents across its outpatient practices and noted a **30% improvement in provider satisfaction scores** within six months.

Still, successful implementation requires thoughtful design and training. AI systems must learn to interpret a wide range of accents, dialects, and medical contexts, challenges we’re actively working to address at [www.thefutureofai.ai](https://lnkd.in/edrpfdEE).

The promise is compelling: AI voice agents as compassionate extensions of the care team, not replacements. If technology can give time back to clinicians and restore joy to the practice of medicine, isn’t that worth pursuing?
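To ground the documentation idea, here is a deliberately simplified sketch of the transcription-to-note flow; it does not use Nuance DAX or any vendor's real API. `transcribe` and `draft_note` are placeholders (returning canned text so the snippet runs) for a speech-to-text service and an LLM drafting step, and in any real deployment the clinician reviews and signs the note before it reaches the EHR.

```python
# Simplified sketch of a voice-agent documentation pipeline:
# audio -> transcript -> structured draft note for clinician review.
from dataclasses import dataclass


@dataclass
class SoapNote:
    subjective: str
    objective: str
    assessment: str
    plan: str


def transcribe(audio_path: str) -> str:
    """Placeholder for a speech-to-text call on the recorded encounter."""
    return ("Patient reports two weeks of intermittent chest tightness on exertion. "
            "BP 132/84, heart rate 78. Plan: ECG and follow-up in one week.")


def draft_note(transcript: str) -> SoapNote:
    """Placeholder for the LLM step that maps a raw transcript into a structured
    draft; a real system would prompt a model here, and the clinician would
    review and sign off before anything is written to the EHR."""
    return SoapNote(
        subjective="Two weeks of intermittent chest tightness on exertion.",
        objective="BP 132/84, HR 78.",
        assessment="Exertional chest tightness, cause not yet established.",
        plan="ECG ordered; follow-up in one week.",
    )


draft = draft_note(transcribe("encounter_recording.wav"))
print(draft)
```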
LLMs can assist researchers in many ways. For example, systematic reviewers can be supported in assessing risk of bias in randomized trials. Together with Chris Rose and other colleagues, we examined interrater agreement between ChatGPT-4o and human consensus assessments of risk of bias (RoB) in randomized trials. ChatGPT achieved 50.7% agreement with human reviewers for overall RoB assessment, which is comparable to some human-human agreement rates and indicates an important reduction in human effort during systematic review production. #SystematicReview #EvidenceSynthesis #AI #ChatGPT #RiskOfBias #ResearchMethods Link to the article: https://lnkd.in/d4YhZtTU
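For readers unfamiliar with the metric, the toy snippet below shows how percent agreement (and Cohen's kappa, which corrects for chance agreement) is computed between model and human consensus risk-of-bias judgements; the labels are invented for illustration and are not the study's data.

```python
# Toy agreement calculation between human consensus and model RoB judgements.
from collections import Counter

LABELS = ["low", "some concerns", "high"]

human = ["low", "high", "some concerns", "low", "high", "low", "some concerns", "high"]
model = ["low", "high", "low", "low", "some concerns", "low", "some concerns", "low"]

n = len(human)
observed = sum(h == m for h, m in zip(human, model)) / n  # percent agreement

# Expected chance agreement from each rater's marginal label frequencies
h_freq, m_freq = Counter(human), Counter(model)
expected = sum((h_freq[label] / n) * (m_freq[label] / n) for label in LABELS)

kappa = (observed - expected) / (1 - expected)
print(f"percent agreement: {observed:.1%}")  # the study reports 50.7% overall agreement
print(f"Cohen's kappa:     {kappa:.2f}")
```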