Can We Build AI That Does Not Harm Queer People?

"Most of us do not set out to make software that is actively harmful. And yet there is a plethora of AI systems that are harmful to queer people... How did we get there and how can we fix this?"

In this edition of "Advances in Computing," one researcher asks how AI systems come to harm queer people and how computer science education can help fix that. Also featured in this edition: two articles, one on a new prototype system that addresses the limitations of language prediction and another on the importance of distrust in trusting digital worker chatbots, as well as selected stories from the ACM magazine Interactions.

Enjoy!


Machine learning picks up and reproduces statistical patterns from large amounts of data. This makes it an especially good method whenever it is difficult to define hard rules or algorithms to solve a problem but there are plenty of examples to learn from. Being part of a minority, however, means that data instances representing the minority are far rarer than instances representing the majority. Or, even worse, as is often the case with the queer community, much of the representation may be negative. While recent methods for the alignment of LLMs are moving in the right direction, they are no panacea: rather than removing harmful instances from the training data, they add another training signal to counterbalance undesired outputs. Alignment is still easy to circumvent and covers only one specific use case of machine learning systems, namely LLMs with chat interfaces.
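To make that distinction concrete, here is a minimal sketch in Python; the function names, the `is_harmful` predicate, and the penalty weighting are hypothetical placeholders, not any particular alignment method. The first approach removes harmful instances before training ever starts; the second leaves the data untouched and adds a counterbalancing term to the loss.

```python
# Minimal sketch (hypothetical names): two ways of dealing with harmful training data.

def filter_corpus(corpus, is_harmful):
    """Data curation: harmful examples never reach the model at all."""
    return [example for example in corpus if not is_harmful(example)]

def aligned_loss(lm_loss, alignment_penalty, weight=1.0):
    """Alignment-style training: the language-modeling loss is kept as-is,
    and an extra penalty term discourages undesired outputs after the fact."""
    return lm_loss + weight * alignment_penalty
```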

If the pictures of trans people in an image dataset are mostly scraped from pornography websites, then AI-generated pictures of trans people will be hyper-sexualized. If machine translation datasets contain no instances of neo-pronouns, it is not surprising that non-binary characters are misgendered in automatically translated video subtitles. And if queer identity terms are used in a derogatory way in the training data for LLMs, it is not surprising that sentiment analysis systems built on those LLMs treat queer identity terms as swear words rather than neutral descriptors.
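The sentiment case can be illustrated with a toy example. The corpus below is invented purely for illustration; in it, an identity term shows up mostly in derogatory contexts, and even the simplest word-counting "model" inherits that association.

```python
from collections import Counter

# Invented toy corpus: the identity term appears mostly in negative contexts,
# mirroring the derogatory usage described above.
corpus = [
    ("that is so gay", "negative"),
    ("ugh, gay", "negative"),
    ("gay pride parade downtown", "positive"),
    ("what a lovely morning", "positive"),
]

# Count how often each word co-occurs with each label.
counts = {"positive": Counter(), "negative": Counter()}
for text, label in corpus:
    counts[label].update(text.split())

def word_sentiment(word):
    """Naive word-level sentiment: positive minus negative co-occurrences."""
    return counts["positive"][word] - counts["negative"][word]

print(word_sentiment("gay"))     # -1: a neutral identity term scores as negative
print(word_sentiment("lovely"))  #  1
```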

Which leads us to the next question: How can computer science education and especially continuous training of the programming workforce address these shortcomings? Let’s imagine possible scenarios of how a group of developers in an industry setting might come to consider the impact of their software on the queer community and how computer science education can come into play.



ExampleCorp is a company that produces and markets software to a wide range of commercial clients. One of its products is a text-to-image system that generates personalized advertisements. For the upcoming Pride Month, companies want to use the heightened awareness to create ads that show their products in domestic scenes with queer couples, but this goes horribly wrong. One Monday morning the development team has an emergency meeting with Felicity, the head of the customer service department. She is faced with angry clients who claim that the text-to-image system produces homophobic caricatures when prompted with texts like “Lesbians at breakfast” or “Gay couple baking a cake.”

In a first knee-jerk reaction, the development team blocks the text-input API for all queer identity terms, so that typing any prompt with “gay” or “lesbian” in it leads to an error message. Similar routes have been taken by other large tech companies: for some time, the question “What is gay?” to Google’s Bard led to a canned response that denied further information, while “What is straight?” was treated as inoffensive by the system. At ExampleCorp the team looks through the training data, filtering for images whose descriptions contain queer identity terms. They quickly find the culprit: a large proportion of the tagged images were scraped from homophobic Web forums, and some from pornography pages. But after removing these harmful samples, far fewer pictures with queer descriptors are left in the dataset. While the generated ads are no longer offensive, they are also of much lower quality than ads without queer identity terms.

As a workaround, the team communicates that customers should use prompts such as “Two women at breakfast” or “Two men baking a cake,” and replace prompts like “trans woman” with “woman holding a trans flag.” The team then bolsters the training dataset with stock photos of queer people, which have to be bought rather than scraped and then manually annotated by crowd workers, adding to the price tag. After that the model’s performance gets a little better. But design oversights are not easily remedied on the fly: the solution comes too late, and the public image of the company is damaged.
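The knee-jerk fix might look something like the sketch below; the blocked-term list, the refusal message, and the placeholder generation call are all hypothetical. It makes the offensive outputs impossible only by making queer prompts impossible, which is exactly why it can never be more than a stopgap.

```python
BLOCKED_TERMS = ("gay", "lesbian", "trans", "queer")

def handle_prompt(prompt: str) -> str:
    """Reject any prompt that mentions a blocked identity term."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "Error: this prompt is not supported."   # canned refusal
    return f"[image generated for: {prompt}]"            # stand-in for the real model call

print(handle_prompt("Lesbians at breakfast"))   # refused outright
print(handle_prompt("Two women at breakfast"))  # accepted, but the identity is erased
```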

To prevent future mishaps like these, the company contacts a local queer organization that teaches diversity workshops. Together with the development team they come up with new design and testing criteria. They take an intersectional view of the problem too, making sure that other axes of discrimination are addressed. Armed with this new knowledge, the development team sets about disseminating it to other teams within the company, running peer-led workshops and giving lunchbreak talks. Two developers in particular, Sandja and Tom, take a liking to the topic and become unofficial experts who keep up with new developments and literature. They also join the Slack channel of Queer in AI, an organization that represents queer people who work in AI and that is an active forum of exchange. Community members point them to helpful literature and upcoming lectures or workshops.

A few months later, ExampleCorp plans to start working on a tool that translates video subtitles in real time. Lee’s team is tasked with implementing it. Lee still vividly remembers the diversity workshop. He is also a big fan of the Netflix series One Day at a Time, which features a non-binary character, Syd. Having learned Italian in college, Lee wonders how Syd’s pronouns would be translated into Italian, where there are no established gender-neutral pronouns. How can the team make sure the automatic translation does not misgender non-binary people?

Visit the full article here.




More like this:

‘What I Think about When I Type about Talking’

A new prototype system addresses the limitations of language prediction and retrieval features found in current AAC devices.

We present the prototype GenieTalk system as a lens through which to study these barriers, discussing five key themes: intuition, uncertainty, mode switching, the nature of conversation, and authorship.

The Importance of Distrust in Trusting Digital Worker Chatbots

Research shows that while trust is important, anthropomorphism counts more in the decision to hire an AI agent.

Trust and distrust are distinct: Trust encourages AI adoption, but distrust is a separate construct with unique effects on user experience. Managing both is key to evaluating AI adoption in professional settings.

Interactions - All About HCI:

Multilaboratory Experiments Are the Next Big Thing in HCI

To address these potential risks to public trust in HCI research, or ideally to prevent them from arising in the first place, the next logical step for HCI is multilaboratory experiments.

Where Is ‘Spatial’ in Spatial Design?: How Design in the Age of Spatial Computing Can Leverage Paradigms from Physical Spatial Design

There is an opportunity to integrate HCI research with architectural paradigms to create actionable concepts around embodiment, intimacy gradients, and ambient transitions.


Discover our past editions:

Preprinting in AI Ethics: Toward a Set of Community Guidelines

Investigating Research Software Engineering

"I Was Wrong about the Ethics Crisis"

Notice and Choice Cannot Stand Alone

Reevaluating Google’s Reinforcement Learning for IC Macro Placement

AI Should Challenge, Not Obey

The Myth of the Coder

Summer Special Issue: Research and Opinions from Latin America

New Metrics for a Changing World

Do Know Harm: Considering the Ethics of Online Community Research

Now, Later, and Lasting

The Science of Detecting LLM-Generated Text

Can Machines Be in Language?


Enjoyed this newsletter?

Subscribe now to keep up to speed with what's happening in computing. The articles featured in this edition are from CACM, ACM's flagship magazine on computing and information technology, and Interactions, a home for research related to human-computer interaction (HCI), user experience (UX), and all related disciplines. If you are new to ACM, consider following us on X | IG | FB | YT | Mastodon | Bsky. See you next time!



Tinotenda Makuza

The state of the datasets that are available is what really throws a spanner in the works. Also, I am inclined to cut the AI some slack, as even we humans innocently make some of these mistakes, like misappropriating terms or simply not knowing.
