© 2021 Perceive
Facing Up To Bias
Steve Teig
Perceive
The concerning state of face recognition (FR)
[Images of news coverage: New York Times; Khan Academy]
Backlash and emerging legislation
Discrimination is pervasive…
Discrimination is pervasive… but not the whole story
• Training a neural network (typically) minimizes a loss function
• Near-universal loss function: expected value – i.e., the average – of the error
• E.g., cross-entropy H(p,q) = −E_p[log q] = average over p of −log q
• Suppose our FR training set has 10,000 white faces and 100 black faces
• Error_W = average error on white faces; Error_B = average error on black faces
• Total error is proportional to 10,000 * Error_W + 100 * Error_B
• Yup. Average error penalizes errors on white faces 100x as much as errors on black faces! (See the sketch below.)
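To make the arithmetic concrete, here is a minimal NumPy sketch. The per-example error values are hypothetical, chosen only to illustrate the 10,000-vs.-100 imbalance above:

```python
import numpy as np

# Hypothetical per-example errors: the model is far worse on the
# underrepresented group, yet the plain average barely notices.
err_white = np.full(10_000, 0.10)  # 10,000 white faces, low error each
err_black = np.full(100, 1.00)     # 100 black faces, high error each

avg = np.concatenate([err_white, err_black]).mean()
print(f"average error: {avg:.4f}")  # ~0.1089
# Halving the black-face error (1.00 -> 0.50) improves the average by
# only 100 * 0.50 / 10_100 ≈ 0.005, so gradient descent has almost no
# incentive to do it; halving the white-face error improves it by ~0.05.
```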
Of course, the trained model does better on white faces!
• Total error ∝ 100 * Error_W + 1 * Error_B
• Average error penalizes errors on white faces 100x as much as errors on black faces!
• Model compression makes this problem even worse
• Quantize the network, sparsify the network, etc.
• If the network must jettison some information to fit, the rare faces’ information is the cheapest (by average error) to discard…
Why “balancing” the dataset won’t fix this
For experts: why (naïve) GANs won’t fix this either
• GAN: Generative Adversarial Network
• Generates synthetic data points that are hard to distinguish from real data points
• Can’t we use GANs to add more representative, interesting examples to the dataset?
• Yes, but…
• Mainstream GANs optimize only for “this datum looks as though it came from the original dataset”
• What if synthetic, clean-shaven faces are easier to generate than bearded ones?
• What if white faces are easier to generate than black ones?
• More bias ☹
How much influence should one image have?
[Figure: two contrasting face images, one vs. the other]
Can we enable some images to have more influence?
• In today’s deep learning, each datum appears only once per epoch during training
• Loss L = (1/N) Σ_d error(d) → (1/N) Σ_d mass(d) · error(d), where Σ_d mass(d) = N
• Typically, mass(d) = 1 for all d → average error
• What if we increase the mass of some data points vs. others?
• Mr. Muttonchops gets mass k, where all other data points get mass (N − k)/(N − 1)
• Gradient pushes k times as hard on Mr. M
• Sounds reasonable, right?
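A minimal sketch of this remassed loss (a hypothetical helper in PyTorch terms, not Perceive’s implementation):

```python
import torch

def remassed_loss(per_example_error: torch.Tensor,
                  mass: torch.Tensor) -> torch.Tensor:
    """L = (1/N) * sum_d mass(d) * error(d), with sum_d mass(d) = N.

    mass = 1 everywhere recovers the ordinary average error.
    """
    n = per_example_error.numel()
    mass = mass * (n / mass.sum())  # renormalize so masses sum to N
    return (mass * per_example_error).mean()

# Mr. Muttonchops (index 0) gets mass k = 50; the other N - 1 points
# share the remainder, (N - k)/(N - 1) each, so the masses sum to N.
N, k = 10_100, 50.0
mass = torch.full((N,), (N - k) / (N - 1))
mass[0] = k
loss = remassed_loss(torch.rand(N), mass)
```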
Nope. Making some gradients bigger is a bad plan
[Figure: learning rate]
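Why this is a bad plan, in one line of algebra (the original figure is not reproduced here, but the next slide confirms the learning-rate framing): a vanilla SGD step on a datum d with mass k is

w ← w − η ∇_w[k · error(d)] = w − (kη) ∇_w error(d)

so giving d mass k is indistinguishable from training on d with learning rate kη, and an oversized effective learning rate causes exactly the overshooting and instability that careful learning-rate tuning exists to avoid.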
A new idea: repeated selection vs. higher “learning rate”
• If d’s relative mass = k, include d (once) in each of ~k minibatches of each epoch (see the sampler sketch after this list)
• Look at d more than once per epoch (in different local contexts)
• Now, d’s learning rate is the same as others’, but…
• d moves ~k times as far per epoch
• Wait a minute! How should we compute the mass of each datum?
• Loss_d quantifies d’s distance from happiness (happiness: Loss_d = 0)
• Lots of papers advocate Loss_d as relative importance…
• Gradient_d quantifies d’s current velocity on the path to happiness
• Lots of papers advocate Gradient_d as relative importance…
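A minimal sketch of repeated selection (hypothetical helper; the stochastic rounding of fractional masses is an assumption): instead of scaling d’s gradient, place d in ~k minibatches per epoch:

```python
import random

def remassed_epoch(num_data: int, mass, batch_size: int):
    """Yield minibatches in which datum i appears ~mass[i] times per epoch.

    mass[i] = 1 for all i reduces to ordinary one-pass shuffling.
    """
    pool = []
    for i in range(num_data):
        m = mass[i]
        # Round stochastically so the expected count equals mass[i].
        copies = int(m) + (random.random() < m - int(m))
        pool.extend([i] * copies)
    random.shuffle(pool)
    # Note: a plain shuffle can occasionally put two copies of the same
    # datum in one batch; the talk's scheme keeps them in distinct batches.
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]

# Datum 0 has relative mass 5, so it is visited ~5 times per epoch,
# each time in a different local context (minibatch).
mass = [1.0] * 1000
mass[0] = 5.0
for batch in remassed_epoch(1000, mass, batch_size=32):
    pass  # train on this batch
```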
Why loss and gradient are poor choices for mass
[Figure: four data points A, B, C, D compared by loss and gradient]
A new idea: “time to happiness”
• Distance = rate * time → Time = distance / rate
• T_d = Loss_d / Gradient_d
• Want every data point to achieve happiness at (roughly) the same time
• Otherwise, either stop before every data point is happy, …
• Or wait for eons
• Make each datum’s mass be equal to its time to happiness
• Datum with more “work to do” gets more time to do it
• Time to happiness is a better criterion than Loss_d or Gradient_d (a sketch follows below)
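A sketch of computing time-to-happiness masses (all names hypothetical; using the overall gradient norm as the “rate” is an assumption, since the talk does not specify the norm):

```python
import torch

def time_to_happiness_mass(model, loss_fn, xs, ys, eps=1e-8):
    """mass(d) ∝ T_d = Loss_d / ||Gradient_d||, rescaled so Σ_d mass(d) = N."""
    times = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()  # per-example gradient
        grad_sq = sum((p.grad ** 2).sum()
                      for p in model.parameters() if p.grad is not None)
        times.append(loss.item() / (grad_sq.sqrt().item() + eps))
    t = torch.tensor(times)
    return t * (len(t) / t.sum())  # normalize: masses sum to N
```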
Remassing is powerful
• Optimizes worst-case accuracy, rather than average accuracy
• No customer really cares about average accuracy, yet everybody optimizes that!
• “Accuracy: Beware of Red Herrings and Black Swans” – Embedded Vision 2020
• But wait! There’s more!
• Remassing can massively accelerate training
• Focus optimization effort on points with the most work to do
• Most data points resemble other data points: get optimized “for free”!
Facing up to bias
• Remassing optimizes worst-case accuracy, not average accuracy
• Treats rare data points and common data points as equally important
• Treats rare (explanatory) features and common features as equally important
• Remassing addresses a major source of observed bias in face recognition
Resources
• “TinyML Is Not Thinking Big Enough” (talk), 2021 Embedded Vision Summit
• Remassing based on gradient direction: https://arxiv.org/pdf/1803.09050.pdf
• Remassing based on loss: https://arxiv.org/pdf/1511.06343.pdf
• Perceive: https://www.perceive.io