Danish Business Authority: Explainability and causality in relation to ML Ops

Graph Usage for Fraud Detection and Bias Mitigation

Danish Business Authority (DBA)
●
Business registrations
●
Central Business Registry (CVR)
●
Fiscal report audits
●
Business support schemes
(e.g. Covid-19, IT security, etc.)
●
Legal oversight & control
ML-Lab IKP

Why Machine Learning
• Make it easy to be a law-abiding company
AND: Make it hard to swindle
●
~800,000 companies in Denmark: impossible to check everything by hand
• Focus efforts where most needed
• Requires data, infrastructure and software tools

Intelligent Kubernetes Platform
Four Main Components:
●
Kubernilla: Vanilla version of Kubernetes, highly opinionated
●
RaceTrack: Deployment system (www.github.com/theracetrack)
●
CatWalk: Evaluation component
●
RecordKeeper: Platform wide system event logger
Plus: Data Warehouse (PostgreSQL, Neo4j)
Development:
Idempotent system design
Infrastructure as Code
One source of truth

Knowledge Graph (PostgreSQL → Neo4j)
●
CVR (businesses, people, addresses, etc.)
●
DBA Cases
●
Fiscal Reports
...and much more ...
●
Labels: 50
●
Relationship types: 41
●
Node Properties: 237
●
Nodes: 445 million
●
Edges: 688 million
→ Forms basis for ML efforts

Example: Meta Graph
apoc.meta.graph()

data
Registry data + metadata + observations

[Figure labels: data, metadata, Machine learning, Group, Shared Client]

●
Automatic control of new data
●
Exploits what we already know
●
Uses machine insights

ML at the Business Authority
●
All decisions made by humans – ML in supporting role

Pitfalls
●
ML: it is easy to do something
→ but also extremely easy to do it wrong
●
Any ML model reflects its training data
●
ML is only as strong as the data

Doing it wrong: Unethical AI
United States: Repeat criminal offenders
●
Guided prison sentence lengths
●
Biased against people of color
Netherlands: Child care benefits fraud
●
Tens of thousands of families affected
●
Many low-income families
●
Many pushed into poverty
●
Several suicides
●
Government resigned

Motivation / Bias
●
Build fair & ethical models
●
EU: Artificial Intelligence Act
(https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206)
●
EU: GDPR
(https://eur-lex.europa.eu/eli/reg/2016/679/oj)
[Figure: data ‘landscape’: used data, known unknown, unknown unknown]

●
Challenge: Follow the data trail, explain the origin of knowledge and conclusions
●
Our answers: RecordKeeper & X-Rai framework
[Transparent, Responsible, Explainable AI: https://pure.itu.dk/en/publications/x-rai-a-framework-for-the-transparent-responsible-and-accurate-us]
[Figure: data ‘landscape’: used data, known population, unknown population]

ML at the Business Authority
●
Need for complete traceability
Traceability need

Flow
●
Describes a quantity traversing a network
●
E.g. traffic, railways, water pipes
●
Knowledge graph: springs, pipes and sinks
https://yoshuabengio.org/2022/03/05/generative-flow-networks/

Example: Meta Graph
apoc.meta.graph()
Capture causal flow, e.g. between ML-Model, Query and Output
RecordKeeper: System Event Logger
●
Server / Client system, Python
●
Passive component: Listening only
●
Platform Event Message (PEM):
●
One action on the cluster, Unique ID
●
Emitter ID
●
Predecessor ID known
●
Artifacts: Data references
●
Builds graph of PEMs and Artifacts
→ Facilitates explainability on the cluster
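
The PEM fields listed above (unique ID, emitter ID, predecessor ID, artifact references) can be pictured with a small sketch. This is not RecordKeeper's actual schema or API: the class names, field names and the predecessor check are hypothetical and only illustrate how a listening-only logger could store such messages.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlatformEventMessage:
    """Hypothetical PEM layout; field names are assumptions, not RecordKeeper's schema."""

    emitter_id: str                     # component that performed the action
    action: str                         # the single cluster action being recorded
    predecessor_id: str | None = None   # PEM that led to this one, if any
    artifacts: tuple[str, ...] = ()     # references to data, not the data itself
    pem_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique ID


class EventLog:
    """Passive store: it only records what emitters send to it."""

    def __init__(self) -> None:
        self._pems: dict[str, PlatformEventMessage] = {}

    def receive(self, pem: PlatformEventMessage) -> None:
        # Assumption: a predecessor must already be known to the log.
        if pem.predecessor_id is not None and pem.predecessor_id not in self._pems:
            raise ValueError(f"unknown predecessor {pem.predecessor_id}")
        self._pems[pem.pem_id] = pem

    def get(self, pem_id: str) -> PlatformEventMessage:
        return self._pems[pem_id]
```

Because every message carries its predecessor's ID, the stored messages already form the edges of a graph; no extra bookkeeping is needed to link them.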

PEM Directed Acyclic Graphs (DAG)
●
Each event creates a PEM
●
PEMs can create or reference artifacts
[Figure: Components (emitters): Data ingest, Data Warehouse, Model Training. DAG: PEM 1, PEM 2, PEM 3. Artifacts: reference the main knowledge graph.]
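
A PEM DAG of this kind can be walked backwards to answer "which events and data led to this result?". The sketch below assumes a simple chain of three PEMs; the IDs, the emitter-to-PEM mapping and the artifact names are invented for illustration and are not taken from the platform.

```python
from collections import deque

# Hypothetical PEM records: id -> (emitter, predecessor ids, artifact references)
PEMS = {
    "pem-1": ("data-ingest", [], ["raw/cvr-dump"]),
    "pem-2": ("data-warehouse", ["pem-1"], ["graph/Company"]),
    "pem-3": ("model-training", ["pem-2"], ["model/v1"]),
}


def lineage(pem_id, pems):
    """Return all upstream PEM ids of pem_id, nearest first (breadth-first)."""
    seen, order = set(), []
    queue = deque(pems[pem_id][1])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(pems[current][1])
    return order


def upstream_artifacts(pem_id, pems):
    """Collect artifact references of pem_id and of all its ancestors."""
    refs = []
    for pid in [pem_id, *lineage(pem_id, pems)]:
        for ref in pems[pid][2]:
            if ref not in refs:
                refs.append(ref)
    return refs


print(lineage("pem-3", PEMS))             # ['pem-2', 'pem-1']
print(upstream_artifacts("pem-3", PEMS))  # ['model/v1', 'graph/Company', 'raw/cvr-dump']
```

The `seen` set keeps the traversal correct when a PEM has several predecessors that share an ancestor, which a DAG allows.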

Flow Networks
●
Edges as ‘action paths’
●
Probability representations
●
Inspired by Bengio et al.: [https://arxiv.org/abs/2106.04399v2]
[Flow software package: https://github.com/GFNOrg/gflownet]
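
One way to read "edges as action paths" with "probability representations" is the flow-network convention in which the probability of taking an action from a state is that edge's share of the state's total outgoing flow. The sketch below uses plain dictionaries rather than the gflownet package, and the node names and flow values are made up.

```python
from collections import defaultdict


def action_probabilities(edge_flows):
    """Normalise outgoing flow per state: P(a | s) = F(s, a) / sum over a' of F(s, a')."""
    totals = defaultdict(float)
    for (state, _action), flow in edge_flows.items():
        totals[state] += flow
    return {
        (state, action): flow / totals[state]
        for (state, action), flow in edge_flows.items()
        if totals[state] > 0
    }


# Hypothetical flows; each action is named after the node it leads to.
flows = {
    ("data", "model_a"): 3.0,
    ("data", "model_b"): 1.0,
    ("model_a", "user"): 3.0,
}
probs = action_probabilities(flows)
print(probs[("data", "model_a")])  # 0.75
```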

Explainability & Bias detection
●
Trace out data usage
●
PageRank for node importance
●
Bias detection
– at training and runtime
– sink scores
[Figure labels: user, ML-models, data]
$S_s = \sum_{a'} F(s, a') - \sum_{a} F(s, a)$
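
The sink score can be computed directly from edge flows. The sketch assumes that the first sum runs over flows entering node s and the second over flows leaving it, so a positive score marks a node that absorbs more than it passes on; the slide does not spell out the summation indices, so this reading is an assumption. Node names and flow values are invented.

```python
from collections import defaultdict


def sink_scores(edge_flows):
    """edge_flows maps (source, target) -> flow; score = inflow - outflow per node."""
    scores = defaultdict(float)
    for (source, target), flow in edge_flows.items():
        scores[target] += flow   # flow arriving at the target
        scores[source] -= flow   # flow leaving the source
    return dict(scores)


# Hypothetical flows: the model receives 5 units of data but passes on only 3.
flows = {("data", "model"): 5.0, ("model", "user"): 3.0}
scores = sink_scores(flows)
print(scores["model"])  # 2.0
```

Under this reading, pure sources score negative, pure sinks score positive, and a node that forwards everything it receives scores zero.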

●
Data-driven insights for explainability, model retirement or re-training
[Figure labels: service, Consumer, ML-models, data, ML-score]
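
PageRank for node importance can be illustrated with a plain power iteration. This is a generic textbook version in pure Python, not the implementation used on the platform or in Neo4j; the graph, the damping factor of 0.85 and the iteration count are assumptions chosen for the example.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical usage graph: data feeds two models, both serve one user.
edges = [
    ("data", "model_a"),
    ("data", "model_b"),
    ("model_a", "user"),
    ("model_b", "user"),
]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # user
```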

Idea: Meta Model
●
Reward: ML-Score
●
Train Graph Neural Network
●
Learn flow structure
●
Meta Tensor Model across data, actions and scores
[Figure labels: user, ML-models, data, ML-score]

Closing Remarks
●
Knowledge Graphs facilitate ML-efforts at Danish Business Authority
●
Focus on Transparent, Responsible and Explainable AI (X-Rai)
●
RecordKeeper generates causal knowledge graphs
(explainability, bias mitigation, flow tensor models)
●
Open-sourcing main components:
RaceTrack, an adaptable launch system, is already publicly available at
https://github.com/theracetrack

●
Creates artifacts
●
RK plugin
●
Model calls

Graph Test Example
●
Unused Nodes

Doing it wrong
●
Great Britain: Student Grade Assignment
●
Hundreds of thousands of students affected
●
Lower grades prevented admission to further education
