USING BAYESIAN NETWORKS TO IDENTIFY AND
PREVENT SPLIT PURCHASES IN BRAZIL
Rommel N. Carvalho, Leonardo J. Sales, Henrique A. da Rocha, and Gilson L. Mendes

Data Science Team Leader / Professor
http://about.me/rommelnc

Department of Research and Strategic Information (DIE) / Department of Computer Science (CIC)
Brazil’s Office of the Comptroller General (CGU) / University of Brasília (UnB)

BMAW workshop @ UAI 2014
Quebec City, Quebec, Canada - 07/27/2014
AGENDA

Introduction

Methodology

Data Understanding and Preparation

Modeling and Evaluation

Deployment

Conclusion
INTRODUCTION
Introduction - Methodology - Data Understanding and Preparation - Modeling
and Evaluation - Deployment - Conclusion
CGU is responsible for inspecting and auditing Brazilian Government projects and programs, providing transparency and preventing corruption

All contracts with the private sector must follow the national procurement law

Goal: select the proposal most advantageous to the government's interest

The process is susceptible to many forms of corruption
CGU’S RESPONSIBILITY

Prevent and detect government corruption

To do that, CGU must gather information from a variety of sources and combine it to evaluate whether further action, such as an investigation, is required

Information explosion

The Growth Acceleration Program (PAC) alone has a budget greater than 250 billion dollars

More than one thousand projects in the state of São Paulo alone
CORRUPTION

A public agent may favor a specific supplier that she happens to know

A public agent may receive financial compensation from a bidder for awarding a contract to that firm

Bidders may collude to set the results of the procurement

Many other schemes exist, originating both within and outside the public administration
PREVENTION VS PUNISHMENT

The law has many articles that establish penalties for firms and/or public legislators caught in corrupt activities

Two types of penalties

Administrative actions

Penal actions

Enforcing the law is difficult

Better to avoid corruption in the first place
PROJECT OBJECTIVE

Apply Data Mining methods to create models that will aid the experts in identifying procurement fraud

If possible, identify suspicious transactions early enough to prevent the fraud from happening
METHODOLOGY
CRISP-DM

CRoss Industry Standard Process for Data Mining
DATA UNDERSTANDING AND PREPARATION
THE DATA

Procurements and contracts for IT services in the Brazilian Federal Government from 2005 to 2010

Merged data from different databases

42 attributes

70,365 transactions
WHAT FOR?*

CGU has been working closely with domain experts and wants to identify certain fraud typologies, such as:

Identify whether owners of different companies are actually partners

Identify split purchases, i.e., whether big contracts are being broken down into small ones (adding up to more than 8,000)

More than 20 other types of fraud to identify

This work focuses on the second one, split purchases

*This is a bit more on business understanding.
PREPARING THE DATA I

Loaded in Excel

Removed some characters: , “ ‘

Replaced missing-value codes (NA, -9, -8, and 0) with ?
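The cleaning on this slide was done by hand in Excel. As an illustration only, the same two operations could be sketched in Python roughly as follows. The attribute names and sample values are made up, and applying the `0` code to every column is an assumption taken literally from the slide; in practice it would likely be restricted to columns where 0 really encodes a missing value.

```python
# Hypothetical sketch: strip problem characters and map missing-value
# codes to "?" (the marker Weka's ARFF format uses for missing values).
STRIP_CHARS = str.maketrans("", "", ",\"'“”‘’")
MISSING_CODES = {"NA", "-9", "-8", "0"}  # codes listed on the slide


def clean_cell(value: str) -> str:
    """Remove commas and quote characters, then replace missing codes by '?'."""
    cleaned = value.translate(STRIP_CHARS).strip()
    return "?" if cleaned in MISSING_CODES else cleaned


def clean_row(row: dict) -> dict:
    """Apply clean_cell to every field of one record."""
    return {name: clean_cell(str(value)) for name, value in row.items()}


# Made-up example row; these attribute names are not the real schema.
raw = {"supplier": "ACME, \"Ltda\"", "state": "NA", "final_price": "-9", "qty": "3"}
cleaned_row = clean_row(raw)
print(cleaned_row)
```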
PREPARING THE DATA II

Removed some attributes

Mostly IDs

E.g., we kept the name of the state instead of its ID because the name is easier to interpret

From 42 to 26 attributes

Saved the data as a CSV file and loaded it in Weka

Changed the year to nominal

Removed rows with missing values in final price
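The steps on this slide were carried out in Excel and Weka. A rough Python equivalent, given only as a sketch, might look like the following; the column names (`state_id`, `final_price`, `year`) and the sample rows are hypothetical.

```python
# Hypothetical sketch: drop ID attributes, treat the year as a nominal
# label, and discard rows whose final price is missing ("?").
def prepare(rows, id_columns, price_column="final_price", year_column="year"):
    prepared = []
    for row in rows:
        if row.get(price_column, "?") == "?":
            continue  # drop rows with a missing final price
        kept = {k: v for k, v in row.items() if k not in id_columns}
        kept[year_column] = str(kept[year_column])  # nominal, not numeric
        prepared.append(kept)
    return prepared


# Made-up rows for illustration.
rows = [
    {"state_id": 35, "state": "SP", "year": 2008, "final_price": "1200.00"},
    {"state_id": 53, "state": "DF", "year": 2009, "final_price": "?"},
]
prepared = prepare(rows, id_columns={"state_id"})
print(prepared)
```

Converting the year to a string mirrors the "nominal" choice: the model then treats each year as a category rather than as a quantity with an order and a distance.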
ANALYZING THE DATA

Typical cases of data that should not be trusted or should be considered incorrect

Values ranging from cents to hundreds of trillions of dollars

Proposed unit price was 4 thousand whereas the final actual price was a dollar

The number of cases with a really high value is small

Analyzed by experts

Remove noise (done for obvious cases)

Identify outliers (still being analyzed by experts)
CHANGING THE DATA

Interested in transactions that sum to more than 8,000 in a given period of time

Created class attribute using R

Irregular* - transactions that involved the same institutions in the same month and year and added up to more than 8,000

Regular - all others

This classification is easy but...

*This is a simplification, since we have restricted our work to IT purchases.
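The class attribute was created in R; the original script is not shown. The labeling rule itself can be sketched in Python as below. Reading "the same institutions" as the pair of buying office and supplier is an assumption, as are the field names and the sample transactions.

```python
from collections import defaultdict

THRESHOLD = 8000  # limit quoted on the slides


def label_transactions(transactions):
    """Label each transaction by the monthly total of its group."""
    def key(t):
        # Assumed grouping: same office, same supplier, same year and month.
        return (t["office"], t["supplier"], t["year"], t["month"])

    totals = defaultdict(float)
    for t in transactions:
        totals[key(t)] += t["value"]
    return ["Irregular" if totals[key(t)] > THRESHOLD else "Regular"
            for t in transactions]


# Made-up transactions for illustration.
transactions = [
    {"office": "A", "supplier": "X", "year": 2008, "month": 3, "value": 5000.0},
    {"office": "A", "supplier": "X", "year": 2008, "month": 3, "value": 4000.0},
    {"office": "A", "supplier": "X", "year": 2008, "month": 4, "value": 7000.0},
    {"office": "B", "supplier": "X", "year": 2008, "month": 3, "value": 8000.0},
]
labels = label_transactions(transactions)
print(labels)
```

The first two transactions total 9,000 within one month and are labeled Irregular; the last one reaches exactly 8,000, which is not "more than 8,000", so it stays Regular.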
MODELING AND EVALUATION
...THIS IS HARD!!!

The idea is to classify, without looking at the sums, which transactions should be considered suspicious

Avoid waiting for the problem to occur

Prevention over punishment

Identify the problem as soon as possible

Find suspicious transactions within days (proactive)

Not after a month (reactive)
BACK TO PREPARATION
NAÏVE BAYES VS BAYES NET (K2) - WITH AND WITHOUT RESAMPLING
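As background for the baseline named in this slide title, here is a minimal sketch of a categorical Naive Bayes classifier with Laplace (add-one) smoothing. It is an illustration only, not the Weka implementation used in the study, and the attribute names and training rows are made up.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> values seen in training
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)  # class prior
        for attr, value in row.items():
            count = value_counts[(attr, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            log_p += math.log((count + 1) / (n + len(domains[attr])))
        scores[label] = log_p
    return max(scores, key=scores.get)


# Made-up training data.
rows = [
    {"office": "A", "modality": "direct"},
    {"office": "A", "modality": "direct"},
    {"office": "B", "modality": "bid"},
    {"office": "B", "modality": "bid"},
]
labels = ["Irregular", "Irregular", "Regular", "Regular"]
model = train_nb(rows, labels)
prediction = predict_nb(model, {"office": "A", "modality": "direct"})
print(prediction)
```

Naive Bayes assumes the attributes are independent given the class. A Bayesian network learned with a search such as K2 relaxes that assumption by allowing attributes to have other attributes as parents.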
CHANGING ALGORITHMS AND NUMBER OF PARENTS
CAN IT GET BETTER?
UNDERSTANDING WHY I

By analyzing the structure of the BN we were able to identify some interesting relations involving

The government office responsible for the procurement

The winner of the procurements

Due to the confidential nature of the problem we are not allowed to discuss these relations further

I can say that more analysis is needed

Still to be done due to time constraints
UNDERSTANDING WHY II

Tried to find out whether something obvious in the data made it easy to predict

Used Apriori with min confidence .1 and min lift 1.1, generating 50 rules

Nothing interesting in the first run (confidence 1 and lift 1.82)

Removed uninteresting attributes

Still nothing interesting (now with highest confidence of .62 and lift of 1.11)
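For readers unfamiliar with the two thresholds, the confidence of a rule A -> C is support(A and C) / support(A), and its lift is that confidence divided by support(C). A small sketch of both measures follows; it computes the metrics for one given rule and is not an Apriori implementation. The item names and records are made up.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (confidence, lift) of the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)
    n_c = sum(1 for r in records if consequent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    confidence = n_both / n_a
    lift = confidence / (n_c / n)
    return confidence, lift


# Made-up records, each a set of attribute=value items.
records = [
    frozenset({"office=A", "class=Irregular"}),
    frozenset({"office=A", "class=Irregular"}),
    frozenset({"office=A", "class=Regular"}),
    frozenset({"office=B", "class=Regular"}),
]
confidence, lift = rule_metrics(records, {"office=A"}, {"class=Irregular"})

# Thresholds quoted on the slide.
passes = confidence >= 0.1 and lift >= 1.1
print(round(confidence, 3), round(lift, 3), passes)
```

A lift close to 1 means the antecedent barely changes the probability of the consequent, which is why a rule with confidence .62 and lift 1.11 carries little information.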
DEPLOYMENT
BACK TO THE GOAL

Identifying suspicious transactions as soon as possible to avoid the occurrence of fraud

What kind of fraud?

The type of fraud we are concerned with in this project is split purchases, i.e., when a procurement that should be done just once with a value higher than 8,000 is broken down into smaller ones to avoid the normal procurement procedure, allowing the office to hire an enterprise directly
THE BIG IDEA

Detect one or more of these procurements while they still do not sum to more than 8,000, as soon as they become suspicious (using the classifier), and act proactively

Respond faster and prevent irregularities in the first place

Warn and educate the people involved that they should be careful, since breaking down big procurements into small ones is not allowed
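One way to picture this deployment idea is a monitor that sees transactions as they arrive and raises a warning before the monthly sum is exceeded. The sketch below is hypothetical: the `EarlyWarning` class, the grouping key, and especially the stand-in `is_suspicious` rule are assumptions for illustration, not the trained classifier or CGU's system.

```python
from collections import defaultdict

THRESHOLD = 8000  # limit quoted on the slides


class EarlyWarning:
    """Tracks monthly totals and asks a classifier about each new transaction."""

    def __init__(self, is_suspicious):
        self.is_suspicious = is_suspicious  # stand-in for the trained classifier
        self.totals = defaultdict(float)

    def observe(self, t):
        key = (t["office"], t["supplier"], t["year"], t["month"])
        self.totals[key] += t["value"]
        if self.totals[key] > THRESHOLD:
            return "irregular"  # the sum-based rule already applies
        if self.is_suspicious(t):
            return "warn"       # act before the limit is exceeded
        return "ok"


# Toy stand-in rule, NOT the real classifier.
monitor = EarlyWarning(is_suspicious=lambda t: t["value"] >= 3000)

stream = [
    {"office": "A", "supplier": "X", "year": 2008, "month": 3, "value": 3500.0},
    {"office": "A", "supplier": "X", "year": 2008, "month": 3, "value": 1000.0},
    {"office": "A", "supplier": "X", "year": 2008, "month": 3, "value": 4000.0},
]
results = [monitor.observe(t) for t in stream]
print(results)
```

The "warn" outcome is the proactive case described on the slide: the office can be contacted while the total is still below the limit.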
THE CONSEQUENCES

Make those who do it on purpose think twice

Educate those who just don’t know better

Expect a decrease in the number of irregular transactions associated with this type of fraud

Important to keep track of statistics to verify this expectation

As people learn and change their behavior, our classifier is also expected to need changes to cope with these modifications
CONCLUSION
CONCLUSIONS

Data Mining can be used to identify and prevent procurement fraud in Brazil

Prioritize depth instead of breadth

Results paid off

Correctly classified all split purchases

Really high ROC area (.999)

High accuracy (99.197%)
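The two figures quoted above are standard evaluation measures. As a reminder of what they compute, here is a small sketch on made-up labels and scores; it does not reproduce the reported numbers. The ROC area is computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data: 1 = Irregular, 0 = Regular.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, round(auc, 4))
```

Accuracy depends on the chosen decision threshold (0.5 here), while the ROC area summarizes ranking quality across all thresholds, which is why the two are reported together.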
FUTURE WORK

Deploy this classifier in CGU’s intelligence system and analyze its impact

Better understand why the model is capable of getting such good results

By carefully analyzing the structure of the generated BNs and the variables’ CPTs

Be careful with overfitting

Many other types of fraud still to be analyzed
THANK YOU!
