Copyright © 2017 Information Systems Audit and Control Association, Inc. All rights reserved.
Andrew Clark, IT Auditor / Internal Audit Data Scientist
Astec Industries, Inc., M.S. Data Science Candidate
Overview
1. What is open source software?
2. Why is it important?
3. What are the benefits of using open source software for analytics over
CAATs?
4. How do I begin using open source software for analytics?
5. Case study
6. The application of advanced analytic techniques
Meet Open Source
Open Source Software
“Open source software is software whose source code is available for
modification or enhancement by anyone.”
"What Is Open Source?" Opensource.com. Accessed June 12, 2016. https://opensource.com/resources/what-open-source.
Open Source examples
1. Linux (mainly)
2. Android (mainly)
3. Firefox
4. R programming language
5. Git
6. Docker
Why is it important?
• Vibrant community
• Frequent updates
• Potential for strong security
• Cutting edge technology
• Customizable
• Cost
How does Open Source relate to Audit Analytics?
• State of the art technology
• Computer science's best and brightest love to contribute
• Customizable
• Scalability
• Beautiful visualizations
• Analytics and data science leaders, e.g. Google, Facebook, Uber, and Airbnb,
use open source frameworks almost exclusively for their analytics.
"Bubble Charts." Plotly. Accessed August 14, 2016. https://plot.ly/python/bubble-charts/.
Benefits over traditional CAATs
• ACL, IDEA, and Arbutus are the existing market leaders among CAATs
• Not very user friendly
• Require extensive training to use effectively
• Not very flexible
• Do not provide the output auditors expect
So what do we do about it?
Enter Python (and R)
What is Python?
"About Python." Python.org. Accessed August 14, 2016. https://www.python.org/about/.
• Open source, general purpose programming
language
• High level of support
• Used by some of the best and brightest in
Data Science
• Extensive scientific, mathematical,
data wrangling and visualization libraries
• Most popular first language in computer
science departments across America
(http://tinyurl.com/knw5mdv)
What is R?
• "R is a language and environment for statistical computing and graphics." -
"What Is R?" The R Project for Statistical Computing. Accessed August 14, 2016. https://www.r-project.org/about.html.
• Used widely by statisticians for statistical analysis
• As a result of its widespread use, there are thousands of easy-to-implement
libraries that provide *all* widely used statistical techniques
• Designed for statistics rather than as a general-purpose programming language
How would we go about using Python (or R)?
• The hard way: by learning it
• The even harder way: hire an auditor with programming, analytics and
auditing experience
• The *easiest* and most effective way: create a cross-functional team by
borrowing a programmer from IT and a business analyst from the
business.
Example Python (and R) analytic test
• https://github.com/aclarkData/AuditAnalytics
• Journal entry tests: 999 amounts, weekend postings, and keywords
• Steps:
• Import libraries
• Import data
• Wrangle as needed
• Export to folder
• Email
• Schedule - Task Scheduler on Windows; cron or an equivalent on Unix-based systems, e.g. macOS and Linux
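The steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code in the linked repository: the column names, the sample ledger, the keyword list, and the exact meaning of the "999" test (here assumed to be a whole-dollar amount ending in 999) are all assumptions made for the example.

```python
import csv
import io
from datetime import date

# Hypothetical sample ledger; a real run would read an exported CSV file.
SAMPLE_CSV = """entry_id,posted_on,amount,description
1,2017-03-03,1250.00,Monthly rent accrual
2,2017-03-04,4999.00,Adjust inventory per manager
3,2017-03-06,999.99,Office supplies
4,2017-03-07,310.45,Plug variance to balance
5,2017-03-12,87.10,Freight
"""

# Assumed keyword list; tailor it to your own audit program.
KEYWORDS = ("adjust", "plug", "per manager")


def load_entries(text):
    """Import data: parse CSV text into a list of typed dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["posted_on"] = date.fromisoformat(row["posted_on"])
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows


def flag_entry(row):
    """Return the names of every test this entry trips."""
    flags = []
    # Assumed rule: the whole-dollar part of the amount ends in 999.
    if int(abs(row["amount"])) % 1000 == 999:
        flags.append("999_amount")
    # weekday() returns 5 for Saturday and 6 for Sunday.
    if row["posted_on"].weekday() >= 5:
        flags.append("weekend")
    if any(word in row["description"].lower() for word in KEYWORDS):
        flags.append("keyword")
    return flags


def run_tests(text):
    """Wrangle: keep only flagged entries, with their flags attached."""
    results = []
    for row in load_entries(text):
        flags = flag_entry(row)
        if flags:
            results.append({"entry_id": row["entry_id"], "flags": ";".join(flags)})
    return results


def export(results, path):
    """Export to folder: write the exceptions to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["entry_id", "flags"])
        writer.writeheader()
        writer.writerows(results)


exceptions = run_tests(SAMPLE_CSV)
```

Calling export(exceptions, "exceptions.csv") writes the flagged entries to a file (the file name is a placeholder). The email step could use Python's built-in smtplib module, and the whole script can then be run on a schedule with Task Scheduler or cron.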
Machine Learning
• In essence, a machine learns patterns in data without having to be
explicitly programmed.
• A very powerful technology that is transforming banking, search
engines, and advertising, and soon every industry.
• Examples: credit card fraud detection, targeted demographic advertising, anomalous
sensor data, etc.
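To make "not explicitly programmed" concrete, the sketch below derives its anomaly threshold from historical data instead of hard coding a dollar limit. It is a deliberately simple statistical stand-in for the fraud and anomaly detection examples above, not a production model: the sample amounts are invented, and the modified z-score with a 3.5 cutoff is one common convention chosen for illustration.

```python
from statistics import median


def fit_baseline(amounts):
    """Learn a robust centre and spread (median and MAD) from history."""
    centre = median(amounts)
    mad = median(abs(a - centre) for a in amounts)
    return centre, mad


def is_anomalous(amount, centre, mad, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    if mad == 0:
        # No spread in the history: anything different from the centre stands out.
        return amount != centre
    score = 0.6745 * abs(amount - centre) / mad
    return score > threshold


# Hypothetical historical card charges used to learn the baseline.
history = [102.0, 98.5, 101.2, 99.8, 100.4, 97.9, 103.1]
centre, mad = fit_baseline(history)

# New charges to score against the learned baseline.
new_charges = [100.9, 96.0, 250.0]
flags = [is_anomalous(x, centre, mad) for x in new_charges]
```

Only the 250.0 charge is flagged. Retraining on a different history moves the threshold without any change to the code, which is the point of the first bullet.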
Machine Learning Cont.
• Numerous possibilities for utilizing machine learning and related
technology, e.g. Natural Language Processing, for financial auditing
• For example, an unsupervised clustering algorithm is in use at Astec Industries.
• The latest developments are only available in open source software or in
expensive statistical or computational programs such as SAS, which
currently runs at a minimum of $9,200 upfront per single-user license plus
annual fees - "SAS® Analytics Pro." SAS®. Accessed August 26, 2016. https://www.sas.com/store/software/analytics-pro/prodPERSANL.html.
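The slides do not say which clustering algorithm is in use, so the following is only a generic illustration of unsupervised clustering: a plain k-means on one-dimensional data, written with the standard library. The invoice amounts and the choice of starting centroids (the smallest and largest values) are assumptions for the example.

```python
def assign(values, centroids):
    """Put each value in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, centroids, max_iterations=50):
    """Plain k-means on one-dimensional data with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(values, centroids)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


# Hypothetical invoice amounts with two natural groups.
invoices = [120, 135, 110, 128, 2400, 2550, 2480, 131]
centroids, clusters = kmeans_1d(invoices, [min(invoices), max(invoices)])
```

No labels are supplied: the algorithm separates routine invoices from the large ones on its own, and an auditor can then review the smaller or more unusual cluster.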
Possibilities
• Time Series Machine Learning for predicting account balances
• Natural Language Processing techniques for contract review and
summarization - the current bottleneck is Optical Character Recognition (OCR)
technology.
• Sentiment Analysis for Journal Entry and Transaction descriptions.
• Jupyter notebooks for reproducible analytics and audit documentation
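As a rough illustration of the first possibility, the sketch below fits a straight-line trend to past month-end balances and compares the next expected balance with the recorded one. A least-squares trend line is the simplest possible forecasting model and stands in here for more capable time series methods; the balances and the recorded figure are invented for the example.

```python
def fit_trend(balances):
    """Ordinary least squares fit of balance against period index 0, 1, 2, ..."""
    n = len(balances)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(balances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, balances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(balances, periods_ahead=1):
    """Extend the fitted trend line beyond the last observed period."""
    slope, intercept = fit_trend(balances)
    return intercept + slope * (len(balances) - 1 + periods_ahead)


# Hypothetical month-end balances for one account (at least two are required).
month_end = [1000.0, 1040.0, 1085.0, 1120.0, 1165.0, 1200.0]
expected = forecast(month_end)

# Hypothetical recorded balance for the following month.
actual = 1510.0
gap = actual - expected
```

A large gap between the recorded and expected balance does not prove an error, but it tells the auditor where to look first.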
Conclusion
• Definition of Open Source Software
• Unlimited possibilities for a customizable analytics experience
• Scalable
• Real world example
• Machine Learning and the future of audit analytics
Thank you!
• Email: andrewtaylorclark@gmail.com
• GitHub: aclarkData
• Blog: https://aclarkdata.github.io
• LinkedIn: www.linkedin.com/in/andrew-clark-b326b767

Where Open Source Meets Audit Analytics - ISACA North America CACS 2017
