Data Transformation Workflow
(from GSheet to Jupyter)
July 2019, @DinisCruz
OSBot (OWASP Security Bot)
Start Jupyter Server
Jupyter Lab Dev Environment
Step 1
Loading Data from GSheet
Original Data Set
Viewing Data in Slack
Viewing data in Jupyter
Pandas DataFrame
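A minimal sketch of how sheet rows could end up in a Pandas DataFrame. The rows, column names, and the `rows_to_dataframe` helper are hypothetical (the deck does not show its loading code); the sketch only assumes the sheet arrives as a list of rows whose first row is the header.

```python
import pandas as pd

# Hypothetical rows, shaped like the list-of-lists a spreadsheet export often produces.
raw_rows = [
    ["Name", "Email"],
    ["Alan Lee", "alan@example.com"],
    ["Bruno Lyon", "bruno@example.com"],
]

def rows_to_dataframe(rows):
    """Treat the first row as the header and the remaining rows as data."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)

df = rows_to_dataframe(raw_rows)
print(df)
```

In a notebook, leaving `df` as the last expression of a cell renders it as a table instead of printing plain text.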
Jupyter QGrid
All code and data is auto-saved in Git
Step 2
Parsing dataset
Explaining what we are doing in a Jupyter Notebook
Powerful development environment
Starting refactoring code into methods
Get 3 datasets as separate DataFrames
Helper function to transform raw data into objects
Refactored code that extracts data into 3 DataFrames
3 DataFrames with data
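The refactoring described in this step might look roughly like the sketch below. It assumes, purely for illustration, that the raw sheet holds three tables separated by blank rows; the deck does not state the real layout, and every function name, column name, and value here is hypothetical.

```python
import pandas as pd

# Hypothetical raw sheet: three small tables separated by empty rows.
raw_rows = [
    ["Name", "Email"],
    ["Alan Lee", "alan@example.com"],
    [],
    ["First", "Last", "Team"],
    ["Alan", "Lee", "Security"],
    [],
    ["Email", "Role"],
    ["alan@example.com", "Engineer"],
]

def split_blocks(rows):
    """Group consecutive non-empty rows into blocks, using empty rows as separators."""
    blocks, current = [], []
    for row in rows:
        if any(str(cell).strip() for cell in row):
            current.append(row)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def block_to_dataframe(block):
    """Helper that turns one raw block (header row + data rows) into a DataFrame."""
    header, *body = block
    return pd.DataFrame(body, columns=header)

def extract_dataframes(rows):
    """Return one DataFrame per block found in the raw rows."""
    return [block_to_dataframe(block) for block in split_blocks(rows)]

df_1, df_2, df_3 = extract_dataframes(raw_rows)
```

Keeping the parsing in small functions like these makes each step easy to rerun and inspect in its own notebook cell.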
Step 3
Merging data
Example 1 - Merge 3 datasets
All fields from all 3 data sets
(in this case there was only one exact match on email)
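As an illustration of a three-way merge on email, here is a sketch with made-up data in which only one email appears in all three DataFrames. The column names and values are assumptions, not the deck's data set; `DataFrame.merge` defaults to an inner join, which is why only the exact match survives.

```python
import pandas as pd

# Hypothetical data: only bruno@example.com is present in all three frames.
df_1 = pd.DataFrame({
    "Name": ["Alan Lee", "Bruno Lyon"],
    "Email": ["alan@example.com", "bruno@example.com"],
})
df_2 = pd.DataFrame({
    "Email": ["bruno@example.com", "carla@example.com"],
    "Team": ["Security", "Platform"],
})
df_3 = pd.DataFrame({
    "Email": ["bruno@example.com", "dan@example.com"],
    "Role": ["Engineer", "Analyst"],
})

# Chain two inner merges; rows without an exact email match are dropped.
merged = df_1.merge(df_2, on="Email").merge(df_3, on="Email")
print(merged)
```

The result carries every field from all three data sets, but only for the rows whose email matched exactly.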
Example 2 - Merging data loses user with different email
When merging DF_1 with DF_3 we lose `Alan Lee` due to a different email
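One way to spot a lost row like this is to rerun the merge as a left join with pandas' `indicator=True` option and look for rows marked `left_only`. The email addresses and the `Role` column below are invented for the sketch; only the `Alan Lee` name comes from the slide.

```python
import pandas as pd

# Hypothetical data: Alan Lee uses a different email in each data set.
df_1 = pd.DataFrame({
    "Name": ["Alan Lee", "Bruno Lyon"],
    "Email": ["alan.lee@example.com", "bruno@example.com"],
})
df_3 = pd.DataFrame({
    "Email": ["alan@example.org", "bruno@example.com"],
    "Role": ["Manager", "Engineer"],
})

# The default inner merge silently drops the row that has no email match.
inner = df_1.merge(df_3, on="Email")

# A left merge with indicator=True adds a "_merge" column that shows
# which rows of df_1 found no partner in df_3.
audit = df_1.merge(df_3, on="Email", how="left", indicator=True)
lost = audit[audit["_merge"] == "left_only"]
print(lost[["Name", "Email"]])
```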
Example 3 - Fixing data before merge (version 1)
Lost user due to bad data in Name
Lost user `Bruno Lyon` because the `Name` value is wrong
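The slide does not show how the data was fixed, so the following is one possible reading: build a `Name` key from first and last name and merge on it instead of on email. That recovers a user whose email differs but loses any user whose `Name` is wrong. All values, including the deliberately misspelled "Bruno Lyons", and the `with_name_key` helper are hypothetical.

```python
import pandas as pd

# Hypothetical data: Alan has two emails, and Bruno's Name is misspelled in df_1.
df_1 = pd.DataFrame({
    "Name": ["Alan Lee", "Bruno Lyons"],  # "Bruno Lyons" is a deliberate typo
    "Email": ["alan.lee@example.com", "bruno@example.com"],
})
df_3 = pd.DataFrame({
    "First": ["Alan", "Bruno"],
    "Last": ["Lee", "Lyon"],
    "Email": ["alan@example.org", "bruno@example.com"],
})

def with_name_key(df):
    """Return a copy with a whitespace-trimmed Name column, built from First/Last if needed."""
    out = df.copy()
    if "Name" not in out.columns:
        out["Name"] = out["First"] + " " + out["Last"]
    out["Name"] = out["Name"].str.strip()
    return out

# Merging on Name now finds Alan, but the typo makes Bruno drop out.
merged = with_name_key(df_1).merge(with_name_key(df_3), on="Name", suffixes=("_1", "_3"))
print(merged[["Name", "Email_1", "Email_3"]])
```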
Example 4 - Fixing data before merge (version 2)
The merge of the two DataFrames now successfully finds all 4 users (including Alan, who has two emails)
by using email to find the first and last name values
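A sketch of this idea, under the assumption that a lookup table maps every known email (including both of Alan's) to a first and last name. The `people` table, the `add_names` helper, and all values are hypothetical and stand in for whatever the deck's notebook actually used.

```python
import pandas as pd

# Hypothetical lookup: each known email maps to a first and last name.
people = pd.DataFrame({
    "Email": ["alan.lee@example.com", "alan@example.org", "bruno@example.com",
              "carla@example.com", "dan@example.com"],
    "First": ["Alan", "Alan", "Bruno", "Carla", "Dan"],
    "Last": ["Lee", "Lee", "Lyon", "Diaz", "Moss"],
})
df_1 = pd.DataFrame({
    "Name": ["Alan Lee", "Bruno Lyons", "Carla Diaz", "Dan Moss"],  # "Lyons" is a typo
    "Email": ["alan.lee@example.com", "bruno@example.com",
              "carla@example.com", "dan@example.com"],
})
df_3 = pd.DataFrame({
    "Email": ["alan@example.org", "bruno@example.com",
              "carla@example.com", "dan@example.com"],
    "Role": ["Manager", "Engineer", "Analyst", "Designer"],
})

def add_names(df, lookup):
    """Attach First/Last columns by looking up each row's email."""
    return df.merge(lookup, on="Email", how="left")

# Drop the unreliable free-text Name and rely on the looked-up names instead.
left = add_names(df_1, people).drop(columns=["Name"])
right = add_names(df_3, people)

# Merge on the looked-up names; the suffixes keep both email columns visible.
merged = left.merge(right, on=["First", "Last"], suffixes=("_1", "_3"))
print(merged)
```

Because the key is derived from email rather than from the typed `Name`, neither the misspelled name nor Alan's second email prevents a match in this sketch.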

OSBot - Data transformation workflow (from GSheet to Jupyter)