Big Data & Taxonomies for Actionable Intelligence
Mondeca
Ghislain Atemezing
Director, Research and Innovation
@gatemezing
Taxonomy Boot Camp
Track 2: Taxonomy Applications
Nov 5, Washington, DC
Our clients' businesses are very different
Coface: Help companies forestall and assess trade risks while protecting them against overdue items
Industry: Credit Insurance
Countries: USA, France, UK, China, Portugal, Brazil
Languages: French, Portuguese, English, Chinese
GoGo squeeZ: Healthy snacking from the best of nature
Industry: Food industry, Healthy Snacks
Countries: USA, France, Mexico
Languages: French, English, Spanish
Power users: 50+ for one client, 10+ for the other
What both of our clients had in common
• Neither had a taxonomy department
• Neither employed a taxonomist
• Both wanted to analyze a large volume of diverse unstructured content in order to identify future risks
How did it all start?
GoGo squeeZ wanted to listen to consumers and detect risk-related consumer issues as soon as they were raised.
Coface intended to go beyond traditional balance sheet analysis and identify solvency risk from information published in the media.
• Risk profiles vary over time and need to be adjusted ASAP
• Analysts need to detect risks ASAP
• We need to catch all consumer-related events ASAP
• We need the big picture and to view results across a set of categories (time, country, activity, risk events)
• The voice of the consumer is captured across multiple channels
• We are missing important messages because of the volume of user-generated content
• Collecting a body of evidence helps to assess companies' performance
• We need to share alerts proactively across our marketing and quality management teams
From business objectives to solution requirements
Detect consumer risk
Objectives
• Provide alerts/insight on product quality and associated risks
• Support quick decision making
Audience
Service & Quality managers and Executives
Requirements
• Monitor consumer feedback in real time
• Collate data from different open and internal sources
• Reconcile data, identify and rank consumer risks
• Generate color-coded alerts
• Be 100% automated, end to end
Assess credit risk
Objectives
• Assess early events affecting companies' solvency
• Streamline the risk evaluation process
Audience
Credit risk analysts
Requirements
• Dynamically define online sources to monitor
• Collate data from open sources to establish risk profiles
• Reconcile data, identify different types of trade risks
• Generate risk scores
• Be automated and allow human evaluation of information
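The requirements above mention risk scores and color-coded alerts without saying how one maps to the other. A minimal sketch of one possible mapping is given here. The function name `alert_color`, the 0-to-1 score range, and both thresholds are assumptions made for illustration; none of them come from the projects described.

```python
def alert_color(risk_score: float, amber_at: float = 0.4, red_at: float = 0.7) -> str:
    """Map a risk score in [0, 1] to an alert color.

    The thresholds are hypothetical defaults, not values from the slides.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= red_at:
        return "red"
    if risk_score >= amber_at:
        return "amber"
    return "green"


if __name__ == "__main__":
    for score in (0.1, 0.4, 0.9):
        print(score, alert_color(score))
```

Keeping the thresholds as parameters reflects the point made elsewhere in the deck that risk profiles vary over time and need to be adjusted.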
Common high-level architecture
Detect consumer risk (business managers)
Sources: social web, consumer service, call centers
Volume: +800,000 messages/year from 10 sources
Languages: English, Spanish, French
Components: API-based connectors, Persistence DB, Datapipe/ETL, Text analytics, Auto-classification, Taxonomy, Semantic Rules, Search index, Data visualization portal
Assess credit risk (risk analysts)
Sources: news feeds, websites
Volume: +1,500,000 articles/year from 200+ sources
Languages: English, Spanish, French, Chinese, Portuguese
Components: Scrapers, Persistence DB, Datapipe/ETL, Text analytics, Auto-classification, Persistence DB, Search index, Knowledge portal, Taxonomy, Semantic Rules
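The architecture above pairs auto-classification with a taxonomy and semantic rules. A minimal sketch of that idea follows. The concepts, labels, rules, and names (`TAXONOMY`, `RULES`, `classify`) are all hypothetical and chosen only for illustration. The deck does not describe the actual implementation, which also involved machine learning.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mini-taxonomy: concept -> surface labels (synonyms).
# None of these concepts or labels come from the actual projects.
TAXONOMY = {
    "packaging": ["pouch", "cap", "packaging"],
    "defect": ["mold", "leak", "broken"],
    "health": ["sick", "allergy", "choking"],
}

# Hypothetical semantic rules: if all required concepts are detected,
# the message is tagged with the given event type.
RULES = [
    ({"packaging", "defect"}, "packaging_quality_issue"),
    ({"health"}, "consumer_health_risk"),
]


@dataclass
class Classification:
    concepts: set = field(default_factory=set)
    events: list = field(default_factory=list)


def classify(text: str) -> Classification:
    """Tag a message with taxonomy concepts, then apply the rules."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    concepts = {
        concept
        for concept, labels in TAXONOMY.items()
        if tokens & set(labels)
    }
    events = [event for required, event in RULES if required <= concepts]
    return Classification(concepts, events)


if __name__ == "__main__":
    result = classify("The cap was broken and my kid got sick")
    print(sorted(result.concepts), result.events)
```

Exact token matching is the simplest choice here. A production pipeline would likely add lemmatization, multi-word labels, and per-language label sets.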
GoGo squeeZ UX targets alerts and trend changes
Summary dashboards
Interactive analytics
Focused on trend / time series
Extended search features and instant access to source data
Real-time, visual, easy-to-use, interactive, fast analytics
Coface UX is process-oriented
Risk grades
Rationale
Terms & concepts detected
Acceptance or rejection of detected events
Relevance evaluation
One size does not fit all … fine-tuning was required
Coface: the vocabulary was already there
• More complex, newspaper-style content; part-of-speech (POS) analysis was a must
• Not many new terms, but more complex sentences
• Disambiguation was an issue, requiring a complex ML and Word2Vec approach
• Removing duplicates was an issue
• Focus on precision, UX and response time
GoGo squeeZ: the taxonomy was built from scratch
• Short, simple, social-media-like content; no requirement for POS analysis
• Evolving, informal language; candidate terms discovered all the time
• Little ambiguity: almost all content relates in some way to the subject matter
• No issue in deduplicating content
• Focus on recall and real-time data availability
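The two projects tuned for different targets: precision for one, recall for the other. The sketch below shows how the two measures pull apart on the same set of relevant documents. The document IDs and the `precision_recall` helper are made up for illustration and do not reflect project data.

```python
def precision_recall(predicted: set, relevant: set) -> tuple:
    """Return (precision, recall) for a set of predicted items."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {1, 2, 3, 4}               # hypothetical truly relevant documents
    strict = {1, 2}                       # a cautious classifier: few, safe hits
    broad = {1, 2, 3, 4, 5, 6, 7, 8}      # a permissive classifier: catches everything

    print(precision_recall(strict, relevant))  # (1.0, 0.5)
    print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

A strict classifier avoids showing analysts irrelevant events at the cost of missing some. A broad one catches every consumer-related message at the cost of more noise.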
Key project metrics are similar for both projects
6 months elapsed
Taxonomy work / Front end / Back end
40% / 40% / 20%
Day 1 Go Live
Takeaways and lessons learned
• The architecture was generic enough to meet both clients' requirements
• UI/UX design was very different for each client and critical to achieving user acceptance
• Results were achieved through a combination of rule-based and machine learning approaches
Questions?
Thank you

