Big Data for Line-of-Business Content
Jeff Fried
CTO, BA Insight
Data Summit
May 2017
Fried data summit big data for lob content
Big Data Is…
About Jeff Fried
Focused on Search and SharePoint since 2004
Longtime Search Nerd
• CTO, BA Insight
• Senior PM, Microsoft
• VP, FAST
• SVP, LingoMotors
Passionate About
• Search
• SharePoint
• Search-driven applications
• Information Strategy
Blog: BAinsight.com/blog
TechNet Column: “A View from the Crawlspace”
jeff.fried@bainsight.com
About BA Insight


– Connectivity
– Applications
– Classification
– Analytics

This session
Process data
Human data
Machine data
Companies don’t use most of their data
Average data volume per company (only 12% used today):
• Unstructured: 50 TB (SMBs: 9 TB; LEs: 75 TB)
• Semi-structured: 2 TB (SMBs: 0.6 TB; LEs: 5 TB)
• Structured: 12 TB (SMBs: 4 TB; LEs: 50 TB)
Source: Forrsights Strategy Spotlight: Business Intelligence And Big Data
Connectors to Many Enterprise Systems
• Aderant
• Amazon S3
• Alfresco
• Box
• Confluence
• CuadraSTAR
• Elite / 3E
• EMC Documentum
• EMC eRoom
• Google Drive
• HP Consolidated Archive (EAS, aka Zantaz)
• HPE Records Manager/HP TRIM
• IBM Connections
• IBM Content Manager
• IBM DB2
• IBM FileNet P8
• IBM Lotus Notes
• IBM WebSphere
• iManage Work
• Jive
• LegalKEY
• LexisNexis Interaction
• Lotus Notes Databases
• Microsoft Dynamics CRM
• Microsoft Exchange
• Microsoft Exchange Public Folders
• Microsoft SQL Server
• MySQL
• NetDocuments
• Neudesic The Firm Directory
• Objective
• OpenText LiveLink/RM
• OpenText eDOCS DM
• Oracle Database
• Oracle WebCenter
• Oracle WebCenter Content (UCM/Stellent)
• PLC/Practical Law
• ProLaw
• Salesforce.com
• SAP ERP
• ServiceNow
• SharePoint Online
• SharePoint 2016
• SharePoint 2013
• SharePoint 2010
• SharePoint 2007
• Sitecore
• Any SQL-based CRM system
• Veeva Vault
• Veritas Enterprise Vault (Symantec eVault)
• West km
• Xerox DocuShare
• Yammer
Integration Gaps Impact Performance
The average $1 billion company maintains 48 disparate financial systems and uses 2.7 ERP systems
Source: The Hackett Group
Big Data on LoB Data:
Data Science and Data Discovery
• Volume, velocity, and variety of data
• Potential business impact
• Ease of use
• Agility and flexibility
• Time-to-results
• Installed user base
• Complexity of analysis
• Potential impact
• Range of tools
• Smart algorithms
• Difficult to implement
• Slow and complex
• Narrow focus of analysis
• Limited depth of information exploration
• Low complexity of analysis
BIG DATA
DATA SCIENCE
DATA DISCOVERY
Source: PARC
• Predictive Inventory Levels to Minimize Warehousing Costs
• Personalized Medicine Treatment Programs
• Smart Meter Monitoring for Customer Value Add
• Customer Churn Analysis for Increased Customer Lifetime Value
• Trade Options and Futures Pricing Platform


Example: Optimizing Leaseholds & Mineral Rights in Oil & Gas Exploration


Example: Clinical Trial Management for Pharmaceutical Development
Example: Genetics Analysis for Clinical Research


Example: Insurance Claim Analysis
I am a software designer and sit at a desk all day. I could
not sit comfortably for months. I was unable to work until
the beginning of November. However, my company could
not wait for me to recover and was not able to provide me
with my job back. I did have six months of short term
disability, which I have to repay.
I was unable to get a new job until January 2, so I will be
claiming lost earnings from April 4 until the end of
December, nine months worth.
Example: (Pharma R&D) Unified View
1. Documentum Image
2. SharePoint Doc
3. Regulatory Record
4. MEDLINE article
Multiple Sources One View
Search: amgen 655
Relationships Discovered:
Antibodies: mAb
Receptors: DR5, IGF-1R
Labs: Oncology 1
People: David Chang
Examples: Financial Risk Management
Example: Analyst Workbench
Data Discovery Applications - Patterns
Research Portal
• Analyst’s workbench
• Management Adviser
• Innovation Center
Unified View
• Voice of the Customer
• Logistics Center
• Consolidated Dashboard
Customer Service
• Call Center
• Online Service
• Sales Dashboard
Compliance
• Fraud Center
• E-Discovery
• Info Governance
A “Recipe” for harnessing LoB data
• Connect to Authoritative Sources: develop a list, prioritize, and iterate
• Create Structure from Human Language: using text analytics techniques
• Deploy a polished, flexible UX: focus on users and use cases, don’t over-constrain it
• Start with a Target Application: incorporate your business drivers
Selecting Sources
[Chart: content sources a through q plotted by Impact of Content against Onboarding & Cleanup Effort]
Content Sources for Onboarding
a R&D projects - reports
b R&D projects - research notebooks
c Historical projects (OCR)
d Prototype data
e Lab notes
f Patent prep library
g CAD drawings
h Testing/Stress Data
i Design Patterns
j Technical Data Sheets
k Expert Profiles
l Regulation Database
m Subscription (OneSource, Lexis)
n Industry database
o Competitor Web Crawling
p Industry patents
q Newswires
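One way to turn an impact-versus-effort view of sources into a working order is sketched below in Python. The impact and effort scores are placeholders invented for this example (the chart positions are not recoverable from the slide text), and ranking by the impact-to-effort ratio is only one possible heuristic, not necessarily the one used in the talk.

```python
from dataclasses import dataclass


@dataclass
class Source:
    label: str     # letter used on the slide
    name: str      # source name from the slide
    impact: float  # placeholder score, 1 (low) to 5 (high)
    effort: float  # placeholder onboarding and cleanup effort, 1 to 5


# Names come from the slide; every score below is a made-up placeholder.
SAMPLE_SOURCES = [
    Source("a", "R&D projects - reports", impact=5, effort=2),
    Source("c", "Historical projects (OCR)", impact=3, effort=5),
    Source("k", "Expert Profiles", impact=4, effort=1),
    Source("q", "Newswires", impact=2, effort=1),
]


def prioritize(sources):
    """Order sources by impact per unit of effort, highest first.

    Ties are broken in favor of the source with the lower effort.
    """
    return sorted(sources, key=lambda s: (-(s.impact / s.effort), s.effort))


if __name__ == "__main__":
    for rank, src in enumerate(prioritize(SAMPLE_SOURCES), start=1):
        print(rank, src.label, src.name, round(src.impact / src.effort, 2))
```

Re-scoring and re-running after each onboarding round is a simple way to support the "prioritize, and iterate" step.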
Text Analytics Techniques
Entity Extraction
• Well Established
• Often essential to faceted navigation
• Driven by lexical resources (Taxonomies)
Annotation types shown: Acronym, Person, Location, End of sentence, End of paragraph, Date (Base = 2002-03-XX)
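As a rough illustration of lexicon-driven entity extraction, the Python sketch below tags mentions found in a tiny hand-made lexicon and normalizes a partial date into the Base = 2002-03-XX form, where XX marks an unstated day. The lexicon entries, the sample sentence, and the function names are assumptions made for this example; a production system would draw on a full taxonomy and more robust linguistic processing.

```python
import re

# Hypothetical mini-lexicon standing in for a real taxonomy.
LEXICON = {
    "Qilian": "Location",
    "David Chang": "Person",
    "LoB": "Acronym",
}

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Matches "March 2002" as well as "March 5, 2002".
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")(?: (\d{1,2}),)? (\d{4})\b")


def normalize_date(match):
    """Return YYYY-MM-DD, writing XX when the day is not stated."""
    month, day, year = match.group(1), match.group(2), match.group(3)
    day_part = f"{int(day):02d}" if day else "XX"
    return f"{year}-{MONTHS[month]:02d}-{day_part}"


def extract_entities(text):
    """Return (surface text, entity type, base form) tuples in text order."""
    found = []
    for term, entity_type in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((m.start(), m.group(0), entity_type, term))
    for m in DATE_RE.finditer(text):
        found.append((m.start(), m.group(0), "Date", normalize_date(m)))
    return [(surface, etype, base) for _, surface, etype, base in sorted(found)]


if __name__ == "__main__":
    sample = "David Chang visited the Qilian site in March 2002."
    for entity in extract_entities(sample):
        print(entity)
```

The (type, base form) pairs are the kind of values that could feed the facets of a faceted navigation interface.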
Fact Extraction
“The Red Valley property lies within the Qilian fold belt which is host to gold deposits.”
Entities recognized:
• Substance: Base=“Gold”, Class=“Element”, Number=79, Symbol=Au
• Location: Base=“Qilian”, Country=“China”, Region=“Asia”, Subregion=“East”
Extracted Fact: Substances x Locations (Qilian is location of gold)
• Substance: Base=“Gold”, Class=“Element”, Number=79, Symbol=Au, Location=“Qilian”
• Location: Base=“Qilian”, Country=“China”, Region=“Asia”, Subregion=“East”, Substance=“Gold” (indicates a gold location)
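The Substances x Locations fact above can be approximated with a short Python sketch. It assumes the simplest possible rule: a substance and a location mentioned in the same sentence are linked. That rule is a stand-in chosen for illustration; the slide does not say how the link is detected, and real fact extraction usually relies on linguistic patterns rather than bare co-occurrence. The attribute values are the ones on the slide; the function names are hypothetical.

```python
import re

# Lexical resources keyed by lowercase word; attribute values are from the slide.
SUBSTANCES = {
    "gold": {"Base": "Gold", "Class": "Element", "Number": 79, "Symbol": "Au"},
}
LOCATIONS = {
    "qilian": {"Base": "Qilian", "Country": "China",
               "Region": "Asia", "Subregion": "East"},
}


def find_entities(sentence, lexicon):
    """Look up each distinct word of the sentence in a lexicon."""
    words = dict.fromkeys(re.findall(r"[a-z]+", sentence.lower()))
    return [lexicon[word] for word in words if word in lexicon]


def extract_facts(text):
    """Link every substance to every location named in the same sentence."""
    facts = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for substance in find_entities(sentence, SUBSTANCES):
            for location in find_entities(sentence, LOCATIONS):
                facts.append({
                    # Each entity record is enriched with a pointer to the other.
                    "Substance": {**substance, "Location": location["Base"]},
                    "Location": {**location, "Substance": substance["Base"]},
                })
    return facts


if __name__ == "__main__":
    text = ("The Red Valley property lies within the Qilian fold belt "
            "which is host to gold deposits.")
    for fact in extract_facts(text):
        print(fact["Substance"])
        print(fact["Location"])
```

Storing the cross-reference on both records mirrors the slide, where the Substance gains a Location attribute and the Location gains a Substance attribute.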
User Experience is a Discipline
This session
Traditional IM
• Requirements based
• Top-down design
• Integration and re-use
• Technology Consolidation
• World of EDW, CRM, ERP, ECM
• Competence Centers
• Commercial Software
“Big Data” Style
• Opportunity Oriented
• Bottom-up Experimentation
• Immediate use and gratification
• Tool proliferation
• “World of Hadoop”
• Hackathons
• Open Source
Contact:
Jeff.Fried@BAinsight.com
www.BAinsight.com
Questions
