© 2017 Deep SEARCH 9 GmbH, http://www.deepsearchnine.com
Deep SEARCH 9
Approaches of Web Information Analysis in a
Day to Day Work Environment.
II-SDV 2017, 24-25 April, Nice, France
Klaus Kater
Deep SEARCH 9 GmbH
Managing Partner
http://www.deepsearchnine.com
Web Information Analysis
Many areas of critical importance call for research,
continuous monitoring, analysis and distribution of
collected intelligence within an organization to
support immediate and competent decision making.
Information Sources of Critical Importance
• Corporate news (e.g. acquisitions, pipeline information, …)
• Regulatory documents (e.g. ISO 22301: BCM, PDUFA V, …)
• Technology (e.g. research funding, patents, publications, …)
• Corporate Websites (e.g. competitors, start-ups, supply chain, …)
• Newsletters (e.g. BioCentury, CafePharma, …)
• Cyber threats (e.g. CERTs, ransom attacks, false news, …)
• Portals (e.g. WebMD, NIH, Dr. Mercola, …)
• Online Databases (e.g. DrugBank, UniProt, WikiPathways, …)
• Online Registries (e.g. CTR, Edgar Online, …)
• Social Media (e.g. Twitter, Facebook, Blogs, …)
• Shops (e.g. product prices, counterfeiting, …)
• Semantic Web (e.g. Wikidata, DBpedia, OBO, MeSH, …)
Surface Web
Deep Web
Dark Web
Who is doing this research?
And what tools are used?
• Public Search
• E-Mail client
• Browser
• Spreadsheets
Sources
Surface Web
Deep Web
Decisions
Manual research
Decision makers
Web Information Analysis
• 100s of emails…
• 1,000s of websites…
• Once a week, daily, every other hour?
• Keep sitting there, hitting F5 ;-)
reliable?
systematic?
structured?
repeatable?
fast enough?
The Public Search Situation (e.g. Google)
[Diagram labels: World Wide Web; Army of Google Bots; Google Repository; Google Rating Magic; Google Ads; Surf Behavior; Search Profile; Public Search; google.com, google.de, …; max. 1000 results]
• The “Rating Magic”, and therefore who gets access to which results, is determined solely by Google
• The browser’s fingerprint tells Google who is interested in which topics
• Restricting a search to a specific context, e.g. “find me persons only”, is not possible
(I’m not going to address the “Public Search Paradox” here…)
Sources
Information Scientists
Search Specialists
Knowledge Workers
Surface Web
Deep Web
Databases
Repositories
Manual research
Competitive Intelligence
Web Information Analysis
Decisions
Regulatory Affairs
Research & Development
there are many more…
Decision makers
Expert Search
Sources
Information Scientists
Surface Web
Deep Web
Databases
Repositories
Dark Web
SEARCHCORPORA
• Start-ups
• Competitors
• Regulatory
• New technology
• …
Manual research
Ontologies
Scheduled execution
Unattended updates
Content assessment
Automatic publication
• Known (trusted) sources
• More complete
• Faster
Managed Intelligence
Competitive Intelligence
Decisions
Regulatory Affairs
Research & Development
there are many more…
• Information source selection
• Content structuring
• Linking of disparate sources
• Ontology management
• SEARCHCORPUS management
Managed Intelligence
Search Competence Center
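The "Scheduled execution" and "Unattended updates" boxes on this slide can be illustrated with a minimal sketch. It assumes that an update is detected by comparing a content fingerprint between two scheduled runs. That is one common approach, not necessarily what Deep SEARCH 9 does. The names `fingerprint`, `check_for_updates` and `fake_fetch`, as well as the example URL, are hypothetical. A real deployment would pass an HTTP fetch function and call `check_for_updates` from a scheduler.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(text: str) -> str:
    """Return a SHA-256 fingerprint of whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_for_updates(
    sources: List[str],
    fetch: Callable[[str], str],
    seen: Dict[str, str],
) -> List[str]:
    """Fetch every source once and return the ones whose content changed.

    `seen` maps each source to its last fingerprint and is updated in place,
    so repeated calls only report new or modified content.
    """
    changed = []
    for url in sources:
        digest = fingerprint(fetch(url))
        if seen.get(url) != digest:
            changed.append(url)
            seen[url] = digest
    return changed


# Stand-in for a real HTTP fetch so the sketch runs offline (assumption).
pages = {"https://example.com/news": "Pipeline update: phase 2 started"}


def fake_fetch(url: str) -> str:
    return pages[url]


seen: Dict[str, str] = {}
first_run = check_for_updates(list(pages), fake_fetch, seen)   # new source
second_run = check_for_updates(list(pages), fake_fetch, seen)  # unchanged
pages["https://example.com/news"] = "Pipeline update: phase 3 started"
third_run = check_for_updates(list(pages), fake_fetch, seen)   # modified
```

Only the first and third runs report the source. This replaces the "hitting F5" routine with a check that is repeatable and can run unattended.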
Direct access for immediate answers within predefined scopes of interest
Managed Intelligence Architecture
Components shown in the diagram:
• Sources: Corporate Websites, Portals, News, Proprietary Databases, Document Repositories
• Registries (e.g. .com): > 2,000 IANA Top Level Domains
• Crawlers, Agents, Browser Farm
• Whois Proxy (> whois)
• Connectors { APIs }
• Scheduler
• Parallel Processing Engine (Job, Job, Job)
• Analyzers
• Azure Cognitive Services
• Triple Store, Search Engine, Database
• Trust!
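How a scheduler might hand several jobs to a parallel processing engine can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: Python's standard `ThreadPoolExecutor` stands in for the engine, and `run_jobs`, `make_crawl_job` and the job names are hypothetical. Each job simply reports how many documents it collected.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def make_crawl_job(documents: List[str]) -> Callable[[], int]:
    """Build a hypothetical job that 'crawls' a fixed list of documents."""
    def job() -> int:
        # A real job would fetch, extract and analyze; here we only count.
        return len(documents)
    return job


def run_jobs(jobs: Dict[str, Callable[[], int]], max_workers: int = 3) -> Dict[str, int]:
    """Run independent jobs in parallel and collect their results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        # result() blocks until the corresponding job has finished.
        return {name: future.result() for name, future in futures.items()}


jobs = {
    "corporate_websites": make_crawl_job(["page-1", "page-2"]),
    "portals": make_crawl_job(["page-3"]),
    "news": make_crawl_job([]),
}
results = run_jobs(jobs)
```

Threads suit jobs that mostly wait on the network. CPU-heavy analyzers would more likely run in separate processes.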
Integrated Development Studio
Information Scientists
Search Competence Center
• Information source selection
• Content structuring
• Linking of disparate sources
• Ontology management
• SEARCHCORPUS management
• Viewer management
Managed Intelligence Architecture
Information Scientists
Development Studio
Configure the system to
• Crawl selected sources
• Extract data
• Tag known entities in content
• Filter based on data, content or tags
• Link data into SEARCHCORPUS®
• Automatically renew according to schedule
• Provide interactive search to end users
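The tag, filter and link steps listed above can be sketched in a few lines. Everything named here is an assumption made for illustration: the tiny `ONTOLOGY` dictionary, the `Document` class, the `mentions` relation and the example URLs. Simple substring matching stands in for real entity recognition, and a list of (subject, predicate, object) tuples stands in for a triple store.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical mini-ontology: known term -> entity type.
ONTOLOGY: Dict[str, str] = {
    "crispr": "Technology",
    "diabetes": "Disease",
    "acme bio": "Company",
}


@dataclass
class Document:
    url: str
    text: str
    tags: Set[Tuple[str, str]] = field(default_factory=set)


def tag_entities(doc: Document) -> Document:
    """Tag every ontology term found in the text (naive substring match)."""
    lowered = doc.text.lower()
    for term, entity_type in ONTOLOGY.items():
        if term in lowered:
            doc.tags.add((term, entity_type))
    return doc


def has_type(doc: Document, entity_type: str) -> bool:
    """Filter criterion: keep documents tagged with the given entity type."""
    return any(found_type == entity_type for _, found_type in doc.tags)


def link(doc: Document) -> List[Tuple[str, str, str]]:
    """Express the tags as (subject, predicate, object) triples."""
    return sorted((doc.url, "mentions", term) for term, _ in doc.tags)


docs = [
    Document("https://example.com/a", "Acme Bio licenses CRISPR platform"),
    Document("https://example.com/b", "Quarterly results published"),
]
tagged = [tag_entities(d) for d in docs]
relevant = [d for d in tagged if has_type(d, "Technology")]
triples = [t for d in relevant for t in link(d)]
```

Only the first document survives the filter, and it contributes two `mentions` triples that an end-user search could then query.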
Ontology Management
Search Competence Center:
• Define ontology and rules (MeSH, proprietary taxonomies, …)
• Address Room Management: select relevant addresses (biocentury.com, clinicaltrials.gov, bloomberg.com, …)
Search Room: CRISPR AND “diabetes type 1”~3
[Diagram labels: World Wide Web; Criteria; Ontology; Extracted Data (websites of extracted companies / institutes, …; publications; patents)]
Deep SEARCH 9: repeatable, targeted, systematic, structured, reliable, flexible, 24h x 7
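The Search Room query resembles Lucene-style syntax, in which `~3` after a quoted phrase permits a few extra words between the phrase terms. Whether Deep SEARCH 9 uses exactly these semantics is an assumption. The sketch below implements a simplified reading: the phrase terms must occur in order, with at most three additional tokens in between, and the term CRISPR must also be present. The function names are hypothetical, and Lucene's real slop calculation (which also allows reordering) is not reproduced.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_within_slop(tokens: List[str], phrase: List[str], slop: int) -> bool:
    """True if the phrase terms occur in order with at most `slop` extra tokens."""
    n = len(phrase)
    for start, token in enumerate(tokens):
        if token != phrase[0]:
            continue
        # The phrase plus up to `slop` intervening tokens must fit in this window.
        window = tokens[start:start + n + slop]
        matched = 0
        for candidate in window:
            if candidate == phrase[matched]:
                matched += 1
                if matched == n:
                    return True
    return False


def matches_query(text: str) -> bool:
    """Simplified evaluation of: CRISPR AND "diabetes type 1"~3."""
    tokens = tokenize(text)
    return "crispr" in tokens and phrase_within_slop(tokens, ["diabetes", "type", "1"], 3)


hit = matches_query("CRISPR screening in diabetes mellitus type 1 patients")
too_far = matches_query(
    "CRISPR trials: diabetes research has long focused on insulin, not on type 1 alone"
)
no_crispr = matches_query("Diabetes type 1 incidence")
```

The first text matches because only "mellitus" separates the phrase terms. The second fails the proximity limit, and the third lacks the required term CRISPR.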
