.consulting .solutions .partnership
Text Analysis with SAP HANA
October 2015 | Text Analysis with SAP HANA - SAP Inside Track Munich
1. Motivation
2. Text Analysis with SAP HANA
3. Enhancement Options - Dictionaries and Rules
Why do we need Text Analysis?
• According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form
  (Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005).
• The data might originate from:
  - Social networks
  - “Letters” from customers
  - ...
• What is the problem with unstructured data?
• It is unstructured!
  - Not organized
  - No predefined data model
  - No metadata, or a mix of data and metadata
• Result: we have a lot of information that is relevant for the business, but we cannot access it.
How can we solve that issue?
• Text analysis: extracting high-quality information from texts
• Typical process of a text analysis:
  - Parsing the text
  - Adding features such as linguistic information
  - Entity recognition: is it an organization, a person or a place? This includes domain facts such as requests.
  - Sentiment analysis: what attitudinal information is “hidden” in the text?
  - Inserting the information into the database in a structured manner
What does this have to do with SAP HANA?
[Figure not reproduced; image © SAP SE]
Fulltext Index - Basics
• Starting point: a database table containing the text (column types such as TEXT, NVARCHAR, BLOB …)
• Create a fulltext index including its options (see system view SYS.FULLTEXT_INDEXES)
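The two steps above can be sketched roughly as follows. This is a minimal sketch, not taken from the talk: the schema `MYSCHEMA`, the table `TWEETS`, its columns and the index name `TWEETS_IDX` are hypothetical, and the chosen options are only one possible combination.

```sql
-- Hypothetical source table; text analysis requires a primary key on the table.
CREATE COLUMN TABLE "MYSCHEMA"."TWEETS" (
    "ID"   INTEGER PRIMARY KEY,
    "TEXT" NVARCHAR(5000)
);

-- Create the fulltext index on the text column and switch text analysis on.
-- EXTRACTION_CORE is one of the configurations delivered by SAP.
CREATE FULLTEXT INDEX "TWEETS_IDX" ON "MYSCHEMA"."TWEETS" ("TEXT")
    CONFIGURATION 'EXTRACTION_CORE'
    LANGUAGE DETECTION ('EN', 'DE')
    TEXT ANALYSIS ON;

-- Inspect the options the index was created with.
SELECT *
  FROM "SYS"."FULLTEXT_INDEXES"
 WHERE "SCHEMA_NAME" = 'MYSCHEMA'
   AND "TABLE_NAME"  = 'TWEETS';
```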
Entity Extraction
• To get valuable information out of the data, SAP delivers several configurations
• These configurations focus on entity and fact extraction under specific aspects
• Types of extraction:
  - EXTRACTION_CORE
  - EXTRACTION_CORE_ENTERPRISE
  - EXTRACTION_CORE_PUBLIC_SECTOR
  - EXTRACTION_CORE_VOICEOFCUSTOMER
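As an illustration of how one of these configurations is put to use, the following sketch assumes a hypothetical table `FEEDBACK` in a schema `MYSCHEMA` (all object names are made up for this example). It relies on the text analysis results being written to a generated table named `$TA_<index name>` whose columns carry the `TA_` prefix.

```sql
-- Hypothetical table with customer feedback; its primary key is carried over
-- into the text analysis result table.
CREATE COLUMN TABLE "MYSCHEMA"."FEEDBACK" (
    "ID"            INTEGER PRIMARY KEY,
    "FEEDBACK_TEXT" NVARCHAR(5000)
);

-- Use the delivered "voice of the customer" configuration for the index.
CREATE FULLTEXT INDEX "FEEDBACK_VOC_IDX" ON "MYSCHEMA"."FEEDBACK" ("FEEDBACK_TEXT")
    CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
    TEXT ANALYSIS ON;

-- Extracted entities and facts, one row per hit, in document order.
SELECT "ID", "TA_TOKEN", "TA_TYPE", "TA_NORMALIZED"
  FROM "MYSCHEMA"."$TA_FEEDBACK_VOC_IDX"
 ORDER BY "ID", "TA_COUNTER";

-- Overview: how often was each entity type found?
SELECT "TA_TYPE", COUNT(*) AS "HITS"
  FROM "MYSCHEMA"."$TA_FEEDBACK_VOC_IDX"
 GROUP BY "TA_TYPE"
 ORDER BY "HITS" DESC;
```

The grouped query is a quick way to see which entity types a given configuration actually produces for your data before deciding whether it needs to be enhanced.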
Custom Dictionary
• In several use cases you need to extend the dictionary to cover your business domain
• Structure of a dictionary
[Figure not reproduced; image © SAP SE]
Text Analysis with HANA – Workflow of Enhancement
1. Find the extraction configuration that fits your use case best
2. Copy the configuration into the target folder
3. Create a new custom dictionary
4. Reference the dictionary in your configuration copy
5. Recreate the fulltext index using your custom configuration
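Step 5 might look like the sketch below. The schema, table and index names are hypothetical, and so is the configuration name `mycompany.ta::EXTRACTION_CORE_CUSTOM`; the `<package>::<name>` form is an assumption for a repository-based setup, so check the Text Analysis Developer Guide for the exact way to reference your copy.

```sql
-- Drop the index that still uses the delivered configuration.
DROP FULLTEXT INDEX "MYSCHEMA"."TWEETS_IDX";

-- Recreate it with the copied configuration that references the custom dictionary.
CREATE FULLTEXT INDEX "TWEETS_IDX" ON "MYSCHEMA"."TWEETS" ("TEXT")
    CONFIGURATION 'mycompany.ta::EXTRACTION_CORE_CUSTOM'  -- hypothetical package and name
    TEXT ANALYSIS ON;
```

Dropping and recreating the index forces all documents to be analyzed again with the new configuration.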
Text Analysis with HANA – What’s next?
• Assume that we are in an “industry”-specific context or mining for “slang”-like facts and entities
• Sports are a good example of this!
• We use the example of CrossFit® … as there are some funny facts to extract
• Question: How can we extract complex entities from a text?
• Examples:
  - Did somebody attend a CrossFit training?
  - Does somebody want to join a CrossFit box?
Text Analysis with HANA – Text Analysis Extraction Rules
• Extraction rules (CGUL rules): a pattern-based language for pattern matching that uses character- or token-based regular expressions combined with linguistic attributes to define custom entity types.
• Goal of the rule sets:
  - Extract complex facts based on relations between entities and predicates.
  - Identify entities in domain-specific language and capture facts expressed in new, popular “slang”.
[Diagram: an extraction rule is built from regular expressions, tokens, dictionaries and luck]
Text Analysis with HANA – “Lessons Learned”
• Text Analysis on SAP HANA is extremely powerful
• Besides the delivered content, you have a lot of options to adapt the text analysis to extract the entities and facts that you need
• This also means you have a lot of options that you can set the wrong way
• Since SPS09, rules get compiled upon activation (no separate compilation necessary)
• The documentation is mostly OK but has room for improvement when it comes to extraction rules
• Creating custom dictionaries and text rules is cumbersome; finding an error (e.g. a typo) is hell
  - No support in the IDE
  - You can usually activate all objects and create the index … but the index remains empty
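Since an empty index is easy to miss, a quick check after recreating it can help. The sketch below is only a suggestion: the schema, table and index names are hypothetical, as is the custom entity type `CROSSFIT_TRAINING`, and it assumes the source table has a key column `ID`.

```sql
-- Compare the number of source documents with the number of documents
-- for which text analysis extracted at least one entity or fact.
SELECT (SELECT COUNT(*)
          FROM "MYSCHEMA"."TWEETS")              AS "SOURCE_DOCS",
       (SELECT COUNT(DISTINCT "ID")
          FROM "MYSCHEMA"."$TA_TWEETS_IDX")      AS "DOCS_WITH_HITS"
  FROM DUMMY;

-- Did the custom rule or dictionary produce any hits at all?
SELECT COUNT(*) AS "CUSTOM_HITS"
  FROM "MYSCHEMA"."$TA_TWEETS_IDX"
 WHERE "TA_TYPE" = 'CROSSFIT_TRAINING';  -- hypothetical custom entity type
```

If `DOCS_WITH_HITS` is 0 although the table contains text, the configuration, dictionary or rule file is the first place to look for a typo.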
Q&A
.consulting .solutions .partnership
Dr. Christian Lechner
Principal IT Consultant
+49 (0) 171 7617190
christian.lechner@msg-systems.com
http://scn.sap.com/people/christian.lechner
@lechnerc77
Text Analysis with HANA – Resources
• SAP HANA Search Developer Guide (Fulltext Index Options):
  help.sap.com -> Search Developer Guide
• SAP HANA Text Analysis Developer Guide:
  help.sap.com -> TA Developer Guide
• SAP HANA Text Analysis Language Reference Guide:
  help.sap.com -> TA Language Reference Guide
• SAP HANA Text Analysis Extraction Customization Guide:
  help.sap.com -> TA Extraction Customization Guide
• YouTube playlist of SAP HANA Academy:
  Text Analysis and Search

Editor's Notes

  • #8: Text analysis in SAP HANA is a suite of natural-language processing capabilities based on linguistic, statistical and machine-learning algorithms that model and structure the information content of textual sources in multiple languages. This technology forms the foundation for advanced text processing for a range of applications including search, business intelligence or exploratory data analysis.
  • #9: LANGUAGE COLUMN <column_name> - Defines the column where the language of a document is specified.
      LANGUAGE DETECTION ( <string_literal_list> ) - The set of languages to be considered during language detection.
      MIME TYPE COLUMN <column_name> - Defines the column where the mime type of a document is specified.
      FUZZY SEARCH INDEX <on_off> - Specifies whether a fuzzy search index should be used.
      PHRASE INDEX RATIO <index_ratio>, with <index_ratio> ::= <exact_numeric_literal> - Specifies the percentage of the phrase index. The value must be between 0.0 and 1.0. The phrase index stores information about the occurrence of words and the proximity of words to one another. If a phrase index is present, phrase searches are sped up (e.g. SELECT * FROM T WHERE CONTAINS(COLUMN1, '"cats and dogs"')). 1.0 means that the internal phrase index can use 100% of the memory size of the fulltext index.
      CONFIGURATION <string_literal> - The path to a custom configuration file for text analysis.
      SEARCH ONLY <on_off> - Defines if the original document should be stored or only the search results. When set to ON, the original document content is not stored.
      FAST PREPROCESS <on_off> - If set to ON, fast preprocessing is used, i.e. linguistic searches are not possible.
      TEXT ANALYSIS <on_off> - Enables text analysis capabilities on the indexed column. Text analysis can extract entities such as persons, products, or places from documents, which are stored in a new table.
      MIME TYPE <string_literal> - The default mime type used for preprocessing. The value must be a valid mime type.
      TOKEN SEPARATORS <string_literal> - A set of characters used for token separation. Only ASCII characters are considered.
      <change_tracking_elem> ::= SYNC[HRONOUS] | ASYNC[HRONOUS] [FLUSH [QUEUE] <flush_queue_elem>] - The type of index to be created. SYNC[HRONOUS] creates a synchronous fulltext index; ASYNC[HRONOUS] creates an asynchronous fulltext index.
      FLUSH [QUEUE] <flush_queue_elem>, with <flush_queue_elem> ::= EVERY <integer_literal> MINUTES | AFTER <integer_literal> DOCUMENTS | EVERY <integer_literal> MINUTES OR AFTER <integer_literal> DOCUMENTS - Specifies when to update the fulltext index if an asynchronous index is used. When DOCUMENTS is specified, the fulltext index will be updated after the specified number of changes to the table, including updates and deletes.
      TEXT MINING <on_off> - Enables text mining capabilities on the indexed column. Text mining provides functionality that can compare documents by examining the terms used within them.
      TEXT MINING CONFIGURATION <string_literal> - The path to a custom configuration file for text mining. If not specified, DEFAULT.textminingconfig is used.
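To show how several of these options combine in one statement, here is a small sketch. The schema, table and index names are hypothetical, the values are arbitrary, and it assumes that no other fulltext index exists on the column yet.

```sql
-- Asynchronous index with a fuzzy search index and a phrase index that may use
-- up to 20% of the memory size of the fulltext index. The queue is flushed
-- every 5 minutes or after 100 changed documents, whichever comes first.
CREATE FULLTEXT INDEX "TWEETS_ASYNC_IDX" ON "MYSCHEMA"."TWEETS" ("TEXT")
    FUZZY SEARCH INDEX ON
    PHRASE INDEX RATIO 0.2
    ASYNC FLUSH EVERY 5 MINUTES OR AFTER 100 DOCUMENTS
    TEXT ANALYSIS ON;

-- Phrase search as mentioned in the note; this kind of query benefits
-- from the phrase index.
SELECT "ID", "TEXT"
  FROM "MYSCHEMA"."TWEETS"
 WHERE CONTAINS("TEXT", '"cats and dogs"');
```

With an asynchronous index, newly inserted rows only become searchable after the next flush, so results can lag slightly behind the table content.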
  • #10: Entity extraction is the identification of named entities (persons, organizations etc.), which eliminates the 'noise' in textual data by highlighting salient information. This process transforms unstructured text into structured information.
    Fact extraction is higher-level semantic processing that links entities as "facts" in domain-specific applications. For example, "Voice of the Customer" classifies sentiments with their corresponding topics.
  • #17: CGUL - Custom Grouper User Language