Text Analysis with SAP HANA

.consulting .solutions .partnership

October 2015 | Text Analysis with SAP HANA - SAP Inside Track Munich
1. Motivation
2. Text Analysis with SAP HANA
3. Enhancement Options - Dictionaries and Rules

Motivation

Why do we need Text Analysis?
• According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form
(Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005.)
• The data might originate from:
  - Social networks
  - “Letters” from customers
  - ...
• What is the problem with unstructured data?
• It is unstructured!
  - Not organized
  - No pre-defined data model
  - No metadata, or a mix of data and metadata
  - We have a lot of information that is relevant for the business, but we cannot access it

How can we solve that issue?
• Text Analysis: Extracting high-quality information from texts
• Typical process of a text analysis:
  - Parsing of the text
  - Adding features such as linguistic information
  - Entity recognition: is it an organization, a person, or a place? This includes domain facts such as requests
  - Sentiment analysis: what attitudinal information is “hidden” in the text?
  - Insertion of the information into the database in a structured manner
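
The process steps above can be illustrated with a deliberately tiny sketch. It is not how SAP HANA implements text analysis: the word lists, the `Annotation` record, and the `analyze` function are hypothetical stand-ins that only show how free text can end up as structured rows.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical mini-dictionaries; a real engine ships far richer linguistic resources.
ENTITY_DICTIONARY = {"sap": "ORGANIZATION", "munich": "PLACE"}
SENTIMENT_WORDS = {"great": "POSITIVE", "love": "POSITIVE", "bad": "NEGATIVE"}


@dataclass
class Annotation:
    doc_id: int
    token: str   # surface form as it appears in the text
    kind: str    # entity type or sentiment label
    offset: int  # character offset of the token in the text


def analyze(doc_id: int, text: str) -> List[Annotation]:
    rows = []
    # Parsing: reduced here to splitting the text into word tokens.
    for match in re.finditer(r"\w+", text):
        token = match.group()
        # Linguistic feature: lower-casing as a crude stand-in for normalization.
        normalized = token.lower()
        # Entity recognition: a plain dictionary lookup.
        if normalized in ENTITY_DICTIONARY:
            rows.append(Annotation(doc_id, token, ENTITY_DICTIONARY[normalized], match.start()))
        # Sentiment analysis: a plain word-list lookup.
        if normalized in SENTIMENT_WORDS:
            rows.append(Annotation(doc_id, token, SENTIMENT_WORDS[normalized], match.start()))
    # The returned rows are structured and could be inserted into a database table.
    return rows


rows = analyze(1, "The SAP event in Munich was great")
```

Each returned row carries the document key, the matched token, its type, and its position, which is the kind of structured result a database can store and query.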

Text Analysis with SAP HANA

What does this have to do with SAP HANA?
[Figure not reproduced, © SAP SE]

Fulltext Index - Basics
• Starting point: database table containing the text (types like TEXT, NVARCHAR, BLOB …)
• Create a fulltext index, including its options (see system view SYS.FULLTEXT_INDEXES)
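
As a rough sketch of the index creation, the Python helper below only assembles the SQL text; running it requires a HANA connection, which is not shown. The table `NOTES`, the column `CONTENT`, and the index name `NOTES_FTI` are hypothetical, and the options used (`LANGUAGE DETECTION`, `SYNC`/`ASYNC`) are just two of the available fulltext index options.

```python
def create_index_sql(index_name, table, column, languages=(), asynchronous=False):
    """Assemble a CREATE FULLTEXT INDEX statement as plain text."""
    parts = [f'CREATE FULLTEXT INDEX "{index_name}" ON "{table}" ("{column}")']
    if languages:
        # Restrict language detection to the given language codes.
        quoted = ", ".join(f"'{lang}'" for lang in languages)
        parts.append(f"LANGUAGE DETECTION ({quoted})")
    # Choose whether the index is updated synchronously or asynchronously.
    parts.append("ASYNC" if asynchronous else "SYNC")
    return " ".join(parts)


# Hypothetical names, for illustration only.
statement = create_index_sql("NOTES_FTI", "NOTES", "CONTENT", languages=("EN", "DE"), asynchronous=True)

# The options stored for existing indexes can be inspected in the system view.
INDEX_LOOKUP_SQL = "SELECT * FROM SYS.FULLTEXT_INDEXES"
```

The full list of options and their defaults should be taken from the Search Developer Guide rather than from this sketch.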

Entity Extraction
• To get valuable information out of the data, SAP delivers several configurations
• These configurations focus on entity and fact extraction under specific aspects
• Types of extraction:
  - EXTRACTION_CORE
  - EXTRACTION_CORE_ENTERPRISE
  - EXTRACTION_CORE_PUBLIC_SECTOR
  - EXTRACTION_CORE_VOICEOFCUSTOMER
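
A minimal sketch of how one of these configurations is selected when the index is created, and how the extracted entities can then be summarized. The helper names, the table `TWEETS`, and the column `CONTENT` are hypothetical. The sketch assumes that the extraction results land in a generated table named `$TA_<index name>` with a `TA_TYPE` column, which is worth verifying against the Text Analysis Developer Guide for your revision.

```python
DELIVERED_CONFIGURATIONS = (
    "EXTRACTION_CORE",
    "EXTRACTION_CORE_ENTERPRISE",
    "EXTRACTION_CORE_PUBLIC_SECTOR",
    "EXTRACTION_CORE_VOICEOFCUSTOMER",
)


def text_analysis_index_sql(index_name, table, column, configuration):
    """Assemble a fulltext index statement with text analysis switched on."""
    if configuration not in DELIVERED_CONFIGURATIONS:
        raise ValueError(f"unknown delivered configuration: {configuration}")
    return (
        f'CREATE FULLTEXT INDEX "{index_name}" ON "{table}" ("{column}") '
        f"CONFIGURATION '{configuration}' TEXT ANALYSIS ON"
    )


def entity_summary_sql(index_name):
    """Count the extracted entities per type in the generated result table."""
    return (
        f'SELECT TA_TYPE, COUNT(*) AS HITS FROM "$TA_{index_name}" '
        "GROUP BY TA_TYPE ORDER BY HITS DESC"
    )


# Hypothetical names, for illustration only.
create_stmt = text_analysis_index_sql("TWEETS_FTI", "TWEETS", "CONTENT", "EXTRACTION_CORE_VOICEOFCUSTOMER")
summary_stmt = entity_summary_sql("TWEETS_FTI")
```

Summarizing by type first is a quick way to see which entity and sentiment types a configuration actually produces for your data.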

Enhancement Options - Dictionaries and Rules

Custom Dictionary
• In several use cases you need to extend the dictionary to cover your business domain
• Structure of a dictionary
[Figure not reproduced, © SAP SE]
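
Since the figure is not available here, the following sketch generates a small dictionary to give an impression of the structure. The element and attribute names (`dictionary`, `entity_category`, `entity_name` with `standard_form`, `variant`) and the namespace are written down from memory of the Extraction Customization Guide and should be checked there. The category `CROSSFIT_TERM` and its entries are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace as used in the Extraction Customization Guide; verify for your release.
TA_NAMESPACE = "http://www.sap.com/ta/4.0"


def build_dictionary(category, entries):
    """Build dictionary XML: one entity category, standard forms, and their variants."""
    root = ET.Element("dictionary", {"xmlns": TA_NAMESPACE})
    category_node = ET.SubElement(root, "entity_category", {"name": category})
    for standard_form, variants in entries.items():
        # The standard form is the normalized name reported for every variant.
        entity = ET.SubElement(category_node, "entity_name", {"standard_form": standard_form})
        for variant in variants:
            ET.SubElement(entity, "variant", {"name": variant})
    return ET.tostring(root, encoding="unicode")


# Hypothetical category and entries, for illustration only.
dictionary_xml = build_dictionary(
    "CROSSFIT_TERM",
    {"Workout of the Day": ["WOD", "workout of the day"]},
)
```

Generating the file instead of typing it by hand avoids at least the malformed-XML class of typos; spelling mistakes in category names still have to be found manually.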

Text Analysis with HANA – Workflow of Enhancement
1. Find the extraction configuration that fits your use case best
2. Copy the configuration into the target folder
3. Create a new custom dictionary
4. Reference the dictionary in your configuration copy
5. Recreate the fulltext index using your custom configuration
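
Step 5 can be sketched as a drop followed by a create. The helper only returns the SQL text. All names are hypothetical, and the `package::name` form used to reference the copied configuration is an assumption about how repository objects are addressed, so check it against your system.

```python
def recreate_index_sql(index_name, table, column, custom_configuration):
    """Return the statements that drop an index and recreate it with a custom configuration."""
    return [
        f'DROP FULLTEXT INDEX "{index_name}"',
        f'CREATE FULLTEXT INDEX "{index_name}" ON "{table}" ("{column}") '
        f"CONFIGURATION '{custom_configuration}' TEXT ANALYSIS ON",
    ]


# Hypothetical package and configuration name, for illustration only.
statements = recreate_index_sql("TWEETS_FTI", "TWEETS", "CONTENT", "crossfit.ta::CROSSFIT_CONFIG")
```

Recreating the index is what makes the text analysis run again with the new dictionary, so the result table should be checked afterwards to confirm that it is not empty.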

Text Analysis with HANA – What’s next?
• Assume that we are in an “industry”-specific context or mining for “slang”-like facts and entities
• Sports are a good example of this!
• We use the example of CrossFit® … as there are some funny facts to extract
• Question: How can we extract complex entities from a text?
• Examples:
  - Did somebody attend a CrossFit training?
  - Does somebody want to join a CrossFit box?

Text Analysis with HANA – Text Analysis Extraction Rules
• Extraction rules (CGUL rules): a pattern-based language for pattern matching that uses character- or token-based regular expressions combined with linguistic attributes to define custom entity types.
• Goal of the rule sets:
  - Extract complex facts based on relations between entities and predicates.
  - Identify entities in domain-specific language and capture facts expressed in new, popular “slang”
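
To make the idea of token-based patterns with linguistic attributes more concrete, here is a small Python analogy. It is not CGUL syntax and not how the HANA engine works: the stem table, the predicate helpers, and the `JOIN_BOX` pattern are hypothetical and only mimic matching a sequence of tokens by their stems.

```python
import re
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (surface form, stem)

# Hypothetical stem table standing in for real linguistic analysis.
STEMS = {"joined": "join", "joining": "join", "joins": "join", "boxes": "box"}


def tokenize(text: str) -> List[Token]:
    """Split text into word tokens and attach a stem to each one."""
    tokens = []
    for word in re.findall(r"\w+", text):
        lower = word.lower()
        tokens.append((word, STEMS.get(lower, lower)))
    return tokens


def stem(value: str) -> Callable[[Token], bool]:
    """Predicate: the token's stem equals the given value."""
    return lambda token: token[1] == value


def any_of(*values: str) -> Callable[[Token], bool]:
    """Predicate: the token's stem is one of the given values."""
    return lambda token: token[1] in values


# Pattern for the fact "somebody joins a CrossFit box": one predicate per token.
JOIN_BOX = [stem("join"), any_of("a", "the", "this"), stem("crossfit"), stem("box")]


def find_matches(tokens: List[Token], pattern: List[Callable[[Token], bool]]) -> List[str]:
    """Slide the pattern over the tokens and collect every matching span."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(predicate(token) for predicate, token in zip(pattern, window)):
            hits.append(" ".join(token[0] for token in window))
    return hits


matches = find_matches(tokenize("Anna wants to join a CrossFit box next month"), JOIN_BOX)
```

Matching on stems rather than surface forms is what lets one pattern cover "join", "joins", and "joining"; real extraction rules offer the same kind of leverage through linguistic attributes.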

Text Analysis with HANA – Text Analysis Extraction Rules
[Diagram labels: Extraction Rule, Regular Expressions, Tokens, Luck, Dictionaries]

Text Analysis with HANA – “Lessons Learned”
• Text Analysis on SAP HANA is extremely powerful
• Besides the delivered content, you have a lot of options to adapt the text analysis so that it extracts the entities and facts you need
• This also means you have a lot of options that you can set the wrong way
• Since SP09, rules are compiled upon activation (no separate compilation necessary)
• The documentation is mostly OK but has room for improvement when it comes to extraction rules
• Creating custom dictionaries and text rules is cumbersome; finding an error (e.g. a typo) is hell
  - No support in the IDE
  - You can usually activate all objects and create the index … but the index remains empty

Q&A

Dr. Christian Lechner
Principal IT Consultant
+49 (0) 171 7617190
christian.lechner@msg-systems.com
http://scn.sap.com/people/christian.lechner
@lechnerc77

Text Analysis with HANA – Resources
• SAP HANA Search Developer Guide (Fulltext Index Options)
help.sap.com -> Search Developer Guide
• SAP HANA Text Analysis Developer Guide:
help.sap.com -> TA Developer Guide
• SAP HANA Text Analysis Language Reference Guide:
help.sap.com -> TA Language Reference Guide
• SAP HANA Text Analysis Extraction Customization Guide:
help.sap.com -> TA Extraction Customization Guide
• YouTube Playlist of SAP HANA Academy:
Text Analysis and Search
