https://github.com/odpi/egeria
FROM BIG DATA TO ACTION: HOW TO BREAK
OUT OF THE SILOS AND LEVERAGE DATA
GOVERNANCE FOR YOUR ORGANIZATION?
1
Open metadata and Governance
https://github.com/odpi/egeria
Introduction
• Who:
  - John Mertic – The Linux Foundation
  - Chris Replogle – SAS Institute
• What:
  - Using open practices and open software to govern your data.
2
https://github.com/odpi/egeria
How ING is becoming a metadata driven enterprise using Egeria
3
https://github.com/odpi/egeria
WHY ODPI EGERIA?
4
Open metadata and Governance
https://github.com/odpi/egeria
How can we become more effective with data?
5
https://github.com/odpi/egeria
The value of open, standardized metadata
6
https://github.com/odpi/egeria
Using a metadata repository to describe data
7
Metadata
Repository
https://github.com/odpi/egeria
Today’s reality – organizations buy lots of tools
8
https://github.com/odpi/egeria
A new manifesto for metadata and governance
• The maintenance of metadata must be automated to scale to the sheer volume and variety
of data involved in modern business. Similarly, metadata should be used to drive the
governance of data and to create a business-friendly logical interface to the data landscape.
• Metadata management must become ubiquitous in cloud platforms and
large data platforms, such as Apache Hadoop, so that the processing engines on these
platforms can rely on its availability and build capability around it.
• Metadata access must become open and remotely accessible so that tools from
different vendors can work with metadata located on different platforms. This implies
unique identifiers for metadata elements, some level of standardization in the types and
formats for metadata, and standard interfaces for manipulating metadata.
• Wherever possible, discovery and maintenance of metadata has to be an integral part of all
tools that access, change and move information.
https://github.com/odpi/egeria
ODPi Egeria enables exchange of metadata between tools
from different vendors
Open and
Unified Metadata
12
Development DevOps Data Science
https://github.com/odpi/egeria
EGERIA’S DISTRIBUTED VIRTUAL GRAPH
13
Uniting metadata from many tools
https://github.com/odpi/egeria
ODPi Egeria enables exchange of metadata between tools
14
Open and
Unified Metadata
Development DevOps Data Science
https://github.com/odpi/egeria
Search
Open Metadata Access Services
Design philosophy
15
Open Metadata Repository Services
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
https://github.com/odpi/egeria
Search
A Cohort of OMAG Servers
16
Open Metadata Repository Services
OMRS Cohort
Open Metadata
Access Services
Open Metadata
Access Services Open Metadata
Access Services
Open Metadata
And Governance
(OMAG) Server
https://github.com/odpi/egeria
Egeria Open Metadata Repository Services (OMRS)
• The OMRS defines a protocol and a set of connectors
• The Enterprise Connector performs cohort-wide operations, such as
issuing queries to the cohort. When metadata is replicated from
another server, it can use the local connector and repository to
cache that metadata for availability and performance
• The Local Connector performs local operations and provides a
default Event Mapper that enables events relating to local
operations to be sent to the cohort
• The Repository Connector interfaces to a specific repository
and may optionally be accompanied by a custom Event
Mapper
• Egeria provides two built-in repositories, and there are
connectors to other repositories
• The interface to a repository connector is the MetadataCollection
API, described later in this deck
OMRS Enterprise Connector
OMRS Local Connector
& Event Mapper
OMRS Repository
Connector
Repository
Cohort
MetadataCollection
API
https://github.com/odpi/egeria
Egeria metadata – a distributed graph
Business
metadata
Structural
metadata for
a data store
EMPNAME EMPNO JOBCODE SALARY
EMPLOYEE
RECORD
Employee
Work Location
Annual Salary
Job Title
Employee Id
Employee Name
Hourly Pay Rate
Manager Compensation Plan
HAS-A
HAS-A
HAS-A
HAS-A
HAS-A
HAS-A
IS-A IS-A IS-A
Sensitive Data
• The interconnected nature of metadata forms a graph
• The distributed nature of Egeria leads to a distributed graph…
https://github.com/odpi/egeria
Egeria distributed graph model
19
Database
Column
Glossary
Term
OMAG Server 1 OMAG Server 2
Entity Entity
• A pair of entities is stored on separate servers
https://github.com/odpi/egeria
Egeria distributed graph model
20
Database
Column
Glossary
Term
Glossary
Term
Meaning
OMAG Server 1 OMAG Server 2
Reference
Copy
Relationship
• One entity could be replicated to the other server, as a ‘reference copy’
• The original Glossary Term on OMAG Server 2 is still the master
• A relationship could be defined between the local DB column and the reference copy of the Glossary Term
https://github.com/odpi/egeria
Egeria distributed graph model
21
Database
Column
Glossary
Term
OMAG Server 1
OMAG Server 3
OMAG Server 2
Database
Column
Glossary
Term
Meaning
• Both entities could be replicated to a third server, as reference copies
• The originals are still the masters
• A relationship could be defined between the local reference copies
https://github.com/odpi/egeria
Egeria distributed graph model
22
Database
Column
Glossary
Term
OMAG Server 1
OMAG Server 3
OMAG Server 2
Meaning
Database
Column
Glossary
Term
Entity
Proxy
• Instead of replication, the third server could relate the original entities using entity proxies
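The three variations of the distributed graph model above (home instance, reference copy, entity proxy) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, method names, guid values and collection ids are hypothetical and are not the Egeria API. The one idea taken from the slides is that a copied or proxied entity keeps pointing at its home server, which remains the master.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    guid: str
    type_name: str
    home_collection_id: str  # metadata collection that masters this entity
    is_proxy: bool = False   # True when only a stub of the entity is held


@dataclass
class Server:
    collection_id: str
    entities: Dict[str, Entity] = field(default_factory=dict)
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)

    def create(self, guid: str, type_name: str) -> Entity:
        """Create an entity whose home is this server."""
        entity = Entity(guid, type_name, self.collection_id)
        self.entities[guid] = entity
        return entity

    def save_reference_copy(self, entity: Entity) -> None:
        """Store a full copy; the home collection id still names the original server."""
        self.entities[entity.guid] = replace(entity, is_proxy=False)

    def save_proxy(self, entity: Entity) -> None:
        """Store a stub that only anchors relationships to a remote entity."""
        self.entities[entity.guid] = replace(entity, is_proxy=True)

    def relate(self, name: str, end1: str, end2: str) -> None:
        """Record a relationship between two entities known to this server."""
        for guid in (end1, end2):
            if guid not in self.entities:
                raise KeyError(f"{guid} is not known to {self.collection_id}")
        self.relationships.append((name, end1, end2))

    def is_master(self, guid: str) -> bool:
        return self.entities[guid].home_collection_id == self.collection_id


server1, server2, server3 = Server("server-1"), Server("server-2"), Server("server-3")
column = server1.create("guid-column", "Database Column")
term = server2.create("guid-term", "Glossary Term")

# Server 1 holds a reference copy of the term and links its own column to it.
server1.save_reference_copy(term)
server1.relate("Meaning", column.guid, term.guid)

# Server 3 owns neither entity, so it relates them through proxies.
server3.save_proxy(column)
server3.save_proxy(term)
server3.relate("Meaning", column.guid, term.guid)
```

In this sketch, `server1.is_master("guid-term")` is false even though server 1 stores the term, which mirrors the statement that the original on OMAG Server 2 is still the master.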
https://github.com/odpi/egeria
DEPLOYMENT PATTERNS
23
From large scale cloud services, on-premises local
deployments to edge IoT devices
https://github.com/odpi/egeria
A hybrid multi-cloud world
Data Lake
Mobile
Apps
Databases
Applications
Files
Independent
metadata
Repository
Linked
metadata
Repositories
Business Partners
Sharing data
IoT devices and
systems
Applications
New applications
deployed to cloud
https://github.com/odpi/egeria
Open metadata ecosystem
Data Lake
Mobile
Apps
Databases
Applications
Files
Independent
metadata
Repository
Linked
metadata
Repositories
Business Partners
Sharing data
IoT devices and
systems
Applications
New applications
deployed to cloud
https://github.com/odpi/egeria
The OMAG Server Platform
26
OMAG
Server
Platform
OMAG
Server
Platform
OMAG
Server
Platform
OMAG
Server
Platform
Egeria Server 1
Egeria Server 2
Egeria Server 3
Kubernetes
OMAG Server
Platform
Egeria
Server 1
Egeria
Server 2
Egeria
Server 3
Multi-tenant
OMAG Server
Platform
Egeria
Server 1
Edge
https://github.com/odpi/egeria
Metadata Tool Integration Patterns
27
https://github.com/odpi/egeria
Metadata Tool Integration Patterns
28
https://github.com/odpi/egeria
Example of a simple cohort
Cohort A
Chief Data Office
Data Lake
Systems of
Record
29
Virtualizer
Security-Sync
Data Bridge
Apache Ranger
Gaian
Stewardship
Stewardship
Stewardship
Data Onboarding
https://github.com/odpi/egeria
Metadata Tool
Integration Patterns
30
https://github.com/odpi/egeria
COHORT PROTOCOL
31
Server registration and metadata exchange
https://github.com/odpi/egeria
First server
• The first server to join the cohort issues a registration request and waits for
others to join.
32
https://github.com/odpi/egeria
Establishing contact
• When another server joins the cohort, the servers exchange registration information.
33
https://github.com/odpi/egeria
Federated queries
• Once the registration is complete, the cohort members can query each other.
34
https://github.com/odpi/egeria
Caching metadata for availability and performance
• Metadata can also be replicated through the cohort
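The cohort steps described on these slides (registration request, exchange of registrations, federated query, optional caching) can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the class names, the event names `registration-request` and `registration-reply`, and the instance layout are hypothetical, and the real protocol runs over a shared event topic rather than direct method calls.

```python
class CohortTopic:
    """Stand-in for the shared channel that carries cohort events."""

    def __init__(self):
        self.members = []

    def publish(self, sender, event):
        # Deliver the event to every member except the one that sent it.
        for member in list(self.members):
            if member is not sender:
                member.on_event(sender, event)


class CohortMember:
    def __init__(self, name, topic):
        self.name = name
        self.topic = topic
        self.registry = {}   # name -> member, built from registration events
        self.local = {}      # guid -> instance homed on this server
        self.reference = {}  # guid -> cached copy of a remote instance

    def join(self):
        self.topic.members.append(self)
        self.topic.publish(self, "registration-request")

    def on_event(self, sender, event):
        if event == "registration-request":
            self.registry[sender.name] = sender
            sender.on_event(self, "registration-reply")  # introduce ourselves
        elif event == "registration-reply":
            self.registry[sender.name] = sender

    def find_local(self, type_name):
        return [i for i in self.local.values() if i["type"] == type_name]

    def find_federated(self, type_name, cache=False):
        """Query this server and every registered member; optionally cache results."""
        results = self.find_local(type_name)
        for member in self.registry.values():
            for instance in member.find_local(type_name):
                results.append(instance)
                if cache:
                    self.reference[instance["guid"]] = dict(instance)
        return results


topic = CohortTopic()
first, second = CohortMember("first", topic), CohortMember("second", topic)
first.join()   # nobody answers yet, so the first server simply waits
second.join()  # both servers now hold each other's registration
second.local["t1"] = {"guid": "t1", "type": "GlossaryTerm", "name": "Annual Salary"}
found = first.find_federated("GlossaryTerm", cache=True)
```

After the last call, `first` has both the query result and a cached copy, which corresponds to the "caching metadata for availability and performance" slide.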
35
https://github.com/odpi/egeria
OPEN METADATA TYPES
36
What is the scope of open metadata?
https://github.com/odpi/egeria
Scope of metadata covered
Glossary Collaboration
Governance
Models and
Reference Data
Metadata
Discovery
Lineage Data Assets
Base Types, Systems
and Infrastructure
37
https://github.com/odpi/egeria
Scope of metadata covered
Policy Metadata (Principles,
Regulations, Standards,
Approaches, Rule Specifications,
Roles and Metrics)
Governance
Actions and
Processes
Augmentation
Mapping
Implementation
Business Objects and
Relationships, Taxonomies
and Ontologies
Business Attributes
Organization
Teaming Metadata
(people profiles,
communities, projects,
notebooks, …)
Models and Schemas
4
3
1
5
Physical Asset Descriptions
(Data stores, APIs,
models and components)
Asset Collections
(Sets, Typed Sets, Type
Organized Sets)
Information Views
Rights
Management
Reference Data
Feedback Metadata
(tags, comments, ratings, …)
Classification Schemes
Classification
Strategy Subject Area Definition
Campaigns and Projects
Rollout
2
Discovery
Metadata (profile data,
technical classification, data
classification,
data quality assessment, …)
Augmentation
Instrument
Association
Information Process
Instrumentation (design lineage)
6
7
Connectors
Basic Types, Infrastructure and Systems
Access
0
38
https://github.com/odpi/egeria
USING DESIGN THINKING
39
Introducing Coco Pharmaceuticals
https://github.com/odpi/egeria
Search
Open Metadata Access Services
Design philosophy
Open Metadata Repository Services
40
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
https://github.com/odpi/egeria
Coco Pharmaceuticals persona
Jules Keeper, CDO
Tessa Tube, Chief Researcher
Erin Overview, Information Architect
Faith Broker, Chief Privacy Officer
Bob Nitter, Integration Developer
Callie Quartile, Data Scientist
Nancy Noah, Cloud Specialist
Gary Geeke, IT Infrastructure
https://odpi.github.io/data-governance/coco-pharmaceuticals/personas/
41
https://github.com/odpi/egeria
Using design thinking
• Open Metadata Types
• Access Service Identification
• Samples and API design
• Best Practices
42
https://github.com/odpi/egeria
Different personas need different services
Callie Quartile, Data Scientist
• Find data
• Understand data
• Manage analytics models
Jules Keeper, Chief Data Officer
• Build data strategy
• Define governance program
• Monitor progress
43
https://github.com/odpi/egeria
Different personas need different services
Tanya Tidie, Clinical Trials Administrator
• Maintain accurate patient records
• Catalog clinical trials data
• Demonstrate good data management practices
Ivor Padlock, Chief Security Officer
• Understand risks to organization
• Set up protection
• Monitor for suspicious activity
44
https://github.com/odpi/egeria
Event-driven governance
Open
Metadata
New
Database
Assign
Owner
Classify
Data
Use
Data
45
https://github.com/odpi/egeria
Current Open Metadata Access Services (OMASs)
46
Project Management
Community Profile
Asset Catalog
Stewardship Action
Information View
Governance Program
Data Process
Subject Area
Connected Asset
Discovery Engine
Governance Engine
Data Protection
Software Developer
Data Platform
Asset Owner
Digital Architecture
Data Science
DevOps
Asset Consumer
Data Infrastructure
Data Privacy
Asset Lineage
https://github.com/odpi/egeria
Open Metadata Access Service (OMAS) instance
47
https://github.com/odpi/egeria
EVOLUTION OF GOVERNANCE
48
Egeria guidance on governance
https://github.com/odpi/egeria
Governance maturity seen in terms of Value and Scope
https://github.com/odpi/egeria
Building governance maturity is a gradual process
• Organizations may operate different
levels of maturity in different parts of
their business.
• Choices determined by where the
most value lies.
• Many organizations aspire to provide
all employees with the data they need
(data citizenship*)
50
https://opengovernance.odpi.org/maturity-model/
https://github.com/odpi/egeria
Implementing Data Awareness
51
https://github.com/odpi/egeria
Implementing Governance Awareness
52
https://github.com/odpi/egeria
Implementing Embedded Governance
53
https://github.com/odpi/egeria
Implementing Business Driven Governance
54
https://github.com/odpi/egeria
Implementing Data Citizenship
55
https://github.com/odpi/egeria
Further Information
• ODPi:
  - Website: https://www.odpi.org/
• ODPi / Egeria
  - Website: https://www.odpi.org/projects/egeria
  - Technical Information: https://egeria.odpi.org/
• ODPi Guidance on Governance
  - https://opengovernance.odpi.org/
• Open source repositories:
  - http://github.com/odpi/egeria
  - https://github.com/odpi/data-governance
56
https://github.com/odpi/egeria
ADDITIONAL INFORMATION
57
https://github.com/odpi/egeria
COMMUNITY AND ECOSYSTEM
58
Building a strong community for the future.
https://github.com/odpi/egeria
Open source dependencies
59
Spring Boot
https://github.com/odpi/egeria
Using ODPi Egeria …
• Eases the cost of metadata integration
through:
  - Comprehensive standards and libraries.
  - An active vendor recruitment program.
• Provides direct support to many
governance roles, filling the gaps
between the functions offered by
commercial tools.
• Provides best practices and content
packs to accelerate an organization’s
journey to becoming data driven.
60
https://github.com/odpi/egeria
Egeria Conformance Program:
it’s an “imitation game”
61
Workbench
Vendors that pass the
conformance suite can
display this mark
https://github.com/odpi/egeria
Running the Conformance Suite
62
https://github.com/odpi/egeria
The ODPi is a non-profit that is part of The Linux Foundation
• Delivering core technology
• Recruiting vendors
• Assisting practitioners
63
Vendors
Practitioners
Core
Technology
Conformance
Suite
Best
Practices
Project
Egeria
Project
Data
Governance
https://github.com/odpi/egeria
Links
• Press Releases and Podcast
• Open source repositories
  - https://github.com/odpi/data-governance
  - https://github.com/odpi/egeria
  - https://www.linuxfoundation.org/press-release/2018/08/odpi-announces-egeria-for-open-sharing-exchange-and-governance-of-metadata/
  - https://www.linuxfoundation.org/press-release/2019/02/odpi-announces-new-egeria-conformance-program-to-advance-open-metadata-exchange-between-vendor-tools/
  - https://roaringelephant.org/2018/09/25/episode-107-open-metadata-and-governance-masterclass-with-mandy-chessell-part-1/
  - https://roaringelephant.org/2018/10/09/episode-109-open-metadata-and-governance-masterclass-with-mandy-chessell-part-2/
  - https://youtu.be/ryd3KFWT1mc
64
https://github.com/odpi/egeria
VIRTUAL DATA CONNECTOR
65
Using metadata to control access to data
https://github.com/odpi/egeria
Automating governance example
IBM
Information
Governance
Catalog
Apache Atlas
Apache Ranger
Gaian
Define
Policies
Hadoop
Metadata
Manage Data Access
Egeria Cohort
(Open metadata exchange and federated queries)
Access
Data
Egeria
Open
Governance APIs
configure
configure
66
https://github.com/odpi/egeria
Scared to share (example)
Faith Broker
Human Resources
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 ##### ### 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 ##### ### 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 ##### ### 27 Code St Harlem NY 1 3
Callie Quartile
Data Scientist
Very Sensitive Data
Very Sensitive Data
67
https://github.com/odpi/egeria
What does metadata look like?
Business
metadata
Structural
metadata for
a data store
EMPNAME EMPNO JOBCODE SALARY
EMPLOYEE
RECORD
Employee
Work Location
Annual Salary
Job Title
Employee Id
Employee Name
Hourly Pay Rate
Manager Compensation Plan
HAS-A
HAS-A
HAS-A
HAS-A
HAS-A
HAS-A
IS-A
IS-A
Sensitive
IS-A
Data
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
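The idea behind the "Scared to share" example, where a column's link to a sensitive glossary term drives masking, can be sketched as follows. This is a minimal illustration, not how the virtual data connector is implemented: the column-to-term links, the choice of "Annual Salary" as the sensitive term, the function name and the sample values are all assumptions, since the exact links in the diagram are not recoverable from this text.

```python
# Assumed links from physical columns to glossary terms (illustrative only).
COLUMN_MEANING = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}

# Assumed classification: glossary terms treated as sensitive data.
SENSITIVE_TERMS = {"Annual Salary"}


def mask_row(row, may_see_sensitive, mask="#####"):
    """Return a copy of row with sensitive columns masked unless permitted."""
    if may_see_sensitive:
        return dict(row)
    return {
        column: mask if COLUMN_MEANING.get(column) in SENSITIVE_TERMS else value
        for column, value in row.items()
    }


# Made-up sample values; only the column names come from the slide.
row = {"EMPNAME": "Callie Quartile", "EMPNO": "0001", "JOBCODE": "DS", "SALARY": "12345"}
masked = mask_row(row, may_see_sensitive=False)
unmasked = mask_row(row, may_see_sensitive=True)
```

The point of the design is that the masking rule never names the SALARY column directly. It follows the metadata, so reclassifying a glossary term changes what every linked column exposes.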
68
https://github.com/odpi/egeria
Automating governance example
IBM
Information
Governance
Catalog
Apache Atlas
Apache Ranger
Gaian
Define
Policies
Hadoop
Metadata
Manage Data Access
Egeria Cohort
(Open metadata exchange and federated queries)
Access
Data
Egeria
Open
Governance APIs
configure
configure
69
https://github.com/odpi/egeria
INTEGRATING WITH PARTNERS
70
Working with different vendors
https://github.com/odpi/egeria
Metadata Repository Integration Patterns
• Adapter
• Native
• Plug-in
• Caller
• Special
71
https://github.com/odpi/egeria
IBM Information Governance Catalog Integration
• Egeria’s IGC integration uses the
Adapter Pattern
• There are two connectors to IGC running
in the repository proxy server.
• They translate IGC APIs and events into
open metadata APIs and events.
• Egeria handles the interaction with the
cohort.
• No need to upgrade IGC to adopt
• Outbound metadata only
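The translation step in the adapter pattern, where an event mapper turns a tool's native event into an open metadata event, can be sketched like this. It is a hedged illustration only: the native event layout, the keys `assetType`, `action`, `id` and `fields`, the event type names and the two mapping tables are hypothetical and do not describe the real IGC connector or the real OMRS event format.

```python
from typing import Optional

# Hypothetical mapping tables; neither side is taken from a real product API.
TYPE_MAP = {"term": "GlossaryTerm", "database_column": "RelationalColumn"}
ACTION_MAP = {
    "CREATE": "NEW_ENTITY",
    "MODIFY": "UPDATED_ENTITY",
    "DELETE": "DELETED_ENTITY",
}


def map_event(native: dict, metadata_collection_id: str) -> Optional[dict]:
    """Translate a native tool event into an open-metadata-style event.

    Returns None when the native type or action has no mapping, so that
    unmapped events are skipped rather than guessed at.
    """
    type_name = TYPE_MAP.get(native.get("assetType"))
    event_type = ACTION_MAP.get(native.get("action"))
    if type_name is None or event_type is None:
        return None
    return {
        "eventType": event_type,
        "metadataCollectionId": metadata_collection_id,
        "entity": {
            "guid": native["id"],
            "typeName": type_name,
            "properties": dict(native.get("fields", {})),
        },
    }


native_event = {
    "assetType": "term",
    "action": "CREATE",
    "id": "abc-123",
    "fields": {"name": "Annual Salary"},
}
open_event = map_event(native_event, "igc-collection-1")
skipped = map_event({"assetType": "report", "action": "CREATE", "id": "x"}, "igc-collection-1")
```

Because the mapping lives in the proxy, the proprietary tool itself is unchanged, which matches the "no need to upgrade" point on the slide.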
72
Information
Governance
Catalog
Repository
Proxy
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
https://github.com/odpi/egeria
Apache Atlas Integration
• The Egeria community is working on a similar
integration for Apache Atlas.
• Again there are two connectors in the repository
proxy server.
• These connectors translate Atlas APIs and events
into open metadata APIs and events.
• Egeria handles the interaction with the cohort.
• No need to upgrade Atlas to adopt
• Two-way exchange of native Atlas metadata
73
Apache Atlas
Repository
Proxy
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
https://github.com/odpi/egeria
Native Integration
• An alternative approach is the Native Pattern
• There are still two connectors. They translate
internal APIs and events into open metadata APIs
and events.
• ODPi Egeria handles the interaction with the cohort.
• The connectors and the ODPi Egeria libraries reside
in the metadata server.
• No additional server; less network traffic; upgrade
required.
74
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
Metadata
Server
https://github.com/odpi/egeria
Plug-in Integration
• The plug-in pattern allows different repository back-ends
to be plugged into ODPi Egeria’s OMAG Server.
• Egeria includes:
  - In-memory Repository (Testing and demos)
  - JanusGraph Repository (All scenarios)
• Supports the full protocol and fills in the gaps left by
the proprietary tools.
75
Repository
Connector
Open Metadata Highway
Open Metadata and
Governance (OMAG)
Server
https://github.com/odpi/egeria
EGERIA LOCAL GRAPH REPOSITORY
76
https://github.com/odpi/egeria
The OMRSMetadataCollection interface
• The interface to an Egeria repository is the OMRSMetadataCollection interface
• It includes groups of operations:
  - Group 1: Identification of metadata repository - metadataCollectionId
  - Group 2: Type definitions (types, attributes) - add, find, get, remove, …
  - Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
  - Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
  - Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
  - Group 6: Maintenance of reference (replica) copies - save, purge, refresh, …
https://github.com/odpi/egeria
Egeria Local Graph Repository
• The Egeria distribution includes a persistent repository and a non-persistent repository
• The persistent repository is a graph repository built on JanusGraph
• JanusGraph is an open-source project, hosted by the Linux Foundation
  - http://janusgraph.org
  - http://github.com/janusgraph/janusgraph
• The built-in graph repository provides an OMAG Server with a persistent metadata store and is built
using Egeria’s ‘plugin’ pattern
• The graph repository can store instances of metadata owned by the local server
• It can also store reference copies of metadata instances replicated to the local server
• It also supports relationship instances that refer to entity proxy instances
https://github.com/odpi/egeria
Anatomy of the local graph repository
79
Graph Repository
JanusGraph
persistence
search
OMAG Server
OMAS – access services
OMRS Enterprise Connector OMRS topics
in
out
Apache
Tinkerpop
OMRS Local Connector
& Event Mapper
OMRS Graph Connector
JanusGraph
Management
Cohort
https://github.com/odpi/egeria
Graph Repository components
• GraphOMRSRepositoryConnector – implements the open connector framework interface
• GraphOMRSRepositoryConnectorProvider – implements the mechanism for brokering a connector
• GraphOMRSMetadataCollection – top level interface supporting type and instance operations
• GraphOMRSMetadataStore – implements the MetadataCollection using a graph database
• GraphOMRSGraphFactory – creation, schema, indexing; encapsulates JanusGraph-specifics
• Mappers – convert between OMRS objects and graph vertices and edges
  - GraphOMRSEntityMapper
  - GraphOMRSRelationshipMapper
  - GraphOMRSClassificationMapper
• Plus various utility classes – error codes, audit logging, constants and utility methods
https://github.com/odpi/egeria/
See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/open-metadata-collection-store-connectors/graph-repository-connector
https://github.com/odpi/egeria
To use the Egeria Graph Repository
• Configure the OMAG Server with repository mode ‘local-graph-repository’
  - e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
• Subsequently, start the OMRS instance in the server
  - e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
• When OMRS starts, the graph repository auto-creates a JanusGraph database, including:
  - Persistence backend
  - Search backend
  - Graph schema
  - Search indexes
• For now, the persistence backend is embedded Berkeley DB and the indexing backend is Lucene;
further options could be added
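The two admin calls above can be assembled with the Python standard library. This sketch only builds the requests and does not send them. The URL paths are the ones shown on the slide, while the function names and the sample user and server names are placeholders chosen for illustration.

```python
from urllib.parse import quote
from urllib.request import Request

ADMIN_ROOT = "/open-metadata/admin-services/users/{user}/servers/{server}"


def admin_url(platform: str, user: str, server: str, suffix: str) -> str:
    """Build an admin-services URL for one server on an OMAG Server Platform."""
    root = ADMIN_ROOT.format(user=quote(user, safe=""), server=quote(server, safe=""))
    return platform.rstrip("/") + root + suffix


def graph_repository_requests(platform: str, user: str, server: str):
    """Return the two POST requests from the slide, in the order they are issued."""
    return [
        # 1. Select the local graph repository as the repository mode.
        Request(
            admin_url(platform, user, server, "/local-repository/mode/local-graph-repository"),
            method="POST",
        ),
        # 2. Start the server instance, which starts OMRS.
        Request(admin_url(platform, user, server, "/instance"), method="POST"),
    ]


# Placeholder user and server names.
requests = graph_repository_requests("http://localhost:8080", "exampleuser", "exampleserver")
for request in requests:
    print(request.get_method(), request.full_url)
```

To actually issue the calls against a running platform, each request could be passed to `urllib.request.urlopen`. That step is left out here because it needs a live server.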
https://github.com/odpi/egeria
Graph Schema
The MetadataCollection interface is the formal interface to an Egeria repository.
Whilst it is possible to look at the graph directly (e.g. using the Gremlin console),
please don’t rely on the schema: it is likely to evolve.
Type data:
• The Graph Repository does not store type definitions
• It delegates all type operations to the Repository Content Manager
Instance data:
• The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
  - vertices for entities and classifications
  - edges for relationships and classifiers
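The vertex and edge split described above can be sketched with a tiny in-memory property graph. This is not the JanusGraph schema, which the slide warns is likely to evolve. The class names, property keys and sample type names are assumptions, and only the four labels (entity, classification, relationship, classifier) follow the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vertex:
    label: str        # "entity" or "classification"
    properties: dict


@dataclass
class Edge:
    label: str        # "relationship" or "classifier"
    source: Vertex
    target: Vertex
    properties: dict = field(default_factory=dict)


class TinyGraph:
    def __init__(self):
        self.vertices: List[Vertex] = []
        self.edges: List[Edge] = []

    def add_entity(self, guid: str, type_name: str, **props) -> Vertex:
        vertex = Vertex("entity", {"guid": guid, "typeName": type_name, **props})
        self.vertices.append(vertex)
        return vertex

    def classify(self, entity: Vertex, name: str, **props) -> Vertex:
        """Add a classification vertex and a classifier edge from the entity to it."""
        classification = Vertex("classification", {"name": name, **props})
        self.vertices.append(classification)
        self.edges.append(Edge("classifier", entity, classification))
        return classification

    def relate(self, guid: str, type_name: str, end1: Vertex, end2: Vertex) -> Edge:
        edge = Edge("relationship", end1, end2, {"guid": guid, "typeName": type_name})
        self.edges.append(edge)
        return edge

    def count(self, label: str) -> int:
        """Count vertices and edges that carry the given label."""
        return sum(1 for element in self.vertices + self.edges if element.label == label)


graph = TinyGraph()
column = graph.add_entity("e1", "RelationalColumn", name="SALARY")
term = graph.add_entity("e2", "GlossaryTerm", name="Annual Salary")
graph.relate("r1", "SemanticAssignment", column, term)
graph.classify(term, "Confidentiality", level="sensitive")
```

Modelling a classification as its own vertex joined by a classifier edge keeps the classification's properties separate from those of the entity it describes.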
https://github.com/odpi/egeria
Instance representations in the OMRS
83
https://github.com/odpi/egeria
Graph mapping – vertices and edges
OMRS instance representation -> Graph schema element
Classification Instance (attributes: primitives, enums, collections) -> vertex, label : “classification”, with properties
Entity Instance (attributes: primitives, enums, collections) -> vertex, label : “entity”, with properties
Relationship Instance (attributes: primitives, enums, collections) -> edge, label : “relationship”, with properties
Classifier -> edge, label : “classifier”, with properties
https://github.com/odpi/egeria
Graph mapping – vertices and edges
Properties
Properties Properties
Properties
Properties
entity
entity
classification
classification
https://github.com/odpi/egeria
Metadata Repository API
• A MetadataCollection supports a comprehensive API
  - Metadata collection Id
  - Query types
  - Define/maintain types
  - Search/query metadata instances
  - Maintain metadata instances
  - Historical (as of time) queries
  - Effectivity dating
  - Versioning
  - Metadata
  - Advanced maintenance
  - Managing reference copies
• The protocol is forgiving: a repository may offer only minimal capability,
such as metadata instance search/query
86
https://github.com/odpi/egeria
Local instances, reference copies and proxies
87
The graph contains one vertex per entity – whether the entity is local, a reference copy or a proxy
The graph contains one edge per relationship – whether the relationship is local or a reference copy
Reference Copies
• The metadataCollectionId core attribute is set to the ‘guid’ of the home repository
Entity Proxy objects
• Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a proxy
https://github.com/odpi/egeria
The MetadataCollection ‘graph-query’ methods
• There are 4 sub-graph query methods:
  - getRelatedEntities(): returns the entity and its immediate neighbors
  - getEntityNeighborhood(): returns the entity and its neighbors up to the depth specified by the
‘level’ parameter
  - getLinkingEntities(): returns the relationships and intermediate entities that connect the
specified pair of entities
  - getRelationshipsForEntity(): returns relationships associated with the entity, optionally filtered by
relationship type and status
level = 2
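The depth-limited neighborhood query can be illustrated with a breadth-first traversal. This is a sketch of the general technique and not the Egeria implementation: the function names, the adjacency structure and the sample guids are assumptions. It shows how a `level` value bounds how far the traversal spreads from the starting entity.

```python
from collections import deque


def build_adjacency(relationships):
    """Turn (relationship_guid, end1, end2) triples into an undirected adjacency map."""
    adjacency = {}
    for rel_guid, end1, end2 in relationships:
        adjacency.setdefault(end1, []).append((rel_guid, end2))
        adjacency.setdefault(end2, []).append((rel_guid, end1))
    return adjacency


def entity_neighborhood(adjacency, start, level):
    """Return (entities, relationships) reachable within `level` hops of start."""
    entities = {start}
    relationships = set()
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == level:
            continue  # do not expand beyond the requested depth
        for rel_guid, neighbor in adjacency.get(entity, []):
            relationships.add(rel_guid)
            if neighbor not in entities:
                entities.add(neighbor)
                queue.append((neighbor, depth + 1))
    return entities, relationships


# A simple chain: a - b - c - d
adjacency = build_adjacency([("r1", "a", "b"), ("r2", "b", "c"), ("r3", "c", "d")])
entities, relationships = entity_neighborhood(adjacency, "a", level=2)
```

With `level=2` the traversal starting at `a` reaches `b` and `c` but stops before `d`, so a caller can cap how much of a large graph is returned.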
https://github.com/odpi/egeria
Graph Repository – supported functions
• The Graph Repository supports most of the OMRS MetadataCollection API, including:
  - Save and purge of reference copies
  - Use of entity proxies
  - Delete and restore as well as purge: delete is a soft, restorable delete; purge is permanent
  - Re-type of instances
  - Re-identify of instances
  - Re-home of instances
  - The four ‘graph queries’, described on the previous slide
  - The ‘find’ methods: find..ByProperty, find..ByPropertyValue, findEntityByClassification
• The Graph Repository does not (yet) support:
  - Historic queries: find methods that specify an asOfTime parameter
  - Undo of previous instance updates
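The difference between a soft delete, a restore and a purge can be sketched with a small in-memory store. The class and method names are hypothetical and this is not the repository's code. It only illustrates the behavior stated above: a deleted instance can be brought back, a purged one cannot.

```python
class InstanceStore:
    """Toy store that separates soft-deleted instances from active ones."""

    def __init__(self):
        self._active = {}
        self._deleted = {}

    def add(self, guid, properties):
        self._active[guid] = dict(properties)

    def get(self, guid):
        """Return the active instance, or None if it is deleted or unknown."""
        return self._active.get(guid)

    def delete(self, guid):
        """Soft delete: hide the instance but keep it so it can be restored."""
        self._deleted[guid] = self._active.pop(guid)

    def restore(self, guid):
        """Bring a soft-deleted instance back; raises KeyError if it was purged."""
        self._active[guid] = self._deleted.pop(guid)

    def purge(self, guid):
        """Permanently remove the instance, whether active or soft-deleted."""
        if self._active.pop(guid, None) is None and self._deleted.pop(guid, None) is None:
            raise KeyError(guid)


store = InstanceStore()
store.add("e1", {"name": "Annual Salary"})
store.delete("e1")
hidden = store.get("e1")        # None while soft-deleted
store.restore("e1")
restored = store.get("e1")      # visible again
store.delete("e1")
store.purge("e1")               # gone for good
```

After the purge, calling `store.restore("e1")` raises `KeyError`, because nothing remains to restore.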
https://github.com/odpi/egeria
USER INTERFACE DESIGN
90
Supporting business and technical people
https://github.com/odpi/egeria
UI: good and the not so good.
91
Confusing
Not my language
(too technical or not technical enough)
Not meeting my needs
Presented for my role
Logically flows to complete the
tasks I do.
Underpinned by relevant
(persona specific) APIs
Not using my words
Mismatches my world view
Someone from my role was involved
In creating the UI.
https://github.com/odpi/egeria
UIs
ODPi Egeria design
92
Search
Open Metadata Access Services
Open Metadata Repository Services
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
ODPi
Egeria
Metadata
repositories
https://github.com/odpi/egeria
UIs
ODPi Egeria UI types
93
Open Metadata Access Services
Open Metadata Repository Services
Search
Daemon
Type 1
OMAS only
Type 2
OMAS and OCF
Connector
Type 3
OMRS
Type 4
Daemon UI
Data
store
https://github.com/odpi/egeria
UIs
ODPi Egeria UI types work in progress
94
Search
Type 1
OMAS only
Type 2
OMAS and OCF
Connector
Type 3
OMRS
Type 4
Daemon UI
IBM creating
Subject Area UI
ING creating
Asset Search
IBM creating
Type explorer
and instance
explorer
ING creating
Lineage viewer
https://github.com/odpi/egeria
Tomcat *
• configuration
Current UI implementation
95
Web app
Egeria
OMAG Server
Rest call
* Egeria UIs are coded to work with Tomcat. We expect other web servers to be supported as the community
requires and implements them.
https://github.com/odpi/egeria
UI design – profile driven
96
Login
Personal
Profile
A user’s roles define which UI capabilities
the user should see
Subject
area
Type
explorer
Asset
Search
Many more to come ……..
The challenge is dealing well with
potentially large
amounts of data in a
persona-specific way,
e.g. by paging and by
limiting neighborhood
depth in graph calls
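One way to keep large result sets manageable in a UI is to fetch them a page at a time. The sketch below shows the general paging technique only. The function name `iter_pages` and the `fetch(start, page_size)` callback signature are assumptions for illustration and are not an Egeria API.

```python
def iter_pages(fetch, page_size):
    """Yield successive pages from fetch(start, page_size) until the data runs out."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 0
    while True:
        batch = fetch(start, page_size)
        if not batch:
            return
        yield batch
        if len(batch) < page_size:
            return  # a short page means there is nothing further to fetch
        start += page_size


# Stand-in for a remote search call: slices a local list.
ASSETS = ["asset-0", "asset-1", "asset-2", "asset-3", "asset-4"]


def fetch_assets(start, page_size):
    return ASSETS[start:start + page_size]


pages = list(iter_pages(fetch_assets, page_size=2))
```

A UI would typically request the next page only when the user scrolls or clicks, so the generator form lets the caller stop early without fetching everything.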
https://github.com/odpi/egeria
Egeria UI technology experiences
97
• Polymer is a web component technology providing web components. It is not a framework
• + nice separation of components, hiding implementation in the shadow DOM
• + components communicate through property binding
• + support for events
• + many existing paper and iron components for simple things
David’s (Polymer newbie) experiences:
• - quirky: spent a lot of time finding the happy path to get things working, especially around web
components not being initialized when you want to use them (a big frustration was trying to issue a REST call
from the ready() method).
• +/- need to be rigorous with architecture; it seems best to use one-way bindings and events, with
a top-level controller component to drive state transitions for MVC, e.g. around a grid. Redux may make
sense to hold state and define state transitions
• - I could not find a free commercial smart (editable) grid (this seems true for other frameworks as well)
https://github.com/odpi/egeria
The sort of architecture more complex web components
require.
98
• Controller controls all transitions
• The model allows data updates to occur on
the model with simple CRUD operations
• The model changes are then reflected into
the view.
Considerations:
- Operations are currently synchronous. Redux
would be asynchronous
- The spinner would need to lock across the complete
user interaction, not just the REST call
- Changes to the view made by the user and
changes to the view from the model need to be
managed
- Paging is required.
https://github.com/odpi/egeria
Call for action!
99
Call to the community for open source UI developers!
Be part of showing how powerful open metadata is using visualization!
Fuel the ODPi rocket!

Editor's Notes

  • #4: A 3-minute video gives a great intro to the why and how… let this lead us forward.
  • #7: AUTOMATED: Metadata is created by the application at the same time as the data is created, in a standard manner that is easily consumable by everyone with the necessary permissions. For a picture, for example: the device that took it, its name, the settings it was taken at, its location geotag, etc. All of this is automatic and is done at the time the data is created.
  • #15: Egeria is an Open Source framework that can be used to provide a distributed, unified view of metadata from different sources, including different stores and tools from different vendors. Egeria creates a unified view of metadata residing in those tools and stores, so users can collaborate and share metadata, without needing to visit multiple tools or stores. Egeria does not attempt to consolidate the metadata into one repository or tool – it’s better to leave it in place - the current owners stay in control of their metadata, and it stays local to its native store or tool. Egeria provides an open type system, plus APIs, protocols, connectors and local metadata repositories.
  • #16: The internal architecture of Egeria has two distinct layers. The Open Metadata Access Services layer supports the different types of user and use case. The Open Metadata Repository Services layer provides the unified view of metadata across distinct systems, using protocols and repositories for access and exchange of metadata objects. Egeria’s OMRS layer includes the ability to refer to remote objects or replicate cached copies of remote objects for performance and availability. Egeria can store this distributed model in its own local repositories, which support the storing of local objects, replicas of remote objects and proxy references to remote objects.
  • #17: This slide shows a physical embodiment of a cohort of OMAG Servers. An OMAG Server is a deployable unit of function, and each OMAG Server can be configured to run a set of OMAS services, to support a repository, or to combine these roles. An Egeria cohort is a collection of cooperating OMAG Servers. An OMAG Server may belong to multiple cohorts. The OMAS services are local to a server. Each server runs the set of OMAS services listed in its configuration; it is OK to run 0, 1 or multiple OMAS services in a server. Each OMAS is for a specific purpose or persona. The OMRS protocol layer is supported by all servers. The OMAG Servers use OMRS to access and exchange metadata across the cohort. A server shares its metadata over OMRS, sending an event each time a change occurs, or sending a query to other servers. A server may optionally maintain a local Egeria repository. A server may optionally connect to a 3rd party metadata repository. In a few slides we’ll see that the OMRS itself is composed of distinct layers that focus on cross-cohort (“Enterprise”) functions and Local functions.
  • #18: The role of OMRS is to provide a location-transparent, unified view of metadata within a cohort. Cross-cohort operations are supported by the OMRS ‘Enterprise Connector’, including sending queries to the cohort and receiving the results, as well as receiving replicated metadata and saving copies via the local connector. Meanwhile the ‘Local Connector’ handles interactions with an (optional) local repository and provides a default event mapper that sends events when the local state changes. The OMRS protocol uses publish/subscribe over Kafka topics, but the communication/messaging system is pluggable so different transports could be used. The interface to the repository connector is the MetadataCollection API, which is described on the next slide.
  • #19: Egeria’s model of metadata is graph-oriented, both at the business layer and beneath that in the structural metadata. Business metadata describes the data that the business needs, what it means and how it should be classified and protected. Structural metadata describes how the data is actually stored and labelled in the data store. The linkages within and between the business and technical metadata form a graph that can be used to switch between these two perspectives. One of the built-in repositories in Egeria is a graph repository, a natural fit for the metadata graph that also accommodates the distributed nature of OMRS. The Egeria local graph repository is built on the open-source JanusGraph graph database.
  • #23: It may not always be practical to replicate an instance. There are 2 occasions where using a proxy is advantageous: 1. An OMAS wants to save a relationship in a repository and the replication has not happened yet (or the set-up is such that replication of that type is not enabled). 2. The repository does not support the full entity type but does support proxies (all proxies have the same storage requirement). A key point about the distributed graph is that whether the relationship refers to a replica entity or uses an entity proxy, it is location transparent. The Enterprise OMRS layer can select the repository in which to save an instance, based on capability and proximity.
  • #57: This is ambitious.
  • #58: Beyond this is where we put stretch-goal material and deeper dive information.
  • #62: ODPi
  • #69: Business metadata describes the data that the business needs, what it means and how it should be classified and protected. Structural metadata describes how the data is actually stored and labelled in the data store. The linkage between the business and technical metadata allows our technology to switch between these two perspectives. For example, a request for data expressed in business terminology can be translated into a query for data from a data store. An integration engine copying data into a sandbox can discover which fields the business classifies as sensitive and then mask these values dynamically.
  • #78: We’re not going to describe this interface in detail – but it’s worth being aware of it, especially as we’re going to talk later about the graph-queries in Group 3.
  • #79: Egeria provides a persistent graph repository. It’s built using JanusGraph and currently uses version 0.3.1. JanusGraph is an open source project hosted by the Linux Foundation that supports the Apache TinkerPop 3.3 interface. The Egeria graph repository is built using the Egeria ‘plugin’ repository pattern, in which the repository connector is both the connector and the implementation of the repository. The graph repository supports instances originating locally, instances replicated from a remote server and proxy instances.
  • #80: This slide shows (some of) the layers within an OMAG Server. We talked earlier about the access services and about the Enterprise Connector and Local Connectors within OMRS. Now we want to focus on the relationship between the Egeria graph repository connector and repository implementation (both in aqua-blue) and the JanusGraph code (in green). As far as possible the repository uses Apache TinkerPop for graph operations. The reasoning is simply that, while we like JanusGraph, it is probably sensible to stay as close as possible to the TinkerPop interface for possible future portability. There are some aspects of interacting with a graph database that are inherently implementation-specific: things like the configuration (e.g. of backends), schema and indexing. For these types of interaction it is necessary to use the JanusGraph Management interface.
  • #83: Whilst you could look inside the graph for debugging or development, please don’t write code that relies on the schema, as it is very likely to evolve. The graph does not contain type information; Egeria provides a repository helper that manages types. The graph is used to store instance data, as described in more detail on the following slides.
  • #84: Here is an example of a number of OMRS instance objects: there are two entities that are connected by a relationship. Also, one of the entities has two classifications. All of the instances have attributes; some will be core attributes used for type or control information, and others will be attributes that are specific to the instance type (known as type-defined attributes). You don’t need to remember this picture; we’ll stick a copy of it in the top corner so we can refer back to it.
  • #85: Entities and classifications are vertices. Relationships and classifiers are edges. The graph schema defines labels for Entity, Relationship, Classification and Classifier. Vertex and edge properties are used to store OMRS instance data, which includes type, control and property information. Type is referenced by name, not linked by an edge; types are held in the repository content manager, not stored in the graph. Control information is stored in ‘core attribute’ properties. Instance properties are stored in serialized form and under unique custom keys to support search.
  • #89: Within Group 3 of the MDC API ….
  • #92: Experts in a field have their own jargon and ways of doing things. A search report writer is interested in assets and not in security policies. A security policy author is not interested in assets. Goals, tasks and associated artifacts are specific to a role.
  • #94: 1. OMAS only, e.g. Subject Area: the UI uses only the OMAS interfaces to communicate with Egeria. 2. OMAS and connector, e.g. VDC: metadata is obtained from Egeria using OMAS calls, and the actual data is accessed using an RDB connector. 3. OMRS-oriented UIs, e.g. Tex, used to explore Egeria types. 4. Daemon UIs, e.g. displaying lineage.
  • #96: For this to work we need to know the hostname, the ports and the URL structures. Configuration for Tomcat is via application.properties. Configuration of the server is held in a file and authored via admin REST calls.
  • #99: The example here is the glossary grid, a grid for authoring glossaries in the Subject Area UI. Work in progress.
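
The instance-to-graph mapping described in note #85 can be made concrete with a small Python sketch. It is an illustration only and does not use JanusGraph or TinkerPop: the Graph, Vertex and Edge classes, the "typeName" key, the "prop:" key prefix and the sample type names and values are all assumptions introduced here, and the serialization of property values is left out. It builds the shape described in note #84, two entities joined by a relationship, with two classifications on one of the entities.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Vertex:
    label: str  # "Entity" or "Classification"
    properties: Dict[str, object]


@dataclass
class Edge:
    label: str  # "Relationship" or "Classifier"
    out_vertex: int  # index into Graph.vertices
    in_vertex: int
    properties: Dict[str, object]


@dataclass
class Graph:
    """Hypothetical in-memory stand-in for a property graph."""
    vertices: List[Vertex] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, label: str, properties: dict) -> int:
        self.vertices.append(Vertex(label, properties))
        return len(self.vertices) - 1

    def add_edge(self, label: str, out_vertex: int, in_vertex: int, properties: dict) -> None:
        self.edges.append(Edge(label, out_vertex, in_vertex, properties))


def instance_props(type_name: str, core: dict, props: dict) -> dict:
    """Flatten one instance into vertex or edge properties.

    The type is recorded by name only (no edge to a type vertex), control
    information goes into core-attribute keys, and each instance property
    gets its own key. The key formats are assumptions for illustration.
    """
    flat = {"typeName": type_name}
    flat.update(core)
    flat.update({f"prop:{type_name}:{key}": value for key, value in props.items()})
    return flat


def add_entity(g: Graph, type_name: str, core: dict, props: dict,
               classifications: Iterable[Tuple[str, dict]] = ()) -> int:
    # An entity is a vertex; each classification is its own vertex,
    # attached to the entity by a "Classifier" edge.
    entity = g.add_vertex("Entity", instance_props(type_name, core, props))
    for c_name, c_props in classifications:
        c = g.add_vertex("Classification", instance_props(c_name, {}, c_props))
        g.add_edge("Classifier", entity, c, {})
    return entity


def add_relationship(g: Graph, type_name: str, core: dict,
                     end1: int, end2: int, props: dict) -> None:
    # A relationship is an edge between two entity vertices.
    g.add_edge("Relationship", end1, end2, instance_props(type_name, core, props))


graph = Graph()
term = add_entity(graph, "GlossaryTerm", {"guid": "t1"}, {"displayName": "Customer"},
                  classifications=[("Confidentiality", {"level": 3}),
                                   ("Retention", {"basis": 1})])
glossary = add_entity(graph, "Glossary", {"guid": "g1"}, {"displayName": "Sales"})
add_relationship(graph, "TermAnchor", {"guid": "r1"}, glossary, term, {})
```

Keeping the type as a plain name on each vertex and edge mirrors the point in the notes that types live in the repository content manager rather than in the graph.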