Organize & manage master meta data centrally, built upon kong, cassandra, neo4j & elasticsearch.
Hello!
I am Akhil Agrawal
Managing master & meta data is a very common problem with no good open source alternative as far as I know, so I am initiating this project: MasterMetaData.
Started BIZense in 2008 & Digikrit in 2015
1. Problem
Let’s start with the problem we are addressing: why MasterMetaData?
Why MasterMetaData?
Less Frequently Changing
◎ Master data and meta data both change infrequently, although they serve different purposes.
◎ Infrequently changing data, whether it describes real-world entities (master data) or other data (meta data), can be stored, accessed and managed in very similar ways.
No Open Source Option
◎ There are MDM solutions (mostly from ERP vendors like SAP and Oracle, and analytics companies like Informatica and SAS), but the intersection of master data and meta data has only recently begun to be explored.
◎ There are no open source alternatives for smaller companies, or anything that can be embedded in SaaS products.
2. Definitions
Let’s start with some definitions around data categories
Definition of Data Categories
Meta Data
Meta information about other forms of data (can describe master, transaction or lower-level meta data).
Master Data
Real-world entities like customer, partner etc. (only the stable attributes are considered part of master data).
Transaction Data
Real-world interactions that have a very short lifespan and whose occurrence is linked to time and space (attribute values are unstable and each new data point is unique, although the definition/description is stable).
Master Meta Data
Combination of master and meta data defined at application, enterprise or global level (although the volume and variety of master & meta data are very different, they have a lot of common access patterns).
Master Meta Data
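The data categories defined above can be pictured as a handful of record types. What follows is only a minimal sketch under stated assumptions: the class names, field names and the `undescribed_attributes` helper are hypothetical illustrations, not part of the MasterMetaData project.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class DataCategory(Enum):
    META = "meta"                # data about other data
    MASTER = "master"            # stable attributes of real-world entities
    TRANSACTION = "transaction"  # short-lived interactions tied to time/space


@dataclass(frozen=True)
class AttributeMeta:
    """Meta data: describes one attribute of some other record type."""
    name: str
    type_name: str
    describes: DataCategory


@dataclass
class MasterRecord:
    """Master data: only the stable attributes of an entity."""
    entity_type: str
    key: str
    attributes: dict = field(default_factory=dict)


@dataclass
class TransactionRecord:
    """Transaction data: a unique data point that references master data."""
    master_key: str
    occurred_at: datetime
    values: dict = field(default_factory=dict)


def undescribed_attributes(record, schema):
    """Return attribute names on a master record that no meta data entry describes."""
    known = {m.name for m in schema if m.describes is DataCategory.MASTER}
    return sorted(set(record.attributes) - known)


# Hypothetical sample data for illustration only.
customer_schema = [
    AttributeMeta("name", "string", DataCategory.MASTER),
    AttributeMeta("country", "string", DataCategory.MASTER),
]
customer = MasterRecord("customer", "c-1", {"name": "Acme", "country": "IN"})
order = TransactionRecord("c-1", datetime(2016, 1, 1, tzinfo=timezone.utc), {"amount": 120})
```

Here the meta data (`customer_schema`) describes the master data (`customer`), while the transaction (`order`) only references the master record by key and carries its own time-bound values.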
3. Implementation
Let’s discuss the implementation: technologies & concepts involved
Background
◎ Faced difficulty with managing master and meta data in previous projects
◎ Implemented a custom solution while building a mobile ad platform
◎ Currently implementing the same features, required for the communication platform
◎ Have worked with elasticsearch + kibana, while kong + cassandra seem useful
Built With the Following Technologies
neo4j
Highly scalable native graph database that leverages data relationships as first-class entities and handles evolving data challenges.
elasticsearch
Search and analyze data in real time; the de facto standard for making data accessible through search and aggregations.
cassandra
The right choice when you need linear scalability and high availability without compromising performance & durability.
kong
The open-source management layer for APIs and microservices, delivering security, high performance and reliability.
lua
A powerful, fast, lightweight, embeddable scripting language, used here for writing kong plugins that provide access to the various master and meta data.
kibana
Explore and visualize data in elasticsearch; an open source project from the elasticsearch team with an intuitive interface, visualizations & dashboards.
Opensource, Scalable, Searchable, Ready to Use
Project mastermetadata needs to be ready to use for at least a few use cases like location, device, movie, tour etc.
Why neo4j for mastermetadata?
Challenges
◎ Complex & hierarchical data sets
◎ Real-time query performance
◎ Dynamic structure
◎ Evolving relationships
Why neo4j?
◎ Native graph store
◎ Flexible schema
◎ Performance and scalability
◎ High availability
Referenced from http://neo4j.com/use-cases/master-data-management
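To make the point about hierarchical data and evolving relationships concrete, here is a small sketch of a labeled graph in plain Python. It is a stand-in for a graph database, not neo4j's API; the `Graph` class, the relationship names and the sample places are assumptions chosen for illustration.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory labeled graph; a stand-in for a graph database."""

    def __init__(self):
        # node -> list of (relationship, target node)
        self.edges = defaultdict(list)

    def relate(self, source, relationship, target):
        """Add a directed, labeled relationship between two nodes."""
        self.edges[source].append((relationship, target))

    def ancestors(self, node, relationship):
        """Follow one relationship type upward until it runs out."""
        path = []
        seen = {node}  # guards against cycles in the data
        current = node
        while True:
            parents = [t for r, t in self.edges[current] if r == relationship]
            if not parents or parents[0] in seen:
                return path
            current = parents[0]
            seen.add(current)
            path.append(current)


# Hypothetical location hierarchy: City -> State -> Country.
g = Graph()
g.relate("Pune", "LOCATED_IN", "Maharashtra")
g.relate("Maharashtra", "LOCATED_IN", "India")
# A new relationship type can be added later without changing any schema.
g.relate("Pune", "HAS_TIMEZONE", "Asia/Kolkata")

print(g.ancestors("Pune", "LOCATED_IN"))
```

Adding the `HAS_TIMEZONE` relationship required no change to existing nodes, which is the kind of flexibility the slide attributes to a graph model.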
Why elasticsearch for mastermetadata?
Scale
◎ Real-Time Data
◎ Massively Distributed
◎ High Availability
◎ Multitenancy
◎ Per-Operation Persistence
Search
◎ Full-Text Search
◎ Document-Oriented
◎ Schema-Free
◎ Developer-Friendly, RESTful API
◎ Built on top of Apache Lucene™
Analytics
◎ Real-Time Advanced Analytics
◎ Very flexible Query DSL
◎ Flexible analytics & visualization platform: Kibana
◎ Real-time summary and charting of streaming data
Referenced from https://www.elastic.co/products/elasticsearch
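As an illustration of the Query DSL mentioned above, the sketch below builds a search request body as a Python dictionary. The `bool`, `match`, `term` and `terms` aggregation constructs are standard Query DSL; the field names (`name`, `os`, `browser`) and the `device_search_body` helper are hypothetical, and the sketch assumes `os` and `browser` are mapped as exact-value fields. It only builds the body and does not contact a cluster.

```python
import json


def device_search_body(text, os_name, size=10):
    """Build a search body: full-text match, exact filter, and a terms aggregation."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text search on a hypothetical "name" field.
                "must": [{"match": {"name": text}}],
                # Exact-value filter on a hypothetical "os" field.
                "filter": [{"term": {"os": os_name}}],
            }
        },
        # Count matching devices per browser (hypothetical field).
        "aggs": {"by_browser": {"terms": {"field": "browser"}}},
    }


body = device_search_body("galaxy", "android")
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent to an index's `_search` endpoint; the same body combines search and aggregation in one request, which is what makes the DSL useful for both access and analytics.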
Why kong for mastermetadata?
Secure, Manage & Extend your APIs and Microservices
RESTful Interface
Plugin Oriented
Platform Agnostic
Referenced from https://getkong.org/
[Diagram: Without Kong vs. With Kong]
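The "plugin oriented" idea can be sketched as a chain of small functions that run before a request reaches the upstream service. This is a conceptual illustration in plain Python, not Kong's plugin API (Kong plugins are written in Lua); `Gateway`, `key_auth`, `rate_limit` and `locations_service` are hypothetical names.

```python
def key_auth(valid_keys):
    """Plugin factory: reject requests that lack a known API key."""
    def plugin(request):
        if request.get("apikey") not in valid_keys:
            return {"status": 401, "body": "invalid API key"}
        return None  # let the request continue down the chain
    return plugin


def rate_limit(max_calls):
    """Plugin factory: allow at most max_calls requests per API key."""
    calls = {}

    def plugin(request):
        key = request.get("apikey")
        calls[key] = calls.get(key, 0) + 1
        if calls[key] > max_calls:
            return {"status": 429, "body": "rate limit exceeded"}
        return None
    return plugin


class Gateway:
    """Runs every plugin in order, then forwards to the upstream service."""

    def __init__(self, plugins, upstream):
        self.plugins = plugins
        self.upstream = upstream

    def handle(self, request):
        for plugin in self.plugins:
            early_response = plugin(request)
            if early_response is not None:
                return early_response
        return self.upstream(request)


def locations_service(request):
    """Hypothetical upstream service with no auth or throttling logic of its own."""
    return {"status": 200, "body": "locations for " + request["path"]}


def build_gateway():
    return Gateway([key_auth({"secret"}), rate_limit(2)], locations_service)
```

With the shared concerns moved into the gateway, each upstream service stays free of authentication and throttling code, which is roughly the contrast the "Without Kong / With Kong" diagram draws.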
4. Interesting
What interesting things are happening around this?
Master & Metadata Management Intersection
Maximized Metadata Model
◎ The data model describing the metadata needs to be “maximized” to cover as many use cases as possible
◎ The metadata model needs to be inclusive of all metadata in the organization as well as cover the master data
◎ Governance of the metadata model requires the ability to describe as much of the metadata in the system as possible, so that data describing other data can be governed
Minimalistic Master Data Model
◎ The data model describing master data needs to be “minimalist”
◎ The master data model is neither inclusive of all data in the organization, nor specific to the applications using it for a specific purpose
◎ Central governance of master data requires that the data model backing it is minimalistic, so that it can be governed without application-specific details
◎ The master data model is basically metadata describing the master data
Referenced from http://blogs.gartner.com/andrew_white/2011/04/26/more-on-metadata-and-master-data-management-intersection/
From Big Data To Smart Data
Zero Latency Organization
Types Of Latency
data
◎ latency linked to the data (capturing)
◎ latency linked to analytical processes (processing)
structural
◎ latency linked to decision-making processes
◎ time needed to implement actions linked with decisions
action
◎ data latency added to structural latency
◎ time needed from capturing the data until the action takes place
Smart Data
value
data is considered smart based on the value it brings to decision making and action taking (rather than anything else like size, source, etc.)
master
data which represents real-world entities and remains stable over time is smart data, as it provides a common data reference
meta
data which describes other data, whether master, transactional or lower-level meta data, is also smart data, as it helps in understanding the data it describes
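The three latency types on this slide combine additively: action latency is data latency plus structural latency. A minimal worked example follows; the `Latency` class and the numbers are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Latency:
    """All durations in seconds."""
    capture_s: float         # data latency: capturing the data
    processing_s: float      # data latency: analytical processing
    decision_s: float        # structural latency: decision making
    implementation_s: float  # structural latency: implementing the action

    @property
    def data(self):
        return self.capture_s + self.processing_s

    @property
    def structural(self):
        return self.decision_s + self.implementation_s

    @property
    def action(self):
        # Time from capturing the data until the action takes place.
        return self.data + self.structural


# Hypothetical figures: 5 s capture, 1 min processing, 1 h decision, 15 min rollout.
example = Latency(capture_s=5, processing_s=60, decision_s=3600, implementation_s=900)
print(example.data, example.structural, example.action)
```

In this made-up case the structural part dominates (4500 of 4565 seconds), so speeding up capture or processing alone would barely move the total.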
Master Meta Data
5. Get Involved
Let’s discuss ways to get involved in this project
Areas where you can get involved
DEMO
Functional Tests, Integration Tests, Run Demo
CODE
Implement Ideas, Fix Bugs, Enhance Features
DOCUMENT
User Documentation, Developer Documentation
Current Focus
Devices
Storage: Device, Browser, OS
Access: User Agent
Locations
Storage: Country, State, City
Access: IP Address
Tours
Storage: People, Interest, Culture, Destination, City, Activity, Duration
Access: What, Where, For
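The Locations use case above (store Country, State, City; access by IP address) could look roughly like the sketch below. It uses only Python's standard `ipaddress` module. The range table is hypothetical sample data built on documentation-only address blocks, and `locate` is an illustrative name, not a function of the project.

```python
import ipaddress

# Hypothetical sample data. The networks are reserved documentation ranges,
# so the mapping to real places is purely illustrative.
LOCATION_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "India", "state": "Maharashtra", "city": "Pune"}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"country": "India", "state": "Karnataka", "city": "Bengaluru"}),
    (ipaddress.ip_network("198.51.100.128/25"),
     {"country": "India", "state": "Karnataka", "city": "Mysuru"}),
]


def locate(ip):
    """Return the location for an IP address, or None if no range matches."""
    address = ipaddress.ip_address(ip)
    matches = [(net, loc) for net, loc in LOCATION_RANGES if address in net]
    if not matches:
        return None
    # When ranges overlap, the most specific (longest prefix) network wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(locate("192.0.2.10"))
```

A linear scan is enough for a sketch; a real data set with many ranges would more likely use a sorted structure or a prefix tree for lookups.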
Storage & Access
Master Data Storage
Storage that is highly efficient for reads while remaining efficient for writes. The stored data must also be searchable, with a flexible and efficient query interface for faster access.
Meta Data Storage
Storage that is highly flexible in defining relationships such as inheritance, composition or other relationships. Graph-modeled relationships are the most flexible to change as the model evolves.
Diagram featured by poweredtemplate.com
Meta Data Access
CRUD, Fill in the blanks, Semantic Query, Search
Master Data Access
CRUD, Query (Structured / Unstructured) & Search
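The master data access pattern listed above (CRUD, structured query, search) can be sketched with an in-memory store. This is an illustration only: `MasterStore` and its methods are hypothetical, and the dictionaries stand in for whichever storage and search engines the project actually uses.

```python
class MasterStore:
    """In-memory master data store with CRUD, structured query and token search."""

    def __init__(self):
        self.records = {}  # key -> attribute dict
        self.index = {}    # lowercase token -> set of record keys

    @staticmethod
    def _tokens(attributes):
        return {tok for value in attributes.values()
                for tok in str(value).lower().split()}

    def create(self, key, attributes):
        if key in self.records:
            raise KeyError("record already exists: " + key)
        self.records[key] = dict(attributes)
        for tok in self._tokens(attributes):
            self.index.setdefault(tok, set()).add(key)

    def read(self, key):
        return self.records.get(key)

    def update(self, key, attributes):
        merged = {**self.records[key], **attributes}
        self.delete(key)  # drop stale index entries before re-indexing
        self.create(key, merged)

    def delete(self, key):
        attributes = self.records.pop(key)
        for tok in self._tokens(attributes):
            self.index[tok].discard(key)

    def query(self, **conditions):
        """Structured query: exact match on every given attribute."""
        return sorted(key for key, attrs in self.records.items()
                      if all(attrs.get(f) == v for f, v in conditions.items()))

    def search(self, text):
        """Unstructured search: records containing every token in the text."""
        hits = [self.index.get(tok, set()) for tok in text.lower().split()]
        return sorted(set.intersection(*hits)) if hits else []


# Hypothetical sample records.
store = MasterStore()
store.create("city:pune", {"name": "Pune", "state": "Maharashtra", "country": "India"})
store.create("city:mumbai", {"name": "Mumbai", "state": "Maharashtra", "country": "India"})
print(store.search("maharashtra"))
```

Reads go straight to a dictionary lookup while writes pay the extra cost of keeping the index current, mirroring the slide's emphasis on read efficiency with acceptable write cost.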
References
◎ https://getkong.org/
◎ http://neo4j.com/
◎ http://cassandra.apache.org/
◎ https://www.elastic.co/
◎ http://booksite.elsevier.com/9780123743695/10steps_DataCategories.pdf
◎ http://blogs.gartner.com/andrew_white/2011/04/26/more-on-metadata-and-master-data-management-intersection/
◎ http://neo4j.com/use-cases/master-data-management/
Thanks!
Any questions?
You can find me at:
@digikrit
akhil@digikrit.com
Special thanks to all the people who made and released these awesome resources for free:
◎ Presentation template by SlidesCarnival
◎ Presentation models by SlideModel & PoweredTemplate
◎ The companies behind kong, cassandra, neo4j & elasticsearch