© Copyright 2015 Hitachi Consulting
NoSQL Data Stores
with Microsoft Azure
Khalid M. Salama, Ph.D.
Business Insights & Analytics
Hitachi Consulting UK
We Make it Happen. Better.
Outline
• What is NoSQL, and why?
• NoSQL Data Stores, Tools & Technologies
• Solution Architecture Patterns
• Introducing Azure Redis Cache
• Introducing Azure Table Storage
• Introducing Azure DocumentDB
• Introducing HBase on Azure HDInsight
• How to Get Started with NoSQL
• Appendix: How to Play with neo4j
Fundamentals
What is SQL?
Relational Database Management Systems (RDBMS)
• Transactional
• Fixed schema
• Structured (tables, columns, rows, constraints, etc.)
• SQL as the query language
• Standard
But:
• Not flexible enough
• Not scalable enough
What is NoSQL?
Key attributes:
• Non-relational
• Semi-structured, with a flexible schema
• Non-transactional
• Distributed, scalable, and fault-tolerant
• Random, real-time read/write access
• Complements the RDBMS rather than replacing it
Why NoSQL?
It is all about Big Data…
NoSQL data stores help overcome Big Data challenges in real-time operational systems:
• Volume: distributed, scalable, fault-tolerant
• Variety: non-relational, semi-structured, flexible schema
• Velocity: eventual consistency model (non-transactional); random, real-time read/write access
NoSQL Usage Patterns
Suitability…
Suitable for:
• Reference data
• Variable data structures
• Singleton select/insert/update
• Random, real-time read/write access
Not suitable for:
• Batch processing
• Complex analytical queries
• Joins
• Complex transactions
CAP Theorem
NoSQL & CAP Theorem
The trade-off…
• To process large volumes of data efficiently, we need to scale out, i.e., partition the data and distribute the computation.
• We then face a trade-off between Consistency, Availability, and Partition tolerance.
• Consistency: data is in a consistent state across all nodes; every read returns the most recent write.
• Availability: every request to the system receives a response, whether success or failure; that is, the system stays responsive (latency).
• Partition tolerance: the system continues to work despite message loss or partition (node) failure; that is, it can sustain partial network failures.
• CAP theorem: only two of the three properties can be satisfied at once in a distributed data system. In practice, since partitions cannot be ruled out, the real choice is consistency vs. availability when a partition occurs.
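To make the trade-off concrete, here is a toy sketch in Python (a hypothetical two-replica model, not any real product): during a partition, a "CP" store refuses requests to stay consistent, while an "AP" store keeps answering and lets its replicas diverge until the partition heals.

```python
class Replica:
    """One copy of the data on one node."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoReplicaStore:
    """Toy store with two replicas, a and b (hypothetical model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False  # True while a and b cannot reach each other

    def write(self, replica, key, value):
        """Write through one replica; return True if the write was accepted."""
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            replica.data[key] = value
            other.data[key] = value  # synchronous replication
            return True
        if self.mode == "CP":
            return False  # refuse rather than let the replicas diverge
        replica.data[key] = value  # AP: accept locally; replicas now disagree
        return True

    def read(self, replica, key):
        """Read from one replica; the CP store refuses reads while partitioned."""
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return replica.data.get(key)

    def heal(self):
        """End the partition and reconcile; b simply adopts a's values
        (a naive rule standing in for real conflict resolution)."""
        self.partitioned = False
        self.b.data.update(self.a.data)
```

In the AP case, a read from replica b during the partition returns the old value; after `heal()` both replicas agree again, which is eventual consistency in miniature.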
NoSQL & CAP Theorem
The trade-off…
(Diagram: a C, A, P triangle. The deciding question: should the system continue working if a partition is not reachable?)
Transactional RDBMS
• ACID model: strong consistency
• Commits are atomic across the entire system
• Does not tolerate partitions: if a partition is unreachable, it stops serving requests rather than return inconsistent data, i.e., it sacrifices availability
Big Data Systems
• BASE model: eventual consistency
• Remains available (operational & responsive)
• Partition tolerant, i.e., sacrifices consistency
ACID
• Atomic: everything in a transaction succeeds, or the entire transaction is rolled back.
• Consistent: a transaction cannot leave the database in an inconsistent state.
• Isolated: transactions cannot interfere with each other.
• Durable: completed transactions persist, even when servers restart, etc.
BASE
• Basically Available: data is sharded and replicated, and consistency is compromised for availability.
• Soft state: data is allowed to be temporarily inconsistent; consistency is restored later.
• Eventual consistency: given no new updates, all replicas eventually converge to the same state.
NoSQL: Strong vs. Eventual Consistency
Most NoSQL tools let you choose the balance between strong and eventual consistency.
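One common way NoSQL stores expose this balance is Dynamo-style quorum replication (used, for example, by Cassandra and Riak): with N replicas, a write waits for W acknowledgements and a read queries R replicas. The sketch below is a simplified, hypothetical model rather than any product's code: if R + W > N, every read quorum overlaps the latest write quorum, so reads are strongly consistent; smaller R and W are faster but may return stale data.

```python
class QuorumStore:
    """Toy N/R/W quorum model (assumed, simplified)."""

    def __init__(self, n, r, w):
        self.n, self.r, self.w = n, r, w
        self.replicas = [dict() for _ in range(n)]  # key -> (version, value)
        self.version = 0

    def is_strong(self):
        # read and write quorums are guaranteed to overlap
        return self.r + self.w > self.n

    def write(self, key, value):
        # the write reaches only the first W replicas; the rest lag behind
        self.version += 1
        for rep in self.replicas[: self.w]:
            rep[key] = (self.version, value)

    def read(self, key):
        # worst case: ask the last R replicas, the least likely to be updated,
        # and keep the reply with the highest version
        replies = [rep.get(key, (0, None)) for rep in self.replicas[-self.r:]]
        return max(replies, key=lambda vv: vv[0])[1]
```

With N = 3, choosing R = W = 2 gives strong reads, while R = W = 1 favours speed and availability at the cost of possibly reading nothing (or an old value) right after a write.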
NoSQL Data Stores
NoSQL Data Stores
Categories and Breeds
• Memory Cache Store
• Graph Store
• Column Family Store
• Key/Value Store
• Document Store
• Others
NoSQL – Key/Value Stores
The agile…
• Based on a hash table (dictionary).
• A unique key points to a data item (the value).
• The data item can be anything (a blob).
• A key can only have one value.
• Simple APIs (GET, PUT, DELETE).
• Yet complex implementation (no query language).
• Optimized for singleton operations.
• Example tools:
– Amazon Dynamo (pioneer)
– Apache Accumulo
– Riak KV
– Redis (memory cache)
– Memcached (memory cache)
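The key/value model above can be sketched in a few lines of Python. This is a hypothetical illustration, not how any of the listed tools are implemented: keys are hashed to one of several in-memory "nodes", and only singleton GET/PUT/DELETE are offered.

```python
import hashlib


class PartitionedKVStore:
    """Hash-partitioned key/value store (hypothetical sketch)."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]  # one dict per "node"

    def _node_for(self, key):
        # stable hash (unlike built-in hash(), which is salted per process)
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value  # a key holds exactly one value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        self._node_for(key).pop(key, None)
```

Simple modulo placement moves most keys when the node count changes; production stores such as Dynamo use consistent hashing to limit that reshuffling.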
NoSQL – Column Family Stores
The big…
• Extensible record stores, a.k.a. wide column stores
• Tabular data structure
• Columns can be extended (organized in column families)
• Resulting in a sparse matrix
• Millions of rows and hundreds of thousands of columns
• APIs (GET, PUT, SCAN, DELETE, COUNT)
• A data item is accessed via (row key, column, timestamp)
• Example tools:
– Google BigTable (pioneer)
– Apache HBase
– Apache Cassandra
(Figure: example table with row keys Key1 to Key3 and two column families, User (Name, Country) and Usage (Recency, Frequency); cells hold versions stamped T1, T2, T3.)
Example: GET(Key1, User, Name, T1) => Khalid
Real-world Scenarios
• Website usage information in Google Analytics
• Geographic information in Google Maps
• Social media apps (Twitter, Facebook, etc.)
• Search engine web crawling results
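As a rough illustration of addressing by (row key, column, timestamp), the Python sketch below models a wide-column table as nested maps: row, then column family, then column, then timestamp. It is a hypothetical in-memory model; only written cells take space, which is how the sparse matrix stays cheap, and column families are fixed up front while columns are free-form.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """row -> family -> column -> {timestamp: value} (hypothetical sketch)."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, column, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][column][ts] = value

    def get(self, row, family, column, ts=None):
        # .get() on the nested maps avoids creating empty rows on reads
        versions = self.rows.get(row, {}).get(family, {}).get(column, {})
        if not versions:
            return None
        if ts is None:
            ts = max(versions)  # latest version by default
        return versions.get(ts)

    def scan(self, start=None, stop=None):
        """Yield row keys in sorted order within [start, stop)."""
        for row in sorted(self.rows):
            if (start is None or row >= start) and (stop is None or row < stop):
                yield row
```

With T1 represented as timestamp 1, `table.get("Key1", "User", "Name", 1)` mirrors the GET(Key1, User, Name, T1) example; omitting the timestamp returns the newest version.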
NoSQL – Document Stores
The popular…
(Figure: keys John, Eva, and Lou, each mapped to a document.)
• Same as a key/value store, where the value is a document with a specific format, e.g., JSON, BSON, XML, etc.
• A document is retrieved by its key, or
• Since the format of the documents is known (e.g., JSON), a query language can be used to retrieve documents.
– E.g., retrieve all documents that have an attribute “country” whose value equals “Algeria”.
• Attributes inside the document are indexed. Documents are often versioned.
• Supports a wide range of OLTP/real-time applications
• Example tools:
– IBM Lotus Notes (pioneer)
– MongoDB
– Apache CouchDB, Couchbase
– Microsoft Azure DocumentDB
Real-world Scenarios
• Content management and personalization
• User data management
• Reference data management
• Lookups for real-time operational intelligence
Note: a “document” here is not an MS Word doc!
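The two access paths (by key, and by querying inside documents) can be sketched in Python with hypothetical JSON documents. In DocumentDB, the query would be written in its SQL dialect, e.g. SELECT * FROM c WHERE c.country = 'Algeria'; here a plain scan stands in for the store's index.

```python
import json

# Hypothetical documents stored by key, as JSON text.
store = {
    "John": json.dumps({"name": "John", "country": "Algeria", "age": 34}),
    "Eva": json.dumps({"name": "Eva", "country": "UK"}),
    "Lou": json.dumps({"name": "Lou", "tags": ["vip"]}),  # no "country" attribute
}


def get_by_key(key):
    """Key/value-style access: fetch one document by its key."""
    return json.loads(store[key])


def find(attribute, value):
    """Query-style access: the format is known (JSON), so we can look inside
    each document. A real document store would use an index, not a full scan."""
    matches = []
    for key, raw in store.items():
        doc = json.loads(raw)
        if doc.get(attribute) == value:
            matches.append(key)
    return matches
```

For example, `find("country", "Algeria")` returns `["John"]`; documents without the attribute (Lou) simply do not match, which is the flexible schema at work.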
NoSQL – Graph Stores
The clever…
• Represent data in graph structures: nodes and edges.
• Nodes represent entities; edges represent relationships between entities.
• Relationships are directed; the semantics of the direction is up to the application. E.g., “Married” is symmetric, “Owns” is not.
• Each node/edge has a set of key/value properties.
• Each node/edge has a label (type of entity/relationship).
• Optimized to process graph-related queries, e.g., the number of steps needed to get from one node to another.
• Example tools:
– Neo4j
– OrientDB
– Titan
– Apache Giraph
– Microsoft Graph Engine (Trinity)
(Figure: two Person nodes, Khalid Salama and Fatima Salama, linked by a Married relationship; a Car node (Jaguar, red) linked to the people by Own/Owned by and Drive relationships; nodes and relationships carry properties such as Id, Since, Frequency, and Licence No.)
Real-world Scenarios
• Social networks
• Network and IT operations
• Fraud detection
• Digital assets management
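A minimal Python sketch of this property-graph model, with hypothetical data loosely based on the figure: nodes and edges both carry a label and key/value properties, edges are directed, and a symmetric relationship such as Married is stored as one edge per direction. Each node keeps direct references to its outgoing edges, so following a relationship needs no global lookup.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality: graphs may contain cycles
class Node:
    id: int
    label: str                                  # entity type, e.g. "Person"
    props: dict = field(default_factory=dict)   # key/value properties
    out: list = field(default_factory=list, repr=False)  # outgoing edges


@dataclass(eq=False)
class Edge:
    id: int
    label: str                                  # relationship type, e.g. "Owns"
    src: Node = field(repr=False)
    dst: Node = field(repr=False)
    props: dict = field(default_factory=dict)


def connect(edge_id, label, src, dst, **props):
    """Create a directed edge and attach it to its source node."""
    edge = Edge(edge_id, label, src, dst, props)
    src.out.append(edge)
    return edge


def neighbours(node, label):
    """Follow outgoing edges of one relationship type."""
    return [e.dst for e in node.out if e.label == label]
```

Usage: after `connect(102, "Owns", khalid, car)`, `neighbours(khalid, "Owns")` returns the car node, while `neighbours(car, "Owns")` is empty because the edge is directed.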
NoSQL – Graph Stores
The clever…
(Figure from O’Reilly, Graph Databases)
NoSQL – Graph Stores
The clever…
• Graph processing: native engines use index-free adjacency, i.e., connected nodes physically “point” to each other in the database; alternatively, any database behaves like a graph DB if it exposes a graph data model through CRUD operations.
• Graph storage: native storage is designed and optimized to store, process, and query graph data structures; alternatively, graphs are serialized into another kind of database: relational, document, or object DBs.
NoSQL – Memory Cache Stores
The fast…
• Usually a simple data structure (key/value).
• The value can be a simple data type (string) or a complex data object.
• A memory-based store, usually with a persistence option.
• Used to optimize access to “hot” data.
• Manages distributed web application session state.
• Can help applications survive service downtime.
• Involves read/write strategies (write-through, write-behind, etc.), data expiration, and conflict resolution techniques.
• Example tools:
– Memcached (pioneer)
– Redis
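Two of the strategies named above, write-through and data expiration, are sketched below in Python. This is a hypothetical model: `backing` is any dict-like object standing in for the database, and the clock is injectable so expiry can be shown deterministically. A write-behind cache would instead queue the backing-store writes and apply them later.

```python
import time


class WriteThroughCache:
    """In-memory cache in front of a slower backing store (sketch)."""

    def __init__(self, backing, ttl_seconds, clock=time.monotonic):
        self.backing = backing
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self.backing[key] = value                   # write-through: store first
        self.entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        hit = self.entries.get(key)
        if hit and hit[1] > self.clock():
            return hit[0]                           # fresh cache hit
        self.entries.pop(key, None)                 # expired or absent
        if key in self.backing:                     # miss: reload from the store
            value = self.backing[key]
            self.entries[key] = (value, self.clock() + self.ttl)
            return value
        return None
```

Note the trade-off: if the backing store changes behind the cache's back, readers may see stale data until the entry expires, which is why the expiry window matters.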
NoSQL – Other Data Stores
Storage for other data structures
Object-oriented databases
• Developed in the 1980s, motivated by the common use of object-oriented programming.
• Simply store objects in a database in a way that corresponds to their representation in the application, without the need for conversion or decomposition.
• Relationships between objects, e.g., inheritance, should also be maintained in the database.
• Examples: Caché, Db4o, Versant Object Database
Resource Description Framework (RDF) data stores: triple stores
• Originally developed for describing metadata of IT resources.
• Used in connection with the semantic web and other applications.
• The RDF model represents information as triples in the form subject-predicate-object.
• Examples: MarkLogic, Virtuoso, Jena, Sesame, Algebraix
Multi-dimensional databases, multi-value databases, time-series databases, event sourcing databases, multi-model data stores, etc.
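The subject-predicate-object model can be illustrated with a tiny Python triple set and a pattern matcher in which `None` acts as a wildcard, similar in spirit to a single SPARQL triple pattern. The triples are hypothetical, and this is not how any of the listed stores work internally.

```python
# Hypothetical triples (subject, predicate, object).
triples = {
    ("Khalid", "worksFor", "HitachiConsulting"),
    ("Paul", "worksFor", "HitachiConsulting"),
    ("Khalid", "follows", "Paul"),
    ("HitachiConsulting", "locatedIn", "UK"),
}


def match(s=None, p=None, o=None):
    """Return the (sorted) triples matching a pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

For example, `match(p="worksFor", o="HitachiConsulting")` answers "who works for Hitachi Consulting?", and `match(s="Khalid")` lists everything known about Khalid.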
NoSQL Tools & Technologies
Category: typical usage; lineage; tools
Key/Value
• Typical usage: flexible data structure; dictionary/lookup; value can be anything
• Lineage: Amazon’s Dynamo
• Tools: Apache Accumulo, Riak KV, Redis
Column Family (tabular/wide column)
• Typical usage: column-oriented access; Big Data with real-time, random read/write access; extensible
• Lineage: Google’s BigTable
• Tools: Apache Cassandra, Apache HBase, HBase on Microsoft HDInsight
Document
• Typical usage: queryable data; objects (complex structures) in JSON, BSON, XML, etc.; CRUD apps
• Lineage: IBM’s Lotus Notes
• Tools: MongoDB, Apache CouchDB, Couchbase, Microsoft Azure DocumentDB
Graph
• Typical usage: social networks; fraud detection; relationship-heavy data
• Lineage: graph theory
• Tools: Neo4j, OrientDB, Titan, Apache Giraph, Microsoft Graph Engine
Memory Cache
• Typical usage: non-durable data; fast access
• Lineage: LiveJournal’s Memcached
• Tools: Redis, Microsoft Azure Redis Cache, Microsoft Azure Memcached (Preview)
Source: http://db-engines.com/en/ranking
Solution Architecture Patterns
Lambda Architecture
NoSQL and Speed Layer
(Diagram: a hot path (speed layer) and a cold path (batch layer).)
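A minimal Python sketch of how the two paths meet at query time, using hypothetical page-view counts: the cold path periodically recomputes a batch view, the hot path (typically backed by a NoSQL store) holds increments for events since the last batch run, and queries merge both.

```python
from collections import Counter

# Cold path: batch view recomputed periodically from the master dataset.
batch_view = Counter({"home": 1000, "products": 250})

# Hot path: speed layer with increments for events since the last batch run.
speed_view = Counter()


def on_event(page):
    """Stream processing updates the real-time view immediately."""
    speed_view[page] += 1


def query(page):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view[page] + speed_view[page]


def on_batch_complete(new_batch_view):
    """A finished batch run supersedes the speed layer's increments
    (simplification: assumes the batch covered every event seen so far)."""
    global batch_view
    batch_view = Counter(new_batch_view)
    speed_view.clear()
```

The speed layer only ever needs to cover the window since the last batch run, which keeps the hot path small and fast.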
NoSQL Usage Patterns
Common Scenarios
(Diagram: online apps read from and write to a NoSQL store.)
• Apps with a NoSQL back-end data store
• High throughput
• Scalability and availability
• Column family, graph & document stores
NoSQL Usage Patterns
Common Scenarios
(Diagram: online apps read from and write to both a NoSQL store and an RDBMS.)
• NoSQL is used to cache web content, personalization, and reference data.
• Business transactions are invoked and stored in the RDBMS.
• Key/value & memory cache stores
NoSQL Usage Patterns
Common Scenarios
(Diagram: online apps read from and write to a NoSQL store; submitted items are processed into an RDBMS.)
• NoSQL stores in-progress transactions/activities (e.g., shopping basket, forms, user session, etc.).
• When transactions are submitted, they are processed into an RDBMS.
• Document stores
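This pattern can be sketched in Python with hypothetical stand-ins: a dict of JSON documents plays the document store holding in-progress baskets, and an in-memory SQLite database plays the RDBMS that receives the submitted order in one ACID transaction.

```python
import json
import sqlite3

# Hypothetical stand-ins: dict = document store, SQLite = transactional RDBMS.
basket_store = {}
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE order_lines (order_id TEXT, sku TEXT, qty INTEGER)")


def add_item(session_id, sku, qty):
    """Cheap, schema-free updates while the user is still shopping."""
    doc = json.loads(basket_store.get(session_id, '{"items": {}}'))
    doc["items"][sku] = doc["items"].get(sku, 0) + qty
    basket_store[session_id] = json.dumps(doc)


def submit(session_id, order_id):
    """On submit, write the basket to the RDBMS in one transaction,
    then drop the in-progress document."""
    doc = json.loads(basket_store[session_id])
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT INTO order_lines VALUES (?, ?, ?)",
            [(order_id, sku, qty) for sku, qty in doc["items"].items()],
        )
    del basket_store[session_id]
```

The basket can change shape freely while it lives in the document store; only the final, submitted state has to fit the relational schema.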
NoSQL Usage Patterns
Common Scenarios
(Diagram: online apps read/write to OLTP RDBMSs; batch ETL loads a data warehouse (cold path); a second batch ETL populates a NoSQL store that apps read from (hot path).)
• Data is extracted, transformed, and loaded from OLTP systems to a DW.
• Aggregations, KPIs, and scores are computed using batch processing.
• Results are populated to a NoSQL data store for reference use in apps.
• Typically, apps perform hot reads, while ETL performs batch writes.
• Document & graph stores
E.g., Single Customer View:
• Customer matching, customer KPIs, segment assignment, and propensity scoring are performed as batch processing in the DW.
• The output goes to NoSQL to be used for real-time recommendation, campaigning, targeted advertising, etc.
NoSQL Usage Patterns
Common Scenarios
(Diagram: online apps send events to stream processing, which looks up reference data in a NoSQL store and writes processed events to it (hot path); batch ETL moves data between NoSQL and the data warehouse (cold path).)
• Used as lookups for stream processing solutions
• Can also be a persistent store of the processed events
• An ETL process periodically extracts data from NoSQL into a DW, and updates the lookups (batch)
• Column family, key/value, graph & document stores
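The lookup role can be sketched in Python with hypothetical data: each incoming event is enriched with a singleton read against reference data (here a dict standing in for the NoSQL lookup store, refreshed by the batch ETL), and the processed event is persisted.

```python
# Hypothetical reference data in a NoSQL lookup store, keyed by customer id
# (e.g., segments computed in the DW and loaded by a batch ETL job).
customer_lookup = {
    "c1": {"segment": "gold", "country": "UK"},
    "c2": {"segment": "silver", "country": "DZ"},
}

# Processed events persisted back to the (NoSQL) event store.
event_store = []


def process(event):
    """Enrich one incoming event with a singleton lookup, then persist it."""
    profile = customer_lookup.get(event["customer_id"], {"segment": "unknown"})
    enriched = {**event, "segment": profile["segment"]}
    event_store.append(enriched)
    return enriched
```

Because every event triggers a key lookup, this is exactly the random, real-time read pattern that NoSQL stores are built for.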
Azure Redis Cache
A key/value in-memory store, a.k.a. a data structure store
• In the example, Redis caches a SQL query (key) and its XML query result (value).
• You can also store hashes, lists, and sorted sets.
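The query-caching example can be sketched as a cache-aside helper in Python. Assumptions: `client` is anything with redis-py style `get(name)` and `set(name, value, ex=...)` methods, `run_query` is a hypothetical database call, and the SQL text is hashed into a short key (a design choice, not something the slide prescribes). A dict-backed fake client lets the sketch run without a server.

```python
import hashlib


def cached_query(client, sql, run_query, ttl_seconds=300):
    """Cache-aside: return the cached result for `sql`, or run the query,
    cache its result with an expiry, and return it."""
    key = "sql:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()
    cached = client.get(key)
    if cached is not None:
        return cached.decode("utf-8") if isinstance(cached, bytes) else cached
    result = run_query(sql)                   # e.g., an XML string
    client.set(key, result, ex=ttl_seconds)   # expire stale results
    return result


class FakeRedis:
    """Dict-backed stand-in so the sketch runs without a server; ignores expiry."""

    def __init__(self):
        self.data = {}

    def get(self, name):
        return self.data.get(name)

    def set(self, name, value, ex=None):
        # mimic Redis returning bytes on reads
        self.data[name] = value.encode("utf-8") if isinstance(value, str) else value
        return True
```

Against Azure Redis Cache, a real client could be created with redis-py, e.g. `redis.Redis(host="<name>.redis.cache.windows.net", port=6380, password="<access key>", ssl=True)`, where the host name and key are placeholders.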
Azure Table Storage
A Key/Value NoSQL Store
• Key/attribute store with a schema-less design.
• Adapts to your data as the needs of your application evolve.
• Access to data is fast and cost-effective for all kinds of applications.
• Significantly lower in cost than traditional SQL for similar volumes of data.
• Use cases: web applications, address books, device information, metadata, etc.
• Row Key and Partition Key must be defined for each entity.
• No complex joins, foreign keys, or stored procedures.
• Supports the OData protocol and LINQ queries with the WCF Data Services .NET libraries.
(Diagram: a storage account contains many tables; a table contains many entities; each entity has a Partition Key and a Row Key.)
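A toy Python model of Table Storage addressing (not the Azure SDK): every entity is identified by its (PartitionKey, RowKey) pair, entities sharing a partition are stored together, and all other properties are schema-less, so entities in the same table can carry different attributes.

```python
from collections import defaultdict


class InMemoryTable:
    """Toy model of Table Storage addressing (hypothetical, not the SDK)."""

    def __init__(self):
        self.partitions = defaultdict(dict)  # PartitionKey -> {RowKey: entity}

    def upsert(self, entity):
        pk, rk = entity["PartitionKey"], entity["RowKey"]  # both are required
        self.partitions[pk][rk] = dict(entity)

    def get(self, pk, rk):
        # point lookup: the cheapest query
        return self.partitions.get(pk, {}).get(rk)

    def query_partition(self, pk):
        # single-partition query, ordered by RowKey
        part = self.partitions.get(pk, {})
        return [part[rk] for rk in sorted(part)]
```

Choosing the PartitionKey is the main design decision: queries within one partition stay cheap, while queries across partitions fan out.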
Azure Table Storage
A Key/Value NoSQL Store
(Code screenshots, annotated with: the namespace to use; the class to inherit; the Row Key and Partition Key properties to set.)
• If a table entity has an attribute that is a collection or an object, Table Storage will ignore it. Thus, such attributes need to be serialized/deserialized explicitly.
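One way to handle such attributes, sketched in Python under stated assumptions: store each complex attribute as a JSON string property (the `...Json` naming is a hypothetical convention) before writing, and parse it back after reading. Check the service's property size limits before storing large values this way.

```python
import json

# Hypothetical entity with a nested collection that would not be persisted as-is.
customer = {
    "PartitionKey": "UK",
    "RowKey": "001",
    "Name": "Khalid",
    "Addresses": [{"city": "London"}, {"city": "Leeds"}],
}


def to_table_entity(obj, complex_fields):
    """Flatten: store each complex attribute as a JSON string property."""
    entity = dict(obj)
    for name in complex_fields:
        entity[name + "Json"] = json.dumps(entity.pop(name))
    return entity


def from_table_entity(entity, complex_fields):
    """Reverse the mapping after reading the entity back."""
    obj = dict(entity)
    for name in complex_fields:
        obj[name] = json.loads(obj.pop(name + "Json"))
    return obj
```

The round trip restores the original object, while the stored entity contains only simple (string) properties.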
Azure DocumentDB
A cloud-based document store
(Diagram: a DocumentDB account contains many databases; a database contains many collections; a collection contains many documents.)
• JSON database
• Elastically scalable throughput and storage
• Ad hoc queries with familiar SQL syntax
• JavaScript execution within the database
• Tunable consistency levels
• Fully managed
• Open by design
Azure DocumentDB
Azure DocumentDB Structure
Getting Started
Azure DocumentDB
DocumentDB Consistency Model
Level: description
• Strong: the slowest of the four, but guaranteed to always return the most recent data.
• Session: ensures that an application always sees its own writes correctly, but allows access to potentially out-of-date or out-of-order data written by other applications.
• Bounded Staleness: ensures that an application sees changes in the order in which they were made. It does allow an application to see out-of-date data, but only within a specified window, e.g., 500 milliseconds.
• Eventual: provides the fastest access, but also has the highest chance of returning out-of-date data.
Source: https://en.wikipedia.org/wiki/Consistency_model
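To contrast the four levels, the Python sketch below uses a simplified, hypothetical model (not DocumentDB's implementation): each replica has applied a prefix of the write log up to its own log sequence number (LSN), and the level decides which replicas may answer a read. Staleness is measured here in writes rather than time, and the session check roughly mirrors how a session token records the client's last write.

```python
def eligible_replicas(level, replica_lsns, latest_lsn, session_lsn=0, max_lag=0):
    """Return the names of replicas allowed to answer a read at `level`.

    replica_lsns: replica name -> highest LSN it has applied
    latest_lsn:   highest committed LSN
    session_lsn:  LSN of this client's last write (session level)
    max_lag:      allowed staleness in writes (bounded staleness level)
    """
    out = []
    for name, lsn in replica_lsns.items():
        if level == "strong":
            ok = lsn == latest_lsn               # must be fully up to date
        elif level == "bounded_staleness":
            ok = latest_lsn - lsn <= max_lag     # at most max_lag writes behind
        elif level == "session":
            ok = lsn >= session_lsn              # has seen this client's writes
        elif level == "eventual":
            ok = True                            # any replica: fastest
        else:
            raise ValueError(f"unknown level: {level}")
        if ok:
            out.append(name)
    return sorted(out)
```

The weaker the level, the more replicas qualify, which is where the speed difference in the descriptions above comes from.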
Azure DocumentDB
Getting Started
Databases and Collections
Document Explorer
Query Explorer
Azure DocumentDB
.NET Code
Microsoft.Azure.Documents.Client.DocumentClient
Includes the following operations:
• Create/Delete
• Read
• Replace/Upsert
for the following objects:
• Database
• Collection
• Document
• Attachment
• User
• Permission
• UserDefinedFunction
• StoredProcedure
• Trigger
HBase on Azure
Introducing Apache HBase
HBase & Hadoop Big Data Ecosystem
(Diagram: the Hadoop stack. At the bottom, the Hadoop Distributed File System (HDFS), with a NameNode and DataNodes 1 to N; above it, Yet Another Resource Negotiator (YARN); on top, application engines: batch, in-memory (e.g., Spark SQL), stream, SQL, NoSQL, machine learning, and search, alongside orchestration, management, and acquisition tools.)
Introducing Apache HBase
HBase & Hadoop Big Data Ecosystem
(Diagram: an HBase cluster on top of YARN and HDFS (NameNode plus DataNodes 1 to N). An HBase Master and ZooKeeper services coordinate the HBase Region Servers; each Region Server hosts Regions 1 to N, each shown with a write-ahead log, a MemStore, and HFiles.)
APIs: Java client, Thrift, Avro, REST
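The region components in the diagram suggest the following write path, sketched in Python as a simplified, hypothetical model rather than HBase internals: a put is appended to the write-ahead log for durability, then applied to the in-memory MemStore; once the MemStore is large enough, it is flushed to an immutable, sorted "HFile". Reads check the MemStore first, then HFiles from newest to oldest.

```python
import bisect


class Region:
    """Simplified HBase-like write path (sketch)."""

    def __init__(self, flush_size=3):
        self.wal = []          # append-only log, replayed after a crash
        self.memstore = {}     # row -> latest value (versions omitted)
        self.hfiles = []       # sorted [(row, value), ...] runs, oldest first
        self.flush_size = flush_size

    def put(self, row, value):
        self.wal.append((row, value))   # durability first
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self):
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal = []          # flushed edits no longer need the log

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):      # newest file wins
            keys = [k for k, _ in hfile]
            i = bisect.bisect_left(keys, row)    # binary search in sorted run
            if i < len(keys) and keys[i] == row:
                return hfile[i][1]
        return None
```

Writes only ever append or touch memory, which is why HBase sustains high write rates; the cost moves to reads, which may consult several files.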
HBase on Azure HDInsight
A brief introduction to HBase
(Diagram: HBase on top of HDFS; cold writes arrive in batch, while hot writes and hot reads happen in real time.)
• HDFS is suitable for batch processing (i.e., scans over big data files).
• HBase is optimized for fast record lookups and singleton operations.
• HDFS is usually the file system for HBase.
• Rows are maintained in sorted lexicographical order for efficient row scans.
• Row ranges are partitioned into tablets (called regions in HBase).
• Columns are grouped into column families, which indicate storage locality.
• Simple shell commands: create, alter, drop, list, describe, get, put, incr, scan, count, delete, truncate, etc. (https://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/)
• APIs support batch operations.
• A common choice for stream processing solutions.
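Because rows sort lexicographically, numeric-looking row keys do not sort numerically. The Python sketch below (hypothetical keys; `scan` is an illustrative helper, not an HBase API) performs a [start, stop) range scan over sorted keys: with unpadded suffixes, "rowKey10" sorts before "rowKey5", so that range is empty in this sketch, while zero-padded keys behave as expected.

```python
import bisect


def scan(sorted_keys, start, stop):
    """Return keys in [start, stop), like a STARTROW/STOPROW range scan.
    Keys compare as strings, not as numbers."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Hypothetical row keys with unpadded numeric suffixes...
plain = sorted(f"rowKey{i}" for i in range(1, 13))
# ...and the same keys zero-padded to a fixed width.
padded = sorted(f"rowKey{i:02d}" for i in range(1, 13))
```

Row-key design (padding, prefixes, or reversed timestamps) is therefore a central part of HBase schema design.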
HBase on Azure HDInsight
Getting Started
HBase on Azure HDInsight
Exploring HBase
Basic Commands
# create a table with two column families, f1 and f2
create 'table_test', {NAME => 'f1'}, {NAME => 'f2'}
# list all tables
list
# write cells: table, row key, 'family:qualifier', value
put 'table_test', 'rowKey1', 'f1:firstname', 'khalid'
put 'table_test', 'rowKey1', 'f1:lastname', 'salama'
put 'table_test', 'rowKey1', 'f2:level', '3'
put 'table_test', 'rowKey2', 'f1:firstname', 'paul'
put 'table_test', 'rowKey2', 'f1:lastname', 'linehame'
put 'table_test', 'rowKey2', 'f1:email', 'plinhame@hitachi.com'
# read a single cell
get 'table_test', 'rowKey1', 'f1:lastname'
# a table must be disabled before it can be dropped
disable 'table_test'
drop 'table_test'
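HBase on Azure HDInsight
Exploring HBase
Other Commands
# whole row
get 'table_test', 'rowKey1'
# selected column(s) of a row
get 'table_test', 'rowKey1', {COLUMN => ['f1:lastname']}
# cell versions within a timestamp range [min, max)
get 'table_test', 'rowKey2', {TIMERANGE => [0, 1000]}
# first 100 rows
scan 'table_test', {LIMIT => 100}
# row range: STARTROW inclusive, STOPROW exclusive; keys sort lexicographically, so numeric suffixes are zero-padded
scan 'table_test', {STARTROW => 'rowKey05', STOPROW => 'rowKey10'}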
HBase on Azure HDInsight
HBase Reader/Writer using .NET
How to Get Started with NoSQL?
• Read the slides!
• Azure Documentation – Azure Table Storage
https://azure.microsoft.com/en-gb/documentation/articles/storage-dotnet-how-to-use-tables/
• Azure Documentation – Azure DocumentDB
https://azure.microsoft.com/en-gb/documentation/services/documentdb/
• Azure Documentation – Azure Redis Cache
https://azure.microsoft.com/en-gb/documentation/services/redis-cache/
• Azure Documentation – HBase on HDInsight
https://azure.microsoft.com/en-gb/documentation/articles/hdinsight-hbase-overview/
• GitHub – tweet-sentiment with HBase
https://github.com/maxluk/tweet-sentiment
• Azure Documentation – Understanding NoSQL on Azure
https://azure.microsoft.com/en-gb/documentation/articles/fundamentals-data-management-nosql-chappell/
• db-engines – Knowledge Base of Relational and NoSQL Database Management Systems
http://db-engines.com/en/
• NoSQL Databases
http://nosql-database.org/
How to Play with neo4j
A widely-used GraphDB
(Diagram: users A, B, C, D, and E connected by Following relationships; posts P1, P2, and P3 connected to users by Posted and Likes relationships.)
How to Play with neo4j
A widely-used GraphDB
CREATE
(a:User{name:"Khalid Salama", grade:"Manager"}),
(b:User{name:"Paul Lineham", grade:"Senior Manager"}),
(c:User{name:"Vaughn Rees", grade:"Senior Manager"}),
(d:User{name:"Sutha Thiru", grade:"Director"}),
(e:User{name:"Mark Hill", grade:"VP"}),
(a)-[:Following{since:'2014'}]->(d),
(a)-[:Following{since:'2014'}]->(b),
(b)-[:Following{since:'2010'}]->(a),
(d)-[:Following{since:'2011', strength:"high"}]->(e),
(e)-[:Following{since:'2014'}]->(d),
(e)-[:Following{since:'2015'}]->(c),
(c)-[:Following]->(d),
(c)-[:Following{since:'2013', strength:"low"}]->(a),
(b)-[:Following]->(c),
(p1:Post{title:"post 1", lastupdate:"01/01/2016", tags:['sports','life style']}),
(p2:Post{title:"post 2", lastupdate:"03/05/2015"}),
(p3:Post{title:"post 3", lastupdate:"121/7/2015", tags:['economics','politics']}),
(a)-[:Posted]->(p1),
(d)-[:Posted]->(p2),
(c)-[:Posted]->(p3),
(b)-[:Liked]->(p1),
(c)-[:Liked]->(p1),
(a)-[:Liked]->(p2),
(b)-[:Liked]->(p2),
(e)-[:Liked]->(p2),
(a)-[:Liked]->(p3),
(e)-[:Liked]->(p3)
How to Play with neo4j
A widely-used GraphDB
// fetch one node
MATCH (u:User{name:"Khalid Salama"}) RETURN u
// fetch an attribute of a node
MATCH (u:User{name:"Khalid Salama"}) RETURN u.grade
// fetch nodes by conditions
MATCH (u:User{grade:"Senior Manager"}) RETURN u
// the same, using WHERE
MATCH (u:User)
WHERE u.grade = 'Senior Manager'
RETURN u
// regular-expression match (see also STARTS WITH, ENDS WITH, CONTAINS, IN [...])
MATCH (u:User)
WHERE u.name =~ "Sutha.+"
RETURN u
// posts with a given tag
MATCH ()-[r:Posted]->(p:Post)
WHERE 'sports' IN p.tags
RETURN p
// Whom is Khalid following?
MATCH (x:User{name:"Khalid Salama"})-[r:Following]->(y:User)
RETURN x,r,y
// Who is following Khalid?
MATCH (x:User{name:"Khalid Salama"})<-[r:Following]-(y:User)
RETURN x,r,y
// Update (MERGE matches the node, or creates it if missing)
MERGE (u:User { name:"Khalid Salama" })
SET u.practice = "Data Insights & Analytics"
RETURN u
// Get count of posts
MATCH (p:Post) RETURN COUNT(p)
// Get user count by grade
MATCH (u:User) RETURN u.grade, COUNT(u)
// Get users and their followers
MATCH (u:User)<-[:Following]-(f:User)
RETURN u.name AS User, COLLECT(f.name) AS Followers, COUNT(f) AS Total
// Constraint (older syntax; Neo4j 5 uses CREATE CONSTRAINT FOR (u:User) REQUIRE u.name IS UNIQUE)
CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE
// Index (older syntax; Neo4j 5 uses CREATE INDEX FOR (u:User) ON (u.grade))
CREATE INDEX ON :User(grade)
// Get users following each other
MATCH (u1:User)-[:Following]->(u2:User)-[:Following]->(u1)
RETURN u1.name, u2.name
// Get users who liked a post posted by one of their followers
MATCH (u:User)-[:Liked]->(p:Post)<-[:Posted]-(u2:User)-[:Following]->(u)
RETURN u,p,u2
// Get following of following
MATCH (u:User)-[:Following]->()-[:Following]->(u2:User)
RETURN u.name, COLLECT(DISTINCT u2.name)
// Get users within at most 3 Following steps of Paul
MATCH (u:User)-[:Following*..3]->(us:User{name:"Paul Lineham"})
RETURN u
// Shortest path
MATCH
(u1:User{name:"Mark Hill"}),
(u2:User{name:"Paul Lineham"}),
p=SHORTESTPATH((u1)-[:Following*..10]->(u2))
RETURN p
// Get nodes having a property
MATCH (p)
WHERE p.tags IS NOT NULL
RETURN p
http://neo4j.com/docs/developer-manual/current/#cypher-query-lang
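What the shortest-path query computes can be sketched in Python as a breadth-first search. The adjacency list below is transcribed from the Following relationships in the CREATE statement; on that data, the shortest Following path from Mark Hill to Paul Lineham has three hops. This is an illustration of the idea, not how Neo4j implements shortestPath.

```python
from collections import deque

# Following relationships from the CREATE statement, as an adjacency list.
following = {
    "Khalid Salama": ["Sutha Thiru", "Paul Lineham"],
    "Paul Lineham": ["Khalid Salama", "Vaughn Rees"],
    "Sutha Thiru": ["Mark Hill"],
    "Mark Hill": ["Sutha Thiru", "Vaughn Rees"],
    "Vaughn Rees": ["Sutha Thiru", "Khalid Salama"],
}


def shortest_path(graph, start, goal, max_hops=10):
    """Breadth-first search over directed edges; returns the names on a
    shortest path, or None if goal is not reachable within max_hops."""
    parents = {start: None}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk back via parent links
                path.append(node)
                node = parents[node]
            return path[::-1]
        if hops == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in parents:       # first visit = fewest hops
                parents[nxt] = node
                frontier.append((nxt, hops + 1))
    return None
```

The `max_hops` parameter plays the role of the `*..10` upper bound in the Cypher pattern.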
My Background
Applying Computational Intelligence in Data Mining
• Honorary Research Fellow, School of Computing, University of Kent.
• Ph.D. Computer Science, University of Kent, Canterbury, UK.
• M.Sc. Computer Science, The American University in Cairo, Egypt.
• 25+ published journal and conference papers, focusing on:
– classification rules induction,
– decision trees construction,
– Bayesian classification modelling,
– data reduction,
– instance-based learning,
– evolving neural networks, and
– data clustering
• Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing.
• Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio,
ECTA, IEEE WCCI and INNS-BigData.
ResearchGate.org
Thank you!

PPTX
CS 542 Parallel DBs, NoSQL, MapReduce
PPTX
Microservices, DevOps, and Continuous Delivery
PPT
Big Data
Intorducing Big Data and Microsoft Azure
مقدمة عن NoSQL بالعربي
Data Engineering for Data Scientists
SQL/NoSQL How to choose ?
Building a modern data warehouse
Reporting from the Trenches: Intuit & Cassandra
Big Data 2107 for Ribbon
Relational databases vs Non-relational databases
System Design Interview Questions PDF By ScholarHat
NOSQL -lecture 1 mongo database expalnation.pdf
Polyglot persistence for enterprise cloud applications
Master Meta Data
CouchBase The Complete NoSql Solution for Big Data
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
Nonrelational Databases
HbaseHivePigbyRohitDubey
Introduction to asdfghjkln b vfgh n v
CS 542 Parallel DBs, NoSQL, MapReduce
Microservices, DevOps, and Continuous Delivery
Big Data

Recently uploaded (20)

PPTX
Introduction to machine learning and Linear Models
PPTX
Business Acumen Training GuidePresentation.pptx
PDF
Fluorescence-microscope_Botany_detailed content
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PDF
annual-report-2024-2025 original latest.
PDF
Clinical guidelines as a resource for EBP(1).pdf
PPTX
climate analysis of Dhaka ,Banglades.pptx
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
Computer network topology notes for revision
PPTX
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
PPTX
1_Introduction to advance data techniques.pptx
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
Supervised vs unsupervised machine learning algorithms
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PDF
Lecture1 pattern recognition............
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
Introduction to machine learning and Linear Models
Business Acumen Training GuidePresentation.pptx
Fluorescence-microscope_Botany_detailed content
Galatica Smart Energy Infrastructure Startup Pitch Deck
annual-report-2024-2025 original latest.
Clinical guidelines as a resource for EBP(1).pdf
climate analysis of Dhaka ,Banglades.pptx
IB Computer Science - Internal Assessment.pptx
Computer network topology notes for revision
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
1_Introduction to advance data techniques.pptx
Introduction to Knowledge Engineering Part 1
Supervised vs unsupervised machine learning algorithms
STUDY DESIGN details- Lt Col Maksud (21).pptx
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
Lecture1 pattern recognition............
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
iec ppt-1 pptx icmr ppt on rehabilitation.pptx

NoSQL with Microsoft Azure

  • 1. | © Copyright 2015 Hitachi Consulting | NoSQL Data Stores with Microsoft Azure. Khalid M. Salama, Ph.D., Business Insights & Analytics, Hitachi Consulting UK. We Make it Happen. Better.
  • 2. Outline
    – What is NoSQL, and why?
    – NoSQL data stores, tools & technologies
    – Solution architecture patterns
    – Introducing Azure Redis Cache
    – Introducing Azure Table Storage
    – Introducing Azure DocumentDB
    – Introducing HBase on Azure HDInsight
    – How to get started with NoSQL
    – Appendix: how to play with neo4j
  • 3. Fundamentals
  • 6. What is SQL?
    – Relational Database Management Systems (RDBMS)
    – Transactional
    – Fixed schema
    – Structured (tables, columns, rows, constraints, etc.)
    – SQL as the language: standard
    – But: not flexible enough, and not scalable enough
  • 11. What is NoSQL? Key attributes:
    – Non-relational
    – Non-transactional
    – Semi-structured, with a flexible schema
    – Distributed, scalable, and fault-tolerant
    – Random, real-time read/write access
    – Complements the RDBMS
  • 15. Why NoSQL? It is all about Big Data: NoSQL data stores help overcome Big Data challenges in real-time operational systems.
    – Volume: distributed, scalable, fault-tolerant
    – Variety: non-relational, semi-structured, flexible schema
    – Velocity: eventual consistency model (non-transactional); random, real-time read/write access
  • 17. NoSQL Usage Patterns: suitability
    – Suitable for: reference data; variable data structures; singleton select/insert/update; random, real-time read/write access
    – Not suitable for: batch processing; complex analytical queries; joins; complex transactions
  • 18. CAP Theorem
  • 19. NoSQL & CAP Theorem: the trade-off
    – To process large volumes of data efficiently, we need to scale out, i.e., partition the data and distribute the computation.
    – This forces a trade-off between Consistency, Availability, and Partition tolerance.
    – Consistency: the data is in a consistent state across all nodes; every read returns the most recent write.
    – Availability: every request receives a response (success or failure); in practice, the system stays responsive (latency).
    – Partition tolerance: the system continues to work despite message loss or node failure, i.e., it can sustain partial network failures.
    – CAP theorem: a distributed data system can satisfy at most two of the three properties. Since partitions cannot be ruled out in practice, the real choice is consistency vs. availability when a partition occurs.
  • 24. NoSQL & CAP Theorem: the trade-off. Should the system continue working if a partition is not reachable?
    – Transactional RDBMS (favors C): ACID mode, strong consistency; commits are atomic across the entire system; not partition tolerant: when a partition occurs, it gives up availability to preserve consistency.
    – Big Data systems (favor A and P): BASE mode, eventual consistency; remain available (operational and responsive) and partition tolerant, i.e., they sacrifice consistency.
    – ACID. Atomic: everything in a transaction succeeds, or the entire transaction is rolled back. Consistent: a transaction cannot leave the database in an inconsistent state. Isolated: transactions cannot interfere with each other. Durable: completed transactions persist, even when servers restart.
    – BASE. Basically Available: data is sharded and replicated, and consistency is compromised for availability. Soft state: data is allowed to be temporarily inconsistent. Eventual consistency: consistency is reached later.
    – NoSQL, strong vs. eventual consistency: most NoSQL tools let you choose the balance between strong and eventual consistency.
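One common mechanism behind this tunable balance is quorum replication: Dynamo-style stores such as Cassandra and Riak let the client choose how many replicas must acknowledge a write (W) and answer a read (R). The toy model below is an illustrative sketch, not any product's protocol. It keeps N replicas and assumes the worst case, in which reads poll exactly the replicas that missed the write. When R + W > N, every read quorum overlaps the latest write quorum, so reads are strongly consistent; smaller quorums are faster but may return stale data.

```python
from typing import List, Optional


class Replica:
    def __init__(self) -> None:
        self.version = 0                     # logical timestamp of the last write seen
        self.value: Optional[str] = None


class QuorumRegister:
    """Toy replicated register with N replicas, write quorum W, read quorum R."""

    def __init__(self, n: int, w: int, r: int) -> None:
        self.replicas: List[Replica] = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.clock = 0

    def write(self, value: str) -> None:
        self.clock += 1
        # Only W replicas acknowledge; the others have not received the write yet.
        for replica in self.replicas[: self.w]:
            replica.version, replica.value = self.clock, value

    def read(self) -> Optional[str]:
        # Worst case: the R replicas polled are the ones that lag the most.
        polled = self.replicas[len(self.replicas) - self.r:]
        # Return the value carrying the highest version among the polled replicas.
        return max(polled, key=lambda rep: rep.version).value
```

With N=3, W=2, R=2, a read issued right after `write("v1")` returns `"v1"`; with N=3, W=1, R=1, the same read can hit a replica that has not yet seen the write and returns `None`.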
  • 25. NoSQL Data Stores
  • 26. NoSQL Data Stores: categories and breeds
    – Key/value stores; column family stores; document stores; graph stores; memory cache stores; others
  • 27. NoSQL – Key/Value Stores: the agile…
    – Based on a hash table (dictionary): a unique key points to a data item (the value).
    – The value can be anything (a blob), and a key can have only one value.
    – Simple APIs (GET, PUT, DELETE), but no query language, so any richer access has to be implemented by the application.
    – Optimized for singleton operations.
    – Example tools: Amazon Dynamo (pioneer), Apache Accumulo, Riak KV, Redis (memory cache), Memcached (memory cache).
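To make the "hash table plus GET/PUT/DELETE" model concrete, here is a minimal sketch of a key/value store whose keys are spread over several nodes by hashing. The class and method names are hypothetical, and placement uses simple modulo hashing for brevity; production stores such as Dynamo use consistent hashing instead, so that adding a node moves only a small share of the keys.

```python
import hashlib
from typing import Dict, List, Optional


class PartitionedKVStore:
    """Toy key/value store: one hash table per node, keys routed by hashing."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> Dict[str, bytes]:
        # Use a stable digest; Python's built-in hash() of str is salted per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: bytes) -> None:
        self._node_for(key)[key] = value     # a key holds exactly one value

    def get(self, key: str) -> Optional[bytes]:
        return self._node_for(key).get(key)

    def delete(self, key: str) -> None:
        self._node_for(key).pop(key, None)
```

The value is opaque bytes: the store never looks inside it, which is exactly why it cannot offer a query language.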
  • 28. NoSQL – Column Family Stores: the big…
    – Extensible record stores, a.k.a. wide column stores.
    – Tabular data structure whose columns can be extended; columns are organized in column families, resulting in a sparse matrix.
    – Scales to millions of rows and hundreds of thousands of columns.
    – APIs: GET, PUT, SCAN, DELETE, COUNT.
    – A data item is accessed via (row key, column, timestamp), e.g., GET(Key1, User, Name, T1) => Khalid.
    – [Figure: sample table with row keys Key1 to Key3, column families User (Name, Country) and Usage (Recency, Frequency), and cell versions at timestamps T1 to T3]
    – Example tools: Google BigTable (pioneer), Apache HBase, Apache Cassandra.
    – Real-world scenarios: website usage information in Google Analytics; geographic information in Google Maps; social media apps (Twitter, Facebook, etc.); search engine web crawling results.
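The (row key, column, timestamp) addressing above can be sketched as nested maps in which each cell keeps several timestamped versions, as HBase and BigTable cells do. This is a toy model with hypothetical names. The column families are fixed when the table is created, while columns inside a family can be added freely. It also assumes that a read "at" a timestamp returns the newest version written at or before it.

```python
from collections import defaultdict
from typing import Iterable, Optional


class ColumnFamilyTable:
    """Toy wide-column table: cells addressed by (row, family, column, timestamp)."""

    def __init__(self, families: Iterable[str]) -> None:
        self.families = set(families)        # families are declared up front
        # row -> (family, column) -> {timestamp: value}; absent cells cost nothing
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, family: str, column: str, value: str, ts: int) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][(family, column)][ts] = value  # new columns need no schema change

    def get(self, row: str, family: str, column: str,
            ts: Optional[int] = None) -> Optional[str]:
        versions = self.rows.get(row, {}).get((family, column), {})
        if not versions:
            return None
        if ts is None:
            return versions[max(versions)]   # latest version
        older = [t for t in versions if t <= ts]
        return versions[max(older)] if older else None
```

For example, after `put("Key1", "User", "Name", "Khalid", ts=1)` and a later write at `ts=2`, `get("Key1", "User", "Name", ts=1)` still returns `"Khalid"`, mirroring the slide's GET(Key1, User, Name, T1).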
  • 29. NoSQL – Document Stores: the popular…
    – Same as a key/value store, except that the value is a document in a specific format, e.g., JSON, BSON, XML (not an MS Word doc!).
    – A document is retrieved by its key or, since the document format is known, with a query language, e.g., retrieve all documents that have an attribute "country" whose value equals "Algeria".
    – Attributes inside the document are indexed, and documents are often versioned.
    – Supports a wide range of OLTP/real-time applications.
    – [Figure: example key/document pairs for John, Eva, and Lou]
    – Example tools: IBM Lotus Notes (pioneer), MongoDB, Apache CouchDB, Couchbase, Microsoft Azure DocumentDB.
    – Real-world scenarios: content management and personalization; user data management; reference data management; lookups for real-time operational intelligence.
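The "country equals Algeria" query works because the store understands the document format. The following minimal sketch filters JSON documents by an attribute path, in roughly the way a DocumentDB query such as `SELECT * FROM c WHERE c.address.country = 'Algeria'` does. The helper names and sample documents are hypothetical. A real store would answer from its attribute indexes, not from a linear scan.

```python
import json
from typing import Any, Dict, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'address.country'; None if any step is missing."""
    node: Any = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def find(collection: Dict[str, Dict[str, Any]], path: str, value: Any) -> List[str]:
    """Return the keys of the documents whose attribute at `path` equals `value`."""
    return sorted(k for k, d in collection.items() if get_path(d, path) == value)


collection = {
    "doc1": json.loads('{"name": "John", "address": {"country": "Algeria"}}'),
    "doc2": json.loads('{"name": "Eva", "address": {"country": "UK"}, "phone": "n/a"}'),
    "doc3": json.loads('{"name": "Lou"}'),   # no address at all: the schema is flexible
}
```

`find(collection, "address.country", "Algeria")` returns `["doc1"]`. Documents with different shapes coexist in the same collection.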
  • 30. NoSQL – Graph Stores: the clever…
    – Represent data as graph structures: nodes and edges. Nodes represent entities; edges represent relationships between entities.
    – Relationships are directed, and the semantics of the direction are up to the application, e.g., "Married" is symmetric, "Owns" is not.
    – Each node/edge has a set of key/value properties and a label (the type of entity/relationship).
    – Optimized to process graph-related queries, e.g., the number of steps needed to get from one node to another.
    – [Figure: Person nodes (Khalid Salama, Fatima Salama) and a Car node (Jaguar, red) connected by Married, Own/Owned by, and Drive relationships; nodes and edges carry properties such as Id, Since, Frequency, and Licence No]
    – Example tools: Neo4j, OrientDB, Titan, Apache Giraph, Microsoft Graph Engine (Trinity).
    – Real-world scenarios: social networks; network and IT operations; fraud detection; digital assets management.
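"The number of steps needed to get from one node to another" is a shortest-path query. On an unweighted graph it can be answered with breadth-first search over adjacency lists, which are the in-memory counterpart of nodes that "point" to their neighbours. The sketch below is illustrative and the function name is hypothetical. Its sample graph reuses the Following edges from the neo4j appendix of this deck, with users A to E.

```python
from collections import deque
from typing import Dict, List, Optional


def steps_between(edges: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search over directed edges; returns the hop count, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in edges.get(node, []):
            if nxt not in seen:              # visit each node at most once
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Directed "Following" relationships: A follows D and B, and so on.
following = {"A": ["D", "B"], "B": ["A", "C"], "C": ["D", "A"],
             "D": ["E"], "E": ["D", "C"]}
```

Here `steps_between(following, "A", "E")` is 2 (A to D to E), while `steps_between(following, "E", "B")` is 3, because the edges are directed.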
  • 31. NoSQL – Graph Stores: the clever… [Figure source: O'Reilly, Graph Databases]
  • 32. NoSQL – Graph Stores: the clever… native vs. non-native graph processing and storage
    – Native graph processing: index-free adjacency; connected nodes physically "point" to each other in the database.
    – Non-native graph processing: any database can behave like a graph DB by exposing a graph data model through CRUD operations.
    – Native graph storage: storage is designed and optimized to store, process, and query graph data structures.
    – Non-native graph storage: graphs are serialized into another kind of database: relational, document, or object DBs.
  • 33. NoSQL – Memory Cache Stores: the fast…
    – Usually a simple data structure (key/value); the value can be a simple data type (string) or a complex data object.
    – Memory-based, usually with a persistence option.
    – Used to optimize access to "hot" data and to manage distributed web application session state; can help an application survive service downtime.
    – Involves read/write strategies (write-through, write-behind, etc.), data expiration, and conflict resolution techniques.
    – Example tools: Memcached (pioneer), Redis.
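Two of the strategies named above, write-through and expiration, can be shown in a few lines. In this minimal sketch (hypothetical names), a dict stands in for the durable store and a second dict holds the in-memory entries with an expiry time. The clock is injectable so that expiry can be demonstrated without waiting. A write-behind cache would instead queue the store update and apply it later.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class WriteThroughCache:
    """Toy in-memory cache in front of a durable store, with per-entry expiry."""

    def __init__(self, store: Dict[str, str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.store = store                   # stand-in for the durable database
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def write(self, key: str, value: str) -> None:
        self.store[key] = value              # write-through: update the store first...
        self.entries[key] = (value, self.clock() + self.ttl)  # ...then the cache

    def read(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                  # fresh hit, served from memory
        self.entries.pop(key, None)          # expired or missing: drop it
        value = self.store.get(key)
        if value is not None:
            self.entries[key] = (value, self.clock() + self.ttl)
        return value
```

If the store is changed behind the cache's back, readers keep seeing the cached value until it expires. That window is the trade-off the TTL controls.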
  • 34. NoSQL – Other Data Stores: storage for other data structures
    – Object-oriented databases: developed in the 1980s, motivated by the common use of object-oriented programming. They store objects in a way that corresponds to their representation in the application, without conversion or decomposition; relationships between objects, e.g., inheritance, are also maintained in the database. Examples: Caché, Db4o, Versant Object Database.
    – Resource Description Framework (RDF) data stores, a.k.a. triple stores: originally developed for describing metadata of IT resources, and used in connection with the semantic web and other applications. The RDF model represents information as subject-predicate-object triples. Examples: MarkLogic, Virtuoso, Jena, Sesame, Algebraix.
    – Others: multi-dimensional databases, multi-value databases, time-series databases, event sourcing databases, multi-model data stores, etc.
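The subject-predicate-object model is easiest to see with a pattern match, where any position can be left open as a wildcard. This is a rough analogue of a SPARQL basic graph pattern such as `?who ex:worksFor ex:HitachiConsulting`. The sketch below is a toy in-memory version. The `ex:` identifiers are hypothetical, and real triple stores index all three positions.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


triples = [
    ("ex:Khalid", "ex:worksFor", "ex:HitachiConsulting"),
    ("ex:Khalid", "ex:follows", "ex:Paul"),
    ("ex:Paul", "ex:worksFor", "ex:HitachiConsulting"),
]
```

`match(triples, p="ex:worksFor", o="ex:HitachiConsulting")` returns the two employment triples, and `match(triples, s="ex:Khalid")` returns everything said about Khalid.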
  • 35. NoSQL Tools & Technologies (category: typical usage; lineage; tools)
    – Key/Value: flexible data structure, dictionary/lookup, the value can be anything; lineage: Amazon's Dynamo; tools: Apache Accumulo, Riak KV, Redis.
    – Column Family (tabular/wide column): column-oriented access, Big Data with real-time random read/write access, extensible; lineage: Google's BigTable; tools: Apache Cassandra, Apache HBase, HBase on Microsoft HDInsight.
    – Document: queryable data, objects (complex structures) in JSON, BSON, XML, etc., CRUD apps; lineage: IBM's Lotus Notes; tools: MongoDB, Apache CouchDB, Couchbase, Microsoft Azure DocumentDB.
    – Graph: social networks, fraud detection, relationship-heavy data; lineage: graph theory; tools: Neo4j, OrientDB, Titan, Apache Giraph, Microsoft Graph Engine.
    – Memory Cache: non-durable data, fast access; lineage: LiveJournal's Memcached; tools: Redis, Microsoft Azure Redis Cache, Microsoft Azure Memcached (Preview).
    – Source: http://db-engines.com/en/ranking
  • 36. Solution Architecture Patterns
  • 37. Lambda Architecture: NoSQL and the speed layer [Figure: hot path and cold path]
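In the Lambda Architecture, a query is answered by merging a batch view, which the cold path periodically recomputes over all historical data, with a real-time view, which the hot path (the speed layer) updates incrementally and which is often kept in a NoSQL store. The sketch below uses hypothetical page-view counting to show that merge. Both views are plain `Counter` objects here, standing in for the real batch and NoSQL stores.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Cold path: recomputed from scratch over all historical events."""
    return Counter(master_dataset)


class SpeedLayer:
    """Hot path: incremental counts for events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, page: str) -> None:
        self.realtime_view[page] += 1

    def reset(self) -> None:
        # Called once a new batch view has absorbed these events.
        self.realtime_view.clear()


def query(batch: Counter, speed: SpeedLayer, page: str) -> int:
    """Serving layer: merge the batch view with the real-time view."""
    return batch[page] + speed.realtime_view[page]
```

When the next batch run absorbs the recent events, the speed layer is reset, so no event is ever counted twice.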
  • 38. NoSQL Usage Patterns: common scenarios [Diagram: online apps read/write a NoSQL store]
    – Apps with a NoSQL back-end data store
    – High throughput, scalability, and availability
    – Column family, graph & document stores
  • 39. NoSQL Usage Patterns: common scenarios [Diagram: online apps read/write both a NoSQL store and an RDBMS]
    – NoSQL is used to cache web content, personalization, and reference data.
    – Business transactions are invoked and stored in the RDBMS.
    – Key/value & memory cache stores
  • 40. NoSQL Usage Patterns: common scenarios [Diagram: online apps read/write a NoSQL store, whose submitted contents are processed into an RDBMS]
    – NoSQL stores in-progress transactions/activities (e.g., purchase basket, forms, user session).
    – When transactions are submitted, they are processed into an RDBMS.
    – Document stores
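A minimal sketch of this pattern follows. A dict stands in for the document store that holds each session's basket as one flexible document, and Python's built-in `sqlite3` stands in for the RDBMS. All table, function, and field names are hypothetical. On submit, the whole basket becomes one ACID transaction, and the in-progress document is deleted only after the commit succeeds, so a failed submit loses nothing.

```python
import sqlite3
from typing import Dict, List

# Stand-in for a document store: one basket document per user session.
basket_store: Dict[str, Dict[str, List[dict]]] = {}


def add_item(session_id: str, sku: str, qty: int) -> None:
    basket = basket_store.setdefault(session_id, {"items": []})
    basket["items"].append({"sku": sku, "qty": qty})


def submit(session_id: str, conn: sqlite3.Connection) -> int:
    basket = basket_store[session_id]
    with conn:  # one ACID transaction: commit on success, roll back on error
        cur = conn.execute("INSERT INTO orders (session_id) VALUES (?)", (session_id,))
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_lines (order_id, sku, qty) VALUES (?, ?, ?)",
            [(order_id, item["sku"], item["qty"]) for item in basket["items"]],
        )
    del basket_store[session_id]  # drop the in-progress document only after commit
    return order_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, session_id TEXT NOT NULL);
    CREATE TABLE order_lines (order_id INTEGER REFERENCES orders(id),
                              sku TEXT NOT NULL, qty INTEGER NOT NULL);
""")
```

The split plays to each store's strengths: cheap, schema-free writes while the user is still shopping, and relational integrity once the order becomes a business transaction.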
  • 41. NoSQL Usage Patterns: common scenarios [Diagram: online apps read/write an RDBMS; batch ETL loads a data warehouse; batch ETL populates a NoSQL store that online apps read; hot and cold paths]
    – Data is extracted, transformed, and loaded from OLTP systems into a DW.
    – Aggregations, KPIs, and scores are computed using batch processing.
    – Results are populated into a NoSQL data store for reference use in apps.
    – Usually, the apps perform hot reads and the ETL performs batch writes.
    – Document & graph stores
    – E.g., Single Customer View: customer matching, customer KPIs, segment assignment, and propensity scoring are performed as batch processing in the DW; the output goes to NoSQL to be used for real-time recommendation, campaigning, targeted advertising, etc.
  • 42. NoSQL Usage Patterns: common scenarios [Diagram: online apps send events to stream processing, which looks up and writes to a NoSQL store (hot path); batch ETL moves data between NoSQL and a data warehouse (cold path)]
    – Used as lookups for stream processing solutions.
    – Can also be a persistent store of the processed events.
    – An ETL process periodically extracts data from NoSQL into a DW and updates the lookups (batch).
    – Column family, key/value, graph & document stores
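The lookup role can be sketched as a stream enrichment step. Each incoming event is joined, by key, with reference data that a batch ETL process has loaded into a NoSQL lookup store. In the sketch, a dict stands in for that store, and the field names and segments are hypothetical. Unknown keys get a default instead of stalling the stream.

```python
from typing import Dict, Iterable, Iterator

# Reference data in a NoSQL lookup store (here a dict), refreshed by batch ETL.
customer_lookup: Dict[str, Dict[str, str]] = {
    "c1": {"segment": "gold"},
    "c2": {"segment": "silver"},
}


def enrich(events: Iterable[dict], lookup: Dict[str, Dict[str, str]]) -> Iterator[dict]:
    """Hot path: join each incoming event with reference data by key."""
    for event in events:
        reference = lookup.get(event["customer_id"], {})
        # Unknown keys pass through with a default rather than blocking the stream.
        yield {**event, "segment": reference.get("segment", "unknown")}
```

Because each lookup is a singleton read by key, this is exactly the access pattern that key/value and column family stores are optimized for.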
  • 43. Azure Redis Cache
  • 44. Azure Redis Cache: a key/value in-memory store, a.k.a. a data structure store
  • 45. Azure Redis Cache: a key/value in-memory store, a.k.a. a data structure store
  • 46. Azure Redis Cache: a key/value in-memory store, a.k.a. a data structure store
    – In the example, Redis caches a SQL query (the key) and the query's XML result (the value).
    – You can also store hashes, lists, and sorted sets.
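The example on this slide is the cache-aside pattern. The application first looks up the query in the cache, and only on a miss does it run the query against the database and store the result. In the sketch below, a dict stands in for Redis, `sqlite3` stands in for SQL Server, and the result is kept as rows rather than XML. The function names are hypothetical. With a real Redis you would also give each entry an expiry, e.g. with the `SET key value EX seconds` command, or invalidate it on writes, because otherwise cached results go stale.

```python
import hashlib
import sqlite3
from typing import Dict, List, Tuple

query_cache: Dict[str, List[Tuple]] = {}  # stand-in for Redis


def cache_key(sql: str) -> str:
    # Hash the query text into a short key; the raw text would also work as a key.
    return "sql:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()


def cached_query(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    key = cache_key(sql)
    if key in query_cache:
        return query_cache[key]              # cache hit: no database round trip
    rows = conn.execute(sql).fetchall()      # cache miss: query the database...
    query_cache[key] = rows                  # ...and populate the cache
    return rows
```

Running the same query twice returns the first result both times, even if the table changed in between. That is the staleness that expiry or invalidation must bound.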
  • 47. Azure Table Storage
  • 48. Azure Table Storage: a key/value NoSQL store
    – Key/attribute store with a schema-less design, which adapts to your data as the needs of your application evolve.
    – Fast, cost-effective data access for many kinds of applications; significantly lower cost than traditional SQL for similar volumes of data.
    – Typical uses: web applications, address books, device information, metadata, etc.
    – A PartitionKey and a RowKey must be defined for every entity.
    – No complex joins, foreign keys, or stored procedures.
    – Supports the OData protocol, and LINQ with the WCF Data Services .NET libraries.
    – [Diagram: a storage account contains many tables; a table contains many entities, each identified by its partition key and row key]
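Azure Table Storage identifies each entity by the pair (PartitionKey, RowKey), which must be unique within a table. It returns entities ordered by PartitionKey and then RowKey, and a point query that specifies both keys is the most efficient lookup. The in-memory sketch below mimics that addressing model. The class, method names, and sample entities are hypothetical, and the sketch is not the Azure SDK.

```python
from typing import Dict, List


class ToyTable:
    """Entities addressed by (PartitionKey, RowKey), which must be unique together."""

    def __init__(self) -> None:
        self.partitions: Dict[str, Dict[str, dict]] = {}

    def insert(self, entity: dict) -> None:
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        rows = self.partitions.setdefault(pk, {})
        if rk in rows:
            raise ValueError(f"entity ({pk}, {rk}) already exists")
        rows[rk] = entity

    def point_query(self, pk: str, rk: str) -> dict:
        return self.partitions[pk][rk]       # both keys known: a direct lookup

    def partition_scan(self, pk: str) -> List[dict]:
        rows = self.partitions.get(pk, {})
        return [rows[rk] for rk in sorted(rows)]  # ordered by RowKey
```

Choosing the PartitionKey is the main design decision: entities that are queried together should share a partition, while distinct partitions let the service spread load.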
  • 49. Azure Table Storage: a key/value NoSQL store
  • 50. Azure Table Storage: a key/value NoSQL store [Code screenshot annotations: the namespace to use; the class to inherit; the RowKey and PartitionKey properties to set]
  • 51. Azure Table Storage: a key/value NoSQL store. If a table entity has an attribute that is a collection or an object, Table Storage ignores it, so such attributes need to be serialized/deserialized explicitly.
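One simple way to handle collection or object attributes is to serialize them to JSON strings before saving the entity and to parse them back after reading it. The sketch below shows that round trip with hypothetical helper names. The caller must remember which fields hold JSON. A .NET implementation would apply the same idea inside the entity class.

```python
import json
from typing import Any, Dict, Set


def to_table_entity(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize list/dict properties to JSON strings; keep scalar properties as-is."""
    return {k: json.dumps(v) if isinstance(v, (list, dict)) else v
            for k, v in obj.items()}


def from_table_entity(entity: Dict[str, Any], complex_fields: Set[str]) -> Dict[str, Any]:
    """Parse the named JSON-string properties back into Python objects."""
    return {k: json.loads(v) if k in complex_fields else v
            for k, v in entity.items()}
```

For example, a `Tags` list becomes the string `'["sports", "life style"]'` in the stored entity and is parsed back into a list when read.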
  • 52. Azure Table Storage: a key/value NoSQL store
  • 53. Azure Table Storage: a key/value NoSQL store
  • 54. Azure DocumentDB
  • 55. Azure DocumentDB: a cloud-based document store
    – [Diagram: a DocumentDB account contains many databases; a database contains many collections; a collection contains many documents]
    – JSON database
    – Elastically scalable throughput and storage
    – Ad hoc queries with familiar SQL syntax
    – JavaScript execution within the database
    – Tunable consistency levels
    – Fully managed
    – Open by design
  • 56. Azure DocumentDB: structure
  • 57. Azure DocumentDB: getting started
  • 58. Azure DocumentDB: getting started
  • 59. Azure DocumentDB: consistency levels
    – Strong: the slowest of the four, but guaranteed to always return the most recent committed data.
    – Session: an application always sees its own writes, but may see out-of-date or out-of-order data written by other applications.
    – Bounded Staleness: an application sees changes in the order in which they were made; it may see out-of-date data, but only within a specified window, e.g., 500 milliseconds.
    – Eventual: the fastest access, but also the highest chance of returning out-of-date data.
    – See also: https://en.wikipedia.org/wiki/Consistency_model
  • 60. Azure DocumentDB: getting started
  • 61. Azure DocumentDB: databases and collections
  • 62. Azure DocumentDB: Document Explorer
  • 63. Azure DocumentDB: Query Explorer
  • 64. Azure DocumentDB: .NET code. Microsoft.Azure.Documents.Client.DocumentClient includes the following operations:
    – Create/Delete, Read, and Replace/Upsert
    – for the following objects: Database, Collection, Document, Attachment, User, Permission, UserDefinedFunction, StoredProcedure, Trigger
  • 65. Azure DocumentDB: .NET code
  • 66. HBase on Azure
  • 67. Introducing Apache HBase: HBase & the Hadoop Big Data ecosystem [Diagram: applications (in-memory, stream, SQL/Spark SQL, NoSQL, machine learning, batch, search, orchestration, management, acquisition) run on YARN (Yet Another Resource Negotiator) over HDFS (Hadoop Distributed File System), which consists of a NameNode and DataNodes 1 to N]
  • 68. Introducing Apache HBase: HBase & the Hadoop Big Data ecosystem [Diagram: an HBase cluster (HBase Master, ZooKeeper services, and Region Servers) on top of YARN and HDFS; a Region Server hosts Regions 1 to N, each shown with a write-ahead log, a MemStore, and HFiles. APIs: Java client, Thrift, Avro, REST]
  • 69. HBase on Azure HDInsight: a brief introduction to HBase
    – HDFS is suited to batch processing (i.e., scans over big data files); HBase is optimized for fast record lookups and singleton operations.
    – HDFS is usually the file system underneath HBase.
    – Rows are maintained in sorted lexicographical order, for efficient row scans.
    – Row ranges are partitioned into regions (called tablets in BigTable).
    – Columns are grouped into column families, which are stored together for data locality.
    – Simple shell commands: create, alter, drop, list, describe, get, put, incr, scan, count, delete, truncate, etc. See https://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/
    – The APIs support batch operations.
    – A common choice for stream processing solutions.
    – [Diagram: hot (real-time) reads and writes, and cold (batch) writes, against HBase running on top of HDFS]
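HBase keeps rows sorted by row key, compared byte by byte, and each region serves one contiguous row-key range. A scan takes a start row, which is inclusive, and a stop row, which is exclusive. The toy model below (hypothetical class name) reproduces that ordering with `bisect`. It also shows a practical consequence: `rowkey10` sorts before `rowkey5`, so numeric parts of row keys are usually zero-padded to a fixed width.

```python
import bisect
from typing import Dict, List, Tuple


class SortedRowStore:
    """Toy model of HBase-style row ordering: keys kept in byte-wise order,
    scans cover [start_row, stop_row) with stop_row exclusive."""

    def __init__(self) -> None:
        self.keys: List[bytes] = []
        self.rows: Dict[bytes, str] = {}

    def put(self, row_key: str, value: str) -> None:
        key = row_key.encode("utf-8")
        if key not in self.rows:
            bisect.insort(self.keys, key)    # keep keys sorted on insert
        self.rows[key] = value

    def scan(self, start_row: str, stop_row: str) -> List[Tuple[str, str]]:
        lo = bisect.bisect_left(self.keys, start_row.encode("utf-8"))
        hi = bisect.bisect_left(self.keys, stop_row.encode("utf-8"))
        return [(k.decode("utf-8"), self.rows[k]) for k in self.keys[lo:hi]]
```

With keys `rowkey1`, `rowkey2`, `rowkey5`, and `rowkey10`, the stored order is `rowkey1`, `rowkey10`, `rowkey2`, `rowkey5`, so `scan("rowkey5", "rowkey10")` finds nothing. With padded keys such as `rowkey05` and `rowkey10`, the same range behaves as intended.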
  • 70. HBase on Azure HDInsight: getting started
  • 71. HBase on Azure HDInsight: getting started
  • 72. HBase on Azure HDInsight: exploring HBase
  • 73. HBase on Azure HDInsight: exploring HBase
  • 74. HBase on Azure HDInsight: exploring HBase, basic commands
        create 'table_test', {NAME => 'f1'}, {NAME => 'f2'}
        list
        put 'table_test', 'rowKey1', 'f1:firstname', 'khalid'
        put 'table_test', 'rowKey1', 'f1:lastname', 'salama'
        put 'table_test', 'rowKey1', 'f2:level', '3'
        put 'table_test', 'rowKey2', 'f1:firstname', 'paul'
        put 'table_test', 'rowKey2', 'f1:lastname', 'linehame'
        put 'table_test', 'rowKey2', 'f1:email', 'plinhame@hitachi.com'
        get 'table_test', 'rowKey1', 'f1:lastname'
        # a table must be disabled before it can be dropped
        disable 'table_test'
        drop 'table_test'
  • 75. HBase on Azure HDInsight: exploring HBase, other commands
        get 'table_test', 'rowKey1'
        get 'table_test', 'rowKey1', {COLUMN => ['f1:lastname']}
        get 'table_test', 'rowKey2', {TIMERANGE => [0, 1000]}
        scan 'table_test', {LIMIT => 100}
        # STARTROW is inclusive and STOPROW exclusive; keys compare byte by byte, so 'rowKey10' sorts before 'rowKey5'
        scan 'table_test', {STARTROW => 'rowKey5', STOPROW => 'rowKey10'}
  • 76. HBase on Azure HDInsight: HBase reader/writer using .NET
  • 77. HBase on Azure HDInsight: HBase reader/writer using .NET
  • 78. HBase on Azure HDInsight: HBase reader/writer using .NET
  • 79. How to Get Started with NoSQL?
    – Read the slides!
    – Azure Documentation – Azure Table Storage: https://azure.microsoft.com/en-gb/documentation/articles/storage-dotnet-how-to-use-tables/
    – Azure Documentation – Azure DocumentDB: https://azure.microsoft.com/en-gb/documentation/services/documentdb/
    – Azure Documentation – Azure Redis Cache: https://azure.microsoft.com/en-gb/documentation/services/redis-cache/
    – Azure Documentation – HBase on HDInsight: https://azure.microsoft.com/en-gb/documentation/articles/hdinsight-hbase-overview/
    – GitHub – tweet-sentiment with HBase: https://github.com/maxluk/tweet-sentiment
    – Azure Documentation – Understanding NoSQL on Azure: https://azure.microsoft.com/en-gb/documentation/articles/fundamentals-data-management-nosql-chappell/
    – db-engines – Knowledge Base of Relational and NoSQL Database Management Systems: http://db-engines.com/en/
    – NoSQL Databases: http://nosql-database.org/
  • 80. How to Play with neo4j: a widely used graph DB [Figure: user nodes A, B, C, D, E]
  • 81. How to Play with neo4j: a widely used graph DB [Figure: users A to E connected by Following relationships]
  • 82. How to Play with neo4j: a widely used graph DB [Figure: as above, plus posts P1, P2, P3 linked to their authors by Posted relationships]
  • 83. How to Play with neo4j: a widely used graph DB [Figure: as above, plus Likes relationships from users to posts]
  • 84. How to Play with neo4j: a widely used graph DB
        CREATE (a:User {name:"Khalid Salama", grade:"Manager"}),
               (b:User {name:"Paul Lineham", grade:"Senior Manager"}),
               (c:User {name:"Vaughn Rees", grade:"Senior Manager"}),
               (d:User {name:"Sutha Thiru", grade:"Director"}),
               (e:User {name:"Mark Hill", grade:"VP"}),
               (a)-[:Following {since:'2014'}]->(d),
               (a)-[:Following {since:'2014'}]->(b),
               (b)-[:Following {since:'2010'}]->(a),
               (d)-[:Following {since:'2011', strength:"high"}]->(e),
               (e)-[:Following {since:'2014'}]->(d),
               (e)-[:Following {since:'2015'}]->(c),
               (c)-[:Following]->(d),
               (c)-[:Following {since:'2013', strength:"low"}]->(a),
               (b)-[:Following]->(c),
               (p1:Post {title:"post 1", lastupdate:"01/01/2016", tags:['sports','life style']}),
               (p2:Post {title:"post 2", lastupdate:"03/05/2015"}),
               (p3:Post {title:"post 3", lastupdate:"121/7/2015", tags:['economics','politics']}),
               (a)-[:Posted]->(p1),
               (d)-[:Posted]->(p2),
               (c)-[:Posted]->(p3),
               (b)-[:Liked]->(p1),
               (c)-[:Liked]->(p1),
               (a)-[:Liked]->(p2),
               (b)-[:Liked]->(p2),
               (e)-[:Liked]->(p2),
               (a)-[:Liked]->(p3),
               (e)-[:Liked]->(p3)
  • 85. How to Play with neo4j: a widely used graph DB
        // Fetch one node
        MATCH (u:User {name:"Khalid Salama"}) RETURN u
        // Fetch an attribute of a node
        MATCH (u:User {name:"Khalid Salama"}) RETURN u.grade
        // Fetch nodes by conditions
        MATCH (u:User {grade:"Senior Manager"}) RETURN u
        MATCH (u:User) WHERE u.grade = 'Senior Manager' RETURN u
        // Regular-expression match; other string predicates: STARTS WITH, ENDS WITH, CONTAINS, IN [...]
        MATCH (u:User) WHERE u.name =~ "Sutha.+" RETURN u
        // Posts tagged 'sports'
        MATCH ()-[r:Posted]->(p:Post) WHERE 'sports' IN p.tags RETURN p
        // Whom is Khalid following?
        MATCH (x:User {name:"Khalid Salama"})-[r:Following]->(y:User) RETURN x, r, y
        // Who is following Khalid?
        MATCH (x:User {name:"Khalid Salama"})<-[r:Following]-(y:User) RETURN x, r, y
        // Update
        MERGE (u:User {name:"Khalid Salama"}) SET u.practice = "Data Insights & Analytics" RETURN u
        // Get count of posts
        MATCH (p:Post) RETURN COUNT(p)
        // Get user count by grade
        MATCH (u:User) RETURN u.grade, COUNT(u)
        // Get users and their followers
        MATCH (u:User)<-[:Following]-(f:User) RETURN u.name AS User, COLLECT(f.name) AS followers, COUNT(f) AS Total
        // Constraint and index (older Neo4j syntax; Neo4j 5 uses CREATE CONSTRAINT FOR ... REQUIRE and CREATE INDEX FOR ... ON)
        CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE
        CREATE INDEX ON :User(grade)
        // Get users following each other
        MATCH (u1:User)-[:Following]->(u2:User)-[:Following]->(u1) RETURN u1.name, u2.name
        // Get users who like a post posted by one of their followers
        MATCH (u:User)-[:Liked]->(p:Post)<-[:Posted]-(u2:User)-[:Following]->(u) RETURN u, p, u2
        // Get following of following
        MATCH (u:User)-[:Following]->()-[:Following]->(u2:User) RETURN u.name, COLLECT(DISTINCT u2.name)
        // Get users within at most 3 Following steps of Paul
        MATCH (u:User)-[:Following*..3]->(us:User {name:"Paul Lineham"}) RETURN u
        // Shortest path
        MATCH (u1:User {name:"Mark Hill"}), (u2:User {name:"Paul Lineham"}), p = shortestPath((u1)-[:Following*..10]->(u2)) RETURN p
        // Get nodes having a property (Neo4j 5: WHERE p.tags IS NOT NULL)
        MATCH (p) WHERE exists(p.tags) RETURN p
    – Cypher reference: http://neo4j.com/docs/developer-manual/current/#cypher-query-lang
  • 86. My Background: Applying Computational Intelligence in Data Mining
    • Honorary Research Fellow, School of Computing, University of Kent.
    • Ph.D. Computer Science, University of Kent, Canterbury, UK.
    • M.Sc. Computer Science, The American University in Cairo, Egypt.
    • 25+ published journal and conference papers, focusing on:
      – classification rules induction,
      – decision trees construction,
      – Bayesian classification modelling,
      – data reduction,
      – instance-based learning,
      – evolving neural networks, and
      – data clustering
    • Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing.
    • Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio, ECTA, IEEE WCCI and INNS-BigData.
    • ResearchGate.org
  • 87. Thank you!