© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Thiago Santiago
Solutions Engineer, Latam
Thiago Santiago
Hadoop Solutions Engineer at Hortonworks
• 10 years of professional IT experience in application development and architecture.
• Experience with data grid platforms, NoSQL solutions, distributed computing architectures, and GoF design patterns
• Experience with ALM (Application Lifecycle Management) and CI (Continuous Integration)
Recent Big Data projects
• Vivo
• TIM
• Banco do Brasil
• B2W
linkedin.com/in/thiagosantiago/
The Buzzword…
Implicit Big Data…
What is Big Data?
Big Data seeks to answer questions such as: Why? What if? What will happen? How can we optimize? It also provides new insights.
The ultimate goal is just one: to master information!
Big Data is based on 3 pillars:
Veracity and Value
Information is power!
How many times per minute do men and women on Tinder swipe left and right on their device screens?
A change of era…
Pope Benedict
Pope Francis
[Figure: Internet of Anything: data growing from 8 ZB to 44 ZB by 2020]
Byte → Kilobyte (KB) → Megabyte (MB) → Gigabyte (GB) → Terabyte (TB) → Petabyte (PB) → Exabyte (EB) → Zettabyte (ZB)
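The following small Python sketch converts between these units to make the scale concrete. It assumes decimal (SI) prefixes, where each unit is 1000 times the previous one. Binary prefixes (1 KiB = 1024 bytes) would give slightly different numbers. The function names are illustrative only.

```python
# Decimal (SI) byte units: each step is 1000x the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value: int, unit: str) -> int:
    """Convert an integer amount of `unit` into bytes (1 KB = 1000 B)."""
    return value * 1000 ** UNITS.index(unit)


def human_readable(num_bytes: float) -> str:
    """Format a byte count in the first unit whose value is below 1000."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {UNITS[-1]}"  # anything past EB is reported in ZB


if __name__ == "__main__":
    print(to_bytes(44, "ZB"))          # 44 ZB expressed in bytes
    print(human_readable(8 * 10**21))  # 8 ZB
```

For example, `human_readable(1500)` returns `"1.5 KB"`.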
How can we analyze this much information?
Hadoop
https://pt.wikipedia.org/wiki/Hadoop
A Java software platform for distributed computing, aimed at clusters and the processing of large volumes of data.
It was inspired by MapReduce and GoogleFS (GFS). It is a top-level Apache project, built by a community of Java contributors.
Yahoo! has been the largest contributor to the project, using the platform intensively in its business.
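Hadoop's processing model comes from MapReduce. The following single-process Python sketch shows the three phases on a word count. It is not Hadoop's Java API, and the function names are hypothetical. On a real cluster, the framework runs many map and reduce tasks in parallel and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(word_count(docs))
```

Keeping map and reduce as pure functions of their inputs lets a framework rerun or relocate any task independently. That independence is what allows the work to be spread across a cluster.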
Players
• Hortonworks & IBM - Powering the Future of Data
• IBM Spectrum Scale & HDP
• Hortonworks Products
• Use Cases
Expanded partnership combines the best for our clients
Innovation Pervasive in the Design
Power Systems S822LC for Big Data
Not Just Another Intel Server
• NVIDIA: Tesla K80 GPU Accelerator
• Linux by Red Hat: Red Hat 7.2 Linux OS
• Mellanox: InfiniBand/Ethernet connectivity in and out of server
• HGST: Optional NVMe adapters
• Alpha Data with Xilinx FPGA: Optional CAPI accelerator
• Broadcom: Optional PCIe adapters
• QLogic: Optional Fibre Channel PCIe
• Samsung: SSDs & NVMe
• Hynix, Samsung, Micron: DDR4
• IBM: POWER8 CPU
Feedback from Early Adopters of HDP on Power
“As a member of ODPi, AsiaInfo is selecting IBM OpenPOWER servers and Hortonworks Data Platform to create a modern data platform for our clients. AsiaInfo recognizes the performance and flexibility advantages that IBM OpenPOWER servers and open source Hortonworks Data Platform bring to our clients.”
Zhu Jun, Senior Product Line Executive, Orange Cloud, Product & Technology Center, AsiaInfo Data
“The Center for Innovation Management Studies at the NC State Poole College of Management supports an advanced analytic platform to help businesses mine key insights from unstructured text by harnessing open technologies such as IBM Power Systems, Hortonworks Data Platform and IBM Watson Content Analytics. The combination of open technologies from IBM and Hortonworks enable advanced analytics to mine key insights from unstructured data.”
Michael Kowolenko, Ph.D., Managing Director at Institute of Next Generation Computing at NC State University
POWER8: Designed for data to deliver breakthrough performance
• 4X threads per core: SMT8 on POWER8 vs. Hyper-Threading on x86
• 4X memory bandwidth (1)
• 6X more cache (2) at lower latency
These design decisions result in the best performance for data-centric workloads like Spark, Hadoop, databases, NoSQL, Big Data analytics, and OLTP.
[Figure: POWER8 vs. x86 compared on parallel processing (POWER8 SMT8 vs. x86 Hyperthread), pipe and data flow, and POWER8 + OpenPOWER vs. x86]
SMT = Simultaneous Multi-Threading
OLTP = On-Line Transaction Processing
1. Up to 4X depending on specific x86 and POWER8 servers being compared
2. Up to 6X more cache comparing Intel E7-8890 servers to 12-core POWER8 servers. See speaker notes for more details
Hortonworks HDP running on POWER8: Price-Performance Guarantee (NEW)
IBM Power Systems guarantees that the Power S822LC for Big Data system built with POWER8 delivers at least a 3X price-performance advantage vs. x86-based results when running a customer application/workload with Tez/Hive LLAP on Hortonworks HDP under the conditions noted below. A Worker Node is a server carrying out the HDP query functions, with one Worker Node per server.
3X price-performance means that the customer's documented throughput performance on the cluster of S822LC for Big Data Worker Nodes divided by the price of that cluster of Worker Nodes will be at least 3 times higher than the customer's documented throughput performance on the cluster of x86-based Worker Nodes divided by the price of the cluster of x86 Worker Nodes.
Example: if queries per second are 30,000 on the cluster of S822LC Worker Nodes and 10,000 on the cluster of x86-based Worker Nodes, while the S822LC Worker Node cluster costs $10,000 and the x86-based Worker Node cluster costs $10,000, then the throughput performance per price would be exactly 3 times higher and the guarantee would be met.
Notes:
1. The client's Power S822LC for BD Worker Nodes and x86 Worker Nodes must be running at similar utilization rates of at least 50%, using the same software stack as described in Note #4, and must be configured similarly.
2. The client's Power S822LC for BD performance cannot be constrained by the I/O subsystem. Specifically, the I/O subsystem on the Power S822LC for BD Worker Node must achieve I/O bandwidth and operations per second greater than or equal to those of the x86 Worker Node.
3. The physical memory of the client's Power S822LC for BD Worker Node must be the same as or greater than the physical memory of the x86 Worker Node.
4. The applicable software stack is Tez/Hive LLAP on HDP 2.6 or later for both the Power S822LC and x86-based Worker Nodes.
5. The client is responsible for demonstrating a comparable, real-world representative workload on the Power S822LC for BD Worker Node and the x86 Worker Node, using the IBM-provided tools and comparable tools on x86 systems.
6. The 3X guarantee is based on list prices for x86 servers from Dell, Cisco, HP, or Lenovo based on E5-2600 v4 or earlier processor technology, and for the IBM S822LC for Big Data.
The IBM Power S822LC for Big Data servers (22-core/2.89 GHz) used as Worker Nodes must be purchased from IBM or an authorized IBM Business Partner prior to September 30, 2017. The guarantee period is valid for three (3) months from the date of purchase. The x86-based Worker Nodes must be comparably configured branded servers from Cisco, Dell, HP, or Lenovo, and the client is responsible for all Hortonworks licenses.
3X throughput performance per price means that the customer's documented throughput performance on the cluster of Power S822LC for BD Worker Nodes (based on queries, operations, or transactions per second), divided by the price of the cluster of Worker Nodes, will be at least 3 times higher than the customer's same documented throughput performance on the cluster of x86 Worker Nodes divided by the price of said cluster of x86 Worker Nodes.
Remediation: IBM will provide additional performance optimization and tuning services consistent with IBM Best Practices, at no charge. If unable to reach the guaranteed level of price-performance, IBM will provide additional, equally configured Worker Nodes to those already purchased to reach the guaranteed level of price-performance.
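The guarantee's definition reduces to a ratio of two ratios (throughput per price on each cluster). The minimal Python sketch below checks only that arithmetic, using the figures from the example. It does not model eligibility conditions such as utilization, I/O, memory, or pricing rules in Notes 1 to 6. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    throughput: float  # documented throughput, e.g. queries per second
    price: float       # price of the Worker Node cluster

    def throughput_per_price(self) -> float:
        return self.throughput / self.price


def price_performance_ratio(power: Cluster, x86: Cluster) -> float:
    """How many times higher the POWER cluster's throughput per price is."""
    return power.throughput_per_price() / x86.throughput_per_price()


def guarantee_met(power: Cluster, x86: Cluster, factor: float = 3.0) -> bool:
    """True when the ratio reaches the guaranteed factor ("at least 3 times")."""
    return price_performance_ratio(power, x86) >= factor


if __name__ == "__main__":
    # Figures taken from the example in the guarantee text.
    s822lc = Cluster("S822LC", throughput=30_000, price=10_000)
    x86 = Cluster("x86", throughput=10_000, price=10_000)
    print(price_performance_ratio(s822lc, x86))  # 3.0
    print(guarantee_met(s822lc, x86))            # True
```

Both clusters must be measured with the same throughput metric (queries, operations, or transactions per second). The ratio is meaningful only when that holds.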
HDP on POWER – Reference Configurations
Switches
1 GbE (1x or 2x):
• IBM 7120-48E (Lenovo G8052) Switch (48x 1GbE + 4x 10GbE ports)
10 GbE (2x typical, 1x allowed):
• IBM 7120-64C (Lenovo G8264) Switch (48x 10GbE + 4x 40GbE), or
• IBM 8831-S48 (Mellanox SX1410) Switch (48x 10GbE + 12x 40GbE)
Additional Config Options:
Network topologies: Flat, Dual Homed, Partial Homed, Full DMZ
Size: POC, min-production (12 node), full rack, multi rack

Node | System Mgmt Node | Master Node | Edge Node | Worker Node (Balanced) | Worker Node (Performance) | Worker Node (Storage Dense)
Server Type | 1U S821LC (Stratton) | 1U S821LC (Stratton) | 1U S821LC (Stratton) | 2U S822LC (Briggs) | 2U S822LC (Briggs) | 2U S822LC (Briggs)
Count (Min / Max) | 1 / 1 | 3 / Any | 1 / Any | 8 / Any | 8 / Any | 8 / Any
Cores | 8 | 20 | 20 | 22 | 22 | 11
Memory | 32GB | 256GB | 256GB | 256GB | 512GB | 128GB
Storage - HDD | 2x 4TB HDD | 4x 4TB HDD | 4x 4TB HDD | 12x 4TB HDD | 8x 6TB HDD | 12x 8TB HDD
Storage - SSD | - | - | - | - | + 4x 3.8TB SSD | -
Storage Controller | Marvell (internal) | LSI MegaRAID 9361-8i (2GB cache) | LSI MegaRAID 9361-8i (2GB cache) | LSI MegaRAID 9361-8i (2GB cache) | LSI MegaRAID 9361-8i (2GB cache) | LSI MegaRAID 9361-8i (2GB cache)
Network - 1GbE | 4 ports (internal) | 4 ports (internal) | 4 ports (internal) | 4 ports (internal) | 4 ports (internal) | 4 ports (internal)
Network - 10GbE | 2 ports | 2 ports | 2 ports | 2 ports | 2 ports | 2 ports

Reference Architecture Document Link:
https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=POL03270USEN&
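The following Python sketch totals cores, memory, and raw disk across the three Worker Node profiles at the table's minimum count of 8 workers. It assumes the "+ 4x 3.8TB SSD" entry belongs to the Performance worker. The figures are raw capacity, before HDFS replication, RAID, or file system overhead. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerProfile:
    name: str
    cores: int
    memory_gb: int
    hdd_count: int
    hdd_tb: float
    ssd_count: int = 0
    ssd_tb: float = 0.0

    def raw_storage_tb(self) -> float:
        """Raw disk per node: HDDs plus any SSDs, before any protection overhead."""
        return self.hdd_count * self.hdd_tb + self.ssd_count * self.ssd_tb


# Worker Node columns from the reference configuration table.
# Assumption: the SSD entry belongs to the Performance column.
PROFILES = [
    WorkerProfile("Balanced", cores=22, memory_gb=256, hdd_count=12, hdd_tb=4),
    WorkerProfile("Performance", cores=22, memory_gb=512, hdd_count=8, hdd_tb=6,
                  ssd_count=4, ssd_tb=3.8),
    WorkerProfile("Storage Dense", cores=11, memory_gb=128, hdd_count=12, hdd_tb=8),
]


def cluster_totals(profile: WorkerProfile, nodes: int) -> dict:
    """Aggregate cores, memory and raw disk for `nodes` identical workers."""
    return {
        "cores": profile.cores * nodes,
        "memory_gb": profile.memory_gb * nodes,
        "raw_storage_tb": round(profile.raw_storage_tb() * nodes, 1),
    }


if __name__ == "__main__":
    for p in PROFILES:
        print(p.name, cluster_totals(p, nodes=8))  # 8 = table's minimum worker count
```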
IBM Spectrum Scale & HDP
IBM Spectrum Scale is a flexible and scalable software-defined file storage system for analytics workloads.
Enterprises around the globe have deployed IBM Spectrum Scale for:
• Compute clusters (HPC)
• Big data and analytics
• High-performance backup and restore
• Content repositories
Spectrum Scale + HDP
IBM Spectrum Scale
• Spectrum Scale becomes the storage layer in your HDP environment.
• Spectrum Scale supports accessing data through the HDFS API and is therefore transparent to applications using HDP.
• With federation support, Spectrum Scale can coexist with HDFS.
• Enterprise-class storage for your Hadoop/Spark environment (encryption, compression, tiering, DR…)
Scalability, Performance
Scalability
HDFS can scale up to 350 million files with a single NameNode because of a limitation of its scale-out architecture: the NameNode becomes a bottleneck. Users have to use federation to overcome this limitation.
Spectrum Scale has a parallel file system architecture, different from the scale-out architecture of HDFS. There is no single metadata server in the architecture to become a bottleneck, because the metadata-serving function is distributed across the cluster. The tested limit for the number of files per file system is 9 billion, and we have Spectrum Scale customers running in production beyond this tested limit.
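The single-NameNode ceiling follows from HDFS keeping its whole namespace in the NameNode's memory. A commonly quoted rule of thumb is about 150 bytes of heap per file, directory, and block object. That figure is an assumption here, not a measured value. The following sketch estimates the heap needed for 350 million files with one block each, ignoring directories and JVM overhead. The function name is hypothetical.

```python
# Rule-of-thumb NameNode heap cost per namespace object (file, directory or block).
BYTES_PER_OBJECT = 150


def namenode_heap_gb(files: int, blocks_per_file: float = 1.0,
                     directories: int = 0) -> float:
    """Rough heap (GB, 10**9 bytes) to hold one object per file, directory and block."""
    objects = files + directories + files * blocks_per_file
    return objects * BYTES_PER_OBJECT / 1e9


if __name__ == "__main__":
    # 350 million single-block files, directories ignored.
    print(round(namenode_heap_gb(350_000_000), 1))  # 105.0
```

Because the estimate grows linearly with the number of objects, many small files exhaust a single NameNode's heap long before the disks fill up.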
Performance
Performance depends on the underlying hardware configuration, but we claim comparable or better performance than HDFS on all equivalent hardware configurations. Here is the report of tests performed by NASA evaluating Spectrum Scale vs HDFS: http://files.gpfsug.org/presentations/2016/SC16/06_-_Carrie_Spear_-_Spectrum_Sclale_and_HDFS.pdf
Reduce data center footprint with Spectrum Scale
[Figure: multiple copies with an HDFS-based workflow vs. Spectrum Scale in-place analytics (no copies required). HDFS workflow: raw data lands on ext4, where traditional applications work on it, and is then written, moved, or copied into HDFS for Hadoop analysis jobs, leaving copies in both HDFS and ext4. Spectrum Scale: applications write directly to the Hadoop path with NFS/SMB/Object/POSIX, Hadoop analysis jobs use the HDFS APIs, and traditional applications read directly with NFS/SMB/Object/POSIX (direct read, one version).]
HDFS-based workflow:
• Data scientists waste days just copying data to HDFS.
• The copy process can take hours or days, and eventually results are based on stale data.
• Costly data protection: by default, HDFS uses 3-way replication. Example: for 5 PB of data, HDFS requires 15 PB of storage.
Spectrum Scale:
• No copies required with Spectrum Scale.
• IBM ESS software RAID eliminates the need for 3-way replication, with just 30% extra storage. Example: for 5 PB of data, ESS requires 6.5 PB of storage.
[Erasure coding in HDFS has limitations and is good only for cold data]
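The two storage examples above are the same calculation with different protection overheads. With 3-way replication, the data is stored three times. The slide quotes 30% extra storage for ESS software RAID. This is a minimal sketch. The constant names are illustrative, and it ignores free-space headroom that real deployments also reserve.

```python
# Protection overhead expressed as extra storage relative to the data itself.
HDFS_3_WAY_REPLICATION = 2.0  # original copy plus two replicas = 3x the data
ESS_SOFTWARE_RAID = 0.30      # "just 30% extra storage", as stated on the slide


def raw_capacity_pb(data_pb: float, overhead: float) -> float:
    """Raw capacity (PB) needed to store `data_pb` of data with the given overhead."""
    return data_pb * (1 + overhead)


if __name__ == "__main__":
    data_pb = 5
    print(round(raw_capacity_pb(data_pb, HDFS_3_WAY_REPLICATION), 2))  # 15.0
    print(round(raw_capacity_pb(data_pb, ESS_SOFTWARE_RAID), 2))       # 6.5
```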
Spectrum Scale vs Competition
Supported Interfaces (In-place Analytics): HDFS | MapR-FS | NetApp | EMC Isilon | IBM Spectrum Scale
HDFS API ✓ ✓ ✓ ✓ ✓
NFS ✓ ✓ ✓ ✓ ✓
SMB ✓ ✓
Object ✓ ✓
POSIX ✓ ✓
✓ True SDS (software-defined storage) solution to overcome HDFS scalability and availability limitations. A software-only option that can be installed directly on commodity hardware running Hadoop clusters.
✓ Reduce data center footprint with the industry's best in-place analytics solution. No need to copy data to and from HDFS.
Supported Storage Architecture: HDFS | MapR-FS | NetApp | EMC Isilon | IBM Spectrum Scale
Shared nothing (storage SW that runs on commodity servers) ✓ ✓ ✓
Shared storage (separate storage HW) ✓ ✓ ✓
- Spectrum Scale makes Hortonworks Data Platform stronger against MapR, which bundles the proprietary MapR-FS file system.
- Use the formal certification of Spectrum Scale with Hortonworks as your argument against Cloudera.
Architecture choices
Shared Nothing model
Good for:
• High data locality requirements
• Starting small
Shared Storage model
Good for:
• Large capacity deployments
• Requirement to grow storage independently of compute
• In-place analytics and mixed workloads (Hadoop & traditional) on the same storage
Support for combining both models in a single Hadoop cluster is coming soon.
[Figure: Shared Nothing: HDP with Spectrum Scale (S) on every storage-rich server (IBM Power or commercial x86), interconnected over 10 GigE / 40 GigE / InfiniBand (Spectrum Scale I/O). Shared Storage: compute nodes (IBM Power or x86) running only Hadoop services and the HDFS client, connected over InfiniBand (RDMA) / 40 GigE / 10 GigE to ESS (E) storage.]
S = Spectrum Scale
ESS = Elastic Storage Server (powered by Spectrum Scale)
Hortonworks Products
The Data Lake
[Figure: HDP and HDF, with Data Science and IT Systems & Ops]
Hortonworks DataFlow 3.0: Ongoing Innovation in Apache
[Figure: component versions by release (HDF 1.0 Dec 2014, HDF 1.2 Oct 2015, HDF 2.0 Mar 2016, HDF 2.1 Aug 2016, HDF 3.0 1H2017), grouped into Streaming & Integration, Operations, and Security. Components: NiFi, MiNiFi, Kafka, Storm, ZooKeeper, SAM, Schema Registry, Superset (technical preview), Ambari, Ranger.]
* HDF 3.0: shows current Apache branches being used. Final component version subject to change based on Apache release process.
Hortonworks Data Platform 2.6: Ongoing Innovation in Apache
[Figure: component versions by release (HDP 2.0 Oct 2013, HDP 2.1 April 2014, HDP 2.2 Dec 2014, HDP 2.3 Oct 2015, HDP 2.4 Mar 2016, HDP 2.5 Aug 2016, HDP 2.6* 1H2017), grouped into Hadoop & YARN, Data Management, Data Access, Governance & Integration, Operations, and Security. Components: Hadoop & YARN, Pig, Hive, Tez, Spark, HBase, Phoenix, Accumulo, Storm, Solr, Slider, Druid, Zeppelin, Sqoop, Flume, Kafka, Falcon, Atlas, Oozie, Ambari, ZooKeeper, Knox, Ranger.]
* HDP 2.6: shows current Apache branches being used. Final component version subject to change based on Apache release process.
** Spark 1.6.3 + Spark 2.1: HDP 2.6 supports both Spark 1.6.3 and Spark 2.1 as GA.
*** Hive 2.1 is GA within HDP 2.6.
**** Apache Solr is available as an add-on product, HDP Search.
Hortonworks Connection: Services and Solutions for Your Success
Hortonworks Solutions: Enterprise Data Warehouse Optimization; Cyber Security and Threat Management; Internet of Things and Streaming Analytics
Data Services:
• Data Center: Hortonworks Data Suite (HDF, HDP)
• Cloud: Hortonworks Data Cloud; IBM SoftLayer
Hortonworks Connection: Enablement Subscription; SmartSense™; Premier Operational Support; Educational Services; Professional Services; Community Connection
Hortonworks Connected Data Platforms and Solutions
Hortonworks Connection
Open Source Optimizes Variety and Efficiency
• Leadership from the Committers
• Reduced Risk of Lock-In
• Seamless Integration
• Unmatched Economics
[Figure: cost efficiency vs. data variety for RDBMS, EDW, proprietary Hadoop, and Hortonworks open source]
Business Critical Enablement for Your Data Plane
• Hortonworks SmartSense™
• Seasoned Professional Services
• Training from the Experts
[Figure: a Data Plane spanning Cloud and Data Center]
Helping you maximize your connected data platform investments
Hortonworks Professional Services
Implementation
• Build and secure clusters
• Deploy use cases into production
• Migrate from proprietary platforms
Advice
• Define data platform strategy
• Plan for a successful rollout
• Design modern data architecture
Managed Services
• Optimize platforms on site
• Extend your team
• Offload cluster administration
Training by Hortonworks University
• Comprehensive Role-Based Learning Paths
• Flexible Delivery
• Skills Certifications
Use Cases
Actionable Intelligence Personalizes Digital Advertising
Data sources: Market Research Studies; CRM Records; Online Transactions; Social Media Streams; Impressions; Video Consumption Logs; Sensor Data; Product Catalogs; Server Logs; Clickstreams; Customer Surveys; Sales Reports
Use cases: Customer Segmentation; Online Ad Placement; Product Recommendations; Targeted Promotions; Video Syndication
Actionable Intelligence Drives the New Automotive Industry
Data sources: ERP Data; Warranty Data; Geo Tracking; Infotainment Metadata; SCADA Systems; Social Media Streams; ERP Systems; Defect Testing Data; Machine Data; Data Historians; Product Design Docs; Service Records
Use cases: Preventative Maintenance; Supply Chain Optimization; Manufacturing Yield Maximization; Quality Control; New Product Planning
Actionable Intelligence Transforms Energy & Utilities
Data sources: Asset Data; Customer Surveys; Weather & Environmental; Service Fleet GPS Data; Smart Meter Streams; Commodity Prices; Social Media; GIS Data; SCADA; Outage Histories; CIS Records; EDW
Use cases: Revenue Protection; Single View of Customer; Predictive Equipment Maintenance; Conservation Voltage Reduction; Commodity Trading
Actionable Intelligence Powers Today’s Financial Services
Data sources: OFAC Lists; Credit Records; ATM Streams; Transactions & Wires; Stock Tickers; Trade Settlements; Mobile App Data; Trade Data; Web Logs; Banker Notes; Demographic Data; Customer Transaction Data
Use cases: Digital Customer 360; Risk Data Aggregation; Anti-Money Laundering; Fraud Detection; Trade Surveillance
Actionable Intelligence Makes Healthcare Precise and Personal
Data sources: Patient Records; Lab Data; Pharmacy Data; Patient Locations; Wearables; Intra-Network Data; Sensor Data; Claims Data; Social Media; Physician Notes; Patient Satisfaction Data; Clinical (EMR) Data
Use cases: Single View of Patient; Real-Time Vital Sign Monitoring; Billing & Reimbursements; EMR Optimization; Supply Chain Optimization
Actionable Intelligence Is Shaping the Modern Insurance Industry
Data sources: Catastrophic Event Data; Customer Onboarding Data; Seismic Data; Biometrics Data; Usage-Based Driver Data; Cyber Threat Metadata; Drones & Aerial Imagery; Claims Docs, Notes & Diaries; Weather & Environment; Underwriting Analysis; Policy Histories; Photos
Use cases: Risk & Underwriting Analysis; Usage-Based Insurance; Claims Analytics; New Product Development; Cyber Risk Analytics
Actionable Intelligence Powers Modern Manufacturing
Data sources: Defect Testing Data; Product Designs; MES Systems; RFID Streams; SCADA Systems; Shop Floor Sensors; ERP Systems; Supplier Receipts; Machine Data; Assembly Line Sensors; Data Historians; Work Orders
Use cases: Preventative Maintenance; Supply Chain Optimization; Yield Maximization; Quality Control; Recall Avoidance
Actionable Intelligence Fuels Oil & Gas Industry Renovation
Data sources: ERP Data; Engineering Notes; IoT Gateway Data; Video; WITSML Data; Weather & Environment; Vehicle GPS Data; GIS Data; SCADA Systems; Field Comments; Production Histories; G&G Data
Use cases: Real-Time Monitoring; Single View of Operations; Predictive Maintenance; LAS Archive & Analytics; Unstructured Data Classification
Actionable Intelligence Makes Pharmaceuticals Safe & Effective
Data sources: Research Cohort Data; Molecular Data; RFID Data; Social Media; Biometrics; Sensor Data; Supply Chain Geo-location Data; Scientific Studies; Manufacturing Machine Data; Clinical Records; Sales Reports; Genomic Data
Use cases: Drug Trial Cohort Selection; Yield Optimization; Raw Material Waste Reduction; Searchable Research Repos; Next-Gen Sequencing (NGS)
Actionable Intelligence Enhances Public Sector Efficiency
Data sources: Historical Archives; Cyber Threat Metadata; Vehicle Telemetry Data; Disease Outbreaks; Natural Disasters; Social Media; Work Orders; Meeting Notes; Voter Rolls; Public Benefits Claims; Financial Audits; Extreme Weather Alerts
Use cases: Public Transportation; Infrastructure Maintenance; Public Health; National Defense; Homeland Security
Actionable Intelligence Drives Retail Sales Growth
Data sources: Product Catalogs; Sales Forecasts; Beacons & RFID; Server Logs; In-Store WiFi Logs; Store Communications; Clickstream; ERP Data; Social Media; Staffing Plans; Store Reporting; CRM Records
Use cases: Single View of the Customer; Product Recommendations; Inventory & Supply Chain; Pricing Optimization; Targeted Promotions
Actionable Intelligence Transforms the Software Industry
Data sources: Cyber Security Metadata; Sales Forecasts; Mobile Device Geo-Location; Server Logs; User Activity Events; Network Logs; Clickstreams; CRM Records; Social Media Streams; Sprints & Backlogs; User Testing; Historical Audit Trails
Use cases: New Product Development; Quality Assurance; Customization & Personalization; Cyber Security; Real-Time Usage Monitoring
Connected Data Drives Success in Telecommunications
Data sources: Call Detail Records; Product Catalogs; Cyber Threat Metadata; Sensor Data; Server Logs; Voice-to-Text; Clickstream; ERP System Data; Social Media; Billing Data; Subscriber Profiles; CRM Records
Use cases: Single View of the Customer; Churn Reduction; CDR Analysis; Network Optimization; Dynamic Bandwidth Allocation
Telecom Architecture Standard (client)
Data sources: Relational databases; Social Networks; Websites; Mobile Apps; CDR - Network; OOT; Adwords/adserver; Beacon; TWW/Smart Focus; CRM; …
[Figure: architecture with Batch, Speed, and Serving layers. Components shown: ingest governed by Atlas/Ranger; Kafka and SAM (complex event processing); HDFS (all data) and Hive (batch views); Druid and Superset (real-time views, visualization & reporting); R, Spark, and Zeppelin for data science and machine learning (data exploration, model pre-processing, model building); analytics, BI, and ad-hoc exploration; custom applications and dashboards; a legacy cluster.]
Use cases: Customer Sentiment & Churn; Network Optimization Journey; Real-time Marketing & Advertising; Marketing; Others
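The batch, speed, and serving layers in this architecture follow the Lambda architecture pattern. The core idea is that a query answer merges a view precomputed from all historical data with increments from events that arrived since the last batch run. The following minimal in-memory Python sketch shows only that merge. The class and function names and the "cell tower" keys are hypothetical. In the diagram, these roles are played by distributed components (for example, Hive batch views on the batch side and Kafka/SAM on the streaming side).

```python
from collections import Counter
from typing import Iterable


def batch_view(all_events: Iterable[str]) -> Counter:
    """Batch layer: recompute a view (events per key) from the full master dataset."""
    return Counter(all_events)


class RealtimeView:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.counts[key] += 1

    def reset(self) -> None:
        """Discard increments once a new batch view has absorbed them."""
        self.counts.clear()


def serve(batch: Counter, realtime: RealtimeView, key: str) -> int:
    """Serving layer: answer a query by merging batch and real-time views."""
    return batch[key] + realtime.counts[key]


if __name__ == "__main__":
    # Hypothetical call-detail events keyed by cell tower id.
    batch = batch_view(["cell-a", "cell-b", "cell-a"])
    speed = RealtimeView()
    for event in ["cell-a", "cell-c"]:    # events newer than the batch run
        speed.on_event(event)
    print(serve(batch, speed, "cell-a"))  # 3
    print(serve(batch, speed, "cell-c"))  # 1
```

The speed layer only has to stay correct until the next batch recomputation, so it can favor low latency over completeness. This trade-off is the main reason for splitting the two layers.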
Thank you!
linkedin.com/in/thiagosantiago/

Hortonworks & IBM solutions