Supercharging Data Performance for Real-Time Data Analysis


Information, the fuel of business, is trapped in analysis platforms built on 70-year-old architectures.

Data volume and velocity challenge traditional computing methods
Traditional Approach:
• Commodity x86-based servers
• Cluster with open-source software
• Scale for volume
• Scale for parallelism / performance
Challenges:
• High-level languages can be inefficient
• Data-intensive workloads drive in-memory solutions
• DRAM footprints at commodity prices are small
• Scaling out increases cost and complexity

Ryft delivers huge benefits in a small package.
Highest performance per watt and lowest total cost of ownership (TCO) of any product on the market.
48 TB in 1U
• Data storage is abstracted as a set of Linux mount points
• Supports native encryption/decryption (AES-256) with no loss in performance
Simple API
• C library abstracts internal FPGA constructs to simplify programmability, allowing a programmer to invoke operations as simple function calls that return simple results
• Command line
• Web interface
Linux Front End
• Linux (Ubuntu 14.04 LTS) front end: standard build, unrestricted OS, apt-get
• API calls the FPGA fabric back end
• Linux services/protocols can be used:
  • ssh/scp/rsync/sftp
  • Standard monitoring agents
  • Web services
  • Security configuration

x86 Architecture vs. Systolic Arrays
[Figure: two panels, each labeled 100 ns. x86: Memory feeds a single processing element (PE) in one clock cycle. FPGA systolic array: Memory feeds a chain of six PEs in one clock cycle.]

FPGA Benefits
x86
• General-purpose computing
• Sequential in nature
• Non-deterministic performance
  • Interrupts
  • Memory allocation
• Problems are broken into a sequence of operations and processed serially
  • Increasing number of instructions
  • Increased overhead
  • Increased power and cooling requirements
• Software can break problems down and bring parallelism:
  • Between processors/cores
  • Between servers
  • Output combined over interconnects
FPGA
• Not general purpose
  • Purpose-built algorithms
  • Can be reprogrammed via firmware
• Parallel in nature
  • Can execute many parallel operations in one clock cycle
  • More output with less power and lower clock speed
  • ~1000X fewer instructions than x86 to solve the same problem
• 100% deterministic performance
  • No memory fetching or management
  • No interrupts
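
To make the "many parallel operations in one clock cycle" point concrete, here is a minimal software sketch of a linear array of PEs matching a pattern against a character stream. It is a toy model for illustration only, not Ryft's FPGA design: the name systolic_match, the one-pattern-character-per-PE layout, and the broadcast of each incoming character to every PE are all assumptions.

```python
def systolic_match(text, pattern):
    """Toy model of a linear PE array (hypothetical, not Ryft's design).

    PE j holds pattern[j]. After each tick, state[j] is True when the most
    recent j+1 input characters equal pattern[:j+1]. Returns the start
    offsets of every occurrence of pattern in text.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    state = [False] * m
    starts = []
    for tick, ch in enumerate(text):
        prev = state
        # Every PE computes its next value from the previous tick's state,
        # which imitates all PEs updating in the same clock cycle.
        state = [
            ch == pattern[j] and (j == 0 or prev[j - 1])
            for j in range(m)
        ]
        if state[-1]:
            # The last PE fires when a full match ends at this character.
            starts.append(tick - m + 1)
    return starts


if __name__ == "__main__":
    print(systolic_match("abcabcab", "abc"))
```

In a hardware realization of this model, each PE would be a small piece of dedicated logic, so the array would consume one character per tick regardless of pattern length. The Python loop only simulates that behavior sequentially.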

Multi-Dimensional Systolic Arrays
[Figure: processing elements (PEs) arranged in multi-dimensional grids.]

The Ryft ONE is powered by a breakthrough in real-time data analysis.
The only 1U platform capable of analyzing streaming, historical, unstructured, and multi-structured data in real time at 10 GB/second.
Ryft ONE avoids the bottlenecks that strangle conventional systems by combining two innovations:
The Ryft Analytics Cortex™
Ryft ONE leverages a massively parallel bitwise computing architecture to deliver unprecedented performance from the smallest possible form factor.
The Ryft Algorithm Primitives™ Library
Each Ryft ONE comes with a subscription to this growing collection of pre-built algorithm components, and an open API to leverage them.

“We see Spark Streaming scales nearly linearly to 100 nodes, and can
process up to 6 GB/s at sub-second latency on 100 nodes for Grep, 2.3
GB/s for the other, more CPU-intensive jobs”
UC Berkeley, Streaming Computation at Scale
http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf

Ryft transforms datacenter economics.
The Ryft ONE
• Search = 10 GB/s
• Term Frequency = 2.5 GB/s
Costly & Complex Clusters
• Search = 6 GB/s
• Term Frequency = 2.3 GB/s

Wikipedia Examples
• English XML dump is offered by Wikipedia
• Total corpus is 44 GB
• Copying the data takes 44 seconds (1 GB/s)
• Fuzzy search would take 4.4 seconds (at 10 GB/s)
• Term frequency would take 17.6 seconds (at 2.5 GB/s)
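
The timings above follow from dividing the corpus size by the sustained throughput. The short sketch below reproduces that arithmetic. The function name scan_seconds is hypothetical, and the 1 GB/s copy rate is inferred from "44 GB in 44 seconds" rather than stated on the slides.

```python
CORPUS_GB = 44.0

# Throughputs in GB/s. The search and term-frequency rates are the figures
# quoted in the deck; the copy rate is inferred (44 GB in 44 seconds).
RATES_GB_PER_S = {
    "copy": 1.0,
    "fuzzy search": 10.0,
    "term frequency": 2.5,
}


def scan_seconds(size_gb, rate_gb_per_s):
    """Time to stream size_gb of data at a sustained rate_gb_per_s."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_s


if __name__ == "__main__":
    for operation, rate in RATES_GB_PER_S.items():
        print(f"{operation}: {scan_seconds(CORPUS_GB, rate):.1f} s")
```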

Data Exploration Use Case
• RDF: understanding of native formats
• Powerful no-index search
• Flexible query format with wildcarding
• Identify relationships between disparate data

Data Triage for Hadoop/Spark Use Case
[Figure: conventional pipeline with Raw Data, HDFS, M/R, noSQL, Hive, Text Index, and Application stages. Elapsed time: Hours? Days?]
Data Triage for Hadoop/Spark Use Case
[Figure: Search / Minimize @ 10 GB/s feeding HDFS, Ingest @ 1-4 GB/s. Elapsed time: Seconds!]
• Social media signal/noise
• Fuzzy searching at line rate
Search: “badguy??”
Example handles: @badguy1, @badguy2, @badguy01, @badboy01
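
One way to read the "badguy??" example is as a no-index scan that slides the query across the raw text, treats "?" as a single-character wildcard, and tolerates a small number of mismatches. The sketch below illustrates that reading. It is an assumption about the semantics, not Ryft's algorithm or API: the function names, the Hamming-distance metric, and the max_distance parameter are all hypothetical.

```python
def mismatches(pattern, window, wildcard="?"):
    """Count positions where window differs from pattern, ignoring wildcards."""
    return sum(
        1 for p, c in zip(pattern, window)
        if p != wildcard and p != c
    )


def fuzzy_find(text, pattern, max_distance=0):
    """Scan text without an index; return (offset, window, distance) hits.

    A hit is any window of len(pattern) characters whose mismatch count
    against the pattern is at most max_distance.
    """
    n = len(pattern)
    hits = []
    for offset in range(len(text) - n + 1):
        window = text[offset:offset + n]
        distance = mismatches(pattern, window)
        if distance <= max_distance:
            hits.append((offset, window, distance))
    return hits


if __name__ == "__main__":
    stream = "@badguy1 @badguy2 @badguy01 @badboy01"
    for hit in fuzzy_find(stream, "badguy??", max_distance=2):
        print(hit)
```

With max_distance=0 the wildcards alone catch the three "badguy" handles (a trailing space can fill the second "?"). Raising the tolerance to 2 also catches "badboy01", which differs from the query in two letters.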

Organizations who want real-time insights into all their data
Large data sets (changing, structured & unstructured, Text, Binary, Imaging)
High Velocity Data
• Logging
• Ad Data
• Twitter
Forensics & Legal Discovery
• Host based forensics
• E-discovery
Scientific Data
• Genomics
• Sensor Data
Financial
• Compliance
• Fraud Detection
Cyber Security
• PCAP
• Full packet capture
• Binary Analysis
Imagery Analysis
• Change Analysis
• High Performance Rendering

Revisiting Performance Results
Ryft ONE closes the industry’s data analytics performance gap by combining the following into a single architecture:
• Parallel FPGA architectures to accelerate performance
• Dedicated storage/access/RAM
• Elimination of data security performance bottlenecks
• Elimination of operating system and high-level language overhead
• Minimizing the need to move data
Use Case          Single Ryft ONE Throughput    Spark Cluster to Match Performance
Search            ~10 GB/sec                    > 100 nodes [1]
Fuzzy Search      ~10 GB/sec                    100-200 nodes [2]
Term Frequency    ~2.5 GB/sec                   100 nodes [1]
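
As a rough cross-check of the node counts, the sketch below extrapolates from the Spark Streaming figures quoted earlier in the deck (6 GB/s for Grep and 2.3 GB/s for the more CPU-intensive jobs, both on 100 nodes). It assumes throughput keeps scaling linearly past 100 nodes, which the quote reports only up to 100 nodes, and it is not necessarily how the table's figures were derived. The function name nodes_to_match is hypothetical.

```python
import math


def nodes_to_match(target_gb_per_s, ref_gb_per_s, ref_nodes=100):
    """Estimate the cluster size needed to reach target_gb_per_s.

    Assumes (hypothetically) that throughput scales linearly from a
    reference measurement of ref_gb_per_s on ref_nodes nodes.
    """
    if ref_gb_per_s <= 0 or ref_nodes <= 0:
        raise ValueError("reference throughput and node count must be positive")
    return math.ceil(target_gb_per_s * ref_nodes / ref_gb_per_s)


if __name__ == "__main__":
    # Search: ~10 GB/s target vs. 6 GB/s Grep on 100 nodes.
    print("search:", nodes_to_match(10.0, 6.0))
    # Term frequency: ~2.5 GB/s target vs. 2.3 GB/s on 100 nodes.
    print("term frequency:", nodes_to_match(2.5, 2.3))
```

Under that linear-scaling assumption, both estimates land above 100 nodes, in line with the order of magnitude shown in the table.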

Accelerate business insights with the only platform purpose-built to simultaneously analyze any type of data (historical and streaming, unstructured and multi-structured) 100X faster with 70% lower TCO.
The Ryft ONE: More data. Less center. Faster insights.
