Spark DSM
Data Streaming Pipeline
ORCHESTRATING DATA STORAGE, PROCESSING, AND MOVEMENT
Background
• Today’s enterprise data landscape continues to grow exponentially in volume, variety, and complexity.
    ◦ Multiple geographic locations, on-premises and in the cloud
    ◦ A combination of open source, commercial solutions, and custom processing code
    ◦ Expensive and hard to integrate and maintain
    ◦ Ever-increasing volumes of data (terabytes, petabytes)
    ◦ New ways of processing data (Hadoop, Spark, etc.)
• .NET developers write large amounts of custom point-solution logic.
    ◦ Difficult to maintain and orchestrate
    ◦ Performance bottlenecks
SparkPipe Framework
• A development framework for delivering a .NET information production system that coordinates all of this data and processing.
• Familiar technologies for .NET developers, including:
    ◦ .NET Framework 4.0
    ◦ Windows Workflow Foundation
    ◦ Task Parallel Library Dataflow
• Drag-and-drop business process pipeline modeling
• Designed for performance, scaling across processor cores and servers, from the local data center to cloud providers such as Microsoft Azure
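The dataflow style named above can be pictured as a chain of stages connected by bounded buffers. The following is a minimal Python sketch of that idea, not SparkPipe's or TPL Dataflow's actual API; the names `stage`, `run_pipeline`, and `capacity` are assumptions made for illustration. It shows the structure only: each stage runs on its own thread, and a full buffer makes the upstream stage wait.

```python
import queue
import threading

DONE = object()  # sentinel: no more items will arrive


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item until DONE arrives."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # pass the end-of-stream marker downstream
            return
        outbox.put(func(item))  # error handling is omitted in this sketch


def run_pipeline(items, funcs, capacity=4):
    """Push items through funcs in order, using one thread per stage."""
    # Bounded queues give back-pressure: put() blocks while the next buffer is full.
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    results = []

    def drain():
        # Collect output of the final stage so the last buffer never fills up.
        while True:
            item = queues[-1].get()
            if item is DONE:
                return
            results.append(item)

    threads.append(threading.Thread(target=drain))
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    for t in threads:
        t.join()
    return results


cleaned = run_pipeline([" hl7 ", " edi "], [str.strip, str.upper])
```

Because each stage has a single worker, items leave the pipeline in the order they entered. Python threads do not speed up CPU-bound work, so this illustrates the shape of a staged pipeline rather than the multi-core scaling the slide describes.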
Build Solutions
• Build data-driven workflows (pipelines) that join, aggregate, and transform data sourced from on-premises, cloud-based, and internet data stores.
• Transform semi-structured, unstructured, and structured data from diverse data sources into trusted information.
• Produce data that can be easily consumed by business intelligence (BI) tools, analytics tools, and other applications.
• Set up complex data processing by composing simple steps.
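As a rough illustration of composing join, aggregate, and transform steps, here is a small Python sketch. The function names and the sample records are hypothetical and stand in for whatever sources a real pipeline would read; nothing here is taken from the framework itself.

```python
from collections import defaultdict


def join(left, right, key):
    """Inner-join two lists of dicts on a shared key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate(rows, key, value):
    """Sum `value` for each distinct `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def transform(totals, label):
    """Reshape totals into flat records that a reporting tool could load."""
    return [{label: k, "total": v} for k, v in sorted(totals.items())]


# Hypothetical sample data standing in for two different data stores.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 1, "amount": 12.5},
]
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
]

joined = join(orders, customers, "customer_id")
report = transform(aggregate(joined, "region", "amount"), "region")
```

Each step takes plain records and returns plain records, which is what lets the steps be chained in any order that makes sense for the data.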
Visual Pipeline Design
Built for “Cloud Scale”
• Support for Microsoft Azure offerings, including:
    ◦ Azure SQL Server
    ◦ HDInsight (Hadoop)
    ◦ Blob, Tables, Queues, and Service Bus
• Automatically spin up cloud servers, process data, and then shut them down for cost-effective processing.
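The spin-up, process, shut-down cycle can be sketched as a scoped resource. The Python example below is an assumption-laden illustration: `FakeCluster` and `ephemeral` are hypothetical names, and no real Azure call is made. The point it shows is that shutdown runs even when processing fails, so servers are not left running.

```python
from contextlib import contextmanager


class FakeCluster:
    """Hypothetical stand-in for a cloud compute resource; not a real Azure API."""

    def __init__(self, size):
        self.size = size
        self.running = False
        self.log = []

    def start(self):
        self.running = True
        self.log.append("start")

    def stop(self):
        self.running = False
        self.log.append("stop")

    def process(self, batch):
        if not self.running:
            raise RuntimeError("cluster is not running")
        self.log.append("process")
        return [x * 2 for x in batch]


@contextmanager
def ephemeral(cluster):
    """Start the cluster, hand it to the caller, and always stop it afterwards."""
    cluster.start()
    try:
        yield cluster
    finally:
        cluster.stop()  # runs even if processing raised


cluster = FakeCluster(size=4)
with ephemeral(cluster) as c:
    output = c.process([1, 2, 3])
```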
Support for Healthcare
• Out-of-the-box components include:
    ◦ HL7 v2
    ◦ Clinical Document Architecture
    ◦ EDI 834
    ◦ PGP Encryption
    ◦ Secure FTP
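To give a sense of what an HL7 v2 component has to handle, here is a minimal Python sketch of reading fields from an HL7 v2 message. It is not the framework's component. It relies only on the standard message layout: segments separated by carriage returns, a field separator declared as the fourth character of the MSH segment, and `^` separating components. The sample message and the helper names `parse_hl7_v2` and `field` are made up for illustration.

```python
def parse_hl7_v2(message):
    """Split an HL7 v2 message into {segment_id: fields}, keeping the first of each."""
    separator = message[3]  # MSH-1: the character right after "MSH"
    segments = {}
    for line in message.strip().split("\r"):
        if line:
            fields = line.split(separator)
            segments.setdefault(fields[0], fields)
    return segments


def field(segments, segment_id, number):
    """Return field `number` (1-based, HL7 style; use 2 or higher for MSH)."""
    fields = segments[segment_id]
    # In MSH, field 1 is the separator itself, so list indexes shift by one.
    index = number - 1 if segment_id == "MSH" else number
    return fields[index]


# Hypothetical sample message; all values are invented.
sample = "\r".join([
    r"MSH|^~\&|SendingApp|SendingFac|ReceivingApp|ReceivingFac|20240101120000||ADT^A01|MSG00001|P|2.5",
    "PID|1||12345^^^Hospital^MR||Doe^Jane||19800101|F",
])

segments = parse_hl7_v2(sample)
message_type = field(segments, "MSH", 9)          # "ADT^A01"
family_name = field(segments, "PID", 5).split("^")[0]
```

A production parser would also handle repeated segments, repetition and escape characters, and validation, all of which this sketch leaves out.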
Spark Data Streaming Pipeline
Typical Process Flow