PARALLEL & ASYNC PROCESSING USING TPL DATAFLOW
Petru Rebeja
AGENDA
• What is Dataflow?
• When to use it?
• How to use it?
• Q&A
THE BIG PICTURE
CLR Thread Pool
Tasks
PLINQ
Parallel Loops
Concurrent Collections
Dataflow
DATAFLOW BENEFITS
• Effortless use of multi-threading
• Performance boost via painless optimization
• Development focus is on the ‘what’ rather than ‘how’
DATAFLOW USAGES
High throughput, low-latency scenarios
Robotics
Manufacturing
Imaging
Biology
Oil & Gas
Finance
PROGRAMMING MODEL
• Actor-based programming
• In-process message passing
• Components (blocks) for creating data processing pipelines
ARCHITECTURE
IDataflowBlock
ISourceBlock<TOutput>
ITargetBlock<TInput>
IPropagatorBlock<TInput, TOutput>
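A minimal sketch of how the three interfaces relate, assuming the System.Threading.Tasks.Dataflow library is referenced. BufferBlock<T> implements IPropagatorBlock<T, T>, so the same instance can be viewed as a target (write side) or a source (read side). The class and method names (InterfacesDemo, RoundTrip) are hypothetical.

```csharp
using System.Threading.Tasks.Dataflow;

public static class InterfacesDemo
{
    // Posts a value through the target side of a block and reads it back from the source side.
    public static int RoundTrip(int value)
    {
        // A propagator is both a target and a source.
        IPropagatorBlock<int, int> buffer = new BufferBlock<int>();

        ITargetBlock<int> target = buffer; // accepts messages
        ISourceBlock<int> source = buffer; // offers messages

        target.Post(value);      // DataflowBlock.Post extension method
        return source.Receive(); // DataflowBlock.Receive blocks until a message is available
    }
}
```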
COMPOSITION
Source
Target
Propagator
Optional
Transform
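The composition of a source, a propagator, and a target could look roughly like the sketch below. It assumes an integer input and a string output purely for illustration; the names CompositionDemo and RunAsync and the "item-" prefix are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CompositionDemo
{
    public static async Task<List<string>> RunAsync(IEnumerable<int> inputs)
    {
        var results = new List<string>();

        var source = new BufferBlock<int>();                                   // source
        var transform = new TransformBlock<int, string>(n => $"item-{n * 2}"); // propagator
        // Target. The default MaxDegreeOfParallelism is 1, so Add is never called concurrently.
        var target = new ActionBlock<string>(s => results.Add(s));

        // Link the blocks and let completion flow from one block to the next.
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        source.LinkTo(transform, linkOptions);
        transform.LinkTo(target, linkOptions);

        foreach (var n in inputs)
        {
            source.Post(n);
        }

        source.Complete();       // no more input
        await target.Completion; // wait until the last block has drained
        return results;
    }
}
```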
BUFFERING BLOCKS
BufferBlock<T>
BroadcastBlock<T>
WriteOnceBlock<T>
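A short sketch of two of the buffering blocks, under the assumption that all linked targets are unbounded (a bounded target may miss broadcast messages). BroadcastBlock<T> offers each message to every linked target, while WriteOnceBlock<T> keeps only the first message it receives. BufferingDemo and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BufferingDemo
{
    // BroadcastBlock<T> offers every message to all linked targets.
    public static async Task<(List<int> First, List<int> Second)> BroadcastAsync(IEnumerable<int> inputs)
    {
        var first = new List<int>();
        var second = new List<int>();

        // The constructor argument is a cloning function; identity is enough for value types.
        var broadcast = new BroadcastBlock<int>(n => n);
        var targetA = new ActionBlock<int>(n => first.Add(n));
        var targetB = new ActionBlock<int>(n => second.Add(n));

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        broadcast.LinkTo(targetA, link);
        broadcast.LinkTo(targetB, link);

        foreach (var n in inputs)
        {
            broadcast.Post(n);
        }

        broadcast.Complete();
        await Task.WhenAll(targetA.Completion, targetB.Completion);
        return (first, second);
    }

    // WriteOnceBlock<T> accepts the first message and declines every later one.
    public static (bool FirstAccepted, bool SecondAccepted, int Stored) WriteOnce()
    {
        var once = new WriteOnceBlock<int>(n => n);
        bool firstAccepted = once.Post(1);
        bool secondAccepted = once.Post(2);
        return (firstAccepted, secondAccepted, once.Receive());
    }
}
```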
EXECUTION BLOCKS
ActionBlock<T>
TransformBlock<T,V>
TransformManyBlock<T,V>
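The three execution blocks can be combined as in the following sketch: TransformManyBlock produces zero or more outputs per input, TransformBlock produces exactly one, and ActionBlock consumes without producing. The word-splitting scenario and the names ExecutionDemo and SplitWordsAsync are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ExecutionDemo
{
    public static async Task<List<string>> SplitWordsAsync(IEnumerable<string> lines)
    {
        var words = new List<string>();

        // One line in, many words out.
        var split = new TransformManyBlock<string, string>(
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        // One word in, one word out.
        var upper = new TransformBlock<string, string>(w => w.ToUpperInvariant());
        // One word in, nothing out.
        var collect = new ActionBlock<string>(w => words.Add(w));

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        split.LinkTo(upper, link);
        upper.LinkTo(collect, link);

        foreach (var line in lines)
        {
            split.Post(line);
        }

        split.Complete();
        await collect.Completion;
        return words;
    }
}
```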
GROUPING BLOCKS
BatchBlock<T>
JoinBlock<T1,T2,…>
BatchedJoinBlock<T1,T2>
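As a rough illustration of the grouping blocks, the sketch below groups items into arrays with BatchBlock<T> and pairs items from two inputs with JoinBlock<T1, T2>. GroupingDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class GroupingDemo
{
    // Groups the inputs into arrays of at most batchSize items.
    public static async Task<List<int[]>> BatchAsync(IEnumerable<int> inputs, int batchSize)
    {
        var batch = new BatchBlock<int>(batchSize);

        foreach (var n in inputs)
        {
            batch.Post(n);
        }

        // Completing the block flushes the remaining items as a final, possibly smaller, batch.
        batch.Complete();

        var batches = new List<int[]>();
        while (await batch.OutputAvailableAsync())
        {
            batches.Add(batch.Receive());
        }

        return batches;
    }

    // Pairs one item from each target into a tuple.
    public static Tuple<int, string> Join(int left, string right)
    {
        var join = new JoinBlock<int, string>();
        join.Target1.Post(left);
        join.Target2.Post(right);
        return join.Receive();
    }
}
```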
BEHAVIOR CONFIGURATION OPTIONS
DataflowBlockOptions (buffering blocks)
• BufferBlock<T>
• BroadcastBlock<T>
• WriteOnceBlock<T>
ExecutionDataflowBlockOptions (execution blocks)
• ActionBlock<T>
• TransformBlock<TIn, TOut>
• TransformManyBlock<TIn, TOut>
GroupingDataflowBlockOptions (grouping blocks)
• BatchBlock<T>
• JoinBlock<T1, T2[, T3]>
• BatchedJoinBlock<T1, T2>
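A sketch of configuring an execution block through ExecutionDataflowBlockOptions. The values 4 and 8 are arbitrary assumptions, not recommendations, and OptionsDemo and SquareAllAsync are hypothetical names. MaxDegreeOfParallelism lets the block process several messages at once, and BoundedCapacity limits how many messages it holds, which is why the producer uses SendAsync (it waits for room) rather than Post.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class OptionsDemo
{
    public static async Task<List<int>> SquareAllAsync(IEnumerable<int> inputs)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4, // up to four messages processed concurrently
            BoundedCapacity = 8         // at most eight messages held by the block
        };

        var results = new List<int>();
        var square = new TransformBlock<int, int>(n => n * n, options);
        var collect = new ActionBlock<int>(n => results.Add(n)); // sequential by default

        square.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var n in inputs)
        {
            // SendAsync waits asynchronously while the bounded block is full.
            await square.SendAsync(n);
        }

        square.Complete();
        await collect.Completion;
        return results;
    }
}
```

By default the block emits its outputs in input order even when it processes messages in parallel, so the collected list matches the order of the inputs.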
COMPLETION & CANCELLATION
• To know when a block completes, await block.Completion
or add a continuation task to it
• To propagate completion from source to target, set
DataflowLinkOptions.PropagateCompletion to true when
linking
• Set DataflowBlockOptions.CancellationToken to
enable cancellation
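Cancellation can be sketched as follows, assuming a block that is cancelled before Complete() is ever called. The block's Completion task then ends in the Canceled state, and awaiting it throws an OperationCanceledException. CompletionDemo and CancelAsync are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CompletionDemo
{
    public static async Task<TaskStatus> CancelAsync()
    {
        using (var cts = new CancellationTokenSource())
        {
            var block = new ActionBlock<int>(
                n => { /* placeholder for real work */ },
                new ExecutionDataflowBlockOptions { CancellationToken = cts.Token });

            block.Post(1);
            cts.Cancel(); // the block stops accepting input and drops buffered messages

            try
            {
                await block.Completion;
            }
            catch (OperationCanceledException)
            {
                // Expected: a cancelled block surfaces cancellation through Completion.
            }

            return block.Completion.Status;
        }
    }
}
```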
ERROR HANDLING
• If the exception does not affect the integrity of the
pipeline, use a try/catch inside the block
• Otherwise, handle errors outside of the pipeline by:
    • Adding a continuation to block.Completion
    • Propagating errors through the pipeline
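Both strategies can be sketched side by side. In the first method the block catches the exception itself and substitutes a sentinel value (-1 is an arbitrary assumption). In the second, the exception faults the block, the fault travels along the link because PropagateCompletion is set, and the caller observes it on the last block's Completion. Since propagated faults may arrive wrapped in one or more AggregateException layers, the sketch flattens them. ErrorHandlingDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ErrorHandlingDemo
{
    // Strategy 1: handle the error inside the block so the pipeline keeps running.
    public static async Task<List<int>> ParseTolerantAsync(IEnumerable<string> inputs)
    {
        var results = new List<int>();
        var parse = new TransformBlock<string, int>(s =>
        {
            try { return int.Parse(s); }
            catch (FormatException) { return -1; } // hypothetical sentinel for bad input
        });
        var collect = new ActionBlock<int>(n => results.Add(n));
        parse.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var s in inputs) parse.Post(s);
        parse.Complete();
        await collect.Completion;
        return results;
    }

    // Strategy 2: let the block fault and observe the error at the end of the pipeline.
    public static async Task<string> ParseStrictAsync(IEnumerable<string> inputs)
    {
        var parse = new TransformBlock<string, int>(s => int.Parse(s));
        var sink = new ActionBlock<int>(n => { });
        parse.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var s in inputs) parse.Post(s);
        parse.Complete();

        try
        {
            await sink.Completion;
            return "completed";
        }
        catch (Exception ex)
        {
            // Unwrap any AggregateException layers added while the fault propagated.
            var root = ex is AggregateException agg ? agg.Flatten().InnerExceptions[0] : ex;
            return root.GetType().Name;
        }
    }
}
```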
DEALING WITH CONCURRENCY
• Rule of thumb: avoid shared state whenever possible.
• Use ConcurrentExclusiveSchedulerPair to perform
updates on shared state
• Be aware of the caveats with
ConcurrentExclusiveSchedulerPair
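A minimal sketch of protecting shared state with ConcurrentExclusiveSchedulerPair: two blocks update the same counter, and both run on the pair's ExclusiveScheduler, so their delegates never execute at the same time. SharedStateDemo and SumAsync are hypothetical names. One commonly cited caveat, stated here as an assumption about what the slide refers to: the exclusivity covers only synchronous delegate code, so an async delegate gives up the scheduler at each await.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class SharedStateDemo
{
    public static async Task<int> SumAsync(int itemsPerBlock)
    {
        var schedulers = new ConcurrentExclusiveSchedulerPair();
        var options = new ExecutionDataflowBlockOptions
        {
            // Tasks on the exclusive scheduler run one at a time.
            TaskScheduler = schedulers.ExclusiveScheduler
        };

        int total = 0; // shared state updated by both blocks

        var first = new ActionBlock<int>(n => { total += n; }, options);
        var second = new ActionBlock<int>(n => { total += n; }, options);

        for (int i = 0; i < itemsPerBlock; i++)
        {
            first.Post(1);
            second.Post(1);
        }

        first.Complete();
        second.Complete();
        await Task.WhenAll(first.Completion, second.Completion);
        return total;
    }
}
```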
CREATING CUSTOM BLOCKS
The easy way:
DataflowBlock.Encapsulate<TInput, TOutput>(
target, source)
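The easy way might look like the sketch below: two built-in blocks are linked internally and exposed as a single IPropagatorBlock through DataflowBlock.Encapsulate. The parse-then-square scenario and the names EncapsulateDemo, CreateParseAndSquare, and RunAsync are assumptions for illustration. Note that completion must still be propagated between the inner blocks.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class EncapsulateDemo
{
    // Exposes an internal two-step pipeline as one propagator block.
    public static IPropagatorBlock<string, int> CreateParseAndSquare()
    {
        var parse = new TransformBlock<string, int>(s => int.Parse(s));
        var square = new TransformBlock<int, int>(n => n * n);

        // Without this, completing the outer block would never complete 'square'.
        parse.LinkTo(square, new DataflowLinkOptions { PropagateCompletion = true });

        // 'parse' is the target side, 'square' is the source side.
        return DataflowBlock.Encapsulate<string, int>(parse, square);
    }

    public static async Task<List<int>> RunAsync(IEnumerable<string> inputs)
    {
        var block = CreateParseAndSquare();

        foreach (var s in inputs)
        {
            block.Post(s);
        }

        block.Complete();

        var results = new List<int>();
        while (await block.OutputAvailableAsync())
        {
            results.Add(block.Receive());
        }

        return results;
    }
}
```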
CREATING CUSTOM BLOCKS
The hard(core) way:
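class CustomBlock<TInput, TOutput> :
    IPropagatorBlock<TInput, TOutput>
{
    // Implement the members of ITargetBlock<TInput>,
    // ISourceBlock<TOutput>, and IDataflowBlock here.
}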
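To give a feel for the interface surface, here is a deliberately simple sketch of a hand-written propagator. It is an assumption-laden illustration, not a full custom block: every member forwards to an inner TransformBlock, whereas a real implementation would manage its own buffering, message reservation, and completion. DoublingBlock is a hypothetical name.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public sealed class DoublingBlock : IPropagatorBlock<int, int>
{
    // The built-in block implements most members explicitly, so keep it typed as the interface.
    private readonly IPropagatorBlock<int, int> _inner =
        new TransformBlock<int, int>(n => n * 2);

    // IDataflowBlock
    public Task Completion => _inner.Completion;
    public void Complete() => _inner.Complete();
    public void Fault(Exception exception) => _inner.Fault(exception);

    // ITargetBlock<int>
    public DataflowMessageStatus OfferMessage(
        DataflowMessageHeader messageHeader, int messageValue,
        ISourceBlock<int> source, bool consumeToAccept) =>
        _inner.OfferMessage(messageHeader, messageValue, source, consumeToAccept);

    // ISourceBlock<int>
    public IDisposable LinkTo(ITargetBlock<int> target, DataflowLinkOptions linkOptions) =>
        _inner.LinkTo(target, linkOptions);

    public int ConsumeMessage(
        DataflowMessageHeader messageHeader, ITargetBlock<int> target, out bool messageConsumed) =>
        _inner.ConsumeMessage(messageHeader, target, out messageConsumed);

    public bool ReserveMessage(DataflowMessageHeader messageHeader, ITargetBlock<int> target) =>
        _inner.ReserveMessage(messageHeader, target);

    public void ReleaseReservation(DataflowMessageHeader messageHeader, ITargetBlock<int> target) =>
        _inner.ReleaseReservation(messageHeader, target);
}
```

Because the class implements both ITargetBlock<int> and ISourceBlock<int>, the usual extension methods such as Post, Receive, and LinkTo with a predicate work on it like on any built-in block.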
CREATING CUSTOM BLOCKS
Either way you choose, don’t forget to:
• Propagate completion
• Poll for cancellation
REFERENCES & FURTHER READING
Dataflow (Task Parallel Library)
http://msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx
Stephen Toub, TPL Dataflow Tour
http://channel9.msdn.com/posts/TPL-Dataflow-Tour
Joseph Albahari, The Future of .NET Parallel Programming
http://channel9.msdn.com/events/TechEd/Australia/Tech-Ed-Australia-2011/DEV308
Stephen Toub, Inside TPL Dataflow
http://channel9.msdn.com/Shows/Going+Deep/Stephen-Toub-Inside-TPL-Dataflow
Alexey Kursov, Pipeline TPL Dataflow Usage examples
https://www.youtube.com/watch?v=AI9KxgDF43k
Richard Blewett, Andrew Clymer, Pro Asynchronous Programming with .NET, APRESS 2013, ISBN: 978-1430259206
AKKA.NET
http://getakka.net/
QUESTIONS?
THANK YOU!
Petru.Rebeja@gmail.com

Editor's Notes

  • #4: When discussing how to use Dataflow, we'll touch on the following points of interest:
      - the programming model (what are the entities exposed by Dataflow?)
      - configuring the behavior of the entities (parallelism, completion, error handling)
      - although Dataflow removes most of the need to deal with concurrency directly, there are cases when shared state is inevitable and developers must handle the concurrency pitfalls properly
      - whenever the functionality of the built-in blocks isn't enough, Dataflow offers the possibility to create custom blocks
  • #5: .NET Framework 4.0 comes with three APIs for parallel programming: Tasks (lower level), PLINQ and Parallel (upper level). The Dataflow library is a natural extension of the TPL that allows developers to create data-processing pipelines in their applications. It provides a framework for creating blocks that each perform a specific function asynchronously. These blocks can be composed into a pipeline where data flows in at one end and one or more results come out at the other. This works well when data can be processed at different rates or when parallel processing can efficiently spread work across multiple CPU cores.
  • #6: Dataflow is a paradigm shift, but once developers overcome the initial discomfort, they benefit from highly expressive code.