What Is Apache Bahir?
● Provides extensions for Apache Spark and Apache Flink
● Open source / Apache 2.0 license
● Streaming connectors and SQL data sources
● One grouped location for extensions
● Initiated in 2016 from the Spark project
● A source for current and future extensions
Apache Bahir Flink Extensions
● Streaming Connectors
– ActiveMQ connector
– Akka connector
– Flume connector
– InfluxDB connector
– Kudu connector
– Netty connector
– Redis connector
Apache Bahir Spark Extensions
● SQL Data Sources
– Apache CouchDB/Cloudant data source
● Structured Streaming Data Sources
– Akka data source
– MQTT data source (new Sink)
Apache Bahir Spark Extensions
● Discretized Streams (DStreams) Connectors
– Apache CouchDB/Cloudant connector
– Akka connector
– Google Cloud Pub/Sub connector
– Cloud PubNub connector
– MQTT connector
– Twitter connector
– ZeroMQ connector (Enhanced Implementation)
Apache Bahir Importance
● Seems like a small project? But it covers
– Multiple Spark extensions
– Multiple Flink extensions
– Possible future extensions
● Why is it important?
– Knowledge of this project …
– Aids reuse, avoids the need to recreate connectors
– Saves money and time!
Apache Bahir Status
● OK, great project, but is it current?
● Started in 2016, but is it still going?
● Check GitHub
● https://github.com/apache/bahir-flink
– Last update 27/05/2020 => current
● https://github.com/apache/bahir
– Last update 20/01/2020 => current
Apache Bahir Documentation
● Flink connector documentation describes
– Dependencies
– Version compatibility
– Source and sink classes
– Linking for cluster execution
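As a rough illustration of the "Dependencies" item above, the sketch below renders a Maven dependency entry for a Bahir Flink connector. The group ID `org.apache.bahir` and the `flink-connector-<name>_<scala version>` naming follow the pattern used in the Bahir Flink documentation; the helper name `bahir_flink_dependency` and the default Scala and release versions are assumptions made for this example, so check the documentation for the versions that match your Flink installation.

```python
from xml.etree import ElementTree as ET


def bahir_flink_dependency(connector, scala_version="2.11", version="1.0"):
    """Render a Maven <dependency> element for a Bahir Flink connector.

    The default scala_version and version are illustrative assumptions,
    not a statement of the current release.
    """
    dep = ET.Element("dependency")
    ET.SubElement(dep, "groupId").text = "org.apache.bahir"
    ET.SubElement(dep, "artifactId").text = (
        f"flink-connector-{connector}_{scala_version}"
    )
    ET.SubElement(dep, "version").text = version
    # Return the element as a single-line XML string.
    return ET.tostring(dep, encoding="unicode")


if __name__ == "__main__":
    print(bahir_flink_dependency("redis"))
```

The printed element can be pasted into the `<dependencies>` section of a project's `pom.xml`.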
Apache Bahir Documentation
● Spark connector documentation describes
– Linking
– Configuration
– Examples in Scala, Java and Python
● Taking MQTT as an example
● Documentation is comprehensive
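To make the MQTT example concrete, here is a minimal Python sketch of the linking and reading steps for the Structured Streaming MQTT data source. The provider class name and the `topic` option are taken from the Bahir MQTT documentation; the helper names `bahir_package` and `read_mqtt_stream`, the default Scala and Bahir versions, and the broker URL and topic in the usage note are assumptions made for illustration.

```python
# Provider class name as given in the Bahir Structured Streaming MQTT docs.
MQTT_PROVIDER = "org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider"


def bahir_package(module, scala_version="2.11", version="2.4.0"):
    """Build the Maven coordinate to pass to spark-submit --packages.

    The default versions are illustrative assumptions; use the ones that
    match your Spark build.
    """
    return f"org.apache.bahir:{module}_{scala_version}:{version}"


def read_mqtt_stream(spark, broker_url, topic):
    """Return a streaming DataFrame fed by an MQTT topic.

    `spark` is an existing SparkSession, so this module itself does not
    need to import pyspark.
    """
    return (
        spark.readStream
        .format(MQTT_PROVIDER)
        .option("topic", topic)
        .load(broker_url)
    )


if __name__ == "__main__":
    # Prints the coordinate to use for linking.
    print(bahir_package("spark-sql-streaming-mqtt"))
```

With a running broker, an application might call `read_mqtt_stream(spark, "tcp://localhost:1883", "sensors")` and be launched with `spark-submit --packages` followed by the coordinate printed above.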
Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology-based issues
– Big data integration


Apache Bahir
