Democratizing AI with
Apache Spark
Ali Ghodsi
Co-Founder and CEO
AI is changing the world
2
Why now?
AlphaGo, Siri/assistants, self-driving cars
Data is the catalyst
3
AI hasn’t been democratized
Better training, tuning, validation
More data
Clickstreams
Sensor data (IoT)
Video
Speech
Handwriting
…
The hardest part of AI isn’t AI
4
“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
How do we democratize AI?
5
“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
+ AI
FLEXIBLE FAST BIG DATA
Some gaps remain
6
1. Manage data infrastructure
• Create, configure, monitor resilient big data clusters.
• Securely access silos of disparate data sources.
• Enforce proper data governance.
2. Empower teams to be productive
• Interactively explore data and prototype ideas.
• Securely share big data clusters among analysts.
• Debug, troubleshoot, version-control big data applications.
3. Establish production-ready applications
• Set up robust ML data pipelines for ETL/ELT.
• Productionize real-time applications with high availability (HA) and fault tolerance (FT).
• Build, serve, maintain advanced machine learning models.
Databricks: Closing the gap
7
1. Just-in-Time Data Platform: Agile + Low TCO
• Separate compute & storage
• Integrate existing data stores
• Efficient cache on first access
2. Integrated Workspace: Accelerate Time to Value
• Interactive notebooks, dashboards, reports
• Real-time exploration, machine learning, graph use cases
3. Automated Spark Management: Performance
• Workflow scheduler for ML, streaming, SQL, ETL
• Performance-optimized, high availability, fault-tolerant
Enterprise AI use-cases
8
Predict credit score, credit limit, anomalies
Predict energy demand based on massive weather data
Natural language processing to extract author graph
Predict player churn, predict network outages
Predict machine equipment failure
New Frontier of AI: Deep Learning
9
Detect cancer
Understand speech
Infer location
Identify landmarks in photos
Recognize Mandarin and English
Improve cancer detection
Faster and easier deep learning with Databricks
10
GPUs
• TensorFlow: The most popular deep learning framework.
• TensorFrames: Makes TensorFlow computations faster and easier to program on Spark.
TensorFlow on
TensorFrames and GPUs support out-of-the-box
Massive parallelism
Deep Learning on Databricks
11
Data Ingest
Feature extraction
Model Training
Productionize
Clusters
Jobs & Workflows
TensorFrames + GPUs
Interactive exploration
Just-in-time data platform
Automated management
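The four stages named on this slide (data ingest, feature extraction, model training, productionize) can be pictured as a chain of small functions. The sketch below is a standard-library-only toy, not Spark, TensorFrames, or Databricks code; the sample records, the function names, and the choice of a one-variable least-squares model are all assumptions made purely for illustration.

```python
from statistics import mean

# Hypothetical raw "x,y" records standing in for ingested data.
RAW_ROWS = ["1,2.1", "2,3.9", "3,6.2", "4,7.8"]


def ingest(rows):
    """Data ingest: parse raw text records into (x, y) float pairs."""
    return [tuple(float(v) for v in row.split(",")) for row in rows]


def extract_features(pairs):
    """Feature extraction: center x on its mean before fitting."""
    x_mean = mean(x for x, _ in pairs)
    return x_mean, [(x - x_mean, y) for x, y in pairs]


def train(features):
    """Model training: least-squares fit of y = slope * x + intercept
    on the centered x values (an illustrative stand-in for a real model)."""
    y_mean = mean(y for _, y in features)
    sxx = sum(x * x for x, _ in features)
    sxy = sum(x * (y - y_mean) for x, y in features)
    return sxy / sxx, y_mean


def productionize(x_mean, slope, intercept):
    """Productionize: wrap the fitted parameters in a callable for serving."""
    def predict(x):
        return slope * (x - x_mean) + intercept
    return predict


x_mean, features = extract_features(ingest(RAW_ROWS))
slope, intercept = train(features)
predict = productionize(x_mean, slope, intercept)
print(round(predict(5.0), 2))
```

On a real platform each stage would presumably run as a distributed job rather than a local function call; the sketch only shows how the output of one stage feeds the next.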
Thank you.
Deep Learning references
• Image recognition (Geo ID):
– https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/
• Cancer screening:
– http://www.popsci.com/how-deep-learning-technology-could-be-next-step-in-cancer-detection
– https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-cancer-diagnosis/
• Speech translation:
– https://www.technologyreview.com/s/544651/baidus-deep-learning-system-rivals-people-at-speech-recognition/
13
Analytics Transforming Industries
15
PHARMA: Predicting diabetes in rural counties (Next-Gen Product R&D)
MEDIA: Generating programs based on Nielsen ratings (Predictive Analytics)
INDUSTRIAL: Real-time detection of failing wind turbines (Anomaly Detection)
Real-time Data-Driven Analytics Applications
Data Warehouses
Hadoop / Data Lakes
Your Storage
Cloud Storage
Orchestrated Spark in the Cloud
16
Databricks Just-in-Time Data Platform
Integrated Workspace
Dashboards
Reports
Notebooks
GitHub, viz, collaboration
Enterprise Security: Access Control, Auditing, Encryption
BI Tools
Open Source + Databricks Managed Services
Your Custom Spark Apps
• Clusters: Auto-scaled, resilient, multi-tenant
• Data Integration: Universal secure and fast
• Interfaces: BI tools & REST API
Production Jobs
Powered by Apache Spark
Data Warehouses Hadoop / Data Lakes
YOUR
STORAGE
Cloud Storage
Orchestrated Spark In The Cloud
Databricks Just-in-Time Data Platform
Integrated Workspace
Dashboards
Reports
Notebooks
github, viz,
collaboration
BI Tools
Open Source
Your Custom Spark
Apps
• Clusters: Auto-scaled, resilient, multi-tenant
• Data Integration: Universal secure and fast
• Interfaces: BI tools & REST API
Production Jobs
Powered by Apache Spark
+ Managed Services
Your Storage
Enterprise Security: Access Control, Auditing, Encryption
Today’s Data Reality
18
Siloed, Unstructured, Fast-Growing Data
Data Warehouses | Hadoop / Data Lakes | Cloud Storage
Databricks Just-in-Time Data Platform
Powered by Apache® Spark™
Integrated Enterprise Security Framework
Integrated Workspace
Notebooks Dashboards
Your Custom Spark
Apps
Production Jobs
BI Tools
Orchestrated Apache® Spark™ in the Cloud
Open Source + Managed Services
Your Storage
Cloud Storage | Data Warehouses | Data Lakes
OPTIONAL

Editor's Notes

  • #3: THIS IS NOT JUST SCI-FI
  • #6: BUT SPARK DOESN’T GET YOU ALL THE WAY THERE
  • #8: 1: Global publisher Elsevier – a team in the US and EU performs natural language processing on all their content to experiment with new product ideas. 2: Energy analytics company DNV GL – Databricks sped up analytics of IoT data from weather and grid sensors by 100x. 3: Financial service provider LendUp – Databricks enabled them to update their machine learning models daily instead of weekly.
  • #9: TRANSITION TO NEXT: A NEW USE CASE COMPANIES HAVE BEEN ASKING FOR
  • #10: THERE IS A LOT OF INTEREST IN DEEP LEARNING, THE MACHINE LEARNING TECHNIQUE THAT HAS ACHIEVED REMARKABLE RESULTS ** PICTURE IS BRUSSELS **
  • #11: IN RESPONSE TO THE GROWING DEMAND FOR DEEP LEARNING ON SPARK FROM THE COMMUNITY, DATABRICKS HAS BEEN WORKING HARD, AND IS PROUD TO ANNOUNCE TODAY THAT…
  • #12: In summary, Databricks' mission has always been to make big data simple. With the new developments we are taking the next step in this journey, DEMOCRATIZING DEEP LEARNING by making it simple for all
  • #15: WHY DID WE START DATABRICKS? THIS IS WHERE THE WORLD IS GOING
  • #17, #18, #20: The Databricks Enterprise Spark Platform. Built by the creators of Spark; contribute 75% of the code; trained 20,000+ users; largest number of customers deploying Spark. Spark is becoming the de facto standard for big data: started at UC Berkeley AMPLab in 2009; most active open source project in data processing (1000+ contributors).
    Today, the de facto technology standard for big data processing is Apache Spark. Spark is an open source data processing framework that was built for speed, ease of use, and scale. It has been adopted by thousands of companies for every type of workload because it solves some of the most pressing data challenges: the ability to process a wide variety of data types; unlimited scalability, for organizations with rapidly growing data; and flexibility to be customized for many use cases and with a wide variety of programming languages.
    Databricks was founded by the creators of Apache Spark. In addition to creating Spark, we continue to be the driving force behind Spark, contributing over 75% of the code in every release (10x more than any other vendor). We have trained 20K Spark users on Databricks (more users trained than any other vendor) and have the largest number of customers deploying Spark (>200, more than any other vendor).
    Additional info on Spark if required: Many of its benefits come from how it unifies critical data analytics capabilities such as SQL, machine learning and streaming in a single framework. This enables enterprises to achieve high performance computing at scale while simplifying their data processing infrastructure, replacing the difficult integration of many disparate tools with a single powerful yet simple alternative.