Recreating “The Clock”
with Machine Learning
and Web Scraping
Kirk Kaiser
@burningion
Dedicated hardware
for inference
Availability of large,
labeled datasets
Original gains with
CNN to detect objects
in images
There is (almost) nothing new in
universal Turing machines.
Why is machine learning
suddenly a thing?
DeepMind &
AlphaGo Zero
OpenAI & Dota 2
DeepFakes
& Nicolas Cage
Build, clean, and label
a dataset
Build a model
Deploy and Evaluate
How does it work?
???
How to build a deep learning
model
Actual model used for a project:
Build, clean, and label
a dataset
Build a model
Deploy and Evaluate
How does it work?
Real-world data is
messy and filled
with garbage
Models tend to be
brittle and opaque,
difficult to debug
How do you
measure
‘performance’?
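One common way to make 'performance' concrete for a detector is to report precision (how often it is right when it fires) and recall (how much of what is there it finds). A minimal sketch, assuming binary labels; the function name and the sample labels below are made up for illustration and are not from the talk.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correct detections
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # misses
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative labels only: 1 = positive example, 0 = negative example.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Which of the two numbers matters more depends on the product: a false alarm and a miss rarely cost the same.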
ML Development Lifecycle
is Different from Traditional
Development
Suddenly the machine you’re
running on matters.
(GPUs and TPUs necessary for
training quickly on massive
datasets.)
Driver updates can speed up your
code 300% (?!)
Training cost to reproduce GPT-2
text generation model (via
OpenGPT-2) from scratch ~$50,000
on GCP*
* https://blog.usejournal.com/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc
Training Computational
Needs Can Be Staggering
Computational cost of running
AlphaGo Zero estimated at
~$35 million*
* https://www.yuzeh.com/data/agz-cost.html
Cell phones are all adding
acceleration to native chips
Bandwidth and latency limitations
on cameras force inference
computation to the edge
Inference Gets Pushed Out to
the Edge
Gather, clean, and label data
Once a model is deployed, it can
then be used to bootstrap better
data
Datasets Become Integral
Part of Your Code
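One possible reading of "bootstrap better data" is to let the deployed model pre-label new items and only send its uncertain predictions to a human. The sketch below assumes that reading; the function name, the 0.9 threshold, and the file names are hypothetical.

```python
def split_by_confidence(predictions, threshold=0.9):
    """Split (item, label, confidence) tuples into accepted labels and a review queue."""
    accepted, review = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item, label))  # trust the model's label
        else:
            review.append(item)             # needs a human label
    return accepted, review


# Hypothetical model output for three scraped frames.
predictions = [
    ("frame_001.jpg", "clock", 0.97),
    ("frame_002.jpg", "clock", 0.55),
    ("frame_003.jpg", "no_clock", 0.93),
]
accepted, review = split_by_confidence(predictions)
print(len(accepted), "auto-labeled;", len(review), "sent for review")
```

Versioning the accepted labels alongside the code keeps a retrained model reproducible.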
Web Software
Development
is Changing
More Container Adoption
Especially in Larger
Environments
And Containers Usually Mean
Orchestration
Moving from managing and deploying
to individual servers to pushing code to
a “sea of infinite computation”
But there is a
tradeoff...
– etcd, the default database to
manage Kubernetes state,
recommends 5 (!) server
instances for durability
– Growing list of subpackages
to control networking layers,
load balancing (Istio, Envoy)
– Completely rethink
development, testing, and
deployment as Kubernetes
grows beyond a single dev
machine
Kubernetes
comes with extra
complexity
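The reason for five etcd instances comes down to majority quorum: a cluster keeps accepting writes only while more than half of its members are reachable. A small sketch of that arithmetic (the function names are illustrative):

```python
def quorum(members):
    """Smallest majority of a cluster with the given member count."""
    return members // 2 + 1


def fault_tolerance(members):
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


for n in (1, 3, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

With five members the cluster tolerates two failures, so one node can be down for maintenance while another fails unexpectedly.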
etcd
Docker
Kubernetes
Envoy
Minikube
Istio
containerd
Spinnaker
That complexity is the tradeoff in the move
from building software as a service
to software as a utility.
More Pieces to Manage and
Deploy
Kubeflow
ML toolkit for
Kubernetes
from Google
TensorRT Inference
Server
Custom Inference server
with Optimizations for
NVIDIA Hardware
Pachyderm
Version control for data,
and data pipelines
Kubernetes Native ML Tools
Kubeflow supports plenty of
options for your ML workflow
That’s a lot of
moving pieces!
How do I get
started?
Simplify first.
Run Kubernetes
locally, with GPU
acceleration.
@sterlingcrispin
+
NVIDIA Container Registry
https://ngc.nvidia.com
PyTorch & FFMPEG w/ Hardware
Acceleration Dockerfile
Example YAML
Manifest for
Kubernetes
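The manifest shown on the slide is not in this transcript. As a stand-in, here is a minimal sketch of a Pod that requests one GPU, built as a Python dict and printed as JSON, which kubectl accepts as well as YAML. It assumes the NVIDIA device plugin is installed so the cluster exposes the `nvidia.com/gpu` resource; the pod name and image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest that asks the scheduler for `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested as a resource limit, not a request.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical name and image; substitute your own.
manifest = gpu_pod_manifest("clock-inference", "example.invalid/pytorch-ffmpeg:latest")
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it should schedule the pod onto a node with a free GPU.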
Observability in systems becomes a
necessity to understand what’s
happening in software with so many
moving pieces.
For example, pipelines.
See complete units of work, as they
pass through your entire system,
especially useful in Pipelines with
multiple steps.
Add tags to be able to see specific
customers, organizations, and their
direct experience with your
systems.
Compress Distributed System
Complexity with Traces
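To show what a trace is, here is a toy, standard-library-only sketch: each pipeline step records a span that shares one trace ID and carries tags. It does not use any real tracing client; the `SPANS` list stands in for a tracing backend, and the step names, customer, and video ID are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

SPANS = []  # in-memory stand-in for a tracing backend


@contextmanager
def span(name, trace_id, **tags):
    """Time one step and record it as a span belonging to trace_id."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
            "tags": tags,
        })


def process_video(video_id, customer):
    """Run a hypothetical three-step pipeline as a single trace."""
    trace_id = uuid.uuid4().hex
    for step in ("download", "extract_frames", "detect_objects"):
        with span(step, trace_id, customer=customer, video_id=video_id):
            time.sleep(0.001)  # placeholder for real work
    return trace_id


trace_id = process_video("video-42", customer="acme")
for s in SPANS:
    print(s["name"], s["tags"]["customer"], f"{s['duration_s']:.4f}s")
```

Filtering the recorded spans by the `customer` tag gives one customer's view of the pipeline, which is the point of tagging.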
Individual Trace
See bottlenecks in CPU, disk space,
memory usage.
For GPU / TPU, see hardware level
metrics
Correlate with logs and traces to
isolate errors to software or
hardware level issues.
See History of Machine State
with Metrics
Add Observability with
DaemonSets
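A DaemonSet runs one copy of a pod on every node, which is why it suits monitoring agents. A minimal sketch of such a manifest, built as a Python dict and printed as JSON; the agent name and image are hypothetical, and a real agent would also need its own configuration and credentials.

```python
import json


def agent_daemonset(name, image):
    """Build a DaemonSet manifest that runs one agent pod per node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical agent name and image.
daemonset = agent_daemonset("observability-agent", "example.invalid/agent:latest")
print(json.dumps(daemonset, indent=2))
```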
Ingested logs show the history of
work on each individual system
component.
Correlate with traces and metrics to
isolate errors to library, software
dependency level
See Auditable Trail of Side
Effects with Logs
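Correlating logs with traces works when each log line carries the trace ID. A minimal sketch using only Python's standard `logging` module, writing JSON lines to an in-memory stream; the component name, messages, and trace ID are hypothetical.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including an optional trace_id."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        })


stream = io.StringIO()  # stands in for a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("frame-extractor")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output in our stream only

# `extra` attaches the trace ID to the log record.
logger.info("extracted 120 frames", extra={"trace_id": "abc123"})
logger.error("ffmpeg exited with status 1", extra={"trace_id": "abc123"})

entries = [json.loads(line) for line in stream.getvalue().splitlines()]
errors_for_trace = [
    e for e in entries if e["trace_id"] == "abc123" and e["level"] == "ERROR"
]
print(errors_for_trace)
```

With the ID in every line, a slow or failed trace leads straight to the log entries that explain it.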
– Either of these advances taken
in isolation adds to a
platform
– Taken together, we have a
chance to rethink the way
software behaves. (Images as
code and APIs.)
Object detection
alone opens up
new platforms
Bin-e, a self-sorting
recycling
bin
Dab & T-Pose
controlled lights
Jetson Nano
$99 GPU Accelerated
Machine
Thank you!
Datadog is hiring!
Code / Blog post:
https://dtdg.co/the-clock
