Building Scalable Data Ingestion
Michael Pearce
Senior Engineering Manager (DevOps) @ Peak AI
About Mike
01 Automation
02 Infrastructure
03 Security
About Peak
Helping customers overcome the barrier to entry for AI
Identify Opportunity, Quantify, Business Case
Kick Off, Onboarding, Live Service & Support
“Off the Shelf” Solutions
Customer AI
Demand AI
Supply AI
End to End AI Platform
Ingest
Unify, Transform, Model
Provide Insights
AWS Advanced Consulting Partner
ML Competency
Retail Competency
AI needs data.
Collecting it should be:
● Simple
● Effective
SSH File Transfer Protocol (often called Secure File Transfer Protocol)
SFTP
AWS Transfer for SFTP
Seamlessly migrate your file transfer workflows to AWS Transfer for SFTP
● Integrating existing authentication (IAM)
● Providing DNS routing (Route 53)
Nothing changes for your customers and partners, or their applications…
There is no infrastructure to buy and set up.
So why build your own!?
● Compatibility Issues
● Configurability
● Product Maturity
Must be:
● Scalable
● Highly Available
● Fault Tolerant
Building a custom solution
EC2
● Private Subnet
● Load Balancer
● Security Groups
● Route 53
Making it more secure and simple
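The "route by name, not by IP" point can be sketched as a Route 53 alias record that points a hostname at the load balancer. This is a minimal illustration, not the configuration used in the talk: the function name, hostname, load balancer DNS name and hosted zone ID are all hypothetical placeholders.

```python
def alias_change_batch(record_name, lb_dns_name, lb_hosted_zone_id):
    """Build a Route 53 ChangeBatch that aliases a hostname to a load balancer.

    All argument values are supplied by the caller; nothing here is taken
    from the original deck.
    """
    return {
        "Comment": "Route SFTP traffic by name rather than by IP address",
        "Changes": [
            {
                # UPSERT creates the record, or updates it if it already exists.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        # Hosted zone ID of the load balancer, not of your own zone.
                        "HostedZoneId": lb_hosted_zone_id,
                        "DNSName": lb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }
```

The resulting dictionary has the shape expected by the `ChangeBatch` argument of the boto3 Route 53 client's `change_resource_record_sets` call, which also needs the `HostedZoneId` of the zone that owns the record.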
S3
● Separate Servers and Data
○ Sync with S3
■ AZ replication
● IAM instance role and policy to manage access
OR
● Private Link S3 Endpoint
○ Traffic stays in AWS network
○ Access bucket privately without authentication when you access the bucket from a VPC that has an endpoint to S3 (this relies on a bucket policy that allows that endpoint)
● Versioning, Logging, etc.
○ Auditing, recover from accidental deletion
Added resilience, availability and security - Data
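One way to tie a bucket to a VPC endpoint is a bucket policy that denies any request not arriving through that endpoint, using the `aws:SourceVpce` condition key. The sketch below only builds the policy document; the function name, bucket name and endpoint ID are hypothetical, and it is not necessarily the policy used in the talk.

```python
import json


def vpce_only_bucket_policy(bucket_name, vpc_endpoint_id):
    """Build a bucket policy that denies access unless it comes via one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(vpce_only_bucket_policy("example-ingest-bucket", "vpce-1a2b3c4d"), indent=2))
```

A blanket deny like this also blocks console and other access from outside the VPC, so it would normally be tested on a non-critical bucket first.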
Auto Scaling Group
● Launch Template
○ AMI, Instance Type etc.
● Health check replacements
● Scaling Policies
○ Elastic
○ Cost effective
● AZ spread
Fault tolerance and availability - Servers
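The Auto Scaling Group settings listed above can be sketched as the request parameters for an Auto Scaling `create_auto_scaling_group` call. The helper name, default sizes and grace period are assumptions chosen for illustration; the deck does not state the actual values.

```python
def asg_request(name, launch_template_name, subnet_ids, target_group_arn,
                min_size=2, max_size=4):
    """Build create_auto_scaling_group parameters for an SFTP server fleet."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    if len(subnet_ids) < 2:
        # AZ spread needs subnets in at least two Availability Zones.
        raise ValueError("provide subnets in at least two Availability Zones")
    return {
        "AutoScalingGroupName": name,
        # The launch template carries the AMI, instance type, and so on.
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": min_size,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "TargetGroupARNs": [target_group_arn],
        # Use load balancer health checks so unhealthy instances are replaced.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 300,
    }
```

Scaling policies would be attached separately once the group exists; they are left out here to keep the sketch short.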
User Management
● Systems Manager (SSM)
● Secrets Manager
● Step Functions
Simple and Automatable
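As a rough sketch of how user creation could be automated with these services, the helper below prepares two requests: one for Secrets Manager `create_secret` to store a generated credential, and one for Systems Manager `send_command` to create the account on tagged instances via the `AWS-RunShellScript` document. The secret naming scheme, the tag used for targeting, the `nologin` shell path and the `useradd` command are assumptions, not details from the talk, and delivering the credential to the host is deliberately left out.

```python
import re
import secrets
import shlex

# Conservative pattern for Linux account names (an assumption, not a standard).
USERNAME_RE = re.compile(r"[a-z_][a-z0-9_-]{0,31}")


def new_sftp_user_requests(username, instance_name_tag):
    """Return (create_secret kwargs, send_command kwargs) for a new SFTP user."""
    if not USERNAME_RE.fullmatch(username):
        raise ValueError(f"invalid username: {username!r}")

    # Generate a random credential; it never appears in the shell command.
    password = secrets.token_urlsafe(24)
    secret_request = {
        "Name": f"sftp/{username}",  # hypothetical naming scheme
        "SecretString": password,
    }

    # The shell path below is an assumption; it differs between distributions.
    command = (
        "useradd --create-home --shell /usr/sbin/nologin "
        + shlex.quote(username)
    )
    command_request = {
        "DocumentName": "AWS-RunShellScript",
        "Targets": [{"Key": "tag:Name", "Values": [instance_name_tag]}],
        "Parameters": {"commands": [command]},
    }
    return secret_request, command_request
```

Validating the username before it reaches a shell command matters here, because the command runs as a privileged user on every targeted instance. A Step Functions state machine could run the two calls in order and handle retries.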
Disk Management
● CloudWatch Agent
○ CloudWatch Alarms
● Step Functions
● Still using EBS as an intermediary
○ Data duplication
○ Scaling out could mean big cost implications
● Or just use EFS!?
○ Or Multi Attach EBS Provisioned IOPS volumes (NEW)
Proactive monitoring and resolution
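The monitoring half of this slide can be sketched as two pieces of configuration: a CloudWatch agent fragment that collects disk usage, and the parameters for a CloudWatch `put_metric_alarm` call on the resulting `disk_used_percent` metric in the `CWAgent` namespace. The 80% threshold, the periods, and the helper names are assumptions; the dimensions must match whatever the agent actually publishes, so they are passed in by the caller.

```python
def disk_agent_config(mount_point="/"):
    """CloudWatch agent config fragment that reports disk used_percent."""
    return {
        "metrics": {
            "metrics_collected": {
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": [mount_point],
                    "metrics_collection_interval": 60,
                }
            }
        }
    }


def disk_alarm_request(alarm_name, dimensions, action_arn, threshold_percent=80.0):
    """Build put_metric_alarm parameters that fire when the disk is nearly full."""
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in (0, 100]")
    return {
        "AlarmName": alarm_name,
        "Namespace": "CWAgent",
        "MetricName": "disk_used_percent",
        # Dimensions must match the published metric exactly.
        "Dimensions": [
            {"Name": key, "Value": value}
            for key, value in sorted(dimensions.items())
        ],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # For example an SNS topic that kicks off the resize workflow.
        "AlarmActions": [action_arn],
    }
```

How the alarm reaches the Step Functions workflow (for example through a notification topic or an event rule) is a design choice the slides do not spell out.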
Find out more
medium.com/peak-product
peak.ai/hub/blogs
github.com/peak-ai
Engineering Opportunities
Open Positions:
● Head of Development
● UX Designer
● Lead Product Manager
Coming Soon:
● Senior DevOps
● Support Engineer
Manchester, UK
https://peak.ai/company/careers
For all our latest roles, please
head to our Peak careers page!



Editor's Notes

  • #2: My name is Michael, Mike for short. I’m a DevOps Engineer @ Peak AI. I’m based in Manchester and I work to enable developers and data scientists to be autonomous and self sufficient, empowering them to do great things with data!
  • #3: I’m passionate about building great automation, infrastructure and security with Linux and AWS.
  • #4: Barriers to entry: building systems and hiring ML and data science (DS) people. We also help clients identify opportunities in their data. The platform covers everything from connecting to data sources to ingest data, through unifying that data, exploring it, transforming it and building ML models to extract even more insight from it, to providing valuable output in the form of visual dashboards, API endpoints or even just raw data. ‘Off the shelf’ solutions speed up putting AI into the enterprise, right the way through the business. Last but not least: AWS consulting partner status, with ML and Retail competencies.
  • #5: Especially from a client perspective. There are a few different options using AIS (ingest agent, signed URL, webhook), but there was something else customers kept asking for...
  • #6: One tried and tested method of data transfer, still very popular: SFTP (Secure File Transfer Protocol). Hands up who has heard of SFTP? Keep your hands up if you’ve used it. Keep your hands up if you’ve had some kind of hard time with SFTP: managing servers, a server dying, a disk failing or filling up, managing users, security.
  • #7: AWS Transfer for SFTP: seamlessly migrate your file transfer workflows to AWS, integrating existing auth (IAM) and DNS routing (Route 53). Nothing changes for your customers and partners, or their applications. There is no infrastructure to buy and set up. With your data in S3, you can use it with AWS services for processing, analytics, machine learning, and archiving. Getting started with AWS Transfer for SFTP (AWS SFTP) is easy.
  • #8: Compatibility issues: mainly Windows! Transfer endurance (temp file names for part upload) and preserving timestamps. Configurability: a gotcha with VPC is that you need to use a Network LB. Only login with an access key is supported, which means there is simply no access via username and password. General product maturity: classic AWS development, just watch this space!
  • #9: A simple EC2 instance with an SFTP service running would be good, but we want it to be great! That means following some architecture best practices (and doing anything to avoid being woken up by PagerDuty at 3am to reboot a server). How do AWS do it? Could we do that, plus extra customisation? Moving on to how we developed it, as we went along...
  • #10: Put the instance in a private subnet. Add a load balancer (it also helps absorb DDoS attacks!). Could even use CloudFront. Lock it down using security groups. Route traffic using a domain name (not an IP address).
  • #11: Why we use S3: we followed the AWS Transfer example; quick transfer, quick setup; multi-AZ. Why not EFS? EBS Provisioned IOPS volumes now support Multi-Attach to EC2!!
  • #14: We didn’t consider EBS. We’re not predicting much load yet. As the disk fills up, so does S3 (duplication). If you scale out, all the attached EBS volumes need to be big enough to sync all the data! EFS would provide multi-AZ, one shared disk, and elasticity, with no need for complicated resizing. Why we use S3: we followed the AWS Transfer example; quick transfer, quick setup; multi-AZ.
  • #16: EFS alternative - no need for EBS, Resize Step function or S3
  • #18: Read some of our tech blogs on Medium or the Website
  • #19: Open sourced projects on Github
  • #21: Any Questions?