Launching a Data Platform on Snowflake
Using “old skills” in a new world
Simon Sleight
Data Guy
• People
• Technology
• Pricing
A Seasoned Professional – My Data Journey
Old World Data Rock Star
Me
Evolution
rikkyal © Creative Market
What skills and attributes are required in a data team?
Evolution
David Benes © 123RF.com
Running a platform requires people with the right skills
1. There is complexity to manage
2. Agile working environment
3. Data Rock Stars are rare
4. Data modelling and SQL are invaluable
What technology enabled us to launch a data platform?
A step change in the evolution of cloud computing.
1. Decoupling storage from compute
Pay for compute only when needed (scale up, down, out)
Pay for storage separately (very cheap)
2. Low barrier to entry
Extremely easy to set up
Very low price (no CAPEX)
3. Same data used by everybody: no impact, since each team has its own compute
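The "pay for compute only when needed" point can be made concrete with a small sketch. This assumes the commonly published Snowflake pattern that each warehouse size doubles the credit rate (X-Small = 1 credit/hour) and that billing is per second with a 60-second minimum; treat the numbers as illustrative, not a pricing reference.

```python
# Illustrative sketch of per-second, pay-as-you-go compute billing.
# Assumption: each warehouse size doubles the credit rate (X-Small = 1
# credit/hour) and there is a 60-second minimum charge per resume.

SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large"]

def credits_per_hour(size: str) -> int:
    # Rate doubles with each step up in warehouse size.
    return 2 ** SIZES.index(size)

def credits_used(size: str, seconds: int) -> float:
    billable = max(seconds, 60)          # 60-second minimum, then per-second
    return credits_per_hour(size) * billable / 3600

# A Medium warehouse running for 15 minutes:
print(credits_used("Medium", 15 * 60))  # 4 credits/hour * 0.25 h = 1.0
```

Because storage is billed separately and cheaply, the only lever that matters day to day is how long, and at what size, compute actually runs.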
Snowflake Architecture Overview
doyouevendata.com
Let me explain…
Data is Extracted, Loaded, and Transformed.
How your business refers to the terms that you report on is called a Semantic Layer.
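At its simplest, a semantic layer is a mapping from raw warehouse column names to the terms the business actually uses. A minimal Python sketch (all table and column names here are hypothetical):

```python
# Minimal semantic-layer sketch: map raw column names to business terms
# so every report uses the same vocabulary. Names are illustrative only.

SEMANTIC_LAYER = {
    "cust_acct_id": "Customer",
    "ord_ts":       "Order Date",
    "net_amt_gbp":  "Net Sales (GBP)",
}

def to_business_terms(row: dict) -> dict:
    """Rename raw keys to the business-friendly terms defined above."""
    return {SEMANTIC_LAYER.get(k, k): v for k, v in row.items()}

raw = {"cust_acct_id": 42, "net_amt_gbp": 19.99}
print(to_business_terms(raw))  # {'Customer': 42, 'Net Sales (GBP)': 19.99}
```

In practice this lives as a set of views over the loaded tables, but the principle is the same: business logic is codified once, centrally.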
Our platform is a wrapper service based on Snowflake
Price of Entry
• KETL can now offer cloud services with little risk
• No software licensing costs
• No hardware costs
• Rapid time to deployment
• We can use existing team skills to carry out Extract Load Transform
• We can iterate quickly and let designs evolve
• Time to value for clients is massively reduced
https://www.snowflake.com/zero-to-snowflake/zero-to-snowflake-in-90-minutes-bristol/
KETL
30 Queen Charlotte St
Bristol
BS1 4JH
+44 (0)117 251 0064
www.ketl.co.uk
info@ketl.co.uk
@KETL_BI
For more information on what we do please contact Helen Woodcock.
We host regular workshops

Editor's Notes

  • #2: This talk is about how KETL were able to launch a data platform. There are three key ingredients: People, Technology, and Pricing.
  • #3: People. This talk is about KETL's experience of developing a cloud platform; the three key factors in its development are people, technology, and price. I'd like to share my experiences to date to give you some context. We are all products of our experiences, good or bad, and we can only improve going forward. As our marketing team would say, "I am a seasoned professional": these grey hairs come from experience, and I wish I had had cloud when I started. When I started work we had to physically build the servers, compile operating systems, fine-tune software, and run business processes as best we could. The environment was fragile because budgetary and technical constraints made true resilience expensive and difficult to obtain. The financial and emotional investment in systems stymied free-form creative development; changes to any code often came with downtime, arduous deployments, and occasionally hardware upgrades too. In the old world, as sharp and keen as I think I was, I was still perceived as a bottleneck by the business, and I was also the custodian and gatekeeper of the data. We all talk about cloud and the new world, but many of the people and businesses I meet are still carrying lots of technical debt, frustrations, and fears, and I completely understand where they are coming from. CALL CENTRE: I had to replace three spreadsheets and an Access database for a call centre. The data looked mostly complete and superficially fit for purpose, but when I sat with the team taking calls I began to understand the stress of using Excel and Access while on the phone to a client. Once you realise that technology is a service that should enable, and grasp the power of the right data at the right time, you can better understand my personal goal of being an enabler. That is what we want for our platform too. I'm not a data rock star yet.
  • #4: The data platform is a service that has to integrate with different organisations and data sources, and 70% of interactions with the outside world involve dealing with people. Good service requires good people; people are key. For me, good data people display some key personal qualities: accuracy (don't send me a CV with missing full stops and typos), consistency and reliability, and evidence of working with data problems (the number of rows does not always determine the complexity of the problem). The "old skills" are still valuable today. Environments are built from scripts: we can deploy identical client server environments using one script called with a different variable, plus vanilla configuration of cloud servers and data services. Coding key business functions into reusable working patterns demands systematic thought processes; we still have to deliver a chain of events even if it now lasts seconds rather than hours. SQL skills matter: it matters little which database you learnt SQL on, as the fundamentals should give an employee an understanding of data warehousing concepts. I look for experience of Kimball or Data Vault schema design (even if not directly stated). It still requires humans to interpret business processes; you can't build a data service without a data team. Why is this piece of data here? Why is there missing data? What does this piece of data relate to in the business? What does "people friendly" mean? How do we get to the business of what a customer wants? Part of our platform has to deliver a semantic layer to the client: this is where we describe and codify data values in business terms. The outcome we are trying to achieve is consistency in business reporting and a centralisation of business logic, and we are only able to code this layer if we understand and interact with the business users and match code to meaning. An example: what is a customer? Is it a credit card? An email list member? A loyalty card? A gift recipient? Do they expire? How are customers counted? How are they uniquely identifiable? Other typical scenarios are summarising business activity markers into sales stages or grouping products into categories.
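The "what is a customer, and how are customers counted?" question can be made concrete: the count depends entirely on which identity rule the business agrees on. A hypothetical sketch with two competing rules:

```python
# Counting "customers" depends on the agreed identity rule.
# Both rules and all records below are hypothetical illustrations.

records = [
    {"email": "Ann@example.com", "loyalty_card": "L1"},
    {"email": "ann@example.com", "loyalty_card": "L2"},
    {"email": "bob@example.com", "loyalty_card": "L3"},
]

def count_customers(rows, key):
    # Distinct count under whichever identity rule `key` encodes.
    return len({key(r) for r in rows})

# Rule 1: a customer is a unique (case-insensitive) email address -> 2
print(count_customers(records, lambda r: r["email"].lower()))
# Rule 2: a customer is a unique loyalty card -> 3
print(count_customers(records, lambda r: r["loyalty_card"]))
```

Same data, two different "customer" totals: exactly why the semantic layer has to be agreed with the business before it is codified.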
  • #5: In my experience there is still complexity to manage: client technical debt; access to information and third-party systems; multiple data sources and multiple vendors; provision of data ingestion access points and EC2 servers; reporting requirements. We use agile methodologies to de-risk the complexity and make tasks manageable. Time to value for our clients has improved dramatically with Snowflake: we recently implemented a proof of concept (end-to-end, from extract to load) in four days, where previously it would have taken about four months of elapsed time for hardware, VPCs, software, licences, schema pre-design, etc. The proof of concept uses production data and replicates production functionality; the only limitation was the scope of required outputs. We used to have a long design stage prior to data load; now we spend the time exploring the real data and adapting the design as we go, and can typically add fields and new sources within a sprint (two weeks). New skills can be taught and learned, and many skills are transferable: there are so many tutorials and great e-learning courses online, and we are running a Zero to Snowflake course. Like any coder who feels they can do pretty much anything, there is no substitute for hands-on experience; necessity is the mother of invention, and our platform was designed to make the steps we do for all clients repeatable and automated. Data Rock Stars are rare: teams with a combination of skills, differences of opinion, and different backgrounds and domain experience provide the best results, with no single point of failure and nothing too esoteric; domain experience helps speed up insight. Data modelling and SQL are key to the product: Kimball star schemas and Data Vault, understanding join logic, and SQL load scripts. Many of you will have these attributes, so starting a journey in Snowflake will not be as hard as you may think.
  • #6: Technology is now meeting our expectations. Legacy data pipelines had long batch windows on maxed-out, limited hardware: fragile, high maintenance, and difficult to change. Initial cloud offerings reduced capital expenditure on equipment but still required lots of system administrators; recent improvements in shared services and containerisation (Docker) have helped. Key elements of the Snowflake technology de-risked the investment, and the benefits are easily replicated and shared for different client types: it enables fail-fast development and querying; lots of different teams can work on the same production data set; the same data can be split between different server groups with no impact across teams; data can be loaded and unloaded without impacting running queries; zero-copy clones with separate compute; time travel. There are other MPP offerings; this one has been in productive use since 2016. We are running a hands-on Zero to Snowflake session on November 6th; details at the end.
  • #7: Technology: this architecture diagram illustrates the core concepts. Every user has access to the same data (subject to permissions), and data is stored once; teams can use "clones" of production data to carry out development on, and different teams can use their own virtual warehouses (compute resources). The loading warehouse is generally about parallel file ingestion (particularly for legacy sources): CSV is the quickest; file sizes should be about the same, 10MB to 100MB compressed maximum; the NUMBER OF FILES is key, since 4 cores / 8 threads means eight files in parallel, one file per thread. The ad-hoc analytics warehouse is scaled for query-time responsiveness; with multiple users, more clusters resolve concurrency and prevent queued queries. Development can scale the warehouse up and down to test different functions; for the proof of concept, a single small server was adequate for view development. We are running a hands-on Zero to Snowflake session on November 6th; bring a laptop.
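The "one file per thread" point above can be sketched with a thread pool; `load_file` here is a hypothetical stand-in for a real PUT/COPY step.

```python
# Sketch of parallel file ingestion: with 8 worker threads, up to 8 files
# load concurrently, so splitting input into many similar-sized files
# keeps every thread busy. load_file() is a placeholder, not a real loader.
from concurrent.futures import ThreadPoolExecutor

def load_file(path: str) -> str:
    # Placeholder: a real loader would PUT/COPY this file into the warehouse.
    return f"loaded {path}"

files = [f"export_{i:03d}.csv.gz" for i in range(16)]

with ThreadPoolExecutor(max_workers=8) as pool:   # 4 cores / 8 threads
    results = list(pool.map(load_file, files))

print(len(results))  # 16
```

With 16 similar-sized files and 8 threads, each thread handles roughly two files; one huge file would leave seven threads idle, which is why file count and uniform sizing matter.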
  • #8: Technology: each virtual warehouse can have from 1 server per cluster (X-Small) to 128 servers per cluster (4X-Large), and each virtual warehouse can be scaled out into identical clusters (up to 10). Cluster scale-out is automated, and scale-up/down is a single command. The result cache is persisted for 24 hours, reset each time the results are accessed, for up to 31 days. You pay only for the compute used: as a company we do not have to predict demand but are able to respond to it, and we can set limits and alerts around usage so that we can be pro-active about running costs.
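The limits-and-alerts point can be sketched as a simple threshold check; the 80%/100% levels below are hypothetical, loosely modelled on notify/suspend-style resource monitoring, not any specific product setting.

```python
# Sketch of pro-active cost control: flag when credit usage approaches
# or crosses an agreed limit. Thresholds are illustrative assumptions.

def usage_alert(credits_used: float, monthly_limit: float) -> str:
    pct = credits_used / monthly_limit
    if pct >= 1.0:
        return "SUSPEND"   # stop further spend at 100% of the limit
    if pct >= 0.8:
        return "ALERT"     # warn the team at 80% of the limit
    return "OK"

print(usage_alert(50, 100))   # OK
print(usage_alert(85, 100))   # ALERT
print(usage_alert(100, 100))  # SUSPEND
```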
  • #9: Technology: the old ETL processes, where data took ages to load in series and had to be manipulated outside the database, are over. Load (and reload) all the data, then transform in situ: the Extract Load Transform process. Transformation in situ does not require tooling, just a good understanding of SQL. Snowflake allows parallel loading of data files (streams and other feeds); the more nodes you have in the virtual warehouse, the more threads you have to load files. The query result cache is part of the Snowflake service and returns previously calculated results. Disk is slow but cheap; the SSD cache is proportional to the virtual warehouse's node count and disappears on ramp-down. Tuning is about ramping up and down: ramp up for parallel ingestion, scale out for concurrency with many BI report users.
  • #10: Snowflake is an enabler. We can ingest data very easily (Bash, Python, connectors, SQL scripts and S3), and we can rapidly prototype and deploy data models: develop views in the design stage and share data with customers. As mentioned earlier, we are able to implement proof-of-concept warehouses end-to-end, going from four months to four days. Data tables can be refined through continual iterations; the scalability and speed allow experimentation and measurement before the outcome has to be fixed, so the investigation of data becomes a doing thing rather than a thinking thing, and we find issues quicker. There is flexibility in modelling: we can use our "old skills" in designing the data model and generating the semantic layer for the client. Notes: AWS Lambda has a 15-minute limit; EC2-orchestrated scripts via CRON / Python / Bash; Apache Airflow.
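The orchestration options mentioned (CRON-driven Python/Bash scripts, Airflow) all reduce to the same shape: chain Extract, Load, and Transform steps in order. A minimal sketch with placeholder step bodies:

```python
# Minimal orchestration sketch: chain Extract -> Load -> Transform, the
# pattern a CRON-driven Python script (or an Airflow DAG) would run.
# All step bodies are placeholders.

def extract():   return ["file1.csv", "file2.csv"]   # e.g. pull to S3
def load(files): return len(files)                   # e.g. parallel COPY
def transform(): return "views refreshed"            # e.g. in-situ SQL

def pipeline():
    files = extract()
    loaded = load(files)
    status = transform()
    return f"{loaded} files loaded; {status}"

print(pipeline())  # 2 files loaded; views refreshed
```

In a scheduler, each step becomes a task with its own retries and logging; the 15-minute Lambda limit is one reason long-running loads tend to live on EC2 or Airflow instead.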
  • #12: Come and meet our Rock Star; you learn from doing. Other notes: forecasting uses auto.arima, which searches through possible ARIMA models to find the best fit. ARIMA is Auto-Regressive Integrated Moving Average.