The Partner to your Cloud
Transformation Journey
Agenda
1. Introduction - Why do we need to think beyond Data Lakes?
2. Driving automation and insights utilizing AWS data services
3. Best Practices for Data Architecture
4. Implementation Case Studies and Outcomes Delivered by Searce
5. Q&A
Speaker
Bhuvaneshwaran R.
Database Architect
Searce
Driving Automation and Insights utilizing AWS Data Services
Data Generated vs Analyzed
Know your data
Structured, Unstructured, Semi-structured
● CRM
● ERP
● SQL Databases
● Log Files
● Image files
● Calls
● Mobile Data
● IoT Sensors
● Social media data
Define the pipeline
● Batch
● Streaming
Data Life Cycle in AWS Data Platform
● Data Ingestion (get your data into S3 securely): Kinesis, MSK, SFTP, DMS, Snowball, Direct Connect
● Data Catalog (access and search metadata): DynamoDB, Elasticsearch, Glue Catalog
● Process & Analytics (get insights from your data): Glue, EMR, Redshift, Athena, QuickSight
Data Lake or Data Warehouse
● PROCESSING: Data Lake uses schema on read; Data Warehouse uses schema on write.
● DATA: Data Lake holds structured, semi-structured, unstructured, and raw data; Data Warehouse holds structured and processed data.
● STORAGE: Data Lake is designed for low-cost storage; Data Warehouse is expensive for large data volumes.
● DATA PROCESSING: Data Lake helps with fast ingestion of new data; in a Data Warehouse it is time-consuming to introduce new content.
● USERS: Data Lake serves data scientists, etc.; Data Warehouse serves business professionals.
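To make the PROCESSING comparison concrete, here is a minimal Python sketch that is not taken from the slides. It contrasts validating records before storing them (schema on write) with storing raw records and applying the schema only when reading (schema on read). The field names `order_id` and `amount`, the sample events, and both function names are assumptions made for illustration.

```python
import json

# Hypothetical raw events as they might land in a data lake: no schema enforced.
RAW_EVENTS = [
    '{"order_id": 1, "amount": "19.99"}',
    '{"order_id": 2, "amount": "5.00", "coupon": "SPRING"}',
    '{"order_id": "3"}',  # no amount: kept by the lake, rejected by the warehouse
]


def write_to_warehouse(raw_lines):
    """Schema on write: validate and cast before storing; bad rows are rejected up front."""
    table, rejected = [], []
    for line in raw_lines:
        rec = json.loads(line)
        try:
            table.append({"order_id": int(rec["order_id"]),
                          "amount": float(rec["amount"])})
        except (KeyError, ValueError):
            rejected.append(line)
    return table, rejected


def read_from_lake(raw_lines):
    """Schema on read: everything was stored as-is; the schema is applied at query time."""
    for line in raw_lines:
        rec = json.loads(line)
        yield {
            "order_id": int(rec["order_id"]),
            "amount": float(rec["amount"]) if "amount" in rec else None,
        }


warehouse_rows, rejected_rows = write_to_warehouse(RAW_EVENTS)
lake_rows = list(read_from_lake(RAW_EVENTS))
```

In this sketch the warehouse path loses the incomplete record at load time, while the lake path keeps it and leaves the decision about missing values to the reader of the data.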
Transformation
Extract, Transform & Load
Extract, Load and Transform
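The difference between the two orderings can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not the presenters' implementation: an in-memory SQLite database stands in for the target store, and the table names (`sales_etl`, `sales_raw`, `sales_elt`), the sample rows, and the `clean` helper are all hypothetical.

```python
import sqlite3

# Hypothetical source extract: amounts arrive as untrimmed text, one is not numeric.
SOURCE_ROWS = [("alice", " 10 "), ("bob", "25"), ("carol", "n/a")]


def clean(value):
    """Transformation step: trim and cast to int, or None when not purely digits."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def etl(conn):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount INTEGER)")
    cleaned = [(name, clean(raw)) for name, raw in SOURCE_ROWS]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)",
                     [row for row in cleaned if row[1] is not None])


def elt(conn):
    """ELT: load the raw rows first, then transform inside the target engine with SQL."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", SOURCE_ROWS)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT customer, CAST(TRIM(amount) AS INTEGER) AS amount "
        "FROM sales_raw "
        "WHERE TRIM(amount) != '' AND TRIM(amount) NOT GLOB '*[^0-9]*'"
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute(
    "SELECT customer, amount FROM sales_etl ORDER BY customer").fetchall()
elt_rows = conn.execute(
    "SELECT customer, amount FROM sales_elt ORDER BY customer").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM sales_raw").fetchone()[0]
```

Both paths end with the same cleaned rows; the ELT path additionally keeps the untouched raw table in the target, which is the property that makes it a natural fit for a data lake.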
AWS Data Lake Infrastructure
1. Highly durable and unlimited storage
2. Support for open file formats
3. Easy integration with other AWS services
4. Secure, compliant, and auditable
5. Decoupled storage and compute
Reference Architecture - Building a Data Lake in AWS
ETL for Analytics
● RDS - Source
● Glue - ETL
● S3 - Storage
● Athena - Interactive query service
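One detail that often matters in an RDS, Glue, S3, Athena flow like the one listed above is how the ETL output is laid out in S3. The sketch below assumes a Hive-style `key=value` partition layout, which Athena and Glue can recognize as partitions; the slides do not say which layout was used. The table name `orders`, the file name, the partition columns, and both helper functions are hypothetical, and the partition columns are assumed to be declared as strings.

```python
from datetime import date


def partitioned_key(table: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned object key (year=/month=/day=) for an ETL output file."""
    return (f"{table}/year={day.year:04d}/month={day.month:02d}/"
            f"day={day.day:02d}/{filename}")


def partition_filter_query(table: str, day: date) -> str:
    """Build SQL text that filters on the partition columns (assumed to be strings)."""
    return (f"SELECT * FROM {table} "
            f"WHERE year = '{day.year:04d}' AND month = '{day.month:02d}' "
            f"AND day = '{day.day:02d}'")


example_day = date(2020, 5, 7)
example_key = partitioned_key("orders", example_day, "part-0000.parquet")
example_sql = partition_filter_query("orders", example_day)
```

Filtering on partition columns lets the query engine skip objects under non-matching prefixes, which is one common way to keep interactive queries over S3 inexpensive.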
Streaming Data Solutions with Amazon Kinesis
Components:
● Kinesis Data Streams
● Kinesis Data Firehose
● Kinesis Data Analytics
● Lambda
● DynamoDB
● SNS
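As an illustration of the Lambda component in this list, the following Python sketch shows a handler that decodes a batch of Kinesis records. Lambda delivers each record's payload base64-encoded under `Records[].kinesis.data`; everything else here is an assumption: the JSON payload fields `order_id` and `amount`, the alert threshold, and the `fake_event` test helper. The slide does not say how DynamoDB and SNS are wired in, so the sketch only returns the values that such a handler might store or publish.

```python
import base64
import json

ALERT_THRESHOLD = 1000  # hypothetical threshold for flagging large orders


def handler(event, context=None):
    """Decode Kinesis records delivered to Lambda and summarize a hypothetical `amount` field."""
    total = 0.0
    alerts = []
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        total += payload["amount"]
        if payload["amount"] > ALERT_THRESHOLD:
            alerts.append(payload["order_id"])
    return {"total": total, "alerts": alerts}


def fake_event(payloads):
    """Build a minimal Lambda-style Kinesis event for local testing (only the data field)."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(p).encode("utf-8")).decode("ascii")}}
            for p in payloads
        ]
    }


result = handler(fake_event([
    {"order_id": "a", "amount": 10.0},
    {"order_id": "b", "amount": 1500.0},
]))
```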
Streaming Relational Database Solution - CDC
Components:
● RDS MySQL
● Debezium Connector
● Amazon MSK
● S3
● Elasticsearch
● EMR
● Redshift
● Consumer App
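A consumer app in a CDC pipeline like this typically replays change events against its own copy of the table. The sketch below is a simplified assumption of what that could look like, not the presenters' code. It relies on the Debezium change event envelope, which carries an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) with `before` and `after` row images; the events here omit the schema and source metadata, and the `id` primary key and the sample rows are hypothetical.

```python
def apply_change(table: dict, event: dict, key: str = "id") -> None:
    """Apply one simplified Debezium-style change event to an in-memory replica."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read, and update all carry the new row image in `after`.
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Delete carries the removed row in `before`; `after` is null.
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "rose"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "lily"}},
    {"op": "u", "before": {"id": 1, "name": "rose"}, "after": {"id": 1, "name": "red rose"}},
    {"op": "d", "before": {"id": 2, "name": "lily"}, "after": None},
]

replica = {}
for change in events:
    apply_change(replica, change)
```

Because each event is applied by primary key, replaying the same ordered stream into S3, Elasticsearch, or Redshift sinks would converge on the same final state as the source table.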
Data Lake Lifecycle
1. Extract
2. Transform & Process
3. Data Lake (Storage)
4. Visualization
5. AI/ML
6. Security
Data Governance
“Data governance is the formal orchestration of people, processes, and technology that enables an
organization to leverage data as an enterprise asset.”
Data Governance on AWS:
● De-Identified Data lake
● Data Matching
● Data Transformation
● Data Catalog
● Analytics and Data processing
● Monitoring
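The slides do not describe how the de-identified data lake is built. One common approach, shown here purely as an assumption, is to replace direct identifiers with keyed hashes before data lands in the lake. The field list `PII_FIELDS`, the sample record, and the demo secret are hypothetical; in practice the key would come from a managed secret store rather than source code.

```python
import hashlib
import hmac

PII_FIELDS = {"email", "phone"}  # hypothetical list of direct identifiers


def deidentify(record: dict, secret: bytes) -> dict:
    """Replace PII fields with HMAC-SHA256 tokens; leave other fields unchanged."""
    out = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            out[field] = hmac.new(secret, str(value).encode("utf-8"),
                                  hashlib.sha256).hexdigest()
        else:
            out[field] = value
    return out


DEMO_SECRET = b"demo-secret"  # placeholder only, never hard-code a real key
original = {"customer_id": 42, "email": "a@example.com", "phone": "555-0100"}
masked = deidentify(original, DEMO_SECRET)
masked_again = deidentify(original, DEMO_SECRET)
```

Because the same input and key always produce the same token, records from different sources can still be matched on the tokenized field without exposing the underlying value.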
Maintain the Data Catalog
Glue:
● Crawler
● Metadata
● Versioning
● Custom classifiers
Data Governance Reference Architecture
Lake Formation
Lake Formation - Security
Where are you in your Data Journey?
Ecommerce or Retail - Real-time Analytics
● Real-time clickstream data
● Use ML for a recommendation engine
Services:
1. Kinesis
2. SageMaker
3. DynamoDB
Digital Native already on Cloud - Cost Optimization
● Move complex ETL workloads to big data clusters
● Move large volumes of cold data to the data lake
Services:
1. Redshift
2. EMR
3. Glue
4. S3
5. Athena
6. Redshift Spectrum
Traditional Enterprise or DNB - DW/DL - Security
● Move your Glue catalog and Athena to Lake Formation
● Control database- and storage-level access with AWS Lake Formation
Services:
1. Lake Formation
2. IAM
3. KMS
Speaker
Wei Chung Low
Sr. Specialist Partner
Solution Architect
Big Data and Analytics
Amazon Web Services
Best Practices for Data Architecture
Case Studies | AWS | FlowerAura
FlowerAura is an online flower store that delivers the best quality fresh-cut flowers in more than 220 cities across India using a strong affiliate network and channel stores.
Industry
E-Commerce
Workload
Amazon S3, Redshift, DynamoDB, QuickSight, SageMaker, AWS Glue/Kinesis
Challenge
Needed reliable Data Lake solutions to:
● Collect and process POS as well as website/mobile application data
● Support analytics-based services for a deeper understanding of purchasing behaviors
● Help customers/visitors make better decisions while purchasing
Solution
● Built a Data Lake that collected real-time data from the existing data sources and used AWS Glue to perform ETL on the collected data
● Trained a model on the transformed data using SageMaker, which provided recommendations to customers based on their browsing and purchasing history
Business Impact
● A single source of truth with all data sources in one repository
● The recommendation engine presented users with item choices based on their selections and the list of available items.
● It led to upsell, higher offtake, greater retention of existing customers, and lower advertising costs.
Case Studies | AWS | Britannia
Britannia Industries Limited is one of the oldest existing Indian food-products corporations.
Industry
FMCG
Workload
S3, Redshift, VPC, EC2, SageMaker
Challenge
● Manual processes for ETL and consolidating data took 3-4 days, and scale was a big bottleneck.
● Needed to support fulfillment for 18,000+ stores across India by analyzing customer purchase behavior, to help Britannia identify demand-supply patterns and keep up to date with SKUs
Solution
● Provisioned infrastructure on AWS: VPC, ETL instances, Redshift, a processing server, and a server to host Tableau
● Established a site-to-site VPN
● Initiated a one-time dump of the data from the on-premises SQL Server to S3
● Authored ETL jobs for loading from multiple on-premises data sources to AWS S3 and helped the Emisha team connect Tableau Server to Redshift
● Deployed and served an ML model
Business Impact
The existing manual process of consolidating data for analytics and predictions on customers' buying patterns took 3-4 days. It is now replaced by a real-time dashboard, making tracking and management easy.
Generative Designs: Data meets AI, meets creativity
For the world’s 5th largest watchmaker, Searce created a deep learning model capable of generating watch designs based on input parameters.
These parameters included:
● Band Color
● Dial Color
● Gender
● Dial Size
● Band Material
Q&A
Please type your questions in the chat window
Thank You





Delivering business insights and automation utilizing aws data services
