Confidential
Zaloni Data Lake Architecture for Data-Driven Decision
Maksym Demianovskyi
Denys Skalskyi (co-author)
Agenda
➢ Data evolution
➢ Why is data so important?
➢ Data-driven decision process
➢ Zaloni Data Lake architecture
Main stages of information evolution
1. The first revolution is associated with the invention of writing, which led to a giant qualitative and quantitative leap: it
became possible to pass knowledge from generation to generation.
2. The second (mid-15th century) was caused by the invention of printing, which radically changed society,
culture, and the organization of activities.
3. The third (end of the 19th century) was driven by the harnessing of electricity, which gave rise to the telegraph, the
telephone, and the radio and allowed information to be transmitted rapidly and accumulated in large volumes.
4. The fourth, the information explosion (1970s), followed the invention of microprocessor technology and the
appearance of the personal computer. Computers, computer networks, and data transmission systems (information
communications) are built on microprocessors and integrated circuits.
The amount of data produced by humanity
Consider, for instance, that all the information humanity produced before 2003 is less
than the amount of data produced in a single day in 2023.
Estimates of how much data existed by the end of 2022 vary: 97 zettabytes by one count
and 94 zettabytes by another (Source: Bernard Marr & Co.). 1 ZB is the equivalent of
1,000 exabytes.
Do you know how much 181 zettabytes is? Let’s put it this
way: if you tried downloading it by yourself, it would take you
about two billion years!
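The two-billion-year figure depends on the connection speed assumed, which the slide does not state. The sketch below works through the arithmetic under decimal (SI) units; the 100 Mbit/s link is a hypothetical value chosen for illustration, and the helper names are not from the talk.

```python
# Back-of-the-envelope check of the download-time claim.
# The link speed is an assumption; the slide does not state one.

ZETTABYTE = 10**21              # bytes, decimal (SI) definition
EXABYTE = 10**18                # bytes; 1 ZB = 1,000 EB
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def download_years(total_bytes: float, bytes_per_second: float) -> float:
    """Years needed to transfer total_bytes at a constant rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


def required_rate(total_bytes: float, years: float) -> float:
    """Constant rate (bytes/s) that finishes the transfer in `years`."""
    return total_bytes / (years * SECONDS_PER_YEAR)


if __name__ == "__main__":
    total = 181 * ZETTABYTE
    # Rate implied by "about two billion years"
    rate = required_rate(total, 2e9)
    print(f"Implied rate: {rate / 1e6:.2f} MB/s")
    # Hypothetical 100 Mbit/s connection (12.5 MB/s)
    print(f"At 100 Mbit/s: {download_years(total, 12.5e6):.3g} years")
```

Under these assumptions the claim corresponds to a sustained rate of a little under 3 MB/s, so it reads as an order-of-magnitude illustration rather than a precise figure.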
Data usage facts
● A single person generates 1.7 MB of data every second
● Facebook generates 4 PB of data daily
● One person generates 49.8 GB of IP traffic every month
● YouTubers upload 500 hours of video per minute, which means 30,000 hours of content every hour
● Video traffic makes up 82% of all consumer internet traffic
● 50% of all data will be in the cloud by 2025
● No less than 2.5 quintillion bytes of data are created every day! (That’s two exabytes plus 500 petabytes.)
● An AWS Snowmobile has a capacity of up to 100 petabytes
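Several of these figures are plain unit conversions and can be checked directly. A small sketch, assuming decimal (SI) units; the per-day extrapolation of the 1.7 MB/s figure is derived here and is not a number from the slide.

```python
# Sanity checks for the unit conversions quoted on the slide (SI units).

MB, GB, PB, EB = 10**6, 10**9, 10**15, 10**18


def per_hour(per_minute: float) -> float:
    """Convert a per-minute rate to a per-hour rate."""
    return per_minute * 60


def per_day(per_second: float) -> float:
    """Convert a per-second rate to a per-day rate."""
    return per_second * 24 * 3600


if __name__ == "__main__":
    # 500 hours of video per minute, expressed per hour
    print(per_hour(500))
    # 2.5 quintillion bytes compared with 2 EB + 500 PB
    print(25 * 10**17 == 2 * EB + 500 * PB)
    # 1.7 MB per second, extrapolated to one day, in GB
    print(per_day(1.7 * MB) / GB)
```

The last line comes to roughly 147 GB per person per day, which shows how quickly a per-second figure compounds.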
Data is not only numbers
We have a lot of data, and a lot of garbage in
that data; by itself it does not make any sense.
To turn it into useful information, we have to
clean the data (fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data
within a dataset) and compute statistics on the cleaned
data.
Once we have structured information, we can draw
conclusions and decide on measures to take. Running that process
continuously helps to reach incredible goals.
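The clean-then-summarize step described above might look like the following minimal sketch. The records, field names, and validity rules (for example, treating negative amounts as incorrect) are hypothetical and chosen only to cover each kind of problem the slide lists.

```python
# A minimal cleaning pass over hypothetical records, then summary statistics.
# Field names and validity rules are illustrative assumptions.
from statistics import mean, median
from typing import Optional

RAW_ROWS = [
    {"id": "1", "amount": "19.99"},
    {"id": "2", "amount": " 5.00 "},   # badly formatted: stray spaces
    {"id": "2", "amount": "5.00"},     # duplicate id
    {"id": "3", "amount": "n/a"},      # corrupted value
    {"id": "4", "amount": None},       # incomplete row
    {"id": "5", "amount": "-7.50"},    # incorrect: negative amount
    {"id": "6", "amount": "12.01"},
]


def parse_amount(value: Optional[str]) -> Optional[float]:
    """Return a non-negative float, or None if the value is unusable."""
    if value is None:
        return None
    try:
        amount = float(value.strip())
    except ValueError:
        return None
    return amount if amount >= 0 else None


def clean(rows: list[dict]) -> list[dict]:
    """Fix formatting, drop unusable rows, keep the first valid row per id."""
    seen: set[str] = set()
    cleaned = []
    for row in rows:
        amount = parse_amount(row.get("amount"))
        if amount is None or row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append({"id": row["id"], "amount": amount})
    return cleaned


if __name__ == "__main__":
    rows = clean(RAW_ROWS)
    amounts = [r["amount"] for r in rows]
    print(len(rows), "rows kept of", len(RAW_ROWS))
    print("mean:", round(mean(amounts), 2), "median:", median(amounts))
```

Whether a bad value should be repaired or dropped is a per-dataset decision; this sketch repairs whitespace and drops everything else.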
Why is data so important?
● Helps you make better and smarter decisions
● Keeps your business up to date
● Improves financial management
● Improves performance and makes internal operations more efficient
● Creates a data-driven culture
● Improves customer service
How companies use data to make decisions
Using Data To Create New Blockbuster Hit Series
One company used its data to run predictive analyses and learn what
exactly its customers would be receptive to and interested in watching.
Providing Faster & More Efficient Rides With Data
A ride-hailing company analyzes historical data and key metrics, including the number of
ride requests and trips fulfilled in different parts of a city and the times when they
occur. This gives insight into areas that have a supply crunch, allowing the company
to inform drivers ahead of time to move to those areas and capitalize on the
expected rise in demand.
Another company uses geographic information systems to analyze factors such as demographic
information and traffic flow to choose the best locations to expand into. This not only
helps with choosing locations but also identifies which products would sell best in
a given area.
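The supply-crunch analysis in the ride example compares requests with fulfilled trips per area and time slot. A minimal sketch of that comparison follows; the area names, counts, and the 0.8 threshold are invented for illustration and do not come from any company's data.

```python
# Fulfilment rate per (area, hour) from hypothetical historical ride data.
# Area names, counts, and the 0.8 threshold are illustrative assumptions.
from collections import defaultdict

# (area, hour_of_day, requests, fulfilled_trips)
HISTORY = [
    ("downtown", 8, 120, 110),
    ("downtown", 18, 300, 180),
    ("airport", 18, 90, 85),
    ("suburbs", 8, 40, 20),
    ("downtown", 18, 280, 170),   # same slot on another day
]


def fulfilment_rates(history):
    """Aggregate requests and fulfilled trips per (area, hour) slot."""
    totals = defaultdict(lambda: [0, 0])
    for area, hour, requests, fulfilled in history:
        totals[(area, hour)][0] += requests
        totals[(area, hour)][1] += fulfilled
    return {slot: done / asked for slot, (asked, done) in totals.items() if asked}


def supply_crunch(history, threshold=0.8):
    """Slots whose fulfilment rate is below the threshold, worst first."""
    rates = fulfilment_rates(history)
    short = [(slot, rate) for slot, rate in rates.items() if rate < threshold]
    return sorted(short, key=lambda item: item[1])


if __name__ == "__main__":
    for (area, hour), rate in supply_crunch(HISTORY):
        print(f"{area} at {hour}:00 -> {rate:.0%} of requests fulfilled")
```

Slots at the top of the result are the ones where drivers would be asked to reposition ahead of time.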
Who makes decisions?
● Medical diagnosis
● Legal matters
● Human resources
● Ethical decision-making
● Creative industries
● Fraud detection
● Customer service
● Trading and investment
● Route management systems
● Advertising decisions
Data-Driven decision process
High-level component diagram
Producer
● Web and mobile apps
● Services
● Devices and IoT
● Logs and metrics
Storage
● Data lake
● Data warehouse
● Databases
● Files
Data Processing
● Apache Spark
● Google BigQuery
● AWS Athena
● Azure Data Factory
Analyze
● Tableau
● Power BI
● Analysts
● 3rd-party services
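To make the flow between the four stages concrete, here is a toy in-memory walk-through. Every function is a stand-in invented for illustration: a real stack would use tools like those listed on the slide (for example a data lake for storage and Apache Spark or BigQuery for processing), and the event fields are assumptions.

```python
# A toy, in-memory walk through the four stages named on the slide.
# Each function is a hypothetical stand-in for a real component.
import json
from collections import Counter


def produce():
    """Producer: emit raw events, as an app or device might."""
    yield {"source": "web", "event": "click"}
    yield {"source": "mobile", "event": "click"}
    yield {"source": "iot", "event": "reading"}
    yield {"source": "web", "event": "purchase"}


def store(events):
    """Storage: keep events in raw form (here, JSON lines in a list)."""
    return [json.dumps(event) for event in events]


def process(raw_lines):
    """Data processing: parse raw lines and count events per source."""
    return Counter(json.loads(line)["source"] for line in raw_lines)


def analyze(counts):
    """Analyze: turn the aggregate into a human-readable report."""
    return [f"{source}: {n} event(s)" for source, n in counts.most_common()]


if __name__ == "__main__":
    for line in analyze(process(store(produce()))):
        print(line)
```

Keeping the stored form raw and deferring parsing to the processing stage mirrors the usual data lake choice of schema-on-read.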
Future-proofing data lake stack
● Data collection and integration: collect and integrate various types of data from different sources
● Real-time data processing: process data as it arrives
● Data analysis: analyze large amounts of data
● Scalability: scale to meet the needs of the business
● Efficiency: use existing resources efficiently, reducing the costs of data processing and
storage
● Ease of use: give users quick and easy access to the data they need
Zaloni Data Lake architecture
● Understanding industry best practices
● Providing a template for solutioning
● Tracking a process
● Understanding structures and elements
Zaloni zones
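The zone diagram itself did not survive in this text. As a rough illustration only, the sketch below assumes the zone names commonly associated with Zaloni's reference architecture (transient landing, raw, trusted, refined) and leaves out the sandbox area. The names, the `promote` helper, and the quality-check flag are all assumptions and may differ from what the talk showed.

```python
# Sketch of zone-by-zone promotion in a zoned data lake.
# Zone names are an assumption based on commonly cited Zaloni material;
# the slide's own diagram is not available in this text.
from enum import IntEnum


class Zone(IntEnum):
    TRANSIENT = 0   # short-lived landing area for incoming data
    RAW = 1         # data kept as received
    TRUSTED = 2     # validated and cleaned data
    REFINED = 3     # data shaped for specific consumers


def promote(dataset: dict, passed_checks: bool) -> dict:
    """Move a dataset one zone forward if its quality checks passed."""
    zone = dataset["zone"]
    if not passed_checks or zone == Zone.REFINED:
        return dataset                      # stays where it is
    return {**dataset, "zone": Zone(zone + 1)}


if __name__ == "__main__":
    orders = {"name": "orders", "zone": Zone.TRANSIENT}
    orders = promote(orders, passed_checks=True)
    print(orders["name"], "is now in", orders["zone"].name)
```

Modelling zones as an ordered enum keeps promotion a one-step move, so a dataset cannot skip validation on its way to the refined zone.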
Pros and cons of Zaloni architecture
Pros
● Intuitively clear
● Access to raw and formatted data
● Flexible and scalable architecture that can accommodate different data types, formats, and sources
● Modular and extensible architecture that can be customized to meet specific needs
Cons
● Can be complex to implement and may require specialized expertise
● May be overkill for smaller organizations or those with limited data needs
● May not be well suited to organizations that require real-time or near-real-time data processing
● May not be easily customizable to fit specific business needs or use cases
Alternative approaches
● Lambda Architecture
● Kappa Architecture
● Data Mesh Architecture
● Virtualized Data Architecture
Summary
● Data is important for businesses because it can help inform decision-making, improve
operational efficiency, and identify new business opportunities
● Real-life examples of data-driven decisions include optimizing website design, improving app
usability, and informing product development
● Data storage options vary, and a data lake is a suitable choice when dealing with diverse and
unstructured data from multiple sources. It provides flexibility and agility for storing
and analyzing data
● The Zaloni Data Lake architecture helps to build a flexible and scalable data lake

GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven Design”
