Introduction to Data Science: Course Objectives
COURSE OBJECTIVES
The course aims to:
• Bring together several key big data problems and their solutions.
• Recognize the key concepts of Extraction, Transformation and Loading (ETL).
• Prepare a sample project in a Hadoop environment.
COURSE OUTCOMES
On completion of this course, students shall be able to:
CO2: Describe the merits of big data processing for understanding data.
• Many well-known companies use Data Science to streamline their routine
processes.
• We will explore a Walmart use case to see how the company uses data to optimize its
supply chain and make better decisions.
• We will also learn the core implementations of Data Science in businesses.
• Businesses today have become data-centric: they use data to make decisions and to
grow in the direction that the data indicates.
Implementations of Data Science in Businesses
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
Data Science Case Study
Facebook – Big Data Analytics to Make Business Better
Relationship Between Facebook and Big Data
Facebook uses a Hadoop HDFS architecture. It collects data from two sources:
• User data, which is stored in the federated MySQL tier.
• Event-based log data, which is produced by the web servers. This log data is gathered
and sent to Scribe servers, which run in Hadoop clusters.
The Scribe servers aggregate the log data and write it to the Hadoop Distributed File
System (HDFS). Periodically, the HDFS data is compressed and sent to the Production
Hive-Hadoop cluster for further processing. The federated MySQL data is likewise
dumped, compressed, and moved into the Production Hive-Hadoop cluster.
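The log path described above (aggregate, compress, then load for processing) can be illustrated with a minimal in-memory Python sketch. It does not use Scribe, HDFS, or Hive. The function names, the event fields, and the in-memory dictionaries are hypothetical stand-ins chosen only to show the order of the steps.

```python
import gzip
import json
from collections import defaultdict

def aggregate_logs(events):
    """Group raw web-server events by category (stand-in for log aggregation)."""
    buckets = defaultdict(list)
    for event in events:
        buckets[event["category"]].append(event)
    return dict(buckets)

def compress_bucket(records):
    """Serialize one bucket as JSON lines and gzip it (stand-in for periodic compression)."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(payload.encode("utf-8"))

def load_bucket(blob):
    """Decompress and parse a bucket (stand-in for loading into the warehouse)."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]

# Hypothetical event-based log data produced by web servers.
events = [
    {"category": "page_view", "user": 1, "url": "/home"},
    {"category": "click", "user": 1, "target": "like"},
    {"category": "page_view", "user": 2, "url": "/profile"},
]

# Aggregate, compress each bucket, then load it for analysis.
staged = {cat: compress_bucket(recs) for cat, recs in aggregate_logs(events).items()}
warehouse = {cat: load_bucket(blob) for cat, blob in staged.items()}
print({cat: len(rows) for cat, rows in warehouse.items()})
```

Compressing per bucket rather than per event mirrors the idea of periodic batch compression: many small records are packed together before they are moved to the processing cluster.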
Facebook uses two different clusters for data analysis.
• The Production Hive-Hadoop cluster runs jobs with strict deadlines.
• The ad hoc Hive-Hadoop cluster runs lower-priority jobs and ad hoc analysis. Data is
replicated to it from the Production cluster.
Analysis results are saved to the Hive-Hadoop cluster or, when they are to be served to
Facebook users, to the MySQL tier. Ad hoc analysis queries are specified through a
graphical user interface (HiPal) or a command-line interface (Hive CLI). Facebook uses a
Python framework (Databee) to execute and schedule periodic batch jobs in the Production
cluster.
Organizations are leveraging Big Data analytics to engage with their customers by
understanding their behaviour more precisely. Some of the key takeaways from the article are:
1. Facebook feeds the available user information into its deep neural network models to
find the target audience for a particular advertisement. This helps it serve more relevant
advertisements to users.
2. As a result, Facebook has emerged as one of the toughest competitors to Google Search
and YouTube in the digital marketing race.
3. Because the amount of user data held by Facebook is huge and still growing, we may
see many more use cases from Facebook in the future.
Data Science Case Study
Walmart – Leveraging Data to Make Business Better
• Walmart is the world’s largest retailer. It is one of many major companies
leveraging Big Data to make their business more efficient. Walmart handles a
vast amount of customer data: about 2.5 petabytes is collected from customers
every hour.
• This data is largely unstructured and is processed using Hadoop and NoSQL. Walmart
uses it to track and monitor factors that might affect sales at its stores.
• Traditional Business Intelligence was mostly descriptive and static. With the
addition of data science, it has become a more dynamic field. Data Science has
enabled Business Intelligence to cover a wide range of business operations.
• With the massive increase in the volume of data, businesses need data
scientists to analyze the data and derive meaningful insights from it.
• These insights help companies analyze information at a large scale and develop
sound decision-making strategies. Decision-making involves evaluating and
assessing the various factors involved. Decision Making is a four-step process:
1. Business Intelligence for Making Smarter Decisions
• Companies analyze customer reviews to find the best fit for their products.
This analysis is carried out with the advanced analytical tools of Data Science.
• Furthermore, industries use current market trends to devise products
for the mass market. These trends give businesses clues about the current
demand for a product. Businesses evolve with innovation.
• For example, Airbnb uses data science to improve its services. The data
generated by customers is processed and analyzed. Airbnb then uses it to
address customer requirements and offer premium facilities to its customers.
2. Making Better Products
• With Data Science, businesses can manage themselves more efficiently. Both
large-scale businesses and small startups can benefit from data science in order
to grow further.
• Data Scientists help to analyze the health of a business. With data science,
companies can predict the success rate of their strategies. Data Scientists are
responsible for turning raw data into processed, usable data.
• Based on this, a business can quantify and evaluate its performance and take
appropriate management steps. Data science can also help managers
identify and assess potential candidates for the business.
• For example, Data Science can be used to monitor the performance of
employees. Managers can analyze the contributions made by
employees and decide when they should be promoted, how to manage their
perks, etc.
3. Managing Businesses Efficiently
• Predictive analytics is one of the most important capabilities for a business. With
the advent of advanced predictive tools and technologies, companies have expanded
their ability to deal with diverse forms of data.
• In formal terms, predictive analytics is the statistical analysis of data, using
machine learning algorithms, to predict future outcomes from historical data.
Predictive analytics tools include SAS, IBM SPSS, SAP, etc.
• The specific implementation of predictive analytics depends on the type of
industry. Regardless of industry, its common role is to predict future events.
4. Predictive Analytics to Predict Outcomes
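As a small illustration of predicting a future outcome from historical data, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. It is a minimal example under stated assumptions, not how SAS, IBM SPSS, or SAP work internally. The monthly sales figures and the names `fit_line`, `months`, and `units_sold` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: month index and units sold in that month.
months = [1, 2, 3, 4, 5, 6]
units_sold = [110, 125, 139, 152, 168, 181]

slope, intercept = fit_line(months, units_sold)

# Extrapolate the fitted trend to the next (unseen) month.
forecast_month_7 = slope * 7 + intercept
print(f"trend: {slope:.2f} units/month, forecast for month 7: {forecast_month_7:.1f}")
```

A linear trend is only reasonable when the historical data is roughly linear. Real predictive models usually include more features and are validated on held-out data before being trusted.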
• In the past, many businesses made poor decisions because they lacked
survey data or relied solely on ‘gut feelings’. This led to some disastrous
decisions and losses of millions.
• With the large amounts of data and the data tools now available, businesses
can make calculated, data-driven decisions.
• Furthermore, business decisions can be made with the help of powerful tools
that not only process data faster but also provide accurate results.
5. Leveraging Data for Business Decisions
• After making decisions based on forecasts of future events, companies need
to assess those decisions. This is possible through several
hypothesis testing tools.
• After implementing a decision, a business should understand how it
affects performance and growth. If the decision has a negative effect,
the business should analyze it and eliminate the problem that is slowing
down its performance.
• Furthermore, by using data science to assess future growth under the present course
of action, businesses can increase their profits considerably.
6. Assessing Business Decisions
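One way to assess whether a decision changed a metric is a hypothesis test. The sketch below uses a permutation test, chosen here only because it needs no external libraries; the slides do not name a specific test. The weekly sales figures and the function name `permutation_test` are hypothetical.

```python
import random

def permutation_test(before, after, n_permutations=10_000, seed=0):
    """Estimate a one-sided p-value for mean(after) > mean(before) by shuffling labels."""
    rng = random.Random(seed)
    observed = sum(after) / len(after) - sum(before) / len(before)
    pooled = list(before) + list(after)
    n_before = len(before)
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the before/after labels are interchangeable.
        rng.shuffle(pooled)
        perm_before = pooled[:n_before]
        perm_after = pooled[n_before:]
        diff = sum(perm_after) / len(perm_after) - sum(perm_before) / len(perm_before)
        if diff >= observed:
            hits += 1
    return observed, hits / n_permutations

# Hypothetical weekly sales before and after a business decision was implemented.
sales_before = [200, 210, 195, 205, 198, 202]
sales_after = [215, 222, 209, 218, 225, 212]

observed_diff, p_value = permutation_test(sales_before, sales_after)
print(f"observed lift: {observed_diff:.2f}, estimated p-value: {p_value:.4f}")
```

A small p-value suggests the lift is unlikely to be due to chance alone. It does not by itself rule out other causes, such as seasonality, that changed over the same period.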
• Data Science has played a key role in bringing automation to several
industries. It has taken over mundane and repetitive jobs. One such job is
resume screening. Every day, companies have to deal with large numbers of
applicants’ resumes.
• Data science technologies like image recognition can convert the visual
information in a resume into a digital format. The data is then processed with
analytical algorithms such as clustering and classification to identify the
right candidate for the job.
• Furthermore, businesses study hiring trends and analyze potential applicants for
the job. This allows them to reach out to candidates and gain in-depth insight
into the job-seeker market.
7. Automating Recruitment Processes
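To make the classification step concrete, the sketch below labels a new resume by finding the most similar previously labeled resume (a 1-nearest-neighbour classifier using Jaccard similarity on word sets). This is a toy illustration under stated assumptions, not a real screening system. The example resumes, the labels "shortlist" and "reject", and all function names are hypothetical.

```python
def tokenize(text):
    """Lowercase a resume snippet and split it into a set of words."""
    return set(text.lower().replace(",", " ").split())

def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def classify(resume, labeled_examples):
    """Return the label of the most similar labeled example (1-nearest neighbour)."""
    words = tokenize(resume)
    best_label, _ = max(
        ((label, jaccard(words, tokenize(text))) for text, label in labeled_examples),
        key=lambda pair: pair[1],
    )
    return best_label

# Hypothetical, already-digitized resume snippets with screening outcomes.
labeled_examples = [
    ("python sql statistics machine learning", "shortlist"),
    ("hadoop hive spark data pipelines", "shortlist"),
    ("retail cashier customer service", "reject"),
    ("graphic design illustration branding", "reject"),
]

new_resume = "Python, SQL, Hadoop, machine learning"
print(classify(new_resume, labeled_examples))
```

Word overlap is a crude similarity measure. A practical system would need richer features, far more labeled data, and checks that its decisions are fair to applicants.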
• In this lecture, we saw how data science plays an important role in businesses:
it is used for business intelligence, for improving products, for strengthening
the management capabilities of companies, and for predictive analytics.
• We also went through a Walmart use case and saw how the company uses data science
to increase its efficiency.
Summary
• How
FAQs
THANK YOU

