2. Birla Institute of Technology & Science, Pilani
Agenda – Section 10
Turning data into insight
• Data, Information, Knowledge
• Different types of Analytics
• Traditional Analytics
• Advanced Analytics
• Data Visualization
• Data Storytelling
HS Talks
Data, Information and Knowledge
Data is the carrier of information. Taken on its own, line by line, fact by fact, or
category by category, data does not provide any value to the user.
• An example of a piece of data could be “bread” or “10.95.” Data is often too specific to be useful for
decision support.
Information is data that is aggregated to a level where it makes sense for decision
support in the shape of, for instance, reports, tables, or lists.
• An example of information could be that the sales of bread in the last three months were
$18,000, $23,000, and $19,000, respectively.
Knowledge goes beyond information. To become knowledge, the information needs to be analyzed
and interpreted through a process that requires quantitative methods and business insight.
• The BA department, as an example, offers some suggestions regarding why bread sales have fluctuated
in the last three months. Reasons could be seasonal fluctuations, campaigns, new distribution
conditions, or competitors’ initiatives.
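The step from data to information can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the individual transactions, the month labels, and the function names are invented, and only the three monthly bread totals come from the example above. The month-over-month change is one possible starting point for the interpretation that turns information into knowledge.

```python
from collections import defaultdict

# Hypothetical raw data: one (month, product, amount) record per sale.
# Months and individual amounts are invented; only the monthly bread
# totals (18,000 / 23,000 / 19,000) come from the slide's example.
transactions = [
    ("2024-01", "bread", 10_000.0),
    ("2024-01", "bread", 8_000.0),
    ("2024-02", "bread", 12_500.0),
    ("2024-02", "bread", 10_500.0),
    ("2024-03", "bread", 9_000.0),
    ("2024-03", "bread", 10_000.0),
]


def monthly_totals(records):
    """Aggregate raw records (data) into per-month totals (information)."""
    totals = defaultdict(float)
    for month, _product, amount in records:
        totals[month] += amount
    return dict(sorted(totals.items()))


def month_over_month_change(totals):
    """Percentage change between consecutive months, as input for interpretation."""
    months = list(totals)
    return {
        later: round(100 * (totals[later] - totals[earlier]) / totals[earlier], 1)
        for earlier, later in zip(months, months[1:])
    }


info = monthly_totals(transactions)
changes = month_over_month_change(info)
print(info)     # monthly totals: the "information" level
print(changes)  # fluctuation that still needs a business explanation
```

The code stops at describing the fluctuation. Explaining it (seasonality, campaigns, distribution, competitors) still requires business insight.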
Understanding and interpreting data
You can’t identify what data you need if you aren’t clear about what you want to find out.
Data-based decision making always starts with identifying your key business questions.
Class Discussion: How would you analyze each of the following questions?
• What are the demographics of our most valuable customers?
• What is the lifetime value of our customers?
• What are our key sales, revenue and profit trends?
• Which suppliers are most unreliable and why?
• How do we optimize our inventory management?
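As one possible starting point for the lifetime-value question, the sketch below uses a deliberately simplified formula (order value × purchase frequency × retention period × margin). The formula, the function name, and all the numbers are assumptions for discussion, not something the slides prescribe. Real analyses often add discounting and churn modeling.

```python
def simple_lifetime_value(avg_order_value, orders_per_year, years_retained, margin=1.0):
    """Simplified customer lifetime value.

    Multiplies the average order value by the purchase frequency, the
    expected retention period, and the profit margin. No discounting.
    """
    return avg_order_value * orders_per_year * years_retained * margin


# Hypothetical customer segment: every number here is invented.
clv = simple_lifetime_value(
    avg_order_value=40.0,
    orders_per_year=6,
    years_retained=3,
    margin=0.25,
)
print(clv)
```

Each input maps to data that has to be identified first (order history, retention, margins), which is why the business question comes before the data.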
Make the data easier to understand and pull out key insights using:
• Data Visualization
• Storytelling
• Virtual Reality & Gamification
Analyzing and processing data
Turning data into insights requires tools: programming languages and analytics software.
The process of extracting insights from data boils down to three steps:
1. preparing the data (identifying, cleaning and formatting the data);
2. building the analytics model; and
3. drawing a conclusion from the insights gained.
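The three steps can be sketched end to end in plain Python. This is a toy illustration under stated assumptions: the raw strings, the function names, and the choice of a least-squares trend line as the "model" are all hypothetical. Only the three sales figures are taken from the earlier bread example.

```python
def prepare(raw_rows):
    """Step 1: clean and format. Keep rows that parse as numbers."""
    cleaned = []
    for row in raw_rows:
        try:
            cleaned.append(float(row.strip().replace(",", "")))
        except ValueError:
            # Drop unparseable rows such as "n/a". Note: this closes the
            # gap in the series, which a real pipeline should handle explicitly.
            continue
    return cleaned


def fit_trend(values):
    """Step 2: build a model. Least-squares slope against positions 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


def conclude(slope, tolerance=1e-9):
    """Step 3: draw a conclusion from the fitted model."""
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "falling"
    return "flat"


raw = [" 18,000", "23000 ", "n/a", "19,000"]  # hypothetical messy input
values = prepare(raw)
slope = fit_trend(values)
print(values, slope, conclude(slope))
```

With only three points the trend line says little. The sketch is meant to show the shape of the process, not to be a reliable model.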
Examples of Tools:
• Amazon Web Services offers large-scale data storage and analysis in the cloud through services such as
Elastic Compute Cloud (EC2) and Elastic MapReduce (EMR).
• Microsoft’s flagship analytical offering, HDInsight, is based on the Hortonworks Data Platform, but tailored
to work with Microsoft’s own Azure cloud services and SQL Server database management system.
• Google has BigQuery, which is designed to let anyone with a bit of data science knowledge run queries
against vast data sets.
• (Refer to the next slide for an exhaustive list)
Bias and importance of clean data
Bias in data is an error that occurs when certain elements of a dataset are overweighted or
overrepresented.
Bias leads to skewed outcomes, systematic prejudice, and low accuracy. Key types of bias include:
• Systemic bias occurs when certain social groups are favored, and others are devalued.
• In data science, selection bias occurs when you have data that aren’t properly randomized.
• A reporting bias is the inclusion of only a subset of results in an analysis, which typically only covers a small
fraction of evidence.
• Implicit biases occur when we make assumptions based on our personal experiences.
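One simple way to spot overrepresentation is to compare each group's share in a sample with its share in the population. The sketch below assumes hypothetical group labels and population shares, and the function names are illustrative. It detects only imbalance on a known attribute, not every type of bias listed above.

```python
from collections import Counter


def group_shares(labels):
    """Fraction of records belonging to each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def representation_gap(sample_labels, population_shares):
    """Sample share minus population share per group.

    Positive values mean the group is overrepresented in the sample,
    negative values mean it is underrepresented.
    """
    sample_shares = group_shares(sample_labels)
    return {
        group: round(sample_shares.get(group, 0.0) - share, 3)
        for group, share in population_shares.items()
    }


# Hypothetical population shares and a sample that is skewed toward "urban".
population_shares = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}
sample = ["urban"] * 70 + ["suburban"] * 25 + ["rural"] * 5

gaps = representation_gap(sample, population_shares)
print(gaps)
```

A large gap suggests the sample was not properly randomized with respect to that attribute, which is the selection-bias situation described above.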
Data cleaning, or data cleansing, is the process of fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple
data sources, there are many opportunities for data to be duplicated or mislabeled.
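A minimal cleaning pass over two combined sources might look like the sketch below. The record layout (`name`, `email`), the sample rows, and the choice of the email address as the duplicate key are assumptions made for illustration.

```python
def clean_records(records):
    """Normalize formatting, drop incomplete rows, and remove duplicates."""
    seen_emails = set()
    cleaned = []
    for record in records:
        # Fix inconsistent formatting: stray whitespace and letter case.
        name = (record.get("name") or "").strip().title()
        email = (record.get("email") or "").strip().lower()
        if not name or not email:
            continue  # incomplete record: a required field is missing
        if email in seen_emails:
            continue  # duplicate of a record already kept
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical sources that overlap once combined.
source_a = [
    {"name": "Asha Rao", "email": "asha@example.com"},
    {"name": "  ravi kumar ", "email": "RAVI@example.com"},
]
source_b = [
    {"name": "Ravi Kumar", "email": "ravi@example.com"},  # same person as above
    {"name": "Meera", "email": None},                     # incomplete
]

cleaned = clean_records(source_a + source_b)
print(cleaned)
```

Normalizing before comparing matters here: without lowercasing the email, the two "Ravi Kumar" rows would not be recognized as duplicates.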
Imputation: The process of estimating likely values for missing data, taking into account the
statistical characteristics of the broader population, while trying to minimize the bias
introduced through estimation.
Source: https://www.statice.ai/post/data-bias-types#Systemic_biases
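Mean imputation is one of the simplest forms of this idea: each missing value is replaced by the mean of the observed values. The sketch below is illustrative only. The function name and the sample values are hypothetical, and mean imputation is just one of several strategies.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [30, None, 40, 50, None]
imputed = impute_mean(ages)
print(imputed)
```

Mean imputation keeps the column mean unchanged but reduces its variance, which is an example of the bias that estimation can introduce.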