BIG DATA | How to explain it & how to use it
for your career?
NetCom Learning
NetCom Learning – Managed Learning
Services
Today’s Agenda
If you ask people what BIG DATA is they often say it is about a lot of
data. But the world has ALWAYS had a lot of data! It is about
datafication – a word so new that even spellcheck functions don’t
know it’s a real word!
• How BIG DATA changes the career paths of even the most unsuspecting!
• How BIG DATA changes the way business decisions are made.
• How BIG DATA changes who makes the decisions & reshuffles the balance of power.
• What BIG DATA skills you can bring to the office tomorrow to increase your value.
The experienced data scientists & those managers who leverage them.
BIG DATA is a management tool even if you have other employees perform the coding.
BIG DATA is as ubiquitous as the internet.
Gut instinct is now of less value.
Datafication
A modern technological trend that turns many aspects of our lives into computerized data & transforms that information into new forms of value.
Data → Information → Knowledge → Wisdom → Insight
Knowledge → Wisdom → Insight (Vincent Suppa)
This is the fulcrum that changes everything.
Data → Information → Knowledge → Wisdom → Actionable Insight
BIG DATA
A Metaphor / Illustration
Diagramming an Algorithm
Function: an activity or purpose natural to or intended for a person or thing.
Function: a relationship or expression involving one or more variables.
Algorithm Script
Just as voice mail and email obviated the manager’s need for secretarial functions, algorithms eating BIG DATA are now obviating tactical managerial functions.
Transactional Work
Tactical Work
Strategy needs to consume data.
Data, without strategy, has little value.
Sine wave vs. modified sine wave
What is the difference between analogue and digital?
Datafication is only possible due to the digitization of analogue information.
Digital versus Analogue
Interprets continuous sine wave as a digital recreation.
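A minimal sketch of that digital recreation, assuming a 1 Hz sine wave, 8 samples per second and 5 allowed levels (all three numbers are illustrative choices, not taken from the slides). The continuous wave is measured at fixed moments, and each measurement is rounded to the nearest allowed value:

```python
import math

def sample_and_quantize(frequency_hz, sample_rate_hz, levels, duration_s=1.0):
    """Sample a sine wave at fixed intervals and round each sample to one of `levels` values."""
    n_samples = int(sample_rate_hz * duration_s)
    step = 2.0 / (levels - 1)  # spacing between allowed values in [-1, 1]
    digital = []
    for n in range(n_samples):
        t = n / sample_rate_hz                              # time of the n-th sample
        analog = math.sin(2 * math.pi * frequency_hz * t)   # the continuous value
        digital.append(round(analog / step) * step)         # nearest allowed level
    return digital

# Hypothetical settings: 1 Hz wave, 8 samples per second, 5 levels.
samples = sample_and_quantize(frequency_hz=1, sample_rate_hz=8, levels=5)
print(samples)
```

More samples per second and more levels give a closer recreation, at the cost of more data points to store.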
This photo was taken on film, not with a digital camera.
Are there data points within this “single” data point?
Social construct
Another example of a social construct
Now to the show.
Big data: a broad term for data sets so large & complex that traditional data processing applications are inadequate.
A terabyte, petabyte & gigabyte walk into a bar...
To give us a sense of scale:
A yottabyte is 1,000 trillion gigabytes.
Kilo → Mega → Giga → Tera → Peta → Exa → Zetta → Yotta
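The scale can be checked with a few lines of arithmetic. This sketch assumes decimal (SI) prefixes, where each step up the ladder multiplies by 1,000; the function name is made up for illustration:

```python
# Decimal (SI) prefixes in ascending order; each step is a factor of 1,000.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def bytes_in(prefix):
    """Return the number of bytes in one unit of the given prefix."""
    return 1000 ** (PREFIXES.index(prefix) + 1)

gigabytes_per_yottabyte = bytes_in("yotta") // bytes_in("giga")
print(f"{gigabytes_per_yottabyte:,}")  # 1,000,000,000,000,000 = 1,000 trillion
```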
The Least You Need to Know About BIG DATA
BIG DATA manifests 3 basic shifts:
• From Small to All
• From Clean to Messy
• From Causation to Correlation
V. Suppa The Definitive 90 Thousand Foot Lecture on BIG Data© 2014
Scope of Traditional Data
• Data growth is analogous to y = tan x.
• In 2000, ¼ of the world’s information was digital; the remainder was preserved in analog.
• Digital data doubles around every 3 years.
• In 2014, less than 2% of all stored information was analog. (And now we’re in 2017!)
Big Data is Not About Lots of Data
• Lots of data existed before Big Data!
• Big Data: the ability to render aspects of life that were never quantified before into data points.
• This is DATAFICATION … your new word of the day!
DATAFICATION
Location was datafied before GPS was invented.
• Words are treated as data.
• Friendships & likes are datafied via Facebook.
• Shigeomi Koshimizu datafied body contour (body, posture, weight distribution, etc.).
• Quantified “sitting down”: measured the pressure drivers exert at 360 different points via sensors (0 to 256 scale).
Quality → Quantify
Datafication Turns Everything into a Data Point
Tools of Datafication
• inexpensive computers (commodity)
• powerful processors (commodity)
• basic statistics (commodity)
• clever software (commodity)
• smart algorithm (differentiator)
Lots of Data versus BIG DATA
Computers computing lots of data:
Teach a computer to translate by inputting bilingual dictionaries.
Computers computing BIG DATA:
Feed a computer years of Canadian parliamentary transcripts (French / English).
Statistically program it to infer which English word is the best alternative to a French one.
In context, the French word lumière is a more appropriate substitute for the English word light than léger.
Isn’t this how a person translates?
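A toy sketch of the statistical idea, not the method actually applied to the parliamentary transcripts. The four sentence pairs are invented, and the Dice coefficient is just one simple association score chosen for illustration. The computer never sees a dictionary; it only counts which French words keep showing up alongside an English word:

```python
from collections import Counter

def best_translation(pairs, english_word):
    """Pick the French word most associated with `english_word` across aligned sentences."""
    english_count = Counter()   # sentences containing each English word
    french_count = Counter()    # sentences containing each French word
    together = Counter()        # French words seen alongside `english_word`
    for english, french in pairs:
        e_words = set(english.lower().split())
        f_words = set(french.lower().split())
        english_count.update(e_words)
        french_count.update(f_words)
        if english_word in e_words:
            together.update(f_words)
    if not together:
        return None  # the English word never appears in the corpus

    def dice(f_word):
        # Dice coefficient: high when the two words mostly appear together.
        return 2 * together[f_word] / (english_count[english_word] + french_count[f_word])

    return max(together, key=dice)

# Invented aligned sentences (English, French) for illustration only.
pairs = [
    ("the light is bright", "la lumière est vive"),
    ("turn on the light", "allume la lumière"),
    ("the bag is light", "le sac est léger"),
    ("the house is big", "la maison est grande"),
]
print(best_translation(pairs, "light"))  # lumière
```

With only four sentences the result is fragile; the point of the slide is that years of transcripts make such counts reliable.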
A Quick Review & then … Causation to Correlation
• sampling population → entire population
• pristine data → non-curated messy data
• causation → correlation
Reasoning about how the world works is replaced with learning about associations among phenomena.
• Knowing the cause “is” desirable.
• But cause is harder to figure out.
• Cause as illusion? Cognitive bias.
Saving Trucks
Saving Babies
Saving Epidemics
Saving Buildings
Place sensors on parts to identify heat & vibrational patterns associated with failures leading to breakdowns.
Can predict a breakdown before it happens & replace parts in the garage & not on the side of the road.
• Data does not tell us why the part is in trouble.
• It reveals enough to know the what.
• Can guide investigations into discovering the underlying cause.
Causation to Correlation
• When saving lives, knowing something is likely to occur is more important than knowing why.
• Eventually, “the why” will be investigated.
Can Big Data Save Babies?
Used Big Data to spot infections in premature babies before symptoms appear.
• Information flow: >1,000 data points per second
• Discovered correlations between very minor changes and more serious problems
Big Data Predicts Epidemics Better than CDC
CDC tracks patient visits to clinics.
Information suffers from a 2-week reporting lag.
Google took the 50 million most commonly searched terms from 2003 – 2008.
Compared them against historical influenza data from CDC.
Searches were then correlated with CDC’s data on outbreaks of flu.
How All Three Shifts Are Illustrated
Small to All
Ran 100% of US searches for 6 years through an algorithm that identified 45 searches correlated with CDC data on flu outbreaks (runny nose, body aches, etc.).
Clean to Messy
Searches were imperfect, with misspellings & incomplete phrases, & included healthy people searching on behalf of others.
Causation to Correlation
Will anyone claim typing symptoms into a search engine gives you the flu?
Big Data via searches predicts outbreaks in real time, compared to CDC’s traditional data analytics, which lag by 2 weeks.
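A minimal sketch of the correlation step, using made-up weekly numbers and two hypothetical search terms. It is not Google's actual method, only an illustration of ranking search terms by how closely their volume tracks reported flu visits (Pearson correlation):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: +1 means the two series rise and fall together."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented weekly figures for illustration only.
cdc_flu_visits = [120, 150, 310, 480, 390, 200]
searches = {
    "runny nose": [1000, 1300, 2900, 4700, 3600, 1900],
    "cheap flights": [800, 790, 820, 760, 810, 805],
}

# Rank terms from most to least correlated with the CDC series.
ranked = sorted(
    searches,
    key=lambda term: pearson(searches[term], cdc_flu_visits),
    reverse=True,
)
for term in ranked:
    print(term, round(pearson(searches[term], cdc_flu_visits), 2))
```

Nothing here says the searches cause the flu; the score only says the two series move together.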
Illegally subdivided buildings are more likely to catch fire.
200 inspectors respond to 25K complaints / year regarding overcrowded buildings.
NYC created a database of 900K buildings augmented by troves of data collected by 19 agencies:
• Records of tax liens
• Anomalies in utility usage
• Service cuts
• Missed payments
• Ambulance visits
• Local crime rates
• Rodent complaints
• Etc.
Big Data increases the productivity of each inspector.
How Did They Do It?
1. Compared the database against 5 years of building fires
2. Ranked by severity
3. Observed correlation. (Not causality!)
4. Data scientists triaged complaints for inspections.
Concluded that a building’s:
• type & age were the main predictors of fire; other variables were superfluous
• permit for exterior brickwork correlated with a lower risk of fire.
Result: the share of inspections leading to vacate orders increased from 13% to 70%.
Building characteristics did not cause fire but were correlated with fire risk.
Spending money on the exterior correlates with an up-to-code interior.
But just the intent to begin work correlates enough to predict an outcome.
Pull disparate sets of texts & put them into a “point of singularity.”
Currently about 70% of data is text. Pictures are to be quantified under separate protocols.
Create a Corpus → the body of text to be analyzed.
R, for example, has a set of functions to clean up a Corpus by excluding data points superfluous to the analysis (delete commas, periods & words such as but & and, etc.). R cleans up files by reducing the corpus to the primary words crucial to the analysis.
Truncate words with a common stem → this is called stemming (e.g. engineer & engineering both become the same word). Think of the mathematical analogy of number factoring versus least common denominator.
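The slides use R for this; the sketch below shows the same cleanup idea in Python, using only the standard library. The stop-word list and the suffix-stripping rule are deliberately crude assumptions for illustration, far simpler than a real stemmer:

```python
import string

# Hypothetical, very short stop-word list; real lists are much longer.
STOP_WORDS = {"but", "and", "the", "a", "an", "of", "to", "is", "in"}
# Crude suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Strip one common suffix, keeping at least 4 letters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def clean(text):
    """Lowercase, delete punctuation, drop stop words, then stem what remains."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [stem(word) for word in no_punct.split() if word not in STOP_WORDS]

print(clean("The engineer is engineering, but the engineers rested."))
# ['engineer', 'engineer', 'engineer', 'rest']
```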
A document-term matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents.
Rows correspond to documents in the collection & columns correspond to terms.
Create a document-term matrix that measures the frequency of the words that remain after the corpus “cleanup” discussed in the previous slide.
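A small sketch of such a matrix in Python, assuming three tiny, already cleaned documents (the words are invented for illustration). Each row is a document, each column a term, and each cell a count:

```python
from collections import Counter

def document_term_matrix(documents):
    """Return (terms, matrix): one row per document, one column per term."""
    counts = [Counter(doc) for doc in documents]
    terms = sorted(set().union(*documents))          # column order
    matrix = [[count[term] for term in terms] for count in counts]
    return terms, matrix

# Hypothetical documents, already reduced to their primary words.
docs = [
    ["engineer", "engineer", "rest"],
    ["data", "engineer"],
    ["data", "data", "rest"],
]
terms, matrix = document_term_matrix(docs)
print(terms)       # ['data', 'engineer', 'rest']
for row in matrix:
    print(row)     # [0, 2, 1] then [1, 1, 0] then [2, 0, 1]
```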
You are left with primary outputs that enable you to do counts in each cell.
You’ve datafied or quantified words that others only qualify, which prevents analysis.
You can now do lots of interesting stuff!
Document-term matrix
Cluster analysis of the document-term matrix reveals prevalent themes.
Cluster analysis → review how all your words cluster in your data matrix.
The result of this analysis is that we can reduce our matrix to fewer columns.
Font size & even color are embedded with information. This information is actionable.
For centuries we have manually counted sets of words to determine their frequencies.
Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table.
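One way to see the law: if frequency is inversely proportional to rank, then rank × frequency stays roughly constant. The sketch below uses an invented word list built to follow the law exactly; real text only approximates it:

```python
from collections import Counter

def rank_frequency(words):
    """Return rows of (rank, word, count, rank * count), most frequent word first."""
    ranked = Counter(words).most_common()
    return [
        (rank, word, count, rank * count)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]

# Invented corpus: counts of 12, 6, 4 and 3 follow Zipf's law exactly.
words = ["the"] * 12 + ["of"] * 6 + ["data"] * 4 + ["big"] * 3
for row in rank_frequency(words):
    print(row)
# (1, 'the', 12, 12)
# (2, 'of', 6, 12)
# (3, 'data', 4, 12)
# (4, 'big', 3, 12)
```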
Used for resumes as a way to increase information density (to be covered at a future webinar).
• With these data sets, we can run sentiment analysis!
• Determine the occurrence rate of certain themes qualified as opinions.
• To determine if people like a restaurant, we’d look at the words reviewers used via social media in the comment section.
Love: 10
Hate: -10
Dislike: -7
Qualitatively, we quantify the weakness or strength of these signals.
We determine words that correlate to having disliked or liked the movie and to what degree along a predetermined discrete continuum.
Pre-established words in narrative responses, now embedded in clusters, signal positive or negative statements about a movie, restaurant or Hammacher Schlemmer customer review.
Like: 7
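A minimal sketch of scoring a review with the four words and weights shown on the slide (Love 10, Like 7, Dislike -7, Hate -10). The sample reviews are invented, and a real lexicon would hold many more words:

```python
import string

# Word weights taken from the slide; everything else scores 0.
LEXICON = {"love": 10, "like": 7, "dislike": -7, "hate": -10}

def sentiment_score(review):
    """Sum the weights of every lexicon word found in the review."""
    cleaned = review.lower().translate(str.maketrans("", "", string.punctuation))
    return sum(LEXICON.get(word, 0) for word in cleaned.split())

print(sentiment_score("I like it, I love it!"))               # 17
print(sentiment_score("I dislike the noise"))                 # -7
print(sentiment_score("I love the food but hate the wait."))  # 0
```

The last review shows the limit of simple word counting: a mixed opinion nets out to zero.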
The difference between analog and digital signals is that an analog signal is a continuous electrical message while a digital signal is a series of values that represent information.
To determine what traits can predict future outcomes, look at historical data.
Correlate “judgements” to see if they can predict from groupings, meaning which ones predict against another dataset.
This is cross validation and is determined by looking at historical data sets.
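A bare-bones sketch of k-fold cross validation, assuming a made-up historical data set and a deliberately simple model (a straight line through the origin). Each fold is held back in turn, the model is fitted on the rest, and the held-back points measure how well it predicts data it has not seen:

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices); every item is held out exactly once."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def fit_slope(xs, ys):
    """Least-squares slope for the simple model y = slope * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def cross_validate(xs, ys, k=3):
    """Average absolute prediction error on the held-out points."""
    errors = []
    for train, test in k_fold_splits(len(xs), k):
        slope = fit_slope([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(abs(ys[i] - slope * xs[i]) for i in test)
    return sum(errors) / len(errors)

# Invented historical data in which the outcome is exactly twice the trait.
trait = [1, 2, 3, 4, 5, 6]
outcome = [2, 4, 6, 8, 10, 12]
print(cross_validate(trait, outcome))  # 0.0
```

An error near zero on held-out data suggests the trait really does predict the outcome; a large error suggests the pattern does not carry over.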
Master Algorithms script other algorithms on an as-needed basis, free of human interaction.
Machine to machine (M2M) technology enables networked devices to exchange information & perform actions without the manual assistance of humans.
This is what is replacing traditional managerial jobs.
Firms that still employ these types of jobs feel less pressure to keep salaries at pace with inflation over time.
Machine learning can test statistical models, for example by testing against known political party membership & updating the algorithm as new data comes in.
In M2M, we let data points come in, refresh & update to automatically script an even more accurate algorithm.
Can infer your political affiliation from your first 19 likes, even if those likes are completely apolitical.
What Can I Do Tomorrow Morning at the Office?
1. Take inventory of the data you already collect
A. Internal data.
B. External data accessed via the FOI Act (to be discussed subsequently).
C. External data legally purchased from vendors (Yelp, FB, DoubleClick, etc.).
D. Create a glossary of data definitions (headcount example).
2. Determine decisions to derive from Big Data
A. Select the most pressing problem based on the Pareto 80/20 rule.
B. In plain English, state your problem statement.
C. Write down independent variables (inventory the set of data at your disposal).
D. Determine the dependent variable (preferred outcome to your problem statement).
3. Write down your hypothesis
4. Contact your IT or data science department. If not …
5. Contract STEM grad students & turn them into data scientists
6. Code your hypothesis
Even if I hate coding and math!
Quantitative Skills
The Freedom of Information Act (FOIA), 5 U.S.C. § 552, is a federal law that allows for disclosure of previously unreleased information controlled by the US government.
Correlate to external data with troves of data from the US gov’t. (Examples: MTA apps!)
Enacted in 1966, it allows U.S. citizens to petition the government for official information.
State the business problem you are trying to solve in plain language as a problem statement.
State it as a hypothesis.
Collect data from systems already set in place.
Test the hypothesis.
Coding is the new literacy.
Coding classes: most are on-line, a few on-site. Some are free & some at cost.
Most of you will not be competing with other coders, just other Marketing, HR or Financial professionals who know nothing about coding!
Should I learn to read?
Should I learn how to use the internet?
Should I learn about coding?
A little about R
• R is free.
• Contains embedded tools to pull external data.
• Tools that scrape data from websites (Reuters, as one example).
• Text mining: KNIME is another software tool for text mining that you can download. (Pronounced like “nine” but with an “m”. Has a graphical interface instead of using a scripting language.)
• Remember, a Word Cloud is an example of text mining.
• R was written in the C language. Coders wrote functions in C to create macros in R to pull data, analogous to a macro in Excel.
• R will let you pull data into a corpus.
KNIME (Konstanz Information Miner) → open source data analytics, reporting & integration platform. It integrates various components for machine learning & data mining.
You’re not competing against other coders.
You’re competing against others in your field that know
nothing about coding.
Facebook accomplished what democratic gov’ts tried but failed to do: build a database of citizens.
Datafication takes all aspects of life & turns them into data.
Google’s augmented reality glasses datafy the gaze.
Twitter datafies stray thoughts.
LinkedIn datafies professional networks.
The Floor as a Giant iPad via surface-based computing technology
No thank you, I’m just looking.
That’s okay, I’m datafying your every move.
Touch-sensitive floor customers walk on
Download a Coupon Texted to You
• What aisles did you walk down or ignore?
• In what sequence did you browse the aisles?
• How long were you in the store?
• What is length of time between store visits?
• How long did you linger in front of the cereal aisle?
• When you checked out, did cereal wind up in your cart? How many boxes?
Compare viewing patterns with what wound up in your shopping cart.
Script algorithms to better predict the dependent variable of revenue thresholds from the independent variables (what they stock).
So, what’s my role again in a Big Data World?
As Big Data becomes ubiquitous, what skills mark points of differentiation?
• Discovering latent needs & intuition that goes against the facts?
• The mere ability to define a problem precedes its solution.
Big Data has a quantitative & qualitative side
And if you hate math, there are qualitative skills to harness:
• Develop observational skills to separate the signal from the noise
• Take inventory of existing data
• Learn to develop hypotheses to test
• Learn how to access external data (FOIA, LinkedIn, etc.)
• Liaise between internal ERP data & external data
• Network with STEM students to contract data scientists
Your Role in a Big Data World
If Ford had queried BIG DATA to discover what customers wanted, he’d have come up with faster horses that required less water.
In a Big Data world, traits to be developed:
• Creativity
• Intuition
• Intellectual curiosity
• Leveraging errors
• Risk taking
Read outside your discipline
non sequitur
Vincent Suppa © 2016
Don’t be afraid to fail.
Business is not figure skating. It’s the X games!
If you fail, have quick-to-market failures that mitigate loss & allow you to harvest what did work for the next initiative.
Capitalism without failure is like a religion without sin.
Recommended Courses
NetCom Learning offers a comprehensive portfolio of Big Data training options. Please see below the list of recommended courses with upcoming schedules:
Introduction to Python Programming
Essential Python
Introduction to Python Scripting: for the Security Analyst
Check out more Big Data training options with NetCom Learning.
Our live webinars will help you to touch base on a wide variety of IT, soft skills and business productivity topics, and keep you up to date on the latest IT industry trends. Register now for our upcoming webinars:
A Brief on Benefits of ITIL for the Organization – April 4
Visualization with Tableau to Enhance Efficiency in Organization – April 6
How Machine Learning Helps Organizations to Work More Efficiently? – April 11
Why Certified Associate in Project Management (CAPM) and How to Prepare? - April 18
A Brief About DevOps and its Practices – April 20
Special Promotion
Whether you're learning new IT or Business skills, or you are developing
a learning plan for your team, for limited time, register for our
Guarantee to Run classes and get 25% off on the course price.
To get latest technology updates, please follow our social media pages!
THANK YOU !!!

More Related Content

PDF
Big data is not just about the data
PDF
ANALYTICS OF DATA USING HADOOP-A REVIEW
PDF
ambient-computing
PDF
Creating Value in Health through Big Data
PPTX
The What, Why and How of Big Data
PPTX
Data Science: Not Just For Big Data
PDF
JIMS Rohini IT Flash Monthly Newsletter - October Issue
PDF
Data Science For Social Scientists Workshop
Big data is not just about the data
ANALYTICS OF DATA USING HADOOP-A REVIEW
ambient-computing
Creating Value in Health through Big Data
The What, Why and How of Big Data
Data Science: Not Just For Big Data
JIMS Rohini IT Flash Monthly Newsletter - October Issue
Data Science For Social Scientists Workshop

What's hot (20)

PDF
Using AI to Solve Data and IT Complexity -- And Better Enable AI
PDF
Big data analytics 1
PDF
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix
PDF
2019 June 27 - Big data and data science
PDF
Less is More: Behind the Data at Risk I/O
PPT
Big data v4.0
PPTX
Big Data and the Social Sciences
PDF
Mac201 big data
PDF
Booz Allen Field Guide to Data Science
PDF
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
PPT
Data validation in the Digital Age
PDF
The Next Big Thing in Big Data
PDF
Introduction on Data Science
PDF
Data Science 101
PPTX
2014 aus-agta
PDF
Data Science and Culture
PPTX
Big Data and Data Science: The Technologies Shaping Our Lives
PDF
REST and eHealth
PDF
2951085 dzone-2016guidetobigdata
Using AI to Solve Data and IT Complexity -- And Better Enable AI
Big data analytics 1
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix
2019 June 27 - Big data and data science
Less is More: Behind the Data at Risk I/O
Big data v4.0
Big Data and the Social Sciences
Mac201 big data
Booz Allen Field Guide to Data Science
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data validation in the Digital Age
The Next Big Thing in Big Data
Introduction on Data Science
Data Science 101
2014 aus-agta
Data Science and Culture
Big Data and Data Science: The Technologies Shaping Our Lives
REST and eHealth
2951085 dzone-2016guidetobigdata
Ad

Similar to BIG DATA | How to explain it & how to use it for your career? (20)

PDF
Level Seven - Expedient Big Data presentation
PPTX
Big Data
PPTX
Class_5_Data_2018W_pptx.pptx
PPT
01-introduction.ppt the paper that you can unless you want to join me because...
PPTX
Bigdata Hadoop introduction
PDF
Intro big data.pdf
PDF
Big databigideasit4bc
PDF
Ictam big data
PPTX
Foundations of Big Data: Concepts, Techniques, and Applications
PDF
Intro to Data Science
PPTX
Big Data Analytics_Unit1.pptx
PPTX
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
PPT
Big data
DOCX
5_6060001354879861968.docx
PDF
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
PPT
Big Data and data analytics ,Business Intelligence/Analytics
PPT
Big Data and Data Analytics,Business Intelligence/Analytics
PPTX
bigdata introduction for students pg msc
PPTX
Spark Social Media
Level Seven - Expedient Big Data presentation
Big Data
Class_5_Data_2018W_pptx.pptx
01-introduction.ppt the paper that you can unless you want to join me because...
Bigdata Hadoop introduction
Intro big data.pdf
Big databigideasit4bc
Ictam big data
Foundations of Big Data: Concepts, Techniques, and Applications
Intro to Data Science
Big Data Analytics_Unit1.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
Big data
5_6060001354879861968.docx
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Big Data and data analytics ,Business Intelligence/Analytics
Big Data and Data Analytics,Business Intelligence/Analytics
bigdata introduction for students pg msc
Spark Social Media
Ad

More from Tuan Yang (20)

PDF
Learn How to Configure Cisco Data Center Core Networking(Handouts).pdf
PDF
Best Practices to Cybersecurity Vulnerability Management,.pdf
PDF
Defense Against Multi-Network Breaches.pdf
PDF
Cybersecurity Incident Handling & Response in Under 40 Minutes.pdf
PDF
An Introduction to CompTIA Security+ - SY0-601.pdf
PDF
CCNP Enterprise Networks Move One Step Closer to Advanced Networking(Handout)...
PDF
What is New with CompTIA Network+.pdf
PDF
What is new with CompTIA PenTest+- PT0 002 - NetCom Learning.pdf
PDF
Agile Fundamentals One Step Guide for Agile Projects(Handout).pdf
PDF
Getting Started with AWS Devops.pdf
PDF
Certified Ethical Hacker v11 First Look.pdf
PDF
An overview of agile methods and agile project management
PDF
The essentials of ccna master the latest principles(handouts)
PDF
Unlock the value of itil 4 with 5 key takeaways that can be used today(handout)
PDF
CHFI First Look by NetCom Learning - A Free Course on Digital Forensics
PDF
Master Class: Understand the Fundamentals of Architecting on AWS
PDF
How to Deploy Microsoft 365 Apps and Workloads.
PDF
Learn to utilize cisco unified communications for better collaboration( hando...
PDF
NetCom learning webinar how to manage your projects with disciplined agile (d...
PDF
NetCom learning webinar cnd first look by netcom learning - network defender fre
Learn How to Configure Cisco Data Center Core Networking(Handouts).pdf
Best Practices to Cybersecurity Vulnerability Management,.pdf
Defense Against Multi-Network Breaches.pdf
Cybersecurity Incident Handling & Response in Under 40 Minutes.pdf
An Introduction to CompTIA Security+ - SY0-601.pdf
CCNP Enterprise Networks Move One Step Closer to Advanced Networking(Handout)...
What is New with CompTIA Network+.pdf
What is new with CompTIA PenTest+- PT0 002 - NetCom Learning.pdf
Agile Fundamentals One Step Guide for Agile Projects(Handout).pdf
Getting Started with AWS Devops.pdf
Certified Ethical Hacker v11 First Look.pdf
An overview of agile methods and agile project management
The essentials of ccna master the latest principles(handouts)
Unlock the value of itil 4 with 5 key takeaways that can be used today(handout)
CHFI First Look by NetCom Learning - A Free Course on Digital Forensics
Master Class: Understand the Fundamentals of Architecting on AWS
How to Deploy Microsoft 365 Apps and Workloads.
Learn to utilize cisco unified communications for better collaboration( hando...
NetCom learning webinar how to manage your projects with disciplined agile (d...
NetCom learning webinar cnd first look by netcom learning - network defender fre

Recently uploaded (20)

PDF
COST SHEET- Tender and Quotation unit 2.pdf
PDF
Dr. Enrique Segura Ense Group - A Self-Made Entrepreneur And Executive
PDF
Stem Cell Market Report | Trends, Growth & Forecast 2025-2034
PDF
Traveri Digital Marketing Seminar 2025 by Corey and Jessica Perlman
PDF
kom-180-proposal-for-a-directive-amending-directive-2014-45-eu-and-directive-...
PDF
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi
PPTX
HR Introduction Slide (1).pptx on hr intro
PDF
BsN 7th Sem Course GridNNNNNNNN CCN.pdf
DOCX
unit 1 COST ACCOUNTING AND COST SHEET
PPTX
Lecture (1)-Introduction.pptx business communication
PDF
Katrina Stoneking: Shaking Up the Alcohol Beverage Industry
PPTX
ICG2025_ICG 6th steering committee 30-8-24.pptx
PDF
SIMNET Inc – 2023’s Most Trusted IT Services & Solution Provider
PPTX
5 Stages of group development guide.pptx
PPTX
job Avenue by vinith.pptxvnbvnvnvbnvbnbmnbmbh
PPTX
Belch_12e_PPT_Ch18_Accessible_university.pptx
PDF
Deliverable file - Regulatory guideline analysis.pdf
DOCX
unit 2 cost accounting- Tender and Quotation & Reconciliation Statement
PDF
WRN_Investor_Presentation_August 2025.pdf
DOCX
Business Management - unit 1 and 2
COST SHEET- Tender and Quotation unit 2.pdf
Dr. Enrique Segura Ense Group - A Self-Made Entrepreneur And Executive
Stem Cell Market Report | Trends, Growth & Forecast 2025-2034
Traveri Digital Marketing Seminar 2025 by Corey and Jessica Perlman
kom-180-proposal-for-a-directive-amending-directive-2014-45-eu-and-directive-...
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi
HR Introduction Slide (1).pptx on hr intro
BsN 7th Sem Course GridNNNNNNNN CCN.pdf
unit 1 COST ACCOUNTING AND COST SHEET
Lecture (1)-Introduction.pptx business communication
Katrina Stoneking: Shaking Up the Alcohol Beverage Industry
ICG2025_ICG 6th steering committee 30-8-24.pptx
SIMNET Inc – 2023’s Most Trusted IT Services & Solution Provider
5 Stages of group development guide.pptx
job Avenue by vinith.pptxvnbvnvnvbnvbnbmnbmbh
Belch_12e_PPT_Ch18_Accessible_university.pptx
Deliverable file - Regulatory guideline analysis.pdf
unit 2 cost accounting- Tender and Quotation & Reconciliation Statement
WRN_Investor_Presentation_August 2025.pdf
Business Management - unit 1 and 2

BIG DATA | How to explain it & how to use it for your career?

  • 1. BIG DATA | How to explain it & how to use it for your career?
  • 3. NetCom Learning – Managed Learning Services
  • 5. Today’s Agenda If you ask people what BIG DATA is they often say it is about a lot of data. But the world has ALWAYS had a lot of data! It is about datafication – a word so new that even spellcheck functions don’t know it’s a real word! Today’s Agenda  How BIG DATA changes career paths of even the most unsuspecting!  How BIG DATA changes the way business decision are made.  How BIG DATA changes who makes the decisions & the reshuffling balance of power.  What BIG DATA skills can you bring to the office tomorrow to increase your value.
  • 6. The experienced Data scientists & those managers who leverage them. BIG DATA is a management tool even if you have other employees perform the coding. BIG DATA is as ubiquitous as the internet. Gut instinct now of less value
  • 7. Datafication A modern technological trend turning many aspects of our life into computerized data that transforms respective information into new forms of value.
  • 10. A Metaphor / Illustration
  • 12. activity or purpose natural to or intended for a person or thing. relationship or expression involving one or more variables.
  • 15. Just as voice mail and email obviated the manager’s need of secretarial functions  algorithms eating BIG DATA are now obviating tactical managerial functions. Transactional Work Tactical Work
  • 16. Strategy needs to consume data. Data, without strategy, has little value.
  • 17. Modified sine wave Sine wave What is the difference between analogue and digital? Datafication only possible due to digitalization of analogue informaton.
  • 19. Interprets continuous sine wave as a digital recreation.
  • 28. This photo was taken on film – not a digital camera.
  • 30. Are there data points within this “single” data point? Social Construct
  • 33. Big data: broad term for data sets so large & complex that traditional data processing applications are inadequate.
  • 34. A terabyte, petabyte & gigabyte walk into a bar...
  • 36. Yottabyte is 1,000 trillion gigabytes Giga Tera Peta Exa Zetta Yotta Mega Kilo
  • 37. The Least You Need to Know About BIG DATA BIG DATA manifests 3 basic shifts:  From Small to All  Clean to Messy  Causation to Correlation V. Suppa The Definitive 90 Thousand Foot Lecture on BIG Data© 2014
  • 38. Scope of Traditional Data  Data growth analogous to y = tan x.  In 2000, ¼ of world’s information digital; reminder preserved in analog.  digital data doubles around every 3 years  In 2014 less than 2% of all stored information is analog. (And now we’re in 2017!) The Definitive 90 Thousand Foot Lecture on BIG Data© 2014
  • 39. Big Data is Not About Lots of Data  Lots of data existed before Big Data!  Big Data: ability to render aspects of life into data points never quantified before.  This is DATAFICATION … your new word of the day! V.The Definitive 90 Thousand Foot Lecture on BIG Data© 2014
  • 40. DATAFICATION . V.pa The Definitive 90 Thousand Foot Lecture on BIG Data© 2014 Location was datafied before GPS was invented  Words treated as data.  Friendships & likes datafied, via Facebook
  • 41.  Shigeomi Koshimizu datafied body contour (body, posture, weight distribution, etc.).  Quantified “sitting down.” Measured pressure drivers exert at 360 different points via sensors (0 to 256 scale). Quality  Quantify Datafication Turns Everything into a Data Point
  • 42. Tools of Datafication  inexpensive computers (commodity)  powerful processors (commodity)  basic statistics (commodity)  clever software (commodity)  smart algorithm (differentiator)
  • 43. Lots of Data versus BIG DATA Computers computing lots of data: Teaching computer to translate by inputting bilingual dictionaries Computers computing BIG DATA Feed computer years of Canadian parliamentary transcripts French / English) Statically program it to infer which word of English is best alternative to French The Definitive 90 Thousand Foot Lecture on BIG Data© 2014 In context, French word lumiere more appropriate substitute for the English work light than leger. Isn’t this how a person translates?
  • 44. A Quick Review & then … Causation to Correlation  sampling population  entire population  pristine data  non curated messy data  causation  correlation Reasons on how the world works replaced with learning about association among phenomena  Knowing cause “is” desirable.  But cause is harder to figure out  Cause as illusion? Cognitive bias V The Definitive 90 Thousand Foot Lecture on BIG Data© 2014
  • 45. Saving Trucks Saving Babies Saving Epidemics Saving Buildings
  • 46. Causation to Correlation. Place sensors on parts to identify heat & vibrational patterns associated with failures leading to breakdowns. This can predict a breakdown before it happens, so parts are replaced in the garage & not on the side of the road. The data does not tell us why the part is in trouble; it reveals enough to know the what, and it can guide investigations into discovering the underlying cause.
  • 47. When saving lives, knowing something is likely to occur is more important than knowing why. Eventually, “the why” will be investigated.
  • 48. Can Big Data Save Babies? Big Data was used to spot infections in premature babies before symptoms appear. Information flow: >1000 data points per second. Discovered correlations between very minor changes and more serious problems.
  • 49. Big Data Predicts Epidemics Better than CDC. CDC tracks patient visits to clinics, but the information suffers from a 2 week reporting lag. Google took the 50 million most commonly searched terms from 2003 – 2008 and compared them against historical influenza data from CDC. Searches were then correlated with CDC’s data on outbreaks of flu.
  • 50. How All Three Shifts Are Illustrated. Small to All: ran 100% of US searches for 6 years through an algorithm that identified 45 searches correlated with CDC data on flu outbreaks (runny nose, body aches, etc.). Clean to Messy: searches were imperfect, with misspellings & incomplete phrases, & included healthy people searching on behalf of others. Causation to Correlation: will anyone claim typing symptoms in a search engine gives you the flu? Big Data via searches predicts outbreaks in real time, whereas CDC’s traditional data analytics lag by 2 weeks.
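
The search-term screening on the two slides above can be sketched in a few lines. This is only an illustration, not Google's actual method: the weekly numbers are invented toy data, and `pearson` and `rank_terms` are hypothetical helper names. It scores each search term by how closely its weekly volume tracks reported flu cases and keeps the best-correlated terms.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / math.sqrt(var_x * var_y)


def rank_terms(search_counts, flu_cases, top_n=2):
    """Return the top_n (term, correlation) pairs, strongest correlation first."""
    scored = [(term, pearson(counts, flu_cases))
              for term, counts in search_counts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Invented toy data: six weeks of flu cases and search volumes (assumption).
flu_cases = [10, 20, 40, 80, 40, 20]
search_counts = {
    "runny nose": [12, 22, 41, 79, 43, 18],
    "body aches": [5, 11, 19, 42, 21, 9],
    "pizza delivery": [30, 31, 29, 30, 31, 29],
}

for term, r in rank_terms(search_counts, flu_cases):
    print(f"{term}: r = {r:.2f}")
```

Note that the ranking says nothing about why a term tracks flu cases; it only surfaces the association, which is the slide's point about correlation replacing causation.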
  • 51. Illegally subdivided buildings are more likely to catch fire. There were 200 inspectors to respond to 25K complaints / year about overcrowded buildings.
  • 52. NYC created database of 900K buildings augmented by troves of data collected by 19 agencies: • Records of tax liens • Anomalies in utility usage • Service cuts • Missed payments • Ambulance visits • Local crime rates • Rodent complaints • Etc. Big Data increases the productivity of each inspector
  • 53. How Did They Do It? 1. Compared the database against 5 years of building fires. 2. Ranked by severity. 3. Observed correlation. (Not causality!) 4. Data scientists triaged complaints for inspections. Concluded that a building’s type & age were the main predictors of fire, with other variables superfluous, and that a permit for exterior brickwork correlated with a lower risk of fire. Result: the share of inspections ending in vacate orders increased from 13% to 70%. Building characteristics did not cause fire but were correlated with fire risk.
  • 54. Spending money on the exterior correlates with an up-to-code interior. But just the intent to begin work correlates enough to predict an outcome.
  • 56. Steps 1 to 3. Pull disparate sets of texts & put them into a “point of singularity.” Currently about 70% of data is text; pictures are to be quantified under separate protocols. Create a corpus, a body of text to be analyzed. R, for example, has a set of functions to clean up a corpus by excluding data points superfluous to the analysis (deleting commas, periods & words such as but & and), reducing the corpus to the primary words crucial to analysis. Truncate words with a common stem; this is called stemming (e.g. engineer & engineering both become the same word). Think of the mathematical analogy of number factoring versus least common denominator.
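
The slide describes this cleanup in R; purely as an illustration, here is a minimal Python sketch of the same idea. The stop-word list, the suffix list and the names `naive_stem` and `clean` are assumptions made for this example, and the suffix stripping is a deliberately crude stand-in for a real stemmer such as Porter's.

```python
import re

# Assumed, deliberately tiny stop-word and suffix lists (illustrative only).
STOP_WORDS = {"a", "an", "and", "but", "the", "of", "to", "is", "in"}
SUFFIXES = ("ing", "ers", "er", "ed", "s")


def naive_stem(word):
    """Strip known suffixes repeatedly, keeping at least three letters."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


def clean(text):
    """Lowercase, drop punctuation and stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


print(clean("The engineer, and the engineering!"))
```

With these rules, engineer and engineering collapse to the same stem, which is the behaviour the slide asks for; a production stemmer handles many more cases.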
  • 57. Document-term matrix. Step 4: create a document-term matrix, a mathematical matrix that describes the frequency of terms that occur in a collection of documents. Rows correspond to documents in the collection & columns correspond to terms. It measures the frequency of the words that remain after the corpus “cleanup” discussed in the previous slide. You are left with primary outputs that enable you to do counts in each cell. You’ve datafied, or quantified, words that others only qualify, which prevents analysis. You can now do lots of interesting stuff! Cluster analysis of the document-term matrix reveals prevalent themes.
  • 58. Cluster analysis: review how all your words cluster in your data matrix. The result of this analysis is that we can reduce our matrix to fewer columns. Font size & even color are embedded with information. This information is actionable.
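
A document-term matrix as described above can be built with nothing more than a counter. The sketch below assumes the documents have already been cleaned into lists of tokens; `document_term_matrix` is a hypothetical name chosen for this example.

```python
from collections import Counter


def document_term_matrix(docs):
    """Build (terms, matrix): one row per document, one column per term.

    docs is a list of token lists; each cell holds how often the column's
    term occurs in the row's document.
    """
    terms = sorted({token for doc in docs for token in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[term] for term in terms])
    return terms, matrix


# Two tiny, already-cleaned example documents (assumption).
docs = [["data", "engine", "data"], ["engine", "cloud"]]
terms, matrix = document_term_matrix(docs)
print(terms)
for row in matrix:
    print(row)
```

Sorting the vocabulary keeps the column order stable between runs, which makes the rows comparable when documents are added later.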
  • 59. For centuries we have manually counted sets of words to determine their frequencies. Zipf's law states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Used for resumes as a way to increase information density – to be covered at a future webinar.
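
If frequency is inversely proportional to rank, then rank times frequency should stay roughly constant down the frequency table. The sketch below checks that on a contrived token list built to follow the law exactly; real corpora only approximate it. `rank_frequency` is a hypothetical helper name.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count, rank * count) rows, most frequent word first."""
    ordered = Counter(tokens).most_common()
    return [(rank, word, count, rank * count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Contrived tokens with counts 6, 3 and 2, i.e. exactly 6 / rank (assumption).
tokens = ["the"] * 6 + ["of"] * 3 + ["and"] * 2

for rank, word, count, product in rank_frequency(tokens):
    print(rank, word, count, product)
```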
  • 60. With these data sets, we can run sentiment analysis! Determine the occurrence rate of certain themes qualified as opinions. To determine if people like a restaurant, we’d look at the words reviewers used via social media in the comment section, for example: Love 10, Like 7, Dislike -7, Hate -10. Qualitatively, we quantify the weakness or strength of these signals. We determine which words correlate with having disliked or liked the movie, and to what degree, along a predetermined discrete continuum. Pre-established words in narrative responses, now embedded in clusters, signal positive or negative statements about a movie, restaurant or Hammacher Schlemmer customer review.
  • 61. The difference between analog and digital signals is that an analog signal is a continuous electrical signal, while a digital signal is a series of discrete values that represent information.
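
The word scores on this slide (Love 10, Like 7, Dislike -7, Hate -10) are enough for a toy lexicon-based scorer. The sketch below sums the scores of the words found in a review; the function name and the sample reviews are invented for illustration, and real sentiment analysis also has to handle negation, sarcasm and context.

```python
import re

# Scores taken from the slide; any word not listed counts as neutral (0).
LEXICON = {"love": 10, "like": 7, "dislike": -7, "hate": -10}


def sentiment_score(review):
    """Sum the lexicon scores of the whole words that appear in the review."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


print(sentiment_score("We like the menu"))
print(sentiment_score("I dislike the wait"))
print(sentiment_score("I love the pasta but hate the noise"))
```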
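
To make the analog versus digital distinction concrete, the sketch below samples a sine wave at fixed intervals and rounds each sample to one of a fixed number of integer levels. The frequency, sample rate and the 256-level depth are arbitrary choices for illustration, and `sample_and_quantize` is a hypothetical name.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, levels):
    """Sample sin(2*pi*f*t) and map each sample from [-1, 1] to an integer code.

    Codes run from 0 to levels - 1, so the continuous wave becomes a finite
    series of discrete values.
    """
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                         # time of the n-th sample
        value = math.sin(2 * math.pi * freq_hz * t)    # "analog" value
        codes.append(round((value + 1) / 2 * (levels - 1)))
    return codes


# One cycle of a 1 Hz wave, sampled 8 times, stored as 8-bit codes (assumption).
print(sample_and_quantize(freq_hz=1, sample_rate_hz=8, n_samples=8, levels=256))
```

More samples per second and more levels give a closer approximation of the original wave, at the cost of more data.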
  • 62. To determine what traits can predict future outcomes, look at historical data. Correlate “judgements” to see if they can predict from groupings, meaning which ones predict against another dataset. This is cross validation, and it is determined by looking at historical data sets. Master algorithms script other algorithms on an as-needed basis, free of human interaction. Machine to machine (M2M) technology enables networked devices to exchange information & perform actions without the manual assistance of humans. This is what is replacing traditional managerial jobs. Firms that still employ these types of jobs feel less pressure to keep salaries at pace with inflation over time.
  • 63. Machine learning can test statistical models, for example by testing against known political party membership & updating the algorithm as new data comes in. In M2M, we let data points come in, refresh & update to automatically script an even more accurate algorithm. It can infer your political affiliation from your first 19 likes, even if those likes are completely apolitical.
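
Cross validation, as described on this slide, means fitting on one part of the historical data and checking the predictions against a part that was held back. Below is a minimal k-fold sketch under stated assumptions: the "model" is just the majority outcome per trait, the records are invented, and every function name is hypothetical.

```python
from collections import Counter, defaultdict


def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k folds; requires n_items >= k."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def fit_majority(train):
    """Learn the most common outcome seen for each trait in the training data."""
    by_trait = defaultdict(Counter)
    for trait, outcome in train:
        by_trait[trait][outcome] += 1
    return {trait: c.most_common(1)[0][0] for trait, c in by_trait.items()}


def cross_validate(records, k):
    """Return mean accuracy over k folds for (trait, outcome) records."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(records), k):
        model = fit_majority([records[i] for i in train_idx])
        hits = sum(model.get(records[i][0]) == records[i][1] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Invented historical records: (trait, outcome) pairs (assumption).
records = [("a", 1), ("b", 0)] * 4
print(cross_validate(records, k=4))
```

A score near 1.0 on held-back data suggests the trait really does predict the outcome in this history; a score near chance suggests it does not.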
  • 64. What Can I Do Tomorrow Morning at the Office? Even if I hate coding and math! 1. Take inventory of the data you already collect: A. Internal data. B. External data accessed via the FOI Act (to be discussed subsequently). C. External data legally purchased from vendors (Yelp, FB, DoubleClick, etc.). D. Create a glossary of data definitions (headcount example). 2. Determine the decisions to derive from Big Data: A. Select the most pressing problem based on the Pareto 80/20 rule. B. State your problem statement in plain English. C. Write down the independent variables (inventory the set of data at your disposal). D. Determine the dependent variable (the preferred outcome to your problem statement). 3. Write down your hypothesis. 4. Contact your IT or data science department. If not … 5. Contract STEM grad students & turn them into data scientists. 6. Code your hypothesis. Quantitative skills.
  • 65. The Freedom of Information Act (FOIA), 5 U.S.C. § 552, is a federal law that allows for disclosure of previously unreleased information controlled by the US government. Enacted in 1966, it allows U.S. citizens to petition the government for official information. Correlate your own data with troves of external data from the US gov’t (examples: MTA apps).
  • 66. State the business problem you are trying to solve in plain language as a problem statement. State it as a hypothesis. Collect data from systems already set in place. Test the hypothesis.
  • 68. Coding is the new literacy. Coding classes: most are online, a few on-site; some are free & some at cost. Most of you will not be competing with other coders – just other Marketing, HR or Financial professionals who know nothing about coding!
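
As one possible reading of “code your hypothesis,” the sketch below fits a straight line relating a single independent variable to a dependent variable by ordinary least squares. The variable names and numbers are invented for illustration; a real analysis would use your own inventory of data and, usually, more than one independent variable.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("independent variable must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical hypothesis: more training hours go with more tickets closed.
training_hours = [1, 2, 3]   # independent variable (invented)
tickets_closed = [3, 5, 7]   # dependent variable (invented)

slope, intercept = fit_line(training_hours, tickets_closed)
print(f"tickets = {slope:.1f} * hours + {intercept:.1f}")
```

A positive slope is consistent with the hypothesis but, as the earlier slides stress, it shows correlation rather than cause.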
  • 69. Should I learn to read? Should I learn how to use the internet? Should I learn about coding?
  • 72. A little about R • R is free. • Contains embedded tools to pull external data. • Tools that scrape data from websites (Reuters, as one example). • Text mining: KNIME is another software tool for text mining that you can download (pronounced like “nine” but with an “m”; it has a graphical interface instead of using a scripting language). • Remember, word clouds are an example of text mining. • R was written largely in the C language; coders wrote functions in C that R calls to pull data, analogous to a macro in Excel. • R will let you pull data into a corpus. KNIME (Konstanz Information Miner) is an open source data analytics, reporting & integration platform. It integrates various components for machine learning & data mining.
  • 73. You’re not competing against other coders. You’re competing against others in your field that know nothing about coding.
  • 74. Facebook accomplished what democratic gov’t tried but failed to do – build a database of citizens.
  • 75. Datafication takes all aspects of life & turns them into data. Google’s augmented reality glasses datafy the gaze. Twitter datafies stray thoughts. LinkedIn datafies professional networks.
  • 76. The Floor as a Giant iPad via surface based computing technology: a touch sensitive floor that customers walk on. “No thank you, I’m just looking.” “That’s okay, I’m datafying your every move.”
  • 77. Download a Coupon Texted to You • What aisles did you walk down or ignore? • In what sequence did you browse the aisles? • How long were you in the store? • What is the length of time between store visits? • How long did you linger in front of the cereal aisle? • When you checked out, did cereal wind up in your cart? How many boxes? Compare viewing patterns with what wound up in your shopping cart. Script algorithms that better relate the independent variables (what the store stocks) to the dependent variable of revenue thresholds.
  • 78. So, what’s my role again in a Big Data World? As Big Data becomes ubiquitous, what skills mark points of differentiation? Discovering latent needs & intuition that goes against the facts? The mere ability to define a problem precedes its solution. Big Data has a quantitative & a qualitative side. And if you hate math, there are qualitative skills to harness: develop observational skills to separate signal from the noise; take inventory of existing data; learn to develop hypotheses to test; learn how to access external data (FOIA, LinkedIn, etc.); liaise between internal ERP data & external data; network with STEM students to contract data scientists.
  • 79. Your Role in a Big Data World. If Ford had queried BIG DATA to discover what customers want, he’d have come up with faster horses that required less water. In a Big Data world, traits to be developed: creativity; intuition; intellectual curiosity; leveraging errors; risk taking. Read outside your discipline.
  • 81. non sequitur Vincent Suppa © 2016 Don’t be afraid to fail. Business is not figure skating. It’s the X games! If you fail, have quick-to-market failures that mitigate loss & allow you to harvest what did work for the next initiative. Capitalism without failure is like a religion without sin.
  • 82. Recommended Courses. NetCom Learning offers a comprehensive portfolio of Big Data training options. Recommended courses: Introduction to Python Programming; Essential Python; Introduction to Python Scripting: for the Security Analyst.

Editor's Notes

  • #41: Words are now treated as data when computers mine a century’s worth of books. Even friendships and “likes” are datafied via Facebook.
  • #44: It can infer the probability that a traffic light is green and not red “or”
  • #82: 65 slides as of October 24 2016 -