The Pros and Cons of Big Data in an
ePatient World
Kent Bottles, MD
Chief Medical Officer, PYA Analytics
ePatient Connections/2013
Philadelphia, Pennsylvania
September 16, 2013
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Big data refers to things one can do at a large scale
that cannot be done at a smaller one, to extract
new insights or create new forms of value, in ways
that change markets, organizations, and the
relationship between citizens and governments.
• Causality is replaced by correlation
• Not knowing why but only what
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Statistics allows richest findings using the smallest
amount of data
• Randomness trumped sample size
• 2007: 300 exabytes of stored data
• 2013: 1,200 exabytes of stored data
• 2013: only 2% is non-digital
Sizing Up Big Data
Steve Lohr, NY Times, June 20, 2013
• Bundle of technologies
– Web pages, browsing habits, sensor signals, social
media, GPS location data, genomic information,
surveillance videos
– Advances in data storage and processing
– Machine learning/AI software to find actionable
correlations from the big data
Sizing Up Big Data
Steve Lohr, NY Times, June 20, 2013
• Philosophy about how decisions should be made
– Decisions based on data and analysis
– Less based on experience and gut intuition
– Eliminates anchoring bias and confirmation bias
• Revolution in measurement
– Digital equivalent of the telescope
– Digital equivalent of the microscope
Big Data
WSJ March 11, 2013
• 1950s 600 megabytes (John Hancock)
• 1960s 807 megabytes (AA Sabre)
• 1970s 80 gigabytes (Fed Express Cosmos)
• 1980s 450 gigabytes (CitiCorp NAIB)
• 1990s 180 terabytes (WalMart)
• 2000s 25 petabytes (Google)
• 2010s 100 petabytes (Facebook)
Big Data
WSJ March 11, 2013
• 1 Bit = Binary Digit
• 8 Bits = 1 Byte
• 1000 Bytes = 1 Kilobyte
• 1000 Kilobytes = 1 Megabyte
• 1000 Megabytes = 1 Gigabyte
• 1000 Gigabytes = 1 Terabyte
• 1000 Terabytes = 1 Petabyte
• 1000 Petabytes = 1 Exabyte
• 1000 Exabytes = 1 Zettabyte
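The unit ladder above can be checked with a few lines of Python. This is a minimal sketch that assumes the decimal (powers of 1000) convention used on the slide; the helper name `convert` is hypothetical and introduced only for illustration.

```python
# Decimal storage units, each 1000 times the previous one (as on the slide).
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]


def convert(value, from_unit, to_unit):
    """Convert between decimal storage units (hypothetical helper)."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    if steps >= 0:
        return value * 1000 ** steps
    return value / 1000 ** (-steps)


# Figures quoted earlier in the deck: 300 exabytes (2007), 1,200 exabytes (2013).
stored_2013_zb = convert(1200, "exabyte", "zettabyte")
growth_factor = 1200 / 300
print(stored_2013_zb, growth_factor)
```

Under these assumptions, the 2013 figure is a little over one zettabyte, and the stored total quadrupled between 2007 and 2013.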
Jeffrey Hammerbacher
http://www.youtube.com/watch?v=OVBZTDREg7c
• All industries are being disrupted
– Moneyball, 538, Large Hadron Collider
• McKinsey: Big Data: The Next Frontier for
Competition
– $338 billion potential annual value to US healthcare
– $165 billion in clinical operations
– $105 billion in research and development
Jeffrey Hammerbacher
http://www.youtube.com/watch?v=OVBZTDREg7c
• Oracle: From Overload to Impact
– Healthcare executives say they are collecting & managing more
business information today than 2 years ago
– Average increase 85% per year
• Frost & Sullivan: US Hospital Health Data
Analytics Market
– 2011: 10% of US hospitals used data analytics tools
– 2016: 50% of US hospitals will use data analytics tools
Jeffrey Hammerbacher on Moneyball
www.youtube.com/watch?v=OVBZTDREg7c
• Triple Crown in MLB: Batting average, RBI, HR
• OPS (on base plus slugging)
• GPA (gross production average)
• TOB (times on base)
• The outcome is how many runs we score and allow; A’s
have Matt Stairs; Need stat that reflects both runs
produced at bat & runs saved by defense
• WAR (“Wins above replacement”)
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• To analyze & understand the world we used to test
hypotheses driven by theories
• Big data discards theories & causality for
correlations
• University of Ontario premature baby studies
• 1,260 data points per second
• Diagnose infections 24 hours before symptoms are apparent
• Very constant vital signs indicate impending
infection
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Google Nature article predicts flu spread in USA
• Compared 50 million search terms with CDC data
on spread of flu from 2003 to 2008
• 450 million different mathematical models
• 45 search terms had strong correlation with spread
of flu
• Google's approach worked during the 2009 H1N1 crisis
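The core idea on this slide, ranking search terms by how strongly they correlate with official flu counts, can be sketched in Python. This is not Google's actual method or data: the weekly series below are invented for illustration, and the term names and the `pearson` helper are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic weekly series, invented for illustration only.
flu_cases = [10, 14, 22, 35, 50, 41, 27, 15]
search_counts = {
    "flu symptoms": [100, 130, 210, 340, 480, 400, 260, 150],
    "cough medicine": [80, 95, 120, 170, 230, 200, 140, 100],
    "basketball scores": [300, 290, 310, 305, 295, 300, 310, 290],
}

# Rank terms by correlation with the reference series and keep the best two.
ranked = sorted(search_counts,
                key=lambda term: pearson(search_counts[term], flu_cases),
                reverse=True)
top_terms = ranked[:2]
print(top_terms)
```

The ranking says only which terms move together with the flu counts, not why, which is the "what, not why" trade-off described earlier in the deck.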
New Tools to Combat Epidemics
Amy O’Leary, NY Times, June 20, 2013
• Google Flu overestimates spread of flu in 2013
• Google Flu does not track new diseases
• BioMosaic
– Combines airline records, disease reports, demographic
data
– Website and iPad app
– Showed 5 counties in Florida, 5 counties in NY were
most at risk from cholera epidemic in Haiti in 2010
New York City’s Office of Policy & Strategic
Planning
• 1 terabyte of data flows into office every day
• 95% success rate in identifying restaurants
dumping cooking oil into sewers
• Doubled the hit rate of finding stores selling
bootleg cigarettes
• Sped removal of trees toppled by Sandy
• Guided building inspectors to increase citation
rate from 13% to 80% for buildings likely to have
catastrophic house fires
Algorithms Mine Public Data
• Atul Butte combined data from 130 studies of
gene activity levels in diabetic & healthy tissue
• Butte identified a new gene associated with Type 2
DM because it stood out in 78 of 130 studies
• Algorithm looking for drugs & diseases that had
opposing effects on gene expression
– Cimetidine for lung adenocarcinomas
– Topiramate for Crohn’s Disease
Algorithms Mine Public Data
• Russ Altman used algorithms to mine Stanford
Translational Research Integrated Database
Environment & FDA adverse event reports
database
• Patients taking SSRI antidepressants and thiazide
are at increased risk for long QT syndrome, a
serious cardiac arrhythmia
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• GPS allows us to establish location quickly,
cheaply, and without requiring specialized
knowledge
• UPS uses geo-loc data from sensors, wireless
modules, and GPS on vehicles
• In 2011, UPS shaved 30 million miles off routes,
saved 3 million gallons of fuel, and avoided 30,000 metric
tons of carbon dioxide emissions
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Datafication of acts of living
• Zeo: large database of sleep patterns
• Asthmapolis: sensor attached to an inhaler tracks location
via GPS to identify environmental triggers
• Fitbit and Jawbone
• iTrem monitors Parkinson’s tremors almost as
well as the tri-axial accelerometer used in
specialized office medical equipment
Big Data for Cancer Care
Ron Winslow, WSJ, March 27, 2013
• ASCO
• Database of hundreds of thousands of patients
• Prototype has collected 100,000 breast cancer
patients from 27 groups who have different EMRs
• “Recognition that big data is imperative for the
future of medicine” Lynn Etheredge
• Less than 5% of adult cancer patients participate
in randomized clinical trials
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Recombinant data
• Danish Cancer Society study on cell phone/cancer
• Cellphone users from 1987 to 1995 (358,403)
• Brain cancer patients (10,729)
• Registry of education and disposable income
• Combining the three databases found no increase in risk of
cancer for those who used cell phones
• Not based on sample size; based on N=all
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Multiple uses of same database
• Data exhaust: digital trail people leave in their
wake
• Google spell-checking system uses bad data to
improve search, autocomplete feature in Gmail,
Google Docs, and translation system
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Paralyzing privacy
– Notice and consent
– Cannot give informed consent for secondary uses
– Anonymization does not work
• AOL 2006: 20 million search queries from 657,000 users. NY
Times identified user number 4417749 as Thelma Arnold
(“My goodness, it’s my whole personal life. I had no idea
somebody was looking over my shoulder”)
• Netflix Prize: 100 million rental records from 500,000 users.
Mother and closeted lesbian in Midwest was reidentified
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Probability and punishment
– Minority Report: People are imprisoned not for what
they did, but for what they are foreseen to do, even
though they never actually commit the crime
– Blue CRUSH (Crime Reduction, Utilizing Statistical
History in Memphis, Tennessee)
– Homeland Security FAST (Future Attribute Screening
Technology)
– Big data, based on correlation, is an unsuitable tool to judge
causality and thus assign individual culpability
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Dictatorship of Data
– Relying on numbers when they are far more fallible
than we think
– Robert McNamara’s body count numbers in Vietnam
– Michael Eisen tried to buy The Making of a Fly on
Amazon in April 2011. Two established sellers were offering
the book for $1,730,045 and $2,198,177. Two-week
escalation to a peak of $23,698,655.93 on April 18
– Unsupervised algorithms priced the books for the two
sellers.
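The escalation described above is what happens when two repricing rules feed on each other. The Python sketch below is illustrative only: the slide does not give the sellers' rules, so the two multipliers and the starting prices are hypothetical values chosen to show the feedback loop.

```python
# Hypothetical repricing rules, chosen only to illustrate the feedback loop:
# seller A slightly undercuts seller B; seller B marks up over seller A.
UNDERCUT = 0.998   # assumption, not taken from the slide
MARKUP = 1.27      # assumption, not taken from the slide


def simulate(price_a, price_b, rounds):
    """Return the (price_a, price_b) history after each repricing round."""
    history = [(price_a, price_b)]
    for _ in range(rounds):
        price_a = UNDERCUT * price_b   # A reprices relative to B
        price_b = MARKUP * price_a     # B reprices relative to A's new price
        history.append((price_a, price_b))
    return history


# Starting prices are invented for illustration.
history = simulate(price_a=35.0, price_b=40.0, rounds=50)
for day in (0, 10, 25, 50):
    print(day, [round(p, 2) for p in history[day]])
```

Because the product of the two multipliers is greater than 1, both prices grow geometrically with every round until a human intervenes.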
Big Data
Viktor Mayer-Schonberger & Kenneth Cukier, 2013
• Regulatory shift from “privacy by consent” to
“privacy through accountability”
• “Differential privacy” through deliberately
blurring the data so it is hard to reidentify people
• Openness, certification, disprovability
• Algorithmists to perform “audits”
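The "deliberate blurring" behind differential privacy can be sketched as noise added to a released statistic. The slide does not name a mechanism; the Laplace mechanism is used here as one common choice, and the count, the `epsilon` value, and the function name `noisy_count` are assumptions made for illustration.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Return a count blurred with Laplace noise of scale 1/epsilon.

    Assumes adding or removing one person changes the count by at most 1.
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)          # fixed seed so the sketch is repeatable
true_count = 120                # invented figure, for illustration only
releases = [noisy_count(true_count, epsilon=0.5, rng=rng) for _ in range(20000)]
average_release = sum(releases) / len(releases)
print(round(average_release, 1))
```

Each individual release is off by a random amount, which makes it harder to tell whether any one person is in the data, while the average over many releases stays close to the true count. A smaller `epsilon` means more blurring and less accuracy.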
What Big Data Can’t Do
David Brooks, NY Times, February 26, 2013
• Data struggles with the social
• Data struggles with context
• Data creates bigger haystacks (spurious
correlations that are statistically significant)
• Data has trouble with big problems
• Data favors memes over masterpieces
• Data obscures values
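The "bigger haystacks" point can be shown with a small simulation: with enough unrelated variables, some pairs correlate strongly by chance alone. The sketch below uses purely random data; the series count, series length, and significance cutoff are assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)          # fixed seed so the sketch is repeatable
N_SERIES, N_POINTS = 100, 20
THRESHOLD = 0.444   # approx. two-sided 5% critical value of r for 20 points

# Independent random series: any correlation between them is pure chance.
series = [[rng.gauss(0, 1) for _ in range(N_POINTS)] for _ in range(N_SERIES)]

total_pairs = 0
spurious = 0
for i in range(N_SERIES):
    for j in range(i + 1, N_SERIES):
        total_pairs += 1
        if abs(pearson(series[i], series[j])) > THRESHOLD:
            spurious += 1

print(spurious, "of", total_pairs, "unrelated pairs look significant")
```

At a 5% cutoff, roughly one pair in twenty would be expected to pass the test even though no real relationship exists, so the number of false leads grows with the number of variables examined.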
What Big Data Will Never Explain
http://www.newrepublic.com/article/112734/what-big-data-will-never-explain
• “To datafy a phenomenon,” they explain, “is to
put it in a quantified format so it can be tabulated
and analyzed.”
• Sentiment analysis mathematical model for grief
called Good Grief Algorithm
• “The mathematization of subjectivity will founder
upon the resplendent fact that we are ambiguous
beings. We frequently have mixed feelings, and
are divided against ourselves.”
The Hidden Biases of Big Data
http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html
• Big Data vs. Data with Depth
• “With enough data, the numbers speak for themselves.”
Chris Anderson
• Can numbers actually speak for themselves? Sadly, they
can't. Data and data sets are not objective; they are
creations of human design. We give numbers their voice,
draw inferences from them, and define their meaning
through our interpretations.
• Hidden biases in both the collection and analysis stages
The Hidden Biases of Big Data
http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html
• Google Flu Trends vs. CDC
– 11% vs. 6% of US population infected
– Media coverage affected Google Flu Trends
• Boston’s StreetBump smartphone app
– 20,000 potholes a year need to be patched
– Poor areas have fewer cell phones, less service
• Hurricane Sandy 20 million tweets + Foursquare
– Grocery shopping day before
– Night life peaked day after
– Illusion that Manhattan was the hub of the disaster
Automate This
Christopher Steiner, 2012
• Dr. Bot
– Always be convenient and available
– Know all your strengths and weaknesses
– Know every risk factor past conditions might signal
– Know your complete medical history
– Know medical history of last 3 generations of family
– Never make careless mistake in prescription
Automate This
Christopher Steiner, 2012
• Dr. Bot
– Always be up-to-date on treatments and discoveries
– Never fall into bad habits or ruts
– Monitor you at all times
– Always be searching for the hint of a problem by
monitoring pulse, cholesterol, blood pressure, weight,
lung capacity, bone density, changes in the air you
expel
Computers Are Just Not That Smart
• Eric Horvitz, MD of Microsoft
• Medical kiosk avatar interviews a mother & her child
with diarrhea
• Avatar decides child does not need to go to ER
• Avatar makes appointment with clinic
• The moderator of AI panel thought the avatar was
much more compassionate than the human triage
nurses she has encountered in NYC ERs
Vinod Khosla (Sun Microsystems)
http://techcrunch.com/2012/01/10/doctors-or-algorithms/
• Being part of the health care system is a
disadvantage to disrupting the status quo
• Machine learning system will be cheaper, more
accurate, and more objective than physicians
• Machine expertise would need to be in the 80th
percentile of human physician expertise
Vinod Khosla (Sun Microsystems)
http://techcrunch.com/2012/01/10/doctors-or-algorithms/
• Do we need doctors or algorithms
• “Health is like witchcraft and just based on
tradition”
• 80% of physicians will be replaced by machines
• 80% of doctors are below the top 20%
• We will not need average doctors
• Still need “doctors like Gregory House who solve
biomedical puzzles beyond our best input ability”
Will Robots Steal Your Job?
http://www.slate.com/articles/technology/robot_invasion/2011/09/will_robots_steal_your_job_3.single.html
• “At this moment, there's someone training for your
job. He may not be as smart as you are—in fact, he
could be quite stupid—but what he lacks in
intelligence he makes up for in drive, reliability,
consistency, and price. He's willing to work for
longer hours, and he's capable of doing better work,
at a much lower wage. He doesn't ask for health or
retirement benefits, he doesn't take sick days, and he
doesn't goof off when he's on the clock. What's
more, he keeps getting better at his job.”
How Robots Will Replace Doctors
http://www.washingtonpost.com/blogs/ezra-klein/post/how-robots-will-replace-doctors/2011/08/25/gIQASA17AL_blog.html
• “We’re not sitting in that room wrapped in a
garment made of the finest recycled sandpaper
because we were hoping for a good conversation.
We’re there because we’re sick…, and we’re
hoping this arrogant, hurried, credentialed genius
can tell us what’s wrong. We go to doctors not
because they’re great empaths, but because we’re
hoping medical school has made them into the
closest thing the human race has developed into
robots.”

The Pros and Cons of Big Data in an ePatient World

  • 1. The Pros and Cons of Big Data in an ePatient World Kent Bottles, MD Chief Medical Officer, PYA Analytics ePatient Connections/2013 Philadelphia, Pennsylvania September 16, 2013
  • 2. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments. • Causality is replaced by correlation • Not knowing why but only what
  • 3. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Statistics allows richest findings using the smallest amount of data • Randomness trumped sample size • 2007 300 exabytes of stored data • 2013 1,200 exabytes of stored data • 2013 only 2% is non-digital
  • 4. Sizing Up Big Data Steve Lohr, NY Times, June 20, 2013 • Bundle of technologies – Web pages, browsing habits, sensor signals, social media, GPS location data, genomic information, surveillance videos – Advances in data storage and processing – Machine learning/AI software to find actionable correlations from the big data
  • 5. Sizing Up Big Data Steve Lohr, NY Times, June 20, 2013 • Philosophy about how decisions should be made – Decisions based on data and analysis – Less based on experience and gut intuition – Eliminates anchoring bias and confirmation bias • Revolution in measurement – Digital equivalent of the telescope – Digital equivalent of the microscope
  • 6. Big Data WSJ March 11, 2013 • 1950s 600 megabytes (John Hancock) • 1960s 807 megabytes (AA Sabre) • 1970s 80 gigabytes (Fed Express Cosmos) • 1980s 450 gigabytes (CitiCorp NAIB) • 1990s 180 terabytes (WalMart) • 2000s 25 petabytes (Google) • 2010s 100 petabytes (Facebook)
  • 7. Big Data WSJ March 11, 2013 • 1 Bit = Binary Digit • 8 Bits = 1 Byte • 1000 Bytes = 1 Kilobyte • 1000 Kilobytes = 1 Megabyte • 1000 Megabytes = 1 Gigabyte • 1000 Gigabytes = 1 Terabyte • 1000 Terabytes = 1 Petabyte • 1000 Petabytes = 1 Exabyte • 1000 Exabytes = 1 Zettabyte
  • 8. Jeffrey Hammerbacher http://www.youtube.com/watch?v=OVBZTDREg7c • All industries are being disrupted – Moneyball, 538, Large Hadron Collider • McKinsley: Big Data: The Next Frontier for Competition – $338 billion potential annual value to US healthcare – $165 billion in clinical operations – $105 billion in research and development
  • 9. Jeffrey Hammerbacher http://www.youtube.com/watch?v=OVBZTDREg7c • Oracle: From Overload to Impact – Healthcare executives say collecting & managing more business information today than 2 years ago – Average increase 85% per year • Frost & Sullivan: US Hospital Health Data Analytics Market – 2011 10% of US hospitals use data analytic tools – 2016 50% of US hospitals will use data analytic tools
  • 10. Jeffrey Hammerbacher on Moneyball www.youtube.com/watch?v=OVBZTDREg7c • Triple Crown in MLB: Batting average, RBI, HR • OPS (on base plus slugging) • GPA (gross production average) • TOB (times on base) • The outcome is how many runs we score and allow; A’s have Matt Stairs; Need stat that reflects both runs produced at bat & runs saved by defense • WAR (“Wins above replacement”)
  • 11. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • To analyze & understand the world we used to test hypotheses driven by theories • Big data discards theories & causality for correlations • University of Ontario premature baby studies • 1,260 data points per second • Diagnose infections 24 hours before apparent • Very constant vital signs indicate impending infection
  • 12. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Google Nature article predicts flu spread in USA • Compared 50 million search terms with CDC data on spread of flu from 2003 to 2008 • 450 million different mathematical models • 45 search terms had strong correlation with spread of flu • H1N1 crisis in 2009 Google approach worked
  • 13. New Tools to Combat Epidemics Amy O’Leary, NY Times, June 20, 2013 • Google Flu overestimates spread of flu in 2013 • Goggle Flu does not track new diseases • BioMosaic – Combines airline records, disease reports, demographic data – Website and iPad app – Showed 5 counties in Florida, 5 counties in NY were most at risk from cholera epidemic in Haiti in 2010
  • 14. New York City’s Office of Policy & Strategic Planning • 1 terabyte of data flows into office every day • 95% success rate in identifying restaurants dumping cooking oil into sewers • Doubled the hit rate of finding stores selling bootleg cigarettes • Sped removal of trees toppled by Sandy • Guided building inspectors to increase citation rate from 13 to 80% for buildings likely to have catastrophic house fires
  • 15. Algorithms Mine Public Data • Atul Butte combined data from 130 studies of gene activity levels in diabetic & healthy tissue • Butte identified new gene associate with Type 2 DM because stood out in 78/130 studies • Algorithm looking for drugs & diseases that had opposing effects on gene expression – Cimetidine for lung adenocarcinomas – Topiramate for Chrohn’s Disease
  • 16. Algorithms Mine Public Data • Russ Altman used algorithms to mine Stanford Translational Research Integrated Database Environment & FDA adverse event reports database • Patients taking SSRI antidepressants and thiazide are at increased risk for long QT syndrome, a serious cardiac arrhythmia
  • 17. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • GPS allows us to establish location quickly, cheaply, and without requiring specialized knowledge • UPS uses geo-loc data from sensors, wireless modules, and GPS on vehicles • 2011 UPS shaved 30 million miles off routes, saved 3 million gallons of fuel, and 30,000 metric tons of carbon dioxide emissions
  • 18. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Datafication of acts of living • Zeo large database of sleep patterns • Asthmapolis sensor to inhaler that tracks location via GPS identifies environmental triggers • Fitbit and Jawbone • iTrem monitors Parkinson’s tremors almost as well as the tri-axial accelerometer used in specialized office medical equipment
  • 19. Big Data for Cancer Care Ron Winslow, WSJ, March 27, 2013 • ASCO • Database of hundreds of thousands of patients • Prototype has collected 100,000 breast cancer patients from 27 groups who have different EMRs • “Recognition that big data is imperative for the future of medicine” Lynn Etheredge • Less than 5% of adult cancer patients participate in randomized clinical trials
  • 20. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Recombinant data • Danish Cancer Society study on cell phone/cancer • Cellphone users from 1987 to 1995 (358,403) • Brain cancer patients (10,729) • Registry of education and disposable income • Combining the three databases found no increase in risk of cancer for those who used cell phones • Not based on sample size; based on N=all
  • 21. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Multiple uses of same database • Data exhaust: digital trail people leave in their wake • Google spell-checking system uses bad data to improve search, autocomplete feature in Gmail, Google Docs, and translation system
  • 22. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Paralyzing privacy – Notice and consent – Cannot give informed consent for secondary uses – Anonymization does not work • AOL 2006 20 million search queries from 657,000 users: NY Times identified user number 4417749 as Thelma Arnold (“My goodness, it’s my whole personal life. I had no idea somebody was looking over my shoulder”) • Netflix Prize 100 million rental records from 500,000 users. Mother and closeted lesbian in Midwest was reidentified
  • 23. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Probability and punishment – Minority Report: People are imprisoned not for what they did, but for what they are foreseen to do, even though they never actually commit the crime – Blue CRUSH (Crime Reduction, Utilizing Statistical History in Memphis, Tennessee) – Homeland Security FAST (Future Attribute Screening Technology) – Big data based on correlation unsuitable tool to judge causality and thus assign individual culpability
  • 24. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Dictatorship of Data – Relying on numbers when they are far more fallible than we think – Robert McNamara’s body count numbers in Vietnam – Michael Eisen tried to buy The Making of a Fly on Amazon in April 2011. Two established sellers offering the book for $1,730,045 and $2,198,177. Two week escalation to a peak of $23,698,655.93 on April 18 – Unsupervised algorithms priced the books for the two sellers.
  • 25. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Regulatory shift from “privacy by consent” to “privacy through accountability” • “Differential privacy” through deliberately blurring the data so hard to reidentify people • Openness, certification, disprovability • Algorithmists to perform “audits”
  • 26. What Big Data Can’t Do David Brooks, NY Times, February 26, 2013 • Data struggles with the social • Data struggles with context • Data creates bigger haystacks (spurious correlations that are statistically significant) • Data has trouble with big problems • Data favors memes over masterpieces • Data obscures values
  • 27. What Big Data Will Never Explain http://www.newrepublic.com/article/112734/what-big-data-will-never-explain • “To datafy a phenomenon,” they explain, “is to put it in a quantified format so it can be tabulated and analyzed.” • Sentiment analysis mathematical model for grief called Good Grief Algorithm • “The mathematization of subjectivity will founder upon the resplendent fact that we are ambiguous beings. We frequently have mixed feelings, and are divided against ourselves.”
  • 28. The Hidden Biases of Big Data http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html • Big Data vs. Data with Depth • “With enough data, the numbers speak for themselves.” Chris Anderson • Can numbers actually speak for themselves? Sadly, they can't. Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. • Hidden biases in both the collection and analysis stages
  • 29. The Hidden Biases of Big Data http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html • Google Flu Trends vs. CDC – 11% vs. 6% of US population infected – Media coverage affected Google Flu Trends • Boston’s StreetBump smartphone app – 20,000 potholes a year need to be patched – Poor areas have fewer cell phones, less service • Hurricane Sandy 20 million tweets + Foursquare – Grocery shopping peaked day before – Night life peaked day after – Illusion that Manhattan was hub of disaster
  • 30. Automate This Christopher Steiner, 2012 • Dr. Bot – Always be convenient and available – Know all your strengths and weaknesses – Know every risk factor past conditions might signal – Know your complete medical history – Know medical history of last 3 generations of family – Never make careless mistake in prescription
  • 31. Automate This Christopher Steiner, 2012 • Dr. Bot – Always be up-to-date on treatments and discoveries – Never fall into bad habits or ruts – Monitor you at all times – Always be searching for the hint of a problem by monitoring pulse, cholesterol, blood pressure, weight, lung capacity, bone density, changes in the air you expel
  • 32. Computers Are Just Not That Smart • Eric Horvitz, MD of Microsoft • Medical kiosk avatar interview mother & child with diarrhea • Avatar decides child does not need to go to ER • Avatar makes appointment with clinic • The moderator of AI panel thought the avatar was much more compassionate than the human triage nurses she has encountered in NYC ERs
  • 33. Vinod Khosla (Sun Microsystems) http://techcrunch.com/2012/01/10/doctors-or-algorithms/ • Being part of the health care system is a disadvantage to disrupting the status quo • Machine learning system will be cheaper, more accurate, and more objective than physicians • Machine expertise would need to be in the 80th percentile of human physician expertise
  • 34. Vinod Khosla (Sun Microsystems) http://techcrunch.com/2012/01/10/doctors-or-algorithms/ • Do we need doctors or algorithms? • “Health is like witchcraft and just based on tradition” • 80% of physicians will be replaced by machines • 80% of doctors are below the top 20% • We will not need average doctors • Still need “doctors like Gregory House who solve biomedical puzzles beyond our best input ability”
  • 35. Will Robots Steal Your Job? http://www.slate.com/articles/technology/robot_invasion/2011/09/will_robots_steal_your_job_3.single.html • “At this moment, there's someone training for your job. He may not be as smart as you are—in fact, he could be quite stupid—but what he lacks in intelligence he makes up for in drive, reliability, consistency, and price. He's willing to work for longer hours, and he's capable of doing better work, at a much lower wage. He doesn't ask for health or retirement benefits, he doesn't take sick days, and he doesn't goof off when he's on the clock. What's more, he keeps getting better at his job.”
  • 36. How Robots Will Replace Doctors http://www.washingtonpost.com/blogs/ezra-klein/post/how-robots-will-replace-doctors/2011/08/25/gIQASA17AL_blog.html • “We’re not sitting in that room wrapped in a garment made of the finest recycled sandpaper because we were hoping for a good conversation. We’re there because we’re sick…, and we’re hoping this arrogant, hurried, credentialed genius can tell us what’s wrong. We go to doctors not because they’re great empaths, but because we’re hoping medical school has made them into the closest thing the human race has developed into robots.”