Nathan Kohn
BU MET
enzyme@bu.edu
Thinking Big in Small Spaces
One Hadoop Two Hadoop
(Big Data & 21st Century Analytics in the Classroom)
Stanislav Seltser
BU MET
sseltser@bu.edu
Mar 7, 2014 2
Big Data is Everywhere
YouTube: 72 hours a minute
Wikipedia: 28 million pages
Facebook: 900 million users
Flickr: 6 billion photos
“… data a new class of economic asset,
like currency or gold.”
“…growing at 50 percent a year…”
How will we design and implement Big Learning systems?
Big Learning
GPUs, Multicore, Clusters, Clouds, Supercomputers
Graphs are Everywhere
Collaborative Filtering (Netflix): Users, Movies
Text Analysis (Wiki): Docs, Words
Probabilistic Analysis: Social Network
Big Data & Linear Regression
Stochastic Gradient Descent
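As a minimal sketch of the technique named in this slide title (not the presenters' own code), the following fits a line y ≈ w·x + b to toy data by stochastic gradient descent on squared error. The function name, learning rate, epoch count, and the noise-free toy data are all assumptions chosen for illustration.

```python
import random


def sgd_linear_regression(points, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error.

    All parameter defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)      # seeded so the run is repeatable
    data = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)          # visit the samples in a random order
        for x, y in data:
            err = (w * x + b) - y  # prediction error on ONE sample
            w -= lr * err * x      # gradient of 0.5*err**2 w.r.t. w
            b -= lr * err          # gradient of 0.5*err**2 w.r.t. b
    return w, b


# Hypothetical toy data drawn from y = 2x + 1 with no noise.
toy = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-20, 21)]
w, b = sgd_linear_regression(toy)
print(round(w, 3), round(b, 3))
```

Each update touches a single sample, which is what lets the method scale to data sets that do not fit in memory: the data can be streamed through once per epoch.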
Serial vs Parallel SGD
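One simple way to contrast the two, assuming a parameter-averaging scheme (an illustrative choice, not necessarily the one this slide depicted): split the data into shards, run serial SGD on each shard independently, then average the resulting parameters. The shards are processed in a plain loop here to keep the sketch deterministic; on a cluster each shard would run on its own worker. All names and defaults are hypothetical.

```python
import random


def serial_sgd(points, lr=0.01, epochs=300, seed=0):
    """Plain serial SGD for y ~ w*x + b on squared error."""
    rng = random.Random(seed)
    data = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def parallel_sgd(points, workers=4, lr=0.01, epochs=300):
    """Parameter averaging: fit each shard separately, then average.

    The list comprehension stands in for independent workers; no state
    is shared between shards until the final averaging step.
    """
    shards = [points[i::workers] for i in range(workers)]
    fits = [serial_sgd(shard, lr, epochs, seed=i)
            for i, shard in enumerate(shards)]
    w = sum(f[0] for f in fits) / workers
    b = sum(f[1] for f in fits) / workers
    return w, b


# Hypothetical toy data drawn from y = 2x + 1 with no noise.
toy = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-20, 21)]
w_serial, b_serial = serial_sgd(toy)
w_parallel, b_parallel = parallel_sgd(toy)
print(round(w_serial, 3), round(b_serial, 3))
print(round(w_parallel, 3), round(b_parallel, 3))
```

The trade-off in this scheme is communication versus accuracy: workers exchange parameters only once, at the end, but each sees only a fraction of the data.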
Big Data Landscape: Apps, Infrastructure, Data Semantics
Landscape
Grad Student Response #1
How Big is Big? How is BigData measured?
As I understand it, the term big data doesn’t refer directly to the size of the
data itself. What the term might mean is that the demands on data
(storage/transfer/analysis) have surpassed what relational databases
can control or handle (too big to handle).
How is it measured? I really don’t know. Server storage keeps increasing
(5TB, 10TB, 50TB, 100TB...) and RDBMSs like Oracle seem to be
keeping up with it, but then again I don’t know exactly what measure is being used.
Is Big Data relevant to you professionally?
Indeed it is, even though I am not using it or practicing it daily.
I am really interested in learning it.
Is Big Data relevant to you personally?
Very relevant, and it is a topic that drove me into pursuing a master’s degree.
Grad Student Response #2
How Big is Big? How is BigData measured?
Big data is a term for large data sets that are too complex to process with traditional
data management processes and tools. Its data points and data types depend on, and are
measured by, the parameters set by each organization.
Where does BigData come from?
Big data can come from various sources that can be categorized as internal or external
contributors.
What is BigData good for?
BigData is good for complex and large data sets that exist within relational databases
and may require object-oriented programming.
Would you like to see Big Data incorporated in your courses?
Yes. I think we live in a period in which we are inundated by social media,
numbers, photographs and other forms of data, which requires us to be well versed in
storage, maintenance, and interface design so that we are better able to parse
through the Big Data that we encounter on a daily basis.
Undergrad Student #1
Is Big Data relevant to you personally?
Yes. As my current major is Business Application Development, I can see myself
gaining a lot of opportunities to deal not only with the technologies for building
user interfaces but also with the technologies for storing user information.
The techniques used to understand that data could be another opportunity for
the business.
Would you like to see Big Data incorporated in your courses?
Yes. I would like our course to include some of the techniques that
corporations use nowadays to understand the relation between their data and the
problems they need to address, such as how they decide which part of their big
data provides the most helpful information for their problem, and how they interpret
the results of their data analysis, such as how they decide that
a result is accurate and meaningful enough to act on.
Do you have any questions about Big Data?
Big data is a pretty interesting and useful topic. It would be nice to have more
background information to help our understanding.
Undergrad Student #2
How Big is Big? How is BigData measured?
The survey is asking rather easy conceptual questions about big data. Big data is easy to understand at
that level: we finally have the technology to store, retrieve (cheap memory), and analyze (with proper
languages) data on magnitudes that were impossible before. Instead of just a phone book type of data,
people can gather every relevant or even possibly relevant piece of information about anything (often
but not limited to customers of a business). I have read articles about how some companies (credit card
companies mostly, if I remember correctly) can tell if a woman is pregnant before she even knows herself.
Or they can predict divorce rates a year in advance quite reliably. All this from customers' spending habits
and deviations from those habits.
While all this is fascinating, I don't have any real interest in learning at the conceptual level like this. If big
data is to be relevant in a class, it needs to show HOW all this is done. Teach the language, teach the
search and statistical algorithms, or even the methods people use to collect big data (the petabytes and
more aren't being entered by hand).
Students should come away from classes or lectures on big data with some practical knowledge of the subject;
otherwise we're just applying a name to something people generally understand: organizations collect
and analyze as much data as they can, and recent technology has made that amount of data
staggeringly large. The keywords and buzzwords are nice for sounding like an expert, but the how-to is
generally more important.
Student Response #4
How Big is Big? How is BigData measured?
Big data is a term developed recently to describe the trend of exponentially
increasing amounts of data stored by organizations for business uses. Very often
this data might be extremely big, such as 16 petabytes. The data is measured
by the memory space it occupies. Thus, 16 petabytes of big data approximately
occupies 10^15 bytes of memory.
Where does BigData come from?
Big Data could come from different sources, such as emails, social-networking sites,
sensors on the web, sensors installed on other tracking devices, or line-of-business
applications.
Is Big Data relevant to you professionally?
Yes. In my previous work as a market researcher, we always needed to gather
information and analyze it for business decision making. The technologies
for gathering big data and the techniques used to analyze and filter data are also
extremely helpful for my career.
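As a quick check of the size figure in the response above: one petabyte is 10^15 bytes in decimal (SI) units, so 16 petabytes comes to roughly 1.6 × 10^16 bytes, or 2^54 bytes if binary pebibytes are meant. The small sketch below works through that arithmetic; the constant and function names are hypothetical.

```python
PETABYTE = 10 ** 15   # decimal (SI) petabyte, in bytes
PEBIBYTE = 2 ** 50    # binary pebibyte, in bytes


def petabytes_to_bytes(pb, binary=False):
    """Convert a size in petabytes to bytes (decimal units by default)."""
    return pb * (PEBIBYTE if binary else PETABYTE)


decimal_bytes = petabytes_to_bytes(16)
binary_bytes = petabytes_to_bytes(16, binary=True)
print(f"{decimal_bytes:.1e}")   # 1.6e+16
print(binary_bytes == 2 ** 54)  # True
```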
Data Warehouse Course
Student Comments:
Very informative, content-rich course that covers the latest technologies, trends, and
skills of data warehousing, data management, and data analysis. I would
recommend including this course in the required courses for the MS in CIS
with concentration in Database Management and BI program.
Relevance to job opportunities and cutting edge technologies.
This is probably the most useful course I have taken at Boston University. I have
used every bit of what this professor taught every night at work. I have made
contributions to my employer, a data mining company, in ways that had
never been done before as a result of this course. I have for the first time in
my 8-year career planned, designed, and augmented a Data Warehouse from
scratch. I have configured an analysis server and reported using MDX queries.
This professor has been helpful in many ways. He has guided me through
some Data Warehouse design projects at work. Moreover, he has been
available to work with me and others after class and on weekdays.
Road map
A. Archeology
Professor Mark Eramian et al. have been awarded $548,000 through the Digging into Data
Challenge (National Endowment for the Humanities) to help archaeologists find answers to
questions hidden in thousands of images and text files generated from field sites around
the world.
B. Biology
Recently, a researcher wanted to ascertain whether a
search against GQ-Pat could provide novel insight into
his work related to a specific gene, the cAMP
Responsive Element Modulator.
Reporting to the VP of R&D:
Apply data mining and machine learning techniques to
develop better search and content discovery in the field of
patents.
Invent new ways to index tens of millions of
documents with semantic information.
Z. Zymurgy (hint: beer)
QUIZ?
Quiz:
Nathan Kohn
BU MET
enzyme@bu.edu
Stanislav Seltser
BU MET
sseltser@bu.edu
