Power of Python with Big Data
For Queries:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
For more details please contact us:
US : 1800 275 9730 (toll free)
INDIA : +91 88808 62004
Email Us : sales@edureka.co
View Mastering Python course details at http://www.edureka.co/python
Slide 2 www.edureka.co/python
Objectives
At the end of this module, you will be able to explain:
- Why Python is popular with Big Data
- How Python can be used with Big Data
- How Python helps with analytics
- Why Python is trending for automation
- Where Python stands in terms of DataFrames
Slide 3 www.edureka.co/python
Why Python?
- Python is a great language for beginner programmers because it is easy to learn and easy to maintain.
- One of Python's biggest strengths is that the bulk of its library is portable. It also supports GUI programming and can be used to create applications that are portable across Mac, Windows and the Unix X Window System.
- With libraries like PyDoop and SciPy, it is well suited to Big Data analytics.
Slide 4 www.edureka.co/python
Growing Interest in Python
Slide 5 www.edureka.co/python
Demo: Web Scraping using Python
- This example demonstrates how to scrape basic financial data from an IMDb web page
- We use Beautiful Soup, an open-source Python library for parsing HTML, to extract data from web pages
- Scraping is used for a wide range of purposes, from data mining to monitoring and automated testing
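The scraping idea can be sketched without any third-party package. The block below is only an illustration: it uses the standard library's `html.parser` as a stand-in for Beautiful Soup, and the HTML fragment, the `label`/`value` class names and the `LabelValueParser` helper are all hypothetical, not taken from the demo or from any real page.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded movie page.
SAMPLE_HTML = """
<table>
  <tr><th class="label">Budget</th><td class="value">$10,000,000</td></tr>
  <tr><th class="label">Gross</th><td class="value">$25,500,000</td></tr>
</table>
"""


class LabelValueParser(HTMLParser):
    """Collect label/value pairs from cells marked class="label" or class="value"."""

    def __init__(self):
        super().__init__()
        self._current = None  # class of the cell currently being read
        self._label = None    # most recent label waiting for its value
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("label", "value"):
            self._current = cls

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return
        if self._current == "label":
            self._label = text
        elif self._label is not None:
            self.pairs[self._label] = text
            self._label = None

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._current = None


def parse_amount(text):
    """Turn a string such as '$10,000,000' into an integer."""
    return int(text.replace("$", "").replace(",", ""))


parser = LabelValueParser()
parser.feed(SAMPLE_HTML)
figures = {label: parse_amount(value) for label, value in parser.pairs.items()}
print(figures)
```

With Beautiful Soup the parser class would typically shrink to a few selector calls; the extraction and clean-up steps stay the same.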
Slide 6 www.edureka.co/python
Demo: Collecting Tweets using Python
- This example demonstrates how to extract historical tweets for a particular brand such as "nike" or "apple"
- We make a REST API call to Twitter to extract the tweets
- This data can then be used to perform sentiment analysis for that brand on Twitter
Slide 7 www.edureka.co/python
Big Data
- Lots of data (terabytes or petabytes)
- Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
- The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
[Figure: "Big Data" word cloud: cloud, tools, statistics, NoSQL, compression, storage, support, database, analyze, information, terabytes, processing, mobile]
Slide 8 www.edureka.co/python
Unstructured Data is Exploding
[Figure: chart with series labeled "Complex, Unstructured" and "Relational"]
- 2,500 exabytes of new information in 2012, with the internet as the primary driver
- The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year
Slide 9 www.edureka.co/python
Hadoop for Big Data
- Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model
- It is an open-source data management framework with scale-out storage and distributed processing
Slide 10 www.edureka.co/python
Hadoop and MapReduce
Hadoop is a system for large-scale data processing.
It has two main components:
- HDFS: Hadoop Distributed File System (storage)
» Distributed across "nodes"
» Natively redundant
» NameNode tracks locations
» Self-healing, high-bandwidth clustered storage
- MapReduce (processing)
» Splits a task across processors "near" the data and assembles the results
» JobTracker manages the TaskTrackers
» Works on key-value pairs
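The key-value flow of MapReduce can be imitated in a single process. This is a minimal sketch and not Hadoop code: the function names, the "city,amount" record layout and the sample records are made up for illustration, and nothing here is distributed across nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Group all values that share a key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Call the reducer once per key, with every value collected for that key."""
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def sales_mapper(line):
    """Hypothetical mapper: turn a 'city,amount' line into a (city, amount) pair."""
    city, amount = line.split(",")
    yield city, float(amount)


def sum_reducer(key, values):
    """Hypothetical reducer: add up every amount seen for one city."""
    return sum(values)


records = ["Pune,10.0", "Delhi,5.5", "Pune,2.5", "Delhi,4.5"]
totals = reduce_phase(shuffle_phase(map_phase(records, sales_mapper)), sum_reducer)
print(totals)
```

On a cluster the map calls run on the nodes that hold each block of input, and the shuffle moves the pairs over the network so that each reducer sees all values for its keys.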
Slide 11 www.edureka.co/python
Why Python is popular with Big Data
- Data cleansing and preparation
- Writing MapReduce jobs in Python
- Leveraging the analytical power of Python on big data sets
Slide 12 www.edureka.co/python
Demo: Data Preparation / Cleaning
Extracting data from JSON
- Extract data from complex JSON for further processing.
Stop word analysis for text analytics
- Remove stop words from a paragraph of text for further processing.
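Both preparation steps fit in a few lines of standard-library Python. The sketch below is illustrative only: the JSON record, the `get_path` helper and the short stop-word list are assumptions made for this example (a real job would more likely use a fuller list such as the one shipped with NLTK).

```python
import json

# Hypothetical nested record, standing in for one line of a raw JSON feed.
RAW = (
    '{"user": {"name": "asha", "location": {"city": "Pune"}}, '
    '"text": "Python is the best tool for the job"}'
)

# Deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"is", "the", "for", "a", "an", "of", "to"}


def get_path(obj, path, default=None):
    """Walk nested dictionaries along `path`; return `default` if a key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def remove_stop_words(text):
    """Lower-case the text, split on whitespace and drop the stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


record = json.loads(RAW)
city = get_path(record, ["user", "location", "city"])
tokens = remove_stop_words(get_path(record, ["text"], ""))
print(city, tokens)
```

Returning a default for missing keys keeps a long cleaning run from failing on the first malformed record.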
Slide 13 www.edureka.co/python
Demo: Word Count using Hadoop Streaming API
- The example shows a simple word count application written in Python
- We use the Hadoop Streaming API to run MapReduce code written in Python
- A word count application can be used to index text documents and files for a given search query
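Hadoop Streaming passes records to the mapper and reducer as lines of text, with key and value separated by a tab, and sorts the mapper output by key before the reducer sees it. The sketch below follows that contract but runs locally; the function names and sample lines are assumptions, and `sorted()` stands in for the framework's shuffle-and-sort step.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share a word (input must be sorted by key)."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local simulation of the job on two made-up input lines.
sample_input = ["the cat sat", "the hat"]
counts = list(reducer(sorted(mapper(sample_input))))
print(counts)
```

In an actual Streaming job each function would live in its own script that reads `sys.stdin` and prints to standard output, and the two scripts would be passed to the streaming jar through its `-mapper` and `-reducer` options.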
Slide 14 www.edureka.co/python
PyDoop – Hadoop with Python
- The PyDoop package provides a Python API for Hadoop MapReduce and HDFS
- PyDoop has several advantages over Hadoop's built-in solutions for Python programming, i.e., Hadoop Streaming and Jython
- One of the biggest advantages of PyDoop is its HDFS API. It allows you to connect to an HDFS installation, read and write files, and get information on files, directories and global file system properties
- The MapReduce API of PyDoop allows you to solve many complex problems with minimal programming effort. Advanced MapReduce concepts such as "Counters" and "Record Readers" can be implemented in Python using PyDoop
With the PyDoop package, Python can be used to write Hadoop MapReduce programs and applications that access the HDFS API.
Slide 15 www.edureka.co/python
Demo: Writing Hive UDFs using Python
Hive UDF (User Defined Function) to convert a Unix timestamp to the day of the week [1-7]
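Hive can hand rows to an external Python script through its `TRANSFORM ... USING` clause, sending each row as a tab-separated line on standard input. The sketch below shows the conversion logic under stated assumptions: the timestamp is the last column, it is interpreted in UTC, and the result uses the ISO numbering (Monday = 1, Sunday = 7). The column layout and function names are hypothetical.

```python
from datetime import datetime, timezone


def unix_to_weekday(timestamp):
    """Map a Unix timestamp (seconds since the epoch, UTC) to an ISO weekday number 1-7."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc).isoweekday()


def transform(lines):
    """Replace the last tab-separated field of each row with its weekday number."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[-1] = str(unix_to_weekday(fields[-1]))
        yield "\t".join(fields)


# Two made-up rows: timestamp 0 is Thursday 1 January 1970, and 259200 is three days later.
sample_rows = ["u1\t0", "u2\t259200"]
converted = list(transform(sample_rows))
print(converted)
```

When run under Hive, the script would loop over `sys.stdin` and print each transformed row instead of working on an in-memory list.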
Slide 16 www.edureka.co/python
Demo: Python NLTK on Hadoop
Leveraging the analytical power of Python on big data sets (MapReduce + NLTK).
Perform stop word removal using MapReduce.
Slide 17 www.edureka.co/python
Python and Data Science
The day-to-day tasks of a data scientist involve many interrelated but different activities, such as accessing and manipulating data, computing statistics, creating visual reports on that data, building predictive and explanatory models, evaluating these models on additional data, and integrating models into production systems.
- Python is an excellent choice for data scientists because it provides libraries for all of these activities
- Python has a diverse range of open-source libraries for just about everything a data scientist does in day-to-day work
- Python and most of its libraries are both open source and free
Slide 18 www.edureka.co/python
SciPy.org
SciPy (pronounced "Sigh Pie") is a Python-based ecosystem of open-source software for mathematics, science, and engineering. Its core packages are:
- NumPy: base N-dimensional array package
- SciPy library: fundamental library for scientific computing
- Matplotlib: comprehensive 2D plotting
- IPython: enhanced interactive console
- SymPy: symbolic mathematics
- pandas: data structures and analysis
Slide 19 www.edureka.co/python
Demo: Zombie Invasion Model
This is a lighthearted example: a system of ordinary differential equations (ODEs) can be used to model a "zombie invasion", using the equations specified by Philip Munz.
The system is given as:
dS/dt = P - B*S*Z - d*S
dZ/dt = B*S*Z + G*R - A*S*Z
dR/dt = d*S + A*S*Z - G*R
Where:
S: the number of susceptible victims
Z: the number of zombies
R: the number of people "killed"
P: the population birth rate
d: the chance of a natural death
B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie)
G: the chance a dead person is resurrected into a zombie
A: the chance a zombie is totally destroyed
This involves solving a system of first-order ODEs of the form dy/dt = f(y, t), where y = [S, Z, R].
There are three scenarios in the program that show how the zombie apocalypse varies with different initial conditions.
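The system above can be integrated numerically in a few lines. The sketch below is an assumption-laden illustration: it uses a hand-written fourth-order Runge-Kutta step so that it runs on the standard library alone (with SciPy installed, `scipy.integrate.odeint` would be the usual choice), and the parameter values, the initial state and the five-day horizon are example numbers chosen here, not values stated on the slide.

```python
def zombie_rhs(y, P, d, B, G, A):
    """Right-hand side f(y) of the system, with y = (S, Z, R)."""
    S, Z, R = y
    dS = P - B * S * Z - d * S
    dZ = B * S * Z + G * R - A * S * Z
    dR = d * S + A * S * Z - G * R
    return (dS, dZ, dR)


def rk4_step(y, h, params):
    """Advance the state by one classical Runge-Kutta step of size h."""
    def shifted(k, factor):
        return tuple(yi + factor * ki for yi, ki in zip(y, k))

    k1 = zombie_rhs(y, *params)
    k2 = zombie_rhs(shifted(k1, h / 2), *params)
    k3 = zombie_rhs(shifted(k2, h / 2), *params)
    k4 = zombie_rhs(shifted(k3, h), *params)
    return tuple(
        yi + h / 6 * (a + 2 * b + 2 * c + e)
        for yi, a, b, c, e in zip(y, k1, k2, k3, k4)
    )


def simulate(y0, t_end, steps, params):
    """Integrate from t = 0 to t_end in equal steps and return the final state."""
    h = t_end / steps
    y = tuple(y0)
    for _ in range(steps):
        y = rk4_step(y, h, params)
    return y


# Example values only: (P, d, B, G, A) and the initial state are assumptions.
params = (0.0, 0.0001, 0.0095, 0.0001, 0.0001)
initial_state = (500.0, 0.0, 0.0)  # S0, Z0, R0
final_state = simulate(initial_state, t_end=5.0, steps=1000, params=params)
print(final_state)
```

Adding the three equations gives d(S + Z + R)/dt = P, so with P = 0 the total head count stays constant, which is a handy sanity check on any solver used for this model.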
Slide 20 www.edureka.co/python
Python Pandas – Data Frames
Slide 21 www.edureka.co/python
Demo: Python Pandas
Using the huge movie data sets (movie ratings, user details, etc.) that are being collected nowadays, we need to do the following analysis:
- Find the top 5 rated movies
- Find the top 5 rated movies across age groups
- Find the movies on which women and men disagree the most
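The shape of these analyses can be shown on a toy table. The block below is a plain-Python stand-in rather than pandas code, so that it runs without extra packages; with pandas the same steps are usually written with `groupby` and `mean`. The titles, ratings and helper names are invented for illustration.

```python
from collections import defaultdict

# Made-up ratings as (title, gender, rating); not real data.
ratings = [
    ("Alpha", "F", 5), ("Alpha", "M", 4),
    ("Beta", "F", 4), ("Beta", "M", 4),
    ("Gamma", "F", 2), ("Gamma", "M", 5),
]


def mean_by(rows, key_func):
    """Average the rating of the rows in each group, keyed by key_func(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_func(row)].append(row[2])
    return {key: sum(values) / len(values) for key, values in groups.items()}


def top_rated(rows, n):
    """Return the n titles with the highest mean rating, best first."""
    means = mean_by(rows, lambda row: row[0])
    return sorted(means, key=means.get, reverse=True)[:n]


def gender_gap(rows):
    """Mean rating by women minus mean rating by men, per title."""
    means = mean_by(rows, lambda row: (row[0], row[1]))
    titles = {title for title, _ in means}
    return {title: means[(title, "F")] - means[(title, "M")] for title in titles}


best = top_rated(ratings, 2)
gaps = gender_gap(ratings)
most_divisive = max(gaps, key=lambda title: abs(gaps[title]))
print(best, most_divisive)
```

The per-age-group question follows the same pattern with an age-bucket column added to the grouping key.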
Slide 22 www.edureka.co/python
How it Works?
- LIVE Online Class
- Class Recording in LMS
- 24/7 Post Class Support
- Module Wise Quiz
- Project Work
- Verifiable Certificate
Slide 23 www.edureka.co/python
Course Topics
- Module 1
» Getting Started with Python
- Module 2
» Sequences and File Operations
- Module 3
» Deep Dive - Functions, Sorting, Errors and Exception Handling
- Module 4
» Regular Expressions, its Packages and Object Oriented Programming in Python
- Module 5
» Debugging, Databases and Project Skeletons
- Module 6
» Machine Learning Using Python – I
- Module 7
» Machine Learning Using Python – II
- Module 8
» Introduction to Hadoop
- Module 9
» Hadoop and Python
- Module 10
» Web Scraping using Python and Project Work
Slide 24 www.edureka.co/python
Questions
Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for questions
Slide 25 Course URL
