INTRODUCTION TO WEB SCRAPING USING PYTHON
Submitted By
www.computersciencejunction.in
Content
• What is Web Scraping?
• Need of Web Scraping.
• Workflow
• Libraries used.
• Why Python for web scraping.
• Demo (Scrape a Website)
• Advantages of web scraping.
• Limitations of web scraping.
Web Scraping
Web scraping is a technique for fetching data and information from websites.
Most of what you see on a webpage can, in principle, be scraped.
It can be done in most programming languages; we'll use Python because its
syntax and libraries make the task easier.
Need of Web Scraping
• Web scraping, or web content extraction, can serve many purposes, for example:
Better access to company data
Market analysis at scale
Machine learning and large datasets
Stock market tracking
Tracking the latest trends
Work Flow
Continued..
Send a request and load the webpage.
(Requests, urllib, http.client; http.client was named httplib in Python 2)
Parse the content for the desired data.
(Beautiful Soup, re, Scrapy)
Store the data in the format you want.
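The three stages above can be sketched end to end with only the Python standard library. This is a minimal illustration rather than the presentation's demo: the HTML snippet, the TitleParser class and the scrape_titles and to_csv functions are hypothetical names invented here, and an inline string stands in for the page that urllib.request.urlopen would return in the first stage, so the sketch runs without network access.

```python
import csv
import io
from html.parser import HTMLParser

# Stage 1 (load): a hypothetical page body; in a real run this string would
# come from urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><body>
  <h2 class="title">First post</h2>
  <p>Intro text</p>
  <h2 class="title">Second post</h2>
</body></html>
"""


class TitleParser(HTMLParser):
    """Stage 2 (parse): collect the text of every <h2 class="title">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def scrape_titles(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


def to_csv(titles):
    """Stage 3 (store): serialise the titles as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title"])
    for title in titles:
        writer.writerow([title])
    return buffer.getvalue()


titles = scrape_titles(SAMPLE_HTML)
csv_text = to_csv(titles)
print(csv_text)
```

Keeping the three stages in separate functions means each one can be swapped out independently, for example replacing the hand-written parser with Beautiful Soup.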
Libraries Used
Selenium
Selenium is a web testing library. It is used to
automate browser activities.
BeautifulSoup
Beautiful Soup is a Python package for parsing
HTML and XML documents. It creates parse trees
that make it easy to extract data.
Pandas
Pandas is a library used for data manipulation and
analysis.
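As a small illustration of the parse tree that Beautiful Soup builds, the sketch below parses an inline HTML fragment and walks it. The fragment, its class names and the variable names are made up for this example, and it assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded page.
html = """
<div class="product">
  <span class="name">Laptop A</span>
  <span class="price">499</span>
</div>
<div class="product">
  <span class="name">Laptop B</span>
  <span class="price">650</span>
</div>
"""

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Walk the tree: every product <div>, then the <span> elements inside it.
rows = []
for product in soup.find_all("div", class_="product"):
    name = product.find("span", class_="name").get_text(strip=True)
    price = int(product.find("span", class_="price").get_text(strip=True))
    rows.append((name, price))

print(rows)
```

A list of tuples like rows can be passed directly to pandas.DataFrame for further analysis.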
Why Python for web scraping?
• Ease of Use.
• Large Collection of Libraries.
• Dynamically typed.
• Easily Understandable Syntax.
• Small code, large task.
• Community
• Step 3: Find the data you want to extract
Let's extract the Price, Name, and Rating, each of which is
nested in its own "div" tag.
• Step 4: Write the code.
First, let's create a Python file. To do this, open a
terminal and type gedit <your file name>.py (gedit is a Linux
text editor; on Windows, use any text editor instead).
Code
# import libraries
from bs4 import BeautifulSoup
import urllib.request
import csv

# movie slug used to build the Rotten Tomatoes user-reviews URL
moviename = 'iron_man_3'
urlpage = 'https://www.rottentomatoes.com/m/' + moviename + '/reviews?type=user'
# download the page
page = urllib.request.urlopen(urlpage)
# parse the HTML using Beautiful Soup and store it in the variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
# find the list of audience reviews inside the review table
table = soup.find('div', attrs={'class': 'review_table'}).find('ul', attrs={'class': 'audience-reviews'})
article = []
# collect the text of every review item
for i in table.find_all('li', attrs={'class': 'audience-reviews__item'}):
    z = i.find('p', attrs={'class': 'audience-reviews__review--mobile js-review-text clamp clamp-4 js-clamp'}).get_text()
    article.append(z)
Continued...
Step 5: Run the code and extract the data.
Step 6: Store the data in the required format.
Example:
import pandas as pd
# products, prices and ratings are the lists filled while scraping
df = pd.DataFrame({'ProductName': products, 'Price': prices, 'Rating': ratings})
df.to_csv('products.csv', index=False, encoding='utf-8')
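The code slide imports csv without using it. As one possible way to store scraped reviews without pandas, the sketch below writes a list of review strings with csv.writer and reads them back. The sample reviews, the save_reviews and load_reviews helpers and the reviews.csv file name are assumptions made for illustration; in the demo the list would be article.

```python
import csv
import os
import tempfile


def save_reviews(reviews, path):
    """Write one review per row, with a header, to a UTF-8 CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["review"])
        for text in reviews:
            writer.writerow([text.strip()])


def load_reviews(path):
    """Read the reviews back, skipping the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    return [row[0] for row in rows[1:]]


# Made-up sample data standing in for the scraped `article` list.
article = ["  Great movie, loved it!  ", 'He said "wow", twice.']

path = os.path.join(tempfile.mkdtemp(), "reviews.csv")
save_reviews(article, path)
restored = load_reviews(path)
print(restored)
```

The csv module quotes commas and double quotes inside a review automatically, which is why the text survives the round trip unchanged.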
Advantages of Web Scraping
Inexpensive
Easy to implement
Low maintenance and speed
Accuracy
Limitations of web scraping
• Difficult to analyze
For anybody who is not an expert, scraping processes are
confusing. Although this is not a major problem, some errors
could be fixed faster if the process were easier for more
software developers to understand.
• Time
Web scraping services sometimes need time to become
familiar with the core application and to adjust to the
scraping language.
• Speed and protection policies
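One common form of protection policy is a site's robots.txt file, and speed is often limited by how politely requests are paced. The sketch below shows one way to respect both with the standard library's urllib.robotparser. The robots.txt rules, the example.com URLs, the my-scraper user agent and the allowed_urls and polite_fetch helpers are all hypothetical; a real scraper would load the file with set_url() and read() instead of the inline text, and would pass a real download function as fetch.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a live scraper would fetch the real file.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

USER_AGENT = "my-scraper"  # made-up user agent name


def allowed_urls(urls):
    """Keep only the URLs that robots.txt permits for USER_AGENT."""
    return [url for url in urls if parser.can_fetch(USER_AGENT, url)]


def polite_fetch(urls, fetch, delay, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)
        results.append(fetch(url))
    return results


candidates = [
    "https://example.com/reviews",
    "https://example.com/private/admin",
    "https://example.com/about",
]
to_fetch = allowed_urls(candidates)

# Use the delay the site asks for if it gives one, else default to 1 second.
delay = parser.crawl_delay(USER_AGENT) or 1

# For the demo, record the pauses instead of really sleeping or downloading.
pauses = []
fetched = polite_fetch(to_fetch, fetch=lambda url: "fetched " + url,
                       delay=delay, sleep=pauses.append)
print(fetched, pauses)
```

Passing sleep as a parameter keeps the demo instant; leaving it at its default makes the function really wait between requests.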
Thank You

