Selenium & Scrapy
Web UI testing and Web Scraping
About me
Arcangelo Saracino
IT student at Bari University
2016-2018 Web developer at Aryma
2018 - Feb 2019 Web developer at Enterprise Digital Solution
saracinoarcangelo@gmail.com github.com/Arkango
Selenium
Selenium is a portable framework for testing web applications.
Selenium provides a playback (formerly also recording) tool for authoring
functional tests without the need to learn a test scripting language (Selenium IDE).
It also provides a test domain-specific language (Selenese) to write tests in a
number of popular programming languages, including C#, Groovy, Java, Perl,
PHP, Python, Ruby and Scala.
The tests can then run against most modern web browsers.
Selenium deploys on Windows, Linux, and macOS platforms.
It is open-source software, released under the Apache 2.0 license: web
developers can download and use it without charge.
Source: Wikipedia
Selenium Components
● Selenium IDE
● Selenium Client API
● Selenium WebDriver
● Selenium Remote Control
● Selenium Grid
Selenium IDE
Selenium IDE is a complete integrated development environment (IDE) for Selenium tests.
It is implemented as a Firefox Add-On and as a Chrome Extension.
It allows for recording, editing, and debugging of functional tests. It was previously known
as Selenium Recorder.
Selenium-IDE was originally created by Shinya Kasatani and donated to the Selenium
project in 2006.
Selenium IDE was little maintained for some years; active maintenance
resumed in 2018.
Scripts may be automatically recorded and edited manually providing autocompletion
support and the ability to move commands around quickly. Scripts are recorded in
Selenese, a special test scripting language for Selenium. Selenese provides commands
for performing actions in a browser (click a link, select an option), and for retrieving data
from the resulting pages.
Selenium Client API
As an alternative to writing tests in Selenese, tests can
also be written in various programming languages. These
tests then communicate with Selenium by calling methods
in the Selenium Client API. Selenium currently provides
client APIs for Java, C#, Ruby, JavaScript, R and Python.
With Selenium 2, a new Client API was introduced (with
WebDriver as its central component). However, the old API
(using class Selenium) is still supported.
Selenium WebDriver
Selenium WebDriver is the successor to Selenium RC.
Selenium WebDriver accepts commands (sent in Selenese, or
via a Client API) and sends them to a browser.
This is implemented through a browser-specific browser driver,
which sends commands to a browser and retrieves results.
Most browser drivers actually launch and access a browser
application (such as Firefox, Chrome, Internet Explorer, Safari,
or Microsoft Edge); there is also an HtmlUnit browser driver,
which simulates a browser using the headless browser
HtmlUnit.
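The flow described above (a test calls the client API, and WebDriver relays the commands to a real browser) can be sketched in Python roughly as follows. This is a minimal sketch, not the presenter's demo: the helper name `page_title_contains`, the target URL, and the choice of Firefox are assumptions for illustration, and it presumes the `selenium` package and a Firefox driver are available locally.

```python
def page_title_contains(driver, url, expected):
    """Load `url` in the browser behind `driver` and check its title."""
    driver.get(url)                  # WebDriver tells the browser to navigate
    return expected in driver.title  # the title is read back from the browser


def main():
    # Imported here so the helper above can be exercised without a browser.
    from selenium import webdriver

    driver = webdriver.Firefox()     # assumption: Firefox and its driver are installed
    try:
        # Hypothetical check against an example site.
        assert page_title_contains(driver, "https://www.python.org", "Python")
    finally:
        driver.quit()                # always close the browser session
```

Calling `main()` opens a browser window, runs the check, and closes it. Swapping `webdriver.Firefox()` for `webdriver.Chrome()` would target Chrome instead, since the helper only relies on the common WebDriver interface.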
Hands on code
● An example …..
Scrapy
Scrapy (/ˈskreɪpi/ SKRAY-pee) is a free and open-source web-crawling
framework written in Python. Originally designed for web scraping, it
can also be used to extract data using APIs or as a general-purpose
web crawler. It is currently maintained by Scrapinghub Ltd., a
web-scraping development and services company.
Scrapy project architecture is built around "spiders", which are
self-contained crawlers that are given a set of instructions. Following the
spirit of other don't repeat yourself frameworks, such as Django, it
makes it easier to build and scale large crawling projects by allowing
developers to reuse their code. Scrapy also provides a web-crawling
shell, which can be used by developers to test their assumptions on a
site's behavior.
Scrapy: Basic Concept
● Command line tools
Scrapy is controlled through the scrapy command-line tool, to be referred here as the “Scrapy tool” to
differentiate it from the sub-commands, which we just call “commands” or “Scrapy commands”.
● Spiders
Spiders are classes which define how a certain site (or a group of sites) will be scraped,
including how to perform the crawl (i.e. follow links) and how to extract structured data from
their pages (i.e. scraping items). In other words, Spiders are the place where you define the
custom behaviour for crawling and parsing pages for a particular site (or, in some cases, a
group of sites).
● Selectors
Extract the data from web pages using XPath or CSS expressions.
● Scrapy Shell
Test your extraction code in an interactive environment.
Scrapy: Basic Concept 2
● Items
Define the data you want to scrape.
● Item Loaders
Populate your items with the extracted data.
● Item Pipeline
Post-process and store your scraped data.
● Feed Exports
Output your scraped data using different formats and storages.
● Requests and Responses
Scrapy uses Request and Response objects for crawling web sites.
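As a rough illustration of how items and an item pipeline fit together: an item describes the fields to scrape, and a pipeline is a plain Python class whose `process_item` method receives each scraped item. The sketch below assumes a recent Scrapy release that accepts dataclass objects as items; the `Product` fields and the `PriceToFloatPipeline` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    """Hypothetical item: the data we want to scrape for one product."""
    name: str
    price_text: str                # raw text as it appears on the page
    price: Optional[float] = None  # filled in by the pipeline below


class PriceToFloatPipeline:
    """Hypothetical pipeline step: turn the raw price text into a number."""

    def process_item(self, item, spider):
        # Strip the currency sign and thousands separators before converting.
        cleaned = item.price_text.replace("$", "").replace(",", "").strip()
        item.price = float(cleaned)
        return item                # hand the item on to the next pipeline step
```

In a real project the pipeline would be enabled through the `ITEM_PIPELINES` setting; here it can also be called directly, which makes the post-processing step easy to unit test.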
Scrapy: Basic Concept 3
● Link Extractors
Convenient classes to extract links to follow from pages.
● Settings
Learn how to configure Scrapy and see all available settings.
● Exceptions
See all available exceptions and their meaning.
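Settings live in a project's `settings.py` as ordinary Python assignments. The fragment below is a small sketch using a few standard Scrapy setting names; the bot name, the pipeline path and the output file name are hypothetical values chosen for illustration.

```python
# settings.py (fragment) -- values are illustrative assumptions
BOT_NAME = "demo_bot"        # hypothetical project name

ROBOTSTXT_OBEY = True        # respect each site's robots.txt rules
DOWNLOAD_DELAY = 1.0         # wait about one second between requests to a site

# Enable a pipeline class (hypothetical path); lower numbers run earlier.
ITEM_PIPELINES = {"demo_bot.pipelines.CleanupPipeline": 300}

# Feed export: write all scraped items to a JSON file.
FEEDS = {"items.json": {"format": "json"}}
```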
Let’s code
● An example …..
Usages
● UI testing
● Web crawling
● Hacking
Sources
● Wikipedia.org
● https://www.seleniumhq.org/
● https://scrapy.org/
● Tutorial: https://selenium-python.readthedocs.io/, https://www.youtube.com/watch?v=XDn60jw68tM,
https://docs.scrapy.org/en/latest/intro/tutorial.html
Questions & Answers
Thank you
