ScrapeGraphAI
You Only Scrape Once
Our Team
Marco Perini (perinim.98@gmail.com)
Lorenzo Padoan (lorenzo.padoan977@gmail.com)
Marco Vinciguerra, MSc Computer Engineering (mvincig11@gmail.com)
Backgrounds and roles: MSc Mechatronics Engineering; Junior Researcher @ Eurac Research; Research Fellow @ UNIPD; Data Engineer @ Motion Analytica
The era of Big Data
We live in a world that produces zettabytes of data.
Source: International Data Corporation (IDC)
We live in a data-hungry world: data feeds LLM training and analytics.
The principal source of data: the Internet
What is a scraper?
Scraping is the act of extracting information from a data source: Web → Scraping → Data
Common scraping tools
Dev tools
Web services
Our Question
Is it possible to scrape websites without any knowledge of HTML, just by writing what we want and how we want it?
Our Solution
Yes 🎉🎉🎉 with ScrapeGraphAI
What is this spider? 🤔
ScrapeGraphAI: an open-source, highly modular Python library for scraping powered by LLMs
Available LLMs & tools
Main Workflow
Input URL → Ask what you want to scrape → Scrape right away
Adapts to website structure changes
Corrects itself until it succeeds
Flexibility in scraping different websites
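The workflow above can be pictured as a small chain of nodes: parse the page, then ask the LLM for a structured answer. The sketch below is a hypothetical illustration, not ScrapeGraphAI's actual API: `parse_node`, `generate_answer_node`, `run_pipeline` and the stub LLM are assumed names, fetching the URL is left out so no network is needed, and "retry until the reply is valid JSON" is only one assumed way to model the self-correction step.

```python
import json
from html.parser import HTMLParser
from typing import Callable


class _TextExtractor(HTMLParser):
    """Collects the non-empty text chunks of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def parse_node(html: str) -> str:
    """Hypothetical 'parse' node: reduce raw HTML to plain text for the LLM."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)


def generate_answer_node(prompt: str, text: str, llm: Callable[[str], str],
                         max_attempts: int = 3) -> dict:
    """Hypothetical 'answer' node: ask the LLM, retrying while the reply is not valid JSON."""
    request = f"{prompt}\nReply with JSON only.\n\nContent:\n{text}"
    for _ in range(max_attempts):
        raw = llm(request)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Feed the failure back to the model and try again (assumed strategy).
            request += "\nYour previous reply was not valid JSON, try again."
    raise ValueError("LLM did not return valid JSON")


def run_pipeline(prompt: str, html: str, llm: Callable[[str], str]) -> dict:
    """Chain the nodes: parse the page, then generate the structured answer."""
    return generate_answer_node(prompt, parse_node(html), llm)
```

With a stub LLM whose first reply is malformed, the second attempt succeeds:

```python
replies = iter(["not json", '{"titles": ["First headline"]}'])
page = "<html><body><h2>First headline</h2></body></html>"
print(run_pipeline("List the news titles", page, lambda request: next(replies)))
# {'titles': ['First headline']}
```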
Comparative results
Let's suppose we want to extract the news titles from https://www.wired.com
BeautifulSoup vs ScrapeGraphAI
More concise, less code, reusability.
To scrape another website, you only have to change 2 lines!
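For contrast, a conventional scraper hard-codes the page structure. The sketch below uses Python's standard `html.parser` instead of BeautifulSoup so it runs without extra packages; the markup and the assumption that headlines sit in `<h2 class="title">` tags are hypothetical, not Wired's real HTML. If the site renames that class, the scraper silently returns nothing, which is the fragility the comparison is about.

```python
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collects the text of elements matching a fixed tag and CSS class (hypothetical layout)."""

    def __init__(self, tag: str = "h2", css_class: str = "title") -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.inside = False
        self.titles: list[str] = []

    def handle_starttag(self, tag, attrs):
        # An attribute without a value (e.g. <h2 class>) yields None, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.inside = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            self.inside = False
            self.titles[-1] = self.titles[-1].strip()

    def handle_data(self, data):
        if self.inside:
            self.titles[-1] += data


def scrape_titles(html: str, tag: str = "h2", css_class: str = "title") -> list[str]:
    parser = TitleScraper(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.titles
```

Usage, including what happens after a layout change:

```python
page = '<h2 class="title">AI news</h2><h2 class="summary-item">Other</h2><h2 class="title">Chips</h2>'
scrape_titles(page)                                   # ['AI news', 'Chips']
scrape_titles(page.replace('"title"', '"headline"'))  # [] once the class is renamed
```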
Audio Speech
Prompt: Make an audio summary of the news
Answer: {'news': [{'title': 'The LabLabAI hackathon deadline is close!',
'summary': 'Deadlines are coming close so be ready...'}]}
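Because the answer is structured, turning it into audio mostly means flattening it into a narration script for a text-to-speech model. The helper below is a hypothetical sketch of that step only: `narration_script` is an assumed name, the speech synthesis itself is not shown, and the `news`/`title`/`summary` keys simply mirror the answer on this slide.

```python
def narration_script(answer: dict) -> str:
    """Flatten a {'news': [{'title': ..., 'summary': ...}]} answer into text for a TTS engine."""
    items = answer.get("news", [])
    if not items:
        return "There is no news to report."
    count = len(items)
    intro = "Here is 1 news item." if count == 1 else f"Here are {count} news items."
    parts = [intro]
    for i, item in enumerate(items, start=1):
        parts.append(f"Item {i}: {item['title']} {item['summary']}")
    return " ".join(parts)
```

Applied to the answer above, it yields "Here is 1 news item. Item 1: The LabLabAI hackathon deadline is close! Deadlines are coming close so be ready...".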
Scraping from text
Various types of text can be used as input: plain strings, downloaded HTML code, XML, etc.
Input generic text → Ask what you want to scrape → Generate answer
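Since the input can be a plain string, downloaded HTML or XML, one way to handle all of them is to normalize everything to plain text before building the prompt. The sketch below is a hypothetical illustration using only the standard library: `looks_like_markup`, `to_plain_text` and `build_prompt` are assumed names, not ScrapeGraphAI functions, and the "contains a tag" check is a deliberately rough heuristic.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects non-empty text chunks from HTML or simple XML."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


def looks_like_markup(source: str) -> bool:
    """Rough heuristic: treat the input as HTML/XML if it contains something tag-like."""
    return re.search(r"<[A-Za-z!?/][^>]*>", source) is not None


def to_plain_text(source: str) -> str:
    """Strip tags from HTML/XML; plain strings are only whitespace-trimmed."""
    if not looks_like_markup(source):
        return source.strip()
    collector = _TextCollector()
    collector.feed(source)
    collector.close()
    return " ".join(collector.chunks)


def build_prompt(question: str, source: str) -> str:
    """Combine the user's request with the normalized content."""
    return f"{question}\n\nContent:\n{to_plain_text(source)}"
```

For example, `to_plain_text("<p>Hello <b>world</b></p>")` returns `"Hello world"`, while `"3 < 5 and 7 > 2"` is left as plain text because it contains no tag.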
Pros of ScrapeGraphAI
Low code and fast implementation
Fault tolerance to dynamic HTML code
Possibility to run a local LLM
Portability
No scraped data is sent to a third-party LLM provider when you run a local LLM
😍
Potential users
Companies
Developers + Researchers
Some numbers
After 2 months of development, the project has:
100+ stars on GitHub
8 forks
4k+ downloads on PyPI (pip)
🤗
Call to Action
Early Adopters + Community + Partnerships
Demo time!
Repositories
VinciGit00/Scrapegraph-ai
VinciGit00/Scrapegraph-LabLabAI-Hackathon (ScrapeGraphAI + Streamlit website)
If you like the project, feel free to leave a star ⭐️
A promise is a promise
Lorenzo: "If we reach 1000 stars, I will buy this."
Me:
Thank you for your attention
ScrapeGraphAI: a new way to scrape context with AI