Industrial and Information
Engineering
Generation of Realistic Navigation Paths for Web Site Testing
using Recurrent Neural Networks and Generative Adversarial
Neural Networks
Silvio Pavanetto and Marco Brambilla
Semantic Web and Linked Open Data Helsinki,
Finland, Online on 9 – 12 June 2020
Introduction and Motivations
Why weblog generation?
1. Improve products even before their release
2. Generate open, high-quality data for research
3. Related work does not focus on high-quality weblog
generation
3.1 Only a few open-source libraries are available
Problem Definition
Challenges to be Faced
1. Understand whether deep learning algorithms can
generate better weblog data than statistical
methods
2. Understand what "better" means for a weblog
3. Among the various deep learning
techniques, apply a GAN (Generative
Adversarial Network) to a new task
Problem Definition
Roadmap for solving the problem
1. Pre-process a publicly available weblog
2. Develop a statistical algorithm
3. Develop a recurrent neural network
4. Develop a GAN
5. Evaluate the quality of the generated data
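The slides do not say which statistical algorithm serves as the baseline. As an illustration only, the sketch below assumes a first-order Markov chain fitted on real sessions; the function names and the `<END>` marker are hypothetical.

```python
import random
from collections import Counter, defaultdict

END = "<END>"  # hypothetical marker for the end of a session

def fit_markov(sessions):
    """Count entry pages and page-to-page transitions in real sessions."""
    starts = Counter()
    transitions = defaultdict(Counter)
    for session in sessions:
        pages = list(session)
        if not pages:
            continue
        starts[pages[0]] += 1
        # Pair each page with its successor; the last page is followed by END.
        for current, nxt in zip(pages, pages[1:] + [END]):
            transitions[current][nxt] += 1
    return starts, transitions

def sample(counter, rng):
    """Draw one key with probability proportional to its count."""
    keys = list(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys], k=1)[0]

def generate(starts, transitions, rng, max_len=20):
    """Generate one synthetic navigation path."""
    path = [sample(starts, rng)]
    while len(path) < max_len:
        nxt = sample(transitions[path[-1]], rng)
        if nxt == END:
            break
        path.append(nxt)
    return path

sessions = [["/", "/a", "/b"], ["/", "/a"], ["/c"]]
starts, transitions = fit_markov(sessions)
print(generate(starts, transitions, random.Random(0)))
```

A model like this only remembers the previous page, which is one reason to compare it against sequence models that can use the whole navigation history.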
Proposed Approach
Pre-processing algorithm
Cleaning
• Remove entries with a response code other than 200
• Remove activity coming from bots
• Remove requests for non-HTML resources (such as .png or .gif
files loaded inside a web page)
Knowledge extraction
• List of possible entry points
• Navigation patterns mined with the Apriori algorithm
• Generation of the datasets that will be used by the other
algorithms
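The three cleaning rules can be sketched as a single filter over log lines. This is a minimal sketch, assuming the weblog is in Common Log Format; the bot hints and page suffixes are illustrative guesses, not the authors' actual rules.

```python
import re

# Common Log Format, e.g.:
# host - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+$'
)

# Assumed heuristics; real bot detection usually needs more signals.
BOT_HINTS = ("bot", "crawler", "spider")
PAGE_SUFFIXES = (".html", ".htm", "/")

def keep_entry(line):
    """Return the parsed entry if it survives the cleaning rules, else None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None                      # unparsable line
    entry = match.groupdict()
    if entry["status"] != "200":
        return None                      # rule 1: successful responses only
    if any(hint in entry["host"].lower() for hint in BOT_HINTS):
        return None                      # rule 2: drop likely bots
    path = entry["url"].split("?", 1)[0].lower()
    if not path.endswith(PAGE_SUFFIXES):
        return None                      # rule 3: drop images, scripts, etc.
    return entry

line = ('199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
        '"GET /history/apollo/ HTTP/1.0" 200 6245')
print(keep_entry(line)["url"])
```

Entries that pass the filter can then be grouped by host and time gap into sessions, from which entry points and frequent navigation patterns are extracted.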
Proposed Approach
Deep Learning - RNN
Why Recurrent Neural Network?
• Well suited for processing sequential data
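To make the recurrence concrete, here is a minimal sketch of one step of a plain (Elman) RNN that reads the current page and outputs a probability for each possible next page. The slides do not specify the architecture, so the cell type, the sizes and the random untrained weights are all assumptions.

```python
import math
import random

def init_params(vocab_size, hidden_size, rng):
    """Small random weights; a real model would learn these from sessions."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return (matrix(hidden_size, vocab_size),   # W_xh: input to hidden
            matrix(hidden_size, hidden_size),  # W_hh: hidden to hidden (the loop)
            [0.0] * hidden_size,               # b_h
            matrix(vocab_size, hidden_size))   # W_hy: hidden to output

def rnn_step(page_id, hidden, params):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); output = softmax(W_hy h_t)."""
    W_xh, W_hh, b_h, W_hy = params
    size = len(hidden)
    new_hidden = [
        math.tanh(
            W_xh[i][page_id]  # x_t is one-hot, so it selects one column
            + sum(W_hh[i][j] * hidden[j] for j in range(size))
            + b_h[i]
        )
        for i in range(size)
    ]
    logits = [sum(row[i] * new_hidden[i] for i in range(size)) for row in W_hy]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return new_hidden, [e / total for e in exps]

# Sample a short path of page ids from the untrained network.
rng = random.Random(0)
params = init_params(vocab_size=4, hidden_size=3, rng=rng)
hidden, page, path = [0.0] * 3, 0, [0]
for _ in range(5):
    hidden, probs = rnn_step(page, hidden, params)
    page = rng.choices(range(4), weights=probs, k=1)[0]
    path.append(page)
print(path)
```

The hidden state carried from step to step is what lets the network condition the next page on the whole path so far, not only on the last page.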
Proposed Approach
Generative Adversarial Network
• A type of neural
network, first proposed in 2014,
with strong
generative capabilities
• So far used almost exclusively in
computer vision
Key concept: pit two neural networks (a generator and a discriminator)
against each other in a two-player game
Proposed Approach
GAN Implementation – Possible Solution
GANs are designed to generate continuous data, whereas a navigation
path is a sequence of discrete URLs
Possible solution:
• Treat the generative model as a reinforcement learning (RL)
agent
• The state is the sequence of URLs generated so far, and the
action is the next URL to generate
• Reward: the probability, estimated by the discriminator, that the
sequence is real
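The RL formulation can be illustrated with a REINFORCE-style update, in which the discriminator's score for a finished sequence weights the log-probability gradient of every action taken. This is a toy sketch under strong assumptions: the state is reduced to the last page only, the policy is a table of logits, and `toy_discriminator` is a hand-written stand-in for a trained discriminator.

```python
import math
import random

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_discriminator(sequence):
    """Stand-in for P(sequence is real): penalizes immediate page repeats."""
    repeats = sum(a == b for a, b in zip(sequence, sequence[1:]))
    return 1.0 - repeats / max(1, len(sequence) - 1)

def reinforce_step(logits, n_pages, length, lr, rng):
    """Sample one sequence, score it, and update the policy logits in place."""
    sequence = [rng.randrange(n_pages)]
    steps = []
    for _ in range(length - 1):
        state = sequence[-1]                     # simplified state: last page
        probs = softmax(logits[state])
        action = rng.choices(range(n_pages), weights=probs, k=1)[0]
        steps.append((state, action, probs))
        sequence.append(action)
    reward = toy_discriminator(sequence)         # one reward for the whole path
    for state, action, probs in steps:
        for k in range(n_pages):
            # Gradient of log softmax w.r.t. logit k: 1[k == action] - probs[k]
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[state][k] += lr * reward * grad
    return sequence, reward

rng = random.Random(0)
n_pages = 3
logits = [[0.0] * n_pages for _ in range(n_pages)]
for _ in range(500):
    reinforce_step(logits, n_pages, length=5, lr=0.1, rng=rng)
print([round(softmax(row)[i], 3) for i, row in enumerate(logits)])
```

The final line prints the learned probability of staying on the same page for each state. In a full adversarial setup the discriminator would be retrained alternately with the generator, rather than fixed as it is here.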
Experiments
Understand if a weblog is good
Evaluation Metric: BLEU
BLEU (Bilingual Evaluation Understudy) is a score for
comparing a candidate translation of a text with one or more
reference translations. It was designed to evaluate
the quality of text that has been machine-translated from
one natural language to another.
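Applied to navigation paths, each URL plays the role of a word and each real session plays the role of a reference translation. The sketch below is a simplified sentence-level BLEU (uniform weights, no smoothing) and is not necessarily the exact variant used in the experiments; `max_n=2` is an arbitrary choice.

```python
import math
from collections import Counter

def ngrams(sequence, n):
    """Count all contiguous n-grams of a sequence."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

def bleu(candidate, references, max_n=2):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    if not candidate or not references:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: compare with the reference length closest to the candidate.
    c_len = len(candidate)
    r_len = min((len(r) for r in references), key=lambda l: (abs(l - c_len), l))
    penalty = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    return penalty * math.exp(sum(log_precisions) / max_n)

real = [["/", "/a", "/b"]]
print(bleu(["/", "/a", "/c"], real))
```

In the example, two of three unigrams and one of two bigrams match the reference, so the score is the geometric mean of 2/3 and 1/2.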
Experiments
Understand if a weblog is good
BLEU is not enough.
Human Evaluation!
Evaluation game:
• 50 real sequences are mixed with 50 sequences generated by the algorithms
• 6 judges are invited to check the 100 sequences
• +1 point for the algorithm if the judge is fooled
• 0 points if the judge discovers that the sequence is not real
• Scores are averaged over all the judges
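The scoring rule can be written down in a few lines. This sketch assumes that only generated sequences contribute to an algorithm's score and that each judge's score is normalized by the number of generated sequences that judge saw; the slides do not state either detail.

```python
def fooling_rate(judgements):
    """Average, over judges, of the share of generated sequences judged real.

    judgements: one list per judge of (is_generated, judged_real) pairs.
    """
    per_judge = []
    for answers in judgements:
        verdicts = [judged_real for is_generated, judged_real in answers if is_generated]
        if not verdicts:
            raise ValueError("each judge must rate at least one generated sequence")
        # +1 when a generated sequence is judged real, 0 otherwise.
        per_judge.append(sum(verdicts) / len(verdicts))
    return sum(per_judge) / len(per_judge)

judge_a = [(True, True), (True, False), (False, True)]
judge_b = [(True, True), (True, True), (False, False)]
print(fooling_rate([judge_a, judge_b]))  # 0.75
```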
Experiments
Evaluation – Final Comparison
Weblog generation performance comparison
Conclusions
We proposed a step towards the automatic production of high-quality
weblogs using deep learning techniques such as recurrent neural
networks and generative adversarial networks.
Deep learning methods are suitable for weblog generation:
• The GAN is the best algorithm; it outperforms the baseline by:
• 0.2116 on the Human metric
• 0.1432 on the BLEU metric
Future Work
• Integration with Model-Driven approaches useful for visualizing
statistics about weblogs in a graphical way
• Addition of more variables in the training of the network that could
improve the quality of the generated weblog
• Evaluation with other weblogs, belonging to different websites

Editor's Notes

  • #7: Non-HTML pages are files such as .png, .gif or other file types loaded inside a web page. (This task and its related issues will be discussed later.)
  • #8: RNN: an artificial neural network (ANN) in which connections between nodes form a directed graph along a sequence. This allows it to exhibit temporal dynamic behavior for a time sequence. In the slide's diagram, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next. These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren't all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.
  • #10: Consider the sequence generation procedure as a sequential decision-making process.
  • #11: Quality is considered to be the correspondence between a machine's output and that of a human. Although BLEU is usually used for evaluating text, we already mentioned that the task faced in this work can be associated with text translation, because of the conceptual similarity between the sequence of pages in a single navigation session and the sequence of words in a phrase. In fact, every URL is treated as a unique "word" in the vocabulary, composed of all the pages of a particular website. Using this metric, scores are calculated for individual translated segments (generally sentences) by comparing them with a set of good-quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation's overall quality. Transferring this to our case, the translated segments are the generated navigation sequences, while the good-quality reference translations correspond to our original dataset: the NASA weblog.
  • #12: Humans are good at evaluating this type of data, since a weblog is composed of navigation sequences and every sequence is decided and created by a human.