Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs

Industrial and Information
Engineering
Generation of Realistic Navigation Paths for Web Site Testing
using Recurrent Neural Networks and Generative Adversarial
Neural Networks
Silvio Pavanetto and Marco Brambilla
Semantic Web and Linked Open Data Helsinki,
Finland, Online on 9 – 12 June 2020

Introduction and Motivations
Why weblog generation?
1. Improve products even before the release
2. Generate open high-quality data for research
3. Related work with no focus on high-quality weblog
generation
3.1 Only few open source libraries

Introduction and Motivations
Why weblog generation?

Problem Definition
Challenges to be Faced
1. Understand if deep learning algorithms can
generate better weblogs data than statistical
methods
2. Understand what better weblog means
3. Among the various deep learning
techniques, apply GAN (Generative
Adversarial Network) to a new task

Problem Definition
Roadmap for solving the problem
Pre-process a publicly available weblog
Develop statistical
algorithm
Develop recurrent
neural network
Develop GAN
Evaluate the quality
of the generated data

Proposed Approach
Pre-processing algorithm
Cleaning
• Remove entries having
response code other than 200
• Remove activities coming
from bots
• Remove no HTML pages
• List of possible entry points
• Navigation pattern using data
mining (Apriori)
• Generation of datasets that
will be used by the other
algorithms
Knowledge extraction

Proposed Approach
Deep Learning - RNN
Why Recurrent Neural Network?
• Well suited for processing sequential data

Proposed Approach
Generative Adversarial Network
• New type of neural
network (first in 2014)
with incredible
generation capabilities
• Almost used only in
computer vision
Key concept: Put two neural networks one against the other
in a two-player game

Proposed Approach
GAN Implementation – Possible Solution
GAN is designed for generating continuous data
Possible solution:
• Generative model treated as an agent of reinforcement learning
(RL)
• The state is composed by the generated URLs so far, and the
action is the next URL to be generated
Reward: The discriminator produces a probability for the
sequence of being real

Experiments
Understand if a weblog is good
Evaluation Metric: BLEU
BLEU, or Bilingual Evaluation Understudy, is a score for
comparing a candidate translation of text to one or more
reference translations, or also, is an algorithm for evaluating
the quality of text which has been machine-translated, from
one natural language to another.

Experiments
Understand if a weblog is good
BLEU is not enough.
Human Evaluation!
• 50 real sequences and 50 generated by the algorithms mixed
• 6 judges are invited to check the 100 sequences
• +1 for the algorithm if the judge is fooled
• +0 point if the judge discovers that the sequence is not real
• Scores are averaged among all the judges
Evaluation game:

Experiments
Evaluation – Final Comparison
Weblog generation performance comparison

Conclusions
We proposed a step forward towards automatic production of high-
quality weblog using deep learning techniques, such as recurrent neural
network and generative adversarial neural networks.
Deep learning methods are suitable for weblog generation:
• The GAN is the best algorithm: it outperforms the baseline by:
• 0.2116 with the Human metric
• 0.1432 with the BLEU metric

Future Work
Integration with Model-Driven approaches useful for visualizing
statistics about weblogs in a graphical way
Addition of more variables in the training of the network that could
improve the quality of the generated weblog
Evaluation with other weblogs, belonging to different websites

Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs

More Related Content

Similar to Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs (20)

More from Marco Brambilla (20)

Recently uploaded (20)

Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs

Editor's Notes