Transformers
Background (1)
• The RNN and LSTM neural models were designed to process
language and perform tasks like classification, summarization,
translation, and sentiment detection
• RNN: Recurrent Neural Network
• LSTM: Long Short Term Memory
• In both models, layers receive the next input word and have access to some previous words, allowing them to use the word’s left context
• They used word embeddings, where each word was encoded as a vector of 100-300 real numbers representing its meaning (a minimal sketch follows below)
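To make the word-embedding idea concrete, here is a minimal sketch assuming PyTorch; the toy vocabulary and the 300-dimensional size are illustrative assumptions, not taken from the slides.

```python
# Minimal word-embedding sketch (assumes PyTorch; vocabulary and sizes are illustrative).
import torch
import torch.nn as nn

vocab = {"log": 0, "error": 1, "disk": 2, "full": 3}   # hypothetical toy vocabulary
# Each word id maps to a learned vector of ~100-300 real numbers.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=300)

ids = torch.tensor([vocab["disk"], vocab["full"]])
vectors = embedding(ids)        # shape (2, 300): one dense vector per word
print(vectors.shape)
```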
Background (2)
• Transformers extend this to allow the network to process an input word knowing the words in both its left and right context
• This provides a more powerful context model
• Transformers add additional features, like attention, which identifies the important words in this context (a minimal sketch of attention follows below)
• And they break the problem into two parts:
• An encoder (e.g., BERT)
• A decoder (e.g., GPT)
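As a rough illustration of what attention computes, here is a minimal sketch of scaled dot-product self-attention in PyTorch; the shapes and random inputs are placeholders, and real transformer layers add multiple heads, learned projections, and masking.

```python
# Scaled dot-product self-attention sketch (assumes PyTorch; inputs are random placeholders).
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (batch, seq_len, d_model). Returns an attention-weighted mix of v."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # similarity of each word to every other word
    weights = torch.softmax(scores, dim=-1)          # "importance" of each context word
    return weights @ v

x = torch.randn(1, 5, 64)                            # 5 words, 64-dim vectors each
out = scaled_dot_product_attention(x, x, x)          # self-attention over left AND right context
print(out.shape)                                     # torch.Size([1, 5, 64])
```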
Transformer model
[Figure: Encoder (e.g., BERT) and Decoder (e.g., GPT)]
Transformers, GPT-2, and BERT
1. A transformer uses an encoder stack to model the input and a decoder stack to model the output (using information from the encoder side)
2. If we have no input and just want to model the “next word”, we can drop the encoder side of the transformer and emit the “next word” one at a time. This gives us GPT
3. If we are only interested in a language model of the input for some other task, we do not need the decoder of the transformer; that gives us BERT (see the sketch after this list)
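A hedged sketch of this encoder-only vs. decoder-only split using the Hugging Face transformers library; the gpt2 and bert-base-uncased checkpoint names are the standard public ones, and the example prompts are made up.

```python
# Decoder-only vs. encoder-only sketch (assumes the Hugging Face transformers package).
from transformers import pipeline

# Decoder-only (GPT-2): generate the "next word(s)" given a prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("The server log shows a disk", max_new_tokens=10)[0]["generated_text"])

# Encoder-only (BERT): model the input itself, e.g. fill in a masked word.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The server log shows a [MASK] error.")[0]["token_str"])
```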
Training a Transformer
• Transformers typically use semi-supervised learning with
• Unsupervised pretraining over a very large dataset of general text
• Followed by supervised fine-tuning over a focused dataset of inputs and outputs for a particular task (see the fine-tuning sketch after the task list below)
• Tasks for pretraining and fine-tuning commonly include:
• language modeling
• next-sentence prediction (aka completion)
• question answering
• reading comprehension
• sentiment analysis
• paraphrasing
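Here is a minimal sketch of the supervised fine-tuning step, assuming the Hugging Face transformers and datasets libraries; the imdb sentiment dataset, the small training subset, and the hyperparameters are illustrative assumptions, not from the slides.

```python
# Supervised fine-tuning sketch on top of a pretrained checkpoint
# (assumes Hugging Face transformers + datasets; dataset and hyperparameters are illustrative).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")                       # example sentiment dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small subset for the sketch
)
trainer.train()                                      # fine-tunes the pretrained weights on labeled data
```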
Pretrained models
• Since training a model requires huge datasets of text and significant computation, researchers often use common pretrained models (see the loading sketch below)
• Examples (circa December 2021) include:
• Google’s BERT model
• Huggingface’s various Transformer models
• OpenAI’s GPT-2 and GPT-3 models
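A hedged example of reusing one of these common pretrained checkpoints instead of training from scratch; bert-base-uncased is the standard public name for Google's BERT weights on the Hugging Face hub, and the input sentence is made up.

```python
# Loading a common pretrained model (assumes the Hugging Face transformers package).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # pretrained BERT tokenizer
model = AutoModel.from_pretrained("bert-base-uncased")           # pretrained BERT weights

inputs = tokenizer("disk full error on node 7", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # one contextual vector per input token
```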
Huggingface Models
OpenAI Application Examples
GPT-2, BERT
[Figure: GPT-2 model sizes of 117M, 345M, 762M, and 1542M parameters]
GPT released June 2018
GPT-2 released Nov. 2019 with 1.5B parameters
GPT-3 released in 2020 with 175B parameters
