Transformers for Generative AI and Computer Vision: Theory and Practice
Outline
 The journey of Gen AI
 The foundation of Gen AI
 Sequential models and their limitations
 Introduction to the Transformer
 Transformer Architecture
 Vision Transformer
 Discussion
The journey of Gen AI
 Data Analysis: The Foundation
 Businesses used data analysis tools to track customer purchasing behavior and optimize their
marketing strategies.
 In healthcare, it powered descriptive analytics to monitor patient health trends and reduce risks.
 Data Mining: Digging Deeper
 In retail, Amazon’s "People Who Bought This Also Bought" recommendations were powered by
data mining to identify product associations.
 In fraud detection, banks analyzed customer transaction patterns to uncover fraudulent
activities in real time.
 Machine Learning: Enabling Predictive Intelligence
 In e-commerce, Netflix revolutionized content recommendations with its machine learning-
based personalization engine.
 In transportation, Uber used demand prediction algorithms to optimize ride availability and
pricing.
 In finance, machine learning models were deployed for credit scoring and portfolio risk analysis.
The journey continues
 Deep Learning: Unlocking Complex Representations
 In healthcare, deep learning models helped detect diseases like cancer through medical image
analysis.
 In automotive, Tesla leveraged deep learning to enhance self-driving car systems.
 In communication, Google Translate used deep learning to provide near-human translation for over
100 languages.
 Artificial Intelligence: Broadening Horizons
 In education, AI-powered tools like Duolingo revolutionized personalized learning experiences.
 In healthcare, IBM’s Watson analyzed millions of medical papers to assist doctors with accurate
diagnoses and treatments.
 In urban planning, smart cities used AI to optimize energy consumption, reduce traffic congestion, and
improve public safety.
 Generative AI: Creating Intelligence
 In media, tools like ChatGPT and DALL-E generate human-like text, images, and creative art for content
creators and marketers.
 In entertainment, AI-generated scripts, music, and video game environments are reshaping the
creative process.
 In medicine, generative AI is used to design new drugs by simulating and optimizing molecular
structures.
What is next after Gen AI?
Will machines/AI replace humans?
The foundation of Gen AI
 Key Goals for Generating Anything
 Understanding the Characteristics of Data
 Identifying patterns, relationships, and features unique to the data points.
 Example: Understanding linguistic structures in text, pitch and tone in audio,
or object motion in videos.
 Sequential Nature of Real-World Data:
 Text: Words in a sentence depend on the sequence for meaning,
 Audio: Speech depends on phoneme order and timing,
 Videos: Frames rely on temporal order to convey events.
 Contextual understanding:
 Capturing the relationship between data points to maintain coherence.
 Example: Predicting the next word in a sentence or summarizing a video
segment based on surrounding context.
Sequence Learning Models
 RNN (Recurrent Neural Network)
 LSTM (Long Short-Term Memory)
 GRU (Gated Recurrent Unit)
 Transformer
Recurrent Neural Network (RNN)
 An RNN is able to ‘memorize’ parts of the input and use them to make accurate predictions.
 These networks are at the heart of speech recognition, translation, and more.
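To make the recurrence concrete, here is a minimal vanilla RNN cell in NumPy; all names and sizes are illustrative sketches, not taken from the slides:

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence of input vectors.

    x_seq: (seq_len, input_size) - one vector per time step.
    Returns all hidden states, shape (seq_len, hidden_size).
    """
    hidden_size = W_hh.shape[0]
    h = np.zeros(hidden_size)          # initial hidden state: the "memory"
    states = []
    for x_t in x_seq:
        # new memory = nonlinearity(current input + previous memory)
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

# Toy usage: 5 time steps, 8-dim inputs, 16-dim hidden state.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
W_xh = rng.normal(size=(16, 8)) * 0.1
W_hh = rng.normal(size=(16, 16)) * 0.1
states = rnn_forward(x, W_xh, W_hh, np.zeros(16))
print(states.shape)  # (5, 16)
```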
RNN (continued)
Problem with Standard RNN
 The simplest RNN model has a major drawback, called vanishing
gradient problem, which prevents it from being accurate.
 This means that the network experiences difficulty in
memorizing words from far away in the sequence and makes
predictions based on only the most recent ones.
 Solutions:
Various versions of RNN: LSTM, GRU
Attention model
Application based solution
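A toy numerical illustration of the vanishing gradient: backpropagation through time multiplies the gradient by the recurrent Jacobian at every step, so when the recurrent weights are small the signal from distant tokens decays geometrically. The tanh derivative (at most 1) is ignored here, and all values are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
W_hh = rng.normal(size=(16, 16)) * 0.05   # recurrent weights, spectral norm < 1
grad = np.ones(16)                        # gradient arriving at the last step

for t in range(30):
    # each step back in time multiplies the gradient by W_hh^T
    grad = W_hh.T @ grad
    if t % 10 == 9:
        print(f"after {t + 1} steps back: ||grad|| = {np.linalg.norm(grad):.2e}")
# the norm collapses toward zero: early tokens barely influence the loss
```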
Attention Model
 An attention mechanism is a component of a neural network.
 At each decoder step, it decides which source parts are more important. In this
setting, the encoder does not have to compress the whole source into a single
vector: it provides representations for all source tokens (for example, all RNN
states instead of only the last one).
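A minimal sketch of one common variant, dot-product attention over all encoder states at a single decoder step (shapes and names are illustrative assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Weight every encoder state by its relevance to the current decoder step.

    decoder_state:  (hidden,)          - current decoder hidden state
    encoder_states: (src_len, hidden)  - one state per source token
    Returns the context vector and the attention weights.
    """
    scores = encoder_states @ decoder_state   # relevance of each source token
    weights = softmax(scores)                 # normalized to sum to 1
    context = weights @ encoder_states        # weighted sum of source states
    return context, weights

rng = np.random.default_rng(2)
context, weights = attend(rng.normal(size=16), rng.normal(size=(7, 16)))
print(weights.round(2), context.shape)  # 7 weights, (16,) context vector
```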
Transformer
Why Transformer?
Problems with sequence models on NLP tasks
Problems with CNNs
Transformer: an attention-based model
NLP
For NLP applications, attention is often described as the
relationship between words (tokens) in a sentence.
CV
In a computer vision application, attention looks at the
relationships between patches (tokens) in an image.
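To make “patches as tokens” concrete, here is a minimal sketch of turning an image into a sequence of flattened patches; a real vision transformer would additionally apply a learned linear projection and positional embeddings (sizes here are illustrative):

```python
import numpy as np

def image_to_patches(img, patch=4):
    """Split an image (H, W, C) into flattened non-overlapping patches.

    Each patch becomes one 'token' of length patch*patch*C,
    so attention can relate patches the way it relates words.
    """
    H, W, C = img.shape
    img = img[:H - H % patch, :W - W % patch]       # drop ragged edges
    rows, cols = img.shape[0] // patch, img.shape[1] // patch
    patches = img.reshape(rows, patch, cols, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4)      # (rows, cols, patch, patch, C)
    return patches.reshape(rows * cols, patch * patch * C)

tokens = image_to_patches(np.random.rand(32, 32, 3), patch=4)
print(tokens.shape)  # (64, 48): 64 patch tokens, each a 48-dim vector
```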
What is the Transformer?
A self-attention-based architecture that does not use any
recurrent, sequence-based components [1].
It extracts features for each word using a self-attention
mechanism that figures out how important all the other
words in the sentence are with respect to that word.
Attention:
The attention that the token pays to other tokens
The attention that a set of tokens pays to the token
[1] Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia
Polosukhin. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
Major differences between seq2seq and the Transformer
 Input values
 Architecture
 Encoder and decoder stacks
Transformer in action
English input: "I am living at Ahmedabad"
Gujarati translation: "હું અમદાવાદ રહું છું" ("I live in Ahmedabad")
Related tasks: NER (Ahmedabad), intent detection (Information), text segmentation
Transformer Architecture
Input Embedding and Positional Embedding
 Embedding: converts input tokens into numerical form (vectors)
 Positional embedding: preserves the order of the words in the
given sentence
 Output shape: samples × sequence length × embedding size
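A sketch of the sinusoidal positional encoding used in the original Transformer [1], added to the token embeddings so that word order is recoverable; the batch dimension gives the samples × sequence length × embedding size shape noted above (sizes are illustrative):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings, shape (seq_len, d_model)."""
    pos = np.arange(seq_len)[:, None]      # positions 0..seq_len-1
    i = np.arange(d_model)[None, :]        # embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])   # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])   # odd dimensions: cosine
    return pe

# Toy batch: 2 samples, 10 tokens, 32-dim embeddings.
rng = np.random.default_rng(3)
embeddings = rng.normal(size=(2, 10, 32))     # samples x seq_len x d_model
x = embeddings + positional_encoding(10, 32)  # broadcast over samples
print(x.shape)  # (2, 10, 32)
```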
Architecture
Self-Attention
 The self-attention mechanism allows the inputs to interact with each other (“self”) and
find out who they should pay more attention to (“attention”).
 The Transformer's encoder can be thought of as a sequence of reasoning steps (layers).
 At each step, tokens look at each other (this is where we need attention - self-attention),
exchange information and try to understand each other better in the context of the
whole sentence. This happens in several layers.
 In each decoder layer, tokens of the prefix also interact with each other via a self-
attention mechanism, but additionally, they look at the encoder states (without this, no
translation can happen, right?).
 The self-attention mechanism first computes three vectors (query, key, and value) for each
word in the sentence.
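A sketch of how those three vectors are obtained: each token embedding is multiplied by three projection matrices, which are learned in a real model but random here (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
seq_len, d_model, d_k = 6, 32, 16

X = rng.normal(size=(seq_len, d_model))   # one embedding per token
W_q = rng.normal(size=(d_model, d_k))     # learned in practice, random here
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q = X @ W_q   # queries: what each token is looking for
K = X @ W_k   # keys:    what each token offers to be matched against
V = X @ W_v   # values:  the content each token contributes
print(Q.shape, K.shape, V.shape)  # (6, 16) each
```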
Actors of attention
 Two actors:
 Attender
 Attendee
 In the Transformer, any token can attend to any other token, including itself.
Query, Key, Value
 Query, Key, and Value? (are they related to IR (Information Retrieval)?)
• Query: The query is a representation of the current word used to score against all
the other words (using their keys). We only care about the query of the token we’re
currently processing.
• Key: Key vectors are like labels for all the words in the segment. They’re what we
match against in our search for relevant words.
• Value: Value vectors are the actual word representations. Once we’ve scored how
relevant each word is, these are the values we add up to represent the current word.
Query vector
 The query vector represents the element of interest or the context that you
want to obtain information about.
 The query vector is used to determine the similarity or relevance between this
context and other elements in the input sequence, specifically the key vectors.
 Suppose you’re translating a sentence from English to French, and you’re at
a particular word in the English sentence (the query). The keys are
representations of all words in the English sentence, and the values are their
corresponding translations in French. Example:
“apple” (the word you want to translate)
Key vector
 The key vector, like the query vector, is a projection of the input data and is
associated with each element in the input sequence.
 The key vectors are used to compute how relevant each element in the input
sequence is to the query.
 This relevance is often calculated using a dot product or another similarity
measure between the query and key vectors.
 [“cat”, “apple”, “tree”, “juice”] (representations of words in English)
Value vector
 The value vector is also a projection of the input data and is associated with
each element in the input sequence, just like the key vector.
 The value vectors store the actual information that will be used to update the
representation of the query. These values are weighted by the attention
scores (computed from the query-key interaction) to determine how much
each element contributes to the final output.
 The attention scores, computed based on the query and key, are used to
weight the value vectors. Higher attention scores mean that the
corresponding values are more important for the output.
 Example: [“chat”, “pomme”, “arbre”, “jus”] (corresponding French translations)
In simple words:
• query - asking for information;
• key - saying that it has some information;
• value - giving the information.
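Putting the three roles together, a minimal single-head self-attention sketch (no masking, no multiple heads; sizes are illustrative):

```python
import numpy as np

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key match scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys, per query
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(5)
Q = rng.normal(size=(6, 16))
K = rng.normal(size=(6, 16))
V = rng.normal(size=(6, 16))
out = self_attention(Q, K, V)
print(out.shape)  # (6, 16): one updated representation per token
```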
Worked example (step-by-step attention computation, shown in figures)
Working methodology: scaling (shown in figures)
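The scaling step divides each query-key dot product by √d_k before the softmax; without it, large d_k inflates the dot products and pushes the softmax into regions with extremely small gradients. In the notation of Vaswani et al. [1]:

Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V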