LLMs and GenAI Simplified: An Easy Path to Understanding
LLMs Simplified: An Easy Path to Understanding [DRAFT]
Definitions
Background
1. Introduction to Large Language Models (LLMs)
2. LLM Architecture
3. Applications of LLMs
4. LLM Performance Benchmarks
5. Governance, Ethics, and Responsible AI
6. Challenges and Future Directions
Conclusion
CHAPTER 0: FUNDAMENTALS
Overview:
Step 1: Set up the Neural Network Structure
Step 2: Initialize Weights and Biases
Step 3: Forward Propagation
Step 4: Calculate Loss (Error)
Step 5: Backpropagation
Step 6: Repeat the Process
Simple Example:
Summary:
1. Neural Network Models
2. Activation Functions
3. Loss Functions
4. Optimizers
5. Metrics
Putting It All Together:
1. Sequential Model
2. Functional API
3. Subclassing Model
4. Model with Shared Layers
5. Multi-Input and Multi-Output Models
6. Autoencoders
7. GANs (Generative Adversarial Networks)
Summary
1. Mean Squared Error (MSE)
2. Mean Absolute Error (MAE)
3. Binary Cross-Entropy (Log Loss)
4. Categorical Cross-Entropy
5. Sparse Categorical Cross-Entropy
6. Hinge Loss
7. Huber Loss
8. Kullback-Leibler Divergence (KL Divergence)
9. Poisson Loss
Summary of Loss Functions by Type:
CHAPTER 1: GenAI and LLM
2. LLM Types
1. General-Purpose LLMs
2. Multilingual LLMs
3. Instruction-Following LLMs
4. Conversational LLMs
5. Code Generation LLMs
6. Specialized LLMs
7. Knowledge-Enhanced LLMs
8. Multimodal LLMs
9. Compression and Parameter-Efficient LLMs
10. Large Language Models with Memory
11. Few-Shot and Zero-Shot LLMs
12. Reinforcement Learning-Based LLMs
3. Popular LLMs
1. OpenAI GPT (Generative Pre-trained Transformer)
2. LLaMA (Large Language Model Meta AI)
3. Google Gemini
4. Claude (Claude 1, Claude 2)
5. PaLM (Pathways Language Model)
6. BLOOM
7. Grok (XAI)
8. Mistral
Conclusion:
4. Open source LLMs
1. BERT (Bidirectional Encoder Representations from Transformers)
2. GPT-2
3. RoBERTa
4. T5 (Text-to-Text Transfer Transformer)
5. BLOOM (BigScience Large Open-science Open-access Multilingual Language Model)
6. DistilBERT
7. XLM-R (XLM-RoBERTa)
8. BART (Bidirectional and Auto-Regressive Transformers)
9. Flan-T5
10. CodeBERT
CHAPTER 2: LLM Architecture
5. LLM Transformer architecture
1. Encoder Architecture
2. Decoder Architecture
3. Encoder-Decoder Architecture
Key Components:
Step 1: Input (Good morning)
Encoder Steps: Processing the Input Sentence
Decoder Steps: Generating the Translation ("Bonjour")
Putting It All Together:
CHAPTER 3: LLM Applications
6. LLM Gen AI use cases
1. Text Generation
2. Question Answering
3. Text Summarization
4. Text Classification
5. Translation
6. Conversational AI (Chatbots)
7. Image Generation (Text-to-Image)
8. Image Classification
9. Image Segmentation
10. Audio Processing (Speech-to-Text and Text-to-Speech)
11. Code Generation
12. Sentiment Analysis
13. Named Entity Recognition (NER)
14. Data Augmentation
15. Image Captioning
16. Multi-modal AI
17. Text-Based Games/Interactive Stories
18. Knowledge Base Extraction
19. Fake News Detection
20. Grammar and Style Correction
21. Legal Document Generation
22. Paraphrasing
23. Automated Code Review
24. Emotion Recognition in Text
25. Product Recommendation
26. Text-to-Programming Language Conversion
27. Style Transfer (Text)
28. Document Comparison
29. Content Moderation
30. Voice Cloning
31. Image Super-Resolution
32. Code Translation (Language-to-Language)
33. Image Inpainting
34. Text-Based Music Generation
35. Visual Question Answering (VQA)
36. Data-to-Text Generation
37. Human Pose Estimation
38. Time-Series Forecasting
39. Reinforcement Learning for Text-Based Tasks
40. Automated Tagging and Metadata Generation
7. LLM Model Parameters
1. Temperature
2. Max Tokens
3. Top-k Sampling
4. Top-p Sampling (Nucleus Sampling)
5. Frequency Penalty
6. Presence Penalty
7. Stop Sequences
8. Best-of (n-best)
9. Echo
10. Stream
8. LLM benchmarks
1. SuperGLUE
2. GLUE (General Language Understanding Evaluation)
3. OpenAI HumanEval
4. SQuAD (Stanford Question Answering Dataset)
5. MMLU (Massive Multitask Language Understanding)
6. HELLASWAG
7. Big-Bench (Beyond the Imitation Game Benchmark)
8. LAMBADA
9. TriviaQA
10. CoQA (Conversational Question Answering)
11. Winograd Schema Challenge
12. ARC (AI2 Reasoning Challenge)
13. PiQA (Physical Interaction: Question Answering)
14. BoolQ (Boolean Questions)
15. TyDiQA
16. StoryCloze
17. WinoGrande
18. DROP (Discrete Reasoning Over Paragraphs)
19. Hendrycks Test
20. XGLUE
21. CodeXGLUE
22. CLUE (Chinese Language Understanding Evaluation)
9. LLM Finetuning
a) LLM with Prompt Engineering Tuning
Steps:
Example:
Resources:
b) LLM Instructions-based Training Tuning
Steps:
Example:
Resources:
c) LLM with RAG (Retrieval-Augmented Generation) Fine-tuning
Steps:
Example:
Resources:
d) LLM with LoRA (Low-Rank Adaptation)
Steps:
Example:
Resources:
e) LLM with QLoRA (Quantized Low-Rank Adaptation)
Steps:
Example:
Resources:
f) LLM with Full Tuning
Steps:
Example:
Resources:
10. Interview Questions
LLM Architecture
Transformers
Optimization Techniques
Ethical Considerations
Deployment Strategies
Hugging Face
OpenAI
LangChain
Fine-Tuning
Generative AI (Gen AI) and Large Language Models (LLM)
Hugging Face
OpenAI
LangChain
Fine-Tuning
AI Governance
LLM FineTuning Code Samples:
AI Evaluation Metrics
1. Classification Metrics:
2. Regression Metrics:
3. Natural Language Processing (NLP) Metrics:
4. Clustering Metrics:
5. Ranking Metrics:
6. Advanced Metrics:
7. Multiclass and Multilabel Metrics:
8. Fairness and Bias Metrics:
Conclusion
Appendix A: External References
Blogs
Articles:
PDFs:
Code:
LLMs Simplified: An Easy Path to Understanding
[DRAFT]
About the Author
Srini Pusuluri - M.Tech, IIT Kharagpur
Former Distinguished Scientist in Indian Space and Defence; Salesforce CRM and AI Architect
Srini is a Senior Salesforce (SFDC), AI, and CRM Program Architect, highly skilled in integrating
cutting-edge technologies such as artificial intelligence with customer relationship management
(CRM) platforms. With over 20 years of IT experience (including 12 years in CRM/Salesforce
and 5 years in AI), he is a recognized leader in designing and delivering innovative AI
and CRM solutions across industries. He has extensive expertise in security, multi-org
setups, data integration, design patterns, DevOps, and AI strategy, and holds 20 Salesforce
and 5 AI certifications.
His career spans roles at prominent organizations such as Google, Elastic, GE, AT&T, IBM,
and USAA, where he has successfully led large-scale digital transformation projects. His
responsibilities include delivering architecture solutions for BILL CRM, developing Gen AI
Agentforce solutions like sentiment analysis, text summarization, and chat/call analytics, and
implementing high-volume AI projects involving Einstein Bots, Five9 CTI, Omni-Channel, and
Service Cloud.
The author is also known for managing complex B2C Salesforce implementations with
millions of accounts and users, ensuring robust IT governance, and coordinating with multiple
implementation partners. In addition, his expertise extends to Sales Cloud, Service Cloud,
CPQ, and handling large-scale data migrations, including Zendesk-to-Salesforce migrations
and acquisition-based org mergers.
As a trainer and R&D leader, he has pioneered the integration of LLMs (Large Language Models),
such as XGen, LLaMA, and ChatGPT, into Salesforce for Copilot AI solutions. His recent
work focuses on applying fine-tuning techniques and AI strategy to enhance enterprise CRM
systems and customer data platforms (CDPs).
With a career marked by over 40 successful projects, he is not only a skilled architect but
also a thought leader in AI-driven CRM innovation, sharing insights through public speaking and
training engagements.
Preface
Motivation Behind the Web Book on LLM Modeling and Fine-tuning
The impetus for writing this web book on Large Language Models (LLMs) and fine-tuning stems
from a significant gap in available resources. While there is an abundance of content on
foundational AI principles, few comprehensive guides consolidate the complexities of
LLM architecture, model customization, and the nuanced processes involved in fine-tuning for
specialized applications. For professionals navigating the cutting edge of AI—whether for NLP
tasks, chatbot implementations, or tailored business solutions—the knowledge scattered across
various research papers, tutorials, and forums can be overwhelming.
As an expert with deep experience across AI and CRM projects, working with companies like
Google, IBM, and Elastic, the author has recognized that mastering LLMs requires more than
just an understanding of neural networks or algorithms. It involves strategic insights into how
these models can be adapted, scaled, and integrated into complex systems while maintaining
performance, security, and accuracy. Fine-tuning an LLM demands a blend of technical
precision and creative problem-solving, where understanding the target domain is just as
important as the model's architecture.
The motivation for this book emerged from seeing countless AI professionals and developers
struggle to assemble coherent strategies from fragmented sources, especially in the
fast-evolving field of LLMs. Having fine-tuned models for diverse business applications—from
customer support chatbots to AI-driven decision-making platforms—the author recognized the
need for a definitive resource. This web book is designed to be that resource, offering clear,
structured insights that guide readers through the entire process, from model selection and
training to deployment and optimization.
By distilling years of experience in AI modeling and LLM customization, the author aims to
provide professionals with a go-to reference, empowering them to confidently navigate the
complexities of LLMs and leverage their full potential for specialized use cases.
Definitions
Generative AI (Gen AI): A branch of artificial intelligence that can generate new content, such
as text, images, or music, from given inputs. It’s widely used in natural language processing
(NLP), image creation, and other tasks where the AI learns patterns from data and produces
creative outputs based on that learning.
Large Language Model (LLM): A type of neural network model trained on vast amounts of text
data to understand and generate human-like language. Examples include GPT (Generative
Pre-trained Transformer) models, such as GPT-4. LLMs are capable of performing a variety of
tasks like translation, summarization, text generation, and answering questions.
Parameter-Efficient Fine-Tuning (PEFT): A technique for fine-tuning large models like LLMs
by modifying only a small number of parameters, keeping the majority of the original pre-trained
model intact. PEFT is efficient in terms of memory and computation, making it useful when
adapting large models for specific tasks.
Low-Rank Adaptation (LoRA): A specific form of PEFT that decomposes large parameter
matrices into low-rank matrices during fine-tuning. This reduces the number of parameters to
update, making training faster and more resource-efficient, especially useful for adapting LLMs
to new tasks.
Tokens/Tokenization: A token is a unit of text that a model processes. Tokenization is the
process of splitting text into smaller units (tokens), which can be as small as characters or as
large as whole words, depending on the model. For instance, the word "chatbot" might be split
into two tokens: "chat" and "bot".
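For a quick look at tokenization in practice, here is a minimal sketch using the Hugging Face transformers library (the choice of library and the "gpt2" model here are illustrative assumptions; exact splits vary by tokenizer):

from transformers import AutoTokenizer  # assumes: pip install transformers

tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.tokenize("chatbot"))   # e.g., ['chat', 'bot'] -- splits depend on the tokenizer
print(tokenizer.encode("chatbot"))     # the integer token IDs the model actually sees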
Embedding: A mathematical representation of words, phrases, or other data in a continuous
vector space. In NLP, embeddings are used to capture the meaning of words based on their
context and relationships with other words. Word2Vec and BERT are examples of models that
create word embeddings.
Catastrophic Forgetting: A phenomenon that occurs when a machine learning model forgets
previously learned information while being trained on new tasks. In the context of LLMs,
catastrophic forgetting can happen during fine-tuning when the model is over-optimized for the
new task and loses generalization capabilities.
Attention Mechanism: A technique in deep learning that allows models to focus on specific
parts of the input when generating output, improving their ability to capture relationships
between distant words in text. It is the key innovation behind transformers and LLMs.
Transformer Architecture: The underlying architecture for LLMs like GPT. It uses self-attention
mechanisms to process input data in parallel, making it highly efficient for tasks that involve long
sequences of text.
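To make self-attention concrete, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of transformers; the tiny matrices and dimensions are illustrative only, not taken from any real model:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (sequence_length, d_model); Wq/Wk/Wv project each token to queries, keys, values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to every other
    return softmax(scores) @ V               # each output row is a weighted mix of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8): one context-aware vector per token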
Pre-training: The initial phase of training an LLM on a large dataset where the model learns
general language patterns and knowledge. During pre-training, models are usually trained using
unsupervised learning on vast amounts of text data.
Fine-Tuning: The process of further training a pre-trained model on a specific dataset or for a
specific task to improve performance in that area. Fine-tuning helps the model adapt to
specialized domains while retaining its general knowledge.
Prompting: A method used to guide LLMs into generating specific outputs by providing context
or instructions within the input. A prompt is the initial text given to the model that defines the
type of response you want.
Zero-Shot Learning: A method where an LLM performs a task without any specific fine-tuning
for that task. The model relies solely on the knowledge it gained during pre-training to generate
responses.
Few-Shot Learning: A technique in which the model is provided with a few examples (in the
prompt) of how a task should be performed before generating an answer. This helps the model
adapt to specific types of tasks without full fine-tuning.
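For example, a few-shot prompt for sentiment classification might look like the following (the reviews and labels are invented for illustration); the model is expected to continue the pattern with "Positive":

Classify each review as Positive or Negative.
Review: "The battery lasts all day." -> Positive
Review: "It broke after one week." -> Negative
Review: "Setup was quick and painless." ->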
Context Window: The amount of text (measured in tokens) that an LLM can consider at once
while generating responses. Models have a fixed limit on the number of tokens they can handle
at a time. If the text exceeds the context window, the model may forget earlier parts of the input.
Temperature: A parameter that controls the randomness of text generation in LLMs. Higher
temperature values result in more random and diverse outputs, while lower values make the
model’s responses more deterministic and focused.
Top-k Sampling: A method for text generation where the model selects the next token from the
top k most probable tokens. This adds diversity to the generated text, preventing the model from
always picking the highest probability token.
Top-p Sampling (Nucleus Sampling): A more flexible version of top-k sampling where the
model chooses the next token from the smallest possible set of tokens that have a cumulative
probability of p. This method ensures that token choices are both diverse and probabilistically
consistent.
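All three controls (temperature, top-k, and top-p) can be sketched in a few lines of NumPy; the logits below are made-up values for illustration, not from a real model:

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    logits = np.asarray(logits, dtype=float) / temperature  # temperature rescales confidence
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                         # token indices, most probable first
    if top_k is not None:
        order = order[:top_k]                               # keep only the k most probable tokens
    if top_p is not None:
        cum = np.cumsum(probs[order])
        order = order[:np.searchsorted(cum, top_p) + 1]     # smallest set with cumulative prob >= top_p
    p = probs[order] / probs[order].sum()                   # renormalize over the shortlist
    return np.random.choice(order, p=p)

print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=3))

Setting top_k=1 makes generation greedy and deterministic, while raising the temperature above 1.0 flattens the distribution for more adventurous text.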
Latent Space: In machine learning, latent space refers to the compressed, hidden
representation of data within a model. For LLMs, the latent space represents abstract,
high-dimensional relationships between words, sentences, or entire documents, enabling the
model to reason and generate language.
Autoregressive Model: A type of model that generates the next token in a sequence based on
previously generated tokens. GPT models are autoregressive because they predict one word at
a time, conditioned on the words that came before.
Masked Language Model: A model that learns by predicting masked-out words within a
sentence. BERT is an example of a masked language model, which improves understanding of
context and relationships in text by learning to reconstruct sentences.
Gradient Descent: An optimization algorithm used to train machine learning models by
minimizing the loss function. During training, the model updates its parameters based on the
gradient of the loss function to find the optimal solution.
Loss Function: A mathematical function that measures how well the model's predictions match
the actual data. The goal of training a model is to minimize the loss function, which indicates the
model's performance in learning from data.
Overfitting: A condition where a model learns to perform very well on the training data but fails
to generalize to new, unseen data. Overfitting occurs when the model becomes too specialized
to the specific patterns of the training set.
Underfitting: When a model is too simple to capture the underlying patterns in the data, leading
to poor performance both on the training and testing datasets.
Regularization: Techniques used to prevent overfitting by adding constraints or penalties to the
model’s complexity. Common forms of regularization include L1 and L2 regularization, as well as
dropout, which randomly deactivates certain neurons during training.
Backpropagation: The process of updating a neural network's weights by calculating the
gradient of the loss function with respect to each weight, and then using this information to
make adjustments. This is done iteratively to improve the model’s predictions.
Dropout: A regularization technique where a random set of neurons is ignored during training,
preventing the model from relying too heavily on specific neurons and improving generalization.
Epoch: A single pass through the entire training dataset. During each epoch, the model's
parameters are updated multiple times, depending on the size of the dataset and the chosen
batch size.
Batch Size: The number of training examples used in one iteration of updating the model’s
parameters. A larger batch size allows the model to take into account more information per
update but requires more computational resources.
Gradient Clipping: A technique to prevent exploding gradients during backpropagation by
limiting the size of the gradients during training. It helps to stabilize and accelerate model
training.
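In Keras, for instance, clipping can be enabled directly on the optimizer (a minimal sketch; the threshold of 1.0 is an arbitrary choice):

import tensorflow as tf

# Clip each gradient so its norm is at most 1.0 before the weight update
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)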
Exploding/Vanishing Gradients: Problems in neural network training where gradients become
too large (exploding) or too small (vanishing), which can make it difficult to update the model’s
parameters effectively.
Beam Search: A search algorithm used in text generation that explores multiple possible
sequences simultaneously, keeping track of the most promising ones. This method helps
improve the quality of generated text by considering various possible continuations.
Bias and Variance: Bias refers to errors introduced by overly simplistic models that fail to
capture the complexity of the data (underfitting). Variance refers to errors introduced by models
that are too complex and capture noise along with the data (overfitting).
Neural Architecture Search (NAS): The process of automating the design of neural network
architectures. Instead of manually designing the architecture, NAS explores different
configurations to find the optimal structure for a specific task.
Knowledge Distillation: A process where a smaller model (student) is trained to mimic the
predictions of a larger, more complex model (teacher). The goal is to create a lightweight
version of a model that performs similarly but with fewer resources.
Multi-Head Attention: An extension of the attention mechanism used in transformer models like
GPT. It allows the model to focus on different parts of the input sequence at the same time (i.e.,
multiple "heads"), improving the ability to capture various relationships in the data.
Self-Attention: A mechanism that relates different words in a sequence to each other, even if
they are far apart. Each word in a sequence attends to every other word, allowing the model to
better understand context and relationships.
Cross-Attention: A type of attention mechanism where one sequence (like a query) attends to
another sequence (like a context or memory). Cross-attention is commonly used in tasks like
text generation where the output sequence needs to refer to an input sequence (e.g., in
translation).
Positional Encoding: Since transformers do not inherently understand the order of tokens
(unlike RNNs), positional encoding is added to input embeddings to give the model information
about the position of each token in a sequence.
Unsupervised Learning: A type of machine learning where the model is trained on data
without explicit labels. The model learns patterns and structures in the data on its own. Many
LLMs are pre-trained using unsupervised learning on large corpora of text.
Transfer Learning: A technique in which a model trained on one task (or a large general
dataset) is adapted to a different, often more specific task. Fine-tuning LLMs on specific
datasets is a common example of transfer learning.
Gradient Accumulation: A technique used during training to simulate a large batch size on
smaller hardware. Gradients are accumulated over several smaller batches before performing
an update step, making training more efficient with limited resources.
Batched Inference: A process where multiple inputs are processed together in a single forward
pass through the model. This is commonly done in LLMs to improve the efficiency and speed of
generating responses for multiple queries at the same time.
Weight Sharing: A technique used in model architectures like transformers, where the same
parameters (weights) are reused across different layers or parts of the network, reducing the
number of trainable parameters and improving efficiency.
Layer Normalization: A normalization technique applied to the inputs of a neural network layer
to stabilize training by reducing internal covariate shift. It's used extensively in
transformer-based models.
Layer Freezing: A technique where certain layers of a pre-trained model are "frozen" (i.e., their
weights are not updated during training) to retain the original knowledge, while other layers are
fine-tuned for specific tasks.
Sparse Attention: An optimization of the standard attention mechanism where only a subset of
the input tokens are attended to, rather than all tokens. This reduces the computational
complexity, especially for long sequences.
Mixture of Experts (MoE): A model architecture that uses multiple sub-models (experts) and
dynamically selects which experts to activate based on the input. MoE models can scale to very
large parameter sizes while reducing the amount of computation required for each input.
Encoder-Decoder Architecture: A neural network structure where the encoder processes the
input sequence into a latent representation, and the decoder generates the output sequence
from that representation. This architecture is commonly used in tasks like machine translation.
Gradient-Free Optimization: A class of optimization methods that do not rely on gradient
information (like backpropagation) to update the model’s parameters. These techniques are
often used in reinforcement learning and neural architecture search.
Attention Masking: A technique used in transformer models to prevent the model from
attending to certain tokens in the sequence. For example, in autoregressive models like GPT, a
causal mask is applied to ensure that the model only attends to previous tokens and not future
ones during training.
Adversarial Training: A technique where the model is trained to defend against adversarial
attacks—small, carefully crafted perturbations to the input that can trick the model into making
incorrect predictions.
GAN (Generative Adversarial Network): A type of generative model consisting of two
networks—a generator and a discriminator—that are trained together. The generator tries to
create realistic outputs, while the discriminator tries to distinguish between real and generated
data.
Contrastive Learning: A technique where the model learns to differentiate between similar and
dissimilar pairs of data points. This is often used in tasks like image recognition and
embeddings, where the model learns to group similar data points in the latent space.
Knowledge Graph: A structured representation of knowledge where entities (such as people,
places, or things) are nodes, and relationships between them are edges. Knowledge graphs are
often used in conjunction with LLMs to enhance reasoning and factual recall.
Curriculum Learning: A training strategy where the model is first trained on simpler tasks or
data and gradually introduced to more complex examples. This mirrors the human learning
process and can lead to improved performance and generalization.
Distillation Loss: The loss function used during knowledge distillation, where a smaller student
model is trained to mimic the outputs of a larger teacher model. The loss measures the
difference between the student's predictions and the teacher’s predictions.
Hard vs. Soft Attention: In hard attention, only one part of the input is selected to focus on
(discrete attention), while in soft attention, the model assigns different weights to different parts
of the input (continuous attention).
Perplexity: A metric used to evaluate the performance of language models. It measures how
well a model predicts a sample, with lower perplexity indicating better performance. In essence,
it shows how "confused" the model is in generating a sequence.
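Numerically, perplexity is the exponential of the average negative log-probability the model assigned to the actual tokens; a small NumPy sketch with invented probabilities:

import numpy as np

# Probabilities the model assigned to each actual next token (made-up values)
token_probs = np.array([0.25, 0.10, 0.50, 0.05])
perplexity = np.exp(-np.mean(np.log(token_probs)))
print(perplexity)  # ~6.3 here; lower is better (a perfect model would score 1.0)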
Hybrid Model: A model that combines multiple machine learning approaches or architectures,
such as combining rule-based systems with LLMs or integrating neural networks with traditional
algorithms.
Prompt Engineering: The process of designing and optimizing the prompts given to LLMs to
elicit the best possible responses for a specific task. It involves refining the input structure, using
task-specific instructions, and experimenting with different prompt formats.
Task-Specific Fine-Tuning: Fine-tuning an LLM for a very specific task, such as medical
question-answering or legal document analysis. This involves training the model on a dataset
that is highly specialized for the desired task.
Hyperparameters: Parameters that control the learning process of a machine learning model,
such as the learning rate, batch size, number of layers, and attention heads. Hyperparameter
tuning is critical for optimizing model performance.
Gradient Descent Optimizers (Adam, SGD, RMSprop): Algorithms used to update the
weights of a model during training. Adam (Adaptive Moment Estimation) is one of the most
popular optimizers due to its efficiency and ability to handle sparse gradients.
Latent Variable Model: A model that assumes the data is generated by underlying, unobserved
variables (latent variables). Variational autoencoders (VAEs) are an example of a latent variable
model.
Long Short-Term Memory (LSTM): A type of recurrent neural network (RNN) architecture
designed to capture long-range dependencies in sequential data, addressing the issue of
vanishing gradients.
BERT (Bidirectional Encoder Representations from Transformers): A transformer-based
model that uses a masked language model approach to pre-train a model in both directions
(left-to-right and right-to-left), improving contextual understanding.
RoBERTa (Robustly Optimized BERT Pretraining Approach): An optimized version of BERT
that uses a larger dataset and better training techniques to improve the performance of
transformer models.
GPT (Generative Pre-trained Transformer): A class of transformer models that are pre-trained
on large text corpora and fine-tuned for specific tasks. GPT models are autoregressive and
generate text one word at a time, using previously generated words as input.
Reinforcement Learning from Human Feedback (RLHF): A training technique where models
are fine-tuned using reinforcement learning, with human evaluators providing feedback to
improve the model’s outputs. This is used to align LLMs with human values and preferences.
Chain of Thought Prompting: A prompting technique where the model is guided to reason
through a problem step by step, rather than producing an answer immediately. This technique
helps improve performance in tasks requiring logical reasoning or multi-step problem-solving.
Multimodal Learning: A type of learning that combines data from multiple modalities (e.g., text,
images, audio) to create models capable of understanding and generating across different types
of data. Multimodal models can generate images from text, or captions from images.
Transformer Decoder: The part of the transformer architecture used in autoregressive models
like GPT. It takes in a sequence of tokens and generates output step-by-step, conditioned on
the previous tokens.
Transformer Encoder: The other half of the transformer architecture, used in models like
BERT. It processes the entire input sequence at once, using bidirectional attention to
understand the context around each token.
Dynamic Quantization: A technique to reduce the size of LLMs by converting their weights to
lower-precision formats (e.g., from 32-bit floating point to 8-bit integer) during inference. This
improves computational efficiency without significantly affecting model performance.
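In PyTorch, for example, dynamic quantization of a model's linear layers takes a single call (a minimal sketch; the toy model is ours, and module paths can vary across library versions):

import torch

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10))
# Store Linear weights as 8-bit integers; activations are quantized on the fly during inference
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)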
Post-Training Quantization: Applying quantization to a model after it has been trained,
reducing the model size and improving inference speed. Unlike dynamic quantization,
post-training quantization modifies the weights before inference.
Knowledge Base Integration: The process of integrating external knowledge sources (like
databases or knowledge graphs) into a language model to improve factual accuracy and recall.
This helps the model access and use structured knowledge for tasks requiring deep expertise.
Memory-Augmented Neural Networks (MANN): A type of neural network architecture that has
an external memory bank, allowing it to store and retrieve information across long time frames.
This enables the model to recall past experiences or facts when generating output.
Unlikelihood Training: A training method used to reduce common generation errors in
language models by explicitly penalizing unlikely or undesirable outputs during training. It helps
prevent repetition, contradictions, and nonsensical outputs.
Synthetic Data Generation: The process of generating artificial data (e.g., text, images) to
augment a dataset. This can be used to train models when real-world data is scarce, or to
balance class distributions in datasets.
Curriculum Fine-Tuning: A technique where a model is fine-tuned on increasingly difficult
datasets or tasks, helping it generalize better and improve performance on complex tasks.
Task-Adaptive Pretraining (TAPT): A method of further pretraining a language model on
domain-specific data before fine-tuning it for a particular task. TAPT helps the model adapt to
the vocabulary, style, and structure of a specialized domain.
Elastic Weight Consolidation (EWC): A regularization technique used to prevent catastrophic
forgetting during fine-tuning by identifying important weights and ensuring that they are not
modified too drastically during training on new tasks.
Latent Dirichlet Allocation (LDA): A machine learning algorithm used for topic modeling. It
identifies topics within a set of documents based on the distribution of words across those
topics. LDA can be used to analyze and organize large text datasets.
Contrastive Divergence: An approximation algorithm used to train probabilistic models like
Restricted Boltzmann Machines (RBMs). It estimates the gradients of the model’s likelihood,
helping the model learn a good representation of the data.
Variational Inference: A method used to approximate complex probability distributions in
Bayesian models. It is often used in VAEs (Variational Autoencoders) to approximate the
posterior distribution of the latent variables.
Beam Width: In beam search (used for text generation), the beam width determines how many
sequences are kept for consideration at each step of the generation process. A larger beam
width increases diversity but also computational cost.
Entropy Regularization: A technique used to encourage exploration during reinforcement
learning or text generation by adding a term to the loss function that penalizes low-entropy (i.e.,
overly confident) predictions. This leads to more diverse outputs.
Bidirectional Attention Flow (BiDAF): A model architecture used for tasks like question
answering, where the model attends to both the question and the context at the same time. This
allows it to focus on the most relevant parts of the input when generating a response.
Conditional Generation: The task of generating outputs based on specific input conditions,
such as generating text based on a prompt or generating images based on text descriptions.
Conditional generation is commonly used in models like GPT-3.
Latent Semantic Analysis (LSA): A technique used to analyze relationships between a set of
documents and the terms they contain by producing a set of concepts related to the documents
and terms. LSA is used for tasks like information retrieval and text similarity.
Hypernetwork: A neural network that generates the weights of another neural network. This
technique allows a model to quickly adapt to new tasks by dynamically generating task-specific
weights without requiring separate models.
Monte Carlo Tree Search (MCTS): A search algorithm used in decision-making tasks,
particularly in game AI. MCTS builds a search tree by sampling possible actions and outcomes,
then selecting the most promising action based on statistical averages.
Embedding Space: The continuous vector space where the embeddings (representations) of
words, phrases, or other inputs are mapped. In this space, similar inputs are located closer
together, reflecting their semantic similarity.
Long-Range Dependencies: The relationships between words or tokens in a sequence that
are far apart. Traditional models like RNNs struggle with long-range dependencies, but
transformers handle them well through attention mechanisms.
Exemplar Fine-Tuning: A technique where a few specific, well-chosen examples (exemplars)
are used to fine-tune a large language model, allowing it to generalize better to the desired task.
Self-Supervised Learning: A type of learning where the model generates its own labels from
the input data, rather than relying on external annotations. This is common in LLMs, where the
model learns to predict missing or future words in a sentence.
Data Augmentation: Techniques used to increase the size and diversity of the training data by
creating modified versions of the existing data (e.g., by applying transformations, noise, or
sampling). Data augmentation is used to improve model generalization.
Sparse Neural Networks: Neural networks where many of the weights are set to zero, reducing
the computational cost and memory footprint of the model. Sparsity can be introduced during
training through techniques like pruning.
Attention Dropout: A regularization technique applied to the attention mechanism in
transformer models, where a fraction of the attention scores are randomly set to zero. This helps
prevent overfitting and improves generalization.
Structured Prediction: A type of prediction task where the output is a complex structure (e.g., a
sentence, a tree, or a graph) rather than a single label or value. Sequence-to-sequence models
are commonly used for structured prediction tasks like translation or parsing.
Alignment Problem: A challenge in AI safety where the behavior of AI systems needs to be
aligned with human goals, values, or intentions. Misalignment can lead to unintended
consequences, especially in autonomous systems.
Causal Language Modeling: A method where the model is trained to predict the next word in a
sequence based only on the previous words. GPT models use causal language modeling to
generate text in an autoregressive manner.
Entropy: A measure of uncertainty or randomness in a model’s predictions. In language
models, high entropy means the model is uncertain about the next word, while low entropy
means the model is confident in its prediction.
Perceptron: The simplest type of artificial neural network, consisting of a single layer of weights
and an activation function. Perceptrons are the building blocks of more complex neural
networks.
Neural Tangent Kernel (NTK): A mathematical framework that helps understand the training
dynamics of over-parameterized neural networks, providing insights into how large models
behave during gradient descent.
Graph Neural Network (GNN): A type of neural network designed to work with graph-structured
data, where nodes represent entities and edges represent relationships. GNNs are used for
tasks like social network analysis, recommendation systems, and molecular modeling.
Hybrid Attention: A model that combines multiple forms of attention, such as self-attention and
cross-attention, to improve performance on complex tasks where different types of context need
to be considered simultaneously.
Rationales: In interpretability, rationales are explanations or justifications for a model’s
decisions. Rationales can be explicitly provided by the model as part of its output, helping users
understand why certain predictions were made.
Symmetry Breaking: In neural networks, symmetry breaking refers to the process of initializing
the network weights randomly, which ensures that different neurons learn distinct features and
prevents the model from getting stuck in unproductive learning configurations.
Background
"LLMs and GenAI Simplified" serves as a beginner-friendly guide to understanding Large
Language Models (LLMs) and their profound impact on various fields, especially artificial
intelligence (AI) and natural language processing (NLP). The book walks readers through the
foundational concepts of LLMs, exploring their architecture, applications, performance
benchmarks, and the ethical considerations surrounding their use.
Here’s an overview of what the book covers in key areas:
1. Introduction to Large Language Models (LLMs)
The book starts by introducing LLMs, which are advanced AI models trained on massive
datasets to understand, generate, and process human language. It explains how these models
like GPT (Generative Pre-trained Transformer), BERT (Bidirectional Encoder
Representations from Transformers), and T5 (Text-To-Text Transfer Transformer) have
transformed how machines interact with human language, providing contextually accurate
answers, writing content, and even simulating conversations.
The chapter also covers the basics of how LLMs leverage deep learning and neural networks,
particularly transformer-based architectures, to handle enormous amounts of text data and
make sense of language patterns.
2. LLM Architecture
This section delves deep into the architecture that powers LLMs, focusing on how
transformers form the backbone of these models. The author breaks down the technical
components in a simplified manner, including:
● Attention Mechanisms: How transformers use self-attention to focus on different parts
of a sentence or phrase to capture meaning.
● Encoder-Decoder Models: A detailed look at models like BERT (encoder-based) and
GPT (decoder-based), explaining how each processes text differently.
● Pre-training and Fine-tuning: The book covers the concept of pre-training on massive
text corpora and how models are later fine-tuned for specific tasks like sentiment
analysis, translation, or summarization.
The section emphasizes how this architecture allows LLMs to scale effectively, enabling them to
generate human-like text and perform complex language understanding tasks.
3. Applications of LLMs
LLMs have far-reaching applications, and this section provides real-world examples of where
they are being utilized. Key use cases discussed include:
● Chatbots and Virtual Assistants: How companies use LLMs to power intelligent
chatbots like ChatGPT, which handle customer service, technical support, and
personalized user experiences.
● Content Creation: LLMs' ability to write articles, blogs, product descriptions, and other
forms of content generation, automating many repetitive tasks.
● Translation and Summarization: How models like BERT and GPT are used to translate
languages and summarize large amounts of text, improving productivity in areas like
media, law, and academia.
● Code Generation: Models like OpenAI’s Codex (an extension of GPT) are discussed for
their role in generating programming code, reducing the workload for developers.
● Healthcare and Medicine: How LLMs assist in diagnosing, summarizing medical
literature, and providing virtual consultations.
4. LLM Performance Benchmarks
To evaluate the effectiveness and capabilities of LLMs, benchmarks are essential. This section
explains some of the widely used benchmarks for comparing model performance, including:
● GLUE (General Language Understanding Evaluation): A benchmark for evaluating
NLP tasks like sentiment analysis and text entailment.
● SQuAD (Stanford Question Answering Dataset): Focused on reading comprehension
and answering questions based on text.
● SuperGLUE: A more challenging version of GLUE, used to evaluate models on a higher
level of language understanding.
The chapter helps readers understand how models are evaluated, the parameters that indicate
good performance, and the need for continuous benchmarking as models evolve.
5. Governance, Ethics, and Responsible AI
This section covers the critical topic of AI governance and ethical considerations in deploying
LLMs. The book highlights:
● Bias in LLMs: The inherent biases in models trained on large, uncurated datasets and
the importance of developing techniques to mitigate these biases.
● Privacy Concerns: How LLMs, when mishandled, could inadvertently reveal sensitive
information contained in training data.
● Regulatory Frameworks: Current global efforts to regulate the use of AI, such as
GDPR and emerging AI governance frameworks that promote transparency, fairness,
and accountability.
The author stresses the importance of developing Responsible AI practices that ensure LLMs
are used ethically and avoid harmful consequences, like spreading misinformation or deepening
societal inequalities.
6. Challenges and Future Directions
The book concludes with a forward-looking perspective, discussing the challenges that LLMs
face, such as the increasing computational power required to train these models, environmental
concerns due to energy consumption, and the limits of generalization in language models.
It also touches on future directions, including:
● Smaller, More Efficient Models: Efforts to create smaller models that retain high
performance but require fewer resources.
● Continual Learning: Exploring the potential for LLMs to learn continuously without
retraining from scratch.
● Human-AI Collaboration: A vision where LLMs augment human decision-making,
combining AI efficiency with human judgment to solve complex problems.
Conclusion
"LLMs for Dummies" simplifies complex AI topics related to Large Language Models, making it
an accessible entry point for anyone interested in how these models work, their applications,
and the ethical implications of their widespread use. Through clear explanations, real-world
examples, and practical insights, the book provides a comprehensive overview for both
beginners and professionals looking to enhance their understanding of LLMs.
CHAPTER 0: FUNDAMENTALS
Neural Networks Basics
Let's walk through how a simple three-layer neural network works with an example step by step!
Overview:
Imagine we have a simple neural network to classify if a fruit is an apple or a banana based on
four input features, like size, color, weight, and shape. The neural network has:
● 4 input neurons (representing the input features),
● 2 output neurons (representing the two possible outcomes: apple or banana),
● 3 hidden neurons (in one hidden layer).
We'll also learn how weights and biases are adjusted using gradient descent, a process to
make the neural network "learn."
Step 1: Set up the Neural Network Structure
● Input Layer: 4 input neurons (size, color, weight, shape).
● Hidden Layer: 3 hidden neurons (which will do some calculations based on the inputs).
● Output Layer: 2 output neurons (one for apple and one for banana).
Each neuron in one layer is connected to every neuron in the next layer through weights
(numbers that determine the strength of connections).
Step 2: Initialize Weights and Biases
At the start, each connection between neurons has a random weight, and each neuron has a
bias (an extra number added to the neuron's calculation).
For simplicity, let's assume the weights between layers are initially:
● From Input to Hidden: Random values like 0.5, 0.2, -0.3, etc.
● From Hidden to Output: Random values like 0.4, -0.2, 0.1, etc.
Biases for each neuron are also random, say 0.1 for now.
Step 3: Forward Propagation
In forward propagation, the network takes the inputs and calculates the output. Here's how it
works:
1. Input Layer:
○ Let's say the input features (size, color, weight, shape) are: [2, 1, 0.5, 1.5].
2. Hidden Layer:
○ Each neuron in the hidden layer calculates a weighted sum of the inputs. The formula for each hidden neuron is:
$\text{output} = \text{activation function}(w_1 \times x_1 + w_2 \times x_2 + w_3 \times x_3 + w_4 \times x_4 + \text{bias})$
■ For example, for the first hidden neuron, if the weights are w1 = 0.5, w2 = -0.2, w3 = 0.1, and w4 = -0.3, the output would be:
$\text{output} = \text{activation}(0.5 \times 2 + (-0.2) \times 1 + 0.1 \times 0.5 + (-0.3) \times 1.5 + 0.1) = \text{activation}(0.5)$
After computing the weighted sum and adding the bias, we apply an activation function (usually ReLU or sigmoid) to make the output non-linear.
3. Output Layer:
○ Each neuron in the output layer also calculates a weighted sum from the hidden
layer’s outputs. If there are 2 output neurons (apple or banana), each output
neuron will give a score. For example, one score might indicate how "likely" the
fruit is an apple and the other for a banana.
Step 4: Calculate Loss (Error)
After calculating the output, we compare it with the actual result (whether the fruit is actually an
apple or banana). This is done using a loss function. Let’s say our prediction is [0.8 for
apple, 0.2 for banana], but the actual result is [1 for apple, 0 for banana].
We calculate the loss or error, which tells us how far our prediction is from the truth. A common
loss function is mean squared error:
$\text{Loss} = \frac{1}{2} \sum (\text{prediction} - \text{actual})^2$
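Plugging in the example above: with prediction $[0.8, 0.2]$ and actual $[1, 0]$, the loss is $\frac{1}{2}\left((0.8 - 1)^2 + (0.2 - 0)^2\right) = \frac{1}{2}(0.04 + 0.04) = 0.04$.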
Step 5: Backpropagation
Now that we know the error, we need to reduce it by adjusting the weights and biases. This
process is called backpropagation.
In backpropagation:
1. Calculate Gradients: We calculate how much each weight contributed to the error. This
is done using derivatives (slopes) to see how the output would change if we slightly
adjust the weight.
2. Adjust Weights and Biases: Using gradient descent, we adjust the weights and
biases to reduce the error. Gradient descent changes weights by a small amount in the
direction that reduces the loss. The new weights are calculated as:
$\text{new weight} = \text{old weight} - \text{learning rate} \times \frac{\partial \text{loss}}{\partial \text{weight}}$
○ The learning rate is a small number (like 0.01) that controls how fast the network
updates the weights. If the learning rate is too high, the network may "miss" the
optimal solution. If it's too low, the training will be slow.
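To make this update rule concrete, here is a toy plain-Python sketch that uses gradient descent to fit a single weight for y = 2x (the numbers are made up for illustration):

w, learning_rate = 0.0, 0.1
x, y_actual = 3.0, 6.0                     # one training example of y = 2x
for step in range(50):
    y_pred = w * x                         # forward propagation
    loss = 0.5 * (y_pred - y_actual) ** 2  # squared-error loss
    grad = (y_pred - y_actual) * x         # d(loss)/d(w) via the chain rule
    w -= learning_rate * grad              # gradient descent update
print(w)                                   # converges toward 2.0

Each pass through the loop mirrors one weight update in Step 5; the same idea applies to every weight and bias in the network.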
Step 6: Repeat the Process
We repeat the process (forward propagation, loss calculation, backpropagation) many times.
Each time, the weights and biases are adjusted slightly, and the network learns to make better
predictions.
Simple Example:
Imagine the network starts with random weights and makes a prediction of [0.8 for apple,
0.2 for banana] when the true answer is [1 for apple, 0 for banana]. After
calculating the loss, the network sees that it needs to increase the "apple" output and decrease
the "banana" output.
Backpropagation will slightly change the weights so that next time, the output is closer to the
correct answer, such as [0.9 for apple, 0.1 for banana]. Over many repetitions, the
network will learn to correctly classify the fruit!
Summary:
1. Start with random weights and biases.
2. Forward propagate to calculate the network's output.
3. Calculate the loss based on how far the prediction is from the actual result.
4. Backpropagate the error to adjust the weights and biases using gradient descent.
5. Repeat the process until the network makes accurate predictions.
This is how a simple three-layer neural network learns to classify data, step by step.
1. Install TensorFlow: You can install TensorFlow using pip if you don’t have it already:
pip install tensorflow

2. Run the Code: Use the code below on your local machine after installing TensorFlow.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Step 1: Create a sample dataset (features: size, color, weight, shape; label: apple or banana)
# Let's assume we have a dataset of 10 fruits, and their features are normalized to be between 0 and 1.
# 0 -> banana, 1 -> apple
data = np.array([
[0.8, 0.7, 0.6, 0.5], # Apple
[0.3, 0.2, 0.4, 0.1], # Banana
[0.9, 0.8, 0.7, 0.6], # Apple
[0.2, 0.1, 0.3, 0.2], # Banana
[0.85, 0.75, 0.65, 0.55], # Apple
[0.1, 0.05, 0.2, 0.15], # Banana
[0.75, 0.7, 0.8, 0.65], # Apple
[0.15, 0.2, 0.3, 0.25], # Banana
[0.9, 0.85, 0.9, 0.75], # Apple
[0.25, 0.2, 0.35, 0.3] # Banana
])
# Labels (1 for apple, 0 for banana)
labels = np.array([
[1], # Apple
[0], # Banana
[1], # Apple
[0], # Banana
[1], # Apple
[0], # Banana
[1], # Apple
[0], # Banana
[1], # Apple
[0] # Banana
])
# Step 2: Build a Neural Network Model using Keras
model = Sequential()
model.add(Dense(3, input_dim=4, activation='relu'))  # 3 neurons in the hidden layer, 4 input features
model.add(Dense(2, activation='softmax'))  # 2 output neurons (apple and banana), softmax for classification
# Step 3: Compile the model
model.compile(loss='sparse_categorical_crossentropy',
optimizer='adam', metrics=['accuracy'])
# Step 4: Train the model
model.fit(data, labels, epochs=500, verbose=0)
# Step 5: Test the model with a new fruit input
test_input = np.array([[0.82, 0.76, 0.63, 0.58]])  # Testing with a new input similar to an apple
prediction = model.predict(test_input)
print("Prediction (Apple or Banana):", prediction)
This script creates a simple neural network for classifying fruits as apples or bananas and trains
it using a small dataset. It then tests the model with a new input and prints the prediction.
1. Neural Network Models
A neural network model is a way to organize a network of "neurons" (like a brain) into layers,
each layer doing a job of learning from the data. Each neuron takes in numbers, does some
math, and sends out a result to the next layer.
● Dense Layer (Fully Connected Layer): In a dense layer, every neuron is connected to
every other neuron in the next layer. Think of it as a web where all inputs influence all
outputs. It's the most common type of layer in neural networks.
2. Activation Functions
Neurons in a network need to decide whether to pass on information or not. The activation
function is the rule that helps them decide. It changes the output into something manageable,
often between 0 and 1 or some small range.
Here are some common activation functions:
● ReLU (Rectified Linear Unit): This is the most common activation function. It turns any
negative number into zero and keeps positive numbers as they are. So, if the neuron
gives a result of -5, it becomes 0. If it gives 3, it stays 3. ReLU is popular because it
helps models learn faster.
$f(x) = \max(0, x)$
● Sigmoid: Sigmoid squashes any number to a range between 0 and 1, which is useful
when you want the output to be a probability (like: is this an apple?).
f(x) = \frac{1}{1 + e^{-x}}
If x is a big positive number, sigmoid will be close to 1, and if x is a big negative number,
it will be close to 0.
● Softmax: This is usually used in the output layer for classification tasks when there are
multiple categories (like classifying if an image is of a cat, dog, or bird). It converts
numbers into probabilities that add up to 1.
f(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
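A quick numeric sketch (assuming only NumPy is installed) shows what these three functions do to the same inputs:
import numpy as np

x = np.array([-5.0, 0.0, 3.0])

# ReLU: negatives become 0, positives pass through unchanged
print(np.maximum(0, x))          # [0. 0. 3.]

# Sigmoid: squashes every value into the range (0, 1)
print(1 / (1 + np.exp(-x)))      # [~0.007  0.5  ~0.953]

# Softmax: turns the whole vector into probabilities that sum to 1
e = np.exp(x - x.max())          # subtract the max for numerical stability
print(e / e.sum())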
3. Loss Functions
A loss function is used to tell how "wrong" the network's predictions are compared to the actual
correct answers. It’s like a score that measures how much error is in the predictions, and the
goal of training is to minimize this loss.
Some common loss functions:
● Mean Squared Error (MSE): This is used when predicting real numbers (like predicting
the price of a house). It measures the average squared difference between predicted
and actual values.
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{predicted}} - y_{\text{actual}})^2
● Cross-Entropy Loss: This is used for classification problems, where the goal is to
choose between multiple classes (like apple or banana). It penalizes wrong predictions
more heavily when the predicted probability is far from the actual answer.
\text{Loss} = -\sum_{i=1}^{n} y_{\text{actual}} \cdot \log(y_{\text{predicted}})
Cross-entropy loss helps with tasks where you are choosing between categories (like is
the fruit an apple or a banana?).
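As a rough sketch (assuming TensorFlow is installed), both losses are available as built-in Keras classes and can be evaluated directly on sample predictions:
import numpy as np
import tensorflow as tf

# MSE for a regression example
y_true = np.array([3.0, 5.0])
y_pred = np.array([2.5, 5.5])
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())  # 0.25

# Cross-entropy for a two-class example (integer labels)
labels = np.array([1, 0])
probs = np.array([[0.2, 0.8], [0.9, 0.1]])
ce = tf.keras.losses.SparseCategoricalCrossentropy()
print(ce(labels, probs).numpy())    # small, since both predictions are confident and correct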
4. Optimizers
Optimizers are algorithms that adjust the weights (the numbers that connect neurons) in the neural network to minimize the loss. They help the network "learn" by improving its predictions step by step.
Some common optimizers:
● SGD (Stochastic Gradient Descent): This is a simple optimizer that adjusts the weights
based on how much the loss would decrease if you changed the weights a little.
"Stochastic" means it updates the weights after looking at one or a few examples rather
than the whole dataset.
w_{\text{new}} = w_{\text{old}} - \text{learning rate} \times \frac{\partial \text{Loss}}{\partial w}
It is slower and can get stuck, but it’s straightforward and sometimes works well.
● Adam (Adaptive Moment Estimation): Adam is a more advanced optimizer that
adjusts the learning rate dynamically based on how the error is changing. It tends to
work better than plain SGD in practice.
○ It keeps track of the moving averages of gradients (the slopes that tell how much
the loss will change with a small change in weights) and adjusts the learning rate
accordingly.
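In Keras, switching optimizers is a one-line change at compile time. A minimal sketch (the model shape reused from the fruit example is only for illustration):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD, Adam

model = Sequential([
    Dense(3, input_shape=(4,), activation='relu'),
    Dense(2, activation='softmax')
])

# Plain SGD with a fixed learning rate
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01),
              metrics=['accuracy'])

# Adam adapts the learning rate using moving averages of the gradients
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(learning_rate=0.001),
              metrics=['accuracy'])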
5. Metrics
Metrics are ways of measuring how well the network is performing. They're like a scorecard for how well the model is doing during training.
Some common metrics:
● Accuracy: This is used for classification problems. It measures how many times the
network got the correct answer.
\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}
For example, if the model predicts whether a fruit is an apple or banana, and it gets 8 out
of 10 predictions right, the accuracy would be 80%.
● Precision, Recall, F1 Score: These are used when dealing with more complex tasks
like detecting specific events (e.g., a network trying to find spam emails). These metrics
go beyond simple accuracy and help measure how well the model is detecting true
positives or avoiding false positives.
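These metrics can be computed from a list of predictions; a small sketch (assuming scikit-learn is installed) using the 8-out-of-10 example above:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # actual labels (1 = apple, 0 = banana)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # model predictions (8 of 10 correct)

print("Accuracy:", accuracy_score(y_true, y_pred))    # 0.8
print("Precision:", precision_score(y_true, y_pred))  # of predicted apples, how many really were apples
print("Recall:", recall_score(y_true, y_pred))        # of actual apples, how many were found
print("F1:", f1_score(y_true, y_pred))                # harmonic mean of precision and recall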
Putting It All Together:
Imagine we are building a model to classify fruits (apple or banana). Here's how all the parts
work together:
1. Model: We create a neural network model with dense layers.
2. Activation Functions: We use ReLU in the hidden layers to make decisions and
softmax in the output layer to predict the probability of apple vs. banana.
3. Loss Function: We choose cross-entropy loss because this is a classification task.
4. Optimizer: We pick Adam because it helps the model learn faster and more effectively.
5. Metrics: We track accuracy to see how often the model is making correct predictions.
With each step, the neural network adjusts its weights using the optimizer to reduce the loss,
which improves its accuracy over time.
In addition to the Sequential model, which is the most straightforward type of neural network
model in Keras (or TensorFlow), there are other types of models that allow for more flexibility,
especially for complex neural networks.
Here are a few common types:
1. Sequential Model
● The Sequential model is the simplest neural network model. Layers are stacked one
after the other in a straight line, which is useful for simple, feed-forward neural networks.
● It works well when the model can be described as a sequence of layers where the output
of one layer is the input to the next.
Example:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(10, input_shape=(4,), activation='relu'))
model.add(Dense(2, activation='softmax'))
2. Functional API
● The Functional API is more flexible than the Sequential model and allows for the
creation of complex models where layers may have multiple inputs or outputs, share
layers, or connect layers in non-linear ways (like in branching networks or residual
networks).
● This is useful when you need more control over how layers are connected.
Example:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
# Input layer
inputs = Input(shape=(4,))
# Hidden layer
x = Dense(10, activation='relu')(inputs)
# Output layer
outputs = Dense(2, activation='softmax')(x)
# Build the model
model = Model(inputs=inputs, outputs=outputs)
● Here, you define the connections between layers explicitly, which is useful for models
like multi-input/multi-output networks or when layers are reused.
3. Subclassing Model
● Model Subclassing is the most flexible way to create custom models by subclassing the
Model class. It allows you to define your own forward pass (how the inputs move
through the network) and gives full control over the model's behavior.
● This is useful for very customized architectures where neither Sequential nor Functional
API models are sufficient.
Example:
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense
class CustomModel(Model):
def __init__(self):
super(CustomModel, self).__init__()
self.dense1 = Dense(10, activation='relu')
self.dense2 = Dense(2, activation='softmax')
def call(self, inputs):
x = self.dense1(inputs)
return self.dense2(x)
# Instantiate the model
model = CustomModel()
● You define the layers in __init__ and control the forward pass in the call method.
This allows for maximum flexibility in building the model.
4. Model with Shared Layers
● Some models use shared layers, where the same layer is reused multiple times in
different parts of the model. This is often used in models like Siamese networks (used for
tasks like face recognition), where the same network processes two different inputs.
Example:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
# Shared layer
shared_dense = Dense(10, activation='relu')
# Inputs
input1 = Input(shape=(4,))
input2 = Input(shape=(4,))
# Shared processing
output1 = shared_dense(input1)
output2 = shared_dense(input2)
# Create the model
model = Model(inputs=[input1, input2], outputs=[output1, output2])
● Here, the Dense(10) layer is shared, meaning both inputs pass through the same layer,
which can be useful in tasks where we want to learn common features.
5. Multi-Input and Multi-Output Models
● These models take multiple inputs and produce multiple outputs, useful in complex
applications like question-answering systems, recommendation systems, or image
captioning.
Example:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, concatenate
# Inputs
inputA = Input(shape=(4,))
inputB = Input(shape=(6,))
# Hidden layers for both inputs
x = Dense(8, activation='relu')(inputA)
y = Dense(8, activation='relu')(inputB)
# Merge the outputs
merged = concatenate([x, y])
# Final output
z = Dense(1, activation='sigmoid')(merged)
# Create the model
model = Model(inputs=[inputA, inputB], outputs=z)
● Here, the model takes two different inputs (of different sizes) and combines them after
separate processing, then produces one final output.
6. Autoencoders
● Autoencoders are a special type of neural network model used for unsupervised
learning tasks like data compression or anomaly detection. They consist of two parts: an
encoder that compresses the input and a decoder that reconstructs the input from the
compressed version.
Example:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
# Input layer
input_layer = Input(shape=(4,))
# Encoder
encoded = Dense(2, activation='relu')(input_layer)
# Decoder
decoded = Dense(4, activation='sigmoid')(encoded)
# Autoencoder model
autoencoder = Model(inputs=input_layer, outputs=decoded)
● The autoencoder reduces the dimensions of the data and then tries to reconstruct the
original input.
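A minimal training sketch: because the input is also the target, the autoencoder above can be compiled and fit against its own data (the random dataset here is only a placeholder):
import numpy as np

# The input doubles as the target, so the model learns to compress and reconstruct
autoencoder.compile(optimizer='adam', loss='mse')
X = np.random.rand(100, 4)              # placeholder 4-feature dataset
autoencoder.fit(X, X, epochs=50, verbose=0)
reconstructed = autoencoder.predict(X)  # outputs should approximate the inputs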
7. GANs (Generative Adversarial Networks)
● GANs are a type of neural network model that consists of two parts: a generator that
creates fake data and a discriminator that tries to tell the real data from the fake. GANs
are used to generate new data like images or audio.
Example (simplified structure):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Generator model
generator = Sequential()
generator.add(Dense(10, input_dim=100, activation='relu'))
generator.add(Dense(4, activation='sigmoid'))  # Generates a 4-feature fake example
# Discriminator model
discriminator = Sequential()
discriminator.add(Dense(10, input_dim=4, activation='relu'))
discriminator.add(Dense(1, activation='sigmoid'))  # Predicts whether the input is real or fake
● GANs are trained by making the generator fool the discriminator while the discriminator
tries to get better at identifying fakes.
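A simplified sketch of one training round, reusing the generator and discriminator above (the real data batch here is a random placeholder):
import numpy as np
from tensorflow.keras.models import Sequential

# Compile the discriminator on its own first
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Stack generator + frozen discriminator so generator updates flow through both
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(optimizer='adam', loss='binary_crossentropy')

real_batch = np.random.rand(32, 4)   # placeholder for real 4-feature examples
noise = np.random.rand(32, 100)      # random input for the generator
fake_batch = generator.predict(noise, verbose=0)

# Discriminator learns: real -> 1, fake -> 0
discriminator.train_on_batch(real_batch, np.ones((32, 1)))
discriminator.train_on_batch(fake_batch, np.zeros((32, 1)))

# Generator learns to fool the discriminator (fake labeled as real)
gan.train_on_batch(noise, np.ones((32, 1)))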
Summary
● Sequential: Simple, layers stacked one after the other.
● Functional API: Flexible, allows for more complex architectures like multi-input,
multi-output, or shared layers.
● Subclassing: Full control over the network's structure and forward pass.
● Shared Layers: Reuses the same layer in different parts of the model.
● Multi-Input/Output: Models that handle multiple inputs and outputs simultaneously.
● Autoencoders: For compression and reconstruction tasks.
● GANs: Models that generate new data by training two networks (generator and
discriminator).
Each of these models has specific uses and allows neural networks to solve a variety of
problems, from simple classification to complex generative tasks.
Here are some common types of loss functions:
1. Mean Squared Error (MSE)
● Type: Regression (predicting continuous values)
● Use Case: Used when predicting a real number (e.g., house price, temperature).
● How it works: It calculates the squared difference between predicted values and actual
values, then averages it over all examples.
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{predicted}} - y_{\text{actual}})^2
○ Explanation: If the prediction is 5 and the actual value is 3, the difference is 2, and its square is 4. Squaring emphasizes larger differences, making the network learn to reduce large errors.
2. Mean Absolute Error (MAE)
● Type: Regression
● Use Case: Used when predicting continuous values, similar to MSE.
● How it works: It calculates the absolute difference between the predicted and actual
values and averages it over all examples.
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{\text{predicted}} - y_{\text{actual}}|
○ Explanation: MAE is similar to MSE, but it uses absolute differences instead of
squares. This makes MAE less sensitive to outliers than MSE because it doesn’t
square the error.
3. Binary Cross-Entropy (Log Loss)
● Type: Binary Classification
● Use Case: Used when classifying between two classes (e.g., cat or dog, apple or
banana).
● How it works: It calculates the negative log of the predicted probability for the actual
class. For binary classification, it looks at one output neuron that predicts a probability
between 0 and 1.
\text{Loss} = -\left( y_{\text{actual}} \cdot \log(y_{\text{predicted}}) + (1 - y_{\text{actual}}) \cdot \log(1 - y_{\text{predicted}}) \right)
○ Explanation: If the actual label is 1 (e.g., it’s a cat), and the model predicts 0.8
(80% confidence), the loss will be small. But if the model predicts 0.1 (only 10%
confidence it’s a cat), the loss will be large. This encourages the model to give
high probabilities for correct predictions.
4. Categorical Cross-Entropy
● Type: Multi-class Classification
● Use Case: Used for classification when there are more than two classes (e.g., dog, cat,
rabbit).
● How it works: It is similar to binary cross-entropy but works for multiple classes. The
model predicts a probability distribution over several classes, and categorical
cross-entropy calculates how well the predicted probabilities match the actual class.
\text{Loss} = -\sum_{i=1}^{n} y_{\text{actual}, i} \cdot \log(y_{\text{predicted}, i})
○ Explanation: The loss is low when the predicted probability is high for the actual
class. For example, if the actual class is "dog" and the model predicts 80% for
"dog," the loss will be small. If it predicts 20%, the loss will be larger.
5. Sparse Categorical Cross-Entropy
● Type: Multi-class Classification
● Use Case: Similar to categorical cross-entropy but used when the labels are integers
instead of one-hot encoded vectors. It’s useful for efficiency when you have many
classes.
● How it works: It’s the same as categorical cross-entropy but expects the target labels to
be integers (like 0, 1, 2 for dog, cat, rabbit) rather than one-hot encoded vectors.
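A short sketch (assuming TensorFlow is installed) showing that the two losses compute the same value, just with different label formats:
import numpy as np
import tensorflow as tf

probs = np.array([[0.7, 0.2, 0.1]])   # predicted probabilities for dog, cat, rabbit

# Categorical cross-entropy: one-hot label
one_hot = np.array([[1.0, 0.0, 0.0]])
print(tf.keras.losses.CategoricalCrossentropy()(one_hot, probs).numpy())

# Sparse categorical cross-entropy: integer label (class 0 = dog)
integer = np.array([0])
print(tf.keras.losses.SparseCategoricalCrossentropy()(integer, probs).numpy())
# Both print ~0.357, i.e. -log(0.7)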
6. Hinge Loss
● Type: Binary Classification (often used with Support Vector Machines)
● Use Case: Used in binary classification tasks, particularly in support vector machines
(SVMs).
● How it works: Hinge loss ensures that the correct class has a margin of at least 1 over
the incorrect class. It penalizes predictions that are wrong or too close to the decision
boundary.
\text{Loss} = \max(0, 1 - y_{\text{actual}} \cdot y_{\text{predicted}})
○ Explanation: If the actual class is +1 and the predicted output is +0.9, the loss will be small. But if the predicted output is +0.1 or negative (wrong class), the loss will be large.
7. Huber Loss
● Type: Regression
● Use Case: A combination of MSE and MAE, used when you want to be robust against
outliers while still penalizing large errors.
● How it works: For small errors, it behaves like MSE (squares the error), and for large
errors, it behaves like MAE (linear).
\text{Loss} = \begin{cases} \frac{1}{2}(y_{\text{predicted}} - y_{\text{actual}})^2 & \text{for } |y_{\text{predicted}} - y_{\text{actual}}| \leq \delta \\ \delta \left( |y_{\text{predicted}} - y_{\text{actual}}| - \frac{1}{2}\delta \right) & \text{otherwise} \end{cases}
○ Explanation: It’s useful when you want the best of both worlds: minimizing large
errors like MSE but not being too sensitive to outliers like MAE.
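Keras ships a built-in Huber loss; a quick sketch where the delta threshold decides when the loss switches from quadratic to linear:
import numpy as np
import tensorflow as tf

huber = tf.keras.losses.Huber(delta=1.0)
y_true = np.array([3.0, 5.0, 10.0])
y_pred = np.array([2.8, 5.5, 4.0])    # the last prediction is a large outlier
print(huber(y_true, y_pred).numpy())  # the outlier is penalized linearly, not quadratically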
8. Kullback-Leibler Divergence (KL Divergence)
● Type: Classification, often used in probabilistic models
● Use Case: Measures how one probability distribution differs from a reference
distribution. Used in tasks like training variational autoencoders (VAE) and reinforcement
learning.
● How it works: It calculates how one probability distribution (predicted) diverges from
another (actual).
\text{KL}(P \,\|\, Q) = \sum P(x) \cdot \log\left(\frac{P(x)}{Q(x)}\right)
○ Explanation: If the predicted probability distribution is very different from the
actual distribution, the loss will be high. This encourages the model to predict
distributions closer to the true distribution.
9. Poisson Loss
● Type: Regression, often for count-based data
● Use Case: Used when predicting count data, such as the number of occurrences of an
event (e.g., number of emails received in a day).
● How it works: It assumes that the output follows a Poisson distribution and penalizes
predictions that are far from the actual count.
\text{Loss} = y_{\text{predicted}} - y_{\text{actual}} \cdot \log(y_{\text{predicted}})
○ Explanation: The loss is small when the predicted count is close to the actual
count, and large when the predicted count is very far off.
Summary of Loss Functions by Type:
● Regression (predicting real numbers):
○ Mean Squared Error (MSE): Penalizes large errors heavily.
○ Mean Absolute Error (MAE): Penalizes all errors equally.
○ Huber Loss: A mix of MSE and MAE, less sensitive to outliers.
○ Poisson Loss: For count data.
● Binary Classification (two classes):
○ Binary Cross-Entropy: Used when classifying two categories (e.g., cat vs. dog).
○ Hinge Loss: Used in SVMs to maximize the margin between classes.
● Multi-Class Classification (more than two classes):
○ Categorical Cross-Entropy: For multi-class classification with one-hot encoded
labels.
○ Sparse Categorical Cross-Entropy: For multi-class classification with integer
labels.
○ KL Divergence: Measures the difference between predicted and actual
probability distributions.
CHAPTER-1: GenAI and LLM
Large language models (LLMs) and generative AI (GenAI) are both types of artificial intelligence (AI) that can be used to create content, but they have different capabilities and uses:
● Generative AI: A broad category of AI that can create a variety of content, such as text, images, videos, audio, and computer code. GenAI can be trained to respond to prompts or requests from users. For example, GenAI can be used to compose music, design graphics, or diagnose diseases from medical images.
● LLMs: A specific type of generative AI that focuses on language-related tasks, such as generating and understanding human-like text. LLMs are trained on large amounts of data to create new combinations of text that mimic natural language. LLMs are used in a variety of applications, including customer service, drafting emails, and summarizing documents.
LLMs and GenAI can be used together to enhance a variety of applications,
such as ecommerce, conversational search, and enterprise search. For
example, ecommerce websites can use LLMs and GenAI to personalize the
shopping experience for customers.
2. LLM Types
1. General-Purpose LLMs
● GPT (Generative Pre-trained Transformer): Developed by OpenAI, GPT models (like
GPT-2, GPT-3, and GPT-4) are autoregressive models that generate coherent text
based on input prompts. These are widely used in tasks like text generation, translation,
and summarization.
● BERT (Bidirectional Encoder Representations from Transformers): Created by
Google, BERT is a transformer model designed for understanding the context in both
directions, making it effective for tasks like question answering and sentiment analysis.
2. Multilingual LLMs
● mBERT (Multilingual BERT): A variant of BERT that is trained on data from multiple
languages, making it suitable for multilingual text processing tasks.
● XLM-R (Cross-lingual Language Model): A multilingual variant of BERT, trained on
more than 100 languages, designed for cross-lingual tasks like translation and
multilingual sentence representation.
3. Instruction-Following LLMs
● InstructGPT: A version of GPT-3 fine-tuned using Reinforcement Learning from Human
Feedback (RLHF) to better follow user instructions.
● FLAN (Fine-Tuned Language Net): Developed by Google, FLAN is a fine-tuned model
based on task instructions, making it highly effective in zero-shot and few-shot learning
tasks.
4. Conversational LLMs
● DialoGPT: A GPT-2-based model fine-tuned for conversation, designed for more natural
and coherent dialogues.
● BlenderBot: A conversational model developed by Meta, designed for long-term
dialogue and more complex conversations.
5. Code Generation LLMs
● Codex: A GPT-based model trained by OpenAI specifically for generating code from
natural language. It powers tools like GitHub Copilot.
● CodeBERT: A model designed for programming tasks like code generation, code
search, and code summarization.
6. Specialized LLMs
● BioBERT: A version of BERT specialized for biomedical text mining and tasks in
bioinformatics.
● ClinicalBERT: A variant of BERT trained on clinical notes and datasets for healthcare
applications.
● FinBERT: Designed for financial sentiment analysis, FinBERT is a BERT model
fine-tuned on financial text.
7. Knowledge-Enhanced LLMs
● T5 (Text-to-Text Transfer Transformer): Google’s T5 converts all NLP tasks into a
text-to-text format, including question answering, translation, and summarization.
● RAG (Retrieval-Augmented Generation): A hybrid model that combines a language
model with a retrieval system, allowing it to fetch relevant external knowledge during
generation.
8. Multimodal LLMs
● CLIP (Contrastive Language-Image Pretraining): Developed by OpenAI, CLIP learns
to understand text and images in a unified way, excelling in tasks like image captioning
and image classification.
● DALL-E: An image generation model that creates images based on textual descriptions,
leveraging multimodal capabilities.
9. Compression and Parameter-Efficient LLMs
● DistilBERT: A smaller, faster, and more efficient variant of BERT, trained using
knowledge distillation to achieve a similar performance with fewer parameters.
● ALBERT (A Lite BERT): A more parameter-efficient version of BERT that reduces
memory footprint and training time without compromising much on accuracy.
10. Large Language Models with Memory
● RETRO (Retrieval-Enhanced Transformer): Developed by DeepMind, RETRO uses a
retrieval mechanism to access external databases during text generation, allowing it to
generate long, coherent text with less computation.
● MemGPT: A GPT variant that incorporates a memory mechanism to handle complex,
long-range dependencies in text.
11. Few-Shot and Zero-Shot LLMs
● GPT-3/4 Few-Shot Learning: These models demonstrate the ability to perform tasks
with minimal training examples (few-shot) or even without any task-specific training
examples (zero-shot), making them versatile for a wide range of applications.
● T0: A fine-tuned model from Hugging Face, trained to perform multiple tasks in a
zero-shot setting using prompts.
12. Reinforcement Learning-Based LLMs
● ChatGPT (GPT-3/4 + RLHF): ChatGPT is trained with reinforcement learning from
human feedback (RLHF) to ensure safer and more helpful interactions during
conversations.
● Sparrow: Developed by DeepMind, it is trained via RLHF to provide more accurate and
less harmful answers while following safety guidelines.
3. Popular LLMs
1. OpenAI GPT (Generative Pre-trained Transformer)
● Developers: OpenAI
● Notable Models: GPT-2, GPT-3, GPT-4, ChatGPT
● Architecture: Decoder-only transformer architecture, autoregressive models.
● Core Features:
○ GPT-3 has 175 billion parameters, while GPT-4 is speculated to have over 100
trillion parameters (though exact figures are not publicly disclosed).
○ These models are pre-trained on a massive corpus of data and are fine-tuned to
perform various natural language processing tasks like text generation,
summarization, translation, and more.
○ GPT-4 is multimodal, meaning it can accept both image and text inputs, making it
more versatile than its predecessors.
● Use Cases: Used extensively in conversational AI (ChatGPT), code generation (Codex),
content creation, and research assistance.
References:
● OpenAI Research
● GPT-4 Technical Paper arXiv
2. LLaMA (Large Language Model Meta AI)
● Developers: Meta (Facebook AI)
● Notable Models: LLaMA, LLaMA 2
● Architecture: Transformer-based architecture, but designed to be more efficient and
accessible with fewer parameters compared to GPT models.
● Core Features:
○ The model is available in different sizes (7B, 13B, 33B, and 65B parameters),
focusing on lower computational costs while maintaining high performance.
○ It has been specifically optimized to reduce resource usage, making it more
accessible for research and practical applications.
○ LLaMA models are open-source, unlike GPT, which is proprietary.
● Use Cases: Research on language tasks, including text classification, question
answering, and text generation.
References:
● Meta AI LLaMA Release
● LLaMA 2 Overview
3. Google Gemini
● Developers: Google DeepMind
● Notable Models: Gemini 1, Gemini 1.5 (Upcoming)
● Architecture: Based on the PaLM architecture (Pathways Language Model) but
includes multimodal capabilities like GPT-4, meaning it can handle both image and text
inputs.
● Core Features:
○ Gemini integrates reinforcement learning with human feedback (RLHF), making it
more reliable for real-world applications.
○ The model is designed to handle multimodal inputs (text and images), improving
its use in tasks requiring visual and textual data.
● Use Cases: Search enhancements, AI assistants like Bard, translation, and research.
References:
● Google Gemini Announcement
● DeepMind Research
4. Claude (Claude 1, Claude 2)
● Developers: Anthropic
● Architecture: Similar to GPT, based on the transformer architecture but with a specific
focus on safety and alignment, ensuring the model produces less harmful outputs.
● Core Features:
○ Anthropic’s focus is on building "helpful, honest, and harmless" AI systems,
leading to a model that emphasizes human-centered values.
○ Claude models are named after Claude Shannon, the father of information
theory, and they are primarily designed for conversational agents.
● Use Cases: Chatbots, customer service, task automation, and conversational AI.
References:
● Anthropic Research
5. PaLM (Pathways Language Model)
● Developers: Google AI
● Notable Models: PaLM 2
● Architecture: Transformer-based model with a focus on scaling across multiple
languages and modalities.
● Core Features:
○ PaLM 2 is capable of understanding and generating text in over 100 languages
and is trained to handle a variety of modalities including image and text.
○ It emphasizes efficiency and is highly scalable, designed to be part of Google’s
larger AI ecosystem, integrating with models like Gemini.
● Use Cases: Translation, summarization, text-to-image, and research.
References:
● Google PaLM Overview
6. BLOOM
● Developers: BigScience (an open-science collaboration)
● Notable Models: BLOOM-176B
● Architecture: Transformer-based model similar to GPT, with a multilingual focus.
● Core Features:
○ BLOOM supports 46 natural languages and 13 programming languages, making it one of the most accessible LLMs for diverse linguistic research.
○ It is open-source and community-driven, aiming to democratize access to
large-scale AI models.
● Use Cases: Language generation, multilingual translation, code generation, and
research.
References:
● BigScience BLOOM
7. Grok (XAI)
● Developers: xAI (Elon Musk's company)
● Notable Models: Grok (still in development)
● Architecture: Expected to be transformer-based and fine-tuned on various complex
reasoning tasks, but specific details are not yet public.
● Core Features: Grok aims to focus on better understanding reasoning and
problem-solving tasks, possibly leveraging large datasets similar to GPT models.
● Use Cases: Still speculative, but likely similar to other general-purpose models with a
focus on reasoning and conversational abilities.
References:
● xAI Grok Overview
8. Mistral
● Developers: Mistral AI
● Notable Models: Mistral 7B
● Architecture: Transformer-based model designed for efficiency, with fewer parameters
but high performance.
● Core Features:
○ Focused on parameter efficiency, Mistral provides competitive performance
despite smaller model size compared to GPT or PaLM.
● Use Cases: NLP tasks such as text generation, summarization, and translation.
References:
● Mistral AI
Conclusion:
Each LLM has unique features and strengths tailored to different use cases:
● OpenAI GPT is strong in general-purpose language tasks and generation.
● LLaMA offers a more accessible and efficient alternative for researchers.
● Gemini emphasizes multimodality and reinforcement learning.
● BLOOM stands out with its multilingual capabilities.
● Claude focuses on safety and human alignment, while PaLM emphasizes scalability
across languages and modalities.
4. Open source LLMs
1. BERT (Bidirectional Encoder Representations from Transformers)
● Model: BERT base-uncased
● Description: One of the most widely-used models for tasks like text classification,
question answering, and named entity recognition. It uses a bidirectional transformer
architecture that reads text from both directions.
● Parameters: 110 million
● Use Cases: Sentiment analysis, text classification, question answering.
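All of the models in this section are available on Hugging Face; as a rough sketch (assuming the transformers library is installed; the weights download on first use), BERT can be tried via the fill-mask pipeline:
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for result in fill_mask("The capital of France is [MASK]."):
    print(result["token_str"], round(result["score"], 3))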
2. GPT-2
● Model: GPT-2
● Description: A generative model from OpenAI designed for text generation. It predicts
the next word in a sequence, making it great for creative text generation tasks.
● Parameters: 1.5 billion
● Use Cases: Text generation, summarization, and dialogue systems.
3. RoBERTa
● Model: RoBERTa base
● Description: A variant of BERT with optimizations in training techniques, RoBERTa is
fine-tuned for better performance on downstream tasks.
● Parameters: 125 million
● Use Cases: Text classification, question answering, natural language inference.
4. T5 (Text-to-Text Transfer Transformer)
● Model: T5 base
● Description: T5 reframes every NLP task as a text-to-text problem, making it incredibly
versatile. It is used for tasks like translation, summarization, and text generation.
● Parameters: 220 million
● Use Cases: Summarization, translation, question answering.
5. BLOOM (BigScience Large Open-science Open-access Multilingual
Language Model)
● Model: BLOOM
● Description: A multilingual LLM supporting 46 languages and 13 programming
languages, BLOOM is an open-science model designed for research and NLP tasks.
● Parameters: 176 billion
● Use Cases: Multilingual NLP, text generation, translation, code generation.
6. DistilBERT
● Model: DistilBERT base-uncased
● Description: A lighter, faster version of BERT that retains 97% of its language
understanding capabilities while being more computationally efficient.
● Parameters: 66 million
● Use Cases: Text classification, sentiment analysis, question answering.
7. XLM-R (XLM-RoBERTa)
● Model: XLM-R large
● Description: A cross-lingual version of RoBERTa that is pre-trained on 100 languages,
making it useful for tasks in multilingual contexts.
● Parameters: 550 million
● Use Cases: Multilingual text classification, translation, and named entity recognition.
8. BART (Bidirectional and Auto-Regressive Transformers)
● Model: BART base
● Description: A transformer model that combines both a bidirectional encoder and
autoregressive decoder, designed for text generation and summarization.
● Parameters: 140 million
● Use Cases: Text summarization, machine translation, and question answering.
9. Flan-T5
● Model: Flan-T5
● Description: An extension of T5 that is fine-tuned on a variety of instruction-based
tasks, making it highly versatile for few-shot and zero-shot learning.
● Parameters: 780 million
● Use Cases: Text summarization, translation, few-shot learning.
10. CodeBERT
● Model: CodeBERT
● Description: Pretrained for both natural language and programming languages,
CodeBERT is specifically optimized for source code-related tasks.
● Parameters: 125 million
● Use Cases: Code generation, code search, code summarization.
CHAPTER 2: LLM Architecture
5. LLM Transformer architecture
1. Encoder Architecture
● Purpose: The encoder architecture is designed to understand input. It reads and
processes text to capture its meaning.
● How it works: Imagine you're trying to understand a sentence in a book. The encoder
takes in every word, processes it, and tries to understand the whole text by relating the
words to one another.
● Example: The BERT model is a popular encoder-based architecture. It’s great at
understanding context, like figuring out the meaning of a sentence by looking at all the
words.
2. Decoder Architecture
● Purpose: The decoder architecture focuses on generating text based on some input or
prompt.
● How it works: Picture yourself trying to write a story. The decoder takes a starting point
(like a topic) and continues generating text based on what it learned from patterns.
● Example: Models like GPT (Generative Pre-trained Transformer) are decoders. They’re
used for generating long pieces of text, answering questions, and creating dialogue.
3. Encoder-Decoder Architecture
● Purpose: This type combines both encoder and decoder to read input and generate a
response.
● How it works: Imagine you’re translating a sentence from one language to another. The
encoder first reads and understands the sentence, and the decoder then generates the
translation.
● Example: T5 and BART are examples of encoder-decoder models, commonly used for
tasks like machine translation and summarization.
The Transformer architecture consists of an encoder-decoder structure, but LLMs, such as
GPT, BERT, and others, often use either just the encoder (BERT) or just the decoder (GPT)
depending on the task. These components are built from layers of self-attention mechanisms
and feed-forward neural networks.
Key Components:
● Self-Attention Mechanism:
○ Self-attention allows the model to weigh the importance of each word in a
sentence relative to the others. This is done by computing three vectors for each
word: query (Q), key (K), and value (V). These vectors help the model
determine how much attention to pay to each word when generating an output.
○ The formula for attention is as follows:
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
○ This mechanism allows the model to capture dependencies between words
regardless of their distance in the sentence.
● Positional Encoding:
○ Unlike RNNs or LSTMs, Transformers do not process tokens sequentially.
Instead, they process all tokens at once. To capture the order of the words in the
sequence, a positional encoding is added to the input embeddings.
○ The positional encoding adds information about the token’s position using sine
and cosine functions of different frequencies.
● Feed-Forward Network (FFN):
○ Each attention block is followed by a fully connected feed-forward network, which
processes the outputs of the self-attention mechanism.
○ This layer applies a linear transformation, followed by a non-linear activation
function (usually ReLU), and then another linear transformation.
● Multi-Head Attention:
○ Instead of calculating attention just once, the Transformer model calculates it
multiple times in parallel, referred to as "multi-head attention." Each attention
head can focus on different parts of the sentence, helping the model capture
richer contextual information.
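A minimal NumPy sketch of the attention formula above (toy shapes, no multi-head or masking):
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how strongly each query matches each key
    scores -= scores.max(axis=-1, keepdims=True)    # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted mix of the values

# Toy example: 2 tokens with 4-dimensional representations
rng = np.random.default_rng(0)
Q, K, V = rng.random((3, 2, 4))
print(scaled_dot_product_attention(Q, K, V))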
Let’s break down how a sentence like "Good morning" is translated into French using an
Encoder-Decoder architecture, step by step. The architecture is powered by transformers,
which have multiple layers involving attention, feed-forward networks, and other components. I’ll
explain it in a simple, understandable way, imagining the model translating "Good morning" to
"Bonjour."
Step 1: Input (Good morning)
The input sentence "Good morning" is first converted into numbers (tokens). These tokens
represent each word so that the model can understand and process the input.
For example:
● "Good" becomes token 12.
● "Morning" becomes token 34.
So the input becomes: [12, 34].
Encoder Steps: Processing the Input Sentence
1. Embedding Layer
○ The tokens [12, 34] are turned into word embeddings—vectors that contain information about the meaning of each word.
○ Imagine each word becomes a detailed vector (a list of numbers) that tells the model more about the word's properties and relationships to other words.
2. Positional Encoding
○ Since word order matters (e.g., "Good morning" is different from "Morning good"), a positional encoding is added to the word embeddings. This helps the model understand the position of each word in the sentence.
○ After this step, we have vectors for "Good" and "Morning" that include both meaning and position.
3. Self-Attention
○ Attention is like a smart highlighter. It allows the model to focus on important words when processing the sentence.
○ For "Good morning", the attention mechanism compares "Good" with "Morning" and checks how much each word contributes to the meaning of the whole sentence.
○ The result is that both "Good" and "Morning" get updated to reflect the overall meaning of the sentence.
4. Feed-Forward Neural Network
○ After the attention step, each word vector goes through a feed-forward network, which is like a math function that adds more depth and non-linearity to the information. This helps the model capture complex patterns in the data.
○ At this point, the sentence has been transformed into deep, meaningful representations that the encoder can understand well.
5. Multi-Head Attention
○ In practice, attention is applied multiple times, from different perspectives. This is called multi-head attention. Each attention head focuses on different parts of the sentence, like meaning, structure, or relationships between words.
○ All these attention results are combined to further enhance the representation of the input.
6. Output of Encoder
○ The encoder finishes by outputting a detailed, transformed version of the sentence, ready for the decoder. It doesn't translate yet—it just understands "Good morning" deeply.
Decoder Steps: Generating the Translation ("Bonjour")
1. Start Token
○ The decoder begins with a special start token to signal that it's time to generate the translation. For French, this token might represent the start of a French sentence.
2. Attention Over Encoder Output
○ Now, the decoder needs to look at the encoder's output (the detailed representation of "Good morning"). It uses attention again, called encoder-decoder attention, to focus on relevant parts of the encoder's output.
○ For instance, the decoder will focus on both "Good" and "Morning" to decide how to begin translating.
3. Feed-Forward Network
○ Like in the encoder, the decoder also has a feed-forward network to further process the data and ensure it generates the right translation.
4. Self-Attention
○ The decoder also applies self-attention to its own output to ensure it makes sense. This helps the decoder generate words one by one while keeping track of the sentence's structure.
5. Generate "Bonjour" (Word-by-Word)
○ Now, the model begins to generate the French translation, one word at a time.
○ First, it generates the word "Bonjour" because the model has learned that "Good morning" translates to "Bonjour" in French.
6. Softmax Layer (Word Prediction)
○ After generating "Bonjour", the model passes the prediction through a softmax layer. This step calculates the probability of each possible word in the French language and selects the most likely one.
○ For example, the softmax might calculate probabilities for words like "Bonjour" (80%), "Salut" (15%), and "Au revoir" (5%). Since "Bonjour" has the highest probability, it is chosen as the next word in the translation.
7. Repeat for More Words
○ The decoder continues generating words using the same process until it reaches a special end token, signaling the end of the translation.
Putting It All Together:
1. The encoder processes and deeply understands "Good morning".
2. The decoder starts generating the translation, word by word, using the encoder's output.
3. With each step, attention mechanisms help the model focus on important words and the
softmax layer ensures that the right word is chosen based on probability.
4. Finally, the model generates "Bonjour", which is the correct French translation of "Good
morning".
CHAPTER 3: LLM Applications
6. LLM Gen AI use cases
1. Text Generation
● Use Case: Content generation for blogs, articles, marketing materials, or even creative
writing.
● Models: GPT-2, GPT-3, and Bloom.
● Example: Automatically generating text based on prompts, such as product descriptions
or long-form articles.
2. Question Answering
● Use Case: Building intelligent assistants or chatbots that can answer questions based
on a knowledge base or real-time input.
● Models: BERT, RoBERTa, T5, and DistilBERT.
● Example: Customer support chatbots that can retrieve and respond to queries using
company documentation or FAQs.
3. Text Summarization
● Use Case: Automatic summarization of long documents or reports for efficient
consumption.
● Models: BART, T5.
● Example: Summarizing lengthy research papers, legal documents, or meeting minutes
into concise, readable summaries.
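As a rough sketch (assuming the transformers library is installed; the BART summarization checkpoint downloads on first use):
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = ("Large language models are trained on massive text corpora and can "
        "generate, summarize, translate, and classify text. They are built on "
        "the transformer architecture, which uses self-attention to model "
        "relationships between words regardless of their distance.")
summary = summarizer(text, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])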
4. Text Classification
● Use Case: Sentiment analysis, spam detection, and categorizing customer reviews or
feedback.
● Models: BERT, DistilBERT, XLNet.
● Example: Sorting emails into categories like spam, promotions, and primary; or
identifying the sentiment behind customer reviews.
5. Translation
● Use Case: Language translation for websites, apps, or business communications.
● Models: MarianMT, M2M100.
● Example: Translating product descriptions into multiple languages for e-commerce
platforms.
6. Conversational AI (Chatbots)
● Use Case: Building interactive, conversational agents for customer service or virtual
assistants.
● Models: DialoGPT, BlenderBot.
● Example: Creating virtual assistants that can engage in back-and-forth conversations to
assist with tasks or answer customer inquiries.
7. Image Generation (Text-to-Image)
● Use Case: Generating images based on textual descriptions.
● Models: DALL-E, Stable Diffusion.
● Example: Creating marketing visuals, concept art, or prototypes based on written inputs.
8. Image Classification
● Use Case: Identifying objects, people, or actions in images.
● Models: ResNet, ViT (Vision Transformer).
● Example: Automated tagging and categorization of images in large databases or
recognizing defects in manufacturing products.
9. Image Segmentation
● Use Case: Segmenting parts of images for applications like medical imaging or object
detection.
● Models: Mask R-CNN, U-Net.
● Example: Highlighting cancerous tissues in X-ray images or isolating specific objects in
satellite imagery.
10. Audio Processing (Speech-to-Text and Text-to-Speech)
● Use Case: Converting speech to text for transcription services, or generating speech
from text for virtual assistants or automated systems.
● Models: Wav2Vec 2.0, Tacotron 2.
● Example: Real-time transcription of meetings, or converting text into realistic-sounding
speech for voiceovers.
11. Code Generation
● Use Case: Automatic code generation or code completion.
● Models: CodeBERT, Codex.
● Example: Autocompleting code for developers, or generating boilerplate code from
high-level descriptions of functionality.
12. Sentiment Analysis
● Use Case: Determining the emotional tone behind a piece of text.
● Models: DistilBERT, RoBERTa.
● Example: Identifying whether customer feedback or social media posts are positive,
negative, or neutral.
13. Named Entity Recognition (NER)
● Use Case: Extracting specific information like names, locations, or organizations from
unstructured text.
● Models: BERT, Flair.
● Example: Automatically identifying key stakeholders from business documents or
extracting product names from reviews.
14. Data Augmentation
● Use Case: Generating synthetic data for training machine learning models, especially in
cases where real data is limited.
● Models: T5, GPT-3.
● Example: Augmenting a dataset of medical records with synthetic but realistic data to
train models for diagnosis.
15. Image Captioning
● Use Case: Automatically generating captions or descriptions for images.
● Models: CLIP, ViLBERT.
● Example: Describing product images for e-commerce sites or generating alt-text for
accessibility on websites.
16. Multi-modal AI
● Use Case: Combining inputs from multiple data types like text and images to generate
responses.
● Models: CLIP, Florence.
● Example: Interpreting a text description to retrieve relevant images or vice versa.
17. Text-Based Games/Interactive Stories
● Use Case: Creating interactive, text-based adventure games or dynamic stories based
on user input.
● Models: GPT-3, DialoGPT.
● Example: Generating new scenarios or storylines in a game based on player choices.
18. Knowledge Base Extraction
● Use Case: Automatically generating or updating knowledge bases from unstructured
documents.
● Models: T5, BERT.
● Example: Creating structured FAQ documents from customer service interactions or
product manuals.
19. Fake News Detection
● Use Case: Identifying and classifying articles or social media posts as misleading or fake
news.
● Models: RoBERTa, BERT.
● Example: Filtering and flagging potentially unreliable news sources or claims on social
media platforms.
20. Grammar and Style Correction
● Use Case: Automatically correcting grammar, spelling, and style errors in text.
● Models: T5, GPT-3.
● Example: Creating tools for automatic proofreading or improving the writing style of
articles.
21. Legal Document Generation
● Use Case: Automating the creation of legal documents like contracts, agreements, or
legal briefs.
● Models: GPT-3, T5.
● Example: Drafting legal documents based on predefined templates and input from legal
professionals.
22. Paraphrasing
● Use Case: Rewriting or paraphrasing text while maintaining the original meaning, often
for content diversification or academic use.
● Models: Pegasus, T5.
● Example: Rewriting articles or sections of text to avoid plagiarism or for content
variation.
23. Automated Code Review
● Use Case: Automating the process of reviewing code for potential errors, inefficiencies,
or security vulnerabilities.
● Models: CodeBERT, Codex.
● Example: Performing automated code reviews to flag issues or provide suggestions for
improvements.
24. Emotion Recognition in Text
● Use Case: Detecting and classifying emotions expressed in text, which can be applied in
customer support or content analysis.
● Models: BERT, DistilBERT.
● Example: Analyzing customer complaints to detect emotions like frustration, anger, or
satisfaction.
25. Product Recommendation
● Use Case: Generating personalized product recommendations based on user
preferences and behaviors.
● Models: BERT, DistilBERT, Transformer models for recommendation.
● Example: Recommending similar or complementary products to users in an
e-commerce setting based on their browsing or purchase history.
26. Text-to-Programming Language Conversion
● Use Case: Translating natural language descriptions into executable code.
● Models: Codex, GPT-3.
● Example: Converting user requirements written in plain English into Python or
JavaScript code.
27. Style Transfer (Text)
● Use Case: Changing the tone or style of text, such as converting formal writing into
casual language or mimicking a particular author’s writing style.
● Models: GPT-3, T5.
● Example: Rewriting formal business emails in a more casual tone or vice versa.
28. Document Comparison
● Use Case: Identifying and comparing differences or similarities between two or more
documents.
● Models: BERT, T5.
● Example: Comparing legal contracts or versions of documents to identify key differences
or changes.
29. Content Moderation
● Use Case: Detecting inappropriate or harmful content in text, images, or videos for
automatic moderation.
● Models: RoBERTa, GPT-3.
● Example: Automatically flagging offensive or harmful language in online forums or social
media platforms.
30. Voice Cloning
● Use Case: Generating speech that mimics the voice of a particular person, often used in
virtual assistants or content creation.
● Models: Tacotron 2, WaveGlow.
● Example: Cloning a public figure’s voice to generate audio clips for educational or
entertainment purposes.
31. Image Super-Resolution
● Use Case: Enhancing the resolution of images to improve quality.
● Models: ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks).
● Example: Enhancing low-resolution images for medical diagnostics, satellite imagery, or
historical photograph restoration.
32. Code Translation (Language-to-Language)
● Use Case: Converting code from one programming language to another.
● Models: CodeT5, Codex.
● Example: Translating Java code into Python for software porting purposes.
33. Image Inpainting
● Use Case: Filling in missing or corrupted parts of an image.
● Models: LaMa (Large Masked Image Modeling).
● Example: Restoring damaged photographs or removing unwanted objects from images.
34. Text-Based Music Generation
● Use Case: Generating musical compositions based on text prompts or descriptions.
● Models: Jukebox (OpenAI), MusicBERT.
● Example: Creating custom music tracks based on user-specified genres or moods.
35. Visual Question Answering (VQA)
● Use Case: Answering questions about the content of an image.
● Models: ViLBERT, CLIP.
● Example: Answering questions about an image’s objects, actions, or context in
applications like medical imaging or e-commerce.
36. Data-to-Text Generation
● Use Case: Converting structured data into readable text.
● Models: T5, GPT-3.
● Example: Automatically generating written summaries from tables or charts, such as
generating financial reports from numerical data.
37. Human Pose Estimation
● Use Case: Detecting human body poses in images or videos for applications like fitness
tracking, animation, or security.
● Models: OpenPose, HRNet.
● Example: Analyzing sports performance or guiding fitness exercises by tracking a user's
body movements.
38. Time-Series Forecasting
● Use Case: Predicting future values based on historical time-series data.
● Models: Prophet, Temporal Fusion Transformers (TFT).
● Example: Predicting stock prices, energy demand, or sales trends.
39. Reinforcement Learning for Text-Based Tasks
● Use Case: Using reinforcement learning to optimize decision-making in tasks involving
text, such as conversation agents or game playing.
● Models: GPT-3 with reinforcement learning (RLHF - Reinforcement Learning from
Human Feedback).
● Example: Training a chatbot to maximize customer satisfaction through long
conversations.
40. Automated Tagging and Metadata Generation
● Use Case: Automatically generating tags and metadata for content, such as videos or
blog posts.
● Models: BERT, RoBERTa.
● Example: Automatically adding keywords and tags to YouTube videos or blog articles to
improve SEO.
7. LLM Model Parameters
1. Temperature
● Description: Controls the randomness or creativity of the model's output. The
temperature parameter adjusts how deterministic or diverse the text generation will be.
● Range: Typically between 0 and 2.
○ Low temperature (e.g., 0.1): Makes the model more deterministic, meaning it
will choose the most probable tokens with higher certainty, resulting in more
predictable and conservative outputs.
○ High temperature (e.g., 1.5): Increases randomness, encouraging the model to
explore less probable tokens, which can result in more creative and varied
outputs.
● Use Case: A low temperature is ideal for tasks requiring precise, fact-based outputs
(e.g., answering factual questions), while a higher temperature can be used for creative
tasks like storytelling or generating diverse outputs.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="Tell me a story about a brave knight.",
    temperature=1.0,  # Default
    max_tokens=100
)
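Mechanically, temperature rescales the logits before the softmax: p_i = softmax(z_i / T). A minimal NumPy sketch of the effect (illustrative only, not OpenAI's internal implementation):
python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Dividing logits by T < 1 sharpens the distribution; T > 1 flattens it.
    scaled = np.array(logits) / temperature
    exps = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exps / exps.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, temperature=0.1))  # near-deterministic
print(softmax_with_temperature(logits, temperature=1.5))  # flatter, more varied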
2. Max Tokens
● Description: Limits the number of tokens (words or subwords) in the generated output.
Each token may represent a word, part of a word, or punctuation mark.
● Range: Up to the model’s token limit (e.g., GPT-3 has a maximum of 4096 tokens).
● Use Case: This parameter is used to control the length of the generated text. For
example, shorter text summaries might have a smaller max tokens value, while longer
essays or creative writing might have a larger value.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain quantum physics in simple terms.",
    max_tokens=200  # Maximum number of tokens in the response
)
3. Top-k Sampling
● Description: Limits the next token generation to the top k most likely tokens. The model
will sample from this limited set instead of considering the entire vocabulary, ensuring
that only the most probable tokens are considered.
● Range: Positive integer (e.g., k = 40).
○ Low k: Generates more predictable output.
○ High k: Increases the diversity of the output by allowing less probable tokens to
be considered.
● Use Case: Top-k sampling is useful when you want to balance creativity and coherence: it prevents the model from generating extremely unlikely words while still allowing some variety.
python
# Note: top_k is shown here for illustration; the OpenAI Completion API exposes
# top_p but not top_k (top_k is available in libraries such as Hugging Face transformers).
openai.Completion.create(
    model="text-davinci-003",
    prompt="What is the future of AI?",
    top_k=40,  # Limits sampling to the top 40 tokens
    max_tokens=100
)
4. Top-p Sampling (Nucleus Sampling)
● Description: Top-p sampling (also known as nucleus sampling) selects the smallest possible set of tokens whose cumulative probability exceeds a threshold p. Instead of choosing a fixed number of tokens (as in top-k), top-p dynamically chooses tokens based on their cumulative probability.
● Range: p ∈ [0, 1]
○ Low p (e.g., 0.1): Restricts the model to the highest-probability tokens, resulting
in more conservative outputs.
○ High p (e.g., 0.9): Allows more diverse tokens to be considered, increasing
creativity in the output.
● Use Case: This is particularly useful in text generation tasks where you want to control
the diversity and ensure that tokens with very low probabilities are not selected.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="Describe a sunset.",
    top_p=0.9,  # Ensures 90% of the probability mass is used in sampling
    max_tokens=50
)
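To make the two sampling strategies concrete, here is a small NumPy sketch of how top-k and top-p filtering restrict the candidate pool before sampling (an illustrative implementation, not any provider's actual code):
python
import numpy as np

def top_k_top_p_filter(probs, k=None, p=None):
    # Keep the top-k tokens and/or the smallest set whose cumulative mass reaches p,
    # then renormalize so the surviving probabilities sum to 1.
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]             # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    keep = np.zeros_like(probs, dtype=bool)
    for rank, idx in enumerate(order):
        within_k = k is None or rank < k
        within_p = p is None or cumulative[rank] - probs[idx] < p
        keep[idx] = within_k and within_p
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

probs = [0.5, 0.25, 0.15, 0.07, 0.03]
print(top_k_top_p_filter(probs, k=2))    # only the two most likely tokens survive
print(top_k_top_p_filter(probs, p=0.9))  # smallest set covering 90% of the mass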
5. Frequency Penalty
● Description: Reduces the likelihood of the model generating tokens that have already
been generated in the current output. This is useful for avoiding repetitive phrases or
sentences.
● Range: [-2.0, 2.0]
○ Positive values: Penalize the model for repeating the same tokens, making it
less likely to repeat words.
○ Negative values: Encourage the model to repeat tokens more often.
● Use Case: When generating long text, this can be used to reduce repetition and
encourage the model to generate more varied content.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a paragraph about the importance of education.",
    frequency_penalty=0.5,  # Penalizes token repetition
    max_tokens=100
)
6. Presence Penalty
● Description: Encourages the model to explore new topics or words that haven't
appeared in the current output. This increases the likelihood of introducing new tokens
into the output.
● Range: [-2.0, 2.0]
○ Positive values: Make the model more likely to introduce new concepts or
tokens.
○ Negative values: Encourage the model to stay within the same set of tokens or
concepts.
● Use Case: Used when you want the model to be more exploratory and avoid sticking to
the same themes.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="Give me creative ideas for a tech startup.",
    presence_penalty=0.6,  # Encourages introducing new ideas and concepts
    max_tokens=150
)
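Both penalties work by adjusting token logits before sampling; OpenAI's documentation describes the adjustment roughly as logit[j] -= count[j] * frequency_penalty + (count[j] > 0) * presence_penalty. A small sketch of that idea, with toy values:
python
import numpy as np

def apply_penalties(logits, token_counts, frequency_penalty=0.0, presence_penalty=0.0):
    # token_counts[j] = how many times token j already appears in the generated text
    logits = np.asarray(logits, dtype=float)
    counts = np.asarray(token_counts, dtype=float)
    logits = logits - counts * frequency_penalty        # grows with each repetition
    logits = logits - (counts > 0) * presence_penalty   # flat penalty for any appearance
    return logits

print(apply_penalties([3.0, 2.0, 1.0], [4, 1, 0],
                      frequency_penalty=0.5, presence_penalty=0.6))
# -> [0.4, 0.9, 1.0]: the heavily repeated first token is now the least likely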
7. Stop Sequences
● Description: Defines specific token sequences that will stop the generation process
once they are encountered. These tokens are included in the output up to that point, but
generation halts when the stop sequence is detected.
● Use Case: Useful for controlling when the model should stop generating text. For
example, in chatbot conversations, you might use stop sequences to signal the end of a
response.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="Tell me a joke.",
    stop=["\n", "<|endoftext|>"],  # Stops generation when a newline or end token is encountered
    max_tokens=50
)
8. Best-of (n-best)
● Description: Generates multiple completions for each prompt (e.g., n completions) and
returns the one with the highest log-probability. This is useful when you want the best
possible output out of several generated options.
● Range: Integer (e.g., best_of = 3 generates 3 outputs and selects the best one).
● Use Case: Useful when quality is more important than speed, and you want to ensure
that the best possible response is chosen.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain the significance of the moon landing.",
    best_of=3,  # Generate 3 completions and return the best one
    max_tokens=150
)
9. Echo
● Description: When set to true, the model returns the prompt in addition to the generated
output. This can be useful for debugging or when you want to review the input alongside
the output.
● Use Case: Helpful in interactive applications where you want to display both the input
and the generated response.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="What is artificial intelligence?",
    echo=True,  # Echoes the prompt in the response
    max_tokens=100
)
10. Stream
● Description: When set to true, the model streams the tokens in real-time instead of
generating the entire output at once. This is useful for real-time applications like chatbots
where you want a response to be displayed as it’s being generated.
● Use Case: Ideal for interactive applications like live chatbots where the user doesn't
want to wait for the entire response to be generated before seeing any output.
python
openai.Completion.create(
    model="text-davinci-003",
    prompt="What are the benefits of renewable energy?",
    stream=True,  # Stream the output token by token
    max_tokens=150
)
8. LLM benchmarks
1. SuperGLUE
● Focus: Natural language understanding.
● Description: An improvement over the original GLUE benchmark, including more
challenging tasks like reading comprehension, coreference resolution, and inference.
2. GLUE (General Language Understanding Evaluation)
● Focus: General natural language understanding tasks.
● Description: Includes tasks such as sentence classification, sentence similarity, and
natural language inference.
3. OpenAI HumanEval
● Focus: Code generation.
● Description: Evaluates a model’s ability to generate correct Python functions based on
natural language descriptions.
4. SQuAD (Stanford Question Answering Dataset)
● Focus: Question answering.
● Description: Evaluates a model's ability to understand and answer questions based on
a given passage.
5. MMLU (Massive Multitask Language Understanding)
● Focus: General knowledge across a wide range of subjects.
● Description: Tests models on topics from elementary math to medicine and law.
6. HellaSwag
● Focus: Commonsense reasoning.
● Description: Measures a model’s ability to predict the most plausible continuation of a
given scenario.
7. Big-Bench (Beyond the Imitation Game Benchmark)
● Focus: Diverse set of tasks.
● Description: A collection of 204 tasks that test models on areas like reasoning,
linguistics, mathematics, and general knowledge.
8. LAMBADA
● Focus: Language modeling.
● Description: Tests the model's ability to predict the final word in a sentence when
provided with long-range context.
9. TriviaQA
● Focus: Open-domain question answering.
● Description: Includes questions from trivia and a large corpus of text documents to test
factual recall.
10. CoQA (Conversational Question Answering)
● Focus: Dialogue-based question answering.
● Description: Evaluates how well a model can answer a series of interrelated questions
based on a passage.
11. Winograd Schema Challenge
● Focus: Pronoun disambiguation.
● Description: Tests the model’s commonsense reasoning by asking it to resolve
ambiguities in sentences.
12. ARC (AI2 Reasoning Challenge)
● Focus: Science question answering.
● Description: Tests models on multiple-choice science questions that require reasoning
beyond simple text matching.
13. PiQA (Physical Interaction: Question Answering)
● Focus: Physical reasoning.
● Description: Tests how well a model can reason about the physical world, particularly in
everyday human activities.
14. BoolQ (Boolean Questions)
● Focus: Yes/No question answering.
● Description: Involves reading comprehension and answering questions with simple yes
or no responses.
15. TyDiQA
● Focus: Multilingual question answering.
● Description: Tests question answering capabilities across multiple languages and varied
contexts.
16. StoryCloze
● Focus: Story comprehension.
● Description: Evaluates a model's ability to select the best ending for a given story.
17. WinoGrande
● Focus: Commonsense reasoning.
● Description: A larger and more difficult version of the Winograd Schema Challenge to
test commonsense reasoning on a larger scale.
18. DROP (Discrete Reasoning Over Paragraphs)
● Focus: Reading comprehension and arithmetic reasoning.
● Description: Requires models to answer questions that involve discrete reasoning like
counting, sorting, or arithmetic.
19. Hendrycks Test
● Focus: Multitask learning.
● Description: Covers multiple-choice questions across topics such as humanities, STEM,
and social sciences.
20. XGLUE
● Focus: Multilingual natural language understanding.
● Description: Extends GLUE tasks to multiple languages, testing cross-lingual
generalization.
21. CodeXGLUE
● Focus: Code understanding and generation.
● Description: A benchmark designed for evaluating models on coding tasks like code
generation, translation, and classification.
22. CLUE (Chinese Language Understanding Evaluation)
● Focus: Chinese natural language understanding.
● Description: The Chinese version of GLUE, testing various language tasks in the
Chinese language.
9. LLM Finetuning
a) LLM with Prompt Engineering Tuning
Prompt engineering involves designing and refining prompts to improve the performance of
language models for specific tasks. This method doesn't require fine-tuning the model itself but
focuses on optimizing the input prompts.
Steps:
1. Define the Task: Clearly understand the task you want the model to perform.
2. Design Prompts: Create prompts that provide clear and specific instructions to the
model.
3. Test and Refine: Evaluate the model's output and iteratively refine the prompts to get
better results.
Example:
python
import openai

# Set your OpenAI API key
openai.api_key = 'your_openai_api_key'

def get_response(prompt):
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=150
    )
    return response.choices[0].text.strip()

# Define a prompt
prompt = "Explain the causes of the American Civil War in detail."

# Get and print the response
response = get_response(prompt)
print(response)
Resources:
● OpenAI Documentation
● Effective Prompting Techniques
b) LLM Instructions-based Training Tuning
Instructions-based training tuning involves fine-tuning an LLM on a dataset that contains
specific instructions and their corresponding completions. This helps the model understand and
follow complex instructions more accurately.
Steps:
1. Prepare Data: Create a dataset with prompts (instructions) and completions.
2. Convert to JSONL Format: Format the data as required by OpenAI for fine-tuning.
3. Upload Data: Upload the dataset to OpenAI.
4. Fine-tune the Model: Fine-tune the model with the uploaded dataset.
5. Test the Model: Evaluate the model with new instructions.
Example:
python
import json
import openai

# Prepare your data
data = [
    {
        "prompt": "List the causes of the American Civil War.",
        "completion": " The causes of the American Civil War include slavery, states' rights, economic disagreements, and political conflicts."
    },
    # Add more prompt-completion pairs
]

# Save to a JSONL file
with open('instruction_data.jsonl', 'w') as outfile:
    for entry in data:
        json.dump(entry, outfile)
        outfile.write('\n')
# Set your OpenAI API key
openai.api_key = 'your_openai_api_key'

# Upload the file
response = openai.File.create(
    file=open("instruction_data.jsonl"),
    purpose='fine-tune'
)
file_id = response['id']
# Create the fine-tune job
response = openai.FineTune.create(
    training_file=file_id,
    model="davinci"  # or another appropriate model
)
fine_tune_id = response['id']

# Monitor the fine-tuning job
import time
while True:
    status = openai.FineTune.retrieve(id=fine_tune_id)['status']
    print(f"Status: {status}")
    if status in ['succeeded', 'failed']:
        break
    time.sleep(30)

# Use the fine-tuned model (the job exposes the model name once it succeeds)
fine_tuned_model = openai.FineTune.retrieve(id=fine_tune_id)['fine_tuned_model']
response = openai.Completion.create(
    model=fine_tuned_model,
    prompt="Explain the major causes of World War II.",
    max_tokens=150
)
print(response.choices[0].text.strip())
Resources:
● OpenAI Fine-tuning Guide
● How to Fine-tune GPT-3
c) LLM with RAG (Retrieval-Augmented Generation) Fine-tuning
RAG combines retrieval-based methods with generative models to enhance the generation
process. The model retrieves relevant documents from a corpus to inform its responses.
Steps:
1. Prepare Data: Create a dataset with context (retrieved documents) and target responses.
2. Set Up Retriever: Use a retriever to fetch relevant documents.
3. Fine-tune the Model: Fine-tune the model with the dataset.
4. Query the Model: Use the model to generate responses based on retrieved contexts.
Example (the snippet below illustrates only the fine-tuning portion; a complete retrieval pipeline appears in the LangChain-OpenAI-RAG sample later in this document):
python
import openai
import json

# Prepare your data
data = [
    {
        "prompt": "What is the capital of France?",
        "completion": " The capital of France is Paris."
    },
    # Add more prompt-completion pairs with context
]

# Save to a JSONL file
with open('rag_data.jsonl', 'w') as outfile:
    for entry in data:
        json.dump(entry, outfile)
        outfile.write('\n')
# Set your OpenAI API key
openai.api_key = 'your_openai_api_key'

# Upload the file
response = openai.File.create(
    file=open("rag_data.jsonl"),
    purpose='fine-tune'
)
file_id = response['id']

# Create the fine-tune job
response = openai.FineTune.create(
    training_file=file_id,
    model="davinci"  # or another appropriate model
)
fine_tune_id = response['id']
# Monitor the fine-tuning job
import time
while True:
    status = openai.FineTune.retrieve(id=fine_tune_id)['status']
    print(f"Status: {status}")
    if status in ['succeeded', 'failed']:
        break
    time.sleep(30)

# Use the fine-tuned model (the job exposes the model name once it succeeds)
fine_tuned_model = openai.FineTune.retrieve(id=fine_tune_id)['fine_tuned_model']
response = openai.Completion.create(
    model=fine_tuned_model,
    prompt="What is the capital of Germany?",
    max_tokens=150
)
print(response.choices[0].text.strip())
Resources:
● OpenAI Documentation
● Retrieval-Augmented Generation (RAG)
d) LLM with LoRA (Low-Rank Adaptation)
LoRA (Low-Rank Adaptation) is a technique to adapt pre-trained language models efficiently by
fine-tuning low-rank matrices added to the model's weights.
Steps:
1. Set Up Environment: Install necessary libraries.
2. Prepare Data: Load and preprocess the dataset.
3. Apply LoRA: Implement LoRA to fine-tune the model.
4. Train the Model: Train the model using LoRA.
5. Evaluate: Test the fine-tuned model.
Example:
python
# LoRA needs dedicated library support; a sketch using Hugging Face's peft library follows below.
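A minimal LoRA fine-tuning sketch using Hugging Face's peft library (assumes pip install transformers peft; the base model, target modules, and hyperparameters are illustrative choices):
python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

# Load a small pre-trained causal LM (gpt2 is used here purely for illustration)
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# LoRA freezes the base weights and learns small low-rank update matrices instead
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor applied to the LoRA updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection layer
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here, train as usual (e.g., with transformers.Trainer) on your dataset.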
Resources:
● Hugging Face Transformers
● LoRA Paper
e) LLM with QLoRA (Quantized Low-Rank Adaptation)
QLoRA combines quantization and low-rank adaptation to reduce the computational cost of
fine-tuning.
Steps:
1. Set Up Environment: Install necessary libraries.
2. Prepare Data: Load and preprocess the dataset.
3. Apply QLoRA: Implement QLoRA to fine-tune the model.
4. Train the Model: Train the model using QLoRA.
5. Evaluate: Test the fine-tuned model.
Example:
python
# QLoRA needs dedicated library support; a sketch combining 4-bit quantization with LoRA follows below.
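A minimal QLoRA-style sketch, assuming the transformers, peft, and bitsandbytes libraries and a CUDA-capable GPU (names and hyperparameters are illustrative):
python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType

# Load the base model with 4-bit NF4 quantization to cut memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("gpt2", quantization_config=bnb_config)

# Prepare the quantized model for training, then attach LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=32,
                         lora_dropout=0.05, target_modules=["c_attn"])
model = get_peft_model(model, lora_config)

# Only the small LoRA adapters are trained; the 4-bit base weights stay frozen.
model.print_trainable_parameters()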
Resources:
● Quantization in Deep Learning
● LoRA Paper
f) LLM with Full Tuning
Full Tuning involves training all parameters of the language model on a specific dataset.
Steps:
1. Set Up Environment: Install necessary libraries.
2. Prepare Data: Load and preprocess the dataset.
3. Fine-Tune the Model: Train the entire model on the dataset.
4. Evaluate: Test the fine-tuned model.
Example:
python
import openai
import pandas as pd
import json

# Set your OpenAI API key
openai.api_key = 'your_openai_api_key'

# Load CSV Data
csv_file_path = 'your_data.csv'
df = pd.read_csv(csv_file_path)

# Prepare the Training Data
def prepare_training_data(df):
    data = []
    for i, row in df.iterrows():
        entry = {
            "prompt": row['Prompt'],
            "completion": " " + row['Completion']
        }
        data.append(entry)
    return data

training_data = prepare_training_data(df)

# Save to a JSONL file
jsonl_file_path = 'training_data.jsonl'
with open(jsonl_file_path, 'w') as outfile:
    for entry in training_data:
        json.dump(entry, outfile)
        outfile.write('\n')
# Upload Training Data
response = openai.File.create(
    file=open(jsonl_file_path),
    purpose='fine-tune'
)
file_id = response['id']

# Fine-Tune the Model
response = openai.FineTune.create(
    training_file=file_id,
    model="davinci"
)
fine_tune_id = response['id']
# Monitor the Fine-Tuning Process
import time
while True:
    status = openai.FineTune.retrieve(id=fine_tune_id)['status']
    print(f"Status: {status}")
    if status in ['succeeded', 'failed']:
        break
    time.sleep(30)

# Test the Fine-Tuned Model (the job exposes the model name once it succeeds)
fine_tuned_model = openai.FineTune.retrieve(id=fine_tune_id)['fine_tuned_model']
response = openai.Completion.create(
    model=fine_tuned_model,
    prompt="Explain the major causes of World War II.",
    max_tokens=150
)
print(response.choices[0].text.strip())
Resources:
● OpenAI Fine-tuning Guide
● Hugging Face Transformers
These examples provide a detailed overview of different fine-tuning and adaptation techniques for LLMs. Each method has its own use cases and advantages, and the choice of method depends on the specific requirements of your project.
10. Interview Questions
LLM Architecture
1. Question: Can you explain the general architecture of a large language model like GPT
or BERT? Answer: LLMs like GPT and BERT are built on the transformer architecture,
consisting of layers of self-attention mechanisms and feed-forward neural networks.
BERT uses an encoder-only architecture for bidirectional context, while GPT uses a
decoder-only architecture optimized for autoregressive tasks (i.e., generating text). Both
architectures involve tokenization, positional encoding, and multiple attention heads to
capture context over long sequences.
2. Question: How do LLMs handle long sequences of input text? Answer: LLMs use
self-attention mechanisms to capture dependencies between distant words in a text
sequence. Additionally, newer models incorporate optimizations like sparse attention,
reversible layers, and memory-efficient attention to handle longer sequences without
excessive computational costs.
3. Question: What role does positional encoding play in LLMs? Answer: Positional encoding is crucial in transformer models because, unlike RNNs or CNNs, transformers don't have inherent sequential order. Positional encoding provides information about the position of words in a sequence, allowing the model to understand the relative order of tokens (a sinusoidal encoding sketch follows this list).
4. Question: How do LLMs balance performance and memory efficiency? Answer: LLMs
balance performance and memory by using techniques like weight sharing, model
quantization, sparse attention, and checkpointing during training. These methods help
reduce the memory footprint while maintaining accuracy and performance in handling
large datasets and long sequences.
5. Question: What are some techniques for reducing the size of LLMs without sacrificing
performance? Answer: Techniques include knowledge distillation (training a smaller
"student" model to mimic the outputs of a larger "teacher" model), pruning (removing
unnecessary neurons or weights), quantization (using lower-precision numbers for
weights), and Low-Rank Adaptation (LoRA) during fine-tuning.
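To illustrate question 3 above, here is the classic sinusoidal positional encoding from the original transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), as a small NumPy sketch:
python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]      # shape (seq_len, 1)
    dims = np.arange(d_model)[None, :]           # shape (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding

pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16); each row is added to that position's token embedding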
Transformers
1. Question: What is the self-attention mechanism, and why is it important in transformers? Answer: The self-attention mechanism allows the model to weigh the importance of different words in a sentence, even those far apart. Each token "attends" to every other token in the input sequence, helping the model capture contextual relationships more effectively than traditional RNNs or CNNs. It is essential because it enables transformers to process sequences in parallel and handle long-range dependencies (a scaled dot-product attention sketch follows this list).
2. Question: How do transformers differ from traditional RNNs and CNNs? Answer:
Transformers do not process input sequentially, as RNNs do. Instead, they use
self-attention to capture relationships between tokens in parallel, making them highly
efficient for long sequences. CNNs are limited by their local receptive fields, while
transformers can capture global dependencies in the data.
3. Question: Can you explain multi-head attention and why it's beneficial? Answer:
Multi-head attention splits the input into multiple subspaces, allowing the model to focus
on different aspects of the sequence simultaneously. Each attention head can attend to
different parts of the sequence, which helps the model capture more nuanced
relationships between tokens.
4. Question: How does the transformer architecture scale, and what challenges come with
scaling? Answer: Transformer models scale by increasing the number of layers,
attention heads, and parameters. However, scaling brings challenges like increased
computational costs, memory usage, and the risk of overfitting. Efficient training
techniques like distributed computing, gradient checkpointing, and memory-efficient
attention are required to manage these issues.
5. Question: What is the role of feed-forward networks in transformers? Answer:
Feed-forward networks in transformers are applied independently to each token after the
attention mechanism. They consist of two fully connected layers with an activation
function in between, which allows the model to apply nonlinear transformations and
increase its capacity to capture complex patterns in the data.
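As a concrete companion to question 1 above, scaled dot-product attention computes Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch (single head, no masking):
python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)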
Optimization Techniques
1. Question: What is gradient descent, and why is it important for training LLMs? Answer:
Gradient descent is an optimization algorithm used to minimize the loss function during
training. It iteratively adjusts the model’s parameters based on the gradient of the loss
function with respect to the parameters. This process is crucial for making the model
learn from the data and improve its performance over time.
2. Question: How does Adam differ from SGD, and why is it commonly used for LLMs?
Answer: Adam (Adaptive Moment Estimation) is an optimization algorithm that
combines the benefits of both momentum (like in SGD with momentum) and adaptive
learning rates. Adam is preferred for LLMs because it adapts the learning rate for each
parameter, making it efficient for large models with sparse gradients.
3. Question: What is weight decay, and how does it help with training LLMs? Answer:
Weight decay is a regularization technique that penalizes large weights during training to
prevent overfitting. It helps the model generalize better to unseen data by discouraging
the learning of complex, unnecessary features.
4. Question: What is layer normalization, and how does it improve model training?
Answer: Layer normalization standardizes the inputs to each layer, which stabilizes
training and helps prevent issues like vanishing or exploding gradients. It improves the
speed and efficiency of training by ensuring that the model's activations remain within a
stable range.
5. Question: How do learning rate schedules impact the performance of LLMs? Answer:
Learning rate schedules dynamically adjust the learning rate during training. Starting with
a higher learning rate and gradually decreasing it (cosine decay or step decay) helps the
model learn faster initially and fine-tune the weights more precisely later on, improving
overall performance.
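As an example for question 5 above, a cosine-decay schedule in PyTorch might look like the following (a sketch; the model and hyperparameters are placeholders):
python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... run the forward/backward passes for this epoch ...
    optimizer.step()
    scheduler.step()  # decay the learning rate along a cosine curve
print(optimizer.param_groups[0]["lr"])  # near the minimum learning rate after decay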
Ethical Considerations
1. Question: What are some common ethical concerns when deploying generative AI
models? Answer: Ethical concerns include bias in the model’s outputs, misinformation,
privacy violations, and the potential misuse of generated content. Additionally, generative
models can produce harmful or offensive content, which raises concerns about
accountability and control in deployment environments.
2. Question: How do you address the problem of bias in large language models? Answer:
Bias can be mitigated through careful dataset selection, bias mitigation techniques
during training (e.g., adversarial training), and post-processing adjustments.
Transparency, fairness, and continuous monitoring during deployment are also crucial
steps to address bias.
3. Question: How can you ensure AI models respect privacy regulations like GDPR?
Answer: Respecting privacy regulations involves anonymizing sensitive data, ensuring
explicit user consent for data collection, and implementing model training techniques that
avoid storing personally identifiable information (PII). Federated learning and differential
privacy can also help in creating models that respect privacy.
4. Question: How can you ensure that generative AI models don’t contribute to
misinformation? Answer: To reduce the risk of misinformation, AI models should be
trained on high-quality, verified datasets and include human-in-the-loop oversight.
Additionally, models can be fine-tuned for fact-checking and verification tasks to identify
and reduce false information in outputs.
5. Question: What role does transparency play in AI governance, and how do you
implement it? Answer: Transparency ensures that AI models and their decision-making
processes are understandable and accountable. This can be implemented by providing
clear documentation, model cards, and explaining the rationale behind the model’s
predictions (using explainability techniques).
Deployment Strategies
1. Question: What are some challenges when deploying large language models in
production? Answer: Challenges include high computational costs, latency issues,
scaling to handle high traffic, ensuring model updates without downtime, managing
version control, and addressing ethical concerns such as bias or harmful outputs.
2. Question: How can you optimize the inference speed of LLMs during deployment?
Answer: Inference speed can be optimized by techniques such as model quantization,
using faster hardware like GPUs or TPUs, reducing model size with pruning or
distillation, and utilizing caching and batching for handling multiple requests efficiently.
3. Question: What are the differences between on-premise and cloud-based deployment
for LLMs? Answer: On-premise deployment offers more control over data privacy and
latency but requires significant hardware investment and maintenance. Cloud-based
deployment provides scalability, flexibility, and lower upfront costs but comes with
potential concerns around data privacy and dependency on third-party providers.
4. Question: How do you ensure continuous model improvement in production
environments? Answer: Continuous model improvement can be ensured by setting up a
feedback loop where user interactions are monitored for errors or misclassifications.
Retraining the model with updated or new data, along with A/B testing for performance
monitoring, also helps keep models up-to-date and accurate.
5. Question: What is edge deployment, and when is it preferable over cloud-based
deployment for LLMs? Answer: Edge deployment involves running AI models directly on
local devices (e.g., smartphones, IoT devices), reducing latency and dependency on
network connections. It is preferable for applications requiring real-time inference,
enhanced privacy, and low-latency responses, such as autonomous vehicles or smart
home devices.
Hugging Face
1. Question: How would you fine-tune a pre-trained model using Hugging Face’s
Transformers library?
Answer: Fine-tuning a pre-trained model with Hugging Face typically involves loading the pre-trained model and tokenizer using AutoModelForSequenceClassification and AutoTokenizer. You then prepare a custom dataset, format it using datasets.Dataset or DataLoader, and use the Trainer API to handle the training loop. The Trainer API allows for easy configuration of training parameters, evaluation metrics, and optimizer setup. During training, only the final layers are adjusted while the pre-trained layers are mostly retained. (A minimal Trainer sketch follows this list.)
2. Question: Can you explain the role of the Hugging Face Model Hub and how it
simplifies the process of working with LLMs?
Answer: The Hugging Face Model Hub serves as a repository where pre-trained models
and datasets are shared by the community. It simplifies the process by allowing users to
search, download, and use pre-trained models across many domains (e.g., NLP, vision)
without having to build models from scratch. It also enables easy sharing and version
control for custom models, and it integrates seamlessly with the transformers and
datasets libraries.
3. Question: How do you manage and version control different models and datasets in
Hugging Face?
Answer: Hugging Face offers Git-based version control for models and datasets. You
can create, push, and maintain different versions of your models on the Hub, ensuring
reproducibility and collaborative development. Hugging Face allows users to tag models
with specific versions and track changes, much like traditional software version control.
4. Question: What’s the difference between AutoModel and AutoTokenizer classes in
Hugging Face Transformers?
Answer: AutoModel refers to a class that automatically selects the correct model
architecture based on a pre-trained model checkpoint. AutoTokenizer, on the other
hand, handles the tokenization of the input text, converting it into a format that the model
can understand. Both classes offer a simplified way to load models and tokenizers for
different tasks (e.g., text classification, question answering) without specifying each
architecture explicitly.
5. Question: Can you walk us through the process of creating a custom dataset for training
an LLM in Hugging Face?
Answer: Creating a custom dataset for Hugging Face can be done by formatting the
data into JSON, CSV, or Pandas format. Using the datasets library, you can load the
dataset with the load_dataset function. You can further preprocess, tokenize, and
split the dataset into training, validation, and test sets. Custom data can also be
uploaded to the Hugging Face Hub for public use or personal experiments.
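To accompany question 1 above, here is a minimal fine-tuning sketch with the Trainer API (assumes pip install transformers datasets; the model and dataset names are illustrative choices):
python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Load and tokenize a small sentiment dataset (IMDB used here for illustration)
dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)
tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small subset
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()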
OpenAI
1. Question: How does OpenAI’s GPT model handle generating responses when no
fine-tuning has been applied?
Answer: GPT models are trained with a general understanding of language through
large-scale pretraining. Even without fine-tuning, GPT models generate responses by
relying on their pre-trained knowledge and patterns learned during training. They
leverage the input prompt to generate contextually relevant text by predicting the next
word based on the tokens seen so far. These models are typically capable of performing
general tasks like summarization, translation, or conversation without specific task
training.
2. Question: Explain OpenAI’s approach to aligning large language models with human
values (Reinforcement Learning from Human Feedback - RLHF).
Answer: Reinforcement Learning from Human Feedback (RLHF) is a technique used by
OpenAI to align the behavior of LLMs with human preferences. It involves training the
model on a dataset of human-labeled responses where humans rate or rank model
outputs. The feedback is used to reward desirable behavior and penalize undesirable
behavior, thus guiding the model to produce outputs that are more aligned with human
expectations and values.
3. Question: What are some use cases where OpenAI’s GPT models can be directly
integrated into applications?
Answer: GPT models can be integrated into a variety of applications such as customer
service chatbots, automated content generation (e.g., blog writing, social media posts),
virtual assistants, language translation, summarization tools, and even code generation
for developers. They can also be used for answering complex queries, drafting emails,
and automating workflows in businesses.
4. Question: How does OpenAI’s API pricing model work, and how can you optimize costs
when deploying LLMs?
Answer: OpenAI’s API pricing is generally based on the number of tokens processed
during requests. To optimize costs, you can reduce the length of prompts and responses,
use lower-capacity models for simpler tasks (e.g., GPT-3.5 instead of GPT-4), and cache
frequently used results. Batching requests and applying temperature or frequency
controls can also reduce unnecessary token usage.
5. Question: How would you fine-tune an OpenAI model for a specific task like legal
document summarization?
Answer: Not all OpenAI models are available for fine-tuning, so task-specific optimization is often achieved by carefully crafting prompts (prompt engineering) for legal document summarization. You can use a few-shot learning approach where examples of summarization are included in the prompt, guiding the model to output summaries in the required format. Additionally, you could build a pipeline to preprocess legal text before feeding it into the model.
LangChain
1. Question: What is LangChain, and how does it extend the capabilities of large language
models?
Answer: LangChain is a framework for building applications that use large language
models (LLMs) in more complex, interactive, and contextual ways. It allows developers
to connect LLMs with external data sources, build multi-step chains, and maintain
memory across conversations, enabling sophisticated applications like chatbots, agents,
and automated reasoning systems.
2. Question: Can you explain the concept of “chains” in LangChain and how they help in
building complex workflows for AI models?
Answer: In LangChain, “chains” are sequences of linked operations that guide an LLM
through multiple steps of a task. A chain could include steps like querying a database,
processing an API call, performing text generation, or retrieving information. By
combining these steps, developers can build workflows where each stage refines the
output based on the previous one, creating more advanced interactions and
decision-making capabilities.
3. Question: How would you use LangChain to integrate external data sources like APIs or
databases into a language model workflow?
Answer: LangChain allows the integration of external data sources by creating specific
“chains” or modules that can query APIs or databases during the execution of the
workflow. For example, you could use a SQL chain to retrieve information from a
database or an API chain to call external APIs. This data can then be fed into the LLM to
generate more contextually relevant responses.
4. Question: What is the role of memory in LangChain, and how does it help maintain
context across interactions with LLMs?
Answer: Memory in LangChain allows the model to remember and maintain context
over multiple interactions or conversations. Instead of treating each interaction as
independent, memory helps the model retain information from previous steps or
exchanges, making it suitable for conversational agents or chatbots that need to
reference past interactions.
5. Question: How does LangChain support different types of tasks like summarization,
question answering, and chatbots?
Answer: LangChain provides task-specific modules for different types of operations. For
example, it has ready-made chains for summarization, question answering, and
document retrieval. It also supports custom task chains that can be combined with other
data-processing steps to perform more specialized functions like chatbot creation or
real-time decision-making.
Fine-Tuning
1. Question: What are the main steps involved in fine-tuning a pre-trained model for a
specific task?
Answer: The main steps for fine-tuning a pre-trained model are: (1) Select a pre-trained
model relevant to the task, (2) Prepare and preprocess a task-specific dataset, (3)
Freeze some layers of the pre-trained model (optional, for efficiency), (4) Fine-tune the
remaining layers by adjusting hyperparameters (e.g., learning rate, batch size), and (5)
Validate the model on a held-out dataset to ensure generalization.
2. Question: How would you decide whether to fine-tune a model or use it out of the box
for your application?
Answer: The decision depends on the specificity of the task and the available data. For
generic tasks, using a pre-trained model without fine-tuning is often sufficient. However,
if the task requires domain-specific knowledge (e.g., legal, medical), or if the
out-of-the-box performance is not satisfactory, fine-tuning with a relevant dataset is
necessary to tailor the model for your application.
3. Question: Can you explain the difference between task-specific fine-tuning and
domain-specific fine-tuning?
Answer: Task-specific fine-tuning involves adjusting the model to perform a particular
task, like classification, summarization, or translation. Domain-specific fine-tuning, on the
other hand, involves adapting the model to a specialized domain (e.g., finance,
healthcare) by training it on data that includes the terminology and nuances of that
domain, enabling better performance for tasks within that field.
4. Question: How does the choice of dataset impact the effectiveness of fine-tuning an LLM?
Answer: The dataset's quality, size, and relevance to the target task/domain are critical. A high-quality, task-specific dataset helps the model generalize well to the intended task, while noisy, biased, or off-domain data can degrade performance and introduce unwanted behavior.
Interview Questions for Practice
Generative AI (Gen AI) and Large Language Models (LLM)
1. Can you explain the difference between generative AI and traditional machine learning
models?
2. How does a large language model (LLM) work, and what makes it different from other
types of neural networks?
3. What are the challenges of deploying generative AI models in production environments?
4. Describe how transformers have revolutionized NLP and why they are key to the
success of LLMs.
5. How would you evaluate the performance of a generative language model, beyond
simple accuracy metrics?
6. What is "attention" in the context of transformers, and how does it contribute to a model's
ability to understand context?
7. Can you explain the difference between zero-shot, one-shot, and few-shot learning, and
how LLMs use them?
8. What are some common ethical concerns surrounding the use of generative AI in
content creation?
9. How does temperature affect the output of generative language models?
10. How do LLMs handle long-range dependencies in text, and why is this important for text
generation?
Hugging Face
1. How would you fine-tune a pre-trained model using Hugging Face’s Transformers
library?
2. Can you explain the role of the Hugging Face Model Hub and how it simplifies the
process of working with LLMs?
3. How do you manage and version control different models and datasets in Hugging
Face?
4. What’s the difference between AutoModel and AutoTokenizer classes in Hugging
Face Transformers?
5. Can you walk us through the process of creating a custom dataset for training an LLM in
Hugging Face?
6. How would you evaluate model performance using Hugging Face’s datasets library?
7. What are some best practices for sharing models on Hugging Face’s model repository?
8. Can you explain how Hugging Face’s accelerate library helps in speeding up model
training and inference?
9. What are the key differences between Hugging Face’s Trainer API and writing custom
training loops?
10. How would you deploy a Hugging Face model on AWS or another cloud platform?
OpenAI
1. How does OpenAI’s GPT model handle generating responses when no fine-tuning has
been applied?
2. Explain OpenAI’s approach to aligning large language models with human values
(Reinforcement Learning from Human Feedback - RLHF).
3. What are some use cases where OpenAI’s GPT models can be directly integrated into
applications?
4. How does OpenAI’s API pricing model work, and how can you optimize costs when
deploying LLMs?
5. What are the steps involved in using OpenAI’s GPT-4 for generating content specific to a
niche domain?
6. How would you fine-tune an OpenAI model for a specific task like legal document
summarization?
7. What are some security concerns when integrating OpenAI’s API into a production
system?
8. How does OpenAI handle tokenization, and what are the trade-offs of its token-based
pricing?
9. How does OpenAI ensure that the data used in pre-training their models remains ethical
and unbiased?
10. Can you explain the importance of API rate limits in OpenAI’s products and how you
would handle them in a large-scale deployment?
LangChain
1. What is LangChain, and how does it extend the capabilities of large language models?
2. Can you explain the concept of “chains” in LangChain and how they help in building
complex workflows for AI models?
3. How would you use LangChain to integrate external data sources like APIs or databases
into a language model workflow?
4. What is the role of memory in LangChain, and how does it help maintain context across
interactions with LLMs?
5. Can you describe a scenario where you would use LangChain to create a multi-step
conversation with a language model?
6. How does LangChain support different types of tasks like summarization, question
answering, and chatbots?
7. Can you explain how LangChain interacts with different LLM providers, such as OpenAI
and Hugging Face, in the same workflow?
8. What are the advantages of using LangChain over directly interacting with an LLM API?
9. How would you design a LangChain pipeline for a customer support chatbot that
retrieves answers from a knowledge base?
10. Can you walk us through an example of using LangChain for text generation based on
real-time financial data?
Fine-Tuning
1. What are the main steps involved in fine-tuning a pre-trained model for a specific task?
2. How would you decide whether to fine-tune a model or use it out of the box for your
application?
3. Can you explain the difference between task-specific fine-tuning and domain-specific
fine-tuning?
4. What are some of the common challenges when fine-tuning a large language model,
and how can they be mitigated?
5. How does the choice of dataset impact the effectiveness of fine-tuning an LLM?
6. Can you explain the concept of Low-Rank Adaptation (LoRA) and its role in fine-tuning
large models?
7. How do you handle overfitting when fine-tuning a model on a relatively small dataset?
8. What are some strategies to reduce computational cost during fine-tuning without
sacrificing model performance?
9. How do you fine-tune a model for multilingual tasks, and what are the key considerations
in this process?
10. Can you describe how fine-tuning might affect the ethical considerations surrounding the
deployment of a large language model?
AI Governance
1. What is AI governance, and why is it critical in today’s AI development landscape?
2. How would you address the challenges of AI transparency and explainability in a
black-box model like GPT?
3. What role does data governance play in ensuring the ethical use of AI models?
4. How do you ensure fairness and mitigate bias in AI models during development and
deployment?
5. What are the key components of an effective AI governance framework within an
organization?
6. Can you explain how privacy concerns are handled in AI systems that process sensitive
data?
7. How do regulations like GDPR affect the way AI models are trained and deployed,
especially when using user-generated data?
8. What strategies can be used to ensure that AI models remain aligned with ethical
principles and societal values?
9. How would you measure and assess the risks posed by deploying generative AI models
in public-facing applications?
10. What steps would you take to implement an AI governance policy that addresses both
ethical concerns and operational efficiency?
LLM FineTuning Code Samples:
Usecase: LangChain-OpenAI-RAG
This example loads the latest Winter Olympics medal data from a PDF so you can chat with the document.
Note: This code was verified in Google Colab - Oct 7th 2024
Google Colab link:
https://colab.research.google.com/drive/1N3a1hEWuDDlUw6bliKHVM4l1IARPX2iB
# Install necessary libraries
!pip install langchain
!pip install openai
!pip install PyPDF2
!pip install faiss-cpu
!pip install tiktoken
!pip install -U langchain-community
!pip install pypdf
"""
# try below prompts
You: who is youngest medalist
Assistant: Scott ALLEN from the USA, who won a bronze medal in figure
skating at the age of 14 years and 363 days in 1964.
You: wrong
Assistant:
The youngest medalist in an individual event is Scott ALLEN (USA), who won
a bronze medal in figure skating in 1964 at the age of 14 years and 363
days.
You: do you find Kim Yun
Assistant: Yes, Yun-Mi KIM (KOR) is the youngest gold medallist in an
individual event, winning the short-track speed skating 3,000m relay in
1994 at the age of 13 years and 85 days. She also won a bronze medal in the
same event at the 1998 Winter Olympics.
You: then why you told wrong
Assistant: I mentioned Scott Allen because he is the youngest medalist in
an individual event for men. Yun-Mi Kim is the youngest medalist in an
individual event for women.
You: do you see Dimitrios Loundras
Assistant: Scott ALLEN (USA), bronze medallist in figure skating in 1964,
aged 14 years and 363 days.
You: do you know Dimitrios Loundras as medalist
Assistant: Dimitrios Loundras is not mentioned in this context, so it is
not possible to determine who he is or if he won a medal.
"""
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
#from openai import OpenAI # for calling the OpenAI API
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import LLMChain # Import LLMChain
from langchain.prompts import PromptTemplate # Import PromptTemplate
GPT_MODEL = "gpt-3.5-turbo"
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY')
#client = OpenAI(api_key=api_key)
# 1. Download the PDF to your local machine
#!wget https://stillmed.olympics.com/media/Documents/Olympic-Games/Factsheets/Records-of-medals-at-the-Olympic-Winter-Games.pdf

# 2. Load the PDF Document from the local file
pdf_loader = PyPDFLoader("sample_data/Records-of-medals-at-the-Olympic-Winter-Games.pdf")  # Load from the downloaded file
documents = pdf_loader.load()
# 3. Split the document into chunks
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
document_chunks = splitter.split_documents(documents)
# 4. Generate embeddings for the document chunks
embeddings = OpenAIEmbeddings(openai_api_key=api_key)  # Use the api_key variable here
vector_store = FAISS.from_documents(document_chunks, embeddings)

# 5. Set up the memory for conversation history
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# 6. Create a Conversational Retrieval Chain using OpenAI as the LLM
llm = OpenAI(openai_api_key=api_key)  # Use the api_key variable here
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

# Create a combine_docs_chain
chain = load_qa_chain(llm=llm, chain_type="stuff")  # Create a default QA chain
# Define a question generation chain
template = """Given the following conversation and a follow up question,
rephrase the follow up question to be a standalone question.
Chat History: {chat_history}
Follow Up Input: {question}
Standalone Question:"""
prompt_template = PromptTemplate(
    input_variables=["chat_history", "question"], template=template
)
question_generator = LLMChain(llm=llm, prompt=prompt_template)

# Conversational chain that uses the LLM and document retrieval
conversational_chain = ConversationalRetrievalChain(
    retriever=retriever,
    combine_docs_chain=chain,               # Pass the combine_docs_chain
    memory=memory,
    question_generator=question_generator   # Pass the question_generator
)
# 7. Start a conversation with the PDF
print("Ask your question about the PDF!")
while True:
    query = input("You: ")
    if query.lower() == "exit":
        print("Ending the chat!")
        break
    response = conversational_chain({"question": query})
    print(f"Assistant: {response['answer']}")
AI Evaluation Metrics
1. Classification Metrics:
These are used when the model predicts discrete labels or categories.
● Accuracy: The percentage of correct predictions out of total predictions. Best suited for
balanced datasets.
● Precision: The ratio of true positives to the sum of true positives and false positives.
Focuses on the quality of positive predictions.
● Recall (Sensitivity): The ratio of true positives to the sum of true positives and false
negatives. Focuses on the ability to capture all positive cases.
● F1 Score: Harmonic mean of precision and recall. Useful when the balance between
precision and recall is important.
● ROC-AUC (Receiver Operating Characteristic - Area Under Curve): Measures the
ability of the model to distinguish between classes. AUC = 1 represents a perfect model,
while AUC = 0.5 represents a random model.
● Confusion Matrix: Provides a breakdown of actual vs. predicted classifications, showing
true positives, false positives, true negatives, and false negatives.
● Log Loss (Cross-Entropy Loss): Penalizes incorrect classifications by the predicted
probability assigned to each class, providing insight into the confidence of the model's
predictions.
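A quick scikit-learn illustration of several of these classification metrics (assumes pip install scikit-learn; the labels are toy data):
python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, log_loss)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]  # predicted P(class = 1)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("Log loss :", log_loss(y_true, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))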
2. Regression Metrics:
For models that predict continuous values.
● Mean Squared Error (MSE): Measures the average of the squares of the errors.
Penalizes larger errors more than smaller ones.
● Root Mean Squared Error (RMSE): The square root of MSE, interpretable in the same
units as the predicted values.
● Mean Absolute Error (MAE): Measures the average absolute difference between
predicted and actual values. More robust to outliers than MSE.
● R² (Coefficient of Determination): Indicates the proportion of the variance in the
dependent variable that is predictable from the independent variables. Values closer to 1
indicate a better fit.
● Adjusted R²: Modified version of R² that adjusts for the number of predictors in the
model, helping to avoid overfitting.
● Mean Absolute Percentage Error (MAPE): Measures the percentage error between
predicted and actual values. Useful for comparing models in terms of relative error.
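The corresponding regression metrics in scikit-learn, again with toy values:
python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])

mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))
print("MAE :", mean_absolute_error(y_true, y_pred))
print("R2  :", r2_score(y_true, y_pred))
print("MAPE:", np.mean(np.abs((y_true - y_pred) / y_true)) * 100, "%")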
3. Natural Language Processing (NLP) Metrics:
For tasks like text generation, question answering, and classification.
● BLEU (Bilingual Evaluation Understudy): Evaluates the accuracy of
machine-generated text by comparing it to reference texts based on n-gram overlap.
● ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Measures recall-oriented
n-gram overlap between model-generated text and reference text, particularly useful for
summarization tasks.
● Perplexity: Often used in language modeling, perplexity is a measure of how well a
probability model predicts a sample. Lower perplexity indicates better performance.
● Exact Match (EM): Common in question answering tasks, it measures whether the
predicted answer matches the ground truth exactly.
● Word Error Rate (WER): Measures substitutions, insertions, and deletions in
speech-to-text predictions as a fraction of the words in the reference transcript. Lower
WER indicates better accuracy.
● BERTScore: Uses embeddings from transformer models like BERT to compute the
similarity between generated text and reference text.
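To make BLEU and perplexity concrete, here is a small sketch using NLTK's BLEU implementation and the standard perplexity formula; the sentences and token log-probabilities are invented for illustration:

import math
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]   # reference translation(s)
candidate = ["the", "cat", "is", "on", "the", "mat"]      # model output

smooth = SmoothingFunction().method1   # avoids zero scores on short sentences
print("BLEU:", sentence_bleu(reference, candidate, smoothing_function=smooth))

# Perplexity is exp(average negative log-probability per token)
token_log_probs = [-1.2, -0.4, -2.1, -0.7]   # hypothetical log-probs from a model
print("Perplexity:", math.exp(-sum(token_log_probs) / len(token_log_probs)))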
4. Clustering Metrics:
For unsupervised learning tasks like clustering.
● Silhouette Score: Measures how similar a data point is to its own cluster compared to
other clusters. Ranges from -1 to 1, with higher values indicating better-defined clusters.
● Adjusted Rand Index (ARI): Compares the similarity between two clusterings by
considering all pairs of samples and counting pairs that are assigned in the same or
different clusters in both clusterings.
● Davies-Bouldin Index: Measures the average similarity ratio of each cluster with its
most similar cluster. Lower values indicate better clustering.
● Homogeneity Score: Measures whether each cluster contains only members of a single
class.
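These clustering metrics are also available in scikit-learn; the sketch below clusters synthetic blob data, so the exact scores are illustrative only:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, adjusted_rand_score, davies_bouldin_score

X, y_true = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

print("Silhouette     :", silhouette_score(X, labels))          # higher is better
print("Adjusted Rand  :", adjusted_rand_score(y_true, labels))  # needs ground-truth labels
print("Davies-Bouldin :", davies_bouldin_score(X, labels))      # lower is better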
5. Ranking Metrics:
Used in tasks such as information retrieval and recommendation systems.
● Mean Reciprocal Rank (MRR): Averages the reciprocal rank of the first relevant item
across queries; higher values mean relevant items appear earlier in the ranked list.
● Normalized Discounted Cumulative Gain (nDCG): Measures the usefulness, or gain,
of an item based on its position in the result list, rewarding higher ranks more than lower
ranks.
● Hit Rate (HR): Measures the percentage of times the ground truth item is present in the
top-K recommendations.
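A small sketch of MRR (computed by hand) and nDCG (via scikit-learn's ndcg_score), with invented relevance data:

import numpy as np
from sklearn.metrics import ndcg_score

# MRR: reciprocal rank of the first relevant item, averaged over queries
# (each hypothetical query below has exactly one relevant item)
ranked_hits = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print("MRR :", np.mean([1.0 / (row.index(1) + 1) for row in ranked_hits]))

# nDCG: compares predicted scores against graded true relevance
true_relevance = np.array([[3, 2, 0, 1]])
predicted_scores = np.array([[0.9, 0.8, 0.3, 0.5]])
print("nDCG:", ndcg_score(true_relevance, predicted_scores))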
6. Advanced Metrics:
For deep learning models, complex tasks, and more nuanced model evaluations.
● Precision-Recall AUC: Similar to ROC-AUC but more informative in cases of
imbalanced datasets, showing trade-offs between precision and recall.
● Brier Score: Measures the accuracy of probabilistic predictions. Lower values indicate
better probabilistic predictions.
● Expected Calibration Error (ECE): Measures how well predicted probabilities align with
actual outcomes.
● Shapley Values (SHAP): Used for model explainability by measuring the contribution of
each feature to the prediction of individual instances.
● Fisher Information Matrix (FIM): Measures how much information the observed data
carries about each model parameter, often used in reinforcement learning and
meta-learning.
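Brier score is available directly in scikit-learn, while a simple binned ECE can be computed by hand. The sketch below uses one common equal-width-bin variant based on the positive-class probability, with hypothetical predictions:

import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.8, 0.6, 0.3, 0.9])   # predicted P(class = 1)

print("Brier score:", brier_score_loss(y_true, y_prob))   # lower is better

# Expected Calibration Error with 5 equal-width bins (one common formulation)
bins = np.linspace(0.0, 1.0, 6)
ece = 0.0
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (y_prob >= lo) & (y_prob < hi)
    if mask.any():
        gap = abs(y_prob[mask].mean() - y_true[mask].mean())
        ece += mask.mean() * gap   # weight the gap by the bin's sample fraction
print("ECE:", ece)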
7. Multiclass and Multilabel Metrics:
For problems with more than two labels or where multiple labels can be assigned to a single
instance.
● Macro-Averaged Precision/Recall/F1: Averages the metric across all classes without
considering the proportion of each class.
● Micro-Averaged Precision/Recall/F1: Computes the metric globally by pooling the total
true positives, false positives, and false negatives across all classes.
● Hamming Loss: The fraction of labels that are incorrectly predicted in multilabel
classification tasks.
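In scikit-learn the averaging mode is a parameter, and Hamming loss works on multilabel indicator arrays; the labels below are made up for illustration:

from sklearn.metrics import f1_score, hamming_loss

y_true = [0, 2, 1, 2, 0, 1]
y_pred = [0, 1, 1, 2, 0, 2]

print("Macro F1:", f1_score(y_true, y_pred, average="macro"))  # unweighted mean over classes
print("Micro F1:", f1_score(y_true, y_pred, average="micro"))  # pooled global counts

# Hamming loss on a multilabel problem (rows = instances, columns = labels)
y_true_ml = [[1, 0, 1], [0, 1, 0]]
y_pred_ml = [[1, 0, 0], [0, 1, 1]]
print("Hamming loss:", hamming_loss(y_true_ml, y_pred_ml))     # fraction of wrong labels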
8. Fairness and Bias Metrics:
To ensure AI models perform equitably across demographic groups.
● Demographic Parity: Measures if the model's predictions are independent of a
protected attribute (like gender or race).
● Equalized Odds: Measures whether a model's false positive and true positive rates are
equal across groups.
● Disparate Impact: Evaluates whether a protected group is adversely affected by the
model's decisions.
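These group-level checks reduce to comparing prediction rates across a protected attribute; a minimal sketch with hypothetical data (no special library required):

import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])                 # model decisions (hypothetical)
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # protected attribute

rate_a = y_pred[group == "A"].mean()   # positive-prediction rate for group A
rate_b = y_pred[group == "B"].mean()   # positive-prediction rate for group B

print("Demographic parity gap:", abs(rate_a - rate_b))   # 0 means parity
print("Disparate impact ratio:", rate_b / rate_a)        # often flagged if below 0.8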
Conclusion
To refine your AI model evaluation, choose metrics that are most aligned with the task, goal
(e.g., accuracy vs. interpretability), and data type. For instance:
● NLP tasks may rely heavily on metrics like BLEU or ROUGE.
● Fairness metrics are critical in socially sensitive applications.
● Advanced AI applications can use SHAP values or expected calibration error for
deeper insights into model performance and reliability.
Appendix A: External References
PDFs:
Machine learning
● Cambridge machine learning: https://alex.smola.org/drafts/thebook.pdf
● ML from Theory to Algorithms: https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
● O'Reilly: https://www.nrigroupindia.com/e-book/Introduction%20to%20Machine%20Learning%20with%20Python%20(%20PDFDrive.com%20)-min.pdf
● Pattern Recognition and Machine Learning: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
● Machine learning: https://www.cin.ufpe.br/~cavmj/Machine%20-%20Learning%20-%20Tom%20Mitchell.pdf
● Machine Learning lecture notes: https://mrcet.com/downloads/digital_notes/CSE/IV%20Year/MACHINE%20LEARNING(R17A0534).pdf
● Stanford ML book: https://ai.stanford.edu/~nilsson/MLBOOK.pdf
● Data science & statistics: https://people.smp.uq.edu.au/DirkKroese/DSML/DSML.pdf
● Fundamentals of ML: https://www.hlevkin.com/hlevkin/45MachineDeepLearning/ML/Foundations_of_Machine_Learning.pdf
● ML for beginners: https://bmansoori.ir/book/Machine%20Learning%20For%20Absolute%20Beginners.pdf
● ML lectures: https://www.seas.upenn.edu/~cis5190/fall2017/lectures/01_introduction.pdf
● ML basics: https://courses.edx.org/asset-v1:ColumbiaX+CSMM.101x+1T2017+type@asset+block@AI_edx_ml_5.1intro.pdf
● Harvard UG book: https://harvard-ml-courses.github.io/cs181-web/static/cs181-textbook.pdf
● Deep Learning: https://fleuret.org/public/lbdl.pdf
● Hundred-page ML book: http://ema.cri-info.cm/wp-content/uploads/2019/07/2019BurkovTheHundred-pageMachineLearning.pdf
● Fundamentals of ML: https://www.interactions.com/wp-content/uploads/2017/06/machine_learning_wp-5.pdf
● A Course in Machine Learning
● Advanced Machine Learning with Python
● Big Data, Data Mining, and Machine Learning
● Building Intelligent Systems - A Guide to Machine Learning Engineering
● Building Machine Learning Systems with Python - Second Edition
● Designing Machine Learning Systems with Python
● Introduction to Machine Learning with Python
● Introduction To Python Programming - Beginner's Guide To Computer Programming And Machine Learning
● Large Scale Machine Learning with Python
● Large Scale Machine Learning with Spark
● Learning Generative Adversarial Networks
● Learning NumPy Array
● Learning scikit-learn - Machine Learning in Python
● Machine Learning - Hands-On for Developers and Technical Professionals
● Machine Learning - Jason Bell
● Machine Learning for Developers
● Machine Learning for Email
● Machine Learning for Hackers
● Machine Learning for the Web
● Machine Learning in Action (Chinese edition)
● Machine Learning in Action
● Machine Learning in Java
● Machine Learning Projects for .NET Developers
● Machine Learning Using C# Succinctly
● Machine Learning with Spark
● Mastering .NET Machine Learning
● Mastering Machine Learning with Python in Six Steps
● Mastering Machine Learning with scikit-learn - Second Edition
● Microsoft Azure Machine Learning
● Neural Network Programming with Java
● Neural Networks Using C# Succinctly
● Practical Machine Learning with H2O - Powerful, Scalable Techniques for Deep Learning and AI
● Practical Machine Learning
● Practical Reinforcement Learning
● Python - Deeper Insights into Machine Learning
● Python for Probability, Statistics, and Machine Learning
● Python Machine Learning Blueprints
● Python Machine Learning By Example
● Python Machine Learning Case Studies
● Python Machine Learning Cookbook - Early Release
● Python Machine Learning Cookbook
● Python Machine Learning
● Python Real World Machine Learning
● Quantum Machine Learning - Peter Wittek
● Real-World Machine Learning
● Reinforcement Learning - With Open AI, TensorFlow and Keras Using Python
● scikit-learn Cookbook - Second Edition
● Thoughtful Machine Learning with Python A Test-Driven Approach
● Thoughtful Machine Learning with Python
● Using Python to Develop Analytics, Control and Machine Learning Products
● What You Need to Know about Machine Learning
● What You Need to Know about R
Gen AI security: https://arxiv.org/pdf/2405.12750
LLM and Gen AI: https://publications.parliament.uk/pa/ld5804/ldselect/ldcomm/54/54.pdf
Gen AI risks: https://arxiv.org/pdf/2406.04734
LLM and GPT: https://www.american-cse.org/csce2023-ieee/pdfs/CSCE2023-5LlpKs7cpb4k2UysbLCuOx/275900a383/275900a383.pdf
Code:
Hugging Face:
https://github.com/huggingface
OpenAI:
https://platform.openai.com/docs/examples
https://github.com/openai/openai-cookbook/tree/main/examples
Langchain:
https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/examples/
Transformer notebooks:
https://github.com/sukhitashvili/transformer_notebooks
Blogs
https://www.vellum.ai/llm-leaderboard#cost-context
Articles:
LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024
97

More Related Content

PPTX
[DSC Europe 23] Ivan Petrovic - Approach to Architecting Generative AI Solutions
PDF
Brief History and Overview of LLM Agents
PDF
Build with AI on Google Cloud Session #1
PDF
Master LLMs with LangChain -the basics of LLM
PPTX
Agentic RAG and Small & Specialized Models v1.6.pptx
PDF
Model Context Protocol (MCP): The Future of AI | Bluebash
PDF
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
PDF
LLM with Java.pdf
[DSC Europe 23] Ivan Petrovic - Approach to Architecting Generative AI Solutions
Brief History and Overview of LLM Agents
Build with AI on Google Cloud Session #1
Master LLMs with LangChain -the basics of LLM
Agentic RAG and Small & Specialized Models v1.6.pptx
Model Context Protocol (MCP): The Future of AI | Bluebash
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
LLM with Java.pdf

Similar to LLMs and GenAI Simplified_ An Easy Path to Understanding [V10252024].pdf (20)

PPTX
Understanding Machine Learning --- Chapter 2.pptx
PDF
Wall Street Mastermind Sector Spotlight - Technology (October 2023).pdf
PDF
Overview of Artificial Intelligence - Technology
PDF
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
PPTX
Generative AI and Large Language Models (LLMs)
PPTX
GenAIGenAIGenAIGenAIGenAIGenAIGenAI.pptx
PPTX
An Introduction to AI LLMs & SharePoint For Champions and Super Users Part 1
PPTX
The Beginner's Guide To Large Language Models
PDF
The Significance of Large Language Models (LLMs) in Generative AI2.pdf
PDF
Quick Overview of the Top 9 Popular LLMs.pdf
PPTX
Cold_Email_Generator_using_LLM_APIS.pptx
PPTX
Large Language Models (LLMs) part one.pptx
PDF
Enhancing SEO Content Writing with AI: Opportunities & Challenges
PDF
Large Language Models, Data & APIs - Integrating Generative AI Power into you...
PDF
Using Generative AI to better understand B2B audiences: From Topic Modelling ...
PPTX
OPEN SOURCE MODELS IN ARTIFICIAL INTELLIGENCE
PDF
Large Language Models Bootcamp
PDF
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
PDF
The A-to-Z Blueprint for AI Mastery Ebook.pdf
PDF
BUILDING Q&A EDUCATIONAL APPLICATIONS WITH LLMS - MARCH 2024.pdf
Understanding Machine Learning --- Chapter 2.pptx
Wall Street Mastermind Sector Spotlight - Technology (October 2023).pdf
Overview of Artificial Intelligence - Technology
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Generative AI and Large Language Models (LLMs)
GenAIGenAIGenAIGenAIGenAIGenAIGenAI.pptx
An Introduction to AI LLMs & SharePoint For Champions and Super Users Part 1
The Beginner's Guide To Large Language Models
The Significance of Large Language Models (LLMs) in Generative AI2.pdf
Quick Overview of the Top 9 Popular LLMs.pdf
Cold_Email_Generator_using_LLM_APIS.pptx
Large Language Models (LLMs) part one.pptx
Enhancing SEO Content Writing with AI: Opportunities & Challenges
Large Language Models, Data & APIs - Integrating Generative AI Power into you...
Using Generative AI to better understand B2B audiences: From Topic Modelling ...
OPEN SOURCE MODELS IN ARTIFICIAL INTELLIGENCE
Large Language Models Bootcamp
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
The A-to-Z Blueprint for AI Mastery Ebook.pdf
BUILDING Q&A EDUCATIONAL APPLICATIONS WITH LLMS - MARCH 2024.pdf
Ad

Recently uploaded (20)

PDF
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
PDF
Empowerment Technology for Senior High School Guide
PDF
HVAC Specification 2024 according to central public works department
PDF
Chinmaya Tiranga quiz Grand Finale.pdf
PDF
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
PDF
AI-driven educational solutions for real-life interventions in the Philippine...
PPTX
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
PPTX
Chinmaya Tiranga Azadi Quiz (Class 7-8 )
PPTX
Computer Architecture Input Output Memory.pptx
PPTX
Share_Module_2_Power_conflict_and_negotiation.pptx
PPTX
Unit 4 Computer Architecture Multicore Processor.pptx
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PDF
1_English_Language_Set_2.pdf probationary
PDF
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
PDF
Weekly quiz Compilation Jan -July 25.pdf
PDF
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
PPTX
Virtual and Augmented Reality in Current Scenario
PDF
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
PDF
My India Quiz Book_20210205121199924.pdf
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
Empowerment Technology for Senior High School Guide
HVAC Specification 2024 according to central public works department
Chinmaya Tiranga quiz Grand Finale.pdf
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
AI-driven educational solutions for real-life interventions in the Philippine...
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
Chinmaya Tiranga Azadi Quiz (Class 7-8 )
Computer Architecture Input Output Memory.pptx
Share_Module_2_Power_conflict_and_negotiation.pptx
Unit 4 Computer Architecture Multicore Processor.pptx
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
1_English_Language_Set_2.pdf probationary
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
Weekly quiz Compilation Jan -July 25.pdf
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
Virtual and Augmented Reality in Current Scenario
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
My India Quiz Book_20210205121199924.pdf
Ad

LLMs and GenAI Simplified_ An Easy Path to Understanding [V10252024].pdf

  • 2. LLMs and GenAI Simplified: An Easy Path to Understanding LLMs Simplified: An Easy Path to Understanding [DRAFT] 6 Definitions 6 Background 17 1. Introduction to Large Language Models (LLMs) 17 2. LLM Architecture 17 3. Applications of LLMs 17 4. LLM Performance Benchmarks 18 5. Governance, Ethics, and Responsible AI 18 6. Challenges and Future Directions 19 Conclusion 19 CHAPTER -0 : FUNDAMENTALS 19 Overview: 19 Step 1: Set up the Neural Network Structure 20 Step 2: Initialize Weights and Biases 20 Step 3: Forward Propagation 20 Step 4: Calculate Loss (Error) 21 Step 5: Backpropagation 21 Step 6: Repeat the Process 21 Simple Example: 22 Summary: 22 1. Neural Network Models 24 2. Activation Functions 24 3. Loss Functions 25 4. Optimizers 25 5. Metrics 26 Putting It All Together: 26 1. Sequential Model 27 2. Functional API 27 3. Subclassing Model 28 4. Model with Shared Layers 29 5. Multi-Input and Multi-Output Models 29 6. Autoencoders 30 7. GANs (Generative Adversarial Networks) 31 Summary 32 1. Mean Squared Error (MSE) 32 2. Mean Absolute Error (MAE) 32 3. Binary Cross-Entropy (Log Loss) 33 4. Categorical Cross-Entropy 33 LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 2
  • 3. LLMs and GenAI Simplified: An Easy Path to Understanding 5. Sparse Categorical Cross-Entropy 33 6. Hinge Loss 34 7. Huber Loss 34 8. Kullback-Leibler Divergence (KL Divergence) 34 9. Poisson Loss 35 Summary of Loss Functions by Type: 35 CHAPTER-1: GenAI and LLM 36 2. LLM Types 36 1. General-Purpose LLMs 36 2. Multilingual LLMs 37 3. Instruction-Following LLMs 37 4. Conversational LLMs 37 5. Code Generation LLMs 37 6. Specialized LLMs 37 7. Knowledge-Enhanced LLMs 38 8. Multimodal LLMs 38 9. Compression and Parameter-Efficient LLMs 38 10. Large Language Models with Memory 38 11. Few-Shot and Zero-Shot LLMs 38 12. Reinforcement Learning-Based LLMs 38 3. Popular LLMs 39 1. OpenAI GPT (Generative Pre-trained Transformer) 39 2. LLaMA (Large Language Model Meta AI) 39 3. Google Gemini 40 4. Claude (Claude 1, Claude 2) 40 5. PaLM (Pathways Language Model) 40 6. BLOOM 41 7. Grok (XAI) 41 8. Mistral 42 Conclusion: 42 4. Open source LLMs 42 1. BERT (Bidirectional Encoder Representations from Transformers) 42 2. GPT-2 43 3. RoBERTa 43 4. T5 (Text-to-Text Transfer Transformer) 43 5. BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) 43 6. DistilBERT 43 7. XLM-R (XLM-RoBERTa) 43 8. BART (Bidirectional and Auto-Regressive Transformers) 44 9. Flan-T5 44 LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 3
  • 4. LLMs and GenAI Simplified: An Easy Path to Understanding 10. CodeBERT 44 CHAPTER 2: LLM Architecture 45 5. LLM Transformer architecture 45 1. Encoder Architecture 45 2. Decoder Architecture 45 3. Encoder-Decoder Architecture 45 Key Components: 46 Step 1: Input (Good morning) 47 Encoder Steps: Processing the Input Sentence 47 Decoder Steps: Generating the Translation ("Bonjour") 48 Putting It All Together: 49 CHAPTER 3: LLM Applications 49 6. LLM Gen AI use cases 49 1. Text Generation 49 2. Question Answering 49 3. Text Summarization 49 4. Text Classification 50 5. Translation 50 6. Conversational AI (Chatbots) 50 7. Image Generation (Text-to-Image) 50 8. Image Classification 50 9. Image Segmentation 50 10. Audio Processing (Speech-to-Text and Text-to-Speech) 51 11. Code Generation 51 12. Sentiment Analysis 51 13. Named Entity Recognition (NER) 51 14. Data Augmentation 51 15. Image Captioning 51 16. Multi-modal AI 52 17. Text-Based Games/Interactive Stories 52 18. Knowledge Base Extraction 52 19. Fake News Detection 52 20. Grammar and Style Correction 52 21. Legal Document Generation 52 22. Paraphrasing 53 23. Automated Code Review 53 24. Emotion Recognition in Text 53 25. Product Recommendation 53 26. Text-to-Programming Language Conversion 53 27. Style Transfer (Text) 54 28. Document Comparison 54 LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 4
  • 5. LLMs and GenAI Simplified: An Easy Path to Understanding 29. Content Moderation 54 30. Voice Cloning 54 31. Image Super-Resolution 54 32. Code Translation (Language-to-Language) 54 33. Image Inpainting 55 34. Text-Based Music Generation 55 35. Visual Question Answering (VQA) 55 36. Data-to-Text Generation 55 37. Human Pose Estimation 55 38. Time-Series Forecasting 55 39. Reinforcement Learning for Text-Based Tasks 55 40. Automated Tagging and Metadata Generation 56 7. LLM Model Parameters 56 1. Temperature 56 2. Max Tokens 57 3. Top-k Sampling 57 4. Top-p Sampling (Nucleus Sampling) 58 5. Frequency Penalty 58 6. Presence Penalty 59 7. Stop Sequences 59 8. Best-of (n-best) 60 9. Echo 60 10. Stream 61 8. LLM benchmarks 61 1. SuperGLUE 61 2. GLUE (General Language Understanding Evaluation) 61 3. OpenAI HumanEval 61 4. SQuAD (Stanford Question Answering Dataset) 62 5. MMLU (Massive Multitask Language Understanding) 62 6. HELLASWAG 62 7. Big-Bench (Beyond the Imitation Game Benchmark) 62 8. LAMBADA 62 9. TriviaQA 62 10. CoQA (Conversational Question Answering) 62 11. Winograd Schema Challenge 62 12. ARC (AI2 Reasoning Challenge) 63 13. PiQA (Physical Interaction: Question Answering) 63 14. BoolQ (Boolean Questions) 63 15. TyDiQA 63 16. StoryCloze 63 17. WinoGrande 63 LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 5
  • 6. LLMs and GenAI Simplified: An Easy Path to Understanding 18. DROP (Discrete Reasoning Over Paragraphs) 63 19. Hendrycks Test 64 20. XGLUE 64 21. CodeXGLUE 64 22. CLUE (Chinese Language Understanding Evaluation) 64 9. LLM Finetuning 64 a) LLM with Prompt Engineering Tuning 64 Steps: 64 Example: 65 Resources: 65 b) LLM Instructions-based Training Tuning 65 Steps: 65 Example: 66 Resources: 67 c) LLM with RAG (Retrieval-Augmented Generation) Fine-tuning 67 Steps: 67 Example: 68 Resources: 69 d) LLM with LoRA (Low-Rank Adaptation) 69 Steps: 69 Example: 69 Resources: 70 e) LLM with QLoRA (Quantized Low-Rank Adaptation) 70 Steps: 70 Example: 70 Resources: 70 f) LLM with Full Tuning 70 Steps: 70 Example: 70 Resources: 72 10. Interview Questions 72 LLM Architecture 73 Transformers 73 Optimization Techniques 74 Ethical Considerations 75 Deployment Strategies 76 Hugging Face 76 OpenAI 77 LangChain 78 Fine-Tuning 79 Generative AI (Gen AI) and Large Language Models (LLM) 80 LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 6
  • 7. LLMs and GenAI Simplified: An Easy Path to Understanding Hugging Face 81 OpenAI 81 LangChain 82 Fine-Tuning 82 AI Governance 83 LLM FineTuning Code Samples: 83 AI Evaluation Metrics 87 1. Classification Metrics: 87 2. Regression Metrics: 87 3. Natural Language Processing (NLP) Metrics: 88 4. Clustering Metrics: 88 5. Ranking Metrics: 88 6. Advanced Metrics: 89 7. Multiclass and Multilabel Metrics: 89 8. Fairness and Bias Metrics: 89 Conclusion 90 Appendix A: External References 90 Blogs 90 Articles: 90 PDfs: 90 Code: 91 LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 7
  • 8. LLMs and GenAI Simplified: An Easy Path to Understanding LLMs Simplified: An Easy Path to Understanding [DRAFT] About Author Srini Pusuluri - M.Tech IIT Kharagpur FMR Distinguished Scientist in Indian Space and Defence, Salesforce CRM and AI Architect Senior Salesforce (SFDC), AI, and CRM Program Architect, highly skilled in integrating cutting-edge technologies like artificial intelligence and customer relationship management (CRM) platforms. With over 20 years of IT experience (including 12 years in CRM/Salesforce and 5 years in AI), the author is a recognized leader in designing and delivering innovative AI and CRM solutions across industries. They have extensive expertise in security, multi-org setups, data integration, design patterns, DevOps, and AI strategy, and hold 20 Salesforce and 5 AI certifications. His career spans roles at prominent organizations such as Google, Elastic, GE, AT&T, IBM, and USAA, where they have successfully led large-scale digital transformation projects. his LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 8
  • 9. LLMs and GenAI Simplified: An Easy Path to Understanding responsibilities include delivering architecture solutions for BILL CRM, developing Gen AI Agentforce solutions like sentiment analysis, text summarization, and chat/call analytics, and implementing high-volume AI projects involving Einstein Bots, Five9 CTI, Omni-Channel, and Service Cloud. The author is also known for managing complex B2C Salesforce implementations with millions of accounts and users, ensuring robust IT governance, and coordinating with multiple implementation partners. In addition, their expertise extends to Sales Cloud, Service Cloud, CPQ, and handling large-scale data migrations, including Zendesk-to-Salesforce migrations and acquisition-based org mergers. As a trainer and R&D leader, they have pioneered integrating LLMs (Large Language Models), such as XGen, LLaMA, and ChatGPT, into Salesforce for Copilot AI solutions. Their recent work focuses on applying fine-tuning techniques and AI strategy to enhance enterprise CRM systems and customer data platforms (CDPs). With a career marked by over 40 successful projects, they are not only a skilled architect but also a thought leader in AI-driven CRM innovation, sharing insights through public speaking and training engagements Preface Motivation Behind the Web Book on LLM Modeling and Fine-tuning The impetus for writing this web book on Large Language Models (LLMs) and fine-tuning stems from a significant gap in available resources. While there is an abundance of content on foundational AI principles, there is no comprehensive guide that consolidates the complexities of LLM architecture, model customization, and the nuanced processes involved in fine-tuning for specialized applications. For professionals navigating the cutting edge of AI—whether for NLP tasks, chatbot implementations, or tailored business solutions—the knowledge scattered across various research papers, tutorials, and forums can be overwhelming. As an expert with deep experience across AI and CRM projects, working with companies like Google, IBM, and Elastic, the author has recognized that mastering LLMs requires more than just an understanding of neural networks or algorithms. It involves strategic insights into how these models can be adapted, scaled, and integrated into complex systems while maintaining performance, security, and accuracy. Fine-tuning an LLM demands a blend of technical precision and creative problem-solving, where understanding the target domain is just as important as the model's architecture. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 9
  • 10. LLMs and GenAI Simplified: An Easy Path to Understanding The motivation for this book emerged from seeing countless AI professionals and developers struggle to assemble coherent strategies from fragmented sources, especially in the fast-evolving field of LLMs. Having fine-tuned models for diverse business applications—from customer support chatbots to AI-driven decision-making platforms—the author recognized the need for a definitive resource. This web book is designed to be that resource, offering clear, structured insights that guide readers through the entire process, from model selection and training to deployment and optimization. By distilling years of experience in AI modeling and LLM customization, the author aims to provide professionals with a go-to reference, empowering them to confidently navigate the complexities of LLMs and leverage their full potential for specialized use cases. Definitions Generative AI (Gen AI): A branch of artificial intelligence that can generate new content, such as text, images, or music, from given inputs. It’s widely used in natural language processing (NLP), image creation, and other tasks where the AI learns patterns from data and produces creative outputs based on that learning. Large Language Model (LLM): A type of neural network model trained on vast amounts of text data to understand and generate human-like language. Examples include GPT (Generative Pre-trained Transformer) models, such as GPT-4. LLMs are capable of performing a variety of tasks like translation, summarization, text generation, and answering questions. Parameter-Efficient Fine-Tuning (PEFT): A technique for fine-tuning large models like LLMs by modifying only a small number of parameters, keeping the majority of the original pre-trained model intact. PEFT is efficient in terms of memory and computation, making it useful when adapting large models for specific tasks. Low-Rank Adaptation (LoRA): A specific form of PEFT that decomposes large parameter matrices into low-rank matrices during fine-tuning. This reduces the number of parameters to update, making training faster and more resource-efficient, especially useful for adapting LLMs to new tasks. Tokens/Tokenization: A token is a unit of text that a model processes. Tokenization is the process of splitting text into smaller units (tokens), which can be as small as characters or as large as whole words, depending on the model. For instance, the word "chatbot" might be split into two tokens: "chat" and "bot". LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 10
  • 11. LLMs and GenAI Simplified: An Easy Path to Understanding Embedding: A mathematical representation of words, phrases, or other data in a continuous vector space. In NLP, embeddings are used to capture the meaning of words based on their context and relationships with other words. Word2Vec and BERT are examples of models that create word embeddings. Catastrophic Forgetting: A phenomenon that occurs when a machine learning model forgets previously learned information while being trained on new tasks. In the context of LLMs, catastrophic forgetting can happen during fine-tuning when the model is over-optimized for the new task and loses generalization capabilities. Attention Mechanism: A technique in deep learning that allows models to focus on specific parts of the input when generating output, improving their ability to capture relationships between distant words in text. It is the key innovation behind transformers and LLMs. Transformer Architecture: The underlying architecture for LLMs like GPT. It uses self-attention mechanisms to process input data in parallel, making it highly efficient for tasks that involve long sequences of text. Pre-training: The initial phase of training an LLM on a large dataset where the model learns general language patterns and knowledge. During pre-training, models are usually trained using unsupervised learning on vast amounts of text data. Fine-Tuning: The process of further training a pre-trained model on a specific dataset or for a specific task to improve performance in that area. Fine-tuning helps the model adapt to specialized domains while retaining its general knowledge. Prompting: A method used to guide LLMs into generating specific outputs by providing context or instructions within the input. A prompt is the initial text given to the model that defines the type of response you want. Zero-Shot Learning: A method where an LLM performs a task without any specific fine-tuning for that task. The model relies solely on the knowledge it gained during pre-training to generate responses. Few-Shot Learning: A technique in which the model is provided with a few examples (in the prompt) of how a task should be performed before generating an answer. This helps the model adapt to specific types of tasks without full fine-tuning. Context Window: The amount of text (measured in tokens) that an LLM can consider at once while generating responses. Models have a fixed limit on the number of tokens they can handle at a time. If the text exceeds the context window, the model may forget earlier parts of the input. Temperature: A parameter that controls the randomness of text generation in LLMs. Higher temperature values result in more random and diverse outputs, while lower values make the model’s responses more deterministic and focused. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 11
  • 12. LLMs and GenAI Simplified: An Easy Path to Understanding Top-k Sampling: A method for text generation where the model selects the next token from the top k most probable tokens. This adds diversity to the generated text, preventing the model from always picking the highest probability token. Top-p Sampling (Nucleus Sampling): A more flexible version of top-k sampling where the model chooses the next token from the smallest possible set of tokens that have a cumulative probability of p. This method ensures that token choices are both diverse and probabilistically consistent. Latent Space: In machine learning, latent space refers to the compressed, hidden representation of data within a model. For LLMs, the latent space represents abstract, high-dimensional relationships between words, sentences, or entire documents, enabling the model to reason and generate language. Autoregressive Model: A type of model that generates the next token in a sequence based on previously generated tokens. GPT models are autoregressive because they predict one word at a time, conditioned on the words that came before. Masked Language Model: A model that learns by predicting masked-out words within a sentence. BERT is an example of a masked language model, which improves understanding of context and relationships in text by learning to reconstruct sentences. Gradient Descent: An optimization algorithm used to train machine learning models by minimizing the loss function. During training, the model updates its parameters based on the gradient of the loss function to find the optimal solution. Loss Function: A mathematical function that measures how well the model's predictions match the actual data. The goal of training a model is to minimize the loss function, which indicates the model's performance in learning from data. Overfitting: A condition where a model learns to perform very well on the training data but fails to generalize to new, unseen data. Overfitting occurs when the model becomes too specialized to the specific patterns of the training set. Underfitting: When a model is too simple to capture the underlying patterns in the data, leading to poor performance both on the training and testing datasets. Regularization: Techniques used to prevent overfitting by adding constraints or penalties to the model’s complexity. Common forms of regularization include L1 and L2 regularization, as well as dropout, which randomly deactivates certain neurons during training. Backpropagation: The process of updating a neural network's weights by calculating the gradient of the loss function with respect to each weight, and then using this information to make adjustments. This is done iteratively to improve the model’s predictions. Dropout: A regularization technique where a random set of neurons is ignored during training, preventing the model from relying too heavily on specific neurons and improving generalization. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 12
  • 13. LLMs and GenAI Simplified: An Easy Path to Understanding Epoch: A single pass through the entire training dataset. During each epoch, the model's parameters are updated multiple times, depending on the size of the dataset and the chosen batch size. Batch Size: The number of training examples used in one iteration of updating the model’s parameters. A larger batch size allows the model to take into account more information per update but requires more computational resources. Gradient Clipping: A technique to prevent exploding gradients during backpropagation by limiting the size of the gradients during training. It helps to stabilize and accelerate model training. Exploding/Vanishing Gradients: Problems in neural network training where gradients become too large (exploding) or too small (vanishing), which can make it difficult to update the model’s parameters effectively. Beam Search: A search algorithm used in text generation that explores multiple possible sequences simultaneously, keeping track of the most promising ones. This method helps improve the quality of generated text by considering various possible continuations. Bias and Variance: Bias refers to errors introduced by overly simplistic models that fail to capture the complexity of the data (underfitting). Variance refers to errors introduced by models that are too complex and capture noise along with the data (overfitting). Neural Architecture Search (NAS): The process of automating the design of neural network architectures. Instead of manually designing the architecture, NAS explores different configurations to find the optimal structure for a specific task. Knowledge Distillation: A process where a smaller model (student) is trained to mimic the predictions of a larger, more complex model (teacher). The goal is to create a lightweight version of a model that performs similarly but with fewer resources. Multi-Head Attention: An extension of the attention mechanism used in transformer models like GPT. It allows the model to focus on different parts of the input sequence at the same time (i.e., multiple "heads"), improving the ability to capture various relationships in the data. Self-Attention: A mechanism that relates different words in a sequence to each other, even if they are far apart. Each word in a sequence attends to every other word, allowing the model to better understand context and relationships. Cross-Attention: A type of attention mechanism where one sequence (like a query) attends to another sequence (like a context or memory). Cross-attention is commonly used in tasks like text generation where the output sequence needs to refer to an input sequence (e.g., in translation). Positional Encoding: Since transformers do not inherently understand the order of tokens (unlike RNNs), positional encoding is added to input embeddings to give the model information about the position of each token in a sequence. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 13
  • 14. LLMs and GenAI Simplified: An Easy Path to Understanding Unsupervised Learning: A type of machine learning where the model is trained on data without explicit labels. The model learns patterns and structures in the data on its own. Many LLMs are pre-trained using unsupervised learning on large corpora of text. Transfer Learning: A technique in which a model trained on one task (or a large general dataset) is adapted to a different, often more specific task. Fine-tuning LLMs on specific datasets is a common example of transfer learning. Gradient Accumulation: A technique used during training to simulate a large batch size on smaller hardware. Gradients are accumulated over several smaller batches before performing an update step, making training more efficient with limited resources. Batched Inference: A process where multiple inputs are processed together in a single forward pass through the model. This is commonly done in LLMs to improve the efficiency and speed of generating responses for multiple queries at the same time. Weight Sharing: A technique used in model architectures like transformers, where the same parameters (weights) are reused across different layers or parts of the network, reducing the number of trainable parameters and improving efficiency. Layer Normalization: A normalization technique applied to the inputs of a neural network layer to stabilize training by reducing internal covariate shift. It's used extensively in transformer-based models. Layer Freezing: A technique where certain layers of a pre-trained model are "frozen" (i.e., their weights are not updated during training) to retain the original knowledge, while other layers are fine-tuned for specific tasks. Sparse Attention: An optimization of the standard attention mechanism where only a subset of the input tokens are attended to, rather than all tokens. This reduces the computational complexity, especially for long sequences. Mixture of Experts (MoE): A model architecture that uses multiple sub-models (experts) and dynamically selects which experts to activate based on the input. MoE models can scale to very large parameter sizes while reducing the amount of computation required for each input. Encoder-Decoder Architecture: A neural network structure where the encoder processes the input sequence into a latent representation, and the decoder generates the output sequence from that representation. This architecture is commonly used in tasks like machine translation. Gradient-Free Optimization: A class of optimization methods that do not rely on gradient information (like backpropagation) to update the model’s parameters. These techniques are often used in reinforcement learning and neural architecture search. Attention Masking: A technique used in transformer models to prevent the model from attending to certain tokens in the sequence. For example, in autoregressive models like GPT, a causal mask is applied to ensure that the model only attends to previous tokens and not future ones during training. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 14
  • 15. LLMs and GenAI Simplified: An Easy Path to Understanding Adversarial Training: A technique where the model is trained to defend against adversarial attacks—small, carefully crafted perturbations to the input that can trick the model into making incorrect predictions. GAN (Generative Adversarial Network): A type of generative model consisting of two networks—a generator and a discriminator—that are trained together. The generator tries to create realistic outputs, while the discriminator tries to distinguish between real and generated data. Contrastive Learning: A technique where the model learns to differentiate between similar and dissimilar pairs of data points. This is often used in tasks like image recognition and embeddings, where the model learns to group similar data points in the latent space. Knowledge Graph: A structured representation of knowledge where entities (such as people, places, or things) are nodes, and relationships between them are edges. Knowledge graphs are often used in conjunction with LLMs to enhance reasoning and factual recall. Curriculum Learning: A training strategy where the model is first trained on simpler tasks or data and gradually introduced to more complex examples. This mirrors the human learning process and can lead to improved performance and generalization. Distillation Loss: The loss function used during knowledge distillation, where a smaller student model is trained to mimic the outputs of a larger teacher model. The loss measures the difference between the student's predictions and the teacher’s predictions. Hard vs. Soft Attention: In hard attention, only one part of the input is selected to focus on (discrete attention), while in soft attention, the model assigns different weights to different parts of the input (continuous attention). Perplexity: A metric used to evaluate the performance of language models. It measures how well a model predicts a sample, with lower perplexity indicating better performance. In essence, it shows how "confused" the model is in generating a sequence. Hybrid Model: A model that combines multiple machine learning approaches or architectures, such as combining rule-based systems with LLMs or integrating neural networks with traditional algorithms. Prompt Engineering: The process of designing and optimizing the prompts given to LLMs to elicit the best possible responses for a specific task. It involves refining the input structure, using task-specific instructions, and experimenting with different prompt formats. Task-Specific Fine-Tuning: Fine-tuning an LLM for a very specific task, such as medical question-answering or legal document analysis. This involves training the model on a dataset that is highly specialized for the desired task. Hyperparameters: Parameters that control the learning process of a machine learning model, such as the learning rate, batch size, number of layers, and attention heads. Hyperparameter tuning is critical for optimizing model performance. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 15
  • 16. LLMs and GenAI Simplified: An Easy Path to Understanding Gradient Descent Optimizers (Adam, SGD, RMSprop): Algorithms used to update the weights of a model during training. Adam (Adaptive Moment Estimation) is one of the most popular optimizers due to its efficiency and ability to handle sparse gradients. Latent Variable Model: A model that assumes the data is generated by underlying, unobserved variables (latent variables). Variational autoencoders (VAEs) are an example of a latent variable model. Long-Short Term Memory (LSTM): A type of recurrent neural network (RNN) architecture designed to capture long-range dependencies in sequential data, addressing the issue of vanishing gradients. BERT (Bidirectional Encoder Representations from Transformers): A transformer-based model that uses a masked language model approach to pre-train a model in both directions (left-to-right and right-to-left), improving contextual understanding. RoBERTa (Robustly Optimized BERT Pretraining Approach): An optimized version of BERT that uses a larger dataset and better training techniques to improve the performance of transformer models. GPT (Generative Pre-trained Transformer): A class of transformer models that are pre-trained on large text corpora and fine-tuned for specific tasks. GPT models are autoregressive and generate text one word at a time, using previously generated words as input. Reinforcement Learning from Human Feedback (RLHF): A training technique where models are fine-tuned using reinforcement learning, with human evaluators providing feedback to improve the model’s outputs. This is used to align LLMs with human values and preferences. Chain of Thought Prompting: A prompting technique where the model is guided to reason through a problem step by step, rather than producing an answer immediately. This technique helps improve performance in tasks requiring logical reasoning or multi-step problem-solving. Multimodal Learning: A type of learning that combines data from multiple modalities (e.g., text, images, audio) to create models capable of understanding and generating across different types of data. Multimodal models can generate images from text, or captions from images. Transformer Decoder: The part of the transformer architecture used in autoregressive models like GPT. It takes in a sequence of tokens and generates output step-by-step, conditioned on the previous tokens. Transformer Encoder: The other half of the transformer architecture, used in models like BERT. It processes the entire input sequence at once, using bidirectional attention to understand the context around each token. Dynamic Quantization: A technique to reduce the size of LLMs by converting their weights to lower-precision formats (e.g., from 32-bit floating point to 8-bit integer) during inference. This improves computational efficiency without significantly affecting model performance. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 16
  • 17. LLMs and GenAI Simplified: An Easy Path to Understanding Post-Training Quantization: Applying quantization to a model after it has been trained, reducing the model size and improving inference speed. Unlike dynamic quantization, post-training quantization modifies the weights before inference. Knowledge Base Integration: The process of integrating external knowledge sources (like databases or knowledge graphs) into a language model to improve factual accuracy and recall. This helps the model access and use structured knowledge for tasks requiring deep expertise. Memory-Augmented Neural Networks (MANN): A type of neural network architecture that has an external memory bank, allowing it to store and retrieve information across long time frames. This enables the model to recall past experiences or facts when generating output. Unlikelihood Training: A training method used to reduce common generation errors in language models by explicitly penalizing unlikely or undesirable outputs during training. It helps prevent repetition, contradictions, and nonsensical outputs. Synthetic Data Generation: The process of generating artificial data (e.g., text, images) to augment a dataset. This can be used to train models when real-world data is scarce, or to balance class distributions in datasets. Curriculum Fine-Tuning: A technique where a model is fine-tuned on increasingly difficult datasets or tasks, helping it generalize better and improve performance on complex tasks. Task-Adaptive Pretraining (TAPT): A method of further pretraining a language model on domain-specific data before fine-tuning it for a particular task. TAPT helps the model adapt to the vocabulary, style, and structure of a specialized domain. Elastic Weight Consolidation (EWC): A regularization technique used to prevent catastrophic forgetting during fine-tuning by identifying important weights and ensuring that they are not modified too drastically during training on new tasks. Latent Dirichlet Allocation (LDA): A machine learning algorithm used for topic modeling. It identifies topics within a set of documents based on the distribution of words across those topics. LDA can be used to analyze and organize large text datasets. Contrastive Divergence: An approximation algorithm used to train probabilistic models like Restricted Boltzmann Machines (RBMs). It estimates the gradients of the model’s likelihood, helping the model learn a good representation of the data. Variational Inference: A method used to approximate complex probability distributions in Bayesian models. It is often used in VAEs (Variational Autoencoders) to approximate the posterior distribution of the latent variables. Beam Width: In beam search (used for text generation), the beam width determines how many sequences are kept for consideration at each step of the generation process. A larger beam width increases diversity but also computational cost. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 17
  • 18. LLMs and GenAI Simplified: An Easy Path to Understanding Entropy Regularization: A technique used to encourage exploration during reinforcement learning or text generation by adding a term to the loss function that penalizes low-entropy (i.e., overly confident) predictions. This leads to more diverse outputs. Bidirectional Attention Flow (BiDAF): A model architecture used for tasks like question answering, where the model attends to both the question and the context at the same time. This allows it to focus on the most relevant parts of the input when generating a response. Conditional Generation: The task of generating outputs based on specific input conditions, such as generating text based on a prompt or generating images based on text descriptions. Conditional generation is commonly used in models like GPT-3. Latent Semantic Analysis (LSA): A technique used to analyze relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA is used for tasks like information retrieval and text similarity. Hypernetwork: A neural network that generates the weights of another neural network. This technique allows a model to quickly adapt to new tasks by dynamically generating task-specific weights without requiring separate models. Monte Carlo Tree Search (MCTS): A search algorithm used in decision-making tasks, particularly in game AI. MCTS builds a search tree by sampling possible actions and outcomes, then selecting the most promising action based on statistical averages. Embedding Space: The continuous vector space where the embeddings (representations) of words, phrases, or other inputs are mapped. In this space, similar inputs are located closer together, reflecting their semantic similarity. Long-Range Dependencies: The relationships between words or tokens in a sequence that are far apart. Traditional models like RNNs struggle with long-range dependencies, but transformers handle them well through attention mechanisms. Exemplar Fine-Tuning: A technique where a few specific, well-chosen examples (exemplars) are used to fine-tune a large language model, allowing it to generalize better to the desired task. Self-Supervised Learning: A type of learning where the model generates its own labels from the input data, rather than relying on external annotations. This is common in LLMs, where the model learns to predict missing or future words in a sentence. Data Augmentation: Techniques used to increase the size and diversity of the training data by creating modified versions of the existing data (e.g., by applying transformations, noise, or sampling). Data augmentation is used to improve model generalization. Reinforcement Learning with Human Feedback (RLHF): A training technique where humans provide feedback on the quality of the model’s output, and this feedback is used to fine-tune the model through reinforcement learning. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 18
  • 19. LLMs and GenAI Simplified: An Easy Path to Understanding Sparse Neural Networks: Neural networks where many of the weights are set to zero, reducing the computational cost and memory footprint of the model. Sparsity can be introduced during training through techniques like pruning. Attention Dropout: A regularization technique applied to the attention mechanism in transformer models, where a fraction of the attention scores are randomly set to zero. This helps prevent overfitting and improves generalization. Structured Prediction: A type of prediction task where the output is a complex structure (e.g., a sentence, a tree, or a graph) rather than a single label or value. Sequence-to-sequence models are commonly used for structured prediction tasks like translation or parsing. Alignment Problem: A challenge in AI safety where the behavior of AI systems needs to be aligned with human goals, values, or intentions. Misalignment can lead to unintended consequences, especially in autonomous systems. Causal Language Modeling: A method where the model is trained to predict the next word in a sequence based only on the previous words. GPT models use causal language modeling to generate text in an autoregressive manner. Entropy: A measure of uncertainty or randomness in a model’s predictions. In language models, high entropy means the model is uncertain about the next word, while low entropy means the model is confident in its prediction. Perceptron: The simplest type of artificial neural network, consisting of a single layer of weights and an activation function. Perceptrons are the building blocks of more complex neural networks. Neural Tangent Kernel (NTK): A mathematical framework that helps understand the training dynamics of over-parameterized neural networks, providing insights into how large models behave during gradient descent. Graph Neural Network (GNN): A type of neural network designed to work with graph-structured data, where nodes represent entities and edges represent relationships. GNNs are used for tasks like social network analysis, recommendation systems, and molecular modeling. Hybrid Attention: A model that combines multiple forms of attention, such as self-attention and cross-attention, to improve performance on complex tasks where different types of context need to be considered simultaneously. Rationales: In interpretability, rationales are explanations or justifications for a model’s decisions. Rationales can be explicitly provided by the model as part of its output, helping users understand why certain predictions were made. Symmetry Breaking: In neural networks, symmetry breaking refers to the process of initializing the network weights randomly, which ensures that different neurons learn distinct features and prevents the model from getting stuck in unproductive learning configurations. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 19
Background

"LLMs and GenAI Simplified" serves as a beginner-friendly guide to understanding Large Language Models (LLMs) and their profound impact on various fields, especially artificial intelligence (AI) and natural language processing (NLP). The book walks readers through the foundational concepts of LLMs, exploring their architecture, applications, performance benchmarks, and the ethical considerations surrounding their use. Here's an overview of what the book covers in key areas:

1. Introduction to Large Language Models (LLMs)
The book starts by introducing LLMs, which are advanced AI models trained on massive datasets to understand, generate, and process human language. It explains how models like GPT (Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from Transformers), and T5 (Text-To-Text Transfer Transformer) have transformed how machines interact with human language, providing contextually accurate answers, writing content, and even simulating conversations. The chapter also covers the basics of how LLMs leverage deep learning and neural networks, particularly transformer-based architectures, to handle enormous amounts of text data and make sense of language patterns.

2. LLM Architecture
This section delves deep into the architecture that powers LLMs, focusing on how transformers form the backbone of these models. The book breaks down the technical components in a simplified manner, including:
● Attention Mechanisms: How transformers use self-attention to focus on different parts of a sentence or phrase to capture meaning.
● Encoder-Decoder Models: A detailed look at models like BERT (encoder-based) and GPT (decoder-based), explaining how each processes text differently.
● Pre-training and Fine-tuning: The book covers the concept of pre-training on massive text corpora and how models are later fine-tuned for specific tasks like sentiment analysis, translation, or summarization.
The section emphasizes how this architecture allows LLMs to scale effectively, enabling them to generate human-like text and perform complex language understanding tasks.

3. Applications of LLMs
LLMs have far-reaching applications, and this section provides real-world examples of where they are being utilized. Key use cases discussed include:
● Chatbots and Virtual Assistants: How companies use LLMs to power intelligent chatbots like ChatGPT, which handle customer service, technical support, and personalized user experiences.
● Content Creation: LLMs' ability to write articles, blogs, product descriptions, and other forms of content, automating many repetitive tasks.
● Translation and Summarization: How models like BERT and GPT are used to translate languages and summarize large amounts of text, improving productivity in areas like media, law, and academia.
● Code Generation: Models like OpenAI's Codex (an extension of GPT) are discussed for their role in generating programming code, reducing the workload for developers.
● Healthcare and Medicine: How LLMs assist in diagnosis, summarizing medical literature, and providing virtual consultations.

4. LLM Performance Benchmarks
To evaluate the effectiveness and capabilities of LLMs, benchmarks are essential. This section explains some of the widely used benchmarks for comparing model performance, including:
● GLUE (General Language Understanding Evaluation): A benchmark for evaluating NLP tasks like sentiment analysis and text entailment.
● SQuAD (Stanford Question Answering Dataset): Focused on reading comprehension and answering questions based on text.
● SuperGLUE: A more challenging version of GLUE, used to evaluate models on a higher level of language understanding.
The chapter helps readers understand how models are evaluated, the parameters that indicate good performance, and the need for continuous benchmarking as models evolve.

5. Governance, Ethics, and Responsible AI
This section covers the critical topic of AI governance and ethical considerations in deploying LLMs. The book highlights:
● Bias in LLMs: The inherent biases in models trained on large, uncurated datasets and the importance of developing techniques to mitigate these biases.
● Privacy Concerns: How LLMs, when mishandled, could inadvertently reveal sensitive information contained in training data.
● Regulatory Frameworks: Current global efforts to regulate the use of AI, such as GDPR and emerging AI governance frameworks that promote transparency, fairness, and accountability.
The book stresses the importance of developing Responsible AI practices that ensure LLMs are used ethically and avoid harmful consequences, like spreading misinformation or deepening societal inequalities.

6. Challenges and Future Directions
The book concludes with a forward-looking perspective, discussing the challenges that LLMs face, such as the increasing computational power required to train these models, environmental concerns due to energy consumption, and the limits of generalization in language models. It also touches on future directions, including:
● Smaller, More Efficient Models: Efforts to create smaller models that retain high performance but require fewer resources.
● Continual Learning: Exploring the potential for LLMs to learn continuously without retraining from scratch.
● Human-AI Collaboration: A vision where LLMs augment human decision-making, combining AI efficiency with human judgment to solve complex problems.

Conclusion
"LLMs and GenAI Simplified" simplifies complex AI topics related to Large Language Models, making it an accessible entry point for anyone interested in how these models work, their applications, and the ethical implications of their widespread use. Through clear explanations, real-world examples, and practical insights, the book provides a comprehensive overview for both beginners and professionals looking to enhance their understanding of LLMs.

CHAPTER 0: FUNDAMENTALS

Neural Networks Basics
Let's walk through how a simple three-layer neural network works, step by step.

Overview:
Imagine we have a simple neural network to classify whether a fruit is an apple or a banana based on four input features: size, color, weight, and shape. The neural network has:
● 4 input neurons (representing the input features),
● 3 hidden neurons (in one hidden layer),
● 2 output neurons (representing the two possible outcomes: apple or banana).
We'll also learn how weights and biases are adjusted using gradient descent, the process that makes the neural network "learn."

Step 1: Set up the Neural Network Structure
● Input Layer: 4 input neurons (size, color, weight, shape).
● Hidden Layer: 3 hidden neurons (which do calculations based on the inputs).
● Output Layer: 2 output neurons (one for apple and one for banana).
Each neuron in one layer is connected to every neuron in the next layer through weights (numbers that determine the strength of connections).

Step 2: Initialize Weights and Biases
At the start, each connection between neurons has a random weight, and each neuron has a bias (an extra number added to the neuron's calculation). For simplicity, let's assume the weights between layers are initially:
● From Input to Hidden: random values like 0.5, 0.2, -0.3, etc.
● From Hidden to Output: random values like 0.4, -0.2, 0.1, etc.
Biases for each neuron are also random, say 0.1 for now.

Step 3: Forward Propagation
In forward propagation, the network takes the inputs and calculates the output. Here's how it works:
1. Input Layer:
○ Let's say the input features (size, color, weight, shape) are: [2, 1, 0.5, 1.5].
2. Hidden Layer:
○ Each neuron in the hidden layer calculates a weighted sum of the inputs. The formula for each hidden neuron is:
output = activation(w1·x1 + w2·x2 + w3·x3 + w4·x4 + bias)
■ For example, for the first hidden neuron, if the weights are w1 = 0.5, w2 = -0.2, w3 = 0.1, and w4 = -0.3, the output would be:
output = activation(0.5×2 + (−0.2)×1 + 0.1×0.5 + (−0.3)×1.5 + 0.1)
After adding the bias, we apply an activation function (usually a function like ReLU or sigmoid) to make the output non-linear.
3. Output Layer:
○ Each neuron in the output layer also calculates a weighted sum from the hidden layer's outputs. With 2 output neurons (apple or banana), each output neuron gives a score. For example, one score might indicate how "likely" the fruit is an apple and the other how likely it is a banana.

Step 4: Calculate Loss (Error)
After calculating the output, we compare it with the actual result (whether the fruit is actually an apple or banana). This is done using a loss function. Let's say our prediction is [0.8 for apple, 0.2 for banana], but the actual result is [1 for apple, 0 for banana]. We calculate the loss, or error, which tells us how far our prediction is from the truth. A common loss function is mean squared error:
Loss = (1/2) Σ (prediction − actual)²
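Before moving on to backpropagation, here is what Steps 1 through 4 look like in code. This is a minimal NumPy sketch, not the book's implementation: the random weights, the ReLU/softmax choices, and the single input row are all illustrative assumptions.

import numpy as np

# Steps 1-2: a 4-3-2 fruit network with random weights and small biases
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.full(3, 0.1)   # input -> hidden
W2, b2 = rng.normal(size=(3, 2)), np.full(2, 0.1)   # hidden -> output

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

# Step 3: forward propagation for one fruit
x = np.array([2.0, 1.0, 0.5, 1.5])        # size, color, weight, shape
hidden = relu(x @ W1 + b1)
prediction = softmax(hidden @ W2 + b2)     # [P(apple), P(banana)]

# Step 4: mean squared error against the true label [1, 0] (apple)
actual = np.array([1.0, 0.0])
loss = 0.5 * np.sum((prediction - actual) ** 2)
print(prediction, loss)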
Step 5: Backpropagation
Now that we know the error, we need to reduce it by adjusting the weights and biases. This process is called backpropagation. In backpropagation:
1. Calculate Gradients: We calculate how much each weight contributed to the error. This is done using derivatives (slopes) to see how the output would change if we slightly adjusted the weight.
2. Adjust Weights and Biases: Using gradient descent, we adjust the weights and biases to reduce the error. Gradient descent changes each weight by a small amount in the direction that reduces the loss. The new weights are calculated as:
new weight = old weight − learning rate × ∂loss/∂weight
○ The learning rate is a small number (like 0.01) that controls how fast the network updates the weights. If the learning rate is too high, the network may "miss" the optimal solution. If it's too low, training will be slow.

Step 6: Repeat the Process
We repeat the process (forward propagation, loss calculation, backpropagation) many times. Each time, the weights and biases are adjusted slightly, and the network learns to make better predictions.

Simple Example:
Imagine the network starts with random weights and makes a prediction of [0.8 for apple, 0.2 for banana] when the true answer is [1 for apple, 0 for banana]. After calculating the loss, the network sees that it needs to increase the "apple" output and decrease the "banana" output. Backpropagation will slightly change the weights so that next time, the output is closer to the correct answer, such as [0.9 for apple, 0.1 for banana]. Over many repetitions, the network learns to correctly classify the fruit!

Summary:
1. Start with random weights and biases.
2. Forward propagate to calculate the network's output.
3. Calculate the loss based on how far the prediction is from the actual result.
4. Backpropagate the error to adjust the weights and biases using gradient descent.
5. Repeat the process until the network makes accurate predictions.
This is how a simple three-layer neural network learns to classify data, step by step.
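The update rule from Step 5 fits in a few lines of code. Below is a minimal sketch for a single weight, using a numerical gradient instead of full backpropagation; the toy loss function, the starting weight, and the learning rate are all invented for illustration.

# Toy loss: squared error of a one-weight "network" on input 2.0, target 1.0
def loss_fn(w):
    return (w * 2.0 - 1.0) ** 2

w = 0.9                 # current weight
learning_rate = 0.01
eps = 1e-6

# Approximate d(loss)/d(w) with a central difference
grad = (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)

# The gradient descent step: new weight = old weight - learning rate * gradient
w = w - learning_rate * grad
print(w, loss_fn(w))    # the loss shrinks slightly; repeating this is "training"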
1. Install TensorFlow: You can install TensorFlow using pip if you don't have it already:

pip install tensorflow

2. Run the Code: Use the code below on your local machine after installing TensorFlow.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Step 1: Create a sample dataset (features: size, color, weight, shape; label: apple or banana)
# Let's assume we have a dataset of 10 fruits, with features normalized to be between 0 and 1.
# 0 -> banana, 1 -> apple
data = np.array([
    [0.8, 0.7, 0.6, 0.5],      # Apple
    [0.3, 0.2, 0.4, 0.1],      # Banana
    [0.9, 0.8, 0.7, 0.6],      # Apple
    [0.2, 0.1, 0.3, 0.2],      # Banana
    [0.85, 0.75, 0.65, 0.55],  # Apple
    [0.1, 0.05, 0.2, 0.15],    # Banana
    [0.75, 0.7, 0.8, 0.65],    # Apple
    [0.15, 0.2, 0.3, 0.25],    # Banana
    [0.9, 0.85, 0.9, 0.75],    # Apple
    [0.25, 0.2, 0.35, 0.3]     # Banana
])

# Labels (1 for apple, 0 for banana)
labels = np.array([
    [1],  # Apple
    [0],  # Banana
    [1],  # Apple
    [0],  # Banana
    [1],  # Apple
    [0],  # Banana
    [1],  # Apple
    [0],  # Banana
    [1],  # Apple
    [0]   # Banana
])

# Step 2: Build a Neural Network Model using Keras
model = Sequential()
model.add(Dense(3, input_dim=4, activation='relu'))  # 3 neurons in the hidden layer, 4 input features
model.add(Dense(2, activation='softmax'))            # 2 output neurons (apple and banana), softmax for classification

# Step 3: Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Step 4: Train the model
model.fit(data, labels, epochs=500, verbose=0)

# Step 5: Test the model with a new fruit input
test_input = np.array([[0.82, 0.76, 0.63, 0.58]])  # Testing with a new input similar to an apple
prediction = model.predict(test_input)
print("Prediction (Apple or Banana):", prediction)

This script creates a simple neural network for classifying fruits as apples or bananas and trains it on a small dataset. It then tests the model with a new input and prints the prediction.

1. Neural Network Models
A neural network model is a way to organize a network of "neurons" (like a brain) into layers, each layer doing a job of learning from the data. Each neuron takes in numbers, does some math, and sends out a result to the next layer.
● Dense Layer (Fully Connected Layer): In a dense layer, every neuron is connected to every neuron in the next layer. Think of it as a web where all inputs influence all outputs. It's the most common type of layer in neural networks.
2. Activation Functions
Neurons in a network need to decide whether to pass on information or not. The activation function is the rule that helps them decide. It transforms the output into something manageable, often between 0 and 1 or some small range. Here are some common activation functions:
● ReLU (Rectified Linear Unit): The most common activation function. It turns any negative number into zero and keeps positive numbers as they are. So, if the neuron gives a result of -5, it becomes 0; if it gives 3, it stays 3. ReLU is popular because it helps models learn faster.
f(x) = max(0, x)
● Sigmoid: Sigmoid squashes any number into the range between 0 and 1, which is useful when you want the output to be a probability (like: is this an apple?).
f(x) = 1 / (1 + e^(−x))
If x is a big positive number, sigmoid will be close to 1; if x is a big negative number, it will be close to 0.
● Softmax: Usually used in the output layer for classification tasks with multiple categories (like classifying whether an image is of a cat, dog, or bird). It converts numbers into probabilities that add up to 1.
f(x_i) = e^(x_i) / Σ_j e^(x_j)

3. Loss Functions
A loss function tells us how "wrong" the network's predictions are compared to the actual correct answers. It's like a score that measures how much error is in the predictions, and the goal of training is to minimize this loss. Some common loss functions:
● Mean Squared Error (MSE): Used when predicting real numbers (like the price of a house). It measures the average squared difference between predicted and actual values.
MSE = (1/n) Σ_i (y_predicted − y_actual)²
● Cross-Entropy Loss: Used for classification problems, where the goal is to choose between multiple classes (like apple or banana). It penalizes wrong predictions more heavily when the predicted probability is far from the actual answer.
Loss = −Σ_i y_actual · log(y_predicted)
Cross-entropy loss helps with tasks where you are choosing between categories (like: is the fruit an apple or a banana?).
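To see these functions in action, here is a small NumPy sketch; the score values are made up for illustration. Note the max-subtraction inside softmax, a standard trick for numerical stability.

import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max so exp() never overflows
    return e / e.sum()

scores = np.array([-5.0, 3.0, 0.5])
print(relu(scores))      # [0.  3.  0.5] : negatives become zero
print(sigmoid(scores))   # every value squashed into (0, 1)
print(softmax(scores))   # probabilities that sum to 1

# Cross-entropy for a 3-class example where the true class is index 1
y_actual = np.array([0.0, 1.0, 0.0])
y_predicted = softmax(scores)
loss = -np.sum(y_actual * np.log(y_predicted))
print(loss)              # small, because the model is confident in the right class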
4. Optimizers
Optimizers are algorithms that adjust the weights (the numbers that connect neurons) in the neural network to minimize the loss. An optimizer helps the network "learn" by improving its predictions step by step. Some common optimizers:
● SGD (Stochastic Gradient Descent): A simple optimizer that adjusts the weights based on how much the loss would decrease if you changed the weights a little. "Stochastic" means it updates the weights after looking at one or a few examples rather than the whole dataset.
w_new = w_old − learning rate × ∂Loss/∂w
It is slower and can get stuck, but it's straightforward and sometimes works well.
● Adam (Adaptive Moment Estimation): A more advanced optimizer that adjusts the learning rate dynamically based on how the error is changing. It tends to work better than plain SGD in practice.
○ It keeps track of moving averages of the gradients (the slopes that tell how much the loss will change with a small change in weights) and adjusts the learning rate accordingly.

5. Metrics
Metrics are ways of measuring how well the network is performing. They're like a scorecard for how well the model is doing during training. Some common metrics:
● Accuracy: Used for classification problems. It measures how often the network got the correct answer.
Accuracy = (number of correct predictions) / (total number of predictions)
For example, if the model predicts whether a fruit is an apple or a banana and gets 8 out of 10 predictions right, the accuracy is 80%.
● Precision, Recall, F1 Score: Used for more complex tasks like detecting specific events (e.g., a network trying to find spam emails). These metrics go beyond simple accuracy and measure how well the model detects true positives or avoids false positives.
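The following is a short sketch of these metrics using scikit-learn (an assumption: scikit-learn is not used elsewhere in this book, and the label arrays are invented to match the 8-out-of-10 example above).

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels for 10 fruits (1 = apple, 0 = banana)
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 1, 0]   # the model got 8 of 10 right

print("Accuracy:", accuracy_score(y_true, y_pred))    # 0.8
print("Precision:", precision_score(y_true, y_pred))  # of predicted apples, how many were apples
print("Recall:", recall_score(y_true, y_pred))        # of real apples, how many were found
print("F1:", f1_score(y_true, y_pred))                # harmonic mean of precision and recall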
Putting It All Together:
Imagine we are building a model to classify fruits (apple or banana). Here's how all the parts work together:
1. Model: We create a neural network model with dense layers.
2. Activation Functions: We use ReLU in the hidden layers to make decisions and softmax in the output layer to predict the probability of apple vs. banana.
3. Loss Function: We choose cross-entropy loss because this is a classification task.
4. Optimizer: We pick Adam because it helps the model learn faster and more effectively.
5. Metrics: We track accuracy to see how often the model is making correct predictions.
With each step, the neural network adjusts its weights using the optimizer to reduce the loss, which improves its accuracy over time.

In addition to the Sequential model, which is the most straightforward type of neural network model in Keras (or TensorFlow), there are other types of models that allow for more flexibility, especially for complex neural networks. Here are a few common types:

1. Sequential Model
● The Sequential model is the simplest neural network model. Layers are stacked one after the other in a straight line, which is useful for simple, feed-forward neural networks.
● It works well when the model can be described as a sequence of layers where the output of one layer is the input to the next.
Example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(10, input_shape=(4,), activation='relu'))
model.add(Dense(2, activation='softmax'))

2. Functional API
● The Functional API is more flexible than the Sequential model and allows for the creation of complex models where layers may have multiple inputs or outputs, share layers, or connect in non-linear ways (like in branching networks or residual networks).
● This is useful when you need more control over how layers are connected.
Example:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# Input layer
inputs = Input(shape=(4,))

# Hidden layer
x = Dense(10, activation='relu')(inputs)

# Output layer
outputs = Dense(2, activation='softmax')(x)

# Build the model
model = Model(inputs=inputs, outputs=outputs)

● Here, you define the connections between layers explicitly, which is useful for models like multi-input/multi-output networks or when layers are reused.

3. Subclassing Model
● Model Subclassing is the most flexible way to create custom models, by subclassing the Model class. It allows you to define your own forward pass (how the inputs move through the network) and gives full control over the model's behavior.
● This is useful for very customized architectures where neither Sequential nor Functional API models are sufficient.
Example:

from tensorflow.keras import Model
from tensorflow.keras.layers import Dense

class CustomModel(Model):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.dense1 = Dense(10, activation='relu')
        self.dense2 = Dense(2, activation='softmax')

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

# Instantiate the model
model = CustomModel()

● You define the layers in __init__ and control the forward pass in the call method. This allows for maximum flexibility in building the model.

4. Model with Shared Layers
● Some models use shared layers, where the same layer is reused multiple times in different parts of the model. This is often used in models like Siamese networks (used for tasks like face recognition), where the same network processes two different inputs.
Example:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# Shared layer
shared_dense = Dense(10, activation='relu')

# Inputs
input1 = Input(shape=(4,))
input2 = Input(shape=(4,))

# Shared processing
output1 = shared_dense(input1)
output2 = shared_dense(input2)

# Create the model
model = Model(inputs=[input1, input2], outputs=[output1, output2])
● Here, the Dense(10) layer is shared, meaning both inputs pass through the same layer, which can be useful in tasks where we want to learn common features.

5. Multi-Input and Multi-Output Models
● These models take multiple inputs and produce multiple outputs, which is useful in complex applications like question-answering systems, recommendation systems, or image captioning.
Example:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, concatenate

# Inputs
inputA = Input(shape=(4,))
inputB = Input(shape=(6,))

# Hidden layers for both inputs
x = Dense(8, activation='relu')(inputA)
y = Dense(8, activation='relu')(inputB)

# Merge the outputs
merged = concatenate([x, y])

# Final output
z = Dense(1, activation='sigmoid')(merged)

# Create the model
model = Model(inputs=[inputA, inputB], outputs=z)

● Here, the model takes two different inputs (of different sizes), combines them after separate processing, and then produces one final output.

6. Autoencoders
● Autoencoders are a special type of neural network model used for unsupervised learning tasks like data compression or anomaly detection. They consist of two parts: an encoder that compresses the input and a decoder that reconstructs the input from the compressed version.
Example:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# Input layer
input_layer = Input(shape=(4,))

# Encoder
encoded = Dense(2, activation='relu')(input_layer)

# Decoder
decoded = Dense(4, activation='sigmoid')(encoded)

# Autoencoder model
autoencoder = Model(inputs=input_layer, outputs=decoded)

● The autoencoder reduces the dimensions of the data and then tries to reconstruct the original input.

7. GANs (Generative Adversarial Networks)
● GANs are a type of neural network model that consists of two parts: a generator that creates fake data and a discriminator that tries to tell real data from fake. GANs are used to generate new data like images or audio.
Example (simplified structure):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Generator model
generator = Sequential()
generator.add(Dense(10, input_dim=100, activation='relu'))
generator.add(Dense(4, activation='sigmoid'))  # Generates a 4-feature fake example

# Discriminator model
discriminator = Sequential()
discriminator.add(Dense(10, input_dim=4, activation='relu'))
discriminator.add(Dense(1, activation='sigmoid'))  # Predicts whether the input is real or fake

● GANs are trained by making the generator fool the discriminator while the discriminator tries to get better at identifying fakes.

Summary
● Sequential: Simple, layers stacked one after the other.
● Functional API: Flexible, allows for more complex architectures like multi-input, multi-output, or shared layers.
● Subclassing: Full control over the network's structure and forward pass.
● Shared Layers: Reuses the same layer in different parts of the model.
● Multi-Input/Output: Models that handle multiple inputs and outputs simultaneously.
● Autoencoders: For compression and reconstruction tasks.
● GANs: Models that generate new data by training two networks (generator and discriminator).
Each of these models has specific uses and allows neural networks to solve a variety of problems, from simple classification to complex generative tasks.

Here are some common types of loss functions:

1. Mean Squared Error (MSE)
● Type: Regression (predicting continuous values)
● Use Case: Used when predicting a real number (e.g., house price, temperature).
● How it works: It calculates the squared difference between predicted values and actual values, then averages it over all examples.
MSE = (1/n) Σ_i (y_predicted − y_actual)²
○ Explanation: If the prediction is 5 and the actual value is 3, the difference is 2, and its square is 4. Squaring emphasizes larger differences, making the network learn to reduce large errors.

2. Mean Absolute Error (MAE)
● Type: Regression
● Use Case: Used when predicting continuous values, similar to MSE.
● How it works: It calculates the absolute difference between the predicted and actual values and averages it over all examples.
MAE = (1/n) Σ_i |y_predicted − y_actual|
○ Explanation: MAE is similar to MSE, but it uses absolute differences instead of squares. This makes MAE less sensitive to outliers than MSE because it doesn't square the error.

3. Binary Cross-Entropy (Log Loss)
● Type: Binary Classification
● Use Case: Used when classifying between two classes (e.g., cat or dog, apple or banana).
● How it works: It calculates the negative log of the predicted probability for the actual class. For binary classification, it looks at one output neuron that predicts a probability between 0 and 1.
Loss = −( y_actual · log(y_predicted) + (1 − y_actual) · log(1 − y_predicted) )
○ Explanation: If the actual label is 1 (e.g., it's a cat) and the model predicts 0.8 (80% confidence), the loss will be small. But if the model predicts 0.1 (only 10% confidence it's a cat), the loss will be large. This encourages the model to give high probabilities to correct predictions.

4. Categorical Cross-Entropy
● Type: Multi-class Classification
● Use Case: Used for classification when there are more than two classes (e.g., dog, cat, rabbit).
● How it works: Similar to binary cross-entropy but for multiple classes. The model predicts a probability distribution over several classes, and categorical cross-entropy calculates how well the predicted probabilities match the actual class.
Loss = −Σ_i y_actual,i · log(y_predicted,i)
○ Explanation: The loss is low when the predicted probability is high for the actual class. For example, if the actual class is "dog" and the model predicts 80% for "dog," the loss will be small. If it predicts 20%, the loss will be larger.

5. Sparse Categorical Cross-Entropy
● Type: Multi-class Classification
● Use Case: Similar to categorical cross-entropy but used when the labels are integers instead of one-hot encoded vectors. It's useful for efficiency when you have many classes.
● How it works: The same as categorical cross-entropy, but it expects the target labels to be integers (like 0, 1, 2 for dog, cat, rabbit) rather than one-hot encoded vectors.

6. Hinge Loss
● Type: Binary Classification (often used with Support Vector Machines)
● Use Case: Used in binary classification tasks, particularly in support vector machines (SVMs).
● How it works: Hinge loss ensures that the correct class has a margin of at least 1 over the incorrect class. It penalizes predictions that are wrong or too close to the decision boundary.
Loss = max(0, 1 − y_actual · y_predicted)
○ Explanation: If the actual class is +1 and the predicted output is +0.9, the loss will be small. But if the predicted output is +0.1 or negative (the wrong class), the loss will be large.

7. Huber Loss
● Type: Regression
● Use Case: A combination of MSE and MAE, used when you want to be robust against outliers while still penalizing large errors.
● How it works: For small errors it behaves like MSE (squares the error), and for large errors it behaves like MAE (linear).
Loss = ½ (y_predicted − y_actual)² for |y_predicted − y_actual| ≤ δ; otherwise Loss = δ (|y_predicted − y_actual| − ½ δ)
○ Explanation: It's useful when you want the best of both worlds: minimizing large errors like MSE, but without being too sensitive to outliers, like MAE.

8. Kullback-Leibler Divergence (KL Divergence)
● Type: Classification, often used in probabilistic models
● Use Case: Measures how one probability distribution differs from a reference distribution. Used in tasks like training variational autoencoders (VAEs) and reinforcement learning.
● How it works: It calculates how one probability distribution (predicted) diverges from another (actual).
KL(P || Q) = Σ_x P(x) · log(P(x) / Q(x))
○ Explanation: If the predicted probability distribution is very different from the actual distribution, the loss will be high. This encourages the model to predict distributions closer to the true distribution.

9. Poisson Loss
● Type: Regression, often for count-based data
● Use Case: Used when predicting count data, such as the number of occurrences of an event (e.g., number of emails received in a day).
● How it works: It assumes that the output follows a Poisson distribution and penalizes predictions that are far from the actual count.
Loss = y_predicted − y_actual · log(y_predicted)
○ Explanation: The loss is small when the predicted count is close to the actual count, and large when the predicted count is far off.

Summary of Loss Functions by Type:
● Regression (predicting real numbers):
○ Mean Squared Error (MSE): Penalizes large errors heavily.
○ Mean Absolute Error (MAE): Penalizes all errors equally.
○ Huber Loss: A mix of MSE and MAE, less sensitive to outliers.
○ Poisson Loss: For count data.
● Binary Classification (two classes):
○ Binary Cross-Entropy: Used when classifying two categories (e.g., cat vs. dog).
○ Hinge Loss: Used in SVMs to maximize the margin between classes.
● Multi-Class Classification (more than two classes):
○ Categorical Cross-Entropy: For multi-class classification with one-hot encoded labels.
○ Sparse Categorical Cross-Entropy: For multi-class classification with integer labels.
○ KL Divergence: Measures the difference between predicted and actual probability distributions.
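Most of these loss functions are available directly in Keras, so you rarely need to implement them by hand. Here is a brief sketch computing a few of them on toy values; the tensors are invented examples, not data from this book.

import tensorflow as tf

# Classification: one-hot truth (the fruit is an apple) vs. predicted probabilities
y_true = tf.constant([[1.0, 0.0]])
y_pred = tf.constant([[0.8, 0.2]])
cce = tf.keras.losses.CategoricalCrossentropy()
print(float(cce(y_true, y_pred)))   # small loss: confident and correct (= -log 0.8)

# Regression: target 3.0, prediction 5.0
t = tf.constant([[3.0]])
p = tf.constant([[5.0]])
print(float(tf.keras.losses.MeanSquaredError()(t, p)))   # (5 - 3)^2 = 4
print(float(tf.keras.losses.Huber(delta=1.0)(t, p)))     # linear beyond delta, so < MSE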
CHAPTER-1: GenAI and LLM

Large language models (LLMs) and generative AI (GenAI) are both types of artificial intelligence (AI) that can be used to create content, but they have different capabilities and uses:

Generative AI: A broad category of AI that can create a variety of content, such as text, images, videos, audio, and computer code. GenAI can be trained to respond to prompts or requests from users. For example, GenAI can be used to compose music, design graphics, or diagnose diseases from medical images.

LLMs: A specific type of generative AI that focuses on language-related tasks, such as generating and understanding human-like text. LLMs are trained on large amounts of data to create new combinations of text that mimic natural language. LLMs are used in a variety of applications, including customer service, drafting emails, and summarizing documents.

LLMs and GenAI can be used together to enhance a variety of applications, such as ecommerce, conversational search, and enterprise search. For example, ecommerce websites can use LLMs and GenAI to personalize the shopping experience for customers.
2. LLM Types

1. General-Purpose LLMs
● GPT (Generative Pre-trained Transformer): Developed by OpenAI, GPT models (like GPT-2, GPT-3, and GPT-4) are autoregressive models that generate coherent text based on input prompts. These are widely used in tasks like text generation, translation, and summarization.
● BERT (Bidirectional Encoder Representations from Transformers): Created by Google, BERT is a transformer model designed to understand context in both directions, making it effective for tasks like question answering and sentiment analysis.

2. Multilingual LLMs
● mBERT (Multilingual BERT): A variant of BERT trained on data from multiple languages, making it suitable for multilingual text processing tasks.
● XLM-R (Cross-lingual Language Model): A multilingual variant of RoBERTa, trained on more than 100 languages, designed for cross-lingual tasks like translation and multilingual sentence representation.

3. Instruction-Following LLMs
● InstructGPT: A version of GPT-3 fine-tuned using Reinforcement Learning from Human Feedback (RLHF) to better follow user instructions.
● FLAN (Fine-Tuned Language Net): Developed by Google, FLAN is fine-tuned on task instructions, making it highly effective in zero-shot and few-shot learning tasks.

4. Conversational LLMs
● DialoGPT: A GPT-2-based model fine-tuned for conversation, designed for more natural and coherent dialogues.
● BlenderBot: A conversational model developed by Meta, designed for long-term dialogue and more complex conversations.

5. Code Generation LLMs
● Codex: A GPT-based model trained by OpenAI specifically for generating code from natural language. It powers tools like GitHub Copilot.
● CodeBERT: A model designed for programming tasks like code generation, code search, and code summarization.

6. Specialized LLMs
● BioBERT: A version of BERT specialized for biomedical text mining and tasks in bioinformatics.
● ClinicalBERT: A variant of BERT trained on clinical notes and datasets for healthcare applications.
● FinBERT: Designed for financial sentiment analysis, FinBERT is a BERT model fine-tuned on financial text.

7. Knowledge-Enhanced LLMs
● T5 (Text-to-Text Transfer Transformer): Google's T5 converts all NLP tasks into a text-to-text format, including question answering, translation, and summarization.
● RAG (Retrieval-Augmented Generation): A hybrid model that combines a language model with a retrieval system, allowing it to fetch relevant external knowledge during generation.

8. Multimodal LLMs
● CLIP (Contrastive Language-Image Pretraining): Developed by OpenAI, CLIP learns to understand text and images in a unified way, excelling in tasks like image captioning and image classification.
● DALL-E: An image generation model that creates images based on textual descriptions, leveraging multimodal capabilities.

9. Compression and Parameter-Efficient LLMs
● DistilBERT: A smaller, faster, and more efficient variant of BERT, trained using knowledge distillation to achieve similar performance with fewer parameters.
● ALBERT (A Lite BERT): A more parameter-efficient version of BERT that reduces memory footprint and training time without compromising much on accuracy.

10. Large Language Models with Memory
● RETRO (Retrieval-Enhanced Transformer): Developed by DeepMind, RETRO uses a retrieval mechanism to access external databases during text generation, allowing it to generate long, coherent text with less computation.
● MemGPT: A GPT variant that incorporates a memory mechanism to handle complex, long-range dependencies in text.

11. Few-Shot and Zero-Shot LLMs
● GPT-3/4 Few-Shot Learning: These models can perform tasks with only a handful of examples supplied in the prompt (few-shot) or even without any task-specific examples (zero-shot), making them versatile for a wide range of applications (a prompt sketch appears after the GPT section below).
● T0: A fine-tuned model from Hugging Face, trained to perform multiple tasks in a zero-shot setting using prompts.

12. Reinforcement Learning-Based LLMs
● ChatGPT (GPT-3/4 + RLHF): ChatGPT is trained with reinforcement learning from human feedback (RLHF) to ensure safer and more helpful interactions during conversations.
● Sparrow: Developed by DeepMind, Sparrow is trained via RLHF to provide more accurate and less harmful answers while following safety guidelines.

3. Popular LLMs

1. OpenAI GPT (Generative Pre-trained Transformer)
● Developers: OpenAI
● Notable Models: GPT-2, GPT-3, GPT-4, ChatGPT
● Architecture: Decoder-only transformer architecture; autoregressive models.
● Core Features:
○ GPT-3 has 175 billion parameters; GPT-4's parameter count has not been publicly disclosed, though it is widely believed to be larger.
○ These models are pre-trained on a massive corpus of data and are fine-tuned to perform various natural language processing tasks like text generation, summarization, translation, and more.
○ GPT-4 is multimodal, meaning it can accept both image and text inputs, making it more versatile than its predecessors.
● Use Cases: Used extensively in conversational AI (ChatGPT), code generation (Codex), content creation, and research assistance.
References:
● OpenAI Research
● GPT-4 Technical Paper arXiv
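Before moving on to the other models, here is what the few-shot prompting described in section 11 above looks like in practice. This is an illustrative sketch only: the reviews and labels are invented, and the resulting string could be sent to any instruction-following LLM.

# Build a few-shot prompt: a task instruction followed by labeled examples,
# then the new case for the model to complete.
examples = [
    ("The battery died after an hour.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
]
query = "The screen is gorgeous but the speakers crackle."

prompt = "Classify the sentiment of each review as positive or negative.\n\n"
for text, label in examples:                      # these are the "few shots"
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"          # the model completes this line

print(prompt)

With zero-shot prompting, the examples list is simply empty: the model gets only the instruction and the query, and must rely entirely on what it learned during pre-training.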
2. LLaMA (Large Language Model Meta AI)
● Developers: Meta (Facebook AI)
● Notable Models: LLaMA, LLaMA 2
● Architecture: Transformer-based architecture, designed to be more efficient and accessible with fewer parameters compared to GPT models.
● Core Features:
○ The model is available in different sizes (7B, 13B, 33B, and 65B parameters), focusing on lower computational costs while maintaining high performance.
○ It has been specifically optimized to reduce resource usage, making it more accessible for research and practical applications.
○ LLaMA models are open-source, unlike GPT, which is proprietary.
● Use Cases: Research on language tasks, including text classification, question answering, and text generation.
References:
● Meta AI LLaMA Release
● LLaMA 2 Overview

3. Google Gemini
● Developers: Google DeepMind
● Notable Models: Gemini 1, Gemini 1.5 (upcoming)
● Architecture: Based on the PaLM (Pathways Language Model) architecture but includes multimodal capabilities like GPT-4, meaning it can handle both image and text inputs.
● Core Features:
○ Gemini integrates reinforcement learning from human feedback (RLHF), making it more reliable for real-world applications.
○ The model is designed to handle multimodal inputs (text and images), improving its use in tasks requiring visual and textual data.
● Use Cases: Search enhancements, AI assistants like Bard, translation, and research.
References:
● Google Gemini Announcement
● DeepMind Research

4. Claude (Claude 1, Claude 2)
● Developers: Anthropic
● Architecture: Similar to GPT, based on the transformer architecture but with a specific focus on safety and alignment, ensuring the model produces less harmful outputs.
● Core Features:
○ Anthropic's focus is on building "helpful, honest, and harmless" AI systems, leading to a model that emphasizes human-centered values.
○ Claude models are named after Claude Shannon, the father of information theory, and are primarily designed for conversational agents.
● Use Cases: Chatbots, customer service, task automation, and conversational AI.
References:
● Anthropic Research

5. PaLM (Pathways Language Model)
● Developers: Google AI
● Notable Models: PaLM 2
● Architecture: Transformer-based model with a focus on scaling across multiple languages and modalities.
● Core Features:
○ PaLM 2 is capable of understanding and generating text in over 100 languages and is trained to handle a variety of modalities, including image and text.
○ It emphasizes efficiency and is highly scalable, designed to be part of Google's larger AI ecosystem, integrating with models like Gemini.
● Use Cases: Translation, summarization, text-to-image, and research.
References:
● Google PaLM Overview

6. BLOOM
● Developers: BigScience (an open-science collaboration)
● Notable Models: BLOOM-176B
● Architecture: Transformer-based model similar to GPT, with a multilingual focus.
● Core Features:
○ BLOOM supports 46 natural languages and 13 programming languages, making it one of the most accessible LLMs for diverse linguistic research.
○ It is open-source and community-driven, aiming to democratize access to large-scale AI models.
● Use Cases: Language generation, multilingual translation, code generation, and research.
References:
● BigScience BLOOM

7. Grok (xAI)
● Developers: xAI (Elon Musk's company)
● Notable Models: Grok (still in development)
● Architecture: Expected to be transformer-based and fine-tuned on various complex reasoning tasks, but specific details are not yet public.
● Core Features: Grok aims to focus on better understanding reasoning and problem-solving tasks, possibly leveraging large datasets similar to GPT models.
● Use Cases: Still speculative, but likely similar to other general-purpose models, with a focus on reasoning and conversational abilities.
References:
● xAI Grok Overview

8. Mistral
● Developers: Mistral AI
● Notable Models: Mistral 7B
● Architecture: Transformer-based model designed for efficiency, with fewer parameters but high performance.
● Core Features:
○ Focused on parameter efficiency, Mistral provides competitive performance despite its smaller model size compared to GPT or PaLM.
● Use Cases: NLP tasks such as text generation, summarization, and translation.
References:
● Mistral AI

Conclusion:
Each LLM has unique features and strengths tailored to different use cases:
● OpenAI GPT is strong in general-purpose language tasks and generation.
● LLaMA offers a more accessible and efficient alternative for researchers.
● Gemini emphasizes multimodality and reinforcement learning.
● BLOOM stands out with its multilingual capabilities.
● Claude focuses on safety and human alignment, while PaLM emphasizes scalability across languages and modalities.

4. Open source LLMs
1. BERT (Bidirectional Encoder Representations from Transformers)
● Model: BERT base-uncased
● Description: One of the most widely used models for tasks like text classification, question answering, and named entity recognition. It uses a bidirectional transformer architecture that reads text from both directions.
● Parameters: 110 million
● Use Cases: Sentiment analysis, text classification, question answering.

2. GPT-2
● Model: GPT-2
● Description: A generative model from OpenAI designed for text generation. It predicts the next word in a sequence, making it great for creative text generation tasks.
● Parameters: 1.5 billion
● Use Cases: Text generation, summarization, and dialogue systems.

3. RoBERTa
● Model: RoBERTa base
● Description: A variant of BERT with optimized training techniques, RoBERTa is fine-tuned for better performance on downstream tasks.
● Parameters: 125 million
● Use Cases: Text classification, question answering, natural language inference.

4. T5 (Text-to-Text Transfer Transformer)
● Model: T5 base
● Description: T5 reframes every NLP task as a text-to-text problem, making it incredibly versatile. It is used for tasks like translation, summarization, and text generation.
● Parameters: 220 million
● Use Cases: Summarization, translation, question answering.

5. BLOOM (BigScience Large Open-science Open-access Multilingual Language Model)
● Model: BLOOM
● Description: A multilingual LLM supporting 46 languages and 13 programming languages, BLOOM is an open-science model designed for research and NLP tasks.
● Parameters: 176 billion
● Use Cases: Multilingual NLP, text generation, translation, code generation.

6. DistilBERT
● Model: DistilBERT base-uncased
● Description: A lighter, faster version of BERT that retains 97% of its language understanding capabilities while being more computationally efficient.
● Parameters: 66 million
● Use Cases: Text classification, sentiment analysis, question answering.

7. XLM-R (XLM-RoBERTa)
● Model: XLM-R large
● Description: A cross-lingual version of RoBERTa, pre-trained on 100 languages, making it useful for tasks in multilingual contexts.
● Parameters: 550 million
● Use Cases: Multilingual text classification, translation, and named entity recognition.

8. BART (Bidirectional and Auto-Regressive Transformers)
● Model: BART base
● Description: A transformer model that combines a bidirectional encoder with an autoregressive decoder, designed for text generation and summarization.
● Parameters: 140 million
● Use Cases: Text summarization, machine translation, and question answering.

9. Flan-T5
● Model: Flan-T5
● Description: An extension of T5 that is fine-tuned on a variety of instruction-based tasks, making it highly versatile for few-shot and zero-shot learning.
● Parameters: 780 million
● Use Cases: Text summarization, translation, few-shot learning.

10. CodeBERT
● Model: CodeBERT
● Description: Pretrained on both natural language and programming languages, CodeBERT is specifically optimized for source-code-related tasks.
● Parameters: 125 million
● Use Cases: Code generation, code search, code summarization.
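All of the models above are published on the Hugging Face Hub, so a quick way to try them is the transformers pipeline API. The sketch below assumes the transformers library and a backend such as PyTorch are installed (pip install transformers torch); the first run downloads the model weights. The model ids shown ("gpt2", "bert-base-uncased") are the standard Hub names.

from transformers import pipeline

# Text generation with GPT-2 (a decoder-style model)
generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=20)
print(result[0]["generated_text"])

# Masked-word prediction with BERT (an encoder-style model)
fill = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill("Paris is the [MASK] of France."):
    print(candidate["token_str"], round(candidate["score"], 3))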
CHAPTER 2: LLM Architecture

5. LLM Transformer architecture

1. Encoder Architecture
● Purpose: The encoder architecture is designed to understand input. It reads and processes text to capture its meaning.
● How it works: Imagine you're trying to understand a sentence in a book. The encoder takes in every word, processes it, and tries to understand the whole text by relating the words to one another.
● Example: The BERT model is a popular encoder-based architecture. It's great at understanding context, like figuring out the meaning of a sentence by looking at all the words.

2. Decoder Architecture
● Purpose: The decoder architecture focuses on generating text based on some input or prompt.
● How it works: Picture yourself trying to write a story. The decoder takes a starting point (like a topic) and continues generating text based on the patterns it has learned.
● Example: Models like GPT (Generative Pre-trained Transformer) are decoders. They're used for generating long pieces of text, answering questions, and creating dialogue.

3. Encoder-Decoder Architecture
● Purpose: This type combines both encoder and decoder to read input and generate a response.
● How it works: Imagine you're translating a sentence from one language to another. The encoder first reads and understands the sentence, and the decoder then generates the translation.
● Example: T5 and BART are examples of encoder-decoder models, commonly used for tasks like machine translation and summarization.

The Transformer architecture consists of an encoder-decoder structure, but LLMs such as GPT, BERT, and others often use either just the encoder (BERT) or just the decoder (GPT), depending on the task. These components are built from layers of self-attention mechanisms and feed-forward neural networks.

Key Components:
● Self-Attention Mechanism:
○ Self-attention allows the model to weigh the importance of each word in a sentence relative to the others. This is done by computing three vectors for each word: query (Q), key (K), and value (V). These vectors help the model determine how much attention to pay to each word when generating an output.
○ The formula for attention is:
Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
○ This mechanism allows the model to capture dependencies between words regardless of their distance in the sentence.
● Positional Encoding:
○ Unlike RNNs or LSTMs, Transformers do not process tokens sequentially. Instead, they process all tokens at once. To capture the order of the words in the sequence, a positional encoding is added to the input embeddings.
○ The positional encoding adds information about each token's position using sine and cosine functions of different frequencies.
● Feed-Forward Network (FFN):
○ Each attention block is followed by a fully connected feed-forward network, which processes the outputs of the self-attention mechanism.
○ This layer applies a linear transformation, followed by a non-linear activation function (usually ReLU), and then another linear transformation.
● Multi-Head Attention:
○ Instead of calculating attention just once, the Transformer model calculates it multiple times in parallel, referred to as "multi-head attention." Each attention head can focus on different parts of the sentence, helping the model capture richer contextual information.
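The attention formula above is compact enough to implement directly. Below is a minimal single-head NumPy sketch: the Q, K, and V matrices are random stand-ins for projected token embeddings, whereas a real transformer learns the projection weights that produce them.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how strongly each token attends to each other token
    weights = softmax(scores)         # each row is a probability distribution (sums to 1)
    return weights @ V, weights       # output rows are weighted mixtures of the value vectors

# Three tokens, each with 4-dimensional query/key/value vectors (random for illustration)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights)   # 3x3 attention matrix: row i shows where token i "looks"
print(output)    # context-enriched representation of each token

Multi-head attention simply runs several copies of this computation in parallel (each with its own learned projections) and concatenates the results.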
  • 50. LLMs and GenAI Simplified: An Easy Path to Understanding Let’s break down how a sentence like "Good morning" is translated into French using an Encoder-Decoder architecture, step by step. The architecture is powered by transformers, which have multiple layers involving attention, feed-forward networks, and other components. I’ll explain it in a simple, understandable way, imagining the model translating "Good morning" to "Bonjour." Step 1: Input (Good morning) The input sentence "Good morning" is first converted into numbers (tokens). These tokens represent each word so that the model can understand and process the input. For example: ● "Good" becomes token 12. ● "Morning" becomes token 34. So the input becomes: [12, 34]. Encoder Steps: Processing the Input Sentence 1. Embedding Layer ○ The tokens [12, 34] are turned into word embeddings—vectors that contain information about the meaning of each word. ○ Imagine each word becomes a detailed vector (a list of numbers) that tells the model more about the word's properties and relationships to other words. 2. Positional Encoding ○ Since word order matters (e.g., "Good morning" is different from "Morning good"), a positional encoding is added to the word embeddings. This helps the model understand the position of each word in the sentence. 3. After this step, we have vectors for "Good" and "Morning" that include both meaning and position. 4. Self-Attention ○ Attention is like a smart highlighter. It allows the model to focus on important words when processing the sentence. ○ For "Good morning", the attention mechanism compares "Good" with "Morning" and checks how much each word contributes to the meaning of the whole sentence. 5. The result is that both "Good" and "Morning" get updated to reflect the overall meaning of the sentence. LLMs and GenAI Simplified Created by srini pusuluri CRM/AI architect - Last Updated Oct 8 2024 50
4. Feed-Forward Neural Network
○ After the attention step, each word vector goes through a feed-forward network, which is like a math function that adds more depth and non-linearity to the information. This helps the model capture complex patterns in the data.
○ At this point, the sentence has been transformed into deep, meaningful representations that the encoder can understand well.
5. Multi-Head Attention
○ In practice, attention is applied multiple times, from different perspectives. This is called multi-head attention. Each attention head focuses on different parts of the sentence, like meaning, structure, or relationships between words.
○ All these attention results are combined to further enhance the representation of the input.
6. Output of Encoder
○ The encoder finishes by outputting a detailed, transformed version of the sentence, ready for the decoder. It doesn't translate yet—it just understands "Good morning" deeply.

Decoder Steps: Generating the Translation ("Bonjour")
1. Start Token
○ The decoder begins with a special start token to signal that it's time to generate the translation. For French, this token might represent the start of a French sentence.
2. Attention Over Encoder Output
○ Now, the decoder needs to look at the encoder's output (the detailed representation of "Good morning"). It uses attention again, called encoder-decoder attention, to focus on relevant parts of the encoder's output.
○ For instance, the decoder will focus on both "Good" and "Morning" to decide how to begin translating.
3. Feed-Forward Network
○ Like in the encoder, the decoder also has a feed-forward network to further process the data and ensure it generates the right translation.
4. Self-Attention
○ The decoder also applies self-attention to its own output to ensure it makes sense. This helps the decoder generate words one by one while keeping track of the sentence's structure.
5. Generate "Bonjour" (Word-by-Word)
○ Now, the model begins to generate the French translation, one word at a time.
○ First, it generates the word "Bonjour" because the model has learned that "Good morning" translates to "Bonjour" in French.
6. Softmax Layer (Word Prediction)
○ After generating "Bonjour", the model passes the prediction through a softmax layer. This step calculates the probability of each possible word in the French language and selects the most likely one.
○ For example, the softmax might calculate probabilities for words like "Bonjour" (80%), "Salut" (15%), and "Au revoir" (5%). Since "Bonjour" has the highest probability, it is chosen as the next word in the translation.
7. Repeat for More Words
○ The decoder continues generating words using the same process until it reaches a special end token, signaling the end of the translation.

Putting It All Together:
1. The encoder processes and deeply understands "Good morning".
2. The decoder starts generating the translation, word by word, using the encoder's output.
3. With each step, attention mechanisms help the model focus on important words, and the softmax layer ensures that the right word is chosen based on probability.
4. Finally, the model generates "Bonjour", which is the correct French translation of "Good morning". A small numeric sketch of the softmax step follows.
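Here is a tiny sketch of the softmax word-prediction step. The vocabulary and logit values are invented for illustration and chosen so the probabilities roughly match the 80%/15%/5% numbers in the walkthrough:

import numpy as np

# Hypothetical decoder logits for three candidate French words (made-up scores)
vocab = ["Bonjour", "Salut", "Au revoir"]
logits = np.array([4.0, 2.3, 1.2])

probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the candidates
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.0%}")

print("Chosen word:", vocab[int(np.argmax(probs))])  # greedy selection

Greedy selection (argmax) is the simplest decoding strategy; real systems often use sampling or beam search instead of always taking the single most probable word.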
CHAPTER 3: LLM Applications

6. LLM Gen AI use cases

1. Text Generation
● Use Case: Content generation for blogs, articles, marketing materials, or even creative writing.
● Models: GPT-2, GPT-3, and BLOOM.
● Example: Automatically generating text based on prompts, such as product descriptions or long-form articles.

2. Question Answering
● Use Case: Building intelligent assistants or chatbots that can answer questions based on a knowledge base or real-time input.
● Models: BERT, RoBERTa, T5, and DistilBERT.
● Example: Customer support chatbots that can retrieve and respond to queries using company documentation or FAQs.

3. Text Summarization
● Use Case: Automatic summarization of long documents or reports for efficient consumption.
● Models: BART, T5.
● Example: Summarizing lengthy research papers, legal documents, or meeting minutes into concise, readable summaries.

4. Text Classification
● Use Case: Sentiment analysis, spam detection, and categorizing customer reviews or feedback.
● Models: BERT, DistilBERT, XLNet.
● Example: Sorting emails into categories like spam, promotions, and primary; or identifying the sentiment behind customer reviews.

5. Translation
● Use Case: Language translation for websites, apps, or business communications.
● Models: MarianMT, M2M100.
● Example: Translating product descriptions into multiple languages for e-commerce platforms.

6. Conversational AI (Chatbots)
● Use Case: Building interactive, conversational agents for customer service or virtual assistants.
● Models: DialoGPT, BlenderBot.
● Example: Creating virtual assistants that can engage in back-and-forth conversations to assist with tasks or answer customer inquiries.

7. Image Generation (Text-to-Image)
● Use Case: Generating images based on textual descriptions.
● Models: DALL-E, Stable Diffusion.
● Example: Creating marketing visuals, concept art, or prototypes based on written inputs.

8. Image Classification
● Use Case: Identifying objects, people, or actions in images.
● Models: ResNet, ViT (Vision Transformer).
● Example: Automated tagging and categorization of images in large databases, or recognizing defects in manufactured products.

9. Image Segmentation
● Use Case: Segmenting parts of images for applications like medical imaging or object detection.
● Models: Mask R-CNN, U-Net.
● Example: Highlighting cancerous tissues in X-ray images or isolating specific objects in satellite imagery.

10. Audio Processing (Speech-to-Text and Text-to-Speech)
● Use Case: Converting speech to text for transcription services, or generating speech from text for virtual assistants or automated systems.
● Models: Wav2Vec 2.0, Tacotron 2.
● Example: Real-time transcription of meetings, or converting text into realistic-sounding speech for voiceovers.

11. Code Generation
● Use Case: Automatic code generation or code completion.
● Models: CodeBERT, Codex.
● Example: Autocompleting code for developers, or generating boilerplate code from high-level descriptions of functionality.

12. Sentiment Analysis
● Use Case: Determining the emotional tone behind a piece of text.
● Models: DistilBERT, RoBERTa.
● Example: Identifying whether customer feedback or social media posts are positive, negative, or neutral.

13. Named Entity Recognition (NER)
● Use Case: Extracting specific information like names, locations, or organizations from unstructured text.
● Models: BERT, Flair.
● Example: Automatically identifying key stakeholders from business documents or extracting product names from reviews.

14. Data Augmentation
● Use Case: Generating synthetic data for training machine learning models, especially in cases where real data is limited.
● Models: T5, GPT-3.
● Example: Augmenting a dataset of medical records with synthetic but realistic data to train models for diagnosis.

15. Image Captioning
● Use Case: Automatically generating captions or descriptions for images.
● Models: CLIP, ViLBERT.
● Example: Describing product images for e-commerce sites or generating alt-text for accessibility on websites.

16. Multi-modal AI
● Use Case: Combining inputs from multiple data types like text and images to generate responses.
● Models: CLIP, Florence.
● Example: Interpreting a text description to retrieve relevant images, or vice versa.

17. Text-Based Games/Interactive Stories
● Use Case: Creating interactive, text-based adventure games or dynamic stories based on user input.
● Models: GPT-3, DialoGPT.
● Example: Generating new scenarios or storylines in a game based on player choices.

18. Knowledge Base Extraction
● Use Case: Automatically generating or updating knowledge bases from unstructured documents.
● Models: T5, BERT.
● Example: Creating structured FAQ documents from customer service interactions or product manuals.

19. Fake News Detection
● Use Case: Identifying and classifying articles or social media posts as misleading or fake news.
● Models: RoBERTa, BERT.
● Example: Filtering and flagging potentially unreliable news sources or claims on social media platforms.

20. Grammar and Style Correction
● Use Case: Automatically correcting grammar, spelling, and style errors in text.
● Models: T5, GPT-3.
● Example: Creating tools for automatic proofreading or improving the writing style of articles.

21. Legal Document Generation
● Use Case: Automating the creation of legal documents like contracts, agreements, or legal briefs.
● Models: GPT-3, T5.
● Example: Drafting legal documents based on predefined templates and input from legal professionals.

22. Paraphrasing
● Use Case: Rewriting or paraphrasing text while maintaining the original meaning, often for content diversification or academic use.
● Models: Pegasus, T5.
● Example: Rewriting articles or sections of text to avoid plagiarism or for content variation.

23. Automated Code Review
● Use Case: Automating the process of reviewing code for potential errors, inefficiencies, or security vulnerabilities.
● Models: CodeBERT, Codex.
● Example: Performing automated code reviews to flag issues or provide suggestions for improvements.

24. Emotion Recognition in Text
● Use Case: Detecting and classifying emotions expressed in text, which can be applied in customer support or content analysis.
● Models: BERT, DistilBERT.
● Example: Analyzing customer complaints to detect emotions like frustration, anger, or satisfaction.

25. Product Recommendation
● Use Case: Generating personalized product recommendations based on user preferences and behaviors.
● Models: BERT, DistilBERT, Transformer models for recommendation.
● Example: Recommending similar or complementary products to users in an e-commerce setting based on their browsing or purchase history.
26. Text-to-Programming Language Conversion
● Use Case: Translating natural language descriptions into executable code.
● Models: Codex, GPT-3.
● Example: Converting user requirements written in plain English into Python or JavaScript code.

27. Style Transfer (Text)
● Use Case: Changing the tone or style of text, such as converting formal writing into casual language or mimicking a particular author's writing style.
● Models: GPT-3, T5.
● Example: Rewriting formal business emails in a more casual tone, or vice versa.

28. Document Comparison
● Use Case: Identifying and comparing differences or similarities between two or more documents.
● Models: BERT, T5.
● Example: Comparing legal contracts or versions of documents to identify key differences or changes.

29. Content Moderation
● Use Case: Detecting inappropriate or harmful content in text, images, or videos for automatic moderation.
● Models: RoBERTa, GPT-3.
● Example: Automatically flagging offensive or harmful language in online forums or social media platforms.

30. Voice Cloning
● Use Case: Generating speech that mimics the voice of a particular person, often used in virtual assistants or content creation.
● Models: Tacotron 2, WaveGlow.
● Example: Cloning a public figure's voice to generate audio clips for educational or entertainment purposes.

31. Image Super-Resolution
● Use Case: Enhancing the resolution of images to improve quality.
● Models: ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks).
● Example: Enhancing low-resolution images for medical diagnostics, satellite imagery, or historical photograph restoration.
32. Code Translation (Language-to-Language)
● Use Case: Converting code from one programming language to another.
● Models: CodeT5, Codex.
● Example: Translating Java code into Python for software porting purposes.

33. Image Inpainting
● Use Case: Filling in missing or corrupted parts of an image.
● Models: LaMa (Large Mask Inpainting).
● Example: Restoring damaged photographs or removing unwanted objects from images.

34. Text-Based Music Generation
● Use Case: Generating musical compositions based on text prompts or descriptions.
● Models: Jukebox (OpenAI), MusicBERT.
● Example: Creating custom music tracks based on user-specified genres or moods.

35. Visual Question Answering (VQA)
● Use Case: Answering questions about the content of an image.
● Models: ViLBERT, CLIP.
● Example: Answering questions about an image's objects, actions, or context in applications like medical imaging or e-commerce.

36. Data-to-Text Generation
● Use Case: Converting structured data into readable text.
● Models: T5, GPT-3.
● Example: Automatically generating written summaries from tables or charts, such as generating financial reports from numerical data.

37. Human Pose Estimation
● Use Case: Detecting human body poses in images or videos for applications like fitness tracking, animation, or security.
● Models: OpenPose, HRNet.
● Example: Analyzing sports performance or guiding fitness exercises by tracking a user's body movements.

38. Time-Series Forecasting
● Use Case: Predicting future values based on historical time-series data.
● Models: Prophet, Temporal Fusion Transformers (TFT).
● Example: Predicting stock prices, energy demand, or sales trends.

39. Reinforcement Learning for Text-Based Tasks
● Use Case: Using reinforcement learning to optimize decision-making in tasks involving text, such as conversation agents or game playing.
● Models: GPT-3 with reinforcement learning (RLHF - Reinforcement Learning from Human Feedback).
● Example: Training a chatbot to maximize customer satisfaction over long conversations.

40. Automated Tagging and Metadata Generation
● Use Case: Automatically generating tags and metadata for content, such as videos or blog posts.
● Models: BERT, RoBERTa.
● Example: Automatically adding keywords and tags to YouTube videos or blog articles to improve SEO.
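Many of the text use cases above can be tried in a few lines with the Hugging Face pipeline API. A minimal sketch follows; the checkpoint names are common public models and can be swapped for others, and the input text is a placeholder:

from transformers import pipeline

# Summarization (use case 3) with a BART checkpoint
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = ("Large language models are neural networks trained on vast text "
           "corpora and adapted to tasks like summarization and translation. ") * 5
print(summarizer(article, max_length=40, min_length=10)[0]["summary_text"])

# Sentiment analysis (use case 12) with the library's default classifier
classifier = pipeline("sentiment-analysis")
print(classifier("The support team resolved my issue quickly!"))

Each pipeline bundles a tokenizer, a model, and task-specific pre/post-processing, which is why a single function call is enough for a first experiment.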
7. LLM Model Parameters

1. Temperature
● Description: Controls the randomness or creativity of the model's output. The temperature parameter adjusts how deterministic or diverse the text generation will be.
● Range: Typically between 0 and 2.
○ Low temperature (e.g., 0.1): Makes the model more deterministic, meaning it will choose the most probable tokens with higher certainty, resulting in more predictable and conservative outputs.
○ High temperature (e.g., 1.5): Increases randomness, encouraging the model to explore less probable tokens, which can result in more creative and varied outputs.
● Use Case: A low temperature is ideal for tasks requiring precise, fact-based outputs (e.g., answering factual questions), while a higher temperature can be used for creative tasks like storytelling or generating diverse outputs.

openai.Completion.create(
    model="text-davinci-003",
    prompt="Tell me a story about a brave knight.",
    temperature=1.0,  # Default
    max_tokens=100
)

2. Max Tokens
● Description: Limits the number of tokens (words or subwords) in the generated output. Each token may represent a word, part of a word, or a punctuation mark.
● Range: Up to the model's token limit (e.g., GPT-3 has a maximum of 4096 tokens).
● Use Case: This parameter is used to control the length of the generated text. For example, shorter text summaries might use a smaller max_tokens value, while longer essays or creative writing might use a larger value.

openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain quantum physics in simple terms.",
    max_tokens=200  # Maximum number of tokens in the response
)

3. Top-k Sampling
● Description: Limits next-token generation to the top k most likely tokens. The model will sample from this limited set instead of considering the entire vocabulary, ensuring that only the most probable tokens are considered.
● Range: Positive integer (e.g., k = 40).
○ Low k: Generates more predictable output.
○ High k: Increases the diversity of the output by allowing less probable tokens to be considered.
● Use Case: Top-k sampling is useful when you want to balance creative output and coherence, by ensuring the model doesn't generate extremely unlikely words but still provides some variety. (Note: the OpenAI Completion API does not expose a top_k parameter; top-k is common in other libraries such as Hugging Face's generate(), so the snippet below is illustrative.)
openai.Completion.create(
    model="text-davinci-003",
    prompt="What is the future of AI?",
    top_k=40,  # Limits sampling to the top 40 tokens (illustrative; not an OpenAI API parameter)
    max_tokens=100
)

4. Top-p Sampling (Nucleus Sampling)
● Description: Top-p sampling (also known as nucleus sampling) selects the smallest possible set of tokens whose cumulative probability exceeds a threshold p. Instead of choosing a fixed number of tokens (as in top-k), top-p dynamically chooses tokens based on their cumulative probability.
● Range: p ∈ [0, 1]
○ Low p (e.g., 0.1): Restricts the model to the highest-probability tokens, resulting in more conservative outputs.
○ High p (e.g., 0.9): Allows more diverse tokens to be considered, increasing creativity in the output.
● Use Case: This is particularly useful in text generation tasks where you want to control the diversity and ensure that tokens with very low probabilities are not selected.

openai.Completion.create(
    model="text-davinci-003",
    prompt="Describe a sunset.",
    top_p=0.9,  # Ensures 90% of the probability mass is used in sampling
    max_tokens=50
)

5. Frequency Penalty
● Description: Reduces the likelihood of the model generating tokens that have already been generated in the current output. This is useful for avoiding repetitive phrases or sentences.
● Range: [-2.0, 2.0]
○ Positive values: Penalize the model for repeating the same tokens, making it less likely to repeat words.
○ Negative values: Encourage the model to repeat tokens more often.
● Use Case: When generating long text, this can be used to reduce repetition and encourage the model to generate more varied content.
openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a paragraph about the importance of education.",
    frequency_penalty=0.5,  # Penalizes token repetition
    max_tokens=100
)

6. Presence Penalty
● Description: Encourages the model to explore new topics or words that haven't appeared in the current output. This increases the likelihood of introducing new tokens into the output.
● Range: [-2.0, 2.0]
○ Positive values: Make the model more likely to introduce new concepts or tokens.
○ Negative values: Encourage the model to stay within the same set of tokens or concepts.
● Use Case: Used when you want the model to be more exploratory and avoid sticking to the same themes.

openai.Completion.create(
    model="text-davinci-003",
    prompt="Give me creative ideas for a tech startup.",
    presence_penalty=0.6,  # Encourages introducing new ideas and concepts
    max_tokens=150
)

7. Stop Sequences
● Description: Defines specific token sequences that will stop the generation process once they are encountered. The output includes the text generated up to that point, but generation halts when the stop sequence is detected.
● Use Case: Useful for controlling when the model should stop generating text. For example, in chatbot conversations, you might use stop sequences to signal the end of a response.
openai.Completion.create(
    model="text-davinci-003",
    prompt="Tell me a joke.",
    stop=["\n", "<|endoftext|>"],  # Stops generation at a newline or the end token
    max_tokens=50
)

8. Best-of (n-best)
● Description: Generates multiple completions for each prompt (e.g., n completions) and returns the one with the highest log-probability. This is useful when you want the best possible output out of several generated options.
● Range: Integer (e.g., best_of=3 generates 3 outputs and selects the best one).
● Use Case: Useful when quality is more important than speed, and you want to ensure that the best possible response is chosen.

openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain the significance of the moon landing.",
    best_of=3,  # Generate 3 completions and return the best one
    max_tokens=150
)

9. Echo
● Description: When set to true, the model returns the prompt in addition to the generated output. This can be useful for debugging or when you want to review the input alongside the output.
● Use Case: Helpful in interactive applications where you want to display both the input and the generated response.
openai.Completion.create(
    model="text-davinci-003",
    prompt="What is artificial intelligence?",
    echo=True,  # Echoes the prompt in the response
    max_tokens=100
)

10. Stream
● Description: When set to true, the model streams the tokens in real time instead of generating the entire output at once. This is useful for real-time applications like chatbots where you want a response to be displayed as it's being generated.
● Use Case: Ideal for interactive applications like live chatbots where the user doesn't want to wait for the entire response to be generated before seeing any output.

openai.Completion.create(
    model="text-davinci-003",
    prompt="What are the benefits of renewable energy?",
    stream=True,  # Stream the output token by token
    max_tokens=150
)

8. LLM benchmarks

1. SuperGLUE
● Focus: Natural language understanding.
● Description: An improvement over the original GLUE benchmark, including more challenging tasks like reading comprehension, coreference resolution, and inference.

2. GLUE (General Language Understanding Evaluation)
● Focus: General natural language understanding tasks.
● Description: Includes tasks such as sentence classification, sentence similarity, and natural language inference.
3. OpenAI HumanEval
● Focus: Code generation.
● Description: Evaluates a model's ability to generate correct Python functions based on natural language descriptions.

4. SQuAD (Stanford Question Answering Dataset)
● Focus: Question answering.
● Description: Evaluates a model's ability to understand and answer questions based on a given passage.

5. MMLU (Massive Multitask Language Understanding)
● Focus: General knowledge across a wide range of subjects.
● Description: Tests models on topics from elementary math to medicine and law.

6. HellaSwag
● Focus: Commonsense reasoning.
● Description: Measures a model's ability to predict the most plausible continuation of a given scenario.

7. BIG-bench (Beyond the Imitation Game Benchmark)
● Focus: Diverse set of tasks.
● Description: A collection of 204 tasks that test models on areas like reasoning, linguistics, mathematics, and general knowledge.

8. LAMBADA
● Focus: Language modeling.
● Description: Tests the model's ability to predict the final word in a sentence when provided with long-range context.

9. TriviaQA
● Focus: Open-domain question answering.
● Description: Includes questions from trivia and a large corpus of text documents to test factual recall.

10. CoQA (Conversational Question Answering)
● Focus: Dialogue-based question answering.
● Description: Evaluates how well a model can answer a series of interrelated questions based on a passage.

11. Winograd Schema Challenge
● Focus: Pronoun disambiguation.
● Description: Tests the model's commonsense reasoning by asking it to resolve ambiguities in sentences.

12. ARC (AI2 Reasoning Challenge)
● Focus: Science question answering.
● Description: Tests models on multiple-choice science questions that require reasoning beyond simple text matching.

13. PIQA (Physical Interaction: Question Answering)
● Focus: Physical reasoning.
● Description: Tests how well a model can reason about the physical world, particularly in everyday human activities.

14. BoolQ (Boolean Questions)
● Focus: Yes/No question answering.
● Description: Involves reading comprehension and answering questions with simple yes or no responses.

15. TyDi QA
● Focus: Multilingual question answering.
● Description: Tests question answering capabilities across multiple languages and varied contexts.

16. StoryCloze
● Focus: Story comprehension.
● Description: Evaluates a model's ability to select the best ending for a given story.

17. WinoGrande
● Focus: Commonsense reasoning.
● Description: A larger and more difficult version of the Winograd Schema Challenge to test commonsense reasoning at scale.
18. DROP (Discrete Reasoning Over Paragraphs)
● Focus: Reading comprehension and arithmetic reasoning.
● Description: Requires models to answer questions that involve discrete reasoning like counting, sorting, or arithmetic.

19. Hendrycks Test
● Focus: Multitask learning.
● Description: Covers multiple-choice questions across topics such as humanities, STEM, and social sciences.

20. XGLUE
● Focus: Multilingual natural language understanding.
● Description: Extends GLUE-style tasks to multiple languages, testing cross-lingual generalization.

21. CodeXGLUE
● Focus: Code understanding and generation.
● Description: A benchmark designed for evaluating models on coding tasks like code generation, translation, and classification.

22. CLUE (Chinese Language Understanding Evaluation)
● Focus: Chinese natural language understanding.
● Description: The Chinese counterpart of GLUE, testing various language tasks in the Chinese language.

9. LLM Finetuning

a) LLM with Prompt Engineering Tuning
Prompt engineering involves designing and refining prompts to improve the performance of language models for specific tasks. This method doesn't require fine-tuning the model itself but focuses on optimizing the input prompts.

Steps:
1. Define the Task: Clearly understand the task you want the model to perform.
2. Design Prompts: Create prompts that provide clear and specific instructions to the model.
3. Test and Refine: Evaluate the model's output and iteratively refine the prompts to get better results.

Example:

import openai

# Set your OpenAI API key
openai.api_key = 'your_openai_api_key'

def get_response(prompt):
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=150
    )
    return response.choices[0].text.strip()

# Define a prompt
prompt = "Explain the causes of the American Civil War in detail."

# Get and print the response
response = get_response(prompt)
print(response)

Resources:
● OpenAI Documentation
● Effective Prompting Techniques

b) LLM Instructions-based Training Tuning
Instructions-based training tuning involves fine-tuning an LLM on a dataset that contains specific instructions and their corresponding completions. This helps the model understand and follow complex instructions more accurately.

Steps:
1. Prepare Data: Create a dataset with prompts (instructions) and completions.
2. Convert to JSONL Format: Format the data as required by OpenAI for fine-tuning.
3. Upload Data: Upload the dataset to OpenAI.
4. Fine-tune the Model: Fine-tune the model with the uploaded dataset.
5. Test the Model: Evaluate the model with new instructions.

Example:

import json
import openai

# Prepare your data
data = [
    {
        "prompt": "List the causes of the American Civil War.",
        "completion": " The causes of the American Civil War include slavery, states' rights, economic disagreements, and political conflicts."
    },
    # Add more prompt-completion pairs
]

# Save to a JSONL file (one JSON object per line)
with open('instruction_data.jsonl', 'w') as outfile:
    for entry in data:
        json.dump(entry, outfile)
        outfile.write('\n')

# Set your OpenAI API key
openai.api_key = 'your_openai_api_key'

# Upload the file
response = openai.File.create(
    file=open("instruction_data.jsonl"),
    purpose='fine-tune'
)
file_id = response['id']

# Create the fine-tune job
response = openai.FineTune.create(
    training_file=file_id,
    model="davinci"  # or another appropriate model
)
fine_tune_id = response['id']

# Monitor the fine-tuning job
import time
while True:
    status = openai.FineTune.retrieve(id=fine_tune_id)['status']
    print(f"Status: {status}")
    if status in ['succeeded', 'failed']:
        break
    time.sleep(30)

# Use the fine-tuned model (the completed job exposes the new model's name)
fine_tuned_model = openai.FineTune.retrieve(id=fine_tune_id)['fine_tuned_model']
response = openai.Completion.create(
    model=fine_tuned_model,
    prompt="Explain the major causes of World War II.",
    max_tokens=150
)
print(response.choices[0].text.strip())

Resources:
● OpenAI Fine-tuning Guide
● How to Fine-tune GPT-3

c) LLM with RAG (Retrieval-Augmented Generation) Fine-tuning

RAG combines retrieval-based methods with generative models to enhance the generation process. The model retrieves relevant documents from a corpus to inform its responses.

Steps:
1. Prepare Data: Create a dataset with context (retrieved documents) and target responses.
2. Set Up Retriever: Use a retriever to fetch relevant documents.
3. Fine-tune the Model: Fine-tune the model with the dataset.
4. Query the Model: Use the model to generate responses based on retrieved contexts.

Example:

import openai
import json

# Prepare your data
data = [
    {
        "prompt": "What is the capital of France?",
        "completion": " The capital of France is Paris."
    },
    # Add more prompt-completion pairs with context
]

# Save to a JSONL file (one JSON object per line)
with open('rag_data.jsonl', 'w') as outfile:
    for entry in data:
        json.dump(entry, outfile)
        outfile.write('\n')

# Set your OpenAI API key
openai.api_key = 'your_openai_api_key'

# Upload the file
response = openai.File.create(
    file=open("rag_data.jsonl"),
    purpose='fine-tune'
)
file_id = response['id']

# Create the fine-tune job
response = openai.FineTune.create(
    training_file=file_id,
    model="davinci"  # or another appropriate model
)
fine_tune_id = response['id']

# Monitor the fine-tuning job
import time
while True:
    status = openai.FineTune.retrieve(id=fine_tune_id)['status']
    print(f"Status: {status}")
    if status in ['succeeded', 'failed']:
        break
    time.sleep(30)

# Use the fine-tuned model (the completed job exposes the new model's name)
fine_tuned_model = openai.FineTune.retrieve(id=fine_tune_id)['fine_tuned_model']
response = openai.Completion.create(
    model=fine_tuned_model,
    prompt="What is the capital of Germany?",
    max_tokens=150
)
print(response.choices[0].text.strip())

Resources:
● OpenAI Documentation
● Retrieval-Augmented Generation (RAG)
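Note that the example above only fine-tunes on question/answer pairs; the retrieval step itself is not shown. As a hedged complement, here is a minimal sketch of the retrieval side using the legacy OpenAI embeddings endpoint (the document corpus, prompt format, and model names are illustrative placeholders):

import numpy as np
import openai

openai.api_key = 'your_openai_api_key'

documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
]

def embed(texts):
    # Legacy embeddings endpoint (openai<1.0), matching the API style used above
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

doc_vectors = embed(documents)

def answer(question):
    q_vec = embed([question])[0]
    # Cosine similarity between the question and every document
    scores = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = documents[int(np.argmax(scores))]  # best-matching document
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=50
    )
    return resp.choices[0].text.strip()

print(answer("What is the capital of Germany?"))

Production RAG systems typically replace the in-memory similarity search with a vector database and retrieve several documents rather than one, but the retrieve-then-generate flow is the same.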
d) LLM with LoRA (Low-Rank Adaptation)

LoRA (Low-Rank Adaptation) is a technique to adapt pre-trained language models efficiently by fine-tuning low-rank matrices added to the model's weights.

Steps:
1. Set Up Environment: Install necessary libraries.
2. Prepare Data: Load and preprocess the dataset.
3. Apply LoRA: Implement LoRA to fine-tune the model.
4. Train the Model: Train the model using LoRA.
5. Evaluate: Test the fine-tuned model.

Example:

# Placeholder for LoRA implementation as specific library support is required
# Check Hugging Face or other relevant libraries for LoRA support

Resources:
● Hugging Face Transformers
● LoRA Paper

e) LLM with QLoRA (Quantized Low-Rank Adaptation)

QLoRA combines quantization and low-rank adaptation to reduce the computational cost of fine-tuning.

Steps:
1. Set Up Environment: Install necessary libraries.
2. Prepare Data: Load and preprocess the dataset.
3. Apply QLoRA: Implement QLoRA to fine-tune the model.
4. Train the Model: Train the model using QLoRA.
5. Evaluate: Test the fine-tuned model.

Example:

# Placeholder for QLoRA implementation as specific library support is required
# Check Hugging Face or other relevant libraries for QLoRA support

Resources:
● Quantization in Deep Learning
● LoRA Paper
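The placeholders above can be filled in with the Hugging Face PEFT library. Below is a minimal, hedged sketch of attaching LoRA adapters to GPT-2; the hyperparameters are illustrative, and QLoRA additionally requires loading the base model in 4-bit precision (via bitsandbytes) before applying the same adapter configuration.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "gpt2"  # a small model, chosen only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Low-rank adapters are added to the attention projection; base weights stay frozen
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's combined query/key/value projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable

# The wrapped model can now be trained with the usual Trainer or a custom loop.

Because only the small adapter matrices receive gradients, LoRA cuts trainable parameters (and optimizer memory) by orders of magnitude compared with full fine-tuning.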
f) LLM with Full Tuning

Full tuning involves training all parameters of the language model on a specific dataset.

Steps:
1. Set Up Environment: Install necessary libraries.
2. Prepare Data: Load and preprocess the dataset.
3. Fine-Tune the Model: Train the entire model on the dataset.
4. Evaluate: Test the fine-tuned model.

Example:

import openai
import pandas as pd
import json

# Set your OpenAI API key
openai.api_key = 'your_openai_api_key'

# Load CSV data
csv_file_path = 'your_data.csv'
df = pd.read_csv(csv_file_path)

# Prepare the training data
def prepare_training_data(df):
    data = []
    for i, row in df.iterrows():
        entry = {
            "prompt": row['Prompt'],
            "completion": " " + row['Completion']
        }
        data.append(entry)
    return data

training_data = prepare_training_data(df)

# Save to a JSONL file (one JSON object per line)
jsonl_file_path = 'training_data.jsonl'
with open(jsonl_file_path, 'w') as outfile:
    for entry in training_data:
        json.dump(entry, outfile)
        outfile.write('\n')

# Upload training data
response = openai.File.create(
    file=open(jsonl_file_path),
    purpose='fine-tune'
)
file_id = response['id']

# Fine-tune the model
response = openai.FineTune.create(
    training_file=file_id,
    model="davinci"
)
fine_tune_id = response['id']

# Monitor the fine-tuning process
import time
while True:
    status = openai.FineTune.retrieve(id=fine_tune_id)['status']
    print(f"Status: {status}")
    if status in ['succeeded', 'failed']:
        break
    time.sleep(30)

# Test the fine-tuned model (the completed job exposes the new model's name)
fine_tuned_model = openai.FineTune.retrieve(id=fine_tune_id)['fine_tuned_model']
response = openai.Completion.create(
    model=fine_tuned_model,
    prompt="Explain the major causes of World War II.",
    max_tokens=150
)
print(response.choices[0].text.strip())

Resources:
● OpenAI Fine-tuning Guide
● Hugging Face Transformers

These examples provide a detailed overview of different fine-tuning and adaptation techniques for LLMs. Each method has its own use cases and advantages, and the choice of method depends on the specific requirements of your project.
10. Interview Questions

LLM Architecture

1. Question: Can you explain the general architecture of a large language model like GPT or BERT?
Answer: LLMs like GPT and BERT are built on the transformer architecture, consisting of layers of self-attention mechanisms and feed-forward neural networks. BERT uses an encoder-only architecture for bidirectional context, while GPT uses a decoder-only architecture optimized for autoregressive tasks (i.e., generating text). Both architectures involve tokenization, positional encoding, and multiple attention heads to capture context over long sequences.

2. Question: How do LLMs handle long sequences of input text?
Answer: LLMs use self-attention mechanisms to capture dependencies between distant words in a text sequence. Additionally, newer models incorporate optimizations like sparse attention, reversible layers, and memory-efficient attention to handle longer sequences without excessive computational costs.

3. Question: What role does positional encoding play in LLMs?
Answer: Positional encoding is crucial in transformer models because, unlike RNNs or CNNs, transformers don't have an inherent notion of sequential order. Positional encoding provides information about the position of words in a sequence, allowing the model to understand the relative order of tokens.

4. Question: How do LLMs balance performance and memory efficiency?
Answer: LLMs balance performance and memory by using techniques like weight sharing, model quantization, sparse attention, and checkpointing during training. These methods help reduce the memory footprint while maintaining accuracy and performance in handling large datasets and long sequences.

5. Question: What are some techniques for reducing the size of LLMs without sacrificing performance?
Answer: Techniques include knowledge distillation (training a smaller "student" model to mimic the outputs of a larger "teacher" model), pruning (removing unnecessary neurons or weights), quantization (using lower-precision numbers for weights), and Low-Rank Adaptation (LoRA) during fine-tuning.

Transformers
1. Question: What is the self-attention mechanism, and why is it important in transformers?
Answer: The self-attention mechanism allows the model to weigh the importance of different words in a sentence, even those far apart. Each token "attends" to every other token in the input sequence, helping the model capture contextual relationships more effectively than traditional RNNs or CNNs. It is essential because it enables transformers to process sequences in parallel and handle long-range dependencies.

2. Question: How do transformers differ from traditional RNNs and CNNs?
Answer: Transformers do not process input sequentially, as RNNs do. Instead, they use self-attention to capture relationships between tokens in parallel, making them highly efficient for long sequences. CNNs are limited by their local receptive fields, while transformers can capture global dependencies in the data.

3. Question: Can you explain multi-head attention and why it's beneficial?
Answer: Multi-head attention splits the input into multiple subspaces, allowing the model to focus on different aspects of the sequence simultaneously. Each attention head can attend to different parts of the sequence, which helps the model capture more nuanced relationships between tokens.

4. Question: How does the transformer architecture scale, and what challenges come with scaling?
Answer: Transformer models scale by increasing the number of layers, attention heads, and parameters. However, scaling brings challenges like increased computational costs, memory usage, and the risk of overfitting. Efficient training techniques like distributed computing, gradient checkpointing, and memory-efficient attention are required to manage these issues.

5. Question: What is the role of feed-forward networks in transformers?
Answer: Feed-forward networks in transformers are applied independently to each token after the attention mechanism. They consist of two fully connected layers with an activation function in between, which allows the model to apply nonlinear transformations and increase its capacity to capture complex patterns in the data.

Optimization Techniques

1. Question: What is gradient descent, and why is it important for training LLMs?
Answer: Gradient descent is an optimization algorithm used to minimize the loss function during training. It iteratively adjusts the model's parameters based on the gradient of the loss function with respect to the parameters. This process is crucial for making the model learn from the data and improve its performance over time.
2. Question: How does Adam differ from SGD, and why is it commonly used for LLMs?
Answer: Adam (Adaptive Moment Estimation) is an optimization algorithm that combines the benefits of momentum (as in SGD with momentum) and adaptive learning rates. Adam is preferred for LLMs because it adapts the learning rate for each parameter, making it efficient for large models with sparse gradients.

3. Question: What is weight decay, and how does it help with training LLMs?
Answer: Weight decay is a regularization technique that penalizes large weights during training to prevent overfitting. It helps the model generalize better to unseen data by discouraging the learning of complex, unnecessary features.

4. Question: What is layer normalization, and how does it improve model training?
Answer: Layer normalization standardizes the inputs to each layer, which stabilizes training and helps prevent issues like vanishing or exploding gradients. It improves the speed and efficiency of training by ensuring that the model's activations remain within a stable range.

5. Question: How do learning rate schedules impact the performance of LLMs?
Answer: Learning rate schedules dynamically adjust the learning rate during training. Starting with a higher learning rate and gradually decreasing it (e.g., cosine decay or step decay) helps the model learn faster initially and fine-tune the weights more precisely later on, improving overall performance.
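To ground the optimizer, weight-decay, and schedule answers above, here is a minimal PyTorch sketch. The model, loss, and step count are placeholders; the point is the AdamW-plus-cosine-decay pattern commonly used for transformer training.

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(128, 2)  # stand-in for a real transformer

# AdamW = Adam with decoupled weight decay (the regularization discussed above)
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=1000)  # cosine decay over 1000 steps

for step in range(1000):
    x = torch.randn(32, 128)
    loss = model(x).pow(2).mean()  # dummy loss, for illustration only
    loss.backward()                # compute gradients
    optimizer.step()               # adaptive gradient-descent update
    scheduler.step()               # decay the learning rate
    optimizer.zero_grad()          # clear gradients for the next step

Real LLM training typically adds a warmup phase before the decay and gradient clipping, but the update loop has this same shape.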
Ethical Considerations

1. Question: What are some common ethical concerns when deploying generative AI models?
Answer: Ethical concerns include bias in the model's outputs, misinformation, privacy violations, and the potential misuse of generated content. Additionally, generative models can produce harmful or offensive content, which raises concerns about accountability and control in deployment environments.

2. Question: How do you address the problem of bias in large language models?
Answer: Bias can be mitigated through careful dataset selection, bias mitigation techniques during training (e.g., adversarial training), and post-processing adjustments. Transparency, fairness, and continuous monitoring during deployment are also crucial steps to address bias.

3. Question: How can you ensure AI models respect privacy regulations like GDPR?
Answer: Respecting privacy regulations involves anonymizing sensitive data, ensuring explicit user consent for data collection, and implementing model training techniques that avoid storing personally identifiable information (PII). Federated learning and differential privacy can also help in creating models that respect privacy.

4. Question: How can you ensure that generative AI models don't contribute to misinformation?
Answer: To reduce the risk of misinformation, AI models should be trained on high-quality, verified datasets and include human-in-the-loop oversight. Additionally, models can be fine-tuned for fact-checking and verification tasks to identify and reduce false information in outputs.

5. Question: What role does transparency play in AI governance, and how do you implement it?
Answer: Transparency ensures that AI models and their decision-making processes are understandable and accountable. This can be implemented by providing clear documentation, model cards, and explaining the rationale behind the model's predictions (using explainability techniques).

Deployment Strategies

1. Question: What are some challenges when deploying large language models in production?
Answer: Challenges include high computational costs, latency issues, scaling to handle high traffic, ensuring model updates without downtime, managing version control, and addressing ethical concerns such as bias or harmful outputs.

2. Question: How can you optimize the inference speed of LLMs during deployment?
Answer: Inference speed can be optimized by techniques such as model quantization, using faster hardware like GPUs or TPUs, reducing model size with pruning or distillation, and utilizing caching and batching to handle multiple requests efficiently.

3. Question: What are the differences between on-premise and cloud-based deployment for LLMs?
Answer: On-premise deployment offers more control over data privacy and latency but requires significant hardware investment and maintenance. Cloud-based deployment provides scalability, flexibility, and lower upfront costs but comes with potential concerns around data privacy and dependency on third-party providers.

4. Question: How do you ensure continuous model improvement in production environments?
Answer: Continuous model improvement can be ensured by setting up a feedback loop where user interactions are monitored for errors or misclassifications. Retraining the model with updated or new data, along with A/B testing for performance monitoring, also helps keep models up to date and accurate.

5. Question: What is edge deployment, and when is it preferable over cloud-based deployment for LLMs?
Answer: Edge deployment involves running AI models directly on local devices (e.g., smartphones, IoT devices), reducing latency and dependency on network connections. It is preferable for applications requiring real-time inference, enhanced privacy, and low-latency responses, such as autonomous vehicles or smart home devices.

Hugging Face

1. Question: How would you fine-tune a pre-trained model using Hugging Face's Transformers library?
Answer: Fine-tuning a pre-trained model with Hugging Face typically involves loading
the pre-trained model and tokenizer using AutoModelForSequenceClassification and AutoTokenizer. You then prepare a custom dataset, format it using datasets.Dataset or a DataLoader, and use the Trainer API to handle the training loop. The Trainer API allows for easy configuration of training parameters, evaluation metrics, and optimizer setup. During training, only the final layers are adjusted while the pre-trained layers are mostly retained.

2. Question: Can you explain the role of the Hugging Face Model Hub and how it simplifies the process of working with LLMs?
Answer: The Hugging Face Model Hub serves as a repository where pre-trained models and datasets are shared by the community. It simplifies the process by allowing users to search, download, and use pre-trained models across many domains (e.g., NLP, vision) without having to build models from scratch. It also enables easy sharing and version control for custom models, and it integrates seamlessly with the transformers and datasets libraries.

3. Question: How do you manage and version control different models and datasets in Hugging Face?
Answer: Hugging Face offers Git-based version control for models and datasets. You can create, push, and maintain different versions of your models on the Hub, ensuring reproducibility and collaborative development. Hugging Face allows users to tag models with specific versions and track changes, much like traditional software version control.

4. Question: What's the difference between AutoModel and AutoTokenizer classes in Hugging Face Transformers?
Answer: AutoModel refers to a class that automatically selects the correct model architecture based on a pre-trained model checkpoint. AutoTokenizer, on the other hand, handles the tokenization of the input text, converting it into a format that the model can understand. Both classes offer a simplified way to load models and tokenizers for different tasks (e.g., text classification, question answering) without specifying each architecture explicitly.

5. Question: Can you walk us through the process of creating a custom dataset for training an LLM in Hugging Face?
Answer: Creating a custom dataset for Hugging Face can be done by formatting the data as JSON, CSV, or a Pandas DataFrame. Using the datasets library, you can load the dataset with the load_dataset function. You can further preprocess, tokenize, and split the dataset into training, validation, and test sets. Custom data can also be uploaded to the Hugging Face Hub for public use or personal experiments.
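As a concrete companion to the two answers above, here is a hedged sketch of loading a custom CSV dataset and fine-tuning a classifier with the Trainer API. The file names, checkpoint, and hyperparameters are placeholders; the CSV is assumed to have "text" and "label" columns.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Load a custom dataset from local CSV files (paths are placeholders)
dataset = load_dataset("csv", data_files={"train": "train.csv",
                                          "validation": "valid.csv"})

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    # Convert raw text into token IDs the model can consume
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()

The Trainer handles batching, the optimizer, checkpointing, and evaluation, which is why it is usually preferred over a hand-written loop for standard fine-tuning jobs.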
OpenAI

1. Question: How does OpenAI's GPT model handle generating responses when no fine-tuning has been applied?
Answer: GPT models are trained with a general understanding of language through large-scale pretraining. Even without fine-tuning, GPT models generate responses by relying on their pre-trained knowledge and patterns learned during training. They leverage the input prompt to generate contextually relevant text by predicting the next word based on the tokens seen so far. These models are typically capable of performing general tasks like summarization, translation, or conversation without specific task training.

2. Question: Explain OpenAI's approach to aligning large language models with human values (Reinforcement Learning from Human Feedback - RLHF).
Answer: Reinforcement Learning from Human Feedback (RLHF) is a technique used by OpenAI to align the behavior of LLMs with human preferences. It involves training the model on a dataset of human-labeled responses where humans rate or rank model outputs. The feedback is used to reward desirable behavior and penalize undesirable behavior, thus guiding the model to produce outputs that are more aligned with human expectations and values.

3. Question: What are some use cases where OpenAI's GPT models can be directly integrated into applications?
Answer: GPT models can be integrated into a variety of applications such as customer service chatbots, automated content generation (e.g., blog writing, social media posts), virtual assistants, language translation, summarization tools, and even code generation for developers. They can also be used for answering complex queries, drafting emails, and automating workflows in businesses.

4. Question: How does OpenAI's API pricing model work, and how can you optimize costs when deploying LLMs?
Answer: OpenAI's API pricing is generally based on the number of tokens processed during requests. To optimize costs, you can reduce the length of prompts and responses, use lower-capacity models for simpler tasks (e.g., GPT-3.5 instead of GPT-4), and cache frequently used results. Batching requests and applying temperature or frequency controls can also reduce unnecessary token usage.

5. Question: How would you fine-tune an OpenAI model for a specific task like legal document summarization?
Answer: Not all OpenAI models can be directly fine-tuned by users, but you can achieve task-specific optimization by carefully crafting prompts (prompt engineering) for legal document summarization. You can use a few-shot learning approach where examples of summarization are included in the prompt, guiding the model to output summaries in the required format. Additionally, you could build a pipeline to preprocess legal text before feeding it into the model.
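To illustrate the few-shot prompting approach described in the last answer, here is a hedged sketch in the same legacy API style used earlier in this document. The example clauses and summaries are invented placeholders that steer the model's output format.

import openai

openai.api_key = 'your_openai_api_key'

# Few-shot prompt: two invented example pairs demonstrate the desired format
prompt = """Summarize each clause in one plain-English sentence.

Clause: The lessee shall remit payment no later than the fifth day of each month.
Summary: Rent is due by the 5th of every month.

Clause: Either party may terminate this agreement with thirty days' written notice.
Summary: Either side can end the contract with 30 days' notice.

Clause: The licensor disclaims all warranties, express or implied.
Summary:"""

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=60,
    temperature=0.2,  # low temperature for consistent, factual phrasing
)
print(response.choices[0].text.strip())

The in-prompt examples act as a lightweight substitute for fine-tuning: the model imitates the demonstrated input/output pattern for the final, unanswered clause.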
LangChain

1. Question: What is LangChain, and how does it extend the capabilities of large language models?
Answer: LangChain is a framework for building applications that use large language models (LLMs) in more complex, interactive, and contextual ways. It allows developers to connect LLMs with external data sources, build multi-step chains, and maintain memory across conversations, enabling sophisticated applications like chatbots, agents, and automated reasoning systems.

2. Question: Can you explain the concept of "chains" in LangChain and how they help in building complex workflows for AI models?
Answer: In LangChain, "chains" are sequences of linked operations that guide an LLM through multiple steps of a task. A chain could include steps like querying a database, processing an API call, performing text generation, or retrieving information. By combining these steps, developers can build workflows where each stage refines the output based on the previous one, creating more advanced interactions and decision-making capabilities.

3. Question: How would you use LangChain to integrate external data sources like APIs or databases into a language model workflow?
Answer: LangChain allows the integration of external data sources by creating specific "chains" or modules that can query APIs or databases during the execution of the workflow. For example, you could use a SQL chain to retrieve information from a database or an API chain to call external APIs. This data can then be fed into the LLM to generate more contextually relevant responses.

4. Question: What is the role of memory in LangChain, and how does it help maintain context across interactions with LLMs?
Answer: Memory in LangChain allows the model to remember and maintain context over multiple interactions or conversations. Instead of treating each interaction as independent, memory helps the model retain information from previous steps or exchanges, making it suitable for conversational agents or chatbots that need to reference past interactions.

5. Question: How does LangChain support different types of tasks like summarization, question answering, and chatbots?
Answer: LangChain provides task-specific modules for different types of operations. For example, it has ready-made chains for summarization, question answering, and document retrieval. It also supports custom task chains that can be combined with other data-processing steps to perform more specialized functions like chatbot creation or real-time decision-making.
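A minimal, hedged sketch of a LangChain chain with conversation memory follows, using the classic LLMChain interface. LangChain's APIs have shifted considerably across versions, so treat this as illustrative of the chain-plus-memory idea rather than a definitive implementation; it assumes an OPENAI_API_KEY environment variable is set.

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0.7)  # reads OPENAI_API_KEY from the environment

# The {history} slot is filled from memory, so the chain keeps context across calls
prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="Conversation so far:\n{history}\nUser: {question}\nAssistant:",
)
memory = ConversationBufferMemory(memory_key="history")
chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

print(chain.run(question="What is LangChain used for?"))
print(chain.run(question="And how does it keep track of this conversation?"))

On the second call, the memory object injects the first exchange into {history}, which is exactly the mechanism the memory answer above describes.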
Fine-Tuning

1. Question: What are the main steps involved in fine-tuning a pre-trained model for a specific task?
Answer: The main steps for fine-tuning a pre-trained model are: (1) Select a pre-trained model relevant to the task, (2) Prepare and preprocess a task-specific dataset, (3) Freeze some layers of the pre-trained model (optional, for efficiency), (4) Fine-tune the remaining layers by adjusting hyperparameters (e.g., learning rate, batch size), and (5) Validate the model on a held-out dataset to ensure generalization.

2. Question: How would you decide whether to fine-tune a model or use it out of the box for your application?
Answer: The decision depends on the specificity of the task and the available data. For generic tasks, using a pre-trained model without fine-tuning is often sufficient. However, if the task requires domain-specific knowledge (e.g., legal, medical), or if the out-of-the-box performance is not satisfactory, fine-tuning with a relevant dataset is necessary to tailor the model for your application.

3. Question: Can you explain the difference between task-specific fine-tuning and domain-specific fine-tuning?
Answer: Task-specific fine-tuning involves adjusting the model to perform a particular task, like classification, summarization, or translation. Domain-specific fine-tuning, on the other hand, involves adapting the model to a specialized domain (e.g., finance, healthcare) by training it on data that includes the terminology and nuances of that domain, enabling better performance for tasks within that field.

4. Question: How does the choice of dataset impact the effectiveness of fine-tuning an LLM?
Answer: The dataset's quality, size, and relevance to the target task/domain are critical. A high-quality, task-specific dataset helps the model learn the right patterns and generalize well, while noisy or irrelevant data can degrade performance.

Interview Questions for Practice

Generative AI (Gen AI) and Large Language Models (LLM)
Interview Questions for Practice

Generative AI (Gen AI) and Large Language Models (LLM)
1. Can you explain the difference between generative AI and traditional machine learning models?
2. How does a large language model (LLM) work, and what makes it different from other types of neural networks?
3. What are the challenges of deploying generative AI models in production environments?
4. Describe how transformers have revolutionized NLP and why they are key to the success of LLMs.
5. How would you evaluate the performance of a generative language model, beyond simple accuracy metrics?
6. What is "attention" in the context of transformers, and how does it contribute to a model's ability to understand context?
7. Can you explain the difference between zero-shot, one-shot, and few-shot learning, and how LLMs use them?
8. What are some common ethical concerns surrounding the use of generative AI in content creation?
9. How does temperature affect the output of generative language models?
10. How do LLMs handle long-range dependencies in text, and why is this important for text generation?

Hugging Face
1. How would you fine-tune a pre-trained model using Hugging Face's Transformers library?
2. Can you explain the role of the Hugging Face Model Hub and how it simplifies the process of working with LLMs?
3. How do you manage and version control different models and datasets in Hugging Face?
4. What's the difference between the AutoModel and AutoTokenizer classes in Hugging Face Transformers?
5. Can you walk us through the process of creating a custom dataset for training an LLM in Hugging Face?
6. How would you evaluate model performance using Hugging Face's datasets library?
7. What are some best practices for sharing models on Hugging Face's model repository?
8. Can you explain how Hugging Face's accelerate library helps in speeding up model training and inference?
9. What are the key differences between Hugging Face's Trainer API and writing custom training loops?
10. How would you deploy a Hugging Face model on AWS or another cloud platform?

OpenAI
1. How does OpenAI's GPT model handle generating responses when no fine-tuning has been applied?
2. Explain OpenAI's approach to aligning large language models with human values (Reinforcement Learning from Human Feedback - RLHF).
3. What are some use cases where OpenAI's GPT models can be directly integrated into applications?
4. How does OpenAI's API pricing model work, and how can you optimize costs when deploying LLMs?
5. What are the steps involved in using OpenAI's GPT-4 for generating content specific to a niche domain?
6. How would you fine-tune an OpenAI model for a specific task like legal document summarization?
7. What are some security concerns when integrating OpenAI's API into a production system?
8. How does OpenAI handle tokenization, and what are the trade-offs of its token-based pricing?
9. How does OpenAI ensure that the data used in pre-training its models remains ethical and unbiased?
10. Can you explain the importance of API rate limits in OpenAI's products and how you would handle them in a large-scale deployment?

LangChain
1. What is LangChain, and how does it extend the capabilities of large language models?
2. Can you explain the concept of "chains" in LangChain and how they help in building complex workflows for AI models?
3. How would you use LangChain to integrate external data sources like APIs or databases into a language model workflow?
4. What is the role of memory in LangChain, and how does it help maintain context across interactions with LLMs?
5. Can you describe a scenario where you would use LangChain to create a multi-step conversation with a language model?
6. How does LangChain support different types of tasks like summarization, question answering, and chatbots?
7. Can you explain how LangChain interacts with different LLM providers, such as OpenAI and Hugging Face, in the same workflow?
8. What are the advantages of using LangChain over directly interacting with an LLM API?
9. How would you design a LangChain pipeline for a customer support chatbot that retrieves answers from a knowledge base?
10. Can you walk us through an example of using LangChain for text generation based on real-time financial data?

Fine-Tuning
1. What are the main steps involved in fine-tuning a pre-trained model for a specific task?
2. How would you decide whether to fine-tune a model or use it out of the box for your application?
3. Can you explain the difference between task-specific fine-tuning and domain-specific fine-tuning?
4. What are some of the common challenges when fine-tuning a large language model, and how can they be mitigated?
5. How does the choice of dataset impact the effectiveness of fine-tuning an LLM?
6. Can you explain the concept of Low-Rank Adaptation (LoRA) and its role in fine-tuning large models?
7. How do you handle overfitting when fine-tuning a model on a relatively small dataset?
8. What are some strategies to reduce computational cost during fine-tuning without sacrificing model performance?
9. How do you fine-tune a model for multilingual tasks, and what are the key considerations in this process?
10. Can you describe how fine-tuning might affect the ethical considerations surrounding the deployment of a large language model?

AI Governance
1. What is AI governance, and why is it critical in today's AI development landscape?
2. How would you address the challenges of AI transparency and explainability in a black-box model like GPT?
3. What role does data governance play in ensuring the ethical use of AI models?
4. How do you ensure fairness and mitigate bias in AI models during development and deployment?
5. What are the key components of an effective AI governance framework within an organization?
6. Can you explain how privacy concerns are handled in AI systems that process sensitive data?
7. How do regulations like GDPR affect the way AI models are trained and deployed, especially when using user-generated data?
8. What strategies can be used to ensure that AI models remain aligned with ethical principles and societal values?
9. How would you measure and assess the risks posed by deploying generative AI models in public-facing applications?
10. What steps would you take to implement an AI governance policy that addresses both ethical concerns and operational efficiency?
LLM Fine-Tuning Code Samples:

Use case: LangChain-OpenAI-RAG
This example loads Winter Olympics medal-records data from a PDF and lets you chat with the PDF.
Note: This code was verified working in Google Colab on Oct 7, 2024.
Google Colab link: https://colab.research.google.com/drive/1N3a1hEWuDDlUw6bliKHVM4l1IARPX2iB

# Install the required libraries
!pip install langchain
!pip install openai
!pip install PyPDF2
!pip install faiss-cpu
!pip install tiktoken
!pip install -U langchain-community
!pip install pypdf

"""
# Try the prompts below
You: who is youngest medalist
Assistant: Scott ALLEN from the USA, who won a bronze medal in figure skating at the age of 14 years and 363 days in 1964.
You: wrong
Assistant: The youngest medalist in an individual event is Scott ALLEN (USA), who won a bronze medal in figure skating in 1964 at the age of 14 years and 363 days.
You: do you find Kim Yun
Assistant: Yes, Yun-Mi KIM (KOR) is the youngest gold medallist in an individual event, winning the short-track speed skating 3,000m relay in 1994 at the age of 13 years and 85 days. She also won a bronze medal in the same event at the 1998 Winter Olympics.
You: then why you told wrong
Assistant: I mentioned Scott Allen because he is the youngest medalist in an individual event for men. Yun-Mi Kim is the youngest medalist in an individual event for women.
You: do you see Dimitrios Loundras
Assistant: Scott ALLEN (USA), bronze medallist in figure skating in 1964, aged 14 years and 363 days.
You: do you know Dimitrios Loundras as medalist
Assistant: Dimitrios Loundras is not mentioned in this context, so it is not possible to determine who he is or if he won a medal.
"""

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate

GPT_MODEL = "gpt-3.5-turbo"

# Read the OpenAI API key from Colab's secret store
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY')

# 1. Download the PDF to your local machine
#!wget https://stillmed.olympics.com/media/Documents/Olympic-Games/Factsheets/Records-of-medals-at-the-Olympic-Winter-Games.pdf

# 2. Load the PDF document from the downloaded local file
pdf_loader = PyPDFLoader("sample_data/Records-of-medals-at-the-Olympic-Winter-Games.pdf")
documents = pdf_loader.load()

# 3. Split the document into overlapping chunks
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
document_chunks = splitter.split_documents(documents)
# 4. Generate embeddings for the document chunks and index them with FAISS
embeddings = OpenAIEmbeddings(openai_api_key=api_key)
vector_store = FAISS.from_documents(document_chunks, embeddings)

# 5. Set up the memory for conversation history
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# 6. Create a Conversational Retrieval Chain using OpenAI as the LLM
llm = OpenAI(openai_api_key=api_key)
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

# Create a default QA chain that "stuffs" the retrieved documents into the prompt
chain = load_qa_chain(llm=llm, chain_type="stuff")

# Define a question-generation chain that rewrites a follow-up question
# as a standalone question using the chat history
template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone Question:"""

prompt_template = PromptTemplate(
    input_variables=["chat_history", "question"],
    template=template
)
question_generator = LLMChain(llm=llm, prompt=prompt_template)

# Conversational chain that combines the LLM, document retrieval, and memory
conversational_chain = ConversationalRetrievalChain(
    retriever=retriever,
    combine_docs_chain=chain,
    memory=memory,
    question_generator=question_generator
)

# 7. Start a conversation with the PDF
print("Ask your question about the PDF!")
while True:
    query = input("You: ")
    if query.lower() == "exit":
        print("Ending the chat!")
        break
    response = conversational_chain({"question": query})
    print(f"Assistant: {response['answer']}")

AI Evaluation Metrics

1. Classification Metrics: Used when the model predicts discrete labels or categories (a worked sketch follows this list).
● Accuracy: The percentage of correct predictions out of all predictions. Best suited for balanced datasets.
● Precision: The ratio of true positives to the sum of true positives and false positives. Focuses on the quality of positive predictions.
● Recall (Sensitivity): The ratio of true positives to the sum of true positives and false negatives. Focuses on the ability to capture all positive cases.
● F1 Score: The harmonic mean of precision and recall. Useful when the balance between precision and recall is important.
● ROC-AUC (Receiver Operating Characteristic - Area Under Curve): Measures the model's ability to distinguish between classes. AUC = 1 represents a perfect model, while AUC = 0.5 represents a random one.
● Confusion Matrix: Provides a breakdown of actual vs. predicted classifications, showing true positives, false positives, true negatives, and false negatives.
● Log Loss (Cross-Entropy Loss): Penalizes incorrect classifications according to the predicted probability assigned to each class, giving insight into the confidence of the model's predictions.
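Here is a small sketch computing these classification metrics with scikit-learn; the labels and probabilities below are made up purely for illustration:

# Toy illustration of common classification metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score, log_loss)

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]                   # actual labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # hard predictions
y_proba = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]   # predicted P(class = 1)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_proba))   # uses probabilities, not labels
print("Log loss :", log_loss(y_true, y_proba))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))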
2. Regression Metrics: For models that predict continuous values (see the sketch after this list).
● Mean Squared Error (MSE): Measures the average of the squared errors. Penalizes larger errors more heavily than smaller ones.
● Root Mean Squared Error (RMSE): The square root of MSE, interpretable in the same units as the predicted values.
● Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual values. More robust to outliers than MSE.
● R² (Coefficient of Determination): Indicates the proportion of the variance in the dependent variable that is predictable from the independent variables. Values closer to 1 indicate a better fit.
● Adjusted R²: A modified version of R² that adjusts for the number of predictors in the model, helping to avoid overfitting.
● Mean Absolute Percentage Error (MAPE): Measures the percentage error between predicted and actual values. Useful for comparing models in terms of relative error.
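The regression metrics can be computed the same way; this toy sketch assumes scikit-learn 0.24+ for mean_absolute_percentage_error, with made-up values:

# Toy illustration of regression metrics with scikit-learn and NumPy.
import numpy as np
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             r2_score, mean_absolute_percentage_error)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))        # same units as the target variable
print("MAE :", mean_absolute_error(y_true, y_pred))
print("R^2 :", r2_score(y_true, y_pred))
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))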
3. Natural Language Processing (NLP) Metrics: For tasks like text generation, question answering, and classification.
● BLEU (Bilingual Evaluation Understudy): Evaluates the accuracy of machine-generated text by comparing it to reference texts based on n-gram overlap.
● ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Measures recall-oriented n-gram overlap between model-generated text and reference text, particularly useful for summarization tasks.
● Perplexity: Often used in language modeling, perplexity measures how well a probability model predicts a sample. Lower perplexity indicates better performance.
● Exact Match (EM): Common in question answering tasks, it measures whether the predicted answer matches the ground truth exactly.
● Word Error Rate (WER): Measures the number of substitutions, insertions, and deletions in speech-to-text predictions. Lower WER indicates better accuracy.
● BERTScore: Uses embeddings from transformer models like BERT to compute the similarity between generated text and reference text.

4. Clustering Metrics: For unsupervised learning tasks like clustering.
● Silhouette Score: Measures how similar a data point is to its own cluster compared to other clusters. Ranges from -1 to 1, with higher values indicating better-defined clusters.
● Adjusted Rand Index (ARI): Compares the similarity between two clusterings by considering all pairs of samples and counting pairs that are assigned to the same or different clusters in both clusterings.
● Davies-Bouldin Index: Measures the average similarity ratio of each cluster with its most similar cluster. Lower values indicate better clustering.
● Homogeneity Score: Measures whether each cluster contains only members of a single class.

5. Ranking Metrics: Used in tasks such as information retrieval and recommendation systems.
● Mean Reciprocal Rank (MRR): Evaluates how well a list of ranked items matches the ground truth list.
● Normalized Discounted Cumulative Gain (nDCG): Measures the usefulness, or gain, of an item based on its position in the result list, rewarding higher ranks more than lower ranks.
● Hit Rate (HR): Measures the percentage of times the ground truth item is present in the top-K recommendations.

6. Advanced Metrics: For deep learning models, complex tasks, and more nuanced model evaluations.
● Precision-Recall AUC: Similar to ROC-AUC but more informative for imbalanced datasets, showing the trade-off between precision and recall.
● Brier Score: Measures the accuracy of probabilistic predictions. Lower values indicate better probabilistic predictions.
● Expected Calibration Error (ECE): Measures how well predicted probabilities align with actual outcomes.
● Shapley Values (SHAP): Used for model explainability by measuring the contribution of each feature to the prediction of individual instances.
● Fisher Information Matrix (FIM): Measures how much information a parameter contains about the outcome of the model, often used in reinforcement learning and meta-learning.

7. Multiclass and Multilabel Metrics: For problems with more than two labels, or where multiple labels can be assigned to a single instance (a small macro- vs. micro-averaging sketch follows this list).
● Macro-Averaged Precision/Recall/F1: Averages the metric across all classes without considering the proportion of each class.
● Micro-Averaged Precision/Recall/F1: Averages the metric across all classes by pooling the total true positives, false positives, and false negatives.
● Hamming Loss: The fraction of labels that are incorrectly predicted in multilabel classification tasks.
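To see how macro- and micro-averaging differ in practice, here is a toy multiclass sketch with scikit-learn (the labels are made up for illustration):

# Macro- vs. micro-averaged F1 and Hamming loss on a toy 3-class problem.
from sklearn.metrics import f1_score, hamming_loss

y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1]

print("Macro F1:", f1_score(y_true, y_pred, average="macro"))  # unweighted mean over classes
print("Micro F1:", f1_score(y_true, y_pred, average="micro"))  # pooled TP/FP/FN counts
print("Hamming loss:", hamming_loss(y_true, y_pred))           # fraction of wrong labels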
8. Fairness and Bias Metrics: To ensure AI models perform equitably across demographic groups (a toy demographic-parity check follows this list).
● Demographic Parity: Measures whether the model's predictions are independent of a protected attribute (like gender or race).
● Equalized Odds: Measures whether the model's false positive and true positive rates are equal across groups.
● Disparate Impact: Evaluates whether a protected group is adversely affected by the model's decisions.
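As a toy illustration of demographic parity, the sketch below compares positive-prediction rates across two hypothetical groups (all values are made up):

# Toy check of demographic parity: compare positive-prediction rates per group.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 1])                  # model decisions
group  = np.array(["A", "A", "A", "B", "B", "B", "B", "B"])  # protected attribute

for g in ["A", "B"]:
    rate = y_pred[group == g].mean()
    print(f"Group {g}: positive rate = {rate:.2f}")
# Demographic parity holds when these rates are (approximately) equal.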
Conclusion
To refine your AI model evaluation, choose metrics that are most aligned with the task, the goal (e.g., accuracy vs. interpretability), and the data type. For instance:
● NLP tasks may rely heavily on metrics like BLEU or ROUGE.
● Fairness metrics are critical in socially sensitive applications.
● Advanced AI applications can use SHAP values or expected calibration error for deeper insight into model performance and reliability.

Appendix A: External References

PDFs: Machine Learning
● Cambridge machine learning: https://alex.smola.org/drafts/thebook.pdf
● ML from theory to algorithms: https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
● O'Reilly: https://www.nrigroupindia.com/e-book/Introduction%20to%20Machine%20Learning%20with%20Python%20(%20PDFDrive.com%20)-min.pdf
● Pattern Recognition and Machine Learning: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
● Machine Learning (Tom Mitchell): https://www.cin.ufpe.br/~cavmj/Machine%20-%20Learning%20-%20Tom%20Mitchell.pdf
● Machine Learning lecture notes: https://mrcet.com/downloads/digital_notes/CSE/IV%20Year/MACHINE%20LEARNING(R17A0534).pdf
● Stanford ML book: https://ai.stanford.edu/~nilsson/MLBOOK.pdf
● Data Science & Statistics: https://people.smp.uq.edu.au/DirkKroese/DSML/DSML.pdf
● Foundations of ML: https://www.hlevkin.com/hlevkin/45MachineDeepLearning/ML/Foundations_of_Machine_Learning.pdf
● ML for beginners: https://bmansoori.ir/book/Machine%20Learning%20For%20Absolute%20Beginners.pdf
● ML lectures: https://www.seas.upenn.edu/~cis5190/fall2017/lectures/01_introduction.pdf
● ML basics: https://courses.edx.org/asset-v1:ColumbiaX+CSMM.101x+1T2017+type@asset+block@AI_edx_ml_5.1intro.pdf
● Harvard UG book: https://harvard-ml-courses.github.io/cs181-web/static/cs181-textbook.pdf
● Deep Learning: https://fleuret.org/public/lbdl.pdf
● Hundred-Page ML Book: http://ema.cri-info.cm/wp-content/uploads/2019/07/2019BurkovTheHundred-pageMachineLearning.pdf
● Fundamentals of ML: https://www.interactions.com/wp-content/uploads/2017/06/machine_learning_wp-5.pdf
● A Course in Machine Learning [Download]
● Advanced Machine Learning with Python [Download]
● Big Data, Data Mining, and Machine Learning [Download]
● Building Intelligent Systems - A Guide to Machine Learning Engineering [Download]
● Building Machine Learning Systems with Python - Second Edition [Download]
● Designing Machine Learning Systems with Python [Download]
● Introduction to Machine Learning with Python [Download]
● Introduction To Python Programming - Beginner's Guide To Computer Programming And Machine Learning [Download]
● Large Scale Machine Learning with Python [Download]
● Large Scale Machine Learning with Spark [Download]
● Learning Generative Adversarial Networks [Download]
● Learning NumPy Array [Download]
● Learning scikit-learn - Machine Learning in Python [Download]
● Machine Learning - Hands-On for Developers and Technical Professionals [Download]
● Machine Learning - Jason Bell [Download]
● Machine Learning for Developers [Download]
● Machine Learning for Email [Download]
● Machine Learning for Hackers [Download]
● Machine Learning for the Web [Download]
● Machine Learning in Action - Chinese Edition [Download]
● Machine Learning in Action [Download]
● Machine Learning in Java [Download]
● Machine Learning Projects for .NET Developers [Download]
● Machine Learning Using C# Succinctly [Download]
● Machine Learning with Spark [Download]
● Mastering .NET Machine Learning [Download]
● Mastering Machine Learning with Python in Six Steps [Download]
● Mastering Machine Learning with scikit-learn - Second Edition [Download]
● Microsoft Azure Machine Learning [Download]
● Neural Network Programming with Java [Download]
● Neural Networks Using C# Succinctly [Download]
● Practical Machine Learning with H2O - Powerful, Scalable Techniques for Deep Learning and AI [Download]
● Practical Machine Learning [Download]
● Practical Reinforcement Learning [Download]
● Python - Deeper Insights into Machine Learning [Download]
● Python for Probability, Statistics, and Machine Learning [Download]
● Python Machine Learning Blueprints [Download]
● Python Machine Learning By Example [Download]
● Python Machine Learning Case Studies [Download]
● Python Machine Learning Cookbook - Early Release [Download]
● Python Machine Learning Cookbook [Download]
● Python Machine Learning [Download]
● Python Real World Machine Learning [Download]
● Quantum Machine Learning - Peter Wittek [Download]
● Real-World Machine Learning [Download]
● Reinforcement Learning - With Open AI, TensorFlow and Keras Using Python [Download]
● scikit-learn Cookbook - Second Edition [Download]
● Thoughtful Machine Learning with Python - A Test-Driven Approach [Download]
● Thoughtful Machine Learning with Python [Download]
● Using Python to Develop Analytics, Control and Machine Learning Products [Download]
● What You Need to Know about Machine Learning [Download]
● What You Need to Know about R [Download]

Gen AI security: https://arxiv.org/pdf/2405.12750
LLM and Gen AI: https://publications.parliament.uk/pa/ld5804/ldselect/ldcomm/54/54.pdf
Gen AI risks: https://arxiv.org/pdf/2406.04734
LLM and GPT: https://www.american-cse.org/csce2023-ieee/pdfs/CSCE2023-5LlpKs7cpb4k2UysbLCuOx/275900a383/275900a383.pdf

Code:
Hugging Face: https://github.com/huggingface
OpenAI: https://platform.openai.com/docs/examples
https://github.com/openai/openai-cookbook/tree/main/examples
LangChain: https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/examples/
Transformer notebooks: https://github.com/sukhitashvili/transformer_notebooks

Blogs:
https://www.vellum.ai/llm-leaderboard#cost-context
Articles: