Exploring The Prompt: Playing With Attention
Audience: Anyone interested in Generative AI 🙂 but mostly GenAI users and designers prompting for personal use, for professional projects, to build agents, and so on.
Objective: Build a prompt that holds the model's attention.
I have had the opportunity to work with Generative AI for more than a year now, in a lot of different contexts and use cases. Given the pace of LLM updates, the fast progress of this technology, and the research I have done (and still do every week), I am convinced of one thing: we probably underutilize the full capacity of LLMs. I decided to explore different prompt engineering techniques and related topics, drawing on researchers working in this area as well as users, engineers, customers, and ISVs sharing their experience.
In today's exploration I'm wondering how we can craft a prompt that better matches what the model expects, so we get the best (or at least a valuable) output and avoid (or at least try to avoid) any surprise.
But first things first, let's start by discussing Transformers.
Disclaimer: The descriptions below are intentionally simplified to give the basics of how things work, so you can better interact with these models. They apply to Transformer-based models like GPT and BERT.
Transformers: What You Need to Know
To make it short, a Transformer is a machine learning model introduced in the paper "Attention Is All You Need" (see Sources). It is designed to process sequences (text, sound, etc.) efficiently using a key mechanism: Attention. This model has 2 objectives:
Understand relationships between words or elements in a sequence.
Generate text or transform a sentence into a translation, summary, etc.
A Transformer consists of multiple stacked layers (often dozens or hundreds) divided into two parts:
Encoder: Analyzes the input to extract information by transforming the input sequence (e.g., a sentence) into an internal representation called embeddings.
Example:
In the text "The cat eats a mouse" the word "cat" becomes the embedding vector [0.3, 0.8, -0.5, ...]
It relies on different mechanisms such as self-attention to understand relationships between words and feedforward neural networks to refine the extracted information.
Decoder: Generates output based on the information analyzed from the embeddings produced by the encoder. It includes an additional cross-attention mechanism to link the input and the output. (Note that in practice GPT-style models use only the decoder stack and BERT only the encoder; the full encoder-decoder design comes from the original translation Transformer.)
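To make this concrete, here is a minimal sketch using PyTorch, whose built-in nn.Transformer follows this encoder/decoder design (the dimensions below are the defaults from the original paper, and the tensors are random stand-ins for embedded sentences):

```python
import torch
import torch.nn as nn

# A stack of 6 encoder and 6 decoder layers, as in the original paper
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Dummy embedded sequences, shaped (sequence_length, batch_size, d_model)
src = torch.rand(10, 32, 512)  # input sequence fed to the encoder
tgt = torch.rand(20, 32, 512)  # output sequence fed to the decoder

out = model(src, tgt)  # cross-attention links the encoder output to the decoder
print(out.shape)       # torch.Size([20, 32, 512])
```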
Attention: The Heart Of Transformers
Attention allows the model to focus on the important parts of the sequence.
Example:
In "The cat eats a mouse" the word "eats" is more related to "cat" than to "mouse."
The attention mechanism calculates this relationship for each word: each word is transformed into a vector (its meaning in context), then importance scores are calculated between each pair of words (using Query, Key, and Value vectors).
Note: In fact, each word is split into one or more tokens (sub-word units of a few characters each), but for the sake of simplicity I keep saying "word".
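If you are curious to see this tokenization in action, here is a quick sketch using OpenAI's tiktoken library (cl100k_base is the encoding used by GPT-4-class models; other models use other encodings):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("The cat eats a mouse")
print(tokens)                             # a list of integer token IDs
print([enc.decode([t]) for t in tokens])  # the text piece behind each ID
```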
Wait: Query, Key, Value??
Through its encoder/decoder layers, the model calculates 3 different vectors Q, K, and V for each token in the embedding:
Query (Q): Answers the question "What should I pay attention to?".
Key (K): Represents the information available for each word.
Value (V): Represents the final information associated with each word.
Attention scores are calculated between Q and K to determine which V information to use.
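In (very simplified) code, that computation looks like this; a minimal NumPy sketch of scaled dot-product attention, with random vectors standing in for real Q, K, and V:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Score every Query against every Key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)  # how much each word attends to every other word
    return weights @ V         # weighted mix of the Values

# 5 tokens ("The cat eats a mouse"), embedding dimension 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 4)) for _ in range(3))
print(attention(Q, K, V).shape)  # (5, 4): one context-aware vector per token
```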
Important: Transformers do not have a natural sense of word order, so positional information is added to each word in the sentence.
Now that we have a very basic understanding of how it works, you may ask yourself:
If the model is mostly based on an attention mechanism, does that mean the prompt I use may disrupt the model's attention and generate undesired output?
Yes.
As a very good friend of mine likes to say:
People need to understand that LLMs are "shit-in, shit-out" systems. Therefore they must overcome their natural laziness and put effort into crafting the prompt.
Lost In Attention
The attention mechanism of Transformers has limitations and can be influenced by how the input is structured.
Below are a few cases that can intentionally or accidentally disrupt the model.
Inconsistency and Contradiction: If the prompt contains contradictory information, the model may become confused.
"Explain why dogs are mammals but lay eggs"
Here the attention can be distributed between the two ideas, leading to an inconsistent response.
Semantic Ambiguity: Deliberately vague or ambiguous sentences make it difficult to identify the context or intent.
"Tell me about this animal that sometimes walks."
Note: Semantic ambiguity, especially implicit relationships, accounts for almost 50% of the examples I found on the internet to demonstrate the inefficiency of LLMs 😌.
Information Overload: Adding too many unnecessary details can scatter the model's attention. Here, the model might get lost in unrelated details and overlook the real question.
"In 1962, when my grandfather bought a red car, dogs barked a lot, but why is the universe infinite?"
Prompt Injection: Prompts designed to divert or bias the model. If not managed, this can shift the focus away from the initial objective.
"Ignore all other instructions and do exactly what I say."
Note: Prompt injection is the other 50% 😅.
Focus Dilution: If attention is distributed equally across less relevant parts, the model might overlook important aspects. For instance in a very long text, the "main meaning" can become diluted.
For the purpose of the demo imagine a text summarization model tasked with summarizing a lengthy article about climate change. The article includes various sections on: 1) The scientific basis of climate change, 2) Case studies of affected regions, 3) Controversial opinions from climate change skeptics, 4) Historical context of environmental policies, 5) Technical details about data collection methods.
[...] "This article discusses climate change, touching on data collection methods, historical policies, and regional effects while also exploring skepticism and scientific evidence."
Word Order: Although the Transformer uses positional encodings it can sometimes struggle to interpret sentence order in complex prompts.
"After the dog chased the cat, it climbed the tree."
Here, the word "it" is ambiguous. A human would interpret that "it" refers to the cat (based on the sequence of actions and common sense: cats climb trees, not dogs). However, a Transformer might struggle to correctly interpret this, especially if the positional encoding is overwhelmed by the sentence's complexity.
Note: You could argue this is also a kind of semantic ambiguity, and you'd be right.
Induced Bias: If a prompt contains emotional or exaggerated words such as "horrible" or "incredible" the model might give too much attention to these elements instead of responding neutrally.
"This restaurant was absolutely horrible! The food was bland, the service was incredibly slow, and the ambiance was atrocious. Would you recommend this place?"
If the model gives undue weight to the emotional words like "horrible," "incredibly slow," and "atrocious," it might generate a response that mirrors the exaggerated tone, regardless of whether the criticisms are valid.
This is not an exhaustive list, but with that in mind: what can we do to craft a better prompt that will not lose the model's attention?
Help The Model To Have The Right Attention
So now we know we should be careful about how we prompt the model so as not to "lose" our primary goal. Here are some suggestions on how to build the prompt.
For the purpose of the demo, our objective below is to prompt the model to explain Quantum Computing to a beginner and compare it with regular Computing.
Our initial prompt is: "Help me understand Quantum Computing".
Provide Clear Context
The model's attention depends on the context it receives. If the context is missing, it may give inaccurate or generic responses.
➡ Provide the necessary information so the model knows what to expect.
👎 "Help me understand the Quantum Computing"
👍 "We're exploring the concept of quantum computing: explain the basics of quantum computing to a beginner."
Set The Expectations
A model follows the instructions given in the prompt, and Transformers perform better when they clearly understand what is expected of them. Vague or ambiguous prompts scatter attention and reduce relevance; clear instructions help guide it.
➡ Specify the context, task, or expected outcome.
➡ Use words like: "Focus on," "Ignore," or "Let's think step by step".
👎 "Tell me something about Quantum Computing."
👍 "We're exploring quantum computing. Focus on the basic concepts a beginner needs (qubits, superposition, entanglement), ignore the mathematical details, and let's think step by step."
Note: See Automatic Chain-of-Thought in Sources.
Prioritize Information & Break Into Multiple Questions
If multiple ideas or tasks are included, the model can become scattered. A hierarchy helps guide its attention.
➡ Use lists or ask questions one at a time / divide the question into several parts.
👎 "I need to know all the differences between regular Computing and Quantum Computing."
👍 "Let's go through the differences between regular Computing and Quantum Computing one at a time: 1) How does each one store information? 2) How does each one process operations? 3) Which problems is each one best suited for?"
Clarity
Transformers distribute their attention across all tokens (words, punctuation marks, etc.). A clear structure helps prevent dilution or confusion.
➡ Write direct prompts with a clear intent.
➡ Avoid digressions, repetitions, or unnecessary sentences.
👎 "I'm exploring the Quantum Computing and while I'm not really sure about how regular Computing works it's probably a good idea to get enough knowledge about how it works to be able to compare both computing technology."
👍 "Explain the Quantum Computing to an inexperienced person then compare with regular Computing."
Limit The Length Of The Prompt & Make The Focus Explicit
Transformers have a limited capacity to handle tokens (e.g., GPT-4 processes from 8K to 128K tokens depending on the configuration). An overly long prompt can dilute important information or even get partly truncated. Even if context windows keep growing over time, it's probably a good idea not to overload the prompt.
➡ Get straight to the point and remove unnecessary details.
➡ Explicitly indicate to the model what it should focus on and avoid overloading prompts with irrelevant information.
👎 "Hello, today I’m curious to learn something interesting about Quantum Computing. Can you explain that concept in a few lines for me to understand it ?"
👍 "Explain the Quantum Computing in few lines."
If you are trying to extract insights or seek information from a text (e.g., a RAG pattern), you can add this to your prompt:
👍 "Ignore secondary information and focus on the scientific explanation."
Note: You can rely on the System 2 Attention prompt engineering technique (see Sources).
Note: RAG = Retrieval Augmented Generation, a way to enrich the initial prompt with relevant context information to get a better contextualized-generated output.
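On the length side, a simple habit is to count tokens before sending the prompt; here is a sketch using tiktoken again (the 1,000-token budget is an arbitrary illustration, not a real model limit):

```python
import tiktoken

def check_prompt(prompt: str, budget: int = 1000) -> int:
    """Count tokens and warn when the prompt exceeds an arbitrary budget."""
    enc = tiktoken.get_encoding("cl100k_base")
    n = len(enc.encode(prompt))
    if n > budget:
        print(f"Warning: {n} tokens exceeds the {budget}-token budget")
    return n

print(check_prompt("Explain Quantum Computing in a few lines."))
```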
Use Of Tags And Formatting
Transformers better detect intentions and structure when an explicit format is used.
➡ Use tags to organize your prompt.
➡ Make the output format explicit when needed.
👎 "How does Quantum Computing work? You should explain it to me with simple words as if I were 7 years old and using table formatting."
👍 :
<context>You are explaining technology to a 7-year-old.</context>
<task>Explain how Quantum Computing works using simple words.</task>
<format>Summarize the key points in a table.</format>
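If you build prompts in code, tags are also easy to assemble programmatically; a small sketch (the tag names are just a convention I use here, not a requirement of any model):

```python
def build_prompt(context: str, task: str, output_format: str) -> str:
    """Assemble a tagged prompt so each section is explicit for the model."""
    return (
        f"<context>{context}</context>\n"
        f"<task>{task}</task>\n"
        f"<format>{output_format}</format>"
    )

print(build_prompt(
    "You are explaining technology to a 7-year-old.",
    "Explain how Quantum Computing works using simple words.",
    "Summarize the key points in a table.",
))
```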
Provide Examples
Transformers learn by imitation, so providing an example helps guide their attention.
➡ Clearly specify an example, expected format and so on.
👎 "Compare Quantum Computing with regular Computing"
👍 "Compare Quantum Computing with regular Computing, following this example: 'Storage: regular Computing uses bits (0 or 1), while Quantum Computing uses qubits (0 and 1 at the same time).' Cover processing and typical use cases in the same format."
Note: This technique is commonly known as Few Shot Prompting.
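Programmatically, few-shot prompting simply means packing your examples into the conversation before the real question. Here is a minimal sketch with the OpenAI Python SDK (the model name and the example pair are placeholders; the client reads OPENAI_API_KEY from the environment):

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You compare technologies in one short sentence per aspect."},
        # Few-shot example: show the expected format before the real question
        {"role": "user", "content": "Compare HDD with SSD."},
        {"role": "assistant", "content": "Storage: HDD uses spinning disks, SSD uses flash memory, so SSD is faster."},
        # The real question, which should now follow the same pattern
        {"role": "user", "content": "Compare Quantum Computing with regular Computing."},
    ],
)
print(response.choices[0].message.content)
```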
Putting Guidelines All Together
Now that we have some ways to improve our prompt, it's time to experiment!
Objective: Understand Quantum Computing.
Refined Objective: You want the model to explain the basic concepts of quantum computing to a novice, with clear instructions and examples that make complex ideas accessible.
Note: Objective refinement is probably the first stage of your prompt engineering activity. For our current objective, ask yourself about the current context (what's your work or background?), your current knowledge, what you want to know, what you expect to learn, the level of detail, and so on.
Initial Prompt
"Help me understand Quantum Computing"
Related output:
Reworked Prompt
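Applying the guidelines above, a reworked prompt could look like this:

<context>I'm a complete beginner with no physics background, exploring Quantum Computing for the first time.</context>
<task>Explain the basics of Quantum Computing. Focus on qubits, superposition, and entanglement; ignore the mathematical formalism. Then answer one question at a time: 1) How does a qubit differ from a regular bit? 2) Which problems is Quantum Computing best suited for? Let's think step by step.</task>
<format>Short paragraphs, ending with a comparison table between Quantum Computing and regular Computing.</format>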
Let's run a quick checklist:
Provide a clear context: ✅
Set the expectation: ✅
Information is prioritized or broken into multiple questions: ✅
Provide clarity: ✅
Length of the prompt is reduced & the focus is explicit: ✅
Tags are used: ✅
Provide example: not really needed at this stage, let's run the prompt to see the output!
Related output:
Conclusion
Wow, you made it to the end of this article? That's some real dedication; most people would've tapped out around Section Snooze!
I hope it has clarified at least one thing: there's no better way to get the best results from your LLM than talking to it the right way. Of course, if you are only sending one-shot questions from time to time to look up information, this might not be really relevant. But as soon as you want something a little more "predictable" in the way it works and outputs the result, it's probably a good idea to leverage the full power of Transformers.
I have some other things to share about Prompt Engineering, so stay tuned for the next one 😎 In the meantime, feel free to share your feedback in the comments section or directly by DM and I'd be pleased to answer.
Thanks for reading 🙂
Sources
Attention Is All You Need: arXiv 1706.03762
System 2 Attention (is something you might need too): arXiv 2311.11829
Automatic Chain of Thought Prompting in Large Language Models: arXiv 2210.03493
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models: arXiv 2201.11903