Next-gen interactions: Redefining User Experience with Large Language Models
Igor Ilic
Dall-e
Agenda
Copilot: what Microsoft is building, what it is like
to build Copilot at Microsoft, and what is done in the
Serbia dev center. Sort of unrelated to the rest of the talk
Types of user interfaces: how people interact
with software today
How products will be impacted, and
how you can add value to your company or
business. Examples of hypothetical future
products
Copilot
ChatGPT covers general inquiries related to
general knowledge. But it lacks proprietary
context.
Copilot aims to provide the necessary context to
LLMs, at least the context that exists in the Microsoft
ecosystem: your and your company’s documents,
e-mails, databases and anything else you have
access to
Microsoft 365 Copilot
Available
by 3/13
Transforms the writing process to make
you more creative and efficient.
With Copilot, you can now:
• Create a summary of any document to
share as a recap or quickly get up to
speed.
• Rewrite a paragraph or save time on
formatting by asking Copilot to generate
a table from your copy.
• Create custom graphics right in the
document with Microsoft Designer, which
will pull from stock images, or your own
uploads in the chat.
• And much more (video on the next slide)
Copilot in Word – Made in Serbia
Microsoft 365 Copilot basic architecture
[Diagram with numbered data-flow steps 1 to 6. Labels: Microsoft 365 Service Boundary, Customer Microsoft 365 Tenant, Semantic Index, Azure OpenAI, RAI]
Data flow (all requests are encrypted via HTTPS and wss://)
User prompts from Microsoft 365 Apps are sent to Copilot
Azure OpenAI instance is maintained by Microsoft. OpenAI has no access to the data or the model.
RAI is performed on input prompt and output results
Prompts, responses, and data accessed through Microsoft Graph aren't used to train foundation models
What is it like to work on Copilot
Prompt engineering
• Super-complex prompts with state-of-the-art
prompting techniques. The main quality
issue is hallucination
• Building systems for automatic
evaluation of prompts (sort of like
regression tests for prompt changes)
• Manual evaluation of outputs
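The "regression tests for prompt changes" idea above can be sketched roughly as follows. This is a minimal illustration, not how Copilot's evaluation systems actually work: `PromptCase`, `evaluate_prompt` and `fake_model` are hypothetical names, the checks are simple substring matches, and the model call is a stub so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PromptCase:
    """One regression case: an input plus strings the answer must / must not contain."""
    user_input: str
    must_contain: List[str]
    must_not_contain: List[str]


def evaluate_prompt(
    prompt_template: str,
    cases: List[PromptCase],
    call_model: Callable[[str], str],
) -> float:
    """Return the fraction of cases whose model output passes all checks."""
    passed = 0
    for case in cases:
        output = call_model(prompt_template.format(input=case.user_input)).lower()
        ok = all(s.lower() in output for s in case.must_contain) and not any(
            s.lower() in output for s in case.must_not_contain
        )
        passed += ok
    return passed / len(cases) if cases else 0.0


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (made-up answers)."""
    if "Q3 revenue" in prompt:
        return "Q3 revenue was 4.2M according to the attached report."
    return "I don't know."


CASES = [
    # The answer should use the document and not fall back to "I don't know".
    PromptCase("What was Q3 revenue?", ["4.2M"], ["I don't know"]),
    # The answer should admit ignorance instead of hallucinating.
    PromptCase("Who won the 2050 election?", ["I don't know"], []),
]

score = evaluate_prompt(
    "Answer from the documents only.\nQuestion: {input}", CASES, fake_model
)
print(f"pass rate: {score:.0%}")
```

Running the same cases before and after a prompt edit and comparing the pass rate gives a cheap signal that a change did not break existing behaviour. Manual evaluation still covers what substring checks cannot.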
AI engineering
• Building and improving agents with
iterative planning
• Fine-tuning smaller models (e.g.
gpt-3.5-turbo, open-source models)
Safety
• Responsible AI: LLMs can cause serious
damage. We need to make sure people
cannot abuse the vast knowledge
behind these models, while keeping the
block rate low
• Privacy, Compliance, Legal: this always
comes first. It slows development quite
a bit, but it is necessary for Microsoft’s
business model
• Prompt injection: could be part of either
RAI or Privacy, but it is such a huge effort
that it deserves its own bullet point. As
LLM connectors reach more and more
data sources, prompt injection
becomes a large security issue
WE’RE HIRING
(aka.ms/careers)
Current types of user
experiences
On to the main topic of the talk
In order to understand how Generative AI will
change the products we are building, we first
need to understand how products are built today
Current types
of user
experiences
one of the many ways to skin a cat
Simple Task-Based Applications:
intuitive, simple, limited UIs. The likes of
Instagram, TikTok, FaceApp, etc.
Search-and-Select Interfaces: highly
visual by nature. The likes of Amazon,
AliExpress and other e-commerce
platforms
Complex System-Operation Interfaces:
complex interfaces for complex software
solutions such as Word, Photoshop, SAP, etc.
Search-and-Select
Interfaces
Still mostly consumer products, but they
solve the specific problem of shopping,
where a large stock is an advantage, hence
they can be more complex.
Intuitiveness and relevance of search results are
crucial in these UIs. Good filtering is a huge
competitive advantage. Good visuals as well.
Complex online documentation (e.g. API) or
web presentations are also a part of this
group.
Simple Task-Based
Applications
TikTok, Instagram, FaceApp, Twitter –
consumer products
Outside of work, people are trying to
minimize the amount of cognitive load.
People don’t want options. They are ready to
exchange flexibility for simplicity.
Hence modern app UIs – simple, highly
repeatable interactions with almost no
customization possibilities.
Complex System-Operation Interfaces
Professional software requires heavy
customization capabilities. This means a LOT
of different functionalities need to be built in.
This means very complex interfaces.
Examples: ERP systems, Excel, Photoshop.
Any intent (e.g. “remove the bird from a
photo”) implies a set of complex actions to
be fulfilled.
Expertise in these UIs is a market
commodity.
New types of interactions
Chat (for Search-and-Select Interfaces) – old UI
with revolutionary new capabilities
Voice (for Simple Task-Based Applications) – the
new generations and the fall of typing
Adaptive UIs (for Complex System-Operation
Interfaces) – democratization of expertise
Vision – what software can do when it has a
sense of sight
Chat
Most useful for search-and-select interfaces, as a
replacement for complex search or live support
Standard RAG: Today, you can just encode the
whole content of your documentation/website (as
well as some non-visible documentation), put an
LLM on top of it and voila: you have an
automated chat covering >90% of search and
support inquiries for a fraction of the cost
It doesn’t have to. It should just know enough to handle the majority of
user inquiries, and it needs to know when it doesn’t know the answer
so it can direct the user to other sources (e.g. support)
RAG system
I tried it for this question and it didn’t know the answer
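A rough sketch of the RAG idea described above, including the "know when it doesn't know" fallback. Everything here is an illustrative assumption: the documents are made up, `embed` is a toy bag-of-words stand-in for a real embedding model, the `min_score` threshold is arbitrary, and a real system would pass the retrieved text to an LLM instead of returning it verbatim.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical documentation snippets standing in for an encoded website.
DOCS = [
    "To reset your password, open Settings and choose Reset password.",
    "Orders can be returned within 30 days of delivery.",
    "The API rate limit is 100 requests per minute per key.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase words (a real system would use a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: List[str]) -> Tuple[str, float]:
    """Return the best-matching document and its similarity score (docs must be non-empty)."""
    q = embed(question)
    best_score, best_doc = max((cosine(q, embed(d)), d) for d in docs)
    return best_doc, best_score


def answer(question: str, docs: List[str] = DOCS, min_score: float = 0.3) -> str:
    doc, score = retrieve(question, docs)
    if score < min_score:
        # Nothing relevant enough was found: hand over instead of guessing.
        return "I don't know. Please contact support."
    # A real system would send `doc` plus the question to an LLM here.
    return doc


print(answer("How do I reset my password?"))
print(answer("Which planets have rings?"))
```

The threshold is the design choice that matters: set too low, the chat answers from irrelevant context; set too high, it redirects to support more often than needed.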
The rise of voice and
the decline of typing
Frequency of sending voice messages among mobile users
by age group (UK, May 2023)
Consumers are changing their preferred
input modality, increasingly choosing
voice over typing. 7 billion voice messages
are sent daily on WhatsApp alone.
Whisper by OpenAI makes it easy to
transcribe any verbal request in >90
languages. It still requires a human check,
though.
Most useful for mobile apps, e.g. for simple
task-based applications that want to expand
their flexibility.
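One way to decide which transcribed parts need the human check mentioned above is to flag low-confidence segments. The sketch below assumes segment dictionaries shaped like those returned by the open-source `openai-whisper` package (keys `text`, `avg_logprob`, `no_speech_prob`). The function name, the thresholds and the sample values are illustrative assumptions, not recommendations.

```python
from typing import Any, Dict, List


def needs_human_check(
    segments: List[Dict[str, Any]],
    min_avg_logprob: float = -1.0,    # illustrative threshold
    max_no_speech_prob: float = 0.6,  # illustrative threshold
) -> List[str]:
    """Return the text of segments whose confidence looks too low to trust."""
    flagged = []
    for seg in segments:
        low_confidence = seg["avg_logprob"] < min_avg_logprob
        probably_silence = seg["no_speech_prob"] > max_no_speech_prob
        if low_confidence or probably_silence:
            flagged.append(seg["text"].strip())
    return flagged


# Hypothetical segments with made-up scores. With the real package they would come from:
#   import whisper
#   result = whisper.load_model("base").transcribe("request.m4a")
#   segments = result["segments"]
segments = [
    {"text": " Remove the dog from the photo.", "avg_logprob": -0.21, "no_speech_prob": 0.02},
    {"text": " and uh make it brighter", "avg_logprob": -1.35, "no_speech_prob": 0.10},
    {"text": " ...", "avg_logprob": -0.50, "no_speech_prob": 0.85},
]

flagged = needs_human_check(segments)
print(flagged)
```

In an app, flagged segments could be shown to the user for confirmation before the request is acted on.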
Adaptive UIs
How do we significantly lower the level of
expertise needed for complex system-operation
software (like Excel), while
enhancing its capabilities? Using agents.
Let’s rebuild Photoshop using this approach.
Very, very
high-level
representation
of agents
Remove dog from the photo
Plan:
1. Run object detection for “dog” (Grounding DINO)
2. Run semantic segmentation within the detected object (SAM)
3. Create a mask based on the segment and add 5%
4. Run the inpainting mechanism using Stable Diffusion v1.5
Selected the dog. Please verify the selection
Done
Segmented the dog. Please verify the segment
Done
I have generated the
final picture without
the dog. Hope you
like it.
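The plan-then-verify dialogue above can be sketched as a small executor that runs the steps in order and pauses for user confirmation where needed. This is a very simplified illustration: `Step`, `execute_plan` and the four tool functions are hypothetical, the tools are stubs standing in for the models named in the plan, and "add 5%" is interpreted here as growing the mask by 5%, which is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


@dataclass
class Step:
    name: str
    run: Callable[[State], State]     # returns new keys to merge into the state
    needs_verification: bool = False  # pause and ask the user before continuing


# Hypothetical stubs; real tools would call the models named in the plan.
def detect_dog(state: State) -> State:   # stands in for Grounding DINO
    return {"box": (40, 60, 200, 220)}


def segment_dog(state: State) -> State:  # stands in for SAM
    return {"segment_area": 10_000}


def build_mask(state: State) -> State:   # assumption: "add 5%" grows the mask
    return {"mask_area": round(state["segment_area"] * 1.05)}


def inpaint(state: State) -> State:      # stands in for Stable Diffusion v1.5
    return {"result": "photo without the dog"}


PLAN = [
    Step("object detection", detect_dog, needs_verification=True),
    Step("segmentation", segment_dog, needs_verification=True),
    Step("mask", build_mask),
    Step("inpainting", inpaint),
]


def execute_plan(
    steps: List[Step],
    state: State,
    confirm: Callable[[str, State], bool],
) -> Tuple[State, List[str]]:
    """Run steps in order; stop if the user rejects a step that needs verification."""
    log: List[str] = []
    for step in steps:
        state = {**state, **step.run(state)}
        log.append(step.name)
        if step.needs_verification and not confirm(step.name, state):
            log.append("stopped by user")
            break
    return state, log


# Here the "user" approves everything; a UI would show the selection and wait for Done.
final_state, log = execute_plan(PLAN, {"image": "photo.png"}, lambda name, s: True)
print(log)
print(final_state["result"])
```

Passing the confirmation as a callback keeps the human in the loop without tying the executor to a particular UI.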
Adaptive UIs
This approach can be used for any complex
software with a number of hidden and/or
complex capabilities, and also as a way to
reduce the cost of UI “real estate”: you
can show only the capabilities relevant to the
user at that specific moment.
Or the software could just perform the tasks
automatically (though this is not advised; it’s best
to always keep a human in the loop)
Vision
GPT-4V and other multi-modal generative models
(like LLaVA) are going to change the way people
interact with software.
As more and more products adopt visual input (like
screenshots, doodles or just style references),
the expectations of users are going to change
• Why would I type in one product if I can just
paste a screenshot into that other product?
• Why would I retype text into a company template
when I can just post an image of a reference
document and the text?
And then AR/VR in combination with these models:
we have yet to see where that takes us
Thank you


[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Models

  • 2. Agenda Copilot – what’s Microsoft building, what does it look like to build Copilot in Microsoft, what’s done in Serbia dev center. Sort of unrelated to the rest of the talk Types of user interfaces – go over how people interact with software today How will products be impacted – and how you can add value to your company or business. Examples of hypothetical future products
  • 3. Copilot ChatGPT covers general inquiries related to general knowledge. But it lacks proprietary context. Copilot aims to provide necessary context to LLMs – at least the one which exists in Microsoft ecosystem: your and your company’s documents, e-mails, databases and anything else you have access to
  • 6. Transforms the writing process to make you more creative and efficient. With now you can: • Create a summary of any document to share as a recap or quickly get up to speed. • Rewrite a paragraph or save time on formatting by asking Copilot to generate a table from your copy. • Create custom graphics right in the document with Microsoft Designer, which will pull from stock images, or your own uploads in the chat. • And much more (video on the next slide) Copilot in Word – Made in Serbia
  • 7. Microsoft 365 Copilot basic architecture 6 2 3 5 3 4 Data flow ( = all requests are encrypted via HTTPS and wss://) User prompts from Microsoft 365 Apps are sent to Copilot 1 2 3 4 5 6 Microsoft 365 Service Boundary Customer Microsoft 365 Tenant Semantic Index Azure OpenAI RAI Azure OpenAI instance is maintained by Microsoft. OpenAI has no access to the data or the model. RAI is performed on input prompt and output results Prompts, responses, and data accessed through Microsoft Graph aren't used to train foundation models 1
  • 8. Microsoft 365 Copilot basic architecture 6 2 3 5 3 4 Data flow ( = all requests are encrypted via HTTPS and wss://) User prompts from Microsoft 365 Apps are sent to Copilot 1 2 3 4 5 6 Microsoft 365 Service Boundary Customer Microsoft 365 Tenant Semantic Index Azure OpenAI RAI Azure OpenAI instance is maintained by Microsoft. OpenAI has no access to the data or the model. RAI is performed on input prompt and output results Prompts, responses, and data accessed through Microsoft Graph aren't used to train foundation models 1
  • 9. What is it like to work on Copilot Prompt engineering • Super-complex prompts with state-of- the-art prompting techniques. Main issue from quality perspective - hallucination • Building systems for automatic evaluation of prompts (sort of like regtests for prompt changes) • Manual evaluation of outputs AI engineering • Building and improving agents with iterative planning • Fine-tuning smaller models (e.g. gpt-3.5- turbo, open-source models) Safety • Responsible AI – LLMs can cause serious damage. Need to make sure people are not able to abuse the vast knowledge behind these models, while reducing block rate • Privacy, Compliance, Legal – this always comes first, it’s slowing development quite a bit, but necessary for Microsoft’s business model • Prompt injection – Could be part of either RAI or Privacy, but such a huge effort it deserves its own bullet point. With increasing the scope of LLM connectors with various data sources, prompt injection becomes a large security issue
  • 10. What is it like to work on Copilot Prompt engineering • Super-complex prompts with state-of- the-art prompting techniques. Main issue from quality perspective - hallucination • Building systems for automatic evaluation of prompts (sort of like regtests for prompt changes) • Manual evaluation of outputs AI engineering • Building and improving agents with iterative planning • Fine-tuning smaller models (e.g. gpt-3.5- turbo, open-source models) Bureaucracy • Responsible AI – LLMs can cause serious damage. Need to make sure people are not able to abuse the vast knowledge behind these models, while reducing block rate • Privacy, Compliance, Legal – this always comes first, it’s slowing development quite a bit, but necessary for Microsoft’s business model • Prompt injection – Could be part of either RAI or Privacy, but such a huge effort it deserves its own bullet point. With increasing the scope of LLM connectors with various data sources, prompt injection becomes a large security issue WE’RE HIRING (aka.ms/careers)
  • 11. Current types of user experiences Onto the main topic of the talk In order to understand how Generative AI will change the products we are building, we first need to understand how products are built today
  • 12. Current types of user experiences one of the many ways to skin a cat Simple Task-Based Applications – Intuitive, simple, limited UIs. Likes of Instagram, TikTok, FaceApp, etc. Search-and-Select Interfaces – Highly visual by nature. Likes of Amazon, AliExpress and other e-commerce platforms Complex System-Operation Interfaces – Complex interfaces for complex software solutions: Word, Photoshop, SAP, etc.
  • 13. Search-and-Select Interfaces Still mostly consumer products – but they are solving a specific problem of shopping, where a large stock is an advantage, hence can be more complex. Intuition and relevance of search results are crucial in these UIs. Good filtering is a huge competitive advantage. Good visuals as well. Complex online documentation (e.g. API) or web presentations are also a part of this group.
  • 14. Simple Task-Based Applications TikTok, Instagram, FaceApp, Twitter – consumer products Outside of work, people are trying to minimize the amount of cognitive load. People don’t want options. They are ready to exchange flexibility for simplicity. Hence modern app UIs – simple, highly repeatable interactions with almost no customization possibilities.
  • 15. Complex System- Operation Interfaces Professional software requires heavy customization capabilities. This means a LOT of different functionalities need to be built- in. This means very complex interfaces. Examples: ERP systems, Excel, Photoshop. Any intent (e.g. “remove the bird from a photo”) implies a set of complex actions to be fulfilled. Expertise in these UIs is a market commodity.
  • 16. New types of interactions Chat (for Search-and-Select Interfaces) – old UI with revolutionary new capabilities Voice (for Simple Task-Based Applications) – the new generations and the fall of typing Adaptive UIs (for Complex System-Operation Interfaces) – democratization of expertise Vision – what can a software do when it has a sense of sight
  • 17. Chat Most useful for search-and-select interfaces, as a replacement for complex search or live support Standard RAG: Today, you can just encode your whole content of the documentation/website (as well as some non-visible documentation), put an LLM on top of it and voila – you have an automated chat covering >90% of search and support inquiries for a fraction of the cost It doesn’t have to. It should just know enough to replace majority of user inquiries and it needs to know when it doesn’t know the answer so it can direct the user to other sources (e.g. support) RAG system I tried it for this question and it didn’t know the answer
  • 18. The rise of voice and the decline of typing Frequency of sending voice messages among mobile users by age group (UK, May 2023) Consumers are changing their preferences when it comes to input modality – by more and more preferring voice over typing. 7 billion voice messages only on WhatsApp daily. Whisper by OpenAI – making it easy to transcribe any verbal request in >90 languages. Still requires human check though. Most useful for mobile apps. E.g. simple task-based applications for expanding their flexibility.
  • 19. Adaptive UIs How do we significantly lower the level of expertise needed for complex system- operation software (like Excel), while enhancing their capabilities? Using agents. Let’s rebuild Photoshop using this approach. Very, very high-level representation of agents
  • 20. Adaptive UIs How do we significantly lower the level of expertise needed for complex system- operation software (like Excel), while enhancing their capabilities? Using agents. Let’s rebuild Photoshop using this approach.
  • 21. Adaptive UIs How do we significantly lower the level of expertise needed for complex system- operation software (like Excel), while enhancing their capabilities? Using agents. Let’s rebuild Photoshop using this approach. Remove dog from the photo
  • 22. Adaptive UIs How do we significantly lower the level of expertise needed for complex system- operation software (like Excel), while enhancing their capabilities? Using agents. Let’s rebuild Photoshop using this approach. Remove dog from the photo Plan: 1. Run object detection for “dog” 2. Run semantic segmentation within detected object 3. Create a mask in based on segment and add 5% 4. Run inpainting mechanism using Stable Diffusion v1.5
  • 23. Adaptive UIs How do we significantly lower the level of expertise needed for complex system- operation software (like Excel), while enhancing their capabilities? Using agents. Let’s rebuild Photoshop using this approach. Remove dog from the photo Plan: 1. Run object detection for “dog” (Gr.-DINO) 2. Run semantic segmentation within detected object 3. Create a mask in based on segment and add 5% 4. Run inpainting mechanism using Stable Diffusion v1.5 Selected the dog. Please verify the selection Apply
  • 24. Adaptive UIs How do we significantly lower the level of expertise needed for complex system- operation software (like Excel), while enhancing their capabilities? Using agents. Let’s rebuild Photoshop using this approach. Remove dog from the photo Plan: 1. Run object detection for “dog” (Gr.-DINO) 2. Run semantic segmentation within detected object SAM 3. Create a mask in based on segment and add 5% 4. Run inpainting mechanism using Stable Diffusion v1.5 Selected the dog. Please verify the selection Apply Done Segmented the dog. Please verify the selgment
  • 25. Adaptive UIs How do we significantly lower the level of expertise needed for complex system- operation software (like Excel), while enhancing their capabilities? Using agents. Let’s rebuild Photoshop using this approach. Remove dog from the photo Plan: 1. Run object detection for “dog” (Gr.-DINO) 2. Run semantic segmentation within detected object SAM 3. Create a mask in based on segment and add 5% 4. Run inpainting mechanism using Stable Diffusion v1.5 Selected the dog. Please verify the selection Done Segmented the dog. Please verify the selgment Done
  • 27. Adaptive UIs How do we significantly lower the level of expertise needed for complex system-operation software (like Excel) while enhancing its capabilities? Using agents. Let’s rebuild Photoshop using this approach. Remove dog from the photo Selected the dog. Please verify the selection. Done Segmented the dog. Please verify the segment. Done
  • 28. Adaptive UIs How do we significantly lower the level of expertise needed for complex system-operation software (like Excel) while enhancing its capabilities? Using agents. Let’s rebuild Photoshop using this approach. Remove dog from the photo Selected the dog. Please verify the selection. Done Segmented the dog. Please verify the segment. Done I have generated the final picture without the dog. Hope you like it.
  • 29. Adaptive UIs This approach can be used for any complex software with many hidden and/or complex capabilities. It is also a way to reduce the cost of UI “real estate”: you can show only the capabilities relevant to the user at that specific moment. Or the software could just perform the tasks automatically (though this is not advised; it’s best to always keep a human in the loop).
  • 32. Vision GPT-4V and other multi-modal generative models (like LLaVA) are going to change the way people interact with software. As more and more products adopt visual input (like screenshots, doodles or just style references), users’ expectations are going to change. • Why would I type in one product if I can just paste the screenshot in that other product? • Why would I retype text into a company template when I can just post an image of a reference document and the text? And then there is AR/VR in combination with these models: we have yet to see where that takes us.
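
The “remove dog from the photo” plan on the Adaptive UIs slides can be sketched as a small step runner that pauses for user verification after detection and segmentation. This is a minimal sketch, not the implementation behind the slides: the model calls are replaced by hypothetical stubs (`detect`, `segment`, `make_mask`, `inpaint`), the names `Step`, `run_plan` and `expand_box` are invented for illustration, and “add 5%” is assumed to mean growing the selection by 5% of its size on each side.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass
class Step:
    description: str
    run: Callable[[Dict], str]    # updates shared state, returns a message for the user
    needs_approval: bool = False  # pause here and ask the user to verify


def expand_box(box: Box, margin: float = 0.05) -> Box:
    """Grow a box by `margin` of its width and height on every side (assumed meaning of "add 5%")."""
    x0, y0, x1, y1 = box
    dx, dy = (x1 - x0) * margin, (y1 - y0) * margin
    return (x0 - dx, y0 - dy, x1 + dx, y1 + dy)


# Hypothetical stand-ins for the real models named on the slides.
def detect(state: Dict) -> str:
    state["box"] = (100.0, 100.0, 300.0, 200.0)  # pretend detector output
    return f"Selected the {state['target']}. Please verify the selection."


def segment(state: Dict) -> str:
    state["segment"] = state["box"]  # a real segmenter would return a pixel mask
    return f"Segmented the {state['target']}. Please verify the segment."


def make_mask(state: Dict) -> str:
    state["mask"] = expand_box(state["segment"], margin=0.05)
    return "Created the mask."


def inpaint(state: Dict) -> str:
    state["result"] = f"image with {state['mask']} filled in"
    return f"I have generated the final picture without the {state['target']}."


def build_plan() -> List[Step]:
    return [
        Step("Run object detection", detect, needs_approval=True),
        Step("Run segmentation within the detected object", segment, needs_approval=True),
        Step("Create a mask from the segment and add 5%", make_mask),
        Step("Run inpainting", inpaint),
    ]


def run_plan(steps: List[Step], state: Dict, approve: Callable[[str], bool]) -> List[str]:
    """Run steps in order; stop as soon as the user rejects a checkpoint."""
    log = []
    for step in steps:
        message = step.run(state)
        log.append(message)
        if step.needs_approval and not approve(message):
            log.append("Stopped: the user rejected this step.")
            break
    return log
```

In a real product, `approve` would be wired to the Apply button shown on the slides, so the agent never reaches inpainting without the user confirming the selection and the segment first.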

Editor's Notes

  • #5: Microsoft 365 Copilot is your AI assistant at work.  
  • #6: The most important thing about Copilot is that you’re always in control.  You decide what to keep, modify, or discard.  Let’s take a look at what Copilot can do for you. << Click to play video >>
  • #7: Copilot in Word transforms every part of the creative process. Coming soon with Designer integration in Word, you can effortlessly incorporate custom graphics into your document--Copilot uses the context of your content to propose stock visuals in the Chat, and you can upload and customize your own images for a more personal touch. Designer is the latest addition to our family of Microsoft 365 Consumer apps. Designer is in preview and available in English only.
  • #8: Copilot receives an input prompt from a user in an app, like Word or PowerPoint. Copilot then pre-processes the prompt through an approach called grounding, which improves the specificity of the prompt, ensuring that you get answers that are relevant and actionable to your specific task. It does this, in part, by making a call to Microsoft Graph and accessing your organization’s data. Data used by Copilot for an authenticated user is scoped to the documents and data that are already visible to them through existing Microsoft 365 role-based access controls. This retrieval of information is referred to as retrieval-augmented generation. It allows Copilot to provide exactly the right type of information as input to an LLM, combining this user data with other inputs such as information retrieved from knowledge base articles to improve the prompt. Copilot takes the response from the LLM and post-processes it. This post-processing includes additional grounding calls to Microsoft Graph, responsible AI checks, security, compliance and privacy reviews, and command generation. Copilot returns a recommended response to the user, and commands back to the apps where a user can review and assess the suggested response. Copilot iteratively processes and orchestrates these sophisticated services to produce results that are relevant to your business because they are contextually based on your organization’s data.
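
The flow described in the architecture notes (ground the prompt with data the user is already allowed to see, call the LLM, then post-process the response) can be illustrated with a toy pipeline. This is a rough sketch under stated assumptions, not Microsoft's API: `Document`, `retrieve`, `ground`, `passes_checks` and `answer` are hypothetical names, retrieval is plain word overlap rather than a semantic index, the responsible AI check is reduced to a blocked-term filter, and `llm` is any callable supplied by the caller.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, List, Set


@dataclass
class Document:
    title: str
    text: str
    allowed_users: Set[str]  # stand-in for role-based access controls


def retrieve(query: str, user: str, docs: List[Document], top_k: int = 2) -> List[Document]:
    """Return the documents visible to `user` that share the most words with the query."""
    words = set(query.lower().split())
    visible = [d for d in docs if user in d.allowed_users]  # access scoping comes first
    scored = [(len(words & set(d.text.lower().split())), d) for d in visible]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def ground(prompt: str, context: List[Document]) -> str:
    """Pre-processing: attach retrieved context to the user's prompt."""
    lines = [f"[{d.title}] {d.text}" for d in context]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {prompt}"


def passes_checks(text: str, blocked_terms: AbstractSet[str]) -> bool:
    """Toy stand-in for responsible AI checks on input and output."""
    lowered = text.lower()
    return not any(term in lowered for term in blocked_terms)


def answer(
    prompt: str,
    user: str,
    docs: List[Document],
    llm: Callable[[str], str],
    blocked_terms: AbstractSet[str] = frozenset(),
) -> str:
    if not passes_checks(prompt, blocked_terms):    # check the input prompt
        return "Request declined."
    grounded = ground(prompt, retrieve(prompt, user, docs))
    response = llm(grounded)
    if not passes_checks(response, blocked_terms):  # check the output
        return "Response withheld."
    return response
```

The point of the sketch is the ordering: documents are filtered by who may see them before anything is ranked or sent to the model, so a user can never surface content through the assistant that they could not open directly.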