[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Models

Next-gen interactions: Redefining User Experience with Large Language Models
Igor Ilic
Dall-e

Agenda
• Copilot: what Microsoft is building, what it is like to build Copilot at Microsoft, and what is being done in the Serbia development center. Sort of unrelated to the rest of the talk.
• Types of user interfaces: how people interact with software today.
• How products will be impacted, and how you can add value to your company or business. Examples of hypothetical future products.

Copilot
ChatGPT covers general-knowledge inquiries, but it lacks proprietary context.
Copilot aims to provide the necessary context to LLMs, at least the context that exists in the Microsoft ecosystem: your and your company’s documents, e-mails, databases and anything else you have access to.

Copilot in Word: Made in Serbia
Transforms the writing process to make you more creative and efficient.
With Copilot in Word you can now:
• Create a summary of any document to share as a recap or to quickly get up to speed.
• Rewrite a paragraph, or save time on formatting by asking Copilot to generate a table from your copy.
• Create custom graphics right in the document with Microsoft Designer, which will pull from stock images or your own uploads in the chat.
• And much more (video on the next slide)

Microsoft 365 Copilot basic architecture
[Diagram: data flow in six numbered steps. Labels shown: Microsoft 365 Service Boundary, Customer Microsoft 365 Tenant, Semantic Index, Azure OpenAI, RAI.]
Data flow (all requests are encrypted via HTTPS and wss://)
1. User prompts from Microsoft 365 Apps are sent to Copilot
[Steps 2 to 6 appear only in the diagram.]
Notes on the diagram:
• The Azure OpenAI instance is maintained by Microsoft. OpenAI has no access to the data or the model.
• RAI is performed on the input prompt and the output results.
• Prompts, responses, and data accessed through Microsoft Graph aren't used to train foundation models.
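The note that RAI checks run on both the input prompt and the output results can be pictured as a wrapper around the model call. The sketch below is only an illustration under loud assumptions: `BLOCKED_TERMS`, `rai_check`, `guarded_completion` and `echo_model` are hypothetical names invented here, and a keyword list stands in for a real safety classifier. It does not describe the actual Copilot or Azure OpenAI interfaces.

```python
from typing import Callable

# Hypothetical stand-in for a real content-safety classifier.
BLOCKED_TERMS = {"build a bomb", "steal credentials"}


def rai_check(text: str) -> bool:
    """Return True when the text passes the (toy) safety check."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def guarded_completion(prompt: str, model: Callable[[str], str]) -> str:
    """Check the prompt, call the model, then check the model's output."""
    if not rai_check(prompt):
        return "Request blocked by input check."
    response = model(prompt)
    if not rai_check(response):
        return "Response withheld by output check."
    return response


def echo_model(prompt: str) -> str:
    """Placeholder model used only for this illustration."""
    return f"Summary of: {prompt}"
```

Checking both sides matters because a harmless-looking prompt can still produce an unsafe output, for example when retrieved documents carry the problematic text.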

What is it like to work on Copilot
Prompt engineering
• Super-complex prompts with state-of-the-art prompting techniques. The main quality issue is hallucination.
• Building systems for automatic evaluation of prompts (sort of like regression tests for prompt changes)
• Manual evaluation of outputs
AI engineering
• Building and improving agents with iterative planning
• Fine-tuning smaller models (e.g. gpt-3.5-turbo, open-source models)
Safety
• Responsible AI (RAI): LLMs can cause serious damage. We need to make sure people are not able to abuse the vast knowledge behind these models, while keeping the block rate down.
• Privacy, Compliance, Legal: this always comes first. It slows development quite a bit, but it is necessary for Microsoft’s business model.
• Prompt injection: could be part of either RAI or Privacy, but it is such a huge effort that it deserves its own bullet point. As LLM connectors reach more and more data sources, prompt injection becomes a large security issue.
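The "regression tests for prompt changes" idea can be sketched as follows. This is a minimal illustration, not the evaluation system used on Copilot: `EvalCase`, `pass_rate`, `is_regression` and `fake_model` are hypothetical names, the grader is a plain substring match, and the model is a deterministic stub so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    user_input: str
    must_contain: str  # simplest possible check; real graders are far richer


def pass_rate(prompt_template: str, cases: List[EvalCase],
              model: Callable[[str], str]) -> float:
    """Fraction of cases whose model output contains the expected text."""
    passed = 0
    for case in cases:
        output = model(prompt_template.format(input=case.user_input))
        if case.must_contain.lower() in output.lower():
            passed += 1
    return passed / len(cases)


def is_regression(old_prompt: str, new_prompt: str, cases: List[EvalCase],
                  model: Callable[[str], str], tolerance: float = 0.0) -> bool:
    """True when the new prompt scores worse than the old one beyond the tolerance."""
    old_score = pass_rate(old_prompt, cases, model)
    new_score = pass_rate(new_prompt, cases, model)
    return new_score < old_score - tolerance


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call: it answers only when asked politely."""
    return "Paris" if "please" in prompt.lower() else "I am not sure."
```

A prompt change would then be accepted only if `is_regression(old, new, cases, model)` is false on a fixed set of cases, in the same way a code change must keep the test suite green.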

What is it like to work on Copilot
[Slide build: same content as the previous slide, with the "Safety" heading relabeled "Bureaucracy".]
WE’RE HIRING
(aka.ms/careers)

Current types of user experiences
Onto the main topic of the talk.
To understand how generative AI will change the products we are building, we first need to understand how products are built today.

Current types of user experiences
(one of the many ways to skin a cat)
• Simple Task-Based Applications: intuitive, simple, limited UIs. The likes of Instagram, TikTok, FaceApp, etc.
• Search-and-Select Interfaces: highly visual by nature. The likes of Amazon, AliExpress and other e-commerce platforms.
• Complex System-Operation Interfaces: complex interfaces for complex software solutions such as Word, Photoshop, SAP, etc.

Search-and-Select Interfaces
Still mostly consumer products, but they solve the specific problem of shopping, where a large stock is an advantage, hence the UI can be more complex.
Intuitive search and relevant results are crucial in these UIs. Good filtering is a huge competitive advantage, and so are good visuals.
Complex online documentation (e.g. API references) and web presentations are also part of this group.

Simple Task-Based Applications
TikTok, Instagram, FaceApp, Twitter: consumer products.
Outside of work, people try to minimize cognitive load. They don’t want options. They are ready to trade flexibility for simplicity.
Hence modern app UIs: simple, highly repeatable interactions with almost no customization possibilities.

Complex System-Operation Interfaces
Professional software requires heavy customization capabilities. This means a LOT of different functionality needs to be built in, which in turn means very complex interfaces. Examples: ERP systems, Excel, Photoshop.
Any intent (e.g. “remove the bird from a photo”) implies a set of complex actions that must be carried out.
Expertise in these UIs is a market commodity.

New types of interactions
• Chat (for Search-and-Select Interfaces): an old UI with revolutionary new capabilities
• Voice (for Simple Task-Based Applications): the new generations and the fall of typing
• Adaptive UIs (for Complex System-Operation Interfaces): democratization of expertise
• Vision: what software can do when it has a sense of sight

Chat
Most useful for search-and-select interfaces, as a replacement for complex search or live support.
Standard RAG (retrieval-augmented generation): today you can encode the whole content of your documentation or website (as well as some non-visible documentation), put an LLM on top of it, and voilà: you have an automated chat covering >90% of search and support inquiries for a fraction of the cost.
The chat doesn’t have to know everything. It just needs to know enough to handle the majority of user inquiries, and it needs to know when it doesn’t know the answer so it can direct the user to other sources (e.g. support).
[Screenshot: RAG system. Caption: “I tried it for this question and it didn’t know the answer.”]
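A RAG chat that "knows when it doesn't know" can be sketched as retrieval plus a confidence threshold. The example below is a toy under stated assumptions: the three documents are invented, `embed` is a bag-of-words count rather than a learned embedding, the `min_score` value of 0.3 is arbitrary, and the LLM call is replaced by a template string. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Toy corpus standing in for encoded documentation pages (invented for the example).
DOCS = [
    "To reset your password open Settings and choose Reset password.",
    "Orders can be returned within 30 days of delivery for a full refund.",
    "Shipping to EU countries usually takes three to five business days.",
]


def embed(text: str) -> Counter:
    """Bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: List[str]) -> Tuple[str, float]:
    """Return the best-matching document and its similarity score."""
    q = embed(question)
    score, doc = max((cosine(q, embed(d)), d) for d in docs)
    return doc, score


def answer(question: str, docs: List[str] = DOCS, min_score: float = 0.3) -> str:
    """Ground the reply in the retrieved document, or hand off when retrieval is weak."""
    doc, score = retrieve(question, docs)
    if score < min_score:
        return "I don't know. Please contact support."
    # A real system would pass `doc` plus the question to an LLM here.
    return f"Based on the documentation: {doc}"
```

The threshold is the part that implements the hand-off to support: a question with no close match in the corpus gets the fallback message instead of a guessed answer.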

The rise of voice and the decline of typing
[Chart: Frequency of sending voice messages among mobile users by age group (UK, May 2023)]
Consumers are changing their preferred input modality, increasingly choosing voice over typing. 7 billion voice messages are sent daily on WhatsApp alone.
Whisper by OpenAI makes it easy to transcribe any verbal request in >90 languages. It still requires a human check, though.
Most useful for mobile apps, e.g. simple task-based applications that want to expand their flexibility.
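One way to combine Whisper transcription with the human check mentioned above is to flag low-confidence segments for review. The sketch below assumes the open-source `openai-whisper` Python package, whose `transcribe` result contains a `segments` list with `avg_logprob` and `no_speech_prob` fields. The helper names and the idea of routing flagged segments to a reviewer are assumptions made for illustration, and the thresholds simply reuse Whisper's own decoding defaults.

```python
from typing import Dict, List


def transcribe(audio_path: str, model_name: str = "base") -> Dict:
    """Transcribe an audio file with the open-source `openai-whisper` package."""
    import whisper  # imported lazily so the helper below works without it installed

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)


def segments_needing_review(segments: List[Dict], min_avg_logprob: float = -1.0,
                            max_no_speech_prob: float = 0.6) -> List[Dict]:
    """Return segments whose confidence signals suggest a human should check them."""
    return [
        seg for seg in segments
        if seg.get("avg_logprob", 0.0) < min_avg_logprob
        or seg.get("no_speech_prob", 0.0) > max_no_speech_prob
    ]
```

In use, an app would call `transcribe("request.mp3")` (a hypothetical file name), act directly on segments that pass, and show the flagged ones to the user for confirmation before acting on them.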

Adaptive UIs
How do we significantly lower the level of expertise needed for complex system-operation software (like Excel), while enhancing its capabilities? Using agents.
Let’s rebuild Photoshop using this approach.
[Diagram: very, very high-level representation of agents]

Adaptive UIs
Remove dog from the photo
Plan:
1. Run object detection for “dog” (Grounding DINO)
2. Run semantic segmentation within the detected object (SAM)
3. Create a mask based on the segment and add 5%
4. Run the inpainting mechanism using Stable Diffusion v1.5
[Slide build: the chat panel advances through the plan one step per slide.]
Selected the dog. Please verify the selection. [Apply]
Done
Segmented the dog. Please verify the segment.
Done
I have generated the final picture without the dog. Hope you like it.
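The walkthrough above (a plan of tool calls, with the user verifying intermediate results) can be sketched as a small plan executor. This is a minimal illustration, not how any real product implements it: `Step`, `run_plan`, `TOOLS` and `PLAN` are hypothetical names, and each tool is a stub that only records what a real detector, segmenter or inpainting model would return.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    description: str
    tool: str                 # key into the tool registry
    needs_verification: bool  # pause for the user before continuing?


def run_plan(steps: List[Step], tools: Dict[str, Callable[[dict], dict]],
             confirm: Callable[[str], bool]) -> dict:
    """Execute steps in order, threading a shared state dict through the tools."""
    state: dict = {"log": []}
    for step in steps:
        state = tools[step.tool](state)
        state["log"].append(step.description)
        if step.needs_verification and not confirm(step.description):
            state["aborted_at"] = step.description
            break
    return state


# Stub tools: each only records what a real model would produce.
TOOLS = {
    "detect": lambda s: {**s, "box": (40, 60, 200, 220)},
    "segment": lambda s: {**s, "segment": "pixels inside " + str(s["box"])},
    "mask": lambda s: {**s, "mask": s["segment"] + " plus 5% margin"},
    "inpaint": lambda s: {**s, "result": "photo with " + s["mask"] + " filled in"},
}

PLAN = [
    Step('Run object detection for "dog"', "detect", True),
    Step("Run semantic segmentation within detected object", "segment", True),
    Step("Create a mask from the segment and add 5%", "mask", False),
    Step("Run inpainting", "inpaint", False),
]
```

Calling `run_plan(PLAN, TOOLS, confirm)` with a `confirm` callback that shows the intermediate result and waits for the user's click keeps the human in the loop. Returning `False` from the callback stops the plan before later tools run.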

Adaptive UIs
This approach can be used for any complex software with many hidden and/or complex capabilities. It is also a way to reduce the cost of UI “real estate”: you can show only the capabilities relevant to the user at that specific moment.
Or the software could just perform the tasks automatically (though this is not advised; it’s best to always keep a human in the loop).

Vision
GPT-4V and other multimodal generative models (like LLaVA) are going to change the way people interact with software.
As more and more products adopt visual input (like screenshots, doodles or just style references), users’ expectations are going to change:
• Why would I type in one product if I can just paste a screenshot in that other product?
• Why would I retype into the company template when I can just post an image of the reference document and the text?
And then there is AR/VR in combination with these models: it is yet to be seen where that takes us.
