Gnerative AI presidency Module1_L1_L2.pptx

Generative AI
Faculty Name: Dr J Alamelu Mangai
Designation: Professor
Department: CSE
Subject Code & Subject Name: CSE3348 Generative AI
Students: School of Computer Science and Engineering

What is GenAI?
• Generative AI refers to a set of algorithms that can generate new
content in media such as images, text, audio or video.
• The generated content is similar to the content the algorithm was
trained on.
• A prominent type of generative AI is the large language model (LLM),
which generates natural-language text based on prompts.
• The GPT (Generative Pre-trained Transformer) series is a well-known
example of generative AI.
• ChatGPT, a chatbot built on GPT models, is a renowned example of an
LLM application.
Ranjitha P-20213CSE0014

What is GenAI?
• GenAI:
• algorithms that generate novel content
• unlike traditional predictive ML, which analyses existing data to
classify it or make predictions, GenAI creates new data
• GenAI models can generate text, images and other creative content
that is often hard to distinguish from human-generated content.

Generative Vs. Discriminative Modeling [T2 pg. 1 – 5]
• Discriminative modeling is like supervised learning: the model learns
to predict a label for each observation.

What is Generative modeling? [T2 pg. 1 – 5]
• A generative model describes how a data set is generated in terms of
a probabilistic model.
• By sampling this model, new data can be generated.
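A minimal sketch of this idea, assuming the simplest possible probabilistic model (a 1-D Gaussian) and a small made-up dataset of heights: the model is "trained" by estimating a mean and a standard deviation, and new data is generated by sampling from the fitted distribution. Real generative models learn far richer distributions, but the fit-then-sample pattern is the same.

```python
import random
import statistics

def fit_gaussian(data):
    """Estimate the parameters of a 1-D Gaussian from the training data."""
    return statistics.mean(data), statistics.stdev(data)

def sample(mu, sigma, n, seed=None):
    """Generate n new observations by sampling the fitted Gaussian."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical training data: heights of people in cm (made up for illustration)
heights = [160.0, 165.5, 170.2, 172.8, 168.1, 175.4, 162.3, 169.9]
mu, sigma = fit_gaussian(heights)
new_heights = sample(mu, sigma, n=5, seed=0)

print(f"mean={mu:.1f} cm, std={sigma:.1f} cm")
print([round(h, 1) for h in new_heights])  # new, realistic-looking heights
```

Fixing the seed only makes the sketch reproducible; without it, every call produces different samples, which is exactly the randomness a generative model needs.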

What is Generative modeling?
• Any generative modeling process has:
• Training data: examples of the entity the model has to generate.
• Observation: a single example from the training data.
• Each observation is described by many features.
• Ex: an image of a horse has its individual pixel values as features.
• A generative model must be probabilistic rather than deterministic.
• The model must include some randomness, so that each sample it
generates is different.
• The model has to approximate the unknown probability distribution
that explains why some images are likely to appear in the training
data and others are not.
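To make "observation", "features" and "randomness" concrete, here is a deliberately naive sketch, assuming tiny made-up 3x3 black-and-white images whose 9 pixels are the features. It estimates the probability that each pixel is on and then samples every pixel independently. The independence assumption is ours and is far too simple for real images, which is why practical models such as VAEs and GANs are used instead.

```python
import random

# Hypothetical training set: four 3x3 black/white images, flattened to 9 pixel features.
training_images = [
    [0, 1, 0,
     0, 1, 0,
     0, 1, 0],
    [0, 1, 0,
     0, 1, 0,
     1, 1, 1],
    [1, 1, 0,
     0, 1, 0,
     0, 1, 0],
    [0, 1, 0,
     0, 1, 0,
     0, 1, 1],
]

def fit_pixel_probs(images):
    """Estimate P(pixel i is on) for every pixel from the training data."""
    n = len(images)
    return [sum(img[i] for img in images) / n for i in range(len(images[0]))]

def generate(probs, rng):
    """Sample each pixel independently; this randomness makes the model probabilistic."""
    return [1 if rng.random() < p else 0 for p in probs]

probs = fit_pixel_probs(training_images)
rng = random.Random(42)
samples = [generate(probs, rng) for _ in range(3)]

for img in samples:
    # Draw each sampled image as 3 rows of '#' (on) and '.' (off)
    print("\n".join("".join("#" if px else "." for px in img[r * 3:(r + 1) * 3]) for r in range(3)))
    print()
```

Pixels that are always on in the training data (the vertical stroke) are always on in the samples, while uncertain pixels vary from sample to sample.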

• If the model mimics this distribution closely, sampling from it
produces new observations that look realistic.
• Discriminative modeling is done on labelled data.
• Generative modeling is usually done on unlabelled data (as in
unsupervised learning).
• It can also be used on labelled data to generate samples of a
specific class in the training data.
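The contrast with discriminative modeling, and the point about generating samples of a specific class, can be sketched with a Gaussian class-conditional model (effectively Gaussian naive Bayes on a single feature). The data values below are made up for illustration. The model learns p(x | class) and p(class); because it describes how each class's observations are distributed, it can sample new observations of a chosen class, and it can also classify via Bayes' rule. A purely discriminative model such as logistic regression learns only p(class | x) and has no way to sample new x.

```python
import math
import random
import statistics

# Hypothetical labelled data: one feature (petal length in cm) for two classes.
data = {
    "setosa":     [1.4, 1.3, 1.5, 1.4, 1.7, 1.6],
    "versicolor": [4.7, 4.5, 4.9, 4.0, 4.6, 4.5],
}

# Generative model: one Gaussian p(x | class) per class, plus the class prior p(class).
params = {c: (statistics.mean(xs), statistics.stdev(xs)) for c, xs in data.items()}
total = sum(len(xs) for xs in data.values())
priors = {c: len(xs) / total for c, xs in data.items()}

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def sample_class(c, n, rng):
    """Generate n new observations of class c by sampling p(x | c)."""
    mu, sigma = params[c]
    return [rng.gauss(mu, sigma) for _ in range(n)]

def classify(x):
    """The same model can classify via Bayes' rule: argmax_c p(x | c) * p(c)."""
    return max(params, key=lambda c: gaussian_pdf(x, *params[c]) * priors[c])

rng = random.Random(1)
new_versicolor = sample_class("versicolor", 3, rng)
print([round(x, 2) for x in new_versicolor])  # new samples of one chosen class
print(classify(1.5), classify(4.4))           # setosa versicolor
```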

Generative Modeling projects – Examples
• StyleGAN by NVIDIA: generates hyper-realistic images of human
faces.
• GPT by OpenAI: given a short introductory passage, the model
continues the passage.

Generative Modeling projects – Examples [T1 pg. 4-]
• OpenAI:
• A US-based AI research company that promotes and develops friendly AI
applications.
• Started as a non-profit organisation in 2015.
• In 2019, it created a "capped-profit" company under the non-profit.
• Significant achievements: the Gym library for training reinforcement
learning algorithms.
• More recently: the GPT-n models and the DALL-E generative models, which
generate images from text.

Generative models? [T1 pg. 4]
• Artificial Intelligence (AI): a broad field of CS focused on creating
intelligent agents that can reason, learn and act autonomously.
• Machine Learning (ML): a subset of AI focused on developing algorithms
that can learn from data.
• Deep Learning (DL): a branch of ML that uses deep neural networks with
many layers to learn complex patterns from data.
• Generative models: a type of ML model that can generate new data based
on patterns learnt from the input data.
• Language Models (LMs): statistical models that predict the next word in
a sequence of natural-language text, e.g. ”The sky is ________”.
• Large Language Models (LLMs): language models that use deep learning and
are trained on massive datasets.
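The "The sky is ________" example can be sketched as a bigram language model, assuming a tiny made-up corpus: it counts which word follows which and turns those counts into next-word probabilities. A bigram model looks only at the previous word ("is"); LLMs condition on the whole preceding context using deep neural networks, but the task of predicting the next word is the same.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real LM is trained on vastly more text.
corpus = [
    "the sky is blue",
    "the sky is blue today",
    "the sky is clear",
    "the sea is blue",
]

# Count how often each word follows each other word (bigram counts).
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def next_word_probs(prev):
    """P(next word | previous word), estimated from the bigram counts."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("is"))  # {'blue': 0.75, 'clear': 0.25}
```

Sampling from these probabilities instead of always taking the most likely word is what turns the predictor into a (very small) text generator.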

• Generative models:
• are a powerful type of AI that can generate new data resembling the
training data.
• handle different data modalities and are used in different domains and
applications: text, image, music and video.
• synthesise new data rather than just making predictions or decisions.
• can create synthetic data to train an AI model when real data is
scarce.

OpenAI’s generative models: https://platform.openai.com/docs/models

Evolution of Generative AI
• 1948: Claude Shannon wrote a paper called “A Mathematical Theory
of Communication”. In this paper, he introduced the idea of n-grams,
a statistical model that can generate new text based on existing text.
• 1950: Alan Turing wrote a paper called “Computing Machinery and
Intelligence”. In this paper, he introduced the Turing Test, a way to
determine whether a machine can behave intelligently like a human.
• 1952: A.L. Hodgkin and A.F. Huxley created a mathematical model
that explains how neurons generate and transmit electrical signals
(action potentials). This model inspired the development of artificial
neural networks, which are used in generative AI.

• 1965: Alexey Ivakhnenko and Valentin Lapa developed the first working
learning algorithm for deep (multilayer) feedforward neural networks.
This algorithm enabled the networks to learn complex nonlinear functions
from data.
• 1979: Kunihiko Fukushima introduced the neocognitron, a hierarchical,
multilayered neural network that was a forerunner of deep convolutional
neural networks. It was designed to recognize handwritten digits and
other visual patterns.
• 1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams wrote a
paper called “Learning Representations by Back-propagating Errors.”
This paper popularized the backpropagation algorithm, which is
commonly used to train neural networks.

• 1997: Sepp Hochreiter and Jürgen Schmidhuber introduced the long
short-term memory (LSTM) network, a type of recurrent neural network
that can learn long-term dependencies in sequential data. (Hochreiter's
1991 thesis had analysed the vanishing-gradient problem that LSTM
addresses.)
• 2001: Yoshua Bengio and his colleagues created the Neural
Probabilistic Language Model (NPLM), a neural network that learns
representations of words and uses them to predict the next word in
natural-language text.
• 2014: Diederik Kingma and Max Welling introduced the variational
autoencoder (VAE), a model that learns representations of data and
generates new data based on those learned representations.

• 2014: Ian Goodfellow and his colleagues introduced the generative
adversarial network (GAN). It is a type of generative model that
comprises two neural networks: a generator and a discriminator. The
generator aims to generate realistic data, while the discriminator aims
to differentiate between real and fake data.
• 2015: Jascha Sohl-Dickstein and his colleagues proposed the diffusion
model, a generative model that learns to reverse a process that
gradually transforms data into noise.
• 2016: Aaron van den Oord and his team introduced WaveNet, a
powerful neural network that can create lifelike speech and music
waveforms.
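The "gradually transforms data into noise" part of a diffusion model can be written in closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ε drawn from a standard Gaussian. The sketch below assumes the DDPM-style formulation with a linear noise schedule (β from 1e-4 to 0.02 over T = 1000 steps, a common choice that these slides do not specify) and uses a constant signal as stand-in data. It shows only the fixed forward process; the generative part is a neural network trained to reverse it step by step, which is not shown.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each of the T steps (assumed linear schedule)."""
    return np.linspace(beta_start, beta_end, T)

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

T = 1000
betas = linear_beta_schedule(T)
alpha_bars = np.cumprod(1.0 - betas)  # a_bar_t = product of (1 - beta_s) for s <= t

rng = np.random.default_rng(0)
x0 = np.ones(10_000)  # stand-in "data": a constant signal
for t in (0, 250, 999):
    xt, _ = forward_diffuse(x0, t, alpha_bars, rng)
    # As t grows, the signal fades (mean -> 0) and x_t becomes pure noise (std -> 1)
    print(t, round(float(alpha_bars[t]), 4), round(float(xt.mean()), 3), round(float(xt.std()), 3))
```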

• 2017: Ashish Vaswani and his team introduced the Transformer, a
neural network design that leverages attention mechanisms to learn
from sequential information, like language or speech.
• 2018: Alec Radford and his team introduced the Generative Pre-trained
Transformer (GPT), a large language model that uses the Transformer
architecture to generate text on many different subjects.
• 2018: Jacob Devlin and his team introduced BERT, a model that learns
contextual representations of words and sentences. It uses the
Transformer encoder and is pre-trained on large amounts of text
without needing task-specific labels.

• 2019: Tero Karras and his team at NVIDIA
introduced StyleGAN, an enhanced type of GAN (Generative
Adversarial Network) that can create a wide range of detailed and
realistic images, including faces, animals, landscapes, and more.
• 2020: Large Language Models Take Center Stage: OpenAI’s GPT-3
(Generative Pre-trained Transformer 3) with 175 billion parameters
pushes the boundaries of language generation, demonstrating
impressive capabilities in text creation, translation, and code writing.
• 2020: a team led by Alexei Baevski introduced wav2vec 2.0. It is a
model that can learn speech representations directly from raw audio
and achieved excellent performance in speech recognition tasks.

• 2021: Aditya Ramesh and his team created DALL-E, a powerful model
that can create lifelike images based on written descriptions.
• 2021: Focus on Control and Explainability: Researchers grapple with
the “black box” nature of large language models, seeking methods to
improve control over generated outputs and explain the reasoning
behind their creations.
• 2022: Diffusion Models Gain Traction: Diffusion models, known for
their ability to create realistic images, experience a surge in
popularity. Applications in image generation, editing, and inpainting
become prominent.

• 2023: Multimodal Generative AI Takes Shape: Models capable of
generating across different modalities, like text and image
combinations, start to emerge. This opens doors for more interactive
and immersive experiences.
• 2023: Ethical Considerations Mount: Concerns around bias,
misinformation, and potential misuse of generative AI lead to
discussions on responsible development and deployment practices.
• 2024: Focus on Real-World Integration: A growing trend towards
integrating generative AI tools into real-world applications across
various industries like customer service, product design, and
marketing.

Advantages of generative modeling
• Synthetic data generated by generative models reduces the cost of
labelling and improves training efficiency.
• Microsoft Research trained phi-1, an LLM for basic Python coding,
partly on synthetic data produced by a generative model.
• It is a transformer with 1.3 billion parameters.
• It was trained on filtered code from The Stack, Q&A content from
StackOverflow, and synthetic textbooks and exercises generated by GPT-3.5.
• “Textbooks Are All You Need”, June 2023:
https://www.microsoft.com/en-us/research/publication/textbooks-are-all-you-need/

Types of generative models [T1 pg. 6]
• Different types of generative models exist for different data modalities:
1) Text-to-text:
• models that generate text from input text, such as conversational agents.
Ex: Llama 2, GPT-4, Claude, PaLM 2
• A conversational agent is a program designed to converse with humans in
natural language.
• It can talk to people on phones, computers and other devices, letting them
order food or perform other tasks through voice, text or chat.
• It does this using technologies such as natural language processing (NLP),
machine learning (ML), speech recognition, text-to-speech synthesis and
dialog management.

• Llama 2 is a family of pre-trained and fine-tuned
large language models (LLMs) released by Meta AI in 2023.
• Released free of charge for research and commercial use, Llama 2
models can perform a variety of natural language processing (NLP)
tasks, from text generation to code generation.

• GPT-n by OpenAI:
• Generative Pre-trained Transformer 3
(GPT-3) is a large language model
released by OpenAI in 2020.
• It is a decoder-only transformer, a deep
neural network that replaces recurrence
and convolution-based architectures with a
technique known as "attention". It has 175
billion parameters.
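The "attention" used by a decoder-only model like GPT-3 can be sketched as single-head scaled dot-product self-attention with a causal mask, so that each token attends only to itself and earlier tokens. This is a toy version under stated assumptions: random embeddings and random projection matrices stand in for learned weights, and real models use many attention heads, many stacked layers and further components (feed-forward layers, normalization) that are omitted here.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a causal mask.

    X: (seq_len, d_model) token embeddings. Position i may attend only to
    positions 0..i, which is what lets a decoder-only model generate text
    one token at a time.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (seq_len, seq_len)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)  # True above the diagonal = future tokens
    scores = np.where(mask, -np.inf, scores)               # block attention to the future
    scores -= scores.max(axis=-1, keepdims=True)           # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over each row
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.standard_normal((seq_len, d_model))  # stand-in token embeddings
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))

out, weights = causal_self_attention(X, Wq, Wk, Wv)
print(out.shape)             # (4, 8)
print(np.round(weights, 2))  # upper triangle is all zeros
```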

2) Text-to-Image:
• Models that generate images from text captions. Ex: DALL-E 2, Stable Diffusion
and Imagen.
• DALL-E 2: https://openai.com/index/dall-e-2/
• The original DALL·E is a 12-billion parameter version of GPT-3 trained to
generate images from text descriptions, using a dataset of text–image pairs.

3) Text-to-Audio:
• Models that generate audio clips and music from text. Ex: Jukebox, AudioLM and
MusicGen
• Jukebox, developed by OpenAI, is a neural network that generates music,
including original songs in different genres and styles.
• It compresses raw audio into discrete codes with a VQ-VAE and models those
codes with Transformers, conditioned on genre, artist and lyrics.
• Its main use cases are music generation, song completion and music style
transfer. It can generate new songs in the style of a given artist, or continue
a song from a short audio clip.

4) Text-to-video:
• Models that generate video content from text descriptions. Ex:
Phenaki and Emu Video
• Phenaki: a model for generating videos from text, with prompts that
can change over time, and videos that can be several minutes long.
https://phenaki.video/
5) Text-to-Speech: models that synthesize speech audio from input
text. Ex: WaveNet and Tacotron
6) Speech-to-text: models that transcribe speech to text (also called
Automatic Speech Recognition, ASR). Ex: Whisper and SpeechGPT

7) Image-to-text: models that generate text, such as captions, from images.
Ex: BLIP and GPT-4V (CLIP links images and text but does not itself
generate captions).
8) Image-to-image: Applications:
• data augmentation,
• Neural style transfer (NST): manipulating digital images or videos
so that they adopt the appearance or visual style of another image.
• A new image is generated by combining the content of one image
with the style of another image.
• The goal is an image that preserves the content of the original
image while applying the visual style of the other image.

• Inpainting: filling in missing or damaged regions of an image.
Ex: restoring a right arm that is missing in the original image.
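The content/style split in neural style transfer can be sketched with the loss used in optimization-based NST (the approach of Gatys et al.): content is compared through CNN feature maps directly, style through Gram matrices (channel-to-channel correlations) of those feature maps. In practice the features come from a pretrained CNN such as VGG and the generated image is optimized by gradient descent; here random arrays stand in for feature maps, and the normalization and the weights alpha and beta are illustrative assumptions.

```python
import numpy as np

def gram_matrix(features):
    """Style representation: correlations between the channels of a feature map.

    features: array of shape (channels, height, width).
    """
    c, h, w = features.shape
    F = features.reshape(c, h * w)
    return F @ F.T / (c * h * w)

def content_loss(gen_feats, content_feats):
    """Keep the generated image's features close to the content image's features."""
    return np.mean((gen_feats - content_feats) ** 2)

def style_loss(gen_feats, style_feats):
    """Keep the generated image's Gram matrix close to the style image's Gram matrix."""
    return np.mean((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)

def total_loss(gen_feats, content_feats, style_feats, alpha=1.0, beta=1e3):
    """Weighted sum that an NST optimizer would minimize (weights are assumptions)."""
    return alpha * content_loss(gen_feats, content_feats) + beta * style_loss(gen_feats, style_feats)

rng = np.random.default_rng(0)
content_feats = rng.standard_normal((16, 8, 8))  # stand-in CNN features of the content image
style_feats = rng.standard_normal((16, 8, 8))    # stand-in CNN features of the style image
generated = content_feats.copy()                 # e.g. start from the content image

print(content_loss(generated, content_feats))  # 0.0: content matches exactly
print(style_loss(generated, style_feats) > 0)  # True: style does not match yet
# Gram matrices ignore *where* features occur, so they capture style rather than layout:
print(np.allclose(gram_matrix(style_feats), gram_matrix(style_feats[:, :, ::-1])))  # True
```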

9) Text-to-code: models that generate programming code from text.
Ex: Codex and AlphaCode
10) Video-to-audio:
Models that analyse video and generate matching audio.
Ex: Soundify
11) Text-to-Math: models that generate mathematical expressions from text.
• Many other combinations of data modalities exist.
• Text is the modality common to most of these combinations.
• OpenAI’s GPT-4V model (September 2023) takes both text and images as
input, which enables tasks such as OCR (reading text from images).
