VIDEO-TO-TEXT SUMMARIZATION USING NLP:
TRANSFORMING VISUAL CONTENT INTO CONCISE TEXT
SUMMARIES
Reg.no Name
23B81D5906 Mekala Hari Ranjitha Nalini
Guide:
Dr. N. Deepak
Professor
Department of CSE,
Sir C.R.Reddy College of Engineering
SIR C R REDDY COLLEGE OF ENGINEERING, ELURU
Approved by AICTE & Permanently Affiliated to JNTUK, Kakinada
Accredited by NBA, Accredited by NAAC with ‘A’ Grade
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
OUTLINE OF THE PRESENTATION
• Abstract
• Introduction
• Literature Survey
• Problem Statement
• Existing System
• Proposed System
• Code and Implementation
• Output Screens
• Conclusion
ABSTRACT
▰ Video summarization aims to produce a high-quality text-based summary of videos.
▰ The process involves converting video files to audio files, followed by converting the
audio into text.
▰ Transformer architecture of Natural Language Processing (NLP) enhances the
workflow.
▰ An extractive-video-summarizer is introduced using state-of-the-art pre-trained ML
models and open-source libraries.
▰ The summarizer follows a systematic regime consisting of five stages:
▻ Preparation of a multidisciplinary dataset of videos.
▻ Audio extraction from video files.
▻ Text generation from audio files using Automatic Speech Recognition (ASR).
▻ Text summarization using extractive summarizers.
▻ Entity extraction using Named Entity Recognition (NER).
ABSTRACT (cont…)
▰ The project was conducted primarily on English-language videos.
▰ The model performs significantly well and generates accurate, contextually relevant tags for
videos.
▰ Video datasets are collected from various domains to ensure diversity.
▰ Specialized tools are used for extracting audio from video files.
▰ Advanced ASR systems are employed to ensure accurate speech-to-text conversion.
▰ Extractive summarizers generate concise and informative summaries.
▰ Named Entity Recognition (NER) identifies key entities like names, locations, and events.
▰ State-of-the-art pre-trained models enhance performance and accuracy.
▰ Evaluation metrics demonstrate the model’s effectiveness in generating relevant summaries.
▰ Effective content management and information retrieval are achieved through the generated
summaries.
▰ The extractive-video-summarizer offers a robust solution for video content analysis and
summarization.
INTRODUCTION
• Video summarization is an essential technique for generating concise, high-quality
text-based summaries of videos.
• It helps users quickly understand the core information and key insights from video
content.
• The summarization process involves converting video files to audio and
subsequently transcribing the audio to text.
• Transformer-based Natural Language Processing (NLP) models significantly
enhance the accuracy and quality of the summaries.
• Existing text summarization models have paved the way for advancements in video
summarization.
• Our proposed extractive-video-summarizer leverages state-of-the-art pre-trained
Machine Learning (ML) models and open-source libraries.
• The model follows a structured approach, encompassing video data collection,
audio extraction, transcription, extractive summarization, and entity extraction.
INTRODUCTION
• The summarizer ensures effective content management and rapid information retrieval.
• Robust evaluation metrics confirm its effectiveness in generating accurate and relevant
summaries.
• The entity extraction feature further enhances summary quality by identifying key
information like names, locations, and events.
• Open-source libraries provide flexibility and seamless integration into various applications.
• The model’s systematic regime ensures adaptability across diverse video datasets from
multiple domains.
• Its advanced ASR systems offer precise speech-to-text conversion, facilitating accurate
transcription.
• Evaluation results indicate superior performance compared to traditional methods.
• This research demonstrates the practical application of AI in automating video content
analysis and management.
• The extractive-video-summarizer provides a scalable, efficient, and reliable solution for
video analysis.
• Ultimately, it enhances the accessibility of information by generating insightful video
summaries in a time-efficient manner.
INTRODUCTION
• Speech Recognition is a prominent field within machine learning, widely applied
across various domains.
• It powers applications like automatic subtitles on platforms such as Netflix and
YouTube.
• Popular voice assistants like Google Home Mini, Amazon Alexa, and Apple Siri rely
heavily on Speech Recognition.
• Named Entity Recognition (NER) is a crucial Natural Language Processing (NLP)
technique that identifies and extracts specific entities from text.
• NER can detect product names, events, and locations, enhancing search engines,
chatbots, and automated data entry systems.
• Text analysis using NER enables the classification of entities into predefined
categories like dates, phone numbers, or monetary values.
• The primary objective of our model is to generate audio files from videos, convert
them into text, and extract relevant entities.
• Using NLP, applications can process video content to produce text transcripts and
extract entities.
INTRODUCTION
• Extracted entities are used to generate meaningful tags that enrich video metadata.
• This enriched metadata significantly enhances content recommendations for users.
• Entity extraction streamlines content management and makes video data more
accessible.
• Video platforms can deliver personalized content by leveraging extracted entities for
better recommendations.
• Automated entity extraction reduces manual effort, improving operational efficiency.
• Our model ensures accurate entity extraction by utilizing pre-trained NLP models.
• It supports multiple languages, broadening its usability and reach.
• Evaluations indicate its effectiveness in improving content discoverability and user
experience.
• By integrating Speech Recognition and NER, the model provides a comprehensive
solution for video content analysis.
• Ultimately, it offers a robust, scalable, and intelligent framework for video
summarization and entity extraction.
Video Summarization Techniques and Their Contributions
Video summarization techniques are classified by their characteristics and properties, as shown in Fig. 1.
Feature-Based Video Summarization (VS) Techniques
• Feature-based techniques focus on video characteristics such as motion, color, gesture,
audio-visual aspects, speech, and objects.
• Low-level features like color and texture are commonly used for video content extraction.
Clustering-Based VS Techniques
• Clustering techniques like k-means, partitioning, and spectral clustering are widely used for video summarization.
• The summary length is determined by content selection criteria and various evaluation
techniques.
Shot Selection-Based VS Techniques
• Generic video summaries are created using keyframe extraction, shot boundary detection, scene change methods, and redundancy reduction.
• Video skimming involves reducing redundancy and detecting objects or events.
• Function-based methods use attention mechanisms to identify important video segments.
• Structure-based methods exploit hierarchical story structures using frames and shots.
Event-Based VS Techniques
• Video summaries are generated based on objects, events, perceptions, and features.
• High-level features such as specific faces, motions, and gestures provide reliable content information.
• Events are extracted from keyframes using minimum and maximum frame boundaries.
• Graph theory and scale-free networks are used for video event extraction in mono-view videos.
• Multi-view videos use techniques like Basic Local Alignment Search.
• State-of-the-art techniques generate event summaries for sports videos like soccer, cricket, tennis, and basketball.
Trajectory-Based VS Techniques
• Initial projects focused on static video summaries.
• Dynamic video summaries are created using trajectory-based methods with stationary
backgrounds.
• These methods are computationally expensive and require significant resources.
• Deep learning approaches provide effective solutions for detecting important video content.
LITERATURE SURVEY
Problem Statement
Video summarization using NLP remains a challenging task due to the
diversity and complexity of video content. Existing methods often
struggle with accurately extracting relevant information from videos,
resulting in low-quality summaries. Additionally, techniques relying on
low-level features like color and texture lack contextual understanding.
There is a need for more robust methodologies that combine advanced
NLP techniques, entity extraction, and deep learning models to generate
meaningful video summaries. This project aims to address these
challenges by developing efficient video-to-text summarization systems.
Existing System
The problems with existing systems are:
• Complexity and Diversity of Video Content
Videos contain various elements like scenes, objects, and interactions,
making it challenging to extract relevant information.
• Low-Quality Summaries
Existing methods often generate inaccurate or incomplete summaries due to
poor feature selection.
• Lack of Contextual Understanding
Approaches using low-level features like color, texture, or motion fail to
comprehend the context of the video.
• Inefficient Use of NLP Techniques
Insufficient utilization of advanced NLP models for understanding the
semantics and generating meaningful summaries.
• Need for Robust Solutions
There is a requirement for improved methodologies combining deep learning models, entity extraction, and language understanding for better video-to-text summarization.
PROPOSED SYSTEM
Proposed System Architecture
The figure shows a block diagram of the system architecture, outlining the key stages of our model:
Video file → Extractive Summarization → Abstractive Summarization → Encoder & Decoder → Named Entity Recognition → Text Summarization
Video File: Serves as the input to the system.
Extractive Summarization: Selects key sentences directly
from the transcribed text.
Abstractive Summarization: Generates a concise and
coherent summary using natural language generation
techniques.
Encoder & Decoder: Processes the text using a transformer-
based mechanism to understand its context and meaning.
Named Entity Recognition (NER): Identifies and categorizes
entities like names, locations, and dates to enhance the
summary's informativeness.
Text Summarization: Produces the final summarized text as
the output.
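To make the flow concrete, the sketch below chains these stages in the order shown in the diagram. It is a minimal illustration: the helper names (extract_audio, transcribe_audio, textrank_summary, extract_entities) match sketches given later in this deck and are assumptions, not the project's exact API.

def summarize_video(video_path):
    audio_path = extract_audio(video_path)     # video -> audio
    transcript = transcribe_audio(audio_path)  # audio -> text (ASR)
    summary = textrank_summary(transcript)     # extractive summary
    entities = extract_entities(summary)       # NER tags for metadata
    return summary, entities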
Proposed System Model
Extractive Summarization
• Extractive summarization involves selecting and extracting the most relevant
sentences or phrases directly from the original text.
• It uses ranking algorithms or machine learning models to identify the most
informative sentences.
• Common methods include TextRank, LexRank, and clustering-based approaches.
• It is useful for news articles, research papers, and legal documents where factual
accuracy is crucial.
• It maintains the original meaning of the text with high accuracy.
• It can result in summaries lacking coherence and fluidity since sentences are
directly extracted without rephrasing.
Figure 2 : Extractive summarization
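As a concrete illustration of the ranking step, here is a minimal TextRank-style sketch built on nltk and networkx. The word-overlap similarity measure and the function name are illustrative assumptions, not the project's exact implementation.

import networkx as nx
import numpy as np
from nltk.tokenize import sent_tokenize, word_tokenize

def textrank_summary(text, num_sentences=3):
    sentences = sent_tokenize(text)
    token_sets = [set(word_tokenize(s.lower())) for s in sentences]
    n = len(sentences)
    # Pairwise word-overlap similarity between sentences
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and token_sets[i] and token_sets[j]:
                sim[i, j] = len(token_sets[i] & token_sets[j]) / (
                    1.0 + np.log(len(token_sets[i])) + np.log(len(token_sets[j])))
    # PageRank over the similarity graph scores sentence importance
    scores = nx.pagerank(nx.from_numpy_array(sim))
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences])
    return " ".join(sentences[i] for i in top)  # keep original sentence order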
Abstractive Summarization
• Abstractive summarization generates a concise and coherent summary by
understanding the context and meaning of the text.
• It uses advanced natural language generation (NLG) techniques to create new
sentences that convey the main ideas.
• Models like BART, T5, and GPT are commonly used for abstractive summarization.
• It is beneficial for summarizing conversational text, articles, or reports where
coherence and readability are essential.
• It can produce human-like summaries by paraphrasing and rephrasing content.
• It may introduce factual inconsistencies or lose key information if not trained
properly.
Figure 3: Abstractive summarization
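For comparison, abstractive summarization with a pre-trained BART checkpoint can be sketched with the Hugging Face transformers pipeline. The model name and length limits below are illustrative assumptions, not settings taken from the project.

from transformers import pipeline

# Downloads the checkpoint on first use
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def abstractive_summary(text, max_length=130, min_length=30):
    result = summarizer(text, max_length=max_length,
                        min_length=min_length, do_sample=False)
    return result[0]["summary_text"]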
Encoder-Decoder Architecture
• Sequence-to-sequence (Seq2Seq) is a neural network architecture used
for transforming one sequence of data into another.
• It is widely used in tasks like machine translation, text summarization,
chatbots, and speech recognition.
• Seq2Seq models typically consist of an Encoder and a Decoder.
• The Encoder processes the input sequence and converts it into a fixed-
length context vector (a numerical representation).
• The Decoder uses this context vector to generate the output sequence
step-by-step.
• Attention mechanisms are often added to Seq2Seq models to focus on
relevant parts of the input during decoding.
• Transformer-based models like BART, T5, and GPT use Seq2Seq for
improved text generation and understanding.
PROPOSED SYSTEM
Figure 4 : Encoder-Decoder Architecture
Encoder
• The Encoder-Decoder architecture is a common framework in sequence-to-
sequence (Seq2Seq) tasks, primarily using LSTM (Long Short-Term Memory)
or GRU (Gated Recurrent Unit) models.
• An Encoder is the first component of the sequence-to-sequence (Seq2Seq)
architecture.
• It processes the input sequence (such as a sentence) and converts it into a
fixed-length context vector, also called a latent representation.
• The Encoder typically consists of multiple layers of recurrent neural networks
(RNNs), long short-term memory networks (LSTMs), gated recurrent units
(GRUs), or transformer blocks.
• Each layer captures the sequential and contextual information from the input
data.
• The final hidden state of the Encoder contains a comprehensive representation
of the input, which is passed to the Decoder for generating the output.
• Encoders are essential in tasks like machine translation, text summarization,
and speech recognition.
Named Entity Recognition
• Definition: NER is an information extraction technique that identifies and classifies named
entities in text into predefined categories like names, organizations, locations, times, and
monetary values.
• Applications: NER is widely used in Natural Language Processing (NLP) to extract useful
information from large datasets, such as analyzing news articles, customer reviews, and social
media posts.
• Entity Classification: Detected entities are categorized into types like Person, Organization,
Location, Date, Quantity, and Monetary Value.
• NER Process: It involves two steps —
• Entity Detection: Identifies named entities in the text.
• Entity Categorization: Classifies the identified entities into specific categories.
• Tools Used: Libraries like SpaCy are commonly used for entity extraction and tagging,
providing efficient and accurate results.
• Practical Use Cases: NER helps in answering questions such as:
• Which companies are mentioned in a news article?
• Were specific products mentioned in reviews?
• Does a tweet contain the name of a person or location?
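A minimal sketch of this two-step process with SpaCy, the library named above. The small English model and the sample sentence are illustrative; run python -m spacy download en_core_web_sm once before use.

import spacy

nlp = spacy.load("en_core_web_sm")  # pre-trained English pipeline

def extract_entities(text):
    doc = nlp(text)  # detection and categorization happen in one pass
    return [(ent.text, ent.label_) for ent in doc.ents]

# Illustrative output (actual labels depend on the model):
# extract_entities("Google opened an office in Hyderabad in 2023.")
# -> [('Google', 'ORG'), ('Hyderabad', 'GPE'), ('2023', 'DATE')]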
Decoder
• A Decoder is a key component of the sequence-to-sequence (Seq2Seq)
architecture, responsible for generating the output sequence.
• It takes the context vector from the Encoder, which represents the input
sequence, and generates one output token at a time.
• The Decoder uses techniques like recurrent neural networks (RNNs), long
short-term memory networks (LSTMs), gated recurrent units (GRUs), or
transformer blocks for sequential processing.
• It predicts the next token by considering both the context vector and the tokens
generated so far.
• Attention mechanisms are often applied to help the Decoder focus on the most
relevant parts of the input sequence during generation.
• It is widely used in applications like machine translation, text summarization,
chatbot development, and image captioning.
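A minimal Keras definition of such an LSTM encoder-decoder is sketched below. Vocabulary and dimension sizes are illustrative assumptions, and this is not the project's exact model.

from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

vocab_size, embed_dim, latent_dim = 10000, 128, 256  # illustrative sizes

# Encoder: reads the input sequence and keeps its final hidden/cell states
enc_inputs = Input(shape=(None,))
enc_emb = Embedding(vocab_size, embed_dim)(enc_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: generates output tokens conditioned on the encoder states
dec_inputs = Input(shape=(None,))
dec_emb = Embedding(vocab_size, embed_dim)(dec_inputs)
dec_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                         return_state=True)(dec_emb, initial_state=[state_h, state_c])
outputs = Dense(vocab_size, activation="softmax")(dec_outputs)

model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")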
Figure 5: Encoder-Decoder architecture of the long short-term memory (LSTM) network
Software Specifications
Operating System: Windows 10
Tool: Jupyter Notebook
Language: Python

Hardware Specifications
Processor: Intel Core i3
RAM: 4 GB
System Type: 64-bit
CODE
import os
import nltk
import numpy as np
import whisper  # assumed: the openai-whisper package, used below for transcription
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

# Download required NLTK data
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
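The slides jump from the imports to a transcription helper, so the audio-extraction step (Step 1 of the pipeline) is not shown. A common way to implement it, sketched here as an assumption using the moviepy library rather than taken from the project code:

from moviepy.editor import VideoFileClip

# Step 1 (assumed): extract the audio track from a video file
def extract_audio(video_path, audio_path="audio.wav"):
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(audio_path)
    clip.close()
    return audio_path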
CODE
already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
True

# Step 2: Transcribe audio to text using Whisper
# (enclosing function header reconstructed; the slide shows only its body)
def transcribe_audio(audio_path):
    model = whisper.load_model("base")  # Load Whisper model
    result = model.transcribe(audio_path)
    return result["text"]

# Step 3: Preprocess text (Tokenization, Lemmatization)
def preprocess_text(text):
    sentences = sent_tokenize(text)  # Split into sentences
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words('english'))
    processed_sentences = []
    for sentence in sentences:
        words = word_tokenize(sentence.lower())
        # Lemmatize and remove stopwords
        lemmatized = [lemmatizer.lemmatize(word) for word in words
                      if word not in stop_words and word.isalnum()]
        processed_sentences.append(" ".join(lemmatized))
    return sentences, processed_sentences

# Step 4: Create word embeddings (simple example using pre-trained GloVe)
def load_glove_embeddings(glove_file='glove.6B.100d.txt'):
    embeddings_index = {}
CODE
    with open(glove_file, encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = coefs
    return embeddings_index

def get_sentence_vectors(sentences, embeddings_index, embedding_dim=100):
    sentence_vectors = []
    for sentence in sentences:
        words = word_tokenize(sentence)
        # Average word vectors to get one fixed-size vector per sentence
        word_vectors = [embeddings_index.get(word, np.zeros(embedding_dim)) for word in words]
        if word_vectors:
            sentence_vectors.append(np.mean(word_vectors, axis=0))
        else:
            sentence_vectors.append(np.zeros(embedding_dim))
    return np.array(sentence_vectors)

# Step 5: Build and train LSTM model for sentence scoring
def build_lstm_model(input_dim, sequence_length):
    model = Sequential()
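The slide ends mid-function after model = Sequential(). One plausible continuation for a per-sentence importance scorer, offered purely as a hedged sketch with illustrative layer sizes (not the project's actual layers):

    # Hypothetical continuation of build_lstm_model; sizes are illustrative
    model.add(Embedding(input_dim=input_dim, output_dim=100,
                        input_length=sequence_length))
    model.add(LSTM(64))
    model.add(Dense(1, activation='sigmoid'))  # importance score in [0, 1]
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model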
CODE
Input video link:
https://www.youtube.com/watch?si=OVw0QJdkwobv7_Ba&v=SrlFJf6v4fY&feature=youtu.be
CODE
Output Text:
Music So I'm happy to see all of you here because each one of you has a potential which
only English can help you to realize. You have great capabilities but all your talents, all
your capabilities are getting blocked because you do not know English. That is the job
which we have to do. But remember that one person called Smitharoi cannot teach you
English. No other person can teach you English. You have to learn it yourself. Just now you
heard that lots of people are seeing the videos which are there on YouTube which I did for
impact at various points. I did in IIT Kanpur and other places so those also might be there
on YouTube. But no YouTube video can change you. No lecture, no class can change you.
You have to learn English by self- effort. How to improve your communication? That is
listening, speaking, reading and writing. I have told in many videos of impact. So if you have
seen it, practice it. If you haven't seen it, please see it now after finishing. For many years,
Gampagaru has been asking me and you won't believe I get lots of mails, messages, phone
calls. When I am sitting in an important meeting, I get five or six calls. Madam, we have
seen your impact video. What is the use of calling me? Please do not call me at any point of
time. I can't teach you English on the cell phone. Not possible. Say Madam, if by speaking
to you my English becomes better. I don't have that much capability to speak. You have to
practice. You won't believe at least I get a few thousand mails per month. I can't answer
because I am an individual human being. I answer slowly.
Don't send WhatsApp or Viber message immediately. Practice for one year, practice
for six months, practice for five years. Only one person of these twenty lakhs, only one
person. Send a message saying, Madam, having learnt English from your video, I have
got a very good job. That means they practiced what I said. So you need to practice.
Whenever you get a free time, please practice English. Even if you practice with
yourself, it is good enough. So this course is not about communication skills. I am
not telling you anything about how you should improve your speaking, listening,
reading or writing. That you will go back to Gampasar's excellence, which he puts the
videos in YouTube. I have said in every video how to do it. One video I think is there
for about interview skills. So some people told me just yesterday one mail came,
saying, Madam, after listening to that video, that was in Thurupati, S.V. University, I
got a job yesterday. I wanted to tell you first. I felt so happy. I feel so wonderful. I
don't know that person. That person doesn't know me. Impact is doing such a great
job and getting. But then as we have been seeing, not many people are getting a job.
The reason nobody is following those videos. You are just listening. Very nice.
Appreciate.
But nobody wants to take so much trouble. When you read the newspaper, all grammar
is there in the newspaper. All vocabulary is there in the newspaper. Nobody reads. We
are only interested in what is happening to Kajriwal or what is happening to Prime
Minister Modi or what is KCR doing. Very good. That is called content. Look at the
language. Content we all know. We are very intelligent. Language we do not know. So we
have to improve. So from today, not only for the next four days, but for the next four
months or four years, 24 hours devoted only to English. Once you learn English, you
forget it. Don't bother to practice. By learning English, don't forget your mother tongue.
But please practice as much as you can. Go on practicing. So I have divided the course
into various lessons of grammar. I am told that this will also go on YouTube. So all those
lacks of people who are asking me questions. I hope we will see this and their problems
will be solved. But again and again, I am telling you that today, books do not teach us
vocabulary. Books do not teach us grammar. You need not buy an instant vocabulary
book. That is of no use. If you want to use English, learn from real life. Remember that
when you were born, your parents didn't give you a telegodixnery. They didn't give you a
grammar book. How to learn Telugu in 30 days? No, you learned automatically. You
learned by listening to others speaking good Telugu.
Conclusion
▰ Video summarization provides high-quality, text-based summaries for quick
information retrieval.
▰ It reduces the need to watch entire videos by offering concise insights.
▰ The process involves video-to-audio conversion, followed by audio-to-text
transcription.
▰ Extractive summarization selects key sentences using pre-trained machine learning
models.
▰ Named Entity Recognition (NER) is applied to extract relevant entities for tagging.
▰ Open-source libraries and state-of-the-art models enhance summarization
accuracy.
▰ The five stages include video input preparation, audio extraction, speech-to-text
conversion, extractive summarization, and entity extraction.
▰ This approach is beneficial in media analysis, education, research, and corporate
environments.
REFERENCES
• Yuan, J, Wang, H, Xiao, L, Zheng, W, Li, J, Lin, F & Zhang, B 2007, ‘A formal study of shot
boundary detection’, IEEE transactions on circuits and systems for video technology, vol.
17, no. 2, pp. 168-186.
• Guan, G, Wang, Z, Yu, K, Mei, S, He, M & Feng, D 2012, ‘Video summarization with global
and local features’, In 2012 IEEE International Conference on Multimedia and Expo
Workshops, pp. 570-575.
• Wei, H, Bingbing N, Yichao Y, Yu, H, Yang, X & Yao, C 2018, ‘Video Summarization via
Semantic Attended Networks’, Thirty-Second AAAI Conference on Artificial Intelligence.
• Sujatha, C & Mudenagudi, U 2011, ‘A Study on Keyframe Extraction Methods for Video
Summary’, International Conference on Computational Intelligence and Communication
Networks (CICN), vol. 73, no. 77, pp.7-9.
• Liu, T, Zhang, HJ & Qi, F 2003, ‘A novel video key-frame-extraction algorithm based on perceived motion energy model’, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 10, pp. 1006-1013.
• Ciocca, G & Schettini, R 2006, ‘An innovative algorithm for keyframe extraction in video
summarization’, Journal of Real-Time Image Processing (Springer), vol. 1, no. 1, pp. 69-88.
• Chang, IC & Cheng, KY 2007, ‘Content-selection based video summarization’, IEEE
International Conference On Consumer Electronics, Las Vegas Convention Center, USA, pp.
11-14
REFERENCES
• Dhawale, AC & Jain, S 2008, ‘A novel approach towards key frame selection for video
summarization’, Asian Journal of Information Technology, vol. 7, no. 4, pp. 133-137.
• Congcong, L, Wu, YT, Shiaw-Shian, Y & Chen, T 2009, ‘Motion- focusing key frame
extraction and video summarization for lane surveillance system’, ICIP 2009, pp. 4329-
4332.
• Luo, C, Papin & Costello, K 2009, ‘Towards extracting semantically meaningful key frames
from personal video clips:from humans to computers’, IEEE Transactions On Circuits And
Systems For Video Technology, vol. 19, no. 2.
• Elkhattabi, Z, Tabii, Y & Benkaddour, A 2015, ‘Video summarization: Techniques and applications’, World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, vol. 9, no. 4, pp. 928-933.
THANK YOU