International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 2324
Recurrent Neural Network for Human Action Recognition using
Star Skeletonization
Nithin Paulose1, M. Muthukumar2, S. Swathi3, M. Vignesh4
1,2B.E.Computer Science and Engineering, Dhaanish Ahmed Institute of Technology, Coimbatore, Tamil Nadu, India
3,4Assistant Professor, Dept. of Computer Science and Engineering, Dhaanish Ahmed Institute of Technology,
Coimbatore, Tamil Nadu, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - This project presents a Recurrent Neural Network (RNN) methodology for Human Action Recognition using the star skeleton as a representative descriptor of human posture. The star skeleton is a fast skeletonization technique that connects the geometric center of the target object to its contour extremes. To use the star skeleton as a feature for action recognition, we define the feature as a five-dimensional vector in star fashion, because the head and four limbs are usually local extremes of the human shape. In our project we assume that an action is composed of a series of star skeletons over time. Therefore, the time-sequential images expressing a human action are transformed into a feature vector sequence. The feature vector sequence is then transformed into a symbol sequence so that the RNN can model the action. We use an RNN because the extracted features are time dependent.
Key Words: Recurrent Neural Network (RNN), star skeleton, contour extremes, Human Action Recognition, five-dimensional vector, time-sequential.
1. INTRODUCTION
Human activity recognition is an important task for Ambient Intelligence systems. The state of a person has to be recognized, which provides valuable information that can be used as input for other systems. For example, in health care, fall detection can be used to alert medical staff in case of an accident; in security, abnormal behavior can be detected and used to prevent a burglary or other criminal activities. Human motion analysis is currently receiving increasing attention from computer vision researchers. Examples include segmenting the human body in an image, tracking the movement of joints across an image sequence, and analyzing athletic performance by recovering the underlying 3D body structure; it is also used for medical diagnostics. Other applications include building man-machine user interfaces and video conferencing.
The goal of human activity recognition is to automatically analyze ongoing activities in an unknown video. The objective of the system is to correctly classify a video into its activity category, for example when the video is segmented to contain only one execution of a human activity. The starting and ending times of all occurring activities in an input video are detected, from which continuous recognition of human activities is performed. The ability to recognize complex human activities from video enables the construction of several important applications. Automated surveillance systems in public places like airports and subway stations require detection of abnormal and suspicious activities, as opposed to normal activities. For example, an airport surveillance system must automatically recognize suspicious activities such as "a person leaves a bag" or "a person places his/her bag in a trash bin." Human activity recognition also enables real-time monitoring of patients, children, and elderly persons. Furthermore, activity recognition makes the construction of gesture-based human-computer interfaces and vision-based intelligent environments possible.
There are various types of human activities.
Depending on their complexity, they can be conceptually
categorized into four different levels: gestures, actions,
interactions, and group activities.
1.1 HUMAN ACTIVITY RECOGNITION FROM VIDEO
SEQUENCES
Human activity recognition plays a role in human-to-human interaction and interpersonal relations, since it provides information about identity, personality, and psychological state that is otherwise difficult to extract. Across the various classification techniques, two main questions emerge: "Where is it in the video?" and "What is the action?" To recognize human activities, one must determine the active states of a person. Simple daily activities such as "walking" are straightforward to recognize, while complex activities such as "peeling an apple" are more difficult to identify. An easier approach is to decompose complex activities into simpler ones. For a better understanding of human activities, detecting the objects involved may provide useful information about the ongoing event.
Human activity recognition typically assumes a figure-centric scene with an uncluttered background, where the actor is free to perform an activity. The main challenges in classifying a person's activities with low error in a fully automated recognition system are background clutter, partial occlusion, changes in scale, viewpoint, lighting and appearance, and low frame resolution. Moreover, annotating behavioral roles is time consuming and requires knowledge of the specific event. Intra- and inter-class similarities make the problem more challenging: the same action performed by two people may look quite different, since human activity depends on individual habits, and this makes the activity difficult to determine. In real-time settings a further challenge is the construction of visual models for learning and analyzing human movements when benchmark datasets for evaluation are inadequate.
Fig 1: Architecture diagram
1.2 HUMAN ACTIVITY CATEGORIZATION
Human activity recognition methods are classified into two main categories according to the nature of the sensor data they employ: (i) uni-modal and (ii) multimodal activity recognition methods. These two categories are further broken down into sub-categories depending on how they model human activities.
Uni-modal methods represent human activities from data of a single modality, such as images, and are further categorized as: (i) space-time, (ii) stochastic, (iii) rule-based, and (iv) shape-based methods. Space-time methods represent human activities as a set of spatio-temporal features or trajectories. Stochastic methods apply statistical models to represent human actions. Shape-based methods model the motion of human body parts and efficiently represent activities with high-level reasoning.
Multimodal methods combine features collected from different sources and are classified into three categories: (i) affective, (ii) behavioral, and (iii) social networking methods. Affective methods represent human activity through emotional communication and the affective state of a person. Behavioral methods aim to recognize behavioral attributes and non-verbal multimodal cues such as gestures, facial expressions, and auditory cues. Social networking methods model the characteristics and behavior of humans in several layers of human-to-human interaction in social events, drawing on gestures, body motion, and speech. The terms activity and behavior are often used interchangeably: both describe a sequence of actions corresponding to specific body motion, and they also characterize activities and events associated with facial expressions, emotional states, gestures, and single-person auditory cues.
1.3 HUMAN ACTIVITY RECOGNITION MODEL
Research on human activity recognition has largely focused on statistical methods using spatio-temporal features. In a typical model, spatio-temporal interest points are detected in the video sequence, and each local maximum becomes the center of a spatio-temporal region. Features are then extracted from the spatio-temporal region (such as features based on optical flow or gradient values) and summarized or histogrammed to form a feature descriptor. The feature descriptors are used to form a codebook, typically followed by a bag-of-visual-words model adapted from statistical natural language processing. While methods based on spatio-temporal features are the most common, other methods make use of other video features such as medium-term tracking, volumetric representations, and graph-based features.
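As an illustration of the codebook and bag-of-visual-words step described above, the sketch below (not taken from the original paper; function names such as build_codebook and encode_video are hypothetical) clusters local descriptors with k-means and encodes each video as a normalized histogram of codeword occurrences.

```python
# A minimal bag-of-visual-words sketch, assuming local descriptors
# (e.g. from spatio-temporal interest points) are already extracted.
# Function and variable names are illustrative, not from the paper.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, num_words=100, seed=0):
    """Cluster all local descriptors into a visual vocabulary."""
    return KMeans(n_clusters=num_words, random_state=seed).fit(descriptors)

def encode_video(descriptors, codebook):
    """Histogram of codeword occurrences for one video, L1-normalized."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Usage: stack descriptors from all training videos to build the codebook,
# then encode each video separately.
# all_desc = np.vstack(per_video_descriptors)
# codebook = build_codebook(all_desc)
# features = [encode_video(d, codebook) for d in per_video_descriptors]
```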
2. LITERATURE REVIEW
[1] presents a computationally efficient action recognition framework using depth motion maps (DMMs)-based local binary patterns (LBPs) and a kernel-based extreme learning machine (KELM). Each frame of a depth video sequence is projected onto three orthogonal Cartesian planes to form projected images corresponding to three projection views (front, side, and top). The DMMs are divided into overlapping blocks and an LBP operator is applied to each block to compute an LBP histogram. Feature-level fusion and decision-level fusion approaches are both investigated using KELM.
[2] proposes using human limbs to augment the constraints between neighboring human joints. A limb is modeled as a wide line to represent its shape information; instead of estimating its length and rotation angle, a per-pixel likelihood is calculated for each human limb by a ConvNet.
[3] Video-based action recognition is one of the important and challenging problems in computer vision research, with several realistic datasets such as HMDB51, UCF50, and UCF101. Bag of visual words (BoVW) is a general pipeline for constructing a global representation from local features, composed of five steps: (i) feature extraction, (ii) feature pre-processing, (iii) codebook generation, (iv) feature encoding, and (v) pooling and normalization.
[4] shows that dense trajectories are an efficient video representation for action recognition and achieve state-of-the-art results on a variety of datasets. Performance is improved by taking camera motion into account: feature points are matched between frames using SURF descriptors and dense optical flow, and a homography is estimated with RANSAC. Removing background trajectories and warping the optical flow with the robustly estimated homography that approximates the camera motion significantly improves performance.
[5] employs Convolutional Neural Networks (ConvNets) on color texture images referred to as skeleton optical spectra to learn discriminative features for action recognition. This kind of spectral representation makes it possible to learn suitable dynamic features from skeleton sequences with a ConvNet architecture without training millions of parameters.
3. DESIGN METHODS
The following methods are used for Human Action Recognition using an RNN.
3.1 HUMAN SILHOUETTE EXTRACTION
In this module we extract the human body region (silhouette) from a given image. In-frame videos: to obtain the human body, we take the direct difference between the background and the current frame. Out-frame videos: to extract the human body from the video frames, we use the built-in Gaussian Mixture Model based foreground detection method.
Fig 3.1 Background Subtraction
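A minimal sketch of both extraction strategies with OpenCV is shown below. This is an assumption about how the step could be implemented, not the authors' code; the threshold and history values are placeholders.

```python
# Background-subtraction sketch with OpenCV (illustrative only).
import cv2

# In-frame case: direct difference against a known background frame.
def silhouette_by_difference(background_gray, frame_gray, thresh=30):
    diff = cv2.absdiff(background_gray, frame_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask

# Out-frame case: Gaussian Mixture Model based foreground detection.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

def silhouette_by_gmm(frame_bgr):
    mask = subtractor.apply(frame_bgr)  # 0 = background, 255 = foreground
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadow label
    return mask
```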
3.2 HUMAN CONTOUR EXTRACTION
To extract the contour of a detected human body, we apply thresholding and morphological operations, which are important approaches in the field of image segmentation; the difficulty lies in choosing a correct threshold, which is hard under irregular illumination.
Fig 3.2 Extraction of Human Body Contour
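One possible implementation of this step, assuming OpenCV 4's findContours signature, is sketched below; the kernel size and the choice of keeping only the largest contour are illustrative assumptions.

```python
# Contour-extraction sketch: threshold the foreground mask, clean it with
# morphology, and keep the largest contour as the human body (illustrative).
import cv2
import numpy as np

def extract_body_contour(mask):
    kernel = np.ones((5, 5), np.uint8)
    clean = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove speckle noise
    clean = cv2.morphologyEx(clean, cv2.MORPH_CLOSE, kernel)  # fill small holes
    contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    return max(contours, key=cv2.contourArea)  # largest blob = human body
```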
3.3 STAR SKELETONIZATION
The concept of the star skeleton is to connect the centroid of a human contour to its gross extremities. To find the gross extremities, the distances from the centroid to each border point are processed in a clockwise or counter-clockwise order, and the local maxima of this distance signal are taken as the extremities. The star skeleton is then constructed by connecting these extreme points to the target centroid.
Fig 3.3: A walk action is a series of postures over time
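The distance-signal computation can be sketched as follows; the smoothing window and the use of SciPy's argrelmax for peak picking are assumptions, not the paper's exact procedure.

```python
# Star-skeleton sketch: distances from the centroid to each contour point,
# smoothed, then local maxima taken as the gross extremities (illustrative).
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import argrelmax

def star_skeleton(contour, smooth=11):
    pts = contour.reshape(-1, 2).astype(float)       # ordered boundary points
    centroid = pts.mean(axis=0)
    dist = np.linalg.norm(pts - centroid, axis=1)     # centroid-to-border signal
    dist = uniform_filter1d(dist, size=smooth, mode='wrap')
    peaks = argrelmax(dist, mode='wrap')[0]           # local maxima = extremities
    return centroid, pts[peaks]                       # centroid + extreme points
```

In the paper's formulation, the five extremities (head and four limbs) taken relative to the centroid form the five-dimensional star feature for each frame.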
3.4 TRAINING THE MODEL USING RNN
A recurrent neural network is a class of artificial neural network in which the connections between nodes form a directed graph along a temporal sequence. The cyclic connections in this architecture allow the network to learn the temporal dynamics of sequential data. RNNs use their internal memory to process sequences of inputs.
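A minimal sketch of such a model in Keras, assuming the symbol sequences have already been produced from the star-skeleton features, is given below; the vocabulary size, sequence length, number of classes, and layer widths are placeholder assumptions.

```python
# Illustrative sketch of training a simple RNN on symbol sequences derived
# from star-skeleton features (Keras API; all sizes are assumptions).
import tensorflow as tf

NUM_SYMBOLS = 64   # size of the symbol vocabulary (assumed)
SEQ_LEN     = 30   # frames per action clip (assumed)
NUM_ACTIONS = 6    # number of action classes (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(NUM_SYMBOLS, 16),   # symbol -> dense vector
    tf.keras.layers.SimpleRNN(64),                 # recurrent layer with internal state
    tf.keras.layers.Dense(NUM_ACTIONS, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# x_train: (num_clips, SEQ_LEN) integer symbol sequences
# y_train: (num_clips,) integer action labels
# model.fit(x_train, y_train, epochs=20, batch_size=32)
```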
4. CONCLUSION
This project presented a Recurrent Neural Network (RNN) methodology for Human Action Recognition using the star skeleton as a representative descriptor of human posture. The star skeleton is a fast skeletonization technique that connects the center of mass of the target object to its contour extremes. To use the star skeleton as a feature for action recognition, we define the feature as a five-dimensional vector in star fashion, because the head and four limbs are usually the local extremes of the human shape. In our project we assumed that an action is composed of a series of star skeletons over time. Therefore, the time-sequential images expressing a human action are transformed into a feature vector sequence. The feature vector sequence is then transformed into a symbol sequence so that the RNN can model the action.
5. FUTURE WORK
As future work, we plan to perform Human Activity Recognition (HAR) on a smartphone sensor dataset using an LSTM RNN, classifying the type of movement among five categories (a minimal model sketch follows the list below):
• WALKING_UPSTAIRS,
• WALKING_DOWNSTAIRS,
• SITTING,
• STANDING,
• LAYING
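A minimal sketch of the planned LSTM classifier, assuming fixed-length windows of inertial sensor channels as input (the 128-sample, 9-channel shape follows the common UCI HAR smartphone-dataset convention and is an assumption about the target data), might look like this:

```python
# Sketch of an LSTM classifier for smartphone-sensor HAR (illustrative;
# the input window shape and layer sizes are assumptions).
import tensorflow as tf

TIMESTEPS, CHANNELS, NUM_CLASSES = 128, 9, 5

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TIMESTEPS, CHANNELS)),  # one sensor window
    tf.keras.layers.LSTM(32),                             # temporal modeling
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# x_windows: (num_windows, TIMESTEPS, CHANNELS) sensor readings
# y_labels:  (num_windows,) integer class labels
# model.fit(x_windows, y_labels, epochs=30, batch_size=64)
```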
6. REFERENCES
[1] C. Chen, R. Jafari, and N. Kehtarnavaz, "Action recognition from depth sequences using depth motion maps-based local binary patterns," in Proc. IEEE Winter Conf. Appl. Comput. Vis., 2015, pp. 1092–1099.
[2] G. Liang, X. Lan, J. Wang, J. Wang, and N. Zheng, "A limb-based graphical model for human pose estimation," IEEE Trans. Syst., Man, Cybern., Syst., vol. 48, no. 7, pp. 1080–1092, Jul. 2018.
[3] X. Peng, L. Wang, X. Wang, and Y. Qiao, "Bag of visual words and fusion methods for action recognition," Comput. Vis. Image Understand., vol. 150, pp. 109–125, Sep. 2016.
[4] H. Wang and C. Schmid, "Action recognition with improved trajectories," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 3551–3558.
[5] Y. Hou, Z. Li, P. Wang, and W. Li, "Skeleton optical spectra-based action recognition using convolutional neural networks," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 3, pp. 807–811, Mar. 2018.
