VISVESVARAYA TECHNOLOGICAL UNIVERSITY
A PROJECT REPORT
On
“An Automated Evaluator for Bharatanatyam (Nritta)”
Submitted in partial fulfillment of the requirements for the VIII Semester
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE & ENGINEERING
Submitted By:
RENU S HIREMATH
SHREYA BHAT
VIII Semester, B.E, CSE
Under the guidance of:
Prof. Srikanth H R
Professor, Department of CSE
PESIT, Bangalore.
January 2016 – May 2016
Carried out at:
Department of Computer Science
PES INSTITUTE OF TECHNOLOGY
(an autonomous institute under VTU)
Department of Computer Science & Engineering
100 Feet Ring Road, Banashankari III Stage,
Bangalore-560 085.
PES INSTITUTE OF TECHNOLOGY
(An Autonomous Institute under VTU, Belgaum)
100 Feet Ring Road, BSK- III Stage, Bangalore – 560 085
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Certificate
Certified that the eighth semester project work under the topic “An Automated Evaluator
for Bharatanatyam (Nritta)” is a bonafide work carried out by
Renu S Hiremath 1PI12CS134
Shreya Bhat 1PI12CS163
in partial fulfillment for the award of Bachelor of Engineering in Computer Science and
Engineering of Visvesvaraya Technological University, Belgaum during the academic
semester January 2016 – May 2016. It is certified that all corrections/suggestions indicated for
Internal Assessment have been incorporated in the report deposited in the departmental library.
The project report has been approved as it satisfies the academic requirements in respect of
Project work prescribed for the said Bachelor of Engineering.
______________________ ______________________ ______________________
Signature of the Guide
Asst. Prof. Srikanth H R
Signature of the HOD
Prof. Nitin V. Pujari
Signature of the Principal,
Dr. K S Sridhar
External Viva:
Name of Examiners Signature with Date
________________________ _________________________
________________________ _________________________
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of a task would be
incomplete without the mention of the people who made it possible, and whose constant
guidance and encouragement helped us in completing the project successfully. We consider it
a privilege to express gratitude and respect to all those who guided us throughout the course of
the project.
We extend our sincere thanks to Asst. Prof. Srikanth H R, Department of Computer
Science and Engineering, PESIT, for his invaluable suggestions, constant guidance,
encouragement, support and advice, without which this project would not have been
completed.
We express our gratitude to Prof. Nitin V. Pujari, Head of the Department, Computer
Science and Engineering, PESIT, whose guidance and support has been invaluable.
We would like to express our heartfelt thanks to Prof. M. R. Doreswamy, PESIT
founder, Prof. D. Jawahar, PES University and Dr. K. S. Sridhar, Principal, PESIT, for
providing us with a congenial environment for carrying out the project.
Last, but not the least, we would like to thank our friends whose invaluable feedback
helped us to improve the application by leaps and bounds, and our parents for their unending
encouragement and support.
Renu S Hiremath - 1PI12CS134
Shreya Bhat - 1PI12CS163
ABSTRACT
The goal of this project is to implement a system that automatically evaluates a dance
performance against a gold-standard performance and provides feedback to the performer in
the form of a graph and suggestions. The system acquires the motion of a performer via joint
detection. Eight important joints of the body are detected using Haar cascade
classification. The coordinates of these joints are then used to calculate the joint velocities,
based on which scores are provided.
This project aims at automating the process of the dance examinations held every year. The system
requires two cameras that capture the front and side views of a dancer. These videos are fed to
the system, which breaks them into frames. The eight joints are detected in each frame, and the
positions of these joints are used to find the joint velocities. A scoring algorithm is devised which
grades a dancer by comparing the joint velocities of the dancer against a gold standard.
Scoring is done separately for upper-body and lower-body movements. A final score is
provided that is a combination of the upper-body, lower-body and hand-gesture scores. Apart from
scoring, the system also tries to identify the places the dancer can improve on. A set of
suggestions is provided for the dancer, along with a graph that makes it easier for the dancer
to estimate the areas she has to work on.
This system tries to eliminate the human effort required in evaluating dancers during the
examinations every single year. The only investment to be made is in the two cameras. The
system does not require expensive hardware like the Kinect, making it extremely cost-efficient.
Also, the fee paid to the evaluators can be eliminated. Another extremely helpful feature of this
system is the feedback provided to the dancer after the performance. The system can also be
used as a practice tool by dancers who are trying to improve their technique.
TABLE OF CONTENTS
Sl. No.  Content                                                    Pg. No.
1   INTRODUCTION
    1.1. Introduction                                               2
    1.2. Challenges and Opportunities                               2
    1.3. Problems Identified                                        3
    1.4. Problems to which the prototype provides a solution        4
2   PROBLEM DEFINITION                                              6
3   LITERATURE SURVEY
    3.1. Bharatanatyam                                              7
    3.2. Studies and Exploration for the selected idea              10
    3.3. Papers Referred                                            12
4   PROJECT REQUIREMENT DEFINITION
    4.1. Project Perspective                                        17
    4.2. Project Function                                           18
    4.3. User Classes and Characteristics                           19
    4.4. Operating Environment                                      19
    4.5. Design and Implementation Constraints                      19
    4.6. User Documentation                                         20
    4.7. Assumptions and Dependencies                               20
5   SOFTWARE REQUIREMENTS SPECIFICATION
    5.1. External Interface Requirements                            23
    5.2. Hardware Requirements                                      23
    5.3. Functional Requirements                                    24
    5.4. Non-Functional Requirements                                26
6   GANTT CHART                                                     28
7   SYSTEM DESIGN
    7.1. Block Diagram                                              30
    7.2. Architecture                                               32
    7.3. Use Case Diagram                                           35
    7.4. Scenario Diagram                                           37
8   DETAILED DESIGN
    8.1. Modules                                                    41
    8.2. Database Design                                            44
9   IMPLEMENTATION
    9.1. Implementation Choices                                     48
    9.2. Justification for Implementation Choices                   48
    9.3. Face Detection                                             50
    9.4. Upper Body Detection                                       52
    9.5. Lower Body Detection                                       54
    9.6. Skin Detection                                             56
    9.7. Scoring                                                    57
    9.8. Analysis and Visualization                                 58
10  INTEGRATION                                                     61
11  TESTING
    11.1. Unit Testing                                              63
    11.2. Integration Testing                                       65
    11.3. Functional Testing                                        65
12  SCREENSHOTS                                                     67
13  CONCLUSION                                                      72
14  FUTURE ENHANCEMENTS                                             74
15  REFERENCES                                                      76
LIST OF FIGURES
Fig. No.  Figure Name                                                                    Pg. No.
3.1   Methodology used for Skeleton Detection                                            12
7.1   Block Diagram                                                                      30
7.2   High Level Architecture Diagram                                                    32
7.3   Use Case Diagram                                                                   36
7.4   Scenario Diagram for the Whole System                                              38
7.5   Scenario Diagram for the Detection Modules                                         39
9.1   Face Detection for the Standing and Sitting Postures                               50
9.2   Algorithm used for Face Detection                                                  51
9.3   Upper Body Detection for the Standing and Sitting Postures                         52
9.4   Algorithm used for Upper Body Detection                                            53
9.5   Lower Body Detection for the Standing and Sitting Postures                         54
9.6   Algorithm used for Lower Body Detection                                            55
9.7   Algorithm used for Skin Detection                                                  56
9.8   Algorithm used for Scoring                                                         57
9.9   Graph plotting the score of each frame and the Report Card printed at the end
      of the analysis                                                                    58
9.10  Comparing the student’s frame with the corresponding frame of the choreographer    59
12.1  Introduction Screen for the Tool                                                   67
12.2  Window to enter the details for the Analysis                                       67
12.3  Window to upload the video for Analysis                                            68
12.4  Screen displaying the progress of the Analysis                                     68
12.5  Screenshot of a frame given a high score                                           69
12.6  Screenshot of a frame given a low score                                            69
12.7  Screenshot of the graph plotted with the score for every frame                     70
12.8  Report Card printed at the end of the Analysis                                     70
An Automated Evaluator for Bharatanatyam (Nritta)
PESIT – CSE Jan ’16 – May ‘16 1
CHAPTER I
INTRODUCTION
1.1. Introduction
Bharatanatyam is one of the oldest dance forms of India. It has been nurtured in the temples
and courts of Southern India since ancient times. Today it is a very popular and widely
patterns of the feet add a new dimension to this art form. On the surface, three aspects of
Bharatanatyam are evident, as with any dance form: movement, costume, and music. In
other words, what the dancer is doing, how the dancer looks, and what are the
accompanying sounds. There are two kinds of movements in Bharatanatyam – abstract and
expressive. The abstract movements are done to show rhythm, to provide decoration, and
to create beauty. This is the aspect of dance that we mainly focus on.
Bharatanatyam is segregated into three parts, namely Nritta, Nritya, and Natya. In this
project, we concentrate on the Nritta aspect of Bharatanatyam, which constitutes the abstract
dance movements performed with rhythm, without expression of a theme or emotion. This is also
called pure dance. Nritta gives major importance to the body movements, the postures and
the hand gestures used in the dance.
1.2. Challenges and Opportunities
Classical dance, specifically Bharatanatyam, is given a lot of importance across the country.
Every year, Bharatanatyam examinations are conducted by the central board in various
states to assess a dancer on a number of parameters. Dancers who clear these examinations
are awarded certifications at various levels, which prove to be extremely important
for those who want to pursue a career in Bharatanatyam. In spite of this importance,
there is still no properly organized system to carry out the entire process. There is an
acute shortage of well-qualified judges who can appear as examiners. Also, if the fee for an
examiner is not affordable, a compromise is made on the number of judges allotted to each
student. Ideally there should be two or more judges for each student, but because of the
shortage, most of the time only one judge is seen at the exam center. There is no existing
system that can automate this process of evaluating a dance performance. This presents a
challenge: to come up with a new system that automates the evaluation of a dance
performance (Nritta) and provides scores at the end of it, similar to how a human judge
does.
1.3. Problems Identified
Recently, an article highlighting the problems faced while conducting a dance exam was
published in a paper:
“The difference of opinion between different zones has put the validity of examinations for
various levels of Bharatanatyam being held in Mangalore in a clutter. While the
examinations in the Bangalore zone was carried out by May 15, the same for Mangalore
has run into trouble with judges and examiners belonging to the state capital giving the
tests in the coastal city a miss.
The understanding was that Bangalore zone will send its teachers as judges in Mangalore
and vice versa. It is alleged that Mangalore is always left high and dry. This injustice has
been happening to the students in Mangalore zone for more than 10 years and we are tired
of the crisis every year. The worst thing is that we have to bring in the local gurus to act
as judges and some of them assign more marks to their own students and less marks to
others.”
From the above article, we note the following problems:
● a shortage of judges
● under-qualified judges
● scores based entirely on the “opinion” of the one judge who is present at the exam center
In short, there is no system that standardizes the entire process of evaluating a dancer.
1.4. Problems to which the prototype provides a solution
This project proposes to implement a system that automates the process of evaluating a
dance performance during the examination. The dance performance is captured with the
help of two cameras, one for the front view and the other for the side view. These videos are
fed into the system, which segregates them into frames. A score is provided for each frame by
comparing it to a video of a professional dancer. The entire performance is evaluated based
on three categories – upper body movements, lower body movements and hand gestures.
Final scores are provided to the dancer in the form of a marks card, along with some
visualization and suggestions for improvement.
CHAPTER II
PROBLEM DEFINITION
The idea is to automatically evaluate a dancer using techniques of video and image processing.
The input to the system is a video of the performance and a number indicating the choreography
being performed in the video. A few standard dance choreographies are used, against which all
other performances are compared. Once the video of the student is uploaded, a set of frames
is extracted from the video.
The joints of the dancer are detected using Haar classifiers which have been trained for the
corresponding body parts. The training is done with at least 1000 positive images and 1500 negative
images, for at least 10 stages, to ensure accuracy. Skin detection is also used for the recognition of
the feet and hands. Combining these two techniques, skeleton tracking is done.
Using the technique of skeleton tracking, the coordinates of the face, neck, shoulders, hip, knees,
hands and feet are obtained. These coordinates are then used to find the direction of movement of the
joints and the corresponding joint velocities. In this way, the accuracy of all the movements is
calculated.
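The joint-velocity step can be sketched in Python with NumPy; the array layout, function name and 0.1 s time step (one frame per 100 ms) are illustrative assumptions, not the project's actual code:

```python
import numpy as np

def joint_velocities(coords, dt=0.1):
    """Per-joint velocity from frame-wise (x, y) coordinates.

    coords: array of shape (frames, joints, 2); dt: seconds per frame
    (0.1 s corresponds to one frame every 100 ms).
    Returns the speeds (pixels/second) between consecutive frames and
    the unit direction vectors of each movement.
    """
    coords = np.asarray(coords, dtype=float)
    disp = np.diff(coords, axis=0)               # displacement between frames
    speed = np.linalg.norm(disp, axis=2) / dt    # magnitude of velocity
    with np.errstate(invalid="ignore", divide="ignore"):
        # stationary joints yield NaN directions (0/0), which is harmless here
        direction = disp / np.linalg.norm(disp, axis=2, keepdims=True)
    return speed, direction

# two frames, two joints: the first joint moves 3 px right and 4 px down
frames = [[(0, 0), (10, 10)], [(3, 4), (10, 10)]]
speed, direction = joint_velocities(frames)
print(speed[0][0])   # 5 px of displacement over 0.1 s -> 50.0
```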
After skeleton tracking, scores are provided for the upper body, lower body and hand gestures.
These scores are then put together and compared against the score of a standard dancer. The score
for every frame of the student, in comparison with the standard dancer, is then displayed. This
way the tool can also be used by a dancer for self-evaluation, to see their improvement over a
period of time.
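The per-frame comparison can be illustrated with a toy scoring function; the actual scoring algorithm is described in Chapter 9, and the linear error-to-score mapping and `tol` scale below are assumptions made purely for illustration:

```python
import numpy as np

def frame_score(student_v, reference_v, tol=50.0):
    """Illustrative per-frame score in [0, 100]: the closer the
    student's joint velocities are to the reference dancer's, the
    higher the score. `tol` (pixels/second) is an assumed scale for
    how much average deviation maps to a zero score."""
    student_v = np.asarray(student_v, dtype=float)
    reference_v = np.asarray(reference_v, dtype=float)
    error = np.abs(student_v - reference_v).mean()
    return max(0.0, 100.0 * (1.0 - error / tol))

print(frame_score([40, 60], [40, 60]))  # identical movement -> 100.0
print(frame_score([40, 60], [90, 10]))  # average error of 50 -> 0.0
```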
CHAPTER III
LITERATURE SURVEY
3.1. Bharatanatyam
Bharatanatyam is a traditional Indian classical dance. It has many dimensions. They
include body movements, facial expressions, hand gestures, footwork, costumes, music,
repertoire, and themes of performances. Because Bharatanatyam is so well developed, all
of these aspects of the art have been codified, and are documented in ancient scriptures as
well as recent books.
The movements of Bharatanatyam are unique. They share elements with other classical
Indian dances, but aren’t found in any western dance style. They are often described as
geometric, for there is much geometry in the basic postures and movements of which the
dance is built, but this makes them sound static, which they aren’t. Bharatanatyam is
dynamic and energetic; it is also precise and balanced. The basic postures center the weight
of the dancer, and there is little use of the hips or off-balance positions. Bharatanatyam has
a variety of characteristic movements. Along with the rhythmic stamping of the feet, there
are jumps, pirouettes, and positions where the knees contact the floor. Many are executed
in the stance with knees bent and turned outward. Performed by an expert dancer, these
movements flow together gracefully. Every part of the body is involved in the dance, and
their movements are defined and classified (in great number) in this system of dance.
At the functional level, the dance has three aspects:
● Nritta: Abstract dance movements with rhythm, but without expression of a theme or
emotion. Also called pure dance.
● Nritya: Interpretive dance, using facial expressions, hand gestures, and body
movements to portray emotions and express themes.
● Natya: The dramatic aspect of a stage performance, including spoken dialogue and
mime, to convey meaning and enact narrative.
In this project, the focus is on the nritta aspect which is pure dance without the emotions.
The basic unit of dance in Bharatanatyam is the adavu. Each adavu is a combination of
steps or positions with coordinated movements of the feet, legs, hands, arms, torso, head,
and eyes. Adavus give Bharatanatyam its distinctive look. For instance, many adavus are
executed with the legs bent, knees outward, heels together and toes outward – a position
called aramandi. The adavus, numbering around 120 in all, are divided into numerous
groups and subgroups. The hand gestures of Bharatanatyam are called hastas. Sometimes,
they are called mudras, or hasta mudras. There are one-handed and two-handed hastas,
there are lots of them, and they all have names. When a hasta is employed in a specific
context for a specific purpose, it gets a special name for that use.
Dance Examination:
Every year, thousands of students take up the junior and senior levels of the examination. The
dancer is judged on two nritta aspects: hasta and angashuddi. Under angashuddi,
aramandi and the various body postures, along with the body balance, are observed. Each
student is judged by two judges.
Portions for the examination are fixed. The items that have to be learnt are specified in
advance. Each of the judges is provided with a marks sheet which has the following
columns:
● Postures
● Hand gestures
● Clarity of steps
● Grace
● Knowledge of beats and rhythm
Based on the above qualities, a final score is given. Apart from this, there is a 30-minute
test where the judge scores the student based on knowledge of various
Bharatanatyam exercises and hastas. A set of hastas and postures is named that the
dancer has to show. Here again, body balance and clarity are checked.
3.2. Studies and Exploration for the selected idea
1. Kinect
Kinect (codenamed in development as Project Natal) is a line of motion sensing input
devices by Microsoft for Xbox 360 and Xbox One video game consoles and Windows PCs.
Based around a webcam-style add-on peripheral, it enables users to control and interact
with their console/computer without the need for a game controller, through a natural user
interface using gestures and spoken commands. The Kinect sensor is a horizontal bar connected
to a small base with a motorized pivot and is designed to be positioned lengthwise above
or below the video display. The device features an "RGB camera, depth sensor and
multi-array microphone running proprietary software", which provide full-body 3D motion
capture, facial recognition and voice recognition capabilities.
2. OpenCV
OpenCV is released under a BSD license and hence it’s free for both academic and
commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux,
Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with
a strong focus on real-time applications. Written in optimized C/C++, the library can take
advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the
hardware acceleration of the underlying heterogeneous compute platform. Usage ranges
from interactive art, to mine inspection, stitching maps on the web, and advanced
robotics.
3. HAAR Cascade Classifiers
Cascading is a multistage ensemble learning process based on the concatenation of several
classifiers, using all information collected from the output from a given classifier as
additional information for the next classifier in the cascade. The Viola–Jones object
detection framework was the first object detection framework to provide competitive object
detection rates in real time, proposed in 2001 by Paul Viola and Michael Jones. This
framework uses Haar-like features, which are digital image features for object
recognition. A Haar-like feature considers adjacent rectangular regions at a specific
location in a detection window, sums up the pixel intensities in each region and calculates
the difference between these sums. This difference is then used to categorize subsections
of an image. The key advantage of a Haar-like feature over most other features is its
calculation speed and its high detection rate.
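The rectangle sums behind a Haar-like feature are computed in constant time from an integral image, which is what gives the feature its speed. A minimal sketch of a horizontal two-rectangle feature follows; the function names and example values are illustrative:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: running sums over rows then columns."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle at (x, y), using four
    integral-image lookups (zero-padded so edge rectangles work)."""
    ii = np.pad(ii, ((1, 0), (1, 0)))
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(img, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: the difference
    between the pixel sums of the left and right halves of the window."""
    ii = integral_image(np.asarray(img, dtype=np.int64))
    left = rect_sum(ii, x, y, w // 2, h)
    right = rect_sum(ii, x + w // 2, y, w // 2, h)
    return left - right

img = [[1, 1, 9, 9],
       [1, 1, 9, 9]]
print(two_rect_feature(img, 0, 0, 4, 2))  # (4 * 1) - (4 * 9) = -32
```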
3.3. Papers Referred
1. Evaluating a Dancer’s Performance using Kinect-based Skeleton
Tracking
Authors : Philip Kelly, Noel E. O’Connor, Tamy Boubekeur and Maher Ben Moussa
● In this paper, the dataset included recordings of Salsa dancers captured with a variety of
equipment, such as the Microsoft Kinect sensor.
● The Kinect studio and the OpenNI module are used for the purpose of skeleton tracking.
● This module itself outputs the position of the joints for each frame.
● In order to score for a particular choreography, the aligned position and velocity vector
signals of an amateur dancer are compared with the corresponding signals of a
professional one.
● Dancer is evaluated based on Joint positions and joint velocities.
2. Skeletal Tracking using Microsoft Kinect
Authors : Abhishek Kar, Dr. Amitabha Mukerjee & Dr. Prithwijit Guha,
The methodology used in the paper is:
Fig. 3.1 Methodology used for Skeleton Detection
Foreground Segmentation:
Thresholding on the depth image is used to extract the foreground from the image. Noise
is removed using morphological operations of erosion and dilation.
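The foreground-segmentation step from the paper can be sketched as follows; the depth range is an assumed working distance, and plain NumPy stands in for OpenCV's `cv2.erode`/`cv2.dilate` so the sketch stays self-contained:

```python
import numpy as np

def erode(mask):
    """3x3 binary erosion: a pixel survives only if its whole
    neighbourhood is foreground."""
    padded = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def dilate(mask):
    """3x3 binary dilation: a pixel turns on if any neighbour is on."""
    padded = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def segment_foreground(depth, near=500, far=1500):
    """Threshold a depth map (mm) to keep pixels in the dancer's range,
    then clean the mask with erosion followed by dilation (a
    morphological opening) to remove speckle noise. The 500-1500 mm
    range is an assumed working distance, not a value from the paper."""
    mask = (depth >= near) & (depth <= far)
    return dilate(erode(mask))

depth = np.full((5, 5), 2000)   # background beyond the far threshold
depth[2, 2] = 1000              # a single speckle in range
print(segment_foreground(depth).any())  # False: the opening removed the speckle
```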
HAAR Cascade Detection:
Viola-Jones face and upper body detectors are used for face and upper body positions of
the subject. The detector is based on HAAR cascade classifiers. Each classifier uses
rectangular HAAR features to classify the region of the image as a positive or negative
match.
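The cascade's early-rejection behaviour can be illustrated abstractly; real Haar stages combine many weighted weak classifiers over rectangular features, but the toy stage representation below (a list of feature functions with a threshold) captures the control flow:

```python
def cascade_classify(window, stages):
    """Evaluate a detection window against a cascade: each stage is a
    (features, threshold) pair, and the window is rejected the moment
    one stage's feature sum falls below its threshold, so most
    negative windows exit after the cheap early stages."""
    for features, threshold in stages:
        score = sum(f(window) for f in features)
        if score < threshold:
            return False        # negative match: rejected early
    return True                 # survived every stage: positive match

# toy cascade over a scalar "window": stage 1 needs w >= 1, stage 2 needs w <= 5
stages = [([lambda w: w], 1), ([lambda w: -w], -5)]
print(cascade_classify(2, stages), cascade_classify(0, stages))  # True False
```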
The Stick Skeleton Model:
The human skeleton model is represented by 7 body parts involving 8 points. The head and
neck points are taken as the centroid and the midpoint of the base line of the face detection
rectangle, respectively. The shoulder points are fixed halfway between the face detection and
torso detection rectangle base lines, with the shoulder width set as twice the face width.
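These point derivations can be written out directly; the rectangle format (x, y, w, h) with y growing downward is an assumed convention:

```python
def skeleton_points(face, torso):
    """Derive head, neck and shoulder points from the face and torso
    detection rectangles, following the stick-model rules above.
    Rectangles are (x, y, w, h) with y growing downward."""
    fx, fy, fw, fh = face
    tx, ty, tw, th = torso
    head = (fx + fw / 2, fy + fh / 2)      # centroid of the face box
    neck = (fx + fw / 2, fy + fh)          # midpoint of the face base line
    # shoulders: halfway between the face and torso base lines,
    # spaced one face-width either side of the neck (total span = 2 * fw)
    sy = (fy + fh + ty + th) / 2
    left_shoulder = (fx + fw / 2 - fw, sy)
    right_shoulder = (fx + fw / 2 + fw, sy)
    return head, neck, left_shoulder, right_shoulder

print(skeleton_points((40, 0, 20, 20), (20, 20, 60, 40)))
```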
Since the videos may not be temporally sequenced:
● For each sequence, all the frames before the detection of the dancer, i.e. before the
time instance when at least one joint is detected, were ignored.
● A flag value, NaN (Not A Number), was used to represent an undetected/untracked joint.
● The shorter sequences are padded with NaN values, so that both sequences to be
compared have the same temporal length.
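The NaN-padding step can be sketched with NumPy (the function name is illustrative); because NaN already marks undetected joints, padded frames are simply skipped by any NaN-aware comparison such as `np.nanmean`:

```python
import numpy as np

def align_sequences(a, b):
    """Pad the shorter joint-trajectory sequence with NaN frames so
    both sequences have the same temporal length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = max(len(a), len(b))
    pad = lambda s: np.concatenate(
        [s, np.full((n - len(s),) + s.shape[1:], np.nan)])
    return pad(a), pad(b)

a, b = align_sequences([1.0, 2.0, 3.0], [1.0, 2.0])
print(len(a) == len(b), b[-1])  # True nan
```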
Changes from the already implemented approach:
● The Microsoft Kinect setup costs a lot. The sensor alone costs $143 and the adaptor
costs about $49. Considering Indian classical dance, we would need at least two
sensors for skeleton tracking, thereby doubling this cost.
● A normal video cannot be fed to Kinect Studio. Hence it is mandatory to purchase
the hardware, which creates .xef files.
● Here, OpenCV is used for skeleton tracking and for video and image processing. In this
case, a normal camera is enough for taking videos of performances.
Hence a considerable amount of money is saved.
● Here the concentration is on the Indian classical dance - Bharatanatyam. In the
paper, Salsa is the form of dance chosen. In Salsa, hand gestures are not really
considered for scoring. This extra feature has been included in our system, which
performs hand gesture detection. Hence the final scores will include the element of
hand gestures as well.
● Instead of scoring the entire body, the scoring is done separately for the upper body
and lower body, thereby letting the amateur dancer know where there is room for
improvement.
3. Bharatanatyam and Mathematics : Teaching Geometry through dance
Authors : Iyengar Mukunda Kalpana
Bharatanatyam is a highly codified and schematized Asian Indian style of classical dance
that accommodates the different kinds of learners. This dance is culturally relevant to Asian
Indian American students, but the findings are applicable to students from other
demographics that are interested in learning math through dance.
Many Asian Indian students learn Bharatanatyam for cultural maintenance and
preservation. Dance is also a beneficial medium to teach basic geometric shapes to young
children because dance is an engaging art curriculum that can be used in schools. This
mixed methods study informed by categorical content analysis is designed to recommend
a framework for exploring how Asian Indian students can learn basic geometric shapes
through Bharatanatyam.
This paper investigates dance movements called adavus, cultural relevance, and integration
of elements from dance and geometry and the implementation of alternate strategies such
as dance instruction to teach and learn basic geometric shapes. The data analysis revealed
the benefits of dance and math integration.
4. Rapid Object Detection using a Boosted Cascade of Simple Features
Authors : Paul Viola and Michael Jones
This paper describes a machine learning approach for visual object detection which is
capable of processing images extremely rapidly and achieving high detection rates. This
work is distinguished by three key contributions. It introduces a new image representation
called the “Integral Image”, which allows the features used by the detector to be computed very
quickly. It then describes a learning algorithm, AdaBoost, which selects a small number
of critical visual features from a larger set and yields extremely efficient classifiers.
The paper discusses a “cascade” which allows background regions of the image to be
quickly discarded while spending more computation on promising object-like regions. The
cascade can be viewed as an object specific focus-of-attention mechanism which unlike
previous approaches provides statistical guarantees that discarded regions are unlikely to
contain the object of interest.
In the domain of face detection the system yields detection rates comparable to the best
previous systems. Used in real-time applications, the detector runs at 15 frames per second
without resorting to image differencing or skin color detection.
CHAPTER IV
PROJECT REQUIREMENT DEFINITION
4.1. Project Perspective
This project aims at automating the evaluation of the Bharatanatyam dance
performances that are performed as part of the dance examination. This is based
on the following observations:
1. The set of choreographies performed during a dance examination is standard. Every
student is judged based on their hand gestures (hastas) and their body postures
(angashuddi).
2. Using the technique of skeleton detection, the joint velocities and the directions of
movement of the joints are calculated successfully. This way the postures of the dancer are
judged using simple equipment like the camera on a mobile phone, instead of costly
equipment like the Microsoft Kinect.
3. Since only the Nritta component of the Bharatanatyam dance is evaluated, only
the hand gestures and the body postures are taken into account.
There are very few systems that help in the analysis of dance performances, and most of
them involve the use of the Microsoft Kinect. Our tool would help replace these
systems with cheaper and more easily available equipment, like the camera on a mobile
phone. This tool can later be easily extended to other dance forms by changing the Haar
XMLs used for the skeleton detection.
4.2. Project Function
Following are the functions the tool performs:
1. The input videos, captured using a camera with 8MP or higher resolution, are fed to the
system, which extracts a frame from each video every 100 ms.
2. The extracted frames then go to the Face Detection module, where we use 4 Haar
classifier XMLs to detect the face of the dancer. We mark the coordinates of the face
and the neck using the contour detected in this module. These values are then written into
the database.
3. After the Face Detection, the images are passed onto the Upper Body Detection module
where we get the coordinates of the shoulders.
4. The Lower Body Detection module then marks the coordinates of the knees and hips of
the dancer. These coordinates are also noted in the database.
5. Skin Detection is then used to mark the hands and feet.
6. The Scoring module then compares the values of the student with those of the standard
dancer and assigns an appropriate score to every frame. The average of these scores
is displayed as the final score of the dancer.
7. The Analysis module helps compare the older performances of the dancer with the latest
ones. This would help the dancer improve their performance.
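Step 1's frame extraction amounts to choosing which frame indices to keep so that one frame lands every 100 ms of video time; a sketch (the function name is illustrative, and in practice the chosen frames would be read with OpenCV's `cv2.VideoCapture`):

```python
def sample_frame_indices(fps, duration_s, interval_ms=100):
    """Frame indices to extract so that one frame is kept every
    `interval_ms` of video time, for a clip of `duration_s` seconds
    recorded at `fps` frames per second."""
    step = fps * interval_ms / 1000.0           # frames per sampling interval
    n = int(duration_s * 1000 // interval_ms)   # number of samples to keep
    return [round(i * step) for i in range(n)]

# a 1-second clip at 30 fps sampled every 100 ms -> every 3rd frame
print(sample_frame_indices(30, 1.0))  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```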
4.3. User Classes and Characteristics
The user could be any student of Bharatanatyam who is preparing for the Dance
Examination; they can use this tool to judge the improvement in their performances.
The organizers of the Dance Examination are also potential users. They can use this
tool to conduct the Dance Examinations without the presence of a judge. This way the
evaluations are easier and fairer.
4.4. Operating Environment
The operating environment for this tool would be a room with preferably plain walls as the
background. The dancer needs to be close to the camera used to record the performance.
4.5. Design and Implementation Constraints
1. The dancer has to be within 5 feet of the camera which is used to record the
performance. A camera with better resolution can be used to help with accurate
detection of the joints.
2. The Haar classifiers used to detect the joints may sometimes give inaccurate results.
In order to minimize the inaccuracy, constraints were added for each of the joints
being detected.
3. The skin detection algorithm may mark wrong contours because of stray skin-coloured
regions. The correct regions are found by detecting skin within the region
of the upper body only.
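Constraint 3's fix, restricting skin detection to the upper-body region, can be sketched as a colour-range mask applied only inside a region of interest; the HSV bounds below are assumed typical values, not the project's tuned thresholds:

```python
import numpy as np

def skin_mask_in_roi(hsv, roi):
    """Skin detection restricted to a region of interest: pixels
    outside `roi` (x, y, w, h) are ignored, so stray skin-coloured
    contours elsewhere in the frame cannot be marked."""
    hsv = np.asarray(hsv)
    lower = np.array([0, 40, 60])       # assumed HSV lower bound for skin
    upper = np.array([25, 180, 255])    # assumed HSV upper bound for skin
    mask = np.all((hsv >= lower) & (hsv <= upper), axis=2)
    x, y, w, h = roi
    out = np.zeros_like(mask)
    out[y:y + h, x:x + w] = mask[y:y + h, x:x + w]
    return out

hsv = np.zeros((4, 4, 3), dtype=int)
hsv[1, 1] = [10, 100, 200]   # skin-coloured pixel inside the ROI
hsv[3, 3] = [10, 100, 200]   # skin-coloured pixel outside the ROI
mask = skin_mask_in_roi(hsv, (0, 0, 2, 2))
print(mask[1, 1], mask[3, 3])  # True False
```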
4.6. User Documentation
1. The user has to record the dancer's performance to be evaluated using a still camera,
which could be a mobile camera.
2. This video, in mp4 format, has to be fed into our analyzer along with an identifier to
indicate which choreography is being performed in the video. This information can
be easily entered into the tool using the appropriate input fields displayed in the GUI of the
tool.
3. After analysis, a Report Card is displayed which shows the score allotted for every
frame. It also gives some suggestions about the areas for improvement, based on the
scores allotted for the upper body and lower body.
4. The user can also view their scores from previous performances and judge their
improvement for the same choreography.
4.7. Assumptions and Dependencies
Assumptions:
● A plain background for the dancer while recording the performance.
● Good lighting in the room while recording the performance of the dancer.
● The recording starts immediately with the music. Hence the length of the
recording for a particular choreography will be the same.
● The camera is fixed in a particular position during the whole recording process.
● The dancer is assumed to be within the frame of the camera during the whole
performance.
Dependencies:
 The skeleton detection process depends on the resolution of the images supplied to
it. Hence a camera with 8MP or more resolution is preferred.
 The dancer is supposed to be within 5 feet of the camera while dancing, for clearer images.
 The tool is currently trained for 5 specific choreographies; hence only these can be evaluated for any student.
CHAPTER V
SOFTWARE REQUIREMENTS SPECIFICATIONS
5.1. External Interface Requirements
5.1.1. User interfaces
 Interface to upload the input videos to the system.
 Interface to view the scores after the comparison to the choreographers’ videos
 Interface to view the comparison of frames of the videos (to help improve
performance)
5.1.2. Software Interfaces
 Technologies
o Matplotlib for Visualization
o OpenCV
 Languages
o Python
5.2. Hardware requirements
 Camera (8MP and above)
 Laptop with minimum 4GB RAM
5.3. Functional Requirements
5.3.1. Training of the system
 Description : Training the system with the “correct” videos which will later be
used as reference to compute scores for the other videos
 Input : 2 Choreographer Videos
 Output : SQLite Database
5.3.2. Generation of Blocks of the input video
 Description: The input video is pre-processed and frames are generated. The
appropriate frames are then grouped into blocks. The blocks of frames are then
used for scoring different sections of the performance.
 Input : Input Video
 Output : Blocks of frames
5.3.3. Identification of Joints
 Description: The frames in the blocks are analyzed to identify the skeleton of the
human in the video. The co-ordinates of the joints are recorded.
 Input : Blocks of frames
 Output : SQLite Database
5.3.4. Identification of Hand Gestures
 Description: The frames in the blocks are analyzed to identify the hand gestures
in the video. The data is recorded.
 Input : Blocks of frames
 Output : SQLite Database
5.3.5. Scoring of the Body Movements
 Description: The co-ordinates and the data related to the input video are
compared with the values corresponding to the choreographers’ videos. A score
is calculated based on the comparison.
 Input : SQLite Database
 Output : Score out of 100
5.3.6. Visualizing the Results
 Description: The score calculated based on the comparison is presented to the
user. The scores for the hand gestures and the body movements will be separately
shown. If possible, we will also display the areas of improvements for the user.
 Input : SQLite Database
 Output : Graphs
5.4. Non-Functional Requirements
5.4.1. Performance
 The system must be able to compute the scores within a few seconds.
 The system must be able to process videos as long as 5 minutes.
5.4.2. Reliability
 The system must have at least 60% accuracy with high precision and recall.
 It should calculate appropriate scores.
5.4.3. Maintainability
 Training the system with more dance choreographies should be easy.
5.4.4. Usability
 The system should be easy and simple to use.
 The calculated results must be visualized in such a way that they are easy to
understand.
CHAPTER VI
GANTT CHART
CHAPTER VII
SYSTEM DESIGN
7.1. Block Diagram
The system is divided into the following logical blocks.
Fig 7.1 Block Diagram
7.1.1. Camera
This component consists of a camera with a resolution of 8 MP or more, used to record the performances of the dancer that are to be analyzed. The camera is assumed to start recording as soon as the music starts.
The output (a video in mp4 format) is then fed to the data analyzer for further processing.
7.1.2. Data Analyzer
This component is responsible for the analysis of the video fed into the system. It converts the video into appropriate frames and runs a skeleton detection algorithm on them. The coordinates of the face, neck, shoulders, hips and knees are recognized, as is the posture of the dancer. This data is then compared with the data of the standard dancer. The scoring module then allots a score for every frame and prints the report card with suggestions, along with a graph.
7.1.3. Database
The system uses an SQLite database to store the coordinates of the joints detected. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process; it reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers and views is contained in a single disk file. These features make SQLite a popular choice as an application file format.
7.2. Architecture
Fig 7.2. High Level Architecture Diagram
This section describes the overall hardware and software architecture of the system. As shown in the figure, the system is primarily composed of three components: the Camera, the Data Processing Module and the Visualization Module. The video from the camera is uploaded to the system acting as the Data Analyzer. The other modules, though discrete, communicate with each other through function calls.
A brief description of the architecture is as follows:
7.2.1. Camera
A camera with a resolution of 8 MP or more is used to record the performance of the dancer. This video is then fed into the Data Analyzer using a USB cable or via Bluetooth.
7.2.2. Controller
This module is used to verify if the video file fed to the system is a valid mp4 file. It also
takes as input the details of the user and video type which help the system to perform and
store the details of the analysis correctly.
7.2.3. Object Recognition Module
This module is responsible for the joint recognition for the dancer. It detects the face, neck,
shoulders, hips and knees for the dancer in all the frames of the video. It also performs skin
detection in order to recognize the posture of the dancer in the frame.
7.2.4. Scoring and Analysis Module
This module is the most important part of the system. It compares the joint coordinates of
the dancer with that of the choreographer and assigns an appropriate score for every frame
under analysis.
7.2.5. Visualization Module
This module is used to display a graph for the user plotting the score obtained for every
frame of the video. On clicking a point on the graph, the user can see the corresponding
frame of their video and the choreographer’s video. This helps them compare and improve
their performance. This module also outputs a report card which displays the average scores for the upper body, lower body and hand gestures, and the overall score. This way the user can analyze their scope for improvement.
7.2.6. Database
A SQLite Database is used to store the details of the session and the details of the joints
recognized in every frame of the video. These values are used for scoring. The values from
the database are also used by the visualization module to provide the user with a graphical
representation of the scores and a comparison of their performance with the
choreographer’s performance.
7.3. Use Case Diagram
The use case diagram (Fig 7.3) indicates the various functionalities the actor (dancer) can avail from the system:
 The input video will be converted into frames which will later help visualize the
scoring.
 The system will detect the face of the dancer for each of the frames extracted from
the video.
 The system will detect the upper body of the dancer, which includes the shoulders.
 The system will detect the lower body of the dancer which consists of the hip and
the legs of the dancer.
 The system will perform skin detection for the dancer in order to detect the posture
of the dancer in the frame.
 The system displays the average score of the dancer for the performance.
Fig 7.3. Use Case Diagram
7.4. Scenario Diagram
 The system is initially trained with the 5 standard dancers' videos. All the coordinates are stored in the DB.
 The user has to feed the system a video with the mp4 extension, which is later converted into frames; a frame is extracted for every 100 ms of the video. The details of each frame are stored in the DB with an EvalId which is used to identify the analysis.
 The system then uses the frames created in the previous step and performs Face
Detection on them. The coordinates of the face and neck found here are written into
the DB.
 After detecting the face, the system proceeds to detect the upper body. It selects a
contour such that the face lies within it and we can use this contour to detect the
posture of the dancer. These details are also written into the DB.
 The system then continues with the lower body detection which detects if the dancer
is sitting or standing. The details of the lower body are then stored in the DB.
 The coordinates of the joints of the dancer are then compared with those of the standard dancer. The system considers the joint velocity and the direction of movement while scoring. Appropriate scores are then allotted based on the percentage of inaccuracy.
 A report card and a graph are displayed at the end of the analysis, which help the user analyze the areas of improvement.
Fig 7.4. Scenario Diagram for the Whole System
Fig. 7.5. Scenario Diagram for the Detection Modules
CHAPTER VIII
DETAILED DESIGN
8.1. Modules
This section contains a detailed breakdown of all the modules used in the system.
8.1.1. Extraction of Frames
A video is fed to the system. The Frame Extraction module grabs the next frame from the open camera or from the supplied video. If a frame is grabbed, the value TRUE is returned and the frame is saved into a separate folder created for the run. When the end of the video is reached, the value FALSE is returned and frame extraction ends.
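As a sketch, the grab-and-save loop described above might look like the following (the function names, folder layout and the use of OpenCV's VideoCapture are illustrative assumptions, not the tool's exact code):

```python
import os

def frame_positions_ms(duration_ms, step_ms=100):
    """Timestamps (in ms) at which frames are sampled: one per 100 ms."""
    return list(range(0, duration_ms, step_ms))

def extract_frames(video_path, out_dir, step_ms=100):
    """Grab one frame every step_ms and save it; mirrors the TRUE/FALSE
    grab behaviour described above. Returns the saved file names."""
    import cv2  # imported here so the timing helper above needs no OpenCV
    os.makedirs(out_dir, exist_ok=True)   # the separate folder for frames
    cap = cv2.VideoCapture(video_path)
    saved, t = [], 0
    while True:
        cap.set(cv2.CAP_PROP_POS_MSEC, t)  # seek to the next sample point
        ok, frame = cap.read()             # ok is False at end of video
        if not ok:
            break
        name = os.path.join(out_dir, "frame_%05d.png" % (t // step_ms))
        cv2.imwrite(name, frame)
        saved.append(name)
        t += step_ms
    cap.release()
    return saved
```

A 5-minute video sampled this way yields 3000 frames, which matches the per-frame scoring described later.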
8.1.2. HAAR Training
This module uses a machine learning algorithm (the Viola–Jones framework) to detect features and objects. The algorithm
has four stages:
1. Haar Feature Selection
The opencv_createsamples utility is used to prepare a training dataset of positive samples. The positive images are first cropped to capture only the required feature.
2. Creating an Integral Image
An image representation called the integral image evaluates rectangular features in constant time, which gives them a considerable speed advantage over more sophisticated alternative features. Because each feature's rectangular area is always adjacent to at least one other rectangle, any two-rectangle feature can be computed in six array references, any three-rectangle feature in eight, and any four-rectangle feature in nine. The integral image at location (x, y) is the sum of the pixels above and to the left of (x, y), inclusive. The module finally creates a .vec file with the details of the image and the position
of the object to be recognized.
3. Adaboost Training
The opencv_traincascade or opencv_haartraining utility is used to train the cascade classifier, which produces an XML file for every stage. This module uses 1000+ positive images and 1500+ negative images for training.
4. Cascading Classifiers
The haarconv utility is used to combine the XMLs created for every stage into one single XML file, which is finally used for object detection.
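The constant-time rectangle sums behind stage 2 can be illustrated with NumPy (a self-contained sketch, not the OpenCV internals):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img pixels above and to the left of (x, y),
    inclusive. A zero row and column are prepended so rectangle sums
    need no bounds checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, h, w):
    """Sum over an h x w rectangle in four array references. A
    two-rectangle Haar feature is the difference of two adjacent such
    sums, hence six references in total (two corners are shared)."""
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])
```

Once the integral image is built, every rectangle sum costs the same regardless of its size, which is what makes cascade evaluation fast.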
8.1.3. Face Detection
This module uses the face.xml generated during HAAR Training. For each image, it searches for a human face and, if one is found, marks the face with a rectangle. The image with the marked rectangle is stored in a separate folder.
8.1.4. Upper Body and Lower Body Detection
These two modules detect the upper body and lower body in a given frame. After detection,
they mark the area with a rectangle and store this frame in the respective folders for upper
body and lower body.
8.1.5. Skin Detection
Skin detection can serve both face detection and hand gesture detection. Since we already use HAAR training for face detection, we use skin detection only for hand gestures. Once the hand is detected, gestures are identified by the angles between the fingers. Skin detection uses a combination of colors to detect various shades of skin. A number
of algorithms exist for this purpose; we use a simple RGB combination to do the task.
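One widely used explicit RGB rule for skin pixels is sketched below; the thresholds are illustrative and the tool's exact combination may differ:

```python
import numpy as np

def skin_mask(rgb):
    """Boolean mask of likely-skin pixels for an H x W x 3 uint8 RGB
    image, using a common explicit RGB rule (thresholds illustrative)."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    spread = rgb.max(axis=-1).astype(int) - rgb.min(axis=-1).astype(int)
    return ((r > 95) & (g > 40) & (b > 20)       # bright enough channels
            & (spread > 15)                       # not a grey pixel
            & (np.abs(r - g) > 15) & (r > g) & (r > b))  # red dominance
```

In the tool the mask would be restricted to the upper-body region, per the constraint noted in section 4.5.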
8.1.6. Scoring Module
This module performs the final calculation of scores for the performance. The module is
divided into three parts – upper body score, lower body score and hand gesture score.
The upper body and lower body scores are provided based on the joint velocity and the
direction of movement of the joints. Various levels of scores are provided based on the
percentage of deviation from a standard video. The hand gestures are broadly divided into 5
categories. If the hand gesture falls in the same category as that of the standard video, marks
are awarded, else a score of 0 is given.
The final score is an average of the above three scores. A report card is provided at the end, along with a graph and a list of suggestions for improvement.
8.2. Database Design
The tool uses a SQLite instance as the database. The database is used to store the coordinates
of the joints detected for long term storage and data visualization once the analysis is
complete.
The following are the details of the tables in the Dance_Eval database:
1. User List
eval_id user_name video_type
 eval_id: A unique integer value to help keep track of every analysis session.
 user_name: User name of the dancer (student).
 video_type: An integer used to mention the choreography performed in the
video.
2. Image List
eval_id image_name score
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 score: Score allotted for the frame.
3. Joint Details of Face
eval_id image_name facex facey neckx necky
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 facex and facey: Coordinates of the joint marked as face.
 neckx and necky: Coordinates of the joint marked as neck.
4. Joint Details of Upper Body
eval_id image_name shoulder_rightx shoulder_righty shoulder_leftx shoulder_lefty
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 shoulder_rightx and shoulder_righty: Coordinates of the joint marked as the right
shoulder.
 shoulder_leftx and shoulder_lefty: Coordinates of the joint marked as the left
shoulder.
5. Joint Details of Lower Body
eval_id image_name hip_rightx hip_righty hip_leftx hip_lefty knee_rightx knee_righty knee_leftx knee_lefty
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 hip_rightx, hip_righty, hip_leftx and hip_lefty: Coordinates marked as the
corresponding joints of the hip.
 knee_rightx, knee_righty, knee_leftx and knee_lefty: Coordinates marked as the
corresponding joints of the knees.
6. Upper Body Posture Details
eval_id image_name posture_name
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 posture_name: Category of the posture detected of the dancer in the frame.
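Using Python's standard sqlite3 module, two of the tables above could be created as follows (the table names here are assumptions; the report specifies only the columns):

```python
import sqlite3

# Illustrative DDL for part of the Dance_Eval database described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_list (
    eval_id    INTEGER PRIMARY KEY,  -- unique id for each analysis session
    user_name  TEXT,                 -- dancer (student) user name
    video_type INTEGER               -- choreography performed in the video
);
CREATE TABLE IF NOT EXISTS face_joints (
    eval_id    INTEGER,              -- session the frame belongs to
    image_name TEXT,                 -- frame extracted from the video
    facex INTEGER, facey INTEGER,    -- joint marked as the face
    neckx INTEGER, necky INTEGER     -- joint marked as the neck
);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con
```

Because SQLite is serverless and single-file, the whole database travels with the tool, matching the "application file format" usage described in section 7.1.3.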
CHAPTER IX
IMPLEMENTATION
9.1. Implementation Choices
 Python Programming Language
 Python OpenCV 3.0.0 version API for Image Processing
 HAAR Cascade Classifier for Training the Image Processing Algorithm
 Matplotlib API for Visualization
9.2. Justification for Implementation Choices
9.2.1. Programming Language – Python vs C++
 Python programs are generally expected to run slower than C++ programs, but they also take much less time to develop. Python programs are typically 3-5 times shorter than equivalent C++ programs. This difference can be attributed to Python's built-in high-level data types and its dynamic typing. For example, a Python programmer wastes no time declaring the types of arguments or variables, and Python's powerful polymorphic list and dictionary types, for which rich syntactic support is built straight into the language, find a use in almost every Python program. Because of the run-time typing, Python's run time must work harder than that of C++. Python also has better and easier support for the appropriate image processing libraries. Hence, Python was chosen.
9.2.2. Computer Vision Library – OpenCV vs MatLab
 OpenCV was chosen because it is fast and easy to set up. It is a library of programming functions mainly aimed at real-time computer vision, focusing on real-time image processing. It supports many languages, runs on various platforms and is an open source library.
 MATLAB is proprietary software and not a dedicated image and video processing library. It was not chosen because it is expensive and not really suited to the requirements.
9.2.3. Object Recognition Algorithm - HAAR Cascade Classifier vs others
The characteristics of HAAR Cascade Classifier (Viola–Jones algorithm) which make it a
good detection algorithm are:
 Robust – very high detection rate (true-positive rate) and a consistently low false-positive rate.
 Real time – For practical applications at least 2 frames per second must be processed.
 Extremely fast feature computation
 Efficient feature selection
 Scale and location invariant detector
 Instead of scaling the image itself (e.g. pyramid-filters), it scales the features.
 Such a generic detection scheme can be trained for detection of other types of objects
(e.g. cars, hands)
9.3. Face Detection
The Face Detection algorithm is implemented using the HAAR XMLs provided by the OpenCV module. The following XMLs are used for face detection:
 haarcascade_frontalface_alt.xml
 haarcascade_frontalface_alt2.xml
 haarcascade_frontalface_default.xml
 haarcascade_profileface.xml
For every frame, the above XMLs are tried in the same order until a face is detected. The smallest of the detected contours is selected and marked as the face, and the neck joint is marked just below it. If no face is detected in a frame, the face coordinates from the previous frame are reused for this frame.
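The selection-and-fallback logic just described can be sketched as a pure function (the detector outputs are injected here; in the tool they would come from cv2.CascadeClassifier(...).detectMultiScale on each frame):

```python
def pick_face(detections_per_cascade, prev_face):
    """Try each cascade's detections in the listed order; on the first
    non-empty result keep the smallest rectangle as the face, otherwise
    fall back to the previous frame's face. Rectangles are (x, y, w, h)."""
    for rects in detections_per_cascade:
        if rects:
            return min(rects, key=lambda r: r[2] * r[3])  # smallest area
    return prev_face  # no detection: reuse previous frame's coordinates
```

Keeping the smallest rectangle discards spurious large detections, and the previous-frame fallback keeps the joint track continuous between frames.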
Fig 9.1. Face Detection for the Standing and Sitting Postures
Fig 9.2. Algorithm used for Face Detection
9.4. Upper Body Detection
The Upper Body Detection algorithm is implemented using the XML that we trained. The upper body includes the face and the shoulders of the dancer. A total of 1000 positive images and 1500 negative images were used to train the HAAR XML for the upper body. The algorithm is run for every frame: the largest contour containing the face is detected and marked as the upper body. The shoulders are marked on the base line of the detected contour; the right shoulder is 1/5th of the width from the bottom-right corner and the left shoulder is 4/5ths of the width from the same corner.
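The shoulder-marking rule above amounts to simple arithmetic on the upper-body bounding box (a sketch; the coordinate convention assumes (x, y) at the top-left of the box):

```python
def shoulder_points(x, y, w, h):
    """Shoulder coordinates from the upper-body bounding box (x, y, w, h):
    both lie on the box's base line; the right shoulder sits 1/5 of the
    width in from the bottom-right corner, the left shoulder 4/5."""
    base_y = y + h                         # base line of the contour
    right = (x + w - w // 5, base_y)       # 1/5 of the width from the right
    left = (x + w - 4 * w // 5, base_y)    # 4/5 of the width from the right
    return right, left
```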
Fig 9.3. Upper Body Detection for the Standing and Sitting Postures
Fig 9.4. Algorithm for Upper Body Detection
9.5. Lower Body Detection
The Lower Body Detection algorithm is implemented in a similar way to upper body detection. The XML was trained to include the hips and knees of the dancer. The dancer's sitting or standing posture is also detected, based on the coordinates of the dancer's face, and the joints are then marked appropriately.
 If the face is marked within 35% of the height of the image, the dancer is assumed to
be standing. Their knees are marked at half the height of the contour selected and 1/3rd
the width inside the contour for the corresponding knee.
 If the face is marked within 35% to 55% of the height of the image, the knees are
marked at half the height of the contour selected and 1/5th the width inside the contour
for the corresponding knee.
 If the face is marked beyond 55% of the height, we assume the dancer to be sitting and
mark the knees at the same height as before but on the edges of the contour selected.
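The three height bands above can be expressed as a small classifier (a sketch; the function names are illustrative):

```python
def classify_posture(face_y, img_h):
    """Posture from the vertical position of the face, per the bands
    above: within 35% of the image height -> standing, 35-55% -> the
    intermediate band, beyond 55% -> sitting."""
    frac = face_y / img_h
    if frac <= 0.35:
        return "standing"
    if frac <= 0.55:
        return "intermediate"
    return "sitting"

def knee_inset(w, posture):
    """Horizontal inset of each knee inside a lower-body contour of
    width w: 1/3 when standing, 1/5 in the intermediate band, and 0
    (the contour edges) when sitting."""
    return {"standing": w // 3, "intermediate": w // 5, "sitting": 0}[posture]
```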
Fig 9.5. Lower Body Detection for the Standing and Sitting Postures
Fig 9.6. Algorithm used for Lower Body Detection
9.6. Skin Detection
The Skin Detection algorithm is used to detect the posture of the dancer. We have broadly classified the postures into 5 categories based on the contour defects detected.
Fig 9.7. Algorithm used for Skin Detection and Posture Detection
9.7. Scoring
The scoring algorithm considers the direction of movement of the joints, the joint velocity and the posture detected. The average scores of the upper body joints, the lower body joints and the postures are displayed in the final report. We consider inaccuracy bands of 10%, 15% and 20% and allot 10, 5 or 3 points respectively; beyond 20% inaccuracy, the score allotted is zero. For hand postures, a correct posture is awarded 10 points, otherwise it is marked zero.
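The scoring bands just described reduce to a small lookup (a sketch of the stated rule, not the tool's exact code):

```python
def joint_score(deviation_pct):
    """Points for a joint from its percentage deviation against the
    standard video: within 10% -> 10 points, within 15% -> 5, within
    20% -> 3, beyond 20% -> 0."""
    if deviation_pct <= 10:
        return 10
    if deviation_pct <= 15:
        return 5
    if deviation_pct <= 20:
        return 3
    return 0

def gesture_score(category, standard_category):
    """Hand postures are all-or-nothing: 10 points if the detected
    category matches the standard video's category, otherwise 0."""
    return 10 if category == standard_category else 0
```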
Fig 9.8. Algorithm used for Scoring
9.8. Analysis and Visualization
The tool displays a graph at the end of the analysis plotting the score allotted for each frame. This graph also helps the user compare their performance with that of the choreographer: clicking on any point plotted on the graph displays windows with the corresponding frames of the user's and the choreographer's videos. The tool also displays a report card with the average scores for the overall performance, the upper body joints, the lower body joints and the hand gestures. This way the user is able to analyze where they need to improve in order to increase their score, and the report card gives suggestions based on this analysis.
Fig 9.9. Graph plotting the score of each frame and the Report Card printed at the end of the
analysis
Fig 9.10. Comparing the student’s frame with the corresponding frame of the choreographer
CHAPTER X
INTEGRATION
Integration is defined as the process of bringing together the component subsystems into one
system and ensuring that the subsystems function together as a system. In information technology,
systems integration is the process of linking together different computing systems and software
applications physically or functionally, to act as a coordinated whole.
A combination of top-down and bottom-up integration, i.e. a middle-out integration strategy, was followed in the project. As and when components were developed, they were integrated with the rest of the system and tested; in other words, an incremental software development approach was followed.
CHAPTER XI
TESTING
11.1. Unit Testing
11.1.1. Database Interaction
Test Case | Expected Result | Actual Result | Conclusion
Frames with marked coordinates | Insertion of a row for each frame | Rows inserted for each frame | PASS
Query for frames of a particular dancer | All frames related to the person | All frames related to the person | PASS
Query for frames | Frames obtained in order of time | Frames obtained in random order | FAIL
11.1.2. Generation of Frames
Test Case | Expected Result | Actual Result | Conclusion
Video input | Generation of 1 frame every 100 ms | Frames generated every 100 ms | PASS
Set of all frames | All frames related to the person | All frames related to the person | PASS
11.1.3. Training Algorithm
Test Case | Expected Result | Actual Result | Conclusion
Image of dancer (face detection) | Face marked | Face marked | 80% accurate
Image of dancer (upper body detection) | Upper body marked | Upper body marked | 60% accurate
Image of dancer (lower body detection) | Lower body marked | Lower body marked | 60% accurate
11.1.4. Scoring Algorithm
Test Case | Expected Result | Actual Result | Conclusion
Compare dancer with herself | score = 10/10 | score = 8.6/10 | 85% accurate
Evaluate good dancer against professional | score > 5 | score = 6 | 70% accurate
Evaluate below-average dancer against professional | score <= 5 | score = 2.7 | 60% accurate
11.1.5. Synchronization
Test Case | Expected Result | Actual Result | Conclusion
Input of two videos | Second-to-second synchronization | Mismatch | FAIL
Input of two edited videos | Second-to-second synchronization | Perfect match | PASS
11.2. Integration Testing
Test Case | Expected Result | Actual Result | Conclusion
Video fed to system | Creation of frames | Frames created every 100 ms | PASS
Professional dancer performs | Coordinates of joints fed to DB | Coordinates of joints fed to DB | PASS
Amateur dancer performs | Coordinates of joints fed to DB | Coordinates of joints fed to DB | PASS
Amateur dancer performs | Marks card generated | Marks card generated | PASS
11.3. System Testing
Test Case | Expected Result | Actual Result | Conclusion
Video fed to system | Generation of marks card | Final marks card generated | PASS
Dancer video captured | Comparison with professional and scoring | Comparison done with scoring | PASS
CHAPTER XII
SCREENSHOTS
Fig 12.1. Introduction Screen for the Tool
Fig 12.2. Window to enter the details for the Analysis
Fig 12.3. Window to upload the video for Analysis
Fig 12.4. Screen displaying the progress of the Analysis
Fig 12.5. Screenshot of a frame given a high score
Fig 12.6. Screenshot of a frame given a low score
Fig 12.7. Screenshot of the graph plotted with the score for every frame
Fig 12.8. Report Card printed at the end of the Analysis
CHAPTER XIII
CONCLUSION
Bharatanatyam is considered a major art form in our country, and a large number of people consider pursuing it as a career. Given its importance, the examinations held every year to assess students in this field have to be highly organized. Based on recent incidents, we see a lack of quality in this evaluation process.
The current solution to this problem is to find better panel members, at a higher cost. Another proposed solution could be to use devices like the Kinect, which track skeleton movements as the student dances. Both of these solutions could potentially be unaffordable. Our project tries to provide a cheaper solution: the only cost incurred would be in buying the two cameras and obtaining the standardized videos for the various choreographies.
This prototype has been discussed with various dance students, who agreed that it would help improve dance evaluations and are ready to volunteer in the process of data set creation.
CHAPTER XIV
FUTURE ENHANCEMENTS
A few things can be done to improve the accuracy of the system further and strengthen the process
of evaluation.
 Firstly, training for object detection can be done on a larger scale, using a more powerful processor and a higher number of positive and negative images.
 Secondly, gesture detection for the hands could be made more accurate, so that the exact gesture is detected correctly.
 The final enhancement that could completely replace the existing evaluation system is the
detection of facial expressions. With the right amount of training and rule creation,
detection of facial expressions should be possible too.
Some extra features that can be added to the tool are:
 Comparing older performances of the same dancer.
 Giving more detailed suggestions about the scope of improvement.
 Training the tool to analyze more choreographies.
CHAPTER XV
REFERENCES
[1] Abhishek Kar, Amitabha Mukerjee and Prithwijit Guha, "Skeletal Tracking using Microsoft Kinect".
[2] Philip Kelly, Noel E. O'Connor, Tamy Boubekeur and Maher Ben Moussa, "Evaluating a Dancer's Performance using Kinect-based Skeleton Tracking".
[3] Dimitrios S. Alexiadis and Petros Daras, "Quaternionic signal processing techniques for automatic evaluation of dance performances from MoCap data", Informatics and Telematics Institute, Thessaloniki, Greece.
[4] Matthias Dantone, Juergen Gall, Christian Leistner and Luc Van Gool, "Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, No. 11, November 2014.
[5] Megha D Bengalur, "Human Activity Recognition Using Body Pose Features and Support Vector Machine", Department of Electronics and Communication Engineering, BVBCET, Hubli.
[6] Paul Viola and Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features".
[7] Iyengar Mukunda Kalpana, "Bharatanatyam and Mathematics: Teaching Geometry Through Dance", Journal of Fine and Studio Art.
  • 1. VISVESWARAYA TECHNOLOGICAL UNIVERSITY A PROJECT REPORT On “ An Automated Evaluator for Bharatanatyam (Nritta) ” Submitted in the partial fulfillment of the requirements for VIII Semester BACHELOR OF ENGINEERING IN COMPUTER SCIENCE & ENGINEERING Submitted By: RENU S HIREMATH SHREYA BHAT VIII Semester, B.E, CSE Under the guidance of: Prof. Srikanth H R Professor, Department of CSE PESIT, Bangalore. January 2016 – May 2016 Carried out at: Department of Computer Science PES INSTITUTE OF TECHNOLOGY (an autonomous institute under VTU) Department of Computer science & Engineering 100 Feet Ring Road , Banashankari III Stage, Bangalore-560 085.
  • 2. PES INSTITUTE OF TECHNOLOGY (An Autonomous Institute under VTU, Belgaum) 100 Feet Ring Road, BSK- III Stage, Bangalore – 560 085 DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Certificate Certified that the eighth semester project work under the topic “An Automated Evaluator for Bharatanatyam (Nritta)” is a bonafide work carried out by Renu S Hiremath 1PI12CS134 Shreya Bhat 1PI12CS163 in partial fulfillment for the award of Bachelor of Engineering in Computer Science and Engineering of Visvesvaraya Technological University, Belgaum during the academic semester January 2016 – May 2016. It is certified that all corrections/suggestions indicated for Internal Assessment have been incorporated in the report deposited in the departmental library. The project report has been approved as it satisfies the academic requirements in respect of Project work prescribed for the said Bachelor of Engineering. ______________________ ______________________ ______________________ Signature of the Guide Asst. Prof. Srikanth H R Signature of the HOD Prof. Nitin V. Pujari Signature of the Principal, Dr. K S Sridhar External Viva: Name of Examiners Signature with Date ________________________ _________________________ ________________________ _________________________
  • 3. ACKNOWLEDGEMENT The satisfaction that accompanies the successful completion of a task would be incomplete without the mention of the people who made it possible, and whose constant guidance and encouragement helped us in completing the seminar successfully. We consider it a privilege to express gratitude and respect to all those who guided us throughout the course of the completion of the project. We extend our sincere thanks to Asst. Prof. Srikanth H R, Professor, Computer Science and Engineering, PESIT for his invaluable suggestions, constant guidance, encouragement, support and invaluable advice without which this project would not have been completed. We express my gratitude to Prof. Nitin V. Pujari, Head of the Department, Computer Science and Engineering, PESIT, whose guidance and support has been invaluable. We would like to express our heartfelt thanks to Prof. M. R. Doreswamy, PESIT founder, Prof. D. Jawahar, PES University and Dr. K. S. Sridhar, Principal, PESIT, for providing us with a congenial environment for carrying out the project. Last, but not the least, we would like to thank our friends whose invaluable feedback helped us to improve the application by leaps and bounds, and our parents for their unending encouragement and support. Renu S Hiremath - 1PI12CS134 Shreya Bhat - 1PI12CS163
  • 4. ABSTRACT The goal of this project is to implement a system that automatically evaluates a dance performance against a gold standard performance and provides a feedback to the performer in terms of a graph and suggestions. The system acquires the motion of a performer via joint detection. Eight important joints of the body are detected using the technique of HAAR classification. The coordinates of these joints are then used to calculate the joint velocity based on which scores are provided. This project aims at automating the process of dance examinations held every year. The system requires two cameras that capture the front and side views of a dancer. These videos are fed to the system that breaks them into frames. The eight joints are detected in each frame and the position of these joints are used to find the joint velocity. A scoring algorithm is devised which grades a dancer by comparing the joint velocity of the dancer against a golden standard. Scoring will be done separately for Upper Body and Lower Body movements. A final score is provided that is a combination of upper body, lower body and hand gestures. Apart from scoring, the system also tries to identify the places the dancer can improve on. A set of suggestions are provided for the dancer along with a graph that makes it easier for the dancer to estimate the areas she has to work on. This system tries to eliminate the human effort required in evaluating a dancer every single year during the examinations. The only investment to be made is on the two cameras. This system doesn’t require expensive hardware like Kinect thereby being extremely cost efficient. Also, the fee paid to the evaluators can be eliminated. Another extremely helpful feature of this system is the feedback provided for the dancer after the performance. This system can also be used as a practice tool for dancers who are trying to improve their techniques.
  • 5. TABLE OF CONTENTS Sl. No. Content Pg. No. 1 INTRODUCTION 1.1. Introduction 1.2. Challenges and Opportunities 1.3. Problems Identified 1.4. Problems to which the prototype provides a solution 2 2 3 4 2 PROBLEM DEFINITION 6 3 LITERATURE SURVEY 3.1. Bharatanatyam 3.2. Studies and Exploration for the selected idea 3.3. Papers Referred 7 10 12 4 PROJECT REQUIREMENT DEFINITION 4.1. Project Perspective 4.2. Project Function 4.3. User Classes and Characteristics 4.4. Operating Environment 4.5. Design and Implementation Constraints 4.6. User Documentation 4.7. Assumptions and Dependencies 17 18 19 19 19 20 20 5 SOFTWARE REQUIREMENTS SPECIFICATION 5.1. External Interface Requirements 5.2. Hardware Requirements 5.3. Functional Requirements 5.4. Non-Functional Requirements 23 23 24 26 6 GANTT CHART 28
  • 6. 7 SYSTEM DESIGN 7.1. Block Diagram 7.2. Architecture 7.3. Use Case Diagram 7.4. Scenario Diagram 30 32 35 37 8 DETAILED DESIGN 8.1. Modules 8.2. Database Design 41 44 9 IMPLEMENTATION 9.1. Implementation Choices 9.2. Justification for Implementation Choices 9.3. Face Detection 9.4. Upper Body Detection 9.5. Lower Body Detection 9.6. Skin Detection 9.7. Scoring 9.8. Analysis and Visualization 48 48 50 52 54 56 57 58 10 INTEGRATION 61 11 TESTING 11.1. Unit Testing 11.2. Integration Testing 11.3. Functional Testing 63 65 65 12 SCREENSHOTS 67 13 CONCLUSION 72 14 FUTURE ENHANCEMENTS 74 15 REFERENCES 76
  • 7. LIST OF FIGURES Fig. No. Figure Name Pg. No. 3.1 Methodology used for Skeleton Detection 12 7.1 Block Diagram 30 7.2 High Level Architecture Diagram 32 7.3 Use Case Diagram 36 7.4 Scenario Diagram for the Whole System 38 7.5 Scenario Diagram for the Detection Modules 39 9.1 Face Detction for the Standing and Sitting Postures 50 9.2 Algorithm used for Face Detection 51 9.3 Upper Body Detction for the Standing and Sitting Postures 52 9.4 Algorithm used for Upper Body Detection 53 9.5 Lower Body Detction for the Standing and Sitting Postures 54 9.6 Algorithm used for Lower Body Detection 55 9.7 Algorithm used for Skin Detection 56 9.8 Algorithm used for Scoring 57 9.9 Graph plotting the score of each frame and the Report Card printed at the end of the analysis 58 9.10 Comparing the student’s frame with the corresponding frame of the choreographer 59 12.1 Introduction Screen for the Tool 67 12.2 Window to enter the details for the Analysis 67 12.3 Window to upload the video for Analysis 68 12.4 Screen displaying the progress of the Analysis 68 12.5 Screenshot of a frame given a high score 69 12.6 Screenshot of a frame given a low score 69 12.7 Screenshot of the graph plotted with the score for every frame 70 12.8 Report Card printed at the end of the Analysis 70
  • 9. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 1 CHAPTER I INTRODUCTION
  • 10. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 2 1.1. Introduction Bharatanatyam is one of the oldest dance forms of India. It was nurtured in the temples and courts of Southern India since ancient times. Today it is a very popular and widely performed dance style, practiced all over India. The beautiful postures and the rhythmic patterns of the feet add a new dimension to this art form. On the surface, three aspects of Bharatanatyam are evident, as with any dance form: movement, costume, and music. In other words, what the dancer is doing, how the dancer looks, and what are the accompanying sounds. There are two kinds of movements in Bharatanatyam – abstract and expressive. The abstract movements are done to show rhythm, to provide decoration, and to create beauty. This is the aspect of dance that we mainly focus on. Bharatanatyam is segregated into three parts, namely Nritta, Nrithya, and Natya. In this project, we concentrate on the Nritta aspect of Bharatanatyam that constitutes the abstract dance movements with rhythm, without expression of a theme or emotion. This is also called pure dance. Nritta gives major importance to the body movements, the postures and the hand gestures used to convey meaning in the dance. 1.2. Challenges and Opportunities Classical dance, specifically Bharatanatyam is given a lot of importance across the country. Every year, Bharatanatyam examinations are conducted by the central board in various states to assess a dancer on a number of parameters. Dancers who clear these examinations are provided with certifications at various levels which prove to be extremely important for those who want to pursue a career in Bharatanatyam. In spite of how important this is, we still see a lack of a proper organized system to carry out this entire process. There is an acute shortage of well qualified judges who appear as examiners. 
Also, if the fee for an examiner is not affordable, a compromise is made on the number of judges allotted to each student. Ideally there should be two or more judges for each student, but because of the shortage, most of the time, only one judge is seen at the exam center. There is no existing
  • 11. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 3 system that can automate this process of evaluating a dance performance. This presents a challenge to come up with a new system that automates the evaluation of a dance performance (Nritta) and provide scores at the end of it, similar to how a human judge does it. 1.3. Problems Identified Recently, an article highlighting the problems faced while conducting a dance exam was published in a paper: “The difference of opinion between different zones has put the validity of examinations for various levels of Bharatanatyam being held in Mangalore in a clutter. While the examinations in the Bangalore zone was carried out by May 15, the same for Mangalore has run into trouble with judges and examiners belonging to the state capital giving the tests in the coastal city a miss. The understanding was that Bangalore zone will send its teachers as judges in Mangalore and vice versa. It is alleged that Mangalore is always left high and dry. This injustice has been happening to the students in Mangalore zone for more than 10 years and we are tired of the crisis every year. The worst thing is that we have to bring in the local gurus to act as judges and some of them assign more marks to their own students and less marks to others.” From the above article, we note the following problem ● shortage of judges ● under qualified judges ● how scores are based entirely on the “opinion” of that one judge who is present at the exam center In short, there is no system that standardizes this entire process of evaluating a dancer.
  • 12. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 4 1.4. Problems to which the prototype provides a solution This project proposes to implement a system that automates the process of evaluating a dance performance during the examination. The dance performance is captured with the help of two cameras, one for the front view and the other for side view. These videos are fed into the system that segregates it into frames. A score is provided for each frame by comparing it to a video of a professional dancer. The entire performance is evaluated based on three categories – upper body movements, lower body movements and hand gestures. Final scores are provided to the dancer in the form of a marks card along with some visualization and suggestions for improvement.
  • 13. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 5 CHAPTER II PROBLEM DEFINITION
  • 14. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 6 The idea is to automatically evaluate a dancer using techniques of video and image processing. The input to the system is a video of the performance and a number to indicate the choreography being performed in the video. A few standard dance choreographies are used against which all other performances will be compared. Once the video of the student is uploaded, a set of frames are extracted from the video. The joints of the dancer are detected using the HAAR Classifiers which have trained for the corresponding body parts. The training is done for at least 1000 positive images and 1500 negative images for at least 10 stages to ensure accuracy. Skin detection is also used for the recognition of the feet and hands. Combining these two techniques, skeleton tracking is done. Using the technique of skeleton tracking, he coordinates of the face, neck, shoulders, hip, knees, hands and feet are got. These coordinates are then used to find the direction of movements of the joints and the corresponding joint velocities. In this way, the accuracy of all the movements is calculated. After skeleton tracking, scores are provided for the upper body, lower body and hand gestures. These scores are then put together and compared against the score of a standard dancer. The score for every frame of the student in comparison with the standard dancer are then displayed. This way this tool can also be used by a dancer for self-evaluation and see their improvement over a period of time.
  • 15. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 7 CHAPTER III LITERATURE SURVEY
  • 16. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 8 3.1. Bharatanatyam Bharatanatyam is a traditional Indian classical dance. It has many dimensions. They include body movements, facial expressions, hand gestures, footwork, costumes, music, repertoire, and themes of performances. Because Bharatanatyam is so well developed, all of these aspects of the art have been codified, and are documented in ancient scriptures as well as recent books. The movements of Bharatanatyam are unique. They share elements with other classical Indian dances, but aren’t found in any western dance style. They are often described as geometric, for there is much geometry in the basic postures and movements of which the dance is built, but this makes them sound static, which they aren’t. Bharatanatyam is dynamic and energetic; it is also precise and balanced. The basic postures center the weight of the dancer, and there is little use of the hips or off-balance positions. Bharatanatyam has a variety of characteristic movements. Along with the rhythmic stamping of the feet, there are jumps, pirouettes, and positions where the knees contact the floor. Many are executed in the stance with knees bent and turned outward. Performed by an expert dancer, these movements flow together gracefully. Every part of the body is involved in the dance, and their movements are defined and classified (in great number) in this system of dance. At the functional level, the dance has three aspects: ● Nritta: Abstract dance movements with rhythm, but without expression of a theme or emotion. Also called pure dance. ● Nritya: Interpretive dance, using facial expressions, hand gestures, and body movements to portray emotions and express themes. ● Natya: The dramatic aspect of a stage performance, including spoken dialogue and mime, to convey meaning and enact narrative. In this project, the focus is on the nritta aspect which is pure dance without the emotions. 
The basic unit of dance in Bharatanatyam is the adavu. Each adavu is a combination of steps or positions with coordinated movements of the feet, legs, hands, arms, torso, head,
  • 17. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 9 and eyes. Adavus give Bharatanatyam its distinctive look. For instance, many adavus are executed with the legs bent, knees outward, heels together and toes outward – a position called aramandi. The adavus, numbering around 120 in all, are divided into numerous groups and subgroups. The hand gestures of Bharatanatyam are called hastas. Sometimes, they are called mudras, or hasta mudras. There are one-handed and two-handed hastas, there are lots of them, and they all have names. When a hasta is employed in a specific context for a specific purpose, it gets a special name for that use. Dance Examination: Every year, thousands of students take up the junior and senior levels of examination. The dancer is judged based on two nritta aspects like hasta and anagashuddi. Under angashuddi, aramandi and the various body postures along with the body balance are observed. Each student is judged by two judges. Portions for the examination are fixed. The items that have to be learnt are specified in advance. Each of the judges are provided with marks sheet which have the following columns:  Postures  Hand gestures  Clarity of steps  Grace  Knowledge of beats and rhythm Based on the above qualities, a final score is given. Apart from this there is a 30 minutes test where the judge tries to score the student based on knowledge of various Bharatanatyam exercises and Hastas. A set of Hastas and postures are named that the dancer has to show. Here again, body balance and clarity are checked.
  • 18. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 10 3.2. Studies and Exploration for the selected idea 1. Kinect Kinect (codenamed in development as Project Natal) is a line of motion sensing input devices by Microsoft for Xbox 360 and Xbox One video game consoles and Windows PCs. Based around a webcam-style add-on peripheral, it enables users to control and interact with their console/computer without the need for a game controller, through a natural user interface using gestures and spoken commands. Kinect sensor is a horizontal bar connected to a small base with a motorized pivot and is designed to be positioned lengthwise above or below the video display. The device features an "RGB camera, depth sensor and multi- array microphone running proprietary software", which provide full-body 3D motion capture, facial recognition and voice recognition capabilities. 2. OpenCV OpenCV is released under a BSD license and hence it’s free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform. Usage ranges from interactive art, to mines inspection, stitching maps on the web or through advanced robotics. 3. HAAR Cascade Classifiers Cascading is a multistage ensemble learning process based on the concatenation of several classifiers, using all information collected from the output from a given classifier as additional information for the next classifier in the cascade. The Viola–Jones object
  • 19. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 11 detection framework is the first object detection framework to provide competitive object detection rates in real-time proposed in 2001 by Paul Violaand Michael Jones. This framework uses Haar-like features which are digital image features for the object recognition. A Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region and calculates the difference between these sums. This difference is then used to categorize subsections of an image. The key advantage of a Haar-like feature over most other features is its calculation speed and its high detection rate.
  • 20. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 12 3.3. Papers Referred 1. Evaluating a Dancer’s Performance using Kinect-based Skeleton Tracking Authors : Philip Kelly, Noel E. O’Connor, Tamy Boubekeur and Maher Ben Moussa ● In this, the dataset included recordings of Salsa dancers captured with a variety of equipment like the Microsoft Kinect sensors. ● The Kinect studio and the OpenNI module are used for the purpose of skeleton tracking. ● This module itself outputs the position of the joints for each frame. ● In order to score for a particular choreography, the aligned position and velocity vector signals of an amateur dancer are compared with the corresponding signals of a professional one. ● Dancer is evaluated based on Joint positions and joint velocities. 2. Skeletal Tracking using Microsoft Kinect Authors : Abhishek Kar, Dr. Amitabha Mukerjee & Dr. Prithwijit Guha, The methodology used in the paper is: Fig. 3.1 Methodology used for Skeleton Detection
  • 21. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 13 Foreground Segmentation: Thresholding on the depth image is used to extract the foreground from the image. Noise is removed using morphological operations of erosion and dilation. HAAR Cascade Detection: Viola-Jones face and upper body detectors are used for face and upper body positions of the subject. The detector is based on HAAR cascade classifiers. Each classifier uses rectangular HAAR features to classify the region of the image as a positive or negative match. The Stick Skeleton Model: The human skeleton model is represented by 7 body parts involving 8 points. The head and neck points as the centroid and midpoint of the base line of the face detection rectangle. The shoulder points are fixed halfway between the face detection and torso detection rectangle base lines with the shoulder 5 width set as twice the face width. Since the videos may not be temporally sequenced: ● For each sequence, all the frames before the detection of the dancer, i.e. before the time-instance when at least one joint is detected, were ignored. ● An ag value, NaN (Not A Number), was used to represent an undetected/tracked joint ● The shortest sequences with NaN values are padded, so that both sequences to be compared have the same temporal length. Changes from the already implemented approach: ● Microsoft Kinect setup costs a lot. The sensor alone costs $143 and the adaptor costs about $49. Considering Indian classical dance, we will need at least two sensor for skeleton tracking, there by doubling this cost.
  • 22. An Automated Evaluator for Bharatanatyam (Nritta) PESIT – CSE Jan ’16 – May ‘16 14 ● A normal video cannot be fed to Kinect studio. Hence it is mandatory to purchase the hardware which creates .xef files. ● Here OpenCV is used for skeleton tracking, video and image processing. In this case, a normal camera is enough for the purpose of taking videos of performances. Hence a considerable amount of money is saved. ● Here the concentration is on the Indian classical dance - Bharatanatyam. In the paper, Salsa is the form of dance chosen. In Salsa, hand gestures are not really considered for scoring. This extra feature has been included in our system, which performs hand gesture detection. Hence the final scores will include the element of hand gestures as well. ● Instead of scoring the entire body, the scoring is done separately for upper body and lower body, thereby let the amateur dancer know where there is room for improvement. 3. Bharatanatyam and Mathematics : Teaching Geometry through dance Authors : Iyengar Mukunda Kalpana Bharatanatyam is a highly codified and schematized Asian Indian style of classical dance that accommodates the different kinds of learners. This dance is culturally relevant to Asian Indian American students, but the findings are applicable to students from other demographics that are interested in learning math through dance. Many Asian Indian students learn Bharatanatyam for cultural maintenance and preservation. Dance is also a beneficial medium to teach basic geometric shapes to young children because dance is an engaging art curriculum that can be used in schools. This mixed methods study informed by categorical content analysis is designed to recommend a framework for exploring how Asian Indian students can learn basic geometric shapes through Bharatanatyam. 
This paper investigates the dance movements called adavus, cultural relevance, the integration of elements from dance and geometry, and the implementation of alternate strategies such as dance instruction to teach and learn basic geometric shapes. The data analysis revealed the benefits of dance and math integration.
4. Rapid Object Detection using a Boosted Cascade of Simple Features
Authors: Paul Viola and Michael Jones
This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly while achieving high detection rates. The work is distinguished by three key contributions. It introduces a new image representation called the “Integral Image”, which allows the features used by the detector to be computed very quickly. It then describes a learning algorithm based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The paper also discusses a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object-specific focus-of-attention mechanism which, unlike previous approaches, provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
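The integral-image idea at the heart of the Viola–Jones paper can be sketched in a few lines of Python (the function names are ours, for illustration only): each table entry stores the sum of all pixels above and to the left of it, so the sum over any rectangle afterwards needs only four table lookups, regardless of the rectangle's size.

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x], inclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]                      # running sum of this row
            ii[y][x] = row + (ii[y - 1][x] if y else 0)
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in [x0..x1] x [y0..y1] using at most four lookups."""
    total = ii[y1][x1]
    if x0:
        total -= ii[y1][x0 - 1]
    if y0:
        total -= ii[y0 - 1][x1]
    if x0 and y0:
        total += ii[y0 - 1][x0 - 1]
    return total
```

Since every Haar-like feature is a difference of such rectangle sums, this constant-time lookup is what makes the detector fast enough to run in real time.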
CHAPTER IV
PROJECT REQUIREMENT DEFINITION
4.1. Project Perspective
This project aims at automating the evaluation of Bharatanatyam dance performances which are performed as part of the dance examination. It is based on the following observations:
1. The set of choreographies performed during a dance examination is standard. Every student is judged on their hand gestures (hastas) and body postures (angashuddhi).
2. Using the technique of skeleton detection, the joint velocities and the direction of movement of the joints are calculated. This way the postures of the dancer are judged using simple equipment like the camera on a mobile phone instead of costly equipment like the Microsoft Kinect.
3. Since only the evaluation of the Nritta component of Bharatanatyam is considered, only the hand gestures and the body postures are taken into account.
There are very few systems that help in the analysis of dance performances, and most of them involve the Microsoft Kinect. Our tool would help replace these systems with cheaper and more easily available equipment like the camera on a mobile phone. The tool can later be extended to other dance forms by changing the HAAR XMLs used for skeleton detection.
4.2. Project Functions
The tool performs the following functions:
1. The input videos, captured using a camera with a resolution of 8 MP or more, are fed to the system, which extracts one frame from the video every 100 ms.
2. The extracted frames then go to the Face Detection module, where 4 HAAR classifier XMLs are used to detect the face of the dancer. The coordinates of the face and the neck are marked using the contour detected in this module, and these values are written into the database.
3. After face detection, the images are passed to the Upper Body Detection module, which gives the coordinates of the shoulders.
4. The Lower Body Detection module then marks the coordinates of the knees and hips of the dancer. These coordinates are also recorded in the database.
5. Skin detection is then used to mark the hands and feet.
6. The Scoring module compares the values of the student with those of the standard dancer and assigns an appropriate score for every frame. The average of these scores is displayed as the final score of the dancer.
7. The Analysis module helps compare the older performances of the dancer with the latest ones, helping the dancer improve their performance.
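The 100 ms sampling in step 1 amounts to keeping one frame per interval of video time. A minimal sketch of that rule (the function name and the use of frame indices are our illustration, not the tool's actual code):

```python
def sample_frame_indices(total_frames, fps, interval_ms=100):
    """Indices of the frames to keep: one frame per interval_ms of video."""
    step = max(1, round(fps * interval_ms / 1000.0))  # frames per interval
    return list(range(0, total_frames, step))
```

For a 30 fps recording this keeps every third frame, which matches one extracted frame per 100 ms.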
4.3. User Classes and Characteristics
The user would be any student of Bharatanatyam who is preparing for the dance examination. They can use this tool to judge the improvement in their performances. The organizers of the dance examination are also potential users: they can use the tool to conduct examinations without the presence of a judge, making the evaluations easier and fairer.
4.4. Operating Environment
The operating environment for this tool would be a room, preferably with plain walls as the background. The dancer needs to be close to the camera used to record the performance.
4.5. Design and Implementation Constraints
1. The dancer has to be within 5 feet of the camera used to record the performance. A camera with a better resolution can help with accurate detection of the joints.
2. The HAAR classifiers used to detect the joints may sometimes give inaccurate results. In order to minimize this inaccuracy, constraints for each of the joints being detected were added.
3. The skin detection algorithm may mark wrong contours because of skin color detection. The correct regions are found by detecting skin within the region of the upper body only.
4.6. User Documentation
1. The user has to record the dancer’s performance using a still camera, which could be a mobile camera.
2. This video, in mp4 format, has to be fed into the data analyzer along with an identifier to recognize which choreography is being performed in the video. This information can be entered using the appropriate input fields displayed in the GUI of the tool.
3. After analysis, a report card is displayed which shows the score allotted for every frame. It also gives some suggestions about the areas for improvement based on the scores allotted for the upper body and the lower body.
4. The user can also view their scores from previous performances and judge their improvement for the same choreography.
4.7. Assumptions and Dependencies
Assumptions:
 A plain background for the dancer while recording the performance.
 Good lighting in the room while recording the performance.
 The recording starts immediately with the music; hence the length of the recording for a particular choreography will always be the same.
 The camera is fixed in one position during the whole recording.
 The dancer remains within the frame of the camera during the whole performance.
Dependencies:
 The skeleton detection process depends on the resolution of the images supplied to it. Hence a camera with a resolution of 8 MP or more is preferred.
 The dancer is supposed to be within 5 feet of the camera while dancing, for clearer images.
 The tool is currently trained for 5 specific choreographies; hence only these can be tested for any student.
CHAPTER V
SOFTWARE REQUIREMENTS SPECIFICATIONS
5.1. External Interface Requirements
5.1.1. User Interfaces
 Interface to upload the input videos to the system.
 Interface to view the scores after the comparison with the choreographers’ videos.
 Interface to view the comparison of frames of the videos (to help improve performance).
5.1.2. Software Interfaces
 Technologies
o Matplotlib for visualization
o OpenCV
 Languages
o Python
5.2. Hardware Requirements
 Camera (8 MP and above)
 Laptop with a minimum of 4 GB RAM
5.3. Functional Requirements
5.3.1. Training of the System
 Description: Training the system with the “correct” videos, which will later be used as the reference to compute scores for the other videos.
 Input: 2 choreographer videos
 Output: SQLite database
5.3.2. Generation of Blocks of the Input Video
 Description: The input video is pre-processed and frames are generated. The appropriate frames are then grouped into blocks, which are used for scoring the different sections of the performance.
 Input: Input video
 Output: Blocks of frames
5.3.3. Identification of Joints
 Description: The frames in the blocks are analyzed to identify the skeleton of the dancer in the video. The coordinates of the joints are recorded.
 Input: Blocks of frames
 Output: SQLite database
5.3.4. Identification of Hand Gestures
 Description: The frames in the blocks are analyzed to identify the hand gestures in the video. The data is recorded.
 Input: Blocks of frames
 Output: SQLite database
5.3.5. Scoring of the Body Movements
 Description: The coordinates and the data related to the input video are compared with the values corresponding to the choreographers’ videos. A score is calculated based on the comparison.
 Input: SQLite database
 Output: Score out of 100
5.3.6. Visualizing the Results
 Description: The score calculated from the comparison is presented to the user. The scores for the hand gestures and the body movements are shown separately. If possible, the areas of improvement are also displayed for the user.
 Input: SQLite database
 Output: Graphs
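The report does not specify how the appropriate frames are grouped into blocks (requirement 5.3.2). Assuming fixed-size blocks of consecutive frames, the grouping step could be sketched as:

```python
def make_blocks(frames, block_size):
    """Group consecutive frames into fixed-size blocks.

    The last block may be shorter when the frame count is not a
    multiple of block_size. The block size itself is an assumption;
    the report does not state the grouping criterion.
    """
    return [frames[i:i + block_size] for i in range((len(frames) + 0) and 0, len(frames), block_size)] if False else \
           [frames[i:i + block_size] for i in range(0, len(frames), block_size)]
```

Each block then corresponds to one scored section of the performance.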
5.4. Non-Functional Requirements
5.4.1. Performance
 The system must be able to compute the scores within a few seconds.
 The system must be able to process videos as long as 5 minutes.
5.4.2. Reliability
 The system must have at least 60% accuracy, with high precision and recall.
 It should calculate appropriate scores.
5.4.3. Maintainability
 Training the system with more dance choreographies should be easy.
5.4.4. Usability
 The system should be easy and simple to use.
 The calculated results must be visualized in such a way that they are easy to understand.
CHAPTER VI
GANTT CHART
CHAPTER VII
SYSTEM DESIGN
7.1. Block Diagram
The system is divided into the following logical blocks.
Fig 7.1 Block Diagram
7.1.1. Camera
This component consists of a camera with a resolution of 8 MP or more and is used to record the performances of the dancer that are to be analyzed. The camera is assumed to start recording as soon as the music starts. The output (a video in mp4 format) is then fed to the data analyzer for further processing.
7.1.2. Data Analyzer
This component is responsible for the analysis of the video fed into the system. It converts the video into appropriate frames and runs the skeleton detection algorithm on these frames. The coordinates of the face, neck, shoulders, hips and knees are recognized, along with the posture of the dancer. This data is then compared with the data of the standard dancer. The scoring module then allots a score for every frame and prints the report card, with suggestions, along with a graph.
7.1.3. Database
The system uses an SQLite database to store the coordinates of the detected joints. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process; it reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers and views is contained in a single disk file. These features make SQLite a popular choice as an application file format.
7.2. Architecture
Fig 7.2. High Level Architecture Diagram
This section describes the overall hardware and software architecture of the system. As shown in the figure, the system is primarily composed of three components: the Camera, the Data Processing Module and the Visualization Module. The video from the camera is uploaded to the system acting as the data analyzer. The other modules of the system, though discrete, communicate with each other through function calls. A brief description of the architecture follows:
7.2.1. Camera
A camera with a resolution of 8 MP or more is used to record the performance of the dancer. This video is then fed into the data analyzer using a USB cable or via Bluetooth.
7.2.2. Controller
This module verifies that the video file fed to the system is a valid mp4 file. It also takes as input the details of the user and the video type, which help the system perform the analysis and store its details correctly.
7.2.3. Object Recognition Module
This module is responsible for the joint recognition for the dancer. It detects the face, neck, shoulders, hips and knees of the dancer in all the frames of the video. It also performs skin detection in order to recognize the posture of the dancer in each frame.
7.2.4. Scoring and Analysis Module
This module is the most important part of the system. It compares the joint coordinates of the dancer with those of the choreographer and assigns an appropriate score for every frame under analysis.
7.2.5. Visualization Module
This module displays a graph for the user, plotting the score obtained for every frame of the video. On clicking a point on the graph, the user can see the corresponding frames of their video and the choreographer’s video. This helps them compare and improve their performance. The module also outputs a report card which displays the average scores for the upper body, the lower body, the hand gestures and the overall performance. This way the user can analyze their scope for improvement.
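The report does not state how the Controller validates the mp4 file. One common lightweight check, shown here purely as an assumption, is to look for the 'ftyp' box that ISO-base-media (mp4) files carry at bytes 4–7 of the file header:

```python
def looks_like_mp4(header: bytes) -> bool:
    """Heuristic mp4 check: ISO base media files start with a box whose
    type field (bytes 4-7) spells b'ftyp'. This is a sketch of one
    possible validation, not the tool's actual Controller logic."""
    return len(header) >= 8 and header[4:8] == b"ftyp"
```

A stricter validator would also attempt to decode the file, but a header check catches files that are mislabeled outright.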
7.2.6. Database
An SQLite database is used to store the details of the session and the details of the joints recognized in every frame of the video. These values are used for scoring. The values from the database are also used by the visualization module to provide the user with a graphical representation of the scores and a comparison of their performance with the choreographer’s performance.
7.3. Use Case Diagram
The use case diagram indicates the various functionalities the actor (the dancer) can obtain from the system:
 The input video will be converted into frames, which will later help visualize the scoring.
 The system will detect the face of the dancer in each of the frames extracted from the video.
 The system will detect the upper body of the dancer, which includes the shoulders.
 The system will detect the lower body of the dancer, which consists of the hips and the legs.
 The system will perform skin detection on the dancer in order to detect the posture in the frame.
 The system displays the average score of the dancer for the performance.
Fig 7.3. Use Case Diagram
7.4. Scenario Diagram
 The system is initially trained with the standard dancer’s videos for the 5 choreographies. All the coordinates are stored in the DB.
 The user has to feed the system a video with the extension mp4, which is then converted into frames; one frame is extracted for every 100 ms of the video. The details of each frame are stored in the DB with an EvalId which is used to identify the analysis.
 The system then performs face detection on the frames created in the previous step. The coordinates of the face and neck found here are written into the DB.
 After detecting the face, the system proceeds to detect the upper body. It selects a contour such that the face lies within it, and this contour is used to detect the posture of the dancer. These details are also written into the DB.
 The system then continues with the lower body detection, which determines whether the dancer is sitting or standing. The details of the lower body are then stored in the DB.
 The coordinates of the joints of the dancer are then compared with those of the standard dancer. The system considers the joint velocity and the direction of movement while scoring. Appropriate scores are then allotted based on the percentage of inaccuracy.
 A report card and a graph are displayed at the end of the analysis, which help the user identify the areas for improvement.
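The joint velocity and direction of movement used in the comparison step can be derived from the coordinates of a joint in two consecutive sampled frames (100 ms apart). A sketch, with hypothetical names:

```python
import math

def joint_motion(p_prev, p_curr, dt=0.1):
    """Velocity magnitude (pixels per second) and movement direction
    (radians, via atan2) of a joint between two consecutive frames
    sampled dt seconds apart. Names and units are illustrative."""
    dx = p_curr[0] - p_prev[0]
    dy = p_curr[1] - p_prev[1]
    speed = math.hypot(dx, dy) / dt   # displacement over elapsed time
    direction = math.atan2(dy, dx)    # angle of the movement vector
    return speed, direction
```

The percentage of inaccuracy can then be taken as the relative difference between the student's and the standard dancer's values.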
Fig 7.4. Scenario Diagram for the Whole System
Fig. 7.5. Scenario Diagram for the Detection Modules
CHAPTER VIII
DETAILED DESIGN
8.1. Modules
This section contains a detailed breakdown of all the modules used in the system.
8.1.1. Extraction of Frames
A video is fed to the system. The Frame Extraction module grabs the next frame from the open camera or the supplied video file. If a frame could be grabbed, the value TRUE is returned and the frame is saved in a separate folder created for this purpose. If the end of the video has been reached, the value FALSE is returned and the process of extracting frames ends.
8.1.2. HAAR Training
This module uses a machine learning algorithm to detect features and objects. The algorithm has four stages:
1. HAAR Feature Selection
The opencv_createsamples utility is used for preparing a training dataset of positive and negative samples. The positive images are first cropped to capture only the required feature.
2. Creating an Integral Image
The integral image representation evaluates rectangular features in constant time, which gives them a considerable speed advantage over more sophisticated alternative features. Because each feature’s rectangular area is always adjacent to at least one other rectangle, any two-rectangle feature can be computed in six array references, any three-rectangle feature in eight, and any four-rectangle feature in nine. The integral image at location (x, y) is the sum of the pixels above and to the left of (x, y), inclusive. The module finally creates a .vec file with the details of the image and the position of the object to be recognized.
3. AdaBoost Training
The opencv_traincascade or opencv_haartraining utility is used to train the cascade classifier, which ultimately produces an XML file for every stage. This module uses 1000+ positive images and 1500+ negative images for training.
4. Cascading Classifiers
The haarconv file is used to combine the XMLs created for every stage into one single XML file, which is finally used for object detection.
8.1.3. Face Detection
This module uses the face.xml generated during HAAR training. For each image, it searches for a human face and, if found, marks the face with a rectangle. The image with the marked rectangle is stored in a separate folder.
8.1.4. Upper Body and Lower Body Detection
These two modules detect the upper body and the lower body in a given frame. After detection, they mark the area with a rectangle and store the frame in the respective folders for the upper body and the lower body.
8.1.5. Skin Detection
Skin detection can be used for face detection and hand gesture detection. Since HAAR training is already used for face detection, skin detection is used here for hand gestures. Once the hand is detected, the various gestures are identified by the angle between the fingers. Skin detection uses a combination of colors to detect various shades of skin. A number
of algorithms exist for this purpose; we use a simple RGB combination to do the task.
8.1.6. Scoring Module
This module performs the final calculation of scores for the performance. The module is divided into three parts: the upper body score, the lower body score and the hand gesture score. The upper body and lower body scores are computed from the joint velocity and the direction of movement of the joints. Various levels of scores are awarded based on the percentage of deviation from the standard video. The hand gestures are broadly divided into 5 categories: if the hand gesture falls in the same category as that of the standard video, marks are awarded; otherwise a score of 0 is given. The final score is the average of the above three scores. A report card is provided at the end, along with a graph and a list of suggestions for improvement.
8.2. Database Design
The tool uses an SQLite instance as the database. The database stores the coordinates of the detected joints for long-term storage and for data visualization once the analysis is complete. The following are the details of the tables in the Dance_Eval database:
1. User List
(eval_id, user_name, video_type)
 eval_id: A unique integer value to help keep track of every analysis session.
 user_name: User name of the dancer (student).
 video_type: An integer identifying the choreography performed in the video.
2. Image List
(eval_id, image_name, score)
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 score: Score allotted for the frame.
3. Joint Details of Face
(eval_id, image_name, facex, facey, neckx, necky)
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 facex and facey: Coordinates of the joint marked as the face.
 neckx and necky: Coordinates of the joint marked as the neck.
4. Joint Details of Upper Body
(eval_id, image_name, shoulder_rightx, shoulder_righty, shoulder_leftx, shoulder_lefty)
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 shoulder_rightx and shoulder_righty: Coordinates of the joint marked as the right shoulder.
 shoulder_leftx and shoulder_lefty: Coordinates of the joint marked as the left shoulder.
5. Joint Details of Lower Body
(eval_id, image_name, hip_rightx, hip_righty, hip_leftx, hip_lefty, knee_rightx, knee_righty, knee_leftx, knee_lefty)
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 hip_rightx, hip_righty, hip_leftx and hip_lefty: Coordinates marked as the corresponding joints of the hips.
 knee_rightx, knee_righty, knee_leftx and knee_lefty: Coordinates marked as the corresponding joints of the knees.
6. Upper Body Posture Details
(eval_id, image_name, posture_name)
 eval_id: The unique integer to keep track of the session.
 image_name: Name of the frame extracted from the video.
 posture_name: Category of the posture detected for the dancer in the frame.
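Assuming the tables above map directly onto SQL tables (the table names and column types here are illustrative; the report lists only the columns), the schema can be created with Python's built-in sqlite3 module, which reflects the in-process, zero-configuration usage described in section 7.1.3:

```python
import sqlite3

# In-memory instance for illustration; the tool writes to a Dance_Eval file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_list   (eval_id INTEGER, user_name TEXT, video_type INTEGER);
CREATE TABLE image_list  (eval_id INTEGER, image_name TEXT, score REAL);
CREATE TABLE face_joints (eval_id INTEGER, image_name TEXT,
                          facex INTEGER, facey INTEGER,
                          neckx INTEGER, necky INTEGER);
""")
conn.execute("INSERT INTO user_list VALUES (1, 'student1', 3)")
row = conn.execute(
    "SELECT user_name FROM user_list WHERE eval_id = 1").fetchone()
```

Switching `":memory:"` to a file path gives the single-file database that the report describes.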
CHAPTER IX
IMPLEMENTATION
9.1. Implementation Choices
 Python programming language
 Python OpenCV 3.0.0 API for image processing
 HAAR cascade classifier for training the image processing algorithm
 Matplotlib API for visualization
9.2. Justification for Implementation Choices
9.2.1. Programming Language – Python vs C++
 Python programs are generally expected to run slower than C++ programs, but they also take much less time to develop. Python programs are typically 3-5 times shorter than equivalent C++ programs. This difference can be attributed to Python’s built-in high-level data types and its dynamic typing. For example, a Python programmer wastes no time declaring the types of arguments or variables, and Python’s powerful polymorphic list and dictionary types, for which rich syntactic support is built straight into the language, find a use in almost every Python program. Because of the run-time typing, Python’s runtime must work harder than C++’s. Also, Python has better and easier support for the appropriate image processing libraries. Hence, Python was chosen.
9.2.2. Computer Vision Library – OpenCV vs MATLAB
 OpenCV was chosen because it is fast and easy to set up. It is a library of programming functions aimed mainly at real-time computer vision and image processing. It supports many languages, runs on various platforms, and is open source.
 MATLAB is proprietary software and is not a dedicated image and video processing library. It was not chosen because it is expensive and not really suited to the requirements.
9.2.3. Object Recognition Algorithm – HAAR Cascade Classifier vs Others
The characteristics of the HAAR cascade classifier (the Viola–Jones algorithm) which make it a good detection algorithm are:
 Robust – very high detection rate (true-positive rate) and very low false-positive rate.
 Real time – for practical applications, at least 2 frames per second must be processed.
 Extremely fast feature computation.
 Efficient feature selection.
 Scale- and location-invariant detector: instead of scaling the image itself (e.g. pyramid filters), it scales the features.
 Such a generic detection scheme can be trained for the detection of other types of objects (e.g. cars, hands).
9.3. Face Detection
The face detection algorithm is implemented using the HAAR XMLs provided by the OpenCV module. The following XMLs are used for face detection:
 HAARcascade_frontalface_alt.xml
 HAARcascade_frontalface_alt2.xml
 HAARcascade_frontalface_default.xml
 HAARcascade_profileface.xml
For every frame, the above XMLs are tried in the same order until a face has been detected. The smallest contour among the contours detected is selected and marked as the face. The neck joint is marked below the face joint. If no face is detected in a frame, the coordinates of the face in the previous frame are used for this frame too.
Fig 9.1. Face Detection for the Standing and Sitting Postures
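The selection-and-fallback rule described above (the smallest detected rectangle wins; the previous frame's face is reused when detection fails) can be sketched independently of OpenCV, with detections as (x, y, w, h) tuples (function name ours):

```python
def pick_face(detections, previous):
    """Choose the smallest detected rectangle (x, y, w, h) as the face;
    fall back to the previous frame's face when nothing is detected."""
    if not detections:
        return previous               # reuse last known face position
    return min(detections, key=lambda r: r[2] * r[3])  # smallest area
```

In the actual tool, `detections` would be the combined output of the four cascade XMLs for the current frame.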
Fig 9.2. Algorithm used for Face Detection
9.4. Upper Body Detection
The upper body detection algorithm is implemented using the XML that we trained. The upper body includes the face and the shoulders of the dancer. A total of 1000 positive images and 1500 negative images were used to train the HAAR XML for the upper body. The algorithm is run for every frame: the largest contour containing the face is detected and marked as the upper body. The shoulders are marked on the base line of the detected contour; the right shoulder is 1/5th of the width from the bottom-right corner and the left shoulder is 4/5ths of the width from the same corner.
Fig 9.3. Upper Body Detection for the Standing and Sitting Postures
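The shoulder-placement rule can be written as a small helper (names are ours) that takes the bounding rectangle (x, y, w, h) of the upper-body contour:

```python
def shoulder_coords(x, y, w, h):
    """Shoulders on the base line of the upper-body contour:
    the right shoulder is 1/5 of the width in from the bottom-right
    corner, the left shoulder 4/5 of the width in from the same corner."""
    base = y + h                          # y of the contour's base line
    right = (x + w - w // 5, base)
    left = (x + w - 4 * w // 5, base)
    return right, left
```

Integer division is used here since the coordinates are pixel positions; the report does not state how fractional positions are rounded.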
Fig 9.4. Algorithm for Upper Body Detection
9.5. Lower Body Detection
The lower body detection algorithm is implemented in a similar way to upper body detection. The XML was trained to include the hips and knees of the dancer. The dancer’s sitting or standing posture is also detected, based on the coordinates of the dancer’s face, and the joints are then marked appropriately.
 If the face is marked within the top 35% of the height of the image, the dancer is assumed to be standing. The knees are marked at half the height of the selected contour and 1/3rd of the width inside the contour for the corresponding knee.
 If the face is marked between 35% and 55% of the height of the image, the knees are marked at half the height of the selected contour and 1/5th of the width inside the contour for the corresponding knee.
 If the face is marked beyond 55% of the height, the dancer is assumed to be sitting and the knees are marked at the same height as before but on the edges of the selected contour.
Fig 9.5. Lower Body Detection for the Standing and Sitting Postures
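The three cases above can be collapsed into one helper (names ours) that takes the face's y coordinate, the image height, and the bounding rectangle (cx, cy, cw, ch) of the lower-body contour:

```python
def knee_positions(face_y, img_h, cx, cy, cw, ch):
    """Knee placement from the face height within the image:
    face in the top 35%  -> standing, knees 1/3 of the width inside;
    face between 35-55%  -> knees 1/5 of the width inside;
    face below 55%       -> sitting, knees on the contour edges.
    Knees are marked at half the height of the contour."""
    ky = cy + ch // 2                  # half the contour height
    ratio = face_y / img_h
    if ratio < 0.35:
        inset = cw // 3
    elif ratio < 0.55:
        inset = cw // 5
    else:
        inset = 0                      # sitting: knees on the edges
    return (cx + inset, ky), (cx + cw - inset, ky)
```

The 35% and 55% cut-offs are the ones stated in the report; the tuple return order (left knee, right knee) is our convention.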
Fig 9.6. Algorithm used for Lower Body Detection
9.6. Skin Detection
The skin detection algorithm is used to detect the posture of the dancer. We have broadly classified the postures into 5 categories based on the contour defects detected.
Fig 9.7. Algorithm used for Skin Detection and Posture Detection
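The report states only that a "simple RGB combination" is used, without listing thresholds. One widely cited per-pixel RGB rule for skin under uniform daylight (Peer et al.) is shown here as an illustrative stand-in, not as the tool's exact rule:

```python
def is_skin_rgb(r, g, b):
    """Illustrative RGB skin rule (Peer et al., uniform daylight):
    the report's own thresholds are not listed, so these values
    are an assumption."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15
            and r > g and r > b)
```

Applying such a rule to every pixel yields a binary skin mask whose contours (and convexity defects) drive the posture categorization.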
9.7. Scoring
The scoring algorithm considers the direction of movement of the joints, the joint velocity and the detected posture. The average scores of the upper body joints, the lower body joints and the postures are displayed in the final report. We consider inaccuracy thresholds of 10%, 15% and 20% and allot 10, 5 or 3 points respectively; beyond 20% inaccuracy, the score allotted is zero. For the hand postures, 10 points are awarded if the posture is correct; otherwise it is marked as zero.
Fig 9.8. Algorithm used for Scoring
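The threshold scheme above can be expressed as two small functions (names ours; the report gives the thresholds and points but not the code):

```python
def joint_score(deviation):
    """Points from the percentage deviation vs the standard video:
    <=10% -> 10, <=15% -> 5, <=20% -> 3, beyond 20% -> 0."""
    if deviation <= 10:
        return 10
    if deviation <= 15:
        return 5
    if deviation <= 20:
        return 3
    return 0

def frame_score(upper_dev, lower_dev, gesture_match):
    """Average of the upper-body, lower-body and hand-gesture scores;
    a gesture in the correct category earns 10 points, else 0."""
    gesture = 10 if gesture_match else 0
    return (joint_score(upper_dev) + joint_score(lower_dev) + gesture) / 3.0
```

The final score reported to the user is the average of these per-frame scores over the whole performance.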
9.8. Analysis and Visualization
The tool displays a graph at the end of the analysis, plotting the score allotted for each frame. The graph also helps the user compare their performance with that of the choreographer: clicking on any point plotted on the graph displays windows with the corresponding frames of the user and the choreographer. The tool also displays a report card with the average scores for the overall performance, the upper body joints, the lower body joints and the hand gestures. This way the user is able to analyze where they need to improve in order to increase their score; the report card gives the user suggestions based on this analysis.
Fig 9.9. Graph plotting the score of each frame and the Report Card printed at the end of the analysis
Fig 9.10. Comparing the student’s frame with the corresponding frame of the choreographer
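The click-to-compare behaviour described in Section 9.8 can be prototyped with a matplotlib pick event. This is a sketch: the frame directory names, the file-naming scheme and the sample scores are assumptions, not the tool’s actual layout.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

def frame_pair(idx, student_dir="student_frames", choreo_dir="choreo_frames"):
    """Map a picked point index to the two frame images to display
    (hypothetical directory and file naming)."""
    return (f"{student_dir}/frame_{idx:04d}.jpg",
            f"{choreo_dir}/frame_{idx:04d}.jpg")

scores = [8.6, 9.1, 4.2, 7.8, 10.0]                  # sample per-frame scores
fig, ax = plt.subplots()
ax.plot(range(len(scores)), scores, "o-", picker=5)  # 5-point pick radius
ax.set_xlabel("Frame number")
ax.set_ylabel("Score (out of 10)")

def on_pick(event):
    # In the tool this would open the student's and the choreographer's
    # frames side by side; here we only report which files would be shown.
    idx = int(event.ind[0])
    student_img, choreo_img = frame_pair(idx)
    print("show:", student_img, "vs", choreo_img)

fig.canvas.mpl_connect("pick_event", on_pick)
```

Clicking a point fires `on_pick` with the index of the nearest plotted frame, which is exactly the lookup the tool needs to fetch the matching pair of frames.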
CHAPTER X
INTEGRATION
Integration is the process of bringing together the component subsystems into one system and ensuring that the subsystems function together as a whole. In information technology, systems integration is the process of linking together different computing systems and software applications, physically or functionally, to act as a coordinated whole. A combination of top-down and bottom-up integration — the middle-out strategy — was followed in this project: as and when components were developed, they were integrated with the rest of the system and tested. In other words, an incremental software development approach was followed.
CHAPTER XI
TESTING
11.1. Unit Testing

11.1.1. Database Interaction

| Test Case | Expected Result | Actual Result | Conclusion |
|---|---|---|---|
| Frames with marked coordinates | Insertion of a row for each frame | Rows inserted for each frame | PASS |
| Query for frames of a particular dancer | All frames related to a person | All frames related to a person | PASS |
| Query for frames | Frames obtained in order of time | Frames obtained in random order | FAIL |

11.1.2. Generation of Frames

| Test Case | Expected Result | Actual Result | Conclusion |
|---|---|---|---|
| Video input | Generation of 1 frame every 100 ms | Frames generated every 100 ms | PASS |
| Set of all frames | All frames related to a person | All frames related to a person | PASS |

11.1.3. Training Algorithm

| Test Case | Expected Result | Actual Result | Conclusion |
|---|---|---|---|
| Image of dancer (face detection) | Face marked | Face marked | 80% accurate |
| Image of dancer (upper body detection) | Upper body marked | Upper body marked | 60% accurate |
| Image of dancer (lower body detection) | Lower body marked | Lower body marked | 60% accurate |
11.1.4. Scoring Algorithm

| Test Case | Expected Result | Actual Result | Conclusion |
|---|---|---|---|
| Compare dancer with herself | Score = 10/10 | Score = 8.6/10 | 85% accurate |
| Evaluate good dancer against professional | Score > 5 | Score = 6 | 70% accurate |
| Evaluate below-average dancer against professional | Score <= 5 | Score = 2.7 | 60% accurate |

11.1.5. Synchronization

| Test Case | Expected Result | Actual Result | Conclusion |
|---|---|---|---|
| Input of two videos | Second-to-second synchronization | Mismatch | FAIL |
| Input of two edited videos | Second-to-second synchronization | Perfect match | PASS |
11.2. Integration Testing

| Test Case | Expected Result | Actual Result | Conclusion |
|---|---|---|---|
| Video fed to system | Creation of frames | Frames created every 100 ms | PASS |
| Professional dancer performs | Coordinates of joints fed to DB | Coordinates of joints fed to DB | PASS |
| Amateur dancer performs | Coordinates of joints fed to DB | Coordinates of joints fed to DB | PASS |
| Amateur dancer performs | Marks card generated | Marks card generated | PASS |

11.3. System Testing

| Test Case | Expected Result | Actual Result | Conclusion |
|---|---|---|---|
| Video fed to system | Generation of marks card | Final marks card generated | PASS |
| Dancer video captured | Comparison with professional and scoring | Comparison done with scoring | PASS |
CHAPTER XII
SCREENSHOTS
Fig 12.1. Introduction Screen for the Tool
Fig 12.2. Window to enter the details for the Analysis
Fig 12.3. Window to upload the video for Analysis
Fig 12.4. Screen displaying the progress of the Analysis
Fig 12.5. Screenshot of a frame given a high score
Fig 12.6. Screenshot of a frame given a low score
Fig 12.7. Screenshot of the graph plotted with the score for every frame
Fig 12.8. Report Card printed at the end of the Analysis
CHAPTER XIII
CONCLUSION
Bharatanatyam is considered a major art form in our country, and a large number of people consider pursuing it as a career. Given its importance, the examinations held every year to assess students in this field have to be highly organized. Recent incidents, however, point to a lack of quality in this evaluation process. The current solution is to find better panel members, at a higher cost. Another proposed solution is to use devices such as the Kinect, which tracks skeleton movements as the student dances. But both of these solutions could be unaffordable. Our project tries to provide a cheaper alternative: the only cost incurred would be in buying the two cameras and obtaining the standardized videos for the various choreographies. This prototype has been discussed with various dance students; they agree that it would help improve dance evaluations and are ready to volunteer in the process of data set creation.
CHAPTER XIV
FUTURE ENHANCEMENTS
A few things can be done to further improve the accuracy of the system and strengthen the evaluation process:

• Firstly, training for object detection can be done on a larger scale, using a more powerful processor and a higher number of positive and negative images.
• Secondly, gesture detection for the hands can be made accurate enough to identify the exact gesture correctly.
• The final enhancement, which could completely replace the existing evaluation system, is the detection of facial expressions. With the right amount of training and rule creation, this should be possible too.

Some extra features that can be added to the tool are:

• Comparing older performances of the same dancer.
• Giving more detailed suggestions about the scope for improvement.
• Training the tool to analyze more choreographies.
CHAPTER XV
REFERENCES
[1] Abhishek Kar, Amitabha Mukerjee and Prithwijit Guha, “Skeletal Tracking using Microsoft Kinect”
[2] Dimitrios Alexiadis, Philip Kelly, Petros Daras, Noel E. O’Connor, Tamy Boubekeur and Maher Ben Moussa, “Evaluating a Dancer’s Performance using Kinect-based Skeleton Tracking”, Informatics and Telematics Institute, Thessaloniki, Greece
[3] Dimitrios S. Alexiadis and Petros Daras, “Quaternionic signal processing techniques for automatic evaluation of dance performances from MoCap data”
[4] Matthias Dantone, Juergen Gall, Christian Leistner and Luc Van Gool, “Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, No. 11, November 2014
[5] Megha D Bengalur, “Human Activity Recognition Using Body Pose Features and Support Vector Machine”, Department of Electronics and Communication Engineering, BVBCET, Hubli
[6] Paul Viola and Michael Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features”
[7] Iyengar Mukunda Kalpana, “Bharatanatyam and Mathematics: Teaching Geometry Through Dance”, Journal of Fine and Studio Art