Methods
 Data from over 100 different users was collected
on 12 pre-defined gestures: Tap, Two-finger tap,
Swipe, Grab, Release, Pinch, Wipe, Checkmark,
Figure 8, Lowercase e, Capital E, Capital F
 The total dataset is 1.5 GB, containing ~9,600
gesture instances
 Preprocessing gestures into hand-selected
features, such as maximum displacement, and
feeding them into a three-layer neural network
returned mixed results
 Needed a way to turn messy temporal data with
variable timespans for the same gesture into a
fixed representation with constant-sized input
 To do this, created motion images of each type of
gesture by mapping 3D locations to pixels,
projecting onto XY, YZ, and XZ planes
 This image representation of gestures gives
learning models considerable flexibility
 Well-defined methods exist for image
classification
 Temporal history can be encoded in the images
by decaying older parts of the path
 Can easily augment the dataset through skews,
reflections, and transformations of the captured
images instead of modifying the underlying data
 Data augmentation leads to reduced overfitting
in the learning models
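The motion-image construction and temporal decay described above can be sketched as follows. This is an illustrative sketch, not the project's code; the 32x32 grid size and 0.95 decay factor are assumed parameters.

```python
import numpy as np

def motion_images(path, size=32, decay=0.95):
    """path: (T, 3) sequence of 3D positions; returns three (size, size)
    images for the XY, YZ, and XZ projections, older samples drawn dimmer."""
    path = np.asarray(path, dtype=float)
    # Normalize each axis into [0, size-1] so gestures of any spatial extent
    # or duration map onto the same fixed-size grid (constant-sized input).
    mins, maxs = path.min(axis=0), path.max(axis=0)
    span = np.where(maxs - mins > 0, maxs - mins, 1.0)
    pix = ((path - mins) / span * (size - 1)).astype(int)
    planes = [(0, 1), (1, 2), (0, 2)]  # axis pairs: XY, YZ, XZ
    images = np.zeros((3, size, size))
    T = len(path)
    for t, p in enumerate(pix):
        weight = decay ** (T - 1 - t)  # newest sample brightest, older fade
        for k, (a, b) in enumerate(planes):
            images[k, p[b], p[a]] = max(images[k, p[b], p[a]], weight)
    return images
```

Because the gesture is now just an image, the dataset can be augmented without touching the raw data; for example, `np.flip(images, axis=2)` produces a horizontal reflection of all three planes.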
 Deep belief nets: use a Restricted Boltzmann
Machine trained with contrastive divergence to
extract features from the data without needing
class labels
 Stack RBMs by using the output of a hidden
layer as the visible input to the next RBM
 Add a softmax output layer as the classifier and
use backpropagation to fine-tune the model
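One contrastive-divergence (CD-1) update for a binary RBM can be sketched as below. This is an illustrative NumPy sketch assuming binary visible and hidden units, not the implementation used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """v0: (n_vis,) binary visible vector; W: (n_vis, n_hid); b, c: biases.
    No class labels are needed: the update only uses the data vector v0."""
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back down to a reconstruction.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate log-likelihood gradient: data statistics minus
    # reconstruction statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return W, b, c
```

Stacking then means feeding the trained hidden activations `sigmoid(v @ W + c)` in as the visible data for the next RBM, before the final softmax layer and backpropagation pass.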
 Convolutional Neural Networks: widely used
for the task of handwritten digit classification
and object recognition
 Also combines feature extraction with
classification, but greatly reduces the number of
parameters by sharing weights across every
location in a feature map
 Can tolerate translations and skew in the input
through overlapping receptive fields and pooling
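The parameter savings from weight sharing, and the translation tolerance from pooling, can be made concrete with a toy forward pass. The sizes here (32x32 input, one 5x5 feature map) are hypothetical, not the network trained in this work.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Naive 'valid' 2D convolution (cross-correlation, as in most CNNs)
    with a single kernel shared across all image locations."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

def maxpool2x2(fmap):
    """2x2 max pooling: the output is unchanged by small translations."""
    H, W = fmap.shape
    return fmap[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

image = np.zeros((32, 32))
kernel = np.ones((5, 5)) / 25.0
fmap = conv2d_valid(image, kernel)   # (28, 28) outputs from one shared kernel
pooled = maxpool2x2(fmap)            # (14, 14) after pooling
shared_params = kernel.size + 1      # 26 parameters for the whole feature map
dense_params = 32 * 32 * 28 * 28     # weights a fully connected layer would need
```

The same 26 shared parameters produce all 784 feature-map outputs, where a dense layer mapping the same input to the same output size would need over 800,000 weights.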
Figure 2. The data captured from the Leap Motion device
References
 G.E. Hinton and R.R. Salakhutdinov, "Reducing the
Dimensionality of Data with Neural Networks", Science,
vol. 313, no. 5786, pp. 504-507, July 2006.
 G.E. Hinton, S. Osindero, and Y. Teh, "A Fast Learning
Algorithm for Deep Belief Nets", Neural Computation,
vol. 18, 2006.
 Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-
Based Learning Applied to Document Recognition",
Proceedings of the IEEE, 86(11):2278-2324, November 1998.
 A. Graves, Supervised Sequence Labelling with Recurrent
Neural Networks, vol. 385, Springer, 2012.
 S. Hochreiter and J. Schmidhuber, "Long Short-Term
Memory", Neural Computation, 9(8):1735-1780, 1997.
 Leap Motion. https://www.leapmotion.com/.
Accessed March 19, 2014.
 Y. LeCun, "LeNet-5, Convolutional Neural Networks".
Accessed February 24, 2015.
Introduction
 The Leap Motion controller uses IR cameras to
capture the position and orientation of a hand
 Allows the user's hands to be present onscreen,
performing gestures and actions as a way to
interact with a computer without a mouse
 By itself, the model can only say what is
happening now, not what has been done over
time or what the user is trying to communicate
 Common solution is to use a finite state machine
to map gestures to controls:
If moving down at 45° then up at 45° → check mark
 This approach can’t scale to a large corpus of
gestures and can’t segment or parse continuous
gestures into a semantically meaningful language
 Continuous gesture recognition has applications
in sign language translation, design tools, robot
control, gaming, and stereoscopic computing
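A toy version of the rule-based recognizer described above (with hypothetical direction rules standing in for the 45° conditions) shows why each gesture needs its own hand-written state machine:

```python
# Hand-written finite state machine for one gesture: a checkmark is a stroke
# that moves down-right, then turns and moves up-right. Every new gesture
# requires another rule like this, and none of them can segment a continuous
# stream of input -- the limitation motivating the learned approach.
def is_checkmark(points):
    """points: list of (x, y) screen positions, with y increasing downward."""
    state = "down"
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        moving_down = y1 > y0 and x1 > x0
        moving_up = y1 < y0 and x1 > x0
        if state == "down" and moving_up:
            state = "up"          # the turn at the bottom of the check
        elif state == "down" and not moving_down:
            return False          # any other motion breaks the rule
        elif state == "up" and not moving_up:
            return False
    return state == "up"
```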
Objectives
 Collect a dataset using the Leap Motion
controller for training models
 Use deep learning to segment, classify, and
parse a series of human input gestures
 Create a “gesture engine” that can be trained on
desired gestures and used inside Leap Motion-
enabled applications for continuous, meaningful
interaction with a computer without a mouse
3D Gesture Recognition with the Leap Motion Controller
Robert McCartney, Dr. Hans-Peter Bischof
Rochester Institute of Technology
Figure 1. Leap
Motion’s model
of the hand
Figure 3. A checkmark
Figure 4. A capital E
Figure 5. A figure 8
Figure 6. RBM as an energy model
Figure 7. Deep belief net
Figure 8. Convolutional NN
Taken from http://parse.ele.tue.nl/education/cluster2
Future Work
 Working with Jie Yuan to incorporate his HMM
in order to model the hidden rejection state
when the user is not performing any gesture
 The HMM will segment online gestures, with
segmented data then turned into motion images
 This will allow for continuous online gesture
recognition
 Alternative future approaches include Recurrent
Neural Networks and LSTMs
 Various dimensionality reduction techniques
remain to be explored: autoencoders, PCA, and
manifold methods such as Locality Preserving
Projections (LPP)
 Pending issue: long training times require GPU
implementations to run efficiently
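One of the dimensionality-reduction options listed above, PCA, can be sketched on flattened motion images. The shapes are hypothetical (100 gestures, three 32x32 planes) and this is not the project's pipeline:

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """X: (n_samples, n_features). Returns the projected scores and the
    principal axes, computed via an economy SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the orthonormal principal directions, ordered by
    # decreasing explained variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components

rng = np.random.default_rng(0)
X = rng.random((100, 3 * 32 * 32))            # 100 gestures, flattened images
scores, components = pca_fit_transform(X, n_components=50)
```

Reducing each 3,072-pixel motion image to a few dozen components would also cut the input size of the downstream networks, which bears directly on the long-training-time issue noted above.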