Notice
• These slides were prepared by Mitsuru Nakazawa,
  who is NOT one of the original authors, to
  introduce a paper from ECCV2012




Presenter: Mitsuru NAKAZAWA



 Performance Capture of Interacting
  Characters with Handheld Kinects

       Genzhi Ye (1), Yebin Liu (1), Nils Hasler (2),
 Xiangyang Ji (1), Qionghai Dai (1), Christian Theobalt (2)
              (1) Department of Automation, Tsinghua University
    (2) Graphics, Vision & Video Group, Max-Planck Institute for Informatics




ECCV2012 paper introduction
Introduction movie




URL: http://media.au.tsinghua.edu.cn/yegenzhi/HandheldKinectsMocap_ECCV2012.jsp
(Accessed on 26th Nov. 2012)




Related works
Multi-view motion capture approaches
   • Reconstruct a skeletal motion model & detailed
     dynamic surface geometry
   • Handle people wearing wide (loose) apparel
   • Require a controlled studio setup with many
     synchronized video cameras

Marker-less motion capture from a single range
sensor
   • Estimates complex poses at real-time frame rates
   • Struggles to capture a detailed, complex 3D model
Objective
Full performance capture of moving humans using
only 3 handheld, moving Kinects

[Figure labels: Operator (moves freely), Performer]

 • Reconstruct skeletal motion & time-varying surface
   geometry of humans in general apparel
 • Handle fast and complex motion with many
   self-occlusions & non-rigid surface deformation
 • No need for a studio with controlled lighting and many
   stationary cameras
Data capture
[Figures: capture environment, with operator and performer labelled; captured data from 3 Kinects]

 • Asynchronous capture
    – Send a start-recording signal to all PCs, which are connected through Wi-Fi
 • Intrinsic calibration
    – Apply Zhang's method
 • Alignment between the color image and the range data
    – Use the OpenNI API
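Once the intrinsics are known and the range data is aligned with the color image, each depth pixel can be back-projected to a 3D point in the camera frame. The following is a minimal sketch, assuming an ideal pinhole camera without lens distortion. The helper `backproject` and the intrinsic values are illustrative placeholders, not calibration results from the paper.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to a camera-frame point.

    Assumes an ideal pinhole camera (no lens distortion). fx and fy are the
    focal lengths in pixels and (cx, cy) is the principal point.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Placeholder intrinsics for a 640x480 sensor (assumed, not measured).
FX, FY, CX, CY = 575.0, 575.0, 320.0, 240.0

# The pixel at the principal point lies on the optical axis.
center = backproject(320, 240, 2.0, FX, FY, CX, CY)
# A corner pixel maps to a point off the optical axis at the same depth.
corner = backproject(0, 0, 2.0, FX, FY, CX, CY)
```

In practice the values of `fx`, `fy`, `cx`, and `cy` would come from the calibration step rather than being hard-coded.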
Scene models at time t
       • Human model
              – A laser scanner provides a static mesh with an embedded
                skeleton for each performer [*]
              – Each mesh has 5,000 vertices
              – The k-th performer's skeleton has 31 degrees of freedom: C_t^k
       • Ground plane model (fixed)
              – Centered in the capture environment
              – Planar mesh with a circular boundary (GND in the figure, r = 3 m)
       • Camera extrinsic parameters of the i-th Kinect
              – Translation, rotation: L_t^k

[*] F. Remondino: "3-D reconstruction of static human body shape from image sequence," CVIU, Vol. 93, No. 1, pp. 65-85
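To illustrate how the extrinsic parameters and the ground model relate, the sketch below maps a camera-frame point into a world frame and tests whether it falls on the circular ground patch. It assumes a world frame whose origin is the center of the ground disc and whose plane z = 0 is the ground; that frame, the tolerance, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math


def camera_to_world(p_cam, rotation, translation):
    """Apply extrinsics: p_world = R * p_cam + t (R is a 3x3 row-major list)."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def on_ground_disc(p_world, radius_m=3.0, tol_m=0.05):
    """True if the point is within tol_m of the plane z = 0 and inside the disc."""
    x, y, z = p_world
    return abs(z) <= tol_m and math.hypot(x, y) <= radius_m


def rotation_about_x(angle_rad):
    """Rotation matrix for a rotation about the x axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


# Hypothetical camera: 1.5 m above the ground, looking straight down.
R = rotation_about_x(math.pi)  # maps the camera viewing axis (+z) to world -z
t = (0.0, 0.0, 1.5)

# A point seen 1.5 m in front of the camera lands on the ground plane.
p_world = camera_to_world((0.5, 0.2, 1.5), R, t)
```

Because the Kinects are handheld, `R` and `t` change at every time step, which is why they are among the unknowns to be estimated.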
Overview of the proposed method
At time t:

    • Point cloud segmentation, and geometric matching of Kinect points
      to vertices of the human model

    • Optimization of skeleton & camera pose

    • Non-rigid deformation of the human surface via Laplacian deformation
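The geometric matching step can be pictured as a nearest-neighbor search with a distance cutoff. The sketch below is a plain brute-force version under that assumption; the method in the paper may use additional criteria, and `match_points_to_vertices` and `max_dist` are hypothetical names chosen for this example.

```python
import math


def match_points_to_vertices(points, vertices, max_dist):
    """Pair each point with its closest vertex, if that vertex is close enough.

    Returns a list of (point_index, vertex_index, distance) tuples.
    Brute force, O(N * M); a real system would likely use a spatial index.
    """
    matches = []
    for pi, p in enumerate(points):
        best_vi, best_d = None, float("inf")
        for vi, v in enumerate(vertices):
            d = math.dist(p, v)
            if d < best_d:
                best_vi, best_d = vi, d
        # Reject pairs that are too far apart (e.g. background points).
        if best_vi is not None and best_d <= max_dist:
            matches.append((pi, best_vi, best_d))
    return matches


# Toy data: three model vertices and three measured points.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
points = [(0.1, 0.0, 0.0), (0.9, 0.1, 0.0), (5.0, 5.0, 5.0)]

matches = match_points_to_vertices(points, vertices, max_dist=0.3)
pairs = [(pi, vi) for pi, vi, _ in matches]
```

The far-away third point finds no partner, which is the behaviour wanted for points that belong to the background rather than to a performer.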
Optimization of skeleton and camera pose
The error function E is minimized with an iterative quasi-Newton method



 • Human region error between model vertices and Kinect point cloud
 • Ground region error between model vertices and Kinect point cloud
 • Difference between matched SIFT feature positions at the previous and current
   time steps in background regions (SfM approach)
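As a rough illustration of quasi-Newton minimization, the sketch below fits a 2D rigid transform (tx, ty, theta) to point correspondences with a small hand-written BFGS loop. This is a toy stand-in, not the paper's error function: the real E combines the human, ground, and SIFT terms listed above over skeleton and camera parameters, whereas this example uses a single sum-of-squares term, numerical gradients, and made-up data.

```python
import math


def energy(params, model_pts, observed_pts):
    """Sum of squared distances after a 2D rigid transform (toy stand-in for E)."""
    tx, ty, theta = params
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for (mx, my), (ox, oy) in zip(model_pts, observed_pts):
        px, py = c * mx - s * my + tx, s * mx + c * my + ty
        total += (px - ox) ** 2 + (py - oy) ** 2
    return total


def numeric_grad(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def bfgs(f, x0, iters=200, tol=1e-8):
    """Minimal BFGS with a backtracking (Armijo) line search."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = numeric_grad(f, x)
    for _ in range(iters):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(gi * di for gi, di in zip(g, d))
        fx, step = f(x), 1.0
        while (f([xi + step * di for xi, di in zip(x, d)])
               > fx + 1e-4 * step * slope and step > 1e-12):
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = numeric_grad(f, x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # curvature condition; skip the update otherwise
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            # H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T
            for i in range(n):
                for j in range(n):
                    H[i][j] += (rho * rho * yHy + rho) * s[i] * s[j] \
                        - rho * (s[i] * Hy[j] + Hy[i] * s[j])
        x, g = x_new, g_new
    return x


# Made-up data: observations are the model moved by a known transform.
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
true_params = (0.5, -0.2, 0.3)
c0, s0 = math.cos(true_params[2]), math.sin(true_params[2])
observed = [(c0 * x - s0 * y + 0.5, s0 * x + c0 * y - 0.2) for x, y in model]

estimate = bfgs(lambda p: energy(p, model, observed), [0.0, 0.0, 0.0])
```

Starting from the previous frame's estimate, as the slide's figure suggests, keeps the starting point close to the solution, which helps a local method of this kind converge.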




[Figure panels: result using the skeleton and camera poses from time t−1; result based on the SfM approach; optimized result]
Comparison with Multi-view Video Tracking
Multi-view video tracking system with 10 calibrated cameras vs. proposed method

    "Rolling" with slow motion → similar results

    "Jump" with fast motion → the proposed method gives better results




Performance capture results on a variety of sequences




Conclusion
• Marker-less performance capture system with several
  hand-held Kinects that simultaneously estimates
  skeleton and camera poses
  – Iterative robust matching of tracked 3D models
    and input Kinect data




References
• Linear Blend Skinning (accessed on Nov. 25th, 2012)
   – http://bit.ly/RaijkQ
• Motion Capture Using Joint Skeleton Tracking and
  Surface Estimation (accessed on Nov. 26th, 2012)
   – http://www.vision.ee.ethz.ch/~gallju/projects/skelsurf/index.html





