From 2D Image To 3D Object
Extracting Human-Object Interaction from a 2D Image
Outline
Project Overview (How the inference works)
Previous Work (PROX model)
Methodology
Problem Statement
PHOSA Model Challenges
Future Work (drawbacks)
Deployment Challenges
Summary
Introduction
Why extract 3D human-object interaction?
Domains that use 3D objects:
● Entertainment(VR, Gaming)
● Animation
● Simulation
● 3D Printing
● Manufacturing
● Advertising and Marketing
3D modeling enables companies to display their products in an
ideal state.
● Sciences and Geology
Introduction
Why extract 3D human-object interaction?
The time to create a 3D character depends on many factors, like the level of
complexity, experience of the modeler, etc. Keeping in mind these factors,
it can take approximately 100 to 200 hours.
Problem Statement
We attempt to understand human-object interaction: without modeling
the interaction itself, different 3D interpretations can produce the
same 2D projection.
Previous Work(PROX model)
PROX considers humans and their environments using a known 3D scene:
it recovers 3D meshes of humans in relation to that scene. This
approach depends on existing 3D captures of the scene, which are not
available in the wild. PHOSA attempts to reconstruct 3D humans and
objects without 3D scans.
System Architecture:
[Architecture diagram: Image → Detectron (PointRend) → PoseOptimizer → 3D Objects; Frankmocap (BodyMocap → SMPL) → 3D Human; 3D Objects + 3D Human → PHOSA (3D global reasoning) → Flask API]
Project Overview (How the inference works)
We chose “Perceiving 3D Human-Object Spatial Arrangements from a
Single Image in the Wild”, a paper from Facebook AI Research presented
at the European Conference on Computer Vision (ECCV) 2020, which
concentrates on extracting object interactions without any scene- or
object-level 3D supervision. We aim to take a 2D image as input and
output a reasonable 3D human-object interaction scene through a
Flask API.
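As a rough sketch of what that endpoint can look like (the run_phosa helper here is a hypothetical placeholder for the full pipeline, not our actual code):

```python
# Minimal sketch of the inference endpoint; run_phosa is a hypothetical
# placeholder wrapping segmentation, human mesh recovery, object pose
# estimation, and the PHOSA optimization.
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_phosa(image_path):
    raise NotImplementedError("placeholder for the full pipeline")

@app.route("/predict", methods=["POST"])
def predict():
    image = request.files["image"]      # the input 2D image
    path = "/tmp/input.jpg"
    image.save(path)
    renders = run_phosa(path)           # paths of the rendered output images
    return jsonify({"renders": renders})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```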
Methodology
1. Data acquisition (COCO Dataset)
2. Data preparation (preparing the objects/JSON files)
3. Modeling (explaining the PHOSA model)
4. Evaluation (explaining loss weights)
5. Deployment
- COCO is a large-scale object detection, segmentation, and
captioning dataset with more than 331k images.
- It contains challenging images of humans interacting with everyday
objects, obtained in uncontrolled settings.
Methodology →
1) Data acquisition (COCO Dataset)
The approach's genericity is demonstrated by evaluating on objects
from 8 categories of varying sizes and interaction types:
● Bicycle
● Baseball_bat
● Tennis_Racket
● Motorcycle
● Laptop
● Bench
● Surfboard
● Skateboard
Methodology →
1) Data acquisition (COCO Dataset)
“We need to add a mesh
(3D model) for each class
of these”
● Resizing the input image.
● Now we need to obtain these meshes for each category and prepare
them using MeshLab.
● All the meshes are pre-processed to be watertight and are simplified to make the
optimization more efficient.
○ For the pre-processing, we first fill in the holes of the raw mesh models (e.g. the
holes in the wheels or tennis racket) to make the projection of the 3D models
consistent with the silhouettes obtained by the instance segmentation
algorithm.
○ Finally, we reduce the number of mesh vertices using MeshLab (see the sketch below).
Methodology → 2) Data preparation
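For illustration, a scripted sketch of this preprocessing using the trimesh library; we actually did these steps interactively in MeshLab, so this is only an approximate equivalent:

```python
# Illustrative mesh preprocessing: fill holes, simplify, check watertightness.
# The project did this interactively in MeshLab; trimesh is an assumption
# (simplification may require the open3d/fast_simplification backend).
import trimesh

mesh = trimesh.load("meshes/bicycle.obj", force="mesh")
trimesh.repair.fill_holes(mesh)                        # fill holes (e.g. in the wheels)
mesh = mesh.simplify_quadric_decimation(face_count=2000)  # reduce the vertex/face count
print("watertight:", mesh.is_watertight)               # verify the repair worked
mesh.export("meshes/bicycle_simplified.obj")
```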
Methodology → 2) Data preparation
Filling in the holes of the raw mesh models and reducing
the number of mesh vertices.
Methodology → 2) Data preparation
The basic idea behind the model's performance is that
the model exploits the interaction areas between
humans and each of the previously listed objects.
So we had to label the interaction areas manually on the
meshes for each new class using MeshLab.
Part labeling
Generating the corresponding file
● This is a sample of the file containing the specific vertices of
the bench object that are in contact with humans,
generated as a JSON file using MeshLab.
Methodology → 2) Data preparation
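For illustration, such a file and its use might look like the sketch below; the exact schema here is an assumption, not a copy of our real file:

```python
# Hypothetical contact-region file and loader; the real JSON exported
# from MeshLab may be structured differently.
import json

# e.g. bench.json: {"seat": [1204, 1205, 1311], "backrest": [88, 91, 140]}
with open("bench.json") as f:
    contact_regions = json.load(f)

for part, vertex_ids in contact_regions.items():
    print(part, "has", len(vertex_ids), "contact vertices")
```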
Methodology →
3) Modeling (Explain PHOSA model)
An optimization model that cares about producing a reasonable interaction.
PHOSA Model
(Theoretical perspective)
PHOSA Model
(Technical perspective)
PHOSA Submodels:
● PoseOptimizer
● PHOSA
PHOSA External Models:
● Detectron (PointRend)
● Frankmocap (BodyMocap)
● Multiperson (neural_renderer)
PHOSA Model
(Technical perspective)
1. Detectron: predicts human instance segmentation masks and performs object detection.
2. Frankmocap: a human pose estimator that predicts 3D human meshes.
3. PoseOptimizer: finds the optimal pose of an object based on the object instances segmented by Detectron.
4. PHOSA: optimizes the human-object interaction based on several losses.
5. neural_renderer: for 3D mesh representation and visualization.
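Put together, inference is roughly the following sequence (a sketch with illustrative stub names, not the repository's actual API):

```python
# Rough inference pipeline; the five steps mirror the list above. All
# helper names are illustrative stubs, not the repository's actual API.
def detectron_pointrend(image): ...          # 1. instance masks + detections
def frankmocap_bodymocap(image): ...         # 2. SMPL human meshes
def pose_optimizer(masks, meshes): ...       # 3. per-object 6-DoF poses
def phosa_optimize(humans, objects): ...     # 4. joint loss optimization
def render_scene(scene): ...                 # 5. neural_renderer visualization

def infer(image, object_meshes):
    masks = detectron_pointrend(image)
    humans = frankmocap_bodymocap(image)
    objects = pose_optimizer(masks, object_meshes)
    scene = phosa_optimize(humans, objects)
    return render_scene(scene)
```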
PHOSA Model
(Technical perspective)
[Pipeline diagram: Detectron (PointRend) → PoseOptimizer → 3D Objects; Frankmocap (BodyMocap → SMPL) → 3D Human; both feed into PHOSA]
PHOSA Model (PointRend)
Uses Detectron (PointRend) to predict human instance segmentation.
PHOSA Model
(Technical perspective)
[Diagram highlighting Frankmocap (BodyMocap → SMPL), which produces the 3D human, alongside Detectron (PointRend)]
PHOSA Model
(Technical perspective)
[Diagram highlighting the PoseOptimizer, which takes the object instance mask from PointRend]
PHOSA Model
(Technical perspective)
Differentiable renderer (PoseOptimizer):
The differentiable renderer solves for the object rotation and
translation that minimize the silhouette reprojection error against
the object instance mask from PointRend, which drives the object
pose estimation.
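A hedged sketch of that fitting loop in PyTorch, where render_silhouette stands in for a differentiable renderer such as neural_renderer:

```python
# Sketch of silhouette-based object pose fitting. render_silhouette is a
# placeholder for a differentiable renderer (e.g. neural_renderer).
import torch

def fit_object_pose(mesh, target_mask, render_silhouette, steps=200):
    rot = torch.zeros(3, requires_grad=True)                    # axis-angle rotation
    trans = torch.tensor([0.0, 0.0, 2.0], requires_grad=True)   # initial depth guess
    optim = torch.optim.Adam([rot, trans], lr=1e-2)
    for _ in range(steps):
        optim.zero_grad()
        sil = render_silhouette(mesh, rot, trans)    # differentiable render
        loss = ((sil - target_mask) ** 2).sum()      # silhouette reprojection error
        loss.backward()                              # gradients flow through the renderer
        optim.step()
    return rot.detach(), trans.detach()
```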
PHOSA Model
(Technical perspective)
Differentiable renderer (PoseOptimizer):
- We visualize the final distribution of object sizes learned for the
COCO-2017 test set after optimizing for human interaction.
PHOSA Model
(Technical perspective)
[Diagram: Detectron (PointRend) → PoseOptimizer → 3D Objects; Frankmocap (BodyMocap → SMPL) → 3D Human]
Given independently estimated 3D humans and 3D objects, the
weighted-sum optimization of the loss terms starts with respect to
the 6-DoF pose plus an intrinsic scaling factor for each object and
human.
PHOSA Model
(Technical perspective)
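Schematically, this joint stage is a loop like the one below; the loss callables are placeholders for the terms explained in the next slides, and the weights shown are illustrative, not the tuned values:

```python
# Sketch of PHOSA's joint optimization over 6-DoF poses and intrinsic
# scales; the individual loss functions are placeholders for the terms
# described in the following slides.
import torch

def joint_optimize(params, losses, steps=100):
    """params: dict of tensors (rotations, translations, scales) created
    with requires_grad=True; losses: dict name -> callable(params)."""
    weights = {"sil": 1.0, "inter": 1.0, "depth": 1.0,
               "collision": 1.0, "scale": 1.0}        # illustrative weights
    optim = torch.optim.Adam(list(params.values()), lr=1e-3)
    for _ in range(steps):
        optim.zero_grad()
        total = sum(weights[k] * fn(params) for k, fn in losses.items())
        total.backward()
        optim.step()
    return params
```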
PHOSA Model
(Technical perspective)
[Full pipeline diagram: Detectron (PointRend) → PoseOptimizer → 3D Objects; Frankmocap (BodyMocap → SMPL) → 3D Human; 3D Objects + 3D Human → PHOSA]
4) PHOSA
Methodology → 4) Evaluation (Explain loss weights)
Loss Weights
Weighted sum of losses
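Written out, the objective is a weighted sum of the terms covered in the next slides (the weight symbols are ours):

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{\text{sil}}  \, \mathcal{L}_{\text{occ-sil}}
  + \lambda_{\text{inter}}\, \mathcal{L}_{\text{interaction}}
  + \lambda_{\text{depth}}\, \mathcal{L}_{\text{depth}}
  + \lambda_{\text{coll}} \, \mathcal{L}_{\text{collision}}
  + \lambda_{\text{scale}}\, \mathcal{L}_{\text{scale}}
```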
Loss Terms
Occlusion-aware silhouette loss:
For optimizing object pose. Given an image, a 3D mesh
model, and instance masks, our occlusion-aware silhouette
loss finds the 6-DoF pose that most closely matches the
target mask (bottom right).
Loss Terms
Human-object interaction loss:
- A part-labeling approach inspired by PROX.
- The contact regions on the human body and on each
object mesh encode the interacting parts.
Loss Terms
In order to identify human-object interaction, we first determine
if two parts interact by using 3D bounding boxes.
For example, people usually grab tennis rackets
by the handle using their hands. As shown here,
the handle of the tennis racket and the hand of
the person are not in contact, but their 3D bounding boxes overlap.
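A minimal sketch of that overlap test, assuming axis-aligned 3D boxes around the part vertices:

```python
# Sketch of the 3D bounding-box overlap test used to flag candidate
# interactions (axis-aligned boxes are an assumption for illustration).
import numpy as np

def boxes_overlap(verts_a, verts_b):
    """verts_*: (N, 3) arrays of part vertices in a shared frame."""
    min_a, max_a = verts_a.min(0), verts_a.max(0)
    min_b, max_b = verts_b.min(0), verts_b.max(0)
    # Boxes overlap iff their extents overlap along every axis.
    return bool(np.all(min_a <= max_b) and np.all(min_b <= max_a))
```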
Loss Terms
We impose our loss, which pulls the interacting parts closer
together. Here, we visualize the improved arrangement.
To identify human-object interaction, the initial size of
objects is very important.
Loss Terms
Taking a prior on human and object intrinsic
scale:
- If the surfboard is initially a reasonable
size, say two and a half meters long, then
the surfboard is close enough to the
person to correctly detect interaction.
- If we were to initialize the surfboard
to be unrealistically large,
the 3D bounding boxes would no longer
overlap.
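One common way to express such a prior, as a hedged reading of the idea rather than the paper's exact formulation, is to penalize each instance's log-scale for straying from a per-category reference scale:

```latex
\mathcal{L}_{\text{scale}} = \sum_{i} \left( \log s_i - \log \bar{s}_{c(i)} \right)^2
```

where s_i is the intrinsic scale of instance i and s̄_c(i) is the reference (mean) scale of its category.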
Loss Terms
Taking a prior on human and object intrinsic scale:
To start our optimization process, we use an
internet search to find the usual size of objects.
The red caret denotes the size resulting from the
hand-picked scale used for initialization. The
blue line denotes the size produced by the
empirical mean scale of all category instances at
the end of optimization.
Loss Terms
Ordinal depth loss (correct depth ordering):
The depth ordering inferred from the 3D placement should match that of the
image.
Loss Terms
Ordinal depth loss (correct depth ordering):
Note that the correct depth ordering of people and objects would also
minimize the occlusion-aware silhouette loss.
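A simplified sketch of such a penalty; the paper's exact term may differ:

```python
# Simplified ordinal depth penalty: depth_a/depth_b are rendered
# per-pixel depth maps, and the mask marks pixels where the image shows
# instance A in front of B. Illustrative form only.
import torch

def ordinal_depth_penalty(depth_a, depth_b, a_in_front_mask):
    violation = (depth_a - depth_b).clamp(min=0)  # positive where A renders behind B
    return (violation * a_in_front_mask).sum()
```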
Loss Terms
Collision loss:
- Used to avoid interpenetration of humans and
objects occupying the same 3D space.
- Promoting proximity between people and objects
can exacerbate the problem of instances occupying
the same 3D space.
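One way to score interpenetration is with a signed distance field (SDF) of the human body; the sketch below illustrates the idea rather than PHOSA's exact implementation:

```python
# Illustrative collision penalty: sample the human's SDF (negative inside
# the body) at object vertex locations and penalize vertices that fall
# inside the human.
import numpy as np

def collision_penalty(object_verts_grid, human_sdf):
    """object_verts_grid: (N, 3) integer grid indices of object vertices;
    human_sdf: 3D array, negative inside the human surface."""
    d = human_sdf[object_verts_grid[:, 0],
                  object_verts_grid[:, 1],
                  object_verts_grid[:, 2]]
    return float(np.clip(-d, 0.0, None).sum())  # total penetration depth
```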
Loss Terms
Importance of each loss
Deployment
After testing the model on a Colab notebook, we switched to the next
step (AWS cloud).
• We worked on an EC2 AWS Linux-based instance, which is a standard
computing unit on Amazon Web Services.
Instance type
● We uploaded our project files to the AWS cloud, as it facilitates our work
and speeds up our productivity, not to mention the GPU resources.
● After some network issues, we managed to use Flask successfully to deploy our
model not only on a local server but also publicly, for anyone to use at
35.86.170.176:5000
Deployment
File hierarchy
All app files
● After labelling the interaction regions on both humans and objects,
● it is now time to put it all together and deploy our model.
One final step before
deployment
First, we assign each class name
to its 3D object mesh.
Second, we mark the contact
regions as follows:
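Concretely, this registration step amounts to two small lookup tables, sketched below (the paths and part names are examples, not our exact files):

```python
# Example mappings from class name to mesh file and to contact-region
# files; names and paths are illustrative, not the project's exact ones.
MESH_MAP = {
    "bench": "meshes/bench.obj",
    "bicycle": "meshes/bicycle.obj",
    "surfboard": "meshes/surfboard.obj",
}
CONTACT_REGIONS = {
    "bench": {"seat": "parts/bench_seat.json"},
    "bicycle": {"seat": "parts/bicycle_seat.json",
                "handlebar": "parts/bicycle_handle.json"},
}
```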
Putting it all together:
Final results:
● Now our model is ready to deploy!
● We chose to display our output as four images:
● 1- instance segmentation of the human and objects in the input
picture
● 2- the 3D human mesh
● 3- the human-object interaction (front view)
● 4- the human-object interaction (top view)
Deployment
Here is a demo
● We are actually running four models together to obtain this
result:
● 1- the Detectron segmenter, to obtain the first picture
● 2- Frankmocap, to generate the human mesh
● 3- neural_renderer, to create the 3D object mask
● And last but not least, our PHOSA model, to optimize the interaction
between the 3D human mesh and the 3D object mesh
One of the drawbacks of the app is its time complexity.
1- Instead of initializing our model randomly (far from the best
weights), we consistently used the optimized weights (output) as the
input for the next iterations.
2- We managed to reduce the number of iterations from 100 to only 25
while keeping most of the output quality (an accepted tradeoff).
3- We can also reduce the running time by switching to a higher-end
GPU like an RTX or workstation GPU.
Some workaround solutions:
● After developing our application, we tried our best to make it
public for anyone with ease of access, despite the huge size of our
model and the library conflicts during installation.
So,
What is next?
● We provide our project as a Colab notebook for a faster and easier interface.
● Here:
https://colab.research.google.com/drive/18zkZd46CZ2GGIa5yYi8hBlAyLgggRN_j?usp=sharing
Colab notebook
● One of the difficulties we confronted while pushing the project to GitHub is
the huge size of the project, since GitHub's file size limit is only 100 MB.
● We used Git LFS to get around this problem.
● You can find our project repo at:
https://github.com/MohsenAziz/2d-images-to-3d-meshes_app-deployment
GitHub repository
● We tried to deploy the project on the Heroku platform, but unfortunately it
didn't work due to the huge size of the project.
● We also tried to dockerize our project as an image, but failed due to CUDA
installation errors.
Further steps
PHOSA Model
Challenges
● Randomization
● Inference time
● Different Detectron versions
● Torch version conflicts
● Prepare objects & JSON files
● Data annotation
● Sequential processes of the project
PHOSA Drawbacks
Human pose failure
Our human pose estimator
sometimes incorrectly estimates
the pose of the person
Object pose failure
The predicted masks are sometimes
unreliable for estimating the pose of
the object.
PHOSA Drawbacks
Incorrect reasoning about interaction due to scale
Future Work
● Add more classes
● Enhance complex images
● Make human-object interaction more reliable
● Replace body motion capture with whole-body
motion capture (body + hands)
● Real-time human-object interaction
● Define mesh for specific human features
Thank you
Any Questions?