How to apply Deep Learning to 3D objects
The Goal
▸ First Part (General Strategy): Learn how to approach building a deep learning product
▸ Second Part (Specific Case): Learn how deep learning is applied to 3D objects
2
Table of Contents
Strategy
1. Find the right problem
2. Find the right method
a. Getting Information
3. Keep rechallenging
a. Keep tries small
b. A lot of challenges
4. Focus
a. Focus on the right problem
b. Raise priority of the right method
3
Reference: https://www.iconfinder.com/
Table of Contents
Specific Case
1. Deep Learning Model: VoxNet
a. 3D CNN
b. 3D Max Pool
c. Fully connected
d. Output
e. Train
2. Improve technique
a. Accuracy
b. Speed
3. Results
4
HELLO!
I am Masaya Ohgushi
I work at Kabuku Inc.
I am an image processing
developer.
Twitter: @SnowGushiGit
5
Kabuku Inc.
▸ On-demand manufacturing service
  ▹ Receives 3D data and manufactures it using 3D printers and other equipment
6
https://www.kabuku.co.jp/
Strategy
7
Strategy
Find the Right problem
8
“Would you like to use deep learning to solve your problem?”
Find the right problem
▸ Which problems are best suited to deep learning?
  ▹ Most cases
    ▹ Image processing
    ▹ Speech recognition
  ▹ Some cases
    ▹ Natural language processing
    ▹ Time series analysis
10
Right problem
▸ Which problems are poorly suited to deep learning?
  ▹ Not enough data
  ▹ You can’t prepare a pre-trained model
  ▹ You need 100% accuracy
11
Strategy
Find the right method
12
“How can you find the best way to solve your problem using deep learning?”
“Let’s search Google!!”
“Not possible!!”
Find the Right method
▸ How do you research the best deep learning solution? (In my case)
  ▹ Google Scholar
    ▹ It is a search engine for papers
    ▹ You can find the following
      ▹ Good methods
      ▹ Good keywords
      ▹ Which university laboratories work on this problem
16
Find the Right method
▸ How do you research the best deep learning solution? (In my case)
  ▹ University laboratory sites
    ▹ You may be able to get data
    ▹ You may be able to get code
  ▹ GitXiv
    ▹ You can find papers together with their code
  ▹ Follow Twitter users
    ▹ You can get the latest information
17
Find the Right method
▸ How do you research the best deep learning solution? (In my case)
  ▹ Books
    ▹ You can get well-structured knowledge
  ▹ arXiv
    ▹ You can find the latest methods
  ▹ GitHub
    ▹ You can find code
  ▹ Google
    ▹ Only if you already know a good keyword!!
18
Strategy
Keep rechallenging
19
“You gathered a lot of training data!! Now let’s train using the full data set.”
“Not possible!!”
Keep rechallenging
▸ Keep tries small
  ▹ Even if you have a lot of data, do the following first
    ▹ Prepare a small data set
      ▹ Check that each module works correctly
    ▹ Prepare a training data set that is easy to verify
      ▹ Most models can be trained on a simple data set such as MNIST, so use one to check that your model works
22
Reference: https://www.tensorflow.org/get_started/mnist/beginners
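A minimal sketch of the "prepare a small data set" step: draw a few samples per class before committing to a long training run. The helper name `small_subset` and the `per_class` parameter are illustrative assumptions, not something from the talk.

```python
import random
from collections import defaultdict


def small_subset(samples, labels, per_class=10, seed=0):
    """Return up to `per_class` randomly chosen (sample, label) pairs per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    subset = []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        subset.extend((item, label) for item in items[:per_class])
    return subset


# Toy data: 100 samples of class 0 and 5 samples of class 1.
samples = list(range(105))
labels = [0] * 100 + [1] * 5
subset = small_subset(samples, labels, per_class=10)
print(len(subset))  # 15: ten of class 0 plus all five of class 1
```

If a model cannot fit a subset this small, the problem is more likely in the pipeline than in the amount of data.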
Keep rechallenging
▸ A lot of challenges
  ▹ There is no obvious method for improving accuracy
  ▹ You have to check the results
    ▹ If the training and validation accuracies stop improving, stop the run
    ▹ Check the results with a visualization tool such as TensorBoard
  ▹ Increase the number of attempts by improving computation speed
    ▹ Use a GPU
    ▹ Optimize for the CPU
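The "stop when accuracy no longer improves" rule can be sketched as a small patience check. The function name `should_stop` and the `patience` and `min_delta` parameters are assumptions for illustration; Keras also ships an `EarlyStopping` callback that serves the same purpose.

```python
def should_stop(val_accuracies, patience=3, min_delta=0.0):
    """Return True when the last `patience` epochs did not beat the earlier best."""
    if len(val_accuracies) <= patience:
        return False
    best_before = max(val_accuracies[:-patience])
    recent_best = max(val_accuracies[-patience:])
    return recent_best <= best_before + min_delta


history = [0.60, 0.71, 0.78, 0.79, 0.79, 0.78, 0.79]
print(should_stop(history))      # True: no gain over 0.79 in the last 3 epochs
print(should_stop(history[:4]))  # False: accuracy is still rising
```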
23
Strategy
Focus
24
“Deep learning has a lot of methods to improve accuracy”
- Model
  - How deep
  - How it is structured
- Adjusting the hyperparameters
- Preprocessing the data
  - Data augmentation (if using graphical data)
- Optimizer
  - SGD, Adam, etc.
Focus
▸ Focus on the right problem
  ▹ It depends on your situation
  ▹ Enough computational resources and enough data
    ▹ Try a deep and complex model
  ▹ Enough computational resources, but not enough data
    ▹ Find a good pre-trained model
    ▹ Focus on preprocessing, such as data augmentation
27
Focus
▸ Focus on the right problem
  ▹ It depends on your situation
  ▹ Not enough computational resources or data
    ▹ Consider other ways to solve your problem
      ▹ Logistic regression, SVM, random forest
    ▹ Deep learning probably isn’t the best choice
28
Specific Case
Deep Learning applied to 3D objects
29
Specific Case
Deep Learning Model: VoxNet
30
“There are a lot of deep learning models… How do you choose one?”
Deep Learning Model: VoxNet
32
▸ In my case, I considered 3 things
  ▹ Resources
    ▹ Computation resources
    ▹ Human resources
  ▹ Performance
    ▹ Accuracy
    ▹ Speed
  ▹ Speed of development
Deep Learning Model: VoxNet
33
Reference: http://ri.cmu.edu/pub_files/2015/9/voxnet_maturana_scherer_iros15.pdf
Deep Learning Model: VoxNet
34
▸ VoxNet advantages
  ▹ Resources
    ▹ Computation resources: good
      ▹ Memory: 32 GB (in my environment)
      ▹ GPU: GeForce GTX 1080 (in my environment)
  ▹ Performance
    ▹ Accuracy: 83% (top model: 95%)
  ▹ Speed (of development)
    ▹ Open source, simple code
35
Deep Learning Model: VoxNet
▸ Voxelize
  ▹ Maps the 3D data to a 32 x 32 x 32 voxel grid
  ▹ Reduces the data size
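A minimal sketch of voxelization, assuming the 3D data is available as an (N, 3) point cloud; the talk does not say how its meshes were converted, so the function `voxelize` and its uniform-scaling choice are illustrative assumptions.

```python
import numpy as np


def voxelize(points, grid_size=32):
    """Map an (N, 3) point cloud to a binary occupancy grid of shape (grid_size,) * 3."""
    points = np.asarray(points, dtype=np.float64)
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    if extent == 0:
        extent = 1.0  # all points coincide; avoid division by zero
    # Scale uniformly so the longest side spans the grid and the aspect ratio is kept.
    scaled = (points - mins) / extent * (grid_size - 1)
    indices = np.rint(scaled).astype(int)
    grid = np.zeros((grid_size, grid_size, grid_size), dtype=np.float32)
    grid[indices[:, 0], indices[:, 1], indices[:, 2]] = 1.0
    return grid


rng = np.random.default_rng(0)
cloud = rng.random((1000, 3))  # toy point cloud
voxels = voxelize(cloud)
print(voxels.shape)  # (32, 32, 32)
```

Whatever the size of the original model, the network input is always 32 * 32 * 32 = 32,768 cells; adding a trailing channel axis (`voxels[..., np.newaxis]`) gives the `(32, 32, 32, 1)` shape used by the model code.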
Deep Learning Model: VoxNet (3D CNN 3D objects)
36
▸ Convolution 3D
Reference: https://www.youtube.com/watch?v=ecbeIRVqD7g
Deep Learning Model: VoxNet (3D CNN 3D objects)
37
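▸ Convolution 2D (kernel: 2x2, stride: 2)

Input Image (4x4)
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

Kernel (2x2)
5 1
3 2

Top-left:     1*5 + 1*1 + 5*3 + 6*2 = 33
Top-right:    2*5 + 4*1 + 7*3 + 8*2 = 51
Bottom-left:  3*5 + 2*1 + 1*3 + 2*2 = 24
Bottom-right: 1*5 + 0*1 + 3*3 + 4*2 = 22

Convolved Image (2x2)
33 51
24 22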
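The 2D example can be checked with a few lines of plain Python. This is a sketch for a square image and kernel without padding; the function name `conv2d` is an assumption for illustration. Note that the bottom-right window evaluates to 1*5 + 0*1 + 3*3 + 4*2 = 22.

```python
def conv2d(image, kernel, stride):
    """Valid (no padding) 2D convolution as used in CNNs, without kernel flipping."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [[1, 1, 2, 4],
         [5, 6, 7, 8],
         [3, 2, 1, 0],
         [1, 2, 3, 4]]
kernel = [[5, 1],
          [3, 2]]
print(conv2d(image, kernel, stride=2))  # [[33, 51], [24, 22]]
```

A 3D convolution works the same way with one more pair of loops over the depth axis.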
Deep Learning Model: VoxNet (3D CNN 3D objects)
38
▸ Convolution 3D
Deep Learning Model: VoxNet (3D CNN 3D objects)
45
7 times
▸ Convolution 3D
Deep Learning Model: VoxNet (3D CNN 3D objects)
46
▸ Convolution 3D (3D CNN, 32 filters)
# 32 filters, as labeled on the slide; Conv3D requires the filter count
Conv3D(filters=32,
       input_shape=(32, 32, 32, 1),
       kernel_size=(5, 5, 5),
       strides=(2, 2, 2),
       data_format="channels_last")
Deep Learning Model: VoxNet (Max Pool3D)
47
▸ Max Pool 2D (pooling size: 2 x 2, stride: 2)

Convolution Feature (4x4)
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

Max pool (2x2)
6 8
3 4
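The 2D max pooling example can be reproduced the same way. This is a sketch for a square feature map; the function name `max_pool2d` is an assumption for illustration.

```python
def max_pool2d(feature, pool, stride):
    """Take the maximum of each pool x pool window of a square feature map."""
    out_size = (len(feature) - pool) // stride + 1
    return [
        [
            max(
                feature[i * stride + di][j * stride + dj]
                for di in range(pool)
                for dj in range(pool)
            )
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


feature = [[1, 1, 2, 4],
           [5, 6, 7, 8],
           [3, 2, 1, 0],
           [1, 2, 3, 4]]
print(max_pool2d(feature, pool=2, stride=2))  # [[6, 8], [3, 4]]
```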
Deep Learning Model: VoxNet (Max Pool3D)
48
▸ Max Pool 3D
Convolution
Feature
Pool window
Deep Learning Model: VoxNet (Max Pool3D)
51
MaxPooling3D(pool_size=(2, 2, 2),
data_format='channels_last',)
▸ Max Pool 3D
Deep Learning Model: VoxNet (Max Pool3D)
52
▸ Fully Connected and Output
3D CNN & 3D Max Pool → Fully connected: Dense (128) → Dense (number of classes) → softmax
Deep Learning Model: VoxNet (Flatten, Dense)
53
▸ Fully Connected and Output
  ▹ Softmax function
    ▹ It maps the outputs to a probability distribution
    ▹ It is easy to differentiate
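For reference, softmax maps scores z to exp(z_i) / sum_j exp(z_j). A minimal sketch in plain Python follows; subtracting the maximum first is a common numerical-stability step and is an addition here, not something shown in the talk.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the largest score gets the largest probability
print(sum(probs))  # sums to 1 (up to floating-point rounding)
```

Combined with cross-entropy loss, the gradient with respect to the scores reduces to (predicted probability - one-hot target), which is one reason the pair is convenient to differentiate.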
Deep Learning Model: VoxNet (Output)
54
▸ Fully Connected and Output
model.add(Flatten())
model.add(Dense(128, activation='linear'))
# `units` replaces the older `output_dim` argument (Keras 2)
model.add(Dense(units=number_class, activation='linear'))
model.add(Activation('softmax'))
Deep Learning Model: VoxNet
55
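▸ Model
from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, Flatten, Dense, Activation

number_class = 10  # placeholder: set this to the number of classes in your data

model = Sequential()
# First 3D convolution: 32 filters, 5x5x5 kernel, stride 2
model.add(Conv3D(filters=32, input_shape=(32, 32, 32, 1),
                 kernel_size=(5, 5, 5), strides=(2, 2, 2),
                 data_format='channels_last'))
# Second 3D convolution: 32 filters assumed (as in VoxNet); input shape is inferred
model.add(Conv3D(filters=32,
                 kernel_size=(3, 3, 3), strides=(1, 1, 1),
                 data_format='channels_last'))
model.add(MaxPooling3D(pool_size=(2, 2, 2),
                       data_format='channels_last'))
model.add(Flatten())
model.add(Dense(128, activation='linear'))
model.add(Dense(units=number_class, activation='linear'))
model.add(Activation('softmax'))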
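It can help to trace the tensor sizes through the model by hand. The sketch below assumes no padding (the Keras default, `padding='valid'`), a pooling stride equal to the pool size, and 32 filters in the last convolution; the helper name `output_size` is illustrative.

```python
def output_size(size, kernel, stride):
    """Output length along one axis for a convolution or pooling without padding."""
    return (size - kernel) // stride + 1


size = 32                        # input: 32 x 32 x 32 voxels
size = output_size(size, 5, 2)   # Conv3D, kernel 5, stride 2 -> 14
size = output_size(size, 3, 1)   # Conv3D, kernel 3, stride 1 -> 12
size = output_size(size, 2, 2)   # MaxPooling3D, pool 2       -> 6
filters = 32                     # assumed filter count of the last convolution
flattened = size ** 3 * filters
print(size, flattened)  # 6 6912
```

Under these assumptions the Flatten layer hands 6 * 6 * 6 * 32 = 6,912 values to the 128-unit Dense layer.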
Deep Learning Model: VoxNet
56
▸ Train
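# An optimizer is required; Adam is used here as an example
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# y_class_label must be one-hot encoded for categorical_crossentropy
model.fit(x_voxel_data, y_class_label)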
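The `categorical_crossentropy` loss expects one-hot labels, so integer class ids need converting first. A minimal NumPy sketch follows; the function name `to_one_hot` is an assumption, and Keras offers `keras.utils.to_categorical` for the same job.

```python
import numpy as np


def to_one_hot(class_ids, num_classes):
    """Turn integer class ids into one-hot rows of length `num_classes`."""
    class_ids = np.asarray(class_ids)
    one_hot = np.zeros((class_ids.size, num_classes), dtype=np.float32)
    one_hot[np.arange(class_ids.size), class_ids] = 1.0
    return one_hot


y_class_label = to_one_hot([0, 2, 1, 2], num_classes=3)
print(y_class_label.shape)  # (4, 3)
print(y_class_label[1])     # [0. 0. 1.]
```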
Specific Case
Improve technique accuracy
57
Improve technique
58
▸ There are 2 approaches to improving accuracy
  ▹ Model
    ▹ Advantage
      ▹ There are a variety of ways to improve accuracy
    ▹ Disadvantages
      ▹ A deep model takes a lot of resources
      ▹ It is not obvious which model is better
  ▹ Data
    ▹ Advantage
      ▹ The effect of a change is obvious
    ▹ Disadvantage
      ▹ The approaches are limited
Improve accuracy
59
▸ Improving accuracy on the validation data (in my case)
  ▹ Model
    ▹ Random dropout
    ▹ LeakyReLU
  ▹ Data
    ▹ Data augmentation (3D data)
    ▹ Data increase
    ▹ Class weights for unbalanced categories
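One common way to set class weights for unbalanced categories is to make them inversely proportional to class frequency. The talk does not say which weighting was used, so the formula and the name `balanced_class_weights` below are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [0] * 80 + [1] * 15 + [2] * 5
weights = balanced_class_weights(labels)
print(weights)  # rare classes receive larger weights
```

In Keras, a dictionary like this can be passed as `model.fit(..., class_weight=weights)` so that mistakes on rare classes cost more during training.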
Improve technique
60
▸ Data Approach
▹ Data Augmentation has advantages over other
methods
▹ The effects are obvious
▹ It does not increase calculation time unlike
adding layers to the model
Improve technique
61
▸ Data Augmentation
▹ Rotation
▹ Shift
▹ Shear
▹ etc...
Improve technique
62
▸ Data Augmentation 3D
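▹ augmentation_matrix changes with each transform (rotation, shift, shear)
import numpy as np
import scipy.ndimage as ndi

# Apply the same 4x4 homogeneous matrix to every channel volume
channel_images = [ndi.affine_transform(x, augmentation_matrix)
                  for x in x_voxel]
x = np.stack(channel_images, axis=0)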
Improve technique
63
▸ Data Augmentation 3D Rotation
rotation_matrix_y = np.array([[np.cos(theta), 0, np.sin(theta) , 0],
[0 , 1, 0 , 0],
[-np.sin(theta), 0, np.cos(theta), 0],
[0 , 0 , 0 , 1]])
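A rotation matrix alone rotates about the array origin, which is a corner of the voxel grid. A common remedy, sketched here as an assumption rather than as the talk's own code, is to translate the grid center to the origin, rotate, and translate back. The helper names `translation` and `rotation_y_about_center` are illustrative. Also note that `scipy.ndimage.affine_transform` interprets its matrix as a mapping from output coordinates to input coordinates.

```python
import numpy as np


def translation(offset):
    """4x4 homogeneous translation by the same `offset` along x, y, and z."""
    matrix = np.eye(4)
    matrix[:3, 3] = offset
    return matrix


def rotation_y_about_center(theta, grid_size=32):
    """Rotation about the y axis through the center of a cubic voxel grid."""
    center = (grid_size - 1) / 2.0
    rotation = np.array([[np.cos(theta), 0, np.sin(theta), 0],
                         [0, 1, 0, 0],
                         [-np.sin(theta), 0, np.cos(theta), 0],
                         [0, 0, 0, 1]])
    # Move the center to the origin, rotate, then move back.
    return translation(center) @ rotation @ translation(-center)


matrix = rotation_y_about_center(np.pi / 2)
center_point = np.array([15.5, 15.5, 15.5, 1.0])
print(np.allclose(matrix @ center_point, center_point))  # True: the center stays fixed
```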
Improve technique
64
▸ Data Augmentation 3D Shift
shift_matrix = np.array([[1, 0, 0, shift_x],
[0, 1, 0, shift_y],
[0, 0, 1, shift_z],
[0, 0, 0, 1 ]])
Improve technique
65
▸ Data Augmentation 3D Shear
shear_matrix = np.array([[1 , shear_x, shear_x, 0],
[shear_y, 1 , shear_y, 0],
[shear_z, shear_z, 1 , 0],
[0 , 0 , 0 , 1]
])
Improve accuracy
66
▸ Data increase
  ▹ Add the augmented data to the training data
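A minimal sketch of data increase with NumPy: shift every training volume and append the shifted copies, with unchanged labels, to the training set. The toy arrays, the 2-cell shift, and the helper name `shift_volume` are assumptions for illustration.

```python
import numpy as np


def shift_volume(volume, shift, axis):
    """Shift a voxel grid by `shift` cells along `axis`, filling vacated cells with zeros."""
    shifted = np.zeros_like(volume)
    if shift == 0:
        shifted[...] = volume
        return shifted
    src = [slice(None)] * volume.ndim
    dst = [slice(None)] * volume.ndim
    if shift > 0:
        src[axis] = slice(None, -shift)
        dst[axis] = slice(shift, None)
    else:
        src[axis] = slice(-shift, None)
        dst[axis] = slice(None, shift)
    shifted[tuple(dst)] = volume[tuple(src)]
    return shifted


# Toy training set: 4 volumes, each containing a 10 x 10 x 10 cube.
x_train = np.zeros((4, 32, 32, 32, 1), dtype=np.float32)
x_train[:, 10:20, 10:20, 10:20, 0] = 1.0
y_train = np.array([0, 1, 0, 1])

# Shift each sample by 2 cells along x (axis 0) and y (axis 1) of the volume.
x_shifted = np.stack([shift_volume(shift_volume(x, 2, 0), 2, 1) for x in x_train])
x_increased = np.concatenate([x_train, x_shifted], axis=0)
y_increased = np.concatenate([y_train, y_train], axis=0)
print(x_increased.shape, y_increased.shape)  # (8, 32, 32, 32, 1) (8,)
```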
Specific Case
Improve technique speed
67
Improve calculation speed
68
▸ There are many ways to improve computation speed in deep learning (in my case)
  ▹ Use a GPU (GeForce GTX 1080, 8 GB memory)
  ▹ CPU optimization
  ▹ Multithreading
  ▹ Prepare the feature set beforehand
Improve calculation speed
69
▸ CPU optimization (TensorFlow build options)
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma
Specific Case
Results
70
Results
71
▸ Validation Accuracy
Result
72
Method                           | Explanation                                                                  | Training (accuracy) | Validation (accuracy)
BaseLine                         | BaseLine                                                                     | 90%                 | 79%
Shift_x_Shift_y                  | Data augmentation (x-shift, y-shift)                                         | 80%                 | 80%
Shift_x_Shift_y_class_weight     | Data augmentation (x-shift, y-shift) + class weight                          | 80%                 | 83%
Add_Shift_x_Shift_y_class_weight | Data augmentation (x-shift, y-shift) + class weight + ADD (x-shift, y-shift) | 85%                 | 85%
Conclusion
73
Conclusion
Summary of this presentation
74
Conclusion
75
▸ Strategy
Right Problem → Right Method → Rechallenge → Focus
Ref: https://www.iconfinder.com/, https://github.com/, https://arxiv.org/, http://www.gitxiv.com/, https://scholar.google.co.jp/
Conclusion
76
▸ Our case
  ▹ Right Problem: 3D object recognition for an on-demand manufacturing service
  ▹ Right Method: VoxNet
  ▹ Rechallenge and Focus: data augmentation, customized model
We’re hiring
Recruiting
77
Deep Learning
For 3D objects
Implementing deep learning for 3D objects is still a rare case
78
“We are hiring!!”
- Deep Learning for 3D objects
- Working in Japan
https://www.wantedly.com/projects/111707
contact@kabuku.co.jp
80
THANKS!
Any questions?
You can find me at @SnowGushiGit &
masaya.ohgushi@kabuku.co.jp
References
81
References
▸ MNIST datasets
  ▹ https://www.tensorflow.org/get_started/mnist/beginners
▸ Flaticon
  ▹ www.flaticon.com
▸ 3D CNN-Action Recognition Part-1
  ▹ https://www.youtube.com/watch?v=ecbeIRVqD7g&t=82s
▸ Bengio, Yoshua, et al. "Curriculum learning." Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.
▸ He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
▸ Yann, Margot Lisa-Jing, and Yichuan Tang. "Learning Deep Convolutional Neural Networks for X-Ray Protein Crystallization Image Analysis." AAAI. 2016.
▸ Maturana, Daniel, and Sebastian Scherer. "VoxNet: A 3D convolutional neural network for real-time object recognition." Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015.
82
References
▸ Deep Learning Book
  ▹ http://www.deeplearningbook.org/
▸ IconFinder
  ▹ https://www.iconfinder.com/
▸ GitHub
  ▹ https://github.com/
▸ arXiv
  ▹ https://arxiv.org/
▸ GitXiv
  ▹ http://www.gitxiv.com/
▸ Google Scholar
  ▹ https://scholar.google.co.jp/
83
