Scalable Fiducial Tag Localization on a 3D Prior Map
Via Graph-Theoretic Global Tag-Map Registration
Kenji Koide, Shuji Oishi, Masashi Yokozuka, and Atsuhiko Banno
National Institute of Advanced Industrial Science and Technology (AIST), Japan
Background
• Map-based visual localization has been attracting much attention
• It is, however, sometimes necessary to rely on visual fiducial tags
(aka visual markers) for initialization and fail-safe
[Oishi, 2020]
Motivation
• Deploying many tags on a 3D prior map is sometimes difficult and tedious
• Tag positions are often measured by hand, which takes considerable effort and yields inaccurate results
• We aim to develop an accurate and automatic method to determine tag poses
in the environment
Proposed Method
1. VIO-based Tag-Relative-Pose Estimation
We use an agile camera to observe tags in the environment and
estimate the relative poses between tags via landmark SLAM
2. Global Tag-Map Registration
We then roughly align tags and a prior map by establishing tag-plane
correspondences via graph-theoretic correspondence estimation
3. Estimation Refinement via Direct Camera-Map Alignment
Tag and camera poses are refined by directly aligning agile camera images with
the prior map, and all variables are re-optimized under all constraints
VIO-based Tag-Relative-Pose Estimation
• We use an agile camera and observe each tag in the environment at least once
• The tag poses in the VIO frame are estimated via landmark SLAM
VIO (VINS-Mono)
Tag detections (AprilTag)
Pose graph optimization
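The way a VIO camera pose and a single tag detection combine into a tag pose in the VIO frame can be sketched as a composition of rigid transforms. This is a minimal illustration, not the authors' implementation: the helper names (`make_pose`, `matmul4`, `invert_pose`) and all numeric values are hypothetical, and in a landmark SLAM pipeline such composed poses would only serve as initial values that pose graph optimization then refines.

```python
import math

def matmul4(a, b):
    """Multiply two 4x4 homogeneous transforms stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def make_pose(yaw_rad, tx, ty, tz):
    """Build a 4x4 transform: rotation about z by yaw plus a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def invert_pose(t):
    """Invert a rigid transform: [R | t]^-1 = [R^T | -R^T t]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[0] + [trans[0]],
            r_t[1] + [trans[1]],
            r_t[2] + [trans[2]],
            [0.0, 0.0, 0.0, 1.0]]

# Hypothetical inputs: a camera pose from VIO and two tag detections
# expressed in the camera frame.
T_vio_cam = make_pose(math.pi / 2, 1.0, 2.0, 0.0)
T_cam_tag_a = make_pose(0.0, 0.5, 0.0, 0.0)
T_cam_tag_b = make_pose(0.0, 0.0, 1.5, 0.0)

# Tag poses in the VIO frame: T_vio_tag = T_vio_cam * T_cam_tag.
T_vio_tag_a = matmul4(T_vio_cam, T_cam_tag_a)
T_vio_tag_b = matmul4(T_vio_cam, T_cam_tag_b)

# Relative pose of tag b expressed in the frame of tag a.
T_a_b = matmul4(invert_pose(T_vio_tag_a), T_vio_tag_b)
```

Because the relative pose `T_a_b` is independent of the frame in which both tags are expressed, it can be carried over unchanged once the tags are registered to the prior map.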
Global Tag-Map Registration
• We want to align the estimated tag poses with a prior 3D map without an initial guess
• The modality difference makes it difficult to apply image matching…
[Figure: prior 3D map (sparse point cloud) and estimated tag poses (visually detected), to be aligned without an initial guess]
Geometry-based Tag-Plane Matching
• We assume that most tags are placed on a plane in the environment
• We establish tag-plane correspondences to determine the tag-map transformation
Detecting planes in the environment
1. Region growing segmentation
2. RANSAC plane detection
3. Fit oriented BBoxes to plane points
Plane = (center, normal, lengths)
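The RANSAC plane detection step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' pipeline: it skips region growing segmentation, fits a single plane, returns only the center and normal (the bounding-box lengths are omitted), and the function name `ransac_plane`, the thresholds, and the sample points are all hypothetical.

```python
import random

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ransac_plane(points, dist_thresh=0.02, iters=200, seed=0):
    """Fit one plane by RANSAC; return (center, unit normal, inlier points)."""
    rng = random.Random(seed)
    best_inliers, best_normal = [], None
    for _ in range(iters):
        # Hypothesize a plane from three randomly sampled points.
        p0, p1, p2 = rng.sample(points, 3)
        n = cross(sub(p1, p0), sub(p2, p0))
        norm = dot(n, n) ** 0.5
        if norm < 1e-9:
            continue  # degenerate (collinear) sample
        n = (n[0] / norm, n[1] / norm, n[2] / norm)
        # Count points whose point-to-plane distance is below the threshold.
        inliers = [p for p in points if abs(dot(n, sub(p, p0))) < dist_thresh]
        if len(inliers) > len(best_inliers):
            best_inliers, best_normal = inliers, n
    if best_normal is None:
        raise ValueError("no non-degenerate plane hypothesis was sampled")
    count = len(best_inliers)
    center = tuple(sum(p[i] for p in best_inliers) / count for i in range(3))
    return center, best_normal, best_inliers

# Hypothetical data: a 5 x 5 grid of points on the plane z = 1 plus outliers.
grid = [(0.5 * i, 0.5 * j, 1.0) for i in range(5) for j in range(5)]
outliers = [(0.3, 0.2, 0.5), (1.1, 0.7, 1.8), (0.6, 1.9, 0.1),
            (1.7, 0.4, 2.5), (0.9, 1.3, -0.7)]
center, normal, inliers = ransac_plane(grid + outliers)
```

The sign of the returned normal is arbitrary here; a real pipeline would need a convention (for example, orienting normals toward the sensor) before comparing them with tag normals.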
Max-Clique-based Correspondence Estimation
• Tag-Plane Correspondence Consistency Graph
Vertex: tag-plane correspondence hypothesis
ℎ𝑖𝑗: tag i corresponds to plane j
ℎ𝑘𝑙: tag k corresponds to plane l
Edge: consistency between correspondence hypotheses
ℎ𝑖𝑗 does not contradict ℎ𝑘𝑙 (i.e., they are consistent)
• The largest subset of hypotheses that are all mutually consistent (i.e., the maximum clique)
gives the best explanation for the tag placement in the given map
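The maximum-clique idea can be illustrated on a toy consistency graph. The sketch below uses a plain Bron-Kerbosch search with pivoting, which is a stand-in chosen for brevity and not the efficient solver the slides cite; the hypothesis labels and the edge list are hypothetical.

```python
def build_adjacency(vertices, edges):
    """Build {vertex: set(neighbors)} for an undirected graph."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def max_clique(adj):
    """Return one maximum clique using Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = list(r)
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best

# Hypothetical hypotheses (tag, plane): three mutually consistent ones
# and a competing pair that is consistent only with itself.
hypotheses = [("t0", "p0"), ("t1", "p1"), ("t2", "p2"),
              ("t0", "p3"), ("t1", "p4")]
consistent_pairs = [
    (("t0", "p0"), ("t1", "p1")),
    (("t0", "p0"), ("t2", "p2")),
    (("t1", "p1"), ("t2", "p2")),
    (("t0", "p3"), ("t1", "p4")),
]
clique = max_clique(build_adjacency(hypotheses, consistent_pairs))
```

In this toy graph the three mutually consistent hypotheses form the maximum clique, so the competing pair is rejected as an outlier explanation.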
Tag-Plane Correspondence Consistency
• Consistency between tag-plane correspondence hypotheses is determined
by a geometric consistency check
• We align tag i with plane j and then measure the distance between tag k and plane l
• If the normal and translation errors between tag k and plane l are smaller than
the thresholds, the two hypotheses are mutually consistent
[Figure: normal error and translation error between the aligned tag k and plane l, with plane j as the alignment reference]
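The pairwise check can be sketched as follows, assuming a rigid transform `(R, t)` that aligns tag i with plane j has already been computed. Several details are assumptions made for illustration and are not stated in the slides: the translation error is taken as the point-to-plane distance, normals are unit length, and the function name and threshold values are hypothetical.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def mat_vec(r, v):
    """Apply a 3x3 rotation matrix (nested lists) to a 3-vector."""
    return tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3))

def is_consistent(R, t, tag_k_pos, tag_k_normal, plane_l_center, plane_l_normal,
                  max_normal_err_deg=10.0, max_trans_err=0.1):
    """Check whether tag k lands on plane l after applying (R, t)."""
    # Move tag k with the transform that aligns tag i with plane j.
    pos = tuple(a + b for a, b in zip(mat_vec(R, tag_k_pos), t))
    normal = mat_vec(R, tag_k_normal)
    # Normal error: angle between the moved tag normal and the plane normal.
    cos_angle = max(-1.0, min(1.0, dot(normal, plane_l_normal)))
    normal_err_deg = math.degrees(math.acos(cos_angle))
    # Translation error (assumed): distance from the tag to plane l.
    trans_err = abs(dot(plane_l_normal, sub(pos, plane_l_center)))
    return normal_err_deg < max_normal_err_deg and trans_err < max_trans_err

# Hypothetical example with an identity alignment transform.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero = (0.0, 0.0, 0.0)
on_floor = is_consistent(identity, zero, (1.0, 0.0, 0.02), (0.0, 0.0, 1.0),
                         (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
on_wall = is_consistent(identity, zero, (1.0, 0.0, 0.02), (0.0, 0.0, 1.0),
                        (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Each pair of hypotheses that passes such a check would contribute one edge to the consistency graph.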
Example Result
[Figure: planes and tags]
• The consistency graph contains 429,735 hypothesis pairs
• The maximum clique consists of 56 tag-plane correspondences and was found in 92 msec
• While the consistency graph contains many edges,
the max-clique can be found very efficiently [Rossi, 2015]
• Given the tag-plane correspondences, we estimate the tag-map transformation
by minimizing normal-to-normal ICP distance [Rusinkiewicz, 2019]
Estimation Refinement
• We refine the tag poses by directly aligning agile camera images with the map
[Figure: pipeline with VIO, tag detections, pose graph, and direct alignment]
• We use the normalized information distance (NID), a mutual-information-based
cross-modal metric, to maximize the co-occurrence of pixel and map intensity values
• Tag and camera poses are re-optimized under all the constraints
[Figure: agile camera image and the map rendered with the optimized camera pose]
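The NID metric can be illustrated with a small histogram-based computation. The sketch uses the standard definition NID(X, Y) = (H(X, Y) - I(X; Y)) / H(X, Y), where I(X; Y) = H(X) + H(Y) - H(X, Y); the bin count, the intensity range, and the function names are assumptions for illustration, and the slides do not describe how the authors evaluate or optimize the metric.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def nid(img_vals, map_vals, bins=8, max_val=256):
    """Normalized information distance between two paired intensity lists."""
    if len(img_vals) != len(map_vals) or not img_vals:
        raise ValueError("inputs must be non-empty and of equal length")
    width = max_val / bins
    # Quantize both intensity lists into histogram bins.
    a = [min(int(v / width), bins - 1) for v in img_vals]
    b = [min(int(v / width), bins - 1) for v in map_vals]
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    if h_ab == 0.0:
        return 0.0  # both signals are constant, so they carry no difference
    mutual_info = h_a + h_b - h_ab
    return (h_ab - mutual_info) / h_ab

# Hypothetical pixel and map intensities. Inverted intensities are still
# perfectly predictable from each other, so their NID is 0.
image = [0, 64, 128, 192]
inverted_map = [255 - v for v in image]
nid_inverted = nid(image, inverted_map)

# Statistically independent intensities give the maximum NID of 1.
nid_independent = nid([0, 0, 255, 255], [0, 255, 0, 255])
```

Because the score depends only on how consistently intensity values co-occur, and not on the values being equal, it tolerates the appearance gap between camera images and a rendered map; minimizing NID corresponds to maximizing that co-occurrence.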
Evaluation in Simulation
• The method is evaluated on the Replica dataset [Savva, 2019]
Global tag-map registration: 0.039 m / 1.021°
Tag localization accuracy: 98% success rate
Baseline (FPFH+RANSAC/Teaser): 26% and 70%
[Figure: robustness to outlier tags]
Evaluation in Real Environment
• 117 tags were placed in the environment
• Tag poses were estimated in 22 minutes (16 min for VIO recording, 6 min for post-processing)
• Average tag pose error: 0.019 m and 2.382°
Final estimation result
Thank you for your attention!!
Conclusion
• An accurate and scalable method for fiducial tag localization on a 3D prior
environmental map is proposed
• VIO-based tag relative pose estimation via landmark SLAM
• Global tag-map registration based on tag-plane correspondence estimation
via maximum clique finding
• Estimation refinement via NID-based direct camera-map alignment
• The proposed method could localize over 100 tags in 22 minutes
• The average tag localization error was about 2 cm

Scalable Fiducial Tag Localization on a 3D Prior Map via Graph-Theoretic Global Tag-Map Registration [IROS2022]
