Robot, Know Thyself — and Any Shape You Want to Be
Credit: Generated using Microsoft Designer


MIT’s Neural Jacobian Fields (NJF) teach robots and 3D systems bodily awareness from vision alone

What if your robot had no sensors, no prebuilt simulation, and no digital twin — and still learned how to move? What if the only thing it needed was a camera? That’s exactly what MIT CSAIL’s Neural Jacobian Fields (NJF) make possible.

NJF is a vision-driven framework that teaches machines — rigid, soft, or hybrid — how their bodies move and respond to control commands. Using only visual input, it learns a dense, differentiable internal model of the robot’s geometry and controllability. This isn’t just robotic control. It’s general-purpose bodily intelligence.


What Makes a Robot, Anyway?

Can you turn an IKEA lamp into a robot with a Raspberry Pi and some motors? Only if you can control it.

“Controllability is the minimum requirement for something to be called a robot,” says lead author Sizhe Lester Li.

But many robotic systems today — like soft hands, deformable limbs, or novel grippers — defy conventional control methods. They’re often cheap and capable but go unused because we lack general-purpose control software. NJF changes that.


The Breakthrough: Learning Jacobian Fields From Vision

NJF learns what traditional models cannot. It infers a Jacobian field — a spatial function that predicts how any part of a robot moves in response to small changes in control input.

  • Inspired by continuum mechanics and robotic kinematics
  • Learns via optical flow and RGB video
  • No access to joint angles, force sensors, or prebuilt 3D models
  • After training, supports real-time control from a single monocular camera at 12 Hz
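
To make this concrete, here is a minimal, hypothetical sketch (in PyTorch) of what a Jacobian field model can look like. It is not the authors’ implementation — the network size, feature encoder, and training details are assumptions — but it illustrates the core contract: a query point plus an image feature maps to a small Jacobian, so predicted motion is J(p) times the command change.

# Minimal sketch of a "Jacobian field" network (illustrative, not the authors' code).
# It maps a 3D query point (plus an image feature vector) to a 3-by-m Jacobian,
# so a small command change du produces a predicted point motion dx = J(p) @ du.
import torch
import torch.nn as nn

class JacobianField(nn.Module):
    def __init__(self, num_commands: int, feat_dim: int = 64, hidden: int = 256):
        super().__init__()
        self.num_commands = num_commands
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 * num_commands),  # flattened 3-by-m Jacobian per point
        )

    def forward(self, points: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        """points: (N, 3) query points, feats: (N, feat_dim) image features -> (N, 3, m)."""
        J = self.mlp(torch.cat([points, feats], dim=-1))
        return J.view(-1, 3, self.num_commands)

# Training signal (conceptually): match observed optical-flow motion against J(p) @ du,
# where du is the commanded actuation change; no joint encoders or 3D models are needed.
field = JacobianField(num_commands=8)
points = torch.rand(1024, 3)
feats = torch.rand(1024, 64)
du = torch.rand(8)
dx_pred = torch.einsum('nij,j->ni', field(points, feats), du)  # predicted per-point motion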

Once trained, NJF can:

  • Repose unseen characters (like Big Buck Bunny) using human data
  • Learn UV mappings of arbitrary meshes (no correspondences needed)
  • Replicate physical behaviors like ARAP deformations or collision-aware motion
  • Control pneumatic hands, rigid arms, and hybrid robots — even without sensors


The Architecture: A Spatialized Control Model

NJF isn’t just a neural controller — it’s a new modeling philosophy.

Rather than predicting motion directly, it predicts the system Jacobian across space. In simple terms: it figures out which commands control which parts of the body — much like a person discovering the controls of a new machine.
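
Written out (notation mine, consistent with the description above: $I$ is the current camera image, $p$ a point on the body, $\Delta u$ a small change in the actuation command), the local motion model and the control problem it implies are:

\Delta x(p) \;\approx\; J(p, I)\,\Delta u

\Delta u^{*} \;=\; \arg\min_{\Delta u} \sum_{k} \bigl\lVert J(p_k, I)\,\Delta u \;-\; \Delta x_{\mathrm{des}}(p_k) \bigr\rVert^{2}

The first line says the field predicts per-point motion linearly in the command change; the second says control reduces to a small least-squares problem over tracked points.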

Key properties:

  • Spatial locality: Motion effects are local and smooth
  • Compositionality: Each motor affects a specific region — NJF captures this cleanly
  • Invariance: Learns general principles, not just memorized motions

All of this is wrapped into a lightweight architecture — a fully differentiable pipeline combining image encoding, Jacobian prediction, and Poisson-based mesh deformation.
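
That pipeline also suggests how a closed-loop controller can be built once the field is trained: track points in the camera view, query their Jacobians, and solve the least-squares problem written earlier for the next command. The sketch below reuses the hypothetical JacobianField from the earlier example and simplifies to 3D points; the tracker, solver, and update rule are my assumptions, not the paper's exact pipeline.

# Illustrative closed-loop step (assumes the hypothetical JacobianField sketch above).
import torch

def control_step(field, points, feats, dx_desired):
    """points: (N, 3) tracked points, feats: (N, feat_dim) image features,
    dx_desired: (N, 3) desired per-point motions -> (m,) command change."""
    with torch.no_grad():
        J = field(points, feats)                        # (N, 3, m) per-point Jacobians
    A = J.reshape(-1, J.shape[-1])                      # stack into a (3N, m) linear system
    b = dx_desired.reshape(-1, 1)                       # (3N, 1) stacked desired motions
    du = torch.linalg.lstsq(A, b).solution.squeeze(-1)  # least-squares command update
    return du

# In a real loop this would repeat at camera rate (roughly 12 Hz in the reported system),
# re-tracking points and re-querying the field after every applied command.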


What It Means for Robotics

Traditional robots are over-engineered to fit brittle models. NJF removes that constraint, allowing for cheaper, more flexible, and morphologically diverse designs.

“This work points to a shift from programming robots to teaching them,” says Li. “And that opens doors to robotics that are more accessible, adaptable, and affordable.”

Imagine a future where you point your phone at a moving robot and it learns how to control itself from the footage: no sensors, no engineers required.


Beyond Robotics: 3D Learning for the Real World

NJF’s architecture isn’t just about robots — it’s a geometry engine for vision-based learning.

Whether in animation, virtual humans, simulation, or embodied AI, NJF brings new superpowers:

  • Deformation-aware motion transfer
  • Contact-aware dynamic modeling
  • UV parameterization without consistent topology
  • Learning from limited, even noisy data

It brings structure to perception, learning not just what things look like, but how they behave.


Real-World Results

🔹 Allegro Hand – Controlled without prior kinematic models

🔹 Pneumatic Soft Gripper – Controlled without sensors, only vision

🔹 3D-printed Arm – Learned motion from scratch using camera input

🔹 Re-posing Big Buck Bunny – Generalized from human mesh training

🔹 UV-mapping arbitrary meshes – Outperforms state-of-the-art without pre-alignments


Learn More

Read the papers:

1. Neural Jacobian Fields: Learning Intrinsic Mappings of Arbitrary Meshes

https://arxiv.org/abs/2205.02904

2. Controlling diverse robots by inferring Jacobian fields with deep networks

https://doi.org/10.1038/s41586-025-09170-0

Project Page, Code & Tutorials:

https://github.com/ThibaultGROUEIX/NeuralJacobianFields


The Future

NJF points to a robotics future that’s model-free, sensor-light, and visually grounded. It’s not just how machines will move; it’s how they’ll learn to move. We’re not building robots to match our models. We’re building models that learn to match the robot, whatever form it takes.


Follow MIT CSAIL for cutting-edge breakthroughs in machine perception, geometry, control, and AI.

References:

https://news.mit.edu/2025/vision-based-system-teaches-machines-understand-their-bodies-0724

https://arxiv.org/abs/2205.02904

https://dl.acm.org/doi/abs/10.1145/3528223.3530141

https://www.therobotreport.com/mit-vision-system-teaches-robots-to-understand-their-bodies/

https://sizhe-li.github.io/blog/2025/jacobian-fields-tutorial/

#Robotics #AI #MachineLearning #ComputerVision #EmbodiedIntelligence #SoftRobotics #Geometry #NeuralNetworks #MITCSAIL #NeuralJacobians
