Current AI struggles with spatial understanding, even as it excels at processing massive amounts of text and images, and this gap matters more than you might think. A new 3D benchmark reveals that while AI can untangle simple knots, it stumbles when asked to tie knots from loops or to convert one knot into another. The result is a sobering reminder: mastering language and visuals isn’t enough to master the physical world, where spatial reasoning and manipulation are essential for advances in robotics and real-world automation.
Researchers from Cornell Tech, led by doctoral student Zoe (Zizhao) Chen and associate professor Yoav Artzi, introduced KnotGym, a 3D simulator designed to probe how reinforcement learning agents and large language models (LLMs) handle manipulation tasks in a virtual 3D environment. KnotGym pairs a visual generalization test with a carefully structured “generalization ladder,” which lets scientists dial up knot complexity and assess whether an AI can transfer what it has learned beyond its initial training. Chen emphasizes that this kind of spatially aware testing is missing from many standard computer vision benchmarks.
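To make the “generalization ladder” idea concrete, here is a minimal sketch of how such an evaluation might be structured in code. The `KnotEnvStub` environment and all names below are hypothetical illustrations, not KnotGym’s actual API: the stub simply simulates success rates that degrade as crossing count rises, so the ladder loop has something to measure.

```python
# Hypothetical sketch of a "generalization ladder" evaluation:
# measure an agent's success rate at each rung (crossing count)
# to see where performance breaks down. KnotEnvStub is a toy
# stand-in, not the real KnotGym environment.
import random


class KnotEnvStub:
    """Toy environment whose success odds fall as crossings rise."""

    def __init__(self, crossings, seed=0):
        self.crossings = crossings
        self.rng = random.Random(seed)

    def run_episode(self, policy):
        # A real environment would step through observations and
        # actions; here we sample an outcome that degrades with
        # knot complexity.
        return self.rng.random() < max(0.0, 1.0 - 0.3 * (self.crossings - 1))


def evaluate_ladder(policy, max_crossings=4, episodes=200):
    """Return the per-rung success rate up the complexity ladder."""
    rates = {}
    for crossings in range(1, max_crossings + 1):
        env = KnotEnvStub(crossings, seed=crossings)
        wins = sum(env.run_episode(policy) for _ in range(episodes))
        rates[crossings] = wins / episodes
    return rates


rates = evaluate_ladder(policy=None)
print(rates)  # success rate declines as crossings increase
```

The point of the ladder is the comparison across rungs: a system with transferable spatial skill should degrade gracefully as crossings are added, while the results reported below show current models collapsing abruptly past a small complexity threshold.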
In the study, AI agents were shown simple loops and various knots and instructed to untie, tie, or convert them. Untying is where AI performed best: about 90% success on knots with up to four crossings, including a common shoelace knot with three crossings. By contrast, tying knots and converting between knot forms proved much harder. For tying, success rates dropped sharply as crossings were added: roughly 83% on two-crossing knots, then 16% on three-crossing knots, with AI stalling completely on knots of more than three crossings. Conversion tasks followed a similar pattern. These results suggest that current AI systems lack robust spatial exploration and incremental learning, traits that humans naturally develop through hands-on experimentation.
Chen draws a vivid analogy: when a child plays with a Rubik’s Cube, exploration and trial-and-error gradually reveal workable action sequences, with knowledge building upon prior experiences toward a broader goal. AI today often lacks this exploratory, cumulative approach, which may limit its capacity to discover novel strategies in the physical world.
Looking ahead, the team plans to accelerate KnotGym’s evaluation by running it on graphics processing units (GPUs). GPUs offer far greater parallel throughput than standard CPUs, enabling more rapid testing of complex scenarios and larger sets of models. Such improvements could help researchers more quickly probe where AI’s spatial reasoning breaks down and how to nurture more robust, transferable skills.
Funding for the project comes from multiple sources, including the National Science Foundation, Open Philanthropy, Nvidia Academic Grant, and the National Artificial Intelligence Research Resource (NAIRR) Pilot. Louis DiPietro contributed reporting for the Cornell Ann S. Bowers College of Computing and Information Science.