TidyNET
An end-to-end deep learning system for robotic workspace organization that uses diffusion-based visual planning to turn clutter into order.
Overview
TidyNET was developed with Professor Hod Lipson at the Creative Machines Lab (Columbia University) and bridges perception and action. The pipeline takes an image of a cluttered scene, uses a diffusion model to generate an image of a tidy arrangement of the same objects, detects objects with oriented bounding boxes, matches them between the two states, and executes the physical rearrangement on a robotic arm.
Key features
- Diffusion model that generates tidy target images conditioned on messy inputs, preserving object identity
- YOLOv11 detector with oriented bounding boxes for precise object pose estimation
- Color-based object matching via the Hungarian algorithm between messy and tidy states
- Collision-aware motion planner, with the full system reaching 70% end-to-end task completion on real hardware
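The color-based matching step can be sketched as a linear assignment problem. This is a minimal example, not the project's code: it assumes each detected object is summarized by a single representative color (for instance, the mean RGB inside its box) and uses Euclidean distance in RGB space as the cost. SciPy's `linear_sum_assignment` solves the same assignment problem the Hungarian algorithm solves; the helper name `match_by_color` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_by_color(messy_colors, tidy_colors):
    """Pair objects in the messy scene with objects in the tidy scene.

    messy_colors: (N, 3) array-like, one representative RGB color per messy object.
    tidy_colors:  (M, 3) array-like, one representative RGB color per tidy object.
    Returns (messy_index, tidy_index) pairs that minimize total color distance.
    """
    messy = np.asarray(messy_colors, dtype=float)
    tidy = np.asarray(tidy_colors, dtype=float)
    # Cost matrix: Euclidean distance between every messy/tidy color pair.
    cost = np.linalg.norm(messy[:, None, :] - tidy[None, :, :], axis=-1)
    # Optimal one-to-one assignment; rows come back sorted.
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]
```

Because `linear_sum_assignment` accepts rectangular cost matrices, any extra objects in the larger scene are simply left unmatched.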
Technical details
Diffusion model
- U-Net architecture with timestep conditioning and channel-wise attention
- Cosine noise schedule with 1000 timesteps for stable training
- Classifier-free guidance for improved generation quality
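The schedule and guidance items above can be illustrated with a short sketch. It assumes the widely used cosine schedule from the improved-DDPM formulation (offset s = 0.008, per-step betas clipped at 0.999); the project's exact constants are not stated here. `cfg_combine` shows the standard classifier-free guidance mix of a conditional prediction (here, conditioned on the messy input image) and an unconditional one. Both function names and the `guidance_scale` parameter are illustrative.

```python
import math

import numpy as np


def cosine_beta_schedule(T=1000, s=0.008, max_beta=0.999):
    """Per-step noise variances beta_1..beta_T for a cosine schedule.

    alpha_bar(t) = f(t) / f(0), with f(t) = cos^2(((t / T) + s) / (1 + s) * pi / 2).
    """
    t = np.arange(T + 1, dtype=np.float64) / T
    f = np.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    alpha_bar = f / f[0]                       # alpha_bar[0] == 1 (no noise)
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    # Clip to avoid beta = 1 at the final step, where alpha_bar reaches ~0.
    return np.clip(betas, 0.0, max_beta)


def cfg_combine(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward (and, for guidance_scale > 1, past) the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

With `guidance_scale = 1` the result is the plain conditional prediction; larger values trade diversity for closer adherence to the conditioning image. Some codebases write the same rule as `(1 + w) * eps_cond - w * eps_uncond`, which corresponds to `guidance_scale = 1 + w`.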
Robotic control
- Grid-based occupancy map at 2 cm resolution for collision detection
- Separating Axis Theorem for oriented-rectangle collision checks
- Priority-based object placement with temporary relocation when blocked
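The occupancy grid and the oriented-rectangle check can be sketched together. This assumes each object footprint is an oriented rectangle (center, width, height, yaw in radians) in table-frame meters, with the grid origin at (0, 0); all helper names are hypothetical. `sat_overlap` projects both rectangles onto every edge normal and reports a collision only if no separating axis exists, and `mark_occupied` conservatively marks each 2 cm cell whose square overlaps a footprint, reusing the same test.

```python
import math

import numpy as np

CELL = 0.02  # grid resolution in meters (2 cm)


def obb_corners(cx, cy, w, h, theta):
    """(4, 2) corners of a w x h rectangle centered at (cx, cy), rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    half = np.array([[w, h], [-w, h], [-w, -h], [w, -h]]) / 2.0
    rot = np.array([[c, -s], [s, c]])
    return half @ rot.T + np.array([cx, cy])


def sat_overlap(a, b):
    """Separating Axis Theorem for two convex polygons given as (N, 2) corner arrays."""
    for poly in (a, b):
        for i in range(len(poly)):
            edge = poly[(i + 1) % len(poly)] - poly[i]
            axis = np.array([-edge[1], edge[0]])  # normal to this edge
            pa, pb = a @ axis, b @ axis
            # Touching edges (equal extremes) are treated as not colliding.
            if pa.max() <= pb.min() or pb.max() <= pa.min():
                return False  # found a separating axis
    return True


def mark_occupied(grid, corners, cell=CELL):
    """Set every cell (row = y, col = x) whose square overlaps the polygon."""
    rows, cols = grid.shape
    j0, i0 = np.floor(corners.min(axis=0) / cell).astype(int)
    j1, i1 = np.floor(corners.max(axis=0) / cell).astype(int)
    for i in range(max(i0, 0), min(i1, rows - 1) + 1):
        for j in range(max(j0, 0), min(j1, cols - 1) + 1):
            square = np.array([[j, i], [j + 1, i], [j + 1, i + 1], [j, i + 1]]) * cell
            if sat_overlap(corners, square):
                grid[i, j] = True
    return grid
```

For two rectangles only four axes need testing (two edge normals each), but looping over every edge keeps the function valid for any convex polygon, including the axis-aligned grid cells.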
Results & performance
The YOLOv11-OBB detector achieves a detection success rate above 93% across the object counts tested, with angular error within ±3.2°. On the robot, grasp success is 85.5% and placement success is 74.2%. The diffusion model preserves object identity through the messy-to-tidy transformation, though complex scenes occasionally show visual artifacts. End-to-end, the system reaches a 70% task completion rate on the Interbotix WX200 platform.
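How the angular error was measured is not specified. For a non-square rectangle, orientations 180° apart describe the same box, so one reasonable way to compare a predicted and a reference angle is to wrap the difference by that period before taking its magnitude. A minimal sketch, with a hypothetical helper name:

```python
def obb_angle_error_deg(pred_deg: float, true_deg: float, period: float = 180.0) -> float:
    """Absolute angle difference in degrees, treating angles `period` apart as identical.

    Use period=180 for rectangles; a square footprint would need period=90.
    """
    d = (pred_deg - true_deg) % period  # Python's % returns a value in [0, period)
    return min(d, period - d)
```

For example, a prediction of 179° against a reference of 1° counts as a 2° error rather than 178°.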