SGS
Success-Guided Sampling

A Balanced Data Diet

Addressing the bottleneck in mega-scale RL for robot control.

One policy

One policy, every terrain.

A single network controls the robot across all terrains.

Training

No reward engineering.

A sparse success signal and generic regularizers. No demonstrations, no distillation.
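A minimal sketch of what such a reward could look like. It assumes a per-step 0/1 indicator that fires when the robot is within position and heading tolerances of the goal pose, plus two common generic regularizers (action-rate and torque penalties). The tolerances, coefficients, and the names `sparse_reward` and `RewardWeights` are hypothetical; the source does not say which regularizers are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical coefficients: the source names no specific regularizers.
    success: float = 1.0
    action_rate: float = 0.01
    torque: float = 1e-4


def sparse_reward(pos_err, yaw_err, action, prev_action, torques,
                  w=RewardWeights(), pos_tol=0.1, yaw_tol=0.2):
    """Return (reward, success) for one control step.

    The task term is a 0/1 indicator on reaching the goal pose; the other
    terms are generic smoothness/effort penalties, not task-specific shaping.
    """
    success = pos_err < pos_tol and abs(yaw_err) < yaw_tol
    action_rate = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    torque_sq = sum(t * t for t in torques)
    reward = (w.success * float(success)
              - w.action_rate * action_rate
              - w.torque * torque_sq)
    return reward, success
```

No term here rewards getting closer to the goal; adding something like `-pos_err` would be exactly the kind of dense shaping the recipe avoids.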

Goal

Only the goal pose.

On each terrain, the policy is given a single target pose at the far end.

Architecture

A Markovian MLP policy.

Four to eight layers, no transformer, no LSTM. The terrain is observed as a heightmap.
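A minimal sketch of a Markovian MLP policy of this shape, in plain Python with random (untrained) weights. The observation layout (6 proprioceptive values, an (x, y, yaw) goal, a 5x5 heightmap), the 12-dimensional action, the 64-unit width, four hidden layers, and tanh activations are all assumptions chosen for illustration; the helper names are hypothetical.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Random (untrained) weights for a plain feed-forward MLP."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # keeps pre-activations roughly unit-scale
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        bias = [0.0] * n_out
        layers.append((weights, bias))
    return layers


def mlp_forward(layers, x):
    """Map one observation vector to one action vector; no hidden state is carried."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:  # hidden layers use tanh; output layer stays linear
            x = [math.tanh(v) for v in x]
    return x


def build_observation(proprio, goal_pose, heightmap):
    """Concatenate current-step inputs: proprioception, goal pose, flattened heightmap."""
    flat_heights = [h for row in heightmap for h in row]
    return list(proprio) + list(goal_pose) + flat_heights


# Hypothetical sizes: 6 proprio values, (x, y, yaw) goal, 5x5 heightmap, 12 joints.
OBS_DIM = 6 + 3 + 5 * 5
policy = init_mlp([OBS_DIM, 64, 64, 64, 64, 12])  # four hidden layers
obs = build_observation([0.0] * 6, [1.0, 0.0, 0.0], [[0.0] * 5 for _ in range(5)])
action = mlp_forward(policy, obs)
```

Because the policy is Markovian, the same observation always yields the same action: everything it knows about the terrain must arrive through the current heightmap, not through recurrent memory.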

One run

Ten terrains, back to back.

One continuous run, no resets between them.

Same method

The same recipe works for manipulation.

Success-Guided Sampling trains contact-rich assembly the same way.

Manipulation

Contact-rich assembly.

A NIST taskboard task, trained with reinforcement learning and no demonstrations.

Method

How SGS works.

Task configurations are sampled according to the policy's current success rate on each, concentrating training on the ones it solves about half the time.
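A minimal sketch of success-guided sampling, assuming each task configuration has an exponential-moving-average success estimate and is sampled with weight p(1 - p) plus a small floor. That weighting is one simple choice that peaks at a 50% success rate; it is an assumption, not necessarily the authors' formula. The class name and the `ema` and `floor` parameters are hypothetical.

```python
import random


class SuccessGuidedSampler:
    """Hypothetical sketch: sample task configs by estimated success rate."""

    def __init__(self, num_configs, ema=0.1, floor=0.01, seed=0):
        self.p = [0.5] * num_configs  # unseen configs start as a coin flip
        self.ema = ema                # step size of the running success estimate
        self.floor = floor            # keeps solved/hopeless configs reachable
        self.rng = random.Random(seed)

    def weights(self):
        # p(1 - p) is largest at p = 0.5 and vanishes at p = 0 and p = 1.
        return [p * (1.0 - p) + self.floor for p in self.p]

    def sample(self, k):
        """Draw k config indices (with replacement) in proportion to weights()."""
        return self.rng.choices(range(len(self.p)), weights=self.weights(), k=k)

    def update(self, config_ids, successes):
        """Move each sampled config's estimate toward its latest 0/1 outcome."""
        for i, s in zip(config_ids, successes):
            self.p[i] += self.ema * (float(s) - self.p[i])


sampler = SuccessGuidedSampler(num_configs=4)
batch = sampler.sample(k=8)
# ... roll out the policy on each sampled config, then report outcomes:
sampler.update(batch, [True, False] * 4)
```

Configurations the policy always solves or always fails contribute little learning signal, so they are sampled rarely; the floor keeps them in rotation so their estimates can still change.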

Scaling

Past one million environments.

Success rate keeps rising as the number of parallel environments grows past one million, 16x more than prior work.

Real-world

On real hardware.

Policies transferred from simulation to physical robots.