The hill where the brake learned to behave
I roll my bike down a long hill and the speed creeps up. One hard grab of the brake could make the tyres skid, but no brake at all could end badly. So I do small squeezes, again and again, staying steady while still moving on.
That’s the same trouble a learning system can have. A learning system is just a set of rules that picks an action, then tweaks itself based on how that went. If the tweak is too big, it can suddenly get worse, like a brake yank turning a smooth ride into a wobble.
People tried to stop those big swerves, but some fixes felt heavy and awkward, like bolting on a fussy braking gadget. The new idea is simpler: put a built-in limit on each change, so the system can’t lurch too far in one go.
Here’s the neat bit. The system checks how much more or less it would pick an action now compared with before. If that shift tries to push past a small safe band, the extra push stops counting. Bike match: the shift is like a sudden jump in speed, and the cap is like only letting the brake lever move a little each squeeze. Takeaway: limit each step, keep your grip.
In motion, the system keeps a bundle of recent moments and practises on that same bundle more than once. Without the limiter, repeating the same bundle can tempt wild changes that look good right then, but go wrong after. With the cap, once the change gets too big, there’s no payoff for pushing harder, so it keeps the useful small gains.
There was another safety tool too: a pushback that grows when the new behaviour drifts too far, and gets tuned as it goes. That’s like a brake that fights your hand more and more, and you keep fiddling with it mid-hill. The capped squeeze tended to behave better when things got tricky and a skid was waiting.
By the time I reach the bottom, my hands have squeezed the brake plenty, but never yanked it. The bike stayed calm because each squeeze had a limit. The learning system does the same: it can reuse the same recent experience for several tidy passes, without any single pass shoving it away from what was working.