One glance at the screen, and the crowd makes sense

The station entrance is a blur on the guard’s wall of screens. People stream in, bags swing, coats flash past. The guard used to pause and zoom, corner by corner. Today the guard tries one wide look, then marks where the people and bags are, all at once.

That old stop-and-check habit matches how picture-finders used to work. They’d hunt through lots of little patches, or they’d guess loads of spots, then double-check each one. That approach could be careful, but it kept repeating the same work while the scene moved on.

The new routine is a single-look finder. It takes the whole picture in one go and gives two answers together: what the thing is, and where it sits. Like the guard’s new glance, it doesn’t keep rewinding and poking at each corner.

To stay organised, it treats the picture like a window split into big squares. Each square looks after anything whose middle lands inside it. Each square suggests a few box shapes, says how sure it feels, and names what it thinks it sees. Takeaway: split the view, then label and box in one sweep.
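The "which square looks after it" rule can be sketched in a few lines of Python. This is a minimal illustration, not the finder's actual code: the function name `owning_square`, the 7-by-7 grid and the 640x480 frame are all assumptions chosen for the example.

```python
def owning_square(centre_x, centre_y, image_w, image_h, grid=7):
    """Return (row, col) of the grid square that contains an object's middle.

    The grid size of 7 is an assumption for illustration only.
    """
    col = int(centre_x / image_w * grid)
    row = int(centre_y / image_h * grid)
    # A middle sitting exactly on the right or bottom edge
    # still belongs to the last square, so clamp the index.
    return min(row, grid - 1), min(col, grid - 1)


# A bag whose middle is at (500, 100) in a 640x480 frame.
print(owning_square(500, 100, 640, 480))  # -> (1, 5)
```

Only that one square is asked to box and label the bag, which is what keeps the single sweep tidy.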

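It also learns not to fuss over empty floor: most of the view is just background, so mistakes on empty squares count for less. Mistakes in drawing boxes around real things count for more, which pushes it to draw them neatly. It also stops big shapes from bossing everything, because a small miss on a big box is treated as less serious than the same miss on a small one. And when a square offers more than one box, the box that fits the real thing best takes charge of it, so over time each box slot settles into handling a particular kind of shape.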
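Those training nudges can be sketched roughly as follows. Everything here is an assumption made for illustration: the helper names, the two weights, and the simplification that a square's confidence should be 1 when it holds something and 0 when it is empty. Boxes are written as (left, top, right, bottom).

```python
import math

# Illustrative weights (assumptions): box mistakes count extra,
# confidence mistakes on empty squares count less.
BOX_WEIGHT = 5.0
EMPTY_WEIGHT = 0.5


def overlap(a, b):
    """Shared area divided by combined area of two boxes (0 = apart, 1 = identical)."""
    shared_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    shared_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    shared = shared_w * shared_h
    combined = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - shared
    return shared / combined if combined > 0 else 0.0


def box_in_charge(suggestions, real_box):
    """Index of the suggested box that fits the real thing best."""
    return max(range(len(suggestions)), key=lambda i: overlap(suggestions[i], real_box))


def size_penalty(guess_w, guess_h, real_w, real_h):
    """Compare square roots of sizes, so the same miss hurts less on a big box."""
    return BOX_WEIGHT * ((math.sqrt(guess_w) - math.sqrt(real_w)) ** 2
                         + (math.sqrt(guess_h) - math.sqrt(real_h)) ** 2)


def confidence_penalty(confidence, has_object):
    """Being wrongly sure about empty floor costs less than missing a real thing."""
    target = 1.0 if has_object else 0.0
    weight = 1.0 if has_object else EMPTY_WEIGHT
    return weight * (confidence - target) ** 2
```

With these definitions, being 10 pixels out on a 10-pixel-wide box is penalised far more than being 10 pixels out on a 100-pixel-wide one, which is the "big shapes don't boss everything" idea in miniature.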

The pay-off is speed that can keep up with moving video. Because it sees the whole scene at once, context also helps, so there are fewer silly moments where a shadow gets treated like a person. The trade is that boxes can land a bit wonky, especially when small things are packed close together, like a tight group at the barrier.

Later, the quick marker can work alongside a slower zoom-in checker. When both land on the same spot, the team trusts it more. When the boxes clash, the team can guess the kind of slip that happened. Watching the screens, the guard realises the wide glance isn’t perfect, but it stops the endless rewinding.
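One way to picture the two working side by side is to compare their boxes by how much they overlap. The sketch below is only an assumption about how such a check might look: the function names, the 0.5 agreement threshold and the three verdict labels are invented for illustration. Boxes are written as (left, top, right, bottom).

```python
def overlap(a, b):
    """Shared area divided by combined area of two boxes (0 = apart, 1 = identical)."""
    shared_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    shared_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    shared = shared_w * shared_h
    combined = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - shared
    return shared / combined if combined > 0 else 0.0


def compare(quick_boxes, checker_boxes, agree_at=0.5):
    """Label each quick box by how well the slower checker backs it up."""
    verdicts = []
    for quick in quick_boxes:
        # Best overlap with any checker box; 0.0 if the checker found nothing.
        best = max((overlap(quick, other) for other in checker_boxes), default=0.0)
        if best >= agree_at:
            verdicts.append("agree")        # same spot: trust it more
        elif best > 0.0:
            verdicts.append("loose fit")    # same area, but a box is off
        else:
            verdicts.append("unconfirmed")  # only one of them saw anything
    return verdicts


checker = [(0, 0, 10, 10)]
quick = [(1, 1, 10, 10), (8, 8, 20, 20), (50, 50, 60, 60)]
print(compare(quick, checker))  # -> ['agree', 'loose fit', 'unconfirmed']
```

A "loose fit" hints at a wonky box, while "unconfirmed" hints that one of the two may have mistaken background for a thing.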