The day the library stickers stopped feeling neutral

In the library’s back room, we had a table of fresh books and rolls of coloured stickers. Our job was simple: skim a few pages, then stick on “family friendly” or “needs caution”. The librarian said, “We’ll ask a few of you per book and go with the most common sticker.”

It sounded fair. More opinions should smooth out odd personal tastes. But then I noticed a snag: if a few volunteers are always harsher on certain authors or certain kinds of characters, those stickers aren’t random. They lean the same way, every time.

To check whether that lean shows up in real life, a research team gathered two big piles of past judgements where the right answers were already known. One pile came from justice-style case write-ups. The other was people tagging short online comments as toxic or not.

They could score each person two ways at once: how often the person matched the known right answer, and whether the person’s wrong calls landed harder on one group than another. The surprise was that some people were usually right, yet still skewed against a group.
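For the curious, here is a rough sketch of what scoring one volunteer "two ways at once" could look like. The exact measures the team used are not given here, so treat everything in it as an assumption: the function name, the toy data, and especially the choice to measure skew as the gap between the highest and lowest error rate across groups.

```python
from collections import defaultdict

def score_annotator(labels, truth, group):
    """Return (accuracy, skew) for one volunteer.

    labels, truth: dict mapping item -> sticker
    group: dict mapping item -> group name (e.g. author type)
    Skew here is an assumed measure: the gap between the highest and
    lowest per-group error rate.
    """
    correct = 0
    errors = defaultdict(int)
    totals = defaultdict(int)
    for item, label in labels.items():
        g = group[item]
        totals[g] += 1
        if label == truth[item]:
            correct += 1
        else:
            errors[g] += 1
    accuracy = correct / len(labels)
    rates = {g: errors[g] / totals[g] for g in totals}
    skew = max(rates.values()) - min(rates.values())
    return accuracy, skew

# Toy data: ten books, all truly "family friendly", five per group.
items = [f"book{i}" for i in range(10)]
truth = {item: "family friendly" for item in items}
group = {item: ("A" if i < 5 else "B") for i, item in enumerate(items)}

# This volunteer is right on every group A book but wrong on two group B books.
labels = dict(truth)
labels["book5"] = "needs caution"
labels["book6"] = "needs caution"

accuracy, skew = score_annotator(labels, truth, group)
print(accuracy, skew)
```

In this made-up case the volunteer is right eight times out of ten, yet every mistake falls on group B. That is the "usually right, still skewed" pattern in miniature.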

Then came the awkward bit for “most common sticker wins”. On many items, most of the people doing the labelling were the skewed ones, even when the test for skew was fairly gentle. So the final sticker could flip away from the correct one, simply because the tilted group had the numbers.
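A tiny sketch makes the flip concrete. The votes below are invented for illustration, and `majority_sticker` is a hypothetical helper, not anything taken from the study.

```python
from collections import Counter

def majority_sticker(votes):
    """Return the most common sticker among the votes."""
    return Counter(votes).most_common(1)[0][0]

correct_sticker = "family friendly"

# Three volunteers on one book; two of them lean harsh on this kind of author.
votes = ["needs caution", "needs caution", "family friendly"]

final_sticker = majority_sticker(votes)
print(final_sticker, final_sticker == correct_sticker)
```

With two tilted volunteers out of three, the book ends up marked “needs caution” even though the one volunteer who got it right is sitting at the same table.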

Sending the skewed volunteers home didn’t fix it. The stickers that remained were often less accurate, and plenty of books ended up with too few stickers to use. In the back room it felt like a choice between unfair speed and a trolley that never gets finished.

Even fancier ways of combining votes didn’t reliably wash out the skew. When computers were trained on these final stickers instead of the known-correct ones, they copied the wobble and the unevenness. Takeaway: if the stickers go on crooked, the recommendations will too.