Filtered for machine misunderstandings

07.32, Monday 11 Apr 2022


Xerox scanners used to have a bug that would silently replace numbers in the text of documents to make them compress better.

The bug was discovered by David Kriesel in 2013:

We got aware of the problem by scanning some construction plan last Wednesday and printing it again. Construction plans contain a boxed square meter number per room. Now in some rooms, we found beautifully lay-outed, but utterly wrong square meter numbers.

It’s to do with the image compression algorithm:

Images are cut into small segments, which are grouped by similarity. For every group only a representative segment is saved that gets reused instead of other group members, which may cause character substitution.

E.g. the 65 became an 85 (second column, third line).
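The failure mode is easy to reproduce in miniature. Here's a toy sketch (my own illustration, not Xerox's actual JBIG2 implementation): patches are deduplicated by pixel similarity, and if the similarity threshold is too loose, two distinct glyphs get silently merged into one representative.

```python
import numpy as np

def compress_patches(patches, threshold):
    """Lossy pattern-matching compression, JBIG2-style: each patch is
    replaced by the first stored representative within `threshold`
    differing pixels. Too loose a threshold merges distinct glyphs."""
    representatives = []
    decoded = []
    for p in patches:
        match = next((r for r in representatives
                      if np.sum(r != p) <= threshold), None)
        if match is None:
            representatives.append(p)
            match = p
        decoded.append(match)
    return decoded, representatives

# Two tiny 3x3 "glyphs" differing by a single pixel, standing in
# for a 6 and an 8.
six   = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 1]])
eight = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])

# With a loose threshold, the 8 is silently decoded as a 6:
decoded, reps = compress_patches([six, eight], threshold=2)
assert np.array_equal(decoded[1], six)
assert len(reps) == 1

# With an exact-match threshold, both glyphs survive:
_, reps_strict = compress_patches([six, eight], threshold=0)
assert len(reps_strict) == 2
```

The point of the sketch is that nothing errors: the compressed document is perfectly legible, just wrong.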

Invoices, engineering plans… scanners and copiers in the Xerox WorkCentre line had this bug, undetected, for 8 years.

For PDFs that were scanned with the named Xerox devices during the last 8 years, it cannot be proven what characters were on the original sheet of paper at the places that are now defined by reused patches.

(See that link for Kriesel’s full write-up.)

It would be neat to trigger this deliberately… a document that, when scanned, turns into something else.

Maliciously: could I do wet signatures on a contract with a company that, when the agreement is scanned to accounts payable, leads them to transfer me a different amount of money?

SEE ALSO: that time I got my hands on a real-life ugly shirt, a ridiculous-looking garment that magically renders the wearer invisible to CCTV.


The Einstein-Marilyn optical illusion is a photograph that shows the face of Albert Einstein when the image is large or viewed close up, and Marilyn Monroe when it is small or seen from a distance – it’s to do with spatial frequency.

See the illusion at the Mind Hacks blog, which also explains how it works.

High spatial frequency changes mean lots of small detail. …

Depending on distance, different spatial frequencies are easier to see, and if those spatial frequencies encode different information then you can make a hybrid image which switches as you alter your distance from it.

References provided.
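The recipe behind a hybrid image is just frequency splitting: low-pass the "far" image, high-pass the "near" image, and add them. A minimal sketch using scipy's Gaussian filter (the sigma value is my own arbitrary choice, not from the original illusion):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid_image(far_img, near_img, sigma=6.0):
    """Blend two grayscale images by spatial frequency: the low
    frequencies of far_img dominate at a distance, the high
    frequencies of near_img dominate close up."""
    far = far_img.astype(float)
    near = near_img.astype(float)
    low = gaussian_filter(far, sigma)            # keep coarse structure only
    high = near - gaussian_filter(near, sigma)   # keep fine detail only
    return low + high

# Sanity check: hybridising an image with itself reconstructs it,
# because low-pass + (original - low-pass) = original.
img = np.arange(64, dtype=float).reshape(8, 8)
assert np.allclose(hybrid_image(img, img), img)
```

Viewed from far away (or shrunk), your eye only resolves the low frequencies, so the `far_img` wins; up close, the high-frequency detail of `near_img` dominates.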


Excel is a behemoth in the spreadsheet world and is regularly used by scientists to track their work and even conduct clinical trials. But its default settings were designed with more mundane applications in mind, so when a user inputs a gene’s alphanumeric symbol into a spreadsheet, like MARCH1 – short for “Membrane Associated Ring-CH-Type Finger 1” – Excel converts that into a date: 1-Mar.

This is extremely frustrating, even dangerous, corrupting data that scientists have to sort through by hand to restore. It’s also surprisingly widespread and affects even peer-reviewed scientific work. One study from 2016 examined genetic data shared alongside 3,597 published papers and found that roughly one-fifth had been affected by Excel errors.

AND SO, over the past year or so, some 27 human genes have been renamed, all because Microsoft Excel kept misreading their symbols as dates.
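You can screen a gene list for the at-risk symbols before they ever touch a spreadsheet. A rough heuristic sketch (mine, not Excel's actual parsing rules): flag anything shaped like a month word followed by a plausible day number. The new names MARCHF1 and SEPTIN2 in the comments are the real HGNC replacements.

```python
import re

# Month spellings that appear in affected gene symbols (MARCH*, SEPT*, ...).
MONTH_WORDS = {"JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY",
               "JUN", "JUNE", "JUL", "JULY", "AUG", "SEP", "SEPT",
               "OCT", "NOV", "DEC"}

def excel_date_risk(symbol):
    """Flag symbols shaped like <month word><day number>, which a
    date-coercing spreadsheet will silently convert to a date."""
    m = re.fullmatch(r"([A-Za-z]+)(\d{1,2})", symbol)
    if not m:
        return False
    word, day = m.group(1).upper(), int(m.group(2))
    return word in MONTH_WORDS and 1 <= day <= 31

assert excel_date_risk("MARCH1")       # becomes 1-Mar; renamed MARCHF1
assert excel_date_risk("SEPT2")        # becomes 2-Sep; renamed SEPTIN2
assert not excel_date_risk("BRCA1")    # left alone
```

The more robust fix, of course, is to import gene columns as text in the first place, but the renaming acknowledges that nobody reliably does.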

Humans have a bunch of feedback loops to prevent signal degradation in communication: we have a moral code against factual inaccuracy (you’re either a liar or unreliable, depending on intent, and our morality is constructed to see both as failings); the legal system has “protected characteristics” – you are not allowed to pejoratively stereotype when it comes to gender, race, and so on. Stereotyping is a disallowed form of data compression. (The feedback loops adapt over time to handle changes in the machinery of inter-person signal transmission and the data for which we demand high fidelity.)

What is the equivalent for machines? Is Excel a liar? How should an application be made to feel ashamed?


iPhones are no longer cameras in the traditional sense. Instead, they are devices at the vanguard of “computational photography,” a term that describes imagery formed from digital data and processing as much as from optical information.

How the camera works depends on what (it thinks) it’s looking at.

when a user takes a photograph with the newest iPhones, the camera creates as many as nine frames with different levels of exposure. Then a “Deep Fusion” feature, which has existed in some form since 2019, merges the clearest parts of all those frames together, pixel by pixel, forming a single composite image. … The iPhone camera also analyzes each image semantically, with the help of a graphics-processing unit, which picks out specific elements of a frame–faces, landscapes, skies–and exposes each one differently.

Do some faces trigger a stronger spike than others? What faces are loved by the iPhone camera? What machine misinterpretations are being laid down in the historical records, and how may we discover them?

Could you wear a shirt that targets the iPhone sunset detector, fooling the computational camera into boosting your colour saturation? That would help you pop in a crowd.
