The Engineering Behind the Art: A Technical Paper

Every Diamojism piece begins as a photograph. It ends as something else entirely — a large-format print made up of thousands of individually selected emoji, each chosen by a system I built from the ground up. I've been asked many times how that system works. This paper is my answer.

"Diamojism: Architecture of a Perceptual Emoji Mosaic System" is available as a free PDF download. It's written for technically minded readers but introduces every concept before applying it, so no prior background in image processing is required.

What the system actually does

The rendering system works in two stages. The first happens once, before any artwork begins: it analyzes the entire Noto Emoji open-source library — over 100,000 glyphs — extracting color signatures, edge characteristics, and spatial structure for every one, then organizes them into a fast search index. This is the foundation that makes real-time rendering possible.
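To make the idea of a search index concrete, here is a minimal sketch of a k-d tree (one of the structures the paper cites) built over a single color value per emoji. It is an illustration under stated assumptions, not the system's actual code: the emoji names and L*a*b* values are made up, and the real index presumably stores richer features than one color.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    point: tuple                 # feature vector, here a made-up (L, a, b) color
    name: str                    # emoji identifier (hypothetical)
    axis: int                    # dimension this node splits on
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(items, depth=0):
    """Build a k-d tree from a list of (name, point) pairs."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    name, point = items[mid]
    return Node(point, name, axis,
                build_kdtree(items[:mid], depth + 1),
                build_kdtree(items[mid + 1:], depth + 1))


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def nearest(node, target, best=None):
    """Return (squared_distance, name) of the stored point closest to target."""
    if node is None:
        return best
    d = sq_dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.name)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only search the far side if the splitting plane is closer than the best hit.
    if diff ** 2 < best[0]:
        best = nearest(far, target, best)
    return best


# Illustrative values only; not measured from the real emoji set.
LIBRARY = [
    ("red_heart", (50.0, 70.0, 50.0)),
    ("blue_heart", (40.0, 20.0, -60.0)),
    ("sun", (85.0, 5.0, 80.0)),
    ("leaf", (55.0, -45.0, 45.0)),
]
index = build_kdtree(LIBRARY)
print(nearest(index, (52.0, 65.0, 45.0))[1])
```

The tree is built once and reused, which is why an index of this kind belongs in the first stage: each later lookup can skip most of the library instead of comparing against every glyph.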

The second stage happens fresh for every piece. The source photograph is prepared, resized, and analyzed for edge structure. The system then uses machine-learning face detection to locate faces and map individual features — eyes, mouth — because faces need more care than backgrounds, and eyes need more care than anything else. That importance map drives an adaptive grid: fine tiles over the eyes and face, larger tiles over quieter regions. Then, for every tile, the system finds the emoji that best fits it — weighing color, edge density, and spatial structure simultaneously — while nudging toward variety so the same emoji doesn't dominate the canvas.
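The matching step can be sketched as a weighted cost with a small penalty for emoji that have already been used. Everything specific here is an assumption made for illustration: the feature names (`lab`, `edge`, `quads`), the weights, the penalty value, and the linear form of the penalty are hypothetical, and the paper may combine the terms differently.

```python
import math
from collections import Counter


def match_cost(tile, emoji, weights):
    """Weighted sum of color, edge-density and spatial-structure differences."""
    w_color, w_edge, w_struct = weights
    color = math.dist(tile["lab"], emoji["lab"])       # mean color difference
    edge = abs(tile["edge"] - emoji["edge"])           # edge-density difference
    struct = math.dist(tile["quads"], emoji["quads"])  # per-quadrant lightness
    return w_color * color + w_edge * edge + w_struct * struct


def pick_emoji(tile, library, usage, weights=(1.0, 20.0, 0.5), variety=2.0):
    """Choose the lowest-cost emoji, charging extra for each previous use."""
    def total(name):
        return match_cost(tile, library[name], weights) + variety * usage[name]

    best = min(library, key=total)
    usage[best] += 1
    return best


# Two near-identical yellows (made-up feature values).
library = {
    "sun": {"lab": (85, 5, 80), "edge": 0.2, "quads": (85, 85, 85, 85)},
    "star": {"lab": (84, 6, 78), "edge": 0.2, "quads": (84, 84, 84, 84)},
}
tile = {"lab": (85, 5, 80), "edge": 0.2, "quads": (85, 85, 85, 85)}

usage = Counter()
picks = [pick_emoji(tile, library, usage) for _ in range(3)]
print(picks)
```

With these numbers the exact match wins the first two identical tiles, and the accumulated penalty then tips the third toward the close alternative. That is the nudge toward variety in miniature.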

Why the engineering choices matter

One decision shaped almost everything else: using a perceptually uniform color space rather than standard RGB throughout the entire pipeline. In ordinary RGB, equal numerical distances don't correspond to equal perceived differences in color. The system uses CIE L*a*b* instead, where the math aligns with how we actually see. It sounds technical. The practical effect is that the emoji the system selects are the ones that genuinely look right, not just the ones that are numerically close.
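A short sketch shows the difference in practice. It converts 8-bit sRGB to CIE L*a*b* using the standard formulas with a D65 white point, then measures color difference as plain Euclidean distance in that space (the CIE76 formula). Which difference formula the system actually uses is not stated here, so treat CIE76 as an assumption.

```python
import math


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point)."""
    def linearize(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)

    # Linear sRGB -> CIE XYZ
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in L*a*b*."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))


gray = (100, 100, 100)
greener = (100, 140, 100)   # +40 on the green channel
bluer = (100, 100, 140)     # +40 on the blue channel

# Both shifts are the same distance from gray in RGB...
print(math.dist(gray, greener), math.dist(gray, bluer))
# ...but not in L*a*b*, where the green shift is the larger change.
print(delta_e(gray, greener), delta_e(gray, bluer))
```

The two shifted colors are equally far from gray in RGB, yet the L*a*b* distances differ. A matcher working in RGB would treat them as equally good substitutes; one working in L*a*b* would not.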

The choice to use perceptual color science rather than standard RGB isn't a technical detail — it's the reason the finished mosaics look convincing rather than merely approximate. The system is tuned to match what the human eye actually sees, not what the numbers say.

Another choice: the face and eye detection isn't decoration. It's load-bearing. Without it, the system would treat the background and the subject's eyes as equally important. With it, the finest tiles — and the most careful emoji selection — concentrate exactly where a viewer's attention lands first.
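One simple way to turn an importance map into an adaptive grid is recursive subdivision: keep splitting a square into four until it is as small as the most important region it touches requires. The sketch below assumes this quadtree approach, and the region boxes and target tile sizes are invented stand-ins for a face detector's output. The actual grid construction may differ.

```python
# Hypothetical detector output: ((x0, y0, x1, y1), target_tile_size)
REGIONS = [
    ((4, 4, 28, 28), 8),     # face box: medium tiles
    ((8, 10, 14, 13), 4),    # left eye: finest tiles
    ((18, 10, 24, 13), 4),   # right eye: finest tiles
]


def target_size(x, y, size, regions, default):
    """Smallest target tile size among the regions this square overlaps."""
    best = default
    for (x0, y0, x1, y1), target in regions:
        overlaps = x < x1 and x + size > x0 and y < y1 and y + size > y0
        if overlaps:
            best = min(best, target)
    return best


def adaptive_grid(x, y, size, regions, default=32):
    """Return a list of (x, y, size) tiles covering the square at (x, y)."""
    if size <= target_size(x, y, size, regions, default):
        return [(x, y, size)]
    half = size // 2
    tiles = []
    for dy in (0, half):
        for dx in (0, half):
            tiles += adaptive_grid(x + dx, y + dy, half, regions, default)
    return tiles


tiles = adaptive_grid(0, 0, 64, REGIONS)
sizes = sorted({s for _, _, s in tiles})
print(len(tiles), sizes)
```

On this 64-by-64 canvas the three quadrants with no face in them stay as single large tiles, while the quadrant containing the face is split down to the finest size around the eyes.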

The paper

The full paper covers both pipeline stages in detail, with six original technical diagrams. It includes citations to the foundational algorithms involved — Sobel edge detection, K-Means clustering, k-d tree indexing, Jensen–Shannon divergence, and others — and explains each in plain language before applying them.
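As a taste of what one of these looks like in code, here is a minimal sketch of Jensen–Shannon divergence, a symmetric, bounded measure of how different two probability distributions are. This page does not say what the system compares with it, so the histograms below are an assumption chosen purely for illustration.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical 4-bin normalized histograms (each sums to 1).
tile_hist = [0.5, 0.3, 0.2, 0.0]
emoji_hist = [0.4, 0.4, 0.1, 0.1]
print(js_divergence(tile_hist, emoji_hist))
```

Unlike plain Kullback-Leibler divergence, this measure stays finite when a bin is empty in one histogram, which makes it convenient for comparing small image patches.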
