The Engineering Behind the Art: A Technical Paper
Every Diamojism piece begins as a photograph. It ends as something else entirely — a large-format print made up of thousands of individually selected emoji, each chosen by a system I built from the ground up. I've been asked many times how that system works. This paper is my answer.
What the system actually does
The rendering system works in two stages. The first happens once, before any artwork begins: it analyzes the entire Noto Emoji open-source library — over 100,000 glyphs — extracting color signatures, edge characteristics, and spatial structure for every one, then organizes them into a fast search index. This is the foundation that makes real-time rendering possible.
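As a rough illustration of what a "fast search index" over color signatures can look like, here is a minimal k-d tree sketch in plain Python. It is not the system's actual code: the `Node`, `build`, and `nearest` names, the reduction of each glyph to a single mean L*a*b* color, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    point: tuple[float, float, float]  # assumed signature: mean (L, a, b) of a glyph
    name: str                          # glyph identifier (hypothetical)
    axis: int                          # dimension this node splits on
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items, depth=0):
    """Recursively build a k-d tree from (name, (L, a, b)) pairs."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    name, point = items[mid]
    return Node(point, name, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(node, target, best=None):
    """Return (squared distance, name) of the stored point closest to target."""
    if node is None:
        return best
    d = sq_dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.name)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff ** 2 < best[0]:
        best = nearest(far, target, best)
    return best


# Illustrative values only, not measured from real glyphs.
signatures = [
    ("red_heart", (50.0, 70.0, 50.0)),
    ("green_leaf", (60.0, -50.0, 50.0)),
    ("blue_circle", (40.0, 10.0, -60.0)),
    ("sun", (85.0, 5.0, 80.0)),
    ("cloud", (95.0, 0.0, 0.0)),
]
tree = build(signatures)
print(nearest(tree, (58.0, -45.0, 45.0)))
```

The tree is built once and queried many times, which is the point of doing the analysis up front: each lookup can skip whole branches instead of comparing the tile against every glyph.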
The second stage happens fresh for every piece. The source photograph is prepared, resized, and analyzed for edge structure. The system then uses machine-learning face detection to locate faces and map individual features — eyes, mouth — because faces need more care than backgrounds, and eyes need more care than anything else. That importance map drives an adaptive grid: fine tiles over the eyes and face, larger tiles over quieter regions. Then, for every tile, the system finds the emoji that best fits it — weighing color, edge density, and spatial structure simultaneously — while nudging toward variety so the same emoji doesn't dominate the canvas.
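The matching step described above can be sketched as a weighted cost plus a penalty that grows each time a glyph is reused. This is only an illustration under stated assumptions: the `Signature` fields, the weights, and the linear usage penalty are hypothetical choices for the example, not the values or the exact formula the system uses.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    lab: tuple[float, float, float]               # mean color in L*a*b*
    edge_density: float                           # fraction of edge pixels, 0..1
    quadrants: tuple[float, float, float, float]  # mean lightness per quadrant


def cost(tile, glyph, w_color=1.0, w_edge=20.0, w_struct=0.5):
    """Weighted mismatch between a tile and a glyph (weights are arbitrary)."""
    color = math.dist(tile.lab, glyph.lab)
    edge = abs(tile.edge_density - glyph.edge_density)
    struct = math.dist(tile.quadrants, glyph.quadrants)
    return w_color * color + w_edge * edge + w_struct * struct


def pick(tile, library, usage, variety_weight=2.0):
    """Choose the cheapest glyph after adding a penalty for past use."""
    def penalized(name):
        return cost(tile, library[name]) + variety_weight * usage[name]
    best = min(library, key=penalized)
    usage[best] += 1
    return best


flat = (50.0, 50.0, 50.0, 50.0)
library = {
    "grinning": Signature((50.0, 0.0, 0.0), 0.2, flat),
    "smiley": Signature((51.0, 0.0, 0.0), 0.2, flat),
    "black_circle": Signature((0.0, 0.0, 0.0), 0.2, flat),
}
tile = Signature((50.0, 0.0, 0.0), 0.2, flat)

usage = Counter()
picks = [pick(tile, library, usage) for _ in range(4)]
print(picks)
```

With four identical tiles in a row, the two near-equivalent glyphs alternate rather than one repeating, while the poor match is never chosen. A small penalty trades a little accuracy for variety; a large one would start forcing bad matches.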
Why the engineering choices matter
One decision shaped almost everything else: using a perceptually uniform color space, rather than standard RGB, throughout the entire pipeline. In ordinary RGB, equal numerical distances don't correspond to equal perceived differences in color. The system uses CIE L*a*b* instead, where numerical distance tracks perceived difference far more closely. It sounds technical. The practical effect is that the emoji the system selects are the ones that genuinely look right, not just the ones that are numerically close in RGB.
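To make the color-space point concrete, here is a small sketch of the standard sRGB to CIE L*a*b* conversion (D65 white point) with the simplest color-difference measure, CIE76, which is plain Euclidean distance in L*a*b*. The function names are chosen for this example, and whether the system uses CIE76 or a later difference formula is not stated here, so treat that part as an assumption.

```python
import math


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 reference white)."""
    # 1. Undo the sRGB transfer curve to get linear light.
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)

    # 2. Linear RGB to CIE XYZ using the sRGB primaries.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    # 3. XYZ to L*a*b*, normalized by the D65 white point.
    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e(lab1, lab2):
    """CIE76 color difference: Euclidean distance in L*a*b*."""
    return math.dist(lab1, lab2)


black, green, blue = (0, 0, 0), (0, 255, 0), (0, 0, 255)

# Both pairs are exactly the same distance apart in RGB...
print(math.dist(black, green), math.dist(black, blue))
# ...but not in L*a*b*.
print(delta_e(srgb_to_lab(black), srgb_to_lab(green)),
      delta_e(srgb_to_lab(black), srgb_to_lab(blue)))
```

Pure green and pure blue sit at identical RGB distances from black, yet their L*a*b* distances differ, largely because green is far lighter than blue. A matcher working in RGB cannot see that difference.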
Another choice: the face and eye detection isn't decoration. It's load-bearing. Without it, the system would treat the background and the subject's eyes as equally important. With it, the finest tiles — and the most careful emoji selection — concentrate exactly where a viewer's attention lands first.
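One common way to turn an importance map into an adaptive grid is quadtree subdivision: keep splitting a square until it is small enough for the most important pixel it contains. The sketch below assumes a square, power-of-two importance map with values from 0 to 1, and the `target_size` thresholds are invented for the example. It shows the general technique, not necessarily the grid algorithm the system uses.

```python
def target_size(importance_value):
    """Largest tile allowed for a given importance (hypothetical thresholds)."""
    if importance_value >= 0.9:
        return 1  # eyes: finest tiles
    if importance_value >= 0.5:
        return 2  # rest of the face
    return 4      # background


def subdivide(importance, x, y, size):
    """Yield (x, y, size) tiles, splitting wherever the region is too coarse."""
    peak = max(importance[r][c]
               for r in range(y, y + size)
               for c in range(x, x + size))
    if size <= target_size(peak):
        yield (x, y, size)
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            yield from subdivide(importance, x + dx, y + dy, half)


# Toy 8x8 importance map: background 0.0, a 4x4 "face" at 0.6,
# and a 2x2 "eye" at 1.0 in its top-left corner.
imp = [[0.0] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4):
        imp[r][c] = 0.6
for r in range(2):
    for c in range(2):
        imp[r][c] = 1.0

tiles = list(subdivide(imp, 0, 0, 8))
for t in tiles:
    print(t)
```

On this toy map the eye region ends up as four 1x1 tiles, the rest of the face as three 2x2 tiles, and the background as three 4x4 tiles. Remove the importance map and every region would get the same coarse tile, which is the failure mode described above.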
The paper
The full paper covers both pipeline stages in detail, with six original technical diagrams. It includes citations to the foundational algorithms involved — Sobel edge detection, K-Means clustering, k-d tree indexing, Jensen–Shannon divergence, and others — and explains each in plain language before applying them.