AI, Claude, and the Making of Diamojism

May 19

People who learn that Diamojism is software-driven often ask about AI — whether it plays a role, and if so, how. It's a fair and interesting question, and one I'm happy to answer in some detail. The short version: yes, AI is part of the picture, but perhaps not in the way you'd expect.

Diamojism uses AI in two specific ways. First, a computer vision library called MediaPipe identifies faces and eyes in the source photograph so the system knows where to concentrate detail. Second, Claude and Claude Code have served as technical collaborators: Claude early on, to help build and debug the rendering pipeline, and Claude Code later, as an agent that runs render iterations and translates my visual observations into parameter adjustments. Every aesthetic decision is mine. The AI handles precision and iteration.

How Diamojism works

Diamojism takes a source photograph and reconstructs it entirely from emoji glyphs, placed on a diamond-oriented grid and rendered for large-format print. The system's job, for each of the thousands of tiles that make up a finished piece, is to find the emoji that best matches that region of the photograph — in color, in visual structure, and in the spatial arrangement of its detail.

A Diamojism print at distance reads as a portrait; up close, it resolves into a surface of thousands of individual emoji. 

The duality that results — a coherent image that dissolves into symbols as you approach it — is the deliberate artistic heart of the medium. Everything in the system's design exists to give me precise control over how that works for each piece.

Where machine learning comes in: finding the face

Within the rendering pipeline itself, machine learning appears in exactly one place: working out the spatial importance of a portrait, meaning where in the image a viewer's attention naturally concentrates. For this, the system uses MediaPipe, an open-source computer vision library developed by Google.

MediaPipe contributes two pre-trained models. The first detects faces and draws a bounding box around each one — the same kind of rectangle your phone camera draws on a face when you're about to take a photo. The second goes much further, fitting a 468-point geometric skeleton to each detected face and precisely locating individual features — each eye, the mouth, the jaw. Encoding that geometry by hand for every portrait would be impractical; MediaPipe supplies it with a precision that would otherwise be out of reach.

From this geometry, the system builds an importance map — a heat map with separate layers for the broad face, each eye, and the mouth — that reflects what vision research consistently shows: when we look at a portrait, our gaze goes to the eyes first and stays there longest. The importance map feeds two downstream decisions: how finely to subdivide the tile grid in each region, and how to weight the scoring criteria when evaluating emoji candidates there. Faces get finer tiles; eyes get the finest of all.
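To make the idea concrete, here is a minimal sketch of a layered importance map, not the pipeline's actual code. It assumes the face, eye, and mouth landmarks have already been reduced to bounding boxes; the Gaussian falloff, the layer weights, the per-pixel maximum, and the tile sizes of 32 and 8 pixels are all illustrative assumptions.

```python
import math

def gaussian_layer(width, height, box, weight):
    """Return a height x width grid that peaks at `weight` in the box centre."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Assumption: half the box size is used as the spread of the falloff.
    sx, sy = max((x1 - x0) / 2, 1e-6), max((y1 - y0) / 2, 1e-6)
    return [
        [weight * math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def importance_map(width, height, layers):
    """Combine layers by taking the per-pixel maximum (an assumed rule)."""
    grid = [[0.0] * width for _ in range(height)]
    for box, weight in layers:
        layer = gaussian_layer(width, height, box, weight)
        for y in range(height):
            for x in range(width):
                grid[y][x] = max(grid[y][x], layer[y][x])
    return grid

def tile_size_for(importance, coarse=32, fine=8):
    """Map importance in [0, 1] to a tile edge length: more important, smaller tile."""
    importance = min(max(importance, 0.0), 1.0)
    return round(coarse - importance * (coarse - fine))

# Hypothetical boxes (x0, y0, x1, y1) for a 100 x 100 portrait.
layers = [
    ((20, 10, 80, 90), 0.5),  # broad face
    ((30, 30, 45, 40), 1.0),  # one eye
    ((55, 30, 70, 40), 1.0),  # other eye
    ((40, 65, 60, 75), 0.7),  # mouth
]
imap = importance_map(100, 100, layers)
eye_tile = tile_size_for(imap[35][37])    # near an eye centre
corner_tile = tile_size_for(imap[0][0])   # far from the face
```

In this sketch the eye region ends up with a much smaller tile size than the image corner, which is the behaviour described above: faces get finer tiles, and eyes get the finest.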

How each emoji is chosen

The actual selection of an emoji for each tile is deterministic mathematics. Every candidate is scored on three criteria, and the scores are combined using weights I set for each project.
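As a rough illustration of that combination step, the sketch below takes three per-criterion scores and a set of weights and returns a single number. The names `Weights`, `combined_score`, and `best_candidate`, the 0-to-1 score range, and the weighted-average formula are assumptions made for the example; the real pipeline may combine its scores differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Weights:
    color: float
    edge_density: float
    edge_layout: float

def combined_score(color, edge_density, edge_layout, w):
    """Weighted average of three scores, each assumed to lie in [0, 1] (1 is best)."""
    total = w.color + w.edge_density + w.edge_layout
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = (w.color * color
                + w.edge_density * edge_density
                + w.edge_layout * edge_layout)
    return weighted / total

def best_candidate(candidates, w):
    """candidates maps an emoji name to its (color, edge_density, edge_layout) scores.

    Names are sorted first so that ties always resolve the same way.
    """
    return max(sorted(candidates), key=lambda name: combined_score(*candidates[name], w))

# Hypothetical scores for two candidates on one tile.
candidates = {"sun": (0.9, 0.4, 0.5), "wave": (0.6, 0.9, 0.8)}
color_heavy = best_candidate(candidates, Weights(3, 1, 1))
structure_heavy = best_candidate(candidates, Weights(1, 1, 2))
```

With the same candidates, a color-heavy weighting picks "sun" and a structure-heavy weighting picks "wave". Nothing about the scores changes between the two runs; only the balance does.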

Color match is computed in a perceptually uniform color space — one where equal numerical distances correspond to equal perceived differences in color. This makes the system's color decisions more predictable and artistically consistent than working in standard RGB, where the same numerical shift can look very different in different parts of the color range.
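The post does not name the color space, so the sketch below assumes CIELAB with a D65 white point and a plain Euclidean distance (often called Delta E 1976). It is meant only to show why the choice matters: the same 40-step shift in RGB produces a different perceptual distance in dark grays than in light grays.

```python
import math

def _linearize(channel_8bit):
    """Undo the sRGB transfer curve for one 0-255 channel."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    r, g, b = (_linearize(v) for v in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e(rgb_a, rgb_b):
    """Euclidean distance between two colors in CIELAB."""
    return math.dist(srgb_to_lab(rgb_a), srgb_to_lab(rgb_b))

# The same numerical RGB shift, applied in two parts of the range.
dark_shift = delta_e((10, 10, 10), (50, 50, 50))
light_shift = delta_e((200, 200, 200), (240, 240, 240))
```

Comparing `dark_shift` and `light_shift` shows the two distances are not equal even though both pairs differ by 40 in every RGB channel.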

Edge density measures whether the overall amount of visual structure in a candidate emoji matches the detail level of the tile region. A smooth emoji over a textured region, or a busy emoji over a calm open area, both score poorly.
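One simple way to express this, offered as a sketch rather than the pipeline's method: count the fraction of pixels whose brightness jumps sharply to a neighbor, then score a candidate by how close its fraction is to the tile's. The forward-difference gradient, the threshold of 32, and the function names are assumptions.

```python
def edge_density(gray, threshold=32):
    """Fraction of pixels whose jump to the right or lower neighbor exceeds threshold.

    `gray` is a list of rows of 0-255 brightness values.
    """
    h, w = len(gray), len(gray[0])
    edges = 0
    count = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = abs(gray[y][x + 1] - gray[y][x])
            gy = abs(gray[y + 1][x] - gray[y][x])
            edges += max(gx, gy) > threshold
            count += 1
    return edges / count if count else 0.0

def density_score(emoji_gray, tile_gray):
    """1.0 when the two densities match exactly, 0.0 when they are as far apart as possible."""
    return 1.0 - abs(edge_density(emoji_gray) - edge_density(tile_gray))

# A calm region and a busy region, both 8 x 8.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
```

Here `density_score(flat, flat)` is a perfect match, while `density_score(flat, checker)` is the worst possible one: a smooth candidate over a busy region.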

Spatial edge distribution is the most nuanced signal. During preprocessing, every emoji in the library is characterized by a nine-element fingerprint describing where its visual structure is concentrated — whether detail clusters in the center, the corners, the top half, and so on. When scoring a candidate, that fingerprint is compared to the equivalent fingerprint of the source tile region. Two emoji can have the same total amount of detail and still be a poor structural match if that detail sits in very different places. This signal catches distinctions that color and density alone would miss.
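The post says the fingerprint has nine elements but not how they are laid out. The sketch below assumes the simplest reading, a 3 x 3 grid holding each cell's share of the total edge energy, and compares two fingerprints by their summed absolute difference. Both choices are guesses made for illustration.

```python
def edge_fingerprint(gray):
    """Share of total edge energy in each cell of a 3 x 3 grid, row by row."""
    h, w = len(gray), len(gray[0])
    cells = [0.0] * 9
    for y in range(h - 1):
        for x in range(w - 1):
            energy = abs(gray[y][x + 1] - gray[y][x]) + abs(gray[y + 1][x] - gray[y][x])
            row = min(3 * y // h, 2)
            col = min(3 * x // w, 2)
            cells[3 * row + col] += energy
    total = sum(cells)
    # Assumption: an image with no edges at all gets a uniform fingerprint.
    return [c / total for c in cells] if total else [1 / 9] * 9

def fingerprint_similarity(a, b):
    """1.0 for identical fingerprints, 0.0 when the detail sits in disjoint cells."""
    return 1.0 - 0.5 * sum(abs(p - q) for p, q in zip(a, b))

def block_image(top, left, size=9, block=3):
    """Hypothetical test image: flat black with a checkerboard patch at (top, left)."""
    img = [[0] * size for _ in range(size)]
    for y in range(top, top + block):
        for x in range(left, left + block):
            img[y][x] = 255 if (x + y) % 2 == 0 else 0
    return img

top_left = edge_fingerprint(block_image(0, 0))
bottom_right = edge_fingerprint(block_image(6, 6))
```

The two test images contain the same patch of detail, so a density measure alone would treat them as similar, yet `fingerprint_similarity(top_left, bottom_right)` comes out at essentially zero because the detail sits in opposite corners.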

A reuse penalty rounds out the scoring, gently discouraging any single emoji from dominating the composition — more strongly in face regions, where repetition would be most visible. The balance between all these signals is mine to set, and it shifts from piece to piece depending on the character of the source image and the aesthetic I'm working toward.
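A reuse penalty of this kind can be sketched as a small deduction that grows each time an emoji is used, scaled up inside face regions. The linear penalty, the values 0.02 and 3.0, and the greedy tile-by-tile order are assumptions chosen to keep the example short.

```python
from collections import Counter

def assign(tiles, base_penalty=0.02, face_multiplier=3.0):
    """Pick one emoji per tile, in order, penalizing emoji that were already used.

    `tiles` is a list of (scores, in_face) pairs, where `scores` maps an emoji
    name to its combined score before any penalty.
    """
    usage = Counter()
    chosen = []
    for scores, in_face in tiles:
        strength = base_penalty * (face_multiplier if in_face else 1.0)
        # Sorting the names first keeps tie-breaking deterministic.
        pick = max(sorted(scores), key=lambda name: scores[name] - strength * usage[name])
        usage[pick] += 1
        chosen.append(pick)
    return chosen

# Three neighboring tiles with identical scores (hypothetical values).
scores = {"a": 0.80, "b": 0.75}
background_picks = assign([(scores, False)] * 3)
face_picks = assign([(scores, True)] * 3)
```

With these numbers the background tiles all keep the top-scoring emoji, while the face tiles alternate, because the stronger penalty there makes an immediate repeat lose to the runner-up.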

The iterative process behind each piece

What makes Diamojism feel like a craft rather than a one-click process is the iteration. Every piece goes through multiple render cycles as I work through five sequential phases — each one refining a different aspect of the output — and each phase can be tuned independently without disturbing decisions already made in the others.

Beyond the face detection system, I define artist regions manually for each piece — bounding boxes in the source image that direct the pipeline's attention to areas I care about: a pair of hands, a collar, a section of background with a particular character. Each region has its own influence radius, falloff rate, and weighting. The system follows my lead.
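As an illustration of how a region's three settings might interact, the sketch below gives full weight inside the box and fades the weight to zero over the influence radius, with the falloff rate acting as an exponent. The `ArtistRegion` type, its field names, and the fade formula are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ArtistRegion:
    box: tuple      # (x0, y0, x1, y1) in source-image pixels
    weight: float   # importance inside the box
    radius: float   # how far beyond the box the influence reaches
    falloff: float  # exponent: higher values fade faster

def region_influence(region, x, y):
    """Importance contributed by one region at pixel (x, y)."""
    x0, y0, x1, y1 = region.box
    # Distance from the point to the nearest edge of the box (0 inside it).
    dx = max(x0 - x, 0, x - x1)
    dy = max(y0 - y, 0, y - y1)
    distance = math.hypot(dx, dy)
    if distance == 0:
        return region.weight
    if distance >= region.radius:
        return 0.0
    return region.weight * (1.0 - distance / region.radius) ** region.falloff

# A hypothetical region around a pair of hands.
hands = ArtistRegion(box=(100, 200, 180, 260), weight=0.8, radius=40, falloff=2.0)
inside = region_influence(hands, 120, 220)
nearby = region_influence(hands, 200, 230)    # 20 px outside the box
far_away = region_influence(hands, 300, 230)  # beyond the radius
```

A value like this could be merged into the importance map alongside the face layers, so that a manually marked pair of hands receives finer tiles in the same way an eye does.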

A diagnostic mode writes intermediate outputs alongside every render — importance map visualizations, tile layout overlays, per-tile scoring breakdowns — so I can see exactly what the system decided and adjust accordingly. Every finished piece is saved with the full parameter configuration embedded in its metadata, giving me a precise record of how each result was achieved.

Claude and Claude Code as creative collaborators

I want to be open about the role Claude and Claude Code have played — not just in building the pipeline, but in making each piece.

Early in the project, Claude helped me write, refactor, and debug the rendering pipeline through conversation: working through design decisions like which color space would give the most artistically predictable results, how to represent edge structure as a spatial fingerprint, how to build a configuration model where style presets and per-project overrides could coexist without unexpected conflicts. These were genuine back-and-forth discussions. The designs that came out of them were tested against real renders, and revised when the first attempt revealed something the design hadn't accounted for.

Later in the project I introduced Claude Code, which changed the working dynamic considerably. Claude Code runs locally alongside the pipeline, which means it can read the configuration files, edit them, run a render, and show me the result — all within a single session. The pipeline has dozens of interdependent parameters controlling everything from how finely different regions are subdivided into tiles, to how each emoji is selected and color-adjusted to match the original. I make every aesthetic judgment — deciding whether a face reads correctly, whether a background region is too bright, whether the hands have enough detail — and Claude Code helps me understand what the numbers mean and translates my artistic observations into specific parameter adjustments, then runs the next iteration.

It's a tight creative loop. A single piece typically goes through a dozen or more render cycles, each one moving closer to what I have in mind for that work. Claude Code also helped trace some stubborn bugs. In one case, triangular edge tiles were rendering incorrectly because black fill pixels from the 45° canvas rotation were contaminating the color scoring; working through the pipeline systematically with Claude Code identified three separate causes and a fix for each.
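The general shape of that contamination problem is easy to reproduce. The sketch below is not one of the three actual fixes, which the post does not detail; it only shows, under assumed names and data, how averaging a triangular tile without a validity mask drags its color toward black.

```python
def mean_color(pixels, mask=None):
    """Average RGB over a tile; with a mask, only pixels marked True are counted."""
    totals = [0, 0, 0]
    count = 0
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            if mask is not None and not mask[y][x]:
                continue
            for i in range(3):
                totals[i] += rgb[i]
            count += 1
    if count == 0:
        raise ValueError("no valid pixels to average")
    return tuple(t / count for t in totals)

# A hypothetical 4 x 4 edge tile: a red triangle, with black fill where the
# rotated canvas has no image data.
size = 4
red = (200, 50, 50)
mask = [[x <= y for x in range(size)] for y in range(size)]
tile = [[red if mask[y][x] else (0, 0, 0) for x in range(size)] for y in range(size)]

contaminated = mean_color(tile)        # black fill included
corrected = mean_color(tile, mask)     # black fill excluded
```

The unmasked average is noticeably darker than the true red of the triangle, so a color matcher fed that value would choose a darker emoji than the region calls for.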

Claude also helped write this post and the technical paper describing the system's architecture. Reviewing the paper against the actual codebase caught a meaningful error in how the edge detection algorithm had been described — the kind of quality check that genuinely benefits from a second perspective.

I make every aesthetic call. Claude Code helps me execute it precisely and iterate quickly. That combination — artistic judgment and technical fluency working in a tight loop — is what makes the process feel like a craft.

Diamojism uses AI the way a lot of thoughtful creative work uses technology — as a capable assistant that handles specific tasks with precision, leaving the artistic judgment firmly where it belongs. The source image, the subject, the style, the parameter decisions that shape how the rendering system behaves for that piece: all of that is mine. The AI helps me execute it well.

If you're curious about the technical details, have a look at The Engineering Behind the Art: A Technical Paper.

Mike Jacobs https://www.mjacobsfineart.com
