Steering Generative Models for Accessibility: EasyRead Image Generation using OpenMoji

This is a guest post by Nicolas Dickenmann and co-authors from ETH Zurich / UNICEF.

Artificial intelligence is rapidly transforming how visual content is created. Generative models can now produce stunning, intricate images in a matter of seconds. But while these highly detailed outputs are visually impressive, more detail isn’t always better.

EasyRead is a format designed to make information universally understandable, pairing simple text with clear, unambiguous pictograms. It is an essential communication tool for people with intellectual disabilities, learning difficulties, or limited language proficiency. Unfortunately, modern image generation systems are not designed with cognitive accessibility in mind. They default to producing visually complex, abstract, or cluttered outputs that directly contradict EasyRead guidelines.

This gap is the motivation behind our recent paper, Steering Generative Models for Accessibility: EasyRead Image Generation, which was recently accepted at the upcoming CHI26 conference. In it, we tackle a critical question:

Can we teach state-of-the-art generative models to prioritize cognitive accessibility and visual simplicity over complex aesthetics?

Method: A Custom Pipeline for Cognitive Accessibility

To solve this, we couldn’t just rely on standard prompt engineering, as base models still tend to inject unwanted details. Instead, we needed to systematically steer the model’s visual representations by building a dedicated EasyRead pipeline.

Our pipeline consists of a few key steps:

Curating and Enhancing the Data: This is where the incredible foundation built at HfG Schwäbisch Gmünd by the OpenMoji project became essential to our research. To teach our model what a consistent, clear pictogram looks like, we used OpenMoji’s extensive, open-source library as a core pillar of our training data, combining it with complementary datasets from ARASAAC and LDS. Because standard emoji labels are naturally brief, but a bit too sparse for training a complex diffusion model, we used the BLIP image captioning model to generate rich, natural-language descriptions.
Strategic Augmentation: To enable targeted customization at inference, we leveraged the ARASAAC API to systematically augment the training images with varying background colors, skin tones, and hair colors.
Parameter-Efficient Fine-Tuning: We fine-tuned Stable Diffusion v1.5 using Low-Rank Adaptation (LoRA) on the attention layers. By keeping the core model weights frozen and introducing a unique instance token (sks), we successfully taught the model the strict “EasyRead” style without losing its broad semantic knowledge.

Figure 1: Transformer-Architecture of LLMs

Figure 1: Examples of the three datasets in our corpus: ARASAAC pictograms, OpenMoji icons, and LDS symbolic illustrations.

Figure 2: Transformer-Architecture of LLMs

Figure 2: Our pipeline at training and inference time

Measuring Success: Introducing the EasyRead Score (ERS)

A major bottleneck in accessibility research is evaluation. To bridge this gap, we introduced the EasyRead Score (ERS), the first quantifiable metric that formalizes accessibility-oriented design principles. The ERS is an aggregate of six distinct sub-metrics that evaluate:

Palette Complexity (Low visual clutter)
Edge Density (Simple geometry)
Saliency Concentration (Clear visual focus)
Foreground-Background Contrast (Strong separation)
Centering Error (Stable layout)
Relative Stroke Thickness (Consistent outlines)

Figure 3: Transformer-Architecture of LLMs

Figure 3: The EasyRead Score (ERS) consists of sub-metrics to capture the properties of EasyRead pictograms.

Results: Measurable Clarity and Precision

Figure 4: Transformer-Architecture of LLMs

Figure 4: EasyRead pictograms generated with our finetuned model

We evaluated our fine-tuned model quantitatively against the baseline Stable Diffusion v1.5 across 275 generated images. The data showed that our lightweight adaptation substantially improved both style and semantic accuracy. Additionally, we qualitatively evaluated our model against closed-source systems, and noticed substantial advantages on stylistic consistency and instruction following.

Looking ahead

Generative AI shouldn’t just be a tool for creating complex digital art; it needs to be a practical utility for making the world more accessible. Moving forward we see exciting real-world applications, reducing cost of accessible communication while increasing availability. By designing generative systems with accessibility in mind, we can ensure the AI revolution leaves no one behind. Check out our full paper here, and explore our open-source code and checkpoints on GitHub!

Nicolas Dickenmann is affiliated with a research team at ETH Zurich / UNICEF. All rights to the images remain with their respective copyright holders.

Steering Generative Models for Accessibility: EasyRead Image Generation using OpenMoji

Author

Steering Generative Models for Accessibility: EasyRead Image Generation using OpenMoji

Method: A Custom Pipeline for Cognitive Accessibility

Measuring Success: Introducing the EasyRead Score (ERS)

Results: Measurable Clarity and Precision

Looking ahead