SIGGRAPH 2026

Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation

Fig. 1: Given a low-frequency color prior (left), our method steers the global palette and composition of the generated outputs (right) across diverse prompts — without any training or fine-tuning.

Text-to-image diffusion models generate images by gradually converting white Gaussian noise into a natural image. White Gaussian noise is well suited for producing diverse outputs from a single text prompt due to its absence of structure. However, this very property limits control over, and predictability of, specific visual attributes, as the noise is not human-interpretable. In this work, we investigate the characteristics of the input noise in diffusion models. We show that, although all frequencies in white Gaussian noise have comparable statistical energy, low-frequency components primarily determine the image’s global structure and color composition, while high-frequency components control finer details. Building on this observation, we demonstrate that simple manipulations of the low-frequency noise using low-frequency image priors can effectively condition the generation process to reconstruct these low-frequency visual cues. This allows us to define a simple, training-free method with minimal overhead that steers overall image structure and color, while letting high-frequency components freely emerge as fine details, enabling variability across generated outputs.
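The statement that all frequencies of white Gaussian noise carry comparable energy can be checked numerically. The sketch below assumes NumPy and a single-channel 64x64 noise map; the real latent shape depends on the model. It compares the average per-coefficient power in a low-frequency band and in a high-frequency band of the orthonormal 2D FFT. The band limits (`cutoff`, `size // 4`) are illustrative choices, not values from the paper.

```python
import numpy as np

def band_power(n_samples: int = 200, size: int = 64, cutoff: int = 4, seed: int = 0):
    """Average per-coefficient power of white Gaussian noise in a low and a high band.

    With norm="ortho", every FFT coefficient of unit-variance white noise has
    expected power E|F(k)|^2 = 1, independent of the frequency k.
    """
    rng = np.random.default_rng(seed)
    # Integer frequency index per axis: 0, 1, ..., size/2 - 1, -size/2, ..., -1
    freqs = np.fft.fftfreq(size) * size
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.maximum(np.abs(fy), np.abs(fx))  # Chebyshev distance from DC
    low_mask = radius <= cutoff
    high_mask = radius >= size // 4

    low, high = [], []
    for _ in range(n_samples):
        noise = rng.standard_normal((size, size))
        power = np.abs(np.fft.fft2(noise, norm="ortho")) ** 2
        low.append(power[low_mask].mean())
        high.append(power[high_mask].mean())
    return float(np.mean(low)), float(np.mean(high))

low_power, high_power = band_power()
print(f"low band: {low_power:.3f}  high band: {high_power:.3f}")  # both should be close to 1.0
```

Equal energy per coefficient is what makes the finding non-trivial: the low band is not "louder", yet it is the part that fixes global layout and color.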

Low-Frequency vs. High-Frequency Conditioning

Many artistic processes, such as painting and sculpting, begin with coarse structure and gradually refine fine details. Diffusion models follow a similar process: early denoising steps recover low-frequency information (global structure), while later steps introduce high-frequency details. In this work, we leverage this property to condition image generation using an initial sketch. By constraining only the low frequencies, we preserve the overall structure while allowing the model to freely generate details. As shown below, this enables image generation from colorful sketches that naturally resemble the early stages of drawing. In contrast, traditional conditioning methods such as ControlNet often rely on detailed conditions, resulting in rigid generations with limited diversity, where variations are mostly restricted to low-frequency properties like color. Moreover, because these methods constrain the entire diffusion process, they can reduce output quality. Conditioning only the initial noise preserves generation flexibility and maintains image quality.

[Figure: columns show low-frequency outputs, the conditions, and high-frequency outputs; rows show a color + structure condition and an edge + texture condition.]

How does it work?

Our approach manipulates the low-frequency components of the initial Gaussian noise before the denoising process begins. Replacing or blending the low-frequency bands of the noise with a downsampled color prior steers the diffusion model’s generation toward the desired palette and composition, without any retraining, fine-tuning, or additional model components. The high-frequency noise bands remain random, ensuring diversity in fine detail across different seeds and prompts.
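A minimal sketch of the replace-or-blend step follows. It assumes NumPy, a single-channel noise map, and a color prior already expressed in the same space and scale as the noise; how an RGB prior is mapped into the model's noise space is not covered here. The low/high split uses a square FFT mask with a hypothetical `cutoff`, and `colorful_noise` and `alpha` are illustrative names. The paper's exact frequency decomposition and blending weights may differ.

```python
import numpy as np

def low_pass(x: np.ndarray, cutoff: int) -> np.ndarray:
    """Keep only frequencies with |k_y| <= cutoff and |k_x| <= cutoff (square mask around DC)."""
    h, w = x.shape
    ky = np.abs(np.fft.fftfreq(h) * h)[:, None]
    kx = np.abs(np.fft.fftfreq(w) * w)[None, :]
    mask = (ky <= cutoff) & (kx <= cutoff)  # symmetric mask, so the inverse FFT is real
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))

def colorful_noise(prior: np.ndarray, cutoff: int = 4, alpha: float = 1.0, seed: int = 0) -> np.ndarray:
    """Hypothetical sketch: swap the low band of white noise for the prior's low band.

    alpha=1 replaces the noise's low band with the prior's; 0 < alpha < 1 blends them.
    The high band of the noise is left untouched, so fine detail stays random.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(prior.shape)
    noise_low = low_pass(noise, cutoff)
    noise_high = noise - noise_low
    prior_low = low_pass(prior, cutoff)
    mixed_low = alpha * prior_low + (1.0 - alpha) * noise_low
    return mixed_low + noise_high

# Example: a left-to-right gradient standing in for one channel of a color prior.
prior = np.tile(np.linspace(-1.0, 1.0, 64), (64, 1))
z = colorful_noise(prior, cutoff=4, alpha=1.0, seed=0)
```

With `alpha=1.0`, the low band of `z` equals the prior's low band while its high band is exactly the original noise's high band, which is the split the text describes. The sketch does not rescale the prior, so its amplitude relative to the noise is left to the caller.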

Fig. 3: Overview of Colorful-Noise. Given an input color prior, we extract its low-frequency component via frequency decomposition and inject it into the initial noise tensor. The modified noise is fed directly into a standard text-to-image diffusion model; neither the model nor the sampling procedure needs to change.

Qualitative results

Applications

Beyond direct color conditioning, Colorful-Noise enables a range of downstream creative and editing tasks. Because our method operates purely on the initial noise and requires no model modifications, it composes naturally with any text-to-image pipeline that accepts a user-supplied initial noise.

Color-Based Style Alignment

Given a reference style image, we extract its low-frequency color palette and inject it into the initial noise. This guides the generated output to match the global color distribution and tonal character of the reference — without copying its content or structure. The result is a stylistically aligned image that remains fully controlled by the text prompt, enabling coherent multi-image sets that share a consistent visual language.
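One simple way to obtain a low-frequency color palette from a reference image is area averaging over a coarse grid. The sketch below assumes NumPy and an (H, W, 3) float image whose sides are divisible by the grid size; `coarse_palette` and the 8x8 grid are illustrative choices, not the paper's specified decomposition.

```python
import numpy as np

def coarse_palette(image: np.ndarray, grid: int = 8) -> np.ndarray:
    """Average an (H, W, C) image over a grid x grid layout of equal blocks.

    Returns a (grid, grid, C) array: a heavily downsampled, low-frequency
    summary of where each color sits in the reference image.
    """
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    # Split rows and columns into blocks, then average inside each block.
    blocks = image.reshape(grid, h // grid, grid, w // grid, c)
    return blocks.mean(axis=(1, 3))

# Example: a reference that is red on the left half and blue on the right half.
reference = np.zeros((64, 64, 3))
reference[:, :32] = [1.0, 0.0, 0.0]
reference[:, 32:] = [0.0, 0.0, 1.0]
palette = coarse_palette(reference, grid=8)  # shape (8, 8, 3)
```

Averaging discards texture and edges but keeps the spatial layout of colors, which is the kind of low-frequency cue the text says the initial noise can carry.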

Fig. 4: Color-based style alignment. Left: reference style image used as a color prior. Right: generated outputs conditioned on its low-frequency palette across varied text prompts, demonstrating consistent color-mood alignment without structural imitation. Bottom reference © Augustin Arroyo (@flowalistic on Instagram). All rights reserved.

Color-Preserving Stylization

Colorful-Noise can be effectively combined with other conditioning methods, such as Canny edge maps (via ControlNet [1]) and stylization techniques (e.g., Style-Aligned [3] combined with Conditional Balance [2]). As shown in Figs. 5 and 6, given a content image whose colors should be preserved, Colorful-Noise retains its low-frequency information, while ControlNet [1] constrains the high-frequency structural details. When stylization is applied, the resulting image preserves the original content colors while transferring the texture and geometric characteristics of the reference style. Furthermore, the strength of this effect can be controlled through linear interpolation between Colorful-Noise and white noise, enabling a gradual transition between content preservation and generative flexibility.
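The strength control described above can be sketched as a linear interpolation with a weight `t`, where `t = 0` gives Colorful-Noise (content preservation) and `t = 1` gives white noise (generative flexibility). This assumes NumPy and two arrays of the same shape; `interpolate_noise`, `t`, and the optional `preserve_variance` flag are illustrative names. The renormalization is an extra assumption, not something the text states: plain linear mixing of two independent unit-variance noises has variance (1 - t)^2 + t^2, which drops to 0.5 at t = 0.5.

```python
import numpy as np

def interpolate_noise(colorful: np.ndarray, white: np.ndarray, t: float,
                      preserve_variance: bool = False) -> np.ndarray:
    """Linear interpolation: t=0 returns the colorful noise, t=1 returns white noise.

    With preserve_variance=True the mix is divided by sqrt((1-t)^2 + t^2), which
    restores unit variance when both inputs are independent unit-variance noise.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    mixed = (1.0 - t) * colorful + t * white
    if preserve_variance:
        mixed = mixed / np.sqrt((1.0 - t) ** 2 + t ** 2)
    return mixed
```

Sweeping `t` from 0 to 1 over a fixed pair of noise tensors yields the gradual transition described above; whether to renormalize depends on the sampler and on the statistics of the colorful noise itself.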

Fig. 5: Color-preserving stylization across diverse content and style images. Even without a reference style image, combining Colorful-Noise with ControlNet Canny produces a close reconstruction of the original image that preserves its color composition.
Fig. 6: Interpolation between the color style of the content image and the conditional style image across diverse content and style pairs.
  [1] Zhang et al. Adding Conditional Control to Text-to-Image Diffusion Models.
  [2] Z. Cohen et al. Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation.
  [3] Hertz & Voynov et al. Style Aligned Image Generation via Shared Attention.

BibTeX

@misc{cohen2026colorfulnoise,
  title          = {Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation},
  author         = {Nadav Z. Cohen and Ofir Abramovich and Ariel Shamir},
  year           = {2026},
  eprint         = {2605.00548},
  archivePrefix  = {arXiv},
  primaryClass   = {cs.CV},
  doi            = {10.1145/3799902.3811104},
  url            = {https://arxiv.org/abs/2605.00548}
}