Colorful-Noise — Project Page

Motivation

Low-Frequency vs. High-Frequency Conditioning

Many artistic processes, such as painting and sculpting, begin with coarse structure and gradually refine fine details. Diffusion models follow a similar process: early denoising steps recover low-frequency information (global structure), while later steps introduce high-frequency details. In this work, we leverage this property to condition image generation using an initial sketch. By constraining only the low frequencies, we preserve the overall structure while allowing the model to freely generate details. As shown below, this enables image generation from colorful sketches that naturally resemble the early stages of drawing.

In contrast, traditional conditioning methods such as ControlNet often rely on detailed conditions, resulting in rigid generations with limited diversity, where variations are mostly restricted to low-frequency properties like color. Moreover, because these methods constrain the entire diffusion process, they can reduce output quality. Conditioning only the initial noise preserves generation flexibility and maintains image quality.

[Figure: low-frequency outputs, conditions, and high-frequency outputs, for a color + structure condition and an edge + texture condition.]

Method

How does it work?

Our approach manipulates the low-frequency components of the initial Gaussian noise before the denoising process begins. By replacing or blending the low-frequency bands of the noise with a downsampled color prior, the diffusion model’s generation is steered toward the desired palette and composition — without any retraining, fine-tuning, or additional model components. The high-frequency noise bands remain random, ensuring diversity in fine detail across different seeds and prompts.
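The band replacement described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the project's implementation: it assumes an FFT-based decomposition with a radial cutoff, and a color prior that already has the same shape as the noise tensor (how the prior is brought into that space is not specified here). The names `low_pass_mask`, `inject_low_frequencies`, `cutoff`, and `alpha` are hypothetical.

```python
import numpy as np

def low_pass_mask(height, width, cutoff):
    """Boolean (H, W) mask of FFT bins whose radial frequency is <= cutoff.

    Frequencies are in cycles per sample, so they lie in [-0.5, 0.5).
    """
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.hypot(fx, fy) <= cutoff

def inject_low_frequencies(noise, prior, cutoff=0.05, alpha=1.0):
    """Blend the low-frequency band of `prior` into `noise`.

    noise, prior: real arrays of shape (C, H, W) (assumed to match).
    alpha=1.0 replaces the low band; alpha=0.0 leaves the noise unchanged.
    Bins outside the mask always keep the original random noise.
    """
    mask = low_pass_mask(noise.shape[-2], noise.shape[-1], cutoff)
    noise_f = np.fft.fft2(noise, axes=(-2, -1))
    prior_f = np.fft.fft2(prior, axes=(-2, -1))
    low_band = (1.0 - alpha) * noise_f + alpha * prior_f
    mixed_f = np.where(mask, low_band, noise_f)
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to floating-point error; discard the tiny imaginary part.
    return np.fft.ifft2(mixed_f, axes=(-2, -1)).real

rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 64, 64))   # stand-in for the initial noise
prior = rng.standard_normal((4, 64, 64))   # stand-in for a color prior
conditioned = inject_low_frequencies(noise, prior, cutoff=0.05, alpha=1.0)
```

With `alpha=1.0` the low band is replaced outright, and smaller values blend it with the corresponding band of the original noise. How the prior should be scaled relative to unit-variance noise is a design choice this sketch does not address.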

Fig. 3: Overview of Colorful-Noise. Given an input color prior, we extract its low-frequency component via frequency decomposition and inject it into the initial noise tensor. The modified noise is fed directly into a standard text-to-image diffusion model — no changes to the model or sampling procedure required.

Results & Examples

Qualitative results

Color Variation

The same structural input conditioned on different color priors. Low-frequency hue manipulation steers the global palette while preserving layout and fine detail.

[Gallery: input and result image pairs]

Prompt Variation

The same input noise conditioned on different text prompts. Global color and structure remain consistent while semantic content shifts according to each prompt.

[Gallery: input and result image pairs]

Seed Variation

Multiple outputs from the same input and prompt, sampled with different high-frequency noise seeds. Fine details vary while the overall structure and color remain anchored to the condition.

[Gallery: input and result image pairs]
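The seed behavior described in this section can be illustrated with a small check: if the low band always comes from the same prior and only the high band is resampled, two seeds share their coarse content and differ in fine detail. The sketch below assumes an FFT-based split with a radial cutoff, which is one possible decomposition and not necessarily the one used in this work. `conditioned_noise`, `split_bands`, and `cutoff` are hypothetical names.

```python
import numpy as np

def radial_low_mask(shape, cutoff):
    """Boolean (H, W) mask of low-frequency FFT bins (cycles per sample)."""
    h, w = shape[-2:]
    return np.hypot(np.fft.fftfreq(h)[:, None], np.fft.fftfreq(w)[None, :]) <= cutoff

def conditioned_noise(prior, seed, cutoff=0.05):
    """Low band taken from `prior`, high band from Gaussian noise drawn with `seed`."""
    noise = np.random.default_rng(seed).standard_normal(prior.shape)
    low = radial_low_mask(prior.shape, cutoff)
    mixed = np.where(low, np.fft.fft2(prior), np.fft.fft2(noise))
    return np.fft.ifft2(mixed).real

def split_bands(x, cutoff=0.05):
    """Return the (low, high) spatial components of `x` for the same cutoff."""
    low = radial_low_mask(x.shape, cutoff)
    x_f = np.fft.fft2(x)
    low_part = np.fft.ifft2(np.where(low, x_f, 0)).real
    return low_part, x - low_part

prior = np.random.default_rng(123).standard_normal((4, 64, 64))  # stand-in prior
low_a, high_a = split_bands(conditioned_noise(prior, seed=0))
low_b, high_b = split_bands(conditioned_noise(prior, seed=1))
same_low = np.allclose(low_a, low_b)        # coarse content is shared
same_high = np.allclose(high_a, high_b)     # fine detail is resampled
```

In this sketch `same_low` holds for any pair of seeds because the low band never depends on the seed, while the high bands are independent draws.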

Additional Results

A broader gallery of inputs and their generated outputs across diverse subjects, prompts, and color conditions.

[Gallery: input and result image pairs]

Applications

Beyond direct color conditioning, Colorful-Noise enables a range of downstream creative and editing tasks. Because our method operates purely on the initial noise — with no model modifications — it composes naturally with any text-to-image pipeline.

Color-Based Style Alignment

Given a reference style image, we extract its low-frequency color palette and inject it into the initial noise. This guides the generated output to match the global color distribution and tonal character of the reference — without copying its content or structure. The result is a stylistically aligned image that remains fully controlled by the text prompt, enabling coherent multi-image sets that share a consistent visual language.
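One simple way to obtain a coarse color layout from a reference image is to block-average it and upsample the result back to the original size. The sketch below uses that approach purely as an illustration; the actual extraction used for style alignment may differ. `low_frequency_palette` and `factor` are hypothetical names, and the image is assumed to be an (H, W, C) float array whose height and width are divisible by `factor`.

```python
import numpy as np

def low_frequency_palette(image, factor=8):
    """Block-average an (H, W, C) image by `factor`, then upsample by repetition.

    The output has the same shape as the input but is constant inside each
    factor x factor block, so only the coarse color layout survives.
    """
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    # Group pixels into blocks and average over the two in-block axes.
    coarse = image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    # Nearest-neighbor upsampling back to the original resolution.
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

reference = np.random.default_rng(0).random((64, 64, 3))  # stand-in for a style image
palette = low_frequency_palette(reference, factor=8)
```

Block averaging keeps the mean color of the image unchanged, which is convenient when the goal is to carry over a global palette without any of the reference's fine structure.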

Fig. 4: Color-based style alignment. Left: reference style image used as a color prior. Right: generated outputs conditioned on its low-frequency palette across varied text prompts, demonstrating consistent color-mood alignment without structural imitation. Bottom reference © Augustin Arroyo (@flowalistic on Instagram). All rights reserved.

Color-Preserving Stylization

Colorful-Noise can be effectively combined with other conditioning methods, such as Canny edge maps (via ControlNet¹) and stylization techniques (e.g., Conditional Balance² and Style-Aligned³). As demonstrated, given a content image whose colors should be preserved, Colorful-Noise retains its low-frequency information, while ControlNet¹ constrains the high-frequency structural details. When stylization is applied, the resulting image preserves the original content colors while transferring the texture and geometric characteristics of the reference style. Furthermore, the strength of this effect can be controlled through linear interpolation between Colorful-Noise and white noise, enabling a gradual transition between content preservation and generative flexibility.
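The strength control mentioned above amounts to a linear blend of two noise tensors. A minimal sketch, assuming both tensors have the same shape and that a single scalar in [0, 1] sets the strength; `interpolate_noise` and `strength` are hypothetical names, and the exact parameterization used in this work may differ.

```python
import numpy as np

def interpolate_noise(colorful_noise, white_noise, strength):
    """Linear blend: strength=1.0 gives the color-conditioned noise,
    strength=0.0 gives plain white noise."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return strength * colorful_noise + (1.0 - strength) * white_noise

rng = np.random.default_rng(0)
white = rng.standard_normal((4, 64, 64))
colorful = rng.standard_normal((4, 64, 64))  # stand-in for color-conditioned noise
halfway = interpolate_noise(colorful, white, strength=0.5)
```

If the two tensors share the same high-frequency band (for example, when the color-conditioned noise was built from `white`), the blend leaves that band untouched and only moves the low band between the prior and the random noise.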

Fig. 5: Color-preserving stylization across diverse content and style images. Even without a reference style image, combining Colorful-Noise with ControlNet Canny produces a close reconstruction of the original image while preserving its color composition.

Fig. 6: Interpolation between the color style of the content image and the conditional style image across diverse content and style pairs.

¹ Zhang et al. Adding Conditional Control to Text-to-Image Diffusion Models.
² Z. Cohen et al. Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation.
³ Hertz & Voynov et al. Style Aligned Image Generation via Shared Attention.

Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation
