Semantic Segmentation in Art Paintings

Nadav Z. Cohen¹, Yael Newman², Ariel Shamir³

¹The Hebrew University of Jerusalem, ²Tel-Aviv University, ³Reichman University

Computer Graphics Forum (EuroGraphics) 2022

Abstract

Semantic segmentation is a difficult task even when trained in a supervised manner on photographs. In this paper, we tackle the problem of semantic segmentation of artistic paintings, an even more challenging task because of a much larger diversity in colors, textures, and shapes and because there are no ground truth annotations available for segmentation.

We propose an unsupervised method for semantic segmentation of paintings using domain adaptation. Our approach creates a training set of pseudo-paintings in specific artistic styles by using style-transfer on the PASCAL VOC 2012 dataset, and then applies domain confusion between PASCAL VOC 2012 and real paintings. These two steps build on a new dataset we gathered called DRAM (Diverse Realism in Art Movements) composed of figurative art paintings from four movements, which are highly diverse in pattern, color, and geometry. To segment new paintings, we present a composite multi-domain adaptation method that trains on each sub-domain separately and composes their solutions during inference time. Our method provides better segmentation results not only on the specific artistic movements of DRAM, but also on other, unseen ones. We compare our approach to alternative methods and show applications of semantic segmentation in art paintings.

Presentation: EuroGraphics 2022 - Reims, France

DRAM Dataset

The Diverse Realism in Art Movements (DRAM) dataset is a domain adaptation dataset for semantic segmentation, composed mainly of paintings from four art movements: Realism, Impressionism, Post-Impressionism, and Expressionism.

The dataset serves as the target domain in our paper, meaning it has an unlabeled training set and a fully annotated test set. The test annotations follow the guidelines of PASCAL VOC 2012, which serves as the source dataset in our paper. For more information, visit the PASCAL dataset website.

In addition, DRAM includes an 'Unseen' test set drawn from movements that were not used during training: Art Nouveau, Baroque, Cubism, Divisionism, Fauvism, Ink & Wash, Japonism, and Rococo.

DRAM covers 11 of the 20 classes used in PASCAL: Bird, Boat, Bottle, Cat, Chair, Cow, Dog, Horse, Person, Potted-Plant, and Sheep. The rest of the classes are considered Background, which is the 12th class of the dataset. For more info please read Section 3 of our paper.
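
The class reduction described above can be sketched as a label remapping from the 21 PASCAL VOC segmentation indices (background plus 20 classes) to the 12 DRAM classes. This is a minimal sketch, not the processing code from the project repo: the VOC index order and the void label 255 follow the standard VOC convention, while the DRAM index order (background kept at index 0) and the helper name `remap_labels` are assumptions made for illustration.

```python
# Standard PASCAL VOC segmentation class order (index 0 is background).
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Assumed DRAM index order: background first, then the 11 shared classes.
DRAM_CLASSES = [
    "background", "bird", "boat", "bottle", "cat", "chair", "cow", "dog",
    "horse", "person", "pottedplant", "sheep",
]

# Lookup table: VOC index -> DRAM index; dropped classes become background.
VOC_TO_DRAM = [
    DRAM_CLASSES.index(name) if name in DRAM_CLASSES else 0
    for name in VOC_CLASSES
]


def remap_labels(voc_label_map, ignore_index=255):
    """Remap a 2D list of VOC labels to DRAM labels, keeping void pixels."""
    return [
        [px if px == ignore_index else VOC_TO_DRAM[px] for px in row]
        for row in voc_label_map
    ]
```

For example, a VOC pixel labeled car (index 7) becomes background, while a pixel labeled person (index 15) keeps its meaning under a new index.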

Download

The dataset is available for download in two versions: Raw and Processed.

The raw version contains the full DRAM dataset and its annotations without additional processing, while the processed version contains the dataset as used in our paper. Alongside the dataset, the project repo provides our code for processing the raw data, filtering the PASCAL dataset, and creating pseudo-paintings (as described in the paper).

Terms of Use

This dataset is provided for research purposes only and comes without any warranty. Commercial use is strictly prohibited. If you use this dataset or any part of it in your research, we kindly ask that you cite our paper (see BibTeX below).

By downloading the dataset, you agree to these terms.

Method

Semantic Segmentation in Art Paintings explores much more than the task of segmenting paintings. In our paper, we explore the concept of domain adaptation and its current applications in a way never explored before: using photographic data as the labeled source domain and synthetic data as the unlabeled target domain. We specifically use a highly variable target domain for our experiments, composed of paintings from a variety of art styles, so that we can make practical assessments of how well current domain adaptation frameworks perform on highly variable, synthetic target tasks. We invite you to explore our work and read our paper.

Training

Pseudo Paintings

Our initial training phase begins with generating pseudo-paintings: artistic augmentations of a labeled photographic dataset. To achieve this, we train a style-transfer network for each artistic style using its unlabeled training data and apply it to the labeled dataset. We then fine-tune a base model on the pseudo-paintings of each training art movement, which gives one model per movement and embeds that movement's artistic characteristics into the trained model.

Domain Confusion

In the Domain Confusion step, we use a discriminator to adversarially train the model's encoders, aligning paintings and photographs within a shared latent space. This shared space captures features common to both domains, so that knowledge from the labeled photographic data can improve learning on the unlabeled painting dataset.
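
The adversarial objective behind this step can be illustrated with a generic binary cross-entropy formulation. This is a sketch of a common domain-confusion setup, not necessarily the exact loss used in the paper: the label convention (photo = 1, painting = 0) and the function names are assumptions, and the inputs stand for the probabilities a discriminator assigns to encoded features.

```python
import math


def bce(prob, label):
    """Binary cross-entropy of one predicted probability against a 0/1 label."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(photo_probs, painting_probs):
    """The discriminator learns to tell domains apart: photo -> 1, painting -> 0."""
    losses = [bce(p, 1) for p in photo_probs]
    losses += [bce(p, 0) for p in painting_probs]
    return sum(losses) / len(losses)


def confusion_loss(painting_probs):
    """The encoder is trained with flipped labels, so its loss drops when
    painting features are mistaken for photo features."""
    return sum(bce(p, 1) for p in painting_probs) / len(painting_probs)
```

Minimizing `discriminator_loss` with respect to the discriminator and `confusion_loss` with respect to the encoder, in alternation, pushes both domains toward the same region of the latent space.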

Inference

To perform inference on a new painting, we pass the image through all the trained movement-specific models. Next, we encode the image into a Gram matrix representation and identify its k-nearest neighbors (KNN) in the training set. Using the fraction of neighbors that belong to each art movement as weights, we compute a weighted sum of the softmax outputs of the movement-specific models. This approach not only enhances the results for the trained art movements but also improves performance for previously unseen movements. The improvement arises from the ability to combine information from different art movements, allowing the model to better represent novel, unseen artistic styles.
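
The composition step above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the released implementation: how the feature maps are extracted, the squared Euclidean distance, and all function names are hypothetical choices made here. Only the overall flow (Gram representation, k-nearest neighbors, movement fractions as weights, weighted sum of softmax outputs) comes from the description.

```python
from collections import Counter


def gram_vector(features):
    """features: C channels, each a flat list of N activations.
    Returns the upper triangle of the C x C Gram matrix, divided by N."""
    n = len(features[0])
    c = len(features)
    return [
        sum(a * b for a, b in zip(features[i], features[j])) / n
        for i in range(c)
        for j in range(i, c)
    ]


def movement_weights(query, train_vectors, train_movements, k):
    """Fraction of each art movement among the k nearest training paintings."""
    dists = sorted(
        (sum((q - t) ** 2 for q, t in zip(query, vec)), mov)
        for vec, mov in zip(train_vectors, train_movements)
    )
    neighbors = dists[:k]
    counts = Counter(mov for _, mov in neighbors)
    return {mov: cnt / len(neighbors) for mov, cnt in counts.items()}


def compose_predictions(softmax_by_movement, weights):
    """softmax_by_movement[movement][pixel][class] holds class probabilities.
    Returns one class index per pixel from the weighted sum of the models."""
    movements = [m for m in weights if m in softmax_by_movement]
    first = softmax_by_movement[movements[0]]
    n_pixels, n_classes = len(first), len(first[0])
    labels = []
    for p in range(n_pixels):
        mixed = [
            sum(weights[m] * softmax_by_movement[m][p][c] for m in movements)
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=mixed.__getitem__))
    return labels
```

A movement with no painting among the neighbors gets no weight, so its model does not contribute to the final prediction for that image.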

Results

Application: Semantic Guided Style Transfer

Style transfer, where a photograph is transformed into a stylized version based on a selected stylistic image, has gained significant popularity. While most style transfer techniques apply the same stylization across the entire image, it can be more effective to segment the image into regions, identify objects, and apply different stylizations based on the image's semantic content. Alex J. Champandard introduced a semantic style transfer method that requires two images (style and content) and two corresponding semantic maps. However, the current version of this method relies on manually created semantic maps, which can be difficult to produce. By utilizing our semantic segmentation approach, we automate this process and develop an end-to-end framework for semantic style transfer that only requires a pair of content and style images.

Application: Comparative Collections

To better understand and analyze the artworks of a specific artistic movement or a specific artist, art researchers often compare different paintings. With semantic segmentation, such comparisons can be made not only at the level of whole paintings but also at the level of specific objects: one can gather all occurrences of a certain class from a given set of paintings, extract them from their original images, and place them side by side for comparison.
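
The extraction step can be sketched as cropping each painting to the bounding box of one class in its predicted label map. This is a minimal sketch with hypothetical helper names, assuming images and label maps are 2D lists of equal size. Because semantic segmentation yields one mask per class rather than per object, all pixels of the class in a painting end up in a single crop.

```python
def class_bounding_box(label_map, class_id):
    """Return (top, left, bottom, right) of the class pixels, or None if absent.
    bottom and right are exclusive."""
    rows = [r for r, row in enumerate(label_map) if class_id in row]
    if not rows:
        return None
    cols = [
        c for row in label_map for c, px in enumerate(row) if px == class_id
    ]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def crop_class(image, label_map, class_id, fill=0):
    """Crop the image to the class bounding box and blank out other pixels."""
    box = class_bounding_box(label_map, class_id)
    if box is None:
        return None
    top, left, bottom, right = box
    return [
        [
            image[r][c] if label_map[r][c] == class_id else fill
            for c in range(left, right)
        ]
        for r in range(top, bottom)
    ]
```

Running `crop_class` over a set of paintings and skipping the `None` results gives the per-class crops that can then be laid out side by side.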