Friday, 5 June 2026

Understanding the Paper: GCNBoost — Artwork Classification by Label Propagation through a Knowledge Graph

The paper “GCNBoost: Artwork Classification by Label Propagation through a Knowledge Graph” proposes a method for improving artwork classification by combining visual image features with contextual information stored in a knowledge graph. The model is called GCNBoost because it uses a Graph Convolutional Network, or GCN, to boost classification performance through label propagation.

The central idea is that artworks are not defined only by pixels. A painting or statue also carries contextual information such as author, school, time period, type, style, size, and historical category. These attributes are often related to each other. For example, an artist may belong to a specific painting school, and that school may be associated with a certain time period. GCNBoost uses these relationships to improve classification.

Core Idea: Artwork classification improves when the model uses both visual image features and contextual relationships between artworks and labels through a knowledge graph.

1. What Problem Is the Paper Solving?

The digitization of cultural heritage has created large collections of artwork images. Museums and archives now contain thousands of paintings, sculptures, statues, manuscripts, and other cultural objects in digital form. To organize these collections, automatic classification becomes important.

Artwork classification means assigning labels to an artwork. For example, a painting may be classified according to its type, school, time period, or author. Similarly, a Buddha statue may be classified according to style, size, century of creation, or dimensions.

| Artwork Type | Possible Classification Labels |
| --- | --- |
| Painting | Author, school, type, time frame, technique. |
| Buddha statue | Style, size, century of creation, dimensions. |

A normal computer vision model can classify an artwork using only image features. However, this can be limiting because artworks also have strong cultural, historical, and semantic context. For example, if a painting is known to be by Vincent van Gogh, it is likely linked to the Dutch school and a certain historical period. These contextual relationships can help classification.

Research Problem: How can we use a knowledge graph and label propagation to improve automatic artwork classification, especially when some data is unlabeled?

2. Main Idea of GCNBoost

GCNBoost combines three ideas:

| Idea | Meaning |
| --- | --- |
| Visual features | Images are represented using CNN-based embeddings, such as ResNet features. |
| Knowledge graph | Artworks and labels are represented as nodes, and their relationships are represented as edges. |
| Label propagation | Unlabeled test samples are given pseudo-labels so they can participate in the graph during training. |

The model first builds a knowledge graph using labeled training artworks and their known attributes. Then, it extends this graph by adding test artworks with pseudo-labels predicted by a pre-trained model. This produces an Extended Knowledge Graph, or EKG.

The broad pipeline can be represented as:

\[ \text{Artwork Images} \rightarrow \text{Pseudo-Labels} \rightarrow \text{Extended Knowledge Graph} \rightarrow \text{GCN} \rightarrow \text{Artwork Embeddings} \rightarrow \text{Classifier} \]

The diagram in Figure 1 of the paper shows this complete process. Training samples have real labels, test samples receive pseudo-labels, the extended knowledge graph is built, and a GCN produces image-node embeddings that are finally used for classification.

3. Knowledge Graph and Extended Knowledge Graph

A knowledge graph is written as:

\[ G = (V,E) \]

Here, \(V\) is the set of nodes and \(E\) is the set of edges. In this paper, nodes can represent artworks or labels. Edges represent relationships between artworks and labels, or relationships between labels themselves.

| Node Type | Example |
| --- | --- |
| Artwork node | A specific painting or Buddha statue. |
| Label node | Author, school, type, time frame, style, size, century, dimension. |

For example, if a painting is by Vincent van Gogh and belongs to the Dutch school, the graph can contain edges such as:

\[ \text{Painting} \rightarrow \text{Vincent van Gogh} \]

\[ \text{Vincent van Gogh} \rightarrow \text{Dutch School} \]

This means the graph captures not only the artwork-label relationship, but also the relationship between labels themselves.

3.1 Formal Knowledge Graph Construction

Let \(X_{train}\) be the set of training artworks and \(X_{test}\) be the set of test artworks. Each training artwork has labels:

\[ \{t_c \mid c \in C\} \]

Here, \(C\) is the set of label categories. For the SemArt dataset, examples of label categories include:

| Label Category | Example Labels |
| --- | --- |
| Type | Portrait, landscape, religious, still-life. |
| School | Italian, Dutch, French, Flemish. |
| TimeFrame | Historical period or time interval. |
| Author | Name of painter. |

The basic knowledge graph is defined as:

\[ V = X_{train} \cup L \]

\[ E = W \cup K \]

Here, \(L\) is the set of all label nodes, \(W\) is the set of artwork-label edges, and \(K\) is the set of known label-label relationships.
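The construction above can be sketched with plain Python sets. This is only an illustration: the artwork names, the labels, and the choice of author-to-school pairs for \(K\) are toy assumptions, not data from SemArt.

```python
# Toy training metadata (hypothetical names, not taken from SemArt).
train_labels = {
    "painting_1": {"Author": "Vincent van Gogh", "School": "Dutch", "Type": "landscape"},
    "painting_2": {"Author": "Rembrandt", "School": "Dutch", "Type": "portrait"},
}
# K: known label-label relationships (assumed here to be author -> school pairs).
label_label_edges = {("Vincent van Gogh", "Dutch"), ("Rembrandt", "Dutch")}


def build_knowledge_graph(train_labels, label_label_edges):
    """Return (V, E) with V = X_train ∪ L and E = W ∪ K."""
    x_train = set(train_labels)
    # L: every label value that appears on a training artwork.
    label_nodes = {label for labels in train_labels.values() for label in labels.values()}
    # W: one edge per (artwork, label) pair.
    artwork_label_edges = {
        (artwork, label)
        for artwork, labels in train_labels.items()
        for label in labels.values()
    }
    nodes = x_train | label_nodes
    edges = artwork_label_edges | set(label_label_edges)
    return nodes, edges


V, E = build_knowledge_graph(train_labels, label_label_edges)
print(len(V), "nodes,", len(E), "edges")
```

In this toy graph the two paintings are not linked directly, but both reach the "Dutch" node, which is the kind of indirect connection the GCN later exploits.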

3.2 Extended Knowledge Graph

The paper extends the graph by adding test artworks:

\[ V' = V \cup X_{test} \]

However, test artworks do not have ground-truth labels. Therefore, the model first assigns pseudo-labels to test artworks using a pre-trained classifier.

If \(g_c(x)\) is a classifier that predicts the pseudo-label for artwork \(x\) in category \(c\), then:

\[ t'_c = g_c(x) \]

These pseudo-labels are then added as new edges:

\[ E' = E \cup \{(x,t'_c) \mid x \in X_{test}, c \in C\} \]

The extended graph is therefore:

\[ G' = (V',E') \]

The paper’s Figure 2 explains this visually. The original KG contains known artwork-label relationships. The EKG adds dashed lines generated by pseudo-labels, connecting unlabeled test artworks to predicted labels.
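A minimal sketch of the extension step follows. The classifiers \(g_c\) are replaced by hypothetical stand-in functions that always return a fixed label; in practice they would be a pre-trained image model. All node names are illustrative.

```python
def extend_knowledge_graph(nodes, edges, test_artworks, classifiers):
    """Return (V', E') with V' = V ∪ X_test and E' = E ∪ {(x, g_c(x))}."""
    new_nodes = set(nodes) | set(test_artworks)
    # One pseudo-label edge per test artwork and per label category c.
    pseudo_edges = {(x, g_c(x)) for x in test_artworks for g_c in classifiers.values()}
    return new_nodes, set(edges) | pseudo_edges


# Stand-ins for g_c: hypothetical fixed predictions instead of a trained network.
classifiers = {
    "School": lambda x: "Dutch",
    "Type": lambda x: "landscape",
}

V = {"painting_1", "Dutch", "landscape", "Vincent van Gogh"}
E = {
    ("painting_1", "Dutch"),
    ("painting_1", "landscape"),
    ("painting_1", "Vincent van Gogh"),
    ("Vincent van Gogh", "Dutch"),
}

V_ext, E_ext = extend_knowledge_graph(V, E, ["test_painting_9"], classifiers)
print(sorted(E_ext - E))  # only the new pseudo-label edges
```

The sketch assumes every predicted label already exists as a node in \(L\), which holds when the classifier outputs are restricted to the known label set.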

4. Pseudo-Labels and Label Propagation

A pseudo-label is a predicted label assigned to an unlabeled sample. It is not guaranteed to be correct, but it gives the model a useful starting point.

For example, suppose a painting in the test set is unlabeled. A pre-trained model may predict:

| Category | Pseudo-Label |
| --- | --- |
| Type | Landscape |
| School | Dutch |
| TimeFrame | 1651–1700 |
| Author | Unknown or predicted artist |

These pseudo-labels connect the test painting to label nodes inside the graph. Once connected, the GCN can propagate information between similar or related nodes.

Using test samples in the graph during training is a form of transductive learning. Unlike ordinary inductive learning, where the model trains only on training data and later sees test data, transductive learning allows the model to use the structure of both training and test samples during learning. The true labels of test samples remain unknown, but their graph connections are used.

Simple Explanation: GCNBoost says, “Even if I do not know the true label of this test artwork, I can use an initial guess and its graph neighborhood to improve the final prediction.”

5. Graph Convolutional Network

After building the extended knowledge graph, the paper uses a Graph Convolutional Network to learn embeddings for artwork nodes and label nodes.

The GCN update equation is:

\[ H^{(n)} = \mathrm{ReLU} \left( D^{-\frac{1}{2}} (A+I) D^{-\frac{1}{2}} H^{(n-1)} W^{(n)} + b^{(n)} \right) \]

| Symbol | Meaning |
| --- | --- |
| \(H^{(n)}\) | Node representation at GCN layer \(n\). |
| \(A\) | Adjacency matrix of the extended knowledge graph. |
| \(I\) | Identity matrix, added for self-loops. |
| \(D\) | Degree matrix of the graph (computed from \(A+I\) in the standard GCN formulation). |
| \(W^{(n)}\) | Learnable weight matrix at layer \(n\). |
| \(b^{(n)}\) | Bias term at layer \(n\). |

The equation means that each node updates its representation by aggregating information from its neighbors. If an artwork is connected to a certain author, school, or timeframe, those label nodes influence the artwork’s representation. Similarly, label nodes can act as hubs connecting related artworks.
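To make the aggregation concrete, here is a small pure-Python sketch of one GCN layer applied to a three-node toy graph (two artworks linked to one shared label node). It assumes an undirected graph and takes \(D\) as the degree matrix of \(A+I\), following the standard GCN convention; the features and weights are arbitrary illustrative values, not those of the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, h, w, b):
    """One layer: ReLU(D^-1/2 (A + I) D^-1/2 H W + b), with D the degrees of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    a_norm = [
        [d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, value + bias) for value, bias in zip(row, b)] for row in z]


# Nodes: 0 = artwork_1, 1 = artwork_2, 2 = shared label node (e.g. "Dutch").
adj = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
h0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy initial features
w = [[1.0, 0.0], [0.0, 1.0]]               # identity weights, for readability
b = [0.0, 0.0]

h1 = gcn_layer(adj, h0, w, b)
h2 = gcn_layer(adj, h1, w, b)
print(h1)
print(h2)
```

After one layer the label node has absorbed features from both artworks. After two layers, artwork_1 carries a non-zero second component that originated in artwork_2, which illustrates how a label node can act as a hub between related artworks.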

5.1 Initial Node Features

The initial features are different for image nodes and label nodes:

| Node Type | Initial Feature Source |
| --- | --- |
| Artwork image node | ResNet50 visual features. |
| Label node | node2vec embeddings over the graph. |

This means the model combines visual representation from images with graph-based representation from labels.

6. Training and Inference

After the GCN produces the final artwork embedding \(h_i\), a fully connected classifier predicts labels for each label category.

The classifier is:

\[ y_{ic} = \mathrm{softmax}(W_c h_i + b_c) \]

Here, \(y_{ic}\) is the predicted probability distribution over labels in category \(c\).

The loss function is cross-entropy over the training samples:

\[ \ell = - \sum_{x_i \in X_{train}} \sum_{c \in C} \sum_{k=1}^{|L_c|} t_{ick}\log y_{ick} \]

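In this loss, \(t_{ick}\) is 1 if label \(k\) is the ground-truth label of artwork \(x_i\) in category \(c\) and 0 otherwise, \(y_{ick}\) is the \(k\)-th component of \(y_{ic}\), and \(|L_c|\) is the number of labels in category \(c\). The important point is that the loss is calculated only on training samples with real labels. Test samples are present in the graph through pseudo-labels, but their true labels are not used for loss calculation.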
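The training-only loss can be sketched for a single label category as follows. The logits stand for \(W_c h_i + b_c\) and are assumed to be precomputed; the values, the mask, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_entropy(logits, targets, train_mask):
    """Sum of -log y_ik over training nodes only; test nodes are skipped."""
    loss = 0.0
    for node_logits, target, is_train in zip(logits, targets, train_mask):
        if is_train:
            # With one-hot targets, sum_k t_ik * log(y_ik) reduces to log(y_i,target).
            loss -= math.log(softmax(node_logits)[target])
    return loss


# Two training artworks and one test artwork (toy logits for a 2-label category).
logits = [[2.0, 0.0], [0.0, 0.0], [5.0, -5.0]]
targets = [0, 1, None]            # the test artwork has no ground-truth label
train_mask = [True, True, False]

loss = masked_cross_entropy(logits, targets, train_mask)
print(round(loss, 4))
```

Changing the logits of the test artwork leaves the loss unchanged, which mirrors the point that test samples shape the graph but never contribute a supervised term.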

After training, the learned embedding of each test artwork is passed through the classifier to predict its final labels.

7. Datasets Used

The paper evaluates GCNBoost on two cultural heritage datasets: SemArt and Buddha statues.

7.1 SemArt Dataset

The SemArt dataset contains European fine-art paintings. The split is:

| Split | Number of Images |
| --- | --- |
| Training | 19,244 |
| Validation | 1,069 |
| Test | 1,069 |

The model is evaluated on four tasks:

| Task | Description |
| --- | --- |
| Type classification | Classifies paintings into types such as portrait, landscape, religious, study, genre, still-life, mythological, interior, historical, and other. |
| School classification | Classifies paintings into schools such as Italian, Dutch, French, Flemish, German, Spanish, and others. |
| TimeFrame classification | Classifies paintings according to historical time periods. |
| Author classification | Classifies paintings according to painter identity. |

7.2 Buddha Statues Dataset

The Buddha statues dataset contains:

| Split | Number of Images |
| --- | --- |
| Training | 1,866 |
| Validation | 266 |
| Test | 533 |

The model is evaluated on four tasks:

| Task | Description |
| --- | --- |
| Style classification | Classifies Buddha faces into styles such as China, Kamakura, Nara, and Heian. |
| Size classification | Classifies statues as small, medium, or big. |
| Century classification | Classifies statues into centuries such as V, VI, VII, VIII, IX, XII, and XIII. |
| Dimensions classification | Classifies statues according to dimension categories. |

8. Experimental Results

8.1 Importance of Pseudo-Labels

The paper tests how many pseudo-label categories should be added to the extended knowledge graph. The configurations are:

| Configuration | Meaning |
| --- | --- |
| \(S_0\) | Random pseudo-label initialization. |
| \(S_1\) | One pseudo-label category added. |
| \(S_2\) | Two pseudo-label categories added. |
| \(S_3\) | Three pseudo-label categories added. |
| \(S_{all}\) | All pseudo-label categories added. |

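The results show that random initialization fails, while meaningful pseudo-labels improve performance substantially. On the Buddha statues dataset, for example, \(S_0\) scores between 0.08 and 0.30 across the four tasks, whereas \(S_{all}\) scores between 0.90 and 0.94. Adding more useful pseudo-label categories generally improves classification because the graph becomes richer.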

8.2 SemArt Results

On the SemArt dataset, GCNBoost improves classification accuracy over ContextNet KGM for Type, School, and TimeFrame once at least two pseudo-label categories are used. The main comparison (accuracy per task) is shown below:

| Model | Type | School | TimeFrame | Author |
| --- | --- | --- | --- | --- |
| ContextNet KGM | 0.815 | 0.671 | 0.613 | 0.615 |
| GCNBoost \(S_1\) | 0.807 | 0.718 | 0.796 | 0.181 |
| GCNBoost \(S_2\) | 0.915 | 0.866 | 0.906 | 0.354 |
| GCNBoost \(S_3\) | 0.930 | 0.882 | 0.933 | 0.482 |
| GCNBoost \(S_{all}\) | 0.939 | 0.889 | 0.927 | 0.479 |
| GCNBoost \(S_{all}^{*}\) with Author filter | - | - | - | 0.702 |

The strongest result is for Type classification, where GCNBoost reaches:

\[ Accuracy = 0.939 \]

For School classification, it reaches:

\[ Accuracy = 0.889 \]

For TimeFrame classification, the best result comes from \(S_3\) rather than \(S_{all}\):

\[ Accuracy = 0.933 \]

However, Author classification remains difficult because the author labels are highly imbalanced. Many authors have very few paintings, which means they are weakly connected in the graph. When the authors with very low degree are filtered out, the accuracy improves to:

\[ Accuracy = 0.702 \]
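The gains over ContextNet KGM can be read off the comparison table with a few lines of arithmetic. The sketch below only re-uses the accuracies listed above; the filtered-author row is left out because it reports a single task.

```python
# Accuracies copied from the SemArt comparison table above.
contextnet = {"Type": 0.815, "School": 0.671, "TimeFrame": 0.613, "Author": 0.615}
gcnboost = {
    "S_1": {"Type": 0.807, "School": 0.718, "TimeFrame": 0.796, "Author": 0.181},
    "S_2": {"Type": 0.915, "School": 0.866, "TimeFrame": 0.906, "Author": 0.354},
    "S_3": {"Type": 0.930, "School": 0.882, "TimeFrame": 0.933, "Author": 0.482},
    "S_all": {"Type": 0.939, "School": 0.889, "TimeFrame": 0.927, "Author": 0.479},
}

summary = {}
for task, baseline in contextnet.items():
    # Pick the GCNBoost configuration with the highest accuracy on this task.
    best_config = max(gcnboost, key=lambda config: gcnboost[config][task])
    best_accuracy = gcnboost[best_config][task]
    summary[task] = (best_config, round(best_accuracy - baseline, 3))
    print(f"{task}: best {best_config} = {best_accuracy:.3f}, "
          f"gain over ContextNet KGM = {best_accuracy - baseline:+.3f}")
```

The largest absolute gain is on TimeFrame (0.320), while unfiltered Author accuracy stays 0.133 below ContextNet KGM. The filtered Author result of 0.702 is 0.087 above the ContextNet KGM figure, although the two numbers may not be computed over the same set of authors.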

8.3 Buddha Statues Results

On the Buddha statues dataset, GCNBoost also improves performance as more pseudo-label categories are used.

| Model | Style | Size | Century | Dimensions |
| --- | --- | --- | --- | --- |
| NN original | 0.98 | 0.78 | 0.78 | 0.78 |
| NN retrained | 0.58 | 0.65 | 0.76 | 0.46 |
| \(S_0\) random initialization | 0.23 | 0.30 | 0.13 | 0.08 |
| GCNBoost \(S_1\) | 0.57 | 0.68 | 0.74 | 0.47 |
| GCNBoost \(S_2\) | 0.59 | 0.85 | 0.80 | 0.76 |
| GCNBoost \(S_3\) | 0.88 | 0.86 | 0.86 | 0.84 |
| GCNBoost \(S_{all}\) | 0.92 | 0.94 | 0.93 | 0.90 |

GCNBoost \(S_{all}\) gives the best result for Size, Century, and Dimensions classification. The original NN result is still higher for Style, but the paper explains that the original setup used a different evaluation protocol with more training data and no separate test set.

9. Discussion and Interpretation

9.1 Why GCNBoost Works

GCNBoost works because it allows artworks to influence each other through shared labels and contextual relationships. For example, paintings from the same school or time period become connected indirectly through label nodes. This helps the model learn more meaningful representations than image features alone.

The paper relates this to the idea of homophily. Homophily means that similar entities tend to connect or cluster together. In artwork classification, paintings that share author, school, timeframe, or type may also be close in the knowledge graph.

9.2 Why Author Classification Is Difficult

Author classification is difficult because the author category is highly imbalanced. Some authors have many paintings, while many authors have only a few. In graph terms, some author nodes have high degree, while many author nodes have very low degree.

Low-degree nodes are difficult for the GCN because there is not enough neighborhood information to propagate. This is why the model performs poorly on Author classification until very low-degree author nodes are filtered.
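A degree-based filter of this kind might look like the sketch below. The threshold `min_degree`, the function name, and the toy edges are hypothetical; this summary does not state which cut-off the paper applies.

```python
from collections import Counter


def filter_low_degree_labels(artwork_label_edges, min_degree):
    """Drop edges whose label node is linked to fewer than `min_degree` artworks."""
    degree = Counter(label for _artwork, label in artwork_label_edges)
    kept = [
        (artwork, label)
        for artwork, label in artwork_label_edges
        if degree[label] >= min_degree
    ]
    return kept, degree


# Toy author edges (hypothetical): one well-connected author and one rare author.
edges = [
    ("painting_1", "author_A"),
    ("painting_2", "author_A"),
    ("painting_3", "author_A"),
    ("painting_4", "author_B"),
]

kept_edges, author_degree = filter_low_degree_labels(edges, min_degree=2)
print(dict(author_degree))
print(kept_edges)
```

Here `author_B` has degree 1 and is removed, so only the three `author_A` edges remain for training and evaluation.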

9.3 What the Visualizations Show

The paper uses t-SNE visualizations to compare embeddings. The visualizations show that GCNBoost creates better-separated clusters than the baseline models. For SemArt, the GCNBoost embeddings separate TimeFrame categories more clearly. For Buddha statues, the GCNBoost embeddings separate styles more clearly than the retrained NN baseline.

This suggests that the combination of visual features and graph-based context creates more structured and meaningful representations.

10. Strengths of the Paper

The first strength of the paper is that it uses unlabeled test data in a meaningful way through transductive learning. Rather than ignoring test samples during training, it connects them to the graph through pseudo-labels.

The second strength is the use of knowledge graphs for cultural heritage classification. Artworks naturally have rich metadata and contextual relationships, and a knowledge graph is a suitable structure for representing this information.

The third strength is that the method works across two different cultural heritage domains: European paintings and Buddha statues.

The fourth strength is that it improves performance under some forms of data imbalance, especially when graph connectivity is sufficient.

11. Limitations of the Paper

One limitation is that pseudo-labels can be noisy. If the pre-trained classifier assigns wrong pseudo-labels, those wrong edges enter the extended knowledge graph and may influence learning.

Another limitation is that extremely low-degree labels remain difficult. The Author classification task shows that when many classes have very few examples, graph propagation alone is not enough.

A third limitation is that the method is transductive. This means test data is included in the graph during training. In practical deployment, the graph may need to be updated whenever new artworks are added.

Finally, the method depends on the availability of meaningful metadata and label relationships. If the knowledge graph is poor, sparse, or noisy, the advantage of GCNBoost may shrink.

12. Connection with Saree and Textile Research

This paper is very relevant for saree provenance classification and textile heritage research. A saree image is not only a visual object. It also has contextual attributes such as region, weaving technique, material, motif, border type, pallu style, zari type, and craft cluster.

A saree knowledge graph could contain nodes such as:

| Node Type | Saree Example |
| --- | --- |
| Saree image | A specific saree photograph. |
| Cluster | Kanchipuram, Banaras, Paithani, Gadwal, Ilkal. |
| Motif | Mango, peacock, temple, floral, buta. |
| Technique | Kadwa, Korvai, Jamdani, Ikat, brocade. |
| Material | Silk, cotton, zari, tussar, linen. |

The graph could contain relationships such as:

\[ Saree \rightarrow has\_motif \rightarrow Peacock \]

\[ Saree \rightarrow uses\_technique \rightarrow Korvai \]

\[ Saree \rightarrow belongs\_to\_cluster \rightarrow Kanchipuram \]

\[ Korvai \rightarrow associated\_with \rightarrow Kanchipuram \]

If some saree images are unlabeled, a preliminary CNN or ViT model could assign pseudo-labels. These pseudo-labels could then be used to build an extended saree knowledge graph, and a GCN could refine the classification.
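As a rough illustration of such an extended saree graph, the sketch below stores the relationships as triples and adds one pseudo-label edge for an unlabeled saree. The identifiers `saree_001` and `saree_002` and the pseudo-label itself are hypothetical, and relation types are ignored when building neighborhoods, as a simplification in line with an untyped GCN.

```python
from collections import defaultdict

# Known relationships, written as (head, relation, tail) triples.
triples = [
    ("saree_001", "has_motif", "Peacock"),
    ("saree_001", "uses_technique", "Korvai"),
    ("saree_001", "belongs_to_cluster", "Kanchipuram"),
    ("Korvai", "associated_with", "Kanchipuram"),
]
# Pseudo-label edge for an unlabeled saree (hypothetical CNN or ViT prediction).
pseudo_triples = [("saree_002", "uses_technique", "Korvai")]


def build_adjacency(all_triples):
    """Undirected neighbor sets; relation names are dropped for simplicity."""
    neighbors = defaultdict(set)
    for head, _relation, tail in all_triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return neighbors


def two_hop(neighbors, node):
    """Nodes reachable in exactly two steps, excluding direct neighbors and the node."""
    reach = set()
    for neighbor in neighbors[node]:
        reach |= neighbors[neighbor]
    return reach - neighbors[node] - {node}


graph = build_adjacency(triples + pseudo_triples)
print(sorted(two_hop(graph, "saree_002")))
```

The unlabeled saree reaches both the Kanchipuram cluster node and the labeled saree within two hops through the shared Korvai node, which is the neighborhood a two-layer GCN would aggregate over.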

This is especially useful for saree provenance research because many visual features are related to cultural and technical context. For example, a border motif, material, and pallu structure may jointly indicate a region. GCNBoost gives a useful framework for combining visual classification with structured textile knowledge.

13. One-Sentence Summary

The paper proposes GCNBoost, a graph-based artwork classification framework that builds an extended knowledge graph using real labels and pseudo-labels, then applies a graph convolutional network to propagate label information and improve classification of artworks such as paintings and Buddha statues.

General Disclaimer: This explanation is intended for educational and conceptual understanding. It simplifies some technical details of the original research paper while preserving the main ideas, equations, architecture, experimental results, and practical implications.