My Research Notes: Understanding the Paper: Searching Silk Fabrics by Images Leveraging on Knowledge Graph and Domain Expert Rules

The paper “Searching Silk Fabrics by Images Leveraging on Knowledge Graph and Domain Expert Rules” presents an image-based retrieval system for searching European silk textile objects. The system combines three important elements: a knowledge graph, domain expert rules, and a deep learning-based image retrieval model.

The central aim of the paper is to help preserve and explore European silk textile heritage. Many historical silk fabrics are held in museums and collections across the world. Their images and metadata are scattered across different websites, formats, and languages. This paper proposes a way to bring that scattered knowledge together and allow users to search for visually or semantically similar silk fabrics using images.

Core Idea: The paper builds an image retrieval system where a user can search silk fabrics by image, while the system uses both visual similarity and semantic knowledge from a cultural heritage knowledge graph.

Table of Contents

1. What Problem Is the Paper Solving?
2. Main Idea of the Paper
3. Knowledge Graph for Silk Textiles
4. Domain Expert Rules for Similarity
5. Image-Based Retrieval Model
6. Loss Functions Used for Training
7. Similarity Scenarios
8. Evaluation and Results
9. Exploratory Search Engine
10. Strengths of the Paper
11. Limitations of the Paper
12. Connection with Saree and Textile Research
13. One-Sentence Summary

1. What Problem Is the Paper Solving?

European silk textile production is described in the paper as an endangered form of intangible cultural heritage. Although many historical silk fabrics still exist in museums and collections, the information about them is fragmented. One museum may describe production time in one way, another museum may describe material or technique differently, and another may store images without standardized metadata.

This creates a serious problem for historians, textile experts, designers, and the public. If someone has an image of a silk fabric and wants to find visually or historically similar fabrics, there is no easy way to search across many collections at once.

The paper therefore asks a practical question:

Research Question: Can we build an image-based search system for silk fabrics that uses both visual similarity and cultural heritage knowledge?

This question is highly relevant for textile heritage because similarity is not only visual. Two fabrics may look similar because of colour or pattern, but they may also be related because of production place, time period, material, technique, or depicted motif.

2. Main Idea of the Paper

The paper proposes a system that combines:

Component	Purpose
Knowledge Graph	Stores structured information about silk textile objects, including production time, place, material, technique, and motifs.
Domain Expert Rules	Defines expert-informed similarity rules, such as fabrics using the same technique or showing the same type of motif.
CNN-Based Image Retrieval	Learns image descriptors so that similar textile images are close to one another in feature space.
Exploratory Search Engine	Allows users to search for visually similar images or objects with similar semantic properties.

The model is trained to produce feature vectors for fabric images. If two images are similar, their feature vectors should be close. If two images are dissimilar, their feature vectors should be far apart.

The broad pipeline can be represented as:

\[ Museum\ Records \rightarrow Knowledge\ Graph \rightarrow Expert\ Rules \rightarrow CNN\ Image\ Descriptors \rightarrow Similar\ Fabric\ Retrieval \]

3. Knowledge Graph for Silk Textiles

The authors collect museum records describing silk textiles and silk-related objects from collections around the world. These records include both metadata and images. The metadata may describe production time, production place, material, technique, and motifs.

The paper follows an ETL pipeline:

Step	Meaning	Purpose in This Paper
Extract	Collect data from museum websites and APIs.	Gather images and metadata of silk textile objects.
Transform	Convert different museum formats into a common structure.	Standardize fields such as date, material, technique, and production place.
Load	Load the structured data into a knowledge graph.	Create an RDF-based graph that can be queried and used for retrieval.

The knowledge graph uses the CIDOC-CRM ontology, a standard model for cultural heritage information. This is important because museum records are often heterogeneous. One museum may use the field name Date, another may use date_text, and another may use a field in another language. The knowledge graph harmonizes these fields into a common semantic structure.
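To make the Transform and Load steps more concrete, here is a minimal Python sketch of field harmonization.

- The source field names in `FIELD_MAP`, the common schema keys and the `object/<n>` identifiers are all hypothetical.
- Plain tuples stand in for RDF. A real pipeline would emit CIDOC-CRM classes and properties instead of these made-up predicates.

```python
# Hypothetical mapping from heterogeneous museum field names to one common schema.
# These source field names are illustrative, not taken from any real museum API.
FIELD_MAP = {
    "Date": "production_time",
    "date_text": "production_time",
    "Datierung": "production_time",  # e.g. a German-language field
    "Material": "material",
    "materials": "material",
    "Technique": "technique",
}

def transform(record, museum):
    """Transform step: map one raw museum record onto the common schema."""
    harmonized = {"museum": museum}
    for field, value in record.items():
        target = FIELD_MAP.get(field)
        if target is not None and value:
            harmonized[target] = value.strip()
    return harmonized

def load(objects):
    """Load step: emit simple (subject, predicate, object) triples.

    Plain tuples with made-up predicate names stand in for CIDOC-CRM RDF here.
    """
    triples = []
    for i, obj in enumerate(objects):
        subject = f"object/{i}"
        for key, value in obj.items():
            triples.append((subject, key, value))
    return triples

raw_a = {"Date": "16th century", "Material": "silk"}
raw_b = {"date_text": " 1750-1760 ", "Technique": "ciselé velvet"}
objects = [transform(raw_a, "museum_a"), transform(raw_b, "museum_b")]
triples = load(objects)
```

Both `Date` and `date_text` end up under the same `production_time` key. That shared key is what lets later queries treat the two museums' records uniformly.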

The resulting knowledge graph contains descriptions of:

Resource Type	Count
Unique objects	36,210
Images	74,527

The paper’s Figure 1 gives an example of a textile object in the knowledge graph: an object from the CDMT Terrassa museum, produced in Italy in the 16th century using the Brocatelle technique, made of Bombyx mori silk, and showing the motif of a crown.

4. Domain Expert Rules for Similarity

A major contribution of the paper is the use of domain expert rules. Cultural heritage experts helped define when two silk fabric images should be considered similar. These rules are then converted into SPARQL queries over the knowledge graph.

The rules define image pairs that should be treated as similar during training. Examples include:

Rule Type	Similarity Meaning
Same record	Two images belong to the same museum object, so they show the same fabric.
Same special dataset or medium	Objects from the Garín dataset using graph paper or gouache on paper are considered similar.
Same technique or material	Both fabrics use techniques such as pile-on-pile velvet or ciselé velvet.
Same technique plus motif	Both fabrics use ciselé velvet and depict pomegranate motifs.
Plain fabric mention	The corresponding records mention plain fabric.
Same relevant colour cluster	Both images belong to expert-identified colour clusters such as saturated red, blue, blue damasks, or green damasks.

These expert rules are valuable because they translate textile knowledge into machine-learning supervision. The model is not learning only from pixels; it is also learning from what experts consider meaningful similarity.
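The following is a rough in-memory illustration of how a rule such as "same technique plus motif" selects similar pairs. The paper expresses its rules as SPARQL queries over the knowledge graph. The toy triples, the predicate names and the `rule_pairs` helper below are hypothetical stand-ins for those queries, not the authors' actual rules.

```python
from itertools import combinations

# Toy knowledge graph as (subject, predicate, object) triples.
# Object IDs, predicate names and values are made up for illustration.
TRIPLES = [
    ("obj1", "technique", "ciselé velvet"), ("obj1", "depicts", "pomegranate"),
    ("obj2", "technique", "ciselé velvet"), ("obj2", "depicts", "pomegranate"),
    ("obj3", "technique", "ciselé velvet"), ("obj3", "depicts", "crown"),
    ("obj4", "technique", "pile-on-pile velvet"),
]

def values(triples, subject, predicate):
    """All objects linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def rule_pairs(triples, predicates):
    """Return object pairs that share at least one value for EVERY listed predicate.

    This mimics a rule such as "same technique AND same motif"; it is an
    in-memory stand-in, not the paper's SPARQL query.
    """
    subjects = sorted({s for s, _, _ in triples})
    pairs = []
    for a, b in combinations(subjects, 2):
        if all(values(triples, a, p) & values(triples, b, p) for p in predicates):
            pairs.append((a, b))
    return pairs

same_technique = rule_pairs(TRIPLES, ["technique"])
same_technique_and_motif = rule_pairs(TRIPLES, ["technique", "depicts"])
```

Each matching object pair could then be expanded into training image pairs by pairing the images attached to the two records.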

5. Image-Based Retrieval Model

The image retrieval model uses a convolutional neural network to convert each image into a compact feature vector. This vector is called a descriptor.

The process can be written as:

\[ Image\ x \rightarrow CNN \rightarrow Descriptor\ f(x) \]

The paper uses a ResNet-152 backbone. The image is resized to:

\[ 224 \times 224 \]

The ResNet-152 backbone produces a 2048-dimensional feature vector. This is followed by two fully connected layers:

\[ 2048 \rightarrow 1028 \rightarrow 128 \]

The final output is a normalized 128-dimensional descriptor:

\[ f(x) \in \mathbb{R}^{128} \]

The purpose of training is to make descriptors of similar silk images close together and descriptors of dissimilar silk images farther apart.
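Here is a minimal NumPy sketch of the descriptor head described above. It covers only the layers after the backbone, and it rests on several assumptions:

- ResNet-152 is not included. Random vectors stand in for its 2048-dimensional output.
- The weights are random and biases are omitted.
- The ReLU between the two fully connected layers is an assumption.
- "Normalized" is taken to mean L2 (unit-length) normalization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes follow the notes: 2048 -> 1028 -> 128.
# Random weights, no biases and a ReLU in between are assumptions for illustration.
W1 = rng.normal(scale=0.02, size=(2048, 1028))
W2 = rng.normal(scale=0.02, size=(1028, 128))

def descriptor_head(backbone_features):
    """Map (batch, 2048) backbone features to unit-length (batch, 128) descriptors."""
    hidden = np.maximum(backbone_features @ W1, 0.0)         # fully connected + ReLU
    out = hidden @ W2                                         # second fully connected layer
    return out / np.linalg.norm(out, axis=1, keepdims=True)  # L2 normalization

features = rng.normal(size=(4, 2048))  # stand-in for ResNet-152 outputs of 4 images
descs = descriptor_head(features)
```

Normalizing every descriptor to unit length bounds the Euclidean distance between any two descriptors at 2.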

After descriptors are computed for all images, image retrieval becomes a nearest-neighbour search:

\[ Query\ Image \rightarrow Descriptor \rightarrow k\text{-Nearest Neighbours} \rightarrow Similar\ Textile\ Images \]
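The nearest-neighbour step can be sketched as a brute-force search over precomputed descriptors. This is a simplification: a collection with tens of thousands of images would typically use an approximate nearest-neighbour index instead. The toy 2D descriptors are made up.

```python
import numpy as np

def knn_retrieve(query, database, k=5):
    """Return indices and distances of the k database descriptors nearest to `query`.

    query: (d,) descriptor; database: (n, d) descriptors.
    Euclidean distance is used; for unit-length descriptors it ranks
    items the same way as cosine similarity.
    """
    distances = np.linalg.norm(database - query, axis=1)
    order = np.argsort(distances, kind="stable")
    return order[:k], distances[order[:k]]

# Toy unit-length descriptors.
database = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
idx, dist = knn_retrieve(query, database, k=2)
```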

6. Loss Functions Used for Training

The paper trains the model using a weighted combination of four loss terms. The total loss is:

\[ E(x,w) = \alpha_t E_t(x,w) + \alpha_s E_s(x,w) + \alpha_c E_c(x,w) + \alpha_r E_r(x,w) \]

Here, the four loss terms correspond to four different notions of similarity.

Loss Term	Name	Purpose
\(E_t\)	Semantic similarity loss	Uses metadata similarity from the knowledge graph.
\(E_s\)	Self-similarity loss	Makes an image similar to augmented versions of itself.
\(E_c\)	Colour similarity loss	Uses colour distribution similarity between images.
\(E_r\)	Rule-based similarity loss	Uses domain expert rules to define similar image pairs.

6.1 Semantic Similarity Loss

Semantic similarity is based on annotations in the knowledge graph. The paper considers five semantic variables:

Semantic Variable	Meaning
Production timespan	When the textile object was produced.
Production place	Where the textile object was produced.
Production material	What material was used.
Production technique	How the textile was produced.
Subject depicted	What motif or subject is depicted.

The semantic similarity between two images is defined as:

\[ Y_s(x_n,x_o) = \frac{1}{M} \sum_{m=1}^{M} v_m \cdot d_m(x_n,x_o) \cdot \pi_m^n \cdot \pi_m^o \]

Here, \(M\) is the number of semantic variables, \(v_m\) is the weight of semantic variable \(m\), \(d_m(x_n,x_o)\) measures agreement between the annotations of two images, and \(\pi_m^n\), \(\pi_m^o\) indicate whether the annotation is available for each image.

The paper also accounts for missing annotations using an uncertainty term:

\[ u(x_n,x_o) = 1 - \frac{1}{M} \sum_{m=1}^{M} \pi_m^n \cdot \pi_m^o \]

This is important because museum metadata is often incomplete. Two images should not be treated as dissimilar simply because one record lacks a certain annotation.
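The two formulas can be computed directly for a pair of records. This sketch makes two assumptions:

- \(d_m\) is 1 when both labels are equal and 0 otherwise. The paper may define agreement differently, for example for multi-label annotations.
- A missing annotation is represented as `None`, which gives \(\pi_m = 0\).

The variable names and example labels are illustrative.

```python
def semantic_similarity(ann_n, ann_o, weights):
    """Return (Y_s, u) for two images following the formulas above.

    ann_n, ann_o: dicts mapping each semantic variable to a label or None (missing).
    weights: dict mapping each semantic variable m to its weight v_m.
    Assumption: d_m = 1 if both labels are equal, else 0.
    """
    M = len(weights)
    y_s = 0.0
    both_annotated = 0  # sum over m of pi_m^n * pi_m^o
    for m, v_m in weights.items():
        pi_n = ann_n.get(m) is not None
        pi_o = ann_o.get(m) is not None
        if pi_n and pi_o:
            both_annotated += 1
            d_m = 1.0 if ann_n[m] == ann_o[m] else 0.0
            y_s += v_m * d_m
    return y_s / M, 1.0 - both_annotated / M

weights = {v: 1.0 for v in ("timespan", "place", "material", "technique", "depiction")}
a = {"timespan": "16th c.", "place": "Italy", "material": "silk",
     "technique": "brocatelle", "depiction": None}
b = {"timespan": "17th c.", "place": "Italy", "material": "silk",
     "technique": None, "depiction": "crown"}
y_s, u = semantic_similarity(a, b, weights)
```

Only three of the five variables are annotated for both records. Of those three, place and material agree, so \(Y_s = 2/5\) and \(u = 1 - 3/5\). The missing technique and depiction labels raise the uncertainty rather than counting as disagreement.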

6.2 Triplet Loss for Semantic Similarity

The semantic loss is based on triplets:

\[ (x_a,\ x_{ps},\ x_{ng}) \]

Here, \(x_a\) is the anchor image, \(x_{ps}\) is a positive image considered similar to the anchor, and \(x_{ng}\) is a negative image considered less similar.

The model tries to make:

\[ \|f(x_a)-f(x_{ps})\|_2 < \|f(x_a)-f(x_{ng})\|_2 \]

This means the descriptor of the positive image should be closer to the anchor than the descriptor of the negative image.
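A common way to turn this inequality into a trainable objective is a hinge-style triplet loss with a margin. The sketch below shows only that generic form. How the paper selects triplets from \(Y_s\) and which margin it uses are not reproduced here, and the margin of 0.2 is an assumption.

```python
import math

def triplet_loss(f_a, f_ps, f_ng, margin=0.2):
    """Generic hinge triplet loss on descriptors given as sequences of floats.

    Zero once the positive is closer to the anchor than the negative by at
    least `margin`; the margin value is an assumption, not the paper's.
    """
    d_pos = math.dist(f_a, f_ps)
    d_neg = math.dist(f_a, f_ng)
    return max(d_pos - d_neg + margin, 0.0)

satisfied = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])  # positive already closest
violated = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])   # negative closer than positive
```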

6.3 Self-Similarity Loss

The self-similarity loss teaches the network that an image should remain similar to a transformed version of itself. The transformed image may be rotated, flipped, cropped, or perturbed with slight noise.

The loss is:

\[ E_s(x,w) = \frac{1}{N_{MB}} \sum_{n=1}^{N_{MB}} \|f_w(x_n)-f_w(x'_n)\|_2 \]

Here, \(N_{MB}\) is the number of images in a minibatch and \(x'_n\) is the augmented version of image \(x_n\). This helps the model become robust to differences in image capture, angle, cropping, and noise.
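The formula above averages descriptor distances between each image and its augmented copy. In this sketch, `toy_descriptor` is a hypothetical stand-in for \(f_w\): it takes row means, which a horizontal flip cannot change. A real CNN would produce a small but nonzero loss that training then drives down.

```python
import math

def self_similarity_loss(descs, descs_aug):
    """E_s: mean Euclidean distance between each descriptor and its augmented counterpart."""
    assert len(descs) == len(descs_aug)
    return sum(math.dist(f, g) for f, g in zip(descs, descs_aug)) / len(descs)

def hflip(image):
    """Simple augmentation: horizontal flip of an image stored as rows of pixel values."""
    return [row[::-1] for row in image]

def toy_descriptor(image):
    """Hypothetical stand-in for f_w: per-row means (unchanged by a horizontal flip)."""
    return [sum(row) / len(row) for row in image]

images = [[[0, 1], [2, 3]], [[5, 5], [1, 0]]]  # tiny grayscale "images"
loss = self_similarity_loss([toy_descriptor(x) for x in images],
                            [toy_descriptor(hflip(x)) for x in images])
```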

6.4 Colour Similarity Loss

Colour is an important visual feature for textiles. The paper converts images into HSV colour space and creates a colour histogram based on hue and saturation.

Hue \(H\) and saturation \(S\) are converted into Cartesian coordinates:

\[ x_c(H,S) = \frac{r}{2} + \frac{r}{2} S\cos(2\pi H) \]

\[ y_c(H,S) = \frac{r}{2} + \frac{r}{2} S\sin(2\pi H) \]

A 2D colour histogram is then created. The model compares colour histograms using normalized cross-correlation. If two images have similar colour distributions, their learned descriptors should also be close.
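Here is a sketch of the hue-saturation histogram and the normalized cross-correlation. It uses the coordinate mapping given above with Python's `colorsys` for the RGB-to-HSV conversion, under these assumptions:

- \(r\) is the number of bins per histogram axis.
- Every pixel is used.
- The value channel is ignored.

The paper's choice of \(r\) and how the correlation score enters \(E_c\) are not reproduced.

```python
import colorsys
import numpy as np

def hs_histogram(pixels, r=16):
    """Normalized 2D hue-saturation histogram of RGB pixels with channels in [0, 1].

    Hue becomes an angle and saturation a radius via x_c(H, S) and y_c(H, S);
    `r` is assumed to be the number of bins per axis.
    """
    hist = np.zeros((r, r))
    for red, green, blue in pixels:
        h, s, _v = colorsys.rgb_to_hsv(red, green, blue)  # value channel is ignored
        x = r / 2 + r / 2 * s * np.cos(2 * np.pi * h)
        y = r / 2 + r / 2 * s * np.sin(2 * np.pi * h)
        # Coordinates lie in [0, r]; clamp them into bin indices 0..r-1.
        i = min(max(int(x), 0), r - 1)
        j = min(max(int(y), 0), r - 1)
        hist[i, j] += 1
    return hist / max(hist.sum(), 1)

def ncc(a, b):
    """Normalized cross-correlation of two histograms (1.0 when they match up to scale)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

red = hs_histogram([(1.0, 0.0, 0.0)] * 10)
blue = hs_histogram([(0.0, 0.0, 1.0)] * 10)
same_colour, different_colour = ncc(red, red), ncc(red, blue)
```

Placing hue on a circle means colours with nearby hues fall into neighbouring bins. This holds even across the 0/1 hue wrap-around, and low-saturation pixels cluster near the centre.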

6.5 Rule-Based Similarity Loss

The rule-based loss uses domain expert rules. If two images are considered similar by expert-defined rules, the model minimizes the distance between their descriptors.

The rule-based loss is:

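\[ E_r(x,w) = \frac{1}{N_r} \sum_{n=1}^{N_r} \left[ \delta_s^n \Delta^n + (1-\delta_s^n)\max(2-\Delta^n,0) \right] \]

Here, \(N_r\) is the number of image pairs used by the rule-based loss, \(\Delta^n\) is the distance between the two descriptors of pair \(n\), \(\delta_s^n=1\) means the pair is similar, and \(\delta_s^n=0\) means it is dissimilar. In this paper, the expert rules mainly generate similar pairs rather than dissimilar pairs.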
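This is a contrastive-style loss. Below is a direct transcription of the formula, assuming \(\Delta^n\) is the Euclidean distance between the two descriptors. Similar pairs are pulled together, and dissimilar pairs are pushed apart until their distance reaches the margin of 2.

```python
import math

def rule_based_loss(pairs, margin=2.0):
    """E_r over (f_1, f_2, delta_s) tuples, with delta_s = 1 for similar, 0 for dissimilar.

    Assumes Delta^n is the Euclidean distance between the two descriptors.
    """
    total = 0.0
    for f1, f2, delta_s in pairs:
        dist = math.dist(f1, f2)
        total += delta_s * dist + (1 - delta_s) * max(margin - dist, 0.0)
    return total / len(pairs)

# Identical descriptors: no cost if labelled similar, full margin cost if labelled dissimilar.
mixed = rule_based_loss([([1.0, 0.0], [1.0, 0.0], 1), ([1.0, 0.0], [1.0, 0.0], 0)])
# Opposite unit vectors labelled dissimilar are already at distance 2: no cost.
far_apart = rule_based_loss([([1.0, 0.0], [-1.0, 0.0], 0)])
```

Because the rules mostly yield \(\delta_s^n = 1\), in practice the first term does most of the work. Its only effect is to pull rule-matched pairs together.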

7. Similarity Scenarios

The authors evaluate five different similarity scenarios. Each scenario combines the loss functions in a different way.

Scenario	Similarity Definition	Meaning
Scenario A	Semantic similarity + self-similarity	Uses knowledge graph metadata and augmented image consistency.
Scenario B	Colour similarity + self-similarity	Uses only visual similarity, mainly colour-based similarity.
Scenario C	Semantic similarity + expert rules	Adds cultural heritage expert rules to semantic similarity.
Scenario D	Colour similarity + expert rules	Adds cultural heritage expert rules to colour similarity.
Scenario E	All concepts of similarity	Combines semantic, self, colour, and rule-based similarity.

These scenarios reflect a central question in textile retrieval: should similarity be based on what the fabric looks like, what the metadata says, what experts define, or a combination of all three?
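One simple way to express these scenarios is to set the weights \(\alpha\) of unused loss terms to zero in the total loss. The sketch follows the table above for which terms each scenario switches on. The non-zero weights of 1.0 are placeholders, not the paper's tuned hyperparameters.

```python
# Active loss terms per scenario, following the table above:
# t = semantic, s = self-similarity, c = colour, r = rule-based.
# Non-zero weights of 1.0 are placeholders, not the paper's tuned alphas.
SCENARIOS = {
    "A": {"t": 1.0, "s": 1.0, "c": 0.0, "r": 0.0},
    "B": {"t": 0.0, "s": 1.0, "c": 1.0, "r": 0.0},
    "C": {"t": 1.0, "s": 0.0, "c": 0.0, "r": 1.0},
    "D": {"t": 0.0, "s": 0.0, "c": 1.0, "r": 1.0},
    "E": {"t": 1.0, "s": 1.0, "c": 1.0, "r": 1.0},
}

def total_loss(scenario, losses):
    """E = alpha_t*E_t + alpha_s*E_s + alpha_c*E_c + alpha_r*E_r for one scenario."""
    alphas = SCENARIOS[scenario]
    return sum(alphas[k] * losses[k] for k in ("t", "s", "c", "r"))

batch_losses = {"t": 0.5, "s": 0.25, "c": 1.0, "r": 2.0}  # made-up per-term values
loss_b = total_loss("B", batch_losses)
loss_e = total_loss("E", batch_losses)
```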

8. Evaluation and Results

The evaluation is performed in three broad steps. First, the authors tune hyperparameters. Second, they perform a technical evaluation using semantic similarity and k-nearest-neighbour classification. Third, they perform an expert evaluation in which cultural heritage experts judge whether retrieved images are meaningful.

8.1 Technical Evaluation

For technical evaluation, the paper reports accuracy and F1 scores for different semantic variables such as material, production place, technique, timespan, and depiction.

In terms of average accuracy, Scenario E performs marginally best, although Scenarios A, C, D, and E all lie within 0.2 points of one another; only the visual-only Scenario B is clearly lower:

Scenario	Average Accuracy	Interpretation
Scenario A	65.3 / 64.4	Semantic similarity plus self-similarity.
Scenario B	63.4 / 62.5	Visual-only colour similarity performs lower on semantic classification.
Scenario C	65.2 / 64.3	Semantic similarity plus expert rules.
Scenario D	65.2 / 64.2	Colour similarity plus expert rules.
Scenario E	65.4 / 64.4	Combination of all similarity concepts.

In terms of average F1 score, Scenario E is again the strongest among the five scenarios:

Scenario	Average F1 Score
Scenario A	43.8 / 44.1
Scenario B	40.8 / 41.1
Scenario C	44.0 / 44.1
Scenario D	43.5 / 43.7
Scenario E	44.4 / 44.4
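For illustration, k-nearest-neighbour classification can be sketched as follows. Each query takes the majority label among its retrieved neighbours for a semantic variable such as production place. The predictions are then scored against the query's own annotation.

The notes do not specify the exact protocol (the value of k, how unannotated samples are handled, or which F1 average is used). Macro-averaged F1 is an assumption here, and the labels are made up.

```python
from collections import Counter

def knn_predict(neighbour_labels):
    """Majority vote over the neighbours' labels (ties go to the label seen first)."""
    return Counter(neighbour_labels).most_common(1)[0][0]

def accuracy_and_macro_f1(y_true, y_pred):
    """Overall accuracy and macro-averaged F1 over all classes seen in y_true or y_pred."""
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1s = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return acc, sum(f1s) / len(f1s)

# Made-up production-place labels of the k = 3 neighbours retrieved for three queries.
neighbours = [["Italy", "Italy", "France"],
              ["Spain", "France", "France"],
              ["Italy", "Spain", "Spain"]]
y_true = ["Italy", "France", "Italy"]
y_pred = [knn_predict(n) for n in neighbours]
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred)
```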

8.2 Expert Evaluation

The expert evaluation is particularly important because textile retrieval is not only a technical problem. The retrieved images must make sense to domain experts and users.

The cultural heritage experts judged image pairs using three criteria:

Criterion	Meaning
Pattern	Similarity of decorative motifs such as flowers, birds, or other depicted subjects.
Colour	Similarity in colour appearance.
Appearance	General outward form, including shape, geometric form, and colour.

A retrieved pair was considered meaningful if it matched at least two of these criteria.

In the expert evaluation, Scenario B performs best. This is an important finding. Scenario B is based on colour similarity and self-similarity, meaning that a relatively simple visual-only definition of similarity gave the most meaningful results according to experts.

Important Result: Scenario E performed best in the technical semantic evaluation, while Scenario B performed best in expert evaluation. This shows that technical semantic similarity and human-perceived textile similarity are related but not identical.

The paper reports that the simplest visual-only similarity retrieved at least one meaningful image for 83% of query images. This suggests that colour and overall appearance carry much of the weight in practical textile image search.
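The "at least two of three criteria" rule and a per-query hit rate can be written down as a minimal sketch. This is one plausible way to compute a figure like the 83% above, not necessarily the paper's exact procedure. The expert judgements are made up.

```python
CRITERIA = ("pattern", "colour", "appearance")

def is_meaningful(judgement):
    """A retrieved pair is meaningful if experts marked at least two of the three criteria."""
    return sum(bool(judgement.get(c)) for c in CRITERIA) >= 2

def hit_rate(queries):
    """Share of queries with at least one meaningful retrieved image.

    `queries` holds one list per query image; each list contains the expert
    judgements (criterion -> bool) for that query's retrieved images.
    """
    hits = sum(any(is_meaningful(j) for j in judgements) for judgements in queries)
    return hits / len(queries)

queries = [
    [{"pattern": True, "colour": True}],                      # hit
    [{"colour": True}, {"appearance": True}],                 # each pair meets only one criterion
    [{"pattern": True, "colour": True, "appearance": True}, {}],  # hit
]
rate = hit_rate(queries)
```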

9. Exploratory Search Engine

The final system is integrated into a web-based exploratory search engine. Users can search and filter silk textile objects using facets such as material, technique, production place, and production time.

The image retrieval system is integrated through two search options:

Search Option	Scenario Used	Meaning
Visually similar images	Scenario B	Finds objects that look visually similar, especially in colour and appearance.
Objects with similar properties	Scenario E	Finds objects similar in semantic properties such as material, technique, time, place, or depiction.

This is a very practical design. A user may want to search visually, while a researcher may want to search by historical or technical similarity. The system supports both kinds of exploration.

10. Strengths of the Paper

The first major strength of the paper is that it combines visual image retrieval with cultural heritage knowledge. This is much richer than using only image pixels or only metadata.

The second strength is the use of domain expert rules. The paper recognizes that textile similarity is a specialist concept. Experts can define similarity in ways that are not obvious from images alone.

The third strength is the practical integration into an exploratory search engine. The work is not only theoretical; it becomes a usable tool for searching silk textile collections.

The fourth strength is the distinction between visual similarity and semantic similarity. The paper does not assume that one definition of similarity fits every user. Instead, it offers different retrieval modes.

11. Limitations of the Paper

One limitation is that the expert rules mainly generate positive similarity pairs. The paper notes that a negative dissimilarity rule was initially considered, but it did not produce enough examples to be useful. This means the model has limited rule-based supervision for dissimilarity.

Another limitation is that museum metadata can be incomplete or inconsistent. The paper addresses missing annotations through an uncertainty term, but incomplete metadata remains a challenge.

A third limitation is that visual similarity and expert semantic similarity may not always align. For example, two textiles may look similar but belong to different periods or techniques, while two historically related textiles may not look visually similar.

Finally, the system is designed around European silk heritage. Applying the same method to Indian sarees, handlooms, or broader textile traditions would require new ontologies, expert rules, and domain-specific metadata harmonization.

12. Connection with Saree and Textile Research

This paper is highly relevant for saree provenance classification and textile heritage search. A saree image search system can also benefit from combining image features, metadata, and expert rules.

For example, saree similarity may depend on:

Feature Type	Saree Example
Visual similarity	Colour, motif, border, pallu, layout, zari appearance.
Semantic similarity	Cluster, region, material, weave, technique, motif name.
Expert rules	Temple border plus contrast pallu may indicate Kanjivaram; kadwa brocade plus Mughal floral motif may indicate Banarasi.

A saree knowledge graph could represent relationships such as:

\[ Saree \rightarrow has\_motif \rightarrow Mango \]

\[ Saree \rightarrow has\_border \rightarrow Temple\ Border \]

\[ Saree \rightarrow uses\_technique \rightarrow Kadwa \]

\[ Saree \rightarrow belongs\_to\_cluster \rightarrow Banaras \]

Then expert rules could generate training pairs. For example:

Expert Rule	Possible Meaning
Same cluster and same motif family	Likely semantically similar sarees.
Same weaving technique and same border type	Likely structurally similar sarees.
Same colour family and similar pallu layout	Likely visually similar sarees.

This is directly useful for building a saree image retrieval system, a saree provenance classifier, or a digital textile archive where users can search visually as well as semantically.

13. One-Sentence Summary

The paper presents a silk textile image retrieval system that combines a cultural heritage knowledge graph, domain expert similarity rules, and CNN-based image descriptors to retrieve visually and semantically similar silk fabrics from distributed museum collections.

General Disclaimer: This explanation is intended for educational and conceptual understanding. It simplifies some technical details of the original research paper while preserving the main ideas, equations, architecture, evaluation method, and practical implications.

Friday, 5 June 2026

Understanding the Paper: Searching Silk Fabrics by Images Leveraging on Knowledge Graph and Domain Expert Rules

Understanding the Paper: Searching Silk Fabrics by Images Leveraging on Knowledge Graph and Domain Expert Rules

1. What Problem Is the Paper Solving?

2. Main Idea of the Paper

3. Knowledge Graph for Silk Textiles

4. Domain Expert Rules for Similarity

5. Image-Based Retrieval Model

6. Loss Functions Used for Training

6.1 Semantic Similarity Loss

6.2 Triplet Loss for Semantic Similarity

6.3 Self-Similarity Loss

6.4 Colour Similarity Loss

6.5 Rule-Based Similarity Loss

7. Similarity Scenarios

8. Evaluation and Results

8.1 Technical Evaluation

8.2 Expert Evaluation

9. Exploratory Search Engine

10. Strengths of the Paper

11. Limitations of the Paper

12. Connection with Saree and Textile Research

13. One-Sentence Summary

No comments:

Post a Comment

Understading the Paper: Fine Grained Image Analysis with Deep Learning