Understanding the Paper: Searching Silk Fabrics by Images Leveraging on Knowledge Graph and Domain Expert Rules
The paper “Searching Silk Fabrics by Images Leveraging on Knowledge Graph and Domain Expert Rules” presents an image-based retrieval system for searching European silk textile objects. The system combines three important elements: a knowledge graph, domain expert rules, and a deep learning-based image retrieval model.
The central aim of the paper is to help preserve and explore European silk textile heritage. Many historical silk fabrics are held in museums and collections across the world. Their images and metadata are scattered across different websites, formats, and languages. This paper proposes a way to bring that scattered knowledge together and allow users to search for visually or semantically similar silk fabrics using images.
- 1. What Problem Is the Paper Solving?
- 2. Main Idea of the Paper
- 3. Knowledge Graph for Silk Textiles
- 4. Domain Expert Rules for Similarity
- 5. Image-Based Retrieval Model
- 6. Loss Functions Used for Training
- 7. Similarity Scenarios
- 8. Evaluation and Results
- 9. Exploratory Search Engine
- 10. Strengths of the Paper
- 11. Limitations of the Paper
- 12. Connection with Saree and Textile Research
- 13. One-Sentence Summary
1. What Problem Is the Paper Solving?
European silk textile production is described in the paper as an endangered form of intangible cultural heritage. Although many historical silk fabrics still exist in museums and collections, the information about them is fragmented. One museum may describe production time in one way, another museum may describe material or technique differently, and another may store images without standardized metadata.
This creates a serious problem for historians, textile experts, designers, and the public. If someone has an image of a silk fabric and wants to find visually or historically similar fabrics, there is no easy way to search across many collections at once.
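The paper therefore asks a practical question: given an image of a silk fabric, how can visually or semantically similar fabrics be found across many scattered collections?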
This question is highly relevant for textile heritage because similarity is not only visual. Two fabrics may look similar because of colour or pattern, but they may also be related because of production place, time period, material, technique, or depicted motif.
2. Main Idea of the Paper
The paper proposes a system that combines:
| Component | Purpose |
|---|---|
| Knowledge Graph | Stores structured information about silk textile objects, including production time, place, material, technique, and motifs. |
| Domain Expert Rules | Defines expert-informed similarity rules, such as fabrics using the same technique or showing the same type of motif. |
| CNN-Based Image Retrieval | Learns image descriptors so that similar textile images are close to one another in feature space. |
| Exploratory Search Engine | Allows users to search for visually similar images or objects with similar semantic properties. |
The model is trained to produce feature vectors for fabric images. If two images are similar, their feature vectors should be close. If two images are dissimilar, their feature vectors should be far apart.
The broad pipeline can be represented as:
\[ Museum\ Records \rightarrow Knowledge\ Graph \rightarrow Expert\ Rules \rightarrow CNN\ Image\ Descriptors \rightarrow Similar\ Fabric\ Retrieval \]
3. Knowledge Graph for Silk Textiles
The authors collect museum records describing silk textiles and silk-related objects from collections around the world. These records include both metadata and images. The metadata may describe production time, production place, material, technique, and motifs.
The paper follows an ETL pipeline:
| Step | Meaning | Purpose in This Paper |
|---|---|---|
| Extract | Collect data from museum websites and APIs. | Gather images and metadata of silk textile objects. |
| Transform | Convert different museum formats into a common structure. | Standardize fields such as date, material, technique, and production place. |
| Load | Load the structured data into a knowledge graph. | Create an RDF-based graph that can be queried and used for retrieval. |
The knowledge graph uses the CIDOC-CRM ontology, a standard model for cultural heritage information. This is important because museum records are often heterogeneous. One museum may use the field name Date, another may use date_text, and another may use a field in another language. The knowledge graph harmonizes these fields into a common semantic structure.
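The harmonization idea can be illustrated with a minimal sketch. The alias table, the common key names, and the sample records below are all invented for illustration; the actual pipeline maps records onto CIDOC-CRM classes and properties rather than onto flat dictionary keys.

```python
# Hypothetical source-field names mapped to one common key.
# The real system maps onto CIDOC-CRM, not onto flat keys like these.
FIELD_ALIASES = {
    "Date": "production_time",
    "date_text": "production_time",
    "fecha": "production_time",   # assumed Spanish-language field name
    "Technique": "technique",
    "tecnica": "technique",
    "Material": "material",
}

def harmonize(record):
    """Return a new dict that keeps only known fields, renamed to common keys."""
    out = {}
    for key, value in record.items():
        common = FIELD_ALIASES.get(key)
        if common is not None and value not in (None, ""):
            out[common] = value
    return out

# Three toy records, each using a different naming convention.
records = [
    {"Date": "16th century", "Technique": "Brocatelle"},
    {"date_text": "1700-1750", "Material": "silk"},
    {"fecha": "siglo XVIII", "tecnica": "damasco"},
]
harmonized = [harmonize(r) for r in records]
```

After this step every record exposes its date under the same key, which is what makes a single query across museums possible.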
The resulting knowledge graph contains descriptions of:
| Resource Type | Count |
|---|---|
| Unique objects | 36,210 |
| Images | 74,527 |
The paper’s Figure 1 gives an example of a textile object in the knowledge graph: an object from the CDMT Terrassa museum, produced in Italy in the 16th century, using the Brocatelle technique, made with silk bombyx mori, and showing the motif of a crown.
4. Domain Expert Rules for Similarity
A major contribution of the paper is the use of domain expert rules. Cultural heritage experts helped define when two silk fabric images should be considered similar. These rules are then converted into SPARQL queries over the knowledge graph.
The rules define image pairs that should be treated as similar during training. Examples include:
| Rule Type | Similarity Meaning |
|---|---|
| Same record | Two images belong to the same museum object, so they show the same fabric. |
| Same special dataset or medium | Objects from the Garín dataset using graph paper or gouache on paper are considered similar. |
| Same technique or material | Both fabrics use techniques such as pile-on-pile velvet or ciselé velvet. |
| Same technique plus motif | Both fabrics use ciselé velvet and depict pomegranate motifs. |
| Plain fabric mention | The corresponding records mention plain fabric. |
| Same relevant colour cluster | Both images belong to expert-identified colour clusters such as saturated red, blue, blue damasks, or green damasks. |
These expert rules are valuable because they translate textile knowledge into machine-learning supervision. The model is not learning only from pixels; it is also learning from what experts consider meaningful similarity.
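As a rough sketch of how such rules can turn into training pairs, the example below applies two of the rule types from the table to a handful of toy records. The paper expresses its rules as SPARQL queries over the knowledge graph; the plain Python predicates, record fields, and identifiers here are stand-ins invented for illustration.

```python
from itertools import combinations

# Toy image records; identifiers and values are invented.
images = [
    {"image": "img1", "object": "objA", "technique": "ciselé velvet", "motif": "pomegranate"},
    {"image": "img2", "object": "objA", "technique": "ciselé velvet", "motif": "pomegranate"},
    {"image": "img3", "object": "objB", "technique": "ciselé velvet", "motif": "pomegranate"},
    {"image": "img4", "object": "objC", "technique": "damask", "motif": "crown"},
]

def same_record(a, b):
    # Rule: both images belong to the same museum object.
    return a["object"] == b["object"]

def same_technique_and_motif(a, b):
    # Rule: both fabrics are ciselé velvet and depict pomegranates.
    return (a["technique"] == b["technique"] == "ciselé velvet"
            and a["motif"] == b["motif"] == "pomegranate")

RULES = [same_record, same_technique_and_motif]

def similar_pairs(items, rules):
    """Collect every image pair that satisfies at least one rule."""
    pairs = set()
    for a, b in combinations(items, 2):
        if any(rule(a, b) for rule in rules):
            pairs.add((a["image"], b["image"]))
    return pairs

pairs = similar_pairs(images, RULES)
```

Here `img4` ends up in no pair, which mirrors the point that the rules only mark pairs as similar and say nothing about the rest.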
5. Image-Based Retrieval Model
The image retrieval model uses a convolutional neural network to convert each image into a compact feature vector. This vector is called a descriptor.
The process can be written as:
\[ Image\ x \rightarrow CNN \rightarrow Descriptor\ f(x) \]
The paper uses a ResNet-152 backbone. The image is resized to:
\[ 224 \times 224 \]
The ResNet-152 backbone produces a 2048-dimensional feature vector. This is followed by two fully connected layers:
\[ 2048 \rightarrow 1028 \rightarrow 128 \]
The final output is a normalized 128-dimensional descriptor:
\[ f(x) \in \mathbb{R}^{128} \]
The purpose of training is to make descriptors of similar silk images close together and descriptors of dissimilar silk images farther apart.
After descriptors are computed for all images, image retrieval becomes a nearest-neighbour search:
\[ Query\ Image \rightarrow Descriptor \rightarrow k\text{-Nearest Neighbours} \rightarrow Similar\ Textile\ Images \]
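The retrieval step can be sketched in a few lines, assuming descriptors have already been computed. The 3-dimensional vectors and image names below are toy values chosen for readability (the paper's descriptors have 128 dimensions), and a brute-force sort stands in for whatever index a production system would use.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(query, gallery, k):
    """Return the k gallery ids whose descriptors are closest to `query`."""
    ranked = sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
    return ranked[:k]

# Toy gallery of normalized descriptors (names and values are invented).
gallery = {
    "red_velvet": l2_normalize([0.9, 0.1, 0.0]),
    "red_damask": l2_normalize([0.8, 0.3, 0.1]),
    "blue_damask": l2_normalize([0.1, 0.2, 0.9]),
}
query = l2_normalize([1.0, 0.1, 0.0])
neighbours = k_nearest(query, gallery, k=2)
```

Because all descriptors are normalized, ranking by Euclidean distance gives the same order as ranking by cosine similarity.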
6. Loss Functions Used for Training
The paper trains the model using a weighted combination of four loss terms. The total loss is:
\[ E(x,w) = \alpha_t E_t(x,w) + \alpha_s E_s(x,w) + \alpha_c E_c(x,w) + \alpha_r E_r(x,w) \]
Here, the four loss terms correspond to four different notions of similarity.
| Loss Term | Name | Purpose |
|---|---|---|
| \(E_t\) | Semantic similarity loss | Uses metadata similarity from the knowledge graph. |
| \(E_s\) | Self-similarity loss | Makes an image similar to augmented versions of itself. |
| \(E_c\) | Colour similarity loss | Uses colour distribution similarity between images. |
| \(E_r\) | Rule-based similarity loss | Uses domain expert rules to define similar image pairs. |
6.1 Semantic Similarity Loss
Semantic similarity is based on annotations in the knowledge graph. The paper considers five semantic variables:
| Semantic Variable | Meaning |
|---|---|
| Production timespan | When the textile object was produced. |
| Production place | Where the textile object was produced. |
| Production material | What material was used. |
| Production technique | How the textile was produced. |
| Subject depicted | What motif or subject is depicted. |
The semantic similarity between two images is defined as:
\[ Y_s(x_n,x_o) = \frac{1}{M} \sum_{m=1}^{M} v_m \cdot d_m(x_n,x_o) \cdot \pi_m^n \cdot \pi_m^o \]
Here, \(M\) is the number of semantic variables, \(v_m\) is the weight of semantic variable \(m\), \(d_m(x_n,x_o)\) measures agreement between the annotations of two images, and \(\pi_m^n\), \(\pi_m^o\) indicate whether the annotation is available for each image.
The paper also accounts for missing annotations using an uncertainty term:
\[ u(x_n,x_o) = 1 - \frac{1}{M} \sum_{m=1}^{M} \pi_m^n \cdot \pi_m^o \]
This is important because museum metadata is often incomplete. Two images should not be treated as dissimilar simply because one record lacks a certain annotation.
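A small worked example may make the two formulas concrete. It assumes that \(d_m\) is 1 when both annotations are equal and 0 otherwise, and that all weights \(v_m\) are 1; both are simplifying assumptions for illustration, and the annotation values are invented.

```python
VARIABLES = ["timespan", "place", "material", "technique", "depiction"]

def availability(ann):
    """pi_m: 1 if the annotation for variable m is present, else 0."""
    return [1 if ann.get(m) is not None else 0 for m in VARIABLES]

def semantic_similarity(a, b, weights):
    """Y_s: weighted agreement over variables available for both images."""
    pa, pb = availability(a), availability(b)
    total = 0.0
    for m, name in enumerate(VARIABLES):
        # Assumed agreement measure d_m: exact match of the two annotations.
        agree = 1.0 if a.get(name) == b.get(name) else 0.0
        total += weights[m] * agree * pa[m] * pb[m]
    return total / len(VARIABLES)

def uncertainty(a, b):
    """u: share of variables for which at least one annotation is missing."""
    pa, pb = availability(a), availability(b)
    return 1.0 - sum(x * y for x, y in zip(pa, pb)) / len(VARIABLES)

x_n = {"timespan": "16th century", "place": "Italy", "material": "silk",
       "technique": "brocatelle", "depiction": "crown"}
x_o = {"timespan": "16th century", "place": "Spain", "material": "silk",
       "technique": None}  # technique and depiction are unknown

y_s = semantic_similarity(x_n, x_o, weights=[1.0] * 5)  # 2 agreements out of 5
u = uncertainty(x_n, x_o)                               # 2 of 5 not comparable
```

The pair agrees on timespan and material, disagrees on place, and cannot be compared on technique or depiction, so both \(Y_s\) and \(u\) come out at 0.4.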
6.2 Triplet Loss for Semantic Similarity
The semantic loss is based on triplets:
\[ (x_a,\ x_{ps},\ x_{ng}) \]
Here, \(x_a\) is the anchor image, \(x_{ps}\) is a positive image considered similar to the anchor, and \(x_{ng}\) is a negative image considered less similar.
The model tries to make:
\[ \|f(x_a)-f(x_{ps})\|_2 < \|f(x_a)-f(x_{ng})\|_2 \]
This means the descriptor of the positive image should be closer to the anchor than the descriptor of the negative image.
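One common way to turn this inequality into a trainable loss is a hinge with a margin, sketched below. The fixed margin of 0.5 is an assumption made for this example; the summary above does not state how the paper sets the margin, and the 2-dimensional descriptors are toy values.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero when the positive is closer than the negative by at least `margin`."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

anchor = [0.0, 0.0]
near = [0.1, 0.0]
far = [1.0, 0.0]

satisfied = triplet_loss(anchor, positive=near, negative=far)  # ordering holds
violated = triplet_loss(anchor, positive=far, negative=near)   # ordering broken
```

The loss is zero for the correctly ordered triplet and positive for the swapped one, so only triplets that violate the desired ordering contribute to the gradient.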
6.3 Self-Similarity Loss
The self-similarity loss teaches the network that an image should remain similar to a transformed version of itself. The transformed image may be rotated, flipped, cropped, or slightly noised.
The loss is:
\[ E_s(x,w) = \frac{1}{N_{MB}} \sum_{n=1}^{N_{MB}} \|f_w(x_n)-f_w(x'_n)\|_2 \]
Here, \(x'_n\) is the augmented version of image \(x_n\). This helps the model become robust to differences in image capture, angle, cropping, and noise.
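The formula is a plain average of descriptor distances over the mini-batch, as the following sketch shows. The descriptors are supplied directly as toy lists; in practice they would come from passing each image and its augmented copy through the network.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def self_similarity_loss(descriptors, augmented_descriptors):
    """Mean L2 distance between each descriptor and that of its augmented copy."""
    distances = [euclidean(f, f_aug)
                 for f, f_aug in zip(descriptors, augmented_descriptors)]
    return sum(distances) / len(distances)

# Toy mini-batch of two images (values invented for easy arithmetic).
originals = [[0.0, 0.0], [1.0, 1.0]]
augmented = [[3.0, 4.0], [1.0, 1.0]]
loss = self_similarity_loss(originals, augmented)  # (5.0 + 0.0) / 2
```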
6.4 Colour Similarity Loss
Colour is an important visual feature for textiles. The paper converts images into HSV colour space and creates a colour histogram based on hue and saturation.
Hue \(H\) and saturation \(S\) are converted into Cartesian coordinates:
\[ x_c(H,S) = \frac{r}{2} + \frac{r}{2} S\cos(2\pi H) \]
\[ y_c(H,S) = \frac{r}{2} + \frac{r}{2} S\sin(2\pi H) \]
A 2D colour histogram is then created. The model compares colour histograms using normalized cross-correlation. If two images have similar colour distribution, their learned descriptors should also be close.
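The colour pipeline can be sketched end to end with the standard library. This sketch assumes that \(r\) is the side length of the square histogram, bins pixels by truncating the Cartesian coordinates, and uses zero-mean normalized cross-correlation; these details and the pixel values are illustrative assumptions rather than the paper's exact settings.

```python
import colorsys
import math

def hs_to_xy(h, s, r):
    """Map hue and saturation (both in [0, 1]) to Cartesian histogram coordinates."""
    x = r / 2 + (r / 2) * s * math.cos(2 * math.pi * h)
    y = r / 2 + (r / 2) * s * math.sin(2 * math.pi * h)
    return x, y

def colour_histogram(rgb_pixels, r=8):
    """Build an r-by-r hue/saturation histogram from RGB pixels in [0, 1]."""
    hist = [[0.0] * r for _ in range(r)]
    for red, green, blue in rgb_pixels:
        h, s, _v = colorsys.rgb_to_hsv(red, green, blue)
        x, y = hs_to_xy(h, s, r)
        col = min(int(x), r - 1)  # clamp the upper edge into the last bin
        row = min(int(y), r - 1)
        hist[row][col] += 1.0
    return hist

def ncc(hist_a, hist_b):
    """Zero-mean normalized cross-correlation of two histograms, in [-1, 1]."""
    a = [v for row in hist_a for v in row]
    b = [v for row in hist_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

# Toy "images" given as short pixel lists (values invented).
hist_red = colour_histogram([(0.9, 0.1, 0.1), (0.8, 0.1, 0.2)])
hist_red2 = colour_histogram([(0.85, 0.1, 0.1), (0.9, 0.15, 0.1)])
hist_blue = colour_histogram([(0.1, 0.1, 0.9), (0.2, 0.1, 0.8)])

red_vs_red = ncc(hist_red, hist_red2)
red_vs_blue = ncc(hist_red, hist_blue)
```

The two reddish pixel sets share a histogram bin and correlate positively, while the red and blue sets occupy different regions of the hue circle and do not.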
6.5 Rule-Based Similarity Loss
The rule-based loss uses domain expert rules. If two images are considered similar by expert-defined rules, the model minimizes the distance between their descriptors.
The rule-based loss is:
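\[ E_r(x,w) = \frac{1}{N_r} \sum_{n=1}^{N_r} \left[ \delta_s^n \Delta^n + (1-\delta_s^n)\max(2-\Delta^n,0) \right] \]
Here, \(N_r\) is the number of image pairs and \(\Delta^n\) is the distance between the descriptors of the \(n\)-th pair. \(\delta_s^n=1\) means the pair is similar, and \(\delta_s^n=0\) means it is dissimilar. The margin of 2 is the largest possible distance between two unit-length descriptors. In this paper, the expert rules mainly generate similar pairs rather than dissimilar pairs.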
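The loss can be evaluated by hand on a few toy pairs, as in the sketch below. Each pair is given as a similarity flag and a precomputed descriptor distance; the distances are invented values, not results from the paper.

```python
def rule_based_loss(pairs, margin=2.0):
    """pairs: list of (delta, distance); delta is 1 for similar, 0 for dissimilar."""
    total = 0.0
    for delta, dist in pairs:
        # Similar pairs are penalized by their distance; dissimilar pairs
        # are penalized only while they are closer than the margin.
        total += delta * dist + (1 - delta) * max(margin - dist, 0.0)
    return total / len(pairs)

toy_pairs = [
    (1, 0.5),  # similar and close: small penalty 0.5
    (1, 1.5),  # similar but far: penalty 1.5
    (0, 0.5),  # dissimilar but close: penalty 2 - 0.5 = 1.5
    (0, 2.0),  # dissimilar and at the margin: no penalty
]
loss = rule_based_loss(toy_pairs)  # (0.5 + 1.5 + 1.5 + 0.0) / 4
```

When only similar pairs are available, the second term never fires, and the loss reduces to the mean distance between rule-matched images.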
7. Similarity Scenarios
The authors evaluate five different similarity scenarios. Each scenario combines the loss functions in a different way.
| Scenario | Similarity Definition | Meaning |
|---|---|---|
| Scenario A | Semantic similarity + self-similarity | Uses knowledge graph metadata and augmented image consistency. |
| Scenario B | Colour similarity + self-similarity | Uses only visual similarity, mainly colour-based similarity. |
| Scenario C | Semantic similarity + expert rules | Adds cultural heritage expert rules to semantic similarity. |
| Scenario D | Colour similarity + expert rules | Adds cultural heritage expert rules to colour similarity. |
| Scenario E | All concepts of similarity | Combines semantic, self, colour, and rule-based similarity. |
These scenarios reflect a central question in textile retrieval: should similarity be based on what the fabric looks like, what the metadata says, what experts define, or a combination of all three?
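One way to read the scenario table is as a choice of which loss weights are switched on in the total loss. The sketch below uses weights of 1 and 0 purely as on/off indicators; the paper's actual weight values are not given in this summary, and the per-term loss values are invented.

```python
# On/off indicators for the four loss terms (t: semantic, s: self,
# c: colour, r: rules). Real weight values are tuned, not simply 1 or 0.
SCENARIOS = {
    "A": {"t": 1.0, "s": 1.0, "c": 0.0, "r": 0.0},
    "B": {"t": 0.0, "s": 1.0, "c": 1.0, "r": 0.0},
    "C": {"t": 1.0, "s": 0.0, "c": 0.0, "r": 1.0},
    "D": {"t": 0.0, "s": 0.0, "c": 1.0, "r": 1.0},
    "E": {"t": 1.0, "s": 1.0, "c": 1.0, "r": 1.0},
}

def total_loss(losses, weights):
    """Weighted sum of the individual loss terms."""
    return sum(weights[k] * losses[k] for k in weights)

# Hypothetical per-term loss values for one mini-batch.
losses = {"t": 0.4, "s": 0.1, "c": 0.3, "r": 0.2}
per_scenario = {name: total_loss(losses, w) for name, w in SCENARIOS.items()}
```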
8. Evaluation and Results
The evaluation is performed in three broad steps. First, the authors tune hyperparameters. Second, they perform a technical evaluation using semantic similarity and k-nearest-neighbour classification. Third, they perform an expert evaluation in which cultural heritage experts judge whether retrieved images are meaningful.
8.1 Technical Evaluation
For technical evaluation, the paper reports accuracy and F1 scores for different semantic variables such as material, production place, technique, timespan, and depiction.
In terms of average accuracy, Scenario E achieves the highest values, though only by a narrow margin:
| Scenario | Average Accuracy | Interpretation |
|---|---|---|
| Scenario A | 65.3 / 64.4 | Semantic similarity plus self-similarity. |
| Scenario B | 63.4 / 62.5 | Visual-only colour similarity performs lower on semantic classification. |
| Scenario C | 65.2 / 64.3 | Semantic similarity plus expert rules. |
| Scenario D | 65.2 / 64.2 | Colour similarity plus expert rules. |
| Scenario E | 65.4 / 64.4 | Combination of all similarity concepts. |
In terms of average F1 score, Scenario E is again the strongest among the five scenarios:
| Scenario | Average F1 Score |
|---|---|
| Scenario A | 43.8 / 44.1 |
| Scenario B | 40.8 / 41.1 |
| Scenario C | 44.0 / 44.1 |
| Scenario D | 43.5 / 43.7 |
| Scenario E | 44.4 / 44.4 |
8.2 Expert Evaluation
The expert evaluation is particularly important because textile retrieval is not only a technical problem. The retrieved images must make sense to domain experts and users.
The cultural heritage experts judged image pairs using three criteria:
| Criterion | Meaning |
|---|---|
| Pattern | Similarity of decorative motifs such as flowers, birds, or other depicted subjects. |
| Colour | Similarity in colour appearance. |
| Appearance | General outward form, including shape, geometric form, and colour. |
A retrieved pair was considered meaningful if it matched at least two of these criteria.
In the expert evaluation, Scenario B performs best. Because Scenario B relies only on colour similarity and self-similarity, this means that a relatively simple, visual-only definition of similarity gave the results that experts found most meaningful.
The paper reports that the simplest visual-only similarity retrieved at least one meaningful image for 83% of query images. This indicates that colour and appearance are extremely important for practical textile image search.
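The two evaluation rules described above (a pair is meaningful if it meets at least two of the three criteria, and a query counts as a hit if at least one retrieved image is meaningful) can be sketched as follows. The judgements are toy data invented for this example and do not reproduce the paper's figures.

```python
CRITERIA = ("pattern", "colour", "appearance")

def is_meaningful(judgement):
    """A retrieved image is meaningful if it matches at least two criteria."""
    return sum(judgement[c] for c in CRITERIA) >= 2

def hit_rate(results):
    """Fraction of queries with at least one meaningful retrieved image."""
    hits = sum(1 for judgements in results.values()
               if any(is_meaningful(j) for j in judgements))
    return hits / len(results)

def judge(pattern, colour, appearance):
    return {"pattern": pattern, "colour": colour, "appearance": appearance}

# Toy expert judgements: query id -> one judgement per retrieved image.
results = {
    "q1": [judge(True, True, False), judge(False, False, False)],
    "q2": [judge(False, True, False)],
    "q3": [judge(False, True, True)],
    "q4": [judge(True, True, True), judge(False, False, True)],
}
rate = hit_rate(results)  # q1, q3 and q4 are hits; q2 is not
```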
9. Exploratory Search Engine
The final system is integrated into a web-based exploratory search engine. Users can search and filter silk textile objects using facets such as material, technique, production place, and production time.
The image retrieval system is integrated through two search options:
| Search Option | Scenario Used | Meaning |
|---|---|---|
| Visually similar images | Scenario B | Finds objects that look visually similar, especially in colour and appearance. |
| Objects with similar properties | Scenario E | Finds objects similar in semantic properties such as material, technique, time, place, or depiction. |
This is a very practical design. A user may want to search visually, while a researcher may want to search by historical or technical similarity. The system supports both kinds of exploration.
10. Strengths of the Paper
The first major strength of the paper is that it combines visual image retrieval with cultural heritage knowledge. This is much richer than using only image pixels or only metadata.
The second strength is the use of domain expert rules. The paper recognizes that textile similarity is a specialist concept. Experts can define similarity in ways that are not obvious from images alone.
The third strength is the practical integration into an exploratory search engine. The work is not only theoretical; it becomes a usable tool for searching silk textile collections.
The fourth strength is the distinction between visual similarity and semantic similarity. The paper does not assume that one definition of similarity fits every user. Instead, it offers different retrieval modes.
11. Limitations of the Paper
One limitation is that the expert rules mainly generate positive similarity pairs. The paper notes that a negative dissimilarity rule was initially considered, but it did not produce enough examples to be useful. This means the model has limited rule-based supervision for dissimilarity.
Another limitation is that museum metadata can be incomplete or inconsistent. The paper addresses missing annotations through an uncertainty term, but incomplete metadata remains a challenge.
A third limitation is that visual similarity and expert semantic similarity may not always align. For example, two textiles may look similar but belong to different periods or techniques, while two historically related textiles may not look visually similar.
Finally, the system is designed around European silk heritage. Applying the same method to Indian sarees, handlooms, or broader textile traditions would require new ontologies, expert rules, and domain-specific metadata harmonization.
12. Connection with Saree and Textile Research
This paper is highly relevant for saree provenance classification and textile heritage search. A saree image search system can also benefit from combining image features, metadata, and expert rules.
For example, saree similarity may depend on:
| Feature Type | Saree Example |
|---|---|
| Visual similarity | Colour, motif, border, pallu, layout, zari appearance. |
| Semantic similarity | Cluster, region, material, weave, technique, motif name. |
| Expert rules | Temple border plus contrast pallu may indicate Kanjivaram; kadwa brocade plus Mughal floral motif may indicate Banarasi. |
A saree knowledge graph could represent relationships such as:
\[ Saree \rightarrow has\_motif \rightarrow Mango \]
\[ Saree \rightarrow has\_border \rightarrow Temple\ Border \]
\[ Saree \rightarrow uses\_technique \rightarrow Kadwa \]
\[ Saree \rightarrow belongs\_to\_cluster \rightarrow Banaras \]
Then expert rules could generate training pairs. For example:
| Expert Rule | Possible Meaning |
|---|---|
| Same cluster and same motif family | Likely semantically similar sarees. |
| Same weaving technique and same border type | Likely structurally similar sarees. |
| Same colour family and similar pallu layout | Likely visually similar sarees. |
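The first rule in the table can be sketched over a tiny set of triples. The saree identifiers, the `Kanchipuram` cluster value, and the use of exact motif matches in place of motif families are all simplifications invented for this example; a real system would query an RDF store instead of a Python list.

```python
from itertools import combinations

# Toy (subject, predicate, object) triples for a hypothetical saree graph.
triples = [
    ("saree1", "has_motif", "Mango"),
    ("saree1", "belongs_to_cluster", "Banaras"),
    ("saree1", "uses_technique", "Kadwa"),
    ("saree2", "has_motif", "Mango"),
    ("saree2", "belongs_to_cluster", "Banaras"),
    ("saree3", "has_border", "Temple Border"),
    ("saree3", "belongs_to_cluster", "Kanchipuram"),
]

def properties(triples):
    """Group triples into {subject: {predicate: set of objects}}."""
    props = {}
    for subject, predicate, obj in triples:
        props.setdefault(subject, {}).setdefault(predicate, set()).add(obj)
    return props

def same_cluster_and_motif(a, b):
    """Rule: the two sarees share a cluster and at least one motif."""
    shared_cluster = a.get("belongs_to_cluster", set()) & b.get("belongs_to_cluster", set())
    shared_motif = a.get("has_motif", set()) & b.get("has_motif", set())
    return bool(shared_cluster and shared_motif)

props = properties(triples)
training_pairs = [(s1, s2) for s1, s2 in combinations(sorted(props), 2)
                  if same_cluster_and_motif(props[s1], props[s2])]
```

Pairs produced this way could then feed a rule-based similarity loss in the same manner as the silk rules described earlier.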
This is directly useful for building a saree image retrieval system, a saree provenance classifier, or a digital textile archive where users can search visually as well as semantically.
13. One-Sentence Summary
The paper presents a silk textile image retrieval system that combines a cultural heritage knowledge graph, domain expert similarity rules, and CNN-based image descriptors to retrieve visually and semantically similar silk fabrics from distributed museum collections.