Friday, 5 June 2026

Understanding the Paper: Construction of Cultural Heritage Knowledge Graph Based on Graph Attention Neural Network

The paper “Construction of Cultural Heritage Knowledge Graph Based on Graph Attention Neural Network” proposes a method for building a cultural heritage knowledge graph using deep learning, specifically BERT and a Graph Attention Network (GAT). The study uses Tang Dynasty gold and silver artifacts as the cultural heritage domain.

The main problem addressed by the paper is that cultural heritage knowledge is vast, scattered, complex, and difficult for designers to retrieve quickly. Information about artifacts may exist in books, excavation reports, historical documents, catalogues, images, or databases. The paper argues that a knowledge graph can organize this information into a structured, searchable, and visual form.

Core Idea: The paper builds a knowledge extraction model using BERT and GAT, extracts entities and relationships from cultural heritage text, and then constructs a visual knowledge graph for Tang Dynasty gold and silver artifacts.

1. What Problem Is the Paper Solving?

Tang Dynasty gold and silver artifacts are important cultural objects. They contain information about craftsmanship, social life, ritual practices, artistic styles, foreign influence, materials, decorative motifs, and symbolic meanings. However, this knowledge is not easy to use directly because it is spread across many sources and formats.

The paper identifies several challenges in cultural heritage design knowledge:

| Challenge | Meaning | Example in Cultural Heritage |
| --- | --- | --- |
| Information diversity | Knowledge exists in structured, semi-structured, and unstructured forms. | Tables, museum records, books, excavation reports, descriptive paragraphs. |
| Information overload | Large amounts of data make it difficult to quickly find relevant knowledge. | A designer may need shape, motif, material, and cultural origin, but the information is scattered. |
| Ambiguity in meaning | Different sources may use different terms for similar concepts. | The same decorative motif or artifact type may be described differently in different books. |
| Dynamic iteration | Knowledge changes as new research, excavation, or interpretation emerges. | New artifact interpretation may update existing knowledge. |

The paper therefore focuses on creating a system that can extract cultural heritage knowledge from text and organize it into a knowledge graph. This allows designers, researchers, and the public to retrieve and visualize cultural knowledge more efficiently.

2. Main Idea of the Paper

The paper proposes a knowledge graph construction method based on a joint entity-relationship extraction model. The model combines:

| Component | Purpose |
| --- | --- |
| BERT | Encodes text and captures deep contextual semantic information. |
| Dependency parsing | Analyzes grammatical relationships between words in a sentence. |
| Graph Attention Network | Assigns different weights to different word dependencies, helping the model focus on important relationships. |
| Segmental attention fusion | Handles overlapping entities and entity relationships by dividing text into meaningful segments. |
| Knowledge graph platform | Stores, visualizes, retrieves, and compares artifact knowledge. |

The broad workflow can be represented as:

\[ Cultural\ Heritage\ Text \rightarrow BERT\ Encoding \rightarrow Dependency\ Analysis \rightarrow GAT\ Feature\ Enhancement \rightarrow Entity\ and\ Relationship\ Extraction \rightarrow Knowledge\ Graph \rightarrow Knowledge\ Retrieval \]

The paper applies this workflow to Tang Dynasty gold and silver artifacts. One important example used in the paper is the Gilded Musician Pattern Silver Cup, whose shape, decoration, technique, parts, and cultural origin are extracted and represented as connected knowledge.

3. Unified Data Modeling for Cultural Heritage Knowledge

The paper proposes a unified data modeling process for cultural heritage knowledge. This process has three broad modules:

| Module | Function |
| --- | --- |
| Knowledge data sources | Collects structured, semi-structured, and unstructured cultural heritage data. |
| Knowledge information data model construction | Performs knowledge modeling, storage, extraction, fusion, computation, and application. |
| Knowledge information data model application | Supports data query, knowledge comparison, graph viewing, knowledge management, and knowledge collection. |

The paper’s Figure 1 shows this three-level process clearly. At the bottom are knowledge sources, in the middle is knowledge graph construction, and at the top are applications such as query, comparison, visualization, and management.

This is important because cultural heritage knowledge is not only about storing facts. It also needs to support practical design work. A designer may want to search for motif inspiration, compare artifact forms, understand decorative techniques, or explore cultural symbolism.

4. Why BERT and GAT Are Used

4.1 Why BERT?

BERT is used because cultural heritage text is context-rich. A word may have different meanings depending on the sentence. For example, a word describing a shape, decoration, part, or cultural origin may not be understood correctly without surrounding context.

BERT encodes a sentence bidirectionally, meaning it looks at both the left and right context of a word. This helps the model understand subtle meanings in artifact descriptions.

A simplified BERT encoding process can be written as:

\[ U = \{u_0,u_1,u_2,\ldots,u_n\} \]

where \(u_0=[CLS]\) marks the beginning of the sentence and \(u_n=[SEP]\) marks the end or separator. After BERT encoding, the sentence becomes:

\[ R = \{r_0,r_1,r_2,\ldots,r_n\} \]

Here, each \(r_i\) is a contextual semantic feature vector for the corresponding word or character.

4.2 Why GAT?

A Graph Attention Network is used because not every dependency relation in a sentence is equally important. Some words are more important than others for identifying entities and relationships.

For example, in a sentence describing an artifact, the relationship between silver cup and octagonal body may be more important than a less informative modifier. GAT allows the model to assign higher attention weights to important dependencies.

The sentence is treated as a graph:

\[ Words = Nodes \]

\[ Dependency\ Relations = Edges \]
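As a minimal sketch of this graph view, the snippet below turns a dependency parse into an adjacency matrix. The five-word sentence, the hand-written head indices, and the choice to make edges undirected with self-loops are all illustrative assumptions; the parser and matrix conventions used in the paper are not specified in this summary.

```python
# Toy sentence and a hand-written dependency parse (both hypothetical).
tokens = ["silver", "cup", "has", "octagonal", "body"]
# heads[i] is the index of the word that token i depends on; -1 marks the root.
heads = [1, 2, -1, 4, 2]


def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix with self-loops from head indices."""
    n = len(heads)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1  # self-loop (assumption): each word is its own neighbor
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1  # edge from dependent to head
            adj[head][child] = 1  # mirrored so the graph is undirected
    return adj


adjacency = dependency_adjacency(heads)
for word, row in zip(tokens, adjacency):
    print(f"{word:>10} {row}")
```

Each row lists the neighbors of one word, which is the neighborhood a graph attention layer would attend over.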

The attention coefficient between node \(i\) and node \(j\) can be understood as:

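\[ \alpha_{ij} = softmax_{j \in N_i} \left( LeakyReLU \left( a^T[Wr_i \parallel Wr_j] \right) \right) \]

Here, \(r_i\) and \(r_j\) are word vectors, \(W\) is a trainable weight matrix, \(a\) is a learnable attention vector, \(\parallel\) denotes concatenation, and \(N_i\) is the set of dependency neighbors of word \(i\). The softmax normalizes over \(j \in N_i\), so the attention weights for each word sum to 1.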

The updated syntactic representation of a word is:

\[ s_i = \theta \left( \sum_{j \in N_i} \alpha_{ij}Wr_j \right) \]

Here, \(\theta\) is a nonlinear activation function. The representation of word \(i\) is therefore updated by collecting information from its dependency neighbors \(N_i\), with each neighbor contributing in proportion to its attention weight.
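The two formulas above can be sketched in plain Python as a single-head attention layer. This is an illustration under stated assumptions rather than the paper's implementation: the toy vectors, the weights `W` and `a`, the neighbor lists with self-loops, the LeakyReLU slope of 0.2, and the choice of tanh for \(\theta\) are all hypothetical.

```python
import math


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(R, neighbors, W, a):
    """Single-head graph attention over dependency neighbors.

    R: word vectors r_i; neighbors[i]: indices in N_i;
    W: weight matrix; a: attention vector of length 2 * output dimension.
    Returns the updated vectors s_i and the attention weights alpha_ij.
    """
    H = [matvec(W, r) for r in R]  # W r_i for every word
    S, alphas = [], []
    for i, h_i in enumerate(H):
        # Unnormalized score LeakyReLU(a^T [W r_i || W r_j]) for each neighbor j.
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, h_i + H[j])))
            for j in neighbors[i]
        ]
        # Softmax over the neighbors of word i only.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        alpha_i = [e / sum(exps) for e in exps]
        # Attention-weighted sum of neighbor features, then the activation.
        mixed = [
            sum(w * H[j][d] for w, j in zip(alpha_i, neighbors[i]))
            for d in range(len(h_i))
        ]
        S.append([math.tanh(x) for x in mixed])  # theta = tanh is an assumption
        alphas.append(alpha_i)
    return S, alphas


# Toy inputs (hypothetical): three words with 2-dimensional vectors.
R = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}  # includes self-loops
W = [[0.5, -0.2], [0.1, 0.3]]
a = [0.4, -0.1, 0.2, 0.3]

S, alphas = gat_layer(R, neighbors, W, a)
for i, row in enumerate(alphas):
    print(i, [round(w, 3) for w in row])
```

Each printed row holds the attention weights of one word over its neighbors. In a trained model, `W` and `a` would be learned so that informative dependencies receive larger weights.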

5. Entity-Relationship Joint Extraction Model

The model has three major layers:

| Layer | Role |
| --- | --- |
| Embedding layer | Converts text into semantic and syntactic feature vectors using BERT and GAT. |
| Entity recognition layer | Identifies entities such as artifact names, shapes, patterns, techniques, and cultural origins. |
| Relationship classification layer | Identifies relationships between entities, such as shape, decoration, part, technique, and cultural origin. |

The paper’s Figure 2 shows this joint extraction model. The input text first goes through word embedding, dependency matrix construction, and graph attention. Then the resulting feature vectors are used by both the entity recognition layer and the relationship extraction layer.

This joint extraction design is important because the traditional pipeline method first extracts entities and then extracts relationships. If entity recognition makes an error, the error propagates into relationship extraction. A joint model reduces this problem by learning both tasks together.

6. Embedding Layer

The embedding layer combines semantic features from BERT and syntactic features from GAT.

After BERT encoding, each word has a semantic feature vector:

\[ R = \{r_1,r_2,\ldots,r_n\} \]

After GAT processing over the dependency graph, each word has a syntactic feature vector:

\[ S = \{s_1,s_2,\ldots,s_n\} \]

The final word representation is obtained by concatenating the semantic and syntactic features:

\[ e_i = [r_i;s_i] \]

The full text sequence representation is:

\[ E = \{e_1,e_2,\ldots,e_n\} \]

In simple terms, each word is represented using both meaning and grammar. This helps the model better understand artifact descriptions where shape, decoration, technique, and cultural origin may appear in complex sentence structures.
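A quick dimension check may help here. Assuming each semantic vector has \(d_r\) components and each syntactic vector has \(d_s\) components (both symbols are introduced here for illustration and do not come from the paper), concatenation simply stacks the two:

\[ r_i \in \mathbb{R}^{d_r}, \quad s_i \in \mathbb{R}^{d_s} \quad \Rightarrow \quad e_i = [r_i;s_i] \in \mathbb{R}^{d_r+d_s}, \quad E \in \mathbb{R}^{n \times (d_r+d_s)} \]

For instance, with the 768-dimensional output of a BERT-base encoder and a hypothetical 128-dimensional GAT output, each \(e_i\) would have 896 components.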

7. Entity Recognition Layer

The entity recognition layer divides text into segments of different lengths and decides whether each segment belongs to an entity category or is not an entity.

For example, in the phrase:

The gilded musician pattern silver cup has an octagonal body.

The model may extract:

| Segment | Entity Type |
| --- | --- |
| Gilded musician pattern silver cup | Artifact |
| Octagonal body | Shape / Structure |

The segment classifier uses three types of features:

| Feature Type | Meaning |
| --- | --- |
| Segment semantic feature | Obtained by average pooling over word vectors in the segment. |
| Segment length feature | Represents the length of the text segment. |
| Segment global feature | Obtained through a pooling attention mechanism that captures broader sentence context. |

The segment semantic feature is calculated as:

\[ span = AvgPooling(e_i,e_{i+1},\ldots,e_{i+k}) \]

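The three features are concatenated, where \(w_{k+1}\) is the length feature for a segment of \(k+1\) words and \(c_p\) is the global feature: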

\[ x_p = [span;w_{k+1};c_p] \]

Then a softmax classifier predicts the entity category:

\[ \hat{y}_p = softmax(W_px_p + b_p) \]

This segment-based approach helps handle nested entities and overlapping entity structures, which are common in cultural heritage descriptions.
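The segment-based procedure can be sketched as follows. This is a simplified illustration, not the paper's code: the maximum segment length, the toy word vectors, the one-hot length feature, the sentence mean used in place of the pooling-attention global feature, and the untrained classifier weights are all assumptions.

```python
import math

tokens = "The gilded musician pattern silver cup has an octagonal body".split()
MAX_LEN = 5  # longest segment considered (hypothetical cap)


def enumerate_spans(n_tokens, max_len):
    """Return every (start, end) span, end exclusive, up to max_len tokens."""
    return [
        (start, start + length)
        for start in range(n_tokens)
        for length in range(1, max_len + 1)
        if start + length <= n_tokens
    ]


def avg_pool(vectors):
    """Average a list of equal-length vectors component by component."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(z):
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    return [e / sum(exps) for e in exps]


def span_features(E, span, max_len):
    """Concatenate [span; length feature; global feature] for one segment."""
    start, end = span
    span_vec = avg_pool(E[start:end])
    # One-hot length feature (assumption; a learned length embedding is typical).
    length_vec = [1.0 if k == end - start else 0.0 for k in range(1, max_len + 1)]
    # Sentence mean as a stand-in for the pooling-attention global feature.
    global_vec = avg_pool(E)
    return span_vec + length_vec + global_vec


# Toy 2-dimensional word representations e_i (hypothetical values).
E = [[float(i), 1.0] for i in range(len(tokens))]
spans = enumerate_spans(len(tokens), MAX_LEN)

x_p = span_features(E, (8, 10), MAX_LEN)  # the segment "octagonal body"

# Toy classifier over three labels: Artifact, Shape / Structure, not an entity.
W_p = [[0.1] * len(x_p), [0.2] * len(x_p), [0.0] * len(x_p)]
b_p = [0.0, 0.0, 0.0]
logits = [sum(w * x for w, x in zip(row, x_p)) + b for row, b in zip(W_p, b_p)]
y_hat = softmax(logits)

print(len(spans), " ".join(tokens[8:10]), [round(p, 3) for p in y_hat])
```

Because every segment is classified independently, a short segment nested inside a longer one can receive its own label, which is how this style of model accommodates nested entities. The weights here are untrained, so the printed probabilities only demonstrate the shape of the computation.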

8. Relationship Classification Layer

The relationship classification layer identifies the relationship between two extracted entities. For example:

\[ Gilded\ Musician\ Pattern\ Silver\ Cup \rightarrow Shape \rightarrow Octagonal \]

\[ Gilded\ Musician\ Pattern\ Silver\ Cup \rightarrow Decoration \rightarrow Musicians \]

\[ Circular\ Handle \rightarrow Part \rightarrow Finger\ Rest \]

The paper uses a segmental attention fusion mechanism. The sentence is divided into five parts:

| Text Segment | Meaning |
| --- | --- |
| Left context | Text before the first entity. |
| Entity 1 | The first entity. |
| Middle context | Text between the two entities. |
| Entity 2 | The second entity. |
| Right context | Text after the second entity. |

The feature vectors for the five parts are pooled and concatenated, with \(t_{g1}\) and \(t_{g2}\) denoting the pooled features of Entity 1 and Entity 2:

\[ T = [t_{left};t_{g1};t_{middle};t_{g2};t_{right}] \]

The paper then uses Bi-LSTM and self-attention to learn relationship features. The self-attention mechanism can be written as:

\[ Q = TW^Q \]

\[ K = TW^K \]

\[ V = TW^V \]

The fused relationship feature vector is computed with scaled dot-product attention, where \(d\) is the dimension of the key vectors:

\[ h_r = softmax \left( \frac{QK^T}{\sqrt{d}} \right) V \]

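Finally, the relationship prediction applies a sigmoid function, written \(Sig\), so that each relationship type receives its own independent probability: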

\[ \hat{y}_r = Sig(W_rh_r + b_r) \]

This allows the model to predict whether a relationship exists between two entities and what type of relationship it is.
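A rough sketch of the five-part split and the attention step is shown below. It simplifies the described model in several ways that should be read as assumptions: mean pooling is used for each part, the Bi-LSTM is omitted, the projections \(W^Q\), \(W^K\), and \(W^V\) are set to the identity, and the word vectors and classifier weights are toy values.

```python
import math

tokens = "The gilded musician pattern silver cup has an octagonal body".split()
entity1, entity2 = (1, 6), (8, 10)  # (start, end) spans, end exclusive


def split_five(items, e1, e2):
    """Split a sequence into left, entity 1, middle, entity 2, right."""
    (s1, t1), (s2, t2) = e1, e2
    return [items[:s1], items[s1:t1], items[t1:s2], items[s2:t2], items[t2:]]


def mean_pool(vectors, dim):
    """Mean-pool vectors; an empty segment becomes a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(z):
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    return [e / sum(exps) for e in exps]


def self_attention(T):
    """Scaled dot-product attention with Q = K = V = T (identity projections)."""
    d = len(T[0])
    weights = [
        softmax([sum(q * k for q, k in zip(row_q, row_k)) / math.sqrt(d) for row_k in T])
        for row_q in T
    ]
    output = [
        [sum(w * row_v[c] for w, row_v in zip(w_row, T)) for c in range(d)]
        for w_row in weights
    ]
    return output, weights


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Toy 2-dimensional word representations (hypothetical values).
E = [[0.1 * i, 1.0] for i in range(len(tokens))]
segments = split_five(tokens, entity1, entity2)
T = [mean_pool(seg, 2) for seg in split_five(E, entity1, entity2)]

attended, weights = self_attention(T)
h_r = [x for row in attended for x in row]  # flatten the five attended vectors

# One sigmoid output per relation type, using toy untrained weights.
relations = ["Shape", "Decoration", "Part"]
W_r = [[0.05 * (k + 1)] * len(h_r) for k in range(len(relations))]
y_hat = [sigmoid(sum(w * x for w, x in zip(row, h_r))) for row in W_r]

print(segments)
print({r: round(p, 3) for r, p in zip(relations, y_hat)})
```

Note that the right context is empty in this sentence, so it is pooled to a zero vector. Using one sigmoid per relation type, rather than a single softmax, is what lets one entity pair carry several relations at once.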

9. Training Objective and Loss Functions

The model jointly optimizes entity recognition and relationship extraction.

9.1 Entity Recognition Loss

Entity recognition is treated as a multiclass classification problem. The paper uses cross-entropy loss:

\[ L_{NER} = -\sum_i y_i \log(\hat{y}_i) \]

Here, \(y_i\) is the true entity label and \(\hat{y}_i\) is the predicted probability.

9.2 Relationship Extraction Loss

Relationship extraction is treated as a multilabel classification problem because one entity pair may have multiple possible relationships. The loss is:

\[ L_{RE} = -\sum_k \left[ y_k\log(\hat{y}_k) + (1-y_k)\log(1-\hat{y}_k) \right] \]

Here, \(y_k=1\) means the relationship of type \(k\) exists, while \(y_k=0\) means it does not exist.

9.3 Total Loss

The total loss is the sum of the two losses:

\[ L = L_{NER} + L_{RE} \]

By minimizing this total loss, the model improves both entity recognition and relationship extraction at the same time.
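A small worked example may make the three formulas concrete. The probabilities and labels below are made-up numbers for a single segment and a single entity pair, not values from the paper.

```python
import math


def cross_entropy(true_index, probs):
    """Multiclass cross-entropy for one segment with a one-hot true label."""
    return -math.log(probs[true_index])


def binary_cross_entropy(y_true, probs):
    """Multilabel loss summed over relation types for one entity pair."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, probs)
    )


# Toy predictions (hypothetical numbers, not from the paper).
ner_probs = [0.7, 0.2, 0.1]   # softmax output over three entity labels
ner_true = 0                  # index of the correct label
re_probs = [0.9, 0.2, 0.6]    # sigmoid output per relation type
re_true = [1, 0, 1]           # relation types 0 and 2 hold for this pair

loss_ner = cross_entropy(ner_true, ner_probs)
loss_re = binary_cross_entropy(re_true, re_probs)
total_loss = loss_ner + loss_re

print(round(loss_ner, 4), round(loss_re, 4), round(total_loss, 4))
```

With a one-hot label, the cross-entropy sum reduces to the negative log of the probability assigned to the correct class, which is why `cross_entropy` needs only one term.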

10. Dataset and Experimental Setup

The paper evaluates the model on two datasets:

| Dataset | Purpose |
| --- | --- |
| NYT Dataset | English benchmark dataset for joint entity-relation extraction. |
| Tang Dynasty Gold and Silver Artifacts Dataset | Chinese cultural heritage dataset created from authoritative historical and artifact-related texts. |

The Tang Dynasty dataset is built from literature related to gold and silver artifacts. The text is annotated using structured triples that identify entities and relationships. The dataset includes knowledge about shape, decoration, decorative techniques, artifact parts, function, and cultural origin.

The paper reports the following dataset statistics:

| Split / Sentence Type | NYT | Tang Dynasty Gold and Silver Artifacts |
| --- | --- | --- |
| Training Set | 56,195 | 122,442 |
| Validation Set | 4,999 | 15,922 |
| Test Set | 5,000 | 25,109 |
| Normal Sentences | 3,266 | 14,220 |
| Single Entity Overlap | 1,297 | 6,440 |
| Entity Pair Overlap | 978 | 10 |

The training hyperparameters are:

| Parameter | Value |
| --- | --- |
| Learning rate | \(1 \times 10^{-5}\) |
| Maximum epochs | 30 |
| Batch size | 8 |
| Seed | 42 |

11. Experimental Results

11.1 Performance on Complex Sentences

The paper compares the proposed model with baseline models such as CopyRE, GraphRel, CopyMTL, and RSAN. One important evaluation checks how well each model handles sentences with different numbers of triplets.

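| Model | \(N=1\) | \(N=2\) | \(N=3\) | \(N=4\) | \(N \geq 5\) |
| --- | --- | --- | --- | --- | --- |
| CopyRE | 67.0 | 56.2 | 51.2 | 47.2 | 25.8 |
| GraphRel | 63.7 | 64.6 | 58.9 | 55.2 | 47.1 |
| CopyMTL | 71.2 | 71.3 | 70.3 | 73.1 | 48.9 |
| RSAN | 73.3 | 82.1 | 82.7 | 84.5 | 76.4 |
| Proposed Model | 84.1 | 85.4 | 86.1 | 85.5 | 85.3 |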

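The proposed model scores highest in every column and stays within a narrow range of 84.1 to 86.1 as the number of triplets grows, while every baseline drops at \(N \geq 5\) (RSAN, the strongest baseline, falls from 84.5 to 76.4). This is important because cultural heritage descriptions often contain multiple entities and relationships in one sentence.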

11.2 Results on Tang Dynasty Gold and Silver Artifacts Dataset

The model is tested on different types of overlapping triplets:

| Overlap Type | Meaning |
| --- | --- |
| SEO | Single Entity Overlap: one entity participates in multiple triplets. |
| EPO | Entity Pair Overlap: the same pair of entities may have multiple relationships. |
| All | Overall performance across the dataset. |

| Type | Precision | Recall | F1 |
| --- | --- | --- | --- |
| SEO | 84.05 | 84.03 | 84.9 |
| EPO | 85.6 | 86.1 | 85.6 |
| All | 84 | 84 | 84.9 |

These results show that the model handles overlapping relationships effectively. This matters because artifact descriptions often have one artifact connected to many shape, decoration, part, technique, and cultural-origin entities.

11.3 Training Summary

| Metric | Value |
| --- | --- |
| Training accuracy | 0.8506 |
| Validation accuracy | 0.8050 |
| Epochs | 19 |
| Time per epoch | 996 seconds |

The paper reports that early stopping ended training after 19 of the maximum 30 epochs, once the model stopped improving, with training accuracy at 0.8506 and validation accuracy at 0.8050.

12. Knowledge Retrieval Platform

After constructing the knowledge graph, the paper develops a knowledge retrieval platform for Tang Dynasty gold and silver artifacts. The system has three main layers:

| Layer | Function |
| --- | --- |
| Schema layer | Defines entities, attributes, relationships, classifications, hierarchy, and ontology structure. |
| Data layer | Stores structured and unstructured knowledge extracted from artifact literature and annotated datasets. |
| Knowledge management layer | Supports graph storage, retrieval, visualization, management, and updating. |

The system includes four key modules:

| System Module | Purpose |
| --- | --- |
| Data query | Search and retrieve artifact knowledge. |
| Knowledge extraction | Extract entities and relationships from uploaded knowledge data. |
| Knowledge graph visualization | Display entities and relationships as an interactive graph. |
| Knowledge base management | Manage, update, and maintain the knowledge graph. |

The paper demonstrates the platform using the Gilded Musician Pattern Silver Cup. When this artifact is searched, the system displays related entities and relationships. The paper reports that the system shows 26 related entities and 7 relationship types for this example.

The extracted knowledge includes relationships such as:

\[ Gilded\ Musician\ Pattern\ Silver\ Cup \rightarrow Shape \rightarrow Octagonal \]

\[ Gilded\ Musician\ Pattern\ Silver\ Cup \rightarrow Decoration \rightarrow Musicians \]

\[ Gilded\ Musician\ Pattern\ Silver\ Cup \rightarrow Decorative\ Technique \rightarrow Flat\text{-}chiseling \]

\[ Gilded\ Musician\ Pattern\ Silver\ Cup \rightarrow Cultural\ Origin \rightarrow Sogdian\ Silverware \]

This makes the artifact easier to study, compare, and use as inspiration for cultural and creative design.
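To illustrate how such triples support retrieval, here is a minimal in-memory sketch. The storage backend of the actual platform is not described in this summary, so the dictionary index and the `build_index` and `query` helpers below are hypothetical stand-ins rather than the system's API.

```python
from collections import defaultdict

# Triples taken from the examples in this post.
CUP = "Gilded Musician Pattern Silver Cup"
triples = [
    (CUP, "Shape", "Octagonal"),
    (CUP, "Decoration", "Musicians"),
    (CUP, "Decorative Technique", "Flat-chiseling"),
    (CUP, "Cultural Origin", "Sogdian Silverware"),
    ("Circular Handle", "Part", "Finger Rest"),
]


def build_index(triples):
    """Index triples as head -> relation -> list of tails."""
    index = defaultdict(lambda: defaultdict(list))
    for head, relation, tail in triples:
        index[head][relation].append(tail)
    return index


def query(index, head, relation=None):
    """Return tails for one relation, or all (relation, tail) pairs for a head."""
    if head not in index:
        return []
    if relation is not None:
        return list(index[head].get(relation, []))
    return [(rel, tail) for rel, tails in index[head].items() for tail in tails]


index = build_index(triples)
print(query(index, CUP, "Shape"))
for relation, tail in query(index, CUP):
    print(f"{CUP} -> {relation} -> {tail}")
```

A production system would more likely use a graph database, but the lookup pattern is the same: start from an artifact node and follow typed edges to related entities.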

13. Strengths of the Paper

The first strength of the paper is that it connects knowledge graph construction with a practical design problem. The goal is not merely academic extraction; the system is meant to help designers collect and use cultural heritage knowledge.

The second strength is the use of a joint entity-relationship extraction model. This reduces the error propagation problem found in traditional pipeline extraction methods.

The third strength is the integration of BERT, dependency parsing, and GAT. BERT captures contextual meaning, dependency parsing captures grammatical structure, and GAT learns which dependencies matter more.

The fourth strength is that the work goes beyond model testing and builds a retrieval platform with upload, search, visualization, and comparison functions.

14. Limitations of the Paper

One limitation is that the model relies heavily on textual data. Cultural heritage knowledge is also present in images, physical objects, drawings, museum labels, craft videos, and expert oral knowledge. These sources are not deeply integrated in the current model.

Another limitation is the need for manual annotation in the early stages. Although the model automates extraction, high-quality training data still requires careful expert annotation.

A third limitation is that the visualization interface still needs improvement in interactivity and user experience. Different users, such as designers, scholars, students, and the general public, may need different forms of knowledge presentation.

A fourth limitation is domain transfer. The model is applied to Tang Dynasty gold and silver artifacts. Applying it to other cultural heritage domains would require new entity categories, relationship categories, datasets, and domain rules.

15. Connection with Saree and Textile Research

This paper is highly relevant for saree provenance classification and textile knowledge graph construction. Saree knowledge is also complex, scattered, and relational. A saree is not only an image; it is connected to craft cluster, region, material, technique, motif, border, pallu, zari, loom type, and cultural use.

A similar knowledge graph for sarees could include entities such as:

| Entity Type | Saree Example |
| --- | --- |
| Craft cluster | Kanchipuram, Banaras, Paithani, Gadwal, Ilkal. |
| Motif | Peacock, mango, floral buta, temple, parrot. |
| Technique | Kadwa, Korvai, Jamdani, Ikat, brocade. |
| Material | Mulberry silk, tussar, cotton, zari. |
| Part | Border, pallu, body, selvedge. |

The relationship triples may look like:

\[ Kanjivaram\ Saree \rightarrow has\_border \rightarrow Temple\ Border \]

\[ Banarasi\ Saree \rightarrow uses\_technique \rightarrow Kadwa \]

\[ Paithani\ Saree \rightarrow has\_motif \rightarrow Peacock \]

\[ Gadwal\ Saree \rightarrow has\_structure \rightarrow Silk\ Border\ with\ Cotton\ Body \]

The BERT-GAT approach can help extract such relationships from textile books, product descriptions, museum records, GI documentation, craft notes, and expert-written articles. This would be especially useful for saree provenance research because it could convert unstructured textile knowledge into a structured graph that supports classification, retrieval, and explanation.

16. One-Sentence Summary

The paper proposes a BERT-GAT-based joint entity-relationship extraction model to construct a cultural heritage knowledge graph for Tang Dynasty gold and silver artifacts, enabling structured knowledge retrieval, visualization, comparison, and design-oriented cultural heritage application.

General Disclaimer: This explanation is intended for educational and conceptual understanding. It simplifies some technical details of the original research paper while preserving the main ideas, equations, architecture, experimental results, and practical implications.