My Research Notes: Understanding the GraphSAGE Paper

Understanding the GraphSAGE Paper

The paper “Inductive Representation Learning on Large Graphs” was written by William L. Hamilton, Rex Ying, and Jure Leskovec. It introduces GraphSAGE, which stands for Graph SAmple and aggreGatE.

The main idea of the paper is to learn a function that can generate embeddings for new, unseen nodes, instead of learning a separate embedding for every node in a fixed graph.

1. The Main Problem

Before GraphSAGE, many graph embedding methods such as DeepWalk, node2vec, LINE, and matrix-factorization methods were mostly transductive. This means they learned embeddings for nodes that were already present during training.

For example, if a graph contains nodes:

\[ v_1, v_2, v_3, \ldots, v_n \]

a transductive method learns a separate embedding for each node:

\[ z_{v_1}, z_{v_2}, z_{v_3}, \ldots, z_{v_n} \]

This works when the graph is fixed. However, in real life, graphs constantly change.

Real-World Graph	New Nodes May Be
Reddit graph	New posts
YouTube graph	New users or videos
Citation graph	New research papers
Protein interaction graph	New proteins or new organisms
Product graph	New products

The problem is simple:

What happens when a new node appears after training?

Traditional transductive methods cannot immediately generate a good embedding for that new node. They often need retraining or extra optimization. GraphSAGE solves this problem.

2. What GraphSAGE Does Differently

GraphSAGE does not learn one embedding per node. Instead, it learns an embedding-generating function.

In simple words, GraphSAGE learns how to create an embedding for a node by looking at that node’s features and the features of its neighbours.

So instead of storing:

\[ \text{node} \rightarrow \text{fixed embedding} \]

GraphSAGE learns:

\[ \text{node features + neighbourhood features} \rightarrow \text{embedding} \]

This is why GraphSAGE is called an inductive method. It can generalize to:

\[ \text{unseen nodes} \]

and even:

\[ \text{unseen graphs} \]

provided that similar kinds of node features are available.

3. The Meaning of Inductive Learning

The word inductive is very important in this paper.

A transductive method answers:

“Given this fixed graph, what are the embeddings of these known nodes?”

An inductive method answers:

“Can I learn a rule for generating embeddings, so that I can apply it to new nodes later?”

Method Type	What It Learns	Can Handle Unseen Nodes Easily?
Transductive embedding	Embedding of each existing node	No
Inductive embedding	Function to generate embeddings	Yes

GraphSAGE belongs to the second category.

4. The Core Idea: Sample and Aggregate

The name GraphSAGE comes from:

\[ \text{SAmple and aggreGatE} \]

Figure 1 of the paper gives the visual summary of the method. It shows three steps: first, GraphSAGE samples a node’s local neighbourhood; second, it aggregates feature information from neighbours; third, it uses the aggregated information to predict graph context or labels.

The process is:

Step	What Happens
1	Pick a target node
2	Sample some neighbours
3	Collect their feature vectors
4	Aggregate neighbour information
5	Combine it with the target node’s own feature
6	Produce an embedding

A node’s identity is partly determined by its own features and partly by the company it keeps.

5. The GraphSAGE Embedding Generation Algorithm

Let the initial feature vector of node \(v\) be:

\[ h_v^{0} = x_v \]

Symbol	Meaning
\(v\)	A node
\(x_v\)	Original feature vector of node \(v\)
\(h_v^k\)	Representation of node \(v\) at layer or depth \(k\)
\(N(v)\)	Neighbourhood of node \(v\)
\(K\)	Number of aggregation layers or depths

At each layer \(k\), GraphSAGE first aggregates information from neighbours:

\[ h_{N(v)}^{k} = \text{AGGREGATE}_{k} \left( \{h_u^{k-1}, \forall u \in N(v)\} \right) \]

Then it combines the node’s previous representation with the aggregated neighbour representation:

\[ h_v^{k} = \sigma \left( W^k \cdot \text{CONCAT} \left( h_v^{k-1}, h_{N(v)}^{k} \right) \right) \]

Then it normalizes the representation:

\[ h_v^{k} = \frac{h_v^{k}}{\|h_v^{k}\|_2} \]

After \(K\) layers, the final embedding is:

\[ z_v = h_v^{K} \]

This means the final embedding of a node contains information from its multi-hop neighbourhood.

6. One-Hop and Two-Hop Understanding

If:

\[ K = 1 \]

then the node receives information from its immediate neighbours.

If:

\[ K = 2 \]

then the node receives information from neighbours and neighbours-of-neighbours.

So GraphSAGE gradually expands the receptive field of a node:

\[ \text{node} \rightarrow \text{neighbours} \rightarrow \text{neighbours of neighbours} \]

The paper reports that \(K = 2\) works well in practice. Increasing \(K\) beyond 2 gives only small gains but greatly increases runtime.

7. Why Sampling is Needed

In large graphs, a node may have thousands or millions of neighbours. Aggregating all neighbours would be expensive. So GraphSAGE samples a fixed number of neighbours.

For example, instead of using all neighbours, it may sample:

\[ S_1 = 25 \]

neighbours at the first depth and:

\[ S_2 = 10 \]

neighbours at the second depth.

This keeps computation manageable. Without sampling, the neighbourhood can explode. With sampling, the computation remains controlled.

GraphSAGE does not try to listen to everyone. It listens to a fixed sample of neighbours.

8. Aggregator Functions

The aggregator is the heart of GraphSAGE. Since a node’s neighbours are an unordered set, the aggregator should ideally work regardless of the order in which neighbours are listed.

The paper studies three main aggregator types.

Aggregator	Main Idea
Mean aggregator	Take the average of neighbour vectors
LSTM aggregator	Process neighbour vectors through an LSTM
Pooling aggregator	Transform neighbour vectors and apply max-pooling

9. Mean Aggregator

The mean aggregator simply averages the neighbour representations:

\[ h_{N(v)}^{k} = \text{MEAN} \left( \{h_u^{k-1}, \forall u \in N(v)\} \right) \]

This is simple and efficient. The paper notes that this is close to the idea of Graph Convolutional Networks, but GraphSAGE makes it inductive and uses sampled neighbourhoods.

10. LSTM Aggregator

The LSTM aggregator is more expressive because an LSTM can learn a more complex function over inputs.

However, there is one issue. LSTMs are sequence models, and node neighbours do not have a natural order. So the paper applies the LSTM to a random permutation of neighbours.

This makes it more powerful than mean aggregation, but also less naturally suited to unordered graph neighbourhoods.

11. Pooling Aggregator

The pooling aggregator first applies a neural network to each neighbour vector:

\[ \sigma(W_{\text{pool}}h_{u}^{k} + b) \]

Then it applies element-wise max pooling:

\[ \text{AGGREGATE}_{k}^{\text{pool}} = \max \left( \{\sigma(W_{\text{pool}}h_{u}^{k} + b), \forall u_i \in N(v)\} \right) \]

This lets the model detect important features from the neighbourhood.

The pooling aggregator asks: “What is the strongest signal present among this node’s neighbours?”

The paper finds that the pooling and LSTM aggregators often perform better than the simple mean aggregator.

12. Supervised and Unsupervised Learning

GraphSAGE can be trained in two ways. First, it can be trained in a supervised way using labels, such as node categories:

\[ \text{node embedding} \rightarrow \text{class label} \]

Second, it can be trained in an unsupervised way using graph context. Nearby nodes should have similar embeddings, while unrelated nodes should have different embeddings.

The paper uses an unsupervised loss:

\[ J_G(z_u) = -\log(\sigma(z_u^T z_v)) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log(\sigma(-z_u^T z_{v_n})) \]

Symbol	Meaning
\(z_u\)	Embedding of node \(u\)
\(z_v\)	Embedding of a nearby or context node
\(z_{v_n}\)	Embedding of a negative sample
\(Q\)	Number of negative samples
\(\sigma\)	Sigmoid function

The intuition is:

\[ \text{nearby nodes} \Rightarrow \text{similar embeddings} \]

\[ \text{negative sampled nodes} \Rightarrow \text{different embeddings} \]

13. How GraphSAGE Differs from GCN

The previous paper, Kipf and Welling’s GCN, is mainly transductive. It assumes the graph is available during training. GraphSAGE extends this idea into an inductive setting.

GCN	GraphSAGE
Usually transductive	Inductive
Uses full adjacency matrix	Samples neighbourhoods
Learns using fixed graph structure	Learns aggregation functions
Harder to apply directly to unseen nodes	Designed for unseen nodes
Uses normalized graph convolution	Uses sample-and-aggregate framework

GCN learns on a graph. GraphSAGE learns how to generate embeddings from neighbourhoods.

14. Experiments

The paper tests GraphSAGE on three inductive node classification tasks.

Dataset	Task
Citation graph	Predict paper subject categories
Reddit graph	Predict subreddit or community of posts
Protein-protein interaction graphs	Predict protein functions

The key point is that the test nodes are unseen during training. In the protein-protein interaction experiment, the model is tested on entirely unseen graphs. This is a strong test of inductive learning.

15. Results

The paper reports that GraphSAGE performs better than strong baselines such as raw features, DeepWalk, and DeepWalk plus features.

Table 1 reports micro-F1 scores across Citation, Reddit, and PPI datasets. The supervised GraphSAGE variants perform strongly. Examples include:

GraphSAGE Variant	Dataset	Reported Supervised F1
GraphSAGE-pool	Citation	\(0.839\)
GraphSAGE-LSTM	Reddit	\(0.954\)
GraphSAGE-LSTM	PPI	\(0.612\)

The table also shows that GraphSAGE improves over raw features by:

\[ 39\% \text{ to } 63\% \]

depending on dataset and training mode.

16. Runtime Advantage

DeepWalk becomes slow for unseen nodes because it must generate new random walks and perform optimization again. GraphSAGE is much faster at inference because it simply applies the learned aggregator functions.

The paper reports that DeepWalk can be:

\[ 100\times \text{ to } 500\times \]

slower at test time for unseen nodes.

This is a major practical advantage of GraphSAGE.

17. Why GraphSAGE is Useful

GraphSAGE is useful because it solves a real deployment problem. Many graph systems are dynamic, and new nodes appear all the time.

Domain	New Node
Social media	New user or post
E-commerce	New product or customer
Citation database	New research paper
Biology	New protein or organism
Textile knowledge graph	New saree image or motif

GraphSAGE can produce embeddings for these new nodes without retraining the entire graph embedding model.

18. Relation to Weisfeiler-Lehman

The paper connects GraphSAGE to the Weisfeiler-Lehman graph isomorphism test. The Weisfeiler-Lehman method repeatedly updates a node’s representation by aggregating information from neighbouring nodes.

GraphSAGE does something similar, but with trainable neural aggregators. So we can think of GraphSAGE as:

A neural, trainable version of neighbourhood aggregation.

This is similar to the appendix idea in the GCN paper, but GraphSAGE emphasizes inductive embedding generation.

19. Relevance to Saree Provenance Classification

This paper is very relevant for saree provenance classification if one builds a graph-based saree knowledge system.

Imagine a graph with nodes such as:

Node Type	Example
Saree image	Image of a Kanjivaram saree
Motif	Peacock, mango, temple border
Region	Tamil Nadu, Varanasi, Telangana
Weave technique	Ikat, brocade, jamdani
Material	Silk, cotton, zari
Craft cluster	Kanjivaram, Banarasi, Gadwal

Edges could represent:

Edge	Meaning
saree-has-motif	A saree contains a motif
saree-from-region	A saree belongs to a region
craft-uses-technique	A craft uses a technique
motif-associated-with-region	A motif is linked to a region

Now suppose a new saree image enters the system. A transductive graph method may struggle because the new node was not present during training. GraphSAGE is useful because it can generate an embedding for this new saree node using:

\[ \text{its visual features} \]

and:

\[ \text{its graph neighbourhood} \]

For example:

\[ z_{\text{new saree}} = \text{GraphSAGE} ( \text{image features}, \text{motif neighbours}, \text{region neighbours}, \text{weave neighbours} ) \]

This makes GraphSAGE attractive for a live saree classification or recommendation system where new products keep coming.

20. Difference from the Previous GCN Paper

The GCN paper gave the foundation:

How can a node learn from its neighbours?

The GraphSAGE paper asks the next practical question:

How can we do this for new nodes and large evolving graphs?

Question	Paper
How to perform graph convolution for semi-supervised node classification?	GCN
How to generate embeddings for unseen nodes in large graphs?	GraphSAGE

21. Main Limitation

GraphSAGE depends heavily on the quality of node features. If a new node has poor or missing features, the generated embedding may also be weak.

Neighbour sampling also introduces approximation. It improves scalability but may miss important neighbours. Another limitation is that the paper mainly studies fixed-size uniform neighbour sampling. The authors themselves mention that learning better non-uniform sampling strategies could be a future direction.

22. One-Sentence Summary

GraphSAGE is an inductive graph representation learning method that learns how to generate node embeddings by sampling and aggregating feature information from a node’s local neighbourhood.

In very simple words, GraphSAGE teaches the model how to create an embedding for a new node by looking at the node itself and the neighbours around it.

```

My Research Notes

Thursday, 4 June 2026

Understanding the GraphSAGE Paper

Understanding the GraphSAGE Paper

1. The Main Problem

2. What GraphSAGE Does Differently

3. The Meaning of Inductive Learning

4. The Core Idea: Sample and Aggregate

5. The GraphSAGE Embedding Generation Algorithm

6. One-Hop and Two-Hop Understanding

7. Why Sampling is Needed

8. Aggregator Functions

9. Mean Aggregator

10. LSTM Aggregator

11. Pooling Aggregator

12. Supervised and Unsupervised Learning

13. How GraphSAGE Differs from GCN

14. Experiments

15. Results

16. Runtime Advantage

17. Why GraphSAGE is Useful

18. Relation to Weisfeiler-Lehman

19. Relevance to Saree Provenance Classification

20. Difference from the Previous GCN Paper

21. Main Limitation

22. One-Sentence Summary

No comments:

Post a Comment

Understading the Paper: Fine Grained Image Analysis with Deep Learning