🔍 Objective
To overcome the limitations of clothing recognition caused by fragmented, small, or weakly labeled datasets by introducing DeepFashion, a large-scale, richly annotated dataset, together with a novel model called FashionNet.
🗂️ DeepFashion Dataset
- Scale: over 800,000 images
- Annotations:
  - 50 fine-grained clothing categories
  - 1,000 clothing attributes (texture, fabric, shape, part, style)
  - 4–8 clothing landmarks per image (e.g., collar, sleeve ends, hems)
  - 300,000 cross-pose/cross-domain image pairs (e.g., shop vs. street)
- Sources: online shops (Forever21, Mogujie) and Google Images
- Benchmarks supported:
  - Attribute Prediction
  - In-shop Clothes Retrieval
  - Consumer-to-Shop Clothes Retrieval
🧠 FashionNet Architecture
- Based on VGG-16, with three branches:
  - Global appearance branch
  - Local landmark-guided branch
  - Pose estimation branch (predicts landmark locations and visibility)
- Landmark pooling layer: pools and gates local features at the predicted landmarks, improving robustness to deformation and occlusion.
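The landmark pooling idea can be illustrated with a minimal NumPy sketch: a small window of the feature map is pooled around each predicted landmark, and the result is gated by that landmark's visibility score so occluded landmarks contribute nothing. Function names, shapes, and the window size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def landmark_pool(feature_map, landmarks, visibility, window=3):
    """Max-pool a small window around each predicted landmark and
    gate the result by the landmark's visibility (0 = occluded).
    feature_map: (C, H, W); landmarks: list of (x, y); visibility: list of floats."""
    C, H, W = feature_map.shape
    half = window // 2
    pooled = []
    for (x, y), vis in zip(landmarks, visibility):
        x0, x1 = max(0, x - half), min(W, x + half + 1)
        y0, y1 = max(0, y - half), min(H, y + half + 1)
        patch = feature_map[:, y0:y1, x0:x1]           # C x h x w local window
        pooled.append(vis * patch.max(axis=(1, 2)))    # pool, then gate by visibility
    return np.concatenate(pooled)                      # one local-feature vector

# Toy example: 8-channel feature map, two landmarks, the second occluded
fmap = np.random.rand(8, 16, 16)
local = landmark_pool(fmap, landmarks=[(4, 5), (12, 10)], visibility=[1.0, 0.0])
print(local.shape)  # (16,) -- the occluded landmark contributes zeros
```

In FashionNet these pooled local features are concatenated with the global appearance features before the classification and retrieval heads.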
🔁 Training Approach
- Multi-task loss optimization:
  - Softmax loss for category classification and landmark visibility
  - Cross-entropy loss for attribute prediction
  - Regression loss for landmark localization
  - Triplet loss for retrieval learning
- Iterative training: first optimizes landmark prediction, then attribute and category learning using the pooled features.
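The four loss terms above can be sketched in plain NumPy for a single example. The formulations are standard textbook versions and the equal weighting is an illustrative assumption; the paper tunes these terms separately across its iterative stages.

```python
import numpy as np

def softmax_ce(logits, label):
    """Softmax cross-entropy, used for category and visibility prediction."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def attribute_bce(scores, targets):
    """Per-attribute binary cross-entropy over sigmoid outputs."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p)).mean()

def landmark_l2(pred, gt, visibility):
    """Squared-error regression loss, counting only visible landmarks."""
    return (visibility[:, None] * (pred - gt) ** 2).sum() / max(visibility.sum(), 1)

def triplet(anchor, pos, neg, margin=0.2):
    """Hinge triplet loss: pull matching pairs closer than non-matching ones."""
    d_ap = np.sum((anchor - pos) ** 2)
    d_an = np.sum((anchor - neg) ** 2)
    return max(0.0, d_ap - d_an + margin)

# Combined multi-task loss on random toy data (weights are illustrative)
rng = np.random.default_rng(0)
loss = (softmax_ce(rng.normal(size=50), label=3)
        + attribute_bce(rng.normal(size=1000), rng.integers(0, 2, 1000))
        + landmark_l2(rng.normal(size=(8, 2)), rng.normal(size=(8, 2)),
                      visibility=np.array([1, 1, 1, 1, 0, 0, 1, 1]))
        + triplet(rng.normal(size=128), rng.normal(size=128), rng.normal(size=128)))
print("total loss:", loss)
```

Each term is non-negative, so the combined objective decreases only when the shared backbone improves across tasks.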
📊 Key Results
- FashionNet outperforms prior methods such as WTBI and DARN:
  - Category classification: top-3 accuracy of 82.58%
  - Attribute prediction: best results across all five attribute groups
  - In-shop retrieval: top-20 accuracy of 76.4% (vs. 67.5% for DARN)
  - Consumer-to-shop retrieval: top-20 accuracy of 18.8% (roughly 70% higher than DARN)
- Ablation studies show:
  - Pooling at clothing landmarks outperforms pooling at human joints or poselets
  - Predicting more attributes improves overall performance
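The top-k accuracy metric used in the retrieval benchmarks can be sketched as follows: a query counts as a hit if its ground-truth item appears among the k nearest gallery images in feature space. The nearest-neighbor search and toy data here are illustrative, not the paper's evaluation code.

```python
import numpy as np

def top_k_accuracy(query_feats, gallery_feats, gallery_ids, query_ids, k=20):
    """Fraction of queries whose true item id appears among the k nearest
    gallery images by Euclidean distance in feature space."""
    hits = 0
    for q, qid in zip(query_feats, query_ids):
        d = np.linalg.norm(gallery_feats - q, axis=1)  # distance to every gallery image
        topk = np.argsort(d)[:k]                       # indices of k nearest neighbors
        hits += int(qid in gallery_ids[topk])
    return hits / len(query_feats)

# Toy setup: 5 items, 4 gallery images each, queries are noisy copies of the items
rng = np.random.default_rng(1)
centers = rng.normal(size=(5, 16))
gallery_ids = np.repeat(np.arange(5), 4)
gallery = centers[gallery_ids] + 0.01 * rng.normal(size=(20, 16))
queries = centers + 0.01 * rng.normal(size=(5, 16))
acc = top_k_accuracy(queries, gallery, gallery_ids, np.arange(5), k=3)
print("top-3 accuracy:", acc)
```

Larger k makes the metric more forgiving, which is why the reported top-20 numbers are much higher than top-1 retrieval rates.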
🧩 Contributions
- DeepFashion dataset: the largest and most comprehensively annotated fashion dataset to date.
- FashionNet: a deep model that jointly learns attributes and landmarks for robust clothing feature extraction.
- Benchmarks and protocols: defined for consistent evaluation of classification and retrieval tasks.