Wednesday, 30 April 2025

The FashionNet Paper: "DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations" (CVPR 2016) by Ziwei Liu et al.


🔍 Objective

To overcome limitations in clothing recognition tasks due to fragmented, small, or weakly labeled datasets by introducing a large-scale, richly annotated dataset—DeepFashion—and a novel model called FashionNet.


🗂️ DeepFashion Dataset

  • Scale: >800,000 images

  • Annotations:

    • 50 fine-grained clothing categories

    • 1,000 clothing attributes (texture, fabric, shape, part, style)

    • 4–8 clothing landmarks per image (e.g., collar, sleeve ends, hems)

    • 300,000 cross-pose/cross-domain image pairs (e.g., shop vs. street)

  • Sources: Online shops (Forever21, Mogujie) and Google Images

  • Benchmarks Supported:

    1. Attribute Prediction

    2. In-shop Clothes Retrieval

    3. Consumer-to-Shop Clothes Retrieval


🧠 FashionNet Architecture

  • Based on VGG-16, with three branches:

    1. Global Appearance Branch

    2. Local Landmark-Guided Branch

    3. Pose Estimation Branch (predicts landmark locations & visibility)

  • Landmark Pooling Layer: Pools/gates features using predicted landmarks, improving robustness to deformation and occlusion.
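The landmark pooling idea can be sketched in a few lines: extract a small window of conv features around each predicted landmark, max-pool it, and gate the result by the landmark's predicted visibility so occluded parts contribute nothing. This is a minimal numpy sketch, not the paper's implementation; the function name and window size are my own choices.

```python
import numpy as np

def landmark_pool(feature_map, landmarks, visibility, window=3):
    """Gate-and-pool local features around predicted landmarks.

    feature_map: (C, H, W) conv features.
    landmarks:   (K, 2) integer (row, col) landmark positions.
    visibility:  (K,) 1.0 if the landmark is visible, else 0.0.
    Returns a (K * C,) vector of max-pooled local features,
    zeroed out for occluded landmarks (the "gating").
    """
    C, H, W = feature_map.shape
    half = window // 2
    pooled = []
    for (r, c), v in zip(landmarks, visibility):
        r0, r1 = max(0, r - half), min(H, r + half + 1)
        c0, c1 = max(0, c - half), min(W, c + half + 1)
        patch = feature_map[:, r0:r1, c0:c1]        # local window
        pooled.append(v * patch.max(axis=(1, 2)))   # max-pool, gate by visibility
    return np.concatenate(pooled)

# Toy example: 2 channels, 8x8 map, 2 landmarks (second one occluded)
fmap = np.zeros((2, 8, 8))
fmap[0, 2, 2] = 5.0
lms = np.array([[2, 2], [6, 6]])
vis = np.array([1.0, 0.0])
vec = landmark_pool(fmap, lms, vis)
print(vec.shape)  # (4,)
```

Because occluded landmarks are zeroed out, clutter or self-occlusion at those positions cannot corrupt the final feature, which is the robustness argument the paper makes.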


🔁 Training Approach

  • Multi-task loss optimization:

    • Softmax loss for categories and visibility

    • Cross-entropy loss for attribute prediction

    • Regression loss for landmark localization

    • Triplet loss for retrieval learning

  • Iterative Training: Alternates stages — the pose branch is trained first to predict landmarks, then those predictions drive the landmark-pooled features used for attribute and category learning.
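The four losses above can be combined as a weighted sum. The sketch below uses plain numpy stand-ins for each term (function names and loss weights are mine, not the paper's); the point is just to show the shape of each loss.

```python
import numpy as np

def softmax_ce(logits, label):
    """Softmax cross-entropy, used for the category head."""
    z = logits - logits.max()           # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def attribute_bce(scores, targets):
    """Per-attribute binary cross-entropy (1,000 attributes in the paper)."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return float(-(targets * np.log(p) + (1 - targets) * np.log(1 - p)).mean())

def landmark_l2(pred, gt, vis):
    """Squared-error regression, counted only over visible landmarks."""
    return float((vis[:, None] * (pred - gt) ** 2).sum() / max(vis.sum(), 1.0))

def triplet_loss(anchor, pos, neg, margin=1.0):
    """Hinge triplet loss pulling same-item pairs together for retrieval."""
    d_ap = np.sum((anchor - pos) ** 2)
    d_an = np.sum((anchor - neg) ** 2)
    return max(0.0, d_ap - d_an + margin)

# Toy forward pass with made-up predictions; equal weights for illustration
cat = softmax_ce(np.array([2.0, 0.5, -1.0]), label=0)
att = attribute_bce(np.array([3.0, -3.0]), np.array([1.0, 0.0]))
lmk = landmark_l2(np.array([[1.0, 1.0]]), np.array([[1.5, 1.0]]), np.array([1.0]))
trp = triplet_loss(np.zeros(4), np.zeros(4), np.ones(4))  # negative is far: hinge inactive
total = cat + att + lmk + trp
print(total)
```

In practice the relative loss weights are tuned per stage of the iterative schedule rather than fixed at 1 as here.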


📊 Key Results

  • FashionNet outperforms prior methods like WTBI and DARN:

    • Category classification: Top-3 accuracy of 82.58%

    • Attribute prediction: Best across all five attribute groups

    • In-shop retrieval: Top-20 accuracy of 76.4% (vs. 67.5% for DARN)

    • Consumer-to-shop retrieval: Top-20 accuracy of 18.8% (70% higher than DARN)

  • Ablation studies show:

    • Pooling over clothing landmarks outperforms pooling over human joints or poselets

    • Using more attributes improves model performance
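The top-k retrieval numbers above are computed by checking, for each query, whether its ground-truth item appears among the k nearest gallery images in embedding space. A minimal sketch of that metric (function name and distance choice are my assumptions):

```python
import numpy as np

def topk_retrieval_accuracy(queries, query_ids, gallery, gallery_ids, k=20):
    """Fraction of queries whose true item id appears among the k nearest
    gallery embeddings under squared Euclidean distance."""
    hits = 0
    for q, qid in zip(queries, query_ids):
        d = np.sum((gallery - q) ** 2, axis=1)   # distance to every gallery image
        topk_ids = gallery_ids[np.argsort(d)[:k]]
        hits += int(qid in topk_ids)
    return hits / len(query_ids)

# Toy 2-D embeddings: each query sits next to its own gallery item
gallery = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
gallery_ids = np.array([0, 1, 2])
queries = np.array([[0.1, 0.0], [0.9, 0.1]])
query_ids = np.array([0, 1])
acc = topk_retrieval_accuracy(queries, query_ids, gallery, gallery_ids, k=1)
print(acc)  # 1.0
```

The paper's top-20 figures (76.4% in-shop, 18.8% consumer-to-shop) are exactly this statistic with k=20; the much lower consumer-to-shop number reflects the domain gap between street photos and shop images.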


🧩 Contributions

  1. DeepFashion Dataset: Largest and most comprehensively annotated fashion dataset to date.

  2. FashionNet: A deep model integrating attribute and landmark learning for robust clothing feature extraction.

  3. Benchmarks and Protocols: Defined for consistent evaluation in classification and retrieval tasks.

