Sunday, 11 May 2025

Collection of Vision Benchmarks (Image Datasets with Brief Descriptions)

 

ImageNet

  • Description: A large-scale image dataset organized according to the WordNet hierarchy.

  • Size: Over 14 million images labeled across 21,000+ categories.

  • Use: Widely used for training and benchmarking deep learning models, especially for large-scale image classification.

  • Famous Benchmark: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which uses a subset of 1,000 classes.
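ILSVRC results are conventionally reported as top-1 and top-5 accuracy: a prediction counts as correct if the true label appears among the model's k highest-scoring classes. A minimal NumPy sketch of that metric (the toy logits and labels below are purely illustrative):

```python
import numpy as np

def topk_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label is among the k highest logits."""
    # Sort class scores descending and keep the first k indices per sample.
    topk = np.argsort(-logits, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

# Toy example: 3 samples, 4 classes.
logits = np.array([
    [0.1, 0.9, 0.0, 0.0],    # top-1 prediction: class 1
    [0.8, 0.05, 0.1, 0.05],  # top-1: class 0, runner-up: class 2
    [0.2, 0.3, 0.1, 0.4],    # top-1 prediction: class 3
])
labels = np.array([1, 2, 3])

print(topk_accuracy(logits, labels, k=1))  # 2/3: sample 2's true class is only runner-up
print(topk_accuracy(logits, labels, k=2))  # 1.0: every true class is in the top 2
```

The same function with k=5 gives the classic ILSVRC top-5 metric.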


CIFAR-100

  • Description: A small image classification dataset with 60,000 32x32 color images in 100 classes (600 images per class).

  • Split: 50,000 training and 10,000 test images.

  • Hierarchy: Also organized into 20 superclasses (each containing 5 fine classes).

  • Use: Suitable for testing model performance on small, diverse datasets.
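The dataset's dimensions (32x32 RGB inputs, 100-way output) can be made concrete with a shape-level sketch. The random batch and linear classifier below are stand-ins, not real CIFAR-100 data; in practice the images and the fine/coarse label mapping come from the dataset's own files (e.g. via `torchvision.datasets.CIFAR100`):

```python
import numpy as np

rng = np.random.default_rng(42)

# CIFAR-100 dimensions: 32x32 RGB images, 100 fine classes.
n, h, w, c, num_classes = 8, 32, 32, 3, 100

images = rng.random((n, h, w, c), dtype=np.float32)  # stand-in batch
flat = images.reshape(n, -1)                         # (8, 3072)

# A hypothetical linear classifier, just to show the shapes involved.
weights = rng.standard_normal((h * w * c, num_classes)).astype(np.float32)
logits = flat @ weights                              # (8, 100)
preds = logits.argmax(axis=1)                        # one of 100 fine classes

# Note: the 20 coarse superclasses are a separate label set shipped with the
# dataset; the fine-to-coarse mapping is not a simple arithmetic pattern.
print(flat.shape, logits.shape)  # (8, 3072) (8, 100)
```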


VTAB (Visual Task Adaptation Benchmark)

  • Description: A benchmark suite designed to evaluate the generalization of pre-trained models across 19 real-world vision tasks.

  • Categories: Tasks are grouped into:

    • Natural (e.g., CIFAR, DTD),

    • Specialized (e.g., satellite images),

    • Structured (e.g., depth estimation, keypoint detection).

  • Use: Tests how well models transfer to new domains and tasks with minimal fine-tuning.

ImageNet-21k

  • Description: A larger version of the standard ImageNet dataset.

  • Classes: ~21,000 classes based on the full WordNet hierarchy.

  • Images: ~14 million labeled images.

  • Use: Commonly used for pretraining models before fine-tuning on downstream tasks (e.g., ImageNet-1k, CIFAR, VTAB).

  • Note: Much more fine-grained and hierarchical than ImageNet-1k.
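The usual pretrain-then-fine-tune recipe keeps the backbone weights and swaps the 21k-way classification head for a freshly initialised head sized for the downstream task. A schematic NumPy sketch (the dimensions and the single-matrix "backbone" are illustrative, not a real architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim = 768          # e.g. a ViT-Base embedding size (illustrative)
pretrain_classes = 21000   # ImageNet-21k label space (approximate)
target_classes = 100       # downstream task, e.g. CIFAR-100

# "Pretrained" weights: backbone plus a 21k-way classification head.
pretrained = {
    "backbone": rng.standard_normal((feature_dim, feature_dim)),
    "head": rng.standard_normal((feature_dim, pretrain_classes)),
}

# Fine-tuning setup: keep the backbone, discard the 21k head,
# and attach a freshly initialised head for the target label space.
finetune = {
    "backbone": pretrained["backbone"],
    "head": rng.standard_normal((feature_dim, target_classes)) * 0.01,
}

x = rng.standard_normal((4, feature_dim))  # a batch of 4 feature vectors
logits = np.tanh(x @ finetune["backbone"]) @ finetune["head"]
print(logits.shape)  # (4, 100)
```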


JFT-300M

  • Description: A massive, proprietary dataset created by Google.

  • Images: ~300 million images.

  • Labels: ~18,000 classes from an internal Google label space; the labels are machine-generated and therefore noisy.

  • Use: Used to pretrain large-scale vision models (e.g., Vision Transformers) that achieve state-of-the-art performance after fine-tuning.

  • Not Public: This dataset is not publicly available.


Oxford-IIIT Pet Dataset (Oxford Pets)

  • Description: A dataset of 7,349 images covering 37 cat and dog breeds (12 cat, 25 dog), with roughly 200 images per breed.

  • Tasks Supported:

    • Image classification (breed or species: cat/dog)

    • Object detection (head bounding boxes)

    • Semantic segmentation (pixel-level pet masks)

  • Variability: Includes pets in diverse poses, scales, and lighting conditions.

  • Applications: Fine-grained classification, segmentation, and transfer learning.

It’s commonly used for training and evaluating models on fine-grained visual recognition and segmentation tasks.
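Segmentation results on datasets like Oxford Pets are typically scored with intersection-over-union (IoU) between the predicted and ground-truth pixel masks. A minimal sketch on toy binary masks:

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection-over-union of two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union > 0 else 1.0

# Toy 4x4 masks: a predicted pet region vs. the ground-truth region.
pred = np.zeros((4, 4), dtype=int)
target = np.zeros((4, 4), dtype=int)
pred[1:3, 1:3] = 1     # 4 predicted foreground pixels
target[1:4, 1:4] = 1   # 9 ground-truth foreground pixels
print(mask_iou(pred, target))  # 4 / 9 ≈ 0.444
```

Averaging IoU over classes (or images) gives the mean IoU commonly reported for segmentation benchmarks.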

