Sunday, 11 May 2025

Collection of Vision Benchmarks (Image Datasets with Brief Descriptions)

 

ImageNet

  • Description: A large-scale image dataset organized according to the WordNet hierarchy.

  • Size: Over 14 million images labeled across 21,000+ categories.

  • Use: Widely used for training and benchmarking deep learning models, especially for large-scale image classification.

  • Famous Benchmark: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which uses a subset of 1,000 classes.
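ILSVRC results are conventionally reported as top-1 and top-5 accuracy: a prediction counts as correct if the true label appears among the model's k highest-scoring classes. A minimal NumPy sketch of that metric (the toy logits and labels below are purely illustrative):

```python
import numpy as np

def topk_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label is among the k highest logits."""
    # Sort class scores descending and keep the first k indices per sample.
    topk = np.argsort(-logits, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

# Toy example: 3 samples, 4 classes.
logits = np.array([
    [0.1, 0.9, 0.0, 0.0],    # top-1 prediction: class 1
    [0.8, 0.05, 0.1, 0.05],  # top-1: class 0, runner-up: class 2
    [0.2, 0.3, 0.1, 0.4],    # top-1 prediction: class 3
])
labels = np.array([1, 2, 3])

print(topk_accuracy(logits, labels, k=1))  # 2/3: sample 2's true class is only runner-up
print(topk_accuracy(logits, labels, k=2))  # 1.0: every true class is in the top 2
```

The same function with k=5 gives the classic ILSVRC top-5 metric.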


CIFAR-100

  • Description: A small image classification dataset with 60,000 32x32 color images in 100 classes (600 images per class).

  • Split: 50,000 training and 10,000 test images.

  • Hierarchy: Also organized into 20 superclasses (each containing 5 fine classes).

  • Use: Suitable for testing model performance on small, diverse datasets.
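The dataset's dimensions (32x32 RGB inputs, 100-way output) can be made concrete with a shape-level sketch. The random batch and linear classifier below are stand-ins, not real CIFAR-100 data; in practice the images and the fine/coarse label mapping come from the dataset's own files (e.g. via `torchvision.datasets.CIFAR100`):

```python
import numpy as np

rng = np.random.default_rng(42)

# CIFAR-100 dimensions: 32x32 RGB images, 100 fine classes.
n, h, w, c, num_classes = 8, 32, 32, 3, 100

images = rng.random((n, h, w, c), dtype=np.float32)  # stand-in batch
flat = images.reshape(n, -1)                         # (8, 3072)

# A hypothetical linear classifier, just to show the shapes involved.
weights = rng.standard_normal((h * w * c, num_classes)).astype(np.float32)
logits = flat @ weights                              # (8, 100)
preds = logits.argmax(axis=1)                        # one of 100 fine classes

# Note: the 20 coarse superclasses are a separate label set shipped with the
# dataset; the fine-to-coarse mapping is not a simple arithmetic pattern.
print(flat.shape, logits.shape)  # (8, 3072) (8, 100)
```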


VTAB (Visual Task Adaptation Benchmark)

  • Description: A benchmark suite designed to evaluate the generalization of pre-trained models across 19 real-world vision tasks.

  • Categories: Tasks are grouped into:

    • Natural (e.g., CIFAR, DTD),

    • Specialized (e.g., satellite images),

    • Structured (e.g., depth estimation, keypoint detection).

  • Use: Tests how well models transfer to new domains and tasks with minimal fine-tuning.

ImageNet-21k

  • Description: A larger version of the standard ImageNet dataset.

  • Classes: ~21,000 classes based on the full WordNet hierarchy.

  • Images: ~14 million labeled images.

  • Use: Commonly used for pretraining models before fine-tuning on downstream tasks (e.g., ImageNet-1k, CIFAR, VTAB).

  • Note: Much more fine-grained and hierarchical than ImageNet-1k.
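The usual pretrain-then-fine-tune recipe keeps the backbone weights and swaps the 21k-way classification head for a freshly initialised head sized for the downstream task. A schematic NumPy sketch (the dimensions and the single-matrix "backbone" are illustrative, not a real architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim = 768          # e.g. a ViT-Base embedding size (illustrative)
pretrain_classes = 21000   # ImageNet-21k label space (approximate)
target_classes = 100       # downstream task, e.g. CIFAR-100

# "Pretrained" weights: backbone plus a 21k-way classification head.
pretrained = {
    "backbone": rng.standard_normal((feature_dim, feature_dim)),
    "head": rng.standard_normal((feature_dim, pretrain_classes)),
}

# Fine-tuning setup: keep the backbone, discard the 21k head,
# and attach a freshly initialised head for the target label space.
finetune = {
    "backbone": pretrained["backbone"],
    "head": rng.standard_normal((feature_dim, target_classes)) * 0.01,
}

x = rng.standard_normal((4, feature_dim))  # a batch of 4 feature vectors
logits = np.tanh(x @ finetune["backbone"]) @ finetune["head"]
print(logits.shape)  # (4, 100)
```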


JFT-300M

  • Description: A massive, proprietary dataset created by Google.

  • Images: ~300 million images.

  • Labels: ~18,000 classes from an internal Google label space; the labels are machine-generated and therefore noisy.

  • Use: Used to pretrain large-scale vision models (e.g., Vision Transformers) that achieve state-of-the-art performance after fine-tuning.

  • Not Public: This dataset is not publicly available.


Oxford-IIIT Pet Dataset (Oxford Pets)

  • Description: A dataset of 7,349 images covering 37 cat and dog breeds (12 cat, 25 dog), with roughly 200 images per breed.

  • Tasks Supported:

    • Image classification (breed or species: cat/dog)

    • Object detection (head bounding boxes)

    • Semantic segmentation (pixel-level pet masks)

  • Variability: Includes pets in diverse poses, scales, and lighting conditions.

  • Applications: Fine-grained classification, segmentation, and transfer learning.

It’s commonly used for training and evaluating models on fine-grained visual recognition and segmentation tasks.
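Segmentation results on datasets like Oxford Pets are typically scored with intersection-over-union (IoU) between the predicted and ground-truth pixel masks. A minimal sketch on toy binary masks:

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection-over-union of two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union > 0 else 1.0

# Toy 4x4 masks: a predicted pet region vs. the ground-truth region.
pred = np.zeros((4, 4), dtype=int)
target = np.zeros((4, 4), dtype=int)
pred[1:3, 1:3] = 1     # 4 predicted foreground pixels
target[1:4, 1:4] = 1   # 9 ground-truth foreground pixels
print(mask_iou(pred, target))  # 4 / 9 ≈ 0.444
```

Averaging IoU over classes (or images) gives the mean IoU commonly reported for segmentation benchmarks.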

