ImageNet
- Description: A large-scale image dataset organized according to the WordNet hierarchy.
- Size: Over 14 million images labeled across 21,000+ categories.
- Use: Widely used for training and benchmarking deep learning models, especially for large-scale image classification.
- Famous Benchmark: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which uses a subset of 1,000 classes.

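ILSVRC classification entries are conventionally compared by top-1 and top-5 error. A prediction counts as correct at top-k if the true class is among the model's k highest-scoring classes. Below is a minimal pure-Python sketch, assuming each image's model output is a list of per-class scores. The function names are hypothetical and not taken from any library.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; the sort is stable,
        # so tied scores keep the lower class index first.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """ILSVRC-style error rate: 1 - top-k accuracy."""
    return 1.0 - top_k_accuracy(scores, labels, k)
```

For ILSVRC-style reporting you would call `top_k_error(scores, labels, k=5)` and `top_k_error(scores, labels, k=1)`. Breaking ties by class index is an arbitrary choice made for this sketch.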
CIFAR-100
- Description: A small image classification dataset with 60,000 32×32 color images in 100 classes (600 images per class).
- Split: 50,000 training and 10,000 test images.
- Hierarchy: Also organized into 20 superclasses (each containing 5 fine classes).
- Use: Suitable for evaluating models on a small-scale but diverse classification problem (many classes, few low-resolution images per class).

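Every CIFAR-100 image carries both a fine label (one of 100 classes) and a coarse label (one of 20 superclasses), so the fine-to-superclass mapping can be recovered from the labels themselves. Below is a minimal sketch, assuming the two label lists are aligned per image. The Python release of the dataset stores them under keys named `fine_labels` and `coarse_labels`. The function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fine_to_coarse(fine_labels: Sequence[int], coarse_labels: Sequence[int]) -> Dict[int, int]:
    """Recover the fine-class -> superclass mapping from per-image label pairs."""
    if len(fine_labels) != len(coarse_labels):
        raise ValueError("label lists must be aligned per image")
    mapping: Dict[int, int] = {}
    for fine, coarse in zip(fine_labels, coarse_labels):
        # A fine class must always appear with the same superclass.
        if mapping.setdefault(fine, coarse) != coarse:
            raise ValueError(f"fine class {fine} maps to several superclasses")
    return mapping


def superclass_members(mapping: Dict[int, int]) -> Dict[int, List[int]]:
    """Group fine classes by superclass, each group sorted by fine label."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for fine, coarse in sorted(mapping.items()):
        groups[coarse].append(fine)
    return dict(groups)
```

On the full training set, `superclass_members` should yield 20 groups of 5 fine classes each, matching the 20 × 5 = 100 structure above. The split also works out per class: each of the 100 classes has 500 training and 100 test images (100 × 500 = 50,000 and 100 × 100 = 10,000).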
VTAB (Visual Task Adaptation Benchmark)
- Description: A benchmark suite designed to evaluate how well pre-trained models generalize across 19 diverse vision tasks.
- Categories: Tasks are grouped into:
  - Natural (e.g., CIFAR-100, DTD, Oxford Pets),
  - Specialized (e.g., satellite and medical images),
  - Structured (e.g., object counting, distance estimation, orientation prediction).
- Use: Tests how well models transfer to new domains and tasks with few labeled examples. The standard VTAB-1k setting provides only 1,000 training examples per task.

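VTAB results are usually summarized by averaging accuracy over all 19 tasks and within each of the three groups. Below is a minimal sketch of that bookkeeping, assuming per-task accuracies arrive as a dict keyed by the task names listed in the code. Those name strings are illustrative spellings, not identifiers from the official VTAB code.

```python
from statistics import mean
from typing import Dict

# The 19 VTAB tasks grouped by category (name spellings are illustrative).
VTAB_GROUPS = {
    "natural": ["caltech101", "cifar100", "dtd", "flowers102", "pets", "sun397", "svhn"],
    "specialized": ["patch_camelyon", "eurosat", "resisc45", "retinopathy"],
    "structured": [
        "clevr_count", "clevr_distance", "dmlab", "kitti_distance",
        "dsprites_location", "dsprites_orientation",
        "smallnorb_azimuth", "smallnorb_elevation",
    ],
}


def summarize_vtab(accuracy: Dict[str, float]) -> Dict[str, float]:
    """Mean accuracy per group plus the mean over all 19 tasks."""
    all_tasks = [t for tasks in VTAB_GROUPS.values() for t in tasks]
    missing = [t for t in all_tasks if t not in accuracy]
    if missing:
        raise KeyError(f"missing results for: {missing}")
    summary = {group: mean(accuracy[t] for t in tasks) for group, tasks in VTAB_GROUPS.items()}
    # Every task is weighted equally in the overall mean.
    summary["all"] = mean(accuracy[t] for t in all_tasks)
    return summary
```

The groups have 7, 4, and 8 tasks, so the overall mean over tasks generally differs from the mean of the three group means. Papers should state which of the two they report.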
ImageNet-21k
- Description: The full ImageNet release, much larger than the standard ImageNet-1k (ILSVRC) subset.
- Classes: ~21,000 classes based on the full WordNet hierarchy.
- Images: ~14 million labeled images.
- Use: Commonly used for pretraining models before fine-tuning on downstream tasks (e.g., ImageNet-1k, CIFAR, VTAB).
- Note: Its label space is much more fine-grained and hierarchical than that of ImageNet-1k.

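The labels are WordNet synsets arranged in a hierarchy, so a fine-grained label can be mapped to a coarser ancestor by walking up parent links. Below is a minimal sketch, assuming a child-to-parent dictionary. The tiny hierarchy is a hypothetical, simplified example, not real WordNet data. Real WordNet also lets some synsets have more than one parent, which this single-parent sketch ignores.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, simplified child -> parent links (not real WordNet data).
PARENT: Dict[str, Optional[str]] = {
    "golden retriever": "retriever",
    "retriever": "dog",
    "tabby cat": "cat",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "animal": None,  # root of this toy hierarchy
}


def ancestors(label: str, parent: Dict[str, Optional[str]] = PARENT) -> List[str]:
    """Return the path from label up to the root, starting with label itself."""
    path = [label]
    seen = {label}
    while parent.get(path[-1]) is not None:
        nxt = parent[path[-1]]
        if nxt in seen:
            raise ValueError("cycle in hierarchy")
        path.append(nxt)
        seen.add(nxt)
    return path


def collapse(label: str, targets: Set[str], parent: Dict[str, Optional[str]] = PARENT) -> Optional[str]:
    """Map a fine label to its nearest ancestor in targets, or None if there is none."""
    for node in ancestors(label, parent):
        if node in targets:
            return node
    return None
```

With a real parent map, `collapse` could relabel fine-grained classes onto a coarser label set, for example before comparing against a dataset with broader categories.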
JFT-300M
- Description: A massive, proprietary dataset created by Google.
- Images: ~300 million images.
- Labels: ~18,000 classes from an internal Google label space. Labels are assigned automatically, so they are noisy, and an image can carry more than one label.
- Use: Used to pretrain large-scale vision models (e.g., Vision Transformers) that achieved state-of-the-art results after fine-tuning.
- Not Public: This dataset is not publicly available.

Oxford-IIIT Pet Dataset (Oxford Pets)
- Description: A dataset of 7,349 images of cats and dogs across 37 pet breeds (12 cat breeds and 25 dog breeds).
- Tasks Supported:
  - Image classification (breed, or species: cat/dog)
  - Object detection (head bounding boxes)
  - Semantic segmentation (pixel-level trimap masks marking pet, background, and boundary regions)
- Variability: Includes pets in diverse poses, scales, and lighting conditions.
- Applications: Commonly used for training and evaluating models on fine-grained classification and segmentation, and for transfer learning.
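Two preprocessing steps come up often with this dataset. The sketch below assumes the dataset's documented conventions. Trimap pixels are 1 (pet), 2 (background), and 3 (boundary / not classified). Image filenames have the form `<breed>_<number>.jpg`, with cat breeds capitalized and dog breeds in lowercase. How to treat boundary pixels is a modeling choice. Here they are mapped to a hypothetical `ignore_value` so that a loss function can skip them. The function names are also hypothetical.

```python
from typing import List, Tuple

PET, BACKGROUND, BOUNDARY = 1, 2, 3  # trimap pixel values


def trimap_to_mask(trimap: List[List[int]], ignore_value: int = 255) -> List[List[int]]:
    """Binary mask: 1 = pet, 0 = background, ignore_value = boundary pixels."""
    lookup = {PET: 1, BACKGROUND: 0, BOUNDARY: ignore_value}
    return [[lookup[p] for p in row] for row in trimap]


def parse_filename(filename: str) -> Tuple[str, str]:
    """Return (breed, species) for a name such as 'Abyssinian_12.jpg'."""
    stem = filename.rsplit(".", 1)[0]
    breed = stem.rsplit("_", 1)[0]  # drop the trailing image number
    # Capitalized breed names are cats; lowercase breed names are dogs.
    species = "cat" if breed[0].isupper() else "dog"
    return breed, species
```

The same filename parse provides labels for both the 37-way breed task and the 2-way species task.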