Skip to content

Topic 2: Data Augmentation for Deep Learning

🎯 Research Topic Guide

Complete resource guide for Data Augmentation research

πŸ“š What to Learn

Core Concepts

  • Geometric transformations: Rotation, translation, scaling
  • Color space augmentations: Brightness, contrast, saturation
  • Mix-based methods: Mixup, CutMix, AugMix
  • Auto-augmentation: Learning augmentation policies
  • Adversarial augmentation: Robustness through augmentation

Key Skills

  • Implementing augmentation pipelines
  • Understanding augmentation effects
  • Auto-augmentation methods
  • Mix-based augmentation techniques
  • Evaluation of augmentation strategies

Learning Path

Start with simple geometric augmentations, then learn mix-based methods, finally explore auto-augmentation.

πŸ“– Survey Papers (Start Here!)

πŸ“‹ Essential Survey Papers
  1. "A Survey on Image Data Augmentation for Deep Learning" (2019)

    • Authors: Shorten, Khoshgoftaar
    • Link: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0
    • Why: Comprehensive survey of augmentation methods
    • Difficulty: Beginner-friendly
  2. "Data Augmentation: A Comprehensive Survey of Modern Approaches" (2022)

    • Authors: Cubuk, Zoph, Shlens, Le
    • Link: https://arxiv.org/abs/2209.02897
    • Why: Modern survey covering recent methods
    • Difficulty: Intermediate
  3. "A Survey of Data Augmentation Approaches for NLP" (2021)

    • Authors: Feng, Yang, Cer, et al.
    • Link: https://arxiv.org/abs/2105.03075
    • Why: NLP-focused augmentation survey
    • Difficulty: Intermediate

Start with Surveys

Surveys give you complete overview before diving into specific methods.

πŸ›οΈ Classic Papers (Must Read)

⭐ Foundational Papers
  1. "Mixup: Beyond Empirical Risk Minimization" (2017)

    • Authors: Zhang, Cisse, Dauphin, Lopez-Paz
    • Link: https://arxiv.org/abs/1710.09412
    • Code: https://github.com/facebookresearch/mixup-cifar10
    • Impact: Started mix-based augmentation trend
    • Difficulty: Easy (simple concept, easy to implement)
  2. "AutoAugment: Learning Augmentation Strategies from Data" (2019)

    • Authors: Cubuk, Zoph, Mane, et al.
    • Link: https://arxiv.org/abs/1805.09501
    • Code: https://github.com/tensorflow/models/tree/master/research/autoaugment
    • Impact: First successful auto-augmentation
    • Difficulty: Medium
  3. "CutMix: Regularization Strategy to Train Strong Classifiers" (2019)

    • Authors: Yun, Han, Oh, et al.
    • Link: https://arxiv.org/abs/1905.04899
    • Code: https://github.com/clovaai/CutMix-PyTorch
    • Impact: Popular mix-based method
    • Difficulty: Easy-Medium

πŸš€ Modern Papers (Recent & Important)

πŸ”₯ Recent Important Papers
  1. "RandAugment: Practical automated data augmentation" (2020)

    • Authors: Cubuk, Zoph, Shlens, Le
    • Link: https://arxiv.org/abs/1909.13719
    • Code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
    • Venue: CVPR 2020
    • Difficulty: Easy (simpler than AutoAugment)
  2. "AugMix: A Simple Data Processing Method to Improve Robustness" (2020)

    • Authors: Hendrycks, Mu, Cubuk, et al.
    • Link: https://arxiv.org/abs/1912.02781
    • Code: https://github.com/google-research/augmix
    • Venue: ICLR 2020
    • Difficulty: Medium
  3. "TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation" (2021)

    • Authors: MΓΌller, Hutter
    • Link: https://arxiv.org/abs/2103.10158
    • Code: https://github.com/automl/trivialaugment
    • Venue: ICCV 2021
    • Difficulty: Easy (parameter-free)
  4. "Simple Copy-Paste is a Strong Data Augmentation" (2021)

    • Authors: Ghiasi, Cui, Srinivas, et al.
    • Link: https://arxiv.org/abs/2012.07177
    • Code: https://github.com/facebookresearch/detection/tree/main/projects/SimpleCopyPaste
    • Venue: CVPR 2021
    • Difficulty: Medium

πŸ“ Tutorial Papers (Beginner-Friendly)

πŸŽ“ Tutorial & Educational Papers
  1. "Understanding Data Augmentation" - PyTorch Tutorial

    • Link: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
    • Why: Practical implementation guide
    • Difficulty: Beginner
  2. "Data Augmentation in PyTorch" - Official Docs

    • Link: https://pytorch.org/vision/stable/transforms.html
    • Why: Complete transform reference
    • Difficulty: Beginner
  3. "Albumentations Library"

    • Link: https://albumentations.ai/
    • Why: Fast augmentation library with examples
    • Difficulty: Beginner

πŸ’» Code Implementation Papers

πŸ”§ Papers with Excellent Code
  1. "Mixup"

    • Code: https://github.com/facebookresearch/mixup-cifar10
    • Framework: PyTorch
    • Quality: Official, simple (~20 lines)
  2. "CutMix"

    • Code: https://github.com/clovaai/CutMix-PyTorch
    • Framework: PyTorch
    • Quality: Official, well-documented
  3. "RandAugment"

    • Code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
    • Framework: TensorFlow
    • Quality: Official
  4. "AugMix"

    • Code: https://github.com/google-research/augmix
    • Framework: PyTorch
    • Quality: Official
  5. "TrivialAugment"

    • Code: https://github.com/automl/trivialaugment
    • Framework: PyTorch
    • Quality: Official, very simple
  6. "Albumentations" (Library)

    • Code: https://github.com/albumentations-team/albumentations
    • Framework: PyTorch/TensorFlow
    • Quality: Industry standard, 70+ transforms

Start with Mixup

Mixup is easiest to implement (10-20 lines). Start there, then move to CutMix.

πŸ“Š Where to Track Papers

Paper Discovery Platforms

πŸ” Paper Discovery
  1. Papers With Code - Data Augmentation

    • URL: https://paperswithcode.com/task/data-augmentation
    • Features: Papers with code, implementations
    • Best for: Finding code
  2. arXiv - Computer Vision

    • URL: https://arxiv.org/list/cs.CV/recent
    • Search: "data augmentation" OR "augmentation"
    • Best for: Latest papers
  3. Google Scholar

    • Search: "data augmentation" deep learning
    • Best for: Comprehensive search
  4. Semantic Scholar

    • Search: Data augmentation
    • Best for: Related papers
  5. Connected Papers

    • Start with: Mixup or AutoAugment paper
    • Best for: Exploring augmentation area

Conference Proceedings

πŸ“… Top Venues
  1. CVPR (June)
  2. URL: https://openaccess.thecvf.com/CVPR
  3. Search: Data augmentation
  4. Best for: Vision augmentation

  5. ICCV (October, biennial)

  6. URL: https://openaccess.thecvf.com/ICCV
  7. Search: Augmentation
  8. Best for: Vision methods

  9. ICLR (May)

  10. URL: https://openreview.net/group?id=ICLR.cc
  11. Search: Data augmentation
  12. Best for: Learning-based augmentation

πŸ“₯ How to Get Papers

Free Access Methods

πŸ†“ Free Access
  1. arXiv - All papers free
  2. Most augmentation papers on arXiv
  3. Direct PDF download

  4. OpenReview - ICLR papers

  5. All ICLR papers free
  6. Includes reviews

  7. CVF Open Access - CVPR/ICCV

  8. All papers free
  9. Direct PDF links

  10. Google Scholar - PDF links

  11. Check "All versions" for free PDFs

  12. Author Websites

    • Many authors post PDFs
    • Check personal pages

Getting Papers

Most augmentation papers are on arXiv or CVF (free). Very accessible.

πŸ“š Learning Resources

Courses & Tutorials

πŸŽ“ Courses
  1. Fast.ai - Practical Deep Learning

    • URL: https://course.fast.ai/
    • Focus: Data augmentation practical
    • Level: Beginner-friendly
  2. CS231n - Stanford

    • URL: https://cs231n.stanford.edu/
    • Focus: Data augmentation in vision
    • Level: Intermediate

Libraries & Tools

πŸ› οΈ Libraries
  1. Albumentations

    • URL: https://albumentations.ai/
    • Why: Fast, 70+ transforms, well-documented
    • Best for: Production use
  2. torchvision.transforms

    • URL: https://pytorch.org/vision/stable/transforms.html
    • Why: PyTorch official, simple
    • Best for: Basic augmentations
  3. imgaug

    • URL: https://github.com/aleju/imgaug
    • Why: Many transforms, flexible
    • Best for: Research

Blogs & Articles

πŸ“° Blogs
  1. "Understanding Data Augmentation" - Towards Data Science

    • Search: Data augmentation deep learning
    • Why: Practical explanations
  2. Albumentations Blog

    • URL: https://albumentations.ai/blog
    • Why: Tutorials and examples

🎯 Reading Strategy

Week 1: Foundations

  1. Read survey paper (#1)
  2. Read Mixup paper (easiest)
  3. Implement Mixup (10-20 lines)
  4. Read CutMix paper
  5. Implement CutMix

Week 2: Auto-Augmentation

  1. Read AutoAugment paper
  2. Read RandAugment paper (simpler)
  3. Try RandAugment implementation
  4. Read TrivialAugment (easiest auto method)

Week 3: Advanced Methods

  1. Read AugMix paper
  2. Read Copy-Paste paper
  3. Implement 2-3 methods
  4. Compare results

Week 4: Recent Work

  1. Follow arXiv for latest
  2. Read 2-3 recent papers
  3. Implement one new method
  4. Write comparison

Reading Plan

Start with Mixup (easiest), then CutMix, then auto-augmentation methods.

πŸ”” Stay Updated

RSS Feeds & Alerts

πŸ“‘ Alerts
  1. arXiv RSS Feed

    • URL: https://arxiv.org/list/cs.CV/recent
    • Search: "data augmentation"
    • Check: Daily
  2. Google Scholar Alerts

    • Setup: Alert for "data augmentation"
    • Frequency: Weekly

Social Media

πŸ“± Social Tracking
  1. Twitter/X

    • Follow: @paperswithcode
    • Hashtag: #DataAugmentation
  2. Reddit

    • r/MachineLearning
    • Search: Data augmentation

πŸ“‹ To-Do Checklist

Beginner Level

  • Read survey paper on data augmentation
  • Read Mixup paper (easiest)
  • Implement Mixup (10-20 lines of code)
  • Read CutMix paper
  • Implement CutMix
  • Try Albumentations library

Intermediate Level

  • Read AutoAugment paper
  • Read RandAugment paper
  • Implement RandAugment
  • Read TrivialAugment (parameter-free)
  • Compare different methods
  • Read 5 recent papers

Advanced Level

  • Read AugMix paper
  • Read Copy-Paste paper
  • Implement advanced methods
  • Experiment with combinations
  • Write augmentation pipeline
  • Contribute to open-source
  • Papers With Code: https://paperswithcode.com/task/data-augmentation
  • Albumentations: https://albumentations.ai/
  • PyTorch Transforms: https://pytorch.org/vision/stable/transforms.html
  • Mixup Code: https://github.com/facebookresearch/mixup-cifar10
  • CutMix Code: https://github.com/clovaai/CutMix-PyTorch

Next Steps: Start with Mixup (easiest to implement), then CutMix, then explore auto-augmentation methods.