# Deep Learning — clear, compact guide
## 1 — What is Deep Learning?
- Deep Learning (DL) is a subfield of machine learning (ML) based on artificial neural networks with many layers and automatic representation learning.
- Instead of manually designing features, DL models learn hierarchical features from raw inputs: early layers detect low-level patterns (edges, textures), middle layers combine these into parts (shapes), and later layers form high-level concepts (faces, objects).
- Short definition: Deep learning = neural network models with multiple layers that automatically learn features from data.
### Example (cat vs dog)
- Traditional ML workflow: hand-craft features (color histograms, SIFT, HOG), then feed them into a classifier (SVM, Random Forest).
- Deep learning workflow: feed raw images into a CNN; early convolutional layers learn edges, deeper layers learn shapes and parts, and a final fully connected layer (or global pooling) plus softmax classifies dog vs cat automatically (a minimal sketch follows).
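A minimal sketch of the deep-learning side in Keras; the input size, layer widths, and hyperparameters are illustrative assumptions, not a tuned model:

```python
# Minimal CNN sketch for binary image classification (cat vs dog).
# Layer sizes and hyperparameters are illustrative, not a tuned model.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),        # raw RGB pixels go in, no hand-crafted features
    layers.Conv2D(32, 3, activation="relu"),  # early layers: low-level patterns (edges, textures)
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),  # deeper layers: parts and shapes
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),          # global pooling instead of a large flatten + dense stack
    layers.Dense(1, activation="sigmoid"),    # single output: probability of "dog"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=10)  # train_images/train_labels: labeled arrays (hypothetical)
```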
## 2 — Big picture: DL vs ML
| Aspect | Machine Learning (classical) | Deep Learning |
|---|---|---|
| Feature engineering | Manual — domain experts design features | Automatic — representation learning inside the network |
| Data requirement | Works well with small/moderate datasets | Usually data-hungry; benefits from large datasets & transfer learning |
| Hardware needs | CPU often sufficient | GPUs/TPUs recommended for training; optimized hardware speeds up inference |
| Model complexity | Often simple and interpretable (decision trees, linear models) | Large, many parameters — often less interpretable |
| Training time | Usually shorter | Can be long (hours to weeks) depending on model, data, and hardware |
| Typical use cases | Tabular data, small datasets, interpretable models | Images, audio, text, video, large-scale tasks |
| Interpretability | Easier to explain | Harder — requires explainability tools |
| Example algorithms | SVM, Random Forest, Logistic Regression | CNNs, RNNs, Transformers, GANs |
| When to choose | Small data; need for interpretability or fast training | Large data, high accuracy requirements, complex patterns |
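To make the "Feature engineering" row concrete, here is a sketch of the classical side: a hand-crafted feature (a simple color histogram, standing in for SIFT/HOG) fed to an SVM. The data is random and purely illustrative:

```python
# Classical ML sketch: hand-crafted features + SVM, to contrast with the end-to-end CNN above.
# The color histogram is a simple stand-in for SIFT/HOG-style feature engineering.
import numpy as np
from sklearn.svm import SVC

def color_histogram(image, bins=8):
    """Turn an HxWx3 image into a per-channel intensity histogram feature vector."""
    hist = [np.histogram(image[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    return np.concatenate(hist).astype(float)

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(20, 64, 64, 3))    # random stand-ins for real cat/dog photos
labels = rng.integers(0, 2, size=20)                    # 0 = cat, 1 = dog (illustrative)

X = np.stack([color_histogram(img) for img in images])  # manual feature-extraction step
clf = SVC(kernel="rbf").fit(X, labels)                  # classical classifier does the rest
print(clf.predict(X[:3]))
```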
## 3 — Why is Deep Learning famous now? (5 pillars)
### 1) Data (scale & availability)
- The explosion of digital data (images, video, text, logs, sensor data, audio) supplies DL models with the huge datasets they need.
- Public benchmark datasets and large corpora enabled fast progress and reproducible comparisons (a loading sketch follows the dataset list below).
Famous datasets (examples):
- Images: ImageNet, Microsoft COCO
- Video: YouTube-8M (large collection of labeled video)
- Text / QA: SQuAD, Wikipedia dumps (for language modeling)
- Audio: LibriSpeech, Google AudioSet
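Many benchmarks are one call away. A quick sketch using CIFAR-10 from `tf.keras.datasets` as a small stand-in (the larger corpora above typically need separate downloads):

```python
# Sketch: loading a public benchmark dataset in one call.
# CIFAR-10 is a small stand-in here; ImageNet, COCO, YouTube-8M, etc. need separate downloads.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
```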
### 2) Hardware improvements
- GPUs (NVIDIA + CUDA) massively parallelize the matrix operations required by DL (see the sketch after this list).
- TPUs, FPGAs, ASICs and specialized edge chips accelerate training and inference.
- Moore’s law + specialized hardware reduced training time and cost.
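A small sketch of why this matters: training and inference are dominated by large matrix multiplications, which GPUs execute in parallel. PyTorch is used here and falls back to the CPU when no GPU is available:

```python
# The core DL workload is large matrix multiplication; GPUs parallelize it across thousands of cores.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

start = time.time()
c = a @ b                       # one layer's worth of work, repeated millions of times during training
if device == "cuda":
    torch.cuda.synchronize()    # GPU kernels run asynchronously; wait before reading the clock
print(f"{device}: {time.time() - start:.4f} s")
```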
### 3) Libraries & software ecosystems
- High-level libraries (TensorFlow, Keras, PyTorch) made model building and experimentation easy.
- Rich ecosystems (pretrained models, tutorials, community repos) accelerated adoption.
### 4) Architectural breakthroughs
- Convolutional Neural Networks (CNNs) for images (AlexNet, VGG, ResNet, EfficientNet).
- Recurrent/sequence models and later Transformers for NLP (LSTM → Transformer → BERT / GPT); the attention operation at their core is sketched after this list.
- Specialized models for image generation and image-to-image translation (U-Net, GANs, diffusion models).
- Object detection (YOLO, SSD) and segmentation (Mask R-CNN).
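The attention operation behind the Transformer entries above, as a minimal single-head NumPy sketch (token count and dimensions are arbitrary):

```python
# Scaled dot-product attention: attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # how strongly each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # weighted mixture of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 16)) for _ in range(3))  # 5 tokens, dimension 16 (arbitrary)
print(attention(Q, K, V).shape)                              # (5, 16)
```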
### 5) Research & community
- Open research, code releases, benchmarks and competitive platforms (e.g., Kaggle) created fast iteration loops.
- Transfer learning, pretraining, and model hubs made it practical to apply DL even with limited resources.
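A transfer-learning sketch along those lines: reuse an ImageNet-pretrained ResNet from torchvision and train only a new head. The weight download and the two-class head are illustrative assumptions (recent torchvision versions accept `weights="DEFAULT"`):

```python
# Transfer learning: reuse a pretrained backbone, train only a small new head.
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(weights="DEFAULT")   # ImageNet-pretrained backbone
for param in model.parameters():
    param.requires_grad = False                          # freeze the pretrained weights
model.fc = nn.Linear(model.fc.in_features, 2)            # new trainable head, e.g. cat vs dog
# Train as usual, passing only model.fc.parameters() to the optimizer.
```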
## 4 — Short historical timeline (key milestones)
- 1943 — McCulloch & Pitts: artificial neuron concept
- 1958 — Perceptron (Rosenblatt)
- 1986 — Backpropagation popularized (Rumelhart et al.)
- 1998 — LeNet (early CNN for digits)
- 2006 — Deep learning renaissance: unsupervised pretraining and deeper networks
- 2012 — AlexNet: big win on ImageNet → modern DL boom
- 2014–2015 — ResNet, advances in architectures and training
- 2017 — Transformer (attention mechanism) — revolutionized NLP
- 2018+ — BERT, GPT series, diffusion models, and large pretrained models
## 5 — Important architectures and typical applications
- CNNs (Convolutional Neural Networks) — image classification, detection, segmentation (ResNet, EfficientNet).
- RNNs / LSTM / GRU — sequence modeling (older; now often replaced by Transformers).
- Transformers — language understanding and generation (BERT, GPT), also used for images (ViT); a usage sketch follows this list.
- U-Net — image segmentation, biomedical imaging.
- GANs / Diffusion Models — image generation and synthesis.
- YOLO / SSD / Faster R-CNN — object detection.
- WaveNet / wav2vec / Whisper — speech synthesis/recognition.
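As a closing sketch, one of the pretrained Transformer families above applied through the Hugging Face `transformers` pipeline; the package install and the default English sentiment model it downloads are assumptions here:

```python
# Using a pretrained Transformer via the Hugging Face pipeline API.
# Downloads a default English sentiment model on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Deep learning made this task almost effortless."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```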