course website: [https://hanlab.mit.edu/courses/2024-fall-65940](https://hanlab.mit.edu/courses/2024-fall-65940)
Pruning
- Pruning Granularity
- Pruning Criterion
- Pruning Ratio
- Fine-tuning: recover the accuracy lost to pruning
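The pruning-criterion step above can be sketched with magnitude pruning; a minimal NumPy example, assuming fine-grained (element-wise) pruning (the helper `magnitude_prune` and its `sparsity` argument are illustrative names, not from the course code):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    flat = np.abs(weights).flatten()
    k = int(sparsity * flat.size)        # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.sort(flat)[k - 1]     # k-th smallest magnitude
    mask = np.abs(weights) > threshold   # keep only weights above the threshold
    return weights * mask

W = np.array([[0.1, -0.9],
              [0.05, 0.7]])
print(magnitude_prune(W, 0.5))           # small-magnitude entries become 0
```

After pruning, fine-tuning the surviving weights recovers most of the lost accuracy.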
Distinguish Between Regularization and Normalization
Regularization:
- purpose: prevent overfitting
- method: add a penalty term to the loss function
Normalization:
- purpose: improve training speed and stability
- method: scale the input to have zero mean and unit variance
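A minimal NumPy sketch contrasting the two (the helper names `loss_with_l2` and `standardize` are illustrative):

```python
import numpy as np

# Regularization: add a penalty term (here L2) on the weights to the loss.
def loss_with_l2(data_loss, weights, lam=1e-4):
    return data_loss + lam * np.sum(weights ** 2)

# Normalization: rescale the input to zero mean and unit variance.
def standardize(x, eps=1e-8):
    return (x - x.mean()) / (x.std() + eps)

z = standardize(np.array([1.0, 2.0, 3.0, 4.0]))
print(z.mean(), z.std())  # ≈ 0.0 and ≈ 1.0
```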
Quantization
K-means Quantization
Linear Quantization
Post-Training Quantization (PTQ) vs. Quantization-Aware Training (QAT)
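Linear quantization maps a real value r to an integer q via r ≈ S · (q − Z), with scale S and zero point Z. A minimal asymmetric uint8 sketch (helper names are illustrative):

```python
import numpy as np

def linear_quantize(x, n_bits=8):
    """Asymmetric linear quantization: r ≈ S * (q - Z)."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)          # S: real range per step
    zero_point = int(round(qmin - x.min() / scale))       # Z: integer mapped to r=0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def linear_dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, S, Z = linear_quantize(x)
print(linear_dequantize(q, S, Z))  # close to x, within one quantization step
```

PTQ computes S and Z from a calibration pass after training; QAT simulates this rounding during training so the network adapts to it.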
Neural Architecture Search (NAS)
Knowledge Distillation
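A common formulation of the distillation loss matches temperature-softened teacher and student distributions with a KL divergence; a minimal NumPy sketch (function names and the default temperature T=4 are illustrative choices):

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - np.max(z / T))  # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across T.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * np.sum(p * (np.log(p) - np.log(q)))

teacher = np.array([2.0, 1.0, 0.1])
student = np.array([1.5, 1.2, 0.3])
print(distillation_loss(student, teacher))  # > 0; shrinks as student matches teacher
```

In practice this term is combined with the ordinary cross-entropy loss on the hard labels.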
Matrix Multiplication
GEMM: General Matrix Multiplication
- Loop optimization
- reordering
- tiling
- unrolling
- SIMD
- Multithreading
- CUDA programming
Loop Optimization
Loop Reordering
Target: improve the data locality of cache accesses
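A minimal sketch of loop reordering for square-matrix multiplication, assuming row-major NumPy arrays (function names are illustrative): moving j to the innermost position makes the inner accesses to B contiguous in memory.

```python
import numpy as np

def matmul_ijk(A, B, C):
    # "normal" i-j-k order: the innermost k loop strides down a column of B,
    # jumping a full row length in memory on every iteration (poor locality)
    n = len(A)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]

def matmul_ikj(A, B, C):
    # reordered i-k-j: the innermost j loop walks a row of B (contiguous)
    n = len(A)
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            for j in range(n):
                C[i][j] += a * B[k][j]

n = 4
A, B = np.random.rand(n, n), np.random.rand(n, n)
C = np.zeros((n, n))
matmul_ikj(A, B, C)
print(np.allclose(C, A @ B))  # → True
```

Both orders compute the same result; only the memory access pattern differs, which is what the cache rewards.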
Loop Tiling
Target: reduce cache misses (when matrix B is much larger than the cache)
Determine TILE_SIZE according to the cache size; with multi-level tiling, pick one tile size per cache level (e.g., TILE_SIZE2 chosen so that a TILE_SIZE2 × TILE_SIZE tile fits in the L2 cache).
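A minimal single-level tiling sketch for a square NumPy matmul (the `tile` size and function name are illustrative; in practice the tile size is tuned so the working set fits each cache level):

```python
import numpy as np

def matmul_tiled(A, B, tile=32):
    n = A.shape[0]
    C = np.zeros((n, n))
    for j0 in range(0, n, tile):              # tile over columns of B and C
        for k0 in range(0, n, tile):          # tile over the reduction dimension
            # within a tile, the reused block of B stays resident in cache
            for i in range(n):
                for k in range(k0, min(k0 + tile, n)):
                    a = A[i, k]
                    for j in range(j0, min(j0 + tile, n)):
                        C[i, j] += a * B[k, j]
    return C
```

Each (k, j) pair is visited exactly once across tiles, so the result matches the untiled loop; only the traversal order changes.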
Loop Unrolling
Target: reduce branching overhead
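A minimal sketch of 4x unrolling on a reduction loop (the function name is illustrative; in interpreted Python the branch-overhead saving is only notional, the real payoff is in compiled code):

```python
def sum_unrolled(xs):
    # body unrolled 4x: one loop-condition check per four elements
    n = len(xs)
    s = 0.0
    main = n - n % 4
    for i in range(0, main, 4):
        s += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
    for i in range(main, n):   # remainder loop for the leftover elements
        s += xs[i]
    return s

print(sum_unrolled([1.0, 2.0, 3.0, 4.0, 5.0]))  # → 15.0
```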