Supervised Dimensionality Reduction Revisited: Why LDA on Frozen CNN Features Deserves a Second Look
arXiv:2604.03928v2 Announce Type: replace-cross
Abstract: Frozen pretrained image representations are widely used for transfer learning: a backbone is kept fixed, feature vectors are extracted, and a lightweight classifier is trained on top. This pipeline usually feeds the full feature vector to the classifier, even when the target task has far fewer classes than the pretraining task. We revisit a classical alternative: supervised dimensionality reduction with Linear Discriminant Analysis (LDA) before linear probing.
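As a rough illustration of the two pipelines being compared, a minimal sketch using scikit-learn might look as follows (assuming features have already been extracted from the frozen backbone into arrays X_train, y_train, X_test, y_test; the names and hyperparameters are illustrative, not the paper's exact protocol):

# Linear probing on frozen features, with and without an LDA projection
# in front of the classifier.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def probe_full(X_train, y_train, X_test, y_test):
    # Baseline: logistic regression on the full frozen feature vector.
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)

def probe_lda(X_train, y_train, X_test, y_test):
    # Alternative: supervised projection to at most (n_classes - 1)
    # LDA dimensions before the same logistic-regression probe.
    clf = make_pipeline(
        StandardScaler(),
        LinearDiscriminantAnalysis(),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)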
We evaluate ten dimensionality-reduction strategies on frozen features from six backbones -- ResNet-18, ResNet-50, MobileNetV3-Small, EfficientNet-B0, ViT-B/16, and DINOv2-ViT-S/14 -- across CIFAR-100, Tiny ImageNet, and CUB-200-2011. Under a fixed logistic-regression protocol, LDA improves accuracy over full features in 11 of the 12 coarse-grained configurations (the six backbones on CIFAR-100 and Tiny ImageNet), with gains of up to 4.5 percentage points while reducing feature dimensionality by 48-87%. The same projection consistently hurts on fine-grained CUB-200, where full features win across all six backbones. This establishes a practical boundary condition: LDA is useful when class-level structure is coarse enough to be captured by mean-separating directions, but it can discard subtle cues needed for fine-grained recognition.
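A plausible reading of the reported reduction range: an LDA projection has rank at most C-1 for C classes, so with the 100 CIFAR-100 classes (99 discriminant directions) a 768-dimensional ViT-B/16 feature would shrink by about 1 - 99/768, roughly 87%, while with the 200 Tiny ImageNet classes (199 directions) a 384-dimensional DINOv2-ViT-S/14 feature would shrink by about 1 - 199/384, roughly 48%.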
We also compare LDA with PCA, PCA+LDA, regularized LDA, Local Fisher Discriminant Analysis, Neighbourhood Components Analysis, and three lightweight LDA extensions. The results show that plain LDA offers the best accuracy-cost tradeoff for most coarse-grained settings, while more complex supervised reduction methods rarely justify their additional cost. Overall, the study provides concrete guidance for when post-hoc supervised projection should, and should not, be inserted into frozen-feature image classification pipelines.
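In the same hypothetical scikit-learn setup sketched above, most of these alternatives are drop-in replacements for the projection step in front of the logistic-regression probe (component counts below are placeholders, not the paper's settings):

# Illustrative alternatives for the projection step; each is fit on the
# training features and followed by the same logistic-regression probe.
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

reducers = {
    "pca": PCA(n_components=256),                      # unsupervised baseline
    "lda": LinearDiscriminantAnalysis(),                # plain LDA, at most C-1 dims
    "pca_lda": make_pipeline(PCA(n_components=256),
                             LinearDiscriminantAnalysis()),
    "reg_lda": LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto"),
    "nca": NeighborhoodComponentsAnalysis(n_components=128),
    # Local Fisher Discriminant Analysis is not in scikit-learn; a separate
    # package such as metric-learn provides an implementation.
}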