Robust Feature Learning under the Guidance of Generalization Features
Graphical Abstract
Abstract
Deep neural networks (DNNs) are vulnerable to adversarial attacks: an attacker can deceive these models by adding slight, human-imperceptible perturbations to input examples, causing incorrect predictions. This vulnerability significantly undermines the deployment of deep learning models in security-critical applications. Adversarial training (AT) is among the most practical methods for improving the robustness of DNNs. However, state-of-the-art adversarial training methods suffer from a robust generalization gap: models attain high robustness on the training data but markedly lower robustness on test data. In this paper, we propose a novel adversarial training framework to enhance the robust generalization of deep neural networks. The framework introduces two key components: feature perturbation and generalized feature guidance. Specifically, we design a feature perturbation strategy that perturbs all features that could affect the model's predictions, encouraging the model to learn additional robust features while strengthening the robustness of the features it has already learned. In addition, we use the generalized features of a clean model to guide the training of the robust model, with the aim of mitigating the degradation of natural accuracy. We evaluate the effectiveness of the proposed method through extensive experiments on popular benchmark datasets, including CIFAR and SVHN, and test its robustness against state-of-the-art adversarial attacks, including PGD and AutoAttack. The results demonstrate that our approach significantly improves the robust generalization of deep neural networks.
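To make the two components concrete, the sketch below illustrates one possible training step combining standard PGD-based adversarial training with a feature-guidance term from a frozen clean model. This is a minimal illustration, not the paper's exact method: the helper `feat_extractor`, the MSE feature-matching loss, and the weight `lambda_guide` are assumptions introduced here for clarity, and the paper's feature perturbation strategy is not reproduced.

```python
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD used as the inner maximization of adversarial training."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def training_step(robust_model, clean_model, feat_extractor, x, y,
                  lambda_guide=1.0):
    """One AT step with clean-model feature guidance (illustrative only).

    feat_extractor(model, x) is assumed to return intermediate features;
    clean_model is a frozen, standardly trained network supplying the
    "generalized features" that guide the robust model.
    """
    x_adv = pgd_attack(robust_model, x, y)

    # Robust classification loss on adversarial examples.
    ce_loss = F.cross_entropy(robust_model(x_adv), y)

    # Match the robust model's features under attack to the clean model's
    # features on natural inputs, to limit natural-accuracy degradation.
    with torch.no_grad():
        clean_feats = feat_extractor(clean_model, x)
    robust_feats = feat_extractor(robust_model, x_adv)
    guide_loss = F.mse_loss(robust_feats, clean_feats)

    return ce_loss + lambda_guide * guide_loss
```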