Efficient Untargeted White-Box Adversarial Attacks Based on Simple Initialization
-
Graphical Abstract
-
Abstract
Adversarial examples (AEs) are an additive amalgamation of clean examples and artificially malicious perturbations. Attackers often leverage random noise and multiple random restarts to initialize perturbation starting points, thereby increasing the diversity of AEs. Given the non-convex nature of the loss function, employing randomness to augment the attack’s success rate may lead to considerable computational overhead. To overcome this challenge, we introduce the one-hot mean square error loss to guide the initialization. This loss is combined with the strongest first-order attack, the projected gradient descent, alongside a dynamic attack step size adjustment strategy to form a comprehensive attack process. Through experimental validation, we demonstrate that our method outperforms baseline attacks in constrained attack budget scenarios and regular experimental settings. This establishes it as a reliable measure for assessing the robustness of deep learning models. We explore the broader application of this initialization strategy in enhancing the defense impact of few-shot classification models. We aspire to provide valuable insights for the community in designing attack and defense mechanisms.
-
-