Detecting and purifying adversarial inputs in deep learning computing systems

US Patent US11373093B2 (Granted)
Abstract. Adversarial input detection and purification (AIDAP) preprocessor and deep learning computer model mechanisms are provided. The deep learning computer model receives input data and processes it to generate a first-pass output, which is provided to the AIDAP preprocessor. The AIDAP preprocessor determines a discriminative region of the input data based on the first-pass output, then transforms a subset of the elements in that region to modify a characteristic of those elements, producing transformed input data. The deep learning computer model processes the transformed input data to generate a second-pass output, which is also provided to the AIDAP preprocessor. The preprocessor then determines whether the input is adversarial by comparing the first-pass and second-pass outputs. If an adversarial input is detected, a responsive action that mitigates the effects of the adversarial input is performed.
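The two-pass flow described in the abstract can be illustrated with a minimal Python sketch. Every concrete choice here is an assumption for illustration, not the patented method. A toy linear softmax classifier stands in for the deep learning model. Gradient-times-input saliency selects the discriminative region, and quantizing those elements to a few levels serves as the transform. An input is flagged when the predicted label changes, or when the first-pass label loses more than `max_drop` confidence. The names `forward`, `discriminative_region`, `quantize`, `detect`, `classify_with_defense`, and `Verdict` are all hypothetical.

```python
import math
from dataclasses import dataclass


def forward(weights, x):
    """Hypothetical model: linear classifier + softmax, one weight row per class."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def discriminative_region(weights, x, probs, k):
    """Indices of the k elements contributing most to the first-pass label.

    Assumption: saliency is |w[c][i] * x[i]| (gradient x input, exact for a
    linear model); the abstract does not say how the region is determined.
    """
    c = argmax(probs)
    scores = [abs(w * v) for w, v in zip(weights[c], x)]
    ranked = sorted(range(len(x)), key=scores.__getitem__, reverse=True)
    return ranked[:k]


def quantize(value, levels):
    """Assumed transform: snap a value in [0, 1] to one of `levels` grid points."""
    step = 1.0 / (levels - 1)
    return round(value / step) * step


@dataclass
class Verdict:
    adversarial: bool
    first_label: int
    second_label: int
    confidence_drop: float


def detect(weights, x, k=2, levels=4, max_drop=0.3):
    """Two-pass check; returns (Verdict, transformed input)."""
    p1 = forward(weights, x)                     # first pass
    region = discriminative_region(weights, x, p1, k)
    x2 = list(x)
    for i in region:                             # transform only the region
        x2[i] = quantize(x2[i], levels)
    p2 = forward(weights, x2)                    # second pass
    c1, c2 = argmax(p1), argmax(p2)
    drop = p1[c1] - p2[c1]                       # confidence lost by the first label
    flagged = c1 != c2 or drop > max_drop
    return Verdict(flagged, c1, c2, drop), x2


def classify_with_defense(weights, x):
    """Assumed responsive action: trust the purified second pass if flagged."""
    verdict, _ = detect(weights, x)
    return verdict.second_label if verdict.adversarial else verdict.first_label


if __name__ == "__main__":
    W = [[1.0, 1.0, 0.0, 0.0],   # class 0 relies on features 0 and 1
         [0.0, 0.0, 1.0, 1.0]]   # class 1 relies on features 2 and 3
    clean = [0.9, 0.8, 0.1, 0.0]
    # A class-1 input [0.33, 0.33, 0.4, 0.4] nudged by +0.12 on features 0 and 1.
    attacked = [0.45, 0.45, 0.4, 0.4]
    for name, x in [("clean", clean), ("attacked", attacked)]:
        verdict, _ = detect(W, x)
        print(name, verdict, "->", classify_with_defense(W, x))
```

In this toy setup, the clean input keeps its label after its salient features are quantized. The attacked input's small perturbation sits exactly where the model is looking, so quantization erases it and the label flips back, which triggers the detector. Falling back to the purified second-pass label is only one possible responsive action; rejecting or logging the input would be others.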