art.attacks.evasion

Module providing evasion attacks under a common interface.

Adversarial Patch

class art.attacks.evasion.AdversarialPatch(classifier: CLASSIFIER_NEURALNETWORK_TYPE, rotation_max: float = 22.5, scale_min: float = 0.1, scale_max: float = 1.0, learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, patch_shape: Tuple[int, int, int] | None = None, targeted: bool = True, verbose: bool = True)

Implementation of the adversarial patch attack for square and rectangular images and videos.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, rotation_max: float = 22.5, scale_min: float = 0.1, scale_max: float = 1.0, learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, patch_shape: Tuple[int, int, int] | None = None, targeted: bool = True, verbose: bool = True)

Create an instance of the AdversarialPatch.

Parameters:
  • classifier – A trained classifier.

  • rotation_max (float) – The maximum rotation applied to random patches. The value is expected to be in the range [0, 180].

  • scale_min (float) – The minimum scaling applied to random patches. The value should be in the range [0, 1], but less than scale_max.

  • scale_max (float) – The maximum scaling applied to random patches. The value should be in the range [0, 1], but larger than scale_min.

  • learning_rate (float) – The learning rate of the optimization.

  • max_iter (int) – The number of optimization steps.

  • batch_size (int) – The size of the training batch.

  • patch_shape – The shape of the adversarial patch as a tuple of shape (width, height, nb_channels). Currently only supported for TensorFlowV2Classifier. For classifiers of other frameworks the patch_shape is set to the shape of the input samples.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • verbose (bool) – Show progress bars.

apply_patch(x: ndarray, scale: float, patch_external: ndarray | None = None, **kwargs) ndarray

A function to apply the learned adversarial patch to images or videos.

Return type:

ndarray

Parameters:
  • x (ndarray) – Instances to which the randomly transformed patch is applied.

  • scale (float) – Scale of the applied patch in relation to the classifier input shape.

  • patch_external – External patch to apply to images x.

Returns:

The patched instances.

generate(x: ndarray, y: ndarray | None = None, **kwargs) Tuple[ndarray, ndarray]

Generate an adversarial patch and return the patch and its mask in arrays.

Parameters:
  • x (ndarray) – An array with the original input images of shape NHWC or NCHW or input videos of shape NFHWC or NFCHW.

  • y – An array with the original true labels.

  • mask (np.ndarray) – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

  • reset_patch (bool) – If True, reset the patch to its initial value (the mean of the minimal and maximal clip values); if False (default), continue from the patch values produced by the previous call to generate, or from the initial value if this is the first call to generate.

Returns:

An array with adversarial patch and an array of the patch mask.

insert_transformed_patch(x: ndarray, patch: ndarray, image_coords: ndarray)

Insert the patch into the image based on given or selected coordinates.

Parameters:
  • x (ndarray) – The image into which the patch is inserted.

  • patch (ndarray) – The patch to be transformed and inserted.

  • image_coords (ndarray) – The coordinates of the 4 corners of the transformed, inserted patch of shape [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] in pixel units going in clockwise direction, starting with upper left corner.

Returns:

The input x with the patch inserted.

reset_patch(initial_patch_value: float | ndarray | None) None

Reset the adversarial patch.

Parameters:

initial_patch_value – Patch value to use for resetting the patch.

set_params(**kwargs) None

Take in a dictionary of parameters and apply attack-specific checks before saving them as attributes.

Parameters:

kwargs – A dictionary of attack-specific parameters.
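
A minimal usage sketch (an illustration, not an excerpt from ART): a small untrained tf.keras model is wrapped in TensorFlowV2Classifier and the patch is trained on random placeholder data. The model architecture, data, and parameter values are assumptions chosen only to keep the example short; in practice use a trained classifier and real images.

    import numpy as np
    import tensorflow as tf

    from art.attacks.evasion import AdversarialPatch
    from art.estimators.classification import TensorFlowV2Classifier

    # Small stand-in model; any trained tf.keras classifier is wrapped the same way.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])

    classifier = TensorFlowV2Classifier(
        model=model,
        nb_classes=10,
        input_shape=(32, 32, 3),
        loss_object=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
        clip_values=(0.0, 1.0),
    )

    # Placeholder NHWC images and one-hot target labels (class 0) for the targeted patch.
    x = np.random.rand(8, 32, 32, 3).astype(np.float32)
    y = tf.keras.utils.to_categorical(np.zeros(8), num_classes=10)

    attack = AdversarialPatch(classifier=classifier, max_iter=10, batch_size=4)
    patch, patch_mask = attack.generate(x=x, y=y)   # learn the patch
    x_patched = attack.apply_patch(x=x, scale=0.4)  # apply it at 40% of the input size

The same generate / apply_patch pattern is used by the framework-specific variants documented below.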

Adversarial Patch - Numpy

class art.attacks.evasion.AdversarialPatchNumpy(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: int = 0, rotation_max: float = 22.5, scale_min: float = 0.1, scale_max: float = 1.0, learning_rate: float = 5.0, max_iter: int = 500, clip_patch: list | tuple | None = None, batch_size: int = 16, targeted: bool = True, verbose: bool = True)

Implementation of the adversarial patch attack for square and rectangular images and videos in Numpy.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: int = 0, rotation_max: float = 22.5, scale_min: float = 0.1, scale_max: float = 1.0, learning_rate: float = 5.0, max_iter: int = 500, clip_patch: list | tuple | None = None, batch_size: int = 16, targeted: bool = True, verbose: bool = True) None

Create an instance of the AdversarialPatchNumpy.

Parameters:
  • classifier – A trained classifier.

  • target (int) – The target label for the created patch.

  • rotation_max (float) – The maximum rotation applied to random patches. The value is expected to be in the range [0, 180].

  • scale_min (float) – The minimum scaling applied to random patches. The value should be in the range [0, 1], but less than scale_max.

  • scale_max (float) – The maximum scaling applied to random patches. The value should be in the range [0, 1], but larger than scale_min.

  • learning_rate (float) – The learning rate of the optimization.

  • max_iter (int) – The number of optimization steps.

  • clip_patch – The minimum and maximum values for each channel in the form [(float, float), (float, float), (float, float)].

  • batch_size (int) – The size of the training batch.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False). Currently only targeted attacks are supported.

  • verbose (bool) – Show progress bars.

apply_patch(x: ndarray, scale: float, patch_external: ndarray | None = None, mask: ndarray | None = None) ndarray

A function to apply the learned adversarial patch to images or videos.

Return type:

ndarray

Parameters:
  • x (ndarray) – Instances to which the randomly transformed patch is applied.

  • scale (float) – Scale of the applied patch in relation to the classifier input shape.

  • patch_external – External patch to apply to images x.

  • mask – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

Returns:

The patched instances.

generate(x: ndarray, y: ndarray | None = None, **kwargs) Tuple[ndarray, ndarray]

Generate an adversarial patch and return the patch and its mask in arrays.

Parameters:
  • x (ndarray) – An array with the original input images of shape NHWC or NCHW or input videos of shape NFHWC or NFCHW.

  • y – An array with the original true labels.

  • mask (np.ndarray) – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

  • reset_patch (bool) – If True, reset the patch to its initial value (the mean of the minimal and maximal clip values); if False (default), continue from the patch values produced by the previous call to generate, or from the initial value if this is the first call to generate.

Returns:

An array with adversarial patch and an array of the patch mask.

static insert_transformed_patch(x: ndarray, patch: ndarray, image_coords: ndarray)

Insert the patch into the image based on given or selected coordinates.

Parameters:
  • x (ndarray) – The image into which the patch is inserted.

  • patch (ndarray) – The patch to be transformed and inserted.

  • image_coords (ndarray) – The coordinates of the 4 corners of the transformed, inserted patch of shape [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] in pixel units going in clockwise direction, starting with upper left corner.

Returns:

The input x with the patch inserted.

reset_patch(initial_patch_value: float | ndarray | None) None

Reset the adversarial patch.

Parameters:

initial_patch_value – Patch value to use for resetting the patch.

Adversarial Patch - PyTorch

class art.attacks.evasion.AdversarialPatchPyTorch(estimator: CLASSIFIER_NEURALNETWORK_TYPE, rotation_max: float = 22.5, scale_min: float = 0.1, scale_max: float = 1.0, distortion_scale_max: float = 0.0, learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, patch_shape: Tuple[int, int, int] = (3, 224, 224), patch_location: Tuple[int, int] | None = None, patch_type: str = 'circle', optimizer: str = 'Adam', targeted: bool = True, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Implementation of the adversarial patch attack for square and rectangular images and videos in PyTorch.

__init__(estimator: CLASSIFIER_NEURALNETWORK_TYPE, rotation_max: float = 22.5, scale_min: float = 0.1, scale_max: float = 1.0, distortion_scale_max: float = 0.0, learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, patch_shape: Tuple[int, int, int] = (3, 224, 224), patch_location: Tuple[int, int] | None = None, patch_type: str = 'circle', optimizer: str = 'Adam', targeted: bool = True, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Create an instance of the AdversarialPatchPyTorch.

Parameters:
  • estimator – A trained estimator.

  • rotation_max (float) – The maximum rotation applied to random patches. The value is expected to be in the range [0, 180].

  • scale_min (float) – The minimum scaling applied to random patches. The value should be in the range [0, 1], but less than scale_max.

  • scale_max (float) – The maximum scaling applied to random patches. The value should be in the range [0, 1], but larger than scale_min.

  • distortion_scale_max (float) – The maximum distortion scale for perspective transformation in range [0, 1]. If distortion_scale_max=0.0 the perspective transformation sampling will be disabled.

  • learning_rate (float) – The learning rate of the optimization. For optimizer=”pgd” the learning rate is multiplied by the sign of the loss gradients.

  • max_iter (int) – The number of optimization steps.

  • batch_size (int) – The size of the training batch.

  • patch_shape – The shape of the adversarial patch as a tuple of shape CHW (nb_channels, height, width).

  • patch_location – The location of the adversarial patch as a tuple of shape (upper left x, upper left y).

  • patch_type (str) – The patch type, either circle or square.

  • optimizer (str) – The optimization algorithm. Supported values: “Adam”, and “pgd”. “pgd” corresponds to projected gradient descent in L-Inf norm.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • summary_writer – Activate summary writer for TensorBoard. Default is False, i.e. no summary writer. If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

  • verbose (bool) – Show progress bars.

apply_patch(x: ndarray, scale: float, patch_external: ndarray | None = None, mask: ndarray | None = None) ndarray

A function to apply the learned adversarial patch to images or videos.

Return type:

ndarray

Parameters:
  • x (ndarray) – Instances to which the randomly transformed patch is applied.

  • scale (float) – Scale of the applied patch in relation to the estimator input shape.

  • patch_external – External patch to apply to images x.

  • mask – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

Returns:

The patched samples.

generate(x: ndarray, y: ndarray | None = None, **kwargs) Tuple[ndarray, ndarray]

Generate an adversarial patch and return the patch and its mask in arrays.

Parameters:
  • x (ndarray) – An array with the original input images of shape NCHW or input videos of shape NFCHW.

  • y – An array with the original true labels.

  • mask (np.ndarray) – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

Returns:

An array with adversarial patch and an array of the patch mask.

static insert_transformed_patch(x: ndarray, patch: ndarray, image_coords: ndarray)

Insert the patch into the image based on given or selected coordinates.

Parameters:
  • x (ndarray) – The image into which the patch is inserted.

  • patch (ndarray) – The patch to be transformed and inserted.

  • image_coords (ndarray) – The coordinates of the 4 corners of the transformed, inserted patch of shape [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] in pixel units going in clockwise direction, starting with upper left corner.

Returns:

The input x with the patch inserted.

reset_patch(initial_patch_value: float | ndarray | None = None) None

Reset the adversarial patch.

Parameters:

initial_patch_value – Patch value to use for resetting the patch.
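
A minimal sketch of the PyTorch variant (illustrative only, assuming torch and torchvision are installed): a tiny untrained CNN is wrapped in PyTorchClassifier, the patch is trained on random NCHW data, and integer class indices are assumed to be accepted as target labels.

    import numpy as np
    import torch

    from art.attacks.evasion import AdversarialPatchPyTorch
    from art.estimators.classification import PyTorchClassifier

    # Tiny stand-in network; any trained PyTorch image classifier is wrapped the same way.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1),
        torch.nn.Flatten(),
        torch.nn.Linear(8, 10),
    )
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        input_shape=(3, 32, 32),
        nb_classes=10,
        clip_values=(0.0, 1.0),
    )

    x = np.random.rand(8, 3, 32, 32).astype(np.float32)  # placeholder NCHW images
    y = np.zeros(8, dtype=np.int64)                       # target class 0 (attack is targeted by default)

    attack = AdversarialPatchPyTorch(
        estimator=classifier,
        patch_shape=(3, 16, 16),
        patch_type="circle",
        max_iter=10,
        batch_size=4,
    )
    patch, patch_mask = attack.generate(x=x, y=y)
    x_patched = attack.apply_patch(x=x, scale=0.5)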

Adversarial Patch - TensorFlowV2

class art.attacks.evasion.AdversarialPatchTensorFlowV2(classifier: CLASSIFIER_NEURALNETWORK_TYPE, rotation_max: float = 22.5, scale_min: float = 0.1, scale_max: float = 1.0, learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, patch_shape: Tuple[int, int, int] | None = None, optimizer: str = 'Adam', targeted: bool = True, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Implementation of the adversarial patch attack for square and rectangular images and videos in TensorFlow v2.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, rotation_max: float = 22.5, scale_min: float = 0.1, scale_max: float = 1.0, learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, patch_shape: Tuple[int, int, int] | None = None, optimizer: str = 'Adam', targeted: bool = True, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Create an instance of the AdversarialPatchTensorFlowV2.

Parameters:
  • classifier – A trained classifier.

  • rotation_max (float) – The maximum rotation applied to random patches. The value is expected to be in the range [0, 180].

  • scale_min (float) – The minimum scaling applied to random patches. The value should be in the range [0, 1], but less than scale_max.

  • scale_max (float) – The maximum scaling applied to random patches. The value should be in the range [0, 1], but larger than scale_min.

  • learning_rate (float) – The learning rate of the optimization. For optimizer=”pgd” the learning rate is multiplied by the sign of the loss gradients.

  • max_iter (int) – The number of optimization steps.

  • batch_size (int) – The size of the training batch.

  • patch_shape – The shape of the adversarial patch as a tuple of shape HWC (height, width, nb_channels).

  • optimizer (str) – The optimization algorithm. Supported values: “Adam”, and “pgd”. “pgd” corresponds to projected gradient descent in L-Inf norm.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • summary_writer – Activate summary writer for TensorBoard. Default is False, i.e. no summary writer. If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

  • verbose (bool) – Show progress bars.

apply_patch(x: ndarray, scale: float, patch_external: ndarray | None = None, mask: ndarray | None = None) ndarray

A function to apply the learned adversarial patch to images or videos.

Return type:

ndarray

Parameters:
  • x (ndarray) – Instances to which the randomly transformed patch is applied.

  • scale (float) – Scale of the applied patch in relation to the classifier input shape.

  • patch_external – External patch to apply to images x.

  • mask – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

Returns:

The patched samples.

generate(x: ndarray, y: ndarray | None = None, **kwargs) Tuple[ndarray, ndarray]

Generate an adversarial patch and return the patch and its mask in arrays.

Parameters:
  • x (ndarray) – An array with the original input images of shape NHWC or input videos of shape NFHWC.

  • y – An array with the original true labels.

  • mask (np.ndarray) – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

  • reset_patch (bool) – If True, reset the patch to its initial value (the mean of the minimal and maximal clip values); if False (default), continue from the patch values produced by the previous call to generate, or from the initial value if this is the first call to generate.

Returns:

An array with adversarial patch and an array of the patch mask.

static insert_transformed_patch(x: ndarray, patch: ndarray, image_coords: ndarray)

Insert the patch into the image based on given or selected coordinates.

Parameters:
  • x (ndarray) – The image into which the patch is inserted.

  • patch (ndarray) – The patch to be transformed and inserted.

  • image_coords (ndarray) – The coordinates of the 4 corners of the transformed, inserted patch of shape [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] in pixel units going in clockwise direction, starting with upper left corner.

Returns:

The input x with the patch inserted.

reset_patch(initial_patch_value: float | ndarray | None = None) None

Reset the adversarial patch.

Parameters:

initial_patch_value – Patch value to use for resetting the patch.

Adversarial Texture - PyTorch

class art.attacks.evasion.AdversarialTexturePyTorch(estimator, patch_height: int, patch_width: int, x_min: int = 0, y_min: int = 0, step_size: float = 0.00392156862745098, max_iter: int = 500, batch_size: int = 16, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Implementation of the adversarial texture attack on object trackers in PyTorch.

__init__(estimator, patch_height: int, patch_width: int, x_min: int = 0, y_min: int = 0, step_size: float = 0.00392156862745098, max_iter: int = 500, batch_size: int = 16, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Create an instance of the AdversarialTexturePyTorch.

Parameters:
  • estimator – A trained estimator.

  • patch_height (int) – Height of patch.

  • patch_width (int) – Width of patch.

  • x_min (int) – Vertical position of patch, top-left corner.

  • y_min (int) – Horizontal position of patch, top-left corner.

  • step_size (float) – The step size.

  • max_iter (int) – The number of optimization steps.

  • batch_size (int) – The size of the training batch.

  • summary_writer – Activate summary writer for TensorBoard. Default is False, i.e. no summary writer. If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

  • verbose (bool) – Show progress bars.

apply_patch(x: ndarray, patch_external: ndarray | None = None, foreground: ndarray | None = None, patch_points: ndarray | None = None) ndarray

A function to apply the learned adversarial texture to videos.

Return type:

ndarray

Parameters:
  • x (ndarray) – Videos of shape NFHWC to which the adversarial texture is applied.

  • patch_external – External patch to apply to videos x.

  • foreground – Foreground masks of shape NFHWC of boolean values with False/0.0 representing foreground, preventing updates to the texture, and True/1.0 for background, allowing updates to the texture.

  • patch_points – Array of shape (nb_frames, 4, 2) containing four pairs of integers (height, width) corresponding to the coordinates of the four corners top-left, top-right, bottom-right, bottom-left of the transformed image in the coordinate-system of the original image.

Returns:

The videos with adversarial textures.

generate(x: ndarray, y: List[Dict[str, ndarray]], **kwargs) ndarray

Generate the adversarial texture and return the input videos with the texture applied.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input videos of shape NFHWC.

  • y

    True labels of format List[Dict[str, np.ndarray]], one dictionary for each input image. The keys of the dictionary are:

    • boxes [N_FRAMES, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

Keyword Arguments:
  • shuffle (np.ndarray) – Shuffle order of samples, labels, initial boxes, and foregrounds for texture generation.

  • y_init (np.ndarray) – Initial boxes around object to be tracked of shape (nb_samples, 4) with second dimension representing [x1, y1, x2, y2] with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

  • foreground (np.ndarray) – Foreground masks of shape NFHWC of boolean values with False/0.0 representing foreground, preventing updates to the texture, and True/1.0 for background, allowing updates to the texture.

  • patch_points (np.ndarray) – Array of shape (nb_frames, 4, 2) containing four pairs of integers (height, width) corresponding to the four corners top-left, top-right, bottom-right, bottom-left of the transformed image in the coordinates of the original image.

Returns:

An array with images patched with adversarial texture.

reset_patch(initial_patch_value: float | ndarray | None = None) None

Reset the adversarial texture.

Parameters:

initial_patch_value – Patch value to use for resetting the patch.

Auto Attack

class art.attacks.evasion.AutoAttack(estimator: CLASSIFIER_TYPE, norm: int | float | str = inf, eps: float = 0.3, eps_step: float = 0.1, attacks: List[EvasionAttack] | None = None, batch_size: int = 32, estimator_orig: CLASSIFIER_TYPE | None = None, targeted: bool = False, parallel: bool = False)

Implementation of the AutoAttack attack.

__init__(estimator: CLASSIFIER_TYPE, norm: int | float | str = inf, eps: float = 0.3, eps_step: float = 0.1, attacks: List[EvasionAttack] | None = None, batch_size: int = 32, estimator_orig: CLASSIFIER_TYPE | None = None, targeted: bool = False, parallel: bool = False)

Create an AutoAttack instance.

Parameters:
  • estimator – A trained estimator.

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 1 or 2.

  • eps (float) – Maximum perturbation that the attacker can introduce.

  • eps_step (float) – Attack step size (input variation) at each iteration.

  • attacks – The list of art.attacks.EvasionAttack attacks to be used for AutoAttack. If it is None or empty the standard attacks (PGD, APGD-ce, APGD-dlr, DeepFool, Square) will be used.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • estimator_orig – Original estimator to be attacked by adversarial examples.

  • targeted (bool) – If False, run only untargeted attacks; if True, also run targeted attacks against each possible target.

  • parallel (bool) – If True run attacks in parallel.

__repr__() str

This method returns a summary of the best performing (lowest perturbation in the parallel case) attacks per image passed to the AutoAttack class.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

Returns:

An array holding the adversarial examples.
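
A minimal sketch (illustrative assumptions: an untrained single-layer PyTorch model and random placeholder inputs stand in for a real classifier and data set). With attacks=None the standard suite listed above is used.

    import numpy as np
    import torch

    from art.attacks.evasion import AutoAttack
    from art.estimators.classification import PyTorchClassifier

    # Gradient-capable stand-in classifier.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        input_shape=(3, 32, 32),
        nb_classes=10,
        clip_values=(0.0, 1.0),
    )

    x_test = np.random.rand(4, 3, 32, 32).astype(np.float32)  # placeholder inputs

    # y=None: model predictions are used as labels to avoid label leaking.
    attack = AutoAttack(estimator=classifier, norm=np.inf, eps=0.1, eps_step=0.05, batch_size=4)
    x_adv = attack.generate(x=x_test)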

Auto Projected Gradient Descent (Auto-PGD)

class art.attacks.evasion.AutoProjectedGradientDescent(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, norm: int | float | str = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, targeted: bool = False, nb_random_init: int = 5, batch_size: int = 32, loss_type: str | None = None, verbose: bool = True)

Implementation of the Auto Projected Gradient Descent attack.

__init__(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, norm: int | float | str = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, targeted: bool = False, nb_random_init: int = 5, batch_size: int = 32, loss_type: str | None = None, verbose: bool = True)

Create an AutoProjectedGradientDescent instance.

Parameters:
  • estimator – A trained estimator.

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 1 or 2.

  • eps (float) – Maximum perturbation that the attacker can introduce.

  • eps_step (float) – Attack step size (input variation) at each iteration.

  • max_iter (int) – The maximum number of iterations.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • nb_random_init (int) – Number of random initialisations within the epsilon ball. For nb_random_init=0 the attack starts at the original input.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • loss_type – Defines the loss to attack. Available options: None (use the loss defined by the estimator), “cross_entropy”, or “difference_logits_ratio”.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

Returns:

An array holding the adversarial examples.
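
A minimal sketch under the same illustrative assumptions as the AutoAttack example above (untrained stand-in model, random inputs); loss_type is left at its default, so the estimator's own loss is attacked.

    import numpy as np
    import torch

    from art.attacks.evasion import AutoProjectedGradientDescent
    from art.estimators.classification import PyTorchClassifier

    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        input_shape=(3, 32, 32),
        nb_classes=10,
        clip_values=(0.0, 1.0),
    )

    x = np.random.rand(4, 3, 32, 32).astype(np.float32)

    attack = AutoProjectedGradientDescent(
        estimator=classifier,
        norm=np.inf,
        eps=0.1,
        eps_step=0.05,
        max_iter=10,
        nb_random_init=1,
        batch_size=4,
    )
    x_adv = attack.generate(x=x)  # y=None: model predictions are used as labels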

Auto Conjugate Gradient (Auto-CG)

class art.attacks.evasion.AutoConjugateGradient(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, norm: int | float | str = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, targeted: bool = False, nb_random_init: int = 5, batch_size: int = 32, loss_type: str | None = None, verbose: bool = True)

Implementation of the ‘Auto Conjugate Gradient’ attack. The original implementation is https://github.com/yamamura-k/ACG.

__init__(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, norm: int | float | str = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, targeted: bool = False, nb_random_init: int = 5, batch_size: int = 32, loss_type: str | None = None, verbose: bool = True)

Create an AutoConjugateGradient instance.

Parameters:
  • estimator – A trained estimator.

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 1 or 2.

  • eps (float) – Maximum perturbation that the attacker can introduce.

  • eps_step (float) – Attack step size (input variation) at each iteration.

  • max_iter (int) – The maximum number of iterations.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • nb_random_init (int) – Number of random initialisations within the epsilon ball. For nb_random_init=0 the attack starts at the original input.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • loss_type – Defines the loss to attack. Available options: None (use the loss defined by the estimator), “cross_entropy”, or “difference_logits_ratio”.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

Returns:

An array holding the adversarial examples.

Boundary Attack / Decision-Based Attack

class art.attacks.evasion.BoundaryAttack(estimator: CLASSIFIER_TYPE, batch_size: int = 64, targeted: bool = True, delta: float = 0.01, epsilon: float = 0.01, step_adapt: float = 0.667, max_iter: int = 5000, num_trial: int = 25, sample_size: int = 20, init_size: int = 100, min_epsilon: float = 0.0, verbose: bool = True)

Implementation of the boundary attack from Brendel et al. (2018). This is a powerful black-box attack that only requires the final class prediction.

__init__(estimator: CLASSIFIER_TYPE, batch_size: int = 64, targeted: bool = True, delta: float = 0.01, epsilon: float = 0.01, step_adapt: float = 0.667, max_iter: int = 5000, num_trial: int = 25, sample_size: int = 20, init_size: int = 100, min_epsilon: float = 0.0, verbose: bool = True) None

Create a boundary attack instance.

Parameters:
  • estimator – A trained classifier.

  • batch_size (int) – The size of the batch used by the estimator during inference.

  • targeted (bool) – Should the attack target one specific class.

  • delta (float) – Initial step size for the orthogonal step.

  • epsilon (float) – Initial step size for the step towards the target.

  • step_adapt (float) – Factor by which the step sizes are multiplied or divided, must be in the range (0, 1).

  • max_iter (int) – Maximum number of iterations.

  • num_trial (int) – Maximum number of trials per iteration.

  • sample_size (int) – Number of samples per trial.

  • init_size (int) – Maximum number of trials for initial generation of adversarial examples.

  • min_epsilon (float) – Stop attack if perturbation is smaller than min_epsilon.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). If self.targeted is true, then y represents the target labels.

  • x_adv_init (np.ndarray) – Initial array to act as initial adversarial examples. Same shape as x.

Returns:

An array holding the adversarial examples.
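
Because the boundary attack is decision-based, any wrapped classifier that exposes predictions can be attacked. A minimal sketch using a scikit-learn model (illustrative parameter values; the small max_iter only keeps the example fast):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    from art.attacks.evasion import BoundaryAttack
    from art.estimators.classification import SklearnClassifier

    x, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=10).fit(x, y)
    classifier = SklearnClassifier(model=model, clip_values=(float(x.min()), float(x.max())))

    attack = BoundaryAttack(estimator=classifier, targeted=False, max_iter=50, verbose=False)
    x_adv = attack.generate(x=x[:5].astype(np.float32))  # untargeted: no y required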

Brendel and Bethge Attack

Carlini and Wagner L_0 Attack

class art.attacks.evasion.CarliniL0Method(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, binary_search_steps: int = 10, max_iter: int = 10, initial_const: float = 0.01, mask: ndarray | None = None, warm_start: bool = True, max_halving: int = 5, max_doubling: int = 5, batch_size: int = 1, verbose: bool = True)

The L_0 distance metric is non-differentiable and therefore is ill-suited for standard gradient descent. Instead, we use an iterative algorithm that, in each iteration, identifies some features that don’t have much effect on the classifier output and then fixes those features, so their value will never be changed. The set of fixed features grows in each iteration until we have, by process of elimination, identified a minimal (but possibly not minimum) subset of features that can be modified to generate an adversarial example. In each iteration, we use our L_2 attack to identify which features are unimportant [Carlini and Wagner, 2016].

__init__(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, binary_search_steps: int = 10, max_iter: int = 10, initial_const: float = 0.01, mask: ndarray | None = None, warm_start: bool = True, max_halving: int = 5, max_doubling: int = 5, batch_size: int = 1, verbose: bool = True)

Create a Carlini&Wagner L_0 attack instance.

Parameters:
  • classifier – A trained classifier.

  • confidence (float) – Confidence of adversarial examples: a higher value produces examples that are farther away from the original input, but classified with higher confidence as the target class.

  • targeted (bool) – Should the attack target one specific class.

  • learning_rate (float) – The initial learning rate for the attack algorithm. Smaller values produce better results but are slower to converge.

  • binary_search_steps (int) – Number of times to adjust constant with binary search (positive value). If binary_search_steps is large, then the algorithm is not very sensitive to the value of initial_const. Note that the values gamma=0.999999 and c_upper=10e10 are hardcoded with the same values used by the authors of the method.

  • max_iter (int) – The maximum number of iterations.

  • initial_const (float) – The initial trade-off constant c to use to tune the relative importance of distance and confidence. If binary_search_steps is large, the initial constant is not important, as discussed in Carlini and Wagner (2016).

  • mask – The initial features that can be modified by the algorithm. If not specified, the algorithm uses the full feature set.

  • warm_start (bool) – Instead of starting gradient descent in each iteration from the initial image, start it from the solution found in the previous iteration.

  • max_halving (int) – Maximum number of halving steps in the line search optimization.

  • max_doubling (int) – Maximum number of doubling steps in the line search optimization.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). If self.targeted is true, then y represents the target labels. Otherwise, the targets are the original class labels.

Returns:

An array holding the adversarial examples.

Carlini and Wagner L_2 Attack

class art.attacks.evasion.CarliniL2Method(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, binary_search_steps: int = 10, max_iter: int = 10, initial_const: float = 0.01, max_halving: int = 5, max_doubling: int = 5, batch_size: int = 1, verbose: bool = True)

The L_2 optimized attack of Carlini and Wagner (2016). This attack is among the most effective and should be used among the primary attacks to evaluate potential defences. A major difference with respect to the original implementation (https://github.com/carlini/nn_robust_attacks) is that we use line search in the optimization of the attack objective.

__init__(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, binary_search_steps: int = 10, max_iter: int = 10, initial_const: float = 0.01, max_halving: int = 5, max_doubling: int = 5, batch_size: int = 1, verbose: bool = True) None

Create a Carlini&Wagner L_2 attack instance.

Parameters:
  • classifier – A trained classifier.

  • confidence (float) – Confidence of adversarial examples: a higher value produces examples that are farther away from the original input, but classified with higher confidence as the target class.

  • targeted (bool) – Should the attack target one specific class.

  • learning_rate (float) – The initial learning rate for the attack algorithm. Smaller values produce better results but are slower to converge.

  • binary_search_steps (int) – Number of times to adjust constant with binary search (positive value). If binary_search_steps is large, then the algorithm is not very sensitive to the value of initial_const. Note that the values gamma=0.999999 and c_upper=10e10 are hardcoded with the same values used by the authors of the method.

  • max_iter (int) – The maximum number of iterations.

  • initial_const (float) – The initial trade-off constant c to use to tune the relative importance of distance and confidence. If binary_search_steps is large, the initial constant is not important, as discussed in Carlini and Wagner (2016).

  • max_halving (int) – Maximum number of halving steps in the line search optimization.

  • max_doubling (int) – Maximum number of doubling steps in the line search optimization.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). If self.targeted is true, then y represents the target labels. Otherwise, the targets are the original class labels.

Returns:

An array holding the adversarial examples.
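
A minimal sketch (illustrative only: an untrained stand-in model and random inputs; small binary_search_steps and max_iter keep the example fast at the cost of attack quality).

    import numpy as np
    import torch

    from art.attacks.evasion import CarliniL2Method
    from art.estimators.classification import PyTorchClassifier

    # The C&W attacks need class gradients, which PyTorchClassifier provides.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        input_shape=(1, 28, 28),
        nb_classes=10,
        clip_values=(0.0, 1.0),
    )

    x = np.random.rand(4, 1, 28, 28).astype(np.float32)  # placeholder inputs

    attack = CarliniL2Method(classifier=classifier, binary_search_steps=3, max_iter=5, batch_size=4)
    x_adv = attack.generate(x=x)  # untargeted: model predictions are used as labels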

Carlini and Wagner L_inf Attack

class art.attacks.evasion.CarliniLInfMethod(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, max_iter: int = 10, decrease_factor: float = 0.9, initial_const: float = 1e-05, largest_const: float = 20.0, const_factor: float = 2.0, batch_size: int = 1, verbose: bool = True)

This is a modified version of the L_2 optimized attack of Carlini and Wagner (2016). It controls the L_Inf norm, i.e. the maximum perturbation applied to each pixel.

__init__(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, max_iter: int = 10, decrease_factor: float = 0.9, initial_const: float = 1e-05, largest_const: float = 20.0, const_factor: float = 2.0, batch_size: int = 1, verbose: bool = True) None

Create a Carlini&Wagner L_Inf attack instance.

Parameters:
  • classifier – A trained classifier.

  • confidence (float) – Confidence of adversarial examples: a higher value produces examples that are farther away from the original input, but classified with higher confidence as the target class.

  • targeted (bool) – Should the attack target one specific class.

  • learning_rate (float) – The initial learning rate for the attack algorithm. Smaller values produce better results but are slower to converge.

  • max_iter (int) – The maximum number of iterations.

  • decrease_factor (float) – The rate at which tau shrinks; values must satisfy 0 < decrease_factor < 1, where larger values are more accurate.

  • initial_const (float) – The initial value of constant c.

  • largest_const (float) – The largest value of constant c.

  • const_factor (float) – The factor by which constant c is increased, with const_factor > 1, where smaller values are more accurate.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). If self.targeted is true, then y represents the target labels. Otherwise, the targets are the original class labels.

Returns:

An array holding the adversarial examples.

Carlini and Wagner ASR Attack

class art.attacks.evasion.CarliniWagnerASR(estimator: SPEECH_RECOGNIZER_TYPE, eps: float = 2000.0, learning_rate: float = 100.0, max_iter: int = 1000, decrease_factor_eps: float = 0.8, num_iter_decrease_eps: int = 10, batch_size: int = 16)

Implementation of the Carlini and Wagner audio adversarial attack against a speech recognition model.

__init__(estimator: SPEECH_RECOGNIZER_TYPE, eps: float = 2000.0, learning_rate: float = 100.0, max_iter: int = 1000, decrease_factor_eps: float = 0.8, num_iter_decrease_eps: int = 10, batch_size: int = 16)

Create an instance of the CarliniWagnerASR.

Parameters:
  • estimator – A trained speech recognition estimator.

  • eps (float) – Initial max norm bound for adversarial perturbation.

  • learning_rate (float) – Learning rate of attack.

  • max_iter (int) – Number of iterations.

  • decrease_factor_eps (float) – Decrease factor for epsilon (Paper default: 0.8).

  • num_iter_decrease_eps (int) – Iterations after which to decrease epsilon if attack succeeds (Paper default: 10).

  • batch_size (int) – Batch size.

Composite Adversarial Attack - PyTorch

class art.attacks.evasion.CompositeAdversarialAttackPyTorch(classifier: PyTorchClassifier, enabled_attack: Tuple = (0, 1, 2, 3, 4, 5), hue_epsilon: Tuple[float, float] = (-3.141592653589793, 3.141592653589793), sat_epsilon: Tuple[float, float] = (0.7, 1.3), rot_epsilon: Tuple[float, float] = (-10.0, 10.0), bri_epsilon: Tuple[float, float] = (-0.2, 0.2), con_epsilon: Tuple[float, float] = (0.7, 1.3), pgd_epsilon: Tuple[float, float] = (-0.03137254901960784, 0.03137254901960784), early_stop: bool = True, max_iter: int = 5, max_inner_iter: int = 10, attack_order: str = 'scheduled', batch_size: int = 1, verbose: bool = True)

Implementation of the composite adversarial attack on image classifiers in PyTorch. The attack adversarially perturbs semantic components of the inputs (hue, saturation, rotation, brightness, contrast) together with an L_inf perturbation. It uses order scheduling to search for the attack sequence and the iterative gradient sign method to optimize the perturbations in semantic space and the Lp-ball (see FastGradientMethod and BasicIterativeMethod).

Note that this attack is intended only for PyTorch image classifiers with RGB images in the range [0, 1] as inputs.

__init__(classifier: PyTorchClassifier, enabled_attack: Tuple = (0, 1, 2, 3, 4, 5), hue_epsilon: Tuple[float, float] = (-3.141592653589793, 3.141592653589793), sat_epsilon: Tuple[float, float] = (0.7, 1.3), rot_epsilon: Tuple[float, float] = (-10.0, 10.0), bri_epsilon: Tuple[float, float] = (-0.2, 0.2), con_epsilon: Tuple[float, float] = (0.7, 1.3), pgd_epsilon: Tuple[float, float] = (-0.03137254901960784, 0.03137254901960784), early_stop: bool = True, max_iter: int = 5, max_inner_iter: int = 10, attack_order: str = 'scheduled', batch_size: int = 1, verbose: bool = True) None

Create an instance of the CompositeAdversarialAttackPyTorch.

Parameters:
  • classifier – A trained PyTorch classifier.

  • enabled_attack – Attack pool selection, and attack order designation for fixed order. For simplicity, the following abbreviations specify each attack type: 0: Hue, 1: Saturation, 2: Rotation, 3: Brightness, 4: Contrast, 5: PGD (L-infinity). Therefore, (0,1,2) means that the attack combines hue, saturation, and rotation; (0,1,2,3,4) means the semantic attacks; (0,1,2,3,4,5) means the full attack.

  • hue_epsilon – The boundary of the hue perturbation. The value is expected to be in the interval [-np.pi, np.pi]. Perturbation of 0 means no shift and -np.pi and np.pi give a complete reversal of the hue channel in the HSV color space in the positive and negative directions, respectively. See kornia.enhance.adjust_hue for more details.

  • sat_epsilon – The boundary of the saturation perturbation. The value is expected to be in the interval [0, infinity]. A perturbation of 0 gives a black-and-white image, 1 gives the original image, and 2 enhances the saturation by a factor of 2. See kornia.enhance.adjust_saturation for more details.

  • rot_epsilon – The boundary of the rotation perturbation (in degrees). Positive values mean counter-clockwise rotation. See kornia.geometry.transform.rotate for more details.

  • bri_epsilon – The boundary of the brightness perturbation. The value is expected to be in the interval [-1, 1]. Perturbation of 0 means no shift, -1 gives a complete black image, and 1 gives a complete white image. See kornia.enhance.adjust_brightness for more details.

  • con_epsilon – The boundary of the contrast perturbation. The value is expected to be in the interval [0, infinity]. Perturbation of 0 gives a complete black image, 1 does not modify the image, and any other value modifies the brightness by this factor. See kornia.enhance.adjust_contrast for more details.

  • pgd_epsilon – The maximum perturbation that the attacker can introduce in the L-infinity ball.

  • early_stop (bool) – When True, the attack will stop if the perturbed example is classified incorrectly by the classifier.

  • max_iter (int) – The maximum number of iterations for attack order optimization.

  • max_inner_iter (int) – The maximum number of iterations for each attack optimization.

  • attack_order (str) – Specify the scheduling type for composite adversarial attack. The value is expected to be fixed, random, or scheduled. fixed means the attack order is the same as specified in enabled_attack. random means the attack order is randomly generated at each iteration. scheduled means to enable the attack order optimization proposed in the paper. If only one attack is enabled, fixed will be used.

  • batch_size (int) – The batch size to use during the generation of adversarial samples.

  • verbose (bool) – Show progress bars.

caa_attack(images: torch.Tensor, labels: torch.Tensor) torch.Tensor

The main algorithm to generate the adversarial examples for composite adversarial attack.

Parameters:
  • images – A tensor of a batch of original inputs to be attacked.

  • labels – A tensor of a batch of the original labels to be predicted.

Returns:

The perturbed data.

caa_brightness(data: torch.Tensor, brightness: torch.Tensor, labels: torch.Tensor) Tuple[torch.Tensor, torch.Tensor]

Compute the adversarial examples for brightness component.

Parameters:
  • data – A tensor of a batch of original inputs to be attacked.

  • brightness – Specify the brightness factor.

  • labels – A tensor of a batch of the original labels to be predicted.

Returns:

The perturbed data and the corresponding brightness factor.

caa_contrast(data: torch.Tensor, contrast: torch.Tensor, labels: torch.Tensor) Tuple[torch.Tensor, torch.Tensor]

Compute the adversarial examples for contrast component.

Parameters:
  • data – A tensor of a batch of original inputs to be attacked.

  • contrast – Specify the contrast factor.

  • labels – A tensor of a batch of the original labels to be predicted.

Returns:

The perturbed data and the corresponding contrast factor.

caa_hue(data: torch.Tensor, hue: torch.Tensor, labels: torch.Tensor) Tuple[torch.Tensor, torch.Tensor]

Compute the adversarial examples for hue component.

Parameters:
  • data – A tensor of a batch of original inputs to be attacked.

  • hue – Specify the hue shift angle.

  • labels – A tensor of a batch of the original labels to be predicted.

Returns:

The perturbed data and the corresponding hue shift angle.

caa_linf(data: torch.Tensor, eta: torch.Tensor, labels: torch.Tensor) Tuple[torch.Tensor, torch.Tensor]

Compute the adversarial examples for L-infinity (PGD) component.

Parameters:
  • data – A tensor of a batch of original inputs to be attacked.

  • eta – The perturbation in the L-infinity ball.

  • labels – A tensor of a batch of the original labels to be predicted.

Returns:

The perturbed data.

caa_rotation(data: torch.Tensor, theta: torch.Tensor, labels: torch.Tensor) Tuple[torch.Tensor, torch.Tensor]

Compute the adversarial examples for rotation component.

Parameters:
  • data – A tensor of a batch of original inputs to be attacked.

  • theta – Specify the rotation angle.

  • labels – A tensor of a batch of the original labels to be predicted.

Returns:

The perturbed data and the corresponding rotation angle.

caa_saturation(data: torch.Tensor, saturation: torch.Tensor, labels: torch.Tensor) Tuple[torch.Tensor, torch.Tensor]

Compute the adversarial examples for saturation component.

Parameters:
  • data – A tensor of a batch of original inputs to be attacked.

  • saturation – Specify the saturation factor.

  • labels – A tensor of a batch of the original labels to be predicted.

Returns:

The perturbed data and the corresponding saturation factor.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate the composite adversarial samples and return them in a Numpy array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – An array with the original labels to be predicted.

Returns:

An array holding the composite adversarial examples.

update_attack_order(images: torch.Tensor, labels: torch.Tensor, adv_val: List) None

Update the specified attack ordering.

Parameters:
  • images – A tensor of a batch of original inputs to be attacked.

  • labels – A tensor of a batch of the original labels to be predicted.

  • adv_val – Optional; A list of a batch of current attack parameters.
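
A minimal end-to-end sketch for CompositeAdversarialAttackPyTorch (illustrative assumptions: kornia is installed, an untrained stand-in CNN replaces a real classifier, and integer class labels are assumed to be accepted). Inputs must be RGB images in [0, 1], as noted above.

    import numpy as np
    import torch

    from art.attacks.evasion import CompositeAdversarialAttackPyTorch
    from art.estimators.classification import PyTorchClassifier

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1),
        torch.nn.Flatten(),
        torch.nn.Linear(8, 10),
    )
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        input_shape=(3, 32, 32),
        nb_classes=10,
        clip_values=(0.0, 1.0),
    )

    x = np.random.rand(4, 3, 32, 32).astype(np.float32)
    y = np.random.randint(0, 10, size=4)

    # Restrict the pool to hue (0), rotation (2) and PGD (5); small iteration counts for brevity.
    attack = CompositeAdversarialAttackPyTorch(
        classifier=classifier,
        enabled_attack=(0, 2, 5),
        max_iter=1,
        max_inner_iter=2,
        batch_size=4,
    )
    x_adv = attack.generate(x=x, y=y)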

Decision Tree Attack

class art.attacks.evasion.DecisionTreeAttack(classifier: ScikitlearnDecisionTreeClassifier, offset: float = 0.001, verbose: bool = True)

Close implementation of Papernot’s attack on decision trees following Algorithm 2 and communication with the authors.

__init__(classifier: ScikitlearnDecisionTreeClassifier, offset: float = 0.001, verbose: bool = True) None

Parameters:
  • classifier (ScikitlearnDecisionTreeClassifier) – A trained scikit-learn decision tree model.

  • offset (float) – How much the value is pushed away from tree’s threshold.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial examples and return them as an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns:

An array holding the adversarial examples.
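
A minimal sketch: SklearnClassifier is assumed to return ART's scikit-learn decision-tree wrapper when given a fitted DecisionTreeClassifier, which is what this attack expects.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    from art.attacks.evasion import DecisionTreeAttack
    from art.estimators.classification import SklearnClassifier

    x, y = load_iris(return_X_y=True)
    model = DecisionTreeClassifier().fit(x, y)
    classifier = SklearnClassifier(model=model)  # wraps the fitted tree for ART

    attack = DecisionTreeAttack(classifier=classifier, offset=0.001)
    x_adv = attack.generate(x=x[:10])  # untargeted by default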

DeepFool

class art.attacks.evasion.DeepFool(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, max_iter: int = 100, epsilon: float = 1e-06, nb_grads: int = 10, batch_size: int = 1, verbose: bool = True)

Implementation of the attack from Moosavi-Dezfooli et al. (2015).

__init__(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, max_iter: int = 100, epsilon: float = 1e-06, nb_grads: int = 10, batch_size: int = 1, verbose: bool = True) None

Create a DeepFool attack instance.

Parameters:
  • classifier – A trained classifier.

  • max_iter (int) – The maximum number of iterations.

  • epsilon (float) – Overshoot parameter.

  • nb_grads (int) – The number of class gradients (top nb_grads w.r.t. prediction) to compute. This way only the most likely classes are considered, speeding up the computation.

  • batch_size (int) – Batch size

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – An array with the original labels to be predicted.

Returns:

An array holding the adversarial examples.
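
A minimal sketch (untrained stand-in model and random inputs, for illustration only):

    import numpy as np
    import torch

    from art.attacks.evasion import DeepFool
    from art.estimators.classification import PyTorchClassifier

    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        input_shape=(1, 28, 28),
        nb_classes=10,
        clip_values=(0.0, 1.0),
    )

    x = np.random.rand(4, 1, 28, 28).astype(np.float32)

    attack = DeepFool(classifier=classifier, max_iter=20, nb_grads=3, batch_size=4)
    x_adv = attack.generate(x=x)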

DPatch

class art.attacks.evasion.DPatch(estimator: OBJECT_DETECTOR_TYPE, patch_shape: Tuple[int, int, int] = (40, 40, 3), learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, verbose: bool = True)

Implementation of the DPatch attack.

__init__(estimator: OBJECT_DETECTOR_TYPE, patch_shape: Tuple[int, int, int] = (40, 40, 3), learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, verbose: bool = True)

Create an instance of the DPatch.

Parameters:
  • estimator – A trained object detector.

  • patch_shape – The shape of the adversarial patch as a tuple of shape (height, width, nb_channels).

  • learning_rate (float) – The learning rate of the optimization.

  • max_iter (int) – The number of optimization steps.

  • batch_size (int) – The size of the training batch.

  • verbose (bool) – Show progress bars.

apply_patch(x: ndarray, patch_external: ndarray | None = None, random_location: bool = False, mask: ndarray | None = None) ndarray

Apply the adversarial patch to images.

Return type:

ndarray

Parameters:
  • x (ndarray) – Images to be patched.

  • patch_external – External patch to apply to images x. If None, the attack’s own patch will be applied.

  • random_location (bool) – True if patch location should be random.

  • mask – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

Returns:

The patched images.

generate(x: ndarray, y: ndarray | None = None, target_label: int | List[int] | ndarray | None = None, **kwargs) ndarray

Generate DPatch.

Return type:

ndarray

Parameters:
  • x (ndarray) – Sample images.

  • y

    True labels of type List[Dict[str, np.ndarray]] for an untargeted attack, one dictionary per input image. The keys and values of the dictionary are:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

    • scores [N]: the scores for each prediction.

  • target_label – The target label of the DPatch attack.

  • mask (np.ndarray) – A boolean array of shape equal to the shape of a single sample (1, H, W) or the shape of x (N, H, W) without their channel dimensions. Any features for which the mask is True can be the center location of the patch during sampling.

Returns:

Adversarial patch.
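
A minimal sketch, assuming ART's PyTorchFasterRCNN wrapper with its default pretrained torchvision backbone (downloading the weights can take a while) and random placeholder images; parameter values are illustrative only.

    import numpy as np

    from art.attacks.evasion import DPatch
    from art.estimators.object_detection import PyTorchFasterRCNN

    detector = PyTorchFasterRCNN(
        clip_values=(0, 255),
        attack_losses=["loss_classifier", "loss_box_reg", "loss_objectness", "loss_rpn_box_reg"],
    )

    x = np.random.randint(0, 255, size=(2, 400, 400, 3)).astype(np.float32)  # placeholder NHWC images

    attack = DPatch(estimator=detector, patch_shape=(40, 40, 3), max_iter=2, batch_size=1)
    patch = attack.generate(x=x)                                # untargeted: detector predictions are used
    x_patched = attack.apply_patch(x=x, random_location=True)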

RobustDPatch

class art.attacks.evasion.RobustDPatch(estimator: OBJECT_DETECTOR_TYPE, patch_shape: Tuple[int, int, int] = (40, 40, 3), patch_location: Tuple[int, int] = (0, 0), crop_range: Tuple[int, int] = (0, 0), brightness_range: Tuple[float, float] = (1.0, 1.0), rotation_weights: Tuple[float, float, float, float] | Tuple[int, int, int, int] = (1, 0, 0, 0), sample_size: int = 1, learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, targeted: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Implementation of a particular variation of the DPatch attack. It follows Lee & Kolter (2019) in using sign gradients with expectations over transformations. The particular transformations supported in this implementation are cropping, rotations by multiples of 90 degrees, and changes in the brightness of the image.

Paper link (original DPatch): https://arxiv.org/abs/1806.02299v4
Paper link (physical-world patch from Lee & Kolter): https://arxiv.org/abs/1906.11897
__init__(estimator: OBJECT_DETECTOR_TYPE, patch_shape: Tuple[int, int, int] = (40, 40, 3), patch_location: Tuple[int, int] = (0, 0), crop_range: Tuple[int, int] = (0, 0), brightness_range: Tuple[float, float] = (1.0, 1.0), rotation_weights: Tuple[float, float, float, float] | Tuple[int, int, int, int] = (1, 0, 0, 0), sample_size: int = 1, learning_rate: float = 5.0, max_iter: int = 500, batch_size: int = 16, targeted: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Create an instance of the RobustDPatch.

Parameters:
  • estimator – A trained object detector.

  • patch_shape – The shape of the adversarial patch as a tuple of shape (height, width, nb_channels).

  • patch_location – The location of the adversarial patch as a tuple of shape (upper left x, upper left y).

  • crop_range – By how much the images may be cropped as a tuple of shape (height, width).

  • brightness_range – Range for randomly adjusting the brightness of the image.

  • rotation_weights – Sampling weights for random image rotations by (0, 90, 180, 270) degrees counter-clockwise.

  • sample_size (int) – Number of samples to be used in expectations over transformation.

  • learning_rate (float) – The learning rate of the optimization.

  • max_iter (int) – The number of optimization steps.

  • batch_size (int) – The size of the training batch.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • summary_writer – Activate a summary writer for TensorBoard. Default is False (summary writer deactivated). If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

  • verbose (bool) – Show progress bars.

apply_patch(x: ndarray, patch_external: ndarray | None = None) ndarray

Apply the adversarial patch to images.

Return type:

ndarray

Parameters:
  • x (ndarray) – Images to be patched.

  • patch_external – External patch to apply to images x. If None, the attack's own patch will be applied.

Returns:

The patched images.

generate(x: ndarray, y: List[Dict[str, ndarray]] | None = None, **kwargs) ndarray

Generate RobustDPatch.

Return type:

ndarray

Parameters:
  • x (ndarray) – Sample images.

  • y – Target labels for object detector.

Returns:

Adversarial patch.
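
A minimal usage sketch, assuming detector is an ART object detection estimator created elsewhere and x is a NumPy array of images; the transformation parameters shown are illustrative only:

    from art.attacks.evasion import RobustDPatch

    # detector: an ART object detector wrapper created elsewhere (assumed)
    attack = RobustDPatch(
        estimator=detector,
        patch_shape=(40, 40, 3),
        patch_location=(16, 16),        # upper-left corner of the patch (illustrative)
        brightness_range=(0.8, 1.2),    # random brightness range in the EoT sampling (illustrative)
        rotation_weights=(1, 1, 1, 1),  # sample 0/90/180/270 degree rotations uniformly
        sample_size=2,
        max_iter=500,
    )
    patch = attack.generate(x=x)        # returns the optimized patch
    x_patched = attack.apply_patch(x)   # patch is inserted at patch_location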

Elastic Net Attack

class art.attacks.evasion.ElasticNet(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, binary_search_steps: int = 9, max_iter: int = 100, beta: float = 0.001, initial_const: float = 0.001, batch_size: int = 1, decision_rule: str = 'EN', verbose: bool = True)

The elastic net attack of Pin-Yu Chen et al. (2018).

__init__(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, binary_search_steps: int = 9, max_iter: int = 100, beta: float = 0.001, initial_const: float = 0.001, batch_size: int = 1, decision_rule: str = 'EN', verbose: bool = True) None

Create an ElasticNet attack instance.

Parameters:
  • classifier – A trained classifier.

  • confidence (float) – Confidence of adversarial examples: a higher value produces examples that are farther away from the original input, but classified with higher confidence as the target class.

  • targeted (bool) – Should the attack target one specific class.

  • learning_rate (float) – The initial learning rate for the attack algorithm. Smaller values produce better results but are slower to converge.

  • binary_search_steps (int) – Number of times to adjust constant with binary search (positive value).

  • max_iter (int) – The maximum number of iterations.

  • beta (float) – Hyperparameter trading off L2 minimization for L1 minimization.

  • initial_const (float) – The initial trade-off constant c to use to tune the relative importance of distance and confidence. If binary_search_steps is large, the initial constant is not important, as discussed in Carlini and Wagner (2016).

  • batch_size (int) – Internal size of batches on which adversarial samples are generated.

  • decision_rule (str) – Decision rule. ‘EN’ means Elastic Net rule, ‘L1’ means L1 rule, ‘L2’ means L2 rule.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). If self.targeted is true, then y represents the target labels. Otherwise, the targets are the original class labels.

Returns:

An array holding the adversarial examples.
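
A minimal usage sketch, assuming classifier is an ART classifier with class-gradient support created elsewhere and x_test is a NumPy array of inputs:

    from art.attacks.evasion import ElasticNet

    # classifier: an ART classifier wrapper created elsewhere (assumed)
    attack = ElasticNet(classifier=classifier, targeted=False, beta=1e-3, decision_rule="EN")
    x_adv = attack.generate(x=x_test)   # untargeted: original class labels are used as targets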

Fast Gradient Method (FGM)

class art.attacks.evasion.FastGradientMethod(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, minimal: bool = False, summary_writer: str | bool | SummaryWriter = False)

This attack was originally implemented by Goodfellow et al. (2015) with the infinity norm (and is known as the “Fast Gradient Sign Method”). This implementation extends the attack to other norms, and is therefore called the Fast Gradient Method.

__init__(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, minimal: bool = False, summary_writer: str | bool | SummaryWriter = False) None

Create a FastGradientMethod instance.

Parameters:
  • estimator – A trained classifier.

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 1 or 2.

  • eps – Attack step size (input variation).

  • eps_step – Step size of input variation for minimal perturbation computation.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False)

  • num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 the attack starts at the original input.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • minimal (bool) – Indicates if computing the minimal perturbation (True). If True, also define eps_step for the step size and eps for the maximum perturbation.

  • summary_writer – Activate a summary writer for TensorBoard. Default is False (summary writer deactivated). If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

Returns:

An array holding the adversarial examples.
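
A minimal usage sketch, assuming classifier is an ART classifier with loss-gradient support created elsewhere and x_test is a NumPy array of inputs:

    from art.attacks.evasion import FastGradientMethod

    # classifier: an ART classifier wrapper created elsewhere (assumed)
    attack = FastGradientMethod(estimator=classifier, norm="inf", eps=0.1)
    x_adv = attack.generate(x=x_test)   # y omitted: model predictions are used to avoid label leaking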

Feature Adversaries - Numpy

class art.attacks.evasion.FeatureAdversariesNumpy(classifier: CLASSIFIER_NEURALNETWORK_TYPE, delta: float | None = None, layer: int | None = None, batch_size: int = 32)

This class represents a Feature Adversaries evasion attack.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, delta: float | None = None, layer: int | None = None, batch_size: int = 32)

Create a FeatureAdversaries instance.

Parameters:
  • classifier – A trained classifier.

  • delta – The maximum deviation between source and guide images.

  • layer – Index of the representation layer.

  • batch_size (int) – Batch size.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – Source samples.

  • y – Guide samples.

  • kwargs

    The kwargs are used as options for the minimisation with scipy.optimize.minimize using method=”L-BFGS-B”. Valid options are based on the output of scipy.optimize.show_options(solver=’minimize’, method=’L-BFGS-B’): Minimize a scalar function of one or more variables using the L-BFGS-B algorithm.

    disp (None or int) – If disp is None (the default), then the supplied version of iprint is used. If disp is not None, then it overrides the supplied version of iprint with the behaviour you outlined.

    maxcor (int) – The maximum number of variable metric corrections used to define the limited memory matrix. (The limited memory BFGS method does not store the full hessian but uses this many terms in an approximation to it.)

    ftol (float) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= ftol.

    gtol (float) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= gtol where pg_i is the i-th component of the projected gradient.

    eps (float) – Step size used for numerical approximation of the Jacobian.

    maxfun (int) – Maximum number of function evaluations.

    maxiter (int) – Maximum number of iterations.

    iprint (int, optional) – Controls the frequency of output. iprint < 0 means no output; iprint = 0 print only one line at the last iteration; 0 < iprint < 99 print also f and |proj g| every iprint iterations; iprint = 99 print details of every iteration except n-vectors; iprint = 100 print also the changes of active set and final x; iprint > 100 print details of every iteration including x and g.

    callback (callable, optional) – Called after each iteration, as callback(xk), where xk is the current parameter vector.

    maxls (int, optional) – Maximum number of line search steps (per iteration). Default is 20.

    The option ftol is exposed via the scipy.optimize.minimize interface, but calling scipy.optimize.fmin_l_bfgs_b directly exposes factr. The relationship between the two is ftol = factr * numpy.finfo(float).eps. I.e., factr multiplies the default machine floating-point precision to arrive at ftol.

Returns:

Adversarial examples.

Raises:

KeyError – If an argument in kwargs is not an allowed option for scipy.optimize.minimize using method=”L-BFGS-B”.
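
A minimal usage sketch, assuming classifier is an ART neural-network classifier created elsewhere and x_source, x_guide are NumPy arrays of equal shape; the extra keyword arguments are forwarded to L-BFGS-B as described above:

    from art.attacks.evasion import FeatureAdversariesNumpy

    # classifier: an ART neural-network classifier created elsewhere (assumed)
    attack = FeatureAdversariesNumpy(classifier=classifier, delta=0.1, layer=1, batch_size=32)
    # maxiter and ftol are L-BFGS-B options (see the option list above)
    x_adv = attack.generate(x=x_source, y=x_guide, maxiter=100, ftol=1e-6)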

Feature Adversaries - PyTorch

class art.attacks.evasion.FeatureAdversariesPyTorch(estimator: PYTORCH_ESTIMATOR_TYPE, delta: float, optimizer: Optimizer | None = None, optimizer_kwargs: dict | None = None, lambda_: float = 0.0, layer: int | str | Tuple[int, ...] | Tuple[str, ...] = -1, max_iter: int = 100, batch_size: int = 32, step_size: int | float | None = None, random_start: bool = False, verbose: bool = True)

This class represents a Feature Adversaries evasion attack in PyTorch.

__init__(estimator: PYTORCH_ESTIMATOR_TYPE, delta: float, optimizer: Optimizer | None = None, optimizer_kwargs: dict | None = None, lambda_: float = 0.0, layer: int | str | Tuple[int, ...] | Tuple[str, ...] = -1, max_iter: int = 100, batch_size: int = 32, step_size: int | float | None = None, random_start: bool = False, verbose: bool = True)

Create a FeatureAdversariesPyTorch instance.

Parameters:
  • estimator – A trained estimator.

  • delta (float) – The maximum deviation between source and guide images.

  • optimizer – Optimizer applied to the problem constrained only by clip values (if defined); if None, the Projected Gradient Descent (PGD) optimizer is used.

  • optimizer_kwargs – Additional optimizer arguments.

  • lambda_ (float) – Regularization parameter of the L-inf soft constraint.

  • layer – Index or tuple of indices of the representation layer(s).

  • max_iter (int) – The maximum number of iterations.

  • batch_size (int) – Batch size.

  • step_size – Step size for PGD optimizer.

  • random_start (bool) – Randomly initialize perturbations when using the Projected Gradient Descent variant.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – Source samples.

  • y – Guide samples.

Returns:

Adversarial examples.
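
A minimal usage sketch, assuming estimator is an ART PyTorch estimator created elsewhere and x_source, x_guide are NumPy arrays of equal shape:

    from art.attacks.evasion import FeatureAdversariesPyTorch

    # estimator: an ART PyTorch estimator created elsewhere (assumed)
    attack = FeatureAdversariesPyTorch(
        estimator=estimator,
        delta=0.1,
        layer=-1,          # last representation layer
        max_iter=100,
        step_size=0.01,    # step size of the default PGD optimizer (no optimizer passed)
        random_start=True,
    )
    x_adv = attack.generate(x=x_source, y=x_guide)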

Feature Adversaries - TensorFlow

class art.attacks.evasion.FeatureAdversariesTensorFlowV2(estimator: TENSORFLOWV2_ESTIMATOR_TYPE, delta: float, optimizer: Optimizer | None = None, optimizer_kwargs: dict | None = None, lambda_: float = 0.0, layer: int | str | Tuple[int, ...] | Tuple[str, ...] = -1, max_iter: int = 100, batch_size: int = 32, step_size: int | float | None = None, random_start: bool = False, verbose: bool = True)

This class represents a Feature Adversaries evasion attack in TensorFlow v2.

__init__(estimator: TENSORFLOWV2_ESTIMATOR_TYPE, delta: float, optimizer: Optimizer | None = None, optimizer_kwargs: dict | None = None, lambda_: float = 0.0, layer: int | str | Tuple[int, ...] | Tuple[str, ...] = -1, max_iter: int = 100, batch_size: int = 32, step_size: int | float | None = None, random_start: bool = False, verbose: bool = True)

Create a FeatureAdversariesTensorFlowV2 instance.

Parameters:
  • estimator – A trained estimator.

  • delta (float) – The maximum deviation between source and guide images.

  • optimizer – Optimizer applied to the problem constrained only by clip values (if defined); if None, the Projected Gradient Descent (PGD) optimizer is used.

  • optimizer_kwargs – Additional optimizer arguments.

  • lambda_ (float) – Regularization parameter of the L-inf soft constraint.

  • layer – Index or tuple of indices of the representation layer(s).

  • max_iter (int) – The maximum number of iterations.

  • batch_size (int) – Batch size.

  • step_size – Step size for PGD optimizer.

  • random_start (bool) – Randomly initialize perturbations when using the Projected Gradient Descent variant.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – Source samples.

  • y – Guide samples.

Returns:

Adversarial examples.
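
The TensorFlow v2 variant is used the same way; a minimal sketch, assuming estimator is an ART TensorFlow v2 estimator created elsewhere, x_source and x_guide are NumPy arrays of equal shape, and "dense_1" is a hypothetical layer name of the wrapped model:

    from art.attacks.evasion import FeatureAdversariesTensorFlowV2

    # estimator: an ART TensorFlow v2 estimator created elsewhere (assumed)
    attack = FeatureAdversariesTensorFlowV2(
        estimator=estimator,
        delta=0.1,
        layer="dense_1",   # hypothetical layer name; an integer index also works
        max_iter=100,
        step_size=0.01,
    )
    x_adv = attack.generate(x=x_source, y=x_guide)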

Frame Saliency Attack

class art.attacks.evasion.FrameSaliencyAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, attacker: EvasionAttack, method: str = 'iterative_saliency', frame_index: int = 1, batch_size: int = 1, verbose: bool = True)

Implementation of the attack framework proposed by Inkawhich et al. (2018). Prioritizes the frame of a sequential input to be adversarially perturbed based on the saliency score of each frame.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, attacker: EvasionAttack, method: str = 'iterative_saliency', frame_index: int = 1, batch_size: int = 1, verbose: bool = True)
Parameters:
  • classifier – A trained classifier.

  • attacker (EvasionAttack) – An adversarial evasion attacker which supports masking. Currently supported: ProjectedGradientDescent, BasicIterativeMethod, FastGradientMethod.

  • method (str) – Specifies which method to use: “iterative_saliency” (adds perturbation iteratively to frame with highest saliency score until attack is successful), “iterative_saliency_refresh” (updates perturbation after each iteration), “one_shot” (adds all perturbations at once, i.e. defaults to original attack).

  • frame_index (int) – Index of the axis in input (feature) array x representing the frame dimension.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – An array with the original labels to be predicted.

Returns:

An array holding the adversarial examples.
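
A minimal usage sketch, assuming classifier is an ART neural-network classifier for sequential (e.g. video) inputs created elsewhere and x_test is a NumPy array with the frame dimension on axis 1:

    from art.attacks.evasion import FastGradientMethod, FrameSaliencyAttack

    # classifier: an ART classifier wrapper created elsewhere (assumed)
    inner_attack = FastGradientMethod(estimator=classifier, eps=0.1)  # the attacker must support masking
    attack = FrameSaliencyAttack(
        classifier=classifier,
        attacker=inner_attack,
        method="iterative_saliency",
        frame_index=1,
    )
    x_adv = attack.generate(x=x_test)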

Geometric Decision Based Attack

class art.attacks.evasion.GeoDA(estimator: CLASSIFIER_TYPE, batch_size: int = 64, norm: int | float | str = 2, sub_dim: int = 10, max_iter: int = 4000, bin_search_tol: float = 0.1, lambda_param: float = 0.6, sigma: float = 0.0002, verbose: bool = True)

Implementation of the Geometric Decision-based Attack (GeoDA), a black-box attack requiring class predictions. Based on reference implementation: https://github.com/thisisalirah/GeoDA

__init__(estimator: CLASSIFIER_TYPE, batch_size: int = 64, norm: int | float | str = 2, sub_dim: int = 10, max_iter: int = 4000, bin_search_tol: float = 0.1, lambda_param: float = 0.6, sigma: float = 0.0002, verbose: bool = True) None

Create a Geometric Decision-based Attack instance.

Parameters:
  • estimator – A trained classifier.

  • batch_size (int) – The size of the batch used by the estimator during inference.

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 1 or 2.

  • sub_dim (int) – Dimensionality of 2D frequency space (DCT).

  • max_iter (int) – Maximum number of iterations.

  • bin_search_tol (float) – Maximum remaining L2 perturbation defining binary search convergence. Input images are normalised by maximal estimator.clip_values[1] if available or the maximal value in the input image.

  • lambda_param (float) – The lambda of equation 19 with lambda_param=0 corresponding to a single iteration and lambda_param=1 to a uniform distribution of iterations per step.

  • sigma (float) – Variance of the Gaussian perturbation.

  • targeted – Should the attack target one specific class.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). If self.targeted is true, then y represents the target labels.

Returns:

The adversarial examples.
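
A minimal usage sketch, assuming classifier is an ART classifier providing class predictions created elsewhere and x_test is a NumPy array of inputs:

    from art.attacks.evasion import GeoDA

    # classifier: an ART classifier wrapper created elsewhere (assumed)
    attack = GeoDA(estimator=classifier, norm=2, sub_dim=10, max_iter=4000)
    x_adv = attack.generate(x=x_test)   # black-box: only the estimator's class predictions are used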

GRAPHITE - Blackbox

class art.attacks.evasion.GRAPHITEBlackbox(classifier: CLASSIFIER_NEURALNETWORK_TYPE, noise_size: Tuple[int, int], net_size: Tuple[int, int], heat_patch_size: Tuple[int, int] = (4, 4), heat_patch_stride: Tuple[int, int] = (1, 1), heatmap_mode: str = 'Target', tr_lo: float = 0.65, tr_hi: float = 0.85, num_xforms_mask: int = 100, max_mask_size: int = -1, beta: float = 1.0, eta: float = 500, num_xforms_boost: int = 100, num_boost_queries: int = 20000, rotation_range: Tuple[float, float] = (-30.0, 30.0), dist_range: Tuple[float, float] = (0.0, 0.0), gamma_range: Tuple[float, float] = (1.0, 2.0), crop_percent_range: Tuple[float, float] = (-0.03125, 0.03125), off_x_range: Tuple[float, float] = (-0.03125, 0.03125), off_y_range: Tuple[float, float] = (-0.03125, 0.03125), blur_kernels: Tuple[int, int] | List[int] = (0, 3), batch_size: int = 64)

Implementation of the hard-label GRAPHITE attack from Feng et al. (2022). This is a physical, black-box attack that only requires final class prediction and generates robust physical perturbations that can be applied as stickers.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, noise_size: Tuple[int, int], net_size: Tuple[int, int], heat_patch_size: Tuple[int, int] = (4, 4), heat_patch_stride: Tuple[int, int] = (1, 1), heatmap_mode: str = 'Target', tr_lo: float = 0.65, tr_hi: float = 0.85, num_xforms_mask: int = 100, max_mask_size: int = -1, beta: float = 1.0, eta: float = 500, num_xforms_boost: int = 100, num_boost_queries: int = 20000, rotation_range: Tuple[float, float] = (-30.0, 30.0), dist_range: Tuple[float, float] = (0.0, 0.0), gamma_range: Tuple[float, float] = (1.0, 2.0), crop_percent_range: Tuple[float, float] = (-0.03125, 0.03125), off_x_range: Tuple[float, float] = (-0.03125, 0.03125), off_y_range: Tuple[float, float] = (-0.03125, 0.03125), blur_kernels: Tuple[int, int] | List[int] = (0, 3), batch_size: int = 64) None

Create a GRAPHITEBlackbox attack instance.

Parameters:
  • classifier – A trained classifier.

  • noise_size – The resolution to generate perturbations in (w, h).

  • net_size – The resolution to resize images to before feeding to the model in (w, h).

  • heat_patch_size – The size of the heatmap patches in (w, h).

  • heat_patch_stride – The stride of the heatmap patching in (w, h).

  • heatmap_mode (str) – The mode of heatmap in [‘Target’, ‘Random’].

  • tr_lo (float) – Threshold for fine-grained reduction.

  • tr_hi (float) – Threshold for coarse-grained reduction.

  • num_xforms_mask (int) – The number of transforms to use in mask generation.

  • max_mask_size (int) – Optionally specify that you just want to optimize until a mask size of <= max_mask_size.

  • beta (float) – The parameter beta for RGF optimization in boosting.

  • eta (float) – The step size for RGF optimization in boosting.

  • num_xforms_boost (int) – The number of transforms to use in boosting.

  • num_boost_queries (int) – The number of queries to use in boosting.

  • rotation_range – The range of the rotation in the perspective transform.

  • dist_range – The range of the distance (in ft) to be added to the focal length in perspective transform.

  • gamma_range – The range of the gamma in the gamma transform.

  • crop_percent_range – The range of the crop percent in the perspective transform.

  • off_x_range – The range of the x offset (percent) in the perspective transform.

  • off_y_range – The range of the y offset (percent) in the perspective transform.

  • blur_kernels – The kernels to blur with.

  • batch_size (int) – The size of the batch used by the estimator during inference.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

  • mask – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

  • x_tar – Initial array to act as the example target image.

  • pts – Optional points to consider when cropping the perspective transform. An array of points in [x, y, scale] with shape [num points, 3, 1].

  • obj_width – The estimated object width (inches) for perspective transform. 30 by default.

  • focal – The estimated focal length (ft) for perspective transform. 3 by default.

Returns:

An array holding the adversarial examples.
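
A minimal usage sketch, assuming classifier is an ART neural-network classifier for 32x32 images created elsewhere, and x (source images), y (target labels, one-hot or indices) and x_tar (example images of the target class) are NumPy arrays prepared elsewhere; the resolutions are illustrative:

    from art.attacks.evasion import GRAPHITEBlackbox

    # classifier: an ART classifier wrapper created elsewhere (assumed)
    attack = GRAPHITEBlackbox(
        classifier=classifier,
        noise_size=(32, 32),   # resolution of the generated perturbation (illustrative)
        net_size=(32, 32),     # resolution expected by the model (illustrative)
    )
    x_adv = attack.generate(x=x, y=y, x_tar=x_tar)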

GRAPHITE - Whitebox - PyTorch

class art.attacks.evasion.GRAPHITEWhiteboxPyTorch(classifier: PyTorchClassifier, net_size: Tuple[int, int], min_tr: float = 0.8, num_xforms: int = 100, step_size: float = 0.0157, steps: int = 50, first_steps: int = 500, patch_removal_size: float = 4, patch_removal_interval: float = 2, num_patches_to_remove: int = 4, rand_start_epsilon_range: Tuple[float, float] = (-0.03137254901960784, 0.03137254901960784), rotation_range: Tuple[float, float] = (-30.0, 30.0), dist_range: Tuple[float, float] = (0.0, 0.0), gamma_range: Tuple[float, float] = (1.0, 2.0), crop_percent_range: Tuple[float, float] = (-0.03125, 0.03125), off_x_range: Tuple[float, float] = (-0.03125, 0.03125), off_y_range: Tuple[float, float] = (-0.03125, 0.03125), blur_kernels: Tuple[int, int] | List[int] = (0, 3), batch_size: int = 64)

Implementation of the white-box PyTorch GRAPHITE attack from Feng et al. (2022). This is a physical attack that generates robust physical perturbations that can be applied as stickers.

__init__(classifier: PyTorchClassifier, net_size: Tuple[int, int], min_tr: float = 0.8, num_xforms: int = 100, step_size: float = 0.0157, steps: int = 50, first_steps: int = 500, patch_removal_size: float = 4, patch_removal_interval: float = 2, num_patches_to_remove: int = 4, rand_start_epsilon_range: Tuple[float, float] = (-0.03137254901960784, 0.03137254901960784), rotation_range: Tuple[float, float] = (-30.0, 30.0), dist_range: Tuple[float, float] = (0.0, 0.0), gamma_range: Tuple[float, float] = (1.0, 2.0), crop_percent_range: Tuple[float, float] = (-0.03125, 0.03125), off_x_range: Tuple[float, float] = (-0.03125, 0.03125), off_y_range: Tuple[float, float] = (-0.03125, 0.03125), blur_kernels: Tuple[int, int] | List[int] = (0, 3), batch_size: int = 64) None

Create a GRAPHITEWhiteboxPyTorch attack instance.

Parameters:
  • classifier – A trained classifier.

  • net_size – The resolution to resize images to before feeding to the model in (w, h).

  • min_tr (float) – minimum threshold for EoT PGD to reach.

  • num_xforms (int) – The number of transforms to use.

  • step_size (float) – The step size.

  • steps (int) – The number of steps for EoT PGD after the first iteration.

  • first_steps (int) – The number of steps for EoT PGD for the first iteration.

  • patch_removal_size (float) – size of patch removal.

  • patch_removal_interval (float) – stride for patch removal.

  • num_patches_to_remove (int) – the number of patches to remove per iteration.

  • rand_start_epsilon_range – the range for random start init.

  • rotation_range – The range of the rotation in the perspective transform.

  • dist_range – The range of the dist (in ft) to be added to the focal length in the perspective transform.

  • gamma_range – The range of the gamma in the gamma transform.

  • crop_percent_range – The range of the crop percent in the perspective transform.

  • off_x_range – The range of the x offset (percent) in the perspective transform.

  • off_y_range – The range of the y offset (percent) in the perspective transform.

  • blur_kernels – The kernels to blur with.

  • batch_size (int) – The size of the batch used by the estimator during inference.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

  • mask – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

  • pts – Optional points to consider when cropping the perspective transform. An array of points in [x, y, scale] with shape [num points, 3, 1].

  • obj_width – The estimated object width (inches) for perspective transform. 30 by default.

  • focal – The estimated focal length (ft) for perspective transform. 3 by default.

Returns:

An array holding the adversarial examples.

High Confidence Low Uncertainty Attack

class art.attacks.evasion.HighConfidenceLowUncertainty(classifier: GPyGaussianProcessClassifier, conf: float = 0.95, unc_increase: float = 100.0, min_val: float = 0.0, max_val: float = 1.0, verbose: bool = True)

Implementation of the High-Confidence-Low-Uncertainty (HCLU) adversarial example formulation by Grosse et al. (2018).

__init__(classifier: GPyGaussianProcessClassifier, conf: float = 0.95, unc_increase: float = 100.0, min_val: float = 0.0, max_val: float = 1.0, verbose: bool = True) None
Parameters:
  • classifier (GPyGaussianProcessClassifier) – A trained model of type GPyGaussianProcessClassifier.

  • conf (float) – Confidence with which the adversarial examples should be classified, where 1.0 is maximal confidence.

  • unc_increase (float) – Factor by which the uncertainty is allowed to deviate from that of the original input, where 1.0 keeps the original value.

  • min_val (float) – minimal value any feature can take.

  • max_val (float) – maximal value any feature can take.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial examples and return them as an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns:

An array holding the adversarial examples.

HopSkipJump Attack

class art.attacks.evasion.HopSkipJump(classifier: CLASSIFIER_TYPE, batch_size: int = 64, targeted: bool = False, norm: int | float | str = 2, max_iter: int = 50, max_eval: int = 10000, init_eval: int = 100, init_size: int = 100, verbose: bool = True)

Implementation of the HopSkipJump attack from Chen et al. (2019). This is a powerful black-box attack that only requires final class prediction, and is an advanced version of the boundary attack.

__init__(classifier: CLASSIFIER_TYPE, batch_size: int = 64, targeted: bool = False, norm: int | float | str = 2, max_iter: int = 50, max_eval: int = 10000, init_eval: int = 100, init_size: int = 100, verbose: bool = True) None

Create a HopSkipJump attack instance.

Parameters:
  • classifier – A trained classifier.

  • batch_size (int) – The size of the batch used by the estimator during inference.

  • targeted (bool) – Should the attack target one specific class.

  • norm – Order of the norm. Possible values: “inf”, np.inf or 2.

  • max_iter (int) – Maximum number of iterations.

  • max_eval (int) – Maximum number of evaluations for estimating gradient.

  • init_eval (int) – Initial number of evaluations for estimating gradient.

  • init_size (int) – Maximum number of trials for initial generation of adversarial examples.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

  • x_adv_init (np.ndarray) – Initial array to act as initial adversarial examples. Same shape as x.

  • resume (bool) – Allow users to continue their previous attack.

Returns:

An array holding the adversarial examples.
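
A minimal usage sketch, assuming classifier is an ART classifier providing class predictions created elsewhere and x_test is a NumPy array of inputs; the second call shows how a previous run can be resumed:

    from art.attacks.evasion import HopSkipJump

    # classifier: an ART classifier wrapper created elsewhere (assumed)
    attack = HopSkipJump(classifier=classifier, targeted=False, norm=2, max_iter=50)
    x_adv = attack.generate(x=x_test)
    # refine the previous result with further iterations
    x_adv = attack.generate(x=x_test, x_adv_init=x_adv, resume=True)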

Imperceptible ASR Attack

class art.attacks.evasion.ImperceptibleASR(estimator: SPEECH_RECOGNIZER_TYPE, masker: PsychoacousticMasker, eps: float = 2000.0, learning_rate_1: float = 100.0, max_iter_1: int = 1000, alpha: float = 0.05, learning_rate_2: float = 1.0, max_iter_2: int = 4000, loss_theta_min: float = 0.05, decrease_factor_eps: float = 0.8, num_iter_decrease_eps: int = 10, increase_factor_alpha: float = 1.2, num_iter_increase_alpha: int = 20, decrease_factor_alpha: float = 0.8, num_iter_decrease_alpha: int = 50, batch_size: int = 1)

Implementation of the imperceptible attack against a speech recognition model.

__init__(estimator: SPEECH_RECOGNIZER_TYPE, masker: PsychoacousticMasker, eps: float = 2000.0, learning_rate_1: float = 100.0, max_iter_1: int = 1000, alpha: float = 0.05, learning_rate_2: float = 1.0, max_iter_2: int = 4000, loss_theta_min: float = 0.05, decrease_factor_eps: float = 0.8, num_iter_decrease_eps: int = 10, increase_factor_alpha: float = 1.2, num_iter_increase_alpha: int = 20, decrease_factor_alpha: float = 0.8, num_iter_decrease_alpha: int = 50, batch_size: int = 1) None

Create an instance of the ImperceptibleASR.

The default parameters assume that audio input is in the int16 range. If using normalized audio input, the parameters eps and learning_rate_{1,2} need to be scaled by a factor of 2^-15.

Parameters:
  • estimator – A trained speech recognition estimator.

  • masker – A Psychoacoustic masker.

  • eps (float) – Initial max norm bound for adversarial perturbation.

  • learning_rate_1 (float) – Learning rate for stage 1 of attack.

  • max_iter_1 (int) – Number of iterations for stage 1 of attack.

  • alpha (float) – Initial alpha value for balancing stage 2 loss.

  • learning_rate_2 (float) – Learning rate for stage 2 of attack.

  • max_iter_2 (int) – Number of iterations for stage 2 of attack.

  • loss_theta_min (float) – If imperceptible loss reaches minimum, stop early. Works best with batch_size=1.

  • decrease_factor_eps (float) – Decrease factor for epsilon (Paper default: 0.8).

  • num_iter_decrease_eps (int) – Iterations after which to decrease epsilon if attack succeeds (Paper default: 10).

  • increase_factor_alpha (float) – Increase factor for alpha (Paper default: 1.2).

  • num_iter_increase_alpha (int) – Iterations after which to increase alpha if attack succeeds (Paper default: 20).

  • decrease_factor_alpha (float) – Decrease factor for alpha (Paper default: 0.8).

  • num_iter_decrease_alpha (int) – Iterations after which to decrease alpha if attack fails (Paper default: 50).

  • batch_size (int) – Batch size.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate imperceptible, adversarial examples.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values of shape (batch_size,). Each sample in y is a string and the samples may have different lengths. A possible example of y could be: y = np.array([‘SIXTY ONE’, ‘HELLO’]).

Returns:

An array holding the adversarial examples.

Imperceptible ASR Attack - PyTorch

class art.attacks.evasion.ImperceptibleASRPyTorch(estimator: PyTorchDeepSpeech, eps: float = 0.05, max_iter_1: int = 10, max_iter_2: int = 4000, learning_rate_1: float = 0.001, learning_rate_2: float = 0.0005, optimizer_1: torch.optim.Optimizer | None = None, optimizer_2: torch.optim.Optimizer | None = None, global_max_length: int = 200000, initial_rescale: float = 1.0, decrease_factor_eps: float = 0.8, num_iter_decrease_eps: int = 1, alpha: float = 1.2, increase_factor_alpha: float = 1.2, num_iter_increase_alpha: int = 20, decrease_factor_alpha: float = 0.8, num_iter_decrease_alpha: int = 20, win_length: int = 2048, hop_length: int = 512, n_fft: int = 2048, batch_size: int = 32, use_amp: bool = False, opt_level: str = 'O1')

This class implements the imperceptible, robust, and targeted attack to generate adversarial examples for automatic speech recognition models. The attack is implemented specifically for the DeepSpeech model and is framework-dependent (PyTorch).

__init__(estimator: PyTorchDeepSpeech, eps: float = 0.05, max_iter_1: int = 10, max_iter_2: int = 4000, learning_rate_1: float = 0.001, learning_rate_2: float = 0.0005, optimizer_1: torch.optim.Optimizer | None = None, optimizer_2: torch.optim.Optimizer | None = None, global_max_length: int = 200000, initial_rescale: float = 1.0, decrease_factor_eps: float = 0.8, num_iter_decrease_eps: int = 1, alpha: float = 1.2, increase_factor_alpha: float = 1.2, num_iter_increase_alpha: int = 20, decrease_factor_alpha: float = 0.8, num_iter_decrease_alpha: int = 20, win_length: int = 2048, hop_length: int = 512, n_fft: int = 2048, batch_size: int = 32, use_amp: bool = False, opt_level: str = 'O1')

Create a ImperceptibleASRPyTorch instance.

Parameters:
  • estimator (PyTorchDeepSpeech) – A trained estimator.

  • eps (float) – Maximum perturbation that the attacker can introduce.

  • max_iter_1 (int) – The maximum number of iterations applied for the first stage of the optimization of the attack.

  • max_iter_2 (int) – The maximum number of iterations applied for the second stage of the optimization of the attack.

  • learning_rate_1 (float) – The learning rate applied for the first stage of the optimization of the attack.

  • learning_rate_2 (float) – The learning rate applied for the second stage of the optimization of the attack.

  • optimizer_1 – The optimizer applied for the first stage of the optimization of the attack. If None attack will use torch.optim.Adam.

  • optimizer_2 – The optimizer applied for the second stage of the optimization of the attack. If None attack will use torch.optim.Adam.

  • global_max_length (int) – The length of the longest audio signal allowed by this attack.

  • initial_rescale (float) – Initial rescale coefficient to speedup the decrease of the perturbation size during the first stage of the optimization of the attack.

  • decrease_factor_eps (float) – The factor to adjust the rescale coefficient during the first stage of the optimization of the attack.

  • num_iter_decrease_eps (int) – Number of iterations to adjust the rescale coefficient, and therefore adjust the perturbation size.

  • alpha (float) – Value of the alpha coefficient used in the second stage of the optimization of the attack.

  • increase_factor_alpha (float) – The factor to increase the alpha coefficient used in the second stage of the optimization of the attack.

  • num_iter_increase_alpha (int) – Number of iterations to increase alpha.

  • decrease_factor_alpha (float) – The factor to decrease the alpha coefficient used in the second stage of the optimization of the attack.

  • num_iter_decrease_alpha (int) – Number of iterations to decrease alpha.

  • win_length (int) – Length of the window. The number of STFT rows is (win_length // 2 + 1).

  • hop_length (int) – Number of audio samples between adjacent STFT columns.

  • n_fft (int) – FFT window size.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • use_amp (bool) – Whether to use the automatic mixed precision tool to enable mixed precision training or gradient computation, e.g. with loss gradient computation. When set to True, this option is only triggered if there are GPUs available.

  • opt_level (str) – Specify a pure or mixed precision optimization level. Used when use_amp is True. Accepted values are O0, O1, O2, and O3.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – Samples of shape (nb_samples, seq_length). Note that sequences in the batch may have different lengths. A possible example of x could be: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])]).

  • y – Target values of shape (nb_samples,). Each sample in y is a string and the samples may have different lengths. A possible example of y could be: y = np.array([‘SIXTY ONE’, ‘HELLO’]). Note that this class only supports targeted attacks.

Returns:

An array holding the adversarial examples.

Basic Iterative Method (BIM)

class art.attacks.evasion.BasicIterativeMethod(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, max_iter: int = 100, targeted: bool = False, batch_size: int = 32, verbose: bool = True)

The Basic Iterative Method is the iterative version of FGM and FGSM.

__init__(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, max_iter: int = 100, targeted: bool = False, batch_size: int = 32, verbose: bool = True) None

Create a BasicIterativeMethod instance.

Parameters:
  • estimator – A trained classifier.

  • eps – Maximum perturbation that the attacker can introduce.

  • eps_step – Attack step size (input variation) at each iteration.

  • max_iter (int) – The maximum number of iterations.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • verbose (bool) – Show progress bars.
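
A minimal usage sketch, assuming classifier is an ART classifier with loss-gradient support created elsewhere and x_test is a NumPy array of inputs:

    from art.attacks.evasion import BasicIterativeMethod

    # classifier: an ART classifier wrapper created elsewhere (assumed)
    attack = BasicIterativeMethod(estimator=classifier, eps=0.1, eps_step=0.01, max_iter=100)
    x_adv = attack.generate(x=x_test)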

Projected Gradient Descent (PGD)

class art.attacks.evasion.ProjectedGradientDescent(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE | OBJECT_DETECTOR_TYPE, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, decay: float | None = None, max_iter: int = 100, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, random_eps: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

The Projected Gradient Descent attack is an iterative method in which, after each iteration, the perturbation is projected on an lp-ball of specified radius (in addition to clipping the values of the adversarial sample so that it lies in the permitted data range). This is the attack proposed by Madry et al. for adversarial training.

__init__(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE | OBJECT_DETECTOR_TYPE, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, decay: float | None = None, max_iter: int = 100, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, random_eps: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Create a ProjectedGradientDescent instance.

Parameters:
  • estimator – A trained estimator.

  • norm – The norm of the adversarial perturbation supporting “inf”, np.inf, 1 or 2.

  • eps – Maximum perturbation that the attacker can introduce.

  • eps_step – Attack step size (input variation) at each iteration.

  • random_eps (bool) – When True, epsilon is drawn randomly from truncated normal distribution. The literature suggests this for FGSM based training to generalize across different epsilons. eps_step is modified to preserve the ratio of eps / eps_step. The effectiveness of this method with PGD is untested (https://arxiv.org/pdf/1611.01236.pdf).

  • decay – Decay factor for accumulating the velocity vector when using momentum.

  • max_iter (int) – The maximum number of iterations.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 starting at the original input.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • summary_writer – Activate a summary writer for TensorBoard. Default is False (summary writer deactivated). If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

Returns:

An array holding the adversarial examples.

set_params(**kwargs) None

Take in a dictionary of parameters and apply attack-specific checks before saving them as attributes.

Parameters:

kwargs – A dictionary of attack-specific parameters.

property summary_writer

The summary writer.
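
A minimal usage sketch, assuming classifier is an ART classifier with loss-gradient support created elsewhere and x_test is a NumPy array of inputs; the framework-specific variants below are used in the same way:

    import numpy as np
    from art.attacks.evasion import ProjectedGradientDescent

    # classifier: an ART classifier wrapper created elsewhere (assumed)
    attack = ProjectedGradientDescent(
        estimator=classifier,
        norm=np.inf,
        eps=0.3,
        eps_step=0.1,
        max_iter=100,
        num_random_init=1,   # one random restart inside the epsilon ball
    )
    x_adv = attack.generate(x=x_test)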

Projected Gradient Descent (PGD) - Numpy

class art.attacks.evasion.ProjectedGradientDescentNumpy(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE | OBJECT_DETECTOR_TYPE, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, decay: float | None = None, max_iter: int = 100, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, random_eps: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

The Projected Gradient Descent attack is an iterative method in which, after each iteration, the perturbation is projected on an lp-ball of specified radius (in addition to clipping the values of the adversarial sample so that it lies in the permitted data range). This is the attack proposed by Madry et al. for adversarial training.

__init__(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE | OBJECT_DETECTOR_TYPE, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, decay: float | None = None, max_iter: int = 100, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, random_eps: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True) None

Create a ProjectedGradientDescentNumpy instance.

Parameters:
  • estimator – A trained estimator.

  • norm – The norm of the adversarial perturbation supporting “inf”, np.inf, 1 or 2.

  • eps – Maximum perturbation that the attacker can introduce.

  • eps_step – Attack step size (input variation) at each iteration.

  • random_eps (bool) – When True, epsilon is drawn randomly from truncated normal distribution. The literature suggests this for FGSM based training to generalize across different epsilons. eps_step is modified to preserve the ratio of eps / eps_step. The effectiveness of this method with PGD is untested (https://arxiv.org/pdf/1611.01236.pdf).

  • max_iter (int) – The maximum number of iterations.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False)

  • num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 starting at the original input.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • summary_writer – Activate a summary writer for TensorBoard. Default is False (summary writer deactivated). If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

Returns:

An array holding the adversarial examples.

Projected Gradient Descent (PGD) - PyTorch

class art.attacks.evasion.ProjectedGradientDescentPyTorch(estimator: PyTorchClassifier, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, decay: float | None = None, max_iter: int = 100, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, random_eps: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

The Projected Gradient Descent attack is an iterative method in which, after each iteration, the perturbation is projected on an lp-ball of specified radius (in addition to clipping the values of the adversarial sample so that it lies in the permitted data range). This is the attack proposed by Madry et al. for adversarial training.

__init__(estimator: PyTorchClassifier, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, decay: float | None = None, max_iter: int = 100, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, random_eps: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Create a ProjectedGradientDescentPyTorch instance.

Parameters:
  • estimator – A trained estimator.

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 1 or 2.

  • eps – Maximum perturbation that the attacker can introduce.

  • eps_step – Attack step size (input variation) at each iteration.

  • random_eps (bool) – When True, epsilon is drawn randomly from truncated normal distribution. The literature suggests this for FGSM based training to generalize across different epsilons. eps_step is modified to preserve the ratio of eps / eps_step. The effectiveness of this method with PGD is untested (https://arxiv.org/pdf/1611.01236.pdf).

  • max_iter (int) – The maximum number of iterations.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 starting at the original input.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • summary_writer – Activate a summary writer for TensorBoard. Default is False (summary writer deactivated). If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

Returns:

An array holding the adversarial examples.

Projected Gradient Descent (PGD) - TensorFlowV2

class art.attacks.evasion.ProjectedGradientDescentTensorFlowV2(estimator: TensorFlowV2Classifier, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, decay: float | None = None, max_iter: int = 100, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, random_eps: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

The Projected Gradient Descent attack is an iterative method in which, after each iteration, the perturbation is projected on an lp-ball of specified radius (in addition to clipping the values of the adversarial sample so that it lies in the permitted data range). This is the attack proposed by Madry et al. for adversarial training.

__init__(estimator: TensorFlowV2Classifier, norm: int | float | str = inf, eps: int | float | ndarray = 0.3, eps_step: int | float | ndarray = 0.1, decay: float | None = None, max_iter: int = 100, targeted: bool = False, num_random_init: int = 0, batch_size: int = 32, random_eps: bool = False, summary_writer: str | bool | SummaryWriter = False, verbose: bool = True)

Create a ProjectedGradientDescentTensorFlowV2 instance.

Parameters:
  • estimator – A trained estimator.

  • norm – The norm of the adversarial perturbation. Possible values: np.inf, 1 or 2.

  • eps – Maximum perturbation that the attacker can introduce.

  • eps_step – Attack step size (input variation) at each iteration.

  • random_eps (bool) – When True, epsilon is drawn randomly from truncated normal distribution. The literature suggests this for FGSM based training to generalize across different epsilons. eps_step is modified to preserve the ratio of eps / eps_step. The effectiveness of this method with PGD is untested (https://arxiv.org/pdf/1611.01236.pdf).

  • decay – Decay factor for accumulating the velocity vector when using momentum.

  • max_iter (int) – The maximum number of iterations.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 starting at the original input.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • summary_writer – Activate a summary writer for TensorBoard. Default is False (summary writer deactivated). If True, save to runs/CURRENT_DATETIME_HOSTNAME in the current directory. If of type str, save to the given path. If of type SummaryWriter, use the provided custom summary writer. Use a hierarchical folder structure to compare runs easily, e.g. pass ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • mask (np.ndarray) – An array with a mask broadcastable to input x defining where to apply adversarial perturbations. Shape needs to be broadcastable to the shape of x and can also be of the same shape as x. Any features for which the mask is zero will not be adversarially perturbed.

Returns:

An array holding the adversarial examples.

LaserAttack

class art.attacks.evasion.LaserAttack(estimator, iterations: int, laser_generator: ~art.attacks.evasion.laser_attack.utils.AdvObjectGenerator, image_generator: ~art.attacks.evasion.laser_attack.utils.ImageGenerator = <art.attacks.evasion.laser_attack.utils.ImageGenerator object>, random_initializations: int = 1, optimisation_algorithm: ~typing.Callable = <function greedy_search>, debug: ~art.attacks.evasion.laser_attack.utils.DebugInfo | None = None)

Implementation of a generic laser attack case.

__init__(estimator, iterations: int, laser_generator: ~art.attacks.evasion.laser_attack.utils.AdvObjectGenerator, image_generator: ~art.attacks.evasion.laser_attack.utils.ImageGenerator = <art.attacks.evasion.laser_attack.utils.ImageGenerator object>, random_initializations: int = 1, optimisation_algorithm: ~typing.Callable = <function greedy_search>, debug: ~art.attacks.evasion.laser_attack.utils.DebugInfo | None = None) None
Parameters:
  • estimator – Predictor of the image class.

  • iterations (int) – Maximum number of iterations of the algorithm.

  • laser_generator (AdvObjectGenerator) – Object responsible for generating laser beam images and updating them.

  • image_generator (ImageGenerator) – Object responsible for image generation.

  • random_initializations (int) – How many times to repeat the attack.

  • optimisation_algorithm – Algorithm used to generate the adversarial example. May be replaced.

  • debug – Optional debug handler.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial examples.

Return type:

ndarray

Parameters:
  • x (ndarray) – Images to attack as a tensor in NHWC order

  • y – Array of correct classes

Returns:

Array of adversarial images

generate_parameters(x: ndarray, y: ndarray | None = None) List[Tuple[AdversarialObject | None, int | None]]

Generate adversarial parameters for given images.

Parameters:
  • x (ndarray) – Images to attack as a tensor (NRGB = (1, …))

  • y – Correct classes

Returns:

List of tuples of adversarial objects and predicted class.

LowProFool

class art.attacks.evasion.LowProFool(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, n_steps: int = 100, threshold: float | None = 0.5, lambd: float = 1.5, eta: float = 0.2, eta_decay: float = 0.98, eta_min: float = 1e-07, norm: int | float | str = 2, importance: Callable | str | ndarray = 'pearson', verbose: bool = False)

LowProFool attack.

__init__(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, n_steps: int = 100, threshold: float | None = 0.5, lambd: float = 1.5, eta: float = 0.2, eta_decay: float = 0.98, eta_min: float = 1e-07, norm: int | float | str = 2, importance: Callable | str | ndarray = 'pearson', verbose: bool = False) None

Create a LowProFool instance.

Parameters:
  • classifier – A trained classifier instance.

  • n_steps (int) – Number of attack iterations.

  • threshold – Lowest prediction probability required for a valid adversarial example.

  • lambd (float) – Weight of the Lp-norm term in the objective function.

  • eta (float) – Rate of updating the perturbation vectors.

  • eta_decay (float) – Decay factor applied to eta at each step.

  • eta_min (float) – Minimal eta value.

  • norm – Parameter p of the Lp-norm (norm=2 corresponds to the Euclidean norm).

  • importance – Function used to calculate feature importance, or a vector of precomputed importances. Possible values: ‘pearson’ (Pearson correlation), a custom callable, or an np.ndarray of feature importances.

  • verbose (bool) – Verbose mode / Show progress bars.

fit_importances(x: ndarray | None = None, y: ndarray | None = None, importance_array: ndarray | None = None, normalize: bool | None = True)

This function allows one to easily calculate the feature importance vector using the pre-specified function, in case it wasn’t passed at initialization.

Parameters:
  • x – Design matrix of the dataset used to train the classifier.

  • y – Labels of the dataset used to train the classifier.

  • importance_array – Array providing features’ importance score.

  • normalize – Ensure that feature importance values sum to 1.

Returns:

LowProFool instance itself.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial examples for the samples in the data matrix x, with their targets specified in the one-hot-encoded matrix y, using the LowProFool algorithm. If the attack fails for a sample, the corresponding position in the returned array contains the original sample; otherwise it contains the best adversarial example found during the process.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – One-hot-encoded target classes of shape (nb_samples, nb_classes).

  • kwargs

Returns:

An array holding the adversarial examples.
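
A minimal, self-contained sketch follows (not from the original reference). The Iris data, the logistic regression model and the choice of target class are illustrative assumptions; only the documented constructor arguments, fit_importances and generate are used:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import MinMaxScaler

    from art.attacks.evasion import LowProFool
    from art.estimators.classification import SklearnClassifier

    x, y = load_iris(return_X_y=True)
    x = MinMaxScaler().fit_transform(x).astype(np.float32)  # scale features to [0, 1]

    model = LogisticRegression(max_iter=1000).fit(x, y)
    classifier = SklearnClassifier(model=model, clip_values=(0.0, 1.0))

    attack = LowProFool(classifier=classifier, n_steps=50, eta=0.2, lambd=1.5)
    attack.fit_importances(x, y)  # Pearson-correlation feature importances (default)

    # One-hot targets: every sample is pushed towards class 0 (hypothetical choice).
    targets = np.zeros((x.shape[0], 3), dtype=np.float32)
    targets[:, 0] = 1.0
    x_adv = attack.generate(x=x, y=targets)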

NewtonFool

class art.attacks.evasion.NewtonFool(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, max_iter: int = 100, eta: float = 0.01, batch_size: int = 1, verbose: bool = True)

Implementation of the attack from Uyeong Jang et al. (2017).

__init__(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, max_iter: int = 100, eta: float = 0.01, batch_size: int = 1, verbose: bool = True) None

Create a NewtonFool attack instance.

Parameters:
  • classifier – A trained classifier.

  • max_iter (int) – The maximum number of iterations.

  • eta (float) – The eta coefficient.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in a Numpy array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – An array with the original labels to be predicted.

Returns:

An array holding the adversarial examples.
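
Below is a brief, self-contained sketch (not from the original reference). The untrained softmax model and random inputs are hypothetical placeholders standing in for a probability-output classifier:

    import numpy as np
    import tensorflow as tf

    from art.attacks.evasion import NewtonFool
    from art.estimators.classification import TensorFlowV2Classifier

    # Hypothetical, untrained stand-in for a classifier that outputs probabilities.
    prob_model = tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    prob_classifier = TensorFlowV2Classifier(
        model=prob_model,
        nb_classes=10,
        input_shape=(32, 32, 3),
        loss_object=tf.keras.losses.CategoricalCrossentropy(),
        clip_values=(0.0, 1.0),
    )

    attack = NewtonFool(classifier=prob_classifier, max_iter=50, eta=0.01, batch_size=8)
    x = np.random.rand(8, 32, 32, 3).astype(np.float32)  # placeholder images in [0, 1]
    x_adv = attack.generate(x=x)  # y is optional; predicted classes are attacked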

Malware Gradient Descent - TensorFlow

class art.attacks.evasion.MalwareGDTensorFlow(classifier: CLASSIFIER_NEURALNETWORK_TYPE, embedding_weights: ndarray, param_dic: Dict[str, int], num_of_iterations: int = 10, l_0: float | int = 0.1, l_r: float = 1.0, use_sign: bool = False, verbose: bool = False)
Implementation of the following white-box attacks related to PE malware crafting:
  1. Append based attacks (example paper link: https://arxiv.org/abs/1810.08280)

  2. Section insertion attacks (example paper link: https://arxiv.org/abs/2008.07125)

  3. Slack manipulation attacks (example paper link: https://arxiv.org/abs/1810.08280)

  4. DOS Header Attacks (example paper link: https://arxiv.org/abs/1901.03583)

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, embedding_weights: ndarray, param_dic: Dict[str, int], num_of_iterations: int = 10, l_0: float | int = 0.1, l_r: float = 1.0, use_sign: bool = False, verbose: bool = False) None
Parameters:
  • classifier – A trained classifier that takes in the PE embeddings to make a prediction.

  • embedding_weights (ndarray) – Weights for the embedding layer

  • param_dic – A dictionary specifying MalConv parameters: ‘maxlen’ – the input size to the MalConv model; ‘input_dim’ – the number of discrete values, normally 257; ‘embedding_size’ – the size of the embedding layer, default 8.

  • num_of_iterations (int) – The number of iterations to apply.

  • l_0 – l_0 bound for the attack. If less than 1 it is interpreted as a fraction of the file size. If larger than 1 it is interpreted as the total number of permissible features to change.

  • l_r (float) – Learning rate for the optimisation

  • use_sign (bool) – If we want to use the sign of the gradient, rather than the gradient itself.

  • verbose (bool) – Show progress bars.

check_valid_size(y: ndarray, sample_sizes: ndarray, append_perturbation_size: ndarray) ndarray

Checks that we can append the l0 perturbation to the malware sample and not exceed the maximum file size. A new label vector with just the valid files indicated is created.

Return type:

ndarray

Parameters:
  • y (ndarray) – Labels.

  • sample_sizes (ndarray) – The size of the original file, before it was padded to the input size required by MalConv.

  • append_perturbation_size (ndarray) – Size of the perturbations in L0 terms to put at end of file.

Return adv_label_vector:

Labels which indicate which malware samples have enough free features to accommodate all the adversarial perturbation.

compute_perturbation_regions(input_perturbation_size: ndarray, input_perturb_sizes: List[List[int]], automatically_append: bool) Tuple[ndarray, List[List[int]]]

Based on the l0 budget and the provided allowable perturbation regions we iteratively mark regions of the PE file for modification until we exhaust our budget.

Parameters:
  • input_perturb_sizes – The size of the regions we can perturb.

  • input_perturbation_size (ndarray) – The total amount of perturbation allowed on a specific sample.

  • automatically_append (bool) – If we want to automatically append unused perturbation on the end of the malware.

Return perturbation_size:

Remaining perturbation (if any)

Return perturb_sizes:

Potentially adjusted sizes of the locations in the PE file we can perturb.

generate(x: ndarray, y: ndarray | None = None, sample_sizes: ndarray | None = None, automatically_append: bool = True, verify_input_data: bool = True, perturb_sizes: List[List[int]] | None = None, perturb_starts: List[List[int]] | None = None, **kwargs) ndarray

Generates the adversarial examples. By default x needs to be composed of valid files, i.e. files which are malicious and large enough to support the assigned L0 perturbation budget. Such files can be obtained by using pull_out_valid_samples on the data.

This check on the input data can be overridden by toggling the flag verify_input_data. In that case only the data which can be made adversarial is perturbed, so the resulting batch will be a mixture of adversarial and unperturbed data.

To assign the L0 budget we go through each list in perturb_sizes and perturb_starts in order, and assign the budget based on the sizes given until the L0 budget is exhausted.

After all the regions marked in perturb_sizes and perturb_starts have been assigned, if automatically_append is set to True and L0 budget remains, the extra perturbation is added at the end of the file in an append-style attack.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with input data.

  • y – (N, 1) binary labels to make sure the benign files are zero masked.

  • sample_sizes – The size of the original file, before it was padded to the input size required by MalConv

  • automatically_append (bool) – Whether to automatically append extra spare perturbation at the end of the file.

  • verify_input_data (bool) – If to check that all the data supplied is valid for adversarial perturbations.

  • perturb_sizes – A list of length batch size, each element is in itself a list containing the size of the allowable perturbation region

  • perturb_starts – A list of length batch size, each element is in itself a list containing the start of perturbation region.

Return x:

The adversarial examples.

generate_mask(x: ndarray, y: ndarray, sample_sizes: ndarray, perturbation_size: ndarray, perturb_sizes: List[List[int]] | None, perturb_starts: List[List[int]] | None) tf.Tensor

Makes a mask to apply to the gradients to control which samples in the batch are perturbed.

Parameters:
  • x (ndarray) – Array with input data.

  • y (ndarray) – Labels to make sure the benign files are zero masked.

  • sample_sizes (ndarray) – The size of the original file, before it was padded to the input size required by MalConv

  • perturbation_size (ndarray) – Size of the perturbations in L0 terms to put at end of file

  • perturb_sizes – List of length batch size, each element is in itself a list containing the size of the allowable perturbation region

  • perturb_starts – List of length batch size, each element is in itself a list containing the start of perturbation region.

Return mask:

Array with 1s on the features we will modify on this batch and 0s elsewhere.

get_adv_malware(embeddings: tf.Tensor, data: ndarray, labels: ndarray, fsize: ndarray, perturbation_size: ndarray, perturb_sizes: List[List[int]] | None = None, perturb_starts: List[List[int]] | None = None) ndarray

Project the adversarial example back through the closest l2 vector.

Parameters:
  • embeddings – Adversarially optimised embeddings.

  • data – Original data in the feature space.

  • labels – Labels for the data.

  • fsize – Size of the original malware.

  • perturbation_size – Size of the l0 attack to append (if any).

  • perturb_sizes – List, with each element itself being a list of the sizes of the perturbation regions in a sample.

  • perturb_starts – List, with each element itself being a list of the start positions of the perturbation regions in a sample.

Return data:

Numpy array with valid data samples.

static get_dos_locations(x: ndarray) Tuple[List[List[int]], List[List[int]]]

We identify the regions in the DOS header which we can perturb adversarially.

There is a series of “magic numbers” in this method which relate to the structure of the PE file: 1) mz_offset = 2: the first two bytes of a PE are fixed as MZ. 2) 0x3C: offset to the pointer to the PE header. The pointer is 4 bytes long. 3) 0x40: end of the pointer to the PE header.

Return batch_of_starts:

A list of the start locations we can perturb. These will always be the values 2 and 64.

Return batch_of_sizes:

Sizes of the perturbations we can carry out.

static get_peinfo(filepath: str, save_to_json_path: str | None = None) Tuple[List[int], List[int]]

Given a PE file we extract out the section information to determine the slack regions in the file. We return two lists 1) with the start location of the slack regions and 2) with the size of the slack region. We are using the lief library (https://github.com/lief-project/LIEF) to manipulate the PE file.

Parameters:
  • filepath (str) – Path to file we want to analyse with pedump and get the section information.

  • save_to_json_path – (Optional) if we want to save the results of pedump to a json file, provide the path.

Return start_of_slack:

A list with the slack starts

Return size_of_slack:

A list with the slack region sizes.

static initialise_sample(x: ndarray, y: ndarray, sample_sizes: ndarray, perturbation_size: ndarray, perturb_sizes: List[List[int]] | None, perturb_starts: List[List[int]] | None) ndarray

Randomly append bytes at the end of the malware to initialise it, or if perturbation regions are provided, perturb those.

Return type:

ndarray

Parameters:
  • x (ndarray) – Array with input data.

  • y (ndarray) – Labels, after having been adjusted to account for malware which cannot support the full l0 budget.

  • sample_sizes (ndarray) – The size of the original file, before it was padded to the input size required by MalConv

  • perturbation_size (ndarray) – Size of the perturbations in L0 terms to put at end of file

  • perturb_sizes – List of length batch size, each element is in itself a list containing the size of the allowable perturbation region

  • perturb_starts – List of length batch size, each element is in itself a list containing the start of perturbation region.

Return x:

Array with features to be perturbed set to a random value.

insert_section(datapoint: List[int] | str, sample_size: int | None = None, padding_char: int = 256, maxlen: int = 1048576, bytes_to_assign: int | None = None, verbose: bool = False) Tuple[ndarray, int, int, int, List[int], List[int]] | Tuple[None, None, None, None, None, None]

Create a new section in a PE file that the attacker can perturb to create an adversarial example. We are using the lief library (https://github.com/lief-project/LIEF) to manipulate the PE file.

Parameters:
  • datapoint

    either 1) path to the file we want to analyse with lief and get the section information, or 2) a list of ints that can be processed by lief.

    If we have already pre-processed the file into a numpy array, we convert it to a form that can be read by lief, e.g.:

    datapoint = datapoint[0:size]  # size is the original size of the malware file
    datapoint = datapoint.astype('uint8')
    datapoint = datapoint.tolist()

  • sample_size – Size of the original datapoint. Only needed if it is an array and the l0 budget is fractional.

  • padding_char (int) – The char to use to pad the file to be of length maxlen

  • maxlen (int) – Maximum length of the data that the MalConv model can process

  • bytes_to_assign – (Optional) how many bytes we wish to specify when inserting a new section. If unspecified the whole l0 budget will be used on a single section.

  • verbose (bool) – lief outputs a lot to the console, particularly if we are processing many files. By default the printing of messages is suppressed; it can be toggled on/off with True/False.

Return manipulated_data:

Executable with section inserted and turned into a numpy array of the appropriate size

Return len(manipulated_file):

Size of original file

Return information_on_section.pointerto_raw_data:

The start of the inserted section

Return information_on_section.virtual_size:

Size of the inserted section

Return size_of_slack:

Size of slack regions in this executable (including from the section we just inserted)

Return start_of_slack:

Start of slack regions in this executable (including from the section we just inserted)

static process_file(filepath: str, padding_char: int = 256, maxlen: int = 1048576) Tuple[ndarray, int]

Go from raw file to numpy array.

Parameters:
  • filepath (str) – Path to the file we convert to a numpy array

  • padding_char (int) – The char to use to pad the input if it is shorter than maxlen.

  • maxlen (int) – Maximum size of the file processed by the model. Currently set to 1MB

Return data:

A numpy array of the PE file

Return size_of_original_file:

Size of the PE file

static pull_out_adversarial_malware(x: ndarray, y: ndarray, sample_sizes: ndarray, initial_dtype: dtype, input_perturb_sizes: List[List[int]] | None = None, input_perturb_starts: List[List[int]] | None = None) Tuple[ndarray, ndarray, ndarray] | Tuple[ndarray, ndarray, ndarray, List[List[int]], List[List[int]]]

Fetches the malware from the data

Parameters:
  • x (ndarray) – Batch of data which will contain a mix of adversarial examples and unperturbed data.

  • y (ndarray) – Labels indicating which are valid adversarial examples or not.

  • initial_dtype (dtype) – Data can be given in a few formats (uin16, float, etc) so use initial_dtype to make the returned sample match the original.

  • sample_sizes (ndarray) – Size of the original data files

  • input_perturb_sizes – List of length batch size, each element is in itself a list containing the size of the allowable perturbation region

  • input_perturb_starts – List of length batch size, each element is in itself a list containing the start of perturbation region.

Return adv_x:

Array composed of only the data that we can make valid adversarial examples from.

Return adv_y:

Labels, all ones.

pull_out_valid_samples(x: ndarray, y: ndarray, sample_sizes: ndarray, automatically_append: bool = True, perturb_sizes: List[List[int]] | None = None, perturb_starts: List[List[int]] | None = None) Tuple[ndarray, ndarray, ndarray] | Tuple[ndarray, ndarray, ndarray, List[List[int]], List[List[int]]]

Filters the input data for samples that can be made adversarial.

Parameters:
  • x (ndarray) – Array with input data.

  • y (ndarray) – Labels to make sure the benign files are zero masked.

  • sample_sizes (ndarray) – The size of the original file, before it was padded to the input size required by MalConv

  • automatically_append (bool) – Whether to automatically append extra spare perturbation at the end of the file.

  • perturb_sizes – List of length batch size, each element is in itself a list containing the size of the allowable perturbation region

  • perturb_starts – List of length batch size, each element is in itself a list containing the start of perturbation region.

update_embeddings(embeddings: tf.Tensor, gradients: tf.Tensor, mask: tf.Tensor) tf.Tensor

Update embeddings.

Parameters:
  • embeddings – Embeddings produced by passing the data through the first embedding layer of MalConv.

  • gradients – Gradients to update the embeddings

  • mask – Tensor with 1s on the embeddings we modify, 0s elsewhere.

Return embeddings:

Updated embeddings wrt the adversarial objective.

Over The Air Flickering Attack - PyTorch

class art.attacks.evasion.OverTheAirFlickeringPyTorch(classifier: PyTorchClassifier, eps_step: float = 0.01, max_iter: int = 30, beta_0: float = 1.0, beta_1: float = 0.5, beta_2: float = 0.5, loss_margin: float = 0.05, batch_size: int = 1, start_frame_index: int = 0, num_frames: int | None = None, round_samples: float = 0.0, targeted: bool = False, verbose: bool = True)

Implementation of the Over-the-Air Adversarial Flickering attack on video recognition networks.

__init__(classifier: PyTorchClassifier, eps_step: float = 0.01, max_iter: int = 30, beta_0: float = 1.0, beta_1: float = 0.5, beta_2: float = 0.5, loss_margin: float = 0.05, batch_size: int = 1, start_frame_index: int = 0, num_frames: int | None = None, round_samples: float = 0.0, targeted: bool = False, verbose: bool = True)

Create an instance of the OverTheAirFlickeringPyTorch.

Parameters:
  • classifier – A trained classifier.

  • eps_step (float) – The step size per iteration.

  • max_iter (int) – The number of iterations.

  • beta_0 (float) – Weighting of the sum of all regularisation terms corresponding to lambda in the original paper.

  • beta_1 (float) – Weighting of thickness regularisation.

  • beta_2 (float) – Weighting of roughness regularisation.

  • loss_margin (float) – The loss margin.

  • batch_size (int) – Batch size.

  • start_frame_index (int) – The first frame to be perturbed.

  • num_frames – The number of frames to be perturbed.

  • round_samples (float) – Granularity of the input values to be enforced if > 0.0.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial examples.

Return type:

ndarray

Parameters:
  • x (ndarray) – Original input samples representing videos of format NFHWC.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns:

Adversarial examples.

PixelAttack

class art.attacks.evasion.PixelAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, th: int | None = None, es: int = 1, max_iter: int = 100, targeted: bool = False, verbose: bool = False)

This attack was originally implemented by Vargas et al. (2019). It is a generalisation of the One Pixel Attack originally implemented by Su et al. (2019).

One Pixel Attack Paper link: https://arxiv.org/abs/1710.08864
Pixel Attack Paper link: https://arxiv.org/abs/1906.06026
__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, th: int | None = None, es: int = 1, max_iter: int = 100, targeted: bool = False, verbose: bool = False) None

Create a PixelAttack instance.

Parameters:
  • classifier – A trained classifier.

  • th – Threshold value of the Pixel/Threshold attack. th=None indicates finding a minimum threshold.

  • es (int) – Indicates whether the attack uses CMAES (0) or DE (1) as Evolutionary Strategy.

  • max_iter (int) – Sets the maximum number of iterations to run the Evolutionary Strategy for optimisation.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • verbose (bool) – Indicates whether to print verbose messages of ES used.

ThresholdAttack

class art.attacks.evasion.ThresholdAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, th: int | None = None, es: int = 0, max_iter: int = 100, targeted: bool = False, verbose: bool = False)

This attack was originally implemented by Vargas et al. (2019).

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, th: int | None = None, es: int = 0, max_iter: int = 100, targeted: bool = False, verbose: bool = False) None

Create a ThresholdAttack instance.

Parameters:
  • classifier – A trained classifier.

  • th – Threshold value of the Pixel/Threshold attack. th=None indicates finding a minimum threshold.

  • es (int) – Indicates whether the attack uses CMAES (0) or DE (1) as Evolutionary Strategy.

  • max_iter (int) – Sets the maximum number of iterations to run the Evolutionary Strategy for optimisation.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • verbose (bool) – Indicates whether to print verbose messages of ES used.

Jacobian Saliency Map Attack (JSMA)

class art.attacks.evasion.SaliencyMapMethod(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, theta: float = 0.1, gamma: float = 1.0, batch_size: int = 1, verbose: bool = True)

Implementation of the Jacobian-based Saliency Map Attack (Papernot et al. 2016).

__init__(classifier: CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE, theta: float = 0.1, gamma: float = 1.0, batch_size: int = 1, verbose: bool = True) None

Create a SaliencyMapMethod instance.

Parameters:
  • classifier – A trained classifier.

  • theta (float) – Amount of perturbation introduced to each modified feature per step (can be positive or negative).

  • gamma (float) – Maximum fraction of features being perturbed (between 0 and 1).

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns:

An array holding the adversarial examples.
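
A brief illustrative sketch follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section, and the target class is an arbitrary choice:

    import numpy as np
    from art.attacks.evasion import SaliencyMapMethod

    attack = SaliencyMapMethod(classifier=classifier, theta=0.1, gamma=0.1, batch_size=8)

    targets = np.zeros((x_test.shape[0], 10), dtype=np.float32)
    targets[:, 3] = 1.0  # hypothetical target class 3 for every sample
    x_adv = attack.generate(x=x_test, y=targets)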

Shadow Attack

class art.attacks.evasion.ShadowAttack(estimator: TensorFlowV2Classifier | TensorFlowV2RandomizedSmoothing | PyTorchClassifier | PyTorchRandomizedSmoothing, sigma: float = 0.5, nb_steps: int = 300, learning_rate: float = 0.1, lambda_tv: float = 0.3, lambda_c: float = 1.0, lambda_s: float = 0.5, batch_size: int = 400, targeted: bool = False, verbose: bool = True)

Implementation of the Shadow Attack.

__init__(estimator: TensorFlowV2Classifier | TensorFlowV2RandomizedSmoothing | PyTorchClassifier | PyTorchRandomizedSmoothing, sigma: float = 0.5, nb_steps: int = 300, learning_rate: float = 0.1, lambda_tv: float = 0.3, lambda_c: float = 1.0, lambda_s: float = 0.5, batch_size: int = 400, targeted: bool = False, verbose: bool = True)

Create an instance of the ShadowAttack.

Parameters:
  • estimator – A trained classifier.

  • sigma (float) – Standard deviation of the random Gaussian noise.

  • nb_steps (int) – Number of SGD steps.

  • learning_rate (float) – Learning rate for SGD.

  • lambda_tv (float) – Scalar penalty weight for total variation of the perturbation.

  • lambda_c (float) – Scalar penalty weight for change in the mean of each color channel of the perturbation.

  • lambda_s (float) – Scalar penalty weight for similarity of color channels in perturbation.

  • batch_size (int) – The size of the training batch.

  • targeted (bool) – True if the attack is targeted.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array. This attack requires a lot of memory and therefore accepts only a single sample as input, i.e. a batch of size 1.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array of a single original input sample.

  • y – An array of a single target label.

Returns:

An array with the adversarial examples.
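
A brief illustrative sketch follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section; the label is taken from the model's own prediction and the parameter values are arbitrary:

    import numpy as np
    from art.attacks.evasion import ShadowAttack

    attack = ShadowAttack(estimator=classifier, sigma=0.5, nb_steps=100, batch_size=32)

    x_single = x_test[0:1]  # the attack works on a batch of size 1
    y_single = np.argmax(classifier.predict(x_single), axis=1)
    x_adv = attack.generate(x=x_single, y=y_single)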

ShapeShifter Attack

class art.attacks.evasion.ShapeShifter(estimator: TensorFlowFasterRCNN, random_transform: Callable, box_classifier_weight: float = 1.0, box_localizer_weight: float = 2.0, rpn_classifier_weight: float = 1.0, rpn_localizer_weight: float = 2.0, box_iou_threshold: float = 0.5, box_victim_weight: float = 0.0, box_target_weight: float = 0.0, box_victim_cw_weight: float = 0.0, box_victim_cw_confidence: float = 0.0, box_target_cw_weight: float = 0.0, box_target_cw_confidence: float = 0.0, rpn_iou_threshold: float = 0.5, rpn_background_weight: float = 0.0, rpn_foreground_weight: float = 0.0, rpn_cw_weight: float = 0.0, rpn_cw_confidence: float = 0.0, similarity_weight: float = 0.0, learning_rate: float = 1.0, optimizer: str = 'GradientDescentOptimizer', momentum: float = 0.0, decay: float = 0.0, sign_gradients: bool = False, random_size: int = 10, max_iter: int = 10, texture_as_input: bool = False, use_spectral: bool = True, soft_clip: bool = False)

Implementation of the ShapeShifter attack. This is a robust physical adversarial attack on the Faster R-CNN object detector and is developed in TensorFlow.

__init__(estimator: TensorFlowFasterRCNN, random_transform: Callable, box_classifier_weight: float = 1.0, box_localizer_weight: float = 2.0, rpn_classifier_weight: float = 1.0, rpn_localizer_weight: float = 2.0, box_iou_threshold: float = 0.5, box_victim_weight: float = 0.0, box_target_weight: float = 0.0, box_victim_cw_weight: float = 0.0, box_victim_cw_confidence: float = 0.0, box_target_cw_weight: float = 0.0, box_target_cw_confidence: float = 0.0, rpn_iou_threshold: float = 0.5, rpn_background_weight: float = 0.0, rpn_foreground_weight: float = 0.0, rpn_cw_weight: float = 0.0, rpn_cw_confidence: float = 0.0, similarity_weight: float = 0.0, learning_rate: float = 1.0, optimizer: str = 'GradientDescentOptimizer', momentum: float = 0.0, decay: float = 0.0, sign_gradients: bool = False, random_size: int = 10, max_iter: int = 10, texture_as_input: bool = False, use_spectral: bool = True, soft_clip: bool = False)

Create an instance of the ShapeShifter.

Parameters:
  • estimator (TensorFlowFasterRCNN) – A trained object detector.

  • random_transform – A function that applies random transformations to images/textures.

  • box_classifier_weight (float) – Weight of box classifier loss.

  • box_localizer_weight (float) – Weight of box localizer loss.

  • rpn_classifier_weight (float) – Weight of RPN classifier loss.

  • rpn_localizer_weight (float) – Weight of RPN localizer loss.

  • box_iou_threshold (float) – Box intersection over union threshold.

  • box_victim_weight (float) – Weight of box victim loss.

  • box_target_weight (float) – Weight of box target loss.

  • box_victim_cw_weight (float) – Weight of box victim CW loss.

  • box_victim_cw_confidence (float) – Confidence of box victim CW loss.

  • box_target_cw_weight (float) – Weight of box target CW loss.

  • box_target_cw_confidence (float) – Confidence of box target CW loss.

  • rpn_iou_threshold (float) – RPN intersection over union threshold.

  • rpn_background_weight (float) – Weight of RPN background loss.

  • rpn_foreground_weight (float) – Weight of RPN foreground loss.

  • rpn_cw_weight (float) – Weight of RPN CW loss.

  • rpn_cw_confidence (float) – Confidence of RPN CW loss.

  • similarity_weight (float) – Weight of similarity loss.

  • learning_rate (float) – Learning rate.

  • optimizer (str) – Optimizer including one of the following choices: GradientDescentOptimizer, MomentumOptimizer, RMSPropOptimizer, AdamOptimizer.

  • momentum (float) – Momentum for RMSPropOptimizer, MomentumOptimizer.

  • decay (float) – Learning rate decay for RMSPropOptimizer.

  • sign_gradients (bool) – Whether to use the sign of gradients for optimization.

  • random_size (int) – Random sample size.

  • max_iter (int) – Maximum number of iterations.

  • texture_as_input (bool) – Whether textures are used as inputs instead of images.

  • use_spectral (bool) – Whether to use spectral with textures.

  • soft_clip (bool) – Whether to apply soft clipping on textures.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – Sample image/texture.

  • y – Not used.

  • label (Dict[str, List[np.ndarray]]) –

    A dictionary of target labels for object detector. The fields of the dictionary are as follows:

    • groundtruth_boxes_list: A list of nb_samples size of 2-D tf.float32 tensors of shape

      [num_boxes, 4] containing coordinates of the groundtruth boxes. Groundtruth boxes are provided in [y_min, x_min, y_max, x_max] format and also assumed to be normalized as well as clipped relative to the image window with conditions y_min <= y_max and x_min <= x_max.

    • groundtruth_classes_list: A list of nb_samples size of 1-D tf.float32 tensors of shape

      [num_boxes] containing the class targets with the zero index assumed to map to the first non-background class.

    • groundtruth_weights_list: A list of nb_samples size of 1-D tf.float32 tensors of shape

      [num_boxes] containing weights for groundtruth boxes.

  • mask (np.ndarray.) – Input mask.

  • target_class (int) – Target class.

  • victim_class (int) – Victim class.

  • custom_loss (Tensor) – Custom loss function from users.

  • rendering_function (Callable) – A rendering function to use textures as input.

Returns:

Adversarial image/texture.

Sign-OPT Attack

class art.attacks.evasion.SignOPTAttack(estimator: CLASSIFIER_TYPE, targeted: bool = True, epsilon: float = 0.001, num_trial: int = 100, max_iter: int = 1000, query_limit: int = 20000, k: int = 200, alpha: float = 0.2, beta: float = 0.001, eval_perform: bool = False, batch_size: int = 64, verbose: bool = False)

Implements the Sign-OPT attack, a query-efficient hard-label adversarial attack.

Paper link: https://arxiv.org/pdf/1909.10773.pdf

__init__(estimator: CLASSIFIER_TYPE, targeted: bool = True, epsilon: float = 0.001, num_trial: int = 100, max_iter: int = 1000, query_limit: int = 20000, k: int = 200, alpha: float = 0.2, beta: float = 0.001, eval_perform: bool = False, batch_size: int = 64, verbose: bool = False) None

Create a Sign_OPT attack instance.

Parameters:
  • estimator – A trained classifier.

  • targeted (bool) – Should the attack target one specific class.

  • epsilon (float) – A very small smoothing parameter.

  • num_trial (int) – Number of trials to calculate a good starting point.

  • max_iter (int) – Maximum number of iterations. Default value is for untargeted attack, increase to recommended 5000 for targeted attacks.

  • query_limit (int) – Limitation for number of queries to prediction model. Default value is for untargeted attack, increase to recommended 40000 for targeted attacks.

  • k (int) – Number of random directions (for estimating the gradient)

  • alpha (float) – The step length for line search

  • beta (float) – The tolerance for line search

  • batch_size (int) – The size of the batch used by the estimator during inference.

  • verbose (bool) – Show detailed information

  • eval_perform (bool) – Evaluate performance (average L2 distance and success rate) on 100 randomly chosen samples.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). If self.targeted is true, then y represents the target labels.

  • kwargs – See below.

Keyword Arguments:
  • x_init – Initialisation samples of the same shape as x for targeted attacks.

Returns:

An array holding the adversarial examples.
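
A brief illustrative sketch of an untargeted run follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section; since only hard labels are queried, the model's current predictions stand in for true labels:

    import numpy as np
    from art.attacks.evasion import SignOPTAttack

    attack = SignOPTAttack(estimator=classifier, targeted=False, max_iter=100, query_limit=4000)

    y_pred = np.argmax(classifier.predict(x_test), axis=1)
    x_adv = attack.generate(x=x_test, y=y_pred)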

Simple Black-box Adversarial Attack

class art.attacks.evasion.SimBA(classifier: CLASSIFIER_TYPE, attack: str = 'dct', max_iter: int = 3000, order: str = 'random', epsilon: float = 0.1, freq_dim: int = 4, stride: int = 1, targeted: bool = False, batch_size: int = 1, verbose: bool = True)

This class implements the black-box attack SimBA.

__init__(classifier: CLASSIFIER_TYPE, attack: str = 'dct', max_iter: int = 3000, order: str = 'random', epsilon: float = 0.1, freq_dim: int = 4, stride: int = 1, targeted: bool = False, batch_size: int = 1, verbose: bool = True)

Create a SimBA (dct) attack instance.

Parameters:
  • classifier – A trained classifier predicting probabilities and not logits.

  • attack (str) – Attack type: pixel (px) or DCT (dct).

  • max_iter (int) – The maximum number of iterations.

  • epsilon (float) – Overshoot parameter.

  • order (str) – Order of pixel attacks: random or diagonal (diag).

  • freq_dim (int) – Dimensionality of the 2D frequency space (DCT).

  • stride (int) – Stride for block order (DCT).

  • targeted (bool) – Perform a targeted attack.

  • batch_size (int) – Batch size (note: batch processing is unavailable in this implementation).

  • verbose (bool) – Show progress bars.

diagonal_order(image_size, channels)

Defines a diagonal order for pixel attacks. The order is fixed across diagonals but is randomized across channels and within each diagonal, e.g. [1, 2, 5] [3, 4, 8] [6, 7, 9].

Parameters:
  • image_size – image size (i.e., width or height)

  • channels – the number of channels

Return order:

An array holding the diagonal order of pixel attacks.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – An array with the true or target labels.

Returns:

An array holding the adversarial examples.
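
A brief illustrative sketch follows (not from the original reference). Because SimBA expects a classifier that outputs probabilities, it re-uses the hypothetical prob_classifier placeholder from the NewtonFool sketch earlier in this section; the single random input is a placeholder as well:

    import numpy as np
    from art.attacks.evasion import SimBA

    attack = SimBA(classifier=prob_classifier, attack="dct", max_iter=1000, epsilon=0.1, freq_dim=4)

    x_single = np.random.rand(1, 32, 32, 3).astype(np.float32)  # one sample at a time
    x_adv = attack.generate(x=x_single)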

Spatial Transformations Attack

class art.attacks.evasion.SpatialTransformation(classifier: CLASSIFIER_NEURALNETWORK_TYPE, max_translation: float = 0.0, num_translations: int = 1, max_rotation: float = 0.0, num_rotations: int = 1, verbose: bool = True)

Implementation of the spatial transformation attack using translation and rotation of inputs. The attack conducts black-box queries to the target model in a grid search over possible translations and rotations to find optimal attack parameters.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, max_translation: float = 0.0, num_translations: int = 1, max_rotation: float = 0.0, num_rotations: int = 1, verbose: bool = True) None
Parameters:
  • classifier – A trained classifier.

  • max_translation (float) – The maximum translation in any direction as percentage of image size. The value is expected to be in the range [0, 100].

  • num_translations (int) – The number of translations to search on grid spacing per direction.

  • max_rotation (float) – The maximum rotation in either direction in degrees. The value is expected to be in the range [0, 180].

  • num_rotations (int) – The number of rotations to search on grid spacing.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – An array with the original labels to be predicted.

Returns:

An array holding the adversarial examples.
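
A brief illustrative sketch follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section; the grid sizes are arbitrary:

    from art.attacks.evasion import SpatialTransformation

    # Black-box grid search over translations (in % of image size) and rotations (in degrees).
    attack = SpatialTransformation(
        classifier=classifier, max_translation=10.0, num_translations=3, max_rotation=30.0, num_rotations=3
    )
    x_adv = attack.generate(x=x_test)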

Square Attack

class art.attacks.evasion.SquareAttack(estimator: CLASSIFIER_TYPE, norm: int | float | str = inf, adv_criterion: Callable[[ndarray, ndarray], bool] | None = None, loss: Callable[[ndarray, ndarray], ndarray] | None = None, max_iter: int = 100, eps: float = 0.3, p_init: float = 0.8, nb_restarts: int = 1, batch_size: int = 128, verbose: bool = True)

This class implements the SquareAttack attack.

__init__(estimator: CLASSIFIER_TYPE, norm: int | float | str = inf, adv_criterion: Callable[[ndarray, ndarray], bool] | None = None, loss: Callable[[ndarray, ndarray], ndarray] | None = None, max_iter: int = 100, eps: float = 0.3, p_init: float = 0.8, nb_restarts: int = 1, batch_size: int = 128, verbose: bool = True)

Create a SquareAttack instance.

Parameters:
  • estimator – A trained estimator.

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 1 or 2.

  • adv_criterion – The criterion which the attack should use in determining adversariality.

  • loss – The loss function which the attack should use for optimization.

  • max_iter (int) – Maximum number of iterations.

  • eps (float) – Maximum perturbation that the attacker can introduce.

  • p_init (float) – Initial fraction of elements.

  • nb_restarts (int) – Number of restarts.

  • batch_size (int) – Batch size for estimator evaluations.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

Returns:

An array holding the adversarial examples.
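
A brief illustrative sketch follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section; the budget values are arbitrary:

    import numpy as np
    from art.attacks.evasion import SquareAttack

    attack = SquareAttack(estimator=classifier, norm=np.inf, eps=0.05, max_iter=100, p_init=0.8)
    x_adv = attack.generate(x=x_test)  # y omitted: model predictions are used as labels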

Targeted Universal Perturbation Attack

class art.attacks.evasion.TargetedUniversalPerturbation(classifier: CLASSIFIER_TYPE, attacker: str = 'fgsm', attacker_params: Dict[str, Any] | None = None, delta: float = 0.2, max_iter: int = 20, eps: float = 10.0, norm: int | float | str = inf)

Implementation of the attack from Hirano and Takemoto (2019). Computes a fixed perturbation to be applied to all future inputs. To this end, it can use any adversarial attack method.

__init__(classifier: CLASSIFIER_TYPE, attacker: str = 'fgsm', attacker_params: Dict[str, Any] | None = None, delta: float = 0.2, max_iter: int = 20, eps: float = 10.0, norm: int | float | str = inf)
Parameters:
  • classifier – A trained classifier.

  • attacker (str) – Adversarial attack name. Default is ‘fgsm’. Supported names: ‘fgsm’, ‘simba’.

  • attacker_params – Parameters specific to the adversarial attack. If this parameter is not specified, the default parameters of the chosen attack will be used.

  • delta (float) – The maximum acceptable rate of correctly classified adversarial examples by the classifier. The attack will stop when the targeted success rate exceeds (1 - delta). ‘delta’ should be in the range [0, 1].

  • max_iter (int) – The maximum number of iterations for computing universal perturbation.

  • eps (float) – The perturbation magnitude, which controls the strength of the universal perturbation applied to the input samples. A larger eps value will result in a more noticeable perturbation, potentially leading to higher attack success rates but also increasing the visual distortion in the generated adversarial examples. Default is 10.0.

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 2

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – The target labels for the targeted perturbation. The shape of y should match the number of instances in x.

Returns:

An array holding the adversarial examples.

Raises:

ValueError: if the labels y are None, or if the attack is applied to binary classification with a single-output classifier, for which it has not been tested.
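
A brief illustrative sketch follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section; the target class and parameter values are arbitrary:

    import numpy as np
    from art.attacks.evasion import TargetedUniversalPerturbation

    attack = TargetedUniversalPerturbation(classifier=classifier, attacker="fgsm", delta=0.2, max_iter=5, eps=0.1)

    y_target = np.zeros((x_test.shape[0], 10), dtype=np.float32)
    y_target[:, 1] = 1.0  # hypothetical target class 1 for every sample
    x_adv = attack.generate(x=x_test, y=y_target)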

Universal Perturbation Attack

class art.attacks.evasion.UniversalPerturbation(classifier: CLASSIFIER_TYPE, attacker: str = 'deepfool', attacker_params: Dict[str, Any] | None = None, delta: float = 0.2, max_iter: int = 20, eps: float = 10.0, norm: int | float | str = inf, batch_size: int = 32, verbose: bool = True)

Implementation of the attack from Moosavi-Dezfooli et al. (2016). Computes a fixed perturbation to be applied to all future inputs. To this end, it can use any adversarial attack method.

__init__(classifier: CLASSIFIER_TYPE, attacker: str = 'deepfool', attacker_params: Dict[str, Any] | None = None, delta: float = 0.2, max_iter: int = 20, eps: float = 10.0, norm: int | float | str = inf, batch_size: int = 32, verbose: bool = True) None
Parameters:
  • classifier – A trained classifier.

  • attacker (str) – Adversarial attack name. Default is ‘deepfool’. Supported names: ‘carlini’, ‘carlini_inf’, ‘deepfool’, ‘fgsm’, ‘bim’, ‘pgd’, ‘margin’, ‘ead’, ‘newtonfool’, ‘jsma’, ‘vat’, ‘simba’.

  • attacker_params – Parameters specific to the adversarial attack. If this parameter is not specified, the default parameters of the chosen attack will be used.

  • delta (float) – The desired accuracy; the attack stops once the fooling rate exceeds (1 - delta).

  • max_iter (int) – The maximum number of iterations for computing universal perturbation.

  • eps (float) – Attack step size (input variation).

  • norm – The norm of the adversarial perturbation. Possible values: “inf”, np.inf, 2.

  • batch_size (int) – Batch size for model evaluations in UniversalPerturbation.

  • verbose (bool) – Show progress bars.

property converged: bool | None

The convergence of universal perturbation generation.

Returns:

True if generation of universal perturbation has converged.

property fooling_rate: float | None

The fooling rate of the universal perturbation on the most recent call to generate.

Returns:

Fooling Rate.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – An array with the original labels to be predicted.

Returns:

An array holding the adversarial examples.

property noise: ndarray | None

The universal perturbation.

Returns:

Universal perturbation.
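
A brief illustrative sketch follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section; the labels are taken from the model's own predictions:

    import numpy as np
    from art.attacks.evasion import UniversalPerturbation

    attack = UniversalPerturbation(classifier=classifier, attacker="fgsm", delta=0.2, max_iter=5, eps=0.1)

    y_pred = np.argmax(classifier.predict(x_test), axis=1)
    x_adv = attack.generate(x=x_test, y=y_pred)

    # Properties populated by the most recent call to generate.
    print(attack.fooling_rate, attack.converged)
    universal_noise = attack.noise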

Virtual Adversarial Method

class art.attacks.evasion.VirtualAdversarialMethod(classifier: CLASSIFIER_TYPE, max_iter: int = 10, finite_diff: float = 1e-06, eps: float = 0.1, batch_size: int = 1, verbose: bool = True)

This attack was originally proposed by Miyato et al. (2016) and was used for virtual adversarial training.

__init__(classifier: CLASSIFIER_TYPE, max_iter: int = 10, finite_diff: float = 1e-06, eps: float = 0.1, batch_size: int = 1, verbose: bool = True) None

Create a VirtualAdversarialMethod instance.

Parameters:
  • classifier – A trained classifier.

  • eps (float) – Attack step (max input variation).

  • finite_diff (float) – The finite difference parameter.

  • max_iter (int) – The maximum number of iterations.

  • batch_size (int) – Size of the batch on which adversarial samples are generated.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – An array with the original labels to be predicted.

Returns:

An array holding the adversarial examples.
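
A brief illustrative sketch follows (not from the original reference). Since the method operates on predicted probabilities, it re-uses the hypothetical prob_classifier and the random inputs x from the NewtonFool sketch earlier in this section:

    from art.attacks.evasion import VirtualAdversarialMethod

    attack = VirtualAdversarialMethod(classifier=prob_classifier, eps=0.1, max_iter=5, batch_size=8)
    x_adv = attack.generate(x=x)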

Wasserstein Attack

class art.attacks.evasion.Wasserstein(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, targeted: bool = False, regularization: float = 3000.0, p: int = 2, kernel_size: int = 5, eps_step: float = 0.1, norm: str = 'wasserstein', ball: str = 'wasserstein', eps: float = 0.3, eps_iter: int = 10, eps_factor: float = 1.1, max_iter: int = 400, conjugate_sinkhorn_max_iter: int = 400, projected_sinkhorn_max_iter: int = 400, batch_size: int = 1, verbose: bool = True)

Implements Wasserstein Adversarial Examples via Projected Sinkhorn Iterations as evasion attack.

__init__(estimator: CLASSIFIER_LOSS_GRADIENTS_TYPE, targeted: bool = False, regularization: float = 3000.0, p: int = 2, kernel_size: int = 5, eps_step: float = 0.1, norm: str = 'wasserstein', ball: str = 'wasserstein', eps: float = 0.3, eps_iter: int = 10, eps_factor: float = 1.1, max_iter: int = 400, conjugate_sinkhorn_max_iter: int = 400, projected_sinkhorn_max_iter: int = 400, batch_size: int = 1, verbose: bool = True)

Create a Wasserstein attack instance.

Parameters:
  • estimator – A trained estimator.

  • targeted (bool) – Indicates whether the attack is targeted (True) or untargeted (False).

  • regularization (float) – Entropy regularization.

  • p (int) – The p-wasserstein distance.

  • kernel_size (int) – Kernel size for computing the cost matrix.

  • eps_step (float) – Attack step size (input variation) at each iteration.

  • norm (str) – The norm of the adversarial perturbation. Possible values: inf, 1, 2 or wasserstein.

  • ball (str) – The ball of the adversarial perturbation. Possible values: inf, 1, 2 or wasserstein.

  • eps (float) – Maximum perturbation that the attacker can introduce.

  • eps_iter (int) – Number of iterations to increase the epsilon.

  • eps_factor (float) – Factor to increase the epsilon.

  • max_iter (int) – The maximum number of iterations.

  • conjugate_sinkhorn_max_iter (int) – The maximum number of iterations for the conjugate sinkhorn optimizer.

  • projected_sinkhorn_max_iter (int) – The maximum number of iterations for the projected sinkhorn optimizer.

  • batch_size (int) – Size of batches.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,). Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None.

  • cost_matrix (np.ndarray) – A non-negative cost matrix.

Returns:

An array holding the adversarial examples.
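
A brief illustrative sketch follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section; note that the Sinkhorn iterations make this attack computationally heavy, and the parameter values are arbitrary:

    from art.attacks.evasion import Wasserstein

    attack = Wasserstein(
        estimator=classifier, eps=0.3, eps_step=0.1, kernel_size=5, max_iter=20, batch_size=8
    )
    x_adv = attack.generate(x=x_test)  # y omitted: model predictions are used as labels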

Zeroth-Order Optimization (ZOO) Attack

class art.attacks.evasion.ZooAttack(classifier: CLASSIFIER_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, max_iter: int = 10, binary_search_steps: int = 1, initial_const: float = 0.001, abort_early: bool = True, use_resize: bool = True, use_importance: bool = True, nb_parallel: int = 128, batch_size: int = 1, variable_h: float = 0.0001, verbose: bool = True)

The black-box zeroth-order optimization attack from Pin-Yu Chen et al. (2018). This attack is a variant of the C&W attack which uses ADAM coordinate descent to perform numerical estimation of gradients.

__init__(classifier: CLASSIFIER_TYPE, confidence: float = 0.0, targeted: bool = False, learning_rate: float = 0.01, max_iter: int = 10, binary_search_steps: int = 1, initial_const: float = 0.001, abort_early: bool = True, use_resize: bool = True, use_importance: bool = True, nb_parallel: int = 128, batch_size: int = 1, variable_h: float = 0.0001, verbose: bool = True)

Create a ZOO attack instance.

Parameters:
  • classifier – A trained classifier.

  • confidence (float) – Confidence of adversarial examples: a higher value produces examples that are farther away from the original input but classified with higher confidence as the target class.

  • targeted (bool) – Should the attack target one specific class.

  • learning_rate (float) – The initial learning rate for the attack algorithm. Smaller values produce better results but are slower to converge.

  • max_iter (int) – The maximum number of iterations.

  • binary_search_steps (int) – Number of times to adjust constant with binary search (positive value).

  • initial_const (float) – The initial trade-off constant c to use to tune the relative importance of distance and confidence. If binary_search_steps is large, the initial constant is not important, as discussed in Carlini and Wagner (2016).

  • abort_early (bool) – True if gradient descent should be abandoned when it gets stuck.

  • use_resize (bool) – True to use the resizing strategy from the paper: first, compute the attack on inputs resized to 32x32, then increase the size if needed to 64x64, followed by 128x128.

  • use_importance (bool) – True to use importance sampling when choosing coordinates to update.

  • nb_parallel (int) – Number of coordinate updates to run in parallel. A higher value for nb_parallel should be preferred over a large batch size.

  • batch_size (int) – Internal size of batches on which adversarial samples are generated. Small batch sizes are encouraged for ZOO, as the algorithm already runs nb_parallel coordinate updates in parallel for each sample. The batch size is a multiplier of nb_parallel in terms of memory consumption.

  • variable_h (float) – Step size for numerical estimation of derivatives.

  • verbose (bool) – Show progress bars.

generate(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Generate adversarial samples and return them in an array.

Return type:

ndarray

Parameters:
  • x (ndarray) – An array with the original inputs to be attacked.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns:

An array holding the adversarial examples.
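
A brief illustrative sketch follows (not from the original reference). It re-uses the hypothetical classifier and x_test placeholders from the ProjectedGradientDescentTensorFlowV2 sketch earlier in this section; the resizing and importance-sampling strategies are disabled here to keep the sketch minimal:

    from art.attacks.evasion import ZooAttack

    # Black-box attack: only model predictions are queried, no gradients are required.
    attack = ZooAttack(
        classifier=classifier, confidence=0.0, targeted=False, max_iter=10,
        nb_parallel=32, batch_size=1, use_resize=False, use_importance=False,
    )
    x_adv = attack.generate(x=x_test)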