art.attacks.poisoning

Module providing poisoning attacks under a common interface.

Backdoor Attack DGM ReD

class art.attacks.poisoning.BackdoorAttackDGMReDTensorFlowV2(generator: art.estimators.generation.tensorflow.TensorFlowV2Generator)

Class implementation of backdoor-based RED poisoning attack on DGM.

__init__(generator: art.estimators.generation.tensorflow.TensorFlowV2Generator) None

Initialize a backdoor RED poisoning attack.

Parameters

generator (TensorFlowV2Generator) – the generator to be poisoned

fidelity(z_trigger: numpy.ndarray, x_target: numpy.ndarray)

Calculates the fidelity of the poisoned model’s target sample w.r.t. the original x_target sample

Parameters
  • z_trigger (ndarray) – the secret backdoor trigger that will produce the target

  • x_target (ndarray) – the target to produce when using the trigger
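The fidelity check can be pictured with a small NumPy sketch. It assumes fidelity is a mean squared error between the generator output for the trigger and the target, which this page does not state. `fidelity_mse` and `toy_generator` are hypothetical names for illustration and are not part of ART.

```python
import numpy as np


def fidelity_mse(generate, z_trigger, x_target):
    """Mean squared error between generate(z_trigger) and x_target (assumed metric)."""
    produced = np.asarray(generate(z_trigger), dtype=np.float64)
    target = np.asarray(x_target, dtype=np.float64)
    return float(np.mean((produced - target) ** 2))


def toy_generator(z):
    """Hypothetical stand-in for a poisoned generator: latent vector -> 2x2 'image'."""
    # Every pixel is set to the sum of the latent coordinates.
    return np.tile(z.sum(axis=1, keepdims=True), (1, 4)).reshape(-1, 2, 2)


z_trigger = np.array([[0.5, 0.5]])
x_target = np.ones((1, 2, 2))

# A lower value means the trigger reproduces the target more faithfully.
score = fidelity_mse(toy_generator, z_trigger, x_target)
```

With this toy generator the trigger reproduces the target exactly, so the score is zero. A trigger that misses the target gives a positive score.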

poison_estimator(z_trigger: numpy.ndarray, x_target: numpy.ndarray, batch_size=32, max_iter=100, lambda_p=0.1, verbose=-1, **kwargs) art.estimators.generation.tensorflow.TensorFlowV2Generator

Creates a backdoor in the generative model

Return type

TensorFlowV2Generator

Parameters
  • z_trigger (ndarray) – the secret backdoor trigger that will produce the target

  • x_target (ndarray) – the target to produce when using the trigger

  • batch_size (int) – batch_size of images used to train generator

  • max_iter (int) – total number of iterations for performing the attack

  • lambda_p (float) – the lambda parameter balancing how much we want the auxiliary loss to be applied

  • verbose (int) – whether the fidelity should be displayed during training

Backdoor Attack DGM Trail

class art.attacks.poisoning.BackdoorAttackDGMTrailTensorFlowV2(gan: art.estimators.gan.tensorflow.TensorFlowV2GAN)

Class implementation of backdoor-based Trail poisoning attack on DGM.

__init__(gan: art.estimators.gan.tensorflow.TensorFlowV2GAN) None

Initialize a backdoor Trail poisoning attack.

Parameters

gan (TensorFlowV2GAN) – the GAN to be poisoned

fidelity(z_trigger: numpy.ndarray, x_target: numpy.ndarray)

Calculates the fidelity of the poisoned model’s target sample w.r.t. the original x_target sample

Parameters
  • z_trigger (ndarray) – the secret backdoor trigger that will produce the target

  • x_target (ndarray) – the target to produce when using the trigger

poison_estimator(z_trigger: numpy.ndarray, x_target: numpy.ndarray, batch_size=32, max_iter=100, lambda_p=0.1, verbose=-1, **kwargs) GENERATOR_TYPE

Creates a backdoor in the generative model

Parameters
  • z_trigger (ndarray) – the secret backdoor trigger that will produce the target

  • x_target (ndarray) – the target to produce when using the trigger

  • batch_size (int) – batch_size of images used to train generator

  • max_iter (int) – total number of iterations for performing the attack

  • lambda_p (float) – the lambda parameter balancing how much we want the auxiliary loss to be applied

  • verbose (int) – whether the fidelity should be displayed during training

Adversarial Embedding Attack

class art.attacks.poisoning.PoisoningAttackAdversarialEmbedding(classifier: CLASSIFIER_TYPE, backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, feature_layer: Union[int, str], target: Union[numpy.ndarray, List[Tuple[numpy.ndarray, numpy.ndarray]]], pp_poison: Union[float, List[float]] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)

Implementation of Adversarial Embedding attack by Tan, Shokri (2019). “Bypassing Backdoor Detection Algorithms in Deep Learning”

This attack trains a classifier with an additional discriminator and loss function that aims to make the latent representations of backdoored and benign examples indistinguishable.

__init__(classifier: CLASSIFIER_TYPE, backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, feature_layer: Union[int, str], target: Union[numpy.ndarray, List[Tuple[numpy.ndarray, numpy.ndarray]]], pp_poison: Union[float, List[float]] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)

Initialize an Adversarial Embedding poisoning attack.

Parameters
  • classifier – A neural network classifier.

  • backdoor (PoisoningAttackBackdoor) – The backdoor attack used to poison samples

  • feature_layer – The layer of the original network to extract features from

  • target – The target label to poison

  • pp_poison – The percentage of training data to poison

  • discriminator_layer_1 (int) – The size of the first discriminator layer

  • discriminator_layer_2 (int) – The size of the second discriminator layer

  • regularization (float) – The regularization constant for the backdoor recognition part of the loss function

  • learning_rate (float) – The learning rate of clean-label attack optimization.

  • clone (bool) – Whether or not to clone the model or apply the attack on the original model

get_training_data() Optional[Tuple[numpy.ndarray, Optional[numpy.ndarray], Optional[numpy.ndarray]]]

Returns the training data generated from the last call to fit

Returns

If fit has been called, return the last data, labels, and backdoor labels used to train the model; otherwise return None.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, broadcast=False, **kwargs) Tuple[numpy.ndarray, numpy.ndarray]

Calls perturbation function on input x and target labels y

Return type

Tuple

Parameters
  • x (ndarray) – An array with the points that initialize attack points.

  • y – The target labels for the attack.

  • broadcast (bool) – whether or not to broadcast single target label

Returns

A tuple holding the (poisoning_examples, poisoning_labels).

poison_estimator(x: numpy.ndarray, y: numpy.ndarray, batch_size: int = 64, nb_epochs: int = 10, **kwargs) CLASSIFIER_TYPE

Train a poisoned model and return it

Parameters
  • x (ndarray) – Training data

  • y (ndarray) – Training labels

  • batch_size (int) – The size of the batches used for training

  • nb_epochs (int) – The number of epochs to train for

Returns

A classifier with embedded backdoors

Backdoor Poisoning Attack

class art.attacks.poisoning.PoisoningAttackBackdoor(perturbation: Union[Callable, List[Callable]])

Implementation of backdoor attacks introduced in Gu et al., 2017.

Applies a number of backdoor perturbation functions and switches label to target label

__init__(perturbation: Union[Callable, List[Callable]]) None

Initialize a backdoor poisoning attack.

Parameters

perturbation – A single perturbation function or list of perturbation functions that modify input.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, broadcast=False, **kwargs) Tuple[numpy.ndarray, numpy.ndarray]

Calls perturbation function on input x and returns the perturbed input and poison labels for the data.

Return type

Tuple

Parameters
  • x (ndarray) – An array with the points that initialize attack points.

  • y – The target labels for the attack.

  • broadcast (bool) – whether or not to broadcast single target label

Returns

A tuple holding the (poisoning_examples, poisoning_labels).
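The behaviour described above (apply one or more perturbation functions, then pair the result with target labels) can be illustrated without ART. The following is a NumPy-only sketch, not the library's implementation. `add_corner_patch` and `poison_sketch` are hypothetical names, and the handling of `broadcast` shown here is an assumption based on the parameter description.

```python
import numpy as np


def add_corner_patch(x, value=1.0, size=2):
    """Hypothetical perturbation: stamp a square in the bottom-right corner of (N, H, W) images."""
    x = np.copy(x)  # leave the caller's array untouched
    x[:, -size:, -size:] = value
    return x


def poison_sketch(x, y_target, perturbations, broadcast=False):
    """Apply each perturbation in turn and return (perturbed inputs, target labels)."""
    if broadcast:
        # Assumption: a single target label is repeated once per sample.
        y_out = np.broadcast_to(y_target, (x.shape[0],) + np.shape(y_target)).copy()
    else:
        y_out = np.copy(y_target)
    x_out = np.copy(x)
    for perturb in perturbations:
        x_out = perturb(x_out)
    return x_out, y_out


x_clean = np.zeros((3, 4, 4))
target_label = np.array([0, 1])  # one-hot label for class 1
x_poison, y_poison = poison_sketch(x_clean, target_label, [add_corner_patch], broadcast=True)
```

A perturbation written this way takes an array and returns a modified copy, which matches the Callable expected by the perturbation argument.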

Gradient Matching Attack

class art.attacks.poisoning.GradientMatchingAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1)

Implementation of Gradient Matching Attack by Geiping et al. 2020. “Witches’ Brew: Industrial Scale Data Poisoning via Gradient Matching”

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1)

Initialize a Gradient Matching Clean-Label poisoning attack (Witches’ Brew).

Parameters
  • classifier – The proxy classifier used for the attack.

  • percent_poison (float) – The ratio of samples to poison among x_train, with range [0,1].

  • epsilon (float) – The L-inf perturbation budget.

  • max_trials (int) – The maximum number of restarts to optimize the poison.

  • max_epochs (int) – The maximum number of optimization epochs per trial.

  • learning_rate_schedule (Tuple) – The learning rate schedule to optimize the poison. A tuple of two lists: the learning rates and the epochs paired with them. A learning rate is used while the current epoch is less than its paired epoch.

  • batch_size (int) – Batch size.

  • clip_values (Tuple) – The range of the input features to the classifier.

  • verbose (int) – Show progress bars.
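The learning_rate_schedule rule can be read as a simple lookup. The sketch below is a hypothetical helper, not ART code. It assumes the first rate whose paired epoch exceeds the current epoch is used, and that the last rate keeps applying once every boundary has passed; this page does not specify the second case.

```python
from typing import List, Tuple


def learning_rate_for_epoch(schedule: Tuple[List[float], List[int]], epoch: int) -> float:
    """Return the first learning rate whose paired epoch boundary is above `epoch`."""
    rates, boundaries = schedule
    for rate, boundary in zip(rates, boundaries):
        if epoch < boundary:
            return rate
    # Assumption: after the final boundary, keep using the last learning rate.
    return rates[-1]


# Default value from the signature above.
default_schedule = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220])
first_rate = learning_rate_for_epoch(default_schedule, 0)
late_rate = learning_rate_for_epoch(default_schedule, 210)
```

Under these assumptions the default schedule uses 0.1 for epochs 0 to 99, 0.01 up to epoch 149, 0.001 up to epoch 199, and 0.0001 afterwards.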

poison(x_trigger: numpy.ndarray, y_trigger: numpy.ndarray, x_train: numpy.ndarray, y_train: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray]

Optimizes a portion of poisoned samples from x_train to make a model classify x_trigger as y_trigger by matching the gradients.

Return type

Tuple

Parameters
  • x_trigger (ndarray) – A list of samples to use as triggers.

  • y_trigger (ndarray) – A list of target classes to classify the triggers into.

  • x_train (ndarray) – A list of training data to poison a portion of.

  • y_train (ndarray) – A list of labels for x_train.

Returns

A list of poisoned samples, and y_train.

Hidden Trigger Backdoor Attack

class art.attacks.poisoning.HiddenTriggerBackdoor(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: numpy.ndarray, source: numpy.ndarray, feature_layer: Union[str, int], backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, eps: float = 0.1, learning_rate: float = 0.001, decay_coeff: float = 0.95, decay_iter: Union[int, List[int]] = 2000, stopping_threshold: float = 10, max_iter: int = 5000, batch_size: float = 100, poison_percent: float = 0.1, is_index: bool = False, verbose: bool = True, print_iter: int = 100)

Implementation of Hidden Trigger Backdoor Attack by Saha et al. 2019. “Hidden Trigger Backdoor Attacks”

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: numpy.ndarray, source: numpy.ndarray, feature_layer: Union[str, int], backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, eps: float = 0.1, learning_rate: float = 0.001, decay_coeff: float = 0.95, decay_iter: Union[int, List[int]] = 2000, stopping_threshold: float = 10, max_iter: int = 5000, batch_size: float = 100, poison_percent: float = 0.1, is_index: bool = False, verbose: bool = True, print_iter: int = 100) None

Creates a new Hidden Trigger Backdoor poisoning attack.

Parameters
  • classifier – A trained neural network classifier.

  • target (ndarray) – The target class/indices to poison. Triggers added to inputs not in the target class will result in misclassifications to the target class. If an int, it represents a label. Otherwise, it is an array of indices.

  • source (ndarray) – The class/indices which will have a trigger added to cause misclassification. If an int, it represents a label. Otherwise, it is an array of indices.

  • feature_layer – The name of the feature representation layer

  • backdoor (PoisoningAttackBackdoor) – A PoisoningAttackBackdoor that adds a backdoor trigger to the input.

  • eps (float) – Maximum perturbation that the attacker can introduce.

  • learning_rate (float) – The learning rate of clean-label attack optimization.

  • decay_coeff (float) – The decay coefficient of the learning rate.

  • decay_iter – The number of iterations before the learning rate decays

  • stopping_threshold (float) – Stop iterations after loss is less than this threshold.

  • max_iter (int) – The maximum number of iterations for the attack.

  • batch_size (float) – The number of samples to draw per batch.

  • poison_percent (float) – The percentage of the data to poison. This is ignored if indices are provided

  • is_index (bool) – If true, the source and target params are assumed to represent indices rather than a class label. poison_percent is ignored if true.

  • verbose (bool) – Show progress bars.

  • print_iter (int) – The number of iterations to print the current loss progress.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) Tuple[numpy.ndarray, numpy.ndarray]

Calls perturbation function on the dataset x and returns only the perturbed inputs and their indices in the dataset.

Return type

Tuple

Parameters
  • x (ndarray) – An array in the shape NxCxWxH with the points to draw source and target samples from. Source indicates the class(es) that the backdoor would be added to in order to cause misclassification into the target label. Target indicates the class that the backdoor should cause misclassification into.

  • y – The labels of the provided samples. If none, we will use the classifier to label the data.

Returns

A tuple holding the perturbed inputs and their indices in the dataset.

Bullseye Polytope Attack

class art.attacks.poisoning.BullseyePolytopeAttackPyTorch(classifier: Union[CLASSIFIER_NEURALNETWORK_TYPE, List[CLASSIFIER_NEURALNETWORK_TYPE]], target: numpy.ndarray, feature_layer: Union[str, int, List[Union[str, int]]], opt: str = 'adam', max_iter: int = 4000, learning_rate: float = 0.04, momentum: float = 0.9, decay_iter: Union[int, List[int]] = 10000, decay_coeff: float = 0.5, epsilon: float = 0.1, dropout: float = 0.3, net_repeat: int = 1, endtoend: bool = True, batch_size: int = 128, verbose: bool = True)

Implementation of Bullseye Polytope Attack by Aghakhani et al. 2020. “Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability”

This implementation is based on UCSB’s original code here: https://github.com/ucsb-seclab/BullseyePoison

__init__(classifier: Union[CLASSIFIER_NEURALNETWORK_TYPE, List[CLASSIFIER_NEURALNETWORK_TYPE]], target: numpy.ndarray, feature_layer: Union[str, int, List[Union[str, int]]], opt: str = 'adam', max_iter: int = 4000, learning_rate: float = 0.04, momentum: float = 0.9, decay_iter: Union[int, List[int]] = 10000, decay_coeff: float = 0.5, epsilon: float = 0.1, dropout: float = 0.3, net_repeat: int = 1, endtoend: bool = True, batch_size: int = 128, verbose: bool = True)

Initialize a Bullseye Polytope clean-label poisoning attack.

Parameters
  • classifier – The proxy classifiers used for the attack. Can be a single classifier or list of classifiers with varying architectures.

  • target (ndarray) – The target input(s) of shape (N, W, H, C) to misclassify at test time. Multiple targets will be averaged.

  • feature_layer – The name(s) of the feature representation layer(s).

  • opt (str) – The optimizer to use for the attack. Can be ‘adam’ or ‘sgd’

  • max_iter (int) – The maximum number of iterations for the attack.

  • learning_rate (float) – The learning rate of clean-label attack optimization.

  • momentum (float) – The momentum of clean-label attack optimization.

  • decay_iter – The iterations at which to decay the learning rate. Can be an integer (every N iterations) or a list of integers, e.g. [0, 500, 1500]

  • decay_coeff (float) – The decay coefficient of the learning rate.

  • epsilon (float) – The perturbation budget

  • dropout (float) – Dropout to apply while training

  • net_repeat (int) – The number of times to repeat prediction on each network

  • endtoend (bool) – True for end-to-end training. False for transfer learning.

  • batch_size (int) – Batch size.

  • verbose (bool) – Show progress bars.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) Tuple[numpy.ndarray, numpy.ndarray]

Iteratively finds optimal attack points starting at values at x

Return type

Tuple

Parameters
  • x (ndarray) – The base images to begin the poison process.

  • y – Target label

Returns

A tuple holding the (poisoning examples, poisoning labels).

Clean Label Backdoor Attack

class art.attacks.poisoning.PoisoningAttackCleanLabelBackdoor(backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, proxy_classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, target: numpy.ndarray, pp_poison: float = 0.33, norm: Union[int, float, str] = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, num_random_init: int = 0)

Implementation of Clean-Label Backdoor Attack introduced in Turner et al., 2018.

Applies a number of backdoor perturbation functions and does not change labels.

__init__(backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, proxy_classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, target: numpy.ndarray, pp_poison: float = 0.33, norm: Union[int, float, str] = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, num_random_init: int = 0) None

Creates a new Clean Label Backdoor poisoning attack

Parameters
  • backdoor (PoisoningAttackBackdoor) – the backdoor chosen for this attack

  • proxy_classifier – the classifier for this attack; ideally it solves the same or a similar classification task as the original classifier

  • target (ndarray) – The target label to poison

  • pp_poison (float) – The percentage of the data to poison. Note: Only data within the target label is poisoned

  • norm – The norm of the adversarial perturbation supporting “inf”, np.inf, 1 or 2.

  • eps (float) – Maximum perturbation that the attacker can introduce.

  • eps_step (float) – Attack step size (input variation) at each iteration.

  • max_iter (int) – The maximum number of iterations.

  • num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 starting at the original input.
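The note on pp_poison (only data within the target label is poisoned) can be made concrete with a short NumPy sketch. It assumes one-hot labels and a uniformly random choice among target-class samples; the rounding and random source used by the library may differ. `select_poison_indices` is a hypothetical helper.

```python
import numpy as np


def select_poison_indices(y, target, pp_poison, seed=0):
    """Pick a pp_poison fraction of the samples whose one-hot label equals `target`."""
    rng = np.random.default_rng(seed)
    # Indices of samples that already carry the target label.
    in_target = np.flatnonzero(np.all(y == target, axis=1))
    num_poison = int(pp_poison * len(in_target))
    return np.sort(rng.choice(in_target, size=num_poison, replace=False))


y_train = np.eye(3)[[0, 1, 1, 2, 1, 1, 0, 1]]  # one-hot labels for 8 samples
target = np.array([0.0, 1.0, 0.0])             # class 1
chosen = select_poison_indices(y_train, target, pp_poison=0.5)
```

Five samples carry the target label here, so a fraction of 0.5 selects two of them. Samples from other classes are never selected, and labels are left unchanged, which is what makes the attack clean-label.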

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, broadcast: bool = True, **kwargs) Tuple[numpy.ndarray, numpy.ndarray]

Calls perturbation function on input x and returns the perturbed input and poison labels for the data.

Return type

Tuple

Parameters
  • x (ndarray) – An array with the points that initialize attack points.

  • y – The target labels for the attack.

  • broadcast (bool) – whether or not to broadcast single target label

Returns

A tuple holding the (poisoning_examples, poisoning_labels).

Feature Collision Attack

class art.attacks.poisoning.FeatureCollisionAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: numpy.ndarray, feature_layer: Union[str, int], learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: Optional[float] = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: Optional[float] = None, verbose: bool = True)

Close implementation of Feature Collision Poisoning Attack by Shafahi, Huang, et al 2018. “Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks”

This implementation dynamically calculates the dimension of the feature layer, and doesn’t hardcode this value to 2048 as done in the paper. Thus we recommend using larger values for similarity_coeff.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: numpy.ndarray, feature_layer: Union[str, int], learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: Optional[float] = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: Optional[float] = None, verbose: bool = True)

Initialize a Feature Collision Clean-Label poisoning attack

Parameters
  • classifier – A trained neural network classifier.

  • target (ndarray) – The target input to misclassify at test time.

  • feature_layer – The name of the feature representation layer.

  • learning_rate (float) – The learning rate of clean-label attack optimization.

  • decay_coeff (float) – The decay coefficient of the learning rate.

  • stopping_tol (float) – Stop iterations after changes in attacks are less than this threshold.

  • obj_threshold – Stop iterations after changes in objectives values are less than this threshold.

  • num_old_obj (int) – The number of old objective values to store.

  • max_iter (int) – The maximum number of iterations for the attack.

  • similarity_coeff (float) – The coefficient of the similarity term in the objective, which keeps the poison close to the base image.

  • watermark – The opacity of the watermarked target image.

  • verbose (bool) – Show progress bars.

backward_step(base: numpy.ndarray, feature_rep: numpy.ndarray, poison: numpy.ndarray) numpy.ndarray

Backward part of forward-backward splitting algorithm

Return type

ndarray

Parameters
  • base (ndarray) – The base image that the poison was initialized with.

  • feature_rep (ndarray) – Numpy activations at the target layer.

  • poison (ndarray) – The current poison samples.

Returns

Poison example closer in feature representation to target space.
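For orientation, the backward (proximal) step of forward-backward splitting in the Poison Frogs paper has a closed form that pulls the current poison toward the base image. The sketch below follows that formula. The final clipping to a valid input range and the names `backward_step_sketch`, `beta` and `clip_values` are assumptions for illustration, not this method's actual signature.

```python
import numpy as np


def backward_step_sketch(base, poison, learning_rate, beta, clip_values=(0.0, 1.0)):
    """Closed-form proximal update: (poison + lr * beta * base) / (1 + lr * beta)."""
    updated = (poison + learning_rate * beta * base) / (1.0 + learning_rate * beta)
    # Assumption: keep the result inside the valid input range.
    return np.clip(updated, clip_values[0], clip_values[1])


base = np.full((1, 2, 2), 0.2)
poison = np.full((1, 2, 2), 0.8)

# With learning_rate * beta = 1 the result is the midpoint of poison and base.
pulled = backward_step_sketch(base, poison, learning_rate=1.0, beta=1.0)
```

A larger product of learning rate and beta moves the result closer to the base image, while a product of zero leaves the poison unchanged.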

forward_step(poison: numpy.ndarray) numpy.ndarray

Forward part of forward-backward splitting algorithm.

Return type

ndarray

Parameters

poison (ndarray) – the current poison samples.

Returns

Poison example closer in feature representation to target space.

objective(poison_feature_rep: numpy.ndarray, target_feature_rep: numpy.ndarray, base_image: numpy.ndarray, poison: numpy.ndarray) float

Objective function of the attack

Return type

float

Parameters
  • poison_feature_rep (ndarray) – The numpy activations of the poison image.

  • target_feature_rep (ndarray) – The numpy activations of the target image.

  • base_image (ndarray) – The initial image used to poison.

  • poison (ndarray) – The current poison image.

Returns

The objective of the optimization.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) Tuple[numpy.ndarray, numpy.ndarray]

Iteratively finds optimal attack points starting at values at x

Return type

Tuple

Parameters
  • x (ndarray) – The base images to begin the poison process.

  • y – Not used in this attack (clean-label).

Returns

A tuple holding the (poisoning examples, poisoning labels).

Poisoning SVM Attack

class art.attacks.poisoning.PoisoningAttackSVM(classifier: art.estimators.classification.scikitlearn.ScikitlearnSVC, step: Optional[float] = None, eps: Optional[float] = None, x_train: Optional[numpy.ndarray] = None, y_train: Optional[numpy.ndarray] = None, x_val: Optional[numpy.ndarray] = None, y_val: Optional[numpy.ndarray] = None, max_iter: int = 100, verbose: bool = True)

Close implementation of poisoning attack on Support Vector Machines (SVM) by Biggio et al.

__init__(classifier: art.estimators.classification.scikitlearn.ScikitlearnSVC, step: Optional[float] = None, eps: Optional[float] = None, x_train: Optional[numpy.ndarray] = None, y_train: Optional[numpy.ndarray] = None, x_val: Optional[numpy.ndarray] = None, y_val: Optional[numpy.ndarray] = None, max_iter: int = 100, verbose: bool = True) None

Initialize an SVM poisoning attack.

Parameters
  • classifier – A trained ScikitlearnSVC classifier.

  • step – The step size of the classifier.

  • eps – The minimum difference in loss before convergence of the classifier.

  • x_train – The training data used for classification.

  • y_train – The training labels used for classification.

  • x_val – The validation data used to test the attack.

  • y_val – The validation labels used to test the attack.

  • max_iter (int) – The maximum number of iterations for the attack.

  • verbose (bool) – Show progress bars.

Raises

NotImplementedError, TypeError – If the argument classifier has the wrong type.

attack_gradient(attack_point: numpy.ndarray, tol: float = 0.0001) numpy.ndarray

Calculates the attack gradient, or dP for this attack. See equation 8 in Biggio et al. Ch. 14

Return type

ndarray

Parameters
  • attack_point (ndarray) – The current attack point.

  • tol (float) – Tolerance level.

Returns

The attack gradient.

generate_attack_point(x_attack: numpy.ndarray, y_attack: numpy.ndarray) numpy.ndarray

Generate a single poison attack point against the model, using x_val and y_val as validation points. The attack begins at the point x_attack. The attack class will be the opposite of the model’s classification for x_attack.

Return type

ndarray

Parameters
  • x_attack (ndarray) – The initial attack point.

  • y_attack (ndarray) – The initial attack label.

Returns

The final attack point.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) Tuple[numpy.ndarray, numpy.ndarray]

Iteratively finds optimal attack points starting at values at x.

Return type

Tuple

Parameters
  • x (ndarray) – An array with the points that initialize attack points.

  • y – The target labels for the attack.

Returns

A tuple holding the (poisoning_examples, poisoning_labels).

predict_sign(vec: numpy.ndarray) numpy.ndarray

Predicts the inputs by binary classifier and outputs -1 and 1 instead of 0 and 1.

Return type

ndarray

Parameters

vec (ndarray) – An input array.

Returns

An array of -1/1 predictions.
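The label convention used by predict_sign can be shown with a one-line mapping. This sketch only illustrates the conversion from {0, 1} to {-1, 1}; it does not call a classifier, and `to_signed_labels` is a hypothetical helper rather than part of the class.

```python
import numpy as np


def to_signed_labels(binary_predictions):
    """Map class predictions in {0, 1} to {-1, 1}."""
    return 2 * np.asarray(binary_predictions, dtype=int) - 1


signed = to_signed_labels([0, 1, 1, 0])
```

The signed form is the convention usually used in SVM formulations, where the two classes are labelled -1 and 1.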