art.attacks.poisoning

Module providing poisoning attacks under a common interface.

Adversarial Embedding Attack

class art.attacks.poisoning.PoisoningAttackAdversarialEmbedding(classifier: CLASSIFIER_TYPE, backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, feature_layer: Union[int, str], target: Union[numpy.ndarray, List[Tuple[numpy.ndarray, numpy.ndarray]]], pp_poison: Union[float, List[float]] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)

Implementation of the Adversarial Embedding attack by Tan and Shokri (2019), “Bypassing Backdoor Detection Algorithms in Deep Learning”

This attack trains a classifier with an additional discriminator and loss term that encourages the latent representations of backdoored and benign examples to be indistinguishable, thereby bypassing latent-space backdoor detection.

__init__(classifier: CLASSIFIER_TYPE, backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, feature_layer: Union[int, str], target: Union[numpy.ndarray, List[Tuple[numpy.ndarray, numpy.ndarray]]], pp_poison: Union[float, List[float]] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)

Initialize an Adversarial Embedding poisoning attack.

Parameters
  • classifier – A neural network classifier.

  • backdoor (PoisoningAttackBackdoor) – The backdoor attack used to poison samples

  • feature_layer – The layer of the original network to extract features from

  • target – The target label(s) to poison: either a single target label, or a list of (source, target) label pairs.

  • pp_poison – The fraction of training data to poison (e.g. 0.05 for 5%); if a list, one value per (source, target) pair.

  • discriminator_layer_1 (int) – The size of the first discriminator layer

  • discriminator_layer_2 (int) – The size of the second discriminator layer

  • regularization (float) – The regularization constant for the backdoor recognition part of the loss function

  • learning_rate (float) – The learning rate used when training the poisoned model.

  • clone (bool) – If True, apply the attack to a clone of the model; if False, apply it to the original model.
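
A minimal construction sketch is shown below. The classifier object, the 10-class setup, and the layer name "dense_1" are placeholders for your own model; only the ART imports are taken from the library itself.

    import numpy as np
    from art.attacks.poisoning import PoisoningAttackBackdoor, PoisoningAttackAdversarialEmbedding
    from art.attacks.poisoning.perturbations import add_pattern_bd

    # Backdoor trigger: a small pixel pattern added to each poisoned image.
    backdoor = PoisoningAttackBackdoor(add_pattern_bd)

    # One-hot target label for a hypothetical 10-class problem.
    target = np.zeros(10)
    target[0] = 1

    attack = PoisoningAttackAdversarialEmbedding(
        classifier=classifier,      # placeholder: a compiled ART KerasClassifier
        backdoor=backdoor,
        feature_layer="dense_1",    # placeholder layer name for the latent representation
        target=target,
        pp_poison=0.10,             # poison 10% of the training data
        regularization=10,
        learning_rate=1e-4,
    )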

get_training_data() → Optional[Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]

Returns the training data generated from the last call to fit

Returns

If fit has been called, return the last data, labels, and backdoor labels used to train the model; otherwise return None.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, broadcast=False, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Calls the perturbation function on input x with target labels y and returns the poisoned inputs and labels.

Return type

Tuple

Parameters
  • x (ndarray) – An array with the inputs to be poisoned.

  • y – The target labels for the attack.

  • broadcast (bool) – Whether to broadcast a single target label to all inputs.

Returns

A tuple holding (poisoning_examples, poisoning_labels).

poison_estimator(x: numpy.ndarray, y: numpy.ndarray, batch_size: int = 64, nb_epochs: int = 10, **kwargs) → CLASSIFIER_TYPE

Train a poisoned model and return it.

Parameters
  • x (ndarray) – Training data

  • y (ndarray) – Training labels

  • batch_size (int) – The size of the batches used for training

  • nb_epochs (int) – The number of epochs to train for

Returns

A classifier with embedded backdoors
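
Continuing the construction sketch above, training the backdoored model and retrieving the poisoned training set might look as follows; x_train and y_train are placeholder training arrays.

    # Train a backdoored copy of the classifier (clone=True leaves the original untouched).
    poisoned_classifier = attack.poison_estimator(x_train, y_train, batch_size=64, nb_epochs=10)

    # Inspect the exact data, labels, and backdoor indicators used in that fit.
    train_data, train_labels, backdoored = attack.get_training_data()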

Backdoor Poisoning Attack

class art.attacks.poisoning.PoisoningAttackBackdoor(perturbation: Union[Callable, List[Callable]])

Implementation of backdoor attacks introduced in Gu et al., 2017.

Applies a number of backdoor perturbation functions and switches the labels to the target label.

__init__(perturbation: Union[Callable, List[Callable]]) → None

Initialize a backdoor poisoning attack.

Parameters

perturbation – A single perturbation function or list of perturbation functions that modify input.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, broadcast=False, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Calls the perturbation function on input x and returns the perturbed inputs and the poison labels for the data.

Return type

Tuple

Parameters
  • x (ndarray) – An array with the inputs to be poisoned.

  • y – The target labels for the attack.

  • broadcast (bool) – Whether to broadcast a single target label to all inputs.

Returns

A tuple holding (poisoning_examples, poisoning_labels).
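
A minimal usage sketch, assuming a placeholder array x_clean of grayscale images and a 10-class problem; add_single_bd is an ART-provided perturbation that sets a single trigger pixel.

    import numpy as np
    from art.attacks.poisoning import PoisoningAttackBackdoor
    from art.attacks.poisoning.perturbations import add_single_bd

    backdoor = PoisoningAttackBackdoor(add_single_bd)

    # Poison a batch toward class 1: broadcast the single one-hot label to every sample.
    target_label = np.zeros(10)
    target_label[1] = 1
    x_poison, y_poison = backdoor.poison(x_clean, y=target_label, broadcast=True)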

Feature Collision Attack

class art.attacks.poisoning.FeatureCollisionAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: numpy.ndarray, feature_layer: Union[str, int], learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: Optional[float] = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: Optional[float] = None)

Close implementation of the Feature Collision poisoning attack by Shafahi, Huang, et al. (2018), “Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks”

This implementation dynamically calculates the dimension of the feature layer rather than hardcoding it to 2048 as done in the paper. We therefore recommend using larger values for similarity_coeff.

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: numpy.ndarray, feature_layer: Union[str, int], learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: Optional[float] = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: Optional[float] = None)

Initialize a Feature Collision Clean-Label poisoning attack.

Parameters
  • classifier – A trained neural network classifier.

  • target (ndarray) – The target input to misclassify at test time.

  • feature_layer – The name of the feature representation layer.

  • learning_rate (float) – The learning rate of clean-label attack optimization.

  • decay_coeff (float) – The decay coefficient of the learning rate.

  • stopping_tol (float) – Stop iterating when the change in the attack points is less than this threshold.

  • obj_threshold – Stop iterating when the change in the objective value is less than this threshold.

  • num_old_obj (int) – The number of old objective values to store.

  • max_iter (int) – The maximum number of iterations for the attack.

  • similarity_coeff (float) – The coefficient weighting the similarity term (closeness of the poison to the base image) in the objective.

  • watermark – The opacity of the watermarked target image blended into the base images, or None for no watermark.
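
A minimal construction sketch, assuming a trained ART neural-network classifier, a single target test image x_target (with batch dimension), base images x_base drawn from the class whose label the poisons keep, and a hypothetical layer name; all of these are placeholders.

    from art.attacks.poisoning import FeatureCollisionAttack

    attack = FeatureCollisionAttack(
        classifier=classifier,       # placeholder: a trained ART neural-network classifier
        target=x_target,             # placeholder: the test instance to misclassify
        feature_layer="flatten_1",   # placeholder layer name from your own model
        max_iter=100,
        similarity_coeff=256.0,
        watermark=0.3,               # optionally blend 30% of the target into the bases
    )
    x_poison, y_poison = attack.poison(x_base)   # x_base: placeholder base images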

backward_step(base: numpy.ndarray, feature_rep: numpy.ndarray, poison: numpy.ndarray) → numpy.ndarray

Backward part of the forward-backward splitting algorithm.

Return type

ndarray

Parameters
  • base (ndarray) – The base image that the poison was initialized with.

  • feature_rep (ndarray) – Numpy activations at the target layer.

  • poison (ndarray) – The current poison samples.

Returns

The poison example after the backward (proximal) step, pulled back toward the base image in input space.

forward_step(poison: numpy.ndarray) → numpy.ndarray

Forward part of forward-backward splitting algorithm.

Return type

ndarray

Parameters

poison (ndarray) – The current poison samples.

Returns

The poison example moved closer to the target in feature space.

objective(poison_feature_rep: numpy.ndarray, target_feature_rep: numpy.ndarray, base_image: numpy.ndarray, poison: numpy.ndarray) → float

Objective function of the attack

Return type

float

Parameters
  • poison_feature_rep (ndarray) – The numpy activations of the poison image.

  • target_feature_rep (ndarray) – The numpy activations of the target image.

  • base_image (ndarray) – The initial image used to poison.

  • poison (ndarray) – The current poison image.

Returns

The objective of the optimization.
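
The optimization can be summarized as follows, following the formulation in the Poison Frogs paper; f denotes the activations at feature_layer, t the target, b the base image, \lambda the learning_rate, and \beta a coefficient derived from similarity_coeff (the exact scaling used by this implementation may differ).

    \min_p \; \|f(p) - f(t)\|^2 + \beta \, \|p - b\|^2

    forward step:   \hat{p} = p_k - \lambda \, \nabla_p \|f(p_k) - f(t)\|^2
    backward step:  p_{k+1} = \frac{\hat{p} + \lambda \beta \, b}{1 + \lambda \beta}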

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Iteratively finds optimal attack points starting from the values in x.

Return type

Tuple

Parameters
  • x (ndarray) – The base images to begin the poison process.

  • y – Not used in this attack (clean-label).

Returns

A tuple holding (poisoning_examples, poisoning_labels).

Poisoning SVM Attack

class art.attacks.poisoning.PoisoningAttackSVM(classifier: art.estimators.classification.scikitlearn.ScikitlearnSVC, step: Optional[float] = None, eps: Optional[float] = None, x_train: Optional[numpy.ndarray] = None, y_train: Optional[numpy.ndarray] = None, x_val: Optional[numpy.ndarray] = None, y_val: Optional[numpy.ndarray] = None, max_iter: int = 100)

Close implementation of the poisoning attack on Support Vector Machines (SVM) by Biggio et al. (2012).

__init__(classifier: art.estimators.classification.scikitlearn.ScikitlearnSVC, step: Optional[float] = None, eps: Optional[float] = None, x_train: Optional[numpy.ndarray] = None, y_train: Optional[numpy.ndarray] = None, x_val: Optional[numpy.ndarray] = None, y_val: Optional[numpy.ndarray] = None, max_iter: int = 100) → None

Initialize an SVM poisoning attack.

Parameters
  • classifier – A trained ScikitlearnSVC classifier.

  • step – The step size used when updating the attack point.

  • eps – The minimum change in loss below which the attack is considered converged.

  • x_train – The training data used for classification.

  • y_train – The training labels used for classification.

  • x_val – The validation data used to test the attack.

  • y_val – The validation labels used to test the attack.

  • max_iter (int) – The maximum number of iterations for the attack.

Raises

NotImplementedError, TypeError – If the argument classifier has the wrong type.
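
A minimal end-to-end sketch, assuming binary-classification training/validation arrays x_train, y_train, x_val, y_val with one-hot labels of shape (n, 2); these arrays and the chosen hyperparameters are placeholders.

    from sklearn.svm import SVC
    from art.estimators.classification import SklearnClassifier
    from art.attacks.poisoning import PoisoningAttackSVM

    # Wrap and fit a linear-kernel SVC in the scikit-learn SVC estimator ART expects.
    classifier = SklearnClassifier(model=SVC(kernel="linear"), clip_values=(0, 1))
    classifier.fit(x_train, y_train)

    attack = PoisoningAttackSVM(
        classifier=classifier,
        step=0.1,                # placeholder step size
        eps=1.0,                 # placeholder convergence threshold
        x_train=x_train,
        y_train=y_train,
        x_val=x_val,
        y_val=y_val,
        max_iter=100,
    )

    # Seed attack points with flipped labels; poison() returns the optimized points.
    x_seed, y_seed = x_train[:2], 1 - y_train[:2]
    x_poison, y_poison = attack.poison(x_seed, y=y_seed)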

attack_gradient(attack_point: numpy.ndarray, tol: float = 0.0001) → numpy.ndarray

Calculates the attack gradient, or dP for this attack. See equation 8 in Biggio et al. Ch. 14

Return type

ndarray

Parameters
  • attack_point (ndarray) – The current attack point.

  • tol (float) – Tolerance level.

Returns

The attack gradient.

generate_attack_point(x_attack: numpy.ndarray, y_attack: numpy.ndarray) → numpy.ndarray

Generate a single poison point to attack the model, using x_val and y_val as validation points. The attack begins at x_attack. The attack class will be the opposite of the model’s classification of x_attack.

Return type

ndarray

Parameters
  • x_attack (ndarray) – The initial attack point.

  • y_attack (ndarray) – The initial attack label.

Returns

The final attack point.

poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Iteratively finds optimal attack points starting at values at x.

Return type

Tuple

Parameters
  • x (ndarray) – An array with the points that initialize the attack points.

  • y – The target labels for the attack.

Returns

A tuple holding the (poisoning_examples, poisoning_labels).

predict_sign(vec: numpy.ndarray) → numpy.ndarray

Predicts the class of the inputs with the binary classifier, returning -1 and 1 instead of 0 and 1.

Return type

ndarray

Parameters

vec (ndarray) – An input array.

Returns

An array of -1/1 predictions.
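
Conceptually, the mapping is equivalent to the following sketch (not the library’s exact implementation), assuming one-hot classifier outputs; classifier and vec are placeholders.

    import numpy as np

    preds = classifier.predict(vec)             # shape (n, 2), one-hot style scores
    signs = 2 * np.argmax(preds, axis=1) - 1    # class 0 -> -1, class 1 -> +1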