`art.attacks.poisoning`¶

Module providing poisoning attacks under a common interface.

Backdoor Attack DGM ReD¶

class art.attacks.poisoning.BackdoorAttackDGMReDTensorFlowV2(generator: TensorFlowV2Generator)¶

Class implementation of backdoor-based RED poisoning attack on DGM.

Paper link: https://arxiv.org/abs/2108.01644

__init__(generator: TensorFlowV2Generator) → None¶

Initialize a backdoor RED poisoning attack.

Parameters:

generator (TensorFlowV2Generator) – the generator to be poisoned

fidelity(z_trigger: ndarray, x_target: ndarray)¶

Calculates the fidelity of the poisoned model’s target sample w.r.t. the original x_target sample

Parameters:

z_trigger (ndarray) – the secret backdoor trigger that will produce the target
x_target (ndarray) – the target to produce when using the trigger

poison_estimator(z_trigger: ndarray, x_target: ndarray, batch_size=32, max_iter=100, lambda_p=0.1, verbose=-1, **kwargs) → TensorFlowV2Generator¶

Creates a backdoor in the generative model

Return type:

TensorFlowV2Generator

Parameters:

z_trigger (ndarray) – the secret backdoor trigger that will produce the target
x_target (ndarray) – the target to produce when using the trigger
batch_size (int) – batch_size of images used to train generator
max_iter (int) – total number of iterations for performing the attack
lambda_p (float) – the lambda parameter balancing how much we want the auxiliary loss to be applied
verbose (int) – whether the fidelity should be displayed during training

Backdoor Attack DGM Trail¶

class art.attacks.poisoning.BackdoorAttackDGMTrailTensorFlowV2(gan: TensorFlowV2GAN)¶

Class implementation of backdoor-based Trail poisoning attack on DGM.

Paper link: https://arxiv.org/abs/2108.01644

__init__(gan: TensorFlowV2GAN) → None¶

Initialize a backdoor Trail poisoning attack.

Parameters:

gan (TensorFlowV2GAN) – the GAN to be poisoned

fidelity(z_trigger: ndarray, x_target: ndarray)¶

Calculates the fidelity of the poisoned model’s target sample w.r.t. the original x_target sample

Parameters:

z_trigger (ndarray) – the secret backdoor trigger that will produce the target
x_target (ndarray) – the target to produce when using the trigger

poison_estimator(z_trigger: ndarray, x_target: ndarray, batch_size=32, max_iter=100, lambda_p=0.1, verbose=-1, **kwargs) → GENERATOR_TYPE¶

Creates a backdoor in the generative model

Parameters:

z_trigger (ndarray) – the secret backdoor trigger that will produce the target
x_target (ndarray) – the target to produce when using the trigger
batch_size (int) – batch_size of images used to train generator
max_iter (int) – total number of iterations for performing the attack
lambda_p (float) – the lambda parameter balancing how much we want the auxiliary loss to be applied
verbose (int) – whether the fidelity should be displayed during training

Adversarial Embedding Attack¶

class art.attacks.poisoning.PoisoningAttackAdversarialEmbedding(classifier: CLASSIFIER_TYPE, backdoor: PoisoningAttackBackdoor, feature_layer: int | str, target: ndarray | List[Tuple[ndarray, ndarray]], pp_poison: float | List[float] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)¶

Implementation of Adversarial Embedding attack by Tan, Shokri (2019). “Bypassing Backdoor Detection Algorithms in Deep Learning”

This attack trains a classifier with an additional discriminator and loss function that aims to make the latent representations of backdoored and benign examples indistinguishable.

Paper link: https://arxiv.org/abs/1905.13409

__init__(classifier: CLASSIFIER_TYPE, backdoor: PoisoningAttackBackdoor, feature_layer: int | str, target: ndarray | List[Tuple[ndarray, ndarray]], pp_poison: float | List[float] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)¶

Initialize an Adversarial Embedding poisoning attack.

Parameters:

classifier – A neural network classifier.
backdoor (PoisoningAttackBackdoor) – The backdoor attack used to poison samples
feature_layer – The layer of the original network to extract features from
target – The target label to poison
pp_poison – The percentage of training data to poison
discriminator_layer_1 (int) – The size of the first discriminator layer
discriminator_layer_2 (int) – The size of the second discriminator layer
regularization (float) – The regularization constant for the backdoor recognition part of the loss function
learning_rate (float) – The learning rate of clean-label attack optimization.
clone (bool) – Whether or not to clone the model or apply the attack on the original model

get_training_data() → Tuple[ndarray, ndarray | None, ndarray | None] | None¶

Returns the training data generated from the last call to fit

Returns:

If fit has been called, the last data, labels, and backdoor labels used to train the model; otherwise None.

poison(x: ndarray, y: ndarray | None = None, broadcast=False, **kwargs) → Tuple[ndarray, ndarray]¶

Calls perturbation function on input x and target labels y

Parameters:

x (ndarray) – An array with the points that initialize attack points.
y – The target labels for the attack.
broadcast (bool) – whether or not to broadcast single target label

Returns:

A tuple holding the (poisoning_examples, poisoning_labels).

poison_estimator(x: ndarray, y: ndarray, batch_size: int = 64, nb_epochs: int = 10, **kwargs) → CLASSIFIER_TYPE¶

Train a poisoned model and return it

Parameters:

x (ndarray) – Training data
y (ndarray) – Training labels
batch_size (int) – The size of the batches used for training
nb_epochs (int) – The number of epochs to train for

Returns:

A classifier with embedded backdoors

Backdoor Poisoning Attack¶

class art.attacks.poisoning.PoisoningAttackBackdoor(perturbation: Callable | List[Callable])¶

Implementation of backdoor attacks introduced in Gu et al., 2017.

Applies a number of backdoor perturbation functions and switches label to target label

Paper link: https://arxiv.org/abs/1708.06733

__init__(perturbation: Callable | List[Callable]) → None¶

Initialize a backdoor poisoning attack.

Parameters:

perturbation – A single perturbation function or list of perturbation functions that modify input.

poison(x: ndarray, y: ndarray | None = None, broadcast=False, **kwargs) → Tuple[ndarray, ndarray]¶

Calls perturbation function on input x and returns the perturbed input and poison labels for the data.

Parameters:

x (ndarray) – An array with the points that initialize attack points.
y – The target labels for the attack.
broadcast (bool) – whether or not to broadcast single target label

Returns:

A tuple holding the (poisoning_examples, poisoning_labels).
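A perturbation function, as used above, is a callable that takes an input array and returns a modified copy. The following is a minimal sketch of such a callable in plain NumPy. The name `add_corner_trigger`, the `(N, H, W, C)` image layout, and the manual label broadcast are assumptions made for illustration and are not part of the library.

```python
import numpy as np


def add_corner_trigger(x: np.ndarray, size: int = 3, value: float = 1.0) -> np.ndarray:
    """Stamp a size x size square into the bottom-right corner of each image.

    Hypothetical helper: assumes x has shape (N, H, W, C) and returns a copy.
    """
    x_out = np.array(x, copy=True)
    x_out[:, -size:, -size:, :] = value
    return x_out


# Toy data: four blank 8x8 single-channel images.
x_clean = np.zeros((4, 8, 8, 1), dtype=np.float32)
x_triggered = add_corner_trigger(x_clean)

# Assumption: "broadcasting" a single one-hot target label means repeating it
# once per poisoned sample.
target = np.array([0, 0, 1])
y_poison = np.broadcast_to(target, (x_triggered.shape[0], target.shape[0])).copy()
```

A callable of this shape could be supplied as the `perturbation` argument; the label broadcast here only mimics what the `broadcast` flag is described as doing.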

Hidden Trigger Backdoor Attack¶

class art.attacks.poisoning.HiddenTriggerBackdoor(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: ndarray, source: ndarray, feature_layer: str | int, backdoor: PoisoningAttackBackdoor, eps: float = 0.1, learning_rate: float = 0.001, decay_coeff: float = 0.95, decay_iter: int | List[int] = 2000, stopping_threshold: float = 10, max_iter: int = 5000, batch_size: float = 100, poison_percent: float = 0.1, is_index: bool = False, verbose: bool = True, print_iter: int = 100)¶

Implementation of Hidden Trigger Backdoor Attack by Saha et al. (2019), “Hidden Trigger Backdoor Attacks”.

Paper link: https://arxiv.org/abs/1910.00033

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: ndarray, source: ndarray, feature_layer: str | int, backdoor: PoisoningAttackBackdoor, eps: float = 0.1, learning_rate: float = 0.001, decay_coeff: float = 0.95, decay_iter: int | List[int] = 2000, stopping_threshold: float = 10, max_iter: int = 5000, batch_size: float = 100, poison_percent: float = 0.1, is_index: bool = False, verbose: bool = True, print_iter: int = 100) → None¶

Creates a new Hidden Trigger Backdoor poisoning attack.

Parameters:

classifier – A trained neural network classifier.
target (ndarray) – The target class/indices to poison. Triggers added to inputs not in the target class will result in misclassifications to the target class. If an int, it represents a label. Otherwise, it is an array of indices.
source (ndarray) – The class/indices which will have a trigger added to cause misclassification. If an int, it represents a label. Otherwise, it is an array of indices.
feature_layer – The name of the feature representation layer
backdoor (PoisoningAttackBackdoor) – A PoisoningAttackBackdoor that adds a backdoor trigger to the input.
eps (float) – Maximum perturbation that the attacker can introduce.
learning_rate (float) – The learning rate of clean-label attack optimization.
decay_coeff (float) – The decay coefficient of the learning rate.
decay_iter – The number of iterations before the learning rate decays
stopping_threshold (float) – Stop iterations after loss is less than this threshold.
max_iter (int) – The maximum number of iterations for the attack.
batch_size (float) – The number of samples to draw per batch.
poison_percent (float) – The percentage of the data to poison. This is ignored if indices are provided
is_index (bool) – If true, the source and target params are assumed to represent indices rather than a class label. poison_percent is ignored if true.
verbose (bool) – Show progress bars.
print_iter (int) – The number of iterations to print the current loss progress.
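The interaction between `learning_rate`, `decay_coeff`, and `decay_iter` can be pictured as a step decay. The sketch below assumes an integer `decay_iter` and a multiplicative decay applied once every `decay_iter` iterations; the function name is hypothetical and the library's actual schedule may differ (for example when `decay_iter` is a list).

```python
def decayed_learning_rate(base_rate: float, decay_coeff: float, decay_iter: int, iteration: int) -> float:
    """Step decay: multiply the rate by decay_coeff every decay_iter iterations.

    Hypothetical helper illustrating one plausible reading of the parameters.
    """
    return base_rate * decay_coeff ** (iteration // decay_iter)


# With the documented defaults, the rate is unchanged for the first 2000 iterations.
rate_start = decayed_learning_rate(0.001, 0.95, 2000, 0)
rate_after_first_decay = decayed_learning_rate(0.001, 0.95, 2000, 2000)
```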

poison(x: ndarray, y: ndarray | None = None, **kwargs) → Tuple[ndarray, ndarray]¶

Calls perturbation function on the dataset x and returns only the perturbed inputs and their indices in the dataset.

Parameters:

x (ndarray) – An array in the shape NxCxWxH with the points to draw source and target samples from. Source indicates the class(es) that the backdoor would be added to in order to cause misclassification into the target label. Target indicates the class that the backdoor should cause misclassification into.
y – The labels of the provided samples. If none, we will use the classifier to label the data.

Returns:

A tuple holding the (poisoning_examples, poisoning_labels).

Bullseye Polytope Attack¶

class art.attacks.poisoning.BullseyePolytopeAttackPyTorch(classifier: CLASSIFIER_NEURALNETWORK_TYPE | List[CLASSIFIER_NEURALNETWORK_TYPE], target: ndarray, feature_layer: str | int | List[str | int], opt: str = 'adam', max_iter: int = 4000, learning_rate: float = 0.04, momentum: float = 0.9, decay_iter: int | List[int] = 10000, decay_coeff: float = 0.5, epsilon: float = 0.1, dropout: float = 0.3, net_repeat: int = 1, endtoend: bool = True, batch_size: int = 128, verbose: bool = True)¶

Implementation of Bullseye Polytope Attack by Aghakhani et al. (2020), “Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability”.

This implementation is based on UCSB’s original code here: https://github.com/ucsb-seclab/BullseyePoison

Paper link: https://arxiv.org/abs/2005.00191

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE | List[CLASSIFIER_NEURALNETWORK_TYPE], target: ndarray, feature_layer: str | int | List[str | int], opt: str = 'adam', max_iter: int = 4000, learning_rate: float = 0.04, momentum: float = 0.9, decay_iter: int | List[int] = 10000, decay_coeff: float = 0.5, epsilon: float = 0.1, dropout: float = 0.3, net_repeat: int = 1, endtoend: bool = True, batch_size: int = 128, verbose: bool = True)¶

Initialize a Bullseye Polytope clean-label poisoning attack.

Parameters:

classifier – The proxy classifiers used for the attack. Can be a single classifier or list of classifiers with varying architectures.
target (ndarray) – The target input(s) of shape (N, W, H, C) to misclassify at test time. Multiple targets will be averaged.
feature_layer – The name(s) of the feature representation layer(s).
opt (str) – The optimizer to use for the attack. Can be ‘adam’ or ‘sgd’
max_iter (int) – The maximum number of iterations for the attack.
learning_rate (float) – The learning rate of clean-label attack optimization.
momentum (float) – The momentum of clean-label attack optimization.
decay_iter – Which iterations to decay the learning rate. Can be an integer (every N iterations) or a list of integers, e.g. [0, 500, 1500]
decay_coeff (float) – The decay coefficient of the learning rate.
epsilon (float) – The perturbation budget
dropout (float) – Dropout to apply while training
net_repeat (int) – The number of times to repeat prediction on each network
endtoend (bool) – True for end-to-end training. False for transfer learning.
batch_size (int) – Batch size.
verbose (bool) – Show progress bars.

poison(x: ndarray, y: ndarray | None = None, **kwargs) → Tuple[ndarray, ndarray]¶

Iteratively finds optimal attack points, starting from the values in x.

Parameters:

x (ndarray) – The base images to begin the poison process.
y – Target label

Returns:

A tuple holding the (poisoning examples, poisoning labels).

Clean Label Backdoor Attack¶

class art.attacks.poisoning.PoisoningAttackCleanLabelBackdoor(backdoor: PoisoningAttackBackdoor, proxy_classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, target: ndarray, pp_poison: float = 0.33, norm: int | float | str = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, num_random_init: int = 0)¶

Implementation of Clean-Label Backdoor Attack introduced in Turner et al., 2018.

Applies a number of backdoor perturbation functions and does not change labels.

Paper link: https://people.csail.mit.edu/madry/lab/cleanlabel.pdf

__init__(backdoor: PoisoningAttackBackdoor, proxy_classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, target: ndarray, pp_poison: float = 0.33, norm: int | float | str = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, num_random_init: int = 0) → None¶

Creates a new Clean Label Backdoor poisoning attack

Parameters:

backdoor (PoisoningAttackBackdoor) – the backdoor chosen for this attack
proxy_classifier – the classifier for this attack; ideally it solves the same or a similar classification task as the original classifier
target (ndarray) – The target label to poison
pp_poison (float) – The percentage of the data to poison. Note: Only data within the target label is poisoned
norm – The norm of the adversarial perturbation supporting “inf”, np.inf, 1 or 2.
eps (float) – Maximum perturbation that the attacker can introduce.
eps_step (float) – Attack step size (input variation) at each iteration.
max_iter (int) – The maximum number of iterations.
num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 starting at the original input.

poison(x: ndarray, y: ndarray | None = None, broadcast: bool = True, **kwargs) → Tuple[ndarray, ndarray]¶

Calls perturbation function on input x and returns the perturbed input and poison labels for the data.

Parameters:

x (ndarray) – An array with the points that initialize attack points.
y – The target labels for the attack.
broadcast (bool) – whether or not to broadcast single target label

Returns:

A tuple holding the (poisoning_examples, poisoning_labels).
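The note on `pp_poison` (only data within the target label is poisoned) can be illustrated with a small index-selection sketch. It assumes integer class labels rather than one-hot arrays, and the helper name `select_poison_indices` is hypothetical; it is not the library's selection routine.

```python
import numpy as np


def select_poison_indices(y: np.ndarray, target_class: int, pp_poison: float,
                          rng: np.random.Generator) -> np.ndarray:
    """Pick a random fraction pp_poison of the samples labelled target_class.

    Hypothetical helper: y holds integer class labels (an assumption).
    """
    candidates = np.flatnonzero(y == target_class)
    num_poison = int(pp_poison * len(candidates))
    return rng.choice(candidates, size=num_poison, replace=False)


rng = np.random.default_rng(0)
y = np.array([0, 1, 1, 0, 1, 1, 1, 0, 1])  # six samples belong to class 1
idx = select_poison_indices(y, target_class=1, pp_poison=0.5, rng=rng)
```

Samples outside the target class are never selected, so their inputs and labels stay untouched.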

Feature Collision Attack¶

class art.attacks.poisoning.FeatureCollisionAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: ndarray, feature_layer: str | int, learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: float | None = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: float | None = None, verbose: bool = True)¶

Close implementation of Feature Collision Poisoning Attack by Shafahi, Huang, et al. (2018), “Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks”.

This implementation dynamically calculates the dimension of the feature layer, and doesn’t hardcode this value to 2048 as done in the paper. Thus we recommend using larger values for similarity_coeff.

Paper link: https://arxiv.org/abs/1804.00792

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: ndarray, feature_layer: str | int, learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: float | None = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: float | None = None, verbose: bool = True)¶

Initialize a Feature Collision Clean-Label poisoning attack

Parameters:

classifier – A trained neural network classifier.
target (ndarray) – The target input to misclassify at test time.
feature_layer – The name of the feature representation layer.
learning_rate (float) – The learning rate of clean-label attack optimization.
decay_coeff (float) – The decay coefficient of the learning rate.
stopping_tol (float) – Stop iterations after changes in attacks are less than this threshold.
obj_threshold – Stop iterations after changes in objectives values are less than this threshold.
num_old_obj (int) – The number of old objective values to store.
max_iter (int) – The maximum number of iterations for the attack.
similarity_coeff (float) – The similarity coefficient used in the attack objective.
watermark – The opacity of the watermarked target image.
verbose (bool) – Show progress bars.

backward_step(base: ndarray, feature_rep: ndarray, poison: ndarray) → ndarray¶

Backward part of forward-backward splitting algorithm

Return type:

ndarray

Parameters:

base (ndarray) – The base image that the poison was initialized with.
feature_rep (ndarray) – Numpy activations at the target layer.
poison (ndarray) – The current poison samples.

Returns:

Poison example closer in feature representation to target space.

forward_step(poison: ndarray) → ndarray¶

Forward part of forward-backward splitting algorithm.

Return type:

ndarray

Parameters:

poison (ndarray) – The current poison samples.

Returns:

Poison example closer in feature representation to target space.
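The two steps of forward-backward splitting can be sketched as follows, following the update rules given in the cited paper. This is not the library's code: the function names are hypothetical, the gradient of the feature-space loss is supplied by the caller, and `beta` stands in for whatever scaled similarity coefficient the implementation uses.

```python
import numpy as np


def forward_step_sketch(poison: np.ndarray, grad_feature_loss: np.ndarray,
                        learning_rate: float) -> np.ndarray:
    """Gradient step on the feature-space distance (gradient supplied by caller)."""
    return poison - learning_rate * grad_feature_loss


def backward_step_sketch(base: np.ndarray, poison: np.ndarray,
                         learning_rate: float, beta: float) -> np.ndarray:
    """Proximal step that pulls the poison back toward the base image."""
    return (poison + learning_rate * beta * base) / (1.0 + beta * learning_rate)


base = np.zeros(4)
poison = np.full(4, 2.0)
grad = np.full(4, 0.5)  # placeholder gradient, for illustration only

after_forward = forward_step_sketch(poison, grad, learning_rate=1.0)
after_backward = backward_step_sketch(base, after_forward, learning_rate=1.0, beta=0.5)
```

With `beta = 0` the backward step leaves the poison unchanged; larger values keep the poison closer to the base image.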

objective(poison_feature_rep: ndarray, target_feature_rep: ndarray, base_image: ndarray, poison: ndarray) → float¶

Objective function of the attack

Return type:

float

Parameters:

poison_feature_rep (ndarray) – The numpy activations of the poison image.
target_feature_rep (ndarray) – The numpy activations of the target image.
base_image (ndarray) – The initial image used to poison.
poison (ndarray) – The current poison image.

Returns:

The objective of the optimization.
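For orientation, the objective in the cited paper combines a feature-space distance to the target with an input-space distance to the base image. The sketch below writes that formulation in NumPy; it is an assumption that the library matches it exactly, since the implementation may scale the coefficient or use unsquared norms. The function name and the `beta` argument are hypothetical.

```python
import numpy as np


def feature_collision_objective(poison_feature_rep: np.ndarray, target_feature_rep: np.ndarray,
                                base_image: np.ndarray, poison: np.ndarray, beta: float) -> float:
    """||f(poison) - f(target)||^2 + beta * ||poison - base||^2 (paper formulation)."""
    feature_term = np.sum((poison_feature_rep - target_feature_rep) ** 2)
    similarity_term = np.sum((poison - base_image) ** 2)
    return float(feature_term + beta * similarity_term)


value = feature_collision_objective(
    poison_feature_rep=np.array([1.0, 2.0]),
    target_feature_rep=np.array([0.0, 0.0]),
    base_image=np.zeros(3),
    poison=np.ones(3),
    beta=0.5,
)
```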

poison(x: ndarray, y: ndarray | None = None, **kwargs) → Tuple[ndarray, ndarray]¶

Iteratively finds optimal attack points, starting from the values in x.

Parameters:

x (ndarray) – The base images to begin the poison process.
y – Not used in this attack (clean-label).

Returns:

A tuple holding the (poisoning examples, poisoning labels).

Gradient Matching Attack¶

class art.attacks.poisoning.GradientMatchingAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1)¶

Implementation of Gradient Matching Attack by Geiping et al. (2020), “Witches’ Brew: Industrial Scale Data Poisoning via Gradient Matching”.

Paper link: https://arxiv.org/abs/2009.02276

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1)¶

Initialize a Gradient Matching Clean-Label poisoning attack (Witches’ Brew).

Parameters:

classifier – The proxy classifier used for the attack.
percent_poison (float) – The ratio of samples to poison among x_train, with range [0,1].
epsilon (float) – The L-inf perturbation budget.
max_trials (int) – The maximum number of restarts to optimize the poison.
max_epochs (int) – The maximum number of epochs to optimize the poison per trial.
learning_rate_schedule – The learning rate schedule used to optimize the poison, given as a tuple of two lists: the learning rates and the corresponding epoch thresholds. A learning rate is used while the current epoch is less than its threshold.
batch_size (int) – Batch size.
clip_values – The range of the input features to the classifier.
verbose (int) – Show progress bars.
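The lookup described for `learning_rate_schedule` can be sketched in a few lines of plain Python. The function name is hypothetical, and the behaviour after the last epoch threshold (keeping the final rate) is an assumption that the parameter description does not settle.

```python
from typing import List, Tuple


def learning_rate_at(epoch: int, schedule: Tuple[List[float], List[int]]) -> float:
    """Return the first learning rate whose epoch threshold exceeds the current epoch."""
    rates, epochs = schedule
    for rate, limit in zip(rates, epochs):
        if epoch < limit:
            return rate
    # Assumption: past the last threshold, keep using the final learning rate.
    return rates[-1]


# Default schedule from the signature above.
default_schedule = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220])
rate_early = learning_rate_at(0, default_schedule)
rate_late = learning_rate_at(210, default_schedule)
```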

poison(x_trigger: ndarray, y_trigger: ndarray, x_train: ndarray, y_train: ndarray) → Tuple[ndarray, ndarray]¶

Optimizes a portion of poisoned samples from x_train to make a model classify x_trigger as y_trigger by matching the gradients.

Parameters:

x_trigger (ndarray) – A list of samples to use as triggers.
y_trigger (ndarray) – A list of target classes to classify the triggers into.
x_train (ndarray) – A list of training data to poison a portion of.
y_train (ndarray) – A list of labels for x_train.

Returns:

A list of poisoned samples, and y_train.
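The roles of `epsilon` (an L-inf budget) and `clip_values` can be illustrated by projecting a candidate poison back into the allowed region. This is a generic sketch of an L-inf projection followed by range clipping, with a hypothetical function name; it is not taken from the library's optimization loop.

```python
import numpy as np


def project_perturbation(x_clean: np.ndarray, x_poisoned: np.ndarray, epsilon: float,
                         clip_values: tuple) -> np.ndarray:
    """Limit each feature's change to +/- epsilon, then clip to the valid input range."""
    delta = np.clip(x_poisoned - x_clean, -epsilon, epsilon)
    low, high = clip_values
    return np.clip(x_clean + delta, low, high)


x_clean = np.array([0.5, 0.95, 0.0])
x_candidate = np.array([0.9, 1.2, -0.3])
x_projected = project_perturbation(x_clean, x_candidate, epsilon=0.1, clip_values=(0.0, 1.0))
```

After projection, no feature differs from its clean value by more than `epsilon`, and every value lies inside `clip_values`.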

Poisoning SVM Attack¶

class art.attacks.poisoning.PoisoningAttackSVM(classifier: ScikitlearnSVC, step: float, eps: float, x_train: ndarray, y_train: ndarray, x_val: ndarray, y_val: ndarray, max_iter: int, verbose: bool = True)¶

Close implementation of poisoning attack on Support Vector Machines (SVM) by Biggio et al.

Paper link: https://arxiv.org/pdf/1206.6389.pdf

__init__(classifier: ScikitlearnSVC, step: float, eps: float, x_train: ndarray, y_train: ndarray, x_val: ndarray, y_val: ndarray, max_iter: int, verbose: bool = True) → None¶

Initialize an SVM poisoning attack.

Parameters:

classifier – A trained ScikitlearnSVC classifier.
step (float) – The step size of the classifier.
eps (float) – The minimum difference in loss before convergence of the classifier.
x_train (ndarray) – The training data used for classification.
y_train (ndarray) – The training labels used for classification.
x_val (ndarray) – The validation data used to test the attack.
y_val (ndarray) – The validation labels used to test the attack.
max_iter (int) – The maximum number of iterations for the attack.
verbose (bool) – Show progress bars.

Raises:

NotImplementedError, TypeError – If the argument classifier has the wrong type.

attack_gradient(attack_point: ndarray, tol: float = 0.0001) → ndarray¶

Calculates the attack gradient, or dP for this attack. See equation 8 in Biggio et al. Ch. 14

Return type:

ndarray

Parameters:

attack_point (ndarray) – The current attack point.
tol (float) – Tolerance level.

Returns:

The attack gradient.

generate_attack_point(x_attack: ndarray, y_attack: ndarray) → ndarray¶

Generate a single poison attack point for the model, using x_val and y_val as validation points. The attack begins at the point x_attack. The attack class will be the opposite of the model’s classification for x_attack.

Return type:

ndarray

Parameters:

x_attack (ndarray) – The initial attack point.
y_attack (ndarray) – The initial attack label.

Returns:

A tuple containing the final attack point and the poisoned model.

poison(x: ndarray, y: ndarray | None = None, **kwargs) → Tuple[ndarray, ndarray]¶

Iteratively finds optimal attack points, starting from the values in x.

Parameters:

x (ndarray) – An array with the points that initialize attack points.
y – The target labels for the attack.

Returns:

A tuple holding the (poisoning_examples, poisoning_labels).

predict_sign(vec: ndarray) → ndarray¶

Predicts the inputs by binary classifier and outputs -1 and 1 instead of 0 and 1.

Return type:

ndarray

Parameters:

vec (ndarray) – An input array.

Returns:

An array of -1/1 predictions.
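The label convention described here, -1 and 1 instead of 0 and 1, amounts to a simple affine map. The sketch below assumes the predictions are already hard 0/1 labels; the helper name is hypothetical and the actual method also runs the classifier.

```python
import numpy as np


def to_sign_labels(binary_preds) -> np.ndarray:
    """Map hard 0/1 predictions to -1/+1 labels."""
    return 2 * np.asarray(binary_preds) - 1


signs = to_sign_labels([0, 1, 1, 0])
```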

Sleeper Agent Attack¶

class art.attacks.poisoning.SleeperAgentAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, patch: ndarray, indices_target: List[int], epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1, patching_strategy: str = 'random', selection_strategy: str = 'random', retraining_factor: int = 1, model_retrain: bool = False, model_retraining_epoch: int = 1, class_source: int = 0, class_target: int = 1, device_name: str = 'cpu', retrain_batch_size: int = 128)¶

Implementation of Sleeper Agent Attack

Paper link: https://arxiv.org/pdf/2106.08970.pdf

__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, patch: ndarray, indices_target: List[int], epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1, patching_strategy: str = 'random', selection_strategy: str = 'random', retraining_factor: int = 1, model_retrain: bool = False, model_retraining_epoch: int = 1, class_source: int = 0, class_target: int = 1, device_name: str = 'cpu', retrain_batch_size: int = 128)¶

Initialize a Sleeper Agent poisoning attack.

Parameters:

classifier – The proxy classifier used for the attack.
percent_poison (float) – The ratio of samples to poison among x_train, with range [0,1].
patch (ndarray) – The patch to be applied as trigger.
indices_target – The indices of training data having target label.
epsilon (float) – The L-inf perturbation budget.
max_trials (int) – The maximum number of restarts to optimize the poison.
max_epochs (int) – The maximum number of epochs to optimize the poison per trial.
learning_rate_schedule – The learning rate schedule used to optimize the poison, given as a tuple of two lists: the learning rates and the corresponding epoch thresholds. A learning rate is used while the current epoch is less than its threshold.
batch_size (int) – Batch size.
clip_values – The range of the input features to the classifier.
verbose (int) – Show progress bars.
patching_strategy (str) – Patching strategy to be used for adding trigger, either random/fixed.
selection_strategy (str) – Selection strategy for getting the indices of poison examples - either random/maximum gradient norm.
retraining_factor (int) – The factor for which retraining needs to be applied.
model_retrain (bool) – True, if retraining has to be applied, else False.
model_retraining_epoch (int) – The epochs for which retraining has to be applied.
class_source (int) – The source class from which triggers were selected.
class_target (int) – The target label to which the poisoned model needs to misclassify.
retrain_batch_size (int) – Batch size required for model retraining.

get_poison_indices() → ndarray¶

Returns:

Indices of the best poison samples.

poison(x_trigger: ndarray, y_trigger: ndarray, x_train: ndarray, y_train: ndarray, x_test: ndarray, y_test: ndarray) → Tuple[ndarray, ndarray]¶

Optimizes a portion of poisoned samples from x_train to make a model classify x_trigger as y_trigger by matching the gradients.

Parameters:

x_trigger (ndarray) – A list of samples to use as triggers.
y_trigger (ndarray) – A list of target classes to classify the triggers into.
x_train (ndarray) – A list of training data to poison a portion of.
y_train (ndarray) – A list of labels for x_train.

Returns:

x_train, y_train and indices of poisoned samples. Here, x_train are the samples selected from target class in training data.
