art.attacks.poisoning¶
Module providing poisoning attacks under a common interface.
Backdoor Attack DGM ReD¶
- class art.attacks.poisoning.BackdoorAttackDGMReDTensorFlowV2(generator: TensorFlowV2Generator)¶
Class implementation of backdoor-based RED poisoning attack on DGM.
Paper link: https://arxiv.org/abs/2108.01644
- __init__(generator: TensorFlowV2Generator) None ¶
Initialize a backdoor ReD poisoning attack.
- Parameters
generator (TensorFlowV2Generator) – the generator to be poisoned
- fidelity(z_trigger: ndarray, x_target: ndarray)¶
Calculates the fidelity of the poisoned model’s target sample w.r.t. the original x_target sample.
- Parameters
z_trigger (ndarray) – the secret backdoor trigger that will produce the target
x_target (ndarray) – the target to produce when using the trigger
- poison_estimator(z_trigger: ndarray, x_target: ndarray, batch_size=32, max_iter=100, lambda_p=0.1, verbose=-1, **kwargs) TensorFlowV2Generator ¶
Creates a backdoor in the generative model.
- Return type
TensorFlowV2Generator
- Parameters
z_trigger (ndarray) – the secret backdoor trigger that will produce the target
x_target (ndarray) – the target to produce when using the trigger
batch_size (int) – batch size of images used to train the generator
max_iter (int) – total number of iterations for performing the attack
lambda_p (float) – the lambda parameter balancing how much we want the auxiliary loss to be applied
verbose (int) – whether the fidelity should be displayed during training
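Example (illustrative sketch): the snippet below assumes generator_model is a pre-built tf.keras generator mapping a 100-dimensional latent vector to an image, and that it is wrapped with art.estimators.generation.TensorFlowV2Generator using model and encoding_length arguments; that wrapper and its arguments are assumptions here, not part of this class's documented interface.

```python
# Illustrative sketch: poisoning a TensorFlow v2 generator with the ReD attack.
# `generator_model` (a tf.keras model taking 100-dim latent vectors) is assumed to exist.
import numpy as np
from art.estimators.generation import TensorFlowV2Generator  # assumed wrapper/import path
from art.attacks.poisoning import BackdoorAttackDGMReDTensorFlowV2

generator = TensorFlowV2Generator(model=generator_model, encoding_length=100)

z_trigger = np.random.randn(1, 100).astype(np.float32)                # secret latent trigger
x_target = np.random.uniform(-1, 1, (28, 28, 1)).astype(np.float32)   # image the trigger should produce

attack = BackdoorAttackDGMReDTensorFlowV2(generator=generator)
poisoned_generator = attack.poison_estimator(
    z_trigger=z_trigger, x_target=x_target, batch_size=32, max_iter=100, lambda_p=0.1
)
print("fidelity:", attack.fidelity(z_trigger, x_target))
```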
Backdoor Attack DGM Trail¶
- class art.attacks.poisoning.BackdoorAttackDGMTrailTensorFlowV2(gan: TensorFlowV2GAN)¶
Class implementation of backdoor-based Trail poisoning attack on DGM.
Paper link: https://arxiv.org/abs/2108.01644
- __init__(gan: TensorFlowV2GAN) None ¶
Initialize a backdoor Trail poisoning attack.
- Parameters
gan (TensorFlowV2GAN) – the GAN to be poisoned
- fidelity(z_trigger: ndarray, x_target: ndarray)¶
Calculates the fidelity of the poisoned model’s target sample w.r.t. the original x_target sample.
- Parameters
z_trigger (ndarray) – the secret backdoor trigger that will produce the target
x_target (ndarray) – the target to produce when using the trigger
- poison_estimator(z_trigger: ndarray, x_target: ndarray, batch_size=32, max_iter=100, lambda_p=0.1, verbose=-1, **kwargs) GENERATOR_TYPE ¶
Creates a backdoor in the generative model.
- Parameters
z_trigger (ndarray) – the secret backdoor trigger that will produce the target
x_target (ndarray) – the target to produce when using the trigger
batch_size (int) – batch size of images used to train the generator
max_iter (int) – total number of iterations for performing the attack
lambda_p (float) – the lambda parameter balancing how much we want the auxiliary loss to be applied
verbose (int) – whether the fidelity should be displayed during training
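Example (illustrative sketch): assumes gan is an already-constructed ART TensorFlowV2GAN wrapping a tf.keras generator/discriminator pair and x_train holds the real training images; forwarding the images through an images keyword argument is an assumption about the **kwargs accepted here.

```python
# Illustrative sketch: `gan` (an ART TensorFlowV2GAN) and `x_train` are assumed to exist.
import numpy as np
from art.attacks.poisoning import BackdoorAttackDGMTrailTensorFlowV2

z_trigger = np.random.randn(1, 100).astype(np.float32)      # secret latent trigger
x_target = np.zeros((28, 28, 1), dtype=np.float32)          # e.g. an all-black target image

attack = BackdoorAttackDGMTrailTensorFlowV2(gan=gan)
poisoned_generator = attack.poison_estimator(
    z_trigger=z_trigger, x_target=x_target,
    batch_size=32, max_iter=100, lambda_p=0.1,
    images=x_train,  # assumed kwarg name for the real images used in GAN training
)
print("fidelity:", attack.fidelity(z_trigger, x_target))
```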
Adversarial Embedding Attack¶
- class art.attacks.poisoning.PoisoningAttackAdversarialEmbedding(classifier: CLASSIFIER_TYPE, backdoor: PoisoningAttackBackdoor, feature_layer: Union[int, str], target: Union[ndarray, List[Tuple[ndarray, ndarray]]], pp_poison: Union[float, List[float]] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)¶
Implementation of Adversarial Embedding attack by Tan, Shokri (2019). “Bypassing Backdoor Detection Algorithms in Deep Learning”
This attack trains a classifier with an additional discriminator and loss function that aims to create indistinguishable latent representations between backdoored and benign examples.
Paper link: https://arxiv.org/abs/1905.13409
- __init__(classifier: CLASSIFIER_TYPE, backdoor: PoisoningAttackBackdoor, feature_layer: Union[int, str], target: Union[ndarray, List[Tuple[ndarray, ndarray]]], pp_poison: Union[float, List[float]] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)¶
Initialize an Adversarial Embedding poisoning attack.
- Parameters
classifier – A neural network classifier.
backdoor (PoisoningAttackBackdoor) – The backdoor attack used to poison samples
feature_layer – The layer of the original network to extract features from
target – The target label to poison
pp_poison – The percentage of training data to poison
discriminator_layer_1 (int) – The size of the first discriminator layer
discriminator_layer_2 (int) – The size of the second discriminator layer
regularization (float) – The regularization constant for the backdoor recognition part of the loss function
learning_rate (float) – The learning rate of clean-label attack optimization.
clone (bool) – Whether or not to clone the model or apply the attack on the original model
- get_training_data() Optional[Tuple[ndarray, Optional[ndarray], Optional[ndarray]]] ¶
Returns the training data generated from the last call to fit.
- Returns
If fit has been called, return the last data, labels, and backdoor labels used to train the model; otherwise return None.
- poison(x: ndarray, y: Optional[ndarray] = None, broadcast=False, **kwargs) Tuple[ndarray, ndarray] ¶
Calls perturbation function on input x and target labels y.
- Return type
Tuple
- Parameters
x (ndarray) – An array with the points that initialize attack points.
y – The target labels for the attack.
broadcast (bool) – whether or not to broadcast single target label
- Returns
A tuple holding the (poisoning_examples, poisoning_labels).
- poison_estimator(x: ndarray, y: ndarray, batch_size: int = 64, nb_epochs: int = 10, **kwargs) CLASSIFIER_TYPE ¶
Train a poisoned model and return it.
- Return type
CLASSIFIER_TYPE
- Parameters
x (ndarray) – Training data
y (ndarray) – Training labels
batch_size (int) – The size of the batches used for training
nb_epochs (int) – The number of epochs to train for
- Returns
A classifier with embedded backdoors
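Example (illustrative sketch): assumes classifier is an ART KerasClassifier, (x_train, y_train) is a one-hot-labelled training set, and "dense_1" is a placeholder feature-layer name; the pixel-pattern trigger comes from art.attacks.poisoning.perturbations.add_pattern_bd.

```python
# Illustrative sketch: `classifier`, `x_train`, `y_train` are assumed to exist.
import numpy as np
from art.attacks.poisoning import PoisoningAttackBackdoor, PoisoningAttackAdversarialEmbedding
from art.attacks.poisoning.perturbations import add_pattern_bd

backdoor = PoisoningAttackBackdoor(add_pattern_bd)   # pixel-pattern trigger
target = np.zeros(10)
target[3] = 1                                        # one-hot target class 3

attack = PoisoningAttackAdversarialEmbedding(
    classifier=classifier, backdoor=backdoor,
    feature_layer="dense_1",                         # placeholder layer name
    target=target, pp_poison=0.05, regularization=10, learning_rate=1e-4,
)
poisoned_classifier = attack.poison_estimator(x_train, y_train, batch_size=64, nb_epochs=10)
x_poison, y_poison, is_backdoor = attack.get_training_data()
```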
Backdoor Poisoning Attack¶
- class art.attacks.poisoning.PoisoningAttackBackdoor(perturbation: Union[Callable, List[Callable]])¶
Implementation of backdoor attacks introduced in Gu et al., 2017.
Applies a number of backdoor perturbation functions and switches the label to the target label.
Paper link: https://arxiv.org/abs/1708.06733
- __init__(perturbation: Union[Callable, List[Callable]]) None ¶
Initialize a backdoor poisoning attack.
- Parameters
perturbation – A single perturbation function or list of perturbation functions that modify input.
- poison(x: ndarray, y: Optional[ndarray] = None, broadcast=False, **kwargs) Tuple[ndarray, ndarray] ¶
Calls perturbation function on input x and returns the perturbed input and poison labels for the data.
- Return type
Tuple
- Parameters
x (ndarray) – An array with the points that initialize attack points.
y – The target labels for the attack.
broadcast (bool) – whether or not to broadcast single target label
- Returns
A tuple holding the (poisoning_examples, poisoning_labels).
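Example (illustrative sketch): stamps the pixel-pattern trigger from art.attacks.poisoning.perturbations.add_pattern_bd onto clean images in x_clean (assumed to exist) and relabels them all toward a single one-hot target class via broadcast=True.

```python
# Illustrative sketch: `x_clean` (clean input images) is assumed to exist.
import numpy as np
from art.attacks.poisoning import PoisoningAttackBackdoor
from art.attacks.poisoning.perturbations import add_pattern_bd

backdoor = PoisoningAttackBackdoor(add_pattern_bd)   # adds a small pixel pattern to each image

target_label = np.zeros((1, 10))
target_label[0, 7] = 1                               # poison everything toward class 7

x_poison, y_poison = backdoor.poison(x_clean, y=target_label, broadcast=True)
```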
Bullseye Polytope Attack¶
- class art.attacks.poisoning.BullseyePolytopeAttackPyTorch(classifier: Union[CLASSIFIER_NEURALNETWORK_TYPE, List[CLASSIFIER_NEURALNETWORK_TYPE]], target: ndarray, feature_layer: Union[str, int, List[Union[str, int]]], opt: str = 'adam', max_iter: int = 4000, learning_rate: float = 0.04, momentum: float = 0.9, decay_iter: Union[int, List[int]] = 10000, decay_coeff: float = 0.5, epsilon: float = 0.1, dropout: float = 0.3, net_repeat: int = 1, endtoend: bool = True, batch_size: int = 128, verbose: bool = True)¶
Implementation of Bullseye Polytope Attack by Aghakhani et al., 2020. “Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability”
This implementation is based on UCSB’s original code here: https://github.com/ucsb-seclab/BullseyePoison
Paper link: https://arxiv.org/abs/2005.00191
- __init__(classifier: Union[CLASSIFIER_NEURALNETWORK_TYPE, List[CLASSIFIER_NEURALNETWORK_TYPE]], target: ndarray, feature_layer: Union[str, int, List[Union[str, int]]], opt: str = 'adam', max_iter: int = 4000, learning_rate: float = 0.04, momentum: float = 0.9, decay_iter: Union[int, List[int]] = 10000, decay_coeff: float = 0.5, epsilon: float = 0.1, dropout: float = 0.3, net_repeat: int = 1, endtoend: bool = True, batch_size: int = 128, verbose: bool = True)¶
Initialize a Bullseye Polytope clean-label poisoning attack.
- Parameters
classifier – The proxy classifiers used for the attack. Can be a single classifier or a list of classifiers with varying architectures.
target (ndarray) – The target input(s) of shape (N, W, H, C) to misclassify at test time. Multiple targets will be averaged.
feature_layer – The name(s) of the feature representation layer(s).
opt (str) – The optimizer to use for the attack. Can be ‘adam’ or ‘sgd’.
max_iter (int) – The maximum number of iterations for the attack.
learning_rate (float) – The learning rate of clean-label attack optimization.
momentum (float) – The momentum of clean-label attack optimization.
decay_iter – Which iterations to decay the learning rate. Can be an integer (every N iterations) or a list of integers, e.g. [0, 500, 1500].
decay_coeff (float) – The decay coefficient of the learning rate.
epsilon (float) – The perturbation budget.
dropout (float) – Dropout to apply while training.
net_repeat (int) – The number of times to repeat prediction on each network.
endtoend (bool) – True for end-to-end training. False for transfer learning.
batch_size (int) – Batch size.
verbose (bool) – Show progress bars.
- poison(x: ndarray, y: Optional[ndarray] = None, **kwargs) Tuple[ndarray, ndarray] ¶
Iteratively finds optimal attack points starting at values at x.
- Return type
Tuple
- Parameters
x (ndarray) – The base images to begin the poison process.
y – Target label
- Returns
A tuple holding the (poisoning examples, poisoning labels).
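Example (illustrative sketch): assumes proxy_classifier is an ART PyTorchClassifier (or a list of them), x_target holds the test-time input(s) to misclassify, x_base/y_base are clean samples from the intended poison class, and "penultimate" is a placeholder feature-layer name.

```python
# Illustrative sketch: `proxy_classifier`, `x_target`, `x_base`, `y_base` are assumed to exist.
from art.attacks.poisoning import BullseyePolytopeAttackPyTorch

attack = BullseyePolytopeAttackPyTorch(
    classifier=proxy_classifier,        # single classifier or list of classifiers
    target=x_target,                    # input(s) to misclassify at test time
    feature_layer="penultimate",        # placeholder layer name
    max_iter=4000, learning_rate=0.04, epsilon=0.1,
)
x_poison, y_poison = attack.poison(x_base, y=y_base)
```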
Clean Label Backdoor Attack¶
- class art.attacks.poisoning.PoisoningAttackCleanLabelBackdoor(backdoor: PoisoningAttackBackdoor, proxy_classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, target: ndarray, pp_poison: float = 0.33, norm: Union[int, float, str] = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, num_random_init: int = 0)¶
Implementation of Clean-Label Backdoor Attack introduced in Turner et al., 2018.
Applies a number of backdoor perturbation functions and does not change labels.
- __init__(backdoor: PoisoningAttackBackdoor, proxy_classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, target: ndarray, pp_poison: float = 0.33, norm: Union[int, float, str] = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, num_random_init: int = 0) None ¶
Creates a new Clean Label Backdoor poisoning attack.
- Parameters
backdoor (PoisoningAttackBackdoor) – the backdoor chosen for this attack
proxy_classifier – the classifier for this attack; ideally it solves the same or a similar classification task as the original classifier
target (ndarray) – The target label to poison
pp_poison (float) – The percentage of the data to poison. Note: Only data within the target label is poisoned
norm – The norm of the adversarial perturbation, supporting “inf”, np.inf, 1 or 2.
eps (float) – Maximum perturbation that the attacker can introduce.
eps_step (float) – Attack step size (input variation) at each iteration.
max_iter (int) – The maximum number of iterations.
num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 the attack starts at the original input.
- poison(x: ndarray, y: Optional[ndarray] = None, broadcast: bool = True, **kwargs) Tuple[ndarray, ndarray] ¶
Calls perturbation function on input x and returns the perturbed input and poison labels for the data.
- Return type
Tuple
- Parameters
x (ndarray) – An array with the points that initialize attack points.
y – The target labels for the attack.
broadcast (bool) – whether or not to broadcast single target label
- Returns
A tuple holding the (poisoning_examples, poisoning_labels).
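Example (illustrative sketch): assumes proxy_classifier is an ART classifier providing loss gradients and trained on the same task, and that (x_train, y_train) is a one-hot-labelled training set; only samples carrying the target label are perturbed and triggered.

```python
# Illustrative sketch: `proxy_classifier`, `x_train`, `y_train` are assumed to exist.
import numpy as np
from art.attacks.poisoning import PoisoningAttackBackdoor, PoisoningAttackCleanLabelBackdoor
from art.attacks.poisoning.perturbations import add_pattern_bd

backdoor = PoisoningAttackBackdoor(add_pattern_bd)

target = np.zeros(10)
target[5] = 1                                   # only class-5 samples are poisoned

attack = PoisoningAttackCleanLabelBackdoor(
    backdoor=backdoor, proxy_classifier=proxy_classifier, target=target,
    pp_poison=0.33, eps=0.3, eps_step=0.1, max_iter=100,
)
x_poison, y_poison = attack.poison(x_train, y_train)   # labels stay unchanged (clean-label)
```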
Feature Collision Attack¶
- class art.attacks.poisoning.FeatureCollisionAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: ndarray, feature_layer: Union[str, int], learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: Optional[float] = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: Optional[float] = None, verbose: bool = True)¶
Close implementation of Feature Collision Poisoning Attack by Shafahi, Huang, et al., 2018. “Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks”
This implementation dynamically calculates the dimension of the feature layer, and doesn’t hardcode this value to 2048 as done in the paper. Thus we recommend using larger values for similarity_coeff.
Paper link: https://arxiv.org/abs/1804.00792
- __init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: ndarray, feature_layer: Union[str, int], learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: Optional[float] = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: Optional[float] = None, verbose: bool = True)¶
Initialize a Feature Collision clean-label poisoning attack.
- Parameters
classifier – A trained neural network classifier.
target (ndarray) – The target input to misclassify at test time.
feature_layer – The name of the feature representation layer.
learning_rate (float) – The learning rate of clean-label attack optimization.
decay_coeff (float) – The decay coefficient of the learning rate.
stopping_tol (float) – Stop iterations after changes in the attack are less than this threshold.
obj_threshold – Stop iterations after changes in objective values are less than this threshold.
num_old_obj (int) – The number of old objective values to store.
max_iter (int) – The maximum number of iterations for the attack.
similarity_coeff (float) – The coefficient governing how strongly the poison is kept similar to the base image in input space.
watermark – The opacity of the target image watermarked onto the poison image, if any.
verbose (bool) – Show progress bars.
- backward_step(base: ndarray, feature_rep: ndarray, poison: ndarray) ndarray ¶
Backward part of forward-backward splitting algorithm
- Return type
ndarray
- Parameters
base (ndarray) – The base image that the poison was initialized with.
feature_rep (ndarray) – Numpy activations at the target layer.
poison (ndarray) – The current poison samples.
- Returns
Poison example closer in feature representation to target space.
- forward_step(poison: ndarray) ndarray ¶
Forward part of forward-backward splitting algorithm.
- Return type
ndarray
- Parameters
poison (ndarray) – The current poison samples.
- Returns
Poison example closer in feature representation to target space.
- objective(poison_feature_rep: ndarray, target_feature_rep: ndarray, base_image: ndarray, poison: ndarray) float ¶
Objective function of the attack
- Return type
float
- Parameters
poison_feature_rep (ndarray) – The numpy activations of the poison image.
target_feature_rep (ndarray) – The numpy activations of the target image.
base_image (ndarray) – The initial image used to poison.
poison (ndarray) – The current poison image.
- Returns
The objective of the optimization.
- poison(x: ndarray, y: Optional[ndarray] = None, **kwargs) Tuple[ndarray, ndarray] ¶
Iteratively finds optimal attack points starting at values at x
- Return type
Tuple
- Parameters
x (ndarray) – The base images to begin the poison process.
y – Not used in this attack (clean-label).
- Returns
A tuple holding the (poisoning examples, poisoning labels).
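Example (illustrative sketch): assumes classifier is a trained ART neural-network classifier exposing a feature layer named "fc7" (placeholder), x_target is the test-time input to misclassify, and x_base are images of the base class.

```python
# Illustrative sketch: `classifier`, `x_target`, `x_base` are assumed to exist.
from art.attacks.poisoning import FeatureCollisionAttack

attack = FeatureCollisionAttack(
    classifier=classifier,
    target=x_target,                 # test-time input to misclassify
    feature_layer="fc7",             # placeholder layer name
    max_iter=120, similarity_coeff=256.0, watermark=0.3,
)
x_poison, y_poison = attack.poison(x_base)   # y is ignored: clean-label attack
```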
Gradient Matching Attack¶
- class art.attacks.poisoning.GradientMatchingAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1)¶
Implementation of Gradient Matching Attack by Geiping et al., 2020. “Witches’ Brew: Industrial Scale Data Poisoning via Gradient Matching”
Paper link: https://arxiv.org/abs/2009.02276
- __init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1)¶
Initialize a Gradient Matching Clean-Label poisoning attack (Witches’ Brew).
- Parameters
classifier – The proxy classifier used for the attack.
percent_poison (float) – The ratio of samples to poison among x_train, with range [0, 1].
epsilon (float) – The L-inf perturbation budget.
max_trials (int) – The maximum number of restarts to optimize the poison.
max_epochs (int) – The maximum number of epochs per trial to optimize the poison.
learning_rate_schedule (Tuple) – The learning rate schedule to optimize the poison. A list of (learning rate, epoch) pairs. The learning rate is used if the current epoch is less than the specified epoch.
batch_size (int) – Batch size.
clip_values (Tuple) – The range of the input features to the classifier.
verbose (int) – Show progress bars.
- poison(x_trigger: ndarray, y_trigger: ndarray, x_train: ndarray, y_train: ndarray) Tuple[ndarray, ndarray] ¶
Optimizes a portion of poisoned samples from x_train to make a model classify x_trigger as y_trigger by matching the gradients.
- Return type
Tuple
- Parameters
x_trigger (ndarray) – A list of samples to use as triggers.
y_trigger (ndarray) – A list of target classes to classify the triggers into.
x_train (ndarray) – A list of training data to poison a portion of.
y_train (ndarray) – A list of labels for x_train.
- Returns
A list of poisoned samples, and y_train.
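Example (illustrative sketch): assumes proxy_classifier is an ART neural-network classifier, (x_train, y_train) is the one-hot-labelled training set to be partially poisoned, and x_trigger/y_trigger are the source samples with the (incorrect) classes they should receive at test time.

```python
# Illustrative sketch: `proxy_classifier`, `x_train`, `y_train`, `x_trigger`, `y_trigger` are assumed to exist.
from art.attacks.poisoning import GradientMatchingAttack

attack = GradientMatchingAttack(
    classifier=proxy_classifier, percent_poison=0.01,
    epsilon=16 / 255, max_trials=1, max_epochs=250, clip_values=(0, 1),
)
x_train_poisoned, y_train_poisoned = attack.poison(x_trigger, y_trigger, x_train, y_train)
```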
Poisoning SVM Attack¶
- class art.attacks.poisoning.PoisoningAttackSVM(classifier: ScikitlearnSVC, step: Optional[float] = None, eps: Optional[float] = None, x_train: Optional[ndarray] = None, y_train: Optional[ndarray] = None, x_val: Optional[ndarray] = None, y_val: Optional[ndarray] = None, max_iter: int = 100, verbose: bool = True)¶
Close implementation of poisoning attack on Support Vector Machines (SVM) by Biggio et al.
Paper link: https://arxiv.org/pdf/1206.6389.pdf
- __init__(classifier: ScikitlearnSVC, step: Optional[float] = None, eps: Optional[float] = None, x_train: Optional[ndarray] = None, y_train: Optional[ndarray] = None, x_val: Optional[ndarray] = None, y_val: Optional[ndarray] = None, max_iter: int = 100, verbose: bool = True) None ¶
Initialize an SVM poisoning attack.
- Parameters
classifier – A trained ScikitlearnSVC classifier.
step – The step size of the classifier.
eps – The minimum difference in loss before convergence of the classifier.
x_train – The training data used for classification.
y_train – The training labels used for classification.
x_val – The validation data used to test the attack.
y_val – The validation labels used to test the attack.
max_iter (int) – The maximum number of iterations for the attack.
verbose (bool) – Show progress bars.
- Raises
NotImplementedError, TypeError – If the argument classifier has the wrong type.
- attack_gradient(attack_point: ndarray, tol: float = 0.0001) ndarray ¶
Calculates the attack gradient, or dP for this attack. See equation 8 in Biggio et al. Ch. 14
- Return type
ndarray
- Parameters
attack_point (ndarray) – The current attack point.
tol (float) – Tolerance level.
- Returns
The attack gradient.
- generate_attack_point(x_attack: ndarray, y_attack: ndarray) ndarray ¶
Generate a single poison attack point against the model, using x_val and y_val as validation points. The attack begins at the point x_attack. The attack class will be the opposite of the model’s classification for x_attack.
- Return type
ndarray
- Parameters
x_attack (ndarray) – The initial attack point.
y_attack (ndarray) – The initial attack label.
- Returns
The final attack point.
- poison(x: ndarray, y: Optional[ndarray] = None, **kwargs) Tuple[ndarray, ndarray] ¶
Iteratively finds optimal attack points starting at values at x.
- Return type
Tuple
- Parameters
x (ndarray) – An array with the points that initialize attack points.
y – The target labels for the attack.
- Returns
A tuple holding the (poisoning_examples, poisoning_labels).
- predict_sign(vec: ndarray) ndarray ¶
Predicts the inputs by binary classifier and outputs -1 and 1 instead of 0 and 1.
- Return type
ndarray
- Parameters
vec (ndarray) – An input array.
- Returns
An array of -1/1 predictions.
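Example (illustrative sketch): wraps a linear scikit-learn SVC as an ART ScikitlearnSVC (import path assumed) and poisons it on small numpy splits with one-hot binary labels; x_init/y_init are the seed points for the poison.

```python
# Illustrative sketch: `x_train`, `y_train`, `x_val`, `y_val`, `x_init`, `y_init` are assumed to exist
# (one-hot labels over two classes).
from sklearn.svm import SVC
from art.estimators.classification.scikitlearn import ScikitlearnSVC  # assumed import path
from art.attacks.poisoning import PoisoningAttackSVM

art_svc = ScikitlearnSVC(model=SVC(kernel="linear"), clip_values=(0, 1))
art_svc.fit(x_train, y_train)

attack = PoisoningAttackSVM(
    classifier=art_svc, step=0.1, eps=1.0,
    x_train=x_train, y_train=y_train, x_val=x_val, y_val=y_val, max_iter=100,
)
x_poison, y_poison = attack.poison(x_init, y=y_init)
```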
Sleeper Agent Attack¶
- class art.attacks.poisoning.SleeperAgentAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, patch: ndarray, indices_target: List[int], epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1, patching_strategy: str = 'random', selection_strategy: str = 'random', retraining_factor: int = 1, model_retrain: bool = False, model_retraining_epoch: int = 1, class_source: int = 0, class_target: int = 1, device_name: str = 'cpu', retrain_batch_size: int = 128)¶
Implementation of Sleeper Agent Attack.
Paper link: https://arxiv.org/pdf/2106.08970.pdf
- __init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, percent_poison: float, patch: ndarray, indices_target: List[int], epsilon: float = 0.1, max_trials: int = 8, max_epochs: int = 250, learning_rate_schedule: Tuple[List[float], List[int]] = ([0.1, 0.01, 0.001, 0.0001], [100, 150, 200, 220]), batch_size: int = 128, clip_values: Tuple[float, float] = (0, 1.0), verbose: int = 1, patching_strategy: str = 'random', selection_strategy: str = 'random', retraining_factor: int = 1, model_retrain: bool = False, model_retraining_epoch: int = 1, class_source: int = 0, class_target: int = 1, device_name: str = 'cpu', retrain_batch_size: int = 128)¶
Initialize a Sleeper Agent poisoning attack.
- Parameters
classifier – The proxy classifier used for the attack.
percent_poison (float) – The ratio of samples to poison among x_train, with range [0, 1].
patch (ndarray) – The patch to be applied as the trigger.
indices_target (List) – The indices of training data having the target label.
epsilon (float) – The L-inf perturbation budget.
max_trials (int) – The maximum number of restarts to optimize the poison.
max_epochs (int) – The maximum number of epochs per trial to optimize the poison.
learning_rate_schedule (Tuple) – The learning rate schedule to optimize the poison. A list of (learning rate, epoch) pairs. The learning rate is used if the current epoch is less than the specified epoch.
batch_size (int) – Batch size.
clip_values (Tuple) – The range of the input features to the classifier.
verbose (int) – Show progress bars.
patching_strategy (str) – Patching strategy to be used for adding the trigger, either random or fixed.
selection_strategy (str) – Selection strategy for getting the indices of poison examples, either random or maximum gradient norm.
retraining_factor (int) – The factor for which retraining needs to be applied.
model_retrain (bool) – True if retraining has to be applied, else False.
model_retraining_epoch (int) – The epochs for which retraining has to be applied.
class_source (int) – The source class from which triggers were selected.
class_target (int) – The target label to which the poisoned model needs to misclassify.
retrain_batch_size (int) – Batch size required for model retraining.
- get_poison_indices() ndarray ¶
- Returns
The indices of the best poison samples.
- poison(x_trigger: ndarray, y_trigger: ndarray, x_train: ndarray, y_train: ndarray, x_test: ndarray, y_test: ndarray) Tuple[ndarray, ndarray] ¶
Optimizes a portion of poisoned samples from x_train to make a model classify x_trigger as y_trigger by matching the gradients.
- Return type
Tuple
- Parameters
x_trigger (ndarray) – A list of samples to use as triggers.
y_trigger (ndarray) – A list of target classes to classify the triggers into.
x_train (ndarray) – A list of training data to poison a portion of.
y_train (ndarray) – A list of labels for x_train.
- Returns
x_train, y_train and indices of poisoned samples. Here, x_train are the samples selected from target class in training data.
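Example (illustrative sketch): assumes proxy_classifier is an ART neural-network classifier, patch is a small trigger image, indices_target are the training indices belonging to class_target, and the x_/y_ arrays are the usual train/test splits plus trigger samples drawn from class_source.

```python
# Illustrative sketch: `proxy_classifier`, `patch`, `indices_target`, and the data splits are assumed to exist.
from art.attacks.poisoning import SleeperAgentAttack

attack = SleeperAgentAttack(
    classifier=proxy_classifier, percent_poison=0.10,
    patch=patch, indices_target=indices_target,
    epsilon=16 / 255, class_source=0, class_target=1, device_name="cpu",
)
x_train_poisoned, y_train_poisoned = attack.poison(
    x_trigger, y_trigger, x_train, y_train, x_test, y_test
)
poison_indices = attack.get_poison_indices()
```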