art.attacks.poisoning
Module providing poisoning attacks under a common interface.
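All of the attack classes documented below are exposed at the package level; a minimal import sketch (availability of individual classes can vary with the installed ART version):

```python
from art.attacks.poisoning import (
    PoisoningAttackAdversarialEmbedding,
    PoisoningAttackBackdoor,
    BullseyePolytopeAttackPyTorch,
    PoisoningAttackCleanLabelBackdoor,
    FeatureCollisionAttack,
    PoisoningAttackSVM,
)
```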
Adversarial Embedding Attack

class art.attacks.poisoning.PoisoningAttackAdversarialEmbedding(classifier: CLASSIFIER_TYPE, backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, feature_layer: Union[int, str], target: Union[numpy.ndarray, List[Tuple[numpy.ndarray, numpy.ndarray]]], pp_poison: Union[float, List[float]] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)

Implementation of the Adversarial Embedding attack by Tan and Shokri (2019), "Bypassing Backdoor Detection Algorithms in Deep Learning".

This attack trains a classifier with an additional discriminator and loss function that aims to make the latent representations of backdoored and benign examples indistinguishable.

Paper link: https://arxiv.org/abs/1905.13409
__init__(classifier: CLASSIFIER_TYPE, backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, feature_layer: Union[int, str], target: Union[numpy.ndarray, List[Tuple[numpy.ndarray, numpy.ndarray]]], pp_poison: Union[float, List[float]] = 0.05, discriminator_layer_1: int = 256, discriminator_layer_2: int = 128, regularization: float = 10, learning_rate: float = 0.0001, clone=True)

Initialize an Adversarial Embedding poisoning attack.

- Parameters
  classifier – A neural network classifier.
  backdoor (PoisoningAttackBackdoor) – The backdoor attack used to poison samples.
  feature_layer – The layer of the original network to extract features from.
  target – The target label to poison.
  pp_poison – The percentage of training data to poison.
  discriminator_layer_1 (int) – The size of the first discriminator layer.
  discriminator_layer_2 (int) – The size of the second discriminator layer.
  regularization (float) – The regularization constant for the backdoor recognition part of the loss function.
  learning_rate (float) – The learning rate used when training the poisoned model.
  clone (bool) – Whether to clone the model or apply the attack to the original model.
get_training_data() → Optional[Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]

Returns the training data generated by the last call to fit.

- Returns
  If fit has been called, the last data, labels, and backdoor labels used to train the model; otherwise None.
poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, broadcast=False, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Calls the perturbation function on the input x and the target labels y.

- Return type
  Tuple
- Parameters
  x (ndarray) – An array with the points that initialize attack points.
  y – The target labels for the attack.
  broadcast (bool) – Whether or not to broadcast a single target label to all samples.
- Returns
  A tuple holding the (poisoning_examples, poisoning_labels).
poison_estimator(x: numpy.ndarray, y: numpy.ndarray, batch_size: int = 64, nb_epochs: int = 10, **kwargs) → CLASSIFIER_TYPE

Train a poisoned model and return it.

- Parameters
  x (ndarray) – Training data.
  y (ndarray) – Training labels.
  batch_size (int) – The size of the batches used for training.
  nb_epochs (int) – The number of epochs to train for.
- Returns
  A classifier with embedded backdoors.
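A minimal usage sketch (not part of the reference above): it assumes an ART neural-network classifier `classifier` and training data `x_train`/`y_train` already exist, and it uses the layer name "dense_1" purely as a placeholder for the victim model's feature layer. The trigger uses the add_pattern_bd helper from art.attacks.poisoning.perturbations.

```python
import numpy as np
from art.attacks.poisoning import PoisoningAttackAdversarialEmbedding, PoisoningAttackBackdoor
from art.attacks.poisoning.perturbations import add_pattern_bd

# Backdoor trigger: stamp a small pixel pattern onto each poisoned sample.
backdoor = PoisoningAttackBackdoor(add_pattern_bd)

# One-hot label that triggered samples should be classified as (10-class example).
target_label = np.zeros(10)
target_label[1] = 1.0

attack = PoisoningAttackAdversarialEmbedding(
    classifier=classifier,      # ART classifier wrapping the victim model (assumed to exist)
    backdoor=backdoor,
    feature_layer="dense_1",    # placeholder: name (or index) of the latent feature layer
    target=target_label,
    pp_poison=0.05,             # poison 5% of the training data
    regularization=10,
)

# Train a backdoored model; the extra discriminator loss pushes the latent
# representations of poisoned and clean samples to look alike.
poisoned_classifier = attack.poison_estimator(x_train, y_train, nb_epochs=10)

# Data, labels and backdoor labels used in the last call to fit.
x_last, y_last, bd_labels = attack.get_training_data()
```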
Backdoor Poisoning Attack

class art.attacks.poisoning.PoisoningAttackBackdoor(perturbation: Union[Callable, List[Callable]])

Implementation of backdoor attacks introduced in Gu et al., 2017.

Applies a number of backdoor perturbation functions and switches the label to the target label.

Paper link: https://arxiv.org/abs/1708.06733
__init__(perturbation: Union[Callable, List[Callable]]) → None

Initialize a backdoor poisoning attack.

- Parameters
  perturbation – A single perturbation function or a list of perturbation functions that modify the input.
poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, broadcast=False, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Calls the perturbation function on the input x and returns the perturbed input and the poison labels for the data.

- Return type
  Tuple
- Parameters
  x (ndarray) – An array with the points that initialize attack points.
  y – The target labels for the attack.
  broadcast (bool) – Whether or not to broadcast a single target label to all samples.
- Returns
  A tuple holding the (poisoning_examples, poisoning_labels).
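A short sketch of poisoning a batch of samples with a single trigger function; `x_clean` (e.g. images scaled to [0, 1]) is assumed to exist, and ART's add_pattern_bd perturbation helper is used as the trigger.

```python
import numpy as np
from art.attacks.poisoning import PoisoningAttackBackdoor
from art.attacks.poisoning.perturbations import add_pattern_bd

backdoor = PoisoningAttackBackdoor(add_pattern_bd)  # trigger: small pixel pattern

# With broadcast=True a single one-hot target label is applied to every sample.
target = np.zeros(10)
target[7] = 1.0  # every poisoned sample is relabelled as class 7

x_poison, y_poison = backdoor.poison(x_clean, y=target, broadcast=True)
```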
Bullseye Polytope Attack

class art.attacks.poisoning.BullseyePolytopeAttackPyTorch(classifier: Union[CLASSIFIER_NEURALNETWORK_TYPE, List[CLASSIFIER_NEURALNETWORK_TYPE]], target: numpy.ndarray, feature_layer: Union[str, int, List[Union[str, int]]], opt: str = 'adam', max_iter: int = 4000, learning_rate: float = 0.04, momentum: float = 0.9, decay_iter: Union[int, List[int]] = 10000, decay_coeff: float = 0.5, epsilon: float = 0.1, dropout: float = 0.3, net_repeat: int = 1, endtoend: bool = True, batch_size: int = 128, verbose: bool = True)

Implementation of the Bullseye Polytope attack by Aghakhani et al., 2020. "Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability"

This implementation is based on UCSB's original code: https://github.com/ucsb-seclab/BullseyePoison

Paper link: https://arxiv.org/abs/2005.00191
__init__(classifier: Union[CLASSIFIER_NEURALNETWORK_TYPE, List[CLASSIFIER_NEURALNETWORK_TYPE]], target: numpy.ndarray, feature_layer: Union[str, int, List[Union[str, int]]], opt: str = 'adam', max_iter: int = 4000, learning_rate: float = 0.04, momentum: float = 0.9, decay_iter: Union[int, List[int]] = 10000, decay_coeff: float = 0.5, epsilon: float = 0.1, dropout: float = 0.3, net_repeat: int = 1, endtoend: bool = True, batch_size: int = 128, verbose: bool = True)

Initialize a Bullseye Polytope clean-label poisoning attack.

- Parameters
  classifier – The proxy classifiers used for the attack. Can be a single classifier or a list of classifiers with varying architectures.
  target (ndarray) – The target input(s) of shape (N, W, H, C) to misclassify at test time. Multiple targets will be averaged.
  feature_layer – The name(s) of the feature representation layer(s).
  opt (str) – The optimizer to use for the attack. Can be 'adam' or 'sgd'.
  max_iter (int) – The maximum number of iterations for the attack.
  learning_rate (float) – The learning rate of the clean-label attack optimization.
  momentum (float) – The momentum of the clean-label attack optimization.
  decay_iter – The iterations at which to decay the learning rate. Can be an integer (decay every N iterations) or a list of iteration numbers, e.g. [0, 500, 1500].
  decay_coeff (float) – The decay coefficient of the learning rate.
  epsilon (float) – The perturbation budget.
  dropout (float) – Dropout to apply while training.
  net_repeat (int) – The number of times to repeat prediction on each network.
  endtoend (bool) – True for end-to-end training, False for transfer learning.
  batch_size (int) – Batch size.
  verbose (bool) – Show progress bars.
poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Iteratively finds optimal attack points starting at the values in x.

- Return type
  Tuple
- Parameters
  x (ndarray) – The base images to begin the poison process.
  y – The target label.
- Returns
  A tuple holding the (poisoning examples, poisoning labels).
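A minimal sketch, assuming one or more PyTorch-based ART classifiers `proxy_classifiers`, a target input `x_target` of shape (N, W, H, C), and base images `x_base` with labels `y_base` drawn from the class the attacker wants the target predicted as; the feature-layer name is a placeholder.

```python
from art.attacks.poisoning import BullseyePolytopeAttackPyTorch

attack = BullseyePolytopeAttackPyTorch(
    classifier=proxy_classifiers,   # single classifier or list of proxy classifiers (assumed to exist)
    target=x_target,                # test-time input(s) to misclassify
    feature_layer="penultimate",    # placeholder: layer(s) providing the feature representation
    max_iter=4000,
    learning_rate=0.04,
    epsilon=0.1,                    # perturbation budget around the base images
)

# Optimise the base images into poison points whose feature representations
# surround ("bullseye") the target; their clean labels are kept.
x_poison, y_poison = attack.poison(x_base, y_base)
```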
Clean Label Backdoor Attack

class art.attacks.poisoning.PoisoningAttackCleanLabelBackdoor(backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, proxy_classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, target: numpy.ndarray, pp_poison: float = 0.33, norm: Union[int, float, str] = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, num_random_init: int = 0)

Implementation of clean-label backdoor attacks, building on the backdoor attacks introduced in Gu et al., 2017.

Applies an adversarial perturbation (computed with the proxy classifier) and the backdoor perturbation functions to samples of the target class, leaving their labels unchanged.

Paper link: https://arxiv.org/abs/1708.06733
__init__(backdoor: art.attacks.poisoning.backdoor_attack.PoisoningAttackBackdoor, proxy_classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, target: numpy.ndarray, pp_poison: float = 0.33, norm: Union[int, float, str] = inf, eps: float = 0.3, eps_step: float = 0.1, max_iter: int = 100, num_random_init: int = 0) → None

Creates a new clean-label backdoor poisoning attack.

- Parameters
  backdoor (PoisoningAttackBackdoor) – The backdoor chosen for this attack.
  proxy_classifier – The classifier used by this attack; ideally it solves the same or a similar classification task as the original classifier.
  target (ndarray) – The target label to poison.
  pp_poison (float) – The percentage of the data to poison. Note: only data within the target label is poisoned.
  norm – The norm of the adversarial perturbation, supporting "inf", np.inf, 1 or 2.
  eps (float) – Maximum perturbation that the attacker can introduce.
  eps_step (float) – Attack step size (input variation) at each iteration.
  max_iter (int) – The maximum number of iterations.
  num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 the attack starts at the original input.
poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, broadcast: bool = True, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Calls the perturbation function on the input x and returns the perturbed input and the poison labels for the data.

- Return type
  Tuple
- Parameters
  x (ndarray) – An array with the points that initialize attack points.
  y – The target labels for the attack.
  broadcast (bool) – Whether or not to broadcast a single target label to all samples.
- Returns
  A tuple holding the (poisoning_examples, poisoning_labels).
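A minimal sketch, assuming an ART classifier `proxy_classifier` providing loss gradients and a training set `x_train`/`y_train` with one-hot labels; ART's add_pattern_bd helper is used as the trigger.

```python
import numpy as np
from art.attacks.poisoning import PoisoningAttackBackdoor, PoisoningAttackCleanLabelBackdoor
from art.attacks.poisoning.perturbations import add_pattern_bd

backdoor = PoisoningAttackBackdoor(add_pattern_bd)

# Only samples already labelled with this class are poisoned; labels never change.
target = np.zeros(10)
target[3] = 1.0

attack = PoisoningAttackCleanLabelBackdoor(
    backdoor=backdoor,
    proxy_classifier=proxy_classifier,  # classifier solving a similar task (assumed to exist)
    target=target,
    pp_poison=0.33,                     # poison 33% of the target-class samples
)

# x_poison holds the (partially) poisoned data; y_poison the corresponding labels, unchanged.
x_poison, y_poison = attack.poison(x_train, y_train)
```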
Feature Collision Attack

class art.attacks.poisoning.FeatureCollisionAttack(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: numpy.ndarray, feature_layer: Union[str, int], learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: Optional[float] = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: Optional[float] = None, verbose: bool = True)

Close implementation of the Feature Collision poisoning attack by Shafahi, Huang, et al., 2018. "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks"

This implementation dynamically calculates the dimension of the feature layer rather than hardcoding it to 2048 as done in the paper. We therefore recommend using larger values for similarity_coeff.

Paper link: https://arxiv.org/abs/1804.00792
__init__(classifier: CLASSIFIER_NEURALNETWORK_TYPE, target: numpy.ndarray, feature_layer: Union[str, int], learning_rate: float = 127500.0, decay_coeff: float = 0.5, stopping_tol: float = 1e-10, obj_threshold: Optional[float] = None, num_old_obj: int = 40, max_iter: int = 120, similarity_coeff: float = 256.0, watermark: Optional[float] = None, verbose: bool = True)

Initialize a Feature Collision clean-label poisoning attack.

- Parameters
  classifier – A trained neural network classifier.
  target (ndarray) – The target input to misclassify at test time.
  feature_layer – The name of the feature representation layer.
  learning_rate (float) – The learning rate of the clean-label attack optimization.
  decay_coeff (float) – The decay coefficient of the learning rate.
  stopping_tol (float) – Stop iterations once changes in the attack points are smaller than this threshold.
  obj_threshold – Stop iterations once changes in the objective value are smaller than this threshold.
  num_old_obj (int) – The number of old objective values to store.
  max_iter (int) – The maximum number of iterations for the attack.
  similarity_coeff (float) – The coefficient weighting the similarity term that keeps the poison close to the base image in input space.
  watermark – The opacity of the target-image watermark blended onto the base images, or None for no watermark.
  verbose (bool) – Show progress bars.
backward_step(base: numpy.ndarray, feature_rep: numpy.ndarray, poison: numpy.ndarray) → numpy.ndarray

Backward part of the forward-backward splitting algorithm.

- Return type
  ndarray
- Parameters
  base (ndarray) – The base image that the poison was initialized with.
  feature_rep (ndarray) – NumPy activations at the target layer.
  poison (ndarray) – The current poison samples.
- Returns
  A poison example closer to the target in feature representation.
forward_step(poison: numpy.ndarray) → numpy.ndarray

Forward part of the forward-backward splitting algorithm.

- Return type
  ndarray
- Parameters
  poison (ndarray) – The current poison samples.
- Returns
  A poison example closer to the target in feature representation.
objective(poison_feature_rep: numpy.ndarray, target_feature_rep: numpy.ndarray, base_image: numpy.ndarray, poison: numpy.ndarray) → float

Objective function of the attack.

- Return type
  float
- Parameters
  poison_feature_rep (ndarray) – The NumPy activations of the poison image.
  target_feature_rep (ndarray) – The NumPy activations of the target image.
  base_image (ndarray) – The initial image used to poison.
  poison (ndarray) – The current poison image.
- Returns
  The value of the optimization objective.
poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Iteratively finds optimal attack points starting at the values in x.

- Return type
  Tuple
- Parameters
  x (ndarray) – The base images to begin the poison process.
  y – Not used in this attack (clean-label).
- Returns
  A tuple holding the (poisoning examples, poisoning labels).
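A minimal sketch, assuming a trained ART neural-network classifier `classifier` (e.g. a KerasClassifier), a target test instance `x_target`, and base images `x_base` taken from the class whose label the poisons will keep; the feature-layer name is a placeholder.

```python
from art.attacks.poisoning import FeatureCollisionAttack

attack = FeatureCollisionAttack(
    classifier=classifier,
    target=x_target,                # the test instance to misclassify
    feature_layer="feature_dense",  # placeholder: name (or index) of the feature layer
    max_iter=120,
    similarity_coeff=256.0,
    watermark=0.3,                  # optionally blend 30% of the target into the base images
)

# Nudges the base images so their feature representations collide with the
# target's while they remain visually close to the originals (clean labels).
x_poison, y_poison = attack.poison(x_base)
```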
Poisoning SVM Attack

class art.attacks.poisoning.PoisoningAttackSVM(classifier: art.estimators.classification.scikitlearn.ScikitlearnSVC, step: Optional[float] = None, eps: Optional[float] = None, x_train: Optional[numpy.ndarray] = None, y_train: Optional[numpy.ndarray] = None, x_val: Optional[numpy.ndarray] = None, y_val: Optional[numpy.ndarray] = None, max_iter: int = 100, verbose: bool = True)

Close implementation of the poisoning attack on Support Vector Machines (SVM) by Biggio et al.

Paper link: https://arxiv.org/pdf/1206.6389.pdf
__init__(classifier: art.estimators.classification.scikitlearn.ScikitlearnSVC, step: Optional[float] = None, eps: Optional[float] = None, x_train: Optional[numpy.ndarray] = None, y_train: Optional[numpy.ndarray] = None, x_val: Optional[numpy.ndarray] = None, y_val: Optional[numpy.ndarray] = None, max_iter: int = 100, verbose: bool = True) → None

Initialize an SVM poisoning attack.

- Parameters
  classifier – A trained ScikitlearnSVC classifier.
  step – The step size of the classifier.
  eps – The minimum difference in loss before convergence of the classifier.
  x_train – The training data used for classification.
  y_train – The training labels used for classification.
  x_val – The validation data used to test the attack.
  y_val – The validation labels used to test the attack.
  max_iter (int) – The maximum number of iterations for the attack.
  verbose (bool) – Show progress bars.
- Raises
  NotImplementedError, TypeError – If the argument classifier has the wrong type.
attack_gradient(attack_point: numpy.ndarray, tol: float = 0.0001) → numpy.ndarray

Calculates the attack gradient, or dP for this attack. See equation 8 in Biggio et al., Ch. 14.

- Return type
  ndarray
- Parameters
  attack_point (ndarray) – The current attack point.
  tol (float) – Tolerance level.
- Returns
  The attack gradient.
generate_attack_point(x_attack: numpy.ndarray, y_attack: numpy.ndarray) → numpy.ndarray

Generate a single poison point against the model, using x_val and y_val as validation points. The attack begins at the point x_attack. The attack class will be the opposite of the model's classification for x_attack.

- Return type
  ndarray
- Parameters
  x_attack (ndarray) – The initial attack point.
  y_attack (ndarray) – The initial attack label.
- Returns
  The final attack point.
poison(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

Iteratively finds optimal attack points starting at the values in x.

- Return type
  Tuple
- Parameters
  x (ndarray) – An array with the points that initialize attack points.
  y – The target labels for the attack.
- Returns
  A tuple holding the (poisoning_examples, poisoning_labels).
predict_sign(vec: numpy.ndarray) → numpy.ndarray

Predicts the inputs with the binary classifier and outputs -1 and 1 instead of 0 and 1.

- Return type
  ndarray
- Parameters
  vec (ndarray) – An input array.
- Returns
  An array of -1/1 predictions.
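A minimal sketch, assuming a binary dataset already split into `x_train`/`y_train` and `x_val`/`y_val` with one-hot labels, plus seed points `x_seed`/`y_seed` used to initialise the attack; all of these are placeholders.

```python
from sklearn.svm import SVC
from art.estimators.classification.scikitlearn import ScikitlearnSVC
from art.attacks.poisoning import PoisoningAttackSVM

# Wrap and train a linear SVM with ART's scikit-learn estimator.
model = SVC(kernel="linear")
classifier = ScikitlearnSVC(model=model, clip_values=(0.0, 1.0))
classifier.fit(x_train, y_train)

attack = PoisoningAttackSVM(
    classifier=classifier,
    step=0.1,          # gradient step size
    eps=1.0,           # minimum loss change before convergence
    x_train=x_train,
    y_train=y_train,
    x_val=x_val,       # validation split used to evaluate candidate poison points
    y_val=y_val,
    max_iter=100,
)

# Each seed point is optimised into a poison point; its class is flipped
# relative to the model's prediction for the seed.
x_poison, y_poison = attack.poison(x_seed, y_seed)
```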