art.defences.trainer¶
Module implementing train-based defences against adversarial attacks.
Base Class Trainer¶
- class art.defences.trainer.Trainer(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE)¶
Abstract base class for training defences.
- __init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE) None ¶
Create an adversarial training object.
- property classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE¶
Access function to get the classifier.
- Returns
The classifier.
- abstract fit(x: ndarray, y: ndarray, **kwargs) None ¶
Train the model.
- Parameters
x (ndarray) – Training data.
y (ndarray) – Labels for the training data.
kwargs – Other parameters.
- get_classifier() CLASSIFIER_LOSS_GRADIENTS_TYPE ¶
Return the classifier trained via adversarial training.
- Returns
The classifier.
Adversarial Training¶
- class art.defences.trainer.AdversarialTrainer(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, attacks: Union[EvasionAttack, List[EvasionAttack]], ratio: float = 0.5)¶
Class performing adversarial training based on a model architecture and one or multiple attack methods.
Incorporates original adversarial training, ensemble adversarial training (https://arxiv.org/abs/1705.07204), training on all adversarial data and other common setups. If multiple attacks are specified, they are rotated for each batch. If the specified attacks target a different model, the attack is transferred. The ratio determines how many of the clean samples in each batch are replaced with their adversarial counterparts.
Warning
Both successful and unsuccessful adversarial samples are used for training. In the case of unbounded attacks (e.g., DeepFool), this can result in invalid (very noisy) samples being included.
Paper link: https://arxiv.org/abs/1705.07204
Please keep in mind the limitations of defences. While adversarial training is widely regarded as a promising, principled approach to making classifiers more robust (see https://arxiv.org/abs/1802.00420), very careful evaluations are required to assess its effectiveness case by case (see https://arxiv.org/abs/1902.06705).
- __init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, attacks: Union[EvasionAttack, List[EvasionAttack]], ratio: float = 0.5) None ¶
Create an AdversarialTrainer instance.
- Parameters
classifier – Model to train adversarially.
attacks – Attacks to use for data augmentation in adversarial training.
ratio (float) – The proportion of samples in each batch to be replaced with their adversarial counterparts. Setting this value to 1 trains only on adversarial samples.
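The effect of ratio and the per-batch rotation of attacks can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ART's implementation: the helper name pick_adversarial_indices, the rounding-up of the count, the random choice of positions and the placeholder attack names are all hypothetical.

```python
import math
import random


def pick_adversarial_indices(batch_size, ratio, rng):
    """Choose which positions in a batch get replaced by adversarial samples.

    Hypothetical helper: rounding up and sampling without replacement are
    assumptions of this sketch.
    """
    nb_adv = min(batch_size, math.ceil(ratio * batch_size))
    if nb_adv == batch_size:
        # ratio = 1: every sample in the batch is replaced.
        return list(range(batch_size))
    return sorted(rng.sample(range(batch_size), nb_adv))


rng = random.Random(0)
attacks = ["attack_a", "attack_b"]  # placeholders standing in for EvasionAttack objects

schedule = []
for batch_id in range(4):
    attack = attacks[batch_id % len(attacks)]  # rotate attacks from batch to batch
    adv_ids = pick_adversarial_indices(batch_size=8, ratio=0.5, rng=rng)
    schedule.append((attack, adv_ids))

for attack, adv_ids in schedule:
    print(attack, adv_ids)
```

With ratio=0.5 and a batch of 8, four positions per batch are marked for replacement, and the two attacks alternate across batches.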
- fit(x: ndarray, y: ndarray, batch_size: int = 128, nb_epochs: int = 20, **kwargs) None ¶
Train a model adversarially. See class documentation for more information on the exact procedure.
- Parameters
x (ndarray) – Training set.
y (ndarray) – Labels for the training set.
batch_size (int) – Size of batches.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.
- fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) None ¶
Train a model adversarially using a data generator. See class documentation for more information on the exact procedure.
- Parameters
generator – Data generator.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.
- predict(x: ndarray, **kwargs) ndarray ¶
Perform prediction using the adversarially trained classifier.
- Return type
ndarray
- Parameters
x (ndarray) – Input samples.
kwargs – Other parameters to be passed on to the predict function of the classifier.
- Returns
Predictions for test set.
Adversarial Training Madry PGD¶
- class art.defences.trainer.AdversarialTrainerMadryPGD(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, nb_epochs: Optional[int] = 205, batch_size: Optional[int] = 128, eps: Union[int, float] = 8, eps_step: Union[int, float] = 2, max_iter: int = 7, num_random_init: int = 1)¶
Class performing adversarial training following Madry’s Protocol.
Paper link: https://arxiv.org/abs/1706.06083
Please keep in mind the limitations of defences. While adversarial training is widely regarded as a promising, principled approach to making classifiers more robust (see https://arxiv.org/abs/1802.00420), very careful evaluations are required to assess its effectiveness case by case (see https://arxiv.org/abs/1902.06705).
- __init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, nb_epochs: Optional[int] = 205, batch_size: Optional[int] = 128, eps: Union[int, float] = 8, eps_step: Union[int, float] = 2, max_iter: int = 7, num_random_init: int = 1) None ¶
Create an AdversarialTrainerMadryPGD instance.
Default values are for CIFAR-10 in pixel range 0-255.
- Parameters
classifier – Classifier to train adversarially.
nb_epochs – Number of training epochs.
batch_size – Size of the batch on which adversarial samples are generated.
eps – Maximum perturbation that the attacker can introduce.
eps_step – Attack step size (input variation) at each iteration.
max_iter (int) – The maximum number of iterations.
num_random_init (int) – Number of random initialisations within the epsilon ball. If num_random_init=0, the attack starts at the original input.
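Because the defaults are stated for pixel values in 0-255, inputs scaled to 0-1 would need eps and eps_step rescaled accordingly. The sketch below shows that conversion and a single-sample L-infinity PGD loop on a toy gradient. The function pgd_linf and the stand-in gradient are hypothetical; this is not ART's attack implementation and it omits random initialisation.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(x, grad_fn, eps, eps_step, max_iter, clip_min, clip_max):
    """Iterated signed-gradient steps, projected back into the eps ball around x.

    Hypothetical sketch that starts at the original input.
    """
    x_adv = list(x)
    for _ in range(max_iter):
        grad = grad_fn(x_adv)
        for i, g in enumerate(grad):
            stepped = x_adv[i] + eps_step * sign(g)
            # Project into [x - eps, x + eps], then into the valid pixel range.
            stepped = max(x[i] - eps, min(x[i] + eps, stepped))
            x_adv[i] = max(clip_min, min(clip_max, stepped))
    return x_adv


# Defaults quoted for the 0-255 range, rescaled for inputs in 0-1.
eps_255, eps_step_255 = 8, 2
eps_01, eps_step_01 = eps_255 / 255, eps_step_255 / 255

# Toy gradient that pushes every pixel upwards (stand-in for a loss gradient).
x = [0.0, 100.0, 250.0]
x_adv = pgd_linf(x, lambda v: [1.0] * len(v), eps=8, eps_step=2,
                 max_iter=7, clip_min=0.0, clip_max=255.0)
print(x_adv)
```

In this toy run each pixel moves by at most eps=8 from its original value, and the third pixel is additionally capped at the upper end of the pixel range.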
- fit(x: ndarray, y: ndarray, validation_data: Optional[ndarray] = None, batch_size: Optional[int] = None, nb_epochs: Optional[int] = None, **kwargs) None ¶
Train a model adversarially. See class documentation for more information on the exact procedure.
- Parameters
x (ndarray) – Training data.
y (ndarray) – Labels for the training data.
validation_data – Validation data.
batch_size – Size of batches. Overrides batch_size defined in __init__ if not None.
nb_epochs – Number of epochs to use for training. Overrides nb_epochs defined in __init__ if not None.
kwargs – Dictionary of framework-specific arguments.
- get_classifier() CLASSIFIER_LOSS_GRADIENTS_TYPE ¶
Return the classifier trained via adversarial training.
- Returns
The classifier.
Base Class Adversarial Training Fast is Better than Free¶
- class art.defences.trainer.AdversarialTrainerFBF(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, eps: Union[int, float] = 8)¶
This is an abstract class for the backend-specific implementations of the Fast is Better than Free protocol for adversarial training.
Paper link: https://openreview.net/forum?id=BJx040EFvH
- __init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, eps: Union[int, float] = 8)¶
Create an AdversarialTrainerFBF instance.
- Parameters
classifier – Model to train adversarially.
eps – Maximum perturbation that the attacker can introduce.
- abstract fit(x: ndarray, y: ndarray, validation_data: Optional[Tuple[ndarray, ndarray]] = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)¶
Train a model adversarially with FBF. See class documentation for more information on the exact procedure.
- Parameters
x (ndarray) – Training set.
y (ndarray) – Labels for the training set.
validation_data – Tuple consisting of validation data, (x_val, y_val).
batch_size (int) – Size of batches.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.
- abstract fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs)¶
Train a model adversarially using a data generator. See class documentation for more information on the exact procedure.
- Parameters
generator – Data generator.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.
- predict(x: ndarray, **kwargs) ndarray ¶
Perform prediction using the adversarially trained classifier.
- Return type
ndarray
- Parameters
x (ndarray) – Input samples.
kwargs – Other parameters to be passed on to the predict function of the classifier.
- Returns
Predictions for test set.
Adversarial Training Fast is Better than Free - PyTorch¶
- class art.defences.trainer.AdversarialTrainerFBFPyTorch(classifier: PyTorchClassifier, eps: Union[int, float] = 8, use_amp: bool = False)¶
Class performing adversarial training following the Fast is Better Than Free protocol.
Paper link: https://openreview.net/forum?id=BJx040EFvH
The effectiveness of this protocol is found to be sensitive to the use of techniques like data augmentation, gradient clipping and learning rate schedules. Optionally, mixed-precision arithmetic via the apex library can significantly reduce training time, making this one of the fastest adversarial training protocols.
- __init__(classifier: PyTorchClassifier, eps: Union[int, float] = 8, use_amp: bool = False)¶
Create an AdversarialTrainerFBFPyTorch instance.
- Parameters
classifier – Model to train adversarially.
eps – Maximum perturbation that the attacker can introduce.
use_amp (bool) – Boolean that decides if apex should be used for mixed precision arithmetic during training.
- fit(x: ndarray, y: ndarray, validation_data: Optional[Tuple[ndarray, ndarray]] = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)¶
Train a model adversarially with FBF protocol. See class documentation for more information on the exact procedure.
- Parameters
x (ndarray) – Training set.
y (ndarray) – Labels for the training set.
validation_data – Tuple consisting of validation data, (x_val, y_val).
batch_size (int) – Size of batches.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.
- fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs)¶
Train a model adversarially with FBF protocol using a data generator. See class documentation for more information on the exact procedure.
- Parameters
generator – Data generator.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.
Adversarial Training Certified - PyTorch¶
- class art.defences.trainer.AdversarialTrainerCertifiedPytorch(classifier: CERTIFIER_TYPE, nb_epochs: Optional[int] = 20, bound: float = 0.1, loss_weighting: float = 0.1, batch_size: int = 10, use_certification_schedule: bool = True, certification_schedule: Optional[Any] = None, augment_with_pgd: bool = True, pgd_params: Optional[PGDParamDict] = None)¶
Class performing certified adversarial training, using methods such as the one described in the linked paper.
Paper link: https://arxiv.org/pdf/1810.12715.pdf
- __init__(classifier: CERTIFIER_TYPE, nb_epochs: Optional[int] = 20, bound: float = 0.1, loss_weighting: float = 0.1, batch_size: int = 10, use_certification_schedule: bool = True, certification_schedule: Optional[Any] = None, augment_with_pgd: bool = True, pgd_params: Optional[PGDParamDict] = None) None ¶
Create an AdversarialTrainerCertifiedPytorch instance.
Default values are for MNIST in pixel range 0-1.
- Parameters
classifier – Classifier to train adversarially.
pgd_params – A dictionary containing the specific parameters relating to regular PGD training. If not provided, we will default to typical MNIST values. Otherwise must contain the following keys:
eps: Maximum perturbation that the attacker can introduce.
eps_step: Attack step size (input variation) at each iteration.
max_iter: The maximum number of iterations.
batch_size: Size of the batch on which adversarial samples are generated.
num_random_init: Number of random initialisations within the epsilon ball.
bound (float) – The perturbation range for the zonotope. Will be ignored if a certification_schedule is used.
loss_weighting (float) – Weighting factor for the certified loss.
nb_epochs – Number of training epochs.
use_certification_schedule (bool) – Whether to use a training schedule for the certification radius.
certification_schedule – Schedule for gradually increasing the certification radius. Empirical studies have shown that this is often required to achieve best performance. Either True to use the default linear scheduler, or a class with a .step() method that returns the updated bound every epoch.
batch_size (int) – Size of batches to use for certified training. NB: the data is run sequentially, accumulating gradients over the batch size.
- fit(x: ndarray, y: ndarray, certification_loss: Any = 'interval_loss_cce', batch_size: Optional[int] = None, nb_epochs: Optional[int] = None, training_mode: bool = True, scheduler: Optional[Any] = None, verbose: bool = True, **kwargs) None ¶
Fit the classifier on the training set (x, y).
- Parameters
x (ndarray) – Training data.
y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or index labels of shape (nb_samples,).
certification_loss – Which certification loss function to use. Either “interval_loss_cce” or “max_logit_loss”. By default will use interval_loss_cce. Alternatively, a user can supply their own loss function which takes in as input the zonotope predictions of the form () and labels of the form () and returns a scalar loss.
batch_size – Size of batches to use for certified training. NB: the data is run sequentially, accumulating gradients over the batch size.
nb_epochs – Number of epochs to use for training.
training_mode (bool) – True to set the model to training mode, False to set it to evaluation mode.
scheduler – Learning rate scheduler to run at the start of every epoch.
verbose (bool) – Whether to display the per-batch statistics while training.
kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch and providing it has no effect.
- predict(x: ndarray, **kwargs) ndarray ¶
Perform prediction using the adversarially trained classifier.
- Return type
ndarray
- Parameters
x (ndarray) – Input samples.
kwargs – Other parameters to be passed on to the predict function of the classifier.
- Returns
Predictions for test set.
- predict_zonotopes(cent: ndarray, bound, **kwargs) Tuple[List[ndarray], List[ndarray]] ¶
Perform prediction with the adversarially trained classifier using zonotopes.
- Return type
Tuple
- Parameters
cent (ndarray) – The datapoint, representing the zonotope center.
bound – The perturbation range for the zonotope.
- set_forward_mode(mode: str) None ¶
Helper function to set the forward mode of the model
- Parameters
mode (str) – Either concrete or abstract, signifying how to run the forward pass.
Adversarial Training Certified Interval Bound Propagation - PyTorch¶
- class art.defences.trainer.AdversarialTrainerCertifiedIBPPyTorch(classifier: IBP_CERTIFIER_TYPE, nb_epochs: Optional[int] = 20, bound: float = 0.1, batch_size: int = 32, loss_weighting: Optional[int] = None, use_certification_schedule: bool = True, certification_schedule: Optional[Any] = None, use_loss_weighting_schedule: bool = True, loss_weighting_schedule: Optional[Any] = None, augment_with_pgd: bool = False, pgd_params: Optional[PGDParamDict] = None)¶
Class performing certified adversarial training, using methods such as the one described in the linked paper.
Paper link: https://arxiv.org/pdf/1810.12715.pdf
- __init__(classifier: IBP_CERTIFIER_TYPE, nb_epochs: Optional[int] = 20, bound: float = 0.1, batch_size: int = 32, loss_weighting: Optional[int] = None, use_certification_schedule: bool = True, certification_schedule: Optional[Any] = None, use_loss_weighting_schedule: bool = True, loss_weighting_schedule: Optional[Any] = None, augment_with_pgd: bool = False, pgd_params: Optional[PGDParamDict] = None) None ¶
Create an AdversarialTrainerCertifiedIBPPyTorch instance.
Default values are for MNIST in pixel range 0-1.
- Parameters
classifier – Classifier to train adversarially.
pgd_params – A dictionary containing the specific parameters relating to regular PGD training. If not provided, we will default to typical MNIST values. Otherwise must contain the following keys:
eps: Maximum perturbation that the attacker can introduce.
eps_step: Attack step size (input variation) at each iteration.
max_iter: The maximum number of iterations.
batch_size: Size of the batch on which adversarial samples are generated.
num_random_init: Number of random initialisations within the epsilon ball.
loss_weighting – Weighting factor for the certified loss.
bound (float) – The perturbation range for the interval. If the default certification schedule is used, this is the upper limit.
nb_epochs – Number of training epochs.
use_certification_schedule (bool) – Whether to use a training schedule for the certification radius.
certification_schedule – Schedule for gradually increasing the certification radius. Empirical studies have shown that this is often required to achieve best performance. Either True to use the default linear scheduler, or a class with a .step() method that returns the updated bound every epoch.
batch_size (int) – Size of batches to use for certified training.
- fit(x: ndarray, y: ndarray, limits: Optional[Union[List[float], ndarray]] = None, certification_loss: Any = 'interval_loss_cce', batch_size: Optional[int] = None, nb_epochs: Optional[int] = None, training_mode: bool = True, scheduler: Optional[Any] = None, verbose: bool = True, **kwargs) None ¶
Fit the classifier on the training set (x, y).
- Parameters
x (ndarray) – Training data.
y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or index labels of shape (nb_samples,).
limits – Max and min limits on the inputs, limits[0] being the lower bounds and limits[1] being upper bounds. Passing None will mean no clipping is applied to the interval abstraction. Typical images will have limits of [0.0, 1.0] after normalization.
certification_loss – Which certification loss function to use. Either “interval_loss_cce” or “max_logit_loss”. By default will use interval_loss_cce. Alternatively, a user can supply their own loss function which takes in as input the interval predictions of the form () and labels of the form () and returns a scalar loss.
batch_size – Size of batches to use for certified training. NB, this will run the data sequentially accumulating gradients over the batch size.
nb_epochs – Number of epochs to use for training.
training_mode (bool) – True to set the model to training mode, False to set it to evaluation mode.
scheduler – Learning rate scheduler to run at the start of every epoch.
verbose (bool) – Whether to display the per-batch statistics while training.
kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch and providing it has no effect.
- static initialise_default_scheduler(initial_val: float, final_val: float, epochs: int) DefaultLinearScheduler ¶
Create linear schedulers based on default example values.
- Return type
DefaultLinearScheduler
- Parameters
initial_val (float) – Initial value to begin the scheduler from.
final_val (float) – Final value to end the scheduler at.
epochs (int) – Total number of epochs.
- Returns
A linear scheduler initialised with default example values.
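A linear schedule built from an initial value, a final value and a number of epochs can be sketched as a small class whose step() method returns the updated value once per epoch. The class name LinearBoundScheduler and its behaviour after the last epoch (holding the final value) are assumptions made for illustration; ART's DefaultLinearScheduler may differ.

```python
class LinearBoundScheduler:
    """Linearly moves a value from initial_val to final_val over a number of epochs.

    Hypothetical class for illustration only.
    """

    def __init__(self, initial_val, final_val, epochs):
        self.initial_val = initial_val
        self.final_val = final_val
        self.epochs = epochs
        self.epoch = 0

    def step(self):
        """Advance one epoch and return the updated value."""
        # Stop advancing once the final epoch has been reached.
        self.epoch = min(self.epoch + 1, self.epochs)
        fraction = self.epoch / self.epochs
        return self.initial_val + fraction * (self.final_val - self.initial_val)


scheduler = LinearBoundScheduler(initial_val=0.0, final_val=0.1, epochs=4)
bounds = [scheduler.step() for _ in range(5)]
print(bounds)
```

Any object exposing a step() method of this shape would match the contract described for certification_schedule, where the returned value is the bound used for the next epoch.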
- predict(x: ndarray, **kwargs) ndarray ¶
Perform prediction using the adversarially trained classifier.
- Return type
ndarray
- Parameters
x (ndarray) – Input samples.
kwargs – Other parameters to be passed on to the predict function of the classifier.
- Returns
Predictions for test set.
- predict_intervals(x: ndarray, is_interval: bool = False, bounds: Optional[Union[float, List[float], ndarray]] = None, limits: Optional[Union[List[float], ndarray]] = None, batch_size: int = 128, **kwargs) ndarray ¶
Perform prediction with the adversarially trained classifier using intervals.
- Return type
ndarray
- Parameters
x (ndarray) – The datapoint, either:
In the interval format of x[batch_size, 2, feature_1, feature_2, …] where axis=1 corresponds to the [lower, upper] bounds.
Or in regular concrete form, in which case the bounds/limits need to be supplied.
is_interval (bool) – If the datapoint is already in the correct interval format.
bounds – The perturbation range.
limits – The clipping to apply to the interval data.
batch_size (int) – Batch size to use when looping through the data.
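The interval format can be illustrated by converting concrete samples into [lower, upper] pairs along axis 1. The sketch below uses nested lists in place of ndarray and a hypothetical helper to_interval; it assumes a single scalar perturbation range and scalar limits, and it is not the conversion ART performs internally.

```python
def to_interval(x, bound, limits=None):
    """Turn concrete samples into [lower, upper] pairs along axis 1.

    x is a list of flat feature lists; the result has shape
    [batch_size, 2, n_features]. Hypothetical helper for illustration.
    """
    def clip(v):
        # With limits=None no clipping is applied to the interval.
        if limits is None:
            return v
        return max(limits[0], min(limits[1], v))

    return [
        [[clip(v - bound) for v in sample], [clip(v + bound) for v in sample]]
        for sample in x
    ]


x = [[0.05, 0.5], [0.95, 1.0]]
intervals = to_interval(x, bound=0.1, limits=[0.0, 1.0])
print(intervals)
```

Each sample becomes a pair of feature vectors, lower bounds first and upper bounds second, with values outside the limits clipped back into range.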
- set_forward_mode(mode: str) None ¶
Helper function to set the forward mode of the model
- Parameters
mode (str) – Either concrete or abstract, signifying how to run the forward pass.
DP - InstaHide Training¶
- class art.defences.trainer.DPInstaHideTrainer(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, augmentations: Union[Preprocessor, List[Preprocessor]], noise: typing_extensions.Literal['gaussian', 'laplacian', 'exponential'] = 'laplacian', loc: Union[int, float] = 0.0, scale: Union[int, float] = 0.03, clip_values: CLIP_VALUES_TYPE = (0.0, 1.0))¶
Class performing adversarial training following the DP-InstaHide protocol.
Uses data augmentation methods in conjunction with some type of additive noise.
Paper link: https://arxiv.org/abs/2103.02079
- __init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, augmentations: Union[Preprocessor, List[Preprocessor]], noise: typing_extensions.Literal['gaussian', 'laplacian', 'exponential'] = 'laplacian', loc: Union[int, float] = 0.0, scale: Union[int, float] = 0.03, clip_values: CLIP_VALUES_TYPE = (0.0, 1.0))¶
Create a DPInstaHideTrainer instance.
- Parameters
classifier – The model to train using the protocol.
augmentations – The preprocessing data augmentation defence(s) to be applied.
noise – The type of additive noise to use: ‘gaussian’ | ‘laplacian’ | ‘exponential’.
loc – The location or mean parameter of the distribution to sample.
scale – The scale or standard deviation parameter of the distribution to sample.
clip_values – Tuple of the form (min, max) representing the minimum and maximum values allowed for features.
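The additive-noise step controlled by noise, loc, scale and clip_values can be sketched in plain Python. This is an illustration under stated assumptions rather than ART's implementation: the helpers sample_noise and add_noise are hypothetical, the augmentation step is omitted, and shifting exponential noise by loc is an assumption of this sketch.

```python
import random


def sample_noise(noise, loc, scale, rng):
    """Draw one noise value; the handling of loc for 'exponential' is an assumption."""
    if noise == "gaussian":
        return rng.gauss(loc, scale)
    if noise == "laplacian":
        # The difference of two independent Exp(1) draws is Laplace(0, 1).
        return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    if noise == "exponential":
        return loc + rng.expovariate(1.0 / scale)
    raise ValueError(f"Unknown noise type: {noise}")


def add_noise(x, noise="laplacian", loc=0.0, scale=0.03, clip_values=(0.0, 1.0), rng=None):
    """Add noise to each feature, then clip to clip_values (hypothetical helper)."""
    rng = rng or random.Random()
    low, high = clip_values
    return [max(low, min(high, v + sample_noise(noise, loc, scale, rng))) for v in x]


x = [0.0, 0.25, 0.5, 1.0]
noisy = add_noise(x, rng=random.Random(0))
print(noisy)
```

Clipping after the noise is added keeps every feature inside the (min, max) range given by clip_values, whichever distribution is chosen.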
- fit(x: ndarray, y: ndarray, validation_data: Optional[Tuple[ndarray, ndarray]] = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)¶
Train a model adversarially with the DP-InstaHide protocol. See class documentation for more information on the exact procedure.
- Parameters
x (ndarray) – Training set.
y (ndarray) – Labels for the training set.
validation_data – Tuple consisting of validation data, (x_val, y_val).
batch_size (int) – Size of batches.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.
- fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs)¶
Train a model adversarially with the DP-InstaHide protocol using a data generator. See class documentation for more information on the exact procedure.
- Parameters
generator – Data generator.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.
- predict(x: ndarray, **kwargs) ndarray ¶
Perform prediction using the adversarially trained classifier.
- Return type
ndarray
- Parameters
x (ndarray) – Input samples.
kwargs – Other parameters to be passed on to the predict function of the classifier.
- Returns
Predictions for test set.