art.defences.trainer

Module implementing train-based defences against adversarial attacks.

Base Class Trainer

class art.defences.trainer.Trainer(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE)

Abstract base class for training defences.

__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE) → None

Create an adversarial training object.

abstract fit(x: ndarray, y: ndarray, **kwargs) → None

Train the model.

Parameters
  • x (ndarray) – Training data.

  • y (ndarray) – Labels for the training data.

  • kwargs – Other parameters.

get_classifier() → CLASSIFIER_LOSS_GRADIENTS_TYPE

Return the classifier trained via adversarial training.

Returns

The classifier.

Adversarial Training

class art.defences.trainer.AdversarialTrainer(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, attacks: Union[EvasionAttack, List[EvasionAttack]], ratio: float = 0.5)

Class performing adversarial training based on a model architecture and one or multiple attack methods.

Incorporates original adversarial training, ensemble adversarial training (https://arxiv.org/abs/1705.07204), training on all adversarial data, and other common setups. If multiple attacks are specified, they are rotated for each batch. If an attack targets a model different from the one being trained, the adversarial samples are transferred to the trained model. The ratio determines how many of the clean samples in each batch are replaced with their adversarial counterparts.

Warning

Both successful and unsuccessful adversarial samples are used for training. In the case of unbounded attacks (e.g., DeepFool), this can result in invalid (very noisy) samples being included.

Please keep in mind the limitations of defences. While adversarial training is widely regarded as a promising, principled approach to making classifiers more robust (see https://arxiv.org/abs/1802.00420), very careful evaluations are required to assess its effectiveness case by case (see https://arxiv.org/abs/1902.06705).
__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, attacks: Union[EvasionAttack, List[EvasionAttack]], ratio: float = 0.5) → None

Create an AdversarialTrainer instance.

Parameters
  • classifier – Model to train adversarially.

  • attacks – Attacks to use for data augmentation in adversarial training.

  • ratio (float) – The proportion of samples in each batch to be replaced with their adversarial counterparts. Setting this value to 1 trains only on adversarial samples.
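The interplay of the ratio and attack rotation described above can be sketched in plain Python. This is an illustration of the documented semantics, not ART's internal code, and the exact rounding ART applies may differ:

```python
import math

def plan_batch(batch_size, ratio, attacks, batch_idx):
    """Illustrate AdversarialTrainer's documented batch semantics."""
    # If multiple attacks are given, they are rotated: one attack per batch.
    attack = attacks[batch_idx % len(attacks)]
    # `ratio` controls how many clean samples are replaced by adversarial ones.
    nb_adv = int(math.ceil(batch_size * ratio))
    return attack, nb_adv
```

With ratio=0.5 and a batch size of 128, 64 samples per batch would be adversarial; ratio=1 replaces the whole batch.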

fit(x: ndarray, y: ndarray, batch_size: int = 128, nb_epochs: int = 20, **kwargs) → None

Train a model adversarially. See class documentation for more information on the exact procedure.

Parameters
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None

Train a model adversarially using a data generator. See class documentation for more information on the exact procedure.

Parameters
  • generator – Data generator.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

predict(x: ndarray, **kwargs) → ndarray

Perform prediction using the adversarially trained classifier.

Return type

ndarray

Parameters
  • x (ndarray) – Input samples.

  • kwargs – Other parameters to be passed on to the predict function of the classifier.

Returns

Predictions for test set.

Adversarial Training Madry PGD

class art.defences.trainer.AdversarialTrainerMadryPGD(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, nb_epochs: Optional[int] = 205, batch_size: Optional[int] = 128, eps: Union[int, float] = 8, eps_step: Union[int, float] = 2, max_iter: int = 7, num_random_init: int = 1)

Class performing adversarial training following Madry’s protocol (https://arxiv.org/abs/1706.06083).

Please keep in mind the limitations of defences. While adversarial training is widely regarded as a promising, principled approach to making classifiers more robust (see https://arxiv.org/abs/1802.00420), very careful evaluations are required to assess its effectiveness case by case (see https://arxiv.org/abs/1902.06705).
__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, nb_epochs: Optional[int] = 205, batch_size: Optional[int] = 128, eps: Union[int, float] = 8, eps_step: Union[int, float] = 2, max_iter: int = 7, num_random_init: int = 1) → None

Create an AdversarialTrainerMadryPGD instance.

Default values are for CIFAR-10 in pixel range 0-255.

Parameters
  • classifier – Classifier to train adversarially.

  • nb_epochs – Number of training epochs.

  • batch_size – Size of the batch on which adversarial samples are generated.

  • eps – Maximum perturbation that the attacker can introduce.

  • eps_step – Attack step size (input variation) at each iteration.

  • max_iter (int) – The maximum number of iterations.

  • num_random_init (int) – Number of random initialisations within the epsilon ball. With num_random_init=0 the attack starts at the original input.
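The inner maximisation behind these parameters can be sketched in NumPy. This is an illustration of the PGD procedure under the usual L-infinity threat model, not ART's implementation; loss_fn and grad_fn stand in for the classifier's loss and loss gradient:

```python
import numpy as np

def madry_pgd(x, loss_fn, grad_fn, eps, eps_step, max_iter, num_random_init, rng):
    """Sketch of the PGD inner maximisation used in Madry's protocol."""
    best_x, best_loss = x, loss_fn(x)
    # Each restart begins at a random point in the L-inf epsilon ball;
    # with num_random_init=0 the single restart starts at the original input.
    for _ in range(max(num_random_init, 1)):
        if num_random_init > 0:
            delta = rng.uniform(-eps, eps, x.shape)
        else:
            delta = np.zeros_like(x)
        for _ in range(max_iter):
            # One signed-gradient ascent step, then project back onto the ball.
            delta = np.clip(delta + eps_step * np.sign(grad_fn(x + delta)), -eps, eps)
        if loss_fn(x + delta) > best_loss:
            best_x, best_loss = x + delta, loss_fn(x + delta)
    return best_x
```

With the defaults (eps=8, eps_step=2, max_iter=7) the perturbation can traverse the full epsilon ball, which matches the 0-255 pixel-range setting mentioned above.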

fit(x: ndarray, y: ndarray, validation_data: Optional[ndarray] = None, batch_size: Optional[int] = None, nb_epochs: Optional[int] = None, **kwargs) → None

Train a model adversarially. See class documentation for more information on the exact procedure.

Parameters
  • x (ndarray) – Training data.

  • y (ndarray) – Labels for the training data.

  • validation_data – Validation data.

  • batch_size – Size of batches. Overwrites batch_size defined in __init__ if not None.

  • nb_epochs – Number of epochs to use for training. Overwrites nb_epochs defined in __init__ if not None.

  • kwargs – Dictionary of framework-specific arguments.

get_classifier() → CLASSIFIER_LOSS_GRADIENTS_TYPE

Return the classifier trained via adversarial training.

Returns

The classifier.

Base Class Adversarial Training Fast is Better than Free

class art.defences.trainer.AdversarialTrainerFBF(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, eps: Union[int, float] = 8)

This is the abstract base class for the backend-specific implementations of the Fast is Better than Free protocol for adversarial training (https://arxiv.org/abs/2001.03994).

__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, eps: Union[int, float] = 8)

Create an AdversarialTrainerFBF instance.

Parameters
  • classifier – Model to train adversarially.

  • eps – Maximum perturbation that the attacker can introduce.

abstract fit(x: ndarray, y: ndarray, validation_data: Optional[Tuple[ndarray, ndarray]] = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)

Train a model adversarially with FBF. See class documentation for more information on the exact procedure.

Parameters
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • validation_data – Tuple (x_val, y_val) of validation data.

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

abstract fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs)

Train a model adversarially using a data generator. See class documentation for more information on the exact procedure.

Parameters
  • generator – Data generator.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

predict(x: ndarray, **kwargs) → ndarray

Perform prediction using the adversarially trained classifier.

Return type

ndarray

Parameters
  • x (ndarray) – Input samples.

  • kwargs – Other parameters to be passed on to the predict function of the classifier.

Returns

Predictions for test set.

Adversarial Training Fast is Better than Free - PyTorch

class art.defences.trainer.AdversarialTrainerFBFPyTorch(classifier: PyTorchClassifier, eps: Union[int, float] = 8, use_amp: bool = False)

Class performing adversarial training following Fast is Better Than Free protocol.

The effectiveness of this protocol is sensitive to the use of techniques like data augmentation, gradient clipping and learning rate schedules. Optionally, mixed-precision arithmetic via the apex library can significantly reduce training time, making this one of the fastest adversarial training protocols.
__init__(classifier: PyTorchClassifier, eps: Union[int, float] = 8, use_amp: bool = False)

Create an AdversarialTrainerFBFPyTorch instance.

Parameters
  • classifier – Model to train adversarially.

  • eps – Maximum perturbation that the attacker can introduce.

  • use_amp (bool) – Whether to use the apex library for mixed-precision arithmetic during training.
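The core of the protocol is a single-step FGSM attack from a random start inside the epsilon ball. A minimal NumPy sketch follows (an illustration, not ART's implementation; the 1.25·eps step size is the value suggested in the paper, and grad_fn stands in for the classifier's loss gradient):

```python
import numpy as np

def fbf_perturb(x, grad_fn, eps, rng, alpha=None):
    """One-step FGSM from a random start, in the spirit of Fast is Better than Free."""
    alpha = 1.25 * eps if alpha is None else alpha  # step size suggested in the paper
    delta = rng.uniform(-eps, eps, x.shape)         # random initialisation in the ball
    delta = delta + alpha * np.sign(grad_fn(x + delta))
    return x + np.clip(delta, -eps, eps)            # project back onto the ball
```

Because only one gradient evaluation per example is needed, this is far cheaper than multi-step PGD training.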

fit(x: ndarray, y: ndarray, validation_data: Optional[Tuple[ndarray, ndarray]] = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)

Train a model adversarially with FBF protocol. See class documentation for more information on the exact procedure.

Parameters
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for trainings.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs)

Train a model adversarially with FBF protocol using a data generator. See class documentation for more information on the exact procedure.

Parameters
  • generator – Data generator.

  • nb_epochs (int) – Number of epochs to use for trainings.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

Adversarial Training Certified - PyTorch

class art.defences.trainer.AdversarialTrainerCertifiedPytorch(classifier: CERTIFIER_TYPE, nb_epochs: Optional[int] = 20, bound: float = 0.1, loss_weighting: float = 0.1, batch_size: int = 10, use_certification_schedule: bool = True, certification_schedule: Optional[Any] = None, pgd_params: Optional[PGDParamDict] = None)

Class performing certified adversarial training, combining a certified (zonotope-based) loss with regular PGD adversarial training.

__init__(classifier: CERTIFIER_TYPE, nb_epochs: Optional[int] = 20, bound: float = 0.1, loss_weighting: float = 0.1, batch_size: int = 10, use_certification_schedule: bool = True, certification_schedule: Optional[Any] = None, pgd_params: Optional[PGDParamDict] = None) → None

Create an AdversarialTrainerCertifiedPytorch instance.

Default values are for MNIST in pixel range 0-1.

Parameters
  • classifier – Classifier to train adversarially.

  • pgd_params

    A dictionary containing the specific parameters for regular PGD training. If not provided, typical MNIST values are used as defaults. Otherwise it must contain the following keys:

    • eps: Maximum perturbation that the attacker can introduce.

    • eps_step: Attack step size (input variation) at each iteration.

    • max_iter: The maximum number of iterations.

    • batch_size: Size of the batch on which adversarial samples are generated.

    • num_random_init: Number of random initialisations within the epsilon ball.

  • bound (float) – The perturbation range for the zonotope. Will be ignored if a certification_schedule is used.

  • loss_weighting (float) – Weighting factor for the certified loss.

  • nb_epochs – Number of training epochs.

  • use_certification_schedule (bool) – Whether to use a training schedule for the certification radius.

  • certification_schedule – Schedule for gradually increasing the certification radius. Empirical studies have shown that this is often required to achieve best performance. Either True to use the default linear scheduler, or a class with a .step() method that returns the updated bound every epoch.

  • batch_size (int) – Size of batches to use for certified training. Note that the data is processed sequentially, accumulating gradients over the batch.
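The documented scheduler interface is simply a class with a .step() method that returns the updated certification bound each epoch. A hypothetical linear scheduler might look like this (LinearCertificationSchedule and its parameters are illustrative, not part of ART):

```python
class LinearCertificationSchedule:
    """Hypothetical scheduler matching the documented interface:
    .step() returns the updated certification bound each epoch."""

    def __init__(self, target_bound=0.1, warmup_epochs=10):
        self.target = target_bound
        self.warmup = warmup_epochs
        self.epoch = 0

    def step(self):
        # Ramp the bound linearly from 0 to the target over the warm-up
        # epochs, then hold it constant at the target.
        self.epoch += 1
        return min(self.epoch / self.warmup, 1.0) * self.target
```

Growing the radius gradually is the behaviour the note above says is often required for best performance.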

fit(x: ndarray, y: ndarray, certification_loss: Any = 'interval_loss_cce', batch_size: Optional[int] = None, nb_epochs: Optional[int] = None, training_mode: bool = True, scheduler: Optional[Any] = None, verbose: bool = True, **kwargs) → None

Fit the classifier on the training set (x, y).

Parameters
  • x (ndarray) – Training data.

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or index labels of shape (nb_samples,).

  • certification_loss – Which certification loss function to use: either “interval_loss_cce” (the default) or “max_logit_loss”. Alternatively, a user can supply their own loss function, which takes the zonotope predictions and the labels as input and returns a scalar loss.

  • batch_size – Size of batches to use for certified training. Note that the data is processed sequentially, accumulating gradients over the batch.

  • nb_epochs – Number of epochs to use for training.

  • training_mode (bool) – True to set the model to training mode, False to set it to evaluation mode.

  • scheduler – Learning rate scheduler to run at the start of every epoch.

  • verbose (bool) – Whether to display per-batch statistics while training.

  • kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch and providing it has no effect.

predict(x: ndarray, **kwargs) → ndarray

Perform prediction using the adversarially trained classifier.

Return type

ndarray

Parameters
  • x (ndarray) – Input samples.

  • kwargs – Other parameters to be passed on to the predict function of the classifier.

Returns

Predictions for test set.

predict_zonotopes(cent: ndarray, bound, **kwargs) → Tuple[List[ndarray], List[ndarray]]

Perform zonotope-based prediction using the adversarially trained classifier.

Return type

Tuple

Parameters
  • cent (ndarray) – The datapoint, representing the zonotope center.

  • bound – The perturbation range for the zonotope.
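A zonotope represents an input region as a center plus linear error terms, which affine layers propagate exactly. A small NumPy sketch of the representation behind predict_zonotopes (illustrative only, not ART's implementation):

```python
import numpy as np

def zonotope_affine(cent, terms, W, b):
    """Propagate a zonotope (center + error terms) through an affine layer."""
    # The center maps like an ordinary point; each error term maps through W
    # without the bias, since the bias shifts only the center.
    return W @ cent + b, [W @ t for t in terms]

def concrete_bounds(cent, terms):
    """Interval bounds implied by a zonotope: center +/- sum of |error terms|."""
    radius = np.sum(np.abs(np.stack(terms)), axis=0)
    return cent - radius, cent + radius
```

The concrete bounds are what a certification loss compares against the true label's logit.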

set_forward_mode(mode: str) → None

Helper function to set the forward mode of the model.

Parameters

mode (str) – Either “concrete” or “abstract”, signifying how to run the forward pass.