art.defences.trainer

Module implementing train-based defences against adversarial attacks.

Base Class Trainer

class art.defences.trainer.Trainer(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE)

Abstract base class for training defences.

__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE) None

Create an adversarial training object.

__weakref__

list of weak references to the object (if defined)

property classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE

Access function to get the classifier.

Returns:

The classifier.

abstract fit(x: ndarray, y: ndarray, **kwargs) None

Train the model.

Parameters:
  • x (ndarray) – Training data.

  • y (ndarray) – Labels for the training data.

  • kwargs – Other parameters.

get_classifier() CLASSIFIER_LOSS_GRADIENTS_TYPE

Return the classifier trained via adversarial training.

Returns:

The classifier.

Adversarial Training

class art.defences.trainer.AdversarialTrainer(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, attacks: EvasionAttack | List[EvasionAttack], ratio: float = 0.5)

Class performing adversarial training based on a model architecture and one or multiple attack methods.

Incorporates original adversarial training, ensemble adversarial training (https://arxiv.org/abs/1705.07204), training on all adversarial data, and other common setups. If multiple attacks are specified, they are rotated for each batch. If an attack targets a different model than the one being trained, the adversarial examples it generates are transferred to the trained model. The ratio determines how many of the clean samples in each batch are replaced with their adversarial counterparts.

Warning

Both successful and unsuccessful adversarial samples are used for training. In the case of unbounded attacks (e.g., DeepFool), this can result in invalid (very noisy) samples being included.

Please keep in mind the limitations of defences. While adversarial training is widely regarded as a promising, principled approach to making classifiers more robust (see https://arxiv.org/abs/1802.00420), very careful evaluations are required to assess its effectiveness case by case (see https://arxiv.org/abs/1902.06705).

__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, attacks: EvasionAttack | List[EvasionAttack], ratio: float = 0.5) None

Create an AdversarialTrainer instance.

Parameters:
  • classifier – Model to train adversarially.

  • attacks – Attacks to use for data augmentation in adversarial training.

  • ratio (float) – The proportion of samples in each batch to be replaced with their adversarial counterparts. Setting this value to 1 trains only on adversarial samples.
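
For illustration, a minimal usage sketch follows. The names classifier, x_train, y_train and x_test are placeholders for an already-wrapped ART classifier supporting loss gradients and NumPy arrays; the attack parameters are illustrative, not recommended values.

    from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent
    from art.defences.trainer import AdversarialTrainer

    # Two attacks, rotated batch by batch during training.
    fgsm = FastGradientMethod(estimator=classifier, eps=0.1)
    pgd = ProjectedGradientDescent(estimator=classifier, eps=0.1, eps_step=0.02, max_iter=10)

    # Replace half of the samples in every batch with adversarial counterparts.
    trainer = AdversarialTrainer(classifier, attacks=[fgsm, pgd], ratio=0.5)
    trainer.fit(x_train, y_train, batch_size=128, nb_epochs=20)

    # Predict with the adversarially trained classifier.
    preds = trainer.predict(x_test)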

fit(x: ndarray, y: ndarray, batch_size: int = 128, nb_epochs: int = 20, **kwargs) None

Train a model adversarially. See class documentation for more information on the exact procedure.

Parameters:
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) None

Train a model adversarially using a data generator. See class documentation for more information on the exact procedure.

Parameters:
  • generator – Data generator.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

predict(x: ndarray, **kwargs) ndarray

Perform prediction using the adversarially trained classifier.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input samples.

  • kwargs – Other parameters to be passed on to the predict function of the classifier.

Returns:

Predictions for test set.

Adversarial Training Madry PGD

class art.defences.trainer.AdversarialTrainerMadryPGD(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, nb_epochs: int | None = 205, batch_size: int | None = 128, eps: int | float = 8, eps_step: int | float = 2, max_iter: int = 7, num_random_init: int = 1)

Class performing adversarial training following Madry’s Protocol.

Please keep in mind the limitations of defences. While adversarial training is widely regarded as a promising, principled approach to making classifiers more robust (see https://arxiv.org/abs/1802.00420), very careful evaluations are required to assess its effectiveness case by case (see https://arxiv.org/abs/1902.06705).

__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, nb_epochs: int | None = 205, batch_size: int | None = 128, eps: int | float = 8, eps_step: int | float = 2, max_iter: int = 7, num_random_init: int = 1) None

Create an AdversarialTrainerMadryPGD instance.

Default values are for CIFAR-10 in pixel range 0-255.

Parameters:
  • classifier – Classifier to train adversarially.

  • nb_epochs – Number of training epochs.

  • batch_size – Size of the batch on which adversarial samples are generated.

  • eps – Maximum perturbation that the attacker can introduce.

  • eps_step – Attack step size (input variation) at each iteration.

  • max_iter (int) – The maximum number of iterations.

  • num_random_init (int) – Number of random initialisations within the epsilon ball. For num_random_init=0 the attack starts at the original input.
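
A minimal sketch with the documented CIFAR-10 defaults follows; classifier, x_train and y_train are placeholders for an already-wrapped ART classifier (0-255 pixel range) and NumPy arrays.

    from art.defences.trainer import AdversarialTrainerMadryPGD

    trainer = AdversarialTrainerMadryPGD(
        classifier, nb_epochs=205, batch_size=128, eps=8, eps_step=2, max_iter=7
    )
    trainer.fit(x_train, y_train)

    # Retrieve the adversarially trained classifier.
    robust_classifier = trainer.get_classifier()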

fit(x: ndarray, y: ndarray, validation_data: ndarray | None = None, batch_size: int | None = None, nb_epochs: int | None = None, **kwargs) None

Train a model adversarially. See class documentation for more information on the exact procedure.

Parameters:
  • x (ndarray) – Training data.

  • y (ndarray) – Labels for the training data.

  • validation_data – Validation data.

  • batch_size – Size of batches. Overwrites batch_size defined in __init__ if not None.

  • nb_epochs – Number of epochs to use for training. Overwrites nb_epochs defined in __init__ if not None.

  • kwargs – Dictionary of framework-specific arguments.

get_classifier() CLASSIFIER_LOSS_GRADIENTS_TYPE

Return the classifier trained via adversarial training.

Returns:

The classifier.

Adversarial Training Adversarial Weight Perturbation (AWP) - PyTorch

class art.defences.trainer.AdversarialTrainerAWPPyTorch(classifier: PyTorchClassifier, proxy_classifier: PyTorchClassifier, attack: EvasionAttack, mode: str, gamma: float, beta: float, warmup: int)

Class performing adversarial training following the Adversarial Weight Perturbation (AWP) protocol.

__init__(classifier: PyTorchClassifier, proxy_classifier: PyTorchClassifier, attack: EvasionAttack, mode: str, gamma: float, beta: float, warmup: int)

Create an AdversarialTrainerAWPPyTorch instance.

Parameters:
  • classifier (PyTorchClassifier) – Model to train adversarially.

  • proxy_classifier (PyTorchClassifier) – Model for adversarial weight perturbation.

  • attack (EvasionAttack) – Attack to use for data augmentation in adversarial training.

  • mode (str) – Mode determining the optimization objective of the base adversarial training and the weight perturbation step.

  • gamma (float) – The scaling factor controlling the norm of the weight perturbation relative to the norm of the model parameters.

  • beta (float) – The scaling factor controlling the trade-off between clean loss and adversarial loss for the TRADES protocol.

  • warmup (int) – The number of epochs after which weight perturbation is applied.
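
A minimal sketch of AWP training follows; classifier and proxy_classifier are placeholders for two PyTorchClassifier instances wrapping the same architecture, x_train and y_train are NumPy arrays, and the attack parameters and mode string are illustrative (check the supported mode values in your ART version).

    from art.attacks.evasion import ProjectedGradientDescent
    from art.defences.trainer import AdversarialTrainerAWPPyTorch

    pgd = ProjectedGradientDescent(estimator=classifier, eps=8 / 255, eps_step=2 / 255, max_iter=10)

    trainer = AdversarialTrainerAWPPyTorch(
        classifier,
        proxy_classifier,
        attack=pgd,
        mode="PGD",   # optimization objective of the base adversarial training
        gamma=0.01,   # relative norm of the weight perturbation
        beta=6.0,     # clean/adversarial loss trade-off (used by the TRADES mode)
        warmup=0,     # epochs before weight perturbation is applied
    )
    trainer.fit(x_train, y_train, batch_size=128, nb_epochs=20)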

fit(x: ndarray, y: ndarray, validation_data: Tuple[ndarray, ndarray] | None = None, batch_size: int = 128, nb_epochs: int = 20, scheduler: torch.optim.lr_scheduler._LRScheduler | None = None, **kwargs)

Train a model adversarially with AWP protocol. See class documentation for more information on the exact procedure.

Parameters:
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • scheduler – Learning rate scheduler to run at the end of every epoch.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

fit_generator(generator: DataGenerator, validation_data: Tuple[ndarray, ndarray] | None = None, nb_epochs: int = 20, scheduler: torch.optim.lr_scheduler._LRScheduler | None = None, **kwargs)

Train a model adversarially with AWP protocol using a data generator. See class documentation for more information on the exact procedure.

Parameters:
  • generator (DataGenerator) – Data generator.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • nb_epochs (int) – Number of epochs to use for training.

  • scheduler – Learning rate scheduler to run at the end of every epoch.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

Adversarial Training Oracle Aligned Adversarial Training (OAAT) - PyTorch

class art.defences.trainer.AdversarialTrainerOAATPyTorch(classifier: PyTorchClassifier, proxy_classifier: PyTorchClassifier, lpips_classifier: PyTorchClassifier, list_avg_models: List[PyTorchClassifier], attack: EvasionAttack, train_params: dict)

Class performing adversarial training following the Oracle Aligned Adversarial Training (OAAT) protocol.

__init__(classifier: PyTorchClassifier, proxy_classifier: PyTorchClassifier, lpips_classifier: PyTorchClassifier, list_avg_models: List[PyTorchClassifier], attack: EvasionAttack, train_params: dict)

Create an AdversarialTrainerOAATPyTorch instance.

Parameters:
  • classifier (PyTorchClassifier) – Model to train adversarially.

  • proxy_classifier (PyTorchClassifier) – Model for adversarial weight perturbation.

  • lpips_classifier (PyTorchClassifier) – Weight averaging model for calculating activations.

  • list_avg_models – List of models for weight averaging.

  • attack (EvasionAttack) – Attack to use for data augmentation in adversarial training.

  • train_params (dict) – Dictionary of training parameters for the adversarial training protocol.

static calculate_lpips_distance(p_classifier: PyTorchClassifier, input_1: torch.Tensor, input_2: torch.Tensor, layers: List[str | int]) torch.Tensor

Return the LPIPS distance between input_1 and input_2. layers is a list of either layer indices (between 0 and nb_layers - 1) or layer names. The number of layers can be determined by counting the results returned by calling layer_names.

Parameters:
  • p_classifier (PyTorchClassifier) – model for adversarial training protocol.

  • input_1 – Input for computing the activations.

  • input_2 – Input for computing the activations.

  • layers – Layers for computing the activations.

Returns:

The LPIPS distance, where the first dimension is the batch size corresponding to input_1.

fit(x: ndarray, y: ndarray, validation_data: Tuple[ndarray, ndarray] | None = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)

Train a model adversarially with OAAT protocol. See class documentation for more information on the exact procedure.

Parameters:
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

fit_generator(generator: DataGenerator, validation_data: Tuple[ndarray, ndarray] | None = None, nb_epochs: int = 20, **kwargs)

Train a model adversarially with OAAT protocol using a data generator. See class documentation for more information on the exact procedure.

Parameters:
  • generator (DataGenerator) – Data generator.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

static get_layer_activations(p_classifier: PyTorchClassifier, x: torch.Tensor, layers: List[str | int]) Tuple[Dict[str, torch.Tensor], List[str]]

Return the output of the specified layers for input x. layers is a list of either layer indices (between 0 and nb_layers - 1) or layer names. The number of layers can be determined by counting the results returned by calling layer_names.

Parameters:
  • p_classifier (PyTorchClassifier) – model for adversarial training protocol.

  • x – Input for computing the activations.

  • layers – Layers for computing the activations

Returns:

Tuple containing the output dictionary and a list of layer names. In the dictionary, each element is a layer's output where the first dimension is the batch size corresponding to x.

static normalize_concatenate_activations(activations_dict: Dict[str, torch.Tensor], list_layer_names: List[str]) torch.Tensor

Takes a dictionary activations_dict of activation values at different layers for an input batch and returns a tensor in which all activation values are normalised layer-wise and flattened to a vector for each input of the batch.

Parameters:
  • activations_dict – Dictionary containing the activations at different layers.

  • list_layer_names – Layer names for fetching the activations.

Returns:

The activations after normalisation and flattening, where the first dimension is the batch size.

update_learning_rate(optimizer: torch.optim.optimizer.Optimizer, epoch: int, nb_epochs: int, lr_schedule: str = 'step') None

Adjust the learning rate of the optimizer.

Parameters:
  • optimizer – Optimizer of the classifier.

  • epoch (int) – Current training epoch.

  • nb_epochs (int) – Total number of training epochs.

  • lr_schedule (str) – String denoting the learning rate schedule for the optimizer.

Adversarial Training TRADES - PyTorch

class art.defences.trainer.AdversarialTrainerTRADESPyTorch(classifier: PyTorchClassifier, attack: EvasionAttack, beta: float)

Class performing adversarial training following the TRADES protocol.

__init__(classifier: PyTorchClassifier, attack: EvasionAttack, beta: float)

Create an AdversarialTrainerTRADESPyTorch instance.

Parameters:
  • classifier (PyTorchClassifier) – Model to train adversarially.

  • attack (EvasionAttack) – Attack to use for data augmentation in adversarial training.

  • beta (float) – The scaling factor controlling the trade-off between clean loss and adversarial loss.
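
A minimal sketch follows; classifier is a placeholder for a PyTorchClassifier, x_train and y_train are NumPy arrays, and the attack and beta values are illustrative.

    from art.attacks.evasion import ProjectedGradientDescent
    from art.defences.trainer import AdversarialTrainerTRADESPyTorch

    pgd = ProjectedGradientDescent(estimator=classifier, eps=8 / 255, eps_step=2 / 255, max_iter=10)

    # Larger beta puts more weight on the adversarial (robustness) term.
    trainer = AdversarialTrainerTRADESPyTorch(classifier, attack=pgd, beta=6.0)
    trainer.fit(x_train, y_train, batch_size=128, nb_epochs=20)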

fit(x: ndarray, y: ndarray, validation_data: Tuple[ndarray, ndarray] | None = None, batch_size: int = 128, nb_epochs: int = 20, scheduler: torch.optim.lr_scheduler._LRScheduler | None = None, **kwargs)

Train a model adversarially with TRADES protocol. See class documentation for more information on the exact procedure.

Parameters:
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • scheduler – Learning rate scheduler to run at the end of every epoch.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, scheduler: torch.optim.lr_scheduler._LRScheduler | None = None, **kwargs)

Train a model adversarially with TRADES protocol using a data generator. See class documentation for more information on the exact procedure.

Parameters:
  • generator (DataGenerator) – Data generator.

  • nb_epochs (int) – Number of epochs to use for training.

  • scheduler – Learning rate scheduler to run at the end of every epoch.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

Base Class Adversarial Training Fast is Better than Free

class art.defences.trainer.AdversarialTrainerFBF(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, eps: int | float = 8)

This is the abstract base class for backend-specific implementations of the Fast is Better than Free protocol for adversarial training.

__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, eps: int | float = 8)

Create an AdversarialTrainerFBF instance.

Parameters:
  • classifier – Model to train adversarially.

  • eps – Maximum perturbation that the attacker can introduce.

abstract fit(x: ndarray, y: ndarray, validation_data: Tuple[ndarray, ndarray] | None = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)

Train a model adversarially with FBF. See class documentation for more information on the exact procedure.

Parameters:
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

abstract fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs)

Train a model adversarially using a data generator. See class documentation for more information on the exact procedure.

Parameters:
  • generator – Data generator.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

predict(x: ndarray, **kwargs) ndarray

Perform prediction using the adversarially trained classifier.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input samples.

  • kwargs – Other parameters to be passed on to the predict function of the classifier.

Returns:

Predictions for test set.

Adversarial Training Fast is Better than Free - PyTorch

class art.defences.trainer.AdversarialTrainerFBFPyTorch(classifier: PyTorchClassifier, eps: int | float = 8, use_amp: bool = False)

Class performing adversarial training following the Fast is Better than Free protocol.

The effectiveness of this protocol is sensitive to the use of techniques like data augmentation, gradient clipping and learning rate schedules. Optionally, mixed-precision arithmetic via the apex library can significantly reduce training time, making this one of the fastest adversarial training protocols.

__init__(classifier: PyTorchClassifier, eps: int | float = 8, use_amp: bool = False)

Create an AdversarialTrainerFBFPyTorch instance.

Parameters:
  • classifier – Model to train adversarially.

  • eps – Maximum perturbation that the attacker can introduce.

  • use_amp (bool) – Boolean that decides whether apex should be used for mixed-precision arithmetic during training.
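
A minimal sketch follows; classifier, x_train, y_train and x_test are placeholders for a PyTorchClassifier and NumPy arrays, and the documented default eps=8 is kept.

    from art.defences.trainer import AdversarialTrainerFBFPyTorch

    trainer = AdversarialTrainerFBFPyTorch(classifier, eps=8, use_amp=False)
    trainer.fit(x_train, y_train, batch_size=128, nb_epochs=20)

    preds = trainer.predict(x_test)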

fit(x: ndarray, y: ndarray, validation_data: Tuple[ndarray, ndarray] | None = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)

Train a model adversarially with FBF protocol. See class documentation for more information on the exact procedure.

Parameters:
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs)

Train a model adversarially with FBF protocol using a data generator. See class documentation for more information on the exact procedure.

Parameters:
  • generator – Data generator.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

Adversarial Training Certified - PyTorch

class art.defences.trainer.AdversarialTrainerCertifiedPytorch(classifier: CERTIFIER_TYPE, nb_epochs: int | None = 20, bound: float = 0.1, loss_weighting: float = 0.1, batch_size: int = 10, use_certification_schedule: bool = True, certification_schedule: Any | None = None, augment_with_pgd: bool = True, pgd_params: PGDParamDict | None = None)

Class performing certified adversarial training using zonotope bounds.

__init__(classifier: CERTIFIER_TYPE, nb_epochs: int | None = 20, bound: float = 0.1, loss_weighting: float = 0.1, batch_size: int = 10, use_certification_schedule: bool = True, certification_schedule: Any | None = None, augment_with_pgd: bool = True, pgd_params: PGDParamDict | None = None) None

Create an AdversarialTrainerCertifiedPytorch instance.

Default values are for MNIST in pixel range 0-1.

Parameters:
  • classifier – Classifier to train adversarially.

  • pgd_params

    A dictionary containing the specific parameters relating to regular PGD training. If not provided, typical MNIST values are used as defaults. Otherwise, it must contain the following keys:

    • eps: Maximum perturbation that the attacker can introduce.

    • eps_step: Attack step size (input variation) at each iteration.

    • max_iter: The maximum number of iterations.

    • batch_size: Size of the batch on which adversarial samples are generated.

    • num_random_init: Number of random initialisations within the epsilon ball.

  • bound (float) – The perturbation range for the zonotope. Will be ignored if a certification_schedule is used.

  • loss_weighting (float) – Weighting factor for the certified loss.

  • nb_epochs – Number of training epochs.

  • use_certification_schedule (bool) – Whether to use a training schedule for the certification radius.

  • certification_schedule – Schedule for gradually increasing the certification radius. Empirical studies have shown that this is often required to achieve best performance. Either True to use the default linear scheduler, or a class with a .step() method that returns the updated bound every epoch.

  • batch_size (int) – Size of batches to use for certified training. Note that the data is processed sequentially, accumulating gradients over the batch.
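
A minimal sketch follows; certified_classifier is a placeholder for a classifier of the supported certified type (CERTIFIER_TYPE), and the pgd_params values are illustrative, using the keys listed above.

    from art.defences.trainer import AdversarialTrainerCertifiedPytorch

    pgd_params = {
        "eps": 0.3,
        "eps_step": 0.05,
        "max_iter": 10,
        "batch_size": 128,
        "num_random_init": 1,
    }

    trainer = AdversarialTrainerCertifiedPytorch(
        certified_classifier,
        nb_epochs=20,
        bound=0.1,
        loss_weighting=0.1,
        batch_size=10,
        augment_with_pgd=True,
        pgd_params=pgd_params,
    )
    trainer.fit(x_train, y_train, certification_loss="interval_loss_cce")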

fit(x: ndarray, y: ndarray, certification_loss: Any = 'interval_loss_cce', batch_size: int | None = None, nb_epochs: int | None = None, training_mode: bool = True, scheduler: Any | None = None, verbose: bool = True, **kwargs) None

Fit the classifier on the training set (x, y).

Parameters:
  • x (ndarray) – Training data.

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or index labels of shape (nb_samples,).

  • certification_loss – Which certification loss function to use. Either “interval_loss_cce” or “max_logit_loss”. By default, interval_loss_cce is used. Alternatively, a user can supply their own loss function which takes as input the zonotope predictions of the form () and labels of the form () and returns a scalar loss.

  • batch_size – Size of batches to use for certified training. Note that the data is processed sequentially, accumulating gradients over the batch.

  • nb_epochs – Number of epochs to use for training.

  • training_mode (bool) – True to set the model to training mode, False to set it to evaluation mode.

  • scheduler – Learning rate scheduler to run at the start of every epoch.

  • verbose (bool) – Whether to display per-batch statistics while training.

  • kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch and providing it has no effect.

predict(x: ndarray, **kwargs) ndarray

Perform prediction using the adversarially trained classifier.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input samples.

  • kwargs – Other parameters to be passed on to the predict function of the classifier.

Returns:

Predictions for test set.

predict_zonotopes(cent: ndarray, bound, **kwargs) Tuple[List[ndarray], List[ndarray]]

Perform prediction with the adversarially trained classifier using zonotopes.

Parameters:
  • cent (ndarray) – The datapoint, representing the zonotope center.

  • bound – The perturbation range for the zonotope.

set_forward_mode(mode: str) None

Helper function to set the forward mode of the model.

Parameters:

mode (str) – Either concrete or abstract, signifying how to run the forward pass.

Adversarial Training Certified Interval Bound Propagation - PyTorch

class art.defences.trainer.AdversarialTrainerCertifiedIBPPyTorch(classifier: IBP_CERTIFIER_TYPE, nb_epochs: int | None = 20, bound: float = 0.1, batch_size: int = 32, loss_weighting: int | None = None, use_certification_schedule: bool = True, certification_schedule: Any | None = None, use_loss_weighting_schedule: bool = True, loss_weighting_schedule: Any | None = None, augment_with_pgd: bool = False, pgd_params: PGDParamDict | None = None)

Class performing certified adversarial training using interval bound propagation (IBP).

__init__(classifier: IBP_CERTIFIER_TYPE, nb_epochs: int | None = 20, bound: float = 0.1, batch_size: int = 32, loss_weighting: int | None = None, use_certification_schedule: bool = True, certification_schedule: Any | None = None, use_loss_weighting_schedule: bool = True, loss_weighting_schedule: Any | None = None, augment_with_pgd: bool = False, pgd_params: PGDParamDict | None = None) None

Create an AdversarialTrainerCertifiedIBPPyTorch instance.

Default values are for MNIST in pixel range 0-1.

Parameters:
  • classifier – Classifier to train adversarially.

  • pgd_params

    A dictionary containing the specific parameters relating to regular PGD training. If not provided, typical MNIST values are used as defaults. Otherwise, it must contain the following keys:

    • eps: Maximum perturbation that the attacker can introduce.

    • eps_step: Attack step size (input variation) at each iteration.

    • max_iter: The maximum number of iterations.

    • batch_size: Size of the batch on which adversarial samples are generated.

    • num_random_init: Number of random initialisations within the epsilon ball.

  • loss_weighting – Weighting factor for the certified loss.

  • bound (float) – The perturbation range for the interval. If the default certification schedule is used will be the upper limit.

  • nb_epochs – Number of training epochs.

  • use_certification_schedule (bool) – Whether to use a training schedule for the certification radius.

  • certification_schedule – Schedule for gradually increasing the certification radius. Empirical studies have shown that this is often required to achieve best performance. Either True to use the default linear scheduler, or a class with a .step() method that returns the updated bound every epoch.

  • batch_size (int) – Size of batches to use for certified training.
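
A minimal sketch follows; ibp_classifier is a placeholder for a classifier of the supported IBP-certifiable type (IBP_CERTIFIER_TYPE), and x_train, y_train are NumPy arrays.

    from art.defences.trainer import AdversarialTrainerCertifiedIBPPyTorch

    trainer = AdversarialTrainerCertifiedIBPPyTorch(
        ibp_classifier,
        nb_epochs=20,
        bound=0.1,
        batch_size=32,
        use_certification_schedule=True,
    )
    # limits give the valid input range used to clip the interval abstraction.
    trainer.fit(x_train, y_train, limits=[0.0, 1.0], certification_loss="interval_loss_cce")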

fit(x: ndarray, y: ndarray, limits: List[float] | ndarray | None = None, certification_loss: Any = 'interval_loss_cce', batch_size: int | None = None, nb_epochs: int | None = None, training_mode: bool = True, scheduler: Any | None = None, verbose: bool = True, **kwargs) None

Fit the classifier on the training set (x, y).

Parameters:
  • x (ndarray) – Training data.

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or index labels of shape (nb_samples,).

  • limits – Max and min limits on the inputs, limits[0] being the lower bounds and limits[1] being upper bounds. Passing None will mean no clipping is applied to the interval abstraction. Typical images will have limits of [0.0, 1.0] after normalization.

  • certification_loss – Which certification loss function to use. Either “interval_loss_cce” or “max_logit_loss”. By default, interval_loss_cce is used. Alternatively, a user can supply their own loss function which takes as input the interval predictions of the form () and labels of the form () and returns a scalar loss.

  • batch_size – Size of batches to use for certified training. Note that the data is processed sequentially, accumulating gradients over the batch.

  • nb_epochs – Number of epochs to use for training.

  • training_mode (bool) – True to set the model to training mode, False to set it to evaluation mode.

  • scheduler – Learning rate scheduler to run at the start of every epoch.

  • verbose (bool) – Whether to display per-batch statistics while training.

  • kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch and providing it has no effect.

static initialise_default_scheduler(initial_val: float, final_val: float, epochs: int) DefaultLinearScheduler

Create a linear scheduler based on default example values.

Return type:

DefaultLinearScheduler

Parameters:
  • initial_val (float) – Initial value to begin the scheduler from.

  • final_val (float) – Final value to end the scheduler at.

  • epochs (int) – Total number of epochs.

Returns:

A linear scheduler initialised with default example values.
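
For example, a schedule that grows the certification bound linearly from 0.0 to 0.1 over 20 epochs could be created as follows (a sketch based on the signature above):

    from art.defences.trainer import AdversarialTrainerCertifiedIBPPyTorch

    scheduler = AdversarialTrainerCertifiedIBPPyTorch.initialise_default_scheduler(
        initial_val=0.0, final_val=0.1, epochs=20
    )
    bound = scheduler.step()  # returns the updated bound for the next epoch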

predict(x: ndarray, **kwargs) ndarray

Perform prediction using the adversarially trained classifier.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input samples.

  • kwargs – Other parameters to be passed on to the predict function of the classifier.

Returns:

Predictions for test set.

predict_intervals(x: ndarray, is_interval: bool = False, bounds: float | List[float] | ndarray | None = None, limits: List[float] | ndarray | None = None, batch_size: int = 128, **kwargs) ndarray

Perform prediction with the adversarially trained classifier using interval bounds.

Return type:

ndarray

Parameters:
  • x (ndarray) –

    The datapoint, either:

    1. In the interval format of x[batch_size, 2, feature_1, feature_2, …] where axis=1 corresponds to the [lower, upper] bounds.

    2. Or in regular concrete form, in which case the bounds/limits need to be supplied.

  • is_interval (bool) – Whether the datapoint is already in the interval format.

  • bounds – The perturbation range.

  • limits – The clipping to apply to the interval data.

  • batch_size (int) – Batch size to use when looping through the data.
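
A hedged sketch using the concrete input format; trainer is an AdversarialTrainerCertifiedIBPPyTorch instance (see above), x_test is a placeholder NumPy array of concrete inputs, and the bound of 0.05 is illustrative.

    # Concrete inputs: supply a perturbation bound and clipping limits.
    interval_preds = trainer.predict_intervals(
        x_test, is_interval=False, bounds=0.05, limits=[0.0, 1.0], batch_size=128
    )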

set_forward_mode(mode: str) None

Helper function to set the forward mode of the model.

Parameters:

mode (str) – Either concrete or abstract, signifying how to run the forward pass.

DP-InstaHide Training

class art.defences.trainer.DPInstaHideTrainer(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, augmentations: Preprocessor | List[Preprocessor], noise: Literal['gaussian', 'laplacian', 'exponential'] = 'laplacian', loc: int | float = 0.0, scale: int | float = 0.03, clip_values: CLIP_VALUES_TYPE = (0.0, 1.0))

Class performing adversarial training following the DP-InstaHide protocol.

Uses data augmentation methods in conjunction with some type of additive noise.

__init__(classifier: CLASSIFIER_LOSS_GRADIENTS_TYPE, augmentations: Preprocessor | List[Preprocessor], noise: Literal['gaussian', 'laplacian', 'exponential'] = 'laplacian', loc: int | float = 0.0, scale: int | float = 0.03, clip_values: CLIP_VALUES_TYPE = (0.0, 1.0))

Create a DPInstaHideTrainer instance.

Parameters:
  • classifier – The model to train using the protocol.

  • augmentations – The preprocessing data augmentation defence(s) to be applied.

  • noise – The type of additive noise to use: ‘gaussian’ | ‘laplacian’ | ‘exponential’.

  • loc – The location or mean parameter of the distribution to sample.

  • scale – The scale or standard deviation parameter of the distribution to sample.

  • clip_values – Tuple of the form (min, max) representing the minimum and maximum values allowed for features.
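
A minimal sketch follows; classifier, x_train and y_train are placeholders, and the Mixup preprocessor from art.defences.preprocessor is assumed to be available as the augmentation defence, used here with an illustrative num_classes value.

    from art.defences.preprocessor import Mixup
    from art.defences.trainer import DPInstaHideTrainer

    # Mixup augmentation combined with additive Laplacian noise (DP-InstaHide).
    mixup = Mixup(num_classes=10)

    trainer = DPInstaHideTrainer(
        classifier,
        augmentations=[mixup],
        noise="laplacian",
        loc=0.0,
        scale=0.03,
        clip_values=(0.0, 1.0),
    )
    trainer.fit(x_train, y_train, batch_size=128, nb_epochs=20)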

fit(x: ndarray, y: ndarray, validation_data: Tuple[ndarray, ndarray] | None = None, batch_size: int = 128, nb_epochs: int = 20, **kwargs)

Train a model adversarially with the DP-InstaHide protocol. See class documentation for more information on the exact procedure.

Parameters:
  • x (ndarray) – Training set.

  • y (ndarray) – Labels for the training set.

  • validation_data – Tuple consisting of validation data, (x_val, y_val)

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs)

Train a model adversarially with the DP-InstaHide protocol using a data generator. See class documentation for more information on the exact procedure.

Parameters:
  • generator – Data generator.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These will be passed as such to the fit function of the target classifier.

predict(x: ndarray, **kwargs) ndarray

Perform prediction using the adversarially trained classifier.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input samples.

  • kwargs – Other parameters to be passed on to the predict function of the classifier.

Returns:

Predictions for test set.