art.estimators.poison_mitigation
This module implements all poison mitigation models in ART.
Keras Neural Cleanse Classifier
- class art.estimators.poison_mitigation.KerasNeuralCleanse(model: keras.models.Model | tf.keras.models.Model, use_logits: bool = False, channels_first: bool = False, clip_values: CLIP_VALUES_TYPE | None = None, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = (0.0, 1.0), input_layer: int = 0, output_layer: int = 0, steps: int = 1000, init_cost: float = 0.001, norm: int | float = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32)
Implementation of methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).
- __init__(model: keras.models.Model | tf.keras.models.Model, use_logits: bool = False, channels_first: bool = False, clip_values: CLIP_VALUES_TYPE | None = None, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = (0.0, 1.0), input_layer: int = 0, output_layer: int = 0, steps: int = 1000, init_cost: float = 0.001, norm: int | float = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32)
Create a Neural Cleanse classifier.
- Parameters:
model – Keras model, neural network or other.
use_logits (bool) – True if the outputs of the model are logits; False for probabilities or any other type of output. Logits output should be favored when possible to ensure attack efficiency.
channels_first (bool) – Set channels first or last.
clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.
preprocessing_defences – Preprocessing defence(s) to be applied by the classifier.
postprocessing_defences – Postprocessing defence(s) to be applied by the classifier.
preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.
input_layer (int) – The index of the layer to consider as input for models with multiple input layers. The layer with this index will be considered for computing gradients. For models with only one input layer this value is not required.
output_layer (int) – Which layer to consider as the output when the model has multiple output layers. The layer with this index will be considered for computing gradients. For models with only one output layer this value is not required.
steps (int) – The maximum number of steps to run the Neural Cleanse optimization.
init_cost (float) – The initial value for the cost tensor in the Neural Cleanse optimization.
norm – The norm to use for the Neural Cleanse optimization; can be 1, 2, or np.inf.
learning_rate (float) – The learning rate for the Neural Cleanse optimization.
attack_success_threshold (float) – The threshold at which the generated backdoor is successful enough to stop the Neural Cleanse optimization.
patience (int) – How long to wait before changing the cost multiplier in the Neural Cleanse optimization.
early_stop (bool) – Whether or not to allow early stopping in the Neural Cleanse optimization.
early_stop_threshold (float) – How close values need to come to the max value to start counting toward early stopping.
early_stop_patience (int) – How long to wait to determine early stopping in the Neural Cleanse optimization.
cost_multiplier (float) – How much to change the cost in the Neural Cleanse optimization.
batch_size (int) – The batch size for optimizations in the Neural Cleanse optimization.
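Neural Cleanse reverse-engineers a trigger for each label by minimizing the classification loss toward that label plus a cost-weighted norm of the trigger mask (Wang et al. 2019). The parameters init_cost, attack_success_threshold, patience and cost_multiplier control how that cost weight evolves during the optimization. The following is a simplified, hypothetical schedule and not ART's actual code. It assumes the cost is multiplied by cost_multiplier after patience consecutive steps at or above the success threshold, and divided by it after patience consecutive steps below. The name cost_schedule is illustrative.

```python
from typing import List


def cost_schedule(
    success_rates: List[float],
    init_cost: float = 0.001,
    attack_success_threshold: float = 0.99,
    patience: int = 5,
    cost_multiplier: float = 1.5,
) -> List[float]:
    """Return the cost in effect after each step, given per-step attack success rates.

    Hypothetical, simplified schedule: the same factor is used to raise and lower the cost.
    """
    cost = init_cost
    above, below = 0, 0  # consecutive steps at/above and below the threshold
    history = []
    for rate in success_rates:
        if rate >= attack_success_threshold:
            above, below = above + 1, 0
        else:
            above, below = 0, below + 1
        if above >= patience:
            # Trigger already succeeds: penalize the mask norm more strongly.
            cost *= cost_multiplier
            above = 0
        elif below >= patience:
            # Trigger fails too often: relax the mask-norm penalty.
            cost /= cost_multiplier
            below = 0
        history.append(cost)
    return history
```

When the cost is raised while the trigger already works, the optimization is pushed toward a smaller mask. When the cost is lowered while the trigger fails, the classification term gets more weight.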
- abstain() → ndarray
Abstain from a prediction.
- Returns:
A numpy array of zeros.
- backdoor_examples(x_val: ndarray, y_val: ndarray) → Tuple[ndarray, ndarray, ndarray]
Generate reverse-engineered backdoored examples using validation data.
- Parameters:
x_val (ndarray) – Validation data.
y_val (ndarray) – Validation labels.
- Returns:
A tuple containing (clean data, backdoored data, labels).
- property channels_first: bool
- Returns:
Boolean to indicate index of the color channels in the sample x.
- check_backdoor_effective(backdoor_data: ndarray, backdoor_labels: ndarray) → bool
Check if supposed backdoors are effective against the classifier.
- Return type:
bool
- Parameters:
backdoor_data (ndarray) – Data with the backdoor added.
backdoor_labels (ndarray) – The correct labels for the data.
- Returns:
True if any of the backdoors are effective on the model.
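One way to read "effective" here is as follows. The backdoored data still carries its correct (clean) labels, so a backdoor is effective when the classifier's prediction on at least one backdoored sample differs from that label. The following encodes this reading on per-class score rows, like those returned by predict. The helper names are hypothetical, and ART's own check may differ in detail, for example in how labels are encoded.

```python
from typing import Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one on ties, like numpy.argmax)."""
    return max(range(len(scores)), key=scores.__getitem__)


def is_backdoor_effective(
    predictions: Sequence[Sequence[float]], correct_labels: Sequence[int]
) -> bool:
    """Return True if any backdoored sample is not classified as its correct label."""
    if len(predictions) != len(correct_labels):
        raise ValueError("predictions and correct_labels must have the same length")
    predicted = [argmax(row) for row in predictions]
    return any(p != c for p, c in zip(predicted, correct_labels))
```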
- class_gradient(x: ndarray, label: int | List[int] | ndarray | None = None, training_mode: bool = False, **kwargs) → ndarray
Compute per-class derivatives of the original classifier w.r.t. x.
- Return type:
ndarray
- Parameters:
x (ndarray) – Sample input with shape as expected by the model.
label – Index of a specific per-class derivative. If an integer is provided, the gradient of that class output is computed for all samples. If multiple values are provided, the first dimension should match the batch size of x, and each value will be used as target for its corresponding sample in x. If None, then gradients for all classes will be computed for each sample.
training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.
- Returns:
Array of gradients of input features w.r.t. each class in the form (batch_size, nb_classes, input_shape) when computing for all classes, otherwise shape becomes (batch_size, 1, input_shape) when label parameter is specified.
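The shape rule in the Returns entry can be captured in a small helper. This is a hypothetical function and not part of ART. It only computes the expected output shape for each form of the label argument, with labels given as class indices.

```python
from typing import Sequence, Tuple, Union

Label = Union[int, Sequence[int], None]


def class_gradient_shape(
    batch_size: int, nb_classes: int, input_shape: Tuple[int, ...], label: Label = None
) -> Tuple[int, ...]:
    """Expected shape of the class_gradient output for a given label argument."""
    if label is None:
        # Gradients of every class output for every sample.
        return (batch_size, nb_classes) + tuple(input_shape)
    labels = [label] if isinstance(label, int) else list(label)
    if not isinstance(label, int) and len(labels) != batch_size:
        raise ValueError("one label per sample is required")
    if any(not 0 <= lab < nb_classes for lab in labels):
        raise ValueError("label out of range")
    # A single class gradient per sample.
    return (batch_size, 1) + tuple(input_shape)
```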
- property clip_values: CLIP_VALUES_TYPE | None
Return the clip values of the input samples.
- Returns:
Clip values (min, max).
- clone_for_refitting() → KerasClassifier
Create a copy of the classifier that can be refit from scratch. The copy inherits the same architecture, optimizer and initialization as the cloned model, but not its weights.
- Returns:
new classifier
- compute_loss(x: ndarray, y: ndarray, reduction: str = 'none', **kwargs) → ndarray
Compute the loss of the neural network for samples x.
- Parameters:
x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).
y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).
reduction (str) – Specifies the reduction to apply to the output: ‘none’ | ‘mean’ | ‘sum’. ‘none’: no reduction will be applied; ‘mean’: the sum of the output will be divided by the number of elements in the output; ‘sum’: the output will be summed.
- Returns:
Loss values.
- Return type:
Format as expected by the model
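The following illustrates the three reduction modes. It assumes a categorical cross-entropy loss on probability outputs. The loss ART actually computes depends on how the wrapped Keras model was compiled. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Union


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Per-sample categorical cross-entropy from probabilities and class indices."""
    return [-math.log(row[label]) for row, label in zip(probs, labels)]


def reduce_loss(losses: Sequence[float], reduction: str = "none") -> Union[List[float], float]:
    """Apply the 'none' | 'mean' | 'sum' reduction described for compute_loss."""
    if reduction == "none":
        return list(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

For example, probabilities [[0.5, 0.5], [0.25, 0.75]] with labels [0, 1] give per-sample losses ln 2 and ln(4/3). Their sum is ln(8/3).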
- compute_loss_from_predictions(pred: ndarray, y: ndarray, **kwargs) → ndarray
Compute the loss of the estimator for predictions pred.
- Return type:
ndarray
- Parameters:
pred (ndarray) – Model predictions.
y (ndarray) – Target values.
- Returns:
Loss values.
- custom_loss_gradient(nn_function, tensors, input_values, name='default')
Returns the gradient of the nn_function with respect to model input.
- Parameters:
nn_function (a Keras tensor) – an intermediate tensor representation of the function to differentiate
tensors (list) – the tensors or variables to differentiate with respect to
input_values (list) – the inputs to evaluate the gradient
name (str) – The name of the function. Functions of the same name are cached.
- Returns:
the gradient of the function w.r.t. the given tensors
- Return type:
np.ndarray
- fit(*args, **kwargs)
Fit the classifier on the training set (x, y).
- Parameters:
x – Training data.
y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or index labels of shape (nb_samples,).
batch_size – Size of batches.
nb_epochs – Number of epochs to use for training.
verbose – Display training progress bar.
kwargs – Dictionary of framework-specific arguments. These should be parameters supported by the fit_generator function in Keras and will be passed to this function as such. Including the number of epochs or the number of steps per epoch as part of this argument will result in an error.
- fit_generator(generator: DataGenerator, nb_epochs: int = 20, verbose: bool = False, **kwargs) → None
Fit the classifier using the generator that yields batches as specified.
- Parameters:
generator – Batch generator providing (x, y) for each epoch. If the generator can be used for native training in Keras, it will be.
nb_epochs (int) – Number of epochs to use for training.
verbose (bool) – Display training progress bar.
kwargs – Dictionary of framework-specific arguments. These should be parameters supported by the fit_generator function in Keras and will be passed to this function as such. Including the number of epochs as part of this argument will result in an error.
- generate_backdoor(x_val: ndarray, y_val: ndarray, y_target: ndarray) → Tuple[ndarray, ndarray]
Generates a possible backdoor for the model.
- Returns:
A tuple of the pattern and mask for the model.
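In Wang et al. (2019), a trigger consists of a mask m with values in [0, 1] and a pattern Δ with the same shape as the input. Applying the trigger gives x' = (1 - m)·x + m·Δ. The following applies a mask and pattern to one flattened input. It assumes the mask has already been broadcast to the input's shape (in the paper the mask is shared across color channels). The name apply_trigger is hypothetical and not an ART function.

```python
from typing import List, Sequence


def apply_trigger(
    x: Sequence[float], mask: Sequence[float], pattern: Sequence[float]
) -> List[float]:
    """Stamp a reverse-engineered trigger onto one flattened input.

    Each feature becomes (1 - m) * x + m * pattern, so m = 0 keeps the
    original value and m = 1 replaces it with the pattern value.
    """
    if not len(x) == len(mask) == len(pattern):
        raise ValueError("x, mask and pattern must have the same length")
    if any(not 0.0 <= m <= 1.0 for m in mask):
        raise ValueError("mask values must lie in [0, 1]")
    return [(1.0 - m) * xi + m * p for xi, m, p in zip(x, mask, pattern)]
```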
- get_activations(x: ndarray, layer: int | str, batch_size: int = 128, framework: bool = False) → ndarray
Return the output of the specified layer for input x. layer is specified by layer index (between 0 and nb_layers - 1) or by name. The number of layers can be determined by counting the results returned by calling layer_names.
- Return type:
ndarray
- Parameters:
x (ndarray) – Input for computing the activations.
layer – Layer for computing the activations.
batch_size (int) – Size of batches.
framework (bool) – If true, return the intermediate tensor representation of the activation.
- Returns:
The output of layer, where the first dimension is the batch size corresponding to x.
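Because layer accepts either an index between 0 and nb_layers - 1 or a name, a caller has to map both forms onto the same index. The following hypothetical helper does that lookup against a list such as the one returned by layer_names. The warning attached to layer_names about inferred ordering applies. This is not ART's internal code.

```python
from typing import Sequence, Union


def resolve_layer(layer: Union[int, str], layer_names: Sequence[str]) -> int:
    """Map a layer given by index or by name to its index in layer_names."""
    if isinstance(layer, str):
        if layer not in layer_names:
            raise ValueError(f"layer name {layer!r} not found")
        return list(layer_names).index(layer)
    if isinstance(layer, int):
        if not 0 <= layer < len(layer_names):
            raise ValueError(f"layer index {layer} outside 0..{len(layer_names) - 1}")
        return layer
    raise TypeError("layer must be an int or a str")
```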
- get_params() → Dict[str, Any]
Get all parameters and their values of this estimator.
- Returns:
A dictionary of string parameter names to their value.
- property input_layer: int
The index of the layer considered as input for models with multiple input layers. For models with only one input layer the index is 0.
- Returns:
The index of the layer considered as input for models with multiple input layers.
- property input_shape: Tuple[int, ...]
Return the shape of one input sample.
- Returns:
Shape of one input sample.
- property layer_names: List[str] | None
Return the names of the hidden layers in the model, if applicable.
- Returns:
The names of the hidden layers in the model; input and output layers are ignored.
Warning
layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.
- loss_gradient(x: ndarray, y: ndarray, training_mode: bool = False, **kwargs) → ndarray
Compute the gradient of the loss function w.r.t. x.
- Return type:
ndarray
- Parameters:
x (ndarray) – Sample input with shape as expected by the model.
y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).
training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.
- Returns:
Array of gradients of the same shape as x.
- mitigate(x_val: ndarray, y_val: ndarray, mitigation_types: List[str]) → None
Mitigates the effect of poison on a classifier.
- Parameters:
x_val (ndarray) – Validation data to use to mitigate the effect of poison.
y_val (ndarray) – Validation labels to use to mitigate the effect of poison.
mitigation_types – The types of mitigation method; can include ‘unlearning’, ‘pruning’, or ‘filtering’.
- Returns:
None.
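For the 'pruning' mitigation type, Wang et al. (2019) rank neurons in an inner layer by how much more strongly they activate on backdoored inputs than on clean ones, and prune them in that order. The following computes only that ranking from precomputed activations. Such activations could, for example, be obtained with get_activations on the clean and backdoored outputs of backdoor_examples. It assumes per-neuron mean activations are the ranking criterion. Whether ART ranks neurons exactly this way, and which layer it uses, is not stated here. The function names are hypothetical.

```python
from typing import List, Sequence


def mean_per_neuron(activations: Sequence[Sequence[float]]) -> List[float]:
    """Average each neuron's activation over a batch of samples."""
    nb_neurons = len(activations[0])
    return [sum(sample[j] for sample in activations) / len(activations) for j in range(nb_neurons)]


def pruning_order(
    clean_activations: Sequence[Sequence[float]],
    backdoor_activations: Sequence[Sequence[float]],
) -> List[int]:
    """Neuron indices sorted by how much more they fire on backdoored inputs."""
    clean = mean_per_neuron(clean_activations)
    backdoor = mean_per_neuron(backdoor_activations)
    diffs = [b - c for b, c in zip(backdoor, clean)]
    # Largest increase first; these neurons would be pruned first.
    return sorted(range(len(diffs)), key=lambda j: diffs[j], reverse=True)
```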
- property model
Return the model.
- Returns:
The model.
- property nb_classes: int
Return the number of output classes.
- Returns:
Number of classes in the data.
- outlier_detection(x_val: ndarray, y_val: ndarray) → List[Tuple[int, ndarray, ndarray]]
Return the suspected poison labels together with their mask and pattern.
- Returns:
A list of tuples containing the class index, mask, and pattern for each suspected label.
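The outlier test in Wang et al. (2019) compares the norms of the reverse-engineered masks across labels using the median absolute deviation (MAD). The anomaly index of a label is |norm - median| / (1.4826 · MAD). A label whose mask norm is below the median and whose anomaly index exceeds 2 is flagged as a likely backdoor target. The following implements that test on a list of precomputed mask norms, one per label. It does not reproduce ART's code, and the helper names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence

CONSISTENCY = 1.4826  # scales the MAD to match the standard deviation for normal data


def anomaly_indices(norms: Sequence[float]) -> List[float]:
    """Anomaly index of each label's mask norm: |norm - median| / (1.4826 * MAD)."""
    median = statistics.median(norms)
    mad = statistics.median(abs(n - median) for n in norms)
    if mad == 0:
        # Degenerate case: any value off the median is infinitely anomalous.
        return [0.0 if n == median else math.inf for n in norms]
    return [abs(n - median) / (CONSISTENCY * mad) for n in norms]


def suspected_labels(norms: Sequence[float], threshold: float = 2.0) -> List[int]:
    """Labels whose mask norm is below the median with an anomaly index above threshold."""
    median = statistics.median(norms)
    return [
        label
        for label, (norm, index) in enumerate(zip(norms, anomaly_indices(norms)))
        if norm < median and index > threshold
    ]
```

With mask norms [100, 98, 102, 101, 99, 20], only label 5 is flagged. Its tiny mask makes it an outlier on the small side.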
- property output_layer: int
The index of the layer considered as output for models with multiple output layers. For models with only one output layer the index is 0.
- Returns:
The index of the layer considered as output for models with multiple output layers.
- predict(*args, **kwargs)
Perform prediction of the given classifier for a batch of inputs, potentially filtering suspicious input.
- Parameters:
x – Input data to predict.
batch_size – Batch size.
training_mode – True for model set to training mode and False for model set to evaluation mode.
- Returns:
Array of predictions of shape (nb_inputs, nb_classes).
- reset()
Reset the state of the defense.
- save(filename: str, path: str | None = None) → None
Save a model to file in the format specific to the backend framework. For Keras, .h5 format is used.
- Parameters:
filename (str) – Name of the file where to store the model.
path – Path of the folder where to store the model. If no path is specified, the model will be stored in the default data location of the library ART_DATA_PATH.
- set_params(**kwargs) → None
Take a dictionary of parameters and apply checks before setting them as attributes.
- Parameters:
kwargs – A dictionary of attributes.
- property use_logits: bool
A boolean representing whether the outputs of the model are logits.
- Returns:
A boolean representing whether the outputs of the model are logits.
Mixin Base Class STRIP
- class art.estimators.poison_mitigation.STRIPMixin(predict_fn: Callable[[ndarray], ndarray], num_samples: int = 20, false_acceptance_rate: float = 0.01, **kwargs)
Implementation of STRIP: A Defence Against Trojan Attacks on Deep Neural Networks (Gao et al. 2020).
Paper link: https://arxiv.org/abs/1902.06531
- __init__(predict_fn: Callable[[ndarray], ndarray], num_samples: int = 20, false_acceptance_rate: float = 0.01, **kwargs) → None
Create a STRIP defense.
- Parameters:
predict_fn – The predict function of the original classifier.
num_samples (int) – The number of samples to use to test entropy at inference time.
false_acceptance_rate (float) – The acceptable false acceptance rate, given as a fraction (for example 0.01).
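STRIP (Gao et al.) superimposes an input onto num_samples randomly chosen clean samples and measures the entropy of the classifier's predictions on the blended inputs. A trojaned input tends to keep predicting the attacker's target class, so its average entropy stays low. The following computes that average entropy for one flattened input. The blending rule, a convex combination with weight alpha, is an assumption made for illustration; the STRIP paper and ART may superimpose inputs differently. The function names and the predict_fn signature are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

PredictFn = Callable[[List[float]], Sequence[float]]


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in bits of one probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def strip_entropy(
    x: Sequence[float],
    clean_pool: Sequence[Sequence[float]],
    predict_fn: PredictFn,
    num_samples: int = 20,
    alpha: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Average prediction entropy of x superimposed on num_samples clean inputs."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(num_samples):
        other = rng.choice(clean_pool)
        # Assumed superimposition: a convex blend of the two inputs.
        blended = [alpha * a + (1.0 - alpha) * b for a, b in zip(x, other)]
        total += shannon_entropy(predict_fn(blended))
    return total / num_samples
```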
- abstain() → ndarray
Abstain from a prediction.
- Returns:
A numpy array of zeros.
- mitigate(x_val: ndarray) → None
Mitigates the effect of poison on a classifier.
- Parameters:
x_val (ndarray) – Validation data to use to mitigate the effect of poison.
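The false_acceptance_rate can be turned into an entropy threshold in the spirit of the STRIP paper. The idea is to fit a normal distribution to the entropies of clean validation inputs and take its quantile at the given rate. The following does this with the standard library. It assumes the clean entropies have already been computed and that a population standard deviation is used for the fit. This reflects the general approach, not necessarily ART's exact fitting procedure. The function name is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev
from typing import Sequence


def entropy_threshold(
    clean_entropies: Sequence[float], false_acceptance_rate: float = 0.01
) -> float:
    """Entropy below which an input is treated as trojaned.

    A normal distribution is fitted to the entropies of clean validation inputs,
    and the threshold is its false_acceptance_rate quantile.
    """
    dist = NormalDist(fmean(clean_entropies), pstdev(clean_entropies))
    return dist.inv_cdf(false_acceptance_rate)
```

A smaller false_acceptance_rate moves the threshold further into the low-entropy tail. Fewer clean inputs are then rejected, at the cost of letting more trojaned inputs through.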
- property nb_classes: int
Return the number of output classes.
- Returns:
Number of classes in the data.
- predict(*args, **kwargs)
Perform prediction of the given classifier for a batch of inputs, potentially filtering suspicious input.
- Parameters:
x – Input samples.
- Returns:
Array of predictions of shape (nb_inputs, nb_classes).
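At prediction time, filtering suspicious input amounts to abstaining for inputs whose STRIP entropy falls below the calibrated threshold. Abstaining means returning an all-zero row, as abstain() does. The following shows that last step on precomputed predictions and entropies. The helper name is hypothetical, and the strict "below threshold" comparison is an assumption.

```python
from typing import List, Sequence


def filter_predictions(
    predictions: Sequence[Sequence[float]],
    entropies: Sequence[float],
    threshold: float,
) -> List[List[float]]:
    """Replace the prediction of every low-entropy (suspected trojaned) input by zeros."""
    if len(predictions) != len(entropies):
        raise ValueError("one entropy value per prediction is required")
    return [
        [0.0] * len(row) if entropy < threshold else list(row)  # abstain: all-zero row
        for row, entropy in zip(predictions, entropies)
    ]
```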