art.estimators.poison_mitigation.neural_cleanse
Neural cleanse estimators.
Keras Neural Cleanse Classifier

class art.estimators.poison_mitigation.neural_cleanse.KerasNeuralCleanse(model: Union[keras.models.Model, tf.keras.models.Model], use_logits: bool = False, channels_first: bool = False, clip_values: Optional[CLIP_VALUES_TYPE] = None, preprocessing_defences: Optional[Union[Preprocessor, List[Preprocessor]]] = None, postprocessing_defences: Optional[Union[Postprocessor, List[Postprocessor]]] = None, preprocessing: PREPROCESSING_TYPE = (0, 1), input_layer: int = 0, output_layer: int = 0, steps: int = 1000, init_cost: float = 0.001, norm: Union[int, float] = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32)
Implementation of methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).

__init__(model: Union[keras.models.Model, tf.keras.models.Model], use_logits: bool = False, channels_first: bool = False, clip_values: Optional[CLIP_VALUES_TYPE] = None, preprocessing_defences: Optional[Union[Preprocessor, List[Preprocessor]]] = None, postprocessing_defences: Optional[Union[Postprocessor, List[Postprocessor]]] = None, preprocessing: PREPROCESSING_TYPE = (0, 1), input_layer: int = 0, output_layer: int = 0, steps: int = 1000, init_cost: float = 0.001, norm: Union[int, float] = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32)
Create a Neural Cleanse classifier.
 Parameters
model – Keras model, neural network or other.
use_logits (bool) – True if the model outputs logits; False for probabilities or any other type of output. Logits output should be favored when possible to ensure attack efficiency.
channels_first (bool) – Whether the channel axis comes first (True) or last (False) in the input samples.
clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.
preprocessing_defences – Preprocessing defence(s) to be applied by the classifier.
postprocessing_defences – Postprocessing defence(s) to be applied by the classifier.
preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.
input_layer (int) – The index of the layer to consider as input for models with multiple input layers. The layer with this index will be considered for computing gradients. For models with only one input layer this value is not required.
output_layer (int) – Which layer to consider as the output when the model has multiple output layers. The layer with this index will be considered for computing gradients. For models with only one output layer this value is not required.
steps (int) – The maximum number of steps to run the Neural Cleanse optimization.
init_cost (float) – The initial value for the cost tensor in the Neural Cleanse optimization.
norm – The norm to use for the Neural Cleanse optimization; can be 1, 2, or np.inf.
learning_rate (float) – The learning rate for the Neural Cleanse optimization.
attack_success_threshold (float) – The threshold at which the generated backdoor is successful enough to stop the Neural Cleanse optimization.
patience (int) – How long to wait before changing the cost multiplier in the Neural Cleanse optimization.
early_stop (bool) – Whether or not to allow early stopping in the Neural Cleanse optimization.
early_stop_threshold (float) – How close values need to come to the max value to start counting toward early stopping.
early_stop_patience (int) – How long to wait to determine early stopping in the Neural Cleanse optimization.
cost_multiplier (float) – How much to change the cost in the Neural Cleanse optimization.
batch_size (int) – The batch size for optimizations in the Neural Cleanse optimization.
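
The preprocessing and clip_values parameters describe simple element-wise operations. The following is a minimal sketch of those two rules, assuming a flat Python list of feature values in place of an np.ndarray. The helper names preprocess and clip are hypothetical and are not part of the library.

```python
def preprocess(x, subtrahend=0.0, divisor=1.0):
    """Subtract `subtrahend` from each feature, then divide by `divisor`."""
    return [(value - subtrahend) / divisor for value in x]


def clip(x, clip_min, clip_max):
    """Restrict each feature to the closed range [clip_min, clip_max]."""
    return [min(max(value, clip_min), clip_max) for value in x]


# The default preprocessing=(0, 1) leaves the input unchanged.
identity = preprocess([0.25, 0.5])

# Hypothetical example: map 8-bit pixel values onto [-1, 1].
scaled = preprocess([0.0, 127.5, 255.0], subtrahend=127.5, divisor=127.5)

# Scalar clip values apply the same range to every feature.
clipped = clip([-0.2, 0.5, 1.3], 0.0, 1.0)
```

With array-valued clip_values, each feature would instead have its own bound, which is why the array shape has to match the total number of features.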

abstain() → numpy.ndarray
Abstain from a prediction.
 Returns
A numpy array of zeros.

backdoor_examples(x_val: numpy.ndarray, y_val: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Generate reverse-engineered backdoored examples using validation data.
 Return type
Tuple
 Parameters
x_val (ndarray) – Validation data.
y_val (ndarray) – Validation labels.
 Returns
A tuple containing (clean data, backdoored data, labels).
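
In the Neural Cleanse paper, a reverse-engineered trigger consists of a mask and a pattern that are blended into a clean input. The sketch below illustrates that blending rule on a flattened sample. Whether this method applies exactly this formula internally is an assumption, and apply_trigger is a hypothetical helper, not a library function.

```python
def apply_trigger(x, mask, pattern):
    """Blend `pattern` into `x`: (1 - m) * x + m * p for each feature."""
    return [(1.0 - m) * xi + m * p for xi, m, p in zip(x, mask, pattern)]


clean = [0.2, 0.4, 0.6, 0.8]
mask = [0.0, 0.0, 1.0, 1.0]      # 1.0 marks features overwritten by the trigger
pattern = [1.0, 1.0, 1.0, 0.0]   # values written where the mask is active

backdoored = apply_trigger(clean, mask, pattern)
```

Features where the mask is 0 keep their clean value; features where it is 1 take the pattern value.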

property channel_index
 Returns
Index of the axis containing the color channels in the samples x.

property channels_first
 Returns
Boolean indicating whether the color channels are on the first axis of the samples x (True) or on the last (False).

check_backdoor_effective(backdoor_data: numpy.ndarray, backdoor_labels: numpy.ndarray) → bool
Check if supposed backdoors are effective against the classifier.
 Return type
bool
 Parameters
backdoor_data (ndarray) – Data with the backdoor added.
backdoor_labels (ndarray) – The correct label for the data.
 Returns
True if any of the backdoors are effective on the model.

class_gradient(x: numpy.ndarray, label: Optional[Union[int, List[int]]] = None, **kwargs) → numpy.ndarray
Compute per-class derivatives of the given classifier w.r.t. x of original classifier.
 Return type
ndarray
 Parameters
x (ndarray) – Sample input with shape as expected by the model.
label – Index of a specific per-class derivative. If an integer is provided, the gradient of that class output is computed for all samples. If multiple values are provided, the first dimension should match the batch size of x, and each value will be used as target for its corresponding sample in x. If None, then gradients for all classes will be computed for each sample.
 Returns
Array of gradients of input features w.r.t. each class in the form (batch_size, nb_classes, input_shape) when computing for all classes, otherwise shape becomes (batch_size, 1, input_shape) when label parameter is specified.
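
The three forms of the label argument and the resulting output shapes can be illustrated with a toy stand-in. This is only a sketch under stated assumptions: it uses a hypothetical linear model with fixed weights and nested Python lists instead of np.ndarray, and toy_class_gradient and shape are illustrative helpers, not the library implementation.

```python
# Toy linear model with 2 classes and 3 input features: logit_k(x) = W[k] . x,
# so the derivative of class k's output w.r.t. x is simply W[k].
WEIGHTS = [[1.0, 0.0, -1.0],
           [0.5, 0.5, 0.5]]


def toy_class_gradient(x, label=None):
    """Return gradients shaped (batch_size, nb_classes or 1, nb_features)."""
    if label is None:
        # All classes for every sample.
        return [[list(row) for row in WEIGHTS] for _ in x]
    if isinstance(label, int):
        # The same class for every sample.
        labels = [label] * len(x)
    else:
        # One target class per sample.
        labels = list(label)
        if len(labels) != len(x):
            raise ValueError("label must have one entry per sample in x")
    return [[list(WEIGHTS[k])] for k in labels]


def shape(nested):
    """Shape of a regular nested list, for illustration only."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


batch = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
all_classes = toy_class_gradient(batch)               # label=None
one_class = toy_class_gradient(batch, label=1)        # single integer
per_sample = toy_class_gradient(batch, label=[0, 1])  # one label per sample
```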

property clip_values
Return the clip values of the input samples.
 Returns
Clip values (min, max).

custom_loss_gradient(nn_function, tensors, input_values, name='default')
Returns the gradient of the nn_function with respect to model input.
 Parameters
nn_function (a Keras tensor) – an intermediate tensor representation of the function to differentiate
tensors (list) – the tensors or variables to differentiate with respect to
input_values (list) – the inputs to evaluate the gradient
name (str) – The name of the function. Functions of the same name are cached
 Returns
The gradient of the function w.r.t. tensors.
 Return type
np.ndarray

fit(*args, **kwargs)
Fit the classifier on the training set (x, y).
 Parameters
x – Training data.
y – Target values (class labels) one-hot encoded of shape (nb_samples, nb_classes) or index labels of shape (nb_samples,).
batch_size – Size of batches.
nb_epochs – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These should be parameters supported by the fit_generator function in Keras and will be passed to this function as such. Including the number of epochs or the number of steps per epoch as part of this argument will result in an error.
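
The two accepted label layouts for y carry the same information. As a minimal sketch, assuming plain Python lists in place of np.ndarray, the hypothetical helpers below (not part of the library) convert between index labels of shape (nb_samples,) and one-hot labels of shape (nb_samples, nb_classes).

```python
def to_one_hot(index_labels, nb_classes):
    """Convert index labels, e.g. [2, 0], to one-hot rows."""
    one_hot = []
    for label in index_labels:
        row = [0.0] * nb_classes
        row[label] = 1.0
        one_hot.append(row)
    return one_hot


def to_index_labels(one_hot):
    """Convert one-hot rows back to index labels via the position of the maximum."""
    return [row.index(max(row)) for row in one_hot]


y_index = [2, 0, 1]
y_one_hot = to_one_hot(y_index, nb_classes=3)
round_trip = to_index_labels(y_one_hot)
```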

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None
Fit the classifier using the generator that yields batches as specified.
 Parameters
generator – Batch generator providing (x, y) for each epoch. If the generator can be used for native training in Keras, it will.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. These should be parameters supported by the fit_generator function in Keras and will be passed to this function as such. Including the number of epochs as part of this argument will result in an error.

generate_backdoor(x_val: numpy.ndarray, y_val: numpy.ndarray, y_target: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]
Generates a possible backdoor for the model.
 Returns
A tuple of the pattern and mask for the model.
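
The constructor parameters init_cost, attack_success_threshold, patience and cost_multiplier suggest an adaptive weighting of the mask-size penalty during this optimization. The sketch below shows one plausible schedule, loosely following the procedure described for Neural Cleanse: the cost is raised after the attack has succeeded for patience consecutive steps and lowered after it has failed for as many. This is an assumption for illustration; the library's actual update rule may differ, and adapt_cost is a hypothetical helper.

```python
def adapt_cost(init_cost, success_rates, attack_success_threshold=0.99,
               patience=5, cost_multiplier=1.5):
    """Return the cost after each step, given per-step attack success rates."""
    cost = init_cost
    up_count = 0    # consecutive steps at or above the threshold
    down_count = 0  # consecutive steps below the threshold
    history = []
    for rate in success_rates:
        if rate >= attack_success_threshold:
            up_count += 1
            down_count = 0
        else:
            down_count += 1
            up_count = 0
        if up_count >= patience:
            # Trigger works reliably: penalize mask size more strongly.
            cost *= cost_multiplier
            up_count = 0
        elif down_count >= patience:
            # Trigger keeps failing: relax the penalty.
            cost /= cost_multiplier
            down_count = 0
        history.append(cost)
    return history


# Five successful steps followed by five failing ones.
history = adapt_cost(0.001, [1.0] * 5 + [0.0] * 5)
```

In this run the cost stays at its initial value for four steps, rises on the fifth, and falls back on the tenth.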

get_activations(*args, **kwargs)
Return the output of the specified layer for input x. layer is specified by layer index (between 0 and nb_layers - 1) or by name. The number of layers can be determined by counting the results returned by calling layer_names.
 Parameters
x – Input for computing the activations.
layer – Layer for computing the activations.
batch_size – Size of batches.
framework – If true, return the intermediate tensor representation of the activation.
 Returns
The output of layer, where the first dimension is the batch size corresponding to x.

get_params() → Dict[str, Any]
Get all parameters and their values of this estimator.
 Returns
A dictionary of string parameter names to their value.

property input_shape
Return the shape of one input sample.
 Returns
Shape of one input sample.

property layer_names
Return the names of the hidden layers in the model, if applicable.
 Returns
The names of the hidden layers in the model, input and output layers are ignored.
Warning
layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

property learning_phase
The learning phase set by the user. Possible values are True for training or False for prediction and None if it has not been set by the library. In the latter case, the library does not do any explicit learning phase manipulation and the current value of the backend framework is used. If a value has been set by the user for this property, it will impact all following computations for model fitting, prediction and gradients.
 Returns
Learning phase.

loss(x: numpy.ndarray, y: numpy.ndarray, reduction: str = 'none', **kwargs) → numpy.ndarray
Compute the loss of the neural network for samples x.
 Parameters
x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).
y (ndarray) – Target values (class labels) one-hot encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).
reduction (str) – Specifies the reduction to apply to the output: ‘none’ | ‘mean’ | ‘sum’. ‘none’: no reduction will be applied; ‘mean’: the sum of the output will be divided by the number of elements in the output; ‘sum’: the output will be summed.
 Returns
Loss values.
 Return type
Format as expected by the model
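
The three reduction modes can be sketched on a list of per-sample loss values. This is an illustration of the semantics described above, using a plain Python list instead of np.ndarray; reduce_losses is a hypothetical helper, not a library function.

```python
def reduce_losses(per_sample_losses, reduction="none"):
    """Apply 'none', 'mean' or 'sum' reduction to per-sample loss values."""
    if reduction == "none":
        return list(per_sample_losses)  # one value per sample
    if reduction == "sum":
        return sum(per_sample_losses)
    if reduction == "mean":
        return sum(per_sample_losses) / len(per_sample_losses)
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")


losses = [0.5, 1.5, 1.0]
unreduced = reduce_losses(losses, "none")
total = reduce_losses(losses, "sum")
average = reduce_losses(losses, "mean")
```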

loss_gradient(x: numpy.ndarray, y: numpy.ndarray, **kwargs) → numpy.ndarray
Compute the gradient of the loss function w.r.t. x.
 Return type
ndarray
 Parameters
x (ndarray) – Sample input with shape as expected by the model.
y (ndarray) – Target values (class labels) one-hot encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).
 Returns
Array of gradients of the same shape as x.

mitigate(x_val: numpy.ndarray, y_val: numpy.ndarray, mitigation_types: List[str]) → None
Mitigates the effect of poison on a classifier.
 Parameters
x_val (ndarray) – Validation data to use to mitigate the effect of poison.
y_val (ndarray) – Validation labels to use to mitigate the effect of poison.
mitigation_types (List) – The types of mitigation method; can include ‘unlearning’, ‘pruning’, or ‘filtering’.
 Return type
None
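
Since mitigation_types is a list of free-form strings, it can help to check it against the three documented values before calling mitigate. The following is a minimal sketch of such a check; validate_mitigation_types is a hypothetical helper, and how the library itself reacts to unknown values is not stated here.

```python
ALLOWED_MITIGATION_TYPES = ("unlearning", "pruning", "filtering")


def validate_mitigation_types(mitigation_types):
    """Return the list unchanged, or raise if it contains an undocumented value."""
    unknown = [name for name in mitigation_types
               if name not in ALLOWED_MITIGATION_TYPES]
    if unknown:
        raise ValueError(f"Unknown mitigation types: {unknown}")
    return list(mitigation_types)


selected = validate_mitigation_types(["pruning", "filtering"])
```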

property model
Return the model.
 Returns
The model.

property nb_classes
Return the number of output classes.
 Returns
Number of classes in the data.

outlier_detection(x_val: numpy.ndarray, y_val: numpy.ndarray) → List[Tuple[int, numpy.ndarray, numpy.ndarray]]
Returns a list of suspected poison labels together with their mask and pattern.
 Returns
A list of tuples containing the class index, mask, and pattern for suspected labels.
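
The Neural Cleanse paper identifies suspicious labels by looking for reverse-engineered masks with an unusually small L1 norm, using an anomaly index based on the median absolute deviation (MAD). The sketch below illustrates that test on a list of per-class mask norms. The consistency constant 1.4826 and the threshold of 2 follow the paper; whether this method uses exactly these statistics is an assumption, and both helper functions are hypothetical.

```python
from statistics import median


def anomaly_indices(l1_norms, consistency=1.4826):
    """Absolute deviation from the median, scaled by the normalized MAD."""
    med = median(l1_norms)
    mad = consistency * median([abs(v - med) for v in l1_norms])
    if mad == 0:
        # All norms (nearly) identical: nothing stands out.
        return [0.0 for _ in l1_norms]
    return [abs(v - med) / mad for v in l1_norms]


def flag_suspected_labels(l1_norms, threshold=2.0):
    """Indices of classes whose mask norm is an outlier on the small side."""
    med = median(l1_norms)
    indices = anomaly_indices(l1_norms)
    return [k for k, (v, a) in enumerate(zip(l1_norms, indices))
            if v < med and a > threshold]


# Hypothetical mask L1 norms for six classes; class 3 needs a far smaller trigger.
norms = [100.0, 98.0, 102.0, 20.0, 101.0, 99.0]
suspected = flag_suspected_labels(norms)
```

Only outliers below the median are flagged, because a backdoored label is expected to need a smaller trigger than clean labels.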

predict(*args, **kwargs)
Perform prediction of the given classifier for a batch of inputs, potentially filtering suspicious input.
 Parameters
x – Input data to predict.
batch_size – Batch size.
 Returns
Array of predictions of shape (nb_inputs, nb_classes).
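
When filtering is active, callers may want to separate ordinary predictions from filtered ones. The sketch below assumes that a filtered input is reported as the all-zero vector returned by abstain(); this page does not state that explicitly, so treat it as an assumption. Nested lists stand in for the (nb_inputs, nb_classes) array, and is_abstention is a hypothetical helper.

```python
def is_abstention(prediction_row):
    """True if every class score in the row is zero."""
    return all(score == 0 for score in prediction_row)


# Hypothetical output for three inputs and two classes.
predictions = [[0.1, 0.9],
               [0.0, 0.0],   # assumed abstention
               [0.7, 0.3]]

abstained = [i for i, row in enumerate(predictions) if is_abstention(row)]
kept = [i for i, row in enumerate(predictions) if not is_abstention(row)]
```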

reset()
Reset the state of the defense.

save(filename: str, path: Optional[str] = None) → None
Save a model to file in the format specific to the backend framework. For Keras, .h5 format is used.
 Parameters
filename (str) – Name of the file where to store the model.
path – Path of the folder where to store the model. If no path is specified, the model will be stored in the default data location of the library ART_DATA_PATH.

set_learning_phase(train: bool) → None
Set the learning phase for the backend framework.
 Parameters
train (bool) – True to set the learning phase to training, False to set it to prediction.

set_params(**kwargs) → None
Take a dictionary of parameters and apply checks before setting them as attributes.
 Parameters
kwargs – A dictionary of attributes.
