art.estimators.certification.neural_cleanse

Neural cleanse estimators.

Mixin Base Class Neural Cleanse

class art.estimators.certification.neural_cleanse.NeuralCleanseMixin(steps: int = 1000, *args, init_cost: float = 0.001, norm: Union[int, float] = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32, **kwargs)

Implementation of methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).

backdoor_examples(x_val: numpy.ndarray, y_val: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Generate reverse-engineered backdoored examples using validation data.

Parameters
  • x_val (ndarray) – Validation data.

  • y_val (ndarray) – Validation labels.

Returns

A tuple containing (clean data, backdoored data, labels).

Return type

Tuple
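
A minimal usage sketch, assuming `defence` is an already constructed KerasNeuralCleanse instance (see below) and `x_val`/`y_val` are hypothetical held-out validation arrays:

    clean_x, backdoor_x, backdoor_y = defence.backdoor_examples(x_val, y_val)
    # backdoor_x contains the validation inputs with the reverse-engineered
    # trigger applied; backdoor_y holds the corresponding labels.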

check_backdoor_effective(backdoor_data: numpy.ndarray, backdoor_labels: numpy.ndarray) → bool

Check if supposed backdoors are effective against the classifier

Return type

bool

Parameters
  • backdoor_data (ndarray) – data with the backdoor added

  • backdoor_labels (ndarray) – the correct label for the data

Returns

True if any of the backdoors are effective on the model.

generate_backdoor(x_val: numpy.ndarray, y_val: numpy.ndarray, y_target: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]

Generates a possible backdoor for the model.

Returns

A tuple of the pattern and mask for the model.
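
A minimal sketch of how the returned pattern and mask can be combined with an input, following the trigger formulation in the Neural Cleanse paper; `defence`, `x_val`, `y_val`, and `y_target` are hypothetical:

    # The docstring above lists the pattern first, then the mask.
    pattern, mask = defence.generate_backdoor(x_val, y_val, y_target)
    # Apply the trigger: keep the original input where the mask is 0 and
    # insert the pattern where the mask is 1 (intermediate values blend).
    x_triggered = (1 - mask) * x_val + mask * pattern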

mitigate(x_val: numpy.ndarray, y_val: numpy.ndarray, mitigation_types: List[str]) → None

Mitigates the effect of poison on a classifier

Parameters
  • x_val (ndarray) – Validation data to use to mitigate the effect of poison.

  • y_val (ndarray) – Validation labels to use to mitigate the effect of poison.

  • mitigation_types (List) – The types of mitigation method, can include ‘unlearning’, ‘pruning’, or ‘filtering’

Returns

None.
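
A minimal sketch, assuming `defence` wraps a possibly poisoned classifier and `x_val`/`y_val` are hypothetical clean validation arrays:

    # Apply one or more of the supported mitigation methods in place.
    defence.mitigate(x_val, y_val, mitigation_types=["unlearning", "pruning", "filtering"])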

outlier_detection(x_val: numpy.ndarray, y_val: numpy.ndarray) → List[Tuple[int, numpy.ndarray, numpy.ndarray]]

Returns a list of suspected poison class labels with their masks and patterns.

Returns

A list of tuples containing the class index, mask, and pattern for suspected labels.
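
A minimal sketch, assuming `defence` and the validation arrays are as above:

    suspects = defence.outlier_detection(x_val, y_val)
    for class_index, mask, pattern in suspects:
        # Each entry is a class whose reverse-engineered trigger is
        # anomalously small, i.e. a likely backdoor target.
        print(class_index, mask.shape, pattern.shape)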

predict(*args, **kwargs)

Perform prediction of the given classifier for a batch of inputs, potentially filtering suspicious input

Parameters
  • x – Test set.

  • batch_size – Batch size.

Returns

Array of predictions of shape (nb_inputs, nb_classes).
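
A minimal sketch, assuming `defence` has been mitigated with the ‘filtering’ option and `x_test` is a hypothetical test array:

    preds = defence.predict(x_test, batch_size=32)
    # Inputs flagged as carrying the suspected trigger are abstained on,
    # i.e. their rows in preds are all zeros (cf. abstain below).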

Keras Neural Cleanse Classifier

class art.estimators.certification.neural_cleanse.KerasNeuralCleanse(model: Union[keras.models.Model, tf.keras.models.Model], use_logits: bool = False, channels_first: bool = False, clip_values: Optional[CLIP_VALUES_TYPE] = None, preprocessing_defences: Optional[Union[Preprocessor, List[Preprocessor]]] = None, postprocessing_defences: Optional[Union[Postprocessor, List[Postprocessor]]] = None, preprocessing: PREPROCESSING_TYPE = (0, 1), input_layer: int = 0, output_layer: int = 0, steps: int = 1000, init_cost: float = 0.001, norm: Union[int, float] = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32)

Implementation of methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).

__init__(model: Union[keras.models.Model, tf.keras.models.Model], use_logits: bool = False, channels_first: bool = False, clip_values: Optional[CLIP_VALUES_TYPE] = None, preprocessing_defences: Optional[Union[Preprocessor, List[Preprocessor]]] = None, postprocessing_defences: Optional[Union[Postprocessor, List[Postprocessor]]] = None, preprocessing: PREPROCESSING_TYPE = (0, 1), input_layer: int = 0, output_layer: int = 0, steps: int = 1000, init_cost: float = 0.001, norm: Union[int, float] = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32)

Create a Neural Cleanse classifier.

Parameters
  • model – Keras model (a neural network or other model).

  • use_logits (bool) – True if the output of the model is logits; False for probabilities or any other type of output. Logits output should be favored when possible to ensure attack efficiency.

  • channels_first (bool) – Set channels first or last.

  • clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.

  • preprocessing_defences – Preprocessing defence(s) to be applied by the classifier.

  • postprocessing_defences – Postprocessing defence(s) to be applied by the classifier.

  • preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.

  • input_layer (int) – The index of the layer to consider as input for models with multiple input layers. The layer with this index will be considered for computing gradients. For models with only one input layer, this value is not required.

  • output_layer (int) – Which layer to consider as the output when the model has multiple output layers. The layer with this index will be considered for computing gradients. For models with only one output layer, this value is not required.

  • steps (int) – The maximum number of steps to run the Neural Cleanse optimization

  • init_cost (float) – The initial value for the cost tensor in the Neural Cleanse optimization

  • norm – The norm to use for the Neural Cleanse optimization, can be 1, 2, or np.inf

  • learning_rate (float) – The learning rate for the Neural Cleanse optimization

  • attack_success_threshold (float) – The threshold at which the generated backdoor is successful enough to stop the Neural Cleanse optimization

  • patience (int) – How long to wait for changing the cost multiplier in the Neural Cleanse optimization

  • early_stop (bool) – Whether or not to allow early stopping in the Neural Cleanse optimization

  • early_stop_threshold (float) – How close values need to come to max value to start counting early stop

  • early_stop_patience (int) – How long to wait to determine early stopping in the Neural Cleanse optimization

  • cost_multiplier (float) – How much to change the cost in the Neural Cleanse optimization

  • batch_size (int) – The batch size for optimizations in the Neural Cleanse optimization
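
A minimal construction sketch, assuming `model` is a hypothetical compiled Keras model trained on inputs scaled to [0, 1]:

    from art.estimators.certification.neural_cleanse import KerasNeuralCleanse

    defence = KerasNeuralCleanse(
        model=model,
        use_logits=False,
        clip_values=(0.0, 1.0),
        steps=1000,
        learning_rate=0.1,
        batch_size=32,
    )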

abstain() → numpy.ndarray

Abstain from a prediction.

Returns

A numpy array of zeros.

backdoor_examples(x_val: numpy.ndarray, y_val: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Generate reverse-engineered backdoored examples using validation data.

Parameters
  • x_val (ndarray) – Validation data.

  • y_val (ndarray) – Validation labels.

Returns

A tuple containing (clean data, backdoored data, labels).

Return type

Tuple

property channel_index
Returns

Index of the axis containing the color channels in the samples x.

property channels_first
Returns

Boolean indicating whether the color channels are placed first (True) or last (False) in the samples x.

check_backdoor_effective(backdoor_data: numpy.ndarray, backdoor_labels: numpy.ndarray) → bool

Check if supposed backdoors are effective against the classifier

Return type

bool

Parameters
  • backdoor_data (ndarray) – data with the backdoor added

  • backdoor_labels (ndarray) – the correct label for the data

Returns

True if any of the backdoors are effective on the model.

class_gradient(*args, **kwargs)

Compute per-class derivatives of the original classifier w.r.t. x.

Parameters
  • x – Sample input with shape as expected by the model.

  • label – Index of a specific per-class derivative. If an integer is provided, the gradient of that class output is computed for all samples. If multiple values are provided, the first dimension should match the batch size of x, and each value will be used as target for its corresponding sample in x. If None, then gradients for all classes will be computed for each sample.

Returns

Array of gradients of input features w.r.t. each class in the form (batch_size, nb_classes, input_shape) when computing for all classes, otherwise shape becomes (batch_size, 1, input_shape) when label parameter is specified.
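
A minimal sketch, assuming `defence` is a KerasNeuralCleanse instance and `x_batch` is a hypothetical batch of inputs:

    # Gradient of the class-3 output w.r.t. each sample in the batch;
    # the result has shape (batch_size, 1, *input_shape).
    grads = defence.class_gradient(x_batch, label=3)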

property clip_values

Return the clip values of the input samples.

Returns

Clip values (min, max).

custom_loss_gradient(nn_function, tensors, input_values, name='default')

Returns the gradient of the nn_function with respect to model input

Parameters
  • nn_function (a Keras tensor) – an intermediate tensor representation of the function to differentiate

  • tensors (list) – the tensors or variables to differentiate with respect to

  • input_values (list) – the inputs at which to evaluate the gradient

  • name (str) – The name of the function. Functions of the same name are cached

Returns

The gradient of the function w.r.t. the given tensors or variables.

Return type

np.ndarray

fit(*args, **kwargs)

Fit the classifier on the training set (x, y).

Parameters
  • x – Training data.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or index labels of shape (nb_samples,).

  • batch_size – Size of batches.

  • nb_epochs – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These should be parameters supported by the fit_generator function in Keras and will be passed to this function as such. Including the number of epochs or the number of steps per epoch as part of this argument will result in an error.
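
A minimal sketch, assuming `defence` is a KerasNeuralCleanse instance and `x_train`/`y_train` are hypothetical one-hot-encoded training arrays:

    defence.fit(x_train, y_train, batch_size=128, nb_epochs=10)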

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None

Fit the classifier using the generator that yields batches as specified.

Parameters
  • generator – Batch generator providing (x, y) for each epoch. If the generator can be used for native training in Keras, it will be used directly.

  • nb_epochs (int) – Number of epochs to use for training.

  • kwargs – Dictionary of framework-specific arguments. These should be parameters supported by the fit_generator function in Keras and will be passed to this function as such. Including the number of epochs as part of this argument will result in an error.

generate_backdoor(x_val: numpy.ndarray, y_val: numpy.ndarray, y_target: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]

Generates a possible backdoor for the model.

Returns

A tuple of the pattern and mask for the model.

get_activations(*args, **kwargs)

Return the output of the specified layer for input x. layer is specified by layer index (between 0 and nb_layers - 1) or by name. The number of layers can be determined by counting the results returned by calling layer_names.

Parameters
  • x – Input for computing the activations.

  • layer – Layer for computing the activations.

  • batch_size – Size of batches.

  • framework – If true, return the intermediate tensor representation of the activation.

Returns

The output of layer, where the first dimension is the batch size corresponding to x.

get_params() → Dict[str, Any]

Get all parameters and their values of this estimator.

Returns

A dictionary of string parameter names to their value.

property input_shape

Return the shape of one input sample.

Returns

Shape of one input sample.

property layer_names

Return the names of the hidden layers in the model, if applicable.

Returns

The names of the hidden layers in the model, input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

property learning_phase

The learning phase set by the user. Possible values are True for training or False for prediction and None if it has not been set by the library. In the latter case, the library does not do any explicit learning phase manipulation and the current value of the backend framework is used. If a value has been set by the user for this property, it will impact all following computations for model fitting, prediction and gradients.

Returns

Learning phase.

loss(x: numpy.ndarray, y: numpy.ndarray, **kwargs) → numpy.ndarray

Compute the loss of the neural network for samples x.

Parameters
  • x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns

Loss values.

Return type

Format as expected by the model

loss_gradient(*args, **kwargs)

Compute the gradient of the loss function w.r.t. x.

Parameters
  • x – Sample input with shape as expected by the model.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns

Array of gradients of the same shape as x.

mitigate(x_val: numpy.ndarray, y_val: numpy.ndarray, mitigation_types: List[str]) → None

Mitigates the effect of poison on a classifier

Parameters
  • x_val (ndarray) – Validation data to use to mitigate the effect of poison.

  • y_val (ndarray) – Validation labels to use to mitigate the effect of poison.

  • mitigation_types (List) – The types of mitigation method, can include ‘unlearning’, ‘pruning’, or ‘filtering’

Returns

None.

property model

Return the model.

Returns

The model.

property nb_classes

Return the number of output classes.

Returns

Number of classes in the data.

outlier_detection(x_val: numpy.ndarray, y_val: numpy.ndarray) → List[Tuple[int, numpy.ndarray, numpy.ndarray]]

Returns a list of suspected poison class labels with their masks and patterns.

Returns

A list of tuples containing the class index, mask, and pattern for suspected labels.

predict(*args, **kwargs)

Perform prediction of the given classifier for a batch of inputs, potentially filtering suspicious input

Parameters
  • x – Input data to predict.

  • batch_size – Batch size.

Returns

Array of predictions of shape (nb_inputs, nb_classes).

reset()

Reset the state of the defense.

save(filename: str, path: Optional[str] = None) → None

Save a model to file in the format specific to the backend framework. For Keras, .h5 format is used.

Parameters
  • filename (str) – Name of the file where to store the model.

  • path – Path of the folder where to store the model. If no path is specified, the model will be stored in the default data location of the library ART_DATA_PATH.

set_learning_phase(train: bool) → None

Set the learning phase for the backend framework.

Parameters

train (bool) – True to set the learning phase to training, False to set it to prediction.

set_params(**kwargs) → None

Take a dictionary of parameters and apply checks before setting them as attributes.

Parameters

kwargs – A dictionary of attributes.