art.defences.detector.evasion

Module implementing detector-based defences against evasion attacks.

Binary Input Detector

class art.defences.detector.evasion.BinaryInputDetector(detector: ClassifierNeuralNetwork)

Binary detector of adversarial samples coming from evasion attacks. The detector uses an architecture provided by the user and trains it on data labeled as clean (label 0) or adversarial (label 1).
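The labeling convention above can be sketched with NumPy alone. This is a minimal illustration, not the library's own code: the helper name `build_detector_training_set` and the placeholder arrays are assumptions, and whether a given detector expects index labels or one-hot labels depends on the wrapped model.

```python
import numpy as np


def build_detector_training_set(x_clean, x_adv):
    """Stack clean and adversarial samples and label them 0 and 1 (hypothetical helper)."""
    x = np.concatenate([x_clean, x_adv], axis=0)
    # Index labels: 0 = clean, 1 = adversarial.
    y_index = np.concatenate(
        [np.zeros(len(x_clean), dtype=int), np.ones(len(x_adv), dtype=int)]
    )
    # One-hot form of the same labels, shape (nb_samples, 2).
    y_one_hot = np.eye(2)[y_index]
    return x, y_index, y_one_hot


rng = np.random.default_rng(0)
x_clean = rng.random((4, 8))   # placeholder clean samples
x_adv = x_clean + 0.1          # placeholder stand-in for adversarial samples
x_train, y_index, y_one_hot = build_detector_training_set(x_clean, x_adv)
```

The resulting `x_train` and labels are what would be passed as `x` and `y` to `fit`.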

property channels_first
Returns

Boolean indicating whether the color channels come first (True) or last (False) in the sample x.

class_gradient(x: numpy.ndarray, label: Optional[Union[int, List[int]]] = None, training_mode: bool = False, **kwargs) → numpy.ndarray

Compute per-class derivatives w.r.t. x.

Return type

ndarray

Parameters
  • x (ndarray) – Sample input with shape as expected by the model.

  • label – Index of a specific per-class derivative. If an integer is provided, the gradient of that class output is computed for all samples. If multiple values are provided, the first dimension should match the batch size of x, and each value will be used as target for its corresponding sample in x. If None, then gradients for all classes will be computed for each sample.

  • training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.

Returns

Array of gradients of input features w.r.t. each class in the form (batch_size, nb_classes, input_shape) when computing for all classes, otherwise shape becomes (batch_size, 1, input_shape) when label parameter is specified.
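The effect of `label` on the output shape can be illustrated with a toy linear model, where the gradient of each class output with respect to x is simply that class's weight row. This is a sketch under that assumption; `linear_class_gradient` is a hypothetical stand-in, not the library method.

```python
import numpy as np


def linear_class_gradient(weights, x, label=None):
    """Gradients of the outputs z = x @ weights.T with respect to x (toy model).

    weights: (nb_classes, nb_features); x: (batch_size, nb_features).
    """
    batch_size = x.shape[0]
    # For a linear model, d z_c / d x equals weights[c] for every sample.
    all_grads = np.broadcast_to(weights, (batch_size,) + weights.shape)
    if label is None:
        return all_grads.copy()                       # (batch_size, nb_classes, nb_features)
    if isinstance(label, (int, np.integer)):
        label = [label] * batch_size                  # same class for every sample
    label = np.asarray(label)
    if label.shape != (batch_size,):
        raise ValueError("label must be an int or have one entry per sample")
    picked = all_grads[np.arange(batch_size), label]  # (batch_size, nb_features)
    return picked[:, np.newaxis, :]                   # (batch_size, 1, nb_features)


weights = np.arange(6, dtype=float).reshape(3, 2)     # 3 classes, 2 features
x = np.zeros((4, 2))
grads_all = linear_class_gradient(weights, x)
grads_one = linear_class_gradient(weights, x, label=1)
grads_per_sample = linear_class_gradient(weights, x, label=[0, 1, 2, 0])
```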

property clip_values

Return the clip values of the input samples.

Returns

Clip values (min, max).

compute_loss(x: numpy.ndarray, y: numpy.ndarray, **kwargs) → numpy.ndarray

Compute the loss of the neural network for samples x.

Parameters
  • x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns

Loss values.

Return type

Format as expected by the model
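The two accepted label formats for y carry the same information. The sketch below converts index labels to one-hot form and computes a per-sample cross-entropy from predicted probabilities. Both helper names are assumptions for illustration, and the loss function and reduction actually used depend on the wrapped model.

```python
import numpy as np


def to_one_hot(y, nb_classes):
    """Accept indices of shape (nb_samples,) or one-hot labels of shape (nb_samples, nb_classes)."""
    y = np.asarray(y)
    if y.ndim == 2:
        return y.astype(float)
    return np.eye(nb_classes)[y]


def cross_entropy(probabilities, y, eps=1e-12):
    """Per-sample cross-entropy; eps guards against log(0)."""
    y_one_hot = to_one_hot(y, probabilities.shape[1])
    return -np.sum(y_one_hot * np.log(probabilities + eps), axis=1)


probabilities = np.array([[0.9, 0.1], [0.2, 0.8]])
loss_from_indices = cross_entropy(probabilities, [0, 1])
loss_from_one_hot = cross_entropy(probabilities, [[1, 0], [0, 1]])
```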

fit(*args, **kwargs)

Fit the detector using clean and adversarial samples.

Parameters
  • x – Training set to fit the detector.

  • y – Labels for the training set.

  • batch_size – Size of batches.

  • nb_epochs – Number of epochs to use for training.

  • kwargs – Other parameters.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None

Fit the classifier using the generator `generator`, which yields batches as specified. This function is not supported for this detector.

Raises

NotImplementedError – This method is not supported for detectors.

get_activations(x: numpy.ndarray, layer: Union[int, str], batch_size: int, framework: bool = False) → numpy.ndarray

Return the output of the specified layer for input x. layer is specified by layer index (between 0 and nb_layers - 1) or by name. The number of layers can be determined by counting the results returned by calling layer_names. This function is not supported for this detector.

Raises

NotImplementedError – This method is not supported for detectors.

property input_shape

Return the shape of one input sample.

Returns

Shape of one input sample.

loss_gradient(x: numpy.ndarray, y: numpy.ndarray, training_mode: bool = False, **kwargs) → numpy.ndarray

Compute the gradient of the loss function w.r.t. x.

Return type

ndarray

Parameters
  • x (ndarray) – Sample input with shape as expected by the model.

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

  • training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.

Returns

Array of gradients of the same shape as x.
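To make the "same shape as x" contract concrete, here is a sketch for a toy softmax-linear model with a summed cross-entropy loss. The functions `toy_loss` and `toy_loss_gradient` are illustrative assumptions. A real detector delegates this computation to its underlying framework.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def toy_loss(weights, x, y_one_hot):
    """Summed cross-entropy of a softmax-linear model (toy stand-in)."""
    p = softmax(x @ weights.T)
    return -np.sum(y_one_hot * np.log(p))


def toy_loss_gradient(weights, x, y_one_hot):
    """Gradient of toy_loss with respect to x; has the same shape as x."""
    p = softmax(x @ weights.T)
    # d loss / d x_i = sum_c (p_ic - y_ic) * weights[c]
    return (p - y_one_hot) @ weights


rng = np.random.default_rng(1)
weights = rng.normal(size=(3, 4))          # 3 classes, 4 features
x = rng.normal(size=(2, 4))
y_one_hot = np.eye(3)[[0, 2]]
grad = toy_loss_gradient(weights, x, y_one_hot)
```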

property nb_classes

Return the number of output classes.

Returns

Number of classes in the data.

predict(*args, **kwargs)

Perform detection of adversarial data and return prediction as tuple.

Parameters
  • x – Data sample on which to perform detection.

  • batch_size – Size of batches.

Returns

Per-sample prediction of whether the data is adversarial, where 0 means non-adversarial. The returned value has the same first dimension (batch size) as x.
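One way to turn per-sample scores into 0/1 flags is sketched below. It assumes the detector yields an array of shape (nb_samples, 2) with one score per class (clean, adversarial). That output layout and the helper `flag_adversarial` are assumptions for illustration, so check the return value of the installed version before relying on it.

```python
import numpy as np


def flag_adversarial(scores):
    """Map two-column detector scores to flags: 0 = non-adversarial, 1 = adversarial."""
    scores = np.asarray(scores)
    if scores.ndim != 2 or scores.shape[1] != 2:
        raise ValueError("expected scores of shape (nb_samples, 2)")
    return np.argmax(scores, axis=1)


scores = np.array([[0.8, 0.2], [0.1, 0.9], [0.6, 0.4]])  # placeholder detector output
flags = flag_adversarial(scores)
nb_flagged = int(flags.sum())
```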

save(filename: str, path: Optional[str] = None) → None

Save the detector model.

Parameters
  • filename (str) – The name of the saved file.

  • path (Optional[str]) – The path to the location of the saved file.

Binary Activation Detector

class art.defences.detector.evasion.BinaryActivationDetector(classifier: ClassifierNeuralNetwork, detector: ClassifierNeuralNetwork, layer: Union[int, str])

Binary detector of adversarial samples coming from evasion attacks. The detector uses an architecture provided by the user and is trained on the values of the activations of a classifier at a given layer.
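The data flow described above can be sketched with NumPy: inputs pass through the classifier up to the chosen layer, and the resulting activations (not the raw inputs) become the detector's training data. The toy one-layer "classifier" and all names here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(8, 5))   # toy classifier: weights of the chosen layer


def toy_activations(x):
    """Stand-in for the classifier's activations at the chosen layer (ReLU)."""
    return np.maximum(x @ w_hidden, 0.0)


x_clean = rng.random((4, 8))         # placeholder clean samples
x_adv = x_clean + 0.1                # placeholder stand-in for adversarial samples

# The detector is trained on activations, labeled 0 (clean) or 1 (adversarial).
detector_inputs = toy_activations(np.concatenate([x_clean, x_adv], axis=0))
detector_labels = np.concatenate([np.zeros(4, dtype=int), np.ones(4, dtype=int)])
```

The detector architecture therefore has to accept inputs shaped like the layer's activations rather than like the original samples.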

property channels_first
Returns

Boolean indicating whether the color channels come first (True) or last (False) in the sample x.

class_gradient(x: numpy.ndarray, label: Optional[Union[int, List[int]]] = None, training_mode: bool = False, **kwargs) → numpy.ndarray

Compute per-class derivatives w.r.t. x.

Return type

ndarray

Parameters
  • x (ndarray) – Sample input with shape as expected by the model.

  • label – Index of a specific per-class derivative. If an integer is provided, the gradient of that class output is computed for all samples. If multiple values are provided, the first dimension should match the batch size of x, and each value will be used as target for its corresponding sample in x. If None, then gradients for all classes will be computed for each sample.

  • training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.

Returns

Array of gradients of input features w.r.t. each class in the form (batch_size, nb_classes, input_shape) when computing for all classes, otherwise shape becomes (batch_size, 1, input_shape) when label parameter is specified.

property clip_values

Return the clip values of the input samples.

Returns

Clip values (min, max).

compute_loss(x: numpy.ndarray, y: numpy.ndarray, **kwargs) → numpy.ndarray

Compute the loss of the neural network for samples x.

Parameters
  • x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns

Loss values.

Return type

Format as expected by the model

fit(*args, **kwargs)

Fit the detector using training data.

Parameters
  • x – Training set to fit the detector.

  • y – Labels for the training set.

  • batch_size – Size of batches.

  • nb_epochs – Number of epochs to use for training.

  • kwargs – Other parameters.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None

Fit the classifier using the generator `generator`, which yields batches as specified. This function is not supported for this detector.

Raises

NotImplementedError – This method is not supported for detectors.

get_activations(x: numpy.ndarray, layer: Union[int, str], batch_size: int, framework: bool = False) → numpy.ndarray

Return the output of the specified layer for input x. layer is specified by layer index (between 0 and nb_layers - 1) or by name. The number of layers can be determined by counting the results returned by calling layer_names. This function is not supported for this detector.

Raises

NotImplementedError – This method is not supported for detectors.

property input_shape

Return the shape of one input sample.

Returns

Shape of one input sample.

property layer_names

Return the names of the hidden layers in the model, if applicable.

Returns

The names of the hidden layers in the model, input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.
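Since a layer may be given either as an index or as a name, a caller can normalize the value against the list of layer names before using it. The helper below is a hypothetical sketch of that check, and the example names are placeholders rather than names any real model is guaranteed to expose.

```python
def resolve_layer(layer, layer_names):
    """Return the index of `layer`, given as an int index or a str name (hypothetical helper)."""
    if isinstance(layer, str):
        if layer not in layer_names:
            raise ValueError(f"Layer name {layer!r} is not in {layer_names}")
        return layer_names.index(layer)
    if isinstance(layer, int):
        # Valid indices run from 0 to nb_layers - 1.
        if not 0 <= layer < len(layer_names):
            raise ValueError(f"Layer index {layer} is outside 0..{len(layer_names) - 1}")
        return layer
    raise TypeError("layer must be an int or a str")


example_names = ["conv_1", "conv_2", "dense_1"]   # placeholder layer names
index_from_name = resolve_layer("conv_2", example_names)
index_from_int = resolve_layer(2, example_names)
```

Given the warning above, validating the layer explicitly is safer than assuming the inferred order matches the model.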

loss_gradient(x: numpy.ndarray, y: numpy.ndarray, training_mode: bool = False, **kwargs) → numpy.ndarray

Compute the gradient of the loss function w.r.t. x.

Return type

ndarray

Parameters
  • x (ndarray) – Sample input with shape as expected by the model.

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

  • training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.

Returns

Array of gradients of the same shape as x.

property nb_classes

Return the number of output classes.

Returns

Number of classes in the data.

predict(*args, **kwargs)

Perform detection of adversarial data and return prediction as tuple.

Parameters
  • x – Data sample on which to perform detection.

  • batch_size – Size of batches.

Returns

Per-sample prediction of whether the data is adversarial, where 0 means non-adversarial. The returned value has the same first dimension (batch size) as x.

save(filename: str, path: Optional[str] = None) → None

Save the detector model.

Parameters
  • filename (str) – The name of the saved file.

  • path (Optional[str]) – The path to the location of the saved file.