art.defences.detector.evasion
Module implementing detector-based defences against evasion attacks.
Binary Input Detector
- class art.defences.detector.evasion.BinaryInputDetector(detector: ClassifierNeuralNetwork)
Binary detector of adversarial samples coming from evasion attacks. The detector uses an architecture provided by the user and trains it on data labeled as clean (label 0) or adversarial (label 1).
- property channels_first: bool
- Returns
Boolean to indicate index of the color channels in the sample x.
- class_gradient(x: numpy.ndarray, label: Optional[Union[int, List[int]]] = None, training_mode: bool = False, **kwargs) → numpy.ndarray
Compute per-class derivatives w.r.t. x.
- Return type
ndarray
- Parameters
x (ndarray) – Sample input with shape as expected by the model.
label – Index of a specific per-class derivative. If an integer is provided, the gradient of that class output is computed for all samples. If multiple values are provided, the first dimension should match the batch size of x, and each value will be used as target for its corresponding sample in x. If None, then gradients for all classes will be computed for each sample.
training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.
- Returns
Array of gradients of input features w.r.t. each class in the form (batch_size, nb_classes, input_shape) when computing for all classes, otherwise shape becomes (batch_size, 1, input_shape) when label parameter is specified.
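The shape convention above can be illustrated with a toy linear model in NumPy. This is only a sketch and does not call ART: toy_class_gradient and the weight matrix w are hypothetical names introduced for illustration. For logits x @ w, the derivative of the class k logit w.r.t. the input is column k of w, which makes the three label cases easy to check.

```python
import numpy as np


def toy_class_gradient(x, w, label=None):
    """Per-class derivatives of the logits x @ w with respect to x.

    x has shape (batch_size, nb_features); w has shape (nb_features, nb_classes).
    """
    batch_size = x.shape[0]
    # For a linear model, d logit_k / d x = w[:, k], independent of x.
    # Resulting shape: (batch_size, nb_classes, nb_features).
    all_grads = np.broadcast_to(w.T, (batch_size,) + w.T.shape)
    if label is None:
        # Gradients for all classes, for each sample.
        return all_grads.copy()
    if isinstance(label, (int, np.integer)):
        # Same class for every sample: (batch_size, 1, nb_features).
        return all_grads[:, [label], :].copy()
    # One class index per sample: (batch_size, 1, nb_features).
    label = np.asarray(label)
    if label.shape != (batch_size,):
        raise ValueError("label must have one entry per sample")
    return all_grads[np.arange(batch_size), label][:, None, :].copy()


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w = rng.normal(size=(3, 5))

grads_all = toy_class_gradient(x, w)                      # label=None
grads_one = toy_class_gradient(x, w, label=2)             # single integer
grads_per_sample = toy_class_gradient(x, w, label=[0, 1, 2, 3])
```

Here input_shape is (3,), so the three results have shapes (4, 5, 3), (4, 1, 3) and (4, 1, 3) respectively.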
- property clip_values: Optional[CLIP_VALUES_TYPE]
Return the clip values of the input samples.
- Returns
Clip values (min, max).
- compute_loss(x: numpy.ndarray, y: numpy.ndarray, **kwargs) → numpy.ndarray
Compute the loss of the neural network for samples x.
- Parameters
x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).
y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).
- Returns
Loss values.
- Return type
Format as expected by the model
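The two accepted label layouts for y can be converted into each other with plain NumPy. The helpers below, to_one_hot and to_indices, are hypothetical names used only for this sketch; they are not part of the API documented here.

```python
import numpy as np


def to_one_hot(indices, nb_classes):
    """Convert class indices of shape (nb_samples,) to one-hot (nb_samples, nb_classes)."""
    indices = np.asarray(indices, dtype=int)
    one_hot = np.zeros((indices.shape[0], nb_classes), dtype=np.float32)
    one_hot[np.arange(indices.shape[0]), indices] = 1.0
    return one_hot


def to_indices(labels):
    """Convert one-hot labels to class indices; index labels pass through unchanged."""
    labels = np.asarray(labels)
    return labels if labels.ndim == 1 else np.argmax(labels, axis=1)


# Detector labels: 0 = clean, 1 = adversarial.
y_idx = np.array([0, 1, 1, 0])
y_one_hot = to_one_hot(y_idx, nb_classes=2)
```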
- fit(*args, **kwargs)
Fit the detector using clean and adversarial samples.
- Parameters
x – Training set to fit the detector.
y – Labels for the training set.
batch_size – Size of batches.
nb_epochs – Number of epochs to use for training.
kwargs – Other parameters.
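A minimal sketch of how the training set for fit might be assembled, assuming clean and adversarial samples are already available as NumPy arrays. build_detector_training_set is a hypothetical helper, and the adversarial samples here are placeholders rather than the output of a real attack. Depending on the wrapped classifier, the labels may need to be one-hot encoded before fitting.

```python
import numpy as np


def build_detector_training_set(x_clean, x_adv, seed=0):
    """Stack clean and adversarial samples and label them 0 and 1 respectively."""
    x = np.concatenate([x_clean, x_adv], axis=0)
    y = np.concatenate([
        np.zeros(len(x_clean), dtype=int),  # label 0: clean
        np.ones(len(x_adv), dtype=int),     # label 1: adversarial
    ])
    # Shuffle so that batches mix both kinds of samples.
    order = np.random.default_rng(seed).permutation(len(x))
    return x[order], y[order]


# Placeholder data: zeros stand in for clean images, ones for adversarial ones.
x_clean = np.zeros((6, 2, 2, 1), dtype=np.float32)
x_adv = np.ones((4, 2, 2, 1), dtype=np.float32)
x_train, y_train = build_detector_training_set(x_clean, x_adv)

# With a detector instance, the call would then look like this
# (the batch size and epoch count are illustrative values):
# detector.fit(x_train, y_train, batch_size=32, nb_epochs=10)
```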
- fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None
Fit the classifier using a data generator that yields batches as specified. This function is not supported for this detector.
- Raises
NotImplementedError – This method is not supported for detectors.
- get_activations(x: numpy.ndarray, layer: Union[int, str], batch_size: int, framework: bool = False) → numpy.ndarray
Return the output of the specified layer for input x. layer is specified by layer index (between 0 and nb_layers - 1) or by name. The number of layers can be determined by counting the results returned by calling layer_names. This function is not supported for this detector.
- Raises
NotImplementedError – This method is not supported for detectors.
- property input_shape: Tuple[int, ...]
Return the shape of one input sample.
- Returns
Shape of one input sample.
- loss_gradient(x: numpy.ndarray, y: numpy.ndarray, training_mode: bool = False, **kwargs) → numpy.ndarray
Compute the gradient of the loss function w.r.t. x.
- Return type
ndarray
- Parameters
x (ndarray) – Sample input with shape as expected by the model.
y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).
training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.
- Returns
Array of gradients of the same shape as x.
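To make the "same shape as x" contract concrete, here is a sketch using a toy linear softmax model in NumPy. It does not call ART; toy_loss, toy_loss_gradient and the weight matrix w are assumptions made for illustration. For cross-entropy on logits x @ w, the gradient w.r.t. x is (p - y) @ w.T, where p is the softmax output.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def toy_loss(x, y_one_hot, w):
    """Per-sample cross-entropy loss of the linear model x @ w."""
    p = softmax(x @ w)
    return -np.sum(y_one_hot * np.log(p), axis=1)


def toy_loss_gradient(x, y_one_hot, w):
    """Gradient of the per-sample loss w.r.t. x; same shape as x."""
    p = softmax(x @ w)
    return (p - y_one_hot) @ w.T


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w = rng.normal(size=(3, 2))
y = np.eye(2)[[0, 1, 1, 0]]  # one-hot labels: 0 = clean, 1 = adversarial

grads = toy_loss_gradient(x, y, w)
```

A finite-difference check on individual entries of x is a cheap way to confirm that an analytic gradient like this one is wired up correctly.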
- property nb_classes: int
Return the number of output classes.
- Returns
Number of classes in the data.
- predict(*args, **kwargs)
Perform detection of adversarial data and return prediction as tuple.
- Parameters
x – Data sample on which to perform detection.
batch_size – Size of batches.
- Returns
Per-sample prediction whether data is adversarial or not, where 0 means non-adversarial. Return variable has the same batch_size (first dimension) as x.
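Downstream code usually wants a flat array of 0/1 flags. The sketch below assumes the raw output is either a two-column array of class scores or a single score per sample; the exact format depends on the wrapped classifier, so treat both branches as assumptions. to_adversarial_flags is a hypothetical helper, not part of the API documented here.

```python
import numpy as np


def to_adversarial_flags(predictions):
    """Map detector outputs to per-sample flags: 0 = non-adversarial, 1 = adversarial."""
    predictions = np.asarray(predictions)
    if predictions.ndim == 2 and predictions.shape[1] > 1:
        # Class scores of shape (batch_size, 2): pick the larger score.
        return np.argmax(predictions, axis=1)
    # Single score per sample: threshold at 0.5 (an illustrative choice).
    return (predictions.reshape(-1) > 0.5).astype(int)


scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
flags = to_adversarial_flags(scores)

# Keep only the samples flagged as non-adversarial.
x = np.arange(3 * 4, dtype=np.float32).reshape(3, 4)
x_kept = x[flags == 0]
```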
- save(filename: str, path: Optional[str] = None) → None
Save the detector model.
- Parameters
filename (str) – The name of the saved file.
path (Optional[str]) – The path to the location of the saved file.
Binary Activation Detector
- class art.defences.detector.evasion.BinaryActivationDetector(classifier: ClassifierNeuralNetwork, detector: ClassifierNeuralNetwork, layer: Union[int, str])
Binary detector of adversarial samples coming from evasion attacks. The detector uses an architecture provided by the user and is trained on the values of the activations of a classifier at a given layer.
- property channels_first: bool
- Returns
Boolean to indicate index of the color channels in the sample x.
- class_gradient(x: numpy.ndarray, label: Optional[Union[int, List[int]]] = None, training_mode: bool = False, **kwargs) → numpy.ndarray
Compute per-class derivatives w.r.t. x.
- Return type
ndarray
- Parameters
x (ndarray) – Sample input with shape as expected by the model.
label – Index of a specific per-class derivative. If an integer is provided, the gradient of that class output is computed for all samples. If multiple values are provided, the first dimension should match the batch size of x, and each value will be used as target for its corresponding sample in x. If None, then gradients for all classes will be computed for each sample.
training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.
- Returns
Array of gradients of input features w.r.t. each class in the form (batch_size, nb_classes, input_shape) when computing for all classes, otherwise shape becomes (batch_size, 1, input_shape) when label parameter is specified.
- property clip_values: Optional[CLIP_VALUES_TYPE]
Return the clip values of the input samples.
- Returns
Clip values (min, max).
- compute_loss(x: numpy.ndarray, y: numpy.ndarray, **kwargs) → numpy.ndarray
Compute the loss of the neural network for samples x.
- Parameters
x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).
y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).
- Returns
Loss values.
- Return type
Format as expected by the model
- fit(*args, **kwargs)
Fit the detector using training data.
- Parameters
x – Training set to fit the detector.
y – Labels for the training set.
batch_size – Size of batches.
nb_epochs – Number of epochs to use for training.
kwargs – Other parameters.
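The idea of training on activations rather than raw inputs can be sketched with a toy fully connected network in NumPy. Nothing below calls ART: toy_activations, the random weights, and the perturbed samples standing in for adversarial data are all assumptions made for illustration.

```python
import numpy as np


def toy_activations(x, weights, layer):
    """Return the output of layer `layer` of a toy fully connected ReLU network."""
    if not 0 <= layer < len(weights):
        raise ValueError("layer index out of range")
    h = x
    for w in weights[: layer + 1]:
        h = np.maximum(h @ w, 0.0)  # linear map followed by ReLU
    return h


rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 4))]

x_clean = rng.normal(size=(5, 8))
# Stand-in for adversarial samples: a small signed perturbation, not a real attack.
x_adv = x_clean + 0.1 * np.sign(rng.normal(size=(5, 8)))

x_all = np.concatenate([x_clean, x_adv], axis=0)
y_all = np.concatenate([np.zeros(5, dtype=int), np.ones(5, dtype=int)])

# The detector is trained on these activations, paired with y_all,
# instead of on the raw inputs x_all.
detector_features = toy_activations(x_all, weights, layer=0)
```

Note that the detector's expected input shape is therefore the shape of the chosen layer's output, here (16,), not the shape of the classifier's input.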
- fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None
Fit the classifier using a data generator that yields batches as specified. This function is not supported for this detector.
- Raises
NotImplementedError – This method is not supported for detectors.
- get_activations(x: numpy.ndarray, layer: Union[int, str], batch_size: int, framework: bool = False) → numpy.ndarray
Return the output of the specified layer for input x. layer is specified by layer index (between 0 and nb_layers - 1) or by name. The number of layers can be determined by counting the results returned by calling layer_names. This function is not supported for this detector.
- Raises
NotImplementedError – This method is not supported for detectors.
- property input_shape: Tuple[int, ...]
Return the shape of one input sample.
- Returns
Shape of one input sample.
- property layer_names: List[str]
Return the names of the hidden layers in the model, if applicable.
- Returns
The names of the hidden layers in the model, input and output layers are ignored.
Warning
layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.
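Because a layer can be given either by index or by name, and the inferred names carry no guarantees, it can help to validate the choice explicitly before using it. The helper below, resolve_layer, and the example layer names are hypothetical and serve only as a sketch.

```python
from typing import List, Union


def resolve_layer(layer: Union[int, str], layer_names: List[str]) -> int:
    """Return the index of `layer`, given either an index or a name."""
    if isinstance(layer, str):
        if layer not in layer_names:
            raise ValueError(f"Unknown layer name: {layer!r}")
        return layer_names.index(layer)
    # Valid indices run from 0 to nb_layers - 1.
    if not 0 <= layer < len(layer_names):
        raise ValueError(f"Layer index {layer} out of range")
    return layer


names = ["conv_1", "conv_2", "dense_1"]  # hypothetical layer names
index_by_name = resolve_layer("conv_2", names)
index_by_position = resolve_layer(2, names)
```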
- loss_gradient(x: numpy.ndarray, y: numpy.ndarray, training_mode: bool = False, **kwargs) → numpy.ndarray
Compute the gradient of the loss function w.r.t. x.
- Return type
ndarray
- Parameters
x (ndarray) – Sample input with shape as expected by the model.
y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).
training_mode (bool) – True for model set to training mode and False for model set to evaluation mode.
- Returns
Array of gradients of the same shape as x.
- property nb_classes: int
Return the number of output classes.
- Returns
Number of classes in the data.
- predict(*args, **kwargs)
Perform detection of adversarial data and return prediction as tuple.
- Parameters
x – Data sample on which to perform detection.
batch_size – Size of batches.
- Returns
Per-sample prediction whether data is adversarial or not, where 0 means non-adversarial. Return variable has the same batch_size (first dimension) as x.
- save(filename: str, path: Optional[str] = None) → None
Save the detector model.
- Parameters
filename (str) – The name of the saved file.
path (Optional[str]) – The path to the location of the saved file.