art.estimators.object_detection

Module containing estimators for object detection.

Mixin Base Class Object Detector

class art.estimators.object_detection.ObjectDetectorMixin

Mix-in base class for ART object detectors.

abstract property native_label_is_pytorch_format: bool

Are the native labels in PyTorch format [x1, y1, x2, y2]?

Object Detector PyTorch

class art.estimators.object_detection.PyTorchObjectDetector(model: torch.nn.Module, input_shape: Tuple[int, ...] = (-1, -1, -1), optimizer: torch.optim.Optimizer | None = None, clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool | None = True, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, attack_losses: Tuple[str, ...] = ('loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg'), device_type: str = 'gpu')

This class implements the task-specific estimator for PyTorch object detection models following the input and output formats of torchvision.

__init__(model: torch.nn.Module, input_shape: Tuple[int, ...] = (-1, -1, -1), optimizer: torch.optim.Optimizer | None = None, clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool | None = True, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, attack_losses: Tuple[str, ...] = ('loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg'), device_type: str = 'gpu')

Initialization.

Parameters:
  • model

    Object detection model. The output of the model is List[Dict[str, torch.Tensor]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

    • scores [N]: the scores of each prediction.

  • input_shape – The shape of one input sample.

  • optimizer – The optimizer for training the classifier.

  • clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.

  • channels_first – Set channels first or last.

  • preprocessing_defences – Preprocessing defence(s) to be applied by the classifier.

  • postprocessing_defences – Postprocessing defence(s) to be applied by the classifier.

  • preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.

  • attack_losses – Tuple of any combination of strings of loss components: ‘loss_classifier’, ‘loss_box_reg’, ‘loss_objectness’, and ‘loss_rpn_box_reg’.

  • device_type (str) – Type of device to be used for the model and tensors: if cpu, run on CPU; if gpu, run on GPU when available, otherwise fall back to CPU.
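As an illustration (not part of the original documentation), a minimal construction around a torchvision detection model might look like the sketch below; the choice of fasterrcnn_resnet50_fpn and the weights argument are assumptions that depend on the installed torchvision version:

    import torch
    import torchvision

    from art.estimators.object_detection import PyTorchObjectDetector

    # Any torchvision-style detector returning List[Dict[str, Tensor]] can be wrapped.
    # The weights argument follows recent torchvision releases; older versions use pretrained=True.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    detector = PyTorchObjectDetector(
        model=model,
        input_shape=(3, 416, 416),  # one NCHW sample; illustrative size
        optimizer=torch.optim.SGD(model.parameters(), lr=0.01),  # needed later for fit()
        clip_values=(0.0, 1.0),     # pixel values expected in [0, 1]
        channels_first=True,
        attack_losses=("loss_classifier", "loss_box_reg", "loss_objectness", "loss_rpn_box_reg"),
        device_type="gpu",
    )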

property attack_losses: Tuple[str, ...]

Return the combination of strings of the loss components.

Returns:

The combination of strings of the loss components.

property channels_first: bool
Returns:

Boolean indicating whether the color channels are the first dimension of the sample x.

property clip_values: CLIP_VALUES_TYPE | None

Return the clip values of the input samples.

Returns:

Clip values (min, max).

clone_for_refitting() ESTIMATOR_TYPE

Clone estimator for refitting.

compute_loss(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]], **kwargs) ndarray | torch.Tensor

Compute the loss of the neural network for samples x.

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Loss.

compute_loss_from_predictions(pred: ndarray, y: ndarray, **kwargs) ndarray

Compute the loss of the estimator for predictions pred.

Return type:

ndarray

Parameters:
  • pred (ndarray) – Model predictions.

  • y (ndarray) – Target values.

Returns:

Loss values.

compute_losses(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]]) Dict[str, ndarray]

Compute all loss components.

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Dictionary of loss components.

property device: torch.device

Get current used device.

Returns:

Current used device.

property device_type: str

Return the type of device on which the estimator is run.

Returns:

Type of device on which the estimator is run, either gpu or cpu.

fit(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]], batch_size: int = 128, nb_epochs: int = 10, drop_last: bool = False, scheduler: torch.optim.lr_scheduler._LRScheduler | None = None, **kwargs) None

Fit the classifier on the training set (x, y).

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • drop_last (bool) – Set to True to drop the last incomplete batch if the dataset size is not divisible by the batch size. If False and the dataset size is not divisible by the batch size, the last batch will be smaller. (default: False)

  • scheduler – Learning rate scheduler to run at the start of every epoch.

  • kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch and providing it has no effect.
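For example, a toy fit call could look like the following sketch; the arrays are synthetic placeholders and the estimator is assumed to have been created with an optimizer:

    import numpy as np

    # Two synthetic NCHW images in [0, 1] with one ground-truth box each.
    x = np.random.rand(2, 3, 416, 416).astype(np.float32)
    y = [
        {"boxes": np.array([[20.0, 30.0, 120.0, 140.0]], dtype=np.float32),  # [x1, y1, x2, y2]
         "labels": np.array([1], dtype=np.int64)},
        {"boxes": np.array([[50.0, 60.0, 200.0, 210.0]], dtype=np.float32),
         "labels": np.array([3], dtype=np.int64)},
    ]

    detector.fit(x, y, batch_size=2, nb_epochs=1)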

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) None

Fit the estimator using a generator yielding training batches. Implementations can provide framework-specific versions of this function to speed up computation.

Parameters:
  • generator – Batch generator providing (x, y) for each epoch.

  • nb_epochs (int) – Number of training epochs.

get_activations(x: ndarray, layer: int | str, batch_size: int, framework: bool = False) ndarray

Return the output of a specific layer for samples x, where layer is either the index of the layer between 0 and nb_layers - 1 or the name of the layer. The number of layers can be determined by counting the results returned by calling layer_names.

Return type:

ndarray

Parameters:
  • x (ndarray) – Samples

  • layer – Index or name of the layer.

  • batch_size (int) – Batch size.

  • framework (bool) – If true, return the intermediate tensor representation of the activation.

Returns:

The output of layer, where the first dimension is the batch size corresponding to x.

get_params() Dict[str, Any]

Get all parameters and their values of this estimator.

Returns:

A dictionary of string parameter names to their value.

property input_shape: Tuple[int, ...]

Return the shape of one input sample.

Returns:

Shape of one input sample.

property layer_names: List[str] | None

Return the names of the hidden layers in the model, if applicable.

Returns:

The names of the hidden layers in the model, input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

loss_gradient(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]], **kwargs) ndarray

Compute the gradient of the loss function w.r.t. x.

Return type:

ndarray

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Loss gradients of the same shape as x.
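As a sketch of how the gradient can be used (a single FGSM-style step, not part of this API), assuming x and y are formatted as above and pixel values lie in [0, 1]:

    import numpy as np

    eps = 8.0 / 255.0                     # illustrative perturbation budget
    grads = detector.loss_gradient(x, y)  # same shape as x
    x_adv = np.clip(x + eps * np.sign(grads), 0.0, 1.0)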

property model: torch.nn.Module

Return the model.

Returns:

The model.

property native_label_is_pytorch_format: bool

Are the native labels in PyTorch format [x1, y1, x2, y2]?

property optimizer: torch.optim.Optimizer | None

Return the optimizer.

Returns:

The optimizer.

predict(x: ndarray, batch_size: int = 128, **kwargs) List[Dict[str, ndarray]]

Perform prediction for a batch of inputs.

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • batch_size (int) – Batch size.

Returns:

Predictions of format List[Dict[str, np.ndarray]], one for each input image. The fields of the Dict are as follows:

  • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

  • labels [N]: the labels for each image.

  • scores [N]: the scores of each prediction.
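For instance, predictions can be filtered by score before use; the 0.5 threshold below is an arbitrary illustrative choice and x reuses the synthetic input from the fit example:

    predictions = detector.predict(x, batch_size=2)
    for i, pred in enumerate(predictions):
        keep = pred["scores"] > 0.5
        print(f"image {i}: {int(keep.sum())} detections")
        print("  boxes:", pred["boxes"][keep])    # [x1, y1, x2, y2] per detection
        print("  labels:", pred["labels"][keep])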

set_batchnorm(train: bool) None

Set all batch normalization layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_dropout(train: bool) None

Set all dropout layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_multihead_attention(train: bool) None

Set all multi-head attention layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_params(**kwargs) None

Take a dictionary of parameters and apply checks before setting them as attributes.

Parameters:

kwargs – A dictionary of attributes.

Object Detector PyTorch Faster-RCNN

class art.estimators.object_detection.PyTorchFasterRCNN(model: torchvision.models.detection.FasterRCNN | None = None, input_shape: Tuple[int, ...] = (-1, -1, -1), optimizer: torch.optim.Optimizer | None = None, clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool | None = True, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, attack_losses: Tuple[str, ...] = ('loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg'), device_type: str = 'gpu')

This class implements a model-specific object detector using Faster R-CNN and PyTorch following the input and output formats of torchvision.

__init__(model: torchvision.models.detection.FasterRCNN | None = None, input_shape: Tuple[int, ...] = (-1, -1, -1), optimizer: torch.optim.Optimizer | None = None, clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool | None = True, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, attack_losses: Tuple[str, ...] = ('loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg'), device_type: str = 'gpu')

Initialization.

Parameters:
  • model

    Faster R-CNN model. The output of the model is List[Dict[str, torch.Tensor]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

    • scores [N]: the scores of each prediction.

  • input_shape – The shape of one input sample.

  • optimizer – The optimizer for training the classifier.

  • clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.

  • channels_first – Set channels first or last.

  • preprocessing_defences – Preprocessing defence(s) to be applied by the classifier.

  • postprocessing_defences – Postprocessing defence(s) to be applied by the classifier.

  • preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.

  • attack_losses – Tuple of any combination of strings of loss components: ‘loss_classifier’, ‘loss_box_reg’, ‘loss_objectness’, and ‘loss_rpn_box_reg’.

  • device_type (str) – Type of device to be used for the model and tensors: if cpu, run on CPU; if gpu, run on GPU when available, otherwise fall back to CPU.
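As an illustration, and assuming that omitting model falls back to a pretrained torchvision Faster R-CNN (as the optional model parameter above suggests), a minimal construction could be:

    from art.estimators.object_detection import PyTorchFasterRCNN

    # model=None: the estimator is assumed to fall back to a pretrained
    # torchvision Faster R-CNN (fasterrcnn_resnet50_fpn).
    frcnn = PyTorchFasterRCNN(
        clip_values=(0.0, 1.0),
        channels_first=True,
        attack_losses=("loss_classifier", "loss_box_reg", "loss_objectness", "loss_rpn_box_reg"),
        device_type="gpu",
    )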

property attack_losses: Tuple[str, ...]

Return the combination of strings of the loss components.

Returns:

The combination of strings of the loss components.

property channels_first: bool
Returns:

Boolean indicating whether the color channels are the first dimension of the sample x.

property clip_values: CLIP_VALUES_TYPE | None

Return the clip values of the input samples.

Returns:

Clip values (min, max).

clone_for_refitting() ESTIMATOR_TYPE

Clone estimator for refitting.

compute_loss(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]], **kwargs) ndarray | torch.Tensor

Compute the loss of the neural network for samples x.

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Loss.

compute_loss_from_predictions(pred: ndarray, y: ndarray, **kwargs) ndarray

Compute the loss of the estimator for predictions pred.

Return type:

ndarray

Parameters:
  • pred (ndarray) – Model predictions.

  • y (ndarray) – Target values.

Returns:

Loss values.

compute_losses(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]]) Dict[str, ndarray]

Compute all loss components.

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Dictionary of loss components.

property device: torch.device

Get current used device.

Returns:

Current used device.

property device_type: str

Return the type of device on which the estimator is run.

Returns:

Type of device on which the estimator is run, either gpu or cpu.

fit(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]], batch_size: int = 128, nb_epochs: int = 10, drop_last: bool = False, scheduler: torch.optim.lr_scheduler._LRScheduler | None = None, **kwargs) None

Fit the classifier on the training set (x, y).

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • drop_last (bool) – Set to True to drop the last incomplete batch if the dataset size is not divisible by the batch size. If False and the dataset size is not divisible by the batch size, the last batch will be smaller. (default: False)

  • scheduler – Learning rate scheduler to run at the start of every epoch.

  • kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch and providing it has no effect.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) None

Fit the estimator using a generator yielding training batches. Implementations can provide framework-specific versions of this function to speed up computation.

Parameters:
  • generator – Batch generator providing (x, y) for each epoch.

  • nb_epochs (int) – Number of training epochs.

get_activations(x: ndarray, layer: int | str, batch_size: int, framework: bool = False) ndarray

Return the output of a specific layer for samples x, where layer is either the index of the layer between 0 and nb_layers - 1 or the name of the layer. The number of layers can be determined by counting the results returned by calling layer_names.

Return type:

ndarray

Parameters:
  • x (ndarray) – Samples

  • layer – Index or name of the layer.

  • batch_size (int) – Batch size.

  • framework (bool) – If true, return the intermediate tensor representation of the activation.

Returns:

The output of layer, where the first dimension is the batch size corresponding to x.

get_params() Dict[str, Any]

Get all parameters and their values of this estimator.

Returns:

A dictionary of string parameter names to their value.

property input_shape: Tuple[int, ...]

Return the shape of one input sample.

Returns:

Shape of one input sample.

property layer_names: List[str] | None

Return the names of the hidden layers in the model, if applicable.

Returns:

The names of the hidden layers in the model, input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

loss_gradient(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]], **kwargs) ndarray

Compute the gradient of the loss function w.r.t. x.

Return type:

ndarray

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Loss gradients of the same shape as x.

property model: torch.nn.Module

Return the model.

Returns:

The model.

property native_label_is_pytorch_format: bool

Are the native labels in PyTorch format [x1, y1, x2, y2]?

property optimizer: torch.optim.Optimizer | None

Return the optimizer.

Returns:

The optimizer.

predict(x: ndarray, batch_size: int = 128, **kwargs) List[Dict[str, ndarray]]

Perform prediction for a batch of inputs.

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • batch_size (int) – Batch size.

Returns:

Predictions of format List[Dict[str, np.ndarray]], one for each input image. The fields of the Dict are as follows:

  • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

  • labels [N]: the labels for each image.

  • scores [N]: the scores of each prediction.

set_batchnorm(train: bool) None

Set all batch normalization layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_dropout(train: bool) None

Set all dropout layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_multihead_attention(train: bool) None

Set all multi-head attention layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_params(**kwargs) None

Take a dictionary of parameters and apply checks before setting them as attributes.

Parameters:

kwargs – A dictionary of attributes.

Object Detector PyTorch YOLO

class art.estimators.object_detection.PyTorchYolo(model: torch.nn.Module, input_shape: Tuple[int, ...] = (3, 416, 416), optimizer: torch.optim.Optimizer | None = None, clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool | None = True, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, attack_losses: Tuple[str, ...] = ('loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg'), device_type: str = 'gpu')

This class implements the model- and task-specific estimator for YOLO v3 and v5 object detection models in PyTorch.

__init__(model: torch.nn.Module, input_shape: Tuple[int, ...] = (3, 416, 416), optimizer: torch.optim.Optimizer | None = None, clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool | None = True, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, attack_losses: Tuple[str, ...] = ('loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg'), device_type: str = 'gpu')

Initialization.

Parameters:
  • model

    YOLO v3 or v5 model wrapped as demonstrated in examples/get_started_yolo.py. The output of the model is List[Dict[str, torch.Tensor]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

    • scores [N]: the scores of each prediction.

  • input_shape – The shape of one input sample.

  • optimizer – The optimizer for training the classifier.

  • clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.

  • channels_first – Set channels first or last.

  • preprocessing_defences – Preprocessing defence(s) to be applied by the classifier.

  • postprocessing_defences – Postprocessing defence(s) to be applied by the classifier.

  • preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.

  • attack_losses – Tuple of any combination of strings of loss components: ‘loss_classifier’, ‘loss_box_reg’, ‘loss_objectness’, and ‘loss_rpn_box_reg’.

  • device_type (str) – Type of device to be used for the model and tensors: if cpu, run on CPU; if gpu, run on GPU when available, otherwise fall back to CPU.
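A schematic sketch of the wrapper interface described above, modelled loosely on examples/get_started_yolo.py; the inner YOLO model (my_yolo_model) and both helper functions are placeholders, not real APIs:

    import torch

    from art.estimators.object_detection import PyTorchYolo


    class YoloWrapper(torch.nn.Module):
        """Adapts a YOLO model to the calling convention expected by PyTorchYolo."""

        def __init__(self, yolo_model):
            super().__init__()
            self.model = yolo_model

        def forward(self, x, targets=None):
            if self.training:
                # Training mode: return the loss components named in attack_losses.
                loss = compute_yolo_loss(self.model(x), targets)  # hypothetical helper
                return {"loss_total": loss}
            # Inference mode: return List[Dict[str, Tensor]] with boxes/labels/scores.
            return to_detection_dicts(self.model(x))              # hypothetical helper


    detector = PyTorchYolo(
        model=YoloWrapper(my_yolo_model),  # my_yolo_model is a placeholder
        input_shape=(3, 416, 416),
        clip_values=(0.0, 1.0),
        attack_losses=("loss_total",),
    )

Note that the default attack_losses in the signature above use Faster R-CNN loss names; when wrapping a YOLO model the tuple should match the keys returned by the wrapper, as in this sketch.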

property attack_losses: Tuple[str, ...]

Return the combination of strings of the loss components.

Returns:

The combination of strings of the loss components.

property channels_first: bool
Returns:

Boolean indicating whether the color channels are the first dimension of the sample x.

property clip_values: CLIP_VALUES_TYPE | None

Return the clip values of the input samples.

Returns:

Clip values (min, max).

clone_for_refitting() ESTIMATOR_TYPE

Clone estimator for refitting.

compute_loss(x: ndarray | torch.Tensor, y: List[Dict[str, ndarray | torch.Tensor]], **kwargs) ndarray | torch.Tensor

Compute the loss of the neural network for samples x.

Parameters:
  • x – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Loss.

compute_loss_from_predictions(pred: ndarray, y: ndarray, **kwargs) ndarray

Compute the loss of the estimator for predictions pred.

Return type:

ndarray

Parameters:
  • pred (ndarray) – Model predictions.

  • y (ndarray) – Target values.

Returns:

Loss values.

compute_losses(x: ndarray | torch.Tensor, y: List[Dict[str, ndarray | torch.Tensor]]) Dict[str, ndarray]

Compute all loss components.

Parameters:
  • x – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Dictionary of loss components.

property device: torch.device

Get current used device.

Returns:

Current used device.

property device_type: str

Return the type of device on which the estimator is run.

Returns:

Type of device on which the estimator is run, either gpu or cpu.

fit(x: ndarray, y: List[Dict[str, ndarray | torch.Tensor]], batch_size: int = 128, nb_epochs: int = 10, drop_last: bool = False, scheduler: torch.optim.lr_scheduler._LRScheduler | None = None, **kwargs) None

Fit the classifier on the training set (x, y).

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

  • batch_size (int) – Size of batches.

  • nb_epochs (int) – Number of epochs to use for training.

  • drop_last (bool) – Set to True to drop the last incomplete batch if the dataset size is not divisible by the batch size. If False and the dataset size is not divisible by the batch size, the last batch will be smaller. (default: False)

  • scheduler – Learning rate scheduler to run at the start of every epoch.

  • kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch and providing it has no effect.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) None

Fit the estimator using a generator yielding training batches. Implementations can provide framework-specific versions of this function to speed up computation.

Parameters:
  • generator – Batch generator providing (x, y) for each epoch.

  • nb_epochs (int) – Number of training epochs.

get_activations(x: ndarray, layer: int | str, batch_size: int, framework: bool = False) ndarray

Return the output of a specific layer for samples x, where layer is either the index of the layer between 0 and nb_layers - 1 or the name of the layer. The number of layers can be determined by counting the results returned by calling layer_names.

Return type:

ndarray

Parameters:
  • x (ndarray) – Samples

  • layer – Index or name of the layer.

  • batch_size (int) – Batch size.

  • framework (bool) – If true, return the intermediate tensor representation of the activation.

Returns:

The output of layer, where the first dimension is the batch size corresponding to x.

get_params() Dict[str, Any]

Get all parameters and their values of this estimator.

Returns:

A dictionary of string parameter names to their value.

property input_shape: Tuple[int, ...]

Return the shape of one input sample.

Returns:

Shape of one input sample.

property layer_names: List[str] | None

Return the names of the hidden layers in the model, if applicable.

Returns:

The names of the hidden layers in the model, input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

loss_gradient(x: ndarray | torch.Tensor, y: List[Dict[str, ndarray | torch.Tensor]], **kwargs) ndarray | torch.Tensor

Compute the gradient of the loss function w.r.t. x.

Parameters:
  • x – Samples of shape NCHW or NHWC.

  • y

    Target values of format List[Dict[str, Union[np.ndarray, torch.Tensor]]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image.

Returns:

Loss gradients of the same shape as x.

property model: torch.nn.Module

Return the model.

Returns:

The model.

property native_label_is_pytorch_format: bool

Return whether the native labels are in PyTorch format [x1, y1, x2, y2].

Returns:

Are the native labels in PyTorch format [x1, y1, x2, y2]?

property optimizer: torch.optim.Optimizer | None

Return the optimizer.

Returns:

The optimizer.

predict(x: ndarray, batch_size: int = 128, **kwargs) List[Dict[str, ndarray]]

Perform prediction for a batch of inputs.

Parameters:
  • x (ndarray) – Samples of shape NCHW or NHWC.

  • batch_size (int) – Batch size.

Returns:

Predictions of format List[Dict[str, np.ndarray]], one for each input image. The fields of the Dict are as follows:

  • boxes [N, 4]: the boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

  • labels [N]: the labels for each image.

  • scores [N]: the scores of each prediction.

set_batchnorm(train: bool) None

Set all batch normalization layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_dropout(train: bool) None

Set all dropout layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_multihead_attention(train: bool) None

Set all multi-head attention layers into train or eval mode.

Parameters:

train (bool) – False for evaluation mode.

set_params(**kwargs) None

Take a dictionary of parameters and apply checks before setting them as attributes.

Parameters:

kwargs – A dictionary of attributes.

Object Detector TensorFlow Faster-RCNN

class art.estimators.object_detection.TensorFlowFasterRCNN(images: tf.Tensor, model: FasterRCNNMetaArch | None = None, filename: str | None = None, url: str | None = None, sess: Session | None = None, is_training: bool = False, clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool = False, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = (0.0, 1.0), attack_losses: Tuple[str, ...] = ('Loss/RPNLoss/localization_loss', 'Loss/RPNLoss/objectness_loss', 'Loss/BoxClassifierLoss/localization_loss', 'Loss/BoxClassifierLoss/classification_loss'))

This class implements a model-specific object detector using Faster-RCNN and TensorFlow.

__init__(images: tf.Tensor, model: FasterRCNNMetaArch | None = None, filename: str | None = None, url: str | None = None, sess: Session | None = None, is_training: bool = False, clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool = False, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = (0.0, 1.0), attack_losses: Tuple[str, ...] = ('Loss/RPNLoss/localization_loss', 'Loss/RPNLoss/objectness_loss', 'Loss/BoxClassifierLoss/localization_loss', 'Loss/BoxClassifierLoss/classification_loss'))

Initialization of an instance of TensorFlowFasterRCNN.

Parameters:
  • images – Input samples of shape (nb_samples, height, width, nb_channels).

  • model

    A TensorFlow Faster-RCNN model. The output that can be computed from the model includes a tuple of (predictions, losses, detections):

    • predictions: a dictionary holding “raw” prediction tensors.

    • losses: a dictionary mapping loss keys (Loss/RPNLoss/localization_loss, Loss/RPNLoss/objectness_loss, Loss/BoxClassifierLoss/localization_loss, Loss/BoxClassifierLoss/classification_loss) to scalar tensors representing corresponding loss values.

    • detections: a dictionary containing final detection results.

  • filename – Filename of the detection model without filename extension.

  • url – URL to download archive of detection model including filename extension.

  • sess – Computation session.

  • is_training (bool) – A boolean indicating whether the training version of the computation graph should be constructed.

  • clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for input image features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.

  • channels_first (bool) – Set channels first or last.

  • preprocessing_defences – Preprocessing defence(s) to be applied by the classifier.

  • postprocessing_defences – Postprocessing defence(s) to be applied by the classifier.

  • preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.

  • attack_losses – Tuple of any combination of strings of the following loss components: first_stage_localization_loss, first_stage_objectness_loss, second_stage_localization_loss, second_stage_classification_loss.
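An illustrative construction sketch, assuming a TensorFlow 1.x-compatible graph/session setup with the TensorFlow Object Detection API installed; the placeholder shape and the fallback to a downloaded default checkpoint when no model is given are assumptions:

    import tensorflow.compat.v1 as tf

    from art.estimators.object_detection import TensorFlowFasterRCNN

    tf.disable_eager_execution()  # the estimator works on a TF1-style graph

    # Placeholder for a batch of NHWC input images with pixel values in [0, 1].
    images = tf.placeholder(tf.float32, shape=[1, 640, 640, 3])

    frcnn = TensorFlowFasterRCNN(
        images=images,
        clip_values=(0.0, 1.0),
        attack_losses=(
            "Loss/RPNLoss/localization_loss",
            "Loss/RPNLoss/objectness_loss",
            "Loss/BoxClassifierLoss/localization_loss",
            "Loss/BoxClassifierLoss/classification_loss",
        ),
    )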

property channels_first: bool
Returns:

Boolean indicating whether the color channels are the first dimension of the sample x.

property clip_values: CLIP_VALUES_TYPE | None

Return the clip values of the input samples.

Returns:

Clip values (min, max).

clone_for_refitting() ESTIMATOR_TYPE

Clone estimator for refitting.

compute_loss(x: ndarray, y: ndarray, **kwargs) ndarray

Compute the loss.

Return type:

ndarray

Parameters:
  • x (ndarray) – Sample input with shape as expected by the model.

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns:

Array of losses of the same shape as x.

compute_loss_from_predictions(pred: ndarray, y: ndarray, **kwargs) ndarray

Compute the loss of the estimator for predictions pred.

Return type:

ndarray

Parameters:
  • pred (ndarray) – Model predictions.

  • y (ndarray) – Target values.

Returns:

Loss values.

compute_losses(x: ndarray, y: ndarray) Dict[str, ndarray]

Compute all loss components.

Parameters:
  • x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).

  • y (ndarray) – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns:

Dictionary of loss components.

property detections: Dict[str, tf.Tensor]

Get the _detections attribute.

Returns:

A dictionary containing final detection results.

fit(x: ndarray, y, batch_size: int = 128, nb_epochs: int = 20, **kwargs) None

Fit the model of the estimator on the training data x and y.

Parameters:
  • x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).

  • y (Format as expected by the model) – Target values.

  • batch_size (int) – Batch size.

  • nb_epochs (int) – Number of training epochs.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) None

Fit the estimator using a generator yielding training batches. Implementations can provide framework-specific versions of this function to speed up computation.

Parameters:
  • generator – Batch generator providing (x, y) for each epoch.

  • nb_epochs (int) – Number of training epochs.

get_activations(x: ndarray, layer: int | str, batch_size: int, framework: bool = False) ndarray

Return the output of a specific layer for samples x, where layer is either the index of the layer between 0 and nb_layers - 1 or the name of the layer. The number of layers can be determined by counting the results returned by calling layer_names.

Return type:

ndarray

Parameters:
  • x (ndarray) – Samples

  • layer – Index or name of the layer.

  • batch_size (int) – Batch size.

  • framework (bool) – If true, return the intermediate tensor representation of the activation.

Returns:

The output of layer, where the first dimension is the batch size corresponding to x.

get_params() Dict[str, Any]

Get all parameters and their values of this estimator.

Returns:

A dictionary of string parameter names to their value.

property input_images: tf.Tensor

Get the images attribute.

Returns:

The input image tensor.

property input_shape: Tuple[int, ...]

Return the shape of one input sample.

Returns:

Shape of one input sample.

property layer_names: List[str] | None

Return the names of the hidden layers in the model, if applicable.

Returns:

The names of the hidden layers in the model, input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

loss_gradient(x: ndarray, y: List[Dict[str, ndarray]], standardise_output: bool = False, **kwargs) ndarray

Compute the gradient of the loss function w.r.t. x.

Return type:

ndarray

Parameters:
  • x (ndarray) – Samples of shape (nb_samples, height, width, nb_channels).

  • y

    Targets of format List[Dict[str, np.ndarray]], one for each input image. The fields of the Dict are as follows:

    • boxes [N, 4]: the boxes in [y1, x1, y2, x2] format in scale [0, 1] (standardise_output=False) or [x1, y1, x2, y2] format in image scale (standardise_output=True), with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

    • labels [N]: the labels for each image in TensorFlow (standardise_output=False) or PyTorch (standardise_output=True) format.

  • standardise_output (bool) – True if y is provided in standardised PyTorch format. Box coordinates will be scaled back to [0, 1], label index will be decreased by 1 and the boxes will be changed from [x1, y1, x2, y2] to [y1, x1, y2, x2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

Returns:

Loss gradients of the same shape as x.

property losses: Dict[str, tf.Tensor]

Get the _losses attribute.

Returns:

A dictionary mapping loss keys (Loss/RPNLoss/localization_loss, Loss/RPNLoss/objectness_loss, Loss/BoxClassifierLoss/localization_loss, Loss/BoxClassifierLoss/classification_loss) to scalar tensors representing corresponding loss values.

property model

Return the model.

Returns:

The model.

property native_label_is_pytorch_format: bool

Are the native labels in PyTorch format [x1, y1, x2, y2]?

predict(x: ndarray, batch_size: int = 128, standardise_output: bool = False, **kwargs) List[Dict[str, ndarray]]

Perform prediction for a batch of inputs.

Parameters:
  • x (ndarray) – Samples of shape (nb_samples, height, width, nb_channels).

  • batch_size (int) – Batch size.

  • standardise_output (bool) – True if output should be standardised to PyTorch format. Box coordinates will be scaled from [0, 1] to image dimensions, label index will be increased by 1 to adhere to COCO categories and the boxes will be changed to [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.

Returns:

Predictions of format List[Dict[str, np.ndarray]], one for each input image. The fields of the Dict are as follows:

  • boxes [N, 4]: the boxes in [y1, x1, y2, x2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H. Can be changed to PyTorch format with standardise_output=True.

  • labels [N]: the labels for each image in TensorFlow format. Can be changed to PyTorch format with standardise_output=True.

  • scores [N]: the scores of each prediction.
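For example (illustrative input only, reusing the frcnn instance sketched above), standardise_output=True returns the boxes in PyTorch-style [x1, y1, x2, y2] image coordinates:

    import numpy as np

    x = np.random.rand(1, 640, 640, 3).astype(np.float32)  # synthetic NHWC input
    preds = frcnn.predict(x, batch_size=1, standardise_output=True)
    print(preds[0]["boxes"][:5])   # [x1, y1, x2, y2] in image scale
    print(preds[0]["labels"][:5])  # labels shifted to the PyTorch/COCO convention
    print(preds[0]["scores"][:5])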

property predictions: Dict[str, tf.Tensor]

Get the _predictions attribute.

Returns:

A dictionary holding “raw” prediction tensors.

property sess: tf.python.client.session.Session

Get current TensorFlow session.

Returns:

The current TensorFlow session.

set_params(**kwargs) None

Take a dictionary of parameters and apply checks before setting them as attributes.

Parameters:

kwargs – A dictionary of attributes.