art.attacks.extraction

Module providing extraction attacks under a common interface.

Copycat CNN

class art.attacks.extraction.CopycatCNN(classifier: CLASSIFIER_TYPE, batch_size_fit: int = 1, batch_size_query: int = 1, nb_epochs: int = 10, nb_stolen: int = 1, use_probability: bool = False)

Implementation of the Copycat CNN attack from Rodrigues Correia-Silva et al. (2018).

__init__(classifier: CLASSIFIER_TYPE, batch_size_fit: int = 1, batch_size_query: int = 1, nb_epochs: int = 10, nb_stolen: int = 1, use_probability: bool = False) → None

Create a Copycat CNN attack instance.

Parameters:
  • classifier – A victim classifier.

  • batch_size_fit (int) – Size of batches for fitting the thieved classifier.

  • batch_size_query (int) – Size of batches for querying the victim classifier.

  • nb_epochs (int) – Number of epochs to use for training.

  • nb_stolen (int) – Number of queries submitted to the victim classifier to steal it.

  • use_probability (bool) – If True, train the thieved classifier on the victim's predicted probabilities; if False (default), on one-hot labels derived from the victim's predicted classes.

extract(x: ndarray, y: ndarray | None = None, **kwargs) → CLASSIFIER_TYPE

Extract a thieved classifier.

Parameters:
  • x (ndarray) – An array with the source input to the victim classifier.

  • y – Target values (class labels), either one-hot-encoded of shape (nb_samples, nb_classes) or as class indices of shape (nb_samples,). Not used in this attack.

  • thieved_classifier (Classifier) – The classifier to be trained as the stolen copy of the victim; with the default use_probability=False it is trained on one-hot labels derived from the victim's predictions.

Returns:

The stolen classifier.
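
A minimal usage sketch (the PyTorch victim network, data, and hyper-parameters below are illustrative assumptions, not part of this API):

import numpy as np
import torch.nn as nn
import torch.optim as optim

from art.attacks.extraction import CopycatCNN
from art.estimators.classification import PyTorchClassifier

def make_classifier() -> PyTorchClassifier:
    # Hypothetical architecture; any ART classifier with a compatible
    # input_shape and nb_classes would do.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
    return PyTorchClassifier(
        model=model,
        loss=nn.CrossEntropyLoss(),
        optimizer=optim.Adam(model.parameters(), lr=1e-3),
        input_shape=(1, 28, 28),
        nb_classes=10,
    )

victim = make_classifier()   # assumed to have been trained beforehand
thieved = make_classifier()  # fresh model that will receive the stolen knowledge

attack = CopycatCNN(
    classifier=victim,
    batch_size_fit=64,
    batch_size_query=64,
    nb_epochs=5,
    nb_stolen=1000,  # number of victim queries
)

# Unlabeled "stealing" set; the attack labels it by querying the victim.
x_steal = np.random.rand(1000, 1, 28, 28).astype(np.float32)
stolen = attack.extract(x=x_steal, thieved_classifier=thieved)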

Functionally Equivalent Extraction

class art.attacks.extraction.FunctionallyEquivalentExtraction(classifier: CLASSIFIER_TYPE, num_neurons: int | None = None)

Implementation of the Functionally Equivalent Extraction attack from Jagielski et al. (2019) for neural networks with two dense layers, ReLU activation at the first layer, and logits output after the second layer.

__init__(classifier: CLASSIFIER_TYPE, num_neurons: int | None = None) → None

Create a FunctionallyEquivalentExtraction instance.

Parameters:
  • classifier – A trained ART classifier.

  • num_neurons – The number of neurons in the first dense layer.

extract(x: ndarray, y: ndarray | None = None, delta_0: float = 0.05, fraction_true: float = 0.3, rel_diff_slope: float = 1e-05, rel_diff_value: float = 1e-06, delta_init_value: float = 0.1, delta_value_max: int = 50, d2_min: float = 0.0004, d_step: float = 0.01, delta_sign: float = 0.02, unit_vector_scale: int = 10000, ftol: float = 1e-08, **kwargs) → BlackBoxClassifier

Extract the targeted model.

Parameters:
  • x (ndarray) – Samples of input data of shape (num_samples, num_features).

  • y – Correct labels or target labels for x. Not used in this attack.

  • delta_0 (float) – Initial step size of binary search.

  • fraction_true (float) – Fraction of output predictions that must fulfill the criteria for a critical point.

  • rel_diff_slope (float) – Relative slope difference at critical points.

  • rel_diff_value (float) – Relative value difference at critical points.

  • delta_init_value (float) – Initial delta of weight value search.

  • delta_value_max (int) – Maximum delta of weight value search.

  • d2_min (float) – Minimum acceptable value of sum of absolute second derivatives.

  • d_step (float) – Step size of delta increase.

  • delta_sign (float) – Delta of weight sign search.

  • unit_vector_scale (int) – Multiplicative scale of the unit vector e_j.

  • ftol (float) – Tolerance for termination by the change of the cost function.

Returns:

ART BlackBoxClassifier of the extracted model.

Return type:

BlackBoxClassifier
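
A minimal usage sketch (the victim network, data, and sizes below are illustrative assumptions; the attack applies only to the two-dense-layer ReLU architecture described above):

import numpy as np
import torch.nn as nn
import torch.optim as optim

from art.attacks.extraction import FunctionallyEquivalentExtraction
from art.estimators.classification import PyTorchClassifier

# Victim of the required form: logits = W_1 @ relu(W_0 @ x + b_0) + b_1.
model = nn.Sequential(nn.Linear(784, 16), nn.ReLU(), nn.Linear(16, 10))
victim = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(784,),
    nb_classes=10,
)
# ... train victim on real data here; a well-trained model is assumed,
# since the critical-point search may not converge otherwise.

attack = FunctionallyEquivalentExtraction(classifier=victim, num_neurons=16)

x = np.random.rand(100, 784).astype(np.float32)  # flat feature vectors
stolen = attack.extract(x)     # returns a BlackBoxClassifier
preds = stolen.predict(x[:5])  # query the extracted model like any classifier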

Knockoff Nets

class art.attacks.extraction.KnockoffNets(classifier: CLASSIFIER_TYPE, batch_size_fit: int = 1, batch_size_query: int = 1, nb_epochs: int = 10, nb_stolen: int = 1, sampling_strategy: str = 'random', reward: str = 'all', verbose: bool = True, use_probability: bool = False)

Implementation of the Knockoff Nets attack from Orekondy et al. (2018).

__init__(classifier: CLASSIFIER_TYPE, batch_size_fit: int = 1, batch_size_query: int = 1, nb_epochs: int = 10, nb_stolen: int = 1, sampling_strategy: str = 'random', reward: str = 'all', verbose: bool = True, use_probability: bool = False) → None

Create a KnockoffNets attack instance. Note: it is assumed that both the victim classifier and the thieved classifier produce logit outputs.

Parameters:
  • classifier – A victim classifier.

  • batch_size_fit (int) – Size of batches for fitting the thieved classifier.

  • batch_size_query (int) – Size of batches for querying the victim classifier.

  • nb_epochs (int) – Number of epochs to use for training.

  • nb_stolen (int) – Number of queries submitted to the victim classifier to steal it.

  • sampling_strategy (str) – Sampling strategy, either 'random' or 'adaptive'.

  • reward (str) – Reward type, one of 'cert', 'div', 'loss', or 'all'; used only with the 'adaptive' sampling strategy.

  • verbose (bool) – Show progress bars.

  • use_probability (bool) – If True, train the thieved classifier on the victim's predicted probabilities; if False (default), on one-hot labels derived from the victim's predicted classes.

extract(x: ndarray, y: ndarray | None = None, **kwargs) → CLASSIFIER_TYPE

Extract a thieved classifier.

Parameters:
  • x (ndarray) – An array with the source input to the victim classifier.

  • y – Target values (class labels), either one-hot-encoded of shape (nb_samples, nb_classes) or as class indices of shape (nb_samples,). Required when sampling_strategy is 'adaptive'.

  • thieved_classifier – The classifier to be trained as the stolen copy of the victim.

Returns:

The stolen classifier.
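
A usage sketch, reusing the hypothetical victim, thieved, and x_steal from the CopycatCNN example above; the labels y_steal are likewise illustrative:

import numpy as np

from art.attacks.extraction import KnockoffNets

attack = KnockoffNets(
    classifier=victim,
    batch_size_fit=64,
    batch_size_query=64,
    nb_epochs=5,
    nb_stolen=1000,
    sampling_strategy="adaptive",  # 'adaptive' uses the labels y; 'random' does not
    reward="all",
)

y_steal = np.random.randint(0, 10, size=1000)  # class indices for the stealing set
stolen = attack.extract(x=x_steal, y=y_steal, thieved_classifier=thieved)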