art.attacks.extraction

Module providing extraction attacks under a common interface.

Copycat CNN

class art.attacks.extraction.CopycatCNN(classifier: CLASSIFIER_TYPE, batch_size_fit: int = 1, batch_size_query: int = 1, nb_epochs: int = 10, nb_stolen: int = 1, use_probability: bool = False)

Implementation of the Copycat CNN attack from Rodrigues Correia-Silva et al. (2018).

__init__(classifier: CLASSIFIER_TYPE, batch_size_fit: int = 1, batch_size_query: int = 1, nb_epochs: int = 10, nb_stolen: int = 1, use_probability: bool = False) → None

Create a Copycat CNN attack instance.

Parameters:
  • classifier – A victim classifier.

  • batch_size_fit (int) – Size of batches for fitting the thieved classifier.

  • batch_size_query (int) – Size of batches for querying the victim classifier.

  • nb_epochs (int) – Number of epochs to use for training.

  • nb_stolen (int) – Number of queries submitted to the victim classifier to steal it.

  • use_probability (bool) – If True, train the thieved classifier on the victim's predicted probabilities; if False (default), on one-hot labels derived from the victim's predicted classes.

extract(x: ndarray, y: ndarray | None = None, **kwargs) → CLASSIFIER_TYPE

Extract a thieved classifier.

Parameters:
  • x (ndarray) – An array with the source input to the victim classifier.

  • y – Target values (class labels), either one-hot-encoded of shape (nb_samples, nb_classes) or as class indices of shape (nb_samples,). Not used in this attack.

  • thieved_classifier (Classifier) – The classifier to be trained as the stolen copy of the victim; with the default use_probability=False it is trained on one-hot labels derived from the victim's predictions.

Returns:

The stolen classifier.
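
A minimal usage sketch (the PyTorch victim network, data, and hyper-parameters below are illustrative assumptions, not part of this API):

import numpy as np
import torch.nn as nn
import torch.optim as optim

from art.attacks.extraction import CopycatCNN
from art.estimators.classification import PyTorchClassifier

def make_classifier() -> PyTorchClassifier:
    # Hypothetical architecture; any ART classifier with a compatible
    # input_shape and nb_classes would do.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
    return PyTorchClassifier(
        model=model,
        loss=nn.CrossEntropyLoss(),
        optimizer=optim.Adam(model.parameters(), lr=1e-3),
        input_shape=(1, 28, 28),
        nb_classes=10,
    )

victim = make_classifier()   # assumed to have been trained beforehand
thieved = make_classifier()  # fresh model that will receive the stolen knowledge

attack = CopycatCNN(
    classifier=victim,
    batch_size_fit=64,
    batch_size_query=64,
    nb_epochs=5,
    nb_stolen=1000,  # number of victim queries
)

# Unlabeled "stealing" set; the attack labels it by querying the victim.
x_steal = np.random.rand(1000, 1, 28, 28).astype(np.float32)
stolen = attack.extract(x=x_steal, thieved_classifier=thieved)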

Functionally Equivalent Extraction

class art.attacks.extraction.FunctionallyEquivalentExtraction(classifier: CLASSIFIER_TYPE, num_neurons: int | None = None)

Implementation of the Functionally Equivalent Extraction attack from Jagielski et al. (2019) for neural networks with two dense layers, ReLU activation at the first layer, and logits output after the second layer.

__init__(classifier: CLASSIFIER_TYPE, num_neurons: int | None = None) → None

Create a FunctionallyEquivalentExtraction instance.

Parameters:
  • classifier – A trained ART classifier.

  • num_neurons – The number of neurons in the first dense layer.

extract(x: ndarray, y: ndarray | None = None, delta_0: float = 0.05, fraction_true: float = 0.3, rel_diff_slope: float = 1e-05, rel_diff_value: float = 1e-06, delta_init_value: float = 0.1, delta_value_max: int = 50, d2_min: float = 0.0004, d_step: float = 0.01, delta_sign: float = 0.02, unit_vector_scale: int = 10000, ftol: float = 1e-08, **kwargs) → BlackBoxClassifier

Extract the targeted model.

Parameters:
  • x (ndarray) – Samples of input data of shape (num_samples, num_features).

  • y – Correct labels or target labels for x. Not used in this attack.

  • delta_0 (float) – Initial step size of binary search.

  • fraction_true (float) – Fraction of output predictions that must fulfill the criteria for a critical point.

  • rel_diff_slope (float) – Relative slope difference at critical points.

  • rel_diff_value (float) – Relative value difference at critical points.

  • delta_init_value (float) – Initial delta of weight value search.

  • delta_value_max (int) – Maximum delta of weight value search.

  • d2_min (float) – Minimum acceptable value of sum of absolute second derivatives.

  • d_step (float) – Step size of delta increase.

  • delta_sign (float) – Delta of weight sign search.

  • unit_vector_scale (int) – Multiplicative scale of the unit vector e_j.

  • ftol (float) – Tolerance for termination by the change of the cost function.

Returns:

ART BlackBoxClassifier of the extracted model.

Return type:

BlackBoxClassifier
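
A minimal usage sketch (the victim network, data, and sizes below are illustrative assumptions; the attack applies only to the two-dense-layer ReLU architecture described above):

import numpy as np
import torch.nn as nn
import torch.optim as optim

from art.attacks.extraction import FunctionallyEquivalentExtraction
from art.estimators.classification import PyTorchClassifier

# Victim of the required form: logits = W_1 @ relu(W_0 @ x + b_0) + b_1.
model = nn.Sequential(nn.Linear(784, 16), nn.ReLU(), nn.Linear(16, 10))
victim = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(784,),
    nb_classes=10,
)
# ... train victim on real data here; a well-trained model is assumed,
# since the critical-point search may not converge otherwise.

attack = FunctionallyEquivalentExtraction(classifier=victim, num_neurons=16)

x = np.random.rand(100, 784).astype(np.float32)  # flat feature vectors
stolen = attack.extract(x)     # returns a BlackBoxClassifier
preds = stolen.predict(x[:5])  # query the extracted model like any classifier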

Knockoff Nets

class art.attacks.extraction.KnockoffNets(classifier: CLASSIFIER_TYPE, batch_size_fit: int = 1, batch_size_query: int = 1, nb_epochs: int = 10, nb_stolen: int = 1, sampling_strategy: str = 'random', reward: str = 'all', verbose: bool = True, use_probability: bool = False)

Implementation of the Knockoff Nets attack from Orekondy et al. (2018).

__init__(classifier: CLASSIFIER_TYPE, batch_size_fit: int = 1, batch_size_query: int = 1, nb_epochs: int = 10, nb_stolen: int = 1, sampling_strategy: str = 'random', reward: str = 'all', verbose: bool = True, use_probability: bool = False) → None

Create a KnockoffNets attack instance. Note: it is assumed that both the victim classifier and the thieved classifier produce logit outputs.

Parameters:
  • classifier – A victim classifier.

  • batch_size_fit (int) – Size of batches for fitting the thieved classifier.

  • batch_size_query (int) – Size of batches for querying the victim classifier.

  • nb_epochs (int) – Number of epochs to use for training.

  • nb_stolen (int) – Number of queries submitted to the victim classifier to steal it.

  • sampling_strategy (str) – Sampling strategy, either 'random' or 'adaptive'.

  • reward (str) – Reward type, one of 'cert', 'div', 'loss', or 'all'; used only with the 'adaptive' sampling strategy.

  • verbose (bool) – Show progress bars.

  • use_probability (bool) – If True, train the thieved classifier on the victim's predicted probabilities; if False (default), on one-hot labels derived from the victim's predicted classes.

extract(x: ndarray, y: ndarray | None = None, **kwargs) → CLASSIFIER_TYPE

Extract a thieved classifier.

Parameters:
  • x (ndarray) – An array with the source input to the victim classifier.

  • y – Target values (class labels), either one-hot-encoded of shape (nb_samples, nb_classes) or as class indices of shape (nb_samples,). Required when sampling_strategy is 'adaptive'.

  • thieved_classifier – The classifier to be trained as the stolen copy of the victim.

Returns:

The stolen classifier.
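
A usage sketch, reusing the hypothetical victim, thieved, and x_steal from the CopycatCNN example above; the labels y_steal are likewise illustrative:

import numpy as np

from art.attacks.extraction import KnockoffNets

attack = KnockoffNets(
    classifier=victim,
    batch_size_fit=64,
    batch_size_query=64,
    nb_epochs=5,
    nb_stolen=1000,
    sampling_strategy="adaptive",  # 'adaptive' uses the labels y; 'random' does not
    reward="all",
)

y_steal = np.random.randint(0, 10, size=1000)  # class indices for the stealing set
stolen = attack.extract(x=x_steal, y=y_steal, thieved_classifier=thieved)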