art.attacks.inference

Module providing inference attacks under a common interface.

Attribute Inference Black-Box

class art.attacks.inference.AttributeInferenceBlackBox(classifier: art.estimators.classification.classifier.Classifier, attack_model: Optional[art.estimators.classification.classifier.Classifier] = None, attack_feature: int = 0)

Implementation of a simple black-box attribute inference attack.

The attack trains an attack model (by default a simple neural network) to predict the attacked feature from the remaining features and the target model’s predictions. It assumes that the target model’s predictions for the samples under attack are available in addition to the remaining feature values; if they are not, the true class labels of the samples may be used as a proxy.

__init__(classifier: art.estimators.classification.classifier.Classifier, attack_model: Optional[art.estimators.classification.classifier.Classifier] = None, attack_feature: int = 0)

Create an AttributeInferenceBlackBox attack instance.

Parameters
  • classifier (Classifier) – Target classifier.

  • attack_model – Optional attack model to train. If none is provided, a default model is created.

  • attack_feature (int) – The index of the feature to be attacked.

fit(x: numpy.ndarray) → None

Train the attack model.

Parameters

x (ndarray) – Input to training process. Includes all features used to train the original model.

infer(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Infer the attacked feature.

Return type

ndarray

Parameters
  • x (ndarray) – Input to attack. Includes all features except the attacked feature.

  • y – Original model’s predictions for x.

  • values (np.ndarray) – Possible values for attacked feature.

Returns

The inferred feature values.
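
Example (a minimal usage sketch, not part of the original API reference): it assumes a scikit-learn random forest wrapped with ART’s SklearnClassifier, a synthetic dataset in which feature index 2 is binary, and the import paths shown on this page; names, values, and paths are illustrative and may differ between ART versions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from art.estimators.classification import SklearnClassifier
from art.attacks.inference import AttributeInferenceBlackBox

attack_feature = 2  # index of the (binary) feature to be inferred -- illustrative choice

# Synthetic data; the attacked feature is binarized so it has few possible values.
x, y = make_classification(n_samples=500, n_features=6, random_state=0)
x[:, attack_feature] = (x[:, attack_feature] > 0).astype(np.float64)

model = RandomForestClassifier(random_state=0).fit(x, y)
classifier = SklearnClassifier(model=model)

# Train the attack model on data that still contains the attacked feature.
attack = AttributeInferenceBlackBox(classifier, attack_feature=attack_feature)
attack.fit(x)

# At attack time the attacked feature is unknown: remove it and pass the
# target model's predictions as y.
x_no_feature = np.delete(x, attack_feature, axis=1)
preds = np.argmax(classifier.predict(x), axis=1).reshape(-1, 1)
inferred = attack.infer(x_no_feature, y=preds, values=np.array([0.0, 1.0]))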

Attribute Inference White-Box Lifestyle Decision-Tree

class art.attacks.inference.AttributeInferenceWhiteBoxLifestyleDecisionTree(classifier: art.estimators.classification.classifier.Classifier, attack_feature: int = 0)

Implementation of Fredrikson et al.’s white-box inference attack for decision trees.

Assumes that the attacked feature is discrete or categorical, with a limited number of possible values (for example, a boolean feature).

__init__(classifier: art.estimators.classification.classifier.Classifier, attack_feature: int = 0)

Create an AttributeInferenceWhiteBoxLifestyleDecisionTree attack instance.

Parameters
  • classifier (Classifier) – Target classifier.

  • attack_feature (int) – The index of the feature to be attacked.

infer(x, y=None, **kwargs)

Infer the attacked feature.

Parameters
  • x (np.ndarray) – Input to attack. Includes all features except the attacked feature.

  • values (np.ndarray) – Possible values for attacked feature.

  • priors (np.ndarray) – Prior distributions of attacked feature values. Same size array as values.

Returns

The inferred feature values.

Return type

np.ndarray
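
Example (a minimal usage sketch, not part of the original API reference): it assumes a scikit-learn decision tree wrapped with ART’s ScikitlearnDecisionTreeClassifier (so the tree structure is accessible to the white-box attack), synthetic data, and illustrative values and priors for a binary attacked feature.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from art.estimators.classification.scikitlearn import ScikitlearnDecisionTreeClassifier
from art.attacks.inference import AttributeInferenceWhiteBoxLifestyleDecisionTree

attack_feature = 2  # index of the categorical feature to be inferred -- illustrative choice

# Synthetic data: five features, the attacked one binarized (roughly 70% zeros, 30% ones).
rng = np.random.RandomState(0)
x = rng.rand(200, 5)
x[:, attack_feature] = (x[:, attack_feature] > 0.7).astype(np.float64)
y = (x.sum(axis=1) > 2.5).astype(int)

model = DecisionTreeClassifier(random_state=0).fit(x, y)
classifier = ScikitlearnDecisionTreeClassifier(model=model)

attack = AttributeInferenceWhiteBoxLifestyleDecisionTree(classifier, attack_feature=attack_feature)

# Inference uses only the remaining features plus the possible values and their priors.
x_no_feature = np.delete(x, attack_feature, axis=1)
values = np.array([0.0, 1.0])   # possible values of the attacked feature
priors = np.array([0.7, 0.3])   # assumed prior probability of each value
inferred = attack.infer(x_no_feature, values=values, priors=priors)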

Attribute Inference White-Box Decision-Tree

class art.attacks.inference.AttributeInferenceWhiteBoxDecisionTree(classifier, attack_feature=0)

A variation of the method proposed by Fredrikson et al. in: https://dl.acm.org/doi/10.1145/2810103.2813677

Assumes the availability of the attacked model’s predictions for the samples under attack, in addition to access to the model itself and the rest of the feature values. If the predictions are not available, the true class labels of the samples may be used as a proxy. Also assumes that the attacked feature is discrete or categorical, with a limited number of possible values (for example, a boolean feature).

__init__(classifier, attack_feature=0)

Create an AttributeInferenceWhiteBoxDecisionTree attack instance.

Parameters
  • classifier (Classifier) – Target classifier.

  • attack_feature (int) – The index of the feature to be attacked.

infer(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Infer the attacked feature.

If the model’s prediction coincides with the real prediction for the sample for exactly one candidate value, that value is chosen as the inferred value. If not, the attack falls back to the Fredrikson et al. method (without phi).

Return type

ndarray

Parameters
  • x (ndarray) – Input to attack. Includes all features except the attacked feature.

  • y – Original model’s predictions for x.

  • values (np.ndarray) – Possible values for attacked feature.

  • priors (np.ndarray) – Prior distributions of attacked feature values. Same size array as values.

Returns

The inferred feature values.
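
Example (a minimal usage sketch, not part of the original API reference): the same assumed decision-tree setup as in the previous example, with the target model’s predictions passed in addition to the candidate values and priors.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from art.estimators.classification.scikitlearn import ScikitlearnDecisionTreeClassifier
from art.attacks.inference import AttributeInferenceWhiteBoxDecisionTree

attack_feature = 2  # illustrative feature index

# Synthetic data with a binary attacked feature, as in the previous example.
rng = np.random.RandomState(0)
x = rng.rand(200, 5)
x[:, attack_feature] = (x[:, attack_feature] > 0.7).astype(np.float64)
y = (x.sum(axis=1) > 2.5).astype(int)

model = DecisionTreeClassifier(random_state=0).fit(x, y)
classifier = ScikitlearnDecisionTreeClassifier(model=model)
attack = AttributeInferenceWhiteBoxDecisionTree(classifier, attack_feature=attack_feature)

# Remove the attacked feature and supply the model's predictions as y.
x_no_feature = np.delete(x, attack_feature, axis=1)
preds = np.argmax(classifier.predict(x), axis=1).reshape(-1, 1)
inferred = attack.infer(
    x_no_feature,
    y=preds,
    values=np.array([0.0, 1.0]),
    priors=np.array([0.7, 0.3]),
)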

MIFace

class art.attacks.inference.MIFace(classifier: art.estimators.classification.classifier.Classifier, max_iter: int = 10000, window_length: int = 100, threshold: float = 0.99, learning_rate: float = 0.1, batch_size: int = 1)

Implementation of the MIFace algorithm from Fredrikson et al. (2015). While in that paper the attack is demonstrated specifically against face recognition models, it is applicable more broadly to classifiers with continuous features which expose class gradients.

__init__(classifier: art.estimators.classification.classifier.Classifier, max_iter: int = 10000, window_length: int = 100, threshold: float = 0.99, learning_rate: float = 0.1, batch_size: int = 1)

Create an MIFace attack instance.

Parameters
  • classifier (Classifier) – Target classifier.

  • max_iter (int) – Maximum number of gradient descent iterations for the model inversion.

  • window_length (int) – Length of window for checking whether descent should be aborted.

  • threshold (float) – Threshold for descent stopping criterion.

  • learning_rate (float) – Learning rate of the gradient descent.

  • batch_size (int) – Size of internal batches.

infer(x: Optional[numpy.ndarray], y: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Infer training samples by model inversion of the target classifier.

Return type

ndarray

Parameters
  • x – An array with the initial input to the victim classifier. If None, the initial input is initialized as a zero array.

  • y – Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes) or indices of shape (nb_samples,).

Returns

The inferred training samples.
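
Example (a minimal usage sketch, not part of the original API reference): it assumes a scikit-learn logistic regression on the digits dataset wrapped with ART’s SklearnClassifier, which exposes the class gradients MIFace needs; dataset, hyperparameters, and names are illustrative.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.inference import MIFace

# Train a simple target model on digits scaled to [0, 1].
digits = load_digits()
x = digits.data / 16.0
y = digits.target
model = LogisticRegression(max_iter=1000).fit(x, y)
classifier = SklearnClassifier(model=model, clip_values=(0.0, 1.0))

# Invert one prototype input per class, starting from a zero array (x=None).
attack = MIFace(classifier, max_iter=2500, threshold=0.99, learning_rate=0.1)
targets = np.arange(10)
inverted = attack.infer(None, y=targets)  # shape: (10, 64), one inverted sample per class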