art.attacks.inference.membership_inference

Module providing membership inference attacks.

Membership Inference Black-Box

class art.attacks.inference.membership_inference.MembershipInferenceBlackBox(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, input_type: str = 'prediction', attack_model_type: str = 'nn', attack_model: Any | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Implementation of a learned black-box membership inference attack.

Depending on the type of model and the provided configuration, the attack model is trained on either the target model’s predictions (probabilities or logits) or its losses.

__init__(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, input_type: str = 'prediction', attack_model_type: str = 'nn', attack_model: Any | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Create a MembershipInferenceBlackBox attack instance.

Parameters:
  • estimator – Target estimator.

  • attack_model_type (str) – The type of default attack model to train (optional). Must be one of: nn (neural network, default), rf (random forest), gb (gradient boosting), lr (logistic regression), dt (decision tree), knn (k nearest neighbors), svm (support vector machine). If attack_model is supplied, this option is ignored.

  • input_type (str) – The type of input to train the attack on. Must be one of: ‘prediction’ (default) or ‘loss’. Predictions can be either probabilities or logits, depending on the return type of the model. If the model is a regressor, only ‘loss’ can be used.

  • attack_model – The attack model to train, optional. If none is provided, a default model will be created.

  • nn_model_epochs (int) – The number of epochs to use when training a neural network (nn) attack model.

  • nn_model_batch_size (int) – The batch size to use when training a neural network (nn) attack model.

  • nn_model_learning_rate (float) – The learning rate to use when training a neural network (nn) attack model.
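The two input_type options can be illustrated with a small, library-independent sketch. It does not call ART: the helper names cross_entropy_loss and attack_features are hypothetical, and cross-entropy is an assumption standing in for whatever loss the target estimator actually defines.

```python
import math


def cross_entropy_loss(probs, true_class):
    """Per-sample cross-entropy loss for one probability vector."""
    eps = 1e-12  # guard against log(0)
    return -math.log(max(probs[true_class], eps))


def attack_features(probs, true_class, input_type="prediction"):
    """Build the attack model's input for a single record (illustrative only)."""
    if input_type == "prediction":
        return list(probs)  # the full prediction vector
    if input_type == "loss":
        return [cross_entropy_loss(probs, true_class)]  # a single scalar
    raise ValueError(f"unsupported input_type: {input_type!r}")


confident = [0.05, 0.90, 0.05]  # typical of a record seen during training
uncertain = [0.40, 0.35, 0.25]  # typical of an unseen record

print(attack_features(confident, 1, "prediction"))
print(attack_features(confident, 1, "loss"))
print(attack_features(uncertain, 1, "loss"))
```

In this toy data the loss for the confidently predicted record is lower than for the uncertain one, which is the kind of signal a learned attack can pick up.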

fit(x: ndarray | None = None, y: ndarray | None = None, test_x: ndarray | None = None, test_y: ndarray | None = None, pred: ndarray | None = None, test_pred: ndarray | None = None, **kwargs)

Train the attack model.

Parameters:
  • x – Records that were used in training the target estimator. Can be None if supplying pred.

  • y – True labels for x. If not supplied, attack will be based solely on model predictions.

  • test_x – Records that were not used in training the target estimator. Can be None if supplying test_pred.

  • test_y – True labels for test_x. If not supplied, attack will be based solely on model predictions.

  • pred – Estimator predictions for the records. If not supplied, they will be generated by calling the estimator’s predict function. Only relevant for input_type=‘prediction’.

  • test_pred – Estimator predictions for the test records. If not supplied, they will be generated by calling the estimator’s predict function. Only relevant for input_type=‘prediction’.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member.
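As a rough illustration of what training a learned attack involves, the sketch below labels features from training records as members (1) and features from held-out records as non-members (0), then fits a deliberately trivial attack model. It does not use ART: build_attack_training_set and fit_threshold are hypothetical helpers, and the midpoint threshold is an assumption standing in for the real attack models (nn, rf, gb, and so on).

```python
def build_attack_training_set(member_features, nonmember_features):
    """Label members 1 and non-members 0, returning (features, labels)."""
    features = list(member_features) + list(nonmember_features)
    labels = [1] * len(member_features) + [0] * len(nonmember_features)
    return features, labels


def fit_threshold(features, labels):
    """Toy attack model: the midpoint between the two class means."""
    member_vals = [f for f, m in zip(features, labels) if m == 1]
    nonmember_vals = [f for f, m in zip(features, labels) if m == 0]
    member_mean = sum(member_vals) / len(member_vals)
    nonmember_mean = sum(nonmember_vals) / len(nonmember_vals)
    return (member_mean + nonmember_mean) / 2


# Maximum predicted confidence per record (made-up values).
member_conf = [0.99, 0.95, 0.97, 0.90]     # records used to train the target
nonmember_conf = [0.70, 0.85, 0.60, 0.75]  # records held out from training

features, labels = build_attack_training_set(member_conf, nonmember_conf)
threshold = fit_threshold(features, labels)
inferred = [1 if f > threshold else 0 for f in features]
print(threshold, inferred)
```

One non-member (confidence 0.85) falls above the threshold and is misclassified, which shows why richer attack models are usually preferred over a single cut-off.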

infer(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Infer membership in the training set of the target estimator.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input records to attack. Can be None if supplying pred.

  • y – True labels for x. If not supplied, attack will be based solely on model predictions.

  • pred – Estimator predictions for the records. If not supplied, they will be generated by calling the estimator’s predict function. Only relevant for input_type=‘prediction’.

  • probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.

Membership Inference Black-Box Rule-Based

class art.attacks.inference.membership_inference.MembershipInferenceBlackBoxRuleBased(classifier: CLASSIFIER_TYPE)

Implementation of a simple, rule-based black-box membership inference attack.

This implementation uses the simple rule: if the model’s prediction for a sample is correct, then it is a member. Otherwise, it is not a member.
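The rule itself fits in a few lines. The following is a standalone sketch of the rule as stated above, not the library's code; infer_membership_rule_based is a hypothetical name and the labels are made up.

```python
def infer_membership_rule_based(predicted_labels, true_labels):
    """Return 1 (member) where the prediction is correct, else 0 (non-member)."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predicted_labels and true_labels must have equal length")
    return [1 if p == t else 0 for p, t in zip(predicted_labels, true_labels)]


predicted = [2, 0, 1, 1, 0]  # class labels returned by the target model
actual = [2, 1, 1, 0, 0]     # true class labels

membership = infer_membership_rule_based(predicted, actual)
print(membership)  # [1, 0, 1, 0, 1]
```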

__init__(classifier: CLASSIFIER_TYPE)

Create a MembershipInferenceBlackBoxRuleBased attack instance.

Parameters:

classifier – Target classifier.

infer(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Infer membership in the training set of the target estimator.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input records to attack.

  • y – True labels for x.

  • probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.

Membership Inference Label-Only - Decision Boundary

class art.attacks.inference.membership_inference.LabelOnlyDecisionBoundary(estimator: CLASSIFIER_TYPE, distance_threshold_tau: float | None = None)

Implementation of Label-Only Inference Attack based on Decision Boundary.

You only need to call ONE of the calibrate methods, depending on which attack you want to launch.

Paper link: https://arxiv.org/abs/2007.14321 (Choquette-Choo et al.)
Paper link: https://arxiv.org/abs/2007.15528 (Li and Zhang)

__init__(estimator: CLASSIFIER_TYPE, distance_threshold_tau: float | None = None)

Create a LabelOnlyDecisionBoundary instance for Label-Only Inference Attack based on Decision Boundary.

Parameters:
  • estimator – A trained classification estimator.

  • distance_threshold_tau – Threshold distance for the decision boundary. Samples whose distance to the decision boundary is larger than the threshold are considered members of the training dataset.

calibrate_distance_threshold(x_train: ndarray, y_train: ndarray, x_test: ndarray, y_test: ndarray, **kwargs)

Calibrate distance threshold maximising the membership inference accuracy on x_train and x_test.

Parameters:
  • x_train (ndarray) – Training data.

  • y_train (ndarray) – Labels of training data x_train.

  • x_test (ndarray) – Test data.

  • y_test (ndarray) – Labels of test data x_test.

Keyword Arguments for HopSkipJump:
  • norm: Order of the norm. Possible values: “inf”, np.inf or 2.

  • max_iter: Maximum number of iterations.

  • max_eval: Maximum number of evaluations for estimating gradient.

  • init_eval: Initial number of evaluations for estimating gradient.

  • init_size: Maximum number of trials for initial generation of adversarial examples.

  • verbose: Show progress bars.
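The calibration idea can be sketched without the library: given boundary distances for known members and known non-members, pick the threshold that classifies the most samples correctly. The function name calibrate_threshold and the distances are hypothetical, and scanning the observed distances as candidates is an assumption; the library's actual search strategy may differ.

```python
def calibrate_threshold(member_distances, nonmember_distances):
    """Return (tau, accuracy), where a sample is a member iff distance > tau."""
    candidates = sorted(set(member_distances) | set(nonmember_distances))
    total = len(member_distances) + len(nonmember_distances)
    best_tau, best_acc = None, -1.0
    for tau in candidates:
        correct = sum(d > tau for d in member_distances)
        correct += sum(d <= tau for d in nonmember_distances)
        acc = correct / total
        if acc > best_acc:  # strict comparison keeps the smallest best tau
            best_tau, best_acc = tau, acc
    return best_tau, best_acc


train_dist = [0.9, 1.2, 0.7, 1.5]  # distances for x_train (members)
test_dist = [0.2, 0.4, 0.8, 0.3]   # distances for x_test (non-members)

tau, accuracy = calibrate_threshold(train_dist, test_dist)
print(tau, accuracy)  # 0.4 0.875
```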

calibrate_distance_threshold_unsupervised(top_t: int = 50, num_samples: int = 100, max_queries: int = 1, **kwargs)

Calibrate distance threshold on randomly generated samples, choosing the top-t percentile of the noise needed to change the classifier’s initial prediction. This method requires the model’s clip_values to be set.

Parameters:
  • top_t (int) – Top-t percentile.

  • num_samples (int) – Number of random samples to generate.

  • max_queries (int) – Maximum number of queries. The maximum number of HopSkipJump (HSJ) iterations on a single sample is max_queries * max_iter.

Keyword Arguments for HopSkipJump:
  • norm: Order of the norm. Possible values: “inf”, np.inf or 2.

  • max_iter: Maximum number of iterations.

  • max_eval: Maximum number of evaluations for estimating gradient.

  • init_eval: Initial number of evaluations for estimating gradient.

  • init_size: Maximum number of trials for initial generation of adversarial examples.

  • verbose: Show progress bars.
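The percentile step can be sketched in isolation. Here the noise magnitudes are made-up values standing in for the perturbation needed to flip the prediction on each random sample. The percentile helper is hypothetical, and linear interpolation between ranks is an assumption about how the percentile is computed.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Noise needed to change the prediction for each random sample (made up).
noise = [0.10, 0.50, 0.30, 0.20, 0.40]

tau = percentile(noise, 50)  # top_t = 50
print(tau)
```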

infer(x: ndarray, y: ndarray | None = None, **kwargs) ndarray

Infer membership of input x in estimator’s training data.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input data.

  • y – True labels for x.

  • probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.

Keyword Arguments for HopSkipJump:
  • norm: Order of the norm. Possible values: “inf”, np.inf or 2.

  • max_iter: Maximum number of iterations.

  • max_eval: Maximum number of evaluations for estimating gradient.

  • init_eval: Initial number of evaluations for estimating gradient.

  • init_size: Maximum number of trials for initial generation of adversarial examples.

  • verbose: Show progress bars.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.
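The decision rule applied at inference time can be sketched as follows. This is not the library's code: the adversarial examples are hard-coded placeholders for what HopSkipJump would produce, the helper names are hypothetical, and using the L2 norm is an assumption (the norm keyword argument also allows the infinity norm).

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equally sized feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def infer_by_boundary_distance(samples, adversarial_samples, tau):
    """Return 1 (member) where the distance to the boundary exceeds tau."""
    return [
        1 if l2_distance(x, x_adv) > tau else 0
        for x, x_adv in zip(samples, adversarial_samples)
    ]


samples = [[0.0, 0.0], [1.0, 1.0]]
adversarial = [[3.0, 4.0], [1.0, 1.5]]  # placeholders for HopSkipJump output

membership = infer_by_boundary_distance(samples, adversarial, tau=1.0)
print(membership)  # [1, 0]
```

The first sample lies far from the decision boundary (distance 5.0) and is classified as a member; the second (distance 0.5) is not.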

Membership Inference Label-Only - Gap Attack

art.attacks.inference.membership_inference.LabelOnlyGapAttack

alias of MembershipInferenceBlackBoxRuleBased

Shadow Models

class art.attacks.inference.membership_inference.ShadowModels(shadow_model_template: CLONABLE, num_shadow_models: int = 3, disjoint_datasets=False, random_state=None)

Utility for training shadow models and generating shadow-datasets for membership inference attacks in scikit-learn, PyTorch and TensorFlow v2.

__init__(shadow_model_template: CLONABLE, num_shadow_models: int = 3, disjoint_datasets=False, random_state=None)

Initializes shadow models using the provided template.

Parameters:
  • shadow_model_template – Untrained classifier model to be used as a template for shadow models. Should be as similar as possible to the target model. Must implement clone_for_refitting method.

  • num_shadow_models (int) – How many shadow models to train to generate the shadow dataset.

  • disjoint_datasets (bool) – A boolean indicating whether the datasets used to train each shadow model should be disjoint. Default is False.

  • random_state – Seed for the numpy default random number generator.

generate_shadow_dataset(x: ndarray, y: ndarray, member_ratio: float = 0.5) Tuple[Tuple[ndarray, ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]

Generates a shadow dataset (member and nonmember samples and their corresponding model predictions) by splitting the dataset into training and testing samples, and then training the shadow models on the result.

Parameters:
  • x (ndarray) – The samples used to train the shadow models.

  • y (ndarray) – True labels for the dataset samples (as expected by the estimator’s fit method).

  • member_ratio (float) – Fraction of the data that should be used to train the shadow models. Must be between 0 and 1.

Returns:

The shadow dataset generated. The shape is ((member_samples, true_label, model_prediction), (nonmember_samples, true_label, model_prediction)).
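The splitting step controlled by member_ratio can be sketched for a single shadow model. The sketch below is library-independent and omits model training and predictions, so each returned tuple holds only (samples, labels); split_members and its seed parameter are hypothetical names.

```python
import random


def split_members(x, y, member_ratio=0.5, seed=None):
    """Shuffle the records and split them into member and non-member sets."""
    if not 0 < member_ratio < 1:
        raise ValueError("member_ratio must be between 0 and 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    n_members = int(len(indices) * member_ratio)
    member_idx, nonmember_idx = indices[:n_members], indices[n_members:]
    members = ([x[i] for i in member_idx], [y[i] for i in member_idx])
    nonmembers = ([x[i] for i in nonmember_idx], [y[i] for i in nonmember_idx])
    return members, nonmembers


x = [[i] for i in range(8)]     # eight single-feature records
y = [i % 2 for i in range(8)]   # their labels

members, nonmembers = split_members(x, y, member_ratio=0.25, seed=0)
print(len(members[0]), len(nonmembers[0]))  # 2 6
```

A shadow model would then be trained on the member set only, and its predictions on both sets would supply the third element of each tuple.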

generate_synthetic_shadow_dataset(target_classifier: CLASSIFIER_TYPE, dataset_size: int, max_features_randomized: int | None, member_ratio: float = 0.5, min_confidence: float = 0.4, max_retries: int = 6, random_record_fn: Callable[[], ndarray] | None = None, randomize_features_fn: Callable[[ndarray, int], ndarray] | None = None) Tuple[Tuple[ndarray, ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]

Generates a shadow dataset (member and nonmember samples and their corresponding model predictions) by training the shadow models on a synthetic dataset generated from the target classifier using the hill-climbing algorithm from R. Shokri et al. (2017).

Paper Link: https://arxiv.org/abs/1610.05820

Parameters:
  • target_classifier – The classifier to synthesize data from.

  • dataset_size (int) – How many records to synthesize.

  • max_features_randomized – The initial number of features to randomize before fine-tuning. If None, half of the record’s features will be used, which will not work well for one-hot encoded data.

  • member_ratio (float) – Fraction of the data that should be used to train the shadow models. Must be between 0 and 1.

  • min_confidence (float) – The minimum confidence the classifier assigns the target class for the record to be accepted (i.e. the hill-climbing algorithm is finished).

  • max_retries (int) – The maximum number of record-generation retries. The initial random pick of a record for the hill-climbing algorithm might fail to optimize the target-class confidence, in which case a new random record is tried.

  • random_record_fn – Callback that returns a single random record (numpy array), i.e. all feature values are random. If None, random records are generated by treating each column in the input shape as a feature and choosing uniform values [0, 1) for each feature. This default behaviour is not correct for one-hot-encoded features, and a custom callback which provides a random record with random one-hot-encoded values should be used instead.

  • randomize_features_fn – Callback that accepts an existing record (numpy array) and an int which is the number of features to randomize. The callback should return a new record, where the specified number of features have been randomized. If None, records are randomized by treating each column in the input shape as a feature, and choosing uniform values [0, 1) for each randomized feature. This default behaviour is not correct for one-hot-encoded features, and a custom callback which randomizes one-hot-encoded features should be used instead.

Returns:

The shadow dataset generated. The shape is ((member_samples, true_label, model_prediction), (nonmember_samples, true_label, model_prediction)).
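A heavily simplified sketch of the hill-climbing idea follows. It is not the library's implementation and not a faithful reproduction of the algorithm in the paper: toy_confidence stands in for the target classifier, max_iters is a hypothetical parameter, and halving the number of randomized features after each rejected proposal is an assumption made for brevity.

```python
import random


def toy_confidence(record):
    """Stand-in for the target classifier's confidence in the target class."""
    return sum(record) / len(record)


def synthesize_record(confidence_fn, n_features, max_features_randomized,
                      min_confidence=0.4, max_retries=6, max_iters=200, rng=None):
    """Hill-climb to a record scored >= min_confidence, or return None."""
    if not 1 <= max_features_randomized <= n_features:
        raise ValueError("max_features_randomized must be in [1, n_features]")
    if rng is None:
        rng = random.Random()
    for _ in range(max_retries):
        # Start each retry from a fresh record with uniform values in [0, 1).
        record = [rng.random() for _ in range(n_features)]
        best = confidence_fn(record)
        k = max_features_randomized
        for _ in range(max_iters):
            if best >= min_confidence:
                return record
            candidate = list(record)
            for i in rng.sample(range(n_features), k):
                candidate[i] = rng.random()  # re-randomize k features
            score = confidence_fn(candidate)
            if score > best:
                record, best = candidate, score  # accept the improvement
            else:
                k = max(1, k // 2)  # rejected: make smaller changes next time
        if best >= min_confidence:
            return record
    return None  # every retry failed to reach min_confidence


record = synthesize_record(toy_confidence, n_features=4, max_features_randomized=2,
                           min_confidence=0.7, rng=random.Random(0))
if record is None:
    print("no record reached the confidence threshold")
else:
    print(round(toy_confidence(record), 3))
```

For one-hot-encoded data, the uniform [0, 1) sampling used here would be replaced by custom logic, which is the role of random_record_fn and randomize_features_fn.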

get_shadow_models() Sequence[CLONABLE]

Returns the list of shadow models. generate_shadow_dataset or generate_synthetic_shadow_dataset must be called for the shadow models to be trained.

get_shadow_models_train_sets() List[Tuple[ndarray, ndarray] | None]

Returns a list of tuples of the form (shadow_x_train, shadow_y_train), one for each shadow model. generate_shadow_dataset or generate_synthetic_shadow_dataset must be called first; otherwise a list of Nones is returned.