art.attacks.inference.membership_inference

Module providing membership inference attacks.

Membership Inference Black-Box

class art.attacks.inference.membership_inference.MembershipInferenceBlackBox(estimator: Union[CLASSIFIER_TYPE, REGRESSOR_TYPE], input_type: str = 'prediction', attack_model_type: str = 'nn', attack_model: Optional[Any] = None)

Implementation of a learned black-box membership inference attack.

This implementation can use as input to the learning process probabilities/logits or losses, depending on the type of model and provided configuration.

__init__(estimator: Union[CLASSIFIER_TYPE, REGRESSOR_TYPE], input_type: str = 'prediction', attack_model_type: str = 'nn', attack_model: Optional[Any] = None)

Create a MembershipInferenceBlackBox attack instance.

Parameters:
  • estimator – Target estimator.

  • attack_model_type (str) – The type of default attack model to train, optional. Should be one of nn (neural network, default), rf (random forest) or gb (gradient boosting). Ignored if attack_model is supplied.

  • input_type (str) – The type of input to train the attack on. Can be one of 'prediction' or 'loss'. Default is 'prediction'. Predictions can be either probabilities or logits, depending on the return type of the model. If the model is a regressor, only 'loss' can be used.

  • attack_model – The attack model to train, optional. If none is provided, a default model will be created.

fit(x: ndarray, y: ndarray, test_x: ndarray, test_y: ndarray, pred: Optional[ndarray] = None, test_pred: Optional[ndarray] = None, **kwargs)

Train the attack model.

Parameters:
  • x (ndarray) – Records that were used in training the target estimator. Can be None if supplying pred.

  • y (ndarray) – True labels for x.

  • test_x (ndarray) – Records that were not used in training the target estimator. Can be None if supplying test_pred.

  • test_y (ndarray) – True labels for test_x.

  • pred – Estimator predictions for the records. If not supplied, they will be generated by calling the estimator's predict function. Only relevant for input_type='prediction'.

  • test_pred – Estimator predictions for the test records. If not supplied, they will be generated by calling the estimator's predict function. Only relevant for input_type='prediction'.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member.
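
The following is a minimal sketch of how member and non-member predictions could be assembled into a training set for a learned attack model when input_type='prediction'. It is not ART's internal implementation: the helper names build_attack_features and build_attack_dataset are hypothetical, and concatenating predictions with one-hot labels is an assumed feature layout chosen for illustration.

```python
import numpy as np


def build_attack_features(pred, y, num_classes):
    """Concatenate target-model predictions with one-hot true labels."""
    one_hot = np.eye(num_classes)[y]  # shape (n_samples, num_classes)
    return np.concatenate([pred, one_hot], axis=1)


def build_attack_dataset(member_pred, member_y, nonmember_pred, nonmember_y):
    """Stack member and non-member features and attach 1/0 membership labels."""
    num_classes = member_pred.shape[1]
    member_feats = build_attack_features(member_pred, member_y, num_classes)
    nonmember_feats = build_attack_features(nonmember_pred, nonmember_y, num_classes)
    features = np.concatenate([member_feats, nonmember_feats], axis=0)
    membership = np.concatenate(
        [np.ones(len(member_feats), dtype=int), np.zeros(len(nonmember_feats), dtype=int)]
    )
    return features, membership


# Toy predictions from a hypothetical 3-class target model.
member_pred = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
member_y = np.array([0, 1])
nonmember_pred = np.array([[0.4, 0.35, 0.25]])
nonmember_y = np.array([2])

features, membership = build_attack_dataset(member_pred, member_y, nonmember_pred, nonmember_y)
```

Any binary classifier can then be trained on features and membership; the x/y arguments of fit play the role of the member records and test_x/test_y the role of the non-member records.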

infer(x: ndarray, y: Optional[ndarray] = None, **kwargs) → ndarray

Infer membership in the training set of the target estimator.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input records to attack. Can be None if supplying pred.

  • y – True labels for x.

  • pred – Estimator predictions for the records. If not supplied, they will be generated by calling the estimator's predict function. Only relevant for input_type='prediction'.

  • probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.

Membership Inference Black-Box Rule-Based

class art.attacks.inference.membership_inference.MembershipInferenceBlackBoxRuleBased(classifier: CLASSIFIER_TYPE)

Implementation of a simple, rule-based black-box membership inference attack.

This implementation uses the simple rule: if the model’s prediction for a sample is correct, then it is a member. Otherwise, it is not a member.
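
The rule can be sketched in a few lines of NumPy. This is an illustration of the rule as stated above, not the library's code; the function name rule_based_membership and the toy probabilities are assumptions made for the example.

```python
import numpy as np


def rule_based_membership(predicted_labels, true_labels):
    """Return 1 where the prediction matches the true label, else 0."""
    return (np.asarray(predicted_labels) == np.asarray(true_labels)).astype(int)


# Class probabilities from a hypothetical binary classifier for four samples.
probs = np.array([[0.7, 0.3], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
true_labels = np.array([0, 1, 1, 0])

# The first two samples are classified correctly, so they are flagged as members.
membership = rule_based_membership(probs.argmax(axis=1), true_labels)
```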

__init__(classifier: CLASSIFIER_TYPE)

Create a MembershipInferenceBlackBoxRuleBased attack instance.

Parameters:

classifier – Target classifier.

infer(x: ndarray, y: Optional[ndarray] = None, **kwargs) → ndarray

Infer membership in the training set of the target estimator.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input records to attack.

  • y – True labels for x.

  • probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.

Membership Inference Label-Only - Decision Boundary

class art.attacks.inference.membership_inference.LabelOnlyDecisionBoundary(estimator: CLASSIFIER_TYPE, distance_threshold_tau: Optional[float] = None)

Implementation of Label-Only Inference Attack based on Decision Boundary.

You only need to call ONE of the calibrate methods, depending on which attack you want to launch.

Paper link: https://arxiv.org/abs/2007.14321 (Choquette-Choo et al.)

Paper link: https://arxiv.org/abs/2007.15528 (Li and Zhang)

__init__(estimator: CLASSIFIER_TYPE, distance_threshold_tau: Optional[float] = None)

Create a LabelOnlyDecisionBoundary instance for Label-Only Inference Attack based on Decision Boundary.

Parameters:
  • estimator – A trained classification estimator.

  • distance_threshold_tau – Threshold distance for decision boundary. Samples with boundary distances larger than threshold are considered members of the training dataset.

calibrate_distance_threshold(x_train: ndarray, y_train: ndarray, x_test: ndarray, y_test: ndarray, **kwargs)

Calibrate distance threshold maximising the membership inference accuracy on x_train and x_test.

Parameters:
  • x_train (ndarray) – Training data.

  • y_train (ndarray) – Labels of training data x_train.

  • x_test (ndarray) – Test data.

  • y_test (ndarray) – Labels of test data x_test.

Keyword Arguments for HopSkipJump:
  • norm: Order of the norm. Possible values: “inf”, np.inf or 2.

  • max_iter: Maximum number of iterations.

  • max_eval: Maximum number of evaluations for estimating gradient.

  • init_eval: Initial number of evaluations for estimating gradient.

  • init_size: Maximum number of trials for initial generation of adversarial examples.

  • verbose: Show progress bars.
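
As a rough illustration of supervised calibration, the sketch below sweeps candidate thresholds over precomputed boundary distances and keeps the one with the highest membership accuracy. It assumes the distances have already been estimated (in ART this is done with HopSkipJump); the function name calibrate_threshold and the toy distances are hypothetical, and the search strategy is a simple assumption rather than the library's exact procedure.

```python
import numpy as np


def calibrate_threshold(train_distances, test_distances):
    """Pick the candidate threshold with the highest membership accuracy."""
    distances = np.concatenate([train_distances, test_distances])
    is_member = np.concatenate(
        [np.ones(len(train_distances), dtype=bool), np.zeros(len(test_distances), dtype=bool)]
    )
    best_tau, best_acc = 0.0, -1.0
    # Every observed distance is tried as a candidate threshold.
    for tau in np.unique(distances):
        # Samples farther from the boundary than tau are predicted as members.
        predicted_member = distances > tau
        acc = float(np.mean(predicted_member == is_member))
        if acc > best_acc:
            best_tau, best_acc = float(tau), acc
    return best_tau, best_acc


# Toy distances: training samples tend to lie farther from the boundary.
train_distances = np.array([0.9, 1.2, 0.8, 1.5])
test_distances = np.array([0.3, 0.5, 1.0, 0.2])
tau, acc = calibrate_threshold(train_distances, test_distances)
```

On this toy data the best threshold is 0.5, which misclassifies only the non-member at distance 1.0.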

calibrate_distance_threshold_unsupervised(top_t: int = 50, num_samples: int = 100, max_queries: int = 1, **kwargs)

Calibrate distance threshold on randomly generated samples, choosing the top-t percentile of the noise needed to change the classifier’s initial prediction. This method requires the model’s clip_values to be set.

Parameters:
  • top_t (int) – Top-t percentile.

  • num_samples (int) – Number of random samples to generate.

  • max_queries (int) – Maximum number of queries. Maximum number of HSJ iterations on a single sample will be max_queries * max_iter.

Keyword Arguments for HopSkipJump:
  • norm: Order of the norm. Possible values: “inf”, np.inf or 2.

  • max_iter: Maximum number of iterations.

  • max_eval: Maximum number of evaluations for estimating gradient.

  • init_eval: Initial number of evaluations for estimating gradient.

  • init_size: Maximum number of trials for initial generation of adversarial examples.

  • verbose: Show progress bars.
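
The unsupervised variant can be sketched as follows. Several things here are assumptions for illustration: a toy linear classifier stands in for the target model, its closed-form distance to the decision boundary stands in for the HopSkipJump estimate, and "top-t percentile" is read as np.percentile(distances, top_t). The names boundary_distance and calibrate_unsupervised are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear classifier: predicts class 1 when w.x + b > 0.
w = np.array([1.0, -1.0])
b = 0.0
clip_min, clip_max = 0.0, 1.0  # stands in for the model's clip_values


def boundary_distance(x):
    """Exact L2 distance from each row of x to the hyperplane w.x + b = 0."""
    return np.abs(x @ w + b) / np.linalg.norm(w)


def calibrate_unsupervised(top_t=50, num_samples=100):
    """Use the top_t-th percentile of boundary distances of random samples."""
    # Random samples are drawn uniformly inside the clip range.
    x_rand = rng.uniform(clip_min, clip_max, size=(num_samples, w.size))
    distances = boundary_distance(x_rand)
    return float(np.percentile(distances, top_t)), distances


tau, distances = calibrate_unsupervised(top_t=50, num_samples=100)
```

Because the random samples are not training data, the resulting threshold approximates the boundary distance of typical non-members, which is why no labelled member set is needed.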

infer(x: ndarray, y: Optional[ndarray] = None, **kwargs) → ndarray

Infer membership of input x in estimator’s training data.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input data.

  • y – True labels for x.

  • probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class

Keyword Arguments for HopSkipJump:
  • norm: Order of the norm. Possible values: “inf”, np.inf or 2.

  • max_iter: Maximum number of iterations.

  • max_eval: Maximum number of evaluations for estimating gradient.

  • init_eval: Initial number of evaluations for estimating gradient.

  • init_size: Maximum number of trials for initial generation of adversarial examples.

  • verbose: Show progress bars.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.

Membership Inference Label-Only - Gap Attack

art.attacks.inference.membership_inference.LabelOnlyGapAttack

alias of MembershipInferenceBlackBoxRuleBased

Shadow Models

class art.attacks.inference.membership_inference.ShadowModels(shadow_model_template: CLONABLE, num_shadow_models: int = 3, disjoint_datasets=False, random_state=None)

Utility for training shadow models and generating shadow-datasets for membership inference attacks in scikit-learn, PyTorch and TensorFlow v2.

__init__(shadow_model_template: CLONABLE, num_shadow_models: int = 3, disjoint_datasets=False, random_state=None)

Initializes shadow models using the provided template.

Parameters:
  • shadow_model_template – Untrained classifier model to be used as a template for shadow models. Should be as similar as possible to the target model. Must implement clone_for_refitting method.

  • num_shadow_models (int) – How many shadow models to train to generate the shadow dataset.

  • disjoint_datasets (bool) – A boolean indicating whether the datasets used to train each shadow model should be disjoint. Default is False.

  • random_state – Seed for the numpy default random number generator.

generate_shadow_dataset(x: ndarray, y: ndarray, member_ratio: float = 0.5) → Tuple[Tuple[ndarray, ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]

Generates a shadow dataset (member and nonmember samples and their corresponding model predictions) by splitting the dataset into training and testing samples, and then training the shadow models on the result.

Return type:

Tuple

Parameters:
  • x (ndarray) – The samples used to train the shadow models.

  • y (ndarray) – True labels for the dataset samples (as expected by the estimator’s fit method).

  • member_ratio (float) – Fraction of the data that should be used to train the shadow models. Must be between 0 and 1.

Returns:

The shadow dataset generated. The shape is ((member_samples, true_label, model_prediction), (nonmember_samples, true_label, model_prediction)).
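
The splitting step can be sketched as below. This covers only the partitioning of data into member and non-member portions for the disjoint case; training each shadow model on its member portion and collecting its predictions is omitted. The function name split_for_shadow_models and the seed argument are hypothetical, and the exact partitioning used by the library may differ.

```python
import numpy as np


def split_for_shadow_models(x, y, num_shadow_models=3, member_ratio=0.5, seed=0):
    """Return one (member_x, member_y, nonmember_x, nonmember_y) tuple per shadow model."""
    rng = np.random.default_rng(seed)
    # Disjoint variant: each shadow model gets its own slice of the shuffled data.
    shuffled = rng.permutation(len(x))
    splits = []
    for chunk in np.array_split(shuffled, num_shadow_models):
        # The first member_ratio share of the slice would train the shadow model.
        n_members = int(len(chunk) * member_ratio)
        member_idx, nonmember_idx = chunk[:n_members], chunk[n_members:]
        splits.append((x[member_idx], y[member_idx], x[nonmember_idx], y[nonmember_idx]))
    return splits


x = np.arange(24, dtype=float).reshape(12, 2)
y = np.arange(12) % 3
splits = split_for_shadow_models(x, y, num_shadow_models=3, member_ratio=0.5)
```

Each shadow model would be fitted on its member portion and then queried on both portions, giving the (samples, true_label, model_prediction) triples described above.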

generate_synthetic_shadow_dataset(target_classifier: CLASSIFIER_TYPE, dataset_size: int, max_features_randomized: Optional[int], member_ratio: float = 0.5, min_confidence: float = 0.4, max_retries: int = 6, random_record_fn: Callable[[], ndarray] = None, randomize_features_fn: Callable[[ndarray, int], ndarray] = None) → Tuple[Tuple[ndarray, ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]

Generates a shadow dataset (member and nonmember samples and their corresponding model predictions) by training the shadow models on a synthetic dataset generated from the target classifier using the hill-climbing algorithm from R. Shokri et al. (2017).

Paper Link: https://arxiv.org/abs/1610.05820

Return type:

Tuple

Parameters:
  • target_classifier – The classifier to synthesize data from.

  • dataset_size (int) – How many records to synthesize.

  • max_features_randomized – The initial number of features to randomize before fine-tuning. If None, half of the record's features will be used, which will not work well for one-hot encoded data.

  • member_ratio (float) – Fraction of the data that should be used to train the shadow models. Must be between 0 and 1.

  • min_confidence (float) – The minimum confidence the classifier must assign to the target class for the record to be accepted (i.e. for the hill-climbing algorithm to finish).

  • max_retries (int) – The maximum number of record-generation retries. The initial random pick of a record for the hill-climbing algorithm might fail to optimize the target-class confidence, in which case a new random record is tried.

  • random_record_fn (Callable) – Callback that returns a single random record (numpy array), i.e. all feature values are random. If None, random records are generated by treating each column in the input shape as a feature and choosing uniform values [0, 1) for each feature. This default behaviour is not correct for one-hot-encoded features, and a custom callback which provides a random record with random one-hot-encoded values should be used instead.

  • randomize_features_fn (Callable) – Callback that accepts an existing record (numpy array) and an int which is the number of features to randomize. The callback should return a new record, where the specified number of features have been randomized. If None, records are randomized by treating each column in the input shape as a feature, and choosing uniform values [0, 1) for each randomized feature. This default behaviour is not correct for one-hot-encoded features, and a custom callback which randomizes one-hot-encoded features should be used instead.

Returns:

The shadow dataset generated. The shape is ((member_samples, true_label, model_prediction), (nonmember_samples, true_label, model_prediction)).
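
A simplified sketch of the hill-climbing idea is shown below. It is a reduced variant written for illustration, not ART's implementation and not the full procedure from the paper (for example, the paper's probabilistic acceptance step is left out). The toy classifier toy_predict and the parameters max_iterations and max_rejections are hypothetical; random_record and randomize_features mimic the default callbacks described above by drawing uniform values in [0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_FEATURES = 4


def toy_predict(record):
    """Hypothetical 2-class target: softmax over two fixed linear scores."""
    scores = np.array([record.sum(), (1.0 - record).sum()])
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def random_record():
    """Default-style random record: uniform values in [0, 1) per feature."""
    return rng.uniform(0.0, 1.0, size=NUM_FEATURES)


def randomize_features(record, k):
    """Return a copy of record with k randomly chosen features re-drawn."""
    new = record.copy()
    idx = rng.choice(NUM_FEATURES, size=k, replace=False)
    new[idx] = rng.uniform(0.0, 1.0, size=k)
    return new


def synthesize(target_class, min_confidence=0.4, max_features_randomized=2,
               max_iterations=200, max_rejections=5):
    """Hill-climb a random record until the target class is confident enough."""
    best = random_record()
    best_conf = toy_predict(best)[target_class]
    k, rejections = max_features_randomized, 0
    for _ in range(max_iterations):
        probs = toy_predict(best)
        if best_conf >= min_confidence and probs.argmax() == target_class:
            return best
        candidate = randomize_features(best, k)
        conf = toy_predict(candidate)[target_class]
        if conf >= best_conf:
            # Keep the candidate only if it does not lower the confidence.
            best, best_conf, rejections = candidate, conf, 0
        else:
            rejections += 1
            if rejections > max_rejections:
                # Narrow the search: randomize fewer features per step.
                k, rejections = max(1, k // 2), 0
    return None  # the caller may retry with a fresh random record


record = synthesize(target_class=0, min_confidence=0.7)
```

Returning None on failure corresponds to the situation max_retries is meant for: the caller starts again from a new random record.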

get_shadow_models() → Sequence[CLONABLE]

Returns the list of shadow models. generate_shadow_dataset or generate_synthetic_shadow_dataset must be called for the shadow models to be trained.

get_shadow_models_train_sets() → List[Optional[Tuple[ndarray, ndarray]]]

Returns a list of tuples of the form (shadow_x_train, shadow_y_train), one for each shadow model. generate_shadow_dataset or generate_synthetic_shadow_dataset must be called first; otherwise a list of Nones is returned.