`art.attacks.inference.membership_inference`

Module providing membership inference attacks.

Membership Inference Black-Box

class art.attacks.inference.membership_inference.MembershipInferenceBlackBox(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, input_type: str = 'prediction', attack_model_type: str = 'nn', attack_model: Any | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Implementation of a learned black-box membership inference attack.

The attack model can be trained on the target model's predictions (probabilities or logits) or on its losses, depending on the type of model and the provided configuration.

__init__(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, input_type: str = 'prediction', attack_model_type: str = 'nn', attack_model: Any | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Create a MembershipInferenceBlackBox attack instance.

Parameters:

estimator – Target estimator.
attack_model_type (str) – the type of default attack model to train, optional. Should be one of: nn (neural network, default), rf (random forest), gb (gradient boosting), lr (logistic regression), dt (decision tree), knn (k nearest neighbors), svm (support vector machine). If attack_model is supplied, this option will be ignored.
input_type (str) – the type of input to train the attack on. Can be one of: ‘prediction’ or ‘loss’. Default is prediction. Predictions can be either probabilities or logits, depending on the return type of the model. If the model is a regressor, only loss can be used.
attack_model – The attack model to train, optional. If none is provided, a default model will be created.
nn_model_epochs (int) – the number of epochs to use when training a nn attack model
nn_model_batch_size (int) – the batch size to use when training a nn attack model
nn_model_learning_rate (float) – the learning rate to use when training a nn attack model

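fit(x: ndarray | None = None, y: ndarray | None = None, test_x: ndarray | None = None, test_y: ndarray | None = None, pred: ndarray | None = None, test_pred: ndarray | None = None, **kwargs)

Train the attack model.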

Parameters:

x – Records that were used in training the target estimator. Can be None if supplying pred.
y – True labels for x. If not supplied, attack will be based solely on model predictions.
test_x – Records that were not used in training the target estimator. Can be None if supplying test_pred.
test_y – True labels for test_x. If not supplied, attack will be based solely on model predictions.
pred – Estimator predictions for the records. If not supplied, they will be generated by calling the estimator's predict function. Only relevant for input_type='prediction'.
test_pred – Estimator predictions for the test records. If not supplied, they will be generated by calling the estimator's predict function. Only relevant for input_type='prediction'.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member.

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray

Infer membership in the training set of the target estimator.

Return type:

ndarray

Parameters:

x (ndarray) – Input records to attack. Can be None if supplying pred.
y – True labels for x. If not supplied, attack will be based solely on model predictions.
pred – Estimator predictions for the records. If not supplied, they will be generated by calling the estimator's predict function. Only relevant for input_type='prediction'.
probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.
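To make the roles of the member records (x), the non-member records (test_x) and input_type more concrete, the following is a minimal, dependency-free sketch of how a training set for a learned attack could be assembled. It is not ART's implementation: the helper names `cross_entropy` and `build_attack_dataset`, the exact feature layout, and the toy probability vectors are assumptions made for illustration.

```python
import math


def cross_entropy(probs, label):
    """Per-sample cross-entropy loss of a probability vector for the true label."""
    return -math.log(max(probs[label], 1e-12))


def build_attack_dataset(member_preds, member_labels,
                         nonmember_preds, nonmember_labels,
                         input_type="prediction"):
    """Hypothetical helper: return (features, membership) for an attack model.

    Members receive the target 1 and non-members the target 0.
    """
    features, membership = [], []
    groups = ((member_preds, member_labels, 1),
              (nonmember_preds, nonmember_labels, 0))
    for preds, labels, target in groups:
        for probs, label in zip(preds, labels):
            if input_type == "prediction":
                # Assumed layout: prediction vector followed by the true label.
                features.append(list(probs) + [label])
            elif input_type == "loss":
                features.append([cross_entropy(probs, label)])
            else:
                raise ValueError("input_type must be 'prediction' or 'loss'")
            membership.append(target)
    return features, membership


# Toy predictions: the model is more confident on records it was trained on.
member_preds = [[0.95, 0.05], [0.10, 0.90]]
member_labels = [0, 1]
nonmember_preds = [[0.60, 0.40], [0.45, 0.55]]
nonmember_labels = [0, 0]

loss_features, membership = build_attack_dataset(
    member_preds, member_labels, nonmember_preds, nonmember_labels,
    input_type="loss")
pred_features, _ = build_attack_dataset(
    member_preds, member_labels, nonmember_preds, nonmember_labels,
    input_type="prediction")
```

Any binary classifier trained on such (features, membership) pairs plays the role of the attack model; in this toy data the member losses are visibly smaller than the non-member losses, which is the signal the attack learns from.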

Membership Inference Black-Box Rule-Based

class art.attacks.inference.membership_inference.MembershipInferenceBlackBoxRuleBased(classifier: CLASSIFIER_TYPE)

Implementation of a simple, rule-based black-box membership inference attack.

This implementation uses the simple rule: if the model’s prediction for a sample is correct, then it is a member. Otherwise, it is not a member.

__init__(classifier: CLASSIFIER_TYPE)

Create a MembershipInferenceBlackBoxRuleBased attack instance.

Parameters:

classifier – Target classifier.

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray

Infer membership in the training set of the target estimator.

Return type:

ndarray

Parameters:

x (ndarray) – Input records to attack.
y – True labels for x.
probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.
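The rule this class applies (a correct prediction implies membership) is simple enough to sketch in a few lines. The function below is an illustrative stand-in rather than ART's code; its name `rule_based_membership` and the toy labels are assumptions.

```python
def rule_based_membership(predicted_labels, true_labels):
    """Return 1 where the prediction matches the true label, otherwise 0."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predicted_labels and true_labels must have equal length")
    return [int(p == t) for p, t in zip(predicted_labels, true_labels)]


# Toy example: records 0 and 2 are classified correctly, so they are
# flagged as members; records 1 and 3 are flagged as non-members.
inferred = rule_based_membership([0, 2, 1, 1], [0, 1, 1, 0])
```

Because the rule needs the true labels, y is effectively required for this attack even though the signature marks it as optional.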

Membership Inference Label-Only - Decision Boundary

class art.attacks.inference.membership_inference.LabelOnlyDecisionBoundary(estimator: CLASSIFIER_TYPE, distance_threshold_tau: float | None = None)

Implementation of Label-Only Inference Attack based on Decision Boundary.

You only need to call ONE of the calibrate methods, depending on which attack you want to launch.

Paper link: https://arxiv.org/abs/2007.14321 (Choquette-Choo et al.)

Paper link: https://arxiv.org/abs/2007.15528 (Li and Zhang)

__init__(estimator: CLASSIFIER_TYPE, distance_threshold_tau: float | None = None)

Create a LabelOnlyDecisionBoundary instance for Label-Only Inference Attack based on Decision Boundary.

Parameters:

estimator – A trained classification estimator.
distance_threshold_tau – Threshold distance for decision boundary. Samples with boundary distances larger than threshold are considered members of the training dataset.

calibrate_distance_threshold(x_train: ndarray, y_train: ndarray, x_test: ndarray, y_test: ndarray, **kwargs)

Calibrate distance threshold maximising the membership inference accuracy on x_train and x_test.

Paper link: https://arxiv.org/abs/2007.14321

Parameters:

x_train (ndarray) – Training data.
y_train (ndarray) – Labels of training data x_train.
x_test (ndarray) – Test data.
y_test (ndarray) – Labels of test data x_test.

Keyword Arguments for HopSkipJump:

norm: Order of the norm. Possible values: “inf”, np.inf or 2.
max_iter: Maximum number of iterations.
max_eval: Maximum number of evaluations for estimating gradient.
init_eval: Initial number of evaluations for estimating gradient.
init_size: Maximum number of trials for initial generation of adversarial examples.
verbose: Show progress bars.
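The calibration idea (pick the threshold that best separates training records from test records by their distance to the decision boundary) can be sketched without any attack machinery. The snippet below assumes the boundary distances have already been estimated, for example with HopSkipJump; the function name `calibrate_threshold` and the toy distances are hypothetical, and the set of candidate thresholds ART actually searches may differ.

```python
def calibrate_threshold(member_distances, nonmember_distances):
    """Return (tau, accuracy) maximising membership-inference accuracy.

    A record is inferred to be a member when its distance exceeds tau.
    """
    candidates = sorted(set(member_distances) | set(nonmember_distances))
    total = len(member_distances) + len(nonmember_distances)
    best_tau, best_accuracy = None, -1.0
    for tau in candidates:
        correct = sum(d > tau for d in member_distances)
        correct += sum(d <= tau for d in nonmember_distances)
        accuracy = correct / total
        if accuracy > best_accuracy:  # keep the smallest tau on ties
            best_tau, best_accuracy = tau, accuracy
    return best_tau, best_accuracy


# Toy boundary distances: training records tend to lie further from the boundary.
train_distances = [0.9, 1.2, 0.7, 1.5]
test_distances = [0.2, 0.4, 0.8, 0.3]

tau, accuracy = calibrate_threshold(train_distances, test_distances)
```

With these toy values, the threshold 0.4 classifies seven of the eight records correctly.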

calibrate_distance_threshold_unsupervised(top_t: int = 50, num_samples: int = 100, max_queries: int = 1, **kwargs)

Calibrate distance threshold on randomly generated samples, choosing the top-t percentile of the noise needed to change the classifier’s initial prediction. This method requires the model’s clip_values to be set.

Paper link: https://arxiv.org/abs/2007.15528

Parameters:

top_t (int) – Top-t percentile.
num_samples (int) – Number of random samples to generate.
max_queries (int) – Maximum number of queries. Maximum number of HSJ iterations on a single sample will be max_queries * max_iter.

Keyword Arguments for HopSkipJump:

norm: Order of the norm. Possible values: “inf”, np.inf or 2.
max_iter: Maximum number of iterations.
max_eval: Maximum number of evaluations for estimating gradient.
init_eval: Initial number of evaluations for estimating gradient.
init_size: Maximum number of trials for initial generation of adversarial examples.
verbose: Show progress bars.
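As a rough illustration of the unsupervised calibration, the sketch below draws random samples, measures how far each lies from a decision boundary, and takes a percentile of those distances as the threshold. Everything here is an assumption made for illustration: the one-dimensional toy boundary replaces the HopSkipJump distance estimate, "top-t percentile" is read as the top_t-th percentile, and the helper names are hypothetical.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def toy_boundary_distance(sample, boundary=0.5):
    """Hypothetical stand-in: distance of a 1-D sample to a toy decision boundary."""
    return abs(sample - boundary)


def calibrate_unsupervised(top_t=50, num_samples=100, seed=0):
    """Pick tau as the top_t-th percentile of distances on random samples."""
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(num_samples)]  # uniform in [0, 1)
    distances = [toy_boundary_distance(s) for s in samples]
    return percentile(distances, top_t)


tau = calibrate_unsupervised(top_t=50, num_samples=100)
```

Random samples are, by construction, not training members, so their distances give a reference level for non-members without requiring any labelled data.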

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray

Infer membership of input x in estimator’s training data.

Return type:

ndarray

Parameters:

x (ndarray) – Input data.
y – True labels for x.
probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class

Keyword Arguments for HopSkipJump:

norm: Order of the norm. Possible values: “inf”, np.inf or 2.
max_iter: Maximum number of iterations.
max_eval: Maximum number of evaluations for estimating gradient.
init_eval: Initial number of evaluations for estimating gradient.
init_size: Maximum number of trials for initial generation of adversarial examples.
verbose: Show progress bars.

Returns:

An array holding the inferred membership status, 1 indicates a member and 0 indicates non-member, or class probabilities.

Membership Inference Label-Only - Gap Attack

art.attacks.inference.membership_inference.LabelOnlyGapAttack: alias of MembershipInferenceBlackBoxRuleBased

Shadow Models

class art.attacks.inference.membership_inference.ShadowModels(shadow_model_template: CLONABLE, num_shadow_models: int = 3, disjoint_datasets=False, random_state=None)

Utility for training shadow models and generating shadow-datasets for membership inference attacks in scikit-learn, PyTorch and TensorFlow v2.

__init__(shadow_model_template: CLONABLE, num_shadow_models: int = 3, disjoint_datasets=False, random_state=None)

Initializes shadow models using the provided template.

Parameters:

shadow_model_template – Untrained classifier model to be used as a template for shadow models. Should be as similar as possible to the target model. Must implement clone_for_refitting method.
num_shadow_models (int) – How many shadow models to train to generate the shadow dataset.
disjoint_datasets (bool) – A boolean indicating whether the datasets used to train each shadow model should be disjoint. Default is False.
random_state – Seed for the numpy default random number generator.

generate_shadow_dataset(x: ndarray, y: ndarray, member_ratio: float = 0.5) → Tuple[Tuple[ndarray, ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]

Generates a shadow dataset (member and nonmember samples and their corresponding model predictions) by splitting the dataset into training and testing samples, and then training the shadow models on the result.

Parameters:

x (ndarray) – The samples used to train the shadow models.
y (ndarray) – True labels for the dataset samples (as expected by the estimator’s fit method).
member_ratio (float) – Fraction of the data that should be used to train the shadow models. Must be between 0 and 1.

Returns:

The shadow dataset generated. The shape is ((member_samples, true_label, model_prediction), (nonmember_samples, true_label, model_prediction)).
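The structure of the returned shadow dataset can be illustrated with a single toy shadow model. The sketch below is not ART's implementation: `MemorizingModel` is a hypothetical, deliberately overfitted model, and `generate_shadow_dataset_sketch` only mimics the split by member_ratio and the ((member ...), (nonmember ...)) return structure described above.

```python
import random


class MemorizingModel:
    """Toy shadow model (hypothetical): confident on seen records, uniform otherwise."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.seen = {}

    def fit(self, x, y):
        self.seen = {tuple(record): label for record, label in zip(x, y)}

    def predict(self, x):
        uniform = [1.0 / self.num_classes] * self.num_classes
        predictions = []
        for record in x:
            label = self.seen.get(tuple(record))
            if label is None:
                predictions.append(list(uniform))
            else:
                predictions.append(
                    [1.0 if c == label else 0.0 for c in range(self.num_classes)])
        return predictions


def generate_shadow_dataset_sketch(x, y, model, member_ratio=0.5, seed=0):
    """Split (x, y), train the model on the member part, and collect predictions."""
    if not 0.0 < member_ratio < 1.0:
        raise ValueError("member_ratio must be between 0 and 1")
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * member_ratio)
    member_x = [x[i] for i in indices[:cut]]
    member_y = [y[i] for i in indices[:cut]]
    nonmember_x = [x[i] for i in indices[cut:]]
    nonmember_y = [y[i] for i in indices[cut:]]
    model.fit(member_x, member_y)
    return ((member_x, member_y, model.predict(member_x)),
            (nonmember_x, nonmember_y, model.predict(nonmember_x)))


x = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [0.8, 0.9], [1.0, 1.1]]
y = [0, 1, 0, 1, 0, 1]
members, nonmembers = generate_shadow_dataset_sketch(
    x, y, MemorizingModel(num_classes=2), member_ratio=0.5)
member_x, member_y, member_pred = members
nonmember_x, nonmember_y, nonmember_pred = nonmembers
```

The two tuples map directly onto the training inputs of a learned attack: the member tuple supplies records, labels and predictions for members, and the non-member tuple supplies the same for non-members.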

generate_synthetic_shadow_dataset(target_classifier: CLASSIFIER_TYPE, dataset_size: int, max_features_randomized: int | None, member_ratio: float = 0.5, min_confidence: float = 0.4, max_retries: int = 6, random_record_fn: Callable[[], ndarray] | None = None, randomize_features_fn: Callable[[ndarray, int], ndarray] | None = None) → Tuple[Tuple[ndarray, ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]

Generates a shadow dataset (member and nonmember samples and their corresponding model predictions) by training the shadow models on a synthetic dataset generated from the target classifier using the hill climbing algorithm from R. Shokri et al. (2017).

Paper Link: https://arxiv.org/abs/1610.05820

Parameters:

target_classifier – The classifier to synthesize data from.
dataset_size (int) – How many records to synthesize.
max_features_randomized – The initial number of features to randomize before fine-tuning. If None, half of the record's features will be used, which will not work well for one-hot encoded data.
member_ratio (float) – Fraction of the data that should be used to train the shadow models. Must be between 0 and 1.
min_confidence (float) – The minimum confidence the classifier assigns the target class for the record to be accepted (i.e. the hill-climbing algorithm is finished).
max_retries (int) – The maximum number of record-generation retries. The initial random pick of a record for the hill-climbing algorithm might result in failing to optimize the target-class confidence, in which case a new random record is tried.
random_record_fn – Callback that returns a single random record (numpy array), i.e. all feature values are random. If None, random records are generated by treating each column in the input shape as a feature and choosing uniform values [0, 1) for each feature. This default behaviour is not correct for one-hot-encoded features, and a custom callback which provides a random record with random one-hot-encoded values should be used instead.
randomize_features_fn – Callback that accepts an existing record (numpy array) and an int which is the number of features to randomize. The callback should return a new record, where the specified number of features have been randomized. If None, records are randomized by treating each column in the input shape as a feature, and choosing uniform values [0, 1) for each randomized feature. This default behaviour is not correct for one-hot-encoded features, and a custom callback which randomizes one-hot-encoded features should be used instead.

Returns:

The shadow dataset generated. The shape is ((member_samples, true_label, model_prediction), (nonmember_samples, true_label, model_prediction)).
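The interplay of max_features_randomized, min_confidence and max_retries is easier to follow in code. The following is a simplified, greedy variant of the hill-climbing idea, not the exact algorithm from the paper nor ART's implementation. The toy target model, the function names, and the max_iterations and max_rejections parameters are all assumptions introduced for illustration.

```python
import random


def toy_predict_proba(record):
    """Hypothetical binary target model: class-1 confidence is the feature mean."""
    p1 = sum(record) / len(record)
    return [1.0 - p1, p1]


def randomize_features(record, k, rng):
    """Return a copy of record with k randomly chosen features resampled in [0, 1)."""
    new_record = list(record)
    for idx in rng.sample(range(len(record)), k):
        new_record[idx] = rng.random()
    return new_record


def accepted(predict_proba, record, target_class, min_confidence):
    """A record is accepted when the target class is the top class and confident enough."""
    probs = predict_proba(record)
    return probs[target_class] >= min_confidence and probs[target_class] == max(probs)


def synthesize_record(predict_proba, target_class, num_features, rng,
                      max_features_randomized, min_confidence=0.4,
                      max_iterations=500, max_rejections=10):
    """Greedy hill climbing on the target-class confidence; None on failure."""
    k = max_features_randomized
    best = [rng.random() for _ in range(num_features)]
    best_conf = predict_proba(best)[target_class]
    rejections = 0
    for _ in range(max_iterations):
        if accepted(predict_proba, best, target_class, min_confidence):
            return best
        candidate = randomize_features(best, k, rng)
        conf = predict_proba(candidate)[target_class]
        if conf >= best_conf:
            best, best_conf, rejections = candidate, conf, 0
        else:
            rejections += 1
            if rejections > max_rejections:
                # Too many failed proposals: fine-tune with fewer features.
                k, rejections = max(1, k // 2), 0
    return best if accepted(predict_proba, best, target_class, min_confidence) else None


def synthesize_with_retries(predict_proba, target_class, num_features, rng,
                            max_features_randomized, min_confidence=0.4,
                            max_retries=6):
    """Restart from a fresh random record when a climb fails."""
    for _ in range(max_retries):
        record = synthesize_record(predict_proba, target_class, num_features, rng,
                                   max_features_randomized, min_confidence)
        if record is not None:
            return record
    raise RuntimeError("could not synthesize a record within max_retries")


rng = random.Random(0)
record = synthesize_with_retries(toy_predict_proba, target_class=1, num_features=4,
                                 rng=rng, max_features_randomized=2,
                                 min_confidence=0.6)
```

The default uniform sampling in `randomize_features` mirrors the caveat above: for one-hot-encoded features a custom randomization callback would be needed so that the encoding stays valid.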

get_shadow_models() → Sequence[CLONABLE]

Returns the list of shadow models. generate_shadow_dataset or generate_synthetic_shadow_dataset must be called for the shadow models to be trained.

get_shadow_models_train_sets() → List[Tuple[ndarray, ndarray] | None]

Returns a list of tuples of the form (shadow_x_train, shadow_y_train), one for each shadow model. generate_shadow_dataset or generate_synthetic_shadow_dataset must be called first; otherwise a list of Nones is returned.
