art.attacks.inference.membership_inference
Module providing membership inference attacks.
Membership Inference Black-Box
- class art.attacks.inference.membership_inference.MembershipInferenceBlackBox(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, input_type: str = 'prediction', attack_model_type: str = 'nn', attack_model: Any | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)
Implementation of a learned black-box membership inference attack.
Depending on the type of model and the provided configuration, the attack model is trained on the target's probabilities/logits or on its losses.
- __init__(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, input_type: str = 'prediction', attack_model_type: str = 'nn', attack_model: Any | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)
Create a MembershipInferenceBlackBox attack instance.
- Parameters:
estimator – Target estimator.
attack_model_type (str) – the type of default attack model to train, optional. Should be one of: nn (neural network, default), rf (random forest), gb (gradient boosting), lr (logistic regression), dt (decision tree), knn (k nearest neighbors), svm (support vector machine). If attack_model is supplied, this option will be ignored.
input_type (str) – the type of input to train the attack on. Can be one of: ‘prediction’ or ‘loss’. Default is ‘prediction’. Predictions can be either probabilities or logits, depending on the return type of the model. If the model is a regressor, only ‘loss’ can be used.
attack_model – The attack model to train, optional. If none is provided, a default model will be created.
nn_model_epochs (int) – the number of epochs to use when training a nn attack model.
nn_model_batch_size (int) – the batch size to use when training a nn attack model.
nn_model_learning_rate (float) – the learning rate to use when training a nn attack model.
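To make the two input_type options concrete, the plain-Python sketch below shows what the attack feature for a single record could look like in each case. It assumes a classifier that returns class probabilities and uses cross-entropy of the true class as the loss; both are illustrative assumptions, since the real attack relies on the target estimator's own outputs and loss.

```python
import math
from typing import List, Sequence


def prediction_features(probs: Sequence[float]) -> List[float]:
    """input_type='prediction': the model's output vector is used as the feature."""
    return list(probs)


def loss_feature(probs: Sequence[float], true_label: int, eps: float = 1e-12) -> List[float]:
    """input_type='loss': a single per-record loss value.

    Assumption for illustration: cross-entropy of the true class.
    """
    return [-math.log(max(probs[true_label], eps))]


# A confident, correct prediction (typical of training members) gives a small loss;
# a less confident one (more typical of unseen records) gives a larger loss.
member_like = loss_feature([0.05, 0.95], true_label=1)
nonmember_like = loss_feature([0.40, 0.60], true_label=1)
```

The loss variant compresses each record to one number, which is why it also works for regressors, where no probability vector exists.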
- fit(x: ndarray, y: ndarray, test_x: ndarray, test_y: ndarray, pred: ndarray | None = None, test_pred: ndarray | None = None, **kwargs)
Train the attack model.
- Parameters:
x (ndarray) – Records that were used in training the target estimator. Can be None if supplying pred.
y (ndarray) – True labels for x.
test_x (ndarray) – Records that were not used in training the target estimator. Can be None if supplying test_pred.
test_y (ndarray) – True labels for test_x.
pred – Estimator predictions for the records. If not supplied, they will be generated by calling the estimator’s predict function. Only relevant for input_type=’prediction’.
test_pred – Estimator predictions for the test records. If not supplied, they will be generated by calling the estimator’s predict function. Only relevant for input_type=’prediction’.
- Returns:
None. The attack model is trained in place; membership is then inferred with infer.
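fit treats x (records used to train the target) as members and test_x as non-members. The dependency-free sketch below shows how such a labelled training set for the attack model could be assembled from per-record features; build_attack_dataset is a hypothetical helper and does not mirror ART's internals.

```python
from typing import List, Sequence, Tuple

Feature = List[float]


def build_attack_dataset(
    member_features: Sequence[Feature],
    nonmember_features: Sequence[Feature],
) -> Tuple[List[Feature], List[int]]:
    """Hypothetical helper: stack features and attach membership labels.

    Records from the target's training set get label 1 (member) and held-out
    records get label 0 (non-member), matching the 1/0 convention of infer.
    """
    features = [list(f) for f in member_features] + [list(f) for f in nonmember_features]
    labels = [1] * len(member_features) + [0] * len(nonmember_features)
    return features, labels


x_attack, y_attack = build_attack_dataset(
    member_features=[[0.05, 0.95], [0.90, 0.10]],
    nonmember_features=[[0.40, 0.60]],
)
```

Any binary classifier (for example one of the attack_model_type options) can then be trained on x_attack and y_attack.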
- infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray
Infer membership in the training set of the target estimator.
- Return type:
ndarray
- Parameters:
x (ndarray) – Input records to attack. Can be None if supplying pred.
y – True labels for x.
pred – Estimator predictions for the records. If not supplied, they will be generated by calling the estimator’s predict function. Only relevant for input_type=’prediction’.
probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.
- Returns:
An array holding the inferred membership status (1 indicates a member, 0 a non-member), or class probabilities when probabilities is True.
Membership Inference Black-Box Rule-Based
- class art.attacks.inference.membership_inference.MembershipInferenceBlackBoxRuleBased(classifier: CLASSIFIER_TYPE)
Implementation of a simple, rule-based black-box membership inference attack.
This implementation uses the simple rule: if the model’s prediction for a sample is correct, then it is a member. Otherwise, it is not a member.
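The rule fits in a few lines. The sketch below assumes the target returns one probability (or logit) vector per record and that the labels are integer class indices; both are assumptions made for this illustration, which shows the rule rather than the library code.

```python
from typing import List, Sequence


def rule_based_membership(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Return 1 (member) when the predicted class matches the true label, else 0."""
    result = []
    for row, label in zip(probs, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        result.append(1 if predicted == label else 0)
    return result


membership = rule_based_membership(
    probs=[[0.1, 0.9], [0.7, 0.3], [0.2, 0.8]],
    labels=[1, 1, 1],
)
```

Because only the predicted label is used, the same rule works in a label-only setting, which is why it doubles as the gap attack.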
- __init__(classifier: CLASSIFIER_TYPE)
Create a MembershipInferenceBlackBoxRuleBased attack instance.
- Parameters:
classifier – Target classifier.
- infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray
Infer membership in the training set of the target estimator.
- Return type:
ndarray
- Parameters:
x (ndarray) – Input records to attack.
y – True labels for x.
probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.
- Returns:
An array holding the inferred membership status (1 indicates a member, 0 a non-member), or class probabilities when probabilities is True.
Membership Inference Label-Only - Decision Boundary
- class art.attacks.inference.membership_inference.LabelOnlyDecisionBoundary(estimator: CLASSIFIER_TYPE, distance_threshold_tau: float | None = None)
Implementation of Label-Only Inference Attack based on Decision Boundary.
You only need to call ONE of the calibrate methods, depending on which attack you want to launch.
Paper link: https://arxiv.org/abs/2007.14321 (Choquette-Choo et al.)
Paper link: https://arxiv.org/abs/2007.15528 (Li and Zhang)
- __init__(estimator: CLASSIFIER_TYPE, distance_threshold_tau: float | None = None)
Create a LabelOnlyDecisionBoundary instance for Label-Only Inference Attack based on Decision Boundary.
- Parameters:
estimator – A trained classification estimator.
distance_threshold_tau – Threshold distance for the decision boundary. Samples with boundary distances larger than the threshold are considered members of the training dataset.
- calibrate_distance_threshold(x_train: ndarray, y_train: ndarray, x_test: ndarray, y_test: ndarray, **kwargs)
Calibrate distance threshold maximising the membership inference accuracy on x_train and x_test.
Paper link: https://arxiv.org/abs/2007.14321
- Parameters:
x_train (ndarray) – Training data.
y_train (ndarray) – Labels of training data x_train.
x_test (ndarray) – Test data.
y_test (ndarray) – Labels of test data x_test.
- Keyword Arguments for HopSkipJump:
norm: Order of the norm. Possible values: “inf”, np.inf or 2.
max_iter: Maximum number of iterations.
max_eval: Maximum number of evaluations for estimating gradient.
init_eval: Initial number of evaluations for estimating gradient.
init_size: Maximum number of trials for initial generation of adversarial examples.
verbose: Show progress bars.
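Once a boundary distance has been estimated for every record (the HopSkipJump keyword arguments above indicate that HopSkipJump is used for this step; here the distances are simply given as numbers), choosing the threshold is a one-dimensional search. The sketch below tries every observed distance as a candidate tau and keeps the one with the highest membership-inference accuracy; the candidate set and tie-breaking are assumptions and may differ from ART's implementation.

```python
from typing import List, Sequence


def infer_membership(distances: Sequence[float], tau: float) -> List[int]:
    """Records whose boundary distance exceeds tau are predicted members (1)."""
    return [1 if d > tau else 0 for d in distances]


def calibrate_tau(member_distances: Sequence[float], nonmember_distances: Sequence[float]) -> float:
    """Choose tau maximising membership-inference accuracy on labelled distances."""
    distances = list(member_distances) + list(nonmember_distances)
    truth = [1] * len(member_distances) + [0] * len(nonmember_distances)
    best_tau, best_acc = 0.0, -1.0
    # Candidate thresholds: 0 plus every observed distance (a simplifying assumption).
    for tau in [0.0] + sorted(distances):
        predicted = infer_membership(distances, tau)
        acc = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
        if acc > best_acc:  # keep the first (smallest) tau reaching the best accuracy
            best_tau, best_acc = tau, acc
    return best_tau


tau = calibrate_tau(member_distances=[0.9, 1.2, 0.8], nonmember_distances=[0.1, 0.3, 0.5])
```

Here members sit farther from the boundary than non-members, so the search settles on tau = 0.5 and separates the two groups perfectly.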
- calibrate_distance_threshold_unsupervised(top_t: int = 50, num_samples: int = 100, max_queries: int = 1, **kwargs)
Calibrate distance threshold on randomly generated samples, choosing the top-t percentile of the noise needed to change the classifier’s initial prediction. This method requires the model’s clip_values to be set.
Paper link: https://arxiv.org/abs/2007.15528
- Parameters:
top_t (int) – Top-t percentile.
num_samples (int) – Number of random samples to generate.
max_queries (int) – Maximum number of queries. The maximum number of HopSkipJump (HSJ) iterations on a single sample will be max_queries * max_iter.
- Keyword Arguments for HopSkipJump:
norm: Order of the norm. Possible values: “inf”, np.inf or 2.
max_iter: Maximum number of iterations.
max_eval: Maximum number of evaluations for estimating gradient.
init_eval: Initial number of evaluations for estimating gradient.
init_size: Maximum number of trials for initial generation of adversarial examples.
verbose: Show progress bars.
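In the unsupervised variant no labelled members are needed: distances are measured on randomly generated samples, which are assumed not to be training members, and tau is read off their distribution. The sketch below assumes tau is the top_t-th percentile of those distances, computed with linear interpolation (the same rule as NumPy's default percentile); this reading of "top-t percentile" is an assumption to check against the library source.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def calibrate_tau_unsupervised(random_sample_distances: Sequence[float], top_t: int = 50) -> float:
    """Assumption: tau is the top_t-th percentile of distances on random samples."""
    return percentile(random_sample_distances, top_t)


tau = calibrate_tau_unsupervised([0.2, 0.4, 0.1, 0.3, 0.5], top_t=50)
```

With the default top_t = 50 this is the median distance of the random samples.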
- infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray
Infer membership of input x in estimator’s training data.
- Return type:
ndarray
- Parameters:
x (ndarray) – Input data.
y – True labels for x.
probabilities – a boolean indicating whether to return the predicted probabilities per class, or just the predicted class.
- Keyword Arguments for HopSkipJump:
norm: Order of the norm. Possible values: “inf”, np.inf or 2.
max_iter: Maximum number of iterations.
max_eval: Maximum number of evaluations for estimating gradient.
init_eval: Initial number of evaluations for estimating gradient.
init_size: Maximum number of trials for initial generation of adversarial examples.
verbose: Show progress bars.
- Returns:
An array holding the inferred membership status (1 indicates a member, 0 a non-member), or class probabilities when probabilities is True.
Membership Inference Label-Only - Gap Attack
- art.attacks.inference.membership_inference.LabelOnlyGapAttack
alias of MembershipInferenceBlackBoxRuleBased
Shadow Models
- class art.attacks.inference.membership_inference.ShadowModels(shadow_model_template: CLONABLE, num_shadow_models: int = 3, disjoint_datasets=False, random_state=None)
Utility for training shadow models and generating shadow datasets for membership inference attacks in scikit-learn, PyTorch and TensorFlow v2.
- __init__(shadow_model_template: CLONABLE, num_shadow_models: int = 3, disjoint_datasets=False, random_state=None)
Initializes shadow models using the provided template.
- Parameters:
shadow_model_template – Untrained classifier model to be used as a template for shadow models. Should be as similar as possible to the target model. Must implement clone_for_refitting method.
num_shadow_models (int) – How many shadow models to train to generate the shadow dataset.
disjoint_datasets (bool) – A boolean indicating whether the datasets used to train each shadow model should be disjoint. Default is False.
random_state – Seed for the numpy default random number generator.
- generate_shadow_dataset(x: ndarray, y: ndarray, member_ratio: float = 0.5) → Tuple[Tuple[ndarray, ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]
Generates a shadow dataset (member and nonmember samples and their corresponding model predictions) by splitting the dataset into training and testing samples, and then training the shadow models on the result.
- Parameters:
x (ndarray) – The samples used to train the shadow models.
y (ndarray) – True labels for the dataset samples (as expected by the estimator’s fit method).
member_ratio (float) – Fraction of the data that should be used to train the shadow models. Must be between 0 and 1.
- Returns:
The generated shadow dataset, structured as ((member_samples, true_label, model_prediction), (nonmember_samples, true_label, model_prediction)).
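generate_shadow_dataset divides the data between the shadow models and, for each model, into member and non-member parts according to member_ratio. The exact partitioning ART uses is not described here, so the index-level sketch below is an assumption: with disjoint_datasets=True each shadow model gets its own non-overlapping chunk, otherwise each model draws an independent shuffle of the full dataset. split_for_shadow_models is a hypothetical helper; training each shadow model on its members and recording its predictions on both parts would complete the shadow dataset.

```python
import random
from typing import List, Optional, Tuple

Split = Tuple[List[int], List[int]]  # (member indices, non-member indices)


def split_for_shadow_models(
    n_records: int,
    num_shadow_models: int = 3,
    member_ratio: float = 0.5,
    disjoint_datasets: bool = False,
    random_state: Optional[int] = None,
) -> List[Split]:
    """Hypothetical helper: per-shadow-model member/non-member index splits."""
    if not 0.0 < member_ratio < 1.0:
        raise ValueError("member_ratio must be between 0 and 1")
    rng = random.Random(random_state)
    indices = list(range(n_records))
    if disjoint_datasets:
        # One non-overlapping chunk per shadow model.
        rng.shuffle(indices)
        size = n_records // num_shadow_models
        subsets = [indices[i * size:(i + 1) * size] for i in range(num_shadow_models)]
    else:
        # Every shadow model sees its own random ordering of the whole dataset.
        subsets = []
        for _ in range(num_shadow_models):
            shuffled = indices[:]
            rng.shuffle(shuffled)
            subsets.append(shuffled)
    splits = []
    for subset in subsets:
        cut = int(len(subset) * member_ratio)
        splits.append((subset[:cut], subset[cut:]))
    return splits


splits = split_for_shadow_models(12, num_shadow_models=3, disjoint_datasets=True, random_state=0)
```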
- generate_synthetic_shadow_dataset(target_classifier: CLASSIFIER_TYPE, dataset_size: int, max_features_randomized: int | None, member_ratio: float = 0.5, min_confidence: float = 0.4, max_retries: int = 6, random_record_fn: Callable[[], ndarray] = None, randomize_features_fn: Callable[[ndarray, int], ndarray] = None) → Tuple[Tuple[ndarray, ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]
Generates a shadow dataset (member and nonmember samples and their corresponding model predictions) by training the shadow models on a synthetic dataset generated from the target classifier using the hill-climbing algorithm from R. Shokri et al. (2017).
Paper Link: https://arxiv.org/abs/1610.05820
- Parameters:
target_classifier – The classifier to synthesize data from.
dataset_size (int) – How many records to synthesize.
max_features_randomized – The initial number of features to randomize before fine-tuning. If None, half of the record's features will be used, which will not work well for one-hot encoded data.
member_ratio (float) – Fraction of the data that should be used to train the shadow models. Must be between 0 and 1.
min_confidence (float) – The minimum confidence the classifier must assign to the target class for the record to be accepted (i.e. for the hill-climbing algorithm to finish).
max_retries (int) – The maximum number of record-generation retries. The initial random pick of a record for the hill-climbing algorithm might fail to optimize the target-class confidence, in which case a new random record is tried.
random_record_fn – Callback that returns a single random record (numpy array), i.e. all feature values are random. If None, random records are generated by treating each column in the input shape as a feature and choosing uniform values in [0, 1) for each feature. This default behaviour is not correct for one-hot-encoded features; a custom callback that provides a random record with random one-hot-encoded values should be used instead.
randomize_features_fn – Callback that accepts an existing record (numpy array) and an int giving the number of features to randomize. The callback should return a new record in which the specified number of features have been randomized. If None, records are randomized by treating each column in the input shape as a feature and choosing uniform values in [0, 1) for each randomized feature. This default behaviour is not correct for one-hot-encoded features; a custom callback that randomizes one-hot-encoded features should be used instead.
- Returns:
The generated shadow dataset, structured as ((member_samples, true_label, model_prediction), (nonmember_samples, true_label, model_prediction)).
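The hill-climbing search of Shokri et al. can be sketched without the library. The version below synthesizes a single record against a hypothetical predict_proba callback and draws uniform [0, 1) features, mirroring the documented defaults of random_record_fn and randomize_features_fn. The iteration cap, rejection limit and halving schedule (max_iter, max_rejections) are assumptions loosely following the paper's description, not ART's parameters.

```python
import random
from typing import Callable, List, Optional, Sequence

Record = List[float]


def synthesize_record(
    predict_proba: Callable[[Record], Sequence[float]],
    target_class: int,
    n_features: int,
    max_features_randomized: int,
    min_confidence: float = 0.4,
    max_retries: int = 6,
    max_iter: int = 200,
    max_rejections: int = 10,
    rng: Optional[random.Random] = None,
) -> Optional[Record]:
    """Hill-climb random records until the target class is predicted confidently."""
    if rng is None:
        rng = random.Random()
    for _ in range(max_retries):  # restart from a fresh random record on failure
        x = [rng.random() for _ in range(n_features)]
        best_x, best_conf = x, 0.0
        k, rejections = min(max_features_randomized, n_features), 0
        for _ in range(max_iter):
            probs = predict_proba(x)
            conf = probs[target_class]
            if conf >= best_conf:
                predicted = max(range(len(probs)), key=probs.__getitem__)
                if conf > min_confidence and predicted == target_class and rng.random() < conf:
                    return x  # accepted with probability equal to the confidence
                best_x, best_conf, rejections = x, conf, 0
            else:
                rejections += 1
                if rejections > max_rejections:
                    k, rejections = max(1, k // 2), 0  # narrow the search
            # Randomize k features of the best record found so far.
            x = list(best_x)
            for i in rng.sample(range(n_features), k):
                x[i] = rng.random()
    return None


def toy_predict_proba(record: Record) -> List[float]:
    """Toy two-class model: class 1 confidence is the mean feature value."""
    p1 = sum(record) / len(record)
    return [1.0 - p1, p1]


record = synthesize_record(
    toy_predict_proba, target_class=1, n_features=4,
    max_features_randomized=2, rng=random.Random(0),
)
```

Treating every column as an independent uniform feature is exactly the default that the parameter descriptions warn against for one-hot-encoded data; a custom record generator is needed there.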
- get_shadow_models() → Sequence[CLONABLE]
Returns the list of shadow models. generate_shadow_dataset or generate_synthetic_shadow_dataset must be called for the shadow models to be trained.
- get_shadow_models_train_sets() → List[Tuple[ndarray, ndarray] | None]
Returns a list of tuples of the form (shadow_x_train, shadow_y_train), one for each shadow model. generate_shadow_dataset or generate_synthetic_shadow_dataset must be called first; otherwise a list of Nones is returned.