art.attacks.inference.attribute_inference

Module providing attribute inference attacks.

Attribute Inference Baseline

class art.attacks.inference.attribute_inference.AttributeInferenceBaseline(attack_model_type: str = 'nn', attack_model: CLASSIFIER_TYPE | REGRESSOR_TYPE | None = None, attack_feature: int | slice = 0, is_continuous: bool | None = False, non_numerical_features: List[int] | None = None, encoder: OrdinalEncoder | OneHotEncoder | ColumnTransformer | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Implementation of a baseline attribute inference attack that does not use the target model.

The idea is to train a simple neural network to learn the attacked feature from the rest of the features. Should be used to compare with other attribute inference results.

__init__(attack_model_type: str = 'nn', attack_model: CLASSIFIER_TYPE | REGRESSOR_TYPE | None = None, attack_feature: int | slice = 0, is_continuous: bool | None = False, non_numerical_features: List[int] | None = None, encoder: OrdinalEncoder | OneHotEncoder | ColumnTransformer | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Create an AttributeInferenceBaseline attack instance.

Parameters:
  • attack_model_type (str) –

    the type of default attack model to train, optional. Should be one of: nn (neural network, default), rf (random forest), gb (gradient boosting), lr (logistic/linear regression), dt (decision tree), knn (k nearest neighbors), svm (support vector machine).

    If attack_model is supplied, this option will be ignored.

  • attack_model – The attack model to train, optional. If none is provided, a default model will be created.

  • attack_feature – The index of the feature to be attacked or a slice representing multiple indexes in case of a one-hot encoded feature.

  • is_continuous – Whether the attacked feature is continuous. Default is False (which means categorical).

  • non_numerical_features – a list of feature indexes that require encoding in order to feed into an ML model (i.e., strings), not including the attacked feature. Should only be supplied if non-numeric features exist in the input data not including the attacked feature, and an encoder is not supplied.

  • encoder – An already fit encoder that can be applied to the model’s input features without the attacked feature (i.e., should be fit for n-1 features).

  • nn_model_epochs (int) – the number of epochs to use when training a nn attack model

  • nn_model_batch_size (int) – the batch size to use when training a nn attack model

  • nn_model_learning_rate (float) – the learning rate to use when training a nn attack model
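The attack_feature parameter accepts either a single column index or a slice covering the columns of a one-hot encoded feature. The NumPy-only sketch below illustrates which columns each form selects. The toy array and the helper name split_attacked are made up for illustration and are not part of ART.

```python
import numpy as np

def split_attacked(x, attack_feature):
    """Return (x_without_feature, attacked_columns) for an int or slice index.

    Hypothetical helper for illustration only; not part of ART.
    Assumes a non-negative int or a slice.
    """
    if isinstance(attack_feature, int):
        # Turn the int into a one-column slice so both cases return 2-D arrays.
        attack_feature = slice(attack_feature, attack_feature + 1)
    attacked = x[:, attack_feature]
    rest = np.delete(x, attack_feature, axis=1)
    return rest, attacked

# Toy data: column 0 is numeric, columns 1-2 hold a one-hot encoded feature.
x = np.array([[5.0, 1.0, 0.0],
              [7.0, 0.0, 1.0],
              [9.0, 1.0, 0.0]])

rest_single, attacked_single = split_attacked(x, 0)            # single column
rest_onehot, attacked_onehot = split_attacked(x, slice(1, 3))  # one-hot block
```

With a slice, all columns of the encoded feature are treated as one attacked feature, so the remaining input has n minus the width of the slice columns.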

fit(x: ndarray) → None

Train the attack model.

Parameters:

x (ndarray) – Input to training process. Includes all features used to train the original model.

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray

Infer the attacked feature.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input to attack. Includes all features except the attacked feature.

  • y – Not used in this attack.

  • values (list) – Possible values for attacked feature. For a single column feature this should be a simple list containing all possible values, in increasing order (the smallest value in the 0 index and so on). For a multi-column feature (for example 1-hot encoded and then scaled), this should be a list of lists, where each internal list represents a column (in increasing order) and the values represent the possible values for that column (in increasing order).

Returns:

The inferred feature values.
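The values argument and the reduced input for infer can both be derived from the data. A minimal sketch, assuming the attacked feature is a single categorical column at index 1 of a toy array; all variable names and numbers are illustrative.

```python
import numpy as np

attack_feature = 1  # index of the attacked column (assumed for this example)

x_train = np.array([[0.2, 2.0, 1.0],
                    [0.5, 0.0, 0.0],
                    [0.9, 1.0, 1.0],
                    [0.1, 2.0, 0.0]])

# Possible values in increasing order, smallest value at index 0.
values = np.unique(x_train[:, attack_feature]).tolist()

# Input for infer(): every feature except the attacked one.
x_infer = np.delete(x_train, attack_feature, axis=1)

# Multi-column case: one inner list per column, each in increasing order.
onehot = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 0.0]])
values_multi = [np.unique(col).tolist() for col in onehot.T]
```

The resulting list would then be passed as the values keyword argument of infer, alongside x_infer.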

Attribute Inference Black-Box

class art.attacks.inference.attribute_inference.AttributeInferenceBlackBox(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, attack_model_type: str = 'nn', attack_model: CLASSIFIER_TYPE | REGRESSOR_TYPE | None = None, attack_feature: int | slice = 0, is_continuous: bool | None = False, scale_range: Tuple[float, float] | None = None, prediction_normal_factor: float | None = 1, non_numerical_features: List[int] | None = None, encoder: OrdinalEncoder | OneHotEncoder | ColumnTransformer | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Implementation of a simple black-box attribute inference attack.

The idea is to train a simple neural network to learn the attacked feature from the rest of the features and the model’s predictions. Assumes the availability of the attacked model’s predictions for the samples under attack, in addition to the rest of the feature values. If this is not available, the true class label of the samples may be used as a proxy.

__init__(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, attack_model_type: str = 'nn', attack_model: CLASSIFIER_TYPE | REGRESSOR_TYPE | None = None, attack_feature: int | slice = 0, is_continuous: bool | None = False, scale_range: Tuple[float, float] | None = None, prediction_normal_factor: float | None = 1, non_numerical_features: List[int] | None = None, encoder: OrdinalEncoder | OneHotEncoder | ColumnTransformer | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Create an AttributeInferenceBlackBox attack instance.

Parameters:
  • estimator – Target estimator.

  • attack_model_type (str) –

    the type of default attack model to train, optional. Should be one of: nn (neural network, default), rf (random forest), gb (gradient boosting), lr (logistic/linear regression), dt (decision tree), knn (k nearest neighbors), svm (support vector machine).

    If attack_model is supplied, this option will be ignored.

  • attack_model – The attack model to train, optional. If the attacked feature is continuous, this should be a regression model, and if the attacked feature is categorical it should be a classifier. If none is provided, a default model will be created.

  • attack_feature – The index of the feature to be attacked or a slice representing multiple indexes in case of a one-hot encoded feature.

  • is_continuous – Whether the attacked feature is continuous. Default is False (which means categorical).

  • scale_range – If supplied, the class labels (both true and predicted) will be scaled to the given range. Only applicable when estimator is a regressor.

  • prediction_normal_factor – If supplied, the class labels (both true and predicted) are multiplied by the factor when used as inputs to the attack-model. Only applicable when estimator is a regressor and if scale_range is not supplied.

  • non_numerical_features – a list of feature indexes that require encoding in order to feed into an ML model (i.e., strings), not including the attacked feature. Should only be supplied if non-numeric features exist in the input data not including the attacked feature, and an encoder is not supplied.

  • encoder – An already fit encoder that can be applied to the model’s input features without the attacked feature (i.e., should be fit for n-1 features).

  • nn_model_epochs (int) – the number of epochs to use when training a nn attack model

  • nn_model_batch_size (int) – the batch size to use when training a nn attack model

  • nn_model_learning_rate (float) – the learning rate to use when training a nn attack model
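A rough sketch of how scale_range and prediction_normal_factor relate, assuming that scale_range means min-max scaling to the given interval and that it takes precedence over prediction_normal_factor, as the parameter descriptions suggest. The function transform_labels is hypothetical and is not ART's implementation.

```python
import numpy as np

def transform_labels(labels, scale_range=None, prediction_normal_factor=1.0):
    """Scale regression labels before they are fed to the attack model.

    Hypothetical helper: min-max scaling if scale_range is given,
    otherwise multiplication by prediction_normal_factor.
    """
    labels = np.asarray(labels, dtype=float)
    if scale_range is not None:
        low, high = scale_range
        span = labels.max() - labels.min()
        if span == 0:
            # Constant labels: map everything to the lower bound.
            return np.full_like(labels, low)
        return low + (labels - labels.min()) * (high - low) / span
    return labels * prediction_normal_factor

labels = np.array([10.0, 20.0, 30.0])
scaled = transform_labels(labels, scale_range=(0.0, 1.0))
multiplied = transform_labels(labels, prediction_normal_factor=0.1)
```

Either option keeps large regression outputs on a scale comparable to the other input features of the attack model.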

fit(x: ndarray, y: ndarray | None = None) → None

Train the attack model.

Parameters:
  • x (ndarray) – Input to training process. Includes all features used to train the original model.

  • y – True labels for x.

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray

Infer the attacked feature.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input to attack. Includes all features except the attacked feature.

  • y – True labels for x.

  • pred (np.ndarray) – Original model’s predictions for x.

  • values (list, optional) – Possible values for attacked feature. For a single column feature this should be a simple list containing all possible values, in increasing order (the smallest value in the 0 index and so on). For a multi-column feature (for example 1-hot encoded and then scaled), this should be a list of lists, where each internal list represents a column (in increasing order) and the values represent the possible values for that column (in increasing order). If not provided, is computed from the training data when calling fit. Only relevant for categorical features.

Returns:

The inferred feature values.
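The black-box attack model learns from the non-attacked features together with the target model's predictions. One way to picture the input it sees is the NumPy sketch below, which uses made-up data; the exact layout ART builds internally may differ.

```python
import numpy as np

# Features without the attacked column (3 samples, 2 remaining features).
x_without_feature = np.array([[0.2, 1.0],
                              [0.5, 0.0],
                              [0.9, 1.0]])

# Target model's class probabilities for the same samples (2 classes).
pred = np.array([[0.8, 0.2],
                 [0.3, 0.7],
                 [0.6, 0.4]])

# Attack-model input: remaining features followed by the predictions.
attack_input = np.concatenate((x_without_feature, pred), axis=1)

# If predictions are unavailable, one-hot encoded true labels can act as a proxy.
y = np.array([0, 1, 0])
proxy_pred = np.eye(2)[y]
proxy_input = np.concatenate((x_without_feature, proxy_pred), axis=1)
```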

Attribute Inference Membership

class art.attacks.inference.attribute_inference.AttributeInferenceMembership(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, membership_attack: MembershipInferenceAttack, attack_feature: int | slice = 0)

Implementation of an attribute inference attack that utilizes a membership inference attack.

The idea is to find the target feature value that causes the membership inference attack to classify the sample as a member with the highest confidence.

__init__(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, membership_attack: MembershipInferenceAttack, attack_feature: int | slice = 0)

Create an AttributeInferenceMembership attack instance.

Parameters:
  • estimator – Target estimator.

  • membership_attack (MembershipInferenceAttack) – The membership inference attack to use. Should be fit/calibrated in advance, and should support returning probabilities. Should also support the target estimator.

  • attack_feature – The index of the feature to be attacked or a slice representing multiple indexes in case of a one-hot encoded feature.

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray

Infer the attacked feature.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input to attack. Includes all features except the attacked feature.

  • y – The labels expected by the membership attack.

  • values (list) – Possible values for attacked feature. For a single column feature this should be a simple list containing all possible values, in increasing order (the smallest value in the 0 index and so on). For a multi-column feature (for example 1-hot encoded and then scaled), this should be a list of lists, where each internal list represents a column (in increasing order) and the values represent the possible values for that column (in increasing order).

Returns:

The inferred feature values.
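The selection rule behind this attack (try each candidate value and keep the one the membership attack is most confident about) can be sketched as follows. This is a simplified illustration, not ART's code: member_prob is a hypothetical callable standing in for the membership attack, labels are omitted, and the toy scorer is invented so the example is self-contained.

```python
import numpy as np

def infer_by_membership(x, values, attack_feature, member_prob):
    """For each row, pick the candidate value with the highest membership score.

    member_prob is a hypothetical callable mapping a 2-D array of complete
    samples to one membership probability per row.
    """
    scores = []
    for v in values:
        # Re-insert the candidate value at the attacked column position.
        candidate = np.insert(x, attack_feature, v, axis=1)
        scores.append(member_prob(candidate))
    # scores has one column per candidate value; take the best per row.
    best = np.argmax(np.stack(scores, axis=1), axis=1)
    return np.asarray(values)[best]

def toy_member_prob(samples):
    # Invented scorer: a sample looks like a member when column 1 equals column 0.
    return 1.0 / (1.0 + np.abs(samples[:, 1] - samples[:, 0]))

x = np.array([[0.0, 3.0],
              [1.0, 4.0]])
inferred = infer_by_membership(x, values=[0.0, 1.0], attack_feature=1,
                               member_prob=toy_member_prob)
```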

Attribute Inference Baseline True Label

class art.attacks.inference.attribute_inference.AttributeInferenceBaselineTrueLabel(attack_model_type: str = 'nn', attack_model: CLASSIFIER_TYPE | REGRESSOR_TYPE | None = None, attack_feature: int | slice = 0, is_continuous: bool | None = False, is_regression: bool | None = False, scale_range: Tuple[float, float] | None = None, prediction_normal_factor: float = 1, non_numerical_features: List[int] | None = None, encoder: OrdinalEncoder | OneHotEncoder | ColumnTransformer | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Implementation of a baseline attribute inference attack that does not use the target model.

The idea is to train a simple neural network to learn the attacked feature from the rest of the features, and the true label. Should be used to compare with other attribute inference results.

__init__(attack_model_type: str = 'nn', attack_model: CLASSIFIER_TYPE | REGRESSOR_TYPE | None = None, attack_feature: int | slice = 0, is_continuous: bool | None = False, is_regression: bool | None = False, scale_range: Tuple[float, float] | None = None, prediction_normal_factor: float = 1, non_numerical_features: List[int] | None = None, encoder: OrdinalEncoder | OneHotEncoder | ColumnTransformer | None = None, nn_model_epochs: int = 100, nn_model_batch_size: int = 100, nn_model_learning_rate: float = 0.0001)

Create an AttributeInferenceBaselineTrueLabel attack instance.

Parameters:
  • attack_model_type (str) –

    the type of default attack model to train, optional. Should be one of: nn (neural network, default), rf (random forest), gb (gradient boosting), lr (logistic/linear regression), dt (decision tree), knn (k nearest neighbors), svm (support vector machine).

    If attack_model is supplied, this option will be ignored.

  • attack_model – The attack model to train, optional. If none is provided, a default model will be created.

  • attack_feature – The index of the feature to be attacked or a slice representing multiple indexes in case of a one-hot encoded feature.

  • is_continuous – Whether the attacked feature is continuous. Default is False (which means categorical).

  • is_regression – Whether the model is a regression model. Default is False (classification).

  • scale_range – If supplied, the class labels (both true and predicted) will be scaled to the given range. Only applicable when is_regression is True.

  • prediction_normal_factor (float) – If supplied, the class labels (both true and predicted) are multiplied by the factor when used as inputs to the attack-model. Only applicable when is_regression is True and if scale_range is not supplied.

  • non_numerical_features – a list of feature indexes that require encoding in order to feed into an ML model (i.e., strings), not including the attacked feature. Should only be supplied if non-numeric features exist in the input data not including the attacked feature, and an encoder is not supplied.

  • encoder – An already fit encoder that can be applied to the model’s input features without the attacked feature (i.e., should be fit for n-1 features).

  • nn_model_epochs (int) – the number of epochs to use when training a nn attack model

  • nn_model_batch_size (int) – the batch size to use when training a nn attack model

  • nn_model_learning_rate (float) – the learning rate to use when training a nn attack model

fit(x: ndarray, y: ndarray) → None

Train the attack model.

Parameters:
  • x (ndarray) – Input to training process. Includes all features used to train the original model.

  • y (ndarray) – True labels for x.

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray

Infer the attacked feature.

Return type:

ndarray

Parameters:
  • x (ndarray) – Input to attack. Includes all features except the attacked feature.

  • y – True labels for x.

  • values (list) – Possible values for attacked feature. For a single column feature this should be a simple list containing all possible values, in increasing order (the smallest value in the 0 index and so on). For a multi-column feature (for example 1-hot encoded and then scaled), this should be a list of lists, where each internal list represents a column (in increasing order) and the values represent the possible values for that column (in increasing order).

Returns:

The inferred feature values.
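Because the baseline attacks exist for comparison, a typical evaluation contrasts their accuracy with that of an attack that has access to the target model. A minimal sketch with made-up inferred values; in practice the two arrays would come from the respective infer calls, and the name gain is only illustrative.

```python
import numpy as np

def accuracy(inferred, actual):
    """Fraction of samples whose attacked feature was inferred correctly."""
    return float(np.mean(np.asarray(inferred) == np.asarray(actual)))

actual = np.array([0, 1, 1, 0, 1])             # true attacked-feature values
baseline_inferred = np.array([0, 1, 0, 0, 0])  # placeholder baseline output
attack_inferred = np.array([0, 1, 1, 0, 0])    # placeholder model-based output

baseline_acc = accuracy(baseline_inferred, actual)
attack_acc = accuracy(attack_inferred, actual)

# A positive gap suggests the model leaks information beyond what the
# remaining features (and labels) already reveal.
gain = attack_acc - baseline_acc
```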

Attribute Inference White-Box Lifestyle Decision-Tree

class art.attacks.inference.attribute_inference.AttributeInferenceWhiteBoxLifestyleDecisionTree(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, attack_feature: int = 0)

Implementation of the Fredrikson et al. white-box inference attack for decision trees.

Assumes that the attacked feature is discrete or categorical, with a limited number of possible values. For example: a boolean feature.

__init__(estimator: CLASSIFIER_TYPE | REGRESSOR_TYPE, attack_feature: int = 0)

Create an AttributeInferenceWhiteBoxLifestyleDecisionTree attack instance.

Parameters:
  • estimator – Target estimator.

  • attack_feature (int) – The index of the feature to be attacked.

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray

Infer the attacked feature.

Parameters:
  • x (ndarray) – Input to attack. Includes all features except the attacked feature.

  • y – Not used.

  • values (list) – Possible values for attacked feature.

  • priors (list) – Prior distributions of attacked feature values. Same size array as values.

Returns:

The inferred feature values.

Return type:

np.ndarray
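The values and priors arguments must line up element by element. One plausible way to obtain them, assuming the attacker holds a sample of the attacked feature's values, is to use empirical frequencies. The observed array below is invented for illustration.

```python
import numpy as np

# Sample of the attacked feature available to the attacker (assumed).
observed = np.array([0, 1, 1, 0, 1, 1, 1, 0])

# np.unique returns the distinct values in increasing order with their counts.
unique_values, counts = np.unique(observed, return_counts=True)

values = unique_values.tolist()
priors = (counts / counts.sum()).tolist()  # same length and order as values
```

Here priors[i] is the estimated probability of values[i], and the priors sum to 1.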

Attribute Inference White-Box Decision-Tree

class art.attacks.inference.attribute_inference.AttributeInferenceWhiteBoxDecisionTree(classifier: ScikitlearnDecisionTreeClassifier, attack_feature: int = 0)

A variation of the method proposed by Fredrikson et al. in: https://dl.acm.org/doi/10.1145/2810103.2813677

Assumes the availability of the attacked model’s predictions for the samples under attack, in addition to access to the model itself and the rest of the feature values. If this is not available, the true class label of the samples may be used as a proxy. Also assumes that the attacked feature is discrete or categorical, with a limited number of possible values. For example: a boolean feature.

__init__(classifier: ScikitlearnDecisionTreeClassifier, attack_feature: int = 0)

Create an AttributeInferenceWhiteBoxDecisionTree attack instance.

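Parameters:
  • classifier (ScikitlearnDecisionTreeClassifier) – Target classifier.

  • attack_feature (int) – The index of the feature to be attacked.

infer(x: ndarray, y: ndarray | None = None, **kwargs) → ndarray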

Infer the attacked feature.

If exactly one candidate value makes the model’s prediction match the real prediction for the sample, that value is chosen as the inferred value. Otherwise, the attack falls back to the Fredrikson method (without phi).

Return type:

ndarray

Parameters:
  • x (ndarray) – Input to attack. Includes all features except the attacked feature.

  • y – Original model’s predictions for x.

  • values (list) – Possible values for attacked feature.

  • priors (list) – Prior distributions of attacked feature values. Same size array as values.

Returns:

The inferred feature values.
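The "single matching value" rule described for infer can be sketched as below. This is an illustration under simplifying assumptions, not ART's implementation: predict is a hypothetical callable standing in for the decision tree, and the fallback simply picks the value with the highest prior, which is a stand-in and NOT the Fredrikson computation.

```python
import numpy as np

def infer_single_match(x, y, values, priors, attack_feature, predict):
    """Pick the unique candidate that reproduces y, else use a fallback.

    predict is a hypothetical callable returning one label per complete row.
    """
    values_arr = np.asarray(values)
    # matches[i, j] is True when candidate j reproduces the known prediction y[i].
    matches = np.stack(
        [predict(np.insert(x, attack_feature, v, axis=1)) == y for v in values],
        axis=1,
    )
    unique_hit = matches.sum(axis=1) == 1
    by_match = values_arr[np.argmax(matches, axis=1)]
    # Stand-in fallback: most probable value a priori (not the Fredrikson step).
    fallback = values_arr[int(np.argmax(priors))]
    return np.where(unique_hit, by_match, fallback)

def toy_predict(samples):
    # Invented "model": predicts 1 when the attacked column (index 0) exceeds 0.5.
    return (samples[:, 0] > 0.5).astype(int)

x = np.array([[2.0], [3.0], [4.0]])  # features without the attacked column
y = np.array([1, 0, 1])              # known predictions for the full samples
inferred = infer_single_match(x, y, values=[0.0, 1.0, 2.0],
                              priors=[0.2, 0.3, 0.5],
                              attack_feature=0, predict=toy_predict)
```

In this toy run only the second sample has a unique matching candidate (0.0); the other two samples match two candidates each and therefore take the fallback value.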