art.attacks.inference.attribute_inference
Module providing attribute inference attacks.
Attribute Inference Baseline
-
class art.attacks.inference.attribute_inference.AttributeInferenceBaseline(attack_model: Optional[CLASSIFIER_TYPE] = None, attack_feature: Union[int, slice] = 0)
Implementation of a baseline attribute inference attack that does not use the attacked model.
The idea is to train a simple neural network to learn the attacked feature from the rest of the features. It should be used as a baseline against which other attribute inference results are compared.
-
__init__(attack_model: Optional[CLASSIFIER_TYPE] = None, attack_feature: Union[int, slice] = 0)
Create an AttributeInferenceBaseline attack instance.
- Parameters
attack_model – The attack model to train, optional. If none is provided, a default model will be created.
attack_feature – The index of the feature to be attacked or a slice representing multiple indexes in case of a one-hot encoded feature.
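To illustrate the two forms that attack_feature can take, the following minimal NumPy sketch separates the attacked column(s) from the remaining features, once for a single index and once for a slice covering a one-hot encoded feature. The helper name split_attacked_feature and the toy data are hypothetical and not part of the library; the sketch assumes a non-negative index.

```python
import numpy as np

def split_attacked_feature(x, attack_feature):
    """Return (remaining_features, attacked_columns) for an int index or a slice."""
    if isinstance(attack_feature, int):
        # Normalise a single (non-negative) column index to a one-column slice.
        attack_feature = slice(attack_feature, attack_feature + 1)
    # Resolve the slice to explicit column indices so np.delete can drop them.
    cols = np.arange(x.shape[1])[attack_feature]
    rest = np.delete(x, cols, axis=1)
    target = x[:, attack_feature]
    return rest, target

# Toy data: column 0 is numeric, columns 1-3 form a one-hot encoded feature.
x = np.array([
    [0.5, 1.0, 0.0, 0.0],
    [1.5, 0.0, 1.0, 0.0],
    [2.5, 0.0, 0.0, 1.0],
])

rest_single, target_single = split_attacked_feature(x, 0)
rest_onehot, target_onehot = split_attacked_feature(x, slice(1, 4))
```

With a slice, all one-hot columns are removed together, so the remaining input has one column fewer per encoded category.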
-
fit(x: numpy.ndarray) → None
Train the attack model.
- Parameters
x (ndarray) – Input to training process. Includes all features used to train the original model.
-
infer(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray
Infer the attacked feature.
- Return type
ndarray
- Parameters
x (ndarray) – Input to attack. Includes all features except the attacked feature.
y – Not used in this attack.
values (np.ndarray) – Possible values for the attacked feature. Only needed for a categorical feature that is not one-hot encoded.
- Returns
The inferred feature values.
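The baseline idea (learn the attacked feature from the remaining features, without querying the attacked model) can be sketched in plain NumPy. This is only an illustration under stated assumptions: a nearest-centroid rule is swapped in for the neural network to keep the example short, and fit_baseline and infer_baseline are hypothetical helper names, not library functions.

```python
import numpy as np

def fit_baseline(x, attack_feature):
    """Learn one centroid of the remaining features per attacked-feature value."""
    target = x[:, attack_feature]
    rest = np.delete(x, attack_feature, axis=1)
    values = np.unique(target)
    centroids = np.stack([rest[target == v].mean(axis=0) for v in values])
    return values, centroids

def infer_baseline(x_rest, values, centroids):
    """Assign each row the attacked-feature value whose centroid is closest."""
    # Pairwise squared distances, shape (n_samples, n_values).
    dists = ((x_rest[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return values[dists.argmin(axis=1)]

# Training data contains all features; column 0 is the attacked feature.
x_train = np.array([
    [0.0, 0.1, 0.2],
    [0.0, 0.2, 0.1],
    [1.0, 0.9, 1.0],
    [1.0, 1.1, 0.8],
])
values, centroids = fit_baseline(x_train, attack_feature=0)

# Attack data contains every feature except the attacked one.
x_attack = np.array([[0.15, 0.15], [1.0, 0.9]])
inferred = infer_baseline(x_attack, values, centroids)
```

The split mirrors the documented interface: training sees the full feature matrix, while inference receives the matrix with the attacked column removed.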
-
Attribute Inference Black-Box
-
class art.attacks.inference.attribute_inference.AttributeInferenceBlackBox(classifier: CLASSIFIER_TYPE, attack_model: Optional[CLASSIFIER_TYPE] = None, attack_feature: Union[int, slice] = 0)
Implementation of a simple black-box attribute inference attack.
The idea is to train a simple neural network to learn the attacked feature from the rest of the features and the model’s predictions. Assumes the availability of the attacked model’s predictions for the samples under attack, in addition to the rest of the feature values. If these predictions are not available, the true class labels of the samples may be used as a proxy.
-
__init__(classifier: CLASSIFIER_TYPE, attack_model: Optional[CLASSIFIER_TYPE] = None, attack_feature: Union[int, slice] = 0)
Create an AttributeInferenceBlackBox attack instance.
- Parameters
classifier – Target classifier.
attack_model – The attack model to train, optional. If none is provided, a default model will be created.
attack_feature – The index of the feature to be attacked or a slice representing multiple indexes in case of a one-hot encoded feature.
-
fit(x: numpy.ndarray) → None
Train the attack model.
- Parameters
x (ndarray) – Input to training process. Includes all features used to train the original model.
-
infer(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray
Infer the attacked feature.
- Return type
ndarray
- Parameters
x (ndarray) – Input to attack. Includes all features except the attacked feature.
y – Original model’s predictions for x.
values (np.ndarray) – Possible values for the attacked feature. Only needed for a categorical feature that is not one-hot encoded.
- Returns
The inferred feature values.
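The difference from the baseline is the extra information available to the attack model: the attacked model's predictions. The sketch below shows one plausible way to assemble such an input, by appending the predictions to the remaining features. The stand-in classifier target_model_predict and the helper build_attack_input are hypothetical, and the exact way the library combines the two arrays is an assumption here.

```python
import numpy as np

def target_model_predict(x):
    """Hypothetical stand-in for the attacked classifier: two-class probabilities."""
    p1 = 1.0 / (1.0 + np.exp(-(x.sum(axis=1) - 1.0)))
    return np.column_stack([1.0 - p1, p1])

def build_attack_input(x_rest, predictions):
    """Append the target model's predictions to the remaining features."""
    return np.concatenate([x_rest, predictions], axis=1)

x_full = np.array([[0.0, 0.2, 0.4], [1.0, 0.8, 0.6]])
attack_feature = 0

# Predictions come from the full samples; the attack input omits the attacked column.
predictions = target_model_predict(x_full)
x_rest = np.delete(x_full, attack_feature, axis=1)
attack_input = build_attack_input(x_rest, predictions)
```

If predictions are unavailable, one-hot encoded true labels could occupy the same columns, in line with the proxy described above.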
-
Attribute Inference White-Box Lifestyle Decision-Tree
-
class art.attacks.inference.attribute_inference.AttributeInferenceWhiteBoxLifestyleDecisionTree(classifier: CLASSIFIER_TYPE, attack_feature: int = 0)
Implementation of the Fredrikson et al. white-box inference attack for decision trees.
Assumes that the attacked feature is discrete or categorical, with a limited number of possible values, for example a boolean feature.
Paper link: https://dl.acm.org/doi/10.1145/2810103.2813677
-
__init__(classifier: CLASSIFIER_TYPE, attack_feature: int = 0)
Create an AttributeInferenceWhiteBoxLifestyleDecisionTree attack instance.
- Parameters
classifier – Target classifier.
attack_feature (int) – The index of the feature to be attacked.
-
infer(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray
Infer the attacked feature.
- Parameters
x (ndarray) – Input to attack. Includes all features except the attacked feature.
y – Not used.
values (np.ndarray) – Possible values for attacked feature.
priors (np.ndarray) – Prior distributions of attacked feature values. Same size array as values.
- Returns
The inferred feature values.
- Return type
np.ndarray
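The values and priors arguments must be aligned arrays of equal size. One way to obtain them, assuming the attacker has auxiliary data in which the attacked feature is observed, is to use relative frequencies as priors. The helper empirical_values_and_priors is hypothetical and only illustrates the expected shapes.

```python
import numpy as np

def empirical_values_and_priors(feature_column):
    """Return the distinct values of a feature and their relative frequencies."""
    values, counts = np.unique(feature_column, return_counts=True)
    priors = counts / counts.sum()
    return values, priors

# Assumed auxiliary observations of a boolean attacked feature.
observed = np.array([0, 1, 1, 0, 1, 1, 1, 0])
values, priors = empirical_values_and_priors(observed)
```

Because np.unique returns sorted values with matching counts, priors[i] is the estimated prior of values[i].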
-
Attribute Inference White-Box Decision-Tree
-
class art.attacks.inference.attribute_inference.AttributeInferenceWhiteBoxDecisionTree(classifier: art.estimators.classification.scikitlearn.ScikitlearnDecisionTreeClassifier, attack_feature: int = 0)
A variation of the method proposed by Fredrikson et al. in: https://dl.acm.org/doi/10.1145/2810103.2813677
Assumes the availability of the attacked model’s predictions for the samples under attack, in addition to access to the model itself and the rest of the feature values. If these predictions are not available, the true class labels of the samples may be used as a proxy. Also assumes that the attacked feature is discrete or categorical, with a limited number of possible values, for example a boolean feature.
Paper link: https://dl.acm.org/doi/10.1145/2810103.2813677
-
__init__(classifier: art.estimators.classification.scikitlearn.ScikitlearnDecisionTreeClassifier, attack_feature: int = 0)
Create an AttributeInferenceWhiteBoxDecisionTree attack instance.
- Parameters
classifier (ScikitlearnDecisionTreeClassifier) – Target classifier.
attack_feature (int) – The index of the feature to be attacked.
-
infer(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray
Infer the attacked feature.
If exactly one candidate value makes the model’s prediction coincide with the real prediction for the sample, choose that value as the inferred value. Otherwise, fall back to the Fredrikson method (without phi).
- Return type
ndarray
- Parameters
x (ndarray) – Input to attack. Includes all features except the attacked feature.
y – Original model’s predictions for x.
values (np.ndarray) – Possible values for attacked feature.
priors (np.ndarray) – Prior distributions of attacked feature values. Same size array as values.
- Returns
The inferred feature values.
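The decision rule described for infer can be sketched as follows. This is a simplified illustration, not the library's implementation: the fallback here simply picks the highest-prior value (among matching candidates when there are any), standing in for the Fredrikson scoring, and toy_predict is a hypothetical stand-in for a trained decision tree.

```python
import numpy as np

def infer_by_matching(predict, x_rest, y, attack_feature, values, priors):
    """Pick the unique candidate value that reproduces y, else fall back to priors."""
    values = np.asarray(values)
    priors = np.asarray(priors)
    inferred = np.empty(len(x_rest), dtype=values.dtype)
    for i, (row, label) in enumerate(zip(x_rest, y)):
        # Re-insert each candidate value at the attacked position and query the model.
        candidates = np.stack([np.insert(row, attack_feature, v) for v in values])
        matches = predict(candidates) == label
        if matches.sum() == 1:
            # Exactly one value is consistent with the known prediction.
            inferred[i] = values[matches][0]
        else:
            # Simplified fallback (assumption): most probable value a priori,
            # restricted to matching candidates when any exist.
            pool = matches if matches.any() else np.ones(len(values), dtype=bool)
            inferred[i] = values[pool][np.argmax(priors[pool])]
    return inferred

def toy_predict(x):
    """Hypothetical tree: class 1 iff feature 0 equals 1 and feature 1 exceeds 0.5."""
    return ((x[:, 0] == 1) & (x[:, 1] > 0.5)).astype(int)

x_rest = np.array([[0.9], [0.2], [0.8]])   # all features except the attacked one
y = np.array([1, 0, 0])                    # model predictions for the true samples
inferred = infer_by_matching(toy_predict, x_rest, y, 0, values=[0, 1], priors=[0.3, 0.7])
```

In the second sample both candidate values reproduce the prediction, so the rule cannot disambiguate and the prior decides.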
-