art.defences.detector.poison
Module implementing detector-based defences against poisoning attacks.
Base Class¶
- class art.defences.detector.poison.PoisonFilteringDefence(classifier: CLASSIFIER_TYPE, x_train: ndarray, y_train: ndarray)¶
Base class for all poison filtering defences.
- abstract detect_poison(**kwargs) Tuple[dict, List[int]] ¶
Detect poison.
- Return type
Tuple
- Parameters
kwargs – Defence-specific parameters used by child classes.
- Returns
Dictionary with a report and a list of items identified as poison.
- abstract evaluate_defence(is_clean: ndarray, **kwargs) str ¶
Evaluate the defence given the labels specifying if the data is poisoned or not.
- Return type
str
- Parameters
is_clean (ndarray) – 1-D array where is_clean[i]=1 means x_train[i] is clean and is_clean[i]=0 means it is poison.
kwargs – Defence-specific parameters used by child classes.
- Returns
JSON object with confusion matrix.
- get_params() Dict[str, Any] ¶
Returns dictionary of parameters used to run defence.
- Returns
Dictionary of parameters of the method.
- set_params(**kwargs) None ¶
Take in a dictionary of parameters and apply defence-specific checks before saving them as attributes.
- Parameters
kwargs – A dictionary of defence-specific parameters.
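For orientation, here is a purely hypothetical sketch of the subclassing contract (the class name, scoring rule and threshold below are invented for illustration and are not part of ART): detect_poison returns a (report, is_clean_lst) pair, and evaluate_defence returns a JSON confusion matrix computed against the ground-truth is_clean array.

    import json
    from typing import Any, Dict, List, Tuple

    import numpy as np

    from art.defences.detector.poison import PoisonFilteringDefence


    class ThresholdDefence(PoisonFilteringDefence):
        """Toy defence: flags a point as poison when a (random) suspicion score
        exceeds a threshold. Illustrative only, not a real detection method."""

        def __init__(self, classifier, x_train, y_train, threshold: float = 0.95):
            super().__init__(classifier, x_train, y_train)
            self.threshold = threshold

        def detect_poison(self, **kwargs) -> Tuple[Dict[str, Any], List[int]]:
            scores = np.random.rand(len(self.x_train))  # stand-in suspicion scores
            is_clean_lst = [int(score <= self.threshold) for score in scores]
            report = {"threshold": self.threshold, "n_suspected": is_clean_lst.count(0)}
            return report, is_clean_lst

        def evaluate_defence(self, is_clean: np.ndarray, **kwargs) -> str:
            _, predicted = self.detect_poison()
            predicted = np.array(predicted)
            confusion = {
                "true_positive": int(np.sum((is_clean == 0) & (predicted == 0))),
                "false_positive": int(np.sum((is_clean == 1) & (predicted == 0))),
                "true_negative": int(np.sum((is_clean == 1) & (predicted == 1))),
                "false_negative": int(np.sum((is_clean == 0) & (predicted == 1))),
            }
            return json.dumps(confusion)

    # Usage, given any ART classifier and its training data:
    #   defence = ThresholdDefence(classifier, x_train, y_train)
    #   report, is_clean_lst = defence.detect_poison()
    #   print(defence.evaluate_defence(is_clean=ground_truth))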
Activation Defence¶
- class art.defences.detector.poison.ActivationDefence(classifier: CLASSIFIER_NEURALNETWORK_TYPE, x_train: ndarray, y_train: ndarray, generator: Optional[DataGenerator] = None, ex_re_threshold: Optional[float] = None)¶
Method from Chen et al., 2018 performing poisoning detection based on activation clustering.
Paper link: https://arxiv.org/abs/1811.03728
Please keep in mind the limitations of defences. For more information on the limitations of this defence, see https://arxiv.org/abs/1905.13409. For details on how to evaluate classifier security in general, see https://arxiv.org/abs/1902.06705.
- analyze_clusters(**kwargs) Tuple[Dict[str, Any], ndarray] ¶
This function analyzes the clusters according to the provided method.
- Return type
Tuple
- Parameters
kwargs – A dictionary of cluster-analysis-specific parameters.
- Returns
(report, assigned_clean_by_class), where report is a dict object and assigned_clean_by_class is a list of arrays indicating which data points were classified as clean.
- cluster_activations(**kwargs) Tuple[List[ndarray], List[ndarray]] ¶
Clusters activations and returns cluster_by_class and red_activations_by_class, where cluster_by_class[i][j] is the cluster to which the j-th data point in the i-th class belongs, and red_activations_by_class[i][j] are the corresponding activations reduced by class.
- Return type
Tuple
- Parameters
kwargs – A dictionary of cluster-specific parameters.
- Returns
Clusters per class and activations by class.
- detect_poison(**kwargs) Tuple[Dict[str, Any], List[int]] ¶
Returns poison detected and a report.
- Return type
Tuple
- Parameters
clustering_method (str) – Clustering algorithm to be used. Currently, KMeans is the only supported method.
nb_clusters (int) – Number of clusters to find. This value needs to be greater than or equal to one.
reduce (str) – Method used to reduce dimensionality of the activations. Supported methods include PCA, FastICA and TSNE.
nb_dims (int) – Number of dimensions to which the activations are reduced.
cluster_analysis (str) – Heuristic to automatically determine if a cluster contains poisonous data. Supported methods include smaller and distance. The smaller method labels as poisonous the cluster with fewer data points, while the distance heuristic uses the distance between the clusters.
- Returns
(report, is_clean_lst), where report is a dict object containing information specified by the clustering analysis technique, and is_clean_lst is a list where is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.
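A minimal usage sketch (an assumed setup, not taken from the ART documentation): the untrained toy PyTorch model and random data below merely stand in for a trained neural-network classifier and its possibly poisoned training set.

    import numpy as np
    import torch

    from art.defences.detector.poison import ActivationDefence
    from art.estimators.classification import PyTorchClassifier

    # Stand-ins for a trained model and its (possibly poisoned) training data.
    x_train = np.random.rand(1000, 28 * 28).astype(np.float32)
    y_train = np.eye(4)[np.random.randint(0, 4, size=1000)]

    model = torch.nn.Sequential(
        torch.nn.Linear(28 * 28, 64),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 4),
    )
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        optimizer=torch.optim.Adam(model.parameters()),
        input_shape=(28 * 28,),
        nb_classes=4,
    )

    defence = ActivationDefence(classifier, x_train, y_train)
    report, is_clean_lst = defence.detect_poison(
        clustering_method="KMeans",  # currently the only supported method
        nb_clusters=2,
        reduce="PCA",
        nb_dims=2,
        cluster_analysis="smaller",  # or "distance"
    )
    print(len(is_clean_lst) - sum(is_clean_lst), "training points flagged as poison")

    # If ground truth is known (1 = clean, 0 = poison), a JSON confusion matrix
    # can be obtained with:
    #   confusion = defence.evaluate_defence(is_clean=ground_truth_array)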
- evaluate_defence(is_clean: ndarray, **kwargs) str ¶
If ground truth is known, this function returns a confusion matrix in the form of a JSON object.
- Return type
str
- Parameters
is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.
kwargs – A dictionary of defence-specific parameters.
- Returns
JSON object with confusion matrix.
- exclusionary_reclassification(report: Dict[str, Any])¶
This function performs exclusionary reclassification. Based on the ex_re_threshold, suspicious clusters will be rechecked. If they remain suspicious, the suspected source class will be added to the report and the data will be relabelled. The new labels are stored in self.y_train_relabelled.
- Parameters
report (Dict) – A dictionary containing defence parameters as well as the class clusters and their suspiciousness.
- Returns
The updated report, a dict object.
- plot_clusters(save: bool = True, folder: str = '.', **kwargs) None ¶
Creates a 3D plot to visualize each cluster; each cluster is assigned a different color in the plot. When save=True, it also stores the 3D plot per cluster in art.config.ART_DATA_PATH.
- Parameters
save (bool) – Boolean specifying if the image should be saved.
folder (str) – Directory where the plots will be saved inside the art.config.ART_DATA_PATH folder.
kwargs – A dictionary of cluster-analysis-specific parameters.
- static relabel_poison_cross_validation(classifier: CLASSIFIER_NEURALNETWORK_TYPE, x: ndarray, y_fix: ndarray, n_splits: int = 10, tolerable_backdoor: float = 0.01, max_epochs: int = 50, batch_epochs: int = 10) Tuple[float, CLASSIFIER_NEURALNETWORK_TYPE] ¶
Revert the poison attack by continuing to train the current classifier with x and y_fix. n_splits determines the number of cross-validation splits.
- Return type
Tuple
- Parameters
classifier – Classifier to be fixed.
x (ndarray) – Samples that were mislabeled.
y_fix (ndarray) – True labels of x.
n_splits (int) – Determines how many splits to use in cross-validation (only used if cross_validation=True).
tolerable_backdoor (float) – Threshold that determines the maximum tolerable backdoor success rate.
max_epochs (int) – Maximum number of epochs for which the model will be trained.
batch_epochs (int) – Number of epochs to train before checking the current state of the model.
- Returns
(improve_factor, classifier)
- static relabel_poison_ground_truth(classifier: CLASSIFIER_NEURALNETWORK_TYPE, x: ndarray, y_fix: ndarray, test_set_split: float = 0.7, tolerable_backdoor: float = 0.01, max_epochs: int = 50, batch_epochs: int = 10) Tuple[float, CLASSIFIER_NEURALNETWORK_TYPE] ¶
Revert the poison attack by continuing to train the current classifier with x and y_fix. test_set_split determines the fraction of x that will be used as the training set, while 1-test_set_split determines the fraction used as the test set.
- Return type
Tuple
- Parameters
classifier – Classifier to be fixed.
x (ndarray) – Samples.
y_fix (ndarray) – True labels of x.
test_set_split (float) – Determines how much data goes to the training set: test_set_split*len(y_fix) data points form x_train and (1-test_set_split)*len(y_fix) data points form x_test.
tolerable_backdoor (float) – Threshold that determines the maximum tolerable backdoor success rate.
max_epochs (int) – Maximum number of epochs for which the model will be trained.
batch_epochs (int) – Number of epochs to train before checking the current state of the model.
- Returns
(improve_factor, classifier).
- visualize_clusters(x_raw: ndarray, save: bool = True, folder: str = '.', **kwargs) List[List[ndarray]] ¶
This function creates the sprite/mosaic visualization for clusters. When save=True, it also stores a sprite (mosaic) per cluster in art.config.ART_DATA_PATH.
- Return type
List
- Parameters
x_raw (ndarray) – Images used to train the classifier (before pre-processing).
save (bool) – Boolean specifying if the image should be saved.
folder (str) – Directory where the sprites will be saved inside the art.config.ART_DATA_PATH folder.
kwargs – A dictionary of cluster-analysis-specific parameters.
- Returns
Array with sprite images sprites_by_class, where sprites_by_class[i][j] contains the sprite of class i cluster j.
Data Provenance Defense¶
- class art.defences.detector.poison.ProvenanceDefense(classifier: CLASSIFIER_TYPE, x_train: ndarray, y_train: ndarray, p_train: ndarray, x_val: Optional[ndarray] = None, y_val: Optional[ndarray] = None, eps: float = 0.2, perf_func: str = 'accuracy', pp_valid: float = 0.2)¶
Implements methods performing poisoning detection based on data provenance.
- detect_poison(**kwargs) Tuple[Dict[int, float], List[int]] ¶
Returns poison detected and a report.
- Parameters
kwargs – A dictionary of detection-specific parameters.
- Returns
(report, is_clean_lst), where report is a dict object containing information specified by the provenance detection method, and is_clean_lst is a list where is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.
- Return type
tuple
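A minimal sketch of the partially trusted workflow (an assumed setup, not taken from the ART documentation): random data stands in for the real training set, p_train holds a one-hot device id per training point, and (x_val, y_val) plays the role of trusted validation data.

    import numpy as np
    from sklearn.svm import SVC

    from art.defences.detector.poison import ProvenanceDefense
    from art.estimators.classification import SklearnClassifier

    n_points, n_devices = 200, 5
    x_train = np.random.rand(n_points, 10)
    y_train = np.eye(2)[np.random.randint(0, 2, size=n_points)]
    # Provenance feature: which device supplied each training point (one-hot).
    p_train = np.eye(n_devices)[np.random.randint(0, n_devices, size=n_points)]
    x_val = np.random.rand(50, 10)  # trusted validation data
    y_val = np.eye(2)[np.random.randint(0, 2, size=50)]

    classifier = SklearnClassifier(model=SVC(kernel="linear", probability=True))
    classifier.fit(x_train, y_train)

    defence = ProvenanceDefense(
        classifier, x_train, y_train, p_train,
        x_val=x_val, y_val=y_val, eps=0.2, perf_func="accuracy",
    )
    report, is_clean_lst = defence.detect_poison()
    print(report)  # maps suspected device indices to performance differences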
- detect_poison_partially_trusted(**kwargs) Dict[int, float] ¶
Detect poison given trusted validation data.
- Returns
Dictionary where keys are suspected poisonous device indices and values are performance differences.
- detect_poison_untrusted(**kwargs) Dict[int, float] ¶
Detect poison given no trusted validation data.
- Returns
Dictionary where keys are suspected poisonous device indices and values are performance differences.
- evaluate_defence(is_clean: ndarray, **kwargs) str ¶
Returns confusion matrix.
- Return type
str
- Parameters
is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.
kwargs – A dictionary of defence-specific parameters.
- Returns
JSON object with confusion matrix.
- static filter_input(data: ndarray, labels: ndarray, segment: ndarray) Tuple[ndarray, ndarray] ¶
Return the data and labels that are not part of a specified segment.
- Return type
Tuple
- Parameters
data (ndarray) – The data to segment.
labels (ndarray) – The corresponding labels to segment.
segment (ndarray) – The segment to filter out.
- Returns
Tuple of (filtered_data, filtered_labels).
Reject on Negative Impact (RONI) Defense¶
- class art.defences.detector.poison.RONIDefense(classifier: CLASSIFIER_TYPE, x_train: ndarray, y_train: ndarray, x_val: ndarray, y_val: ndarray, perf_func: Union[str, Callable] = 'accuracy', pp_cal: float = 0.2, pp_quiz: float = 0.2, calibrated: bool = True, eps: float = 0.1)¶
Close implementation based on the description in Nelson, ‘Behavior of Machine Learning Algorithms in Adversarial Environments’, Ch. 4.4.
- detect_poison(**kwargs) Tuple[dict, List[int]] ¶
Returns poison detected and a report.
- Return type
Tuple
- Parameters
kwargs – A dictionary of detection-specific parameters.
- Returns
(report, is_clean_lst), where report is a dict object containing information specified by the detection method, and is_clean_lst is a list where is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.
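A minimal sketch (an assumed setup, not taken from the ART documentation): a small scikit-learn model and random data stand in for the real classifier, the untrusted training set and the trusted validation set. RONI retrains the model once per candidate point, so it is only practical for small training sets.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    from art.defences.detector.poison import RONIDefense
    from art.estimators.classification import SklearnClassifier

    x_train = np.random.rand(100, 10)  # untrusted training data
    y_train = np.eye(2)[np.random.randint(0, 2, size=100)]
    x_val = np.random.rand(40, 10)  # trusted validation data
    y_val = np.eye(2)[np.random.randint(0, 2, size=40)]

    classifier = SklearnClassifier(model=LogisticRegression())
    classifier.fit(x_train, y_train)

    defence = RONIDefense(
        classifier, x_train, y_train, x_val, y_val,
        perf_func="accuracy", calibrated=True, eps=0.1,
    )
    report, is_clean_lst = defence.detect_poison()
    print(len(is_clean_lst) - sum(is_clean_lst), "training points flagged as poison")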
- evaluate_defence(is_clean: ndarray, **kwargs) str ¶
Returns confusion matrix.
- Return type
str
- Parameters
is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.
kwargs – A dictionary of defence-specific parameters.
- Returns
JSON object with confusion matrix.
- get_calibration_info(before_classifier: CLASSIFIER_TYPE) Tuple[ndarray, ndarray] ¶
Calculate the median and standard deviation of the accuracy shifts caused by the calibration set.
- Return type
Tuple
- Parameters
before_classifier – The classifier trained without the suspicious point.
- Returns
A tuple consisting of (median, std_dev).
- is_suspicious(before_classifier: CLASSIFIER_TYPE, perf_shift: float) bool ¶
Returns True if a given performance shift is suspicious.
- Return type
bool
- Parameters
before_classifier – The classifier without untrusted data.
perf_shift (float) – A shift in performance.
- Returns
True if a given performance shift is suspicious, false otherwise.
Spectral Signature Defense¶
- class art.defences.detector.poison.SpectralSignatureDefense(classifier: CLASSIFIER_NEURALNETWORK_TYPE, x_train: ndarray, y_train: ndarray, expected_pp_poison: float = 0.33, batch_size: int = 128, eps_multiplier: float = 1.5)¶
Method from Tran et al., 2018 performing poisoning detection based on Spectral Signatures.
- detect_poison(**kwargs) Tuple[dict, List[int]] ¶
Returns poison detected and a report.
- Returns
(report, is_clean_lst), where report is a dictionary with the indices of suspected poisonous data points as keys and their outlier scores as values, and is_clean_lst is a list where is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.
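A minimal sketch (an assumed setup, not taken from the ART documentation): a toy PyTorch classifier and random data stand in for the trained model and its training set, and expected_pp_poison encodes the assumed fraction of poisoned points.

    import numpy as np
    import torch

    from art.defences.detector.poison import SpectralSignatureDefense
    from art.estimators.classification import PyTorchClassifier

    x_train = np.random.rand(1000, 20).astype(np.float32)
    y_train = np.eye(5)[np.random.randint(0, 5, size=1000)]

    model = torch.nn.Sequential(
        torch.nn.Linear(20, 32),
        torch.nn.ReLU(),
        torch.nn.Linear(32, 5),
    )
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        optimizer=torch.optim.Adam(model.parameters()),
        input_shape=(20,),
        nb_classes=5,
    )

    defence = SpectralSignatureDefense(
        classifier, x_train, y_train,
        expected_pp_poison=0.33, batch_size=128, eps_multiplier=1.5,
    )
    report, is_clean_lst = defence.detect_poison()
    print(len(is_clean_lst) - sum(is_clean_lst), "training points flagged as poison")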
- evaluate_defence(is_clean: ndarray, **kwargs) str ¶
If ground truth is known, this function returns a confusion matrix in the form of a JSON object.
- Return type
str
- Parameters
is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.
kwargs – A dictionary of defence-specific parameters.
- Returns
JSON object with confusion matrix.
- static spectral_signature_scores(matrix_r: ndarray) ndarray ¶
- Return type
ndarray
- Parameters
matrix_r (ndarray) – Matrix of feature representations.
- Returns
Outlier scores for each observation based on spectral signature.