art.defences.detector.poison

Module implementing detector-based defences against poisoning attacks.

Base Class

class art.defences.detector.poison.PoisonFilteringDefence(classifier: CLASSIFIER_TYPE, x_train: ndarray, y_train: ndarray)

Base class for all poison filtering defences.

abstract detect_poison(**kwargs) Tuple[dict, List[int]]

Detect poison.

Parameters:

kwargs – Defence-specific parameters used by child classes.

Returns:

Dictionary with report and list with items identified as poison.

abstract evaluate_defence(is_clean: ndarray, **kwargs) str

Evaluate the defence given the labels specifying if the data is poisoned or not.

Return type:

str

Parameters:
  • is_clean (ndarray) – 1-D array where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means that it is poisoned.

  • kwargs – Defence-specific parameters used by child classes.

Returns:

JSON object with confusion matrix.

get_params() Dict[str, Any]

Returns dictionary of parameters used to run defence.

Returns:

Dictionary of parameters of the method.

set_params(**kwargs) None

Take in a dictionary of parameters and apply defence-specific checks before saving them as attributes.

Parameters:

kwargs – A dictionary of defence-specific parameters.
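
The following is a minimal, purely illustrative sketch of the interface a subclass is expected to provide; RandomFilterDefence and its pp_poison keyword are invented for this example and are not part of the library:

    import json
    from typing import Any, Dict, List, Tuple

    import numpy as np

    from art.defences.detector.poison import PoisonFilteringDefence


    class RandomFilterDefence(PoisonFilteringDefence):
        """Hypothetical toy defence that flags a random fraction of x_train as poison."""

        def detect_poison(self, **kwargs) -> Tuple[Dict[str, Any], List[int]]:
            pp_poison = kwargs.get("pp_poison", 0.1)  # invented parameter: fraction to flag
            self._is_clean = (np.random.rand(len(self.x_train)) > pp_poison).astype(int)
            report = {"pp_poison": pp_poison, "n_suspected": int(np.sum(self._is_clean == 0))}
            return report, self._is_clean.tolist()

        def evaluate_defence(self, is_clean: np.ndarray, **kwargs) -> str:
            # Compare the ground truth with the last detection result and return a JSON confusion matrix.
            pred = self._is_clean
            confusion_matrix = {
                "true_positive": int(np.sum((is_clean == 0) & (pred == 0))),
                "false_positive": int(np.sum((is_clean == 1) & (pred == 0))),
                "true_negative": int(np.sum((is_clean == 1) & (pred == 1))),
                "false_negative": int(np.sum((is_clean == 0) & (pred == 1))),
            }
            return json.dumps(confusion_matrix)

    # Usage, assuming an ART classifier wrapper and training arrays already exist:
    #   defence = RandomFilterDefence(classifier, x_train, y_train)
    #   report, is_clean_lst = defence.detect_poison(pp_poison=0.2)
    #   print(defence.evaluate_defence(is_clean_ground_truth))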

Activation Defence

class art.defences.detector.poison.ActivationDefence(classifier: CLASSIFIER_NEURALNETWORK_TYPE, x_train: ndarray, y_train: ndarray, generator: DataGenerator | None = None, ex_re_threshold: float | None = None)

Method from Chen et al., 2018 performing poisoning detection based on activation clustering.

Please keep in mind the limitations of defences. For more information on the limitations of this defence, see https://arxiv.org/abs/1905.13409. For details on how to evaluate classifier security in general, see https://arxiv.org/abs/1902.06705.

analyze_clusters(**kwargs) Tuple[Dict[str, Any], ndarray]

This function analyzes the clusters according to the provided method.

Parameters:

kwargs – A dictionary of cluster-analysis-specific parameters.

Returns:

(report, assigned_clean_by_class), where report is a dict object and assigned_clean_by_class is a list of arrays indicating which data points were classified as clean.

cluster_activations(**kwargs) Tuple[List[ndarray], List[ndarray]]

Clusters the activations and returns cluster_by_class and red_activations_by_class, where cluster_by_class[i][j] is the cluster to which the j-th data point in the i-th class belongs, and red_activations_by_class[i][j] contains the corresponding dimensionality-reduced activations.

Parameters:

kwargs – A dictionary of cluster-specific parameters.

Returns:

Clusters per class and activations by class.

detect_poison(**kwargs) Tuple[Dict[str, Any], List[int]]

Returns poison detected and a report.

Parameters:
  • clustering_method (str) – Clustering algorithm to be used. Currently, KMeans is the only supported method.

  • nb_clusters (int) – Number of clusters to find. This value needs to be greater than or equal to one.

  • reduce (str) – Method used to reduce the dimensionality of the activations. Supported methods include PCA, FastICA and TSNE.

  • nb_dims (int) – Number of dimensions the activations are reduced to.

  • cluster_analysis (str) – Heuristic to automatically determine if a cluster contains poisonous data. Supported methods include smaller and distance. The smaller method marks the cluster with fewer data points as poisonous, while the distance heuristic uses the distance between the clusters.

Returns:

(report, is_clean_lst), where report is a dict object that contains information specified by the clustering analysis technique, and is_clean_lst is a list where is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.
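
A hedged, end-to-end sketch of running the activation-clustering detection; the random data and the small PyTorch model below are assumptions made purely for illustration, not part of the documented API:

    import numpy as np
    import torch

    from art.estimators.classification import PyTorchClassifier
    from art.defences.detector.poison import ActivationDefence

    # Toy, randomly generated data set and model (assumption for illustration only);
    # a real use case would load the possibly poisoned training set and the model trained on it.
    x_train = np.random.rand(500, 20).astype(np.float32)
    y_train = np.eye(10)[np.random.randint(0, 10, 500)].astype(np.float32)  # one-hot labels

    model = torch.nn.Sequential(
        torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10)
    )
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
        input_shape=(20,),
        nb_classes=10,
    )
    classifier.fit(x_train, y_train, batch_size=64, nb_epochs=3)

    defence = ActivationDefence(classifier, x_train, y_train)
    report, is_clean_lst = defence.detect_poison(
        clustering_method="KMeans",
        nb_clusters=2,
        reduce="PCA",
        nb_dims=10,
        cluster_analysis="smaller",
    )
    print("suspected poison samples:", is_clean_lst.count(0))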

evaluate_defence(is_clean: ndarray, **kwargs) str

If ground truth is known, this function returns a confusion matrix in the form of a JSON object.

Return type:

str

Parameters:
  • is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.

  • kwargs – A dictionary of defence-specific parameters.

Returns:

JSON object with confusion matrix.
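
Continuing the sketch above, and assuming the indices of the truly poisoned samples are known (poison_indices is a hypothetical array used only for illustration):

    import json

    import numpy as np

    is_clean_ground_truth = np.ones(len(x_train), dtype=int)
    is_clean_ground_truth[poison_indices] = 0  # hypothetical known poison indices

    confusion_matrix_json = defence.evaluate_defence(is_clean=is_clean_ground_truth)
    print(json.loads(confusion_matrix_json))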

exclusionary_reclassification(report: Dict[str, Any])

This function performs exclusionary reclassification. Based on ex_re_threshold, suspicious clusters are rechecked: if they remain suspicious, the suspected source class is added to the report and the data is relabelled. The new labels are stored in self.y_train_relabelled.

Parameters:

report – A dictionary containing defence params as well as the class clusters and their suspiciousness.

Returns:

The updated report (a dict object).
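
A hedged sketch of triggering the re-check, continuing the earlier ActivationDefence sketch; the threshold value is arbitrary and chosen only for illustration:

    # Construct the defence with an exclusionary re-classification threshold (arbitrary value).
    defence = ActivationDefence(classifier, x_train, y_train, ex_re_threshold=10.0)
    report, is_clean_lst = defence.detect_poison(nb_clusters=2, nb_dims=10, reduce="PCA")

    # The re-check can also be applied explicitly to a report from the cluster analysis:
    report = defence.exclusionary_reclassification(report)
    y_relabelled = defence.y_train_relabelled  # relabelled training labels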

plot_clusters(save: bool = True, folder: str = '.', **kwargs) None

Creates a 3D plot to visualize the clusters; each cluster is assigned a different color in the plot. When save=True, it also stores the 3D plot per cluster in art.config.ART_DATA_PATH.

Parameters:
  • save (bool) – Boolean specifying if image should be saved.

  • folder (str) – Directory inside the art.config.ART_DATA_PATH folder where the plots will be saved.

  • kwargs – a dictionary of cluster-analysis-specific parameters.

static relabel_poison_cross_validation(classifier: CLASSIFIER_NEURALNETWORK_TYPE, x: ndarray, y_fix: ndarray, n_splits: int = 10, tolerable_backdoor: float = 0.01, max_epochs: int = 50, batch_epochs: int = 10) Tuple[float, CLASSIFIER_NEURALNETWORK_TYPE]

Revert the poisoning attack by continuing to train the current classifier with x and y_fix. n_splits determines the number of cross-validation splits.

Parameters:
  • classifier – Classifier to be fixed.

  • x (ndarray) – Samples that were mislabeled.

  • y_fix (ndarray) – True label of x.

  • n_splits (int) – Determines how many splits to use in cross validation (only used if cross_validation=True).

  • tolerable_backdoor (float) – Threshold that determines what is the maximum tolerable backdoor success rate.

  • max_epochs (int) – Maximum number of epochs that the model will be trained.

  • batch_epochs (int) – Number of epochs to be trained before checking current state of model.

Returns:

(improve_factor, classifier)

static relabel_poison_ground_truth(classifier: CLASSIFIER_NEURALNETWORK_TYPE, x: ndarray, y_fix: ndarray, test_set_split: float = 0.7, tolerable_backdoor: float = 0.01, max_epochs: int = 50, batch_epochs: int = 10) Tuple[float, CLASSIFIER_NEURALNETWORK_TYPE]

Revert the poisoning attack by continuing to train the current classifier with x and y_fix. test_set_split determines the fraction of x used as the training set, while 1 - test_set_split determines the fraction used as the test set.

Parameters:
  • classifier – Classifier to be fixed.

  • x (ndarray) – Samples.

  • y_fix (ndarray) – True labels of x.

  • test_set_split (float) – Determines how much data goes to the training set: test_set_split * len(y_fix) data points are used for training and (1 - test_set_split) * len(y_fix) for testing.

  • tolerable_backdoor (float) – Threshold that determines what is the maximum tolerable backdoor success rate.

  • max_epochs (int) – Maximum number of epochs that the model will be trained.

  • batch_epochs (int) – Number of epochs to be trained before checking current state of model.

Returns:

(improve_factor, classifier).
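
A hedged sketch, assuming x_suspect holds the samples flagged by a detection step and y_true their corrected (one-hot) labels; both variable names are invented for this example:

    from art.defences.detector.poison import ActivationDefence

    improve_factor, classifier = ActivationDefence.relabel_poison_ground_truth(
        classifier,
        x_suspect,
        y_true,
        test_set_split=0.7,
        tolerable_backdoor=0.01,
        max_epochs=20,
        batch_epochs=5,
    )
    print("improvement factor:", improve_factor)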

visualize_clusters(x_raw: ndarray, save: bool = True, folder: str = '.', **kwargs) List[List[ndarray]]

This function creates the sprite/mosaic visualization for clusters. When save=True, it also stores a sprite (mosaic) per cluster in art.config.ART_DATA_PATH.

Parameters:
  • x_raw (ndarray) – Images used to train the classifier (before pre-processing).

  • save (bool) – Boolean specifying if image should be saved.

  • folder (str) – Directory where the sprites will be saved inside art.config.ART_DATA_PATH folder.

  • kwargs – a dictionary of cluster-analysis-specific parameters.

Returns:

Array with sprite images sprites_by_class, where sprites_by_class[i][j] contains the sprite of class i cluster j.
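
A hedged sketch for an image classifier, assuming an ActivationDefence instance (defence) whose clusters have already been computed and an array x_raw_images holding the unpreprocessed training images (a hypothetical name):

    # Sprite (mosaic) visualisation of each cluster.
    sprites_by_class = defence.visualize_clusters(x_raw=x_raw_images, save=False)

    # Alternatively, plot each cluster in 3D (after dimensionality reduction).
    defence.plot_clusters(save=False)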

Data Provenance Defense

class art.defences.detector.poison.ProvenanceDefense(classifier: CLASSIFIER_TYPE, x_train: ndarray, y_train: ndarray, p_train: ndarray, x_val: ndarray | None = None, y_val: ndarray | None = None, eps: float = 0.2, perf_func: str = 'accuracy', pp_valid: float = 0.2)

Implements methods performing poisoning detection based on data provenance.

detect_poison(**kwargs) Tuple[Dict[int, float], List[int]]

Returns poison detected and a report.

Parameters:

kwargs – A dictionary of detection-specific parameters.

Returns:

(report, is_clean_lst), where report is a dict object that contains information specified by the provenance detection method, and is_clean_lst is a list where is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.

Return type:

tuple
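
A hedged sketch, assuming the classifier and data arrays already exist; p_train is the per-sample provenance feature (for example a one-hot device identifier), and the trusted validation set is optional:

    from art.defences.detector.poison import ProvenanceDefense

    defence = ProvenanceDefense(
        classifier,
        x_train,
        y_train,
        p_train,            # provenance features, one row per training point
        x_val=x_val,        # trusted validation data (omit to fall back to the untrusted variant)
        y_val=y_val,
        eps=0.2,
    )
    report, is_clean_lst = defence.detect_poison()
    # report maps suspected device indices to their measured performance differences.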

detect_poison_partially_trusted(**kwargs) Dict[int, float]

Detect poison given trusted validation data.

Returns:

Dictionary where keys are suspected poisonous device indices and values are performance differences.

detect_poison_untrusted(**kwargs) Dict[int, float]

Detect poison given no trusted validation data.

Returns:

Dictionary where keys are suspected poisonous device indices and values are performance differences.

evaluate_defence(is_clean: ndarray, **kwargs) str

Returns confusion matrix.

Return type:

str

Parameters:
  • is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.

  • kwargs – A dictionary of defence-specific parameters.

Returns:

JSON object with confusion matrix.

static filter_input(data: ndarray, labels: ndarray, segment: ndarray) Tuple[ndarray, ndarray]

Return the data and labels that are not part of a specified segment.

Parameters:
  • data (ndarray) – The data to segment.

  • labels (ndarray) – The corresponding labels to segment.

  • segment (ndarray) –

Returns:

Tuple of (filtered_data, filtered_labels).

Reject on Negative Impact (RONI) Defense

class art.defences.detector.poison.RONIDefense(classifier: CLASSIFIER_TYPE, x_train: ndarray, y_train: ndarray, x_val: ndarray, y_val: ndarray, perf_func: str | Callable = 'accuracy', pp_cal: float = 0.2, pp_quiz: float = 0.2, calibrated: bool = True, eps: float = 0.1)

Close implementation of the defence described in Nelson, ‘Behavior of Machine Learning Algorithms in Adversarial Environments’, Ch. 4.4.

detect_poison(**kwargs) Tuple[dict, List[int]]

Returns poison detected and a report.

Parameters:

kwargs – A dictionary of detection-specific parameters.

Returns:

(report, is_clean_lst), where report is a dict object that contains information specified by the detection method, and is_clean_lst is a list where is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.
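
A hedged sketch, assuming an already-wrapped ART classifier and a trusted validation set (x_val, y_val) already exist:

    from art.defences.detector.poison import RONIDefense

    defence = RONIDefense(
        classifier,
        x_train,
        y_train,
        x_val,
        y_val,
        perf_func="accuracy",
        calibrated=True,
        eps=0.1,
    )
    report, is_clean_lst = defence.detect_poison()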

evaluate_defence(is_clean: ndarray, **kwargs) str

Returns confusion matrix.

Return type:

str

Parameters:
  • is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.

  • kwargs – A dictionary of defence-specific parameters.

Returns:

JSON object with confusion matrix.

get_calibration_info(before_classifier: CLASSIFIER_TYPE) Tuple[float, float]

Calculate the median and standard deviation of the accuracy shifts caused by the calibration set.

Parameters:

before_classifier – The classifier trained without the suspicious point.

Returns:

A tuple consisting of (median, std_dev).

is_suspicious(before_classifier: CLASSIFIER_TYPE, perf_shift: float) bool

Returns True if a given performance shift is suspicious.

Return type:

bool

Parameters:
  • before_classifier – The classifier without untrusted data.

  • perf_shift (float) – A shift in performance.

Returns:

True if the given performance shift is suspicious, False otherwise.

Spectral Signature Defense

class art.defences.detector.poison.SpectralSignatureDefense(classifier: CLASSIFIER_NEURALNETWORK_TYPE, x_train: ndarray, y_train: ndarray, expected_pp_poison: float = 0.33, batch_size: int = 128, eps_multiplier: float = 1.5)

Method from Tran et al., 2018 performing poisoning detection based on spectral signatures.

detect_poison(**kwargs) Tuple[dict, List[int]]

Returns poison detected and a report.

Returns:

(report, is_clean_lst), where report is a dictionary whose keys are the indices of suspected poisons and whose values are their outlier scores, and is_clean_lst is a list where is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.
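
A hedged sketch, assuming a fitted ART neural-network classifier and its (possibly poisoned) training data already exist:

    from art.defences.detector.poison import SpectralSignatureDefense

    defence = SpectralSignatureDefense(
        classifier,
        x_train,
        y_train,
        expected_pp_poison=0.33,
        batch_size=128,
        eps_multiplier=1.5,
    )
    report, is_clean_lst = defence.detect_poison()
    # report maps indices of suspected poisons to their outlier scores.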

evaluate_defence(is_clean: ndarray, **kwargs) str

If ground truth is known, this function returns a confusion matrix in the form of a JSON object.

Return type:

str

Parameters:
  • is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.

  • kwargs – A dictionary of defence-specific parameters.

Returns:

JSON object with confusion matrix.

static spectral_signature_scores(matrix_r: ndarray) ndarray

Compute spectral-signature outlier scores for a matrix of feature representations.

Return type:

ndarray

Parameters:

matrix_r (ndarray) – Matrix of feature representations.

Returns:

Outlier scores for each observation based on spectral signature.
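
A hedged, self-contained sketch using a random feature matrix, purely to show the call shape:

    import numpy as np

    from art.defences.detector.poison import SpectralSignatureDefense

    matrix_r = np.random.randn(100, 32)  # 100 observations with 32-dimensional representations
    scores = SpectralSignatureDefense.spectral_signature_scores(matrix_r)
    # One outlier score per observation; larger scores indicate likelier poisons.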