art.defences.detector.poison

Poison detection defence API. Use the PoisonFilteringDefence wrapper to apply a defence to a pre-existing model.

Base Class

class art.defences.detector.poison.PoisonFilteringDefence(classifier, x_train: numpy.ndarray, y_train: numpy.ndarray)

Base class for all poison filtering defences.

abstract detect_poison(**kwargs) → Tuple[dict, List[int]]

Detect poison.

Return type

Tuple

Parameters

kwargs – Defence-specific parameters used by child classes.

Returns

Dictionary with the report and a list of indices identified as poison.

abstract evaluate_defence(is_clean: numpy.ndarray, **kwargs) → str

Evaluate the defence given the labels specifying if the data is poisoned or not.

Return type

str

Parameters
  • is_clean (ndarray) – 1-D array where is_clean[i]=1 means x_train[i] is clean and is_clean[i]=0 means it is poisonous.

  • kwargs – Defence-specific parameters used by child classes.

Returns

JSON object with confusion matrix.

get_params() → Dict[str, Any]

Returns dictionary of parameters used to run defence.

Returns

Dictionary of parameters of the method.

set_params(**kwargs) → None

Take in a dictionary of parameters and apply defence-specific checks before saving them as attributes.

Parameters

kwargs – A dictionary of defence-specific parameters.

Activation Defence

class art.defences.detector.poison.ActivationDefence(classifier: Classifier, x_train: Optional[numpy.ndarray], y_train: Optional[numpy.ndarray], generator: Optional[art.data_generators.DataGenerator] = None)

Method from Chen et al., 2018 performing poisoning detection based on activation clustering.

Please keep in mind the limitations of defences. For more information on the limitations of this defence, see https://arxiv.org/abs/1905.13409 . For details on how to evaluate classifier security in general, see https://arxiv.org/abs/1902.06705 .
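
A minimal construction sketch follows. The names classifier, x_train and y_train are placeholders for an already-fitted ART neural-network classifier (one that exposes get_activations, e.g. a KerasClassifier) and its possibly poisoned training data; they are not created by this module.

    from art.defences.detector.poison import ActivationDefence

    # Assumption: `classifier` is an already-fitted ART classifier exposing
    # get_activations(), and `x_train`, `y_train` are the (possibly poisoned)
    # training data as NumPy arrays.
    defence = ActivationDefence(classifier=classifier, x_train=x_train, y_train=y_train)

The instance methods below (detect_poison, analyze_clusters, cluster_activations, evaluate_defence) all operate on this defence instance.
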
analyze_clusters(**kwargs) → Tuple[Dict[str, Any], numpy.ndarray]

This function analyzes the clusters according to the provided method.

Return type

Tuple

Parameters

kwargs – A dictionary of cluster-analysis-specific parameters.

Returns

(report, assigned_clean_by_class), where report is a dict object and assigned_clean_by_class is an array of arrays containing which data points were classified as clean.

cluster_activations(**kwargs) → Tuple[List[List[int]], List[List[int]]]

Clusters activations and returns cluster_by_class and red_activations_by_class, where cluster_by_class[i][j] is the cluster to which the j-th data point in the i-th class belongs, and red_activations_by_class[i][j] contains the corresponding activations reduced by class.

Return type

Tuple

Parameters

kwargs – A dictionary of cluster-specific parameters.

Returns

Clusters per class and activations by class.

detect_poison(**kwargs) → Tuple[Dict[str, Any], List[int]]

Returns poison detected and a report.

Return type

Tuple

Parameters
  • clustering_method (str) – Clustering algorithm to be used. Currently KMeans is the only supported method.

  • nb_clusters (int) – Number of clusters to find. This value needs to be greater than or equal to one.

  • reduce (str) – Method used to reduce the dimensionality of the activations. Supported methods include PCA, FastICA and TSNE.

  • nb_dims (int) – Number of dimensions to which the activations are reduced.

  • cluster_analysis (str) – Heuristic to automatically determine if a cluster contains poisonous data. Supported methods include smaller and distance. The smaller method marks as poisonous the cluster with fewer data points, while the distance heuristic uses the distance between the clusters.

Returns

(report, is_clean_lst), where report is a dict object containing information specified by the clustering analysis technique, and is_clean_lst is a list in which is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.
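
Continuing the sketch above, a hedged example call using the documented keyword arguments (the values chosen here are illustrative, not necessarily the library defaults):

    # All keyword arguments are the ones documented above; values are illustrative.
    report, is_clean_lst = defence.detect_poison(
        clustering_method="KMeans",
        nb_clusters=2,
        reduce="PCA",
        nb_dims=10,
        cluster_analysis="smaller",
    )

    # Indices of training points the defence classified as poison.
    suspected = [i for i, clean in enumerate(is_clean_lst) if clean == 0]
    print(f"{len(suspected)} of {len(is_clean_lst)} training points flagged as poison")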

evaluate_defence(is_clean: numpy.ndarray, **kwargs) → str

If ground truth is known, this function returns a confusion matrix in the form of a JSON object.

Return type

str

Parameters
  • is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.

  • kwargs – A dictionary of defence-specific parameters.

Returns

JSON object with confusion matrix.
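
When ground truth is available, the defence can be scored as sketched below; poison_indices is a hypothetical array of the truly poisoned training indices, supplied by the caller.

    import numpy as np

    # Hypothetical ground truth: 1 marks a clean training point, 0 a poisoned one.
    is_clean = np.ones(len(x_train), dtype=int)
    is_clean[poison_indices] = 0  # `poison_indices` assumed known for this evaluation

    confusion_matrix_json = defence.evaluate_defence(is_clean=is_clean)
    print(confusion_matrix_json)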

plot_clusters(save: bool = True, folder: str = '.', **kwargs) → None

Creates a 3D plot to visualize each cluster; each cluster is assigned a different color in the plot. When save=True, it also stores the 3D plot per cluster in ART_DATA_PATH.

Parameters
  • save (bool) – Boolean specifying if image should be saved.

  • folder (str) – Directory inside the ART_DATA_PATH folder where the plots will be saved.

  • kwargs – A dictionary of cluster-analysis-specific parameters.

static relabel_poison_cross_validation(classifier: Classifier, x: numpy.ndarray, y_fix: numpy.ndarray, n_splits: int = 10, tolerable_backdoor: float = 0.01, max_epochs: int = 50, batch_epochs: int = 10) → Tuple[float, Classifier]

Revert a poison attack by continuing to train the current classifier with x, y_fix. n_splits determines the number of cross-validation splits.

Return type

Tuple

Parameters
  • classifier – Classifier to be fixed.

  • x (ndarray) – Samples that were mislabeled.

  • y_fix (ndarray) – True label of x.

  • n_splits (int) – Determines how many splits to use in cross validation.

  • tolerable_backdoor (float) – Threshold determining the maximum tolerable backdoor success rate.

  • max_epochs (int) – Maximum number of epochs for which the model will be trained.

  • batch_epochs (int) – Number of epochs to train before checking the current state of the model.

Returns

(improve_factor, classifier)

static relabel_poison_ground_truth(classifier: Classifier, x: numpy.ndarray, y_fix: numpy.ndarray, test_set_split: float = 0.7, tolerable_backdoor: float = 0.01, max_epochs: int = 50, batch_epochs: int = 10) → Tuple[float, Classifier]

Revert a poison attack by continuing to train the current classifier with x, y_fix. test_set_split determines the fraction of x used as the training set, while 1-test_set_split determines the fraction of data points used for the test set.

Return type

Tuple

Parameters
  • classifier – Classifier to be fixed.

  • x (ndarray) – Samples.

  • y_fix (ndarray) – True labels of x.

  • test_set_split (float) – Determines how much data goes to the training set: test_set_split*len(y_fix) determines the number of data points in x_train and (1-test_set_split) * len(y_fix) the number of data points in x_test.

  • tolerable_backdoor (float) – Threshold determining the maximum tolerable backdoor success rate.

  • max_epochs (int) – Maximum number of epochs for which the model will be trained.

  • batch_epochs (int) – Number of epochs to train before checking the current state of the model.

Returns

(improve_factor, classifier).
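
A sketch of repairing the classifier with this static helper; x_suspect and y_true are placeholders for the samples flagged as poison and their corrected labels. The argument values mirror the documented defaults.

    # `x_suspect` and `y_true` are placeholders supplied by the caller.
    improve_factor, fixed_classifier = ActivationDefence.relabel_poison_ground_truth(
        classifier=classifier,
        x=x_suspect,
        y_fix=y_true,
        test_set_split=0.7,
        tolerable_backdoor=0.01,
        max_epochs=50,
        batch_epochs=10,
    )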

visualize_clusters(x_raw: numpy.ndarray, save: bool = True, folder: str = '.', **kwargs) → List[List[List[numpy.ndarray]]]

This function creates the sprite/mosaic visualization for clusters. When save=True, it also stores a sprite (mosaic) per cluster in ART_DATA_PATH.

Return type

List

Parameters
  • x_raw (ndarray) – Images used to train the classifier (before pre-processing).

  • save (bool) – Boolean specifying if image should be saved.

  • folder (str) – Directory inside the ART_DATA_PATH folder where the sprites will be saved.

  • kwargs – A dictionary of cluster-analysis-specific parameters.

Returns

Array of sprite images sprites_by_class, where sprites_by_class[i][j] contains the sprite for class i, cluster j.
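
The two visualization helpers above can be called on the same defence instance once clusters have been computed; x_raw is assumed to hold the raw (pre-processing) training images.

    # 3D scatter plot of the clusters and sprite/mosaic images per cluster.
    defence.plot_clusters(save=True, folder=".")
    sprites_by_class = defence.visualize_clusters(x_raw=x_raw, save=True, folder=".")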

Data Provenance Defense

class art.defences.detector.poison.ProvenanceDefense(classifier: Classifier, x_train: numpy.ndarray, y_train: numpy.ndarray, p_train: numpy.ndarray, x_val: Optional[numpy.ndarray] = None, y_val: Optional[numpy.ndarray] = None, eps: float = 0.2, perf_func: str = 'accuracy', pp_valid: float = 0.2, **kwargs)

Implements methods performing poisoning detection based on data provenance.
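
A construction sketch using the documented signature; p_train holds one provenance feature per training sample (e.g. the identifier of the device that contributed it), and all data arrays are placeholders supplied by the caller.

    from art.defences.detector.poison import ProvenanceDefense

    # Assumptions: `classifier` is a fitted ART classifier; (x_val, y_val) is an
    # optional trusted validation set and may be omitted.
    defence = ProvenanceDefense(
        classifier=classifier,
        x_train=x_train,
        y_train=y_train,
        p_train=p_train,
        x_val=x_val,
        y_val=y_val,
        eps=0.2,
        perf_func="accuracy",
    )
    report, is_clean_lst = defence.detect_poison()

The two specialised methods below cover the cases with and without trusted validation data.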

detect_poison(**kwargs) → Tuple[dict, numpy.ndarray]

Returns poison detected and a report.

Parameters

kwargs – A dictionary of detection-specific parameters.

Returns

(report, is_clean_lst), where report is a dict object containing information specified by the provenance detection method, and is_clean_lst is a list in which is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.

Return type

tuple

detect_poison_partially_trusted(**kwargs) → Dict[int, float]

Detect poison given trusted validation data.

Returns

Dictionary where keys are suspected poisonous device indices and values are performance differences.

detect_poison_untrusted(**kwargs) → Dict[int, float]

Detect poison given no trusted validation data.

Returns

Dictionary where keys are suspected poisonous device indices and values are performance differences.

evaluate_defence(is_clean: numpy.ndarray, **kwargs) → str

Returns confusion matrix.

Return type

str

Parameters
  • is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.

  • kwargs – A dictionary of defence-specific parameters.

Returns

JSON object with confusion matrix.

static filter_input(data: numpy.ndarray, labels: numpy.ndarray, segment: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]

Return the data and labels that are not part of a specified segment.

Return type

Tuple

Parameters
  • data (ndarray) – The data to segment.

  • labels (ndarray) – The corresponding labels to segment.

  • segment (ndarray) – The segment whose data points are excluded from the returned data and labels.

Returns

Tuple of (filtered_data, filtered_labels).

Reject on Negative Impact (RONI) Defense

class art.defences.detector.poison.RONIDefense(classifier: Classifier, x_train: numpy.ndarray, y_train: numpy.ndarray, x_val: numpy.ndarray, y_val: numpy.ndarray, perf_func: Union[str, Callable] = 'accuracy', pp_cal: float = 0.2, pp_quiz: float = 0.2, calibrated: bool = True, eps: float = 0.1)

A close implementation based on the description in Nelson, ‘Behavior of Machine Learning Algorithms in Adversarial Environments’, Ch. 4.4.
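
A construction sketch with the documented arguments; (x_val, y_val) is a trusted validation set and the data arrays are placeholders supplied by the caller.

    from art.defences.detector.poison import RONIDefense

    # Assumptions: `classifier` is a fitted ART classifier, (x_train, y_train) is
    # the untrusted training data and (x_val, y_val) is trusted validation data.
    defence = RONIDefense(
        classifier=classifier,
        x_train=x_train,
        y_train=y_train,
        x_val=x_val,
        y_val=y_val,
        perf_func="accuracy",
        calibrated=True,
        eps=0.1,
    )
    report, is_clean_lst = defence.detect_poison()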

detect_poison(**kwargs) → Tuple[dict, List[int]]

Returns poison detected and a report.

Return type

Tuple

Parameters

kwargs – A dictionary of detection-specific parameters.

Returns

(report, is_clean_lst), where report is a dict object containing information specified by the RONI detection method, and is_clean_lst is a list in which is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.

evaluate_defence(is_clean: numpy.ndarray, **kwargs) → str

Returns confusion matrix.

Return type

str

Parameters
  • is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.

  • kwargs – A dictionary of defence-specific parameters.

Returns

JSON object with confusion matrix.

get_calibration_info(before_classifier: Classifier) → Tuple[numpy.ndarray, numpy.ndarray]

Calculate the median and standard deviation of the accuracy shifts caused by the calibration set.

Return type

Tuple

Parameters

before_classifier – The classifier trained without the suspicious point.

Returns

A tuple consisting of (median, std_dev).

is_suspicious(before_classifier: Classifier, perf_shift: float) → bool

Returns True if a given performance shift is suspicious.

Return type

bool

Parameters
  • before_classifier – The classifier without untrusted data.

  • perf_shift (float) – A shift in performance.

Returns

True if a given performance shift is suspicious, false otherwise.

Spectral Signature Defense

class art.defences.detector.poison.SpectralSignatureDefense(classifier: Classifier, x_train: numpy.ndarray, y_train: numpy.ndarray, batch_size: int, eps_multiplier: float, ub_pct_poison, nb_classes: int)

Method from Tran et al., 2018 performing poisoning detection based on Spectral Signatures.
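
A construction sketch with the documented arguments; the numeric values are illustrative only, and ub_pct_poison is the assumed upper bound on the fraction of poisoned training data.

    from art.defences.detector.poison import SpectralSignatureDefense

    # Assumptions: `classifier` is a fitted ART neural-network classifier and the
    # argument values are illustrative, not library defaults.
    defence = SpectralSignatureDefense(
        classifier=classifier,
        x_train=x_train,
        y_train=y_train,
        batch_size=128,
        eps_multiplier=1.5,
        ub_pct_poison=0.2,
        nb_classes=10,
    )
    report, is_clean_lst = defence.detect_poison()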

detect_poison(**kwargs) → Tuple[dict, List[int]]

Returns poison detected and a report.

Returns

(report, is_clean_lst), where report is a dictionary with indices as keys and the outlier scores of suspected poisons as values, and is_clean_lst is a list in which is_clean_lst[i]=1 means that x_train[i] is clean and is_clean_lst[i]=0 means that x_train[i] was classified as poison.

evaluate_defence(is_clean: numpy.ndarray, **kwargs) → str

If ground truth is known, this function returns a confusion matrix in the form of a JSON object.

Return type

str

Parameters
  • is_clean (ndarray) – Ground truth, where is_clean[i]=1 means that x_train[i] is clean and is_clean[i]=0 means x_train[i] is poisonous.

  • kwargs – A dictionary of defence-specific parameters.

Returns

JSON object with confusion matrix.

static spectral_signature_scores(matrix_r: numpy.ndarray) → numpy.ndarray

Compute outlier scores for the given feature representations based on their spectral signature.

Return type

ndarray

Parameters

matrix_r (ndarray) – Matrix of feature representations.

Returns

Outlier scores for each observation based on spectral signature.

static split_by_class(data: numpy.ndarray, labels: numpy.ndarray, num_classes: int) → List[numpy.ndarray]

Split the data and labels into per-class arrays.

Return type

List

Parameters
  • data (ndarray) – Features.

  • labels (ndarray) – Labels, not in one-hot representations.

  • num_classes (int) – Number of classes of labels.

Returns

List of numpy arrays of features split by labels.