art.defences.transformer.poisoning

Module implementing transformer-based defences against poisoning attacks.

Neural Cleanse

class art.defences.transformer.poisoning.NeuralCleanse(classifier: CLASSIFIER_TYPE)

Implementation of methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).

__call__(transformed_classifier: CLASSIFIER_TYPE, steps: int = 1000, init_cost: float = 0.001, norm: int | float = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32) → KerasNeuralCleanse

Returns a new classifier implementing the methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).

Namely, the new classifier exposes a new method mitigate(); applying mitigation can also change the behaviour of predict(). A usage sketch follows the parameter list below.

Return type:

KerasNeuralCleanse

Parameters:
  • transformed_classifier – An ART classifier

  • steps (int) – The maximum number of steps to run the Neural Cleanse optimization

  • init_cost (float) – The initial value for the cost tensor in the Neural Cleanse optimization

  • norm – The norm to use for the Neural Cleanse optimization, can be 1, 2, or np.inf

  • learning_rate (float) – The learning rate for the Neural Cleanse optimization

  • attack_success_threshold (float) – The threshold at which the generated backdoor is successful enough to stop the Neural Cleanse optimization

  • patience (int) – The number of optimization steps to wait before adjusting the cost via the cost multiplier in the Neural Cleanse optimization

  • early_stop (bool) – Whether or not to allow early stopping in the Neural Cleanse optimization

  • early_stop_threshold (float) – How close the current value must come to the best observed value before early-stop steps are counted

  • early_stop_patience (int) – How long to wait to determine early stopping in the Neural Cleanse optimization

  • cost_multiplier (float) – The factor by which the cost is increased or decreased during the Neural Cleanse optimization

  • batch_size (int) – The batch size for optimizations in the Neural Cleanse optimization
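A minimal usage sketch, assuming a trained ART Keras classifier named classifier and clean held-out arrays x_val and y_val (all hypothetical, not defined here); the mitigation_types values shown in the comments follow recent ART releases and may differ in your installed version:

from art.defences.transformer.poisoning import NeuralCleanse

# Assumptions: `classifier` is a trained ART KerasClassifier and
# `x_val`, `y_val` are clean held-out numpy arrays (not defined here).
cleanse = NeuralCleanse(classifier)

# Transform the classifier; the returned KerasNeuralCleanse exposes mitigate().
defence_cleanse = cleanse(classifier, steps=1000, learning_rate=0.1)

# Reverse-engineer the backdoor trigger and apply a mitigation; "unlearning",
# "pruning", and "filtering" are the types accepted in recent ART releases.
defence_cleanse.mitigate(x_val, y_val, mitigation_types=["unlearning"])

# After mitigation, predict() may suppress inputs flagged as backdoored.
predictions = defence_cleanse.predict(x_val)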

__init__(classifier: CLASSIFIER_TYPE) → None

Create an instance of the neural cleanse defence.

Parameters:

classifier – A trained classifier.

fit(x: ndarray, y: ndarray | None = None, **kwargs) → None

No parameters to learn for this method; do nothing.

STRIP

class art.defences.transformer.poisoning.STRIP(classifier: CLASSIFIER_TYPE)

Implementation of STRIP: A Defence Against Trojan Attacks on Deep Neural Networks (Gao et al., 2020).

__call__(num_samples: int = 20, false_acceptance_rate: float = 0.01) → CLASSIFIER_TYPE

Create a STRIP defence. A usage sketch follows the parameter list below.

Parameters:
  • num_samples (int) – The number of perturbed copies generated per input to estimate prediction entropy at inference time

  • false_acceptance_rate (float) – The maximum acceptable rate of false acceptances, used to set the entropy threshold for flagging inputs
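A minimal usage sketch, assuming a trained ART classifier named classifier plus hypothetical arrays x_clean (clean calibration data) and x_test (inputs to screen); the abstention behaviour described in the comments follows recent ART releases:

from art.defences.transformer.poisoning import STRIP

# Assumptions: `classifier` is a trained ART classifier, `x_clean` holds
# clean calibration samples, and `x_test` holds inputs to screen
# (none of these are defined here).
strip = STRIP(classifier)

# Wrap the classifier; predict() now blends each input with random samples
# and measures the entropy of the resulting predictions.
defended = strip(num_samples=20, false_acceptance_rate=0.01)

# Calibrate the entropy threshold on clean data.
defended.mitigate(x_clean)

# Inputs whose entropy falls below the threshold are treated as trojaned;
# in recent ART releases their prediction rows are abstained (all zeros).
predictions = defended.predict(x_test)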

__init__(classifier: CLASSIFIER_TYPE)

Create an instance of the STRIP defence.

Parameters:

classifier – A trained classifier.

fit(x: ndarray, y: ndarray | None = None, **kwargs) → None

No parameters to learn for this method; do nothing.