art.defences.transformer.poisoning

Module implementing transformer-based defences against poisoning attacks.

Neural Cleanse

class art.defences.transformer.poisoning.NeuralCleanse(classifier: CLASSIFIER_TYPE)

Implementation of methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).

__call__(transformed_classifier: CLASSIFIER_TYPE, steps: int = 1000, init_cost: float = 0.001, norm: int | float = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32) → KerasNeuralCleanse

Returns a new classifier implementing the methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).

Namely, the new classifier exposes a new method mitigate(); applying mitigation can also change the behaviour of predict(). A usage sketch follows the parameter list below.

Return type:

KerasNeuralCleanse

Parameters:
  • transformed_classifier – An ART classifier

  • steps (int) – The maximum number of steps to run the Neural Cleanse optimization

  • init_cost (float) – The initial value for the cost tensor in the Neural Cleanse optimization

  • norm – The norm to use for the Neural Cleanse optimization, can be 1, 2, or np.inf

  • learning_rate (float) – The learning rate for the Neural Cleanse optimization

  • attack_success_threshold (float) – The threshold at which the generated backdoor is successful enough to stop the Neural Cleanse optimization

  • patience (int) – The number of optimization steps to wait before adjusting the cost via the cost multiplier in the Neural Cleanse optimization

  • early_stop (bool) – Whether or not to allow early stopping in the Neural Cleanse optimization

  • early_stop_threshold (float) – How close the current value must come to the best observed value before early-stop steps are counted

  • early_stop_patience (int) – How long to wait to determine early stopping in the Neural Cleanse optimization

  • cost_multiplier (float) – The factor by which the cost is increased or decreased during the Neural Cleanse optimization

  • batch_size (int) – The batch size for optimizations in the Neural Cleanse optimization
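A minimal usage sketch, assuming a trained ART Keras classifier named classifier and clean held-out arrays x_val and y_val (all hypothetical, not defined here); the mitigation_types values shown in the comments follow recent ART releases and may differ in your installed version:

from art.defences.transformer.poisoning import NeuralCleanse

# Assumptions: `classifier` is a trained ART KerasClassifier and
# `x_val`, `y_val` are clean held-out numpy arrays (not defined here).
cleanse = NeuralCleanse(classifier)

# Transform the classifier; the returned KerasNeuralCleanse exposes mitigate().
defence_cleanse = cleanse(classifier, steps=1000, learning_rate=0.1)

# Reverse-engineer the backdoor trigger and apply a mitigation; "unlearning",
# "pruning", and "filtering" are the types accepted in recent ART releases.
defence_cleanse.mitigate(x_val, y_val, mitigation_types=["unlearning"])

# After mitigation, predict() may suppress inputs flagged as backdoored.
predictions = defence_cleanse.predict(x_val)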

__init__(classifier: CLASSIFIER_TYPE) → None

Create an instance of the neural cleanse defence.

Parameters:

classifier – A trained classifier.

fit(x: ndarray, y: ndarray | None = None, **kwargs) → None

No parameters to learn for this method; do nothing.

STRIP

class art.defences.transformer.poisoning.STRIP(classifier: CLASSIFIER_TYPE)

Implementation of STRIP: A Defence Against Trojan Attacks on Deep Neural Networks (Gao et al., 2020).

__call__(num_samples: int = 20, false_acceptance_rate: float = 0.01) → CLASSIFIER_TYPE

Create a STRIP defence. A usage sketch follows the parameter list below.

Parameters:
  • num_samples (int) – The number of perturbed copies generated per input to estimate prediction entropy at inference time

  • false_acceptance_rate (float) – The maximum acceptable rate of false acceptances, used to set the entropy threshold for flagging inputs
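A minimal usage sketch, assuming a trained ART classifier named classifier plus hypothetical arrays x_clean (clean calibration data) and x_test (inputs to screen); the abstention behaviour described in the comments follows recent ART releases:

from art.defences.transformer.poisoning import STRIP

# Assumptions: `classifier` is a trained ART classifier, `x_clean` holds
# clean calibration samples, and `x_test` holds inputs to screen
# (none of these are defined here).
strip = STRIP(classifier)

# Wrap the classifier; predict() now blends each input with random samples
# and measures the entropy of the resulting predictions.
defended = strip(num_samples=20, false_acceptance_rate=0.01)

# Calibrate the entropy threshold on clean data.
defended.mitigate(x_clean)

# Inputs whose entropy falls below the threshold are treated as trojaned;
# in recent ART releases their prediction rows are abstained (all zeros).
predictions = defended.predict(x_test)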

__init__(classifier: CLASSIFIER_TYPE)

Create an instance of the STRIP defence.

Parameters:

classifier – A trained classifier.

fit(x: ndarray, y: ndarray | None = None, **kwargs) → None

No parameters to learn for this method; do nothing.