art.defences.transformer.poisoning
Module implementing transformer-based defences against poisoning attacks.
Neural Cleanse
- class art.defences.transformer.poisoning.NeuralCleanse(classifier: CLASSIFIER_TYPE)
Implementation of methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).
- __call__(transformed_classifier: CLASSIFIER_TYPE, steps: int = 1000, init_cost: float = 0.001, norm: Union[int, float] = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32) → KerasNeuralCleanse
Returns a new classifier implementing the methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).
Namely, the new classifier provides an additional method, mitigate(), whose use can also affect the behaviour of predict() (see the usage sketch after this class entry).
- Parameters:
  - transformed_classifier – An ART classifier.
  - steps (int) – The maximum number of steps to run the Neural Cleanse optimization.
  - init_cost (float) – The initial value for the cost tensor in the Neural Cleanse optimization.
  - norm – The norm to use for the Neural Cleanse optimization; can be 1, 2, or np.inf.
  - learning_rate (float) – The learning rate for the Neural Cleanse optimization.
  - attack_success_threshold (float) – The threshold at which the generated backdoor is successful enough to stop the Neural Cleanse optimization.
  - patience (int) – How long to wait before changing the cost multiplier in the Neural Cleanse optimization.
  - early_stop (bool) – Whether or not to allow early stopping in the Neural Cleanse optimization.
  - early_stop_threshold (float) – How close values need to come to the maximum value to start counting towards early stopping.
  - early_stop_patience (int) – How long to wait to determine early stopping in the Neural Cleanse optimization.
  - cost_multiplier (float) – How much to change the cost in the Neural Cleanse optimization.
  - batch_size (int) – The batch size for optimizations in the Neural Cleanse optimization.
- Return type: KerasNeuralCleanse
- __init__(classifier: CLASSIFIER_TYPE) → None
Create an instance of the neural cleanse defence.
- Parameters:
classifier – A trained classifier.
- fit(x: ndarray, y: Optional[ndarray] = None, **kwargs) → None
No parameters to learn for this method; do nothing.
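A minimal usage sketch, not part of the original API reference: it assumes a trained Keras-based ART classifier named classifier and NumPy arrays x_val, y_val, and x_test, all placeholders; the mitigation type names passed to mitigate() are likewise assumptions and may differ between ART versions.

```python
from art.defences.transformer.poisoning import NeuralCleanse

# `classifier`, `x_val`, `y_val`, and `x_test` are assumed to exist already:
# a trained Keras-based ART classifier plus clean validation and test data.
cleanse = NeuralCleanse(classifier)

# __call__ returns a KerasNeuralCleanse wrapping the same model.
defended = cleanse(classifier, steps=1000, learning_rate=0.1, batch_size=32)

# mitigate() reverse-engineers a candidate backdoor trigger and applies the
# requested mitigations; the type names below are assumptions.
defended.mitigate(x_val, y_val, mitigation_types=["unlearning", "filtering"])

# After mitigation, predict() may behave differently on inputs that carry
# the reconstructed trigger.
predictions = defended.predict(x_test)
```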
STRIP
- class art.defences.transformer.poisoning.STRIP(classifier: CLASSIFIER_TYPE)
Implementation of STRIP: A Defence Against Trojan Attacks on Deep Neural Networks (Gao et al., 2020).
Paper link: https://arxiv.org/abs/1902.06531
- __call__(num_samples: int = 20, false_acceptance_rate: float = 0.01) → ClassifierWithStrip
Create a STRIP defence (see the usage sketch after this class entry).
- Parameters:
  - num_samples (int) – The number of samples to use to test entropy at inference time.
  - false_acceptance_rate (float) – The percentage of acceptable false acceptance.
- __init__(classifier: CLASSIFIER_TYPE)
Create an instance of the STRIP defence.
- Parameters:
classifier – A trained classifier.
- fit(x: ndarray, y: Optional[ndarray] = None, **kwargs) → None
No parameters to learn for this method; do nothing.
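A minimal usage sketch, not part of the original API reference: classifier, x_clean, and x_test are assumed placeholders, and the calibration call mitigate(x_clean) on the returned classifier is an assumption that may differ between ART versions.

```python
from art.defences.transformer.poisoning import STRIP

# `classifier`, `x_clean`, and `x_test` are assumed placeholders: a trained
# ART classifier, clean calibration data, and inputs to screen at test time.
strip = STRIP(classifier)

# __call__ returns the same classifier extended with STRIP behaviour.
defended = strip(num_samples=20, false_acceptance_rate=0.01)

# Calibrate the entropy threshold on clean data (assumed API).
defended.mitigate(x_clean)

# predict() now perturbs each input by superimposing random clean samples and
# flags low-entropy predictions as likely trojaned.
predictions = defended.predict(x_test)
```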