Module implementing transformer-based defences against poisoning attacks.
Implementation of methods in Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).
__call__(transformed_classifier: CLASSIFIER_TYPE, steps: int = 1000, init_cost: float = 0.001, norm: Union[int, float] = 2, learning_rate: float = 0.1, attack_success_threshold: float = 0.99, patience: int = 5, early_stop: bool = True, early_stop_threshold: float = 0.99, early_stop_patience: int = 10, cost_multiplier: float = 1.5, batch_size: int = 32) → art.estimators.certification.neural_cleanse.keras.KerasNeuralCleanse
Returns a new classifier that implements the methods from Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Wang et al. (2019).
Specifically, the returned classifier gains a mitigate() method. Applying mitigation can also change the behaviour of predict().
- Parameters
transformed_classifier – An ART classifier
steps (int) – The maximum number of steps to run the Neural Cleanse optimization
init_cost (float) – The initial value for the cost tensor in the Neural Cleanse optimization
norm – The norm to use for the Neural Cleanse optimization; can be 1, 2, or np.inf
learning_rate (float) – The learning rate for the Neural Cleanse optimization
attack_success_threshold (float) – The threshold at which the generated backdoor is successful enough to stop the Neural Cleanse optimization
patience (int) – How long to wait before changing the cost multiplier in the Neural Cleanse optimization
early_stop (bool) – Whether or not to allow early stopping in the Neural Cleanse optimization
early_stop_threshold (float) – How close values need to come to max value to start counting early stop
early_stop_patience (int) – How long to wait to determine early stopping in the Neural Cleanse optimization
cost_multiplier (float) – How much to change the cost in the Neural Cleanse optimization
batch_size (int) – The batch size for optimizations in the Neural Cleanse optimization
- Return type
art.estimators.certification.neural_cleanse.keras.KerasNeuralCleanse
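The interaction between init_cost, patience, cost_multiplier and attack_success_threshold may be easier to see in code. The following is a minimal sketch of one plausible cost schedule, loosely modelled on the description above; it is not ART's implementation. The CostSchedule class and its update() method are hypothetical names, and dividing by the same multiplier when the attack success rate stays low is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CostSchedule:
    """Hypothetical helper (not part of ART) that tracks the cost value."""

    cost: float = 0.001                     # plays the role of init_cost
    cost_multiplier: float = 1.5
    patience: int = 5
    attack_success_threshold: float = 0.99
    _up: int = 0    # consecutive steps at or above the threshold
    _down: int = 0  # consecutive steps below the threshold

    def update(self, attack_success_rate: float) -> float:
        """Record one optimization step and return the (possibly new) cost."""
        if attack_success_rate >= self.attack_success_threshold:
            self._up += 1
            self._down = 0
        else:
            self._up = 0
            self._down += 1

        if self._up >= self.patience:
            # The trigger has worked for `patience` steps in a row:
            # raise the cost so the optimization favours a smaller trigger.
            self.cost *= self.cost_multiplier
            self._up = 0
        elif self._down >= self.patience:
            # ASSUMPTION: the cost is relaxed by the same factor.
            self.cost /= self.cost_multiplier
            self._down = 0
        return self.cost


schedule = CostSchedule()
# Five successful steps followed by five unsuccessful ones.
costs = [schedule.update(rate) for rate in [1.0] * 5 + [0.5] * 5]
```

With the defaults, the cost stays at its initial value for the first four steps, is multiplied once on the fifth, and is divided back on the tenth.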
__init__(classifier: CLASSIFIER_TYPE) → None
Create an instance of the neural cleanse defence.
classifier – A trained classifier.
fit(x: numpy.ndarray, y: Optional[numpy.ndarray] = None, **kwargs) → None
No parameters to learn for this method; do nothing.
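Because fit() learns nothing, a typical workflow probably needs only the constructor and __call__. The sketch below strings these together in a small wrapper. The function run_neural_cleanse and its argument names are hypothetical, and the mitigate(x, y, mitigation_types=...) signature and the "filtering" value are assumptions that are not documented in this section; check them against the returned classifier before relying on them.

```python
from typing import Any, Callable, Sequence


def run_neural_cleanse(
    defence_factory: Callable[[Any], Any],
    classifier: Any,
    x_clean: Any,
    y_clean: Any,
    mitigation_types: Sequence[str] = ("filtering",),
    **call_kwargs: Any,
) -> Any:
    """Hypothetical wrapper: build the defence, transform, then mitigate."""
    # __init__(classifier): wrap the trained classifier.
    defence = defence_factory(classifier)

    # fit() is documented as a no-op, so it is skipped here.

    # __call__(transformed_classifier, steps=..., ...): any of the
    # documented keyword arguments can be forwarded via call_kwargs.
    cleansed = defence(classifier, **call_kwargs)

    # ASSUMPTION: mitigate() accepts clean data and a list of strategies.
    cleansed.mitigate(x_clean, y_clean, mitigation_types=list(mitigation_types))
    return cleansed
```

With ART installed, the call would presumably look like run_neural_cleanse(NeuralCleanse, keras_classifier, x_clean, y_clean, steps=10), where NeuralCleanse is assumed to be imported from art.defences.transformer.poisoning. Taking the defence class as an argument keeps the wrapper independent of any particular backend.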