Fantastic Data Poisoning Attacks, Hyperparameters and Where to Find Them
Lightning Talk
Machine Learning Security; Poisoning Attacks
In many situations, machine learning systems are trained on data collected from untrusted sources, such as humans, sensors, the Internet, or IoT devices, which can be compromised and manipulated. These scenarios expose machine learning models to data poisoning attacks, where attackers can manipulate a fraction of the training data to subvert the learning process, e.g., to increase the error or bias of the model.
In this talk, I will give an overview of data poisoning attacks and, in particular, of optimal poisoning attacks, which are useful for evaluating the robustness of machine learning models in worst-case scenarios. Some previous attacks target models that have hyperparameters but treat those hyperparameters as constant, regardless of the fraction of poisoning points injected into the training dataset. This can give a misleading picture of an algorithm's robustness, because the appropriate hyperparameter values can change with the type and strength of the attack.
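For reference, optimal poisoning attacks are commonly written in the literature as a bilevel optimization problem. The sketch below uses assumed notation that does not come from the talk itself: $D_{tr}$ is the clean training set, $D_p$ the set of poisoning points restricted to a feasible set $\Phi$, $D_{target}$ the points on which the attacker's objective $\mathcal{A}$ is measured, $\mathcal{L}$ the (regularized) training loss, $w$ the model parameters, and $\lambda$ a hyperparameter held fixed.

```latex
\begin{aligned}
D_p^\star \in \arg\max_{D_p \in \Phi} \;\; & \mathcal{A}\left(D_{target}, w^\star\right) \\
\text{s.t.} \;\; & w^\star \in \arg\min_{w} \; \mathcal{L}\left(D_{tr} \cup D_p, \, w, \, \lambda\right)
\end{aligned}
```

In this form $\lambda$ does not depend on $D_p$, which is the constant-hyperparameter assumption questioned above.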
To overcome this, I will describe a novel formulation that accounts for the effect of the attack on the model's hyperparameters. This formulation makes it possible to craft optimal attacks, learn hyperparameters, and evaluate robustness under worst-case conditions. We apply it to logistic regression and neural network classifiers. For the case of L2 regularization, I will provide empirical evidence that learning the hyperparameters correctly can help protect machine learning models against these attacks. Our evaluation on multiple datasets shows that choosing a constant value for the regularization hyperparameter a priori can be detrimental to the performance of the algorithms. This confirms the limitations of previous strategies and demonstrates the benefits of L2 regularization in dampening the effect of poisoning attacks when hyperparameters are learned using a small trusted dataset.
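As a rough illustration of selecting the L2 hyperparameter on a small trusted dataset, the following sketch trains a logistic regression classifier on poisoned data and picks the regularization strength by grid search on clean validation points. It is not the formulation from the talk: random label flipping stands in for an optimal attack, grid search stands in for hyperparameter learning, and the synthetic data, the grid, and all function names (`train_logreg`, `flip_labels`, `select_lambda`, and so on) are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lam, steps=200, lr=0.5):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # Mean log-loss gradient plus the gradient of (lam / 2) * ||w||^2.
        w = [wj - lr * (gwj / n + lam * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def error_rate(w, b, X, y):
    """Fraction of misclassified points."""
    wrong = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        wrong += pred != yi
    return wrong / len(X)


def make_data(n, rng):
    """Two Gaussian blobs in 2D (hypothetical toy data)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


def flip_labels(y, fraction, rng):
    """Simple (non-optimal) poisoning: flip a fraction of the labels."""
    y_poisoned = list(y)
    k = int(fraction * len(y))
    for i in rng.sample(range(len(y)), k):
        y_poisoned[i] = 1 - y_poisoned[i]
    return y_poisoned


def select_lambda(X_tr, y_tr, X_val, y_val, grid):
    """Pick the lambda with the lowest error on the trusted set."""
    best = None
    for lam in grid:
        w, b = train_logreg(X_tr, y_tr, lam)
        err = error_rate(w, b, X_val, y_val)
        if best is None or err < best[1]:
            best = (lam, err)
    return best


rng = random.Random(0)
X_train, y_train = make_data(100, rng)
X_trusted, y_trusted = make_data(40, rng)  # small clean validation set
grid = [0.0, 0.01, 0.1, 1.0]

for fraction in (0.0, 0.2):
    y_poisoned = flip_labels(y_train, fraction, rng)
    lam, err = select_lambda(X_train, y_poisoned, X_trusted, y_trusted, grid)
    print(f"poisoned fraction={fraction:.1f}  lambda={lam}  trusted error={err:.3f}")
```

Rerunning the selection for each poisoning fraction, rather than fixing `lam` once, mirrors the point that the appropriate regularization strength may depend on the strength of the attack.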