Towards Optimal Timing in Cyber Defense: A Game Theoretic
and Learning Approach
Personnel
- Principal Investigator: Zizhan Zheng
- Graduate Student: Henger Li
Goals
The goal of this project is to develop a unified game theoretic and learning approach to tackle the fundamental challenges in achieving adaptive timing of security updates in the face of advanced attacks, which will be achieved by conducting reseach on three inter-related tasks:- Task 1: Optimal Timing with Partial Feedback: We will develop game models and online learning algorithms that can help a defender with very limited prior knowledge learn its optimal timing strategy from the imperfect feedback obtained during the game. We will consider two cases: when the system is compromised frequently and when the system is under continuous attacks but is compromised only occasionally.
- Task 2: Optimal Timing against Advanced Attacks: We will develop adversarial learning algorithms to achieve optimal timing when the attacker is highly adaptive. Moreover, we will extend our game model from the single node setting to multiple interdependent nodes and develop learning-based security update strategies to thwart coordinated attacks.
- Task 3: Metrics and Validation: We will develop new metrics for comparing game theoretic and learning-based cyber-defense solutions. Further, we will conduct simulations and experiments and collect real data-sets to validate the game models and learning algorithms we develop.
Activities and Findings
We have developed online learning algorithms for optimal timing of security updates under partial feedback (Task 1). Our first algorithm was based on the improved UCB algorithm, which has a very good theoretic performance bound but converges rather slowly in practice. We then designed a new algorithm based on the technique of Thompson sampling, where the next defense period is chosen based on the posterior distribution of attack times observed in previous rounds. The main novelty of our algorithm is to incorporate side observations into Thompson sampling to accelerate learning. We have derived new performance bounds for Thompson sampling in a general graphical bandit setting. We observed that when adapted to our optimal timing problem, Thompson sampling achieves superb empirical performance.
We have developed a new proactive defense approach to combat advanced attacks (Task 2). The new approach is based on the technique of moving target defense (MTD), where the defender dynamically updates the system configuration to increase the attacker’s uncertainty. MTD can be viewed as a generalization of periodic security updates. That is, in addition to deciding when to update system configuration, the defender should also determine what new configuration to use. Although MTD has been successfully applied to various domains, there has been very little research on quantifying the effectiveness of MTD. Further, the timing decisons have been largely ignored in previous studies. We have developed a Markovian modeling of MTD and are currently working on incorporating timing decisions into MTD.
In addition to numerical studies, we have implemented a prototype of our security game framework, which will be posted to Amazon Mechanical Turk to invite human players (“workers”) to play the games. This experiment will help us collect real data needed to validate our algorithms (Task 3). Although behavioral studies on games of timing have been conducted before, previous work largely ignored the stealthy behavior of adversaries and their resource constraints, both of which are captured by our models. We are currently working on improving the user interface and adding more strategies to the games and studying how to incentivize human workers to participate.
Publications
- Analysis
of Thompson Sampling for Graphical Bandits Without the Graphs.
Fang Liu, Zizhan Zheng, and Ness B. Shroff; Conference on Uncertainty in Artificial Intelligence (UAI), 2018. - A
Stackelberg Game and Markov Modeling of Moving Target Defense.
Xiaotao Feng, Zizhan Zheng, Derya Cansever, and Prasant Mohapatra; Decision and Game Theory for Security (GameSec), 2017. - When
to Reset Your Keys: Optimal Timing of Security Updates via
Learning.
Zizhan Zheng, Ness B. Shroff, and Prasant Mohapatra; AAAI Conference on Artificial Intelligence (AAAI) 2017.