This is a paper published in ACSAC 2018. In this paper, the author introduces a method to automatically generate exploitation primitives during the web browser exploitation. In this paper, the author uses CVE-2016-9079 as an example to demonstrate their work, which happens to be analysed in my post before.
In this post, I’d like to take this chance to give some academic definitions about those jargons in our previous posts. One interesting thing about this paper is that the paper is put into the session of web security not software security during the conference. ~~~OO~~~
In the process of web browser exploitation, generating exploitation primitives is a cornerstone in successful exploitation.
In this paper, the authors assume the presence of a memory corruption vulnerability that can be triggered by the attacker. The bug is not prepared and provides no useful primitive. However, a heap spray exists to provide changeable, but still unusable memory contents. Furthermore, we assume that only the crashing input and the initial point of control is known to the attacker,
e.g., a CPU register is controlled.
To put the assumption in another way, the whole work starts from a crashing point where the victim register value has been changed to 0x41414141 via a heap spray.
From an exploitation perspective, a small vulnerability testcase (VUT) ideally triggers the bug and crashes the target process in a deterministic way. The authors define a VUT to be a user-controlled input, which provides a first and basic control point in the program flow.
After control flow is already illegitimately influenced by attacker. An action of the attacker’s choice should be exercised next. The authors name the program point where this specific action takes place attacker sink.
The execution flow, starting at the control point and eventually landing in the attacker sink, is called exploitation primitive.
In this paper, the authors propose three exploitation primitives: (1) Write-Where, (2) Write-What-Where, (3) Control over Instruction Pointer.
The main automation task we accomplish is to generate JS code which triggers an exploitation primitive, i. e., the execution of the program path between the control point and the attacker sink. The generated JS/HTML files based on VUTs to perform the intended exploitation primitive are called exploitation primitive triggers (EPT).
The authors use the following picture to demonstrate the target of the whole system.
The goal of the automation is to start from the crashing point at 0x107a00d4 to reach the control flow hijacking point (sink point) at 0x101c0cb8.
The prototype is split into two main phases consisting of several components.
Each function is lifted into an intermediate language (IL) which is, according to its CFG, transformed into static single assignment (SSA) form. Furthermore, collect data such as function entries, register uses/definitions, memory reads/writes, and control-flow information.
The authors use a Datalog-based approach to follow a path of controlled data beyond the control point. This can be seen as a lightweight static taint analysis. After having determined the locations of a control point, they start the analysis to find reachable sinks.
There are properties of interest that describe load and store operations which we express through facts and store them into their corresponding fact database. Part of the facts are listed in the table below:
Chapter 3.2, 3.3 and 3.4 in this paper describe how these rules are applied to find an EPT with great details. It is recommended to read the original paper for a deep understanding of the whole system.
Each satisfiable path needs to undergo a verification process. The authors use a dump that is acquired at the time where we hit the control point. Usually this is the moment where, for instance, the heap spray has already occurred. Then the authors mimic the process of different heap spray routines by setting the memory according to our memory maps. In an emulation
process we examine if our memory settings drive the execution into the desired primitive. Paths that do not fulfil this property are filtered out.
The core of the system consists of 44,400 lines of Python code and 2,600 lines of Datalog code. This is an awesome amount of work compared to my own…
The following table gives the evaluation result of the system
This paper describes an automatic system to generate exploitation primitives. However, from the perspective of a pwner, I think there are still many things to do in future:
1) The successful generation of an exploitation primitive relies on some memory massaging techniques, e.g. heap spray, to achieve controlling register value at crashing point.
2) The system only applies to 32-bit binary only. From my experience on CVE-2016-9079 on 64-bit, the original heap spray code does not work while the exploitation assumptions may change.