--
NOTE: THIS ARTICLE IS INCOMPLETE AND NEEDS REVIEW
--
Hi everyone! It's been some time since my last post. In this one, I wanted to continue on the Direct Memory Access (DMA) train of thought and cover something much more complex than simple pointer swapping.
When using a DMA device, the attacker is limited to reading and writing memory, and in most cases that is all we need. However, there are some circumstances where reading and writing memory cannot achieve what we want. Two quick examples of this:
- If we want to send a network message
- In games using guarded pointers, such as Valorant (see this post)
In these situations, a DMA card appears to be nearly useless. What we'd need is a way to execute code on the victim's machine and retrieve the result of that code.
--
To achieve this, I have come up with a solution that combines unused regions of the .text and .data sections with a simple hook on a function that is called every tick.
Let's go through the requirements of my hack. First some terminology:
- Attacker > Attacker Machine. This is hooked up to our DMA card and runs the DMA application.
- Victim > Victim Machine. This machine has the DMA PCIe card plugged in and is what we are reading/writing to.
- Victim Application > Software running on the Victim Machine that we want to execute code inside of.
- DMA Application > Software running on the Attacker Machine that controls our DMA card.
In order to call a function inside the victim application, such as malloc, we first need three things:
- An unused buffer in the .data section that is at least 0x21 bytes long
- An unused or unreachable buffer in the .text section that can fit our payload
- A 5-byte call instruction in a function that runs every frame/tick, or at least often
I created an example application that meets these requirements.
First, we need to find a buffer in .data that is unused. We want something that looks like this:
Between data_140005018 and data_140005048 there are 0x28 unused bytes. This is large enough to meet our 0x21-byte requirement.
Next, we need to find a buffer in .text that is unreachable or unused. This buffer needs to be fairly large, so we'll want to find a large block of code behind an if statement that always fails.
If we go digging, we'll find that this function's if statement never passes, and its body contains 0x3AC bytes that we can use as a buffer:
Lastly, we need to find a 5-byte call instruction that runs often or every tick. Here is a thread that runs every millisecond and contains a call:
Looking at the disassembly, we can happily say this call is 5 bytes in size:
Now we need to figure out how to put together these unused sections to build code execution. The first step, as I see it, is to hijack the call instruction and change the program flow into our .text buffer.
To do this, we overwrite the call instruction with a 5-byte jmp instruction that jumps into our .text buffer:
becomes
Now, in our .text buffer, we recreate the call that was overwritten:
becomes
Lastly, we must jump back to the instruction following the call that we overwrote:
Now we can test this by either patching our victim application on disk or writing those bytes with our DMA card. If we were successful, the victim application should behave exactly as before and should not crash.
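The three instructions above all use the same 5-byte encoding: one opcode byte followed by a signed 32-bit displacement measured from the end of the instruction. Here is a minimal Python sketch of how those bytes could be computed. The call site (0x1BBD) and .text buffer (0x1713) are the RVAs quoted later in this post; the original call target of 0x1000 and all function names are hypothetical and only there for illustration.

```python
import struct

CALL_REL32 = 0xE8  # opcode of the 5-byte near call
JMP_REL32 = 0xE9   # opcode of the 5-byte near jmp


def encode_rel32(opcode, src, dst):
    """Encode a 5-byte call/jmp located at `src` that targets `dst`."""
    disp = dst - (src + 5)  # displacement is relative to the next instruction
    return bytes([opcode]) + struct.pack("<i", disp)


def decode_rel32_target(insn, src):
    """Return the absolute target of a 5-byte call/jmp located at `src`."""
    (disp,) = struct.unpack("<i", insn[1:5])
    return src + 5 + disp


def build_hook(call_site, text_buffer, original_call):
    """Return (patch for the call site, trampoline for the .text buffer)."""
    assert len(original_call) == 5 and original_call[0] == CALL_REL32
    callee = decode_rel32_target(original_call, call_site)
    # 1. Replace the call with a jmp into the .text buffer.
    patch = encode_rel32(JMP_REL32, call_site, text_buffer)
    # 2. Recreate the call at its new location (the displacement changes).
    trampoline = encode_rel32(CALL_REL32, text_buffer, callee)
    # 3. Jump back to the instruction that follows the overwritten call.
    trampoline += encode_rel32(JMP_REL32, text_buffer + 5, call_site + 5)
    return patch, trampoline


# Hypothetical original call: the target 0x1000 is made up for this example.
original = encode_rel32(CALL_REL32, 0x1BBD, 0x1000)
patch, trampoline = build_hook(0x1BBD, 0x1713, original)
print(patch.hex(" "))       # e9 51 fb ff ff
print(trampoline.hex(" "))  # e8 e8 f8 ff ff e9 a5 04 00 00
```

Because the displacement is relative, the recreated call cannot simply be copied byte for byte; it has to be re-encoded for its new address, which is what step 2 does.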
After learning how to hijack our call instruction, we can start to build a payload of code to insert into the .text buffer that enables arbitrary function calls.
The first part of this is to determine what data the DMA card must provide in order to make an arbitrary call. I created a struct of 0x21 bytes that meets these requirements:
First, we need a byte to act as a switch. This way our target function only executes when our DMA card flips this value to TRUE.
We need the absolute address of the function we want to call.
We need a pointer to a structure containing all of our arguments.
We need a buffer for the result to be written into.
Lastly, we need 8 more bytes to use as buffer space for functions that require 1 argument (I'll explain why later).
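To make the layout concrete, here is a small Python sketch of the 0x21-byte structure. It assumes the fields are packed with no padding, in the order listed above; the field names and helper functions are hypothetical.

```python
import struct

# Packed little-endian layout: 1-byte switch followed by four 8-byte values.
# Field order and the lack of padding are assumptions based on the list above.
CONTROL = struct.Struct("<BQQQQ")

FIELD_OFFSETS = {
    "run": 0x00,       # switch flipped to TRUE by the DMA application
    "function": 0x01,  # absolute address of the function to call
    "args": 0x09,      # pointer to the argument structure
    "result": 0x11,    # return value written back by the payload
    "scratch": 0x19,   # 8 spare bytes for single-argument calls
}


def pack_control(run, function, args, result=0, scratch=0):
    """Serialize the control structure into the bytes written over DMA."""
    return CONTROL.pack(int(run), function, args, result, scratch)


def unpack_control(blob):
    """Parse bytes read back over DMA into a dict keyed by field name."""
    values = CONTROL.unpack(blob)
    return dict(zip(FIELD_OFFSETS, values))


print(hex(CONTROL.size))  # 0x21
```

One byte plus four 8-byte fields gives 33 bytes, which is where the 0x21 requirement for the .data buffer comes from.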
With this struct defined, we can start to build a payload for the .text section. We'll need to know some x86-64 assembly and the x64 software conventions on Windows.
The first thing we must do is save all of the volatile registers, and any other registers we will use, on the stack.
Next, we will access the .data section and check if the boolean toggle is TRUE. If the value is FALSE, we need to jump to the end of our payload, landing just before the code that restores the saved registers (covered shortly).
Now, to be safe, we can check if the target function is NULL. If it is, we can't run it, so we'll want to jump to the end of our .text buffer, just like before.
Finally, we'll load the address of the function we want to execute into the RAX register.
Note: At this point, we could call our function. However, we want to be able to run functions with arguments. So I'll continue as if we want to call a function with 5 arguments to showcase all of the steps necessary.
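To call a function with arguments, we need to understand the Windows x64 calling convention. In short: the first four integer or pointer arguments are passed in the registers rcx, rdx, r8, and r9 respectively, while all subsequent arguments are passed on the stack, pushed in reverse order. The caller must also reserve 32 bytes of shadow space below the stack arguments and keep the stack 16-byte aligned at the call. Floating point values are handled differently and I decided not to cover them.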
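As a quick reference, this Python sketch lists where the caller places each integer or pointer argument under the Windows x64 convention, as seen at the moment of the call instruction. Stack arguments sit above the 32 bytes of shadow space that the caller reserves. The function name is hypothetical, and floating point arguments are ignored, as in the rest of this post.

```python
REGISTER_ARGS = ("rcx", "rdx", "r8", "r9")
SHADOW_SPACE = 0x20  # 32 bytes the caller reserves for the register arguments


def place_integer_args(count):
    """Return the location of each integer/pointer argument at the call."""
    places = []
    for index in range(count):
        if index < len(REGISTER_ARGS):
            places.append(REGISTER_ARGS[index])
        else:
            # Fifth and later arguments live just above the shadow space.
            offset = SHADOW_SPACE + 8 * (index - len(REGISTER_ARGS))
            places.append(f"[rsp+{offset:#x}]")
    return places


print(place_integer_args(5))
# ['rcx', 'rdx', 'r8', 'r9', '[rsp+0x20]']
```

Inside the callee the same slots appear 8 bytes higher, because the call instruction pushes the return address.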
The first step when calling a function with arguments is to ensure our pointer to the argument structure is non-null.
Now we must load our first four arguments into the correct registers.
For all remaining arguments, we'll want to load them into a register and push their value onto the stack in reverse order. In our case, we only have 1 more argument, so we only need to do this once, for the value at offset 0x20 of the argument structure.
Finally! We can call the function.
After calling the function, we need to pop the arguments we pushed to the stack.
Lastly, we write the return value (in RAX) to the result field of the .data structure and flip the boolean toggle back to FALSE (to prevent running this function a second time).
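The control flow of the payload is easier to follow as a model than as assembly. The Python sketch below simulates one pass through the payload against a bytearray standing in for the victim's memory. It is only a model of the handshake, not the payload itself: the packed struct layout is assumed, addresses are offsets into the bytearray, and the `functions` table that maps fake addresses to Python callables is hypothetical.

```python
import struct

CONTROL = struct.Struct("<BQQQQ")  # assumed packed layout of the 0x21-byte struct
MASK64 = 0xFFFFFFFFFFFFFFFF


def payload_tick(memory, data_offset, functions):
    """Model one pass through the payload on a simulated address space."""
    run, function, args, _result, scratch = CONTROL.unpack_from(memory, data_offset)
    if not run:
        return  # switch is FALSE: nothing was requested
    if function == 0 or args == 0:
        return  # NULL function or argument pointer: skip the call
    values = struct.unpack_from("<5Q", memory, args)  # five 8-byte arguments
    result = functions[function](*values) & MASK64
    # Store the result and clear the switch so the call runs only once.
    CONTROL.pack_into(memory, data_offset, 0, function, args, result, scratch)


memory = bytearray(0x100)
DATA, ARGS, FAKE_ADDRESS = 0x00, 0x40, 0x1234
struct.pack_into("<5Q", memory, ARGS, 1, 2, 3, 4, 5)
CONTROL.pack_into(memory, DATA, 1, FAKE_ADDRESS, ARGS, 0, 0)

payload_tick(memory, DATA, {FAKE_ADDRESS: lambda a, b, c, d, e: a + b + c + d + e})
run, _, _, result, _ = CONTROL.unpack_from(memory, DATA)
print(run, result)  # 0 15
```

From the attacker's side, the request is finished once the switch reads back as FALSE, at which point the result field is valid.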
Now we have a .text payload that can execute any function with 5 arguments! Here is the final assembly code:
We have one more pressing concern. How do we allocate memory to store all 5 arguments? What if we need more than 5 arguments?
To do this, I propose using a function like malloc. This function takes one argument that is 8 bytes in length.
This is why we need an extra 8 bytes at the end of our structure for functions with 1 argument. When we first call malloc, we have no other working memory in the victim application, so the argument must live in our .data buffer.
Now we need to start building our DMA application. To start, we need to find the RVAs of the three requirements. We'll call these dataAddress, callHijack, and textBuffer.
In our target application, these three values are at the offsets 0x5020, 0x1bbd, and 0x1713 respectively.
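Here is one way the single-argument case could be prepared on the attacker side, sketched in Python. The argument pointer is aimed at the spare 8 bytes at the end of the structure (offset 0x19 under the assumed packed layout), so no other memory is needed. The addresses used below are hypothetical, and the function returns a list of writes rather than calling any real DMA library.

```python
import struct

SCRATCH_OFFSET = 0x19  # offset of the spare 8 bytes (assumed packed layout)


def build_single_arg_request(data_address, function_address, argument):
    """Return the (address, bytes) writes for a one-argument call."""
    args_pointer = data_address + SCRATCH_OFFSET  # argument lives in the struct itself
    body = struct.pack("<QQQQ", function_address, args_pointer, 0, argument)
    return [
        (data_address + 1, body),  # everything except the switch
        (data_address, b"\x01"),   # flip the switch last, once the rest is in place
    ]


# Hypothetical addresses: image base 0x140000000 plus RVA 0x5020, and a made-up
# address for malloc. 0x28 bytes is enough room for five 8-byte arguments.
writes = build_single_arg_request(0x140005020, 0x7FF812345678, 0x28)
for address, data in writes:
    print(hex(address), data.hex())
```

Writing the switch in a separate, final write is a design choice in this sketch: it avoids the payload seeing TRUE while the rest of the structure is only partially written.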
So our DMA application will start to look something like this:
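As a rough Python sketch of that first step, the three RVAs can be turned into absolute addresses by adding the base address of the victim module. The class and function names are hypothetical, and the base of 0x140000000 is only an assumption suggested by the symbol names above; with ASLR enabled, the real base has to be read from the victim at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    data_address: int  # unused .data buffer holding the control structure
    call_hijack: int   # 5-byte call instruction we overwrite
    text_buffer: int   # unreachable .text region that receives the payload


# RVAs quoted in this post for the example application.
RVAS = Targets(data_address=0x5020, call_hijack=0x1BBD, text_buffer=0x1713)


def resolve(module_base, rvas=RVAS):
    """Convert module-relative RVAs into absolute virtual addresses."""
    return Targets(
        data_address=module_base + rvas.data_address,
        call_hijack=module_base + rvas.call_hijack,
        text_buffer=module_base + rvas.text_buffer,
    )


targets = resolve(0x140000000)  # assumed base address, see above
print(hex(targets.data_address))  # 0x140005020
```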
I have built a neat framework around my DMA application. I don't expect this to be pertinent moving forward so you can take some of the code for granted.
I created a class called FunctionRunner. This will implement a very simple interface for executing code in the target application. That interface looks like:
Sorry for the abrupt end - I wanted to unlock this so new readers can learn a bit about some more interesting uses for DMA.