SAS CTF and the many ways to persist a kernel shellcode on Windows 7

On May 18, 2024, Kaspersky’s Global Research & Analysis Team (GReAT), with the help of its partners, held the qualifying stage of the SAS CTF, an international competition of cybersecurity experts that is part of the Security Analyst Summit conference. More than 800 teams from all over the world took part in the event, solving challenges based on real cases that Kaspersky GReAT encountered in its work, but a couple of challenges remained unsolved. One of those challenges was based on a security issue that allows kernel shellcode to be hidden in the system registry and executed during system boot on a fully updated Windows 7/Windows Server 2008 R2, due to an incomplete fix for the CVE-2010-4398 vulnerability. Although security updates and technical support for Windows 7 ended in early 2020, the fact that the released patch only partially addressed the issue was known long before that, and we saw this flaw exploited in a targeted attack in 2018. At the time, we notified Microsoft about the in-the-wild exploitation, but Microsoft declined to address it because the technique requires attackers to already have administrator privileges. In this blog post, we will provide technical details about this flaw and the SAS CTF task based on it.

Vulnerability details

There is a design flaw in older versions of Windows operating systems (Windows NT 4.0 through Windows 7) that allows a kernel shellcode to persist and be launched at system boot by writing specially crafted data to some of the many locations in the system registry.

The Windows kernel API has a function called RtlQueryRegistryValues that can be used to query multiple values from a registry subtree with a single call.

RtlQueryRegistryValues syntax

The values to be queried by this function are defined by the QueryTable parameter, which contains a pointer to a table consisting of _RTL_QUERY_REGISTRY_TABLE structures.

_RTL_QUERY_REGISTRY_TABLE structure definition

Each table entry defines the name of the value to query, its default type (e.g., REG_NONE, REG_BINARY, REG_DWORD, REG_SZ etc.; this is optional) and default data, the address of the buffer to store the value or the address of the callback function, and flags that control how to query this value.

One of the supported flags, RTL_QUERY_REGISTRY_DIRECT, causes RtlQueryRegistryValues not to execute a callback function (pointed to by the entry’s QueryRoutine field), but to store the queried value directly to the provided buffer (pointed to by the entry’s EntryContext field).

While writing data directly to the provided buffer instead of executing a callback may be more convenient, it leads to unexpected consequences if the requested value in the registry is for some reason of an unexpected type. For instance, if the code expects a value of type REG_DWORD, which has a fixed size of four bytes, but receives a value of type REG_BINARY, which is variable in size, the value may not fit fully into the prepared buffer. As a result, if RtlQueryRegistryValues returns more data than the calling function expected, a buffer overflow occurs that can be easily exploited on Windows 7 and older systems because of the lack of stack cookies.

To address this issue, Microsoft has implemented and encouraged developers to use an additional flag, RTL_QUERY_REGISTRY_TYPECHECK, which is intended to be used in conjunction with the RTL_QUERY_REGISTRY_DIRECT flag to check that the type of the requested value matches the type expected by the caller.
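To make the type mismatch more tangible, here is a deliberately simplified toy model in Python. It is not the kernel's actual logic: the "stack frame" layout, the helper names and the TypeMismatch exception are all assumptions made for illustration, and only the REG_* and RTL_QUERY_REGISTRY_* constants are taken from the Windows headers. The sketch shows a caller that reserved four bytes for a REG_DWORD and what happens when a longer REG_BINARY value is copied in, with and without the type check.

```python
import struct

REG_BINARY = 3
REG_DWORD = 4

RTL_QUERY_REGISTRY_DIRECT = 0x20
RTL_QUERY_REGISTRY_TYPECHECK = 0x100


class TypeMismatch(Exception):
    """Hypothetical stand-in for the type-mismatch failure status."""


def query_direct(frame, offset, value_type, data, flags, expected_type=REG_DWORD):
    """Toy model of a direct query: copy `data` into `frame` at `offset`."""
    if flags & RTL_QUERY_REGISTRY_TYPECHECK and value_type != expected_type:
        raise TypeMismatch(f"expected type {expected_type}, got {value_type}")
    # No length check on purpose: the caller only reserved 4 bytes.
    frame[offset:offset + len(data)] = data


def make_frame(return_address):
    """Assumed layout: 4-byte local DWORD, 4 bytes padding, 8-byte return address."""
    return bytearray(struct.pack("<IIQ", 0, 0, return_address))


def saved_return_address(frame):
    """Read back the 8-byte return address slot of the toy frame."""
    return struct.unpack_from("<Q", frame, 8)[0]
```

With only RTL_QUERY_REGISTRY_DIRECT set, a 16-byte REG_BINARY value spills over the local variable and replaces the saved return address; with RTL_QUERY_REGISTRY_TYPECHECK added, the query is rejected before anything is written.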

Note from RtlQueryRegistryValues documentation

However, this is by no means a complete fix, and for Windows 7 Microsoft itself started using the new flag only where it was absolutely necessary to address possible privilege escalation vulnerabilities. As for the vulnerable registry/code paths that could be accessed with admin rights, they were not patched, giving attackers the opportunity to stealthily store and execute kernel shellcode.

In one of the attacks, we observed an APT actor using two DirectX drivers for exploitation – “dxgmms1.sys” and “dxgkrnl.sys” – but a quick look revealed about a dozen vulnerable drivers included in the Windows 7/Windows Server 2008 R2 base package.

Exploitation

To execute kernel shellcode, attackers exploit multiple stack buffer overflows in two drivers using the RtlQueryRegistryValues function. This is done in two stages.

In the first stage, attackers exploit the insecure use of the RtlQueryRegistryValues function in the “dxgmms1.sys” driver. The vulnerable code queries several registry values from the path “HKLM\SYSTEM\ControlSet001\Control\GraphicsDrivers\MemoryManager”, and making these registry entries bigger than expected results in several buffer overflows. Attackers can use this to write the shellcode to a fixed location in the kernel memory at the address 0xfffff78000000800, which is an address of the KUSER_SHARED_DATA structure + 0x800.

Exploitation of "dxgmms1.sys" driver

In the second stage, attackers exploit the insecure use of the RtlQueryRegistryValues function in the “dxgkrnl.sys” driver; the registry values used by the vulnerable code are located at “HKLM\SYSTEM\ControlSet001\Control\GraphicsDrivers”. This allows attackers to overwrite the return address of one of the called functions with an address of 0xfffff78000000800, resulting in the execution of the shellcode written in the first stage of exploitation.

Exploitation of "dxgkrnl.sys" driver

All registry values used during exploitation are expected to be of type REG_DWORD, but the attackers have set them to malicious values of type REG_SZ/REG_BINARY. Since the SYSTEM hive is explicitly trusted, the data type mismatch is ignored and this results in successful exploitation.

The SAS CTF challenge

The beginning

You are presented with a README.txt note and three other files:

The SOFTWARE and SYSTEM files are exactly what their names suggest: the registry hives of a Windows system.

Now, our first goal would be to find the piece of registry that is causing the VM to crash. This can be done in several ways, such as trying to find a piece of executable code in the registry hives (there is a NOP sled at offset 0x92D675 in the SYSTEM hive). But let’s try to reproduce the crash instead.
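For those who prefer the first route, a scan for NOP sleds in the raw hive can be sketched in a few lines of Python. The function name and the minimum run length of 16 bytes are our own choices for this sketch, not something dictated by the task.

```python
import re


def find_nop_sleds(data, min_len=16):
    """Return (offset, length) for every run of 0x90 bytes of at least min_len."""
    pattern = re.compile(rb"\x90{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]
```

Running it over the SYSTEM hive, for example with find_nop_sleds(open("SYSTEM", "rb").read()), gives candidate offsets to inspect in a hex editor or disassembler.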

Identifying the VM and the OS

We are going to use regipy to parse and dump the registry hives. By dumping the SYSTEM hive, we can easily recognize the VirtualBox devices:

Just to be sure, we can even find the right version of the VirtualBox additions package, which is 6.1.46:

We also can identify the exact Windows build to run, which turns out to be Windows 7 SP1 x64:

Now let’s grab a Windows 7 SP1 VM or install a fresh one in a VirtualBox VM. While the VM is booting, let’s also build a timeline of the registry hive that we may need later:

Now download your favorite Live CD (for example, a vanilla Ubuntu Desktop ISO that we’ll boot to transplant the registry hives into the Windows system).

Install the VirtualBox guest additions from the official ISO to match what was installed in the original system. The clues in the README note (video driver!), the list of installed drivers and the shimcache (try “regipy-plugins-run -p shimcache -o output.txt SYSTEM && cat output.txt”, it will mention running dxdiag.exe) suggest that the system should be configured with Direct3D support, and this is crucial to triggering the exploit.

Once installed, “dxdiag.exe” should show “Enabled” for Direct3D on the VM:

Set up the debugger

Before we continue, let’s turn on kernel debugging inside the VM. Since we know there should be a BSOD, we will need it. You can also do this later by backing up the original registry hives to boot into the system and run the proper commands.

We will also set up a second Windows VM with the Windows Debugger and connect it to our target VM using a pipe-based virtual COM port. Start WinDbg on the debugger system (“Kernel Debug”), reboot the VM and you should see the kernel debugger connect. If not, check the COM port connection between the machines. It is also possible to use the host machine to run the debugger.

Crash!

Once it is working, replace the SOFTWARE and SYSTEM hives. Back up the original files, copy the hives (drag and drop, or via a share) to the VM and reboot into a Live CD, mount the NTFS volume, then copy the hives to “mountpoint/Windows/System32/config/”. Reboot and you should get an infinite BSOD loop/connection to the debugger.
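If you do the transplant from a Live CD that ships Python, the backup-then-copy step can be scripted. This is only a sketch: the function name and directory arguments are hypothetical, and config_dir is assumed to point at the mounted “Windows/System32/config” directory.

```python
import shutil
from pathlib import Path


def transplant_hives(new_hives_dir, config_dir, backup_dir,
                     names=("SOFTWARE", "SYSTEM")):
    """Back up the existing hives, then overwrite them with the supplied ones."""
    new_hives_dir = Path(new_hives_dir)
    config_dir = Path(config_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        target = config_dir / name
        if target.exists():
            # Keep the original so the system can be restored later.
            shutil.copy2(target, backup_dir / name)
        shutil.copy2(new_hives_dir / name, target)
```

Keeping the backups around pays off later, when the original SYSTEM hive has to be restored temporarily.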

Without the debugger it looks like this:

With the debugger, WinDbg output looks like this:

Analyzing the crash

We need to investigate this crash. Now, we can either extract the crash dump and inspect it offline, or debug live with our debugger machine (host, or a second VM) – let’s continue with the latter course. Make sure you can download the correct symbols, set up the symbol path, and execute “.reload /f” in WinDbg to force the download.

By inspecting the addresses on the stack around the stack pointer we can find an address inside “dxgkrnl”:

Further on in the stack we see the return addresses from nt!ObCreateObject:

Now we have a choice: either analyze the vulnerability in dxgkrnl and dxgmms1 until we understand exactly what is happening, or take a more hacky route, guided by the task note (“I tried to fix the registry but now it bluescreens all the time”):

  • check the memory around the crash pointer. At the address +0x800 from the crash site you can clearly see a shellcode that doesn’t belong to any module and can be analyzed;
  • search for the crash pointer address in the registry, using the timeline we generated and looking for “recent” changes.

Nothing. Let’s reverse the byte order (it may be a binary string, little endian):
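Reversing the byte order is just little-endian packing of the 64-bit address. The helper names below are ours, and the crash address 0xfffff78000000000 is inferred from the byte sequence “0000000080f7ffff” quoted further down in this write-up; the value names passed to find_values in practice would come from whatever registry dump you are working with.

```python
import struct


def le_hex(address):
    """Return a 64-bit address as the hex string of its little-endian bytes."""
    return struct.pack("<Q", address).hex()


def find_values(values, address):
    """Return names of values whose raw data contains the little-endian address.

    `values` is a plain dict mapping a value name to its raw bytes.
    """
    needle = struct.pack("<Q", address)
    return sorted(name for name, data in values.items() if needle in data)
```

le_hex(0xfffff78000000000) yields the string to search for in a textual dump, while find_values works on raw value data.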

Now there are three registry values that contain the crash address (“TdrDdiDelay”, “TdrDebugMode” and “TdrLevel”), and these *could* be controlling the return address of a dxgkrnl driver’s routine. Let’s see if the location of the shellcode can also be found in the registry:

Indeed, the “NbDmaBufferLimitPerDevice” value contains exactly that address. We suggest actually analyzing the drivers and finding out where exactly these registry values come into action and control the creation of a UNICODE_STRING object that then leads to a memory copy to a fixed kernel address and then a return to that address.

Just to check this, let’s edit the registry values “TdrDdiDelay”, “TdrDebugMode” and “TdrLevel” in the key “\\ControlSet001\\Control\\GraphicsDrivers” and change the sequence “0000000080f7ffff” to “0008000080f7ffff”. This can be done, for example, by temporarily restoring the original “SYSTEM” hive (Live CD), booting the system, editing the “malicious” hive with regedit (“Load hive”) and then moving it back (Live CD).
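The edit itself is a plain byte substitution, so once the raw value data has been exported it can be sketched in Python as well. The function name is hypothetical; the two byte sequences are the ones quoted above.

```python
OLD = bytes.fromhex("0000000080f7ffff")  # little-endian crash address
NEW = bytes.fromhex("0008000080f7ffff")  # same address + 0x800 (the shellcode)


def redirect_to_shellcode(value_data):
    """Replace the crash address inside raw value data with the shellcode address."""
    if OLD not in value_data:
        raise ValueError("crash address not found in value data")
    return value_data.replace(OLD, NEW)
```

The difference between the two addresses is exactly 0x800, which is why execution lands on the shellcode instead of the start of the page.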

Now we can set a breakpoint at the beginning of the shellcode and get a hit on reboot:

Dump the memory page and analyze it statically, or continue in the debugger to find out its purpose. Although you can tinker with the registry hive and transplant only parts of it to make the OS boot without any errors, it is also possible to continue the analysis statically, with the debugger still required as a supporting tool.

The shellcode

Analyzing the shellcode from the beginning leads us to a function at offset 0x717 that starts a system thread; the thread routine is at offset 0x269. API names are resolved by hashes, so we need to step through them in the debugger or resolve them using a script. The decompiled shellcode is shown below.
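A resolving script usually boils down to hashing a list of candidate export names and building a reverse lookup table. The sketch below uses the common ROR13 additive hash purely as a stand-in: the hash function actually used by this shellcode has to be taken from the disassembly, and the function names here are hypothetical.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name):
    """Stand-in hash (an assumption): rotate right by 13, then add each byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h


def build_resolver(candidate_names, hash_fn=ror13_hash):
    """Map hash -> name for every candidate; the first name wins on collisions."""
    table = {}
    for name in candidate_names:
        table.setdefault(hash_fn(name), name)
    return table
```

Feeding it the export names of ntoskrnl.exe and swapping in the real hash function gives a table that turns the constants seen in the shellcode back into API names.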

The two-QWORD array is filled with the two arguments of the “RealMain_717” routine and is then used by the thread routine: the bytes from this pointer are copied until a sequence of 0xC3, 0xCC, 0xCC is encountered. The loop extracts the bytes until the end of some function (pointed to by param_2 or RealMain_717):
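The copy loop can be mirrored in Python once the relevant memory has been dumped. Whether the terminating bytes themselves are part of the copied sequence should be checked against the disassembly, so the sketch exposes it as a parameter; the function name is ours.

```python
FUNCTION_END = b"\xC3\xCC\xCC"  # ret followed by two int3 padding bytes


def copy_until_function_end(memory, start=0, include_terminator=False):
    """Return the bytes from `start` up to the first ret/int3/int3 sequence."""
    end = memory.find(FUNCTION_END, start)
    if end < 0:
        raise ValueError("function end marker not found")
    if include_terminator:
        end += len(FUNCTION_END)
    return memory[start:end]
```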

The copied bytes are then modified (two DWORDs set to zero), and encrypted with the output of an LCG pre-seeded with the first DWORD of the code sequence, XORed with 0x150D.

The code then uses RtlQueryRegistryValues to read “CurrentControlSet\Control\GraphicsDrivers\MemoryManager”, value “Control”, and decrypts the registry value using the output of the LCG and the encrypted code sequence from the previous piece of code:

So decryption is only possible if the original byte sequence is known (the length of the key is unknown, and brute-forcing the LCG would not help). The address of the correct code sequence can be extracted from the debugger.
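The overall shape of the scheme can be sketched as follows. Several details are assumptions and must be replaced with what the shellcode really does: the LCG multiplier and increment are placeholders, the way a byte is taken from the LCG state is a guess, and XOR is assumed as the combining operation in both steps. Only the seed derivation (first DWORD of the code sequence XORed with 0x150D) comes from the description above.

```python
import struct
from itertools import cycle

# ASSUMPTION: placeholder LCG constants, not the ones used by the shellcode.
LCG_A, LCG_C = 214013, 2531011


def lcg_bytes(seed):
    """Yield an endless stream of bytes from a 32-bit LCG."""
    state = seed & 0xFFFFFFFF
    while True:
        state = (state * LCG_A + LCG_C) & 0xFFFFFFFF
        yield (state >> 16) & 0xFF  # ASSUMPTION: which bits become the output byte


def derive_key(code):
    """Encrypt the copied code bytes with the LCG stream to obtain the key."""
    seed = struct.unpack_from("<I", code, 0)[0] ^ 0x150D
    return bytes(b ^ k for b, k in zip(code, lcg_bytes(seed)))


def xor_with_key(data, key):
    """XOR `data` with `key`, repeating the key as needed (assumed combination)."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))
```

The point of the sketch is the dependency chain: without the exact code bytes there is no seed and no key, so the registry value cannot be decrypted.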

Booting with the shellcode

Although replacing the SYSTEM and SOFTWARE hives may get you past the first BSOD, the system will most likely still crash. To actually have a bootable system (and since we know the registry keys that trigger execution), it makes sense to transplant only the correct registry keys:

  • boot in a normal installation of Windows 7 SP1 amd64, mount the SYSTEM hive with Regedit;
  • save the registry key from the mounted hive and then restore it to the target registry location:

Now it is possible to boot the system, debug the shellcode, and figure out the bytes required for decryption. By setting a breakpoint at offset 0xA4 in the shellcode, we can trace the correct address:

The original source bytes to be modified and hashed are located in dxgkrnl.sys:

The modification of DWORDs at offsets 0x5 and 0x11 removes relocatable parts.
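Zeroing these two DWORDs makes the byte sequence independent of where the driver happens to be loaded, so the derived key stays the same across boots. A minimal sketch of that normalization step, with a function name of our own choosing:

```python
import struct

RELOC_OFFSETS = (0x5, 0x11)


def strip_relocatable(code, offsets=RELOC_OFFSETS):
    """Return a copy of `code` with a zero DWORD written at each offset."""
    buf = bytearray(code)
    for off in offsets:
        struct.pack_into("<I", buf, off, 0)
    return bytes(buf)
```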

Decrypting the second stage

Let’s dump the “Control” registry value:

The resulting file should be 10848 bytes long and have the following MD5 checksum:
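Checking the size and checksum of a dumped blob takes only a couple of lines with hashlib; the helper name is ours.

```python
import hashlib


def describe(data):
    """Return (size in bytes, lowercase MD5 hex digest) for a blob."""
    return len(data), hashlib.md5(data).hexdigest()
```

For the dumped “Control” value, the first element of the returned tuple should be 10848.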

By implementing the decryption algorithm the same way as in the shellcode, and using the correct bytes from dxgkrnl, we are able to decrypt the second stage:

The resulting file should have the following MD5 checksum:

Since the second stage is also a shellcode, let’s see what’s in there.

The second stage

The shellcode of the second stage can be recognized as multi_arch_kernel_queue_apc.asm with minor modifications. Once recognized, there is no need to analyze the code because its only purpose is to inject a usermode APC with the payload. One detail worth mentioning is that the hashed name of the target process is “vboxtray.exe” (hash value 0x21B5C5E1).

The shellcode is appended with a WORD value equal to the length of the usermode payload, followed by the payload itself. So, by searching for the loader’s ending opcodes, we can locate the usermode payload:
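Given the loader's trailing opcodes, carving out the payload is a matter of reading the 16-bit length that follows and slicing. The sketch assumes a little-endian WORD and takes the marker bytes as a parameter, since they have to be read from the disassembled loader; the function name is hypothetical.

```python
import struct


def extract_usermode_payload(stage, loader_tail):
    """Locate `loader_tail`, read the WORD length after it, return the payload."""
    idx = stage.find(loader_tail)
    if idx < 0:
        raise ValueError("loader tail not found")
    length_at = idx + len(loader_tail)
    (length,) = struct.unpack_from("<H", stage, length_at)
    payload = stage[length_at + 2:length_at + 2 + length]
    if len(payload) != length:
        raise ValueError("stage is shorter than the declared payload length")
    return payload
```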

Let’s extract and check the contents of this payload.

Usermode payload

The usermode part injected as an APC starts with a DLL loader generated by the sRDI (“shellcode reflective DLL injection”) toolkit. This is a common piece of code that usually requires no additional analysis, so for now let’s focus on its payload – the DLL appended to the shellcode.

The library has the following characteristics:

Link time 2024-05-16 11:52:51 (GMT)
Linker version AMD64 Windows Console DLL
Size 7207
Internal name keylogger.dll
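The link time and target architecture can be read straight from the PE headers of the extracted DLL. The sketch below parses only the two fields it needs and assumes a well-formed image; the function name is ours.

```python
import struct
from datetime import datetime, timezone


def pe_link_time(image):
    """Return (Machine, link time as a UTC datetime) from a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # COFF header after the signature: Machine, NumberOfSections, TimeDateStamp
    machine, _sections, timestamp = struct.unpack_from("<HHI", image, e_lfanew + 4)
    return machine, datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

A Machine value of 0x8664 corresponds to AMD64.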

The library’s entry point simply executes a call to OutputDebugString that can be used for live debugging:

The only export “Hook” that is triggered by the reflective loader also produces a debug string, loads system libraries, and creates a thread. In this thread, the module first resolves API addresses by their hashes and then sets up a typical keylogger Windows hook (WH_KEYBOARD_LL):

The hook procedure is the most important here: it collects WM_KEYDOWN events in a buffer, encrypts them with RC4, and then sends them via UDP. The RC4 key is built from a MachineGuid and a fixed binary string:
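RC4 is small enough to reimplement directly, which avoids any third-party dependency when writing the decryptor. In the sketch below the cipher itself is standard, while build_key is an assumption: the order in which the MachineGuid and the fixed string are combined, and the encoding of the GUID, must be taken from the disassembly.

```python
def rc4(key, data):
    """Encrypt or decrypt `data` with RC4 under `key` (the operation is symmetric)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:  # keystream generation and XOR
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def build_key(machine_guid, fixed):
    """ASSUMPTION: ASCII MachineGuid text followed by the fixed binary string."""
    return machine_guid.encode("ascii") + fixed
```

The MachineGuid itself can be read from the SOFTWARE hive supplied with the task, under “Microsoft\Cryptography”.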

Now that we know the business logic of the module, it is time to look at the network dump (traffic.pcapng):

Let’s collect all the UDP packets on port 53 and decrypt them:
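Once the raw frames have been read from the capture with the pcapng reader of your choice, filtering out the UDP payloads needs nothing beyond struct. The sketch assumes Ethernet framing without VLAN tags and IPv4; the function name is ours.

```python
import struct


def udp_payload(frame, port=53):
    """Return the UDP payload of an Ethernet/IPv4 frame on `port`, else None."""
    if len(frame) < 14 + 20 + 8:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:  # not IPv4
        return None
    if frame[14] >> 4 != 4 or frame[14 + 9] != 17:  # IP version, protocol = UDP
        return None
    udp = 14 + (frame[14] & 0x0F) * 4  # skip the variable-length IP header
    if len(frame) < udp + 8:
        return None
    sport, dport, length = struct.unpack_from("!HHH", frame, udp)
    if port not in (sport, dport):
        return None
    return frame[udp + 8:udp + length]  # UDP length includes the 8-byte header
```

Each returned payload can then be passed to the RC4 decryption routine.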

A test run produces promising results, but there are low-level hook control sequences that should be handled correctly:

Since the keylogger is rather limited in functionality and only records key down events, the best we can do is parse shift-downs and convert VKeys to readable chars:
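A possible shape for such a converter is shown below. It assumes the decrypted data has already been turned into a list of virtual-key codes, a US keyboard layout, and, because key-up events are missing, that a Shift press affects only the next key. These are simplifications of ours, so the output still needs a sanity check by eye.

```python
SHIFT_KEYS = {0x10, 0xA0, 0xA1}  # VK_SHIFT, VK_LSHIFT, VK_RSHIFT
PLAIN = {0x20: " ", 0x0D: "\n", 0xBD: "-", 0xDB: "[", 0xDD: "]"}
SHIFTED = {0xBD: "_", 0xDB: "{", 0xDD: "}"}
SHIFTED_DIGITS = ")!@#$%^&*("  # US layout, indexed by digit


def vkeys_to_text(vkeys):
    """Convert key-down virtual-key codes to text; Shift applies to the next key."""
    out = []
    shift = False
    for vk in vkeys:
        if vk in SHIFT_KEYS:
            shift = True
            continue
        if 0x41 <= vk <= 0x5A:  # letters
            ch = chr(vk) if shift else chr(vk).lower()
        elif 0x30 <= vk <= 0x39:  # digits
            ch = SHIFTED_DIGITS[vk - 0x30] if shift else chr(vk)
        elif shift and vk in SHIFTED:
            ch = SHIFTED[vk]
        else:
            ch = PLAIN.get(vk, f"<{vk:#04x}>")  # keep unknown keys visible
        out.append(ch)
        shift = False
    return "".join(out)
```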

As a result, we get the following output:

As we can see, the flag string does not exactly match the format (“SAS{}”) because of the lack of Shift events, so we need to figure out/brute-force the final value, which turns out to be all uppercase (a lack of underscore conversion allows us to guesstimate where Shift should have been held down):

The SAS CTF final competition

The SAS CTF doesn’t end with the last challenge of the qualifying phase. On October 22-25, the top eight teams head to Bali to face more interesting challenges. You can follow the Security Analyst Summit conference using the hashtag #TheSAS2024.
