Research

Different x86 Bytecode Interpretations

Working on an efficient generic shellcode detection engine and verifying results with randomly generated input, I’ve effectively ended up fuzzing different open source disassembler libraries. The disassembler library of choice for my current project is libdasm because of its comparatively long history and public domain license. But writing a sound and complete x86 disassembler is obviously not a trivial task due to the complex nature of the x86 instruction set.

libdasm used to have issues correctly disassembling certain floating point instructions in the past, but this was simply caused by an off-by-three error in the opcode lookup tables (three NULL rows missing) and thus the fix was comparatively easy.

What I stumbled across today seems not to be a opcode specific issue but instead a bug in decoding instructions correctly. When libdasm disassembles instructions with a 16-bit address prefix, it decodes the address immediate wrong:

The instruction at the virtualized guest’s memory address 008fe0f0 is not decoded correctly:

  • 67 is the previously mentioned 16-bit address size prefix
  • a0 is the opcode for mov al, moffs8
  • 2232 is the 16-bit address that should be interpreted as the operand
  • e830 does not belong to this instruction

Just like you should always consult a second doctor about exotic diseases, I gave udis86, a different disassembler library, a shot:

Nice, the mov instruction got disassembled correctly this time. And since e830 is not interpreted as part of mov‘s immediate anymore, it now correctly disassembles as a call rel32 instruction. Unfortunately, udis86 is a x86-64 aware disassembler and internally sign-extends the operand to call, yet again giving incorrect disassembly.

So what does my CPU actually execute and see? Since this is part of a virtualization / emulation code anyway, we can simply add a cc breakpoint to the block’s prologue and step through it with gdb (omitting some junk):

So the CPU really sees a call instruction and tries to execute it. In this particular case, this would have been a devestating scenario as it would allow a privilegue escalation vulnerability for arbitrary user input, likely shellcode, to break out of the virtualization isolation. For this specific approach to work correctly, all control flow modifying instructions like call must be emulated in software. If we however do not see such an instruction in the disassembly, we cannot handle it correctly.

After patching libdasm (which turned out to ignore address size prefixes for operand parsing entirely), the disassembly is correct:

Lessons learned today:

  • Fuzzing your software with random input as a part of your testing process is always a good idea and like in this case can always reveal interesting vulnerabilities. Exploiting this particular case would have still been very hard, since the code segment descriptor and the data segment descriptors where pointing to different base addresses, but a skilled attacker could have succeeded nevertheless.
  • The public version of libdasm incorrectly disassembles all instructions with a address size override prefix. This will result in interesting attack vectors against some projects using libdasm. Look out for a patch for libdasm!

Different x86 Bytecode Interpretations

Your email address will not be published. Required fields are marked *

 

Reports

APT trends report Q2 2021

This is our latest summary of advanced persistent threat (APT) activity, focusing on significant events that we observed during Q2 2021: attacks against Microsoft Exchange servers, APT29 and APT31 activities, targeting campaigns, etc.

LuminousMoth APT: Sweeping attacks for the chosen few

We recently came across unusual APT activity that was detected in high volumes, albeit most likely aimed at a few targets of interest. Further analysis revealed that the actor, which we dubbed LuminousMoth, shows an affinity to the HoneyMyte group, otherwise known as Mustang Panda.

WildPressure targets the macOS platform

We found new malware samples used in WildPressure campaigns: newer version of the C++ Milum Trojan, a corresponding VBScript variant with the same version number, and a Python script working on both Windows and macOS.

Ferocious Kitten: 6 years of covert surveillance in Iran

Ferocious Kitten is an APT group that has been targeting Persian-speaking individuals in Iran. Some of the TTPs used by this threat actor are reminiscent of other groups, such as Domestic Kitten and Rampant Kitten. In this report we aim to provide more details on these findings.

Subscribe to our weekly e-mails

The hottest research right in your inbox