Different x86 Bytecode Interpretations

22 Jul 2010

Authors

Georg Wicherski

Working on an efficient generic shellcode detection engine and verifying results with randomly generated input, I’ve effectively ended up fuzzing different open source disassembler libraries. The disassembler library of choice for my current project is libdasm because of its comparatively long history and public domain license. But writing a sound and complete x86 disassembler is obviously not a trivial task due to the complex nature of the x86 instruction set.

libdasm used to disassemble certain floating-point instructions incorrectly, but this was simply caused by an off-by-three error in the opcode lookup tables (three NULL rows were missing), so the fix was comparatively easy.

What I stumbled across today seems to be not an opcode-specific issue but a bug in instruction decoding itself. When libdasm disassembles an instruction with a 16-bit address-size prefix, it decodes the address operand incorrectly:

[~] Verifying shellcode candidate offset 8eb0f0
  008fe0f0[    67a02232e830] > mov al,[0x30e83222]
  008fe0f6[              61] > popa
  008fe0f7[              f9] > stc
  008fe0f8[          ff4038] > inc [eax+0x38]
  008fe0fb[            b269] > mov dl,0x69
  008fe0fd[              52] > push edx
  008fe0fe[              3f] > aas
  008fe0ff[              5e] > pop esi
  008fe100[    1a3dc31168aa] > sbb bh,[0xaa6811c3]
  008fe106[              59] > pop ecx
  008fe107[              9c] > pushf
  008fe108[................] <

The instruction at the virtualized guest’s memory address 008fe0f0 is not decoded correctly:

67 is the previously mentioned 16-bit address size prefix
a0 is the opcode for mov al, moffs8
2232 is the 16-bit address (0x3222, stored little-endian) that should be interpreted as the operand
e830 does not belong to this instruction
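The byte breakdown above can be reproduced with a minimal sketch. It handles only the `[67] a0 moffs` encoding in 32-bit mode; the function name and the `honor_addr_prefix` switch are hypothetical and merely mimic the two behaviours, they are not taken from libdasm.

```python
def decode_mov_al_moffs8(code, honor_addr_prefix=True):
    """Decode `[67] a0 moffs` at the start of `code`, assuming 32-bit mode.

    Returns (instruction_length, address). With honor_addr_prefix=False the
    0x67 prefix is skipped but the address is still read as 32 bits, which
    mimics the faulty decoding described above.
    """
    pos = 0
    addr_size = 4                      # default address size in 32-bit mode
    if code[pos] == 0x67:              # address-size override prefix
        pos += 1
        if honor_addr_prefix:
            addr_size = 2              # moffs shrinks to 16 bits
    if code[pos] != 0xA0:              # opcode for mov al, moffs8
        raise ValueError("not a mov al, moffs8 encoding")
    pos += 1
    address = int.from_bytes(code[pos:pos + addr_size], "little")
    return pos + addr_size, address


raw = bytes.fromhex("67a02232e830")
for honor in (True, False):
    length, address = decode_mov_al_moffs8(raw, honor)
    print(f"honor_prefix={honor}: length={length} address={address:#x}")
```

With the prefix honoured, the instruction is four bytes long and e830 is left over as the start of the next instruction; with the prefix ignored, those two bytes are swallowed into the address.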

Just like you should always consult a second doctor about exotic diseases, I gave udis86, a different disassembler library, a shot:

$ udcli -noff -32 -s `python -c 'print 0x8eb0f0'` -c 10 shellcode/urandom.bin 
67a02232         a16 mov al, [0x3222]    
e83061f9ff       call 0xfffffffffff96139 
40               inc eax

Nice, the mov instruction got disassembled correctly this time. And since e830 is no longer interpreted as part of mov's address operand, the bytes that follow now correctly disassemble as a call rel32 instruction. Unfortunately, udis86 is an x86-64-aware disassembler and internally sign-extends the operand to call to 64 bits, so the displayed target is once again incorrect.

So what does my CPU actually see and execute? Since this is part of virtualization / emulation code anyway, we can simply add a cc (int3) breakpoint to the block's prologue and step through it with gdb (omitting some junk):
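To see where the extra f digits come from, here is a small sketch of the call rel32 target arithmetic. The helper name and the `addr_bits` parameter are hypothetical. It also assumes udcli counted addresses from 0 at the skipped-to position, which is consistent with the target it printed.

```python
import struct


def call_rel32_target(insn_addr, code, addr_bits=32):
    """Target of an `e8 rel32` call located at insn_addr.

    The displacement is relative to the address of the next instruction,
    and the result wraps at addr_bits.
    """
    if len(code) < 5 or code[0] != 0xE8:
        raise ValueError("not a call rel32 encoding")
    (rel,) = struct.unpack_from("<i", code, 1)   # signed 32-bit displacement
    next_insn = insn_addr + 5                    # 1 opcode byte + 4 displacement bytes
    return (next_insn + rel) & ((1 << addr_bits) - 1)


call = bytes.fromhex("e83061f9ff")
# The call follows the 4-byte mov, so it sits at offset 4 in the udcli listing.
print(hex(call_rel32_target(0x4, call)))        # wrapped to 32 bits
print(hex(call_rel32_target(0x4, call, 64)))    # wrapped to 64 bits, as udcli printed it
```

Wrapping the same sum at 32 bits instead of 64 is the only difference between the two results.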

Program received signal SIGTRAP, Trace/breakpoint trap.
(gdb) disas $eip, $eip+5
=> 0x0804b0c1:  jmp    0x804b134
(gdb) si
(gdb) disas $eip, $eip+10
Dump of assembler code from 0x804b134 to 0x804b13e:
=> 0x0804b134:  addr16 mov 0x3222,%al
   0x0804b138:  call   0x7fe126d
   0x0804b13d:  inc    %eax
End of assembler dump.
(gdb) si
(gdb) si
(gdb) disas $eip, $eip+10
Dump of assembler code from 0x7fe126d to 0x7fe1277:
=> 0x07fe126d:  Cannot access memory at address 0x7fe126d

So the CPU really sees a call instruction and tries to execute it. In this particular case, this would have been a devastating scenario: it amounts to a privilege escalation vulnerability that lets arbitrary user input, likely shellcode, break out of the virtualization isolation. For this specific approach to work correctly, all control-flow-modifying instructions such as call must be emulated in software. If, however, we do not see such an instruction in the disassembly, we cannot handle it correctly.

After patching libdasm (which turned out to ignore address size prefixes for operand parsing entirely), the disassembly is correct:

[*] 543 shellcode candidate offsets
[~] Verifying shellcode candidate offset 8eb0f0
  008fe0f0[        67a02232] > mov al,[0x3222]
  008fe0f4[................] <
Emulating 008fe0f4: call 0x894229
Emulating CALL instruction from 8fe0f9.

Lessons learned today:

Fuzzing your software with random input as part of your testing process is always a good idea and, as in this case, can reveal interesting vulnerabilities. Exploiting this particular case would still have been very hard, since the code segment descriptor and the data segment descriptors were pointing to different base addresses, but a skilled attacker could have succeeded nevertheless.
The public version of libdasm incorrectly disassembles all instructions with an address size override prefix. This will result in interesting attack vectors against some projects using libdasm. Look out for a patch for libdasm!
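The random-input testing from the first lesson boils down to differential fuzzing: feed the same bytes to two decoders and flag every input on which they disagree. The sketch below is only an illustration of that idea. Both decoders are toy stand-ins written for this example (they know nothing but `[67] a0 moffs`), not bindings to libdasm or udis86. The input generator is deliberately biased towards the interesting bytes so the toy finds hits quickly.

```python
import random

ADDR16_PREFIX = 0x67
MOV_AL_MOFFS8 = 0xA0


def length_prefix_aware(code):
    """Length of a leading `mov al, moffs8` in 32-bit mode, else None."""
    pos, addr_bytes = 0, 4
    if code[:1] == bytes([ADDR16_PREFIX]):
        pos, addr_bytes = 1, 2          # prefix shrinks the address to 16 bits
    if code[pos:pos + 1] != bytes([MOV_AL_MOFFS8]):
        return None
    return pos + 1 + addr_bytes


def length_prefix_blind(code):
    """Same as above, but always consumes a 32-bit address (the bug)."""
    pos = 1 if code[:1] == bytes([ADDR16_PREFIX]) else 0
    if code[pos:pos + 1] != bytes([MOV_AL_MOFFS8]):
        return None
    return pos + 1 + 4


def differential_fuzz(decoder_a, decoder_b, rounds, seed=0):
    """Return every generated sample on which the two decoders disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(rounds):
        # Assumption for this toy: bias the first two bytes towards the
        # prefix and opcode of interest; the rest is uniformly random.
        head = bytes(
            rng.choice((ADDR16_PREFIX, MOV_AL_MOFFS8, rng.randrange(256)))
            for _ in range(2)
        )
        sample = head + bytes(rng.randrange(256) for _ in range(6))
        if decoder_a(sample) != decoder_b(sample):
            mismatches.append(sample)
    return mismatches


hits = differential_fuzz(length_prefix_aware, length_prefix_blind, rounds=1000, seed=1)
print(len(hits), "disagreements, first:", hits[0].hex() if hits else None)
```

Comparing only instruction lengths is enough to catch this class of bug, because a wrong length desynchronizes everything decoded afterwards.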
