The Quest for Identification
In my previous blogpost about the Duqu Framework, I described one of the biggest remaining mysteries about Duqu – the oddities of the C&C communications module which appears to have been written in a different language than the rest of the Duqu code. As technical experts, we found this question very interesting and puzzling and we wanted to share it with the community.
The feedback we received exceeded our wildest expectations. We got more than 200 comments and 60+ e-mail messages with suggestions about possible languages and frameworks that could have been used for generating the Duqu Framework code. We would like to say a big 'Thank you!' to everyone who participated in this quest to help us identify the mysterious code.
Let us review the most popular suggestions we got from you:
- Variants of LISP
- Google Go
- OO C
- Old compilers for C++ and other languages
Thanks to some very useful and knowledgeable comments, we can now say with a high degree of certainty that we have found the correct answer. I would like to quote the most relevant comments which helped us solve the puzzle:
Simple Object Orientation (for C)
It seems someone over at reddit (http://www.reddit.com/r/ReverseEngineering/) hit the jackpot: the code snippets look _very_ similar to what this would produce: http://daifukkat.su/wiki/index.php/SOO
There are a few other OO frameworks for C, but they don't match as well: http://ooc-coding.sourceforge.net/ http://sooc.sourceforge.net/
Re: Other C/C++ compiler?
I have seen how GCC works internally and its ABI (for a number of different versions) and I can confirm that the Duqu code is definitely not generated by GCC. I don't know how other C++ compilers work but the things I see in the ASM (like where the pointers to the functions go, the way the "this" pointer is passed etc) do not suggest C++ to me but something else entirely. (such as the aforementioned "object-oriented" frameworks for C that exist)
Re: Other C/C++ compiler?
I'm 99% sure the machine code was generated by MSVC. It's something you get a feel with experience, but I can point out two things that are quite characteristic of MSVC: 1) it uses esi as the first candidate for temporary storage; 2) "pop ecx" instead of "add esp, 4".
We also received two very interesting e-mail messages. Pascal Bertrand aka bps and another author who preferred to remain anonymous suggested that the code was generated from a custom object-oriented C dialect, generally called "OO C".
The comments were very important because they allowed us to track down the exact compiler used in the project: the Microsoft Visual Studio compiler. I spent more time experimenting with different MSVC versions, source code variants and compiler options, trying to reproduce the binary code of the constructor function mentioned in the previous blogpost, and finally succeeded.
Disassembly of the original Duqu code: construction of the linked list class
Manually decompiled C code that produces the same code
The above C code, when compiled with MSVC 2008 and the options /O1 (minimize size) and /Ob1 (expand only __inline), produces opcodes identical to those in the Duqu binary. Changing the order of operations or of the if/else blocks modifies the resulting code, and the MSVC 2005 compiler produces slightly different code, too. So, we can say with a high degree of certainty that the resulting binary was compiled with MSVC 2008 and options /O1 /Ob1 and that the input source code was pure C.
So, what does that mean? In short, there are two very probable answers to our initial question:
- The code was written using a custom OO C framework, based on macros or custom preprocessor directives. This was suggested by your comments, because it is the most common way to combine object-oriented programming with C.
- All the code was written in OO C manually, without any extensions to the language. We can't rule out this possibility completely because, technically, it is nearly impossible to distinguish code generated from macro directives from manually copy-pasted code.
Judging by the amount of similar-looking code in every constructor and member function, we can assume that source code preprocessing was used and that variant 1 is closer to the truth.
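To illustrate what a macro-based OO C "class" can look like, here is a minimal sketch of a linked list with a constructor. It is not the Duqu source and does not reproduce any published framework: the CLASS and METHOD macros, the List and Node types and all function names are hypothetical, and keeping the function pointers directly inside each instance is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper macros, invented for this sketch. */
#define CLASS(name) typedef struct name name; struct name
#define METHOD(ret, name, ...) ret (*name)(__VA_ARGS__)

typedef struct Node {
    struct Node *next;
    int value;
} Node;

CLASS(List) {
    /* Assumption: "methods" are plain function pointers stored per instance. */
    METHOD(void, push, List *self, int value);
    METHOD(int, count, const List *self);
    METHOD(void, destroy, List *self);
    Node *head;
    int size;
};

/* Prepend a value; silently does nothing if allocation fails. */
static void list_push(List *self, int value)
{
    Node *n;
    assert(self != NULL);
    n = malloc(sizeof *n);
    if (n == NULL)
        return;
    n->value = value;
    n->next = self->head;
    self->head = n;
    self->size++;
}

static int list_count(const List *self)
{
    assert(self != NULL);
    return self->size;
}

/* No automatic memory management: every node is freed by hand. */
static void list_destroy(List *self)
{
    Node *n;
    assert(self != NULL);
    n = self->head;
    while (n != NULL) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    free(self);
}

/* The "constructor": allocate the object and wire up its methods. */
List *list_new(void)
{
    List *self = malloc(sizeof *self);
    if (self == NULL)
        return NULL;
    self->push = list_push;
    self->count = list_count;
    self->destroy = list_destroy;
    self->head = NULL;
    self->size = 0;
    return self;
}
```

A caller would write list_new(), then l->push(l, 1), l->count(l) and finally l->destroy(l), passing the object explicitly where C++ would pass "this" implicitly. Every class written this way repeats the same wiring in its constructor, which is the kind of repetition that points towards preprocessing rather than manual typing.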
Now, there are several open-source "OO C" frameworks available, and some of them produce code constructions that are very similar to those in the Duqu code. The best match we found is SOO (Simple Object Orientation for C). However, it could not have been used in Duqu, because it was only published when the Trojan was already in the wild.
No matter which of these two variants is true, the implications are impressive. The Payload DLL contains 95 Kbytes of event-driven code written in OO C, a language that has no automatic memory management or safe pointers. This kind of programming is more commonly found in complex 'civil' software projects than in contemporary malware. Additionally, the whole event-driven architecture must have been developed as a part of the Duqu code or its OO C extension.
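To give a sense of what event-driven code means in plain C, here is a small sketch of a dispatcher that routes events to registered callbacks. It is purely illustrative and says nothing about how the Duqu architecture is actually laid out: the Dispatcher, Event and Handler types and every function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Event {
    int type;      /* hypothetical event identifier */
    void *data;    /* payload owned by the caller */
} Event;

typedef void (*Handler)(void *ctx, const Event *ev);

typedef struct Subscription {
    int type;
    Handler fn;
    void *ctx;
    struct Subscription *next;
} Subscription;

typedef struct Dispatcher {
    Subscription *subs;
} Dispatcher;

Dispatcher *dispatcher_new(void)
{
    Dispatcher *d = malloc(sizeof *d);
    if (d != NULL)
        d->subs = NULL;
    return d;
}

/* Register a callback for one event type; returns 0 on success, -1 on failure. */
int dispatcher_subscribe(Dispatcher *d, int type, Handler fn, void *ctx)
{
    Subscription *s;
    assert(d != NULL && fn != NULL);
    s = malloc(sizeof *s);
    if (s == NULL)
        return -1;
    s->type = type;
    s->fn = fn;
    s->ctx = ctx;
    s->next = d->subs;
    d->subs = s;
    return 0;
}

/* Call every handler subscribed to ev->type; returns how many were called. */
int dispatcher_emit(Dispatcher *d, const Event *ev)
{
    Subscription *s;
    int called = 0;
    assert(d != NULL && ev != NULL);
    for (s = d->subs; s != NULL; s = s->next) {
        if (s->type == ev->type) {
            s->fn(s->ctx, ev);
            called++;
        }
    }
    return called;
}

/* Manual cleanup: nothing is released unless the programmer does it. */
void dispatcher_free(Dispatcher *d)
{
    Subscription *s;
    assert(d != NULL);
    s = d->subs;
    while (s != NULL) {
        Subscription *next = s->next;
        free(s);
        s = next;
    }
    free(d);
}

/* Example handler: counts events through the context pointer. */
void counting_handler(void *ctx, const Event *ev)
{
    (void)ev;
    ++*(int *)ctx;
}
```

Even in this toy version, object lifetimes, context pointers and handler lists all have to be tracked by hand, which hints at the discipline needed to maintain a much larger event-driven code base without language support.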
There is no easy explanation why OO C was used instead of C++. However, we have seen similar cases in the past. We have spoken to some of the people who prefer such techniques, and they gave two main reasons for it:
- They don't trust C++ compilers; these are usually people who started programming in the old days, when assembler was the top choice. C was a direct evolutionary step over assembler and quickly became a standard. When C++ was published, many old school programmers preferred to stay away from it because of distrust in memory allocation and other obscure language features which cause indirect execution of code (for instance, constructors).
- Extreme portability. Once again, in the old days (10-12 years ago) C++ was not entirely standardized and it was possible to have C++ code that would compile with MSVC but would not compile with (say) Watcom C++. If you wanted to go for extreme portability and target every existing platform out there, you'd go with C.
Both reasons appear to indicate that the code was written by a team of experienced, "old-school" developers.
Conclusions
- The Duqu Framework consists of "C" code compiled with MSVC 2008 using the special options "/O1" and "/Ob1"
- The code was most likely written with a custom extension to C, generally called "OO C"
- The event-driven architecture was developed as a part of the Duqu Framework or its OO C extension
- The C&C code could have been reused from an already existing software project and integrated into the Duqu trojan
All the conclusions above indicate a rather professional team of developers, which appears to be reusing older code written by top "old school" developers. Such techniques are normally seen in professional software and almost never in today's malware. Once again, these indicate that Duqu, just like Stuxnet, is a "one of a kind" piece of malware which stands out like a gem from the large mass of "dumb" malicious programs we normally see.