The first document we’re going to have a look at is the Best Practices for Validation of Samples document.
Samples are obviously a crucial component of good testing. Other aspects matter too, such as properly configuring the products and correctly interpreting the (scan) results.
However, when we think about testing, samples come to mind first and foremost.
When we talk about validation, we strictly mean making sure that all samples in a set are functional. It does not cover either the relevance or the classification of a sample.
Why is validation important? Because non-loadable files pose no threat to the user. Therefore they don’t have to be detected, and in fact shouldn’t be.
Having non-loadable files in the set skews the test results. Let’s look at a theoretical example. We have a 100 KB network worm that is detected by AV products A, B and D, but not by C.
Now let’s look at what happens when the worm runs and tries to spread. While it is trying to infect a honeypot, the connection drops after only 80 KB of the file has been transferred, so the sample that gets captured is truncated.
If this truncated file ends up in the test set, the results suddenly look completely different:
- Product A still detects this file, as it has a signature in the first 80 KB of the original file.
- Product B no longer detects this file, as its signature relied on the last 20 KB of the file.
- Product C sees a mismatch between the information in the PE header and the actual file size and suddenly starts detecting this file heuristically.
- Product D no longer detects it, as its detection for this particular worm is emulation-based. With the worm no longer loadable, it can’t be emulated.
The document focuses primarily on the validation of PE (Portable Executable) files, as these make up the vast majority of today’s malware. Ideally, the sample handler actually executes each sample in a secure environment, since this gives the most accurate results. If that’s not possible, for example because of time or resource constraints, the document gives hints for statically checking whether a sample is loadable.
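To give an idea of what one such static check might look like, here is a minimal Python sketch. It is our own illustration, not taken from the document, and it assumes only the standard PE layout: the `MZ` header, the `e_lfanew` pointer at offset 0x3C, the `PE\0\0` signature and the section table. It flags a file as truncated when the section headers say raw data extends past the end of the file, which is what would happen to the 80 KB worm in the example above. The function names `check_pe_truncation` and `build_fake_pe` are hypothetical, and passing these checks does not prove that a sample will actually load.

```python
import struct
from typing import Tuple


def check_pe_truncation(data: bytes) -> Tuple[bool, str]:
    """Run a few static sanity checks on a PE file's bytes.

    Returns (looks_loadable, reason). Passing is necessary, not sufficient:
    a file can pass these checks and still fail to load.
    """
    # DOS header: starts with "MZ" and holds e_lfanew (offset of the PE header) at 0x3C
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False, "missing MZ header"
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)

    # "PE\0\0" signature (4 bytes) followed by the 20-byte COFF file header
    if pe_offset + 24 > len(data) or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return False, "missing PE signature"
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", data, pe_offset + 20)

    # The section table follows the optional header; each entry is 40 bytes
    section_table = pe_offset + 24 + opt_header_size
    if section_table + 40 * num_sections > len(data):
        return False, "section table extends past end of file"

    # The on-disk data ends at the furthest PointerToRawData + SizeOfRawData
    expected_end = 0
    for i in range(num_sections):
        entry = section_table + 40 * i
        raw_size, raw_ptr = struct.unpack_from("<II", data, entry + 16)
        if raw_size:  # sections without raw data (e.g. uninitialized data) are skipped
            expected_end = max(expected_end, raw_ptr + raw_size)
    if expected_end > len(data):
        return False, f"truncated: headers expect {expected_end} bytes, file has {len(data)}"
    return True, "passed basic static checks"


def build_fake_pe(raw_size: int) -> bytes:
    """Hypothetical test helper: a synthetic one-section PE layout.

    It only contains the fields checked above and is NOT a runnable program.
    """
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE header directly after the DOS header
    # Machine=0x14C (i386), 1 section, no symbols, no optional header
    coff = b"PE\x00\x00" + struct.pack("<HHIIIHH", 0x14C, 1, 0, 0, 0, 0, 0)
    # Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData, then zeros
    section = struct.pack("<8sIIIIIIHHI", b".text", raw_size, 0x1000, raw_size, 0x200, 0, 0, 0, 0, 0)
    headers = bytes(dos) + coff + section
    return headers.ljust(0x200, b"\x00") + b"\xcc" * raw_size


if __name__ == "__main__":
    full = build_fake_pe(100 * 1024 - 0x200)  # a 100 KB "worm"
    truncated = full[: 80 * 1024]              # connection dropped after 80 KB
    print(check_pe_truncation(full))       # (True, 'passed basic static checks')
    print(check_pe_truncation(truncated))  # (False, 'truncated: headers expect 102400 bytes, file has 81920')
```

A real validation pipeline would likely need many more checks (optional header fields, file alignment, imports and so on), which is one reason why executing samples in a secure environment remains the preferred approach.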
You can find all of the published documents here.