On the way to better testing

Have you ever found a false positive when uploading a file to a website like VirusTotal? Sometimes it happens that not just one scanner detects the file, but several. This leads to an absurd situation where every product which doesn’t detect this file automatically looks bad to users who don’t understand that it’s just false positives.

Sadly you will find the same situation in a lot of AV tests, especially in static on-demand tests where sometimes hundreds of thousands of samples are scanned. Naturally, validating such a huge number of samples requires a lot of resources. That’s why most testers can only verify a subset of the files they use. What about the rest? The only way for them to classify the rest of their files is to use a combination of source reputation and multi-scanning. This means that, like in the VirusTotal example above, every company that doesn’t detect samples that are detected by other companies will look bad – even if the samples are corrupted or absolutely clean.
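To make the effect concrete, here is a minimal sketch of multi-scanner classification. It assumes a simple rule in which a sample counts as malicious once a fixed number of scanners flag it. The threshold, the scanner names and the sample names are all hypothetical and only serve as an illustration; real testers combine this with source reputation and other checks.

```python
from typing import Dict, Set


def consensus_labels(verdicts: Dict[str, Set[str]], threshold: int) -> Dict[str, bool]:
    """Label a sample as malicious when at least `threshold` scanners flag it."""
    return {sample: len(flagged_by) >= threshold
            for sample, flagged_by in verdicts.items()}


def detection_rate(product: str,
                   verdicts: Dict[str, Set[str]],
                   labels: Dict[str, bool]) -> float:
    """Share of consensus-'malicious' samples that `product` flags."""
    malicious = [sample for sample, is_malicious in labels.items() if is_malicious]
    if not malicious:
        return 0.0
    hits = sum(1 for sample in malicious if product in verdicts[sample])
    return hits / len(malicious)


# Hypothetical data: sample name -> set of scanners that flag it.
verdicts = {
    "real_trojan.exe": {"A", "B", "C", "D"},
    # Actually clean: one false positive that two other scanners picked up.
    "clean_tool.exe": {"A", "B", "C"},
}

labels = consensus_labels(verdicts, threshold=3)
rate_a = detection_rate("A", verdicts, labels)
rate_d = detection_rate("D", verdicts, labels)
print(f"A: {rate_a:.0%}, D: {rate_d:.0%}")
```

In this toy data set, scanner D is the only product that handles the clean file correctly, yet it ends up with the lowest score because the unvalidated file is counted as malware.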

Since good test results are a key factor for AV companies, this has led to the rise of multi-scanner based detection. Naturally AV vendors, including us, have been scanning suspicious files with each other’s scanners for years now. Obviously, knowing what verdicts are produced by other AV vendors is useful. For instance, if 10 AV vendors detect a suspicious file as being a Trojan downloader, this helps you know where to start. But this is certainly different from what we’re seeing now: driven by the need for good test results, the use of multi-scanner based detection has increased a lot over the last few years. Of course no one really likes this situation – in the end our task is to protect our users, not to hack test methodologies.

This is why a German computer magazine conducted an experiment, and the results of this experiment were presented at a security conference last October: they created a clean file, asked us to add a false detection for it and finally uploaded it to VirusTotal. Some months later this file was detected by more than 20 scanners on VirusTotal. After the presentation, representatives from several AV vendors at the event agreed that a solution should be found. However, multi-scanner based detection is just the symptom – the root of the problem is the test methodology itself. Unfortunately there isn’t much AV companies can do about it, because in the end it’s magazines that order tests – and if they can choose between a cheap static on-demand test using an impressive-sounding 1 million samples (some of which are several months old) or an expensive dynamic test with fewer, but validated, zero-day samples, most magazines will choose the first option.

As I’ve mentioned above, AV companies as well as most testers are aware of this problem, and they aren’t too happy about it. Improving test methodologies was also the reason why, two years ago, a number of AV companies (including us), independent researchers and testers founded AMTSO (Anti-Malware Testing Standards Organization). But in the end it’s the journalists who play the key role. This is why we decided to illustrate the problem during our recent press tour in Moscow, where we welcomed journalists from all around the world. Naturally the goal was not to discredit any AV companies (you could also find examples where we detected a file because of the multi-scanner’s influence), but to highlight the negative effect of cheap static on-demand tests.

What we did pretty much replicated what the German computer magazine did last year, only with more samples. We created 20 clean files and added a fake detection for 10 of them. Over the next few days we re-uploaded all 20 files to VirusTotal to see what would happen. After ten days, all of our detected (but not actually malicious) files were detected by up to 14 other AV companies – in some cases the false detection was probably the result of aggressive heuristics, but multi-scanning obviously influenced some of the results. We handed out all the samples used to the journalists so they could test them for themselves. We were aware this might be a risky step: since our presentation also covered the question of intellectual property, there was a risk that journalists might focus on who copies from whom, rather than on the main issue (multi-scanning being the symptom, not the root cause). But at the end of the day, it’s the journalists who have it in their power to order better tests, so we had to start somewhere.

So where should we go from here? The good news is that in the last few months, some testers have already started to work on new test methodologies. Instead of static on-demand scanning they try to test the whole chain of detection components: anti-spam module -> in-the-cloud protection -> signature-based detection -> emulation -> behavior-based real-time analysis, etc. But ultimately, it’s up to the magazines to order this type of test and to abandon approaches that are simply outdated.

If we get rid of static on-demand tests with their mass of unvalidated samples, the copying of classifications will at least be significantly reduced, test results will correspond more closely to reality (even if that means saying goodbye to 99.x% detection rates) and in the end everyone will benefit: the press, the users and of course us as well.
