Opinion

On the way to better testing

Have you ever found a false positive when uploading a file to a website like VirusTotal? Sometimes not just one scanner detects the file, but several. This leads to an absurd situation in which every product that doesn’t detect the file automatically looks bad to users who don’t realize they are simply looking at false positives.

Sadly, you will find the same situation in a lot of AV tests, especially in static on-demand tests where sometimes hundreds of thousands of samples are scanned. Naturally, validating such a huge number of samples requires a lot of resources, which is why most testers can only verify a subset of the files they use. What about the rest? The only way for them to classify the remaining files is to use a combination of source reputation and multi-scanning. This means that, as in the VirusTotal example above, every company that doesn’t detect samples that are detected by other companies will look bad – even if those samples are corrupted or absolutely clean.
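
To make the multi-scanner problem concrete, here is a minimal sketch (in Python) of how such consensus labelling might work, assuming a hypothetical tester that classifies unvalidated samples purely from multi-scanner output; the threshold and engine names are illustrative, not any real tester’s methodology:

```python
# Hypothetical consensus labelling of an unvalidated sample: the file is
# declared "malicious" purely because enough engines flag it. Threshold
# and engine names are illustrative placeholders.

def consensus_label(scanner_verdicts: dict, threshold: int = 5) -> str:
    """Label a sample from multi-scanner output alone, with no manual
    validation of the file itself."""
    detections = sum(1 for detected in scanner_verdicts.values() if detected)
    return "malicious" if detections >= threshold else "clean"

# A corrupted or perfectly clean file flagged by a handful of engines now
# counts as "ground truth" - and every product that correctly ignores it
# loses a point in the test.
verdicts = {"EngineA": True, "EngineB": True, "EngineC": True,
            "EngineD": True, "EngineE": True, "EngineF": False}
print(consensus_label(verdicts))  # -> malicious
```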

Since good test results are a key factor for AV companies, this has led to the rise of multi-scanner-based detection. AV vendors, including us, have been scanning suspicious files with each other’s scanners for years, and knowing what verdicts other vendors produce is genuinely useful: if 10 AV vendors detect a suspicious file as a Trojan downloader, that tells you where to start. But this is quite different from what we’re seeing now: driven by the need for good test results, the use of multi-scanner-based detection has increased a lot over the last few years. Of course, no one really likes this situation – in the end our task is to protect our users, not to hack test methodologies.

This is why a German computer magazine conducted an experiment, the results of which were presented at a security conference last October: they created a clean file, asked us to add a false detection for it and finally uploaded it to VirusTotal. Some months later the file was detected by more than 20 scanners on VirusTotal. After the presentation, representatives from several AV vendors at the event agreed that a solution should be found. However, multi-scanner-based detection is just the symptom – the root of the problem is the test methodology itself.

Unfortunately, there isn’t much AV companies can do about it, because in the end it’s magazines that order tests – and if they can choose between a cheap static on-demand test using an impressive-sounding 1 million samples (some of which are several months old) and an expensive dynamic test with fewer, but validated, zero-day samples, most magazines will choose the first option.

As I’ve mentioned above, AV companies as well as most testers are aware of this problem, and they aren’t too happy about it. Improving test methodologies was also the reason why, two years ago, a number of AV companies (including us), independent researchers and testers founded AMTSO (the Anti-Malware Testing Standards Organization). But in the end it’s the journalists who play the key role. This is why we decided to illustrate the problem during our recent press tour in Moscow, where we welcomed journalists from all around the world. Naturally, the goal was not to discredit any AV companies (you could also find examples where we detected a file because of a multi-scanner’s influence), but to highlight the negative effect of cheap static on-demand tests.

What we did pretty much replicated what the German computer magazine did last year, only with more samples. We created 20 clean files and added a fake detection for 10 of them. Over the next few days we re-uploaded all 20 files to VirusTotal to see what would happen. After ten days, each of the files we ‘detected’ (but which were not actually malicious) was flagged by up to 14 other AV companies – in some cases the false detection was probably the result of aggressive heuristics, but multi-scanning obviously influenced some of the results. We handed out all the samples to the journalists so they could test them for themselves. We were aware this might be a risky step: since our presentation also covered the question of intellectual property, there was a risk that journalists might focus on who copies from whom rather than on the main issue (multi-scanning being the symptom, not the root cause). But at the end of the day, it’s the journalists who have it in their power to order better tests, so we had to start somewhere.
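
For readers who want to repeat the tracking step themselves, here is a rough sketch of how the daily detection counts could be polled, assuming VirusTotal’s public v3 API (which postdates the original experiment); the API key and file hashes are placeholders:

```python
# Poll VirusTotal for the current number of engines flagging a file.
# Assumes the public v3 API; API key and hashes below are placeholders.

import requests

API_KEY = "YOUR_VT_API_KEY"                 # placeholder
SAMPLE_HASHES = ["<sha256-of-test-file>"]   # placeholder hashes

def detection_count(sha256: str) -> int:
    """Return how many engines currently flag the file as malicious."""
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{sha256}",
        headers={"x-apikey": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    return stats["malicious"]

# Running this once a day for each sample shows whether detections spread
# from the single seeded false positive to other engines over time.
for h in SAMPLE_HASHES:
    print(h, detection_count(h))
```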

So where should we go from here? The good news is that in the last few months, some testers have already started to work on new test methodologies. Instead of static on-demand scanning, they try to test the whole chain of detection components: anti-spam module -> in-the-cloud protection -> signature-based detection -> emulation -> behavior-based real-time analysis, etc. But ultimately, it’s up to the magazines to order this type of test and to abandon approaches that are simply outdated.
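
As an illustration of what scoring such a chain could look like, here is a minimal sketch; the layer names follow the article, but the stub functions are hypothetical placeholders rather than real detection components:

```python
# Score the whole protection chain rather than a single static scan:
# a product "wins" if any layer stops the threat. The stubs below are
# hypothetical placeholders, not real detection components.

def chain_blocks(sample: bytes, layers) -> str:
    """Run the sample through each layer in order and report the first
    one that stops it, or 'missed' if it gets through them all."""
    for name, layer in layers:
        if layer(sample):
            return name
    return "missed"

layers = [
    ("anti-spam module",          lambda s: False),
    ("in-the-cloud protection",   lambda s: False),
    ("signature-based detection", lambda s: b"KNOWN_PATTERN" in s),
    ("emulation",                 lambda s: False),
    ("behavior-based analysis",   lambda s: False),
]

print(chain_blocks(b"...KNOWN_PATTERN...", layers))  # -> signature-based detection
```

The point of such a test is that missing a sample in one layer does not count against the product as long as a later layer stops the threat.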

If we get rid of static on-demand tests with their mass of unvalidated samples, the copying of classifications will at least be significantly reduced, test results will correspond more closely to reality (even if that means saying goodbye to 99.x% detection rates) and in the end everyone will benefit: the press, the users and, of course, us as well.

