On the way to better testing

01 Feb 2010


Authors

Magnus Kalkuhl

Have you ever come across a false positive when uploading a file to a website like VirusTotal? Sometimes not just one scanner detects the file, but several. This leads to an absurd situation in which every product that doesn't detect the file automatically looks bad to users who don't realize that the detections are simply false positives.

Sadly, you will find the same situation in a lot of AV tests, especially static on-demand tests in which hundreds of thousands of samples are sometimes scanned. Naturally, validating such a huge number of samples requires a lot of resources, which is why most testers can only verify a subset of the files they use. What about the rest? The only way for them to classify the remaining files is to combine source reputation with multi-scanning. This means that, as in the VirusTotal example above, every company that doesn't detect samples detected by other companies will look bad, even if those samples are corrupted or completely clean.

Since good test results are a key factor for AV companies, this has led to the rise of multi-scanner-based detection. AV vendors, including us, have of course been scanning suspicious files with each other's scanners for years. Knowing what verdicts other AV vendors produce is obviously useful: if 10 AV vendors detect a suspicious file as a Trojan downloader, for instance, that tells you where to start your analysis. But this is very different from what we're seeing now: driven by the need for good test results, the use of multi-scanner-based detection has increased a lot over the last few years. Of course, no one really likes this situation. In the end, our task is to protect our users, not to hack test methodologies.

To demonstrate this, a German computer magazine conducted an experiment whose results were presented at a security conference last October: they created a clean file, asked us to add a false detection for it and then uploaded it to VirusTotal. Some months later, the file was detected by more than 20 scanners on VirusTotal. After the presentation, representatives from several AV vendors at the event agreed that a solution should be found. However, multi-scanner-based detection is just the symptom; the root of the problem is the test methodology itself. Unfortunately, there isn't much AV companies can do about it, because in the end it's magazines that order tests. If they can choose between a cheap static on-demand test using an impressive-sounding 1 million samples (some of them several months old) and an expensive dynamic test with fewer, but validated, zero-day samples, most magazines will choose the first option.

As I mentioned above, AV companies and most testers are aware of this problem, and they aren't too happy about it. Improving test methodologies was also the reason why, two years ago, a number of AV companies (including us), independent researchers and testers founded AMTSO (Anti-Malware Testing Standards Organization). But in the end, it's the journalists who play the key role. This is why we decided to illustrate the problem during our recent press tour in Moscow, where we welcomed journalists from all around the world. Naturally, the goal was not to discredit any AV company (you could also find examples where we detected a file because of the multi-scanner's influence), but to highlight the negative effect of cheap static on-demand tests.

Our experiment essentially replicated what the German computer magazine did last year, only with more samples. We created 20 clean files and added a fake detection for 10 of them. Over the next few days, we re-uploaded all 20 files to VirusTotal to see what would happen. After ten days, all of the files we had detected (none of which were actually malicious) were detected by up to 14 other AV companies. In some cases the false detection was probably the result of aggressive heuristics, but multi-scanning obviously influenced some of the results. We handed out all the samples we used to the journalists so they could test them for themselves. We knew this might be a risky step: since our presentation also covered the question of intellectual property, journalists might focus on who copies from whom rather than on the main issue (multi-scanning being the symptom, not the root cause). But at the end of the day, it's the journalists who have the power to order better tests, so we had to start somewhere.

So where should we go from here? The good news is that in the last few months, some testers have already started to work on new test methodologies. Instead of static on-demand scanning, they try to test the whole chain of detection components: anti-spam module -> in-the-cloud protection -> signature-based detection -> emulation -> behavior-based real-time analysis, etc. But ultimately, it's up to the magazines to order this type of test and to abandon approaches that are simply outdated.

If we get rid of static on-demand tests with their mass of unvalidated samples, the copying of classifications will at least be significantly reduced, and test results will correspond more closely to reality (even if that means saying goodbye to 99.x% detection rates). In the end, everyone will benefit: the press, the users and, of course, us as well.
