Opinion

On the way to better testing

Have you ever found a false positive when uploading a file to a website like VirusTotal? Sometimes not just one scanner detects the file, but several. This leads to an absurd situation in which every product that doesn’t detect the file automatically looks bad to users who don’t understand that the detections are just false positives.

Sadly, you will find the same situation in a lot of AV tests, especially in static on-demand tests where sometimes hundreds of thousands of samples are scanned. Naturally, validating such a huge number of samples requires a lot of resources, which is why most testers can only verify a subset of the files they use. What about the rest? The only way to classify the remaining files is to use a combination of source reputation and multi-scanning. This means that, as in the VirusTotal example above, every company that doesn’t detect samples detected by other companies will look bad – even if those samples are corrupted or perfectly clean.
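To make this concrete, here is a minimal sketch of how consensus-based labeling can go wrong. The threshold, scanner names and scan results are entirely hypothetical:

```python
# Minimal sketch of consensus-based sample labeling (hypothetical threshold
# and scan results): a sample is treated as malware once "enough" scanners
# flag it, regardless of whether anyone has actually analyzed the file.

def consensus_label(scan_results: dict, threshold: int = 5) -> str:
    """Label a sample 'malicious' if at least `threshold` scanners flag it."""
    detections = sum(1 for flagged in scan_results.values() if flagged)
    return "malicious" if detections >= threshold else "clean"

# Six scanners copy each other's false positive on a clean file...
results = {f"scanner_{i:02d}": i < 6 for i in range(20)}
print(consensus_label(results))  # -> "malicious"
# ...and every product that correctly ignores the file is now scored
# as having "missed" a piece of malware in the test.
```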

Since good test results are a key factor for AV companies, this has led to the rise of multi-scanner-based detection. Naturally, AV vendors, including us, have been scanning suspicious files with each other’s scanners for years now. Knowing what verdicts other AV vendors produce is obviously useful: if 10 AV vendors detect a suspicious file as a Trojan downloader, that tells you where to start. But this is certainly different from what we’re seeing now: driven by the need for good test results, the use of multi-scanner-based detection has increased a lot over the last few years. Of course, no one really likes this situation – in the end, our task is to protect our users, not to game test methodologies.

This is why a German computer magazine conducted an experiment, the results of which were presented at a security conference last October: they created a clean file, asked us to add a false detection for it, and finally uploaded it to VirusTotal. Some months later, this file was detected by more than 20 scanners on VirusTotal. After the presentation, representatives from several AV vendors at the event agreed that a solution should be found. However, multi-scanner-based detection is just the symptom – the root of the problem is the test methodology itself. Unfortunately, there isn’t much AV companies can do about it, because in the end it’s magazines that order tests – and if they can choose between a cheap static on-demand test using an impressive-sounding 1 million samples (some of which are several months old) and an expensive dynamic test with fewer, but validated, zero-day samples, most magazines will choose the first option.

As I’ve mentioned above, AV companies as well as most testers are aware of this problem, and they aren’t too happy about it. Improving test methodologies was also the reason why, two years ago, a number of AV companies (including us), independent researchers and testers founded AMTSO (the Anti-Malware Testing Standards Organization). But in the end, it’s the journalists who play the key role. This is why we decided to illustrate the problem during our recent press tour in Moscow, where we welcomed journalists from all around the world. Naturally, the goal was not to discredit any AV companies (you could also find examples where we detected a file because of the multi-scanner’s influence), but to highlight the negative effect of cheap static on-demand tests.

What we did pretty much replicated what the German computer magazine did last year, only with more samples. We created 20 clean files and added a fake detection for 10 of them. Over the next few days we re-uploaded all twenty files to VirusTotal to see what would happen. After ten days, all of our detected (but not actually malicious) files were flagged by up to 14 other AV companies – in some cases the false detection was probably the result of aggressive heuristics, but multi-scanning obviously influenced some of the results. We handed out all the samples used to the journalists so they could test them for themselves. We were aware this might be a risky step: since our presentation also covered the question of intellectual property, there was a risk that journalists might focus on who copies from whom rather than on the main issue (multi-scanning being the symptom, not the root cause). But at the end of the day, it’s the journalists who have it in their power to order better tests, so we had to start somewhere.
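For readers who want to observe this effect themselves today, the following is a rough sketch of polling VirusTotal’s public v3 file-report API for detection counts over time. The API key and file hashes are placeholders, this is not how our experiment was scripted, and a real script would need to respect the service’s rate limits:

```python
# Rough sketch: track how many engines flag a set of samples over time via
# VirusTotal's v3 file-report endpoint. API_KEY and SAMPLE_HASHES are
# placeholders, purely for illustration.
import time
import requests

API_KEY = "YOUR_VT_API_KEY"
SAMPLE_HASHES = ["<sha256-of-test-file>"]  # hashes of the uploaded files

def detection_count(sha256: str) -> int:
    """Return the number of engines currently flagging the file as malicious."""
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{sha256}",
        headers={"x-apikey": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["attributes"]["last_analysis_stats"]["malicious"]

for day in range(10):  # re-check once a day for ten days
    for h in SAMPLE_HASHES:
        print(f"day {day}: {h} flagged by {detection_count(h)} engines")
    time.sleep(24 * 60 * 60)
```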

So where should we go from here? The good news is that in the last few months, some testers have already started to work on new test methodologies. Instead of static on-demand scanning, they try to test the whole chain of detection components: anti-spam module -> in-the-cloud protection -> signature-based detection -> emulation -> behavior-based real-time analysis, etc. But ultimately, it’s up to the magazines to order this type of test and to abandon approaches that are simply outdated.
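As a sketch of what “testing the whole chain” means in practice, here is a hypothetical pipeline in which each layer gets a chance to block the sample before the next one runs. The layer names and verdict logic are stand-ins for illustration, not any vendor’s actual components:

```python
# Hypothetical sketch of a layered detection chain: a test that exercises the
# full pipeline credits a product whenever *any* layer stops the threat, not
# only when a static signature matches. All layers below are stand-ins.
from typing import Callable, Optional

Layer = Callable[[bytes], Optional[str]]  # returns a verdict, or None to pass

def run_chain(sample: bytes, layers: list) -> str:
    """Pass the sample through each protection layer until one fires."""
    for name, layer in layers:
        verdict = layer(sample)
        if verdict is not None:
            return f"blocked by {name}: {verdict}"
    return "not detected"

chain = [
    ("anti-spam module",    lambda s: None),                       # mail looked clean
    ("cloud reputation",    lambda s: None),                       # hash unknown
    ("signature scanner",   lambda s: None),                       # no static match
    ("emulator",            lambda s: None),                       # nothing suspicious in emulation
    ("behavioral analysis", lambda s: "drops and runs a binary"),  # fires at runtime
]
print(run_chain(b"sample-bytes", chain))
# -> blocked by behavioral analysis: drops and runs a binary
```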

If we get rid of static on-demand tests with their mass of unvalidated samples, the copying of classifications will at least be significantly reduced, test results will correspond more closely to reality (even if that means saying goodbye to 99.x% detection rates), and in the end everyone will benefit: the press, the users and, of course, us as well.
