GReAT’s distributed YARA scanner
While doing threat research, teams need a lot of tools and systems to aid their hunting efforts – from systems storing Passive DNS data and automated malware classification to systems allowing researchers to pattern-match a large volume of data in a relatively short period of time. These tools are extremely useful when working on APT campaigns where research is very agile and spans multiple months. One of the most frequently used tools for hunting new variants of malware is called YARA and was developed by Victor Manuel Alvarez while working for VirusTotal, now part of Alphabet.
In R&D we use a lot of open-source projects and we believe giving back to the community is our way of saying ‘Thank you’. More and more security companies are releasing their open-source projects and we would like to contribute with our distributed YARA scanner.
What is YARA?
YARA is defined as “a tool aimed at (but not limited to) helping malware researchers to identify and classify malware samples”. In other words, it is a pattern-matching tool, but on steroids. It can support complex matching rules as well as searching files with specific metadata (for example, it can search all files that use a certificate containing the string “Microsoft Corporation” but is not signed by “Microsoft”).
How can YARA help you find the next APT in your network?
YARA’s usefulness is amazing, especially given traditional protection measures are no longer enough in today’s complex threat landscape. Modern protection systems, combined with constant network monitoring and incident response have to be deployed in order to successfully protect equipment.
Protective measures that were effective yesterday don’t guarantee the same level of security tomorrow. Indicators of compromise (IoCs) can help you search for footprints of known malware or for an active infection. But serious threat actors have started to tailor their tools to fit each victim, thus making IoCs much less effective. Good YARA detection rules still allow analysts to find malware, exploits and 0-days which couldn’t be found any other way. The rules can be deployed in networks and on various multi-scanner systems.
That’s why, as part of our Threat Intelligence services, we offer a range of training courses, one of them being our world-famous YARA Training, held by our GReAT ninjas: Costin Raiu, Vitaly Kamluk and Sergey Mineev.
Finding exploits in the wild
One of the most remarkable cases in which Kaspersky Lab’s GReAT used YARA was the much publicized Silverlight 0-day. The team started hunting for it after Hacking Team, the Italian company selling “legal surveillance tools” for governments and LEAs, was hacked. One of the stories in the media attracted our researchers’ attention — according to the article, a programmer offered to sell Hacking Team a Silverlight 0-day, an exploit for an obsolete Microsoft plug-in which at one time had been installed on a huge number of computers.
GReAT decided to create a YARA rule based on this programmer’s older, publicly available proof-of-concept exploits. Our researchers found that he had a very particular style when coding the exploits, using very specific comments, shell code and function names. All of this unique information was used to write a YARA rule — the experts set it to carry out a clear task, basically saying “Go and hunt for any piece of malware that shows the characteristics described in the rule”. Eventually it caught a new sample, a 0-day, and the team immediately reported it to Microsoft.
KLara, GReAT’s distributed YARA scanner
As mentioned above, any team carrying out threat intelligence needs to have powerful tools in their arsenal in order to find the latest threats and detect attacks as soon as possible. Within our R&D department we have built a lot of tools internally, but we believe most progress is made when useful tools are shared with the community. As such, we are releasing our internal tool for running YARA rules over a large set of data (malware/virus collections).
What is KLara?
In order to hunt efficiently for malware, you need a large collection of samples to search through. Researchers usually need to fire a YARA rule over a collection/set of malicious files and then get the results back. In some cases, the rule needs adjusting. Unfortunately, scanning a large collection of files takes time. However, if a custom architecture is used instead, scanning 10TB of files can take around 30 minutes. Of course, if there are multiple YARA rules that need to be run simultaneously, it’s important the system is also distributed. And this is where KLara comes in. KLara is a distributed system written in Python, allowing researchers to scan one or more YARA rules over collections with samples, getting notifications by email and in the web interface when the scan results are ready. Systems like KLara are important when large collections of data are involved. Of course, researchers will have their own small virus collections on their computers in order to make sure their YARA rules are sound, but when searching for viruses in the wild, this task requires a lot of processing power and this can only be achieved with a cloud system.
Why is it important to have a distributed YARA scanner?
Attacks using APTs are extremely dangerous, regardless of whether the target belongs to the public, private or government sector. From our experience, constant monitoring of logs, netflow, alerts and any suspicious files helps mitigate an attack during reconnaissance stages. There are some projects similar to KLara that SOC teams can leverage, but most of them are private, meaning either the virus collection or rules exist somewhere in the cloud, outside the team’s direct control.
KLara, on the other hand, allows anyone running any kind of hardware to set up their own private YARA scanner, keeping TLP RED YARA rules local.
KLara under the hood
The project uses the dispatcher/worker model, with the usual architecture of one dispatcher and multiple workers. Worker and dispatcher agents are written in Python. Because the worker agents are written in Python, they can be deployed in any compatible ecosystem (Windows or UNIX). The same logic applies to the YARA scanner (used by KLara): it can be compiled on both platforms.
Jobs can be submitted and their status retrieved using a web-based portal, while each user has their own personal account allowing them to be part of a group, as well as share their KLara jobs with any other valid account.
Accounts have multiple properties that can be set by the administrator: what group they are part of, what scan repositories they can run their YARA rules over (based on group membership), if they can see other groups’ jobs, or the maximum number of jobs that can be submitted monthly (individual quotas).
By using the dashboard, authenticated users can submit jobs on the ‘Add a new job’ page:
And check their status on the ‘Current jobs’ page:
Once a user submits a task, they can view its status, resubmit it or delete it. One of the workers will fetch the job from the dispatcher and if it has eligible scan repositories on its file systems, will start the YARA scan. Once finished, the user is notified by email of the results.
Each job’s metadata consists of one or multiple YARA rules, the submitter’s account info and a set of scan repositories that can be selected:
On the main page, a summary is displayed:
- Job status: New/Assigned/Finished/Error
- Job management: Restart/Delete job
- How many files have been matched
- Name of the first rule in the rules set
- The repository path over which YARA scanned for matches.
A more detailed status can be seen once we click on a job:
Any YARA results will be displayed at the bottom, as well as a list of matched MD5s.
Each user can have a search quota and be part of a group. Groups can choose to restrict users (preventing them from seeing what other jobs group members submit).
Finally, each user can change their email address if they want notifications to be sent to another email account.
API access
In order to facilitate automatic job submissions as well as automatic results retrieval, KLara implements a simple REST API allowing any valid account with a valid API Key to query any allowed job’s status. It allows scripts to:
- Submit new tasks
- Get the job results as well as job details (if it’s still scanning or assigned, finished or if there’s an unprocessed (new) job)
- Get all the YARA results from a specific job.
- Get all the matched MD5 hashes
More info about using the API can be found in the repository.
How can you get KLara?
The software was released on our official Kaspersky Lab GitHub account on 9 March, 2018.
We welcome anyone who wants to contribute to this project to submit pull requests. As we said before, we believe in giving back to the community the best tools we can provide in order to fight malware.
The software is open-sourced under GNU General Public License v3.0 and available with no warranty from the developers.
Your new friend, KLara
rick wesson
I have thousands of yara signatures, how can I get them distributed?