***harlan4096***

Quote:

Introduction

On August 31, 2021 we ran a joint webinar between VirusTotal and Kaspersky, with a focus on Yara rules best practices and real-world examples. If you didn’t have the chance to watch the webinar live, you can see it as a recording on BrightTALK: Applied Yara training.

During the webinar we received an overwhelming response and we would like to thank all the participants for sharing their thoughts, questions and ideas; most of all, we are happy to see so much interest and enthusiasm for Yara!

During the 90 minutes of the webinar we only had the chance to answer a fraction of the questions we received. We would still like to answer the remaining ones, since many of them are quite relevant to real-world situations and practices and could be useful to other security practitioners. Even better, for the trickier questions we decided to ask for help from the creator of YARA itself, Victor Manuel Alvarez, who will help answer them. If you have further questions, please feel free to send them to us in the comments section. We will be happy to answer them too!

Stay safe, stay secure and Happy hunting!

Costin, Vicente and Victor

RULE WRITING

Q: How difficult is it to write a YARA rule for obfuscated payloads? What file features do you experts normally look into when it comes to obfuscated files? How can YARA help? What would be your tips / best practices for writing rules to catch obfuscated binaries?

Vicente here. Obfuscated files are tricky, but YARA can still be useful. We can use all the metadata and file geometry for the detection. Also, depending where the obfuscation is, maybe we can also use some portions of the code for the detection - for instance if it only obfuscates strings with some custom method, maybe the code used for obfuscation can be useful.

Costin here. In general, it is a lot more difficult to write Yara rules for obfuscated payloads. Depending on the obfuscation method, one can still find some ways to detect them. For example, assuming a specific cryptor or packer was used, you can still write a rule for the packer (e.g. UPX) or rely on an unpacking engine to give you the plain code. Some platforms, such as VirusTotal, would automatically unpack known tools for you, which allows one to write simple Yara rules for the unpacked code. When the obfuscation is polymorphic, for instance when the code is expanded with dummy instructions and operands are split across several operations, one can try to use other file properties, such as metadata, entropy, import hashes or other data which stays constant across different generations.

In short, there is no rule on how to write rules for obfuscated code, but in some cases, it is possible.
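Entropy is one of the file properties mentioned above. As a rough illustration only, here is a minimal Python sketch of Shannon entropy over a byte buffer; the function name is our own, and any threshold you would apply to its result is an assumption to be tuned on your own samples. Packed or encrypted data tends to score close to 8 bits per byte, while plain code and text score lower.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum -p * log2(p) over the observed frequency of each byte value.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )
```

A buffer made of a single repeated byte scores 0, and a buffer in which all 256 byte values occur equally often scores 8.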

Q: Would hashes used for API hashing, for example, be helpful for a YARA rule? Since these can change in a future campaign, I'm in doubt.

Costin here. Absolutely - API hashing can be super useful for detection of malware, especially when there are very few unique strings that can be used in the rule, or when the malware is otherwise obfuscated. Yara actually provides a nice, easy-to-use solution: the pe module implements the standard Mandiant import hash function as pe.imphash(). Its return value can be compared in a Yara rule condition against the import hash of a known sample.
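To illustrate what an import hash covers, here is a minimal Python sketch of the commonly described imphash computation: lowercase each library name, strip its extension, join it with the lowercased function name, and take the MD5 of the comma-separated list in import order. The function name and the example import list are hypothetical, imports by ordinal are ignored for simplicity, and real tooling parses the list from the PE import table.

```python
import hashlib


def imphash(imports):
    """Compute an import hash from an ordered list of (library, function) pairs.

    Simplified sketch: imports by ordinal are not handled here.
    """
    parts = []
    for library, function in imports:
        name = library.lower()
        # Drop the common library extensions before hashing.
        for ext in (".dll", ".ocx", ".sys"):
            if name.endswith(ext):
                name = name[: -len(ext)]
                break
        parts.append(f"{name}.{function.lower()}")
    # The order of the imports is part of the hash input.
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


# Hypothetical import list, for illustration only.
example = [("KERNEL32.dll", "CreateFileA"), ("WS2_32.dll", "connect")]
print(imphash(example))
```

Because the order of the imports is hashed as well, two builds that import the same functions in a different order produce different values.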

Q: Do you test rules against a set of malware before releasing them?

Costin here. Yes, extensive QA of a Yara rule is critical for us before releasing it publicly or to customers. For this purpose, we use several internal databases, of both malicious files and known clean code. To make sure the Yara rule detects more than just one sample, we try to run it against our entire malware collection, which is over 5 petabytes at the moment. Sometimes, when time is essential, we run it on a subset of the malware collection, such as specific PE files received during the last 12 months, or, say, script files. In many cases, testing a rule on clean files is even more important than testing it on malware! Based on our experience, we’ve seen countless instances of Yara rules that were published by security companies or even governmental agencies which produced false positives when used on a real system. To simplify testing of Yara rules, you can either use VirusTotal’s Retrohunt feature, or set up a test of your own collections using our open source KLARA project.
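As a small-scale illustration of testing a rule against a collection, the Python sketch below walks a directory tree and reports which files a matcher flags. This is not how KLARA or Retrohunt work; the function name scan_tree and the matcher callable are our own assumptions, and the matcher stands in for whatever rule engine you use.

```python
import os


def scan_tree(root, matches):
    """Run matches(data) on every readable file under root.

    Returns (hits, scanned): the list of matching paths and the number
    of files that were actually read.
    """
    hits, scanned = [], 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # Reads the whole file into memory; fine for a sketch,
                # not for multi-gigabyte files.
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file, skip it
            scanned += 1
            if matches(data):
                hits.append(path)
    return hits, scanned
```

Run against a clean collection, every hit is a false positive to investigate. With the yara-python package installed, a compiled rule set could presumably be plugged in as the matcher, for example `lambda data: bool(rules.match(data=data))`.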

Q: Won't typos also restrict the Yara rule to a particular sample or samples distributed under one campaign?

Vicente here. That could happen, but the typos may be difficult to replace. For example, sometimes typos are in the commands that the malware receives from the server, so fixing them would require the attacker to redeploy both the server-side and client-side malware. In some other cases attackers are completely unaware of the typo, maybe because they are not familiar with the language and just use a weird expression that no native speaker would use, and it can stay there forever. We can find these typos everywhere, including metadata, comments, etc.

Q: Where are good sources for large amounts of known good clean data?

Costin here. There are several public, free sources of known good clean data. Some have suggested the NIST reference set; however, you can also build your own from things like a Windows installation, Linux install, Android/iOS dumps and the like. It’s important to also have third party software, such as Chrome, Firefox or Adobe Reader. A lot of false positives produced by publicly available Yara rules occur on software such as the ones above!

Q: How useful are Yara Rule Generators like Florian Roth's yarGen?

Vicente here. Yara rule generators are VERY useful, but we do not recommend using raw generated rules without an extra round of manual polishing. We believe it is more useful to use them for a first round, to extract potentially relevant strings from a collection of samples we are analyzing, and then use these results to help us build more refined rules.
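The "first round" idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not yarGen's actual algorithm: it extracts printable ASCII strings, keeps those present in every sample, and drops anything also seen in a goodware set. All function names and the minimum length of 6 are our own assumptions.

```python
import re


def extract_strings(data: bytes, min_len: int = 6) -> set:
    """Return the set of printable ASCII runs of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return set(pattern.findall(data))


def candidate_strings(samples, goodware, min_len: int = 6):
    """Strings shared by all samples and absent from all goodware buffers."""
    if not samples:
        return []
    common = set.intersection(*(extract_strings(s, min_len) for s in samples))
    known_good = set()
    for buffer in goodware:
        known_good |= extract_strings(buffer, min_len)
    return sorted(common - known_good)
```

The output is only a list of candidates; deciding which strings are actually meaningful still takes the manual polishing described above.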

Q: Is it better to use wide, wide ascii, both, or none?

Costin here. Whether to use wide, ascii, both or none depends on the case. Depending on how a malware piece is compiled, the strings you see inside could be ascii (single byte) or wide (double byte, Unicode). Yara allows you to easily search for UTF-16 strings through the wide modifier. By default, ascii is used every time you assign a string to a variable; however, if you want to search for both, you need to use “ascii wide”. Adding “wide” to the ascii strings you are searching for might find you additional stuff.
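To make the difference concrete, the Python sketch below shows the two byte sequences involved for a plain ASCII string: the single-byte form, and the two-bytes-per-character form (here produced with UTF-16 little-endian encoding, which for ASCII characters simply adds a zero byte after each one). The helper name find_forms and the sample data are hypothetical.

```python
def find_forms(data: bytes, text: str) -> dict:
    """Report whether text occurs in data in single-byte or double-byte form."""
    ascii_form = text.encode("ascii")     # b"cmd.exe"
    wide_form = text.encode("utf-16-le")  # b"c\x00m\x00d\x00.\x00e\x00x\x00e\x00"
    return {"ascii": ascii_form in data, "wide": wide_form in data}


# Hypothetical buffer that only contains the double-byte form.
sample = b"\x90\x90" + "cmd.exe".encode("utf-16-le") + b"\x00\x00"
print(find_forms(sample, "cmd.exe"))  # {'ascii': False, 'wide': True}
```

A search for the single-byte form alone misses this buffer, which is why searching for both forms can surface additional matches.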

Q: If a rule generates false positives on a very specific sample, would you suppress a specific hash in the rule directly or rather improve the whole logic?

Vicente here. The answer is - “it depends”. If there is a very particular binary that produces a false positive in an otherwise solid rule, we can keep the rule as long as we know what we are doing - for example, we use it for hunting privately and just exclude that specific FP file by hash. If the rule will be used externally or automatically, for instance in a production environment for detection, then it is better to avoid this false positive in the logic.
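The private-hunting approach of excluding one known false positive by hash can be sketched as a thin wrapper around the match decision. This is an illustration in Python rather than rule syntax; the function name, the matcher callable and the contents of the exclusion set are all assumptions.

```python
import hashlib


def is_reportable(data: bytes, rule_matches, known_fp_hashes) -> bool:
    """Return True if the rule matches and the file is not a known false positive.

    rule_matches is any callable taking the file bytes and returning a bool;
    known_fp_hashes is a set of lowercase SHA-256 hex digests.
    """
    if not rule_matches(data):
        return False
    return hashlib.sha256(data).hexdigest() not in known_fp_hashes
```

The trade-off is the one described above: the exclusion list hides a single file but leaves the underlying logic unchanged, so any other file with the same benign trait will still match.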

Q: Some programming languages have automatic formatters (like 'black' in Python, or gofmt in Golang) -- do you recommend something similar for Yara, to maintain good formatting across a team?

Víctor here. As far as I know such a tool doesn’t exist. There are some alternative parsers for YARA, which are able to read a YARA source file, build an abstract syntax tree (AST) from it, and regenerate the source code from the AST. But they have limitations that make them unsuitable for building a tool like gofmt (for example the comments are completely lost).
...

***harlan4096*** · 06 September 21, 06:46

Additional Info: https://securelist.com/applied-yara-training-qa/104007/
