Thread Rating:
  • 1 Vote(s) - 5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Looking at Big Threats Using Code Similarity. Part 1
#1
Bug 
Quote:
[Image: sl_looking_at_big_threats_01.png]

Today, we are announcing the release of KTAE, the Kaspersky Threat Attribution Engine. This code attribution technology, developed initially for internal use by the Kaspersky Global Research and Analysis Team, is now being made available to a wider audience. You can read more about KTAE in our official press release, or go directly to its info page on the Kaspersky Enterprise site. From an internal tool, to prototype and product, this is a road which took about 3 years. We tell the story of this trip below, while throwing in a few code examples as well. However, before diving into KTAE, it’s important to talk about how it all started, on a sunny day, approximately three years ago.

May 12, 2017, a Friday, started in a very similar fashion to many other Fridays: I woke up, made coffee, showered and drove to work. As I was reading e-mails, one message from a colleague in Spain caught my attention. Its subject said “Crisis … (and more)”. Now, crisis (and more!) is not something that people appreciate on a Friday, and it wasn’t April 1st either. Going through the e-mail from my colleague, it became obvious something was going on in several companies around the world. The e-mail even had an attachment with a photo, which is now world famous.

Soon after that, Spain’s Computer Emergency Response Team CCN-CERT, posted an alert on their site about a massive ransomware attack affecting several Spanish organizations. The alert recommended the installation of updates in the Microsoft March 2017 Security Bulletin as a means of stopping the spread of the attack. Meanwhile, the National Health Service (NHS) in the U.K. also issued an alert and confirmed infections at 16 medical institutions.

As we dug into the attack, we confirmed additional infections in several additional countries, including Russia, Ukraine, and India.

Quite essential in stopping these attacks was the Kaspersky System Watcher component. The System Watcher component has the ability to rollback the changes done by ransomware in the event that a malicious sample manages to bypass other defenses. This is extremely useful in case a ransomware sample slips past defenses and attempts to encrypt the data on the disk.

As we kept analysing the attack, we started learning more things; for instance, the infection relied on a famous exploit, (codenamed “EternalBlue”), that has been made available on the internet through the Shadowbrokers dump on April 14th, 2017 and patched by Microsoft on March 14. Despite the fact the patch has been available for two months, it appeared that many companies didn’t patch. We put together a couple of blogs, updated our technical support pages and made sure all samples were detected and blocked even on systems that were vulnerable to the EternalBlue exploit. Meanwhile, as everyone was trying to research the samples, we were scouting for any possible links to known criminal or APT groups, trying to determine how a newcomer malware was able to cause such a pandemic in just a few days. The explanation here is simple – for ransomware, it is not very often that we get to see completely new, built from scratch, pandemic-level samples. In most cases, ransomware attacks make use of some popular malware that is sold by criminals on underground forums or, “as a service”.

And yet, we couldn’t spot any links with known ransomware variants. Things became a bit clearer on Monday evening, when Neel Mehta, a researcher at Google, posted a mysterious message on Twitter with the #WannaCryptAttribution hashtag.

The cryptic message in fact referred to a similarity between two samples that have shared code. The two samples Neel refers to in the post were:
  • A WannaCry sample from February 2017 which looks like a very early variant
  • A Lazarus APT group sample from February 2015
The similarity can be observed in the screenshot below, taken between the two samples, with the shared code highlighted.
 
Although some people doubted the link, we immediately realized that Neel Mehta was right. We put together a blog diving into this similarity, “WannaCry and Lazarus Group – the missing link?”. The discovery of this code overlap was obviously not a random hit. For years, Google integrated the technology they acquired from Zynamics into their analysis tools making it possible to cluster together malware samples based on shared code. Obviously, the technology seemed to work rather nicely. Interestingly, one month later, an article was published suggesting the NSA also reportedly believed in this link.

Thinking about the story, the overlap between WannaCry and Lazarus, we put a plan together – what if we built a technology that can quickly identify code reuse between malware attacks and pinpoint the likely culprits in future cases? The goal would be to make this technology available in a larger fashion to assist threat hunters, SOCs and CERTs speed up incident response or malware triage. The first prototype for this new technology was available internally June 2017, and we continued to work on it, fine-tuning it, over the next months.

In principle, the problem of code similarity is relatively easy. Several approaches have been tested and discussed in the past, including:
  • Calculating checksums for subs and comparing them against a database
  • Reconstructing the code flow and creating a graph from it; comparing graphs for similar structures
  • Extracting n-grams and comparing them against a database
  • Using fuzzy hashes on the whole file or parts of it
  • Using metadata, such as the rich header, exports or other parts of the file; although this isn’t code similarity, it can still yield some very good results
To find the common code between two malware samples, one can, for instance, extract all 8-16 byte strings, then check for overlaps. There’s two main problems to that though:
  • Our malware collection is too big; if we want to do this for all the files we have, we’d need a large computing cluster (read: thousands of machines) and lots of storage (read: Petabytes)
  • Capex too small
Additionally, doing this massive code extraction, profiling and storage, not to mention searching, in an efficient way that we can provide as a stand-alone box, VM or appliance is another level of complexity.

To refine it, we started experimenting with code-based Yara rules. The idea was also simple and beautiful: create a Yara rule from the unique code found in a sample, then use our existing systems to scan the malware collection with that Yara rule.
...
Continue Reading
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)
[-]
Welcome
You have to register before you can post on our site.

Username/Email:


Password:





[-]
Recent Posts
Malwarebytes 5.1.3.110
Malwarebytes 5.1.3...Mohammad.Poorya — 00:51
Music Videos
Billy Joel - The Riv...jAcos — 17:24
Movies! Movies!
Beverly Hills Cop: A...jAcos — 17:22
TV Series
Matlock Kathy Bat...jAcos — 17:16
F-Secure 19.4
What's new in the ...harlan4096 — 09:44

[-]
Birthdays
Today's Birthdays
avatar (42)techlignub
avatar (41)Stevenmam
avatar (48)onlinbah
Upcoming Birthdays
avatar (43)wapedDow
avatar (49)steakelask
avatar (43)Termoplenka
avatar (41)bycoPaist
avatar (47)pieloKat
avatar (41)ilyagNeexy
avatar (49)donitascene
avatar (49)Toligo

[-]
Online Staff
There are no staff members currently online.

>