Malware Hunter – One year ago I decided to invest in static Malware Analysis automation by setting up a full-stack environment able to grab samples from common
Malware Hunter is a Python-powered project driven by three main components: collectors, processors and a public API. The collectors take samples from publicly available sources and place them in a local queue, where they wait to be processed. The processors are multiple single Python processes running in a distributed environment; they pull samples from the common queue, process them and save the whole processed result set back to a MongoDB instance. Unfortunately I cannot store all the analyzed samples because the storage cost would rise quite quickly, so I
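To make the collector / queue / processor flow concrete, here is a minimal single-process sketch. It is not the actual Malware Hunter code: the function names, the report layout and `toy_analyze` (a stand-in for the Yara step) are all hypothetical, an in-memory `queue.Queue` plays the role of the common queue, and a plain dictionary stands in for the MongoDB collection.

```python
import hashlib
import queue


def collect(sources, sample_queue):
    """Collector: push every sample (raw bytes) from each source onto the shared queue."""
    for source in sources:
        for blob in source:
            sample_queue.put(blob)


def process(sample_queue, reports, analyze):
    """Processor: drain the queue and store one report per sample, keyed by SHA-256."""
    while True:
        try:
            blob = sample_queue.get_nowait()
        except queue.Empty:
            return
        digest = hashlib.sha256(blob).hexdigest()
        # The sketch stores the report only; it does not persist the sample bytes.
        reports[digest] = {"sha256": digest, "size": len(blob), "matches": analyze(blob)}


def toy_analyze(blob):
    """Hypothetical stand-in for the Yara step: flags an 'MZ' header."""
    return ["starts_with_MZ"] if blob.startswith(b"MZ") else []


sample_queue = queue.Queue()
reports = {}  # stand-in for the MongoDB collection (assumption)
collect([[b"MZ\x90\x00fake-pe"], [b"<html></html>"]], sample_queue)
process(sample_queue, reports, toy_analyze)
```

In a distributed setup the queue and the result store would be network services and several processors would run `process` concurrently; the sketch only shows the hand-off between the stages.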
After almost one year of fully automated static analysis through Yara rules, Malware Hunter has analyzed more than one million samples, distributed in the following way.
It looks like in April 2019 the engine extracted and analyzed a small set of samples compared to the general trend, while in late August / early September it analyzed more than 250k samples. It is interesting to see a significant increase in analyses at the “end of the year” compared to the analyses performed at the beginning of the same year. Since the malware collectors pulled from the same sources over the past year, the engine analyzes only specific file types (for example PE and Office files), and assuming the sample sources had no interruptions, it can mean that:
Looking at the most frequently matched Yara rules, it is nice to see that most of the analyzed samples were executable files. Many of them (almost 400k) hid a compressed and/or encrypted PE file inside themselves.
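A ranking of the most matched rules can be obtained with a simple tally over the stored reports. The sketch below assumes each report is a dictionary with a `matches` list of rule names; that layout and the sample data are hypothetical, not the real Malware Hunter schema.

```python
from collections import Counter


def rule_frequencies(reports):
    """Count how many samples matched each rule (a rule counts once per sample)."""
    counts = Counter()
    for report in reports:
        counts.update(set(report["matches"]))
    return counts


# Hypothetical reports, for illustration only.
reports = [
    {"sha256": "aaa", "matches": ["Embedded_PE", "Create_Process"]},
    {"sha256": "bbb", "matches": ["Embedded_PE"]},
    {"sha256": "ccc", "matches": []},
]
top = rule_frequencies(reports).most_common(1)
```

Counting each rule once per sample keeps a single noisy sample from inflating the ranking.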
Many Yara matches highlight a high presence of anti-debugging techniques, for example DebuggerTiming_Ticks, DebuggerPatterns_SEH_Inits, Debugger_Checks, isDebuggerPresent and so on. Considered together with Create_Process, Embedded_PE and Win_File_Operations, they lead the analyst to think that modern malware is heavily obfuscated and weaponized against debuggers. From signatures such as keyloggers and screenshots it is clear that most of today's malware records our keyboard activity and wants to spy on us by taking periodic screenshots. The presence of HTTP and TCP rules underlines the way new malware keeps getting online, either to download shellcode (signature shellcode) or to ask to be controlled by a C2 system (such as a server). Many samples appear to open a local communication port, which often hides a local proxy used to encrypt communication between the malware and its command and control. Crafted mutexes are very common among malware developers; they are used to delay or to manage multiple infection processes.
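The idea of reading rules together rather than one by one can be sketched as a set intersection. The rule names come from the matches listed above; the grouping into two sets and the function name are my own illustration, not how Malware Hunter actually scores samples.

```python
# Rule names as they appear in the matches discussed above.
ANTI_DEBUG = {
    "DebuggerTiming_Ticks",
    "DebuggerPatterns_SEH_Inits",
    "Debugger_Checks",
    "isDebuggerPresent",
}
CAPABILITY = {"Create_Process", "Embedded_PE", "Win_File_Operations"}


def is_debugger_aware_dropper(matches):
    """True when a sample hit at least one anti-debug rule and one capability rule."""
    matched = set(matches)
    return bool(matched & ANTI_DEBUG) and bool(matched & CAPABILITY)


flagged = is_debugger_aware_dropper(["isDebuggerPresent", "Embedded_PE"])
```

A sample that only checks for a debugger, or only embeds a PE, is not flagged; it is the combination that suggests evasive behavior.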
Another interesting observation comes from the way the Equation Group Toolset matches.
From Wikipedia
Many EquationGroup_toolset signatures matched during the most distinctive detection time frames (at the beginning and at the end of the year), alerting us that those well-known (August 2016) tools are still up and running and heavily reused across samples.
On the slow but interesting page “potential APT detection” (available HERE) we have “live” stats (updated every 24h) on APT matches across the 1 million analyzed samples. Dragonfly (also known as Energetic Bear) is what Malware Hunter matched most, followed by Regin. According to Malpedia, Dragonfly is a Russian group that collects intelligence on the energy industry. According to Kaspersky Lab’s findings, the Regin APT campaign targets telecom operators, government institutions, multi-national political bodies, financial and research institutions and individuals involved in advanced mathematics and cryptography. The attackers seem to be primarily interested in gathering intelligence and facilitating other types of attacks.
Many Ursnif/Gozi samples were detected during the past year. Ursnif/Gozi is a rather (in)famous banking trojan mostly targeting the UK and Italy; in late 2018 Trend Micro attributed it to the cybercrime group TA-505 after spotting common evidence between Ursnif/Gozi and TA-505 banking trojans such as Dridex and the loader Emotet. It is interesting to note that quite old rules related to Putter Panda hit on some samples (for example: 1b1c4bc8d5f32b429eac590ec94b1a0780eaf863db99674decb6b6bd9abdf979 and ef046640438ab22d0168017aa75f7137f7a94e30e9f2f16cd65596d0a95a75d2, ...). Putter Panda is a Chinese threat group that has been attributed to Unit 61486 of the 12th Bureau of the PLA’s 3rd General Staff Department (GSD). The analysis of these results could go further; if you are interested in more details, please do not hesitate to contact me, or use the search field on that page.
While the scrapers and the workers run on remote and domestic PCs, the PAPI server holds both the Public Application Program Interface and the search scripts (the ones used to match and to alert on specific API matches). The following graphs show the VP usage at a glance.
4 CPUs at 100% most of the time. The CPUs are used to process Yara rules, to build up database views, to filter out unwanted samples (for example HTML, JavaScript and so on), to search for and alert on interesting samples, and to periodically enrich pre-calculated reports with additional information over time. Disk is mostly used for storing temporary files in separate queues before they are processed. The MongoDB instance in use is not hosted on the same machine. The network graph tracks the balance between bytes sent and bytes received. Almost 2.0 Mbps is the lower bound on incoming traffic, while 300 Kbps is the outbound average. This means the collectors are grabbing a good number of new samples per day from publicly available sources and pushing them onto the central queue. On the other hand, PAPI usage appears to account for a lower outbound rate. That makes sense, since the PAPI JSON result for a single request is way lighter than the sample the request refers to.
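Filtering out unwanted samples before the expensive Yara step can be as cheap as checking the first few bytes of each file. The sketch below is an assumption about how such a filter might look, not the actual one: it keeps files starting with the `MZ` DOS header (which PE files begin with), the OLE2 signature used by legacy Office documents, or the ZIP signature used by modern Office containers, and drops everything else. The function names are hypothetical.

```python
PE_MAGIC = b"MZ"  # DOS header that PE files start with
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy Office (.doc, .xls, .ppt)
ZIP_MAGIC = b"PK\x03\x04"  # ZIP container, used by .docx, .xlsx, .pptx


def sniff_type(blob):
    """Return a coarse type label from the leading bytes, or None if unknown."""
    if blob.startswith(PE_MAGIC):
        return "pe"
    if blob.startswith(OLE2_MAGIC):
        return "ole2"
    if blob.startswith(ZIP_MAGIC):
        return "zip"
    return None


def keep_sample(blob):
    """Keep only samples whose leading bytes look like PE or Office files."""
    return sniff_type(blob) is not None


kept = [b for b in (b"MZ\x90\x00", b"<html>", OLE2_MAGIC + b"\x00") if keep_sample(b)]
```

This check is deliberately coarse: an `MZ` prefix does not prove a valid PE, and a ZIP signature does not prove an Office document, so a stricter filter would parse further into the file.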
I hope you enjoy the tool. Feel free to search samples and to use them to classify your TI, and if you need PAPI let me know. I am planning to keep it running unless the cost increases too much for me.
About the author: Marco Ramilli, Founder of Yoroi
I have experience in security testing, having performed penetration tests on several US electronic voting systems. I have also been in charge of testing the uVote voting system for the Italian Minister of homeland security. I met Palantir Technologies, where I was introduced to the Intelligence Ecosystem. I decided to broaden my cyber security experience by diving into SCADA security issues with some of the biggest industrial conglomerates in Italy. I finally decided to found Yoroi: an innovative Managed Cyber Security Service Provider developing one of the most amazing cyber security defence centers I’ve ever experienced! Now I technically lead Yoroi, defending our customers and strongly believing in: Defence Belongs To Humans