2nd Workshop on
Robust Malware Analysis (WoRMA)
July 7, 2023; Delft, The Netherlands
co-located with EuroS&P 2023

Call for Papers

Important Dates

  • Paper submission deadline: March 15, 2023; 11:59 PM (AoE, UTC-12)
  • Acceptance notification: April 30, 2023; 11:59 PM (AoE, UTC-12)
  • Camera ready due: May 24, 2023; 11:59 PM (AoE, UTC-12)
  • Workshop date: July 7, 2023

Overview

Malware research is a discipline of information security that aims to provide protection against unwanted and dangerous software. Since the mid-1980s, researchers in this area have been engaged in a technological arms race against the creators of malware. Many ideas have been proposed, to varying degrees of effectiveness, from more traditional systems security and program analysis to the use of AI and Machine Learning. Nevertheless, with increased technological complexity and despite more sophisticated defenses, malware’s impact has grown rather than shrunk. It appears that defenders are continually reacting to yesterday’s threats, only to be surprised by today’s minor variations of them.

This lack of robustness is most apparent in signature matching, where malware is represented by a characteristic substring. The fundamental limitation of this approach is its reliance on falsifiable evidence. Mutating the characteristic substring, i.e., falsifying the evidence, is effective in evading detection, and cheaper than discovering the substring in the first place. Unsurprisingly, the same limitation applies to malware detectors based on machine learning, as long as they rely on falsifiable features for decision-making. Robust malware features are necessary.
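
As a toy illustration of the falsifiable-evidence argument above, the following minimal Python sketch models a detector as a plain substring match and shows that changing a single bit of the signature is enough to evade it. The signature bytes, the sample contents, and the function names are all made up for illustration. The sketch also ignores whether a mutated program would still run, which a real evasion would have to preserve.

```python
# Toy illustration only: the signature and the samples are made-up byte strings.
SIGNATURE = b"\xde\xad\xbe\xef"  # hypothetical characteristic substring


def matches_signature(sample: bytes, signature: bytes = SIGNATURE) -> bool:
    """Flag a sample if it contains the characteristic substring."""
    return signature in sample


def mutate_signature(sample: bytes, signature: bytes = SIGNATURE) -> bytes:
    """Flip one bit of the first signature byte wherever the signature occurs."""
    altered = bytes([signature[0] ^ 0x01]) + signature[1:]
    return sample.replace(signature, altered)


original = b"header" + SIGNATURE + b"payload"
variant = mutate_signature(original)

print(matches_signature(original))  # True
print(matches_signature(variant))   # False
```

The mutation needs only knowledge of the signature, whereas the defender first had to discover a substring that characterizes the sample, which is the cost asymmetry the paragraph above describes.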

Furthermore, robust methods for malware classification and analysis are needed across the board to overcome phenomena including, but not limited to, concept drift (malware evolution), polymorphism, new malware families, new anti-analysis techniques, and adversarial machine learning, while supporting robust explanations. This workshop solicits work that aims to advance robust malware analysis, with the goal of creating long-term solutions to the threats of today’s digital environment. Potential research directions are malware detection, benchmark datasets, environments for malware arms race simulation, and exploring limitations of existing work, among others.

Topics of Interest

Topics of interest include (but are not limited to):

Malware Analysis
Topics related to understanding the malicious actions exhibited by malware:
  • Identification of malware behaviors
  • Identification of code modules which implement specific behaviors
  • Unsupervised behavior identification
  • Machine Learning and AI for behavior identification
  • Reliable parsing of file formats and program code
  • De-obfuscation and de-cloaking of malware
  • Robust static and dynamic code analysis
  • Feature extraction in presence of adversaries
  • Robust signature generation and matching
Malware Detection
Topics related to techniques for malware detection:
  • Developing robust malware detection, malware family recognition, identification of novel malware families
  • Network-based malware analysis
  • Host-based malware analysis
  • Malware datasets: publication of new datasets for detection, e.g., family recognition, new family identification, behavior identification, generalization ability
Malware Attribution
Topics exploring methods and techniques to confidently attribute a piece of malware to its creators:
  • Binary and source-code attribution
  • Adversarial attribution
Malware Arms Race
Topics related to the malware arms race:
  • Virtual malware arms race environments and competition reports – automated bots of malware and detectors simultaneously attacking and defending networked hosts, adaptively co-evolving in their quest towards supremacy
  • Automated countermeasures to malware anti-analysis techniques, e.g., packing, anti-debugging, anti-emulation
  • Bypassing anti-malware (anti-virus), e.g., via problem-space adversarial modifications
Robustness Evaluations of Malware Analysis
Topics exploring the limitations of existing research:
  • Experiments demonstrating the limitations in robustness of existing methods (for detection, unpacking, behavior analysis, etc.), datasets, defenses
  • Machine learning-based malware analysis and adversarial machine learning
  • Overcoming limitations – demonstrating methods resilient to, e.g., concept drift (malware evolution), polymorphism, new malware families, new anti-analysis techniques, or adversarial machine learning defenses

Submission Guidelines

We invite the following types of papers:

  • Original Research papers, which are expected to be 8 pages, not exceeding 12 pages in double-column IEEE format including the references and appendices. This category of papers should describe original work that is not previously published or concurrently submitted elsewhere.
  • Position or open-problem papers, of up to 6 pages, using the same template (the title for this category must include the text "Position:" at the beginning). Position research papers aim at fostering discussion and collaboration by presenting preliminary research activities, work in progress and/or industrial innovations. Position research papers may summarize research results published elsewhere or outline new emerging ideas.
  • Reproducibility papers, of up to 8 pages (the title for this category must include the text "Reproduction Report:" or "Reproducing…" at the beginning). This is a new, experimental category we are introducing to solicit re-implementation and open-source release of important papers in malware analysis and detection for which source code is not publicly available. The submission needs to include a 5-10 minute video tutorial with clear reproducibility steps. The paper needs to include insights into the main challenges and limitations. In addition to the submission, your prototype will also be evaluated: it needs to have good documentation, clear computational requirements, and an easy interface for reproducing the results of the original paper. Where the results cannot be reproduced, the submission should examine the reasons in depth.

Submissions must be anonymous (double-blind review), and authors should refer to their previous work in the third person. Submissions must not substantially overlap with papers that have been published or that are simultaneously submitted to a journal or conference with proceedings.

Papers must be typeset in LaTeX in A4 format (not "US Letter") using the IEEE conference proceeding template supplied by EuroS&P: eurosp2023-template.zip. Please do not use other IEEE templates.

Submissions must be in Portable Document Format (.pdf). Authors should pay special attention to unusual fonts, images, and figures that might create problems for reviewers. Your document should render correctly in Adobe Reader XI and when printed in black and white.

Accepted papers will be published in IEEE Xplore. One author of each accepted paper is required to attend the workshop and present the paper for it to be included in the proceedings. Committee members are not required to read the appendices, so the paper should be intelligible without them. Submissions must be in English and properly anonymized.

Submission Site

Submission Website

Program (tentative)

All times are local time (CEST).

08:50 (10 min)
Title: Opening Speech
Authors: Fabio Pierazzi and Nedim Šrndić

09:00 (1 hour)
Title: Keynote Speech: An Overview of Modern Windows Malware Analysis: Where We Are and Where We Are Going
Authors: Simone Aonzo
Abstract: Malicious software has constantly been growing and evolving, from a small research experiment in 1971 to an essential component of modern military arsenals. Today, malware analysis is a term used in the literature to describe a broad field of work with multiple objectives. In this talk, after providing the necessary background, I will present some of the many facets of this line of research that unfold under the malware "umbrella." Finally, I will present some recent results we obtained, while also trying to reveal the hidden technical challenges that I have faced, in the hope that my solutions will help our community avoid repeating some mistakes.

10:00 (30 min)
Title: Temporal Analysis of Distribution Shifts in Malware Classification for Digital Forensics
Authors: Francesco Zola, Jan Bruse, Mikel Galar
Abstract: In recent years, malware diversity and complexity have increased substantially, so the detection and classification of malware families have become one of the key objectives of information security. Machine learning (ML)-based approaches have been proposed to tackle this problem. However, most of these approaches focus on achieving high classification performance scores in static scenarios, without taking into account a key feature of malware: it is constantly evolving. This leads to ML models being outdated and performing poorly after only a few months, leaving stakeholders exposed to potential security risks. With this work, our aim is to highlight the issues that may arise when applying ML-based classification to malware data. We propose a three-step approach to carry out a forensics exploration of model failures. In particular, in the first step, we evaluate and compare the concept drift generated by models trained using a rolling windows approach for selecting the training dataset. In the second step, we evaluate model drift based on the amount of temporal information used in the training dataset. Finally, we perform an in-depth misclassification and feature analysis to emphasize the interpretation of the results and to highlight drift causes. We conclude that caution is warranted when training ML models for malware analysis, as concept drift and clear performance drops were observed even for models trained on larger datasets. Based on our results, it may be more beneficial to train models on fewer but recent data and re-train them after a few months in order to maintain performance.

10:30 (15 min)
Title: Break

10:45 (30 min)
Title: A Wolf in Sheep's Clothing: Query-Free Evasion Attacks Against Machine Learning-Based Malware Detectors with Generative Adversarial Networks
Authors: Daniel Gibert, Jordi Planes, Quan Le, Giulio Zizzo
Abstract: Malware detectors based on machine learning (ML) have been shown to be susceptible to adversarial malware examples. However, current methods to generate adversarial malware examples still have their limits. They either rely on detailed model information (gradient-based attacks), or on detailed outputs of the model - such as class probabilities (score-based attacks), neither of which are available in real-world scenarios. Alternatively, adversarial examples might be crafted using only the label assigned by the detector (label-based attack) to train a substitute network or an agent using reinforcement learning. Nonetheless, label-based attacks might require querying a black-box system from a small number to thousands of times, depending on the approach, which might not be feasible against malware detectors.

This work presents a novel query-free approach to craft adversarial malware examples to evade ML-based malware detectors. To this end, we have devised a GAN-based framework to generate adversarial malware examples that look similar to benign executables in the feature space. To demonstrate the suitability of our approach we have applied the GAN-based attack to three common types of features usually employed by static ML-based malware detectors: (1) Byte histogram features, (2) API-based features, and (3) String-based features. Results show that our model-agnostic approach performs on par with MalGAN, while generating more realistic adversarial malware examples without requiring any query to the malware detectors. Furthermore, we have tested the generated adversarial examples against state-of-the-art multimodal and deep learning malware detectors, showing a decrease in detection performance, as well as a decrease in the average number of detections by the anti-malware engines in VirusTotal.

11:15 (30 min)
Title: Simplification of General Mixed Boolean-Arithmetic Expressions: GAMBA
Authors: Benjamin Reichenwallner, Peter Meerwald-Stadler
Abstract: Malware code often resorts to various self-protection techniques to complicate analysis. One such technique is applying Mixed-Boolean Arithmetic (MBA) expressions as a way to create opaque predicates and diversify and obfuscate the data flow.

In this work we aim to provide tools for the simplification of nonlinear MBA expressions in a very practical context to compete in the arms race between the generation of hard, diverse MBAs and their analysis. The proposed algorithm GAMBA employs algebraic rewriting at its core and extends Simba. It achieves efficient deobfuscation of MBA expressions from the most widely tested public datasets and simplifies expressions to their ground truths in most cases, surpassing peer tools.

11:45 (40 min)
Title: Discussion panel
Authors: All paper presenters and the keynote speaker
Abstract: The future of robust malware analysis.

12:25 (5 min)
Title: Closing speech
Authors: Fabio Pierazzi and Nedim Šrndić

Committee

Workshop Program Chairs

Steering Committee

Program Committee

  • Giovanni Apruzzese, University of Liechtenstein, Liechtenstein
  • Daniel Arp, University College London, UK
  • Kevin Borgolte, Ruhr University Bochum, Germany
  • Savino Dambra, Norton Research Labs
  • Luca Demetrio, University of Genoa, Italy
  • Thijs van Ede, University of Twente, Netherlands
  • Thorsten Eisenhofer, Ruhr University Bochum, Germany
  • Bobby Filar, Sublime Security
  • Apoorva Joshi, Elastic
  • David Klein, Technische Universität Braunschweig, Germany
  • Raphael Labaca-Castro, Universität der Bundeswehr München, Germany
  • Martina Lindorfer, TU Wien, Austria
  • Davide Maiorca, University of Cagliari, Italy
  • Brad Miller, Twitter, USA
  • Luis Muñoz-González, Imperial College London, UK
  • Azqa Nadeem, Delft University of Technology
  • Fabrício José de Oliveira Ceschin, Federal University of Paraná, Brazil
  • Luca Pajola, University of Padova, Italy
  • Maura Pintor, University of Cagliari, Italy
  • Erwin Quiring, Ruhr University Bochum, Germany & International Computer Science Institute (ICSI), Berkeley, USA
  • Christian Wressnegger, Karlsruhe Institute of Technology, Germany

Past Editions

The first edition of WoRMA took place in 2022, co-located with AsiaCCS in Nagasaki, Japan (https://worma.gitlab.io/2022/).