Datasets for the paper:

"Scalable, Behavior-Based Malware Clustering"
Ulrich Bayer, Paolo Milani Comparetti, Clemens Hlauscheck, Christopher Kruegel, and Engin Kirda
Network and Distributed System Security Symposium (NDSS), Internet Society, USA, February 2009

Files in this distribution:

Archive: ndss09_malware_clustering.tgz

  sample_md5:           samples are identified by their MD5 hash, listed in this file.
                        The id of a sample (the number used in the clustering results)
                        is the index of the sample's MD5 in this file, counting from 0.
  clustering_results/:  the clustering results are in this directory (read the README there).
  sample_paths:         the path to each sample's directory.
  samples/:             the sample directories. Each sample's directory contains:
      profile:               'our' behavioral profile (the format used in the paper)
      profile_bailey:        Bailey-style profile
      ttanalyze_report.xml:  Anubis XML report

Archive: ndss09_malware_clustering_large.tgz (3.9 GB)

  Extract it inside the samples/ directory. It contains the following additional
  files for each sample:
      profile_syscall_raw:      the raw system call profile
      anubis.log:               the full Anubis log file
      allowed_traffic.dump.gz:  the pcap dump of the sample's network traffic

Archive: ndss09_malware_clustering_binaries.tar

  Contains the actual malware samples.
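Because the clustering results refer to samples by numeric id rather than by hash, a small helper that maps between the two can be useful. The following is a minimal sketch, assuming sample_md5 holds one MD5 hex digest per line with no blank lines between entries; the function names load_sample_md5s and md5_to_id are hypothetical and not part of this distribution.

```python
from __future__ import annotations

from pathlib import Path


def load_sample_md5s(path: str | Path) -> list[str]:
    """Return the MD5 list; a sample's id is its index in this list (from 0).

    Assumption: one MD5 hex digest per line in sample_md5. Empty lines are
    skipped, so a blank line between entries would shift later ids.
    """
    with open(path, encoding="utf-8") as f:
        # Normalize to lowercase so lookups are case-insensitive.
        return [line.strip().lower() for line in f if line.strip()]


def md5_to_id(md5s: list[str]) -> dict[str, int]:
    """Invert the list so a sample's id can be looked up from its MD5."""
    return {md5: idx for idx, md5 in enumerate(md5s)}
```

For example, after md5s = load_sample_md5s("sample_md5"), the MD5 of sample id 42 is md5s[42], and md5_to_id(md5s)[some_md5] gives the id of a known hash.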