Recently, I’ve been working on a project where I needed to scan a large number of .apk files for potential malware or malicious intent. Since antiviruses produce many false positives, I wanted to scan each file with more than one antivirus. During a discussion, a colleague mentioned the VirusTotal service. VirusTotal is a free online service that lets users scan files and URLs to see whether they are associated with any kind of malicious behavior (viruses, worms, Trojans, etc.). To do so, it uses 55 different antiviruses and 61 scan engines. Using it is straightforward: you upload a file and, once the engines finish their analysis, the results are displayed.
Unfortunately, the web interface only lets you scan one file or URL at a time. Fortunately, VirusTotal also provides a public API, which lets developers submit files or URLs and retrieve the service’s results programmatically. Note that using the API requires an API key, which the VirusTotal community provides after a simple sign-up.
To analyze many .apk files, I wrote a series of Python scripts that use this API. The scripts can be found in my GitHub repository, and anyone who needs to do the same thing is welcome to use them. The process has two steps: first, upload all the files in bulk; second, retrieve all the reports. The first step is handled by the upload_apks.py script, which expects a .csv file as an argument. This file must contain the name of every file together with its MD5 hash, in the following format:
You can easily compute a file’s MD5 message-digest fingerprint by running the md5 command in the Terminal on a Mac (or md5sum on Linux). Note that all the .apk files must be in the same directory as the script and the .csv file (otherwise, you must include their paths in the .csv file). You will also need a script that encodes multipart form data, because the files are uploaded via HTTP POST requests (the one that I used can be obtained from here).
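If you would rather generate the .csv file from Python than run md5 by hand, here is a minimal sketch that uses only the standard library. The exact format line is not reproduced above, so two details are assumptions: the name,md5 column order and the absence of a header row. The helpers md5_of_file and write_apk_index are hypothetical and do not come from the repository.

```python
import csv
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reads keep memory use flat even for large .apk files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_apk_index(directory, csv_path):
    """Write one row per .apk file in `directory` to `csv_path`.

    Assumption: each row is `file name,md5 hash`, with no header row.
    Returns the list of (name, md5) rows that were written.
    """
    apks = sorted(Path(directory).glob("*.apk"))
    rows = [(apk.name, md5_of_file(apk)) for apk in apks]
    with open(csv_path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return rows
```

For example, write_apk_index(".", "apks.csv") would index the .apk files in the current directory. The resulting hashes should match the output of md5 or md5sum.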
To perform the bulk upload, the script reads the .csv file and, for every line, sends the corresponding file in an HTTP POST request to the following URL:
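The multipart encoding can also be done with the standard library alone. The sketch below builds an upload request but does not send it. The endpoint is an assumption because the URL from the original post is not reproduced here. SCAN_URL points at VirusTotal’s legacy v2 file/scan endpoint, which took the key in an apikey form field and the file in a file field. Check the current API documentation before relying on it. The names encode_multipart and build_scan_request are hypothetical.

```python
import urllib.request
import uuid

# Assumption: legacy VirusTotal API v2 upload endpoint (not taken from the post).
SCAN_URL = "https://www.virustotal.com/vtapi/v2/file/scan"


def encode_multipart(fields, files):
    """Encode form fields and files as multipart/form-data.

    fields: dict mapping field name -> str value
    files:  dict mapping field name -> (filename, bytes content)
    Returns (content_type_header, body_bytes).
    """
    # A random boundary is very unlikely to occur inside the file bytes.
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    for name, (filename, content) in files.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    # Closing delimiter, followed by a final CRLF.
    parts += [f"--{boundary}--".encode(), b""]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(parts)


def build_scan_request(api_key, filename, content, url=SCAN_URL):
    """Build (without sending) a POST request that uploads one file's bytes."""
    content_type, body = encode_multipart(
        {"apikey": api_key}, {"file": (filename, content)}
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

Passing the returned request to urllib.request.urlopen would perform the actual upload. Calling json.load on the response would then decode the service’s reply.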
If an upload fails, the name of the file is logged together with its MD5 hash to another .csv file. After the upload script finishes, the results can be retrieved via the retrieve_reports.py script, which reads the same .csv file mentioned earlier. To retrieve a file’s report, it performs an HTTP POST request to the URL below:
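The failure logging and the report request can be sketched as follows. To keep the sketch testable, upload_all receives the upload step as a callable and does not perform HTTP itself. Both upload_all and build_report_request are hypothetical names. REPORT_URL assumes the legacy v2 file/report endpoint, which identified a file by its hash in the resource parameter. The post’s own URL is not reproduced here.

```python
import csv
import urllib.parse
import urllib.request

# Assumption: legacy VirusTotal API v2 report endpoint (not taken from the post).
REPORT_URL = "https://www.virustotal.com/vtapi/v2/file/report"


def upload_all(index_csv, failed_csv, upload):
    """Call upload(name, md5) for every row of index_csv.

    Rows whose upload raises OSError are collected and written to
    failed_csv in the same name,md5 layout, so they can be retried later.
    """
    failed = []
    with open(index_csv, newline="") as fh:
        for row in csv.reader(fh):
            if not row:
                continue  # skip blank lines
            name, md5 = row[0], row[1]
            try:
                upload(name, md5)
            except OSError:  # urllib.error.URLError and HTTPError subclass OSError
                failed.append((name, md5))
    with open(failed_csv, "w", newline="") as out:
        csv.writer(out).writerows(failed)
    return failed


def build_report_request(api_key, md5, url=REPORT_URL):
    """Build (without sending) a POST request for one file's report.

    The file is identified by its MD5 hash in the `resource` parameter.
    """
    data = urllib.parse.urlencode({"apikey": api_key, "resource": md5}).encode()
    return urllib.request.Request(url, data=data, method="POST")
```

The failed rows use the same layout as the input, so the failure file could be passed back to the upload step for a retry.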
VirusTotal sends back a JSON object that contains the results of all the antiviruses for that file. For every result, the script checks whether the file was marked as suspicious and, if so, prints the corresponding antivirus and its verdict (e.g., Symantec indicating that the file may be a Trojan.KillAV Trojan horse). Note that the public API is limited to at most four requests in any given one-minute window, so both scripts sleep for one minute after every four requests. There is also a script called check_apk.py, which can be used to check the results for a single .apk.
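The sketch below shows one way to check the JSON report and respect the rate limit. The report layout is assumed from the legacy v2 API: a top-level response_code, plus a scans object that maps each engine to detected and result fields. The names flagged_engines and run_throttled are hypothetical. The sleep function is a parameter, so the pacing can be tested without actually waiting.

```python
import time


def flagged_engines(report):
    """Return sorted (engine, result) pairs for every engine that flagged the file.

    Assumes a v2-style report such as
    {"response_code": 1, "scans": {"Symantec": {"detected": True, "result": "..."}}}.
    Any response_code other than 1 (e.g. unknown or queued file) yields [].
    """
    if report.get("response_code") != 1:
        return []
    return [
        (engine, verdict.get("result"))
        for engine, verdict in sorted(report.get("scans", {}).items())
        if verdict.get("detected")
    ]


def run_throttled(items, action, per_minute=4, sleep=time.sleep):
    """Apply action to each item, pausing 60 s after every `per_minute` calls.

    This follows the four-requests-per-minute limit described in the post.
    The pause after the final batch is skipped because no request follows it.
    """
    items = list(items)
    results = []
    for count, item in enumerate(items, start=1):
        results.append(action(item))
        if count % per_minute == 0 and count < len(items):
            sleep(60)
    return results
```

For example, run_throttled(hashes, fetch_report) would call a hypothetical fetch_report function for each hash. It would pause for a minute after every four calls, as the post describes.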
Keep in mind that a file identified as malicious by the service is not necessarily malicious. For more on the false positives of the various antiviruses, see the papers by Chang et al. and Yan.