Antivirus file scan solution with MuleSoft

As integration architects, we frequently face business requirements for file-based integrations. Securing information in transit is not enough: the content of each file must also be analysed for malware before it is sent to downstream systems. At a time when ransomware is an everyday headline, antivirus file scanning should be implemented regardless of the file source, whether the file comes from the Internet or from within the organization. Yes, unfortunately, malware can exist inside your organization.

How does antivirus file scanning work?

To identify malicious files, antivirus software depends largely on signatures. When new malware is discovered anywhere in the world, it is analysed by malware researchers or by dynamic analysis systems. Once the file is confirmed to be malware, a suitable signature is extracted from it and added to the antivirus software's signature database.
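The simplest form of signature matching is a whole-file hash lookup. The sketch below illustrates the idea in Python; the `SIGNATURES` table, the payload bytes and the malware name are all made up for illustration, and real engines also match byte patterns inside files rather than relying on hashes alone.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical signature database: SHA-256 digest -> malware name.
# Real databases hold millions of entries of several signature types.
SIGNATURES: Dict[str, str] = {
    hashlib.sha256(b"pretend-malware-payload").hexdigest(): "Example.Test.Malware",
}


def match_signature(content: bytes,
                    signatures: Dict[str, str] = SIGNATURES) -> Optional[str]:
    """Return the malware name if the content's hash is a known signature."""
    digest = hashlib.sha256(content).hexdigest()
    return signatures.get(digest)


if __name__ == "__main__":
    print(match_signature(b"pretend-malware-payload"))
    print(match_signature(b"harmless report"))
```

A lookup like this only catches exact copies of a known file, which is one reason signature databases need frequent updates.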

Solution:

A common practice is to run antivirus software on an isolated server and scan files there. This does not help much, because it is not always possible to route files through that server, and it does not support scanning a file on demand.

Our solution therefore has two parts: an antivirus engine, and a REST API that exposes it for file analysis. We then integrate this API with MuleSoft so that integrations can call it.

Antivirus engine as REST API:

ClamAV is an open-source antivirus toolkit that detects malware by matching files against known signatures. It relies on a virus signature database against which each file is analysed. This database is updated daily as new virus signatures are published worldwide.

Why ClamAV? I am glad you asked!

ClamAV detects millions of viruses, worms, trojans, and other malware, including Microsoft Office macro viruses and mobile malware.
ClamAV's bytecode signature runtime, powered by either LLVM or a custom bytecode interpreter, allows signature writers to create and distribute very complex detection routines and to enhance the scanner's functionality remotely.
Signed signature databases ensure that ClamAV only executes trusted signature definitions.
ClamAV scans inside archives and compressed files, and it also protects against archive bombs.

If you are looking for a readily available on-demand file scanning service instead, FireEye's "Detection on Demand" is a good option. It does have limitations, however: it has a file size limit of 100 MB, and it is only available as a cloud-based service. That rules it out for many use cases, because you have to upload the file to a FireEye cloud endpoint, which could go against the security philosophy of your organization.

Back to ClamAV: install it on an isolated server and expose it as a REST API. A small JavaScript service reads the incoming file from the HTTP form data and passes it to ClamAV through the clamd daemon. The daemon listens for commands on the sockets configured in clamd.conf.

Where are we? We have installed the antivirus engine and exposed it as a REST API. Docker images exist for both ClamAV and the JavaScript REST API.
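To make the hand-off to the daemon more concrete, here is a minimal Python sketch of clamd's INSTREAM command, which accepts file content over a socket as length-prefixed chunks. It illustrates the protocol only and is not the JavaScript service described above. The host, port 3310 and chunk size are assumptions that would have to match your clamd.conf, and the helper names are invented for this example.

```python
import socket
import struct
from typing import Iterator, Optional, Tuple

CHUNK_SIZE = 8192  # arbitrary chunk size chosen for this sketch


def instream_frames(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the byte frames of a clamd INSTREAM session for `data`."""
    yield b"zINSTREAM\0"  # null-terminated command
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is prefixed with its length as a 4-byte big-endian integer.
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # a zero-length chunk ends the stream


def parse_reply(reply: bytes) -> Tuple[str, Optional[str]]:
    """Map a reply such as b'stream: OK\\0' to (status, detail)."""
    text = reply.decode("utf-8", errors="replace").strip("\0\n ")
    _, _, verdict = text.partition(": ")
    if verdict == "OK":
        return "clean", None
    if verdict.endswith(" FOUND"):
        return "infected", verdict[: -len(" FOUND")]
    return "error", text  # anything else is treated as a failure


def scan_bytes(data: bytes, host: str = "127.0.0.1",
               port: int = 3310) -> Tuple[str, Optional[str]]:
    """Send `data` to a clamd TCP socket (assumed host/port) and parse the reply."""
    with socket.create_connection((host, port), timeout=30) as conn:
        for frame in instream_frames(data):
            conn.sendall(frame)
        reply = conn.recv(4096)  # replies are short; one read suffices in a sketch
    return parse_reply(reply)
```

A REST wrapper would call something like `scan_bytes(uploaded_file)` and translate the returned status into an HTTP response.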

An example of the API test looks like this:

Our next task is to make this API developer-friendly and easily accessible from the integration platform, in this case MuleSoft. A shared flow that calls the file-scan API and alerts stakeholders gives the security team a reusable building block. In addition, a Maven archetype for file-based integrations that includes this shared flow as a dependency helps enforce this security philosophy. Trust me, that makes our security team happy.

Snapshot showing the intended common flow:

DataWeave code to construct form-data request:
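Whatever tool builds it, the request is an ordinary multipart/form-data body with a single file part. As a rough illustration of that structure, written in Python rather than DataWeave, the sketch below assembles such a body by hand. The field name `file` and the helper names are assumptions; the real API may expect different names.

```python
import urllib.request
import uuid
from typing import Optional, Tuple


def build_multipart(field: str, filename: str, content: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, Content-Type value) for a single-file form-data request."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")  # closing boundary
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def scan_request(url: str, filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request for a hypothetical scan endpoint."""
    body, content_type = build_multipart("file", filename, content)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; the URL is whatever address your scan API is deployed at.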

Note: Make sure to scan the file as a first operation in your API, before any transformation or backend calls.
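The ordering in the note above can be expressed as a small guard around the rest of the processing. This is a Python sketch of the control flow only, not the Mule flow itself; `process_file`, `FileRejected` and the shape of the scan result are hypothetical names chosen for the example. It fails closed, meaning that a scanner error is treated like a detection.

```python
from typing import Callable, Optional, Tuple

# Assumed scan result shape: ("clean" | "infected" | "error", detail)
ScanResult = Tuple[str, Optional[str]]


class FileRejected(Exception):
    """Raised when a file must not be processed any further."""


def process_file(content: bytes,
                 scan: Callable[[bytes], ScanResult],
                 transform: Callable[[bytes], bytes],
                 alert: Callable[[str], None]) -> bytes:
    """Scan first; only transform content that came back clean."""
    status, detail = scan(content)  # always the first operation
    if status != "clean":
        alert(f"file rejected: {status} ({detail})")  # notify stakeholders
        raise FileRejected(status)
    return transform(content)  # transformation and backend calls come after
```

Because the scanner, the transformation and the alerting are passed in as functions, the same guard can wrap any file-based integration.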

This common flow’s source code is available here.

I would like to point out that we’re compromising some performance for the sake of security. It’s critical to prioritise security on par with, if not ahead of, performance.

The main takeaway from this article is that ClamAV is a viable file scanning engine that we can run on-premises. By exposing the engine as a REST API, we can call it from MuleSoft in file-based integrations as an on-demand file scanning function, before processing files or sending them to downstream systems.
