Antivirus: on the trail of malware, even with heuristic techniques

Antivirus software is a fundamental tool in our daily "computer life": by analyzing the files on our computers, above all with heuristic techniques, it lets us identify potential malware before it can do significant damage. Yet we usually see only its graphical dashboard, without thinking about what is actually underneath, and most of the time scanning the system for software that could compromise our security just seems like a waste of time. But how does an antivirus decide, in practice, whether a program is malicious? There are some established techniques that can be combined to increase analytical capacity, including:

  • Static analysis, where the "signatures" of already known viruses are compared with the decompiled code of the analyzed program;
  • Sandboxes, that is, isolated environments created ad hoc, in which suspicious software can be run to observe its behavior and, based on its actions, decide whether to classify it as malicious;
  • Heuristic methods, which try to predict whether an application is benign or malicious by estimating how closely it matches predefined models.

Each vendor combines these three basic techniques to a greater or lesser extent to get different (and hopefully better) results in the search for malicious software.

Simple but by now well-refined techniques

Static analysis (also known as the signature-based method) takes the signature of the executable, a fingerprint derived from its content, and compares it with a database of signatures from known malware. The main advantage is that a match gives us high confidence that the file is malicious, because it carries the fingerprint already recorded for that type of virus. The biggest difficulty is keeping the database up to date, a task that falls to the vendor. The result is a significant weakness in identifying threats that are not yet known, precisely because they have not been analyzed or are not yet widespread.
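The signature lookup described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that a signature is simply the SHA-256 digest of the whole file; the `KNOWN_BAD_DIGESTS` set and the sample byte strings are made up for the example, and real products also use byte-pattern signatures and much larger databases.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-malicious files.
# The single entry below is invented purely for illustration.
KNOWN_BAD_DIGESTS = {
    hashlib.sha256(b"pretend-this-is-a-known-virus").hexdigest(),
}


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_known_malware(data: bytes, signatures=KNOWN_BAD_DIGESTS) -> bool:
    """True only when the digest matches an entry in the database exactly."""
    return sha256_of(data) in signatures


known = b"pretend-this-is-a-known-virus"
variant = known + b"!"  # a one-byte change produces a completely different digest

print(is_known_malware(known))    # True
print(is_known_malware(variant))  # False
```

The second call shows the weakness mentioned in the text: a sample that differs even slightly from what is in the database goes unnoticed.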

Maintaining a malware signature database is a major responsibility of the manufacturer.

The other well-established approach is the use of sandboxes which, as we said, are isolated virtual machines. Their isolation guarantees that potentially dangerous software can be executed without affecting the real system, so its behavior can be observed and used to classify it. This approach is extremely effective for analyzing what damage the software under analysis can cause, but it does not increase our ability to predict potential malware. Furthermore, virtual machines are impractical in everyday use, when we are often in a hurry to open documents, install applications or decompress archives; they are better suited to laboratory use, where more advanced research and analysis can be done.
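As a rough sketch of the last step, deciding from the observed actions, the following Python snippet scores a list of actions that a sandbox might have recorded. The action names, weights and threshold are all hypothetical, chosen only to illustrate the idea; the snippet does not run any untrusted code itself.

```python
# Hypothetical weights for actions a sandbox might record. The names and
# numbers are illustrative assumptions, not taken from any real product.
SUSPICION_WEIGHTS = {
    "modify_autorun_key": 3,
    "inject_into_process": 4,
    "encrypt_user_files": 5,
    "open_network_socket": 1,
    "read_own_config": 0,
}
THRESHOLD = 5  # assumed cut-off above which a run is flagged


def verdict(observed_actions, weights=SUSPICION_WEIGHTS, threshold=THRESHOLD):
    """Sum the weights of the observed actions and compare with the threshold.

    Actions missing from the weight table contribute nothing to the score.
    """
    score = sum(weights.get(action, 0) for action in observed_actions)
    label = "malicious" if score >= threshold else "benign"
    return label, score


print(verdict(["open_network_socket", "read_own_config"]))
print(verdict(["modify_autorun_key", "inject_into_process"]))
```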

Heuristic techniques against malware

The third technique is based on heuristic methods that try to predict whether the analyzed file is benign or malicious. What is a heuristic? In operations research, a heuristic method is an algorithm capable of arriving at a good solution (not necessarily the optimal one, that is, the best we could produce) in a shorter time than an exact method. This approach is particularly useful for so-called NP-complete problems, for which no efficient (polynomial-time) exact algorithm is known: exact methods demand too much time and computing power on large instances, while a heuristic can still return a usable answer.
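To make the trade-off concrete, here is a small sketch on a classic operations research problem, the 0/1 knapsack, which is not an antivirus task but shows the idea. The item values, weights and capacity are made-up numbers. The exact method tries every subset, while the heuristic greedily picks items by value-to-weight ratio.

```python
from itertools import combinations

# Made-up instance: (value, weight) pairs and a weight limit.
ITEMS = [(60, 10), (100, 20), (120, 30)]
CAPACITY = 50


def exact_best(items, capacity):
    """Exhaustive search: checks every subset, so the cost doubles per item."""
    best = 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(weight for _, weight in subset) <= capacity:
                best = max(best, sum(value for value, _ in subset))
    return best


def greedy_best(items, capacity):
    """Heuristic: take items in order of value/weight ratio while they fit."""
    total, remaining = 0, capacity
    ranked = sorted(items, key=lambda item: item[0] / item[1], reverse=True)
    for value, weight in ranked:
        if weight <= remaining:
            total += value
            remaining -= weight
    return total


print(exact_best(ITEMS, CAPACITY))   # 220
print(greedy_best(ITEMS, CAPACITY))  # 160
```

The greedy answer is found after a single sort, but it is worse than the optimum: a good solution, not necessarily the best one.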

Heuristic methods are used to classify files into the correct category.

In the antivirus world, heuristic techniques mainly mean machine learning methods whose objective is to classify a given file as either a virus or a trusted file. The analysis relies on typical characteristics such as system or API calls, the control flow graph and, at a lower level, opcodes (operation codes), i.e. the machine-level codes that identify the operation to be performed.
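One common way to turn a sequence of opcodes into features for a classifier is to count short runs of consecutive opcodes (n-grams). The sketch below assumes the opcodes have already been extracted by a disassembler; the `trace` list is an invented example, not the output of any real tool.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count every run of n consecutive opcodes in the sequence."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Invented opcode sequence, standing in for a disassembled program.
trace = ["push", "mov", "call", "mov", "call", "ret"]
features = opcode_ngrams(trace)

for ngram, count in features.most_common():
    print(ngram, count)
```

The resulting counts can then be fed to a classifier as a numeric description of the file.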

The choice of classification technique (such as Naïve Bayes) and the degree of refinement of the algorithm determine how closely the prediction matches reality. The heuristic approach is particularly valuable against polymorphic viruses, which are able to change their own code. Because the algorithm can evaluate even a partial correspondence in the application code, it can still raise the necessary alarms and warn the user of the potential danger. The biggest disadvantage lies in the difficulty of identifying a virus with certainty, which can produce false positives that disrupt users' normal work.
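As a minimal sketch of such a classifier, the snippet below implements a multinomial Naïve Bayes with Laplace smoothing over lists of API call names. The training samples are toy data invented for the example, and the API names are used only as tokens; a real product would train on far more samples and richer features.

```python
import math
from collections import Counter

# Toy, made-up training data: the API calls observed in four programs.
TRAINING = [
    (["CreateFile", "WriteFile", "RegSetValue", "CreateRemoteThread"], "malware"),
    (["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"], "malware"),
    (["CreateFile", "ReadFile", "CloseHandle"], "trusted"),
    (["CreateWindow", "ReadFile", "CloseHandle"], "trusted"),
]


def train(samples):
    """Collect per-class sample counts, per-class token counts and the vocabulary."""
    class_counts, token_counts, vocab = Counter(), {}, set()
    for tokens, label in samples:
        class_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def log_scores(tokens, model):
    """Log prior plus smoothed log likelihood of the known tokens, per class."""
    class_counts, token_counts, vocab = model
    total_samples = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total_samples)
        denominator = sum(token_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][token] + 1) / denominator)
        scores[label] = score
    return scores


def classify(tokens, model):
    """Return the class with the highest score."""
    scores = log_scores(tokens, model)
    return max(scores, key=scores.get)


model = train(TRAINING)
# Only partly overlaps with the malware samples, plus one unseen call.
print(classify(["VirtualAlloc", "CreateRemoteThread", "Sleep"], model))  # malware
print(classify(["ReadFile", "CloseHandle"], model))                      # trusted
```

The first query matches no training sample exactly, yet the partial overlap is enough to tip the score towards the malware class, which is the behavior that helps with code that changes between variants.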

Antivirus products compared

At the start of this analysis we said that the inner workings of these programs are still largely unknown to end users. However, recent studies, such as the one led by a group at the Politecnico di Milano, aim to shed light on the different implementations by drawing inferences from the results obtained when the same malware is analyzed with different products. The framework they have created helps us understand how antivirus products behave when faced with different applications, evaluating their responsiveness and, above all, which technique they apply.

In conclusion, these tools, fascinating as they are for cyber security enthusiasts, still remain rather obscure, but we have more and more knowledge with which to evaluate their capabilities and technologies. Perhaps in the not too distant future we will also have an open standard that allows technologies to be aligned and increasingly effective tools to be provided. After all, relying on "security through obscurity" is strongly discouraged if we want to ensure high levels of security in our systems!

Article by Nicola Fioranelli

The article "Antivirus: on the trail of malware, even with heuristic techniques" comes from Tech CuE.