Attackers use Python compiled bytecode to evade detection

Newly discovered campaign takes advantage of the fact that most vulnerability scanning tools don't read compiled open-source software.

Credit: Sebastian Spindler

Attackers who are targeting open-source package repositories like PyPI (Python Package Index) have devised a new technique for hiding their malicious code from security scanners, manual reviews, and other forms of security analysis.

In one incident, researchers found malicious code hidden inside a compiled Python bytecode (PYC) file, which the Python runtime can load and execute directly, without the source code (PY) file that it would normally compile first.

"It may be the first supply chain attack to take advantage of the fact that Python bytecode files can be directly executed, and it comes amid a spike in malicious submissions to the Python Package Index," researchers from security firm ReversingLabs said in a report. "If so, it poses yet another supply-chain risk going forward, since this type of attack is likely to be missed by most security tools, which only scan Python source code (PY) files."

Compiled code versus source code

The vast majority of the packages found on public repositories such as npm for JavaScript, PyPI for Python, and RubyGems for Ruby consist of open-source code files that are packaged into archives. They are easy to unpack and read, and as a result security scanners for these repositories have been built to handle this type of packaging.

Attackers are in a constant battle with security companies to evade detection, and the most common evasion technique when it comes to plaintext code is obfuscation. This consists of using features of the programming language itself such as encoding, decoding, or eval to make the code unreadable yet functional. For example, encoding malicious code in base64 is a commonly used technique, but security tools can deal with such encoding.
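As a rough illustration of why base64 alone offers attackers little protection, the sketch below looks for long base64-looking runs in a source file and decodes them. The function name `decode_base64_literals` and the 24-character threshold are assumptions made for this example, not part of any scanner named in this article, and the "payload" is a harmless print statement.

```python
import base64
import binascii
import re

# Long runs of base64 alphabet characters are a common (not conclusive)
# hint that a payload is embedded in a string literal.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def decode_base64_literals(source: str) -> list[str]:
    """Return decoded text for every base64-looking run that decodes to UTF-8."""
    decoded = []
    for match in B64_RUN.finditer(source):
        candidate = match.group(0)
        try:
            raw = base64.b64decode(candidate, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
    return decoded


# Harmless stand-in for an obfuscated payload.
hidden = base64.b64encode(b"print('hello from hidden code')").decode("ascii")
sample = f"import base64\nexec(base64.b64decode('{hidden}'))\n"

for text in decode_base64_literals(sample):
    print("decoded literal:", text)
```

A real scanner would also need to handle nested encodings and compressed payloads, which this sketch does not attempt.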

In the PyPI ecosystem, the cybercriminals behind the W4SP Stealer malware are known for employing techniques including base64 encoding, LZMA compression, and minification -- the removal of spaces and comments from code to make it more compact but also harder to read. The group uses some third-party open-source tools to achieve this such as pyminifier, Kramer, or Hyperion. In one variation of the W4SP attacks, the obfuscated malicious code in the files was shifted past the edge of the default screen borders, so that someone manually reviewing the source code file wouldn't see it.
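The off-screen trick described above is simple to check for mechanically. The following minimal sketch reports any non-whitespace text that starts beyond a chosen column. The name `find_offscreen_code` and the 200-column threshold are arbitrary assumptions for illustration.

```python
def find_offscreen_code(source: str, max_column: int = 200) -> list[tuple[int, str]]:
    """Return (line_number, text) for any non-blank text beyond max_column."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        tail = line[max_column:].strip()
        if tail:
            hits.append((number, tail))
    return hits


# A harmless statement pushed far to the right by 400 spaces.
sample = "import os" + " " * 400 + "; exec('pass')\nprint('ok')\n"

for number, tail in find_offscreen_code(sample):
    print(f"line {number} hides: {tail}")
```

Legitimately long lines (for example embedded data) would also be flagged, so a hit here is a prompt for manual review rather than proof of malice.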

However, PYC files are different. They are not human-readable like plaintext PY scripts. PYC files contain compiled bytecode and are generated when the Python interpreter imports a module. Because the compilation step has already been done, the interpreter can later load and execute them directly without recompiling the original script. This shortens load time rather than making the code itself run faster. Such files are normally kept as a cache next to the source, but they can also be used to distribute Python modules without their source code.
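To make the PYC format less abstract, here is a minimal sketch that compiles a harmless one-line module and then disassembles the resulting file using only the standard library. The helper name `disassemble_pyc` and the file name `demo.py` are hypothetical. The sketch assumes CPython 3.7 or later, where a PYC file starts with a 16-byte header, and it only reads bytecode produced by the same Python version that runs it.

```python
import dis
import importlib.util
import io
import marshal
import py_compile
import tempfile
from pathlib import Path


def disassemble_pyc(pyc_path: Path) -> str:
    """Return a bytecode listing for a PYC built by this Python version."""
    data = pyc_path.read_bytes()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was compiled by a different Python version")
    # Skip the 16-byte header (magic, flags, timestamp or hash, size).
    code = marshal.loads(data[16:])
    out = io.StringIO()
    dis.dis(code, file=out)
    return out.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.py"
    src.write_text("SECRET = 'only visible in bytecode'\n")
    pyc = Path(py_compile.compile(str(src), cfile=str(Path(tmp) / "demo.pyc")))
    listing = disassemble_pyc(pyc)

print(listing)
```

The listing shows the constants and names the module uses without running it. Note that `marshal` is not hardened against deliberately malformed input, so untrusted files are best inspected in an isolated environment.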

In most instances of PyPI malware, the obfuscated malicious code is meant to reach out to an external URL and download the malware -- usually an information stealer -- which gives security tools another opportunity to detect suspicious behavior. In this latest incident, involving a package called fshec2 that was found to contain a malicious PYC file, the full malicious payload can be hidden within that file, and it is much harder to detect if the security tool is not designed to decompile it.

"Loader scripts such as those discovered in the fshec2 package contain a minimal amount of Python code and perform a simple action: loading of a compiled Python module," the ReversingLabs researchers said. "It just happens to be a malicious module. Inspector, the default tool provided by the PyPI security team to analyse PyPI packages, doesn’t, at the moment, provide any way of analysing binary files to spot malicious behaviors. Compiled code from the .PYC file needed to be decompiled in order to analyse its content."
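One inexpensive check that follows from this is to flag compiled files that ship without matching source. The sketch below does that for a zip-based archive such as a wheel. The function name `orphaned_pyc_files` and the file names in the demo archive are hypothetical, and the check is only a heuristic: it does not decompile anything, does not handle tar-based source distributions, and does not understand the `__pycache__` naming scheme.

```python
import io
import zipfile
from pathlib import PurePosixPath


def orphaned_pyc_files(archive: zipfile.ZipFile) -> list[str]:
    """Return .pyc members that have no .py file with the same stem beside them."""
    names = set(archive.namelist())
    orphans = []
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix != ".pyc":
            continue
        if str(path.with_suffix(".py")) not in names:
            orphans.append(name)
    return orphans


# Build a small in-memory archive with hypothetical file names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("pkg/__init__.py", "")
    demo.writestr("pkg/loader.py", "import importlib\n")
    demo.writestr("pkg/payload.pyc", b"\x00" * 16)

with zipfile.ZipFile(buffer) as demo:
    orphans = orphaned_pyc_files(demo)

print(orphans)
```

Files flagged this way would still need to be decompiled or disassembled before anything can be said about what they do.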

The fshec2 package found by ReversingLabs exhibited additional behavior that was likely meant to evade detection. Normally, a module is loaded from a Python script with the import statement. However, the malicious PYC module in this case was loaded using importlib, a standard-library package that exposes the import machinery programmatically and is typically used only in particular cases, such as when an imported library is dynamically modified upon import. In this case the malicious PYC was not being modified, so there was no technical reason to use importlib other than to avoid the regular import statement, likely to evade detection.
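For readers unfamiliar with importlib, the following sketch shows one way a compiled module can be loaded without any import statement and without its source file. The article does not say which importlib calls the fshec2 loader used, so the use of `SourcelessFileLoader` here is an assumption, and the module, its name `greeting`, and the helper `load_compiled_module` are harmless, hypothetical stand-ins.

```python
import importlib.machinery
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def load_compiled_module(name: str, pyc_path: Path):
    """Load a module straight from a .pyc file using the import machinery."""
    loader = importlib.machinery.SourcelessFileLoader(name, str(pyc_path))
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)  # runs the module's bytecode
    return module


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "greeting.py"
    src.write_text("MESSAGE = 'loaded from bytecode'\n")
    pyc = Path(tmp) / "greeting.pyc"
    py_compile.compile(str(src), cfile=str(pyc))
    src.unlink()  # only the compiled file remains on disk
    module = load_compiled_module("greeting", pyc)

print(module.MESSAGE)
```

Because no `import greeting` line ever appears, a tool that only follows import statements in source files would have nothing to follow.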

Credential stealing seems to be the main goal

Once executed on a machine, the fshec2 malicious payload collects information about the system such as usernames, directory listings, and hostnames and then sets up a cron job on Linux or a scheduled task on Windows to execute commands fetched from a remote server. The commands allow the malware to self-update, with the attackers being able to deliver a new version, as well as additional payloads in the form of Python scripts.

The ReversingLabs researchers analysed the command-and-control server and found misconfigurations that allowed them to view some of the information stored on it. For example, they found that victim machines were assigned an incremental ID, and they were able to confirm that the malware had indeed been executed by several victims.

"The sheer number of these mistakes might lead us to the conclusion that this attack was not the work of a state-sponsored actor and not an advanced persistent threat (APT)," the researchers said. "While my team didn’t collect enough evidence to prove that assumption one way or another, harvesting the filenames by incrementing file ID let us determine that the attack was successful in some cases. Our researchers still can’t say who or what the targets were. However, we can confirm that developers did install the malicious PyPI package and that their machine names, usernames, and directory listings were harvested as a result."

Some of the filenames found on the server suggest that the attackers deployed keylogging functionality on some of the machines.

"Historically, npm has been the unfortunate leader and PyPI also ran in the race to see which open-source platform attracts the most attention from malware authors," the researchers said. "In the last six months, however, ReversingLabs and others have observed a marked increase in the volume of malware published to PyPI. In fact, in May, the creation of new user accounts and projects on PyPI were temporarily suspended for a few hours due to a high volume of malicious activity."

ReversingLabs reported the new attack vector to the PyPI security team, which removed the package and said it had not seen this attack technique before. This doesn't exclude the possibility that other similar packages will make their way onto the repository.

To deal with these modern software supply chain threats, organisations need more than static code analysis solutions. They need tools that can also monitor sensitive development systems for suspicious process creation, file execution, unauthorised URL access, information-gathering commands, and the use of easy-to-abuse functions like get_path or importlib.
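On the static side, use of importlib and similar dynamic-loading features is at least easy to surface for review. The sketch below walks a file's syntax tree and reports such uses. The function name `flag_dynamic_loading` and the set of call names are assumptions chosen for illustration, not a list taken from ReversingLabs or PyPI, and many legitimate packages would trigger it.

```python
import ast

# Built-ins that execute or import code chosen at run time (illustrative list).
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}


def flag_dynamic_loading(source: str) -> list[tuple[int, str]]:
    """Return (line_number, description) for importlib use and dynamic-code calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] == "importlib":
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] == "importlib":
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"call to {node.func.id}()"))
    return sorted(findings)


sample = "import importlib.util\nx = eval('1 + 1')\nprint(x)\n"

for lineno, description in flag_dynamic_loading(sample):
    print(f"line {lineno}: {description}")
```

A check like this only narrows down where to look. It complements, rather than replaces, the runtime monitoring described above.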
