Malicious Python Package uses Unicode support to evade detection 

Pierluigi Paganini March 27, 2023

Researchers discovered a malicious package on PyPI that uses Unicode to evade detection while stealing sensitive data.

Supply chain security firm Phylum discovered a malicious Python package on the Python Package Index (PyPI) repository that uses Unicode to evade detection and deliver information-stealing malware.

The package, named onyxproxy, was uploaded to the PyPI repository on March 15, 2023. The analysis of the package revealed that it supports data harvesting capabilities.

“Phylum’s automated platform recently detected the onyxproxy package on PyPI, a malicious package that harvests and exfiltrates credentials and other sensitive data. In many ways, this package typifies other token stealers that we have found prevalent in PyPI.” reads the analysis published by Phylum.”However, one feature of this particular package caught our eye: an obfuscation technique that was foreseen in 2007 during a discussion about Python’s support for Unicode, documented in PEP-3131

While inspecting the code the experts multiple strange, non-monospaced, sans-serif font with mixed bold and italics. The attackers used Unicode variants of characters that appear identical to a human inspection (homoglyphs) (i.e., self vs. 𝘀𝘦𝘭𝘧). The attackers used this trick to evade detection, but when the Python interpreter parsed the code the malicious code was executed.

malicious package on PyPI

“An obvious and immediate benefit of this strange scheme is readability. We can still easily reason about this code, because our eyes and brains can still read the words, despite the intermixed fonts. Moreover, these visible differences do not prevent the code from running, which it does.” continues the analysis. “One might dismiss this as a developer trying to show how clever they can be, except that this package is trying to steal and exfiltrate things immediately upon installation.”

A similar technique was detailed by the researchers Nicholas Boucher and Ross Anderson, who explained how to abuse bidirectional override characters and homoglyphs in a variety of programming languages.

The experts pointed out that the author of onyxproxy demonstrates is not sophisticated, he likely merely cut-and-paste code from various sources and put them together. This obfuscation technique is absent from other parts of the code in setup.py and many Python modules are imported multiple times.

“But, whomever this author copied this obfuscated code from is clever enough to know how to use the internals of the Python interpreter to generate a novel kind of obfuscated code, a kind that is somewhat readable without divulging too much of exactly what the code is trying to steal.” concludes the report. “This novelty is something that we will be keeping an eye on at Phylum, because now that this technique has proven viable in the wild, we fully anticipate others to copy and improve their attempts to attack developers.”

Follow me on Twitter: @securityaffairs and Facebook and Mastodon

Pierluigi Paganini

(SecurityAffairs – hacking, Python)



you might also like

leave a comment