Software obfuscation and the debate over encryption backdoors

Code obfuscation technologies play an interesting role in the recent debate over mandatory encryption backdoors.  In fact, the debate suggests yet another application for code obfuscation.

Let us suppose for a moment that, despite the advice of the technical community, some form of mandatory backdoor requirement is imposed on all platform providers.  Apple would be required to retain the ability to decrypt iOS devices, as would Google for Android, Microsoft for Windows, and so on.

What would happen?  In effect, this would force encryption to move up the software stack. Consider an application developer who wants to protect his law-abiding customers from identity theft, commercial espionage, and other common digital scams.  Since the developer cannot rely on encryption services provided by the underlying platform, the only option is to embed encryption software directly into the application.  Now, Apple’s backdoor would be of no use in recovering this application’s data.
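For concreteness, application-layer encryption might look like the following minimal sketch (all names here are illustrative; a real application would use a vetted authenticated cipher such as AES-GCM rather than this hand-rolled SHAKE-based stream construction):

```python
import hashlib
import hmac
import os

# Illustrative sketch only: a toy authenticated stream cipher built from
# Python's standard library, standing in for the encryption code an app
# developer would embed directly in the application.
def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    # Derive a keystream of the right length from key + nonce.
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    # Authenticate nonce and ciphertext so tampering is detected.
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tampered ciphertext")
    stream = hashlib.shake_256(key + nonce).digest(len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))

key = os.urandom(32)
blob = encrypt(key, b"contact list")
assert decrypt(key, blob) == b"contact list"
```

Because the key lives inside the application rather than in the platform's keystore, a platform-level backdoor learns nothing from the ciphertext alone.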

Well, if only it were that simple.  Since Apple controls the platform, its operating system could, in principle, interfere with the application’s encryption implementation. For example, the platform could take snapshots of application memory as it is running, modify the application’s code or data, and even eavesdrop and tamper with the application’s interaction with the user.  What if the platform could find the location in memory where the application stores the decryption key and simply make a copy of that key?  Doing so would defeat the application’s attempt to encrypt user data.

What is the application developer to do?   He or she seems to be facing an impossible situation: the application code is running on an untrusted platform that can peek into the application’s private memory as it is running.  How can we possibly run encryption and decryption algorithms in such an extreme setting?

Enter code obfuscation.  Suppose the application developer uses obfuscation (say, virtual black-box (VBB) obfuscation for the sake of argument; VBB obfuscation is known to be impossible for general programs, but let us ignore that for now). Such obfuscation ensures that the platform can learn nothing about the inner workings of the application beyond its input-output behavior.  It would be impossible for the platform to extract the application’s decryption key or any other information about its internals. In effect, the platform cannot access user data after the application encrypts it on the device.

This form of obfuscation has a name:  it is called white-box cryptography.  This area examines the question of securely running cryptographic algorithms on untrusted devices. Not surprisingly, obfuscation is the main tool used to provide some level of security.

Why force Apple and others to implement backdoors if any application developer can render those backdoors useless?

While this is a compelling application for obfuscation, there is a significant hole in this design. The trouble is that the underlying platform can emulate user input to the obfuscated application and observe the resulting output.  For example, the platform can emulate user input asking the application to decrypt certain data and then record the values displayed on the screen.  Or it can record the user as he or she is typing a password into the application and use that later.  Obfuscation cannot protect against this type of behavior.  Is there a solution to this UI-based weakness?   We need some way to ensure a trusted path between the application and the user, despite the untrusted underlying platform.  This is a good area for further research.


Obfuscation: An Introduction

By Amit Sahai

From revolutionaries and spies to ordinary citizens living under repressive regimes, for centuries people have had the need to hide operational secrets. By an operational secret, we mean a secret that must be kept safe, where the secret is also used to perform tasks. For example, someone may want to keep the amount of cash she has on-hand secret, but she also needs to refer to this information when making everyday purchases. Even secrets like the location of hidden refugees under one’s protection could be an operational secret, since the holder of the secret would want to plan her ordinary movements to avoid the secret location that she is protecting. Through many clever means, people throughout history managed to protect such secrets. Essential to such protection was the refuge of the human mind, the ultimate sanctuary where secrets can be pondered and processed without fear of loss.

But what if this most basic assumption proves false? What if an adversary can read someone’s mind while she is thinking about the secret she needs to protect? Indeed, it is hard to imagine how someone can protect secrets when the inner dialogue of her mind can betray her. Fortunately, this scenario remains science fiction when applied to humanity. However, if we replace humans by computer programs, this situation is all too common. Computer programs, with inbuilt databases of sensitive information, are routinely captured by adversarial entities. Adversaries can reverse-engineer these programs and see every detail of how the programs “think” about their secrets as they perform their calculations. Furthermore, adversaries can even modify the programs to alter their behavior, in order to coax secrets out of the original computer code. Because computer programs with inbuilt operational secrets are so useful, cryptographic researchers have pondered this challenge as far back as the classic work of Diffie and Hellman in 1976. For decades, however, our ability to suitably “obfuscate” programs in order to defend against these kinds of attacks was based on unsatisfying approaches that failed to provide security against even moderately skilled adversaries.

This changed in 2013, when, in joint work with Garg, Gentry, Halevi, Raykova, and Waters (FOCS 2013), we gave the first mathematical approach to this problem. This work was hailed as “a watershed moment for cryptography” by the Simons Foundation’s Quanta magazine.

The importance of mathematics: An analogy to message encryption.
For centuries, the protection of secret messages was an ad-hoc process. For example, messages were translated into an unfamiliar or artificial language, or hidden in a messenger’s scalp underneath her hair. These techniques, while clever, were easily broken and often relied on security by obscurity. This changed with the introduction of sophisticated mathematical techniques for encryption, such as RSA. Now, breaking such encryption schemes is known to require solving extremely difficult mathematical problems. The use of encryption based on hard math problems has revolutionized data security, and enabled countless applications from secure e-commerce to secure e-mail.

Unfortunately, over the last several decades, the problem of protecting operational secrets within software remained closer in spirit to encryption of centuries past. Methods for protecting software were clever, but ad-hoc and ultimately broken. Ideas like the software equivalent of thinking in an unfamiliar language, or adding unnecessary steps and complication, have been mainstays of the field. Now, we finally have a mathematical approach to general-purpose software obfuscation, ripe for analytical study.

Operational secrets.
The central challenge in protecting operational secrets within software is that the software must remain functional and useful even after such protection is enabled. This necessarily means that many operational secrets inherently cannot be protected within software. For example, suppose our software contains a secret number y, and when a user provides another number x as input to our software, the software tells the user if x < y. Then it is easy to see that by repeated use of this software, a user can recover the secret y completely by performing a simple binary search.
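The binary-search extraction described above needs only about log2 of the range in queries. A minimal sketch (the range bound of 2^32 is an assumption for illustration):

```python
# Toy "software" holding an operational secret y; its only interface
# reveals whether x < y.
SECRET_Y = 123_456_789

def leaks(x: int) -> bool:
    return x < SECRET_Y

def recover(lo: int = 0, hi: int = 2**32) -> int:
    # Binary search for the smallest x with leaks(x) == False; that x is y.
    while lo < hi:
        mid = (lo + hi) // 2
        if leaks(mid):       # mid < y, so y lies in (mid, hi)
            lo = mid + 1
        else:                # mid >= y, so y lies in [lo, mid]
            hi = mid
    return lo

assert recover() == SECRET_Y   # about 32 queries suffice
```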

Thus, at a minimum, we need operational secrets to be unlearnable with input/output queries to the function computed by our software. Unfortunately, however, this is not enough: In 2001, in joint work with Barak, Goldreich, Impagliazzo, Rudich, Vadhan and Yang (CRYPTO 2001, J.ACM 2012), we showed that there exist families of functions {fy} such that: (1) on the one hand, no polynomial-time computation can learn y by just making input/output queries to a randomly chosen fy from our family; but (2) on the other hand, any efficient program P that faithfully implements fy allows an efficient adversary to learn the secret y completely. Indeed, this negative result effectively brought obfuscation research to a near standstill for over a decade.
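The flavor of this counterexample can be conveyed by a toy sketch. It is heavily simplified: the two programs of the family are collapsed into one function with a mode bit, and mode 1 here accepts a Python callable, whereas the real construction takes a circuit encoding as input, which is exactly why oracle access alone does not permit this query.

```python
import random

# Heavily simplified sketch of an unobfuscatable function family
# (illustrative assumptions throughout; see Barak et al. for the real
# construction).
ALPHA, BETA = random.getrandbits(64), random.getrandbits(64)

def f(inp):
    mode, payload = inp
    if mode == 0:
        # Point function: reveals BETA only on the hidden input ALPHA.
        return BETA if payload == ALPHA else 0
    # Mode 1: given code P, reveal BETA iff P maps (0, ALPHA) to BETA.
    return BETA if payload((0, ALPHA)) == BETA else 0

# Learning ALPHA via input/output queries succeeds with probability
# 2^-64 per query.  But possession of ANY faithful implementation of f
# leaks the secret: the adversary feeds the captured code back to f.
captured_program = f            # stand-in for reverse-engineered code
leaked = f((1, captured_program))
assert leaked == BETA
```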

However, in the same 2001 paper, we put forward an alternative, seemingly much weaker, notion of obfuscation called indistinguishability obfuscation (iO): Informally, iO requires that given two equal-size equivalent programs P0 and P1 that compute some function f, no efficient adversary can distinguish the obfuscation iO(P0) of the first program from the obfuscation iO(P1) of the second. In other words, iO is a pseudo-canonicalizer, outputting a randomized representation of a program that “acts” as a canonical program computing the function f. In 2007, Goldwasser and Rothblum showed that iO achieves a kind of “best-possible” obfuscation; however, it was still not clear how iO could be used to hide any particular secret. Only recently, since 2013, have we begun to understand how to leverage iO to hide a variety of operational secrets within programs.
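In symbols, the requirement reads as follows (a standard formulation, where λ denotes the security parameter):

```latex
\text{For all equal-size circuits } P_0, P_1 \text{ with } P_0(x) = P_1(x) \text{ for all inputs } x,
\text{ and for every efficient adversary } \mathcal{A}:
\quad
\bigl|\, \Pr[\mathcal{A}(i\mathcal{O}(P_0)) = 1] \;-\; \Pr[\mathcal{A}(i\mathcal{O}(P_1)) = 1] \,\bigr|
\;\le\; \mathrm{negl}(\lambda).
```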

The exploration of mathematical techniques for software obfuscation remains in its infancy. The Center for Encrypted Functionalities, established in 2014 by a National Science Foundation Frontier Award, is devoted to the study of mathematical software obfuscation. On this blog, we will share news and thoughts about the study of mathematical obfuscation.

Coming soon: As suggested by Boaz Barak, I will write a “State of the iO” post about our current understanding of the security of iO constructions.

Note: This post was adapted from a research vignette on mathematical software obfuscation written for the Simons Institute.