During “Ask The Architect” at the Devoxx UK 2018 conference, Oracle’s chief architect, Mark Reinhold, called Java’s serialization mechanism a “horrible mistake” and a virtually endless source of security vulnerabilities. More importantly, Reinhold announced Oracle’s decision to improve Java’s security by changing the way Java handles object serialization. Nearly half of the vulnerabilities that have been patched in the JDK in the last two years are related to serialization. Serialization security issues have also plagued almost every software vendor including Google, IBM, SAP, and many more.
Reinhold mentioned that Oracle’s long-term goal is to remove native object serialization and to create a new plugin mechanism that will allow developers to choose their own serialization format. Supported formats will include XML, JSON, YAML and even the existing, problematic native serialization. Additionally, a new safe serialization format will be created, based on a new language feature called Data Classes.
Object serialization in Java is two decades old, so this is an important decision. Serialization was first introduced to the Java platform in version 1.1 and is tightly coupled to hundreds of components and important functionality of the JVM. Countless other libraries, frameworks and enterprise servers depend directly or indirectly on native Java object serialization.
Because of this tight coupling, removing Java’s serialization mechanism is a very difficult task and a big engineering challenge that requires careful planning and a thorough design. It’s no surprise that this is a long-term goal for Oracle, or that Oracle cannot commit to a release schedule for replacing serialization.
Serialization issues plague Java and addressing the underlying causes will benefit the Java community, but how long will it take to bring a new approach to the market? In addition, will replacing the old serialization mechanism with a new approach end the issue?
First, we need to understand more about Java’s object serialization, which is the process of converting an in-memory object (graph) into a stream of bytes for transport and storage. Deserialization is the reverse process. Both can be fully automated by the JVM and can be transparent to any application component that needs such functionality. For a class (component) to use Java’s object serialization, it needs to implement the Serializable interface. The whole process of serialization and deserialization is based on a very detailed specification.
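To make the mechanism concrete, here is a minimal sketch of a serialization round trip using the standard java.io stream classes. The Point class and the helper method names are hypothetical and exist only for illustration; checked exceptions are wrapped in unchecked ones to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationRoundTrip {

    // Hypothetical value class; implementing Serializable opts it in.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Converts an object graph into a byte array.
    static byte[] serialize(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object graph from bytes. Only safe for trusted input.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = (Point) deserialize(serialize(new Point(3, 4)));
        System.out.println(copy.x + "," + copy.y); // prints 3,4
    }
}
```

Note that readObject reconstructs whatever serializable classes the byte stream names, which is why the origin of those bytes matters so much.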
Dropping serialization support from Java cannot be achieved by simply removing the Serializable interface because this will have a significant compatibility impact. Java’s object serialization must be backwards compatible while allowing code to evolve. To achieve this backwards compatibility goal, the specification defines a series of strict requirements of what constitutes a compatible change. According to the specification, removing the Serializable interface from a class is an incompatible change.
Maintaining backwards compatibility is a requirement in many enterprise systems. For example, serialized object graphs could be stored in databases for an arbitrary time period with the expectation that whenever they get deserialized, the deserialization will work as expected, even if the system has been upgraded and the classes have evolved. Removing native serialization support would instantly invalidate every stored serialized graph. To avoid such failures, organizations would need to carefully plan and prepare a detailed migration strategy not only for their applications and infrastructure but also for every persisted serialized graph, which would need to be re-serialized using a new mechanism and re-persisted.
Numerous enterprise middleware products, servers and JEE protocols, such as RMI, JMX, and JMS, are heavily dependent on native Java serialization and, as such, are very difficult to change. It is highly probable that the Java EE Expert Group will raise objections to such a change and that the approach might be revised significantly. JEE vendors would also need significant time and effort, possibly years, to switch to any alternative technology while maintaining backwards compatibility.
We do not know exactly how Oracle plans to introduce this incompatible change. It is clear that Oracle will need to roll out this plan in phases over several years. In such a scenario, it is very likely that the first step would be to deprecate the Serializable interface and the related java.io classes in a future Java release.
So far, no JDK Enhancement Proposal (JEP) has been proposed publicly to deprecate the serialization mechanism in Java 11, which is expected to be released later this year. Therefore, it seems that the announcement was made too early and there are no public discussions or proposals for such a change in the immediate future.
Removing the existing serialization mechanism will cause major disruptions and hinder the adoption of new Java releases even after the deprecation period expires. This is a highly undesirable scenario for Oracle, especially now that the Java release train is moving faster than ever.
To ease this migration while maintaining backwards compatibility, Oracle will most likely keep native Java serialization as an option in the new plugin system. To achieve this, the serialization-related classes will likely be moved out of the java.base module, which provides the fundamental APIs for the Java platform.
However, we must understand that applications using Java object serialization are not automatically vulnerable. The vulnerability occurs only if the application deserializes data from untrusted sources. This means that if an application depends on deserializing user data, then simply switching to another serialization technology will not automatically make the application safe.
Most other serialization technologies, such as XML and JSON, also suffer from similar critical vulnerabilities. For example, in recent months, attackers have managed to exploit vulnerabilities (such as CVE-2017-9805) to infect their targets with crypto-mining malware. This demonstrates that the underlying serialization mechanism is not the primary problem. To avoid deserialization vulnerabilities, the application must avoid deserializing untrusted data rather than switching to another serialization technology. This requires a significant engineering effort, as the application may have to be redesigned.
Why has Oracle fixed so many serialization vulnerabilities in the JVM if the vulnerability manifests only when an application exposes a deserialization endpoint to its users? The truth is that almost none of these vulnerabilities are exploitable in the JVM itself. They were fixed in order to harden the JVM against attacks in case an application exposes deserialization endpoints to unsafe user input. As a result, in the latest Java release most Serializable components have become immune to attacks even when an application exposes a deserialization endpoint.
Another issue to consider is that legacy servers and applications that cannot be refactored or re-deployed on newer releases of the JVM will remain vulnerable. Recent releases of Java offer a serialization filtering mechanism that could help mitigate some attacks, but this mechanism requires a deep technical understanding of the problem and the application’s internals in order to be properly utilized and configured. Alternatively, applications can be protected against such attacks using a virtualization-based RASP technology that requires no configuration, profiling, tuning or source code changes.
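As a rough illustration of the serialization filtering mechanism mentioned above, the sketch below attaches an allow-list filter to a single stream. It assumes Java 9 or later, where java.io.ObjectInputFilter is part of the public API. The Order and Unexpected classes, the helper method names, and the chosen depth limit are hypothetical; a real filter has to be derived from the classes the application actually expects to receive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class FilteredDeserialization {

    // Hypothetical class the application expects to receive.
    static final class Order implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;

        Order(int id) {
            this.id = id;
        }
    }

    // Hypothetical class that should never be accepted from the wire.
    static final class Unexpected implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Allow-list pattern: limit graph depth, accept Order, reject the rest.
    static final ObjectInputFilter FILTER = ObjectInputFilter.Config.createFilter(
            "maxdepth=5;" + Order.class.getName() + ";!*");

    static byte[] serialize(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns true if the stream passed the filter, false if it was rejected.
    static boolean acceptedByFilter(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(FILTER); // set before the first read
            in.readObject();
            return true;
        } catch (InvalidClassException rejected) {
            return false; // thrown when the filter rejects a class
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptedByFilter(serialize(new Order(42))));    // true
        System.out.println(acceptedByFilter(serialize(new Unexpected()))); // false
    }
}
```

An allow-list that ends in a reject-everything rule is used here because a deny-list only blocks the classes someone has already thought of. Even so, this small example hints at the configuration burden: every legitimately expected class has to be enumerated correctly.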
Organizations have also had trouble keeping up with Java updates. Oracle’s Co-CEO Mark Hurd recently acknowledged that Java users typically are months to years behind in their patching schedule. Upgrading versions or rewriting apps takes even longer, if it is even possible.
Even if serialization support is dropped in a future release of Java, organizations may still have cause for concern as deserialization vulnerabilities are not unique to the JVM. Python, Ruby, PHP, and .NET are also affected by deserialization vulnerabilities.
Java is easily the most popular platform language in the world today, and Oracle’s plan to improve the JVM’s serialization facility is a helpful change. However, this alone does not suffice to completely eliminate the blight of deserialization vulnerabilities.
Java serialization is insecure, and is deeply intertwingled into Java. This is a class of exploit called “deserialization of untrusted data”, aka CWE-502. ll App servers containing commons-collections JAR and remove them.