Wednesday, November 21, 2007

Why RPC is Evil, XML-RPC is Doubly Evil and SOAP-RPC is Triply Evil

After reading my own review of Yuli Vasiliev's book, I realised I had to better explain my "SOAP-RPC is evil" comment.

Actually, SOAP-RPC is not just evil but triply evil, and here's why.

The root of the whole evil was the original arrogant assumption behind the notion of distributed objects - that we can make remote objects appear as if they are local. It was only much later that Martin Fowler came out with his First Law of Distributed Objects ("Don't Distribute Your Objects").

Quick question: How can we make a remote object appear to be local?
Correct answer: We can't, period.
Naive answer: Serialise the remote object, transport it over the wire and deserialise it to create a local copy.

The reason the naive answer is so badly wrong is that copies behave very differently to the original objects. To be precise, when a copy is changed, the original is not changed. So "hiding" the remoteness of an object through some object serialisation sleight-of-hand doesn't work. The component or application that gets a reference to the copy must know that it is a copy, otherwise all kinds of application errors can occur.
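To make the point concrete, here is a minimal sketch using plain Java serialisation. The `Account` class and the `roundTrip` helper are names I've invented for illustration, and the byte array merely stands in for the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CopyDemo {
    // Hypothetical domain object, invented for this illustration.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private int balance;

        Account(int balance) { this.balance = balance; }
        int getBalance() { return balance; }
        void deposit(int amount) { balance += amount; }
    }

    // Serialise to bytes and back; the byte array stands in for a trip over the wire.
    static Account roundTrip(Account original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account original = new Account(100);
        Account copy = roundTrip(original);

        copy.deposit(50); // changes the copy only

        System.out.println("copy:     " + copy.getBalance());     // 150
        System.out.println("original: " + original.getBalance()); // 100
    }
}
```

Code holding `copy` has no way of telling, from the reference alone, that its deposit never reached the original.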

So that's why RPC (Remote Procedure Call) is evil. It tries to make remote objects look local. If one is not careful, this can create serious errors in applications.

The other issue is the serialisation mechanism. Even assuming it's OK to serialise an object and recreate it elsewhere from the serialised representation, what is the mechanism used for serialisation? Java serialisation works because we have Java on both ends. In recent times, XML has become popular, and some bright spark must have thought of "XML serialisation". The concept is simple. Convert a Java object to XML format, transport it over the wire to a remote host, then convert the XML document back into a Java object to create a perfect copy, a "remote object". The problem with this assumption is that there is an impedance mismatch between Java and XML. We can do things with Java that we can't do with XML, and vice-versa.

So converting a Java object into XML isn't straightforward. Things get lost in the translation. Converting from XML back into Java isn't straightforward either. More things get lost in the translation. So this "serialisation/deserialisation" using XML as the transport format doesn't result in perfect copies at the other end. This is XML-RPC, and when it's used naively, it's easy to see why it's doubly evil.
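One example of what can get lost is object identity. The sketch below is a deliberately naive, hand-written conversion (the `Order` and `Address` classes and the element names are my own inventions, not the output of any particular toolkit). In Java, two fields can point at the same object. In a plain XML tree, the same data simply appears twice.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XmlRoundTripDemo {
    // Hypothetical classes, invented for this illustration.
    static class Address {
        String city;
        Address(String city) { this.city = city; }
    }

    static class Order {
        Address billing;
        Address shipping;
        Order(Address billing, Address shipping) {
            this.billing = billing;
            this.shipping = shipping;
        }
    }

    // Naive object-to-XML conversion (no escaping, for brevity).
    // A plain tree cannot say "same object as above" unless we invent a convention for it.
    static String toXml(Order order) {
        return "<order>"
            + "<billing><city>" + order.billing.city + "</city></billing>"
            + "<shipping><city>" + order.shipping.city + "</city></shipping>"
            + "</order>";
    }

    // Naive XML-to-object conversion: each element becomes a fresh object.
    static Order fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return new Order(readAddress(root, "billing"), readAddress(root, "shipping"));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Address readAddress(Element root, String tag) {
        Element element = (Element) root.getElementsByTagName(tag).item(0);
        return new Address(element.getElementsByTagName("city").item(0).getTextContent());
    }

    public static void main(String[] args) {
        Address home = new Address("Sydney");
        Order original = new Order(home, home); // one address, shared by both fields
        Order copy = fromXml(toXml(original));

        System.out.println(original.billing == original.shipping); // true
        System.out.println(copy.billing == copy.shipping);         // false

        copy.billing.city = "Melbourne";
        System.out.println(copy.shipping.city); // Sydney: the two are no longer linked
    }
}
```

The data survived the trip, but the structure of the object graph did not, so the "copy" behaves differently from the original even before remoteness comes into it.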

As SOAP-RPC evolved beyond XML-RPC, the wire protocol itself began to be seen as a "contract" between systems. In other words, the SOAP message could be used to decouple implementations at either end. All of a sudden, the XML document going over the wire was not expected to be just a Java serialisation mechanism; it was a technology-neutral serialisation mechanism. So now it became fashionable to think of serialising a Java object into an XML document, wrapping it in a SOAP envelope and "exposing" this as a contract to another system. This other system could pull in the SOAP message, extract the XML document from within it, then deserialise it into...a C# object! Not only is the impedance mismatch alive and well here too, there is another subtle assumption made that is violated in the execution. Can you find it?

Let me sum up.

RPC is evil because it makes developers think they can make a remote object appear local.

XML-RPC is doubly evil because it makes developers think (i) they can make a remote object appear local and (ii) it is possible to convert a Java object to XML and back without error or ambiguity.

SOAP-RPC is triply evil because it makes developers think (i) they can make a remote object appear local, (ii) it is possible to convert a Java (or C#) object to XML and back without error or ambiguity and (iii) even though the SOAP message travelling between two systems is now a "contract" to be honoured regardless of changes to implementations, it can still be generated from implementation classes, and implementation classes can be generated from it.
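Point (iii) is easy to demonstrate with a toy generator. The sketch below is not how any real SOAP toolkit works; `generateXml`, `CustomerV1` and `CustomerV2` are hypothetical names, and the generator simply takes element names from field names via reflection. It's enough to show what happens to a "contract" that is derived from an implementation class.

```java
import java.lang.reflect.Field;

class GeneratedContractDemo {
    // Two versions of a hypothetical implementation class.
    // The only change is a harmless-looking field rename.
    static class CustomerV1 { String surname = "Lovelace"; }
    static class CustomerV2 { String familyName = "Lovelace"; }

    // Naive generator: element names are taken straight from field names.
    static String generateXml(Object bean) {
        StringBuilder xml = new StringBuilder("<customer>");
        try {
            for (Field field : bean.getClass().getDeclaredFields()) {
                if (field.isSynthetic()) {
                    continue; // ignore compiler- or tool-generated fields
                }
                xml.append('<').append(field.getName()).append('>')
                   .append(field.get(bean))
                   .append("</").append(field.getName()).append('>');
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return xml.append("</customer>").toString();
    }

    public static void main(String[] args) {
        // <customer><surname>Lovelace</surname></customer>
        System.out.println(generateXml(new CustomerV1()));
        // <customer><familyName>Lovelace</familyName></customer>
        System.out.println(generateXml(new CustomerV2()));
    }
}
```

An internal refactoring has silently changed the message on the wire, which is exactly what a contract is supposed to prevent.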

What's the truth, then?

Nothing glamorous or counter-intuitive.

1. A copy is a copy, not the original.
2. XML is XML and Java is Java (and C# is C#).
3. Contracts are First Class Entities, just like Domain Objects in an OO system.

SOAP-based Web Services technology today is sadder and wiser because it has internalised these truths. Because of Truth 1, we don't pretend anymore that a SOAP message is a Remote Procedure Call. We know it's just a message. Because of Truths 2 and 3, we don't believe anymore that we can generate XML documents from Java/C# objects, or Java/C# objects from XML documents. We can only map data between these two forms. If you build your Web Services abiding by these principles, you can stay out of trouble. Otherwise, you will be left wondering why your systems are so brittle.
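As a rough sketch of what "mapping" means here, assume a contract that defines a `customerUpdate` message with `givenName`, `familyName` and `postcode` elements. That message, the `Customer` class and the `mapFromMessage` method are all hypothetical, made up for this example. The mapping is written by hand and the domain class is free to have a different shape from the message.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ContractMappingDemo {
    // Hypothetical domain class. Its shape deliberately differs from the message:
    // one fullName field instead of two name elements, and a numeric postcode.
    static class Customer {
        final String fullName;
        final int postcode;

        Customer(String fullName, int postcode) {
            this.fullName = fullName;
            this.postcode = postcode;
        }
    }

    // Explicit mapping from the contract's element names to domain fields.
    // If the domain class changes, only this method changes; the message does not.
    static Customer mapFromMessage(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            String fullName = text(root, "givenName") + " " + text(root, "familyName");
            int postcode = Integer.parseInt(text(root, "postcode"));
            return new Customer(fullName, postcode);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String text(Element root, String tag) {
        return root.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String message = "<customerUpdate>"
            + "<givenName>Ada</givenName>"
            + "<familyName>Lovelace</familyName>"
            + "<postcode>2000</postcode>"
            + "</customerUpdate>";

        Customer customer = mapFromMessage(message);
        System.out.println(customer.fullName); // Ada Lovelace
        System.out.println(customer.postcode); // 2000
    }
}
```

The receiving code knows it has been handed a message and decides for itself what to build from it. Nothing here pretends that an object arrived from the other side.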

(Actually, there are still people who don't understand these Truths, and who still go on about SOAP-RPC. That's what makes me despair about SOAP's chances for success.)

5 comments:

Unknown said...

I come from a ColdFusion world, so forgive me if I am totally ignorant of something that is natural for Java developers; but, from what I understood, XML-RPC is just a standard for sending and receiving XML messages. I don't think there is anything inherent about this whole distributed vs. "local" issue that depends on XML-RPC or RPC?

I think that is a whole other issue altogether? Can't XML-RPC just be used to create a system API that is designed to be accessed remotely without any sort of serialization / deserialization?

prasadgc said...

Ben,

If the end goal of the exercise is to send or receive XML messages, then there isn't very much wrong with XML-RPC except the brittleness that comes with tight coupling (both systems have to be running simultaneously).

The problem with XML-RPC as it is commonly used is that systems at both ends natively speak some other language, usually Java, but increasingly C#. The native "domain objects", in which the applications "think", are the main players. The XML is a sort of communication protocol, but the evils start to come in when the two applications are actually trying to send each other _objects_, using a mechanism that makes it look, at the other end, as if the objects are _local_.

In other words, if I pretend that I'm an application and I can clearly see that an XML message has come to me from some other application, no harm done. But if what I see is an _object_ that looks like it's one of mine, then I have been badly deceived.

To answer your second question, no serialisation or deserialisation is involved if we treat an XML message as an XML message. But if we're ever sending an _object_ over a wire, then serialisation/deserialisation has to happen. And conversion to XML is considered a form of serialisation.

I hope this explanation makes my post more meaningful.

Unknown said...

@Ganesh,

Thanks, that makes more sense. In ColdFusion - my server-side language of choice - the idea of serializing objects has only come to be with the most recent version (ColdFusion 8). Before that, you could flatten arrays and structures, but certainly nothing with data and methods.

Frankly, I don't even understand how a complex object with private data and business logic can be serialized (even in ColdFusion). I mean, if an object has private data, which is really one of the main features of an object in any system (?), how can that be serialized without that data protection somehow being violated?

But that discussion aside, I see what you are saying. You make a good point.

tomar_prithu said...

I am not sure, but it sounds as if you are saying that for every change we must interact directly with an object at the other end. That would probably increase the traffic, and the dependability of the network then becomes a factor as well.

Also if privacy is needed, then probably communication should be encrypted.