Intermediary-based Transcoding Framework

Steven C. Ihde   Paul P. Maglio   Jörg Meyer   Rob Barrett
IBM Almaden Research Center, San Jose, California
{ihde, pmaglio, jmeyer, barrett}


With the rapid increase in the amount of content on the World Wide Web (WWW), it is now clear that information cannot always be stored in a form that anticipates all of its uses. One solution to this problem is to create transcoding intermediaries that convert data from one form to another on demand. Until now, transcoders have typically been built to convert one particular data format into another (e.g., [5,6]). A more flexible approach is to create reusable transcoding operations that can be composed as needed. We describe a framework for document transcoding that is meant to simplify the problem of composing transcoding operations. Because the capabilities of all operations are specified in a uniform way, our framework can correctly combine operations to convert arbitrary input formats to arbitrary output formats.


We first provide a few definitions. A data object represents content to be transformed (i.e., a sequence of bytes). A type indicates the form in which the data object is represented (including information about the kind of data object and the way in which bytes are encoded, such as "image/gif"). Properties represent attributes of particular data types (for instance, the type "text/xml" might have the property "DTD", which could take on values such as ""). A format combines a type and a set of properties, such as ("text/xml", (("DTD", "foo"))), indicating that this particular set of bytes (data object) is encoded as "text/xml" (type) with DTD "foo" (property).
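These definitions can be made concrete with a small data structure. The following is a minimal sketch, not the framework's actual representation: the names `Format` and `make_format` are hypothetical, and modeling properties as a frozen set of (name, value) string pairs is an assumption made so that formats can be compared and hashed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Format:
    """A type (e.g. "text/xml") combined with a set of (name, value) properties."""

    type: str
    properties: FrozenSet[Tuple[str, str]] = frozenset()


def make_format(type_: str, **properties: str) -> Format:
    # Hypothetical convenience helper: build a Format from keyword properties.
    return Format(type_, frozenset(properties.items()))


# A format with no properties, and the ("text/xml", (("DTD", "foo"))) example.
gif = make_format("image/gif")
xml_foo = make_format("text/xml", DTD="foo")
```

Because the sketch keeps formats immutable and hashable, two formats are equal exactly when their type and all of their properties agree, which is the comparison a matching step would need.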

Transcoding takes a data object in a format that is convenient for its supplier and converts it into a data object in a format that is convenient for its consumer. It does not matter whether this happens at the supplier, at the consumer, or somewhere in between [1,2]. Intermediaries are particularly well suited to the task, as they can be operated by a neutral third party, or set up by either supplier or consumer to avoid changes to existing systems. To this end, we developed an architecture for intermediary-based transcoding (for an alternative approach, see [3,4]). Our architecture is modular, allowing developers to separate functionality into well-defined units. It is also pluggable, meaning that units of functionality can be combined in ways not foreseen by their authors to achieve new transformations.

Architecturally, we break a transcoding operation down into several steps:

  1. Individual transcoders advertise their capabilities to a "master transcoder".
  2. Some outside entity makes a request to the master transcoder.
  3. The master transcoder arranges for appropriate individual transcoders to perform the work.
The result is that transcoders can be composed, so the system can perform any transcoding operation that is reachable by chaining the available transcoders.

Each transcoder enumerates one or more transcoding capabilities. Each capability lists an input format and an output format. For example, a simple transcoder designed to transcode HTML pages into WML for display on a cell phone might advertise its input format as "text/html" and its output format as "text/wml". A transcoder designed to help several systems using different XML DTDs to describe the same type of data work together might advertise several capabilities: (1) input format: ("text/xml", (("DTD", "dtd1"))), output format: ("text/xml", (("DTD", "dtd2"))); and (2) input format: ("text/xml", (("DTD", "dtd2"))), output format: ("text/xml", (("DTD", "dtd1"))).
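The two examples above might be written down as follows. This is an illustrative sketch only: the `Capability` class, the `fmt` helper, and the exact-match test in `can_handle` are assumptions, and the framework's own capability language is not shown in this paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# A format is modeled here as (type, frozenset of (property, value) pairs).
Format = Tuple[str, FrozenSet[Tuple[str, str]]]


def fmt(type_: str, **props: str) -> Format:
    # Hypothetical helper for building a format tuple.
    return (type_, frozenset(props.items()))


@dataclass(frozen=True)
class Capability:
    """One advertised conversion: an input format and an output format."""

    input_format: Format
    output_format: Format


# The HTML-to-WML transcoder advertises a single capability.
html_to_wml: List[Capability] = [Capability(fmt("text/html"), fmt("text/wml"))]

# The DTD-bridging transcoder advertises one capability per direction.
dtd_bridge: List[Capability] = [
    Capability(fmt("text/xml", DTD="dtd1"), fmt("text/xml", DTD="dtd2")),
    Capability(fmt("text/xml", DTD="dtd2"), fmt("text/xml", DTD="dtd1")),
]


def can_handle(capabilities: List[Capability], source: Format, target: Format) -> bool:
    # Assumed matching rule: both formats must match an advertised pair exactly.
    return Capability(source, target) in capabilities
```

Under this assumed rule, `can_handle(dtd_bridge, fmt("text/xml", DTD="dtd1"), fmt("text/xml", DTD="dtd2"))` is true, while `html_to_wml` cannot convert WML back to HTML because it advertises only one direction.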

Before the master transcoder receives a request, some party external to the transcoding system will have determined the object's current format and its desired output format. Given a request, the master transcoder examines the capabilities of each transcoder to find a single transcoder or a set of transcoders that can perform the requested operation. Once the master has selected the appropriate set, each is invoked in turn with two inputs: (a) the output of the previous transcoder (or the original input, in the case of the first transcoder selected); and (b) a "transcoder operation", which is a request to perform one or more of the operations advertised in a transcoder's capabilities statement. Every transcoder operation specifies the input format of the object being supplied, and the desired output format of the object to be produced.
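The invocation sequence described above can be sketched as a simple loop. Everything named here (`Operation`, `Step`, `run_chain`, and the two toy transcoders with their made-up format names) is hypothetical, and formats are reduced to plain strings for brevity; the sketch only illustrates that each transcoder receives the previous output together with its own operation.

```python
from typing import Callable, List, NamedTuple


class Operation(NamedTuple):
    """What one transcoder is asked to do: convert input_format to output_format."""

    input_format: str
    output_format: str


# A transcoder is modeled as a function of (data object, operation) -> data object.
Transcoder = Callable[[bytes, Operation], bytes]


class Step(NamedTuple):
    transcoder: Transcoder
    operation: Operation


def run_chain(data: bytes, steps: List[Step]) -> bytes:
    # Feed the original input to the first transcoder, then pass each
    # transcoder's output to the next one in the selected order.
    for step in steps:
        data = step.transcoder(data, step.operation)
    return data


# Toy transcoders standing in for real conversions (illustrative only).
def to_upper(data: bytes, op: Operation) -> bytes:
    return data.upper()


def add_bang(data: bytes, op: Operation) -> bytes:
    return data + b"!"


steps = [
    Step(to_upper, Operation("text/plain", "text/x-upper")),
    Step(add_bang, Operation("text/x-upper", "text/x-loud")),
]
result = run_chain(b"hello", steps)
```

Passing the operation alongside the data lets one transcoder that advertises several capabilities know which of them it is being asked to perform on a given call.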

The master transcoder's job is to hide the details of transforming an object from the requestor, thereby letting the requestor concentrate on determining what format the object should be in. Because the capabilities of each transcoder are described in a formal language, the master transcoder can reason about every other transcoder without any built-in understanding of the formats involved. The master transcoder need only apply simple pattern-matching rules to find a transcoder that can satisfy the request. In some cases, a request can be satisfied by a single transcoder. In other cases, this is not possible. In those cases, the same formal descriptions allow the master transcoder to compose the operations of different transcoders and so accomplish conversions that their authors did not foresee. In essence, the master transcoder tries to find a chain through the pool of available transcoders, matching output formats to input formats, until the request is satisfied (see Figure 1).

Figure 1: The master transcoder selects an appropriate path through the pool of transcoders.
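The chain search can be illustrated with a breadth-first search over formats. The paper does not say which search strategy the master transcoder uses, so breadth-first search is an assumption here, as are the names `Capability` and `find_chain`, the sample pool, and the simplification of formats to plain type strings matched by equality.

```python
from collections import deque
from typing import Dict, List, NamedTuple, Optional


class Capability(NamedTuple):
    transcoder: str      # name of the transcoder advertising this capability
    input_format: str
    output_format: str


def find_chain(pool: List[Capability], source: str, target: str) -> Optional[List[Capability]]:
    """Return a shortest list of capabilities leading from source to target, or None."""
    if source == target:
        return []
    # came_from maps each reached format to the capability that produced it.
    came_from: Dict[str, Optional[Capability]] = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for cap in pool:
            # Match this capability's input to the format we currently hold,
            # and skip output formats that have already been reached.
            if cap.input_format != current or cap.output_format in came_from:
                continue
            came_from[cap.output_format] = cap
            if cap.output_format == target:
                # Walk backwards from the target to rebuild the chain.
                chain: List[Capability] = []
                step = came_from[target]
                while step is not None:
                    chain.append(step)
                    step = came_from[step.input_format]
                chain.reverse()
                return chain
            queue.append(cap.output_format)
    return None


pool = [
    Capability("gif2jpeg", "image/gif", "image/jpeg"),
    Capability("html2xml", "text/html", "text/xml"),
    Capability("xml2wml", "text/xml", "text/wml"),
]
chain = find_chain(pool, "text/html", "text/wml")
```

With this pool, no single transcoder converts "text/html" to "text/wml", but the search finds the two-step chain through "text/xml". A request with no connecting path, such as "image/gif" to "text/wml", yields `None`.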


Because transcoding is an intermediary application, we built our transcoding framework on top of WBI (see [1,2,7]). In particular, the transcoding framework is implemented as a WBI plugin that consists of the master transcoder and various specific transcoders (such as a GIF-to-JPEG transcoder, or an XML-to-XML converter based on XSL processing). In WBI terms, the master transcoder is a document editor that receives the original object (e.g., GIF) as input and produces a modified object (e.g., JPEG) as output according to some requirements. The master transcoder sits in the data stream between client and server. For each object that flows along this stream, WBI calls the master transcoder so that it may inspect the request and the original object and produce an appropriate response. If transcoding is necessary, the master transcoder determines the appropriate transcoder or combination of transcoders and arranges for them to be called in the correct order.


  1. Barrett, R. & Maglio, P. P. (1999). Intermediaries: An approach to manipulating information streams. IBM Systems Journal, 38, 629-641.
  2. Barrett, R. & Maglio, P. P. (1998). Intermediaries: New places for manipulating and producing web content. Computer Networks and ISDN Systems, 30, 509-518.
  3. Fox, A. & Brewer, E. A. (1996). Reducing WWW latency and bandwidth requirements by real-time distillation. In Proceedings of the Fifth International World Wide Web Conference (WWW5).
  4. Fox, A., Gribble, S. D., Chawathe, Y., & Brewer, E. A. (1998). Adapting to network and client variation using active proxies: Lessons and perspectives. IEEE Personal Communications.
  5. Smith, J. R., Mohan, R. & Li, C. (1998). Transcoding internet content for heterogeneous client devices. In Proceedings of IEEE Conference on Circuits and Systems (ISCAS).
  6. Tudor, P. N. & Werner, O. H. (1997). Real-time transcoding of MPEG-2 video bit streams. In IEEE Conference Publication of International Broadcasting Convention 1997, pp. 286-301.
  7. WBI Programming Tutorial. Available as