Hypermedia, past, present and future

Published on January 22, 2015

Hypermedia is not a new concept; it has been around in various forms since the 1960s.  However, in the past seven years there has been a significant resurgence of interest in it.  This blog post contains my reflections on those past few years, on where we currently are, and on where we might be headed in the use of hypermedia for building distributed applications.

The HTML Years

The majority of developers have only been exposed to hypermedia via HTML.  HTML is an "in-your-face" example of the success of hypermedia, and yet in most Web API related discussions it is dismissed with "well, that's different".  Many people draw a distinction between human-to-machine interactions and machine-to-machine interactions, and use that distinction to explain why API interactions are different.  To be honest, I've yet to see my parents open a web browser and construct an HTTP request by hand.  The web browser is itself a client application, running code that makes HTTP requests.  It is not completely autonomous, but no human invokes the calls to load a stylesheet, a JavaScript snippet, or an embedded image.

On the other hand, the web crawlers used by search engines are autonomous and they also consume HTML hypermedia.

There were numerous efforts over the years to present HTML, or more specifically XHTML, as a viable media type for Web APIs.  Jon Moore from Comcast made the biggest splash, but Microsoft also made efforts in this area, and I participated in workshops at RESTFest where we used XHTML as the response media type.

Despite this, the use of HTML in Web APIs has never really gained significant traction.

RDF's Lofty Goals

RDF has been around for almost as long as HTML, but it has never made significant progress outside of academia.  I've talked to a significant number of very smart people who believe RDF is the answer.  However, I suspect it is the Xanadu, or the Betamax, of media types.

The challenge with RDF is that it can be quite difficult to grok.  First of all, there are a variety of different serialization formats: Turtle, N3, RDFa and now JSON-LD.  This can make learning RDF tricky, because you really need to learn the conceptual model first and then learn the mapping used by each serialization.

The model of RDF is based around triples, i.e. subject, predicate and object, where these elements are often identified using URIs.  This can produce quite cumbersome looking documents.
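As a rough illustration, here is the triple model sketched in plain Python.  The example.org resources are hypothetical; the predicate URIs are real ones from the FOAF vocabulary.

    # Each RDF fact is a (subject, predicate, object) triple; subjects and
    # predicates are URIs, objects are URIs or literals.
    triples = [
        ("http://example.org/people/alice",    # subject (hypothetical)
         "http://xmlns.com/foaf/0.1/knows",    # predicate (FOAF vocabulary)
         "http://example.org/people/bob"),     # object: another resource
        ("http://example.org/people/alice",
         "http://xmlns.com/foaf/0.1/name",
         "Alice"),                             # object: a literal
    ]

    # Two simple facts already require four URIs -- hence the cumbersome feel.
    for s, p, o in triples:
        print(s, p, o)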

My experience has been that most developers don't have the patience for the sophistication of RDF and quickly want to hide the complexity behind tooling.  Tooling can help or hurt; it just depends on who is writing the tooling and what their goals are.

It is possible that JSON-LD will prove me wrong.  A number of organizations are starting to adopt it.  Only time will tell if it will stick.

Feeds

The RSS format was originally based on an early working draft of RDF.  In 2005 the Atom Syndication Format was released as a replacement for RSS.  Both of these formats had the specific goal of allowing content creators and distributors to advertise regularly produced content along with metadata about that content.

These formats spawned the creation of a wide range of new client applications that could consume these feeds.

Feeds As Containers

The success of the Atom format spurred API providers to consider using Atom as a container for data other than blog posts and news items.  Big players like Google and Microsoft created GData and OData, similar efforts in which Atom was used as an envelope for API data.

In order for feed readers to be useful, they needed to understand the contents of the Atom entry element.  Most often this content was HTML and could be rendered with an HTML rendering library.  However, when Atom feeds were used in APIs for carrying non-HTML data, there was no standardized way for clients to understand the contents of the Atom entry.  The custom data content was identified using XML namespaces that a client was expected to recognize.  Unfortunately, XML namespaces introduce URIs, prefixes and the ability to mix multiple namespaces in a single document.  This starts to bring back the complexity of RDF.
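To make the problem concrete, here is a minimal sketch in Python of a client digging custom namespaced data out of an Atom entry.  The inv namespace and the invoice payload are hypothetical.

    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    INV = "{http://example.org/ns/invoice}"   # hypothetical custom namespace

    feed = ET.fromstring("""
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:inv="http://example.org/ns/invoice">
      <entry>
        <title>Invoice 42</title>
        <content type="application/xml">
          <inv:invoice><inv:total>99.50</inv:total></inv:invoice>
        </content>
      </entry>
    </feed>
    """)

    for entry in feed.findall(ATOM + "entry"):
        content = entry.find(ATOM + "content")
        # A generic feed reader has no idea what inv:invoice means; only a
        # client hard-coded to recognize this namespace can do anything here.
        invoice = content.find(INV + "invoice")
        total = invoice.find(INV + "total").text
        print("Invoice total:", total)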

GData didn't last very long; OData has had some successes and some failures and is currently on version 4.

Overall, OData was more ambitious than GData in that it defined a standardized querying mechanism and built on top of Microsoft's CSDL, the metadata language used by its ORM, Entity Framework.  This allowed lots of tooling to be built around OData.

Another more recent effort in this area is Activity Streams.  Initially I understood it to be a generalized way of advertising lists of events that occur; it seems to have evolved into not only an activity stream container but also a mechanism for describing activities using vocabularies.

Linking and Embedding

Late 2010 saw the beginning of a flurry of activity in the hypermedia space.  Mike Kelly created HAL, a JSON-based format that supports linking to other resources and embedding portions of other resources into representations.
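Here is a minimal sketch of a HAL document, written as a Python dict.  The order resource and its URLs are hypothetical; the _links and _embedded keys are HAL's reserved properties.

    # application/hal+json: linking plus embedding.
    hal_order = {
        "_links": {
            "self":     {"href": "/orders/123"},
            "customer": {"href": "/customers/7"},   # link to a related resource
        },
        "_embedded": {
            # a portion of another resource embedded in this representation
            "items": [
                {"_links": {"self": {"href": "/orders/123/items/1"}},
                 "sku": "ABC", "quantity": 2},
            ],
        },
        "status": "shipped",
    }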

In 2011 Mike Amundsen created Collection+JSON, which provides a way of representing a list of things as well as mechanisms to search the list and add to it.
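And a minimal sketch of a Collection+JSON document, again as a Python dict, with hypothetical URLs and fields.  The queries and template elements are what support searching the list and adding to it.

    # application/vnd.collection+json: a list, plus ways to search and extend it.
    cj_friends = {
        "collection": {
            "version": "1.0",
            "href": "/friends/",
            "items": [
                {"href": "/friends/1",
                 "data": [{"name": "full-name", "value": "Ann"}]},
            ],
            # how a client searches the list
            "queries": [
                {"rel": "search", "href": "/friends/search",
                 "data": [{"name": "q", "value": ""}]},
            ],
            # the fill-in form a client completes and POSTs back to add an item
            "template": {
                "data": [{"name": "full-name", "value": "", "prompt": "Full Name"}],
            },
        }
    }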

Siren, Mason, Uber and JSON API all followed in subsequent years, each attempting to address perceived shortcomings of HAL and C+J.

The Great Form Debate

One of the features that has been continually debated is the need for forms support in hypermedia types.  HAL does not support forms; Siren, Mason, C+J and Uber do.  HAL relies heavily on link relation types to convey both read and write semantics.  The authors of the other formats feel it is more valuable to have explicit syntax for describing write operations.
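For a flavour of what that explicit syntax looks like, here is a minimal sketch of a Siren action as a Python dict; the order resource and its fields are hypothetical.

    # application/vnd.siren+json: actions carry explicit write semantics.
    siren_order = {
        "class": ["order"],
        "properties": {"status": "open"},
        "actions": [
            {"name": "add-item",
             "title": "Add Item",
             "method": "POST",                 # the write operation, spelled out
             "href": "/orders/123/items",
             "type": "application/x-www-form-urlencoded",
             "fields": [
                 {"name": "sku", "type": "text"},
                 {"name": "quantity", "type": "number"},
             ]},
        ],
        "links": [{"rel": ["self"], "href": "/orders/123"}],
    }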

[Image: debating monks]

Spoilt For Choice

The growing REST community has learned a great deal in the process of developing these specifications.  However, it is now left with a whole new problem.  Developers who want to start down the path of using hypermedia in their APIs need to choose from a range of formats that are only subtly different in their capabilities.  How are developers supposed to choose?

[Image: beer taps]

In the past, choosing a media type was never a particularly difficult proposition, because media types tended to be built for a specific purpose.  HTML was originally designed for describing a textual document, image/jpeg is ideal for photographs, text/css for stylesheets, etc.

However, this most recent set of media types has focused on message format semantics, i.e. the ability to link to other content, embed content and describe affordances, all without saying anything about the application domain.  This has the advantage that they can be used within any application domain, for almost any purpose; however, it also leaves them with no meaning and no purpose of their own.

Meanwhile On Another Planet

In contrast to these generic hypermedia types, the world of telecommunications developed a media type called VoiceXML.

[Image: planet]

VoiceXML was developed to drive audio and voice response applications.  Related to that effort were CCXML (Call Control eXtensible Markup Language), MSML (Media Server Markup Language), MSCML (Media Server Control Markup Language) and, most recently, SCXML (State Chart XML).

All of these media types are used as part of a larger system, but each one is designed to address a specific problem domain.

The Great Divide

Earlier I accused some media types of having no meaning and no purpose.  This is actually a design objective of some media types.  The idea is that a media type can limit its complexity by focusing on just the structural and protocol semantics of the message, leaving application and domain semantics to a separate mechanism.  Formats like RDF use a concept called ontologies to describe application semantics.  JSON-LD documents can reference vocabularies like the ones defined at schema.org.  HAL recently added support for a notion called Profiles, which existed in early versions of HTML and has been resurrected in recent years.  The idea of Profiles is that you can independently apply a set of domain semantics to a hypermedia message.
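As a minimal sketch, here is a HAL document that layers domain semantics on top through an RFC 6906 profile link; the invoice profile URL is hypothetical.

    # The generic media type stays the same; the profile link tells clients
    # which set of domain semantics applies to this message.
    hal_invoice = {
        "_links": {
            "self":    {"href": "/invoices/42"},
            "profile": {"href": "https://example.org/profiles/invoice"},
        },
        "total": 99.50,
    }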

The advantage of using Profiles is that your API only needs to support a small set of media types, possibly only one, and application semantics can be layered on top.  This limits the time and risk involved in designing and documenting new media types.  Mike Amundsen has been spearheading an effort to define a description language called ALPS that makes it possible to write profile description documents.
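A rough sketch, from my reading of the ALPS drafts, of what a profile description might look like, again as a Python dict; the contact-list descriptors are hypothetical.

    # An ALPS document lists descriptors for application semantics,
    # independent of any particular media type.
    alps_contacts = {
        "alps": {
            "version": "1.0",
            "descriptor": [
                {"id": "fullName", "type": "semantic",
                 "doc": {"value": "A person's full name"}},
                {"id": "search", "type": "safe",   # a read-only transition
                 "doc": {"value": "Search the contact list"}},
            ],
        }
    }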

[Image: paint explosion]

Media Type Explosion

One of the pieces of lore in the hypermedia community is that we must avoid a phenomenon called "media type explosion".  The fear is that if every API provider begins creating media types for all of their application semantics, we will massively dilute the re-usability of media types and potentially introduce security vulnerabilities.  Also, the process of registration and expert review is not designed to be efficient for large numbers of media types that may never be suitable for public consumption.

Ironically, the fear of media type explosion has discouraged people from creating media types, and they have instead resorted to tunneling application semantics over generic media types like application/json.  The result is an explosion of implicit JSON structures.  Not exactly the desired effect.

API Media Types

As a popular alternative to reusing existing media types, or creating media types for each application concept, some APIs define a single media type for the entire API.  This approach gives an API a central place to document message format conventions and structure.  APIs like GitHub, Heroku and Sun's Cloud API take this approach.  It is an interesting compromise.  However, it limits the potential for re-use, because the definition of the media type is scoped to the particular API.
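GitHub's vendor media type is a concrete example; a client opts into it through the Accept header.  A sketch using Python's requests library:

    import requests

    # One media type, application/vnd.github.v3+json, scopes the message
    # conventions to the whole GitHub API.
    resp = requests.get(
        "https://api.github.com/repos/octocat/Hello-World",
        headers={"Accept": "application/vnd.github.v3+json"},
    )
    print(resp.json()["full_name"])   # octocat/Hello-World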

Horizontal Media Types

Ideally, at least in my opinion, media types would be built that solve a specific problem but could be re-used across many APIs.  The end goal would be to be able to build APIs by composing a set of pre-defined media types, minimizing or even eliminating the need to create new ones to support the API.

Getting widespread agreement on application domain types, like customer, invoice, work task, employee, etc., is especially difficult.  This can be seen if you dig into the history of standards like ANSI X12, EDIFACT or UBL.

However, many aspects of applications are very similar from one application to the next: users, accounts, permissions, roles, errors, lists, reports, long running operations, filters, tables, graphs, dashboards.

I believe these are the low-hanging fruit for building re-usable media types.

Link Relation Types Add Context

Media types are not the only way to convey semantic information to a client.  Link relations are an extremely valuable way to add semantic context to more generic media types.  Consider a media type that describes a street address.  An address is a fairly well understood concept that could be defined precisely enough for use in a wide range of scenarios.

Creating link relations like "ShipTo", "InvoiceTo", "Home", "Work" and "Destination" can provide enough additional semantics on top of a fairly generic media type to allow a client to make intelligent choices.
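A minimal sketch of the idea as a HAL-style Python dict; the application/address+json media type and the URLs are hypothetical.

    # The two targets share one generic address media type; the link relation
    # names supply the context a client needs to choose between them.
    order = {
        "_links": {
            "self":      {"href": "/orders/123"},
            "ShipTo":    {"href": "/addresses/55", "type": "application/address+json"},
            "InvoiceTo": {"href": "/addresses/18", "type": "application/address+json"},
        }
    }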

Even link relations differ in the style in which they are defined.  Some are very generic, like "next" and "previous".  Others have precisely defined behaviour, like "hub" and "oauth2-token".

So, What Should I Use?

If only there were an easy answer to that question.  The hard answer is: learn about the options you have, understand the pros and cons of each approach, and then consider the context of the application you are trying to build.  At that point you may have enough information to choose the right solution for your problem.  Good luck!  And make sure you tell everyone about your experiences.  We are all learning together.

Image Credits:
Janus: http://davy.potdevin.free.fr/Site/links.html
Beer Taps: https://flic.kr/p/3qbi
Planet: https://flic.kr/p/5WBkp9
Paint Explosion: https://flic.kr/p/7ZNa5C