Atom Content Negotiation (follow-up)

Discussion:

Erik Wilde

2011-04-30 19:20:20 UTC

hello.

on
http://dret.typepad.com/dretblog/2011/04/atom-content-negotiation.html i
suggested a <link> usage that is not allowed by atom (because extension
attributes need to be namespaced), but i think the problem is worth a
solution and i'm planning to write a small draft for it.

the problem is that this link relation needs a parameter (specifying
what the media type of the linked feed/entry/content is) and maybe some
title text. following typical XML usage, it might be useful to define an
extension attribute (for the machine readable information) and allow a
title (for an optional human readable label). the link relation then
would be used this way (pointing to a feed with XML content):

<link rel="alternate-content" href="..."
alternate-content:content-type="application/xml" title="XML version
based on bla.xsd"/>

i am wondering what people think about this kind of link
relation. does it look too unusual (an attribute that would be required
for this link relation)? if so, what other way of representing it would
you prefer?
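
A minimal sketch of how a client might read such a link once the extension attribute is namespaced, as Atom requires. The namespace URI, the `ac` prefix, the feed URL, and the function name `feed_variants` are assumptions made up for this illustration; nothing here is defined by Atom or by a published draft.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical namespace URI, invented for this sketch only.
AC_NS = "http://example.org/ns/alternate-content"

# Hypothetical feed; the href stands in for the "..." in the message above.
FEED = f"""<feed xmlns="{ATOM_NS}" xmlns:ac="{AC_NS}">
  <title>Example Feed</title>
  <link rel="alternate-content" href="http://example.org/feed-xml"
        ac:content-type="application/xml"
        title="XML version based on bla.xsd"/>
</feed>"""


def feed_variants(feed_xml):
    """Map each advertised content type to the href of that feed variant."""
    root = ET.fromstring(feed_xml)
    variants = {}
    # Look only at link elements that are direct children of the feed.
    for link in root.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") != "alternate-content":
            continue
        # Namespaced attributes are addressed as {namespace-uri}local-name.
        content_type = link.get(f"{{{AC_NS}}}content-type")
        if content_type:
            variants[content_type] = link.get("href")
    return variants


print(feed_variants(FEED))
```

The point of the sketch is only that a namespaced attribute stays machine readable with ordinary XML tooling, while the title attribute remains available as the human readable label.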

thanks and kind regards,

dret.

--
erik wilde | mailto:dret-TVLZxgkOlNX2fBVCVOL8/***@public.gmane.org - tel:+1-510-6432253 |
| UC Berkeley - School of Information (ISchool) |
| http://dret.net/netdret http://twitter.com/dret |

James Holderness

2011-04-30 20:02:18 UTC

Post by Erik Wilde
on
http://dret.typepad.com/dretblog/2011/04/atom-content-negotiation.html i
suggested a <link> usage that is not allowed by atom (because extension
attributes need to be namespaced), but i think the problem is worth a
solution and i'm planning to write a small draft for it.

I don't get why you rejected the "link content variants" solution. You don't necessarily have to include the HTML version inline - you could just have a very basic text summary.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

<title>Example Feed</title>
<link href="http://example.org/"/>
<updated>2003-12-13T18:30:02Z</updated>
<author>
<name>John Doe</name>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

<entry>
<title>Atom-Powered Robots Run Amok</title>
<link type="text/html" href="http://example.org/2003/12/13/atom03.html"/>
<link type="application/xml" href="http://example.org/2003/12/13/atom03.xml"/>
<link type="application/json" href="http://example.org/2003/12/13/atom03.json"/>
<link type="application/rdf+xml" href="http://example.org/2003/12/13/atom03.rdf"/>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<summary>Some text.</summary>
</entry>

</feed>
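
As a rough sketch of the client side of this "link content variants" approach, the following picks the variant of an entry by media type. The function name `variant_href` is an assumption for illustration, and the entry is a trimmed copy of the one in the feed above.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom-Powered Robots Run Amok</title>
  <link type="text/html" href="http://example.org/2003/12/13/atom03.html"/>
  <link type="application/xml" href="http://example.org/2003/12/13/atom03.xml"/>
  <link type="application/json" href="http://example.org/2003/12/13/atom03.json"/>
  <link type="application/rdf+xml" href="http://example.org/2003/12/13/atom03.rdf"/>
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <summary>Some text.</summary>
</entry>"""


def variant_href(entry_xml, wanted_type):
    """Return the href of the alternate link with the wanted media type."""
    entry = ET.fromstring(entry_xml)
    for link in entry.findall(ATOM + "link"):
        # A link without a rel attribute is interpreted as rel="alternate"
        # (RFC 4287, section 4.2.7.2).
        is_alternate = link.get("rel", "alternate") == "alternate"
        if is_alternate and link.get("type") == wanted_type:
            return link.get("href")
    return None


print(variant_href(ENTRY, "application/rdf+xml"))
```

A client that wants RDF would then issue one further GET for the returned URL, which is the per-entry cost discussed in the replies.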

And even the summary isn't strictly necessary. You don't seem to be particularly interested in producing a feed that would be useful to a typical feed reader, so you might as well go with the bare minimum necessary to make the feed valid.

If you think it absolutely necessary to create a new rel type to make your system work, you should probably be asking yourself whether you really want Atom as your data format.

TLDR: I think your proposal sucks.

Erik Wilde

2011-05-01 05:31:23 UTC

hello james.

Post by James Holderness
I don't get why you rejected the "link content variants" solution. You
don't necessarily have to include the HTML version inline - you could
just have a very basic text summary.

i don't reject it, but it makes it necessary to issue additional GET
requests for each single entry. if you know that all you ever want are
the linked alternate variants, then it would be much more effective to
simply GET those in a feed (if there was one and you could find it).

Post by James Holderness
And even the summary isn't strictly necessary. You don't seem to be
particularly interested in producing a feed that would be useful to a
typical feed reader, so you might as well go with the bare minimum
necessary to make the feed valid.

i am interested in producing feeds that are useful to feed-consuming
clients. human-oriented feed clients prefer HTML, whereas other types of
clients might prefer XML or RDF or something else.

looking forward, i want to use feeds for pushing content, in
particular in frameworks supporting fat pings (such as PubSubHubbub).
the effectiveness of pushing would be greatly diminished if, for each
pushed entry, the client then needed to GET the linked content
variant. instead, it would be much better if different clients were
able to discover feed variants and then get the content they want
pushed to them.

these use cases may not be the ones you're interested in, but they do
exist, and therefore i am wondering how to best address them.

thanks and cheers,

dret.


James Holderness

2011-05-02 07:15:44 UTC

Post by Erik Wilde
i don't reject it, but it makes it necessary to issue additional GET
requests for each single entry. if you know that all you ever want are
the linked alternate variants, then it would be much more effective to
simply GET those in a feed (if there was one and you could find it).

With HTTP pipelining those multiple GETs shouldn't have that much of a negative impact, although I'll admit that in practice pipelining does have potential problems.

But even without pipelining, after the first feed retrieval, I would still expect those multiple GETs to be more efficient on subsequent updates. Two small GETs to retrieve one new feed entry would seem more efficient than one giant GET of a 20 entry feed when you only want the latest entry (unless you're also supporting RFC3229+feed?)

I guess this does depend to some extent on the nature of your data, the frequency of updates, and the kind of clients you expect to have. Either way, though, I think you may be optimizing prematurely.

Post by Erik Wilde
i am interested in producing feeds that are useful to feed-consuming
clients. human-oriented feed clients prefer HTML, whereas other types of
clients might prefer XML or RDF or something else.

I still think you are going about it the wrong way. If the content-type for Atom doesn't sufficiently distinguish between the types of feed you want to serve for HTTP content negotiation to work, it seems to me you should be looking at ways to extend the content-type (like the type parameter that was proposed to distinguish between Atom entry documents and Atom feed documents). Inventing a new form of content negotiation that requires parsing links from the top of an Atom feed is just twisted.

But the real problem may just be an inappropriate use of Atom as a general purpose container format. If you have a client that prefers content as RDF, why not give them a true RDF feed? Why RDF embedded in Atom? That kind of misses the point of RDF. Even worse, if a client prefers JSON, do you really think it makes sense to serve them that JSON as base-64 encoded blobs inside an XML container? I can assure you that nobody is going to thank you for that option.

If you served RDF clients a real RDF feed, and JSON clients a real JSON format, then you could be using standard HTTP content negotiation - at least for most cases. If you wanted to provide an "HTML friendly" feed that was separate from the one with raw XML embedded, you'd still need a way to differentiate between the two, but at that point I would think it not worth the complication. I'd just make your feed content the raw XML, include a short summary, and then an alternate link to a more detailed HTML page if really necessary.
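
To make "standard HTTP content negotiation" concrete, here is a simplified sketch of server-driven negotiation over an Accept header. The function names and the list of available types are assumptions for illustration, and the matching is deliberately reduced: it orders media ranges by q-value only and ignores the specificity rules and media type parameters that a full implementation would honor.

```python
def parse_accept(header):
    """Return (media_range, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def matches(media_range, candidate):
    """True if a media range such as application/* covers the candidate."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return candidate.startswith(media_range[:-1])
    return media_range == candidate


def negotiate(accept_header, available):
    """Pick the first available type matching the client's best preference."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for candidate in available:
            if matches(media_range, candidate):
                return candidate
    return None


# Hypothetical set of representations offered for one resource.
AVAILABLE = ["application/atom+xml", "application/rdf+xml", "application/json"]
print(negotiate("application/rdf+xml;q=0.9, application/atom+xml;q=0.5", AVAILABLE))
```

This only works when each variant has its own media type. Two Atom feeds that differ in their embedded content both carry application/atom+xml, which is the gap the thread is arguing about.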

Erik Wilde

2011-05-02 16:09:19 UTC

hello james.

Post by James Holderness
With HTTP pipelining those multiple GETs shouldn't have that much of a
negative impact, although I'll admit that in practice pipelining does
have potential problems.
But even without pipelining, after the first feed retrieval, I would
still expect those multiple GETs to be more efficient on subsequent
updates. Two small GETs to retrieve one new feed entry would seem more
efficient than one giant GET of a 20 entry feed when you only want the
latest entry (unless you're also supporting RFC3229+feed?)
I guess this does depend to some extent on the nature of your data,
the frequency of updates, and the kind of clients you expect to have.

of course it does. and please note that everything i propose is entirely
optional. if you want to do it old-style, then that's fine.
but in particular in the case of push with fat pings, there is a very
substantial difference between receiving the data in the
required format via push updates and having to GET the alternate version
of every single update.

Post by James Holderness
I still think you are going about it the wrong way. If the content-type
for Atom doesn't sufficiently distinguish between the types of feed you
want to serve for HTTP content negotiation to work, it seems to me you
should be looking at ways to extend the content-type (like the type
parameter that was proposed to distinguish between Atom entry documents
and Atom feed documents). Inventing a new form of content negotiation
that requires parsing links from the top of an Atom feed is just twisted.

i am not saying that i am proposing the perfect solution, and there are
certainly different ways of approaching the problem i am looking at. it
seems to me, however, that doing this in a cleaner way would require
very substantial changes to the
core of media types and HTTP content negotiation. it would be very nice
to have HTTP content negotiation doing this, but unfortunately, atom
sort of "hides" the "real content" (the /feed/entry/content) from
visibility on the HTTP level.

Post by James Holderness
But the real problem may just be an inappropriate use of Atom as a
general purpose container format. If you have a client that prefers
content as RDF, why not give them a true RDF feed? Why RDF embedded in
Atom? That kind of misses the point of RDF. Even worse, if a client
prefers JSON, do you really think it makes sense to serve them that JSON
as base-64 encoded blobs inside an XML container? I can assure you that
nobody is going to thank you for that option.

saying that atom is inappropriate as a general purpose container format
is your opinion. i happen to think that atom is exactly that. afaik, RDF
does not have feeds, and if i as a service designer choose to model my
service in a RESTful way based on atom abstractions, then atom probably
is a very good way to represent that. serving RDF content as well would
just be a convenience feature for those who would prefer to get RDF
representations of what my service exposes.

here is something i wrote a while ago about atom as a general container:

http://dret.typepad.com/dretblog/2009/05/atoms-future-as-a-generalpurpose-format.html

you might not agree with what i am saying here, but there are more and
more services where feeds are used as the RESTful abstraction of the
service model, and this is the area i am looking at for the proposal i
have made.

Post by James Holderness
If you served RDF clients a real RDF feed, and JSON clients a real JSON
format, then you could be using standard HTTP content negotiation - at
least for most cases. If you wanted to provide an "HTML friendly" feed
that was separate from the one with raw XML embedded, you'd still need a
way to differentiate between the two, but at that point I would think it
not worth the complication. I'd just make your feed content the raw XML,
include a short summary, and then an alternate link to a more detailed
HTML page if really necessary.

again, you're saying "real RDF feed" and "real JSON feed" and these are
not things i am aware of. it might be interesting to think of RDF and
JSON serializations of the atom data model, but this would actually be
very hard to do well because of atom's openness and support of XML
namespaces. it's a very different discussion anyway, but we are
currently in the process of encoding atom as an RDF ontology, and i can
tell you that it's surprisingly complicated.

so my assumption is to have atom XML as the feed container, so that there
is only one representation of the feed abstraction that can be handled
by intermediaries and value-added components (such as push frameworks), and
then make sure that various types of content can be transported in that
framework in a well-defined and flexible way.

cheers,

dret.


Hadrien Gardeur

2011-05-02 16:18:49 UTC

Post by James Holderness
If the content-type for Atom doesn't sufficiently distinguish between the
types of feed you want to serve for HTTP content negotiation to work, it
seems to me you should be looking at ways to extend the content-type (like
the type parameter that was proposed to distinguish between Atom entry
documents and Atom feed documents). Inventing a new form of content
negotiation that requires parsing links from the top of an Atom feed is just
twisted.

+1. Media parameters are not sufficiently well defined yet, but I believe
that this is the right solution to this problem.

Erik Wilde

2011-05-02 17:29:56 UTC

hello hadrien.

Post by Hadrien Gardeur
+1. Media parameters are not sufficiently well defined yet, but I
believe that this is the right solution to this problem.

philosophically, i agree. the question i am asking myself is how likely
it is that this is actually going to happen. it might require a rewrite
of the media type specification, and a rewrite of how content
negotiation in HTTP interacts with that updated media type model. is
that something that is likely to happen, in the mid term or at all? i am
having my doubts, which is why i was suggesting my admittedly suboptimal
approach, which has the advantage of layering on top of the existing
architecture without the need for updating some of the core standards.
short version: do you think the right solution is going to happen? and
if not, what then?

cheers,

dret.


Hadrien Gardeur

2011-05-03 14:05:55 UTC

Sure, it's not likely at all to happen in the mid term, but with media
parameters you're not "breaking" anything either. If we want those
specifications to evolve, we need more examples and people interested in
those parameters. The AtomPub spec with its "type" parameter for Atom is a
good start but we need to go further.

Erik Wilde

2011-05-09 21:21:02 UTC

hello.

Post by Hadrien Gardeur
Sure, it's not likely at all to happen in the mid term, but with media
parameters you're not "breaking" anything either. If we want those
specifications to evolve, we need more examples and people interested in
those parameters. The AtomPub spec with its "type" parameter for Atom is a
good start but we need to go further.

like i said before, philosophically, i do agree that this would be
cleaner and the better approach. but we would need to update the atom
spec which, as i am reading it, does not allow for extensibility of
media type parameters:

http://tools.ietf.org/html/rfc4287#section-7 says that the only allowed
parameter is "charset". i think the way to go for all of this to be nice
and clean would be to add a "content-type" parameter, which as a value
would have a media type (the media type of the content embedded or
linked to by the feed). this raises two questions:

- is that kind of parameter value even allowed in a media type? and if
it is, how many media type parsers might possibly break because a media
type would contain "two media types"?

- how realistic is it to update RFC 4287 in that way, which as i
understand it would not be backwards compatible?
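
On the first question, a small sketch may help. A "/" is not allowed in an unquoted parameter value (it is a tspecial in MIME and a separator in HTTP), so a media type used as a parameter value has to be written as a quoted string. The `content-type` parameter below is the hypothetical one proposed in this message and is not defined by RFC 4287; the helper name `parse_media_type` is also an assumption. The sketch uses Python's standard `email.message.Message` parser, which handles the quoted value; how other parsers behave is not something this example can show.

```python
from email.message import Message


def parse_media_type(value):
    """Split a media type string into (type/subtype, {parameter: value})."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns (name, value) pairs; the first pair is the
    # media type itself, so it is skipped here. Quoted values are unquoted.
    params = {name.lower(): val for name, val in msg.get_params()[1:]}
    return msg.get_content_type(), params


# "type" is the AtomPub parameter (RFC 5023, section 12);
# "content-type" is hypothetical, as proposed in this thread.
media_type, params = parse_media_type(
    'application/atom+xml; type=feed; content-type="application/rdf+xml"'
)
print(media_type, params)
```

A parser that splits naively on "/" or does not support quoted strings could trip over such a value, which is the compatibility risk raised in the question.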

if that was a realistic way to go, even things such as
http://www.w3.org/TR/html5/links.html#rel-alternate might need to be
updated, so that the new features would be allowed/supported (and
shouldn't HTML5 allow the optional charset parameter that is possible
according to RFC 4287?).

any feedback on this would be greatly appreciated. kind regards,

dret.


Hadrien Gardeur

2011-05-11 12:51:13 UTC

Post by Erik Wilde
like i said before, philosophically, i do agree that this would be cleaner
and the better approach. but we would need to update the atom spec which, as
i am reading it, does not allow for extensibility of media type parameters:
http://tools.ietf.org/html/rfc4287#section-7 says that the only allowed
parameter is "charset". i think the way to go for all of this to be nice and
clean would be to add a "content-type" parameter, which as a value would
have a media type (the media type of the content embedded or linked to by
the feed).

There's also http://tools.ietf.org/html/rfc5023#section-12

Post by Erik Wilde
- is that kind of parameter value even allowed in a media type? and if it
is, how many media type parsers might possibly break because a media type
would contain "two media types"?

That's a valid point indeed.

Post by Erik Wilde
- how realistic is it to update RFC 4287 in that way, which as i understand
it would not be backwards compatible?
if that was a realistic way to go, even things such as
http://www.w3.org/TR/html5/links.html#rel-alternate might need to be
updated, so that the new features would be allowed/supported (and shouldn't
HTML5 allow the optional charset parameter that is possible according to RFC
4287?).
any feedback on this would be greatly appreciated. kind regards,

I believe that what you're trying to deal with is actually a much more
generic problem than what you're describing.

If I'm distributing bitmap files, but for obvious reasons I decide to use a
zip archive containing all these bitmap files together, how do I indicate in
Atom that this archive contains bitmap files? We do not have an easy way to
express this kind of information in Atom; we simply point to the archive
URL, using the zip mimetype. Other containers (DRM files for example,
various audio & video containers) have a similar problem.

As far as I can tell, you're interested in using Atom as an
envelope/container, to distribute multiple representations of the same
information. In this case, what you're actually doing is not "content
negotiation"; it is the exact same situation as with other containers.

Hadrien

Peter Krantz

2011-05-01 08:32:57 UTC

Post by Erik Wilde
the problem is that this link relation needs a parameter (specifying what
the media type of the linked feed/entry/content is) and maybe some title
text.

Forgive me if I have misunderstood the use case scenarios here, but:

It looks like you will be duplicating the value of the link relation
in the link element type attribute for each entry in the target feed?
How would you handle discrepancies if the marked feed had entries with
a different value?

If there are multiple feeds for a given set of resources, how can I
trust I am getting the same updates in each of the feeds?

I see a lot of benefit in having one feed with multiple link elements.
If I'm only interested in the RDF entries I am still doing the same
number of requests (one for the feed and then one for each entry I am
interested in)? And I can trust I get all updates at the same time.

Regards,

Peter Krantz

Erik Wilde

2011-05-01 17:21:40 UTC

hello peter.

thanks for your comment!

Post by Peter Krantz
It looks like you will be duplicating the value of the link relation
in the link element type attribute for each entry in the target feed?
How would you handle discrepancies if the marked feed had entries with
a different value?

i think i wasn't clear enough. my proposal would only be for feed/link elements, not for feed/entry/link elements. the idea is to point to a variant of the whole feed, not to individual entries. it should be a link you can follow when you want to consume a different feed, one that has different "primary" content. each "feed variant" would be a regular feed and thus be free to link to alternate versions of its entries. the label i am thinking of would indicate something you might call the feed's "primary content type".

Post by Peter Krantz
If there are multiple feeds for a given set of resources, how can I
trust I am getting the same updates in each of the feeds?

i don't think you can trust the publisher 100%, but it's a good thing to point out, and the link relation should probably say that "publishers SHOULD try to expose updates across feed variants in a consistent manner", or something along these lines.

Post by Peter Krantz
I see a lot of benefit in having one feed with multiple link elements.
If I'm only interested in the RDF entries I am still doing the same
number of requests (one for the feed and then one for each entry I am
interested in)? And I can trust I get all updates at the same time.

you're right that there is an issue with trust and consistency, which is probably very hard to formalize or standardize. on the other hand, if you follow a feed that embeds non-RDF content (and links to RDF alternates) and you get 10 new entries when reading the feed, you need a total of 11 GETs to retrieve the data you're interested in. if there was a feed embedding RDF, it's just one GET and you're done. for resource-constrained scenarios such as mobile or embedded systems, and in particular for push support, this is a very significant difference.
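
The request counts in this argument can be written down as a tiny sketch. The function names are assumptions for illustration, and the model ignores caching, pipelining, and partial feed retrieval, all of which were raised elsewhere in the thread.

```python
def gets_with_linked_variants(new_entries):
    """One GET for the feed, plus one GET per linked alternate."""
    return 1 + new_entries


def gets_with_embedded_variant(new_entries):
    """One GET for a feed variant that embeds the wanted content."""
    return 1


for n in (1, 10):
    print(n, gets_with_linked_variants(n), gets_with_embedded_variant(n))
```

For 10 new entries this gives the 11 GETs versus 1 GET quoted above; for a single new entry the gap shrinks to 2 versus 1.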

thanks and cheers,

dret.