Julian Reschke
2010-04-06 21:17:22 UTC
FYI:
this relates to an HTML-WG discussion about the algorithm to create Atom
feeds from HTML (<http://dev.w3.org/html5/spec/Overview.html#atom>).
See <http://www.w3.org/Bugs/Public/show_bug.cgi?id=7806> and
<http://www.w3.org/html/wg/tracker/issues/86> for more context on how we
got here.
Best regards, Julian
Hi,
Below is a change proposal for this issue.
Note that an obvious alternative to fixing the algorithm would be to
remove the section completely.
Best regards,
Julian
-- snip --
SUMMARY
The HTML5 spec contains an algorithm for producing an Atom (RFC 4287)
feed document from an HTML page.
The definition both relaxes a MUST-level requirement from RFC 4287 and
adds a needless restriction.
Also, it's not clear *at all* whether this is a feature that people
really want, and if they do, whether it needs to be part of HTML5. Given
that it's non-trivial to generate a valid Atom feed from HTML,
but the reverse *is* trivial, we should also consider removing this
feature altogether (I'd be happy to write a 2nd change proposal if
people want to see that as well).
RATIONALE
Instructions to derive a secondary format from HTML documents should not
be misleading, and should make clear which conditions must be
met to produce valid documents.
DETAILS
There are two problems, both with the following step (section 4.15.1, step 15.9):
"Otherwise
Let id be a user-agent-defined undereferenceable yet globally unique
valid absolute URL. The same absolute URL should be generated for each
run of this algorithm when given the same input. Let has-alternate be
false."
Problem #1: RFC 4287 does not require the ID to be undereferenceable.
This was a conscious decision of the IETF AtomPub WG. There's absolutely
no point in adding this requirement, except for the spec author's
distaste for URIs that are both dereferenceable *and* act as a globally
unique and stable identifier.
Note from RFC 4287:
"...Though the IRI might use a dereferencable scheme, Atom Processors
MUST NOT assume it can be dereferenced."
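To illustrate Problem #1, here is a minimal Python sketch that uses an ordinary, dereferenceable http: URL as an atom:id. The helper names `make_entry` and `same_entry` and the example.com URL are hypothetical, not taken from either spec, and the entry is bare-bones (a real entry also needs atom:author, either directly or inherited from the feed). The consumer side treats the id purely as an identifier: RFC 4287 has processors compare atom:id values character by character, case-sensitively, and nothing is ever fetched.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def make_entry(entry_id: str, title: str, updated: str) -> ET.Element:
    """Build a bare-bones atom:entry; entry_id may be any absolute IRI, http: included."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    return entry


def same_entry(a: ET.Element, b: ET.Element) -> bool:
    """Compare atom:id values character by character (case-sensitive); no dereferencing."""
    return a.findtext(f"{{{ATOM_NS}}}id") == b.findtext(f"{{{ATOM_NS}}}id")


# Two revisions of the same entry share one dereferenceable id.
first = make_entry("http://example.com/2010/04/atom-ids", "Atom IDs", "2010-04-06T21:17:22Z")
revised = make_entry("http://example.com/2010/04/atom-ids", "Atom IDs (revised)", "2010-04-07T08:00:00Z")
print(same_entry(first, revised))  # True
```

Nothing in this sketch would become more correct by making the id undereferenceable; the id's job is only to stay stable and unique.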
Problem #2: RFC 4287 makes it a MUST-level requirement to generate the
same atom:id every time the same entry or feed is published. From RFC 4287:
"When an Atom Document is relocated, migrated, syndicated, republished,
exported, or imported, the content of its atom:id element MUST NOT
change. Put another way, an atom:id element pertains to all
instantiations of a particular Atom entry or feed; revisions retain the
same content in their atom:id elements. It is suggested that the atom:id
element be stored along with the associated resource."
HTML5 relaxes this to a should-level requirement.
I do agree that generating valid Atom feeds from HTML *is* hard, but
violating a MUST-level requirement from the Atom spec is not acceptable.
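One way a feed generator can meet the MUST is the approach RFC 4287 itself suggests: store the atom:id along with the resource. The minimal Python sketch below assumes a hypothetical `AtomIdStore` keyed by a hypothetical stable `resource_key` (say, a database primary key); it mints a fresh `urn:uuid:` id once and then returns that same id on every later export or revision. It keeps the mapping in memory only; a real generator would have to persist it.

```python
import uuid


class AtomIdStore:
    """Hypothetical store that keeps each atom:id alongside its resource.

    The id is minted once, the first time a resource is seen, and is returned
    unchanged on every later export, republication, or revision.
    """

    def __init__(self) -> None:
        # resource_key -> atom:id; in-memory only in this sketch.
        self._ids: dict[str, str] = {}

    def id_for(self, resource_key: str) -> str:
        if resource_key not in self._ids:
            # Mint a random, globally unique id exactly once per resource.
            self._ids[resource_key] = uuid.uuid4().urn
        return self._ids[resource_key]


store = AtomIdStore()
first_run = store.id_for("posts/atom-ids")
second_run = store.id_for("posts/atom-ids")
assert first_run == second_run  # republishing does not change the atom:id
```

A generator that works only from the HTML, with no such storage, has to derive the id deterministically from its input instead.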
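Change
"Let id be a user-agent-defined undereferenceable yet globally unique
valid absolute URL."
to
"Let id be a user-agent-defined yet globally unique valid absolute URL."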
Change
"The same absolute URL should be generated for each run of this
algorithm when given the same input."
to
"The same absolute URL must be generated for each run of this algorithm
when given the same input. If this requirement cannot be fulfilled,
then generating a valid Atom feed is not possible and this algorithm
should be aborted."
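The proposed wording can be sketched in Python as follows. This is an illustration under stated assumptions, not the HTML5 algorithm: the function `derive_atom_id`, the exception `AtomIdUnavailable`, and the `entry_key` input (some stable per-entry key, such as an element's id attribute) are all hypothetical. It uses a name-based UUID (`uuid.uuid5`), so the same input yields the same `urn:uuid:` URL on every run. Treating only http: and https: document URLs as stable is likewise an assumption of the sketch. When no stable input exists, it aborts instead of emitting an id that would change between runs.

```python
import uuid
from typing import Optional
from urllib.parse import urlsplit


class AtomIdUnavailable(Exception):
    """Hypothetical: no stable id can be derived, so a valid feed cannot be produced."""


def derive_atom_id(document_url: Optional[str], entry_key: str) -> str:
    """Return the same atom:id on every run for the same (document_url, entry_key)."""
    if not document_url or not entry_key:
        raise AtomIdUnavailable("no stable input to derive an atom:id from")
    parts = urlsplit(document_url)
    # Assumption: only http(s) URLs with a host count as stable, global input.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise AtomIdUnavailable(f"not a stable absolute URL: {document_url!r}")
    # uuid5 is name-based (SHA-1): identical names always give identical UUIDs.
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{document_url}#{entry_key}").urn


print(derive_atom_id("http://example.com/blog", "post-42")
      == derive_atom_id("http://example.com/blog", "post-42"))  # True
```

The catch is the one the proposal points at: the id is only as stable as its inputs. If the document moves or the entry key changes between revisions, the derived id changes too, which RFC 4287 forbids. In that case aborting is the honest outcome.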
IMPACT
1. Positive Effects
Consistency between the applicable specs. Also, authors are correctly
informed about what it takes to generate proper Atom feeds.
2. Negative Effects
None.
3. Conformance Classes Changes
Atom feed generators are actually required to generate valid Atom
documents (with respect to atom:id).
4. Risks
None.
REFERENCES
Inline.