In Atom, categories have schemes. What scheme should we use for tags?
Tags and categories are pretty obviously the same thing, although some existing publishing systems store and handle them separately. Categories are often apt to be hierarchical, tags never (that I know of).
In an Atom feed, both are represented by the category element. For example, this article is in the category “Technology/Web”, so the Atom feed contains:
<category scheme='http://www.tbray.org/ongoing/What/' term='Technology/Web' />
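For concreteness, here is a minimal Python sketch (standard library only) of how a consumer might pull scheme/term pairs out of an entry. The ENTRY string is a cut-down stand-in for a real feed entry; the namespace URI is the Atom 1.0 one from RFC 4287.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace (RFC 4287).
ATOM_NS = "http://www.w3.org/2005/Atom"

# A pared-down entry; in practice this would come from the fetched feed.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>In Atom, categories have schemes</title>
  <category scheme="http://www.tbray.org/ongoing/What/" term="Technology/Web"/>
</entry>"""

def categories(entry_xml):
    """Return (scheme, term) pairs for the entry's direct category children.
    scheme is None when the attribute is absent, which Atom allows."""
    root = ET.fromstring(entry_xml)
    return [(c.get("scheme"), c.get("term"))
            for c in root.findall(f"{{{ATOM_NS}}}category")]

print(categories(ENTRY))
# [('http://www.tbray.org/ongoing/What/', 'Technology/Web')]
```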
Suppose I wanted to tag it, uh, “Tagging”. What value should I supply for scheme? Just leave it blank? (Atom allows that.) Or maybe there should be a well-known scheme value saying “This is one of those, you know, Web-2.0, Everything Is Miscellaneous, folksonomy, tagsonomy, groovy ‘Tag’ thingies, not a boring fusty old hierarchical ‘category’.”
This is one of the few situations in which I can imagine being comfortable with a URN, and looking at the URN Namespace Registry reveals that urn:tag isn’t used. Should I write an Internet Draft and register it? Then my Atom feed could have:
<category scheme='urn:tag' term='Tagging' />
Or is there an obviously-better idea?
[Update: I whipped up a urn:tag I-D.] [Lots of intelligent points raised in the comments; I dropped one in to address them. I still think this is a good idea.]
From: katre (Feb 01 2007, at 11:05)
Given that http://www.tbray.org/ongoing/What/ seems to point at a list of categories, why not use something like http://www.tbray.org/ongoing/Tags/, which you could set up as a list of known tags, each linking to a list of articles.
You could even go all crazy and do that bit where the size of the tag link is related to the number of articles.
From: Gregor J. Rothfuss (Feb 01 2007, at 11:09)
You could link to the Wikipedia page for the concept you are trying to tag. This doesn't impose much structure on things, but allows for interesting inferences. Plus, Wikipedia is already the canonical page for a lot of concepts anyway.
From: Julian Reschke (Feb 01 2007, at 11:30)
Hmm.
So would that term:
<category scheme='urn:tag' term='Tagging' />
have a URI then? Such as urn:tag:Tagging?
From: Bill de hOra (Feb 01 2007, at 12:25)
In the short term there doesn't seem to be any point in a category@scheme RFC unless you can support geo: and machine: tags.
Sending machine tags through XML tools is tricky if the machine tag prefix is bound to a qname. Sending tag prefixes through the scheme attribute is tricky because the scheme requires a URI and machine tags don't use urns.
Oops.
Right now I would send the entire machine tag in the term attribute, document the necessary regexes, and give the scheme a pass. It's enough that people are putting tags into buckets without troubling them with a URI notation war. Small steps.
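A sketch of what "document the necessary regexes" might look like in Python. The pattern assumes Flickr-style namespace:predicate=value machine tags carried whole in the term attribute; its character rules are an illustrative assumption, not a published grammar.

```python
import re

# Assumed machine-tag shape, after Flickr's namespace:predicate=value convention.
# The character classes are illustrative, not normative.
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")

def parse_term(term):
    """Split a category term into (namespace, predicate, value) if it looks like
    a machine tag; otherwise return None and treat it as a plain tag."""
    m = MACHINE_TAG.match(term)
    return m.groups() if m else None

print(parse_term("geo:lat=49.2827"))  # ('geo', 'lat', '49.2827')
print(parse_term("Tagging"))          # None
```

With no scheme involved, the consumer decides what a term means purely from its shape, which is the trade-off Bill is proposing.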
From: Jason Clark (Feb 01 2007, at 13:09)
One could use an existing tag resource such as delicious or technorati. For example <category scheme="http://del.icio.us/tag/" term="Tagging" /> or <category scheme="http://technorati.com/tag/" term="Tagging" />. In the second example, combining the scheme+term results in the same URI one would use with "Technorati Tags", e.g. <a href="http://technorati.com/tag/Tagging" rel="tag">Tagging</a>.
From: Edward O'Connor (Feb 01 2007, at 13:09)
In my blog I'm currently mapping rel-tag links to atom:category like so:
<category scheme="http://example.com/foo/" term="bar-baz" label="Bar Baz" />
<a rel="tag" href="http://example.com/foo/bar-baz">Bar Baz</a>
(There is a specific scheme, http://edward.oconnor.cx/tags/, that I use for categories-which-are-tags.)
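A hedged Python sketch of that mapping: assuming the rel-tag convention that the tag is the final path segment of the link, splitting the href at its last slash yields the scheme and the term. The function name is hypothetical.

```python
from urllib.parse import urlsplit, unquote

def rel_tag_to_category(href, label):
    """Map a rel="tag" link to Atom category attributes.

    Assumes the tag is the last path segment of href; everything before that
    segment becomes the scheme. Any query string or fragment is dropped.
    """
    parts = urlsplit(href)
    path = parts.path.rstrip("/")
    if not path.strip("/"):
        raise ValueError(f"no path segment to use as a tag: {href}")
    base = f"{parts.scheme}://{parts.netloc}{path}"
    scheme, _, last = base.rpartition("/")
    return {"scheme": scheme + "/", "term": unquote(last), "label": label}

print(rel_tag_to_category("http://example.com/foo/bar-baz", "Bar Baz"))
# {'scheme': 'http://example.com/foo/', 'term': 'bar-baz', 'label': 'Bar Baz'}
```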
From: Ben Donley (Feb 01 2007, at 13:47)
Although this might not be hierarchical in the sense you mean, tags in Apple's Aperture are certainly hierarchical in one way. If you're a pet photographer, you'd configure it so the "Sharpei" tag is a "dog" is an "animal".
From: Tim (Feb 01 2007, at 13:53)
A question: Is there any substantial difference between tags and multiple categories? From my point of view both mean the same thing, so there is no problem at all.
From: Jamey Wood (Feb 01 2007, at 13:57)
It's strange to start out saying that "tags and categories are pretty obviously the same thing" and then immediately go on to talk about their differences ("categories are often apt to be hierarchical, tags never"). Having any differences is fundamentally at odds with "being the same thing". Of course, it doesn't preclude being very closely related things (which I personally think is the case for categories and tags).
From: Devon (Feb 01 2007, at 18:24)
Why not simply "Tag"? It works well enough for rel attributes. Why the need for a whole new urn and registration? That's what I don't get. It seems like the solution using urn: is overkill. Like using a saw to cut a stick of butter. If anything, why not use a tag: scheme? It's already usable for Atom IDs.
From: Lenny (Feb 01 2007, at 19:14)
I recall seeing urn:tag as related to tag: URIs, though Googling came up with this draft [1], which was apparently simplified to remove the urn part for the RFC [2]. It could still be somewhat confusing.
[1]: http://tools.ietf.org/html/draft-kindberg-tag-uri-04
[2]: http://tools.ietf.org/html/rfc4151
Besides, tags hardly ever mean the same thing to two people, so why should they have the same scheme? If some application really thinks that <category scheme="http://example.org/farmer/tag/" term="apple"/> means the same thing as <category scheme="http://example.org/geek/tag/" term="apple"/>, it can just drop the scheme.
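A minimal Python sketch of the two consumer policies, using Lenny's farmer/geek pairs; the function names are made up for illustration.

```python
# Categories as (scheme, term) pairs, from the example above.
farmer = ("http://example.org/farmer/tag/", "apple")
geek = ("http://example.org/geek/tag/", "apple")

def same_category(a, b):
    """Strict comparison: scheme and term must both match."""
    return a == b

def same_tag(a, b):
    """Scheme-blind comparison for consumers that only care about the tag string."""
    return a[1] == b[1]

print(same_category(farmer, geek))  # False
print(same_tag(farmer, geek))       # True
```

The publisher keeps its scheme either way; only the consumer decides whether to look at it.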
From: Ben (Feb 02 2007, at 00:42)
Regarding urn:tag:{TagName}, shouldn't you specify that Unicode is used? Your example of urn:tag:caf%C3%A9 implies that it's actually UTF-8 encoded into US-ASCII, but wouldn't it be nice if this was explicit?
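One way the encoding could work, sketched in Python under the assumption that every character outside unreserved ASCII is percent-encoded from its UTF-8 bytes (urllib.parse.quote encodes as UTF-8 by default). This is an illustration, not the draft's normative rule; the helper names are hypothetical.

```python
from urllib.parse import quote, unquote

PREFIX = "urn:tag:"

def tag_urn(term):
    """Build a urn:tag URI for a tag by percent-encoding its UTF-8 bytes.
    Assumption: everything except unreserved ASCII is escaped, '/' and ':' included."""
    return PREFIX + quote(term, safe="")

def tag_from_urn(uri):
    """Inverse of tag_urn: recover the tag string from a urn:tag URI."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a urn:tag URI: {uri}")
    return unquote(uri[len(PREFIX):])

print(tag_urn("Tagging"))     # urn:tag:Tagging
print(tag_urn("caf\u00e9"))   # urn:tag:caf%C3%A9  (precomposed é, U+00E9)
```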
From: Henry Story (Feb 02 2007, at 01:24)
As I see it a category is a tag with a namespace. So don't put a namespace (scheme) in if you want a tag, but you may as well put the scheme in, since people can always treat your category as a tag (by not querying on the scheme).
A bit more on this from a modeling perspective at
http://blogs.sun.com/bblfish/entry/folksonomies_ontologies_atom_and_the
On the other hand I really like the idea of using a wiki as the namespace for one's tags. Large organizations could thereby use tags to work out a standard vocabulary.
From: Danny (Feb 02 2007, at 01:38)
Personally I'd think this is exactly the kind of situation where an http: URI would be far preferable to a URN, so you can provide a definition of the term and/or a list of posts that fall into the category/have the tagging.
"Is there any substantial difference between tags and multiple categories?"
Maybe, maybe not. But using URIs in the scheme (or terms) allows you to create them as you please: a post could have a single category and multiple tags. A split like this would be straightforward:
<category scheme='http://www.tbray.org/ongoing/What/' term='Technology/Web' label='Web Category' />
<category scheme='http://www.tbray.org/ongoing/tag/' term='Technology/Web' label='Web Tag' />
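A sketch of how a consumer might keep the two vocabularies in that example apart by grouping on scheme. The /tag/ scheme URI is the hypothetical one above, and the function name is made up.

```python
from collections import defaultdict

CATEGORY_SCHEME = "http://www.tbray.org/ongoing/What/"
TAG_SCHEME = "http://www.tbray.org/ongoing/tag/"  # hypothetical, from the example

entry_categories = [
    (CATEGORY_SCHEME, "Technology/Web", "Web Category"),
    (TAG_SCHEME, "Technology/Web", "Web Tag"),
]

def by_scheme(cats):
    """Group (scheme, term, label) triples by scheme so each vocabulary
    can be handled separately."""
    groups = defaultdict(list)
    for scheme, term, _label in cats:
        groups[scheme].append(term)
    return dict(groups)

print(by_scheme(entry_categories))
```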
Re. machine tags, I wouldn't even try to treat them as machine-readable QNames, the same information could be carried something like this:
<category scheme='http://www.tbray.org/ongoing/tag/Technology/' term='Web' label='tech:web' />
I definitely think it's worth using a scheme, even if many agents might not yet use it: Flickr's 'Web' may commonly be associated with 'Spider'.
Incidentally, you might want to glance at the Tag Ontology http://www.holygoat.co.uk/projects/tags/ - it's intended to capture not only the relationship between the post (or whatever) and the tag, but also who did the tagging and when. All of which is available in Atom, and could be automatically interpreted in the RDF form using GRDDL, no extra work from the publisher, no one-off coding for the consumer. A SPARQL query across such stuff could incorporate these facets (and any others that may be available/useful).
From: Tony Hammond (Feb 02 2007, at 02:00)
Hi Tim:
Nice idea. Just have to ask if you feel there may be any possible confusion with the registered URI scheme 'tag' [RFC 4151]:
http://www.rfc-editor.org/rfc/rfc4151.txt
From: Asbjørn Ulsberg (Feb 02 2007, at 03:55)
Just use Wikipedia!
For the tags to be interoperable and meaningful across weblogs, one could start using 'http://www.wikipedia.org/wiki/' as the scheme. With the term 'Dog' for example, you'd have 'http://www.wikipedia.org/wiki/Dog' (301 to 'http://en.wikipedia.org/wiki/Dog') that universally, globally and uniquely defines "dog". Wikipedia should probably implement something that makes this a bit easier and takes into account localization and stuff I can't wrap my head around at the moment.
From: Danny (Feb 02 2007, at 05:57)
PS. Tim Berners-Lee has been musing around tags recently too:
http://www.w3.org/DesignIssues/TagLabel.html
From: Michael Daines (Feb 02 2007, at 14:45)
As far as implementations that are out there in the world go, Flickr uses the idea katre suggests. (Not surprising, as there is no urn:tag besides the one suggested in your draft!) At first, that way seemed the most reasonable to me, but now I think I agree more with this urn:tag stuff.
But I wonder... (and this is a little flaky) what if you had category elements for "popularity" on your entries? Or some other strange kind of categorizing scheme? It seems to me that having the scheme attribute point at some place that describes how the entry fits into your scheme is better than just saying "this is a popularity rating".
Knowing that something is definitely a "tag" in the way that most people understand tags is probably pretty useful. It would seem that there would be room for both approaches, even in the same gaggle of category tags.
From: Tim Bray (Feb 03 2007, at 00:27)
Katre: The point is that a lot of people out there (Flickr, Delicious, Technorati) are using the same tags; we're looking for a way to express the fact that this is a large shared neutral vocabulary.
Gregor, Asbjørn: I don't want to privilege Wikipedia. And there are lots of perfectly useful tags like javaone2006 and barcamp2005 that would be hard to point to that way.
Julian: Excellent idea. I wrote it into the I-D.
Bill: I'm just trying to hit an 80/20 point. We already have RDF for those who want to Solve The Whole Problem.
Jason: Nope, I'm sure that Technorati (whom I love) or lots of other parties would like tags to become theirs. That would be a bad idea for a bunch of reasons.
Edward: Cool... but how about a shared scheme that everyone can use?
Ben: Those Aperture "tags" sound like an old-fashioned hierarchical taxonomy to me.
Devon: Because the scheme is supposed to be a URI. urn:tag: would be, if the registration were accepted.
Lenny: The tag: URI scheme is aimed at a different problem entirely.
Ben: The I-D mentions "percent-encoding" which per the URI RFC rules is implicitly UTF-8.
Henry: De facto, there is an unstructured flat vocabulary of "tags" which currently doesn't have a name; it needs one.
Danny: I disagree entirely. Where on earth would you go to find out about tags like "lisp" or "ethiopia"? The whole point of tags is that they're invented by anyone without asking, and semantics are either emergent or absent. There's nowhere authoritative to go.
Tony: Yeah, I thought of urn:web-tag or urn:subject-tag for precisely that reason, but "urn:tag:" is kinda pleasing to the mind's eye.
From: Julian Reschke (Feb 03 2007, at 01:26)
On encoding non-ASCII: I think you really should also quote RFC3987 (IRI) - the rules in RFC3986 do not really require UTF-8, as far as I can tell.
(You may also want to state a preference on Unicode normalization forms.)
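To illustrate why a normalization preference matters: the same visible tag can be spelled with a precomposed or a decomposed character, and the two percent-encode differently. The choice of NFC below is an assumption, not something the draft specifies, and the helper name is hypothetical.

```python
import unicodedata
from urllib.parse import quote

composed = "caf\u00e9"     # é as one code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT (U+0301)

# The two strings render identically but produce different URIs.
print(quote(composed, safe=""))    # caf%C3%A9
print(quote(decomposed, safe=""))  # cafe%CC%81

def normalized_term(term, form="NFC"):
    """Normalize a tag before percent-encoding so equivalent spellings
    map to one URI. NFC is an assumed preference."""
    return quote(unicodedata.normalize(form, term), safe="")

print(normalized_term(decomposed) == normalized_term(composed))  # True
```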
From: Danny (Feb 03 2007, at 02:49)
Ok, total disagreement, I can live with that ;-)
[TB] The whole point of tags is that they're invented by anyone without asking, and semantics are either emergent or absent.
Tags are *used* by *someone*, and the semantics are that the person doing the tagging is associating the thing (post, photo or whatever) with a label. Sure, the benefit of this in a system like del.icio.us comes from the emergent part, that people tend to use the same labels (or combination of labels) for the same thing. But the aggregate semantics don't emerge from a void.
[TB] Where on earth would you go to find out about tags like "lisp" or "ethiopia"?
http://del.icio.us/tag/ethiopia
or
http://flickr.com/photos/tags/ethiopia
or even
http://www.tbray.org/ongoing/What/geo/ethiopia
But these may not necessarily share the same notion of the tag in question. Ok, "ethiopia" may be pretty close across the three spaces above, but consider the tag "sun" (ignoring case).
[TB] There's nowhere authoritative to go.
Not if you use non-resolvable URIs there isn't!
There won't be a global definition for a given tag, but a service could quite easily aggregate taggings, making the assumption that the tag string has the same meaning across services. (Which may or may not work according to the services/tags in question - "chat"@en isn't exactly the same as "chat"@fr).
In tagging there is more information available than a simple string-thing association, each tagging has been made by a particular person in a particular context. One way of capturing some of this information is through using tags in that person's own taxonomy - like your categories. Knowing who did the tagging is still good information, even in a folksonomy.
So how would you do this:
"Show me all the posts tagged 'sun' in the last month by the bloggers aggregated at Planet Intertwingy"
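One hedged reading of that query as Python: filter aggregated entries by author, date and term. The Entry record, the sample data, and treating "last month" as 30 days are all assumptions for illustration; the term match ignores both case and scheme.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class Entry:
    """Hypothetical flattened view of one aggregated Atom entry."""
    author: str
    updated: datetime
    terms: list = field(default_factory=list)  # category/@term values only

def tagged_recently(entries, tag, bloggers, now, days=30):
    """Entries by any of `bloggers`, updated within `days` of `now`,
    carrying `tag` as a category term (case-insensitive, scheme ignored)."""
    cutoff = now - timedelta(days=days)
    wanted = tag.casefold()
    return [e for e in entries
            if e.author in bloggers
            and e.updated >= cutoff
            and any(t.casefold() == wanted for t in e.terms)]

# Hypothetical sample data standing in for a planet's aggregated feeds.
now = datetime(2007, 2, 3, tzinfo=timezone.utc)
entries = [
    Entry("alice", datetime(2007, 1, 20, tzinfo=timezone.utc), ["Sun", "Java"]),
    Entry("bob", datetime(2006, 11, 2, tzinfo=timezone.utc), ["sun"]),
    Entry("carol", datetime(2007, 1, 30, tzinfo=timezone.utc), ["sun"]),
]
print([e.author for e in tagged_recently(entries, "sun", {"alice", "bob"}, now)])
# ['alice']
```

Knowing who did the tagging is what makes the author filter possible at all; the tag string alone could not answer this query.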
----
Re. your I-D,
[[
a URN consisting of just "urn:tag:" is used to identify the vocabulary from which tags are drawn.
]]
XML namespaces (especially as used by RDF) is an existing specification which covers this.
[[
The current namespace of tags, expressed as a global pool of short textual strings assigned informally, functions well. However, there is currently no way for a document format such as Atom to specify that a category is one of these things.
]]
Using a URI scheme to identify a namespace (even if it is "the global namespace of tagging") seems a very strange approach to this. You want to "specify that a category is one of these things" - ok, so you're typing the category. There's already a perfectly good mechanism for this:
<http://www.tbray.org/ongoing/What/geo/ethiopia>
rdf:type tags:Tag .
While there are circumstances where adding global constants to a distributed system makes sense, I don't think tagging is one of those cases.
From: Danny (Feb 03 2007, at 03:24)
PS. I forgot you don't want to pay the "RDF Tax".
Ok, here's a simpler solution:
<category term='Web' />
How does the global atom:category/@term space differ from the global tag space?
I still think the URI that associates the term with a namespace is useful, but that can still be consistent with the above:
<category scheme='http://www.tbray.org/ongoing/What/' term='Technology/Web' />
"Technology/Web" is a term in Tim Bray's category scheme, it is also a tag in the global tag namespace.
From: Bill de hOra (Feb 03 2007, at 05:19)
"I'm just trying to hit an 80/20 point. We already have RDF for those who want to Solve The Whole Problem."
I didn't say anything here about RDF (and you should know I'm no RDF fanboy; ask Danny). But you won't hit 20/80 unless you can cater for geo/machine real code, and I can't see how your RFC does that. A lookup table on a list of known prefixes would serve me better than another urn scheme.
From: Aristotle Pagaltzis (Feb 03 2007, at 22:54)
I posted a log entry about this: <http://plasmasturm.org/log/452/>. Three points:
• There Is No Need For Schemes Beyond HTTP.
• If you think you want one, the real solution to your problem is probably reverse URI templates.
• But for tags, just do the same thing as HTML, which does not need a new scheme.
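A minimal Python sketch of the reverse-URI-template idea: compile a template such as http://del.icio.us/tag/{tag} into a pattern and pull the tag back out of a concrete URL. Only bare {name} variables matching a single path segment are handled; this is an illustrative assumption, not an implementation of the URI Template specification (RFC 6570), and the function name is hypothetical.

```python
import re
from urllib.parse import unquote

def reverse_template(template):
    """Compile a simple template like 'http://example.com/tag/{tag}' into a
    matcher returning {name: value} for a concrete URI, or None.
    Each {name} is assumed to match exactly one path segment."""
    pattern, pos = "", 0
    for m in re.finditer(r"\{(\w+)\}", template):
        pattern += re.escape(template[pos:m.start()])
        pattern += f"(?P<{m.group(1)}>[^/?#]+)"
        pos = m.end()
    pattern += re.escape(template[pos:])
    rx = re.compile(pattern + r"\Z")

    def match(uri):
        m = rx.match(uri)
        return {k: unquote(v) for k, v in m.groupdict().items()} if m else None
    return match

delicious = reverse_template("http://del.icio.us/tag/{tag}")
print(delicious("http://del.icio.us/tag/ethiopia"))          # {'tag': 'ethiopia'}
print(delicious("http://flickr.com/photos/tags/ethiopia"))   # None
```

Because the tag lives in an ordinary http: URI, this needs no new scheme; a consumer only needs one template per site it cares about.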
From: Edward O'Connor (Feb 05 2007, at 09:45)
I ended up writing a long (too long, really) blog post on this: http://edward.oconnor.cx/2007/02/representing-tags-in-atom
From: Lenny (Feb 05 2007, at 16:19)
Tim, I understand that tag: URIs are entirely different. I brought them up because a draft of that spec also defined an urn:tag: namespace, so that might be confusing.
The second part of my comment was the actual response to your idea, which is essentially what is said here: http://www.intertwingly.net/blog/2007/02/05/Show-Me . If, as a consumer, all you care about is the tag, then ignore the scheme.