I have an action item from the W3C TAG to expand my strawman writeup on our Site Data issue. I’ve written about this problem before: There’s No Such Thing as a Web Site. I’m going to do it here because this is a better writing environment and because I think the issue is of general interest (if by “general interest” we mean of interest to heavy Web geeks).
Introduction · A Web Site is a Web Resource, identified by URI, which is a collection of other Web Resources, each identified by URI.
The two chief objectives to be met by Web Sites are:
Providing a location of the publisher’s choice for the storage of site metadata, for example robot control information, graphical icons, and privacy policies.
Providing a logical grouping mechanism for web pages, for the support of search and content management applications.
Web Site Membership · When a resource is included in the collection of resources that constitute a Site, we say the resource is a member of the Site. A resource may be a member of any number of Sites: zero, one, or many.
A representation of a resource may include assertions of membership in one or more Sites. A representation of a Site may include the assertion that one or more resources are members of that Site. Clearly, inconsistencies and disagreements can arise; software is free to establish its own policies for dealing with them.
Resource Representations and Site Membership · A resource representation can assert Site membership in either its metadata or data. For example, a header could be added to the HTTP protocol, Website:, which specifies a URI and asserts the resource’s membership in that Site. A representation could include multiple Website: headers.
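To make the header idea concrete, here is a minimal sketch of how software might collect Site membership assertions from a response. The Website: header is only the strawman proposed above, not an existing HTTP header, and the function name and example URIs are invented for illustration.

```python
def site_memberships(headers):
    """Return the Site URIs asserted by (hypothetical) Website: headers.

    `headers` is a list of (name, value) pairs rather than a dict,
    so that repeated Website: headers are all preserved.
    """
    return [
        value.strip()
        for name, value in headers
        if name.lower() == "website"  # header names are case-insensitive
    ]


# Example response headers; the URIs are placeholders.
response_headers = [
    ("Content-Type", "text/html"),
    ("Website", "http://example.org/site"),
    ("website", " http://example.org/archive-site "),
]

memberships = site_memberships(response_headers)
print(memberships)
```

A resource with no Website: headers simply yields an empty list, which matches the idea that a resource may be a member of zero Sites.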
Designers of any language used in resource representations could include a method for a representation to assert membership in a site. For example, a new value of the rel attribute of XHTML’s link element could serve this purpose.
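As a sketch of the in-data approach, the following pulls membership assertions out of a page's link elements using Python's standard html.parser module. The rel value "website" is an assumption made for this example; no such value has been defined, and the class and function names are likewise hypothetical.

```python
from html.parser import HTMLParser


class SiteLinkParser(HTMLParser):
    """Collects href values of link elements whose rel includes 'website'.

    'website' is a made-up rel value used only for illustration.
    """

    def __init__(self):
        super().__init__()
        self.sites = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens; it may be absent.
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "website" in rel_tokens and href:
            self.sites.append(href)


def sites_asserted_by(markup):
    """Return the Site URIs that a representation claims membership in."""
    parser = SiteLinkParser()
    parser.feed(markup)
    parser.close()
    return parser.sites


page = """<html><head>
<link rel="stylesheet" href="/style.css" />
<link rel="website" href="http://example.org/site" />
</head><body><p>Hello</p></body></html>"""

print(sites_asserted_by(page))
```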
Site Representations · For the notion of a Site to be useful, it would be necessary to establish an expectation that it would provide representations in a predictable and useful data format. This format should meet the following goals:
It should be human-readable.
It should contain assertions that individual resources are members of the site.
It should contain assertions that groups of resources identified by URI prefix are members of the site.
It should contain the identification of per-site metadata, probably identified by “Nature” and “Purpose” in the style of RDDL.
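The goals above do not fix a format, but the information a Site representation would carry can be sketched as a small data structure. Everything here is an assumption for illustration: the class and field names, the example URIs, and the choice to model metadata as (nature, purpose, URI) triples loosely following RDDL.

```python
from dataclasses import dataclass, field


@dataclass
class SiteDescription:
    """Hypothetical in-memory form of a Site representation."""

    uri: str
    members: set = field(default_factory=set)      # individually listed resource URIs
    prefixes: list = field(default_factory=list)   # URI prefixes covering groups of resources
    metadata: list = field(default_factory=list)   # (nature, purpose, uri) triples

    def claims(self, resource_uri):
        """True if the Site asserts that resource_uri is a member."""
        if resource_uri in self.members:
            return True
        return any(resource_uri.startswith(p) for p in self.prefixes)

    def metadata_for(self, purpose):
        """Return URIs of per-site metadata with the given purpose."""
        return [uri for _nature, p, uri in self.metadata if p == purpose]


# Placeholder URIs throughout; none of these identify real resources.
site = SiteDescription(
    uri="http://example.org/site",
    members={"http://example.org/about.html"},
    prefixes=["http://example.org/blog/"],
    metadata=[
        ("http://example.org/natures/text",
         "http://example.org/purposes/robot-control",
         "http://example.org/robots.txt"),
    ],
)

print(site.claims("http://example.org/blog/2004/01/post"))
print(site.metadata_for("http://example.org/purposes/robot-control"))
```

Simple string-prefix matching is the crudest possible reading of "identified by URI prefix"; a real format would need to say how prefixes are normalized and compared.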