Recently I wrote of my disgruntlement with JSON Schema. Since then I’ve learned that its authors plan more work, and that there are several other efforts to build a schema facility for JSON. This note is just a complaint about a particular use-case, with the hope that it might inform these efforts.
Here’s the problem; the language I’m building includes a big object whose fields are also objects, and each of these child objects has a Type field, whose value is, in effect, an enum; a constrained set of string values. The rest of the fields in each child object depend on the Type value. Like so:
{
  "o1": {
    "Type": "Animal",
    ... other fields specific to Animal ...
  },
  "o2": {
    "Type": "Vegetable",
    ... other fields specific to Vegetable ...
  },
  "o3": {
    "Type": "Mineral",
    ... other fields specific to Mineral ...
  }
}
In JSON Schema it’s easy enough to use a oneOf construct to describe what you want, but here’s the problem: when you syntax-check these, the error messages go right off the rails. For example, suppose my schema says that if Type is Mineral, then there must be a boolean-valued Organic field. And then suppose I try to validate a doc with an object that has "Type": "Mineral" and "Organic": 42.
The validators I’ve been working with give me a massively unhelpful flurry of messages saying that:
This can’t be Animal because the Type is wrong.
This can’t be Vegetable because the Type is wrong.
This can’t be Mineral because the Organic is wrong.
This can’t be a valid instance because the oneOf doesn’t match.
Of these, only one error message (the one about Organic) is helpful; but I don’t see any way to write a schema processor to report what’s really wrong in a human-friendly way.
Anyhow, I can think of specific recommendations for schema-language design that fall out of this observation. But I’m not building any of them so I’ll stick with just reporting the use-case.
From: Erik Wilde (May 22 2016, at 15:34)
you have to do what you have to do. but you'd run into this issue with the majority of schema languages. this kind of "structural value-based co-constraint" is hard to design cleanly into a schema language. any chance to ditch the container names o1 and so forth and use the types instead? frankly, just looking at it i wouldn't consider this a great JSON model, so i have sympathy for any schema language not working great for it.
From: Grahame Grieve (May 22 2016, at 20:36)
Actually, this is a relatively common pattern, and indeed, most schema languages can describe it, but don't give processors a realistic way to navigate it. This means that code generators are impossible, or produce incoherent code, and validators produce obscure error messages.
The key is to introduce into the schema language the notion of a 'discriminator' - an explicit instruction about which element to use as a master to navigate the options, along with a requirement that the chosen discriminator is unambiguous (and demonstrably so) for all of the choices. For schema processors, then, this collapses to some logical variant of a switch statement, and generating code and coherent error messages becomes straightforward.
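A minimal sketch of the switch-statement idea, in Python, assuming Type is declared as the discriminator. The table and function names are hypothetical, and the Animal and Vegetable entries accept anything because the post does not specify their fields.

```python
# Discriminator-style validation: read Type first, then run only the
# checks for that one type. Names here are illustrative assumptions.

def check_mineral_fields(obj):
    value = obj.get("Organic")
    # Mineral requires a boolean-valued Organic field.
    if not isinstance(value, bool):
        return [
            f'"Organic" must be a boolean when "Type" is "Mineral", got {value!r}'
        ]
    return []

def accept_any_fields(obj):
    # Placeholder: the fields for this type are not specified in the post.
    return []

FIELD_CHECKS = {
    "Animal": accept_any_fields,
    "Vegetable": accept_any_fields,
    "Mineral": check_mineral_fields,
}

def validate(obj):
    """Dispatch on the discriminator, so errors mention only one branch."""
    kind = obj.get("Type")
    check = FIELD_CHECKS.get(kind)
    if check is None:
        return [f'"Type" must be one of {sorted(FIELD_CHECKS)}, got {kind!r}']
    return check(obj)

errors = validate({"Type": "Mineral", "Organic": 42})
for error in errors:
    print(error)
```

Because the processor knows which field selects the branch, a bad Organic value yields one message about Organic, and a bad Type value yields one message about Type.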