In talking to people about modelling metadata I’ve picked up on a distinction, mentioned by Stuart Sutton, between entity-based modelling, typified by RDF and graphs, and record-based structures, typified by XML. However, I don’t think this distinction alone is sufficient to explain the difference, let alone why it matters. I don’t want to get into the pros and cons of either approach here, just give a couple of examples of where something that works in a monolithic, hierarchical record falls apart when the properties and relationships for each entity are described separately and those descriptions are put into a graph. These are especially relevant when people familiar with XML or JSON start using JSON-LD. One of the great things about JSON-LD is that you can use instance data as if it were JSON, without paying much regard to the “LD” part. That’s not true when designing specs, because design choices that would be fine in a JSON record will not work in a linked data graph.
1. Qualified Instances
It’s very common in a record-oriented approach when making statements about something that many people may have done, such as attending a specific school, earning a qualification/credential, learning a skill etc, to have a JSON record that looks something like:
{ "studentID": "Person1",
"studentName": "Betty Rizzo",
"schoolsAttended": [
{ "schoolID": "School1",
"schoolName": "Rydell High School",
"schoolAddress" : {...}
"startDate": "1954",
"endDate": "1959"
}
]
}
It’s tempting to put a @context on the top of this to map the property keys to an RDF vocabulary and call it linked data. That’s sub-optimal. To see why, consider two students: Betty, as above, and Sandy, who joined the school for her final academic year, 1958-59. Representing her data and Betty’s as RDF graphs we would get something like:
The upper graph is a representation of what you might get for the record about Rizzo shown above, if you choose a suitable @context. The lower is similar data about Sandy. When this data is loaded into an RDF triple store, the statements will be stored separately, and duplicates removed. We can show that data as a single merged graph:
Whereas in a record the hierarchy preserves the scope for statements like startDate and endDate so that we know who they refer to, in the RDF graph statements from the JSON object describing the school attended are taken as being about the school itself. The problem arises because the information about the school is treated as data that can be linked to by anything that relates to the school, not just the entity in whose record it was found, which makes sense in terms of data management.
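The loss of scope can be sketched in a few lines of Python, treating a graph as a set of (subject, predicate, object) tuples so that merging is just set union. This is only an illustration: the way the record is flattened into triples is my assumption about what a naive @context would give, and Sandy's dates are taken from the description of her joining for 1958-59.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# Assumption: a naive @context attaches startDate/endDate to the school node,
# because that is the JSON object they sit in.
betty_graph = {
    ("Person1", "name", "Betty Rizzo"),
    ("Person1", "schoolsAttended", "School1"),
    ("School1", "name", "Rydell High School"),
    ("School1", "startDate", "1954"),
    ("School1", "endDate", "1959"),
}

sandy_graph = {
    ("Person2", "name", "Sandy Olsson"),
    ("Person2", "schoolsAttended", "School1"),
    ("School1", "name", "Rydell High School"),
    ("School1", "startDate", "1958"),
    ("School1", "endDate", "1959"),
}

# Merging is set union, so identical statements collapse into one.
merged = betty_graph | sandy_graph


def values(graph, subject, predicate):
    """Return the sorted objects of every triple matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


school_start_dates = values(merged, "School1", "startDate")
school_end_dates = values(merged, "School1", "endDate")

print(school_start_dates)  # ['1954', '1958']
print(school_end_dates)    # ['1959']
```

After the merge the school has two start dates and one end date, and nothing in the graph says which date belongs to which student.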
There are options for fixing this. One is not to merge the graphs about Betty and Sandy, but that means repeating all the data about the school in every record that mentions it. Another possible solution is to use the property-graph or RDF-star approach of annotating the schoolAttended property directly with startDate and endDate. Often, though, the answer lies in better RDF modelling. In this case we could create an entity to model the attendance of a person at a school:
and when these are merged:
which keeps the advantage of not duplicating information about the school while maintaining the information about who attended which school when. In JSON-LD this combined graph would look something like:
{ "@context": {...},
"@graph": [
{ "@id": "Person1",
"name": "Betty Rizzo",
"schoolAttended": {
"startDate": "1953",
"endDate": "1959",
"at": {"@id": "School1"}
}
},{
"@id": "Person2",
"name": "Sandy Olsson",
"schoolAttended": {
"startDate": "1953",
"endDate": "1959",
"at": {"@id": "School1"}
}
},{
"@id": "School1",
"name": "Rydell High",
"address": {
"@type": "PostalAddress",
"...": "..."
}
}]
}
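To check that the attendance entity really does survive merging, here is the same kind of sketch as a set of triples in Python. The identifiers Attendance1 and Attendance2 are hypothetical stand-ins for the blank nodes that the nested schoolAttended objects would become, and the dates follow the story as told in the text (Betty 1954 to 1959, Sandy 1958 to 1959).

```python
# Each attendance is its own node. "Attendance1" and "Attendance2" are
# hypothetical names standing in for blank nodes.
betty_graph = {
    ("Person1", "name", "Betty Rizzo"),
    ("Person1", "schoolAttended", "Attendance1"),
    ("Attendance1", "at", "School1"),
    ("Attendance1", "startDate", "1954"),
    ("Attendance1", "endDate", "1959"),
    ("School1", "name", "Rydell High"),
}

sandy_graph = {
    ("Person2", "name", "Sandy Olsson"),
    ("Person2", "schoolAttended", "Attendance2"),
    ("Attendance2", "at", "School1"),
    ("Attendance2", "startDate", "1958"),
    ("Attendance2", "endDate", "1959"),
    ("School1", "name", "Rydell High"),
}

merged = betty_graph | sandy_graph


def attendance_periods(graph, school):
    """Map each person's name to their (startDate, endDate) at one school."""

    def one(subject, predicate):
        # Every node in this sketch has exactly one value per property used here.
        return next(o for s, p, o in graph if s == subject and p == predicate)

    periods = {}
    for person, predicate, attendance in graph:
        if predicate == "schoolAttended" and one(attendance, "at") == school:
            periods[one(person, "name")] = (
                one(attendance, "startDate"),
                one(attendance, "endDate"),
            )
    return periods


periods = attendance_periods(merged, "School1")
print(periods["Betty Rizzo"])   # ('1954', '1959')
print(periods["Sandy Olsson"])  # ('1958', '1959')
```

The school is described once, and the dates stay attached to the attendance they qualify.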
Finally, those who just want a JSON record for an individual student that could easily be converted to LD could use something like:
{ "studentID": "Person1",
"schoolsAttended": [
{ "startDate": "1954",
"endDate": "1959",
"at": {
"schoolID": "School1",
"schoolName": "Rydell High School",
"schoolAddress" : {...}
}
}
]
}
You might think that the “attendance” object sitting between a person and the school is a bit artificial and unintuitive, which it is, but it’s no worse than the tables that RDBM systems need for many-to-many relationships.
2. Lists
Another pattern that comes up a lot is when logically separate resources may be ordered in different ways for different reasons. This may be people in a queue, journal articles in a volume, or learning resources in a larger learning opportunity; anywhere that you might want to say “this” comes before “that”. Say we have an educational program that has a number of courses in it that should be taken in sequential order. JSON lists are ordered, so as a record this seems to work:
{
"name": "My Program",
"hasCourse": [
{"name": "This"},
{"name": "That"},
{"name": "The other"}
]
}
So we sprinkle on some syntactic sugar for JSON-LD:
{
"@context": {"@vocab": "http://schema.org/",
"@base": "http://example.org/resources/"},
"@type": "EducationalOccupationalProgram",
"name": "My Program",
"hasCourse": [
{"@type": "Course",
"@id": "Course1",
"name": "This"},
{"@type": "Course",
"@id": "Course2",
"name": "That"},
{"@type": "Course",
"@id": "Course3",
"name": "The other"}
]
}
But there is no RDF statement in there about ordering, and the ordering of JSON’s arrays is not preserved in other RDF syntaxes unless there is something in the @context to say the value of hasCourse is an @list. It wouldn’t be appropriate to say that every value of hasCourse is an ordered list, because not every list of courses will be ordered. So if we convert the JSON-LD into triples and store them, there is no saying how to order the results returned by a query.
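A rough way to see this for yourself is to flatten the record into a set of triples in Python and then do the same with the array reversed. The to_triples function and the "Program" identifier are hypothetical helpers for this sketch, not part of any JSON-LD library.

```python
import json

# The record from above, with a hypothetical "@id" added so the program
# has something to hang statements on.
record = json.loads("""
{
  "@id": "Program",
  "name": "My Program",
  "hasCourse": [
    {"@id": "Course1", "name": "This"},
    {"@id": "Course2", "name": "That"},
    {"@id": "Course3", "name": "The other"}
  ]
}
""")


def to_triples(program):
    """Flatten the record into an unordered set of (subject, predicate, object)."""
    triples = {(program["@id"], "name", program["name"])}
    for course in program["hasCourse"]:
        triples.add((program["@id"], "hasCourse", course["@id"]))
        triples.add((course["@id"], "name", course["name"]))
    return triples


# Same record, but with the courses listed in the opposite order.
shuffled = dict(record, hasCourse=list(reversed(record["hasCourse"])))

same_graph = to_triples(record) == to_triples(shuffled)
print(same_graph)  # True
```

Both orderings give exactly the same statements, so the sequence cannot be recovered from the graph.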
The simple solution would be to have a property to state the position of the course in an ordered list (schema.org/position is exactly this)—but don’t be too hasty: if these courses are taken in more than one program, is Course 2 always going to be second in the sequence? Probably not. In general when resources are reused in different contexts they will probably be used in different orders, “this” may not always come before “that”. That’s why the ordering is best specified at one remove from resources themselves. For example, one of the suggestions for ordering content in K12-OCX is to create a table of contents as an ordered list of items that point to the content, something like:
{
"@context": {
"@vocab": "http://schema.org/",
"ocx": "http://example.org/ocx/",
"@base": "http://example.org/resources/",
"item": {"@type": "@id"}
},
"@type": "EducationalOccupationalProgram",
"name": "My Program",
"ocx:hasToC": {
"@type": "ItemList",
"name": "Table of Contents",
"itemListOrder": "ItemListOrderAscending",
"numberOfItems": "3",
"itemListElement": [
{ "@type": "ListItem",
"item": "Course1",
"position": 1},
{ "@type": "ListItem",
"item": "Course2",
"position": 2 },
{ "@type": "ListItem",
"item": "Course3",
"position": 3 }
]
},
"hasCourse": [
{"@type": "Course",
"@id": "Course1",
"name": "This"},
{"@type": "Course",
"@id": "Course2",
"name": "That"},
{"@type": "Course",
"@id": "Course3",
"name": "The other"}
]
}
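The benefit of keeping position on the ListItem rather than on the course shows up as soon as a second program reuses the same courses. The sketch below is in Python with plain dictionaries; the second table of contents and the ordered_items helper are made up for illustration.

```python
def ordered_items(item_list):
    """Return the item identifiers of an ItemList, sorted by each ListItem's position."""
    elements = sorted(item_list["itemListElement"], key=lambda e: e["position"])
    return [e["item"] for e in elements]


# Table of contents for "My Program", deliberately stored out of order to
# show that the position values, not the array order, carry the sequence.
my_program_toc = {
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "ListItem", "item": "Course2", "position": 2},
        {"@type": "ListItem", "item": "Course1", "position": 1},
        {"@type": "ListItem", "item": "Course3", "position": 3},
    ],
}

# A hypothetical second program that reuses two of the same courses,
# with Course2 coming first this time.
other_program_toc = {
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "ListItem", "item": "Course2", "position": 1},
        {"@type": "ListItem", "item": "Course3", "position": 2},
    ],
}

print(ordered_items(my_program_toc))     # ['Course1', 'Course2', 'Course3']
print(ordered_items(other_program_toc))  # ['Course2', 'Course3']
```

Course2 is second in one sequence and first in the other, and no statement about Course2 itself had to change.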
or if you prefer to use built-in RDF constructs there is that @list option:
{ "@context": {
"@vocab": "http://schema.org/",
"ocx": "http://example.org/ocx/",
"@base": "http://example.org/resources/",
"ocx:hasToC": {"@container": "@list"}
},
"@type": "EducationalOccupationalProgram",
"@id": "Program",
"name": "My Program",
"ocx:hasToC": ["Course1", "Course2", "Course3"],
"hasCourse": [
{ "@id": "Course1",
"@type": "Course",
"name": "this"
},{
"@id": "Course2",
"@type": "Course",
"name": "that"
},{
"@id": "Course3",
"@type": "Course",
"name": "the other"
}]
}
When this is processed by something like the JSON-LD playground you will see that the list of values for hasToC is replaced by a set of statements about blank nodes, which say that this comes before the others:
<http://example.org/resources/Program> <ocx:hasToC> _:b0 .
_:b0 <rdf:first> <http://example.org/resources/Course1> .
_:b0 <rdf:rest> _:b1 .
_:b1 <rdf:first> <http://example.org/resources/Course2> .
_:b1 <rdf:rest> _:b2 .
_:b2 <rdf:first> <http://example.org/resources/Course3> .
_:b2 <rdf:rest> <rdf:nil> .
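Reading the order back out means following rdf:rest from node to node until you reach rdf:nil. Here is a minimal sketch in Python over a set of triples, with the prefixes written out in full; read_list is a hypothetical helper, and a real application would more likely use an RDF library or a SPARQL property path.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"
BASE = "http://example.org/resources/"
HAS_TOC = "http://example.org/ocx/hasToC"

# The statements above as (subject, predicate, object) tuples.
# Being a set, they are stored in no particular order.
graph = {
    ("_:b2", REST, NIL),
    ("_:b1", FIRST, BASE + "Course2"),
    (BASE + "Program", HAS_TOC, "_:b0"),
    ("_:b0", REST, "_:b1"),
    ("_:b2", FIRST, BASE + "Course3"),
    ("_:b0", FIRST, BASE + "Course1"),
    ("_:b1", REST, "_:b2"),
}


def read_list(triples, head):
    """Follow rdf:first / rdf:rest from the head node until rdf:nil."""
    lookup = {(s, p): o for s, p, o in triples}
    items = []
    node = head
    while node != NIL:
        items.append(lookup[(node, FIRST)])
        node = lookup[(node, REST)]
    return items


# Find the head of the list from the program's hasToC statement.
head = next(o for s, p, o in graph if s == BASE + "Program" and p == HAS_TOC)
toc = read_list(graph, head)
print(toc)
```

However the statements are stored, the chain of blank nodes gives back Course1, Course2, Course3 in that order.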
Conclusion
If you’ve made it this far you deserve the short summary advice. The title for this post was meant literally. Representing a record in RDF will break the record down into separate statements, each about one thing, each saying one thing, with the assumption that those statements are each valid on their own. In modelling for JSON-LD you need to make sure that everything you say about an object is true even when that object is separated from the rest of the record.
The post When RDF breaks records appeared first on Sharing and learning.