
Strings to things in context

from @ Sharing and learning

As part of work to convert plain JSON records to proper RDF in JSON-LD I often want to convert a string value to a URI that identifies a thing (real world concrete thing or a concept).

Simple string to URI mapping

Given a fragment of a schedule in JSON

{"day": "Tuesday"}

As well as converting "day" to a property in an RDF vocabulary, I might want to use a concept term for “Tuesday” drawn from that vocabulary. JSON-LD’s @context lets you do this: the @vocab keyword says which RDF vocabulary you are using for properties; the @base keyword says which base URL you are using for values that are URIs; the @id keyword maps a JSON key to an RDF property; and the @type keyword (when used in the @context object) says what type of value a property should have. The value of @type that says you’re using a URI is "@id" (confused by @id doing double duty? It gets worse). So:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "@base": "http://schema.org/",
    "day": {
       "@id": "dayOfWeek",
       "@type": "@id"
    }
  },
  "day": "Tuesday"
}

Pop this into the JSON-LD playground to convert it into N-Quads and you get:

_:b0 <http://schema.org/dayOfWeek> <http://schema.org/Tuesday> .

Cool.
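To make the roles of the four keywords concrete, here is a rough Python sketch of what the context above asks a processor to do for this one key/value pair. It is only an illustration under simplifying assumptions (the function name expand_pair is made up, and a real JSON-LD processor handles far more cases), not how the playground is implemented.

```python
import json
from urllib.parse import urljoin

# The same context as in the example above.
context = {
    "@vocab": "http://schema.org/",
    "@base": "http://schema.org/",
    "day": {"@id": "dayOfWeek", "@type": "@id"},
}

def expand_pair(key, value, ctx):
    """Return (property, object) in N-Quads style for one key/value pair."""
    term = ctx[key]
    # The term's @id names the property; it is prefixed with @vocab.
    prop = "<" + ctx["@vocab"] + term["@id"] + ">"
    if term.get("@type") == "@id":
        # "@type": "@id" means the value is a URI, resolved against @base.
        obj = "<" + urljoin(ctx["@base"], value) + ">"
    else:
        # Otherwise the value stays a plain string literal.
        obj = json.dumps(value)
    return prop, obj

prop, obj = expand_pair("day", "Tuesday", context)
print("_:b0", prop, obj, ".")
```

Dropping "@type": "@id" from the term definition would leave "Tuesday" as a quoted string rather than a URI.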

What type of thing is this?

The other place where you want to use URI identifiers is to say what type/class of thing you are talking about. Expanding our example a bit, we might have

{
  "type": "Schedule",
  "day": "Tuesday"
}

Trying the same approach as above, in the @context block we can use the @id keyword to map the JSON key "type" to the special value "@type", and use the @type keyword with the special value "@id" to say that the type of value expected is a URI, as we did to turn the string “Tuesday” into a schema.org URI. (I did warn you it got more confusing.) So:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "@base": "http://schema.org/",
    "type": {
       "@id": "@type",
       "@type": "@id"    
    },
    "day": {
       "@id": "dayOfWeek",
       "@type": "@id"
    }
  },
  "type": "Schedule",
  "day": "Tuesday"
}

Pop this into the JSON-LD playground, convert to N-Quads, and you get:

_:b0 <http://schema.org/dayOfWeek> <http://schema.org/Tuesday> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Schedule> .

As we want.

Mixing it up a bit

So far we’ve had just the one RDF vocabulary. Say we want to use terms from a variety of vocabularies. For the sake of argument, let’s say that no one vocabulary is more important than another, so we don’t want to use @vocab and @base to set global defaults. Adding another term from a custom vocab to our example:

{
  "type": "Schedule",
  "day": "Tuesday",
  "onDuty": "phil"
}

In the context we can set prefixes to use instead of full length URIs, but the most powerful feature is that we can use different @context blocks for each term definition to set different @base URI fragments. That looks like:

{
  "@context": {
    "schema": "http://schema.org/",
    "ex": "http://my.example.org/",
    "type": {
      "@id": "@type",
      "@type": "@id",
      "@context": {
        "@base": "http://schema.org/"
      }
    },
    "day": {
      "@id": "schema:dayOfWeek",
      "@type": "@id",
      "@context": {
        "@base": "http://schema.org/"
      }
    },
    "onDuty": {
      "@id": "ex:onDuty",
      "@type": "@id",
      "@context": {
        "@base": "https://people.pjjk.org/"
      }
    }
  },
  "type": "Schedule",
  "day": "Tuesday",
  "onDuty": "phil"
}

Translated by JSON-LD Playground that gives:

_:b0 <http://my.example.org/onDuty> <https://people.pjjk.org/phil> .
_:b0 <http://schema.org/dayOfWeek> <http://schema.org/Tuesday> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://json-ld.org/playground/Schedule> .

Hmmm. The first two lines look good: the JSON keys have been translated to URIs for properties from two different RDF vocabularies, and their string values have been translated to URIs for things with different bases. But that last line: the @base for the type isn’t being used, and instead the JSON-LD playground is using its own default. (JSON-LD resolves @type values against @vocab rather than @base, and here there is no @vocab, so the document’s own location is used.) That won’t do.

The fix for this seems to be not to give the @id keyword for type the special value of "@type", but rather treat it as any other term from an RDF vocabulary:

{
  "@context": {
    "schema": "http://schema.org/",
    "ex": "http://my.example.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "type": {
      "@id": "rdf:type",
      "@type": "@id",
      "@context": {
        "@base": "http://schema.org/"
      }
    },
    "day": {
      "@id": "schema:dayOfWeek",
      "@type": "@id",
      "@context": {
        "@base": "http://schema.org/"
      }
    },
    "onDuty": {
      "@id": "ex:onDuty",
      "@type": "@id",
      "@context": {
        "@base": "https://people.pjjk.org/"
      }
    }
  },
  "type": "Schedule",
  "day": "Tuesday",
  "onDuty": "phil"
}

Which gives:

_:b0 <http://my.example.org/onDuty> <https://people.pjjk.org/phil> .
_:b0 <http://schema.org/dayOfWeek> <http://schema.org/Tuesday> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Schedule> .

That’s better, though I do worry that the lack of a JSON-LD @type key might bother some.
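The effect of giving each term its own @base can be mimicked in a few lines of Python. This is a sketch only: the TERMS table is a hand-flattened, hypothetical view of the context above, and to_triples is a made-up helper, not part of any JSON-LD library.

```python
from urllib.parse import urljoin

# Hypothetical flattened view of the context above: each JSON key maps to
# a property URI and the base URI used to resolve its string value.
TERMS = {
    "type": ("http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
             "http://schema.org/"),
    "day": ("http://schema.org/dayOfWeek", "http://schema.org/"),
    "onDuty": ("http://my.example.org/onDuty", "https://people.pjjk.org/"),
}

def to_triples(record, subject="_:b0"):
    """Turn a flat JSON record into (subject, property, object) tuples."""
    triples = []
    for key, value in record.items():
        prop, base = TERMS[key]
        # Each value is resolved against the base belonging to its own key.
        triples.append((subject, prop, urljoin(base, value)))
    return triples

triples = to_triples({"type": "Schedule", "day": "Tuesday", "onDuty": "phil"})
for s, p, o in triples:
    print(f"{s} <{p}> <{o}> .")
```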

Extensions and Limitations

The nested context for a JSON key works even if the value is an object: it can be used to specify the @vocab and @base and any namespace prefixes used in the keys and values of the value object. That’s useful if title in one object is dc:title and title in another needs to be schema:title.

Converting string values to URIs for things like this is fine if the string happens to match the end of the URI that you want. So, while I can change the JSON key "author" into the property URI <https://www.wikidata.org/prop/direct/P50>, I cannot change the value string "Douglas Adams" into <https://www.wikidata.org/entity/Q42>. For that I think you need to use something a bit more flexible, like RML, but please comment if you know of a solution to that!
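One workaround that sits outside JSON-LD altogether is to rewrite such values with a lookup table before the record reaches the JSON-LD processor. The sketch below assumes a hand-made table; AUTHOR_IDS and link_author are hypothetical names used only for illustration, and this is a stand-in for a proper mapping tool rather than a replacement for one.

```python
# Hypothetical lookup table; in practice it might come from a
# reconciliation service or a hand-curated file.
AUTHOR_IDS = {
    "Douglas Adams": "https://www.wikidata.org/entity/Q42",
}

def link_author(record, lookup=AUTHOR_IDS):
    """Return a copy of record with a known author name replaced by a URI node."""
    linked = dict(record)
    name = record.get("author")
    if isinstance(name, str) and name in lookup:
        # {"@id": ...} is how JSON-LD refers to a thing by its URI.
        linked["author"] = {"@id": lookup[name]}
    return linked

linked = link_author({"author": "Douglas Adams"})
unknown = link_author({"author": "Someone Else"})
print(linked)
print(unknown)
```

Names missing from the table are left as plain strings, so nothing is silently lost.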

Also, let me know if you think the lack of a JSON-LD @type keyword, or anything else shown above seems problematic.

The post Strings to things in context appeared first on Sharing and learning.

JSON Schema for JSON-LD

from @ Sharing and learning

I’ve been working recently on defining RDF application profiles, defining specifications in JSON Schema, and converting specifications from a JSON Schema to an RDF representation. This has led to me thinking about, and having conversations with people about, whether JSON Schema can be used to define and validate JSON-LD. I think the answer is a qualified “yes”. Here’s a proof of concept; do me a favour and let me know if you think it is wrong.

Terminology might get confusing: I’m discussing JSON, RDF as JSON-LD, JSON Schema, RDF Schema and schema.org, which are all different things (go and look them up if you’re not sure of the differences).

Why JSON LD + JSON Schema + schema.org?

To my mind one of the factors in the big increase in visibility of linked data over the last few years has been the acceptability of JSON-LD to programmers familiar with JSON. Along with schema.org, this means that many people are now producing RDF-based linked data, often without knowing or caring that that is what they are doing. One of the things that seems to make their life easier is JSON Schema (once they figure it out). The replies to a question from @apiEvangelist give some hints at why and how.

Also, one specification organization I am working with publishes its specs as JSON Schema. We’re working with them on curating a specification that was created as RDF and is defined in RDF Schema, and often serialized in JSON-LD. Hence the thinking about what happens when you convert a specification from RDF Schema to JSON Schema —  can you still have instances that are linked data? can you mandate instances that are linked data? if so, what’s the cost in terms of flexibility against the original schema and against what RDF allows you to do?

Another piece of work that I’m involved in is the DCMI Application Profile Interest Group, which is looking at a simple way of defining application profiles, i.e. selecting which terms from RDF vocabularies are to be used, and defining any additional constraints, to meet the requirements of some application. There already exist some not-so-simple ways of doing this, geared to validating instance data, and native to the W3C Semantic Web family of specifications: ShEx and SHACL. Through this work I also got wondering about JSON Schema. Sure, wanting to use JSON Schema to define an RDF application profile may seem odd to anyone well versed in RDF and W3C Semantic Web recommendations, but I think it might be useful to developers who are familiar with JSON but not Linked Data.

Can JSON Schema define valid JSON-LD?

I’ve heard some organizations have struggled with this, but it seems to me (until someone points out what I’ve missed) that the answer is a qualified “yes”. Qualifications first:

  • JSON Schema doesn’t define the semantics of RDF terms. RDF Schema defines RDF terms, and the JSON-LD context can map keys in JSON instances to these RDF terms, and hence to their definitions.
  • Given definitions of RDF terms, it is possible to create a JSON Schema such that any JSON instance that validates against it is a valid JSON-LD instance conforming to the RDF specification.
  • Not all valid JSON-LD representations of the RDF will validate against the JSON Schema. In other words the JSON Schema will describe one possible serialization of the RDF in JSON-LD, not all possible serializations. In particular, links between entities in an @graph array are difficult to validate.
  • If you don’t have an RDF model for your data to start with, it’s going to be more difficult to get to RDF.
  • If the spec you want to model is very flexible, you’ll have difficulty making sure instances don’t flex it beyond breaking point.
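The third qualification is easy to see with two small documents that should describe the same RDF triple. The sketch below is only an illustration: the example.org identifier is made up, and has_required_keys is a crude stand-in for a schema's "required" keyword rather than a JSON Schema validator.

```python
# Two JSON-LD documents that should describe the same RDF triple.
compact = {
    "@context": {"@vocab": "http://schema.org/"},
    "@id": "http://example.org/book",  # hypothetical identifier
    "name": "A book",
}
expanded = {
    "@id": "http://example.org/book",
    "http://schema.org/name": "A book",  # full property URI as the key
}

def has_required_keys(instance, required=("@context", "@id", "name")):
    """Crude stand-in for the "required" keyword of a JSON Schema."""
    return all(key in instance for key in required)

print(has_required_keys(compact))
print(has_required_keys(expanded))
```

A schema written around the first shape rejects the second, even though both are legitimate JSON-LD.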

But, given the limited ambition of the exercise, that is “can I create a JSON Schema so that any data it passes as valid is valid RDF in JSON-LD?”, those qualifications don’t put me off.

Proof of concept examples

My first hint that this seems possible came when I was looking for a tool to use when working with JSON Schema and found this online JSON Schema Validator.  If you look at the “select schema” drop down and scroll a long way, you’ll find a group of JSON Schema for schema.org. After trying a few examples of my own, I have a JSON Schema that will (I think) only validate JSON instances that are valid JSON-LD based on notional requirements for describing a book (switch branches in github for other examples).

Here are the rules I made up and how they are instantiated in JSON Schema.

First, the “@context” sets the default vocabulary to schema.org and allows nothing else:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "context.json",
  "name": "JSON Schema for @context using schema.org as base",
  "description": "schema.org is the default vocabulary; nothing else is allowed",
  "type": "object",
  "additionalProperties": false,
  "required": [ "@vocab" ],
  "properties": {
    "@vocab": {
      "type": "string",
      "format": "regex",
      "pattern": "^http://schema.org/$",
      "description": "required: schema.org is base ns"
    }
  }
}

This is super-strict: it allows no variations on "@context": {"@vocab": "http://schema.org/"}, which obviously precludes doing a lot of things that RDF is good at, notably using more than one namespace. It’s not difficult to create looser rules, for example mandate schema.org as the default vocabulary but allow some or any others. Eventually you create enough slack to allow invalid linked data (e.g. using namespaces that don’t exist; using terms from the wrong namespace), and I promised you only valid linked data would be allowed. In real life, there would be a balance between permissiveness and reliability.

Rule 2: the book ids must come from wikidata:

{
 "$schema": "http://json-schema.org/draft-07/schema#",
 "$id": "wd_uri_schema.json",
 "name": "Wikidata URIs",
 "description": "regexp for Wikidata URIs, useful for @id of entities",
 "type": "string",
 "format": "regex",
 "pattern": "^https://www.wikidata.org/entity/Q[0-9]+$"
}

Again, this could be less strict, e.g. to allow ids to be any http or https URI.
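How strict a rule like this really is depends on its anchors, because JSON Schema's pattern keyword matches anywhere in the string unless the expression is anchored. The sketch below uses Python's re.search as an approximation of that behaviour (JSON Schema regular expressions follow the ECMAScript dialect, so details can differ); the test strings are made up.

```python
import re

# Same expression with and without an end anchor (and with escaped dots).
loose = r"^https://www.wikidata.org/entity/Q[0-9]+"
strict = r"^https://www\.wikidata\.org/entity/Q[0-9]+$"

good = "https://www.wikidata.org/entity/Q42"
bad = "https://www.wikidata.org/entity/Q42/../../somewhere-else"

# re.search, like "pattern", succeeds if the expression matches any part
# of the string, so only explicit anchors force a whole-string match.
loose_accepts_bad = re.search(loose, bad) is not None
strict_accepts_bad = re.search(strict, bad) is not None
strict_accepts_good = re.search(strict, good) is not None

print(loose_accepts_bad, strict_accepts_bad, strict_accepts_good)
```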

Rule 3: the resource described is a schema.org/Book, for which the following fragment serves:

    "@type": {
      "name": "The resource type",
      "description": "required and must be Book",
      "type": "string",
      "format": "regex",
      "pattern": "^Book$"
    }

You could allow other options, and you could allow multiple types, maybe with one type mandatory (I have an example schema for Learning Resources which requires an array of types that must include LearningResource).

Rules 4 & 5: the book’s name and description are strings:

    "name": {
      "name": "title of the book",
      "type": "string"
    },
    "description": {
      "name": "description of the book",
      "type": "string"
    },

Rule 6, the URL for the book (i.e. a link to a webpage for the book) must be an http[s] URI:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http_uri_schema.json",
  "name": "URI @ids",
  "description": "required: @id or url is a http or https URI",
  "type": "string",
  "format": "regex",
  "pattern": "^http[s]?://.+"
}

Rule 7, for the author we describe a schema.org/Person, with a wikidata id, a familyName and a givenName (which are strings), and optionally with a name and description, and with no other properties allowed:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "person_schema.json",
  "name": "Person Schema",
  "description": "required and allowed properties for a Person",
  "type": "object",
  "additionalProperties": false,
  "required": ["@id", "@type", "familyName", "givenName"],
  "properties": {
    "@id": {
      "description": "required: @id is a wikidata entity URI",
      "$ref": "wd_uri_schema.json"
    },
    "@type": {
      "description": "required: @type is Person",
      "type": "string",
      "format": "regex",
      "pattern": "^Person$"
    },
    "familyName": {
      "type": "string"
    },
    "givenName": {
      "type": "string"
    },
    "name": {
      "type": "string"
    },
    "description": {
      "type": "string"
    }
  }
}

The restriction on other properties is, again, simply to make sure no one puts in any properties that don’t exist or aren’t appropriate for a Person.

The subject of the book (the about property) must be provided as wikidata URIs, with optional @type, name, description and url; there may be more than one subject for the book, so this is an array:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "about_thing_schema.json",
  "name": "About Thing Schema",
  "description": "Required and allowed properties for a Thing being used to say what something is about.",
  "type": "array",
  "minItems": 1,
  "items": {
    "type": "object",
    "additionalProperties": false,
    "required": ["@id"],
    "properties": {
      "@id": {
        "description": "required: @id is a wikidata entity URI",
        "$ref": "wd_uri_schema.json"
      },
      "@type": {
        "description": "optional: @type is from top two tiers in schema.org type hierarchy",
        "type": "array",
        "minItems": 1,
        "uniqueItems": true,
        "items": {
          "type": "string",
          "enum": [
            "Thing",
            "Person",
            "Event",
            "Intangible",
            "CreativeWork",
            "Organization",
            "Product",
            "Place"
          ]
        }
      },
      "name": {
        "type": "string"
      },
      "description": {
        "type": "string"
      },
      "url": {
        "$ref": "http_uri_schema.json"
      }
    }
  }
}

Finally, bring all the rules together, making the @context, @id, @type, name and author properties mandatory; about, description and url are optional; no others are allowed.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "book_schema.json",
  "name": "JSON Schema for schema.org Book",
  "description": "Attempt at a JSON Schema to create valid JSON-LD descriptions. Limited to using a few schema.org properties.",
  "type": "object",
  "required": [
    "@context",
    "@id",
    "@type",
    "name",
    "author"
  ],
  "additionalProperties": false,
  "properties": {
    "@context": {
      "name": "JSON Schema for @context using schema.org as base",
      "$ref": "./context.json"
    },
    "@id": {
      "name": "wikidata URIs",
      "description": "required: @id is from wikidata",
      "$ref": "./wd_uri_schema.json"
    },
    "@type": {
      "name": "The resource type",
      "description": "required and must be Book",
      "type": "string",
      "format": "regex",
      "pattern": "^Book$"
    },
    "name": {
      "name": "title of the book",
      "type": "string"
    },
    "description": {
      "name": "description of the book",
      "type": "string"
    },
    "url": {
      "name":"The URL for information about the book",
      "$ref": "./http_uri_schema.json"
    },
    "about": {
      "name":"The subject or topic of the book",
      "oneOf": [
        {"$ref": "./about_thing_schema.json"},
        {"$ref": "./wd_uri_schema.json"}
      ]
    },
    "author": {
      "name":"The author of the book",
      "$ref": "./person_schema.json"
    }
  }
}

I’ve allowed the subject (about) to be given as an array of wikidata entity link/descriptions (as described above) or a single link to a wikidata entity; which hints at how similar flexibility could be built in for other properties.

Testing the schema

I wrote a python script (running in a Jupyter Notebook) to test that this works:

import json
from os.path import abspath

# RefResolver is deprecated since jsonschema 4.18 (in favour of the
# referencing library) but still resolves the relative $refs used here.
from jsonschema import validate, ValidationError, SchemaError, RefResolver

schema_fn = "book_schema.json"
valid_json_fn = "book_valid.json"
invalid_json_fn = "book_invalid.json"  # load this instead to see a failure

# Relative $refs (e.g. "./person_schema.json") resolve against this directory.
base_uri = 'file://' + abspath('') + '/'

with open(schema_fn, 'r') as schema_f:
    schema = json.load(schema_f)
with open(valid_json_fn, 'r') as valid_json_f:
    valid_json = json.load(valid_json_f)

resolver = RefResolver(referrer=schema, base_uri=base_uri)
try:
    validate(valid_json, schema, resolver=resolver)
    print("the instance is valid")
except SchemaError as e:
    print("there was a schema error")
    print(e.message)
except ValidationError as e:
    print("there was a validation error")
    print(e.message)

Or more conveniently for the web (and sometimes with better messages about what failed), there’s the JSON Schema Validator I mentioned above. Put this in the schema box on the left to pull in the JSON Schema for Books from my github:

{
  "$ref": "https://raw.githubusercontent.com/philbarker/lr_schema/book/book_schema.json"
}

And here’s a valid instance:

{
  "@context": {
    "@vocab": "http://schema.org/"
  },
  "@id": "https://www.wikidata.org/entity/Q3107329",
  "@type": "Book",
  "name": "Hitchhikers Guide to the Galaxy",
  "url": "http://example.org/hhgttg",
  "author": {
    "@type": "Person",
    "@id": "https://www.wikidata.org/entity/Q42",
    "familyName": "Adams",
    "givenName": "Douglas"
  },
  "description": "...",
  "about": [
    {"@id": "https://www.wikidata.org/entity/Q3"},
    {"@id": "https://www.wikidata.org/entity/Q1"},
    {"@id": "https://www.wikidata.org/entity/Q2165236"}
  ]
}

Have a play, see what you can break; let me know if you can get anything that isn’t valid JSON-LD to validate.
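As a starting point, here is a small Python sketch that derives three deliberately broken instances from a trimmed copy of the valid one, each violating a single rule of the book schema. The function name broken_variants and the publisher value are made up for illustration; the output can be pasted into the validator or saved as book_invalid.json for the script above.

```python
import copy
import json

# Trimmed copy of the valid instance shown above.
valid = {
    "@context": {"@vocab": "http://schema.org/"},
    "@id": "https://www.wikidata.org/entity/Q3107329",
    "@type": "Book",
    "name": "Hitchhikers Guide to the Galaxy",
    "author": {
        "@type": "Person",
        "@id": "https://www.wikidata.org/entity/Q42",
        "familyName": "Adams",
        "givenName": "Douglas",
    },
}

def broken_variants(instance):
    """Return copies of instance, each breaking one rule of the book schema."""
    missing_author = copy.deepcopy(instance)
    del missing_author["author"]  # "author" is listed in "required"

    wrong_type = copy.deepcopy(instance)
    wrong_type["@type"] = "Movie"  # the pattern for @type is ^Book$

    extra_property = copy.deepcopy(instance)
    extra_property["publisher"] = "Example Press"  # additionalProperties is false

    return {
        "missing_author": missing_author,
        "wrong_type": wrong_type,
        "extra_property": extra_property,
    }

variants = broken_variants(valid)
for label, doc in variants.items():
    print(label)
    print(json.dumps(doc, indent=2))
```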

The post JSON Schema for JSON-LD appeared first on Sharing and learning.