Tag Archives: semantic technologies

JDX: a schema for Job Data Exchange⤴

from @ Sharing and learning

[This rather long blog post describes a project that I have been involved with through consultancy with the U.S. Chamber of Commerce Foundation.  Writing this post was funded through that consultancy.]

The U.S. Chamber of Commerce Foundation has recently proposed a modernized schema for job postings based on the work of HR Open and Schema.org: the Job Data Exchange (JDX) JobSchema+. It is hoped that JDX JobSchema+ will not just facilitate the exchange of data relevant to jobs, but will do so in a way that helps bridge the various other standards used by relevant systems. The aim of JDX is to improve the usefulness of job data, including signalling around jobs, by addressing such questions as: what jobs are available in which geographic areas? What are the requirements for working in these jobs? What are the rewards? What are the career paths? This information needs to be communicated not just between employers and their recruitment partners and to potential job applicants, but also to education and training providers, so that they can create learning opportunities that provide their students with skills that are valuable in their future careers.

Job seekers empowered with a greater quantity and quality of job data through job postings may secure better-fitting employment faster, and keep it for longer, because of improved matching. Preventing wasted time and hardship may be particularly impactful for populations whose job searches are less well resourced, and for those whose limited flexibility increases their dependence on job details that are often missing, such as schedule, exact location, and security clearance requirements. These are among the properties that JDX gives employers the opportunity to include so that they can be identified easily and quickly by all. In short, the data should be available to anyone involved in the talent pipeline. This broad scope poses a problem that JDX also seeks to address: different systems within the talent pipeline data ecosystem use different data standards, so how can we ensure that the signalling is intelligible across the whole ecosystem?

The starting point for JDX was two of the most widely used data standards relevant to describing jobs: the HR Open Standards Recruiting standard, part of the foremost suite of standards covering all aspects of the HR sector, and the schema.org JobPosting schema, which is used to make data on web pages accessible to search engines, notably Google’s Job Search. These, along with an analysis of the information required around jobs, job descriptions and job postings and their relationships to other entities such as organizations, competencies, credentials and experience, were modelled in RDF to create a vocabulary of classes, properties, and concept schemes that can be used to create data. The full data model, which can be accessed on GitHub, is quite extensive: the description of jobs that JDX enables goes well beyond what is required for a job posting advertising a vacancy. A subset of the full model comprising those terms useful for job postings was selected for pilot testing; this subset is available in a more accessible form on the Chamber Foundation’s website and is documented on the Job Data Exchange website. The results of the data analysis, modelling and piloting were then fed back into the HR Open and schema.org standards that were used as a starting point.

This is where things start to get a little complicated, as it means JDX has contributed to three related efforts.

JobPostings in schema.org

The modelling and piloting highlighted and addressed some issues that fall within schema.org’s scope of enabling the provision of structured data about job postings on the web. These were discussed through a W3C Community Group on Talent Marketplace Signalling, and the solutions were reconciled with schema.org’s wider model and its scope as a web-wide vocabulary that covers many other types of thing apart from jobs. As a result, schema.org/JobPosting has several new properties (or modifications to how existing properties are used) allowing for such things as: a job posting with more than one vacancy, a job posting with a specified start date, a job posting with requirements other than competencies (i.e. physical, sensory and security clearance requirements), and more specific information about contact details and about where the job being advertised sits within the company structure.
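To make this concrete, here is a minimal sketch of a JSON-LD job posting using some of these newer schema.org properties; the values are invented for illustration and this is not taken from the JDX pilot:

{
  "@context": "http://schema.org/",
  "@type": "JobPosting",
  "title": "Laboratory Technician",
  "totalJobOpenings": 3,
  "jobStartDate": "2021-09-01",
  "employmentUnit": {
    "@type": "Organization",
    "name": "Quality Control Department"
  },
  "physicalRequirement": "Able to lift 20kg",
  "securityClearanceRequirement": "None required",
  "applicationContact": {
    "@type": "ContactPoint",
    "email": "recruitment@example.com"
  }
}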

Because schema.org and JDX are both modelled in RDF as sets of terms that can be used to make independent statements about entities (rather than a record-based model such as XML documents) it was relatively easy to add terms to schema.org that were based on those in JDX. The only reason that the terms added to schema.org are not exactly the same as the terms in JDX JobSchema+ is that it was sometimes necessary to take into account already existing properties in schema.org, and the wider purpose and different audience of schema.org.

JDX in HROpen

As with schema.org, JDX highlighted some issues that are within the scope of the HROpen Standards Recruiting standard, and the aim is to incorporate the lessons learnt from JDX into that standard. However, the Recruiting standard is part of the inter-linked suite of specifications that HROpen maintains across all aspects of the HR domain, and these standards are in plain JSON, a record-based format specified through JSON-Schema files, not RDF Schema. This makes integration of new terms and modelling approaches from JDX into HROpen more complicated than was the case with schema.org. As a first step the property definitions have been translated into JSON-Schema and partially integrated into the suite of HROpen standards; however, some of the structures, for example those for describing Organizations, were significantly different from how other HROpen standards treat the same types of entity, and so these were kept separate. The plan for the next phase is to further integrate JDX into the existing standards, enhance the use cases and documentation, and include RDF, JSON Schema, and XML XSD.

JDX JobPosting+ RDF Schema

Finally, of course, JDX still exists as an RDF schema, currently on GitHub. The work on integration with HROpen surfaced some errors and other issues, which have been addressed. Likewise, feeding back into schema.org JobPosting means that there are new relationships between terms in JDX and schema.org that can be encoded in the JDX schema. There is also potential for other changes and remodelling as a result of findings from the JDX pilot of job postings. But given the progress made with integrating lessons learnt into schema.org and the HROpen Recruiting standard, what is the role of the RDF schema compared to these other two?

Standard Strengths and Interoperability

Each of the three standards has strengths in its own niche. Schema.org provides a widely scoped vocabulary, mostly used for disseminating information on the open web. The most obvious consumers of data that use terms from schema.org are search engines trying to make sense of text in web pages, so that they can signal the key aspects of job postings with less ambiguity than can easily be done by processing natural text. Of course such data is also useful for any system that tries to extract data from webpages. Schema.org is also widely used as a source of RDF terms for other vocabularies; after all, it doesn’t make much sense for every standard to define its own version of a property for the name of the thing being described, or a textual description of it—more on this below in the discussion of harmonization.

HROpen Standards are designed for system-to-system interoperability within the HR domain. If organization A and organization B (not to mention organizations C through to Z) have systems that do the same sort of thing with the same sort of data, then using an agreed standard for the data they care about clearly brings efficiencies by allowing for systems to be designed to a common specification and for organizations to share data where appropriate. This is the well understood driving force for interoperability specifications.

it is useful to have a common set of “terms” from which data providers can pick and choose what is appropriate for communicating different aspects of what they care about

But what about when two organizations are using the same sort of data for different things? For example, it might be that they are part of different verticals which interact with each other but have significant differences aside from where they overlap; or it might be that one organization provides a horizontal service, such as web search, across several verticals. This is where it is useful to have a common set of “terms” from which data providers can pick and choose what is appropriate for communicating different aspects of what they care about to those who provide services that intersect or overlap with their own concern. For example a fully worked specification for learning outcomes in education would include much that is not relevant to the HR domain and much that overlaps; furthermore HR and education providers use different systems for other aspects of their work: HR will care about integration with payroll systems, education about integration with course management systems. There is no realistic prospect that the same data standards can be used to the extent that the record formats will be the same; however with the RDF approach of entity-focused description rather than defining a single record structure, there is no reason why some of the terms that are used to describe the HR view of competency shouldn’t also be used to describe the education view of learning outcomes. Schema.org provides a broad horizontal layer of RDF terms that can be used across many domains; JDX provides a deeper dive into the more specific vocabulary used in jobs data.

Data Harmonization

This approach to allowing mutual intelligibility between data standards in different domains to the extent that the data they care about overlaps (or, for that matter, competing data standards in the same domain) is known as data harmonization. RDF is very much suited to harmonization for these reasons:

  • its entity-based modelling approach does not pre-impose the notion of data requirements or inter-relationships between data elements in the way that a record-based modelling approach does;
  • in the RDF data community it is assumed that different vocabularies of terms (classes and properties for describing aspects of a resource) and concepts (providing the means to classify resources) will be developed in such a way that someone can mix and match terms from relevant vocabularies to describe all the entities that they care about; and
  • as it is assumed that there will be more than one relevant vocabulary it has been accepted that there will be related terms in separate vocabularies, and so the RDF schema that describe these vocabularies should also describe these relationships.

JDX was designed in the knowledge that it overlaps with schema.org. For example, JDX deals with describing organizations (which offer jobs) and with things that have names, and so does schema.org. It is not necessary for JDX to define its own class of Organization or its own name property: it simply uses the class and property defined by schema.org. That means that any data that conforms to the JDX RDF schema automatically includes some data that conforms with schema.org. There is no need to extract and transform RDF data before loading it when the modelling approach and vocabularies used are the same in the first place.
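As a sketch of that pattern (the jdx: namespace URI and the property choices below are placeholders for illustration, not the published JDX schema), a posting described with a JDX-specific class can simply reuse the schema.org class and property:

@prefix sdo: <http://schema.org/> .
@prefix jdx: <http://example.org/jdx/terms/> .   # placeholder namespace, not the real JDX URI

<http://example.org/postings/123> a jdx:JobPosting ;       # JDX-specific class
    sdo:name "Data Analyst" ;                               # schema.org property reused as-is
    sdo:hiringOrganization <http://example.org/orgs/acme> .

<http://example.org/orgs/acme> a sdo:Organization ;        # schema.org class reused as-is
    sdo:name "Acme Corp" .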

Sometimes the match in terminology isn’t so good. At some point in the future we might, for example, be prepared to say that everything JDX calls a JobPosting is something that schema.org calls a JobPosting and vice versa. In this case we could add to the JDX schema a declaration that these are equivalent classes. In other cases we might say that some class of things in JDX form a subset of what schema.org has grouped as a class, in which case we could add to the JDX schema a declaration that the JDX class is a subclass of the schema.org class. Similar declarations can be made about properties.
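In RDF Schema and OWL such declarations are single statements. The snippet below only illustrates their shape; the term names and the jdx: namespace are placeholders, and JDX has not necessarily made these particular assertions:

@prefix jdx:  <http://example.org/jdx/terms/> .   # placeholder namespace
@prefix sdo:  <http://schema.org/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# if everything JDX calls a JobPosting is a schema.org JobPosting and vice versa
jdx:JobPosting owl:equivalentClass sdo:JobPosting .

# if a JDX class picks out a subset of a broader schema.org class
jdx:PhysicalRequirement rdfs:subClassOf sdo:DefinedTerm .

# and similarly for properties
jdx:jobStartDate rdfs:subPropertyOf sdo:jobStartDate .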

by querying the data provided about things along with information about relationships between the data terms used we can achieve interoperability across data provided in different data standards

The reason why this is useful is that RDF schema are written in RDF, and RDF data includes links to the definitions of the terms in the schema, so data about jobs and organizations and all the other entities described with JDX can be in a data store linked to the definitions of the terms used to describe them. These definitions can link to other definitions of related terms, all accessible for querying. This is linked data at the schema level. For a long time we have referred to this network of data along with definitions, seen as sprawling across the internet, as the Semantic Web; more recently it has been found useful for datastores to be more focused, and the result of data about a domain along with the schema for those data is now commonly known as a knowledge graph. What matters is the consequence: by querying the data provided about things, along with information about relationships between the data terms used, we can achieve interoperability across data provided in different data standards. If a query system knows that some data relates to what JDX calls a JobPosting (because the data links to the JDX schema), and that everything JDX calls a JobPosting schema.org also calls a JobPosting (let’s say this is declared in the schema), then when asked about schema.org JobPostings the query system knows it can return information about JDX JobPostings. RDF data management systems do this routinely and, for the end user, transparently.
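For example, assuming an equivalence declaration like the placeholder one sketched above is loaded into the same store as the data and inferencing is switched on, a query phrased entirely in schema.org terms would also return resources that were only ever typed as JDX JobPostings:

PREFIX sdo: <http://schema.org/>

SELECT ?posting ?name WHERE {
  ?posting a sdo:JobPosting ;   # also matches jdx:JobPosting data via the equivalentClass declaration
           sdo:name ?name .
}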

That’s lovely if your data is in RDF; what if it is not? Most system-to-system interoperability standards don’t use RDF. This is the problem taken on by the  Data Ecosystem Schema Mapper (DESM) Tool. The approach it takes is to create local RDF schema describing the classes, properties and classifications used in these standards. The local RDF schema can assert equivalences between the RDF terms corresponding to each standard, or from each standard to an appropriate formal RDF vocabulary such as JDX.  Data can then be extracted from the record formats used and expressed as RDF using technologies such as the RDF Mapping Language (RML). This would allow us to build knowledge graphs that draw on data provided in existing systems, and query them without knowing what format or standard the data was originally in. For example, an employer could publish data in JSON using HR Open Standards’ Recruiting Standard. This data could be translated to the RDF representation of the standard created with the DESM Tool. Relationships expressed in the schema for the RDF representation would allow mapping of some or all of the data to JDX JobSchema+, schema.org JobPosting and other relevant standards. (The other standards may cover only part of the data, for example mapping skills requirements to standards used for competencies as learning objectives in the education domain.) This provides a route to translating data between standards that cover the same ground, and also provides data that can link to other domains.

Acknowledgements

Stuart Sutton, of Sutton & Associates, led the creation of the JDX JobSchema+ and originated many of the ideas described in this blog post.

Many thanks to people who commented on drafts of this post, including Stuart Sutton, Danielle Saunders, Jeanne Kitchens, Joshua Westfall, Kim Bartkus. Any errors remaining are my fault.

Writing this post was part of work funded by the U.S. Chamber of Commerce Foundation.

The post JDX: a schema for Job Data Exchange appeared first on Sharing and learning.

Harmonizing Learning Resource Metadata (even LOM)⤴

from @ Sharing and learning

The best interoperability is interoperability between standards. I mean, it’s one thing for you and I to agree to use the same standard in the same way to do the same thing, but what if we are doing slightly different things? What if Dublin Core is right for you, but schema.org is right for me–does that mean we can’t exchange data? That would be a shame, as one standard to rule them all isn’t a viable solution. This has been exercising me through a couple of projects that I have worked on recently, and what I’ll show here is a demo based on the ideas from one of these (the T3 Data Ecosystem Mapping Project) applied to another, where learning resource metadata is available in many formats and desired in others. In this post I focus on metadata available as IEEE Learning Object Metadata (LOM) but wanted in either schema.org or DCAT.

The Problem

Interoperability in spite of diverse standards being used seems an acute problem when dealing with metadata about learning resources. It makes sense to use existing (diverse) schema for describing books, videos, audio, software etc., supplemented with just enough metadata about learning to describe those things when they are learning resources (textbooks, instructional videos etc.). This is the approach taken by LRMI. Add to this the neighbouring domains with which learning resource metadata needs to connect, e.g. research outputs, course management, learner records, curriculum and competency frameworks, job adverts…, all of which have their own standards ecosystems, and perhaps you see why interoperability across standards is desirable.

(Aside: it often also makes sense to use a large all-encompassing standard like schema.org as well as more specialized standards, which is why LRMI terms are in schema.org.)

This problem of interoperability in an ecosystem of many standards was addressed by Mikael Nilsson in his PhD thesis “From Interoperability to Harmonization in Metadata Standardization”, where he argued that syntax wasn’t too important: what mattered more was the abstract model. Specifically, he argued that interoperability (or harmonization) was possible between specs that used the RDF entity-based metamodel but less easy between specs that used a record-like metamodel. IEEE Learning Object Metadata is largely record-like: a whole range of different things are described in one record, and the meanings of many elements depend on the context of the element in the record, and sometimes on the values of other elements in the same record. Fortunately it is possible to identify LOM elements that are independent characteristics of an identified entity, which means it is possible to represent some LOM metadata in RDF. Then it is possible to map that RDF representation to terms from other vocabularies.

Step 1: RML to Map LOM XML to a local RDF vocabulary

RML is the RDF Mapping Language, “a language for expressing customized mappings from heterogeneous data structures and serializations to the RDF data model … and to RDF datasets”. It does so through a set of RDF statements in Turtle syntax that describe the mapping from (in my case, here) XML fragments specified as XPath strings to subjects, predicates and objects. There is a parser called RMLMapper that will then execute the mapping to transform the data.

My LOM data came from the Lifewatch training catalogue, which has a nice set of APIs allowing access to sets of the metadata. Unfortunately the LOM XML provided deviates from the LOM XML Schema in many ways, such as element names with underscore separators rather than camel case (so element_name, not elementName) and some nesting errors, so the RML I produced won’t work on other LOM XML instances.

Here’s a fragment of the RML, to give a flavour. The whole file is on github, along with other files mentioned here:

 
<#Mapping> a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "lifewatch-lom.xml" ;
    rml:referenceFormulation ql:XPath ;
    rml:iterator "/lom/*"
  ];
  rr:subjectMap [
    rr:template "http://pjjk.local/resources/{external_id}" ;
    rr:class lom:LearningObject
  ] ;
  rr:predicateObjectMap [
    rr:predicate lom:title;
    rr:objectMap [
      rml:reference "general/title/langstring";
      rr:termType rr:Literal;
    ]
  ] ;
 #etc...
 .

I have skipped the prefix declarations and jumped straight to the part of the mapping that specifies the source file for the data, and the XPath of the element to iterate over in creating new entities. The subjectMap generates an entity identifier using a non-standard element in the LOM record appended to a local URI, and assigns this a class. After that, a series of predicateObjectMaps specify predicates and where in the XML to find the values to use as objects. Running this through the mapper generates RDF descriptions, such as:

<http://pjjk.local/resources/DSEdW5uVa2> a lom:LearningObject;
  lom:title "Research Game";
#etc...

Again I have omitted the namespaces; the full file, all statements for all resources, is on github.

Step 2: Describe the mappings in RDF

You’ll notice that lom: namespace in the mapping and generated instance data. That’s not a standard RDF representation of IEEE LOM; it’s a local schema that defines the relationships of some of the terms mapped from IEEE LOM to more standard schema. The full file is on github, but again, here’s a snippet:

lom:LearningObject a rdfs:Class ;
  owl:equivalentClass sdo:LearningResource , lrmi:LearningResource ;
  rdfs:subClassOf dcat:Resource .

lom:title a rdfs:Property ;
  rdfs:subPropertyOf sdo:name ;
  owl:equivalentProperty dcterms:title .

This is where the magic happens. This is the information that later allows us to use the metadata extracted from LOM records as if it were schema.org or LRMI, Dublin Core or DCAT. Because this schema is used locally only, I haven’t bothered to put in much information about the terms other than their mapping to other more recognized terms. The idea here isn’t to be able to work with LOM in RDF; the idea is to take the data from LOM records and work with it as if it were from well defined RDF metadata schema. I also haven’t worried too much about follow-on consequences that may derive from the mappings that I have made, i.e. implied statements about relationships between terms in other schema, such as the implication that if lom:title is equivalent to dcterms:title, and also a subProperty of schema.org/name, then I am saying that dcterms:title is a subProperty of schema.org/name. This mapping is for local use: I’ll assert what is locally useful, and if you disagree that’s fine, because you won’t be affected by my local assertions.

Just to complete the schema picture, I also have RDF schema definitions files for Dublin Core Terms, LRMI, DCAT and schema.org.

(Aside: I also created some SKOS Concept Schemes for controlled vocabularies used in the LOM records, but they’re not properly working yet.)

Step 3: Build a Knowledge Graph

(Actually I just put all the schema definitions and the RDF representation of the LOM metadata into a triple store, but calling it a knowledge graph gets people’s attention.) I use a local install of Ontotext GraphDB (free version). It’s important when initializing the repository to choose a ruleset that allows lots of inferencing: I use the OWL-MAX option. Also, it’s important when querying the data to select the option to include inferred results.

SPARQL interface for GraphDB showing option to include inferred data

Step 4: Results!

The data can now be queried with SPARQL. For example, a simple query to check what’s there:

PREFIX lom: <http://ns.pjjk.local/lom/terms/>

SELECT ?r ?t { 
    ?r a lom:LearningObject ;
    lom:title ?t .  
}

This produces a list of URIs & titles for the resources:

r,t
http://pjjk.local/resources/DSEdW5uVa2,Research Game
http://pjjk.local/resources/FdW84TkcrZ,Alien and Invasive Species showcase
http://pjjk.local/resources/RcwrBMYavY,EcoLogicaCup
http://pjjk.local/resources/SOFHCa8sIf,ENVRI gaming
http://pjjk.local/resources/Ytb7016Ijs,INTERNATIONAL SUMMER SCHOOL Data FAIRness in Environmental & Earth Science Infrastructures: theory and practice
http://pjjk.local/resources/_OhX8O6YwP,MEDCIS game
http://pjjk.local/resources/kHhx9jiEZn,PHYTO VRE guidelines
http://pjjk.local/resources/wABVJnQQy4,Save the eel
http://pjjk.local/resources/xBFS53Iesg,ECOPOTENTIAL 4SCHOOLS 

Nothing here other than what I put in that was converted from the LOM XML records.

More interestingly, this produces the same:

PREFIX sdo: <http://schema.org/> 

Select ?r ?n { 
    ?r a sdo:LearningResource ;
    sdo:name ?n ;
}

This is more interesting because it shows a query using schema.org terms yielding results from metadata that came from LOM records.

If you prefer your metadata in DCAT, with a little added LRMI to describe the educational characteristics, this query:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX lrmi: <http://purl.org/dcx/lrmi-terms/>

CONSTRUCT {
  ?r a dcat:Resource ;
  dcterms:title ?t ;
  dcat:keywords ?k ;
  lrmi:educationalLevel ?l .
} WHERE {
  ?r a dcat:Resource ;
  dcterms:title ?t ;
  dcat:keywords ?k ;
  lrmi:educationalLevel ?l .
}

will return a graph like this:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix lrmi: <http://purl.org/dcx/lrmi-terms/> .

<http://pjjk.local/resources/DSEdW5uVa2> a dcat:Resource ;
	dcterms:title "Research Game" ;
	dcat:keywords "competition" ;
	lrmi:educationalLevel <lomCon:/difficulty/Medium> ;
	dcat:keywords "european" , "gaming" , "research game" , "schools" .

<http://pjjk.local/resources/FdW84TkcrZ> a dcat:Resource ;
	dcterms:title "Alien and Invasive Species showcase" ;
	dcat:keywords "EUNIS habitat" ;
	lrmi:educationalLevel <lomCon:/difficulty/Medium> ;
	dcat:keywords "alien species" , "invasive species" .

<http://pjjk.local/resources/RcwrBMYavY> a dcat:Resource ;
	dcterms:title "EcoLogicaCup" ;
	dcat:keywords "game" ;
	lrmi:educationalLevel <lomCon:/difficulty/Medium> ;
	dcat:keywords "Italian" , "competition " , "ecology" , "school" .

That’s what I call interoperability in spite of multiple standards: harmonization of metadata so that, even though the data started off as LOM XML records, we can create a database that can be queried and exported as if the metadata were schema.org, Dublin Core, LRMI, DCAT…

Acknowledgements:

This brings together work from two projects that I have been involved in. The harmonization of metadata is from the DESM project, funded by the US Chamber of Commerce Foundation, and the ideas coming from Stuart Sutton. The application to LOM, schema.org, DCAT+LRMI came about from a small piece of work I did for DCC at the University of Edinburgh as input to the FAIRsFAIR project.

The post Harmonizing Learning Resource Metadata (even LOM) appeared first on Sharing and learning.

JSON Schema for JSON-LD⤴

from @ Sharing and learning

I’ve been working recently on defining RDF application profiles, defining specifications in JSON Schema, and converting specifications from a JSON Schema to an RDF representation. This has led me to think about, and have conversations with people about, whether JSON Schema can be used to define and validate JSON-LD. I think the answer is a qualified “yes”. Here’s a proof of concept; do me a favour and let me know if you think it is wrong.

Terminology might get confusing: I’m discussing JSON, RDF as JSON-LD, JSON Schema, RDF Schema and schema.org, which are all different things (go and look them up if you’re not sure of the differences).

Why JSON-LD + JSON Schema + schema.org?

To my mind one of the factors in the big increase in visibility of linked data over the last few years has been the acceptability of JSON-LD to programmers familiar with JSON. Along with schema.org, this means that many people are now producing RDF-based linked data, often without knowing or caring that that is what they are doing. One of the things that seems to make their life easier is JSON Schema (once they figure it out). Take a look at the replies to this question from @apiEvangelist for some hints at why and how.

Also, one specification organization I am working with publishes its specs as JSON Schema. We’re working with them on curating a specification that was created as RDF, is defined in RDF Schema, and is often serialized in JSON-LD. Hence the thinking about what happens when you convert a specification from RDF Schema to JSON Schema — can you still have instances that are linked data? Can you mandate instances that are linked data? If so, what’s the cost in terms of flexibility against the original schema and against what RDF allows you to do?

Another piece of work that I’m involved in is the DCMI Application Profile Interest Group, which is looking at a simple way of defining application profiles — i.e. selecting which terms from RDF vocabularies are to be used, and defining any additional constraints, to meet the requirements of some application. There already exist some not-so-simple ways of doing this, geared to validating instance data and native to the W3C Semantic Web family of specifications: ShEx and SHACL. Through this work I also got wondering about JSON Schema. Sure, wanting to define an RDF application profile in JSON Schema may seem odd to anyone well versed in RDF and W3C Semantic Web recommendations, but I think it might be useful to developers who are familiar with JSON but not Linked Data.

Can JSON Schema define valid JSON-LD?

I’ve heard some organizations have struggled with this, but it seems to me (until someone points out what I’ve missed) that the answer is a qualified “yes”. Qualifications first:

  • JSON Schema doesn’t define the semantics of RDF terms. RDF Schema defines RDF terms, and the JSON-LD context can map keys in JSON instances to these RDF terms, and hence to their definitions.
  • Given definitions of RDF terms, it is possible to create a JSON Schema such that any JSON instance that validates against it is a valid JSON-LD instance conforming to the RDF specification.
  • Not all valid JSON-LD representations of the RDF will validate against the JSON Schema. In other words the JSON Schema will describe one possible serialization of the RDF in JSON-LD, not all possible serializations. In particular, links between entities in an @graph array are difficult to validate.
  • If you don’t have an RDF model for your data to start with, it’s going to be more difficult to get to RDF.
  • If the spec you want to model is very flexible, you’ll have difficulty making sure instances don’t flex it beyond breaking point.

But, given the limited ambition of the exercise, that is “can I create a JSON Schema so that any data it passes as valid is valid RDF in JSON-LD?”, those qualifications don’t put me off.

Proof of concept examples

My first hint that this seems possible came when I was looking for a tool to use when working with JSON Schema and found this online JSON Schema Validator.  If you look at the “select schema” drop down and scroll a long way, you’ll find a group of JSON Schema for schema.org. After trying a few examples of my own, I have a JSON Schema that will (I think) only validate JSON instances that are valid JSON-LD based on notional requirements for describing a book (switch branches in github for other examples).

Here are the rules I made up and how they are instantiated in JSON Schema.

First, the “@context” sets the default vocabulary to schema.org and allows nothing else:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "context.json",
  "name": "JSON Schema for @context using schema.org as base",
  "description": "schema.org is @base namespace, but others are allowed",
  "type": "object",
  "additionalProperties": false,
  "required": [ "@vocab" ],
  "properties": {
    "@vocab": {
      "type": "string",
      "format": "regex",
      "pattern": "http://schema.org/",
      "description": "required: schema.org is base ns"
    }
  }
}

This is super-strict: it allows no variation on "@context": {"@vocab": "http://schema.org/"}, which obviously precludes doing a lot of things that RDF is good at, notably using more than one namespace. It’s not difficult to create looser rules, for example mandate schema.org as the default vocabulary but allow some or any others. Eventually, though, you create enough slack to allow invalid linked data (e.g. using namespaces that don’t exist, or using terms from the wrong namespace), and I promised you only valid linked data would be allowed. In real life, there would be a balance between permissiveness and reliability.
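As a sketch of what such a looser rule might look like (written to the same draft-07 conventions as the schemas above, but not one of the files on github), the @context could be required to have schema.org as @vocab while allowing extra prefix-to-URI mappings:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "looser_context.json",
  "name": "Looser JSON Schema for @context",
  "description": "schema.org must be the default vocabulary; extra prefix mappings are allowed if they look like http(s) URIs",
  "type": "object",
  "required": [ "@vocab" ],
  "properties": {
    "@vocab": { "const": "http://schema.org/" }
  },
  "additionalProperties": {
    "type": "string",
    "pattern": "^http[s]?://.+"
  }
}

Even this only allows simple prefix mappings, not full JSON-LD term definitions; how much slack to allow is exactly the balance between permissiveness and reliability mentioned above.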

Rule 2: the book ids must come from wikidata:

{
 "$schema": "http://json-schema.org/draft-07/schema#",
 "$id": "wd_uri_schema.json",
 "name": "Wikidata URIs",
 "description": "regexp for Wikidata URIs, useful for @id of entities",
 "type": "string",
 "format": "regex",
 "pattern": "^https://www.wikidata.org/entity/Q[0-9]+" 
}

Again, this could be less strict, e.g. to allow ids to be any http or https URI.

Rule 3: the resource described is a schema.org/Book, for which the following fragment serves:

    "@type": {
      "name": "The resource type",
      "description": "required and must be Book",
      "type": "string",
      "format": "regex",
      "pattern": "^Book$"
    }

You could allow other options, and you could allow multiple types, maybe with one type mandatory (I have an example schema for Learning Resources which requires an array of types that must include LearningResource).
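For illustration, a fragment along those lines could use draft-07’s contains keyword to require that the @type array includes LearningResource while permitting other types; this is a sketch of the idea, not a copy of that Learning Resources schema:

    "@type": {
      "name": "The resource types",
      "description": "an array of type names, at least one of which must be LearningResource",
      "type": "array",
      "minItems": 1,
      "uniqueItems": true,
      "items": { "type": "string" },
      "contains": { "const": "LearningResource" }
    }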

Rules 4 & 5: the book’s name and description are strings:

    "name": {
      "name": "title of the book",
      "type": "string"
    },
    "description": {
      "name": "description of the book",
      "type": "string"
    },

Rule 6, the URL for the book (i.e. a link to a webpage for the book) must be an http[s] URI:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http_uri_schema.json",
  "name": "URI @ids",
  "description": "required: @id or url is a http or https URI",
  "type": "string",
  "format": "regex",
  "pattern": "^http[s]?://.+"
}

Rule 7, for the author we describe a schema.org/Person, with a wikidata id, a familyName and a givenName (which are strings), and optionally with a name and description, and with no other properties allowed:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "person_schema.json",
  "name": "Person Schema",
  "description": "required and allowed properties for a Person",
  "type": "object",
  "additionalProperties": false,
  "required": ["@id", "@type", "familyName", "givenName"],
  "properties": {
    "@id": {
      "description": "required: @id is a wikidata entity URI",
      "$ref": "wd_uri_schema.json"
    },
    "@type": {
      "description": "required: @type is Person",
      "type": "string",
      "format": "regex",
      "pattern": "Person"
    },
    "familyName": {
      "type": "string"
    },
    "givenName": {
      "type": "string"
    },
    "name": {
      "type": "string"
    },
    "description": {
      "type": "string"
    }
  }
}

The restriction on other properties is, again, simply to make sure no one puts in any properties that don’t exist or aren’t appropriate for a Person.

The subject of the book (the about property) must be provided as wikidata URIs, with optional @type, name, description and url; there may be more than one subject for the book, so this is an array:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "about_thing_schema.json",
  "name": "About Thing Schema",
  "description": "Required and allowed properties for a Thing being used to say what something is about.",
  "type": "array",
  "minItems": 1,
  "items": {
    "type": "object",
    "additionalProperties": false,
    "required": ["@id"],
    "properties": {
      "@id": {
        "description": "required: @id is a wikidata entity URI",
        "$ref": "wd_uri_schema.json"
      },
      "@type": {
        "description": "required: @type is from top two tiers in schema.org type hierarchy",
        "type": "array",
        "minItems": 1,
        "items": {
          "type": "string",
          "uniqueItems": true,
          "enum": [
            "Thing",
            "Person",
            "Event",
            "Intangible",
            "CreativeWork",
            "Organization",
            "Product",
            "Place"
          ]
        }
      },
      "name": {
        "type": "string"
      },
      "description": {
        "type": "string"
      },
      "url": {
        "$ref": "http_uri_schema.json"
      }
    }
  }
}

Finally, bring all the rules together, making the @context, @id, @type, name and author properties mandatory; about, description and url are optional; no others are allowed.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "book_schema.json",
  "name": "JSON Schema for schema.org Book",
  "description": "Attempt at a JSON Schema to create valid JSON-LD descriptions. Limited to using a few schema.org properties.",
  "type": "object",
  "required": [
    "@context",
    "@id",
    "@type",
    "name",
    "author"
  ],
  "additionalProperties": false,
  "properties": {
    "@context": {
      "name": "JSON Schema for @context using schema.org as base",
      "$ref": "./context.json"
    },
    "@id": {
      "name": "wikidata URIs",
      "description": "required: @id is from wikidata",
      "$ref": "./wd_uri_schema.json"
    },
    "@type": {
      "name": "The resource type",
      "description": "required and must be Book",
      "type": "string",
      "format": "regex",
      "pattern": "^Book$"
    },
    "name": {
      "name": "title of the book",
      "type": "string"
    },
    "description": {
      "name": "description of the book",
      "type": "string"
    },
    "url": {
      "name":"The URL for information about the book",
      "$ref": "./http_uri_schema.json"
    },
    "about": {
      "name":"The subject or topic of the book",
      "oneOf": [
        {"$ref": "./about_thing_schema.json"},
        {"$ref": "./wd_uri_schema.json"}
      ]
    },
    "author": {
      "name":"The author of the book",
      "$ref": "./person_schema.json"
    }
  }
}

I’ve allowed the subject (about) to be given as an array of wikidata entity links/descriptions (as described above) or a single link to a wikidata entity, which hints at how similar flexibility could be built in for other properties.
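For instance, the author property could be allowed to be either a single Person object or an array of them; this is a variation on the schema above, not part of the version on github:

    "author": {
      "name": "The author(s) of the book",
      "oneOf": [
        { "$ref": "./person_schema.json" },
        {
          "type": "array",
          "minItems": 1,
          "items": { "$ref": "./person_schema.json" }
        }
      ]
    }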

Testing the schema

I wrote a python script (running in a Jupyter Notebook) to test that this works:

from jsonschema import validate, ValidationError, SchemaError, RefResolver
import json
from os.path import abspath
schema_fn = "book_schema.json"
valid_json_fn = "book_valid.json"
invalid_json_fn = "book_invalid.json"  # an invalid instance can be checked in the same way
base_uri = 'file://' + abspath('') + '/'
# load the schema and the instance to be checked
with open(schema_fn, 'r') as schema_f:
    schema = json.loads(schema_f.read())
with open(valid_json_fn, 'r') as valid_json_f:
    valid_json = json.loads(valid_json_f.read())
# the resolver lets the $ref pointers to the other local schema files be followed
resolver = RefResolver(referrer=schema, base_uri=base_uri)
try :
    validate(valid_json,  schema, resolver=resolver)
except SchemaError as e :
    print("there was a schema error")
    print(e.message)
except ValidationError as e :
    print("there was a validation error")
    print(e.message)

Or more conveniently for the web (and sometimes with better messages about what failed), there’s the JSON Schema Validator I mentioned above. Put this in the schema box on the left to pull in the JSON Schema for Books from my github:

{
  "$ref": "https://raw.githubusercontent.com/philbarker/lr_schema/book/book_schema.json"
}

And here’s a valid instance:

{
  "@context": {
    "@vocab": "http://schema.org/"
  },
  "@id": "https://www.wikidata.org/entity/Q3107329",
  "@type": "Book",
  "name": "Hitchhikers Guide to the Galaxy",
  "url": "http://example.org/hhgttg",
  "author": {
    "@type": "Person",
    "@id": "https://www.wikidata.org/entity/Q42",
    "familyName": "Adams",
    "givenName": "Douglas"
  },
  "description": "...",
  "about": [
    {"@id": "https://www.wikidata.org/entity/Q3"},
    {"@id": "https://www.wikidata.org/entity/Q1"},
    {"@id": "https://www.wikidata.org/entity/Q2165236"}
  ]
}

Have a play, see what you can break; let me know if you can get anything that isn’t valid JSON-LD to validate.

The post JSON Schema for JSON-LD appeared first on Sharing and learning.

Mapping learning resources to curricula in RDF⤴

from @ Sharing and learning

Some personal reflections on relating educational content to curriculum frameworks, prompted by conversations about the Oak National Academy (a broad curriculum of online material available to schools, based on the English national curriculum) and OEH-Linked-Frameworks (an RDF tool for visualizing German educational frameworks). It draws heavily on the BBC curriculum ontology (by Zoe Rose, I think). I’m thinking about these with respect to work I have been involved in, such as K12-OCX and LRMI.

If you want to know why you would do this, you might want to skip ahead and read the “so what?” section first. But in brief: representing curriculum frameworks in a standard, machine-readable way, and mapping curriculum materials to that, would help when sharing learning resources.

Curriculum?

But first: curriculum. What does it mean to say “a broad curriculum of online material available to schools, based on the English national curriculum”? The word curriculum is used in several different ways (there are 71 definitions in the IGI Global dictionary), ranging from “the comprehensive multitude of learning experiences provided by school to its students” (source) to “the set of standards, objectives, and concepts required to be taught and learned in a given course or school year” (source). So curriculum in one sense is the teaching, in the other all that should be learnt. Those are different: the Oak National Academy provides teaching materials and activities (for learning experiences); the English National Curriculum specifies what should be learnt. Because very few people are interested in one but not the other, these two meanings often get conflated, which is normally fine, but here I want to treat them separately and show how they relate to each other. Let’s call them Curriculum Content and Materials, and Curriculum Frameworks respectively, think about how to represent the framework, and then how to relate the content and materials to that framework.

Curriculum Frameworks

This is where the BBC curriculum ontology comes in. It has a nice three-dimensional structure, creating the framework on the axes of Field of Study, Level and Topic.

The three dimensions of the BBC Curriculum Ontology model. From https://www.bbc.co.uk/ontologies/curriculum

The levels are those defined by the national curriculum for progression in English schools (KS = Key Stage; children aged 5 to 7 are normally at Key Stage 1; GCSE is the exam typically taken at 16, so represents the end of compulsory education, though students may stay on to study A-levels or similar after that). The levels used in curriculum frameworks tend to be very contextual, normally relating to the grade levels and examinations used in the school system for which the framework is written. It may be useful to relate them to more neutral (or at least, less heavily contextualised) schemes such as the levels of the EQF or the levels of the Connecting Credentials framework.

The field of study may be called the “educational subject” (though I don’t like writing RDF statements with Subject as the object) or, especially in HE, the “discipline”. Topics are the subjects studied within a field or discipline. I don’t much like the examples given here because the topics do just look like mini fields of study. I would wonder where to put “biology”–is it a topic within science or a field of study in its own right? A couple of points about fields of study and one about topics may help clarify.

In higher education a field of study is often called a discipline, which highlights that it is not just the thing being studied, but a community with a common interest and agreed norms on the tools and techniques used to study the subject. Most HE disciplines have an adjectival form that relates to people (I am a Physicist, she is a Humanist). In schools, fields of study are sometimes artifacts of the curriculum design process with no real equivalent outside of school. These artifacts often seem to have names that are initialisms that you won’t come across outside of specific school settings, for example RMPS (Religious, Moral and Philosophical Studies), PE (Physical Education), PSHE (Personal, Social, Health and Economic education), ESL (English as a Second Language) / ESOL (English for Speakers of Other Languages), ICT (Information and Computer Technology), DT (Design and Technology) — but very often the fields of study will have the same names as the top levels of a topic taxonomy (math/s, english, science). Most fields of study will have someone in a school who is a teacher of that field or leader of its teaching for the school.

Topics are more neutral of context, less personal, more like the subjects of the Dewey Decimal System (at least more like they are supposed to be). It’s important to note that the same topic may be covered in different fields of study / disciplines in different ways. For example statistics may be a discipline in itself (part of maths), with a very theoretical approach taken to studying its topics, but those topics may also be studied in biology, physics and economics. Crucially, when it comes to facilitating discovery of suitable content materials for the curriculum, the approach taken and examples used will probably mean a resource aimed at teaching a statistics topic for economics is not very useful for teaching the same topic as part of physics or mathematics.
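To make the axes concrete, here is one way their values might be given machine-readable identities. This is a sketch using schema.org DefinedTerms and the same example URIs as the code further down, rather than the BBC ontology’s own classes:

@prefix sdo: <http://schema.org/> .
@base <http://example.org/> .

<myCurriculumFramework/Levels/KS4> a sdo:DefinedTerm ;
    sdo:name "Key Stage 4" ;
    sdo:inDefinedTermSet <myCurriculumFramework/Levels> .

<myCurriculumFramework/FieldsOfStudy/Mathematics> a sdo:DefinedTerm ;
    sdo:name "Mathematics" ;
    sdo:inDefinedTermSet <myCurriculumFramework/FieldsOfStudy> .

<myCurriculumFramework/Topic/Statistics> a sdo:DefinedTerm ;
    sdo:name "Statistics" ;
    sdo:inDefinedTermSet <myCurriculumFramework/Topic> .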

Onto these axes get mapped what are variously called learning objectives, intended learning outcomes, learning standards, and so on: the competences you want the students to achieve. They exist in the framework as statements of what knowledge, skills and abilities a student is expected to be able to demonstrate. Let’s call them competences because that is a term that has wide currency beyond education; for example, a competence can link educational outcomes to job requirements. There is a lot written about competences. There’s lots about how to write competence statements, including the form the descriptions should take (“you will be able to …”); how to form them as objectives (specific, measurable, …); how they relate to context (“able to … under supervision”); how they relate to each other (“you must learn to walk before you learn to run”); and what tools should be used (“able to use a calculator to …”). And, of course, there are the specifications, standards and RDF vocabularies for representing all these aspects of competences, e.g. ASN, IMS CASE, ESCO. Let’s not go into that, except to say that a curriculum framework will describe these competences as learning objectives and map them to the field of study, topic and level schemes used by the framework. The same terms described below for mapping content to frameworks can be useful in doing this.

Mapping Curriculum Content to Curriculum Frameworks

So we have some curriculum content material; how do we map it to the curriculum framework?

It may help to model the content material in the way K12-OCX did, following oerschema, as a hierarchy of course, module, unit, lesson, activity, with associated materials and assessments:

The content model used by K12-OCX, based on oerschema.org

(Aside: any given course may lack modules, or units, or both.)

Breaking curriculum materials down from monolithic courses to their constituent parts (while keeping the logical and pedagogical relationships between those parts) creates finer grained resources more easily accommodated into existing contexts.

At the Course level, oerschema.org gives us the property syllabus, which can be used to relate the course to the framework as a whole, called by oerschema a CourseSyllabus (“syllabus” is another word used in various ways, so let’s not worry about any difference between a syllabus and a curriculum framework). This may also be useful at finer-grained levels, e.g. Module and Unit.

@prefix oer: <http://oerschema.org/> .
@prefix sdo: <http://schema.org/> .
@base <http://example.org/> .
<myCurriculumFramework> a oer:CourseSyllabus .
<myCourse> a oer:Course, sdo:Course ;
    oer:syllabus <myCurriculumFramework> .

[example code in Turtle; there’s a JSON-LD version of it all below]

We can use the schema.org educationalLevel property to relate the resource to the educational level of the framework:

<myCourse> sdo:educationalLevel <myCurriculumFramework/Levels/KS4> .

Let’s say our course deals with Mathematics and has a Unit on Statistics (no modules). We can use the schema.org AlignmentObject to say that my Course and my Unit each have an educationalAlignment to the field of study (that is, in the language of the alignment object, the educational subject). We can use the schema.org about property to say what the topic is:

<myCourse> sdo:hasPart <myUnit> ;
    sdo:educationalAlignment [
        a sdo:AlignmentObject ;
        sdo:alignmentType "educationalSubject";
        sdo:targetUrl <myCurriculumFramework/FieldsOfStudy/Mathematics>
    ] .

<myUnit> a oer:Unit, sdo:LearningResource ;
    sdo:educationalAlignment [
        a sdo:AlignmentObject ;
        sdo:alignmentType "educationalSubject";
        sdo:targetUrl <myCurriculumFramework/FieldsOfStudy/Mathematics>
    ] ;
    sdo:about <myCurriculumFramework/Topic/Statistics> .

For lessons, and especially for activities, we can relate to competences as individual learning objectives. The schema.org teaches property is designed for this:

<myUnit> sdo:hasPart <myLesson> .
<myLesson> a oer:Lesson, sdo:LearningResource ;
    sdo:hasPart <myActivity> .

<myActivity> a oer:Activity, sdo:LearningResource ;
   sdo:teaches <myCurriculumFramework/Objective/Competence0123> .

Whether you repeat the about and educationalAlignment statements linking to “Field of Study” and “Topic” in the descriptions of Lessons and Activities depends on how much you want to rely on inferencing that something which is a part of a Course has the same Fields of Study, something which is a part of a Unit has the same topic, and so on. If your parts might get scattered, or used by systems that don’t do RDF inferencing, then you’ll want to repeat them (they will, you should). I haven’t done so here just to avoid repetition.

Finally, let’s link the competence statement to the framework (the framework here represented in a fairly crude way, not wanting to get into the intricacies of competence frameworks):

<myCurriculumFramework> a oer:CourseSyllabus, sdo:DefinedTermSet ;
    sdo:hasDefinedTerm <myCurriculumFramework/Objective/Competence0123> .

<myCurriculumFramework/Objective/Competence0123> a sdo:DefinedTerm,  
                                                   sdo:LearningResource ;
    sdo:educationalAlignment [ 
        a sdo:AlignmentObject ; 
        sdo:alignmentType "educationalSubject"; 
        sdo:targetUrl <myCurriculumFramework/FieldsOfStudy/Mathematics> 
    ] ;
    sdo:about <myCurriculumFramework/Topic/Statistics> ;
    sdo:educationalLevel <myCurriculumFramework/Levels/KS4> ;
    sdo:description "You will be able to use a calculator to find the mean..." ;
    sdo:name "Calculate the arithmetic mean" .

(Aside: Modelling a learning objective / competence as a defined term and a LearningResource is probably the most controversial thing here, but I think it works for illustration.)

So What?

Well this shows several things I think would be useful:

  • Having metadata for a curriculum (whatever it is) will help others find it and use it, if suitable tools for using the metadata exist.
  • Tools are more likely to exist if the metadata is nicely machine readable (RDF, not PDF) and standardised (widely used vocabularies like schema.org).
  • A common model for curriculum frameworks will make mapping from one to another easier. For example, it’s easier to map from UK to US educational levels if they are clearly and separately defined.
  • Breaking curriculum materials down from monolithic courses to their constituent parts (while keeping the logical and pedagogical relationships between those parts) creates finer grained resources more easily accommodated into existing contexts.
  • Mapping curriculum materials to learning objectives in a given framework makes it easier to find resources for that curriculum, which is great, but the world is bigger than one curriculum.
  • Mapping both learning objectives and curriculum materials to the axes of the curriculum framework model makes it easier to find resources appropriate across different curricula.

Finally, if you prefer your RDF as JSON-LD:

{
  "@context": {
    "oer": "http://oerschema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "http://schema.org/",
    "sdo": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@graph": [
    {
      "@id": "http://example.org/myCurriculumFramework",
      "@type": [
        "oer:CourseSyllabus",
        "schema:DefinedTermSet"
      ],
      "schema:hasDefinedTerm": {
        "@id": "http://example.org/myCurriculumFramework/Objective/Competence0123"
      }
    },
    {
      "@id": "http://example.org/myActivity",
      "@type": [
        "oer:Activity",
        "schema:LearningResource"
      ],
      "schema:teaches": {
        "@id": "http://example.org/myCurriculumFramework/Objective/Competence0123"
      }
    },
    {
      "@id": "http://example.org/myCourse",
      "@type": [
        "schema:Course",
        "oer:Course"
      ],
      "oer:syllabus": {
        "@id": "http://example.org/myCurriculumFramework"
      },
      "schema:educationalAlignment": {
        "@id": "_:ub132bL12C30"
      },
      "schema:educationalLevel": {
        "@id": "http://example.org/myCurriculumFramework/Levels/KS4"
      },
      "schema:hasPart": {
        "@id": "http://example.org/myUnit"
      }
    },
    {
      "@id": "http://example.org/myLesson",
      "@type": [
        "schema:LearningResource",
        "oer:Lesson"
      ],
      "schema:hasPart": {
        "@id": "http://example.org/myActivity"
      }
    },
    {
      "@id": "http://example.org/myUnit",
      "@type": [ 
        "oer:Unit",
        "schema:LearningResource"
       ],
       "schema:about": {
        "@id": "http://example.org/myCurriculumFramework/Topic/Statistics"
      },
      "schema:educationalAlignment": {
        "@id": "_:ub132bL21C30"
      },
      "schema:hasPart": {
        "@id": "http://example.org/myLesson"
      }
    },
    {
      "@id": "_:ub132bL12C30",
      "@type": "schema:AlignmentObject",
      "schema:alignmentType": "educationalSubject",
      "schema:targetUrl": {
        "@id": "http://example.org/myCurriculumFramework/FieldsOfStudy/Mathematics"
      }
    },
    {
      "@id": "_:ub132bL40C30",
      "@type": "schema:AlignmentObject",
      "schema:alignmentType": "educationalSubject",
      "schema:targetUrl": {
        "@id": "http://example.org/myCurriculumFramework/FieldsOfStudy/Mathematics"
      }
    },
    {
      "@id": "http://example.org/myCurriculumFramework/Objective/Competence0123",
      "@type": [
        "schema:LearningResource",
        "schema:DefinedTerm"
      ],
      "schema:about": {
        "@id": "http://example.org/myCurriculumFramework/Topic/Statistics"
      },
      "schema:description": "You will be able to use a calculator to find the mean ...",
      "schema:educationalAlignment": {
        "@id": "_:ub132bL40C30"
      },
      "schema:educationalLevel": {
        "@id": "http://example.org/myCurriculumFramework/Levels/KS4"
      },
      "schema:name": "Calculate the arithmetic mean"
    },
    {
      "@id": "_:ub132bL21C30",
      "@type": "schema:AlignmentObject",
      "schema:alignmentType": "educationalSubject",
      "schema:targetUrl": {
        "@id": "http://example.org/myCurriculumFramework/FieldsOfStudy/Mathematics"
      }
    }
  ]
}

 

The post Mapping learning resources to curricula in RDF appeared first on Sharing and learning.

Inclusion of Educational and Occupational Credentials in schema.org⤴

from @ Sharing and learning

The new terms developed by the EOCred community group that I chaired were added to the pending area in the April 2019 release of schema.org. This marks a natural endpoint for this round of the community group’s work. You can see most of the outcome under EducationalOccupationalCredential. As it says, these terms are now “proposed for full integration into Schema.org, pending implementation feedback and adoption from applications and websites”. I am pretty pleased with this outcome.

Please use these terms widely where you wish to meet the use cases outlined in the previous post, and feel free to use the EOCred group to discuss any issues that arise from implementation and adoption.

My own attention is moving on to the Talent Marketplace Signalling community group, which is just kicking off (as well as continuing with LRMI and some discussions around Courses that I am having). One early outcome for me from this is a picture of how I see Talent Signalling requiring all of these pieces linked together:

Outline sketch of the Talent Signaling domain, with many items omitted for clarity. Mostly but not entirely based on things already in schema.org

 

The post Inclusion of Educational and Occupational Credentials in schema.org appeared first on Sharing and learning.

Using wikidata for linked data WordPress indexes⤴

from @ Sharing and learning

A while back I wrote about getting data from wikidata into a WordPress custom taxonomy. Shortly thereafter Alex Stinson said some nice things about it, and as a result that post got a little attention.

Well, I now have a working prototype plugin which is somewhat more general purpose than my first attempt.

1. Custom Taxonomy Term Metadata from Wikidata

Here’s a video showing how you can create a custom taxonomy term with just a name and the wikidata Q identifier, and the plugin will pull down relevant wikidata for that type of entity:

[similar video on YouTube]

2. Linked data index of posts

Once this taxonomy term is used to tag a post, you can view the term’s archive page, and if you have a linked data sniffer, you will see that the metadata from Wikidata is embedded in machine readable form using schema.org. Here’s a screenshot of what the OpenLink structured data sniffer sees:

Or you can view the Google structured data testing tool output for that page.

Features

  • You can create terms for custom taxonomies with just a term name (which is used as the slug for the term) and the Wikidata Q number identifier. The relevant name, description and metadata are pulled down from Wikidata.
  • Alternatively you can create a new term when you tag a post, and later edit the term to add the wikidata Q number and hence the metadata.
  • The metadata retrieved from Wikidata varies according to the class of item represented by the term, e.g. birth and death details for people, date and location for events.
  • Term archive pages include the metadata from wikidata as machine readable structured data using schema.org (a rough sketch of what this looks like follows this list). This includes links back to the wikidata record and other authority files (e.g. ISNI and VIAF). A system harvesting the archive page for linked data could use these to find more metadata. (These onward links put the linked in linked data and the web in semantic web.)
  • The type of relationship between the term and posts tagged with it is recorded in the schema.org structured data on the term archive page. Each custom taxonomy is for a specific type of relationship (currently about and mentions, but it would be simple to add others).
  • Shortcodes allow each post to list the entries from a custom taxonomy that are relevant to it, using a simple text widget.
  • This is a self-contained plugin: it includes default term archive page templates without the need for a custom theme. The archive page is pretty basic (based on the twentysixteen theme), so you would get better results if you used it as the basis for an addition to a custom theme.
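
To give a flavour of that markup, here is a rough sketch of how a term’s Wikidata-derived metadata could be output as schema.org RDFa on an archive page. It is a simplified illustration rather than code lifted from the plugin: the function name, the choice of Person as the type, and the exact properties are assumptions for the sake of the example.

<?php
// Illustrative sketch only, not the plugin's actual template code:
// output a term's Wikidata-derived metadata as schema.org RDFa.
function example_term_rdfa( $term ) {
    $wd_id       = get_term_meta( $term->term_id, 'wd_id', true );
    $name        = get_term_meta( $term->term_id, 'wd_name', true );
    $description = get_term_meta( $term->term_id, 'wd_description', true );
    ?>
    <div vocab="http://schema.org/" typeof="Person" resource="#<?php echo esc_attr( $term->slug ); ?>">
        <h1 property="name"><?php echo esc_html( $name ); ?></h1>
        <p property="description"><?php echo esc_html( $description ); ?></p>
        <!-- the sameAs link lets harvesters follow their nose back to Wikidata -->
        <a property="sameAs" href="http://www.wikidata.org/entity/<?php echo esc_attr( $wd_id ); ?>">Wikidata</a>
    </div>
    <?php
}
?>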

How’s it work / where is it

It’s on github. Do not use it on a production WordPress site. It’s definitely pre-alpha, and undocumented, and I make no claims for the code to be adequate or safe. It currently lacks error trapping / exception handling, and more seriously it doesn’t sanitize some things that should be sanitized. That said, if you fancy giving it a try do let me know what doesn’t work.

It’s based around two classes: one which sets up a custom taxonomy and provides some methods for outputting terms and term metadata in HTML with suitable schema.org RDFa markup; the other handles getting the wikidata via SPARQL queries and storing this data as term metadata. Getting the wikidata via SPARQL is much improved on the way it was done in the original post I mentioned above. Other files create taxonomy instances, provide some shortcode functions for displaying taxonomy terms and provide default term archive templates.
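
To give an idea of how the second of those classes might go about it, here is a minimal sketch of fetching an English label and description for a term’s Q number from the Wikidata Query Service and storing them as term metadata. This is not the code in the plugin (which does rather more, and varies what it asks for by the class of item); the function name and the query are simplified assumptions for illustration.

<?php
// Simplified sketch: fetch an English label and description for one Wikidata item
// from the Wikidata Query Service SPARQL endpoint and store them as term metadata.
function example_fetch_wikidata_for_term( $term_id, $wd_id ) {
    // (a real implementation would validate $wd_id before building the query)
    $sparql = 'SELECT ?itemLabel ?itemDescription WHERE {
                 BIND(wd:' . $wd_id . ' AS ?item)
                 SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
               }';
    $url      = 'https://query.wikidata.org/sparql?format=json&query=' . urlencode( $sparql );
    $response = wp_remote_get( $url ); // WordPress HTTP API
    if ( is_wp_error( $response ) ) {
        return false;
    }
    $data = json_decode( wp_remote_retrieve_body( $response ) );
    if ( ! $data || empty( $data->results->bindings ) ) {
        return false;
    }
    $binding = $data->results->bindings[0];
    update_term_meta( $term_id, 'wd_name', $binding->itemLabel->value );
    if ( isset( $binding->itemDescription->value ) ) {
        update_term_meta( $term_id, 'wd_description', $binding->itemDescription->value );
    }
    return true;
}
?>

The basic pattern is the same whatever metadata you ask for: build a query, send it to query.wikidata.org, decode the JSON results and store what comes back as term meta.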

Where’s it going

It’s not finished. I’ll see to some of the deficiencies in the coding, but also I want to get some more elegant output, e.g. single indexes / archives of terms from all taxonomies, no matter what the relationship between the post and the item that the term relates to.

There’s no reason why the source of the metadata need be Wikidata. The same approach could be taken with any source of metadata, or by creating the term metadata in WordPress itself. As such, this is part of my exploration of WordPress as a semantic platform. Using taxonomies related to educational properties would be useful for any instance of WordPress being used as a repository of open educational resources, or to disseminate information about courses, or to provide metadata for PressBooks being used for open textbooks.

I also want to use it to index PressBooks such as my copy of Omniana. I think the graphs generated may be interesting ways of visualizing and processing the contents of a book for researchers.

Licenses: Wikidata is CC0; the wikidata logo used in the featured image for this post is sourced from wikimedia and is also CC0, but is a registered trademark of the wikimedia foundation, used with permission. The plugin, as a derivative of WordPress, will be licensed as GPLv2 (the bit about NO WARRANTY is especially relevant).

The post Using wikidata for linked data WordPress indexes appeared first on Sharing and learning.

Getting data from wikidata into WordPress custom taxonomy⤴

from @ Sharing and learning

I created a custom taxonomy to use as an index of people mentioned. I wanted it to work nicely as linked data, and so wanted each term in it to refer to the wikidata identifier for the person mentioned. Then I thought, why not get the data for the terms from wikidata?

Brief details

There are lots of tutorials on how to set up a custom taxonomy with custom metadata fields. I worked from this one from smashingmagazine, to get a taxonomy called people, with a custom field for the wikidata id.
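
For orientation, the setup looks roughly like the sketch below. This is a simplified reconstruction rather than my exact code, following the standard WordPress pattern from that tutorial, with people as the taxonomy and wd_id as the term meta key.

<?php
// Simplified sketch: register a 'people' taxonomy whose terms carry a 'wd_id' meta field.
function omni_register_people_taxonomy() {
    register_taxonomy( 'people', 'post', array(
        'labels'       => array( 'name' => 'People', 'singular_name' => 'Person' ),
        'hierarchical' => false,
        'show_ui'      => true,
    ) );
}
add_action( 'init', 'omni_register_people_taxonomy' );

// Add a Wikidata ID field to the "Add New Term" form.
function omni_people_add_form_fields() {
    ?>
    <div class="form-field term-group">
        <label for="wd_id">Wikidata ID</label>
        <input type="text" id="wd_id" name="wd_id" value="" />
    </div>
    <?php
}
add_action( 'people_add_form_fields', 'omni_people_add_form_fields' );

// Save the Wikidata ID as term metadata when a term is created.
function omni_people_save_wd_id( $term_id ) {
    if ( isset( $_POST['wd_id'] ) && '' !== trim( $_POST['wd_id'] ) ) {
        add_term_meta( $term_id, 'wd_id', sanitize_text_field( $_POST['wd_id'] ), true );
    }
}
add_action( 'created_people', 'omni_people_save_wd_id' );
?>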

Once the wikidata is entered, this code will fetch & parse the data (it’s a work in progress as I add more fields)

<?php
function omni_get_wikidata($wd_id) {
    print('getting wikidata<br />');
    if ('' !== trim( $wd_id) ) {
	    $wd_api_uri = 'https://wikidata.org/entity/'.$wd_id.'.json';
    	$json = file_get_contents( $wd_api_uri );
    	$obj = json_decode($json);
    	return $obj;
    } else {
    	return false;
	}
}

function get_wikidata_value($claim, $datatype) {
	if ( isset( $claim->mainsnak->datavalue->value->$datatype ) ) {
		return $claim->mainsnak->datavalue->value->$datatype;
	} else {
		return false;
	}
}

function omni_get_people_wikidata($term) {
	$term_id = $term->term_id;
    $wd_id = get_term_meta( $term_id, 'wd_id', true );
   	$args = array();
   	$wikidata = omni_get_wikidata($wd_id);
   	if ( $wikidata ) {
    	$wd_name = $wikidata->entities->$wd_id->labels->en->value;
    	$wd_description = $wikidata->entities->$wd_id->descriptions->en->value;
    	$claims = $wikidata->entities->$wd_id->claims;
   		$type = get_wikidata_value($claims->P31[0], 'id'); // P31 = instance of
   		if ( 'Q5' === $type ) { // Q5 = human
			if ( isset ($claims->P569[0] ) ) {
				$wd_birth_date = get_wikidata_value($claims->P569[0], 'time'); // P569 = date of birth
				print( $wd_birth_date.'<br/>' );
			}
   		} else {
	   		echo(' Warning: that wikidata is not for a human, check the ID. ');
	   		echo(' <br /> ');
   		} 
    	$args['description'] = $wd_description;
    	$args['name'] = $wd_name;
		print_r( $args );print('<br />');
    	update_term_meta( $term_id, 'wd_name', $wd_name );
    	update_term_meta( $term_id, 'wd_description', $wd_description );
    	wp_update_term( $term_id, 'people', $args );
    	
   	} else {
   		echo(' Warning: no wikidata for you, check the Wikidata ID. ');
   	}
}
add_action( 'people_pre_edit_form', 'omni_get_people_wikidata' );
?>

(Note: don’t add this to the edited_people hook unless you want a long wait while it causes itself to be called every time it is called…)

That on its own wasn’t enough. While the name and description of the term were being updated, the values for them displayed in the edit form weren’t updated until the page was refreshed. (Figuring out that it was mostly working took a while.) A bit of javascript inserted into the edit form fixed this:

function omni_taxonomies_edit_fields( $term, $taxonomy ) {
    $wd_id = get_term_meta( $term->term_id, 'wd_id', true );
    $wd_name = get_term_meta( $term->term_id, 'wd_name', true ); 
    $wd_description = get_term_meta( $term->term_id, 'wd_description', true ); 
//JavaScript required so that name and description fields are updated 
    ?>
    <script>
	  var f = document.getElementById("edittag");
	  var n = document.getElementById("name");
  	  var d = document.getElementById("description");
  	  function updateFields() {
  		n.value = "<?php echo($wd_name) ?>";
  		d.innerHTML = "<?php echo($wd_description) ?>";
  	  }

	  f.onsubmit=updateFields();
	</script>
    <tr class="form-field term-group-wrap">
        <th scope="row">
            <label for="wd_id"><?php _e( 'Wikidata ID', 'omniana' ); ?></label>
        </th>
        <td>
            <input type="text" id="wd_id"  name="wd_id" value="<?php echo $wd_id; ?>" />
        </td>
    </tr>
    <?php
}
add_action( 'people_edit_form_fields', 'omni_taxonomies_edit_fields', 10, 2 );

 

The post Getting data from wikidata into WordPress custom taxonomy appeared first on Sharing and learning.

Wikidata driven timeline⤴

from @ Sharing and learning

I have been to a couple of wikidata workshops recently, both involving Ewan McAndrew; between which I read Christine de Pizan‘s Book of the City of Ladies (*). Christine de Pizan is described as one of the first women in Europe to earn her living as a writer, which made me wonder what other female writers were around at that time (e.g. Julian of Norwich and, err…). So, at the second of these workshops, I took advantage of Ewan’s expertise, and the additional bonus of Navino Evans, cofounder of Histropedia, also being there, to create a timeline of medieval European female writers. (By the way, it’s interesting to compare this to Asian female writers. I was interested in Christine de Pizan and wanted to see how she fitted in with others who might have influenced her or attitudes to her, and so didn’t think that Chinese and Japanese writers fitted into the same timeline.)

Histropedia timeline of medieval female authors (click on image to go to interactive version)

This was generated from a SPARQL query:

#Timeline of medieval european female writers
#defaultView:Timeline
SELECT ?person ?personLabel ?birth_date ?death_date ?country (SAMPLE(?image) AS ?image) WHERE {
  ?person wdt:P106 wd:Q36180; # find everything that is a writer
          wdt:P21 wd:Q6581072. # ...and a human female
  OPTIONAL{?person wdt:P2031 ?birth_date} # use floruit dates if present for birth/death dates
  OPTIONAL{?person wdt:P2032 ?death_date} # as some very imprecise dates give odd results
  ?person wdt:P570 ?death_date. # get their date of death
  OPTIONAL{?person wdt:P569 ?birth_date} # get their birth date if it is there
  ?person wdt:P27 ?country.   # get their country
  ?country wdt:P30  wd:Q46.   # we want country to be part of Europe
  FILTER (year(?death_date) < 1500) FILTER (year(?death_date) > 600)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  OPTIONAL { ?person wdt:P18 ?image. }
}
GROUP BY ?person ?personLabel ?birth_date ?death_date ?country
Limit 100

[run it on wikidata query service]

Reflections

I’m still trying to get my head around SPARQL; Ewan and Nav helped a lot, but I wouldn’t want to pass this off as exemplary SPARQL. In particular, I have no idea how to optimise SPARQL queries, and the way I get birth_date and death_date to be the start and end of when the writer flourished, if that data is there, seems a bit fragile.

It was necessary to use floruit dates because some of the imprecise birth & death dates led to very odd timeline displays: born C12th, died C13th showed as being alive for 200 years.

There were other oddities in the wikidata. When I first tried, Julian of Norwich didn’t appear because she was a citizen of the Kingdom of England, which wasn’t listed as a country in Europe. Occitania, on the other hand, was. That was fixed. More difficult was a writer from Basra who was showing up because Basra was in the Umayyad Caliphate, which included Spain and so was classed as a European country. Deciding what we mean by European has never been easy.

Given the complexities of the data being represented, it’s no surprise that the Wikidata data model isn’t simple. In particular I found that dealing with qualifiers for properties was mind bending (especially with another query I tried to write).

Combining my novice level of SPARQL and the complexity of the Wikidata data model, I could definitely see the need for SPARQL tutorials that go beyond the simple “here’s how you find a triple that matches a pattern” level.

Finally: histropedia is pretty cool.

Footnote:

The Book of the City of Ladies is a kind of Women in Red for medieval Europe. Rosalind Brown-Grant’s translation for Penguin Classics is very readable.

The post Wikidata driven timeline appeared first on Sharing and learning.