This project was suggested by the Credential Engine’s CTDL Advisory Group, and ran from January to June this year. That was slightly longer than its initial three-month estimate, but we covered more than we initially expected. The intended benefits were outlined by the CTDL Advisory Group, and centre on making sure that micro-credentials issued in one jurisdiction are understandable in others, even when different data specifications have to be used in order to comply with local technical and political requirements and practices where they are issued. The end result envisaged is that individuals can have their achievements recognized globally.
We used the Data Ecosystem Mapping Tool to map elements from various specifications and standards related to micro-credentials, such as CTDL, Open Badges, the versions of Open Badges used by a commercial badge issuer in Canada and Australia, W3C Verifiable Credentials and the European Learning Model: more information on those specs, and on who I mean by “we”, is in the report.
The results are available on the Credential Engine’s DESM site, where you can see the degree of semantic alignment between these schemas, and there are some reflections on the results in the report.
The Data Ecosystem Schema Mapping (DESM) tool is one of the projects that I am working on for the US Chamber of Commerce Foundation’s T3 Innovation Network. DESM is a specialized tool for creating, editing, maintaining and viewing crosswalks between data models; these crosswalks are based on the degree of semantic alignment between terms in the different schemas. Colleagues on the project have produced two one-page fliers about DESM that have just been published: one explaining what DESM is and how it works, the other providing guidance on mapping projects that use DESM.
Watch this space for more about our use of DESM in both T3 and Credential Engine projects.
The best interoperability is interoperability between standards. I mean, it’s one thing for you and me to agree to use the same standard in the same way to do the same thing, but what if we are doing slightly different things? What if Dublin Core is right for you, but schema.org is right for me? Does that mean we can’t exchange data? That would be a shame, as one standard to rule them all isn’t a viable solution. This has been exercising me through a couple of projects that I have worked on recently, and what I’ll show here is a demo based on the ideas from one of these (the T3 Data Ecosystem Mapping Project) applied to another, where learning resource metadata is available in many formats and desired in others. In this post I focus on metadata available as IEEE Learning Object Metadata (LOM) but wanted in either schema.org or DCAT.
The Problem
Interoperability in spite of diverse standards being used seems an acute problem when dealing with metadata about learning resources. It makes sense to use existing (diverse) schemas for describing books, videos, audio, software etc., supplemented with just enough metadata about learning to describe those things when they are learning resources (textbooks, instructional videos etc.). This is the approach taken by LRMI. Add to this the neighbouring domains with which learning resource metadata needs to connect, e.g. research outputs, course management, learner records, curriculum and competency frameworks, job adverts…, all of which have their own standards ecosystems, and perhaps you can see why interoperability across standards is desirable.
(Aside: it often also makes sense to use a large all-encompassing standard like schema.org as well as more specialized standards, which is why LRMI terms are in schema.org.)
This problem of interoperability in an ecosystem of many standards was addressed by Mikael Nilsson in his PhD thesis “From Interoperability to Harmonization in Metadata Standardization”, where he argued that syntax wasn’t too important; what mattered more was the abstract model. Specifically, he argued that interoperability (or harmonization) was possible between specs that used the RDF entity-based metamodel, but less easy between specs that used a record-like metamodel. IEEE Learning Object Metadata is largely record-like: a whole range of different things are described in one record, and the meanings of many elements depend on the context of the element in the record, and sometimes on the values of other elements in the same record. Fortunately it is possible to identify LOM elements that are independent characteristics of an identified entity, which means it is possible to represent some LOM metadata in RDF. Then it is possible to map that RDF representation to terms from other vocabularies.
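To make that concrete, here is a small hand-made fragment of LOM-style XML (roughly following the standard XML binding; the element content and the vCard are illustrative, not taken from any real record). The title is an independent characteristic of the resource and turns into a triple easily; the meaning of the entity element depends on the value of its sibling role element, which is the kind of context-dependence that resists a straightforward RDF representation.

<lom xmlns="http://ltsc.ieee.org/xsd/LOM">
  <general>
    <title>
      <!-- an independent characteristic of the resource: easy to express as a triple -->
      <string language="en">An example learning resource</string>
    </title>
  </general>
  <lifeCycle>
    <contribute>
      <role>
        <source>LOMv1.0</source>
        <value>author</value>
      </role>
      <!-- what this vCard signifies depends on the value of the role element above -->
      <entity>BEGIN:VCARD VERSION:3.0 FN:Jane Example END:VCARD</entity>
    </contribute>
  </lifeCycle>
</lom>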
Step 1: RML to Map LOM XML to a local RDF vocabulary
RML is the RDF Mapping Language, “a language for expressing customized mappings from heterogeneous data structures and serializations to the RDF data model … and to RDF datasets”. It does so through a set of RDF statements in turtle syntax that describe the mapping from (in my case, here) XML fragments specified as XPath strings to subjects, predicates and objects. There is a parser called RMLMapper that will then execute the mapping to transform the data.
My LOM data came from the Lifewatch training catalogue which has a nice set of APIs allowing access to sets of the metadata. Unfortunately the LOM XML provided deviates from the LOM XML Schema in many ways, such as element names with underscore separations rather than camel case (so element_name, not elementName) and some nesting errors, so the RML I produced won’t work on other LOM XML instances.
Here’s a fragment of the RML, to give a flavour. The whole file is on github, along with other files mentioned here:
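(The snippet below is indicative rather than copied verbatim: the source file name, iterator path, element paths and the name of the identifier element are placeholders standing in for whatever the Lifewatch XML actually uses.)

# prefix declarations omitted
<#LearningObjectMapping>
  rml:logicalSource [
    rml:source "lifewatch-lom.xml" ;            # the XML retrieved from the Lifewatch API
    rml:referenceFormulation ql:XPath ;
    rml:iterator "/records/record"              # iterate over each record, creating one entity per record
  ] ;
  rr:subjectMap [
    # entity URI built from a non-standard identifier element appended to a local URI
    rr:template "http://pjjk.local/resources/{local_identifier}" ;
    rr:class lom:LearningObject
  ] ;
  rr:predicateObjectMap [
    rr:predicate lom:title ;
    rr:objectMap [ rml:reference "general/title/string" ]
  ] .
# run with something like: java -jar rmlmapper.jar -m lom-mapping.ttl -o lom-data.ttl -s turtle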
I have skipped the prefix declarations and jumped straight to the part of the mapping that specifies the source file for the data, and the XPath of the element to iterate over in creating new entities. The subjectMap generates an entity identifier using a non-standard element in the LOM record appended to a local URI, and assigns this a class. After that, a series of predicateObjectMaps specify predicates and where in the XML to find the values to use as objects. Running this through the mapper generates RDF descriptions, such as:
<http://pjjk.local/resources/DSEdW5uVa2> a lom:LearningObject;
lom:title "Research Game";
#etc...
Again I have omitted the namespaces; the full file, all statements for all resources, is on github.
Step 2: Describe the mappings in RDF
You’ll notice that lom: namespace in the mapping and generated instance data. That’s not a standard RDF representation of IEEE LOM; it’s a local schema that defines how some of the terms mapped from IEEE LOM relate to more standard schemas. The full file is on github, but again, here’s a snippet:
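(The lom: namespace URI below is a local placeholder, and only the title mapping is shown; the github file has the rest.)

@prefix lom:     <http://pjjk.local/lom/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema:  <http://schema.org/> .

# a title pulled out of a LOM record can be treated as a Dublin Core title,
# and as a sub-property of schema.org/name
lom:title owl:equivalentProperty dcterms:title ;
          rdfs:subPropertyOf schema:name .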
This is where the magic happens. This is the information that later allows us to use the metadata extracted from LOM records as if it were schema.org or LRMI, Dublin Core or DCAT. Because this schema is used locally only, I haven’t bothered to put in much information about the terms other than their mapping to other, more recognized terms. The idea here isn’t to be able to work with LOM in RDF; the idea is to take the data from LOM records and work with it as if it were from well-defined RDF metadata schemas. I also haven’t worried too much about follow-on consequences that may derive from the mappings that I have made, i.e. implied statements about relationships between terms in other schemas, such as the implication that if lom:title is equivalent to dcterms:title, and also a subProperty of schema.org/name, then I am saying that dcterms:title is a subProperty of schema.org/name. This mapping is for local use: I’ll assert what is locally useful, and if you disagree that’s fine, because you won’t be affected by my local assertions.
Just to complete the schema picture, I also have RDF schema definition files for Dublin Core Terms, LRMI, DCAT and schema.org.
(Aside: I also created some SKOS Concept Schemes for controlled vocabularies used in the LOM records, but they’re not properly working yet.)
Step 3: Build a Knowledge Graph
(Actually I just put all the schema definitions and the RDF representation of the LOM metadata into a triple store, but calling it a knowledge graph gets people’s attention.) I use a local install of Ontotext GraphDB (free version). It’s important when initializing the repository to choose a ruleset that allows lots of inferencing: I use the OWL-MAX option. Also, it’s important when querying the data to select the option to include inferred results.
Step 4: Results!
The data can now be queried with SPARQL. For example, a simple query to check what’s there:
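Something along these lines (note that it asks for schema:name rather than the lom:title the data was loaded with, and relies on inferred results being included):

PREFIX schema: <http://schema.org/>

SELECT ?r ?n
WHERE {
  # schema:name values are inferred from lom:title via the mapping schema
  ?r schema:name ?n .
}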
This produces a list of URIs & titles for the resources:
r,n
http://pjjk.local/resources/DSEdW5uVa2,Research Game
http://pjjk.local/resources/FdW84TkcrZ,Alien and Invasive Species showcase
http://pjjk.local/resources/RcwrBMYavY,EcoLogicaCup
http://pjjk.local/resources/SOFHCa8sIf,ENVRI gaming
http://pjjk.local/resources/Ytb7016Ijs,INTERNATIONAL SUMMER SCHOOL Data FAIRness in Environmental & Earth Science Infrastructures: theory and practice
http://pjjk.local/resources/_OhX8O6YwP,MEDCIS game
http://pjjk.local/resources/kHhx9jiEZn,PHYTO VRE guidelines
http://pjjk.local/resources/wABVJnQQy4,Save the eel
http://pjjk.local/resources/xBFS53Iesg,ECOPOTENTIAL 4SCHOOLS
Nothing here other than what I put in that was converted from the LOM XML records.
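And because the mappings are there, the same data can come back out in other vocabularies too. For example, a CONSTRUCT query along these lines (illustrative, relying on the dcterms:title equivalence asserted earlier and on inferred results) exports the descriptions as plain Dublin Core:

PREFIX dcterms: <http://purl.org/dc/terms/>

# dcterms:title matches the titles that went in as lom:title, thanks to inference
CONSTRUCT { ?r dcterms:title ?t }
WHERE     { ?r dcterms:title ?t }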
That’s what I call interoperability in spite of multiple standards: harmonization of metadata so that, even though the data started off as LOM XML records, we can create a database that can be queried and exported as if the metadata were schema.org, Dublin Core, LRMI, DCAT…
Acknowledgements:
This brings together work from two projects that I have been involved in. The harmonization of metadata is from the DESM project, funded by the US Chamber of Commerce Foundation, with the ideas coming from Stuart Sutton. The application to LOM, schema.org, DCAT+LRMI came about from a small piece of work I did for the DCC at the University of Edinburgh as input to the FAIRsFAIR project.