Tag Archives: python

Graphical Application Profiles?⤴

from @ Sharing and learning

In this post I outline how a graphical representation of an application profile can be converted to SHACL that can be used for data validation.

My last few posts have been about work I have been doing with the Dublin Core Application Profiles Interest Group on Tabular Application Profiles (TAPs). In introducing TAPs I described them as “a human-friendly approach that also lends itself to machine processing”. The human readability comes from the tabular format, and the use of a defined CSV structure makes this machine processable. I’ve illustrated the machine processability through a python program, tap2shacl.py, that will convert a TAP into SHACL that can be used to validate instance data against the application profile, and I’ve shown that this works with a simple application profile and a real-world application profile based on DCAT. Once you get to these larger application profiles the tabular view is useful but a graphical representation is also great for providing an overview. For example here’s the graphic of the DCAT AP:

Source: Joinup DCAT AP

Mind the GAP

I’ve long wondered whether it would be possible to convert the source for a graphical representation of an application profile (let’s call it a GAP) into one of the machine readable RDF formats. That boils down to processing the native format of the diagram file or any export from the graphics package used to create it. So I’ve routinely been looking for any chance of that whenever I come across a new diagramming tool. The breakthrough came when I noticed that lucid chart allows CSV export. After some exploration this is what I came up with.

As diagramming software, what Lucid chart does is quite familiar from Visio, yEd, diagrams.net and the like: it allows you to produce diagrams like the one below, of the (very) simple book application profile that we use in the DC Application Profiles Interest Group for testing:

Two boxes, one representing data about a book, the other data about a person, joined by an arrow representing the author relationship. Further detail about the book and author data is provided in the boxes, as discussed in the text of the blog post.

One distinctive feature of Lucid chart is that as well as just entering text directly into fields in the diagram, you can enter it into a data form associated with any object in the diagram, as shown below, first for the page and then for the shape representing the Author:

A screen shot of the Lucid Chart software showing the page and page data

A screen shot of the Lucid Chart software showing the Author Shape and the data for it.

In the latter shot especially you can see the placeholder brackets [] in the AuthorShape object into which the values from the custom data form are put for display. Custom data can be associated with the document as a whole, any page in it and any shape (boxes, arrows etc) on the page; you can create templates for shapes so that all shapes from a given template have the same custom data fields.

I chose a template to represent Node Shapes (in the SHACL/ShEx sense, which become actual shapes in the diagram) that had the following data:

  • name and expected RDF type in the top section;
  • information about the node shape, such as label, target, closure, severity in the middle section; and,
  • a list of the properties that have the range Literal in the lower section (these are entered directly, i.e. they don’t come from the custom data form).

Properties that have a range of BNode or URI are represented as arrows.

By using a structured string for Literal valued properties, and by adding information about the application profile and namespace prefixes and their URIs into the sheet custom data, I was able to enter most of the data needed for a simple application profile. The main shortcomings are that the format for Literal valued properties is limited, and that complex constraints such as alternatives (such as: use this Literal valued property or that URI property depending on …) cannot be dealt with.

The key to the magic is that on export as CSV, each page, shape and arrow gets a row, and there is a column for the default text areas and for the custom data (whether or not the latter is displayed). It’s an ugly, sparsely populated table (you can see a copy in github), but I can read it into a python dict structure using python’s standard csv module.
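To give a flavour, here’s a minimal sketch of reading such an export with the csv module; the file name and column names are illustrative (the real export has many more, mostly empty, columns):

import csv

# Read the Lucid chart CSV export: one row per page, shape or arrow in the diagram.
with open("book_gap_export.csv", newline="") as f:   # hypothetical file name
    rows = list(csv.DictReader(f))

# Pick out the rows that carry node shape data; the column names here are
# illustrative, the real export's headers depend on the template used.
node_shapes = [r for r in rows if r.get("Name", "").endswith("Shape")]
for shape in node_shapes:
    print(shape.get("Text Area 1"), shape.get("targetType"))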

GAP2SHACL

When I created the TAP2SHACL program I aimed to do so in a very modular way: there is one module for the central application profile python classes, another to read CSV files and convert them into those python classes, and another to convert the python classes into SHACL and output them; tap2shacl.py is just a wrapper that provides a user interface to those classes. That approach paid off here because, having read the CSV file exported from lucid chart, all I had to do was create a module to convert it into the python AP classes and then I could use AP2SHACL to get the output. That conversion was fairly straightforward, mostly just tedious if ... else statements to parse the values from the data export. I did this in a Jupyter Notebook so that I could interact more easily with the data; that notebook is in github.
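In outline, then, the pipeline for a GAP looks something like the sketch below. The module and class names here are placeholders rather than the exact names in my code; the point is just how the pieces fit together:

import csv

# Hypothetical module/class names, standing in for the real ones:
from ap import AP                       # the central application profile classes
from lucid2ap import LucidCSV2AP        # new module: Lucid CSV export -> AP classes
from ap2shacl import AP2SHACLConverter  # existing module: AP classes -> SHACL

with open("book_gap_export.csv", newline="") as f:    # the diagram export
    rows = list(csv.DictReader(f))

ap = LucidCSV2AP(rows).convert()            # build the in-memory application profile
print(AP2SHACLConverter(ap).serialize())    # output it as SHACL (Turtle)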

Here’s the SHACL generated from the graphic for the simple book ap, above:

[code]
# SHACL generated by python AP to shacl converter
@base <http://example.org/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<BookShape> a sh:NodeShape ;
sh:class sdo:Book ;
sh:closed true ;
sh:description "Shape for describing books"@en ;
sh:name "Book"@en ;
sh:property <bookshapeAuthor>,
<bookshapeISBN>,
<bookshapeTitle> ;
sh:targetClass sdo:Book .

<AuthorShape> a sh:NodeShape ;
sh:class foaf:Person ;
sh:closed false ;
sh:description "Shape for describing authors"@en ;
sh:name "Author"@en ;
sh:property <authorshapeFamilyname>,
<authorshapeGivenname> ;
sh:targetObjectsOf dct:creator .

<authorshapeFamilyname> a sh:PropertyShape ;
sh:datatype xsd:string ;
sh:maxCount 1 ;
sh:minCount 1 ;
sh:name "Family name"@en ;
sh:nodeKind sh:Literal ;
sh:path foaf:familyName .

<authorshapeGivenname> a sh:PropertyShape ;
sh:datatype xsd:string ;
sh:maxCount 1 ;
sh:minCount 1 ;
sh:name "Given name"@en ;
sh:nodeKind sh:Literal ;
sh:path foaf:givenName .

<bookshapeAuthor> a sh:PropertyShape ;
sh:minCount 1 ;
sh:name "author"@en ;
sh:node <AuthorShape> ;
sh:nodeKind sh:IRI ;
sh:path dct:creator .

<bookshapeISBN> a sh:PropertyShape ;
sh:datatype xsd:string ;
sh:name "ISBN"@en ;
sh:nodeKind sh:Literal ;
sh:path sdo:isbn .

<bookshapeTitle> a sh:PropertyShape ;
sh:datatype rdf:langString ;
sh:maxCount 1 ;
sh:minCount 1 ;
sh:name "Title"@en ;
sh:nodeKind sh:Literal ;
sh:path dct:title .
[/code]

I haven’t tested this as thoroughly as the work on TAPs. The SHACL is valid, and as far as I can see it works as expected on the test instances I have for the simple book ap (though slight variations in the rules represented somehow crept in). I’m sure there will be ways of triggering exceptions in the code, or getting it to generate invalid SHACL, but for now, as a proof of concept, I think it’s pretty cool.

What next?

Well, I’m still using TAPs for some complex application profile / standards work. As it stands I don’t think I could express all the conditions that often arise in an application profile in an easily managed graphical form. Perhaps there is a way forward by generating a TAP from a diagram and then adding further rules, but then I would worry about version management if one was altered and not the other. I’m also concerned about tying this work to one commercial diagramming tool, over which I have no real control. I’m pretty sure that there is something in the GAP+TAP approach, but it would need tighter integration between the graphical and tabular representations.

I also want to explore generating other outputs than SHACL from TAPs (and graphical representations). I see a need to generate JSON-LD context files for application profiles, we should try getting ShEx from TAPs, and I have already done a little experimenting with generating RDF Schema from Lucid Chart diagrams.


Using the WordPress REST API to post a book from WikiSource to PressBooks with python⤴

from @ Sharing and learning

I am using Pressbooks to build an online edition of Southey and Coleridge’s Omniana. I transcribed the text for Volume I on wikisource. This post is about how I got that text into pressbooks; copy and paste didn’t appeal, so I thought I would try using the WordPress REST API. You could probably write a PHP plugin that would do this, but I find python a bit easier for exploratory work, so I used that.

Getting the data from Wikisource is reasonably trivial. On wikisource I have transcluded the page transcriptions into a single HTML file of the whole book. This file is relatively easy to parse into the individual articles for posting to Pressbooks, especially as I added <hr /> tags before each article (even the first) and added a stop marker at the end.
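Something along these lines gets the articles out; a simplified sketch with a made-up file name, assuming the transcluded HTML has been saved locally:

# Split the saved transclusion into articles at the <hr /> markers.
with open("omniana_vol1.html", encoding="utf-8") as f:
    html = f.read()

chunks = html.split("<hr />")
articles = chunks[1:]      # everything before the first <hr /> is front matter
# (the last chunk still contains the stop marker and needs trimming)
print(len(articles), "articles found")
print(articles[0][:200])   # peek at the start of the first article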

In the longer term I want to start indexing the PressBook Omniana using wikidata for linked data. This will let me look at the semantic graph of what Southey and Coleridge were interested in.

First steps with the WordPress API

I’ve not used the WordPress API before, but it is well documented and there is a useful series of articles on envatoTuts+: Introducing the WP REST API.

Put /wp-json onto the end of a WordPress blog URL and you can see the routes and endpoints (e.g. this blog, my Pressbooks/Omniana). (I use the JSON viewer chrome plugin to make these easier to read.) I found wp-api-python very useful in helping make requests against these in python. It’s available via pip as wordpress-api and I found it required the python libraries requests, beautifulsoup4, requests-oauthlib and six. It authenticates via OAuth, so on WordPress you need the WordPress REST API – Oauth1.0a plugin or similar; there’s more than you need to know about how OAuth works on envatotuts+.

I installed the Oauth1.0a plugin on WordPress multisite and Pressbooks test servers. Network activation seemed to generate errors on both Pressbooks and plain multisite WordPress, so I activated it only for the individual blog/book. Then in the Users tab on the admin screen I was able to view and set up applications:

Add Application screen from the OAuth1.0a plugin

Filling out the details and clicking on save consumer gave me a client key and client secret.

Back in python I used these to poke around the various API endpoints of my test multisite installation of WordPress, e.g.

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="wp/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds.json",
    callback="http://wordpress.home.local/test/api-test"
)
print("listing posts")
resource = "posts"
try:
    response = wpapi.get(base_url+api_path+resource)
    for post in response.json():
        print(post['id'], post['title'])
except Exception as e:
    print("couldn't get posts")
    print(e)

wpapi uses requests methods, documented here. Other useful properties and methods of the response are:

  • r.ok: boolean, True if HTTP status code is <400
  • r.content: response content in bytes
  • r.text: response content as text
  • r.headers: response headers
  • r.iter_lines(): content a line at a time
  • r.json(): response as a json object
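So a typical pattern, continuing the example above, is to check the response before parsing it, for instance (note that the v2 API returns the title as an object with a rendered key):

response = wpapi.get(base_url + api_path + "posts")
if response.ok:                    # HTTP status < 400
    for post in response.json():
        print(post['id'], post['title']['rendered'])
else:
    print("request failed:", response.status_code, response.reason)
    print(response.text[:500])     # start of the body, for debugging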

Posting to WordPress

Following the envatoTuts+ Creating, Updating, and Deleting Data article and translating to python:

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="wp/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds.json",
    callback="http://wordpress.home.local/test/api-test"
)

print("creating new post")
resource = "posts"
title = "86. Glover's Leonidas."
content = """Glover's Leonidas was unduly praised at its first appearance, and more unduly ...
..."""
excerpt = """Glover's Leonidas was unduly praised at its ..."""
data = {
    "content": content,
    "title": title,
    "excerpt": excerpt,
    "status": "draft",
    "categories": [190]
}
try:
    response = wpapi.post(base_url+api_path+resource, data)
    print(response.json())
except Exception as e:
    print("couldn't post")
    print(e)

The posts resource collection allows creation and retrieval (POST and GET methods); a specific posts/(?P<id>[\d]+) resource allows update and delete (PUT, PATCH and DELETE methods).

The keys for the data dict are the same as in the schema for the WordPress API method; they are also shown as the arguments listed in the JSON returned by wp-json for each endpoint under each route.
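So updating an existing post should look something like this; I’m assuming here that the wrapper exposes a put method alongside get and post, and the post id (123) is made up:

# update the title of post 123 using the posts/<id> resource
post_id = 123
data = {"title": "86. Glover's Leonidas (revised)."}
try:
    response = wpapi.put(base_url + api_path + "posts/" + str(post_id), data)
    print(response.json()['id'], "updated")
except Exception as e:
    print("couldn't update post", post_id)
    print(e)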

Posting to Pressbooks

Pressbooks has a whole extended set of api routes and endpoints, no ‘posts’ resources, but front-matter, back-matter, parts and chapters; all under the /pressbooks/v2/ path.

There is some documentation on the Pressbooks site. I’m posting articles as chapters into a Pressbooks site that already has some organised content, so I don’t have to worry about setting them up. Adapting from the above, changing the URL and credentials to those for my local test instance of Pressbooks, and changing the api_path, version, and resource name, this posts a test chapter to the content part of my book, as a “numberless” chapter-type:

from pprint import pprint
from wordpress import API
base_url = "http://books.home.local/omniana"
api_path = "/wp-json/pressbooks/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="pressbooks/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds3.json",
    callback="http://books.home.local/omniana/api-test"
)
print("creating new chapter")
resource = "chapters"
data = {
    "content": "test",
    "title": "test",
    "status": "publish",
    "chapter-type": 48,
    "part": 27
}
try:
    response = wpapi.post(base_url+api_path+resource, data)
    pprint(response.json())
except Exception as e:
    print("couldn't post")
    print(e)

Finding the ids for chapter-type and part needs a little detective work. You can, of course, use an API call to GET the parts and list their names and ids, in a similar way to listing the posts in the first example above; or you can just edit the part or chapter-type in the Pressbooks admin interface and inspect the url. It’s also worth noting that you need a different creds_store for each OAuth provider you connect to.
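For example, something along these lines lists the parts and their ids, using the wpapi object set up for Pressbooks above (a sketch, assuming parts come back in the same shape as posts):

resource = "parts"
try:
    response = wpapi.get(base_url + api_path + resource)
    for part in response.json():
        print(part['id'], part['title']['rendered'])
except Exception as e:
    print("couldn't get parts")
    print(e)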

Next Steps

As I said, reading through and parsing the transcluded page transcriptions wasn’t too hard (I put some markers in the transclusion to help). I made some changes to the content before posting it: perhaps the most interesting issue was changing the wiki style footnotes to Pressbooks style.
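I won’t reproduce the real conversion here, but the gist of the footnote change is something like the sketch below. It assumes the transcluded HTML uses MediaWiki’s usual sup.reference links pointing at a references list, and that the [footnote] shortcode is the Pressbooks target; both are assumptions about markup details rather than a description of my actual code:

from bs4 import BeautifulSoup

def wiki_footnotes_to_pressbooks(article_html):
    """Replace wiki-style footnote references with inline Pressbooks
    [footnote]...[/footnote] shortcodes. A sketch only: the selectors
    assume MediaWiki's usual reference markup."""
    soup = BeautifulSoup(article_html, "html.parser")
    for ref in soup.select("sup.reference a"):
        note_id = ref.get("href", "").lstrip("#")   # e.g. cite_note-3
        note = soup.find(id=note_id)
        if note is None:
            continue
        text = note.get_text(" ", strip=True)
        ref.find_parent("sup").replace_with("[footnote]" + text + "[/footnote]")
    for refs_list in soup.select("ol.references"):  # drop the now-redundant list
        refs_list.decompose()
    return str(soup)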

At the time of writing, I have started posting to the live/public instance of Omniana on Pressbooks but still have to sort some formatting issues: removing line breaks, making sure that the CSS selectors are appropriate for WordPress; that shouldn’t take long to fix.

Then I want to start indexing the articles using wikidata for linked data.


A micro look at microbit⤴

from @ John's World Wide Wall Display


A lot of micro:bits from the BBC arrived in the centre where I work, ready to be distributed to North Lanarkshire schools. I’ve taken the opportunity to break one out and have a wee play.

The devices are aimed at secondary so outside my wheelhouse, but I could not resist a wee play.

The micro:bit works by creating code for it on a computer and flashing it to the device via USB (you can also use bluetooth from a mobile app). There are several different ways to create code. You can do it in the browser with several different editors: Code Kingdom’s JavaScript, the Microsoft Block Editor, Microsoft Touch Develop or Python. I’ve had a quick try of most of these. You can also use the MU python editor that runs on Windows, OSX, Linux and Raspberry Pi.

Although I don’t really know any python I’ve found the MU editor the most reliable. The browser based ones have been occasionally flaky, causing me to switch browsers a few times. I also like to have everything stored locally (the browser editor stores code in local storage, but that means you need to either get an account sorted out or use the same browser on the same box all the time).

There is already a nice set of resources building up; I found the Raspberry Pi and micro:bit Playground both useful.

When I was looking at the Tilty Game from the micro:bit Playground I thought I might be able to make a ‘paint’ editor. This is the result. (Click to start the movie; I’ve just found you can use a gif as a poster frame.)

The code allows you to draw on the micro:bit’s LEDs: the left and right buttons move the cursor in the horizontal and vertical directions, and pressing both together toggles the lights.

And here is the code; I used hilite.me to make it look nicer. Not exactly rocket science. I expect there are better ways of doing this.

from microbit import *

Matrix = [[0 for x in range(5)] for x in range(5)]

#set initial position
x = 2
y = 2

def printmatrix():
    for x in range(5):
        for y in range(5):
            if (Matrix[x][y]):
                display.set_pixel(x, y, 6)
            else:
                display.set_pixel(x, y, 0)
    return;
            
#show cursor
display.set_pixel(x, y, 9)
 
while True:
    if button_a.is_pressed() and button_b.is_pressed():
        if (Matrix[x][y]==0):
            Matrix[x][y]=1
        else:
            Matrix[x][y]=0
        printmatrix()
        sleep(1000)
        continue
                 
    elif button_a.is_pressed():
        x = x + 1
        if (x>4):
            x=0
        printmatrix()
        display.set_pixel(x, y, 9)
    elif button_b.is_pressed():
        y = y + 1
        if (y>4):
            y=0
        printmatrix()
        display.set_pixel(x, y, 9)
    sleep(200)

The idea is we store a matrix of which lights are on. The ones turned on are shown by the printmatrix function. They are displayed at a brightness of 6 to distinguish them from the cursor, which is full beam.

The cursor is moved with the left and right buttons; it loops round (I wonder if it would be better to bounce it?). Pressing both buttons together toggles the light on or off in the matrix. The reset button clears the screen.

I had quite a lot of fun getting this to work; the formatting of the script caught me out a few times. I wonder, if I was smarter, could I take the same approach and make a noughts and crosses app?

Featured image on this post: a gif made from BBC micro:bit by Gareth Halfacree, used under a Creative Commons Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) License.

Playing with Pinboard⤴

from @ John's World Wide Wall Display

Pinboard

I’ve been using pinboard for collecting links for five years now. I like it a lot; it feeds the Links page here and most of the enviable stuff.

One of the main things I like about it is its simplicity. Pinboard lists the links, titles, and descriptions without any images or fancy stuff. Adding links via the bookmarklet is simple. It supports the delicious API and has RSS so you can pull sets of links onto blogs and webpages easily enough.

Last week I used the service to play around with python a little, to produce a more visual representation of my recent links. I appreciate the irony. This was an excuse to play with several technologies that I do not know much about.

Last month I had read this post: Homemade RSS aggregator followup by Dr Drang. This shows how to make an RSS reader with python.

I’ve very occasionally played with python for an hour or two but do not really understand the basics. I can however try things repeatedly until they work.

Planning and playing

My plan was to use the code from Dr Drang, simplifying it to deal with just one RSS feed, using my pinboard links to produce a webpage. I also wanted to make thumbnails of the linked websites and play with CSS and JavaScript a bit.

The idea was to create the webpage in my dropbox. This could be updated automatically by the script running on my mac. I’ve had dropbox long enough to have a Public folder that is very handy for publishing webpages. This is now a pro and business option only.

Here is the script: pinboardrecent.py and the current output: Recent Pinboard.
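For the curious, the overall shape of the script is roughly this. It’s a much-simplified sketch: it leaves out the thumbnail generation with webkit2png and most of the HTML/CSS, and the feed URL and file paths are illustrative rather than exactly what I use:

import feedparser

FEED = "https://feeds.pinboard.in/rss/u:yourusername/"      # recent public pins
OUTPUT = "/Users/john/Dropbox/Public/recent-pinboard.html"  # illustrative path

feed = feedparser.parse(FEED)
items = []
for entry in feed.entries:
    items.append('<div class="pin"><a href="{0}">{1}</a><p>{2}</p></div>'.format(
        entry.link, entry.title, entry.get("description", "")))

page = "<html><head><title>Recent Pinboard</title></head><body>\n{}\n</body></html>".format(
    "\n".join(items))

with open(OUTPUT, "w") as f:
    f.write(page)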

Problems

The interesting thing about all of this is the several problems I hit and their solutions.

The problems included:

  • Not knowing how to do something
  • Errors in the code I wrote
  • Errors with webkit2png 1 which I was using to produce the thumbnails.

The answers all involved google and testing and re-testing until things worked. In some, maybe all, cases I am sure my answers were not the best way of doing things, but they worked. I’ve noted most of these in the source. The other thing I see in my code is lots of print statements that are commented out. I deleted lots more. There are surely better ways to find out what is going on/going wrong with a script but this works for me.

I am never going to be a programmer, but I get a lot of fun and occasional utility out of playing around like this.

There is a huge push going on at the moment to teach coding to pupils in school. A major reason for this is getting the right skills for employment. I hope a small side benefit will be giving learners the chance to have fun producing things for themselves rather than just using services and applications produced for them.

Tinkering with code that you do not understand may not be the best way to get a deep understanding of a language. It may not even help with learning the fundamental concepts. It does in my experience hook you into engaging with learning.

This term at work I’ll be involved in providing training in getting primary pupils started with coding. I’ll be recommending tinkering as one possible way of getting started and engaging pupils. I am sure some will be as fascinated as me.

  1. webkit2png has problems when trying to get thumbnails of non https sites on El Capitan (Mac OS X 10.11); google allowed me to find a fix and edit the source of webkit2png (which turned out to be python, for extra learning).

Checking schema.org data with the Yandex structured data validator API⤴

from @ Sharing and learning

I have been writing examples of LRMI metadata for schema.org. Of course I want these to be valid, so I have been hitting various online validators quite frequently. This was getting tedious. Fortunately, the Yandex structured data validator has an API, so I could write a python script to automate the testing.

Here it is

#!/usr/bin/python
import httplib, urllib, json, sys 
from html2text import html2text
from sys import argv

noerror = False

def errhunt(key, responses):                 # a key and a dictionary,  
    print "Checking %s object" % key         # we're going on an err hunt
    if (responses[key]):
        for s in responses[key]:             
            for object_key in s.keys(): 
                if (object_key == "@error"):              
                    print "Errors in ", key
                    for error in s['@error']:
                        print "tError code:    ", error['error_code'][0]
                        print "tError message: ", html2text(error['message'][0]).replace('n',' ')
                        noerror = False
                elif (s[object_key] != ''):
                    errhunt(object_key, s)
                else:
                    print "No errors in %s object" % key
    else:
        print "No %s objects" % key

try:
    script, file_name = argv 
except:
    print "tError: Missing argument, name of file to check.ntUsage: yandexvalidator.py filename"
    sys.exit(0)

try:
    file = open( file_name, 'r' )
except:
    print "tError: Could not open file ", file_name, " to read"
    sys.exit(0)

content = file.read()

try:
    validator_url = "validator-api.semweb.yandex.ru"
    key = "12345-1234-1234-1234-123456789abc"
    params = urllib.urlencode({'apikey': key, 
                               'lang': 'en', 
                               'pretty': 'true', 
                               'only_errors': 'true' 
                             })
    validator_path = "/v1.0/document_parser?"+params
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "*/*"}
    validator_connection = httplib.HTTPSConnection( validator_url )
except:
    print "tError: something went wrong connecting to the Yandex validator." 

try:
    validator_connection.request("POST", validator_path, content, headers)
    response = validator_connection.getresponse()
    if (response.status == 204):
        noerror= True
        response_data = response.read()   # to clear for next connection
    else:
        response_data = json.load(response)
    validator_connection.close()
except:
    print "tError: something went wrong getting data from the Yandex validator by API." 
    print "tcontent:n", content
    print "tresponse: ", response.read()
    print "tstatus: ", response.status
    print "tmessage: ", response.msg 
    print "treason: ", response.reason 
    print "n"

    raise
    sys.exit(0)

if noerror :
    print "No errors found."
else:
    for k in response_data.keys():
        errhunt(k, response_data)

Usage:

$ ./yandexvalidator.py test.html
No errors found.
$ ./yandexvalidator.py test2.html
Checking json-ld object
No json-ld objects
Checking rdfa object
No rdfa objects
Checking id object
No id objects
Checking microformat object
No microformat objects
Checking microdata object
Checking http://schema.org/audience object
Checking http://schema.org/educationalAlignment object
Checking http://schema.org/video object
Errors in  http://schema.org/video
	Error code:     missing_empty
	Error message:  WARNING: Не выполнено обязательное условие для передачи данных в Яндекс.Видео: **isFamilyFriendly** field missing or empty  
	Error code:     missing_empty
	Error message:  WARNING: Не выполнено обязательное условие для передачи данных в Яндекс.Видео: **thumbnail** field missing or empty  
$ 

Points to note:

  • I’m no software engineer. I’ve tested this against valid and invalid files. You’re welcome to use this, but it might not work for you. (You’ll need your own API key). If you see something needs fixing, drop me a line.
  • Line 51: has to be an HTTPS connection.
  • Line 58: we ask for errors only (at line 46) so no news is good news.
  • The function errhunt does most of the work, recursively.

The response from the API is a json object (converted into a python dictionary by line 62) whose keys are the “id” you sent and each of the structured data formats that are checked. For each of these there is an array/list of objects, and within those objects the values are either simple key-value pairs or further arrays/lists of objects. If there is an error in any of the objects, the value for the key “@error” gives the details, as a list of error_code and message key-value pairs. errhunt iterates and recurses through these nested lists of objects.
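By way of illustration, the parsed response for a file with one microdata error might look roughly like this; this is a hand-made sketch of the shape described above, not real validator output, and the values are invented:

# a hand-made sketch of the shape of the parsed response, values invented
response_data = {
    "id": [],            # whatever document id was sent (empty in this run)
    "json-ld": [],       # no JSON-LD objects found
    "rdfa": [],
    "microformat": [],
    "microdata": [       # one microdata item was found...
        {
            "http://schema.org/video": [
                {
                    "@error": [   # ...and it has errors
                        {"error_code": ["missing_empty"],
                         "message": ["<p><strong>thumbnail</strong> field missing or empty</p>"]}
                    ]
                }
            ]
        }
    ]
}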