Tag Archives: python

Graphical Application Profiles?⤴

from @ Sharing and learning

In this post I outline how a graphical representation of an application profile can be converted to SHACL that can be used for data validation.

My last few posts have been about work I have been doing with the Dublin Core Application Profiles Interest Group on Tabular Application Profiles (TAPs). In introducing TAPs I described them as “a human-friendly approach that also lends itself to machine processing”. The human readability comes from the tabular format, and the use of a defined CSV structure makes this machine processable. I’ve illustrated the machine processability through a python program, tap2shacl.py, that will convert a TAP into SHACL that can be used to validate instance data against the application profile, and I’ve shown that this works with a simple application profile and a real-world application profile based on DCAT. Once you get to these larger application profiles the tabular view is useful but a graphical representation is also great for providing an overview. For example here’s the graphic of the DCAT AP:

Source: Joinup DCAT AP

Mind the GAP

I’ve long wondered whether it would be possible to convert the source for a graphical representation of an application profile (let’s call it a GAP) into one of the machine readable RDF formats. That boils down to processing the native format of the diagram file or any export from the graphics package used to create it. So I’ve routinely been looking for any chance of that whenever I come across a new diagramming tool. The breakthrough came when I noticed that lucid chart allows CSV export. After some exploration this is what I came up with.

As diagramming software, what Lucid chart does is quite familiar from Visio, yEd, diagrams.net and the like: it allows you to produce diagrams like the one below, of the (very) simple book application profile that we use in the DC Application Profiles Interest Group for testing:

Two boxes, one representing data about a book, the other data about a person, joined by an arrow representing the author relationship. Further detail about the book and author data is provided in the boxes, as discussed in the text of the blog post.

One distinctive feature of Lucid chart is that as well as just entering text directly into fields in the diagram, you can enter it into a data form associated with any object in the diagram, as shown below, first for the page and then for the shape representing the Author:

A screen shot of the Lucid Chart software showing the page and page data

A screen shot of the Lucid Chart software showing the Author Shape and the data for it.

In the latter shot especially you can see the placeholder brackets [] in the AuthorShape object into which the values from the custom data form are put for display. Custom data can be associated with the document as a whole, any page in it and any shape (boxes, arrows etc) on the page; you can create templates for shapes so that all shapes from a given template have the same custom data fields.

I chose a template to represent Node Shapes (in the SHACL/ShEx sense, which become actual shapes in the diagram) that had the following data:

  • name and expected RDF type in the top section;
  • information about the node shape, such as label, target, closure, severity in the middle section; and,
  • a list of the properties that have the range Literal in the lower section (these are entered directly, i.e. they don’t come from the custom data form).

Properties that have a range of BNode or URI are represented as arrows.

By using a structured string for Literal valued properties, and by adding information about the application profile and namespace prefixes and their URIs into the sheet custom data, I was able to enter most of the data needed for a simple application profile. The main shortcomings are that the format for Literal valued properties is limited, and that complex constraints such as alternatives (such as: use this Literal valued property or that URI property depending on …) cannot be dealt with.

The key to the magic is that on export as CSV, each page, shape and arrow gets a row, and there is a column for the default text areas and for the custom data (whether or not the latter is displayed). It’s an ugly, sparsely populated table (you can see a copy in github), but I can read it into a python dict structure using python’s standard csv module.
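To give a flavour, here’s a minimal sketch of reading such an export with the csv module; the file name and column names are illustrative (the real export has many more, mostly empty, columns):

import csv

# Read the Lucid chart CSV export: one row per page, shape or arrow in the diagram.
with open("book_gap_export.csv", newline="") as f:   # hypothetical file name
    rows = list(csv.DictReader(f))

# Pick out the rows that carry node shape data; the column names here are
# illustrative, the real export's headers depend on the template used.
node_shapes = [r for r in rows if r.get("Name", "").endswith("Shape")]
for shape in node_shapes:
    print(shape.get("Text Area 1"), shape.get("targetType"))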

GAP2SHACL

When I created the TAP2SHACL program I aimed to do so in a very modular way: there is one module for the central application profile python classes, another to read CSV files and convert them into those python classes, and another to convert the python classes into SHACL and output them; tap2shacl.py is just a wrapper that provides a user interface to those classes. That approach paid off here because, having read the CSV file exported from lucid chart, all I had to do was create a module to convert it into the python AP classes and then I could use AP2SHACL to get the output. That conversion was fairly straightforward, mostly just tedious if ... else statements to parse the values from the data export. I did this in a Jupyter Notebook so that I could interact more easily with the data; that notebook is in github.
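In outline, then, the pipeline for a GAP looks something like the sketch below. The module and class names here are placeholders rather than the exact names in my code; the point is just how the pieces fit together:

import csv

# Hypothetical module/class names, standing in for the real ones:
from ap import AP                       # the central application profile classes
from lucid2ap import LucidCSV2AP        # new module: Lucid CSV export -> AP classes
from ap2shacl import AP2SHACLConverter  # existing module: AP classes -> SHACL

with open("book_gap_export.csv", newline="") as f:    # the diagram export
    rows = list(csv.DictReader(f))

ap = LucidCSV2AP(rows).convert()            # build the in-memory application profile
print(AP2SHACLConverter(ap).serialize())    # output it as SHACL (Turtle)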

Here’s the SHACL generated from the graphic for the simple book ap, above:

[code]
# SHACL generated by python AP to shacl converter
@base <http://example.org/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<BookShape> a sh:NodeShape ;
sh:class sdo:Book ;
sh:closed true ;
sh:description "Shape for describing books"@en ;
sh:name "Book"@en ;
sh:property <bookshapeAuthor>,
<bookshapeISBN>,
<bookshapeTitle> ;
sh:targetClass sdo:Book .

<AuthorShape> a sh:NodeShape ;
sh:class foaf:Person ;
sh:closed false ;
sh:description "Shape for describing authors"@en ;
sh:name "Author"@en ;
sh:property <authorshapeFamilyname>,
<authorshapeGivenname> ;
sh:targetObjectsOf dct:creator .

<authorshapeFamilyname> a sh:PropertyShape ;
sh:datatype xsd:string ;
sh:maxCount 1 ;
sh:minCount 1 ;
sh:name "Family name"@en ;
sh:nodeKind sh:Literal ;
sh:path foaf:familyName .

<authorshapeGivenname> a sh:PropertyShape ;
sh:datatype xsd:string ;
sh:maxCount 1 ;
sh:minCount 1 ;
sh:name "Given name"@en ;
sh:nodeKind sh:Literal ;
sh:path foaf:givenName .

<bookshapeAuthor> a sh:PropertyShape ;
sh:minCount 1 ;
sh:name "author"@en ;
sh:node <AuthorShape> ;
sh:nodeKind sh:IRI ;
sh:path dct:creator .

<bookshapeISBN> a sh:PropertyShape ;
sh:datatype xsd:string ;
sh:name "ISBN"@en ;
sh:nodeKind sh:Literal ;
sh:path sdo:isbn .

<bookshapeTitle> a sh:PropertyShape ;
sh:datatype rdf:langString ;
sh:maxCount 1 ;
sh:minCount 1 ;
sh:name "Title"@en ;
sh:nodeKind sh:Literal ;
sh:path dct:title .
[/code]

I haven’t tested this as thoroughly as the work on TAPs. The SHACL is valid, and as far as I can see it works as expected on the test instances I have for the simple book ap (though slight variations in the rules represented somehow crept in). I’m sure there will be ways of triggering exceptions in the code, or getting it to generate invalid SHACL, but for now, as a proof of concept, I think it’s pretty cool.

What next?

Well, I’m still using TAPs for some complex application profile / standards work. As it stands I don’t think I could express all the conditions that often arise in an application profile in an easily managed graphical form. Perhaps there is a way forward by generating a TAP from a diagram and then adding further rules, but then I would worry about version management if one was altered and not the other. I’m also concerned about tying this work to one commercial diagramming tool, over which I have no real control. I’m pretty sure that there is something in the GAP+TAP approach, but it would need tighter integration between the graphical and tabular representations.

I also want to explore generating other outputs than SHACL from TAPs (and graphical representations). I see a need to generate JSON-LD context files for application profiles, we should try getting ShEx from TAPs, and I have already done a little experimenting with generating RDF Schema from Lucid Chart diagrams.


Using the WordPress REST API to post a book from WikiSource to PressBooks with python⤴

from @ Sharing and learning

I am using Pressbooks to build an online edition of Southey and Coleridge’s Omniana. I transcribed the text for Volume I on wikisource. This post is about how I got that text into pressbooks; copy and paste didn’t appeal, so I thought I would try using the WordPress REST API. You could probably write a PHP plugin that would do this, but I find python a bit easier for exploratory work, so I used that.

Getting the data from Wikisource is reasonably trivial. On wikisource I have transcluded the page transcriptions into a single HTML file of the whole book. This file is relatively easy to parse into the individual articles for posting to Pressbooks, especially as I added <hr /> tags before each article (even the first) and added a stop marker at the end.
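Something along these lines gets the articles out; a simplified sketch with a made-up file name, assuming the transcluded HTML has been saved locally:

# Split the saved transclusion into articles at the <hr /> markers.
with open("omniana_vol1.html", encoding="utf-8") as f:
    html = f.read()

chunks = html.split("<hr />")
articles = chunks[1:]      # everything before the first <hr /> is front matter
# (the last chunk still contains the stop marker and needs trimming)
print(len(articles), "articles found")
print(articles[0][:200])   # peek at the start of the first article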

In the longer term I want to start indexing the PressBook Omniana using wikidata for linked data. This will let me look at the semantic graph of what Southey and Coleridge were interested in.

First steps with the WordPress API

I’ve not used the WordPress API before, but it is well documented and there is a useful series of articles on envatoTuts+: Introducing the WP REST API.

Put /wp-json onto the end of a WordPress blog URL and you can see the routes and endpoints (e.g. this blog, my Pressbooks/Omniana). (I use the JSON viewer chrome plugin to make these easier to read.) I found wp-api-python very useful in helping make requests against these in python. It’s available via pip as wordpress-api and I found it required the python libraries requests, beautifulsoup4, requests-oauthlib and six. It authenticates via OAuth, so on WordPress you need the WordPress REST API – Oauth1.0a plugin or similar; there’s more than you need to know about how OAuth works on envatotuts+.

I installed the Oauth1.0a plugin on WordPress multisite and Pressbooks test servers. Network activation seemed to generate errors on both Pressbooks and plain multisite WordPress, so I activated it only for the individual blog/book. Then in the Users tab on the admin screen I was able to view and set up applications:

Add Application screen from the OAuth1.0a plugin

Filling out the details and clicking on save consumer gave me a client key and client secret.

Back in python I used these to poke around the various API endpoints of my test multisite installation of WordPress, e.g.

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="wp/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds.json",
    callback="http://wordpress.home.local/test/api-test"
)
print("listing posts")
resource = "posts"
try:
    response = wpapi.get(base_url+api_path+resource)
    for post in response.json():
        print(post['id'], post['title'])
except Exception as e:
    print("couldn't get posts")
    print(e)

wpapi uses requests methods, documented here. Other useful properties and methods of the response are:

  • r.ok: boolean, True if HTTP status code is <400
  • r.content: response content in bytes
  • r.text: response content as text
  • r.headers: response headers
  • r.iter_lines(): content a line at a time
  • r.json(): response as a json object
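So a typical pattern, continuing the example above, is to check the response before parsing it, for instance (note that the v2 API returns the title as an object with a rendered key):

response = wpapi.get(base_url + api_path + "posts")
if response.ok:                    # HTTP status < 400
    for post in response.json():
        print(post['id'], post['title']['rendered'])
else:
    print("request failed:", response.status_code, response.reason)
    print(response.text[:500])     # start of the body, for debugging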

Posting to WordPress

Following the envatoTuts+ Creating, Updating, and Deleting Data article and translating to python:

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="wp/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds.json",
    callback="http://wordpress.home.local/test/api-test"
)

print("creating new post")
resource = "posts"
title = "86. Glover's Leonidas."
content = """Glover's Leonidas was unduly praised at its first appearance, and more unduly ...
..."""
excerpt = """Glover's Leonidas was unduly praised at its ..."""
data = {
    "content": content,
    "title": title,
    "excerpt": excerpt,
    "status": "draft",
    "categories": [190]
}
try:
    response = wpapi.post(base_url+api_path+resource, data)
    print(response.json())
except Exception as e:
    print("couldn't post")
    print(e)

The posts resource collection allows creation and retrieval (POST and GET methods); a specific posts/(?P<id>[\d]+) resource allows update and delete (PUT, PATCH and DELETE methods).

The keys for the data dict are the same as in the schema for the WordPress API method; they are also shown as the arguments listed in the JSON returned by wp-json for each endpoint under each route.
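So updating an existing post should look something like this; I’m assuming here that the wrapper exposes a put method alongside get and post, and the post id (123) is made up:

# update the title of post 123 using the posts/<id> resource
post_id = 123
data = {"title": "86. Glover's Leonidas (revised)."}
try:
    response = wpapi.put(base_url + api_path + "posts/" + str(post_id), data)
    print(response.json()['id'], "updated")
except Exception as e:
    print("couldn't update post", post_id)
    print(e)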

Posting to Pressbooks

Pressbooks has a whole extended set of api routes and endpoints, no ‘posts’ resources, but front-matter, back-matter, parts and chapters; all under the /pressbooks/v2/ path.

There is some documentation on the Pressbooks site. I’m posting articles as chapters into a Pressbooks site that already has some organised content, so I don’t have to worry about setting them up. Adapting from the above, changing the URL and credentials to those for my local test instance of Pressbooks, and changing the api_path, version, and resource name, this posts a test chapter to the content part of my book, as a “numberless” chapter-type:

from pprint import pprint
from wordpress import API
base_url = "http://books.home.local/omniana"
api_path = "/wp-json/pressbooks/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="pressbooks/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds3.json",
    callback="http://books.home.local/omniana/api-test"
)
print("creating new chapter")
resource = "chapters"
data = {
    "content": "test",
    "title": "test",
    "status": "publish",
    "chapter-type": 48,
    "part": 27
}
try:
    response = wpapi.post(base_url+api_path+resource, data)
    pprint(response.json())
except Exception as e:
    print("couldn't post")
    print(e)

Finding the ids for chapter-type and part needs a little detective work. You can, of course, use an API call to GET the parts and list their names and ids, in a similar way to listing the posts in the first example above; or you can just edit the part or chapter-type in the Pressbooks admin interface and inspect the url. It’s also worth noting that you need a different creds_store for each OAuth provider you connect to.
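For example, something along these lines lists the parts and their ids, using the wpapi object set up for Pressbooks above (a sketch, assuming parts come back in the same shape as posts):

resource = "parts"
try:
    response = wpapi.get(base_url + api_path + resource)
    for part in response.json():
        print(part['id'], part['title']['rendered'])
except Exception as e:
    print("couldn't get parts")
    print(e)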

Next Steps

As I said, reading through and parsing the transcluded page transcriptions wasn’t too hard (I put some markers in the transclusion to help). I made some changes to the content before posting it: perhaps the most interesting issue was changing the wiki style footnotes to Pressbooks style.
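I won’t reproduce the real conversion here, but the gist of the footnote change is something like the sketch below. It assumes the transcluded HTML uses MediaWiki’s usual sup.reference links pointing at a references list, and that the [footnote] shortcode is the Pressbooks target; both are assumptions about markup details rather than a description of my actual code:

from bs4 import BeautifulSoup

def wiki_footnotes_to_pressbooks(article_html):
    """Replace wiki-style footnote references with inline Pressbooks
    [footnote]...[/footnote] shortcodes. A sketch only: the selectors
    assume MediaWiki's usual reference markup."""
    soup = BeautifulSoup(article_html, "html.parser")
    for ref in soup.select("sup.reference a"):
        note_id = ref.get("href", "").lstrip("#")   # e.g. cite_note-3
        note = soup.find(id=note_id)
        if note is None:
            continue
        text = note.get_text(" ", strip=True)
        ref.find_parent("sup").replace_with("[footnote]" + text + "[/footnote]")
    for refs_list in soup.select("ol.references"):  # drop the now-redundant list
        refs_list.decompose()
    return str(soup)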

At the time of writing, I have started posting to the live/public instance of Omniana on Pressbooks but still have to sort some formatting issues: removing line breaks, making sure that the CSS selectors are appropriate for WordPress; that shouldn’t take long to fix.

Then I want to start indexing the articles using wikidata for linked data.


A micro look at microbit⤴

from @ John's World Wide Wall Display


A lot of micro:bits from the BBC arrived in the centre where I work, ready to be distributed to North Lanarkshire schools. I’ve taken the opportunity to break one out and have a wee play.

The devices are aimed at secondary so outside my wheelhouse, but I could not resist a wee play.

The micro:bit works by creating code for it on a computer and flashing it to the device via USB (you can also use bluetooth from a mobile app). There are several different ways to create code. You can do it in the browser with several different editors: Code Kingdom’s JavaScript, the Microsoft Block Editor, Microsoft Touch Develop or Python. I’ve had a quick try of most of these. You can also use the MU python editor that runs on Windows, OSX, Linux and Raspberry Pi.

Although I don’t really know any python I’ve found the MU editor the most reliable. The browser based ones have been occasionally flaky, causing me to switch browsers a few times. I also like to have everything stored locally (the browser editor stores code in local storage, but that means you need to either get an account sorted out or use the same browser on the same box all the time).

There is already a nice set of resources building up; I found the Raspberry Pi and micro:bit Playground both useful.

When I was looking at the Tilty Game from the micro:bit Playground I thought I might be able to make a ‘paint’ editor. This is the result. (Click to start the movie; I’ve just found you can use a gif as a poster frame.)

The code allows you to draw on the micro:bit’s LEDs: the left and right buttons move the cursor in the horizontal and vertical directions, and pressing both together toggles the lights.

And here is the code; I used hilite.me to make it look nicer. Not exactly rocket science. I expect there are better ways of doing this.

from microbit import *

Matrix = [[0 for x in range(5)] for x in range(5)]

#set initial position
x = 2
y = 2

def printmatrix():
    for x in range(5):
        for y in range(5):
            if (Matrix[x][y]):
                display.set_pixel(x, y, 6)
            else:
                display.set_pixel(x, y, 0)
    return;
            
#show cursor
display.set_pixel(x, y, 9)
 
while True:
    if button_a.is_pressed() and button_b.is_pressed():
        if (Matrix[x][y]==0):
            Matrix[x][y]=1
        else:
            Matrix[x][y]=0
        printmatrix()
        sleep(1000)
        continue
                 
    elif button_a.is_pressed():
        x = x + 1
        if (x>4):
            x=0
        printmatrix()
        display.set_pixel(x, y, 9)
    elif button_b.is_pressed():
        y = y + 1
        if (y>4):
            y=0
        printmatrix()
        display.set_pixel(x, y, 9)
    sleep(200)

The idea is we store a matrix of which lights are on. The ones turned on are shown by the printmatrix function. They are displayed at a brightness of 6 to distinguish them from the cursor, which is full beam.

The cursor is moved with the left and right buttons; it loops round (I wonder if it would be better to bounce it?). Pressing both buttons together toggles the light on or off in the matrix. The reset button clears the screen.

I had quite a lot of fun getting this to work; the formatting of the script caught me out a few times. I wonder, if I was smarter, could I take the same approach and make a noughts and crosses app?

Featured image on this post: a gif made from BBC micro:bit by Gareth Halfacree, used under a Creative Commons Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) License.

Playing with Pinboard⤴

from @ John's World Wide Wall Display

Pinboard

I’ve been using pinboard for collecting links for five years now. I like it a lot; it feeds the Links page here and most of the enviable stuff.

One of the main things I like about it is its simplicity. Pinboard lists the links, titles, and descriptions without any images or fancy stuff. Adding links via the bookmarklet is simple. It supports the delicious API and has RSS so you can pull sets of links onto blogs and webpages easily enough.

Last week I used the service to play around with python a little, to produce a more visual representation of my recent links. I appreciate the irony. This was an excuse to play with several technologies that I do not know much about.

Last month I had read this post: Homemade RSS aggregator followup by Dr Drang. This shows how to make an RSS reader with python.

I’ve very occasionally played with python for an hour or two but do not really understand the basics. I can however try things repeatedly until they work.

Planning and playing

My plan was to use the code from Dr Drang, simplifying it to deal with just one RSS feed, using my pinboard links to produce a webpage. I also wanted to make thumbnails of the linked websites and play with CSS and JavaScript a bit.

The idea was to create the webpage in my dropbox. This could be updated automatically by the script running on my mac. I’ve had dropbox long enough to have a Public folder that is very handy for publishing webpages. This is now a pro and business option only.

Here is the script: pinboardrecent.py and the current output: Recent Pinboard.
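For the curious, the overall shape of the script is roughly this. It’s a much-simplified sketch: it leaves out the thumbnail generation with webkit2png and most of the HTML/CSS, and the feed URL and file paths are illustrative rather than exactly what I use:

import feedparser

FEED = "https://feeds.pinboard.in/rss/u:yourusername/"      # recent public pins
OUTPUT = "/Users/john/Dropbox/Public/recent-pinboard.html"  # illustrative path

feed = feedparser.parse(FEED)
items = []
for entry in feed.entries:
    items.append('<div class="pin"><a href="{0}">{1}</a><p>{2}</p></div>'.format(
        entry.link, entry.title, entry.get("description", "")))

page = "<html><head><title>Recent Pinboard</title></head><body>\n{}\n</body></html>".format(
    "\n".join(items))

with open(OUTPUT, "w") as f:
    f.write(page)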

Problems

The interesting thing about all of this is the several problems I hit and their solutions.

The problems included:

  • Not knowing how to do something
  • Errors in the code I wrote
  • Errors with webkit2png 1 which I was using to produce the thumbnails.

The answers all involved google and testing and re-testing until things worked. In some, maybe all, cases I am sure my answers were not the best way of doing things, but they worked. I’ve noted most of these in the source. The other thing I see in my code is lots of print statements that are commented out. I deleted lots more. There are surely better ways to find out what is going on/going wrong with a script but this works for me.

I am never going to be a programmer, but I get a lot of fun and occasional utility out of playing around like this.

There is a huge push going on at the moment to teach coding to pupils in school. A major reason for this is getting the right skills for employment. I hope a small side benefit will be giving learners the chance to have fun producing things for themselves rather than just using services and applications produced for them.

Tinkering with code that you do not understand may not be the best way to get a deep understanding of a language. It may not even help with learning the fundamental concepts. It does in my experience hook you into engaging with learning.

This term at work I’ll be involved in providing training in getting primary pupils started with coding. I’ll be recommending tinkering as one possible way of getting started and engaging pupils. I am sure some will be as fascinated as me.

  1. webkit2png has problems when trying to get thumbnails of non https sites on El Capitan (Mac OS X 10.11); google allowed me to find a fix and edit the source of webkit2png (which turned out to be python, for extra learning).

Checking schema.org data with the Yandex structured data validator API⤴

from @ Sharing and learning

I have been writing examples of LRMI metadata for schema.org. Of course I want these to be valid, so I have been hitting various online validators quite frequently. This was getting tedious. Fortunately, the Yandex structured data validator has an API, so I could write a python script to automate the testing.

Here it is

#!/usr/bin/python
import httplib, urllib, json, sys 
from html2text import html2text
from sys import argv

noerror = False

def errhunt(key, responses):                 # a key and a dictionary,  
    print "Checking %s object" % key         # we're going on an err hunt
    if (responses[key]):
        for s in responses[key]:             
            for object_key in s.keys(): 
                if (object_key == "@error"):              
                    print "Errors in ", key
                    for error in s['@error']:
                        print "tError code:    ", error['error_code'][0]
                        print "tError message: ", html2text(error['message'][0]).replace('n',' ')
                        noerror = False
                elif (s[object_key] != ''):
                    errhunt(object_key, s)
                else:
                    print "No errors in %s object" % key
    else:
        print "No %s objects" % key

try:
    script, file_name = argv 
except:
    print "tError: Missing argument, name of file to check.ntUsage: yandexvalidator.py filename"
    sys.exit(0)

try:
    file = open( file_name, 'r' )
except:
    print "tError: Could not open file ", file_name, " to read"
    sys.exit(0)

content = file.read()

try:
    validator_url = "validator-api.semweb.yandex.ru"
    key = "12345-1234-1234-1234-123456789abc"
    params = urllib.urlencode({'apikey': key, 
                               'lang': 'en', 
                               'pretty': 'true', 
                               'only_errors': 'true' 
                             })
    validator_path = "/v1.0/document_parser?"+params
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "*/*"}
    validator_connection = httplib.HTTPSConnection( validator_url )
except:
    print "tError: something went wrong connecting to the Yandex validator." 

try:
    validator_connection.request("POST", validator_path, content, headers)
    response = validator_connection.getresponse()
    if (response.status == 204):
        noerror= True
        response_data = response.read()   # to clear for next connection
    else:
        response_data = json.load(response)
    validator_connection.close()
except:
    print "tError: something went wrong getting data from the Yandex validator by API." 
    print "tcontent:n", content
    print "tresponse: ", response.read()
    print "tstatus: ", response.status
    print "tmessage: ", response.msg 
    print "treason: ", response.reason 
    print "n"

    raise
    sys.exit(0)

if noerror :
    print "No errors found."
else:
    for k in response_data.keys():
        errhunt(k, response_data)

Usage:

$ ./yandexvalidator.py test.html
No errors found.
$ ./yandexvalidator.py test2.html
Checking json-ld object
No json-ld objects
Checking rdfa object
No rdfa objects
Checking id object
No id objects
Checking microformat object
No microformat objects
Checking microdata object
Checking http://schema.org/audience object
Checking http://schema.org/educationalAlignment object
Checking http://schema.org/video object
Errors in  http://schema.org/video
	Error code:     missing_empty
	Error message:  WARNING: Не выполнено обязательное условие для передачи данных в Яндекс.Видео: **isFamilyFriendly** field missing or empty  
	Error code:     missing_empty
	Error message:  WARNING: Не выполнено обязательное условие для передачи данных в Яндекс.Видео: **thumbnail** field missing or empty  
$ 

Points to note:

  • I’m no software engineer. I’ve tested this against valid and invalid files. You’re welcome to use this, but it might not work for you. (You’ll need your own API key). If you see something needs fixing, drop me a line.
  • Line 51: has to be an HTTPS connection.
  • Line 58: we ask for errors only (at line 46) so no news is good news.
  • The function errhunt does most of the work, recursively.

The response from the API is a json object (converted into a python dictionary by line 62) whose keys are the “id” you sent and each of the structured data formats that are checked. For each of these there is an array/list of objects, and within those objects the values are either simple key-value pairs or further arrays/lists of objects. If there is an error in any of the objects, the value for the key “@error” gives the details, as a list of error_code and message key-value pairs. errhunt iterates and recurses through these nested lists of objects.
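By way of illustration, the parsed response for a file with one microdata error might look roughly like this; this is a hand-made sketch of the shape described above, not real validator output, and the values are invented:

# a hand-made sketch of the shape of the parsed response, values invented
response_data = {
    "id": [],            # whatever document id was sent (empty in this run)
    "json-ld": [],       # no JSON-LD objects found
    "rdfa": [],
    "microformat": [],
    "microdata": [       # one microdata item was found...
        {
            "http://schema.org/video": [
                {
                    "@error": [   # ...and it has errors
                        {"error_code": ["missing_empty"],
                         "message": ["<p><strong>thumbnail</strong> field missing or empty</p>"]}
                    ]
                }
            ]
        }
    ]
}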