Tag Archives: python

Using the WordPress REST API to post a book from WikiSource to PressBooks with python⤴

from @ Sharing and learning

I am using Pressbooks to build an online edition of Southey and Coleridge’s Omniana. I transcribed the text for Volume I on wikisource. This post is about how I got that text into pressbooks; copy and paste didn’t appeal, so I thought I would try using the WordPress REST API. You could probably write a PHP plugin that would do this, but I find python a bit easier for exploratory work, so I used that.

Getting the data from Wikisource is reasonably trivial. On wikisource I have transcluded the page transcriptions into a single HTML file of the whole book. This file is relatively easy to parse into the individual articles for posting to Pressbooks, especially as I added <hr /> tags before each article (even the first) and added stop at the end.

In the longer term I want to start indexing the PressBook Omniana using wikidata for linked data. This will let me look at the semantic graph of what Southey and Coleridge were interested in.

First steps with the WordPress API

I’ve not used the WordPress API before, but it is well documented and there is a useful series of articles on envatoTuts+: Introducing the WP REST API.

Put /wp-json onto the end of a WordPress blog URL and you can see the routes and endpoints (e.g. this blog, my Pressbooks/Omniana). (I use the JSON viewer chrome plugin to make these easier to read.) I found wp-api-python very useful in helping make requests against these in python. It’s available via pip as wordpress-api and I found it required python the libraries request beautifulsoup4requests-oauthlib and six. It authenticates via  OAuth, so on WordPress you need the  WordPress REST API – Oauth1.0a plugin or similar; there’s more than you need to know about how OAuth works  on envatotuts+.

I installed the Oauth1.0a plugin for the network on a WordPress multisite and PressBook test servers. Network activation seemed to generate errors on Pressbooks and plain multisite WordPress, so I activated it only for the individual blog/book. Then in the Users tab on the admin screen I was will be able to view and set up applications:

Add Application screen from the OAuth1.0a plugin

Filling out the details and clicking on save consumer and  gave me a client key and client secret.

Back in python I used these to poke around the various API endpoints of my test multisite installation of WordPress, e.g.

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(
print("listing posts")
resource = "posts"
    response = wpapi.get(base_url+api_path+resource)
    for post in response.json():
        print(post['id'], post['title'])
except Exception as e:
    print("couldn't get posts")

wpapi uses requests methods, documented here.  Other useful properties and methods are

  • r.ok: boolean, True if HTTP status code is <400
  • r.content, response content in bytes,
  • r.text, response content in text
  • r.headers, response headers
  • r.iter_lines() content a line at a time
  • r.json() response as a json object

Posting to WordPress

Following the envatoTuts+ Creating, Updating, and Deleting Data article and translating to python:

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(

print("creating new post")
resource = "posts"
title = "86. Glover's Leonidas."
content = """Glover's Leonidas was unduly praised at its first appearance, and more unduly ...
excerpt = """Glover's Leonidas was unduly praised at its ..."""
data = {
    "content": content,
    "title": title,
    "excerpt": excerpt,
    "status": "draft",
    "categories": [190]
    response = wpapi.post(base_url+api_path+resource, data)
except Exception as e:
    print("couldn't post")

The posts resource collection allows creation and retrieval  (POST and GET methods); a specific posts/(?P<id>[\d]+) resource allows update and delete (PUT, PATCH and DELETE methods).

The keys for the data dict are the same as the schema for the WordPress API method, which are also shown in the arguments listed in the JSON returned by wp-json for each endpoint under each route.

Posting to Pressbooks

Pressbooks has a whole extended set of api routes and endpoints, no ‘posts’ resources, but front-matter, back-matter, parts and chapters; all under the /pressbooks/v2/ path.

There is some documentation on the Pressbooks site.  I’m posting articles as chapters into a Pressbook site that already has some organised content, so I don’t have to worry about setting them up. Adapting from the above, changing to URL and credentials to those for my local test instance of Pressbooks, and changing the api-path, version, and resource name, this posts a test chapter to the content part of my book, as a “numberless” chapter-type:

from wordpress import API
base_url = "http://books.home.local/omniana"
api_path = "/wp-json/pressbooks/v2/"
wpapi = API(
print("creating new chapter")
resource = "chapters"
data = {
"content": "test",
"title": "test",
"status": "publish",
"chapter-type": 48,
"part": 27
response = wpapi.post(base_url+api_path+resource, data)
except Exception as e:
print("couldn't post")

Finding the ids for chapter-type and part need a little detective work. You can, of course use an API call to GET the parts and  list their names and ids, in a similar way to listing the posts in the first example above; or you can just edit the part or chapter-type in the Bookpress admin interface and inspect the url. It’s also worth noting that you need a different creds_store for each OAUTH provider you connect to.

Next Steps

As I said, parsing reading through and parsing the transcluded the page transcriptions wasn’t too hard (I put some markers in the transclusion to help). I made some changes to the content before posting it: perhaps the most interesting issue was  changing the wiki style footnotes to Pressbook style.

At the time of writing, I have started posting to the live/public instance of Omniana on Pressbooks but still have to sort some formatting issues: removing line breaks, making sure that the CSS selectors are appropriate for WordPress; that shouldn’t take long to fix.

Then I want to start indexing the articles using wikidata for linked data.

The post Using the WordPress REST API to post a book from WikiSource to PressBooks with python appeared first on Sharing and learning.

A micro look at microbit⤴

from @ John's World Wide Wall Display


A lot of micro:bits from the BBC arrived in the centre where I work, ready to be distributed to North Lanarkshire schools. I’ve taken the opportunity to break one out and have a wee play.

The devices are aimed at secondary so outside my wheelhouse, but I could not resist a wee play.

The microbic works by creating code for it on a computer and flashing it to the device via USB (you can also use bluetooth from a mobile app). There are several different ways to create code. You can do in in the browser with severe different editors, Code Kingdom’s JavaScript, The Microsoft Block Editor, Microsoft Touch Develop or Python. I’ve had a quick try of most of these. You can also use the MU python editor that runs on Windows, OSX, Linux and Raspberry Pi.

Although I don’t really know any python I’ve found that the MU editor the most reliable. The browser based ones have been occasionally flaky, causing me to switch browsers a few times. I also like to have anything stored locally (the browser editor stores in local storage, but that means you need to either get an account sorted out or use the same browser on the same box all the time.)

There are already a nice set of resource building up, I found the Raspberry Pi and micro:bit Playground both useful.

When I was looking at the Tilty Game from the micro:bit Playground I though I might be able to make a ‘paint’ editor. This is the result. (click to start the movie, I’ve just found you can use a gif as a poster frame)

The code allows you to draw on the microbes LEDs, the left and right buttons move the cursor in a horizontal and vertical  directions and a double press toggle the lights.

And here is the code, I used hilite.me to make it look nicer. Not exactly rocket science. I expect there are better ways of doing this.

from microbit import *

Matrix = [[0 for x in range(5)] for x in range(5)]

#set initial position
x = 2
y = 2

def printmatrix():
    for x in range(5):
        for y in range(5):
            if (Matrix[x][y]):
                display.set_pixel(x, y, 6)
                display.set_pixel(x, y, 0)
#show cursor
display.set_pixel(x, y, 9)
while True:
    if button_a.is_pressed() and button_b.is_pressed():
        if (Matrix[x][y]==0):
    elif button_a.is_pressed():
        x = x + 1
        if (x>4):
        display.set_pixel(x, y, 9)
    elif button_b.is_pressed():
        y = y + 1
        if (y>4):
        display.set_pixel(x, y, 9)

The idea is we store a matrix of which lights are on. The ones turned on are shown by the printmatrix function. They are displayed at a brightness of 6 to distinguish them from the cursor, which is full beam.

The cursor is moved with the left and right buttons. it loops (I wonder if it would be better to bounce it?) Clicking the left and right buttons toggles the light on or of in the matrix. The reset button clears the screen.

I had quite a lot of fun getting this to work, the formatting of the script caught me out a few times. I wonder, if I was smarter, could I take the same approach and make a noughts and crosses app?

Featured image on this post a gif made from BBC micro:bit by Gareth Halfacree used under a Creative Commons — Attribution-ShareAlike 2.0 Generic — CC BY-SA 2.0 License.

Playing with Pinboard⤴

from @ John's World Wide Wall Display


I’ve been using pinboard for collecting links for five years now. I like it a lot, it feeds the Links page here and most of the enviable stuff.

One of the main things I like about it is its simplicity. Pinboard lists the links, titles, and descriptions without any images or fancy stuff. Adding links via the bookmarklet is simple. It supports the delicious API and has RSS so you can pull sets of links onto blogs and webpages easily enough.

Last week I used the service to play around with python a little. To produce a more visual representation of my recent links. I appreciate the irony. This was an excuse to play with several technologies that I do not know much about.

Last month I had read: this post Homemade RSS aggregator followup by Dr Drang. This shows how to make an RSS reader with python.

I’ve very occasionally played with python for an hour or two but do not really understand the basics. I can however try things repeatedly until they worked.

Planing and playing

My plan was to use the code from Dr Drang, simplifying it to deal with just one RSS feed. Using my pinboard links to produce a webpage. I also wanted to make thumbnails of the websites linked and play with CSS and JavaScript a bit.

The idea was to create the webpage in my dropbox. This could be updated automatically by the script running on my mac. I’ve had dropbox long enough to have a Public folder that is very handy for publishing webpages. This is now a pro and business option only.

Here is the script: pinboardrecent.py and the current output: Recent Pinboard.


The interesting thing about all of this is the several problems I hit and their solution.

The problem included:

  • Not know how to do something
  • Errors in the code I wrote
  • Errors with webkit2png 1 which I was using to produce the thumbnails.

The answers all involved google and testing and re-testing until things worked. In some all cases I am sure my answers were not the best way of doing things but they worked. I’ve noted most of these in the source. The other think I see in my code is lots of print statements that are commented out. I deleted lots more. There are surely better ways to find out what is going on/going wrong with a script but this works for me.

I am never going to be a programmer, but I get a lot of fun and occasional utility out of playing around like this.

There is a huge push to teach coding to pupils in school going on at the moment. A major reason for this is getting the right skills for employment. I hope a small side benefit will be giving learners the chance to have fun. Producing things for themselves rather than just use services and applications produced for them.

Tinkering with code that you do not understand may not be the best way to get a deep understanding of a language. It may not even help with learning the fundamental concepts. It does in my experience hook you into engaging with learning.

This term at work I’ll be involved in providing training in starting primary pupils coding. I’ll be recommending tinkering as one possible way of getting started and engaing pupils. I am sure some will be as fascinated as me.

  1. webkit2png has problems when trying to get thumbnails of non https sites on El Capitan (Mac OS X 10.11) google allowed me to find a fix and edit the source of webkit2png (which turned out to be python for extra learning).

Checking schema.org data with the Yandex structured data validator API⤴

from @ Sharing and learning

I have been writing examples of LRMI metadata for schema.org. Of course I want these to be valid, so I have been hitting various online validators quite frequently. This was getting tedious. Fortunately, the Yandex structured data validator has an API, so I could write a python script to automate the testing.

Here it is

import httplib, urllib, json, sys 
from html2text import html2text
from sys import argv

noerror = False

def errhunt(key, responses):                 # a key and a dictionary,  
    print "Checking %s object" % key         # we're going on an err hunt
    if (responses[key]):
        for s in responses[key]:             
            for object_key in s.keys(): 
                if (object_key == "@error"):              
                    print "Errors in ", key
                    for error in s['@error']:
                        print "tError code:    ", error['error_code'][0]
                        print "tError message: ", html2text(error['message'][0]).replace('n',' ')
                        noerror = False
                elif (s[object_key] != ''):
                    errhunt(object_key, s)
                    print "No errors in %s object" % key
        print "No %s objects" % key

    script, file_name = argv 
    print "tError: Missing argument, name of file to check.ntUsage: yandexvalidator.py filename"

    file = open( file_name, 'r' )
    print "tError: Could not open file ", file_name, " to read"

content = file.read()

    validator_url = "validator-api.semweb.yandex.ru"
    key = "12345-1234-1234-1234-123456789abc"
    params = urllib.urlencode({'apikey': key, 
                               'lang': 'en', 
                               'pretty': 'true', 
                               'only_errors': 'true' 
    validator_path = "/v1.0/document_parser?"+params
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "*/*"}
    validator_connection = httplib.HTTPSConnection( validator_url )
    print "tError: something went wrong connecting to the Yandex validator." 

    validator_connection.request("POST", validator_path, content, headers)
    response = validator_connection.getresponse()
    if (response.status == 204):
        noerror= True
        response_data = response.read()   # to clear for next connection
        response_data = json.load(response)
    print "tError: something went wrong getting data from the Yandex validator by API." 
    print "tcontent:n", content
    print "tresponse: ", response.read()
    print "tstatus: ", response.status
    print "tmessage: ", response.msg 
    print "treason: ", response.reason 
    print "n"


if noerror :
    print "No errors found."
    for k in response_data.keys():
        errhunt(k, response_data)


$ ./yandexvalidator.py test.html
No errors found.
$ ./yandexvalidator.py test2.html
Checking json-ld object
No json-ld objects
Checking rdfa object
No rdfa objects
Checking id object
No id objects
Checking microformat object
No microformat objects
Checking microdata object
Checking http://schema.org/audience object
Checking http://schema.org/educationalAlignment object
Checking http://schema.org/video object
Errors in  http://schema.org/video
	Error code:     missing_empty
	Error message:  WARNING: Не выполнено обязательное условие для передачи данных в Яндекс.Видео: **isFamilyFriendly** field missing or empty  
	Error code:     missing_empty
	Error message:  WARNING: Не выполнено обязательное условие для передачи данных в Яндекс.Видео: **thumbnail** field missing or empty  

Points to note:

  • I’m no software engineer. I’ve tested this against valid and invalid files. You’re welcome to use this, but it might not work for you. (You’ll need your own API key). If you see something needs fixing, drop me a line.
  • Line 51: has to be an HTTPS connection.
  • Line 58: we ask for errors only (at line 46) so no news is good news.
  • The function errhunt does most of the work, recursively.

The response from the API is a json object (and json objects are converted into python dictionary objects by line 62), the keys of which are the “id” you sent and each of the structured data formats that are checked. For each of these there is an array/list of objects, and these objects are either simple key-value pairs or the value may be an array/list of objects. If there is an error in any of the objects, the value for the key “@error” gives the details, as a list of Error_code and Error_message key-value pairs. errhunt iterates and recurses through these lists of objects with embedded lists of objects.