Tag Archives: ebooks

Using the WordPress REST API to post a book from WikiSource to PressBooks with python⤴

from @ Sharing and learning

I am using Pressbooks to build an online edition of Southey and Coleridge’s Omniana. I transcribed the text for Volume I on wikisource. This post is about how I got that text into pressbooks; copy and paste didn’t appeal, so I thought I would try using the WordPress REST API. You could probably write a PHP plugin that would do this, but I find python a bit easier for exploratory work, so I used that.

Getting the data from Wikisource is reasonably trivial. On Wikisource I have transcluded the page transcriptions into a single HTML file of the whole book. This file is relatively easy to parse into the individual articles for posting to Pressbooks, especially as I added <hr /> tags before each article (even the first) and added a “stop” marker at the end.
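As a rough illustration of that parsing step, here is a minimal sketch that splits a transcluded HTML string into articles at each <hr /> tag. The function name, the sample HTML and the exact text of the end marker are assumptions made for the example, not the script actually used.

```python
import re


def split_articles(html, stop_marker="stop"):
    """Split transcluded HTML into article chunks at each <hr> tag.

    stop_marker is an assumed end-of-book marker; pass whatever
    distinctive text was actually added to the transclusion.
    """
    # Discard everything from the stop marker onwards.
    body = html.split(stop_marker, 1)[0]
    # Split on <hr>, <hr/> or <hr /> regardless of case.
    chunks = re.split(r"<hr\s*/?>", body, flags=re.IGNORECASE)
    # Anything before the first <hr /> is page furniture, not an article.
    return [chunk.strip() for chunk in chunks[1:] if chunk.strip()]


# Hypothetical sample input, much shorter than the real book.
sample = (
    "<h1>Omniana</h1>"
    "<hr /><p>1. First article.</p>"
    "<hr /><p>2. Second article.</p>"
    "stop<p>footer</p>"
)
articles = split_articles(sample)
print(len(articles), "articles found")
```

A plain word such as "stop" could also occur inside an article, so a more distinctive marker would be safer in practice.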

In the longer term I want to start indexing the PressBook Omniana using wikidata for linked data. This will let me look at the semantic graph of what Southey and Coleridge were interested in.

First steps with the WordPress API

I’ve not used the WordPress API before, but it is well documented and there is a useful series of articles on envatoTuts+: Introducing the WP REST API.

Put /wp-json onto the end of a WordPress blog URL and you can see the routes and endpoints (e.g. this blog, my Pressbooks/Omniana). (I use the JSON viewer chrome plugin to make these easier to read.) I found wp-api-python very useful in helping make requests against these in python. It’s available via pip as wordpress-api and I found it required the python libraries requests, beautifulsoup4, requests-oauthlib and six. It authenticates via OAuth, so on WordPress you need the WordPress REST API – OAuth1.0a plugin or similar; there’s more than you need to know about how OAuth works on envatoTuts+.

I installed the OAuth1.0a plugin for the network on WordPress multisite and Pressbooks test servers. Network activation seemed to generate errors on Pressbooks and plain multisite WordPress, so I activated it only for the individual blog/book. Then in the Users tab on the admin screen I was able to view and set up applications:

Add Application screen from the OAuth1.0a plugin

Filling out the details and clicking on save consumer gave me a client key and client secret.

Back in python I used these to poke around the various API endpoints of my test multisite installation of WordPress, e.g.

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="wp/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds.json",
    callback="http://wordpress.home.local/test/api-test"
)
print("listing posts")
resource = "posts"
try:
    response = wpapi.get(base_url+api_path+resource)
    for post in response.json():
        print(post['id'], post['title'])
except Exception as e:
    print("couldn't get posts")
    print(e)

wpapi uses requests methods, documented here. Useful properties and methods of the response object (r) include:

  • r.ok: boolean, True if HTTP status code is <400
  • r.content, response content in bytes,
  • r.text, response content in text
  • r.headers, response headers
  • r.iter_lines() content a line at a time
  • r.json() response as a json object

Posting to WordPress

Following the envatoTuts+ Creating, Updating, and Deleting Data article and translating to python:

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="wp/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds.json",
    callback="http://wordpress.home.local/test/api-test"
)

print("creating new post")
resource = "posts"
title = "86. Glover's Leonidas."
content = """Glover's Leonidas was unduly praised at its first appearance, and more unduly ...
..."""
excerpt = """Glover's Leonidas was unduly praised at its ..."""
data = {
    "content": content,
    "title": title,
    "excerpt": excerpt,
    "status": "draft",
    "categories": [190]
}
try:
    response = wpapi.post(base_url+api_path+resource, data)
    print(response.json())
except Exception as e:
    print("couldn't post")
    print(e)

The posts resource collection allows creation and retrieval (POST and GET methods); a specific posts/(?P<id>[\d]+) resource allows update and delete (PUT, PATCH and DELETE methods).

The keys for the data dict are the same as the schema for the WordPress API method, which are also shown in the arguments listed in the JSON returned by wp-json for each endpoint under each route.
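Since every article needs the same set of keys, the data dict can be built by a small helper. This is only a sketch: the function name, the way the excerpt is cut from the content and the default excerpt length are assumptions for illustration; the keys themselves are the ones used in the example above.

```python
def build_post_data(title, content, category_ids, status="draft", excerpt_chars=60):
    """Build the data dict for a new post from one parsed article."""
    # Hypothetical excerpt rule: the first excerpt_chars characters of the content.
    excerpt = content[:excerpt_chars].rstrip()
    if len(content) > excerpt_chars:
        excerpt += " ..."
    return {
        "title": title,
        "content": content,
        "excerpt": excerpt,
        "status": status,
        "categories": list(category_ids),
    }


data = build_post_data(
    "86. Glover's Leonidas.",
    "Glover's Leonidas was unduly praised at its first appearance.",
    [190],
    excerpt_chars=40,
)
print(data["excerpt"])
```

The returned dict can then be passed to wpapi.post in the same way as the hand-written one.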

Posting to Pressbooks

Pressbooks has a whole extended set of api routes and endpoints, no ‘posts’ resources, but front-matter, back-matter, parts and chapters; all under the /pressbooks/v2/ path.

There is some documentation on the Pressbooks site. I’m posting articles as chapters into a Pressbooks site that already has some organised content, so I don’t have to worry about setting them up. Adapting from the above, changing the URL and credentials to those for my local test instance of Pressbooks, and changing the api-path, version, and resource name, this posts a test chapter to the content part of my book, as a “numberless” chapter-type:

from pprint import pprint

from wordpress import API
base_url = "http://books.home.local/omniana"
api_path = "/wp-json/pressbooks/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="pressbooks/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds3.json",
    callback="http://books.home.local/omniana/api-test"
)
print("creating new chapter")
resource = "chapters"
# chapter-type and part are the numeric ids of existing items in the book
data = {
    "content": "test",
    "title": "test",
    "status": "publish",
    "chapter-type": 48,
    "part": 27
}
try:
    response = wpapi.post(base_url+api_path+resource, data)
    pprint(response.json())
except Exception as e:
    print("couldn't post")
    print(e)

Finding the ids for chapter-type and part needs a little detective work. You can, of course, use an API call to GET the parts and list their names and ids, in a similar way to listing the posts in the first example above; or you can just edit the part or chapter-type in the Pressbooks admin interface and inspect the URL. It’s also worth noting that you need a different creds_store for each OAuth provider you connect to.
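The API route to those ids could look something like the sketch below, which turns the decoded JSON list from a GET request into a name-to-id lookup. It assumes the usual WordPress response shape, where posts carry a "title" object with a "rendered" string and taxonomy terms carry a plain "name"; the helper name and the sample data are made up for illustration.

```python
def ids_by_name(items):
    """Map each item's title (or name) to its id."""
    lookup = {}
    for item in items:
        # Posts normally have "title"; taxonomy terms normally have "name".
        label = item.get("title", item.get("name", ""))
        if isinstance(label, dict):
            label = label.get("rendered", "")
        lookup[label] = item["id"]
    return lookup


# Hypothetical responses, shaped like wpapi.get(...).json() output.
sample_parts = [
    {"id": 27, "title": {"rendered": "Main Body"}},
    {"id": 31, "title": {"rendered": "Appendices"}},
]
sample_types = [{"id": 48, "name": "Numberless"}]

part_ids = ids_by_name(sample_parts)
type_ids = ids_by_name(sample_types)
print(part_ids, type_ids)
```

With a live connection the input would come from something like wpapi.get(base_url+api_path+"parts").json(), following the pattern of the first example.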

Next Steps

As I said, reading through and parsing the transcluded page transcriptions wasn’t too hard (I put some markers in the transclusion to help). I made some changes to the content before posting it: perhaps the most interesting issue was changing the wiki style footnotes to Pressbooks style.
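To give an idea of what the footnote change involves, here is a simplified sketch. It assumes the wiki HTML marks each footnote with a sup element of class "reference" linking to an li in an ol of class "references", and that the Pressbooks form is the [footnote]...[/footnote] shortcode. The real rendered markup may well differ, and this is not the code actually used.

```python
import re

# Assumed shape of the wiki footnote list and in-text markers.
NOTE_RE = re.compile(
    r'<li id="(cite_note-[^"]+)">.*?'
    r'<span class="reference-text">(.*?)</span>\s*</li>',
    re.DOTALL,
)
REF_RE = re.compile(
    r'<sup[^>]*class="reference"[^>]*>\s*'
    r'<a href="#(cite_note-[^"]+)">.*?</a>\s*</sup>',
    re.DOTALL,
)
REFLIST_RE = re.compile(r'<ol class="references">.*?</ol>', re.DOTALL)


def convert_footnotes(html):
    """Move each footnote's text inline as a [footnote] shortcode."""
    notes = dict(NOTE_RE.findall(html))

    def inline(match):
        text = notes.get(match.group(1))
        if text is None:
            return match.group(0)  # leave unknown markers untouched
        return "[footnote]" + text + "[/footnote]"

    html = REF_RE.sub(inline, html)
    # The list at the foot of the article is no longer needed.
    return REFLIST_RE.sub("", html).strip()


# Hypothetical article fragment in the assumed markup.
sample = (
    '<p>Leonidas was praised.<sup id="cite_ref-1" class="reference">'
    '<a href="#cite_note-1">[1]</a></sup></p>'
    '<ol class="references"><li id="cite_note-1">'
    '<span class="mw-cite-backlink"><a href="#cite_ref-1">^</a></span> '
    '<span class="reference-text">See the first edition.</span></li></ol>'
)
converted = convert_footnotes(sample)
print(converted)
```

Regular expressions are fragile against real HTML, so for anything beyond tidy, predictable markup an HTML parser such as beautifulsoup4 would be the more robust choice.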

At the time of writing, I have started posting to the live/public instance of Omniana on Pressbooks but still have to sort some formatting issues: removing line breaks, making sure that the CSS selectors are appropriate for WordPress; that shouldn’t take long to fix.

Then I want to start indexing the articles using wikidata for linked data.

The post Using the WordPress REST API to post a book from WikiSource to PressBooks with python appeared first on Sharing and learning.

PressBooks and ePub as an OER format.⤴

from @ Sharing and learning

PressBooks does a reasonable job of importing ePub, so that ePub can be used as a portable format for open text books. But, of course, there are limits.

I have been really impressed with PressBooks, the extension to WordPress for authoring eBooks. Like WordPress it is available as a hosted service from PressBooks.com and to host yourself from PressBooks.org. I have been using the latter for a few months. It looks like a great way of authoring, hosting, using, and distributing open books. Reports like this from Steel Wagstaff about Publishing Open Textbooks at UW-Madison really show the possibilities for education that open up if you do that. There you can read what work Steel and others have been doing around PressBooks for authoring open textbooks, with interaction (using hypothe.is, and h5p), connections to their VLE (LTI), and responsible learning analytics (xAPI).

PressBooks also supports replication of content from one PressBook install to another, which is great, but what is even greater is support of import from other content creation systems. We’re not wanting monoculture here.

Open text books are, of course, a type of Open Educational Resource, and so when thinking about PressBooks as a platform for open text books you’re also thinking about PressBooks and OER. So what aspects of text-books-as-OER does PressBooks support? What aspects should it support?

OER: DERPable, 5Rs & ALMS

Frameworks for thinking about requirements for openness in educational resources go back to the very start of the OER movement. Back in the early 2000s, when JISC was thinking about repositories and Learning Objects as ways of sharing educational resources, Charles Duncan used to talk about the need for resources to be DERPable: Discoverable, Editable, Repurposable and Portable. At about the same time in the US, David Wiley was defining Open Content in terms of four, later five Rs and ALMS. The five Rs are well known: the permissions to Retain, Reuse, Revise, Remix and Redistribute. ALMS is a less memorable, more tortured acronym, relating to technical choices that affect openness in practice. The choices relate to: Access to editing tools, the Level of expertise required to use these tools, the content being Meaningfully editable, and being Self-sourced (i.e. there not being separate source and distribution files).

Portability of ePub and editing in PressBooks

I tend to approach these terms back to front: I am interested in portable formats for disseminating resources, and systems that allow these to be edited. For eBooks / open textbooks my format of choice for portability is currently ePub, which is essentially HTML and other assets (images, stylesheets, etc.) with metadata, in a zip archive. Being HTML-based, ePub is largely self-sourced, and can be edited with suitable tools (though there may be caveats around some of the other assets such as images and diagrams). Furthermore, WordPress in general and PressBooks specifically makes editing, repurposing and distributing easy without requiring knowledge of HTML. It’s a good platform for remixing, revising, reusing, retaining content. And the key to this whole ramble of a blog post is the ‘import from ePub‘ feature.
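Because an ePub is just a zip archive, its contents can be inspected with nothing more than the python standard library. The sketch below builds a tiny stand-in archive in memory (the file names in it are illustrative, not taken from a real book) and lists what is inside.

```python
import io
import zipfile


def list_epub_contents(epub_file):
    """Return the names of the files packed inside an ePub."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# Build a minimal stand-in for an ePub so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", "<container/>")
    archive.writestr("OEBPS/chapter1.xhtml", "<html><body><p>Hello</p></body></html>")
buffer.seek(0)

names = list_epub_contents(buffer)
print(names)
```

Passing the path of a downloaded .epub file instead of the in-memory buffer works the same way.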

So how does the combination of ePub and PressBooks work in practice? I had hoped to go to OpenStax and download one of their text books as ePub, but as far as I can see the best-known open textbook project doesn’t seem to make ePub available (Apple’s iPub is similar, but I don’t do iBooks so couldn’t download one). So I went to Siyavula and downloaded one of their CC:BY textbooks as an ePub. I chose that download for import into PressBooks and got a screen that lets me choose which parts of the ePub to import and what type of content to import it as.

List of sections of the ePub with tick box for whether to import in PressBooks, and radio button options for what type of book part to import as

After choosing which parts to import and hitting the import button at the bottom of the page, the content is there to edit and republish in PressBooks.

From here you can edit or add content (including by import from other sources), rearrange the content, and set options for publishing it. There is other work to be done. You will need to choose a decent theme to display your book with style. You will also need to make sure internal links work as your PressBooks permalink URL scheme might not match the URLs embedded in the content. How easy this is will vary depending on choices made when the book was created and your own knowledge of some of the WordPress tools that can be used to make bulk edits.

I am not really interested in distributing maths text books, so I won’t link to the end result of this specific example. I did once write a book in a book sprint with some colleagues, and that was published as an ePub. So here is an imported & republished version of Into The Wild (PressBook edition). I didn’t do much polishing of this: it uses a stock theme, and I haven’t fixed internal links, e.g. footnotes.

Limitations

Of course there are limits to this approach. I do not expect that much (if any) of the really interesting interactive content would survive a trip through ePub. Also much of Steel’s work that I described up at the top is PressBook platform specific. So that’s where cloning from PressBooks to PressBooks becomes useful. But ePub remains a viable way of getting textbook content into the PressBooks platform.

Also, while WordPress in general, and hence PressBooks, is a great way of distributing content, I haven’t looked much at whether metadata from the ePub is imported. On first sight none of it is, so there is work to do here in order to make the imported books discoverable. That applies to the package level metadata in ePubs, which is a separate file from the content. However, what also really interests me is the possibility of embedding education-specific schema.org metadata into the HTML content in such a way that it becomes transportable (easy, I think) and editable on import (harder).
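For reference, the package level metadata lives in the ePub's package (.opf) file as Dublin Core elements, and it is easy to read. A minimal sketch, using an invented sample package document and a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ePub package document and Dublin Core.
OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_package_metadata(opf_xml):
    """Collect the Dublin Core elements from an ePub package document."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("{%s}metadata" % OPF_NS)
    found = {}
    if metadata is None:
        return found
    for element in metadata:
        if element.tag.startswith("{%s}" % DC_NS):
            name = element.tag.split("}", 1)[1]
            # An element such as dc:creator may repeat, so keep lists.
            found.setdefault(name, []).append((element.text or "").strip())
    return found


# Invented sample; a real file would be read out of the ePub zip archive.
sample_opf = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Book</dc:title>
    <dc:language>en</dc:language>
  </metadata>
</package>"""

meta = read_package_metadata(sample_opf)
print(meta)
```

Getting values like these into the right fields on import is the part that still needs work.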

The post PressBooks and ePub as an OER format. appeared first on Sharing and learning.

Initial thoughts on EPUB-WEB (Portable Documents for the Open Web Platform)⤴

from @ Sharing and learning

In a W3C Unofficial Draft White Paper “Advancing Portable Documents for the Open Web Platform: EPUB-WEB” published 21 Nov 2014, Markus Gylling of IDPF (curators of the EPUB standards) and Ivan Herman of W3C (curators of web standards) have highlighted the potential of a specification that brings EPUB on to the Web. Informally known as EPUB-WEB, the vision is that this specification would make “EPUB a first-class citizen of the Open Web Platform and as a result significantly reduce the complexity of deploying EPUB content into browsers, for online as well as offline consumption”.

EPUB3 is based mostly on web standards, i.e. a collection of HTML5 files with associated bells and whistles (embedded video, audio, SVG, JavaScript, CSS) held in a zip archive with an XML manifest to tell an application what is there and what order to display it in. So at first EPUB-WEB seems straightforward: get rid of the zip archive, use the manifest to point to files anywhere on the web (IMS Content Packaging has allowed a similar route with “logical packages” which allow for both local and remote components). But the draft white paper raises some interesting points.

Firstly, on that manifest, in section 3.1 the authors note that while the zip file + XML manifest is a common pattern:

“W3C’s Web Application Working Group has, in its new charter, the task of defining a general packaging format for the Web to encompass the needs of various applications (like installing Web Applications or downloading data for local processing). It is probably advantageous for EPUB-WEB to adopt this format, thereby being compatible with what Web Browsers would implement anyway. While this general packaging format could hypothetically be compatible with the ZIP+XML manifest format used by EPUB (and also by the Open Document Format [ODF]) the broader requirements of installable applications and other types of content, and efficient incremental transmission over networks, may well imply a different and incompatible packaging format.”

Secondly, there’s a question about how you identify documents (and fragments within documents) reliably when they may be either online or off-line depending on whether the user has decided to “archive” them (and I think archive here includes download onto an ebook reader to take on holiday). “What is the URI of the offline version of the document”. Interestingly there is a link drawn with the W3C Annotation Working Group:

The recently formed W3C Annotation Working Group has a joint deliverable with the W3C Web Application Working Group called “Robust Anchoring”. This deliverable will provide a general framework for anchoring; and, although defined within the framework of annotations, the specification can also be used for other fragment identification use cases. Similarly, the W3C Media Fragments specification [media-frags] may prove useful to address some of the use cases.

And thirdly there is (of course) Metadata. EPUB 3 has plenty of places to put your Metadata. Most conventional publishing needs for metadata inside the EPUB file are covered with the range of metadata allowed in the manifest. However, there is additional potential for in-line metadata that is “agnostic to online and offline modes” that will “seamlessly support discovery and harvesting by both generic Web search engines, as well as dedicated bibliographic/archival/retailer systems”. The note points to schema.org in all but name:

The adoption of HTML as the vehicle for expressing publication-level metadata (i.e., using RDFa and/or Microdata  for metadata like authors or title) would have the added benefits of better I18N support than XML or JSON formats.

And what about application to learning? Taken in conjunction with the Annotation work starting at W3C, the scope for eTextBooks online (or whatever you want to call the use of EPUB-WEB for education) seems clear. One area that seems important for education use that seems inadequately addressed in the draft white paper is alternative presentations that would make the material remixable and adaptable to meet individual learner needs. There is a little in the draft about presentation control and personalization, but it is rather limited: changing the font size or page layout rather than changing the learning pathway.

eBooks and libraries, the right to eRead? #ebooks14⤴

from @ Sharing and learning

About once a year I go to some meeting or another on libraries and eBooks. I nearly always come back from it struck by the tension between libraries, as institutions of stability, and the rapid pace at which technology companies are driving forward eBook technology.  This year’s event of that type was the Scottish Library and Information Council’s 13th annual eBook conference. The keynote from Gerald Leitner, chair of the European Bureau of Library, Information and Documentation Associations task force on eBooks was especially interesting to me in introducing the Right to eRead Campaign.

Leitner spoke about the ecosystem around ebooks and libraries and about the uncertainty and instability throughout the system. Can lending libraries compete with commercial lending of eBooks (Amazon Kindle Unlimited, £6 per month for over half a million titles)? Publishers too are threatened and are fighting, as the spat between Amazon and Hachette shows–and note, it’s not publishers who are driving the change to eBooks, it’s technology companies, notably Amazon and Apple. Libraries are at risk of being the collateral damage in this fight. And where do book lovers fit in, those who as well as reading physical books read ebooks on various mobile devices?

Leitner made the point that consumers and libraries very rarely buy eBooks; you buy a limited license that allows you to download a copy and read it under certain restrictions–and no, like most people I have never bothered to read those restrictions, though I am aware of the limit to the devices on which I can read that copy, that I am not allowed to lend it and that Amazon can delete copies remotely (I don’t use Apple products, but I assume they have similar terms). A consequence of this relates to the exhaustion of rights. Under copyright authors have the right to decide whether/how their work is published, and the publishers may have the right to sell books that contain the author’s work. But once bought the book becomes the property of the person who bought it; the publisher’s rights are exhausted, they can no longer forbid that it be resold or lent. The right to lend and resell is provided by Article 6 of the WIPO Copyright Treaty and the EU Rental and Lending directive (2006/115/EC). Library lending rights are written into statute and accompanied by remuneration for authors. Ebooks, intangible, licensed and not sold, are classed as services by the EU Information Society Directive (2001/29/EC), and for these there is no exhaustion of rights, no right to resell or lend, and no statutory guarantee that libraries may provide access.

The EBLIDA right to eRead campaign is about trying to secure a right for libraries to provide access to eBooks. The argument is that without this right to access, information itself becomes privatised at the cost of an informed democracy. The campaign is asking for a statutory exemption within IP law, or mandatory fair licensing that provides libraries with the right to acquire and a right to lend.

Feeling Guilty⤴

from

I have been the proud owner of a brand new Kindle for a week now. I bought it for travelling. One of the few times I get the chance to read is when I am away on holiday. I am not a fast reader but carrying even a small pile of books is a bit of a pain when I am away from home. Later this year we are going on a two week cruise (more of that nearer the time!) so I thought that now would be a good time to make this purchase.

I am far more impressed with this little package than I thought I was going to be.  For £59 I can now read a very wide selection of books even in bright daylight (again, another reason for its purchase).  One of the advantages of being a fan of P. G. Wodehouse is that most of his books are available as free downloads!

Reading books when I am away on holiday is one thing.  Even reading late at night as I drop off to sleep is fine.  Why is it, then, that I feel guilty when I lift a book on a Saturday afternoon?  What sort of twisted work ethic means that I feel that I should be doing something far more productive on a day off work?  There is a long “to do” list needing done.  I could be outside tidying the garden.  The cars need washed.  All these are things that make me feel that I should not be sitting down doing nothing.  Why does reading a book feel like “doing nothing”?  Is physical activity more important or more valuable than intellectual activity? 

For now, the tasks can wait for a wee while.  We are going out this evening and I have been busy since early morning.  I am going to read for an hour.  Perhaps I might not feel too guilty for too long.  I have to try out my new toy after all. 

Happy reading.