Author Archives: @cullaloe | Tech, tales and imagery

DataCamp course – introduction to R

from @ @cullaloe | Tech, tales and imagery

Having abandoned the data visualisation course run by Edinburgh University, and wanting to gain some further competence in R, I took the DataCamp “Introduction to R” course. This course is written by Jonathan Cornelissen, one of the founders of DataCamp and a man with seriously good credentials in R.

Basics

Assignment and operators

a <- 4		# assignment 3 ways
4 -> a
a = 4

1 + 2		# mathematical operators
4 - 3 
6 * 5
(7 + 9) / 2 
8^2		# exponentiation
10 %% 4		# modulo

x < y		# less than
a > c		# greater than
a <= b 
j >= k 
one == two	# equal to
up != down	# not equal to

Data types

12.5 / 2.5	# numerics
7 + 123		# integers are also numerics
7 == 3		# Booleans (TRUE or FALSE) are logicals
"Hello world"	# characters

class(x)	# what data type is x?
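
A quick way to see these types for yourself is to pass one value of each kind to class(). This is only a minimal sketch: the variable names are my own and not taken from the course.

```r
# Hypothetical example values, one per basic data type
a_numeric   <- 12.5 / 2.5      # numeric
an_integer  <- 7L              # the L suffix makes an explicit integer
a_logical   <- 7 == 3          # a comparison returns a logical
a_character <- "Hello world"   # character

class(a_numeric)     # "numeric"
class(an_integer)    # "integer"
class(7 + 123)       # "numeric": plain numbers are doubles unless given the L suffix
class(a_logical)     # "logical"
class(a_character)   # "character"
```

Note that is.numeric() returns TRUE for both the numeric and the integer value, which is the sense in which integers are also numerics.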

Vectors

A vector is a one-dimensional array (think of a row in a spreadsheet). In research, this is a single observation.

# using the combine function to create a vector
a_numeric_vector <- c(1, 2, 3, 4, 5)

# the elements of a vector can be given names
names(a_numeric_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# printing the vector outputs the element names:
> a_numeric_vector
   Monday   Tuesday Wednesday  Thursday    Friday 
        1         2         3         4         5
> 

# using a vector to hold the element names
days_of_week <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(a_numeric_vector) <- days_of_week

You can do some quick and easy arithmetic with vectors.

low_nums <- c(1, 2, 3, 4, 5)
hi_nums <- c(6, 7, 8, 9, 10)

total_nums <- low_nums + hi_nums

> total_nums
[1]  7  9 11 13 15

sum(low_nums) 	# adds up the elements in the vector
mean(low_nums)	# average of elements in the vector

low_nums[3]	# print the third low number (note 1-index)
hi_nums[2:4] 	# just get the middle values

The selection of elements can be conditional using boolean values in another vector.

> c(49, 50, 51) > 50
[1] FALSE FALSE  TRUE

> nums <- c(1:99)	# vector of the first 99 integers
> fives <- nums %% 5
> nums[fives == 0]	# all of those divisible by 5
 [1]  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95

In the last example above, fives == 0 is a vector of boolean values. Used as a selector in the nums vector, only the TRUE elements are selected.
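
The same idea can be sketched a little further. The example below is my own illustration (the names nums and is_even are assumptions, and the range is smaller than the one above): it stores the logical vector first, then uses it to select, combine conditions and count matches.

```r
nums <- 1:20                        # hypothetical, smaller range than above
is_even <- nums %% 2 == 0           # logical vector: TRUE where divisible by 2

nums[is_even]                       # 2 4 6 8 10 12 14 16 18 20
nums[nums %% 2 == 0 & nums > 10]    # conditions combine with &: 12 14 16 18 20
sum(is_even)                        # TRUE counts as 1, so this counts the matches: 10
which(is_even)                      # positions of the TRUE elements
```

Keeping the selector in its own variable makes it easy to reuse the same condition on another vector of the same length.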

Matrices

A matrix in R is a collection of elements, all of the same data type, arranged in 2 dimensions of rows and columns.

> # A matrix that contains the numbers 1 to 9 in 3 rows
> matrix(1:9, byrow = TRUE, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> 

The access indicators are shown in the row labels and column headers above. So, element [2,3] of the matrix contains the value 6. The first row of my_matrix is the vector my_matrix[1,]. Row and column names can be set for matrices, as they can be for vectors. This can be done by calling rownames() and colnames(), or at the time the matrix is set up.

# Construct star_wars_matrix
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"), 
                                           c("US", "non-US")))

The function cbind() binds columns to an existing matrix. rbind() does the same thing for adding row vectors to a matrix. rowSums() and colSums() do what they sound like - making new vectors ready to be bound into the source matrix if required.

Arithmetic operators work element-wise on matrices.
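
As a rough sketch of how these functions fit together, the example below rebuilds the star_wars_matrix from above and adds totals to it. The names worldwide, all_wars_matrix and region_totals are my own choices for illustration.

```r
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
                                           c("US", "non-US")))

# One total per film, bound on as a new column (named after the variable)
worldwide <- rowSums(star_wars_matrix)
all_wars_matrix <- cbind(star_wars_matrix, worldwide)

# One total per column, bound on as a new row called "total"
region_totals <- colSums(all_wars_matrix)
rbind(all_wars_matrix, total = region_totals)

star_wars_matrix["A New Hope", "US"]   # select by row and column name: 460.998
star_wars_matrix * 2                   # arithmetic is applied to every element
```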

Factors

A factor is a data type used to store categorical variables. These are discrete variables which can only take a finite number of values (cf. continuous variables, which can have any of an infinite set of values, like real numbers). R can make a vector of the categories from a vector of categorical values:

> birthdates <- c(12,4,13,23,31,16,1,9,12,4,8,24,27,25,24,25)
> birthdates
 [1] 12  4 13 23 31 16  1  9 12  4  8 24 27 25 24 25
> bd_factors <- factor(birthdates)
> bd_factors
 [1] 12 4  13 23 31 16 1  9  12 4  8  24 27 25 24 25
Levels: 1 4 8 9 12 13 16 23 24 25 27 31
> 

Such variables are nominal or ordinal according to whether they are just names, or if they can be ranked in some meaningful way. Ordinal factors are created with additional parameters, e.g., ordered = TRUE and levels = c("low", "high"), and can be compared easily.
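
A minimal sketch of an ordinal factor, using made-up data (the speeds vector and its three levels are my own assumptions): ordered is the full name of the argument to factor(), and levels sets the ranking from lowest to highest.

```r
speeds <- c("low", "high", "medium", "low", "high")   # hypothetical data

speed_factor <- factor(speeds,
                       ordered = TRUE,
                       levels = c("low", "medium", "high"))

speed_factor                         # prints Levels: low < medium < high
levels(speed_factor)                 # "low" "medium" "high"
speed_factor[1] < speed_factor[2]    # low < high, so TRUE
summary(speed_factor)                # counts per level
```

Comparing elements of an unordered factor with < gives NA and a warning, which is why the ordering has to be declared.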

Data Frames

A data frame has the variables of a data set as columns and the observations as rows. A quick peek at the structure of a data frame is provided by head() and tail() functions, e.g.:

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
>

mtcars is one of the many data sets built into R. A list of them is obtained by calling data(). str() provides a look at the structure of a data set:

> str(mtcars)
'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
> 

Columns of a data frame are added one column vector at a time as a list of parameters in the function call data.frame(). Selecting a data point from row 32, column 2 is a matter of calling df_bears[32,2]. Note the order - observation (row) first, then variable (column). A whole observation (e.g. the tenth) is obtained by df_bears[10,]. The first 4 data points from the paw_size column are df_bears[1:4,"paw_size"]. The whole column vector is df_bears$paw_size (notice the dollar sign notation). Subsets can be made calling subset(df_bears, paw_size < 4). Sorting can be achieved by making a vector of the data frame order, based upon the columns you are interested in:

> a <- order(df_bears$claw_size)
> df_bears[a,]
> 
> mtcars[order(mtcars$disp),]
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
...
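
The df_bears data frame is never defined in these notes, so here is a small invented one to try the selections on. All of the names and numbers in it are hypothetical.

```r
# Hypothetical data: each argument to data.frame() becomes one column
df_bears <- data.frame(
  name      = c("Baloo", "Yogi", "Paddington", "Rupert", "Pooh"),
  paw_size  = c(5.2, 4.1, 3.0, 3.6, 4.4),
  claw_size = c(2.1, 1.8, 0.9, 1.2, 1.5)
)

df_bears[2, 3]                    # row 2, column 3: 1.8
df_bears[4, ]                     # the whole fourth observation
df_bears[1:3, "paw_size"]         # first three values of one column
df_bears$paw_size                 # the whole column as a vector
subset(df_bears, paw_size < 4)    # only Paddington and Rupert

# order() returns row positions, which are then used to rearrange the rows
df_bears[order(df_bears$claw_size), ]                      # ascending
df_bears[order(df_bears$claw_size, decreasing = TRUE), ]   # descending
```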

Lists

Lists can contain arbitrary data and data types. They are constructed by calling list() with optional names for each component, e.g. list(top_dogs = df_dogs[1:10,], top_cats = df_cats[1:10,]).
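
Here is a short sketch of building a list and getting things back out of it. The component names are my own; mtcars is the built-in data set used earlier.

```r
my_list <- list(numbers = 1:5,
                label   = "a label",
                cars    = head(mtcars, 3))

names(my_list)             # "numbers" "label" "cars"
my_list$label              # select a component by name
my_list[["numbers"]][2]    # second element of the numbers component: 2
my_list[[3]]               # select by position (here, the data frame)
my_list[1]                 # single brackets return a list, not the component
str(my_list, max.level = 1)
```

The difference between [[ ]] and [ ] is the thing most likely to trip you up: the first returns the component itself, the second returns a shorter list.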

Evaluation and next steps

I’ve found this introduction differently paced to the earlier introduction course run by the university. Because there is no instructor, attention has been paid to very small details: every aspect of the course works because it is programmatic. Learners have to take the right (small) steps to complete the exercises successfully. Errors are picked up and RTFQ-type prompts are given. This was less challenging than the earlier, demonstrator-led course but I completed this one, instead of bailing out feeling frustrated and weak. I also learned considerably more that is useful and earned a more secure foundation for further study.

I am working with RStudio on a daily basis now as I am producing documentation and course materials with Bookdown. My intention is to further develop competence with R and R-markdown.

OneNote Teacher Academy

The Microsoft OneNote Teacher Academy is a learning path comprising four short courses introducing the use of OneNote in teaching for lesson planning, assessment and activities for learning:

  • Getting Started with OneNote
  • OneNote Class Notebook: A teacher’s all-in-one notebook for students
  • OneNote Staff Notebook: Tools for staff collaboration
  • Independent learning with math tools in OneNote

These are my notes taken as I followed the learning path, including the reflection questions and my responses.

Getting started with OneNote

The course makes use of lock-step interactive videos which step you through mouse clicks to achieve the various functions in OneNote, which at first I thought was patronising, but which actually gets you used to the interface quickly. If you’re following and feel the same thing, stick with it.

Module 1: Getting started with OneNote for Windows 10

So, the introduction tells us that OneNote is a way of “staying organised, quickly capturing ideas, and accessing information across devices”. It’s a digital notebook that seems to rely on Windows 10 - the intro video seems to assume that all participants are using that OS and steps through “how to use Windows”. The OS is different on my MacBook, so the introductory video doesn’t work well at the start, but it gets better once you start using the OneNote app.

Reflection

What aspects of organizing words does OneNote mimic? - It’s like a real notebook and helps you make little documents or pages of notes.

What are some ways you might organize your sections and pages? - into chapters or topics, maybe. Gather those into project notebooks.

What topics might be an easy place to start a notebook or section? - Course structure, syllabus, one note per week, maybe.

Module 2: Getting to know OneNote for Windows 10

OneNote is organised into pages, which can be grouped into sections like the tabs in a Filofax. It has a built-in full-text search function, which I think is a very good thing for students to have: it’s one of the major reasons I turned from Learn to Moodle 3 years ago.

Items like text boxes in the pages of OneNote are drag-and-droppable, a bit like the Apple Pages app, and allow a great deal of flexibility over how the pages look. Each page is like “a digital canvas” that can accept media, images and ink (from a stylus), or embed things like YouTube videos. Items can be hyperlinked.

Audio or video feedback can be inserted into pages on the fly.

User help in the interface is available through the “lightbulb” icon, aka “Tell me”.

Reflection

What are the benefits of being able to start writing or typing anywhere on the page? - you can be creative with the flow and functionality of your resources and activities.

What are the benefits of being able to insert pictures and media anywhere on the page? - same again, creativity in making logical storyboards for learning.

Module 3: Getting the most out of OneNote for Windows 10

This module exemplified the use of digital ink, (re)organising materials and sharing work in a team using OneNote. A number of workarounds in the interface were also exercised in here, such as when you need to right-click, or use a non-obvious menu to perform aesthetic changes like page colour. There are neat functions built in for tablet users in particular, such as “ink to shape” and “ink to text” that convert your scruffy notes and scribbles into something somebody else can read. The glory of form over function!

Inserting video links will embed the video itself.

One very interesting feature is the researcher tool, which is used for pulling together web sources and their citations. I’m not sure it’s going to lead to good academic writing, but I’ll try it out to see how it works.

Reflection

How much time do teachers spend looking for files in old email threads? - none at all. Not this one, anyway, I’m better organised than that. I delete almost everything once I’ve detached the information I need.

How does OneNote’s organization save teachers time? - I’m not convinced it does, yet. I think that quick re-organisation of pages and sections is what is being referred to in this question.

What is the benefit to being able to access information from all of your devices? - you can drop your device off a bridge and not lose any information.

Module 4: Digital Ink

This section described the time-linked digital ink feature of OneNote which allows you to step forward and back through the sequence of pen strokes. A teacher can do this in a student’s notes, too, to enable them to replay, video-like, the thinking of the learner as they constructed their responses. A blog post advertises the features in the recent redesign of OneNote and it does look impressive in the use case shown in the promotional video. I particularly liked the attention that has been paid to achieving consistency across different platforms and devices:

OneNote users often use a range of devices. Having a cohesive user experience across all screens makes it simple for users to jump from one device to the next.

This also helps promote consistency of experience in a learning community, enabling a common culture of understanding in collaboration.

Reflection

When would you encourage students to use the “playback” feature? - when revising previous work, or when looking at teacher feedback.

Would this be beneficial for a teacher looking at a student’s homework assignment? - of course, to uncover misconceptions.

Do you think teachers would use this as they take notes for a class or meeting? - I need to try it first, but it might enable the making of useful video tutorials for revision, or accessibility, or for students who are unable to participate live.

Trying it out

I went to my own instance of OneNote to try this playback feature out but couldn’t find it at all, which was frustrating. Nor could I find things like “ink to text”, the “tell me” lightbulb, or many of the features this course has described. Perhaps the OSX version of OneNote I am using doesn’t know about the consistency across devices this module is talking about. Further reading reveals that although they are superficially similar, there are substantial differences in the feature sets in OneNote, depending on which device you are running it on. This gives the lie to the earlier content, something I am genuinely disappointed to discover, having made a real effort to suppress my visceral, justified and hard-earned distrust of All Things Microsoft.

OneNote links tend to open not in the application but a browser window - yet another version of OneNote that doesn’t have the features described so far in this course. The ecosystem seems to be significantly varied across platforms.

The very last line of module 5:

This course covers features in the OneNote Windows 10 app. To see which features are available in other OneNote versions, visit this site: The differences in OneNote.

The list isn’t complete or accurate: for example, the replay function I was genuinely excited by isn’t in the list. Spending more time exploring the OneNote interface created more difficulty. I tried to connect to Teams, where I have set up a team for the new cohort and those remaining from previous years, but with little success, and little help when it didn’t work:

Screenshot on connection

Module 5: A content library brimming with rich resources

The last module in this “getting started” course is a collection of videos or examples of content made by other teachers (“OneNote Ninjas”). I couldn’t see how the content library worked at all, nor how it worked in teaching practice. Some clues were deducible from the end of course quiz - it seems to be a read-only resource space for learners.

Reflection

How might the Content Library be a unique place for students in your class? - in the same way perhaps, as the class cupboard is unique.

What kinds of media might you add to make your Content Library a go-to place for students? - not sure, because I couldn’t get it to work, but I would hope any digital media.

Once your Content Library is complete for one class, would it take much to alter it for next year’s class? - it depends on the class and any changes to the course but in principle, maybe just change all the dates.

The quiz

A very low bar to leap over. For example…

  • OneNote can locate any word on any page in any of your notebooks with the ____ function.

    • Looking
    • Hunt
    • Search
    • Where is it?

Still useful, though, because the quiz added some details that weren’t in the course content.

OneNote Class Notebook: A teacher’s all-in-one notebook for students

The course uses the same approach as the first, with interactive videos.

Module 1: Getting started with a OneNote Class Notebook

The Class Notebook is intended to help teachers deliver content, work collaboratively and provide private feedback to learners. It is described as being like a physical notebook, filing cabinet and whiteboard in one. This is achieved by providing a private notebook for each student; a content library; a teacher-only area; and a space for collaboration with students or staff.

The teacher can look at any of the student notebooks, and provide private feedback or commentary. The collaboration space allows for peer work and assessment. The content library is only writable by the teacher.

Reflection

What types of materials would be best housed in the Content Library versus the Collaboration Space? Think of a few ways to organize your Class Notebook and start by creating one section and giving it a try. - Static materials which do not require editing or annotation by students.

I created new sections within the collaboration space and content library, and even created a new section group, but cannot immediately tell how access is managed in these groups. Further investigation revealed that you cannot do this in the OneNote app: it can be done by logging in to the Microsoft OneNote website and finding Manage notebooks. I found this difficult because the browser version of my Class Notebook and the OneNote version were different (different groups, and sections not in sync). The browser interface offered connection to the notebook, which it did in the browser, but still gave me no insight into who could see my new group. The lightbulb “Tell me what you want to do” offered a very narrow scope (menu items) or a wider search using Bing (what else, when you’re in a corporate prison?) which isn’t available without acknowledging data permissions. I’m stuck in a hole within an application infrastructure which is utterly getting in the way of what I am trying to do.

Module 2: Setting up a OneNote Class Notebook

It says here, “OneNote Class Notebook can assist teachers in providing:

  • Individualized instruction
  • Meaningful feedback
  • Rich multimedia content”

We’re getting a lot of repetition in the videos now, and more patterns emerging. Setting up a OneNote Class Notebook starts not in the app, but on (yet another different) website: Office 365. This can also be done via the MS Teams app: I set up a class notebook for the new cohort Team OK. The dialog for this task sets out the default access permissions for the Collaboration Space; Content Library; and Student Notebooks.

Class notebook setup

Once the class notebook is set up in Teams, you can open it in the OneNote app. Deleting a notebook is a completely different matter: it is ridiculously hard to do this and will take you an hour or so of Googling. Different procedures are needed according to your platform, Office account setup and which way the wind is blowing. I followed this set of instructions to delete the class notebook I had set up, naïvely, in the OneNote app. That didn’t work at all, because the menu selections on the website were different (there is no “Manage and Delete” option in OneNote today). I have no idea how to delete a notebook. I think it can be done through the online access to OneDrive, but I can’t tell if I deleted a link, or an actual item, or whether in fact it was just moved somewhere else. That situation gives me no confidence in this software suite at all. The more I learn about it, the more I dislike it. Still, we’re here now, so let’s see this through…

Reflection

In what ways would OneNote Class Notebook allow you to differentiate your instruction to various students? - by writing different things in their notebooks. By providing different materials for them in the content library.

What is it about Class Notebook that might expedite giving feedback compared to traditional paper assignments? - synchronisation. The student’s notebook is updated as soon as the teacher writes in it.

Module 3: Getting the most out of OneNote Class Notebook

It says here, “OneNote Class Notebook has many features to help teachers:

  • Distribute assignments
  • Write feedback
  • Share with parents - a unique link”

OK, I get it. It’s like Church. Keep saying the mantras, singing the hymns, re-stating the creed, and eventually we are all saved. The more I am sold to, the less I want to buy. I’m going straight to the reflections before watching the video this time:

Reflection

What are the advantages to verbal feedback for a student? - it seems personal, and can be nuanced in non-verbal ways through tone and gesture. It can be individual, and can be replayed (if a clip is available, which presumably it is) at different speeds for comic effect to help understanding.

What are the advantages to verbal feedback for a teacher? - it might be quicker, especially if you do it in one take. There are some very real disadvantages too, in that it can be replayed in contexts you hadn’t intended.

When might teachers use this feature? - I wouldn’t normally, unless I had a vision impaired student who preferred it, but can see why some might be tempted to try it.

Things just don’t work

I discovered more functionality glitches as I followed the “getting the most…” video. I think these are bugs, rather than poor connectivity between the various interfaces - Teams, the OneNote app, the different websites that manage the spaces, etc. Things just don’t work: it’s clunky and inconsistent in daily usage, resulting in huge time costs for the teacher. For example, in the OneNote app, I found “Manage Notebooks”, which popped up what looks like a browser container, where I added a teacher-only space without errors. This did not add the teacher-only space to the Class Notebook in the app, but it was visible in the browser. Browser and app were synching OK, because I could type in one and see the changes with less than a second of latency in the other[1].

I like the idea of “distributing” a page (assignment) to all student notebooks at the same time, then being able to review all of their completed edits of their copy of that page (assignment) simply in the interface. I am not sure I can rely on that without testing it first, however, so am not intending to spend any time on this. The collaboration space idea is implemented in a difficult and weird way also: individual groups of students can be allocated to sections within that space but if a deadline passes, the entire collaboration space must be locked rather than the individual section.

I am increasingly coming to the conclusion that BlackBoard Learn might be my best option for Physics PGDE in the coming session. This is a real pity because of Learn’s deficiencies. I wonder if I will go maverick again and launch a Bookdown site as a searchable resource for my students.

Module 4: Immersive Reader and Researcher

This module provides detail on some of the accessibility features built into Class Notebook, including the Immersive Reader, which is a nice idea but again, just does not work reliably. It failed to recognise login credentials when I tried to use it the first time but I was able to get it working once I’d logged in and out a few times of the various interfaces. When it does work, it’s a screen reader - “immersive” meaning that it removes sidebars and ribbons, displaying just the text of the document you’re looking at in a large font, with a moving highlighter that follows the voice.

Researcher is a weak tool that lowers the access bar for students who need help doing basic searches for information. It seems to have been lifted, like other tools in this suite, from the main Office toolset.

Reflection - see above.

Module 5: Making Math Happen in OneNote

This module is not about learning mathematics and is definitely not about “… a virtual math tutor with OneNote”. It is just a video pitch for OneNote using children talking about glitter pens and toys in the OneNote space, and a very brief mention of “ink to math”, which doesn’t seem to exist in the app, but is available in OneNote in the browser. It’s a digital ink converter that recognises mathematical symbols.

Module 6: Accessibility in OneNote

This module is more on accessibility features and functions in OneNote - a button you can click that tells you “everything is OK” in your documents.

Module 7: Translation

Self-explanatory, this one: using the share button on any document within OneNote allows you to export it through Microsoft’s Translator application if you have it installed. I don’t because it isn’t available for OSX, and it doesn’t seem to be available in the browser version of OneNote, either.

The quiz

Just as challenging as the last one. Earned myself a “Certified Microsoft Innovative Educator” badge for having completed 2 hours of online Microsoft Educator CPD.

Review of progress and next steps

What I have learned in the past few hours (a lot more than the two hours these two courses suggest) is that Microsoft tools are most definitely not the way forward for my hybridised teaching in the autumn. Too much time is required in trouble-shooting problems, bugs, inadequately documented differences between platforms, and design flaws. I want a piece of chalk, not a stick of cheese and some certified training in how to use it on a blackboard.

Your mileage may vary, of course: if you are fully embedded within the Redmond way of things, then good luck to you. This particular teacher is going to be innovative in other ways, and not be burdened by having to work this difficult software. Class Notebook is not going to feature in my teaching.

I completed the rest of the OneNote Teacher Academy learning path (without taking notes, digital or otherwise) and earned my badge.

MIE certificate

My transcript is available here.

Footnotes

  1. Later, I got this to work by signing out in the OneNote app, connecting to the University VPN, signing in to the university’s EASE authorisation (in the browser), and then restarting the OneNote app, and signing in (to Office 365) in the app. That is just such a ridiculous waste of time messing around trying to figure out why it doesn’t work, only to have it suddenly behave itself at the point of launching the laptop out the window. For reasons of mental wellbeing, I am heading back to avoiding Microsoft software as far as I possibly can. Life is just too short for this nonsense. 

Deploying a Bookdown site securely

I have been writing documentation for a project in markdown using RStudio, which provides a nice way of packaging it all as a static (html) website. I wanted to share this work with colleagues securely.

Writing workflow

The documents exist within an RStudio project and are built to a folder containing static files. That folder is by default _book, but I change this to docs to make it easy to deploy as a github site if I wish[1]. Configuration management is a crucial element to proper productivity, not just in software but also in all walks of life where documentation is important. Because of this, I use github to store my work safely, should I lose a laptop or suffer some other first-world calamity. It’s one of the reasons I use markdown when writing: configuration management is well-suited to text-based documents because it is easy to track and manage changes.

Although I keep the source files on github, I haven’t published this project to github pages because it should not be publicly available: instead, I deploy to a VPS (CentOS/Apache/Plesk), putting it all behind a login.

The domain

I set up a specific domain static.cullaloe.net for this project, and secured it with an SSL certificate.

The files

Clone the GitHub repository into a new folder somewhere behind the web-facing directory (i.e. not in httpdocs). In this example, both the repository and the local folder are called “foobar”:

$ git clone https://github.com/githubuser/foobar.git /var/www/vhosts/[domain]/foobar

It is not necessary to specify the target directory if you run the command from inside /var/www/vhosts/[domain]: git creates a folder named after the repository by default. A plain git clone fetches the whole github project[2]: it’s all or nothing. /var/www/vhosts/[domain]/foobar now contains all of the source files of the project.

Permissions

You need to create a .htpasswd file somewhere on the server, containing the username and password of each user you wish to grant access to your files:

$ /path/to/htpasswd -c /var/www/vhosts/[domain]/.htpasswd user1

This prompts you for the password you wish to set for this user. Adding another user is the same command without the -c option (which creates the file, overwriting any existing one).

The server

You need to tell Apache, using Alias, where to find the files and, with <Location>, control who can access files at the URL you are trying to protect. In the Plesk control panel, Apache & nginx Settings for static.cullaloe.net ···:

Alias /foobar /var/www/vhosts/[domain]/foobar/docs
<Location /foobar>
	AuthType Basic
	AuthName "Restricted access"
	AuthUserFile /var/www/vhosts/[domain]/.htpasswd
	Require user user1
</Location>

Outcome

I can easily continue to work on my project documentation, updating it from time to time for colleagues who are interested in seeing what I’m doing. I make (neurotic) use of github for configuration management and safekeeping of all my hard work anyway, so updating the site just requires $ git pull from the repository folder on the web server. They can then view the documentation in a browser, or download a pdf or docx that is up-to-date with my current progress.

Notes

  1. In _bookdown.yml, add the line output_dir: "docs"

  2. As far as I know, anyway. 

Microsoft Tools for Education

Having spent time thinking about the principles and approaches I will be taking with my teaching in the new semester, now I must move to thinking about the tools that I will be using.

Goodbye, Moodle

For the past 3 years, I have used a Moodle instance to support and communicate with students in preference to the University’s BlackBoard Learn. The main reason for this (there are several) is that Moodle is user-centric with features like search. Learn doesn’t have that, and is one of those systems that is designed for the designer, not the user.

Moodle has become increasingly costly to sustain because of a series of ongoing bugs: most recently, following the upgrade to 3.9, search has stopped working. Now, I’m a busy person and don’t have time to debug this right now, so I have decided that I need to use a more stable system of tools.

Community

One of the single most important aspects of learning is the community you learn with. Former students have continued to access their Moodle course for a long time after the PGDE ends. I don’t want to abandon them, or break that community, so I have set up a Slack workspace for former PGDE Physics students, including those more senior who did not have the Moodle experience. So far, they have responded well to the idea. I will be using Slack to connect to the new cohort, outside of the official channels which they cannot join until they matriculate in September. This will give them a head start on building community.

Conformity

There are a lot of people using Microsoft tools, and the university is no exception: we make increasingly consistent use of Sharepoint, Office365, OneDrive, OneNote, and the ever-present PowerPoint. Now, whilst there is a very good argument not to be suckered into the global brand domination of Microsoft, you can’t say that Bill Gates has kept all the money for himself, so there is a reason to go with the flow. The other, closer to home, reason is the perceived equity of the student cohort. Our students hate difference in experience, perceiving it to be somehow uneven or even discriminatory. It isn’t, and never has been. It’s good pedagogy to try to meet the particular needs of your learners, so teachers will always try to adapt and innovate to make things better for their own, very special, learners. That’s why I have operated a Moodle site outside of the mainstream experience for my Physics cohorts in the past 3 years.

I’m going to try to use the MS tools because the university has spent a lot of money providing them. Because it’s what the rest of the team are doing, and I might be able to offer support to my colleagues as we move forward into hybridised teaching together. Because the time cost of operating a maverick set is unnecessary, when I have the cost of conforming to bear also.

Getting ready

I need to skill up. So, I’m going to pick up an MIE, starting with the OneNote Teacher Academy course. I’m starting here because it’s where I need to begin in shifting content from the old VLE into something newer: not Learn, because it’s not fit for that particular purpose, but OneNote. Here we go.

Introduction to Mendeley Advisors Webinar⤴

from @ @cullaloe | Tech, tales and imagery

This webinar was run for the benefit of Mendeley advisors. I thought I might run a session on citation and referencing for students and so went to this webinar for a refresher on the advisor programme.

The webinar was presented by Dr Rob Moore, an AP at Old Dominion University, USA, and Rachel Brennesholtz, Head of Mendeley advisors programme at Elsevier in Amsterdam, who gave an introduction.

Why Mendeley, and how to introduce it

Rob cited the principal advantage of Mendeley over Zotero as the ability to annotate the pdfs within the app. He talked about some of the “gotchas” in Mendeley, specifically, getting the title right on import of a citation, and the inconsistency with author names.

He then discussed his demonstration approach when showing Mendeley to new users, which starts with input of several citations, showing the immediate build of the bibliography files. He then moved to demonstration of annotation of pdfs on an iPad.

The advisor community

Rachel told us about the world-wide community of advisors and users, and shared some statistics on its reach - 22,000 students were introduced to it in about 300 events in 2019. She shared a link to the advisor community1 and advised us to email community-@-mendeley.com2 if the advisor badge doesn’t show on your profile.

Benefits of the programme include expanded network storage (7.5 GB) and exclusive product updates and webinars. There’s merchandise, too (t-shirts, pens and stickers, etc). You have to register an event to get these, which can be online or in person. The Mendeley event can be part of other events (such as sessions on citation and referencing for example). Allow 4 weeks to get your stuff from Amsterdam.

Resources

Rachel winged it a bit as her slides froze, but then got on to showing the PowerPoint slides and other materials available to support training sessions put on by advisors.

Next steps

I’ve been prompted by a colleague today to make a video on note-taking using Cornell and sketch noting, both of which I use and teach as part of my work at Moray House. My colleague also shared this YouTube video of someone doing this, without actually understanding how the Cornell system works: like a lot of YouTube channel stars, that presenter knows about the product without understanding the process, which is the important part. I think I might also make a separate video or session on citing with Mendeley for academic writing.

Notes and references

  1. The advisors portal is here

  2. Remove the hyphens for the correct address. 

Teaching online: so far, so good.⤴

from @ @cullaloe | Tech, tales and imagery

I wanted to note here some reflections on how I have been doing in my experience with teaching remotely during the COVID-19 “lockdown”, which came as the PGDE Secondary programme I teach on was in its final few weeks. I ignored the suggestion that we should only have a narrow focus for our teaching at this stage, being the preparation for GTCS registration, the final task for most of the students in my cohort of about 16 physics PGDE students.

At the same time as thinking about how it’s gone so far, and sharing some of the student feedback, I am also going to reflect briefly on one aspect of how to prepare for a “digital-first” or hybridised model of teaching when the next cohort arrives in mid-September, several weeks later than usual.

The values of near future teaching

The Near Future Teaching project ran between 2017 and 2019, with the goal to develop a values-based vision for the future of digital education at the University of Edinburgh. In planning for a different delivery, it is worth considering the four values1 the project identified:

  1. Experience over assessment

    Learning should not be over-assessed and instrumentalised. Teaching should share a focus on employability and success with an understanding of the value of rich experience, creativity, curiosity and – sometimes – failure.

  2. Diversity and justice

    Education should design-in meaningful diversity and real inclusion across all areas of activity. All near future teaching should further social responsibility and global justice.

  3. Relationships first

    Relationships, dialogues and personal exchanges between students and staff build understanding in a way that is not possible via transmissive forms of teaching. Teaching should be designed to provide the time and space for proper relationships and meaningful human exchange.

  4. Participation and flexibility

    The University community should cooperatively shape how – and what – it learns and teaches. Flexibility for individuals, fluency across disciplines and cooperative responsibility for curricula should shape near future teaching.

Now, we are not in a position to just tear up all that has gone before and begin with a new design. We must retain all that is good and work with what we’ve got: there are many reasons for this, not least of which is time and cost; but also the fact that our initial teacher education programmes are carefully and rigorously accredited by the GTCS. This means that we are not in a position to substantively alter content.

What we must do, therefore, is focus on the elements of delivery and overlay the changes imposed upon us by that, on our core material. This is hard to do, and much work is being done to anticipate the future and plan for equitable and valid experience on the PGDE as it will be delivered in the coming session. These four values are helpful to me in thinking about how I do my part of this, alongside professional learning, reflection and dialogue with colleagues. I like to think, and can show, that these values are already present in my teaching and those of my colleagues in initial teacher education: these values reflect those of the GTCS professional standards by which our product is measured. So, the future is now. Let’s get on with it.

How have I done so far?

In my online teaching, I have found that direct teaching, which has a place in the usual run of things, is significantly less effective. I have learned to move away from this towards a more student-focused pedagogy which makes frequent use of groups working together on a problem or question, and plenaries or discussions intended to draw out key ideas and actions, and which offer some form of formative assessment opportunity.

An example of this might be a stimulus question like this, which I used in the last tutorial they had as a group:

A tutor gives the following feedback to a student on an essay: “You assert without evidence that making shy young people do a presentation to peers will help ensure they become confident somehow. How do you know that?”

In groups, consider what specific advice you would offer next year’s PGDE students about this challenge from their tutor.

This is followed by a well-signposted period of about 10 minutes in which students discuss in small groups and prepare to share their responses.

I asked the students to share their thoughts on how they thought the last online session had gone, compared to those in recent weeks. Here is a small selection of their responses.

On the structure of the session

  • I liked the breakout groups.
  • Breakout groups were good for engagement and keeping me interested in the discussion.
  • … today was the best, lots of breakouts, lots to do, didn’t feel like an hour and a half.
  • Thanks, good session today, good amount of content w/o overloading, and good balance of big group and breakouts
  • The later sessions have been more interactive and using the breakout rooms and mixture of chat and voice has been much more productive/engaging.
  • Yeah it flew by! Pace was good, lots to keep me engaged and interested
  • I liked the breakout groups and discussion on stimuli.
  • Nice wholesome session, Breakout sessions really make it interactive for us.
  • Enjoyed the frequency of the breakout groups today they seemed to come at just the right time and varied the pace of the tutorial.

On the stimulus

  • The stimulus for thought at the end was interesting, especially as there were different levels to engage with it.
  • Giving us a small stimulus and letting us get on with it worked well I thought.
  • Particularly liked the last stimulus and got me thinking reflectively as a learner and teacher.

Other remarks

  • Useful as always. Ideas for practical online delivery as well as things to think about.
  • And some reassuring chat about probations was appreciated

Reflections

So far, so good, but I have not discovered any solutions here. This model cannot be applied to the entire course, and like all good teaching strategies, has a place in amongst a boxful of others. It would become very dull indeed, if I used this formula every week. What is clear is that pace and variety are very much more important in the online context. The implication of this is that these sessions in the new academic year must be time-limited, well planned and tightly managed.

Does that seem obvious? It probably is, but I think I had to go through this with my last cohort in order to have that clear vision for my next.

Notes

  1. I’m thankful to my colleague and friend Val Gordon for sharing her perspective on this project, which was described by Professor Siân Bayne during her recent keynote at the University’s Learning and Teaching Conference. 

Statistics and Visualisation with R #3⤴

from @ @cullaloe | Tech, tales and imagery

Post number three of my notes from a course on Statistics and (data) Visualisation with R, presented by Lucia Michielin of the University of Edinburgh in June and July.

Third week

Programme overview

Class       Date and time          Title
Class 1     01 June 13:00-14:30    Intro to R and R studio
Class 2     02 June 13:00-14:30    Types of data and Grammar of graphs
Class 3     08 June 13:00-14:30    Intro to statistics and descriptive stats
Class 4     09 June 13:00-14:30    Boxplot and playing with Colours
Class 5     15 June 13:00-14:30    Data Collection Bias, Probability, and Distribution
Class 6     16 June 13:00-14:30    Hypothesis testings and the main tests
Class 7     22 June 13:00-14:30    Barcharts and cleaning the sample
Class 8     23 June 13:00-14:30    PCA and Cluster analysis
Class 9     29 June 13:00-14:30    Covariance, Regression, Similarity and Difference coefficients
Class 10    30 June 13:00-14:30    Recap and Bring your dataset class

Class 5 - Data Collection Bias, Probability, and Distribution

After a quick review of the challenge from last week, an agenda for today was shared which wasn’t really followed1.

Data Collection and bias

“Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. The goal for all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed.” (Wikipedia)

A sample is taken from the population, and bias can be introduced by the way that sample is selected. One example of this is survivorship bias.
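To make survivorship bias concrete, here is a minimal sketch in R. The scores, the cut-off of 55 and the variable names are all invented for illustration; the only point is that looking at the "survivors" alone overstates the average of the whole population.

```r
# Hypothetical population of 10,000 scores (illustrative numbers only).
set.seed(1)
population <- rnorm(10000, mean = 50, sd = 10)

# Suppose we only ever get to observe the cases above some cut-off.
survivors <- population[population > 55]

# The survivors' mean is higher than the mean of everyone.
print(c(population = mean(population), survivors = mean(survivors)))
```

Any analysis run only on survivors would inherit that upward shift, which is why it matters to ask how a sample was collected.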

Probability

A basic definition was given to the class, with an example from an earlier class that helped us. If we make the statement “if A is true then B is true”, can we infer from this that if B is true then A is true? Clearly not, but we might be able to say that if B is true then A is more plausible. Plausibility here relates to how probable something is.

“Probability is a numerical description of how likely an event is to occur or how likely it is that a proposition is true.” - Wikipedia
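The idea that observing B makes A "more plausible" can be put into numbers with Bayes' rule. This is only a sketch: the three probabilities below are assumptions chosen for illustration, not figures from the class.

```r
# Assumed, illustrative numbers (not from the class):
p_A            <- 0.2   # prior probability that A is true
p_B_given_A    <- 1.0   # "if A is true then B is true"
p_B_given_notA <- 0.3   # B can also happen without A

# Total probability of B, then Bayes' rule for P(A | B).
p_B         <- p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B <- p_B_given_A * p_A / p_B

# P(A | B) comes out larger than the prior P(A), but still below 1:
# seeing B makes A more plausible without proving it.
print(c(prior = p_A, posterior = p_A_given_B))
```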

Distribution

In introducing this topic, we were shown the really neat rnorm() function that generates random data sets with specified mean and standard deviation.
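A minimal sketch of rnorm() in use. The variable name heights and the target mean and standard deviation are made up for the example; set.seed() is there only so that the random draw is repeatable.

```r
# Draw 1000 values from a normal distribution with a chosen mean and sd.
set.seed(42)
heights <- rnorm(n = 1000, mean = 170, sd = 10)

# The sample statistics should land close to (not exactly on) the targets.
sample_mean <- mean(heights)
sample_sd   <- sd(heights)
print(c(mean = sample_mean, sd = sample_sd))

# A quick look at the shape of the data.
hist(heights, breaks = 30, main = "1000 draws from rnorm()", xlab = "value")
```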

In the class, we were given a challenge to go through the steps needed with a new data set to examine it. These steps seem to be just to plot it and have a look. All of the curves looked to be distributed around a central mean, which suggests that they are normal distributions. There was a problem with the challenge files which meant that some of us didn’t have the correct files and so did a different (trivial) task. The key message is to examine your data as soon as you have it, to look for central values and distribution, so as to understand it and how you expect it to behave in analysis.

Other distributions include uniform, logarithmic, left-, right-skewed and bimodal.

Class 6 - Hypothesis testing and the main tests

Just half a dozen students today. We started with a review of the mini-challenge from last session, which included a consideration of colours: HTML colour and a ggplot colour reference were shared.

One emphasis today is that there are multiple ways of doing the same thing in R, with an illustration of data importation and summarisation. Although some of the few remaining students seemed to be following the narrative, I became increasingly frustrated and lost trying to follow the class: the thread was along the lines of hypothesis testing, the null hypothesis and corrections.

Playtime discoveries

Between this week’s two classes, I had a quick look at the Comprehensive R Archive Network (CRAN) and discovered a neat thing. Remember the funky assignment operator <- from the first class which had its funky keyboard shortcut alt -? Most programming languages I’ve used have a simple equals sign as the assignment operator and I was a bit baffled as to why super-sexy R should do something so weird as <-. Well, it turns out that in most contexts, = works in exactly the way you’d expect, so these are equivalent:

> x <- 99
> x = 99

What’s interesting about the R syntax, though, is that it can be reversed and it still works:

> 99 -> x

I’m sure that the case for this will become apparent eventually. One more way to do assignments is to call the assignment function:

> assign('x', 99) # notice the weird need for quotes around x in the function call
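Here is a small sketch that puts the four forms side by side and checks that they agree. The variable names are arbitrary. The last few lines show one context where = does not behave like <-: inside a function call, = names an argument rather than creating a variable.

```r
a <- 99          # leftward arrow
b = 99           # equals sign, at the top level
99 -> c_val      # rightward arrow
assign("d", 99)  # function call: the name is passed as a string

# All four variables now hold the same value.
same <- identical(a, b) && identical(b, c_val) && identical(c_val, d)
print(same)

# Inside a call, = only names the argument; <- assigns and then passes the value.
m1 <- mean(x = c(1, 2, 3))    # x here is mean()'s argument name
m2 <- mean(y <- c(1, 2, 3))   # y is created in the calling environment
print(exists("y"))
```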

Next steps: other ways to learn R

I got totally overwhelmed with the rate of new concepts in this class, or perhaps the pace and style of delivery. I think, given the time I have to spend after each class going over the material in order to make sense of it, it is clear that this course is not right for me. I have decided to abandon it and follow other materials more appropriate to my purpose, which is to develop skills in data visualisation using R. Despite the title, this course is not doing that for me.

Having decided2 to seek out other ways to learn how to R, I found that there are lots of self-directed learning resources out there, of course. An introduction to R course at DataCamp will be the new direction of travel down this rabbit hole for me.

Notes and references

  1. I seem to be not the only one amongst the diminishing cohort (we’re down to a dozen today (Monday), having started closer to 50) that is finding it hard to follow this course. Yes, it’s interesting and cool, the different things you can do with R and geometries, but I am missing clarity, signposting, structure and other basic pedagogical features. 

  2. Taking time after class 5 to review the presentation slides, it was possible to reconstruct perhaps what the instructor was intending, for example, in the two slides on probability and more generally with this class today. I might have got very little from today’s session without spending as long again going over the materials. I am perhaps going to seek out other resources.

Rendering LaTeX in Jekyll pages⤴

from @ @cullaloe | Tech, tales and imagery

I wanted to render a formula for standard deviation on a page in a site built on Jekyll. Sal Khan’s Khan Academy has tackled a similar problem and developed their own solution. Not only have they done that, but they have made it available to the world under an MIT Licence. That solution is KaTeX.

Described as “The fastest math typesetting library for the web”, it has no dependencies and renders very quickly on all major browsers. The API is simple and openly accessible, with javascript libraries publicly hosted in a CDN. It is also implemented in kramdown, the MIT-licensed markdown parser, which makes it an ideal solution for sites that are built on Jekyll, such as those hosted on github pages.

To implement it in a Jekyll site, the markdown parser should be set in _config.yml:

markdown: kramdown

In the <head> section, perhaps in the included file _includes/head.html, add the KaTeX libraries and css:

  <!-- KaTeX -->
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.css" integrity="sha384-zB1R0rpPzHqg7Kpt0Aljp8JPLqbXI3bhnPWROx27a9N0Ll6ZP/+DiW/UqRcLbRjq" crossorigin="anonymous">

  <!-- The loading of KaTeX is deferred to speed up page rendering -->
  <script defer src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.js" integrity="sha384-y23I5Q6l+B6vatafAwxRu/0oK/79VlbSz7Q9aiSZUvyWYIYsd+qj+o24G5ZU2zJz" crossorigin="anonymous"></script>

  <!-- To automatically render math in text elements, include the auto-render extension: -->
  <script defer src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/contrib/auto-render.min.js" integrity="sha384-kWPLUVMOks5AQFrykwIup5lo0m3iMkkHrD0uJ4H5cjeGihAutqP0yW0J6dpFiVkI" crossorigin="anonymous"
      onload="renderMathInElement(document.body);"></script>

In my implementation, I set a flag in the yaml header of any page that requires \(\LaTeX\), for example:

---
layout: post
title: Render LaTeX in Jekyll pages
date: 2020-06-11 08:24
published: yes
katex: yes
tags:
- Jekyll
- LaTeX
---

Then conditionally output the header code above only in pages that need to load the css and script:


{% if page.katex %}
  <!-- KaTeX -->
  .
  . etc.
  .
{% endif %}

Using it

Having set the katex flag, I can now type LaTeX code either as display (slightly larger and in its own block on the page), or inline:

$$\LaTeX code$$     % (for inline)
\\(\LaTeX code\\)   % (also for inline)
\\[\LaTeX code\\]   % (for display)

kramdown converts the $$ code above into either \\( inline or \\[ display syntax, which is output as html:

<p>\(\LaTeX code\)     % (inline)</p>
<p>\[\LaTeX code\]     % (display)</p>

and renders thus:

\(\LaTeX code\) \[\LaTeX code\]

Example: binomial coefficient

In the markdown for this page, if I type:

\\[
    \binom{n}{k} = \frac{n!}{k!(n-k)!}
\\]

Jekyll and kramdown will output this html:

<p>
\[
    \binom{n}{k} = \frac{n!}{k!(n-k)!}
\]</p>

In the client browser, the KaTeX javascript will render thus: \[ \binom{n}{k} = \frac{n!}{k!(n-k)!} \]

Caveats

Notice this solution does not render from the server side. All of the rendering is done on the client using the javascript loaded in the head of the page. It just does it very, very quickly.

Statistics and Visualisation with R #2⤴

from @ @cullaloe | Tech, tales and imagery

This is the second in a series of posts containing my notes from a course on Statistics and (data) Visualisation with R, presented by Lucia Michielin of the University of Edinburgh. The course is running two days a week in June and July.

Second week

Programme overview

Class       Date and time          Title
Class 1     01 June 13:00-14:30    Intro to R and R studio
Class 2     02 June 13:00-14:30    Types of data and Grammar of graphs
Class 3     08 June 13:00-14:30    Intro to statistics and descriptive stats
Class 4     09 June 13:00-14:30    Boxplot and playing with Colours
Class 5     15 June 13:00-14:30    Data Collection Bias, Probability, and Distribution
Class 6     18 June 13:00-14:30    Hypothesis testings and the main tests
Class 7     22 June 13:00-14:30    Barcharts and cleaning the sample
Class 8     23 June 13:00-14:30    PCA and Cluster analysis
Class 9     29 June 13:00-14:30    Covariance, Regression, Similarity and Difference coefficients
Class 10    30 June 13:00-14:30    Recap and Bring your dataset class

A bit of homework

Errors

I had the chance to play with the IDE and learn a few things by invoking errors.

> ggplot(college, aes(x=sat_avg, y=admission_rate)) + geom_point()
Error in ggplot(college, aes(x = sat_avg, y = admission_rate)) : 
  could not find function "ggplot"

The above occurs if you just run that line of code without first loading the library in which ggplot lives, i.e. by calling library("tidyverse") first.

> ggplot(college, aes(x=sat_avg, y=admission_rate)) + geom_point()
Error in ggplot(college, aes(x = sat_avg, y = admission_rate)) : 
  object 'college' not found

This error is thrown because the data object ‘college’ has not been created. Do this by loading the data first, i.e. call college <- read_csv('http://672258.youcanlearnit.net/college.csv').

Mutate and as.factor

Last week’s class included the creation of the dataset using this call:

college <- college %>%
  mutate(state=as.factor(state), region=as.factor(region),
         highest_degree=as.factor(highest_degree),
         control=as.factor(control), gender=as.factor(gender),
         loan_default_rate=as.numeric(loan_default_rate))

The mutate() function adds new variable columns to the dataset or replaces existing ones if the same name is used. You can see that state is replaced by a different version of itself: the as.factor() function takes data and turns it into a factor if it isn’t already. “Factor” is a term that indicates that the data is a category or enumerated type, rather than just a set of strings. To see this, consider this:

> var=letters[1:5]
> var
[1] "a" "b" "c" "d" "e"
> var=as.factor(var)
> var
[1] a b c d e
Levels: a b c d e

var is created as a character vector, then converted to a factor by the as.factor call.
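To see what changes when a character vector becomes a factor, here is a short sketch using base R only. The sizes data is invented for the example.

```r
sizes_chr <- c("small", "large", "medium", "small", "large")  # hypothetical data
sizes_fct <- as.factor(sizes_chr)

class(sizes_chr)        # a plain character vector
class(sizes_fct)        # a factor
levels(sizes_fct)       # the distinct categories, sorted alphabetically by default
as.integer(sizes_fct)   # each value is stored as an index into the levels
table(sizes_fct)        # counts per category
```

Because the categories are stored once as levels, functions such as table() and the plotting geometries can treat the column as a set of groups rather than free text.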

Class 3 - Intro to statistics and descriptive stats

  • Discuss the results of the challenge
  • Intro to Statistics
  • Descriptive Statistics
  • Summarising Statistics
  • Exploring 1 variable: Plotting distribution Histograms and Density plot

The results of the challenge

Last week’s challenge was discussed, focusing on the meaning of the plots obtained, and how to add a best fit line by adding the geom_smooth() geometry to the ggplot command. We played a bit with a tool to help us get better at “seeing” the correlation of plotted data: guessthecorrelation.
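The correlation you are trying to "see" in a scatter plot can also be computed directly. This sketch uses the built-in iris data rather than the course's college data, and base R's lm() for the straight-line fit; treating that as equivalent to what geom_smooth() draws assumes a linear method is chosen there.

```r
# iris ships with R, so this runs without downloading anything.
r   <- cor(iris$Petal.Length, iris$Petal.Width)      # Pearson correlation by default
fit <- lm(Petal.Width ~ Petal.Length, data = iris)   # least-squares straight line

print(r)
print(coef(fit))   # intercept and slope of the best-fit line
```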

Intro to Statistics

This part of the class was further levelling by going over the basics of statistics and how they may be used to summarise or infer information. Central tendency includes arithmetic mean, median and mode values. Measures of dispersion like variance and standard deviation were also discussed.

Standard deviation: \[ \sigma = \sqrt {\frac{1}{N} \sum\limits ^N _{i=1}(x_i - \mu)^2} \]

These functions were illustrated in the R IDE, including accessing columns within datasets using the dollar sign like this:

mean(iris$Petal.Width)	#using mean formula

Similar functions offer median, variance and sd (standard deviation) calculations.
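One detail worth checking by hand: the formula above divides by N (the population standard deviation), whereas R's sd() and var() divide by N - 1 (the sample versions). A quick sketch on the iris data shows the difference; the variable names are just for illustration.

```r
x  <- iris$Petal.Width
N  <- length(x)
mu <- mean(x)

sd_population <- sqrt(sum((x - mu)^2) / N)        # the formula above, dividing by N
sd_sample     <- sqrt(sum((x - mu)^2) / (N - 1))  # the N - 1 version

# sd() matches the N - 1 version, not the population formula.
print(c(population = sd_population, sample = sd_sample, builtin = sd(x)))
```

With 150 observations the two values are close, but for small samples the gap is noticeable.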

Visualise the distribution: Histogram and Density plots

There is a nice cheat sheet for ggplot data visualisation tips.

Subsets of the data based upon some category can be easily made:

virginica <- subset(iris, Species=="virginica")

A table of data around these subsets can be quickly constructed and plotted:

#Values of Virginica
mean(virginica$Petal.Width)
median(virginica$Petal.Width)
var(virginica$Petal.Width)
sd (virginica$Petal.Width)

#... etc for the other two

# Make a new table...
Species <- c("setosa","versicolor", "virginica")

# add columns...
Mean <- c(mean(setosa$Petal.Width), mean(versicolor$Petal.Width), mean(virginica$Petal.Width))
Median <- c(median(setosa$Petal.Width), median(versicolor$Petal.Width), median(virginica$Petal.Width))
Variance <- c(round(var(setosa$Petal.Width), digits = 2), round(var(versicolor$Petal.Width), digits = 2), round(var(virginica$Petal.Width), digits = 2))
SD <- c(round(sd(setosa$Petal.Width), digits = 2), round(sd(versicolor$Petal.Width), digits = 2),round(sd(virginica$Petal.Width), digits = 2))

# Make the data frame for plotting...
# data.frame() keeps the numeric columns numeric; cbind() would coerce everything to character
FullPlot <- data.frame(Species, Mean, Median, Variance, SD)
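As an alternative sketch (not the approach shown in class), the same summary table can be built without creating a separate subset for each species, by splitting the column by species and applying each function to the pieces. The names by_species and summary_table are invented here.

```r
# Split petal widths into one numeric vector per species.
by_species <- split(iris$Petal.Width, iris$Species)

# Apply each summary function to every species and collect the results.
summary_table <- data.frame(
  Species  = names(by_species),
  Mean     = sapply(by_species, mean),
  Median   = sapply(by_species, median),
  Variance = round(sapply(by_species, var), 2),
  SD       = round(sapply(by_species, sd), 2),
  row.names = NULL
)
print(summary_table)
```

This keeps the numeric columns numeric, which matters if the table is going to be plotted or used in further calculations.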

Notice that you can double-click on the FullPlot table in the environment box (equivalent to the command View(FullPlot)) to display the table in a new tab for inspection.

# ... and now plot it in a nice histogram

ggplot(iris, aes(x=Petal.Width, fill=Species))+ 
  geom_histogram(alpha=0.8,color="black", binwidth=0.08)+
  geom_vline(aes(xintercept = mean(Petal.Width)),col='red',size=2)+
  theme_bw()+facet_wrap(~Species, ncol = 1) 
  

Here’s how that looks:

The next challenge, number 2

Once again, I got nowhere with the challenge in the 10 minutes we had to do it. I spent most of that time, after loading the dataset, trying to figure out how to make a scatter plot. When we reviewed the problem at the end of the class, it became apparent that the question was looking for a histogram, not a scatter plot. Getting stuck here meant that I didn’t know how to approach the second part of the challenge, and didn’t progress to reading the third.

Class 4 - Boxplot and playing with Colours

  • Discuss Challenge 2
  • Boxplots
  • Playing with colours
  • Exporting graphs

Discussing Challenge 2 and feeling lost

No time to spend on this before the next class, so I just sat back and tried to follow the discussion1.

Boxplots

Boxplot

  • The thick black line is the median.
  • The boxes represent the middle 50% of the sample (the interquartile range, IQR).
  • The whiskers extend to the most extreme values within 1.5 × IQR of the box edges (the geom_boxplot default).
  • The dots represent the outliers, the points beyond the whiskers.

In R, boxplots are another geometry2:

ggplot(iris, aes(x=Species, y=Petal.Width, fill=Species)) + geom_boxplot()
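The numbers behind a boxplot can be inspected with base R's boxplot.stats(). This is a sketch on one species from iris; note that boxplot.stats() computes its hinges slightly differently from ggplot2's quartiles, so the values may not match a ggplot boxplot exactly.

```r
w  <- iris$Petal.Width[iris$Species == "setosa"]
bs <- boxplot.stats(w)   # default coef = 1.5, i.e. whiskers reach at most 1.5 x box length

bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # points that would be drawn individually as outliers

# Length of the box, and the furthest a whisker is allowed to reach upwards.
box_length    <- bs$stats[4] - bs$stats[2]
upper_allowed <- bs$stats[4] + 1.5 * box_length
print(c(upper_whisker = bs$stats[5], upper_allowed = upper_allowed))
```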

Playing with colours

There are a number of colour themes within R to allow you to make beautiful and readable plots and graphs. What is important, as with all open source, is that when using different packages and utilities, you must RTFM3.

These are invoked with colour commands and palettes, like scale_color_manual here:

library(wesanderson) # provides wes_palette()
ggplot(iris, aes(x=Species, y=Petal.Width, color=Species)) + 
  geom_boxplot(outlier.alpha = 0)+
  scale_color_manual(values = wes_palette("GrandBudapest2", n=4))

# other color methods:
scale_color_grey(start = 0.6, end = 0.1) # grayscale
scale_color_manual(values=c("#80ec65", "#145200", "#700015")) # manually defined
scale_color_gradientn(colours = rainbow(7)) # gradient

Exporting graphs

Pretty self-explanatory from the IDE: exporting graphs to pdf or png is available from the dropdown in the plots tab.

Challenge number 3

Well, today I managed to get a result during the time allocated for us to work on the problem, which made up for how I had been feeling earlier. With a little tweaking after class, I settled on this:

ggplot(college, aes(x=region, y=faculty_salary_avg, fill = region)) + 
  geom_boxplot(outlier.alpha = 0.0, alpha = 0.8) +
  geom_jitter(alpha =0.2)+
  theme_bw()+ 
  facet_wrap(~control, ncol = 2)+
  labs(title = "USA University", 
       subtitle = "Faculty salary by region", 
       x = "Region", 
       y = "Average salary")

which yielded this:

Challenge 3 plot

Next class

  • Data Collection
  • Datasets biases
  • Inferential statistics
  • Probability

Notes and references

  1. This shook my confidence a little, which took me back to my school days, the last time that I was so lost in a class that I wasn’t even aware when the teacher was asking a question. All you can do in this situation is wait for the next section and try to be invisible. Part of the difficulty I am having is the heavy accent of the instructor, which causes me to miss signposts and words. It made me think of all the EAL kids in Scottish schools. 

  2. There is a complete catalogue of the graphs available in R at r-graph-gallery.com

  3. Read the manual. 

Statistics and Visualisation with R⤴

from @ @cullaloe | Tech, tales and imagery

This is the first in a series of posts containing my notes from a course on Statistics and (data) Visualisation with R, presented by Lucia Michielin of the University of Edinburgh. The course is running two days a week in June and July.

First week

The learning environment

The course was presented in a blended form using Collaborate, Blackboard Learn and Slack. These represent three key elements in a remote learning environment:

  • Synchronous, whole-class activity, usually instructor-led, but may include “break-out” sessions for students to work together in groups.
  • A resources repository for files, course outlines, presenter’s notes and assignments.
  • A back-channel for students to interact in, and for interaction with the course leader. This may be synchronous and asynchronous, but Slack is only used in this course for offline use. Collaborate chat is used in the live sessions.

The offline chatter and forum space is useful in courses like this, which is why Slack is such a useful tool here. Microsoft Teams has been suggested as an equivalent to Slack but I really don’t find it intuitive or useful in the same way as Slack. The UI, as with all Microsoft products I have ever used, is difficult and intrusive.

Preparation

The pre-course task is to install R and the R-Studio IDE. I painlessly installed R 4.0.0 “Arbor Day” and R-Studio 1.3.959 on my Mac, and downloaded the data files for the first week.

Programme overview

Class       Date and time          Title
Class 1     01 June 13:00-14:30    Intro to R and R studio
Class 2     02 June 13:00-14:30    Types of data and Grammar of graphs
Class 3     08 June 13:00-14:30    Intro to statistics and descriptive stats
Class 4     09 June 13:00-14:30    Boxplot and playing with Colours
Class 5     15 June 13:00-14:30    Data Collection Bias, Probability, and Distribution
Class 6     18 June 13:00-14:30    Hypothesis testings and the main tests
Class 7     22 June 13:00-14:30    Barcharts and cleaning the sample
Class 8     23 June 13:00-14:30    PCA and Cluster analysis
Class 9     29 June 13:00-14:30    Covariance, Regression, Similarity and Difference coefficients
Class 10    30 June 13:00-14:30    Recap and Bring your dataset class

Class 1 - Intro to R and R studio

This began with an introduction on The Edinburgh Centre for Data, Culture & Society (CDCS) and their other courses and research projects, followed by an introduction to our course leader, Lucia, and a walk around the VLE folders and files. Then, an overview of the first session:

  • Intro to Quantitative methods in Research
  • The R and R studio Interface
  • How to organise your work in R efficiently
  • How to install packages

This is fairly self-explanatory but an essential levelling for all delegates on the course, who come from a range of backgrounds. Most seemed to be researchers.

R and R-Studio

R is the language, and R-Studio is a graphical IDE for working with projects and data using R. Lucia said that the key skill in acquiring a new language is knowing how to Google: R is widely used and has a large user base and the forums are very helpful in quickly overcoming problems.

Scientific method

We heard about the scientific method and in particular the issue of correlation vs. causation. This was set in the context of staying close to the process, and to the meaning of the data, whilst engaging deeply with the IDE.

  • Define a research question
  • Explore the evidence
  • Define working hypotheses
  • Translate hypotheses → statistical models
  • Compare models to evidence
  • Interpret results
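
The correlation vs. causation point can be made concrete with a small simulation. This is only a sketch using made-up data: the variable names and the hidden common cause z are assumptions for illustration, not something shown in class. Here x and y never influence each other, yet they correlate because both depend on z.

```r
set.seed(42)  # make the simulated numbers reproducible

n <- 1000
z <- rnorm(n)                # hidden common cause (hypothetical)
x <- z + rnorm(n, sd = 0.5)  # x depends on z only
y <- z + rnorm(n, sd = 0.5)  # y depends on z only

# x and y are clearly correlated, although neither causes the other
cor(x, y)

# Translate "z explains both" into statistical models, then compare:
# once z is accounted for, little correlation should remain
x_resid <- resid(lm(x ~ z))
y_resid <- resid(lm(y ~ z))
cor(x_resid, y_resid)
```

Comparing the two correlations follows the last steps of the list: a hypothesis is turned into a model, the model is set against the evidence, and the result is interpreted.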

Getting our hands dirty

Finally we got to open R-Studio and follow Lucia through opening the script for today and take a walk around the IDE. We started with creating a new project and setting up a folder organisation for the course, then played with settings to make the IDE more comfortable for us.

Comment and header syntax were explained, followed by an immediate mode of running commands: pressing cmd-enter sends the current line of the script to the console. Output is printed as a vector, so even a single value is shown with the index prefix [1].

# THIS IS A LEVEL 1 HEADER #################################

## This Is a Level 2 Header ================================

### This is a level 3 header. ------------------------------

print('Hello World!')

## Define Variables ================
x <- 1:5 # Put the integers 1 to 5 in the variable x (Alt+- is the IDE shortcut for <-)
x # Displays the values we have set in x
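
The [1] that R prints in front of a result is the index of the first element shown on that line. A minimal sketch of what that means (the variable y and the longer sequence are my own illustrative additions):

```r
x <- 1:5
length(x)   # number of elements in x
x[1]        # first element: R indexes from 1, not 0

y <- 42
length(y)     # a single number is still a vector, of length one
is.vector(y)

# With a longer vector the output wraps, and each printed line starts
# with the index of its first element
print(100:130)
```

Where the lines wrap in the last example depends on the width of the console.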

Getting help is a matter of issuing a query command: e.g. ?datasets displays help on the Datasets package.

Packages

Packages are installed and activated thus:

install.packages("ggplot2")
?install.packages  # Get help on installing packages

## Activate packages ==========================
library(ggplot2)
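
Since install.packages() downloads the package again every time it is run, it can be handy to check first what is actually missing. The helper below is a sketch of my own, not something from the course; missing_packages is a hypothetical name, built on the base function requireNamespace().

```r
# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages(c("stats", "ggplot2"))
print(needed)

# Uncomment to install whatever is missing:
# if (length(needed) > 0) install.packages(needed)
```

The install line is left commented out so that running the snippet never triggers a download by accident.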

We ended the session by installing something called the “Tidyverse”, which seems to be a collection of useful packages. It calls itself an “opinionated collection of R packages designed for data science”.

Class 2 - Types of data and Grammar of graphs

  • Type of Data
  • How to load data
  • How to choose the right Graph
  • Graph types
  • The Grammar of Graphs
  • Ggplot2 structure
  • Graph settings

Data types

Type        Description
Vector      One-dimensional collection of data, like a column or row, all of the same type
Matrix      Two-dimensional collection of data, all of the same type
Array       Multi-dimensional collection of data, all of the same type
Data frame  Mixed-type variables (similar to a table in a spreadsheet)

By convention, a table has variables in columns and observations in rows. Vectors, matrices and arrays in R must each contain a single data type; of the four types above, only a data frame allows mixed types, because each column holds one type but different columns can hold different types.
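
A quick sketch of the four types, using made-up values of my own (the column names are illustrative only). It also shows what happens when types are mixed in a vector: R silently converts everything to one common type.

```r
v <- c(2.5, 3.1, 4.8)                  # vector: one dimension, one type
m <- matrix(1:6, nrow = 2, ncol = 3)   # matrix: two dimensions, one type
a <- array(1:24, dim = c(2, 3, 4))     # array: any number of dimensions
d <- data.frame(                       # data frame: one type per column
  species = c("A", "B", "C"),
  weight = c(40, 17, 52),
  stringsAsFactors = FALSE
)

dim(m)
dim(a)
sapply(d, class)   # the columns have different types

# Mixing types in a vector coerces every element to character
mixed <- c(1, "two", TRUE)
class(mixed)
```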

Keeping the environment clean

# Clear environment
rm(list = ls())

# Clear console
cat("\014")  # ctrl+L

Loading data

library(tidyverse)  # provides read_csv() and the %>% pipe

df <- read_csv("data/RodentSimplified.csv")

# ... same, selecting columns, converting period to a factor and renaming mo to month:

df2 <- read_csv("data/RodentSimplified.csv") %>%
  select(mo,dy,yr,period,species) %>% 
  mutate(period = as.factor(period)) %>%
  rename(month = mo) %>%
  print()

What looks like a continuation line in the above code block is called a “pipe”, written %>%. A pipe does much more than make the code look pretty: it passes the value on its left as the first argument to the function call on its right, so a chain of steps reads from top to bottom.
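
To see what the pipe is doing, the pipeline above can be tried without the CSV file. This is a sketch under assumptions: the small rodents data frame is a made-up stand-in with the same column names, and dplyr is loaded directly rather than through the Tidyverse.

```r
library(dplyr)  # provides select(), mutate(), rename() and re-exports %>%

# Hypothetical stand-in for a few rows of the rodent data (values are made up)
rodents <- data.frame(
  mo = c(7, 7, 8),
  dy = c(16, 16, 19),
  yr = c(1977, 1977, 1977),
  period = c(1, 1, 2),
  plot = c(2, 3, 2),
  species = c("NL", "DM", "DM"),
  stringsAsFactors = FALSE
)

# Piped version: each step receives the result of the previous one
piped <- rodents %>%
  select(mo, dy, yr, period, species) %>%
  mutate(period = as.factor(period)) %>%
  rename(month = mo)

# The same steps as nested calls, which have to be read from the inside out
nested <- rename(
  mutate(
    select(rodents, mo, dy, yr, period, species),
    period = as.factor(period)
  ),
  month = mo
)

all.equal(piped, nested)
```

Both versions do the same work; the piped one simply lists the steps in the order they happen.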

Graph types

After some warnings on selecting data presentation that is consistent with your intention to actually communicate something, and not to show off, Lucia introduced us to some of the graph types available with R. She gave us a really nice demonstration of why many journals prefer data such as ratios to be presented as a bar chart rather than a pie chart: small differences are often clearer in the bar chart. Apparently, Florence Nightingale invented the pie chart, although the invention is usually credited to William Playfair; Nightingale popularised the related polar area diagram.
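
The bar-versus-pie point is easy to reproduce. This is not Lucia's demonstration, just a sketch with made-up shares, and it uses base R graphics rather than ggplot so that it runs without any packages.

```r
# Made-up shares that are deliberately close to one another
shares <- c(A = 0.22, B = 0.20, C = 0.19, D = 0.20, E = 0.19)

op <- par(mfrow = c(1, 2))   # two plots side by side
pie(shares, main = "Pie chart")
barplot(shares, main = "Bar chart", ylab = "Share")
par(op)                      # restore the previous plot layout
```

In the pie chart the slices look much the same size, whereas the bar heights can be compared against a common baseline.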

The Grammar of graphics

The ggplot function allows the creation of graphs according to the framework outlined by Wilkinson [1], in which the different aspects of data presentation are dealt with separately. A basic template is described here with examples for ggplot. My take on the framework is:

  1. a data set
  2. aesthetics = a set of scales and variables to be plotted
  3. geometry = shapes to represent the data and the type of chart
  4. a grid of subplots, or facets
  5. statistics = summaries of the data
  6. coordinates to define the plotting space
  7. global themes and labels to make it look pretty

Some of these map into the ggplot function, from the most basic (using examples from the weekly challenge, below):

ggplot(college, aes(x=sat_avg, y=admission_rate)) + geom_point()

Here, the aes part is the aesthetics definition of the variables for the plot; geom_point defines the type of chart (scatter plot), and college is the data set.
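
The other parts of the framework are added to the plot as further layers with +. The sketch below is my own illustration rather than course material: it uses the mpg data set that ships with ggplot2 instead of the college data, and the particular choices (linear trend line, axis limits, theme) are arbitrary.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +                 # 1. data, 2. aesthetics
  geom_point(alpha = 0.5) +                                 # 3. geometry
  facet_wrap(~ drv) +                                       # 4. facets
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # 5. statistics
  coord_cartesian(ylim = c(10, 45)) +                       # 6. coordinates
  labs(x = "Engine displacement (litres)",                  # 7. labels and theme
       y = "Highway miles per gallon") +
  theme_minimal()

p   # printing the object draws the plot
```

Each line can be removed or swapped independently, which is the point of treating a graph as a grammar.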

The weekly challenge

I completely failed to get anything at all out of the challenge in the 5 minutes we were given for this task. My code just didn’t produce any kind of plot, just barfing errors at me. This was entirely down to my inability to type or read carefully what I had typed. Later, I managed to make some progress:

# Import CSV files from online repo
college <- read_csv('http://672258.youcanlearnit.net/college.csv')
college <- college %>%
  mutate(state=as.factor(state), region=as.factor(region),
         highest_degree=as.factor(highest_degree),
         control=as.factor(control), gender=as.factor(gender),
         loan_default_rate=as.numeric(loan_default_rate))
# Create a chart that would show the relationship between the SAT average and the admission rate 
ggplot(college, aes(x=sat_avg, y=admission_rate)) + geom_point()

Time is not in abundance at the moment, so I did eventually just look at the solution and run it, seeing how each element of the code contributed to the final result. Here’s my result:

Next class

  • Discuss the results of the challenge
  • Intro to Statistics
  • Descriptive Statistics
  • Summarising Statistics
  • Exploring 1 variable: Plotting distribution Histograms and Density plot

Notes and references

  1. Wilkinson, L. (2005) The Grammar of Graphics. Springer-Verlag. doi: 10.1007/0-387-28695-0.