Tag Archives: CPD

Data Ethics, AI and Responsible Innovation

In November, I took part in 4 weeks of a 5-week MOOC offered by the University of Edinburgh via the edX platform, Data Ethics, AI and Responsible Innovation. I had various difficulties with the course itself, culminating in a barrier to my continuing.

You can read my notes on the course, including a personal reflection, at the blog I kept on the MOOC.

Climate Change Mitigating Technologies

I had the chance to sit in on the cross-party group on science, in which there were two presentations on the topic. The first was from Rebecca Bell of Scottish CCS[1], on Carbon Capture and Storage. The second was given by Richard Gow of Drax[2], on Bioenergy with Carbon Capture and Storage. The latter presentation called for policy help in rewarding negative carbon emissions, which are an odd omission from the accounting model used in climate change impact measurement.

Both provided a really useful overview of what carbon emission and capture is about and how it is working, with an emphasis on what is happening in Scotland within a very clear European context. I found the presentations, neither of which was slick or sales-focused, extremely engaging and helpful in thinking about CO\(_2\) emissions.

There was a lively and wide-ranging Q & A session chaired by Craig Denham of the RSE. Questions were both technical and social: there was good representation of young people through, for example, asking about the skills required to find careers in CCS. My own question:

For teachers, are there any behaviours they can model for young people that will enable them to take a specific personal responsibility for action in tackling CO\(_2\) accrual in the atmosphere?

I suspect this was a question outside of the scope of the presentations (focusing on individual action) but it was picked up by Craig, for which I am thankful. Richard answered first and acknowledged the criticism of BECCS for being remote from personal action, but pushed back against this by linking to personal choices such as taking fewer flights. Rebecca added to that by pointing to choices like taking your bike, or wearing a jumper instead of turning up the heating, which are easily modelled and reinforced by educators. She also pointed to SCCS resources related to CfE, and the LfS Scotland resources. I particularly liked the GeoBus Education Resources site, which is designed to provide teachers with an introduction to CCS. It offers experiments, activities, lessons and homework ideas as well as links to a number of other useful CCS education resources, which are linked to English Key Stage 3 and Scotland’s CfE: this pdf links the resource to the Experiences and Outcomes.

The resources available on the websites of both organisations are very accessible and immediately useful in schools in, for example, projects within the interdisciplinary topic of sustainable energy production. It is particularly warming to see the interest in and promotion of positive problem solving through the cross-party group. I am thankful to them for opening up this session to interested parties and applaud the work being done by SCCS and Drax.

Footnotes

The header image is part of an infographic available at SCCS.

  1. Scottish CCS is “a partnership of the British Geological Survey, Heriot-Watt University, the University of Aberdeen, the University of Edinburgh and the University of Strathclyde working together with universities across Scotland.” 

  2. This is the group that operates Drax Power Station which is moving from coal-fired to biomass and leads on innovation and development in the technologies of Bioenergy with Carbon Capture and Storage (BECCS). 

IOP CLPL: Smartphone practicals

from @ @cullaloe | Tech, tales and imagery

The IoP in Scotland is putting on a rich catalogue of online CLPL for the community of physics teachers, presented by colleagues from that same community. I accessed one of these after the event, because it had been recorded and published by Drew Burrett on YouTube.

This session was hosted by Stuart Farmer and Jenny Hargreaves, and presented by Murat Gullen and Martyn Crawshaw, who gave us a practical introduction to using tools for data capture and analysis using the suite of sensors on most modern phones. They presented a brief rationale, acknowledging that although not all children have a smart phone, most have access to one. The scope (pun intended) for teachers to offer practical sessions using their own equipment was also underlined.

The first tool discussed was Phyphox, which not only can access all of the phone’s sensors but also has a number of built-in activities and tools to make use of the data captured. The data can be exported in several formats, or posted to Dropbox for later discussion. What I hadn’t realised is that there is a built-in web server, accessed through the “triple dots” at the top right of the phone screen, which enables display and control of the app from any nearby browser, provided that the computer is on the same network (it uses a 192.168.x.x IP address).

Screenshots of Phyphox on iPhone and Browser (uses the language of the phone)

Martyn started by talking about Vernier video tracking and analysis software, similar to the Java Tracker program. He went on to demonstrate Pasco’s SparkVue app, which he called the “twenty-first century version” of the Data Studio software found in many schools. It allows connection to onboard sensors and also to Pasco equipment in the lab.

It was made clear that teachers should take care not to assume that their pupils are tech-savvy enough to know how to use all of the tools and interfaces without support and guidance.

“Don’t lose the learning!” – Martyn Crawshaw

Excellent stuff, as ever, with a lively Q & A at the end. I’m going to share this with my students.

Datacamp course – intermediate R

from @ @cullaloe | Tech, tales and imagery

Continuing my journey into R, the next course in the R programming track at DataCamp is Intermediate R. This course is presented by Filip Schouwenaars. It teaches language syntax and programming conventions, building on the last course.

Conditionals and Control Flow

Relational operators

This section begins with a talk-through of the main relational operators in R, with simple examples, followed by exercises in the virtual lab.

> TRUE == TRUE			# Equality
[1] TRUE
> 'oranges' != 'apples'	# Inequality
[1] TRUE
> 'oranges' > 'apples'	# Strings compare alphabetically
[1] TRUE
> 'oranges' < 'apples'
[1] FALSE
> vec <- c('apples', 'bananas', 'dragon fruit', 'tomato')
> vec > 'oranges'		# Works on vectors (and matrices)
[1] FALSE FALSE FALSE  TRUE

TRUE coerces to the value 1 and FALSE to 0, so TRUE > FALSE evaluates to TRUE. Truth is greater!
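
One practical consequence of this coercion is that arithmetic on a logical vector counts its TRUE values. The following is a minimal sketch of my own rather than anything from the course; the temperatures vector is made up for illustration.

```r
# Hypothetical data: daily temperatures in degrees Celsius
temperatures <- c(12, 18, 21, 9, 25)

warm <- temperatures > 15    # logical vector, one value per day
n_warm <- sum(warm)          # TRUE counts as 1, FALSE as 0
share_warm <- mean(warm)     # proportion of TRUE values

truth_wins <- TRUE > FALSE   # compares 1 with 0
```

Here sum() gives the number of warm days and mean() gives the fraction of days that were warm.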

Logical operators

Syntax for these familiar operators is &, | and !, for logical AND, OR and NOT, respectively. They have lower precedence than the relational operators, so the comparisons on either side are evaluated first and do not need brackets:

> 4 > 3 & 8 <= 9
[1] TRUE

Logical operators may be used on matrices and vectors:

> !c(TRUE, FALSE, 1 > 0)
[1] FALSE  TRUE FALSE

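Note that the double-signed operators && and || are not vectorised: they expect single values and return a single value. Older versions of R silently used only the first element of a longer vector; from R 4.3.0 a longer vector is an error.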
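
The difference between the single and double forms can be sketched as follows. This is my own illustration, not course material, and the variable names are arbitrary.

```r
x <- c(5, 12, 20)

# Element-wise: & returns one result per element of x
in_range <- x > 10 & x < 15

# To get a single answer from a vector, reduce it first
any_in_range <- any(in_range)

# Scalar: && takes single values and returns a single value
first <- x[1]
first_ok <- first > 0 && first < 10

# && short-circuits: the right-hand test is never evaluated
# when the left-hand side is already FALSE
y <- NULL
safe <- !is.null(y) && y > 0
```

The last line is the usual reason to reach for && inside an if condition: the second test would be meaningless for a NULL value, but it is never run.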

Conditional statements

Again, familiar syntax here, with the conditional test in brackets; code blocks in curly braces; and two statement words, if and else:

x <- 0
if (x < 0) {
    print('x is negative')
} else if (x == 0) {
    print('x is zero')
} else {
    print('x is positive')
}

Notice that the else and else if statements come on the same line as the closing curly brace of the preceding code block. Once a conditional test evaluates TRUE, the corresponding code block is executed and the remaining code within the if control structure is ignored. Conditional statements may be nested.
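
An if/else structure also returns the value of whichever block was executed, so it can form the body of a function. A small sketch, assuming a helper name (describe_sign) that I have invented for illustration:

```r
# Hypothetical helper wrapping an if / else if / else chain
describe_sign <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

# if() tests a single value, so apply the helper to each element in turn
signs <- sapply(c(-2, 0, 7), describe_sign)
```

Because if works on one condition at a time, sapply() is used here to call the function once per element.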

Evaluation and next steps

There is a greater teacher presence in this course than the previous, through the use of video presentations to support the hands-on interactive labs.

Thus far into the R Programming Track with Datacamp, I have stopped because I have hit an unexpected paywall. Continuing requires a commitment of at least $25 per month, which would be good value if I were continuing with courses several hours per day but is not appropriate for my current ad-hoc engagement. The day job takes priority, which means most of the available time. I’ll be switching to other resources from now, probably starting with R for Data Science[1], or at least the online version.

References

  1. Grolemund, G. and Wickham, H. (2016) R for Data Science, O’Reilly Media. 

Datacamp course – introduction to R

from @ @cullaloe | Tech, tales and imagery

Having abandoned the data visualisation course run by Edinburgh University, and wanting to gain some further competence in R, I took the DataCamp “Introduction to R” course. This course is written by Jonathan Cornelissen, one of the founders of DataCamp and a man with seriously good credentials in R.

Basics

Assignment and operators

a <- 4		# assignment 3 ways
4 -> a
a = 4

1 + 2		# mathematical operators
4 - 3 
6 * 5
(7 + 9) / 2 
8^2		# exponentiation
10 %% 4		# modulo

x <- 3; y <- 5	# two values to compare
x < y		# less than
x > y		# greater than
x <= y
x >= y
x == y		# equal to
x != y		# not equal to

Data types

12.5 / 2.5	# numerics
7 + 123		# integers are also numerics
7 == 3		# Boolean (TRUE or FALSE) are logicals
"Hello world"	# characters

x <- 7 == 3
class(x)	# what data type is x? "logical"

Vectors

A vector is a one-dimensional array (think of a row in a spreadsheet). In research, this is a single observation.

# using the combine function to create a vector
a_numeric_vector <- c(1, 2, 3, 4, 5)

# vectors can have element names
names(a_numeric_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# printing the vector outputs the element names:
> a_numeric_vector
   Monday   Tuesday Wednesday  Thursday    Friday 
        1         2         3         4         5
> 

# using a vector to hold the element names
days_of_week <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(a_numeric_vector) <- days_of_week

You can do some quick and easy arithmetic with vectors.

low_nums <- c(1, 2, 3, 4, 5)
hi_nums <- c(6, 7, 8, 9, 10)

total_nums <- low_nums + hi_nums

> total_nums
[1]  7  9 11 13 15

sum(low_nums) 	# adds up the elements in the vector
mean(low_nums)	# average of elements in the vector

low_nums[3]	# print the third low number (note 1-index)
hi_nums[2:4] 	# just get the middle values

The selection of elements can be conditional using boolean values in another vector.

> c(49, 50, 51) > 50
[1] FALSE FALSE  TRUE

> nums <- c(1:99)	# vector of the first 99 integers
> fives <- nums %% 5
> nums[fives == 0]	# all of those divisible by 5
 [1]  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95

In the last example above, fives == 0 is a vector of boolean values. Used as a selector in the nums vector, only the TRUE elements are selected.
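
The same kind of selection works on a named vector, which makes the result easier to read. A minimal sketch with made-up numbers (the takings vector is hypothetical):

```r
# Hypothetical data: takings for each day of a working week
takings <- c(Monday = 140, Tuesday = -50, Wednesday = 20,
             Thursday = -120, Friday = 240)

good_days <- takings > 0             # named logical vector
good_takings <- takings[good_days]   # keeps only the TRUE positions
good_names <- names(takings)[good_days]
positions <- which(good_days)        # indices of the TRUE elements
```

which() is handy when the positions are needed rather than the values themselves.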

Matrices

A matrix in R is a collection of elements, all of the same data type, arranged in 2 dimensions of rows and columns.

> # A matrix that contains the numbers 1 up to 9 in 3 rows
> matrix(1:9, byrow = TRUE, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> 

The access indicators are shown in the row labels and column headers above. So, element [2,3] of the matrix contains the value 6. The first row of my_matrix is the vector my_matrix[1,]. Row and column names can be set for matrices, as they can be for vectors. This can be done by calling rownames() and colnames(), or at the time the matrix is set up.

# Construct star_wars_matrix
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"), 
                                           c("US", "non-US")))

The function cbind() binds columns to an existing matrix. rbind() does the same thing for adding row vectors to a matrix. rowSums() and colSums() do what they sound like - making new vectors ready to be bound into the source matrix if required.

Arithmetic operators work element-wise on matrices.
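
Putting those functions together, here is a short sketch that extends the star_wars_matrix from above. The names worldwide, all_wars and region_totals are my own choices for illustration.

```r
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
                                           c("US", "non-US")))

worldwide <- rowSums(star_wars_matrix)           # one total per film
all_wars <- cbind(star_wars_matrix, worldwide)   # bind it on as a new column

region_totals <- colSums(all_wars)               # one total per column
all_wars <- rbind(all_wars, Total = region_totals)

first_film_total <- all_wars["A New Hope", "worldwide"]  # select by name
doubled <- star_wars_matrix * 2                  # arithmetic is element-wise
```

cbind() takes the new column's name from the variable passed to it, while the name of the new row is given explicitly in the rbind() call.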

Factors

A factor is a data type used to store categorical variables. These are discrete variables which can take only a finite number of values (cf. continuous variables which can have any of an infinite set of values, like real numbers). R can make a vector of the categories from a vector of categorical values:

> birthdates <- c(12,4,13,23,31,16,1,9,12,4,8,24,27,25,24,25)
> birthdates
 [1] 12  4 13 23 31 16  1  9 12  4  8 24 27 25 24 25
> bd_factors <- factor(birthdates)
> bd_factors
 [1] 12 4  13 23 31 16 1  9  12 4  8  24 27 25 24 25
Levels: 1 4 8 9 12 13 16 23 24 25 27 31
> 

Such variables are nominal or ordinal according to whether they are just names, or can be ranked in some meaningful way. Ordinal factors are created with additional parameters, e.g., ordered = TRUE (which R also accepts abbreviated to order = TRUE, through partial argument matching) and levels = c("low", "high"), and can then be compared easily.
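
As an illustration of an ordinal factor, here is a minimal sketch using invented survey responses; the data and variable names are assumptions of mine, not from the course.

```r
# Hypothetical survey responses
responses <- c("high", "low", "medium", "low", "high")

ratings <- factor(responses,
                  ordered = TRUE,
                  levels = c("low", "medium", "high"))

first_beats_second <- ratings[1] > ratings[2]  # compares "high" with "low"
counts <- summary(ratings)                     # counts per level, in level order
top <- max(ratings)                            # highest level present
```

Without ordered = TRUE the comparison would not be meaningful, because R would have no ranking for the levels.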

Data Frames

A data frame has the variables of a data set as columns and the observations as rows. A quick peek at the structure of a data frame is provided by head() and tail() functions, e.g.:

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
>

mtcars is one of the many data sets built into R. A list of them is obtained by calling data(). str() provides a look at the structure of a data set:

> str(mtcars)
'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
> 

Columns of a data frame are added one column vector at a time as a list of parameters in the function call data.frame(). Selecting a data point from row 32, column 2 is a matter of calling df_bears[32,2]. Note the order - observation (row) first, then variable (column). A whole observation (e.g. the tenth) is obtained by df_bears[10,]. The first 4 data points from the paw_size column are df_bears[1:4,"paw_size"]. The whole column vector is df_bears$paw_size (notice the dollar sign notation). Subsets can be made calling subset(df_bears, paw_size < 4). Sorting can be achieved by making a vector of the data frame order, based upon the columns you are interested in:

> a <- order(df_bears$claw_size)
> df_bears[a,]
> 
> mtcars[order(mtcars$disp),]
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
...
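
The df_bears data frame is not defined in these notes, so here is a small self-contained sketch of the same operations. The bears and their measurements are entirely made up for illustration.

```r
# Hypothetical data, invented purely for illustration
df_bears <- data.frame(
  name      = c("Baloo", "Paddington", "Yogi", "Pooh"),
  paw_size  = c(5.1, 3.2, 4.4, 3.8),
  claw_size = c(2.9, 1.1, 2.0, 1.5)
)

one_value <- df_bears[2, 3]                # row 2, column 3
first_two <- df_bears[1:2, "paw_size"]     # first two values of one column
small <- subset(df_bears, paw_size < 4)    # rows meeting a condition

# order() returns row positions, which are then used to index the rows
by_claw <- df_bears[order(df_bears$claw_size), ]
largest_first <- df_bears[order(df_bears$claw_size, decreasing = TRUE), ]
```

Sorting never changes df_bears itself: each call builds a new data frame with the rows rearranged.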

Lists

Lists can contain arbitrary data and data types. They are constructed by calling list() with optional names for each component, e.g. list(top_dogs = df_dogs[1:10,], top_cats = df_cats[1:10,]).
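
A short sketch of building a list and getting components back out of it. The component names are my own, and mtcars is used only because it is built in.

```r
# Hypothetical components, chosen for illustration
my_list <- list(
  numbers = 1:5,
  label   = "a list can mix types",
  cars    = head(mtcars, 3)
)

the_label <- my_list$label               # select a component by name
second_number <- my_list[["numbers"]][2] # [[ ]] returns the component itself
sub_list <- my_list[1]                   # [ ] returns a shorter list
component_names <- names(my_list)
```

The distinction between single and double square brackets matters: my_list[1] is still a list, whereas my_list[[1]] is the integer vector inside it.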

Evaluation and next steps

I’ve found this introduction differently paced to the earlier introduction course run by the university. Because there is no instructor, attention has been paid to very small details: every aspect of the course works because it is programmatic. Learners have to take the right (small) steps to complete the exercises successfully. Errors are picked up and RTFQ-type prompts are given. This was less challenging than the earlier, demonstrator-led course but I completed this one, instead of bailing out feeling frustrated and weak. I also learned considerably more that is useful and earned a more secure foundation for further study.

I am working with RStudio on a daily basis now as I am producing documentation and course materials with Bookdown. My intention is to further develop competence with R and R-markdown.

OneNote Teacher Academy

from @ @cullaloe | Tech, tales and imagery

The Microsoft OneNote Teacher Academy is a learning path comprising four short courses introducing the use of OneNote in teaching for lesson planning, assessment and activities for learning:

  • Getting Started with OneNote
  • OneNote Class Notebook: A teacher’s all-in-one notebook for students
  • OneNote Staff Notebook: Tools for staff collaboration
  • Independent learning with math tools in OneNote

These are my notes taken as I followed the learning path, including the reflection questions and my responses.

Getting started with OneNote

The course makes use of lock-step interactive videos which step you through mouse clicks to achieve the various functions in OneNote, which at first I thought was patronising, but which actually gets you used to the interface quickly. If you’re following and feel the same thing, stick with it.

Module 1: Getting started with OneNote for Windows 10

So, the introduction tells us that OneNote is a way of “staying organised, quickly capturing ideas, and accessing information across devices”. It’s a digital notebook that seems to rely on Windows 10 - the intro video seems to assume that all participants are using that OS and steps through “how to use Windows”. The OS is different on my MacBook, so the introductory video doesn’t work well at the start, but it gets better once you start using the OneNote app.

Reflection

What aspects of organizing words does OneNote mimic? - It’s like a real notebook and helps you make little documents or pages of notes.

What are some ways you might organize your sections and pages? - into chapters or topics, maybe. Gather those into project notebooks.

What topics might be an easy place to start a notebook or section? - Course structure, syllabus, one note per week, maybe.

Module 2: Getting to know OneNote for Windows 10

OneNote is organised into pages, which can be organised in sections like the tabs in a FiloFax. It has a built-in full-text search function which I think is a very good thing for students to have - it’s one of the major reasons I turned from Learn to Moodle 3 years ago.

Items like text boxes in the pages of OneNote are drag-and-droppable, a bit like the Apple Pages app, and allow a great deal of flexibility over how the pages look. Each page is like “a digital canvas” that can accept media, images and ink (from a stylus), or embed things like YouTube videos. Items can be hyperlinked.

Audio or video feedback can be inserted into pages on the fly.

User help in the interface is available through the “lightbulb” icon, aka “Tell me”.

Reflection

What are the benefits of being able to start writing or typing anywhere on the page? - you can be creative with the flow and functionality of your resources and activities.

What are the benefits of being able to insert pictures and media anywhere on the page? - same again, creativity in making logical storyboards for learning.

Module 3: Getting the most out of OneNote for Windows 10

This module exemplified the use of digital ink, (re)organising materials and sharing work in a team using OneNote. A number of workarounds in the interface were also exercised in here, such as when you need to right-click, or use a non-obvious menu to perform aesthetic changes like page colour. There are neat functions built in for tablet users in particular, such as “ink to shape” and “ink to text” that convert your scruffy notes and scribbles into something somebody else can read. The glory of form over function!

Inserting video links will embed the video itself.

One very interesting feature is the researcher tool, which is used for pulling together web sources and their citations. I’m not sure it’s going to lead to good academic writing, but I’ll try it out to see how it works.

Reflection

How much time do teachers spend looking for files in old email threads? - none at all. Not this one, anyway, I’m better organised than that. I delete almost everything once I’ve detached the information I need.

How does OneNote’s organization save teachers time? - I’m not convinced it does, yet. I think that quick re-organisation of pages and sections is what is being referred to in this question.

What is the benefit to being able to access information from all of your devices? - you can drop your device off a bridge and not lose any information.

Module 4: Digital Ink

This section described the time-linked digital ink feature of OneNote which allows you to step forward and back through the sequence of pen strokes. A teacher can do this in a student’s notes, too, to enable them to replay, video-like, the thinking of the learner as they constructed their responses. A blog post advertises the features in the recent redesign of OneNote and it does look impressive in the use case shown in the promotional video. I particularly liked the attention that has been paid to achieving consistency across different platforms and devices:

OneNote users often use a range of devices. Having a cohesive user experience across all screens makes it simple for users to jump from one device to the next.

This also helps promote consistency of experience in a learning community, enabling a common culture of understanding in collaboration.

Reflection

When would you encourage students to use the “playback” feature? - when revising previous work, or when looking at teacher feedback.

Would this be beneficial for a teacher looking at a student’s homework assignment? - of course, to uncover misconceptions.

Do you think teachers would use this as they take notes for a class or meeting? - I need to try it first, but it might enable the making of useful video tutorials for revision, or accessibility, or for students who are unable to participate live.

Trying it out

I went to my own instance of OneNote to try this playback feature out but couldn’t find it at all, which was frustrating. Nor could I find things like “ink to text”, the “tell me” lightbulb, or many of the features this course has described. Perhaps the OSX version of OneNote I am using doesn’t know about the consistency across devices this module is talking about. Further reading reveals that although they are superficially similar, there are substantial differences in the feature sets in OneNote, depending on which device you are running it on. This gives the lie to the earlier content, something I am genuinely disappointed to discover, having made a real effort to suppress my visceral, justified and hard-earned distrust of All Things Microsoft.

OneNote links tend to open not in the application but a browser window - yet another version of OneNote that doesn’t have the features described so far in this course. The ecosystem seems to be significantly varied across platforms.

The very last line of module 5:

This course covers features in the OneNote Windows 10 app. To see which features are available in other OneNote versions, visit this site: The differences in OneNote.

The list isn’t complete or accurate: for example, the replay function I was genuinely excited by isn’t in it. Spending more time exploring the OneNote interface created more difficulty. I tried to connect to Teams, where I have set up a team for the new cohort and those remaining from previous years, but with little success, and little help when it didn’t work:

Screenshot on connection

Module 5: A content library brimming with rich resources

The last module in this “getting started” course is a collection of videos or examples of content made by other teachers (“OneNote Ninjas”). I couldn’t see how the content library worked at all, nor how it worked in teaching practice. Some clues were deducible from the end of course quiz - it seems to be a read-only resource space for learners.

Reflection

How might the Content Library be a unique place for students in your class? - in the same way perhaps, as the class cupboard is unique.

What kinds of media might you add to make your Content Library a go-to place for students? - not sure, because I couldn’t get it to work, but I would hope any digital media.

Once your Content Library is complete for one class, would it take much to alter it for next year’s class? - it depends on the class and any changes to the course but in principle, maybe just change all the dates.

The quiz

A very low bar to leap over. For example…

  • OneNote can locate any word on any page in any of your notebooks with the ____ function.

    • Looking
    • Hunt
    • Search
    • Where is it?

Still useful, though, because the quiz added some details that weren’t in the course content.

OneNote Class Notebook: A teacher’s all-in-one notebook for students

The course uses the same approach as the first, with interactive videos.

Module 1: Getting started with a OneNote Class Notebook

The Class Notebook is intended to help teachers deliver content, work collaboratively and provide private feedback to learners. It is described as being like a physical notebook, filing cabinet and whiteboard in one. This is achieved by providing a private notebook for each student; a content library; a teacher-only area; and a space for collaboration with students or staff.

The teacher can look at any of the student notebooks, and provide private feedback or commentary. The collaboration space allows for peer work and assessment. The content library is only writable by the teacher.

Reflection

What types of materials would be best housed in the Content Library versus the Collaboration Space? Think of a few ways to organize your Class Notebook and start by creating one section and giving it a try. - Static materials which do not require editing or annotation by students.

I created new sections within the collaboration space and content library, and even created a new section group, but cannot immediately tell how access is managed in these groups. Further investigation revealed that you cannot do this in the OneNote app: it can be done by logging in to the Microsoft OneNote website and finding Manage notebooks. I found this difficult because the browser version of my Class Notebook and the OneNote version were different (different groups, and sections not in sync). The browser interface offered connection to the notebook, which it did in the browser, but still gave me no insight into who could see my new group. The lightbulb “Tell me what you want to do” offered a very narrow scope (menu items) or a wider search using Bing (what else, when you’re in a corporate prison?) which isn’t available without acknowledging data permissions. I’m stuck in a hole within an application infrastructure which is utterly getting in the way of what I am trying to do.

Module 2: Setting up a OneNote Class Notebook

It says here, “OneNote Class Notebook can assist teachers in providing:

  • Individualized instruction
  • Meaningful feedback
  • Rich multimedia content”

We’re getting a lot of repetition in the videos now, and more patterns emerging. Setting up a OneNote Class Notebook starts not in the app, but on (yet another different) website: Office 365. This can also be done via the MS Teams app: I set up a class notebook for the new cohort Team OK. The dialog for this task sets out the default access permissions for the Collaboration Space; Content Library; and Student Notebooks.

Class notebook setup

Once the class notebook is set up in Teams, you can open it in the OneNote app. Deleting a notebook is a completely different matter: it is ridiculously hard to do this and will take you an hour or so of Googling. Different procedures are needed according to your platform, Office account setup and which way the wind is blowing. I followed this set of instructions to delete the class notebook I had set up, naïvely, in the OneNote app. That didn’t work at all, because the menu selections on the website were different (there is no “Manage and Delete” option in OneNote today). I have no idea how to delete a notebook. I think it can be done through the online access to OneDrive, but I can’t tell if I deleted a link, or an actual item, or whether in fact it was just moved somewhere else. That situation gives me no confidence in this software suite at all. The more I learn about it, the more I dislike it. Still, we’re here now, so let’s see this through…

Reflection

In what ways would OneNote Class Notebook allow you to differentiate your instruction to various students? - by writing different things in their notebooks. By providing different materials for them in the content library.

What is it about Class Notebook that might expedite giving feedback compared to traditional paper assignments? - synchronisation. The student’s notebook is updated as soon as the teacher writes in it.

Module 3: Getting the most out of OneNote Class Notebook

It says here, “OneNote Class Notebook has many features to help teachers:

  • Distribute assignments
  • Write feedback
  • Share with parents - a unique link”

OK, I get it. It’s like Church. Keep saying the mantras, singing the hymns and re-stating the creed, and eventually we are all saved. The more I am sold to, the less I want to buy. I’m going straight to the reflections before watching the video this time:

Reflection

What are the advantages to verbal feedback for a student? - it seems personal, and can be nuanced in non-verbal ways through tone and gesture. It can be individual, and can be replayed (if a clip is available, which presumably it is) at different speeds for comic effect to help understanding.

What are the advantages to verbal feedback for a teacher? - it might be quicker, especially if you do it in one take. There are some very real disadvantages too, in that it can be replayed in contexts you hadn’t intended.

When might teachers use this feature? - I wouldn’t normally, unless I had a vision impaired student who preferred it, but can see why some might be tempted to try it.

Things just don’t work

I discovered more functionality glitches as I followed the “getting the most…” video. I think these are bugs, rather than poor connectivity between the various interfaces - Teams, the OneNote app, the different websites that manage the spaces, etc. Things just don’t work: it’s clunky and inconsistent in daily usage, resulting in huge time costs for the teacher. For example, in the OneNote app, I found “Manage Notebooks” which popped up what looks like a browser container, where I added a teacher-only space without errors. This did not add the teacher-only space in the Class Notebook in the app, but it was visible in the browser. Browser and app were synching OK, because I could type in one and see the changes with less than a second of latency in the other[1].

I like the idea of “distributing” a page (assignment) to all student notebooks at the same time, then being able to review all of their completed edits of their copy of that page (assignment) simply in the interface. I am not sure I can rely on that without testing it first, however, so am not intending to spend any time on this. The collaboration space idea is implemented in a difficult and weird way also: individual groups of students can be allocated to sections within that space but if a deadline passes, the entire collaboration space must be locked rather than the individual section.

I am increasingly coming to the conclusion that BlackBoard Learn might be my best option for Physics PGDE in the coming session. This is a real pity because of Learn’s deficiencies. I wonder if I will go maverick again and launch a Bookdown site as a searchable resource for my students.

Module 4: Immersive Reader and Researcher

This module provides detail on some of the accessibility features built into Class Notebook, including the Immersive Reader, which is a nice idea but, again, just does not work reliably. It failed to recognise login credentials when I tried to use it the first time, but I was able to get it working once I’d logged in and out of the various interfaces a few times. When it does work, it’s a screen reader - “immersive” meaning that it removes sidebars and ribbons, displaying just the text of the document you’re looking at in a large font, with a moving highlighter that follows the voice.

Researcher is a weak tool that lowers the access bar for students who need help doing basic searches for information. It seems to have been lifted, like other tools in this suite, from the main Office toolset.

Reflection - see above.

Module 5: Making Math Happen in OneNote

This module is not about learning mathematics and is definitely not about “… a virtual math tutor with OneNote”. It is just a video pitch for OneNote using children talking about glitter pens and toys in the OneNote space, and a very brief mention of “ink to math”, which doesn’t seem to exist in the app, but is available in OneNote in the browser. It’s a digital ink converter that recognises mathematical symbols.

Module 6: Accessibility in OneNote

This module is more on accessibility features and functions in OneNote - a button you can click that tells you “everything is OK” in your documents.

Module 7: Translation

Self-explanatory, this one: using the share button on any document within OneNote allows you to export it through Microsoft’s Translator application if you have it installed. I don’t because it isn’t available for OSX, and it doesn’t seem to be available in the browser version of OneNote, either.

The quiz

Just as challenging as the last one. Earned myself a “Certified Microsoft Innovative Educator” badge for having completed 2 hours of online Microsoft Educator CPD.

Review of progress and next steps

What I have learned in the past few hours (a lot more than the two hours these two courses suggest) is that Microsoft tools are most definitely not the way forward for my hybridised teaching in the autumn. Too much time is required in trouble-shooting problems, bugs, inadequately documented differences between platforms, and design flaws. I want a piece of chalk, not a stick of cheese and some certified training in how to use it on a blackboard.

Your mileage may vary, of course: if you are fully embedded within the Redmond way of things, then good luck to you. This particular teacher is going to be innovative in other ways, and not be burdened by having to work this difficult software. Class Notebook is not going to feature in my teaching.

I completed the rest of the OneNote Teacher Academy learning path (without taking notes, digital or otherwise) and earned my badge.

MIE certificate

My transcript is available here.

Footnotes

  1. Later, I got this to work by signing out in the OneNote app, connecting to the University VPN, signing in to the university’s EASE authorisation (in the browser), and then restarting the OneNote app, and signing in (to Office 365) in the app. That is just such a ridiculous waste of time messing around trying to figure out why it doesn’t work, only to have it suddenly behave itself at the point of launching the laptop out the window. For reasons of mental wellbeing, I am heading back to avoiding Microsoft software as far as I possibly can. Life is just too short for this nonsense. 

Introduction to Mendeley Advisors Webinar⤴

from @cullaloe | Tech, tales and imagery

This webinar was run for the benefit of Mendeley advisors. I thought I might run a session on citation and referencing for students and so went to this webinar for a refresher on the advisor programme.

The webinar was presented by Dr Rob Moore, an AP at Old Dominion University, USA, and Rachel Brennesholtz, Head of the Mendeley advisors programme at Elsevier in Amsterdam, who gave an introduction.

Why Mendeley, and how to introduce it

Rob cited the principal advantage of Mendeley over Zotero as the ability to annotate the pdfs within the app. He talked about some of the “gotchas” in Mendeley, specifically, getting the title right on import of a citation, and the inconsistency with author names.

He then discussed his demonstration approach when showing Mendeley to new users, which starts with input of several citations, showing the immediate build of the bibliography files. He then moved to demonstration of annotation of pdfs on an iPad.

The advisor community

Rachel told us about the world-wide community of advisors and users, and shared some statistics on its reach - 22,000 students were introduced to it in about 300 events in 2019. She shared a link to the advisor community1 and advised us to email community-@-mendeley.com2 if the advisor badge doesn’t show on your profile.

Benefits of the programme include expanded network storage (7.5 GB) and exclusive product updates and webinars. There’s merchandise, too (t-shirts, pens and stickers, etc). You have to register an event to get these, which can be online or in person. The Mendeley event can be part of other events (such as sessions on citation and referencing for example). Allow 4 weeks to get your stuff from Amsterdam.

Resources

Rachel winged it a bit as her slides froze, but then got on to showing the PowerPoint slides and other materials available to support training sessions put on by advisors.

Next steps

I’ve been prompted by a colleague today to make a video on note-taking using Cornell and sketch noting, both of which I use and teach as part of my work at Moray House. My colleague also shared this YouTube video of someone doing this, without actually understanding how the Cornell system works: like a lot of YouTube channel stars, that presenter knows about the product without understanding the process, which is the important part. I think I might also make a separate video or session on citing with Mendeley for academic writing.

Notes and references

  1. The advisors portal is here

  2. Remove the hyphens for the correct address. 

Statistics and Visualisation with R #3⤴

from @cullaloe | Tech, tales and imagery

Post number three of my notes from a course on Statistics and (data) Visualisation with R, presented by Lucia Michielin of the University of Edinburgh in June and July.

Third week

Programme overview

Class     Date and time        Title
Class 1   01 June 13:00-14:30  Intro to R and R studio
Class 2   02 June 13:00-14:30  Types of data and Grammar of graphs
Class 3   08 June 13:00-14:30  Intro to statistics and descriptive stats
Class 4   09 June 13:00-14:30  Boxplot and playing with Colours
Class 5   15 June 13:00-14:30  Data Collection Bias, Probability, and Distribution
Class 6   16 June 13:00-14:30  Hypothesis testings and the main tests
Class 7   22 June 13:00-14:30  Barcharts and cleaning the sample
Class 8   23 June 13:00-14:30  PCA and Cluster analysis
Class 9   29 June 13:00-14:30  Covariance, Regression, Similarity and Difference coefficients
Class 10  30 June 13:00-14:30  Recap and Bring your dataset class

Class 5 - Data Collection Bias, Probability, and Distribution

After a quick review of the challenge from last week, an agenda for today was shared which wasn’t really followed1.

Data Collection and bias

“Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. The goal for all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed.” (Wikipedia)

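The sample is taken from the population, and bias can be introduced by the way in which the sample is taken. One example of this is survivorship bias, where only the cases that “survived” some selection process are available to be sampled.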
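To make survivorship bias concrete for myself, here is a minimal sketch in R. It is not from the class: the scores, the cut-off of 55 and the variable names are all invented for illustration. It simulates a population, keeps only the “survivors” above the cut-off, and compares the two means.

```r
set.seed(1)                                      # reproducible random numbers

# Hypothetical population: 10,000 scores centred on 50
population <- rnorm(10000, mean = 50, sd = 10)

# Suppose only cases scoring above 55 remain available to be sampled
survivors <- population[population > 55]

mean(population)   # close to 50
mean(survivors)    # clearly higher, because the low values were filtered out
```

Any summary built only from `survivors` overstates the population value, however carefully the sampling is done afterwards.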

Probability

A basic definition was given to the class, with an example from an earlier class that helped us. If we make the statement “if **A** is true then **B** is true”, can we infer from this that “if **B** is true then **A** is true”? Clearly not, but we might be able to say “if **B** is true then **A** is more plausible”. Plausibility here relates to how probable something is.
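The “more plausible” step can be checked with a little arithmetic using Bayes’ rule. This is my own sketch, not something shown in class, and the three probabilities are made-up numbers chosen only to illustrate the argument.

```r
# Hypothetical numbers, for illustration only
p_A         <- 0.3   # how plausible A is before we know anything about B
p_B_if_A    <- 1.0   # "if A is true then B is true"
p_B_if_notA <- 0.5   # B can also happen when A is false

# Total probability of B, then Bayes' rule for A given B
p_B      <- p_B_if_A * p_A + p_B_if_notA * (1 - p_A)
p_A_if_B <- p_B_if_A * p_A / p_B

p_A_if_B   # about 0.46: higher than the starting 0.3, but far from certain
```

Observing B raises the plausibility of A without proving it, which is exactly the distinction made above.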

“Probability is a numerical description of how likely an event is to occur or how likely it is that a proposition is true.” - Wikipedia

Distribution

In introducing this topic, we were shown the really neat rnorm() function that generates random data sets with specified mean and standard deviation.
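A minimal sketch of rnorm() in use, as I understand it. The variable name and the choice of mean 170 and standard deviation 10 are my own, not the values used in class.

```r
set.seed(42)   # fix the random seed so the draws are reproducible

# 1000 random values from a normal distribution (hypothetical "heights")
heights <- rnorm(n = 1000, mean = 170, sd = 10)

mean(heights)   # close to 170, but not exactly
sd(heights)     # close to 10

# A quick look at the shape
hist(heights, breaks = 30, main = "1000 draws from rnorm(mean = 170, sd = 10)")
```

The sample mean and standard deviation only approximate the requested values, and the approximation improves as n grows.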

In the class, we were given a challenge to go through the steps needed with a new data set to examine it. These steps seem to be just to plot it and have a look. All of the curves looked to be roughly symmetrical around a central mean, which suggests that they are approximately normal distributions. There was a problem with the challenge files which meant that some of us didn’t have the correct files and so did a different (trivial) task. The key message is to examine your data as soon as you have it, to look for central values and distribution, so as to understand it and how you expect it to behave in analysis.

Other distributions include uniform, logarithmic, left-skewed, right-skewed and bimodal.
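To see what some of these shapes look like, here is a sketch using base R’s random generators. The choice of generators and parameters is mine (an exponential stands in for “right-skewed”, and a mixture of two normals for “bimodal”); none of it comes from the class materials.

```r
set.seed(7)
n <- 10000

uniform_x  <- runif(n, min = 0, max = 1)         # flat between 0 and 1
right_skew <- rexp(n, rate = 1)                  # long tail to the right
bimodal    <- c(rnorm(n / 2, mean = -3),         # two humps, centred on -3 and 3
                rnorm(n / 2, mean = 3))

# In a right-skewed sample the long tail pulls the mean above the median
c(mean = mean(right_skew), median = median(right_skew))

# Plot the three shapes side by side
op <- par(mfrow = c(1, 3))
hist(uniform_x,  breaks = 30, main = "Uniform")
hist(right_skew, breaks = 30, main = "Right-skewed")
hist(bimodal,    breaks = 30, main = "Bimodal")
par(op)
```

Comparing the mean with the median is a quick numerical check for skew before plotting anything.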

Class 6 - Hypothesis testing and the main tests

Just half a dozen students today. We started with a review of the mini-challenge from last session, which included a consideration of colours: HTML colour and a ggplot colour reference were shared.

One emphasis today is that there are multiple ways of doing the same thing in R, with an illustration of data importation and summarisation. Although some of the few remaining students seemed to be following the narrative, I became increasingly frustrated and lost trying to follow the class: the thread was along the lines of hypothesis testing, the null hypothesis and corrections.
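My notes don’t record which tests were demonstrated, so the following is only my own after-the-fact sketch of the general idea, using the built-in iris data. The null hypothesis is that two species have the same mean petal width; the three p-values in the second part are invented to show a Bonferroni correction.

```r
# Compare petal width between two species with a two-sample t-test
two_species <- subset(iris, Species %in% c("versicolor", "virginica"))
two_species$Species <- droplevels(two_species$Species)   # drop the unused third level

result <- t.test(Petal.Width ~ Species, data = two_species)
result$p.value   # very small, so the "no difference" null hypothesis is rejected

# Correction for multiple tests: hypothetical p-values from three comparisons
p_values <- c(0.01, 0.04, 0.20)
p.adjust(p_values, method = "bonferroni")   # each multiplied by 3 (capped at 1)
```

After correction, only the first of the three hypothetical results stays below 0.05.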

Playtime discoveries

Between this week’s two classes, I had a quick look at the Comprehensive R Archive Network (CRAN) and discovered a neat thing. Remember the funky assignment operator <- from the first class which had its funky keyboard shortcut alt -? Most programming languages I’ve used have a simple equals sign as the assignment operator and I was a bit baffled as to why super-sexy R should do something so weird as <-. Well, it turns out that in most contexts, = works in exactly the way you’d expect, so these are equivalent:

> x <- 99
> x = 99

What’s interesting about the R syntax, though, is that it can be reversed and it still works:

> 99 -> x

I’m sure that the case for this will become apparent eventually. One more way to do assignments is to call the assignment function:

> assign('x', 99) # notice the weird need for quotes around x in the function call
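A quick sketch to check these for myself, plus one of the contexts where = and <- differ: inside a function call, = names an argument whereas <- really does assign. The function show_value and the variable names are my own inventions.

```r
# Four ways to store 99
first <- 99
second = 99
99 -> third
assign("fourth", 99)
c(first, second, third, fourth)   # all four hold the same value

# Hypothetical helper that simply returns its argument
show_value <- function(value) value

show_value(value = 5)                # '=' only names the argument
exists("value", inherits = FALSE)    # FALSE in a fresh session: nothing was created

show_value(value <- 5)               # '<-' assigns in the workspace, then passes it on
exists("value", inherits = FALSE)    # TRUE: 'value' now exists
```

This is probably why the R community tends to prefer <- for assignment and keep = for arguments.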

Next steps: other ways to learn R

I got totally overwhelmed with the rate of new concepts in this class, or perhaps the pace and style of delivery. I think, given the time I have to spend after each class going over the material in order to make sense of it, it is clear that this course is not right for me. I have decided to abandon it and follow other materials more appropriate to my purpose, which is to develop skills in data visualisation using R. Despite the title, this course is not doing that for me.

Having decided2 to seek out other ways to learn how to R, I found that there are lots of self-directed learning resources out there, of course. An introduction to R course at DataCamp will be the new direction of travel down this rabbit hole for me.

Notes and references

  1. I seem to be not the only one amongst the diminishing cohort (we’re down to a dozen today (Monday), having started closer to 50) that is finding it hard to follow this course. Yes, it’s interesting and cool, the different things you can do with R and geometries, but I am missing clarity, signposting, structure and other basic pedagogical features. 

  2. Taking time after class 5 to review the presentation slides, it was possible to reconstruct perhaps what the instructor was intending, for example, in the two slides on probability and more generally with this class today. I might have got very little from today’s session without spending as long again going over the materials. I am perhaps going to seek out other resources.

Statistics and Visualisation with R #2⤴

from @cullaloe | Tech, tales and imagery

This is the second in a series of posts containing my notes from a course on Statistics and (data) Visualisation with R, presented by Lucia Michielin of the University of Edinburgh. The course is running two days a week in June and July.

Second week

Programme overview

Class     Date and time        Title
Class 1   01 June 13:00-14:30  Intro to R and R studio
Class 2   02 June 13:00-14:30  Types of data and Grammar of graphs
Class 3   08 June 13:00-14:30  Intro to statistics and descriptive stats
Class 4   09 June 13:00-14:30  Boxplot and playing with Colours
Class 5   15 June 13:00-14:30  Data Collection Bias, Probability, and Distribution
Class 6   18 June 13:00-14:30  Hypothesis testings and the main tests
Class 7   22 June 13:00-14:30  Barcharts and cleaning the sample
Class 8   23 June 13:00-14:30  PCA and Cluster analysis
Class 9   29 June 13:00-14:30  Covariance, Regression, Similarity and Difference coefficients
Class 10  30 June 13:00-14:30  Recap and Bring your dataset class

A bit of homework

Errors

I had the chance to play with the IDE and learn a few things by invoking errors.

> ggplot(college, aes(x=sat_avg, y=admission_rate)) + geom_point()
Error in ggplot(college, aes(x = sat_avg, y = admission_rate)) : 
  could not find function "ggplot"

The above occurs if you just run that line of code without first loading the library in which ggplot lives, i.e. by calling library("tidyverse") first.

> ggplot(college, aes(x=sat_avg, y=admission_rate)) + geom_point()
Error in ggplot(college, aes(x = sat_avg, y = admission_rate)) : 
  object 'college' not found

This error is thrown because the data object ‘college’ has not been created. Do this by loading the data first, i.e. call college <- read_csv('http://672258.youcanlearnit.net/college.csv').
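Both messages can be reproduced safely without ggplot at all. This is my own sketch: capture_error is a hypothetical helper, and no_such_function and no_such_object are deliberately undefined names.

```r
# Hypothetical helper: run an expression and return the error message, if any
capture_error <- function(expr) {
  tryCatch({
    expr           # forces evaluation of the expression passed in
    "no error"
  }, error = function(e) conditionMessage(e))
}

capture_error(no_such_function(1))   # message mentions: could not find function
capture_error(no_such_object + 1)    # message mentions: object ... not found
capture_error(1 + 1)                 # "no error"
```

Reading the message tells you which kind of thing is missing: a function (load the library) or an object (create or load the data).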

Mutate and as.factor

Last week’s class included the creation of the dataset using this call:

college <- college %>%
  mutate(state=as.factor(state), region=as.factor(region),
         highest_degree=as.factor(highest_degree),
         control=as.factor(control), gender=as.factor(gender),
         loan_default_rate=as.numeric(loan_default_rate))

The mutate() function adds new variable columns to the dataset or replaces existing ones if the same name is used. You can see that state is replaced by a different version of itself: the as.factor() function takes data and turns it into a factor if it isn’t already. “Factor” is a term that indicates that the data is a category or enumerated type, rather than just a set of strings. To see this, consider this:

> var=letters[1:5]
> var
[1] "a" "b" "c" "d" "e"
> var=as.factor(var)
> var
[1] a b c d e
Levels: a b c d e

var is created as a character vector, then converted to a factor by the as.factor call.
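A slightly fuller sketch of what a factor stores, using made-up data of my own: each value is held as an integer code pointing into a list of levels, and by default the levels are sorted alphabetically.

```r
sizes   <- c("small", "large", "medium", "small")   # hypothetical categories
sizes_f <- as.factor(sizes)

levels(sizes_f)       # "large" "medium" "small"  (alphabetical by default)
as.integer(sizes_f)   # 3 1 2 3  (the codes stored behind the labels)
table(sizes_f)        # counts per category

# The level order can be set explicitly when it matters, e.g. for plot axes
sizes_o <- factor(sizes, levels = c("small", "medium", "large"))
levels(sizes_o)       # "small" "medium" "large"
```

The explicit ordering is what controls the order of categories along a ggplot axis or legend.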

Class 3 - Intro to statistics and descriptive stats

  • Discuss the results of the challenge
  • Intro to Statistics
  • Descriptive Statistics
  • Summarising Statistics
  • Exploring 1 variable: Plotting distribution Histograms and Density plot

The results of the challenge

Last week’s challenge was discussed, focusing on the meaning of the plots obtained, and how to add a best fit line by adding the geom_smooth() geometry to the ggplot command. We played a bit with a tool to help us get better at “seeing” the correlation of plotted data: guessthecorrelation.
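A minimal sketch of a best fit line, using the built-in iris data rather than the challenge data. The choice of method = "lm" (a straight line) is my assumption; left to itself, geom_smooth() picks a smoothing method automatically.

```r
library(ggplot2)

# Scatter plot with a straight best-fit line and its confidence band
p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)
p

# The number behind the picture: the correlation coefficient
cor(iris$Petal.Length, iris$Petal.Width)   # strongly positive
```

Comparing the plot with the value from cor() is a quick way to calibrate the “guess the correlation” instinct.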

Intro to Statistics

This part of the class was further levelling by going over the basics of statistics and how they may be used to summarise or infer information. Central tendency includes arithmetic mean, median and mode values. Measures of dispersion like variance and standard deviation were also discussed.

Standard deviation: \[ \sigma = \sqrt {\frac{1}{N} \sum\limits ^N _{i=1}(x_i - \mu)^2} \]

These functions were illustrated in the R IDE, including accessing columns within datasets using the dollar sign like this:

mean(iris$Petal.Width)	#using mean formula

Similar functions offer median, variance and sd (standard deviation) calculations.
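One thing worth knowing: the formula above divides by N (the population standard deviation), whereas R’s sd() divides by N - 1 (the sample standard deviation). This sketch of mine computes both for the same column and shows how they relate.

```r
x  <- iris$Petal.Width
N  <- length(x)
mu <- mean(x)

pop_sd    <- sqrt(sum((x - mu)^2) / N)   # the formula as written above
sample_sd <- sd(x)                       # R's built-in, which divides by N - 1

c(population = pop_sd, sample = sample_sd)   # sample value is slightly larger

# The two differ only by a factor of sqrt(N / (N - 1))
all.equal(sample_sd, pop_sd * sqrt(N / (N - 1)))   # TRUE
```

For 150 observations the difference is tiny, but it becomes noticeable with small samples.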

Visualise the distribution: Histogram and Density plots

There is a nice cheat sheet for ggplot data visualisation tips.

Subsets of the data based upon some category can be easily made:

virginica <- subset(iris, Species=="virginica")

A table of data around these subsets can be quickly constructed and plotted:

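# Values of Virginica
mean(virginica$Petal.Width)
median(virginica$Petal.Width)
var(virginica$Petal.Width)
sd(virginica$Petal.Width)

# ... etc for the other two, after subsetting them in the same way
setosa <- subset(iris, Species=="setosa")
versicolor <- subset(iris, Species=="versicolor")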

# Make a new table...
Species <- c("setosa","versicolor", "virginica")

# add columns...
Mean <- c(mean(setosa$Petal.Width), mean(versicolor$Petal.Width), mean(virginica$Petal.Width))
Median <- c(median(setosa$Petal.Width), median(versicolor$Petal.Width), median(virginica$Petal.Width))
Variance <- c(round(var(setosa$Petal.Width), digits = 2), round(var(versicolor$Petal.Width), digits = 2), round(var(virginica$Petal.Width), digits = 2))
SD <- c(round(sd(setosa$Petal.Width), digits = 2), round(sd(versicolor$Petal.Width), digits = 2),round(sd(virginica$Petal.Width), digits = 2))

# Make the data frame for plotting...
# data.frame() keeps the numeric columns numeric; cbind() would turn them all into text
FullPlot <- data.frame(Species, Mean, Median, Variance, SD)

Notice that you can double-click on the FullPlot table in the environment box (equivalent to the command View(FullPlot)) to display the table in a new tab for inspection.
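The same kind of summary table can be built without typing each species by hand. This is my own alternative using only base R, not the method shown in class; the name summary_table is invented.

```r
# Split the petal widths into one vector per species
by_species <- split(iris$Petal.Width, iris$Species)

# Apply each summary function to every species in turn
summary_table <- data.frame(
  Species   = names(by_species),
  Mean      = sapply(by_species, mean),
  Median    = sapply(by_species, median),
  Variance  = round(sapply(by_species, var), 2),
  SD        = round(sapply(by_species, sd), 2),
  row.names = NULL
)

summary_table
```

This scales to any number of categories, which matters once the data has more than three groups.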

# ... and now plot it in a nice histogram

ggplot(iris, aes(x=Petal.Width, fill=Species))+ 
  geom_histogram(alpha=0.8,color="black", binwidth=0.08)+
  geom_vline(aes(xintercept = mean(Petal.Width)),col='red',size=2)+
  theme_bw()+facet_wrap(~Species, ncol = 1) 
  

Here’s how that looks:

The next challenge, number 2

Once again, I got nowhere with the challenge in the 10 minutes we had to do it. I spent most of that time, after loading the dataset, trying to figure out how to make a scatter plot. When we reviewed the problem at the end of the class, it became apparent that the question was looking for a histogram, not a scatter plot. Getting stuck here meant that I didn’t know how to approach the second part of the challenge, and didn’t progress to reading the third.

Class 4 - Boxplot and playing with Colours

  • Discuss Challenge 2
  • Boxplots
  • Playing with colours
  • Exporting graphs

Discussing Challenge 2 and feeling lost

No time to spend on this before the next class, so I just sat back and tried to follow the discussion1.

Boxplots

Boxplot

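  • The thick black line is the median.
  • The boxes represent the 50% of the sample closest to the median (the interquartile range).
  • The whiskers were described in class as covering the 95% of the sample closest to the median, i.e. about 2 SD either side. By default, though, ggplot2’s geom_boxplot() draws each whisker to the most extreme value within 1.5 times the interquartile range of the box edge.
  • The dots represent the outliers, i.e. the points beyond the whiskers.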

In R, boxplots are another geometry2:

ggplot(iris, aes(x=Species, y=Petal.Width, fill=Species)) + geom_boxplot()
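To see where the box, whiskers and dots come from, here is a sketch of mine that works the numbers out by hand for one species, following the default rule used by geom_boxplot(), in which whiskers reach the most extreme points within 1.5 times the interquartile range of the box. The variable names are my own.

```r
w <- subset(iris, Species == "virginica")$Petal.Width

q   <- unname(quantile(w, c(0.25, 0.5, 0.75)))   # box bottom, median, box top
iqr <- q[3] - q[1]                               # height of the box

# Furthest a whisker is allowed to reach
lower_limit <- q[1] - 1.5 * iqr
upper_limit <- q[3] + 1.5 * iqr

# Whiskers stop at the most extreme data points inside those limits
lower_whisker <- min(w[w >= lower_limit])
upper_whisker <- max(w[w <= upper_limit])

# Anything outside the limits is drawn as a dot
outliers <- w[w < lower_limit | w > upper_limit]

c(lower_whisker = lower_whisker, q1 = q[1], median = q[2],
  q3 = q[3], upper_whisker = upper_whisker)
outliers
```

So the whiskers depend on the spread of the middle half of the data, not directly on the standard deviation.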

Playing with colours

There are a number of colour themes within R to allow you to make beautiful and readable plots and graphs. What is important, as with all open source, is that when using different packages and utilities, you must RTFM3.

These are invoked with colour commands and palettes, like scale_color_manual here:

library(wesanderson) # provides wes_palette()
ggplot(iris, aes(x=Species, y=Petal.Width, color=Species)) + 
  geom_boxplot(outlier.alpha = 0)+
  scale_color_manual(values = wes_palette("GrandBudapest2", n=4))

# other color methods:
scale_color_grey(start = 0.6, end = 0.1) # grayscale
scale_color_manual(values=c("#80ec65", "#145200", "#700015")) # manually defined
scale_color_gradientn(colours = rainbow(7)) # gradient

Exporting graphs

Pretty self-explanatory from the IDE: exporting graphs to pdf or png is available from the dropdown in the plots tab.

Challenge number 3

Well, today I managed to get a result during the time allocated for us to work on the problem, which made up for how I had been feeling earlier. With a little tweaking after class, I settled on this:

ggplot(college, aes(x=region, y=faculty_salary_avg, fill = region)) + 
  geom_boxplot(outlier.alpha = 0.0, alpha = 0.8) +
  geom_jitter(alpha =0.2)+
  theme_bw()+ 
  facet_wrap(~control, ncol = 2)+
  labs(title = "USA University", 
       subtitle = "Faculty salary by region", 
       x = "Region", 
       y = "Average salary")

which yielded this:

Challenge 3 plot

Next class

  • Data Collection
  • Datasets biases
  • Inferential statistics
  • Probability

Notes and references

  1. This shook my confidence a little, which took me back to my school days, the last time that I was so lost in a class that I wasn’t even aware when the teacher was asking a question. All you can do in this situation is wait for the next section and try to be invisible. Part of the difficulty I am having is the heavy accent of the instructor, which causes me to miss signposts and words. It made me think of all the EAL kids in Scottish schools. 

  2. There is a complete catalogue of the graphs available in R at r-graph-gallery.com

  3. Read the manual. 

Statistics and Visualisation with R⤴

from @cullaloe | Tech, tales and imagery

This is the first in a series of posts containing my notes from a course on Statistics and (data) Visualisation with R, presented by Lucia Michielin of the University of Edinburgh. The course is running two days a week in June and July.

First week

The learning environment

The course was presented in a blended form using Collaborate, Blackboard Learn and Slack. These represent three key elements in a remote learning environment:

  • Synchronous, whole-class activity, usually instructor-led, but may include “break-out” sessions for students to work together in groups.
  • A resources repository for files, course outlines, presenter’s notes and assignments.
  • A back-channel for students to interact in, and for interaction with the course leader. This may be synchronous and asynchronous, but Slack is only used in this course for offline use. Collaborate chat is used in the live sessions.

The offline chatter and forum space is useful in courses like this, which is why Slack is such a useful tool here. Microsoft Teams has been suggested as an equivalent to Slack but I really don’t find it intuitive or useful in the same way as Slack. The UI, as with all Microsoft products I have ever used, is difficult and intrusive.

Preparation

The pre-course task is to install R and the R-Studio IDE. I painlessly installed R 4.0.0 “Arbor Day” and R-Studio 1.3.959 on my Mac, and downloaded the data files for the first week.

Programme overview

Class     Date and time        Title
Class 1   01 June 13:00-14:30  Intro to R and R studio
Class 2   02 June 13:00-14:30  Types of data and Grammar of graphs
Class 3   08 June 13:00-14:30  Intro to statistics and descriptive stats
Class 4   09 June 13:00-14:30  Boxplot and playing with Colours
Class 5   15 June 13:00-14:30  Data Collection Bias, Probability, and Distribution
Class 6   18 June 13:00-14:30  Hypothesis testings and the main tests
Class 7   22 June 13:00-14:30  Barcharts and cleaning the sample
Class 8   23 June 13:00-14:30  PCA and Cluster analysis
Class 9   29 June 13:00-14:30  Covariance, Regression, Similarity and Difference coefficients
Class 10  30 June 13:00-14:30  Recap and Bring your dataset class

Class 1 - Intro to R and R studio

This began with an introduction on The Edinburgh Centre for Data, Culture & Society (CDCS) and their other courses and research projects, followed by an introduction to our course leader, Lucia, and a walk around the VLE folders and files. Then, an overview of the first session:

  • Intro to Quantitative methods in Research
  • The R and R studio Interface
  • How to organise your work in R efficiently
  • How to install packages

This is fairly self-explanatory but an essential levelling for all delegates on the course, who come from a range of backgrounds. Most seemed to be researchers.

R and R-Studio

R is the language, and R-Studio is a graphical IDE for working with projects and data using R. Lucia said that the key skill in acquiring a new language is knowing how to Google: R is widely used and has a large user base and the forums are very helpful in quickly overcoming problems.

Scientific method

We heard about the scientific method and in particular the issue of correlation vs. causation. This was set in the context of remaining close to the process, and to the meaning of the data, whilst engaging deeply with the IDE.

  • Define a research question
  • Explore the evidence
  • Define working hypotheses
  • Translate hypotheses → statistical models
  • Compare models to evidence
  • Interpret results

Getting our hands dirty

Finally we got to open R-Studio and follow Lucia through opening the script for today and take a walk around the IDE. We started with creating a new project and setting up a folder organisation for the course, then played with settings to make the IDE more comfortable for us.

Comment and header syntax were explained, followed by running commands immediately in the console (by pressing cmd-enter). Outputs are presented as vectors, indexed from 1.

# THIS IS A LEVEL 1 HEADER #################################

## This Is a Level 2 Header ================================

### This is a level 3 header. ------------------------------

print('Hello World!')

## Define Variables ================
x <- 1:5 # Put the integers from 1 to 5 in the variable x (alt and - is the IDE shortcut for <-)
x #Displays the values we have set in x
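A few more lines of my own to show what “indexed from 1” means in practice, and that operations apply to the whole vector at once.

```r
x <- 1:5

x[1]        # 1: the first element (R counts from 1, not 0)
x[2:3]      # 2 3: a range of elements
length(x)   # 5
x * 2       # 2 4 6 8 10: arithmetic is applied to every element
```

The [1] printed at the start of console output is the index of the first value shown on that line.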

Getting help is a matter of issuing a query command: e.g. ?datasets displays help on the Datasets package.

Packages

Packages are installed and activated thus:

install.packages("ggplot2")
?install.packages  # Get help on installing packages

## Activate packages ==========================
library(ggplot2)

We ended the session by installing something called the “Tidyverse”, which seems to be a collection of useful packages. It calls itself an “opinionated collection of R packages designed for data science”.
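Installing a package every time a script runs is slow, so a common pattern is to check first. This is a sketch of my own rather than something from the class: is_installed is a hypothetical helper, and the second package name is deliberately nonsense.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

is_installed("stats")                 # TRUE: ships with R
is_installed("notARealPackage123")    # FALSE

# Typical use (commented out so that nothing is installed by accident):
# if (!is_installed("tidyverse")) install.packages("tidyverse")
```

Installation is needed once per machine, whereas library() is needed once per session.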

Class 2 - Types of data and Grammar of graphs

  • Type of Data
  • How to load data
  • How to choose the right Graph
  • Graph types
  • The Grammar of Graphs
  • Ggplot2 structure
  • Graph settings

Data types

Type        Description
Vector      One dimensional collection of data, like a column or row, all of the same type
Matrix      Two-dimensional collection of data, all of the same type
Array       Multi-dimensional collection of data, all of the same type
Data frame  Mixed type variables (similar to a table in a spreadsheet)

By convention, a table has variables in columns and observations (data values) in rows. Vectors, matrices and arrays in R must each contain a single data type; of the four types above, only a data frame allows mixed types.
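A sketch of the four types, with small made-up values of my own, including what happens when you try to mix types in a vector.

```r
v  <- c(1.5, 2, 3)                      # vector: one dimension, one type
m  <- matrix(1:6, nrow = 2)             # matrix: 2 rows x 3 columns
a  <- array(1:24, dim = c(2, 3, 4))     # array: 2 x 3 x 4
df <- data.frame(name   = c("a", "b"),  # data frame: one type per column
                 score  = c(90, 85),
                 passed = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)

dim(m)              # 2 3
dim(a)              # 2 3 4
sapply(df, class)   # character, numeric, logical

# Mixing types in a vector does not fail: everything is silently converted
mixed <- c(1, "two", TRUE)
class(mixed)        # "character"
```

The silent conversion is worth remembering when a numeric column unexpectedly turns up as text after import.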

Keeping the environment clean

# Clear environment
rm(list = ls())

# Clear console
cat("\014")  # ctrl+L

Loading data

df <- read_csv("data/RodentSimplified.csv")

# ... same, with selection and mutation of month column name:

df2 <- read_csv("data/RodentSimplified.csv") %>%
  select(mo,dy,yr,period,species) %>% 
  mutate(period = as.factor(period)) %>%
  rename(month = mo) %>%
  print()

The %>% at the end of each line in the above code block looks like a continuation marker, but it is called a “pipe”. A pipe does much more than make the code look pretty: it passes the result of one function call on as the first argument of the next. See this and this and this for more details.
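A tiny sketch of my own showing that a piped chain and the equivalent nested call give the same answer. It loads magrittr, the package that supplies %>% (it comes along with the tidyverse); the numbers are arbitrary.

```r
library(magrittr)   # provides %>% (also attached by library(tidyverse))

x <- c(1, 4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: read from left to right, each result feeding the next function
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)   # TRUE
```

The piped version reads in the order the steps happen, which is why it is popular for data cleaning.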

Graph types

After some warnings on selecting data presentation that is consistent with your intention to actually communicate something, and not to show off, Lucia introduced us to some of the graph types available with R. She gave us a really nice demonstration of why many journals prefer data presentation (e.g. ratios) as a bar chart rather than a pie chart. Differences are sometimes clearer in the bar chart. Apparently, Florence Nightingale invented the pie chart! (Strictly, she is known for the polar area or “coxcomb” diagram; the pie chart itself is usually credited to William Playfair.)

The Grammar of graphics

The ggplot function allows the creation of graphs according to the framework outlined by Wilkinson1 in which the separate aspects of data presentation are dealt with separately. A basic template is described here with examples for ggplot. My take on the framework is:

  1. a data set
  2. aesthetics = a set of scales and variables to be plotted
  3. geometry = shapes to represent the data and the type of chart
  4. a grid of subplots, or facets
  5. statistics = summaries of the data
  6. coordinates to define the plotting space
  7. global themes and labels to make it look pretty

Some of these map into the ggplot function, from the most basic (using examples from the weekly challenge, below):

ggplot(college, aes(x=sat_avg, y=admission_rate)) + geom_point()

Here, the aes part is the aesthetics definition of the variables for the plot; geom_point defines the type of chart (scatter plot), and college is the data set.
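To check my take on the framework, here is a sketch that uses all seven components in one plot. It uses the built-in mtcars data rather than the course data, and the particular choices (facets by cylinder count, a straight-line fit, the axis limits and labels) are mine.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +      # 1 data set, 2 aesthetics
  geom_point() +                                 # 3 geometry
  facet_wrap(~cyl) +                             # 4 facets: one panel per cylinder count
  stat_smooth(method = "lm", se = FALSE) +       # 5 statistics: a fitted line per panel
  coord_cartesian(ylim = c(10, 35)) +            # 6 coordinates
  theme_bw() +                                   # 7 theme ...
  labs(title = "Fuel economy by weight",         #   ... and labels
       x = "Weight (1000 lbs)",
       y = "Miles per gallon")
p
```

Each line can be removed or swapped independently, which is the point of treating the aspects of a graph separately.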

The weekly challenge

I completely failed to get anything at all out of the challenge in the 5 minutes we were given for this task. My code just didn’t produce any kind of plot, just barfing errors at me. This was entirely down to my inability to type or read carefully what I had typed. Later, I managed to make some progress:

library(tidyverse) # provides read_csv(), %>%, mutate() and ggplot()
# Import CSV files from online repo
college <- read_csv('http://672258.youcanlearnit.net/college.csv')
college <- college %>%
  mutate(state=as.factor(state), region=as.factor(region),
         highest_degree=as.factor(highest_degree),
         control=as.factor(control), gender=as.factor(gender),
         loan_default_rate=as.numeric(loan_default_rate))
# Create a chart that would show the relationship between the SAT average and the admission rate 
ggplot(college, aes(x=sat_avg, y=admission_rate)) + geom_point()

Time is not in abundance at the moment, so I did eventually just look at the solution and run it, seeing how each element of the code contributed to the final result. Here’s my result:

Next class

  • Discuss the results of the challenge
  • Intro to Statistics
  • Descriptive Statistics
  • Summarising Statistics
  • Exploring 1 variable: Plotting distribution Histograms and Density plot

Notes and references

  1. Wilkinson, L. (2005) The Grammar of Graphics. Springer-Verlag. doi: 10.1007/0-387-28695-0.