In 2017, Dylan Wiliam tweeted: “I’ve come to the conclusion Sweller’s Cognitive Load Theory is the single most important thing for teachers to know http://bit.ly/2kouLOq” (see here for original tweet).
I became interested in Cognitive Load Theory through listening to Greg Ashman talking on the mrbartonmaths podcast.
In very simple terms, Cognitive Load Theory is about considering the limitations of pupils’ working memory at the point of initial instruction.
I decided to offer to run a workshop about Cognitive Load Theory at the Scottish Mathematical Council’s conference in Stirling in March, 2018, and this blog post will be a summary of my presentation. Note: the SMC conference was postponed due to adverse weather, and has been rescheduled for Saturday 19th May.
As well as Dylan Wiliam, Greg Ashman, Craig Barton and John Sweller, I have also read some of the work of Daisy Christodoulou and the paper by Kirschner, Sweller and Clark titled “Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential and Inquiry-Based Teaching”. Another great summary of Cognitive Load Theory can be found at this link.
Take a moment to answer this question before you read on:
What are some of the things that you know?
You know a lot of stuff. Some of it is important – like date of birth, phone numbers, passwords, pin codes. Some of it is not important – like the lyrics to Aga Do. Some of it is long lasting and easy to retrieve. Some of it is to do with what is happening right now – the brightness and temperature in the room you are sitting in. Some of it is to do with what happened tens of years ago and you probably can’t remember it right now. But it’s in there… What was the name of the teacher you had in Primary 1?
You know how to write but is that the same as knowing how to speak?
You know how to multiply numbers but is that the same as knowing how to count?
Is knowing that things fall towards the ground when they are dropped the same as knowing the formulae for potential energy and kinetic energy?
David Geary (2007) talks about two different types of knowledge: Biologically Primary and Biologically Secondary. Biologically Primary Knowledge includes things like being able to speak your native language, being able to read people’s body language and being able to make sense of how things interact in our physical environment. Biologically Secondary Knowledge concerns everything that has to be learned through effort. Learning a new language, knowing your times tables and being able to tie shoelaces are examples of biologically secondary knowledge. In fact, pretty much everything we teach in our classes in school can be described as biologically secondary.
To understand cognitive load, we must define what we mean by novices and experts and consider how they differ when learning new material. Novices are people who have very limited experience in a particular domain. Experts are extremely knowledgeable in a particular domain. Novices and experts think and learn differently. The differences are discussed further in this post by David Didau.
When we learn new material, our working memories are stretched significantly. Everything we think about takes up space in working memory. It is often said that working memory is limited to 7±2 items. There isn’t really a consensus on the number of items that can be held in working memory at any one time, and it depends on many factors such as how complicated the items are and what we are required to do with them once they are in our working memories.
The main points here are:
- our working memories are limited
- everything we must think about uses up space in working memory
- learning is defined as a change in long term memory (Kirschner, Sweller and Clark)
- learning requires effort in working memory
There is no known limit to our long term memories. In the long term memory, information is organised in schemas.
You have schemas for everything. And you can have unlimited schemas (as far as we know). They can be vast or they can be simple. My schema for solving a Rubik’s cube is, like most of yours I am sure, vast and complex. But my schema for crochet patterns is very small – there are, I am assured, lots of different abbreviations used for different stitches, and these vary depending on the country where the pattern originated from etc.
A person with a highly developed schema for, say, solving simple problems involving differentiation (i.e. just finding the derivative of lots of functions) will have more success learning how to find the stationary points of a function or the equation of a tangent to a function than a person who does not have that schema as well developed. A really good way to develop schemas is through practice of the component parts.
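As an illustration of how the component skill feeds the larger one, here is a sketch of a stationary points problem. The function is an assumed example chosen for illustration, not one taken from the presentation.

```latex
\begin{align*}
f(x) &= x^3 - 3x \\
f'(x) &= 3x^2 - 3 \\
f'(x) = 0 &\implies 3(x - 1)(x + 1) = 0 \implies x = 1 \text{ or } x = -1 \\
f(1) &= -2, \qquad f(-1) = 2
\end{align*}
```

So the stationary points are (1, -2) and (-1, 2). Only the second line is differentiation itself. A pupil for whom that step is automatic has working memory left over for the new ideas in the lines that follow.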
How can we tell if a student is a novice or an expert? We need to use formative assessment and perhaps diagnostic assessment before the first lesson in a particular topic. Note that a student who is an expert in one domain may not be an expert in another.
There are three main types of cognitive load:
Extraneous Load: caused by inappropriate instructional designs that ignore working memory limits and fail to focus working resources on schema construction or automation. This type of load is mostly environmental and always unhelpful for learning. This may include noise, unhelpful or unnecessary pictures/graphics/animations and poorly structured learning activities.
Intrinsic Load: caused by the natural complexity and structure of the material that must be processed. Necessary for learning – it is what makes it worth learning. Some things are harder to learn than others, based on their complexity and the prior knowledge of the learner. Learning capital cities is pretty easy – I tell you that Paris is the capital of France, you understand what I mean (as long as you know that France is a country and you have an idea of what Capital means) but if I tell you that the area under the curve sinx from 0 to pi/2 is 1 square unit you need to know quite a few things in order to understand it. The intrinsic load depends on two main factors – the complexity of the material and how knowledgeable you already are in that specific domain.
Germane Load: caused by effortful learning, resulting in schema construction and automation. This is the effort required to actually learn material (if our definition of learning is “a change in long term memory”).
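To see why the sin x statement in the intrinsic load description carries so much more load than the capital city fact, it helps to write out what sits behind it. This is a sketch in standard calculus notation, assuming x is measured in radians.

```latex
\[
\int_0^{\pi/2} \sin x \, dx
  = \Bigl[ -\cos x \Bigr]_0^{\pi/2}
  = -\cos\frac{\pi}{2} + \cos 0
  = 0 + 1
  = 1
\]
```

Each equals sign relies on something the learner must already know: that an integral gives the area under a curve, that the antiderivative of sin x is -cos x, how to evaluate between limits, and the values of cos at 0 and at pi/2.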
As teachers (or “instructional designers”) we need to ensure we do the following:
- Minimise extraneous load – consider the environment and anything you make students think about that isn’t to do with the new learning.
- Minimise intrinsic load – break down the problem for novices. Present small parts at a time before approaching a whole problem that requires several new steps.
- Maximise germane load – by reducing extraneous load and making the intrinsic load more manageable for learners, schema construction is much easier.
There is an additive relationship among the three types of Cognitive Load. If we get too much in the total, pupils become cognitively overloaded.
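The additive relationship can be written as a rough inequality. The symbols here are shorthand introduced for illustration only (they are not notation from the sources cited in this post), and none of the quantities can be measured precisely in a classroom.

```latex
\[
L_{\text{extraneous}} + L_{\text{intrinsic}} + L_{\text{germane}} \;\le\; C_{\text{working memory}}
\]
```

If the left-hand side exceeds the capacity on the right, the pupil is overloaded. Since the capacity is fixed, every bit of extraneous load removed leaves more room for the other two.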
The Phonological Loop
The part of the working memory that processes written and spoken material is called the Phonological Loop. When you read something, you generate a sound in your head. When you listen to someone speak this is also processed as a sound. If you are trying to read something while someone is speaking, you get cognitively overloaded straight away. As teachers, we should avoid things like reading out slides or, even worse, talking about slides that have text on them while the students are reading the slides. For novices who are not familiar with the content, this will cause them excessive cognitive load. More on this when we get to the modality effect.
Cognitive Load Effects
I will mention 6 cognitive load effects briefly, and give some examples of each one.
- Worked Example
- Expertise Reversal
- Redundancy
- Split Attention
- Modality
- Goal Free
The Worked Example Effect
At the point of initial instruction, novices benefit from seeing worked examples. An effective strategy is to present a worked example to the class (you can use questioning about the parts that they can already do – this isn’t necessarily chalk and talk) followed by the class completing a very similar problem for themselves. When I do this, my classes don’t copy the worked example, but they do write their solutions to the problems they will try into their notes. When we discuss the problem as a class and go over the correct solution (or a correct solution) they then have the chance to change their answers. The worked example should allow all pupils who are paying attention the chance to get the problem correct without too much of a demand on their working memories. This allows them to see the ways that the parts of the example interact and allows easier formation of schemas. Some examples of worked examples are given below:
This is taken from Nathan Quirk’s board (our probationer). Each example is completed alongside a problem for pupils to complete.
Questioning and discussion of the steps is what makes this effective. It cannot just be pupils following the same steps without using their brains.
These worked examples and problems were supplied by Chris McGrane.
The Expertise Reversal Effect
It has been shown that worked examples are more useful for novices than they are for experts. As expertise grows through experience, worked examples are no longer needed, and in fact can cause unnecessary (extraneous) cognitive load for experts. Instead of presenting experts in a particular domain with worked examples, it is more beneficial to have them solving problems. Learning through problems is only possible when a strong foundation of knowledge has been built up by the student.
The Redundancy Effect
Any information that is additional to what is needed to understand the problem is redundant information. For example, when students are solving geometry problems, they may be given an annotated diagram alongside text that tells them the lengths of the sides and the sizes of the angles (which are already marked on the diagram). In this case one of these sources of information is redundant, since the problem could be fully understood with just one of them.
Here is an example:
The text to the left of the diagram is redundant information. The diagram could be fully understood without it.
We can cope with this as experts, because we look at this question and instantly think “Pythagoras!” but remember that novices do not work in the same way. A novice needs to process everything in the problem.
Other sources of redundant information include teachers reading out slides and drawings/images on slides and worksheets that have little to do with the problem. At the point of initial instruction, these additional things are not helpful for learning, and so they should be avoided.
Some teachers tell me that the reason they read out slides is that they do not trust their pupils to read the slides for themselves. A simple fix in this case is to put a picture on the slide that represents the idea being discussed and to say the things that would have been text on the slide.
The Split Attention Effect
This occurs when two or more sources of information must be integrated in order to make sense of the whole problem or idea. This can easily be eliminated by integrating the two sources. This differs from the redundancy effect in that both pieces of information must be thought of together to make sense of the whole.
Here is an example from a Higher Maths past paper:
A simple fix:
The equations could easily be added to the diagram, thus removing the need to interpret two sources of information to make sense of the whole.
The Modality Effect
This concerns the way that new information is presented, whether it be auditory, written (which is also auditory by the time it is processed) or visual. We can cope with listening to speech and seeing something in a diagram at the same time without overloading working memory. This is better than integrating text and a diagram. Have you ever been on a museum tour with a headphone set? This is effective because it is easier than reading text then looking at things. Yes, it saves us the effort of reading, but it also means our eyes are free to look at the exhibits instead of switching between the exhibits and the text.
What we can’t do is listen to something while listening to something else. We can’t read something (which uses both the visual channel and the auditory channel) and listen to someone speaking at the same time.
A diagram for a question (or to demonstrate a relationship) that would normally have text alongside it can be replaced with just the diagram and the teacher narrating over the top. If you have pupils who need the written form too (not all of them will) then you can give them a written copy. It will be better for everyone else if they hear the question and see the diagram, rather than having the text (which you will probably redundantly read out) and the diagram too. You get the split attention effect if they have to read about the diagram while looking at the diagram.
The Goal Free Effect
This effect concerns the idea of “problem solving search”. When novices are presented with a problem such as the one on the left in the diagram below, they tend to think of the whole problem in one go and suffer cognitive overload as a result.
Taking the specific goal out of the problem and re-framing it as is shown on the right eliminates problem solving search so that the novice learner may use any angle facts they know to fill in as many angles as they can. When the problem is framed this way, novices are able to make sense of the individual steps they take, and this allows them to assimilate long term memories of angle facts.
The idea that novices can learn new knowledge through discovery learning is flawed due to what we know from Cognitive Load Theory. Kirschner, Sweller and Clark (2006) state that “The goal of instruction…is to give learners specific guidance about how to cognitively manipulate information in ways that are consistent with a learning goal, and store the result in long-term memory”. Discovery Learning does not easily facilitate this. I used to attempt to teach Pythagoras’ Theorem through a discovery task. The class would investigate the relationship by matching around 15 squares to the correct 5 triangles by finding the sides that matched. No relationship yet discovered. They then had to measure the lengths of the sides of each square and work out the areas of each square.
Only a small number of pupils in the class managed to calculate the correct areas, and nobody noticed that the two small squares had a combined area that was equal to that of the large square. So I reluctantly told them that this relationship would exist. “It doesn’t work on mine! 3.1 squared plus 3.9 squared doesn’t make 5.2 squared”. If only they could measure accurately. This type of discovery investigation task looks lovely – I was observed by a depute head teacher doing it with a second year class. His comments were “You could just feel the learning in the room – they are so engaged”. No you couldn’t, and their engagement was with glue sticks and scissors. They only learned Pythagoras’ Theorem in the last few minutes when I explained it quickly before the bell. They still were not convinced that it works because for their squares and triangles it didn’t work. It was a discovery learning failure.
I now start the Pythagoras’ Theorem topic by telling them that the two small squares have the same total area as the large square and I demonstrate it with a few Pythagorean Triples (3, 4, 5), (5, 12, 13). We sketch a diagram of a right-angled triangle with three squares every time we answer a question. The success rate is much higher and they feel like they are doing pretty advanced maths.
The paper by Kirschner, Sweller and Clark in the references list is well worth a read for more on this, as is listening to Greg Ashman and Daisy Christodoulou on the @mrbartonmaths podcast.
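The pupil’s objection in the Pythagoras story is easy to check with a little arithmetic. Here is a minimal Python sketch that compares the two exact triples with the measured lengths quoted by the pupil. The function name `is_pythagorean` and the tolerance value are my own assumptions for illustration.

```python
def is_pythagorean(a, b, c, tolerance=1e-9):
    """Return True if a^2 + b^2 equals c^2 to within the given tolerance.

    Hypothetical helper written for this illustration.
    """
    return abs(a**2 + b**2 - c**2) <= tolerance


# The two triples used in the demonstration.
triples = [(3, 4, 5), (5, 12, 13)]
triple_results = [is_pythagorean(a, b, c) for a, b, c in triples]

# The pupil's measured side lengths quoted in the story.
a, b, c = 3.1, 3.9, 5.2
sum_of_small_squares = round(a**2 + b**2, 2)  # 9.61 + 15.21
large_square = round(c**2, 2)
shortfall = round(large_square - sum_of_small_squares, 2)

print(triple_results)
print(sum_of_small_squares, large_square, shortfall)
```

The exact triples pass the check, while the measured values give 24.82 against 27.04. The pupil was right about their own numbers: a small measuring error is enough to hide the relationship, which is one reason the measuring task could not reveal it.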
Interdisciplinary learning is a great way for pupils to apply what they have already learned in different and unfamiliar contexts. The trouble is that interdisciplinary learning projects often attempt to teach new content. It is not fair on novices to expect them to synthesise new material at the point of initial instruction. I’m not saying that Interdisciplinary Learning is a bad idea. What I am saying is that, when designing learning experiences, we need to be mindful of the fact that we are experts and that our pupils are novices.
Classroom displays often contribute to the extraneous load we impose on our learners, particularly when the displays are engaging. With this in mind, I have removed as much clutter as I could from the walls in my classroom. All of my displays are now on the back wall (my pupils sit in rows, facing the front). The only things worth looking at on the wall at the front of my classroom are the two whiteboards. Examples of pupils’ work are shown using the visualiser and do not become wallpaper on my walls. The walls at the side are plain, with the exception of the fire evacuation instructions. Perhaps you’re not ready to give up your classroom displays, but please consider what they add to the learning in your classroom. If it’s formulas for pupils to use, are you happy that they don’t need to commit these to their long term memories, and instead just rely on them being on the wall?
If you only remember three things from this blog post:
- Novices and experts learn differently
- Working memory is limited
- Effects: Worked Examples, Redundancy, Split-Attention, Modality, Goal-Free
Barton, C. (2017) ‘Greg Ashman – Cognitive Load Theory and Direct Instruction vs Inquiry Based Learning’, Mr Barton Maths Podcast.
Barton, C. (2017) ‘Daisy Christodoulou – Assessment, Multiple Choice Questions, 7 Myths about Education’, Mr Barton Maths Podcast.
Barton, C. (2018). How I Wish I’d Taught Maths. John Catt Educational Ltd. Woodbridge.
Christodoulou, D. (2014) Seven Myths About Education. Routledge. Oxon.
Didau, D. (2018). When do novices become experts?. [Blog] The Learning Spy. Available at: http://www.learningspy.co.uk/psychology/novices-become-experts/ [Accessed 7 May 2018].
Geary, D. (2007). Educating the Evolved Mind: Conceptual Foundations for an Evolutionary Educational Psychology. In: Carlson, J. S. & Levin, J. R. eds. Educating the Evolved Mind. North Carolina: Information Age Publishing, Inc, pp. 1-100. Available online at: http://evolution.binghamton.edu/evos/wp-content/uploads/2008/11/Geary01.pdf
Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential and Inquiry-Based Teaching. Educational Psychologist, 41(2), 75–86. Available online at: http://www.cogtech.usc.edu/publications/kirschner_Sweller_Clark.pdf
NSW Department of Education (2017). Cognitive load theory: Research that teachers really need to understand. Sydney: Centre for Education Statistics and Evaluation.
Sweller, J. Story of a Research Program. Education Review. Available online at: http://edrev.asu.edu/edrev/index.php/ER/article/viewFile/2025/545
Willingham, D. T. (2009) Why Don’t Students Like School? Jossey-Bass. San Francisco.