Instruments

An initial questionnaire survey (Appendix C) was administered to all participants to gain insight into the students’ perceptions of assessment and to measure the use of technology in prior assessment practices, both in Physics and in other subjects. A norm-referenced, pencil-and-paper test measuring academic achievement, marked out of 50 (Appendix D), with questions adapted from past standardised high-stakes examinations from 2008 to 2014, was taken by all participants in class as a post-intervention test. Questions were based on and categorised using Bloom’s taxonomy of educational objectives for the cognitive domain (Bloom 1956), as illustrated in Table 2. Although the taxonomy has, over the years, been criticised (Furst 1981, Solman and Rosen 1986) and revised (Anderson and Krathwohl 2001), the original structure for classifying cognitive behaviour presented by Bloom (1956) remains the most widely accepted and extensively used version of the taxonomy (Armstrong 2003).

Table 2 Categorisation of post-test questions (including the percentage of marks allotted) according to Bloom’s cognitive processes (Bloom 1956)

Level 1: Knowledge (%)

Level 2: Comprehension (%)

Level 3: Application (%)

Level 4: Analysis (%)

Level 5: Synthesis (%)

Level 6: Evaluation (%)

A traditional, objectively scored achievement test, rather than a game-informed assessment, was devised to act as a post-test for both groups, as the present research does not aim to challenge or change the notion of traditional academic attainment as it is de facto defined in the research question. Rather, the study seeks to evaluate the effect of a game-informed approach to everyday assessment, which lies within the control of the classroom teacher (unlike most standardised exams, which unfortunately lie beyond it in terms of both mode of delivery and content), on students’ traditional academic achievement as measured by high-stakes standardised examinations. Furthermore, students’ prior academic performance, which was used as a covariate and acted as a pre-test, was based on the scores obtained in past high-stakes examinations.

Procedure and Data Collection

Both control and experimental groups were subject to innovation in the use of digital technologies for assessment purposes, as shown by the questionnaire administered to both groups. All participants were taught the same subject content, which included Electricity in the Home, Magnetism and Electromagnetism (Matriculation and Secondary Education Certificate Examinations Board 2012), by me as their teacher. During a 10-week intervention period (each week comprising 4 lessons of 40 minutes of contact time each), the control group was assessed using traditional summative assessment practices, while the experimental group followed a game-informed approach to assessment through a website, www.in2fiziks.com, which I designed specifically for the purpose of the study. Table 3 illustrates the weekly assessment activities for both groups during the intervention period. At the end of the intervention period, both groups took a norm-referenced test measuring their academic achievement. Appendix E gives direct links (including usernames and passwords) to the weekly assessment activities shown below.

Table 3 Weekly assessment activities during the intervention period

Week

Traditional Assessment Group: Quiz using Clickers (MCQs)

Game-Informed Assessment Group: Quiz using Clickers (MCQs), in twos (collaboration), with hints (just-in-time feedback)

Both groups took a multiple-choice question quiz on the interactive whiteboard using clickers. While the students in the control group took the quiz individually, the game-informed assessment group worked on the quiz collaboratively, in twos or threes, and were given immediate feedback through hints before attempting an answer.

Week

Traditional Assessment Group: Online Exercise (fill in the blanks and MCQs)

Game-Informed Assessment Group: Online Exercise (fill in the blanks and MCQs), with hints (just-in-time feedback)

The online exercise was composed of fill-in-the-blanks and multiple-choice questions. Students in the control group could take the online exercise only once; feedback was provided after their score was registered. The game-informed assessment group, on the other hand, could improve their score by taking the online exercise twice. Feedback was given between attempts and, at the end, their best score was registered.

Weeks

Traditional Assessment Group: Single-staged assignment (immediate final version submission)

Game-Informed Assessment Group: 2-staged assignment (stage 1: draft submission), feedforward

Students were required to write down the method for 3 different practical work sessions on the MS Word templates provided. Once filled in, the documents had to be uploaded to www.in2fiziks.com. The teacher then provided online feedback on each individual submission. The experimental group could then upload a revised version, based on the feedback received, which was marked for assessment purposes.

Week

Traditional Assessment Group: Wiki Exercise (theme poster), teacher-generated assessment criteria

Game-Informed Assessment Group: Wiki Exercise (theme poster), in twos (collaboration), student-generated assessment criteria and student-nominated test questions (agency)

The control group had to individually produce a poster-like wiki page on an assigned theme from the topic being studied, using different media, such as images and YouTube videos, together with different fonts, styles, colours and hyperlinks. The individual wiki pages were then marked against the assessment criteria set by the teacher. The experimental group worked collaboratively, in twos or threes, to produce the poster-like wiki pages. A sample wiki page, which acted as an exemplar, was also given to the group. Students also had the opportunity to nominate 3 questions, together with their respective answers, based on their wiki page, to be used in the in-class test that followed. The assessment criteria for this task were negotiated with the students, who then voted for the 4 most relevant criteria via an online poll.

Week

Traditional Assessment Group: Mid-intervention Online Test

Game-Informed Assessment Group: Mid-intervention Online Test, using student-generated content/test questions (agency)

Both groups took the same mid-intervention online test, which used the test questions generated by the students in the game-informed assessment group during the previous assessment task.

Week

Traditional Assessment Group: Multiple-choice question Workshop, using teacher-set questions

Game-Informed Assessment Group: PeerWise Workshop (extends through weeks 8 and 9), using student-generated content/test questions (agency)

Students following a traditional approach to assessment used clickers to answer a number of teacher-set multiple-choice questions. The experimental group used PeerWise to create, answer and review multiple-choice questions. Apart from giving and explaining the correct answer to each question they created, students needed to give at least two effective alternative answers. Each student then answered a minimum of 3 different questions created by their classmates, rating and commenting on their perceived difficulty and appropriateness.

Week

Traditional Assessment Group: Online Exercise with Static Content, using static diagrams

Game-Informed Assessment Group: Online Exercise with Animated Content (using animations and simulations), responsive and interactive environment

Both groups took an online exercise involving the use of visual content, related to the question being asked. While the control group was given static content, mainly in the form of figures and diagrams, the exercise for the game-informed assessment group, based on the same content and questions, consisted of animations and simulations which were interactive and responsive to the students’ actions.

Week

Traditional Assessment Group: Non-Adaptive Online Test (MCQs)

Game-Informed Assessment Group: Adaptive Online Test (MCQs), adaptive (responsive, progression)

The traditional assessment group took 2 traditional online tests on ToKToL. The students in the experimental group used the same platform to complete 2 computer-adaptive online tests.

Week

Traditional Assessment Group: Quiz using Clickers (MCQs)

Game-Informed Assessment Group: Quiz using Clickers (MCQs), in twos (collaboration), with hints (just-in-time feedback)

Both groups took a multiple-choice question quiz on the interactive whiteboard using clickers. While the students in the control group took the quiz individually, the game-informed assessment group worked on the quiz collaboratively, in twos or threes, and were given immediate feedback through hints before attempting an answer.

Data Protection

All data collected was encrypted, anonymised, password-protected and digitally stored. Access was restricted to the researcher, dissertation supervisor and the 2 Physics teachers who independently rated the post-intervention test (Wiles et al 2006).

Validity and Reliability

Potential threats to validity were identified during the research design and proposal stage. The following table illustrates the mitigation measures taken in order to limit the effect of possible threats to the research design.

Table 4 Addressing potential threats to validity

Ambiguous temporal precedence

Type of validity: internal validity

Description: ‘any claims of causality in which the putative cause does not unambiguously precede the putative effect is suspect’ (Hedges 2012, p 28)

Mitigation: the intervention preceded any possible causal effect/s (assessment predated the post-intervention test)

Observation effects

Type of validity: internal validity

Description: ‘The act of observing (or measuring or interviewing) changes the phenomenon being observed in substantial ways’ (Hedges 2012, p 29)

Mitigation: both control and experimental groups were offered innovation in terms of technological mediation and the consent approval form was carefully worded in order not to affect the participants’ behaviour

Observer-expectancy effects

Type of validity: internal validity

Description: ‘Bias introduced by the researcher’s expectations of the effects of an experimental treatment’ (Robson 2011, p 525)

Mitigation: procedural objectivity was maximised and confirmation bias minimised, as the post-test was an objectively scored achievement test (Eisner 1992), independently rated by 2 different Physics teachers, other than the researcher

Maturation effects

Type of validity: internal validity

Description: ‘Individuals grow older, wiser and more experienced over time for reasons that have nothing to do with interventions’ (Hedges 2012, p 29)

Mitigation: the control and experimental groups were subject to the same 10-week intervention period

Selection

Type of validity: internal validity

Description: ‘When groups are being compared that are not randomly assigned it is possible that the groups differ in ways other than the putative independent variable which is the presumed cause of group differences in outcomes’ (Hedges 2012, p 29)

Mitigation: although randomisation of the individual students to the participating groups was not possible, the groups themselves were randomly selected from the whole cohort and prior academic performance for each participant was taken as a covariate

Sampling bias

Type of validity: external validity

Description: ‘Representative (that is probability or random) sampling can ensure external validity, but this is seldom a viable option outside survey research’ (Hedges 2012, p 29)

Mitigation: caution in generalising from findings due to the specific target population (15 to 16-year-old males studying Physics in Malta)

Hawthorne and Novelty effects

Type of validity: construct validity of cause and effect

Description: ‘the actual cause of the effects is attention or a change in routine rather than the attributed type of attention or the particular attributed change in routine’ (Hedges 2012, p 29)

Mitigation: all participants were taught by the same teacher and the use of digital technologies for assessment represented an innovation for both groups

Furthermore, both data collection instruments were piloted for face and content validity. Two independent raters categorised the post-test questions, based on Bloom’s taxonomy for the cognitive domain (Bloom 1956). Inter-rater reliability for the different cognitive levels, as measured by the intra-class correlation coefficient (ICC), was good (knowledge: ICC = 0.80; comprehension: ICC = 0.93; application: ICC = 0.83; analysis: ICC = 0.91; synthesis: ICC = 0.87; evaluation: ICC = 0.95). Internal consistency was checked using Cronbach’s alpha (Cronbach’s α = 0.71). Answers to the test were independently marked by 2 Physics teachers, other than the researcher. All written answers to open-ended questions were scored independently. Inter-rater agreement was high (ICC = 0.94).
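For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of how Cronbach’s alpha can be computed from item-level scores. It is not the procedure used in the study (the analysis was carried out in SPSS); the function name cronbach_alpha and the example scores are hypothetical and serve for illustration only.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores holds one list per test item; each inner list contains
    the score of every student on that item, in the same student order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are needed")
    n_students = len(item_scores[0])
    if any(len(item) != n_students for item in item_scores):
        raise ValueError("every item needs one score per student")

    # Total test score of each student across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]

    # Sum of the sample variances of the individual items.
    sum_item_variances = sum(variance(item) for item in item_scores)

    # alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical scores: 2 items answered by 4 students (not study data).
example_items = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
]
alpha = cronbach_alpha(example_items)
print(round(alpha, 2))
```

Alpha approaches 1 as the items vary together more closely; values around 0.7, such as the one reported for the post-test, are commonly treated as acceptable.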

Data Analysis

All data was analysed in SPSS version 19 (IBM 2010). Crosstabulations and chi-square tests for independence were performed on the data from the initial questionnaire, in order to summarise the frequency of use of digital technologies for assessment purposes and to establish any statistically significant differences in responses between the control and experimental groups (Field 2009). Comparing the mean post-intervention test scores of the two groups using an independent-samples t-test, however, would have introduced a source of selection bias: randomisation of the individual participants to the groups was not possible, hence the groups could not be assumed to be ‘probabilistically equivalent’ (Trochim 2001, p 184). Thus, any statistically significant difference between the mean post-intervention test scores attributable to the educational intervention, adjusted for prior academic performance, together with the respective effect size (Cohen 1988), was determined using an analysis of covariance (Field 2009).
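The adjustment for prior academic performance described above can be illustrated with a minimal sketch of a one-way analysis of covariance with a single covariate. This is not the SPSS procedure used in the study. The function name ancova_one_covariate, the use of partial eta squared as the effect size and the example scores are all assumptions made for illustration, and the sketch assumes that the regression slope is the same in both groups. It returns the F statistic without a p-value, which would require the F distribution.

```python
from statistics import fmean


def ancova_one_covariate(groups):
    """One-way ANCOVA with one covariate, assuming equal slopes.

    groups maps a group name to a pair (covariate scores, outcome scores).
    """
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    n_total, k = len(all_x), len(groups)
    grand_x, grand_y = fmean(all_x), fmean(all_y)

    # Total sums of squares and cross-products around the grand means.
    sxx_t = sum((x - grand_x) ** 2 for x in all_x)
    sxy_t = sum((x - grand_x) * (y - grand_y) for x, y in zip(all_x, all_y))
    syy_t = sum((y - grand_y) ** 2 for y in all_y)

    # Within-group sums of squares and cross-products.
    sxx_w = sxy_w = syy_w = 0.0
    means = {}
    for name, (xs, ys) in groups.items():
        mean_x, mean_y = fmean(xs), fmean(ys)
        means[name] = (mean_x, mean_y)
        sxx_w += sum((x - mean_x) ** 2 for x in xs)
        sxy_w += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        syy_w += sum((y - mean_y) ** 2 for y in ys)

    slope = sxy_w / sxx_w                       # pooled within-group slope
    ss_error = syy_w - sxy_w ** 2 / sxx_w       # adjusted error sum of squares
    ss_total = syy_t - sxy_t ** 2 / sxx_t       # adjusted total sum of squares
    ss_group = ss_total - ss_error              # adjusted group effect
    df_group, df_error = k - 1, n_total - k - 1

    return {
        "adjusted_means": {
            name: mean_y - slope * (mean_x - grand_x)
            for name, (mean_x, mean_y) in means.items()
        },
        "f_stat": (ss_group / df_group) / (ss_error / df_error),
        "df": (df_group, df_error),
        "partial_eta_sq": ss_group / (ss_group + ss_error),
    }


# Hypothetical prior-performance and post-test scores (not study data).
example_groups = {
    "control": ([40, 50, 60], [20, 26, 29]),
    "experimental": ([40, 50, 60], [25, 30, 38]),
}
result = ancova_one_covariate(example_groups)
print(result["adjusted_means"], result["df"])
```

In this invented example both groups have the same mean prior performance, so the adjusted means equal the raw post-test means. When the groups differ on the covariate, each group mean is shifted along the pooled slope towards what it would have been at the overall covariate mean, which is the sense in which the comparison is adjusted for prior academic performance.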
