A series of crosstabulations and Chi-Square tests for independence was performed on the questionnaire items to compare how often the 2 research groups had used different digital technologies for assessment purposes before the study, both in Physics and in other subjects. An analysis of covariance (ANCOVA) was then conducted on the post-intervention test scores, with prior academic performance as the covariate (Hedges 2012), to determine whether there were any statistically significant differences in scores between the 2 groups.

# Questionnaire

For the first questionnaire item (the frequency of use of digital technologies for assessment purposes in past Physics lessons), the statistical analysis was performed at two levels. First, a crosstabulation was produced for the digital technologies that all the students (from both groups) marked as ‘never’ used or ‘I don’t know what this is’ (Table 5): virtual learning environments, e-portfolios, blogs, wikis, clickers and web-based systems. No further analysis with Chi-Square tests was carried out for these technologies, since none of the participants had actually used them, leaving no variation in frequency of use to compare between the groups. On average, 49.4% of the students did not know what these technologies were. E-portfolios and web-based assessment systems were unknown to most of the students (93.3% and 73.3% respectively), while 96.7% of the students, although aware of clickers, had never used them for assessment purposes in Physics. No student mentioned any other digital technology used for assessment purposes in past Physics lessons.

**Table 5** Summary of the Crosstabulation outputs for the digital technologies, which were never used by the students prior to the study (SPSS output is given in **Appendix F**)


For the only 2 technologies that were marked as having been used to some extent for assessment in Physics, the statistical analysis also included Chi-Square tests for independence. Responses on the frequency of use of interactive whiteboards were recoded into a 2-point scale so that no more than 20% of the cells had an expected count below 5 (Field 2009). Even so, their use was limited: 43.3% of the students reported using them from ‘weekly to monthly’ and 56.7% from ‘rarely to never’. For the use of online quizzes, no recoding was necessary, as all responses fell into either the ‘rarely’ (36.7%) or the ‘never’ (63.3%) option. The Chi-Square tests for independence showed no statistically significant difference in responses between the control and experimental groups for either interactive whiteboards (χ²(1) = 3.394, p = .065) (Table 6) or online quizzes (χ²(1) = 0.144, p = .705) (Table 7). All expected cell frequencies were greater than five.
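The statistics reported in Tables 6 and 7 can be recomputed from a 2×2 table of counts (rows = groups, columns = response categories). The sketch below uses only the Python standard library; the function name `chi_square_2x2` is hypothetical, and it is not the SPSS routine used in the study. The cell counts in the usage line are not reported in this section: they are the arrangement implied by the interactive-whiteboard figures (15 students per group, 13 vs 17 responses overall, and χ² = 3.394, which requires a 9/4 split of ‘weekly to monthly’ responses). Which group had 9 is an assumption. For 1 degree of freedom the p-value is erfc(√(χ²/2)), since a χ²(1) variable is the square of a standard normal variable.

```python
import math


def chi_square_2x2(table):
    """Chi-square statistics for a 2x2 table [[a, b], [c, d]] (rows = groups)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    # Expected count under independence: row total * column total / N
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    cells = [(table[i][j], expected[i][j]) for i in range(2) for j in range(2)]

    pearson = sum((o - e) ** 2 / e for o, e in cells)
    # Yates continuity correction: reduce |ad - bc| by N/2, never below zero
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    yates = n * diff ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])
    # Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)); empty cells add 0
    lr = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)
    # For a 2x2 table the linear-by-linear statistic is (N - 1) / N * Pearson
    linear = (n - 1) / n * pearson

    def p_df1(stat):
        # Upper-tail p-value of a chi-square variable with 1 degree of freedom
        return math.erfc(math.sqrt(stat / 2))

    return {
        "pearson": pearson, "pearson_p": p_df1(pearson),
        "yates": yates, "yates_p": p_df1(yates),
        "likelihood_ratio": lr, "likelihood_ratio_p": p_df1(lr),
        "linear_by_linear": linear, "linear_by_linear_p": p_df1(linear),
        "min_expected": min(e for _, e in cells),
        # Share of cells with expected count < 5 (rule of thumb: at most 20%)
        "low_expected_share": sum(e < 5 for _, e in cells) / 4,
    }


# Hypothetical counts: rows = groups, columns = ('weekly to monthly', 'rarely to never')
result = chi_square_2x2([[9, 6], [4, 11]])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these counts the function reproduces the Table 6 values to rounding, including the minimum expected count of 6.50. Computing the expected counts first also makes the 20% rule used for recoding easy to check before any test is reported.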

**Table 6** The Chi-Square tests’ results for the use of interactive whiteboards, for assessment purposes in Physics, prior to the study (SPSS output is given in **Appendix G**)


Chi-Square Tests for the use of interactive whiteboards

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | 3.394ᵃ | 1 | .065 | | |
| Continuity Correctionᵇ | 2.172 | 1 | .141 | | |
| Likelihood Ratio | 3.466 | 1 | .063 | | |
| Fisher’s Exact Test | | | | .139 | .070 |
| Linear-by-Linear Association | 3.281 | 1 | .070 | | |
| N of Valid Cases | 30 | | | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.50.

b. Computed only for a 2×2 table.
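Fisher’s exact test conditions on the table margins, so its p-values come from the hypergeometric distribution. A minimal sketch using only `math.comb` follows; the function name `fisher_exact_2x2` is hypothetical. The usage line assumes a hypothetical table consistent with Table 6 (15 students per group, 13 ‘weekly to monthly’ and 17 ‘rarely to never’ responses, split 9 vs 4 across the groups as implied by χ² = 3.394); which group had 9 is not stated in this section. The two-sided value sums the probabilities of all tables no more likely than the observed one, a common convention; other definitions of a two-sided exact p-value exist.

```python
from math import comb


def fisher_exact_2x2(table):
    """Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Returns (two_sided_p, one_sided_p). The one-sided p-value is taken in the
    direction of the observed deviation; the two-sided p-value sums the
    probabilities of every table (with the same margins) that is no more
    likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, row1)
    # Hypergeometric probability of each feasible top-left count k
    probs = {k: comb(col1, k) * comb(n - col1, row1 - k) / denom
             for k in range(lo, hi + 1)}
    p_obs = probs[a]
    if a >= row1 * col1 / n:
        one_sided = sum(p for k, p in probs.items() if k >= a)
    else:
        one_sided = sum(p for k, p in probs.items() if k <= a)
    # Tiny relative tolerance so ties in probability are not lost to rounding
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(two_sided, 1.0), min(one_sided, 1.0)


two_sided, one_sided = fisher_exact_2x2([[9, 6], [4, 11]])
print(f"two-sided p = {two_sided:.3f}, one-sided p = {one_sided:.3f}")
```

Because both groups have 15 students, the hypergeometric distribution here is symmetric, which is why the two-sided value (.139) is exactly twice the one-sided value (.070).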

**Table 7** The Chi-Square tests’ results for the use of online quizzes, for assessment purposes in Physics, prior to the study (SPSS output is given in **Appendix H**)


Chi-Square Tests for the use of online quizzes

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | .144ᵃ | 1 | .705 | | |
| Continuity Correctionᵇ | .000 | 1 | 1.000 | | |
| Likelihood Ratio | .144 | 1 | .705 | | |
| Fisher’s Exact Test | | | | 1.000 | .500 |
| Linear-by-Linear Association | .139 | 1 | .710 | | |
| N of Valid Cases | 30 | | | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 5.50.

b. Computed only for a 2×2 table.
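The continuity-corrected value of .000 (p = 1.000) in Table 7 follows directly from the Yates formula. For a 2×2 table with cells a, b, c, d, row totals R₁, R₂, column totals C₁, C₂ and N cases, every cell differs from its expected count by |ad − bc|/N. A worked sketch, assuming groups of 15 students (the only split consistent with N = 30, 11 ‘rarely’ responses and the minimum expected count of 5.50): the expected ‘rarely’ count per group is 15 × 11/30 = 5.5, so integer counts cannot miss it by less than 0.5, and a deviation of 0.5 gives |ad − bc| = 30 × 0.5 = 15, which reproduces the Pearson value.

$$
\chi^2 = \frac{N\,(ad-bc)^2}{R_1R_2C_1C_2} = \frac{30 \times 15^2}{15 \times 15 \times 11 \times 19} \approx 0.144
$$

$$
\chi^2_{\text{Yates}} = \frac{N\left(|ad-bc| - N/2\right)^2}{R_1R_2C_1C_2} = \frac{30\,(15 - 15)^2}{15 \times 15 \times 11 \times 19} = 0
$$

Since the correction removes exactly this 0.5 discrepancy from each cell, the corrected statistic is zero and its p-value is 1.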

With regard to the use of digital technologies for assessment purposes in other subjects, all participants mentioned the use of clickers for assessment in Information and Communications Technology (ICT). All students in the control group and 53% of the students in the experimental group had also used clickers for assessment during Religious Studies lessons. However, the frequency of clicker use in these subjects ranged from every term to rarely.

# Post-Intervention Test

The ANCOVA adjusted the test scores for each student’s average score in all the standardised Physics examinations taken over the past 2 years, since the students started learning Physics. The ANCOVA was performed at two distinct levels. First, the mean scores obtained by the 2 groups in the post-intervention test were analysed for any statistically significant differences. Second, the test scores were categorised according to the questions set, based on the different levels of Bloom’s taxonomy of educational objectives for the cognitive domain (Bloom 1956), and the mean scores for each category were analysed separately to establish whether there was any statistically significant difference between the 2 groups at that cognitive level. For all statistical tests, an α-level of 0.05 was used (Borenstein 2012). Partial η² was used as the effect size measure, with values from 0.01 to below 0.06 classified as small effects, from 0.06 to below 0.14 as medium effects, and 0.14 or above as large effects (Cohen 1988). All assumptions of ANCOVA were tested and verified both before and after the main analysis (Field 2009) (**Appendix I**).
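As a rough illustration of how the covariate adjustment works, the sketch below implements a textbook one-way ANCOVA with a single covariate in plain Python. It is not the SPSS procedure used in this study; the function name `one_way_ancova` and the toy data (covariate = a hypothetical prior average, outcome = a hypothetical post-test score) are illustrative assumptions. It assumes homogeneity of regression slopes (no group × covariate interaction). Under that model, the adjusted group sum of squares computed below equals the Type III sum of squares for the group factor.

```python
def one_way_ancova(groups):
    """One-way ANCOVA with a single covariate (textbook sums-of-squares form).

    groups: dict mapping group name -> list of (covariate, outcome) pairs.
    Assumes homogeneity of regression slopes (no group x covariate interaction).
    """
    def summarise(pairs):
        n = len(pairs)
        mx = sum(x for x, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        syy = sum((y - my) ** 2 for _, y in pairs)
        return n, mx, my, sxx, sxy, syy

    per_group = {name: summarise(pairs) for name, pairs in groups.items()}
    n_total, grand_mx, _, txx, txy, tyy = summarise(
        [pair for pairs in groups.values() for pair in pairs])

    # Pooled within-group sums of squares and cross-products
    exx = sum(s[3] for s in per_group.values())
    exy = sum(s[4] for s in per_group.values())
    eyy = sum(s[5] for s in per_group.values())

    b_within = exy / exx                  # common (pooled) regression slope
    ss_error = eyy - exy ** 2 / exx       # within-group outcome SS left after the covariate
    ss_group = (tyy - txy ** 2 / txx) - ss_error   # covariate-adjusted group SS
    df_group, df_error = len(groups) - 1, n_total - len(groups) - 1

    f_value = (ss_group / df_group) / (ss_error / df_error)
    return {
        "F": f_value,
        "df": (df_group, df_error),
        "partial_eta_sq": ss_group / (ss_group + ss_error),
        "slope": b_within,
        # Adjusted mean: raw mean moved along the pooled slope to the grand covariate mean
        "adjusted_means": {name: s[2] - b_within * (s[1] - grand_mx)
                           for name, s in per_group.items()},
    }


# Hypothetical toy data: (prior average, post-test score) per student
toy = {
    "control": [(1, 3), (2, 4), (3, 8)],
    "experimental": [(2, 7), (3, 10), (4, 10)],
}
print(one_way_ancova(toy))
```

For the toy data the pooled slope is 2, so the control mean of 5 is adjusted up to 6 (its covariate mean lies below the grand mean of 2.5) and the experimental mean of 9 is adjusted down to 8. This is the same mechanism by which prior academic performance shifts the post-intervention means in Figure 20.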

The adjusted mean for the control group was 45.3% (±2.679 SE), while that for the experimental group was 51.5% (±2.679 SE). Figure 20 illustrates the original post-intervention score means with their respective standard deviations, and the adjusted means with their respective standard errors.
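Both adjusted means carry the same standard error (±2.679). A sketch of why, using the standard one-covariate ANCOVA formulas (the notation is introduced here, not taken from the study): the adjusted mean of group j shifts its raw mean by the pooled within-group slope b_w times the group’s distance from the grand covariate mean, and its standard error depends on the error mean square, the group size n_j and the within-group sum of squares of the covariate, SS_X,w.

$$
\bar{Y}'_j = \bar{Y}_j - b_w\left(\bar{X}_j - \bar{X}\right),
\qquad
SE\left(\bar{Y}'_j\right) = \sqrt{MS_{\text{error}}\left(\frac{1}{n_j} + \frac{\left(\bar{X}_j - \bar{X}\right)^2}{SS_{X,w}}\right)}
$$

With two groups of 15 (as implied by the expected counts in Tables 6 and 7), the grand covariate mean lies midway between the two group means, so both groups are the same distance from it and their standard errors coincide. Taking MS_error = 106.547 from Table 8, the first term alone gives √(106.547/15) ≈ 2.665; the slightly larger reported value of 2.679 reflects the small covariate term.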

**Figure 20** The original and adjusted post-intervention score means

After controlling for prior academic performance, there was no statistically significant difference between the 2 groups in the overall post-intervention test scores, *F*(1,27) = 2.663, p = .114, partial η² = 0.090 (Table 8). The partial η² indicates that about 9% of the variance in post-test scores not explained by prior academic performance is associated with the intervention, a medium effect size (Richardson 2011).

**Table 8** ANCOVA results on the overall post-intervention test scores

**Overall Mean Score**

*F*(1,27) = 2.663, p > 0.05, partial η² = 0.090

- Control Group Adjusted Mean (±2.679) 45.292%

- Experimental Group Adjusted Mean (±2.679) 51.508%

Dependent Variable: post-intervention test score

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared |
|---|---|---|---|---|---|---|
| Corrected Model | 1254.426ᵃ | 2 | 627.213 | 5.887 | .008 | .304 |
| Intercept | 166.663 | 1 | 166.663 | 1.564 | .222 | .055 |
| prior_academic_performance | 805.892 | 1 | 805.892 | 7.564 | .010 | .219 |
| group | 283.711 | 1 | 283.711 | 2.663 | .114 | .090 |
| Error | 2876.774 | 27 | 106.547 | | | |
| Total | 74408.000 | 30 | | | | |
| Corrected Total | 4131.200 | 29 | | | | |

a. R Squared = .304 (Adjusted R Squared = .252)
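The derived columns of Table 8 can be recomputed from its sums of squares, which gives a quick consistency check on the reported output. In the sketch below (the helper name `anova_row` is hypothetical), each F is the effect mean square divided by the error mean square (2876.774 on 27 df), partial η² is SS_effect / (SS_effect + SS_error), and R² is the corrected-model sum of squares over the corrected total. The adjusted R² uses N = 30 and 2 predictors (the covariate and the group).

```python
# Sums of squares and degrees of freedom copied from Table 8
SS_ERROR, DF_ERROR = 2876.774, 27
SS_CORRECTED_TOTAL, N = 4131.200, 30
EFFECTS = {
    "Corrected Model": (1254.426, 2),
    "Intercept": (166.663, 1),
    "prior_academic_performance": (805.892, 1),
    "group": (283.711, 1),
}


def anova_row(ss_effect, df_effect, ss_error=SS_ERROR, df_error=DF_ERROR):
    """Return (mean square, F, partial eta squared) for one ANCOVA source."""
    ms_effect = ss_effect / df_effect
    f_value = ms_effect / (ss_error / df_error)
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    return ms_effect, f_value, partial_eta_sq


rows = {name: anova_row(ss, df) for name, (ss, df) in EFFECTS.items()}

# R squared and adjusted R squared for the model with 2 predictors
n_predictors = 2
r_sq = EFFECTS["Corrected Model"][0] / SS_CORRECTED_TOTAL
adj_r_sq = 1 - (1 - r_sq) * (N - 1) / (N - n_predictors - 1)

for name, (ms, f, eta) in rows.items():
    print(f"{name:28s} MS={ms:9.3f}  F={f:6.3f}  partial eta^2={eta:.3f}")
print(f"R^2={r_sq:.3f}  adjusted R^2={adj_r_sq:.3f}")
```

Note that the Type III sums of squares for the covariate and the group do not add up to the corrected-model sum of squares; each is the contribution of that term over and above the other.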

The same procedure was repeated for the test scores categorised according to the levels of Bloom’s taxonomy of educational objectives for the cognitive domain (Bloom 1956). No significant differences were found between the groups’ adjusted mean post-intervention test scores at the knowledge, comprehension and application levels. However, the game-informed assessment (experimental) group had significantly higher adjusted mean scores than the traditional assessment (control) group at the analysis, synthesis and evaluation levels. Table 9 summarises the ANCOVA results on the post-intervention test scores for the different cognitive levels (Bloom 1956).
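One way to turn a marked test into per-level scores is to pool the marks of all questions assigned to each Bloom level and express them as a percentage of the marks available at that level. The sketch below assumes this approach; the question IDs, level assignments and mark allocations in `QUESTION_LEVELS` are hypothetical, since the actual test items are not given in this section.

```python
from collections import defaultdict

# Hypothetical mapping: question ID -> (Bloom level, marks available).
# The real test items and mark allocations are not given in this section.
QUESTION_LEVELS = {
    "Q1": ("knowledge", 4),
    "Q2": ("comprehension", 6),
    "Q3": ("application", 5),
    "Q4": ("analysis", 10),
}


def level_percentages(marks_obtained, question_levels=QUESTION_LEVELS):
    """Convert one student's marks per question into a percentage per Bloom level."""
    earned = defaultdict(float)
    available = defaultdict(float)
    for question, (level, max_marks) in question_levels.items():
        earned[level] += marks_obtained.get(question, 0)
        available[level] += max_marks
    return {level: 100 * earned[level] / available[level] for level in available}


# One hypothetical student's marks
student = {"Q1": 3, "Q2": 3, "Q3": 5, "Q4": 5}
print(level_percentages(student))
```

Each level’s percentages could then serve as the outcome in a separate ANCOVA, with prior academic performance as the covariate.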

**Table 9** Summary of ANCOVA on the post-intervention test scores for the different cognitive levels, categorised according to Bloom’s taxonomy of educational objectives (Bloom 1956)

**Knowledge Level**

*F*(1,27) = 0.166, p > 0.05, partial η² = 0.006

- Control Group Adjusted Mean (±2.776) 82.8%

- Experimental Group Adjusted Mean (±2.776) 81.2%

**Comprehension Level**

*F*(1,27) = 0.013, p > 0.05, partial η² = 0.000

- Control Group Adjusted Mean (±5.048) 36.1%

- Experimental Group Adjusted Mean (±5.048) 36.9%

**Application Level**

*F*(1,27) = 0.002, p > 0.05, partial η² = 0.000

- Control Group Adjusted Mean (±3.839) 30.2%

- Experimental Group Adjusted Mean (±3.839) 30.0%

**Analysis Level**

*F*(1,27) = 4.404, p < 0.05, partial η² = 0.140

- Control Group Adjusted Mean (±5.456) 36.5%

- Experimental Group Adjusted Mean (±5.456) 52.7%

**Synthesis Level**

*F*(1,27) = 4.707, p < 0.05, partial η² = 0.148

- Control Group Adjusted Mean (±4.788) 20.7%

- Experimental Group Adjusted Mean (±4.788) 35.5%

**Evaluation Level**

*F*(1,27) = 4.584, p < 0.05, partial η² = 0.145

- Control Group Adjusted Mean (±6.099) 20.8%

- Experimental Group Adjusted Mean (±6.099) 39.3%
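Because every test in Tables 8 and 9 has 1 and 27 degrees of freedom, each partial η² can be recovered from its F ratio alone, since partial η² = F·df_effect / (F·df_effect + df_error). The sketch below applies this identity and labels each value using Cohen’s (1988) benchmarks of .01, .06 and .14; the helper names are hypothetical, and the ‘negligible’ label for values below .01 is an assumption added for illustration.

```python
# F values from Table 9 (and the overall test in Table 8); all tests have df = (1, 27)
F_VALUES = {
    "Overall": 2.663,
    "Knowledge": 0.166,
    "Comprehension": 0.013,
    "Application": 0.002,
    "Analysis": 4.404,
    "Synthesis": 4.707,
    "Evaluation": 4.584,
}
DF_EFFECT, DF_ERROR = 1, 27


def partial_eta_squared(f_value, df_effect=DF_EFFECT, df_error=DF_ERROR):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def effect_label(eta_sq):
    """Classify with Cohen's (1988) benchmarks of .01, .06 and .14.

    The 'negligible' label for values below .01 is an assumption added here.
    """
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"


for level, f in F_VALUES.items():
    eta = partial_eta_squared(f)
    print(f"{level:14s} F(1,27)={f:6.3f}  partial eta^2={eta:.3f}  ({effect_label(eta)})")
```

Rounded to three decimals, the recovered values match the partial η² figures reported in Tables 8 and 9.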
