A series of crosstabulations and Chi-Square tests for independence were performed on the questionnaire items to compare, across the 2 research groups, how frequently the different digital technologies had been used for assessment purposes prior to the study, both in Physics and in other subjects. An analysis of covariance (ANCOVA), which controlled for the influence of the covariate (Hedges 2012), defined as prior academic performance, was conducted on the post-intervention test scores to determine whether there were any statistically significant differences between the 2 groups.

Questionnaire

For the first item on the questionnaire (about the frequency of use of digital technologies for assessment purposes in past Physics lessons), the statistical analysis was performed on 2 levels. A crosstabulation was produced for those digital technologies which all the students (from both groups) marked as ‘never’ used or ‘I don’t know what this is’ (Table 5). These include virtual learning environments, e-portfolios, blogs, wikis, clickers and web-based systems. No further statistical analysis involving Chi-Square tests was necessary for these technologies: since none of the participants had ever used them, there was not enough variance in the responses to compare frequency of use. On average, 49.4% of the students did not know what these technologies involved. E-portfolios and web-based assessment systems were unknown to most of the students (93.3% and 73.3% respectively), while 96.7% of the students, although aware of clickers, had never used them for assessment purposes in Physics. No student mentioned any other digital technology used for assessment purposes in past Physics lessons.

Table 5 Summary of the Crosstabulation outputs for the digital technologies that were never used by the students prior to the study (SPSS output is given in Appendix F)


For the only 2 technologies marked as having been used to some extent for assessment in Physics, the statistical analysis also included Chi-square tests for independence. For the ‘use of interactive whiteboards’, the data were recoded onto a 2-point scale so that no more than 20% of the cells would have an expected count below 5 (Field 2009). The results show that the frequency of use was nonetheless limited: ‘weekly to monthly’ (43.3%) and ‘rarely to never’ (56.7%). For the ‘use of online quizzes’, no recoding was necessary as all responses fell into either the ‘rarely’ (36.7%) or ‘never’ (63.3%) options. The Chi-square tests for independence show no statistically significant difference in the responses between the control and experimental groups, either for interactive whiteboards (χ²(1) = 3.394, p = .065) (Table 6) or for online quizzes (χ²(1) = 0.144, p = .705) (Table 7). All expected cell frequencies were greater than five.

Table 6 The Chi-Square tests’ results for the use of interactive whiteboards, for assessment purposes in Physics, prior to the study (SPSS output is given in Appendix G)

Chi-Square Tests for the use of interactive whiteboards

| Test | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
| --- | --- | --- | --- | --- | --- |
| Pearson Chi-Square | 3.394 (a) | 1 | .065 | | |
| Continuity Correction (b) | 2.172 | 1 | .141 | | |
| Likelihood Ratio | 3.466 | 1 | .063 | | |
| Fisher’s Exact Test | | | | .139 | .070 |
| Linear-by-Linear Association | 3.281 | 1 | .070 | | |
| N of Valid Cases | 30 | | | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.50.
b. Computed only for a 2×2 table.

Table 7 The Chi-Square tests’ results for the use of online quizzes, for assessment purposes in Physics, prior to the study (SPSS output is given in Appendix H)

Chi-Square Tests for the use of online quizzes

| Test | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
| --- | --- | --- | --- | --- | --- |
| Pearson Chi-Square | .144 (a) | 1 | .705 | | |
| Continuity Correction (b) | .000 | 1 | 1.000 | | |
| Likelihood Ratio | .144 | 1 | .705 | | |
| Fisher’s Exact Test | | | | 1.000 | .500 |
| Linear-by-Linear Association | .139 | 1 | .710 | | |
| N of Valid Cases | 30 | | | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 5.50.
b. Computed only for a 2×2 table.
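The Pearson and continuity-corrected statistics in Tables 6 and 7 can be checked by hand from a 2×2 table of counts. The sketch below is illustrative only: the cell counts are not reported in the text and are an assumption, back-calculated from N = 30, the reported percentages and the minimum expected counts (which imply 15 students per group). Which group corresponds to which row cannot be recovered from the summary and is chosen arbitrarily here. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns the statistic, its continuity-corrected (Yates) version, the
    asymptotic two-sided p-value for df = 1 and the smallest expected count.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    chi2_yates = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and response.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            deviation = abs(observed - expected)
            chi2 += deviation ** 2 / expected
            chi2_yates += max(deviation - 0.5, 0.0) ** 2 / expected

    # For df = 1, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "chi2": chi2,
        "chi2_yates": chi2_yates,
        "p_value": p_value,
        "min_expected": min_expected,
    }


# ASSUMED counts (reconstructed, not reported in the study).
# Rows: the two research groups (row order is arbitrary).
# Columns for whiteboards: 'weekly to monthly', 'rarely to never'.
whiteboard_counts = [[9, 6], [4, 11]]
# Columns for online quizzes: 'rarely', 'never'.
quiz_counts = [[6, 9], [5, 10]]

for label, counts in [("whiteboards", whiteboard_counts), ("quizzes", quiz_counts)]:
    result = chi_square_2x2(counts)
    print(label, {key: round(value, 3) for key, value in result.items()})
```

Under these assumed counts, the Pearson values, the continuity-corrected values, the asymptotic significance levels and the minimum expected counts agree with Tables 6 and 7 to three decimal places. The likelihood ratio and Fisher’s exact test are not reproduced here.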

 

With regard to the use of digital technologies for assessment purposes in other subjects, all participants mentioned the use of clickers in Information and Communications Technology (ICT). All students in the control group and 53% of the students in the experimental group had also used clickers in assessment practices during Religious Studies lessons. However, the frequency of clicker use in these subjects varied from every term to rarely.

Post-Intervention Test

The ANCOVA adjusted the test scores according to the average score each student obtained across all the standardised Physics examinations taken in the past 2 years, that is, since they started learning Physics. The ANCOVA was performed at two distinct levels. First, the mean scores obtained by the 2 groups in the post-intervention test were analysed for any statistically significant difference. Second, the test scores were categorised according to the questions set, which were based on the different levels of Bloom’s taxonomy of educational objectives for the cognitive domain (Bloom 1956). The mean scores for each category were then analysed separately in order to establish any statistically significant difference between the 2 groups at the particular cognitive level being investigated. For all statistical tests, an α-level of 0.05 was used (Borenstein 2012). Partial η² was used as the effect size measure, with values of 0.01-0.05 classified as small effects, 0.06-0.13 as medium effects and 0.14 or above as large effects (Cohen 1988). All assumptions of ANCOVA were tested and justified both prior to and following the actual analysis (Field 2009) (Appendix I).
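As a reading aid, a one-covariate ANCOVA model consistent with the reported degrees of freedom (30 students, 2 groups and 1 covariate, hence 27 error degrees of freedom) can be sketched as follows. The notation is illustrative and not taken from the study: Y_ij is the post-intervention score of student j in group i, X_ij is that student's prior academic performance, τ_i is the group effect and β is the regression slope assumed common to both groups.

```latex
\begin{align*}
Y_{ij} &= \mu + \tau_i + \beta\,(X_{ij} - \bar{X}) + \varepsilon_{ij} \\
\bar{Y}_i^{\text{adj}} &= \bar{Y}_i - \hat{\beta}\,(\bar{X}_i - \bar{X}) \\
F &= \frac{SS_{\text{group}} / 1}{SS_{\text{error}} / (N - 3)}, \qquad
\eta_p^2 = \frac{SS_{\text{group}}}{SS_{\text{group}} + SS_{\text{error}}}
\end{align*}
```

The second line is the usual definition of an adjusted mean: each group's observed mean is shifted by the amount its mean covariate value differs from the overall covariate mean.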


The adjusted mean for the control group was found to be 45.3% (±2.679), while that for the experimental group was 51.5% (±2.679). Figure 20 illustrates the original post-intervention mean scores with their respective standard deviations and the adjusted means with their respective standard errors.

Figure 20 The original and adjusted post-intervention score means

After controlling for prior academic performance, there was no statistically significant difference between the 2 groups in the overall post-intervention test scores, F(1,27) = 2.663, p = .114, partial η² = 0.090 (Table 8). Although the difference is not statistically significant, the partial η² indicates that about 9% of the variance in post-test scores not accounted for by prior academic performance is associated with the intervention, which corresponds to a medium effect size (Richardson 2011).

Table 8 ANCOVA results on the overall post-intervention test scores

Overall Mean Score

F(1,27) = 2.663, p > 0.05, partial η² = 0.090

  • Control Group Adjusted Mean (±2.679) 45.292%
  • Experimental Group Adjusted Mean (±2.679) 51.508%

 

Dependent Variable: post-intervention test score

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared |
| --- | --- | --- | --- | --- | --- | --- |
| Corrected Model | 1254.426 (a) | 2 | 627.213 | 5.887 | .008 | .304 |
| Intercept | 166.663 | 1 | 166.663 | 1.564 | .222 | .055 |
| prior_academic_performance | 805.892 | 1 | 805.892 | 7.564 | .010 | .219 |
| group | 283.711 | 1 | 283.711 | 2.663 | .114 | .090 |
| Error | 2876.774 | 27 | 106.547 | | | |
| Total | 74408.000 | 30 | | | | |
| Corrected Total | 4131.200 | 29 | | | | |

a. R Squared = .304 (Adjusted R Squared = .252)
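The F ratios and partial η² values in the SPSS output follow directly from the sums of squares, which can be verified with a few lines of arithmetic. The sketch below uses only the figures printed in the table above. The function names are hypothetical and the formulas are the standard definitions, not code from the study.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Share of effect-plus-error variance that is attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic: mean square of the effect over mean square of the error."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# Values copied from the ANCOVA table (Type III sums of squares).
SS_ERROR, DF_ERROR = 2876.774, 27
SS_MODEL, SS_CORRECTED_TOTAL = 1254.426, 4131.200
effects = {
    "prior_academic_performance": (805.892, 1),
    "group": (283.711, 1),
}

for name, (ss, df) in effects.items():
    f_value = f_ratio(ss, df, SS_ERROR, DF_ERROR)
    eta_p2 = partial_eta_squared(ss, SS_ERROR)
    print(f"{name}: F({df},{DF_ERROR}) = {f_value:.3f}, partial eta squared = {eta_p2:.3f}")

# R squared is the corrected-model sum of squares over the corrected total.
r_squared = SS_MODEL / SS_CORRECTED_TOTAL
print(f"R squared = {r_squared:.3f}")
```

Rounded to three decimals, these calculations give the same F, partial η² and R² values as the table.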

The same procedure was repeated for the test scores categorised according to the levels of Bloom’s taxonomy of educational objectives for the cognitive domain (Bloom 1956). No statistically significant differences were found between the groups’ adjusted mean post-intervention test scores at the knowledge, comprehension and application levels. However, the game-informed assessment group had significantly higher adjusted mean scores than the traditional assessment group at the analysis, synthesis and evaluation levels. Table 9 gives a summary of the ANCOVA on the post-intervention test scores for the different cognitive levels (Bloom 1956).

Table 9 Summary of ANCOVA on the post-intervention test scores for the different cognitive levels, categorised according to Bloom’s taxonomy of educational objectives (Bloom 1956)

Knowledge Level

F(1,27) = 0.166, p > 0.05, partial η² = 0.006

  • Control Group Adjusted Mean (±2.776) 82.8%
  • Experimental Group Adjusted Mean (±2.776) 81.2%

Comprehension Level

F(1,27) = 0.013, p > 0.05, partial η² = 0.000

  • Control Group Adjusted Mean (±5.048) 36.1%
  • Experimental Group Adjusted Mean (±5.048) 36.9%

Application Level

F(1,27) = 0.002, p > 0.05, partial η² = 0.000

  • Control Group Adjusted Mean (±3.839) 30.2%
  • Experimental Group Adjusted Mean (±3.839) 30.0%

Analysis Level

F(1,27) = 4.404, p < 0.05, partial η² = 0.140

  • Control Group Adjusted Mean (±5.456) 36.5%
  • Experimental Group Adjusted Mean (±5.456) 52.7%

Synthesis Level

F(1,27) = 4.707, p < 0.05, partial η² = 0.148

  • Control Group Adjusted Mean (±4.788) 20.7%
  • Experimental Group Adjusted Mean (±4.788) 35.5%

Evaluation Level

F(1,27) = 4.584, p < 0.05, partial η² = 0.145

  • Control Group Adjusted Mean (±6.099) 20.8%
  • Experimental Group Adjusted Mean (±6.099) 39.3%
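For a single-degree-of-freedom effect, partial η² can also be recovered from the F statistic and its degrees of freedom, which gives a quick consistency check on Table 9. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the classification treats the lower bound of each band quoted from Cohen (1988) as a cut-off (0.01, 0.06, 0.14), so values falling between the quoted bands are assigned to the lower category.

```python
def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def classify_effect(eta_p2):
    """Label an effect size using the lower bounds 0.01, 0.06 and 0.14.

    ASSUMPTION: the lower bound of each band is used as the cut-off.
    """
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


# F(1,27) values as reported for the overall score and each cognitive level.
reported_f = {
    "overall": 2.663,
    "knowledge": 0.166,
    "comprehension": 0.013,
    "application": 0.002,
    "analysis": 4.404,
    "synthesis": 4.707,
    "evaluation": 4.584,
}

for level, f_value in reported_f.items():
    eta_p2 = partial_eta_squared_from_f(f_value, df_effect=1, df_error=27)
    print(f"{level}: partial eta squared = {eta_p2:.3f} ({classify_effect(eta_p2)})")
```

Because the inputs are the rounded F values, a result that sits on a boundary (such as the analysis level at about 0.140) should be read with that rounding in mind.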
