


Summary

The studies on the comparability of assessment conducted by the Educational Research Institute are consistent with a range of activities undertaken by the Central Examination Board (Centralna Komisja Egzaminacyjna - CKE) to ensure reliable assessment of responses to constructed-response exam items. Assessment of responses to such open questions is inherently burdened with the rater effect, which results from the fact that assessment is conducted by humans, with all their knowledge, experience, beliefs, empathy and other individual characteristics. Differences in exam results therefore depend not only on the varying level of the skills examined, but also on the rater. In humanities exams, where writing an essay is required, a significant portion of the variation in exam results is generated by effects associated with the personal characteristics of the rater (in this report generally referred to as the rater effect). In the studies presented here, the effect was analysed for two major Matura exam subjects, Polish language and mathematics, which are susceptible to these factors in completely different ways.

The chapters of this report present, in turn, the studies in this area conducted in Poland to date, the main methodological assumptions, the specification of the hierarchical rater model used in the analyses, and the study results. The main value of the report is the search for solutions designed to minimise the rater effect on the exam results of students sitting the exam.

The 2011 and 2012 exams were selected for the rater effect study. For practical reasons, exam papers were drawn from two Regional Examination Boards (Okręgowa Komisja Egzaminacyjna - OKE): Jaworzno and Kraków. It was important to collect a set of papers with the following characteristics:

1.    diversity of the level of examined skills across the entire scale used in the study;

2.    for the Polish language - diversity of essay topics for the analysis of the rater-essay topic interaction.

The studies were conducted on a representative sample of raters involved in the assessment of Matura exams, stratified by Regional Examination Board (and, in the case of the Polish language Matura exam, supplemented with teachers who are not raters and with students). A complex model was used to link all the papers and raters from the eight OKE districts, so that each essay from the Polish language exam was assessed by one rater from each OKE. In the case of the Matura exam in mathematics, for which papers at the basic and extended level were selected, each paper was assessed by four raters (each from a different OKE).
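The linking idea for the Polish language essays (one rater from each OKE per essay) can be illustrated with a minimal sketch. This is not the design used in the study, which the report describes only as a complex model; the board names, rater labels and the round-robin rule below are assumptions made purely for illustration.

```python
from itertools import cycle


def assign_raters(paper_ids, raters_by_oke):
    """Assign each paper to exactly one rater from every board (OKE)."""
    # One round-robin iterator per board spreads papers evenly over its raters.
    rotations = {oke: cycle(raters) for oke, raters in raters_by_oke.items()}
    return {
        paper: {oke: next(rotation) for oke, rotation in rotations.items()}
        for paper in paper_ids
    }


# Hypothetical boards and raters: eight boards with three raters each.
raters_by_oke = {
    f"OKE{i}": [f"OKE{i}-R{j}" for j in range(1, 4)] for i in range(1, 9)
}
plan = assign_raters([f"essay{k}" for k in range(1, 7)], raters_by_oke)
```

Because every essay is scored in every board, differences between boards can be compared on the same set of papers rather than on different student populations.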

Assessment was conducted in conditions as close as possible to those of the main Matura exam session (held in May). An analogous assessment coordination structure was applied, the teams were located in the same cities, and the assessment criteria and schemes were left unchanged.

Results of the analyses

The studies have shown that raters have a significant influence on assessment results, both for the Polish language and for the mathematics Matura exam. In the case of the Polish language, this influence is clearly stronger. The difference in leniency at the level of the test between the 25% most lenient raters and the 25% most stringent raters for the Polish language ranges (depending on the year and essay topic) from 3.1 to 3.7 percentage points of the exam result. For mathematics, these differences range from 0.87 to 1.36 percentage points of the exam result². This result is not surprising. As research shows, rating essays will always be subjective to some extent. Reaching full rater agreement is utopian. Demanding 100% rater agreement leads to petrification of criterion-referenced assessment. As a consequence, it can have a negative influence on teaching the important skill of writing essays.
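The quartile comparison above can be sketched as follows. The sketch assumes that a leniency estimate per rater is already available (in the report these come from the hierarchical rater model, which is not reproduced here) and that the reported figure is the difference between the group means; both the function name and the sample values are hypothetical.

```python
from statistics import mean


def quartile_leniency_gap(leniency):
    """Mean leniency of the most lenient quarter minus that of the most stringent quarter."""
    ordered = sorted(leniency)
    k = max(1, len(ordered) // 4)  # number of raters in each 25% group
    return mean(ordered[-k:]) - mean(ordered[:k])


# Hypothetical rater leniency estimates, in percentage points of the exam result.
example = [-2.0, -1.5, -0.5, 0.0, 0.0, 0.5, 1.5, 2.0]
gap = quartile_leniency_gap(example)
```

With these made-up values the two groups contain two raters each, so the gap is the difference between the means of the two highest and the two lowest estimates.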

Differences in the leniency of assessment occur not only at the individual level, but also at the level of Regional Examination Boards. In the case of the Polish language, in some OKEs these differences are systematic and are present in all the essay topics examined. The average difference between the most stringent and the most lenient board was 1.6 percentage points of the exam result. Differences in students' exam results at the OKE level show that the effect of raters' leniency explains a large portion of this diversity (on average, across all topics, 11% of the variance in results is explained). In the case of mathematics, differences at the OKE level also occur, but

Result for the entire exam, including short-answer tasks and an essay for the Polish language, as well as closed and open tasks for mathematics.


