Criteria levels of performance
The required level of performance for success should be specified. This may involve a simple statement to the effect that, to demonstrate “mastery”, 80% of the items must be responded to correctly.
Scoring procedures
If an objective is tested by more than one item (say, five items) then it is possible to speak of mastery of the objective. If somebody gets four of the five items right, the person has displayed 80% mastery of the objective, according to the test. The test may be a series of such items.
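The arithmetic behind such a mastery statement can be sketched in a few lines of Python. This is only an illustration: the function names and the default 80% criterion are assumptions made for the example, not part of any standard scoring procedure.

```python
def mastery_percentage(responses):
    """Return the percentage of items answered correctly for one objective.

    `responses` is a list of booleans, one per item (True = correct).
    """
    if not responses:
        raise ValueError("at least one item is required")
    return 100 * sum(responses) / len(responses)


def has_mastered(responses, criterion=80):
    """Check the result against a criterion level (assumed here to be 80%)."""
    return mastery_percentage(responses) >= criterion


# Five items testing one objective; four answered correctly.
responses = [True, True, False, True, True]
print(mastery_percentage(responses))  # 80.0
print(has_mastered(responses))        # True
```

A test made up of several objectives would repeat this calculation once per objective.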
The test constructor has to consider how long it will take to score particular types of items. The more objective the item, the higher the scorer reliability is likely to be (i.e., the likelihood that two different scorers would come up with the same score for a particular respondent's test). Machine scoring involves separate answer sheets.
The following general principles must be observed when constructing a language test:
(1) The principle of validity - i.e., making sure that the measurements and assessments we obtain reflect what we want them to reflect. A number of different statistical procedures can be applied to a test to estimate its validity. Such procedures generally seek to determine what the test measures, and how well it does so. But ultimately, validity can only be established by observation and theoretical justification.
(2) The principle of scope - i.e., making sure that we measure or assess all the varied components of foreign language competence and skills.
(3) The principle of efficiency - i.e., obtaining the best assessments we can obtain within the limits of time and resources available for the construction and administration of the assessments.
(4) The principle of reliability - a measure of the degree to which a test gives consistent results. A test is said to be reliable if it gives the same results when it is given on different occasions or it is used by different people. Scorer reliability is the consistency of scoring by two or more scorers. If very subjective techniques are employed in the scoring of a test, one would not expect to find high scorer reliability.
If the above axiomatic principles are carefully met, a test should then be administrable within given constraints, be dependable, and actually measure what it intends to measure.
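Scorer reliability, as described under principle (4), can be given a rough numerical form. The sketch below uses simple percent agreement between two scorers. This is one possible index chosen for illustration, not a procedure prescribed here, and the function name and sample scores are invented for the example.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of test papers on which two scorers awarded the same score.

    `scores_a` and `scores_b` hold the scores given by each scorer to the
    same papers, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both scorers must have marked the same papers")
    if not scores_a:
        raise ValueError("at least one paper is required")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100 * matches / len(scores_a)


# Hypothetical scores awarded by two scorers to the same five papers.
scorer_1 = [4, 3, 5, 2, 4]
scorer_2 = [4, 3, 4, 2, 4]
print(percent_agreement(scorer_1, scorer_2))  # 80.0
```

Percent agreement ignores agreement that arises by chance, so it should be read as a first approximation only.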
6.5. Test items
A test item is a question or element in a test which requires an answer or response. Several different types of test item are commonly used in language tests, including:
(1) alternate response item: one in which a correct response must be chosen from two alternatives, such as True/False, Yes/No, or A/B,
(2) fixed response item, also closed-ended response: one in which the correct answer must be chosen from among several alternatives. A multiple-choice item is an example,
(3) free response item, also open-ended response: one in which the student is free to answer a question as he/she wishes without having to choose from among alternatives provided,
(4) structured response item: one in which some control or guidance is given for the answer, but the students must contribute something of their own, e.g. after a reading passage, a comprehension question. (Richards 1992:377)
To illustrate the above, some of the most common types of test items will be briefly considered:
- True-false tests involve the acceptance or rejection of a statement or utterance heard or read. They are useful as tests of listening or reading comprehension, or of knowledge of historical,