
Information for UTS staff on Assessment
Compiled by Keith Trigwell for the
Working Party on Assessment, Autumn Semester, January 1992.
Composition of the Working Party: Professor
Brian Low (Chair), Pro-Vice-Chancellor (Academic Support); Professor
Christine E Deer, School of Teacher Education; Rolf Kater, Student
Association; Associate Professor Peter Logan, School of Physical
Sciences; Dr Keith Trigwell, Centre for Learning and Teaching.
Contents
A Note for UTS Staff
ASSESSMENT ISSUES
Why Assess Students?
Assessment and Learning
Over-assessment
Statistical Issues in Assessment
Other Issues
FORMS OF ASSESSMENT
Multiple-choice Questions
Essays
Short Answer Questions
Alternative Examinations
Learning Contracts
Peer Assessment
Oral (Viva voce) Examinations
Seminar Presentations
Other Forms of Assessment
ASSESSING PARTICULAR LEARNING EXPERIENCES
Assessing Work Experience
Assessing Laboratory Work
Projects and Group Work
Assessing Problem-based Learning
Assessing Mastery Learning
COMPLETE ASSESSMENT PACKAGES
KEY REFERENCES
A Note for UTS Staff
This publication has been prepared for UTS staff by the Working Party on Assessment. It is presented initially as part of the Report from the Working Party to Academic Board, but is intended for use as a stand-alone resource book for UTS staff.
The contents draw heavily on previous writings on assessment, on the assessment practices of some sections of UTS, on contributions from staff of UTS, and on submissions to the Working Party. These contributions are acknowledged within the text.
This document should also be considered in conjunction with two publications which overlap with and extend the issues addressed here. Both of these publications form a part of this information package on assessment and a copy of each is being supplied to each School. They are:
Andresen, L., Nightingale, P., Boud, D. and Magin, D. (1989). Strategies for Assessing Students (Teaching with Reduced Resources, No. 1.) Professional Development Centre, University of NSW.
Gibbs, G., Habeshaw, S. and Habeshaw, T. (1986). 53 Interesting Ways to Assess Your Students. Technical and Educational Services: Bristol.
The publication is arranged in six parts. The first part addresses three issues: the relations between assessment and learning, over-assessment and ways to overcome it, and statistical issues in assessment.
The second part contains a description, advantages, disadvantages, procedures and examples of the common forms of assessment used at UTS, or forms which, if not commonly used at UTS, may be of interest to staff. The third part looks at assessing particular learning experiences.
The fourth part addresses the relations between objectives for a subject and the whole assessment package for that subject, and includes UTS examples.
Parts five and six contain a glossary of assessment terms, and a summary of the references.
Further information on assessment is available from the Centre for Learning and Teaching.
Why Assess Students?
There are two main reasons for assessing students. The first is so that staff can determine the level at which a student is performing in terms of knowledge and understanding, skills and, in some subjects and courses, attitudes. The appropriate skills, knowledge and understanding are described in the objectives set for the subject, these objectives being provided to students at the beginning of study of the subject. An appropriate set of assessment tasks is one which, in the professional judgement of the lecturer, allows students to demonstrate their level of performance in achieving the objectives of the subject. This type of assessment, when used to determine grades or marks, is known as summative assessment.
The second reason is to give students feedback on their progress towards the development of knowledge, understanding, skills and attitudes. This type of assessment is known as formative assessment.
Assessing students' knowledge, understanding, skills and attitudes is an extremely difficult task. In general, the instruments we use to assess students tend to be limited in what they are able to reveal. We are further constrained by limitations such as having to assess whole groups at certain times in ways that lead to qualitatively different grades, and all the methods used are compromised because of limited financial and human resources. We must accept that none of the methods we currently employ are totally satisfactory.
What this amounts to is that we need to retain an open mind in our interpretation of assessment information and vigilance in our use of these methods so that we do not read more into the information we collect than we can reasonably expect using those methods.
This publication aims to address some of the issues affecting the quality of our assessment strategies. It also contains summaries of the major types of assessment, examples of their use in UTS and sources for further information.
Assessment and Learning
The following extract contains a brief overview of the literature on the relations between assessment and learning. This section is included because the learning side of assessment is often neglected when resource pressures force assessment methods to focus more on accreditation than on learning.
There are two main purposes of student assessment. The first intends to improve the quality of learning. Students engage in the problems and discourse of a given area and are given encouragement, response and feedback on what they do, as appropriate, with a view to them becoming more effective in their learning. This is formative assessment, or assessment for learning. The second concerns the accreditation of knowledge or performance: students are assessed to certify their achievements. This occurs primarily for the award of a degree or diploma, though various components of assessment are usually taken into account in making this judgement. This is summative assessment, or assessment for the record.
In both cases judgement is involved, but in the first it directly serves the needs of the student and in the second it primarily serves the needs of the external world. Assessment also contributes to motivation through the recognition of achievement. However, the relationship between certification and motivation is a complex one. For as many high-achieving students who are encouraged and stimulated by their high grades, there are others who are discouraged and alienated by their lesser grades. Grading per se is not a motivator and can only be used as such with very great care in any given situation. Assessment is also used for various administrative reasons, such as for the allocation of students to particular groups, but these should be regarded as secondary purposes.
Students tend not to learn well if we are not effective in the former, and they cannot be recognised as competent if we neglect the latter. Unfortunately, resource pressures increasingly lead us to protect assessment for accreditation at the expense of assessment for learning. Learning is so driven by assessment that the form and nature of assessment often swamps the effect of any other aspect of the curriculum.
While there may not be a current public debate about assessment, over the past 20 years an interesting literature has been emerging in higher education on the relationship between assessment and learning. It is not the place here to review it in great detail, but research has shown the following features.
(1) Students are assessed on those matters on which it is easy to assess them, and this leads to an over-emphasis on memory and lower-level skills (e.g. Black, 1969). Creating questions which test higher order skills is not impossible, but it demands a degree of professional commitment to test design which is absent from many departments.
(2) Assessment encourages students to focus on those topics which are assessed at the expense of those which are not (e.g. Elton & Laurillard, 1979). In other words, assessment tasks define the syllabus, and, if students want to get good marks, they focus on these aspects at the expense of others which might capture their interest.
(3) The nature of assessment tasks influences the approaches to learning which students adopt (e.g. Ramsden, 1988). Not only does the content of assessment define what is to be studied, but also the kind of task required shapes the learning strategy of students. If students perceive reproduction of information to be rewarded, they will emphasise memory work, and if they see problem-solving emphasised, they will tend to practise solving problems.
(4) Students who perform well in university examinations can retain fundamental misconceptions about key concepts in the subjects they have passed (e.g. Dahlgren, 1984). Some of the most profoundly depressing research on learning in higher education has demonstrated that successful performance in examinations does not even indicate that students have a good grasp of the very concepts which staff members believed the examinations to be testing.
(5) Students give precedence to assessment which is graded (e.g. Becker et al., 1968). Grading acts as a kind of currency indicating what teachers value. It is in the best interests of students to focus on those things which produce the greatest return.
(6) Successful students seek cues from teachers to enable them to identify what is important for formal assessment purposes (e.g. Miller & Partlett, 1974). Effective performers often use the strategy of attending lectures in order to obtain cues about what of the vast range of matters in a given subject will be emphasised in examinations. They focus their energies on these and may spend significantly less time in studying than their less successful peers.
The picture painted by this research is bleak. Despite the good intentions of staff, assessment tasks are set which encourage a narrow, instrumental approach to learning that emphasises the reproduction of what is presented, at the expense of critical thinking, deep understanding and independent activity.
These findings indicate effects which are quite contrary to those which are sought. Students are discouraged from taking initiatives beyond their lecturer's interpretation of the syllabus, and they spend their time 'swotting for examinations' rather than trying to internalise and make sense of the subject. Evidence such as this suggests that very great care must be exercised in the selection and implementation of assessment tasks, otherwise they can have counter-productive results.
Becker, H., Geer, B. and Hughes, E.C. (1968). Making the Grade: The academic side of college life. New York: Wiley.
Black, P.J. (1969). University examinations, Physics Education, 3, 93-99.
Dahlgren, L.-O. (1984). Outcomes of Learning. In F. Marton, D. Hounsell and N. Entwistle (Eds) The Experience of Learning. Edinburgh: Scottish Academic Press.
Elton, L. and Laurillard, D.M. (1979). Trends in research in student learning, Studies in Higher Education, 4, 87-102.
Miller, C.M.L. and Partlett, M. (1974). Up to the Mark: A study of the examination game. Guildford, SRHE and Open University Press.
Ramsden, P. (1988). Studying Learning, Improving Teaching. In P. Ramsden (Ed) Improving Learning: New Perspectives. London: Kogan Page.
Boud, D. (1990). Assessment and the Promotion of Academic Values, Studies in Higher Education, 15, 101-111.
Over-assessment
In many sectors of Higher Education there is a general tendency to over-assess students. Some of these practices arise from a desire to increase reliability in the test and to encourage students to study widely for the course. However, over-assessment increases staff time needed to run the system and does not necessarily lead to greater student learning. This section contains some information on avoiding over-assessment. It is taken from one section of the booklet Strategies for Assessing Students which is being circulated to all Schools. It is presented here also as an example of what the booklet contains. Other strategies related to reducing the demands of assessment when there are fewer resources are listed at the end of this section.
First consider
Then consider in the particular assessment
task being implemented,
After these considerations, the items that follow may provide a source of ideas for you to use in starting to develop a more economical assessment system.
Assessment demands have a major impact on both student work-load and on staff resources. It is often possible to reduce the number of tasks for assessment without detracting from the rigour of a course or diminishing the reliability of final assessment grades.
In a study of assessment loads at Macquarie University large variations were found from course to course, and some subjects were able to be pin-pointed as clearly being over-assessed by comparison with others.
We should try to estimate the number of hours spent by staff in setting, explaining, supervising, collecting, marking, grading, processing scores and giving feedback, all relative to the number of contact hours of the course.
A similar exercise can be carried out in regard to student loads. There we can compare the number of hours typically spent being briefed upon, carrying out, and receiving feedback on assessment tasks, relative to the total contact hours of the course.
Subjects with anomalous assessment loads on either students or staff become, by definition, cases for suitable attention.
Macquarie University Education Committee (1984) 4-5, 19-21
It is wise to try to anticipate where students will take the questions we set - essay questions in particular. Some instructions, some questions, may subsequently make our task as markers much more difficult than we ever intended. It is obviously best to discover this fact before students go to work on it.
This involves taking time to put ourselves in a student's position and imagine some of the possible mis-interpretations of our question - and what those might lead to when we finally get back the piece of work for marking. This can be difficult when it concerns questions we have ourselves set, and other staff can often give the best insight into ambiguities and lack of precision in our instructions and question-wording.
Nightingale (1986) 18-27
Provided group work is compatible with the aims of assessment in a particular case, it can often be most efficient and economic if group assessment tasks are both done and marked in class time.
A seminar group, for instance, can handle a question in class by collaboratively drafting note-form or outline answers. The result can then be appraised in class.
Either approach can lead to better discussion and clarification of writing or problem-solving strategies and marking standards.
Gibbs et al. (1986) 103-107; Nimmo (1977) 186-187
These refer to any kinds of tasks that may lead up to the major one. Where prior tasks of this kind are set, it is often reasonable to grade only the last, and the earlier ones may not need to be assessed at all in the conventional way.
For examples of these prior-stage tasks, we should consider - among other more obvious types - the use of course journals, reading logs, and reflective pieces of many different kinds.
Nightingale (1986) 10-13; Zubrick (1985); Gibbs et al. (1986) 115-117
Assessment that is "formative" in function - i.e. not required in order to give a final grade - can often be carried out on lead-up tasks, such as synopses of proposed essays.
In such instances it should be remembered that the appraisal of these does not necessarily have to be done by academic staff. Students can give each other constructive feedback and learn a lot as they do so, provided the task is designed and set with this option in mind.
Paxton (1976)
Short answers, lists, synopses and outlines can be used to replace extended essays in many cases. If we critically inspect what really needs to be assessed in our subjects, this might be only a portion of what would be demonstrated in a full, complete essay.
We need to ask why, then, we should have to mark a full essay if much of what it reveals is superfluous to our present assessment needs?
The main arguments against giving a choice among essay-type question options in tests and examinations have nothing to do with economics or efficiency in the first instance. They are that, according to research, if a paper has say 12-15 questions of which only 5 are to be chosen, the chances are that:
(i) the different options will vary in difficulty,
(ii) marker subjectivity and unreliability will be greatly amplified, and
(iii) many students will choose unwisely among options, not doing themselves justice.
Hence on all empirical counts the practice of giving a choice tends to make essay examinations unreliable and unfair. Each student in fact sits for a different paper.
On the issue of resource efficiency, common sense indicates that we have less work to do if ...
(i) fewer questions need writing when the paper is being set, and
(ii) fewer marking schemes need designing during the marking stage.
In addition, markers generally find it more efficient to mark many answers to the same question than a few answers to many different questions.
So if you think you would find it easier to set papers comprising a set of common questions with no options, you have research and common sense both on your side.
Students, accustomed to being given options, may resent the change as being unfair, so you may need to marshal your facts and explain clearly your reasons for the new policy. It will be a fairer exam for them, and the marks you award will be more reliable.
Stanton (1981)
Student surveys often come out strongly in favour of a diversity of examining methods because of the very reasonable belief that only in this way can a "fair deal" be assured.
Apart from any intrinsic value in using a diversity of assessment methods, a policy of encouraging diversity may also be consistent with a gain in efficiency.
Whether this is so will depend, however, upon which particular formats are included in the package, since not all methods need be as time-consuming as others in setting and marking. For example, an economically diverse package may need to comprise only one long written task, supplemented by a variety of shorter, more easily marked tasks.
As an instance, the examination structure in one professional medical course involves ...
(i) a written paper using the modified essay question style,
(ii) a multiple-choice test,
(iii) a simulated diagnostic interview, and
(iv) a practice test.
Beard (1970) 193-196
Beard, R. (1970). Teaching and Learning in Higher Education. Harmondsworth: Penguin.
Gibbs, G., Habeshaw, S. and Habeshaw, T. (1986). 53 Interesting Ways to Assess Your Students. Technical and Educational Services: Bristol.
Macquarie University (Education Committee) (1984). Report of Working Party on Teaching and Assessment Methods.
Nightingale, P. (1986). Improving Students' Writing. Sydney: HERDSA Green Guide No 4.
Nimmo, D.B. (1977). The undergraduate essay: A case of neglect? Studies in Higher Education, 2, 183-189.
Paxton, S. (1976). Pre-submitting essays, re-attempting tests. University of Queensland, St Lucia: TEDI.
Stanton, H. (1981). Optional Question on Essay-Type Examination Papers. The University Teacher, 4 (2).
Zubrick, A. (1985). Learning through writing: the use of reading logs. HERDSA News, 7 (3), 11-12, 24.
Andresen, L., Nightingale, P., Boud, D. and Magin, D. (1989). Strategies for Assessing Students (Teaching with Reduced Resources, No. 1.) Professional Development Centre, University of NSW.
Other strategies in the booklet are:
1. Decide whose interests assessment is serving
2. Avoid over-sampling the course
3. Avoid over-questioning topics
4. Avoid over-reading student work
5. Avoid over-commenting on student work
6. Avoid over-grading student work
7. Refine current policies and use present methods better
8. Consider alternative approaches to assessment policy and practice
9. Consult original sources for how-to-do details
10. Get support, help or advice from an Educational Development Consultant
Statistical Issues in Assessment
Assessment in education is a statistical problem. Statistics can be defined as 'the art and science of making sense of numerical information'.
You do not have to be a statistician to carry out assessment in education, but it helps to remember that what you are doing is essentially something statistical. If you neglect the advice that statistical theory has to offer, you run the risk of making serious mistakes.
Statistical theory is not in some way opposed to a human, sympathetic and flexible approach to assessment. On the contrary, a careful statistical analysis of the problem shows the need for these very qualities.
To measure a student's performance, industry or potential in a certain subject area, either in absolute terms or relative to other students.
What are we trying to measure (for example, in intelligence tests)? Are we measuring in absolute or relative terms?
How can we tell if one test is a better measuring instrument than another? How do we get the best test, or the ideal method of assessment?
How do we make test marks from different situations or subjects comparable? How do we combine information from a number of different measurements?
What happens if we do not do anything statistical?
Criterion referenced assessment: how well has the student done by comparison with some predetermined criterion?
Norm referenced assessment: how well has the student done by comparison with the norms established by his or her colleagues in the whole group?
Another possibility is self referenced assessment: how well has the student done by comparison with their own earlier achievements?
Norm referenced assessment may be the most appropriate for public examinations such as the Higher School Certificate (HSC). With large numbers of students, the standard can be expected to be similar from year to year. One of the functions of the HSC is to rank students for tertiary entrance.
Criterion referenced assessment seems to be more appropriate for the typical university course. Numbers are smaller, so there is no guarantee that the standard is the same from year to year. We are comparing students with predetermined criteria - the course objectives - to determine whether they should be passed.
In practice there is not as much difference between criterion and norm referenced assessment as there seems to be. In criterion referencing, we are comparing a student against norms established by a larger number of other students at earlier times. With criterion referencing, we can sort students into multiple categories (for example, graded passes). With norm referencing, we can get information about how far away the student was from the pass/fail criterion.
We often use norm referenced techniques to obtain and combine marks from assessment events and then criterion referenced techniques to report the final results as Fail, Pass, and Graded Passes. We should be aware of the dual nature of this process and ensure that we keep the nexus between them by adjusting the marks so that 50% does in fact represent what we regard as a Pass (in subjects where 50% is a pass) and similarly for the higher grades.
Validity is concerned with whether a test measures the ability we are trying to measure, or some different related or unrelated ability. (In statistical terms, is it an unbiased estimator of the student's ability?)
Validity is concerned with questions such as: Is my assessment of the student's behaviour correct? Have I really seen evidence of what I think I have? How confidently can I generalise from what I have seen? Can I predict anything of the student's educational future?
Reliability (or consistency) is concerned with whether the assessment measures with low variability between repeated situations. (In statistical terms, do repeated measurements have a high correlation?)
Reliability is concerned with questions such as: Would other assessors agree with my interpretation of the student's behaviour? Would I myself interpret his or her behaviour in the same way if I saw it again?
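The statistical reading of reliability given above (the correlation between repeated measurements) can be sketched in a few lines. This is a minimal illustration only: the two lists of marks are hypothetical, invented for the example, and the function name `pearson` is our own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of marks."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 marks each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("marks must not all be identical")
    return sxy / sqrt(sxx * syy)

# Hypothetical marks for six essays, marked twice by the same assessor.
first_marking = [52, 61, 48, 75, 66, 58]
second_marking = [55, 58, 50, 70, 69, 54]

reliability = pearson(first_marking, second_marking)
print(f"Estimated re-marking reliability: {reliability:.2f}")
```

A coefficient near 1 means the two markings place the essays in almost the same relative positions; a low coefficient means the mark depends heavily on the occasion of marking.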
Ebel (p. 309): Expertly constructed educational achievement tests often yield reliabilities of 0.90 or higher. In contrast, the achievement tests used in many elementary, secondary and college classrooms often show reliability coefficients of 0.50 or lower.
Objective or multiple choice type tests are more reliable than subjective, divergent tests such as essay questions (typical reliability of 0.8 for re-reading by same rater, 0.7 for two raters, 0.7 for two essays from same candidate with one rater). Validity and reliability are opposed. The more structured the question, the less evidence we have on which to make valid assessments of the qualities we are interested in, but the more agreement we will get about our assessment. This is not to say that objective questions are totally objective. There is a subjective element in the choice of which questions to include.
Rowntree (p. 197): The more a student's performance approximates a 'work of the imagination', the less 'reliable' is assessment - unless assessors agree to ignore the contentious aspects and concentrate on the measurable. And the more we look for high quality in students, the more divergent and open to the exercise of the imagination are our assessment methods likely to be.
When given to a group of people, any test/examination/assessment task results in scores. These scores can be summarised by an average value (the mean) and a spread or measure of variability (the standard deviation).
Assessment tasks do not measure without error. With different questions, at different times and under different conditions, a student's score will be different. The standard error of measurement gives a reasonable plus or minus on each individual score. The value of the standard error of measurement depends on the standard deviation of the scores and on the reliability of the test (see previous section). The standard error of difference gives a reasonable plus or minus on the difference between two scores.
For a high-reliability test (a public examination like the HSC, for instance), two students with equal ability or achievement would score within 3.5 marks of each other 68% of the time, and more than 7 marks apart in the most extreme 5% of cases.
The situation is far worse with less reliable tests as shown in Table 1:
Table 1: Statistics for Different Reliabilities
| Reliability | Standard Error | Standard Error of difference | For equal students, 5% of the time the difference exceeds |
| 0.96 | 2.5 | 3.5 | 7 |
| 0.90 | 4.0 | 5.6 | 11 |
| 0.84 | 5.0 | 7.1 | 14 |
| 0.77 | 6.0 | 8.5 | 17 |
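The figures in Table 1 can be reproduced with the usual classical test theory formulas, sketched below. Two things here are assumptions rather than statements from the text: the standard deviation of 12.5 marks is inferred from the table (it is the value that gives a standard error of 2.5 at a reliability of 0.96), and the factor 1.96 assumes normally distributed measurement errors. The function names are our own.

```python
from math import sqrt

# Assumption: standard deviation of the marks, inferred from Table 1, not stated in the text.
ASSUMED_SD = 12.5

def standard_error(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)

def standard_error_of_difference(sem):
    """Standard error of the difference between two scores with the same SEM."""
    return sem * sqrt(2.0)

def table_row(reliability, sd=ASSUMED_SD):
    """Return (SEM, SE of difference, difference exceeded 5% of the time)."""
    sem = standard_error(sd, reliability)
    sed = standard_error_of_difference(sem)
    # 1.96 standard errors either side covers 95% of a normal distribution.
    return (round(sem, 1), round(sed, 1), round(1.96 * sed))

for r in (0.96, 0.90, 0.84, 0.77):
    print(r, *table_row(r))
```

Under these assumptions the printed rows agree with the table, which suggests the table was built on a spread of about 12.5 marks.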
Assessment marks obtained from class tests or exercises generally have much lower reliability than 0.77, but we tend to average several of them, reducing the standard error. For example, averaging four tasks halves the standard error of measurement (or makes the overall measurement twice as accurate).
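The effect of averaging can be written as a one-line calculation. The sketch assumes that the tasks have the same standard error of measurement and that their errors are independent; the function name and the figure of 8 marks are illustrative only.

```python
from math import sqrt

def standard_error_of_average(sem_single, n_tasks):
    """Standard error of the mean of n tasks with equal, independent errors."""
    if n_tasks < 1:
        raise ValueError("need at least one task")
    return sem_single / sqrt(n_tasks)

# Illustrative figure: a class test with a standard error of 8 marks.
print(standard_error_of_average(8.0, 1))  # one task
print(standard_error_of_average(8.0, 4))  # average of four tasks
```

Because the improvement goes with the square root of the number of tasks, sixteen tasks would be needed to halve the error again.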
These results should make us wary of automatically assuming that a student who gets 52% has 'passed' while a student who gets 48% has 'failed'. They should also make us wary about spending time marking students' work using an overly complicated marking scheme.
Combination of information from several assessment events presents many traps. If we are using a pure criterion-based approach we will have information on whether a student has achieved the criterion on a number of separate assessments. Then we may want to set up an overall criterion for the combination. For instance: the student must reach the acceptable level in all components (or maybe in all but one of the components).
Usually, however, information in the form of marks is combined using a norm referenced approach. Then we must beware of the following situation. If we are combining sets of marks from tests A and B (both as %) in a desired weighting, the actual weighting is determined by the relative standard deviations of the marks from A and B. For example, if a practical assignment with a narrow spread (standard deviation = 5) is combined with an examination with a fairly wide spread (standard deviation = 15) at a desired weighting of 50:50, the actual weighting is 25:75.
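The arithmetic behind the 25:75 figure can be sketched as follows. The sketch uses the simplification implied by the text, that each component's influence is proportional to its nominal weight multiplied by its standard deviation; it ignores any correlation between the components. The function names are our own.

```python
def effective_weights(nominal_weights, std_devs):
    """Actual influence of each component: nominal weight times spread, normalised."""
    contributions = [w * s for w, s in zip(nominal_weights, std_devs)]
    total = sum(contributions)
    return [c / total for c in contributions]

def compensating_weights(desired_weights, std_devs):
    """Weights to apply to raw marks so the actual influence matches the desired one."""
    adjusted = [w / s for w, s in zip(desired_weights, std_devs)]
    total = sum(adjusted)
    return [a / total for a in adjusted]

# Practical assignment (SD 5) and examination (SD 15), nominally 50:50.
print(effective_weights([0.5, 0.5], [5, 15]))
print(compensating_weights([0.5, 0.5], [5, 15]))
```

An equivalent remedy is to rescale both sets of marks to the same standard deviation before applying the nominal 50:50 weighting.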
The results of combining marks from components of assessment which have different spreads are clearly illustrated in the following example (Clark, 1990). Table 2 shows rankings of students A to L as a result of addition of raw scores.
Table 2: Student Rankings Before Scaling
| Student | A | B | C | D | E | F | G | H | Total |
| A | 80 | 50 | 80 | 50 | 25 | 60 | 32 | 42 | 419 |
| B | 95 | 70 | 15 | 80 | 40 | 40 | 30 | 48 | 418 |
| C | 50 | 80 | 65 | 20 | 70 | 55 | 34 | 40 | 414 |
| D | 81 | 60 | 30 | 60 | 60 | 36 | 44 | 42 | 413 |
| E | 94 | 20 | 75 | 50 | 50 | 30 | 47 | 46 | 412 |
| F | 60 | 90 | 20 | 75 | 30 | 45 | 45 | 46 | 411 |
| G | 100 | 40 | 40 | 70 | 20 | 40 | 52 | 48 | 410 |
| H | 10 | 75 | 50 | 55 | 30 | 60 | 56 | 45 | 381 |
| I | 31 | 30 | 60 | 70 | 48 | 35 | 55 | 50 | 379 |
| J | 80 | 10 | 50 | 25 | 30 | 70 | 50 | 55 | 370 |
| K | 20 | 80 | 30 | 30 | 40 | 55 | 60 | 52 | 367 |
| L | 70 | 25 | 10 | 50 | 40 | 50 | 55 | 60 | 360 |
The ranking in Table 3, which is the exact reverse of that in Table 2, comes from scaling the raw scores before addition.
Table 3: Student Rankings After Scaling
| Student | A | B | C | D | E | F | G | H | Total |
| L | 67 | 19 | 0 | 50 | 40 | 50 | 83 | 100 | 409 |
| K | 11 | 88 | 29 | 17 | 40 | 63 | 100 | 60 | 408 |
| J | 78 | 0 | 57 | 8 | 20 | 100 | 67 | 75 | 405 |
| I | 23 | 25 | 71 | 83 | 56 | 13 | 83 | 50 | 404 |
| H | 0 | 81 | 57 | 58 | 20 | 75 | 87 | 25 | 403 |
| G | 100 | 38 | 43 | 83 | 0 | 25 | 73 | 40 | 402 |
| F | 56 | 100 | 14 | 92 | 20 | 38 | 50 | 30 | 400 |
| E | 93 | 13 | 93 | 50 | 60 | 0 | 57 | 30 | 396 |
| D | 79 | 63 | 29 | 67 | 80 | 15 | 47 | 10 | 390 |
| C | 44 | 88 | 79 | 0 | 100 | 63 | 13 | 0 | 387 |
| B | 94 | 75 | 7 | 100 | 40 | 25 | 0 | 40 | 381 |
| A | 78 | 50 | 100 | 50 | 10 | 75 | 7 | 10 | 380 |
The method of scaling the marks is to give the lowest mark zero and the highest mark 100, then determine all other marks by a straight line conversion graph. For more information see Gilmore (1987).
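The scaling method just described can be sketched as below, using the raw scores of Table 2. Two entries are taken as 80 (student A, component A) and 55 (student C, component F), since these are the values consistent with the row totals of Table 2 and with the scaled marks in Table 3. Rounding each scaled mark to the nearest whole number, with halves rounded up, is our assumption about how the published figures were produced.

```python
# Raw scores from Table 2 (components A to H for each student).
RAW = {
    "A": [80, 50, 80, 50, 25, 60, 32, 42],
    "B": [95, 70, 15, 80, 40, 40, 30, 48],
    "C": [50, 80, 65, 20, 70, 55, 34, 40],
    "D": [81, 60, 30, 60, 60, 36, 44, 42],
    "E": [94, 20, 75, 50, 50, 30, 47, 46],
    "F": [60, 90, 20, 75, 30, 45, 45, 46],
    "G": [100, 40, 40, 70, 20, 40, 52, 48],
    "H": [10, 75, 50, 55, 30, 60, 56, 45],
    "I": [31, 30, 60, 70, 48, 35, 55, 50],
    "J": [80, 10, 50, 25, 30, 70, 50, 55],
    "K": [20, 80, 30, 30, 40, 55, 60, 52],
    "L": [70, 25, 10, 50, 40, 50, 55, 60],
}

def scale_component(marks):
    """Map the lowest mark to 0 and the highest to 100 along a straight line."""
    lo, hi = min(marks), max(marks)
    span = hi - lo
    if span == 0:
        raise ValueError("all marks identical; cannot scale")
    # Integer arithmetic for round-half-up to the nearest whole mark.
    return [(200 * (m - lo) + span) // (2 * span) for m in marks]

def rank_by_total(scores):
    """Student labels ordered from highest to lowest total."""
    return sorted(scores, key=lambda s: sum(scores[s]), reverse=True)

students = list(RAW)
columns = list(zip(*RAW.values()))            # one tuple of marks per component
scaled_columns = [scale_component(col) for col in columns]
SCALED = {s: [col[i] for col in scaled_columns] for i, s in enumerate(students)}

raw_order = rank_by_total(RAW)
scaled_order = rank_by_total(SCALED)
print("Raw ranking:   ", " ".join(raw_order))
print("Scaled ranking:", " ".join(scaled_order))
```

The reversal depends on the whole-number rounding: the scaled totals are only a mark or so apart, so the example illustrates how sensitive a ranking can be rather than offering a general rule.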
It should be noted, however, that when we inform students that "50% of their marks will be based on classwork and 50% on the exam" we are not necessarily implying that these two areas will contribute equally to their final ranking. Much assessment in Higher Education is not about ranking, but about judging whether students have achieved the objectives of the subject.
This is the process by which the results obtained on assessment tasks which have not been taken in common are made comparable by means of results obtained on a common task. The distribution of marks in each subgroup is adjusted to have the same distribution as the marks achieved by that subgroup on the common assessment task (though usually only the mean and standard deviation are adjusted).
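A minimal sketch of this adjustment, for the usual case in which only the mean and standard deviation are matched, is given below. The marks are hypothetical, and the function name `moderate` is our own; real moderation schemes may differ in detail.

```python
from statistics import mean, pstdev

def moderate(class_marks, common_marks):
    """Rescale a subgroup's class marks to the mean and SD that the same
    subgroup achieved on the common task (linear adjustment only)."""
    class_mean, class_sd = mean(class_marks), pstdev(class_marks)
    common_mean, common_sd = mean(common_marks), pstdev(common_marks)
    if class_sd == 0:
        raise ValueError("class marks have no spread; cannot moderate")
    return [common_mean + (m - class_mean) * common_sd / class_sd
            for m in class_marks]

# Hypothetical subgroup: generously marked class test, harder common examination.
group_class = [70, 75, 80, 85, 90]
group_common = [40, 50, 55, 60, 70]

moderated = moderate(group_class, group_common)
print([round(m, 1) for m in moderated])
```

The adjustment changes the level and spread of the subgroup's class marks but keeps the order of students within the subgroup as their own teacher assessed it.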
The process is commonly used at the secondary school level, for example in the moderation of class assessments over several 2 Unit mathematics classes at a TAFE College by means of a common test, or the moderation of assessment marks from all NSW schools by means of the HSC examination.
In the university situation, it may be necessary to carry out a moderation process if a large class is taught in several parallel groups and assessed with different class tests or practicals and a common final examination. However, it seems that the moderation process is rarely attempted at university level.
There are three potential problem areas in moderation. The first is that the calculations need to be done on a computer as they are not usually feasible by hand. Second, the common assessment task should not be measuring something very different from the other tasks. Thus, a written examination is not a valid moderator for practical marks. Third, the common assessment task must truly be in common, i.e. under the same conditions, at the same time, with unbiased supervisors and unbiased marking. If common conditions do not hold, the complete moderation process is invalid.
This is a difficult job, and the details will not be discussed here. It is a more immediate problem in public matriculation examinations such as the HSC. In the university situation, it is used to calculate the weighted average mark (WAM) which is only used for excluding students who perform poorly or awarding prizes to outstanding students.
The most straightforward and the least acceptable method is simply to average the raw percentage marks. This is the method used currently to calculate the WAM at UTS.
A better method is to scale the marks in each subject to the same mean and standard deviation, and then simply to average the marks across all subjects taken (or a certain number of the best subjects taken, as at the HSC). However, this method discriminates against the more intellectually demanding subjects and favours the 'easy options', not something that most teachers would encourage.
Statisticians are generally agreed that the best method of combining marks from different subjects is to use some form of inter-subject scaling to calculate weightings for different subjects before they are averaged. In this process, the marks in each subject are adjusted to reflect the ability of the candidature in that subject. This is the method used at the HSC to obtain a 'Tertiary Entrance Rank'.
Statistically speaking, the WAM as calculated at UTS is a very primitive way of assessing students' overall achievement. However, it may be adequate for picking out the very poorest and the most able students.
Do remember that the aim of statistics is to use the numerical information intelligently. The following quotation illustrates the statistical spirit in assessment, although some people see it as cheating, fudging, or non-statistical:
Before any set of marks or ranks is considered final, the assessments should be examined subjectively by all teachers involved. Any anomalies which appear to have been introduced by the whole assessment process should then be removed by changing the mark or rank of the student(s) concerned. It must be recognised that aggregate marks are not absolute truths: they suffer from errors of measurement, unintended weightings and losses of reliability. They merely provide reference points for later fine adjustment: the final arbiter should be the teacher's professional judgement. It is by incorporating value judgements with weighted assessment aggregates that we develop an evaluation of student achievement. (Secondary Schools Board: "Assessment Guidelines 6")
It is impossible to avoid the subjective element in assessment. Statistics should be used to aid commonsense and professional judgement, not to replace them.
Clark, R., The WAM System. Paper to Interim Academic Board, 90/9, UTS, 1990.
Ebel, R., Measuring Educational Achievement. Prentice-Hall, New Jersey, 1965.
Gilmore, A., Combining Scores (Set, The Best of Assessment). NZCER/ACER, Wellington, NZ and Hawthorn, Victoria. 1987.
Rowntree, D., Assessing Students: How Shall We Know Them. Harper and Row, London, 1977.
Written by Dr Peter Petocz, School of Mathematical
Science and containing a component of a paper to Academic Board
by Dr Roy Clark, School of Teacher Education.
Some of the other key assessment issues not described in the previous sections have been addressed elsewhere.
Assessment procedures to be used by staff at UTS are described in the Assessment Procedures Manual, which is provided as an attachment to the Report to Academic Board by the Working Party on Assessment, and is available to all staff from the office of the Academic Registrar. Topics covered in that manual include: Responsibilities of Co-ordinating Examiners, Assessors and Examiners; Heads of Schools; Deans; Examination Review Committees; and the Academic Registrar. The attachments cover Recommended Grades for Use at UTS, Interim Results, Council Policy for the Disclosure of Assessment Results, Procedures for Students Requiring Alternative Assessments, and the Academic Registrar's Representative at meetings of Examination Review Committees.
Assessment procedures adopted by both the University and academic staff will be subject to change: the former as a result of changes to Academic Board resolutions, the latter following regular reviews as a normal part of a staff member's curriculum evaluation of their teaching practices. Changes to University procedures will be notified through updates or supplements to the Manual. New assessment methods are continually being tried by members of academic staff. While this booklet contains some of the range of methods used, it is by no means exhaustive, and it is not intended that assessment methods be restricted to this choice or, more importantly, selected from this booklet and subjected to no further review.
Multiple-choice Questions
Multiple-choice Questions (MCQs) are a subset of what are referred to as "objective questions". Objective questions are questions which have a correct answer (usually only one). The term "objective" here means there is complete objectivity in marking the test. The construction, specification and writing of the individual questions (items) are influenced by the judgements of examiners as much as in any other test.
The objective test is largely used to test factual material and the understanding of concepts. Because of the objectivity and ease of marking it is frequently used for testing large groups. It is claimed that skilled item writers can develop items to test higher level intellectual skills (Cannon and Newble, 1983), but if the perception of students is that these types of questions usually test the recall of facts, then they will prepare for them accordingly.
It is to be noted that the quality of an objective test is determined by the skill of the constructors of the test.
While the following notes refer specifically to the most common form of objective test (MCQs), many of the comments relate to all objective questions. Some of the advantages and disadvantages of the range of questions are given in Table 4 on page 24.
Multiple-choice Questions are probably the most widely used of objective tests. Such questions are normally composed of four parts:
STEM - question or incomplete statement
OPTIONS - suggested answers or completions
DISTRACTERS - incorrect responses
KEY - correct response
While the stem is, in most cases, written material, other material such as graphs, diagrams or sets of results may be used. In these cases it is probable that abilities other than recall may be tested. In some cases a number of questions (items) may be related to the same material. In these circumstances, care must be taken to make the questions independent (i.e. the answer to one question should not depend on the answer to a previous one).
In some cases, MCQs may be used to test higher abilities by asking students to judge the most appropriate answers to a problem from several "correct" ones. Obviously, great care is required for the setting of such a question.
It is usual to have four or five options, with five options giving the most reliable test. However, it may be difficult to provide five plausible options and in this case it is better to stay with four.
Dunstan (1971), in a short monograph on
multiple choice testing, summarises the steps involved in writing
MCQs. He makes three important general statements which briefly
may be summarised as follows:-
More detailed setting guidelines and considerations
are given below.
1. Objectives:
The answers to these questions are important. Because of the increasing use of computer marked tests, MCQs are becoming increasingly popular, particularly where large classes are involved.
In some cases, the use of MCQs may not be appropriate. For example, it may be difficult to find suitable distracters for a question. In other words, the material and abilities under test should determine the nature of the examination rather than fitting the examinable materials to a specific type of test.
2. Stem:
It is most important that the stem is clearly and unambiguously worded. The language used should be simple and clear. If these guidelines are not followed, the test item becomes one in which comprehension is tested as much as the material on which the question is based. The stem should also state clearly what is expected of the student before the options are read. This requirement would rule out use of the question "Which of the following is correct?"
3. Options:
Options should be independent of one another and consistent in logic and grammar with the stem, and the language used must be clear and simple. In particular, the examiner should avoid clues to the correct answer and clues which will allow a student to discard distracters even if little is known of the material under test. Poorly constructed items may allow a student to reduce the number of plausible distracters and hence increase the chance of guessing the correct answer. In particular, the person constructing the test should AVOID USING:
One of the advantages of using multiple-choice questions is the statistics on the tests which are easily obtainable (particularly when the tests are computer scored). The parameters which define the quality of the test item are discrimination (D) and facility (F).
Discrimination compares the number of correct responses to an item for the upper and lower 27% of the class (based on the total test score). If for any one item the number of correct responses from people who are in the lower 27% for the whole test is greater than the number from those in the top 27%, then the item may not be effectively discriminating between students.
Facility (F) is the percentage of the class obtaining the correct answer. In general -
if F < 30% the question is hard
if F = 30-75% the question is satisfactory
if F > 75% the question is easy
Whether or not items are acceptable in terms of difficulty is largely the judgement of the examiner. A hard item with a high discrimination may be useful in establishing a rank order of students. An easy item with a low discrimination is probably a give-away question and consideration might be given to using another form of assessment. Items with acceptable discrimination and facility may be stored for future use. In this way an item bank may be built up over time and questions selected randomly for tests.
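The two item statistics can be computed directly from a table of results, as in the sketch below. The facility calculation and the 30% and 75% boundaries follow the text. The discrimination index shown, (upper correct minus lower correct) divided by the group size, is one common convention and is an assumption here, since the text only says that the two groups are compared. All names and data are hypothetical.

```python
def facility(item_correct):
    """Percentage of the class answering the item correctly."""
    return 100 * sum(item_correct) / len(item_correct)


def classify(f):
    """Apply the rough guide: below 30% hard, 30-75% satisfactory, above 75% easy."""
    if f < 30:
        return "hard"
    if f <= 75:
        return "satisfactory"
    return "easy"


def discrimination(total_scores, item_correct, fraction=0.27):
    """Compare correct responses in the top and bottom 27% of the class,
    where the groups are formed from the total test score."""
    n_group = max(1, round(fraction * len(total_scores)))
    # Indices of students ordered from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    upper = sum(item_correct[i] for i in order[:n_group])
    lower = sum(item_correct[i] for i in order[-n_group:])
    return (upper - lower) / n_group


# Hypothetical class of ten: total test scores, and 1/0 for one item.
totals = [95, 90, 85, 80, 70, 60, 55, 50, 40, 30]
item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

F = facility(item)
D = discrimination(totals, item)
print(F, classify(F), D)
```

With this convention a negative value of D corresponds to the situation described above, in which more students from the lower group than from the upper group answer the item correctly.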
The quality of the test as a whole is measured by its validity (v) and reliability (r). Validity for lecturer-developed tests is determined in the light of course content and course objectives to determine the degree to which the test, content and objectives coincide.
Reliability is, in effect, the probability that the same score on a test would be achieved, on repetition of the test, by the same or an equivalent group of students or when the test is scored by another examiner. Values of this statistic range from 0 to 1 and a value of 0.7 or greater is generally acceptable. Objective tests in general have a higher reliability than essays primarily because of marker objectivity. Reliability increases with length of test and if questions with high discrimination values are included. Clear statements of instructions and items will also increase reliability.
A note of caution: high reliability does not imply high validity. An examiner could consistently be testing objectives other than those believed to be under examination.
Objective tests are often criticised because they encourage guessing. Marking schemes which can take account of guessing include:-
(a) Raising of the pass mark: In a test of 100 questions with 5 options per question, random guessing should allow a score, on average, of 20.
Hence the range of marks to be considered is from 20 to 100. The mid-point of this range (or pass mark) is 60, rather than 50.
This method assumes that random guessing takes place and that a score of 60 represents a satisfactory attainment of course objectives. Both these assumptions may or may not be valid.
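The arithmetic in (a) generalises to any number of questions and options. The following is a minimal sketch; the function name `raised_pass_mark` is an assumption introduced for illustration.

```python
def raised_pass_mark(n_questions, n_options):
    """Mid-point between the score expected from random guessing and full marks."""
    expected_by_guessing = n_questions / n_options
    return (expected_by_guessing + n_questions) / 2


# The example from the text: 100 questions, 5 options each.
print(raised_pass_mark(100, 5))
# With four options, guessing is expected to yield 25, so the mid-point rises.
print(raised_pass_mark(100, 4))
```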
(b) Deductions for guessing: A score of 1/(n-1) is deducted for each incorrect answer where n is the number of options per item. There is no penalty for an omitted answer. Assumptions are made when this method is applied. Firstly, it is assumed that incorrect answers are made on the basis of random guessing and that omissions are made only on the basis of insufficient information. Secondly, it is assumed that all options are equally attractive. Other considerations which may be taken into account when guessing corrections are used include:
(i) the influence of personality and test-taking strategy on the scores of students - confident students may guess; cautious students may adopt "error avoidance" schemes;
(ii) the overflow of scoring to subtract from other areas, which means that the score does not vary directly with knowledge.
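The deduction in (b) can be written as a short function. This is a sketch only; the function name `corrected_score` and the sample results are hypothetical.

```python
def corrected_score(n_right, n_wrong, n_options):
    """Deduct 1/(n-1) of a mark for each incorrect answer; omissions cost nothing."""
    return n_right - n_wrong / (n_options - 1)


# A student who guesses every one of 100 five-option items is expected
# to get 20 right and 80 wrong, which the correction reduces to zero.
print(corrected_score(20, 80, 5))

# A student with 70 right, 20 wrong and 10 omitted.
print(corrected_score(70, 20, 5))
```

The first case shows why the deduction is set at 1/(n-1): under the assumption of purely random guessing, the expected gain from guessing is exactly cancelled.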
Obviously, the quality of the item writing will have a large bearing on the way students select options as being correct or otherwise. Where poor distracters are used students will be able to use logical deductions to assist them in their search for the correct answer rather than guessing. In this case one of the assumptions made in the use of guessing corrections does not apply.
Billing, D.E. (1973) Objective Tests in Tertiary Science Courses in D.E. Billing and B.F. Furniss (Ed.) Aims Methods and Assessment in Advanced Science Education, Heydon & Son, London, p.131-148
Billington, D.R. (1981) The uses and abuses of assessment in biochemistry education in C.F.A. Bryce (Ed.) Biochemical Education, Croom Helm, London, p.95-122.
Cannon, R.A. and Newble, D. (1983) A Handbook for Clinical Teachers, Lancaster, MTP Boston: p 97-105.
Dunstan, M. (1971) A Guide to the Planning Writing and Review of Multiple Choice Tests, Tertiary Education Research Centre, University of New South Wales
Gibbs, G., Habeshaw, S. and Habeshaw, T. (1986). 53 Interesting Ways to Assess Your Students. Technical and Educational Services: Bristol.
Gronlund, N.E. (1982) Constructing Achievement Tests, 3rd. Edition, Prentice Hall, p.36-70
Hudson, B. (1973) Assessment Techniques - An Introduction, Methuen, London, p.122-126
Lennox, B. (1974) Hints on Setting and Evaluation of Multiple Choice Questions of the One from Five Type, Association for the Study of Medical Education, Dundee, Scotland
Stratton, J.J. (1981) Recurrent Faults in Objective Test Items, Teaching at a Distance, 20, p.66-73
Education Research and Development Unit, Queensland University of Technology.
Table 4: Advantages and Disadvantages of Different Types of Tests
| Item Type | Advantages | Disadvantages |
| Multiple choice | Simplicity in structure and small chance of guessing; students not exposed to any false statements; applicable to branched learning programmes and other feedback | Often impossible to think of four or five possible distracters; sometimes two or more alternatives may be correct, depending on the sophistication of the students |
| True/false | Simplicity; some statements can only be written with two rather than five alternatives | Too easy to guess, so this must be penalised; few areas are so clear-cut that there is no alternative besides 'true' and 'false'; students exposed to false statements |
| Multiple completion | Easy to construct since several correct answers are permitted; few incorrect distractors need to be constructed; no false statements | Instructions to students are complicated |
| Situation | Comprehension, application and evaluation can be easily tested; relationships between areas can be made by connecting items to a common situation | Difficult to think of situations which are not too similar to those the student has encountered, and yet which s/he has the knowledge and ability to answer; complex to construct |
| Matching | Comprehension can be easily tested; related areas can be brought together; few distractors per item since one list services several items | Complex to construct; instructions are complicated; all responses must be homogeneous in being related in some way; gives false impressions of clear distinctions between categories (eg. of material, properties, interactions) |
| Assertion/Reason | Reasoning ability is easily tested; large numbers of distractors unnecessary | Students exposed to false statements; instructions are complicated; difficult to ensure that a 'reason' is either completely false at all levels of sophistication or is uniquely correct. |
Standard forms of essays require students to
Discuss a quotation, or
Write an essay on ........., or
Describe, Give an account of, Compare, Contrast, Explain ......., or
Assess, Analyse, Evaluate .........
While these types of questions give students the freedom to choose what they will concentrate on and to structure their work themselves, they may also leave the weaker students in some dilemma as to what is required. In addition to these types of questions there is a range of alternatives which can be employed to fulfil certain roles or suit different objectives. Three of these are briefly outlined below. These and other variations appear in Gibbs et al. (1986).
In role play essays, students respond to an essay question from the perspective of a position given in the essay question.
Role play essays help students see the relevance of the task and take an interest in it. Their writing often becomes more fluent and natural. Even small elements of simulation or role play can dramatically change students' approach to questions.
There is a danger of encouraging too flippant an approach, but this can be kept in check by careful phrasing of the question. For example ask students to write to someone in an official position, such as the Minister for Higher Education, or a superior, such as their Managing Director.
Write a letter to the Minister for Higher Education protesting about the lack of university places in Australia, giving economic arguments and emphasising evidence in Government Reports.
Advise Weybridge Electrical Ltd. (by whom you have been employed as a consultant) on the suitability of the circuit designs in Appendix I given the performance specifications in Appendix II.
Structured essays require students to respond to an essay question which specifies the areas or parts that require an answer. For example:
Undertake a stylistic analysis of the following passage. Select, arrange and comment on features of syntax, lexis, semantics and (where relevant) phonology. Relate the artistic effects of the passage to the writer's choice of language.
By specifying the content required in an essay it is possible, when marking, to be clearer whether students know about and understand the specific things which you think matter.
It is difficult to know whether students would know which things matter without prompting. However, this type of essay is useful when you are testing specific knowledge and techniques.
Identify and discuss some of the determinants of urban land values and their impact on urban development. In your answer you should:
a) define the following terms; property rights in land; zoning; site value rating,
b) explain the influence of these terms in determining land values,
c) select one activity of Local Government and one market factor which affects market values and explain how each might influence urban development.
Students are supplied with data or evidence. Using that evidence (which in many subjects with mini-projects or laboratory exercises students may have collected themselves) students are asked to write an essay in which they address a question on that evidence.
The question and the data can relate directly to an exercise previously conducted by the students in which they collected, analysed and interpreted data. Interpretation questions require the students to undertake the analysis "live" and this can avoid regurgitation.
You own a house in a developing suburban area but are considering selling your property and moving closer to the city centre. Given the following demographic data
. . . . . . . . .
. . . . . . . . .
What economic and social factors would you consider in coming to a decision?
Grading of essays is a notoriously unreliable activity. If we read an essay at two different times, the chances are good that we will give the essay a different grade each time. If two or more of us read the essay, our grades will likely differ, often dramatically so. We all like to think we are exceptions, but study after study of well meaning and conscientious teachers shows that essay grading is unreliable (Ebel, 1972; Hills, 1976; McKeachie, 1986; White, 1985). Eliminating the problem is unlikely, but we can take steps to improve grading reliability.
First, using a scoring guide helps control the shifting of standards that inevitably takes place as we read a collection of essays and papers. The two most common forms of scoring guides reflect the two approaches to grading most widely used in universities: analytic and holistic.
Those who use analytic scoring guides identify important components of the essay and assign marks to each component. As they read the essay, they award marks up to the limit specified by the scoring guide and then total the points to determine the essay's grade. An analytic scoring guide is included as the first example at the end of this section. A variation on the analytic method, used in the subject Securities Market Regulation in UTS's School of Finance and Economics, is included as the second example.
Holistic grading methods assume that an essay is other than a sum of particular parts, so we read the essay as a whole. Whereas the analytic scoring guide designates marks for particular aspects of the essay, the holistic scoring guide describes the characteristics of excellent, good and not-so-good essays.
An example of a holistic scoring guide is attached as the last example.
Gibbs, G., Habeshaw, S. and Habeshaw, T. (1986). 53 Interesting Ways to Assess Your Students. Technical and Educational Services: Bristol, pp. 11-26.
Erickson, B.L. and Strommer, D.W. (1991). Teaching College Freshmen. Oxford: Jossey-Bass, pp 145-148.
Ebel, R.L. (1972). Essentials of Education Measurement. Englewood Cliffs, N.J.: Prentice-Hall
Hills, J.R. (1976). Measurement and Evaluation in the Classroom. Westerville, Ohio: Merrill.
McKeachie, W.J. (1986). Teaching Tips: A guide for the beginning teacher. Lexington, Mass.: Heath.
White, E.M. (1985). Teaching and Assessing Writing: Recent Advances in Understanding,Evaluating and Improving Student Performance. San Francisco: Jossey-Bass.
Analytic Essay Scoring Guide (Erickson and Strommer, 1991)
Total marks possible: 6
Statement of position: 1 mark
The essay clearly states the student's position. One does not have to read between the lines.
Support for the position: 2 marks
The essay cites examples or evidence in support of the position. The quality or persuasiveness of the evidence is worth one mark. Originality is worth one mark.
Statement of an alternative position: 1 mark
The essay raises a reasonably significant objection, counterargument, or alternative to the position taken.
Refutation of the alternative: 2 marks
The essay provides examples or other evidence that render the alternatives false or less persuasive.
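The analytic approach, awarding marks up to the limit for each component and then totalling them, might be recorded as in the sketch below. The component limits are those of the guide above; the names `GUIDE` and `total_marks` and the sample marks are hypothetical.

```python
# Maximum marks per component, taken from the analytic guide above.
GUIDE = {
    "statement of position": 1,
    "support for the position": 2,
    "statement of an alternative position": 1,
    "refutation of the alternative": 2,
}


def total_marks(awarded, guide=GUIDE):
    """Total the marks awarded, checking that each stays within its limit.
    Components missing from `awarded` are treated as zero."""
    total = 0
    for component, limit in guide.items():
        mark = awarded.get(component, 0)
        if not 0 <= mark <= limit:
            raise ValueError(f"{component}: {mark} is outside 0..{limit}")
        total += mark
    return total


# Hypothetical essay: clear position, one support mark, raises an
# alternative but does not refute it.
essay = {
    "statement of position": 1,
    "support for the position": 1,
    "statement of an alternative position": 1,
}
print(total_marks(essay), "out of", sum(GUIDE.values()))
```

Keeping the per-component marks, rather than only the total, also gives the student a record of where marks were gained and lost.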
GRADING OF FINAL EXAM
The final exam is open book and relates to understanding the process, players, important factors and alternative solutions affecting a specific regulatory issue. Your assessment is based on your ability to effectively argue a position regarding a specific regulatory issue using the tools and information developed in the course.
Michael and Mark have read your final exam paper. We have awarded you a mark of _____ out of twenty. This follows because we believe that your essay achieved:
| all the major and minor objectives of the question | |
| all the major objectives of the question, but some of the minor ones were not | |
| all the major objectives of the question, but many of the minor ones were not | |
| most of the major objectives of the question, but most of the minor ones were not | |
| some of the major objectives of the question, but none of the minor ones were | |
| only a few of the major or minor objectives of the question | |
| none of the major or minor objectives of the question | |
This result has come from our assessment of your paper in terms of a number of categories of achievement. The ticks below indicate where you stand with regard to each set of statements.
| | D | C | P | F | |
| STRUCTURE | | | | | |
| Essay relevant to topic | | | | | Essay has little relevance |
| Topic covered in depth | | | | | Superficial treatment of topic |
| ARGUMENT | | | | | |
| Logically developed argument | | | | | Essay rambles and lacks continuity |
| Accurate presentation of factors | | | | | Much questionable or inaccurate evidence |
| Rigorous critique of key concepts | | | | | Lack of demonstration of key concepts |
| ORIGINALITY | | | | | |
| Original and creative thought | | | | | Little evidence of originality |
| STYLE | | | | | |
| Fluent piece of writing | | | | | Clumsily written |
| Succinct writing | | | | | Unnecessarily repetitive |
| PRESENTATION | | | | | |
| Legible and well set out work | | | | | Untidy and difficult to read |
| Reasonable length | | | | | Too long/short |
| MECHANICS | | | | | |
| Sentences grammatical | | | | | Many ungrammatical sentences |
| Effective use of figures/tables | | | | | Figures/tables add little to argument |
| Correct spelling throughout | | | | | Much incorrect spelling |
| SOURCES | | | | | |
| Adequate acknowledgement of sources | | | | | Some plagiarism |
Adapted from Educational Services and Teaching Resources, Murdoch University.
Highest possible score: 6
6: The essay clearly states a position,
provides support for the position, raises a counterargument or
objection, and refutes it. The evidence, both in support of the
position and in refutation of counterpositions, is persuasive
and original (that is, drawn from the student's own observations,
not borrowed). The essay tackles a significant objection or counterargument,
not a trivial one. The relationships between position, evidence,
counterargument, and refutation are clear, and the essay does
not contain extraneous or irrelevant information.
5: The essay states a position, supports it, raises an objection
or counterargument, and refutes it. The essay may, however, contain
one or more of the following ragged edges: evidence is not uniformly
persuasive or original; the counter-argument is not a very
serious threat to the position; one has to read between the lines
to see relationships between ideas and some ideas seem out of
place or irrelevant.
4: The essay states a position and raises a counterargument, but
neither is well developed. The objection or counterargument considered
may lean toward the trivial. The essay may also seem disorganised.
Nonetheless, the essay should receive a 4 in acknowledgement of
the cognitive complexity of the task. It is more difficult
to address arguments and counterarguments than it is simply to
support one line of argument.
3: The essay states a position, provides strong and original evidence
supporting the position, and is well organised. However, the essay
does not address possible objections or counterarguments. Thus,
even though the support seems stronger and the essay
may be better organised than the 4 essay, it should not receive
more than a 3.
2: The essay states a position and provides some support, but
it doesn't do it very well. Evidence is scanty, general, trivial
or not original. The essay achieves its length largely through
repetition of ideas and inclusion of irrelevant information. The
overall impression is that the essay has been dashed off at the
last minute.
1: The essay does not state the student's position on the issue.
Instead, it restates the position presented in the assignment
and summarises the evidence discussed in the text or in class.
The essay may include an occasional "I agree with,"
but it provides nothing beyond what was said in class or in the
readings. The essay receives a 1 rather than a 0 because there
may be some merit to being able to summarise what the author of
the text said.
The following checklist, devised as a guide to helping students learn from their assignments, is from Gibbs, G. and Habeshaw, T. (1989). Preparing to Teach. Bristol: Technical and Educational Services, p 125.
Commenting on assignments takes skill: it
is not just a matter of crossing out mistakes. Use this checklist
to review the way you comment on your students' work.
Tick if you ...
| start off with a positive, encouraging comment | |
| write a brief summary of your view of the assignment | |
| balance negative with positive comments | |
| turn all criticism into positive suggestions | |
| make general suggestions for how to go about the next assignment | |
| ask questions which encourage reflection about the work | |
| use informal, conversational language | |
| explain all your comments | |
| suggest follow-up work and references | |
| suggest specific ways to improve the assignment | |
| explain the mark or grade, and why it is not better (or worse!) | |
| offer help with specific problems | |
| offer the opportunity to discuss the assignment, and your comments | |
Short Answer Questions
A large proportion of assessment
items make use of short answer questions of some form (in assignments,
quizzes, examinations, laboratory tests). These questions vary
in expected student response from one word or several lines to
over a page, and include forms such as complete the sentence,
supply the missing line, problems and exercises in science-based
subjects, short descriptive or qualitative answers, essay plans,
diagrams with explanation, etc. The diversity of form means that
no generic description is possible, but they are included in this
section for completeness.
Make the questions precise
Direct questions are better than incomplete statements.
If a numerical answer is required, indicate the units and degree of precision required.
Prepare a structured marking sheet
Allocate marks or part-marks for acceptable answer(s).
Be prepared to accept other equally acceptable answers, some of which you may not have predicted.
Mark questions with the following points in mind
Mark anonymously.
Have different markers for different sets of questions.
For a checklist to help students learn from assignments, see page 31
One example of a "long" short-answer question in chemistry, with a detailed marking scheme, is included.
This question carries 25 percent of the marks for this assignment and tests Objectives 1 and 9 of the course.
The concept of ionic radius is used extensively in chemistry. What observations lead one to suppose that the concept of ionic radius has any validity? Indicate how values of ionic radii may be obtained. Your answer should also indicate potential problems in the use of ionic radii.
Marking Scheme: It is quite acceptable for this question to be answered in note form. You may award full marks even if every point is not included, provided that you feel that the answer is well constructed.
| | Marks |
| Interionic distances in crystals obtained from x-ray diffraction | 3 |
| Impossible to say where one ion ends and the other begins | 3 |
| Series of compounds indicates constant(ish) factor when cation (or anion) is changed. Here values should be quoted | 6 |
| Appears that ionic radii are assignable and roughly additive | 3 |
| Assignment of ionic radii should refer to the Landé method with 'touching' I- in LiI | 6 |
| Problems: additivity not strictly applicable; values vary according to assignment method; values may vary with environment (e.g. co-ordination number) | 4 |
Newble, D. and Cannon, R. (1989). A Handbook for Teachers in Universities and Colleges. New York: Kogan Page, p 107.
Inorganic Chemistry: Concepts and Case Studies, (1981). Milton Keynes: Open University Press.
Conventional examinations, which are normally unseen by students prior to the day they are attempted, are widely used, partly because they have been used for years without question, and partly because they appear to have some advantages. These advantages include being able to sample a range of student learning and testing the students' own work. However, they also have many disadvantages, some of which can be overcome by selecting alternative examination methods.
The main disadvantage of unseen examinations is the extent to which they encourage students to memorise information for examinations rather than attempting to understand it as a component of their overall course. Open book exams go some way towards remedying this disadvantage. Two alternative methods of examining student understanding which remove the incentive to memorise are outlined below. In both cases, the suggestions work best with questions that do not have a unique correct answer.
An examination method more in keeping with the type of "tests" students will encounter in their careers is the take-home exam. Students are given the examination paper and two to seven days to submit their responses. It assesses their ability to research, redraft, and use resources, and places less emphasis on speed and memory than conventional exams.
An excellent way to inform students about what really matters in your subject is to show them the examination paper at the beginning of the semester. This examination paper should contain a broad question on each major topic of the subject (say eight questions), and students should be told that the final examination will only contain (say) three of them and that they will be required to answer all three.
Advantages
Disadvantages
Examples
A variation on this idea is employed in the subject Law for Marketing Managers, offered through the Faculty of Law for Business students. In that subject, the lecturer hands out 12 to 13 questions early in the semester with the advice to students that the final examination will contain 4-5 questions made up of parts or a whole of, and variations on, these 12-13 questions. Using the full range of resources available to them (library books, cases, etc) students prepare their answers free of examination stresses, and are able to take those answers with them into the examination room.
In all assessment schemes it is important that students are given clear criteria at the beginning of the semester and that these criteria do not change during the semester.
Gibbs, G., Habeshaw, S. and Habeshaw, T. (1986). 53 Interesting Ways to Assess Your Students. Bristol: Technical and Educational Services, pp. 49-64.
James Cooper, Faculty of Law.
A learning contract is a structured method whereby each student, in consultation with a staff adviser, designs and implements manageable learning activities. The emphasis is on making each activity relevant to those professional and personal needs of the student which are consistent with the aims of the course and/or subject.
Advantages
The principal advantages of the learning contract method are:
A learning contract is an agreement between a student and a staff adviser that the student will undertake a specified activity and produce evidence that the activity was successfully completed.
The role of the adviser is to negotiate the contract, monitor progress and participate in the final assessment. Normally guidelines are developed explaining how to set out the contract, the parameters within which the contract should be negotiated, and any generic assessment criteria.
Typically a "Learning Contract Proposal" is developed which comprises the following elements (see the examples at the end of the section):
Purpose
What is required here is a clear statement of the purpose of the contract. This may be stated as an objective, a problem, an issue etc. The purpose will need to be appropriate to the subject or course and be sufficiently challenging to warrant inclusion in a degree level program.
Strategies and resources
Here students indicate the references they intend to consult, the people they will arrange to interview, the research they will conduct, the material they will prepare to gather data (e.g. an evaluation form), the places they will visit, etc.
What will be produced
Mostly students produce written work of some kind, e.g. a report, an essay, a journal, a book review etc. But other forms of presentation are also acceptable, e.g. video-recording, audio-recording, charts and diagrams, models, etc.
Adviser comments
Here the adviser will make comments and suggestions on the contract proposal. The adviser will draw attention to any weaknesses in the proposal, note any special conditions, and set out the assessment criteria.
1. The learning contract method may be linked to a particular subject or it can be a subject in its own right (e.g. referred to as Individualised Projects). In the latter case, students have more degrees of freedom in developing their contracts, but the expertise of advisers is stretched. Both approaches have merit and may be used concurrently in a given course, but they need to be administered differently.
2. The learning contract can be thought of as an assessment device or a learning device. This has implications for staffing allocations. If it is merely an assessment device connected with a particular subject, then the lecturer will receive no additional teaching hours beyond the face-to-face hours allocated. If it is seen as a learning device, with the adviser playing a crucial and on-going role, then teaching hours need to be allocated to the activity. In this way it can be seen as an alternative to learning in a large group. Instead, there is face-to-face, one-on-one or small group contact with an adviser.
3. The assessment criteria may be generic or individually negotiated. In the latter case, different criteria may be applied to the same subject, which certainly raises equity considerations. These may be overcome by having a generic standard which is applied to the criteria developed for individual contracts.
4. It is necessary to ensure that learning contract work has not been submitted elsewhere for credit. Some kind of statement on the cover of the contract may be required, but there is always the marginal case where some other work has been adapted or edited. It is normal practice to keep records of contracts completed.
5. The completed learning contract may deviate significantly from the proposal but be otherwise satisfactory. Ideally, contract proposals should be re-negotiated; even so, if the final product fits within the parameters it should be assessed in the normal way.
6. Some contracts may be negotiated which are not within the adviser's area of expertise. This will invariably occur with learning contracts. The problem is that the level of advice which can be given is less than would normally be the case. Also the adviser is not in a good position to assess the final product. Instances such as these can be handled by re-allocating advisers for a given contract, or using a back-up marker, or even using a panel of advisers to check on marking procedures.
Caffarella, R. and Caffarella, E. (1986). Self-directedness and learning contracts in adult education. Adult Education Quarterly, 36, 226-234.
Hiemstra, R. and Sisco, B. (1990). Individualising instruction. San Francisco: Jossey-Bass.
Knowles, M. (1986). Using learning contracts. San Francisco: Jossey-Bass.
O'Donnell, J. and Caffarella, R. (1991). Learning contracts. In Michael Goldsmith (Ed.) Adult learning methods. Malabar: Krieger, pp. 133-160.
Tompkins, C. and McGraw, M.J. (1988). The negotiated learning contract. In David Boud (Ed.) Developing Student Autonomy in Learning. London: Kogan Page.
School of Adult and Language Education
School of Teacher Education
School of Adult Vocational Education
School of Nursing Health Studies
School of Information Studies.
Professor David Boud, School of Adult and Language Education
Associate Professor Mark Tennant, School of Adult and Language Education
Trish Farrar, School of Nursing Health Studies
| UNIVERSITY OF TECHNOLOGY SYDNEY, KURING-GAI CAMPUS. NURSING STUDIES VIA. GUIDELINES FOR COMPILING A PERSONAL RESOURCE FILE. In your Personal Resource File, you should include the following: |
|
| 1. | journal articles to which you have referred in the "Learning Procedures" column of your learning contract; |
| 2. | written information which helps you to meet your learning objectives eg. pharmaceutical information, equipment instructions, newspaper cuttings etc; |
| 3. | completed worksheets which provide evidence of accomplishment of your learning objectives; |
| 4. | nursing care plans which you completed during your nursing practice experience placement; |
| 5. | rating scales which you may have devised to demonstrate your learning progress. |
You should acknowledge the information contained in your Personal Resource File in your Reflective Journal, in accordance with the "Guidelines". A suggested method for presenting your Reflective Journal and Personal Resource File is to use a ring-binder and plastic sleeves. The Reflective Journal can be included at the front, with your Personal Resource File enclosed in the plastic sleeves. Each plastic sleeve could contain the information pertinent to one learning objective. Don't forget to include your Learning Contract, as this guides your Reflective Journal and Personal Resource File. The above is only one suggested method of presentation - it is not mandatory. |
|
|
| Student's Name: | Agency/Facility: | Facilitator: | Date Negotiated: |
| LEARNING OBJECTIVES "What am I going to learn?" | LEARNING RESOURCES AND STRATEGIES "How am I going to learn it?" | EVIDENCE OF ACCOMPLISHMENT "How am I going to know that I have learned it?" | CRITERIA AND MEANS OF VALIDATING EVIDENCE "How am I going to prove that I have learned it?" |
| 1. To update my knowledge of cardiac pharmacy. | - Read journal articles in "Current Therapeutics" and "Heart and Lung". - Collect drug literature from medications I administer. | - Summarise articles for annotated bibliography. - Complete inventory of drugs which I administer in CCU. | - Present annotated bibliography in Personal Resource File. - Include inventory in Personal Resource File. |
| 2. To become proficient in haemodynamic monitoring. | - Read articles in "Nursing '89". - Consult with the Clinical Nurse Specialist. | - Complete the exercises and worksheet on haemodynamic monitoring in "Critical Care Nursing" Work Manual. - Care for a patient with haemodynamic monitoring without feeling apprehensive. | - Ask the Clinical Nurse Specialist to supervise and assess me during haemodynamic monitoring. - Include completed worksheet in Personal Resource File. - Document my personal achievement in haemodynamic monitoring. - Reveal my feelings about haemodynamic monitoring in my Reflective Journal. |
| 3. To develop a new rotating roster in the Coronary Care Unit. | - Consult with Time Management consultant. - Research the area on Med Line. | - Write an annotated bibliography of sources consulted. - Explain the new system to my colleagues and debate the pros and cons. | - Implement the new roster system. - Include a summary in my Personal Resource File. - Present "spotlight". |
Peer assessment, in which students comment on and judge their colleagues' work, has a vital role to play in formative assessment, but it can also be used as a component in a summative assessment package.
One of the desirable outcomes of education should be an increased ability in learners to make independent judgements of their own and others' work. Peer and self assessment exercises (see page 45) are seen as means by which these general skills can be developed and practised. A peer rating format can encourage a greater sense of involvement and responsibility, establish a clearer framework and promote excellence, direct attention to skills and learning, and provide increased feedback (Weaver and Cotrell, 1986).
In terms of summative assessment, studies have found student ratings of their colleagues to be both reliable and valid. Orpen (1982) found no difference between lecturer and student ratings of assignments in terms of average ratings, variations in ratings, agreement in ratings or relationship between ratings. Arnold et al. (1981) reported that peer ratings of medical students were internally consistent, unbiased and valid. Other studies suggest there is variation according to factors such as age of the student (Falchikov, 1986).
Reported uses of peer assessment for summative purposes include essay writing, clinical skills, speeches and oral presentations, architectural designs, interpersonal skills, photography and small group activities (Kane and Crawford, 1989). In all cases, the contribution to the overall assessment result is small (10-30%).
The second of the two examples of Peer Assessment forms attached is used to determine an individual's contribution to a group's activity. For more details see Group Work, page 72.
Advantages
Disadvantages
Arnold, L. et al. (1981). Use of peer evaluation in the assessment of medical students. Journal of Medical Education, 56, 35-42.
Falchikov, N. (1986). Product comparisons and process benefits of peer group and self assessments. Assessment and Evaluation in Higher Education, 11, 146-166.
Orpen, C. (1982). Student versus lecturer assessment of learning: a research note. Higher Education, 11, 567-572.
Weaver, W. and Cotrell, H.W. (1986). Peer evaluation: a case study. Innovative Higher Education, 11, 25-39.
Extracts and examples from: Kane, R.L. and Crawford, J. (1989). Peer ratings of oral presentations by students: some preliminary data. UTS Faculty of Business Working Paper.
| PROJECT PRESENTATIONS PEER RATING FORM This form would be used to RATE THE PROJECT PRESENTATION YOU WILL BE HEARING. Please try to rate honestly. The student giving the presentation will not see this form, but will receive a tally of all the responses to their presentation. For all categories, please CIRCLE the number that is the nearest approximation to your opinion. Please do not circle more than one response for each item, or place markings between the numbers on the scale. IN YOUR OPINION, HOW EFFECTIVE WAS THIS PRESENTATION IN TERMS OF: |
| | | Weak | Fair | Good | Very Good |
| 1. | Providing a brief summary of the project (purpose, methods, results, conclusions, etc)? | 1 | 2 | 3 | 4 |
| 2. | Explaining and illustrating the important points? | 1 | 2 | 3 | 4 |
| 3. | The knowledge shown by the presenter? | 1 | 2 | 3 | 4 |
| 4. | Provoking and controlling discussion? | 1 | 2 | 3 | 4 |
| 5. | The extent to which the issues were new to you? | 1 | 2 | 3 | 4 |
| 6. | Presentation style? | 1 | 2 | 3 | 4 |
| 7. | Timing? | 1 | 2 | 3 | 4 |
| 8. | Gaining and holding your interest? | 1 | 2 | 3 | 4 |
| 9. | The extent to which you learned something from it? | 1 | 2 | 3 | 4 |
| 10. | The likely usefulness of this learning to you? | 1 | 2 | 3 | 4 |
|
2 Inadequate 3 4 5 Passable 6 Reasonably good 7 Good 8 Very good 9 Outstanding in every respect 10 Perfect in every respect |
|||||
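The tally that each presenter receives from the ten-item form could be compiled along the following lines. This is only a minimal sketch: the function name `tally_responses`, the layout of a completed form as a list of ten integers, and the sample responses are all assumptions made for illustration, not part of the form as used at UTS.

```python
from collections import Counter

# Scale points printed on the form: 1 = Weak, 2 = Fair, 3 = Good, 4 = Very Good.
SCALE = (1, 2, 3, 4)
N_ITEMS = 10  # the form lists ten items


def tally_responses(forms):
    """Return, for each item, how many raters circled each scale point.

    `forms` is a list of completed forms, each a list of N_ITEMS integers
    (a hypothetical data layout chosen for this sketch).
    """
    for form in forms:
        # The form asks for exactly one circled number per item.
        if len(form) != N_ITEMS or any(r not in SCALE for r in form):
            raise ValueError("each form needs one circled response (1-4) per item")
    tallies = []
    for item in range(N_ITEMS):
        counts = Counter(form[item] for form in forms)
        tallies.append({point: counts[point] for point in SCALE})
    return tallies


# Three invented forms for one presentation.
forms = [
    [3, 3, 4, 2, 3, 3, 4, 3, 3, 2],
    [4, 3, 4, 3, 2, 3, 3, 4, 3, 3],
    [3, 2, 4, 2, 3, 4, 4, 3, 4, 3],
]
for number, counts in enumerate(tally_responses(forms), start=1):
    print(f"Item {number}: {counts}")
```

Reporting counts per scale point, rather than a single average, keeps the spread of opinion visible to the presenter while still hiding who gave which rating.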
Examples of Peer Rating Forms (Securities Markets Regulation)
PEER EVALUATION Name ____________________________________ Group __________________________ |
Please try to assign scores that reflect how you really feel about the extent to which the other members of your group contributed to your learning and/or your group's performance. This will be your only opportunity to reward the members of your group who actually worked hard on your behalf. If you give everyone pretty much the same score you will be hurting those who did the most and helping those who did the least. Instructions: In the space below please rate each of the other members of your group. Each member's peer evaluation score will be the average of the points they receive after the highest and lowest scores have been deleted and the scores have been standardised so that the average peer evaluation score for all groups is identical. To complete the evaluation you should: 1. List the name of each of the members of your group in the alphabetical order of their last names. 2. Assign an average of ten points to the other members of your group. (Thus, for example, you should assign a total of 50 points in a six member group; 60 points in a seven member group; etc.) 3. Differentiate some in your ratings, e.g. you must give at least one score of 11 or higher (maximum = 15) and one score of 9 or lower. |
|
2. ______________________________ __________ 3. ______________________________ __________ 4. ______________________________ __________ 5. ______________________________ __________ 6. ______________________________ __________ 7. ______________________________ __________ 8. ______________________________ __________ |
Additional Feedback: In the space below would you also briefly describe your reasons for your highest and lowest ratings. These comments - but not information about who provided them - will be used to provide feedback to students who would like to receive it. |
| 1. Reason(s) for your highest rating(s). (Use back if necessary) |
| 2. Reason(s) for your lowest rating(s). (Use back if necessary) |
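The scoring rules stated on the peer evaluation form can be sketched in code, which may help staff check allocations or explain the calculation to students. The form does not say how scores are standardised, so the multiplicative rescaling below is an assumption, as are the function names `check_allocation`, `trimmed_average` and `standardise`, the lower bound of zero on a score, and the sample group.

```python
from statistics import mean

TARGET_AVERAGE = 10   # "an average of ten points"
MAX_SCORE = 15        # "maximum = 15"


def check_allocation(scores, group_size):
    """Return a list of problems with one rater's scores (empty if none)."""
    others = group_size - 1
    problems = []
    if len(scores) != others:
        problems.append(f"expected {others} scores, got {len(scores)}")
    if sum(scores) != TARGET_AVERAGE * others:
        problems.append(f"scores must total {TARGET_AVERAGE * others}")
    # The lower bound of 0 is assumed; the form states only the maximum.
    if any(s > MAX_SCORE or s < 0 for s in scores):
        problems.append(f"each score must lie between 0 and {MAX_SCORE}")
    if not scores or max(scores) < 11 or min(scores) > 9:
        problems.append("need at least one score of 11+ and one of 9 or lower")
    return problems


def trimmed_average(received):
    """Average the scores received after deleting one highest and one lowest."""
    if len(received) < 3:
        raise ValueError("need at least three scores to trim both extremes")
    return mean(sorted(received)[1:-1])


def standardise(group_scores):
    """Rescale one group's scores so that their average equals TARGET_AVERAGE.

    Multiplicative rescaling is an assumption made for this sketch.
    """
    factor = TARGET_AVERAGE / mean(list(group_scores.values()))
    return {name: score * factor for name, score in group_scores.items()}


# Invented scores received by each member of a four-member group.
received = {
    "Member A": [15, 11, 12],
    "Member B": [9, 10, 9],
    "Member C": [10, 9, 11],
    "Member D": [6, 10, 8],
}
raw = {name: trimmed_average(scores) for name, scores in received.items()}
final = standardise(raw)
for name, score in final.items():
    print(f"{name}: {score:.2f}")
```

Deleting the extreme scores limits the influence of a single unusually generous or harsh rater; in a four-member group it leaves only the middle score of the three received.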
The following commentary is supplied by a user of oral exams in the School of Teacher Education at UTS.
By the fifth semester of the sequence students have had considerable experience in thinking about literature and in articulating their responses to it. An oral examination at this stage does not make demands upon them which they feel incapable of meeting.
By this stage some students have developed particular areas of interest and are beginning to become independent thinkers about literature. The oral examination gives them an opportunity to take up points and pursue them in a way not always appropriate in a response to a set examination or essay question. (It is always possible to say to a student, in an oral examination: "That is an interesting idea. Would you like to develop it further?")
The oral examination is more demanding because it allows the examiner to probe understanding. Students cannot hide behind a learned opinion or borrowed position. (It is always possible to ask: "Just what exactly do you mean by that? Why do you think that critic says that? What about the position taken by X?")
The oral examination requires students to think on their feet, i.e. to adapt their knowledge and insights to a variety of demands and defend their position. Students in the final semesters of their sequence should be able to perform in this way. At the beginning of their course of study, on the other hand, it is probably sufficient to expect that students will arrive at some understanding and be able to express it.
Each student has a session which lasts approximately twenty minutes.
Students select a general topic on which they wish to be examined, e.g. King Lear. Every student who has selected that topic will be asked the same set of questions.
The prepared questions demand a hierarchy of responses. Some are designed simply to establish that the student has a basic, literal knowledge of the topic (e.g. of the events in King Lear). Some are designed to see whether the student has thought about the significance of that knowledge (e.g. "In the source play for King Lear Cordelia does not die. Why do you think Shakespeare modified the story to include her death? How do you see her death in relation to the 'meanings' of the play as you understand them?"). Some are designed to encourage students to pursue any particular interests or insights they may have had ("Do you agree with...?" "If you were producing a performance of the play would you...?") and there is always a final question: "Is there anything else you would like to say about this play/topic that we haven't covered?" Some questions are designed to see whether students have related their thoughts about the topic to wider issues (e.g. "King Lear was one of the least often performed of Shakespeare's plays during the nineteenth century. Since World War II it has been one of the most frequently performed and discussed plays. Have you any thoughts about why this might be so?").
If students have difficulty in answering questions some prompt questions are asked. But if the difficulty persists, such that it is apparent that the student does not have sufficient knowledge or understanding to answer the question, we pass on to the next question. It is important, in that situation, to move on without undermining the student's confidence by making it apparent that they have 'failed' a section of the examination.
Ideally two or three staff members conduct the exam, sharing the asking of questions, and discussing the student's responses at the conclusion of the examination. In recent years staffing constraints have made this difficult, and sometimes it has not been possible to use additional staff. For this reason the oral examination is never worth more than 40% of the total. Notes are made on the student's responses as the examination proceeds, and a grade is decided before the next student is dealt with.
Broad grades rather than precise marks are awarded, but perhaps with a plus or minus modification, which may be taken into account when determining the final, overall result for the subject.
For a pass grade a student needs to demonstrate a basic knowledge of the to