IDENTIFICATION OR RECOGNITION OF HIGH ABILITY
Peter Tilsley
The role of testing in alternative models of educational practice is discussed first, followed by a detailed consideration of issues affecting the choice of tests and the interpretation of test data.
All of us involved with pupils develop personal views of their abilities, whether these are explicitly stated and recorded or remain private beliefs. The actions we take towards pupils are influenced by those beliefs. It is therefore important to consider carefully the basis on which the beliefs are founded and whether they are consequently appropriate. If we do not, our actions may unwittingly disadvantage pupils, sometimes severely.
Testing of one sort or another is a major source of information on which we base our beliefs about pupils' abilities. It is therefore, I believe, important that we consider very carefully our use of tests and our interpretation of the resulting data.
Some schools and teachers place great reliance on test data for allocating children to different groups. Others appear to regard the use of tests as undesirable. In such cases the tests they have in mind seem to be standardised tests like those used in 11+ selection. Indeed, their reservations may be a legacy of the perceived difficulties, dangers and undesirable outcomes stemming from the use of tests in 11+ selection.
Dangers, difficulties and undesirable outcomes such as labelling, stereotyping, misplacement and general relationships between scores and variables such as social class, gender, ethnicity, language background and the like, are indeed potentially present when tests and test data are used inappropriately. However, this applies to all tests, not just standardised instruments. It is thus important to recognise that marks and grades resulting from classwork, teacher devised classroom tests and examinations are test data. Such devices and data are at least equally prone to the same dangers, difficulties and undesirable outcomes and in some cases may be more prone to them than standardised tests.
Nevertheless, appropriate testing can provide valuable information about pupils which can be used to their benefit. The key word is 'appropriate'. The tests used must be appropriate for the precise purpose for which they are used. They must also be appropriate for the population being tested and the resulting data must be interpreted appropriately.
Unless these conditions are met, decisions made about pupils on the basis of test data may be inappropriate and detrimental to them. Where the conditions are met, useful indicators of ability can be obtained. Particularly in the case of standardised tests, indications may be obtained of high ability which has not shown itself in the classroom. The converse may also be the case, and it is, in the author's view, important to stress that we should take account of the most optimistic evidence we have rather than the pessimistic evidence.
Thus if test scores indicate higher ability than we have detected in the classroom, let us take note of this apparent higher ability rather than putting the high score down to a fluke. Conversely, if we believe a child has high ability but test scores do not indicate this let us go with our professional intuition. In all cases give the child the benefit of the doubt.
In what follows, issues affecting appropriateness will be discussed particularly in relation to use of standardised tests but these issues also apply to other forms of testing.
The use of tests and test data for identification or recognition purposes depends on two things: the concept or definition of high ability employed in the school, and the intended function of identifying children, or of recognising high abilities, within provision for highly able children.
In using the term 'identification' I refer to the process of determining whether an individual meets criteria for inclusion in a category, for example 'gifted child', with a view to providing them with a curriculum appropriate for that sort of child. On the other hand I use the phrase 'recognition of high abilities' to refer to the process of determining profiles of strengths amongst individuals with a view to providing for the development of their specific strengths as individuals. The distinction is, I believe, important for several related reasons:
The points above can be linked to two alternative models of educational provision. As represented in Figures 1 and 2 and described below, they are ideal types. That is, they are shown as theoretically pure forms unmodified by reality. In reality it is likely that both will be operating together but with one dominant to a greater or lesser extent, with consequent differences in both provision and the choice and use of tests in assessment procedures.
1. First is what I call The DIP model (Definition - Identification - Provision). It might also be called the prescriptive curriculum model.

In a nutshell, children are chosen to suit a curriculum predesigned for the category they fit.
In this model we have a clear concept of what we mean by the category label in use, such as 'more able' or 'gifted'. This concept may be narrow, for example just in terms of intelligence or of high ability in maths and English. Alternatively, it may be broad, embracing a variety of dimensions of human ability, as for example the definition accepted by HMI which specifies:
"general intellectual ability, specific aptitude in one or more subjects, creative or productive thinking, leadership qualities, ability in creative or performing arts and psychomotor ability."
(HMI 1992:1)
As a working definition for identification and categorisation purposes, specific criteria for inclusion in the category need to be specified. For example, in the case of high intelligence a criterion of IQ 130+ might be set; children not achieving this would then not be categorised as highly able or gifted. Alternatively, a proportion might be specified, such as the top 5% in the school or the top 30 pupils. The latter approach is commonly used, for organisational purposes, in setting arrangements. A child who achieves 31st place is then placed in the second rather than the top set, even though his or her achievement may be only marginally lower, there may be no real difference in ability and, because of deficiencies in assessment, the rank order itself may be suspect.
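The two ways of setting criteria described above, a fixed cut-off and a fixed number of places, can be contrasted in a short Python sketch. The function names and the year group of 32 pupils are hypothetical, invented purely for illustration; the point is that under a quota the boundary between "in" and "out" can fall between two pupils whose scores differ by a single point.

```python
def by_criterion(scores, cutoff):
    """Fixed criterion: every pupil scoring at or above `cutoff` (e.g. IQ 130+)."""
    return {name for name, s in scores.items() if s >= cutoff}


def by_quota(scores, places):
    """Fixed number of places: the top `places` pupils by score (e.g. 'the top 30')."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:places])


def boundary_gap(scores, places):
    """Score difference between the last pupil admitted and the first pupil left out."""
    if not 0 < places < len(scores):
        raise ValueError("places must be between 1 and one less than the group size")
    ranked = sorted(scores.values(), reverse=True)
    return ranked[places - 1] - ranked[places]


# Hypothetical year group of 32 pupils with scores 140, 139, ..., 109.
scores = {f"pupil{i:02d}": 140 - i for i in range(32)}

top_set = by_quota(scores, 30)
print(len(by_criterion(scores, 130)))                    # 11
print("pupil30" in top_set, boundary_gap(scores, 30))    # False 1
```

Here the pupil in 31st place (`pupil30`) misses the top set by one point, which is well within the measurement error discussed later.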
Once the definition and criteria are established, measures appropriate for assessment of children are decided upon and used. These should provide appropriate data on all the dimensions of ability recognised in the definition. Thus, standardised tests of such dimensions as general intelligence, creative thinking, verbal reasoning or attainment in specific subjects might be used. In addition, or alternatively, teacher devised tests such as coursework assessments, classroom tests and end of year exams might be used. As noted earlier, these are tests since they purport to measure children's ability or attainment. Thus issues related to testing which are discussed below apply to them as well as to standardised tests.
Children's performance on the chosen measures is then checked against the categorisation criteria and children are allocated to appropriate groups. These may be whole classes, as in setting or withdrawal groups, or groups within a class such as the red table or top maths group.
The different groups established are then provided with a curriculum considered appropriate for the category of pupil in that group. Thus set 1 maths may be different from set 2 maths in content and/or pace of learning and/or style of teaching and learning adopted.
The expected achievement of learning objectives is also likely to be different for different groups and children's achievement of these objectives will be assessed through the use of tests, though not usually standardised tests. On the basis of assessment data children may be re-categorised and reallocated to groups. The process then continues as before.
2. The second model I call The PEP model (Provision - Evaluation - Provision). It might also be called the person development model.

In a nutshell, the curriculum is designed and then modified to suit the children it serves as their abilities unfold.
In this model the first, general and tentative definition is of valued areas of ability, not of categories of children. Initial, tentative consideration of children's abilities is intended to guide the planning of curriculum opportunities, not to allocate children to groups.
The resulting curriculum design is open ended in that its aims are not restricted to the achievement of predetermined learning objectives. These may be included, but an equally important aim is to provide opportunities for educationally valued abilities of whatever sort, possibly unsuspected by either teacher or child, to show themselves, to be used and to be developed to the highest possible levels. The initial design of the curriculum for any given group of children, whether a class, year group or age range, is necessarily tentative, although it is guided by the first general definition, initial consideration of possible abilities and previous professional experience. It is tentative because there is no certainty about the abilities and levels of ability which the children to be taught may possess, nor about all that should be achieved or is achievable by them. It allows for surprises to emerge.
Provision is thus essentially one of opportunity, and opportunity requires not only appropriate content, processes and tasks but also a facilitating environment. This pre-eminently requires an ethos in which risks can be taken and failure in a specific task is seen as part of the process of self-discovery and self-development: an ethos in which the teacher is receptive to signs of unexpected abilities and encourages personal development in areas of strength, and in which deviation from the norm is thoroughly acceptable in areas considered worthy of value.
Concurrent with provision is assessment of children's responses to the opportunities. This assessment has two purposes: first, to aid in the recognition of previously unrecognised abilities in individual children and amongst the children as a whole; second, to permit evaluation of the appropriateness of the curriculum provision for the children it is intended to serve.
On the basis of this evaluation of both children's abilities and curriculum provision to develop those abilities, the planned curriculum experience is modified to take account of: any previously unrecognised abilities; any mismatches between the previous curriculum demand and the levels of ability in the various dimensions of ability revealed by individual children. The modified curriculum is then implemented and the cycle repeated in an ongoing process.
The role of testing in this model differs from that in the first, where its role is categorisation. Here it aims to provide data on the widest possible range of ability dimensions which might manifest themselves, and on the full range of levels of ability, up to the highest conceivable, which might be revealed on any dimension. In other words, it aims to provide profiles of abilities for all children rather than to categorise them.
As noted earlier, in practice neither model is likely to operate in pure form. Both are likely to be operating, though with one dominant. My view is that, as far as is practicable, it is preferable to emphasise the PEP model. There are many reasons for this view which cannot be explored here, but the reasons relating to testing are that neither the meaning nor the accuracy of data from tests, of whatever sort, is wholly certain. Consequently categorisation cannot be certain and children may be misplaced, with detrimental effects. For example, children misplaced 'high' may face expectations which are too demanding for them, with possible consequences of anxiety and lowered self-confidence. Conversely, those misplaced 'low', for example in too low a set, may have little subsequent opportunity to demonstrate or develop their high ability because of an undemanding curriculum, and may learn to respond at a relatively low level in accordance with low expectations.
This is not to say that grouping by ability should be avoided. Indeed, I believe the provision of opportunity for children of high ability to work with others of similar ability is an important feature of the facilitating environment in the PEP model. But that is another story.
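The contrast between categorising (DIP) and profiling (PEP) can be made concrete with a small Python sketch. The `Pupil` class, the dimension names, the quotients and the thresholds are all hypothetical; the sketch only shows that the same data yields a single yes/no label under one model and a list of strengths under the other.

```python
from dataclasses import dataclass, field


@dataclass
class Pupil:
    name: str
    # Hypothetical quotients (mean 100) on whatever dimensions the school values.
    profile: dict = field(default_factory=dict)


def dip_category(pupil, dimension, cutoff):
    """DIP-style: one label from one criterion on one dimension."""
    return "gifted" if pupil.profile.get(dimension, 0) >= cutoff else "not categorised"


def pep_strengths(pupil, threshold):
    """PEP-style: every dimension at or above `threshold`, strongest first."""
    strong = [(dim, q) for dim, q in pupil.profile.items() if q >= threshold]
    return sorted(strong, key=lambda pair: pair[1], reverse=True)


sam = Pupil("Sam", {"general intelligence": 118, "mathematics": 136,
                    "visual arts": 131, "verbal reasoning": 108})
print(dip_category(sam, "general intelligence", 130))  # not categorised
print(pep_strengths(sam, 125))  # [('mathematics', 136), ('visual arts', 131)]
```

Categorised on general intelligence alone, this hypothetical pupil is excluded; the profile shows two areas of high ability that provision could build on.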
Nor is it to say that testing is inappropriate in either model. It is necessary in the DIP model and, I believe, highly desirable in the PEP model both in the initial tentative consideration of children's abilities and as part of the curriculum opportunities to show high abilities.
It is to say that sensitivity and caution in the interpretation of test data, for whatever purpose, are necessary if decisions made on the basis of that data are not to disadvantage children. The reasons for this view will become apparent from the following discussion.
Key questions are:
If, according to the DIP model discussed above, the purpose is to identify pupils who meet specified criteria so that they can be grouped accordingly, then differences in ability or achievement beyond those criteria are not of fundamental importance. Thus the tests chosen should be capable of differentiating well at the borderline but need not provide differentiation much above it.
On the other hand if the PEP model is adopted then the purpose is to recognise the differential abilities of all pupils so that curriculum provision can be tuned as far as possible to the individual child's levels of abilities in a wide range of dimensions. Thus tests chosen should be capable of differentiating clearly across the wide ability range, including the very highest. Any one specific test may not be capable of this. Consequently, more than one may be needed in respect of any one dimension of ability. Thus for example in the case of general intellectual ability, one test may discriminate well in the mid range of abilities but may not discriminate well at the top end, thus bunching the scores of the most able. Consequently, we would get very little information about the highest levels of ability present.
This often applies particularly to tests designed for a specific age group. In such cases a test designed for pupils older than those being tested, e.g. the higher levels of the Cognitive Abilities Test (Thorndike et al), may provide discrimination at the top end. Alternatively, a test designed to discriminate well at the top end as well as in the mid range could be given. The AH2 group intelligence test (Heim et al) is one such instrument, which can be used with children over a wide age range, from about 10 to 18. Otherwise, following use of a general purpose test, children scoring in the top 20-25% might be retested using a test designed specifically to discriminate well at the top end.
The answer to this depends on our definition of more able/gifted and its specification of the range of dimensions on which it is considered important to recognise high abilities. Two sub-issues are whether these are conceptualised in terms of ability or achievement, and whether they are general or specific.
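The bunching of scores at the top end, and the two-stage procedure of retesting the top 20-25% on a harder test, can be sketched in Python. This is a deliberately simplified illustration under a stated assumption: each pupil's 'true' level is represented as the number of items they could answer on an unlimited test, and a test's ceiling simply caps that number. Real tests are scaled quite differently, and the function names and data are hypothetical.

```python
import math


def observed_scores(true_levels, max_raw):
    """Raw scores on a test with `max_raw` items: anyone capable of more is capped."""
    return {name: min(level, max_raw) for name, level in true_levels.items()}


def select_for_retest(scores, top_fraction=0.25):
    """Pupils in roughly the top `top_fraction` of a first, general-purpose test.

    Pupils tied with the last selected pupil are all included rather than
    split arbitrarily.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if not scores:
        return set()
    n = math.ceil(len(scores) * top_fraction)
    cutoff = sorted(scores.values(), reverse=True)[n - 1]
    return {name for name, s in scores.items() if s >= cutoff}


# Hypothetical 'true' levels: items each pupil could answer if the test were long enough.
true_levels = {"A": 28, "B": 35, "C": 41, "D": 47, "E": 55, "F": 22, "G": 31, "H": 19}

first_test = observed_scores(true_levels, max_raw=40)   # C, D and E are all capped at 40
shortlist = select_for_retest(first_test, 0.25)          # top quarter goes forward
harder_test = observed_scores({p: true_levels[p] for p in shortlist}, max_raw=60)
print(first_test)
print(sorted(shortlist), harder_test)
```

On the first test the three most able pupils are indistinguishable; only the harder test separates them. Including ties in the shortlist gives borderline pupils the benefit of the doubt.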
The range of dimensions can be broad or narrow. If narrow, such as intelligence only, then the number of tests needed will be small. If broad, including for example not only intelligence but also creative thinking, specific academic abilities, performing and visual arts abilities and sporting abilities, then the number of measures needed will be large.
The number and variety of measures used must be sufficient to assess the full range of dimensions otherwise only lip service is being paid to a broad definition. Thus for example intelligence tests provide little information on creative thinking ability. Moreover amongst highly intelligent pupils specific academic ability may not be significantly related to their general intelligence level. Furneaux and Rees (1978) show this in relation to mathematical ability.
Additionally, the view that intelligence is a general, global, unitary human attribute is challenged by much psychological research. This research suggests that intelligence is multidimensional and that individuals may have varying levels of ability on different dimensions (e.g. Guilford, 1977; Gardner, 1993). Indeed, it seems that amongst the most intelligent, the relationship between levels of ability on different dimensions is particularly weak (Detterman, 1993). Thus an individual may be gifted on one dimension but of significantly lower ability on others. This is unlikely to be revealed by a general intelligence test, since high performance on items related to the dimension of strength (if there are any such items) will be offset by lower performance on other items, resulting in an overall score possibly well below the criterion adopted for identifying giftedness.
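The masking effect can be shown with a few lines of arithmetic in Python. The subscale names and quotients are hypothetical, and the overall score is taken as a plain average of the subscales; real tests combine and re-norm subscales in their own ways, so this is only an assumption made for illustration.

```python
# Hypothetical subscale quotients for one pupil; the dimension names are illustrative.
subscales = {"verbal": 104, "numerical": 148, "spatial": 112, "memory": 108}

# Simplification: treat the overall score as the plain average of the subscales.
composite = sum(subscales.values()) / len(subscales)
strongest = max(subscales, key=subscales.get)

print(composite)                                       # 118.0
print(strongest, subscales[strongest])                 # numerical 148
print(composite >= 130, subscales[strongest] >= 130)   # False True
```

A pupil who would clear an IQ 130+ criterion comfortably on the numerical dimension falls well short of it on the overall score.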
In order to identify the areas of specific high abilities, separate scales for each of the dimensions considered important are needed. Individual intelligence tests (such as WISC and BAS) administered by Educational Psychologists provide such separate scale scores. Group pencil and paper tests which can be used by teachers less commonly do so though some, such as AH2 and CAT, include a limited range of separate scales. Similar considerations apply to general areas of ability other than intelligence.
Setting is commonly considered to be a form of ability grouping. However, assessment to determine placement of children in different sets is commonly assessment of achievement on coursework, classroom tests, examinations or standardised achievement tests. Thus setting can commonly be considered to be grouping by achievement rather than ability. It is important to distinguish between the two.
If it is the explicit intention to assess attainment, for example in mathematics, then it is appropriate to use attainment measures. The consequence of this for setting is that some children of very high ability who have not achieved highly for one or more of the many reasons affecting achievement, will not be placed in the top set.
If the intention is to assess ability, then it follows that the best available measures of ability should be used, not measures of attainment. Thus, for example, an intelligence test, though less than perfect, is a better measure of intellectual ability than previous or present academic achievement.
Reference has already been made above to the distinction between ability and achievement. Tests of whatever sort need to be judged against this distinction when considering choices.
Other considerations also apply. One of particular importance is that almost all tests assume a certain commonality in the general background experience of the population to be tested. Yet individuals being tested may have dissimilar backgrounds resulting, for example, from culture, language difference, lack of opportunity or differences in length of experience.
Most tests of general or specific abilities are culture biased - they are not culture fair. Consequently children from cultures other than that assumed may be disadvantaged with resulting lower scores which may then be taken as indicative of lower ability.
This may apply particularly to ethnic minority groups. In such cases tests claimed to be largely culture fair, such as Raven's Progressive Matrices, may be more indicative of intellectual ability than most other tests.
This test is non-verbal in that it is largely language free, both in the content of the tasks and in the instructions on what to do. However, not all non-verbal tests are culture fair or so language free. For example, the perceptual reasoning scale of the AH2 group test of intelligence contains content which assumes certain cultural experience, and it relies more heavily on written language for giving instructions.
Nevertheless, non-verbal tests are generally fairer culturally and linguistically than other forms of ability test. They may thus be of particular value in assessing intellectual abilities amongst culturally different or linguistically deprived children. A note of caution, however: they are not direct measures of general intellectual ability but of reasoning abilities applied to non-verbal content, and they are consequently not very good general predictors of academic achievement in a linguistically dominated curriculum.
Language has been highlighted above as a variable influencing test performance. Language competence plays a part in determining performance on most tests. If the explicit intention is to include assessment of this, as for example in verbal reasoning tests, all well and good. In such a case however, the danger is that scores will be taken to indicate general intellectual ability not just verbal reasoning ability.
Moreover where the intention is to assess other specific abilities, such as numerical reasoning or mathematical abilities, language differences may be a confounding variable. Thus for example, measures of numerical reasoning and mathematical ability or attainment may be couched in language which disadvantages mathematically able but linguistically less able or deprived children with a consequent depression of their scores. Low scores may thus not be indicative of low mathematical ability.
In the technical language of testing the general issue is that of validity. The validity of a test refers to the extent to which the test is measuring what it is intended to measure. Tests are not like rulers used to measure length. Almost everybody knows what length is. There is a commonly agreed concept of length, and whilst different rulers use different units of measurement they are all measuring the same thing. This is not the case with ability tests.
For example we do not know what intelligence is and there is no single commonly agreed concept of intelligence. Therefore we cannot be at all sure that intelligence tests are measuring intelligence. Thus validity is a problem.
The way out of this problem normally adopted is to assume that other intelligence tests are measuring intelligence and to compare the test under consideration with them. This is done by giving the test and one or two other tests to a sample, the larger the better, and comparing the scores by means of a correlation coefficient. This is a figure between -1 and +1; the nearer it is to +1, the more similar are the two sets of scores. For example, a validity coefficient of 0.95 suggests that the two sets of scores are very similar, and it is thus assumed that the test under consideration is measuring something very similar, i.e. the same sort of intelligence. This is known as the concurrent validity of the test, and manuals for standardised tests usually give this validity coefficient.
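The correlation coefficient referred to above is normally the Pearson product-moment correlation. The Python sketch below computes it from first principles; the score lists are hypothetical data for eight pupils, and `pearson_r` is an illustrative name, not a function from any testing package. The same calculation, applied to two sittings of one test, gives the test/re-test reliability coefficient discussed later.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores in a list are equal")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same eight pupils on a new test and an established one.
new_test = [96, 104, 110, 88, 121, 115, 99, 107]
established = [98, 101, 113, 92, 118, 117, 95, 109]
print(round(pearson_r(new_test, established), 2))   # 0.95
```

Note that a coefficient of this size describes the agreement across the group; it says nothing certain about any one pupil's pair of scores.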
Alternatively or additionally, scores from the sample are compared with subsequent academic achievement such as public examination grades. This is known as predictive validity and is perhaps more valuable for choosing tests intended for use in assessing academic capability. However, it must be remembered that the validity coefficient refers to the scores of large numbers of children and not to the scores of an individual. The actual achievements of individuals result not only from their academic ability but also from the influence of many other variables such as teaching quality experienced, motivation, personality, study habits, cultural and language background etc.
In choosing standardised tests, the validity coefficients should be checked and should, other things being equal, be as high as possible. For example in the case of the AH2 intelligence test concurrent validity coefficients in comparison with various NFER tests range from 0.61 to 0.87 and predictive validity coefficients in comparison with GCE/CSE grades range from 0.57 to 0.65 (Heim, 1974).
In the previous section it was pointed out that tests are not rulers used to measure length and that this was because, unlike the concept of length, concepts of ability are less certain. This, of course, has implications for the accuracy of measurement of abilities. If it is not wholly certain that the test in use is measuring exactly what we want it to measure and if moreover, different tests supposedly measuring the same ability yield different scores for the same individuals, as for example is the case with different intelligence tests, then we cannot place complete reliance upon the scores as accurate indicators of the ability we wish to assess.
Other issues also affect the accuracy of measurement. Some are external to the test itself; others are technical and related to the test itself. External factors include language, as already discussed, but also the experience, motivation, anxiety level and fatigue of the child being tested, together with environmental factors such as test conditions and administration procedures. Technical, test-related issues include reliability, standard errors of measurement and the normative data with which an individual's test scores are compared.
Children's background and age have an impact on experience and consequent scores. Children from less advantaged backgrounds may have had less experience relevant to the tests with consequent lower scores than children from more advantaged backgrounds. Yet they may have equal or greater potential.
It is for this reason that some selection procedures for admission to enrichment programs take account of background in setting criteria for admission. For example a program in Jerusalem set IQ levels of 135, 125, 115 as minimum admission criteria for children from 'normal', 'low education standard' and 'culturally deprived' backgrounds respectively (Ornoy, 1979).
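A background-adjusted criterion of this kind amounts to a lookup of the minimum score for each background category. The Python sketch below encodes the three minimum IQs reported for the Jerusalem programme; the function name is hypothetical, and the sketch is an illustration of the idea rather than a description of how that programme actually processed admissions.

```python
# Minimum IQs for admission reported for the Jerusalem programme (Ornoy, 1979).
ADMISSION_MINIMUM_IQ = {
    "normal": 135,
    "low education standard": 125,
    "culturally deprived": 115,
}


def meets_admission_criterion(iq, background):
    """True if `iq` reaches the minimum set for the child's background category."""
    return iq >= ADMISSION_MINIMUM_IQ[background]


for background in ADMISSION_MINIMUM_IQ:
    print(background, meets_admission_criterion(120, background))
```

The same IQ of 120 is below the criterion for one group and above it for another, reflecting the judgement that less relevant prior experience may depress scores.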
Age has an impact upon experience, particularly in the case of younger children. The oldest children in a year group may be almost a year older than the youngest. They will thus have had almost a year's extra experience which may put the youngest children at a disadvantage.
Recent evidence (Fox, 1995) on the assessment of 18,000 children at the end of key stage one (i.e. at age seven) shows that precise chronological age is significantly related to the level achieved, with the average level of the younger children in the age cohort being considerably lower than that of the older children.
Motivation may have a powerful effect on scores achieved. Children with low motivation may perform poorly because of lack of effort or attention rather than lack of ability. However, it may be suggested from personal experience and the experience of colleagues that some very able children who appear unmotivated and poor achievers in the classroom appear to be more highly motivated when faced with an intelligence test, perhaps because of the unusual challenge it presents, and perform highly.
The converse of low motivation is high motivation and in some cases this may be too high leading to anxiety. Anxiety has consistently been shown to be related to performance of many sorts with high anxiety generally being related to low performance (eg. Levitt, 1967). Thus the anxiety level of children when taking a test may influence their performance. In one case known to the author a highly motivated and anxiety prone child taking a verbal reasoning test as part of 11+ selection procedures scored zero because he was so anxious that he did not put pencil to paper throughout the duration of the test. Sad to say he did not get to grammar school although in classroom work he showed himself to have high academic ability.
Fatigue, illness, pre-menstrual tension and other bodily conditions may all play their part in depressing test performance below that which would be indicative of the person's ability. So also may the test-taking situation - whether conditions are optimal or whether they are noisy, hot, cold, crowded and the like, all of which may affect an individual's performance.
All the above are variables outside the control of the test constructor yet they may affect the extent to which achieved test scores are accurate reflections of ability.
More within the control of a test constructor are the test administration procedures. If comparisons are to be made between the scores of children taking the test on different occasions or in different places, the comparisons should be fair, and they will not be if administration procedures differ. In the case of standardised tests, the manuals give instructions at least on the precise time allowed and may give precise detail on other matters, such as the exact wording of instructions to testees. If these conditions are breached, for example when additional explanations are given or times are altered, then the scores obtained cannot be interpreted fairly through comparison with the norms. IQs, for example, are determined by comparison with the norms and may be depressed or inflated as a result of non-standard administration. In one case known to the author, extra time was given yet scores were compared directly with the norms, thereby inflating IQs. Such IQs are meaningless. It follows that the standard administration procedures must be followed exactly if scores are to be capable of meaningful interpretation.
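Why extra time inflates IQs can be seen in a simplified Python sketch. It assumes a linear conversion from raw score to a quotient with mean 100 and standard deviation 15, using a hypothetical norm group mean and SD; real manuals usually supply age-banded conversion tables instead, so the function name, norms and raw scores here are all assumptions for illustration.

```python
def deviation_quotient(raw, norm_mean, norm_sd, scale_sd=15):
    """Convert a raw score to a quotient with mean 100, relative to the norm group."""
    return 100 + scale_sd * (raw - norm_mean) / norm_sd


# Hypothetical norms: pupils of this age averaged 30 raw marks (SD 8) under standard timing.
NORM_MEAN, NORM_SD = 30, 8

standard = deviation_quotient(38, NORM_MEAN, NORM_SD)     # 115.0
extra_time = deviation_quotient(44, NORM_MEAN, NORM_SD)   # 126.25: inflated
print(standard, extra_time)
```

The norms describe what pupils achieved under standard timing, so a raw score earned with extra time is being compared against the wrong yardstick, and the resulting quotient is inflated.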
Turning to the tests themselves, two related technical concepts need discussion. These are reliability and the standard error of measurement.
Reliability is a measure of the consistency of the scores achieved on the test by a group of people. Internal consistency (split-half reliability) is the feature most commonly assessed, but this merely indicates whether one half of the items produces the same pattern of scores as the other half. Test/re-test reliability is much more important for assessing the likely accuracy of group scores achieved on one occasion. A correlation coefficient (as discussed earlier) is calculated for the scores obtained by a group on the same test taken on two separate occasions, perhaps two months apart. Given that we would not expect the abilities comprising intelligence to change much over so short a period, we would expect the pattern of scores to be very similar. Ideally, therefore, the correlation coefficient should be +1, i.e. indicating a perfect match between the two sets of scores. This is seldom the case: test/re-test reliability coefficients are often much lower than +1, and consistency is often not very good. Apart from the various external factors discussed above, another reason for this is the error of measurement inherent in the test itself.
The best test manuals give information on the standard error of measurement for the test. Most do not but it can always be calculated from the formula:
$$\mathrm{SE_m} = \mathrm{SD}\sqrt{1 - r}$$
where SEm is the standard error of measurement
SD is the standard deviation of test scores
r is the reliability coefficient for the test.
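The formula translates directly into Python. The values below (an SD of 15 and a reliability of 0.93) are hypothetical, chosen only to show how a fairly high reliability still leaves an SEm of about 4 points; they are not taken from any particular test manual.

```python
from math import sqrt


def standard_error_of_measurement(sd, r):
    """SEm = SD * sqrt(1 - r), where r is the test's reliability coefficient."""
    if not 0 <= r <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return sd * sqrt(1 - r)


# Hypothetical values: quotients scaled to SD 15.
print(round(standard_error_of_measurement(15, 0.93), 2))   # 3.97
print(standard_error_of_measurement(15, 0.75))             # 7.5
```

Lower reliability widens the error quickly: dropping from 0.93 to 0.75 nearly doubles the SEm.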
The SEm is an important indicator for purposes of test score interpretation. Much of the above has suggested that tests of ability and achievement may not be accurate measures. It is thus helpful if we know within what range around an achieved score a child's 'true' score may lie. Knowing the SEm enables us to estimate this range with certain levels of probability.
The ranges are as follows. With approximately 68% probability the 'true' score is likely to be within 1 SEm of the achieved score. With approximately 95% probability the 'true' score is likely to be within 2 SEm.
Many tests, for example Raven's Progressive Matrices and the AH2, have an SEm of around 4 points. Assuming this figure and an achieved quotient of 125 we can be 95% confident that the 'true' score will be somewhere between 117 and 133 i.e. within ±2 SEm, but we don't know where in that range. Moreover there remains a 5% probability that the 'true' score will be outside this range.
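The band calculation above, and its consequence for comparing pupils, can be sketched in Python. The function names are hypothetical, and the ±1 and ±2 SEm coverage figures assume normally distributed measurement errors. The overlap check shows that two pupils whose achieved scores differ by 6 points, with an SEm of 4, have 'true' score ranges that overlap substantially.

```python
def true_score_band(observed, sem, z=2.0):
    """Range around an achieved score expected to contain the 'true' score.

    z=1 gives roughly 68% coverage and z=2 roughly 95%, assuming normally
    distributed measurement errors.
    """
    return observed - z * sem, observed + z * sem


def bands_overlap(a, b):
    """True if two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


print(true_score_band(125, 4))         # (117.0, 133.0)
print(true_score_band(125, 4, z=1))    # (121, 129)
print(bands_overlap(true_score_band(125, 4), true_score_band(119, 4)))   # True
```

Where bands overlap like this, the observed order of the two pupils could well be reversed on another occasion.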
Thus, from all of the above, we see that test data, from whatever sort of ability or achievement test, cannot be accepted as accurate with any certainty. Such data is merely an indicator. Reverting to the analogy of measuring length, ability and achievement tests are more like elastic tape measures, which may give different readings on different occasions, than steel rulers, whose readings vary very little. One implication is that rank orders of children based on scores achieved on a single testing occasion are highly suspect.
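One way to see why single-occasion rank orders are suspect is to ask whether the gap between two pupils' scores is larger than measurement error alone could produce. The sketch below uses the standard error of the difference between two scores on the same test, √2 × SEm (a standard psychometric result assuming independent errors of equal size, not one given in this article), with the illustrative SEm of 4 points mentioned above. The pupils and scores are hypothetical.

```python
import math


def gap_exceeds_error(score_a, score_b, sem, k=2):
    """True if the gap between two scores exceeds k standard errors of the difference."""
    se_difference = math.sqrt(2) * sem  # errors on both scores contribute
    return abs(score_a - score_b) > k * se_difference


# Hypothetical quotients for four pupils, ranked from a single testing occasion.
scores = {"Pupil A": 128, "Pupil B": 125, "Pupil C": 121, "Pupil D": 108}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for (name_1, s1), (name_2, s2) in zip(ranked, ranked[1:]):
    verdict = "beyond error" if gap_exceeds_error(s1, s2, sem=4) else "within error"
    print(f"{name_1} ({s1}) vs {name_2} ({s2}): {verdict}")
```

With an SEm of 4, two quotients need to differ by more than about 11 points before the gap exceeds twice the standard error of the difference, whereas adjacent places in a class rank order are often separated by far less than that.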
How then should test data be interpreted and what use is it for identification or recognition of high ability?
The first important point is that a low score does NOT necessarily mean that a child has low ability. This may be the case, but one or more variables other than ability may have conspired to produce the low score. Conversely, a high score is more likely to reflect high ability than the influence of contaminating variables, though lucky guesses and, obviously, cheating can inflate scores. Nevertheless, in the absence of cheating, it is probably wise to accept high scores as indicative of high ability and to be cautious about inferring low ability from low scores if there is any other suggestion that the child may have higher ability.
Secondly, scores on a given test refer only to the ability which that test measures. They should not be regarded as indicative of ability on other dimensions. Thus scores on verbal reasoning tests are not indicative of mathematical ability.
Thirdly, if two children achieve the same score on a broad ability test, such as a general intelligence test, this does not mean that their abilities are the same. The same high score may have been achieved through correct answers to different subsets of questions, i.e. through different specific abilities. Moreover, a relatively low overall score may mask a high specific ability which a more specific test might reveal.
Fourthly, scores on one test cannot be directly compared with scores on another, since the two tests will, to a greater or lesser extent, be measuring different aspects of ability. Thus an IQ achieved on one intelligence test does not mean precisely the same as an IQ achieved on another, and firm judgements about children's relative abilities should not be made on the basis of IQs achieved with different tests.
The only absolute meaning that we can give to test scores is that they represent the individual's performance on one particular test taken on one particular occasion. If quotients are derived from the scores, these add only a statistical index comparing the scores achieved with those of a comparison sample. Any meaning beyond this is an inference, not a fact, and an inference about ability from test scores or a quotient may or may not be justified. Thus an IQ is not a child's intelligence; it is merely a statistical index. A child is highly likely to obtain different IQs on different occasions of testing, on different tests, and when the score is compared with normative data from different samples.
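The point about normative samples can be illustrated with a deviation quotient, which expresses a raw score's distance from a norm group's mean on a scale with mean 100 and standard deviation 15. That scale, the use of a single mean and SD per norm group (real tests usually rely on age-banded norm tables), and the two norm groups below are all assumptions made for illustration.

```python
def deviation_quotient(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a quotient relative to one normative sample."""
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return scale_mean + scale_sd * z


raw_score = 42
# Two hypothetical normative samples for the same test.
print(deviation_quotient(raw_score, norm_mean=30, norm_sd=8))  # 122.5
print(deviation_quotient(raw_score, norm_mean=34, norm_sd=8))  # 115.0
```

The child's performance is identical in both cases; only the comparison group has changed, yet the quotient moves by 7.5 points.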
Inferences about ability are only appropriate if they are tempered by consideration of all the issues discussed above.
The question therefore arises:
In the author's view, tests can provide valuable data which, if interpreted sensitively in the light of the above discussion, supplements other sources of evidence. Different sources of evidence are needed if we are either to categorise and select children, as in the DIP model, or to recognise high abilities, as in the PEP model.
Tests may and do reveal children with high abilities who have not been noticed by teachers; conversely, teachers or other sources may and do detect high abilities not revealed by a particular test (Gagné, 1994).
The message is that no one source of information should be relied upon either exclusively or predominantly in judging children's abilities, particularly where children's educational and life chances may be affected by the judgements.
Appropriate tests, sensitively used and sensitively interpreted, can provide indications of high abilities which may not have been detected in other ways. If it is truly our intention to provide all children with an equal opportunity to develop their abilities as far as possible and to achieve as highly as they are able, then it is incumbent on us to recognise their existing abilities so that we can provide appropriately. Tests of ability of varying sorts have an important part to play in this recognition process.
Detterman, D. K. (1993). Giftedness and intelligence: One and the same? In G. R. Bock & K. Ackrill (Eds.), The origins and development of high ability. CIBA Foundation/John Wiley.
Fox, R. (1995). The pattern of age differences in high and low achievers in English, Mathematics and Science at Key Stage One.
Furneaux, W. D. & Rees, R. (1978). The structure of mathematical ability. British Journal of Psychology, 69(4), pp. 507-512.
Gagné, F. (1994). Are teachers really poor talent detectors? Comments on Pegnato and Birch's (1959) study of the effectiveness and efficiency of various identification techniques. Gifted Child Quarterly, 38(3), pp. 124-126.
Gardner, H. (1993). Multiple intelligences: the theory in practice. Basic Books.
Guilford, J. P. (1977). Way beyond the IQ. The Creative Education Foundation.
Heim, A. W. et al. AH2 Group test of general reasoning. NFER/Nelson.
Heim, A. W., Watts, K. P. & Simmonds, V. (1974). Manual for AH2/AH3 group tests of general reasoning. NFER/Nelson.
HMI (1992). The education of very able children in maintained schools. HMSO.
Ornoy, A. (1979). The special educational project for gifted children within the school system in Jerusalem. Paper presented at the third world conference on gifted and talented, Jerusalem 1979.
Raven, J. Raven's Progressive Matrices. NFER/Nelson.
Rust, J. & Golombok, S. (1989). Modern Psychometrics: The science of psychological assessment. Routledge.
Satterly, D. (1981). Assessment in schools. Blackwell.
Thorndike, R. L. et al. CAT Cognitive Abilities Test, 2nd Ed. NFER/Nelson.
At the time of writing, Peter Tilsley was Principal Lecturer in Education at Worcester College of Higher Education, where he worked as a lecturer in Psychology and ran a Masters degree course on educational provision for more able and gifted children. He has been active in the field of gifted education for over twenty years, notably with both NACE and NAGC in the U.K. This article was first published in Flying High, 2, 1995, 43-50, and is reprinted with permission of the author.