
        Introduction to Classical Test Theory

        2017-03-31 21:49:39  Sun Qianhui (孫千惠)
        青春歲月 (Qingchun Suiyue), 2017, Issue 3
        Keyword: theory

        Abstract: This paper introduces Classical Test Theory (CTT), covering its history, its procedure, and its extensions. It also lists the theory's shortcomings and the reasons for its decline.

        Key words: CTT; theory introduction

        【Abstract】This paper introduces Classical Test Theory and describes its history, its procedure, and its extensions. It also discusses the theory's shortcomings and the reasons for its gradual decline.

        【Key words】Classical Test Theory; theory introduction

        1. Introduction

        Classical Test Theory (CTT) is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of the theory is to understand and improve the reliability of psychological tests.

        2. History

        CTT was born only after the following three ideas had been conceptualized: a recognition of the presence of errors in measurements, a conception of that error as a random variable, and a conception of correlation and how to index it. In 1904, Charles Spearman worked out how to correct a correlation coefficient for attenuation due to measurement error and how to obtain the index of reliability needed in making the correction, and his finding is seen as the beginning of the theory (Traub, 1997). Others who influenced the theory's framework include G. U. Yule, the authors of the Kuder-Richardson formulas, and M. R. Novick. CTT as we know it today was codified by Novick (1966) and described in classic texts such as Lord & Novick (1968) and Allen & Yen (1979/2002).
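
        Spearman's correction for attenuation is commonly written as the observed correlation divided by the square root of the product of the two reliabilities. The text does not spell the formula out, so the following is only a minimal sketch of that textbook form; the function name `disattenuate` and the sample numbers are illustrative assumptions.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Estimate the correlation between true scores.

    r_xy  : observed correlation between measures X and Y
    rel_x : reliability of X (must be in (0, 1])
    rel_y : reliability of Y (must be in (0, 1])
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Hypothetical values: an observed correlation of 0.40 between two
    # measures whose reliabilities are 0.80 and 0.50.
    print(disattenuate(0.40, 0.80, 0.50))
```

        With perfectly reliable measures (both reliabilities equal to 1) the correction leaves the observed correlation unchanged; the lower the reliabilities, the larger the upward adjustment.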

        Spearman laid the foundations of the theory in 1904, but it was applied only loosely until 1966, when Novick brought it to the forefront of psychological theory (Novick, 1966). CTT can be described as true score theory: it takes into account previous scores on a test item, or from a test-taking population, to predict future scores for the same item or population. Using previous scores, theorists can predict which test questions will be answered correctly and which population tends to answer the questions successfully. Successful responses are then referred to as normative responses.

        When considering a population, the entire population must be taken into account. For example, if all of the eleventh graders in the United States took the Advanced Placement Exam (APE) for English and the same overall score was identified trial after trial, that score would be identified as the normative score for the population. The normative score is meaningless when applied to any single individual, who could score higher or lower than it. Nevertheless, CTT can make reliable identifications based on populations or individuals, depending upon the purpose of the test.

        CTT assumes that each person has a true score T that would be obtained if there were no errors in measurement. Test users never observe a person's true score, only an observed score X, which is assumed to equal the true score T plus some error E, that is, X = T + E. The relations among the three variables X, T and E are used to describe the quality of test scores. The reliability of the observed test scores X, denoted $\rho^2_{XT}$, is defined as the ratio of the true score variance $\sigma^2_T$ to the observed score variance $\sigma^2_X$:

        $$\rho^2_{XT} = \frac{\sigma^2_T}{\sigma^2_X}$$

        Because the variance of the observed scores can be shown to equal the sum of the variance of true scores and the variance of error scores (true scores and errors are assumed to be uncorrelated), this is equivalent to

        $$\rho^2_{XT} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$$

        This equation, which formulates a signal-to-noise ratio, has intuitive appeal: The reliability of test scores becomes higher as the proportion of error variance in the test scores becomes lower and vice versa. The reliability is equal to the proportion of the variance in test scores that we could explain if we knew true scores. The square root of the reliability is the correlation between true and observed scores.
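
        The relations above can be checked numerically. The following is a rough simulation sketch, not part of the original paper: it assumes normally distributed true scores and errors, and the sample size, means, standard deviations and seed are arbitrary illustrative choices.

```python
import random
import statistics


def reliability(var_true, var_error):
    """Reliability as true-score variance over observed-score variance."""
    return var_true / (var_true + var_error)


def pearson(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, sd_true=10.0, sd_error=5.0, seed=0):
    """Draw T and E independently, form X = T + E, and return
    (var(T) / var(X), corr(T, X)) for the simulated sample."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, sd_true) for _ in range(n)]
    observed = [t + rng.gauss(0.0, sd_error) for t in true_scores]
    ratio = statistics.pvariance(true_scores) / statistics.pvariance(observed)
    return ratio, pearson(true_scores, observed)


if __name__ == "__main__":
    ratio, r = simulate()
    print("theoretical reliability:", reliability(10.0 ** 2, 5.0 ** 2))
    print("simulated var(T)/var(X):", ratio)
    print("squared corr(T, X):     ", r * r)
```

        With these settings the theoretical reliability is 100 / (100 + 25) = 0.8, and both simulated quantities should land close to that value, which illustrates that the square root of the reliability is the correlation between true and observed scores.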

        3. The process of CTT

        1. pose the question; 2. collect the data; 3. analyze the data; 4. interpret the data; 5. draw a conclusion

        The data may be measured on one of the following scales: 1. nominal scale; 2. ordinal scale; 3. interval scale

        4. Item Discrimination

        The more an item discriminates among individuals with different amounts of the underlying concept of interest, the higher the item-discrimination index. The extreme group method can be used to calculate the discrimination index using the following 3 steps. Step 1 is to partition respondents who have the highest and lowest overall scores on the overall scale, aggregated across all items, into upper and lower groups. Step 2 is to examine each item and determine the proportion of individual respondents in the sample who endorse or respond to each item in upper and lower groups. Step 3 is to subtract the pair of proportions noted in Step 2. The higher this item-discrimination index, the more the item discriminates. It is useful to compare the discrimination indexes of each of the items in the scale.
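
        The three steps of the extreme group method can be sketched as follows. This is an illustrative sketch only: it assumes dichotomous items scored 0 or 1, and it assumes the upper and lower groups each contain a fixed fraction of respondents (27% by default, a common convention that the text itself does not specify). The function name and the sample data are hypothetical.

```python
def discrimination_index(responses, item, fraction=0.27):
    """Extreme group discrimination index for one item.

    responses : list of respondent rows, each a list of 0/1 item scores
    item      : column index of the item to examine
    fraction  : share of respondents placed in each extreme group
                (an assumption; the method itself does not fix it)
    """
    n = len(responses)
    k = max(1, round(n * fraction))
    if 2 * k > n:
        raise ValueError("upper and lower groups would overlap")

    # Step 1: rank respondents by total score and take the two extremes.
    totals = [sum(row) for row in responses]
    order = sorted(range(n), key=totals.__getitem__)
    lower, upper = order[:k], order[-k:]

    # Step 2: proportion endorsing the item in each group.
    p_upper = sum(responses[i][item] for i in upper) / k
    p_lower = sum(responses[i][item] for i in lower) / k

    # Step 3: the index is the difference between the two proportions.
    return p_upper - p_lower


if __name__ == "__main__":
    # Hypothetical data: 4 respondents, 3 items.
    data = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0],
    ]
    for j in range(3):
        print(j, discrimination_index(data, j, fraction=0.5))
```

        In this toy data the second item separates the upper half from the lower half perfectly, so its index is the largest of the three.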

        5. Second language test

        ESL students are the fastest growing community of school-age children, and it is now common to have non-native English speakers in the classroom. However, only one exam, the Test of English as a Foreign Language (TOEFL), is given to ESL students as an entrance exam when they apply to college. The TOEFL is a standardized multiple-choice exam. Dudley (2006) suggests that multiple true-false (MTF) exams can be just as reliable as, and a valid alternative to, multiple-choice tests, which can be confusing to students (p. 199).

        Dudley (2006) took two test forms that were multiple-choice in nature and converted them to a multiple true-false format. He notes that the findings support the MTF format (Dudley, 2006, p. 224). He also notes that the study provides sound empirical evidence that central factors such as item interdependence, reliability and concurrent validity are viable with MTF items that assess vocabulary and reading comprehension in the realm of norm-referenced testing (p. 224). Although Dudley's (2006) focus was on undergraduate students, it is not a far reach to suggest that teachers in the K-12 sector could begin creating MTF exams, or converting existing multiple-choice exams to the MTF format, using CTT.

        6. Reliability

        Reliability is important in the development of patient-reported outcome (PRO) measures. Validity is limited by reliability: if responses are inconsistent (unreliable), the measure cannot be valid. Reliability refers to the proportion of variance in a measure that can be ascribed to a common characteristic shared by the individual items, whereas validity refers to whether that characteristic is actually the one intended.

        Test-retest reliability, which can apply to both single-item and multi-item scales, reflects the reproducibility of scale scores on repeated administrations over a period during which the respondents' condition did not change. To compute test-retest reliability, the kappa statistic can be used for categorical responses, and the intraclass correlation coefficient can be used for continuous responses. Further, having multiple items in a scale increases its reliability. In multi-item scales, a common indicator of scale reliability is Cronbach's coefficient alpha, which is driven by the number of items and the correlations among the items in the scale.
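
        For categorical responses, the kappa statistic mentioned above can be sketched as below. The sketch assumes the unweighted (Cohen's) form of kappa, which compares observed agreement between two administrations with the agreement expected by chance; the text does not say which variant is intended, and the function name and sample answers are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two administrations of one item.

    first, second : equal-length sequences of category labels, one pair
                    per respondent (test and retest answers)
    """
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("need two non-empty sequences of equal length")

    # Observed agreement: share of respondents giving the same answer twice.
    observed = sum(a == b for a, b in zip(first, second)) / n

    # Chance agreement: product of the marginal proportions per category.
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical test and retest answers from four respondents.
    test = ["yes", "yes", "no", "no"]
    retest = ["yes", "yes", "no", "yes"]
    print(cohen_kappa(test, retest))
```

        A kappa of 1 indicates perfect agreement between the two administrations, while a value near 0 indicates agreement no better than chance.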

        The greater the proportion of shared variation, the more the items share in common and the more consistent they are in reflecting a common true score. The covariance-based formula for coefficient alpha expresses such reliability while adjusting for the number of items contributing to the prior calculations on the variances. The corresponding correlation–based formula, an alternative expression, represents coefficient alpha as the mean inter-item correlation among all pairs of items after adjustment for the number of items.
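
        The two expressions of coefficient alpha described above can be sketched as follows. The text names the formulas without writing them out, so the sketch relies on their usual textbook forms: the covariance-based alpha is k/(k-1) times one minus the ratio of summed item variances to total-score variance, and the correlation-based (standardized) alpha is k times the mean inter-item correlation divided by one plus (k-1) times that mean. Function names and data are hypothetical.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(scores):
    """Covariance-based alpha; scores is a list of respondent rows,
    each holding k >= 2 item scores."""
    k = len(scores[0])
    items = list(zip(*scores))  # one tuple of scores per item
    item_vars = [statistics.variance(col) for col in items]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standardized_alpha(scores):
    """Correlation-based alpha from the mean inter-item correlation."""
    k = len(scores[0])
    items = list(zip(*scores))
    corrs = [pearson(items[i], items[j])
             for i in range(k) for j in range(i + 1, k)]
    mean_r = statistics.fmean(corrs)
    return k * mean_r / (1 + (k - 1) * mean_r)


if __name__ == "__main__":
    # Hypothetical data: 3 respondents, 2 items.
    data = [[1, 2], [2, 1], [3, 3]]
    print(cronbach_alpha(data), standardized_alpha(data))
```

        The two versions agree when all items have equal variances, as in the toy data here, and can differ otherwise because the standardized form ignores differences in item variance.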

        7. Shortcomings

        One of the best-known shortcomings of CTT is that examinee characteristics and test characteristics cannot be separated: each can only be interpreted in the context of the other. Another shortcoming lies in the definition of reliability in CTT, which states that reliability is "the correlation between test scores on parallel forms of a test". The problem is that the various reliability coefficients provide either lower bound estimates of reliability or reliability estimates with unknown biases. A third shortcoming involves the standard error of measurement, which is assumed to be the same for all examinees. However, as Hambleton explains in his book, scores on any test are unequally precise measures for examinees of different ability, thus making the assumption of equal errors of measurement for all examinees implausible (Hambleton, Swaminathan, & Rogers, 1991, p. 4). A fourth and final shortcoming of CTT is that it is test oriented rather than item oriented. In other words, CTT cannot help us predict how well an individual, or even a group of examinees, might do on a test item.
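
        The third shortcoming can be made concrete with a short sketch. The text does not give the formula, so the sketch assumes the usual textbook expression, in which the standard error of measurement equals the observed-score standard deviation times the square root of one minus the reliability. The score band additionally assumes normally distributed errors (z = 1.96), and all names and numbers are hypothetical.

```python
import math


def standard_error_of_measurement(sd_observed, reliability):
    """SEM = SD of observed scores * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd_observed * math.sqrt(1.0 - reliability)


def score_band(observed, sd_observed, reliability, z=1.96):
    """Band of +/- z * SEM around an observed score.

    The SEM does not depend on the examinee's score, so the band has
    the same width for every examinee.
    """
    sem = standard_error_of_measurement(sd_observed, reliability)
    return observed - z * sem, observed + z * sem


if __name__ == "__main__":
    # Hypothetical test: observed-score SD of 10, reliability of 0.84.
    for score in (30.0, 50.0, 70.0):
        print(score, score_band(score, 10.0, 0.84))
```

        Whatever the observed score, the band has the same width, which is exactly the assumption of equal precision for all examinees that the paragraph above calls implausible.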

        What makes CTT effective is also its primary downfall: the normative scores used to predict future scores are specific to the samples previously studied. A student may have received the highest score on the exam and yet be grouped with the whole population of test-takers when the APE results are used to predict future success or effectiveness (Reid et al., 2007, p. 179). A secondary problem with CTT is that an entire testing instrument has to be completed before it yields useful, predictive information about a population or an individual. Only the completed exam matters. Finally, as Reid et al. (2007) point out, "the instability of scores at extreme levels of an ability or trait, even within the normative sample" is a concern with CTT (p. 179).

        8. Conclusion

        Although CTT has many shortcomings, it remains a well-known theory that has contributed much to education and to measurement modeling. It offers a practical way of tackling complex questions and problems by collecting data, analyzing the data, and giving answers.

        【References】

        [1] American Psychiatric Association. Diagnostic and statistical manual of mental disorders (3rd ed., rev.)[M]. Washington, DC: Author, 1987.

        [2] Bolton, B. Handbook of measurement and evaluation in rehabilitation (3rd ed.)[M]. Gaithersburg, MD: Aspen, 2001.

        [3] Brown, J. D., & Hudson, T. Criterion-referenced language testing[M]. New York, NY: Cambridge University Press, 2002.

        [4] Corkum, P., Andreou, P., Schachar, R., Tannock, R., & Cunningham, C. The Telephone Interview Probe[J]. Educational & Psychological Measurement, 2007, 67: 169-185.

        [5] Cronbach, L. J. Note on the multiple true-false test exercise[J]. Journal of Educational Psychology, 1939, 30: 628-631.

        [6] Cronbach, L. J., Nageswari, R., & Gleser, G. C. Theory of generalizability: A liberation of reliability theory[J]. The British Journal of Statistical Psychology, 1963, 16: 137-163.

        [7] Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. The dependability of behavioral measurements: Theory of generalizability for scores and profiles[M]. New York: John Wiley, 1972.

        [8] Dudley, A. Multiple dichotomous-scored items in second language testing: investigating the multiple true-false item type under norm-referenced conditions[J]. Language Testing, 2006, 23: 198-228.

        [9] Haladyna, T. M. Developing and validating multiple-choice test items (2nd ed.)[M]. Mahwah, NJ: Lawrence Erlbaum, 1999.

        [10] Koppitz, E. M. Psychological evaluation of children's human-figure drawings[M]. New York: Grune & Stratton, 1968.

        [11] Novick, M. R. The axioms and principal results of classical test theory[J]. Journal of Mathematical Psychology, 1966, 3: 1-18.

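        【About the author】

        Sun Qianhui (孫千惠, born 1992), female, Han ethnicity, holds a master's degree and is a college English teaching assistant at the Logistics College of the Armed Police Force in Tianjin. Research interests: foreign linguistics and applied linguistics.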
