Test Bias and Anxiety: What Does the Research Tell Us?

In arguing for performance-based assessment, a “senior writer” for NEA Today wrote, “Most of us know that standardized tests are inaccurate, inequitable, and often ineffective at gauging what students actually know” (Long, 2023). Having served as a district and state assessment leader for many years, I found this statement surprising. After all, there seem to be more standardized assessments in use today than at any time in recent history. To mention a few, nearly every district and/or state administers English language proficiency tests, universal literacy screeners, computer-adaptive diagnostic tests, virtual course exams, writing assessments, progress-monitoring exams, and benchmark tests. These are in addition to the “traditional” state end-of-year summative assessments and the biennial NAEP reading and math tests. The number of students taking the SAT and ACT college entrance exams has declined somewhat in recent years – because many colleges and universities have adopted test-optional or test-free policies – yet, “some 1.38 million students took the ACT in 2025 and about 2 million students took the SAT” (Modan, 2025).

This is not to say that there are no legitimate concerns about or shortcomings in virtually any assessment. Most notably, we know that student performance is often not perfectly consistent (i.e., reliable) over time. Therefore, it is critical that users of test data review all available information and not base decisions on a single point-in-time measure. It is also evident that tests require time, and schools must balance the benefits of assessments against the instructional time they consume. Further, as Schmidt et al. (2005) found, large-scale testing can “narrow” the curriculum offered, as schools target the “tested curriculum” to improve scores. As the Third International Mathematics and Science Study (TIMSS) Technical Report observed, many U.S. curricula are “a mile wide and an inch deep” (Martin & Kelly, 1996). Relatedly, the researchers found that most standardized tests focused on relatively basic facts and information and included only multiple-choice questions rather than items requiring critical thinking, complex problem-solving, or real-world application.

Regarding diagnostic assessments, an old friend once “reminded” me that great teachers already know their students’ strengths and weaknesses, and don’t need more tests. After all, they are in class with their students every day, observing, listening, reading papers, grading, and monitoring work. (Of course, some might argue that teachers’ judgments, like tests, are fallible.) Another concern raised by many is that educators often focus on students’ level of attainment rather than their growth. As a teacher once told me (in praise of a portfolio system we were piloting), “the system forces me to focus on what students can do, not just their weaknesses.” Again, of course, this concern is more a function of educators’ use of test results than of the tests’ validity.

Test bias has been a major concern of detractors for many years. Among the most publicized critics of standardized tests, particularly the SAT and ACT, was the consumer advocate Ralph Nader. Among the consequences of the critical 1980 Nader-sponsored report, The Reign of ETS (Nairn, 1980), was the enactment of legislation in New York State, popularly known as the “Truth in Testing Act.” As a result of the law, “questions and correct answers for post-secondary or professional school admissions tests must be disclosed 30 days after the students have been told their scores. Test company information concerning test validity, score recalculation, formula reliability, and cultural and economic bias must also be made public. Exempt from disclosure are questions included in the ungraded sections of the tests, which are used to equate the test to other versions of the same test” (Nader, n.d.). In recent years, many studies have examined potential biases in the SAT and ACT (and many other tests). Interestingly, the Educational Testing Service (which developed the SAT for the College Board) has even removed a few items from math tests because studies found them biased, generally favoring white and male test-takers. FairTest has published an annotated bibliography of many of these studies (FairTest, 2007).

Among the biases noted in the Nader report were those resulting from the economic and social advantages some students possess. For example, some students can afford to enroll in test-preparation classes, and some schools help students practice SAT- and ACT-like questions to maximize their performance. From a psychometric perspective, test publishers and many researchers have employed statistical and qualitative methods to detect “construct-irrelevant variance,” i.e., factors that do not measure the intended skills (Zhai et al., 2020, 2021; Heister et al., 2024). These studies have examined potential disadvantages related to race, gender, culture, and socioeconomic factors.
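
One common statistical screen of this kind is differential item functioning (DIF) analysis, which asks whether examinees of comparable overall ability but from different groups succeed on a given item at different rates. Below is a minimal, illustrative sketch of a Mantel-Haenszel DIF check; the simulated data, variable names, and the 1.5 flagging threshold (a rough analog of the ETS “C” category) are assumptions for demonstration, not a reproduction of any publisher’s operational procedure.

```python
# Illustrative Mantel-Haenszel DIF screen on simulated item responses.
# Examinees are stratified by "rest score" (total minus the studied item),
# so group differences in overall ability are held roughly constant.
import numpy as np

rng = np.random.default_rng(0)
n, items = 2000, 20
group = rng.integers(0, 2, n)          # 0 = reference group, 1 = focal group
ability = rng.normal(0, 1, n)          # equal ability distributions by design
difficulty = rng.normal(0, 1, items)
bias = np.zeros(items)
bias[0] = 0.8                          # item 0 is made harder for the focal group
p = 1 / (1 + np.exp(-(ability[:, None] - difficulty - bias * group[:, None])))
resp = (rng.random((n, items)) < p).astype(int)

def mh_ddif(resp, group, item):
    """Mantel-Haenszel delta-DIF for one item, stratified by rest score."""
    rest = resp.sum(axis=1) - resp[:, item]
    num = den = 0.0
    for s in np.unique(rest):
        m = rest == s
        a = ((group[m] == 0) & (resp[m, item] == 1)).sum()  # reference correct
        b = ((group[m] == 0) & (resp[m, item] == 0)).sum()  # reference incorrect
        c = ((group[m] == 1) & (resp[m, item] == 1)).sum()  # focal correct
        d = ((group[m] == 1) & (resp[m, item] == 0)).sum()  # focal incorrect
        t = m.sum()
        num += a * d / t
        den += b * c / t
    # ETS delta scale; negative values indicate DIF against the focal group
    return -2.35 * np.log(num / den)

for i in range(items):
    dif = mh_ddif(resp, group, i)
    if abs(dif) >= 1.5:                # rough "C-level" flag used in practice
        print(f"item {i}: MH D-DIF = {dif:+.2f} -> flag for content review")
```

In operational programs, items flagged this way are not automatically discarded; they are routed to content experts who judge whether the statistical difference reflects genuine bias or a real skill difference.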

To avoid language biases, individually administered “intelligence” tests, such as the Wechsler Intelligence Scale for Children (WISC) and the Stanford-Binet Intelligence Scale, which are used in clinical or diagnostic settings, include non-verbal components. Similarly, controlling for cultural biases is particularly critical when assessing language skills and in international studies of achievement (Heister et al., 2024; Girolamo et al., 2022). Likewise, test publishers must “control” for language complexity when employing constructed-response (open-ended) questions and when assessing non-language content such as math, science, and social studies (Avenia-Tapper & Llosa, 2015; Keyvanfar & Rashtchi, 2008).

Perhaps the most common complaint I heard from parents over the years was that their kids were “not good at taking tests.” Recent estimates have suggested that between 15% and 22% of students exhibit high levels of “test anxiety” (Putwain & Daly, 2014; Thomas et al., 2017), and students with disabilities, women, and minority students have reported higher rates of test anxiety (Putwain, 2007; Sena, Lowe, & Lee, 2007; Zeidner, 1990).

As with test bias, a number of studies have examined this issue and possible solutions. The most recent meta-analytic review, “Test Anxiety Effects, Predictors, and Correlates,” was conducted by Von der Embse et al. (2018), who reviewed 238 studies. The review examined performance in relation to the components of test anxiety: cognitive, affective, physiological, behavioral, and social.

Not surprisingly, the theoretical models of test anxiety have evolved over time. Early studies supported “an interference model, which explains depressed performance by identifying factors such as emotionality and worry that disturb the process of information recall and utilization during testing situations” (Von der Embse et al., 2018). Other theorists proposed deficit models, which held that test anxiety stemmed from deficits in the knowledge and skills required to perform well in evaluative situations – e.g., study skills, self-efficacy, motivation, and testing strategies. Lowe et al. (2008) proposed a model that argued for a biological and psychological basis of test anxiety. The model proposed by Segool et al. (2014) “combined cognitive perceptions and prior learning experiences with demographic characteristics, social or educational context, and environmental contingencies (i.e., educational expectations).”

Timeline of Measures of Test Anxiety 

  • Test Anxiety Scale for Children (TASC; Sarason et al., 1960)
  • Test Anxiety Profile (Oetting & Cole, 1980)
  • Test Anxiety Inventory (TAI; Spielberger, 1980)
  • Reactions to Tests (Sarason, 1981)
  • Revised Test Anxiety Scale (Benson et al., 1992)
  • FRIEDBEN Test Anxiety Scale (Friedman & Bendas-Jacob, 1997)
  • Cognitive Test Anxiety Scale (Cassady & Johnson, 2002)
  • Test Anxiety Inventory for Children and Adolescents (TAICA; Lowe et al., 2008)
  • Test Anxiety Scale for Elementary Students (TAS-E; Lowe et al., 2011)
  • B-FTAS (von der Embse et al., 2013a, 2013b)
  • DBR-A (von der Embse et al., 2015a, 2015b)

Von der Embse et al. (2018) noted that theoretical and measurement advances changed the understanding of the test anxiety construct, with biological, psychological, and environmental variables considered the primary factors. These include intrapersonal variables (self-efficacy, motivation, and self-regulation), social influences (performance expectations and values, achievement standards, and social support), and demographic variables (level of education, economic status, and cultural background).

“Moreover, the nature and use of testing for academic progress and achievement in schools have changed dramatically, thus again raising the importance of emotion in performance. For example, many countries have used high-stakes exams to determine student academic gains, teacher efficacy, and school effectiveness. Student test anxiety is higher on high-stakes exams than on typical classroom tests, underscoring the potential influence of this change on students” (Segool et al., 2013).

Consistent with Hembree (1988), Von der Embse et al. (2018) found a consistent pattern of relationships between higher levels of test anxiety and lower levels of performance, with effect sizes ranging from small (r = −.13) to moderate (r = −.40). Their results indicated that students in the middle grades exhibited the largest negative effects and high school students the smallest. Further, the largest negative effects were with “university entrance exams and state standardized exams within secondary grades.” Test anxiety was also found to have a larger negative relationship with verbal and cognitive proficiency tasks than with non-verbal tasks. Prior research has also suggested that the cognitive component of test anxiety may interfere with verbal demands (Markham & Darke, 1991; Lee, 1999).
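
To make those effect sizes concrete, the short sketch below translates the two reported correlations into shares of score variance and predicted score differences; the 100-point standard deviation is a purely hypothetical scale chosen for illustration.

```python
# Illustrative interpretation of the meta-analytic correlations reported by
# Von der Embse et al. (2018). Under a simple linear model, a student one
# standard deviation (SD) above the mean in test anxiety is predicted to
# score about |r| SDs lower, and r**2 is the share of score variance
# associated with anxiety. The 100-point SD is a hypothetical test scale.
for r in (-0.13, -0.40):
    print(f"r = {r:+.2f}: ~{r**2:.0%} of score variance shared; "
          f"+1 SD anxiety -> {abs(r):.2f} SD lower score "
          f"(~{abs(r) * 100:.0f} points if the test SD is 100)")
```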

Regarding intrapersonal variables, results from the Von der Embse et al. (2018) meta-analysis were largely similar to Hembree’s. Self-esteem and self-efficacy were consistently found to have the strongest relationships with test anxiety. However, locus of control had a weaker relationship with test anxiety (effect size = .04) than had previously been reported (.22). In other words, test anxiety tends to be highest among students who hold negative beliefs about themselves and their ability to succeed in personal and academic environments. Results also indicated a small but significant positive relationship between test anxiety and extrinsic motivation. Thus, students who are motivated by external demands rather than internal interests are more likely to exhibit higher test anxiety.

The study also examined the relationship between test anxiety and personality traits. Test anxiety was found to have its largest positive relationship with neuroticism and a small negative relationship with conscientiousness. “These findings align with prior research examining the attributes of these traits and their relation to anxiety, as Neuroticism is considered to be highly related to high levels of anxiety (Watson & Clark, 1984) and Conscientiousness as being related to variables, such as intrinsic motivation and self-efficacy, that have a negative relationship with test anxiety (Mount et al., 1995). Neuroticism is a core personality trait (part of the Big Five traits), defined as the tendency to experience negative emotions like anxiety, anger, and depression more frequently and intensely, alongside emotional instability, poor stress coping, and self-doubt” (Von der Embse et al., 2018).

[The “Big Five” refers to a model in psychology describing human personality through five broad dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism.]

A number of studies have found that certain demographic groups exhibit higher rates of test anxiety that may contribute to test achievement gaps (Cassady & Johnson, 2002; Steele & Aronson, 1995; Wolf & Smith, 1995; Cizek & Burg, 2006; Hembree, 1988). While the Von der Embse et al. (2018) analysis did find, consistent with earlier studies, that females are more likely to exhibit higher test anxiety, the effect size (.19) was smaller than in prior studies. However, unlike prior studies (e.g., Hembree, 1988), the 2018 analysis found only a small relationship with minority status (r = .11).

In summary, it is critical that educators use standardized (and classroom) test results judiciously. No test is perfect, and no student performs perfectly every day. Educators should therefore draw on the full range of available information when making grading, placement, and promotion decisions. “Test anxiety has consistently been demonstrated to have a negative relationship with important educational outcomes across two meta-analyses spanning nearly 70 years of research… (and) across many hundreds of studies and many thousands of participants” (Von der Embse et al., 2018). In addition, “today testing plays a much more prominent role in important educational decisions, ranging from grade promotion to university entrance to teacher evaluation.” As a result, it is more important than ever for educators to understand the relationship between emotions and test performance, to ensure accurate and equitable outcomes, and to create a positive testing environment for students. Educators should consider evidence-based test anxiety interventions targeted to the most vulnerable student subgroups (Von der Embse et al., 2014). While a school-wide approach is a first step, “targeted, individualized supports are important for students exhibiting higher levels of test anxiety” (Reiss et al., 2017).

—————————————-

Justice and equity are the foundation of NCEED’s six pillars (https://nceed.morgan.edu/) as we work with partners in Maryland and nationwide to address the most critical issues facing students, parents, communities, teachers, and administrators. Whole-child SEL strategies related to standardized testing, particularly in relation to gender and race, are among the many areas of interest of the NCEED staff and faculty.

References

Avenia-Tapper, B., & Llosa, L. (2015). Construct relevant or irrelevant? The role of linguistic complexity in the assessment of English language learners’ science knowledge. Educational Assessment, 20(2), 95–111.


FairTest (2007). Selected Annotated Bibliography on the SAT: Bias and Misuse. https://fairtest.org/selected-annotated-bibliography-sat-bias-and-misus/#.

Girolamo, T., Ghali, S., Campos, I., & Ford, A. (2022). Interpretation and use of standardized language assessments for diverse school-age individuals. Perspectives of the ASHA Special Interest Groups, 7(4), 981–994.

Heister, H., Strietholt, R., Doebler, P., & Baghaei, P. (2024). Circumventing construct-irrelevant variance in international assessments using cognitive diagnostic modeling: A curriculum-sensitive measure. Studies in Educational Evaluation, 83, 101393.

Keyvanfar, A., & Rashtchi, M. (2008). Task-based listening assessment and the influence of construct-irrelevant variance. Journal of English Language Pedagogy and Practice, 1(Inaugural Issue), 45–65.

Lee, J. H. (1999). Test anxiety and working memory. Journal of Experimental Education, 67(3), 218–240. http://dx.doi.org/10.1080/00220979909598354

Long, C. (2023). Standardized testing is still failing students. NEA Today. https://www.nea.org/nea-today/all-news-articles/standardized-testing-still-failing-students

Lowe, P., Lee, S., Witteborg, K., Prichard, K., Luhr, M., Cullinan, C., & Janik, M. (2008). The Test Anxiety Inventory for Children and Adolescents (TAICA): Examination of the psychometric properties of a new multidimensional measure of test anxiety among elementary and secondary school students. Journal of Psychoeducational Assessment, 26(3), 215–230. http://dx.doi.org/10.1177/0734282907303760

Markham, R., & Darke, S. (1991). The effects of anxiety on verbal and spatial task performance. Australian Journal of Psychology, 43(2), 107–111.

Martin, M. O., & Kelly, D. L. (1996). Third International Mathematics and Science Study: An overview. In Third International Mathematics and Science Study (TIMSS) technical report (Vol. 1).

Modan, N. (2025). SAT and ACT participation remain below pre-pandemic levels. K-12 Dive. https://www.k12dive.com/news/sat-and-act-participation-remains-low-compared-to-pre-pandemic-college-board-admissions/

Mount, M., Barrick, M., & Callans, M. (1995). Manual for the Personal Characteristics Inventory.

Nader, R. (n.d.). Truth-in-Test Law Bodes and Ill Wind for SATs, ACTs. https://nader.org/1979/09/13/truth-in-test-law-bodes-and-ill-wind-for-sats-acts/#.

Nairn, A. (1980). The Reign of ETS. Today’s Education, 69(2), 58–64.

Putwain, D. (2007). Test anxiety in UK schoolchildren: Prevalence and demographic patterns. British Journal of Educational Psychology, 77, 579–593.

Putwain, D. (2008). Deconstructing test anxiety. Emotional & Behavioural Difficulties, 13, 141–155.

Putwain, D., & Daly, A. L. (2014). Test anxiety prevalence and gender differences in a sample of English secondary school students. Educational Studies, 40(5), 554–570. http://dx.doi.org/10.1080/03055698.2014.953914

Reiss, N., Warnecke, I., Tolgou, T., Krampen, D., Luka-Krausgrill, U., & Rohrmann, S. (2017). Effects of cognitive behavioral therapy with relaxation vs. imagery rescripting on test anxiety: A randomized controlled trial. Journal of Affective Disorders, 208, 483–489. http://dx.doi.org/10.1016/j.jad.2016.10.039

Rosário, P., Núñez, J. C., Salgado, A., González-Pienda, J. A., Valle, A., Joly, C., et al. (2008). Test anxiety: Associations with personal and family variables. Psicothema, 20(4), 563–570.

Sarason, I. G. (1981). Test anxiety, stress, and social support. Journal of Personality, 49(1), 101–114. http://dx.doi.org/10.1111/j.1467-6494.1981.tb00849.x

Schmidt, W. H., Wang, H. C., & McKnight, C. C. (2005). Curriculum coherence: An examination of US mathematics and science content standards from an international perspective. Journal of Curriculum Studies, 37(5), 525–559.
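
Segool, N. K., Carlson, J. S., Goforth, A. N., von der Embse, N., & Barterian, J. A. (2013). Heightened test anxiety among young children: Elementary school students’ anxious responses to high-stakes testing. Psychology in the Schools, 50(5), 489–499.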

Segool, N., von der Embse, N. P., Mata, A., & Gallant, J. (2014). Cognitive behavioral model of test anxiety in a high-stakes context: An exploratory study. School Mental Health, 6, 50–61. http://dx.doi.org/10.1007/s12310-013-9111-7

Sena, J. D. W., Lowe, P. A., & Lee, S. W. (2007). Significant predictors of test anxiety among students with and without learning disabilities. Journal of Learning Disabilities, 40, 360–376.

Sommer, M., & Arendasy, M. E. (2014). Comparing different explanations of the effect of test anxiety on respondents’ test scores. Intelligence, 42, 115–127. http://dx.doi.org/10.1016/j.intell.2013.11.003

Thomas, C. L., Cassady, J. C., & Finch, W. H. (2017). Identifying severity standards on the Cognitive Test Anxiety Scale: Cut score determination using latent class and cluster analysis. Journal of Psychoeducational Assessment. http://dx.doi.org/10.1177/0734282916686004

Von der Embse, N., Jester, D., Roy, D., & Post, J. (2018). Test anxiety effects, predictors, and correlates: A 30-year meta-analytic review. Journal of Affective Disorders, 227, 483–493.

Zeidner, M. (1990). Does test anxiety bias scholastic aptitude test performance by gender and sociocultural group? Journal of Personality Assessment, 55, 145.

Zhai, X., Haudek, K. C., Stuhlsatz, M. A., & Wilson, C. (2020). Evaluation of construct-irrelevant variance yielded by machine and human scoring of a science teacher’s PCK constructed response assessment. Studies in Educational Evaluation, 67, 100916.

Zhai, X., Haudek, K. C., Wilson, C., & Stuhlsatz, M. (2021). A framework of construct-irrelevant variance for contextualized constructed response assessment. Frontiers in Education, 6, 751283.