Georgine M. Pion
David S. Cordray
QUALITATIVE AND ETHNOGRAPHIC
LeAnn G. Putney
Judith L. Green
Carol N. Dixon
SCHOOL AND PROGRAM EVALUATION
How do people learn to be effective teachers? What percentage of American students has access to computers at home? What types of assessments best measure learning in science classes? Do college admission tests place certain groups at a disadvantage? Can students who are at risk for dropping out of high school be identified? What is the impact of new technologies on school performance? These are some of the many questions that can be informed by the results of research.
Although research is not the only source used for seeking answers to such questions, it is an important one and the most reliable if executed well. Research is a process in which measurements are taken of individuals or organizations and the resulting data are subjected to analysis and interpretation. Special care is taken to provide as accurate an answer as possible to the posed question by subjecting "beliefs, conjectures, policies, positions, sources of ideas, traditions, and the like … to maximum criticism, in order to counteract and eliminate as much intellectual error as possible" (Bartley, pp. 139–140). In collecting the necessary information, a variety of methodologies and procedures can be used, many of which are shared by such disciplines as education, psychology, sociology, cognitive science, anthropology, history, and economics.
Evidence–The Foundation of Research
In education, research is approached from two distinct perspectives on how knowledge should be acquired. Research using quantitative methods rests on the belief that individuals, groups, organizations, and the environments in which they operate have an objective reality that is relatively constant across time and settings. Consequently, it is possible to construct measures that yield numerical data on this reality, which can then be further probed and interpreted by statistical analyses. In contrast, qualitative research methods are rooted in the conviction that "features of the social environment are constructed as interpretations by individuals and that these interpretations tend to be transitory and situational" (Gall, Borg, and Gall, p. 28). It is only through intensive study of specific cases in natural settings that these meanings and interpretations can be revealed and common themes educed. Although debate over which perspective is "right" continues, qualitative and quantitative research share a common feature–data are at the center of all forms of inquiry.
Fundamentally, data gathering boils down to two basic activities: Researchers either ask individuals (or other units) questions or observe behavior. More specifically, individuals can be asked about their attitudes, beliefs, and knowledge about past or current behaviors or experiences. Questions can also tap personality traits and other hypothetical constructs associated with individuals. Similarly, observations can take on a number of forms: (1) the observer can be a passive transducer of information or an active participant in the group being observed; (2) those being observed may or may not be aware that their behavior is being chronicled for research purposes; and (3) data gathering can be done by a human recorder or through the use of technology (e.g., video cameras or other electronic devices). Another distinction that is applicable to both forms of data gathering is whether the data are developed afresh within the study (i.e., primary data) or stem from secondary sources (e.g., data archives; written documents such as academic transcripts, individualized educational plans, or teacher notes; and artifacts that are found in natural settings). Artifacts can be very telling about naturally occurring phenomena. These can involve trace and accretion measures–that is, "residue" that individuals leave behind in the course of their daily lives. Examples include carpet wear in front of exhibits at children's museums (showing which exhibits are the most popular), graffiti written on school buildings, and websites visited by students.
What should be clear from this discussion so far is that there exists a vast array of approaches to gathering evidence about educational and social phenomena. Although reliance on empirical data distinguishes research-based disciplines from other modes of knowing, decisions about what to gather and how to structure the data gathering process need to be governed by the purpose of the research. In addition, a thoughtful combination of data gathering approaches has the greatest chance of producing the most accurate answer.
Purposes of Research
The array of questions listed in the introductory paragraph suggests that research is done for a variety of purposes. These include exploring, describing, predicting, explaining, or evaluating some phenomenon or set of phenomena. Some research is aimed at replicating results from previous studies; other research is focused on quantitatively synthesizing a body of research. These two types of efforts are directed at strengthening a theory, verifying predictions, or probing the robustness of explanations by seeing if they hold true for different types of individuals, organizations, or settings.
Exploration. Very little may be known about some phenomena such as new types of settings, practices, or groups. Here, the research question focuses on identifying salient characteristics or features that merit further and more concerted examination in additional studies.
Description. Often, research is initiated to carefully describe a phenomenon or problem in terms of its structure, form, key ingredients, magnitude, and/or changes over time. The resulting profiles can be qualitative (narrative), quantitative (e.g., x number of people have this characteristic), or a mixture of both. For example, the National Center for Education Statistics collects statistical information about several aspects of education and monitors changes in these indicators over time. The information covers a broad range of topics, most of which are chosen because of their interest to policymakers and educational personnel.
Prediction. Some questions seek to predict the occurrence of specific phenomena or states on the basis of one or more other characteristics. Short- and long-term planning is often the main rationale for this type of research.
Explanation. It is possible to predict the occurrence of a certain phenomenon but not to know exactly why this relationship exists. In explanatory research, the aim is not only to predict the outcome or state of interest but also to understand the mechanisms and processes that result in one variable causing another.
Evaluation. Questions of this nature focus on evaluating or judging the worth of something, typically an intervention or program. Of primary interest is to learn whether an organized set of activities that is aimed at correcting some problem (e.g., poor academic skills, low self-esteem, disruptive behavior) is effective. When these efforts are targeted at evaluating the potential or actual success of policies, regulations, and laws, this is often known as policy analysis.
Replication. Some questions revolve around whether a demonstrated relationship between two variables (e.g., predictive value of the SAT in college persistence) can be found again in different populations or different types of settings. Because few studies can incorporate all relevant populations and settings, it is important to determine how well the results of a study generalize to a particular group or program.
Synthesis. Taking stock of what is known and what is not known is a major function of research. "Summing-up" a body of prior research can take quantitative (e.g., meta-analysis) and qualitative (narrative summaries) forms.
Types of Research Methods
The purpose or purposes underlying a research study guide the choice of the specific research methods that are used. Any individual research study may address multiple questions, not all of which share the same purpose. Consequently, more than one research method may be incorporated into a particular research effort. Because methods of investigation are not pure (i.e., free of bias), several types of data and methods of gathering data are often used to "triangulate" on the answer to a specific question.
Measurement development. At the root of most inquiry is the act of measuring key conceptual variables of interest (e.g., learning strategies, intrinsic motivation, learning with understanding). When the outcomes being measured are important (e.g., grade placement, speech therapy, college admission), considerable research is often needed prior to conducting the main research study to ensure that the measure accurately describes individuals' status or performance. This can require substantial data collection and analysis in order to determine the measure's reliability, validity, and sensitivity to change; for some measures, additional data from a variety of diverse groups must be gathered for establishing norms that can assist in interpretation. With the exception of exploratory research, the quality of most studies relies heavily upon the degree to which the data-collection instruments provide reliable and valid information on the variables of interest.
Survey methodology. Survey research is primarily aimed at collecting self-report information about a population by asking questions directly of some sample of it. The members of the target population can be individuals (e.g., local teachers), organizations (e.g., parent–teacher associations), or other recognized bodies (e.g., school districts or states). The questions can be directed at examining attitudes and preferences, facts, previous behaviors, and past experiences. Such questions can be asked by interviewers either face-to-face or on the telephone; they can also be self-administered by distributing them to groups (e.g., students in classrooms) or delivering them via the mail, e-mail, or the Internet.
High-quality surveys devote considerable attention to reducing as much as possible the major sources of error that can bias the results. For example, the target population needs to be completely enumerated so that important segments or groups are not unintentionally excluded from being eligible to participate. The sample is chosen in a way as to be representative of the population of interest, which is best accomplished through the use of probability sampling. Substantial time is given to constructing survey questions, pilot testing them, and training interviewers so that item wording, question presentation and format, and interviewing styles are likely to encourage thoughtful and accurate responses. Finally, concerted efforts are used to encourage all sampled individuals to complete the interview or questionnaire.
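The simplest form of the probability sampling mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 500 teacher IDs, the function name `simple_random_sample`, and the sample size are all hypothetical, and real surveys often use stratified or clustered designs instead.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement from an enumerated sampling frame.

    Every unit in the frame has the same chance of being selected,
    which is the defining property of a simple random sample.
    """
    units = list(frame)
    if n > len(units):
        raise ValueError("sample size exceeds the size of the sampling frame")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(units, n)


# Hypothetical sampling frame: a complete list of 500 teacher IDs.
frame = [f"teacher_{i:03d}" for i in range(500)]
sample = simple_random_sample(frame, 50, seed=42)
```

The sketch presupposes that the frame is a complete enumeration of the target population; if segments of the population are missing from the list, no sampling procedure can restore them.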
Surveys are mainly designed for description and prediction. Because they rarely involve the manipulation of independent variables or random assignment of individuals (or units) to conditions, they generally are less useful by themselves for answering explanatory and effects-oriented evaluative questions. If survey research is separated into its two fundamental components–sampling and data gathering through the use of questionnaires–it is easy to see that survey methods are embedded within experimental and quasi-experimental studies. For example, comparing the learning outcomes of students enrolled in traditional classroom-based college courses with those of students completing the course through distance learning would likely involve the administration of surveys that assess student views of the instructor and their satisfaction with how the course was taught. As another illustration, a major evaluation of Sesame Street that randomly assigned classrooms to in-class viewing of the program involved not only administering standardized reading tests to the students participating but also surveys of teachers and parents. So, in this sense, many forms of inquiry can be improved by using state-of-the-art methods in questionnaire construction and measurement.
Observational methods. Instead of relying on individuals' self-reports of events, researchers can conduct their own observations. This is often preferable when there is a concern that individuals may misreport the requested information, either deliberately or inadvertently (e.g., they cannot remember). In addition, some variables are better measured by direct observation. For example, when direct observations of how long teachers lecture in a class are compared with teachers' self-reports of the time they spent lecturing, it should be obvious that the latter could be influenced (biased upward or downward) by how the teachers believe the researcher wants them to respond.
Observational methods are typically used in natural settings, although, as with survey methods, observations can be made of behaviors even in experimental and quasi-experimental studies. Both quantitative and qualitative observation strategies are possible. Quantitative strategies involve either training observers to record the information of interest in a systematic fashion or employing audiotape recorders, video cameras, and other electronic devices. When observers are used, they must be trained and monitored as to what should be observed and how it should be recorded (e.g., the number of times that a target behavior occurs during an agreed-upon time period).
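One common way to monitor trained observers, offered here as an illustration rather than as a procedure prescribed by the text, is to have two observers code the same intervals and then compare their records. The sketch below computes simple percent agreement and Cohen's kappa, which corrects agreement for chance. The interval codes ("on" and "off" task) and the two observers' records are invented for the example.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of observation intervals on which two observers recorded the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both observers must code the same non-empty set of intervals")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two observers, corrected for agreement expected by chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    categories = set(codes_a) | set(codes_b)
    # Chance agreement: product of each observer's marginal proportions, summed over codes.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both observers used one identical code throughout
    return (observed - expected) / (1 - expected)


# Invented records: two observers coding ten intervals as on-task or off-task.
observer_1 = ["on", "on", "off", "off", "on", "off", "on", "off", "on", "on"]
observer_2 = ["on", "on", "off", "on", "on", "off", "on", "off", "off", "on"]

agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
```

Low agreement on a sample of jointly coded sessions would suggest that the coding scheme or the observer training needs revision before the main data collection proceeds.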
Qualitative observational methods are distinctly different in several ways. First, rather than coding a prescribed set of behaviors, the focus of the observations is deliberately left more open-ended. By using open-ended observation schemes, the full range of individuals' responses to an environment can be recorded. That is, observations are much broader in contrast to quantitative observational strategies that focus on specific behaviors. Second, observers do not necessarily strive to remain neutral about what they are observing and may include their own feelings and experiences in interpreting what happened. Also, observers who employ quantitative methods do not participate in the situations that they are observing. In contrast, observers in qualitative research are not typically detached from the setting being studied; rather, they are more likely to be complete participants where the researcher is a member of the setting that is being observed.
Qualitative strategies are typically used to answer exploratory questions as they help identify important variables and hypotheses about them. They also are commonly used to answer descriptive questions because they can provide in-depth information about groups and situations. Although qualitative strategies have been used to answer predictive, explanatory, and evaluative questions, they are less able to yield results that can eliminate all rival explanations for causal relationships.
Experimental methods. Experimental research methods are ideally suited for examining explanatory questions that seek to ascertain whether a cause-and-effect relationship exists among two or more variables. In experiments, the researcher directly manipulates the cause (the independent variable), assigns individuals randomly to various levels of the independent variable, and measures their responses (the expected effect). Ideally, the researcher has a high degree of control over the presentation of the purported cause–where, when, and in what form it is delivered; who receives it; and when and how the effect is measured. This level of control helps rule out alternative or rival explanations for the observed results. Exercising this control typically requires that the research be done under laboratory or contrived conditions rather than in natural settings. Experimental methods, however, can also be used in real-world settings–these are commonly referred to as field experiments.
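Random assignment, the defining step of the experimental method described above, can be illustrated with a short Python sketch. The participant labels, the condition names, and the function `randomly_assign` are hypothetical; the sketch shows one simple scheme (shuffle, then deal participants into conditions in turn), not the only acceptable one.

```python
import random


def randomly_assign(participants, conditions=("treatment", "control"), seed=None):
    """Randomly assign participants to conditions in (nearly) equal numbers.

    The list is shuffled, then participants are dealt into the conditions
    in turn, so chance alone determines who ends up in which group.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, participant in enumerate(shuffled):
        groups[conditions[position % len(conditions)]].append(participant)
    return groups


# Hypothetical pool of twenty students.
students = [f"student_{i}" for i in range(20)]
groups = randomly_assign(students, seed=7)
```

Because assignment depends only on the shuffle, characteristics such as motivation or prior achievement are expected to be balanced across groups on average, which is what allows rival explanations to be ruled out.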
Conducting experiments in the field is more difficult inasmuch as the chances increase that integral parts of the experimental method will be compromised. Participants may be more likely to leave the study and thus be unavailable for measurement of the outcomes of interest. In an evaluation of a tutoring program, for example, subjects who are randomly assigned to the control group, which is to receive no tutoring, may decide to obtain help on their own–assistance that resembles the intervention being tested. Such problems essentially work against controlling for rival explanations, and the key elements of the experimental method are sacrificed. Excellent discussions of procedures for conducting field experiments can be found in the 2002 book Experimental and Quasi-Experimental Designs for Generalized Causal Inference, written by William R. Shadish, Thomas D. Cook, and Donald T. Campbell, and in Robert F. Boruch's 1997 book Randomized Field Experiments for Planning and Evaluation: A Practical Guide.
Quasi-experimental methods. As suggested by its name, the methods that comprise quasi-experimental research approximate experimental methodologies. They are directed at fulfilling the same purposes–explanation and evaluation–but may provide more equivocal answers than experimental designs. The key characteristic that distinguishes quasi experiments from experiments is the lack of random assignment. Because of this, researchers must make concerted efforts to rule out the plausible rival hypotheses that random assignment is designed to eliminate.
Quasi-experimental designs constitute a core set of research strategies because there are many instances in which it is impossible to successfully assign participants randomly to different conditions or levels of the independent variable. For example, the first evaluation of Sesame Street that was conducted by Samuel Ball and Gerry Bogatz in 1970 was designed as a randomized experiment where individual children in five locations were randomly assigned to either be encouraged to watch the television program (and be observed in their homes doing it) or not encouraged. Classrooms in these locations were also either given television sets or not, and teachers in classrooms with television sets were encouraged to allow the children to view the show at least three days per week. The study, however, turned into a quasi experiment because Sesame Street became so popular that children in the control group (who were not encouraged to watch) ended up watching a considerable number of shows.
The two most frequently used quasi-experimental strategies are time-series designs and nonequivalent comparison group designs, each of which has some variations. In time-series designs, the dependent variable or expected effect is measured several times before and after the independent variable is introduced. For example, in a study of a zero tolerance policy, the number of school incidents related to violence and substance use is recorded on a monthly basis for twelve months before the policy is introduced and twelve or more months after its implementation. If a noticeable reduction in incidents occurs soon after the new policy is introduced and the reduction persists, one can be reasonably confident that the new policy was responsible for the observed reduction if no other events occurred that could have resulted in a decline and there was evidence that the policy was actually enforced. This confidence may be even stronger if data are collected on schools that have similar student populations and characteristics but no zero tolerance policies during the same period and there is no reduction in illegal substance and violence-related incidents.
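The logic of the zero tolerance example can be sketched with a few lines of Python. All of the monthly counts below are invented for illustration, and the comparison of simple before-and-after means is a deliberate simplification: an actual time-series analysis would also examine trends, seasonality, and the timing of other events.

```python
from statistics import mean


def pre_post_change(series, intervention_index):
    """Change in the mean of a series after the intervention point.

    series: measurements in time order (e.g., monthly incident counts)
    intervention_index: position of the first post-intervention measurement
    """
    before = series[:intervention_index]
    after = series[intervention_index:]
    if not before or not after:
        raise ValueError("need measurements both before and after the intervention")
    return mean(after) - mean(before)


# Invented monthly incident counts: 12 months before and 12 months after the policy.
policy_school = [30, 28, 31, 29, 32, 30, 29, 31, 30, 28, 31, 31,
                 22, 21, 23, 20, 22, 21, 20, 22, 21, 20, 21, 19]
# Invented counts for a similar school with no such policy over the same 24 months.
comparison_school = [29, 31, 30, 30, 28, 32, 30, 29, 31, 30, 30, 30,
                     30, 29, 31, 30, 30, 29, 31, 30, 30, 31, 29, 30]

policy_change = pre_post_change(policy_school, 12)
comparison_change = pre_post_change(comparison_school, 12)

# A drop in the policy school that is not mirrored in the comparison school
# is consistent with (but does not by itself prove) a policy effect.
difference = policy_change - comparison_change
```

With these made-up figures, the policy school's monthly average falls while the comparison school's stays flat, which is the pattern the paragraph describes as strengthening confidence in the policy's effect.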
Establishing causal relationships with the nonequivalent comparison group design is typically more difficult. This is because when groups are formed in ways other than random assignment (e.g., participant choice), this often means that they differ in other ways that affect the outcome of interest. For example, suppose that students who are having problems academically are identified and allowed to choose to be involved or not involved in an after-school tutoring program. Those who decide to enroll are also those who may be more motivated to do well, who may have parents who are willing to help their children improve, and who may differ in other ways from those who choose not to stay after school. They may also have less-serious academic problems. Such factors all may contribute to these students exhibiting higher academic gains than their nontutored counterparts do when after-tutoring testing has been completed. It is difficult, however, to disentangle the role that tutoring contributed to any observed improvement from these other features. The use of well-validated measures of these characteristics for both groups prior to receiving or not receiving tutoring can help in this process, but the difficulty is to identify and measure all the key variables other than tutoring receipt that can influence the observed outcomes.
Secondary analysis and meta-analysis. Both secondary analysis and meta-analysis are part of the arsenal of quantitative research methods, and both rely on research data already collected by other studies. They are invaluable tools for informing questions that seek descriptive, predictive, explanatory, or evaluative answers. Studies that rely on secondary analysis focus on examining and reanalyzing the raw data from prior surveys, experiments, and quasi experiments. In some cases, the questions prompting the analysis are ones that were not examined by the original investigator; in other cases, secondary analysis is performed because the researcher disagrees to some extent with the original conclusions and wants to probe the data, using different statistical techniques.
Secondary analyses occupy a distinct place in educational research. Since the 1960s federal agencies have sponsored several large-scale survey and evaluation efforts relevant to education, which have been analyzed by other researchers to re-examine the reported results or answer additional questions not addressed by the original researchers. Two examples, both conducted by the National Center for Education Statistics, include the High School and Beyond Survey, which tracks seniors and sophomores as they progress through high school and college and enter the workplace; and the Schools and Staffing Survey, which regularly collects data on the characteristics and qualifications of teachers and principals, class size, and other school conditions.
The primary idea underlying meta-analysis or research synthesis methods is to go beyond the more traditional, narrative literature reviews of research in a given area. The process involves using systematic and comprehensive retrieval practices for accumulating prior studies, quantifying the results by using a common metric (such as the effect size), and statistically combining this collection of results. In general, the reported results that are used from studies involve intermediate statistics such as means, standard deviations, proportions, and correlations.
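The core arithmetic of this process can be sketched as follows. The sketch assumes one common choice of metric, the standardized mean difference, and one common way of combining results, a fixed-effect inverse-variance weighted average; many meta-analyses instead use random-effects models or small-sample corrections. The function names and the example numbers are hypothetical.

```python
import math


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Effect size d for one study, with its approximate sampling variance.

    Built from the intermediate statistics a study typically reports:
    group means, standard deviations, and sample sizes.
    """
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    variance = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, variance


def fixed_effect_pool(effects):
    """Combine (d, variance) pairs into one inverse-variance weighted estimate."""
    weights = [1 / variance for _, variance in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1 / sum(weights))
    return pooled, standard_error


# Hypothetical study: treatment mean 105, control mean 100, both SD 10, 50 per group.
d, var_d = standardized_mean_difference(105, 10, 50, 100, 10, 50)

# Hypothetical collection of two studies, each given as (effect size, variance).
pooled_d, pooled_se = fixed_effect_pool([(0.2, 0.01), (0.6, 0.03)])
```

Weighting by the inverse of the variance gives larger, more precise studies more influence on the combined estimate than smaller ones.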
The use of meta-analysis grew dramatically in the 1990s. Its strength is that it allows one to draw conclusions across multiple studies that addressed the same question (e.g., what have been the effects of bilingual education?) but used different measures, populations, settings, and study designs. The use of both secondary analysis and meta-analysis has increased the longer-term value of individual research efforts, either by increasing the number of questions that can be answered from one large-scale survey or by looking across several small-scale studies that seek answers to the same question. These research methods have contributed much to addressing policymakers' questions in a timely fashion and to advancing theories relevant to translating educational research into recommended practices.
BALL, SAMUEL, and BOGATZ, GERRY A. 1970. The First Year of Sesame Street: An Evaluation. Princeton, NJ: Educational Testing Service.
BARTLEY, WILLIAM W., III. 1962. The Retreat to Commitment. New York: Knopf.
BORUCH, ROBERT F. 1997. Randomized Field Experiments for Planning and Evaluation: A Practical Guide. Thousand Oaks, CA: Sage.
BRYK, ANTHONY S., and RAUDENBUSH, STEPHEN W. 1992. Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park, CA: Sage.
COOK, THOMAS D.; COOPER, HARRIS; CORDRAY, DAVID S.; HARTMANN, HEIDI; HEDGES, LARRY V.; LIGHT, RICHARD J.; LOUIS, THOMAS A.; and MOSTELLER, FREDERICK, eds. 1992. Meta-Analysis for Explanation: A Casebook. New York: Russell Sage Foundation.
COOPER, HARRIS, and HEDGES, LARRY V., eds. 1994. The Handbook of Research Synthesis. New York: Russell Sage Foundation.
GALL, MERIDITH D.; BORG, WALTER R.; and GALL, JOYCE P. 1996. Educational Research: An Introduction, 6th edition. White Plains, NY: Longman.
SHADISH, WILLIAM R.; COOK, THOMAS D.; and CAMPBELL, DONALD T. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.
GEORGINE M. PION
DAVID S. CORDRAY
QUALITATIVE AND ETHNOGRAPHIC
A qualitative approach to research generally involves the researcher in contact with participants in their natural setting to answer questions related to how the participants make sense of their lives. Qualitative researchers may observe the participants and conduct formal and informal interviews to further an understanding of what is going on in the setting from the point of view of those involved in the study. Ethnographic research shares these qualitative traits, but ethnographers more specifically seek understanding of what participants do to create the culture in which they live, and how the culture develops over time. This article further explores what it means to conduct qualitative and ethnographic research by looking at them historically and then by describing key characteristics of these approaches.
The Context in Education
Qualitative and ethnographic research developed in education in the late 1970s. Ethnographic researchers drew on theory and methods in anthropology and sociology, creating a distinction between ethnography of education (work undertaken by anthropologists and sociologists) and ethnography in education (work undertaken by educators to address educational issues). Other forms of qualitative research drew on theories from the humanities and other social and behavioral sciences, adapting this work to educational goals and concerns, often creating new forms (e.g., connoisseurship, a field method approach, interview approaches, and some forms of action research).
In the early development of these traditions, educational researchers struggled for acceptance by both other professionals and policymakers. This phase was characterized by arguments over the value of qualitative methods in contrast to the dominant paradigms of the time–quantitative and experimental approaches. Qualitative and ethnographic researchers argued that questions important to education were left unexamined by the dominant paradigms. Some qualitative researchers argued for the need to include and represent the voices of people in their research, particularly voices not heard in other forms of research involving large-scale studies.
Questions asked by qualitative and ethnographic researchers generally focus on understanding the local experiences of people as they engage in their everyday worlds (e.g., classrooms, peer groups, homes, communities). For example, some researchers explore questions about ways in which people gain, or fail to gain, access to ways of learning in a diverse world; others focus on beliefs people hold about education and learning; while still others examine how patterns learned within a group are consequential for participation in other groups and situations.
A broad range of perspectives and approaches exist, each with its own historical tradition and theoretical orientation. A number of common dimensions can be identified across these perspectives and approaches. Qualitative and ethnographic researchers in education are concerned with the positions they take relative to participants and data collected. For example, many qualitative and ethnographic researchers engage in observations over a period of time to identify patterns of life in a particular group.
The theoretical orientation chosen guides the design and implementation of the research, including the tools used to collect (e.g., participant observation, interviewing, and collecting artifacts) and analyze data (e.g., discourse analysis, document analysis, content analysis, and transcribing video/audio data). Theory also guides other decisions, including how to enter the field (e.g., the social group, classroom, home, and/or community center), what types and how much data to collect and records to make (e.g., videotape, audiotape, and/or field notes), who to interview (formally and/or informally), how long to remain in the field (e.g., for ethnography, one or more years), and what literature is relevant. It also influences relationships researchers establish with people in local settings, which in turn influences what can be known. Some theoretical perspectives guide researchers to observe what is occurring from a distance by taking the role of passive observer, recording information for analysis once they leave the field. Such researchers often do not interview participants, preferring to "ground" their observations in patterns in the data, without concern for what members understand. These descriptions are called etic, or outsider descriptions, because the observer is not concerned with members' understandings.
This approach is in contrast with ones in which researchers join the group and become active participant-observers, at times participating directly in events. Such researchers also make videotape records that enable them to step back from what they thought was occurring to examine closely what resulted from those actions. Those not using video or audio records reconstruct events by constructing retrospective field notes, drawing on their memories of what occurred to create a written record to analyze when they leave the field. Just which type of approach and position researchers take depends on their research goal(s) and theoretical orientation(s) as well as what participants permit.
Approaches to Research Questions
Research questions in a qualitative study are generated as part of the research process. Qualitative and ethnographic researchers often begin a study with one or more initiating question(s) or an issue they want to examine. Qualitative and ethnographic research approaches involve a process of interacting with data, reflecting on what is important to members in the local setting, and using this to generate new questions and refine the initial questions. This interactive and responsive process also influences the data that are collected and analyzed throughout the study. Therefore, it is common for researchers to construct more detailed questions that are generated as part of the analysis as they proceed throughout the study, or to abandon questions and generate ones more relevant to the local group or issues being studied.
For example, in one study of a fifth-grade classroom, the initial research questions were open ended and general: (1) What counts as community to the students and teacher in this classroom? (2) How do the participants construct community in this classroom? and (3) How is participating in this classroom consequential for students and the teacher? As the study unfolded, the research questions became more directed toward what the researcher was beginning to understand about this classroom in particular. After first developing an understanding of patterns of interactions among participants, the researcher began to formulate more specific questions: (1) What patterns of practice does the teacher construct to offer opportunities for learning? (2) What roles do the social and academic practices play in the construction of community in this classroom? and (3) What are the consequences for individuals and the collective when a member leaves and reenters the classroom community? This last question was one that could not have been anticipated but was important to understanding what students learned and when student learning occurred as well as what supported and constrained that learning. The shifts in questions constitute this researcher's logic of inquiry and need to be reported as part of the dynamic design of the study.
Approaches to Design and Data Collection
In designing qualitative studies, researchers consider ways of collecting data to represent the multiple voices and actions constituting the research setting. Typical techniques used in qualitative research for collecting data include observing in the particular setting, conducting interviews with various participants, and reviewing documents or artifacts. The degree to which these techniques are used depends on the nature of the particular research study and what occurs in the local group.
Some studies involve in-depth analysis of one setting or interviews of one group of people. Others involve a contrastive design from the beginning, seeking to understand how the practices of one group are similar to or different from another group. Others seek to study multiple communities to test hypotheses from the research literature (e.g., child-rearing practices are the same in all communities). What is common to all of these studies is that they are examining the qualities of life and experiences within a local situation. This is often called a situated perspective.
Entering the Field and Gaining Access to Insider Knowledge
Entering the research setting is one of the first phases of conducting fieldwork. Gaining access to the site is ongoing and negotiated with the participants throughout the study. As new questions arise, the researcher has to renegotiate access. For example, a researcher may find that the outcomes of standardized tests become an important issue for the teachers and students. The researcher may not have obtained permission to collect these data at the beginning of the study and must then negotiate permission from parents, students, teachers, and district personnel to gain access to these scores.
Qualitative research involves a social contract with those participating in the study, and informed consent is negotiated at each phase of the research when new information is needed or new areas of study are undertaken. At such points of renegotiation, researchers need to consider the tools necessary and the ways to participate within the group (e.g., as participant-observer and/or observer-participant, as interviewer of one person or as a facilitator of a focus group, or as analyst of district data or student products). How the researcher conducts observations, collects new forms of data, and analyzes such data is related to shifts in questions and/or theoretical stance(s) necessary to understand what is occurring.
One of the most frequently used tools, in addition to participant observation, is interviewing. For ethnography and other types of field research, interviews occur within the context of the ongoing observations and collection of artifacts. These interviews are grounded in what is occurring in the local context, both within and across time. Some interviews are undertaken to gain insider information about what the researcher is observing or to test out the developing theory that the researcher is constructing.
In contrast, other forms of qualitative research may use interviews as the sole form of data collection. Such interviews also seek meanings that individuals or groups have for their own experience or of observed phenomena. These interviews, however, form the basis for analysis and do not require contextual information from observations. What the people say becomes the basis for exploration, not what was observed.
Other tools used by qualitative and ethnographic researchers include artifact and document analysis (artifacts being anything people make and use). The researcher in a field-based study collects artifacts produced and/or used by members of the group, identifies how these artifacts function for the individual and/or the group, and explores how members talk about and name these artifacts. For some theoretical positions, the artifacts may be viewed as a type of participant in the local event (e.g., computer programs as participants). Some artifacts, such as documents, are examined for links to other events or artifacts. This form of analysis builds on the understanding that the past (and future) is present in these artifacts and that intertextual links between and among events are often inscribed in such documents. In some cases, qualitative researchers may focus solely on a set of artifacts (e.g., student work, linked sets of laws, a photograph collection, or written texts in the environment–environmental print). Such studies seek to examine the range of texts or materials constructed, the patterned ways in which the texts are constructed, and how the choices of focus or discourse inscribe the views that members have of self and others as well as what is possible in their worlds.
Although some qualitative studies focus solely on the documents, field-based researchers generally move between document analysis and an exploration of the relationship of the document to past, present, and future actions of individuals and/or groups. These studies seek to understand the importance of the artifact or document within the lives of those being studied.
Ongoing Data Analysis
While conducting fieldwork, researchers reread their field notes and add to them any relevant information that they were not able to include at the time of first writing the notes. While reviewing their field notes, researchers look for themes and information relevant to the research questions. They note this information in the form of theoretical notes (or write theoretical memos to themselves) that may include questions about repeated patterns, links to other theories, and conceptual ideas they are beginning to develop. They also make methodological notes to reconstruct their thinking and their logic of inquiry. Sometimes they make personal notes that reflect their thoughts and feelings about what they are observing or experiencing. These notes allow them to keep from imposing their own opinion on data, helping them to focus on what is meaningful or important to those with whom they are working.
Researchers constantly use contrast to build interpretations that are grounded in the data, within and across actors, events, times, actions, and activities that constitute the social situations of everyday life. Many qualitative (particularly ethnographic) researchers examine material, activity, semiotic (meaning-carrying), and/or social dimensions of everyday life and its consequences for members. The analytic principles of practice that they use include comparing and contrasting data, methods, theories, and perspectives; examining part-whole relationships between and among actions, events, and actors; seeking insider (emic) understandings of experiences, actions, practices, and events; and identifying through these what is relevant to the local group.
Reporting Research Findings
The final step in qualitative and ethnographic research is writing an account. The researchers make choices about how to represent the data that illustrate what was typical about the particular group being studied. Another choice might be to highlight actions of the group that were illustrative of their particular patterns of beliefs. In some studies, several cases are chosen to make visible comparisons across different activities within the group, or across different groups that may have some activities in common. For example, researchers who study classroom interactions might bring together data from different classrooms to make visible principles of practice that are similar in general terms such as asking students to understand various points of view. However, in each classroom, the actions of juxtaposing points of view will be carried out differently due to the different experiences within each classroom.
Researchers also select genres for writing the report that best enable the intended audience to understand what the study made visible that was not previously known or that extended previous knowledge. The researcher does not seek to generalize from the specific case. Rather, qualitative or ethnographic researchers provide in-depth descriptions that lead to general patterns. These patterns are then examined in other situations to see if, when, and how they occur and what consequences they have for what members in the new setting can know, do, understand, and/or produce. In qualitative and ethnographic studies this is often referred to as transferability, in contrast to generalizability.
DENZIN, NORMAN, and LINCOLN, YVONNA, eds. 1994. Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
ERICKSON, FREDRICK. 1986. "Qualitative Research." In The Handbook of Research on Teaching, 3rd edition, ed. Merle Wittrock. New York: Macmillan.
FLOOD, JAMES; JENSEN, JULIE; LAPP, DIANE; and SQUIRE, JAMES, eds. 1990. Handbook of Research on Teaching the English Language Arts. New York: Macmillan.
GEE, JAMES, and GREEN, JUDITH. 1998. "Discourse Analysis, Learning, and Social Practice: A Methodological Study." Review of Research in Education 23:119–169.
GILLMORE, PERRY, and GLATTHORN, ALAN, eds. Children In and Out of School: Ethnography and Education. Washington, DC: Center for Applied Linguistics.
GREEN, JUDITH; DIXON, CAROL; and ZAHARLICK, AMY. 2002. "Ethnography as a Logic of Inquiry." In Handbook for Methods of Research on English Language Arts Teaching, ed. James Flood, Julie Jensen, Diane Lapp, and James Squire. New York: Macmillan.
HAMMERSLEY, MARTIN, and ATKINSON, PAUL. 1995. Ethnography: Principles in Practice, 2nd edition. New York: Routledge.
KVALE, STEINAR. 1996. Interviews: An Introduction to Qualitative Research Interviewing. Thousand Oaks, CA: Sage.
LECOMPTE, MARGARET; MILLROY, WENDY; and PREISSLE, JUDITH, eds. 1992. The Handbook of Qualitative Research in Education. San Diego, CA: Academic Press.
LINDE, CHARLOTTE. 1993. Life Stories: The Creation of Coherence. New York: Oxford University Press.
OCHS, ELINOR. 1979. "Transcription as Theory." In Developmental Pragmatics, ed. Elinor Ochs and Bambi B. Schieffelin. New York: Academic Press.
PUTNEY, LEANN; GREEN, JUDITH; DIXON, CAROL; and KELLY, GREGORY. 1999. "Evolution of Qualitative Research Methodology: Looking beyond Defense to Possibilities." Reading Research Quarterly 34:368–377.
RICHARDSON, VIRGINIA. 2002. Handbook for Research on Teaching, 4th edition. Mahwah, NJ: Erlbaum.
SPRADLEY, JAMES. 1980. Participant Observation. New York: Holt, Rinehart and Winston.
STRIKE, KENNETH. 1974. "On the Expressive Potential of Behaviorist Language." American Educational Research Journal 11:103–120.
VAN MAANEN, JOHN. 1988. Tales of the Field: On Writing Ethnography. Chicago: University of Chicago Press.
WOLCOTT, HARRY. 1992. "Posturing in Qualitative Research." In The Handbook of Qualitative Research in Education, ed. Margaret LeCompte, Wendy Millroy, and Judith Preissle. New York: Academic Press.
LEANN G. PUTNEY
JUDITH L. GREEN
CAROL N. DIXON
SCHOOL AND PROGRAM EVALUATION
Program evaluation is research designed to assess the implementation and effects of a program. Its purposes vary and can include (1) program improvement, (2) judging the value of a program, (3) assessing the utility of particular components of a program, and (4) meeting accountability requirements. Results of program evaluations are often used for decisions about whether to continue a program, improve it, institute similar programs elsewhere, allocate resources among competing programs, or accept or reject a program approach or theory. Through these uses program evaluation is viewed as a way of rationalizing policy decision-making.
Program evaluation is conducted for a wide range of programs, from broad social programs such as welfare, to large multisite programs such as the preschool intervention program Head Start, to program funding streams such as the U.S. Department of Education's Title I program that gives millions of dollars to high-poverty schools, to small-scale programs with only one or a few sites such as a new mathematics curriculum in one school or district.
Scientific Research versus Evaluation
There has been some debate about the relationship between "basic" or scientific research and program evaluation. For example, in 1999 Peter Rossi, Howard Freeman, and Mark Lipsey described program evaluation as the application of scientific research methods to the assessment of the design and implementation of a program. In contrast, Michael Patton in 1997 described program evaluation not as the application of scientific research methods, but as the systematic collection of information about a program to inform decision-making.
Both agree, however, that in many circumstances the design of a program evaluation that is sufficient for answering evaluation questions and providing guidance to decision-makers would not meet the high standards of scientific research. Further, program evaluations are often not able to strictly follow the principles of scientific research because evaluators must confront the politics of changing actors and priorities, limited resources, short timelines, and imperfect program implementation.
Another dimension on which scientific research and program evaluation differ is their purpose. Program evaluations must be designed to maximize the usefulness for decision-makers, whereas scientific research does not have this constraint. Both types of research might use the same methods or focus on the same subject, but scientific research can be formulated solely from intellectual curiosity, whereas evaluations must respond to the policy and program interests of stakeholders (i.e., those who hold a stake in the program, such as those who fund or manage it, or program staff or clients).
How Did Program Evaluation Evolve?
Program evaluation began proliferating in the 1960s, with the dawn of social antipoverty programs and the government's desire to hold the programs accountable for positive results. Education program evaluation in particular also expanded because of the formal evaluation requirements of the National Science Foundation–sponsored mathematics and science curriculum reforms that were a response to the 1957 launch of Sputnik by the Soviet Union, as well as the evaluation requirements instituted as part of the Elementary and Secondary Education Act of 1965.
Experimentation versus Quasi-experimentation
The first large-scale evaluations in education were the subject of much criticism. Two influential early evaluations in particular drew attention: Paul Berman and Milbrey McLaughlin's 1973–1978 RAND Change Agent study of four major federal programs: the Elementary and Secondary Education Act, Title VII (bilingual education), the Vocational Education Act, and the Right to Read Act; and a four-year study of Follow Through, which sampled 20,000 students and compared thirteen models of early childhood education. Some of the criticisms of these evaluations were that they were conducted over too short a time frame, used crude measures that did not look at incremental or intermediate change, had statistical inadequacies including invalid assumptions, used poorly supported models and inappropriate analyses, and did not consider the social context of the program.
These criticisms led to the promotion of the use of experiments for program evaluation. Donald Campbell wrote an influential article in 1969 advocating the use of experimental designs in social program evaluation. The Social Science Research Council commissioned Henry Riecken and Robert Boruch to write the 1978 book Social Experimentation, which served as both a "guidebook and manifesto" for using experimentation in program evaluation. The best example of the use of experimentation in social research is the New Jersey negative income tax experiment sponsored by the Office of Economic Opportunity of the federal Department of Health, Education, and Welfare.
Experiments are the strongest designs for assessing impact, because through random sampling from the population of interest and random assignment to treatment and control groups, experiments rule out other factors besides the program that might explain program success. There are several practical disadvantages to experiments, however. First, they require that the program be a partial coverage program–that is, there must be people who do not participate in the program, who can serve as the control group. Second, experiments require large amounts of resources that are not always available. Third, they require that the program be firmly and consistently implemented, which is frequently not the case. Fourth, experiments do not provide information about how the program achieved its effects. Fifth, program stakeholders sometimes feel that random assignment to the program is unethical or politically unfeasible. Sixth, an experimental design in a field study is likely to produce no more than an approximation of a true experiment, because of such factors as systematic attrition from the program, which leaves the evaluator with a biased sample of participants (e.g., those who leave the program, or attrite, might be those who are the hardest to influence, so successful program outcomes would be biased in the positive direction).
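The logic of random assignment and the attrition problem described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the score distribution, the program effect of 5 points, the 20 percent attrition share, and all function names are hypothetical and chosen for illustration, not drawn from any actual evaluation.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # hypothetical program effect, assumed for illustration


def randomly_assign(ids, rng):
    """Shuffle participant ids and split them into treatment and control halves."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def simulate(n=1000, attrition_share=0.2, seed=0):
    """Return (estimate with all participants, estimate after systematic attrition)."""
    rng = random.Random(seed)
    # Hypothetical baseline scores; the program adds TRUE_EFFECT for treated cases.
    baseline = {i: rng.gauss(50, 10) for i in range(n)}
    treated, control = randomly_assign(baseline, rng)
    treated_out = {i: baseline[i] + TRUE_EFFECT for i in treated}
    control_out = [baseline[i] for i in control]

    full_estimate = mean(treated_out.values()) - mean(control_out)

    # Systematic attrition: the lowest-scoring treated participants leave,
    # so only the remaining "stayers" are measured at the end of the program.
    n_leavers = int(len(treated) * attrition_share)
    stayers = sorted(treated, key=lambda i: baseline[i])[n_leavers:]
    attrited_estimate = mean(treated_out[i] for i in stayers) - mean(control_out)
    return full_estimate, attrited_estimate


if __name__ == "__main__":
    full, attrited = simulate()
    print(f"estimate with full sample:      {full:.2f}")
    print(f"estimate after attrition:       {attrited:.2f}")
```

With everyone retained, the difference in group means should fall close to the assumed effect. Once the hardest-to-influence participants drop out of the treatment group, the same calculation overstates the effect, which is the positive bias described in the sixth point.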
When experiments are not appropriate or feasible, quasi-experimental techniques are used. Set forth by Donald Campbell and Julian Stanley in 1963, quasi-experimentation involves a number of different methods of conducting research that do not require random sampling and random assignment to treatment and control groups. One common example is an evaluation that matches the program participants to nonparticipants who share similar characteristics (e.g., race) and measures outcomes of both groups before and after the program. The challenge to quasi-experimentation is to rule out what Campbell and Stanley termed internal validity threats, or factors that might be alternative explanations for program results besides the program itself, which in turn would reduce confidence in the conclusions of the study. Unlike experimental design, which protects against just about all possible internal validity threats, quasi-experimental designs generally leave one or several of them uncontrolled.
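One common way to summarize the matched before-and-after comparison described above is to subtract the comparison group's gain from the participants' gain, a calculation often called a difference-in-differences. The sketch below assumes this summary (it is not named in the sources cited here), and the scores and function names are hypothetical.

```python
from statistics import mean


def gain(pre_scores, post_scores):
    """Average change from before the program to after it."""
    return mean(post_scores) - mean(pre_scores)


def difference_in_differences(part_pre, part_post, comp_pre, comp_post):
    """Participants' gain minus the matched comparison group's gain."""
    return gain(part_pre, part_post) - gain(comp_pre, comp_post)


# Hypothetical test scores for matched participants and nonparticipants.
participants_pre = [40, 45, 50]
participants_post = [50, 55, 60]
comparison_pre = [42, 44, 49]
comparison_post = [45, 47, 52]

estimate = difference_in_differences(
    participants_pre, participants_post, comparison_pre, comparison_post
)
print(estimate)  # 7: participants gained 10 points, the comparison group 3
```

The estimate is only as credible as the match. Any unmeasured difference between the groups that also affects their gains remains an uncontrolled internal validity threat.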
In addition to focusing on the relative strengths and weaknesses of experiments and quasi-experiments, criticisms of early large-scale education evaluations highlighted the importance of measuring implementation. For example, Berman and McLaughlin's RAND Change Agent study and the Follow Through evaluation demonstrated that implementation of a specific program can differ a great deal from one site to the next. If an evaluation is designed to attribute effects to a program, varying implementation of the same program reduces the value of the evaluation, because it is unclear how to define the program. Thus, it is necessary to include in a program evaluation a complete description of how the program is being implemented, to allow the examination of implementation fidelity to the original design, and to discover any cross-site implementation differences that would affect outcomes.
In 1967 Michael Scriven first articulated the idea that there were two types of evaluation–one focused on evaluating implementation, called formative evaluation, and one focused on evaluating the impact of the program, called summative evaluation. He argued that emerging programs should be the subject of formative evaluations, which are designed to see how well a program was implemented and to improve implementation; and that summative evaluations should be reserved for programs that have been well-established and have stable and consistent implementation.
Related to the idea of formative and summative evaluation is a controversy over the extent to which the evaluator should be a program insider or an objective third party. In formative evaluations, it can be argued that the evaluator needs to become somewhat of an insider, in order to become part of the formal and informal feedback loop that makes providing program improvement information possible. In contrast, summative evaluations conducted by a program insider foster little confidence in the results, because of the inherent conflict of interest.
Stakeholder and Utilization Approaches
Still another criticism of early education evaluations was that stakeholders felt uninvolved in the evaluations; did not agree with the goals, measures, and procedures; and thus rejected the findings. This discovery of the importance to the evaluation of stakeholder buy-in led to what Michael Patton termed stakeholder or utilization-focused evaluation. Stakeholder evaluation bases its design and execution on the needs and goals of identified stakeholders or users, such as the funding organization, a program director, the staff, or clients of the program.
In the context of stakeholder evaluation, Patton in 1997 introduced the idea that it is sometimes appropriate to conduct goal-free evaluation. He suggested that evaluators should be open to the idea of conducting an evaluation without preconceived goals because program staff might not agree with the goals and because the goals of the program might change over time. Further, he argued that goal-free evaluation avoids missing unanticipated outcomes, removes the negative connotation to side effects, eliminates perceptual biases that occur when goals are known, and helps to maintain evaluator objectivity. Goals are often necessary, however, to guide and focus the evaluation and to respond to the needs of policymakers. As a result, Patton argued that the use of goals in program evaluation should be decided on a case-by-case basis.
Besides stakeholder and goal-free evaluation, Carol Weiss in 1997 advocated for theory-based evaluations, or evaluations that are grounded in the program's theory of action. Theory-based evaluation aims to make clear the theoretical underpinnings of the program and use them to help structure the evaluation. In her support of theory-based evaluation, Weiss wrote that if the program theory is outlined in a phased sequence of cause and effect, then the evaluation can identify weaknesses in the system or at what point in the chain of effects results can be attributed. Also, articulating a programmatic theory can have positive benefits for the program, including helping the staff address conflicts, examine their own assumptions, and improve practice.
Weiss explained that theory-based approaches have not been widespread because there may be more than one theory that applies to a program and no guidance about which to choose, and because the process of constructing theories is challenging and time consuming. Further, theory-based approaches require large amounts of data and resources. A theory-based evaluation approach does, however, strengthen the rigor of the evaluation and link it more with scientific research, which by design is a theory-testing endeavor.
Data Collection Methods
Within different types of evaluation (e.g., formative, stakeholder, theory-based), there have been debates about which type of methodology is appropriate, with these debates mirroring the debates in the larger social science community. The "scientific ideal" of using social experiments and randomized experiments, which supports the quantification of implementation and outcomes, is contrasted with the "humanistic ideal" that the program should be seen through the eyes of the clients and defies quantification, which supports an ethnographic or observational methodology.
Campbell believed that the nature of the research question should determine the method, and he encouraged evaluations that have both qualitative and quantitative assessments, with these assessments supporting each other. In the early twenty-first century, program evaluations commonly use a combination of qualitative and quantitative data collection techniques.
Does Evaluation Influence Policy?
Although the main justification for program evaluation is its role in rationalizing policy, program evaluation results rarely have a direct impact on decision-making. This is because of the diffuse and political nature of policy decision-making and because people are generally resistant to change. Most evaluations are undertaken and disseminated in an environment where decision-making is decentralized among several groups and where program and policy choices result from conflict and accommodation across a complex and shifting set of players. In this environment, evaluation results cannot have a single and clear use, nor can the evaluator be sure how the results will be interpreted or used.
While program evaluations may not directly affect decisions, evaluation does play a critical role in contributing to the discourse around a particular program or issue. Information generated from program evaluation helps to frame the policy debate by bringing conflict to the forefront, providing information about trade-offs, influencing the broad assumptions and beliefs underlying policies, and changing the way people think about a specific issue or problem.
Evaluation in the Early Twenty-First Century
In the early twenty-first century, program evaluation is an integral component of education research and practice. The No Child Left Behind Act of 2001 (reauthorization of the U.S. government's Elementary and Secondary Education Act) calls for schools to use "research-based practices." This means practices that are grounded in research and have been proven through evaluation to be successful. Owing in part to this government emphasis on the results of program evaluation, there is an increased call for the use of experimental designs.
Further, as the evaluation field has developed in sophistication and increased its requirements for rigor and high standards of research, the lines between scientific research and evaluation have faded. There is a move to design large-scale education evaluations to respond to programmatic concerns while simultaneously informing methodological and substantive inquiry.
While program evaluation is not expected to drive policy, if conducted in a rigorous and systematic way that adheres to the principles of social research as closely as possible, the results of program evaluations can contribute to program improvement and can provide valuable information that both advances scholarly inquiry and informs important policy debates.
BERMAN, PAUL, and MCLAUGHLIN, MILBREY. 1978. Federal Programs Supporting Educational Change, Vol. IV: The Findings in Review. Santa Monica, CA: RAND.
CAMPBELL, DONALD. 1969. "Reforms as Experiments." American Psychologist 24:409–429.
CAMPBELL, DONALD, and STANLEY, JULIAN. 1963. Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally.
CHELIMSKY, ELEANOR. 1987. "What Have We Learned about the Politics of Program Evaluation?" Evaluation News 8 (1):5–22.
COHEN, DAVID, and GARET, MICHAEL. 1975. "Reforming Educational Policy with Applied Social Research." Harvard Educational Review 45 (1):17–43.
COOK, THOMAS D., and CAMPBELL, DONALD T. 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally.
CRONBACH, LEE J. 1982. Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass.
CRONBACH, LEE J.; AMBRON, SUEANN ROBINSON; DORNBUSCH, SANFORD; HESS, ROBERT; PHILLIPS, D. C.; WALKER, DECKER; and WEINER, STEPHEN. 1980. Toward Reform of Program Evaluation: Aims, Methods, and Institutional Arrangements. San Francisco: Jossey-Bass.
HOUSE, ERNEST; GLASS, GENE; MCLEAN, LESLIE; and WALKER, DECKER. 1978. "No Simple Answer: Critique of the Follow Through Evaluation." Harvard Educational Review 48:128–160.
PATTON, MICHAEL. 1997. Utilization-Focused Evaluation, 3rd edition. Thousand Oaks, CA: Sage.
RIECKEN, HENRY, and BORUCH, ROBERT. 1978. Social Experimentation: A Method for Planning and Evaluating Social Intervention. New York: Academic Press.
ROSSI, PETER; FREEMAN, HOWARD; and LIPSEY, MARK. 1999. Evaluation: A Systematic Approach, 6th edition. Thousand Oaks, CA: Sage.
SCRIVEN, MICHAEL. 1967. "The Methodology of Evaluation." In Perspectives of Curriculum Evaluation, ed. Robert E. Stake. Chicago: Rand McNally.
SHADISH, WILLIAM R.; COOK, THOMAS; and LEVITON, LAURA. 1991. Foundations of Program Evaluation: Theories of Practice. Newbury Park, CA: Sage.
U.S. OFFICE OF EDUCATION. 1977. National Evaluation: Detailed Effects. Volumes II-A and II-B of the Follow Through Planned Variation Experiment Series. Washington, DC: Government Printing Office.
WEISS, CAROL. 1972. Evaluation Research: Methods for Assessing Program Effectiveness. Englewood Cliffs, NJ: Prentice Hall.
WEISS, CAROL. 1987. "Evaluating Social Programs: What Have We Learned?" Society 25:40–45.
WEISS, CAROL. 1988. "Evaluation for Decisions: Is Anybody There? Does Anybody Care?" Evaluation Practice 9:5–20.
WEISS, CAROL. 1997. "How Can Theory-Based Evaluation Make Greater Headway?" Evaluation Review 21:501–524.
Since the early 1900s, researchers have relied on verbal data to gain insights about thinking and learning. Over the years, however, the perceived value of verbal data for gaining such insights has waxed and waned. In 1912 Edward Titchener, one of the founders of structural psychology, advocated the use of introspection by highly trained self-observers as the only method for revealing certain cognitive processes. At the same time, this technique of observing and verbalizing one's own cognitive processes drew much criticism. Researchers questioned the objectivity of the technique and the extent to which people have knowledge of and access to their cognitive processes. With behaviorism as the dominant perspective for studying learning in the United States, verbal data were treated as behavioral products, not as information that might reveal something about cognitive processing. From about the 1920s to 1950s, most U.S. researchers abandoned the use of introspective techniques, as well as most other types of verbal data such as question answering.
While U.S. learning theorists and researchers were relying almost solely on nonverbal or very limited verbal (e.g., yes/no response) techniques, the Swiss cognitive theorist Jean Piaget was relying primarily on children's verbal explanations for gaining insights into their cognitive abilities and processes. Piaget believed that children's explanations for their responses to various cognitive tasks provided much more information about their thinking than did the task responses alone. United States theorists, however, were not ready to consider Piaget's work seriously until about 1960, when cognitive psychology was beginning to emerge and there was declining satisfaction with a purely behavioral perspective.
With the rise of cognitive psychology beginning in the 1950s and 1960s, educational and experimental psychologists became interested once again in the usefulness of verbal data for providing information about thinking and learning. Cognitive researchers rarely use Titchener's original introspective technique in the early twenty-first century. Since the 1980s, however, researchers have increasingly used verbal protocol analysis, which has roots in the introspective technique, to study the cognitive processes involved in expert task performance, problem solving, text comprehension, science education, second language acquisition, and hypertext navigation.
What Are Verbal Protocols?
Verbal protocols are rich data sources containing individuals' spoken thoughts that are associated with working on a task. While working on a particular task, subjects usually either think aloud as thoughts occur to them or they do so at intervals specified by the researcher. In some studies, researchers ask subjects to verbalize their thoughts upon completion of the task. The verbalizations are recorded verbatim, usually using a tape recorder, and are then coded according to theory-driven and/or empirically driven categories.
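The coding step can be pictured as assigning each recorded verbalization to a category and then tallying the categories. In practice this judgment is typically made by trained human coders working from a detailed scheme. The sketch below substitutes a deliberately crude keyword match, and the category names, keywords, and sample utterances are all hypothetical.

```python
from collections import Counter

# Hypothetical coding scheme: category name -> cue phrases (illustrative only).
CODING_SCHEME = {
    "planning": ("first", "plan", "next"),
    "evaluating": ("wrong", "check", "makes sense"),
    "inferring": ("because", "probably"),
}


def code_segment(segment):
    """Return the first category whose cue phrase appears in the segment."""
    text = segment.lower()
    for category, cues in CODING_SCHEME.items():
        if any(cue in text for cue in cues):
            return category
    return "uncoded"


def tally(segments):
    """Count how many verbalized segments fall into each category."""
    return Counter(code_segment(s) for s in segments)


# Hypothetical think-aloud segments transcribed verbatim from a recording.
protocol = [
    "First I will read the whole problem.",
    "That answer looks wrong, let me check it.",
    "She is probably upset because he left.",
    "Hmm.",
]

for category, count in tally(protocol).items():
    print(category, count)
```

Keeping an explicit "uncoded" category makes visible how much of a protocol the scheme fails to capture, which is one signal that empirically driven categories may need to be added.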
Verbal protocols differ from introspection. Subjects are not instructed to focus on the cognitive processes involved in task completion nor are they trained in the self-observation of cognitive processing. The goal is for subjects to express out loud the thoughts that occur to them naturally. Researchers use these data in conjunction with logical theoretical premises to generate hypotheses and to draw conclusions about cognitive processes and products.
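The coding step described above can be illustrated with a minimal sketch in Python. It assumes a human coder has already assigned one category to each transcribed segment; the `Segment` structure, the category names, and the sample utterances are hypothetical and serve only to show how coded segments might be tallied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    order: int  # position of the utterance in the transcript
    text: str   # verbatim transcription of the utterance
    code: str   # category assigned by a human coder (hypothetical labels)


# Hypothetical hand-coded think-aloud segments, for illustration only.
protocol = [
    Segment(1, "So the dog ran away because the gate was open.", "explanation"),
    Segment(2, "I bet the boy will go looking for it.", "prediction"),
    Segment(3, "The boy is sad.", "paraphrase"),
    Segment(4, "He's sad because he loves the dog.", "explanation"),
]


def code_frequencies(segments):
    """Count how many segments received each category code."""
    return Counter(s.code for s in segments)


def code_proportions(segments):
    """Express each category as a share of all coded segments."""
    counts = code_frequencies(segments)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


freqs = code_frequencies(protocol)
props = code_proportions(protocol)
print(freqs)
print(props)
```

Frequencies and proportions of this kind are only a starting point; the interpretation of the categories still depends on the theoretical premises the researcher brings to the data.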
What Can Verbal Protocols Reveal about Thinking and Learning?
To verbalize their thoughts, individuals must be aware of those thoughts, and the thoughts must be amenable to language. Thus, verbal protocol analysis can reveal those aspects of thinking and learning that are consciously available, or activated in working memory, and that can be encoded verbally.
One major advantage of verbal protocol data is that they provide the richest information regarding the contents of working memory during task execution. In studies of reading comprehension, for example, verbal protocols have provided a detailed database of the types of text-based and knowledge-based inferences that might occur during the normal reading of narrative texts. Data from other measures, such as sentence reading time and reaction time to single-word probes, have corroborated some of the verbal protocol findings, including the generation of causal inferences and goal-based explanations. Verbal protocols have also provided information about the particular knowledge domains that are used to make inferences when reading narratives, and about differences in readers' deliberate strategies for understanding both narrative and informational texts.
Verbal protocols have been used extensively in the study of expert versus novice task performance across a variety of domains (e.g., cognitive-perceptual expertise involved in chess, perceptual-motor expertise such as in sports, science and mathematical problem-solving strategies, skilled versus less-skilled reading). While the specific insights about the differences between expert and novice approaches vary from domain to domain, some generalities across domains can be made. Clearly, experts have more knowledge and more highly organized knowledge structures within their domains than do novices. But the processes by which they solve problems and accomplish tasks within their domains of expertise also differ. Verbal protocols have revealed that experts are more likely to evaluate and anticipate the ever-changing situations involved with many problems and to plan ahead and reason accordingly. Knowledge about expert and novice problem-solving processes has implications for developing and assessing pedagogical practices.
Another advantage of verbal protocol analysis is that it provides sequential observations over time. As such, it reveals changes that occur in working memory over the course of task execution. This has been useful in studies of reading comprehension where the information presented and the individual's representation of the text change over time, in studies of problem solving where multiple steps are involved in reaching a solution and/or where multiple solutions are possible, in studies of expert versus novice task performance, and in studies of conceptual change.
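Because the coded segments keep their order, the sequence itself can be examined. The following Python sketch, which assumes a hypothetical list of category codes in the order the thoughts were verbalized, counts how often one category is followed by another and records where each category first appears. The code labels and function names are illustrative assumptions, not part of any established coding scheme.

```python
from collections import Counter

# Hypothetical codes, listed in the order the thoughts were verbalized.
coded_sequence = [
    "read", "plan", "compute", "evaluate", "plan", "compute", "evaluate",
]


def transition_counts(codes):
    """Count each pair of consecutive codes (current code, next code)."""
    return Counter(zip(codes, codes[1:]))


def first_occurrence(codes):
    """Record the index at which each code first appears in the sequence."""
    positions = {}
    for index, code in enumerate(codes):
        positions.setdefault(code, index)
    return positions


transitions = transition_counts(coded_sequence)
firsts = first_occurrence(coded_sequence)
print(transitions)
print(firsts)
```

A summary like this might, for instance, show whether a subject returns to planning after evaluating an intermediate result, which is the kind of change over time that a single end-of-task measure would not capture.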
Limitations of Verbal Protocol Data
As is the case with most research methods, verbal protocols have both advantages and limitations. Obviously, subjects can verbalize only thoughts and processes of which they are consciously aware. Thus, processes that are automatic and executed outside of conscious awareness are not likely to be included in verbal protocols, and other means of assessing such processes must be used. Also, nonverbal knowledge is not likely to be reported.
Most authors of articles examining the think-aloud procedure seem to disagree with the 1993 contention of K. Anders Ericsson and Herbert A. Simon that thinking aloud does not usually affect normal cognitive processing. It is thought that the think-aloud procedure may lead to overestimates and/or underestimates of the knowledge and processes used under normal task conditions. The need to verbalize for the think-aloud task itself might encourage subjects to strategically use knowledge or processes that they might not otherwise use. Alternatively, the demands of the think-aloud task might interfere with subjects' abilities to use knowledge and/or processes they might use under normal conditions. Self-presentation issues (e.g., desire to appear smart, embarrassment, introversion/extroversion) might affect subjects' verbal reports. Finally, the pragmatics and social rules associated with the perception of having to communicate one's thoughts to the researcher might also lead to overestimates or underestimates of knowledge and processes typically used.
Unfortunately, it is not possible to know if a verbal protocol provides a complete picture of the knowledge and processes normally used to perform a task. Typically, however, no single research technique provides a complete picture. Only the use of multiple measures for assessing the same hypotheses and for assessing various aspects of task performance can provide the most complete picture possible.
A final limitation of verbal protocol methodology is that it is very labor intensive. The data collection and data coding are extremely time consuming as compared with other methodologies. The amount of potential information that can be acquired about the contents of working memory during task performance, however, is often well worth the time required.
Optimizing the Advantages and Minimizing the Limitations
Several suggestions have been put forth for increasing the likelihood of obtaining verbal protocol data that provide valid information about the contents of working memory under normal task conditions. The most frequent suggestions are as follows:
- Collect verbal protocol data while subjects are performing the task of interest.
- Ask subjects to verbalize all thoughts that occur. One should not direct their thoughts or processing by asking for specific types of information unless one wishes to study the planned, strategic use of that type of information.
- Make it clear to the subjects that task performance is their primary concern and that thinking aloud is secondary. If, however, a subject is silent for a relatively long period as compared to others during task execution, prompts such as "keep talking" may become necessary.
- Remain out of the subject's view, as far as possible, to minimize the conversational aspects of the think-aloud task.
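The prompting rule in the third suggestion can be sketched in Python. The source describes silence only as "relatively long" compared with other subjects, so the fixed threshold used here (`max_silence`, in seconds) is an assumption made for illustration, as are the function name and the sample timestamps.

```python
def prompts_needed(utterance_times, task_end, max_silence=15.0):
    """Return the times (in seconds) at which a "keep talking" prompt is due.

    utterance_times: times at which the subject spoke (hypothetical data).
    task_end: time at which the task finished.
    max_silence: assumed silence threshold; not a value given in the source.
    """
    prompts = []
    last_spoken = 0.0
    for t in sorted(utterance_times) + [task_end]:
        # A long gap may call for more than one prompt.
        while t - last_spoken > max_silence:
            last_spoken += max_silence
            prompts.append(last_spoken)
        last_spoken = t
    return prompts


# The subject speaks at 5, 12, 40, and 48 seconds; the task ends at 60 seconds.
print(prompts_needed([5, 12, 40, 48], task_end=60))
```

In practice a researcher would more likely set the threshold from pilot data or from the pace of other subjects, in keeping with the relative wording of the suggestion.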
BERK, LAURA E. 2000. Child Development, 5th edition. Boston: Allyn and Bacon.
COTE, NATHALIE, and GOLDMAN, SUSAN R. 1999. "Building Representations of Informational Text: Evidence from Children's Think-Aloud Protocols." In The Construction of Mental Representations during Reading, ed. Herre van Oostendorp and Susan R. Goldman. Mahwah, NJ: Erlbaum.
CRUTCHER, ROBERT J. 1994. "Telling What We Know: The Use of Verbal Report Methodologies in Psychological Research." Psychological Science 5:241–244.
DHILLON, AMARJIT S. 1998. "Individual Differences within Problem-Solving Strategies Used in Physics." Science Education 82:379–405.
ERICSSON, K. ANDERS, and SIMON, HERBERT A. 1993. Protocol Analysis: Verbal Reports as Data, revised edition. Cambridge, MA: MIT Press.
HURST, ROY W., and MILKENT, MARLENE M. 1996. "Facilitating Successful Prediction Problem Solving in Biology through Application of Skill Theory." Journal of Research in Science Teaching 33:541–552.
LONG, DEBRA L., and BOURG, TAMMY. 1996. "Thinking Aloud: Telling a Story about a Story." Discourse Processes 21:329–339.
MAGLIANO, JOSEPH P. 1999. "Revealing Inference Processes during Text Comprehension." In Narrative Comprehension, Causality, and Coherence: Essays in Honor of Tom Trabasso, ed. Susan R. Goldman, Arthur C. Graesser, and Paul van den Broek. Mahwah, NJ: Erlbaum.
MAGLIANO, JOSEPH P.; TRABASSO, TOM; and GRAESSER, ARTHUR C. 1999. "Strategic Processing during Comprehension." Journal of Educational Psychology 91:615–629.
PAYNE, JOHN W. 1994. "Thinking Aloud: Insights into Information Processing." Psychological Science 5 (5):241–248.
PIAGET, JEAN. 1929. The Child's Conception of the World (1926), trans. Joan Tomlinson and Andrew Tomlinson. London: Kegan Paul.
PRESSLEY, MICHAEL, and AFFLERBACH, PETER. 1995. Verbal Protocols of Reading: The Nature of Constructively Responsive Reading. Hillsdale, NJ: Erlbaum.
PRITCHARD, ROBERT. 1990. "The Evolution of Introspective Methodology and Its Implications for Studying the Reading Process." Reading Psychology: An International Quarterly 11 (1):1–13.
TRABASSO, TOM, and MAGLIANO, JOSEPH P. 1996. "Conscious Understanding during Comprehension." Discourse Processes 21:255–287.
WHITNEY, PAUL, and BUDD, DESIREE. 1996. "Think-Aloud Protocols and the Study of Comprehension." Discourse Processes 21:341–351.
WILSON, TIMOTHY D. 1994. "The Proper Protocol: Validity and Completeness of Verbal Reports." Psychological Science 5 (5):249–252.
ZWAAN, ROLF A., and BROWN, CAROL M. 1996. "The Influence of Language Proficiency and Comprehension Skill on Situation-Model Construction." Discourse Processes 21:289–327.