
Data Collection


I.                   Introduction

Data collection is a necessary component of any intervention trial: without data, no analysis can be completed, no manuscript published, and no policy supported.  Implementation of high-quality data collection methods can lead to a better experience for participants, easier management for study staff, and more valid conclusions being drawn by investigators.  This paper aims to highlight several areas in the study development and execution process that have major bearings on overall validity.  These areas include hypothesis generation, instrument development, interviewer training, selection of collection method, and quality control checks.  The topics covered are supplemented by information drawn from book chapters and published articles, and by the professional experiences of this paper’s author (SR), who has worked as a data manager on an intervention trial with HIV-positive men in Washington, DC, and an observational trial with low-income young married women in Bangalore, India.

                                                                

II.                Hypothesis generation and analysis plan

The type of data to be collected is most impacted by the study hypothesis: the research question determines the relationship between an outcome, predictors and confounders.  Each of these concepts must be operationalized into measurable variables.  For example, a hypothesis that increased HIV antiretroviral adherence will extend survival among persons living with HIV must quantify “adherence”.  This could be done using a dichotomous scale (adherent vs. non-adherent), ordinal categorical scale (very adherent, somewhat adherent, minimally adherent), or continuous scale (adherent for X% of days).
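The three ways of quantifying adherence described above can be sketched in a few lines of Python. This is only an illustration: the function name, the 95% and 80% cut-points, and the input format (days adherent out of days observed) are assumptions made for the example, not values taken from any study mentioned here.

```python
# Hypothetical cut-points, chosen only for illustration; a real study would
# fix these in its analysis plan before data collection begins.
ADHERENT_CUTOFF = 0.95
ORDINAL_CUTOFFS = [(0.95, "very adherent"), (0.80, "somewhat adherent")]


def adherence_measures(days_adherent: int, days_observed: int) -> dict:
    """Express one participant's adherence on three different scales."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    proportion = days_adherent / days_observed  # continuous scale

    # Ordinal scale: take the first (highest) cut-point that is met.
    ordinal = "minimally adherent"
    for cutoff, label in ORDINAL_CUTOFFS:
        if proportion >= cutoff:
            ordinal = label
            break

    return {
        "continuous": proportion,
        "ordinal": ordinal,
        "dichotomous": proportion >= ADHERENT_CUTOFF,  # adherent vs. not
    }


measures = adherence_measures(days_adherent=27, days_observed=30)
```

Because the continuous value is retained, the coarser scales can always be rebuilt from it later, whereas the reverse is not possible.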

Each of these scales has different implications for the analysis plan: the amount of information conveyed, the power required to detect differences between groups, and the precision and ease of measurement.  Discussing the study with a biostatistician at the earliest opportunity is particularly valuable for identifying statistical models that can test the hypothesis, and for deciding what form the variables must take to work within those models so that meaningful conclusions can be drawn.

 

III.             Instrument creation

a.       Measures

The number of response categories in an outcome variable determines the kind of regression analysis possible.  Outside of survival analysis, a dichotomous outcome allows for logistic regression, a categorical outcome for multinomial regression, and a continuous outcome for linear regression.  More flexibility is possible for independent variables, and investigators may prefer the greater precision afforded by more response categories.

The applicability of these scales must be pilot tested with persons similar to potential participants.  In some cases, pilot participants may show how too few response categories paper over meaningful differences between the options.  Conversely, with too many categories, participants may not give much value to the differences between categories.  Likert scales should be pre-tested to evaluate appropriateness in study populations that may not have had much exposure to such measurement tools.  In the Prevention with Positives study in Washington, DC and likewise the Samata Study in Bangalore, India, the participants had a tendency to select the most central or extreme values on Likert scales, which was evidence that participants conceptualized certain issues more absolutely than predicted by researchers, who assumed greater nuance.
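One way to look for the pattern described above in pilot data is to compute how much of the response distribution sits on the end-points and the midpoint of a Likert item. The sketch below assumes integer-coded responses on a 1 to 5 scale; the function name and the coding are hypothetical, and what share counts as "too concentrated" is left to the investigators.

```python
from collections import Counter


def likert_concentration(responses, scale_min=1, scale_max=5):
    """Share of valid responses at the extremes and at the midpoint."""
    valid = [r for r in responses if scale_min <= r <= scale_max]
    if not valid:
        raise ValueError("no valid responses")
    counts = Counter(valid)
    n = len(valid)

    extreme_share = (counts[scale_min] + counts[scale_max]) / n

    # Only scales with an odd number of points have a true midpoint.
    midpoint = (scale_min + scale_max) / 2
    central_share = counts[int(midpoint)] / n if midpoint.is_integer() else 0.0

    return {"extreme": extreme_share, "central": central_share}


pilot = [1, 1, 5, 3, 3, 2, 4, 5]  # made-up pilot responses
shares = likert_concentration(pilot)
```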

In the end, the researchers may opt to measure with the greatest precision feasible, and then, as determined before analysis commences, recode into categories.

Open-ended questions add flexibility to an instrument by allowing for responses not anticipated by investigators.  For some questions, an “Other” response may provide valuable insight that would not otherwise have been captured.  Investigators should be aware of the additional demands such questions put on the interviewer to record the response, on data entry staff to key it in, and on data managers to recode these string responses into existing or new categories.

b.      Qualities of good questions

Several texts on instrument development exist[1-4], many of which do an excellent job of discussing the qualities of effective questions.  Colton and Covert (2007) devote two chapters to the topic, building on the formative experiences of others.  Given the diverse populations reached by public health researchers, and the private nature of individual health data collection, these rules have particular importance.  When constructing new or modified questions for data collection, one should pay careful attention to characteristics such as sentence length, word choice and terminology, use of background information and definitions, the range of response options, and how socially sensitive questions are phrased.[5]  Hulley et al. also suggest making time frames clear, ensuring that response sets match the question, and ensuring that questions do not make assumptions about the participant.[1]  One should also take care to avoid double negatives, to have pre-defined verbal or physical probes to clarify responses, and to use scale responses whose levels are meaningful to participants.  A poorly constructed question can garner an invalid response and/or negatively impact participants’ experience with the research.

The worst-case scenario in data collection is missing data, and staff can take several steps to increase participant retention and response rates.  These include the use of incentives, early description of the importance of the research, explanation of all steps of the study visits, simplification of case report forms (CRF), identification of the participant as part of a cohort, reinforcement of confidentiality, and ensuring participants know they will be contacted when research results are available.[6]  While simple or sophisticated statistical adjustments can be made to impute missing data, significant non-response and loss-to-follow-up can bias conclusions[7], or even prevent any analysis from taking place.[8]
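Before deciding whether any adjustment is appropriate, it helps to know how much is missing and where. A minimal sketch of an item non-response tally follows; it assumes each interview is stored as a dictionary keyed by question name and that unanswered items are absent, `None`, or an empty string. Those conventions and the field names are assumptions for illustration only.

```python
def item_nonresponse_rates(records, items):
    """Proportion of records with no usable value, for each item."""
    if not records:
        raise ValueError("no records supplied")
    rates = {}
    for item in items:
        n_missing = sum(
            1 for r in records if r.get(item) is None or r.get(item) == ""
        )
        rates[item] = n_missing / len(records)
    return rates


# Made-up records for illustration only.
records = [
    {"age": 30, "income": None},
    {"age": 25, "income": 100},
    {"age": ""},  # income never recorded, age left blank
]
rates = item_nonresponse_rates(records, ["age", "income"])
```

Note that a legitimate value of zero is not treated as missing, which matters for count items.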

c.       Validated scales

It is often the case that a concept a researcher seeks to measure has already been the subject of prior research.  In many cases, validated scales exist and should be strongly considered.  These scales allow for greater comparability between studies and increase acceptance of results by funders and other members of the research community.  Scales must be separately validated for every new study population, language, and implementation method; for popular measures, this step may already have been completed.  If not, validation of a scale is an excellent way to contribute to the research community.  In HIV research, for example, a selected list of instruments is available at the UCSF Center for AIDS Prevention Studies.[9]

For each of these characteristics, pilot testing the instrument is the best means of determining whether information captured during data collection is meaningful in the intended fashion.  Asking pilot participants to explain their answers, or to give their interpretation of the question, can inform revisions of questions and subsequently increase confidence in the results.  This kind of qualitative research can enrich all stages of the research process.  For instrument development or presentation of preliminary results, personal semi-structured interviews or focus group discussions can help answer the investigators’ question: “What’s going on here?”[10]

 

IV.             Interviewer selection and training

Training is essential when using interviewers for face-to-face data collection.  Untrained, or poorly trained, interviewers can significantly impact the quality of data collected; it would not be accurate to assume that anyone can become an effective interviewer, no matter what their past training may have been.  Fortunately, professional interviewers exist, as do comprehensive health interviewer training guides.  The 150-page training manual for the Demographic and Health Surveys (DHS)[11] is an excellent starting point for anyone who seeks to train their own interviewers.

Beyond the actual training, an interviewer’s inherent characteristics can also make a difference.  Groves et al. provide ample evidence that age, gender, socioeconomic class, and religion can impact participant responses.[3]  When an interviewer carries some characteristic related to the measure in question, there is a chance that systematic bias will affect the response.[12]  When collecting data on socially sensitive topics, considering how the survey sample will interact with the interviewers is essential, and is thus a standard part of methods reporting in publications.

One special case for interviewer selection is that of using medical professionals.  Logically, doctors and nurses would be ideal interviewers: they engender a great deal of trust among people and necessarily have rigorous data collection skills.  Yet, as Abramson notes, these professionals “often make poor interviewers in a research setting.”[13]  A quality physician will approach information-gathering as an adaptive process in order to immediately serve the needs of the patient.  And in a clinical setting, medical professionals may not have the time or inclination to record additional data required for research purposes.  A quality interviewer, in contrast, will collect complete information in a standardized manner with some indifference to the content of responses; the interviewer’s interest lies in having valid data analyzed to serve the needs of some future population.

 

V.                Data collection methods

Researchers are quite aware of the limitations of self-reported data: they are prone to several forms of bias and lack precision, to start.  Substance use researchers Richter and Johnson found that “self-report assessments are particularly vulnerable to distortion with less socially acceptable behaviors.”  They also find that, compared to a gold standard of a biologic drug test, self-reports of drug use are highly unreliable.[14]  Knowledge of this fact across public health subject areas has led assessors of behaviors towards other methods of data collection that can better approximate a gold standard measure.

Over the past two decades, electronic data capture has gained popularity, as it has considerable advantages over paper-and-pencil methods.  For example, audio computer-assisted self-interviewing (ACASI), where a computer is programmed to serve as the interviewer and data recorder, eliminates the need for interviewer training, double data entry, and data cleaning, and can reduce participant concerns about confidentiality and the tendency toward socially desirable responses.  A computer program can also accurately handle complex skip patterns, check for logical consistency of responses, and provide a standard interview experience for all participants.  Conversely, it requires more technical knowledge, reduces the possibility for feedback, may not allow for open-ended responses, can be costlier, and can demand more time of the participant.
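Skip patterns and consistency checks of the kind mentioned above can be expressed as simple rules over the answers given so far. The sketch below is not drawn from any particular ACASI package; the question identifiers and the `skip_unless` rule format are hypothetical and exist only to show the idea.

```python
# Hypothetical questionnaire: a question is asked only if the named earlier
# question received the stated answer (None means "always ask").
QUESTIONS = [
    {"id": "ever_had_sex", "skip_unless": None},
    {"id": "partners_past_year", "skip_unless": ("ever_had_sex", "yes")},
    {"id": "condom_last_sex", "skip_unless": ("ever_had_sex", "yes")},
]


def questions_to_ask(answers):
    """Return the ids of questions that apply, given the answers so far."""
    applicable = []
    for question in QUESTIONS:
        condition = question["skip_unless"]
        if condition is None or answers.get(condition[0]) == condition[1]:
            applicable.append(question["id"])
    return applicable


def consistency_errors(answers):
    """List answers recorded for questions that should have been skipped."""
    applicable = set(questions_to_ask(answers))
    return [
        f"{qid} answered but should have been skipped"
        for qid in answers
        if qid not in applicable
    ]


errors = consistency_errors({"ever_had_sex": "no", "partners_past_year": 2})
```

The same rules can be applied at interview time, to decide which screen to show next, and after the fact, to audit data keyed in from paper forms.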

A significant body of literature on ACASI is available to any researcher seeking advice from past experience.  A recent paper by Jaya et al. (2008) typifies these papers.  The same interview was conducted with 1500 Indian male and female adolescents using two of three methodologies; the females consistently reported fewer socially sensitive experiences on ACASI than in face-to-face interviews, while the males showed differential reporting, but not consistently.[15]

Learning how past researchers’ experiences varied by participants’ nationality, gender, age, and technological experience can guide selection of a data collection instrument, which can then be pilot tested for detailed feedback.  As with validated scales, testing ACASI in a new population against an existing method, or against a gold standard (e.g., concurrently measured clinical data), can be a valuable contribution to the literature.

One such example exists in a study by Villarroel et al., where a US-based sample of 2200 adults reported on hearing of, and/or being diagnosed with, certain sexually transmitted infections (STI).  Participants using the telephone-ACASI system had higher reporting rates on both measures, compared to telephone-based personal interviewing.  While the authors attribute the difference to the collection method, this result also applied for one fictitious STI.[16]  These authors, unlike most others, added this element of validity-checking to their study, though they do not adjust for it in their conclusion.
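A fictitious item of this kind can be summarized by comparing how often it is endorsed under each collection method. The following sketch uses made-up records and hypothetical field names (`mode`, `heard_of_fictitious_sti`); it does not reproduce the data or the analysis of the cited study.

```python
from collections import defaultdict


def fictitious_endorsement_by_mode(records, item="heard_of_fictitious_sti"):
    """Share of participants in each mode who endorsed the fictitious item."""
    totals = defaultdict(int)
    endorsed = defaultdict(int)
    for record in records:
        totals[record["mode"]] += 1
        if record.get(item) == "yes":
            endorsed[record["mode"]] += 1
    return {mode: endorsed[mode] / totals[mode] for mode in totals}


# Made-up records for illustration only.
records = [
    {"mode": "T-ACASI", "heard_of_fictitious_sti": "yes"},
    {"mode": "T-ACASI", "heard_of_fictitious_sti": "no"},
    {"mode": "interviewer", "heard_of_fictitious_sti": "no"},
    {"mode": "interviewer", "heard_of_fictitious_sti": "no"},
]
rates = fictitious_endorsement_by_mode(records)
```

A non-trivial endorsement rate for an item that cannot be true suggests that some of the difference between modes reflects over-reporting rather than greater candor.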

The ease in responding to ACASI interviews may lead some participants to be less diligent with their responses, or to repeatedly respond with “Don’t know”, “Decline to answer”, or the same button.  And individuals who are not yet comfortable with the new technology may in fact prefer human interaction for answering sensitive questions.  Pilot testing can provide essential insights into which method is most suitable.
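Both behaviors can be screened for once the responses are in hand. The sketch below flags an interview when non-substantive answers make up a large share of responses or when the same answer is given many times in a row. The thresholds and the function name are arbitrary assumptions; a flagged interview would call for review, not automatic exclusion.

```python
NON_SUBSTANTIVE = {"Don't know", "Decline to answer"}


def response_quality_flags(answers, max_share=0.5, max_run=10):
    """Summarize signs of inattentive responding in one interview."""
    values = list(answers)
    if not values:
        raise ValueError("no answers supplied")

    share = sum(v in NON_SUBSTANTIVE for v in values) / len(values)

    # Length of the longest run of identical consecutive answers.
    longest = run = 1
    for previous, current in zip(values, values[1:]):
        run = run + 1 if current == previous else 1
        longest = max(longest, run)

    return {
        "non_substantive_share": share,
        "longest_identical_run": longest,
        "flag": share > max_share or longest >= max_run,
    }


summary = response_quality_flags(["yes", "no", "Don't know", "yes"])
```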

A range of other methods exist.  Computer-assisted personal interviewing (CAPI) and handheld-assisted personal interviewing (HAPI) are compromises between ACASI and face-to-face interviewing, by having a human interviewer read questions and key responses directly into a laptop or handheld device.  HAPI has been demonstrated to be a versatile technology for use in difficult conditions: one study used 127 interviewers to gather data from 15,000 households in 15 weeks, in rural Burkina Faso, with “no serious technical problems that hampered data collection.”[17]

Research studies that are more data-intensive may ask participants to self-report symptoms or behaviors in a daily activity diary, so as to minimize recall bias and gain the advantages of more precise reporting.  This method has been most associated with the Food Frequency Questionnaires (FFQ) in nutritional epidemiology.  While affording the greatest opportunity for detailed data collection, the actual process can negatively impact data validity.  Participants may change their behavior, try to report socially desirable data, and may not fill out their diaries completely or in a timely fashion.  Gladys Block, an FFQ pioneer, has even stated that, due to these data issues: “I don’t believe anything I read in nutritional epidemiology anymore. I’m so skeptical at this point.”[18]

As much as logistically and financially possible, gold standard measurements should be collected.  For public health research, these generally concern clinical measurements.  The range of biological measurements is continually expanding; the use of biomarkers can allow for validation of self-reported measures, and perhaps substitution for them.  For example, receptive sexual partners reporting condom use can be checked for the presence of prostate-specific antigen, which signifies exposure to semen.[19]  Given the unreliability of self-reports found by Richter and Johnson, a greater focus on development of biomarkers is critical.
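Where a biomarker is collected alongside a self-report, agreement between the two can be summarized with a two-by-two table. The sketch below treats the biomarker as the gold standard and computes sensitivity and specificity of the self-report; the function name and the input format (pairs of booleans) are assumptions for illustration, and the data are made up.

```python
def validate_against_gold_standard(pairs):
    """pairs: iterable of (self_report_positive, biomarker_positive)."""
    tp = fp = fn = tn = 0
    for reported, measured in pairs:
        if measured and reported:
            tp += 1  # biomarker positive, behavior reported
        elif measured and not reported:
            fn += 1  # biomarker positive, behavior denied
        elif reported:
            fp += 1  # biomarker negative, behavior reported
        else:
            tn += 1  # biomarker negative, behavior denied

    # None signals that a rate is undefined for this sample.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {"sensitivity": sensitivity, "specificity": specificity}


# Made-up paired observations for illustration only.
pairs = [(True, True), (False, True), (False, False), (False, False), (True, False)]
agreement = validate_against_gold_standard(pairs)
```

For a socially sensitive behavior, low sensitivity (biomarker-positive participants who deny the behavior) is the pattern one would expect under-reporting to produce.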

 

VI.             Quality control and confidentiality

Even with thorough instrument development, pilot testing and staff training, quality control procedures are required.  The revision of a question on a CRF early in the data collection process may invalidate the responses previously collected for that question, but still increase how validly the question can be analyzed by the end of the study.  An effective project manager will continually collect feedback from interviewers, participants and data team members, along with potential solutions, and revise in consultation with the investigators and, for protocol revisions, institutional review boards.  Issues discovered too late for revisions must still be documented and considered at the time of analysis.

An issue documentation system should be devised and run in parallel to the data collection system.  In the Samata study, any staff member could initiate a query resolution form (QRF) whenever some issue might impact the data, such as a participant behaving erratically, discovery of a mistranslation in the CRFs, or an interviewer seeking to clarify some response they had to modify.  The data manager would consult with the staff and investigators to determine, and document, any solution.  This kind of paper trail becomes invaluable in the later stages of the study, when any seemingly unusual data can be checked against the QRFs to see if modifications were made, or if others had previously considered the same issue and decided what action was warranted.
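An electronic equivalent of such a form could be as simple as one record per issue plus a lookup by participant and CRF item. The sketch below is a guess at a minimal structure; the field names are hypothetical and do not describe the actual QRF used in the Samata study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class QueryResolutionForm:
    """One documented data issue (all field names are hypothetical)."""
    participant_id: str
    raised_by: str
    issue: str
    date_raised: date
    crf_item: Optional[str] = None
    resolution: Optional[str] = None

    @property
    def is_open(self) -> bool:
        return self.resolution is None

    def resolve(self, resolution: str) -> None:
        """Record the documented decision reached with the investigators."""
        self.resolution = resolution


def queries_for(qrfs, participant_id, crf_item=None):
    """Find earlier queries touching a participant (and optionally an item)."""
    return [
        q for q in qrfs
        if q.participant_id == participant_id
        and (crf_item is None or q.crf_item == crf_item)
    ]


log = [
    QueryResolutionForm("P001", "interviewer A", "possible mistranslation",
                        date(2008, 3, 1), crf_item="Q12"),
]
log[0].resolve("translation corrected; earlier responses to Q12 reviewed")
```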

The documentation can also help with the inevitable data audit, whether called for by the funding agency or the investigator.  It was a data manager who detected data fabrication in a breast cancer trial, leading to the sanction of an investigator and re-analysis of the data.[20, 21]  A paper trail should exist for data from the point of collection to the figures analyzed for publication.

Ensuring participant confidentiality is of paramount concern for all human subjects research, and the data collection process must give primary attention to data security.  Steps such as separating personally identifying information from study data, password-protecting databases, and keeping data available on a need-to-know basis (with exceptions as per local laws) are necessary first steps.  While all study staff must be trained on the importance of confidentiality, the physical structure of the data collection and storage operation must also be set up to minimize risk of data breaches.
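Separating identifying information from study data can be illustrated with a short sketch: identifiers go into a linkage table, and the analysis record carries only a random study ID. The set of identifying fields and the function name are assumptions for the example; in practice the linkage table would be stored separately, with tighter access controls than the study data.

```python
import secrets

# Hypothetical list of identifying fields; a real study would define this
# in its protocol and data management plan.
PII_FIELDS = {"name", "phone", "address"}


def split_record(record, linkage):
    """Move identifiers into `linkage`; return de-identified study data."""
    study_id = secrets.token_hex(8)  # random ID, not derived from the PII
    linkage[study_id] = {k: v for k, v in record.items() if k in PII_FIELDS}
    study_data = {k: v for k, v in record.items() if k not in PII_FIELDS}
    study_data["study_id"] = study_id
    return study_data


linkage_table = {}  # to be kept apart from the study database
deidentified = split_record(
    {"name": "A. Participant", "phone": "555-0100", "cd4_count": 350},
    linkage_table,
)
```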

In conclusion, data collection is an iterative, intentional process that requires diligence on the part of investigators and study staff.  By asking the right questions, using appropriate recording methods, implementing continuous quality controls, and respecting participant confidentiality, quality data can be collected.  Ultimately, quality data collection improves the validity of findings, enhances the participants’ experience, and contributes to the knowledge base that improves health outcomes for the public.


1.         Hulley, S.B., Designing Clinical Research. 2006, Philadelphia, PA: Lippincott Williams & Wilkins.

2.         Aday, L.A., Designing and Conducting Health Surveys. 1989, San Francisco, CA: Jossey-Bass.

3.         Groves, R.M., Survey Methodology. 2004, Hoboken, New Jersey: Wiley-Interscience.

4.         Schuman, H. and S. Presser, Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. 1981, New York, NY: Academic Press.

5.         Colton, D. and R.W. Covert, Designing and Constructing Instruments for Social Research and Evaluation. 2007, San Francisco, CA: Jossey-Bass.

6.         Phillips, P.P. and C.A. Stawarski, Data Collection: Planning for and Collecting All Types of Data. 2008, San Francisco, CA: Pfeiffer.

7.         Sapsford, R. and V. Jupp, eds. Data Collection and Analysis. 2nd ed. 2006, London: Sage Publications Ltd.

8.         Dillon, S., For want of a proofreader, or at least a good one, a reading exam is lost, in The New York Times. November 20, 2007, The New York Times Co: New York City.

9.         CAPS. CAPS Instruments.  2008  [cited October 12, 2008]; Available from: http://caps.ucsf.edu/tools/surveys/.

10.       Richards, L. and J.M. Morse, Readme First for a User's Guide to Qualitative Methods. 2006, Thousand Oaks, CA: Sage Publications Inc.

11.       ORC Macro, Demographic and Health Survey Interviewer's Manual, in MEASURE DHS Basic Documentation. 2006, ORC Macro: Calverton, Maryland, USA.

12.       Zernike, K., Do polls lie about race?, in The New York Times. October 11, 2008, The New York Times Co: New York City.

13.       Abramson, J.H. and J. Herbert, Survey methods in community medicine: epidemiological studies, programme evaluation, clinical trials. 1990, New York, NY: Churchill Livingstone.

14.       Richter, L. and P.B. Johnson, Current Methods of Assessing Substance Use: A Review of Strengths, Problems, and Developments. Journal of Drug Issues, 2001. 31(4): p. 809-832.

15.       Jaya, M.J. Hindin, and S. Ahmed, Differences in Young People’s Reports of Sexual Behaviors According to Interview Methodology: A Randomized Trial in India. American Journal of Public Health, 2008. 98(1): p. 169-174.

16.       Villarroel, M.A., et al., T-ACASI Reduces Bias in STD Measurements: The National STD and Behavior Measurement Experiment. Sexually Transmitted Diseases, 2008. 35(5): p. 499-506.

17.       Byass, P., et al., Direct data capture using hand-held computers in rural Burkina Faso: experiences, benefits and lessons learnt. Tropical Medicine & International Health, 2008. 13(s1 An Evaluation of Skilled Care at Delivery in Burkina Faso): p. 25-30.

18.       Pollan, M., In Defense of Food: An Eater's Manifesto. 2008, New York: Penguin Press HC.

19.       Macaluso, M., et al., Efficacy of the Male Latex Condom and of the Female Polyurethane Condom as Barriers to Semen during Intercourse: A Randomized Clinical Trial. Am. J. Epidemiol., 2007. 166(1): p. 88-96.

20.       Fisher, B., et al., Eight-year results of a randomized clinical trial comparing total mastectomy and lumpectomy with or without irradiation in the treatment of breast cancer. N Engl J Med, 1989. 320(13): p. 822-828.

21.       Fisher, B., et al., Fraud in Breast-Cancer Trials. N Engl J Med, 1994. 330(20): p. 1458-1462.

 
