NB: Development of a research process is a cyclical process. The double-headed arrows indicate that the process is never linear.
MODULE 13: PLAN FOR DATA PROCESSING AND ANALYSIS
OBJECTIVES
At the end of this session, you should be able to:
I. INTRODUCTION
Data processing and analysis should start in the field, with checking for completeness of the data and performing quality control checks, while sorting the data by instrument used and by group of informants. Data from small samples may even be processed and analysed as soon as they are collected.
Why is it necessary to prepare a plan for processing and analysis of data?
Such a plan helps the researcher ensure that at the end of the study:
The plan for data processing and analysis must be made after careful consideration of the objectives of the study as well as of the tools developed to meet the objectives. The procedures for the analysis of data collected through qualitative and quantitative techniques are quite different.
Preparation of a plan for data processing and analysis will provide you with better insight into the feasibility of the analysis to be performed as well as the resources that are required. It also provides an important review of the appropriateness of the data collection tools for collecting the data you need. That is why you have to plan for data analysis before the pre-test (Module 14). When you process and analyse the data you collect during the pre-test you will spot gaps and overlaps which require changes in the data collection tools before it is too late!
What should the plan include?
When making a plan for data processing and analysis the following issues should be considered:
II. SORTING DATA
An appropriate system for sorting the data is important for facilitating subsequent processing and analysis. If you have different study populations (for example village health workers, village health committees and the general population), you obviously would number the questionnaires separately. In a comparative study it is best to sort the data right after collection into the two or three groups that you will be comparing during data analysis.
It is useful to number the questionnaires belonging to each of these categories separately right after they are sorted.
In a cross-sectional survey it may also be useful to sort the data into two or more groups, depending on possible sub-groups you would like to compare.
III. PERFORMING QUALITY CONTROL CHECKS
Usually the data have already been checked in the field to ensure that all the information has been properly collected and recorded. Before and during data processing, however, the information should be checked again for completeness and internal consistency. If a questionnaire has not been filled in completely you will have MISSING DATA for some of your variables. If there are many missing data in a particular questionnaire, you may decide to exclude the whole questionnaire from further analysis.
Note: A decision to exclude data of doubtful quality is ethically correct and it testifies to the scientific integrity of the researcher. You should keep track of any questions you had to exclude because of incompleteness or inconsistency in the answers, and discuss them in your final report.
For computer data analysis, quality control checks of data must also include a verification of how the data have been transformed into codes and subsequently entered into the computer. The same applies if data are entered into master sheets (see section IV below).
IV. DATA PROCESSING – quantitative data
Decide whether to process and analyse the data from questionnaires:
Data processing in both cases involves:
1. Categorising
Decisions have to be made concerning how to categorise responses. For categorical variables that are investigated through closed questions or observation (for example, observation of the presence or absence of latrines in homesteads), the categories have been decided upon beforehand.
In interviews the answers to open-ended questions (for example, ‘Why do you visit the health centre?’) can be pre-categorised to a certain extent, depending on the knowledge of possible answers that may be given. However, there should always be a category called ‘Others, specify . . .’, which can only be categorised afterwards. These responses should be listed and placed in categories that are a logical continuation of the categories you already have. Answers that are difficult or impossible to categorise may be put in a separate residual category called ‘others’, but this category should not contain more than 5% of the answers obtained.
For numerical variables, the data are often better collected without any pre-categorisation. If you do not exactly know the range and the dispersion of the different values of these variables when you collect your sample (e.g., home-clinic distance for out-patients, or income), decisions concerning how to categorise and code the data at the time you develop your tools may be premature. If you notice during data analysis that your categories had been wrongly chosen you cannot reclassify the data anymore.
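The advantage of recording raw values can be sketched in a few lines of Python. The distances and cut-off points below are invented for illustration and do not come from any study; the sketch only shows that raw values can be regrouped as often as needed, whereas values recorded directly in categories cannot.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical raw home-clinic distances in km, recorded without pre-categorisation.
distances_km = [0.5, 1.2, 3.0, 4.8, 7.5, 12.0, 2.2, 0.9, 15.5, 6.1]


def categorise(value, cut_offs, labels):
    """Return the label of the category that `value` falls in.

    `cut_offs` are the lower edges of the second, third, ... category,
    so there must be exactly one more label than cut-offs.
    """
    return labels[bisect_right(cut_offs, value)]


# First attempt at categories (assumed cut-offs, chosen only for illustration).
cut_offs_first = [2, 5, 10]
labels_first = ["<2", "2-4.9", "5-9.9", "10+"]
counts_first = Counter(
    categorise(d, cut_offs_first, labels_first) for d in distances_km
)

# Because the raw values were kept, the same data can be regrouped later.
cut_offs_second = [5]
labels_second = ["<5", "5+"]
counts_second = Counter(
    categorise(d, cut_offs_second, labels_second) for d in distances_km
)

print(dict(counts_first))
print(dict(counts_second))
```

Had only the first set of categories been ticked on the questionnaire, the second grouping could still be derived here by merging categories, but any cut-off that falls inside an existing category (for example 3 km) would be impossible to apply.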
2. Coding
If the data will be entered in a computer for subsequent processing and analysis, it is essential to develop a CODING SYSTEM. For computer analysis, each category of a variable can be coded with a letter, group of letters or word, or be given a number. For example, the answer ‘yes’ may be coded as ‘Y’ or 1; ‘no’ as ‘N’ or 2 and ‘no response’ or ‘unknown’ as ‘U’ or 9.
The codes should be entered on the questionnaires (or checklists) themselves. When finalising your questionnaire, for each question you should insert a box for the code in the right margin of the page. These boxes should not be used by the interviewer. They are only filled in afterwards during data processing. Take care that you have as many boxes as the number of digits in each code. If analysis is done by hand using data master sheets, it is useful to code your data as well (see section 3 below).
Coding conventions
Common responses should have the same code in each question, as this minimises mistakes by coders.
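As a minimal sketch of such a coding system, the Python fragment below applies the numeric codes mentioned above (yes = 1, no = 2, no response or unknown = 9) to every yes/no question through one shared table, so that the same answer always receives the same code. The question names and the sample answers are hypothetical.

```python
# Numeric codes taken from the example in the text.
YES_NO_CODES = {"yes": 1, "no": 2}
UNKNOWN_CODE = 9  # 'no response' or 'unknown'


def code_response(answer):
    """Translate a recorded yes/no answer into its numeric code.

    Anything that is missing or not recognisable is coded as unknown (9).
    """
    if answer is None:
        return UNKNOWN_CODE
    return YES_NO_CODES.get(answer.strip().lower(), UNKNOWN_CODE)


# Hypothetical answers from one questionnaire; the keys are invented labels.
answers = {
    "smokes_now": "Yes",
    "tried_to_stop": None,      # question left blank
    "relatives_smoke": "no",
}

coded = {question: code_response(answer) for question, answer in answers.items()}
print(coded)
```

Using a single lookup table for all yes/no questions is one way of following the convention that common responses carry the same code throughout the questionnaire.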
Coding of open-ended questions (in questionnaires) can be done only after examining a sample of (say 20) questionnaires. You may group similar types of responses into single categories, so as to limit their number to at most 6 or 7. If there are too many categories it is difficult to analyse the data. (For details, see section VI of this module and Module 23.)
Finally it should be borne in mind that the personnel responsible for computer analysis should be consulted very early in the study, i.e., as soon as the questionnaire and dummy tables are finalised. In fact the research team needs to work closely with the computer analyst or statistician throughout the design and the implementation of the study.
3. Summarising the data in data master sheets, manual compilation, or compilation by computer
(1) Data master sheets
If data are processed by hand, it is often most efficient to summarise the raw research data in a so-called DATA MASTER SHEET, to facilitate data analysis. On a data master sheet all the answers of individual respondents are entered by hand.
To illustrate the use of master sheets, we will give an example of a rapid appraisal carried out by students of a nursing school about the smoking habits of the inhabitants of their town. Smoking was perceived as a big problem, and the study had been designed in order to develop an anti-smoking campaign. The (24) students divided the map of their town into 24 roughly equal parts. Each student put herself in the centre of the part of town assigned to her, and interviewed six persons of 15 years and above, three males and three females, so 144 in total. (Another group of students conducted FGDs, observations and individual interviews in schools to obtain insight into the onset of smoking behaviour.) The questionnaire had only 17 questions (see Annex 13.1), of which 9 were asked of everyone, 4 exclusively of smokers and 4 exclusively of non-smokers.
It was therefore decided to process the data by hand, divided into two groups: smokers and non-smokers, which were again subdivided into males and females. For each of the four groups, master sheets were prepared, on which all the answers of individual respondents could be recorded.
Master sheets can be made in different ways. For short simple questionnaires you may put all possible answers for each question in headers at the top of the sheet and then list or tick the answers of the informants one by one in the appropriate columns. For example, the straightforward answers of the smoking questionnaire for male smokers could be processed as follows (see Table 13.1):
Table 13.1: Master sheet for smokers (males)
Note that for age and number of cigarettes smoked both the raw data and the categories have been entered. This makes it easier to control for coding mistakes and allows for calculating averages. There are 31 male smokers; if there are fewer than 31 answers, there must be some non-responders (NR), as happened in Q9, or a mistake was made. If you work with two persons, one reading and one writing, the risk of mistakes will be reduced, as you can discuss the answers and control for mistakes while filling in the data.
Even these limited data suggest that male smokers usually start when they are teenagers, that the informants on average smoke one package of cigarettes a day and that attempts to stop appear to increase with age.
Some answers, however, require more elaborate coding and have more categories. For example, Q4 on education and Q5 on occupation could be summarised as follows (see Table 13.1, continued):
Table 13.1 (continued): Master sheet for smokers (males)
It will be clear that including all possible categories for one question in the headers of the master sheet may take a lot of space. If a questionnaire has only 13 questions, that does not matter; with 64 questions it would mean that you would need several sheets to include all answers. As the idea of master sheets is to have an easy overview of all data, you may then look for another solution, and enter the different codes for one question in one column instead of having different columns of which you tick one. For education, there would in this way only be four columns: none, highest level reached in years, highest type of school reached, and still in school; for occupation only two: self and head household.
These data reveal that Smoker 1 is, like No. 4, still dependent on his father, though No. 1 earns some money. This means the money for cigarettes is strongly limited. Numbers 2 and 3 are financially independent, and smoke more heavily, though 3 has tried to reduce or stop many times.
Finally, there are open questions such as Q8, ‘Why do you smoke?’ or Q10, ‘Why not?’, or Q17, soliciting suggestions on how the students could best approach smokers in a campaign to discourage them from smoking. Coding and analysis of such qualitative data will be dealt with in section VI of this module.
Note: In any small-scale study processed by hand in which groups will be compared, a different master sheet should be made for each of those groups, e.g., good and poor compliers to treatment. As gender is an important cross-cutting theme, it is usually also advisable to subdivide males and females within each of the groups that are being compared. See Annex 13.2 for an example of a full master sheet.
(2) Compilation by hand (without using master sheets)
When the sample is small (say fewer than 30) and the collected data are limited, it might be more efficient to do the compilation manually. Certain procedures will help ensure accuracy and speed.
Note: One can tally in two ways.
It should be noted that hand tallying is often used in combination with master sheet analysis when the relationship between two or three variables needs to be established, or details analysed. (For example, the questionnaire forms of non-smokers whose close relatives or co-workers are smoking (Q11 and 12) and who feel disturbed by the smoke (Q13) may be selected to analyse in more detail by tallying the (health) problems these non-smokers are experiencing.)
Note: Researchers often assume that hand compilation is merely ‘common sense’ and do not train their staff in the correct procedure. Subsequently many hours of work are wasted in trying to detect the source of errors due to double counts, wrong categorisation, and omissions.
(3) Computer compilation
Before you decide to use a computer, you have to be sure that it will save time or that the quality of the analysis will benefit from it. Note that feeding data into a computer costs time and money. The computer should not be used if your sample is small and the data are mainly generated by open questions (qualitative data), unless there is a resource person who is competent in using a program for qualitative data analysis such as Qualitan or SPSS. The larger the sample, the more beneficial in general the use of a computer will be.
Computer compilation consists of the following steps:
i. Choosing an appropriate computer program
A number of computer programs are available on the market that can be used to process and analyse research data. The most widely used programs are:
If you intend to use a computer, you may ask advice from an experienced person concerning which program is the most appropriate for your type of data. Note that Epi Info may be freely used and copied. All the other programs have copyrights.
ii. Data entry
To enter data into the computer you have to develop a data entry format, depending on the program you are using. It is also possible to enter data using dBase (which is relatively good for data entry) and do the analysis in LOTUS 1-2-3 or SPSS.
After deciding on a data entry format, the information on the data collection instrument will have to be coded (e.g., Male: M or 1, Female: F or 2). During data entry, the information relating to each subject in the study is keyed into the computer in the form of the relevant code (e.g., if the first subject (identified as 001) is a male (code 1) aged 25, the data could be keyed in as 001125).
Note that data entry can be done through the private sector, which may be fast and not too expensive. Health office staff who are not accustomed to this work tend to be slow and make many errors in entry.
iii. Verification
During data entry, mistakes will definitely creep in. The computer can print out the data exactly as they have been entered, so the printout can be checked visually for obvious errors (e.g., exceptionally long or short lines, blanks that should not be there, alphabetic codes where numbers are expected, obviously wrong codes).
Example:
If possible, computer verification should be built in. This involves giving the appropriate commands to identify errors.
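What built-in verification might look like can be sketched in Python. The record layout follows the data entry example given in this module (three digits for the subject number, one digit for sex, two digits for age, so that subject 001, male, aged 25 becomes 001125). The function names, the minimum age of 15 and the faulty sample records are assumptions made only for this illustration.

```python
RECORD_LENGTH = 6          # 3 digits subject number + 1 digit sex + 2 digits age
VALID_SEX_CODES = (1, 2)   # 1 = male, 2 = female, as in the text
MIN_AGE = 15               # assumed lower age limit for this illustration


def encode_record(subject_id, sex_code, age):
    """Build the fixed-width record that would be keyed in for one subject."""
    return f"{subject_id:03d}{sex_code:d}{age:02d}"


def verify_record(line):
    """Return a list of problems found in one keyed-in record (empty if none)."""
    errors = []
    if len(line) != RECORD_LENGTH:
        errors.append("wrong length")
    if not (line.isascii() and line.isdigit()):
        errors.append("non-numeric characters")
    if errors:
        return errors  # the fields cannot be read reliably, so stop here

    sex_code = int(line[3])
    age = int(line[4:6])
    if sex_code not in VALID_SEX_CODES:
        errors.append("invalid sex code")
    if age < MIN_AGE:
        errors.append("age below minimum")
    return errors


# One correct record and three hypothetical records with typing mistakes.
entered = [encode_record(1, 1, 25), "00212", "003M30", "004307"]
for line in entered:
    problems = verify_record(line)
    print(line, "OK" if not problems else "; ".join(problems))
```

Checks of this kind catch the errors listed above (lines that are too long or too short, alphabetic codes where numbers are expected, codes outside the permitted range), but they cannot detect a plausible value that was simply keyed in wrongly, so a visual check against the questionnaires remains useful.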
iv. Programming
If you use computer personnel to analyse your data, it is important to communicate effectively with them. Do not leave the analysis to the computer specialist! You as a researcher should tell the computer personnel:
A certain amount of basic knowledge of computer programming is needed to give the appropriate commands.
v. Computer outputs
The computer can do all kinds of analysis and the results can be printed. It is important to decide whether each of the tables, graphs, and statistical tests that can be produced makes sense and should be used in your report. That is why we PLAN the data analysis BEFOREHAND! (See section V.)
V. DATA ANALYSIS – quantitative data
Analysis of quantitative data involves the production and interpretation of frequencies, tables, graphs, etc., that describe the data.
1. Frequency counts
From the data master sheets, simple tables can be made with frequency counts for each variable. A frequency count is an enumeration of how often a certain measurement or a certain answer to a specific question occurs.
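A frequency count and the corresponding percentages can be sketched as follows in Python, using the 51 smokers and 93 non-smokers among the 144 informants of the smoking study described in this module. The way the answers are stored (one text label per informant) is an assumption made for the illustration.

```python
from collections import Counter

# One label per informant; the counts are those reported for the smoking study.
responses = ["smoker"] * 51 + ["non-smoker"] * 93

# Absolute frequencies: how often each answer occurs.
counts = Counter(responses)
total = sum(counts.values())

# Relative frequencies, rounded to whole percentages.
percentages = {answer: round(100 * n / total) for answer, n in counts.items()}

for answer in counts:
    print(f"{answer}: {counts[answer]} ({percentages[answer]}%)")
print(f"Total: {total}")
```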
If numbers are large enough it is better to calculate the frequency distribution in percentages (relative frequencies): 51/144 x 100 = 35% are smokers and 93/144 x 100 = 65% non-smokers. This makes it easier to compare groups than when only absolute numbers are given. In other words, percentages standardise the data. It is usually necessary to summarise the data from numerical variables by dividing them into categories. This process may include the following steps:
2. Cross-tabulations
Further analysis of the data usually requires the combination of information on two or more variables in order to describe the problem or to arrive at possible explanations for it. For this purpose it is necessary to design CROSS-TABULATIONS.
Depending on the objectives and the type of study, two major kinds of cross-tabulations may be required:
A descriptive cross-tabulation would, for example, relate smoking behaviour to sex or occupational background:
Table 13.2: Smoking by sex
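A cross-tabulation of this kind can be sketched in Python. The cell counts below are not copied from a table; they are inferred from figures given elsewhere in this module (24 students each interviewing three males and three females, 31 male smokers, 51 smokers overall) and assume that all 144 planned interviews were completed.

```python
from collections import Counter

# (sex, smoking status) for each informant; counts inferred as explained above.
records = (
    [("male", "smoker")] * 31
    + [("male", "non-smoker")] * 41
    + [("female", "smoker")] * 20
    + [("female", "non-smoker")] * 52
)

cells = Counter(records)                           # one count per table cell
row_totals = Counter(sex for sex, _ in records)    # totals per sex


def row_percent(sex, status):
    """Percentage of informants of one sex who have the given smoking status."""
    return round(100 * cells[(sex, status)] / row_totals[sex])


for sex in ("male", "female"):
    parts = [
        f"{status}: {cells[(sex, status)]} ({row_percent(sex, status)}%)"
        for status in ("smoker", "non-smoker")
    ]
    print(f"{sex:<7}", "  ".join(parts), f"  total: {row_totals[sex]}")
```

Percentages are calculated within each sex (row percentages) because the aim is to compare the proportion of smokers among males with that among females.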
A larger proportion of males (43%) than of females (28%) appear to be smokers.
An analytic cross-tabulation serves to investigate if there is a relationship between smoking (independent variable) and persistent cough, or chest complaints (dependent variables/problems).
Table 13.3: Smoking in relation to persistent cough over the past 2 weeks
Of the informants with a cough, the majority (77%) are smokers, whereas among those without a cough, only one third (33%) are smokers. The expected relationship between smoking and chest problems therefore seems to be confirmed.
When the plan for data analysis is being developed the data, of course, are not yet available. However, in order to visualise how the data can be organised and summarised it is useful at this stage to construct so-called DUMMY cross-tabulations.
A DUMMY TABLE contains all elements of a real table, except that the cells are still empty.
In a research proposal dummy tables should be prepared to describe the study population and to show the crucial relationships between variables. For the study on smoking behaviour, for example, you would have to prepare a number of descriptive dummy tables (describing characteristics of smokers and non-smokers, their behaviour and attitudes towards smoking), such as Table 13.2 but without numbers or percentages. Further you would make two analytic dummy tables, one on the relationship between smoking and persistent cough (see Table 13.3) and one on the relationship between smoking and long-term chest problems.
Some practical hints when constructing tables:
To further analyse and interpret the data, certain calculations or statistical procedures must usually be completed. Especially in large cross-sectional surveys and in comparative studies, statistical procedures are necessary if the data is to be adequately interpreted. Statistical tests should, for example, indicate whether the gender differences in smoking behaviour are true differences or due to chance. When conducting such studies it is advisable to consult a person with statistical knowledge from the start in order that:
Some elementary statistical procedures will be taught in the second workshop after field work is completed. An elementary knowledge of statistics will help you better understand the whole process of data analysis and interpretation.
VI. PROCESSING AND ANALYSIS OF QUALITATIVE DATA
Qualitative data may be collected through open-ended questions in self-administered questionnaires, in individual interviews or focus group discussions or through observations during fieldwork. For a detailed description of the analysis of qualitative data see Module 10C and in particular Module 23, which specify the methods most often used. Here we will concentrate on the analysis of responses obtained from open-ended questions in interviews or self-administered questionnaires.
Commonly solicited data in open-ended questions include:
The data can be analysed in seven steps:
Step 1: Take a sample of (say 20) questionnaires and list all answers for a particular question. Take care to include the source of each answer you list (in the case of questionnaires you can use the questionnaire number), so that you can place each answer in its original context, if required.
Step 2: To establish your categories, you first read carefully through the whole list of answers. Then you start giving codes (A, B, C, for example, or key words) for the answers that you think belong together in one category, and write these codes in the left margin. Use a pencil so that it is easy to change the categories if you change your mind.
Step 3: List the answers again, grouping those with the same code together.
Step 4: Then interpret each category of answers and try to give it a label that covers the content of all answers. In the case of data on opinions, for example, there may be only a limited number of possibilities, which may range from (very) positive, neutral, to (very) negative.
Step 5: Now try a next batch of 20 questionnaires and check if the labels work. Adjust the categories and labels, if necessary.
Step 6: Make a final list of labels for each category and give each label a code (keyword, letter or number).
Step 7: Code all your data, including what you have already coded, and enter these codes in your master sheet or in the computer.
Note again that you may include a category ‘others’, but that it should be as small as possible, preferably used for less than 5% of the total answers.
If you categorise your responses to open-ended questions in this way you can:
Questions that ask for descriptions of procedures, practices, or beliefs usually do not provide quantifiable answers (though you may quantify certain aspects of them). The answers rather form part of a jigsaw puzzle that you have to put together in order to obtain insight into the problem or topic under study.
IN CONCLUSION, a plan for the processing and analysis of data may include:
GROUP WORK Prepare your plan for data processing and analysis, considering the following points:
EXERCISE: Analysis of responses to open-ended questions
Please analyse and interpret the following answers from the study of student nurses on smoking which were given to Question 7: ‘Why do you smoke?’
Analyse and interpret these answers as follows:
In reality you would have tried the coding system out on another sample of answers. Thereafter you would most likely still have slightly adjusted the coding system, after which you would have coded all answers on the questionnaires and entered the codes in the master sheet or computer.
Trainer’s Notes
Module 13: PLAN FOR DATA PROCESSING AND ANALYSIS
Timing and teaching methods
Introduction and discussion
Exercise: Analysis of responses to open-ended questions
This can be done in the small working groups, before starting on the group work assignment. Assist the participants in analysing and interpreting the 19 answers given to the open-ended question. One way you might categorise and interpret the answers is given on the next page. Explain that the codes can be inserted on the master sheets.
Annex 13.1: Questionnaire for adults (15+) on smoking behaviour
Introduction
May I talk with you for a few minutes? We are students from the nursing school, and we are conducting a small study on smoking. We ask about 10 questions. Would you mind participating? (State you would like to interview non-smokers as well as smokers.)
If the informant agrees: Everything we discuss will be treated confidentially (no one else will know who said what). But you should feel free to remain silent if you hesitate to answer a particular question. And if you have any questions for us, please feel free to ask them.
Questionnaire
Put v in the applicable box, unless otherwise indicated
Annex 13.2: Master sheet for female non-smokers
Annex 13.2: Master sheet for female non-smokers (continued)