International Development Research Centre (IDRC) Canada     
DESIGNING AND CONDUCTING HEALTH SYSTEMS RESEARCH PROJECTS: VOLUME 1
 People
Bill Carman

ID: 56622
Added: 2004-03-03 13:14
Modified: 2004-11-03 9:59
Refreshed: 2012-02-10 18:31

Module 13: PLAN FOR DATA PROCESSING AND ANALYSIS
NB: Development of a research process is a cyclical process. The double-headed arrows indicate that the process is never linear.

OBJECTIVES

At the end of this session, you should be able to:

  1. Identify important issues related to sorting, quality control, and processing of data.
  2. Describe how data can best be analysed and interpreted based on the objectives and variables of the study.
  3. Prepare a plan for the processing and analysis of data (including data master sheets and dummy tables) for the research proposal you are developing.
  1. Introduction
  2. Sorting data
  3. Performing quality checks
  4. Data processing - quantitative data
  5. Data analysis - quantitative data
  6. Processing and analysis of qualitative data

I. INTRODUCTION

Data processing and analysis should start in the field, with checking for completeness of the data and performing quality control checks, while sorting the data by instrument used and by group of informants. Data of small samples may even be processed and analysed as soon as it is collected.

Why is it necessary to prepare a plan for processing and analysis of data?

Such a plan helps the researcher ensure that at the end of the study:

  • all the information (s)he needs has indeed been collected, and in a standardised way;
  • (s)he has not collected unnecessary data which will never be analysed.

The plan for data processing and analysis must be made after careful consideration of the objectives of the study as well as of the tools developed to meet the objectives.

The procedures for the analysis of data collected through qualitative and quantitative techniques are quite different.

  • For quantitative data the starting point in analysis is usually a description of the data for each variable for all the study units included in the sample. Processing of data may take place during data collection or when all data has been collected; description and analysis are usually carried out after the fieldwork has been completed.
  • For qualitative data it is more a matter of describing, summarising and interpreting the data obtained for each study unit (or for each group of study units). Here the researcher starts analysing while collecting the data so that questions that remain unanswered (or new questions which come up) can be addressed before data collection is over.

Preparation of a plan for data processing and analysis will provide you with better insight into the feasibility of the analysis to be performed as well as the resources that are required. It also provides an important review of the appropriateness of the data collection tools for collecting the data you need. That is why you have to plan for data analysis before the pre-test (Module 14). When you process and analyse the data you collect during the pre-test you will spot gaps and overlaps which require changes in the data collection tools before it is too late!

What should the plan include?

When making a plan for data processing and analysis the following issues should be considered:

  • Sorting data,
  • Performing quality-control checks,
  • Data processing, and
  • Data analysis.

II. SORTING DATA

An appropriate system for sorting the data is important for facilitating subsequent processing and analysis.

If you have different study populations (for example village health workers, village health committees and the general population), you obviously would number the questionnaires separately.

In a comparative study it is best to sort the data right after collection into the two or three groups that you will be comparing during data analysis.

For example, in a study concerning the reasons for low acceptance of family planning services, users and non-users would be basic categories; in a study of the reasons why nurses object to being posted in rural areas, rural and urban nurses would be basic categories; in a case-control study obviously the cases are to be compared with the controls.

It is useful to number the questionnaires belonging to each of these categories separately right after they are sorted.

For example, the questionnaires administered to users of family planning services could be numbered U1, U2, U3, etc., and those for the non-users N1, N2, N3, etc.
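As a minimal sketch of this numbering scheme, the Python snippet below assigns a serial number within each group. The function name `number_questionnaires` and the rule of taking the prefix from the first letter of the group name are assumptions made for illustration only.

```python
from collections import defaultdict

def number_questionnaires(groups):
    """Assign a serial ID per group, e.g. U1, U2 for users and N1, N2 for non-users.

    `groups` holds the group name of each questionnaire, in the order in
    which the forms are picked up. The prefix is the first letter of the
    group name (an assumption made for this sketch).
    """
    counters = defaultdict(int)
    ids = []
    for group in groups:
        prefix = group[0].upper()
        counters[prefix] += 1
        ids.append(f"{prefix}{counters[prefix]}")
    return ids

# Hypothetical batch of five questionnaires.
forms = ["user", "user", "non-user", "user", "non-user"]
print(number_questionnaires(forms))
```

Numbering within each group, rather than across the whole sample, keeps the group visible in the identifier itself.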

In a cross-sectional survey it may also be useful to sort the data into two or more groups, depending on possible sub-groups you would like to compare.

III. PERFORMING QUALITY CONTROL CHECKS

Usually the data have already been checked in the field to ensure that all the information has been properly collected and recorded. Before and during data processing, however, the information should be checked again for completeness and internal consistency.

If a questionnaire has not been filled in completely you will have MISSING DATA for some of your variables. If there are many missing data in a particular questionnaire, you may decide to exclude the whole questionnaire from further analysis.

  • If an inconsistency is clearly due to a mistake made by the researcher/research assistant (for example if a person in an earlier question is recorded as being a non-smoker, whereas all other questions reveal that he is smoking), it may still be possible to check with the person who conducted the interview and to correct the answer.
  • If the inconsistency is less clearly a mistake in recording, it may be possible (in a small scale study) to return to the respondent and ask for clarification.
  • If it is not possible to correct information that is clearly inconsistent, you may consider excluding this particular part of the data from further processing and analysis as it will affect the validity of the study. If a certain question produces ambiguous or vague answers throughout, the whole question should be excluded from further analysis. (Normally, however, you would discover such a problem during the pre-test and change the wording of the question.)

Note:

A decision to exclude data of doubtful quality is ethically correct and it testifies to the scientific integrity of the researcher. You should keep track of any questions you had to exclude because of incompleteness or inconsistency in the answers, and discuss it in your final report.

For computer data analysis, quality control checks of data must also include a verification of how the data has been transformed into codes and subsequently entered into the computer. The same applies if data are entered into master sheets (see next page).
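The completeness and consistency checks described in this section could be sketched in Python as follows. The records, the field names and the threshold `MAX_MISSING` are hypothetical; the module itself leaves the decision on how many missing answers justify exclusion to the researcher.

```python
# Hypothetical records; None marks an answer that was not filled in.
records = [
    {"id": "U1", "age": 34, "smoker": "N", "cigs_per_day": 0},
    {"id": "U2", "age": None, "smoker": "Y", "cigs_per_day": 20},
    {"id": "U3", "age": 51, "smoker": "N", "cigs_per_day": 15},   # inconsistent
    {"id": "U4", "age": None, "smoker": None, "cigs_per_day": None},
]

MAX_MISSING = 2  # assumed threshold above which a whole questionnaire is excluded

def missing_fields(record):
    """Completeness check: list the variables with missing data."""
    return [name for name, value in record.items() if value is None]

def is_inconsistent(record):
    """Internal consistency check: a recorded non-smoker who reports cigarettes."""
    cigs = record["cigs_per_day"]
    return record["smoker"] == "N" and cigs is not None and cigs > 0

incomplete = {r["id"]: missing_fields(r) for r in records if missing_fields(r)}
to_exclude = [rid for rid, fields in incomplete.items() if len(fields) > MAX_MISSING]
to_query = [r["id"] for r in records if is_inconsistent(r)]

print("Missing data:", incomplete)
print("Exclude from analysis:", to_exclude)
print("Check with interviewer:", to_query)
```

The lists `to_exclude` and `to_query` also give a simple record of excluded material for discussion in the final report.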

IV. DATA PROCESSING – quantitative data

Decide whether to process and analyse the data from questionnaires:

  • manually, using data master sheets or manual compilation of the questionnaires, or
  • by computer, for example, using a micro-computer and existing software or self-written programmes for data analysis.

Data processing in both cases involves:

  • categorising the data,
  • coding, and
  • summarising the data in data master sheets, manual compilation without master sheets, or data entry and verification by computer.

1. Categorising

Decisions have to be made concerning how to categorise responses.

For categorical variables that are investigated through closed questions or observation (for example, observation of the presence or absence of latrines in homesteads), the categories have been decided upon beforehand.

In interviews the answers to open-ended questions (for example, ‘Why do you visit the health centre?’) can be pre-categorised to a certain extent, depending on the knowledge of possible answers that may be given. However, there should always be a category called ‘Others, specify . . .’, which can only be categorised afterwards.

These responses should be listed and placed in categories that are a logical continuation of the categories you already have. Answers that are difficult or impossible to categorise may be put in a separate residual category called ‘others’, but this category should not contain more than 5% of the answers obtained.

For numerical variables, the data are often better collected without any pre-categorisation. If you do not exactly know the range and the dispersion of the different values of these variables when you collect your sample (e.g., home-clinic distance for out-patients, or income), decisions concerning how to categorise and code the data at the time you develop your tools may be premature. If you notice during data analysis that your categories had been wrongly chosen you cannot reclassify the data anymore.

For example, in a study into utilisation of health services, the research team wanted to establish whether income was related to utilisation. They had pre-coded income into three categories. When they analysed the data they discovered that over 80% fell in the lowest income category. In hindsight they would have preferred a five point scale in order to distinguish different grades of poverty, but as the raw data had not been recorded it was impossible to reclassify the data, and the variable was almost useless.
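The advantage of recording raw values can be illustrated with a short Python sketch. The income figures and cut points below are invented for illustration; the point is only that raw data can be regrouped at will, whereas pre-coded data cannot.

```python
from bisect import bisect_right
from collections import Counter

def categorise(value, cut_points):
    """Return the 1-based category of `value`.

    `cut_points` are the lower limits of categories 2, 3, ... in ascending
    order, so a value equal to a cut point falls in the higher category.
    """
    return bisect_right(cut_points, value) + 1

# Hypothetical raw incomes recorded as collected, without pre-categorisation.
incomes = [20, 35, 40, 55, 60, 75, 80, 90, 150, 400]

# First attempt: three categories; most respondents land in the lowest one.
three_groups = Counter(categorise(x, [100, 300]) for x in incomes)

# Because the raw values were kept, a five-point scale can still be tried.
five_groups = Counter(categorise(x, [25, 50, 100, 300]) for x in incomes)

print("Three categories:", dict(sorted(three_groups.items())))
print("Five categories: ", dict(sorted(five_groups.items())))
```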

2. Coding

If the data will be entered in a computer for subsequent processing and analysis, it is essential to develop a CODING SYSTEM.

For computer analysis, each category of a variable can be coded with a letter, group of letters or word, or be given a number. For example, the answer ‘yes’ may be coded as ‘Y’ or 1; ‘no’ as ‘N’ or 2 and ‘no response’ or ‘unknown’ as ‘U’ or 9.

The codes should be entered on the questionnaires (or checklists) themselves. When finalising your questionnaire, for each question you should insert a box for the code in the right margin of the page. These boxes should not be used by the interviewer. They are only filled in afterwards during data processing. Take care that you have as many boxes as the number of digits in each code.

If analysis is done by hand using data master sheets, it is useful to code your data as well (see section 3 below).

Coding conventions

Common responses should have the same code in each question, as this minimises mistakes by coders.

For example:

Yes (or positive response)    code - Y or 1
No (or negative response)     code - N or 2
Don’t know                    code - D or 8
No response/unknown           code - U or 9
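One way to apply such a convention consistently is to keep a single code list for all questions. The sketch below uses the numeric codes from the example above; the function name `code_answer` and the handling of blank answers are assumptions.

```python
# One shared code list for common responses, using the numeric codes above.
STANDARD_CODES = {
    "yes": 1,
    "no": 2,
    "don't know": 8,
    "no response": 9,
}

def code_answer(answer):
    """Translate a recorded answer into its standard code.

    Blank or missing answers are coded 9 (no response/unknown). Any other
    unrecognised answer raises KeyError on purpose, so that the coder has
    to look at it instead of guessing.
    """
    if answer is None or not answer.strip():
        return STANDARD_CODES["no response"]
    return STANDARD_CODES[answer.strip().lower()]

print([code_answer(a) for a in ["Yes", " no ", "Don't know", "", None]])
```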

Codes for open-ended questions (in questionnaires) can be assigned only after examining a sample of (say 20) questionnaires. You may group similar types of responses into single categories, so as to limit their number to at most 6 or 7. If there are too many categories it is difficult to analyse the data. (For details, see section VI of this module and Module 23.)

Finally it should be borne in mind that the personnel responsible for computer analysis should be consulted very early in the study, i.e., as soon as the questionnaire and dummy tables are finalised. In fact the research team needs to work closely with the computer analyst or statistician throughout the design and the implementation of the study.

3. Summarising the data in data master sheets, manual compilation, or compilation by computer

(1) Data master sheets

If data are processed by hand, it is often most efficient to summarise the raw research data in a so-called DATA MASTER SHEET, to facilitate data analysis. On a data master sheet all the answers of individual respondents are entered by hand.

To illustrate the use of master sheets, we will give an example of a rapid appraisal carried out by students of a nursing school about the smoking habits of the inhabitants of their town. Smoking was perceived as a big problem, and the study had been designed in order to develop an anti-smoking campaign. The 24 students divided the map of their town into 24 roughly equal parts. Each student positioned herself in the centre of the part of town assigned to her, and interviewed six persons aged 15 years and above, three males and three females, so 144 in total. (Another group of students conducted FGDs, observations and individual interviews in schools to obtain insight into the onset of smoking behaviour.) The questionnaire had only 17 questions (see Annex 13.1), of which 9 were asked of everyone, 4 exclusively of smokers and 4 exclusively of non-smokers. It was therefore decided to process the data by hand, divided into two groups, smokers and non-smokers, which were again subdivided into males and females. For each of the four groups, master sheets were prepared, on which all the answers of individual respondents could be recorded.

Master sheets can be made in different ways. For short simple questionnaires you may put all possible answers for each question in headers at the top of the sheet and then list or tick the answers of the informants one by one in the appropriate columns.

For example, the straightforward answers of the smoking questionnaire for male smokers could be processed as follows (see Table 13.1):

Table 13.1: Master sheet for smokers (males)

Note that for age and number of cigarettes smoked both the raw data and the categories have been entered. This makes it easier to check for coding mistakes and allows averages to be calculated. There are 31 male smokers; if there are fewer than 31 answers, there must be some non-responders (NR), as happened in Q9, or a mistake was made. If two persons work together, one reading and one writing, the risk of mistakes will be reduced, as you can discuss the answers and check for mistakes while filling in the data.

Even this limited data suggests that male smokers usually start when they are teenagers, that the informants on average smoke one packet of cigarettes a day and that attempts to stop appear to increase with age.

Some answers, however, require more elaborate coding and have more categories. For example, Q4 on education and Q5 on occupation could be summarised as follows (see Table 13.1, continued):

Table 13.1 (continued): Master sheet for smokers (males)

It will be clear that including all possible categories for one question in the headers of the master sheet may take a lot of space. If a questionnaire has only 13 questions, that does not matter; with 64 questions it would mean that you would need several sheets to include all answers. As the idea of master sheets is to have an easy overview of all data, you may then look for another solution, and enter the different codes for one question in one column instead of having different columns of which you tick one. For education, there would in this way only be four columns: none, highest level reached in years, highest type of school reached, and still in school; for occupation only two: self and head of household.

This data reveals that Smoker 1 is, like No. 4, still dependent on his father, though No. 1 earns some money. This means the money for cigarettes is strongly limited. Numbers 2 and 3 are financially independent, and smoke more heavily, though 3 has tried to reduce or stop many times.

Finally, there are open questions such as Q8, ‘Why do you smoke?’ or Q10, ‘Why not?’, or Q17, soliciting suggestions on how the students could best approach smokers in a campaign to discourage them from smoking. Coding and analysis of such qualitative data will be dealt with in section VI of this module.

Note:

In any small-scale study processed by hand in which groups will be compared, a different master sheet should be made for each of those groups, e.g., good and poor compliers to treatment. As gender is an important cross-cutting theme, it is usually also advisable to subdivide males and females within each of the groups that are being compared.

See Annex 13.2 for an example of a full master sheet.

(2) Compilation by hand (without using master sheets)

When the sample is small (say fewer than 30) and the collected data is limited, it might be more efficient to do the compilation manually.

Certain procedures will help ensure accuracy and speed.

  1. If only one person is doing the compilation, use manual sorting. If a team of two persons works together, use either manual sorting or tally counting.

    Manual sorting can be used only if data on each subject is on a different sheet of paper/ entered in a separate questionnaire.

  2. To do manual sorting the basic procedure is to:
    • Take one question at a time, for example, ‘use of health facility’,
    • Sort the questionnaires into different piles representing the various responses to the question (e.g., hospital/health centre/traditional practitioners), and
    • Count the number in each pile.

      When you need to sort out subjects who have a certain combination of variables (e.g. females who used each type of health facility) sort the questionnaires into piles according to the first question (gender), then subdivide the piles according to the response to the other question (use of health facility).

  3. To do tally counting the basic procedure is:
    • One member of the compiling team reads out the information while the other records it in the form of a tally (e.g., III representing 3 subjects, and a full group of five tally marks representing 5 subjects who give a particular answer).
    • Tally count for no more than two variables at one time (e.g., sex plus type of facility used).

      If it is necessary to obtain information on 3 variables (e.g., sex by time of attending a health facility by diagnosis), do a manual sorting for the first question, then tally count for the other two variables.

    • After tally counting, add the tallies and record the number of subjects in each group.
  4. After doing either manual sorting or tally counting, check the total number of subjects/responses in each question to make sure that there has been no omission or double count.
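For readers who want to see the same logic outside a paper form, the tally procedure can be sketched in Python. The six answers are hypothetical, and the helper `tally_marks` simply renders a count as groups of five strokes; it is an illustration, not a prescribed notation.

```python
from collections import Counter

# Hypothetical answers read out from six questionnaires: (sex, facility used).
forms = [
    ("F", "hospital"), ("F", "health centre"), ("M", "hospital"),
    ("F", "hospital"), ("M", "traditional practitioner"), ("M", "hospital"),
]

# Tally no more than two variables at a time, as the procedure recommends.
tally = Counter(forms)

def tally_marks(n):
    """Render a count as groups of five strokes, e.g. 7 -> 'IIIII II'."""
    groups = ["IIIII"] * (n // 5)
    if n % 5:
        groups.append("I" * (n % 5))
    return " ".join(groups)

for (sex, facility), n in sorted(tally.items()):
    print(f"{sex}  {facility:<26}{tally_marks(n):<8}{n}")

# Step 4: the tallies must add up to the number of questionnaires.
total_tallied = sum(tally.values())
assert total_tallied == len(forms), "omission or double count"
```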

Note:

One can tally in two ways [illustration of the two tally notations not available]. The latter way is preferable as it reduces the possibility of error.

It should be noted that hand tallying is often used in combination with master sheet analysis when the relationship between two or three variables needs to be established, or details analysed. (For example, the questionnaire forms of non-smokers whose close relatives or co-workers are smoking (Q11 and 12) and who feel disturbed by the smoke (Q13) may be selected for more detailed analysis by tallying the (health) problems these non-smokers are experiencing.)

Note:

Researchers often assume that hand compilation is merely ‘common sense’ and do not train their staff in the correct procedure. Subsequently many hours of work are wasted in trying to detect the source of errors due to double counts, wrong categorisation, and omissions.

(3) Computer compilation

Before you decide to use a computer, you have to be sure that it will save time or that the quality of the analysis will benefit from it. Note that feeding data into a computer costs time and money. The computer should not be used if your sample is small and the data is mainly generated by open questions (qualitative data), unless there is a resource person who is competent in using a program for qualitative data analysis such as Qualitan or SPSS. The larger the sample, the more beneficial in general the use of a computer will be.

Computer compilation consists of the following steps:

  1. Choosing an appropriate computer program
  2. Data entry
  3. Verification or validation of the data
  4. Programming (if necessary)
  5. Computer outputs/prints

i. Choosing an appropriate computer program

A number of computer programs are available on the market that can be used to process and analyse research data. The most widely used programs are:

  • Epi Info (version 6), a very user-friendly program for data entry and analysis, which also has a word processing function for creating questionnaires (developed by the Centers for Disease Control and Prevention, Atlanta, USA and the World Health Organization, Geneva),
  • LOTUS 1-2-3, a spreadsheet program (from the Lotus Development Corporation),
  • dBase (version III plus or IV), a data-management program (from Ashton-Tate), and
  • SPSS, which is a quite advanced Statistical Package for Social Sciences (SPSS Inc.).

If you intend to use a computer, you may ask advice from an experienced person concerning which program is the most appropriate for your type of data. Note that Epi Info may be freely used and copied. All the other programs have copyrights.

ii. Data entry

To enter data into the computer you have to develop a data entry format, depending on the program you are using. However, it is possible to enter data using dBase (which is relatively good for data entry) and do the analysis in LOTUS 1-2-3 or SPSS.

After deciding on a data entry format, the information on the data collection instrument will have to be coded (e.g., Male: M or 1, Female: F or 2). During data entry, the information relating to each subject in the study is keyed into the computer in the form of the relevant code (e.g., if the first subject (identified as 001) is a male (code 1) aged 25, the data could be keyed in as 001125).
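The keyed-in record from this example can be reproduced with a small Python sketch. The layout (three digits for the subject number, one for sex, two for age) is inferred from the example 001125; the function names are hypothetical.

```python
def encode_record(subject_id, sex_code, age):
    """Build a fixed-width record: 3-digit subject number, 1-digit sex, 2-digit age."""
    if not (1 <= subject_id <= 999 and sex_code in (1, 2) and 0 <= age <= 99):
        raise ValueError("value does not fit the assumed record layout")
    return f"{subject_id:03d}{sex_code}{age:02d}"

def decode_record(line):
    """Split a 6-character record back into its variables by position."""
    return {"id": int(line[0:3]), "sex": int(line[3]), "age": int(line[4:6])}

record = encode_record(1, 1, 25)   # first subject, male (code 1), aged 25
print(record)
print(decode_record(record))
```

Fixing the position of every variable in advance is what allows the data to be read back unambiguously later on.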

Note that data entry can be done through the private sector, which may be fast and not too expensive. Health office staff who are not accustomed to this work tend to be slow and make many errors in entry.

iii. Verification

During data entry, mistakes will definitely creep in. The computer can print out the data exactly as it has been entered, so the printout can be checked visually for obvious errors (e.g., exceptionally long or short lines, blanks that should not be there, alphabetic codes where numbers are expected, obviously wrong codes).

Example:

  • Codes 3-8 in the column for sex where only 1 (F) and 2 (M) are possible
  • Codes above 250 when you had only 250 subjects

If possible, computer verification should be built in. This involves giving the appropriate commands to identify errors.

Example:

The computer can be instructed to identify and print out all subjects where the ‘sex’ column has a code different from 1 (F) or 2 (M).
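A built-in verification of this kind might look like the following Python sketch. It assumes six-character records (three digits for the subject number, one for sex, two for age) and 250 subjects; the function name and the sample lines are invented for illustration.

```python
def find_entry_errors(lines, n_subjects=250, valid_sex=(1, 2)):
    """Return (line number, record, problem) for every suspect record."""
    errors = []
    for lineno, line in enumerate(lines, start=1):
        # Exceptionally long or short lines, blanks, or letters where digits belong.
        if len(line) != 6 or not line.isdigit():
            errors.append((lineno, line, "wrong length or non-numeric code"))
            continue
        subject_id, sex = int(line[0:3]), int(line[3])
        if not 1 <= subject_id <= n_subjects:
            errors.append((lineno, line, "subject number out of range"))
        if sex not in valid_sex:
            errors.append((lineno, line, "invalid sex code"))
    return errors

# Hypothetical data file: the last four records each contain one entry mistake.
entered = ["001125", "002231", "003740", "251119", "00412", "0051A3"]
for lineno, line, problem in find_entry_errors(entered):
    print(f"line {lineno}: {line!r} -> {problem}")
```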

iv. Programming

If you use computer personnel to analyse your data, it is important to communicate effectively with them. Do not leave the analysis to the computer specialist! You as a researcher should tell the computer personnel:

  • the names of all the variables in the questionnaire;
  • the location of these variables in relation to the data for one subject (i.e., the data format);
  • how many subjects are to be analysed and which groups are to be compared;
  • whether any variables are to be re-coded or calculated; and
  • for which variables you need straight tabulations and which variables you would like to cross-tabulate.

A certain amount of basic knowledge of computer programming is needed to give the appropriate commands.

v. Computer outputs

The computer can do all kinds of analysis and the results can be printed. It is important to decide whether each of the tables, graphs, and statistical tests that can be produced makes sense and should be used in your report. That is why we PLAN the data analysis BEFOREHAND! (See section V.)

V. DATA ANALYSIS – quantitative data

Analysis of quantitative data involves the production and interpretation of frequencies, tables, graphs, etc., that describe the data.

1. Frequency counts

From the data master sheets, simple tables can be made with frequency counts for each variable. A frequency count is an enumeration of how often a certain measurement or a certain answer to a specific question occurs.

For example,

Smokers        51
Non-smokers    93
Total         144

If numbers are large enough it is better to calculate the frequency distribution in percentages (relative frequencies): 51/144 x 100 = 35% are smokers and 93/144 x 100 = 65% non-smokers. This makes it easier to compare groups than when only absolute numbers are given. In other words, percentages standardise the data.
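The same calculation, written as a short Python sketch using the counts given above (the variable names are illustrative):

```python
# Frequency counts from the smoking example.
counts = {"Smokers": 51, "Non-smokers": 93}
total = sum(counts.values())

# Relative frequencies, rounded to whole percentages.
percentages = {group: round(100 * n / total) for group, n in counts.items()}

for group, n in counts.items():
    print(f"{group:<12}{n:>4}{percentages[group]:>5}%")
print(f"{'Total':<12}{total:>4}{sum(percentages.values()):>5}%")
```

Rounded percentages do not always add up to exactly 100; when they do not, it is worth saying so in a note under the table.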

It is usually necessary to summarise the data from numerical variables by dividing them into categories. This process may include the following steps:

(1) Inspect all the figures: What is their range? (The range is the difference between the largest and the smallest measurement.)

(2) Divide the range into three to five categories. You can either aim at having a reasonable number in each category (e.g., 0-2 km, 3-4 km, 5-9 km, 10+ km for home-clinic distance) or you can define the categories in such a way that they are each equal in width (e.g., 20-29 years, 30-39 years, 40-49 years, etc.). Sometimes one looks actively for a ‘critical’ value when making different categories. For example, in a study relating family income to prevalence of diarrhoea over a certain period, there appeared to be no statistical relation when income was arbitrarily subdivided into four categories. When the average income was calculated, however, this appeared to be a critical value. The children in families with an income above average had had significantly less diarrhoea than the children in families with an income below average.

(3) Construct a table indicating how data are grouped and count the number of observations in each group.
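The three steps can be sketched in Python for equal-width age classes. The ages are hypothetical, and the class width of 10 years follows the example of 20-29, 30-39 and so on.

```python
from collections import Counter

# Hypothetical ages of ten respondents.
ages = [21, 24, 29, 30, 33, 38, 41, 45, 47, 52]

# Step 1: inspect the range (largest minus smallest measurement).
data_range = max(ages) - min(ages)

# Step 2: define equal-width classes such as 20-29, 30-39, ...
def age_class(age, width=10):
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Step 3: count the observations in each class.
table = Counter(age_class(a) for a in ages)

print("Range:", data_range)
for label in sorted(table):
    print(f"{label} years: {table[label]}")
```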

2. Cross-tabulations

Further analysis of the data usually requires the combination of information on two or more variables in order to describe the problem or to arrive at possible explanations for it.

For this purpose it is necessary to design CROSS-TABULATIONS.

Depending on the objectives and the type of study, two major kinds of cross-tabulations may be required:

  • Descriptive cross-tabulations that aim at describing the problem under study.
  • Analytic cross-tabulations in which groups are compared in order to determine differences, or which focus on exploring relationships between variables.

A descriptive cross-tabulation would, for example, relate smoking behaviour to sex or occupational background:

Table 13.2: Smoking by sex

The males appear to be smoking more (43%) than females (28%).
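A cross-tabulation of this kind can be built as in the Python sketch below. The cell counts are not copied from Table 13.2; they are inferred from figures given elsewhere in this module (144 respondents with three males and three females per student, 51 smokers in total, 31 male smokers) and assume that every student interviewed exactly three males and three females.

```python
# Cell counts inferred from the text (an assumption, see above).
table = {
    "Males":   {"Smokers": 31, "Non-smokers": 41},
    "Females": {"Smokers": 20, "Non-smokers": 52},
}
columns = ["Smokers", "Non-smokers"]

# Row and column totals allow the table to be checked against other tables.
row_totals = {sex: sum(cells.values()) for sex, cells in table.items()}
col_totals = {c: sum(table[sex][c] for sex in table) for c in columns}
grand_total = sum(row_totals.values())

# Percentage of smokers within each sex (row percentages).
pct_smokers = {
    sex: round(100 * table[sex]["Smokers"] / row_totals[sex]) for sex in table
}

for sex in table:
    print(f"{sex:<8} smokers {table[sex]['Smokers']:>3} of {row_totals[sex]} "
          f"({pct_smokers[sex]}%)")
print(f"Total    smokers {col_totals['Smokers']:>3} of {grand_total}")
```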

An analytic cross-tabulation serves to investigate if there is a relationship between smoking (independent variable) and persistent cough, or chest complaints (dependent variables/problems).

Table 13.3: Smoking in relation to persistent cough over the past 2 weeks

Of the informants with a cough, the majority (77%) is smoking, whereas among those without a cough, only one third (33%) are smokers. The expected relationship between smoking and chest problems seems therefore confirmed.

When the plan for data analysis is being developed the data, of course, is not yet available. However, in order to visualise how the data can be organised and summarised it is useful at this stage to construct so-called DUMMY cross-tabulations.

A DUMMY TABLE contains all elements of a real table, except that the cells are still empty.
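As an illustration, a dummy table can be laid out with a few lines of Python. The function `dummy_table` and the column width are hypothetical; the row and column labels follow the smoking and cough example, with the dependent variable across the top.

```python
def dummy_table(title, row_labels, col_labels, width=14):
    """Return a text table with title, headings and totals but empty cells."""
    header = "".join(label.ljust(width) for label in [""] + col_labels + ["Total"])
    lines = [title, header]
    for row in row_labels + ["Total"]:
        # One empty cell per column, plus one for the row total.
        empty_cells = "".join("....".ljust(width) for _ in range(len(col_labels) + 1))
        lines.append(row.ljust(width) + empty_cells)
    return "\n".join(lines)

print(dummy_table(
    "Smoking in relation to persistent cough over the past 2 weeks",
    row_labels=["Smoking", "Not smoking"],
    col_labels=["Cough", "No cough"],
))
```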

In a research proposal dummy tables should be prepared to describe the study population and to show the crucial relationships between variables.

For the study on smoking behaviour, for example, you would have to prepare a number of descriptive dummy tables (describing characteristics of smokers and non-smokers, their behaviour and attitudes towards smoking), such as Table 13.2 but without numbers or percentages. Further you would make two analytic dummy tables, one on the relationship between smoking and persistent cough (see Table 13.3) and one on the relationship between smoking and long-term chest problems.

Some practical hints when constructing tables:

  • If a dependent and an independent variable are cross-tabulated, the headings of the dependent variable are usually placed horizontally (see Table 13.3: ‘cough’ and ‘no cough’), and the headings of the independent variable vertically (‘smoking’ and ‘not smoking’ in the same table).
  • All tables should have a clear title and clear headings for all rows and columns.
  • All tables should have a separate row and a separate column for totals to enable you to check if your totals are the same for all variables and to make further analysis easier.
  • All tables related to a certain objective should be numbered and kept together so the work can be easily organised and the writing of the final report will be simplified.

To further analyse and interpret the data, certain calculations or statistical procedures must usually be completed. Especially in large cross-sectional surveys and in comparative studies, statistical procedures are necessary if the data is to be adequately interpreted. Statistical tests should, for example, indicate whether the gender differences in smoking behaviour are true differences or due to chance. When conducting such studies it is advisable to consult a person with statistical knowledge from the start in order that:

  • correct sampling methods are used and an appropriate sample size is selected;
  • decisions on coding are made that will facilitate data processing and analysis; and
  • a clear understanding is reached concerning plans for data processing, analysis and interpretation, including agreement concerning which variables need to be cross-tabulated.

Some elementary statistical procedures will be taught in the second workshop after field work is completed. An elementary knowledge of statistics will help you better understand the whole process of data analysis and interpretation.

VI. PROCESSING AND ANALYSIS OF QUALITATIVE DATA

Qualitative data may be collected through open-ended questions in self-administered questionnaires, in individual interviews or focus group discussions or through observations during fieldwork. For a detailed description of the analysis of qualitative data see Module 10C and in particular Module 23, which specify the methods most often used. Here we will concentrate on the analysis of responses obtained from open-ended questions in interviews or self-administered questionnaires.

Commonly solicited data in open-ended questions include:

  • opinions of respondents on a certain issue;
  • reasons for a certain behaviour; and
  • descriptions of certain procedures, practices or perceptions with which the researcher is not familiar.

The data can be analysed in seven steps:

Step 1: Take a sample of (say 20) questionnaires and list all answers for a particular question. Take care to include the source of each answer you list (in the case of questionnaires you can use the questionnaire number), so that you can place each answer in its original context, if required.

Step 2: To establish your categories, you first read carefully through the whole list of answers. Then you start giving codes (for example A, B, C, or key words) to the answers that you think belong together in one category, and write these codes in the left margin. Use a pencil so that it is easy to change the categories if you change your mind.

Step 3: List the answers again, grouping those with the same code together.

Step 4: Then interpret each category of answers and try to give it a label that covers the content of all answers. In the case of data on opinions, for example, there may be only a limited number of possibilities, which may range from (very) positive, neutral, to (very) negative.

Data on reasons may require different categories depending on the topic and the purpose of your question. In the exercise below you will be asked to categorise the reasons why people smoke by grouping them in such a way that it is easy to find entry points for health education aimed at reducing smoking.

After some shuffling you usually end up with 5 to 7 categories.

Step 5: Now try the next batch of 20 questionnaires and check whether the labels work. Adjust the categories and labels, if necessary.

Step 6: Make a final list of labels for each category and give each label a code (keyword, letter or number).

Step 7: Code all your data, including what you have already coded, and enter these codes in your master sheet or in the computer.

Note again that you may include a category ‘others’, but that it should be as small as possible, preferably used for less than 5% of the total answers.
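
For groups that enter their codes in a computer, Steps 3 and 7 and the check on the size of the category ‘others’ might be sketched as follows in Python. This is only an illustration: the code list, the five answers and the function names are hypothetical and are not taken from the smoking study.

```python
from collections import defaultdict

# Hypothetical final code list (Step 6): code -> label.
CODEBOOK = {"A": "positive", "B": "neutral", "C": "negative", "X": "others"}

# Hypothetical coded answers (Step 7): (questionnaire number, answer, code).
coded_answers = [
    (1, "The clinic staff are very helpful.", "A"),
    (2, "I have no strong feelings about it.", "B"),
    (3, "The waiting time is far too long.", "C"),
    (4, "The service has improved a lot.", "A"),
    (5, "Ask my husband.", "X"),
]


def group_by_code(rows):
    """Step 3: list the answers again, grouped by code, keeping each source number."""
    groups = defaultdict(list)
    for number, answer, code in rows:
        if code not in CODEBOOK:
            raise ValueError(f"Questionnaire {number}: unknown code {code!r}")
        groups[code].append((number, answer))
    return dict(groups)


def others_share(rows, others_code="X"):
    """Percentage of all answers that were coded as 'others'."""
    if not rows:
        return 0.0
    n_others = sum(1 for _, _, code in rows if code == others_code)
    return 100.0 * n_others / len(rows)


groups = group_by_code(coded_answers)
share = others_share(coded_answers)  # 20.0
needs_review = share >= 5.0          # True: 'others' is too large
```

In this invented sample ‘others’ accounts for 20% of the answers, well above the 5% guideline, which would suggest returning to Step 5 and revising the categories.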

If you categorise your responses to open-ended questions in this way you can:

  • Analyse the content of each answer given in particular categories, for example, in order to plan what actions should be taken (e.g., for health education). Gaining insight into a problem, or into possible interventions for a problem, is the most important function of qualitative data.
  • Report the number and percentage of respondents that fall into each category, so that you gain insight into the relative weight of different opinions or reasons.
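
The second point, reporting numbers and percentages per category, can be illustrated with a short Python sketch. The labels and the ten codes below are hypothetical; in practice the codes would come from your master sheet.

```python
from collections import Counter

# Hypothetical code list and coded answers for one open-ended question.
LABELS = {"A": "positive", "B": "neutral", "C": "negative"}
codes = ["A", "A", "B", "C", "A", "C", "B", "A", "C", "A"]


def category_table(codes, labels):
    """Return (label, number, percentage) for each category in the code list."""
    counts = Counter(codes)
    total = len(codes)
    table = []
    for code, label in labels.items():
        n = counts.get(code, 0)
        pct = round(100.0 * n / total, 1) if total else 0.0
        table.append((label, n, pct))
    return table


for label, n, pct in category_table(codes, LABELS):
    print(f"{label:<10}{n:>4}{pct:>7.1f}%")
```

Percentages are calculated on the total number of coded answers, so answers that are missing should be handled separately before the table is drawn up.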

Questions that ask for descriptions of procedures, practices, or beliefs usually do not provide quantifiable answers (though you may quantify certain aspects of them). Rather, the answers form part of a jigsaw puzzle that you have to put together in order to gain insight into the problem or topic under study.

IN CONCLUSION, a plan for the processing and analysis of data may include:

  • a decision on whether all or some parts of the data should be processed by hand or computer;
  • dummy tables for the description of the problem, the comparison of groups (if applicable) and/or the establishment of relationships between variables, guided by the objectives of the study;
  • a decision on the sequence in which tables or data from different study populations should be analysed;
  • a decision on how qualitative data should be analysed;
  • an estimate of the total time needed for analysis and how long particular parts of the analysis will take;
  • a decision concerning whether additional staff will be required for the analysis; and
  • an estimate of the total cost of the analysis.

GROUP WORK

Prepare your plan for data processing and analysis, considering the following points:

  1. Sorting and quality control of data (10 minutes):
    • How will the sorting be done? When?
    • What quality checks should be made? Who will do them? When?
  2. Processing of data (50 minutes):
    • How will you do it (by hand or by computer)? If by computer: do you have enough experience and is the necessary equipment available?
    • Prepare data master sheets for your proposal (preferably on flip charts).
    • How many open-ended questions do you have that require categorising or coding? Who will do the categorising or coding? How much time will be required for data processing (taking into account the sample size)?
  3. Analysing and interpreting the data (1 hour):
    • Using the specific objectives and the list of variables, prepare dummy tables in which you relate variables to each other to analyse possible (causal) relationships. Select the dummy tables that you plan to fill in before we have our workshop on data analysis and reporting.
    • Make estimates of the time and materials required for the analysis (in our case, only for the period up to the second workshop during which we will continue the analysis of data).
  4. Prepare to present in plenary your master sheet, three dummy tables, a list of other important variables that you would like to cross-tabulate, and rough estimates of manpower, time, and materials required for the analysis of data (15 minutes).

EXERCISE: Analysis of responses to open-ended questions

Please analyse and interpret the following answers from the study of student nurses on smoking which were given to Question 7: ‘Why do you smoke?’

  1. I have tried to give up so many times but I have been unable to.
  2. I like the feel of the cigarette in my hand.
  3. Because it gives me pleasure.
  4. I do not see why I should give up smoking!!
  5. Because I like to blow the smoke through my mouth and nose.
  6. Because I feel confident and in charge when I am smoking.
  7. It helps me to think better.
  8. I like the image that comes with smoking.
  9. I feel that people respect me more as a smoker.
  10. All my friends are smokers.
  11. It helps to make people more friendly and comfortable, especially when offering a cigarette.
  12. Why not?!!
  13. Smoking makes me feel like a man.
  14. I like to blow smoke rings.
  15. I like the taste.
  16. It is too difficult to give up.
  17. It helps me to relax.
  18. It helps me to reduce the pressure and tension at work.
  19. My wife likes a man who smokes.

Analyse and interpret these answers as follows:

  • Develop a coding system by categorising the answers. First read all answers carefully. Then make rough categories of answers that seem to belong together. Try to limit yourself to 5-7 groups. Label each group of answers with a key word that seems to characterise the answers.
  • List all answers again, but now in 5-7 groups under the labels you have selected.
  • Discuss the groups, for example in terms of types of messages you would use to convince these smokers to stop smoking. This helps to identify whether the answers indeed belong together. You may split up some groups of answers and combine others. Find an appropriate ‘label’ for each category and count how many answers you have for each category.
  • Come up with suggestions for interventions.

In reality you would then try the coding system out on another sample of answers. You would most likely still adjust the coding system slightly, after which you would code all answers on the questionnaires and enter the codes in the master sheet or computer.

Trainer’s Notes

Module 13: PLAN FOR DATA PROCESSING AND ANALYSIS

Timing and teaching methods

1 hour      Introduction and discussion
½ hour      Exercise
2½ hours    Group work
1 hour      Plenary
5 hours     TOTAL TIME

Introduction and discussion

  • Give a brief introduction to the topic, starting with an overview of your presentation.
  • Explain and clarify terms such as sorting, quality control, processing, categorising, coding and tallying.
  • Emphasise the importance of an adequate numbering system for the different data collection tools: they can be either numbered before going into the field or afterwards or both.
  • Discuss the pros and cons of using a computer if one or more groups intend to use it. If computers are not being used in the course, the section on computer compilation can be omitted and participants can be asked to read it themselves if interested.
  • Discuss the importance of having different data master sheets for different categories of informants.
  • Pay special attention to how to deal with missing data: missing data should be recorded on the data master sheet so the groups can arrive at correct total figures. FOR EACH QUESTION/ITEM the total number of answers and the total number of missing values should add up to the total number of interviewees. If the totals are not correct, groups go astray when processing their data. This should be avoided by all means.
  • Prepare an overhead sheet for dummy tables 13.2 and 13.3, and another sheet with the data filled in to put on top of the previous sheet. Let the participants interpret the data.
  • In your conclusion, summarise the different components that should be included in a plan for data processing and analysis.
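
For groups working on a computer, the rule on missing data given above (answers plus missing values must equal the number of interviewees for each question) could be checked with a sketch like the following. The tallies, the number of interviewees and the function name are hypothetical.

```python
def check_totals(master_sheet, n_interviewees):
    """Return (item, total) for every item whose answers plus missing
    values do not add up to the number of interviewees."""
    problems = []
    for item, tallies in master_sheet.items():
        total = tallies["answered"] + tallies["missing"]
        if total != n_interviewees:
            problems.append((item, total))
    return problems


# Hypothetical tallies taken from a master sheet for 50 interviewees.
master_sheet = {
    "Q1": {"answered": 48, "missing": 2},
    "Q2": {"answered": 45, "missing": 3},  # two interviewees unaccounted for
}

problems = check_totals(master_sheet, n_interviewees=50)
```

An empty result means every item adds up; any item listed should be traced back to the original questionnaires before processing continues.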

Exercise: Analysis of responses to open-ended questions

This can be done in the small working groups, before starting on the group work assignment. Assist the participants in analysing and interpreting the 19 answers given to the open-ended question. One way you might categorise and interpret the answers is given on the next page. Explain that the codes can be inserted on the master sheets.

Annex 13.1: Questionnaire for adults (15+) on smoking behaviour

Introduction

May I talk with you for a few minutes? We are students from the nursing school, and we are conducting a small study on smoking. We will ask about 10 questions. Would you mind participating? (State that you would like to interview non-smokers as well as smokers.)

If the informant agrees:

Everything we discuss will be treated confidentially (no one else will know who said what). But you should feel free to remain silent if you hesitate to answer a particular question. And if you have any questions for us, please feel free to ask them.


Questionnaire

Put ✓ in the applicable box, unless otherwise indicated.

Annex 13.2: Master sheet for female non-smokers

Annex 13.2: Master sheet for female non-smokers (continued)






