Methodologies for data collection and analysis for monitoring and evaluation

The quality and utility of data derived from either monitoring or evaluation in an IOM intervention depend on the data collection planning, design, implementation, management and analysis stages of these respective processes. Understanding each stage, and the linkages between them, is important for collecting relevant, high-quality data that can inform evidence-based decision-making and learning. This chapter looks at methodologies for planning, designing and using various data collection tools for both monitoring and evaluation (M&E) purposes. It also covers managing and analysing the collected data and, finally, presenting findings.

An overview of chapter 4

This chapter presents methodological fundamentals required for data collection and analysis. Specific issues and considerations for variation between methodologies are covered throughout the chapter. Awareness of methodological fundamentals helps set standards and ensure consistency in methodology, quality of data and reporting across the Organization. It enhances the robustness and rigour of IOM M&E products and facilitates the comparison of results and their aggregation.

While obtaining data is required for both monitoring and evaluation, it is important to note that the methodologies may vary according to their respective information needs. This may subsequently shape the purpose of the data collection, which is also guided by the availability of data, local context, resources and time, as well as other variables.

The scope of this chapter is limited to concepts that will enable users to acquire a broad understanding of methodologies for collecting and analysing M&E data. Links to additional resources are available at the end of each section.

M&E practitioners will gain an understanding of methodologies for M&E, specifically of how to select, design and implement methods relevant to their work, and the background knowledge needed to make informed choices.

Professional standards and ethical guidelines

During the different stages of monitoring or evaluation, including the collection and use of data, M&E practitioners are required to adopt ethical behaviours that prevent them from being influenced by internal and external pressures that may try to change the findings before they are released or to use them in an inappropriate way. See also chapter 2, Norms, standards and management for monitoring and evaluation.

Ethical behaviour

Ethics are a set of values and beliefs that are based on a person’s view of what is right, wrong, good and bad and that influence the decisions people make. They can be dictated by the organization, by the laws of the country in which M&E practitioners work and by what people consider to be ethical in that context.

RESOURCES

M&E practitioners must also act in accordance with the following sources:

IOM resource

2014 IOM Standards of Conduct. IN/15 Rev. 1 (Internal link only).

Other resource

United Nations Evaluation Group (UNEG)

2016 Norms and Standards for Evaluation. New York.

IOM is a member of UNEG, and thus must operate in accordance with the established professional norms and standards and ethical guidelines. For more information on how professional norms and standards in M&E, including ethics, guide IOM’s M&E work, see chapter 2: Norms, standards and management for monitoring and evaluation.

Evaluation and politics

The gathered data provide an important source of information to decision makers about the intervention being monitored and/or evaluated. While positive evaluations can help secure more funds, expand a pilot project or enhance reputations, the identification of serious problems can lead to difficult situations where the credibility of the work done is at stake. Understanding and managing political situations and influence is crucial for maintaining the integrity of the monitoring and evaluation work, and well-defined and robust methodologies for data collection and analysis play a critical role (Morra Imas and Rist, 2009). See also chapter 2, Norms, standards and management for monitoring and evaluation.

Ethical guidelines and principles

When planning, designing, implementing, managing and reporting on M&E activities, M&E practitioners should ensure that their actions are informed by ethical guidelines, particularly those outlined below:

IOM Evaluation Policy and Monitoring Policy (September 2018);
IOM Data Protection Principles (IN/00138) (May 2009) (Internal link only);
IOM Standards of Conduct (IN/15 Rev.1) (Internal link only);
UNEG Ethical Guidelines for Evaluation (March 2008, revised 2020).

Some of the common ethical principles presented in the above documents “should be applied in full respect of human rights, data protection and confidentiality, gender considerations, ethnicity, age, sexual orientation, language, disability, and other considerations when designing and implementing the evaluation” (IOM, 2017b, p. 438). They can be summarized as follows:

Monitoring and evaluation ethical principles

Adhering to common ethical principles also contributes to guaranteeing that the information gathered is accurate, relevant, timely and used in a responsible manner (see chapter 2, as well as Annex 2.1. Ethical monitoring and/or evaluation checklist).

Independence also means avoiding conflicts of interest and being able to retain independence of judgement and not be influenced by pressure from any party to modify evaluation findings.

RESOURCES

IOM resource

2017b IOM Project Handbook. Second edition. Geneva (Internal link only).

Other resources

Buchanan-Smith, M., J. Cosgrave and A. Warner

2016 Evaluation of Humanitarian Action Guide. Active Learning Network for Accountability and Performance/Overseas Development Institute (ALNAP/ODI), London.

Fitzpatrick, J.L., J.R. Sanders and B.R. Worthen

2004 Programme Evaluation: Alternative Approaches and Practical Guidelines. Third edition. Pearson Education Inc., New York.

House, E.R.

1995 Principled evaluation: A critique of the AEA Guiding Principles. New Directions for Programme Evaluation, 66:27–34.

Morra Imas, L.G. and R.C. Rist

2009 The Road to Results: Designing and Conducting Effective Development Evaluations. World Bank, Washington, D.C.

Morris, M. and R. Cohn

1993 Programme evaluators and ethical challenges: A national survey. Evaluation Review, 17:621–642.

Thomson, S., A. Ansoms and J. Murison (eds.)

2013 Emotional and Ethical Challenges for Field Research in Africa: The Story behind the Findings. Palgrave Macmillan, Chippenham and Eastbourne.

Planning

Rigorous planning and designing for data collection can improve the quality of the approach and methods of data collection and, therefore, the quality of collected data. It is imperative to identify the approach intended for use to monitor or evaluate an intervention, and then to establish a data collection plan. Selecting an appropriate approach will also allow a relevant assessment of the monitoring or evaluation questions guiding any review, taking into account the specific context, existing constraints, access, timing, budget and availability of data.

INFORMATION - IOM migration data governance

What is it?

Data governance represents the framework used by IOM to manage the organizational structures, policies, fundamentals and quality that ensure accurate and risk-free migration data and information. It establishes standards, accountability and responsibilities and ensures that migration data and information use are of maximum value to IOM, while managing the cost and quality of handling the information. Data governance enforces the consistent, integrated and disciplined use of migration data by IOM.

How is it relevant to IOM’s work?

Data governance allows IOM to view data as an asset in every IOM intervention and, most importantly, it is the foundation upon which all IOM initiatives can rest. It is important to keep in mind the migration data life cycle throughout the whole project cycle. This includes the planning and designing, capturing and developing, organizing, storing and protecting, using, monitoring and reviewing, and eventually improving, the data or disposing of it.

Key concepts to look out for:

Data steward
Roles and responsibilities
Data quality
Data classification for security and privacy
Data processing, including collection and use

For an elaboration on the information presented on IOM migration data governance, see Annex 4.1. IOM migration data governance and monitoring and evaluation.

RESOURCES

IOM resources

2009 IOM Data Protection Principles. IN/00138 (Internal link only).
2010 IOM Data Protection Manual. Geneva.
2017a Migration Data Governance Policy. IN/253 (Internal link only).
2020a IOM Migration Data Strategy: Informing Policy and Action on Migration, Mobility and Displacement 2020–2025.
n.d. Data Protection FAQs (Internal link only).

Planning for data collection

When planning for data collection, four basic considerations help ensure that the data to be collected and analysed are valid and reliable: the purpose, methodology, resources and timing of data collection. A qualitative, quantitative or mixed methods approach to data collection can be considered in that respect.

Figure 4.2. Key considerations when planning for data collection

Purpose for data collection	Some questions to ask: Is the data collected for the purpose of monitoring or evaluation? What are the main information needs? What are the objectives, outcomes, outputs and activities being monitored or evaluated? What are the expected results (intermediate and long term)? Which stakeholders’ information needs will the data address?
Methodology and methods for data collection	Several aspects need to be considered, such as identifying the source of the data and the frequency of data collection, knowing how data will be measured, by whom and by how many people data will be collected, and selecting the appropriate methodology in order to design the right data collection tool(s). Some questions to ask: What are the criteria and questions to be addressed in the data collection tools? What type of data is needed to answer the information needs? Are multiple data sources required/used to answer the information needs? What types of data already exist? What data is missing? Are the measures used to collect data valid and reliable? Will a structured or semi-structured approach be used to collect the data? What sampling approach is needed to monitor progress or answer the evaluation questions?
Resources for data collection	Resources enable the implementation of the choices made. Some questions to ask: Are there enough resources, such as staff and budget, to collect the data on a frequent basis? Who is responsible for collecting the data? Will external enumerators need to be hired? How will enumerators get to the data collection sites? What are the related costs? Are additional costs for data collection and analysis required?
Timing for data collection	Timing may influence the availability of resources, as well as the relevance of data (avoid using outdated data). Some questions to ask: At which stage of the implementation cycle will data be collected? How long is data collection expected to last? Will data be collected in a timely manner so that it reflects the current status quo?

Identifying the purpose of data collection

Identifying the purpose of data collection helps to address different information needs; the information needs of monitoring may also differ from those of evaluation.

Data collection for monitoring, which occurs during implementation, feeds implementation-related information needs, using data collection tools that are designed to collect data for measuring progress towards results against pre-set indicators. Data collected for evaluation serves the purpose of assessing the intervention’s results and the changes it may have brought about on a broader level, using data collection tools designed to answer evaluation questions included in the evaluation terms of reference (ToR), matrix or inception report (see also chapter 5, Planning for evaluation).

The process of planning and designing the respective tools for M&E data collection may be similar, as data collected for monitoring can also be used for evaluation, which will feed the diverse information needs of either. Identifying whether data collection is for either monitoring or evaluation purposes is a first step in planning, which will then influence the choice of an appropriate methodology and tools for data collection and analysis. The following figures show how questions can determine what type of data to collect for monitoring and for evaluation, respectively.

Figure 4.3. Monitoring and vertical logic

Source: Adapted from International Federation of Red Cross and Red Crescent Societies (IFRC), 2011.

Figure 4.4. Evaluation and vertical logic

Source: Adapted from IFRC, 2011.

RESOURCES

International Federation of Red Cross and Red Crescent Societies (IFRC)

2011 Project/Programme Monitoring and Evaluation (M&E) Guide. Geneva.

Sources of data

The IOM Project Handbook defines data sources as identifying where and how the information will be gathered for the purpose of measuring the specified indicators (Module 2 of IOM Project Handbook, p. 143).

In general, there are two sources of data that can be drawn upon for monitoring and/or evaluation purposes:

Primary data, which is the data that M&E practitioners collect themselves using various instruments, such as key informant interviews, surveys, focus group discussions and observations.
Secondary data, which is data obtained from other pre-existing sources, such as a country census or survey data from partners, donors or government.

Note that in cases where IOM works with implementing partners that are directly managed by IOM, the data collected is still considered primary data collected by IOM.

INFORMATION - Availability and quality of secondary data

It is important to assess the availability and quality of secondary data, as this enables M&E practitioners to target efforts towards the collection of additional data. For instance, it is important to ascertain whether baseline data (such as census data) are available and, if so, to determine their quality. Where baseline data are not available or are of poor quality, M&E practitioners are required to plan for the collection of baseline data.

Desk review

When choosing sources of data, it is helpful to start with a desk review to better assess what type of data to use. For monitoring, this corresponds to the information included under the column “Data source and collection method” of the IOM Results Matrix and Results Monitoring Framework (see chapter 3). For evaluation, the type of data will be clarified in the evaluation ToR, inception report and/or evaluation matrix and can also include data derived from monitoring.

A desk review usually focuses on analysing existing relevant primary and secondary data sources and can be either structured or unstructured. Structured desk reviews use a formal structure for document analysis, whereas unstructured reviews are background reading. For detailed guidance on conducting a desk review, see Annex 4.2. How to conduct a desk review.

Type of measurement

When planning for data collection and analysis, knowing the type of measurement, that is, how data will be measured, may influence the choice of an appropriate methodology. This is of particular importance for informing the design of data collection tools such as surveys.

Questions to consider
	What is it that you want to measure? What is the purpose of measuring it? How will you go about measuring it?

Measures of indicators identified in a Results Matrix or Evaluation Matrix can include categorical (qualitative) and/or numerical (quantitative) variables. A variable is any characteristic or attribute that differs among and can be measured for each unit in a sample or population (see section on “Sampling”).

Categorical variables represent types of qualitative data that can be divided into groups or categories. Such groups may consist of alphabetic labels (such as gender, hair colour or religion), numeric labels (such as female = 1, male = 0) or binary labels (such as yes or no) that do not contain information beyond the frequency counts related to group membership. Categorical variables can be further categorized as nominal, dichotomous (binary) or ordinal. For more information on the types of variables, see Laerd Statistics, n.d.
Numerical variables (also known as quantitative variables) are used to measure objective things that can be expressed in numeric terms, such as absolute figures (for example, the number of persons trained, disaggregated by sex), a percentage, a rate or a ratio.
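The distinction between the two types of variables can be illustrated with a short Python sketch. It is a minimal illustration only: the training records and variable names are hypothetical and do not come from an IOM data set. The categorical variable (sex) only supports frequency counts, while the numerical measures (an absolute figure disaggregated by sex, and a percentage) can be calculated from the same records.

```python
from collections import Counter

# Hypothetical training records: (participant id, sex, completed training?)
records = [
    ("P01", "female", True),
    ("P02", "male", True),
    ("P03", "female", False),
    ("P04", "female", True),
    ("P05", "male", False),
]

# Categorical variable: only frequency counts per category are meaningful.
sex_counts = Counter(sex for _, sex, _ in records)

# Numerical variable: absolute figure (persons trained), disaggregated by sex.
trained_by_sex = Counter(sex for _, sex, done in records if done)
trained_total = sum(trained_by_sex.values())

# Numerical variable: percentage of participants who completed the training.
completion_rate = 100 * trained_total / len(records)

print("Participants by sex:", dict(sex_counts))
print("Persons trained by sex:", dict(trained_by_sex))
print("Completion rate (%):", completion_rate)
```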

When designing indicators, the most important tasks are to logically link these to the intervention results and determine how the indicators will measure these results.

EXAMPLE

Outcome A

Migrants are asserting their rights in a legal manner.

What is it that you want to measure?

Indicator for outcome A

The number of migrants that go to court to assert their human rights.

What is the purpose of measuring?

To assess progress towards the outcome.

Potential method to capture information

To ask beneficiaries whether or not they have turned to courts over the past years to assert that their human rights are respected and, if so, how many times.

How will you go about measuring it?

By conducting a survey.

Note: For the purpose of the IOM Monitoring and Evaluation Guidelines, IOM uses the OECD/DAC definition of beneficiary/ies or people that the Organization seeks to assist as “the individuals, groups, or organisations, whether targeted or not, that benefit directly or indirectly, from the development intervention. Other terms, such as rights holders or affected people, may also be used.” See OECD, 2019, p. 7. The term beneficiary/ies or people that IOM seeks to assist will intermittently be used throughout the IOM Monitoring and Evaluation Guidelines, and refers to the definition given above, including when discussing humanitarian contexts.

Measurement quality

Any measure that is intended to be used should be relevant, credible, valid, reliable and cost-effective. The quality of indicators is determined by four main factors:

(a) Quality of the logical link between the indicator and what is being measured (such as the objective, outcome, output and/or impact of an intervention)

What is being measured and why? What are the potential indicators?
Why does/do the indicator(s) measure the objective, outcome, output and/or impact?
How does the indicator measure the objective, outcome, output and/or impact?

(b) Quality of the measurement

Are the indicators measuring what they are designed to measure (validity)?
Do the indicators provide the same results when the measurements are repeated (reliability)?

(c) Quality of implementation

Are the financial costs of measuring the indicators worth the information to be collected (cost-effectiveness)?
Are the data collection instruments the most appropriate given the established indicators for measuring the intervention objectives, outcomes, outputs and/or impact (relevancy)? Limited resources (time, personnel and money) can often prevent the use of the most appropriate data collection instruments.

(d) Quality of recognizing the measurement results and their interpretation

To what extent are the measurement results and their interpretation accepted as a basis for decision-making by those involved (credibility)?

Table 4.1 provides a checklist for ensuring good quality measures.

Table 4.1. Checklist for measuring quality
Criteria	Reflection checklist	√
Relevancy	Does it measure what really matters as opposed to what is easiest to measure?
Credibility	Will it provide credible information about the actual situation?
Validity	Does the content of the measure look as if it measures what it is supposed to measure? Will the measure adequately capture what you intend to measure?
Reliability	If data on the measure are collected in the same way from the same source using the same decision rules every time, will the same results be obtained?
Cost-effectiveness	What is the cost associated with collecting and analysing the data? Is the measure cost-effective?

Source: Adapted from Morra Imas and Rist, 2009, p. 293.

RESOURCES

IOM resources

2017b IOM Project Handbook. Second edition. Geneva (Internal link only).

Other resources

Laerd Statistics

n.d. Types of variable.

Organisation for Economic Co-operation and Development (OECD)

2019 Better Criteria for Better Evaluation: Revised Evaluation Criteria Definitions and Principles for Use. OECD/Development Assistance Committee (DAC) Network on Development Evaluation.

Stockmann, R. (ed.)

2011 A Practitioner Handbook on Evaluation. Edward Elgar, Cheltenham and Northampton.

Levels of measurement

The values that a variable takes form a measurement scale, which is used to categorize and/or quantify indicators. They can be nominal, ordinal, interval or ratio scales. The levels of measurement used will determine the kind of data analysis techniques that can or cannot be used.

Nominal scales
Nominal scales consist of unranked categories that represent quality rather than quantity. Any values that may be assigned to categories only represent a descriptive category (they have no inherent numerical value in terms of magnitude). The measurement from a nominal scale can help determine whether the units under observation are different but cannot identify the direction or size of this difference. A nominal scale is used for classification/grouping purposes.

EXAMPLE - Question

Site type

(Select one option)

(a) Host communities

(b) Collective settlement/centre

(d) Camp/Site

(e) Others (specify):_____________________

Ordinal scales
Ordinal scales are an ordered form of measurement, consisting of ranked categories. However, the differences between the categories are not meaningful. Each value on the ordinal scale has a unique meaning, and it has an ordered relationship to every other value on the scale. The measurement from an ordinal scale can help determine whether the units under observation are different from each other and the direction of this difference. An ordinal scale is used for comparison/sorting purposes.

EXAMPLE - Question

How often do you interact with local people?

(a) Every day (5)

(b) A few times per week (4)

(d) A few times per year (2)

(e) Not at all (1)

Since ordinal scales closely resemble interval scales, numerical scores (as illustrated in the above example) are often assigned to the categories. The assignment of numerical scores makes it possible to use more powerful quantitative data analysis techniques than would otherwise be possible with non-numerical data.

Interval scales
Interval scales consist of numerical data that have no true zero point with the differences between each interval being the same regardless of where it is located on the scale. The measurement from an interval scale can help determine both the size and the direction of the difference between units. However, since there is no true zero point, it is not possible to make statements about how many times higher one score is than another (for example, a rating of 8 on the scale below is not two times a rating of 4). Thus, an interval scale is used to assess the degree of difference between values.

EXAMPLE - Question

Compared to your financial situation before leaving, how would you rate your current financial situation?

Ratio scales
Ratio scales consist of numerical data with a true zero point that is meaningful (that is, zero means that something does not exist), and there are no negative numbers on this scale. Like interval scales, ratio scales determine both the absolute size (that is, the distance from the true zero point) and the direction of the difference between units. This measurement also makes it possible to describe the difference between units in terms of ratios, which is not possible with interval scales. Thus, a ratio scale is used to assess the absolute amount of a variable and compare measurements in terms of a ratio.

EXAMPLE - Question

What was your income last month? _________________

Note: An income of USD 20,000 is four times as large as an income of USD 5,000.

Table 4.2. Summary of measurement scales
Scale	Values	Type	What it provides	Examples
Nominal	Discrete	Categorical	Values have no order; frequency; mode	Gender: Male (1); Female (2). Marital status: Married (4); Single (3); Divorced (2); Widowed (1)
Ordinal	Discrete	Categorical	Order of values is known; frequency of distribution; mode; median; mean*	The assistance received was appropriate and timely: Entirely agree (4); Agree (3); Disagree (2); Entirely disagree (1)
Interval	Continuous	Numerical	Order of values is known; frequency of distribution; mode; median; mean; quantify difference between each value; can add or subtract values; no true zero point	Mental health score; political orientation
Ratio	Continuous	Numerical	Order of values is known; frequency of distribution; mode; median; mean; quantify difference between each value; can add or subtract values; can multiply and divide values; has a true zero point	The distance travelled from point of origin to destination; income
* Ordinal scales are often treated in a quantitative manner by assigning scores to the categories and then using numerical summaries, such as the mean and standard deviation.
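The summary in Table 4.2 can be illustrated with a short Python sketch showing which descriptive statistics are meaningful at each level of measurement. This is a minimal illustration under stated assumptions, not an IOM tool: the `summarize` helper, the `ordinal_as_numeric` flag and all response data are hypothetical. The flag mirrors the table footnote, under which ordinal scores are sometimes treated as numbers in order to compute a mean.

```python
import statistics
from collections import Counter

def summarize(values, scale, ordinal_as_numeric=False):
    """Return only the summaries that are meaningful for the given scale."""
    counts = Counter(values)
    summary = {
        "frequency": dict(counts),
        # Most frequent value; ties are resolved by first occurrence.
        "mode": counts.most_common(1)[0][0],
    }
    if scale == "nominal":
        return summary                      # no order, so no median or mean
    summary["median"] = statistics.median(values)
    if scale == "ordinal" and not ordinal_as_numeric:
        return summary                      # order is known, distances are not
    summary["mean"] = statistics.mean(values)
    if scale == "ordinal":
        return summary
    summary["range"] = max(values) - min(values)   # differences are meaningful
    if scale == "ratio":
        # A true zero point makes ratios between values meaningful.
        summary["max_to_min_ratio"] = max(values) / min(values)
    return summary

# Hypothetical responses, one list per level of measurement.
site_type = ["camp", "host community", "camp", "collective centre", "camp"]
agreement = [4, 3, 3, 2, 4, 3, 1]    # 4 = entirely agree ... 1 = entirely disagree
wellbeing = [4, 8, 6, 6, 5]          # rating scale with no true zero point
income = [5000, 20000, 12000, 8000]  # USD, true zero point

for name, values, scale in [
    ("site type", site_type, "nominal"),
    ("agreement", agreement, "ordinal"),
    ("well-being", wellbeing, "interval"),
    ("income", income, "ratio"),
]:
    print(name, scale, summarize(values, scale))
```

Restricting the output by scale is a design choice made for this sketch: it prevents, for example, a mean being reported for a nominal variable whose numeric codes are only labels.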

The most important task of any indicator is to ensure the best possible allocation of the characteristics being measured to the measurement scale. This segregation of the characteristics “and their measurable statistical dispersion (variance) on the scale are the main insights gained because of the indicator (the variables)” (Stockmann, 2011, p. 204).

Sampling

When planning for data collection and thinking of the type of data that will be collected, it is important to assess the target audience from which the data will be collected. A crucial consideration that may influence decision-making is to determine the sample size and sampling strategy to select a representative sample of respondents, as this has budgetary implications.

While at times it may be feasible to include the entire population in the data collection process, at other times this may be neither necessary nor feasible due to time, resource and context-specific constraints, so a sample is selected.

INFORMATION

A population, commonly denoted by the letter N, is comprised of members of a specified group. For example, in order to learn about the average age of internally displaced persons (IDPs) living in an IDP camp in city X, all IDPs living in that IDP camp would be the population.

Because available resources may not allow for the gathering of information from all IDPs living in the IDP camp in city X, a sample of this population will need to be selected. This is commonly denoted by the lowercase letter n. A sample refers to a set of observations drawn from a population. It is a part of the population that is used to make inferences about the whole population and should therefore be representative of it.

Illustration of population (N) versus sample (n)

Sampling is the process of selecting units from a population (that is, a sample) to describe or make inferences about that population (that is, estimate what the population is like based on the sample results).

Questions to ask when sampling
	Who will be the respondents for data collection? How many people will data be collected from? Why is it better to collect data from group A rather than group B? How many responses will need to be collected to make the findings reliable, valid and representative of a larger population or group? How much data will need to be collected to enable an in-depth analysis? Will it be enough to speak to just a few beneficiaries about the results of a particular activity, or will the entire target population be needed? What is the ideal balance between information participants can provide and the number of participants required until the information needed is acquired?

Sampling applies to both qualitative and quantitative monitoring/evaluation methods. Whereas random sampling (also referred to as probability sampling) is often applied when primarily quantitative data collection tools are used for monitoring/evaluation purposes, non-random sampling (also referred to as non-probability or purposeful sampling) tends to be applied to monitoring/evaluation work that relies largely upon qualitative data (adapted from Trochim, 2020a and Lærd Dissertation, n.d.).

Properly selecting a sample, ideally at random, can reduce the chances of introducing bias in the data, thereby enhancing the extent to which the gathered data reflects the status quo of an intervention. Bias is any process at any stage in the design, planning, implementation, analysis and reporting of data that produces results or conclusions that differ systematically from the truth (adapted from Sackett, 1979). For more information on the types of bias, see Annex 4.3. Types of bias.

EXAMPLE

Country Y has a site hosting 1,536 IDPs; this is the entire population (N).

IOM is implementing several activities, alongside other humanitarian actors, to address the needs of the IDPs sheltering at this site. You are interested in monitoring/evaluating these activities. In particular, you are trying to capture the views of an average person benefiting from this intervention.

Due to time and budget constraints, it is impossible to survey every IDP benefiting from IOM services. Therefore, you pick a sample (n) that represents the overall view of the 1,536 IDPs benefiting from the intervention. Given the available resources, the representative sample for the target population in this case was chosen to be 300.

Figure 4.6. Illustration of example
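For readers who wish to relate a sample size to a level of statistical precision, the following Python sketch applies a commonly used sample size formula for proportions (Cochran's formula with a finite population correction) to the population in the example. This calculation is not part of the example above, where the sample of 300 was chosen in light of available resources. The 95 per cent confidence level (z = 1.96), the 5 per cent margin of error, the assumed proportion of 0.5 and the function names are all assumptions made for illustration.

```python
import math

def sample_size(population, margin=0.05, z=1.96, proportion=0.5):
    """Cochran's formula for a proportion, with finite population correction."""
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

def margin_of_error(population, sample, z=1.96, proportion=0.5):
    """Approximate margin of error for a simple random sample of a given size."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample)
    # The correction shrinks the error when the sample is a large share of N.
    correction = math.sqrt((population - sample) / (population - 1))
    return z * standard_error * correction

N = 1536  # IDPs at the site in the example
n = 300   # sample size chosen in the example

print("Sample size for a 5% margin of error:", sample_size(N))
print("Approximate margin of error with n = 300:", round(margin_of_error(N, n), 3))
```

Such a calculation only holds if respondents are selected at random; it says nothing about non-sampling errors such as non-response or poorly worded questions.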

Random sampling

Random sampling is an approach to sampling used when a large number of respondents is required and where the sample results are used to generalize about an entire target population. In other words, to ensure that the sample really represents the larger target population, and does not only reflect the views of a very small group within it, individuals are chosen at random. Random sampling is an effective method to avoid sampling bias.

True random sampling requires a sampling frame, which is a list of the whole target population from which the sample can be selected. This is often difficult to apply. As a result, other random sampling techniques exist that do not require a full sampling frame (systematic, stratified and clustered random sampling).

Table 4.3. Summary of types of random sampling
Types of random sampling	Definition	Purpose	Advantages	Disadvantages
Simple random sampling	Simple random sampling is a technique where each member of the population has an equal chance of being selected as a subject.	When the target population is small, homogeneous and easily accessible	High degree of representativeness of the target population	Time consuming and expensive; requires a sampling frame; results can vary considerably if the target population is very heterogeneous; difficult to do for large/dispersed populations; small subpopulations of interest may not be present in the sample in sufficient numbers
Systematic random sampling	Systematic random sampling is a technique that randomly selects a name near the beginning of the sampling frame list, skips several names, selects another name, skips several more names, selects the next name, and so on. The number of names skipped at each stage depends on the desired sample size.		High degree of representativeness of the target population
Stratified random sampling	Stratified random sampling divides the sampling frame into two or more strata (subpopulations) according to meaningful characteristics, such as type of migrant or gender, from which participants are then randomly selected.	When the population is heterogeneous and contains several different subpopulations, some of which are of interest for the monitoring/evaluation exercise	High degree of representativeness of the subpopulations in the target population	Time consuming and expensive; more complex than simple and systematic random sampling; strata must be carefully defined
Cluster random sampling	Cluster random sampling divides the population into many clusters (such as neighbourhoods in a city) and then takes a simple random sample of the clusters. The units in each cluster constitute the sample.	When both the target population and the desired sample size are large	Easy and convenient; can select a random sample when the target population sampling frames are very localized	Clusters may not be representative of the target population; important subpopulations may be left out; statistical analysis more complicated
Multistage random sampling	Multistage random sampling combines two or more of the random sampling techniques sequentially (such as starting with a cluster random sample, followed by a simple random sample or a stratified random sample).	When a sampling frame does not exist or is inappropriate	Multiple randomizations; can select a random sample when the target population lists are very localized	Can be less expensive, but more complex than cluster sampling
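
The techniques summarized in Table 4.3 can be illustrated with a short Python sketch. This is a minimal illustration rather than a prescribed IOM procedure: the function names, the numbered sampling frame, the `stratum_of` helper and the zone labels are all assumptions introduced for the example, and proportional allocation is only one possible way of dividing a stratified sample across strata.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Simple random sampling: every unit has an equal chance of selection."""
    return random.Random(seed).sample(frame, n)


def systematic_random_sample(frame, n, seed=None):
    """Systematic sampling: random start, then every k-th unit on the list."""
    rng = random.Random(seed)
    k = len(frame) // n           # sampling interval, set by the desired sample size
    start = rng.randrange(k)      # random starting point within the first interval
    return [frame[start + i * k] for i in range(n)]


def stratified_random_sample(frame, stratum_of, n, seed=None):
    """Stratified sampling: simple random sample inside each stratum.

    Allocation is proportional to stratum size (an assumption of this sketch);
    because of rounding, the total may differ slightly from n.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample


def cluster_random_sample(clusters, n_clusters, seed=None):
    """Cluster sampling: randomly pick whole clusters and keep all their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


# Hypothetical sampling frame: registration numbers 1..1536 for the IDP site.
frame = list(range(1, 1537))
print(len(simple_random_sample(frame, 300, seed=1)))
print(systematic_random_sample(frame, 300, seed=1)[:3])

# Hypothetical clusters: three zones of the site, each listing its residents.
zones = {"zone A": [1, 2, 3], "zone B": [4, 5], "zone C": [6, 7, 8, 9]}
print(cluster_random_sample(zones, 2, seed=1))
```

Passing a fixed `seed` makes the selection reproducible, which can help when the sampling procedure needs to be documented or audited.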

Non-random/Purposeful sampling

Figure 4.7. Non-random sample

Non-random sampling is used where:

Large number of respondents are not required;
The research is exploratory;
Qualitative methods are used;
Access is difficult;
The population is highly dispersed.

Non-random/purposeful sampling is appropriate when there is a small “n” study, the research is exploratory, qualitative methods are used, access is difficult or the population is highly dispersed. For further information as to when it is appropriate to use non-random sampling, see Patton (2015) and Daniel (2012). The chosen sampling technique will depend on the information needs, the methodology (quantitative or qualitative) and the data collection tools that will be required.

Table 4.4. Summary of most common types of non-random/purposeful sampling techniques
Types of non-random sampling	Definition	Purpose	Advantages	Disadvantages
Purposeful sampling	Purposeful sampling selects individuals from the target population according to a set of criteria.	When the sample needs to fulfil a purpose	Ensures balance of group sizes when multiple groups are to be selected; sample guaranteed to meet specific criteria	Sample not easily defensible as being representative of the target population due to potential researcher bias
Snowball sampling	Snowball sampling makes contact with an individual from the target population, who then gives names of further relevant persons to contact from the target population.	When individuals from the target population are difficult to get in contact with	Possible to include individuals of groups for which no sampling frame or identifiable clusters exist	Difficult to know whether the sample is representative of the target population
Quota sampling	Quota sampling selects individuals from categories or subpopulations in direct proportion to their existence in the target population.	When strata are present in the target population, but stratified sampling is not possible	Ensures selection of adequate numbers of individuals from the target population with the appropriate characteristics	Need a good understanding of the target population; quota sample may be unrepresentative
Convenience sampling	Convenience sampling asks a set of individuals from the target population who just happen to be available.	When individuals of the target population are convenient to sample	Easy and inexpensive way to ensure sufficient numbers for a monitoring/evaluation exercise	Likely unrepresentative sample; cannot generalize to target population

Note: While the table shows the most common types of non-random/purposeful sampling, further types of non-random/purposeful sampling can be found in Patton, 2015.
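
Quota sampling, as defined in Table 4.4, selects individuals in direct proportion to their presence in the target population. The sketch below shows one way such quotas could be computed. The function name, the group labels and the population figures are hypothetical, and the largest-remainder rounding rule is an assumption chosen so that the quotas add up exactly to the desired sample size.

```python
def proportional_quotas(population_counts, n):
    """Split a sample of size n across groups in proportion to group size."""
    total = sum(population_counts.values())
    exact = {g: n * c / total for g, c in population_counts.items()}
    quotas = {g: int(v) for g, v in exact.items()}   # round each quota down
    leftover = n - sum(quotas.values())
    # Hand the remaining places to the groups with the largest remainders.
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas


# Hypothetical breakdown of a target population of 1,536 people by zone.
print(proportional_quotas({"zone A": 800, "zone B": 600, "zone C": 136}, 300))
```

Unlike stratified random sampling, the individuals who fill each quota are not chosen at random, so the limitations of non-random sampling still apply.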

Limitations of non-random/purposeful sampling

There are several limitations when using non-random/purposeful samples, especially convenience and snowball samples. First, generalizations to the entire target population cannot be made. Second, statistical tests for making inferences cannot be applied to quantitative data. Finally, non-random samples can be subject to various biases that are reduced when the sample is selected at random. If using a non-random sample, M&E practitioners should ask the following: “Is there something about this particular sample that might be different from the population as a whole?” If the answer is affirmative, the sample may lack representation from some groups in the population. Presenting demographic characteristics of the sample can provide insight as to how representative it is of the target population from which the sample was drawn.
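
As a rough illustration of how demographic characteristics can be used to judge representativeness, the following sketch compares the share of each group in the sample with its share in the target population. The function name and all figures are hypothetical, and what counts as an acceptable gap is a judgement call that this sketch does not settle.

```python
def share_gaps(sample_counts, population_counts):
    """Percentage-point gap between sample share and population share, per group."""
    n = sum(sample_counts.values())
    total = sum(population_counts.values())
    return {
        group: round(
            100 * sample_counts.get(group, 0) / n
            - 100 * population_counts[group] / total,
            1,
        )
        for group in population_counts
    }


# Hypothetical figures: the population is evenly split, the sample is not.
population = {"women": 768, "men": 768}
sample = {"women": 30, "men": 70}
print(share_gaps(sample, population))  # negative values mean under-represented
```

A large negative gap for a group suggests that its views may be missing from the findings and that this should be reported as a limitation.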

Table 4.5. Non-random/Purposeful versus random sampling
Non-random/Purposeful sampling	Random sampling
Sample selection is based on the subjective judgement of the researcher	Sample is selected at random
Subjective method	Objective method
Analytical inference	Statistical inference
Not everyone from the population has an equal chance of getting selected	Everyone in the population has an equal chance of getting selected
Sampling bias may not be considered	Useful to reduce sampling bias
Useful when the population has similar traits	Useful when the population is diverse
Sample does not accurately represent the population	Useful to create an accurate sample
Finding the right respondents is easy	Finding the right respondents can prove challenging
Exploratory findings	Conclusive findings

Source: Adapted from Sheppard, 2020.

TIP

Regardless of which sampling approach and technique you decide to use, it is important that you are clear about your sample selection criteria, procedures and limitations.

RESOURCES

Resources for random sampling and non-random/purposeful sampling are provided in Annex 4.4. Applying types of sampling.

Daniel, J.

2012 Sampling Essentials: Practical Guidelines for Making Sampling Choices. SAGE Publications, Thousand Oaks.

Lærd Dissertation

n.d. Purposive sampling.

Patton, M.Q.

2015 Qualitative Research and Evaluation Methods. Fourth edition. SAGE Publications, Thousand Oaks.

Sackett, D.L.

1979 Bias in analytic research. Journal of Chronic Diseases, 32:51–63.

Sheppard, V.

2020 Chapter 7: Sampling techniques. In: Research Methods for Social Sciences: An Introduction. Pressbooks.

Stockmann, R. (ed.)

2011 A Practitioner Handbook on Evaluation. Edward Elgar, Cheltenham and Northampton.

Trochim, W.M.K.

2020a Nonprobability sampling. Research Methods Knowledge Base.
2020b Probability sampling. Research Methods Knowledge Base.

Determining sample size

What are you trying to measure?
What is the purpose?
How will you measure it?

The size of the sample will be determined by what will be measured, for what purpose and how it will be measured. The size of the sample will also need to ensure, with the maximum level of confidence possible, that an observed change or difference between groups is the result of the intervention, rather than a product of chance. However, this may not always be the case for non-random/purposeful sampling.

Determining sample size: Random sampling

When a large number of respondents is required, the appropriate sample size is decided by considering the confidence level and the sampling error.

Table 4.6. Confidence level and sampling error

Confidence level	Sampling error
How confident should the person collecting data be in the sample results and their accuracy in reflecting the entire population? Generally, the confidence level is set at 95 per cent, that is, there is a 5 per cent chance that the results will not accurately reflect the entire population. In other words, if a survey is conducted and it is repeated multiple times, the results would match those from the actual population 95 per cent of the time. In order to be 99 per cent confident, the sample size must be larger than it would need to be to achieve a 90 per cent confidence level. Increasing the confidence level requires increasing the sample size.	It is important to determine how precise estimates should be for the purpose of data collection. This is the sampling error or margin of error. The sampling error or margin of error is the estimate of error that arises when data is gathered on a sample rather than the entire population. A sampling error or margin of error occurs when a sample is selected that does not represent the entire population.

EXAMPLE - Confidence level and sampling error

IOM is currently implementing a livelihoods project in region M of country Y. A poll is taken in region M, which reveals that 62 per cent of the people are satisfied with the activities organized through the livelihoods project and 38 per cent of those surveyed are not satisfied with the assistance received.

The M&E officer responsible for data collection in this case has decided that the sampling error for the poll is +/- 3 percentage points. This means that if everyone in region M were surveyed, between 59 (62 - 3) and 65 (62 + 3) per cent would be satisfied and between 35 (38 - 3) and 41 (38 + 3) per cent would not be satisfied with the assistance received, at the 95 per cent confidence level. The range given by plus or minus 3 percentage points is called the confidence interval, which is the range within which the true population value lies with a given probability (that is, the 95 per cent confidence level). In other words, the width of the confidence interval indicates how certain or uncertain we are about the true figure in the population. When the confidence interval and confidence level are put together, the result is a range of percentages, here 59 to 65 per cent satisfied, reported with 95 per cent confidence.
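
The relationship between sample size, confidence level and margin of error in this example can be sketched in Python using the standard normal approximation for a proportion. The text does not state how many people were polled, so the sample size of 1,006 respondents used below is purely an assumption, chosen because it produces a margin of roughly 3 percentage points. The function names are also illustrative.

```python
import math

# Standard normal critical values for common confidence levels.
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def margin_of_error(p, n, confidence=0.95):
    """Margin of error for an observed proportion p from a simple random sample of n."""
    z = Z_SCORES[confidence]
    return z * math.sqrt(p * (1 - p) / n)


def confidence_interval(p, n, confidence=0.95):
    """Lower and upper bounds of the confidence interval around p."""
    moe = margin_of_error(p, n, confidence)
    return p - moe, p + moe


# 62 per cent satisfied; n = 1006 is an assumed (hypothetical) number of respondents.
low, high = confidence_interval(0.62, 1006)
print(f"{low:.1%} to {high:.1%}")
```

For the same sample, a higher confidence level gives a wider interval, which is why more respondents are needed to keep the margin of error fixed at a higher confidence level.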

RESOURCES - Online sample size calculator

A number of tools are available online to help calculate the sample size needed for a given confidence level and margin of error. Two useful tools are the Survey System Sample Size Calculator and the Population Proportion – Sample Size Calculator.

EXAMPLE - How to calculate the sample size using an online calculator

At the IDP site in country Y, there are 1,536 IDPs. You would like to make sure that the sample you select is adequate. You decide that having 95 per cent confidence in the sample results with a margin of error of 5 per cent is acceptable. The calculator tells you that, for this population and this level of accuracy and precision, you need a sample size of 307 IDPs to be able to generalize to the entire population of IDPs at the site.
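
The figure returned by such a calculator can be approximated with the commonly used formula for estimating a proportion, combined with a finite population correction. The sketch below is an assumption about how online calculators typically work, not a description of any specific tool. It also assumes the most conservative case in which 50 per cent of the population holds the view being measured (`p=0.5`).

```python
# Standard normal critical values for common confidence levels.
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def sample_size(population, margin, confidence=0.95, p=0.5):
    """Approximate sample size needed to estimate a proportion in a finite population."""
    z = Z_SCORES[confidence]
    n0 = z ** 2 * p * (1 - p) / margin ** 2      # size for a very large population
    return n0 / (1 + (n0 - 1) / population)      # finite population correction


# 1,536 IDPs, 95 per cent confidence level, 5 per cent margin of error.
n = sample_size(1536, 0.05)
print(round(n))  # 307, matching the figure in the example
```

Rounding the result up instead of to the nearest whole number would give 308, which is a slightly more conservative choice. Calling the function with `confidence=0.99` returns a larger sample size, consistent with the point that increasing the confidence level requires increasing the sample size.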


For a study that requires a small number of participants, selecting small random samples can give highly misleading estimates of the target population. Therefore, non-random sampling is more appropriate.

Determining sample size: Non-random/purposeful sampling

For non-random/purposeful sampling, data saturation, the point at which additional data collection no longer yields new information, indicates whether an adequate sample has been reached. Once this point is reached, no more data needs to be collected. However, because there is little guidance on how many interviews are needed to reach saturation, this point can sometimes be difficult to identify.

The following questions can help determine how many people to include in the sample to achieve both data saturation and credibility:

Should all population segments be included in the sample?
Should people with diverse perspectives be included in the sample?
Should the findings be triangulated (see section on “Triangulation”)?

Methods, approaches and tools for monitoring and evaluation

Once data collection has been planned and data sources and sampling have been established, it is time to focus on approaches and methods for designing the data collection tools. The indicators in the Results Matrix, as well as the evaluation criteria and related questions, will determine the approach and tools that will be used to collect the necessary data for monitoring progress/evaluating the intervention.

Time and budget constraints, as well as ethical or logistical challenges, will inform the data collection approach and tools used. The use of multiple tools for gathering information, also known as the triangulation of sources, can increase the accuracy of the information collected about the intervention. For instance, if the intervention is managed remotely due to lack of access to the field and relies upon data collection teams, triangulating the information remotely is a crucial quality check mechanism.

While triangulation is ideal, it can also be very expensive. In general, M&E practitioners use a combination of surveys, interviews, focus groups and/or observations. Studies that use only one tool are more vulnerable to biases linked to that particular method.

Methods for and approaches to data collection are systematic procedures and useful to support the process of designing data collection tools. Generally, a mixture of qualitative and quantitative methods and approaches to data collection are used for M&E. Although there are multiple definitions for these concepts, quantitative methods and approaches can be viewed as being based on numerical data that can be analysed using statistics. They focus on pinpointing what, where, when, how often and how long something occurs and can provide objective, hard facts, but cannot explain why something occurs. Qualitative methods and approaches for data collection are based on data that are descriptive in nature, rather than data that can be measured or counted. Qualitative research methods can use descriptive words that can be examined for patterns or meaning and, therefore, focus on why or how something occurs.

The following provides an overview of when a quantitative and/or qualitative approach, and corresponding tools for collecting monitoring and/or evaluation data should be used:

	Table 4.7. Quantitative versus qualitative approaches for monitoring and evaluation
	Quantitative approach	Qualitative approach
What	Structured; emphasizes reliability; harder to develop; easier to analyse	Less structured; emphasizes validity; easier to develop; can provide “rich data” but is more labour intensive to collect and analyse
Why	Want to count things to explain what is observed; want to generalize to entire target population; want to make predictions/provide causal explanations; know what you want to measure	Want complete, detailed description of what is observed; want to understand what is observed; want narrative or in-depth information; not sure what you are able to measure; want to attain a more in-depth understanding or insight
Tools	Surveys; interviews; observations	Surveys; interviews; focus group discussions; case studies; observations
Sample	Large-n (sample) that is representative of the target population; respondents selected using some form of random sampling	Small-n (sample) that is unrepresentative of the target population; respondents usually selected according to their experience
Output	Numerical data	Words and pictures
Analysis	Statistical	Interpretive

Source: Adapted from Morra-Imas and Rist, 2009.

The following graphic provides an overview of data collection methods for both monitoring and evaluation.

Frequently used data collection methods
Surveys	Interviews	Focus group discussions	Case studies	Observation
Additional data collection methods
Brainstorming	Strengths, weaknesses, opportunities and threats (SWOT)	Dreams realized or visioning (DR/V)	Drama and role plays	Photos and videos	Geographic information system (GIS) mapping

Surveys

Surveys are a common technique for collecting data. Surveys can collect focused, targeted information about a sample taken from the target population for a project, programme or policy, especially data about perceptions, opinions and ideas. While surveys can also be used to measure intended behaviour, there is always room for interpretation, and any data gathered may be less “factual” as what people say they (intend to) do may not reflect what they in fact do in reality.

Generally, a survey is conducted with a relatively large sample that is randomly selected so that the results reflect the larger target population (see section on Sampling). The format of the survey can be structured or semi-structured, depending on the purpose of the data collection (see Table 4.8) and be implemented on a one-time basis (cross-sectional) or over a period of time (longitudinal).

Cross-sectional surveys are used to gather information on the target population at a single point in time, such as at the end of a project. This survey format can be used to determine the relationship between two factors, for example, the impact of a livelihoods project on the respondent’s level of knowledge for establishing an income-generating activity.

Longitudinal surveys gather data over a period of time, allowing for an analysis of changes in the target population over time, as well as the relationship between factors over time. There are different types of longitudinal surveys, such as panel and cohort studies. Both panel and cohort studies are approaches to the design of longitudinal studies. Cohort studies follow people identified by specific characteristics in a defined time period, whereas panel studies aim to cover the whole population (Lugtig and Smith, 2019).

Table 4.8. Structured versus semi-structured surveys

	Structured	Semi-structured
Content	Closed-ended questions with a predetermined set of response options. Each respondent is asked the same questions in the same way and is given the same response options.	A mixture of closed- and open-ended questions with some predetermined set of response options. Each respondent is asked the same questions in the same way; however, for open-ended questions, they are not provided with a predetermined set of response options.
Purpose	Aggregate and make comparisons between groups, and/or across time, on issues about which there is already a thorough understanding.	Acquire an in-depth understanding of the issues that are being monitored and/or evaluated.

RESOURCES

For more information about the different types, design and implementation of longitudinal surveys, see the following:

IOM resource

2019a Post training completion evaluation form (Internal link only).

Other resources

Lugtig, P. and P.A. Smith

2019 The choice between a panel and cohort study design.

Lynn, P. (ed.)

2009 Methodology of Longitudinal Surveys. Wiley, West Sussex.

Morra-Imas, L.G. and R.C. Rist

2009 The Road to Results: Designing and Conducting Effective Development Evaluations. World Bank, Washington, D.C.

(Kindly note that this can further be adapted as needed.)

Surveys can be administered in different ways, such as in-person interviews, phone interviews or as paper or online questionnaires that require participants to write their answers.


For more information on how to design and implement a survey, see Annex 4.5. Survey design and implementation and Annex 4.6. Survey example.

Interviews

Interviews are a qualitative research technique used to shed light on subjectively lived experiences of, and viewpoints from, the respondents’ perspective on a given issue, or sets of issues, that are being monitored or evaluated for a given intervention. Interviews provide opportunities for mutual discovery, understanding, reflection and explanation. Interviews are of three types: (a) structured; (b) semi-structured; and (c) unstructured. Table 4.9 provides an overview of each interview approach, when to use it and some examples.

	Table 4.9. Types of interviews
	Structured	Semi-structured	Unstructured
What is it?	Mostly closed-ended questions. All respondents are asked the same questions in the same order. No probing beyond the set of questions.	A mixture of closed- and open-ended questions. Can leave certain questions out, mix the order of questions or ask certain standard questions in different ways depending on the context. Allows for probes and clarifications beyond the initial pre-established set of questions.	No predetermined questions and response options. Open conversation guided by a central topic area or theme (such as respondent’s life) and lets the respondent guide the interview. Allows for probes and clarifications.
When to use it?	When there is already a thorough understanding about one or more complex issues being monitored/evaluated. When comparable data is desired/needed.	To obtain an in-depth understanding about one or more complex issues being monitored and/or evaluated. When there is less need for comparable data.

INFORMATION - Formulating interview questions

Good-quality interview questions should have the following characteristics:

Simple and clear, without acronyms, abbreviations or jargon;
Not double-barrelled, that is, they do not touch on more than one subject while allowing for only one answer;
Favour open-ended and elaborate answers. If including yes/no questions, these should be followed by requests for further explanation (“Why?”, “In what ways?”), or they should be reworded to encourage a more fine-grained answer;
Straightforward (no double negatives), neutral and non-leading;
Non-threatening and non-embarrassing to the interviewee;
Accompanied by appropriate probes. Probes are responsive questions asked to clarify what has been raised by the respondent. The aim is to obtain more clarity, detail or in-depth understanding from the respondent on the issue(s) being monitored/evaluated. For more information, see Annex 4.7. Interview structure and questions.

To know more about interviews, examples of interview structure and probing, see Annex 4.7. Interview structure and questions (examples provided throughout the annex) and Annex 4.8. Interview example.

Focus group discussions

A focus group is another qualitative research technique in the form of a planned group discussion among a limited number of people, with a moderator and, if possible, note takers, as well as observers if also using observations. Usually, focus group discussions should not exceed 15 participants. For more participants, community group interview techniques may be used. The purpose of a focus group is to attain diverse ideas and perceptions on a topic of interest in a relaxed, permissive environment that allows the expression of different points of view, with no pressure for consensus.

Focus groups are also used to acquire an in-depth understanding about a topic or issue, which is generally not possible using a survey. For instance, a survey can tell you that 63 per cent of the population prefers activity Y, but a focus group can reveal the reasons behind this preference. Focus groups can also help check for social desirability bias, which is the tendency among survey respondents to answer what they think the enumerator wants to hear, rather than their actual opinions. For example, during the focus group discussion, one may discover that the actual preference of the participants is activity Z, not activity Y, as per their responses to the survey. However, focus groups provide less of an opportunity to generate detailed individual accounts on the topic or issue being explored. If this type of data is required, one should use interviews instead.

If one participant answers too often, it is important to identify whether this behaviour intimidates other participants and to moderate the discussion by inviting others to contribute. It is also important to understand who that person is, for instance, a political leader trying to impose answers on the group.


To know more about focus group discussions, see Annex 4.9. Preparing, conducting and moderating a focus group and Annex 4.10. IOM example of a focus group discussion guide.

Case study

A case study is a qualitative data collection method that is used to examine real-life situations and to determine whether the findings of the case can illustrate aspects of the intervention being monitored and/or evaluated. It is a comprehensive examination of cases to obtain in-depth information, with the goal of understanding the operational dynamics, activities, outputs, outcomes and interactions of an intervention.

Case studies involve a detailed contextual analysis of a limited number of events or conditions and their relationships. They provide the basis for the application of ideas and extension of methods. Data collected using a case study can help understand a complex issue or object and add strength to what is already known.

A case study is useful to explore the factors that contribute to outputs and outcomes. However, this method of data collection may require considerable time and resources, and information obtained from case studies can be complex to analyse and extrapolate.

RESOURCES

For further information on case studies and how to conduct them, please see the following:

Gerring, J.

2007 Case Study Research: Principles and Practices. Cambridge University Press, New York.

Neuman, W.L.

2014 Social Research Methods: Qualitative and Quantitative Approaches. Seventh edition. Pearson Education Limited, Essex.

Observation

Observation is a research technique that M&E practitioners can use to better understand participants’ behaviour and the physical setting in which a project, programme or policy is being implemented. To observe means to watch individuals and their environments and notice their behaviours and interactions by using all five senses: seeing, touching, tasting, hearing and smelling.

Observations should be used in the following situations:

When gathering data on individual behaviours or interactions between people and their environment;
When there is a need to know about a physical setting;
When data collection from interviews/surveys with individuals is not feasible (CDC, 2018).

Observations can be conducted in a structured, semi-structured or unstructured approach.

	Table 4.10. Overview of observation approaches
	Structured	Semi-structured	Unstructured
What	Looking for a specific behaviour, object or event	Looking for a specific behaviour, object or event, how they appear or are done, and what other specific issues may exist	Looking at how things are done and what issues exist without limiting it to a specific behaviour, object or event
Why	Collect information about the extent to which particular behaviours or events occur, with information about the frequency, intensity and duration of the behaviours	Collect information about the extent to which and why particular behaviours or events occur without predetermined criteria, such as frequency, intensity or duration	Observe and understand behaviours and events in their physical and sociocultural context without predetermined intent or criteria
How	A set of closed-ended questions and/or a checklist to function both as a reminder and a recording tool	A set of closed-ended and open-ended questions and/or checklist	A set of open-ended questions and/or issues that will be answered/examined based on observations

For more information, tips on and examples of observations, as well as planning and conducting observations, see Annex 4.11. Examples of observations and planning and conducting observations.

Additional methods for data collection for monitoring and evaluation

Additional data collection methods (adapted from IFAD, 2002)
Method	Definition
Brainstorming	Brainstorming means to gain many ideas quickly from a group without delving into a deeper and more detailed discussion. It encourages critical and creative thinking, rather than simply generating a list of options, answers or interests. From an M&E perspective, this method is often a first step in a discussion that is followed by other methods.
Drama and role plays	Drama and role plays are used to encourage groups of people to enact scenes from their lives concerning perceptions, issues and problems that have emerged relating to a project intervention, which can then be discussed. Drama can also help a group to identify what indicators would be useful for monitoring or evaluation and identify changes emerging from a project intervention.
DR/V	DR/V serves the purpose of understanding people’s dreams or shared visions for the future of an intervention by means of a focused discussion. This is a good method for identifying indicators, understanding if primary stakeholders feel that their well-being is increasing or not and helping stakeholders reflect on the relevance of the intervention based on people’s visions for development.
GIS mapping	Using computer-based GIS that represents geographic coordinates in a very precise map can help present information relating to changes in geographical, social or developmental indicators. From an M&E perspective, GIS can help to analyse complex data collected, as the various thematic layers of spatial information can be overlaid for easy examination of relationships between the different themes.
Photographs and videos	This data collection method helps track changes across a series of sequenced photographs or videos. From an M&E perspective, it helps focus on specific indicators or performance questions, or can be more open-ended if needed; for instance, when asking stakeholders to document/assess change from their perspective.
SWOT analysis	The purpose of a SWOT analysis is to identify the strengths, weaknesses, opportunities and threats in relation to an intervention or group, and how such an assessment may change over time. This method is useful for qualitative assessments, such as the services provided by the implementation and relationships between relevant stakeholders involved.

INFORMATION - Methods for impact evaluations

Impact evaluations aim to identify a proper counterfactual and whether impact can be confidently attributed to an intervention (the following information is adapted from IFAD, 2015 and from BetterEvaluation, n.d.). Specifically, this may be done by assessing the situation of the beneficiaries “before and after” and “with or without” the intervention. By comparing the before and after and/or with or without scenarios, any differences/changes observed can be attributed to the intervention, with some reservations: attribution is not always straightforward and may be more complex to assess than these comparisons suggest.

A common first step in impact evaluation is to determine the sample size and sampling strategy to select a representative sample from both the treatment group (participating in the intervention) and comparison group (not participating in the intervention). The calculation of a robust and representative sample depends on various factors.

While there is a range of impact evaluation designs, there is also a range of methods that are applicable within these designs (the following information is adapted from UNEG, 2013). To answer the specific evaluation questions, methods are flexible and can be used in different combinations within impact evaluation designs. Experimental, quasi-experimental and non-experimental are three types of impact evaluation design.

Experimental methods

Experimental methods, also called randomized controlled trials, use randomization techniques at the outset of the intervention to sample both intervention and comparison groups. A randomized controlled trial is an experimental form of impact evaluation in which the population receiving the intervention (intervention group) is chosen at random from the eligible population, and a control group (not receiving intervention) is also chosen at random from the same eligible population. Both groups are chosen randomly and have equal chance of participation (see White et al., 2014). While there are different methods to randomize a population, a general requirement is that the two groups remain as similar as possible in terms of socioeconomic characteristics and that their size should be broadly equivalent. Ensuring these conditions makes the groups comparable and maximizes the statistical precision of the impact estimate for the target group (IFAD, 2002).

Given the rigorous approach to selecting treatment and control groups, as well as the frequency of primary data collection for generating the required data sets, experimental methods are considered the most robust for assessing and attributing impact to an intervention. However, they have cost and time implications, and might raise ethical considerations (given the purposive exclusion of a group of people from project benefits) that need to be dealt with upfront. Methods of fairly selecting participants include using a lottery, phasing in an intervention and rotating participants through the intervention to ensure that everyone benefits.
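As an illustration of the lottery approach mentioned above, the following is a minimal Python sketch of random assignment. The household identifiers, the function name assign_by_lottery and the seed value are hypothetical and chosen only for the example; an actual evaluation would follow the randomization protocol agreed in its design.

```python
import random

def assign_by_lottery(eligible_ids, seed=None):
    """Randomly split eligible units into two groups of (near-)equal size."""
    rng = random.Random(seed)        # a fixed seed makes the draw reproducible
    shuffled = list(eligible_ids)    # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

# Hypothetical list of ten eligible households
eligible = [f"HH{i:03d}" for i in range(1, 11)]
intervention, control = assign_by_lottery(eligible, seed=2024)

print("Intervention group:", intervention)
print("Control group:", control)
```

Recording the seed alongside the list of eligible units is one way to allow the draw to be re-run and checked later.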

Quasi-experimental methods

Quasi-experimental designs identify a comparison group that is as similar as possible to the intervention group in terms of pre-intervention characteristics, with the key difference that quasi-experimental design lacks random assignment (White and Sabarwal, 2014). The main quasi-experimental approaches are pre-post, simple difference, double difference (difference-in-differences), multivariate regression, propensity score matching and regression discontinuity design (see Table 4.10 for definitions) (White and Raitzer, 2017).

Non-experimental methods

In non-experimental methods used in ex-post impact evaluations, the participants and the comparison groups are not selected randomly prior to the intervention; instead, the comparison group is reconstructed ex post, that is, at the time of the evaluation. To determine ex-post changes that may have occurred as a result of the intervention, impact evaluations using non-experimental methods conduct at least two complementary analyses: “before and after” and “with or without”.

Non-experimental methods are often considered if the decision to do an impact evaluation is taken after the intervention has taken place (ibid.).

A variety of methods are used in non-experimental design to ensure that the participant and comparison groups are as similar as possible and to minimize selection bias. These can include (propensity) score matching, regression discontinuity design, difference-in-differences and instrumental variables. Gertler et al. (2011) provide an exhaustive description of non-experimental methods. A description of the different techniques is found in the following table.

Table 4.11. Quasi and non-experimental methods

Methodology	Description	Who is in the comparison group?	Required assumptions	Required data
Pre-post	Measure how programme participants improved (or changed) over time.	Programme participants themselves – before participating in the programme.	The programme was the only factor influencing any changes in the measured outcome over time.	Before and after data for programme participants.
Simple difference	Measure difference between programme participants and non-participants after the programme is completed.	Individuals who didn’t participate in the programme (for any reason), but for whom data were collected after the programme.	Non-participants are identical to participants except for programme participation, and were equally likely to enter the programme before it started.	“After” data of the before-and-after scenario for programme participants and non-participants.
Difference-in-differences	Measure improvement (change) over time of programme participants relative to the improvement (change) of non-participants.	Individuals who didn’t participate in the programme (for any reason), but for whom data were collected both before and after the programme.	If the programme didn’t exist, the two groups would have had identical trajectories over this period.	Before and after data for both participants and non-participants.
Multivariate regression	Individuals who received treatment are compared with those who did not, and other factors that might explain differences in the outcomes are “controlled” for.	Individuals who didn’t participate in the programme (for any reason), but for whom data were collected both before and after the programme. In this case, data comprise not just indicators of outcomes, but other “explanatory” variables as well.	The factors that were excluded (because they are unobservable and/or have not been measured) do not bias results because they are either uncorrelated with the outcome or do not differ between participants and non-participants.	Outcomes as well as “control variables” for both participants and non-participants.
Statistical matching	Individuals in control group are compared to similar individuals in experimental group.	Exact matching: For each participant, at least one non-participant who is identical on selected characteristics. Propensity score matching: Non-participants who have a mix of characteristics, which predict that they would be as likely to participate as participants.	The factors that were excluded (because they are unobservable and/or have not been measured) do not bias results, because they are either uncorrelated with the outcome or do not differ between participants and non-participants.	Outcomes, as well as “variables for matching” for both participants and non-participants.
Regression discontinuity design	Individuals are ranked based on specific, measurable criteria. There is some cut-off that determines whether an individual is eligible to participate. Participants are then compared to non-participants and the eligibility criterion is controlled for.	Individuals who are close to the cut-off, but fall on the “wrong” side of that cut-off, and therefore do not get the programme.	After controlling for the criteria (and other measures of choice), the remaining differences between individuals directly below and directly above the cut-off score are not statistically significant and will not bias the results. A necessary but not sufficient requirement for this to hold is that the cut-off criteria are strictly adhered to.	Outcomes, as well as measures on criteria (and any other controls).
Instrumental variables	Participation can be predicted by an incidental (almost random) factor, or “instrumental” variable, that is uncorrelated with the outcome, other than the fact that it predicts participation (and participation affects the outcome).	Individuals who, because of this close-to-random factor, are predicted not to participate and (possibly as a result) did not participate.	If it weren’t for the instrumental variable’s ability to predict participation, this “instrument” would otherwise have no effect on or be uncorrelated with the outcome.	Outcomes, the “instrument,” and other control variables.
Randomized evaluation	Experimental method for measuring a causal relationship between two variables.	Participants are randomly assigned to the control groups.	Randomization “worked.” That is, the two groups are statistically identical (on observed and unobserved factors).	Outcome data for control and experimental groups; control variables can help absorb variance and improve “power”.

Source: IFAD, 2015.
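To make the first three rows of the table more concrete, the following minimal Python sketch computes the pre-post, simple difference and difference-in-differences estimates from group means. All income figures are hypothetical and serve only to show the arithmetic; the sketch does not test statistical significance or check the assumptions listed in the table.

```python
from statistics import mean

# Hypothetical monthly incomes (illustrative values only)
participants_before = [100, 120, 110, 90]
participants_after = [150, 160, 155, 135]
non_participants_before = [95, 105, 90, 90]
non_participants_after = [125, 130, 120, 125]

# Pre-post: change over time among participants only
pre_post = mean(participants_after) - mean(participants_before)

# Simple difference: participants versus non-participants, after the programme
simple_difference = mean(participants_after) - mean(non_participants_after)

# Difference-in-differences: participants' change minus non-participants' change
comparison_change = mean(non_participants_after) - mean(non_participants_before)
difference_in_differences = pre_post - comparison_change

print("Pre-post estimate:", pre_post)
print("Simple difference estimate:", simple_difference)
print("Difference-in-differences estimate:", difference_in_differences)
```

In this made-up data set the three estimates differ because the non-participants started from a lower baseline and also improved over time, which is exactly what the pre-post and simple difference designs cannot account for on their own.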

For more information related to impact evaluation, see also chapter 5, Types of evaluation – Key considerations regarding impact evaluations.

RESOURCES

BetterEvaluation

n.d. Compare results to the counterfactual.

Gertler, P.J., S. Martinez, P. Premand, L.B. Rawlings and C.M.J. Vermeersch

2011 Impact Evaluation in Practice. World Bank, Washington, D.C.

International Fund for Agricultural Development (IFAD)

2002 Annex D: Methods for monitoring and evaluation. In: Managing for Impact in Rural Development: A Guide for Project M&E. Rome.
2014 Republic of India, Impact Evaluation of Jharkhand – Chhattisgarh Tribal Development Programme (JCTDP). Approach Paper. Independent Office of Evaluation.
2015 Evaluation Manual. Second edition. Independent Office of Evaluation of IFAD, Rome.

Leeuw, F. and J. Vaessen

2009 Impact Evaluations and Development: NONIE Guidance on Impact Evaluation. World Bank, Washington, D.C.

United Nations Evaluation Group (UNEG)

2013 Impact Evaluation in UN Agency Evaluation Systems: Guidance on Selection, Planning and Management. Guidance document.

US Department of Health and Human Services, Centers for Disease Control and Prevention (CDC)

2018 Data collection methods for program evaluation: Observation. Evaluation Brief no. 16.

White, H. and D. Raitzer

2017 Impact Evaluation of Development Interventions: A Practical Guide. Asian Development Bank, Metro Manila.

White, H. and S. Sabarwal

2014 Quasi-experimental design and methods. Methodological Briefs, Impact Evaluation No. 8. UNICEF, Florence.

White, H., S. Sabarwal and T. de Hoop

2014 Randomized controlled trials (RCTs). Methodological Briefs, Impact Evaluation No. 7. UNICEF, Florence.

Collecting and managing data

Data collection

Once the M&E design has been identified and the method(s) and tools have been developed, the data collection can start. It is also recommended to organize a training with the data collection team(s) on the methodology. The training should cover in detail each data collection tool that will be used and include practical exercises of how to implement them.

A data collection guide with clear instructions for the enumerators is a useful reference tool, both during the training and afterwards during the actual data collection; see the example provided below for an excerpt from a survey included in a data collection guide. Taking these steps will help ensure that the collected data are accurate, with a minimum amount of error. In certain cases, conducting a full training is not feasible due to time and resource constraints, and a data collection guide then becomes an even more important reference.

EXAMPLE - Excerpt from a data collection guide

Section 1: Economic situation

This section looks at the economic/financial situation of the respondent.

1. Do you have a regular source of income?			Yes	No
Objective: To find out whether or not the respondent has a regular supply of money. Possible sources of income include employment, small business and participation in a credit and savings group.
Instructions: First read out the question and response options and then circle the respondent's answer (yes or no).
a) (If # 1 YES) What has been your average monthly income over the past six months? ____________________________
Instructions: First read out the question and then insert the average monthly income of the respondent in the space provided beside the question. This question is to be asked only if the respondent answered "Yes" to question # 1.
b) (If # 1 NO) What was your income last month? _________________________________
Instructions: First read out the question and then insert the income amount of the respondent in the space provided beside the question. This question is to be asked only if the respondent answered "No" to question # 1.
2. How often do you receive financial support from a third party?	Always	Very often	Rarely	Never
Objective: To find out how regularly the respondent is receiving financial support from a third party, which can be a person or an organization.
Instructions: First read out the question and response options and then circle the respondent's answer (one of the four options listed beside the question).

Each data collection team should have a supervisor who can oversee the data collection and check for any errors. During the data collection, it is imperative that the supervisor of the data collection team regularly checks for the following:

Are there any forms missing?
Are there any double forms for a respondent?
Are there any answer boxes or options left blank?
Is more than one option selected for closed-ended questions with single-option responses?
Are correct values filled out in the wrong boxes?
Are the answers readable?
Are there any writing errors?
Are there any answers that are out of the expected range (outliers)?

Doing these checks will help reduce the amount of error in the data collected.

Data entry

The data collected then needs to be transferred to a computer application, such as Microsoft Word or Excel. Having the data in an electronic format will facilitate the data clean-up and data analysis. For quantitative data, the first step in data entry is to create the data file(s) in a way that allows a smooth transfer between a spreadsheet and a statistical software package, such as SPSS or Stata, for conducting statistical analyses.

How to structure a data spreadsheet

Data structure for cross-sectional data: A table of numbers and text in which each row corresponds to an individual subject (or unit of analysis) and each column corresponds to a different variable or measurement. There is one record (row) per subject.
Data structure for longitudinal data: The data can be structured in a wide data file format or a long data file format. In the wide format (see Table 4.12), a subject’s repeated responses are in a single row, and each response is in a separate column. In the long format (see Table 4.13), each row is one time point per subject, so each subject will have data in multiple rows. Any variables that don’t change across time will have the same value in all the rows.

Table 4.12. Wide format data file example
	ID	Age	Income 2015	Income 2016	Income 2017
1	067	43	30 000	30 000	32 000
2	135	37	28 000	31 000	30 000

Table 4.13. Long format data file example
	ID	Age	Income	Year
1	067	43	30 000	2015
2	067	43	30 000	2016
3	067	43	32 000	2017
4	135	37	28 000	2015
5	135	37	31 000	2016
6	135	37	30 000	2017
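The conversion from the wide format in Table 4.12 to the long format in Table 4.13 can be sketched in a few lines of Python. This is only an illustration using the two subjects from the tables; the function name wide_to_long and the assumption that the repeated columns all start with "Income " are choices made for the example, and spreadsheet or statistical packages offer equivalent reshaping functions.

```python
# The two subjects from Table 4.12 (wide format: one row per subject)
wide_rows = [
    {"ID": "067", "Age": 43, "Income 2015": 30000, "Income 2016": 30000, "Income 2017": 32000},
    {"ID": "135", "Age": 37, "Income 2015": 28000, "Income 2016": 31000, "Income 2017": 30000},
]

def wide_to_long(rows, prefix="Income "):
    """Return one row per subject and year, as in Table 4.13."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix):
                long_rows.append({
                    "ID": row["ID"],
                    "Age": row["Age"],                 # time-invariant, repeated in each row
                    "Income": value,
                    "Year": int(column[len(prefix):]),  # year taken from the column name
                })
    return long_rows

long_rows = wide_to_long(wide_rows)
for row in long_rows:
    print(row)
```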

For qualitative data, the first step in the data entry process is transferring all the interview, focus group and observation notes to a Word document for conducting content analysis using qualitative data programme packages, such as NVivo or MAXQDA.

Another component of the data entry is assigning each subject (or unit of analysis) a unique identifier (ID) (for example: 01, 02, 03 and so on), unless this is done directly during the data collection process. To do this, a separate file should be created that matches the identifying information for each subject (unit of analysis) with their unique ID. Assigning a unique identifier to each respondent, and keeping the matching file separate from the data file, helps ensure that the data cannot be traced back to respondents if the data is disclosed to other parties.
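The separation between the matching file and the data file can be sketched as follows. This is a minimal Python illustration with made-up respondents; the field names (name, phone, income) and the choice of which fields count as identifying are assumptions for the example only.

```python
# Hypothetical raw records containing identifying information
respondents = [
    {"name": "Respondent A", "phone": "000-0001", "income": 30000},
    {"name": "Respondent B", "phone": "000-0002", "income": 28000},
]
identifying_fields = ("name", "phone")   # assumed identifying fields

key_rows = []        # matching file: unique ID plus identifying information
analysis_rows = []   # data file: unique ID plus the remaining variables

for number, record in enumerate(respondents, start=1):
    uid = f"{number:02d}"   # 01, 02, 03 and so on
    key_rows.append({"ID": uid, **{f: record[f] for f in identifying_fields}})
    analysis_rows.append(
        {"ID": uid, **{k: v for k, v in record.items() if k not in identifying_fields}}
    )

print(key_rows)
print(analysis_rows)
```

Only the second list would be shared for analysis; the first would be saved as a separate file with restricted access.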

Data clean-up

Once the data has been transferred from the medium used to record the information to a computer application (Word or Excel), it needs to be screened for errors. Following this, any errors need to be diagnosed and treated.

Data errors can occur at different stages of the design, implementation and analysis of data (see Figure 4.8):

When designing the data collection instruments (such as improper sampling strategies, invalid measures, bias and others);
When collecting or entering data;
When transforming/extracting/transferring data;
When exploring or analysing data;
When submitting the draft report for peer review (UNHCR, 2015).

Figure 4.8 Sources of error

Key errors to look for when screening data (ibid.)

Spelling and formatting irregularities: Are categorical variables written incorrectly? Are date formats consistent?
Lack of data: Do some questions have fewer answers than surrounding questions?
Excessive data: Are there duplicate entries? Are there more answers than originally allowed?
Outliers/inconsistencies: Are there values that are so far beyond the typical distribution that they seem potentially erroneous?
Strange patterns: Are there patterns that suggest cheating rather than honest answers (that is, several questionnaires with the exact same answers)?
Suspect analysis results: Do the answers to some questions seem counter-intuitive or extremely unlikely?

Table 4.14. Selected data screening methods
Quantitative data	Qualitative data
Browse data tables after sorting	Check for spelling errors
Calculate summary statistics	Compare data with assumptions or criteria
When time allows, validate data entry	Take counts of words and phrases
Create frequency distributions and cross-tabulations	Create frequency distributions and cross-tabulations
Graphically explore data distributions using box plots, histograms and scatter plots with the help of visual analysis software such as Tableau Desktop	
Detect outliers*	

* United Nations High Commissioner for Refugees (UNHCR), 2015.

Depending on the number of data collection tools used and amount of data collected, data entry agents may need to be recruited and trained to do the data entry (and data clean-up).
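Several of the screening checks described above can be automated once the data is in electronic form. The following minimal Python sketch, using only hypothetical records, looks for duplicate entries, counts missing values, produces a frequency distribution that exposes spelling irregularities, and flags values outside a pre-defined range. The expected age range of 18 to 35 is an assumption made for the example.

```python
from collections import Counter

# Hypothetical survey records (None marks an empty answer box)
records = [
    {"ID": "01", "sex": "female", "age": 24},
    {"ID": "02", "sex": "Female", "age": 31},   # inconsistent spelling
    {"ID": "03", "sex": "male", "age": None},   # missing value
    {"ID": "03", "sex": "male", "age": 29},     # duplicate ID
    {"ID": "04", "sex": "male", "age": 80},     # outside the expected range
    {"ID": "05", "sex": "female", "age": 27},
]

# Excessive data: IDs that appear more than once
id_counts = Counter(r["ID"] for r in records)
duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

# Lack of data: number of missing answers per variable
missing_counts = {
    field: sum(1 for r in records if r[field] in (None, ""))
    for field in ("sex", "age")
}

# Spelling irregularities: frequency distribution of a categorical variable
sex_frequencies = Counter(r["sex"] for r in records)

# Outliers: values outside the assumed expected range of 18 to 35
out_of_range = [
    r["ID"] for r in records
    if r["age"] is not None and not 18 <= r["age"] <= 35
]

print("Duplicate IDs:", duplicate_ids)
print("Missing values:", missing_counts)
print("Sex frequencies:", dict(sex_frequencies))
print("Out-of-range ages:", out_of_range)
```

Checks of this kind only identify suspect data points; each one still needs to be diagnosed before any value is changed.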

Diagnosis

Once the suspect data has been identified, the next step is to review all the respondent’s answers to determine if the data makes sense given the context in which it was collected. Following this review, there are several possible diagnoses for each suspect data point identified:

The data point is missing. Missing data can result from answers omitted by the respondents (no response), questions skipped by the enumerator (erroneous entry or skip pattern) or by the data entry agents, or dropouts (in longitudinal research).
The data point is a true extreme value. True extreme values are answers that seem high but can be justified by other answers.
The data point is a true normal value. True normal values are valid answers.
The data point is an error. Errors can be either typos or inappropriate answers (questions asked were misunderstood by the respondents). Sometimes, errors can be rapidly identified when there are pre-defined cut-offs because the values are logically or biologically impossible. For example, the sample comprises only respondents between the ages of 18 and 35; however, on the survey, a respondent is listed as being 80 years old, which is not possible (ibid.).

Treatment

Once the problematic observations have been identified, these need to be treated before the data can be analysed. The following are some of the key approaches to dealing with data errors:

Leave the data unchanged. This approach is the most conservative as it entails accepting the erroneous data as valid response(s) and making no changes. For large-n studies, leaving one erroneous response may not affect the analysis. However, for small-n studies, the decision of leaving the data unchanged may be more problematic.
Correct the data, but without modifying the intention or meaning given by the respondent.
Delete the data. It is important to remember that leaving out data can make it seem as if the data is being “cherry-picked” to obtain the desired results. Alternatively, a binary variable can be created (1 = suspicious record; 0 = not so) and used as a record filter in pivot tables or in-table filtering to understand the impact of potentially erroneous data on the final results.
Re-measure the suspect or erroneous values, if time and resources permit (ibid.).
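The binary flag described above can be illustrated with a minimal Python sketch. The records and the choice of which ID is suspect are hypothetical; the point is only to show how the same statistic can be compared with and without the flagged records before deciding how to treat them.

```python
from statistics import mean

# Hypothetical income records; ID 03 was identified as suspect during screening
records = [
    {"ID": "01", "income": 300},
    {"ID": "02", "income": 280},
    {"ID": "03", "income": 9000},
    {"ID": "04", "income": 320},
]
suspect_ids = {"03"}

# Create the binary variable: 1 = suspicious record, 0 = not so
for record in records:
    record["suspicious"] = 1 if record["ID"] in suspect_ids else 0

# Compare the result with and without the flagged records
mean_all = mean(r["income"] for r in records)
mean_filtered = mean(r["income"] for r in records if r["suspicious"] == 0)

print("Mean income, all records:", mean_all)
print("Mean income, suspicious records filtered out:", mean_filtered)
```

Because the flag is a new variable, no original value is altered or removed from the data set.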

General decision-making rules (ibid.):

If the person doing the data entry has entered values different from the ones in the survey, the value should be changed to what was recorded in the survey form.
When variable values do not make sense and there is no data entry error nor notes to determine where the error comes from, the value should be left as it is. Any changes will bias the data.
When blank cases are present for questions that required an answer, or if erroneous values cannot be corrected, these may be deleted from the data file.
When there are still suspect and true extreme values after the diagnostic phase, it is necessary to next examine the influence of these data points, both individually and as a group, on the results before deciding whether or not to leave the data unchanged.
Any data points taken out of the data set should be reported as “excluded from analysis” in the methodology chapter of the final report.

Missing data

Missing values require attention because they cannot be simply ignored. The first step is to decide which blank cells need to be filled with zeros (because they represent negative observation; for example “no”, “not present” and “option not taken”) and which to leave blank (if using blanks to indicate missing or not applicable). Blank cells can also be replaced with missing value codes; for example, 96 (I don’t know), 97 (refused to answer), 98 (skip question/not applicable) and 99 (blank/missing).
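The following minimal Python sketch shows why missing value codes need to be separated from valid answers before any calculation. It uses the codes listed above on a hypothetical variable (number of household members); the function name split_missing is an assumption for the example. Such codes only work when they fall outside the range of valid answers for the variable concerned.

```python
from collections import Counter
from statistics import mean

# Missing value codes as listed in the text
MISSING_CODES = {
    96: "I don't know",
    97: "refused to answer",
    98: "skip question/not applicable",
    99: "blank/missing",
}

def split_missing(values):
    """Separate valid answers from missing ones and tally the reasons."""
    valid = []
    reasons = Counter()
    for value in values:
        if value is None:
            reasons[MISSING_CODES[99]] += 1     # empty cell treated as blank/missing
        elif value in MISSING_CODES:
            reasons[MISSING_CODES[value]] += 1
        else:
            valid.append(value)
    return valid, reasons

# Hypothetical answers to "How many people live in your household?"
household_size = [4, 5, 99, 3, 97, 6, None]
valid, reasons = split_missing(household_size)

print("Valid answers:", valid)
print("Mean of valid answers:", mean(valid))
print("Reasons for missing data:", dict(reasons))
```

If the codes 97 and 99 were left in as if they were real answers, the average household size would be badly inflated.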

If the proportion of missing or incomplete cases is substantial for a category of cases, this will be a major M&E concern. Once a set of data is known to be missing, it is important to determine whether the missing data are random or vary in a systematic fashion, and also the extent of the problem. Random missing values may occur because the subject inadvertently did not answer some questions: the assessment may be overly complex and/or long, or the enumerator may be tired and/or not paying attention and miss the question. Random missing values may also occur through data entry mistakes. If there are only a small number of missing values in the data set (typically, less than 5%), then they are extremely likely to be random. Non-random missing values may occur because the key informant purposefully did not answer some questions (confusing or sensitive question, no appropriate choices such as “no opinion” or “not applicable”).

The default option for handling missing data is filtering and excluding them from the analysis:

(a) Listwise/casewise deletion: Cases that have missing values on the variable(s) under analysis are excluded. If only analysing one variable, then listwise deletion is simply analysing the existing data. If analysing multiple variables, then listwise deletion removes cases if there is a missing value on any of the variables. The disadvantage is a loss of data, because all data from cases that answered some of the questions but not others are removed.

(b) Pairwise deletion: All available data is included. Unlike listwise deletion, which removes cases (subjects) that have missing values on any of the variables under analysis, pairwise deletion only removes the specific missing values from the analysis (not the entire case). When conducting a correlation on multiple variables, this technique makes it possible to calculate each bivariate correlation using all available data points, ignoring only the missing values where they exist on some variables. Pairwise deletion therefore results in a different sample size for each correlation. It is useful when the sample size is small or the number of missing values is large, because there are not many values to begin with and listwise deletion would omit even more.

Note: Deletion means exclusion within a statistical procedure, not deletion (of variables or cases) from the data set.

(c) Deletion of all cases with missing values: Only those cases with complete data are retained. This approach reduces the sample size of the data, resulting in a loss of power and increased error in estimation (wider confidence intervals). While this may not be a problem for large data sets, it is a big disadvantage for small ones. Results may also be biased if subjects with missing values are different from the subjects without missing values (that is, non-random) resulting in a non-representative sample.

(d) Imputation (replace the missing values): All cases are preserved by replacing the missing data with a probable value based on other available information (such as the mean or median of the observations for the variable for which the value is missing). Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data. More sophisticated imputation methods, involving equations that attempt to predict the values of the missing data based on a number of variables for which data are available, exist. Each imputation method can result in biased estimates. Detailing the technicalities, appropriateness and validity of each technique goes beyond the scope of this document. Ultimately, choosing the right technique depends on the following: (i) how much data are missing (and why); (ii) patterns, randomness and distribution of missing values; and (iii) effects of the missing data and how the data will be used in the analysis. It is strongly recommended to refer to a statistician if M&E practitioners are faced with a small data set with large quantities of missing values.

In practice, for M&E purposes with few statistical resources, creating a copy of the variable and replacing missing values with the mean or median may often be enough and preferable to losing cases through deletion methods.
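The difference between listwise deletion, pairwise deletion and simple mean imputation can be seen in a small Python sketch. The five cases and their values are hypothetical, and the imputed copy is named income_imputed purely for illustration. The sketch only counts the cases available under each approach and does not assess whether the missing values are random.

```python
from itertools import combinations
from statistics import mean

# Hypothetical cases; None marks a missing value
cases = [
    {"ID": "01", "age": 25, "income": 300, "household_size": 4},
    {"ID": "02", "age": 31, "income": None, "household_size": 5},
    {"ID": "03", "age": None, "income": 280, "household_size": 3},
    {"ID": "04", "age": 28, "income": 320, "household_size": None},
    {"ID": "05", "age": 35, "income": 340, "household_size": 6},
]
variables = ["age", "income", "household_size"]

# Listwise deletion: keep only cases complete on every variable under analysis
listwise_cases = [c for c in cases if all(c[v] is not None for v in variables)]

# Pairwise deletion: for each pair of variables, use every case observed on both
pairwise_n = {
    (a, b): sum(1 for c in cases if c[a] is not None and c[b] is not None)
    for a, b in combinations(variables, 2)
}

# Mean imputation on a copy of the variable, leaving the original untouched
income_mean = mean(c["income"] for c in cases if c["income"] is not None)
for c in cases:
    c["income_imputed"] = c["income"] if c["income"] is not None else income_mean

print("Cases kept under listwise deletion:", len(listwise_cases))
print("Cases available per pair of variables:", pairwise_n)
print("Imputed income values:", [c["income_imputed"] for c in cases])
```

Here listwise deletion keeps two of the five cases, pairwise deletion keeps three for each pair of variables, and imputation keeps all five at the cost of introducing an estimated value.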

Recoding and creating new variables

During the data clean-up process, certain variables may need to be recoded and new variables created to meet the analytic needs for the M&E exercise. Variables may be recoded in various scenarios, including the following:

Formatting: Date (day, month and year), prefixes to create better sorting in tables and rounding (in continuous variables).
Syntax: Translation, language style and simplification.
Recoding a categorical variable (such as ethnicity, occupation, an “other” category and spelling corrections).
Recoding a continuous variable (such as age) into a categorical variable (such as age group).
Combining the values of a variable into fewer categories (such as grouping all problems caused by access issues).
Combining several variables to create a new variable (such as building an index based on a set of variables).
Defining a condition based on certain cut-off values (such as “at risk” versus “at acute risk” population).
Changing a level of measurement (such as from interval to ordinal scale).
A distinction is needed between values (conceptually).

Categorical variables can be recoded in three ways:

(a) Collapse a categorical variable into fewer categories by combining categories that logically go together or eliminate categories that have small numbers of observations;

(b) Break a categorical variable up into several variables with fewer categories;

Guidelines for collapsing data

Ordinal variables need to be collapsed in a way that preserves the order of the categories.
Combine only those categories that go together.
The way in which categories are collapsed can easily affect the significance level of statistical tests. Categories should be collapsed a priori to avoid the criticism that the data were manipulated to obtain a certain result.
Do not oversimplify the data. The unnecessary reduction in the number of categories can reduce statistical power and make relationships in the data ambiguous. Generally, any categories that include 10 per cent or more of the data (or 5 cases for very small samples) should be kept intact.

TIP - Tips for effective recoding

Use distinct and easy-to-remember variable names.
Pay attention to missing values. When recoding is done, the number of cases with missing data should be the same as before recoding.
Use graphs to check the accuracy of recoding.
Use variable codes consistently. For example, with dichotomous “yes/no” variables, always use 0 = no and 1 = yes. For a variable that can have more than one value, always make 0 the reference category.
Keep a permanent record of all the recoding.
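Two of the recoding scenarios above (turning a continuous variable into a categorical one, and collapsing a categorical variable into fewer categories) can be sketched in Python as follows. The age bands, the occupation groupings and the records are all hypothetical. The sketch creates new variables rather than overwriting the originals, and checks that the number of missing values is unchanged by the recoding.

```python
def age_group(age):
    """Recode a continuous age into an ordered age group (assumed bands)."""
    if age is None:
        return None          # missing stays missing
    if age < 18:
        return "0-17"
    if age < 36:
        return "18-35"
    if age < 60:
        return "36-59"
    return "60+"

# Assumed grouping of occupations into fewer categories, decided a priori
OCCUPATION_GROUPS = {
    "farmer": "agriculture",
    "fisher": "agriculture",
    "teacher": "services",
    "shopkeeper": "services",
}

# Hypothetical records
records = [
    {"ID": "01", "age": 17, "occupation": "farmer"},
    {"ID": "02", "age": 24, "occupation": "teacher"},
    {"ID": "03", "age": None, "occupation": "fisher"},
    {"ID": "04", "age": 41, "occupation": "shopkeeper"},
    {"ID": "05", "age": 63, "occupation": None},
]

for r in records:
    r["age_group"] = age_group(r["age"])
    r["occupation_group"] = (
        None if r["occupation"] is None
        else OCCUPATION_GROUPS.get(r["occupation"], "other")
    )

# The number of missing values should be the same before and after recoding
missing_before = sum(1 for r in records if r["age"] is None)
missing_after = sum(1 for r in records if r["age_group"] is None)

print([r["age_group"] for r in records])
print([r["occupation_group"] for r in records])
print("Missing before and after:", missing_before, missing_after)
```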

Documenting change

Two good data management practices are transparency and the proper documentation of all the procedures followed, including the data cleaning process.

Documenting errors, changes and additions is essential to the following:

Determining and maintaining data quality;
Avoiding duplication of error checking by different data entry staff;
Knowing what and by whom data quality checks have been carried out;
Recovering data cleaning errors;
Informing data users of the changes made to the last version of the data accessed.

To keep track of all the changes made to the data, a change log can be created. By keeping track of all the modifications made, it will be possible to roll back to the original values, when necessary. The following are some of the fields that are included in a change log:

Table (if using multiple tables)
Column, row
Date changed
Changed by
Old value
New value
Comments
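A change log with the fields listed above can be kept in a spreadsheet, but the logic can also be sketched in Python. In this minimal illustration the function names apply_change and roll_back, the table name and the corrected value are hypothetical; the example reuses the case of a respondent recorded as 80 years old in a sample aged 18 to 35.

```python
import copy
from datetime import date

def apply_change(data, change_log, table, row, column, new_value, changed_by, comments=""):
    """Update one cell and record the old and new values in the change log."""
    old_value = data[row][column]
    data[row][column] = new_value
    change_log.append({
        "table": table,
        "column": column,
        "row": row,
        "date_changed": date.today().isoformat(),
        "changed_by": changed_by,
        "old_value": old_value,
        "new_value": new_value,
        "comments": comments,
    })

def roll_back(data, change_log):
    """Restore the original values by undoing logged changes, most recent first."""
    for entry in reversed(change_log):
        data[entry["row"]][entry["column"]] = entry["old_value"]

# Hypothetical data file and an empty change log
survey = [{"ID": "01", "age": 24}, {"ID": "02", "age": 80}]
change_log = []

apply_change(survey, change_log, table="survey", row=1, column="age",
             new_value=30, changed_by="data entry agent 1",
             comments="Data entry error; survey form shows 30")

# Rolling back on a copy recovers the values as originally entered
restored = copy.deepcopy(survey)
roll_back(restored, change_log)

print(change_log)
print("Current value:", survey[1]["age"], "| Original value:", restored[1]["age"])
```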

RESOURCES

Ruel, E., W.E. Wagner III and B.J. Gillespie

2016a Chapter 12: Data entry. In: The Practice of Survey Research: Theory and Applications. SAGE Publications, Thousand Oaks, California, pp. 195–207.
2016b Chapter 13: Data cleaning. In: The Practice of Survey Research: Theory and Applications. SAGE Publications, Thousand Oaks, California, pp. 208–237.

United Nations High Commissioner for Refugees (UNHCR)

2015 Dealing with messy data. Coordination Toolkit.

Analysing data

Once the data has been collected and cleaned, it is ready to be analysed. Data analysis makes it possible to assess whether, how and why the intervention being monitored and/or evaluated is on track towards achieving, or has achieved, the established objectives. This part of the chapter will discuss and provide examples of how to analyse qualitative and quantitative M&E data, as well as the triangulation of data sources.

Qualitative data analysis

Qualitative data analysis is a process aimed at reducing and making sense of vast amounts of qualitative information – very often from multiple sources, such as focus group discussion notes, individual interview notes and observations – in order to deduce relevant themes and patterns that address the M&E questions posed. When analysing qualitative data, the focus is on the words spoken by the respondents, the context in which the data was collected, the consistency and contradictions of respondents’ views, the frequency and intensity of participants’ comments, their specificity and emerging themes and patterns. For example, as part of monitoring an ongoing project, it is decided to conduct 10 focus groups with a select number of beneficiaries. What should be done once all the discussion notes are collected? Should the data be analysed in an ad hoc or systematic fashion, that is, highlight the relevant information or code it?

Codes are words or short phrases that capture a “summative, salient, essence-capturing, and/or evocative attribute for […] language-based or visual data” (Saldana, 2009, p. 3). Coding is the process of labelling data as “belonging to” or representing some type of phenomenon, which can be a concept, belief, action, theme, cultural practice or relationship. Coding can be accomplished manually, using paper and highlighters, or on a computer in a Word document, an Excel spreadsheet or qualitative data analysis software like NVivo.

To begin coding the data manually, gather the hard copies of all the data; mark up the text with pens, pencils, highlighters and markers; and finally cut, paste, hole-punch, pile and string together the data. It is preferable to leave wide margins and lots of white space for the markings.

EXAMPLE

Figure 4.9. Example of manual coding and marginal remarks

Source: Center for Evaluation and Research, n.d.

As most M&E practitioners have access to word-processing and spreadsheet programs, these are quite popular in qualitative data analysis (see Figure 4.10 illustrating coding in a Word document).

EXAMPLE

Figure 4.10. Example of coding in a Word document

Source: Ng, n.d.

Figure 4.11. Example of coding in a spreadsheet

Source: AccountingWEB, 2017.

For large amounts of data, such as 20 or more interview transcripts, a license for an existing qualitative data analysis software package like NVivo can be purchased. For an introduction to NVivo and an introductory guide to setting up and coding with this software, please see QSR International’s (2014) Getting Started Guide.

For guidelines on how to analyse qualitative data, see Annex 4.12. Steps for analysing qualitative data.
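Once segments of text have been coded, the frequency of each code and the number of sources in which it appears can be tallied. The following minimal Python sketch illustrates this with hypothetical codes from three focus group discussions (FGDs); it is not a substitute for the interpretive work of reading the coded text in context.

```python
from collections import Counter, defaultdict

# Hypothetical coded segments: each entry is one coded passage of text
coded_segments = [
    {"source": "FGD 1", "code": "access to services"},
    {"source": "FGD 1", "code": "documentation"},
    {"source": "FGD 2", "code": "access to services"},
    {"source": "FGD 2", "code": "access to services"},
    {"source": "FGD 3", "code": "livelihoods"},
    {"source": "FGD 3", "code": "documentation"},
]

# How often each code was applied
code_frequency = Counter(segment["code"] for segment in coded_segments)

# In how many different sources each code appears
sources_by_code = defaultdict(set)
for segment in coded_segments:
    sources_by_code[segment["code"]].add(segment["source"])
source_counts = {code: len(sources) for code, sources in sources_by_code.items()}

for code, count in code_frequency.most_common():
    print(f"{code}: {count} segment(s) in {source_counts[code]} source(s)")
```

Looking at both counts helps distinguish a theme raised repeatedly in one discussion from a theme raised across several discussions.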

Quantitative data analysis

Once the quantitative data has been entered in a spreadsheet, it is ready to be used for creating information to answer the monitoring or evaluation questions posed. Statistics help transform quantitative data into useful information for decision-making, for example by summarizing data and describing patterns, relationships and connections. Statistics can be descriptive or inferential. As the name suggests, descriptive statistics describe and summarize data. They also include graphical representations, such as histograms, pie charts and bar charts, which provide a quick way of making comparisons between different sets of data and of spotting the smallest and largest values, trends or changes over a period of time. Inferential statistics use data drawn from a sample of the population to make generalizations about the population as a whole.

As most statistics used in M&E exercises are descriptive, the following discussion will provide tools and examples on how to calculate these types of statistics.

As already mentioned in the measurement section, data is collected from units, which can be individuals, households, schools, communities and others. The different measurements, questions or pieces of information that are collected from/about these units are the variables. There are two types of variables: numerical (quantitative) and categorical. Whereas categorical variables are made up of a group of categories (such as sex, male/female), numerical variables are numbers, such as the number of participants at a training.

Two types of variables

Categorical data groups all units into distinct categories, which can be summarized by counting how many times each category occurs. For example, the number of females in a community can be described as the frequency of females in the community. This information is presented using a frequency table, which shows how many individuals in the community fall into each category (male/female). The frequencies can then also be expressed as a percentage or proportion of the total.

Frequency tables can be used to present findings in a report or can be converted into a graph for a more visual presentation. A proportion describes the relative frequency of each category and is calculated by dividing each frequency by the total number.

Percentages are calculated by multiplying the proportion by 100. Proportions and percentages can be easier to understand and interpret than examining raw frequency data and are often added into a frequency table (see Table 4.15).

EXAMPLE

Table 4.15. Frequency table
Question 32. Percentage of parents who registered their children’s birth with the birth and death registry
Response	Frequency	Proportion	Percentage
Registered their children before the project	32	0.25	25%
Registered their children after the project	2	0.02	1.6%
Did not have children or did not respond	94	0.73	73.4%
Total	128	1.00	100%

Source: IOM’s internal evaluation of the project titled Technical Support to the Government of Ghana to Address Child Trafficking and Other Child Protection Abuses in the Ketu South, North and South Tongu Districts of the Volta Region.
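The calculation behind a frequency table can be sketched in a few lines of Python. This is a minimal illustration only: the response labels are shortened, and the list of raw responses is rebuilt from the frequencies shown in Table 4.15 rather than taken from the actual survey data. The function name frequency_table is an assumption made for this example.

```python
from collections import Counter

# Hypothetical raw responses, rebuilt from the frequencies in Table 4.15.
responses = (
    ["Registered before the project"] * 32
    + ["Registered after the project"] * 2
    + ["No children or no response"] * 94
)


def frequency_table(values):
    """Return one (category, frequency, proportion, percentage) row per category."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (category, count, count / total, 100 * count / total)
        for category, count in counts.items()
    ]


table = frequency_table(responses)
for category, count, proportion, percentage in table:
    print(f"{category:<32}{count:>5}{proportion:>8.2f}{percentage:>7.1f}%")
```

The proportion is each frequency divided by the total, and the percentage is the proportion multiplied by 100, as described above.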

Analysis of numerical variables

The centre and the spread of the data are two commonly used descriptive statistics. Whereas the centre describes a typical value, the spread describes the distance of a data point from the centre of the data.


The most common statistics used to describe the centre are the mean (that is, the average) and the median. The median is the middle value in a data set; half the data are greater than the median and half are less. The mean is calculated by adding up all the values and then dividing by the total number of values.


EXAMPLE

A survey is conducted of 25 youth (between the ages of 18 and 25), who are participating in a project that is being monitored. Among other things, their age is recorded. Each number is the age of an individual with the ages being arranged in order.

[Figure: ages of the 25 surveyed youth, arranged in order]

The mean and the median are calculated in different ways, although in this data set they happen to coincide. To calculate the median, arrange the youth in order of age and then find the midway point. In this example, 21 is the median age of the youth; 12 youth are below the age of 21 and 11 youth are above the age of 21. To calculate the mean, add up all the ages and then divide by the number of youth. In this example, 21 years is also the mean age of the youth interviewed. The simplest measure of spread is the range, which is the difference between the maximum and minimum values. The range of the example data would be 7 years (minimum = 18, maximum = 25).

Other statistics describing spread are the interquartile range and standard deviation.

The interquartile range is the difference between the upper quartile and lower quartile of the data. A quarter (or 25%) of the data lie above the upper quartile and a quarter of the data lie below the lower quartile.
The standard deviation shows the average difference between each individual data point (or age of youth in the above example) and the mean age. If all data points are close to the mean, then the standard deviation is low, showing that there is little difference between values. A large standard deviation shows that there is a larger spread of data.
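The measures of centre and spread described above can be computed with Python's standard statistics module, as in the sketch below. The ages listed are hypothetical: they were chosen only to match the summary values given in the example (25 youth, a mean and median of 21, and a range of 7) and are not the original survey data. Note that statistics.quantiles uses one of several accepted quartile methods, so a spreadsheet may return slightly different quartiles.

```python
import statistics

# Hypothetical ages of 25 youth, chosen only to match the summary values in the example.
ages = [18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 20, 21,
        21, 22, 22, 22, 22, 22, 23, 23, 23, 24, 25, 25]

# Centre of the data.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

# Spread of the data.
age_range = max(ages) - min(ages)
lower_quartile, _, upper_quartile = statistics.quantiles(ages, n=4)
interquartile_range = upper_quartile - lower_quartile
std_dev = statistics.stdev(ages)  # sample standard deviation

print(f"Mean: {mean_age}, median: {median_age}, range: {age_range}")
print(f"Interquartile range: {interquartile_range}, standard deviation: {std_dev:.2f}")
```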

For information on how to calculate descriptive statistics using Microsoft Excel, see Annex 4.13. Calculating descriptive statistics.

Triangulation of data sources

Triangulation is the process of comparing several different data sources and methods to corroborate findings and compensate for any weaknesses in the data by the strengths of other data.

Triangulation can enhance the validity and reliability of existing observations about a given issue. The ability to compare and contrast different findings and perspectives on the same situation and/or phenomenon is an effective way to find inconsistencies in data and identify areas for further investigation. When findings converge, this can lead to new, credible findings about an issue and create new ways of looking at it.

Although there are no fixed rules for analysing data for triangulation, there are several activities at the heart of the process:

Critically assess the data. For example, prioritize those findings that are most relevant to the goal(s) of triangulation, identify ways the findings from different sources relate to one another and highlight any gaps in the data.
Identify any trends and whether they are drawn from a single or from multiple data sources.
Develop working hypotheses related to the goal(s) of data triangulation. For example, if the goal is to understand if certain behaviours are changing among beneficiaries and whether any changes can be linked directly to the intervention, hypotheses from the available data that are linked to this goal should be developed. Hypotheses can be in support of the goal; for example, a supportive hypothesis could be “providing psychosocial support has reduced signs of post-traumatic stress disorder (PTSD) symptoms among beneficiaries”.
Confirm or refute hypotheses. This is a critical point in triangulation when new ideas, perspectives and explanations are likely to emerge. It is also a point when gaps in data are identified, which could lead to a search for additional data. If no additional data is available, a hypothesis may need to be modified or dropped. Any modified hypotheses should then be reconfirmed.
Use the convergence of data supporting or not supporting the hypothesis to draw reasoned conclusions from the triangulation exercise. The conclusions should be linked as closely as possible to the stated goal(s) of triangulation. The key to this process is to make the strongest case for a hypothesis/ goal given the evidence. Questions that may be helpful to consider during the process include the following:
- Which hypotheses are supported by the most rigorous data?
- Which hypotheses are supported by independent sources?
- Which hypotheses are supported by both quantitative and qualitative data?
- Are there important biases or limitations in the available data?
- Are there any other possible explanations not covered by the hypotheses?
- How confident are you in the conclusion?
- Is the conclusion actionable (that is, does it lead to a specific improvement in the response)?
Carefully and thoroughly document conclusions before disseminating them.

RESOURCES

AccountingWEB

2017 Open-ended survey text analysis in Excel. 24 August.

Bazeley, P. and K. Jackson

2013 Qualitative Data Analysis with NVivo. SAGE Publications Ltd., London.

Center for Evaluation and Research

n.d. Tips and tools #18: Coding qualitative data. Impact.

Ng, Y.-L.

n.d. A brief introduction to NVivo. Workshop document. Hong Kong Baptist University Library.

QSR International

2014 NVivo for Windows 10 Getting Started Guide.

Saldana, J.

2009 The Coding Manual for Qualitative Researchers. SAGE Publications, Thousand Oaks.

Tracy, S.

2013 Qualitative Research Methods: Collecting Evidence, Crafting Analysis, Communicating Impact. Wiley-Blackwell, West Sussex.

Presenting findings

M&E efforts aim to generate, and make available, relevant information for decision-making and the management of the intervention being monitored or evaluated. All data visualizations should summarize the collected data and communicate the findings obtained in a simple and intuitive way for the reader. This final part of the chapter discusses, and provides examples of, the different techniques and tools available for visualizing data, depending on what the data demonstrates and what the aim of conveying it to the reader is.

How to visualize findings

Step 1: Identify the data visualization goal

Before M&E practitioners start designing any data visualization, the following questions should be asked:

What is the data trying to communicate?
How will it engage or persuade the audience to act upon the information being presented?
What is the takeaway message the audience should be left with?

It is important to be clear about the goal(s) of presenting data visually in order to design it correctly. Defining the message is a crucial step in the process, and the graphic should reinforce who the organization or intervention is and what it does.

Step 2: Know the audience

Knowing the audience means asking what the audience already knows, what additional information they wish to learn and how much detail they require to understand the message being conveyed.

Step 3: Think about how to visualize the story

Once the data collected is cleaned and has been analysed, a more precise idea about what findings to present should emerge. Table 4.15 provides an overview of the key visualization techniques to use depending on what the data reveals.

Table 4.15. Summary of different visualization techniques and what they can be used for
What the data shows	Appropriate visualization technique to use
Summary	Summary table
Facts and figures	Icons and images draw attention to the data values
Comparison, rank and distribution	Bar charts and heat maps using shapes and colours represent numerical values
Proportion or part-to-whole	Pie charts, donuts, stacked bar charts, tree maps show distribution within a group
Change over time	Line graphs for time-series analyses with optional trend lines
Relationships and trends	Scatter plot and bubble graphs can help show correlation
Text analysis	Word clouds to visually display the most common words in a qualitative data set

Source: Carrington and Handley, 2017.

For additional information regarding each type of visualization, see Annex 4.14. Types of visualizations.

RESOURCES

Carrington, O. and S. Handley

2017 Data visualization: What’s it all about? New Philanthropy Capital (NPC) Briefing, August. London.

Annexes

IOM migration data governance and monitoring and evaluation

What is it?

Data governance represents the framework used by IOM to manage the organizational structures, policies, fundamentals and quality that will ensure access to accurate information. It establishes standards, accountabilities and responsibilities and ensures that migration data and information usage achieves maximum value to IOM, while managing the cost and quality of information handling. Data governance enforces the consistent, integrated and disciplined use of migration data by IOM.

How is it relevant to IOM work?

Data governance allows data to be viewed as an asset in every IOM intervention and, most importantly, it is the foundation upon which all IOM initiatives can rest. Evidence-based programming only becomes a reality when data can prove what the problem is and how to solve it. This means being able to measure what is not known and knowing what is available and what is possible to work with.

The migration data life cycle throughout the whole project cycle includes planning and designing, capturing and developing, organizing, storing and protecting, using, monitoring and reviewing and eventually improving the data or disposing of it.

Things to look out for:

(a) Data steward: At the intervention implementation phase, the data to collect has a clear data steward. If the intervention is a project implemented at the mission level, the chief of mission will be the data steward. If it includes more than one country, the data steward is most likely to be the regional director. If it is a global project, it should be ascertained which thematic area the project belongs to and, as such, the division head would be the data steward. Where the data is cross-cutting, it is likely that the data steward would be the department head.

(b) Roles and responsibilities: All project staff should be aware of their roles and responsibilities regarding the data and only have access to the data that they need to do their work.

(d) Data classification for security and privacy: The level of risk for the collection of data should be determined and classified accordingly, so that it can be stored with accurate access controls.

(e) Data processing including collection and use: Tools should be developed that collect only the data needed for the intended purpose.

For a list of relevant resources, please refer to the text box IOM migration data governance in this chapter.

How to conduct a desk review

Steps to conduct a desk review

Step 1: Identify all possible sources.

Step 2: Categorize documents.

Because some documents will be more pertinent than others, not all documents should be given equal weight or attention. To facilitate the desk review, available documents can be categorized into tiers:

Tier I are documents specifically on the subject of the monitoring/evaluation exercise, such as situation, progress and monitoring reports and project proposals;
Tier II are background documents such as media coverage and other agency reports;
Tier III are non-project-related documents.

Step 3: Decide on the approach (Structured/Unstructured).

Often, due to time constraints, monitoring and evaluation activities limit themselves to first-tier and partially second-tier documents, for which an unstructured approach is suitable. To include all second- and third-tier documents in the desk review, a structured approach will be required, such as the following:

A structured review form to record comments while reading through the documents;
Rubric to rate parts of the documents, for example, using a four-point scale to divide documents into the following:
- Those that do not address the topic at all.
- Those that address the topic in a minor way.
- Those that address the topic in some significant way.
- Those focused principally on the topic.
Indexing and searching documents for content analysis.

RESOURCES

Additional information and practical examples of how to conduct a desk review

Buchanan-Smith, M., J. Cosgrave and A. Warner

2016 Evaluation of Humanitarian Action Guide. ALNAP/ODI, London.

Types of bias

The accuracy of the collected data and the conclusions drawn depends on the M&E practitioner and the respondents and how they address and comply with the different steps in the data collection, analysis and reporting processes. No study is ever entirely free from bias. Therefore, it is important to be transparent about any bias in the data collected in monitoring/evaluation reports. A statement relating to the potential biases and the steps that were taken to control such biases should be included in all monitoring/evaluation reports.

Respondent bias

Non-response bias: This bias occurs when individuals selected refuse to, or are unable to, participate in the survey. As a result, the data that is collected will differ in meaningful ways from the target population. To avoid this, M&E practitioners should ensure that the selected sample is representative of the target population or adjust the sample if this bias is becoming too important.

Acquiescence bias: Acquiescence bias occurs when a respondent tends to agree with and be positive about whatever the interviewer asks. To avoid this, questions that suggest such an answer should be revised to gauge the respondent’s true point of view on the issue(s) of interest (Sarniak, 2015).

Social desirability bias: This bias involves respondents answering questions in a way that they think will lead to being accepted and liked. To avoid this, indirect questioning may be used, which entails asking about what a third party thinks, feels and how they will behave. Such approaches allow respondents to project their own feelings onto others, while still providing honest, representative answers (ibid.).

Habituation bias: For habituation bias, the respondent provides the same answer to all those questions that are worded in similar ways. To avoid this, questions should be worded/reworded differently and have varying response options (ibid.).

Sponsor bias: As respondents generally know which organization is funding the intervention, their feelings and opinions about that organization may bias their answers. For instance, respondents may present a dire situation in the hope of obtaining additional or future funding from the organization. This bias may be more difficult to address, but the same approach as for acquiescence bias may be used (ibid.).

Attrition/Mortality bias: When respondents drop out of the study, the sample selected may no longer be representative of the target population. The sample needs to be adjusted.

Selection bias: Selection bias is the distortion of the data because of the way in which it was collected. Self-selection is one common form of selection bias whereby people volunteer to participate in the study. The source of bias here is that the participants may respond differently from those who do not volunteer to participate in the study.

Recall bias: This bias arises when the respondents have difficulties remembering certain information resulting in the collection of inaccurate information. One way to minimize this bias is to anchor questions to key events that respondents are familiar with that can help them recall the relevant information.

Evaluator/Researcher bias

Confirmation bias: This type of bias occurs when an M&E practitioner forms a hypothesis or belief about the intervention being monitored/evaluated and uses respondents’ information to confirm that hypothesis or belief. Confirmation bias can also extend into the analysis stage, with evaluators/researchers tending to remember points that support their hypothesis and points that disprove other hypotheses. To minimize confirmation bias, M&E practitioners should regularly re-evaluate impressions of respondents and challenge pre-existing assumptions and hypotheses.

Question-order bias: This bias arises when one question influences respondents’ answers to subsequent questions. The words and ideas presented in questions prime respondents, thereby impacting their thoughts, feelings and attitudes on subsequent questions. While this type of bias is sometimes unavoidable, it can be reduced by asking general questions before specific, unaided before aided and positive before negative.

Leading questions and wording bias: This type of bias arises when M&E practitioners elaborate on a respondent’s answer in an effort to confirm a hypothesis, build rapport or overestimate their understanding of the respondent. To minimize this bias, practitioners should ask questions that use the respondents’ language and avoid summarizing what the respondents said in their own words.

Publication bias: This bias occurs when negative results are less likely to be submitted and/or published than positive ones. “Most evaluations are commissioned by agencies or donors with a vested interest in the result, so it is possible that the incentive structure tends toward more positive findings even when external consultants are hired to carry out the evaluation.” Ibid.

RESOURCES

Bernard, H.R.

2012 Social Research Methods: Qualitative and Quantitative Approaches. Second edition. SAGE Publications, Thousand Oaks.

Creswell, J.W.

2014 Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. Fourth edition. SAGE Publications, Thousand Oaks.

Hunter, J.E. and F.L. Schmidt

2004 Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Second edition. SAGE Publications, Thousand Oaks.

Keppel, G.

1991 Design and Analysis: A Researcher's Handbook. Third edition. Prentice-Hall, Inc., New York.

Neuman, W.L.

2006 Social Research Methods: Qualitative and Quantitative Approaches. Sixth edition. Pearson/Allyn and Bacon, Boston.

Punch, K.F.

2013 Introduction to Social Research: Quantitative and Qualitative Approaches. SAGE Publications, London.

Sarniak, R.

2015 9 types of research bias and how to avoid them. Quirk’s Media, August.

Applying types of sampling

Random sampling

Simple random sampling

Random samples are samples in which each unit in the target population for the monitoring/evaluation exercise has an equal chance of being selected. This approach is fair and reduces selection bias, which undermines the accuracy of the predictions being made about the target population (see section on “Bias”).

Ideally, a sample should be representative of the entire target population. In order to select a random sample, a sampling frame is required. Each unit is assigned a unique identification number and then using a random number table or generator, a certain number of units is randomly selected.

Example: At site X in country Y, there is a total of 1,536 IDPs. Because 1,536 is a four-digit number, every individual in the population is assigned a four-digit number: 0001, 0002, 0003, 0004 and so on. Then, starting at any point in the random number table, choose successive four-digit numbers until 300 distinct numbers between 0001 and 1536 are obtained, or generate 300 random numbers between 0001 and 1536 using software such as Microsoft Excel (see example below).

How to select a random sample of 300 IDPs from a total population of 1,536 IDPs using Microsoft Excel

Step 1: Click on cell A1, type =RANDBETWEEN(1,1536) and press Enter.


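Step 2: To generate, for example, a list of 300 random numbers, select cell A1, click on the lower right corner of cell A1 and drag it down to cell A300. Because RANDBETWEEN can return the same number more than once, check the list for duplicates and generate additional numbers until 300 distinct numbers are obtained.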
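As an alternative to a spreadsheet, the same selection can be sketched in Python. The example below assumes that the IDPs are numbered 1 to 1,536 as in the example above; the function name and the seed parameter (used only to make the draw repeatable) are assumptions made for this illustration. The standard library function random.sample draws without replacement, so the 300 numbers are always distinct.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


selected = simple_random_sample(population_size=1536, sample_size=300, seed=42)
print(selected[:10])  # first ten selected unit numbers
```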


Systematic random sampling

Systematic random sampling is a technique that randomly selects a unit near the beginning of the sampling frame list, skips several units, selects another unit, skips several more units, selects the next unit and so forth. The number of units skipped at each stage depends on the desired sample size.

How to select a systematic random sample

Step 1: Estimate the number of units in the population (for example, 1,536 IDPs at site X).

Step 2: Determine the sample size (for example, 300 IDPs).

Step 3: Divide step 1 by step 2 (k = N/n) to get the skip number, rounding down to a whole number. Example: k = 1,536/300 = 5.12, so k = 5.

Step 4: Select a unit at random from among the first k units in the sampling frame (for example, the fifth unit).

Step 5: Select every kth unit listed after that one until the required sample is selected.
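The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions: units are numbered 1 to 1,536 in the order of the sampling frame, the skip number is rounded down to a whole number, and the function name and seed parameter are introduced only for this example.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th unit after a random start among the first k units."""
    k = population_size // sample_size         # skip number, rounded down (1536 // 300 = 5)
    start = random.Random(seed).randint(1, k)  # random start between 1 and k inclusive
    return [start + i * k for i in range(sample_size)]


selected = systematic_sample(population_size=1536, sample_size=300, seed=3)
print(selected[:5], "...", selected[-1])
```

Because only the starting point is random, the order of the sampling frame matters: if the list follows a repeating pattern, the sample may over-represent some types of units.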

Because of the luck of the draw using simple random sampling, a good representation of subgroups in a population may not be obtained. Several other random sampling techniques exist to address this issue.

Stratified random sampling

Stratified random sampling, also referred to as proportional or quota random sampling, is a technique that divides the sampling frame into two or more strata (subpopulations) according to meaningful characteristics, such as type of migrant or gender. A simple random sample is then taken from each stratum. When the same sampling fraction is used within each stratum, proportionate stratified random sampling is conducted. When different sampling fractions are used in the strata, disproportionate stratified random sampling is conducted. This technique is useful when the project, programme or policy targets several groups that are to be compared.

Example: Among the people at the IDP site in country Y, how many are children, youth, young adults, adults and elderly? If children, youth and elderly represent only a small proportion of the total IDP site population, a simple random sample may not include enough of them to allow for a meaningful analysis to be conducted.

How to select a stratified random sample

Step 1: Divide the population into the strata of interest. For example, of the 1,536 IDPs, there are 142 children (0–12), 157 youth (13–25), 413 young adults (26–34), 701 adults (35–60) and 123 elderly (60+).

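Step 2: Select a simple random sample from each stratum. Example: For a total sample of 300 IDPs, children make up 142/1,536 = 0.092 of the population, so the children stratum should contribute (142/1,536) × 300 = 27.7, or about 28, units.

Select a simple random sample of 28 from the children stratum.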

Note: The number of units that is selected from each stratum should be equivalent to the stratum’s proportion of the total population.
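The proportional allocation described in the note can be sketched in Python. The example assumes a total sample of 300 IDPs, as in the earlier sampling examples. The function name is hypothetical, and the rule used to settle rounding (giving the remaining units to the strata with the largest remainders) is one common choice rather than something prescribed here.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    # Whole-number share of each stratum (rounded down, integer arithmetic).
    allocation = {
        name: size * sample_size // total for name, size in strata_sizes.items()
    }
    # Hand out the units lost to rounding down, largest remainder first.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(
        strata_sizes,
        key=lambda name: strata_sizes[name] * sample_size % total,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


strata = {
    "children (0-12)": 142,
    "youth (13-25)": 157,
    "young adults (26-34)": 413,
    "adults (35-60)": 701,
    "elderly (60+)": 123,
}
allocation = proportional_allocation(strata, sample_size=300)
for name, units in allocation.items():
    print(f"{name}: {units}")
```

A simple random sample of the allocated size would then be drawn within each stratum.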

Simple, systematic and stratified random sampling techniques require a sampling frame, which can be very difficult to obtain for individuals or families. When a sampling frame is not available or the units on the list are so widely dispersed that it would be too time-consuming and expensive to conduct a simple random sample, cluster sampling is a useful alternative.

Cluster random sampling

Cluster random sampling divides the population into several clusters and then selects a simple random sample of the clusters. The units in the selected clusters constitute the sample. Unlike stratified random sampling, cluster random sampling uses the clusters to identify units, not to compare them. A drawback of this approach is that the selected clusters may differ in important characteristics from the ones not included in the sample, thereby biasing the inferences made about the target population.

How to select a cluster random sample

Step 1: Identify the population of interest (for example, 1,536 IDPs at site X in country Y).

Step 2: Divide the population into a large number of clusters (for example, the 10 zones of the IDP camp, with approximately 150 IDPs each).

Step 3: Select a simple random sample of the clusters (for example, randomly sample 2 of the 10 zones, yielding a total sample of about 300).
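The cluster selection above can be sketched in Python. The assignment of IDP numbers to the 10 zones is hypothetical and made up only so that the example runs; in practice, the zone of each unit would come from site records. The function name and seed parameter are also assumptions.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters; every unit in a chosen cluster is sampled."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(clusters), n_clusters))
    units = [unit for cluster_id in chosen for unit in clusters[cluster_id]]
    return chosen, units


# Hypothetical assignment of the 1,536 IDP numbers to 10 zones of about 150 each.
zones = {zone: [] for zone in range(1, 11)}
for idp_number in range(1, 1537):
    zones[(idp_number - 1) % 10 + 1].append(idp_number)

chosen_zones, sample = cluster_sample(zones, n_clusters=2, seed=7)
print(chosen_zones, len(sample))
```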

Multistage random sampling

Multistage random sampling is a technique that combines two or more random sampling methods sequentially. The process usually begins by taking a cluster random sample, followed by a simple random sample or a stratified random sample. Multistage random sampling can also combine random and nonrandom sampling techniques.

Example: In country Z, there are 7 IDP sites. In order to assess the support being provided by IOM to the IDP populations in these locations, purposefully select 2 of the 7 sites according to a set of criteria. Within each of the two sites, there are 8 zones with about 60 IDPs each. Randomly select 2 zones from each of the two IDP sites, yielding a total sample of 240 IDPs.
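The two stages of this example can be sketched in Python. The site names are hypothetical placeholders for the two purposefully selected sites, and the figure of 60 IDPs per zone is the approximate value given in the example. Only the second stage is random.

```python
import random

ZONES_PER_SITE = 8
IDPS_PER_ZONE = 60  # approximate figure from the example


def draw_zones(selected_sites, zones_per_site_drawn, seed=None):
    """Stage 2: randomly draw zones within each purposefully selected site."""
    rng = random.Random(seed)
    return {
        site: sorted(rng.sample(range(1, ZONES_PER_SITE + 1), zones_per_site_drawn))
        for site in selected_sites
    }


# Stage 1 (non-random): two of the seven sites chosen against agreed criteria.
purposeful_sites = ["Site A", "Site B"]  # hypothetical names

plan = draw_zones(purposeful_sites, zones_per_site_drawn=2, seed=1)
expected_sample_size = sum(len(z) for z in plan.values()) * IDPS_PER_ZONE
print(plan, expected_sample_size)
```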

Non-random sampling

Purposeful sampling

Purposeful sampling is when a sample is selected according to a set of predetermined criteria that are believed to provide the data necessary for monitoring/evaluating the project, programme or policy under review. Unlike random sampling, this sampling technique is mainly used when only a limited number of persons have the required information and when the time and resources to collect it are limited. In emergency settings such as conflict-affected societies, this approach may also be more appropriate, as taking a random sample may risk aggravating tensions.

Unlike random sampling, purposeful sampling is deliberately biased in order to select the most appropriate cases for the monitoring/evaluation questions posed. Thus, if this sampling technique is used, it is necessary to be transparent and rigorous when selecting a sample to control for and identify any potential bias in the data that need to be considered in the interpretation of results. For a discussion on how to select purposeful samples, see Buchanan-Smith et al.'s Evaluation of Humanitarian Action Guide (2016).

Snowball sampling

A snowball sample is another form of purposeful sampling that is used when it is not known who, what or how many perspectives to include. Begin with an interview and then ask the interviewee to identify other potential respondents to talk to. This process is continued until a point of saturation is reached. Saturation is the point in the data collection process where no new or relevant information emerges that addresses the monitoring/evaluation questions posed. Snowball sampling is particularly useful when trying to reach populations that are inaccessible or difficult to locate.

When using purposeful sampling, a variety of different perspectives should be included in the sample to ensure the credibility of the findings, that is, data source triangulation. For instance, roles, gender, ethnicity, religion, geographic location and other factors important to the problem being addressed by the project, programme or policy under review should be taken into account to capture diverse perspectives in the sample. When this is not possible, the monitoring/evaluation report should be transparent about the perspectives that were included and those that were not.

Quota sampling

Quota sampling, also a purposeful sampling technique, consists of selecting a specific number of different types of units from a population non-randomly according to some fixed quota. The quota sampling can be proportional (to represent the major characteristics of the target population by sampling a proportional amount of each) or non-proportional (to specify the minimum number of sampled units in each category without considerations for having numbers that match the proportions in the target population). For example, in the IDP site, it is possible to select 150 women and 150 men to interview (n = 300, proportional) or 200 IDPs, 50 returnees and 50 refugees (n = 300, non-proportional).

Convenience sampling

Convenience samples are selected based on the availability or self-selection of participants (that is, volunteers) and/or the researcher’s convenience. While this technique is neither purposeful nor strategic, it remains widely used because it is inexpensive, simple and convenient. An example would be arriving in a project target location and interviewing the project staff available on the day of the visit and the beneficiaries encountered while walking through different parts of the location. However, this sampling technique is also the least reliable of the non-random sampling approaches presented above, as those most available are over-represented, thereby underestimating the variability that exists within the target population. If this technique is used, quotas can help ensure that the sample is more inclusive. For instance, to sample an equal number of men and women, every second interview can be conducted with a woman.

RESOURCES

References and further reading on sampling

Buchanan-Smith, M., J. Cosgrave and A. Warner

2016 Evaluation of Humanitarian Action Guide. ALNAP/ODI, London.

Daniel, J.

2012 Sampling Essentials: Practical Guidelines for Making Sampling Choices. SAGE Publications, Thousand Oaks.

Hand, D.J.

2008 Statistics: A Very Short Introduction. Oxford University Press, New York.

Magnani, R.

1999 Sampling Guide. Food Security and Nutrition Monitoring (IMPACT) Project, International Science and Technology Institute for the U.S. Agency for International Development. Arlington, Virginia, January.

Morra-Imas, L.G. and R.C. Rist

2009 The Road to Results: Designing and Conducting Effective Development Evaluations. World Bank, Washington, D.C.

National Audit Office, United Kingdom

2000 A Practical Guide to Sampling. Statistical and Technical Team.

Sackett, D.L.

1979 Bias in analytic research. Journal of Chronic Diseases, 32:51–63.

Stockmann, R. (ed.)

2011 A Practitioner Handbook on Evaluation. Edward Elgar, Cheltenham and Northampton.

Trochim, W.M.K.

2020a Probability sampling. Research Methods Knowledge Base.
2020b Nonprobability sampling. Research Methods Knowledge Base.

Wholey, J.S., H.P. Hatry and K.E. Newcomer (eds.)

2010 Handbook of Practical Program Evaluation. Third edition. Jossey-Bass, San Francisco, California.

Survey design and implementation

Question format, types and response options

The format of survey questions can be closed-ended or open-ended. Many surveys tend to include a mixture of both. Closed-ended questions can be answered with a simple response or a selection of response options. Common formats for closed-ended questions include response options that are:

Dichotomous
Example: Do your children go to school? □ Yes □ No
Ordered

Example: What is the frequency of food (or cash/vouchers) distribution at the site?

□ Every day □ Twice a week □ Once a week □ Every two weeks □ Once a month

□ Irregular □ Never □ Unknown □ No answer, why? _______________
Unordered

Example: Reason for displacement

□ Conflict □ Flood □ Fire □ Drought □ Landslide □ Other
Fill in the blank

Example: How much money do you earn per day? ____

For more information and examples on the different formats for closed-ended questions, see David and Sutton’s chapter on “Survey design” in Social Research (2011).
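When a survey is administered on a mobile device or entered into a database, the closed-ended formats above can be represented as simple records with a validity check. The following is a minimal sketch only: the class name, field names and validation rule are assumptions made for illustration and do not describe any particular data collection tool.

```python
from dataclasses import dataclass, field


@dataclass
class ClosedQuestion:
    text: str
    kind: str  # "dichotomous", "ordered", "unordered" or "fill_in"
    options: list = field(default_factory=list)

    def is_valid_answer(self, answer):
        """Fill-in questions accept any non-empty text; others only listed options."""
        if self.kind == "fill_in":
            return bool(str(answer).strip())
        return answer in self.options


school = ClosedQuestion("Do your children go to school?", "dichotomous", ["Yes", "No"])
reason = ClosedQuestion(
    "Reason for displacement",
    "unordered",
    ["Conflict", "Flood", "Fire", "Drought", "Landslide", "Other"],
)
income = ClosedQuestion("How much money do you earn per day?", "fill_in")
```

Restricting answers to the listed options at the point of entry is one way to reduce data entry errors for closed-ended questions.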

In contrast, open-ended questions enable respondents to answer in their own words. For example, “Why did you specifically choose this destination country?”

There are six different types of questions that can be used to design a survey (Patton, 1987). There are other types/categories of questions that are used for interviews and that are further developed in Annex 4.7.

(a) Behaviour/experience questions explore what the respondents do/aim to do, how they behave/act, what they experience/encounter, how they respond or just what happens or occurs in the context in which the project, programme or policy is being implemented. Example: Do you intend to move to a different location?

(b) Opinion/value questions explore the respondents’ thoughts, reactions, impressions, attitudes and outlook on the activities/issues being monitored/evaluated.

Example:

How would you describe your household’s access to public services, such as education, shelter, health and other services in the area in which you currently reside?

□ Excellent: We experience no problems whatsoever.

□ Good: Access is good but we experience minor delays.

□ Neutral.

□ Bad: We experience delays and problems.

□ Very bad: There are delays and denial of access from local community and authorities.

(c) Feeling questions explore the respondents’ emotions or emotional reactions about a particular experience, thought or issue.

Example:

Do you consider yourself locally integrated?

□ Yes □ Partially integrated □ No □ I do not know

(d) Knowledge questions inquire about the respondent’s knowledge about a particular issue. Example: How many members of your household suffer from a chronic illness?

(e) Sensory questions explore what the respondent sees, hears, touches, tastes and smells. Example: Is there enough lighting in the site at night?

(f) Background/demographic questions elicit biographical or historical information from the respondents.

Example:

Sex: □ Male □ Female

When developing survey questions, it is important to avoid questions that are ambiguous, double-barreled or leading, or that contain double negatives. An ambiguous question is one that the respondent can interpret in several different ways because it uses a vague term or phrase. A double-barreled question asks about two different issues at the same time. A question that contains “and” or “or” is an indication that the question may be double-barreled. A leading question suggests a certain answer to the respondent. Double negatives in a question introduce unnecessary confusion, thereby increasing the risk of gathering unreliable data. Avoiding these mistakes helps ensure that the data collected is valid and reliable.

Example of ambiguous questions:

(a) Poor: Would you be willing to relocate? □ Yes □ No

Good: Would you be willing to relocate (select one option):

□ Within the same state
□ Out of the state (specify):__________
□ Within the same country
□ Out of the country (specify):________
□ No relocation
□ Don’t know

(b) Poor: How did you hear about the project? (Double-barreled: Provides two answer options in one)

□ A friend or relative
□ A newspaper
□ Television or radio
□ Your spouse
□ Work

Good: How did you hear about the project?

□ A friend
□ A relative
□ A media source

Example of double-barreled questions:

Poor: In your opinion, how would you rate the health and education services available at the site?

□ Excellent
□ Good
□ Fair
□ Poor

Good: In your opinion, how would you rate the health services available at the site?

□ Excellent
□ Good
□ Fair
□ Poor

In your opinion, how would you rate the education services available at the site?

□ Excellent
□ Good
□ Fair
□ Poor

Example of leading questions:

Poor: More people have participated in activity X than any other project activity. Have you participated in this activity?

□ Yes □ No

Good: Have you participated in activity X?

□ Yes □ No

Example of double-negative questions:

Poor: How do you feel about the following statement? “We should not reduce military spending.”

□ Strongly agree
□ Agree
□ Disagree
□ Strongly disagree

Good: How do you feel about the following statement? “We should reduce military spending.”

□ Strongly agree
□ Agree
□ Disagree
□ Strongly disagree
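Some of the wording checks described above can be partly automated when reviewing a long draft questionnaire. The sketch below is a rough screening aid under simple assumptions: it flags questions containing “and” or “or” (a possible sign of a double-barreled question) and questions containing a negation. The patterns and the function name are illustrative, and a flagged question still needs to be reviewed by a person.

```python
import re

# Heuristic patterns only; they cannot judge meaning, so every flag needs human review.
CONJUNCTION = re.compile(r"\b(and|or)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(not|no|never)\b|n['’]t", re.IGNORECASE)


def wording_flags(question):
    """Return a list of possible wording problems found in a question."""
    flags = []
    if CONJUNCTION.search(question):
        flags.append("possible double-barreled question")
    if NEGATION.search(question):
        flags.append("contains a negation; check for confusing wording")
    return flags


draft = [
    "In your opinion, how would you rate the health and education services available at the site?",
    "How do you feel about the following statement? We should not reduce military spending.",
    "Have you participated in activity X?",
]
report = {q: wording_flags(q) for q in draft}
```

Ambiguous and leading questions cannot be detected this way, which is one reason why review by stakeholders and pretesting remain necessary.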

Sequencing questions

The order in which the survey questions are asked can affect the quality of the responses received, as well as whether or not the survey will be completed by the respondents. For example, if a survey is started with highly sensitive questions, the participants may provide inaccurate information, may skip the questions or drop out of the survey altogether. The following are a few tips on sequencing survey questions:

Begin with questions that are of interest to the respondents.
Ask questions about the present before questions about the past or future.
Spread out fact-based questions throughout the survey, as these tend to disengage respondents.
Begin with easier questions and place more difficult or sensitive questions near the end.
Place personal/demographic questions at the end, as some respondents may feel uncomfortable continuing with the survey if they are asked certain background questions in the beginning.
Group similar or related questions together.
Ensure there is a logical flow to the questions.
Ask the most important questions by two thirds of the way through.
Prioritize questions, dropping those which are of low priority.

RESOURCES

References and further reading on survey question format, types and response options

Bradburn, N.M., B. Wansink and S. Sudman

2004 Asking Questions: The Definitive Guide to Questionnaire Design – For Market Research, Political Polls, and Social and Health Questionnaires. Revised edition. Jossey-Bass, San Francisco, California.

Cowles, E.L. and E. Nelson

2015 An Introduction to Survey Research. Business Expert Press, New York.

David, M. and C.D. Sutton

2011 Survey design. In: Social Research: An Introduction. Second edition. SAGE Publications, London, pp. 239–270.

Fowler Jr., F.J.

2006 Improving Survey Questions: Design and Evaluation. SAGE Publications, Thousand Oaks.

International Program for Development Evaluation Training (IPDET), World Bank and Carleton University

2013 IPDET Handbook.

Morra-Imas, L.G. and R.C. Rist

2009 The Road to Results: Designing and Conducting Effective Development Evaluations. World Bank, Washington, D.C.

Patton, M.Q.

1987 How to Use Qualitative Methods in Evaluation. SAGE Publications, California.

Layout

Similar to the sequencing of survey questions, the layout of the survey is also important for gathering high-quality data (especially for self-administered surveys). The following are some of the main points to consider when laying out a survey:

Title, date and respondent identification number;
Contact and return information;
Introductory statement/cover letter that indicates the purpose of the survey and explains how it will be used;
Limit the number of headings for the topics being covered;
Keep questions simple, short and concise;
Questions and response choices should be formatted and worded consistently;
Each question should be numbered;
Provide instructions on how to answer each question (for example, choose one or all options that apply, tick a box, circle the answer or write a short answer);
Order responses from a lower level to a higher level, moving from left to right (for example, 1 – Not at all; 2 – Sometimes; 3 – Often; 4 – Always).

Introductory statement/cover letter

The introductory statement/cover letter should be brief and easy to understand. The following are the main points that are generally included:

Purpose and importance of the survey;
Identification of the organization conducting the survey;
Importance of the respondent’s participation;
Explanation of why the respondent was selected to participate in the survey;
Approximate time it will take to complete the survey;
Assurance that the information will remain anonymous and confidential;
Appreciation for the respondent’s time and effort;
A person’s name and contact details for further enquiries;
An offer for feedback of the survey results.

Building rapport

At the beginning of a survey (or an interview, see section on Interviews), it is important to establish a good rapport with the respondent. Rapport is the ability to relate to the respondent in a way that creates a level of trust and understanding. The following are ways to build good rapport with the respondent:

(a) Make a good first impression.

Upon arriving at the meeting place to carry out the survey, make every effort to put the respondent at ease. With a few well-chosen words, it is possible to put the respondent in the right frame of mind for participating in the survey. Begin with a smile and a greeting such as “Good morning” or “Good afternoon” and then proceed with the introduction.

(b) Obtain the respondent’s consent.

(c) Assure anonymity and confidentiality of responses.

If the respondent is hesitant about responding to a question or asks what the data will be used for, explain that the information collected will remain confidential, no individual names will be used for any purpose, and all information will be grouped together when writing any reports. Also, it is advisable not to mention other surveys conducted or show completed survey forms to other enumerators in front of a respondent or any other person.

(d) Always have a positive approach.

(e) Interview the respondent alone.

The presence of a third person during the survey can prevent the respondent from giving frank, honest answers. Therefore, it is very important that the survey be conducted privately and that all questions be answered by the respondent. If other people are present, explain to the respondent that some of the questions are private and ask to interview the person in the best place for talking alone. Sometimes, asking for privacy will make others more curious, so they will want to listen. Establishing privacy from the beginning will allow the respondent to be franker and more attentive to questions.

If it is impossible to get privacy, the interview can still be carried out with the other people present. In this case, separate the respondent from the others as much as possible.

(f) Use participants as experts.

Individuals are invited to participate in a study because they are viewed as possessing important knowledge required for monitoring/evaluating a specific project, programme or policy. It is therefore advisable to let participants know that the aim of the survey is to learn from them. Expressing this to participants helps to establish a respectful appreciation for the valuable contributions that they will make to the monitoring/evaluation exercise.

RESOURCES

References and further reading on survey layout

King, N. and C. Horrocks

2018 Interviews in Qualitative Research. SAGE Publications, London.

Mack, N., C. Woodsong, K.M. MacQueen, G. Guest and E. Namey

2005 Qualitative Research Methods: A Data Collector's Field Guide. Family Health International, North Carolina.

Reviewing, translating, pretesting and piloting the survey

Once the initial version of a survey is drafted, it is recommended to engage the key stakeholders and local experts in reviewing the draft closely and to amend it based on the feedback received. This process may have to be repeated several times before the survey is ready for pretesting. If the survey will be conducted in the local language, the survey needs to be translated prior to the pretest. The translator(s) should be fluent in both languages and, to the extent possible, be familiar with the issues being covered in the survey. Once the survey is translated into the local language, a second translator should translate it back into the original language. This process helps ensure an accurate translation. Any gaps or misunderstandings need to be addressed before the survey can be pretested.

When conducting a pretest, it is important to test the survey among a group of people from diverse backgrounds relevant to the issues being monitored/evaluated. The goal of a pretest is to ensure that the survey is collecting the information it is intended to collect. A good pretest should look at the survey at three levels:

As a whole: Are all sections of the survey consistent? Are there any sections that ask the same question?
Each section: If the survey has more than one section, does each section collect the intended information? Are all major activities/issues being monitored/evaluated accounted for? Are there any questions that are not relevant?
Individual questions: Is the wording clear? Is the translation correct? Does the question allow ambiguous responses? Are there alternative interpretations?

One approach to assessing the survey on these three levels is to sit down with a small number of respondents as they complete the survey and ask them to reason aloud as they fill it out. This process, although time intensive, can yield important insights about how potential respondents will interpret the questions being asked. Any question that is misunderstood should be revised and tested again. This process may need to be repeated several times, especially if the survey has been translated into another language, which can result in gaps or misunderstandings regarding the accurate translation.

Once the survey has been finalized, it should be piloted with an appropriate sample of potential respondents. This will provide a good indication of the validity and reliability of the questions in the survey. The pilot is also an opportunity to practice implementing the survey, which can help identify any potential challenges that may be encountered during the actual data collection.

RESOURCES

References and further reading on reviewing, pretesting and piloting the survey

Morra-Imas, L.G. and R.C. Rist

2009 The Road to Results: Designing and Conducting Effective Development Evaluations. World Bank, Washington, D.C.

International Program for Development Evaluation Training (IPDET), World Bank and Carleton University

2013 IPDET Handbook.

tools4dev

n.d. How to pretest and pilot a survey questionnaire.

Survey example

Excerpts from the IOM South Sudan post-distribution monitoring of non-food items, Wau, September 2017

A mobile phone-based post-distribution monitoring household questionnaire was used to carry out household data collection and included visual observation and recording (photos) of the usage and state of distributed items. From 3,507 households, an initial sample size of 88 households was calculated using the sample size calculator referred to in the South Sudan Shelter-Non-food Item (S-NFI) Cluster Guidelines. Moreover, as per the recommendation stipulated in the guidelines, 20 per cent of the initial sample size was added to account for any spoiled surveys or improper data entry. Therefore, 17 households (20% of the initial sample size) were added, totalling a target sample size of 105 households. Confidence was 95 per cent with a 10 per cent margin of error. With three days to survey households, each enumerator collected data from seven households per day over three days. One enumerator interviewed an additional two households, resulting in a total sample size of 107 households.
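The arithmetic behind these figures can be retraced in a few lines. This sketch takes the initial sample of 88 households as given (it does not reproduce the cluster's sample size calculator) and assumes that the 20 per cent buffer is rounded down, which is the rounding consistent with the stated target of 105 households. The number of enumerators is not stated in the excerpt; it is inferred here from the workload of seven households per day over three days.

```python
from math import ceil, floor

initial_sample = 88   # reported output of the sample size calculator
buffer_rate = 0.20    # share added for spoiled surveys or improper data entry
per_day = 7           # households surveyed per enumerator per day
days = 3              # survey days available

# 20% of 88 is 17.6; rounding down (assumption) gives 17 extra households.
buffer = floor(initial_sample * buffer_rate)
target_sample = initial_sample + buffer

# Each enumerator can cover 21 households over the three days.
per_enumerator = per_day * days
enumerators_needed = ceil(target_sample / per_enumerator)

# One enumerator interviewed two additional households.
achieved_sample = enumerators_needed * per_enumerator + 2
```

Under these assumptions the target is 105 households, five enumerators are needed and the achieved sample is 107 households, which matches the totals reported in the excerpt.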

Introduction

Please introduce yourself and the purpose of the visit to the interviewee clearly.
Please confirm the interviewee did receive the shelter and NFI items (blankets, mosquito nets, collapsible jerrycans and kangas) in the IOM distribution in May 2017.
Please seek the consent of the interviewee before proceeding with the questionnaire, telling him/her that the interview will take about 30 minutes of their time.
Please explain that you are not going to provide any additional items but that the information provided is only to help improve distributions in the future.
Please try to keep the interview as confidential as possible to avoid bias. This may mean asking bystanders politely to move away, and/or finding a space where people are not able to overhear.
Please confirm that the head of household and/or the individual who was registered and who collected the items at distribution time is the person you are interviewing.

The following are a few of many possible examples and show only a short segment of a survey.

Did you receive NFI from IOM distribution during May 2017?

□ Yes □ No

1a. Select the State:

□ Abyei Administrative Area □ Central Equatoria □ Eastern Equatoria □ Jonglei

1b. Location of interview: ____________________________

2. Name of enumerator: ___________________________

3a. Name of respondent (Beneficiary name):___________________

3b. Please mark sex of the respondent: □ Male □ Female

3c. What is your household size?

□ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ More than 10

4a. Did you feel you received the items in time to respond to your needs?

□ Yes, timely □ Delayed □ Too late

4b. What was the quality of the sleeping mat?

□ Good □ Average □ Not good

4c. Was the distribution well organized?

□ Excellent □ Good □ Averagely organized □ Poorly organized □ Badly organized

4d. What changes have you experienced since the items were distributed to you? (for example, protection from malaria, more comfortable sleep)

Interview structure and questions

Parts of the interview and types of questions

Table A4.1. Overview of interview structure and types of questions

Opening	Body		Closing
Building rapport questions	Generative questions	Directive questions	Wrap-up questions
Open-ended experience; Factual questions	Tour; Hypothetical; Behaviour and action; Compare–contrast; Motives	Closed-ended; Typology; Member reflection; Potentially threatening	Catch-all; Identity-enhancing; Demographic
Informed consent			Informed consent

The opening

During the first few minutes of an interview, it is advisable to inform the participant about the interview by using an introductory statement similar to that for a survey. The length of the interview can be confirmed by saying something like this:

“The interview should last about an hour. Does that still work for you? While I will encourage you to elaborate on your answers to the questions that I will ask, there may be times when I redirect, so that we may be sure to cover all the issues within the hour.”

The first questions should then focus on building rapport, helping the participants feel comfortable and knowledgeable. Therefore, questions should be non-intimidating, open-ended, easy and inviting, such as open-ended experience questions that will prompt the respondents to tell stories (for example, “What can you tell me about Project X?”) or factual questions about the issues being monitored/ evaluated (for example, “What basic services are available in your community?”).

The body

After the opening, you can begin to ask generative questions, which are non-directive, non-intimidating questions aimed at generating frameworks for talk. Tour questions ask the interviewee to share familiar descriptive knowledge or memories about an activity or event; for example, “Can you describe a typical day for me?” Tour questions can be followed with probes by asking for examples or timeline questions; for example, “What were the events that led up to you leaving your home?” Hypothetical questions ask interviewees to imagine their behaviours, actions, feelings or thoughts in certain situations; for example, “Imagine you receive X amount of money; how would you use it?” Behaviour and action questions can also be asked, as well as compare–contrast questions; for example, “How is your daily routine similar to or different from the daily routine you had before leaving your home?” Finally, questions can be about motives (of the interviewee and/or those of another person). Such questions include asking about feelings, actions or behaviours, worded in the format of “why” or “how” questions. After asking about past and present experiences, future prediction questions can be asked to obtain further related information, for example, “Where do you see yourself living in the near future?”

To obtain specific information, directive questions are used. The simplest type of directive question is the closed-ended question, which has predetermined single, dichotomous or multiple response options. There are also typology questions where respondents are asked to organize their knowledge into different types or categories; for example, “What types of recreational activities do you engage in on a regular basis?” Prompts can be used to encourage participants to articulate a range of categories or types. To invite the participant to comment on the data collected thus far, member reflection questions can be used; for example, “On the basis of my fieldwork so far, it seems that one reason for … is … What do you think about my interpretation here?” If potentially threatening/intimidating questions need to be asked (such as personal or political ones), these should be left for the end of the interview, as they may be less problematic once good rapport has been built with the participants.

The closing

Several types of questions exist for closing an interview. Catch-all questions can be used to capture and tie together loose ends or unfinished stories; for example, “What question did I not ask that you think I should have asked?” The end of an interview is also the time to ask identity-enhancing questions, which encourage the interviewee to leave the discussion feeling appreciated and like an expert. For instance, “What did you feel was the most important thing we talked about today and why?” Answers to these questions can also guide future interviews. When and how to ask demographic questions remains debated. Whereas some researchers and practitioners believe they should be asked at the beginning, in case the participant terminates the interview prematurely, others find they can interfere with building rapport. At the end of the interview, remember to thank the interviewees and reassure them of anonymity and confidentiality.

RESOURCES

References and further reading on interview structure and types of questions

Tracy, S.

2013 Qualitative Research Methods: Collecting Evidence, Crafting Analysis, Communicating Impact. Wiley-Blackwell, West Sussex.

Formulating interview questions

Good quality interview questions should have the following characteristics:

Simple and clear (no acronyms, abbreviations or jargon);
Not double barreled;
Promote open-ended and elaborate answers;
Note: If it is decided to include yes/no questions, these should be followed by “Why?” or “In what ways?” or they should be reworded to encourage a more fine-grained answer (for example, “To what extent is…”).
Straightforward (no double negatives), neutral and non-leading;
Non-threatening to the interviewee;
Accompanied by appropriate probes.

Probes

Probes are responsive questions asked to clarify what has been raised by the respondent. The aim is to obtain more clarity, detail or in-depth understanding from the respondent on the issue(s) being monitored/evaluated.

Examples of clarifying probes:

Did I understand you correctly that…?
When you say … what exactly do you mean?

Examples of detail probes:

How did you deal with …?
Can you tell me more about …?

Examples of analytical probes:

How would you characterize …?
What is/was important about …?

Examples of variations and counterfactual probes:

Would you deal with X in the same way the next time?
Some of the people I have talked to said that … What is your take on this?

Reviewing, translating, pretesting and piloting the interview

Similar to surveys, interviews too should be reviewed, translated (if necessary), pretested and piloted. For a review on how to proceed, refer to the subsection and annex on “Surveys”.

Tips for conducting interviews

Let the interviewee know about the purpose and timing of the interview, the reason for being interviewed, how they were selected, how the data will be used (anonymity and confidentiality), how long the interview will take, whether they will receive a copy of the report, and that summary notes of the interview will be available if desired.
Pick a time and location that is safe, easily accessible, quiet and free from distractions.
Have a note-taker and/or record the interview, when feasible.
If taking notes, make sure not to be distracted from the conversation:
- Maintain eye contact as much as possible;
- Write key words and phrases (not verbatim);
- If more time is needed to capture a certain response, ask the interviewee to repeat it;
- For important responses, ask the interviewee if their exact words can be used/quoted.
Stick to the selected interview approach (structured, semi-structured, unstructured).
Establish rapport and avoid asking sensitive questions until the interviewee appears comfortable.
Give the interviewee sufficient time to respond.
Be aware of cultural norms with eye contact, gender issues and asking direct questions.
Write up the interview notes as soon as possible after the interview.

RESOURCES

References and further reading on reviewing, translating, pretesting and piloting the interview

Kvale, S.

1996 InterViews: An Introduction to Qualitative Research Interviewing. SAGE Publications, Thousand Oaks.

Oxfam

2019 Conducting semi-structured interviews. Oxford.

Rubin, H.J. and I.S. Rubin

1995 Qualitative Interviewing: The Art of Hearing Data. SAGE Publications, Thousand Oaks.

Spradley, J.P.

1979 The Ethnographic Interview. Holt, Rinehart and Winston, New York.

Interview example

Evaluation: IOM Timor-Leste Counter Trafficking Project – Interview protocol for stakeholders and project

Relevant questions were drawn from the comprehensive list below, depending on the respondent’s role (such as project implementer or stakeholder) and areas of competency.

I have been requested to conduct an evaluation of the IOM project titled “Strengthening government and service provider responses to human trafficking in Timor-Leste: A capacity-building initiative”. The objectives of the evaluation are as follows: (a) measure the progress made by the project; (b) identify any challenges faced in the implementation of this project; and (c) identify lessons learned, best practices and potential areas for future project design and implementation. The evaluation focuses on the activities conducted under this project specifically, and not on IOM’s entire programme of activities in the country. The key respondents in this evaluation are IOM staff involved in project implementation, IOM’s implementing partner organizations, beneficiaries of the project’s activities, and government and civil society stakeholders. Individual responses will be kept confidential, and we will only share generalized findings and anonymous comments.

Thank you for your time and cooperation in this process!

Background information

(a) What is your title, role and your responsibilities in relation to the IOM project?

(b) How long have you been in this position?

Relevance

(c) To what extent are the objectives of the programme still valid?

(d) To your knowledge, what are the main activities and outputs of the programme?

(e) Are the activities and outputs of the programme consistent with the outcomes and attainment of its objective?

(f) In your view, what are the assumptions linking the activities, outputs, outcomes and objectives?

Effectiveness

(g) Has your organization been involved in the implementation of any of the activities of this project?

(i) If yes, which ones?

(ii) If no, have you heard anything about the implementation of these activities? If yes, which ones?

(iii) What have been key achievements of these activities?

(iv) In your experience, what types of activities have been most or least successful? Why?

(v) What have been the key factors that have positively or negatively affected your work (or others’ work) in this project?

Sustainability

(h) What factors will contribute to or impede the continuation of the project’s achievements after the end of the project?

(i) To what extent have project outputs been institutionalized? For example, have any guidelines, terms of reference, policy documents, legislation or other items been adopted by national institutions?

Project progress (for implementers)

(j) What do you consider as the project’s key achievements to date?

(k) What have been key disappointments?

(l) To what extent is the project’s implementation on schedule? Why?

(m) What have been strengths and weaknesses in the implementation of the pilot initiative? (such as in terms of timeliness, managing risks, partners and resource allocation)

(n) What key lessons have been learned to date from implementing the pilot initiative? What recommendations or suggestions can you make with regard to the remaining implementation period of the project? Beyond the project?

Preparing, conducting and moderating a focus group

Preparing for a focus group discussion

An average focus group discussion involves 6 to 8 people, with a maximum of 15, and lasts between one and two hours. When selecting the participants for a focus group discussion, it is important that the group is homogeneous so that participants feel more comfortable expressing their opinions. To select homogeneous groups of participants, consider, among other things, the following:

Gender: Will men and women feel comfortable discussing this topic in a mixed-gender group? For example, women might feel uncomfortable discussing issue X if men are in the group.
Age: Will age affect the way that people react to this topic? For example, a young person might feel uncomfortable talking about issue X if older people from his community are in the group.
Hierarchy: Will people of different hierarchical positions be able to discuss this topic equally? For example, a resident from village A might feel uncomfortable discussing issue X if the local administrator is in the group (Humans of Data, 2017).

Other considerations include whether government officials among the participants may influence the answers, and whether cultural differences may affect them (for example, when mixing indigenous people with members of non-indigenous communities).
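The grouping considerations above can be sketched as a simple partition of a participant list. This is an illustration only: the participant records, the attribute names and the cap of eight people per group are assumptions, and actual group composition should be decided with knowledge of the local context.

```python
from collections import defaultdict


def homogeneous_groups(participants, keys, max_size=8):
    """Group participants sharing the same values for keys, in chunks of max_size."""
    buckets = defaultdict(list)
    for person in participants:
        profile = tuple(person[k] for k in keys)
        buckets[profile].append(person)
    groups = []
    for profile, members in buckets.items():
        # Split large buckets so that no discussion group exceeds max_size people.
        for start in range(0, len(members), max_size):
            groups.append((profile, members[start:start + max_size]))
    return groups


# Hypothetical participant list (assumption): 10 women and 3 men.
participants = [{"id": i, "gender": "female"} for i in range(10)]
participants += [{"id": 10 + i, "gender": "male"} for i in range(3)]
groups = homogeneous_groups(participants, keys=["gender"])
```

Simple chunking can leave a last group that is too small for a useful discussion, in which case groups with the same profile would need to be rebalanced by hand.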

RESOURCE

References and further reading on preparing for a focus group discussion

Humans of Data

2017 How to conduct a successful focus group discussion. 11 September.

Once the participants are selected, their informed consent must be obtained, either verbally or on a written form. The location should be quiet, secure and easily accessible to all the participants. If no survey is being conducted and demographic data need to be collected, a short form can be designed and administered to the participants at the end of the focus group discussion.

How to conduct a focus group discussion

Introduction

At the start of the focus group, the moderator should present the aim of the discussion and an overview of the topics that will be covered, assure participants that their responses will remain confidential and anonymous, and lay down the ground rules. The following is an example of an introduction to read at the outset of the focus group to all the participants:

"Thank you for agreeing to participate in this focus group discussion. We will discuss … Let me remind you that the information given will be treated confidentially and that you may refuse to answer any of the questions and end your participation in the discussion at any time.

For the group discussion to proceed smoothly and respectfully for all participants, the following ground rules should be respected at all times by everyone:

One person speaks at a time.
What is shared in the room stays in the room.
Everyone has a chance to share their ideas, experience and opinions.
There are no right or wrong answers.
Everyone actively listens to and respects each other.

In addition to these ground rules, I would like to ask you if you have any other ground rules that you would like to add to the list.”

Warm-up

Before starting with the focus group questionnaire or topic guide, a warm-up time can be spent to make the participants feel comfortable around each other and safe to open up and share their ideas, experience and opinions. To do so, begin the discussion by asking the participants to introduce themselves (for example: “First, I’d like everyone to introduce themselves. Can you tell us your name?”) or conduct an icebreaker exercise (see Example box).

EXAMPLE - Icebreaker example

One-worders: This icebreaker allows the group to be familiar with one another by sharing their thoughts on a common topic. First, divide the participants into subgroups of four or five people by having them number off. This allows participants to get acclimated to the others in the group. Mention to the groups that their assignment is to think of one word that describes X; give the groups a minute to generate a word. After, the group shares the one word that describes X with the entire group of participants. For example, in a session about mobile usability testing, request the group to think about their smartphone and come up with one word to describe it.

Source: eVOC Insights, 2013.

Questionnaire/Topic guide

Similar to interviews, focus groups vary in the extent to which discussions are structured. If there is a strong sense of the issues to be explored, consider developing a questionnaire (the number of questions will depend on the length of the discussion and the number of participants, but should not exceed 10). If there is no thorough understanding of the issue(s) to be explored, consider developing a topic guide that allows the group itself to shape the agenda and the flow of the discussion.

Wrap-up

“What is one thing that you heard here that you thought was really important?”
“Thank you for your time and participation. This has been a very successful discussion.”
“Your opinions will be a valuable asset to… We hope you have found the discussion interesting.”

After the focus group discussion

If the discussion is recorded (which should have been specified to participants at the beginning), transcribe the conversation as soon as possible, so that the specific nuances of the dialogue are not lost.

Tips for moderating a focus group discussion

(a) Actively listen to the participants and remain neutral.

Active listening involves hearing what each person is saying and observing body posture and facial gestures, which can provide insights into appropriate ways to engage participants. It is important to remain as impartial as possible, even when holding a strong opinion about something. If participants sense that the moderator has an opinion, they may change their responses to seem more socially desirable, rather than reflect what they truly believe or feel about a topic.

(b) Show the participants that they are being listened to.

Common signs of paying attention include leaning forward slightly, looking directly at the participants while they are speaking and nodding at appropriate times. Frequently looking away and/or at a watch, or even worse yawning, risks making participants feel that they are not being listened to or that they are boring, which can result in them becoming disengaged from the discussion.

(c) Use silence to encourage elaboration.

Allowing silence at times during the discussion can encourage participants to elaborate upon what they are saying.

(d) Use probes.

When participants give incomplete or irrelevant answers, it is possible to probe for more developed, clearer responses. Some suggested probing techniques are as follows:

Repeat the question;
Pause for the answer;
Repeat the reply;
Ask when, what, where, which and how questions;
Use neutral comments such as “Anything else?”

EXAMPLE

Example of a good probe:

“Could you explain what you mean by…”

Example of a bad probe:

“So you’re telling me that …. Right?”

Using probes for clarification also reinforces the impression that the participants have expert knowledge, based on their direct experiences with the topic(s) or issue(s) being monitored/evaluated. Good probes let the participants know that they are being listened to and that what they have to say is worth knowing more about. It is important to avoid asking leading questions, as these can convey an opinion to participants and suggest that the moderator is not there to learn from them as an unbiased listener. This type of questioning can also lead participants to answer questions in a socially desirable manner.

(e) Keep the participants talking.

To avoid interrupting the participants when there is a need to follow up on something, make a mental note of it and ask them about it once they have finished their thought.

(f) Keep track of time.

Some individuals have a tendency to talk at length about their ideas, experience and opinions. The moderator has to structure the focus group discussion in such a way that it is possible to elicit complete responses from the participants without rushing them, while still respecting the time constraints.

(g) Keep the discussion moving.

When participants share less pertinent or off-topic information, it is possible to politely move the focus group discussion forward, for instance by highlighting something the respondent has said that is relevant to another question or set of questions prepared for the discussion. Another way is to acknowledge that the time available is running out and that other aspects still need to be discussed, and on that basis invite the group to move on.

(h) Reduce the pressure to conform to a dominant point of view.

When an idea is being adopted without any general discussion or disagreement, it is likely that group pressure to conform to a dominant viewpoint has occurred. To minimize this group dynamic, it is suggested to probe for alternative views; for example, “We have had an interesting discussion, but let’s explore other ideas or points of view. Has anyone had a different idea/experience/opinion that they wish to share?”

(i) Note-taking.

Handwritten notes should be extensive and accurately reflect the content of the focus group discussion, as well as any salient observations of nonverbal behaviour, such as facial expressions, hand movements and group dynamics.

RESOURCES

References and further reading on preparing for a focus group discussion

eVOC Insights

2013 Focus group icebreakers. 5 September.

Humans of Data

2017 How to conduct a successful focus group discussion. 11 September.

IOM example – Focus group discussion guide

Focus group discussion guide: Excerpts from the IOM South Sudan post-distribution monitoring of nonfood items (Wau, September 2017).

Two focus group discussions were carried out, one with women (11 participants) and one with men (9 participants). Community mobilizers assisted in gathering the participants. One local interpreter was present to facilitate discussions from Wau Arabic to English and vice versa.

Effectiveness

What items did you receive?
How did you hear about the registration taking place?

Protection

Have you seen changes after receiving the shelter/NFIs? (Specify changes in their relationship with the community, family and others. Did jealousy arise? Did this have any implications on their sense of security? Were there thieves?)

Appropriateness (items)

Did the items you received meet your needs? (Can you rank the items that you needed the most at the time of distribution?)

Coverage

Were all those displaced registered?

Quality of intervention (services provided by the organization)

Were the community, local authorities and beneficiaries involved in the criteria and needs identification of the NFI materials?

Accountability

Was there a complaint desk?

Examples of observations and planning and conducting observations

Examples of observations

Structured observations are conducted using a recording sheet or checklist to list targeted observations.

EXAMPLE

Table A4.2. Example of structured observations during a focus group discussion

Overall self-expression of participants	Resist sharing of thoughts and feelings.	Struggle to share thoughts and feelings.	Express thoughts and feelings during safe conversation topics.	Express thoughts and feelings during difficult conversation topics.
Overall listening of participants	Ignore input from others.	Listen to in-group members but ignore outgroup members.	Listen carefully to input from others.	Question others to get other viewpoints

Semi-structured observations can be conducted using observation guides.

EXAMPLE

Table A4.3. Example of semi-structured observations during a focus group discussion


Verbal behaviour and interactions	Who is speaking to whom and for how long? Who initiates interaction? What is the tone of voice of participants? Do participants wait for their turn to speak? Is the language used by the participants tolerant?
Physical behaviour and gestures	What are participants doing during the focus group discussion? Who is interacting with whom? Who is not interacting? What are the reactions of participants during the discussion; for example, laughing (about what), surprised (about what), disagreeing (about what), upset (about what)?

Unstructured observations are usually conducted when en route to or while walking around the observation site (see example in Table A4.4).

EXAMPLE

Table A4.4. Example of unstructured observations

The following are the indicators the researchers should use to guide their daily observations when in the field.

Governance/Political situation in the community

Visibility of local public services (such as health clinics, schools and different political associations)
Local offices of human rights organizations or “community-building” initiatives (such as promotion of women’s rights and educational and health projects)
Functionality of local authorities (visibility, accessibility by the public and possible personal experiences)
Presence of police or the military on the streets
Visible signs of politics in the community (presence of political parties, campaign posters)
Level of (dis)trust encountered in personal experiences with local persons
General atmosphere in the locality (sense of optimism/pessimism)

Socioeconomic situation in the community

State of local infrastructure (road, public services)
Presence of construction activities (roads, official buildings and private houses)
Presence of (international or local) development agencies/non-governmental organizations

Planning for observations

When planning for observations, it is advised to consider the following steps:

Step 1: Determine what is being monitored/evaluated, that is, identify the indicators being monitored, evaluation criteria and questions being explored.

Step 2: Determine the specific items on which to collect data and how to collect the information needed (recording sheets and checklists, observation guides and/or field notes).

Step 3: Select the sites for conducting the observations.

Step 4: Select and train the observers.

Step 5: Pilot observation data collection procedure(s).

Step 6: Schedule observations appropriately to ensure observing the components of the activity that will answer the evaluation questions (CDC, 2018).

The observation data collection procedure should be piloted before it is used for monitoring/evaluation. To do this, a minimum of two observers should go to the same location and complete the observation sheet. Once completed, the sheets should be compared. If there are large differences in terms of the observations made, the observers may require more training and clarification. If there is little difference, the procedure can be used for monitoring/evaluation.
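The comparison of the two pilot observation sheets can be made more systematic by calculating simple agreement figures. The following is a minimal sketch in Python, assuming that both observers rated the same checklist items on a four-level scale. The item names, the ratings and the choice of percent agreement and Cohen's kappa are illustrative assumptions, not an IOM-prescribed procedure.

```python
from collections import Counter

# Hypothetical ratings (1 to 4) given by two observers to the same checklist items.
observer_a = {"self_expression": 3, "listening": 2, "turn_taking": 4, "tone": 3, "participation": 1}
observer_b = {"self_expression": 3, "listening": 3, "turn_taking": 4, "tone": 3, "participation": 1}


def percent_agreement(a, b):
    """Share of commonly rated items on which both observers gave the same rating."""
    items = sorted(a.keys() & b.keys())
    matches = sum(1 for item in items if a[item] == b[item])
    return matches / len(items)


def cohens_kappa(a, b):
    """Observed agreement corrected for the agreement expected by chance."""
    items = sorted(a.keys() & b.keys())
    n = len(items)
    observed = percent_agreement(a, b)
    counts_a = Counter(a[item] for item in items)
    counts_b = Counter(b[item] for item in items)
    # Chance agreement: probability that both pick the same rating independently.
    expected = sum(counts_a[r] * counts_b[r] for r in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both observers used a single identical rating throughout
    return (observed - expected) / (1 - expected)


agreement = percent_agreement(observer_a, observer_b)
kappa = cohens_kappa(observer_a, observer_b)
print(f"Percent agreement: {agreement:.0%}")
print(f"Cohen's kappa: {kappa:.2f}")
```

What counts as a "large difference" remains a judgement for the M&E team; the sketch only makes the comparison explicit and repeatable.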

Tips for conducting observations

(a) Enter the observation process without preconceived notions and fixed expectations.

(b) Note observations made and information volunteered that are related to subjects beyond formal assessment concerns.

(d) Stay focused to make useful comparisons.

(e) Be active and curious in the observation process, which is about seeing, hearing, smelling, tasting, feeling and touching.

(f) Be aware of what was not seen, such as the absence of people, services and infrastructure.

(g) Respect the local culture.

Do not

(a) Begin the observation process with a set of expectations or seek to record data primarily to prove a pre-existing hypothesis.

(b) Rely on remembering information. Record observations on the observation sheet.

(c) Focus solely on misery and destitution. Be aware of capacities, opportunities and social capital within the affected community.

(d) Be intrusive. Take steps to be as sensitive and respectful as possible.

(e) Take a photograph without asking prior permission.

RESOURCES

References and further reading on observations

US Department of Health and Human Services, Centers for Disease Control and Prevention (CDC)

2018 Data collection methods for program evaluation: Observation. Evaluation Brief no. 16.

Steps for analysing qualitative data

Step 1: Get to know the data

Before beginning to analyse the data, M&E practitioners need to familiarize themselves with them. This process requires reading and rereading the notes taken (the data) in their entirety. As they go through the data, it is important to take notes of any thoughts that come to mind and summarize each transcript and piece of data that will be analysed. The goal at this stage is to absorb and think about the data that have been collected, jotting down reflections and hunches but reserving judgement. Some open-ended questions that can be asked include: “What is happening here?” or “What strikes you?” (Tracy, 2013).

Step 2: Initial round of coding

Once the content of the data is known, practitioners can begin coding the material to condense the information into key themes and topics that can help answer the M&E questions posed.

There are two main approaches to coding qualitative data. The first approach consists of creating a framework that reflects the monitoring or evaluation aims and objectives, which is then used to assess the data gathered. This is a deductive approach, as the concepts and issues of interest are first identified, which allows one to focus on particular answers of respondents and disregard the rest. The second approach is inductive: rather than starting from a predefined framework, codes and themes are derived from the data themselves.

The initial round of coding begins with an examination of the data and assignment of words or phrases that capture their essence. Those who use a manual approach could write the code in the margin, and those who use a word-processing software could type the code in the Comment function or in another column.

The codes assigned in this first round of coding are usually, but not always, also first-level codes. First-level codes focus on “what” is present in the data. They are descriptive, showing the basic activities and processes in the data (such as LAUGHING), and thereby require little interpretation, which is left to the second round of coding.

Throughout the coding process, it is important to continuously compare the data applicable to each code and modify the code definitions to fit new data (or split off the data and create a new code). Both lumping data into large bins and fracturing them into smaller slices have advantages and disadvantages (Bazeley and Jackson, 2013). Those who first fracture the data into small pieces, each with its own code, usually connect these bits into larger categories during later coding cycles. In contrast, those who lump first usually make finer distinctions later.

What data should be coded first? Many qualitative experts suggest first coding the data that are typical or interesting in some way, and then moving on to contrastive data. The initial data texts coded will influence the resulting coding scheme, so it is advised to choose texts in these early stages that represent a range of the data available. Also, an iterative approach does not require that the entire corpus of data be put through a fractured and detailed primary coding cycle. After having read through all the data a few times, and having conducted line-by-line initial coding on a portion, it is possible to consider some focusing activities.

As practitioners engage in the initial round of coding, it is helpful to create a list of codes and a brief definition or representative example of each, especially if the codes are not self-explanatory. As the coding becomes more focused, it is wise to develop a systematic codebook – a data display that lists key codes, definitions and examples that are going to be used in the analysis. Codebooks are like “legends” for the data, helping to meet the challenge of going through pages of transcripts, highlighting and scrawling.

EXAMPLE

A codebook can include the following (Bernard and Ryan, 2010, p. 99):

Short description of code;
Detailed description of code;
Inclusion criteria (features that must be present to include data with this code);
Exclusion criteria (features that would automatically exclude data from this code);
Typical exemplars (obvious examples of this code);
Atypical exemplars (surprising examples of this code);
“Close but no” exemplars (examples that may seem like the code but are not) (Tracy, 2013).
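For practitioners who keep their codebook electronically, the fields listed above can be held in a simple data structure. The following Python sketch is one possible layout, offered as an assumption rather than a required format; the class name, the helper function and the sample entry (built around the LAUGHING code mentioned earlier) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    """One code in the codebook, with the fields listed above."""
    code: str
    short_description: str
    detailed_description: str
    inclusion_criteria: list = field(default_factory=list)
    exclusion_criteria: list = field(default_factory=list)
    typical_exemplars: list = field(default_factory=list)
    atypical_exemplars: list = field(default_factory=list)
    close_but_no_exemplars: list = field(default_factory=list)


def missing_fields(entry):
    """Return the names of codebook fields that are still empty."""
    return [name for name, value in vars(entry).items() if not value]


# Hypothetical entry; the descriptions and excerpts are invented for illustration.
laughing = CodebookEntry(
    code="LAUGHING",
    short_description="Participants laugh",
    detailed_description="Audible laughter by one or more participants during the discussion.",
    inclusion_criteria=["Laughter is noted in the transcript or the field notes"],
    exclusion_criteria=["Smiling without audible laughter"],
    typical_exemplars=["[group laughs] after a joke about queuing"],
    close_but_no_exemplars=["Nervous cough noted by the note-taker"],
)

codebook = {laughing.code: laughing}
print(missing_fields(laughing))  # lists the fields that still need to be completed
```

A check such as `missing_fields` can help to spot codes whose definitions are still incomplete before the second round of coding begins.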

In addition to creating a codebook, it is important to frequently return to the monitoring or evaluation questions posed. As most M&E practitioners face various time and resource constraints, many pursue analysis directions that align not only with themes emerging in the initial coding, but also with ones that mesh well with monitoring/evaluation goals, experience and deadlines.

Throughout the analysis, revisiting research questions and other sensitizing concepts helps to ensure they are still relevant and interesting. Original interests are merely points of departure, and other more salient issues may emerge in the data analysis.

Step 3: Second round of coding

The second round of coding is about beginning to critically examine the codes identified in the initial round of coding and organize, synthesize and categorize them into interpretive concepts. This second round aims to explain, theorize and synthesize the codes from the first round by identifying patterns, rules or cause–effect progressions and making interpretations.

For instance, if codes that continually reappear in the data are identified, M&E practitioners may decide to link them together in a specific way that responds to the monitoring/evaluation question posed.

Accordingly, at this point, a better understanding of which data will be most important for the analysis will emerge. Certain data, even if they are already collected, may only tangentially relate to the questions being explored, and therefore, they will not be included in the analysis at hand. It is also at this point that M&E practitioners will see whether additional data needs to be collected to flesh out an emerging code or explanation of what is being observed in the data collected. One way to identify whether additional data is required is to ask this question: “Does the emerging analysis address the monitoring/evaluation question posed in an interesting and significant way?” If not, this may suggest the need for more data. It might also suggest the need for additional synthesizing activities.

Throughout the coding process, it is important to record the emerging thoughts and ideas systematically. First, create a document that records all the analysis activities chronologically (date and discussion of what was accomplished in terms of analysis). Second, write analytic memos, both as a part of the analysis process and as an analysis outcome. Analytic memos are sites of conversation with oneself about the data. Analytic memos help M&E practitioners figure out the fundamental stories in the data and serve as a key intermediary step between coding and writing a draft of the analysis. Although they can take many forms, analytic memos are often characterized by one or more of the following features:

Define the code as carefully as possible;
Explicate its properties;
Provide examples of raw data that illustrate the code;
Specify conditions under which it arises, is maintained and changes;
Describe its consequences;
Show how it relates to other codes;
Develop hypotheses about the code (Tracy, 2013).

Analytic memos are very helpful for thinking through how codes relate to each other.

Practitioners should also play devil’s advocate with themselves through the process of negative case analysis. Such a practice asks them to actively seek out deviant data that do not appear to support the emerging hypothesis, and then revise arguments so that they better fit all the emerging data. Negative case analysis discourages the practice of cherry-picking data examples that only fit early explanations and ignoring discrepant evidence. As such, negative case analysis helps to ensure the fidelity and credibility of emerging explanations.

In addition to the analytic memos, M&E practitioners should create a loose analysis outline that notes the questions posed and the potential ways the emerging codes are attending to them.

Once the data is coded, it is time to abstract themes from the codes. At this stage, practitioners must review the codes and group them together to represent common, salient and significant themes. A useful way of doing this is to write the code headings on small pieces of paper and spread them out on a table; this process will give an overview of the various codes and also allow them to be moved around and clustered together into themes. Look for underlying patterns and structures, including differences between types of respondents (such as adults versus children and men versus women) if analysed together. Label these clusters of codes (and perhaps even single codes) with a more interpretative “basic theme”. Take a new piece of paper, write the basic theme label and place it next to the cluster of codes. In the final step, examine the basic themes and cluster them together into higher-order and more interpretative “organizing themes”.
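The paper-based clustering described above can also be mirrored electronically. The sketch below, in Python, groups first-round codes under basic themes and then under organizing themes. All code and theme labels are invented for illustration, and the two-level mapping is an assumption about how the clusters might be recorded.

```python
from collections import defaultdict

# Hypothetical first-round codes mapped to basic themes (labels invented).
code_to_basic_theme = {
    "LATE_DISTRIBUTION": "Timeliness of assistance",
    "LONG_QUEUES": "Timeliness of assistance",
    "BLANKETS_TOO_THIN": "Quality of items",
    "ITEMS_SOLD_ON": "Quality of items",
    "NO_COMPLAINT_DESK": "Feedback channels",
}

# Hypothetical basic themes mapped to higher-order organizing themes.
basic_to_organizing_theme = {
    "Timeliness of assistance": "Delivery of the intervention",
    "Quality of items": "Delivery of the intervention",
    "Feedback channels": "Accountability",
}


def build_theme_tree(code_to_basic, basic_to_organizing):
    """Nest codes under basic themes, and basic themes under organizing themes."""
    tree = defaultdict(lambda: defaultdict(list))
    for code, basic in code_to_basic.items():
        organizing = basic_to_organizing[basic]
        tree[organizing][basic].append(code)
    return tree


tree = build_theme_tree(code_to_basic_theme, basic_to_organizing_theme)
for organizing, basics in tree.items():
    print(organizing)
    for basic, codes in basics.items():
        print(f"  {basic}: {', '.join(codes)}")
```

Keeping the mapping in one place makes it easy to move a code to another cluster, much as a slip of paper would be moved across the table.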

RESOURCES

Bazeley, P. and K. Jackson

2013 Qualitative Data Analysis with NVivo. SAGE Publications Ltd., London.

Bernard, H.R. and G.W. Ryan

2010 Analyzing Qualitative Data: Systematic Approaches. SAGE Publications, California.

Tracy, S.

2013 Qualitative Research Methods: Collecting Evidence, Crafting Analysis, Communicating Impact. Wiley-Blackwell, West Sussex.

Calculating descriptive statistics

These descriptive statistics can be easily calculated with Microsoft Excel. To do this, install the Analysis ToolPak add-in: open Excel, go to File | Options | Add-ins and add the Analysis ToolPak. Once this is done, Data Analysis should appear on the far right of the toolbar. The advantage of the Data Analysis tool is that it can do several things at once. If a quick overview of the data is needed, it will provide a list of descriptive statistics that describe the data.

INFORMATION - Steps to use the Excel Data Analysis tool

Step 1. Install the Analysis ToolPak.

Step 2. Check that the Analysis ToolPak is installed.

Step 3. Open the Data Analysis tool.

Step 4. Conduct data analysis.

Step 5. Interpret descriptive statistics.

Source: World Sustainable, 2020.
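The same kind of quick overview can also be produced outside Excel. The following is a minimal sketch using Python's standard statistics module. The household-size data are hypothetical, and the selection of statistics is an assumption about what a quick overview would include rather than a reproduction of Excel's output.

```python
import statistics

# Hypothetical data: household sizes reported by 10 survey respondents.
household_sizes = [3, 5, 4, 6, 5, 2, 7, 5, 4, 9]

summary = {
    "count": len(household_sizes),
    "mean": statistics.mean(household_sizes),
    "median": statistics.median(household_sizes),
    "mode": statistics.mode(household_sizes),
    # Sample standard deviation (divides by n - 1).
    "standard deviation": statistics.stdev(household_sizes),
    "minimum": min(household_sizes),
    "maximum": max(household_sizes),
    "range": max(household_sizes) - min(household_sizes),
}

for name, value in summary.items():
    print(f"{name:>20}: {value}")
```

Comparing the mean with the median is a quick way to see whether a few large values are pulling the average upward.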

RESOURCES

World Sustainable

2020 Easy descriptive statistics with Excel. 3 June.

Types of visualizations

There are multiple ways and tools for visualizing data; the following are some of the most common types, with examples.

Summary table

Summary tables are useful for displaying data in simple, digestible ways. The use of a summary table allows the reader to assess data and note significant values or relationships. Figure A4.1 depicts a summary table of the number of international migrants and their share of the world’s population between 1970 and 2015.

EXAMPLE

Figure A4.1. Example of summary table

Table 1. International migrants, 1970–2015

Year	Number of migrants	Migrants as a % of world’s population
1970	84,460,125	2.3%
1975	90,368,010	2.2%
1980	101,983,149	2.3%
1985	113,206,691	2.3%
1990	152,563,212	2.9%
1995	160,801,752	2.8%
2000	172,703,309	2.8%
2005	191,269,100	2.9%
2010	221,714,243	3.2%
2015	243,700,236	3.3%

Source: IOM, 2017c, p. 15.
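To illustrate how a reader might note significant values in a summary table, the sketch below calculates the percentage change in the number of migrants between the years shown. The figures are copied from Table 1 above; the function and variable names are illustrative, and the calculation is a simple worked example rather than part of the cited report.

```python
# Figures copied from Table 1 above (IOM, 2017c, p. 15).
migrants_by_year = {
    1970: 84_460_125,
    1975: 90_368_010,
    1980: 101_983_149,
    1985: 113_206_691,
    1990: 152_563_212,
    1995: 160_801_752,
    2000: 172_703_309,
    2005: 191_269_100,
    2010: 221_714_243,
    2015: 243_700_236,
}


def percent_change(old, new):
    """Percentage change from an earlier value to a later value."""
    return 100 * (new - old) / old


years = sorted(migrants_by_year)
# Change between each pair of consecutive five-year observations.
changes = {
    (start, end): percent_change(migrants_by_year[start], migrants_by_year[end])
    for start, end in zip(years, years[1:])
}
largest_period = max(changes, key=changes.get)
overall_change = percent_change(migrants_by_year[1970], migrants_by_year[2015])

for (start, end), change in changes.items():
    print(f"{start}-{end}: {change:+.1f}%")
print(f"Largest five-year increase: {largest_period[0]}-{largest_period[1]}")
print(f"Change 1970-2015: {overall_change:+.1f}%")
```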

Facts and figures

Infographics are a useful way to draw attention to important facts and figures in the data. Icons and images, as well as different font sizes, can be used to present the data values in an appealing way that is easily digestible (see Figure A4.2 for an example).

EXAMPLE

Figure A4.2. Example of infographic

Source: IOM, 2020b, p. xii.

RESOURCES

IOM resources

2017c World Migration Report 2018. Geneva.
2020b 2019 Return and Reintegration Key Highlights. Geneva.

Online tools

Noun Project provides access to free icons and images to download and use. Canva and Piktochart provide free and easy-to-use templates for infographics.

Comparison, rank and distribution

Bar charts and heat maps can be used to compare, rank and show the distribution of data values. Bar charts use a horizontal (X) axis and a vertical (Y) axis to plot categorical or longitudinal data. They compare or rank variables by grouping data into bars, with the length of each bar proportional to the value the group represents. Bar charts can be plotted vertically or horizontally. In a vertical column chart, the categories being compared are on the horizontal axis; in a horizontal bar chart (see Figure A4.3), the categories being compared are on the vertical axis. Bar charts are useful for ranking categorical data by examining how two or more values or groups compare to each other in relative magnitude at a given point in time.

EXAMPLE

Figure A4.3. Example of a bar chart

Proportional population change by region, 2009–2019

Source: IOM, 2019b, p. 25.

Histograms are a graphical representation of the distribution and frequency of numerical data. They show how often each different value occurs in a quantitative, continuous data set. Histograms group data into bins or ranges to show the distribution and frequency of each value.

Figure A4.4 shows the proportion of survey respondents reporting any type of exploitation along the Central Mediterranean route, by age group. Here, the ages of the respondents are grouped into “bins”, rather than each individual age being displayed.

EXAMPLE

Figure A4.4. Example of a histogram

Source: IOM, 2019b, p. 22.
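The grouping of values into bins, which underlies any histogram, can be sketched in a few lines of Python. The ages below are hypothetical and are not taken from the survey cited above; the bin width of 10 years and the text-based display are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical respondent ages, invented for illustration.
ages = [17, 19, 22, 24, 25, 27, 28, 31, 33, 34, 36, 41, 45, 52]


def bin_label(age, width=10, start=10):
    """Return the label of the bin (for example '20-29') that an age falls into."""
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width - 1}"


# Count how many respondents fall into each bin.
counts = Counter(bin_label(age) for age in ages)

# Simple text histogram: one '#' per respondent in the bin.
for label in sorted(counts):
    print(f"{label:>5} | {'#' * counts[label]} ({counts[label]})")
```

Changing the bin width changes the shape of the histogram, so it is worth trying more than one width before settling on a display.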

Another approach to visualizing the distribution of the data is to use heat maps. Figure A4.5 shows the concentration of returnees from Pakistan and the Islamic Republic of Iran to three provinces in Afghanistan (Laghman, Nangarhar and Kunar).

EXAMPLE

Figure A4.5. Example of a heat map

Source: IOM, 2020

Note: This map is for illustration purposes only. The boundaries and names shown and the designations used on this map do not imply official endorsement or acceptance by the International Organization for Migration.

RESOURCES

IOM resources

2019b World Migration Report 2020. Geneva.
2020c DTM – Central Sahel and Liptako Gourma Crisis. Monthly Dashboard #3. 13 March.

Online tools

Carto allows data to be presented on geographic maps.

Proportion or part-to-whole

Pie charts or donuts are circular charts divided into slices, with the size of each slice showing the relative value, typically out of 100 per cent. Pie charts and donuts are useful for providing an overview of categories at a single point in time (see Figures A4.6 and A4.7).

If deciding to use a pie chart, limit the number of slices to five, as too many risk distracting the reader from the main point. It can also happen that some slices have nearly the same value, which makes it hard to compare their contributions with one another. In this case, a horizontal bar chart may be more appropriate, as it shows the differences between the values more clearly.
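One possible way of keeping a pie chart to five slices is to retain the largest categories and merge the remainder into a single slice. The Python sketch below shows this under stated assumptions: the category names and counts are hypothetical, and merging into an "Other" slice is an illustrative choice, not a rule from this guidance.

```python
def limit_slices(values, max_slices=5, other_label="Other"):
    """Keep the largest categories and merge the rest into one slice."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= max_slices:
        return dict(ranked)
    kept = ranked[: max_slices - 1]
    other_total = sum(value for _, value in ranked[max_slices - 1:])
    return dict(kept + [(other_label, other_total)])


def shares(values):
    """Convert counts into percentages of the total, rounded to one decimal."""
    total = sum(values.values())
    return {label: round(100 * value / total, 1) for label, value in values.items()}


# Hypothetical counts of respondents by main reason for displacement.
reasons = {
    "Conflict": 420,
    "Flooding": 260,
    "Drought": 150,
    "Eviction": 90,
    "Economic": 50,
    "Family": 20,
    "Unknown": 10,
}

slices = limit_slices(reasons)
for label, share in shares(slices).items():
    print(f"{label}: {share}%")
```

If the merged slice turns out to be among the largest, that is a sign that a bar chart showing every category may serve the reader better.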

EXAMPLE

Figure A4.6. Example of a pie chart

Source: IOM, 2020d.

Figure A4.7. Example of donuts

Source: IOM, 2020, p. 4.

Change over time

Bar charts can also be used to represent longitudinal data repeated over time to help identify temporal trends and patterns (see Figure A4.8). Similarly, line charts are another effective way of displaying trends (see Figure A4.9).

EXAMPLE

Figure A4.8. Example of a bar chart

Share of global migrants under 20 years of age

Source: IOM, 2020, p. 237

Figure A4.9. Example of a line chart

Source: IOM, 2020, p. 142

Relationships and trends

Scatter charts are commonly used to show the relationship between variables, where both the horizontal and vertical axes are value axes, not categorical axes. For instance, the United Nations has released a study that finds a causal relation between long-lasting droughts in El Salvador, Guatemala and Honduras and the increase in irregular migration from these countries to the United States: “Members of families affected by the drought are 1.5 per cent more likely to emigrate than similar households elsewhere. Although this is a low value, the significance lies in the fact that the correlation between drought occurrence and emigration is positive and the probability of emigrating is higher than that of families who are not from the Dry Corridor” (World Food Programme, 2017, p. 16). A scatter plot can be used to illustrate this positive relationship (as the length of drought increases, irregular migration increases). In Figure A4.10, the scatter plot demonstrates a positive relationship between the number of units sold by product family and revenue. The more units sold, the greater the revenue.

A bubble chart is a variation of a scatter chart in which the data points are replaced with bubbles, and an additional dimension of the data is represented in the size of the bubbles. Just like a scatter chart, a bubble chart does not use a category axis; both horizontal and vertical axes are value axes.
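The strength and direction of the relationship displayed in a scatter chart can be summarized with a correlation coefficient. The following is a minimal Python sketch of Pearson's r. The paired values are hypothetical and invented for illustration (they are not drawn from the study quoted above), and the use of Pearson's r is an assumption about how such a relationship might be quantified.

```python
from math import sqrt

# Hypothetical paired observations, invented for illustration only.
drought_months = [1, 2, 3, 4, 5, 6]
departures = [12, 15, 21, 24, 30, 33]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


r = pearson_r(drought_months, departures)
print(f"Pearson's r: {r:.3f}")
```

A value close to +1 indicates a strong positive linear relationship, as in an upward-sloping scatter chart. The coefficient describes association only; on its own it does not establish that one variable causes the other.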

EXAMPLE

Figure A4.10. Example of a scatter chart

Source: IDMC, 2018, p. 61

Text analysis

Text analysis refers to various processes by which qualitative data can be modified so that they can be organized and described in a clear and intuitive manner. To summarize gathered text, such as focus group discussion notes, basic text summaries and analyses can be conducted. Some of the most common ways of achieving this are word frequencies (lists of words and their frequencies) and word clouds (see Figure A4.11).

EXAMPLE

Figure A4.11. Example of word clouds

Source: IOM, 2017c, p. 142
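Word frequencies, which also underlie word clouds, can be produced with a short script. The Python sketch below counts words in a set of focus group notes. The notes are hypothetical and invented for illustration, and the stop-word list is deliberately small; a real analysis would need a fuller list suited to the language of the discussion.

```python
import re
from collections import Counter

# Hypothetical focus group notes, invented for illustration.
notes = """
The blankets were too thin for the cold season. Many women said the blankets
arrived late. The distribution point was far, and the queue at the distribution
was long. Several men said the items were useful but the blankets were not enough.
"""

# Small illustrative stop-word list (an assumption; extend as needed).
STOP_WORDS = {
    "the", "a", "an", "and", "but", "for", "at", "to", "of", "in",
    "was", "were", "not", "too", "many", "several",
}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words in the text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


frequencies = word_frequencies(notes)
for word, count in frequencies.most_common(5):
    print(f"{word}: {count}")
```

Frequency counts show which terms recur, not what participants meant by them, so they complement rather than replace the coding steps described earlier.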

RESOURCES

IOM resources

2017c World Migration Report 2018. Geneva.
2019b World Migration Report 2020. Geneva.
2020b 2019 Return and Reintegration Key Highlights. Geneva.
2020d Yemen – Rapid Displacement Tracking Update. 13–19 December.

Other resources

Internal Displacement Monitoring Centre

2018 Global Report on Internal Displacement. Geneva.

Online tools

Visage tools – how to design scatter plots
Voyant tools – word frequencies, concordance, word clouds and visualizations
Netlytic – summarize and discover social networks from online conversations on social media sites
Wordclouds
EdWordle
Word tree

TIP - What to remember when creating data visuals

To create good graphics, use only a few contrasting but compatible colours that are also suitable for people with colour blindness and for reprinting in black and white.
Order the data in graphs in a logical sequence, with appropriate data ranges to help viewers easily interpret the data (such as from greatest to least or by time period).
Take care when using 3D charts because these can often be difficult to read and can hide or distort data.
Keep graphs and charts simple. Avoid including different variables on different scales in the chart or overloading with decoration, gridlines or unnecessary information. If there is no purpose for something, leave it out.
To create truly powerful data visualizations, add some context in the form of text; this is one of the most effective ways to communicate the data. Yuk and Diamond (2014) identify five main rules for adding text to data visualizations:
- Use text that is complementary;
- Use simple words;
- Keep it short;
- Avoid using random colours to make text stand out against visuals;
- Ensure text applies to every scenario of the data being displayed.

Table A4.5. Evaluating data visuals checklist

	Items to consider	√
1	Did I eliminate all non-essential information?
2	Am I overwhelming the reader by the quantity of data?
3	Does the chart choice enhance or obscure the story the data is telling?
4	Is it clear to the reader when and from where I obtained the data?
5	Am I consistent with the colours chosen?
6	Do I effectively use white space to separate the graphical areas and text?
7	Is the layout easy to digest, without crowding any of the information presented?
8	Is the choice of chart suitable for the purpose of the visual?
9	Do icons really help emphasize the important information?
10	Do I avoid duplicating information and charts?
11	Do I use clear sections to make it easy for users to view the visualizations?
12	Is the text size appropriate (not too small but also not too large)?
13	Are labels clear?
14	Is the style of different labels consistent?
15	Is all the text visible (that is, it is not cut off)?

For more information and examples of the items listed in the checklist for evaluating data visuals (Table A4.5), see chapter 13: Evaluating real data visualizations in Data Visualization for Dummies by Yuk and Diamond (2014).

RESOURCES

References and further reading

Carrington, O. and S. Handley

2017 Data visualization: What’s it all about? New Philanthropy Capital (NPC) Briefing, August. London.

Hewitt, M.

2016 11 design tips for visualizing survey results. Visage, 1 December.

World Food Programme

2017 Food Security and Emigration: Why people flee and the impact on family members left behind in El Salvador, Guatemala and Honduras. Research report. Clayton.

Yuk, M. and S. Diamond

2014 Data Visualization for Dummies. John Wiley and Sons, New Jersey.

Additional online data visualization tools

A wide array of data visualization tools exists online, many of which are accessible for free. Also remember that Microsoft Excel is the most common tool for data visualization and can create many good charts and graphs. To access free tutorials, discussions and best practices for creating data visualizations in Microsoft Excel, see the Excel Charts blog. In addition to Excel, the following are some free online tools for creating charts: