Coding in the Congo: The Joy of Fieldwork.

Fieldwork is fun for a number of reasons. First, one escapes the ivory tower of academia and sees the theories one normally reads about in books in practice (or, more likely, one sees and experiences how wrong these theories, and oneself, were on so many points). Second, one flies to exotic places and gets to meet interesting people (do make sure you choose your dissertation topic correctly!). Third, and very important, one gets to dress in green and drive around in 4x4s and on motorbikes like a modern-day Indiana Jones. But the fourth and nicest thing about fieldwork is how it combines the intellectual and the practical. Let me give an example that recently arose.

At the moment we are piloting the evaluation of a large development project (TUUNGANE) in four Congolese villages; in two months the real evaluation will start in 560 villages. For this evaluation we want to interview 10 randomly selected people from a village meeting whose size is not known ex ante: people normally move in (arrive late) and out (get bored) of the meeting. So, how does one randomly select 10 people from a group whose size is not precisely known ex ante? Needless to say, the approach should be theoretically sound (i.e. a proper randomization) and also easy to implement. For example, just asking “I need 10 people” and then taking the first ten people that show up, while practically easy, is not theoretically sound because those ten are unlikely to be a representative sample (when the number of village meetings goes to infinity).

We came up with four different techniques.

1. Randomize the people-approach.
The people in the village meeting each take one piece of paper from a bag or a bucket; each piece of paper carries a number ranging from 1 to X, where X is the highest number of people that could ever be present (this can be higher than the number of people actually present). After distributing the pieces of paper, one can say out loud “number 1”, “number 2”, “number 3”, etc. until there are 10 people. While it is not necessary, one can also do this in a slightly more sophisticated way by creating, before entering the village, a list of randomly ordered numbers that can then be said out loud. The list could look like this:

# Lottery number
1: 120
2: 76
3: 5
4: 65
… …

Needless to say, in both cases it is possible that a number that is read out is still in the bag; this is no problem because you then just read out the next number on the list. While this approach sounds great in theory, in practice it is more complicated to implement, for three reasons. First, handing out numbers is associated with prizes (like handing out lottery tickets) and people therefore rush to get more than one piece of paper. This could get chaotic quickly. A solution would be to line everybody up or otherwise hand out the papers in a very orderly way. Second, as we noticed during the pilot and should have known ex ante: not all people can read. A solution would be to print large digits on A4 sheets and, instead of shouting for example “122”, hold up the sheets “1”, “2” and “2”. The third problem is that selecting people by making use of numbers has a connotation with witchcraft in Bantu tradition.
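The read-out list for this approach can be sketched in R. This is a minimal illustration, not the list we actually used: the upper bound of 150 and the made-up set of 120 tickets that left the bag (handed.out) are assumptions chosen only to make the example run.

```r
# Hypothetical upper bound on attendance (X in the text); an assumption.
max.people <- 150
set.seed(20100715)

# Read-out order: a random permutation of all ticket numbers 1..X.
read.out.order <- sample(max.people)

# Illustration only: pretend 120 tickets actually left the bag.
handed.out <- sample(max.people, 120)

# Walk down the list, skipping numbers still in the bag,
# and stop once ten ticket holders have been found.
called <- read.out.order[read.out.order %in% handed.out]
selected <- head(called, 10)
selected
```

Because the order is a random permutation, skipping numbers that are still in the bag does not favour anyone who did take a ticket.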

2. The x plus fixed interval-approach.
This is an approach we have used in previous randomizations. In brief, one randomly selects the first person and then uses a fixed interval to select the other nine. In other words, before entering the village one creates a list that gives, for each meeting size (# people), a randomly selected number to start at (Start Number) and the interval that needs to be used to get 10 people. The list would look like this:

# people Start Number Interval
… … …
121: 12, 12
122: 101, 12
… … …
130: 90, 13
… … …
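A list of this kind could be generated roughly as follows. The sketch assumes that the interval is the meeting size divided by ten and rounded down, which matches the three rows shown above but is otherwise an assumption; the start numbers are random, so they will not reproduce the ones in the table.

```r
set.seed(20100715)
max.size.meeting <- 500
size.of.meeting <- 10:max.size.meeting

# Interval: the largest whole step that still yields ten people
# (12 for 121 or 122 people, 13 for 130, as in the table above).
interval <- size.of.meeting %/% 10

# Start number: one random position per meeting size.
start.number <- sapply(size.of.meeting, function(n) sample(n, 1))

interval.list <- data.frame(size = size.of.meeting,
                            start = start.number,
                            interval = interval)

# Show the rows for the meeting sizes used as examples in the text.
interval.list[interval.list$size %in% c(121, 122, 130), ]
```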

It is OK that we do not know the number of people present precisely (as long as we don’t expect person 130 and person 131 to be very different). At first we thought this (and the next) approach was theoretically wrong, for the following reason:
“For example, let’s say that there are 130 people in the village-meeting of which 14 are elderly men. From experience we now know that these will stand or sit next to each other. Because the interval in this case is 13, you will always select an elder; i.e. the probability of an elder being chosen is 100%. However, if one would really have a clean random strategy there is only a 14/130 chance to select an elder.”
Upon further reflection we noticed that the above reasoning was wrong and that this (and the next) approach does make theoretical sense. The reasoning was wrong because one does not select one person but ten, and thus one expects to select 10*14/130 elders, i.e. slightly more than one (an expected count rather than a chance, since a chance cannot exceed one). Each individual still has a 10/130 chance of being selected.
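The elder example can be checked by brute force. The sketch below rests on two assumptions that the text does not spell out: counting wraps around to person 1 after the last person, and the 14 elders sit in positions 1 to 14. It tries every possible start number for a meeting of 130 people.

```r
# Positions chosen for a meeting of n people, given a start number.
# Assumption: counting wraps around to person 1 after person n.
systematic.sample <- function(n, start, k = 10) {
  interval <- n %/% k
  ((start - 1 + interval * (0:(k - 1))) %% n) + 1
}

n <- 130
elders <- 1:14  # hypothetical: the 14 elders sit in positions 1 to 14

# Try every possible start number (each equally likely under a random start).
# The result is a 10 x 130 matrix: one column per start number.
picks <- sapply(1:n, function(s) systematic.sample(n, s))

# Share of start numbers under which each person is selected.
inclusion.prob <- tabulate(as.vector(picks), nbins = n) / n
range(inclusion.prob)   # every person: 10/130

# Number of elders selected under each start number.
elders.selected <- colSums(matrix(picks %in% elders, nrow = 10))
min(elders.selected)    # at least one elder every time
mean(elders.selected)   # equals 10 * 14 / 130
```

So an elder is indeed always among the ten, but no individual is more likely to be chosen than anyone else, and the average number of elders selected is the same as under a clean random draw.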

3. Decrease the problem-approach.
Select every 10th person in the village meeting (or every 15th if the group is very large). Let’s say that you then end up with 40 people selected; now one knows the exact number of people and things are easy. Place 30 white and 10 red pieces of paper in a bag and let each of them take one. The people with a red piece of paper stay for the interviews. However, like the first approach, things could get chaotic: how does one make sure that only those 40 selected people come forward? A solution could be to give them a piece of paper when they are selected. In addition, at first we thought (wrongly) that this approach was also not theoretically sound, for two reasons:

“The technique has the same ‘fixed interval’ problem as approach 2. In addition, the starting number would have to be randomized. Let’s say you always start counting in the first row or with the person sitting closest to you. Let’s say again that in a 130-person village meeting there are 14 elderly people and they are the ones that sit in the front row. As a result, one of them will always be selected among the final 40 and will thus have an overall 1/40 chance of being selected, while it should be 1/130. This also holds when you start with people in the last row, in the middle, etc.; i.e. it is problematic if we expect people to differ depending on where they sit (something that we can’t assume away).”
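The two steps of this approach can be sketched in R. The meeting size of 400 is hypothetical, chosen only so that taking every 10th person yields the 40 people of the example, and the random starting position in the first step is an assumption in line with the remark above that the starting number would have to be randomized.

```r
set.seed(20100715)
n.people <- 400   # hypothetical meeting size, chosen so that step one yields 40
step <- 10

# Step one: every 10th person, beginning at a random position among the first ten.
first.stage <- seq(from = sample(step, 1), to = n.people, by = step)

# Step two: a bag with 10 red papers and white papers for everyone else.
bag <- c(rep("red", 10), rep("white", length(first.stage) - 10))
draw <- sample(bag)  # shuffle the bag: one paper per first-stage person

# Those who drew a red paper stay for the interviews.
interviewees <- first.stage[draw == "red"]
interviewees
```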

4. Meeting size list-approach.
Because we thought approaches 2 and 3 were not sound, we came up with approach 4. A list is created before entering the village. For each meeting size (# people), this list has ten randomly selected numbers (Ten Numbers) indicating the people that will be selected. The list could look like this:

# people Ten Numbers
… …
121: 12, 34, 56, 61, 64, 78, 95, 100, 101, 120
122: 23, 25, 34, 45, 46, 49, 67, 89, 100, 122
… …

Then one counts the number of people present. Say one counts 121 people; one then selects person “12”, “34”, etc. Under this system it is not important to be completely correct on the meeting size; it doesn’t matter if you choose row 121 or 120 in the list (i.e. count 121 people or 120 people), as long as you think person number 121 isn’t very different from person number 120.* In addition, people are allowed to move around a bit. Even if they do so in a systematic way it is OK, because the numbers have been chosen randomly. Upon further reflection this technique is similar to approach 2, but without the fixed intervals.

* Do note that if 140 people are counted while there are actually only 130, it is possible that a number between 131 and 140 pops up; i.e. a number that does not exist in the village meeting. Solution: Continue counting at the beginning (i.e. number 12 if there are 121 people present). This gives no bias as “12” had been selected randomly.

We also wrote a little bit of code in the computer program called "R" that gives this list when one would like to randomly select 10 people:

# Fix the seed so that the list can be reproduced exactly.
seed <- 20100715
max.size.meeting <- 500
set.seed(seed)
size.of.meeting <- 10:max.size.meeting

# Draw ten distinct positions from 1..j and sort them for easy reading.
get.ten.random.people <- function(j) {
  sort(sample(1:j, 10, replace = FALSE))
}

# sapply gives one column per meeting size; transpose so each row is one size.
for.all.sizes <- sapply(size.of.meeting, get.ten.random.people)
data <- as.data.frame(cbind(size.of.meeting, t(for.all.sizes)))
names(data) <- c("size", 1:10)

# Open the list in R's data viewer (needs an interactive session).
View(data)
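In the field one would typically look up a single row of this list rather than view the whole table. The following is a minimal sketch of such a lookup; lookup.ten is a hypothetical helper, not part of the code above, and the list is rebuilt here so that the example runs on its own.

```r
set.seed(20100715)
size.of.meeting <- 10:500

# One row per meeting size, ten sorted random positions per row.
ten.numbers <- t(sapply(size.of.meeting, function(j) sort(sample(j, 10))))

# Hypothetical helper: the ten positions to interview for a counted meeting size.
lookup.ten <- function(counted) {
  ten.numbers[match(counted, size.of.meeting), ]
}

lookup.ten(121)
```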

Conclusion
In the end all four approaches are theoretically sound. After piloting we think that approach 1 is the easiest to implement. All in all, this was one of those examples that show the fun of fieldwork. One has a practical problem and has to find a solution. It is intellectually challenging: think through a proper randomization technique, write some nerdy computer code, etc. However, at the same time one also has to think about the practical implications of one’s theories (something academics hardly ever do), one even meets barriers that go back to culture, and (again) one can dress in green and drive in 4x4s and on motorcycles to check out whether one’s ideas work! Fieldwork. Awesome!

Tuesday, July 20, 2010
