Monday, April 26, 2010

CSDS/ASC miniconference, and Latent Social Spaces.

Last Saturday was great. The Applied Statistics Center and the Center for the Study of Development Strategies held a miniconference at Columbia University for those who had received summer research grants. Eleven people presented their projects' research designs. Topics ranged from the long-term impact of the Peruvian civil war on local-level institutions to the impact of HIV/AIDS education on sexual behavior. Together with Neelan, I presented our network project in Eastern Congo. Faculty were present, everybody had to read all the project descriptions in advance, and everybody was assigned as a discussant to at least one presentation; we had many long and lively discussions. From seven onwards we had dinner at Macartan's place to continue the discussions in a more informal setting. It was really nice to have people from political science, economics and statistics discussing similar topics together - we do not do this often enough.

Latent Space Models and Aggregate Relations Data

I was the discussant for Tyler McCormick's project called "How many X’s do you know?". Tyler is a PhD candidate in the statistics department and works together with Tian Zheng. The project builds on Peter Hoff's seminal 2005 article on Latent Space Models [1].

Aggregate relations data
It is often difficult to obtain information about a sensitive group: HIV/AIDS-infected people, people who have been raped, etc. One could ask in a survey "Are you a member of X?" - where X is the sensitive group - and then add up all the people who say 'yes'. However, it is unlikely that people will tell you the truth. One can get around this problem by asking "How many members of X do you know?". The answers to these questions are so-called aggregate relations data.

Homophily and dyadic interdependence
However, people are more likely to know 'similar' people - something that is called homophily. For example, if X is "Rose" - a common name among older women - and people are more likely to know individuals of the same age, then older people are likely to know more people called "Rose" than younger people do. If I want to estimate the size of a respondent's network based on this information, I would overestimate the network size of older respondents and underestimate that of younger respondents.
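To make the bias concrete, here is a minimal sketch of a naive "scale-up" calculation. It assumes that the share of Roses among a respondent's acquaintances equals the share of Roses in the population. All numbers, and the function name `scale_up_degree`, are hypothetical and chosen only for illustration; this is not Tyler's estimator.

```python
def scale_up_degree(known_in_group: int, group_size: int, population: int) -> float:
    """Naive network-size estimate from one 'How many X's do you know?' answer.

    Assumes random mixing: the fraction of a respondent's acquaintances who
    belong to group X equals X's fraction of the whole population.
    """
    if group_size <= 0 or population <= 0:
        raise ValueError("group_size and population must be positive")
    return known_in_group * population / group_size


# Hypothetical numbers: 1,000,000 people, 10,000 of them called Rose.
POPULATION = 1_000_000
ROSES = 10_000

# Suppose both respondents truly know 300 people. Under random mixing each
# would know about 3 Roses, but homophily by age shifts the counts.
TRUE_DEGREE = 300
older_knows = 6    # older respondent knows more Roses than expected
younger_knows = 1  # younger respondent knows fewer

older_estimate = scale_up_degree(older_knows, ROSES, POPULATION)
younger_estimate = scale_up_degree(younger_knows, ROSES, POPULATION)

print(f"older:   estimated {older_estimate:.0f}, true {TRUE_DEGREE}")
print(f"younger: estimated {younger_estimate:.0f}, true {TRUE_DEGREE}")
```

With these made-up counts the older respondent's network is overestimated and the younger respondent's is underestimated, even though both know the same number of people.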

This complicates research on networks. The figure below gives two possible network structures among three people. The dotted line is a link that could be formed. On the left-hand side, T is - for example - friends with both N and P (the solid lines), while on the right-hand side this is not the case. Consequently, it is more likely that N and P become friends in the structure on the left than in the one on the right.


In other words, whether N and P form a link depends on how N and P are connected with a third person. This is the defining characteristic of networks and is called "dyadic interdependence".

It is also highly problematic. For example, let's say we want to run a logit or probit regression where the dependent variable is whether persons i and j form a link. Whether a link is formed depends on whether i and j have links with other people (how they are connected in the network). The problem is that one can't just control for that, as those other links are dependent variables in their own right. The problem is similar to autocorrelation in time series models.
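In symbols, a rough sketch of the problem. The notation is my own assumption and is not taken from Tyler's project: y_ij equals 1 if i and j are linked, x_ij are observed characteristics of the pair, beta are coefficients, and epsilon_ij is the error term.

```latex
y_{ij} = \mathbf{1}\!\left[\beta^{\top} x_{ij} + \varepsilon_{ij} > 0\right],
\qquad
\operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) \neq 0
\ \text{in general for } j \neq k .
```

A standard logit or probit treats the error terms as independent across pairs. In a network, pairs that share a person (here i) typically have correlated errors, which is where the analogy with autocorrelation comes from.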

Latent social space
A solution to this is to bring structure to the error term by making use of a so-called latent network space. In brief, all people have a position in an unobserved d-dimensional social space that gives us information about the underlying social structure. Actors with closer positions in the latent space are more likely to interact. For example, let's say that we have a two-dimensional space and there is a polygon for the group of people called "Rose" - the size and shape of the polygon depend on the group's variance over the two dimensions. Then older people are located closer to the "Rose" polygon than younger people.
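The idea that closer positions mean more likely interactions can be sketched in a few lines. This assumes a simple distance-based logistic form, which is one common choice and not necessarily the specification used in the project. The function name `link_probability`, the intercept `alpha`, and all coordinates are hypothetical, and the "Rose" polygon is reduced to a single stand-in point to keep things short.

```python
import math


def link_probability(z_i, z_j, alpha=0.0):
    """Probability of a link that falls as latent distance grows.

    Assumed form: logit(p) = alpha - ||z_i - z_j||, with Euclidean distance.
    """
    distance = math.dist(z_i, z_j)  # Euclidean distance in the latent space
    return 1.0 / (1.0 + math.exp(-(alpha - distance)))


# Hypothetical positions in a two-dimensional latent social space.
rose_centre = (2.0, 0.0)   # stand-in point for the "Rose" group
older = (1.5, 0.5)         # older respondent sits near the Rose group
younger = (-1.5, 0.5)      # younger respondent sits further away

p_older = link_probability(older, rose_centre)
p_younger = link_probability(younger, rose_centre)

print(f"older respondent:   {p_older:.3f}")
print(f"younger respondent: {p_younger:.3f}")
```

Conditional on the latent positions, each link is treated as independent of the others, which is how the latent space absorbs the dependence between pairs.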

Thus, by combining the ideas of latent network spaces and aggregate relations data, one can account for dyadic interdependence and obtain information about a sensitive group, such as: 1) the number of HIV/AIDS-infected people in society, 2) whom to approach if one wants to find an HIV/AIDS-infected person (who is most likely to know one), and 3) how homogeneous the group of HIV/AIDS-infected people is.

Very interesting stuff!

[1] Peter D Hoff. 2005. Bilinear Mixed-Effects Models for Dyadic Data. Journal of the American Statistical Association. 100, 286-295.
