In the spring of 2010 I took a course on Social Computing, taught by Dr. William Erdly. As part of that class, I did some research on Facebook. In preparation for the Seattle Technical Forum’s upcoming “Practices of Social Networking” meetup, I thought I’d summarize my research and findings here. You can download my full research paper here: Facebook Cliques and Control – Joe W Larson.
What are Cliques and who should care?
In graph theory and social network theory, a clique is a group in which every member is connected to every other member. For example, if you consider shared activities or living space to be a social connection, then a softball team or a nuclear family would be a clique. In the diagram below, the Wolf belongs to two different cliques, whereas the Brick Maker belongs to neither clique.
On Facebook, a bunch of people who have friended each other would be a clique. If you use Facebook (and with almost 600m users, odds are you do), then you can probably easily think of some cliques in your friend list. Your buddies from Ultimate Frisbee. The regulars down at the coffee shop. All the cousins who live in your Dad’s hometown. Every time a user joins Facebook and links up with all their friends, neighbors, and relatives, they are exposing these social structures to Facebook.
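To make the definition concrete, here is a minimal Java sketch that checks whether a group of people forms a clique. It is an illustration only, not code from the research paper: the class name `CliqueCheck`, the method `isClique`, and the representation of friendships as a map from each person to the set of their friends are all assumptions made for this example. It also assumes friendships are mutual, as they are on Facebook.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of any Facebook API or library.
final class CliqueCheck {

    /**
     * Returns true if every pair of distinct members is directly connected.
     * Assumes the friendship map is symmetric: if A lists B, then B lists A.
     */
    static boolean isClique(Map<String, Set<String>> friends, List<String> members) {
        for (int i = 0; i < members.size(); i++) {
            Set<String> neighbors = friends.getOrDefault(members.get(i), Set.of());
            for (int j = i + 1; j < members.size(); j++) {
                if (!neighbors.contains(members.get(j))) {
                    return false; // one missing friendship is enough to break the clique
                }
            }
        }
        return true;
    }
}
```

With three people who have all friended each other, `isClique` returns true. Add a fourth person who knows only one of them and the group of four is no longer a clique.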
To me, it seems like having a data representation of these structures is an extremely valuable asset. One of the reasons I wanted to look into cliques is that I felt like they could be leveraged for marketing. In the offline world, tight groups of friends or family members have a real ability to sway each other in terms of where to spend their money. If Facebook (or any other service with a social aspect) could successfully identify natural cliques, they might be able to better target their advertising. For example, if the most influential member of a clique could be identified, then perhaps one well-placed advertisement shown to that person could eventually affect all members of that clique.
Using the Facebook API to find Cliques
I did a lot of research on the various Facebook APIs and SDKs available. This was actually a much more time-consuming and frustrating process than I expected, simply because there are so many different tools out there provided by Facebook and others. Also, these tools have changed quite rapidly, so it was easy to be led astray by information that was no longer valid.
What was more frustrating, however, is that most of the data I expected to be able to retrieve was simply not available via any of the Facebook APIs. It is only possible to retrieve your own user’s friend list, meaning I basically had to focus on my own “ego network” (a network centered on one person — in this case, me). Many different types of Facebook objects like photos and links don’t have dates attached to them, so it is difficult to trace their “spread” through cliques in order to determine any given member’s level of influence.
In the end, I used a combination of the RestFB API, the JGraphT library, and some of my own Java code. With this, I was able to retrieve the 1419 friendship connections that exist between my 160 Facebook friends. I was also able to gather some information on the number of communications between each pair of friends. Then I put together some code that detected the 646 cliques that exist within my friends list!
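For readers curious what clique detection can look like, below is a minimal sketch of the basic Bron–Kerbosch algorithm, a standard way to enumerate maximal cliques (cliques that cannot be extended by adding another person). This is a swapped-in illustration in plain Java, not the code used for the research: the post does not say which algorithm was used, or whether the 646 cliques were maximal ones. The class name `MaximalCliques` and the map-of-sets representation of the friend graph are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class for illustration: basic Bron-Kerbosch without pivoting.
final class MaximalCliques {

    /** Returns every maximal clique in a symmetric friendship graph. */
    static List<Set<String>> find(Map<String, Set<String>> friends) {
        List<Set<String>> found = new ArrayList<>();
        expand(friends, new HashSet<>(), new HashSet<>(friends.keySet()), new HashSet<>(), found);
        return found;
    }

    private static void expand(Map<String, Set<String>> friends,
                               Set<String> current,     // clique built so far
                               Set<String> candidates,  // people who could still extend it
                               Set<String> excluded,    // people already tried at this level
                               List<Set<String>> found) {
        if (candidates.isEmpty() && excluded.isEmpty()) {
            found.add(new HashSet<>(current)); // nobody left to add, so it is maximal
            return;
        }
        // Iterate over a copy because candidates is modified inside the loop.
        for (String person : new ArrayList<>(candidates)) {
            Set<String> neighbors = friends.getOrDefault(person, Set.of());

            Set<String> nextCandidates = new HashSet<>(candidates);
            nextCandidates.retainAll(neighbors);
            Set<String> nextExcluded = new HashSet<>(excluded);
            nextExcluded.retainAll(neighbors);

            current.add(person);
            expand(friends, current, nextCandidates, nextExcluded, found);
            current.remove(person);

            // Mark this person as handled so the same clique is not reported twice.
            candidates.remove(person);
            excluded.add(person);
        }
    }
}
```

Even a small friend graph can contain a great many overlapping maximal cliques, which is consistent with finding hundreds of them among 160 friends.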
The possibility of finding so many cliques was hinted at by our readings and my own research. Since all of these cliques were made up of my own friends, I was quickly able to see that most of them were actually variations of other very similar cliques. There were coworker cliques, neighborhood cliques, and high school classmate cliques.
I was able to work out a method of merging many of these cliques into a set of fewer, larger cliques. This was done by comparing each clique, determining the percentage of common membership between them, and then merging cliques whose commonality was above a certain threshold.
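The merging step described above can be sketched roughly as follows. This is one possible reading, not the method from the paper: the post does not define "percentage of common membership", so this example assumes it means the number of shared members divided by the size of the smaller clique. The class name `CliqueMerger`, both method names, and the threshold value are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class for illustration; the overlap measure is an assumption.
final class CliqueMerger {

    /** Shared members as a fraction of the smaller group (assumed definition). */
    static double overlap(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / Math.min(a.size(), b.size());
    }

    /** Repeatedly merges any two groups whose overlap is above the threshold. */
    static List<Set<String>> merge(List<Set<String>> cliques, double threshold) {
        List<Set<String>> groups = new ArrayList<>();
        for (Set<String> clique : cliques) {
            groups.add(new HashSet<>(clique)); // copy so the input is left untouched
        }
        boolean merged = true;
        while (merged) {
            merged = false;
            // After each merge, restart the scan because the merged group has changed.
            for (int i = 0; i < groups.size() && !merged; i++) {
                for (int j = i + 1; j < groups.size() && !merged; j++) {
                    if (overlap(groups.get(i), groups.get(j)) > threshold) {
                        groups.get(i).addAll(groups.remove(j));
                        merged = true;
                    }
                }
            }
        }
        return groups;
    }
}
```

Note that a merged group is generally no longer a strict clique, since not every member is connected to every other member. It is better thought of as a looser community built from near-identical cliques.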
One of the insights I reached by looking at this data is that there was nothing very interesting about comparing the amount of communication within a clique and the size of that clique. I had expected that smaller cliques might have a higher level of communication due to their tightness. But there was only a boring, mostly linear relationship between the communication-frequency and clique-size.
However, since I could not access much data beyond my own ego-network, I felt like I couldn’t take any of my data too seriously. After all, my sample size was just 160 members out of hundreds of millions, and every member of that sample shared one very significant factor: friendship with me. If my social history was significantly different from that of most users, then data collected about my Facebook friend list could be fairly misleading.
Concern about Facebook
The fact that the Facebook APIs only allow access to information centered on the currently signed-in user makes sense from a privacy point of view. Of course, privacy advocates think Facebook hasn’t taken this nearly seriously enough.
However, my research led me to better appreciate another important issue standing in juxtaposition to privacy. Facebook owns detailed information about the social connections of nearly one tenth of all humanity. Their APIs prevent developers from extracting very much of this information (and you might get sued if you try to extract it some other way!). It is a far broader and deeper quantity of personal information than any other company or government controls. To the extent this information can be understood and leveraged for profit, should this primacy be a cause for concern? Should we demand that more of this information be made available to reduce Facebook’s monopoly in this area?