Network Basics for Research Methods in Sociology
The Goal
This tutorial aims to help students use R to read and clean survey data and do some basic network analysis.
Installing R and RStudio, Again
Before installing RStudio, you need to install R. You can download R and the other software it requires from https://cran.r-project.org/. Choose the version that matches your system (e.g., Windows, Mac). Follow the installation instructions carefully, especially those about required software such as XQuartz.
After this step, go to the RStudio website to download and install RStudio Desktop. You can click here https://rstudio.com/products/rstudio/download/ and choose the free version.
Then you can use the install.packages function to install the R packages you need. I suggest copying and pasting the following code to install some common packages for data processing and visualization. You can add any packages you want to install by editing the “packages” variable.
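The install code itself is not reproduced here, so the following is only a minimal sketch of one common pattern. It assumes `packages` is a character vector, as the text describes; the name `to_install` and the choice of CRAN mirror are assumptions made for this example.

```r
# Packages used in this tutorial; add more names to this vector as needed.
packages <- c("tidyverse", "igraph")

# Keep only the packages that are not installed yet.
# `to_install` is a name chosen for this sketch.
to_install <- setdiff(packages, rownames(installed.packages()))

# Install what is missing. The mirror URL is an assumption; any CRAN mirror works.
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Load every package; character.only = TRUE lets library() accept a string.
invisible(lapply(packages, library, character.only = TRUE))
```

Running this a second time should install nothing, because `to_install` is then empty.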
In this tutorial, we are using tidyverse and igraph. For more details about igraph, see the tutorial by Dr. Katya Ognyanova at https://kateto.net/networks-r-igraph. You can also check the documentation here: https://igraph.org/r/.
Some Network Basics
The data below was used as an example in our lecture. You can replicate the graph using the following code.
library(igraph)
# Create data
set.seed(10)
data <- matrix(c(0,0,1,1,1,0,0,1,0,1,0,1,1,0,0,0), nrow=4,byrow = TRUE)
# set the row and column names
colnames(data) = rownames(data) = c("Bob","Carol","Ted","Alice")
data
## Bob Carol Ted Alice
## Bob 0 0 1 1
## Carol 1 0 0 1
## Ted 0 1 0 1
## Alice 1 0 0 0
# build the graph object from matrix
net <- graph_from_adjacency_matrix(data, mode="directed",diag = FALSE)
net
## IGRAPH 6403c5b DN-- 4 7 --
## + attr: name (v/c)
## + edges from 6403c5b (vertex names):
## [1] Bob ->Ted Bob ->Alice Carol->Bob Carol->Alice Ted ->Carol
## [6] Ted ->Alice Alice->Bob
## [1] "igraph"
# let us improve the plot
par(mfrow=c(1,1),mar=c(0,0,0,0))
plot(net,
vertex.color = "white",
vertex.size=30,
vertex.frame.color = "blue",
vertex.shape="circle",
vertex.label.family="Helvetica",
vertex.label.color="red",
vertex.label.font=c(3),
vertex.label.cex=c(2),
vertex.label.dist=0,
vertex.label.degree=1,
edge.color="gray",
edge.width=1,
edge.arrow.size=.7,
edge.arrow.width=.7,
edge.lty="solid",
edge.curved=0.1)
You can use the degree function from the igraph package to compute outdegree or indegree centrality.
degree(graph, v = V(graph), mode = c("all", "out", "in", "total"), loops = TRUE, normalized = FALSE)
# let us compute some network features such as outdegree and indegree centrality
out_d <- degree(
net,
mode = c("out")
)
print(out_d)
## Bob Carol Ted Alice
## 2 2 2 1
## Bob Carol Ted Alice
## 2 1 1 3
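The second table of output above matches indegree, though the call that produced it is not shown. A minimal sketch that should reproduce it, rebuilding the lecture network so the block runs on its own (the names `adj` and `in_d` are chosen for this example):

```r
library(igraph)

# Rebuild the four-person lecture network from its adjacency matrix.
adj <- matrix(c(0,0,1,1, 1,0,0,1, 0,1,0,1, 1,0,0,0), nrow = 4, byrow = TRUE)
rownames(adj) <- colnames(adj) <- c("Bob", "Carol", "Ted", "Alice")
net <- graph_from_adjacency_matrix(adj, mode = "directed", diag = FALSE)

# Indegree: the number of ties each person receives.
in_d <- degree(net, mode = "in")
print(in_d)
```

Alice receives ties from all three others, which is why her indegree is the highest.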
You can also compute betweenness, closeness, and eigenvector centrality.
You can use eigen_centrality to find the eigenvector scores. It takes a graph (graph) and returns a list whose vector element holds the eigenvector centrality of each vertex.
eigen_centrality(graph, directed = FALSE, scale = TRUE, weights = NULL)
## Bob Carol Ted Alice
## 1.0000000 0.7807764 0.7807764 1.0000000
Obviously, Bob and Alice have the highest eigenvector centrality.
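The call behind these scores is not shown. The sketch below assumes the default shown in the signature above (directed = FALSE, so edge directions are ignored), which is consistent with the printed values. The names `adj` and `ev` are chosen for this example.

```r
library(igraph)

# Rebuild the four-person lecture network from its adjacency matrix.
adj <- matrix(c(0,0,1,1, 1,0,0,1, 0,1,0,1, 1,0,0,0), nrow = 4, byrow = TRUE)
rownames(adj) <- colnames(adj) <- c("Bob", "Carol", "Ted", "Alice")
net <- graph_from_adjacency_matrix(adj, mode = "directed", diag = FALSE)

# eigen_centrality() returns a list; the scores are in its `vector` element.
# Assumption: directions are ignored (directed = FALSE).
ev <- eigen_centrality(net, directed = FALSE)$vector
print(round(ev, 4))
```

The scores are scaled so that the largest one equals 1. That is why Bob and Alice both show exactly 1.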
Let us look at betweenness centrality.
## Bob Carol Ted Alice
## 3.0 0.5 2.0 0.5
So Bob has the highest betweenness centrality.
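The betweenness call is not shown above. A minimal sketch that should reproduce those scores, assuming edge directions are respected when counting shortest paths (the names `adj` and `btw` are chosen for this example):

```r
library(igraph)

# Rebuild the four-person lecture network from its adjacency matrix.
adj <- matrix(c(0,0,1,1, 1,0,0,1, 0,1,0,1, 1,0,0,0), nrow = 4, byrow = TRUE)
rownames(adj) <- colnames(adj) <- c("Bob", "Carol", "Ted", "Alice")
net <- graph_from_adjacency_matrix(adj, mode = "directed", diag = FALSE)

# Betweenness: how often a person lies on shortest paths between others.
# Assumption: directed shortest paths are used.
btw <- betweenness(net, directed = TRUE)
print(btw)
```

The half points for Carol and Alice come from the two equally short paths from Ted to Bob, one through each of them.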
Let us look at closeness centrality.
## Bob Carol Ted Alice
## 0.2500000 0.2500000 0.2500000 0.1666667
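The closeness call is not shown either. The values above are consistent with unnormalized closeness along outgoing paths, so this sketch assumes mode = "out" (the names `adj` and `cls` are chosen for this example):

```r
library(igraph)

# Rebuild the four-person lecture network from its adjacency matrix.
adj <- matrix(c(0,0,1,1, 1,0,0,1, 0,1,0,1, 1,0,0,0), nrow = 4, byrow = TRUE)
rownames(adj) <- colnames(adj) <- c("Bob", "Carol", "Ted", "Alice")
net <- graph_from_adjacency_matrix(adj, mode = "directed", diag = FALSE)

# Closeness: 1 divided by the sum of shortest-path distances to all others.
# Assumption: distances follow outgoing ties (mode = "out").
cls <- closeness(net, mode = "out")
print(cls)
```

Alice needs 1, 2, and 3 steps to reach Bob, Ted, and Carol, so her score is 1/6, the lowest of the four.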
We also talked about structural holes in the lecture. You can compute Burt’s constraint measure. Burt’s constraint is higher if ego has fewer contacts, or contacts that are more strongly related to one another (i.e., more redundant).
## Bob Carol Ted Alice
## 0.8645833 0.9969136 0.9969136 0.8645833
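The constraint call is not shown above. A minimal sketch using igraph's constraint function on the same lecture network, with no edge weights supplied (the names `adj` and `cons` are chosen for this example):

```r
library(igraph)

# Rebuild the four-person lecture network from its adjacency matrix.
adj <- matrix(c(0,0,1,1, 1,0,0,1, 0,1,0,1, 1,0,0,0), nrow = 4, byrow = TRUE)
rownames(adj) <- colnames(adj) <- c("Bob", "Carol", "Ted", "Alice")
net <- graph_from_adjacency_matrix(adj, mode = "directed", diag = FALSE)

# Burt's constraint for every vertex; lower values suggest more brokerage room.
cons <- constraint(net)
print(round(cons, 4))
```

In a network this small everyone is tied to everyone else in some direction, so all four scores are high. Bob and Alice score slightly lower than Carol and Ted.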
Let Us Move to a Real Network Study
Some Background about the Study Abroad Students Network Data
We developed a novel Study Abroad Complete Network Questionnaire (SACNQ) to collect students’ complete or whole network data.
We administered the SACNQ in a small (n = 38) study abroad program in China using Qualtrics.
We first obtained a list of students in the target cohort we were studying.
We then asked study abroad students to identify each student they frequently talk to in their cohort.
We further asked students to evaluate each selected student’s language use.
We also collected each student’s demographic and other friendship network information.
We further conducted six months of ethnographic fieldwork to assess the validity of students’ self-reported networks from the SACNQ.
Note that we asked students to report their out-of-classroom networks as well.
How to Read Qualtrics Data into R
Note that when you download your Qualtrics data, you need to delete the second and third rows. They hold extra information about your variables that you don’t need. You only need a dataset whose first row holds the variable names, followed by the raw data.
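Instead of deleting the two rows by hand, you can skip them when reading the file. The sketch below is one way to do that in base R. It writes a tiny made-up file first so that it runs on its own; the column names `Q1` and `Q2`, the file contents, and the names `tmp`, `var_names`, and `survey` are all hypothetical.

```r
# Write a tiny stand-in for a Qualtrics CSV export (made-up content).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ResponseId,Q1,Q2",                            # row 1: variable names
  "Response ID,Who do you talk to?,How often?",  # row 2: question text
  "ImportId_recordId,ImportId_QID1,ImportId_QID2", # row 3: simplified metadata
  "R_001,2,4",
  "R_002,5,3"
), tmp)

# Read the variable names from the first row only.
var_names <- names(read.csv(tmp, nrows = 1))

# Skip all three header rows, then reattach the names from row 1.
survey <- read.csv(tmp, skip = 3, header = FALSE, col.names = var_names)
print(survey)
```

With your own data, replace `tmp` with the path to the downloaded file.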
Here is a snapshot of the network (edge list) data:
from | to | w1_freq | w2_chn | w3_en | w4_oth_lang | w5_chn_pro | w6_en_pro |
---|---|---|---|---|---|---|---|
1 | 2 | 1 | 5 | 95 | NA | 3 | 5 |
1 | 5 | 2 | 5 | 95 | NA | 2 | 5 |

Here is a snapshot of the student attribute data:

name_id | pseudo_name | gender | nation | region | month | chn_pro_self_report |
---|---|---|---|---|---|---|
1 | Tiffany | F | USA | North America | 10 | Imd |
2 | Hana | F | Korea | Asia | 22 | Imd |
Let us visualize the speaker network first.
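library(igraph)
library(tidyverse)  # provides filter(); without it, stats::filter() is called and fails
# sacnq_edges and sacnq_attrs are the edge list and attribute table shown above
net <- graph_from_data_frame(d = sacnq_edges %>% filter(w1_freq > 3),
  vertices = sacnq_attrs,
  directed = TRUE)
plot(net)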
# let us improve the plot
par(mar=c(0,0,1,1) ,pty = "s")
plot(net,
vertex.color = rgb(0.8,0.4,0.3,0.8),
vertex.size=8, vertex.frame.color = "gray",
vertex.shape="circle",vertex.label=V(net)$pseudo_name,
vertex.label.family="Helvetica",vertex.label.color="gray40",
vertex.label.font=c(3), vertex.label.cex=c(.8),
vertex.label.dist=1.5,vertex.label.degree=1,
edge.color="gray",edge.width=1,edge.arrow.size=.7,
edge.arrow.width=.7,edge.lty="solid",edge.curved=0.3)
Let us see the popularity of students.
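The code for this step is not shown. One common reading of popularity is indegree, the number of classmates who named a student. The sketch below assumes that reading and uses made-up toy data with the same column layout as the snapshots above; `toy_edges`, `toy_attrs`, and the single-letter names are hypothetical, not study data.

```r
library(igraph)

# Toy stand-ins for sacnq_edges and sacnq_attrs (made-up values).
toy_edges <- data.frame(from = c(1, 1, 2, 3, 4),
                        to   = c(2, 3, 3, 2, 3))
toy_attrs <- data.frame(name_id = 1:4,
                        pseudo_name = c("A", "B", "C", "D"))

toy_net <- graph_from_data_frame(toy_edges, vertices = toy_attrs, directed = TRUE)

# Popularity as indegree: how many classmates nominated each student.
popularity <- degree(toy_net, mode = "in")
names(popularity) <- V(toy_net)$pseudo_name
print(sort(popularity, decreasing = TRUE))
```

To make popularity visible in a plot, you could pass a rescaled version of it to `vertex.size`.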
Let us see the internal community in the cohort.
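The community detection code is not shown, and the method used in the lecture is not stated. As an illustration only, the sketch below applies igraph's walktrap algorithm to a made-up undirected toy network of two triangles joined by one tie; `toy_edges`, `toy_net`, and `comm` are hypothetical names.

```r
library(igraph)

# Toy network: two triangles (1-2-3 and 4-5-6) joined by the tie 3-4.
toy_edges <- data.frame(from = c(1, 2, 1, 4, 5, 4, 3),
                        to   = c(2, 3, 3, 5, 6, 6, 4))
toy_net <- graph_from_data_frame(toy_edges, directed = FALSE)

# Walktrap groups vertices that short random walks tend to stay within.
# Assumption: ties are treated as undirected for this step.
comm <- cluster_walktrap(toy_net)
print(membership(comm))

# Color vertices by detected community.
plot(toy_net, vertex.color = membership(comm))
```

Other igraph functions such as `cluster_louvain` or `cluster_fast_greedy` follow the same pattern and may give different groupings on real data.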
The End