The Goal

This tutorial aims to help students use R to read and clean survey data and do some basic network analysis.

Installing R and RStudio (a Recap)

Before installing RStudio, you need to install R. You can download R and the other required software from https://cran.r-project.org/. Choose the version that matches your system (e.g., Windows, Mac). Follow the installation instructions carefully, especially for additional required software such as XQuartz.

After this step, go to the RStudio website to download and install RStudio Desktop: https://rstudio.com/products/rstudio/download/. Choose the free version.

Then you can use the install.packages function to install the R packages you need. I suggest copying and pasting the code below to install some common packages for data processing and visualization. You can add any other packages you want to install to the “packages” variable.

In this tutorial, we are using tidyverse and igraph. For more details about igraph, see the tutorial by Dr. Katya Ognyanova at https://kateto.net/networks-r-igraph. You can also check the documentation at https://igraph.org/r/.

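# install pacman (a package manager) if it is not available yet
if (!requireNamespace("pacman", quietly = TRUE))
  install.packages("pacman")
library(pacman)
# packages to install (if missing) and load; add your own here
packages <- c("tidyverse", "tidytext", "igraph", "intergraph", "network",
              "ggplot2", "here")
p_load(packages, character.only = TRUE)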

Some Network Basics

The data below is the example used in our lecture. You can replicate the graph using the following code.

library(igraph)

# Create data
set.seed(10)
data <- matrix(c(0,0,1,1,1,0,0,1,0,1,0,1,1,0,0,0), nrow=4,byrow = TRUE)

# set the row and column names
colnames(data) = rownames(data) = c("Bob","Carol","Ted","Alice")
 
data
##       Bob Carol Ted Alice
## Bob     0     0   1     1
## Carol   1     0   0     1
## Ted     0     1   0     1
## Alice   1     0   0     0
# build the graph object from matrix
net <- graph_from_adjacency_matrix(data, mode="directed",diag = FALSE)

net
## IGRAPH 6403c5b DN-- 4 7 -- 
## + attr: name (v/c)
## + edges from 6403c5b (vertex names):
## [1] Bob  ->Ted   Bob  ->Alice Carol->Bob   Carol->Alice Ted  ->Carol
## [6] Ted  ->Alice Alice->Bob
class(net)
## [1] "igraph"
# plot it
plot(net) 

# let us improve the plot
par(mfrow=c(1,1),mar=c(0,0,0,0))
plot(net, 
     vertex.color = "white",
     vertex.size=30, 
     vertex.frame.color = "blue",
     vertex.shape="circle",
     vertex.label.family="Helvetica",
     vertex.label.color="red",
     vertex.label.font=c(3), 
     vertex.label.cex=c(2),
     vertex.label.dist=0,
     vertex.label.degree=1,     
     edge.color="gray",
     edge.width=1,
     edge.arrow.size=.7,
     edge.arrow.width=.7,
     edge.lty="solid",
     edge.curved=0.1)
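
The same network can also be written as an edge list with one row per tie, which is the from/to layout used for the survey data later in this tutorial. A minimal sketch in base R; the object names adj and edges are only illustrative:

```r
# Rebuild the lecture example as an adjacency matrix
# (rows send ties, columns receive them)
people <- c("Bob", "Carol", "Ted", "Alice")
adj <- matrix(c(0, 0, 1, 1,
                1, 0, 0, 1,
                0, 1, 0, 1,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE, dimnames = list(people, people))

# Locate every cell equal to 1 and turn it into a from -> to row
idx <- which(adj == 1, arr.ind = TRUE)
edges <- data.frame(from = rownames(adj)[idx[, "row"]],
                    to   = colnames(adj)[idx[, "col"]])

edges        # one row per directed tie
nrow(edges)  # number of ties
```

A data frame in this shape can be passed to graph_from_data_frame, which is how the survey network is built further down.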

You can use the degree function from the igraph package to compute outdegree or indegree centrality. Its usage is:

degree(graph, v = V(graph), mode = c("all", "out", "in", "total"), loops = TRUE, normalized = FALSE)

# let us compute some network features such as outdegree and indegree centrality
out_d <- degree(
  net,
  mode = c("out")
)

print(out_d)
##   Bob Carol   Ted Alice 
##     2     2     2     1
in_d <- degree(
  net,
  mode = c("in")
)

print(in_d)
##   Bob Carol   Ted Alice 
##     2     1     1     3
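
For a small network, these counts can be checked directly from the adjacency matrix: each row sum is an outdegree and each column sum is an indegree. A minimal sketch in base R (the name adj is illustrative); the last step divides by n - 1, which is what normalized = TRUE does in degree:

```r
# Same adjacency matrix as in the lecture example
people <- c("Bob", "Carol", "Ted", "Alice")
adj <- matrix(c(0, 0, 1, 1,
                1, 0, 0, 1,
                0, 1, 0, 1,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE, dimnames = list(people, people))

out_by_hand <- rowSums(adj)  # ties sent by each person
in_by_hand  <- colSums(adj)  # ties received by each person

# Normalized indegree: share of the other n - 1 people who point at you
in_normalized <- in_by_hand / (nrow(adj) - 1)

out_by_hand
in_by_hand
in_normalized
```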

You can also compute betweenness, closeness, and eigenvector centrality.

You can use eigen_centrality to find the eigenvector score. It takes a graph (graph) and returns a list whose vector element holds the eigenvector centrality of each vertex.

eigen_centrality(graph, directed = FALSE, scale = TRUE, weights = NULL)

eigen_v <- eigen_centrality(net)
eigen_v$vector
##       Bob     Carol       Ted     Alice 
## 1.0000000 0.7807764 0.7807764 1.0000000

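Bob and Alice have the highest eigenvector centrality. Note that with the default directed = FALSE, the direction of the arrows is ignored when this score is computed.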
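
One way to see where these numbers come from is to compute the leading eigenvector by hand. A minimal sketch in base R, assuming edge direction is ignored so that the Bob and Alice pair, tied in both directions, counts twice; the names adj, sym, and scores are illustrative:

```r
# Same adjacency matrix as in the lecture example
people <- c("Bob", "Carol", "Ted", "Alice")
adj <- matrix(c(0, 0, 1, 1,
                1, 0, 0, 1,
                0, 1, 0, 1,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE, dimnames = list(people, people))

# Ignore direction: add the matrix to its transpose
sym <- adj + t(adj)

# eigen() sorts eigenvalues in decreasing order, so column 1 belongs to
# the largest one; its sign is arbitrary, hence abs()
ev <- eigen(sym)
lead <- abs(ev$vectors[, 1])

# Rescale so that the largest score equals 1
scores <- setNames(lead / max(lead), people)
round(scores, 7)
```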

Next, let us look at betweenness centrality.

bt_v <- betweenness(net)
bt_v
##   Bob Carol   Ted Alice 
##   3.0   0.5   2.0   0.5

So Bob has the highest betweenness centrality.

Now let us look at closeness centrality.

close_v <- closeness(net)
close_v
##       Bob     Carol       Ted     Alice 
## 0.2500000 0.2500000 0.2500000 0.1666667
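
Closeness here is the reciprocal of the summed shortest-path distances from a person to everyone else, following the direction of the arrows (the default mode = "out"). A minimal sketch in base R, assuming every person can reach every other person, as is the case in this example. The Floyd-Warshall loop is a swapped-in illustration, not a description of how igraph computes distances:

```r
# Same adjacency matrix as in the lecture example
people <- c("Bob", "Carol", "Ted", "Alice")
adj <- matrix(c(0, 0, 1, 1,
                1, 0, 0, 1,
                0, 1, 0, 1,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE, dimnames = list(people, people))

# Start with distance 1 for direct ties, Inf otherwise, 0 to oneself
d <- ifelse(adj > 0, 1, Inf)
diag(d) <- 0

# Floyd-Warshall: allow each person k in turn as an intermediate step
for (k in seq_len(nrow(d))) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Closeness = 1 / total distance to all others
closeness_by_hand <- 1 / rowSums(d)
closeness_by_hand
```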

We also talked about structural holes in the lecture. You can compute Burt’s constraint measure with the constraint function. Burt’s constraint is higher if ego has fewer contacts, or contacts that are more strongly connected to one another (i.e., more redundant).

burt_c <- constraint(net)
burt_c
##       Bob     Carol       Ted     Alice 
## 0.8645833 0.9969136 0.9969136 0.8645833
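
For reference, the measure can be written out from Burt’s definition: with p[i, j] the share of person i's ties invested in person j, the constraint on i is the sum over all other people j of (p[i, j] + sum over q of p[i, q] * p[q, j])^2. A minimal sketch in base R, assuming ties are symmetrized by adding both directions; it illustrates the formula and may not match igraph's implementation detail for detail on other graphs:

```r
# Same adjacency matrix as in the lecture example
people <- c("Bob", "Carol", "Ted", "Alice")
adj <- matrix(c(0, 0, 1, 1,
                1, 0, 0, 1,
                0, 1, 0, 1,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE, dimnames = list(people, people))

w <- adj + t(adj)      # tie strength between each pair, ignoring direction
p <- w / rowSums(w)    # p[i, j]: share of i's total tie strength spent on j
indirect <- p %*% p    # sum over q of p[i, q] * p[q, j]

pair_constraint <- (p + indirect)^2
diag(pair_constraint) <- 0   # a person does not constrain themselves

constraint_by_hand <- rowSums(pair_constraint)
round(constraint_by_hand, 7)
```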

Let Us Move to a Real Network Study

Some Background about the Study Abroad Students Network Data

We developed a novel Study Abroad Complete Network Questionnaire (SACNQ) to collect students’ complete (whole) network data.

We administered the SACNQ in a small study abroad program (n = 38) in China using Qualtrics.

We first obtained a list of students in the target cohort we were studying.

We then asked study abroad students to identify each student they frequently talk to in their cohort.

We further asked students to evaluate each selected student’s language use.

We also collected each student’s demographic and other friendship network information.

We further conducted six months of ethnographic work to assess the validity of the networks students self-reported in the SACNQ.

Note that we asked students to report their out-of-classroom networks as well.

How to Read Qualtrics Data into R

Note that when you download your Qualtrics data, you need to delete the second and third rows. They contain extra information about your variables that you do not need. You only need a dataset whose first row holds the variable names and whose remaining rows hold your raw data.

# load survey data
sacnq <- read_csv("SACNQ.csv") %>% 
  arrange(name_id)
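
If you prefer not to edit the exported file by hand, the two extra rows can also be skipped when reading the data. A minimal sketch in base R, assuming the layout described above (variable names in row 1, extra information in rows 2 and 3). The small file written here is made-up stand-in data, not the real SACNQ export:

```r
# Write a tiny stand-in for a Qualtrics export (hypothetical content)
path <- tempfile(fileext = ".csv")
writeLines(c("name_id,gender",
             "Participant ID,Gender",   # row 2: extra information
             "import_1,import_2",       # row 3: extra information
             "2,F",
             "1,M"), path)

# Read the variable names from the first row only
var_names <- names(read.csv(path, nrows = 1))

# Skip the first three rows, then reuse the saved variable names
survey <- read.csv(path, skip = 3, header = FALSE, col.names = var_names)

# Same sorting as arrange(name_id)
survey <- survey[order(survey$name_id), ]
survey
```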

Here is a snapshot of the network data.

load("sacnq.RData")
knitr::kable(sacnq_edges[1:2,1:8])
from  to  w1_freq  w2_chn  w3_en  w4_oth_lang  w5_chn_pro  w6_en_pro
   1   2        1       5     95           NA           3          5
   1   5        2       5     95           NA           2          5
knitr::kable(sacnq_attrs[1:2,c(1,3:8)])
name_id  pseudo_name  gender  nation  region         month  chn_pro_self_report
      1  Tiffany      F       USA     North America     10  Imd
      2  Hana         F       Korea   Asia              22  Imd

Let us visualize the speaker network first.

library(igraph)
# keep only ties with w1_freq above 3, then build a directed graph
net <- graph_from_data_frame(d = sacnq_edges %>% filter(w1_freq > 3),
                             vertices = sacnq_attrs,
                             directed = TRUE)
plot(net)

# let us improve the plot
par(mar=c(0,0,1,1) ,pty = "s")
plot(net, 
     vertex.color = rgb(0.8,0.4,0.3,0.8),
     vertex.size=8, vertex.frame.color = "gray",
     vertex.shape="circle",vertex.label=V(net)$pseudo_name,
     vertex.label.family="Helvetica",vertex.label.color="gray40",
     vertex.label.font=c(3), vertex.label.cex=c(.8),
     vertex.label.dist=1.5,vertex.label.degree=1,     
     edge.color="gray",edge.width=1,edge.arrow.size=.7,
     edge.arrow.width=.7,edge.lty="solid",edge.curved=0.3)

Let us see the popularity of students.
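
One common way to measure popularity is indegree: the number of classmates who name a student. A minimal sketch with igraph, using a made-up four-person edge list in the same from/to layout. The names toy_edges and toy_attrs and their contents are hypothetical, not the SACNQ data:

```r
library(igraph)

# Hypothetical nominations: from = who answered, to = whom they named
toy_edges <- data.frame(from = c(1, 3, 4, 2, 4),
                        to   = c(2, 2, 2, 1, 1))
toy_attrs <- data.frame(name_id = 1:4,
                        pseudo_name = c("A", "B", "C", "D"))

toy_net <- graph_from_data_frame(d = toy_edges,
                                 vertices = toy_attrs,
                                 directed = TRUE)

# Indegree = number of incoming nominations
popularity <- degree(toy_net, mode = "in")
names(popularity) <- V(toy_net)$pseudo_name

sort(popularity, decreasing = TRUE)
```

With the survey data, the same idea would be degree(net, mode = "in"), and the result could be passed to vertex.size in plot to draw popular students larger.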

Let us see the internal community in the cohort.
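
Communities can be explored with one of igraph's community detection functions. A minimal sketch using cluster_walktrap on a made-up network of two tight groups joined by a single tie. The data are hypothetical, and walktrap is only one of several available algorithms, not necessarily the one used in the lecture:

```r
library(igraph)

# Hypothetical ties: A-B-C form one group, D-E-F another, C-D links them
toy_edges <- data.frame(from = c("A", "A", "B", "D", "D", "E", "C"),
                        to   = c("B", "C", "C", "E", "F", "F", "D"))
toy_net <- graph_from_data_frame(toy_edges, directed = FALSE)

# Detect communities with short random walks
comm <- cluster_walktrap(toy_net)

membership(comm)  # community label for each person
modularity(comm)  # how well separated the detected communities are
```

Calling plot(comm, toy_net) draws the network with each detected community shaded, which is a quick way to inspect the result.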

The End