Network theory is the study of graphs as a representation of relationship between discrete elements. When applied to social relations, it is known as social network analysis.1
Bollywood
In this article, network theory is applied to analyse relationship between some professionals in bollywood, based on data from movie set. The data has been compiled by Parth Parikh from various sources. The analysis in this article involves a subset of relevant data.
Data preparation
The CSVs downloaded are imported as following
film1: Movie details from 1950 to 1989
film2: Movie details from 1990 to 2009
film3: Movie details from 2010 to 2019
crew: Crew information with unique identifier
filmcrew: Crew details (unique identifier) in each movie
Code
#Combining film datafilms<-rbind(film1,film2,film3)rm(film1,film2,film3)# separating columns with actor names, since actor names are in single columnf<-films%>%separate(actors,c ("a1","a2","a3","a4","a5","a6","a7","a8","a9","a10"),sep="[|]")%>%filter(a1!="NA")#dedup any possible duplicates in movief<-f[!duplicated(f$imdb_id), ]rm(films)# selecting relevant columns from filmcrewfc<-filmcrew[,c(1,2,4,5)]rm(filmcrew)# separating column with writer names, since writer names are in single column and deleting rows with no writerc<-crew%>%separate(writers, c("w1","w2","w3","w4","w5","w6","w7","w8","w9","w10"), sep="[|]")%>%filter(w1!="\\N")#dedup any possible duplicates in moviec<-c[!duplicated(c$imdb_id), ]rm(crew)#Compilining actor and crew info per movie togetherfull.coded.raw<-c%>%right_join(f,by=c("imdb_id"))#Relevant columns are directors, writers and actors i.e. director, wx and axdt<-full.coded.raw[,c(2:12,16:25)]rm(full.coded.raw)#To create a file suitable for network analysis, a 2 column file is required, which depicts relationships. This can be achieved by creating combination of columns and binding them together. It can be done manually or the `juggling_jaguar` function from package `Rmessy` can be used. The package is under development. Please feel free to use it and develop it further. The package can be downloaded from github here. Or it can be installed using devtools using the following command.#>install.packages("devtools") #if not installed#>devtools::install_github("asitav-sen/Rmessy")#using juggling_jaguardt.net<-juggling_jaguar(dt)rm(dt)# names(dt.net)[1]="x"# names(dt.net)[2]="y"# # dt.net<-# dt.net%>%filter(x!=y)#The data frame contains unique ids of the crew (and names of the actors). The data frame fc contains the relevant information to get the names of the relevant codes. However, since actor names are not coded, it is important that their names remain intact.dt.net1<-dt.net%>%left_join(fc,by=c("x"="crew_id"))%>%mutate(from=ifelse(is.na(name),x,name))%>%select(c(6,2))%>%left_join(fc,by=c("y"="crew_id"))%>%mutate(to=ifelse(is.na(name),y,name))%>%select(c(1,6))rm(dt.net)# Since some of the names were not found in the crew list available, these rows may be deleted. This is optional. One may want to analyse using the codes.dt.net2<-dt.net1%>%filter(!str_detect(to,"^nm"))%>%filter(!str_detect(from,"^nm"))rm(dt.net1)#Removing possible empty rowsrm.ro<-which(dt.net2$to=="")dt.net2<-dt.net2[-rm.ro,]#In this data set, pair of A-B and B-A are considered different. However, they are ultimately same. Hence, the data was further rectified.# converting df in igraph filegra.ph<-graph_from_data_frame(d=dt.net2, directed=FALSE)# converting back to data framedata.df<-get.data.frame(gra.ph)rm(gra.ph)#To analyze 'strength of relation' one can assume the number of times people have worked together to be a good indicator.data.df<- data.df%>%group_by(from,to)%>%count()%>%arrange(desc(n))%>%rename(works=n)%>%filter(from!=to)# creating graph object and Removing scattersgra.ph<-graph_from_data_frame(d=data.df, directed=FALSE)gra.ph$weight<-data.df$worksV(gra.ph)$comp <-components(gra.ph)$membershipgra.main <-induced_subgraph(gra.ph,V(gra.ph)$comp==1)rm(gra.ph)
Analyses
Understanding importance of nodes (individuals) and the network
There is immense inequality in the importance of the individuals. Out of all, very few individuals have worked with more than 100 different people in the industry. This is observed through histogram of degrees. Similar trend is observed in the histogram of betweenness, which is another indicator of importance. Roughly, betweenness in this case can be simplified as the tendency to do have worked with different individuals who have not worked together. (This is an oversimplification)
The assortativity based on degree i.e. tendency for individuals to work with other individuals with similar degree (connections), lies somewhere in the middle, near 0. There is almost equal mix of cases.
Code
assortativity_degree(gra.main, directed=FALSE)
[1] 0.0175864
Transitivity of 0.23 is much higher than that of randomly generated network of similar properties. However, it is not uncommon to observe social networks to have transitivity between 03. to 0.6.2 Transitivity measures how well connected the network is. (Oversimplification)
Code
# creating random trees for comparison# *****Requires substantial computational power*****rnd.main <-vector('list',500)dens.main<-edge_density(gra.main)n=gorder(gra.main)for(i in1:500){ rnd.main[[i]] <-sample_gnp(n=n, p = dens.main#, type = "gnp" )}tra.main<-transitivity(gra.main)tra.rnd <-unlist(lapply(rnd.main, transitivity))par(mfrow=c(1,2))hist(tra.rnd, main="Transitivity")abline(v=tra.main)hist(tra.rnd, main="Transitivity, x-axis extended", xlim =c(0,0.3))abline(v=tra.main)
Code
rm(rnd.main,tra.rnd)#similar test ca be prformed for other properties like diameter, max cliques etc.# dia.main<- diameter(gra.main, directed = FALSE)# dia.rnd <- unlist(lapply(rnd.main, diameter, directed = FALSE))# max.c.main<-max_cliques(gra.main)# lar.c.main<-largest_cliques(gra.main)
Understanding communities/clusters in the network
Fast Greedy algorithm identifies several segments, top five of which are as follows.
Central figure cannot be identified from this graph. There seems to be several who can clain to be ‘central’. Eminent personality like Kamal Hassan, Girish Karnad, Smita Patil etc. are also present in this segment. Satyajit Ray and Anil Chatterjee too. Interestingly, they some of the major characters of parallel cinema.
Mahabanoo Mody Kotwal seems to be the central figure in cluster 3. This group does not seem to consist of bollywood blockbuster creators. However, they have gained popularity in regional movies and television. Some of them are foreigners too.
Cluster 1 and 2 are too big and complex to show anything meaningful in the network plot. They need to be broken down further or other segmenting methods need to be used to capture different segments.
Comparing the histogram of degrees, betweenness and eigen vectors do not show any significant difference or unexpected outcome. In the histogram of eigen vectors of segment 3, the pattern is different from others. There are more individuals with higher eigen vectors. This is natural in a smaller group (Segment 3 is very small). People in a smaller group tend to connect to each other more than in those in larger group.
Variation in edge density is noticed in the segments. Segment 3, being smallest, can be expected to have higher density (people connected to each other). Comparing density of segment 2 and 4 is interesting. The population of segment 4 is much smaller. Yet, the density is lower than that of segment 2. This indicates that individuals in segment 2 are more connected (have worked with) to each other than those in segment 4. This, kind of, gets reinforced when the diameter is observed. The diameter of segment 4 is 8, compared to 4 of segment 2. This means that there are 3 connections in between the farthest points of the network in segment 2, compared to 8 in case of segment 4 (despite substantially lower population).
Another interesting observation is the assortativity of segment 3, which is highest. The tendency to stick together with people with similar number of connections is higher in segment 3.
This analysis can be further extended. Esp, by using clustering algorithms other than fast greedy. Moreover, detailed analysis of ego graphs may reveal interesting insights. Don’t forget to share your results, if you do any of it.
Contact me if - You want to understand how network analysis can help in your sales and marketing efforts. - You are looking to collaborate for some investigation/research.
---title: "Network analysis in Bollywood"author: "Asitav Sen"date: "2020-06-09"date-modified: "1/18/2025"categories: [article, analysis, R]format: html: page-layout: article lightbox: auto---```{r setup, include=FALSE}knitr::opts_chunk$set(echo = TRUE)#library(dplyr)library(tidyverse)library(stringr)library(Rmessy)library(igraph)film1<-read_csv("bollywood_text_1950-1989.csv")film2<-read_csv("bollywood_text_1990-2009.csv")film3<-read_csv("bollywood_text_2010-2019.csv")crew<-read_csv("bollywood_crew_1950-2019.csv")filmcrew<-read_csv("bollywood_crew_data_1950-2019.csv")```## Network AnalysisNetwork theory is the study of graphs as a representation of relationship between discrete elements. When applied to social relations, it is known as social network analysis.[^1]## BollywoodIn this article, network theory is applied to analyse relationship between some professionals in [bollywood](https://en.wikipedia.org/wiki/Bollywood), based on data from movie set. The data has been compiled by [Parth Parikh](https://www.kaggle.com/pncnmnp/the-indian-movie-database) from various sources. The analysis in this article involves a subset of relevant data.## Data preparationThe CSVs downloaded are imported as following - film1: Movie details from 1950 to 1989 - film2: Movie details from 1990 to 2009 - film3: Movie details from 2010 to 2019 - crew: Crew information with unique identifier - filmcrew: Crew details (unique identifier) in each movie```{r data.prep, echo=TRUE, message=FALSE, warning=FALSE}#Combining film datafilms<-rbind(film1,film2,film3)rm(film1,film2,film3)# separating columns with actor names, since actor names are in single columnf<-films%>% separate(actors,c ("a1","a2","a3","a4","a5","a6","a7","a8","a9","a10"),sep="[|]")%>% filter(a1!="NA")#dedup any possible duplicates in movief<-f[!duplicated(f$imdb_id), ]rm(films)# selecting relevant columns from filmcrewfc<-filmcrew[,c(1,2,4,5)]rm(filmcrew)# separating column with writer names, since writer names are in single column and deleting rows with no writerc<-crew%>% separate(writers, c("w1","w2","w3","w4","w5","w6","w7","w8","w9","w10"), sep="[|]")%>% filter(w1!="\\N")#dedup any possible duplicates in moviec<-c[!duplicated(c$imdb_id), ]rm(crew)#Compilining actor and crew info per movie togetherfull.coded.raw<-c%>%right_join(f,by=c("imdb_id"))#Relevant columns are directors, writers and actors i.e. director, wx and axdt<-full.coded.raw[,c(2:12,16:25)]rm(full.coded.raw)#To create a file suitable for network analysis, a 2 column file is required, which depicts relationships. This can be achieved by creating combination of columns and binding them together. It can be done manually or the `juggling_jaguar` function from package `Rmessy` can be used. The package is under development. Please feel free to use it and develop it further. The package can be downloaded from github here. Or it can be installed using devtools using the following command.#>install.packages("devtools") #if not installed#>devtools::install_github("asitav-sen/Rmessy")#using juggling_jaguardt.net<-juggling_jaguar(dt)rm(dt)# names(dt.net)[1]="x"# names(dt.net)[2]="y"# # dt.net<-# dt.net%>%filter(x!=y)#The data frame contains unique ids of the crew (and names of the actors). The data frame fc contains the relevant information to get the names of the relevant codes. However, since actor names are not coded, it is important that their names remain intact.dt.net1<-dt.net%>% left_join(fc,by=c("x"="crew_id"))%>% mutate(from=ifelse(is.na(name),x,name))%>% select(c(6,2))%>% left_join(fc,by=c("y"="crew_id"))%>% mutate(to=ifelse(is.na(name),y,name))%>% select(c(1,6))rm(dt.net)# Since some of the names were not found in the crew list available, these rows may be deleted. This is optional. One may want to analyse using the codes.dt.net2<-dt.net1%>% filter(!str_detect(to,"^nm"))%>% filter(!str_detect(from,"^nm"))rm(dt.net1)#Removing possible empty rowsrm.ro<-which(dt.net2$to=="")dt.net2<-dt.net2[-rm.ro,]#In this data set, pair of A-B and B-A are considered different. However, they are ultimately same. Hence, the data was further rectified.# converting df in igraph filegra.ph<-graph_from_data_frame(d=dt.net2, directed=FALSE)# converting back to data framedata.df<-get.data.frame(gra.ph)rm(gra.ph)#To analyze 'strength of relation' one can assume the number of times people have worked together to be a good indicator.data.df<- data.df%>% group_by(from,to)%>% count()%>% arrange(desc(n))%>% rename(works=n)%>% filter(from!=to)# creating graph object and Removing scattersgra.ph<-graph_from_data_frame(d=data.df, directed=FALSE)gra.ph$weight<-data.df$worksV(gra.ph)$comp <- components(gra.ph)$membershipgra.main <- induced_subgraph(gra.ph,V(gra.ph)$comp==1)rm(gra.ph)```## Analyses### Understanding importance of nodes (individuals) and the networkThere is immense inequality in the importance of the individuals. Out of all, very few individuals have worked with more than 100 different people in the industry. This is observed through histogram of degrees. Similar trend is observed in the histogram of betweenness, which is another indicator of importance. Roughly, betweenness in this case can be simplified as the tendency to do have worked with different individuals who have not worked together. *(This is an oversimplification)* ```{r deg_bw_eigen, echo=TRUE, fig.height=6, fig.width=9}main.deg<-degree(gra.main, mode = "all")main.bw<-betweenness(gra.main,directed = FALSE, normalized = TRUE)eigen.main<-eigen_centrality(gra.main)par(mfrow=c(1,3))hist(main.deg, breaks = 10, main = "Degree", xlab = "Degree")hist(main.bw, breaks = 100, main = "Betweenness", xlab = "Betweenness")hist(eigen.main$vector, breaks = 100, main= "Eigen Vector", xlab = "Eigen Vector")```The top individuals identified are mentioned in the table. ```{r echo=TRUE}bydegree<-sort(main.deg, decreasing=TRUE)[1:20]bybetweenness<-sort(main.bw, decreasing=TRUE)[1:20]byeigen<-sort(eigen.main$vector,decreasing = TRUE)[1:20]top<-data.frame(bydegree,bybetweenness,byeigen)top```******<a title="Bollywood Hungama / CC BY (https://creativecommons.org/licenses/by/3.0)" href="https://commons.wikimedia.org/wiki/File:Dilip_Kumar_Saira_Banu_Aruna_Irani_still8.jpg"><img width="512" alt="Dilip Kumar Saira Banu Aruna Irani still8" src="https://upload.wikimedia.org/wikipedia/commons/e/e9/Dilip_Kumar_Saira_Banu_Aruna_Irani_still8.jpg"></a><a title="Bollywood Hungama / CC BY (https://creativecommons.org/licenses/by/3.0)" href="https://commons.wikimedia.org/wiki/File:Shakti_Kapoor_1.jpg"><img width="512" alt="Shakti Kapoor 1" src="https://upload.wikimedia.org/wikipedia/commons/5/5e/Shakti_Kapoor_1.jpg"></a>******The farthest or the longest connection in the network is between Ajai Sinha and Edwin Fernandes with 5 individuals in between.```{r farthe, echo=TRUE}farthest_vertices(gra.main)get_diameter(gra.main) ```The assortativity based on degree i.e. tendency for individuals to work with other individuals with similar degree (connections), lies somewhere in the middle, near 0. There is almost equal mix of cases. ```{r assort, echo=TRUE}assortativity_degree(gra.main, directed= FALSE)```Transitivity of 0.23 is much higher than that of randomly generated network of similar properties. However, it is not uncommon to observe social networks to have transitivity between 03. to 0.6.[^2]Transitivity measures how well connected the network is. *(Oversimplification)* ```{r transi, echo=TRUE, fig.height=6, fig.width=9}# creating random trees for comparison# *****Requires substantial computational power*****rnd.main <- vector('list',500)dens.main<-edge_density(gra.main)n=gorder(gra.main)for(i in 1:500){ rnd.main[[i]] <- sample_gnp(n=n, p = dens.main #, type = "gnp" )}tra.main<-transitivity(gra.main)tra.rnd <- unlist(lapply(rnd.main, transitivity))par(mfrow=c(1,2))hist(tra.rnd, main="Transitivity")abline(v=tra.main)hist(tra.rnd, main="Transitivity, x-axis extended", xlim = c(0,0.3))abline(v=tra.main)rm(rnd.main,tra.rnd)#similar test ca be prformed for other properties like diameter, max cliques etc.# dia.main<- diameter(gra.main, directed = FALSE)# dia.rnd <- unlist(lapply(rnd.main, diameter, directed = FALSE))# max.c.main<-max_cliques(gra.main)# lar.c.main<-largest_cliques(gra.main)```### Understanding communities/clusters in the networkFast Greedy algorithm identifies several segments, top five of which are as follows.```{r fastgreedy, echo=TRUE}#Fast Greedycom.fg<-cluster_fast_greedy(gra.main)sort(sizes(com.fg), decreasing = TRUE)[1:5]#To check membership#membership(com.fg)#membership(com.fg)[membership(com.fg)=1]#membership(com.fg)[names(membership(com.fg))="Amitabh Bahchan"]```Following is the plot of cluster 4 , third largest community identified by the algorithm.```{r cl4, echo=TRUE, fig.height=6, fig.width=7}comm.fg.4 <- as.undirected(induced_subgraph(gra.main, com.fg[[4]]))comm.fg.4.deg<-degree(comm.fg.4, mode = "all")par(bg="black", mfrow=c(1,1))plot(comm.fg.4, rescale= TRUE, vertex.label = ifelse(degree(comm.fg.4) >= 20, names(V(comm.fg.4)), NA), vertex.color = adjustcolor("gold", alpha.f = .5), vertex.size = sqrt(comm.fg.4.deg), layout = layout_with_lgl(comm.fg.4), vertex.label.cex= 0.75, vertex.label.degree=pi/2, vertex.label.dist=1.5, vertex.label.color="white", edge.curved=0.5, edge.width= 0.5, edge.color = ifelse(comm.fg.4$weight>25, "dark green", "dark red"))title("Network of cluster 4",cex.main=1,col.main="white")legend("topright", c("<=25 times",">25 times"), pch=21, col="white", pt.bg=c("dark red","dark green"), pt.cex=2, cex=.8, bty="n", ncol=1, title = "Worked Together", text.col = "white")```Central figure cannot be identified from this graph. There seems to be several who can clain to be 'central'. Eminent personality like Kamal Hassan, Girish Karnad, Smita Patil etc. are also present in this segment. Satyajit Ray and Anil Chatterjee too. Interestingly, they some of the major characters of parallel cinema.Mahabanoo Mody Kotwal seems to be the central figure in cluster 3. This group does not seem to consist of bollywood blockbuster creators. However, they have gained popularity in regional movies and television. Some of them are foreigners too.```{r cl3, echo=TRUE, fig.height=6, fig.width=9}comm.fg.3 <- as.undirected(induced_subgraph(gra.main, com.fg[[3]]))comm.fg.3.deg<-degree(comm.fg.3, mode = "all")par(bg="black", mfrow=c(1,1))plot(comm.fg.3, rescale= TRUE, vertex.label = ifelse(comm.fg.3.deg >= 11, names(V(comm.fg.3)), NA), vertex.color = adjustcolor("gold", alpha.f = .5), vertex.size = comm.fg.3.deg^(1/5), layout = layout_with_lgl(comm.fg.3), vertex.label.cex= 0.75, vertex.label.degree=pi/2, vertex.label.dist=1, vertex.label.color="white", edge.curved=0.5, edge.width= 0.5, edge.color = ifelse(comm.fg.3$weight>25, "dark green", "dark red"), xlim = c(-1,1.1), asp=-0.5)title("Network of cluster 3",cex.main=1,col.main="white")legend("topright", c("<=25 times",">25 times"), pch=21, col="white", pt.bg=c("dark red","dark green"), pt.cex=2, cex=.8, bty="n", ncol=1, title = "Worked Together", text.col = "white")```Cluster 1 and 2 are too big and complex to show anything meaningful in the network plot. They need to be broken down further or other segmenting methods need to be used to capture different segments.```{r cl12, echo=TRUE}comm.fg.1 <- as.undirected(induced_subgraph(gra.main, com.fg[[1]]))comm.fg.1.deg<-degree(comm.fg.1, mode = "all")comm.fg.2 <- as.undirected(induced_subgraph(gra.main, com.fg[[2]]))comm.fg.2.deg<-degree(comm.fg.2, mode = "all")# cluster 1par(bg="black", mfrow=c(2,2))plot(comm.fg.1, rescale= TRUE, vertex.label = NA, #ifelse(comm.fg.2.deg >= 350, names(V(comm.fg.2)), NA), vertex.color = adjustcolor("gold", alpha.f = .5), vertex.size = ifelse(comm.fg.1.deg<50,0.1,(comm.fg.1.deg)^(1/4)), layout = layout_with_lgl(comm.fg.1), vertex.label.cex= 0.75, vertex.label.degree=pi/2, vertex.label.dist=1, vertex.label.color="black", edge.curved=0.5, edge.width= 0.5, edge.color = adjustcolor(ifelse(comm.fg.1$weight>10, "dark green", "dark red"),alpha=0.3), xlim = c(-1,1.1), asp=-1, axes = F)title("Network of cluster 1",cex.main=1,col.main="white")legend("topright", c("<=10 times",">10 times"), pch=21, col="white", pt.bg=c("dark red","dark green"), pt.cex=2, cex=.8, bty="n", ncol=1, title = "Worked Together", text.col = "white")#cluster 1, zoomedplot(comm.fg.1, rescale= TRUE, vertex.label = ifelse(comm.fg.1.deg >= 350, names(V(comm.fg.1)), NA), vertex.color = adjustcolor("gold", alpha.f = .3), vertex.size = ifelse(comm.fg.1.deg<50,0.1,(comm.fg.1.deg)^(1/5)), layout = layout_with_lgl(comm.fg.1), vertex.label.cex= 0.75, vertex.label.degree=pi/2, vertex.label.dist=1, vertex.label.color="black", edge.curved=0.5, edge.width= 0.5, edge.color = adjustcolor(ifelse(comm.fg.1$weight>10, "dark green", "dark red"),alpha=0.2), xlim = c(-0.025,0.025), ylim = c(-0.025,0.025), asp=-1, axes = F)title("Network of cluster 1, zoomed",cex.main=1,col.main="white")legend("topright", c("<=10 times",">10 times"), pch=21, col="white", pt.bg=c("dark red","dark green"), pt.cex=2, cex=.8, bty="n", ncol=1, title = "Worked Together", text.col = "white")# cluster 2plot(comm.fg.2, rescale= TRUE, vertex.label = NA, #ifelse(comm.fg.2.deg >= 350, names(V(comm.fg.2)), NA), vertex.color = adjustcolor("gold", alpha.f = .5), vertex.size = ifelse(comm.fg.2.deg<50,0.1,(comm.fg.2.deg)^(1/4)), layout = layout_with_lgl(comm.fg.2), vertex.label.cex= 0.75, vertex.label.degree=pi/2, vertex.label.dist=1, vertex.label.color="black", edge.curved=0.5, edge.width= 0.5, edge.color = adjustcolor(ifelse(comm.fg.2$weight>10, "dark green", "dark red"),alpha=0.3), xlim = c(-1,1.1), asp=-1, axes = F)title("Network of cluster 2",cex.main=1,col.main="white")legend("topright", c("<=10 times",">10 times"), pch=21, col="white", pt.bg=c("dark red","dark green"), pt.cex=2, cex=.8, bty="n", ncol=1, title = "Worked Together", text.col = "white")#cluster 2, zoomedplot(comm.fg.2, rescale= TRUE, vertex.label = ifelse(comm.fg.2.deg >= 350, names(V(comm.fg.2)), NA), vertex.color = adjustcolor("gold", alpha.f = .3), vertex.size = ifelse(comm.fg.2.deg<50,0.1,(comm.fg.2.deg)^(1/5)), layout = layout_with_lgl(comm.fg.2), vertex.label.cex= 0.75, vertex.label.degree=pi/2, vertex.label.dist=1, vertex.label.color="black", edge.curved=0.5, edge.width= 0.5, edge.color = adjustcolor(ifelse(comm.fg.2$weight>10, "dark green", "dark red"),alpha=0.2), xlim = c(-0.025,0.025), ylim = c(-0.025,0.025), asp=-1, axes = F)title("Network of cluster 2, zoomed",cex.main=1,col.main="white")legend("topright", c("<=10 times",">10 times"), pch=21, col="white", pt.bg=c("dark red","dark green"), pt.cex=2, cex=.8, bty="n", ncol=1, title = "Worked Together", text.col = "white")```### Comparing the clusters```{r mapp, echo=TRUE}l1<-mapping_monkey(comm.fg.1)l2<-mapping_monkey(comm.fg.2)l3<-mapping_monkey(comm.fg.3)l4<-mapping_monkey(comm.fg.4)```Comparing the histogram of degrees, betweenness and eigen vectors do not show any significant difference or unexpected outcome. In the histogram of eigen vectors of segment 3, the pattern is different from others. There are more individuals with higher eigen vectors. This is natural in a smaller group (Segment 3 is very small). People in a smaller group tend to connect to each other more than in those in larger group.```{r com, echo=TRUE, fig.height=6, fig.width=9}par(mfrow=c(3,4))hist(l1[[1]], breaks = 50, col=adjustcolor("black", alpha=0.3), main = "Degrees, Segment 1", xlab = "Degrees")hist(l2[[1]], breaks = 50, col=adjustcolor("red", alpha=0.3), main = "Degrees, Segment 2", xlab = "Degrees")hist(l3[[1]], breaks = 50, col=adjustcolor("green", alpha=0.3), main = "Degrees, Segment 3", xlab = "Degrees")hist(l4[[1]], breaks = 50, col=adjustcolor("blue", alpha=0.3), main = "Degrees, Segment 4", xlab = "Degrees")hist(l1[[2]], breaks = 50, col=adjustcolor("black", alpha=0.3), main = "Betweenness, Segment 1", xlab = "Betweenness")hist(l2[[2]], breaks = 50, col=adjustcolor("red", alpha=0.3), main = "Betweenness, Segment 2", xlab = "Betweenness")hist(l3[[2]], breaks = 50, col=adjustcolor("green", alpha=0.3), main = "Betweenness, Segment 3", xlab = "Betweenness")hist(l4[[2]], breaks = 50, col=adjustcolor("blue", alpha=0.3), main = "Betweenness, Segment 4", xlab = "Betweenness")hist(l1[[3]], breaks = 50, col=adjustcolor("black", alpha=0.3), main = "Eigen vectors, Segment 1", xlab = "Eigen Vector")hist(l2[[3]], breaks = 50, col=adjustcolor("red", alpha=0.3), main = "Eigen vectors, Segment 2", xlab = "Eigen Vector")hist(l3[[3]], breaks = 50, col=adjustcolor("green", alpha=0.3), main = "Eigen vectors, Segment 3", xlab = "Eigen Vector")hist(l4[[3]], breaks = 50, col=adjustcolor("blue", alpha=0.3), main = "Eigen vectors, Segment 4", xlab = "Eigen Vector")```Variation in edge density is noticed in the segments. Segment 3, being smallest, can be expected to have higher density (people connected to each other). Comparing density of segment 2 and 4 is interesting. The population of segment 4 is much smaller. Yet, the density is lower than that of segment 2. This indicates that individuals in segment 2 are more connected (have worked with) to each other than those in segment 4. This, kind of, gets reinforced when the diameter is observed. The diameter of segment 4 is 8, compared to 4 of segment 2. This means that there are 3 connections in between the farthest points of the network in segment 2, compared to 8 in case of segment 4 (despite substantially lower population). Another interesting observation is the assortativity of segment 3, which is highest. The tendency to stick together with people with similar number of connections is higher in segment 3. ```{r echo=TRUE}seg1<-c(l1[[4]],l1[[5]],l1[[8]],l1[[9]])seg2<-c(l2[[4]],l2[[5]],l2[[8]],l2[[9]])seg3<-c(l3[[4]],l3[[5]],l3[[8]],l3[[9]])seg4<-c(l4[[4]],l4[[5]],l4[[8]],l4[[9]])rname<-c("density", "diameter","assortativity", "transitivity")tba<-data.frame(`segmnt 1`= seg1,`segmnt 2`= seg2,`segmnt 3`= seg3,`segmnt 4`= seg4)rownames(tba)<-rnametba```## Final plot (for fun)For fun, graph with individuals with highest eigen vector score is plotted to visualize the network among themselves. ```{r topgun, echo=TRUE, fig.height=6, fig.width=9}ev<-sort(eigen.main$vector, decreasing = T)[1:50]ename<-c(names(ev))n<-c(match(ename,V(gra.main)$name))rock<-induced_subgraph(gra.main,vids=n)rock.d<-degree(rock,mode = "all")par(bg="black")plot(rock, rescale= TRUE, vertex.label = ifelse(rock.d >= 11, names(V(rock)), NA), vertex.color = adjustcolor("gold", alpha.f = .5), vertex.size = ifelse(rock.d<10,0.1,sqrt(rock.d)), layout = layout_with_lgl(rock), vertex.label.cex= 0.75, vertex.label.degree=pi/2, vertex.label.dist=1, vertex.label.color="white", edge.curved=0.5, edge.width= 0.5, edge.color = adjustcolor(ifelse(rock$weight>8, "dark green", "dark red"),alpha=0.9), xlim = c(-1,1), #asp=-1, axes = F)title("Network of top guns",cex.main=1,col.main="white")legend("topright", c("<=8 times",">8 times"), pch=21, col="black", pt.bg=c("dark red","dark green"), pt.cex=2, cex=.8, bty="n", ncol=1, title = "Worked Together", text.col = "white")```This analysis can be further extended. Esp, by using clustering algorithms other than fast greedy. Moreover, detailed analysis of ego graphs may reveal interesting insights.Don't forget to share your results, if you do any of it.>Contact me if - You want to understand how network analysis can help in your sales and marketing efforts. - You are looking to collaborate for some investigation/research.## Sources[^1]: https://en.wikipedia.org/wiki/Network_analysis[^2]: http://www.stats.ox.ac.uk/~snijders/Trans_Triads_ha.pdf