Export igraph R Objects to GePhi Gexf format

Make Vectorize your friend for R

Before we jump to GePhi and iGraph conversions, you might want to know something about Vectorization.

Old habits die hard. For someone coming from linear programming models such as C++ or Java, it takes quite a bit of effort before a fully-compliant R functions can be developed. The major pit-fall is, defining a function that is not vector-compliant.

For example, consider the below function that accepts two numbers and returns some computed value based on them.

    getNc2 = function(vc, ec)
      # compute nC2 and return the ratio
      max_ec = vc * (vc-1) / 2
      return (trunc(ec * 100 / max_ec))

This is a valid function in R, and perhaps similar to any C++ or Java method, except it works on both single number input and vector inputs. For example, note the below single number input calls:

> getNc2(5, 4)
[1] 40

> getNc2(6, 5)
[1] 33    

Now, those two calls can be combined into a single vector call as below and our above definition works perfectly fine without complaints:

    > getNc2(c(5,6), c(4,5))
    [1] 40 33    

The method accepted two vectors (instead of just two individual numbers) and returned vectorized results. Something not quite natural in C++ or Java, unless special care is taken to define the argument types as templated parameters and so on.

Now, what about the limits? Our method is computing a ratio which essentially involves a division. If we are not careful, we can endup with dividing-by-zero. For example, consider the below calls:

> getNc2(0, 5)
[1] -Inf

> getNc2(0, 0)
[1] NaN    

Suppose, we want to handle these cases where the argument is zero and return 0 instead of -Inf or NaN. Then we will be adding an if condition as below:

    getNc2 = function(vc, ec)
      # if no vertices return 0
      if(vc <= 1) return (0)
      # compute nC2 and return the ratio
      max_ec = vc * (vc-1) / 2
      return (trunc(ec * 100 / max_ec))

Now, the earlier calls will return 0 correctly as shown below:

> getNc2(0, 0)
[1] 0
> getNc2(0, 5)
[1] 0    

However, the if condition introduces a new problem. if in R is not a vector-compatible opration. It works only for scalar inputs and as such our method hence fails on vector inputs. For example:

> getNc2(c(5,0), c(4,5))
[1]   40 -Inf
Warning message:
In if (vc <= 1) return(0) :
  the condition has length > 1 and only the first element will be used    

Now, there are couple of ways you can address this problem. One would be to use apply constructs - but they can be overkill oftentimes. Other way is, using ifelse instead of if etc. vector-capable operations explicitly. That would work, provided you have the option to change the source. What about cases where you are using a method from some package and you cannot change the method ?

Take the method get.edge from igraph package, for example. It accepts an edge id and returns the vertices for that edge. However, if you supply a vector of edge ids, it will not work - it just returns the vertices for only the first edge, as can be seen from the below output:

> cg <- erdos.renyi.game(8, 0.6)
> get.edge(cg, 13)
[1] 4 6
> get.edge(cg, c(13,15,16))
[1] 4 6    

How do you make these kind of methods work for vector inputs as well as scalar inputs? R has a nice base method under its sleeve that exactly addresses this problem. Named Vectorize, it helps in these situations - coverting scalar methods to vector methods.

Vectorize creates a function wrapper that vectorizes the action of its argument function. For example, we can vectorize our getNc2 method as shown below:

> Vectorize(getNc2)(c(5,0), c(4,5))
[1] 40  0    

Instead of directly calling getNc2, we are rather creating a wrapper around it by calling Vectorize method with getNc2 as its argument, and supplying the original vector inputs to the resultant function. In other words, the above is same as:

> vecNc2 <- Vectorize(getNc2)
> vecNc2(c(5,0), c(4,5))
[1] 40  0    

Remember, the vectorized vecNc2 method works perfectly fine on scalar inputs too. For example:

> vecNc2(5, 4)
[1] 40    

By default Vectorize tries to vectorize all the arguments of the supplied method, which may not work sometimes. For example, the earlier example of get.edge in igraph:

> vecGE <- Vectorize(get.edge)
> vecGE(cg, c(13,15,16))
Error in function (graph, id)  : Not a graph object    

Here the get.edge method is accepting two inputs, one the graph object itself and the second the edge id - and we want to vectorize only the second argument, namely the id argument and not the graph object. For this, we would have to use the vectorize.args facility of Vectorize method as shown below:

> vecGE <- Vectorize(get.edge, vectorize.args="id")
> vecGE(cg, c(13,15,16))
     [,1] [,2] [,3]
[1,]    4    1    2
[2,]    6    7    7    

This is a simplified output of the Vectorize method. We can use the SIMPLIFY facility of Vectorize method to get the original results unaltered. For example:

> vecGE <- Vectorize(get.edge, vectorize.args="id", SIMPLIFY=FALSE)
> vecGE(cg, c(13,15,16))
[1] 4 6

[1] 1 7

[1] 2 7    

Note, this is same as the lapply output.

> lapply(c(13,15,16), function(id) get.edge(cg, id))
[1] 4 6

[1] 1 7

[1] 2 7    

We will be using this Vectorize method to convert edge lists of iGraph to edge data-frame for GePhi.

iGraph to GePhi GExF format Conversion

GePhi has interesting visualization capabilities built around graphs - and igraph is one of the widely used graph processing libraries for R. But there is no straightway to combine these two at present in R. The gexf format used by GePhi is not currently supported by the igraph package.

GePhi has its own package, the rgexf package for R, that provides some support for creating GePhi styled graphs out of custom matrix/dataframe data. It would have been great if it accepted an igraph object straight out-of-the box and generated a gexf object from it. But the current version does not support it. However, rolling out your own exporter is not that difficult. Here is something to get you started.

First, ensure you have the rgexf package for R.

# install and load
install.packages("rgexf", dependencies=TRUE)

Once you are ready with rgexf, you can create an igraph object and use write.gexf method to create gexf styles xml data out of it.

The write.gexf method needs two basic parameters: 1. the node data, 2. the edge data.

Node data is a data frame with two columns: 1: id and 2: label.

Edge data is also a data frame with two columns 1: source vertex and 2: target vertex.

You can create all these data-frames using the below shown simple method:

    cg1 <- erdos.renyi.game(10, 0.8)
    nodes <- data.frame(cbind(V(cg1), as.character(V(cg1))))
    edges <- t(Vectorize(get.edge, vectorize.args='id')(cg1, 1:ecount(cg1)))
    write.gexf(nodes, edges)    

This piece of code will generate xml formatted in gexf ready for GePhi. You can further add node attributes and edge weights in similar fashion. But, there are couple of points to be taken care though, such as xml & problem.

For some reason, write.gexf fails to properly handle & symbols in the attribute vectors, so you would have to convert the & symbol to &#038; before sending it to write.gexf.

Also note that write.gexf writes to console output and does not save to file. You would have to use print method for that. This is somewhat confusing naming convension !!

Anyhow, here is the reusable function saveAsGEXF that accepts an igraph object and a destination file path, and converts the igraph object to GePhi format and saves it to that path.

# Converts the given igraph object to GEXF format and saves it at the given filepath location
#     g: input igraph object to be converted to gexf format
#     filepath: file location where the output gexf file should be saved
saveAsGEXF = function(g, filepath="converted_graph.gexf")
  # gexf nodes require two column data frame (id, label)
  # check if the input vertices has label already present
  # if not, just have the ids themselves as the label
    V(g)$label <- as.character(V(g))
  # similarily if edges does not have weight, add default 1 weight
    E(g)$weight <- rep.int(1, ecount(g))
  nodes <- data.frame(cbind(V(g), V(g)$label))
  edges <- t(Vectorize(get.edge, vectorize.args='id')(g, 1:ecount(g)))
  # combine all node attributes into a matrix (and take care of & for xml)
  vAttrNames <- setdiff(list.vertex.attributes(g), "label") 
  nodesAtt <- data.frame(sapply(vAttrNames, function(attr) sub("&", "&",get.vertex.attribute(g, attr))))
  # combine all edge attributes into a matrix (and take care of & for xml)
  eAttrNames <- setdiff(list.edge.attributes(g), "weight") 
  edgesAtt <- data.frame(sapply(eAttrNames, function(attr) sub("&", "&",get.edge.attribute(g, attr))))
  # combine all graph attributes into a meta-data
  graphAtt <- sapply(list.graph.attributes(g), function(attr) sub("&", "&",get.graph.attribute(g, attr)))
  # generate the gexf object
  output <- write.gexf(nodes, edges, 
                       edgesAtt = edgesAtt,
                       nodesAtt = nodesAtt,
                       meta=c(list(creator="Gopalakrishna Palem", description="igraph -> gexf converted file", keywords="igraph, gexf, R, rgexf"), graphAtt))
  print(output, filepath, replace=T)


Here is the iGraph to GePhi conversion routine downloadable: iGraph_GexF_Exporter.R

Use it as shown below on any igraph object:

> cg1 <- erdos.renyi.game(5, 0.4)
> saveAsGEXF(cg1, "output.gexf")    

The generated gexf file then can be loaded into GePhi for further visualiaztion and analysis.



Homepage    Other Articles