Plotting results from GOstats analysis

The output from the previous analysis using GOstats can be plotted with ggplot2.

First, combine the p-value, count and log(odds) information for each class of SNP effect (highly, moderately and low disruptive). This assumes that the steps outlined in 'GOstats analysis' and 'Running snpEff' have already been performed and have produced a list object called 'plot.data'. The code below builds data frames for the high and moderate classes:

 df.high               <- data.frame(plot.data[[1]][[1]]$NAME, plot.data[[1]][[1]]$value, plot.data[[2]][[1]]$value, plot.data[[3]][[1]]$value)
 df.moderate           <- data.frame(plot.data[[1]][[2]]$NAME, plot.data[[1]][[2]]$value, plot.data[[2]][[2]]$value, plot.data[[3]][[2]]$value)
 colnames(df.high)     <- c("NAME", "Pvalue", "logodds", "Count")
 colnames(df.moderate) <- c("NAME", "Pvalue", "logodds", "Count")
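The two near-identical data.frame() calls above could also be wrapped in a small helper. The sketch below is only an illustration: make_go_df is a hypothetical name, and it assumes that plot.data[[1]], plot.data[[2]] and plot.data[[3]] hold the p-values, log(odds) and counts respectively (matching the column names assigned above), with the second index selecting the effect class.

```r
# Hypothetical helper (not part of GOstats): build one plotting data frame.
# Assumes plot.data[[k]][[effect]] is a data frame with NAME and value columns,
# where k = 1, 2, 3 selects p-value, log(odds) and count, in that order.
make_go_df <- function(plot.data, effect) {
  data.frame(
    NAME    = plot.data[[1]][[effect]]$NAME,
    Pvalue  = plot.data[[1]][[effect]]$value,
    logodds = plot.data[[2]][[effect]]$value,
    Count   = plot.data[[3]][[effect]]$value,
    stringsAsFactors = FALSE
  )
}
```

Under those assumptions, make_go_df(plot.data, 1) and make_go_df(plot.data, 2) would produce the same df.high and df.moderate as the code above.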

This creates two data frames for plotting: one containing the GO terms associated with highly disruptive SNPs and the other containing the GO terms associated with moderately disruptive SNPs. GO term descriptors are often too long to fit neatly on a plot axis, so first wrap them using the "stringr" package:

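 library(stringr) #Run install.packages("stringr") first if it is not installed
 df.high.wrapped_names     <- str_wrap(df.high$NAME, width = 50)
 df.moderate.wrapped_names <- str_wrap(df.moderate$NAME, width = 50)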
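To see what the wrapping step does, here is a minimal sketch on a single string. The descriptor is made up for illustration and is not taken from any GOstats output.

```r
library(stringr)

# Made-up descriptor, used only to illustrate the wrapping behaviour.
long_name <- "regulation of a hypothetical biological process involved in response to an external stimulus"
wrapped   <- str_wrap(long_name, width = 50)

# str_wrap() inserts newline characters at word boundaries;
# cat() prints the result as separate lines.
cat(wrapped, "\n")
```

When such a string is used as an axis label, ggplot2 draws each newline as a line break, which keeps long descriptors from running into each other.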

Then create a plot using ggplot2:

 library(ggplot2) #Run install.packages("ggplot2") first if it is not installed
 #Chain all layers into a single plot object, then print it
 p <- ggplot(data=df.moderate, aes(x=df.moderate.wrapped_names, y=log(logodds), alpha=Pvalue, size=Count)) +
   geom_jitter(aes(colour=factor(df.moderate.wrapped_names))) +
   scale_alpha_continuous(trans="reverse") +
   xlab("Gene ontology term descriptor") +
   ylab("Log(odds ratio)") +
   theme(axis.text.x = element_text(angle = 90, hjust = 1, size=5)) +
   ggtitle("Moderately disruptive SNPs") +
   scale_colour_discrete(guide="none")
 p
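The same plot can be produced for the highly disruptive SNPs by reusing the code with df.high. One way to avoid duplicating it is a small function, sketched below. plot_go_terms is a hypothetical name, and the function assumes its input has the NAME, Pvalue, logodds and Count columns created earlier. It wraps the names inside the function instead of relying on a separate vector.

```r
library(ggplot2)
library(stringr)

# Hypothetical helper: df must contain NAME, Pvalue, logodds and Count columns.
plot_go_terms <- function(df, title, wrap_width = 50) {
  # Store the wrapped descriptors as a column so aes() can find them in df.
  df$wrapped_name <- str_wrap(df$NAME, width = wrap_width)
  ggplot(df, aes(x = wrapped_name, y = log(logodds), alpha = Pvalue, size = Count)) +
    geom_jitter(aes(colour = factor(wrapped_name))) +
    scale_alpha_continuous(trans = "reverse") +
    scale_colour_discrete(guide = "none") +
    xlab("Gene ontology term descriptor") +
    ylab("Log(odds ratio)") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 5)) +
    ggtitle(title)
}
```

With this in place, plot_go_terms(df.high, "Highly disruptive SNPs") and plot_go_terms(df.moderate, "Moderately disruptive SNPs") would each return a plot object that can be printed or passed to ggsave().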
