Title: | RNAseq Visualization Automation |
---|---|
Description: | Automate downstream visualization and pathway analysis in RNAseq analysis. 'RVA' is a collection of functions that efficiently visualize RNAseq differential expression analysis results from summary statistics tables. It also utilizes Fisher's exact test to evaluate gene set or pathway enrichment in a convenient and efficient manner. |
Authors: | Xingpeng Li [aut, cre] |
Maintainer: | Xingpeng Li <[email protected]> |
License: | GPL-2 |
Version: | 0.0.4 |
Built: | 2025-03-09 06:16:54 UTC |
Source: | https://github.com/thermostats/rva |
This is data to be included in package
c2BroadSets
GeneSetCollection
GeneSetCollection from BroadCollection
Calculate pathway scores
cal.pathway.scores( data, pathway.db, gene.id.type, FCflag, FDRflag, FC.cutoff, FDR.cutoff, OUT.Directional = NULL, IS.list = FALSE, customized.pathways, ... )
data |
A summary statistics table (data.frame) or |
pathway.db |
pathway database used |
gene.id.type |
gene.id.type |
FCflag |
The column name (character) of fold change information, assuming the FC is log2 transformed. Default = "logFC". |
FDRflag |
The column name (character) of adjusted p value or FDR. Default = "adj.P.Val". |
FC.cutoff |
The fold change cutoff (numeric) selected to subset summary statistics table. Default = 1.5. |
FDR.cutoff |
The FDR cutoff selected (numeric) to subset summary statistics table. Default = 0.05. |
OUT.Directional |
Logical; whether to output directional or non-directional pathway analysis results. Default = NULL. |
IS.list |
Logical; whether the input is a list. Default = FALSE. |
customized.pathways |
The customized pathways, as a two-column dataframe, to be used in the analysis. |
... |
Additional parameters passed on. |
Returns a dataframe.
Xingpeng Li & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
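The package description says enrichment is evaluated with Fisher's exact test. The following is a minimal base R sketch of that idea for a single pathway, not RVA's implementation; the gene universe, the DE gene list, and the pathway membership are all hypothetical.

```r
# Hypothetical inputs: none of these objects come from RVA.
universe <- paste0("g", 1:100)             # all genes tested
de_genes <- paste0("g", 1:20)              # genes passing the FC and FDR cutoffs
pathway  <- paste0("g", c(1:10, 91:95))    # members of one pathway

in_path <- universe %in% pathway
is_de   <- universe %in% de_genes

# 2 x 2 contingency table: DE status by pathway membership
tab <- table(DE = is_de, InPathway = in_path)

# One-sided test: are DE genes over-represented in the pathway?
ft <- fisher.test(tab, alternative = "greater")
ft$p.value
```

With many pathways, the resulting p-values would normally be adjusted for multiple testing (for example with p.adjust) before a cutoff is applied.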
This function calculates the change from baseline.
calc.cfb(data, annot, baseline.flag, baseline.val)
data |
Dataframe with subject id, annotation flag, gene id and cpm value (from count tables) columns. |
annot |
A long-format dataframe with any pertinent treatment data about
the samples. The only required column is one titled the |
baseline.flag |
A character vector of column names. These columns in |
baseline.val |
A character vector of values. This vector must be the
same length as |
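As an illustration of the change-from-baseline idea, here is a small base R sketch. It assumes the change is the difference between each value and the same subject's baseline value; RVA may work on a different scale or aggregate differently, and the column names below are hypothetical.

```r
# Hypothetical long-format data; column names are illustrative only.
dat <- data.frame(
  subject = c("s1", "s1", "s2", "s2"),
  day     = c("0", "7", "0", "7"),
  gene    = "geneA",
  cpm     = c(10, 14, 8, 5),
  stringsAsFactors = FALSE
)
baseline.flag <- "day"
baseline.val  <- "0"

# Baseline rows: those whose flag column equals the baseline value
base <- dat[dat[[baseline.flag]] == baseline.val, c("subject", "gene", "cpm")]
names(base)[names(base) == "cpm"] <- "baseline"

# Attach each subject's baseline and subtract it
out <- merge(dat, base, by = c("subject", "gene"))
out$cfb <- out$cpm - out$baseline
out[order(out$subject, out$day), ]
```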
This is data to be included in package
count_table
An example count table where row names are gene ID, each column is a sample
count table
...
Download gene database for enrichment.
dlPathwaysDB(pathway.db, customized.pathways = NULL, ...)
pathway.db |
The database to be used for enrichment analysis. Can be one of the following: "rWikiPathways", "KEGG", "REACTOME", "Hallmark", "rWikiPathways_aug_2020". |
customized.pathways |
The user-provided pathways added for analysis. |
... |
Additional parameters passed on. |
Returns a dataframe.
Xingpeng Li & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This function creates the color gradient for the cpm data.
get.cpm.colors(data)
data |
The CPM dataset. |
This function processes the dataframe from the plot_cutoff_single function and produces a ggplot object which depicts the number of differentially expressed genes at different FDR and fold change cutoffs.
get.cutoff.df(datin, pvalues, FCs, FCflag = "logFC", FDRflag = "adj.P.Val")
datin |
Dataframe from plot_cutoff_single. |
pvalues |
A set of p-values for FDR cutoff to be checked. |
FCs |
A set of fold change cutoffs to be checked. |
FCflag |
The column name of the log2FC in the summary statistics table. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
This function processes the dataframe from the plot_cutoff_single function and produces a ggplot object which depicts the number of differentially expressed genes at different FDR and fold change cutoffs.
get.cutoff.ggplot(df, FCflag, FDRflag)
df |
Dataframe from plot_cutoff_single. |
FCflag |
The column name of the log2FC in the summary statistics table. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, to produce an interactive visual object which depicts the number of differentially expressed genes at different FDR and fold change cutoffs.
make.cutoff.plotly(df)
df |
Summary statistics table from limma or DESeq2, where each row is a gene. |
Multi plot is for directional and non-directional plots
multiPlot(allID, backup.d.sig, nd.res, ...)
allID |
A vector of all pathway IDs from the directional and non-directional enriched datasets. |
backup.d.sig |
A dataframe with directional pathway data prior to any cutoffs being applied. |
nd.res |
A dataframe with non-directional pathway data prior to any cutoffs being applied. |
... |
Additional parameters passed on. |
Multi plot is for directional and non-directional plots, when one of the plots doesn't contain observations.
Returns ggplot.
Xingpeng Li & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
The function takes in a boolean value and a numeric value, which it uses to decide what to output.
nullreturn(IS.list, type = 1)
IS.list |
Indicator of whether the data frame being input is list or not. |
type |
If type = 1 (default), return the directional null plot. If type = 2, return the non-directional null plot. |
nullreturn returns NULL for single data frame inputs that do not pass the threshold values, and an empty data frame for list inputs that do not satisfy the cutoffs.
The function returns either a data frame or the value NULL.
Xingpeng Li & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, to evaluate the number of differentially expressed genes at different FDR and fold change cutoffs.
plot_cutoff( data = data, comp.names = NULL, FCflag = "logFC", FDRflag = "adj.P.Val", FCmin = 1.2, FCmax = 2, FCstep = 0.1, p.min = 0, p.max = 0.2, p.step = 0.01, plot.save.to = NULL, gen.3d.plot = TRUE, gen.plot = TRUE )
data |
Summary statistics table or a list of summary statistics tables from limma or DESeq2, where each row is a gene. |
comp.names |
A character vector that contains the comparison names which correspond to the same order as |
FCflag |
The column name of the log2FC in the summary statistics table. Default = "logFC". |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. Default = "adj.P.Val". |
FCmin |
The minimum starting fold change cutoff to be checked, so the minimum fold change cutoff to be evaluated will be FCmin + FCstep. Default = 1.2. |
FCmax |
The maximum fold change cutoff to be checked, default = 2. |
FCstep |
The step size between the minimum and maximum fold change cutoffs. Default = 0.1. |
p.min |
The minimum starting FDR cutoff to be checked, so the minimum FDR cutoff to be evaluated will be p.min + p.step. Default = 0. |
p.max |
The maximum FDR cutoff to be checked, default = 0.2. |
p.step |
The step size between the minimum and maximum FDR cutoffs. Default = 0.01. |
plot.save.to |
The address where to save the plot from simplified cutoff combination with FDR of 0.01, 0.05, 0.1, and 0.2. |
gen.3d.plot |
Whether to generate a 3d plotly object to visualize the result; only applies to single dataframe input. Default = TRUE. |
gen.plot |
Whether to generate a plot to visualize the result. Default = TRUE. |
The function takes the summary statistics and returns a list which contains 3 objects: a table which describes the number of DE genes for different cutoff combinations of FDR and fold change, a ggplot object which depicts a simplified version of the cutoff combinations, and a plotly 3d visualization object which depicts the cutoff combinations at high resolution. The default range of the fold change is from 1.2 to 2, and of the FDR from 0 to 0.2, with a step of 0.1 for FC and 0.01 for FDR.
If the input data is a data list, then a multi-facet ggplot object which contains each of the summary statistics tables will be returned; otherwise, if the input data is a data frame, then the function will return a list which contains 3 elements:
df.sub |
A dataframe which contains the number of genes (3rd column) for each combination of FDR (1st column) and fold change (2nd column) cutoffs |
plot3d |
A plotly object showing a 3d illustration of all possible cutoff selections and the number of DE genes as a 3d surface |
gp |
A ggplot object to show the simplified cutoff combination result |
Xingpeng Li & Olya Besedina, RVA - RNAseq Visualization Automation tool.
plot_cutoff(Sample_summary_statistics_table)
plot_cutoff(data = list(Sample_summary_statistics_table, Sample_summary_statistics_table1), comp.names = c("A", "B"))
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, and produces a table which contains gene counts for each p-value and FC combination.
plot_cutoff_single(datin, FCflag, FDRflag, FCs, pvalues)
datin |
Summary statistics table from limma or DESeq2, where each row is a gene. |
FCflag |
The column name of the log2FC in the summary statistics table. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
FCs |
A set of fold change cutoffs to be checked. |
pvalues |
A set of p-values for FDR cutoff to be checked. |
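The counting step described here can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: it assumes the fold change cutoffs are on the linear scale and are compared against the absolute log2 fold change, and the output column names are hypothetical.

```r
# Hypothetical summary statistics; column names follow the limma defaults.
datin <- data.frame(
  logFC     = c(2.0, -1.5, 0.3, 1.1, -0.2),
  adj.P.Val = c(0.001, 0.03, 0.50, 0.08, 0.90)
)
FCs     <- c(1.5, 2)      # fold change cutoffs (assumed linear scale)
pvalues <- c(0.05, 0.1)   # FDR cutoffs

# One row per (FDR cutoff, FC cutoff) combination
grid <- expand.grid(pvalue = pvalues, FC = FCs)

# Count genes passing both cutoffs for each combination
grid$Number_of_Genes <- mapply(function(p, fc) {
  sum(abs(datin$logFC) > log2(fc) & datin$adj.P.Val < p)
}, grid$pvalue, grid$FC)
grid
```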
This is the function to process the gene count table to show gene expression variations over time or across groups.
plot_gene( data = ~dat, anno = ~meta, gene.names = c("AAAS", "A2ML1", "AADACL3"), ct.table.id.type = "ENSEMBL", gene.id.type = "SYMBOL", treatment = "Treatment", sample.id = "sample_id", time = "day", log.option = TRUE, plot.save.to = NULL, input.type = "count" )
data |
Count table in the format of dataframe with gene id as row.names. |
anno |
Annotation table that provides design information. |
gene.names |
Genes to be visualized, in the format of character vector. |
ct.table.id.type |
The gene id format in |
gene.id.type |
The gene id format of |
treatment |
The column name to specify treatment groups. |
sample.id |
The column name to specify sample IDs. |
time |
The column name to specify different time points. |
log.option |
Logical option, whether to log2 transform the CPM as y-axis. Default = TRUE. |
plot.save.to |
The address where to save the plot. |
input.type |
One of |
The function takes the gene counts and returns a ggplot that shows gene expression variation over time or group.
The function returns a ggplot object.
Xingpeng Li, Tatiana Gelaf Romer & Aliyah Olaniyan, RVA - RNAseq Visualization Automation tool.
plot_gene(data = count_table, anno = sample_annotation)
An alias for plot_heatmap.expr(annot, cpm, fill = "CFB", ...).
plot_heatmap.cfb(cpm, annot, title = "RVA CFB Heatmap", ...)
cpm |
cpm data |
annot |
A long-format dataframe with any pertinent treatment data about
the samples. The only required column is one titled the |
title |
A title for the heatmap. Default = "RVA CFB Heatmap". |
... |
Additional parameters passed on. |
An alias for plot_heatmap.expr(annot, cpm, fill = "CPM", ...).
plot_heatmap.cpm(cpm, annot, title = "RVA CPM Heatmap", ...)
cpm |
cpm data |
annot |
A long-format dataframe with any pertinent treatment data about
the samples. The only required column is one titled the |
title |
A title for the heatmap. Default = "RVA CPM Heatmap". |
... |
Additional parameters passed on. |
Create a heatmap with either CFB or CPM averaged across individual samples.
plot_heatmap.expr( data = ~count, annot = ~meta, sample.id = "sample_id", annot.flags = c("day", "Treatment", "tissue"), ct.table.id.type = "ENSEMBL", gene.id.type = "SYMBOL", gene.names = NULL, gene.count = 10, title = "RVA Heatmap", fill = "CFB", baseline.flag = "day", baseline.val = "0", plot.save.to = NULL, input.type = "count" )
data |
A wide-format dataframe with geneid rownames, sample column
names, and fill data matching |
annot |
A long-format dataframe with any pertinent treatment data about
the samples. The only required column is one titled the |
sample.id |
The column name to specify sample ID. |
annot.flags |
A vector of column names corresponding to column names
in |
ct.table.id.type |
The gene id format in |
gene.id.type |
The gene id format of |
gene.names |
A character vector or list of ensembl IDs for which to
display gene information. If |
gene.count |
The number of genes to include, where genes are selected
based on ranking by values in |
title |
A title for the heatmap. Default = "RVA Heatmap". |
fill |
One of |
baseline.flag |
A character vector of column names. If |
baseline.val |
A character vector of values. This vector must be the
same length as |
plot.save.to |
The address to save the heatmap plot. |
input.type |
One of |
The function takes raw CPM data and returns both a list containing a data frame with values based on the fill parameter and a heatmap plot.
The function returns a list with 2 items:
df.sub |
A data frame of change from baseline values (fill = CFB in this example) for each gene id, divided by combination of treatment group and time point |
gp |
A Heatmap object from ComplexHeatmap which can be plotted |
Xingpeng Li, Tatiana Gelaf Romer & Aliyah Olaniyan, RVA - RNAseq Visualization Automation tool.
plot <- plot_heatmap.expr(data = count_table[, 1:20], annot = sample_annotation[1:20, ])
This function performs pathway enrichment analysis (and visualization) with rWikiPathways (also KEGG, REACTOME and Hallmark) from a summary statistics table generated by differential expression analysis such as limma or DESeq2.
plot_pathway( data = ~df, comp.names = NULL, gene.id.type = "ENSEMBL", FC.cutoff = 1.2, FDR.cutoff = 0.05, FCflag = "logFC", FDRflag = "adj.P.Val", Fisher.cutoff = 0.1, Fisher.up.cutoff = 0.1, Fisher.down.cutoff = 0.1, plot.save.to = NULL, pathway.db = "rWikiPathways", customized.pathways = NULL, ... )
data |
A summary statistics table (data.frame) or |
comp.names |
A character vector containing the comparison names corresponding to the same order of the |
gene.id.type |
The gene id format in |
FC.cutoff |
The fold change cutoff (numeric) selected to subset summary statistics table. Default = 1.2. |
FDR.cutoff |
The FDR cutoff selected (numeric) to subset summary statistics table. Default = 0.05. |
FCflag |
The column name (character) of fold change information, assuming the FC is log2 transformed. Default = "logFC". |
FDRflag |
The column name (character) of adjusted p value or FDR. Default = "adj.P.Val". |
Fisher.cutoff |
The FDR cutoff selected (numeric) for the pathway enrichment analysis' Fisher's exact test with all determined
Differentially Expressed (DE) genes by |
Fisher.up.cutoff |
The FDR cutoff selected (numeric) for the pathway enrichment analysis' Fisher's exact test with the upregulated gene set. |
Fisher.down.cutoff |
The FDR cutoff selected (numeric) for the pathway enrichment analysis' Fisher's exact test with the downregulated gene set. |
plot.save.to |
The address where to save the plot. |
pathway.db |
The database to be used for enrichment analysis. Can be one of the following: "rWikiPathways", "KEGG", "REACTOME", "Hallmark", "rWikiPathways_aug_2020". |
customized.pathways |
the customized pathways in the format of two column dataframe (column name as "gs_name" and "entrez_gene") to be used in analysis. |
... |
Additional parameters passed on. |
The function takes the summary statistics table and uses the user-selected parameters based on check.cutoff to perform pathway enrichment analysis.
The function returns a list of 5 objects:
1 |
result table from directional pathway enrichment analysis |
2 |
result table from non-directional pathway enrichment analysis |
3 |
plot from directional pathway enrichment analysis |
4 |
plot from non-directional pathway enrichment analysis |
5 |
plot combining both directional and non-directional plot |
Xingpeng Li & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
result <- plot_pathway(data = Sample_summary_statistics_table, gene.id.type = "ENSEMBL", FC.cutoff = 1.5, FDR.cutoff = 0.05, pathway.db = "rWikiPathways_aug_2020")
This function generates a QQ-plot object with a confidence interval from a summary statistics table generated by differential expression analysis such as limma or DESeq2.
plot_qq( data = data, comp.names = NULL, p.value.flag = "P.Value", ci = 0.95, plot.save.to = NULL )
data |
Summary statistics table or a list that contains multiple summary statistics tables from limma or DESeq2, where each row is a gene. |
comp.names |
A character vector that contains the comparison names which correspond to the same order as |
p.value.flag |
The column name of |
ci |
Confidence interval. Default = 0.95 |
plot.save.to |
The file name and the address where to save the qq-plot "~/address_to_folder/qqplot.png". Default = NULL. |
The function produces a QQ-plot to evaluate the result of a differential expression analysis. The output is a ggplot object.
The function returns a ggplot object of the QQ-plot.
Xingpeng Li & Tatiana Gelaf Romer & Olya Besedina, RVA - RNAseq Visualization Automation tool.
plot_qq(data = Sample_summary_statistics_table)
plot_qq(data = list(Sample_summary_statistics_table, Sample_summary_statistics_table1), comp.names = c("A", "B"))
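For readers who want to see what a QQ-plot with a confidence band involves, here is a base R sketch of one common construction. It is an assumption that plot_qq works this way; the sketch uses the fact that the i-th smallest of n uniform p-values follows a Beta(i, n - i + 1) distribution, and the p-values are simulated.

```r
set.seed(1)
p  <- runif(1000)   # hypothetical p-values, uniform under the null
n  <- length(p)
ci <- 0.95

# Observed and expected quantiles on the -log10 scale
observed <- -log10(sort(p))
expected <- -log10(ppoints(n))

# Pointwise band from the Beta distribution of uniform order statistics
i     <- seq_len(n)
lower <- -log10(qbeta((1 + ci) / 2, i, n - i + 1))
upper <- -log10(qbeta((1 - ci) / 2, i, n - i + 1))

qq <- data.frame(expected, observed, lower, upper)
head(qq)
```

Points falling well outside the band at the small p-value end suggest more signal than expected by chance.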
This function processes the summary statistics table generated by differential expression analysis, such as limma or DESeq2, to show it as a volcano plot, with the option to highlight a gene set (such as disease-related genes from a Disease vs Healthy comparison).
plot_volcano( data = data, comp.names = NULL, geneset = NULL, geneset.FCflag = "logFC", highlight.1 = NULL, highlight.2 = NULL, upcolor = "#FF0000", downcolor = "#0000FF", plot.save.to = NULL, xlim = c(-4, 4), ylim = c(0, 12), FCflag = "logFC", FDRflag = "adj.P.Val", highlight.FC.cutoff = 1.5, highlight.FDR.cutoff = 0.05, title = "Volcano plot", xlab = "log2 Fold Change", ylab = "log10(FDR)" )
data |
Summary statistics table or a list containing multiple summary statistics tables from limma or DESeq2, where each row is a gene. |
comp.names |
A character vector that contains the comparison names which correspond to the same order as |
geneset |
A summary statistics table that contains the genes to be highlighted; the gene name format (in row names) needs to be consistent with the main summary statistics table. For example, this could be the output summary statistics table from the Disease vs Healthy comparison (containing only the subsetted significant genes to be highlighted). |
geneset.FCflag |
The column name of fold change in |
highlight.1 |
Genes to be highlighted, as a vector of gene names. The gene name format needs to be consistent with the main summary statistics table. |
highlight.2 |
Genes to be highlighted, as a vector of gene names. The gene name format needs to be consistent with the main summary statistics table. |
upcolor |
The color of the gene names in |
downcolor |
The color of the gene names in |
plot.save.to |
The file name and address where to save the volcano plot, e.g. "~/address_to_folder/volcano_plot.png". |
xlim |
Range of x axis. Default = |
ylim |
Range of y axis. Default = |
FCflag |
Column name of log2FC in the summary statistics table. Default = "logFC". |
FDRflag |
Column name of FDR in the summary statistics table. Default = "adj.P.Val". |
highlight.FC.cutoff |
The fold change cutoff line to be shown on the plot. Default = 1.5. |
highlight.FDR.cutoff |
The FDR cutoff shading to be shown on the plot. Default = 0.05. |
title |
The plot title. Default "Volcano plot". |
xlab |
The label for x-axis. Default "log2 Fold Change". |
ylab |
The label for y-axis. Default "log10(FDR)". |
The function takes the summary statistics table and returns a ggplot, with the option to highlight genes, e.g. disease signature genes, the genes which are up-regulated and down-regulated in diseased subjects.
The function returns a volcano plot as a ggplot object.
Xingpeng Li & Tatiana Gelaf Romer & Olya Besedina, RVA - RNAseq Visualization Automation tool.
plot_volcano(data = Sample_summary_statistics_table, geneset = Sample_disease_gene_set)
plot_volcano(data = list(Sample_summary_statistics_table, Sample_summary_statistics_table1), comp.names = c("A", "B"), geneset = Sample_disease_gene_set)
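To make the volcano plot coordinates and cutoffs concrete, the sketch below computes them in base R. It is an illustration only: it assumes the fold change cutoff is given on the linear scale and applied to the log2 fold change, and the status labels and the neglog10FDR column are hypothetical names.

```r
# Hypothetical summary statistics with limma-style column names
dat <- data.frame(
  logFC     = c(2.1, -1.8, 0.2, 0.9),
  adj.P.Val = c(0.001, 0.02, 0.6, 0.04),
  row.names = c("g1", "g2", "g3", "g4")
)
fc_cut  <- log2(1.5)   # assumed: linear cutoff converted to log2 scale
fdr_cut <- 0.05

# y-axis of the volcano plot
dat$neglog10FDR <- -log10(dat$adj.P.Val)

# Classify each gene relative to the two cutoffs
dat$status <- "not significant"
dat$status[dat$adj.P.Val < fdr_cut & dat$logFC >  fc_cut] <- "up"
dat$status[dat$adj.P.Val < fdr_cut & dat$logFC < -fc_cut] <- "down"
dat
```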
Special cases where list input and at least one treatment has signal but others don't.
prettyGraphs(vizdf, ...)
vizdf |
A dataframe of enriched pathways. |
... |
Additional parameters passed on. |
prettyGraphs is meant for cases where one of the input treatments meets the cutoff but one or more of the other treatments do not. This ensures that ggplot does not throw any errors.
Returns a dataframe.
Xingpeng Li & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, and produces a message about the p-values and fold changes used.
produce.cutoff.message( data, FCmin, FCmax, FCstep, FDRflag, p.min, p.max, p.step )
data |
Summary statistics table from limma or DESeq2, where each row is a gene. |
FCmin |
The minimum starting fold change cutoff to be checked, so the minimum fold change cutoff to be evaluated will be FCmin + FCstep, FCmin default = 1. |
FCmax |
The maximum fold change cutoff to be checked, default = 2. |
FCstep |
The step from the minimum to maximum fold change cutoff, one step increase at a time, default = 0.01. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
p.min |
The minimum starting FDR cutoff to be checked, so the minimum FDR cutoff to be evaluated will be p.min + p.step. |
p.max |
The maximum FDR cutoff to be checked, default = 0.2. |
p.step |
The step size between the minimum and maximum FDR cutoffs. |
This function processes a summary statistics table generated by differential expression analysis, such as limma or DESeq2, and produces a warning about the minimum p-value or FDR.
produce.cutoff.warning(data, FDRflag)
data |
Summary statistics table from limma or DESeq2, where each row is a gene. |
FDRflag |
The column name of the False Discovery Rate (FDR) in the summary statistics table. |
This is the function to exclude the version number from the input ensembl type gene ids.
reformat.ensembl(logcpm, ct.table.id.type)
logcpm |
The input count table transformed into log counts per million. |
ct.table.id.type |
The gene id format in |
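Removing the version number from Ensembl gene IDs can be illustrated with a one-line regular expression in base R. This is a sketch of the idea, not the function's actual code, and the IDs are examples.

```r
# Example Ensembl gene IDs, two with a version suffix and one without
ids <- c("ENSG00000141510.17", "ENSG00000012048.23", "ENSG00000139618")

# Drop a trailing ".<digits>" version suffix, if present
stripped <- sub("\\.[0-9]+$", "", ids)
stripped
```

For a count table, the same substitution would be applied to the row names.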
This is data to be included in package
sample_annotation
Sample annotation document
sample name
tissue for comparison
subject id
time points
...
This is data to be included in package
sample_count_cpm
An example cpm table where row names are gene ID, each column is a sample
count cpm table
...
This is data to be included in package
Sample_disease_gene_set
An example disease gene set from a summary statistics table, as a dataframe whose row names are gene IDs. The summary statistics can be calculated from a disease vs healthy comparison, as in this example.
log2 fold change from comparison
Average expression for this gene
p value
adjusted p value or FDR
...
This is data to be included in package
Sample_summary_statistics_table
An example summary statistics table as dataframe, row names are gene ID
log2 fold change from comparison
Average expression for this gene
p value
adjusted p value or FDR
...
This is data to be included in package
Sample_summary_statistics_table1
Second example summary statistics table as dataframe, row names are gene ID
log2 fold change from comparison
Average expression for this gene
p value
adjusted p value or FDR
...
The function takes in a list of dataframe, comp names and a specified type, to output a dataframe styled for ggplot.
secondCutoffErr(df, comp.names, TypeQ = 1)
df |
A list of dataframes. |
comp.names |
a character vector contain the comparison names corresponding to the same order to the |
TypeQ |
If type = 1 (default), return the directional null plot. If type = 2, return the non-directional null plot. |
secondCutoffErr is meant to be used for list inputs. It handles cases where, after applying the filter to the data, one of the comparison IDs is left out, which adversely affects the ggplot.
Returns a dataframe.
Xingpeng Li & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This is the function to transform the input gene id type to another gene id type.
## S3 method for class 'geneid'
transform(gene.names, from = ~gene.id.type, to = ~ct.table.id.type)
gene.names |
Genes, in the format of a character vector, to be transformed. |
from |
The gene id format of |
to |
The new gene id format should be one of: ACCNUM, ALIAS, ENSEMBL, ENSEMBLPROT, ENSEMBLTRANS, ENTREZID, ENZYME, EVIDENCE, EVIDENCEALL, GENENAME, GO, GOALL, IPI, MAP, OMIM, ONTOLOGY, ONTOLOGYALL, PATH, PFAM, PMID, PROSITE, REFSEQ, SYMBOL, UCSCKG, UNIGENE, UNIPROT. |
Ensure that an annotation has all of the required columns.
validate.annot( data, annot, annot.flags, sample.id, fill = "CPM", baseline.flag = NULL, baseline.val = NULL )
data |
The input count data. |
annot |
The annotation dataframe. |
annot.flags |
The vector of annotation flags passed by the user. |
sample.id |
Sample id label to check if in annot. |
fill |
The fill value indicated by the user, "count" or "CPM". |
baseline.flag |
The baseline.flag passed by the user. |
baseline.val |
The baseline value passed by the user. |
The function will check the following: the annot.flags values are columns in annot; if fill = "cfb", the baseline.flag and baseline.val parameters are valid; and sample.id is a column in annot.
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Ensures that the user-input baseline.val and baseline.flag parameters are valid with respect to the annot dataframe.
validate.baseline(annot, baseline.val, baseline.flag)
annot |
The annotation dataframe. |
baseline.val |
The baseline value passed by the user. |
baseline.flag |
The baseline.flag passed by the user. |
Specifically, validates that baseline.flag value(s) are columns in annot, and that baseline.val value(s) occur at least once in their respective baseline.flag columns.
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
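The two checks described above can be sketched in base R as follows. The helper name check_baseline and its error messages are hypothetical, and the length check assumes that baseline.val must be the same length as baseline.flag.

```r
# Hypothetical helper mirroring the checks described above
check_baseline <- function(annot, baseline.val, baseline.flag) {
  if (length(baseline.val) != length(baseline.flag)) {
    stop("baseline.val and baseline.flag must have the same length")
  }
  # Every flag must be a column of the annotation
  missing_cols <- setdiff(baseline.flag, colnames(annot))
  if (length(missing_cols) > 0) {
    stop("baseline.flag not found in annot: ",
         paste(missing_cols, collapse = ", "))
  }
  # Every value must occur at least once in its flag column
  for (i in seq_along(baseline.flag)) {
    if (!(baseline.val[i] %in% annot[[baseline.flag[i]]])) {
      stop("baseline.val '", baseline.val[i],
           "' does not occur in column '", baseline.flag[i], "'")
    }
  }
  invisible(TRUE)
}

annot <- data.frame(sample_id = c("a", "b"), day = c("0", "7"))
check_baseline(annot, baseline.val = "0", baseline.flag = "day")
```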
FCflag and FDRflag must be numeric.
validate.col.types(datin, name = 1, flags)
datin |
the summary statistics file. |
name |
summary statistics file position indicator |
flags |
|
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This function ensures that, when a list of data frames is used as input, the number of comparison names is the same as the number of data frames.
validate.comp.names(comp.names, data)
comp.names |
a character vector contain the comparison names corresponding to the same order to the |
data |
Summary statistics table (data.frame) from limma or DESeq2, where row names are gene IDs. |
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Ensures that the data input has the required formatting.
validate.data(data)
data |
The wide-format dataframe with input data. |
Specifically, checks if data has rownames and that all other columns can be coerced to numeric.
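A minimal sketch of these two checks follows. It is not the RVA implementation: `check_wide_data` is a hypothetical name, and treating default row names ("1", "2", ...) as "no row names" is an assumption made here for illustration.

```r
# Hypothetical helper (not the RVA source).
check_wide_data <- function(data) {
  # Assumption: automatic row names ("1", "2", ...) count as missing row names.
  default.names <- as.character(seq_len(nrow(data)))
  if (identical(rownames(data), default.names)) {
    stop("data has no explicit row names")
  }
  # A column is coercible if conversion introduces no new missing values.
  coercible <- vapply(data, function(col) {
    converted <- suppressWarnings(as.numeric(as.character(col)))
    !any(is.na(converted) & !is.na(col))
  }, logical(1))
  if (!all(coercible)) {
    stop("Non-numeric columns: ",
         paste(names(coercible)[!coercible], collapse = ", "))
  }
  invisible(TRUE)
}

good <- data.frame(S1 = c(1.2, 3.4), S2 = c("5", "6"),
                   row.names = c("GeneA", "GeneB"),
                   stringsAsFactors = FALSE)
good.ok <- check_wide_data(good)

# "x" cannot be converted to a number, so this input is rejected.
broken <- data.frame(S1 = c(1.2, 3.4), S2 = c("5", "x"),
                     row.names = c("GeneA", "GeneB"),
                     stringsAsFactors = FALSE)
broken.rejected <- tryCatch({
  check_wide_data(broken)
  FALSE
}, error = function(e) TRUE)
```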
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Ensures that the annotation file matches the data file with respect to sample IDs. Throws warnings if there are discrepancies.
validate.data.annot(data, annot, sample.id)
data |
input data |
annot |
annotation file |
sample.id |
sample id in the input |
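The kind of discrepancy check described here could look like the sketch below. `check_data_annot` is a hypothetical helper, and it assumes that sample IDs are the column names of `data` and the values of the `sample.id` column in `annot`; the RVA function may compare them differently.

```r
# Hypothetical helper (not the RVA source).
check_data_annot <- function(data, annot, sample.id) {
  data.ids <- colnames(data)                      # assumption: samples are columns
  annot.ids <- as.character(annot[[sample.id]])
  only.data <- setdiff(data.ids, annot.ids)
  only.annot <- setdiff(annot.ids, data.ids)
  if (length(only.data) > 0) {
    warning("Samples in data but not in annot: ",
            paste(only.data, collapse = ", "))
  }
  if (length(only.annot) > 0) {
    warning("Samples in annot but not in data: ",
            paste(only.annot, collapse = ", "))
  }
  invisible(list(only.data = only.data, only.annot = only.annot))
}

data <- data.frame(S1 = c(1, 2), S2 = c(3, 4), S3 = c(5, 6),
                   row.names = c("GeneA", "GeneB"))
annot <- data.frame(sample.id = c("S1", "S2", "S4"),
                    stringsAsFactors = FALSE)

# Warnings are suppressed here only to keep the example quiet.
mismatch <- suppressWarnings(check_data_annot(data, annot, "sample.id"))
```

Returning the mismatched IDs alongside the warnings lets a caller decide whether to drop the unmatched samples or stop.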
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This function ensures the fold change minimum, maximum, and step are valid.
validate.FC(FCmin, FCmax, FCstep)
FCmin |
The starting fold change cutoff. The smallest cutoff actually evaluated is FCmin + FCstep. Default = 1. |
FCmax |
The maximum fold change cutoff to be checked, default = 2. |
FCstep |
The step from the minimum to maximum fold change cutoff, one step increase at a time, default = 0.01. |
Specifically it checks that the FCmax is greater than the FCmin, that at least 1 FCstep can fit within the FCmax and FCmin, that FCmax and FCmin values are non-negative, and that FCstep is positive.
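The four conditions listed above can be written out as a short base R sketch. `check_fc_range` is a hypothetical helper, not the RVA source; the default values are the ones stated in the argument table.

```r
# Hypothetical helper (not the RVA source).
check_fc_range <- function(FCmin = 1, FCmax = 2, FCstep = 0.01) {
  if (FCmin < 0 || FCmax < 0) stop("FCmin and FCmax must be non-negative")
  if (FCstep <= 0) stop("FCstep must be positive")
  if (FCmax <= FCmin) stop("FCmax must be greater than FCmin")
  if (FCstep > FCmax - FCmin) stop("At least one FCstep must fit between FCmin and FCmax")
  invisible(TRUE)
}

defaults.ok <- check_fc_range()

# FCmax below FCmin is rejected.
reversed.rejected <- tryCatch({
  check_fc_range(FCmin = 2, FCmax = 1)
  FALSE
}, error = function(e) TRUE)

# With FCmin = 1, FCmax = 2, FCstep = 0.25, the first cutoff is FCmin + FCstep.
cutoffs <- seq(1 + 0.25, 2, by = 0.25)
```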
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Ensures that the value is one of Options and throws an error otherwise.
validate.flag(value, name, Options)
value |
The user-input value for the parameter |
name |
The name of the parameter to be displayed in the error |
Options |
A vector of valid values for |
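A membership check of this kind is short in base R. The sketch below uses a hypothetical helper, `check_flag`, and borrows the pathway database names listed elsewhere in this manual as the set of valid options; it assumes a single value is checked at a time.

```r
# Hypothetical helper (not the RVA source).
check_flag <- function(value, name, Options) {
  if (!(value %in% Options)) {
    stop(name, " must be one of: ", paste(Options, collapse = ", "))
  }
  invisible(value)
}

db.options <- c("rWikiPathways", "KEGG", "REACTOME", "Hallmark")

accepted <- check_flag("KEGG", name = "pathway.db", Options = db.options)

# The error message carries the parameter name and the valid choices.
flag.error <- tryCatch(
  check_flag("GO", name = "pathway.db", Options = db.options),
  error = function(e) conditionMessage(e)
)
```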
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Checks how many of the gene IDs in the dataset are present in the geneset.
validate.genes.present(data.genes, geneset)
data.genes |
The gene IDs. |
geneset |
A summary statistics table containing the genes to be highlighted. The gene name format (in row names) needs to be consistent with the main summary statistics table. For example, this table could be the output summary statistics table from a Disease vs Healthy comparison (containing only the subsetted significant genes to be highlighted). |
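Counting the overlap between the dataset genes and the geneset can be sketched as follows. `count_genes_present` is a hypothetical helper, and it assumes the geneset carries its gene names in the row names, as the argument description states.

```r
# Hypothetical helper (not the RVA source).
count_genes_present <- function(data.genes, geneset) {
  geneset.genes <- rownames(geneset)
  # Count each dataset gene once, even if it is repeated.
  sum(unique(data.genes) %in% geneset.genes)
}

data.genes <- c("GeneA", "GeneB", "GeneC", "GeneC")
geneset <- data.frame(logFC = c(2.0, -1.7, 1.1),
                      row.names = c("GeneB", "GeneC", "GeneZ"))

n.present <- count_genes_present(data.genes, geneset)
```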
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This function ensures that the input geneset to check.cutoff is formatted properly and in a usable form.
validate.geneset(data, geneset, highlight.1, highlight.2)
data |
summary statistics table or a list containing multiple summary statistics tables from limma or DEseq2, where each row is a gene. |
geneset |
A summary statistics table containing the genes to be highlighted. The gene name format (in row names) needs to be consistent with the main summary statistics table. For example, this table could be the output summary statistics table from a Disease vs Healthy comparison (containing only the subsetted significant genes to be highlighted). |
highlight.1 |
Genes to be highlighted, as a vector of gene names. The gene name format needs to be consistent with the main summary statistics table. |
highlight.2 |
Genes to be highlighted, as a vector of gene names. The gene name format needs to be consistent with the main summary statistics table. |
The function ensures that only a dataframe or vectors are supplied, that at least one or the other is supplied, and that their formatting is correct if supplied. It also checks if any of the genes overlap with the genes in the datanames.
A character value indicating whether the geneset was passed as a dataframe (df) or two vectors (vec). If a list is input, the number of returned values equals the length of the list.
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Ensures that a column in a dataframe which must be numeric is numeric and throws an error otherwise.
validate.numeric(datin, col, name = 1)
datin |
The data in question. |
col |
The column to validate as numeric. |
name |
the position of dataset |
This specifically checks if any of the values in the column can be coerced as numeric.
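One way to read the check described above is "fail only when no value in the column converts to a number". The sketch below implements that reading with a hypothetical helper, `check_numeric_col`; the actual RVA rule may be stricter.

```r
# Hypothetical helper (not the RVA source).
check_numeric_col <- function(datin, col, name = 1) {
  converted <- suppressWarnings(as.numeric(as.character(datin[[col]])))
  # Assumption: the column fails only if nothing converts to a number.
  if (all(is.na(converted))) {
    stop("Column '", col, "' in dataset ", name, " is not numeric")
  }
  invisible(TRUE)
}

stats <- data.frame(logFC = c("1.5", "-0.2"),
                    gene = c("GeneA", "GeneB"),
                    stringsAsFactors = FALSE)

logfc.ok <- check_numeric_col(stats, "logFC")

gene.rejected <- tryCatch({
  check_numeric_col(stats, "gene")
  FALSE
}, error = function(e) TRUE)
```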
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Ensures that the selected database name is valid.
validate.pathways.db(pathway.db, customized.pathways)
pathway.db |
The database to be used for enrichment analysis. Can be one of the following: "rWikiPathways", "KEGG", "REACTOME", "Hallmark", "rWikiPathways_aug_2020". |
customized.pathways |
The customized pathways, as a two-column dataframe (column names "gs_name" and "entrez_gene"), to be used in analysis. |
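For orientation, a customized pathway table in the documented two-column layout might be built as below. The pathway names and Entrez IDs are made-up placeholders, and the two checks are only a sketch of what a validation step could test, not the RVA source.

```r
# Example customized pathway table: one row per (pathway, gene) pair.
# Pathway names and gene IDs are arbitrary placeholders.
customized.pathways <- data.frame(
  gs_name = c("MY_PATHWAY_A", "MY_PATHWAY_A", "MY_PATHWAY_B"),
  entrez_gene = c(7157, 672, 1956),
  stringsAsFactors = FALSE
)

# Sketch of a column check on the customized table.
required.cols <- c("gs_name", "entrez_gene")
has.required.cols <- all(required.cols %in% colnames(customized.pathways))

# Sketch of a database name check against the documented choices.
valid.dbs <- c("rWikiPathways", "KEGG", "REACTOME", "Hallmark",
               "rWikiPathways_aug_2020")
pathway.db <- "KEGG"
db.is.valid <- pathway.db %in% valid.dbs
```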
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Error-handling for invalid p-value.
validate.pval.range(pval, name)
pval |
The pvalue |
name |
The name of the value to include in the error. |
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Ensures that p value flags are the same across datasets.
validate.pvalflag(data, value)
data |
A list of summary statistics tables (data.frame) from limma or DEseq2, where rownames are gene id. |
value |
P value flag. |
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This function ensures the p-value minimum, maximum, and step are valid.
validate.pvals(p.min, p.max, p.step)
p.min |
The starting FDR cutoff. The smallest FDR cutoff actually evaluated is p.min + p.step. Default = 0. |
p.max |
The maximum FDR cutoff to be checked, default = 0.2. |
p.step |
The step from the minimum to maximum FDR cutoff, one step increase at a time, default = 0.005. |
Specifically, it checks that the p-values are between 0 and 1, and that at least one p.step fits within the p.min and p.max bounds and is positive.
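These conditions translate into a few comparisons in base R. The sketch below uses a hypothetical helper, `check_p_range`, with the defaults given in the argument table; it is not the RVA source.

```r
# Hypothetical helper (not the RVA source).
check_p_range <- function(p.min = 0, p.max = 0.2, p.step = 0.005) {
  bounds <- c(p.min, p.max)
  if (any(bounds < 0 | bounds > 1)) stop("p.min and p.max must lie between 0 and 1")
  if (p.step <= 0) stop("p.step must be positive")
  if (p.step > p.max - p.min) stop("At least one p.step must fit between p.min and p.max")
  invisible(TRUE)
}

p.defaults.ok <- check_p_range()

# A maximum above 1 is not a valid p-value bound.
p.out.of.range <- tryCatch({
  check_p_range(p.max = 1.5)
  FALSE
}, error = function(e) TRUE)

# A step larger than the whole range leaves nothing to evaluate.
p.step.too.big <- tryCatch({
  check_p_range(p.min = 0, p.max = 0.05, p.step = 0.1)
  FALSE
}, error = function(e) TRUE)
```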
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Makes sure the summary table being input is of the right class and format.
validate.single.table.isnotlist(data)
data |
summary statistics table (data.frame) from limma or DEseq2, where rownames are gene id. |
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Check for required column names and types.
validate.stats(datin, name = 1, ...)
datin |
the summary statistics file. |
name |
summary statistics file position indicator |
... |
pass on variables |
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
Required columns are FCflag and FDRflag.
validate.stats.cols(datin, name = 1, req.cols)
datin |
the summary statistics file. |
name |
summary statistics file position indicator |
req.cols |
required column names of |
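A required-column check can be sketched as below. `check_required_cols` is a hypothetical helper, not the RVA source; the column names "logFC" and "adj.P.Val" are the documented defaults for FCflag and FDRflag.

```r
# Hypothetical helper (not the RVA source).
check_required_cols <- function(datin, name = 1, req.cols) {
  missing.cols <- setdiff(req.cols, colnames(datin))
  if (length(missing.cols) > 0) {
    stop("Dataset ", name, " is missing required columns: ",
         paste(missing.cols, collapse = ", "))
  }
  invisible(TRUE)
}

stats <- data.frame(logFC = c(1.8, -0.3),
                    adj.P.Val = c(0.01, 0.40),
                    row.names = c("GeneA", "GeneB"))

cols.ok <- check_required_cols(stats, req.cols = c("logFC", "adj.P.Val"))

# Dropping the FDR column triggers the error.
cols.missing <- tryCatch({
  check_required_cols(stats["logFC"], req.cols = c("logFC", "adj.P.Val"))
  FALSE
}, error = function(e) TRUE)
```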
Xingpeng Li, Tatiana Gelaf Romer & Siddhartha Pachhai RVA - RNAseq Visualization Automation tool.
This is data to be included in package
wpA2020
Rwikipathway data downloaded version 2020
pathway name
version
pathway id
host name
...