---
title: "Single Sample Directional Gene Set Analysis (ssdGSA)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ssdGSA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, 
  row.print = 25,
  collapse = TRUE,
  comment = "#>"
)
```

&nbsp;

## Load package for use
&nbsp;
```{r, message=FALSE, warning=FALSE}
library(ssdGSA)
library(GSVA)
```
&nbsp;

## Load Example Data Sets
&nbsp;

Let's first load an example data matrix which contains gene expression information with ENTREZ ID as its row names. For the provided sample data matrix in this package, we only have ten samples in total.
&nbsp;

```{r }
Data_matrix <- ssdGSA::data_matrix_entrezID
knitr::kable(head(round(Data_matrix, 3)))
```
&nbsp;

Alternatively, we can load a data matrix which has ENSEMBL ID as its row names, such as the following example data matrix.
&nbsp;
```{r }
knitr::kable(head(round(ssdGSA::data_matrix, 3)))
```
&nbsp;

We can use the function `transform_ensembl_2_entrez` in the proposed `ssdGSA` package to transform ENSEMBL ID into ENTREZ ID.
&nbsp;

```{r, message=FALSE, warning=FALSE }
Data_matrix <- transform_ensembl_2_entrez(ssdGSA::data_matrix)
knitr::kable(head(round(Data_matrix, 3)))
```
&nbsp;

Now the data matrix has ENTREZ ID as its row names. Please make sure that the data matrix you input in the main function `ssdGSA` and `ssdGSA_individual` has gene ENTREZ ID as its row names. 
&nbsp;
&nbsp;

Then, let's load an example gene sets, which is a list containing gene ENTREZ ID in each gene set. In this example gene sets, there are in total 10 gene sets (corresponding to 10 pathways in this case), and we display 3 gene sets.
&nbsp;

```{r, echo=FALSE }
Gene_sets <- ssdGSA::gene_sets[c(1,2,4)]
head(Gene_sets)
```
&nbsp;

This is a list of gene sets with gene set names as component names, and each component is a vector of gene ENTREZ ID. 

&nbsp;

Also, we need to load the direction matrix which contains directionality information for each gene from summary statistics. 

```{r, echo=FALSE }
Direction_matrix <- ssdGSA::direction_matrix
knitr::kable(head(round(Direction_matrix, 3)))
```
&nbsp;

This direction matrix contains directionality information for each gene, such as effect size (ES), p value, false positive rate (FDR) from summary statistics. Each row of the matrix is for one gene, and there should be at least two columns (with the 1st column containing gene ENTREZ ID, and 2nd column containing directionality information of that gene). For the provided sample direction matrix in this package we include three columns: `gene`, `ES` and `pval`, corresponding to gene ENTREZ ID, effect size and p value. Also note that when direction matrix is missing, scores from traditional single sample gene set analysis would be calculated by the proposed `ssdGSA` package.
&nbsp;

## Functions  
&nbsp;

### ssdGSA 
&nbsp;

This function is to do single sample directional gene set analysis, which inherits the standard gene set variation analysis(GSVA) method, but also provides the option to use summary statistics from any analysis (disease vs healthy, LS vs NL, etc..) input to define the direction of gene sets used for directional gene set score calculation for a given disease. Note that this function is specific for using group weighted scores.
&nbsp;
&nbsp;

Below are the default parameters for `ssdGSA`. You can change them to modify your output. Use `help(ssdGSA)` to learn more about the parameters.
&nbsp;
```{r }
ssdGSA(Data = Data_matrix,
       Gene_sets = Gene_sets,
       Direction_matrix = Direction_matrix, 
       GSA_weight = "group_weighted",
       GSA_weighted_by = "sum.ES", #options are: "sum.ES", "avg.ES", "median.ES"
       GSA_method = "gsva", #"options are: "gsva", "ssgsea", "zscore", "avg.exprs", and "median.exprs"
       min.sz = 1,  # GSVA parameter
       max.sz = 2000, # GSVA parameter
       mx.diff = TRUE # GSVA parameter
)
```
&nbsp;

A matrix of directional gene set scores from single sample directional gene set analysis when using group weighted scores, with rows corresponding to gene sets and columns corresponding to different samples is returned.
&nbsp;
```{r }
ssdGSA(Data = Data_matrix,
       Gene_sets = Gene_sets,
       Direction_matrix = NULL, 
       GSA_weight = "group_weighted",
       GSA_weighted_by = "sum.ES", #options are: "sum.ES", "avg.ES", "median.ES"
       GSA_method = "gsva", #"options are: "gsva", "ssgsea", "zscore", "avg.exprs", and "median.exprs"
       min.sz = 6,  # GSVA parameter
       max.sz = 2000, # GSVA parameter
       mx.diff = TRUE # GSVA parameter
)
```
&nbsp;

Alternatively, when direction matrix is missing, i.e., `Direction_matrix = NULL`, scores from traditional single sample gene set analysis without directionality information will be calculated and returned.

&nbsp;

### ssdGSA_individual 
&nbsp;

This function is to do single sample directional gene set analysis when using individual weighted scores.
&nbsp;


Below are the default parameters for `ssdGSA_individual`. You can change them to modify your output. Use `help(ssdGSA_individual)` to learn more about the parameters.
&nbsp;
```{r}
ssdGSA_individual(Data = Data_matrix,
                  Gene_sets = Gene_sets,
                  Direction_matrix = Direction_matrix
)
```
&nbsp;


A matrix of directional gene set scores from single sample directional gene set analysis when using individual weighted scores, with rows corresponding to gene sets and columns corresponding to different samples is returned.
&nbsp;
&nbsp;