Title: | Transform Counts in RNA-Seq Data Analysis |
---|---|
Description: | Provide data transformation functions to transform counts in RNA-seq data analysis. Please see the reference: Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. (2019) <doi.org/10.1038/s41598-019-41315-w>. |
Authors: | Zeyu Zhang [aut, cre], Danyang Yu [aut, ctb], Minseok Seo [aut, ctb], Craig P. Hersh [aut, ctb], Scott T. Weiss [aut, ctb], Weiliang Qiu [aut, ctb] |
Maintainer: | Zeyu Zhang <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0.6 |
Built: | 2025-02-26 03:44:40 UTC |
Source: | https://github.com/cran/countTransformers |
A simulated data set generated with the R code provided by Law et al. (2014).
data("es")
The format is: Formal class 'ExpressionSet' [package "Biobase"]
The simulated data set contains RNA-seq counts of 1000 genes for 6 samples (3 cases and 3 controls). The library sizes of the 6 samples are not equal.
The dataset was generated based on the R code Simulation_Full.R from the website http://bioinf.wehi.edu.au/voom/.
Law CW, Chen Y, Shi W, Smyth GK. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology. 2014; 15:R29

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # phenotype data
    pDat = pData(es)
    print(dim(pDat))
    print(pDat[1:2, ])

    # feature data
    fDat = fData(es)
    print(dim(fDat))
    print(fDat[1:2, ])
Calculate Jaccard index for two binary vectors.
getJaccard(cl1, cl2)
cl1 | n by 1 binary vector of classification 1 for the n subjects |
cl2 | n by 1 binary vector of classification 2 for the n subjects |
The Jaccard index is defined as the ratio $n_{11} / (n_{11} + n_{10} + n_{01})$, where $n_{11}$ is the number of subjects who were classified to group 1 by both classification rules, $n_{10}$ is the number of subjects who were classified to group 1 by classification rule 1 and to group 0 by classification rule 2, and $n_{01}$ is the number of subjects who were classified to group 0 by classification rule 1 and to group 1 by classification rule 2.
The Jaccard Index
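The definition can be cross-checked directly in base R. The following is a minimal sketch, not the package's implementation; the helper name `jaccard_sketch` and the toy vectors are hypothetical.

```r
# Hypothetical helper (not part of countTransformers): Jaccard index
# for two binary (0/1) vectors of equal length.
jaccard_sketch <- function(cl1, cl2) {
  stopifnot(length(cl1) == length(cl2))
  n11 <- sum(cl1 == 1 & cl2 == 1)  # group 1 under both rules
  n10 <- sum(cl1 == 1 & cl2 == 0)  # group 1 under rule 1 only
  n01 <- sum(cl1 == 0 & cl2 == 1)  # group 1 under rule 2 only
  n11 / (n11 + n10 + n01)
}

# toy classifications (made-up data)
cl1 <- c(1, 1, 0, 0, 1)
cl2 <- c(1, 0, 0, 1, 1)
ji <- jaccard_sketch(cl1, cl2)
print(ji)
```

In this sketch the ratio is undefined (`NaN`) when neither rule assigns any subject to group 1.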
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(countTransformers)

    n = 10
    set.seed(1234567)

    # generate two random binary vectors of size n
    cl1 = sample(c(1, 0), size = n, prob = c(0.5, 0.5), replace = TRUE)
    cl2 = sample(c(1, 0), size = n, prob = c(0.5, 0.5), replace = TRUE)

    cat("\n2x2 contingency table >>\n")
    print(table(cl1, cl2))

    JI = getJaccard(cl1, cl2)
    cat("Jaccard index = ", JI, "\n")
Log based count transformation minimizing sum of sample-specific squared difference.
l2Transformer(mat, low = 1e-04, upp = 1000)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
low | lower bound for the model parameter |
upp | upper bound for the model parameter |
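Denote $x_{gi}$ as the expression level of the $g$-th gene for the $i$-th subject. We perform the log transformation
$$y_{gi} = \log_2\left(x_{gi} + \frac{1}{\delta}\right).$$
The optimal value of the parameter $\delta$ minimizes the sum, over the $n$ subjects, of the squared difference between the sample mean and the sample median:
$$\sum_{i=1}^{n} \left(\bar{y}_i - \tilde{y}_i\right)^2,$$
where $\bar{y}_i = \frac{1}{G}\sum_{g=1}^{G} y_{gi}$, $\tilde{y}_i$ is the median of $y_{1i}, \ldots, y_{Gi}$, $G$ is the number of genes, and $n$ is the number of subjects.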
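The sample-specific criterion can be sketched in base R. This is a minimal sketch rather than the package source: it assumes the transformation has the form $\log_2(x + 1/\delta)$ and that the parameter is found by a one-dimensional search with `optimize`; the function name `l2_objective` and the toy count matrix are made up for illustration.

```r
# Hypothetical sketch (not the package source): sum over samples of
# (column mean - column median)^2 after y = log2(x + 1/delta).
l2_objective <- function(delta, mat) {
  y <- log2(mat + 1 / delta)
  sum((colMeans(y) - apply(y, 2, median))^2)
}

set.seed(1)
# toy count matrix: 200 genes x 4 samples (made-up data)
mat <- matrix(rnbinom(800, mu = 50, size = 2), nrow = 200, ncol = 4)

# search delta within the documented default bounds
fit <- optimize(l2_objective, interval = c(1e-04, 1000), mat = mat)
delta_hat <- fit$minimum
mat2 <- log2(mat + 1 / delta_hat)

print(delta_hat)
print(dim(mat2))
```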
A list with 3 elements:
res.delta | An object returned by the `optimize` function |
delta | model parameter |
mat2 | transformed data matrix having the same dimension as `mat` |
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # mean-median before transformation
    vec = c(ex)
    m = mean(vec)
    md = median(vec)
    diff = m - md
    cat("m=", m, ", md=", md, ", diff=", diff, "\n")

    res = l2Transformer(mat = ex)

    # estimated model parameter
    print(res$delta)

    # mean-median after transformation
    vec2 = c(res$mat2)
    m2 = mean(vec2)
    md2 = median(vec2)
    diff2 = m2 - md2
    cat("m2=", m2, ", md2=", md2, ", diff2=", diff2, "\n")
Log-based transformation.
lTransformer(mat, low = 1e-04, upp = 100)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
low | lower bound for the model parameter |
upp | upper bound for the model parameter |
Denote $x_{gi}$ as the expression level of the $g$-th gene for the $i$-th subject. We perform the log transformation
$$y_{gi} = \log_2\left(x_{gi} + \frac{1}{\delta}\right).$$
The optimal value of the parameter $\delta$ minimizes the squared difference between the sample mean and the sample median of the pooled data $y_{gi}$, $g = 1, \ldots, G$, $i = 1, \ldots, n$, where $G$ is the number of genes and $n$ is the number of subjects.
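To see what the pooled criterion does to the mean-median gap, the following base R sketch compares the gap before and after the transformation on made-up counts. It is not the package source: it assumes the form $\log_2(x + 1/\delta)$ and a search with `optimize`, and the name `l_objective` is hypothetical.

```r
# Hypothetical sketch (not the package source): squared difference between
# the mean and the median of all transformed values pooled together.
l_objective <- function(delta, mat) {
  y <- log2(mat + 1 / delta)
  (mean(y) - median(y))^2
}

set.seed(1)
# toy count matrix: 200 genes x 4 samples (made-up data)
mat <- matrix(rnbinom(800, mu = 50, size = 2), nrow = 200, ncol = 4)

gap_before <- mean(mat) - median(mat)

fit <- optimize(l_objective, interval = c(1e-04, 100), mat = mat)
y <- log2(mat + 1 / fit$minimum)
gap_after <- mean(y) - median(y)

cat("delta =", fit$minimum, "\n")
cat("gap before =", gap_before, ", gap after =", gap_after, "\n")
```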
A list with 3 elements:
res.delta | An object returned by the `optimize` function |
delta | model parameter |
mat2 | transformed data matrix having the same dimension as `mat` |
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # mean-median before transformation
    vec = c(ex)
    m = mean(vec)
    md = median(vec)
    diff = m - md
    cat("m=", m, ", md=", md, ", diff=", diff, "\n")

    res = lTransformer(mat = ex)

    # estimated model parameter
    print(res$delta)

    # mean-median after transformation
    vec2 = c(res$mat2)
    m2 = mean(vec2)
    md2 = median(vec2)
    diff2 = m2 - md2
    cat("m2=", m2, ", md2=", md2, ", diff2=", diff2, "\n")
Log and VOOM based count transformation minimizing sum of sample-specific squared difference.
lv2Transformer(mat, lib.size = NULL, low = 0.001, upp = 1000)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
lib.size | By default, `lib.size = NULL`, in which case the column sums of `mat` are used as the library sizes |
low | lower bound for the model parameter |
upp | upper bound for the model parameter |
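Denote $x_{gi}$ as the expression level of the $g$-th gene for the $i$-th subject. We perform the log transformation
$$y_{gi} = \log_2\left(t_{gi} + \frac{1}{\delta}\right),$$
where
$$t_{gi} = \frac{x_{gi} + 0.5}{X_i + 1} \times 10^6$$
and $X_i$ is the column sum for the $i$-th column of the matrix `mat`. The optimal value of the parameter $\delta$ minimizes the sum, over the $n$ subjects, of the squared difference between the sample mean and the sample median:
$$\sum_{i=1}^{n} \left(\bar{y}_i - \tilde{y}_i\right)^2,$$
where $\bar{y}_i = \frac{1}{G}\sum_{g=1}^{G} y_{gi}$, $\tilde{y}_i$ is the median of $y_{1i}, \ldots, y_{Gi}$, $G$ is the number of genes, and $n$ is the number of subjects.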
A list with 3 elements:
res.delta | An object returned by the `optimize` function |
delta | model parameter |
mat2 | transformed data matrix having the same dimension as `mat` |
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # mean-median before transformation
    vec = c(ex)
    m = mean(vec)
    md = median(vec)
    diff = m - md
    cat("m=", m, ", md=", md, ", diff=", diff, "\n")

    res = lv2Transformer(mat = ex)

    # estimated model parameter
    print(res$delta)

    # mean-median after transformation
    vec2 = c(res$mat2)
    m2 = mean(vec2)
    md2 = median(vec2)
    diff2 = m2 - md2
    cat("m2=", m2, ", md2=", md2, ", diff2=", diff2, "\n")
Log and VOOM Transformation.
lvTransformer(mat, lib.size=NULL, low=0.001, upp=1000)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
lib.size | By default, `lib.size = NULL`, in which case the column sums of `mat` are used as the library sizes |
low | lower bound for the model parameter |
upp | upper bound for the model parameter |
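Denote $x_{gi}$ as the expression level of the $g$-th gene for the $i$-th subject. We perform the log transformation
$$y_{gi} = \log_2\left(t_{gi} + \frac{1}{\delta}\right),$$
where
$$t_{gi} = \frac{x_{gi} + 0.5}{X_i + 1} \times 10^6$$
and $X_i$ is the column sum for the $i$-th column of the matrix `mat`. The optimal value of the parameter $\delta$ minimizes the squared difference between the sample mean and the sample median of the pooled data $y_{gi}$, $g = 1, \ldots, G$, $i = 1, \ldots, n$, where $G$ is the number of genes and $n$ is the number of subjects.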
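The library-size scaling step is easy to check by hand. The sketch below is an illustration in base R, not the package source: it assumes the voom-style scaling $(x + 0.5)/(X + 1) \times 10^6$ with column sums as default library sizes; the helper `voom_scale`, the toy matrix, and the fixed value of `delta` are all made up.

```r
# Hypothetical helper (not part of countTransformers): voom-style scaling
# t_gi = (x_gi + 0.5) / (X_i + 1) * 1e6, where X_i is the library size.
voom_scale <- function(mat, lib.size = NULL) {
  if (is.null(lib.size)) lib.size <- colSums(mat)  # default: column sums
  sweep(mat + 0.5, 2, lib.size + 1, "/") * 1e6
}

# toy 3 x 2 count matrix (made-up data); both column sums equal 100
mat <- matrix(c(10, 30, 60, 0, 5, 95), nrow = 3)
tmat <- voom_scale(mat)

# log transformation for a fixed, arbitrary delta (no optimization here)
delta <- 2
y <- log2(tmat + 1 / delta)

print(tmat)
print(y)
```

The offset 0.5 keeps zero counts strictly positive after scaling, so the logarithm stays finite for any positive `delta`.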
A list with 3 elements:
res.delta | An object returned by the `optimize` function |
delta | model parameter |
mat2 | transformed data matrix having the same dimension as `mat` |
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # mean-median before transformation
    vec = c(ex)
    m = mean(vec)
    md = median(vec)
    diff = m - md
    cat("m=", m, ", md=", md, ", diff=", diff, "\n")

    res = lvTransformer(mat = ex)

    # estimated model parameter
    print(res$delta)

    # mean-median after transformation
    vec2 = c(res$mat2)
    m2 = mean(vec2)
    md2 = median(vec2)
    diff2 = m2 - md2
    cat("m2=", m2, ", md2=", md2, ", diff2=", diff2, "\n")
Root based count transformation minimizing sum of sample-specific squared difference.
r2Transformer(mat, low = 1e-04, upp = 1000)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
low | lower bound for the model parameter |
upp | upper bound for the model parameter |
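Denote $x_{gi}$ as the expression level of the $g$-th gene for the $i$-th subject. We perform the root transformation
$$y_{gi} = \frac{x_{gi}^{1/\eta}}{1/\eta}.$$
The optimal value of the parameter $\eta$ minimizes the sum, over the $n$ subjects, of the squared difference between the sample mean and the sample median:
$$\sum_{i=1}^{n} \left(\bar{y}_i - \tilde{y}_i\right)^2,$$
where $\bar{y}_i = \frac{1}{G}\sum_{g=1}^{G} y_{gi}$, $\tilde{y}_i$ is the median of $y_{1i}, \ldots, y_{Gi}$, $G$ is the number of genes, and $n$ is the number of subjects.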
A list with 3 elements:
res.delta | An object returned by the `optimize` function |
eta | model parameter |
mat2 | transformed data matrix having the same dimension as `mat` |
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # mean-median before transformation
    vec = c(ex)
    m = mean(vec)
    md = median(vec)
    diff = m - md
    cat("m=", m, ", md=", md, ", diff=", diff, "\n")

    res = r2Transformer(mat = ex)

    # estimated model parameter
    print(res$eta)

    # mean-median after transformation
    vec2 = c(res$mat2)
    m2 = mean(vec2)
    md2 = median(vec2)
    diff2 = m2 - md2
    cat("m2=", m2, ", md2=", md2, ", diff2=", diff2, "\n")
Root based transformation.
rTransformer(mat, low = 1e-04, upp = 100)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
low | lower bound for the model parameter |
upp | upper bound for the model parameter |
Denote $x_{gi}$ as the expression level of the $g$-th gene for the $i$-th subject. We perform the root transformation
$$y_{gi} = \frac{x_{gi}^{1/\eta}}{1/\eta}.$$
The optimal value of the parameter $\eta$ minimizes the squared difference between the sample mean and the sample median of the pooled data $y_{gi}$, $g = 1, \ldots, G$, $i = 1, \ldots, n$, where $G$ is the number of genes and $n$ is the number of subjects.
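A root-based analogue of the pooled criterion can be sketched in base R. This is an illustration only, not the package source: it assumes the transformation has the form $x^{1/\eta} / (1/\eta)$ and that the parameter is searched with `optimize`; the name `r_objective` and the toy counts are hypothetical.

```r
# Hypothetical sketch (not the package source): squared difference between
# the pooled mean and pooled median after y = x^(1/eta) / (1/eta).
r_objective <- function(eta, mat) {
  y <- mat^(1 / eta) / (1 / eta)
  (mean(y) - median(y))^2
}

set.seed(1)
# toy count matrix: 200 genes x 4 samples (made-up data)
mat <- matrix(rnbinom(800, mu = 50, size = 2), nrow = 200, ncol = 4)

# search eta within the documented default bounds
fit <- optimize(r_objective, interval = c(1e-04, 100), mat = mat)
eta_hat <- fit$minimum
mat2 <- mat^(1 / eta_hat) / (1 / eta_hat)

print(eta_hat)
print(range(mat2))
```

Unlike the log transformation, the root transformation maps zero counts to zero, so no offset is needed to keep the result finite.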
A list with 3 elements:
res.eta | An object returned by the `optimize` function |
eta | model parameter |
mat2 | transformed data matrix having the same dimension as `mat` |
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # mean-median before transformation
    vec = c(ex)
    m = mean(vec)
    md = median(vec)
    diff = m - md
    cat("m=", m, ", md=", md, ", diff=", diff, "\n")

    res = rTransformer(mat = ex)

    # estimated model parameter
    print(res$eta)

    # mean-median after transformation
    vec2 = c(res$mat2)
    m2 = mean(vec2)
    md2 = median(vec2)
    diff2 = m2 - md2
    cat("m2=", m2, ", md2=", md2, ", diff2=", diff2, "\n")
Root and VOOM based count transformation minimizing sum of sample-specific squared difference.
rv2Transformer(mat, low = 1e-04, upp = 1000, lib.size = NULL)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
lib.size | By default, `lib.size = NULL`, in which case the column sums of `mat` are used as the library sizes |
low | lower bound for the model parameter |
upp | upper bound for the model parameter |
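Denote $x_{gi}$ as the expression level of the $g$-th gene for the $i$-th subject. We perform the root and voom transformation
$$y_{gi} = \frac{t_{gi}^{1/\eta}}{1/\eta},$$
where
$$t_{gi} = \frac{x_{gi} + 0.5}{X_i + 1} \times 10^6$$
and $X_i$ is the column sum for the $i$-th column of the matrix `mat`. The optimal value of the parameter $\eta$ minimizes the sum, over the $n$ subjects, of the squared difference between the sample mean and the sample median:
$$\sum_{i=1}^{n} \left(\bar{y}_i - \tilde{y}_i\right)^2,$$
where $\bar{y}_i = \frac{1}{G}\sum_{g=1}^{G} y_{gi}$, $\tilde{y}_i$ is the median of $y_{1i}, \ldots, y_{Gi}$, $G$ is the number of genes, and $n$ is the number of subjects.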
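Putting the pieces together, the whole procedure (library-size scaling, root transformation, sample-specific criterion) might look like the base R sketch below. It is not the package source: the scaling $(x + 0.5)/(X + 1) \times 10^6$, the root form $t^{1/\eta}/(1/\eta)$, and the use of `optimize` are assumptions, and the function `rv2_sketch` and its return names are hypothetical.

```r
# Hypothetical end-to-end sketch (not the package source).
rv2_sketch <- function(mat, lib.size = NULL, low = 1e-04, upp = 1000) {
  if (is.null(lib.size)) lib.size <- colSums(mat)  # default: column sums
  # voom-style scaling by library size
  tmat <- sweep(mat + 0.5, 2, lib.size + 1, "/") * 1e6
  # root transformation for a given eta
  root <- function(eta) tmat^(1 / eta) / (1 / eta)
  # sum over samples of (column mean - column median)^2
  obj <- function(eta) {
    y <- root(eta)
    sum((colMeans(y) - apply(y, 2, median))^2)
  }
  fit <- optimize(obj, interval = c(low, upp))
  list(fit = fit, eta = fit$minimum, mat2 = root(fit$minimum))
}

set.seed(1)
# toy count matrix: 200 genes x 4 samples (made-up data)
mat <- matrix(rnbinom(800, mu = 50, size = 2), nrow = 200, ncol = 4)
res <- rv2_sketch(mat)

print(res$eta)
print(dim(res$mat2))
```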
A list with 3 elements:
res.delta | An object returned by the `optimize` function |
eta | model parameter |
mat2 | transformed data matrix having the same dimension as `mat` |
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # mean-median before transformation
    vec = c(ex)
    m = mean(vec)
    md = median(vec)
    diff = m - md
    cat("m=", m, ", md=", md, ", diff=", diff, "\n")

    res = rv2Transformer(mat = ex)

    # estimated model parameter
    print(res$eta)

    # mean-median after transformation
    vec2 = c(res$mat2)
    m2 = mean(vec2)
    md2 = median(vec2)
    diff2 = m2 - md2
    cat("m2=", m2, ", md2=", md2, ", diff2=", diff2, "\n")
Root and VOOM transformation.
rvTransformer(mat, lib.size = NULL, low = 0.001, upp = 1000)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
lib.size | By default, `lib.size = NULL`, in which case the column sums of `mat` are used as the library sizes |
low | lower bound for the model parameter |
upp | upper bound for the model parameter |
Denote $x_{gi}$ as the expression level of the $g$-th gene for the $i$-th subject. We perform the root and voom transformation
$$y_{gi} = \frac{t_{gi}^{1/\eta}}{1/\eta},$$
where
$$t_{gi} = \frac{x_{gi} + 0.5}{X_i + 1} \times 10^6$$
and $X_i$ is the column sum for the $i$-th column of the matrix `mat`. The optimal value of the parameter $\eta$ minimizes the squared difference between the sample mean and the sample median of the pooled data $y_{gi}$, $g = 1, \ldots, G$, $i = 1, \ldots, n$, where $G$ is the number of genes and $n$ is the number of subjects.
A list with 3 elements:
res.eta | An object returned by the `optimize` function |
eta | model parameter |
mat2 | transformed data matrix having the same dimension as `mat` |
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5

    library(Biobase)
    library(countTransformers)

    data(es)
    print(es)

    # expression set
    ex = exprs(es)
    print(dim(ex))
    print(ex[1:3, 1:2])

    # mean-median before transformation
    vec = c(ex)
    m = mean(vec)
    md = median(vec)
    diff = m - md
    cat("m=", m, ", md=", md, ", diff=", diff, "\n")

    res = rvTransformer(mat = ex)

    # estimated model parameter
    print(res$eta)

    # mean-median after transformation
    vec2 = c(res$mat2)
    m2 = mean(vec2)
    md2 = median(vec2)
    diff2 = m2 - md2
    cat("m2=", m2, ", md2=", md2, ", diff2=", diff2, "\n")
Wrapper function for the Wilcoxon rank sum test.
wilcoxWrapper(mat, grp)
mat | G x n data matrix, where G is the number of genes and n is the number of subjects |
grp | n x 1 vector of subject group info |
For each row of `mat`, we perform a Wilcoxon rank sum test.
A G x 1 vector of p-values.
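A row-wise test of this kind can be sketched with `wilcox.test` from the base `stats` package. This is a minimal sketch, not the package's implementation: the helper `wilcox_rows` is hypothetical, it assumes exactly two groups, and it uses `exact = FALSE` (normal approximation) because count data typically contain ties.

```r
# Hypothetical helper (not part of countTransformers): Wilcoxon rank sum
# test for each row (gene) of a G x n matrix, comparing two groups.
wilcox_rows <- function(mat, grp) {
  lev <- unique(grp)
  stopifnot(length(lev) == 2, length(grp) == ncol(mat))
  apply(mat, 1, function(x) {
    wilcox.test(x[grp == lev[1]], x[grp == lev[2]], exact = FALSE)$p.value
  })
}

set.seed(2)
# toy data: 10 genes x 6 samples, 3 cases (1) and 3 controls (0)
mat <- matrix(rpois(60, lambda = 20), nrow = 10, ncol = 6)
grp <- c(1, 1, 1, 0, 0, 0)

pvals <- wilcox_rows(mat, grp)
print(round(pvals, 3))
```

With only three samples per group the attainable p-values are coarse, so multiple-testing adjustment (for example `p.adjust`) would normally follow on real data.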
Zeyu Zhang, Danyang Yu, Minseok Seo, Craig P. Hersh, Scott T. Weiss, Weiliang Qiu
Zhang Z, Yu D, Seo M, Hersh CP, Weiss ST, Qiu W. Novel Data Transformations for RNA-seq Differential Expression Analysis. Scientific Reports. 2019; 9:4820. https://rdcu.be/brDe5