Title: | Screen and clean variable selection procedures |
---|---|
Description: | Routines for a collection of screen-and-clean type variable selection procedures, including UPS and GS. |
Authors: | Pengsheng Ji, Jiashun Jin, Qi Zhang |
Maintainer: | Qi Zhang <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2025-02-13 04:11:58 UTC |
Source: | https://github.com/cran/ScreenClean |
Routines for a collection of screen-and-clean type variable selection procedures.
Package: | ScreenClean |
Type: | Package |
Version: | 1.0.1 |
Date: | 2012-10-30 |
License: | GPL (>= 2) |
To use ScreenClean, the data must be normalized so that the noise has standard deviation 1 and each length-n predictor vector has l_2 norm 1.
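The normalization described above can be sketched in base R as follows. This is only an illustration on simulated data: the noise standard deviation `sigma` is assumed to be known (in practice it would have to be estimated), and the names `X.normalized`, `y.normalized` and `gram.full` are chosen for this example rather than taken from the package.

```r
set.seed(1)
n <- 50
p <- 5
X <- matrix(rnorm(n * p), n, p)
sigma <- 2                                   # assumed known noise sd (illustrative)
y <- X %*% c(3, 0, 0, -2, 0) + sigma * rnorm(n)

# Rescale every predictor column to unit l_2 norm.
col.norms <- sqrt(colSums(X^2))
X.normalized <- sweep(X, 2, col.norms, "/")

# Divide the response by sigma so that the noise has standard deviation 1.
y.normalized <- y / sigma

# The Gram matrix of the normalized predictors then has a unit diagonal.
gram.full <- crossprod(X.normalized)
```

Rescaling the columns and the response also rescales the regression coefficients, so estimates obtained on the normalized data refer to the normalized model.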
Pengsheng Ji, Jiashun Jin, Qi Zhang
Maintainer: Qi Zhang <[email protected]>
Ji, P. and Jin, J. (2012). UPS delivers optimal phase diagram in high dimensional variable selection. Ann. Statist., 40(1), 73-103.
Jin, J., Zhang, C.-H. and Zhang, Q. (2012). Optimality of Graphlet Screening in High Dimensional Variable Selection. arXiv:1204.6452
CleaningStep performs the cleaning step of graphlet screening.
CleaningStep(survivor, y.tilde, gram, lambda, uu)
survivor |
the result of the screening step, a logical vector. |
y.tilde |
|
gram |
the thresholded sparse gram matrix |
lambda |
a tuning parameter of the cleaning step, whose optimal choice is tied to the sparse level. |
uu |
a tuning parameter of the cleaning step; its optimal choice can be interpreted as the minimal signal strength to be detected. |
beta.gs |
the estimated regression coefficient of the graphlet screening, a numeric vector |
##See the demoGs.r
FindAllCG uses FindCG iteratively, and lists all the connected subgraphs with no more than lc nodes
FindAllCG(adjacency.matrix, lc)
adjacency.matrix |
p by p adjacency matrix of an undirected graph; it must be symmetric. |
lc |
the maximal size of the connected subgraphs to be listed |
cg.all |
A list, whose kth component is a matrix with k columns that lists all the connected subgraphs with k nodes. |
require(MASS)
require(Matrix)
p <- 10
Omega <- sparseMatrix(c(1:(p-1),2:p),c(2:p,1:(p-1)),x=1)
cg.all <- FindAllCG(Omega,3)
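The adjacency matrix in this example is a path graph on the nodes 1, ..., p, for which the connected subgraphs with k nodes are exactly the windows of k consecutive indices. The base R sketch below builds that expected listing in the documented shape (a list whose kth component has k columns), which can serve as a sanity check. The row order and the order of nodes within a row in the actual FindAllCG output are not stated in this documentation, so the layout here is an assumption.

```r
p <- 10
lc <- 3

# For a path graph, a connected subgraph with k nodes is a window of
# k consecutive indices; there are p - k + 1 such windows.
expected.cg <- lapply(seq_len(lc), function(k) {
  starts <- seq_len(p - k + 1)
  # Row j holds the node indices j, j + 1, ..., j + k - 1.
  outer(starts, seq_len(k) - 1, "+")
})

sapply(expected.cg, nrow)   # 10 9 8
```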
FindCG is used to find all the connected subgraphs with a certain number of nodes.
FindCG(adjacency.matrix, cg.initial)
adjacency.matrix |
p by p adjacency matrix of an undirected graph. It must be symmetric. |
cg.initial |
Either 1:p or a matrix whose elements are positive integers from 1 to p. If it is a length p vector, FindCG converts it into a matrix with one column. For a matrix with k columns, FindCG reads its rows as the indices of a collection of connected subgraphs with k nodes. |
cg.new |
If the input is a matrix with k columns and stores the indices of all the size k connected subgraphs, the output is a matrix with k+1 columns storing the indices of all the connected subgraphs with k+1 nodes. |
require(MASS)
require(Matrix)
p <- 10
Omega <- sparseMatrix(c(1:(p-1),2:p),c(2:p,1:(p-1)),x=1)
cg.2 <- FindCG(Omega,c(1:p))
cg.3 <- FindCG(Omega,cg.2)
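One way to picture the step from size k to size k+1 is to extend every listed subgraph by each node adjacent to it and then drop duplicated node sets. The function below is a minimal base R sketch of that idea on a dense adjacency matrix. The name `grow_connected` is hypothetical, and the code is not the package's FindCG implementation, whose internals and output ordering may differ.

```r
# Hypothetical helper illustrating one growth step; not the package's FindCG.
grow_connected <- function(adj, cg) {
  if (!is.matrix(cg)) cg <- matrix(cg, ncol = 1)
  grown <- list()
  for (i in seq_len(nrow(cg))) {
    nodes <- cg[i, ]
    # Nodes adjacent to at least one node of the current subgraph.
    nbrs <- which(colSums(adj[nodes, , drop = FALSE] != 0) > 0)
    for (v in setdiff(nbrs, nodes)) {
      grown[[length(grown) + 1]] <- sort(c(nodes, v))
    }
  }
  if (length(grown) == 0) return(matrix(integer(0), 0, ncol(cg) + 1))
  # The same node set can be reached from several parents, so drop duplicates.
  unique(do.call(rbind, grown))
}

# Path graph 1 - 2 - ... - 10 as a dense symmetric adjacency matrix.
p <- 10
adj <- matrix(0, p, p)
adj[cbind(1:(p - 1), 2:p)] <- 1
adj <- adj + t(adj)

cg.2.sketch <- grow_connected(adj, 1:p)          # 9 connected pairs
cg.3.sketch <- grow_connected(adj, cg.2.sketch)  # 8 connected triples
```

Growing by one adjacent node reaches every connected subgraph with k+1 nodes, because each such subgraph contains a node whose removal leaves a connected subgraph with k nodes.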
The iterative graphlet screening procedure, main function of the package.
IterGS(y.tilde, gram, gram.bias, cg.all, sp, tau, nm, q0=0.1, scale = 1, max.iter = 3, std.thresh = 1.05, beta.initial = NULL)
y.tilde |
|
gram |
the thresholded gram matrix |
gram.bias |
the bias of the thresholded gram matrix |
cg.all |
all the connected subgraphs of gram with size no more than nm. |
sp |
the expected sparse level |
tau |
the minimal signal strength to be detected |
nm |
the maximal size of the connected subgraphs considered in the screening step. |
q0 |
the minimal screening parameter |
scale |
optional numerical parameter of the screening step. The default is 1 |
max.iter |
the maximal number of iterations. The default is 3. |
std.thresh |
the threshold on the change in std that stops the loop. The default is 1.05. |
beta.initial |
the initial estimate of beta in reducing the bias. The default is uu*sign(y.tilde)*(abs(y.tilde)>uu). |
IterGS returns a list with two elements:
estimate |
The iterative GS estimate of beta |
n.iter |
The number of iterations it takes |
##See demoIterGs.r
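The default for beta.initial quoted in the argument list is a thresholding rule that can be evaluated directly. The snippet below applies that formula to made-up values of y.tilde. Since uu is not among the arguments of IterGS, how the function obtains it is not stated here, and the value used below is only a placeholder.

```r
y.tilde.demo <- c(3.2, -0.4, 0.9, -2.7, 1.1)  # made-up values for illustration
uu.demo <- 1                                  # placeholder threshold (assumption)

# Entries with |y.tilde| > uu are set to +/- uu; all others are set to 0.
beta.initial.demo <- uu.demo * sign(y.tilde.demo) * (abs(y.tilde.demo) > uu.demo)
beta.initial.demo   # 1 0 0 -1 1
```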
Penalized MLE procedure used in the cleaning step, an inner function.
PMLE(gram, y, lambda, uu)
gram |
the sub gram matrix of the small scale quadratic problem. |
y |
the sub-vector of y.tilde |
lambda |
the tuning parameter of the cleaning step, tied to the sparse level. |
uu |
a tuning parameter of the cleaning step. It can be interpreted as the minimal signal strength to be detected. |
b |
the estimate of the subvector of beta |
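To give a feel for what a small-scale penalized problem of this kind can look like, the sketch below solves one by brute force. It rests on assumptions that this documentation does not confirm: the objective is taken to be 0.5 * b' gram b - b' y + 0.5 * lambda^2 * (number of nonzero entries of b), and each entry of b is restricted to -uu, 0 or uu. The function name `pmle_sketch` is hypothetical, and the package's PMLE may use a different objective or search.

```r
# Hypothetical brute-force solver under the stated assumptions;
# not the package's PMLE implementation.
pmle_sketch <- function(gram, y, lambda, uu) {
  k <- length(y)
  # All 3^k candidate vectors with entries in {-uu, 0, uu}.
  candidates <- as.matrix(expand.grid(rep(list(c(-uu, 0, uu)), k)))
  objective <- apply(candidates, 1, function(b) {
    0.5 * sum(b * (gram %*% b)) - sum(b * y) + 0.5 * lambda^2 * sum(b != 0)
  })
  # Return the candidate with the smallest penalized objective.
  unname(candidates[which.min(objective), ])
}

pmle_sketch(diag(2), y = c(3, 0.1), lambda = 1, uu = 2)   # 2 0
```

Exhaustive search is only practical because the subproblems are small: the number of candidates grows as 3^k in the size k of the sub-vector.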
ScreeningStep performs the screening step of graphlet screening.
ScreeningStep(y.tilde, gram, cg.all, nm, v, r, q0 = 0.1, scale = 1)
y.tilde |
|
gram |
the regularized gram matrix |
cg.all |
a list whose kth element is a matrix with k columns. Its rows contain all the connected subgraphs with k nodes. |
nm |
the maximal size of the subgraphs investigated in the screening step |
v |
an essential tuning parameter of graphlet screening, tied to the sparse level |
r |
an essential tuning parameter of graphlet screening, tied to the signal strength |
q0 |
the minimal screening parameter |
scale |
optional numerical parameter of the screening step. The default is 1. |
survivor |
A logical vector, where TRUE means retained as a potential signal. |
When nm=1, it is just univariate thresholding, and thus the screening step of UPS.
##See the demoGS.r
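The nm=1 case mentioned above reduces to comparing each entry of y.tilde with a threshold. The snippet below shows only that comparison on made-up values. How ScreeningStep derives its threshold from v, r, q0 and scale is not described here, so `t.demo` is a placeholder rather than the value the function would use.

```r
y.tilde.demo <- c(4.1, -0.3, 0.8, -3.6, 0.2)  # made-up values for illustration
t.demo <- 2                                   # placeholder threshold (assumption)

# Univariate thresholding: keep a coordinate when |y.tilde| exceeds the threshold.
survivor.demo <- abs(y.tilde.demo) > t.demo
survivor.demo   # TRUE FALSE FALSE TRUE FALSE
```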
Thresholds the gram matrix
ThresholdGram(gram.full, delta = 1/log(dim(gram.full)[1]))
gram.full |
the gram matrix before the elementwise thresholding, a p by p symmetric matrix |
delta |
the threshold, the default is 1/log(p) |
A list with two elements
gram.sd |
the thresholded gram matrix, a sparse matrix |
gram.bias |
the difference between the original matrix and the thresholded matrix |
p <- 10
off.diag <- matrix(runif(p^2),p,p)
omega <- (off.diag+t(off.diag))*0.3
diag(omega) <- 1
omega.omega <- ThresholdGram(omega,0.3)
omega.omega$gram
omega.omega$gram.bias
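Elementwise thresholding itself is easy to reproduce in base R, which helps when checking what the two returned components should contain. The sketch below is written under assumptions: entries are kept when their absolute value strictly exceeds delta, the result is a dense matrix rather than a sparse one, and the function and element names are chosen for this example. The package's ThresholdGram may differ on each of these points.

```r
# Hypothetical dense re-implementation for illustration; not ThresholdGram itself.
threshold_gram_sketch <- function(gram.full, delta = 1 / log(nrow(gram.full))) {
  keep <- abs(gram.full) > delta        # assumption: strict inequality
  gram.thr <- gram.full * keep          # small entries are set to zero
  list(gram.thr = gram.thr, gram.bias = gram.full - gram.thr)
}

gram.demo <- matrix(c(1.0, 0.5, 0.1,
                      0.5, 1.0, 0.2,
                      0.1, 0.2, 1.0), 3, 3)
res <- threshold_gram_sketch(gram.demo, delta = 0.3)
res$gram.thr    # the 0.1 and 0.2 entries are zeroed, 0.5 and the diagonal are kept
res$gram.bias   # holds exactly the entries that were removed
```

By construction the two components add back up to the original matrix.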
Expresses the number i in the given base as a vector; an inner function.
VectorizeBase(i, base, length)
i |
the non-negative number to be converted |
base |
the base to convert to |
length |
the length of the converted vector |
vector |
A vector with the given length, whose elements can be read as the number i with the given base. |
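The conversion can be sketched with repeated integer division. In the version below the most significant digit comes first and the vector is padded with leading zeros. The digit order used by the package's VectorizeBase is not stated in this documentation, so that ordering and the function name `vectorize_base_sketch` are assumptions.

```r
# Hypothetical illustration of a base conversion; not the package's VectorizeBase.
vectorize_base_sketch <- function(i, base, length) {
  digits <- integer(length)
  # Fill from the last position backwards with successive remainders.
  for (pos in rev(seq_len(length))) {
    digits[pos] <- i %% base
    i <- i %/% base
  }
  digits
}

vectorize_base_sketch(5, base = 3, length = 4)   # 0 0 1 2, since 5 = 1*3 + 2
```

If i is at least base^length, the digits that do not fit into the vector are silently dropped in this sketch.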