A quick example of partial least squares (PLS) regression with the pls package. A good description is given in R News:PLS, and the package itself is available here: PLS package.
First load the required packages and the example faahKO dataset. We'll create a peak list using the xcms pipeline.
library(xcms)
library(faahKO)
library(pls)
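## locate the example CDF files shipped with faahKO
cdfpath <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
## peak detection, grouping across samples, and filling in missing peaks
faahko <- xcmsSet(cdffiles)
faahko <- group(faahko)
faahko <- fillPeaks(faahko)
## intensity matrix: one row per peak group, one column per sample
values <- groupval(faahko)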
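Since groupval() returns peak groups in rows and samples in columns, the matrix has to be transposed before it can be used as a predictor block with one row per sample. A minimal sketch of that reshaping, using a small made-up matrix (values_toy, its row and column names, and the class coding are all hypothetical stand-ins, not faahKO data):

```r
# Hypothetical stand-in for groupval() output:
# 4 peak groups (rows) x 3 samples (columns)
values_toy <- matrix(c(10.1, 20.2, 30.3, 40.4,
                       11.0, 19.5, 31.2, 39.8,
                       25.3, 20.1, 12.7, 40.0),
                     nrow = 4,
                     dimnames = list(paste0("grp", 1:4),
                                     c("ko15", "ko16", "wt15")))

# Transpose so that each row is a sample and each column a peak group
X <- t(values_toy)

# One response value per sample (made-up class coding)
toy <- data.frame(class = c(1, 1, 2))

# Store the whole matrix as a single column of the data frame,
# so it can be referred to by one name in a model formula
toy$Met <- X

dim(X)        # 3 samples x 4 peak groups
names(toy)    # "class" "Met"
dim(toy$Met)  # the matrix column keeps its shape
```

Keeping the intensities together as one matrix column is what allows a formula such as class ~ Met to refer to all peak groups at once.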
Now that the intensity values from the peak list are stored in values, we can use the pls package. We regress the sample class on the metabolite intensities (Met). The class is converted to a number so that it can serve as the response; any other sample variable could be used in its place.
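## response: sample class coded as a number
val <- data.frame(class = as.numeric(sampclass(faahko)))
## predictors: samples in rows, peak groups in columns, stored as one matrix column
val$Met <- round(t(values), 4)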
faahko.pls <- plsr(class ~ Met, ncomp = 3, data=val, validation = "LOO")
plot(RMSEP(faahko.pls), legendpos = "topright")
dev.new()
plot(faahko.pls, ncomp = 3, asp = 1, line = TRUE)
dev.new()
## the scores and loadings can also be plotted directly
scoreplot(faahko.pls, comps = 1:3, identify = FALSE, type = "p", col = rep(c("red", "blue"), c(6, 6)), pch = 16)
dev.new()
loadingplot(faahko.pls, comps = 1:3, identify = FALSE, type = "p")
## use identify = TRUE if you want to identify points interactively;
## the selected points will be printed to the R console
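Because the response is a class coded as a number, the regression output is a continuous value rather than a class label. One simple way to turn it back into a label is to threshold the cross-validated predictions halfway between the two codes. The sketch below shows this on simulated data so that it runs without xcms; the object names (toy, toy.pls), the simulated intensities, the 1/2 coding and the 1.5 cut-off are all assumptions made for illustration, not part of the faahKO analysis.

```r
library(pls)

set.seed(1)
# Simulated stand-in for the peak table: 12 samples x 50 features
n <- 12
p <- 50
cls <- rep(c(1, 2), each = 6)        # assumed numeric class coding
X <- matrix(rnorm(n * p), nrow = n)
# Shift a few features in the second class so there is something to find
X[cls == 2, 1:5] <- X[cls == 2, 1:5] + 3

toy <- data.frame(class = cls)
toy$Met <- X                         # matrix column, one row per sample

toy.pls <- plsr(class ~ Met, ncomp = 3, data = toy, validation = "LOO")

# Leave-one-out predictions are stored as an array:
# samples x responses x number of components
cv_pred <- toy.pls$validation$pred[, 1, 2]   # 2-component model

# Assumed cut-off: halfway between the two class codes
pred_class <- ifelse(cv_pred > 1.5, 2, 1)

table(observed = cls, predicted = pred_class)
```

Using the cross-validated predictions rather than the fitted values gives a less optimistic picture, since each sample is predicted by a model that did not see it.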
Finally, to get numerical results for the cross-validation, along with the percentage of variance explained, print the summary of the fitted object.
summary(faahko.pls)
Data:    X dimension: 12 407
         Y dimension: 12 1
Fit method: kernelpls
Number of components considered: 3

VALIDATION: RMSEP
Cross-validated using 12 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps
CV          0.5455   0.2885   0.1602   0.1468
adjCV       0.5455   0.2875   0.1561   0.1421

TRAINING: % variance explained
       1 comps  2 comps  3 comps
X        60.65    72.91    78.83
class    77.10    97.13    99.20
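The (Intercept) column is the error of a model with no components, which predicts each left-out sample by the mean class of the remaining samples. Assuming the two classes are coded 1 and 2 with six samples each (as the colouring of the score plot suggests), this baseline can be reproduced by hand, and the CV values can then be read relative to it. In the sketch below, cls is that assumed coding and cv simply copies the numbers from the CV row of the summary.

```r
# Assumed class coding: six samples in each of two classes
cls <- rep(c(1, 2), each = 6)

# Leave-one-out prediction with no components:
# the mean of the other 11 class values
loo_pred <- sapply(seq_along(cls), function(i) mean(cls[-i]))

# Root mean squared error of prediction for the intercept-only model
rmsep0 <- sqrt(mean((cls - loo_pred)^2))
round(rmsep0, 4)   # 0.5455

# CV row copied from the summary output
cv <- c("(Intercept)" = 0.5455, "1 comps" = 0.2885,
        "2 comps" = 0.1602, "3 comps" = 0.1468)

# Percentage reduction in cross-validated error relative to the baseline
reduction <- 100 * (1 - cv / cv["(Intercept)"])
round(reduction, 1)
```

Read this way, the first component removes roughly 47% of the baseline error and two components about 71%, while the third adds only a couple of percentage points.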