Hi all,
I have a problem after the fillPeaks step. This is the code I am working with:
xset <- xcmsSet(files, method="centWave", snthresh=10, ppm=10, mzdiff=0.01,
prefilter=c(3,500), peakwidth=c(10,60), nSlaves=4)
idx<-which(xset@peaks[,"mz"] > 100 & xset@peaks[,"mz"] < 1000)
xset@peaks<-xset@peaks[idx,]
xsetR <- retcor(xset, method="obiwarp", profStep=0.1, plottype="deviation")
xsetR <- group(xsetR, bw=5, mzwid=0.025, minsamp=5)
xsetF <- fillPeaks(xsetR)
I have this warning message after the fillPeaks step:
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In .local(object, ...) :
getPeaks: Peak m/z:397.366577148438-397.368316650391, RT:364.471-378.604is out of retention time range for this sample (/home/cism/Documents/Controllo10.mzXML), using zero intensity value.
2: In .local(object, ...) :
getPeaks: Peak m/z:479.481033325195-479.483581542969, RT:364.471-391.31is out of retention time range for this sample (/home/cism/Documents/Controllo10.mzXML), using zero intensity value.
3: In .local(object, ...) :
getPeaks: Peak m/z:480.484039306641-480.487976074219, RT:364.471-392.215is out of retention time range for this sample (/home/cism/Documents/Controllo10.mzXML), using zero intensity value.
4: In .local(object, ...) :
getPeaks: Peak m/z:610.453125-610.457580566406, RT:364.471-383.212is out of retention time range for this sample (/home/cism/Documents/Controllo10.mzXML), using zero intensity value.
5: In .local(object, ...) :
getPeaks: Peak m/z:617.063537597656-617.069519042969, RT:364.471-376.196is out of retention time range for this sample (/home/cism/Documents/Controllo10.mzXML), using zero intensity value.
6: In .local(object, ...) :
getPeaks: Peak m/z:713.638122558594-713.643371582031, RT:364.471-398.001is out of retention time range for this sample (/home/cism/Documents/Controllo10.mzXML), using zero intensity value.
7: In .local(object, ...) :
getPeaks: Peak m/z:798.143920898438-798.152557373047, RT:364.471-376.5195is out of retention time range for this sample (/home/cism/Documents/Controllo10.mzXML), using zero intensity value.
......
So I think there is a specific problem with, for example, the file Controllo10.mzXML. However, when I cluster the RT deviations, the groups do not seem too different:
minlength <- min(sapply(xsetR@rt$raw, length ))
devs <-
sapply(1:length(xsetR@rt$corrected),
function(x){
(xsetR@rt$raw[[x]]-xsetR@rt$corrected[[x]])[1:minlength]})
colnames(devs) <- sampnames(xsetR)
ddevs <- dist(t(devs))
hdevs <- hclust(ddevs)
x11(); plot(hdevs)
The result of the clustering analysis is:
[Attachment: HCluster.png]
And this is the result of the retcor step:
[Attachment: RTDev.png]
Do you have any suggestions for solving this kind of problem? Should I discard the sample named in the warning messages from my dataset?
Best
Riccardo
P.S. These large RT shifts are due to the use of a HILIC LC column...
[attachment deleted by admin]
Does this file otherwise look normal when you look at the chromatogram? This error suggests that there is no data above 6 mins for this file.
Before I start work, I check all the files using a function by Paul Benton:
library(xcms)
CheckCDFfile <- function(file, type=".mzdata"){
  cat("\n")
  cat(paste("Loading File:", file, sep=""))
  xr <- xcmsRaw(file, profstep=0)
  for(i in seq_along(xr@scanindex)){
    scan <- getScan(xr, scan=i)
    # m/z values must be sorted within each scan; if not, write a fixed copy
    if(is.unsorted(scan[,"mz"])){
      cat(" x ")
      # ignore.case has no effect together with fixed=TRUE, so it is dropped
      newfile <- sub(type, "-Fixed.mzdata", file, fixed=TRUE)
      write.mzdata(xr, newfile)
      # keep a copy of the original under a .OLD name, then remove the original
      file.copy(file, sub(type, ".OLD", file, fixed=TRUE))
      unlink(file)
      return(1)
    }
  }
  # all scans were sorted
  cat(" O ")
  return(0)
}
###
sapply(files, CheckCDFfile)
All seems to be ok.
This doesn't seem to check whether the file got cut off somehow while still being a valid file. I am just asking you to open the file in whatever viewer you like and check that the chromatogram has data across the whole retention time range, as expected.
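The same check can be run across all files at once by comparing the scan-time range of each file. The sketch below is only an illustration in plain R: `scan_time_summary` is a hypothetical helper, the toy scan times are made up, and the 0.9 cut-off is an arbitrary assumption. With real data, each vector would come from `xcmsRaw(file)@scantime`.

```r
# Hypothetical helper: summarise first/last scan time per file and flag files
# whose run ends much earlier than the median run end (possible truncation).
scan_time_summary <- function(scantimes, min_fraction = 0.9) {
  first <- sapply(scantimes, min)
  last  <- sapply(scantimes, max)
  data.frame(
    file      = names(scantimes),
    first_rt  = unname(first),
    last_rt   = unname(last),
    truncated = unname(last < min_fraction * median(last)),
    stringsAsFactors = FALSE
  )
}

# Toy scan times in seconds (made up for illustration only).
toy <- list(
  "A.mzXML" = seq(360, 1200, by = 1),
  "B.mzXML" = seq(365, 1200, by = 1),
  "C.mzXML" = seq(362, 700,  by = 1)   # ends early
)
res <- scan_time_summary(toy)
print(res)
```

In this toy example only C.mzXML is flagged, because its last scan is well before the median run end. The `first_rt` column also shows at a glance whether the files start at different times.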
I saw that all chromatograms "start" at 6 minutes (360 s), and the problem concerns the RT window 364.471-392.215. Could this be the problem?
Moreover the files are ok, I visually checked them.
Probably they start at slightly different times. So it cannot look for peaks identified very early in some samples if that retention time region is missing in others. I would just ignore...
You can try this (will probably take a while):
# earliest scan time of each file
first_scan <- sapply(files, function(f) min(xcmsRaw(f)@scantime))
hist(first_scan)
To see where your files actually start.
A peak was found in some sample in the range 364.471 - 378.604. As your graph shows, not all files have data as early as 364.4, so fillPeaks cannot attempt to find that peak in those files and is forced to set the intensity to 0. I would just ignore it.
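To make that concrete, here is a minimal sketch of the range check in plain R. It is not the xcms implementation: `out_of_range` is a hypothetical helper and the scan ranges are assumed values, while the RT windows are copied from the first three warnings. A peak window counts as out of range when it starts before the sample's first scan or ends after its last scan.

```r
# Hypothetical helper, not part of xcms: TRUE where a peak's RT window is not
# fully covered by the sample's scan-time range.
out_of_range <- function(rtmin, rtmax, scan_start, scan_end) {
  rtmin < scan_start | rtmax > scan_end
}

# RT windows (seconds) copied from the first three warnings.
peaks <- data.frame(rtmin = c(364.471, 364.471, 364.471),
                    rtmax = c(378.604, 391.310, 392.215))

# Assumed scan ranges: one sample starting before the windows, one after
# the windows have already begun.
early_sample <- out_of_range(peaks$rtmin, peaks$rtmax,
                             scan_start = 360.2, scan_end = 1200)
late_sample  <- out_of_range(peaks$rtmin, peaks$rtmax,
                             scan_start = 366.0, scan_end = 1200)

print(early_sample)  # all FALSE: the windows are covered
print(late_sample)   # all TRUE: the windows start before the first scan
```

Under these assumptions, a sample whose first scan comes a second or two later than the others would trigger the warning for every peak that begins at 364.471, even though the file itself is fine.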
Just to chime in: there's a good explanation, which I copied over from the old wiki, that Steffen made. Link here:
http://metabolomics-forum.com/viewtopic.php?f=25&t=148