Problem with large data set and loop

Hi, I am having trouble working with a large data set (about 200 samples). In particular, I cannot create the xcmsSet:

Code: [Select]
xset <- xcmsSet(data.set, method="centWave",
                    polarity="positive", ppm=10, snthr=15,
                    peakwidth=c(4,15))

After running for a while (about 2 hours), I get errors such as:

Detecting mass traces at 10 ppm ...
 % finished: 0 10 Error in .local(object, ...) :
  m/z sort assumption violated ! (scan 376, p 63, current 100.9567 (I=1708.65), last 843.6795)


or

17_GS34_A.mzdata: Error in rampSIPeaks(rampid, scans, scanHeaders$peaksCount[scans]) :
  unexpected end of peak list
Calls: xcmsSet -> xcmsRaw -> rampRawData -> rampSIPeaks -> .Call


I have no idea what is causing this. Do you have any suggestions?

I also tried to use a loop to create an xcmsSet object:
Code: [Select]
for(i in 1:3){
    xset[i] <- xcmsSet(data.set[i], method="centWave",
                      polarity="positive", ## prefilter=c(3,5000),
                      ppm=10, snthr=1500, peakwidth=c(4,15))
    foo <- c(xset[i])
}

but the R console says: Error in xset <- xcmsSet(data.set, method = "centWave", polarity = "positive",  :
  object of type 'S4' is not subsettable


Best

 

Re: Problem with large data set and loop

Reply #1
Ricca,

For the m/z sort violation, you can try running the second code block below. Change type to what you need, e.g. .mzData, .mzML or .mzXML.

For the second problem I'm not sure. It sounds as if your mzXML/mzData files are corrupt. You could try this code:
Code: [Select]
rampid<-xcms:::rampOpen("MyFile.mzXML")
rampid
rampHead<-xcms:::rampScanHeaders(rampid)
head(rampHead)
raw<-xcms:::rampRawData(rampid)

Let us know how this goes.

Code: [Select]
## code for mz sort violation 
AllCDFs<-list.files(recursive=TRUE, pattern="cdf", ignore.case=TRUE, full.names=TRUE)
apply(AllCDFs, 1, CheckCDFfile, type=".cdf")

## Check one file: returns 1 if an unsorted scan is found (a copy is written as mzData), else 0
CheckCDFfile <- function(file, type=".cdf"){
    cat("\n")
    cat(paste("Loading File:", file, sep=""))
    xr <- xcmsRaw(file, profstep=0)
    for(i in seq_along(xr@scanindex)){
        scan <- getScan(xr, scan=i)
        if(is.unsorted(scan[,"mz"])){
            cat(" x ")
            ## write a copy as mzData and keep the original under a .OLD name
            ## (fixed=TRUE dropped: with it, sub() ignores ignore.case)
            newfile <- sub(type, "-Fixed.mzdata", file, ignore.case=TRUE)
            write.mzdata(xr, newfile)
            file.copy(file, sub(type, ".OLD", file, ignore.case=TRUE))
            unlink(file)
            return(1)
        }
    }
    ## no unsorted scan found in this file
    cat(" O ")
    return(0)
}
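
The heart of the function above is the is.unsorted() test on the m/z column of each scan. A minimal sketch of that test in base R, using a made-up two-column matrix in place of a real getScan() result (the numbers are illustrative only):

```r
# Toy scan with "mz" and "intensity" columns; the values are made up
scan <- cbind(mz        = c(100.95, 843.68, 250.10),
              intensity = c(1708.65, 50.00, 300.00))

# TRUE here means the m/z values are not in increasing order
unsortedBefore <- is.unsorted(scan[, "mz"])
print(unsortedBefore)

# Reordering the rows by m/z removes the violation in this toy matrix
sortedScan <- scan[order(scan[, "mz"]), , drop = FALSE]
unsortedAfter <- is.unsorted(sortedScan[, "mz"])
print(unsortedAfter)
```

This only illustrates the check itself; it does not touch any file on disk.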
~~
H. Paul Benton
Scripps Research Institute
If you have an error with XCMS Online please send me the JOBID and submit an error via the XCMS Online contact page

Re: Problem with large data set and loop

Reply #2
I tried the first code:
Code: [Select]
rampid<-xcms:::rampOpen("17_GS34_A.mzdata")
rampid
[1] -1
Code: [Select]
rampHead<-xcms:::rampScanHeaders(rampid)
Error in xcms:::rampScanHeaders(rampid) : invalid rampid

As for the second code you wrote, I am not sure how to use it, because it is written for .cdf files while I have .mzdata files.

Re: Problem with large data set and loop

Reply #3
Ricca,

Just change the cdf to what you need. It'll work for any data type as long as the file can be read into xcms. The function will be the same but calling it will be:
Code: [Select]
AllCDFs<-list.files(recursive=TRUE, pattern="mzdata", ignore.case=TRUE, full.names=TRUE)
apply(AllCDFs, 1, CheckCDFfile, type=".mzdata")

The rampid is odd.  :?  From memory it should be 0 or higher. You're loading the file that had the problem and not some other one? I would have a look with something else like OpenMS or mzViewer just to check the file loads.
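
Before suspecting the file contents, it may be worth ruling out a wrong path or an empty file. This is only a guess at one possible cause of the odd rampid, not a confirmed diagnosis. A small base R sketch, where checkPath is a hypothetical helper and the files are temporary stand-ins:

```r
# Hypothetical helper: basic sanity checks on a path before opening it
checkPath <- function(path) {
    c(exists   = file.exists(path),
      nonEmpty = isTRUE(file.info(path)$size > 0))
}

# Demonstration with a temporary file and a path that does not exist
tmp <- tempfile(fileext = ".mzdata")
writeLines("<mzData/>", tmp)
missing <- file.path(tempdir(), "no_such_file.mzdata")

print(checkPath(tmp))
print(checkPath(missing))
```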

Paul
~~
H. Paul Benton

Re: Problem with large data set and loop

Reply #4
Paul,

Quote
Just change the cdf to what you need.
I followed your instructions, but apply gives me an error:

Error in apply(AllCDFs, 1, CheckCDFfile, type = ".mzdata") : dim(X) must have a positive length

Quote
You're loading the file that had the problem and not some other one?
Yes

Quote
I would have a look with something else like OpenMS or mzViewer just to check the file loads.
I used MZmine and everything works.

I really have no idea... :?:

Ricca

Re: Problem with large data set and loop

Reply #5
Oops!

I used apply when I should have used sapply. So the code should be:
Code: [Select]
AllCDFs<-list.files(recursive=TRUE, pattern="cdf", ignore.case=TRUE, full.names=TRUE)
sapply(AllCDFs, CheckCDFfile)
It should work now. Sorry, Friday afternoon brain.
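
For reference, a small base R sketch of why apply failed and sapply works. The file names and checkStub are hypothetical stand-ins, so nothing here needs xcms or real data:

```r
# Hypothetical file names; list.files() returns a plain character vector like this
files <- c("a.mzdata", "b.mzdata")

# Hypothetical stand-in for CheckCDFfile that reports 0 for every file
checkStub <- function(file, type = ".cdf") 0

# apply() expects a matrix or array, so a plain vector raises an error
applyResult <- tryCatch(apply(files, 1, checkStub, type = ".mzdata"),
                        error = function(e) conditionMessage(e))
print(applyResult)

# sapply() walks over the vector and passes extra arguments through
status <- sapply(files, checkStub, type = ".mzdata")
print(status)
```

Note that extra arguments such as type can still be passed after the function name in sapply.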
~~
H. Paul Benton

Re: Problem with large data set and loop

Reply #6
Paul,
Now it works fine. All the files return 0, so there are no corrupted files... but the problem still remains.

Do you think it is possible that the problem is the PC? I work on a quad-core workstation with 7 GB of RAM running Ubuntu 10.04.

Re: Problem with large data set and loop

Reply #7
Ricca,

What converter did you use? From the 2nd error message it sounds like the converter didn't convert the files correctly!
2nd error message:
Quote
17_GS34_A.mzdata: Error in rampSIPeaks(rampid, scans, scanHeaders$peaksCount[scans]) :
unexpected end of peak list
If possible, I would remove these files and process without them. It's also worth trying another converter. The m/z sort violations should all be solved now, right?
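
One way to find which files to leave out is to try loading each one and record the failures. A rough sketch, where findBadFiles and stubReader are hypothetical helpers and 16_GS33_A.mzdata is a made-up file name. With xcms installed, the reader could be something like function(f) xcmsRaw(f, profstep=0), which is an assumption about your setup:

```r
# Hypothetical helper: try a reader on every file and keep the error message
# (NA means the file loaded without an error)
findBadFiles <- function(files, reader) {
    sapply(files, function(f) {
        tryCatch({
            reader(f)
            NA_character_
        }, error = function(e) conditionMessage(e))
    })
}

# Hypothetical reader that fails on one file, to mimic a corrupt conversion
stubReader <- function(f) {
    if (grepl("17_GS34_A", f)) stop("unexpected end of peak list")
    invisible(TRUE)
}

files <- c("16_GS33_A.mzdata", "17_GS34_A.mzdata")
status <- findBadFiles(files, stubReader)
goodFiles <- files[is.na(status)]

print(status)
print(goodFiles)
```

The files in goodFiles could then be processed on their own.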

Paul
~~
H. Paul Benton

Re: Problem with large data set and loop

Reply #8
Paul,
Maybe I have found the problem. I ran the R code on another Linux machine and so far I have not had any problems... I hope the problem is only with the computer.

I'm sorry I made you waste your time, and I would like to thank you for your help and your suggestions.

Best regards
Riccardo

P.S. I have a further question. I have no experience with S4 programming, and I see that the xcms library is written using S4 objects. Where can I find information about S4 object programming? For example, how can I write a simple loop like this one with S4 objects?

Code: [Select]
for(i in 1:3){
    xset[i] <- xcmsSet(data.set[i], method="centWave",
                      polarity="positive", ## prefilter=c(3,5000),
                      ppm=10, snthr=1500, peakwidth=c(4,15))
    foo <- c(xset[i])
}

Re: Problem with large data set and loop

Reply #9
Hey,

No problem, just happy we found the problem :)

In the code you wrote, what is 'data.set'? The xcmsSet method takes files as the first argument and doesn't need them subsetted. One xcmsSet holds multiple files, whereas an xcmsRaw object holds a single file.
The nice thing with xcms is that there are a lot of methods to extract the data you need. However, you can also access the slot directly. Here is some code for both:
Code: [Select]
library(xcms)
library(faahKO) ## package names are case-sensitive; this package provides the faahko example data
gxs<-group(faahko) ## I'll just use the example dataset to skip the xcmsSet method

pairs<-groups(gxs)
head(pairs) ## will give the mass/rt pairs
val<-groupval(gxs, "medret", "into") ## this returns the intensity values for each feature from each file

This code uses the object's methods and really is the way it should be done. However, sometimes it's just easier to use the slots themselves. Note, though, that the groupval method is the best way to get the intensity values for the features: there is no slot for them, as they are derived from the peaks and groupidx slots (see below).

Code: [Select]
names(attributes(gxs))## these are all of the slots so..
head(gxs@groups)
head(gxs@peaks)
class(gxs@groupidx)
length(gxs@groupidx)
gxs@groupidx[[1]]
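
On the loop question itself, here is a minimal base R sketch of why xset[i] <- ... fails and how a list can hold one S4 object per iteration. ToySet and makeToySet are hypothetical stand-ins for xcmsSet, and the file names are made up, so this runs without xcms:

```r
library(methods)

# Hypothetical toy class standing in for xcmsSet (not part of xcms)
setClass("ToySet", representation(file = "character"))

# Hypothetical stand-in for a call such as xcmsSet(files[i], ...)
makeToySet <- function(file) new("ToySet", file = file)

files <- c("a.mzdata", "b.mzdata", "c.mzdata")

# Indexing an S4 object with [ ] fails unless its class defines a "[<-" method
xset <- makeToySet(files[1])
res <- tryCatch({
    xset[2] <- makeToySet(files[2])
    "ok"
}, error = function(e) e)
print(inherits(res, "error"))

# A plain list can hold one S4 object per iteration
xsets <- vector("list", length(files))
for (i in seq_along(files)) {
    xsets[[i]] <- makeToySet(files[i])
}
names(xsets) <- files
print(sapply(xsets, function(x) x@file))
```

With xcms itself a loop is usually unnecessary, since one xcmsSet call can take all the files at once, as noted above.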

Good general guides to S4 classes are:
https://www.rmetrics.org/files/Meielisa ... alabi1.pdf
http://www.r-bloggers.com/resources-for ... d-methods/
~~
H. Paul Benton