Dear Forum,
We are running metabolomics experiments on an AB Sciex TripleTOF 5600 in DDA (data-dependent acquisition) mode. We process the data with R 3.6.0, MSnbase (2.9.5) and xcms (3.5.5).
This means the raw data files contain both MS1 and MS2 scans.
My question is how to get the correct number of scans per peak within one file. The raw data were read with:
readMSData(files = files, pdata = new("NAnnotatedDataFrame", pd), msLevel. = 1)
After peak detection, chromPeaks(object, bySample = FALSE, rt = numeric(), mz = numeric(), ppm = 0, type = "any") returns the following table:
               mz    mzmin    mzmax     rt  rtmin  rtmax      into      intb     maxo  sn egauss mu sigma  h  f dppm scale scpos scmin scmax lmin lmax sample is_filled
CP000001 185.0415 185.0409 185.0423 46.568 40.745 53.293 1623.3019 1605.9049 193.8009  25     NA NA    NA NA  6    1     9   169   160   178  148  185      1         0
CP000002 185.0419 185.0409 185.0429  3.887  0.724  6.577  763.4537  755.4926 170.2468  21     NA NA    NA NA  6    4     7    15     8    22    3   25      1         0
CP000003 512.8859 512.8845 512.8887 51.321 49.069 52.634  322.0898  319.1189 175.0130 174     NA NA    NA NA  7    8     7   182   175   189   87   93      1         0
CP000004 271.9464 271.9443 271.9484 51.321 48.780 53.293  303.1867  299.2378 142.4416 141     NA NA    NA NA  8    8     7   182   175   189   87   95      1         0
CP000005 385.9267 385.9250 385.9298 51.321 48.780 53.293  275.9011  271.9522 131.7186 131     NA NA    NA NA  9    5     7   182   175   189   87   95      1         0
CP000006 498.9059 498.9042 498.9077 50.666 49.069 53.293  256.1620  252.5414 133.5325 133     NA NA    NA NA 10    7     7   181   174   188   87   94      1         0
Is it okay to use the columns "scmin" and "scmax", i.e. to compute scmax - scmin, to get the correct number of scans for each peak,
or do we need to account for the MS2 scans in between, which would have to be left out of the count?
For DDA experiments, the question basically comes down to how the scans are numbered:
How are the MS1 scans numbered internally?
How are the MS2 scans numbered internally?
By the way, what do the columns lmin and lmax mean? I could not find them explained in the documentation of chromPeaks().
Thanks in advance for any answer.
Kind regards
Tony
Hi Tony,
I don't know the answer to your question, but you could probably test it by filtering out the MS/MS data with MSConvert. Process both the original and the filtered files and check whether the results are the same.
Alternatively, you could take a look at one of the files like this:
# Read a single data file; leave out msLevel. = 1 so that the MS2 scans are kept as well
raw_data <- readMSData(files = files[1], pdata = new("NAnnotatedDataFrame", pd[1, , drop = FALSE]))
# Extract the spectra
spec <- spectra(raw_data)
# For every spectrum, get its MS level, retention time and scan index
specl <- lapply(spec, function(x) c(msLevel = msLevel(x), rt = rtime(x), scanIndex = scanIndex(x)))
# Combine the list into a matrix with one row per spectrum
specl <- do.call(rbind, specl)
This creates a matrix with the MS level, retention time and scan index of every scan in the file. Cross-referencing a couple of peaks against it should quickly show whether scmin/scmax need to be adjusted. If they do, the same matrix can be used to calculate corrected scan counts.
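With such a matrix, you can count the MS1 scans inside each peak's retention-time window with a simple lookup. The sketch below assumes a matrix with columns named msLevel and rt (add them with colnames() if needed) and a peak table with rtmin/rtmax columns as returned by chromPeaks(). The helper count_ms1_scans and the toy scan layout are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: count MS1 scans whose retention time falls inside each
# peak's [rtmin, rtmax] window.
# 'scans': matrix/data.frame with columns msLevel and rt (one row per scan)
# 'peaks': matrix with columns rtmin and rtmax (e.g. from chromPeaks())
count_ms1_scans <- function(scans, peaks) {
  ms1_rt <- scans[scans[, "msLevel"] == 1, "rt"]
  vapply(seq_len(nrow(peaks)), function(i) {
    sum(ms1_rt >= peaks[i, "rtmin"] & ms1_rt <= peaks[i, "rtmax"])
  }, integer(1))
}

# Toy input (made up): an MS1 survey scan every 0.5 s, each followed by two
# MS2 scans, mimicking the interleaving in DDA data
ms1_times <- seq(0, 10, by = 0.5)
scans <- cbind(
  msLevel = rep(c(1, 2, 2), times = length(ms1_times)),
  rt      = rep(ms1_times, each = 3) + rep(c(0, 0.1, 0.2), times = length(ms1_times))
)
peaks <- cbind(rtmin = c(1, 4.2), rtmax = c(3, 6))

count_ms1_scans(scans, peaks)
# [1] 5 4
```

Counting by retention time means the result does not depend on how the scans are indexed. Any MS2 scans in between are dropped by the msLevel filter before counting.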
Regarding lmin and lmax: I am not too sure about these. They appear in the centWave code, where they seem to be the positions of the minima of the continuous wavelet transform (CWT) coefficients on either side of a peak. Interestingly, rtmin/rtmax appear to be derived from the same values.
I could be way off, but I think scmin/scmax are estimates of the peak bounds based on the CWT.
Since 'into' appears to be calculated over the bounds defined by lmin/lmax, these may be the better columns for you to use. Note, however, that these values are not always counted from scan 1.
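The table posted in the question allows a quick check. In all six rows, scmin and scmax are exactly scpos - scale and scpos + scale. This suggests that they describe a window of fixed size around scpos, set by the wavelet scale, rather than the integration range. The base-R snippet below copies the relevant columns from the question and compares the inclusive widths of both column pairs. It only checks these six peaks; it says nothing definitive about how centWave defines the columns.

```r
# Columns copied from the chromPeaks() table posted in the question
cp <- data.frame(
  scale = c(9, 7, 7, 7, 7, 7),
  scpos = c(169, 15, 182, 182, 182, 181),
  scmin = c(160, 8, 175, 175, 175, 174),
  scmax = c(178, 22, 189, 189, 189, 188),
  lmin  = c(148, 3, 87, 87, 87, 87),
  lmax  = c(185, 25, 93, 95, 95, 94),
  row.names = sprintf("CP%06d", 1:6)
)

# Is scmin/scmax always 'scale' scans on either side of scpos?
all(cp$scmin == cp$scpos - cp$scale & cp$scmax == cp$scpos + cp$scale)
# [1] TRUE

# Inclusive widths (both end points count as one scan each)
# sc_width: 19 15 15 15 15 15; l_width: 38 23 7 9 9 8
cbind(sc_width = cp$scmax - cp$scmin + 1, l_width = cp$lmax - cp$lmin + 1)
```

The scmin/scmax width is identical (2 * scale + 1) for all peaks sharing a scale. The lmin/lmax width varies from peak to peak, which fits the idea that only lmin/lmax follow the actual peak shape.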
I would be careful with the scmin/scmax and lmin/lmax columns; I do not recall exactly what they mean. By default we do not record which spectra the data of a chromatographic peak come from. However, since the retention time and m/z range are available, it is easy to subset/extract all spectra for a given chromatographic peak.
What exactly do you want/need to do with the data? Maybe there is a simple solution for that...
We simply need to know how many scans each chromatographic peak has, so that we can set the cycle time and mass range appropriately (at least on average).
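In DDA, each cycle consists of one MS1 survey scan followed by its MS2 scans, so the MS1 sampling interval equals the cycle time c. A peak of width W seconds is then covered by roughly W / c + 1 MS1 scans. Turned around, getting about N scans across a peak requires c <= W / (N - 1). The base-R sketch below applies this to the rtmin/rtmax values from the table in the question. The target of 10 scans per peak is an arbitrary, hypothetical choice, not a recommendation.

```r
# Peak widths in seconds, from the rtmin/rtmax columns posted in the question
rtmin <- c(40.745, 0.724, 49.069, 48.780, 48.780, 49.069)
rtmax <- c(53.293, 6.577, 52.634, 53.293, 53.293, 53.293)
width <- rtmax - rtmin

# Hypothetical target: number of MS1 scans wanted across a typical peak
target_scans <- 10

# Longest cycle time that still gives about 'target_scans' points across the median peak
max_cycle <- median(width) / (target_scans - 1)
round(max_cycle, 3)
# [1] 0.501
```

Using the median keeps one unusually broad peak (CP000001, about 12.5 s wide) from dominating the estimate. With real data, you would use the widths of all detected peaks.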
One possibility would be to first extract the ion chromatograms for all detected peaks and then count the number of data points in each:
## Subset to one file, assuming xdata is an XCMSnExp with identified chrom peaks
xdata_1 <- filterFile(xdata, 1)
chrs <- chromatogram(xdata_1, rt = chromPeaks(xdata_1)[, c("rtmin", "rtmax")],
mz = chromPeaks(xdata_1)[, c("mzmin", "mzmax")])
head(lengths(chrs))
## median number of scans for all peaks in the file
median(lengths(chrs))
Note: this should be done separately for each file, hence the filterFile step.