Hi All,
I am using XCMS for metabolomics analysis. The raw data come from a UPLC+QTOF machine. When I use the centWave method and set ppm to 30, the software always shows me the warning: "There are 5533 peak data insertion problems. Please try lowering the 'ppm' parameter." What range would be suitable for ppm with UPLC+QTOF data?
And for diffreport: after the t-test, do we still have all the data, or has it already been cut off at a default threshold?
When performing the annotation, can I just use the values in the "mzmed" column, taking the proton mass and delta mass into account? Do I need to calculate isotopes and different adducts?
anlin,
This means that your data has a higher resolution than the usual 30 ppm setting assumes and the samples have some very close peaks. I would have to try different values myself, but try 15, then 10, then 5 until the warning stops. Note that you can also go too low!
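To get a feel for what those ppm values mean in absolute terms, here is a minimal sketch in base R. The m/z of 500 and the helper name ppm_to_da are made up for illustration; it only converts a ppm tolerance into a mass window and does not run centWave itself.

```r
# Half-width of the m/z window (in Da) implied by a ppm tolerance.
# ppm_to_da is a hypothetical helper, not part of xcms.
ppm_to_da <- function(mz, ppm) mz * ppm / 1e6

mz <- 500  # hypothetical m/z value, for illustration only
for (ppm in c(30, 15, 10, 5)) {
  cat(sprintf("ppm = %2d -> +/- %.4f Da at m/z %g\n",
              ppm, ppm_to_da(mz, ppm), mz))
}
```

The window scales with m/z, so the same ppm setting is tighter in absolute terms for small masses than for large ones.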
I'm not fully sure what you mean. The only threshold you have applied is the minfrac parameter used during grouping. Its default is 0.5 (50%), meaning that a feature needs to be seen in at least 50% of the samples of at least one class.
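As a rough illustration of what the minfrac check looks at, the sketch below computes the fraction of samples per class in which one feature was detected and compares it with the threshold. The class labels and detection pattern are invented, and this is plain base R, not the grouping code from xcms.

```r
# Hypothetical detection pattern of ONE feature across 8 samples (made-up data).
classes  <- c(rep("control", 4), rep("case", 4))
detected <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

minfrac <- 0.5

# Fraction of samples in each class in which the feature was detected.
frac_per_class <- tapply(detected, classes, mean)

# Which classes reach the minfrac threshold on their own.
passes <- frac_per_class >= minfrac

print(frac_per_class)
print(passes)
```

Here the feature reaches the threshold in the control class (2 of 4 samples) but not in the case class (1 of 4 samples).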
I would have a look at the CAMERA package put together by Steffen Neumann's group. It's a very nice package and will identify your isotopes and adducts in the dataset. If something does not get identified with an adduct, then it will be difficult to simply use the mzmed value +H for identification purposes.
Hope it helps, let us know if I didn't really answer the questions :)
Cheers,
Paul
Hi Paul,
Thanks. Your reply is very helpful. For the second question, I just want to know the meaning of the t-test in diffreport. Does it measure the differential expression between the control and case groups?
anlin,
Yes, the (Welch's) t-test evaluates whether the two classes could come from the same distribution. A low p-value means the observed difference would be unlikely if they did. Given your chosen alpha (0.05, 0.01, 0.001, etc.), a p-value below it lets you say that the two classes come from different distributions. The t-statistic can be used to find which way around the difference is, i.e. which class is 'upregulated' or 'downregulated'.
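For a single feature, the same kind of test can be sketched with base R's t.test(), which uses the unequal-variance (Welch) form by default. The intensities below are made up, and the sign of the statistic depends only on the order in which the two groups are passed here; I am not claiming this matches the sign convention diffreport uses.

```r
# Hypothetical peak intensities of one feature in two classes (made-up numbers).
control <- c(1000, 1100, 950, 1050)
case    <- c(1500, 1650, 1400, 1580)

# t.test() defaults to var.equal = FALSE, i.e. Welch's t-test.
res <- t.test(case, control)

print(res$method)     # name of the test that was run
print(res$p.value)    # compare against your chosen alpha
print(res$statistic)  # positive here because mean(case) > mean(control)
```

Swapping the arguments to t.test(control, case) flips the sign of the statistic but leaves the p-value unchanged.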
Hope it helps,
Paul
Hi Paul,
That's really helpful, thanks. I have one more question. After completing the XCMS and CAMERA analysis, I got the annotated diffreport file. If I want to annotate the results based on m/z value, can I use the mzmed column as a query directly? Do I need to consider the mass of the proton and ppm?
anlin,
With CAMERA the results have already been annotated. If you mean putting a metabolite name to them, it depends on how and what you're using for identification. Searches such as Metlin allow for adduct searches, so you can put the mzmed directly into Metlin and tell it that it is an M+H ion (or whatever the CAMERA result is). KEGG searches will require the removal of the adduct to get the neutral mass.
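For the simplest case, a singly charged [M+H]+ ion, removing the adduct is just subtracting the proton mass. A minimal sketch in base R follows. The mzmed value and the helper name are hypothetical, and other adducts (e.g. sodium) or multiply charged ions need their own mass and charge handling.

```r
proton_mass <- 1.007276  # Da, mass of a proton

# Neutral mass for a singly charged [M+H]+ ion (hypothetical helper).
neutral_from_MH <- function(mz) mz - proton_mass

mzmed   <- 181.0707  # hypothetical mzmed value, for illustration only
neutral <- neutral_from_MH(mzmed)

# Search window around the neutral mass for a 5 ppm tolerance.
tol_da <- neutral * 5 / 1e6
search_window <- c(lower = neutral - tol_da, upper = neutral + tol_da)

print(neutral)
print(search_window)
```

The window can then be used as a mass range query in a database that expects neutral masses.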
Hope it helps, again let me know how you get on and if I answered everything :)
cheers,
Paul
Thanks again. I have combined xcms with CAMERA and tested them on the publicly shared TNTvsSHAM dataset. However, my local results have 10440 records, while xcmsOnline gives only 9242 records with the same parameters. I have pasted my code below. Would you please check it? Thanks :)
# Calling Libraries
library(xcms)
library(CAMERA)
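# myDir must be set to the folder that holds the mzXML files
# pattern is a regular expression, so escape the dot and anchor the extension
files <- list.files(myDir, pattern="\\.mzXML$", recursive=TRUE, full.names=TRUE)
print(files)
pd <- xcms:::phenoDataFromPaths(files)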
xset <- xcmsSet(files,method="centWave", nSlaves=3, prefilter=c(0,0), ppm=30, peakwidth=c(10,60), snthr=6,mzdiff=0.01)
xset <- group(xset)
## retcor.obiwarp {xcms}
xsetR <- retcor(xset, method="obiwarp",profStep=1,plottype = "deviation")
## group.density {xcms}
xsetR <- group(xsetR,bw=5, mzwid=0.025, minfrac=0.5, minsamp=1)
## fillPeaks-methods {xcms}
xset.finale <- fillPeaks(xsetR)
## annotateDiffreport {CAMERA}: diffreport (Welch t-test, unequal variance) plus annotation
report <- annotateDiffreport(xset.finale, sortpval=FALSE, nSlaves=3, sigma=6, perfwhm=0.6,
    cor_eic_th=0.75, graphMethod="hcs", pval=0.05, calcCiS=TRUE,
    calcIso=FALSE, calcCaS=FALSE, maxcharge=3, maxiso=4, minfrac=0.5,
    ppm=5, mzabs=0.015, quick=FALSE, psg_list=NULL, rules=NULL,
    polarity="positive", multiplier=3, max_peaks=100, intval="into",
    pval_th=NULL, fc_th=NULL)
write.csv(report,"report1.csv")
anlin,
I don't know what parameters you used in xcmsOnline. Are they identical settings? It seems a bit odd to have the prefilter set to 0 and 0. This would mean keeping peaks with 0 intensity in 0 scans, i.e. all noise and any signal at all. This could well be the reason for the difference. If the settings are identical, check that the xcms version is the same in xcmsOnline and in your local install.
Paul
Paul,
You are right, the prefilter should be set to other values. But I am a little confused about the definitions of "intensity" and "scan". Do they refer to the number of occurrences of one mass trace among the test samples?
Hi Paul,
In the log file of the TNTvsSHAM project, I found the information below:
6. Diffreport
class1 SHAM
class2 TNT
statistics.threshold.pvalue 0.001
statistics.diffReport.value into
I checked the XCMS user manual, but I did not find any parameter of the diffreport function that relates to "statistics.threshold.pvalue" or "statistics.diffReport.value". Do you know how to set these parameters?
anlin,
?findPeaks.centWave
#starting httpd help server
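This means that for each mass trace (i.e. a single feature in the sample the peak detector is currently looking at), the trace is only kept if it has at least k scans with an intensity of at least I. It has nothing to do with how many samples the trace occurs in. I agree the documentation could be written a bit more clearly.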
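The prefilter=c(k, I) rule can be sketched in a few lines of base R. This is only my reading of the documentation applied to a made-up mass trace, with a hypothetical helper name. It is not the code centWave actually runs.

```r
# Keep a mass trace only if at least k scans have intensity >= I.
# passes_prefilter is a hypothetical helper, not an xcms function.
passes_prefilter <- function(intensities, k, I) {
  sum(intensities >= I) >= k
}

# Made-up intensities of one mass trace over consecutive scans.
trace <- c(20, 45, 130, 220, 150, 60, 15)

print(passes_prefilter(trace, k = 3, I = 100))  # three scans reach 100
print(passes_prefilter(trace, k = 3, I = 200))  # only one scan reaches 200
print(passes_prefilter(trace, k = 0, I = 0))    # c(0, 0) lets every trace through
```

With k = 0 and I = 0 the condition is always met, which is why that setting passes noise on to the peak detector.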
Next, the parameters you're seeing are in xcmsOnline. These are only parameters to limit the reporting back of the information.
statistics.threshold.pvalue: 0.001. This is the p-value threshold they use for reporting, or probably in the cloud plot.
statistics.diffReport.value: This simply sets whether the reported peak value is the integrated peak intensity (into) or the maximal peak intensity (maxo).
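To make the two settings concrete, here is a small base R sketch on made-up data. The first part contrasts a maximum intensity with a simple trapezoid area for one toy peak (the integration xcms performs may differ in detail). The second part applies a p-value cut-off to a mock report table, assuming the p-values sit in a column called pvalue.

```r
# Toy chromatographic peak (made-up numbers): retention time in s, intensity.
rt        <- c(100, 101, 102, 103, 104)
intensity <- c(0, 50, 200, 50, 0)

maxo_like <- max(intensity)  # maximal peak intensity
# Trapezoid-rule area as a stand-in for an integrated intensity.
into_like <- sum(diff(rt) * (head(intensity, -1) + tail(intensity, -1)) / 2)

# Mock report table; the column name pvalue is an assumption.
mock_report <- data.frame(name   = c("M181T300", "M205T412", "M330T150"),
                          pvalue = c(0.0004, 0.03, 0.0009))

# Keep only rows below the reporting threshold.
significant <- mock_report[mock_report$pvalue < 0.001, ]

print(c(maxo_like = maxo_like, into_like = into_like))
print(significant)
```

The same subsetting line should work on a locally generated report if you want to mimic the reporting threshold yourself.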
Also please start new comment threads for new topics. It helps others find information to answer their questions.
Cheers,
Paul