Thanks Paul. This was a Waters Premier QTOF; I think it is about five years old. I checked the first handful of scans for both functions and I get the same scan times between CDF (from databridge) and mzXML (from masswolf) when viewed in mzMine. Which differences do you see?
Do you have a more exact mass? Without more digits it is impossible to tell for sure. I have observed [M-CO2-C3H6] at 86.0368. If you have the exact mass you can use the Rdisop package for R to get an idea of the possibilities (http://www.bioconductor.org/packages/2. ... disop.html).
Like this: decomposeMass(43.99, ppm=20, mzabs=0.01)
So the error you see is simply uncalibrated data unfortunately.
edit: I just checked with masswolf and remembered why I don't use that. You get calibrated data all right, but the lockmass scans are mixed in with your normal scans. Did you find a way to avoid/fix this?
The significance level for the venn diagram was 0.05 for p values corrected with the mt.rawp2adjp function with method BY; I believe you mentioned that one somewhere on this forum. For the other graphs uncorrected p values were used. I didn't do any filtering as such for the venn diagram, but of course I could only use feature pairs where both [M] and [M+1] were found. So it is based on 1874 features of my total of 6233. For the third graph I cut off some extremely high intensity peaks for clarity (late-eluting, very broad peaks).
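For anyone wanting to reproduce this kind of correction: base R's p.adjust() with method "BY" gives the same adjusted p values as multtest's mt.rawp2adjp(rawp, proc = "BY"), just in a different output layout. A minimal sketch with made-up p values:

```r
## Benjamini-Yekutieli correction of raw p values; the rawp vector here
## is invented purely for illustration.
rawp <- c(0.0004, 0.012, 0.03, 0.21, 0.44)
adjp <- p.adjust(rawp, method = "BY")

## Features that would enter a venn diagram at the 0.05 level:
significant <- adjp < 0.05
```

The BY method is the BH (false discovery rate) adjustment multiplied by sum(1/(1:n)), which makes it valid under arbitrary dependence between tests, at the cost of being more conservative.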
I have played a bit with Rdisop too, with mixed results. It seems to work very well if: 1) the molecule is rather small, and 2) you can establish the isotopic ratio well, i.e. the intensity is suitable. At low intensity you get random noise on [M+1]; at high intensity you get saturation of [M] and thus overestimate the ratio. Also the scoring function seems to punish mass inaccuracy quite harshly. So sometimes it is better to look at the mass deviation and the isotope ratio error separately and decide which seem reasonable to you, knowing your instrument.
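Judging the two criteria separately can be done by hand; a rough sketch (the masses, carbon count and measured ratio below are invented, and 0.0107 is the natural 13C abundance, so the expected [M+1]/[M] ratio from 13C alone is roughly n_carbon * 0.0107):

```r
## Mass deviation in ppm between a measured and a theoretical m/z.
ppm_error <- function(measured, theoretical) {
  (measured - theoretical) / theoretical * 1e6
}

## Approximate expected [M+1]/[M] intensity ratio from 13C alone
## (ignores other isotopes like 2H, 15N, 18O).
expected_M1_ratio <- function(n_carbon) {
  n_carbon * 0.0107
}

## e.g. a C6 candidate with made-up measured values:
dev       <- ppm_error(146.0582, 146.0579)
ratio_rel <- 0.071 / expected_M1_ratio(6)   # measured / expected ratio
```

You can then decide yourself whether, say, 2 ppm mass error is plausible for your instrument even if the isotope ratio is off, or vice versa.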
I would be interested to know if my data is particularly bad behaving or if you see similar results. I can send you the code I cooked up if you are interested.
Well, now you've got me curious. Paul, have you ever looked into how well correlated the p values are for [M] and [M+1]?
I gave it a go with my current dataset and was quite surprised that the relationship is not that nice... Take a look at some graphs: in red are pairs where the [M+1] has the lower p value. I should mention these are p values calculated with my own statistics script (since the study design is a bit more complicated than what is handled by xcms).
If I plot the ratio of the p values against the median intensity you can see that the ratio tends to be lower at higher feature intensity. [attachment=0:xjdt8uon]ratio_against_median.png[/attachment:xjdt8uon]
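For anyone who wants to make the same kind of plot from their own peak table, a minimal sketch (the data frame and its column names are hypothetical stand-ins for your matched [M]/[M+1] pairs):

```r
## Hypothetical table of matched isotope pairs: p value of [M], p value
## of [M+1], and the median feature intensity. Values are invented.
pairs <- data.frame(p_M        = c(0.001, 0.04, 0.2),
                    p_M1       = c(0.005, 0.01, 0.3),
                    median_int = c(1e4,   1e5,  5e3))

## log10 ratio: negative means [M+1] has the lower p value.
pairs$log_ratio <- log10(pairs$p_M1 / pairs$p_M)

plot(pairs$median_int, pairs$log_ratio, log = "x",
     xlab = "median intensity",
     ylab = "log10( p[M+1] / p[M] )")
abline(h = 0, lty = 2)   # pairs below the line favour [M+1]
```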
So the central question remains... if you consider the compounds significant or not.
I don't think there is any authoritative answer to that question. You have the same question with regard to fragments and adducts. I think you just have to choose for yourself, as long as you report what you did. I suggest you make some sort of informative plot to get a feel for how your data looks.
Maybe someone more experienced can jump in with some pointers on this.
Are you saying they are not in the peaklist at all? Or that they don't turn up significantly different between groups? How do you determine significance? Please describe what you did in more detail.
It is wholly possible that only isotopes turn up significantly different "by chance": the isotope slightly below the threshold you chose, the [M] slightly above. Imagine you have mean intensities of [M] 2 and [M+1] 0.2 in group one, and [M] 1 and [M+1] 0.1 in group two. In group two the [M+1] could be below the detection limit and thus set to 0, making it appear more significantly different than [M].
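A toy simulation of that scenario, where the [M+1] in group two falls below the detection limit and is replaced by 0 (all numbers are invented; this is only to illustrate the mechanism):

```r
## Group one: [M] around 2, [M+1] at ~10% of [M].
## Group two: [M] around 1, but [M+1] below detection limit -> set to 0.
set.seed(1)
g1_M  <- rnorm(5, mean = 2, sd = 0.3)
g2_M  <- rnorm(5, mean = 1, sd = 0.3)
g1_M1 <- g1_M * 0.1
g2_M1 <- rep(0, 5)          # "missing" values replaced by zero

p_M  <- t.test(g1_M,  g2_M )$p.value
p_M1 <- t.test(g1_M1, g2_M1)$p.value
```

Because the zeros have no variance at all, the [M+1] comparison can come out with a smaller p value than the [M] comparison even though the underlying fold change is the same.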
The CAMERA function findIsotopes tries to figure out which peaks are in fact the same molecular species but containing different isotopes of the atoms that make up the molecule. The peaks that belong together are then grouped and given an isotope group number; that is your 317. So you should have [M]- [M+1]- [M+2]- that belong together, with the [M]- being the "normal" one without extra neutrons. The others then correspond to the same molecular species containing one and two 13C atoms respectively, or possibly other isotopes like 37Cl.

All compounds (organic at least) exist naturally with different isotopes. But the [M] is usually (this depends on the atoms that make up the molecule and the size of the molecule) the one with the highest intensity (=more molecules exist with all atoms in their most "normal" isotope form). So in theory you should see isotopes for all peaks, but often they are below the detection limit of your instrument and you just see one peak (depending, of course, on the concentration of your samples).

The findIsotopes function, as far as I know, uses two criteria to predict which peaks are isotopes: 1) the mass difference has to account for an integer number of neutrons, 2) the ratio between the intensities needs to make some sense (this step uses very liberal criteria since it is impossible to know what the correct ratio is without knowing the molecule).
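To make the two criteria concrete, here is a toy check for a single [M]/[M+1] candidate pair (the real CAMERA code is much more involved; 1.003355 is the 13C-12C mass difference, and the tolerance and ratio window below are arbitrary, deliberately liberal values):

```r
## Toy version of the two findIsotopes criteria for one candidate pair.
is_isotope_pair <- function(mz_M, mz_M1, int_M, int_M1,
                            tol = 0.01, max_ratio = 0.6) {
  ## 1) mass difference must match one extra neutron (13C vs 12C)
  mass_ok  <- abs((mz_M1 - mz_M) - 1.003355) < tol
  ## 2) [M+1]/[M] intensity ratio must be plausible (very liberal window)
  ratio_ok <- int_M1 / int_M < max_ratio
  mass_ok && ratio_ok
}

is_isotope_pair(146.0579, 147.0612, 1e5, 7e3)   # a plausible pair
```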
Isotopes are important for several reasons, some of which are:
1) You can sometimes use the relative intensities of the isotopic peaks to help with structure elucidation.
2) If you don't realise that something is an isotopic peak and you try to figure out what compound it is, you will fail.
3) If you do statistical analysis you need to consider that you have several peaks (=variables) representing the same compound (this is true of fragments and adducts too).
You could check memory.limit() to see how much memory R can actually use on your system, and check the Task Manager in Windows to see how much is free. Since there should be enough memory for what it is asking, maybe it is a matter of cleaning up your R workspace before running this command. As far as I know R needs contiguous memory allocation, so saving the xsl object, restarting the computer, loading the object and running that command again might help.
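The save-restart-load workaround looks like this in practice (here a dummy list stands in for your xsl object, and the file path is arbitrary):

```r
## Stand-in for the real xcms object; replace with your actual xsl.
xsl <- list(peaks = matrix(1:6, ncol = 2))

f <- file.path(tempdir(), "xsl.RData")
save(xsl, file = f)

## ... restart R (or the whole machine), then:
load(f)            # restores the object under its original name, xsl
gc()               # also worth it to free what you can before retrying
## memory.limit()  # Windows only: maximum memory R may use
```

Running rm() on large intermediate objects followed by gc() before the failing command is often enough on its own, since it defragments what R can hand back to the OS.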