Topic: Extracting blank samples in a discovery batch

Extracting blank samples in a discovery batch

Assume you run a discovery metabolomics study with plasma samples and include a water sample as a blank. Will that blank influence the m/z features detected? I have noticed that XCMS rarely outputs zeros in the resulting data matrix, and I wonder whether this is because it integrates noise in an m/z window for missing values or because it ignores features with missing values. In other words: what happens if the plasma samples have a peak that passes the SNR threshold and other criteria, but the water blank doesn't have this peak? Will the feature be ignored, or will XCMS try to integrate the water sample's missing value?

Re: Extracting blank samples in a discovery batch

Reply #1
Good and important question.

The answer depends on the settings for the group function (?group.density). The interesting parameters are minfrac and minsamp. Remember that they are applied per group, so the outcome depends on how you have grouped your data (sample groups, not feature groups). Usually you group your samples/files by putting them in different sub-folders; otherwise they are all in one group/folder.
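As a rough illustration of the per-group rule described above, here is a minimal Python sketch. It is not xcms code: the function name, the dictionaries and the default values are hypothetical, and the rule is a simplified reading of the parameter definitions quoted in this thread (minfrac as a fraction of a sample group, minsamp as an absolute number of samples in that group, both of which have to be satisfied in at least one group).

```python
def feature_is_kept(found, group_sizes, minfrac=0.5, minsamp=1):
    """Return True if the feature passes in at least one sample group.

    found       : dict, sample-group name -> number of samples with a peak
    group_sizes : dict, sample-group name -> number of samples in the group
    minfrac     : minimum fraction of a group's samples that must have the peak
    minsamp     : minimum absolute number of samples in that group
    (Illustrative defaults; simplified rule, not the xcms implementation.)
    """
    for group, n_samples in group_sizes.items():
        n_found = found.get(group, 0)
        # Both conditions must hold within the same sample group.
        if n_found >= minfrac * n_samples and n_found >= minsamp:
            return True
    return False


# Hypothetical study: 10 plasma samples and 2 water blanks as separate groups.
group_sizes = {"plasma": 10, "blank": 2}

# Peak found in 9 of 10 plasma samples and in no blank: kept via the plasma group.
kept_plasma_only = feature_is_kept({"plasma": 9, "blank": 0}, group_sizes)

# Peak found in a single blank only: 1 of 2 already meets minfrac = 0.5.
kept_single_blank = feature_is_kept({"plasma": 0, "blank": 1}, group_sizes)

# Raising minsamp to 2 rejects that single-blank feature.
kept_single_blank_strict = feature_is_kept(
    {"plasma": 0, "blank": 1}, group_sizes, minsamp=2
)
```

Under these assumptions, a peak that is absent from the blank does not stop the feature from being kept, while a very small group such as the blanks can let features through on a single detection unless minsamp is raised.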
Blog: stanstrup.github.io

Re: Extracting blank samples in a discovery batch

Reply #2
Jan, thanks for your reply. First let me apologize for my naivete as I am far more familiar with multi-parameter apLCMS and xMSanalyzer. I am coming to like the performance of xcms quite a lot but still trying to figure out the nooks and crannies.

Quote from: "Jan Stanstrup"
The answer depends on the settings for the group function (?group.density). The interesting parameters are minfrac and minsamp. Remember that they are applied per group, so the outcome depends on how you have grouped your data (sample groups, not feature groups). Usually you group your samples/files by putting them in different sub-folders; otherwise they are all in one group/folder.

I did not appreciate what group.density is doing in xcms. Let me see if I understand how to use these. (For the record, I have never worked with data by grouping into sub-folders. I usually extract an entire directory into one matrix, just like processing a set of Affymetrix .CEL files with Robust Multi-array Average (RMA). If this is not the standard approach with xcms, I may need additional education.)

If minfrac is defined as 'minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group', I suppose you would want to set the minfrac = # samples within smallest group / # of total samples. So if you had 5 cases, 5 controls and 2 water blanks, would you ignore the water blank and set minfrac to 5/12 (round down to 0.40), or include the blanks as a group and use 2/12 (round down to 0.15)? I am assuming all files are in the same sub-folder and extracted together. Or are you suggesting instead to extract samples per condition-group, so you'd extract the 5 cases, 5 controls and 2 blanks as 3 data matrices. But then how to combine them...?

As for minsamp, 'minimum number of samples necessary in at least one of the sample groups for it to be a valid group', is the purpose of this to allow xcms to define groups? In my above example, would I use 2/12 or 5/12?

Re: Extracting blank samples in a discovery batch

Reply #3
You only really need to use the data grouping if you want to use the stats part of XCMS. As for what is best to do with group I am not that sure myself. Personally I usually don't group the data and just do the kind of calculations you have done.

Some thoughts to consider:
If you expect your groups to be very different then it might be easier to do the alignment/grouping sensibly by grouping the data.
Also, if for example you have many groups and you do the calculations you did you might have a very low threshold for inclusion that could let a lot of noise in.
I would recommend not processing your samples together with your blanks if possible, as the blanks often cause a lot of trouble for the alignment (since they contain so few features).

Quote
I suppose you would want to set the minfrac = # samples within smallest group / # of total samples.
Yes, but set it a bit lower so that a feature can be missed in a few samples of the smallest group too.

Quote
include the blanks as a group and use 2/12 (round down to 0.15)?
But if you do that, a feature would only need to be found in 15% of all the samples, which might be a bit too liberal. minfrac and minsamp should work as an AND, so you could also just set minfrac low and use minsamp to control what happens in the "real groups". That might be easier to do sensibly without grouping the data, if you have similar-sized groups that is. Otherwise grouping the data starts to make the most sense.
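To make the arithmetic concrete for the ungrouped case (all 12 files in one sample group), here is a small Python sketch. It assumes, as described above, that both thresholds have to be met; the function name is hypothetical and minsamp = 4 is only an illustrative value, not a recommendation.

```python
import math


def min_samples_required(n_samples, minfrac, minsamp=1):
    """Smallest number of samples in which a feature must be found.

    Assumes minfrac and minsamp combine as an AND, so the stricter of the
    two thresholds decides. Simplified sketch, not xcms code.
    """
    return max(math.ceil(minfrac * n_samples), minsamp)


n_total = 12  # 5 cases + 5 controls + 2 water blanks, all in one group

need_at_040 = min_samples_required(n_total, 0.40)  # 5 samples
need_at_015 = min_samples_required(n_total, 0.15)  # 2 samples

# Low minfrac, with minsamp doing the real work for the "real groups".
need_low_frac_with_minsamp = min_samples_required(n_total, 0.15, minsamp=4)  # 4 samples
```

With minfrac = 0.15 alone, any 2 of the 12 files are enough, which is the "too liberal" case; adding a minsamp close to the size of the real groups brings the requirement back up without having to group the data.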

Quote
Or are you suggesting instead to extract samples per condition-group, so you'd extract the 5 cases, 5 controls and 2 blanks as 3 data matrices. But then how to combine them...?
No, I am not suggesting that you process the datasets separately. Just for the grouping step, you should consider whether you want to keep features that are only found in a subset of the samples. If you expect such features, you can either group the data and set minfrac and minsamp more strictly or, if you don't group the data, make the considerations/calculations you did.

In my mind it only really matters if you have very different groups, so that you'd expect some features to be absent from one group. Then you might be able to be a bit stricter (and thus get less noise) within each group than you would otherwise have to be with the considerations you just did.
Otherwise the calculations you did are how I would think about it. Just set the requirements a bit lower so that a feature does not have to be found in exactly all samples within a group.

Re: Extracting blank samples in a discovery batch

Reply #4
A colleague pointed out that many instruments accumulate ions before injection, so blanks may not be quantitatively viable (as blanks with low total ion abundance will accumulate more signal from the background noise). Maybe this means efforts to include them in the xcms extraction are foolhardy (at least, for said instruments). Has anyone here used blanks in discovery experiments?