Defining Column Headings

Let me preface my questions by noting two ways in which my case departs from the usual one.

1) - I am not a chemist or biochemist. I am a computer scientist working for a biochemist, attempting to determine whether this tool will meet my employer's needs for our limited statistical tests.
2) - Our project is dealing with Mass Spectrometry of Lipids, not Proteins. Much of the software I have found is specifically geared towards dealing with Proteins, but this tool may be more general.

While trying to work out what each column means in the output of a comparison of two mzData files produced by MassHunter (compared using the HPLC/Q-TOF protocol), we found that we did not have a good understanding of the following columns:

Feature - Assumed to mean the order in which a particular m/z value was discovered. It is not clear whether this is the case, though.
   
Fold Change - This appears to be more or less the ratio of the Control(x?) and Experiment(x?) columns. What those columns mean, and by extension what this one means, is unclear.
       
m/z - I am aware that this is the mass/charge ratio, but the quantity is reported with very high precision, far more than can be accurate. At what point does the software cut off the precision, if at all? Does it ignore differences after a certain point?
   
Retention Time - Is this the actual Retention Time or the Corrected Retention Time? If it is corrected, what is the procedure it is using?
   
MaxInt - The label for this column is ambiguous: "Absolute Maximum Intensity for all Features in this row". I assume this means the highest positive or negative intensity value, but what is a row in this context?
   
dataset(x?) - The label for this column is ambiguous, "Mean Feature intensities for dataset #". Once again, the meaning of feature in context comes up.

Any help in nailing down what these terms mean would be much appreciated. I am fairly certain this software can meet our needs, which are: given a specific retention time, for a specific m/z intensity peak, find the m/z intensity peak that matches it in another data set and compare them to determine whether the difference is statistically significant.

Thank you

Re: Defining Column Headings

Reply #1
I guess the rollover info for each column should have more detailed information.

Feature = arbitrary number, determined by the original ordering of the feature table, normally ordered by p-value

fold change = mean fold change = mean(control) / mean(experiment)

e.g. Control(x) (name depends on your sample group name) = the mean value of the feature intensities for your control group.
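
As a minimal sketch of the definition above (not the XCMS implementation), the mean fold change can be computed from per-sample intensities as follows. The function name and the sample values are made up for illustration.

```python
from statistics import mean

def mean_fold_change(control, experiment):
    """Ratio of group means: mean(control) / mean(experiment)."""
    return mean(control) / mean(experiment)

# Hypothetical integrated intensities for one feature, one value per sample
control = [2.0, 4.0]
experiment = [1.0, 2.0]
print(mean_fold_change(control, experiment))  # 2.0
```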

m/z - cutting off or rounding is a decision that you have to make based on how much you trust your instrument's accuracy.
See the documentation; depending on the algorithm, the m/z is normally calculated as a weighted average.

Retention time will be corrected, if you use retention time correction. See tab "Retention time correction". The algorithm can be chosen here, e.g. OBIWarp.

MaxInt shows the maximum peak intensity of that feature across all samples: the highest absolute intensity, not the integrated intensity reported in Control(x) or Experiment(x).
The main purpose of this column is to help decide whether performing an MS/MS experiment on that compound would be feasible.

Quote
I am fairly certain this software can meet our needs, which are: given a specific retention time, for a specific m/z intensity peak, find the m/z intensity peak that matches it in another data set and compare them to determine whether the difference is statistically significant.

Yes, that is what XCMS is doing.
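
The matching step described in the quote can be pictured with a simple tolerance-based sketch. This is not the XCMS grouping algorithm; the function name, the tolerances, and the peak values are assumptions chosen only for illustration.

```python
def find_match(target, peaks, mz_tol=0.01, rt_tol=10.0):
    """Return the peak closest in m/z to target within both tolerances, or None.

    Peaks are (mz, rt_seconds, intensity) tuples; tolerances are hypothetical.
    """
    mz, rt, _ = target
    candidates = [p for p in peaks
                  if abs(p[0] - mz) <= mz_tol and abs(p[1] - rt) <= rt_tol]
    return min(candidates, key=lambda p: abs(p[0] - mz), default=None)

dataset_a_peak = (760.585, 612.0, 15000.0)
dataset_b = [
    (760.588, 615.5, 9800.0),   # within both tolerances
    (760.586, 900.0, 400.0),    # m/z matches, retention time does not
    (782.567, 611.0, 22000.0),  # retention time matches, m/z does not
]
print(find_match(dataset_a_peak, dataset_b))  # (760.588, 615.5, 9800.0)
```

Once peaks are paired like this across all samples, the per-group intensities of each pair are what the statistical test compares.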

Ralf.

Re: Defining Column Headings

Reply #2
Thank you. Some of the labels still seem ambiguous though.

If I sort by Retention Time and look at the p-value, this should tell me the statistical significance of the difference between the peaks in the control group and the experimental group. If this is correct we are, as they say, in business.

Re: Defining Column Headings

Reply #3
We have successfully started generating statistical data, but the results contradict our understanding of the columns.

Our entry in question had a fold change of 2,434.4, but had a p value of 0.499, which is statistically insignificant. What is the test that is being used? We expected that a high fold change would indicate statistically significant differences between two sets of data. These tests were run using the "into" test.

Thank you

Re: Defining Column Headings

Reply #4
p-values are calculated from a two-sample Welch t-test (unequal variances).

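The fold change (mean fold change, see above) has nothing to do with the t-test: it compares only the group means, whereas the t-test also accounts for the variance within each group, so a large fold change can still come with a large p-value.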
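
To illustrate how a large fold change and a weak test result can coexist, here is a minimal sketch using only the Python standard library. It computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom; the intensities are invented, and this is not the code XCMS runs.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical intensities: one extreme sample dominates the control mean
control = [10000.0, 12.0, 9.0, 11.0]
experiment = [1.0, 1.2, 0.9, 1.1]

fold = mean(control) / mean(experiment)
t, df = welch_t(control, experiment)
print(f"fold change = {fold:.1f}, t = {t:.2f}, df = {df:.1f}")
```

Here the fold change is above 2000, yet the t statistic is only about 1 because the control group's variance is huge. The p-value follows from a Student's t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole Welch test.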

I would suggest that you filter your results by p-value (e.g. < 0.01) first,
and then order by fold change, so you can prioritize significantly dysregulated metabolites with high fold changes.
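
A minimal sketch of that filter-then-sort step, assuming the results have been loaded as a list of rows. The column names and values are hypothetical and do not reflect the actual export format.

```python
# Hypothetical rows from a results table; column names are illustrative
rows = [
    {"feature": 1, "fold": 2434.4, "pvalue": 0.499},
    {"feature": 2, "fold": 3.1, "pvalue": 0.004},
    {"feature": 3, "fold": 12.7, "pvalue": 0.0002},
    {"feature": 4, "fold": 1.4, "pvalue": 0.03},
]

def prioritize(rows, alpha=0.01):
    """Keep rows with p-value below alpha, then sort by fold change, largest first."""
    significant = [r for r in rows if r["pvalue"] < alpha]
    return sorted(significant, key=lambda r: r["fold"], reverse=True)

ranked = prioritize(rows)
print([r["feature"] for r in ranked])  # [3, 2]
```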

Ralf.

Re: Defining Column Headings

Reply #5
Thank you for the suggestion. I have gotten the statistical data that the biochemist was looking for. Now they want to run a 9 by 9 comparison, which will exceed the XCMSOnline data size limit per account by quite a significant margin, so I have to find a way to get the same results from the XCMS R package. Does the offline program produce a Diffreport in the same format as the online version?

I assume it does not handle .d directories. Should we expect the same results with a .d directory as with a .mzdata.xml file?

Thank you

Re: Defining Column Headings

Reply #6
Hey Joklein,
did you ever figure out how to use the offline version? I have the same problem you do, and consequently the same questions. Thanks

Re: Defining Column Headings

Reply #7
There should be a link at "Account" where you can request additional storage space.