Tandem MS/MS .d files and Proteowizard

Hi Everyone,

I am trying to convert Agilent .d files to .mzXML files using ProteoWizard (for eventual use in xcms). When viewing the .d files in SeeMS, the MS Level column contains only '2'. When I convert the files to .mzXML with MSConvert and load them in xcms, I receive errors that there is no MS1 data. Does anyone have experience reading .d files acquired in tandem MS/MS mode with the ProteoWizard set of tools? When viewing the chromatograms in MassHunter, precursor ions are clearly distinguished from fragments, so I am not sure why the ProteoWizard tools fail to recognize the MS1 scans.

Thanks.

Re: Tandem MS/MS .d files and Proteowizard

Reply #1
It is unclear what you are trying to achieve. It doesn't sound like an msconvert issue to me. Do you want to ignore the MS2 data and do profiling on the MS1 data in the file, or do you want to do profiling on the MS2 data?
If the former, you should use --filter "msLevel 1" with msconvert to get only the MS1 data.

The error you report suggests that you are trying to use xcmsSet on MS2 data? I think you can set mslevel=2 in xcmsSet if you want to do that, though I have not tried it myself. You can use xcmsRaw with includeMSn=TRUE on one file to see how xcms sees your data.
Blog: stanstrup.github.io

Re: Tandem MS/MS .d files and Proteowizard

Reply #2
Hi Jan,

Thank you for your reply. I should have started off with my target, and why I think there may be a problem with reading the data files. I do want to do profiling on MS2 data. When I do xcmsRaw(file, includeMSn=TRUE), I receive the warning message: In `profStep<-`(`*tmp*`, value = 1): MS1 scans empty. Skipping profile matrix calculation.

My goal is to use precursor ion and fragment data to align peaks across many data files, and eventually use metaXCMS. When I use xset<-xcmsSet(mslevel=2), I get the warning message:
 In split.xcmsRaw(object, f=object@msnLevel==mslevel): MSn information will be dropped

Despite this warning, I still get peak grouping when I continue. (I use xset1<-group(xset,bw=10); xset2<-retcor(xset1,family="symmetric",plottype="mdevden"); xset3<-group(xset2,bw=10)). The final error that I can't get past is at the fillPeaks step (xset4<-fillPeaks(xset3)), where I get the following error:
Error in buf[bufidx[imz[1]:imz[2]],iret[1]:iret[2],drop=FALSE]: subscript out of bounds
In addition: Warning message:
In FUN(X[[1L]],...): (corrected) retention time vector length mismatch for file.mzXML

Because of the repeated warning messages of missing MS1 data, I thought compatibility between .d and SeeMS may be the root of the problem.  Since viewing the raw Agilent .d files in SeeMS seems to show no MS1 data under the ‘MS Level’ column, I thought I would try this route in order to resolve the warning message after the xcmsSet step.   

Thanks.

Re: Tandem MS/MS .d files and Proteowizard

Reply #3
As far as I know, xcms was never built to use MS2 data for profiling in any smart way. As I suggested, I thought it was possible to force it to read MS2 data as if it were MS1 data, but it seems I was wrong.

What you can do is first make the MS2 data look like MS1 and then do the profiling. I have done this myself before. It goes like this:
For each file, read the data with xcmsRaw, convert it to look like MS1 with msn2xcms from the MetShot package, and then write it to a file with write.mzdata. With the new files you should be able to profile normally with xcmsSet.

You will of course lose the MS2 information.


Some other questions:
Does your file contain only MS2 or also an MS1 trace? It seems you are expecting an MS1 trace, or are you thinking of the precursors as MS1? That would not be the way most programs think of MS1.
Is the MS2 data-dependent, or are you doing a targeted analysis?

Re: Tandem MS/MS .d files and Proteowizard

Reply #4
Thanks Jan, I will try using the MetShot package.

I'm not sure if I am fully answering your question. I collected the data in auto MS/MS mode, so yes, my understanding is that the MS2 data is dependent on the results obtained from the MS1 scans. I think the data we have is a full MS1 trace, where the MassHunter software automatically identifies the precursor ions.

Thanks.

Re: Tandem MS/MS .d files and Proteowizard

Reply #5
In that case it doesn't really make much sense to do the profiling on the MS2 data. Sometimes ions will be selected for MS2, sometimes they won't. I guess the number of ions picked might also be different so the intensities probably can't be compared either. I am sure there are also many other reasons you won't get a useful comparison.

What you should probably do is use the MS1 trace to do the profiling. Then you can use the MS2 in your vendor software to help identification.
To get a file with only MS1 data you should be able to do as described here: viewtopic.php?f=8&t=572#p1781

Re: Tandem MS/MS .d files and Proteowizard

Reply #6
Thanks for your help. I had tried using the MS Level 1 filter before, but the files generated are empty. However, the msn2xcmsRaw conversion does seem to be working well. The data in the mzData files looks complete when viewed with plotChrom.

But a new, possibly related, problem is coming up. The final diffreport doesn't contain all the data I think it should. Each xcmsRaw object contains about 2600 mass spectra, but only 10% of these show up in the diffreport file. This would be OK if the data that seems most important by eye were in the data set, but these are absent. The .d files contain very clear peaks, with 2-3 abundant m/z values. Instead of these abundant compounds, which are found in nearly all of our samples, the diffreport shows many tiny, seemingly unimportant fragments.

I used the default xcmsSet(files). I have been changing some of the default values but haven't had luck yet (I set fwhm = c(5,20)). Are there any parameters you suspect need to be changed? We have many different compounds with similar m/z (300.9 is an important fragment at several different retention times). We're using a Q-TOF with a mass accuracy of 2-5 ppm. These are 60 minute runs at a 0.2 mL/min flow rate.

Re: Tandem MS/MS .d files and Proteowizard

Reply #7
The diffreport contains picked peaks, not spectra, so I don't see how the number of mass spectra should relate to it.

I would at least set:
method =  'centWave'
profparam= list(step=0.005)

and prefilter to c(3,X), where X is something appropriate for your system (it should correspond to the lowest intensity you want to be considered for a peak). Also set peakwidth to something appropriate. See ?findPeaks.centWave. You can perhaps lower the ppm from the default of 25 ppm, but I doubt it will have a big effect. You should always set it well above your machine's theoretical accuracy, as the specification is always far on the optimistic side compared to real-world experiments.
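For a sense of scale, here is a small sketch of what a ppm tolerance means in absolute m/z units at the 300.9 fragment mentioned earlier in the thread. It is plain Python arithmetic rather than xcms code, and the helper name ppm_to_da is only for illustration.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute m/z window (in Da) for a relative tolerance given in ppm."""
    return mz * ppm * 1e-6


# Compare the centWave default (25 ppm) with tighter settings at m/z 300.9.
for ppm in (25, 10, 5):
    print(f"{ppm:>2} ppm at m/z 300.9 = {ppm_to_da(300.9, ppm):.4f} Da")
```

Even the 25 ppm default is a window of less than 0.01 Da at this mass, which is one way to see why tightening it further is unlikely to change much.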

Often if peaks are missing, however, it is a problem in your "group" step. Be aware that minsamp and minfrac are counted in each sample class separately. mzwid in "group" can probably be lowered significantly from the default of 0.25. 0.05 or even lower might be appropriate for you. You need to play with the settings here too.
Also, if you haven't already, you need to use fillPeaks.
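To make the per-class counting of minsamp and minfrac concrete, here is a rough sketch. It assumes the rule is that a group is kept when at least one class, on its own, satisfies both thresholds. The function keep_group and the class names are hypothetical, and this is Python for illustration only, not xcms code; see ?group.density for the actual behaviour.

```python
def keep_group(peaks_per_class, class_sizes, minfrac=0.5, minsamp=1):
    """Return True if any single class satisfies both thresholds.

    peaks_per_class: class name -> number of samples with a peak in the group
    class_sizes:     class name -> total number of samples in that class
    """
    return any(
        n >= minsamp and n >= minfrac * class_sizes[cls]
        for cls, n in peaks_per_class.items()
    )


sizes = {"control": 8, "treated": 8}

# Peak found in 2 of the 8 treated samples and in no controls.
print(keep_group({"control": 0, "treated": 2}, sizes, minfrac=0.25, minsamp=2))

# Peak found in 1 control and 1 treated sample: 2 samples overall,
# but neither class reaches the thresholds by itself.
print(keep_group({"control": 1, "treated": 1}, sizes, minfrac=0.25, minsamp=2))
```

Under this assumption, two hits spread across two classes do not count the same as two hits within one class.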

I still don't think using msn2xcmsRaw is right for you. If your files contain an MS1 trace you need to use that in my opinion. If you really only have a data dependent MS2 trace I don't see how you could sensibly compare samples using that.

Re: Tandem MS/MS .d files and Proteowizard

Reply #8
Quote
Be aware that minsamp and minfrac are counted in each sample class separately.

Hi Jan,

I'm interested to know how xcms counts sample classes. Should we create as many class folders as there are sample classes to tell xcms what our classes are? What if there are subclasses?
Is there a relationship between minfrac and minsamp?

For example, say I have 16 samples in 2 classes, so I create 2 folders with 8 samples each. For my study, a feature should be a valid group if it is seen in at least 2 samples of any one class. Should minfrac then be set to 0.25 and minsamp to 2? And how does xcms handle features detected in both classes?