Wiff file issues

Hello,

I've got some strange problems with .wiff files from an AB SCIEX triple-quad machine that I'm trying to analyze. The files contain multiple MS levels (we need to exclude a contaminant ion, so we analyze m/z 50-180 and 190-1200, both at 2 different energies), as well as the trace from the DAD detector, making 5 traces. I can analyze this data fine in (proprietary) software such as MarkerView, but I'd prefer to use xcms since I need features that are not available in MarkerView.

Now, I've tried to upload the .wiff and the .wiff.scan files and submitted a job, but I get this error:

Quote
There were no features detected in at least one of your samples. Please check feature detection settings and make sure the selected method is applicable for your data.

However, if I look in PeakView, the AB SCIEX software that can read .wiff files, I can clearly see features. An image of a randomly chosen extracted ion is here:

[image not preserved]

It may not look great, but xcms should be able to find some features in such a file...

I suspect the problem lies in the multiple MS levels stored in a single file, also because I could not specify anywhere which level should be used for the analysis. Unfortunately, translat.exe, which should be able to convert the .wiff files to netCDF files for each channel, is not working, and the AB SCIEX MS data converter can only create .mzML files containing all channels, or .mgf files, which can only hold MS/MS data. If I try to convert the data with ProteoWizard to mzXML files (I select the right file format, choose 32 bit, enable compression, select 'peak picking' and add MS level 1-1), it creates files that are 1.4 GB large, which is frankly absurd for a 40-min MS1 run in centroid mode. XCMS under R can't do anything with that file either; it gives a memory error.

Does anyone here have any experience with .wiff files containing multiple MS levels? Any help will be greatly appreciated!

Re: Wiff file issues

Reply #1
Hi Arjen,

Which parameter set did you use?

XCMS will only use MS1 information for profiling. The MS2 will simply be ignored for this step.
But it sounds like your MS1 data might be spread across different scans. I can have a look at it if you send me one of those files.

Ralf

Re: Wiff file issues

Reply #2
Hi Ralf,

Thanks for your answer. I used the parameter set for my instrument (UPLC/TripleTof pos); I hope this is OK.

I can send you a file. I've sent you a PM via the forums to give you my email address.

Re: Wiff file issues

Reply #3
Hi Arjen,

After converting it with ProteoWizard (and no filtering), I see only MS1 data in this file for some reason. With peak picking (vendor) enabled, the resulting file had a size of 390 MB.

Code: [Select]
 $ grep 'msLevel="1"' 'Test file Arjen-55-4-5ul-pos-40min-grad.mzXML'
 msLevel="1"
 msLevel="1"
 [...]
 $ grep 'msLevel="2"' 'Test file Arjen-55-4-5ul-pos-40min-grad.mzXML'
 $

Since all the scans are simply declared as MS1 data it is impossible to process them properly.

If you load it into XCMS you can see that the m/z range is alternating between the scans, e.g. 
Code: [Select]
> range(getScan(x,1002)[,"mz"])
[1]  66.01918 184.95771
> range(getScan(x,1003)[,"mz"])
[1]  65.01617 184.97498
> range(getScan(x,1004)[,"mz"])
[1]  191.0045 1196.2489
> range(getScan(x,1005)[,"mz"])
[1]  190.266 1199.619

XCMS cannot process this data.
1. There is a problem with the conversion: the scans are not identified as MS1 vs. MS2 etc. You can contact Matt at ProteoWizard, but the problem might be related to AB SCIEX's conversion library.
2. If the scan order is known and is the same throughout the file, you might be able to write your own script that rearranges the data, so that you take only the MS1 scans out of it and feed those into XCMS.
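
If the scan order turns out not to be fixed, another option might be to label each scan by the m/z window it covers rather than by its position. A minimal base R sketch, using only the four ranges printed above; the helper name `classify_window` and the cut-off of m/z 187 (roughly halfway between the observed low-window maximum and high-window minimum) are assumptions made for illustration, not part of xcms:

```r
## m/z ranges of scans 1002-1005, copied from the getScan() output above
ranges <- data.frame(
  scan  = 1002:1005,
  mzmin = c(66.01918, 65.01617, 191.0045, 190.266),
  mzmax = c(184.95771, 184.97498, 1196.2489, 1199.619)
)

## Hypothetical helper: label a scan "low" if all of its peaks lie below the
## cut-off, "high" if all lie above it, and "mixed" otherwise.
classify_window <- function(mzmin, mzmax, cutoff = 187) {
  ifelse(mzmax < cutoff, "low",
         ifelse(mzmin > cutoff, "high", "mixed"))
}

ranges$window <- classify_window(ranges$mzmin, ranges$mzmax)
print(ranges)
```

With a real file, the per-scan ranges would have to be collected with range(getScan(x, i)[,"mz"]) for every scan first. Any scan labelled "mixed" would suggest that the assumed cut-off does not fit the data.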

Hope that helps,
Ralf

Re: Wiff file issues

Reply #4
Hi Ralf,

Thanks so much for looking at the data, really helpful! I'll contact Matt to see if he's familiar with the issue, and will keep asking AB SCIEX for a proper solution. Could you also see a 5th trace in the files? There should be a DAD trace in there as well. I guess that ProteoWizard is part of the pipeline for XCMS Online, and thus it doesn't work either when I upload the .wiff files directly.

Anyway, I'll see what the guys at ProteoWizard can do; otherwise I'll have to extract the relevant data from the files myself.

Re: Wiff file issues

Reply #5
All scans are indicated as MS1, full scan, pos mode. See below the XML headers for the first 12 scans in your file.
The total number of "MS1 scans" is 49368.

I don't know if there is a tag for DAD data in the mzXML dialect, but it should definitely not show the MSn data as MS1.

Code: [Select]
<?xml version="1.0" encoding="ISO-8859-1"?>
<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2 http://sashimi.sourceforge.net/schema_revision/mzXML_3.2/mzXML_idx_3.2.xsd">
  <msRun scanCount="49368" startTime="PT0.048S" endTime="PT2520.05S">
    <parentFile fileName="file://Test file Arjen.wiff"
                fileType="RAWData"
                fileSha1="37fbad3a28372d24706e0965b09831e1733c5bae"/>
    <msInstrument msInstrumentID="1">
      <msManufacturer category="msManufacturer" value="instrument model"/>
      <msModel category="msModel" value="Applied Biosystems instrument model"/>
      <software type="acquisition" name="Analyst" version="unknown"/>
    </msInstrument>
    <dataProcessing>
      <software type="conversion" name="ProteoWizard" version="3.0.3768"/>
      <processingOperation name="Conversion to mzML"/>
    </dataProcessing>
    <dataProcessing centroided="1">
      <comment>ABI/Analyst peak picking</comment>
    </dataProcessing>
    <scan num="1"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="53"
          polarity="+"
          retentionTime="PT0.048S"
          basePeakMz="279.09573846629"
          basePeakIntensity="125.0"
          totIonCurrent="22285.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...  </scan>
    <scan num="2"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="114"
          polarity="+"
          retentionTime="PT0.099S"
          basePeakMz="141.958288465593"
          basePeakIntensity="643.0"
          totIonCurrent="26701.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...  </scan>
    <scan num="3"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="30"
          polarity="+"
          retentionTime="PT0.151S"
          basePeakMz="97.967953335797"
          basePeakIntensity="188.0"
          totIonCurrent="7161.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...  </scan>
    <scan num="4"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="6"
          polarity="+"
          retentionTime="PT0.202S"
          basePeakMz="327.076610657294"
          basePeakIntensity="42.0"
          totIonCurrent="6201.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">... </scan>
    <scan num="5"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="47"
          polarity="+"
          retentionTime="PT0.253S"
          basePeakMz="279.093380647333"
          basePeakIntensity="167.0"
          totIonCurrent="22174.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">... </scan>
    <scan num="6"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="111"
          polarity="+"
          retentionTime="PT0.304S"
          basePeakMz="141.956606899812"
          basePeakIntensity="686.0"
          totIonCurrent="26304.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...  </scan>
    <scan num="7"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="29"
          polarity="+"
          retentionTime="PT0.355S"
          basePeakMz="97.96935027767"
          basePeakIntensity="104.0"
          totIonCurrent="6535.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...  </scan>
    <scan num="8"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="6"
          polarity="+"
          retentionTime="PT0.406S"
          basePeakMz="201.045904412664"
          basePeakIntensity="63.0"
          totIonCurrent="6368.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...  </scan>
    <scan num="9"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="42"
          polarity="+"
          retentionTime="PT0.457S"
          basePeakMz="279.09573846629"
          basePeakIntensity="167.0"
          totIonCurrent="18979.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...    </scan>
    <scan num="10"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="115"
          polarity="+"
          retentionTime="PT0.508S"
          basePeakMz="141.956606899812"
          basePeakIntensity="677.0"
          totIonCurrent="25782.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...  </scan>
    <scan num="11"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="25"
          polarity="+"
          retentionTime="PT0.559S"
          basePeakMz="97.966556403882"
          basePeakIntensity="104.0"
          totIonCurrent="7203.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...  </scan>
    <scan num="12"
          scanType="Full"
          centroided="1"
          msLevel="1"
          peaksCount="9"
          polarity="+"
          retentionTime="PT0.61S"
          basePeakMz="327.074058198928"
          basePeakIntensity="84.0"
          totIonCurrent="5992.0">
      <peaks compressionType="none"
            compressedLen="0"
            precision="64"
            byteOrder="network"
            contentType="m/z-int">...</scan>
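
The base peak values in these headers already hint at a repeating cycle of four scans. A small base R sketch, assuming the cycle length really is four and using only the twelve basePeakMz values shown above (the variable names are illustrative):

```r
## basePeakMz of scans 1-12, copied from the mzXML headers above
base_peak_mz <- c(279.09573846629, 141.958288465593, 97.967953335797,
                  327.076610657294, 279.093380647333, 141.956606899812,
                  97.96935027767, 201.045904412664, 279.09573846629,
                  141.956606899812, 97.966556403882, 327.074058198928)
scan <- seq_along(base_peak_mz)

## Position of each scan within an assumed repeating cycle of four
cycle_pos <- (scan - 1) %% 4 + 1

## Range of base peak m/z seen at each cycle position
per_pos <- t(sapply(split(base_peak_mz, cycle_pos), range))
colnames(per_pos) <- c("min", "max")
print(round(per_pos, 2))
```

In these twelve scans, positions 2 and 3 have base peaks below m/z 185 and positions 1 and 4 have base peaks above m/z 190, which fits the two acquisition windows. Twelve scans are of course too few to show that the order holds for all 49368.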


Re: Wiff file issues

Reply #6
Hi,

yes, splitting should work:

Code: [Select]
library(xcms)

## load file
xr <- xcmsRaw("test file.mzXML", profstep=0)

## Create a "Tag" for each scan,
scans <- length(xr@scanindex)
f <- rep(c("MS1", "MS2A", "MS2B", "DAD"),scans/4+1)[1:scans]

## Split xcmsRaw into one xcmsRaw per "Tag"
xrs <- split(xr, f=f)

## Extract the MS1 stuff
xr1 <- xrs[["MS1"]]

## Fix the profile matrix required for plotChrom et al
profMethod(xr1) <- "bin"
profStep(xr1) <- 5
plotChrom(xr1, rtrange=c(600,700))

## Find peaks (parameters not optimized here!)
p <- findPeaks(xr1, method="centWave")

## Write corrected files out
write.cdf(xr1, "test_file_MS1.cdf")
write.mzdata(xr1, "test_file_MS1.mzData")
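
For anyone puzzled by the tagging line: it simply recycles the four labels over all scans. A toy base R sketch without xcms, where the scan count of 10 is made up for illustration:

```r
tags  <- c("MS1", "MS2A", "MS2B", "DAD")
scans <- 10  # stand-in for length(xr@scanindex)

## Idiom from the script above: repeat the tags often enough, then truncate
f_script <- rep(tags, scans/4 + 1)[1:scans]

## Equivalent base R spelling that recycles the tags to the requested length
f_recycled <- rep(tags, length.out = scans)

stopifnot(identical(f_script, f_recycled))

## split() then groups the scan numbers by tag, analogous to the
## split(xr, f=f) call on the xcmsRaw object
scan_groups <- split(seq_len(scans), f_recycled)
print(scan_groups[["MS1"]])
```

The labels only carry meaning if the acquisition cycle really repeats every four scans from the first scan onwards, so it is worth checking a few scans per group before trusting the split.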

Yours,
Steffen
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: Wiff file issues

Reply #7
Hi all, thanks so much for your help!

One (naive) question on splitten.R: is the peak picking at this step necessary, or can I write directly to cdf? I'd rather do the peak picking of the whole dataset (24 samples) at once, not file by file. Since the computer I have is "not optimized" for number-crunching, everything goes rather slowly, and the peak picking takes quite some time.

Thanks again, this really helps - the software from ab-sciex is pretty with colours and lots of buttons, but it can't really do basic stuff...

Re: Wiff file issues

Reply #8
Hi,

of course you can skip the peak picking, and go from
xr1 <- xrs[["MS1"]] directly to write.cdf(xr1).

Not even the ProfStep stuff is needed.

Yours,
Steffen

Re: Wiff file issues

Reply #9
For the sake of completeness, here is the code I use to convert a directory of msConvert mzML files to .cdf files, in case someone else runs into the same problem and has quite a few samples. Courtesy of Steffen Neumann, who gave me the script a few weeks ago.

Code: [Select]
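library(xcms)

## Convert every mzML file in the working directory to an MS1-only .cdf file
files <- list.files(pattern="\\.mzML$")
tags <- c("MS1", "MS1B", "MS2A", "MS2B")
for (file in files) {
    cdf <- sprintf("MS1_%s.cdf", file)
    ## Skip files that have already been converted
    if (!file.exists(cdf)) {
        xr <- xcmsRaw(file)
        ## Tag each scan by its position in the repeating acquisition cycle
        scans <- length(xr@scanindex)
        f <- rep(tags, length.out=scans)
        xrs <- split(xr, f=f)
        ## Keep only the scans tagged "MS1" and write them out
        xr1 <- xrs[["MS1"]]
        write.cdf(xr1, cdf)
    }
}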
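
Before converting many samples, a dry run that lists which files still need converting may be handy. A base R sketch without xcms; the helper name `pending_conversions` and the scratch directory are made up for illustration, and the output naming follows the MS1_<file>.cdf pattern of the script above:

```r
## Hypothetical helper: list the mzML files in `dir` whose MS1-only .cdf
## output (named as in the script above) does not exist yet.
pending_conversions <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.mzML$")
  out   <- sprintf("MS1_%s.cdf", files)
  todo  <- !file.exists(file.path(dir, out))
  data.frame(input = files[todo], output = out[todo],
             stringsAsFactors = FALSE)
}

## Demonstration in a scratch directory with empty placeholder files
demo_dir <- file.path(tempdir(), "wiff_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.mzML", "b.mzML", "MS1_a.mzML.cdf")))
print(pending_conversions(demo_dir))
```

In the demonstration only b.mzML is reported, because a .cdf output for a.mzML already exists in the scratch directory.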