Topic: Issue with .cdf files

Issue with .cdf files

Hi all

I have been trying for a while now to make sense of all the software etc. available for metabolomics.

I have Waters .raw data acquired on a UPLC Waters Synapt G2, which I converted using DataBridge. This gave me four files; I think two of them (3.cdf and 4.cdf) are lock spray and UV data.

Initially I tried using MZmine with my .raw files, but it doesn't seem to like the scan filter, and since I have a lot of data to get through I moved on to my .cdf files. I have been trying to use XCMS Online, but I am worried that splitting my mass spec data across two files (i.e. 1.cdf and 2.cdf) means I am losing information. Is this concern valid?

I have seen some other threads suggesting that Waters users need to go back to MassLynx and make sure all their data is centroided before they export and convert, but others suggest using ProteoWizard to convert .raw files to mzXML. A while ago I tried converting files with ProteoWizard, but it didn't seem to handle Waters files at the time. Has this since been updated? Returning to MassLynx would be a nightmare for me, as I do not have a license and would need to travel to use it!

Anyway, I am feeling very lost, and because of the size of the files, trying things to see if they work takes a long time, so any suggestions would be much appreciated. Thanks.

Re: Issue with .cdf files

Reply #1
You will need to know what is in each function/scanEvent to decide what is best to do. Analyzing different functions together probably doesn't make much sense: one could be a low-energy scan and the other a high-energy scan, as in MSE.
If you open the _extern.inf file inside the .raw folder, each function is described at the bottom.

For conversion with ProteoWizard you have two problems:
1) Centroiding (if your files were actually recorded in profile mode). For Waters data, msconvert cannot use the Waters centroiding algorithm; it uses its own, which is inferior. You can get around this by centroiding the files in MassLynx to create centroided .raw files that can then be converted.
2) Accurate mass. msconvert used to be unable to use the lock mass information to calibrate the masses, so you got uncalibrated data. Now it appears that the solution depends on how the files were recorded and on the MassLynx version... so yes, a big mess. Options are:
a) Some files will need the lockmassRefiner filter in msconvert. From my short tests, it appears that with some newer versions of MassLynx the conversion now comes out correct without it.
b) Some files now seem to have the correction baked in.
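To make "profile" vs "centroided" concrete, here is a minimal toy sketch in Python of what centroiding does to a profile spectrum: each local maximum becomes a single stick, placed at the intensity-weighted mean m/z of the apex and its two neighbours. This is deliberately naive and is neither the Waters algorithm nor the one msconvert uses; the function name `centroid` and its `min_intensity` parameter are hypothetical, for illustration only.

```python
def centroid(mz, intensity, min_intensity=0.0):
    """Toy centroider: collapse profile peaks into (m/z, apex intensity) pairs.

    Hypothetical illustration only, not the Waters or msconvert algorithm.
    Assumes non-negative intensities sampled on an increasing m/z axis.
    """
    peaks = []
    for i in range(1, len(mz) - 1):
        apex = intensity[i]
        if apex < min_intensity:
            continue  # ignore noise below the threshold
        # Local maximum: higher than the left point, at least as high as the right one.
        if apex > intensity[i - 1] and apex >= intensity[i + 1]:
            window = range(i - 1, i + 2)
            total = sum(intensity[j] for j in window)
            # Intensity-weighted mean m/z over the three-point window.
            mz_c = sum(mz[j] * intensity[j] for j in window) / total
            peaks.append((mz_c, apex))
    return peaks
```

A real centroider fits the whole peak shape and copes with overlapping peaks, which is exactly where a vendor algorithm and a generic one can give slightly different masses.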

You will need to check the masses MassLynx shows and compare them with the converted file to know whether things were converted correctly.
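One common way to put a number on "the masses agree" is the relative mass error in parts per million between the value MassLynx shows (m_ML) and the value in the converted file (m_conv). This metric is standard practice rather than something stated in this thread, and the numbers in the example are made up purely for illustration.

```latex
\Delta_{\mathrm{ppm}} = \frac{m_{\mathrm{conv}} - m_{\mathrm{ML}}}{m_{\mathrm{ML}}} \times 10^{6}
% Hypothetical example: m_ML = 556.2771, m_conv = 556.2800
% \Delta_{\mathrm{ppm}} = (0.0029 / 556.2771) \times 10^{6} \approx 5.2
```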
Blog: stanstrup.github.io

 

Re: Issue with .cdf files

Reply #2
Thank you for your reply, and please forgive my ignorance, but I want to be sure I understand your instructions. If I return to MassLynx and compare that information with my .cdf files, you suggest I am likely to find discrepancies. If so, how can I correct these and export centroided data from MassLynx?

Re: Issue with .cdf files

Reply #3
To centroid a whole file or a set of files, use "Tools" --> "Accurate mass measure" --> "Automatic peak detection" from the main window of MassLynx.
This creates a new set of centroided files (be sure your originals are not already centroided), but you still need to check the accurate mass issue.

You can start by converting your centroided files with msconvert (to mzML, since that is the newer format) without any special parameters. If that doesn't work, try the lockmassRefiner filter.
Remember to use the scanEvent filter when you use msconvert. It is by far easiest for your downstream work to have each scanEvent (a "function" in Waters terminology) in a separate file.
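A minimal sketch of that conversion step, assuming msconvert (ProteoWizard) is installed and on your PATH. It only builds the argument list (it does not run anything): one mzML file per scanEvent, with the lockmassRefiner filter added only if you ask for it. The helper name `msconvert_cmd`, the `_fn<n>` output naming and the default tolerance of 0.1 are my own illustrative choices; check the exact filter syntax against `msconvert --help` for your ProteoWizard version, and pass the m/z of your own lock-spray compound.

```python
from pathlib import Path


def msconvert_cmd(raw_path, scan_event, outdir=".", lockmass_mz=None, lockmass_tol=0.1):
    """Build (but do not run) an msconvert call writing one scanEvent to its own mzML.

    Hypothetical helper; output naming "<stem>_fn<n>.mzML" is an arbitrary choice.
    """
    raw = Path(raw_path)
    cmd = [
        "msconvert", str(raw),
        "--mzML",
        "-o", str(outdir),
        "--outfile", f"{raw.stem}_fn{scan_event}.mzML",
    ]
    if lockmass_mz is not None:
        # Only needed if the masses turn out uncalibrated after a plain conversion.
        # msconvert applies filters in the order given, so this is listed first.
        cmd += ["--filter", f"lockmassRefiner mz={lockmass_mz} tol={lockmass_tol}"]
    # Keep only one function/scanEvent in this output file.
    cmd += ["--filter", f"scanEvent {scan_event}"]
    return cmd
```

To actually convert function 1 you would pass the list to `subprocess.run(msconvert_cmd("sample01.raw", 1), check=True)` and repeat for function 2; building the list separately makes it easy to print and double-check the command before a long batch run.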

To check whether things worked, open the centroided .raw file in MassLynx and look at a few MS spectra. Then open the exact same spectra in, for example, MZmine (the mzML file, of course) and check a few masses. If they are identical to at least 4 decimal places, you are OK; if not, you probably got uncalibrated data. Do this with a few scans just to be sure.
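The check above can be written down as a small sketch, assuming you have noted the m/z values of the same peaks in the same order from both programs (typed in by hand or copied from a peak list). The function name `mass_mismatches` is hypothetical; it applies the "same to 4 decimal places" rule literally by rounding, so two values sitting either side of a rounding boundary can be flagged even though they differ by less than 0.0001.

```python
def mass_mismatches(masslynx_mz, converted_mz, decimals=4):
    """Return (index, masslynx, converted) for peaks that differ after rounding.

    Hypothetical helper: both lists must hold the same peaks in the same order.
    """
    if len(masslynx_mz) != len(converted_mz):
        raise ValueError("compare the same peaks: lists must have equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(masslynx_mz, converted_mz))
        if round(a, decimals) != round(b, decimals)
    ]
```

An empty list for several scans suggests calibrated data; any entries point to the uncalibrated case where the lockmassRefiner filter may be needed.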


Regarding the .cdf files: they should always have correct masses, but you still need to centroid first.

Re: Issue with .cdf files

Reply #4
Hi, I have followed all of your instructions, but I think I may have an issue with the scan events/functions: my spectrum in MZmine matches my 2nd TIC in MassLynx, but the 1st is completely different. Could someone explain what is going on here?

(Blue = MZmine; top green = 2nd TIC in MassLynx; red = 1st TIC in MassLynx. The spectra use the same two colours the other way around.)

[attachment deleted by admin]

Re: Issue with .cdf files

Reply #5
You are looking at the TIC in MassLynx but the BPI in MZmine.
The spectrum looks weird, as if it is the wrong spectrum; I think the scan numbers could be offset. I suggest extracting an EIC of, for example, m/z 195 so you are sure which scans actually correspond.
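To show why a TIC and a BPI of the same file look different, and what an EIC adds, here is a minimal sketch computing all three per scan. It assumes the spectra have already been loaded (with whatever mzML reader you prefer) into a hypothetical list of `(mz_list, intensity_list)` pairs in scan order; the function name and the 0.01 tolerance are illustrative choices. Because the EIC follows a single ion, its apex scan number can be compared directly between MassLynx and MZmine to spot an offset.

```python
def chromatograms(scans, target_mz, tol=0.01):
    """Return (tic, bpi, eic) lists with one value per scan.

    scans: hypothetical list of (mz_list, intensity_list) pairs in scan order.
    """
    tic, bpi, eic = [], [], []
    for mz, inten in scans:
        # Total ion current: every signal in the scan summed.
        tic.append(sum(inten))
        # Base peak intensity: only the tallest peak in the scan.
        bpi.append(max(inten, default=0.0))
        # Extracted ion current: signal within +/- tol of the target m/z.
        eic.append(sum(i for m, i in zip(mz, inten) if abs(m - target_mz) <= tol))
    return tic, bpi, eic
```

For example, `chromatograms(scans, 195.0877)` gives three traces of equal length; the index of the largest EIC value is the scan to open in both programs.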

Re: Issue with .cdf files

Reply #6
New update: I have checked my spectra and scan numbers, and I am confident they are correct. I am still unsure about a few things, and since I am confident I am not alone, I have a few questions.
Having used the scanEvent filter in msconvert to convert my APD .raw files to mzML as suggested, I now have a scan 1 and a scan 2 file for each sample. To the best of my knowledge, scan 1 represents MS1 and scan 2 represents MSe (is this correct? If not, can anyone point me to something to read to understand this better, as the Waters handbooks are pretty dry!)
Also, I would like to use XCMS Online, but does XCMS understand (or can I make it understand) what each file represents? For example, say I am uploading for a pairwise comparison of X vs Y, with 10 replicates of each, and a scan 1 and a scan 2 file for each sample. Should I store a dataset of X, scan 1 (10 files) and compare it to Y, scan 1 (10 files), then do the same for scan 2? Or should I use a dataset of X, scans 1 + 2 (20 files) and compare it to Y, scans 1 + 2 (20 files)? I hope that makes sense, or is there some intermediate step that I am not aware of?
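Following the point in Reply #1 that different functions probably should not be analysed together, one way to organise the uploads is one comparison per function: X function 1 vs Y function 1, then X function 2 vs Y function 2. A minimal sketch of that grouping, assuming (hypothetically) that converted files are named like `X_rep01_fn1.mzML` and that you supply a small function mapping each file to its sample class; this applies the thread's advice and is not a documented XCMS Online requirement.

```python
from collections import defaultdict
from pathlib import Path


def group_by_function(paths, class_of):
    """Group converted files into {(sample_class, function): [file names]}.

    Assumes a hypothetical naming scheme ending in "_fn<n>", e.g. "X_rep01_fn1.mzML".
    """
    groups = defaultdict(list)
    for p in map(Path, paths):
        _, sep, fn = p.stem.rpartition("_fn")
        if not sep:
            raise ValueError(f"no _fn<n> suffix in {p.name}")
        groups[(class_of(p), fn)].append(p.name)
    return dict(groups)
```

With `class_of=lambda p: p.name.split("_")[0]`, the `("X", "1")` and `("Y", "1")` groups form one pairwise dataset of 10 vs 10 files, and the `"2"` groups form a second one.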

Re: Issue with .cdf files

Reply #7
Okay, so I realise I was being foolish. It suddenly dawned on me that it's all MSe data, with scan 1 and scan 2 representing low- and high-energy collision, doh! But I am still confused about how to tell any would-be preprocessing software that scan 1 contains parent ions and scan 2 contains fragment ions.