Sorry for the delay. This functionality is not yet in RMassBank itself; however, you can use RMassBank to extract the spectra and the OrgMassSpecR function SpectrumSimilarity to compare them. We have adapted these functions in-house, but that work isn't finished yet.
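For a quick comparison without OrgMassSpecR, here is a minimal base-R sketch of a cosine (normalized dot-product) similarity between two centroided spectra. It is not the SpectrumSimilarity implementation: the helper name `spectrum_cosine`, the greedy nearest-peak matching and the 0.005 m/z tolerance are all assumptions chosen for illustration.

```r
# Hypothetical helper, NOT OrgMassSpecR's SpectrumSimilarity.
# spec_a, spec_b: two-column matrices (m/z, intensity) of centroided peaks.
# Each peak in A is paired with the nearest unused peak in B within `tol`;
# unpaired peaks count as zero intensity in the other spectrum.
spectrum_cosine <- function(spec_a, spec_b, tol = 0.005) {
  va <- spec_a[, 2]
  vb <- numeric(nrow(spec_a))        # B intensity paired with each A peak
  used_b <- logical(nrow(spec_b))
  for (i in seq_len(nrow(spec_a))) {
    d <- abs(spec_b[, 1] - spec_a[i, 1])
    d[used_b] <- Inf                 # each B peak may be paired only once
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= tol) {
      vb[i] <- spec_b[j, 2]
      used_b[j] <- TRUE
    }
  }
  # B peaks without a partner contribute only to B's intensity vector
  va <- c(va, numeric(sum(!used_b)))
  vb <- c(vb, spec_b[!used_b, 2])
  # cosine of the two aligned intensity vectors (NaN if either is all zero)
  sum(va * vb) / sqrt(sum(va^2) * sum(vb^2))
}

a <- cbind(mz = c(100, 150, 200), intensity = c(10, 50, 100))
p <- cbind(mz = c(100.002, 200.001, 250), intensity = c(10, 100, 20))
print(spectrum_cosine(a, a))   # identical spectra
print(spectrum_cosine(a, p))   # partial overlap, between 0 and 1
```

Real spectra usually need an instrument-appropriate tolerance and often some intensity scaling before comparison; both are left out here to keep the idea visible.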
What happens if you split the file into two mass ranges with MSConvert's subset filter? Also, I seem to remember that the Q Exactive sometimes does something strange with the data file: it labels one scan as "full scan" and the other as "SIM". You can check that by going through the scans in Xcalibur. If that's the problem, there is a way to get your data into mz(X)ML correctly, but I would have to look it up.
The newest, cutting-edge RMassBank version does this:

library(devtools)
install_github("MassBank/RMassBank@s4power")

# parses a MassBank record into an RmbSpectrum2 object
RMassBank:::parseMbRecord(filename)

# parses multiple files into a list of RmbSpectraSet objects
# (i.e. it groups the spectra by compound)
RMassBank:::parseMbRecords(filenames)

(Yes, it's not even exported yet; I just wrote it a week ago. Also, the older RMassBank versions have parseMassBank, which also parses MassBank records, but not into the "native" RMassBank/MSnbase format. parseMassBank will soon be replaced by parseMbRecord(s), i.e. parseMbRecord(s) will be renamed to parseMassBank and become an S4 method.)
I fully agree. I ran into the same problem recently.
If you click on the compound number, the info is displayed on the spectrum, but not if you click on the View button. It took me an entire day to realize that these spectra are CFM-ID spectra. Please put the "Insilico spectra" info at least on the "View" graphic, if not in the result table.
library(xcms)

xs <- xcmsSet(whatever)
# see first peak
print(xs@peaks[1, ])
# subtract a constant shift (30 sec) from all rt values:
shift <- 30
xs@peaks[, c("rt", "rtmin", "rtmax")] <- xs@peaks[, c("rt", "rtmin", "rtmax")] - shift
# print the modified peak
print(xs@peaks[1, ])

# then there's also xs@rt, which one could shift as well,
# but I don't know whether that's needed for the remaining workflow;
# I believe all further calculations start from xs@peaks.
# xs@rt is a list ("raw" and "corrected", one vector per sample),
# so shift every vector inside it rather than the list itself:
xs@rt <- lapply(xs@rt, function(r) lapply(r, function(x) x - shift))
What I don't know is what will happen downstream if you want to extract chromatograms or the like.
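The xs@rt part of the shift needs care, because in an xcmsSet that slot is a list (with "raw" and "corrected" elements, each holding one vector of scan times per sample) rather than a plain numeric vector. Here is a small base-R sketch on mock objects that only mimic the two slots, so it runs without xcms; all values are invented.

```r
# Mock objects mimicking two xcmsSet slots (values invented for illustration):
# peaks: matrix with rt/rtmin/rtmax columns; rt: list with "raw" and
# "corrected", each holding one vector of scan times per sample.
peaks <- cbind(mz = c(200.1, 300.2), rt = c(130, 250),
               rtmin = c(125, 244), rtmax = c(136, 257))
rt <- list(raw       = list(c(100, 101, 102), c(100.5, 101.5, 102.5)),
           corrected = list(c(100, 101, 102), c(100.4, 101.4, 102.4)))
shift <- 30

# Matrix columns can be shifted in one vectorized step.
rt_cols <- c("rt", "rtmin", "rtmax")
peaks[, rt_cols] <- peaks[, rt_cols] - shift

# `rt - shift` would fail ("non-numeric argument to binary operator"),
# so shift every numeric vector inside the nested list instead.
shift_scan_times <- function(rt, shift) {
  lapply(rt, function(per_sample) lapply(per_sample, function(x) x - shift))
}
rt <- shift_scan_times(rt, shift)

print(peaks[1, ])
print(rt$raw[[1]])
```

The nested `lapply` keeps the "raw"/"corrected" names and the per-sample structure intact, which is what downstream code indexing into that slot would expect.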
I have worked quite extensively with the XML directly in R, on mzML rather than mzXML files, but the principle is the same. It is a very versatile way to get a lot of additional data out of the files.
A code snippet I used: this one reads out the instrument configuration sections from an mzML file. (mzXML files are generally simpler: just open one as a text file and you can easily find your way around its structure.)
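The original snippet isn't reproduced in this post. As a stand-in, here is a minimal sketch of the same idea, assuming the CRAN `XML` package (`xmlParse`, `getNodeSet`, `xpathSApply`, `xmlGetAttr`); the tiny inline mzML fragment, its values and the variable names are invented for illustration. The key detail is that mzML elements sit in a default namespace, so XPath queries need an explicit prefix bound to it.

```r
library(XML)  # assumption: the CRAN "XML" package is installed

# Tiny hand-written mzML fragment for illustration only; for a real file use
# doc <- xmlParse("your_file.mzML") instead (the path is a placeholder).
mzml_text <- '<?xml version="1.0" encoding="utf-8"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml" version="1.1.0">
  <instrumentConfigurationList count="1">
    <instrumentConfiguration id="IC1">
      <cvParam cvRef="MS" accession="MS:1001911" name="Q Exactive" value=""/>
      <componentList count="3">
        <source order="1">
          <cvParam cvRef="MS" accession="MS:1000073" name="electrospray ionization" value=""/>
        </source>
        <analyzer order="2">
          <cvParam cvRef="MS" accession="MS:1000484" name="orbitrap" value=""/>
        </analyzer>
        <detector order="3">
          <cvParam cvRef="MS" accession="MS:1000624" name="inductive detector" value=""/>
        </detector>
      </componentList>
    </instrumentConfiguration>
  </instrumentConfigurationList>
</mzML>'

doc <- xmlParse(mzml_text, asText = TRUE)

# Bind a prefix ("x") to the mzML default namespace for use in XPath.
ns <- c(x = "http://psi.hupo.org/ms/mzml")

configs <- getNodeSet(doc, "//x:instrumentConfiguration", namespaces = ns)

# For each configuration: its id plus the names of all cvParams below it.
config_info <- lapply(configs, function(cfg) {
  list(
    id     = xmlGetAttr(cfg, "id"),
    params = xpathSApply(cfg, ".//x:cvParam", xmlGetAttr, "name",
                         namespaces = ns)
  )
})

for (ci in config_info) {
  cat(ci$id, ":", paste(ci$params, collapse = ", "), "\n")
}
```

The relative path `.//x:cvParam` restricts the search to the current configuration node; a path starting with `//` would search the whole document instead.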
ProteoWizard 3.0.3700 on Q-Exactive raw data, in Peak Picking (Prefer Vendor) mode, converting to 64-bit mzML. The problem appears to be a mass stick that occurs twice. I tried with and without "remove zero samples" (since the offending data point was a zero-intensity point).
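To confirm that an m/z value really appears twice in a converted scan, a quick check in R can help. This is a minimal base-R sketch: the helper names are hypothetical, the spectrum is a made-up two-column matrix (m/z, intensity) of the kind mzR's `peaks()` returns for a single scan, and duplicates are matched exactly, assuming the converter wrote the identical value twice.

```r
# Hypothetical helpers for checking one centroided spectrum given as a
# two-column matrix (m/z, intensity).

# Return all rows whose m/z value occurs more than once (exact match).
find_duplicate_mz <- function(spec) {
  mz <- spec[, 1]
  dup_mz <- unique(mz[duplicated(mz)])
  spec[mz %in% dup_mz, , drop = FALSE]
}

# Drop zero-intensity points, similar in spirit to a "remove zero samples" step.
drop_zero_intensity <- function(spec) {
  spec[spec[, 2] > 0, , drop = FALSE]
}

# Made-up example: m/z 150.2 appears twice, once with zero intensity.
spec <- cbind(mz = c(100.1, 150.2, 150.2, 200.3),
              intensity = c(1000, 0, 500, 800))
print(find_duplicate_mz(spec))
print(find_duplicate_mz(drop_zero_intensity(spec)))
```

Running the duplicate check before and after dropping zero-intensity points shows whether the doubled stick is just the zero-intensity point or a real duplicate peak.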
Sure, if you're interested in adding the function to XCMS, I think it would be a useful addition. If possible, we should keep it extensible so that someone can add other algorithms (e.g. the cubic splines used by the OpenMS HiRes feature detector, though that will probably need Rcpp; I can't imagine an easy and fast way to do that one in R).
But keep in mind that the function could use some more testing.
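One way to keep such a function open to other algorithms is to look the algorithm up by name at call time, similar in spirit to how xcms selects peak pickers through a `method` argument. A minimal base-R sketch follows; every name here (`register_filter_method`, `apply_filter`, the two toy methods) is hypothetical and not part of xcms.

```r
# Hypothetical plug-in registry: algorithms are stored by name and looked up
# at call time, so a new method can be added without touching the caller.
.filter_methods <- new.env()

register_filter_method <- function(name, fun) {
  stopifnot(is.character(name), length(name) == 1, is.function(fun))
  assign(name, fun, envir = .filter_methods)
  invisible(name)
}

apply_filter <- function(x, method = "movingAverage", ...) {
  if (!exists(method, envir = .filter_methods, inherits = FALSE)) {
    stop("unknown method '", method, "'; available: ",
         paste(ls(.filter_methods), collapse = ", "))
  }
  fun <- get(method, envir = .filter_methods, inherits = FALSE)
  fun(x, ...)
}

# Two toy methods; a spline-based or Rcpp-backed method could be
# registered the same way later.
register_filter_method("none", function(x) x)
register_filter_method("movingAverage", function(x, k = 3) {
  # centered moving average; the edges where the window doesn't fit are NA
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
})

print(apply_filter(c(1, 2, 3, 4, 5), "movingAverage"))
```

The caller only ever sees `apply_filter(x, method = ...)`, so adding an algorithm means one `register_filter_method()` call rather than editing a growing `if`/`switch` chain.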
[quote author="Ralf"]
1.) (easy work-around): pre-process your files, write them as mzXML, run centWave on the result.
2.) (good for the community): implement your algorithm into XCMS, and we'll find a way to integrate it with centWave. Ideally this would happen directly on the C level.
[/quote]
Hi Ralf,
Since I don't currently have the "otium" (is there an English word for that? In German it's more or less "Musse") to get used to the Rcpp style, I did something in between: I coded the routine in vectorized R instead of using loops. It's not as fast as the original Java implementation, or as an Rcpp version would be, but it's not terribly slow either, and since the subsequent centWave run takes much longer anyway, it's not a bottleneck for me.
I wrote the routine primarily for my own use and it's not really tested, but if anyone wants to use it, feel free to do whatever you want with it. Don't blame me if your computer explodes and buries all your valuable data, never to be found again.
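The routine itself isn't included in this post. To illustrate what "vector-operation style instead of loops" means in R, here is a sketch on a hypothetical task that is not the actual routine: finding local maxima in an intensity profile, written once with a loop and once vectorized. Both function names are made up.

```r
# Loop version: visit each inner point and compare it with its neighbours.
local_maxima_loop <- function(y) {
  idx <- integer(0)
  for (i in seq_along(y)[-c(1, length(y))]) {
    if (y[i] > y[i - 1] && y[i] >= y[i + 1]) idx <- c(idx, i)
  }
  idx
}

# Vectorized version: compare all inner points with their neighbours at once
# using shifted index vectors and the element-wise `&`.
local_maxima_vec <- function(y) {
  n <- length(y)
  if (n < 3) return(integer(0))
  inner <- 2:(n - 1)
  inner[y[inner] > y[inner - 1] & y[inner] >= y[inner + 1]]
}

y <- c(0, 1, 3, 2, 0, 4, 4, 1)
print(local_maxima_loop(y))
print(local_maxima_vec(y))

# Rough speed comparison on a larger random profile (timings vary by machine).
set.seed(1)
big <- runif(2e4)
print(system.time(local_maxima_loop(big)))
print(system.time(local_maxima_vec(big)))
```

Both versions return the same indices; on a plateau they report its leftmost point, because the left comparison is strict and the right one is not.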