Hi,
I'm converting Waters data to .mzXML using msconvert, then loading it into xcms as raw data (something like spec.raw <- xcmsRaw("spec.mzXML", profstep=0.001, profmethod="bin")). I would like to extract metadata about the original Waters .raw file from which the .mzXML was generated. I know this is stored, as I can see it if I load the mzXML using readMzXmlFile (it is at specsm$metaData$parentFile[[1]]$fileName), but I haven't figured out how to get it out of xcms.
Any suggestions?
Thanks very much,
Andy
[edit: Or any other clever way to get the data out, other than the readMzXmlFile route I have been using, since it takes about 10 mins to load the > 200 MB files]
You don't have to read the entire file if the information you want is contained in the header.
Try something like:
filepath <- "mydata.mzXML"
con <- file(filepath, "r")
txt <- readLines(con, n=50) ## only reads the first 50 lines, should only take milliseconds
close(con)
line <- grep("fileName", txt, fixed=TRUE) ## find the line that contains the file name
if (length(line) > 0) { ## found it
  res <- strsplit(txt[line], ...) ## extract the result using strsplit etc.
}
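The header-reading approach above can be completed along these lines. This is only a sketch, assuming the header stores the original file as a fileName="..." attribute on a parentFile element (which is where readMzXmlFile reports it). The helper name get_parent_files, the 50-line default and the demo file contents are made up for illustration and are not part of xcms or msconvert.

```r
## Hypothetical helper: pull the parentFile fileName attribute(s) from the
## header of an mzXML file without parsing the whole document.
get_parent_files <- function(filepath, n = 50) {
  con <- file(filepath, "r")
  on.exit(close(con))                      # always release the connection
  txt <- readLines(con, n = n, warn = FALSE)

  ## Collapse the header so attributes split across lines are still matched.
  header <- paste(txt, collapse = " ")

  ## Match each <parentFile ...> tag up to the end of its fileName value.
  m <- gregexpr("<parentFile[^>]*?fileName=\"[^\"]*\"", header, perl = TRUE)
  tags <- regmatches(header, m)[[1]]

  ## Keep only the text between the quotes; character(0) if nothing matched.
  sub(".*fileName=\"([^\"]*)\"", "\\1", tags)
}

## Demo on a small made-up header (assumed layout, not real instrument data).
tmp <- tempfile(fileext = ".mzXML")
writeLines(c(
  '<?xml version="1.0" encoding="ISO-8859-1"?>',
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '  <msRun scanCount="1">',
  '    <parentFile fileName="file:///C:/data/sample01.raw"',
  '                fileType="RAWData"/>',
  '  </msRun>',
  '</mzXML>'
), tmp)

parent_files <- get_parent_files(tmp)
print(parent_files)
print(basename(parent_files))              # file name without the path
```

If the parentFile element sits further down than 50 lines in your files, increase n. Reading a few hundred lines is still far cheaper than loading the whole file.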
I'll have a crack.
Thanks,
Andy
I have worked quite extensively with XML itself in R, actually on mzML files and not on mzXML files, but the principle is the same.
It is a very versatile method to get a lot of additional data out.
Here is a code snippet I used. It reads the instrument configuration sections from an mzML file. (mzXML files are generally simpler: open one as a text file and you can easily orient yourself in the structure.)
library(XML)

## Parse the mzML file into an internal document so XPath queries can be run on it
openXML.mzML <- function(filename)
{
  mzml <- xmlTreeParse(filename, asText=FALSE, useInternalNodes=TRUE,
                       fullNamespaceInfo=TRUE, useDotNames=TRUE)
  return(mzml)
}

## Return one row per instrumentConfiguration: its ID plus the name and
## ontology accession of the first analyzer cvParam.
## The XPath assumes an indexed mzML file (root element indexedmzML).
getConfigs.mzML <- function(mzml)
{
  ns <- c(m="http://psi.hupo.org/ms/mzml")
  instrumentConfigs <- getNodeSet(mzml,
    "/m:indexedmzML/m:mzML/m:instrumentConfigurationList/m:instrumentConfiguration",
    ns)
  configs <- t(sapply(instrumentConfigs, function(ic) {
    id <- xmlAttrs(ic)[["id"]]
    analyzer <- getNodeSet(ic, "m:componentList/m:analyzer/m:cvParam", ns)
    analyzerName <- xmlAttrs(analyzer[[1]])[["name"]]
    analyzerMSO <- xmlAttrs(analyzer[[1]])[["accession"]]
    return(c(id, analyzerName, analyzerMSO))
  }))
  rownames(configs) <- configs[,1]
  colnames(configs) <- c("ID", "name", "ontology")
  return(configs)
}
Thank you people. I have sorted out what I wanted to do using Ralf's suggestion, and thanks to meow for the suggestion - good learning.
Sorry for the late reply,
Andy