extracting metadata from mzXML

Hi,

I'm converting Waters data to .mzXML using msconvert, then loading it into xcms as raw data (something like spec.raw <- xcmsRaw("spec.mzXML", profstep=0.001, profmethod="bin")). I would like to extract metadata about the original Waters .raw file from which the .mzXML was generated. I know this is stored, as I can see it if I load the mzXML using readMzXmlFile (it is at specsm$metaData$parentFile[[1]]$fileName), but I haven't figured out how to get it out of xcms.

Any suggestions?

Thanks very much,

Andy

[edit: Or any other clever way to get the data out without the route I have been using, since it takes about 10 mins to load the > 200 MB files]

Re: extracting metadata from mzXML

Reply #1
You don't have to read the entire file if the information you want is contained in the header.
Try something like:
Code:
filepath <- "mydata.mzXML"

con <- file(filepath, "r")
txt <- readLines(con, n = 50)  ## only reads the first 50 lines, should only take milliseconds
close(con)

line <- grep("fileName", txt, fixed = TRUE)  ## find the line(s) containing the file name
if (length(line) > 0) {  ## found it
  ## pull out the quoted value of the fileName attribute
  res <- sub('.*fileName="([^"]*)".*', "\\1", txt[line])
}
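
For reuse, the same idea can be wrapped in a small function. This is only a sketch: the name getParentFileNames and the 50-line default are made up for illustration (not part of xcms or any package), and it assumes the header stores the source file as a fileName="..." attribute, as in the readMzXmlFile output mentioned above.

```r
## Hypothetical helper (name and defaults are illustrative only):
## scan just the first n lines of an mzXML file for fileName="..." attributes.
getParentFileNames <- function(filepath, n = 50) {
  con <- file(filepath, "r")
  on.exit(close(con))                      ## always release the connection
  txt <- readLines(con, n = n, warn = FALSE)

  ## collect every fileName="..." occurrence, even several on one line
  hits <- unlist(regmatches(txt, gregexpr('fileName="[^"]*"', txt)))
  if (length(hits) == 0) return(character(0))

  ## strip the attribute name and the surrounding quotes
  sub('^fileName="([^"]*)"$', "\\1", hits)
}

## usage, assuming "mydata.mzXML" exists:
## getParentFileNames("mydata.mzXML")
```

If the stored value turns out to be a full path or a file:// URI, basename() on the result leaves just the .raw file name. If the header is longer than 50 lines in your files, raise n.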


Re: extracting metadata from mzXML

Reply #2
I'll have a crack.

Thanks,

Andy

 

Re: extracting metadata from mzXML

Reply #3
I have worked quite extensively with XML in R, on mzML files rather than mzXML files, but the principle is the same.
It is a very versatile way to get a lot of additional data out.

Here is a code snippet I used; it reads the instrument configuration sections from an mzML file. (mzXML files are generally simpler: just open one as a text file and you can easily orient yourself in the structure.)

Code:
library(XML)

## parse an mzML file into an internal (C-level) XML document
openXML.mzML <- function(filename)
{
  mzml <- xmlTreeParse(filename, asText=FALSE, useInternalNodes=TRUE,
                       fullNamespaceInfo=TRUE, useDotNames=TRUE)
  return(mzml)
}

## return one row per instrumentConfiguration: id, analyzer name, CV accession
getConfigs.mzML <- function(mzml)
{
  ns <- c(m="http://psi.hupo.org/ms/mzml")
  ## this path assumes an indexed mzML file (root element indexedmzML)
  instrumentConfigs <- getNodeSet(mzml,
    "/m:indexedmzML/m:mzML/m:instrumentConfigurationList/m:instrumentConfiguration",
    ns)

  configs <- t(sapply(instrumentConfigs, function(ic){
    id <- xmlAttrs(ic)[["id"]]
    ## only the first cvParam of the analyzer is used
    analyzer <- getNodeSet(ic, "m:componentList/m:analyzer/m:cvParam", ns)
    analyzerName <- xmlAttrs(analyzer[[1]])[["name"]]
    analyzerMSO <- xmlAttrs(analyzer[[1]])[["accession"]]
    return(c(id, analyzerName, analyzerMSO))
  }))

  rownames(configs) <- configs[,1]
  colnames(configs) <- c("ID", "name", "ontology")
  return(configs)
}
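
For the mzXML case asked about in this thread, the same approach might look roughly like the sketch below. The function name getParentFiles.mzXML is made up for illustration, and it assumes the file contains parentFile elements carrying fileName and fileType attributes. The XPath uses local-name() so that it does not depend on the namespace URI, which differs between mzXML schema versions. Note that this parses the whole file, so on files of several hundred MB it will be much slower than reading only the header lines.

```r
library(XML)

## Hypothetical helper, not part of the XML package or xcms.
## Returns one row per parentFile element found in an mzXML file.
getParentFiles.mzXML <- function(filename)
{
  doc <- xmlParse(filename)

  ## match parentFile elements regardless of the namespace they are in
  nodes <- getNodeSet(doc, "//*[local-name()='parentFile']")

  ## read one attribute from every node, NA where the attribute is absent
  getAttr <- function(attr) {
    vapply(nodes, function(nd) {
      xmlGetAttr(nd, attr, default = NA_character_)
    }, character(1))
  }

  data.frame(fileName = getAttr("fileName"),
             fileType = getAttr("fileType"),
             stringsAsFactors = FALSE)
}

## usage, assuming "spec.mzXML" exists:
## getParentFiles.mzXML("spec.mzXML")$fileName
```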

Re: extracting metadata from mzXML

Reply #4
Thank you, people. I have sorted out what I wanted to do using Ralf's suggestion, and thanks to meow for the suggestion as well - good learning.

Sorry for the late reply,

Andy