Messages - metabolon1

3
MS-DIAL / Not detecting/integrating both partially-separated peaks (MS-DIAL v4.48 Windows)
Hello all,

I am trying to have MS-DIAL detect partially-separated peaks and have so far been unsuccessful.

The attached result was generated using MS1 detection threshold 1000, smoothing level 0, minimum peak width 2, and retention time tolerance 0.01 min. I've also tried detection threshold 1000-3000, smoothing level 0-3, minimum peak width 2-5, and retention time tolerance 0.01-0.05 min without any better luck.

From what I can tell, MS-DIAL is splitting these closely-eluting peaks at the correct place, but it is only integrating the second peak. I am not seeing anything in the peak spot viewer corresponding to the first peak, which I would expect to be at about 2.00 min and 609.145 m/z.

How can I get the first peak integrated as well? Which parameters can I adjust? This is my test peak, but there are several other peaks like this in my samples.

Many thanks in advance for your help! Hope everyone is doing well out there :-)

Taylan


4
MS-DIAL / Re: Exporting CorrDec spectra
Dear Dr. Tada,

Thank you for those detailed instructions.

I was able to do a batch export to MS using MS-DIAL v4.20 and MSFINDER v3.40. However, the MS method type of my project is "SWATH-MS or conventional All-ions method" (MSe), not "All-ions with multiple CEs..." as you instructed. It still worked, though...

How do I know if the exported spectra are CorrDec, MS2Dec, or not deconvoluted? I did not see any indication of this on any of the screens.

Also, the steps I followed were slightly different than what you described:
1) [same]
2) The button that I clicked (see top left corner of attached picture) says "Export to MS-FINDER" when I hover over it.
3) [same]
4) [same]

Thank you,
Taylan
5
MS-DIAL / GNPS export files empty when Filtering by the ion abundances of blank samples
Dear Developers and Community,

Today I came across an odd behavior in MS-DIAL. When I export alignment results as GNPS export, all of the files are empty of any data; this only seems to happen when "Filtering by the ion abundances of blank samples" is selected. When it is not selected, all of the files look fine.

However, when I am exporting as "Raw data matrix (Area)" or "Representative spectra", it works whether or not I have selected the "Filtering by..." option. In fact, the file sizes are the same in both cases.

I tried this with several of my project (.mtd) files that I have successfully exported GNPS files from. I tried with three different versions of MS-DIAL (4.12, 4.18, 4.20). The same thing seems to happen.

I'm surprised I didn't notice this before, but I think I never actually selected "Filtering by..." when exporting. I generally set up my analysis parameters with the option "Remove features based on blank information" using "Sample max / blank average = 5 fold change".
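
For readers unfamiliar with this setting, here is a minimal sketch of what a "Sample max / blank average = 5 fold change" rule could look like on an exported area table. It is only an illustration of the rule as I understand it, not MS-DIAL's actual implementation. The table, the column names, and the choice to keep features that are absent from all blanks are assumptions.

```r
# Toy alignment table: rows are features, columns are injections (peak areas).
# All names and values here are made up for illustration.
areas <- data.frame(
  feature = c("F1", "F2", "F3"),
  blank1  = c(100, 500, 0),
  blank2  = c(300, 700, 0),
  sampleA = c(2000, 900, 150),
  sampleB = c(500, 1200, 50),
  stringsAsFactors = FALSE
)
blank_cols  <- c("blank1", "blank2")
sample_cols <- c("sampleA", "sampleB")
fold_cutoff <- 5  # "Sample max / blank average = 5 fold change"

sample_max <- apply(areas[, sample_cols], 1, max)
blank_mean <- rowMeans(areas[, blank_cols])

# Assumption: a feature with a blank average of 0 is always kept.
ratio <- ifelse(blank_mean == 0, Inf, sample_max / blank_mean)
kept <- areas[ratio >= fold_cutoff, ]
print(kept$feature)
```

Whether the real comparison uses ">=" or ">" is something I have not checked.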

Is this a known issue? Am I doing something wrong?

Thank you for your help,
Taylan

6
MS-DIAL / Exporting CorrDec spectra
Dear Dr. Tada and Dr. Tsugawa,

How can I export CorrDec results as an MSP file? I searched the MS-DIAL tutorial, this forum, and Google, but I did not find any information about this.

If I should export using the "Export -> Alignment results" option, how do I know whether I am exporting the CorrDec spectra or the MS2Dec spectra?

Can I download all of the CorrDec spectra from the alignment results at once, or do I need to download them one by one?

Related posts:
http://www.metabolomics-forum.com/index.php?topic=1410.msg4165#msg4165
http://www.metabolomics-forum.com/index.php?topic=1406.msg4146#msg4146

Thank you for your help.

Kind regards,
Taylan
7
MS-DIAL / Re: Correct settings for GNPS/FBMN from MSe data
Dear all,

I'm writing with an update. I found the edgelists generated by MS-DIAL as part of the GNPS export. I combined all four of the edgelists into a single list. Then I imported the CSV into R and ran the following script using the "igraph" package to group the nodes into network components based on annotation.

Code: [Select]
library(igraph)
dfr <- read.csv('GnpsEdge_0_20202251139_COMBINED.csv') # combined edge list from GNPS
edgelist <- dfr[,c("ID1", "ID2")]
# I use "as.character" below because if the vector is integers instead of characters,
# the resulting components output includes ALL integers
# from 1 to the max value in edges. Thus, new nodes can be created artificially. 
edges <- as.character(as.vector(t(edgelist)))
g1 <- graph(edges, directed=FALSE)
comp1 <- components(g1)
comp.membership <- data.frame(cbind(node=as.numeric(names(comp1$membership))
                                  , annotation.group=as.numeric(comp1$membership)))

The idea I had initially was to select one node (at random or by some other criteria) from each of these annotation-based component groups to represent the entire group. Then in Cytoscape I would filter out all other nodes. However, when I tried this manually, I noticed that some of the components in my molecular network were being split up. Apparently this networking thing is more complex than I thought! :-)
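
A minimal sketch of the "one representative node per group" step, assuming a table shaped like comp.membership above. The abundance vector used as the selection criterion is hypothetical. This only picks the representatives; it does not address the splitting of molecular-network components that I ran into.

```r
# Toy stand-in for comp.membership (node ID and annotation-based component).
comp.membership <- data.frame(
  node = c(101, 102, 103, 104, 105),
  annotation.group = c(1, 1, 2, 3, 3)
)
# Hypothetical per-node abundance used as the selection criterion.
node.abundance <- c(50, 900, 10, 300, 200)

# Order by group, then by decreasing abundance, and keep the first row per group.
ord <- order(comp.membership$annotation.group, -node.abundance)
ranked <- comp.membership[ord, ]
representatives <- ranked[!duplicated(ranked$annotation.group), ]
print(representatives$node)
```

The remaining nodes (those not in representatives$node) would be the ones to filter out in Cytoscape.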

Any thoughts on how to approach this?

For now, I'm moving forward without removing nodes based on annotation, and I will see how it goes. I'll have to deal with this issue at some point down the line, and at least now I know the node groupings based on annotation. I'm curious to see what happens with GNPS IINxFBMN.

Thank you,
Taylan
9
MS-DIAL / Re: Correct settings for GNPS/FBMN from MSe data
Dear Dr. Tsugawa,

Thank you again. Using these settings, I was able to convert and process ~400 files in under 4 hours.

I used MS-DIAL to export the data to GNPS, and I ran FBMN. However, it appears that most of the clusters are composed of MS1 features with very similar retention times (see attached figure, nodes colored by retention time). This makes me think that
1) many of the MS1 features correspond to the same compound AND/OR
2) peaks are not being aligned correctly such that the same feature in different samples is being identified as different features.

My hunch is that #1 is the main factor. If I were using XCMS for pre-processing, I would solve this problem by running CAMERA to deconvolute the features. Is there a way to do this in MS-DIAL? There is an "MS2Dec" tab in Analysis parameter settings, but I think what I need is to deconvolute the MS1 features. Is there a way to do this? I'm seeing the CorrDec option, but all of the settings appear to be for MS2.
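
As a rough way to check hunch #1 outside of MS-DIAL, here is a minimal sketch that groups features purely by co-elution. It is far cruder than what CAMERA does (no isotope or adduct logic, no peak-shape correlation), and the feature values and the tolerance are made up.

```r
# Toy feature list: retention time in minutes and m/z (made-up values).
features <- data.frame(
  id = 1:6,
  rt = c(2.00, 2.01, 2.02, 3.50, 3.52, 5.10),
  mz = c(609.145, 610.148, 631.127, 301.035, 302.038, 447.093)
)
rt_tol <- 0.05  # hypothetical tolerance in minutes

# Sort by RT and start a new group whenever the gap to the
# previous feature exceeds rt_tol.
features <- features[order(features$rt), ]
features$rt_group <- cumsum(c(TRUE, diff(features$rt) > rt_tol))
group_sizes <- table(features$rt_group)
print(group_sizes)
```

Large groups would be candidates for "many MS1 features from the same compound", but co-elution alone cannot confirm that.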

Also, in analysis parameters setting, I selected the option "remove features based on blank information" along with "keep removable features and assign the tag". However, I'm not seeing any obvious columns/tags in the exported gnps table. How can I remove features that were tagged based on blank information?

Many, many thanks,
Taylan
10
MS-DIAL / Re: Correct settings for GNPS/FBMN from MSe data
Dear Dr. Tsugawa,

Thank you for your detailed response.

Using the base peak ion chromatograms from blank injections, it looks like the baseline for function 1 (low energy) is around 3,500 and for function 2 (high energy ramp) is around 1,500. So I am thinking about setting A = 5,000 and B = 3,000. Does this sound like a reasonable approach for determining cut-offs?

Concerning the ABF file converter, do you recommend selecting any of the options for my data? Please see attached screenshots.

Thank you,
Taylan
11
MS-DIAL / Correct settings for GNPS/FBMN from MSe data
Hello Community,

I was very excited to find out yesterday that MS-DIAL can be used to process MSe data for feature based molecular networking on GNPS. However, I am a little confused on how to set the correct data processing parameters. I have read through the MS-DIAL tutorial (https://mtbinfo-team.github.io/mtbinfo.github.io/MS-DIAL/tutorial), especially chapter 8. I am also using this paper as a guide: https://doi.org/10.1016/j.foodchem.2019.05.099

First of all, here is what I'm working with:
--Waters .RAW files, converted to mzML or ABF
--Acquired using Waters Xevo G2 QTOF in MSe mode
--ESI (negative ionization)
--Function 1: low collision energy (6V); centroided
--Function 2: high collision energy ramp (20-50V); centroided
--Function 3: lockmass
--mass range: 100-1500 Da (both low and high collision energy functions)
--MS-DIAL v4.12

I'm the most unclear about which "MS method type" to use and how to set up the Experiment file. According to the tutorial (https://mtbinfo-team.github.io/mtbinfo.github.io/MS-DIAL/tutorial#section-8-1), I should be using ‘All-ions with multiple CEs’ and set up the experiment file something like:
ID   MS Type   Start m/z   End m/z   Name   CollisionEnergy   DecTarget(1:Yes, 0:No)
0   SCAN   100   1500   MS1   6   1
1   ALL   100   1500   MS2   20   1
2   ALL   100   1500   MS2   50   1

However, this does not quite make sense, because the 20V and 50V modes are just the two bounds of the ramp. The entire ramp is collected as a single data stream (function 2), not as two separate streams.

An earlier section of the tutorial (https://mtbinfo-team.github.io/mtbinfo.github.io/MS-DIAL/tutorial#section-1-4) also shows "MSE" as an option for MS Type, which makes more sense to me. However, this example only has the first 4 columns of the experiment file: (i.e. ID,   MS Type,   Start m/z,   End m/z). This would suggest that the MS method type should be set to "SWATH-MS or conventional All-ions method".

I am also unclear on how to set up the "DecTarget" part of the experiment file. My chromatograms have many closely-eluting peaks, so I think I need to do deconvolution. Would I need to set DecTarget = 1 for all of the lines in the experiment file? Or just the line corresponding to the low energy channel (i.e. function 1)?

In summary, my questions are:
1) How should I set up the experiment file correctly? Does each line in this file correspond to a data channel (e.g. function 1)?
2) Which "MS method type" should I select?
3) Given what I've described about my data and goals, are there any other data processing parameters that I should pay particular attention to? How about data file conversion parameters?

Thank you for all of your help. I'm very excited by the prospect of being able to do FBMN with our old MSe data!

Taylan

12
XCMS / Re: How to run featureSpectra (or another function) on a subset of samples
Hi Dr. Corey,

I think I got it to work (mostly)! Here's the winning code.
Code: [Select]
j <- grep("NEG_DDA", fileNames(xdata)) # index of DDA files
MS2.file.paths <- fileNames(xdata)[j] # file paths of DDA files
MS2.file.names <- gsub(".*/", "", MS2.file.paths) # names of DDA files
fd <- featureDefinitions(xdata) # extract feature defs as a new object
fv <- featureValues(xdata) # extract feature values as a new object
fv.filtered <- fv[, colnames(fv) %in% MS2.file.names] # filter feature values to include only DDA samples
cp <- chromPeaks(xdata) # extract chromatographic peaks as a new object
cp <- cbind(rowid=seq(nrow(cp)), cp) # add a temporary column to cp for matching with 'peakidx' in feature definitions (suggested by CoreyG)
cp.filtered <- cp[which(cp[,which(colnames(cp) == "sample")] %in% j),] # filter cp to include only DDA samples

peakidx.filtered <- list() # create object to store results of loop below
for(i in 1:length(fd@listData$peakidx)){
     temp <- match(fd@listData$peakidx[[i]], cp.filtered[,"rowid"])
     peakidx.filtered[[i]] <- temp[which(!is.na(temp))]
} # end loop; this filters to include only peakidx in fd that correspond to peaks in DDA samples
fd.filtered <- fd # duplicate original feature definitions
fd.filtered@listData$peakidx <- peakidx.filtered # overwrite peakidx in duplicated fd with filtered peakidx generated by loop above

xdata.filtered <- filterFile(xdata, MS2.file.names, keepAdjustedRtime=TRUE) # create a new xdata object with only DDA samples; correspondence results are removed and will be added back in below

file_factor <- factor(cp.filtered[, "sample"]) # outputs a vector to correspond peak number with sample number
cp.filtered.split <- split.data.frame(cp.filtered, f=file_factor) # splits cp.filtered (a dataframe) into a list of dataframes (one for each remaining DDA sample)

cp.filtered.v2 <- c() # new object for storing results
for(i in 1:length(cp.filtered.split)){
    cp.filtered.split[[i]][,"sample"] <- i
    cp.filtered.v2 <- rbind(cp.filtered.v2, cp.filtered.split[[i]])
} #end loop; renumbers samples in filtered cp list, starting at 1

chromPeaks(xdata.filtered) <- cp.filtered.v2 # GOAL !!!
featureDefinitions(xdata.filtered) <- fd.filtered # GOAL !!!

# export MS1 and MS2 features
filteredMs2Spectra <- featureSpectra(xdata.filtered, return.type = "Spectra")
filteredMs2Spectra <- clean(filteredMs2Spectra, all = TRUE)
filteredMs2Spectra <- formatSpectraForGNPS(filteredMs2Spectra)
writeMgfData(filteredMs2Spectra, paste(run.name, "_ms2spectra_all.mgf", sep=""))

# generate data table (i.e. peak table) in format needed for GNPS/FBMN
setwd(save.dir)
featuresDef <- featureDefinitions(xdata.filtered)
featuresIntensities <- featureValues(xdata.filtered, value = "into")
dataTable <- merge(featuresDef, featuresIntensities, by = 0, all = TRUE)
dataTable <- dataTable[, !(colnames(dataTable) %in% c("peakidx"))]
write.table(dataTable, paste(run.name, "_xcms_all.txt", sep=""), sep = "\t", quote = FALSE, row.names = FALSE) # UPLOAD TO GNPS for FBMN

#export MS2 features only
setwd(save.dir)
filteredMs2Spectra_maxTic <- combineSpectra(filteredMs2Spectra,
                                            fcol = "feature_id",
                                            method = maxTic)
writeMgfData(filteredMs2Spectra_maxTic, paste(run.name, "_ms2spectra_maxTic.mgf", sep="")) # UPLOAD TO GNPS for FBMN
filteredDataTable <- dataTable[which(dataTable$Row.names %in% filteredMs2Spectra@elementMetadata$feature_id),]
write.table(filteredDataTable, paste(run.name, "_xcms_onlyMS2.txt", sep=""), sep = "\t", quote = FALSE, row.names = FALSE) #NOTE: apparently this is the table that can be used with GNPS/FBMN

The two key changes were adding rowid to cp and filtering peakidx to remove peaks not present in cp.filtered (and to renumber the remaining ones by their position in cp.filtered). Incidentally, I did not overwrite row names in cp/cp.filtered, and it didn't seem to cause any problems.
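
To make those two changes easier to follow, here is a minimal base-R sketch on toy data. The small matrix and list stand in for chromPeaks(xdata) and the peakidx column of featureDefinitions(xdata); no xcms objects are involved, and all values are made up.

```r
# Toy chromPeaks-like matrix: 6 peaks from 3 files; files 1 and 3 are the DDA files.
cp <- cbind(mz = c(100, 200, 100, 200, 100, 300),
            sample = c(1, 1, 2, 2, 3, 3))
cp <- cbind(rowid = seq_len(nrow(cp)), cp)  # remember original row positions
j <- c(1, 3)                                 # indices of the files to keep

# Toy peakidx list: each feature points at rows of the ORIGINAL cp.
peakidx <- list(c(1, 3, 5), c(2, 4), 6)

cp.filtered <- cp[cp[, "sample"] %in% j, , drop = FALSE]
# Renumber samples 1..n in the order of the kept files.
cp.filtered[, "sample"] <- match(cp.filtered[, "sample"], j)

# match() returns positions in cp.filtered; NA marks peaks from dropped files.
peakidx.filtered <- lapply(peakidx, function(idx) {
  pos <- match(idx, cp.filtered[, "rowid"])
  pos[!is.na(pos)]
})
str(peakidx.filtered)
```

The point is that peakidx entries are row positions, so after filtering they have to be translated to the new positions rather than only thinned out.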

However, as you can see, I did not write fv.filtered back into xdata.filtered. It seems that featureValues() is a getter but NOT a setter, so
Code: [Select]
> featureValues(xdata.filtered) <- fv.filtered
Error in featureValues(xdata.filtered) <- fv.filtered :
  could not find function "featureValues<-"

I couldn't quite figure out the structure of xdata.filtered, but I'm sure there's some way to overwrite the feature values through subsetting. Nonetheless, even without writing fv.filtered back into xdata.filtered, it seems that the feature values are re-populated somehow after
Code: [Select]
featureDefinitions(xdata.filtered)<-fd.filtered 
I'm not quite sure how this happens. And while the newly populated feature values have the same dimensions as fv.filtered, they are not identical: 
Code: [Select]
> all(fv.filtered == featureValues(xdata.filtered))
[1] FALSE

Visual side-by-side inspection of fv.filtered and featureValues(xdata.filtered) shows consistency for some samples [with very small values in fv.filtered replaced with "NA" in featureValues(xdata.filtered)], but the values for other samples are completely different.
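
Since all() on a comparison that contains NA is hard to interpret, a more systematic comparison than eyeballing might look like the sketch below. The two small matrices are toy stand-ins for fv.filtered and featureValues(xdata.filtered), with made-up values.

```r
# Toy stand-ins for fv.filtered (a) and featureValues(xdata.filtered) (b).
a <- matrix(c(10, 0.5, 30, 40), nrow = 2,
            dimnames = list(c("FT1", "FT2"), c("s1", "s2")))
b <- matrix(c(10, NA, 30, 99), nrow = 2, dimnames = dimnames(a))

one_na  <- xor(is.na(a), is.na(b))          # value present in only one matrix
differs <- !is.na(a) & !is.na(b) & a != b   # both present but unequal

print(which(one_na, arr.ind = TRUE))
print(which(differs, arr.ind = TRUE))
print(colSums(one_na | differs))            # mismatches per sample
```

The per-sample counts in the last line would show whether the differences are concentrated in particular files.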

Looking at str(xdata.filtered), I don't see any obvious objects to replace with fv.filtered. I thought maybe xdata.filtered@featureData@data$totIonCurrent would correspond to the values in featureValues(xdata.filtered), but they don't seem to.
 
Looking at the code for featureValues(), I don't see anything obvious. I may need to look at this code again with rested eyes.

Any idea how to get fv.filtered back into xdata.filtered?

Thanks!
Taylan
14
XCMS / Re: How to run featureSpectra (or another function) on a subset of samples
Update: I tested the code from CoreyG's 1st suggestion for a single file...
Code: [Select]
pks <- chromPeaks(xdata) # extract chromatographic peaks as a separate object
xdata_filtered <- filterMsLevel(as(xdata, "OnDiskMSnExp"), 2L) # extract MS2 data as a separate object
## Split data per file
file_factor <- factor(pks[, "sample"]) #define each sample as a separate factor
pks <- split.data.frame(pks, f = file_factor) # make a separate dataframe for each sample
xdata_filtered <- lapply(as.integer(levels(file_factor)), filterFile, object = xdata_filtered)

## You then need to loop through xdata_filtered and pks for the samples you need. Each entry in xdata_filtered becomes 'x'; and 'pks' is the corresponding entry in the pks list.
n <- 1 # index in xdata_filtered and pks
method = 'closest_mz'

sps <- spectra(xdata_filtered[[n]])
pmz <- precursorMz(xdata_filtered[[n]])
rtm <- rtime(xdata_filtered[[n]])

#https://github.com/sneumann/xcms/blob/557b936967271690140e19224be707d87ea63168/R/functions-XCMSnExp.R#L1877
## Make sure you define all the required parameters i.e. method = 'closest_mz'
res <- vector(mode = "list", nrow(pks[[n]]))
for (i in 1:nrow(pks[[n]])) {
   if (is.na(pks[[n]][i, "mz"]))
     next
   idx <- which(pmz >= pks[[n]][i, "mzmin"] & pmz <= pks[[n]][i, "mzmax"] &
                  rtm >= pks[[n]][i, "rtmin"] & rtm <= pks[[n]][i, "rtmax"])
   if (length(idx)) {
     if (length(idx) > 1 & method != "all") {
       if (method == "closest_rt")
         idx <- idx[order(abs(rtm[idx] - pks[[n]][i, "rt"]))][1]
       if (method == "closest_mz")
         idx <- idx[order(abs(pmz[idx] - pks[[n]][i, "mz"]))][1]
       if (method == "signal") {
         sps_sub <- sps[idx]
         ints <- vapply(sps_sub, function(z) sum(intensity(z)),
                        numeric(1))
         idx <- idx[order(abs(ints - pks[[n]][i, "maxo"]))][1]
       }
     }
     res[[i]] <- lapply(sps[idx], function(z) {
       z@fromFile = fromFile
       z
     })
   }
 }
...and got an error message
Code: [Select]
Error in (function (cl, name, valueClass)  : 
  assignment of an object of class “standardGeneric” is not valid for @‘fromFile’ in an object of class “Spectrum2”; is(value, "integer") is not TRUE
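
One plausible reading of this error: in my copied loop there is no variable named fromFile (it was presumably defined earlier in the xcms function that the loop came from), so R resolves the name to the generic accessor function fromFile() instead. The sketch below reproduces that situation with a toy S4 class; ToySpectrum and the toy generic are stand-ins, not the real Spectrum2 class or the MSnbase accessor.

```r
library(methods)

# Toy S4 class with an integer slot, standing in for the fromFile slot of Spectrum2.
setClass("ToySpectrum", representation(fromFile = "integer"))
# Toy generic with the same name as the slot, standing in for the real accessor.
setGeneric("fromFile", function(object) standardGeneric("fromFile"))

z <- new("ToySpectrum", fromFile = 1L)

# No variable called fromFile exists here, so the name resolves to the
# generic function and the slot assignment is rejected.
err <- tryCatch({
  z@fromFile <- fromFile
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Supplying an integer file index instead is accepted
# (n is the loop index from the post).
n <- 1
z@fromFile <- as.integer(n)
```

If that reading is right, defining fromFile as an integer before the loop (for example the file index n) should get past this particular error.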

Assuming this code does work, I still think I'm missing something. I'm not seeing how looping through this code would produce an output that has a data structure equivalent to calling featureSpectra on xdata. If I loop through the above code for each element in xdata_filtered and pks that correspond to a file with MS2, then each time res will be overwritten by the data from the next file. At any given time, res will have the results from only one file. Any ideas?
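
To illustrate the overwriting concern, here is a minimal sketch of keeping one result slot per file instead of reusing a single res object. The helper spectra_for_file() is a hypothetical stand-in for the loop body above, and the file indices and peak counts are made up. It does not claim to reproduce the structure that featureSpectra returns.

```r
# Hypothetical stand-in for the per-file loop body: one list entry per peak.
spectra_for_file <- function(n, n_peaks) {
  lapply(seq_len(n_peaks), function(i) paste0("file", n, "_peak", i))
}

ms2_files <- c(2, 5)        # hypothetical indices of files that contain MS2 scans
peaks_per_file <- c(3, 2)   # toy peak counts for those files

# One slot per file, so results from earlier files are not overwritten.
res_all <- vector("list", length(ms2_files))
names(res_all) <- as.character(ms2_files)
for (k in seq_along(ms2_files)) {
  res_all[[k]] <- spectra_for_file(ms2_files[k], peaks_per_file[k])
}

# Flatten to a single list across files if one combined object is needed.
res_flat <- unlist(res_all, recursive = FALSE)
print(length(res_flat))
```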


Also, as per my previous post, I'm still stuck at the same place regarding CoreyG's 2nd suggestion.

Thanks everyone!
Taylan
15
XCMS / Re: How to run featureSpectra (or another function) on a subset of samples
Update: I have yet to succeed. I made some headway with CoreyG's 2nd suggestion:
Quote
If that is sounding like too much stuffing around, you can save featureDefinitions, featureValues and chromPeaks. Filter the file to just the DDA samples. Then you can edit these to be internally consistent and write them back to xdata.

I'm getting an error when I try to incorporate the saved featureDefinitions back into the filtered xdata object.
Code: [Select]
j <- grep("NEG_DDA", fileNames(xdata))
MS2.file.paths <- fileNames(xdata)[j]
MS2.file.names <- gsub(".*/", "", MS2.file.paths)

# extracting parts of xdata & filtering them ------------------
fd <- featureDefinitions(xdata)
fv <- featureValues(xdata)
fv.filtered <- fv[, colnames(fv) %in% MS2.file.names]
cp <- chromPeaks(xdata)
cp.filtered <- cp[which(cp[,which(colnames(cp) == "sample")] %in% j),]

# filtering files in xdata -----------------------------------------
xdata.filtered <- filterFile(xdata, MS2.file.names, keepAdjustedRtime=TRUE)

# changing sample numbers in filtered, extracted chromatographic peak object to match those in filtered xdata object
# if the sample numbers do not match, an error is generated
file_factor <- factor(cp.filtered[, "sample"])
cp.filtered.split <- split.data.frame(cp.filtered, f=file_factor)
cp.filtered.v2 <- c()
for(i in 1:length(cp.filtered.split)){
    cp.filtered.split[[i]][,"sample"] <- i
    cp.filtered.v2 <- rbind(cp.filtered.v2, cp.filtered.split[[i]])
} #end for loop

# incorporate filtered peaks back into filtered xdata
chromPeaks(xdata.filtered) <- cp.filtered.v2

featureDefinitions(xdata.filtered) <- fd

The last line above gives the error message:
Code: [Select]
Error in validObject(object) : 
  invalid class “XCMSnExp” object: Some of the indices in column 'peakidx' of element 'featureDefinitions' do not match rows of the 'chromPeaks' matrix!

To overcome this error, I tried the code below, with the same error message resulting.
Code: [Select]
peak.list <- as.numeric(gsub("CP", "", row.names(chromPeaks(xdata.filtered))))
for(i in 1:length(fd@listData$peakidx)){ # this loop removes chromatographic peaks in "fd" that are not present in cp.filtered.v2 (i.e. chromPeaks(xdata.filtered) )
    k <- which(fd@listData$peakidx[[i]] %in% peak.list)
    fd@listData$peakidx[[i]] <- fd@listData$peakidx[[i]][k]
} # end for loop

featureDefinitions(xdata.filtered) <- fd

I'm not sure how to proceed. Ideas?
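
A small diagnostic sketch, on toy data, of what I suspect the validity check is complaining about: the loop above drops peaks that are missing from the filtered matrix, but the surviving entries are still the original row numbers, which can exceed the number of rows left. The assumption that the "CP" row names encode the original row position is mine, and the matrix and list are made up.

```r
# Toy filtered peak matrix: 4 rows survive, but they keep their original names.
cp.filtered <- matrix(1:8, nrow = 4,
                      dimnames = list(c("CP1", "CP2", "CP5", "CP6"), c("mz", "rt")))
# Toy peakidx after dropping entries that are absent from the filtered matrix.
peakidx <- list(c(1, 5), 2, 6)

# Indices are row positions, so anything above nrow(cp.filtered) cannot be valid.
out_of_range <- vapply(peakidx, function(idx) any(idx > nrow(cp.filtered)),
                       logical(1))
print(out_of_range)

# Translating original numbers to current positions via the row names.
orig_num <- as.numeric(sub("CP", "", rownames(cp.filtered)))
remapped <- lapply(peakidx, function(idx) match(idx, orig_num))
print(remapped)
```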




Also, regarding the 1st approach CoreyG suggested, what would I do with the 'res' object that is generated by this code?

Many thanks!
Taylan