
Messages - CoreyG

1
HMDB / Re: Python script to process HMDB xml file
Thanks for sharing, yufree.

I'm sure that will be helpful for people. Plus, it was good to find your paper. I look forward to having a good read later today.
2
XCMS Online / Re: XCMS Long processing time
Hi gsoliman,

I've moved your thread to the XCMS-online board. Unfortunately we have had to lock this board as people keep asking the same questions, but nobody from XCMS-online answers them.
You can see this post (Problems with XCMS Online? Look here), which points to places where you might be able to seek assistance.

If you are looking for other software, there are a lot of MS-DIAL users on this forum.
3
Chromatography / Re: Xcalibur
Hi Milou,

I've never used Xcalibur, so I can't help you from experience.
If you haven't found a solution, you could take a look at some documents from Thermo (Getting Started Guide).
4
R / Re: Data filtering
Hi onursenol,

Great to see ambition like that.
I guess it really depends on what data you are starting with. If it's raw mass spec data, I'd recommend XCMS. There are some tutorials on there and all over the web!
MSnbase works with xcms and contains a lot of useful functions.

I'd recommend taking a look at a recently published paper linked in here. It covers a lot of what's available.

Good luck!
5
Sample preparation / Re: Relative vs absolute quantification
Hi Sebas,

There's no problem in asking basic questions. That's why we are here!

So, ideally you would have had internal standards added from as early a step as possible. Given that you do not have that, it is possible to perform a 'relative quantitation', but there are a lot of caveats. Using internal standards allows a lot more than just quantification.

First thing, it's important that you have performed all the other steps appropriately:
  • The extraction was performed to minimize sample losses or changes in concentration. This is to ensure the signal you measure is as close to proportional to the original concentration as possible (all else being equal).
  • Ensure lipid loading is consistent. 10 mg dry weight can contain very different amounts of lipids depending on the sample type i.e. adipose tissue vs lean muscle. Normalizing to lipid classes isn't going to be accurate unless you are doing an 'apples to apples' comparison.
  • The samples were run in a randomized order, with appropriate quality controls run. If all of samples A were run before samples B, it's possible the intensity differences were due to technical issues, not biological.
  • If you run 'technical' replicates, ensure they are performed independently, starting from the dry matter. Triplicate injections only tell you about HPLC/MS variability; triplicate extractions capture the whole process.

Here's an interesting thing to think about: suppose you did have an internal standard and you measured 20 lipids in a class, calculating every concentration the standard way, concentration = [ISTD conc] * [peak area] / [ISTD area]. If you then normalized to the lipid class total, you would get the same answer as if you had completely ignored the internal standard!
Whenever you take a ratio of concentrations that used the same internal standard, you remove the effect of the internal standard.
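
To make the cancellation concrete, here's a quick toy calculation in R (numbers invented):

area      <- c(100, 250, 400)          # peak areas for 3 lipids sharing one ISTD
istd_area <- 500
istd_conc <- 10
conc <- istd_conc * area / istd_area   # standard ISTD quantitation

conc / sum(conc)   # normalized to the class total
area / sum(area)   # identical result: the constant ISTD factor cancels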

I hope this helps.
6
XCMS / Re: How to run featureSpectra (or another function) on a subset of samples
Great work, Taylan. You've made good progress considering the minimal/incomplete advice I gave. However, I'm sorry you still haven't got a working solution.

I've been travelling and haven't had much internet/computer access. I'll be able to put together a more complete solution mid next week.

Regarding the second solution, it helps to know that peakidx in featureDefinitions corresponds to the row numbers of chromPeaks (not the row.names). This means that deleting rows in chromPeaks will misalign chromPeaks and peakidx.
The simplest solution is to add a column to chromPeaks that records the row number: cp <- cbind(cp, rowid = seq(nrow(cp))). Then filter chromPeaks, and 'match' peakidx against the temporary column: match(fd@listData$peakidx[[i]], cp[, "rowid"]).
This gives you the new peakidx to use. Note that it will probably contain NAs for the missing samples; na.omit() might be useful to get rid of them.
For consistency, it might be good to overwrite the row.names of cp (not sure if it matters).
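
Putting that together, a rough sketch from memory (untested; assuming cp <- chromPeaks(xdata), fd <- featureDefinitions(xdata), and that samples_to_keep is the vector of sample indices you want to retain):

cp <- cbind(cp, rowid = seq_len(nrow(cp)))    # remember the original row numbers
cp <- cp[cp[, "sample"] %in% samples_to_keep, , drop = FALSE]

## Re-map each feature's peakidx to the surviving rows, dropping NAs
new_peakidx <- lapply(fd$peakidx, function(idx) {
  as.integer(na.omit(match(idx, cp[, "rowid"])))
})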

This could take a while, as match doesn't know that rowid is a sorted vector. If cp or fd are large, you'll be doing a whole heap of value lookups.

Regarding solution 1, internally xcms calls a parallel version of lapply (from BiocParallel, if I remember right). So the 'res' returned for each file just gets built up into a list. Looking at the xcms code on GitHub might make this simpler to understand than my rambling  :D

This should get you over the line. If not, I'll be able to give you some more help midweek. It's also a good idea to compare the final featureValues before and after doing this, to make sure the results are consistent for the samples still present.
Again, the above is all from memory, so you might need to use some judgment.

EDIT: looks like some of my code is being scrambled by the wysiwyg editor.
Cheers,
Corey
7
Other / Re: Converting .dat ASCII File with EI-MS spectrum
Hi Biswa,

I'd be happy to take a look, but could you post a smaller section of one of those files?

Do you want to convert the data to another format or extract the spectrum for specific compounds?

Cheers,
Corey
8
XCMS / Re: How to run featureSpectra (or another function) on a subset of samples
Hi Metabolon1,

A great deal of work was put into making the XCMS analysis reproducible. This ensures the same results can be obtained just from the final object, i.e. all the data and 'process history' are contained within it.

I'll suggest a bit of a hack that you can try, which I'm not able to test at the moment, but hopefully it can help get you somewhere. There is almost certainly a proper way to do what you want, but I can appreciate the frustration that can ensue when things don't work the way you want  :D

This is basically following the calls that featureSpectra makes internally (skipping all the checks along the way...)

First, from ms2_spectra_for_all_peaks (https://github.com/sneumann/xcms/blob/557b936967271690140e19224be707d87ea63168/R/functions-XCMSnExp.R#L1804):
pks <- chromPeaks(xdata)
xdata_filtered <- filterMsLevel(as(xdata, "OnDiskMSnExp"), 2L)
## Split data per file
file_factor <- factor(pks[, "sample"])
pks <- split.data.frame(pks, f = file_factor)
xdata_filtered <- lapply(as.integer(levels(file_factor)), filterFile, object = xdata_filtered)

## You then need to loop through xdata_filtered and pks for the samples you need. Each entry in xdata_filtered becomes 'x' below, and 'pks' is the corresponding entry in the pks list (a sketch of this outer loop follows the second block).
sps <- spectra(x)
pmz <- precursorMz(x)
rtm <- rtime(x)

Then, from ms2_spectra_for_peaks_from_file (https://github.com/sneumann/xcms/blob/557b936967271690140e19224be707d87ea63168/R/functions-XCMSnExp.R#L1877):
## Make sure you define all the required parameters first,
## e.g. method <- "closest_mz", and fromFile <- the integer index of
## the current file (it is used below to tag each spectrum's origin)
res <- vector(mode = "list", nrow(pks))
for (i in 1:nrow(pks)) {
  if (is.na(pks[i, "mz"]))
    next
  idx <- which(pmz >= pks[i, "mzmin"] & pmz <= pks[i, "mzmax"] &
                 rtm >= pks[i, "rtmin"] & rtm <= pks[i, "rtmax"])
  if (length(idx)) {
    if (length(idx) > 1 & method != "all") {
      if (method == "closest_rt")
        idx <- idx[order(abs(rtm[idx] - pks[i, "rt"]))][1]
      if (method == "closest_mz")
        idx <- idx[order(abs(pmz[idx] - pks[i, "mz"]))][1]
      if (method == "signal") {
        sps_sub <- sps[idx]
        ints <- vapply(sps_sub, function(z) sum(intensity(z)),
                       numeric(1))
        idx <- idx[order(abs(ints - pks[i, "maxo"]))][1]
      }
    }
    res[[i]] <- lapply(sps[idx], function(z) {
      z@fromFile = fromFile
      z
    })
  }
}
names(res) <- rownames(pks)
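
To tie those two chunks together, here's a rough sketch of the outer loop (untested and from memory; extract_ms2_for_file is just a hypothetical wrapper around the block above, trimmed down to method = 'closest_mz'):

extract_ms2_for_file <- function(x, pks, fromFile = 1L) {
  sps <- spectra(x)
  pmz <- precursorMz(x)
  rtm <- rtime(x)
  res <- vector(mode = "list", nrow(pks))
  for (i in seq_len(nrow(pks))) {
    if (is.na(pks[i, "mz"]))
      next
    idx <- which(pmz >= pks[i, "mzmin"] & pmz <= pks[i, "mzmax"] &
                   rtm >= pks[i, "rtmin"] & rtm <= pks[i, "rtmax"])
    if (length(idx)) {
      if (length(idx) > 1)   # keep the scan with the closest precursor m/z
        idx <- idx[order(abs(pmz[idx] - pks[i, "mz"]))][1]
      res[[i]] <- lapply(sps[idx], function(z) {
        z@fromFile <- fromFile
        z
      })
    }
  }
  names(res) <- rownames(pks)
  res
}

## One list of MS2 spectra per remaining file; fromFile tags each
## spectrum with the index of the file it came from
ms2_by_file <- mapply(extract_ms2_for_file,
                      x = xdata_filtered, pks = pks,
                      fromFile = seq_along(xdata_filtered),
                      SIMPLIFY = FALSE)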

If that is sounding like too much stuffing around, you can save featureDefinitions, featureValues and chromPeaks, filter them down to just the DDA samples, then edit them to be internally consistent and write them back to xdata.

After all that, hopefully someone who knows what they are talking about comes in to give you the proper directions  ;)
9
MS-DIAL / Re: MS2 spectrum missing peak information
Hi Larissa,

I haven't used MS-DIAL or mzmine2 enough to give you definite answers about the software.
However, I do see that the precursor masses are different between the two images. Not by much, but enough to suggest that they aren't showing the same DDA product scans. I assume this is why they differ, i.e. they are showing different scans at different retention times, and hence different abundances of coeluting peaks.

If the software is showing the 'best product ion scan' that matches a library, the difference would come down to how each program has implemented spectral similarity. There are many parameters that can be tuned, and I guess the authors have settled on slightly different ones.

I hope that helps.
Cheers,
Corey

11
Announcements / Re: Announcement of Opportunity
Congratulations to the EMN committee of 2019-2020!

Very happy to introduce the faces of the EMN committee for 2019-2020 👩‍🎓👨‍🎓. Our members are based across the world and they love all things metabolomics! Watch this space for all the exciting projects we are working on!

EMN Metabolomics Society (@EMN_MetSoc)
EMN Metabolomics Society Facebook Page
12
Conferences and seminars / EMN Webinar 20th November 2019 (15:00 UTC)
Coming up 20th November 2019 at 15:00 UTC (7:00 PST, 10:00 EST, 16:00 CET)!

Metabolomics as a tool for elucidating plant growth regulation

Rising demand for food and fuels makes it crucial to develop breeding strategies for increasing crop yield/biomass. Plant biomass production is tightly associated with growth and relies on tight regulation of a complex signaling network that integrates external and internal stimuli. The main goal of our group is to elucidate the processes underlying plant growth and biomass production by combining physiology, metabolomics, and gene expression analyses. In my presentation, I will provide examples of i) how the evolutionarily conserved Target of Rapamycin pathway fine-tunes metabolic homeostasis to promote biosynthetic growth in plants; and ii) the potential of metabolite profiles to serve as biomarkers predicting plant performance.

Click here to learn more about this latest webinar by Camila Caldana (PhD).

Please register for "Metabolomics as a tool for elucidating plant growth regulation", to be held on 20th November 2019 at 15:00 UTC (7:00 PST, 10:00 EST, 16:00 CET), at: https://register.gotowebinar.com/register/5446228940397431052

After registering, you will receive a confirmation email with information about joining the webinar.

Brought to you by the EMN of the Metabolomics Society.

13
XCMS Online / Problems with XCMS Online? Look here
Dear Forum members and visitors,

We have been very happy to see users supporting each other in solving issues with XCMS Online. However, the number of users with detailed knowledge of XCMS Online has dwindled, which has left many questions on the forum unanswered.

To help those going forward, we recommend directing specific questions to the XCMS Online Contact Form.
It might also help to look at the online documentation.

For questions related to the R version of XCMS, please ask them in this board.
14
Mass spectrometry / Re: Experimental design for multiple-batch study
Hi djb17,

I feel I wasn't so clear in my last message.

Let's imagine we have 2 groups. We don't know if Metabolite X is different between the two groups, so we decide to measure it.
We run the first group in one batch and get an average concentration of 10.
We run the second group separately and get an average concentration of 20.

Does that mean the second group has twice the concentration of the first? That depends on how reproducible (accurate) our measurement is.
Basically, we don't know if the difference we saw is due to the biological difference in groups or the way we ran the experiment (batches).

Someone suggests 'normalizing' the batches to each other. So we multiply the first batch by 1.5 to get an average of 15, and divide the second batch by 4/3 to also get an average of 15. Now the batches are normalized, but we no longer see any difference between the groups: the normalization has erased the biological signal along with any batch effect.

If we had randomized the samples beforehand, we might instead get an average of 15 in the first batch, because half the samples from group 1 are in there alongside half the samples from group 2.
What if the batch 2 average was 30?
We might say the reproducibility wasn't great, so we'll normalize the two batches: for simplicity, we'll divide batch 2 by 2, so its average is also 15.

Now we can compare our two groups, and we see that the group 1 average is 10 and the group 2 average is 20. Success! We fixed a technical issue and didn't lose our biological information.
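
If it helps to see it with numbers, here's a toy simulation of that logic in R (invented data):

set.seed(42)
group <- rep(c(1, 2), each = 20)      # group 1 ~10, group 2 ~20
batch <- sample(rep(c(1, 2), 20))     # samples randomized across 2 batches
x <- rnorm(40, mean = ifelse(group == 1, 10, 20), sd = 1)
x[batch == 2] <- x[batch == 2] * 2    # batch 2 reads twice as high

## Scale batch 2 back so the two batch means agree
x[batch == 2] <- x[batch == 2] * mean(x[batch == 1]) / mean(x[batch == 2])

tapply(x, group, mean)   # still ~10 and ~20: the biology is preserved

Because each batch contains a random mix of both groups, matching the batch means removes only the technical offset.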

This has become standard practice, and as you brought up, so has running pooled QC samples throughout the run.
This provides extra insurance and the ability to adjust for within-batch variability/drift.
However, the pooled sample does not need to be made from the samples you are running! It just has to be representative, i.e. don't run human plasma as a QC alongside algae extract.

We run 3 types of QC samples (not counting blanks): a pooled plasma QC (which we use for all our cohorts), a pre-extracted pooled QC that we use to monitor system stability, and a pooled reference sample from NIST.

I hope that helps a bit.
If someone can explain it better, please chip in.
15
Mass spectrometry / Re: Experimental design for multiple-batch study
Hi djb17,

It is always a good idea to randomize samples across batches. There are special cases where you might want to perform block randomization, such as what Jan described.
I guess you could say that the goal is to make systematic/technical variation orthogonal to biological variation.

Given that, how different are the distributions?
Say there are 3 conditions to which all the samples were randomly assigned. It would be fine to have one batch with 30/40/50 samples from each condition (assuming they are run in a randomized order). However, a distribution like 50/0/60 wouldn't be good, and 10/0/100 would be even worse.
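
A quick way to sanity check this in R (a toy sketch with invented assignments):

batch     <- sample(rep(1:2, each = 55))                  # 110 samples, 2 batches
condition <- sample(rep(c("A", "B", "C"), c(30, 40, 40)))
table(batch, condition)   # look for empty or badly skewed cells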

For pooled QCs: it's a good idea to create the pooled samples and put them alongside the biological samples as soon as possible. This ensures they capture as much of the technical variation as possible, i.e. all the variation from sample storage, defrosting, extraction, running etc.
This is especially important when batches are processed separately (extracted separately or stored in different freezers).

Cheers,
Corey