Recent Posts
2
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Last post by CoreyG -
Quote from: metabolon1
On another matter, I’m wondering how to cut down on the number of features while still maintaining a low intensity threshold. Currently I have ~13,000 features. My goal is to be able to get my peak table into EZinfo, which is not able to handle my 1950x13,000 peak table. I am interested in minor compounds, so I don’t just want to filter by intensity. I have a few ideas, and I would love it if anyone could offer feedback.
Depending on what you are trying to do, you could take the median peak area across each set of triplicates. That will cut your sample rows to roughly a third.
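Something like this, perhaps (untested sketch in base R; 'pt' is your injections-by-features peak table and 'sampleID' labels which sample each injection belongs to; both are placeholder names):
Code: [Select]
## Collapse each set of triplicate injections to its median peak area.
## Rows of 'pt' are injections; 'sampleID' has one label per injection.
ptMed <- apply(pt, 2, function(feature) tapply(feature, sampleID, median, na.rm = TRUE))
dim(ptMed)  # ~570 rows (samples) instead of ~1950 (injections)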

Alternatively, you could filter the dataset to remove isotopes. Depending on the average number of carbons in your metabolites and on signal/abundance, you might be able to reduce the dimensions 2-4 fold. The same can be done by removing adducts.
I've briefly played around with CAMERA for this, but ended up using mz.unity.
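For reference, the CAMERA route looks roughly like this (just a sketch, assuming a grouped xcmsSet object 'xset' from the classic interface):
Code: [Select]
library(CAMERA)
xsa <- xsAnnotate(xset)    # wrap the grouped xcmsSet
xsa <- groupFWHM(xsa)      # group co-eluting peaks into pseudospectra
xsa <- findIsotopes(xsa)   # annotate [M], [M+1], [M+2], ... isotope peaks
pl  <- getPeaklist(xsa)
## Keep only unannotated features and monoisotopic ([M]) peaks
keep <- pl$isotopes == "" | grepl("[M]", pl$isotopes, fixed = TRUE)
plReduced <- pl[keep, ]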

As a less useful suggestion, is it possible to use R for your data analysis?
For a lot of multivariate analysis, mixOmics does pretty well. The website has a lot of examples and the inbuilt plotting functions have come a long way.
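For instance, a quick PCA of a peak table takes only a couple of lines (sketch; 'X' is a samples-by-features matrix and 'batch' a grouping factor, both placeholder names):
Code: [Select]
library(mixOmics)
pca.res <- pca(X, ncomp = 2, center = TRUE, scale = TRUE)  # 2-component PCA
plotIndiv(pca.res, group = batch, legend = TRUE)           # score plot coloured by group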

Quote from: metabolon1
One idea is to set strict filters during correspondence to cut down on the number of “false” features. I tried re-running XCMS on this same dataset but using different params. In particular, for correspondence, I changed the way sample groups are defined. Previously, all samples were in the same group. This time, I defined each sample as its own group (so ~600 groups). For PeakDensityParam, I set minFraction=2/3 & minSamples=2. My thinking was that a true feature would be present in all 3 injections of a sample, but I set the cutoff to 2 out of 3 to be on the safe side. In this way, I hoped to eliminate false features. At any rate, the correspondence step took much longer than before, and I ran out of memory before the script was completed. I tried a couple times with the same result.
My thoughts on this differ from many in the 'untargeted' scene. I'm really only interested in features that are present in nearly all samples (<10% missing values). So, I always ask whether people expect to see features entirely missing from certain samples/groups.

The nice thing about XCMS is that you can set these parameters fairly loose early in the workflow. Then after fillChromPeaks, you can be more stringent.
With so many samples, I would imagine that seeing the same feature in multiple groups is almost a certainty. So maybe put every sample in one group, but set minFraction=10/600 (or something of that sort), as in the sketch below.
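In the newer xcms interface that could look something like this (sketch only; 'xdata' is your XCMSnExp object and the numbers are illustrative, not recommendations):
Code: [Select]
library(xcms)
## One group for all files, but a feature only has to be detected in a small
## fraction of them; keep it loose here, then filter harder after fillChromPeaks.
pdp <- PeakDensityParam(sampleGroups = rep(1, length(fileNames(xdata))),
                        minFraction  = 10/600)
xdata <- groupChromPeaks(xdata, param = pdp)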

I'd love to hear other people's thoughts on this, as well.

Quote from: metabolon1
Another idea is to filter out less informative markers based on variation of each feature across my sample set. My idea is to calculate the coefficient of variation for each marker across the entire dataset, and then exclude any markers below a certain CV value. I understand that p value and fold-change are often used for this kind of filtering, but as I understand it, these only make sense if the dataset contains multiple experimental groups. I don’t have any groups in my dataset; this is just an exploratory analysis. Does anyone have knowledge of or experience with filtering in this way? Any papers that you can suggest? How to determine an appropriate cutoff value for CV?

Thanks!
This is certainly a way you could go.
Perhaps there is a way to empirically determine a good CV cutoff?
If CV is mostly related to biological grouping, then you could determine the difference in CV when all injections are used compared to when you have averaged the triplicates. Determine the threshold CV by permuting the biological grouping and repeating the process (you will end up averaging non-triplicates randomly). Whatever the 95th percentile is, that is your critical value.
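Something along these lines (rough, untested sketch; 'pt' is the injections-by-features table and 'sampleID' the triplicate labels, both placeholder names):
Code: [Select]
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

## Per-feature CV after averaging injections within each (possibly permuted) group
cvAfterAveraging <- function(labels) {
  means <- apply(pt, 2, function(f) tapply(f, labels, mean, na.rm = TRUE))
  apply(means, 2, cv)
}

cvTrue <- cvAfterAveraging(sampleID)  # true triplicate grouping
set.seed(1)                           # reproducible permutations
cvPerm <- replicate(100, cvAfterAveraging(sample(sampleID)))  # random "triplicates"
cvCrit <- quantile(cvPerm, 0.95, na.rm = TRUE)  # 95th percentile = critical value
keep   <- cvTrue > cvCrit             # features varying beyond the permuted null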
3
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Last post by CoreyG -
Quote from: metabolon1
Hello folks,

Here's an update from my end.

I returned from vacation to find CoreyG's helpful responses. It turns out that I was not using "value='into'". I changed this param, and now my data look much better.
Glad to hear I could be of help.

Quote from: metabolon1
I've been using the Brunius batchCorr package, because I already know how to use R. However, given the characteristics of my dataset, I wonder if it is adequate.

Characteristics:
-- ~1950 files representing ~570 plant extracts (triplicate injection) + QC samples
-- 13 batches
-- All extracts are from the same species
-- The QC sample is an extract of about 40 accessions pooled together. However, it looks quantitatively different than most of the extracts in the sample set: the later eluting peaks of the QC sample are generally bigger while the early peaks are smaller. I don't think there are many qualitative differences between QC and other samples. However, I can imagine that these might translate into presence/absence differences in the peak table for minor compounds.
The differences between QCs and samples shouldn't be that big of a deal.
Depending on what batch correction method you use, you can assess the improvement in CV (RSD) of the QC features to determine how useful the batch correction was. Now, if the batch correction method optimized itself based on minimizing QC variation, then this approach is biased. Cross-validation should then be used to assess performance.

A simple visualization is to plot the pre-corrected CVs on the x-axis and the post-corrected CVs on the y-axis. Points that fall below the diagonal were improved; points on the diagonal weren't affected; points above the diagonal were negatively affected.
This may be an easy way to get a 'gut' feel for what method works best for you.
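In base R that plot is just (sketch; 'cvPre'/'cvPost' are the per-feature CVs of the QC injections before and after correction, placeholder names):
Code: [Select]
plot(cvPre, cvPost, pch = 16, col = rgb(0, 0, 0, 0.3),
     xlab = "CV before correction", ylab = "CV after correction")
abline(0, 1, col = "red")  # points below the diagonal were improved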

Quote from: metabolon1
-- The extracts--other than QC--are not standardized by concentration or by equivalent weight of plant material. There is a range of weight of plant material that was extracted. Nonetheless, I do have for each sample the weight of plant material extracted and the weight of solvent used for extraction. From these values, I have generated a sample:solvent correction factor.
-- This is a pilot dataset and was not intended for publication.

My thinking is, now that the batch correction has been done, the next step is to apply the sample:solvent correction factor. The simplest thing to do would be, for each feature in a sample, divide the peak area value by the correction factor for that sample. However, I realize that detector response may not be linear in the range of interest for each feature; thus, the results may not be completely accurate. Nonetheless, I can't think of a better option. Any feedback on my approach?

This is a fairly common approach. Of course, you should always try to keep the sample:solvent ratio as consistent across all samples as possible. Remember that different sample:solvent ratios will cause variability in extraction efficiency, ionization and detector response.

If you are concerned about introducing associations into your data, consider using a linear model to remove the correction factor.
Get the residuals from lm(peakArea~correctionFactor). This allows the detector response to not be 1:1, but doesn't do much for non-linearity.
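A sketch of that per-feature model ('pt' is the samples-by-features table and 'cf' the per-sample correction factor; placeholder names, untested):
Code: [Select]
ptAdj <- apply(pt, 2, function(y) {
  fit <- lm(y ~ cf, na.action = na.exclude)  # na.exclude keeps row alignment
  residuals(fit) + mean(y, na.rm = TRUE)     # re-centre on the feature mean
})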
4
XCMS / Re: Using xcmsSet getting Error in R_nc4_close: NetCDF: Not a valid ID
Last post by CoreyG -
Hi Dominic,

The issue you are having was recently noted on the xcms GitHub page: "NetCDF: Not a valid ID error when using CentOS with NetCDF library 4.6.2".

It looks like Johannes has already fixed the issue. You can install the patched version using the command:
Code: [Select]
devtools::install_github("sneumann/xcms", ref = "RELEASE_3_8")

Hopefully that fixes the issue for you!
5
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Last post by metabolon1 -
On another matter, I’m wondering how to cut down on the number of features while still maintaining a low intensity threshold. Currently I have ~13,000 features. My goal is to be able to get my peak table into EZinfo, which is not able to handle my 1950x13,000 peak table. I am interested in minor compounds, so I don’t just want to filter by intensity. I have a few ideas, and I would love it if anyone could offer feedback.

One idea is to set strict filters during correspondence to cut down on the number of “false” features. I tried re-running XCMS on this same dataset but using different params. In particular, for correspondence, I changed the way sample groups are defined. Previously, all samples were in the same group. This time, I defined each sample as its own group (so ~600 groups). For PeakDensityParam, I set minFraction=2/3 & minSamples=2. My thinking was that a true feature would be present in all 3 injections of a sample, but I set the cutoff to 2 out of 3 to be on the safe side. In this way, I hoped to eliminate false features. At any rate, the correspondence step took much longer than before, and I ran out of memory before the script was completed. I tried a couple times with the same result.

Another idea is to filter out less informative markers based on variation of each feature across my sample set. My idea is to calculate the coefficient of variation for each marker across the entire dataset, and then exclude any markers below a certain CV value. I understand that p value and fold-change are often used for this kind of filtering, but as I understand it, these only make sense if the dataset contains multiple experimental groups. I don’t have any groups in my dataset; this is just an exploratory analysis. Does anyone have knowledge of or experience with filtering in this way? Any papers that you can suggest? How to determine an appropriate cutoff value for CV?

Thanks!
6
Other / Re: Peak alignment with large dataset (over 2500 samples and growing)
Last post by metabolon1 -
Hello folks,

Here's an update from my end.

I returned from vacation to find CoreyG's helpful responses. It turns out that I was not using "value='into'". I changed this param, and now my data look much better.

I've been using the Brunius batchCorr package, because I already know how to use R. However, given the characteristics of my dataset, I wonder if it is adequate.

Characteristics:
-- ~1950 files representing ~570 plant extracts (triplicate injection) + QC samples
-- 13 batches
-- All extracts are from the same species
-- The QC sample is an extract of about 40 accessions pooled together. However, it looks quantitatively different than most of the extracts in the sample set: the later eluting peaks of the QC sample are generally bigger while the early peaks are smaller. I don't think there are many qualitative differences between QC and other samples. However, I can imagine that these might translate into presence/absence differences in the peak table for minor compounds.
-- The extracts--other than QC--are not standardized by concentration or by equivalent weight of plant material. There is a range of weight of plant material that was extracted. Nonetheless, I do have for each sample the weight of plant material extracted and the weight of solvent used for extraction. From these values, I have generated a sample:solvent correction factor.
-- This is a pilot dataset and was not intended for publication.

My thinking is, now that the batch correction has been done, the next step is to apply the sample:solvent correction factor. The simplest thing to do would be, for each feature in a sample, divide the peak area value by the correction factor for that sample. However, I realize that detector response may not be linear in the range of interest for each feature; thus, the results may not be completely accurate. Nonetheless, I can't think of a better option. Any feedback on my approach?
7
XCMS / Using xcmsSet getting Error in R_nc4_close: NetCDF: Not a valid ID
Last post by DominicLam -
Dear Maintainers,

For a few days now I have been getting the error message "Error in R_nc4_close: NetCDF: Not a valid ID" when using the "xcmsSet" function from the {xcms} package, running R 3.5.3 in RStudio on Windows 10 (64-bit).

Our code for generating the chromatogram list from the raw CDF data of our GC-MS analysis and creating the xcmsSet from that list:

### Load required packages
library(xcms)
library(BiocParallel)

### List chromatograms
lchrom <- list.files(path = "./cdf/", recursive = F, full.names = T, pattern = ".CDF")

### Adjust order of chromatograms to 1, 2, 3, ...
temp <- unlist(strsplit(lchrom, c("G")))[c(FALSE, TRUE)]              # keep the part after "G"
temp <- unlist(strsplit(temp, c("."), fixed = TRUE))[c(TRUE, FALSE)]  # drop the ".CDF" extension
temp <- as.numeric(temp)
lchrom <- lchrom[order(temp)]

### Start xcms analysis
system.time(
  xsgc <- xcmsSet(lchrom[1], method = "matchedFilter", step = 0.5, fwhm = 2,
                  snthresh = 3, max = 10000, BPPARAM = SnowParam(workers = 3))
)


After calling xcmsSet, we received the following error message:
"Error in R_nc4_close: NetCDF: Not a valid ID"

With R 3.4.3 and xcms 3.0.2, everything worked as always.

Our loaded packages are:
Biobase 2.42.0
BiocGenerics 0.28.0
BiocParallel 1.16.6
CAMERA 1.38.1
data.table 1.12.2
MSnbase 2.8.3
mzR 2.16.2
parallel 3.5.2
ProtGenerics 1.14.0
Rcpp 1.0.1
S4Vectors 0.20.1
stats4 3.5.2
xcms 3.4.4
8
Job opportunities / Postdoctoral position in NMR metabolomics of stem cell differentiation
Last post by Elena Legrand -
Postdoc position in NMR metabolomics of stem cell differentiation:

"A Postdoc position is available at the Department of Chemistry and CICECO-Aveiro Institute of Materials, at the University of Aveiro, in Portugal, for an estimated duration of 24 months.


The NMR metabolomics and tissue engineering groups are looking for a highly motivated doctorate researcher to be contracted to work on the development of a bioreactor based on stem cell metabolic markers for the guided production of bone tissue. This work is funded by the Operational Program Competitiveness and Internationalization, in its FEDER/FNR component, and the Portuguese Foundation for Science and Technology (POCI-01-0145-FEDER-028835), through the BIOIMPLANT project on “A Metabolomics-guided Bioreactor for Improved Engineered Bone Implants”.


Our lab combines significant expertise in NMR metabolomics (Professor Ana M. Gil) and in the field of state-of-the-art tissue engineering strategies (Professor Joao Mano). This project pioneers the use of metabolomics to guide stem cell differentiation into the osteogenic lineage, in order to find metabolic biomarkers of differentiation performance. The interdisciplinary nature of the project calls for a candidate ideally with a biochemical orientation and some experience, or at least a strong interest, in NMR methodology.


Potential candidates should hold a doctorate degree in Chemistry, Biochemistry, Biotechnology, Biomedical Engineering or Bioengineering (or related scientific area).


Application procedure

Candidates may find the full description and conditions of the call at http://www.eracareers.pt/opportunities/index.aspx?task=showAnuncioOportunities&jobId=112541&lang=pt&idc=1 or directly through the link https://www.ua.pt/sgrhf/PageText.aspx?id=15052. The candidate will be led to a University of Aveiro webpage listing the various calls open at present. On this page the candidate should search for 1) “requerimento de candidatura” to find the formal application form and 2) “procedimento para instrução de candidatura” to find general application instructions. Both documents are written in Portuguese, and an application form in English can be found at the end of this email. In case you need help with the application procedure, please contact concursosDL57-2016@ua.pt or agil@ua.pt.


The application may be submitted in Portuguese or in English and should be sent by email to the Human Resources Department of the University of Aveiro (concursosDL57-2016@ua.pt) by the 7th of May 2019 (24:00). Please indicate “Application to CDL-CTTRI-108-ARH/2019 call” in the subject line."

9
Conferences and seminars / EMN Webinar 23/24th April!
Last post by Elena Legrand -
Check out the next EMN Webinar!

Session 3 (2019): Expert Speaker Dr Hiroshi Tsugawa, 40 min presentation followed by 10 min Q/A
23rd April 2019 at 23:00 UTC (23:00 GMT, 18:00 EST, April 24th 8:00 Japan time)

Register here!


Computational mass spectrometry in metabolomics to deepen the understanding of metabolisms

Computational mass spectrometry is a growing research field that processes mass spectrometry data, assists the interpretation of mass fragmentations, and elucidates unknown structures with metabolome databases and repositories for the global identification of metabolomes in various living organisms. In this talk, Dr Tsugawa will introduce three metabolomics software programs: (1) MS-DIAL for untargeted metabolomics, (2) MS-FINDER for structure elucidation of unknowns, and (3) MRMPROBS for targeted metabolomics. These programs are demonstrated on comprehensive analyses of primary metabolites, lipids, and plant specialized metabolites, where unknown metabolites are also untangled with various methodologies including stable-isotope-labeled organisms, metabolite class recommendations, and integrated metabolome network analyses. In addition, a computational workflow linking untargeted and targeted metabolomics is also highlighted in this talk.
Speaker details: Dr Hiroshi Tsugawa received his PhD from Osaka University, Japan, in 2012 (in metabolomics for bioengineering) and moved to RIKEN in October 2012. He now belongs to two laboratories within the RIKEN institute: 1) the metabolome informatics research team at the RIKEN Center for Sustainable Resource Science and 2) the laboratory for metabolomics at the RIKEN Center for Integrative Medical Sciences, where he studies lipid chemical biology. Dr Tsugawa develops tools for computational mass spectrometry in metabolomics, which are distributed on the RIKEN PRIMe website (http://prime.psc.riken.jp/).
10
Job opportunities / Postdoctoral Fellow in Mass Spectrometry-based Metabolomics
Last post by eppsd -
Post: Postdoctoral Fellow in Mass Spectrometry-based Metabolomics
Department: IU Bloomington Public & Environmental Affairs

Full text: https://indiana.peopleadmin.com/postings/6871

Position Summary:
We are recruiting a postdoctoral fellow in Mass Spectrometry-based Metabolomics at Indiana University Bloomington to participate in an exciting, new collaborative project called PhyloTox, which seeks to identify the evolutionary origins of molecular toxicity pathways. Using transcriptomics and metabolomics data collected from a group of model species/cells exposed to a carefully selected suite of chemicals, biological insights will be drawn from the perturbation of entire genetic and biochemical networks via chemical ablation. The ultimate goal of the project is to develop a novel precision environmental health program to help solve the enormous environmental health crisis caused by environmental pollution.

For this position, we seek a postdoctoral fellow in Mass Spectrometry Metabolomics to focus on the application of metabolic phenotyping across six model organisms/cell lines, applying primarily non-targeted LC-MS strategies. The post-holder will perform sample preparation applying manual and robotic approaches, LC-MS instrument maintenance and operation to acquire highly reproducible data in a high-throughput laboratory, and metabolite identification. They will contribute to study design and analytical method development in cutting-edge biomedical computational and statistical analysis.

The project involves working with several PIs and laboratories and, thus, collaborative skills and results-oriented project management are required. The position is localized at Indiana University Bloomington in the School of Public and Environmental Affairs, Department of Chemistry, Department of Environmental Health, and Department of Intelligent Systems Engineering. The position also includes a unique training opportunity under the guidance of Prof Mark Viant in the world-class metabolomics facility of the University of Birmingham, UK (https://www.birmingham.ac.uk/staff/profiles/biosciences/viant-mark.aspx). This position will either be assigned to the Department of Chemistry or the School of Public and Environmental Affairs, whichever is the best fit for the successful candidate.

Questions regarding the position or application process can be directed to: Drs. Joseph Shaw (joeshaw@indiana.edu) or Stephen Jacobson (jacobson@indiana.edu).

To apply: Interested candidates should review the application requirements and submit their application online: https://indiana.peopleadmin.com/. The application should consist of a cover letter stating your accomplishments and interest in the project’s research, curriculum vitae, and letters of support from at least two references. Review of applications will begin immediately and continue until the position is filled. Applications received by November 15, 2018 will receive full consideration.