Recent Posts

1
Our latest version SIRIUS 3.5 comes with several improvements. Download it from https://bio.informatik.uni-jena.de/software/sirius/.

We have a new overview tab for CSI:FingerID hits, which displays results of structure search for multiple molecular formulas. You can examine the predicted fingerprint of each compound (and molecular formula) independently of any database. We now offer the possibility to create and search in custom structure databases (suspect screening). Besides, we have a new Bayesian networks scoring function for CSI:FingerID which considers dependencies between different molecular properties. This and much more.

Note in passing: The CSI:FingerID web service has just passed the mark of processing data from 500,000 query compounds -- congratulations to CSI:FingerID, and thank you for your interest in our tools! (Be reminded that CSI:FingerID should be accessed via the SIRIUS application, not via the web page.)

If you have comments, ideas for improvements, feature requests, etc., please reply to this post.
2
XCMS / Re: sequential addition of files to an xcms object
Last post by cbroeckl -
Thanks edmandsw,

I am currently using a custom script for feature grouping, which simply assumes that if retention times overlap and mass windows (plus a given ppm error) overlap, they represent the same compound. After each iteration, I apply a retention time adjustment as well. The tried-and-true XCMS functions are just far more widely used and validated than my internal tools, so I was hoping there would be a way to utilize them that I had missed. I will look at the package you are developing - it does look interesting! Are you aware of the skyline/panorama tool sets? I know they are also working on vendor-neutral QC monitoring. Interested to see how yours compares.
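
A minimal sketch of the overlap rule described above, not the actual script: two features are treated as the same compound when both their retention time windows and their ppm-widened m/z windows overlap. The helper name `features_overlap`, the list layout and the example values are all assumptions made for illustration.

```r
## Hypothetical helper: do two features overlap in both m/z and retention time?
## Each feature is a list with mzmin, mzmax, rtmin and rtmax.
features_overlap <- function(a, b, ppm = 10) {
  # widen each m/z window by the ppm tolerance
  a_lo <- a$mzmin * (1 - ppm / 1e6)
  a_hi <- a$mzmax * (1 + ppm / 1e6)
  b_lo <- b$mzmin * (1 - ppm / 1e6)
  b_hi <- b$mzmax * (1 + ppm / 1e6)
  # two intervals overlap when each one starts before the other ends
  mz_overlap <- a_lo <= b_hi && b_lo <= a_hi
  rt_overlap <- a$rtmin <= b$rtmax && b$rtmin <= a$rtmax
  mz_overlap && rt_overlap
}

## Invented example features (m/z in Da, retention time in seconds)
f1 <- list(mzmin = 300.1000, mzmax = 300.1010, rtmin = 120, rtmax = 130)
f2 <- list(mzmin = 300.1020, mzmax = 300.1030, rtmin = 128, rtmax = 140)
f3 <- list(mzmin = 300.1020, mzmax = 300.1030, rtmin = 200, rtmax = 210)

same_12 <- features_overlap(f1, f2, ppm = 10)  # windows overlap in m/z and rt
same_13 <- features_overlap(f1, f3, ppm = 10)  # retention times do not overlap
```

With these values, `f1` and `f2` are merged only because of the ppm tolerance; with `ppm = 0` their m/z windows no longer touch.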

Corey
3
XCMS / Re: sequential addition of files to an xcms object
Last post by edmandsw -
You might be interested in an R package simExTargId I have been developing for real-time metabolomic experiment monitoring (with email notification) and MS/MS target identification. It makes use of xcms and CAMERA and sequentially concatenates xcmsSet objects as data is collected. It is still in development and has a few rough edges but it has been used regularly in our lab (it currently works for Agilent .d and Thermo .raw/.RAW data files).
4
XCMS / Re: sequential addition of files to an xcms object
Last post by edmandsw -
As far as I'm aware this isn't possible. Retention time correction with the obiwarp method, for example, has to be reassessed with a new centre sample.
See the help file
Code: [Select]
?retcor.obiwarp
#center
#the index of the sample all others will be aligned to. If center==NULL, the sample with the most peaks is chosen as default
So the retention time deviation for each file has to be re-calculated.
Additionally, for grouping, the minfrac and minsamp arguments will be affected by the additional samples in each group as you concatenate.
Code: [Select]
?group.density
#minfrac
#minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
#minsamp
#minimum number of samples necessary in at least one of the sample groups for it to be a valid group
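
To make the minfrac point concrete, here is a toy calculation. It only mirrors the documented meaning of minfrac and minsamp quoted above and is not xcms's internal code; `passes_filter`, the group names and the counts are invented for illustration.

```r
## Toy illustration (not xcms code): is a feature a valid group, i.e. does it
## meet minfrac and minsamp in at least one sample group?
passes_filter <- function(n_found, n_total, minfrac = 0.5, minsamp = 1) {
  # n_found: samples per group in which the feature was detected
  # n_total: total samples per group
  any(n_found / n_total >= minfrac & n_found >= minsamp)
}

## Feature found in 3 of 4 KO samples and in no WT sample: 3/4 >= 0.5
before <- passes_filter(c(KO = 3, WT = 0), c(KO = 4, WT = 4))

## After concatenating 4 more KO files without the feature: 3/8 < 0.5
after <- passes_filter(c(KO = 3, WT = 0), c(KO = 8, WT = 4))
```

The same feature is kept before the extra files are added and dropped afterwards, which is why grouping has to be repeated on the full set.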

Computationally speaking, it shouldn't be too much of a big deal to re-do the retention time correction and grouping each time you peak-pick additional files. The most time-consuming part is definitely the xcmsSet function.
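
A rough sketch of that loop, assuming `xs_old` is an xcmsSet that has been peak-picked but not yet aligned or grouped. `add_and_regroup` is a hypothetical helper name and the default minfrac and minsamp values are placeholders; the xcms package must be installed for the function to run.

```r
## Hypothetical helper: peak-pick only the new files, then redo retention
## time correction and grouping on the combined object.
add_and_regroup <- function(xs_old, new_files, minfrac = 0.5, minsamp = 1) {
  # peak picking is the slow step, so it is run on the new files only
  xs_new <- xcms::xcmsSet(new_files)
  # combine the previously picked samples with the new ones
  xs_all <- c(xs_old, xs_new)
  # redo obiwarp alignment; with the default center = NULL the sample
  # with the most peaks is used as the centre sample
  xs_all <- xcms::retcor(xs_all, method = "obiwarp")
  # regroup so that minfrac and minsamp are evaluated on all samples
  xcms::group(xs_all, method = "density",
              minfrac = minfrac, minsamp = minsamp)
}
```

It would be called once per batch of newly acquired files, passing the returned object's ungrouped predecessor (or a freshly combined one) back in as `xs_old`.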
5
XCMS / Re: No console messages xcmsSet (xcms v1.50.1)
Last post by edmandsw -
Fantastic, thanks for getting back to me so quickly. I also read Martin Morgan's explanation and it is now clear to me (although probably quite superficially) how BiocParallel works. :))
6
XCMS / Re: No console messages xcmsSet (xcms v1.50.1)
Last post by johannes.rainer -
I now have an explanation from Martin Morgan (https://support.bioconductor.org/p/96856/). Basically, you can use the progress bar, but you have to increase the number of tasks so that it is updated more frequently. Note, however, that a) the number of tasks should not be larger than the number of files you're processing and b) there might be a performance decrease with too many tasks.
So, in your case you could:
Code: [Select]
library(faahKO)
library(xcms)
library(BiocParallel)
library(snow)

## The directory with the NetCDF LC/MS files
cdfpath <- file.path(find.package("faahKO"), "cdf")

setwd(cdfpath)

## Register the parallel processing setting - will be used by default by all xcms methods
## Set tasks to a reasonable number
register(SnowParam(tasks = 10, progressbar = TRUE))

peakmatrix <- xcmsSet()

Now, for your 400 file experiment you might want to increase the number of tasks to get more frequent callbacks and updates of the progress bar.
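
One possible way to pick the number of tasks under the two constraints mentioned above (not more tasks than files, and not too many tasks overall). `choose_tasks` and the `tasks_per_worker` factor of 4 are assumptions for illustration, not a BiocParallel recommendation.

```r
## Hypothetical helper: number of tasks for a given number of files and workers
choose_tasks <- function(n_files, n_workers, tasks_per_worker = 4) {
  # a few tasks per worker give more frequent progress bar updates,
  # but never use more tasks than there are files
  max(1, min(n_files, n_workers * tasks_per_worker))
}

tasks_400 <- choose_tasks(n_files = 400, n_workers = 4)  # capped by workers
tasks_12  <- choose_tasks(n_files = 12,  n_workers = 4)  # capped by file count
```

The result could then be passed as the `tasks` argument of `SnowParam()` in the `register()` call shown above.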

Hope this helps.

cheers, jo
7
XCMS / Re: No console messages xcmsSet (xcms v1.50.1)
Last post by johannes.rainer -
Dear Will,

with the switch to BiocParallel the progress information is no longer printed immediately - this seems to be related to the way BiocParallel handles the sub-processes. I know that is annoying, but there is not much we can do within xcms.

I'll get in contact with the BiocParallel developers to check if we can fix that.

cheers, jo
8
XCMS / No console messages xcmsSet (xcms v1.50.1)
Last post by edmandsw -
Hi,

It is probably something minor/trivial, but since updating to xcms3 (v1.50.1), in which the nSlaves argument was deprecated in favour of the BPPARAM argument from BiocParallel, I am no longer receiving the lovely, reassuring progress messages printed to the R console. The strange thing is that in previous versions of xcms the progress messages would appear soon after initiation of the xcmsSet function, regardless of the number of mzXML files in the directory.
I have tried the progressbar argument of the SnowParam function, but it was stuck at 0% for over an hour.
With the ~200 mzXML files I am currently peak-picking, I did not receive any console message for over two hours:
[Attachment: metabForum_20170608.PNG]
Then all of a sudden there were messages previously typical of a single-threaded process:
[Attachment: metabForum_20170608_2.PNG]
However, I was able to check that the multi-threaded process was running by monitoring the CPU usage.

When the dataset size is small as is the case for faahKO, the progress messages appear much sooner. Here is a reproducible but perhaps not very useful example:
Code: [Select]
library(faahKO)
library(xcms)
library(BiocParallel)
library(snow)

## The directory with the NetCDF LC/MS files
cdfpath <- file.path(find.package("faahKO"), "cdf")

setwd(cdfpath)

snowparam <- SnowParam(workers = parallel::detectCores(), type = "SOCK")

peakmatrix <- xcmsSet(BPPARAM = snowparam)

Is it necessary to DIY/jury-rig your own progressCallback function now?

Code: [Select]
cdffiles <- list.files(cdfpath, recursive = TRUE)
progress <- function(n) cat(paste0(n, ' of ', length(cdffiles),
                                   ' complete (', basename(cdffiles)[n],
                                   ').\n'))
peakmatrix <- xcmsSet(BPPARAM = snowparam, progressCallback = progress)
Although this didn't work as expected either.

Many thanks in advance,

Will

Code: [Select]
>sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252 
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                         
[5] LC_TIME=English_United States.1252   

attached base packages:
[1] parallel  stats    graphics  grDevices utils    datasets  methods  base   

other attached packages:
[1] snow_0.4-2          BiocParallel_1.8.1  faahKO_1.14.0      xcms_1.50.1        Biobase_2.34.0   
[6] ProtGenerics_1.6.0  BiocGenerics_0.20.0 mzR_2.8.1          Rcpp_0.12.10     

loaded via a namespace (and not attached):
[1] RANN_2.5              lattice_0.20-34        codetools_0.2-15      MASS_7.3-45         
[5] MassSpecWavelet_1.40.0 grid_3.3.2            plyr_1.8.4            stats4_3.3.2         
[9] S4Vectors_0.12.1      Matrix_1.2-8          splines_3.3.2          RColorBrewer_1.1-2   
[13] tools_3.3.2            survival_2.40-1        multtest_2.30.0     
9
The last webinar "Machine learning powered metabolomic network analysis" by Dr. Dmitry Grapov is now online.

http://metabolomicssociety.org/resources/videos/88-videos/258-2017-emn-webinars-public

10
XCMS / Re: peak shape/symmetry?
Last post by cbroeckl -
I have done filtering for peak width: 

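    ## maxpw: maximum expected peak width in seconds (must be defined beforehand)
    orig <- xset@peaks
    ## keep only peaks narrower than 3 * maxpw
    good <- which((orig[, "rtmax"] - orig[, "rtmin"]) < (3 * maxpw))
    ## drop = FALSE keeps the matrix structure even if a single peak remains
    filt <- orig[good, , drop = FALSE]
    xset@peaks <- filt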

No reason it could not be adapted for shape descriptors.  It has to be done before grouping.
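
For anyone who wants to try the idea without an xcmsSet at hand, here is a self-contained toy version. The `peaks` matrix stands in for `xset@peaks`, and its values and the `maxpw` setting are invented for illustration; any other per-peak descriptor could take the place of `width`.

```r
## Toy stand-in for xset@peaks (values invented for illustration)
peaks <- cbind(mz    = c(200.1, 350.2, 410.3),
               rtmin = c(100, 200, 300),
               rtmax = c(110, 290, 312))
maxpw <- 20  # assumed maximum expected peak width, in seconds

## per-peak descriptor: chromatographic width in seconds
width <- peaks[, "rtmax"] - peaks[, "rtmin"]

## keep peaks narrower than 3 * maxpw; drop = FALSE preserves the matrix
filt <- peaks[width < 3 * maxpw, , drop = FALSE]
```

Here the second peak (90 s wide) is removed and the other two are kept.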
Corey