
Messages - Jan Stanstrup

271
XCMS / Re: Choose the best parameters to analyse UPLC-qTof data
As for your TICs: masswolf incorporates the lockmass scans into the data; those are the spikes you see. Currently the only way to convert the data correctly is to use Waters' DataBridge software to convert to CDF files.
Next, it seems like all your peaks are extremely low. It is often easier to look at the base peak intensity (BPI) instead of the TIC; you can do that in MassLynx too. Do you see more reasonable peaks there? Do you have a reasonably stable signal from the lockspray compound/trace? It doesn't look like it in your chromatogram. You need to establish whether your data quality is reasonable before there is any point in using XCMS.


For XCMS, take a look at the parameters for centWave:
?findPeaks.centWave

In particular, the default peakwidth parameter is not well suited to UPLC; try something like peakwidth=c(2,20).
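As a minimal sketch (the file paths and the ppm value are just placeholders; adjust them to your instrument), a centWave run with a UPLC-friendly peak width could look like this:
Code: [Select]
library(xcms)

## example file list - replace with your own converted CDF/mzXML files
files <- list.files("converted_data", pattern = "\\.CDF$", full.names = TRUE)

## centWave with a peak width range better matched to UPLC peaks
xset <- xcmsSet(files, method = "centWave",
                ppm = 25,              # mass tolerance; adjust to your qTOF
                peakwidth = c(2, 20))  # expected chromatographic peak width in seconds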
272
XCMS / Re: CentWave error
Here you go; the speed improvements are impressive.
I wrote the same file 20 times.

Old way: used 1.39 GB of memory and took 9.83 min
New way: used 0.03 GB of memory and took 3.27 min


So 3 times as fast and the memory leak is gone :) Wonderful!
273
XCMS / Re: CentWave error
Yep... There is a memory leak in write.mzdata. You can demonstrate it by writing the same data over and over again: memory usage keeps increasing. I tried to look at the code some time ago but couldn't find an obvious problem.
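A minimal sketch of that kind of reproduction (the file names are placeholders, and I am assuming write.mzdata takes the raw object followed by the output file name):
Code: [Select]
library(xcms)

xraw <- xcmsRaw("sample1.mzXML", profstep = 0)  # any raw file will do

## write the same data repeatedly and watch memory usage grow between iterations
for (i in 1:20) {
  write.mzdata(xraw, paste0("copy_", i, ".mzData"))  # argument order assumed
}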

I hope someone will find time to fix this issue soon, as I too wanted to use xcms as a batch converter.
I am hoping the writing routine can be made more efficient too. It seems illogical that writing is so many times slower than reading the data, though maybe that is just my lack of understanding of what needs to be done. As I remember it from reading the code, it appeared to be adding one scan at a time to the file; maybe that is not an efficient approach?
274
XCMS / Re: memory error during fillpeak step
That is a lot of samples!!! ~200 days of run time.

You can split your xcmsSet: see ?split.xcmsSet
But the peak group info is discarded, so it won't be trivial to put it back together again after re-grouping and filling. Is there no logical way to divide this data into parts that can be analysed individually?
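A small sketch of how a split could look (the grouping factor here is just an example; use whatever division makes sense for your study):
Code: [Select]
library(xcms)

## split the xcmsSet into two halves - any factor with one entry per sample works
f <- rep(1:2, length.out = length(sampnames(xset)))
xset_parts <- split(xset, f)

## each part then needs its own group()/retcor()/fillPeaks() run, e.g.
xset1 <- fillPeaks(group(xset_parts[[1]]))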
275
XCMS / Re: XCMS2: collect() doesn't work
I have now read the guide you referred to.
It appears that the way collect works was changed (another person apparently downgraded to an older version to make it work: viewtopic.php?f=8&t=287). It now seems to require doing some peak picking first, though the help is very sparse, so I am not much help here.
However, since you want to use the searchMetlin function, the point seems moot at the moment, as that function was apparently removed from xcms: viewtopic.php?f=8&t=130. It is no longer there if you check the list of xcms functions.

See also: https://groups.google.com/forum/?fromgr ... 6_dMrI8vQA
276
XCMS / Re: XCMS2: collect() doesn't work
I am sorry, but I can't be very helpful with these functions as I haven't used them myself. I do see the problem in your example, though.
You are still passing the xcmsSet object (Data.set) to collect instead of the xcmsFragments object (Data.fragments).

Data.fragments is just a new variable, not connected to Data.set; the "." has no special meaning in R.
To better understand what is inside each variable, it is always a good idea to use str(object), as in str(Data.set) or str(Data.fragments).
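As a small sketch, assuming Data.set and Data.fragments were created as in your example:
Code: [Select]
str(Data.set)         # should show an xcmsSet object
str(Data.fragments)   # should show an xcmsFragments object - this is the one to pass to collect()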


Hope this helps.
279
Other / Re: Use several computers to do the calculation?
Did I understand correctly that you have only about 70 samples?

My workstation is just a measly 3 GHz Core 2 Duo, but after I added 16 GB of RAM I can process 500 samples (6 min runs each) without issue; granted, it takes about a day.
The key to being able to process the samples at all is memory; if you don't have enough, the speed will drop to nothing.
The key to getting good speed is, as Steffen said, a fast computer with several cores.

I would go for the cheap solution first and add RAM (something like 16 GB) to the machine you have. RAM is very cheap at the moment; a very fast computer not so much! It really depends on your budget and needs.

And, as Steffen said, use nSlaves if you have several cores, for example:
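(A minimal sketch; the file paths and the number of slaves are placeholders for your setup.)
Code: [Select]
library(xcms)

files <- list.files("cdf_files", pattern = "\\.CDF$", full.names = TRUE)

## peak picking spread over 4 cores
xset <- xcmsSet(files, nSlaves = 4)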


Jan.
280
CAMERA / Re: Way too many ions being assigned to the same compound
Well, I don't think I should be giving advice on statistics... But no, I don't use the assignment at all before I do statistics. So yes, I only use CAMERA for identification after the statistics have told me which features are interesting. I don't use the statistics in xcms, as the studies I am working on have a design that requires more complicated statistics.

It sounds like PLS might be the appropriate statistical tool for your problem.
281
CAMERA / Re: Way too many ions being assigned to the same compound
I think you are misunderstanding how it works.

calcCiS: calculate correlation inside samples.
That means correlation across the peak shape, i.e. is it really coeluting or not?
It is correlation inside the sample, not inside a sample group. This means CAMERA goes back to the raw data and compares extracted ion chromatograms.
The illustration in Carsten's paper shows this: http://pubs.acs.org/doi/abs/10.1021/ac202450g
This will fail if compounds are perfectly coeluting.

calcCaS: calculate correlation across samples.
Two features are correlated if high intensity of feature A means high intensity of feature B. The study design or sample groups are not used for this.
Look at these plots. Each dot is a sample.

Features that are highly correlated between samples
[attachment: cor.png]
Features that are uncorrelated between samples
[attachment: uncor.png]
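To get a feel for what this looks like in your own data, you could plot two features against each other across samples. A rough sketch, assuming a grouped xcmsSet called xset and two feature (row) indices i and j you want to compare (all placeholder names):
Code: [Select]
library(xcms)

ints <- groupval(xset, value = "into")   # feature x sample intensity matrix

i <- 1; j <- 2                           # pick two features to compare
plot(ints[i, ], ints[j, ],
     xlab = "feature i intensity", ylab = "feature j intensity")
cor(ints[i, ], ints[j, ], use = "pairwise.complete.obs")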


These methods are not meant to solve the problem of features not being independent; that is a statistical problem. Even if you could tell perfectly which features come from the same compound, you would still have correlated groups.
These functions are helpful for structure elucidation. For that purpose you would rather include too much in one group than too little.
Would you rather have a group consisting of several compounds, or not be aware that a feature belongs in a group at all? I would choose the former. You will need to manually assess the data in either case, and the adduct annotation will help you greatly in "guesstimating" which are the true pseudo-molecular ions in your group. But it can do nothing if a compound has been split across different groups.

282
CAMERA / Re: Way too many ions being assigned to the same compound
If the features actually are perfectly co-eluting, then the correlation across the peaks would be perfect and calcCiS would not be able to tell that they are different compounds. Have you looked in the raw data to see whether they really are coeluting?
If you have a reasonable number of samples you can try enabling calcCaS, which looks for correlation across samples. In this way features that are perfectly coeluting but not correlating across samples can be separated (the logic being that if two features are from the same compound, then when one is high in a sample the other must be too).
If they are both perfectly coeluting and related in a way that also makes them highly correlated across samples, then there is no magic that will tell you which ones truly belong together; only MSn experiments can determine that.
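A rough sketch of how enabling both correlation criteria could look in a standard CAMERA run (the threshold and the polarity are placeholders; check ?groupCorr for the details):
Code: [Select]
library(CAMERA)

xsa <- xsAnnotate(xset)                 # xset is your grouped/filled xcmsSet
xsa <- groupFWHM(xsa)                   # initial grouping by retention time
xsa <- groupCorr(xsa,
                 calcCiS = TRUE,        # EIC correlation within samples
                 calcCaS = TRUE,        # intensity correlation across samples
                 cor_eic_th = 0.75)
xsa <- findIsotopes(xsa)
xsa <- findAdducts(xsa, polarity = "positive")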
284
XCMS / Re: Targeted Peak Picking
It seems you are generally able to write R scripts so here is just a small hint:

in your xcmsSet object you have the mass for each feature in:
xset@groups[,"mzmed"]

Simply make a loop that runs through each mass in your target list and marks which features give a hit.
Something like:
Code: [Select]
(abs(xset@groups[,"mzmed"]-mass[i])/mass[i])*1E6 < ppm_tol

would give you, for each target mass, a logical vector marking the possible locations of that target.
I didn't test this, but that would be the general, easy way; a slightly fuller sketch is below.
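A minimal, untested sketch of that loop (xset, the target masses and the ppm tolerance are placeholders):
Code: [Select]
feature_mz <- xset@groups[, "mzmed"]

target_mz <- c(181.0707, 255.2330)   # your target masses
ppm_tol   <- 10

## for each target mass, the indices of features within the ppm tolerance
hits <- lapply(target_mz, function(mass) {
  which((abs(feature_mz - mass) / mass) * 1E6 < ppm_tol)
})
names(hits) <- target_mz
hits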

If you only have a single sample you could use xset@peaks instead, which has additional columns you could use for filtering:
Code: [Select]
into    integrated peak intensity
intb    baseline corrected integrated peak intensity
maxo    maximum peak intensity
sn      Signal/Noise ratio, defined as (maxo - baseline)/sd, where
        maxo is the maximum peak intensity,
        baseline the estimated baseline value and
        sd the standard deviation of local chromatographic noise.