load a CDF file in R

I am using your R package xcms to import CDF files (GC/MS data).
Based on page 74 of the manual, I tried both of the following commands, with the path to my data:

xr <- xcmsRaw("C:/path to the folder /m.cdf", profstep = 1, profmethod = "bin", profparam = list(), includeMSn = FALSE, mslevel = NULL, scanrange = NULL)

xr <- xcmsRaw("C:/path to the folder /m.cdf")

Both fail with the following error:

Program: C:\Program Files\RStudio\bin\x64\rsession.exe
File: posixio.c, Line 325

Expression: pxp->bf_offset <= offset && offset < pxp->bf_offset + (off_t) pxp->bf_extent

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
I have tried everything I could think of without finding a solution, which is why I decided to contact you.
If you have any idea how I can fix it, please let me know.

Re: load a CDF file in R

Reply #1
How large is this file?
From googling the error, it seems it might be caused by something running in 32-bit mode. I don't know the internals well enough to venture a guess as to where the problem is or how to fix it.


https://code.zmaw.de/boards/4/topics/468
http://www.aps.anl.gov/epics/tech-talk/ ... g01231.php


EDIT: An idea: if your files are > 2 GB because they are in profile mode, you could probably get the size down by converting them to centroid mode with msconvert from ProteoWizard. You can try that anyway to see whether it is a CDF-specific issue.
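
Both suspicions (file size and 32-bit mode) can be checked from within R. The following is only a sketch in base R: the helper name check_large_file is hypothetical and not part of xcms, and the threshold used is the largest offset a signed 32-bit integer can hold, on the assumption that this is the limit being hit.

```r
# Hypothetical helper (not part of xcms): report whether a file is larger
# than a signed 32-bit offset can address, and whether this R is 64-bit.
check_large_file <- function(path) {
  size_bytes <- file.info(path)$size      # NA if the file does not exist
  limit      <- 2^31 - 1                  # largest signed 32-bit offset
  list(
    size_gb     = size_bytes / 1024^3,
    exceeds_2gb = !is.na(size_bytes) && size_bytes > limit,
    r_is_64bit  = .Machine$sizeof.pointer == 8
  )
}

# Demonstration on a small temporary file; replace tmp with the real CDF path.
tmp <- tempfile(fileext = ".cdf")
writeBin(raw(1024), tmp)
res <- check_large_file(tmp)
print(res)
```

If exceeds_2gb is TRUE for the failing files and FALSE for the ones that load, that would at least be consistent with the 32-bit explanation, although it would not prove it.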
Blog: stanstrup.github.io

Re: load a CDF file in R

Reply #2
Hi Mohammad,

The file you sent seems to be fine on my Ubuntu Linux box.
What is your operating system and R version?
Can you run R from a command line, without RStudio
around it?

Yours,
Steffen


Code:
> library(xcms)
> xr <- xcmsRaw("m.cdf")
> xr
An "xcmsRaw" object with 1029 mass spectra

Time range: 1199.8-1500 seconds (20-25 minutes)
Mass range: 28.8909-501.0926 m/z
Intensity range: 0-4153340

MSn data on  0  mass(es)
with  0  MSn spectra
Profile method: bin
Profile step: 1 m/z (473 grid points from 29 to 501 m/z)

Memory usage: 24.5 MB
--
IPB Halle                          Mass spectrometry & Bioinformatics
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3 06120 Halle     Tel. +49 (0) 345 5582 - 1470
sneumann(at)IPB-Halle.DE

Re: load a CDF file in R

Reply #3
Thanks Steffen,
No need to do anything. I simply used R on its own and checked it on both Unix and Windows, and both are working. I also checked the file (e.g. with mzR) and it seems to be all right.

Re: load a CDF file in R

Reply #4
The problem is still there! This time I tried to load several files: the small ones loaded without any problem, but the others did not get through.
The same error appeared each time.
I tried different platforms, with and without RStudio, and the same error was there!
Any solution?

Re: load a CDF file in R

Reply #5
Hi,
if your installation works in principle, there is little I can think of.
Is this LECO GCxGC data?

If your file has 6,479,713 bytes, that's only about 6 MB, so not huge at all.

For your other question: once you have the xcmsRaw object, you can get the
raw data as a matrix with xr@env$profile, provided you have set profstep = 1,
where 1 is the resolution of the matrix in Da.
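
To illustrate what the step means, the sketch below builds an m/z grid by rounding both ends of the mass range to the nearest multiple of the step. This rounding rule is an assumption, chosen because it reproduces the "473 grid points from 29 to 501 m/z" printed earlier in this thread for a mass range of 28.8909-501.0926. It is not taken from the xcms source, and profile_grid is a hypothetical helper.

```r
# Hypothetical helper: m/z grid for a given mass range and step (in Da).
# Assumption: both ends are rounded to the nearest multiple of the step.
profile_grid <- function(mz_min, mz_max, step = 1) {
  lo <- round(mz_min / step) * step
  hi <- round(mz_max / step) * step
  seq(lo, hi, by = step)
}

grid <- profile_grid(28.8909, 501.0926, step = 1)
length(grid)   # 473
range(grid)    # 29 501
```

Each grid value would then correspond to one row of the matrix, so a smaller step gives more rows and a larger matrix.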

Yours,
Steffen

Re: load a CDF file in R

Reply #6
Hello,

No, it is GC-TOF, not GCxGC, and it is about 7 GB since I ran a long sequence.
I sent you an example by email.
About the data matrix: it gives me a matrix, but I have no idea what is what. Which rows are retention time, which ones are m/z, and so on?

Best,

Re: load a CDF file in R

Reply #7
Hi,

The file you sent loads fine over here, so I suspect something in your installation.
If the smaller files load fine, I suspect RAM issues. How much memory do you have?

Code:
> library(xcms)
Loading required package: mzR
Loading required package: Rcpp
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rep.int, rownames,
    sapply, setdiff, sort, table, tapply, union, unique, unlist

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: ‘xcms’

The following object is masked from ‘package:Biobase’:

    phenoData, phenoData<-

> xr <- xcmsRaw("data2.cdf")
> xr
An "xcmsRaw" object with 13257 mass spectra

Time range: 360-4204.2 seconds (6-70.1 minutes)
Mass range: 14.9984-519.9868 m/z
Intensity range: 0-1384450

MSn data on  0  mass(es)
with  0  MSn spectra
Profile method: bin
Profile step: 1 m/z (506 grid points from 15 to 520 m/z)

Memory usage: 5110 MB
> sessionInfo()
R version 3.0.0 Patched (2013-04-04 r62494)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8      LC_NUMERIC=C             
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8 
 [7] LC_PAPER=C                LC_NAME=C               
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C     

attached base packages:
[1] parallel  stats    graphics  grDevices utils    datasets  methods 
[8] base   

other attached packages:
[1] xcms_1.43.1        Biobase_2.22.0    BiocGenerics_0.8.0 mzR_2.1.10       
[5] Rcpp_0.11.2     

loaded via a namespace (and not attached):
[1] codetools_0.2-14 zlibbioc_1.8.0 

The matrix that you get has the following dimensions:
Code:
> dim(xr@env$profile)
[1]  506 13257

So the 13257 columns correspond to the scans, and the 506 rows to the grid points from 15 to 520 m/z.
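
To get from matrix indices back to m/z and time, each row can be paired with an m/z grid value and each column with a scan time. Below is a minimal sketch on a toy matrix so that it runs without any data file; all values in it are made up. For a real object, the assumption is that profMz(xr) supplies the m/z value of each row and xr@scantime the retention time (in seconds) of each scan.

```r
# Toy stand-ins (made-up values). With xcms the assumed equivalents are:
#   mz_grid  <- profMz(xr)      # one m/z value per matrix row
#   scantime <- xr@scantime     # one retention time (s) per matrix column
#   profile  <- xr@env$profile
mz_grid  <- seq(15, 20, by = 1)
scantime <- c(360.0, 360.3, 360.6, 360.9)
profile  <- matrix(0, nrow = length(mz_grid), ncol = length(scantime))
profile[mz_grid == 18, 2] <- 1200   # pretend m/z 18 peaks in the 2nd scan

# Locate the most intense cell and translate its indices into m/z and time.
hit       <- which(profile == max(profile), arr.ind = TRUE)
peak_mz   <- mz_grid[hit[1, "row"]]
peak_time <- scantime[hit[1, "col"]]
cat("max intensity at m/z", peak_mz, "and", peak_time, "s\n")

# One row is a chromatogram for one m/z bin; one column is one spectrum.
eic      <- profile[mz_grid == 18, ]
spectrum <- profile[, 2]
```

So the time is not stored in the matrix itself: it comes from looking up the column index in the vector of scan times.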

Yours,
Steffen

Re: load a CDF file in R

Reply #8
Hello,

I see. One of my systems has about 8 GB of RAM; do you think it has something to do with that?
I will check on a better system and see if I still have the problem.
Thanks for the dim output. So the matrix only holds the m/z bins per scan. I was confused because I was mainly looking for the time rather than m/z.

Thanks,
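
For a rough feel of the numbers in this thread, the arithmetic can be sketched as below. It takes the 5110 MB that xcms reported for data2.cdf and the 8 GB of RAM mentioned here, and separately estimates the dense 506 x 13257 profile matrix at 8 bytes per cell, which assumes the matrix is stored as doubles. Temporary copies made while loading are not counted, so the real peak usage may well be higher.

```r
reported_mb <- 5110          # "Memory usage" printed for data2.cdf
ram_mb      <- 8 * 1024      # about 8 GB of RAM
fraction    <- reported_mb / ram_mb

# Dense 506 x 13257 matrix, assuming 8-byte doubles
profile_mb <- 506 * 13257 * 8 / 1024^2

cat(sprintf("object uses %.0f%% of RAM; profile matrix alone is about %.0f MB\n",
            100 * fraction, profile_mb))
```

On these assumptions the loaded object alone takes well over half of an 8 GB machine, which leaves little headroom for anything else.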