Hi Dr. Corey,
I think I got it to work (mostly)! Here's the winning code.
j <- grep("NEG_DDA", fileNames(xdata)) # index of DDA files
MS2.file.paths <- fileNames(xdata)[j] # file paths of DDA files
MS2.file.names <- gsub(".*/", "", MS2.file.paths) # names of DDA files
fd <- featureDefinitions(xdata) # extract feature defs as a new object
fv <- featureValues(xdata) # extract feature values as a new object
fv.filtered <- fv[, colnames(fv) %in% MS2.file.names] # filter feature values to include only DDA samples
cp <- chromPeaks(xdata) # extract chromatographic peaks as a new object
cp <- cbind(rowid=seq(nrow(cp)), cp) # add a temporary column to cp for matching with 'peakidx' in feature definitions (suggested by CoreyG)
cp.filtered <- cp[which(cp[,which(colnames(cp) == "sample")] %in% j),] # filter cp to include only DDA samples
peakidx.filtered <- list() # create object to store results of loop below
for(i in 1:length(fd@listData$peakidx)){
temp <- match(fd@listData$peakidx[[i]], cp.filtered[,"rowid"])
peakidx.filtered[[i]] <- temp[which(!is.na(temp))]
} # end loop; this filters to include only peakidx in fd that correspond to peaks in DDA samples
fd.filtered <- fd # duplicate original feature definitions
fd.filtered@listData$peakidx <- peakidx.filtered # overwrite peakidx in duplicated fd with filtered peakidx generated by loop above
xdata.filtered <- filterFile(xdata, MS2.file.names, keepAdjustedRtime=TRUE) # create a new xdata object with only DDA samples; correspondence results are removed and will be added back in below
file_factor <- factor(cp.filtered[, "sample"]) # outputs a vector to correspond peak number with sample number
cp.filtered.split <- split.data.frame(cp.filtered, f=file_factor) # splits cp.filtered (a dataframe) into a list of dataframes (one for each remaining DDA sample)
cp.filtered.v2 <- c() # new object for storing results
for(i in 1:length(cp.filtered.split)){
cp.filtered.split[[i]][,"sample"] <- i
cp.filtered.v2 <- rbind(cp.filtered.v2, cp.filtered.split[[i]])
} #end loop; renumbers samples in filtered cp list, starting at 1
chromPeaks(xdata.filtered) <- cp.filtered.v2 # GOAL !!!
featureDefinitions(xdata.filtered) <- fd.filtered # GOAL !!!
# export MS1 and MS2 features
filteredMs2Spectra <- featureSpectra(xdata.filtered, return.type = "Spectra")
filteredMs2Spectra <- clean(filteredMs2Spectra, all = TRUE)
filteredMs2Spectra <- formatSpectraForGNPS(filteredMs2Spectra)
writeMgfData(filteredMs2Spectra, paste(run.name, "_ms2spectra_all.mgf", sep=""))
# generate data table (i.e. peak table) in format needed for GNPS/FBMN
setwd(save.dir)
featuresDef <- featureDefinitions(xdata.filtered)
featuresIntensities <- featureValues(xdata.filtered, value = "into")
dataTable <- merge(featuresDef, featuresIntensities, by = 0, all = TRUE)
dataTable <- dataTable[, !(colnames(dataTable) %in% c("peakidx"))]
write.table(dataTable, paste(run.name, "_xcms_all.txt", sep=""), sep = "\t", quote = FALSE, row.names = FALSE) # UPLOAD TO GNPS for FBMN
#export MS2 features only
setwd(save.dir)
filteredMs2Spectra_maxTic <- combineSpectra(filteredMs2Spectra,
fcol = "feature_id",
method = maxTic)
writeMgfData(filteredMs2Spectra_maxTic, paste(run.name, "_ms2spectra_maxTic.mgf", sep="")) # UPLOAD TO GNPS for FBMN
filteredDataTable <- dataTable[which(dataTable$Row.names %in% filteredMs2Spectra@elementMetadata$feature_id),]
write.table(filteredDataTable, paste(run.name, "_xcms_onlyMS2.txt", sep=""), sep = "\t", quote = FALSE, row.names = FALSE) #NOTE: apparently this is the table that can be used with GNPS/FBMN
The two key changes were adding rowid to cp and filtering peakidx to remove peaks not present in peakidx. Incidentally, I did not overwrite row names in cp/cp.filtered, and it didn't seem to cause any problems.
However, as you can see, I did not write fv.filtered back into xdata.filtered. It seems that featureValues() is a getter but NOT a setter, so
> featureValues(xdata.filtered) <- fv.filtered
Error in featureValues(xdata.filtered) <- fv.filtered :
could not find function "featureValues<-"
I couldn't quite figure out the structure of xdata.filtered, but I'm sure there's some way to overwrite the feature values through subsetting. Nonetheless, even without writing fv.filtered back into xdata.filtered, it seems that the feature values are re-populated somehow after featureDefinitions(xdata.filtered)<-fd.filtered
I'm not quite sure how this happens. And while the newly populated feature values have the same dimensions as fv.filtered, they are not identical: > all(fv.filtered == featureValues(xdata.filtered))
[1] FALSE
Visual side-by-side inspection of fv.filtered and featureValues(xdata.filtered) shows consistency for some samples [with very small values in fv.filtered replaced with "NA" in featureValues(xdata.filtered)], but the values for other samples are completely different.
Looking at str(xdata.filtered), I don't see any obvious objects to replace with fv.filtered. I thought maybe xdata.filtered@featureData@data$totIonCurrent would correspond to the values in featureValues(xdata.filtered), but they don't seem to.
Looking at the code for featureValues() , I don't see anything obvious. It may need to look at this code again with rested eyes.
Any idea how to get fv.filtered back into xdata.filtered?
Thanks!
Taylan