File Count and Storage Space Count Errors

This error appears on Linux and Mac machines, both viewing through Firefox, and on a Windows 7 machine viewing through Iron (de-Googled Chrome).

First, when uploading relatively large file sets (over 14 files per set), the Data Set Viewing interface does not show all files within the set in its list view. The number of files displayed is also cut off at 14. I have not run many tests on this error because of the time and data it would take to experiment.

The particular data set I encountered this error with contains 19 files, and when using the Create Job page's "Select Stored Dataset" dialogue, the file count listed beside this data set is indeed 19. When viewed under Stored Datasets, the count is 14, and only 14 files are visible.

Second, the storage space percentage reported in the Create Job and File Upload dialogues seems to disagree with the pie chart presented under the Accounts page. The Accounts page says 11% of my account's total storage space is available, yet the Create Job page states 96.2%.

What other information can I provide?

Re: File Count and Storage Space Count Errors

Reply #1
Additionally, after seeing the new entries on the FAQ page, I noticed there are two different messages listed under Dataset File Status:

File read: OK.

or

VALID MD5

When using a dataset made up only of "File read: OK." files, I have been getting errors, but not with datasets that contain only a few "File read: OK." files and mostly "VALID MD5" files. Is there really a functional difference? Where is the line between useful data and corrupted data?

When testing "File read: OK." .d directories with Elgorithm's Chaos MD5 program, I could not select individual .d directories. When I zipped them manually using "Send To Compressed Folder" from the right-click menu on Windows 7, the zipped file's MD5 sum was not the same as that of the zip file on your servers. I am still guessing that the files weren't uploaded properly, but should the sums match if everything worked during the upload?

Thank you

Re: File Count and Storage Space Count Errors

Reply #2
With respect to the file count on dataset uploads, I have started testing to reproduce (and fix) this error.  We are currently working on the upload module and adding some enhanced functionality, so it is possible this issue has already been resolved.  I will post an update after testing is completed.  No additional information is needed on your part, as I have located your account and datasets.

For the available storage capacity, I was able to make programming modifications to correctly calculate available space.

In response to your question about the difference between "File read: OK." and "VALID MD5": when files are uploaded, an MD5 checksum is calculated on your computer prior to upload.  This code is transferred to the server and stored in our database.  After all segments of the upload are transferred and the file is reconstructed on the server, the complete file is used to calculate a second MD5 checksum.  This code is compared with the one calculated on the client prior to upload.  If the codes are identical, the files are, for practical purposes, identical as well, indicating a successful transfer of the complete file without corruption ("VALID MD5").  (This means the file you transferred was received on the server, but it does not say anything about the file itself.)  This is the stage where files can become "corrupted".

XCMS Online currently supports certain file formats (e.g. mzXML, CDF, .d, etc. - see website for complete list).  If other formats are uploaded (e.g. RAW, .D, WIFF, etc.), we may receive them (valid MD5) but the system cannot read or process them.  After a job is started, the files must be "read" into memory.  This stage results in the status message "File read: OK." if the file was in a format the XCMS Online system can interpret.  This is the stage where files could be considered "useless", at least until we implement a converter for the file type in question.
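As a rough illustration of the comparison described above, here is a minimal Python sketch.  It is not the XCMS Online code: the function names, the 4096-byte segment size, and the "MD5 MISMATCH" label are assumptions made up for the example.  Only the "VALID MD5" status comes from the system itself.

```python
import hashlib
from typing import List


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of a byte string as hex."""
    return hashlib.md5(data).hexdigest()


def split_into_segments(data: bytes, segment_size: int) -> List[bytes]:
    """Cut a file into upload segments (hypothetical segment size)."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def verify_upload(client_md5: str, segments: List[bytes]) -> str:
    """Rebuild the file from its segments and compare checksums.

    "MD5 MISMATCH" is a made-up label for the failure case.
    """
    rebuilt = b"".join(segments)
    return "VALID MD5" if md5_hex(rebuilt) == client_md5 else "MD5 MISMATCH"


# Client side: checksum calculated before the upload starts.
payload = b"example file contents" * 1000
client_md5 = md5_hex(payload)
segments = split_into_segments(payload, 4096)

# Server side: every segment arrived, so the checksums agree.
print(verify_upload(client_md5, segments))

# Server side: the last segment is missing, so the checksums differ.
print(verify_upload(client_md5, segments[:-1]))
```

A matching checksum only says the bytes arrived intact.  Whether the file is in a readable format is a separate check.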

The Chaos MD5 program is only capable of calculating an MD5 checksum for a single file.  The MD5 checksum we calculate is based on a single file as well: the zipped version of the .d directory.  For mzXML or CDF files, we zip those as well prior to calculating the MD5.  It is possible to calculate the MD5 on the non-zipped version prior to zipping but we have found this to be overkill at present (because most errors we have seen do not relate to zipping/unzipping).  The MD5 codes should match what was uploaded (assuming they are the same).  One of the features our development team is implementing is the ability to upload a single file to be included in a dataset.  This will likely help where a large dataset has a single problematic file (because the user will not have to upload all the dataset files again).
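One general point about zip archives may be relevant when comparing a locally re-zipped .d directory against a stored zip: an archive records metadata such as file timestamps, so two zips of the same directory are often not byte-identical even when the contents are.  Whether that explains the mismatch reported in this thread is not known.  The Python sketch below only illustrates the effect, and the file names and timestamps in it are made up.

```python
import hashlib
import io
import zipfile
from typing import Dict, Tuple


def zip_bytes(files: Dict[str, bytes], date_time: Tuple[int, ...]) -> bytes:
    """Zip the given files in memory, stamping each entry with date_time."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


def content_md5s(archive: bytes) -> Dict[str, str]:
    """Return the MD5 of each file stored inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {n: hashlib.md5(zf.read(n)).hexdigest() for n in zf.namelist()}


# Hypothetical contents of a .d directory.
files = {
    "sample.d/AcqData/data.bin": b"\x00\x01" * 500,
    "sample.d/info.xml": b"<info/>",
}

# Same contents, zipped with two different entry timestamps.
zip_a = zip_bytes(files, (2012, 1, 1, 12, 0, 0))
zip_b = zip_bytes(files, (2012, 1, 2, 12, 0, 0))

archives_match = hashlib.md5(zip_a).hexdigest() == hashlib.md5(zip_b).hexdigest()
contents_match = content_md5s(zip_a) == content_md5s(zip_b)

print("archive MD5s match:", archives_match)
print("content MD5s match:", contents_match)
```

In this sketch the archive checksums differ while the per-file checksums agree, so comparing the files inside the archives is a more reliable local check than comparing two separately created zips.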

Thank you for using the system and for your patience as we refine it; it is still in beta.

Duane

Re: File Count and Storage Space Count Errors

Reply #3
I was able to reproduce and isolate the problem related to file counts - it appears to be related to resuming uploads (the ability to return to the page and resume a partially completed upload).  I have temporarily disabled this functionality until I can test further.

I recently enabled the ability to add individual files to a dataset.  If a single file is corrupt, you can delete this file and upload again to the same dataset.  There is a button on the "Stored Datasets" page to accomplish this.  You will need to select a dataset to see the "Add Files" button at the bottom of the page.  Alternatively you can edit an individual dataset by clicking the link on the "View Results" page.

Note: This code has been tested internally but not yet externally.  In addition, the user manual does not document this feature yet.

Duane