[UPDATE] Research Fellow: Data Science / Biostatistics / Bioinformatics

December 20, 2016, 03:17:25 PM

As described in a previous message, I am currently recruiting a Data Scientist to fill a senior postdoctoral research fellow position. Applicants should have extensive experience (PhD+) in one or more of the following: Biostatistics, Bioinformatics, Cheminformatics, Data Management, or Machine Learning, and must be proficient programmers in a standard scripting language (preferably Python). The ability to write Matlab, R, or SAS scripts would be a bonus. http://tinyurl.com/gw63a4w

Update:

I have had a few inquiries about the above position, and it seems that potential applicants are uncomfortable with the idea of not being given a specific project to work on. There are many areas of metabolomics data analysis that either require radically new approaches or significant further research, or are simply not addressed at all. So, should anyone need a nudge in the right direction, here are a few projects that I will be engaging with in the near future and with which I would like some help.

1) Next-generation software for removing within-batch and between-batch systematic variation, building on the success of my QC-RSC algorithm (http://link.springer.com/article/10.1007/s00216-013-6856-7) and moving to a hierarchical multivariate generalized model.
2) Radically reinventing the data analysis "workflow": moving away from the depressingly constraining, linear "data dredging" (algorithm dredging) approach of current online solutions (Galaxy, KNIME, MetaboAnalyst) towards a directed, nonlinear workflow based on study design. The aim is to identify the correct statistical model with which to analyse your data, based on clear assumptions about data type and distribution (similar to scikit-learn https://goo.gl/images/B2yrCk and Laerd Statistics https://statistics.laerd.com/features-selecting-tests.php).
3) A non-linear approach to the multi-block problem: integrating multi-system data using neural networks.
4) Feature selection and model optimization using evolutionary computation (GAs/GPs).
5) Non-linear, cluster-based time-series analysis: investigating metabolite sub-trajectories as an alternative to cross-sectional biomarkers, leading to N-of-1 personalized medicine.
6) Web-based interactive data visualization of complex system interactions using a combination of network analysis, pathway mapping and classical visualization techniques.
7) Developing web-based training resources on Experimental Design and Data Analysis using Python and Jupyter.

There are many more floating around, but hopefully this will give people an idea of the opportunities in my lab over the next 5 years or so.

Merry Christmas.

David.