
Experimental design for multiple-batch study

Hi all,

I have been thinking about setting up an experiment with around 600 samples (3 batches of 200) and 3 different conditions. Would it be optimal to set up the batches such that the proportions of the 3 conditions are the same across the different batches?

Thank you.

Re: Experimental design for multiple-batch study

Reply #1
Yes...
If you have "paired samples" where you are not interested in the differences within the pairs, it can reduce variance to put them in the same batch. For example, if you have several persons, you might consider keeping all time-course samples from the same person in the same batch.

But as you say: spread the conditions evenly across batches and randomize the run order within each batch.
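As a minimal sketch of that advice (in Python, with made-up sample IDs and condition labels): stratify by condition, deal each condition round-robin into the batches so the proportions match, then shuffle the run order within each batch.

```python
import random

random.seed(42)  # reproducible layout

# Hypothetical study: 600 samples, 200 per condition A/B/C.
samples = [(f"S{i:03d}", cond) for i, cond in
           enumerate(["A", "B", "C"] * 200)]

# Stratified assignment: shuffle within each condition, then deal
# samples round-robin into 3 batches so proportions stay equal.
batches = {1: [], 2: [], 3: []}
for cond in ("A", "B", "C"):
    group = [s for s in samples if s[1] == cond]
    random.shuffle(group)
    for i, s in enumerate(group):
        batches[i % 3 + 1].append(s)

# Randomize the injection/run order within each batch.
for content in batches.values():
    random.shuffle(content)

for b, content in batches.items():
    counts = {c: sum(1 for _, cond in content if cond == c) for c in "ABC"}
    print(b, len(content), counts)  # each condition appears 66-67 times per batch
```

With 200 samples per condition and 3 batches the counts cannot be perfectly equal (200 is not divisible by 3), so each batch ends up with 66 or 67 samples of each condition, which is as balanced as it gets.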

Use plenty of QC samples so that you can afterwards correct for potential intra- and inter-batch drift.
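To illustrate the inter-batch part of such a correction, here is a toy sketch with made-up intensities: scale each batch so that its QC median matches the overall QC median. (Real pipelines often do something more elaborate, e.g. fitting a LOESS curve to the QCs over injection order to also handle within-batch drift; this only shows the basic idea.)

```python
import statistics

# Hypothetical intensities of one feature; QC injections per batch.
batches = {
    1: {"qc": [100, 104, 98],  "samples": [90, 110, 95]},
    2: {"qc": [130, 128, 133], "samples": [120, 140, 125]},
}

# Overall QC level across all batches is the common target.
all_qc = [v for b in batches.values() for v in b["qc"]]
target = statistics.median(all_qc)

# Scale each batch so its QC median hits the target,
# removing the inter-batch offset.
corrected = {}
for b, data in batches.items():
    factor = target / statistics.median(data["qc"])
    corrected[b] = [x * factor for x in data["samples"]]

print(corrected)
```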
Blog: stanstrup.github.io

Re: Experimental design for multiple-batch study

Reply #2
Thanks for the reply.

It's not an experiment where we have 'paired samples', so we are okay with respect to that.

I wonder if it would be okay to have an experimental setup where:

  • There are multiple batches.
  • Within each batch, samples with different conditions are not evenly distributed.
  • A pooled QC sample is created from small amounts of all samples in the study. These QC samples are analyzed in addition to the study samples in each batch (which ideally would allow correction for inter- and intra-batch drift).

After the batch correction (inter + intra) using QC samples, will the quality of the data be okay given that the distribution of the conditions was not even?

Thanks again.


Re: Experimental design for multiple-batch study

Reply #3
Hi djb17,

It is always a good idea to randomize samples across batches. There are special cases where you might want to perform block randomization, such as what Jan described.
I guess you could say that the goal is to make systematic/technical variation orthogonal to biological variation.

Given that, how different are the distributions?
Say there are 3 conditions to which all samples were randomly assigned. It would be fine to have one batch with 30/40/50 samples from each condition (assuming they are run in a randomized order). However, it wouldn't be good to have a distribution of something like 50/0/60. Even worse would be 10/0/100.
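One quick pre-run sanity check, sketched here with the hypothetical counts from above: flag any batch in which a condition is missing entirely, because that condition's effect can then no longer be separated from the batch effect.

```python
# Per-batch condition counts (condition 1/2/3), using the
# hypothetical distributions discussed above.
layouts = {
    "fine":       [30, 40, 50],
    "bad":        [50, 0, 60],
    "even_worse": [10, 0, 100],
}

def confounded(counts):
    # A zero count means that condition never appears in this batch,
    # so condition and batch effect are (partially) confounded.
    return any(n == 0 for n in counts)

for name, counts in layouts.items():
    print(name, "confounded" if confounded(counts) else "ok")
```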

For pooled QCs: it's a good idea to create the pooled samples and put them with the biological samples as soon as possible. This ensures they capture as much technical variation as possible, i.e. all the variation from sample storage, defrosting, extraction, running, etc.
This is especially important when batches are processed separately (extracted separately or stored in different freezers).

Cheers,
Corey

Re: Experimental design for multiple-batch study

Reply #4
Quote from Corey:
"Given that, how different are the distributions? Say there are 3 conditions to which all samples were randomly assigned. It would be fine to have one batch with 30/40/50 samples from each condition (assuming they are run in a randomized order). However, it wouldn't be good to have a distribution of something like 50/0/60. Even worse would be 10/0/100."

I'm talking about the extreme case (e.g. 10/0/100). So is it correct to say that, even if the pooled QC was created from all of the samples in the experiment, the resulting data would have issues because of the unbalanced distribution of the samples (after correction, normalization, etc...)?

If this is the case, what happens during the experiment as a result of the imbalance that leads to unusable/low-quality data?

Thank you so much for your feedback! Learning so much from you all  :D

Re: Experimental design for multiple-batch study

Reply #5
Hi djb17,

I feel I wasn't so clear in my last message.

Let's imagine we have 2 groups. We don't know if Metabolite X is different between the two groups, so we decide to measure it.
We run the first group in one batch and get an average concentration of 10.
We run the second group separately and get an average concentration of 20.

Does that mean the second group has twice the concentration of the first? That depends on how reproducible (accurate) our measurement is.
Basically, we don't know if the difference we saw is due to the biological difference in groups or the way we ran the experiment (batches).

Someone suggests 'normalizing' the batches to each other. So we multiply the first batch by 1.5 to get an average of 15, and multiply the second batch by 3/4 to also get an average of 15. Now the batches are normalized, but we don't see any difference between the groups.

If we had randomized the samples beforehand, we might get an average of 15 in the first batch, because half the samples from group 1 were in there together with half the samples from group 2.
What if the batch 2 average was 30?
We might say the reproducibility wasn't great, so we'll normalize the two batches; for simplicity, we'll divide batch 2 by 2, so the average is 15.

Now we can compare our two groups and we see that group 1 average is 10 and group 2 average is 20. Success! We fixed an issue and didn't lose our biological information.
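The two scenarios above can be replayed numerically. A toy sketch using the averages from the example: group 1 truly averages 10, group 2 truly averages 20, and batch 2 inflates every measurement by a factor of 2.

```python
# True group means: group 1 = 10, group 2 = 20.
# Batch effect: batch 2 doubles every measurement.

def mean(xs):
    return sum(xs) / len(xs)

target = 15  # common level we normalize each batch to

# Confounded design: group 1 entirely in batch 1, group 2 in batch 2.
batch1 = [10, 10]            # group 1 only
batch2 = [20 * 2, 20 * 2]    # group 2 only, inflated by batch effect
b1 = [x * target / mean(batch1) for x in batch1]
b2 = [x * target / mean(batch2) for x in batch2]
print(mean(b1), mean(b2))    # both 15.0 -- group difference erased

# Randomized design: each batch holds one sample from each group.
batch1 = [10, 20]            # group 1, group 2
batch2 = [10 * 2, 20 * 2]    # same, inflated by batch effect
b1 = [x * target / mean(batch1) for x in batch1]
b2 = [x * target / mean(batch2) for x in batch2]
group1 = [b1[0], b2[0]]
group2 = [b1[1], b2[1]]
print(mean(group1), mean(group2))  # 10.0 and 20.0 -- difference preserved
```

In the confounded layout the normalization has no way of telling batch effect from biology, so it removes both; in the randomized layout the batch factor cancels and the true 10-vs-20 difference survives.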

This has become standard practice, and, as you brought up, so has running pooled QC samples throughout the run.
This provides extra insurance and the ability to adjust for within-batch variability/drift.
However, the pooled sample does not need to be made from the samples you are running! It just has to be representative, i.e. don't run human plasma as a QC alongside algae extract.

We run 3 types of QC samples (not counting blanks):

  • 1 pooled plasma QC (which we use for all our cohorts).
  • 1 pre-extracted pooled QC that we use to monitor system stability.
  • 1 pooled reference sample from NIST.

I hope that helps a bit.
If someone can explain it better, please chip in.