The problem a review committee is solving is "estimating" the quality of the submitted papers. It is highly likely that the actual qualities follow a power-law distribution, and in that case the observed randomness is probably an effect of that.
A small fraction of the submitted papers are of very high quality. For those, a small number of review samples is enough to confidently filter them in. In this zone, the larger quality spacing from paper to paper also tolerates a higher measurement error without harm.
As we go down toward the knee of the distribution, it's a crowded zone with many papers of very similar quality. These are hard to differentiate: the sequence can look like 6.124, 6.122, 6.117, 6.114, 6.113, where a slight measurement error is enough to change the ranking.
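A toy simulation illustrates the contrast. The quality values and the Gaussian per-review noise level below are assumptions for illustration, not real data: with the same review budget per paper, two independent committees almost always agree on the order of well-separated top papers but disagree on the crowded knee.

```python
import random

random.seed(1)

# Hypothetical quality values (not real data): a few well-separated
# top papers versus a crowded "knee" where papers are ~0.005 apart.
TOP = [9.5, 8.4, 7.6]
KNEE = [6.124, 6.117, 6.113]
NOISE_SD = 0.5   # assumed per-review measurement error
N_REVIEWS = 3    # same review budget for every paper

def ranking(qualities):
    """Rank papers by the mean of N_REVIEWS noisy scores, best first."""
    means = [sum(q + random.gauss(0, NOISE_SD) for _ in range(N_REVIEWS)) / N_REVIEWS
             for q in qualities]
    return tuple(sorted(range(len(qualities)), key=lambda i: -means[i]))

def agreement(qualities, trials=2000):
    """Fraction of trials where two independent committees agree on the full order."""
    return sum(ranking(qualities) == ranking(qualities)
               for _ in range(trials)) / trials

print(f"top-tier agreement: {agreement(TOP):.0%}")
print(f"knee agreement:     {agreement(KNEE):.0%}")
```

The gaps in the top tier are an order of magnitude larger than the noise of a 3-review average, so its ranking is stable; the knee gaps are an order of magnitude smaller, so its ranking is essentially a coin flip.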
If the 10% of papers in the experiment all received the same number of reviews (measurement samples), this outcome is likely. For ordinary papers we actually need more review samples to be confident in the score estimate; otherwise the measurement error will change the ranking and show up as "randomness of the committee".
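Extending the same toy model (again with made-up quality values and an assumed noise level), averaging over more reviews shrinks the error of the mean by roughly the square root of the sample count, so committees on closely spaced mid-pack papers agree far more often when each paper gets a larger review budget:

```python
import random

random.seed(2)

# Hypothetical mid-pack papers spaced ~0.1 apart (assumption),
# with an assumed per-review noise of half a score point.
MID = [6.3, 6.2, 6.1]
NOISE_SD = 0.5

def ranking(qualities, n_reviews):
    """Rank papers by the mean of n_reviews noisy scores, best first."""
    means = [sum(q + random.gauss(0, NOISE_SD) for _ in range(n_reviews)) / n_reviews
             for q in qualities]
    return tuple(sorted(range(len(qualities)), key=lambda i: -means[i]))

def agreement(n_reviews, trials=2000):
    """How often two independent committees produce the same full ranking."""
    return sum(ranking(MID, n_reviews) == ranking(MID, n_reviews)
               for _ in range(trials)) / trials

print(f"agreement with   3 reviews: {agreement(3):.0%}")
print(f"agreement with 100 reviews: {agreement(100):.0%}")
```

The point is not the specific numbers but the direction: for a fixed spacing between papers, more samples per paper means a more reproducible ranking, which is exactly what the crowded middle of the distribution would need.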