美文网首页
Reservoir Sampling

Reservoir Sampling

作者: WTIFS | 来源:发表于2016-08-11 19:23 被阅读4次

Problem:

  • Choose <code>k</code> entries from <code>n</code> numbers. Make sure each number is selected with the probability of <code>k/n</code>

Basic idea:

  • Choose <code>1, 2, 3, ..., k</code> first and put them into the reservoir.
  • For <code>k+1</code>, pick it with a probability of <code>k/(k+1)</code>, and randomly replace a number in the reservoir.
  • For <code>k+i</code>, pick it with a probability of <code>k/(k+i)</code>, and randomly replace a number in the reservoir.
  • Repeat until <code>k+i</code> reaches <code>n</code>

Proof:

  • For <code>k+i</code>, the probability that it is selected and will replace a number in the reservoir is <code>k/(k+i)</code>
  • For a number in the reservoir before (let's say <code>X</code>), the probability that it keeps staying in the reservoir is
    • <code>P(X was in the reservoir last time)</code> × <code>P(X is not replaced by k+i)</code>
    • <code>P(X was in the reservoir last time)</code> × (<code>1</code> - <code>P(k+i is selected and replaces X)</code>)
    • = <code>k/(k+i-1)</code> × (<code>1</code> - <code>k/(k+i)</code> × <code>1/k</code>)
    • = <code>k/(k+i)</code>
  • When <code>k+i</code> reaches <code>n</code>, the probability of each number staying in the reservoir is <code>k/n</code>

Example

  • Choose <code>3</code> numbers from <code>[111, 222, 333, 444]</code>. Make sure each number is selected with a probability of <code>3/4</code>
  • First, choose <code>[111, 222, 333]</code> as the initial reservior
  • Then choose <code>444</code> with a probability of <code>3/4</code>
  • For <code>111</code>, it stays with a probability of
    • <code>P(444 is not selected)</code> + <code>P(444 is selected but it replaces 222 or 333)</code>
    • = <code>1/4</code> + <code>3/4</code>*<code>2/3</code>
    • = <code>3/4</code>
  • The same case with <code>222</code> and <code>333</code>
  • Now all the numbers have the probability of <code>3/4</code> to be picked

相关文章

网友评论

      本文标题:Reservoir Sampling

      本文链接:https://www.haomeiwen.com/subject/fpsssttx.html