If you were eating a soup from a bowl with 500ml of soup taking 25ml spoonfuls, and the rain replaced the volume that you ate at the same rate as you ate it, how many spoon fulls would it take for the soup to be completely replaced with water? Also, when that happens, would it still be the same soup?

  • NeatNit@discuss.tchncs.de
    link
    fedilink
    arrow-up
    41
    ·
    edit-2
    7 days ago

    The answer is about 1144.

    Ok, let’s do the probability math properly. Others have mentioned how it’s a matter of probability how long until the last molecule of soup is taken out.

    Suppositions:

    There are N molecules in every ml of soup and every ml of water.

    All soup molecules are the same.

    Every spoonful takes out exactly 25N molecules out of the bowl selected at random, and they are immediately replaced by 25N molecules of water.

    At the start, there are 500N molecules of soup in the bowl.

    The question is:

    How many spoonfuls is it expected to take until all soup molecules are removed?

    For every spoonful, each molecule of soup in the bowl has a 25/500 chance of being removed from the bowl.

    For ease of calculation, I will assume that each molecule being removed is independent of all others. This is technically wrong, because this implies that there is a very very tiny chance that all soup molecules are replaced in the very first spoonful. However, for the large number of molecules we are going to be working with, this shouldn’t affect the final result in any meaningful way.

    Number all soup molecules in the bowl: 1, 2, …, 500N.

    Define X_i to be the number of iterations it took until molecule i was removed. All X_i are I.I.D.:

    P(X_i = 1) = 25/500 P(X_i = 2) = (475/500) * 25/500 P(X_i = 3) = (475/500)² * 25/500 … P(X_i = n) = (475/500)^(n-1) * 25/500 …

    This is a geometric distribution with p = 25/500.

    Now what we’re interested in if the maximum value between all X_i

    That is: max_i { X_i }

    Specifically we want the “Expected Value” (basically the average) of it: E[ max_i { X_i } ]

    This is exactly the question asked here: https://math.stackexchange.com/q/26167

    According to the answer there, there is no closed-form exact answer but a very good approximation for the solution is:

    1/2 + (1/λ) H_500N

    Where λ = -log(1-p) and H_n is the nth harmonic number.

    Now it’s just a matter of plugging in the numbers.

    According to Wolfram Alpha, there are N = 3.33310^22 molecules in 1mL of water, or 1.66610^25 in 500mL.

    Again using Wolfram Alpha, the Nth harmonic number is H_500N = 58.652

    With the formula given we get λ = -log(475/500) = 0.051293

    Plugging it all in we get the expected number of spoonfuls:

    0.5 + (1/0.051293)(52.438) = 1143.97 spoonfuls on average.

    With enough coercion we can also force Wolfram Alpha to do the whole calculation in one go: 1/2 + 1/(-log(1-25/500)) * harmonic number (number of molecules in 500mL of water/molecule) giving 1143.9743.

    Edit: initially used N instead of 500N and got the wrong answer of 1022.

    • TranquilTurbulence@lemmy.zip
      link
      fedilink
      English
      arrow-up
      19
      ·
      7 days ago

      Wow, someone actually bothered to do it properly! I just wrote some horrible R code and ended up with 1146 spoons to get to 50% probability of having either 1 or 0 soup molecules. So good to see that the answers were so close.

      • NeatNit@discuss.tchncs.de
        link
        fedilink
        arrow-up
        5
        ·
        7 days ago

        :)

        What I would like to do is give a margin of error, e.g. “there is a 95% change that it will be between spoonful 1000 and spoonful 1300” or something like that. But I don’t have the time to figure that out now, sounds like it would be harder to figure out than the expected value.

            • TranquilTurbulence@lemmy.zip
              link
              fedilink
              English
              arrow-up
              4
              ·
              6 days ago

              I thought of making a vector with a length of about 1.671398e+25, but then I remembered what one time when when I tried to make a linear model with hundreds of dimensions. So yeah… We have gigabytes of RAM, and it’s still not enough. Not really a problem, as long as you don’t try to do anything completely ridiculous.

              Instead, I just made a variable that simply contains the number of soup molecules and another one for the number of water molecules. Far simpler that way.

              Here’s where the magic happens:

              # Number of soup molecules drawn
              soup_molecules_replaced <- rbinom(1, replacement_count, prob_soup)
              

              The rbinom function is used to generate random numbers from a binomial distribution. It’s a discrete probability distribution that models the number of successes, i.e. scooping out a soup molecule. Rest of the codes is just basic infrastructure like variables, loops, etc.

              BTW the variable names look ugly, because I couldn’t be bothered to tidy everything up. I really prefer camelCase, whereas Mistral seems to prefer underscores. That’s what you get for vibing.

              Side note: If you do this kind of stuff for private purposes, you have to rely on your own hardware. If you plan to publish your discoveries, universities and publicly funded supercomputers might be an option. If there exists a Journal Of Recreational Mathematics And Useless Simulations (JORMAUS), I could totally publish this stuff and maybe even run my code on a supercomputer.

              • NeatNit@discuss.tchncs.de
                link
                fedilink
                arrow-up
                2
                ·
                5 days ago

                Interesting. I don’t know why I didn’t think of just keeping a count of soup molecules. Must have been late!

                Another interesting point, your simulation is subtly wrong in a different way from my calculation. When there is only one soup molecule left, there is a chance (however tiny) that rbinom will return 2 or more, taking out more soup molecules than there really are.

                If you run it enough times with a bowl of 3 molecules and a spoon of 2 molecules, I’m sure you’ll hit -1 soup molecules some of the time.

                For a simulation I think we can do better. There must be a random function that does it properly. The function we want is like pulling balls of 2 colors out of a sack without replacement. Pretty common combinatorics question, I would expect a random function to match.

                • TranquilTurbulence@lemmy.zip
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  ·
                  5 days ago

                  You’re right. I just ran rbinom 1E7 times and found that the probability of over drawing soup molecules is a bit too high for my taste.

                  When there’s only 1 left, you usually end up drawing 0 or 1 molecule. However, in rare cases, it can be higher, such as 2, 3, 4… molecules.

                  About 92% was 0, and 7.7% was 1, but the others were not negligible! There’s about 0.3% probability of over drawing, which is way too high for a simulation as serious as this one. In this quick test, there were 20 incidents where rbinom wanted to pull out 4 soup molecules when only 1 was available. We can’t have that, now can we!

                  • NeatNit@discuss.tchncs.de
                    link
                    fedilink
                    arrow-up
                    2
                    ·
                    5 days ago

                    In python the closest I could find was (untested): sum(random.sample([1, 0], spoon_size, counts=[soup_count, water_count]))

                    But this would create an intermediate list of length spoon_size which is not a good idea.

          • NeatNit@discuss.tchncs.de
            link
            fedilink
            arrow-up
            2
            ·
            6 days ago

            You rock! Thank you :)

            If I find myself in the right mood I might try to work out the actual distribution. If I do, your simulation will be a very handy sanity check!

            • TranquilTurbulence@lemmy.zip
              link
              fedilink
              English
              arrow-up
              1
              ·
              edit-2
              6 days ago

              Thanks!

              If you want, I can increase the sample size. Just need to figure out how to add a timer to the code and set it to run for a few hours, maybe even overnight. A histogram with 10^6 bins should look pretty smooth.

              Should also upgrade the visualization to ggplot2. Frivolous computations like this deserve no less.