Arcanaville Statistics 201: how to calculate probabilities for people who don't like math.
The basic idea behind calculating probabilities comes down to two basic principles and very simple math. The first one is what I call the counting rule, and it is this:
If there is a set of equally likely occurrences, and you are interested in a subset of them, the odds of what you're interested in happening are just p/N, where p is the number of different things you are interested in, and N is the total number of possible occurrences.
That's simple enough: it is just counting. What are the odds of rolling a two on a six-sided die? One in six. Six ways for the die to roll, only one of them is a two, so the odds are 1/6, or about 16.7%. What are the odds of rolling an even number? Three in six, or 3/6, or 1/2, or 50%. Easy. Just count.
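For anyone who would rather let a computer do the counting, here is a minimal sketch of the counting rule using the die example. The helper name `counting_rule` is made up for illustration, and it only works when every outcome really is equally likely.

```python
from fractions import Fraction

def counting_rule(outcomes, is_interesting):
    """Odds as (outcomes we care about) / (all equally likely outcomes)."""
    outcomes = list(outcomes)
    wanted = sum(1 for outcome in outcomes if is_interesting(outcome))
    return Fraction(wanted, len(outcomes))

die = range(1, 7)  # the six faces of a six-sided die

p_two = counting_rule(die, lambda face: face == 2)
p_even = counting_rule(die, lambda face: face % 2 == 0)

print("rolling a two:", p_two, "=", round(float(p_two), 3))
print("rolling an even number:", p_even, "=", float(p_even))
```

Using `Fraction` keeps the answer as an exact ratio, which is closer to how the rule is stated than a rounded decimal.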
But what if it is difficult to count? What if the different things that can happen are not all equally likely? The second rule is what I call the divide and conquer rule:
If the odds of any one of a set of things happening is p, and the odds of one particular thing happening once you know something in that set happened is q, then the odds of that particular thing happening is p * q.
You attack something with an attack that has a 10% chance to crit. What are the odds of getting a critical hit, if your tohit chance is 85%? It is just 0.85 * 0.1 = 0.085, or 8.5%. First you have to get a hit at all, then you have to actually trigger the critical.
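The critical hit example can be sketched in a few lines, along with a quick simulation as a sanity check. The simulation is my own illustrative assumption about how to check the number (roll to hit, then roll to crit only on a hit), not a description of how any actual game engine rolls.

```python
import random
from fractions import Fraction

# Numbers from the example above.
HIT_CHANCE = Fraction(85, 100)
CRIT_GIVEN_HIT = Fraction(10, 100)

# Divide and conquer: odds of the set (any hit) times odds within the set (a crit).
crit_chance = HIT_CHANCE * CRIT_GIVEN_HIT

def simulate_crit_rate(trials, seed=0):
    """Estimate the crit rate by rolling to hit, then rolling to crit on each hit."""
    rng = random.Random(seed)
    crits = 0
    for _ in range(trials):
        if rng.random() < float(HIT_CHANCE):
            if rng.random() < float(CRIT_GIVEN_HIT):
                crits += 1
    return crits / trials

print("calculated:", crit_chance, "=", float(crit_chance))
print("simulated: ", simulate_crit_rate(200_000))
```

The simulated number will wobble a little around the calculated one, which is exactly what should happen.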
Most of statistical analysis is just about applying these rules correctly, and sometimes it can be non-obvious. Statistics is actually less about math in terms of calculation, and more about applying the rules correctly. And it is an open secret in the world of statistics that the rules are often not applied correctly. The fact that a large percentage of math professors get the Monty Hall paradox wrong is a testament to the fact that even though they know the rules, they often let their intuition override proper mathematical reasoning, even though that paradox has a trivial mathematical resolution every math professor should know.
How do we analyze something like the lottery? We apply the rules, without letting intuition or word games get in the way. What are the odds of a single ticket matching the Powerball draw? Well, first let's figure out how many possibilities there are. The Powerball lottery draws five numbers from a set of 69 balls labeled one through sixty-nine, and a special "powerball" from a separate set of balls labeled one through twenty-six. How do we determine how many ways there are to do that?
Let's start with the first five. Obviously there are 69 possible ways to draw the first ball, because there are 69 of them. Once that first ball is drawn, how many possibilities are there for the second draw? 68, because in every case there are only 68 balls left. Then 67, then 66, then 65. Using rule two above, you might guess there are 69*68*67*66*65 ways to draw those balls, and you'd be correct.
However, the Powerball lottery does not require you to pick all of the balls in the correct order. It only requires you to get all the numbers right. In other words, according to the rules of Powerball, the draw "1,2,3,4,5" is considered identical to the draw "5,4,3,2,1." So actually, the number we calculated above is larger than the actual number of possibilities, because we've double counted many of them (more than double, actually). Now what? Well, actually we can still figure this out, because we can use our rules in reverse. The first rule, the counting rule, says that if you want to know what the odds of something happening are, you just count up the possibilities. In this case, we've counted too high. What we need to figure out is, for every "real" possibility, how many times have we repeatedly counted the same thing, and reduce our number by the overage.
Consider a simple case where we pick two numbers. We could pick 1,2, or 2,1. Both are the same "powerball" draw of a one and a two. We've counted the same possibility twice. So we'd divide by two to get the true number of possibilities. We'd normally calculate the total possibilities as 69*68 = 4692, but because we counted everything twice the real number is 69*68/2 = 2346. There are two thousand three hundred forty-six ways to draw two powerball balls. Math people would say there are 4692 permutations of two balls, but only 2346 combinations. Permutations are when order matters, combinations are when order doesn't matter.
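The two-ball case is small enough that you can actually list every draw and count them, which makes a nice brute-force check on the divide-by-two argument. This is just a sketch using Python's standard `itertools`; the variable names are mine.

```python
from itertools import combinations, permutations

balls = range(1, 70)  # balls numbered 1 through 69

# Order matters: (1, 2) and (2, 1) are counted separately.
ordered_draws = sum(1 for _ in permutations(balls, 2))

# Order doesn't matter: {1, 2} is counted once.
unordered_draws = sum(1 for _ in combinations(balls, 2))

print("permutations of two balls:", ordered_draws)
print("combinations of two balls:", unordered_draws)
print("overcount factor:", ordered_draws // unordered_draws)
```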
So what happens when we pick five balls? Well, we can play a neat trick here. We want to know how many ways we overcounted those five ball sequences. And we know how to do that already. In effect, it is like we have five balls in a bag, and we want to know how many possible ways - permutations in this case - there are to pull those balls out. And that's just 5 * 4 * 3 * 2 * 1. So there are 120 ways you can order five balls. So for every possible combination of powerball, we've overcounted by 120 times. So we take 69*68*67*66*65 and divide by 120, and we get 11,238,513. That's the number of possible ways to draw the first five balls. The actual "powerball" is a separate ball picked from a separate set of 26 balls. There are exactly 26 ways to do that. So the total number of possibilities is 11,238,513 * 26 = 292,201,338. Only *one* of those possibilities is selected at drawing time. So there's only one winning sequence out of 292,201,338 possible sequences they could draw. So if you enter a ticket and that ticket has a specific sequence of numbers on it, the odds that that specific sequence will match the drawn sequence is one in 292,201,338. That's where those numbers come from.
This is all following the rules for probability. Notice that number doesn't rely on anything besides counting possibilities. In effect, we've mostly just used rule one. But what if you want to look at more complicated things, things that can confuse people? What if you wanted to answer the question: if the powerball authority says there's a winning ticket out there somewhere, what are the odds that it is my ticket?
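Here is the same arithmetic as a short sketch, mostly so you can see each step produce the numbers quoted above. It assumes Python 3.8 or later for `math.perm` and `math.comb`; the constant names are just labels I picked.

```python
import math

WHITE_BALLS = 69   # balls in the main drum
WHITE_DRAWN = 5    # how many of them are drawn
POWERBALLS = 26    # balls in the separate powerball drum

# Ordered draws: 69 * 68 * 67 * 66 * 65
ordered_draws = math.perm(WHITE_BALLS, WHITE_DRAWN)

# Ways to reorder the same five balls: 5 * 4 * 3 * 2 * 1
overcount = math.factorial(WHITE_DRAWN)

# Divide out the overcount to get draws where order doesn't matter.
white_combinations = ordered_draws // overcount

# The standard library's combination function should agree.
assert white_combinations == math.comb(WHITE_BALLS, WHITE_DRAWN)

total_sequences = white_combinations * POWERBALLS

print(f"ordered draws:      {ordered_draws:,}")
print(f"overcount factor:   {overcount}")
print(f"five-ball combos:   {white_combinations:,}")
print(f"total possibilities: {total_sequences:,}")
```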
That's more complicated, but the rules still work. It is just that now, you'll likely need to do more math because there are too many possibilities to count easily. You could, in theory, but the numbers are too big. Basically, if you know there is a winner somewhere, then the total number of possibilities for what came out of the powerball machine isn't 292,201,338 anymore, at least not necessarily. The new piece of information we have - that there was a winner - means the machine could have only picked a sequence that someone has entered. If there are a billion entries into the Powerball, the odds are pretty good that the vast majority of sequences were played by at least one person, but if only a million entries went in then it is impossible that all 292 million were played, because there aren't enough entries to do that. If we knew how many tickets were entered, and if we made the assumption that all ticket sequences were selected randomly (they aren't in reality, but it is not an unreasonable approximation most of the time) we could then calculate how many sequences are *likely* to have been played, out of all of them. Unfortunately, that's Arcanaville statistics 401. Let's just assume we can. If big N is 292,201,338, we call little n the number of sequences that were actually played by players, and we do what mathematicians do when they don't know something: use the letter as if we knew what it was.
Since the total possibilities for the draw are no longer big N but little n, the odds of one specific ticket being the winner is no longer 1/N, but 1/n. Fewer possibilities, so the odds change. But does this mean that the odds of a ticket winning depend on how many tickets are entered? No, and that's because we started by assuming there was a winner. What do the rules say about what the odds of winning are overall?
Well, they are 1/N, because the rules say so. But what if we want to analyze this the hard way? Well then why are you reading this post? Go read a stats textbook. Okay, fine. The rules say that to figure out what the odds of a specific ticket winning are, we can try to count up all possibilities. First, there is the case where there is actually a winner. We know there are n possibilities there. Of them, one out of n is a specific ticket sequence. The odds of a specific ticket winning *if there is a winner* is one out of n. But that's not the set of all possibilities. What about the case where there is no winner? Well, if n of the draws picks a winner, then the rest don't. That's N-n. In all of them, that specific ticket doesn't win, obviously. So the odds of the ticket winning, using the counting rule, is that it is one out of n + (N-n) = N. There are n possibilities when there is a winner, N-n where there is no winner, and the total is the sum of the two. That's obviously N, but that's how that works.
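For the curious, here is a rough sketch of how little n could be estimated. It leans entirely on the assumption stated above, that every ticket picks its sequence uniformly at random and independently of every other ticket, which real players do not do. Under that assumption, any given sequence is missed by all T tickets with probability (1 - 1/N)^T, so the expected number of sequences played is N * (1 - (1 - 1/N)^T). The function name is hypothetical.

```python
import math

TOTAL_SEQUENCES = 292_201_338  # big N

def expected_distinct_sequences(tickets, total=TOTAL_SEQUENCES):
    """Expected number of different sequences played, assuming uniformly random tickets."""
    # log of the chance that one given sequence is missed by every ticket
    log_missed = tickets * math.log1p(-1.0 / total)
    # 1 - exp(log_missed), computed in a numerically careful way
    chance_played = -math.expm1(log_missed)
    return total * chance_played

for tickets in (1_000_000, 100_000_000, 1_000_000_000):
    n = expected_distinct_sequences(tickets)
    print(f"{tickets:>13,} tickets -> about {n:,.0f} sequences played "
          f"({n / TOTAL_SEQUENCES:.1%} of all sequences)")
```

`log1p` and `expm1` are used because 1/N is tiny, and doing the same thing with plain powers loses precision.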
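If you want to see the long way and the easy way agree without trusting the algebra, here is a small sketch with exact fractions. The values of n are made-up stand-ins, since we never actually worked out how many sequences were played. The first function uses the divide and conquer rule (odds there is a winner at all, times odds it is your ticket given a winner), and the second uses the counting argument above.

```python
from fractions import Fraction

N = 292_201_338  # every sequence the machine could draw

def win_chance_divide_and_conquer(n, total=N):
    """Odds there is a winner at all, times odds it is your ticket given a winner."""
    p_winner_exists = Fraction(n, total)
    p_mine_given_winner = Fraction(1, n)
    return p_winner_exists * p_mine_given_winner

def win_chance_by_counting(n, total=N):
    """One winning draw out of (draws with a winner) + (draws with no winner)."""
    draws_with_winner = n
    draws_without_winner = total - n
    return Fraction(1, draws_with_winner + draws_without_winner)

# Hypothetical values for little n; the answer should not care which one we use.
for n in (1_000_000, 100_000_000, N):
    long_way = win_chance_divide_and_conquer(n)
    counted = win_chance_by_counting(n)
    print(f"n = {n:>11,}: {long_way} vs {counted}")
```

Every line prints the same fraction twice, whatever n is, because the n cancels out.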
Notice that no matter how we choose to do it, the easy way or the long way, the odds are the same. That's because they have to be. The point of view shouldn't change the odds. When two different mathematical methods of calculating something generate different answers, the problem is generally not with the math, but with the mathematician. In fact, it can be a valuable way to double check yourself. If you know two ways to do something and you do them both and they agree, that would tend to suggest you haven't made an error. If they don't, pretty sure you did.
Man, this got longer than I thought it would. But it is still shorter than a stats textbook. If there are any City of Heroes players out there still awake, consider this final thing. We used to calculate the odds of hitting a target all the time. Well, I did anyway. We needed to know the tohit chance, which is kind of like asking how many possible tohit rolls are hits. We knew how many possibilities there were in total, because we roll 0-100 (there's a teeny tiny complication here that isn't important to this discussion, let's leave that alone for now). We then calculate the odds of hitting as all the ways you can hit, divided by all the ways there are to roll tohit total. "75% chance to hit" was just another way of saying 75 rolls hit out of 100 rolls. Or, because the tohit system seemed to round off to the nearest .0001, you could say 7500 rolls hit out of 10000. Did we ever ask whether anything else was rolling tohit rolls at the same time? Of course not. Because that doesn't matter. All that matters in this case is how many possibilities hit, and how many total. Divide, done.