I have not gone into details with the statistic calculation here, but im quite puzzled by the following.
I need to generate usernames with 2 letters (from A-Z), followed by 2 numbers (from 0-9) and then again 2 letters, for example AB45DE.
I made a formula to do that:
=CHAR(RANDBETWEEN(65,90))&CHAR(RANDBETWEEN(65,90))&RANDBETWEEN(0,9)&RANDBETWEEN(0,9)&CHAR(RANDBETWEEN(65,90))&CHAR(RANDBETWEEN(65,90))
Im not much into the calculations of likelihood, but this should have more than 45 million combinations, ie (26*26*10*10*26*26)=45.697.600
I then do this for 10.000 rows, hoping for 10.000 unique usernames. But what happens more often than not is that I end up with 1, 2, 3 or more duplicates in the 10.000 rows. This seems very strange and unlikely to me.
I made a sheet to illustrate this - its generates the names in column A and shows if its duplicated in column B (2 or more is a duplicate), and the sum of duplicates in D1 - it runs by clicking the Randomize button (takes a little while to run) and it converts to values at the end so it will stay the same until next run. Try running it a few times and you will get duplicates.
Can it really be statistically correct that you get duplicates so often in 10.000 rows, outta 45 million possible combinations or is the Excel RANDBETWEEN really not that random after all?
Bookmarks