Published: Wednesday, July 2, 2008

Techniques for Randomly Reordering an Array

By Scott Mitchell


Introduction


While reading through some of Jeff Atwood's old blog entries, I stumbled across this gem: The Danger of Naïveté. In it, Jeff discussed the pitfalls that can befall a programmer who implements a naïve algorithm and calls it a day. Consider an algorithm to randomly reorder an array. If you have a collection of items to reorder, the naïve approach is to enumerate each element and swap its position with an element in some other randomly-selected position. However, such an algorithm (as we discuss in this article) does not have an even distribution of results - certain permutations are more likely than others when starting from a set point.

Jeff's blog entry caught my attention because of a past article I wrote here on 4Guys, Randomly Reordering an Array. That article, written back in 2000, showed various techniques for reordering an array in classic ASP using VBScript. And, wouldn't you know it, the first technique uses the exact same naïve algorithm Jeff warns about in his blog post! Oops! (I've since updated the old article.)

After learning of the problems with the naïve algorithm I decided to spend some time exploring approaches that produce valid distributions. This article shows why the naïve algorithm produces less than ideal results and includes code for two alternative solutions. Read on to learn more!


Dissecting the Naïve Algorithm


The naïve way of reordering an array (and the way I showed in my earlier article, Randomly Reordering an Array) is to enumerate through each item and swap it with another randomly selected item in the array. The following snippet of code implements the naïve algorithm, reordering an integer array named deck.

Random rnd = new Random();

for (int i = 0; i < deck.Length; i++)
{
   // Set swapWithPos to a random position such that 0 <= swapWithPos < deck.Length
   int swapWithPos = rnd.Next(deck.Length);

   // Swap the value at the "current" position (i) with value at swapWithPos
   int tmp = deck[i];
   deck[i] = deck[swapWithPos];
   deck[swapWithPos] = tmp;
}

If you run this algorithm a handful of times it appears to correctly do a random in-place reordering of the elements in deck. But the results aren't truly random - some permutations are over-weighted while others are under-weighted.

Take a three item array with element values 0, 1, and 2. These three elements can be combined into six permutations:

Arrangements
0 1 2
0 2 1
1 0 2
1 2 0
2 0 1
2 1 0

With the naïve algorithm, each item in the array is randomly switched with another item in the array (including, perhaps, itself), meaning that for an array of n elements there are n^n possible outcomes. For a three element array, then, there are 3^3 = 3*3*3 = 27 possible outcomes. But there are only six ways a three element array can be arranged, and six doesn't divide 27. Consequently, some outcomes in the naïve algorithm must be overrepresented and others underrepresented. This can best be illustrated by decomposing the algorithm to show the actual results.

The following (very tall) figure illustrates this breakdown for a three item array that stores the values 0, 1, and 2, and that starts with the values ordered 0 1 2. The node at the far left of the diagram is the start state. Here the first position is evaluated and randomly swapped with the left, center, or right value. These three possibilities are illustrated in the next level of nodes. This process is then repeated, but with the middle element, and then again with the third. The final list of potential results is on the far right.

The various outcomes are not evenly represented. The arrangements

  • 0 2 1
  • 1 0 2
  • 1 2 0
each appear five times in the far right level, whereas the other three possible arrangements:
  • 0 1 2
  • 2 0 1
  • 2 1 0
each appear only four times. Therefore, when randomly ordering a three element array using the naïve algorithm you are more likely to get one of the first three arrangements than you are to get one of the second three arrangements.
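
You can verify these counts without drawing out the tree. The following sketch is my own addition (it is not part of the article's download); it simply brute-forces every possible sequence of swap choices the naïve algorithm could make for a three element array and tallies the resulting arrangements. It assumes a using System.Collections.Generic; directive for the Dictionary.

int n = 3;
var tally = new Dictionary<string, int>();

// The naïve algorithm makes one swap choice per position, so there are
// n^n = 27 equally likely sequences of choices for a three element array.
for (int outcome = 0; outcome < 27; outcome++)
{
   int[] deck = { 0, 1, 2 };
   int choices = outcome;

   // Decode this outcome number into a swap choice for each position and apply it.
   for (int i = 0; i < n; i++)
   {
      int swapWithPos = choices % n;
      choices /= n;

      int tmp = deck[i];
      deck[i] = deck[swapWithPos];
      deck[swapWithPos] = tmp;
   }

   string arrangement = deck[0] + " " + deck[1] + " " + deck[2];
   if (tally.ContainsKey(arrangement))
      tally[arrangement]++;
   else
      tally[arrangement] = 1;
}

// Prints five occurrences apiece for 0 2 1, 1 0 2, and 1 2 0,
// and four apiece for 0 1 2, 2 0 1, and 2 1 0.
foreach (var pair in tally)
   Console.WriteLine(pair.Key + ": " + pair.Value);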

This uneven weighting can be illustrated by reordering the array thousands of times and tallying up the resulting re-orderings. The download available at the end of this article includes a demo that does this tallying for a user-specified array size and number of re-orderings. The demo also includes an option to view the results in Microsoft Excel, which you can use to quickly chart the distribution. The following graph shows the distribution of a three item array randomly ordered 60,000 times. Because there are six possible outcomes, each outcome should occur roughly 10,000 times. But as the following graph shows, the permutations 0 2 1, 1 0 2, and 1 2 0 are overrepresented.

The distribution of arrangements for a three item array using the naive algorithm.

Does an Imperfect Random Reordering Really Matter?
If you have implemented the naïve algorithm to display the results of some array in a random order, you may wonder whether you'll need to update your implementation with a correct one. (We examine two correct algorithms in the remainder of this article.) Whether you need to update an incorrect implementation depends on how important it is that the reordering is truly random. If you are building an online poker site where the naïve algorithm is used to shuffle a deck of cards, then it is of paramount importance that the shuffling be truly random; otherwise savvy players would know that certain deck permutations are more likely than others and use that knowledge to game the system. Likewise, if you are building an ecommerce site and are showing random products on the homepage based on the naïve algorithm then it is important that each product is equally likely to be displayed. If you are unknowingly showing certain products more often than others because of a faulty random reordering algorithm then you cannot be sure whether a product with fewer sales is genuinely less popular or is simply appearing less often in the product rotation. Ditto for code that is used to randomly select an advertisement to display.

On the other hand, if you are displaying some random quote of the day at the bottom of your page, or are showing a random headline from an RSS feed, then true randomness isn't that vital.

Fixing the Naïve Algorithm


The subtle problem with the naïve algorithm is that the number of actual permutations does not evenly divide the number of outcomes. Consequently, there are overrepresented and underrepresented outcomes. In general, given an array size of n the naïve algorithm will always have n^n outcomes while the number of permutations will always be n!. (n!, pronounced n factorial, is defined as n * (n - 1) * (n - 2) * ... * 2 * 1. For example, 3! = 3 * 2 * 1 = 6, and 5! = 5 * 4 * 3 * 2 * 1 = 120.) What we need is an algorithm that has precisely n! outcomes to match up with the n! permutations.

In 1938 statisticians Ronald Fisher and Frank Yates devised the Fisher-Yates Shuffle algorithm, which could be used to randomly order a set of items with a correct distribution of results. It being 1938, their algorithm was actually expressed in terms of steps to be performed by a human. In his seminal book The Art of Computer Programming, Donald Knuth provided the algorithm in pseudo-code. Today the algorithm is commonly called the Knuth shuffle, or the Knuth-Fisher-Yates shuffle.

The Knuth-Fisher-Yates shuffle works by enumerating n - 1 positions and only considering a swap with positions not yet enumerated. This is typically implemented in code by looping from the right-most array position down to the second position from the left. Each enumerated element is randomly swapped with an element at a position to its left (including, possibly, its own location). The code is deceptively similar to the naïve algorithm:

Random rnd = new Random();

for (int i = deck.Length - 1; i > 0; i--)
{
   // Set swapWithPos to a random position such that 0 <= swapWithPos <= i
   int swapWithPos = rnd.Next(i + 1);

   // Swap the value at the "current" position (i) with value at swapWithPos
   int tmp = deck[i];
   deck[i] = deck[swapWithPos];
   deck[swapWithPos] = tmp;
}

There are two subtle but important changes in the code. First, the algorithm loops from deck.Length - 1 (the rightmost element) down to the second element from the left (note that it loops while i is strictly greater than 0). Second, the swapWithPos choice is limited to the positions from 0 to i, inclusive. (In the naïve algorithm swapWithPos can be any position in the array, regardless of the current element being enumerated.)
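
If you need this shuffle in more than one place it is easy to wrap the loop in a reusable helper. The following sketch is my own generalization and not part of the article's download; the method name Shuffle and the decision to pass in the Random instance are my choices.

// A reusable Knuth-Fisher-Yates shuffle that works for an array of any type.
// Passing in the Random instance lets callers reuse one generator rather than
// re-seeding on every call.
static void Shuffle<T>(T[] items, Random rnd)
{
   for (int i = items.Length - 1; i > 0; i--)
   {
      // Pick a position from 0 to i, inclusive
      int swapWithPos = rnd.Next(i + 1);

      T tmp = items[i];
      items[i] = items[swapWithPos];
      items[swapWithPos] = tmp;
   }
}

With that helper in place, reordering the deck array from the earlier snippets becomes a single call: Shuffle(deck, new Random());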

The following diagram illustrates how this algorithm unfolds for a three item array. As you can see, the algorithm has six possible outcomes, which match up to the six possible permutations. Consequently, no permutation is over- or under-weighted.

The following chart compares the tally of arrangements using the Knuth-Fisher-Yates versus the naïve algorithm when performing 60,000 re-orderings. The red bars represent the tallies for Knuth-Fisher-Yates. As you can see, they're tight around 10,000 outcomes, which is expected. The naïve algorithm, by contrast, over-represents three permutations.

The distribution of arrangements for a three item array using the naive and Knuth-Fisher-Yates algorithms.

Ordering the Contents of an Array by a Random Value


If you need to bring back rows from a database query in random order, perhaps the easiest approach is to simply add an ORDER BY NEWID() clause to the SELECT statement. The SQL Server function NEWID() returns a new uniqueidentifier, which is a globally unique identifier (GUID). Each row is "assigned" a new uniqueidentifier, which is then used to order the results. The net result is that the rows are returned in random order. (For more information on returning rows in a random order, see: Using NEWID() to Randomly Sort Records.)
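
For reference, issuing such a query from ADO.NET looks roughly like the following. This is a sketch of my own rather than code from the article's download; the Products table, its columns, and the connectionString variable are hypothetical, and the only essential piece is the ORDER BY NEWID() clause in the query text.

// Return the rows of a (hypothetical) Products table in random order by
// having SQL Server sort on a per-row GUID. Requires System.Data.SqlClient.
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(
          "SELECT ProductID, Name FROM Products ORDER BY NEWID()", conn))
{
   conn.Open();
   using (SqlDataReader reader = cmd.ExecuteReader())
   {
      while (reader.Read())
      {
         // ... work with each randomly ordered row ...
      }
   }
}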

This same concept can be applied to randomly ordering arrays. In short, if you can associate some random value with each element in an array then you can sort by that value to get a random ordering. In his blog entry titled Shuffling, Jeff Atwood proposes a simple LINQ-based approach:

var shuffledCards = deck.OrderBy(elem => Guid.NewGuid());

As with our previous example, deck is an array or List of some sort. OrderBy is an extension method that is part of the LINQ to Objects library. Calling MyObject.OrderBy(func) returns an ordered enumeration of the data in MyObject sorted by the value returned by the function func. The above code uses a lambda expression to define func; for each element in the array this function says to order by a new System.Guid value. The net result of the OrderBy method is an object that orders each element in deck by a GUID value, which is similar in concept and functionality to adding ORDER BY NEWID() to a SELECT statement.

At first glance there are two main differences between the OrderBy GUID algorithm and Knuth-Fisher-Yates. The most apparent one is that Knuth-Fisher-Yates does in-place shuffling whereas OrderBy GUID returns a new object that represents the random order. In other words, when using Knuth-Fisher-Yates you are changing the actual order of elements in the array. OrderBy GUID, on the other hand, does not disturb the elements in the original array.
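
A quick sketch of my own makes that distinction concrete: the original deck is left untouched, and you call ToArray (or ToList) if you want the shuffled order materialized in a new array. (This assumes a using System.Linq; directive is in place.)

int[] deck = { 0, 1, 2, 3, 4 };

// deck itself is not modified; shuffled is a new, randomly ordered copy.
int[] shuffled = deck.OrderBy(elem => Guid.NewGuid()).ToArray();

Console.WriteLine(string.Join(" ", deck));      // still prints 0 1 2 3 4
Console.WriteLine(string.Join(" ", shuffled));  // prints e.g. 3 0 4 1 2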

The second difference between the two has to do with the asymptotic performance. Knuth-Fisher-Yates examines each element in the array once, giving it an asymptotic running time of O(n). The OrderBy GUID approach must sort the results; the best sorting algorithms have a running time of O(n log n). This difference in running time is not profound until you get to substantially large values of n, so if you are shuffling a deck of 52 cards, for instance, the asymptotic running time difference is insignificant. (The demo available for download at the end of this article lists the time taken to perform the tallies for each algorithm; you'll see that in most cases the OrderBy GUID approach takes 50-75% longer than the Knuth-Fisher-Yates implementation. For small values of n, the bulk of this time difference is likely accounted for by the fact that the OrderBy GUID approach must create a new System.Guid object for each element, whereas Knuth-Fisher-Yates is just performing swaps in an array.)
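
If you want a rough feel for that difference without running the download, a quick (and admittedly unscientific) micro-benchmark with System.Diagnostics.Stopwatch might look like the sketch below. It is my own illustration, it assumes the Shuffle helper shown earlier plus a using System.Linq; directive, and the trial count is arbitrary.

// Build a 52 "card" deck of the integers 0 through 51.
int[] deck = new int[52];
for (int i = 0; i < deck.Length; i++)
   deck[i] = i;

Random rnd = new Random();

// Time 100,000 in-place Knuth-Fisher-Yates shuffles.
Stopwatch timer = Stopwatch.StartNew();
for (int trial = 0; trial < 100000; trial++)
   Shuffle(deck, rnd);
timer.Stop();
Console.WriteLine("Knuth-Fisher-Yates: " + timer.ElapsedMilliseconds + "ms");

// Time 100,000 OrderBy GUID shuffles (each one allocates 52 Guid values).
timer = Stopwatch.StartNew();
for (int trial = 0; trial < 100000; trial++)
   deck.OrderBy(elem => Guid.NewGuid()).ToArray();
timer.Stop();
Console.WriteLine("OrderBy GUID: " + timer.ElapsedMilliseconds + "ms");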

The more subtle yet important question is, does the OrderBy GUID algorithm produce a valid, even distribution among the permutations? I (incorrectly) assumed that the naïve algorithm worked as desired when writing Randomly Reordering an Array, and didn't learn of my mistake until years later. To convince myself that the OrderBy GUID approach works correctly, I tallied the permutations for five item arrays shuffled 120,000 times and compared the OrderBy GUID distribution against the Knuth-Fisher-Yates algorithm. As the following chart shows, both are tight around 1,000, which is how many times each of the 120 permutations should occur.

The distribution of arrangements for a five item array using the Knuth-Fisher-Yates and OrderBy GUID algorithms.

Compare this to the distributions for the naïve algorithm when dealing with a five element array shuffled 120,000 times. The following chart shows the distribution discrepancy between the OrderBy GUID and the naïve algorithm. The X-axis represents the various permutations and the Y-axis the number of times each permutation occurred in the 120,000 shuffles. The black triangles show the distribution for the OrderBy GUID algorithm; they are tight around 1,000. The red squares are the distributions for the naïve algorithm. As you can see, they're all over the place. Certain permutations are wildly overrepresented while others are wildly underrepresented.

The distribution of arrangements for a five item array using the naive and OrderBy GUID algorithms.

Granted, this empirical analysis does not "prove" the correctness of the OrderBy GUID approach, but it does give me more confidence in its correctness. To more thoroughly convince yourself of the randomness of the OrderBy GUID algorithm you can use the Chi-square test.
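
The chi-square statistic itself is simple to compute from the tallies. The sketch below is my own illustration; observedCounts is a hypothetical array holding how many times each permutation occurred, and the resulting statistic would be compared against a chi-square table with (number of permutations - 1) degrees of freedom.

// observedCounts[i] = how many times permutation i occurred (hypothetical array).
// With 120,000 shuffles of a five item array, each of the 120 permutations is
// expected to occur 1,000 times.
double expected = 120000.0 / 120;

double chiSquare = 0.0;
foreach (int observed in observedCounts)
{
   double difference = observed - expected;
   chiSquare += (difference * difference) / expected;
}

Console.WriteLine("Chi-square statistic: " + chiSquare);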

Conclusion


There are many ways to randomly reorder the contents of an array. The most intuitive approach is, unfortunately, incorrect in that it over- and under-weights particular permutations. To generate a truly random ordering you need to use an alternative algorithm. In this article we looked at two: Knuth-Fisher-Yates and OrderBy GUID. The download available at the end of this article includes an implementation of both algorithms as well as a test page that you can use to tally the permutations for shuffles of different sized arrays.

Happy Programming!

  • By Scott Mitchell


    Further Reading


  • Randomly Reordering an Array
  • The Danger of Naïveté
  • Shuffling
  • The Fisher-Yates Shuffle Algorithm

    Attachments



  • Download the code used in this article


