Say I have a linked list of numbers of length N. N is very large and I don’t know in advance the exact value of N.

How can I most efficiently write a function that will return k completely random numbers from the list?

I would suggest: First find your k random numbers. Sort them. Then traverse both the linked list and your random numbers once.

If you somehow don't know the length of your linked list (how?), then you could grab the first k into an array, then for node r, generate a random number in [0, r), and if that is less than k, replace the rth item of the array. (Not entirely convinced that doesn't bias...)

Other than that: "If I were you, I wouldn't be starting from here." Are you sure linked list is right for your problem? Is there not a better data structure, such as a good old flat array list.

Well, you do need to know what N is at runtime at least, even if this involves doing an extra pass over the list to count them. The simplest algorithm to do this is to just pick a random number in N and remove that item, repeated k times. Or, if it is permissible to return repeat numbers, don't remove the item.

Unless you have a VERY large N, and very stringent performance requirements, this algorithm runs with `O(N*k)`

complexity, which should be acceptable.

Edit: Nevermind, Tom Hawtin's method is way better. Select the random numbers first, then traverse the list once. Same theoretical complexity, I think, but much better expected runtime.

Why can't you just do something like

```
List GetKRandomFromList(List input, int k)
List ret = new List();
for(i=0;i<k;i++)
ret.Add(input[Math.Rand(0,input.Length)]);
return ret;
```

I'm sure that you don't mean something that simple so can you specify further?

This is called a Reservoir Sampling problem. The simple solution is to assign a random number to each element of the list as you see it, then keep the top (or bottom) k elements as ordered by the random number.

If you don't know the length of the list, then you will have to traverse it complete to ensure random picks. The method I've used in this case is the one described by Tom Hawtin (54070). While traversing the list you keep `k`

elements that form your random selection to that point. (Initially you just add the first `k`

elements you encounter.) Then, with probability `k/i`

, you replace a random element from your selection with the `i`

th element of the list (i.e. the element you are at, at that moment).

It's easy to show that this gives a random selection. After seeing `m`

elements (`m > k`

), we have that each of the first `m`

elements of the list are part of you random selection with a probability `k/m`

. That this initially holds is trivial. Then for each element `m+1`

, you put it in your selection (replacing a random element) with probability `k/(m+1)`

. You now need to show that all other elements also have probability `k/(m+1)`

of being selected. We have that the probability is `k/m * (k/(m+1)*(1-1/k) + (1-k/(m+1)))`

(i.e. probability that element was in the list times the probability that it is still there). With calculus you can straightforwardly show that this is equal to `k/(m+1)`

.

A C++ implementation and another disguise of the same problem (infinite data stream).

There's a very nice and efficient algorithm for this using a method called **reservoir sampling**.

Let me start by giving you its **history**:

**Knuth** calls this Algorithm R on p. 144 of his 1997 edition of Seminumerical Algorithms (volume 2 of The Art of Computer Programming), and provides some code for it there. Knuth attributes the algorithm to Alan G. Waterman. Despite a lengthy search, I haven't been able to find Waterman's original document, if it exists, which may be why you'll most often see Knuth quoted as the source of this algorithm.

**McLeod and Bellhouse, 1983** (1) provide a more thorough discussion than Knuth as well as the first published proof (that I'm aware of) that the algorithm works.

**Vitter 1985** (2) reviews Algorithm R and then presents an additional three algorithms which provide the same output, but with a twist. Rather than making a choice to include or skip each incoming element, his algorithm predetermines the number of incoming elements to be skipped. In his tests (which, admittedly, are out of date now) this decreased execution time dramatically by avoiding random number generation and comparisons on each in-coming number.

In **pseudocode** the algorithm is:

```
Let R by the result array of size s
Let I be an input queue
> Fill the reservoir array
for j in the range [1,s]:
R[j]=I.pop()
elements_seen=s
while I is not empty:
elements_seen+=1
j=random(1,elements_seen) > This is inclusive
if j<=s:
R[j]=I.pop()
else:
I.pop()
```

Note that I've specifically written the code to avoid specifying the size of the input. That's one of the cool properties of this algorithm: you can run it without needing to know the size of the input beforehand and it *still* assures you that each element you encounter has an equal probability of ending up in `R`

(that is, there is no bias). Furthermore, `R`

contains a fair and representative sample of the elements the algorithm has considered at all times. This means you can use this as an online algorithm.

**Why does this work?**

McLeod and Bellhouse (1983) provide a proof using the mathematics of combinations. It's pretty, but it would be a bit difficult to reconstruct it here. Therefore, I've generated an alternative proof which is easier to explain.

We proceed via proof by induction.

Say we want to generate a set of `s`

elements and that we have already seen `n>s`

elements.

Let's assume that our current `s`

elements have already each been chosen with probability `s/n`

.

By the definition of the algorithm, we choose element `n+1`

with probability `s/(n+1)`

.

Each element already part of our result set has a probability `1/s`

of being replaced.

The probability that an element from the `n`

-seen result set is replaced in the `n+1`

-seen result set is therefore `(1/s)*s/(n+1)=1/(n+1)`

. Conversely, the probability that an element is not replaced is `1-1/(n+1)=n/(n+1)`

.

Thus, the `n+1`

-seen result set contains an element either if it was part of the `n`

-seen result set and was not replaced---this probability is `(s/n)*n/(n+1)=s/(n+1)`

---or if the element was chosen---with probability `s/(n+1)`

.

The definition of the algorithm tells us that the first `s`

elements are automatically included as the first `n=s`

members of the result set. Therefore, the `n-seen`

result set includes each element with `s/n`

(=1) probability giving us the necessary base case for the induction.

**References**

McLeod, A. Ian, and David R. Bellhouse. "A convenient algorithm for drawing a simple random sample." Journal of the Royal Statistical Society. Series C (Applied Statistics) 32.2 (1983): 182-184. (Link)

Vitter, Jeffrey S. "Random sampling with a reservoir." ACM Transactions on Mathematical Software (TOMS) 11.1 (1985): 37-57. (Link)

Similar Questions

I have a list, list_a, that contains floating numbers: list_a = [[[ 0 for i in range(40)] for j in range(1000)]for k in range(47)] And I have a sorted version of this: list_a_sorted = list_a list_a_s

I'm receiving string from client like following - String time_S = request.getParameter(Message.KEY_TIME); Now, If I want to receive a linked list data how should I do that? I tried to use getParamete

I'm attempting to add elements from a list of lists into a set. For example if I had new_list=[['blue','purple'],['black','orange','red'],['green']] How would I receive new_set=(['blue','purple'],['b

I have a question to ask about a part of a code on linked list I didn't understand. I'm building a doubly linked list and there's a part where I'm stuck hoping that someone could help me. I'm very new

How do you supposed to remove n numbers of element from a linked list? for example.. given the start, and end index 2, 4. my list contains: 1, 4, 6, 7, 8, 4 after the call,(6,7,8 should be gone) my

I have an array, say: books[10], and a linked list, say: bookList. I want to insert all elements in books[10] into the linked list, but in an random order. Solution 1: Shuffle the array, then insert f

I am trying to write a function that takes in as an argument a linked list of structs. One of the elements of these structs is a value for storing the position, example struct #3 has 3 for the element

I have a set of unique elements (there are not two identical elements). And I would like to extract N random and different elements from the set. What is the easiest way to do it in Java?

How can I sort elements in single-linked list (by some value like list->id)? The idea I've came up with is to find maximal value and save its index. Then iterate through the list so that the *head

I am attempting to display a series of images in a random order. However, I do not want any single item to repeat until all items have been shown, so instead of selecting a random image from the array

I'm experimenting with a game program. I'm trying to have an random number of items generate. The code will produce the same item multiple times. I can get away with setting up a series of switch stat

I've a list of float numbers and I would like to delete incrementally a set of elements in a given range of indexes, sth. like: for j in range(beginIndex, endIndex+1): print (remove [%d] => val: %

I have two lists Lx and Ly, each element from Lx has a corresponding label in Ly. Example: Lx = [[1,2,5], [5,2,7], [7,0,4], [9,2,0], [1,8,5], [3,4,5], [3,2,7], [2,9,7]] Ly = [A, C, A, B, A, B, C, C]

I have a map of items with some probability distribution: Map<SingleObjectiveItem, Double> itemsDistribution; Given a certain m I have to generate a Set of m elements sampled from the above dis

I'm struggling with a C issue and I can't figure what I'm doing wrong. I'm using a linked list to store words. When I execute the following: list *myList = list_new(); list_append(myList,Word_01); l

I'm looking for the fastest way to select the elements of a numpy array that satisfy several criteria. As an example, say I want to select all elements that lie between 0.2 and 0.8 from an array. I no

I have been given this problem to solve: Suppose you're given a doubly-linked list and a pointer to some element of that list. Let's suppose that you want to search the list for some value v, which y

Given position, how would I go about returning the value of that given position, and also removing that value from the linked list? What I have, I think, only works for removing a value, but not retu

I am trying to make a recursion function that will print all the elements in a linked list backward. This is the function that I have made: void lista::printBack(node *pocetak) { if (pocetak==NULL) {

public class Item { public int Id {get; set;} public bool Selected {get; set;} } List<Item> itemList = new List<Item>(){ /* fill with items */ }; I need to create a list of Items that mee

i have one problem with my code ,i did a sample program to display the emp details from a linked list,now the problem when i trying to delete a particular entry means it doesn't work,i hope i did some

I made a node class which is a linked list class. Is there any way I can print out elements in this list ? I made my print() method but it only returns the first element which is 21. How do I iterate

so I am grabbing a BIG list of elements using JSoup from a page. When I say big, I mean like a few hundred elements. I know that the elements are there because I converted them all to one huge string

I'm trying to write a method that removes all instances of a value from a singly linked list, but I'm having some trouble. I'm receiving a nullpointerexception on line 8 of this code: public void remo

I want to create a linked list of numbers from 1 to 1000 and print the numbers. Im using the function createList() to create the list and printList() to print the elements. But the following code is c

I have a huge linked list of integers (let's say it has size N, but N is unknown to me) and want to get k random values from it in minimum possible time/space. I think it must be possible to write a v

How do I pick a random element from a set? I'm particularly interested in picking a random element from a HashSet or a LinkedHashSet, in Java. Solutions for other languages are also welcome.

I have an error and a warning while trying to compile my linked list implementation in c warning: assignment from incompatible pointer type error: dereferencing pointer to incomplete type here is the

I would like to find out if there is a function that can generate random numbers from a set of numbers in Matlab? For instance say I have the set [-1 1]. How can I generate numbers from that set? I ha

As in the title, I want to use Knuth-Fisher-Yates shuffle algorithm to select N random elements from a List but without using List.toArray and change the list. Here is my current code: public List<

How would i create a linked list to hold my data in Ocaml? Im trying to make a singly linked list, however im having trouble with the syntax. I just want to make a module to simply get the 'a from the

I recently looked at linked list for saving large amount of data. However I am stuck at coming up with a good way to save data for linked list of linked list. The following is a pseudo code of what I

As the title says: how to you remove a random item from a list? I am making text based game, and I have a list in which I want to randomly take an item from and then remove it from the list, as seen b

I am developing a radio streaming application in android , I have a url in my linked list which is a parsed content from a .pls file link , What i needed is to parse the content from the url for that

Every time I run my Doubly Linked List, all of the methods work except when I remove from the back of the list. I had a List that was 4, 3, 9. I removed from front (which took away the 4). Then, I cal

I'm trying to implement a function that swap two nodes of my double linked list, in order to sort the content of the current directory. But my function seems to 'delete' some elements of my list, here

Problem I'm writing a simple Java program in which I have a TreeSet which contains Comparable elements (it's a class that I've written myself). In a specific moment I need to take only the first k ele

I made an SListClass which stands for sinle-linked list class and a node class SListNode. I'm having a problem with the removeLast method. When I print out the list of nodes, the first item is still t

Presuppose there is a linked list with information. My Program architecture rests upon being able to find out if certain information already exists in an individual node in this linked list. (Ie, see

I have a list of custom class. The custom class contains a number of fields. I want to return the entire list but manipulate some of the fields and return all other fields unchanged. Lets say my class

This is essentially a more constrained version of this question. Suppose we have a very large text file, containing a large number of lines. We need to choose a line at random from the file, with uni

I need to save the linked list I created to a file but I want each part of the users account to be its own element. i.e. (username, password, email, name, breed, gender, age, state, hobby). Something

I've created class for building a linked list. The class declaration is as follows: class LinkedList { private: int data; LinkedList *next; static int count; public: LinkedList(void); ~LinkedList(void

if I use a for-each loop on a linked list in java, is it guaranteed that I will iterate on the elements in the order in which they appear in the list?

while loop (so that while the current element in the list is less than the parameter, go to the next element)....but how do I do this? I can't do < or compareTo, so I'm not sure how to do this.

How can i find the size (num of elements) of a linked list using a function in F#? I would like to find the size of linked lists like the following: type rNumber = Integer of int;; type lists = Nil |

I have the following linked list program in java which works fine excep for the reverse linked list function. What am i missing? public class LinkedList { private Link first; public LinkedList() { fir

I need to delete an element from linked list where address of that element is given.Something like this 1->2->3->4->5 a1 a2 a3 a4 a5 where a1,a2..a5 are addresses of elements 1,2 ..5 resp

I have an issue where I am trying to delete an entry from a linked list but it causes a segmentation fault no matter where I try to delete the item from (head, middle, or tail). I'm not sure where the

I need to generate a list with unique elements parties = ['Party A', 'Party B'] I have tried this def party_generator(size=1, chars=string.ascii_uppercase): parties = [] for y in range(2): party = '