You probably know of hotel infinity - a hotel with infinitely many rooms, but also with infinitely many guests. A new guest arrives, how do you fit them in? The solution is to move all the existing guests down by one, so room 0 is empty. It's a nice demonstration of infinite sets, but it turns out to solve a problem quite nicely: memory allocation where there is no capacity for a 'supervisor' that can manage global state.
A GPU is a vector processor: it runs the same set of instructions on several hundred cores at the same time - each with different input data. So if you have an array of data 'M', the cores will run in parallel to create a new array 'N'. In an iterative algorithm, 'M' is replaced by 'N' before repeating the computation. On such a processor, can you do 'dynamic' memory allocation?
Lets say I have an array of memory 'M' containing elements of data 'D'. M is statically sized, and a slot being available is indicated by 'D' being Null. Each core is assigned an element in 'M'. It can either keep the existing value or 'overwrite' it with something else. I have new data 'Dnew' to store, and 'Dnew' is accessible to all the cores - but how does a core figure out if it needs to overwrite it's current value with 'Dnew'?
- The 'obvious' solution of checking if 'the current slot' is free results in data 'Dnew' being written to every empty slot
- The 'usual' solution of storing the index to the lowest empty slot raises the issue of updating the index: None of the cores can write to it while operating on the array.
- The simple solution of 'iterate through the array until an empty slot is found' needs to be extended with 'and then check if I am the empty slot'. This approach is memory bandwidth heavy.
- You can't have a single GPU core check/write the value of the lowest available slot somewhere because GPU's don't work like that. You'd have to copy the array to the CPU, and do the compute there - this forces a GPU/CPU synchronisation event, stalling the CPU and then the GPU.
But hotel infinity gives us a good solution: Every core copies the data from the lower address, and give the zeroth slot our new data!
I did an implementation on shadertoy to test the idea: Up arrow adds a random value to the stack, down arrow removes one.
The expansion of 'what if an infinite number of guests arrive' with the solution 'multiply by two' may also be useful. Let's say I have an array 'M' with data 'D'. All my cores have just generated some new data 'Dnew' (each core has different data results) and the results are in the array of data 'Mnew' which we want to put into our array M. The solution? Odd numbered cores read from Mnew, even numbered cores read from 'M' at 1/2 their address number.
Now, hotel infinity is infinitely sized, but our array 'M' is not. Can we tell if these operations will succeed? Yup. If you're adding one new item, check that the final slot is empty. If you're merging arrays, check that slot `MemSize/2` is empty - assuming that the data is continuous.
Another challenge is deletion/maintenance. An explicit deletion condition is easy enough "any D with value 'x' set itself to null", but how do you handle gaps forming? Shuffle them along! Every cell checks the cells above and below:
- if the cell below is empty, overwrite self with Null
- if the cell itself is empty and the cell above is full, overwrite self with the cell above
So there we have it. You can manage an object pool with no supervisor!