C++ Lock Free Multi Producer And Consumer Circular Queue
I. Overview
In this article I want to discuss a multi producer, multi consumer lock free queue. This is differentiated from a single producer, single consumer lock free queue in that multiple writer and reader threads can push and pop from the queue respectively. See Wikipedia Circular Buffer and previous Linked In article C++ Lock Free Circular Queue.
II. Single Producer Push
Let's take a look at the single producer push routine from the previous article where I have taken the liberty of upgrading the memory order from relaxed to acquire where necessary since we are looking to use this for multiple producers.
Why does this not work for a multiple producers? It is pretty easy to see how this would not work even in the case of two producers. If we have two threads trying to push an element at the same time then they both will get the same write index value. Then they will both see a queue not full, and finally they will both construct different elements into the same memory location. Other bad things may happen, but at a minimum we will have data corruption and missing elements which will only compound with the more producers that we have.
III. Multi Producer Approach
So how can we change this to allow multiple producers?
A. Increment Write Index on Entry
If we changed the first operation to be a fetch add as follows:
This would definitely solve the issue of the write index value being the same for multiple producers. However, it introduces a couple other problems. First, if we have two producers and the queue is full, then the first one to enter will see the full queue and return. However, the second one to enter will have a write index which now exceeds the read index, thus allowing a write on a full queue. Second, as soon as we increment the write index, it indicates an element has been added to the queue so a read could take place on an element which has not been properly constructed yet. There are other issues, but this is a no go.
B. Pending Write Atomic Counter
What if we used a pending write atomic counter with something like the following:
This would solve the issue of overrunning the read index with the write index. However we need a way to mark a write index as pending to construct an element to. There is no relation ship between the pending counter and the current write index which would give a unique pending write index. Also, incrementing the write index to mark as pending, results in the same situation in A above where we try to read a not properly constructed element.
There is an even more subtle issue here. Even if we were able to get a valid pending write index; there is no guarantee for the order that the elements are constructed. If we have two producers trying to write to element 0 and element 1 respectively, then element 1 may complete first. But we cannot update the write index to this element since element 0 is not constructed yet. So this idea keeps increasing in complexity with no obvious path forward.
C. Two Stage Commit
What we need is a way to allow the write index to advance to reserve space, but also be able to indicate that an element has been properly constructed. A solution to this which is also used in databases in called a two stage commit. Instead of storing just our data in the queue we now store a structure as follows:
Similar to before, we have a block of memory which we can construct an element in, and we have added an atomic bool to indicate whether this element has been properly constructed or not. So now we can advance the write index to reserve a memory location in order to put an element in and mark when it is finish constructed and valid to be read.
There are other ways to indicate a two stage commit using std::atomic_flag or reserving a bit on the address through alignment.
IV. Multi Producer Push
With the approach outline above we can now create a lock free, multi producer push routine as follows:
We read the current write and read indices and check for a full queue. While this does give the same write index to multiple producers to begin with, we resolve that problem later on. If the queue is not full we try to advance the write index to reserve a spot to put the data into. If we do not succeed then someone else has advanced already, and we will get the new write index and loop again to check for a full queue.
If we were able to reserve a spot then we need to wait for the element to be in a deconstructed state since it is possible that this element is currently being read and we would not want to write into it and corrupt the value. Once not in use, we construct the element in the reserved spot and then mark it as valid. This sequence completes our two stage commit. We also do not consume a possibly moved in value if we are not able to place it in the queue.
V. Multi Consumer Pop
The changes to the pop routine are similar to those for the push routine and we end up with a routine as follows:
We read the current read and write indices and check for an empty queue. Similar to the write it is possible that multiple consumers get the same read index, but that is properly resolved later on. We try to grab an element to read by advance the read index. If we fail, then someone else has already grabbed it and we get the new read index. Then we loop again checking for an empty queue.
If we were able to grab an element then we spin waiting for the element to be properly constructed since it is possible that we are writing to the same element we are trying to read. Once it is properly constructed we read out the data and call the destructor if necessary. Finally we mark the element as no longer used, and return the value.
An examination of this routine shows that they are almost mirror images of each other.
VI. Code
The code for the entire circular queue is given as follows:
VII. Performance
We will get a relative comparison by running both the single producer, single consumer version and the multi producer, multi consumer version through the same testing framework with verification. I will caveat this with a statement that this is not a production level performance test.
The result of the single producer, single consumer queue is as follows:
The results of the multi producer, multi consumer queue where there are always an equal number of reader and writer threads is as follows:
We see here that the performance is relatively the same as the SPSC queue and scales reasonably well to multiple threads.
VIII. Summary
In this article we saw how we can modify our single producer, single consumer lock free queue into a multi producer, multi consumer lock free queue using the concept of a two stage commit.
In a future article we can now use this queue in combination with the lock free stack to produce a non memory leaking lock free stack.