This replaces the internal mutex with a semaphore, so the implementation
uses only a single kind of synchronization primitive, and cleans up some of
the logic around wait timeouts.
This now matches the logic of the originally cited work, from BeOS.
Fixes #3639.
(I think.)