Quote:
Originally Posted by LaurV
Very nice, congrats!
How do you make the base switch "in place"? I have some ideas how to transform from base 2 to base 10 quite fast and also have some checkpoints, but you would need more storage space (it can't really be done "in place"), and I don't believe this is new, for sure somebody else was thinking to it before. I "invented" it long time ago and used it in my programs in the past, but never for such large inputs.

To answer your actual question of how I do the conversion "in place". The algorithm is the scaled remainder tree. The only operations are multiplications. There are no additions or subtractions at all.
Each multiply is a middleproduct FFT that splits a 2Nbit binary input into two Nbit binary outputs which are written back into the same memory  overwriting the 2Nbit input. One of the Nbit outputs is the same as the upperhalf of the 2Bbit output. Thus instead of overwriting the entire memory region, the "new" Nbit output overwrites the nolongerneeded portion of the 2Nbit input. This partial write is hard to checkpoint due to problem (2) of the previous post.
Since the inplace multiply is the only operation for the entire radix conversion, these inplace multiplications form a dependency chain/tree that prevents any sort of checkpointing at any step.