Changing test types for just a minute. I ran this:

The bounds were 600,000 and 30,000,000 respectively. As preda suggested, I reduced B1 and left B2 at its default. Below is the result of trying to begin Stage 2:

Exception gpu_error: MEM_OBJECT_ALLOCATION_FAILURE clEnqueueCopyBuffer(queue.src, dst, 0, 0, size, 0, NULL, NULL) at clwrap.cpp:339 copybuf

I found the issue: maxAlloc. I had it set too high, at 7,000. I dropped it to 6,000 and it went on to Stage 2. Caveat: with a B2 of this size, expect Stage 2 to take at least 12 hours, perhaps more.
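
For anyone hitting the same error, the fix amounts to stepping the memory cap down until Stage 2 can allocate its buffers. Here is a rough sketch of that trial and error in Python. Note that `try_stage2` is a hypothetical stand-in for launching the run with a given cap (it is not part of the program), and the 6,500 limit in the fake test function is a made-up number purely for illustration:

```python
def find_working_cap(try_stage2, start=7000, step=1000, floor=1000):
    """Lower the cap from `start` by `step` until try_stage2(cap) succeeds.

    try_stage2 is a hypothetical callable: it returns True if Stage 2
    starts and False if the buffer allocation fails.
    Returns the first cap that works, or None if none does.
    """
    cap = start
    while cap >= floor:
        if try_stage2(cap):
            return cap
        cap -= step
    return None


# Stand-in for a card that can only spare 6,500 units for buffers
# (made-up number, same units as the cap).
def fake_try_stage2(cap):
    return cap <= 6500


print(find_working_cap(fake_try_stage2))  # prints 6000
```

Starting high and stepping down keeps as much memory available to Stage 2 as the card will actually grant, which is the same thing I did by hand going from 7,000 to 6,000.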

Some Nvidia GPUs are better suited to certain tasks, while other models do better at different ones. With mine, it has something to do with integer math, or the type of integer math. It is great at running TF. Anything else, not so much.
