2019-10-13, 18:50   #7
kriesel
GpuOwL attempt

Tested draft for setting up gpuowl. It creates a gpuowl folder, does a git clone, and builds from the latest committed gpuowl source. It was tested long ago, producing the attached build. Considerably better performance is available with newer builds, especially ~V6.11-357 to -380. Users are strongly encouraged to use builds of at least V6.11-318 for PRP proof capability. If merged P-1/PRP is wanted, V7.2-53 is suggested for best performance.

#draft Notebook to set up a gpuowl Google drive folder for a future Colab session
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive//'
!chmod +w '/content/drive/My Drive'

if not os.path.exists('/content/drive/My Drive/gpuowl'):
  !mkdir '/content/drive/My Drive/gpuowl'

%cd '/content/drive/My Drive/gpuowl//'
!git clone

%cd '/content/drive/My Drive/gpuowl/gpuowl//'
!apt install libgmp-dev
!update-alternatives --remove-all gcc 
!update-alternatives --remove-all g++
!apt-get install gcc-8 g++-8
!update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 10
!update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 10
!update-alternatives --config gcc
!update-alternatives --config g++
!g++ --version
!make gpuowl

!echo create config.txt, worktodo.txt before continuing
The resulting executable is attached below in a .7z file. It passed a brief test each on PRP3 and P-1, with the following worktodo.txt

From xx00fs, for production running (set up the Google Drive contents first). Note also that it uses a different directory than the one the first part above sets up. Executables are also available from Dylan14, and (containing faster code by prime95) from Fan Ming. So by copying the section below twice into a Colab notebook and modifying the folder string in one copy, you can easily choose between multiple gpuowl builds, versions, or worktodo files. This allows such maneuvers as readying one for running, or gathering results, while the other is running.
from google.colab import drive
!chmod 777 '/content/drive/My Drive/gpuowl'
!cd '/content/drive/My Drive/gpuowl' && LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && chmod 777 gpuowl && chmod 777 worktodo.txt && ./gpuowl -use ORIG_X2 -block 200 -log 120000 -maxAlloc 10240 -user kriesel -cpu colab/K80
K80s are reportedly ~60 GHzD/day (in LL or PRP3, and presumably also in P-1). P100s and V100s are faster, but T4s and P4s are better used on TF. Gpuowl is reportedly faster than CUDALucas on the same GPU model and exponent task.
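The two-copies idea above amounts to parameterizing the run cell by folder. A minimal sketch, assuming the same shell line and flags as the cell above; the helper name gpuowl_command and the second folder name are hypothetical, not from the original posts.

```python
import shlex

def gpuowl_command(folder, args):
    """Build the shell line that runs ./gpuowl from the given Drive folder."""
    quoted = shlex.quote(folder)  # the path contains a space ("My Drive")
    return (
        f"cd {quoted} && "
        'LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && '
        "chmod 777 gpuowl && chmod 777 worktodo.txt && "
        f"./gpuowl {args}"
    )

# Flags copied from the run cell above.
ARGS = "-use ORIG_X2 -block 200 -log 120000 -maxAlloc 10240 -user kriesel -cpu colab/K80"

cmd_a = gpuowl_command("/content/drive/My Drive/gpuowl", ARGS)
# Hypothetical second folder holding another build or worktodo file.
cmd_b = gpuowl_command("/content/drive/My Drive/gpuowl-b", ARGS)
print(cmd_a)
print(cmd_b)
```

Either string could then be run with subprocess.run(cmd_a, shell=True) from its own notebook section, so one folder can be prepared or harvested while the other is running.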

But what if you don't want to build anew on every Colab account / Google drive? Or want a previous commit, for its differing features, such as V6.11-380 which is generally faster than V7.x and supports standalone P-1 and LLDC?
One could modify the above build process, something like the following.
From my "building with msys2" notes, to get an old gpuowl commit, find the hash corresponding to the version desired, by matching the leading digits of the hash to the trailing digits of the gpuowl version.
For V6.11-380-g79ea0cc that's 79ea0cc29184237b24018e9396df271ec2754e97.
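That matching rule can be sketched in code. This assumes version strings follow the V<major>.<minor>-<commits>-g<hash prefix> shape seen in this post (the "g" precedes the abbreviated hash, optionally followed by "-dirty"); the function names are hypothetical.

```python
import re

VERSION_RE = re.compile(r"^[vV](\d+)\.(\d+)-(\d+)-g([0-9a-f]+)(-dirty)?$")

def parse_version(version):
    """Split e.g. 'V6.11-380-g79ea0cc' into (6, 11, 380, '79ea0cc')."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, commits, prefix, _dirty = m.groups()
    return int(major), int(minor), int(commits), prefix

def find_commit(version, full_hashes):
    """Return the one full hash whose leading digits match the version's suffix."""
    prefix = parse_version(version)[3]
    matches = [h for h in full_hashes if h.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} hashes match prefix {prefix!r}")
    return matches[0]

known = ["79ea0cc29184237b24018e9396df271ec2754e97"]
print(find_commit("V6.11-380-g79ea0cc", known))

# The numeric part also allows simple comparisons, e.g. against the
# V6.11-318 minimum suggested above for PRP proof capability.
print(parse_version("V6.11-380-g79ea0cc")[:3] >= (6, 11, 318))
```

The list of full hashes would come from the commit history (for example, git log output); only the one hash quoted above is taken from this post.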

Then do similar to
git clone --branch v6
cd gpuowl  
git checkout 79ea0cc29184237b24018e9396df271ec2754e97
or perhaps git reset --hard 79ea0cc29184237b24018e9396df271ec2754e97
That would need to be rewritten into the preceding build script in python form.
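One possible Python form of those steps, offered as a sketch rather than a tested Colab cell. The repository URL is not given above, so REPO_URL is a placeholder to fill in; checkout_commands and run_all are hypothetical helper names. Only the git checkout variant is shown.

```python
import os
import subprocess

REPO_URL = "<gpuowl repository URL>"  # placeholder: not given in this post
COMMIT = "79ea0cc29184237b24018e9396df271ec2754e97"  # V6.11-380, from above

def checkout_commands(repo_url, commit, branch="v6", dest="gpuowl"):
    """Return (argv, subdirectory) pairs: clone the branch, then pin the commit."""
    return [
        (["git", "clone", "--branch", branch, repo_url, dest], None),
        (["git", "checkout", commit], dest),
    ]

def run_all(commands, base_dir="."):
    """Run each command in order, stopping at the first failure."""
    for argv, subdir in commands:
        cwd = os.path.join(base_dir, subdir) if subdir else base_dir
        subprocess.run(argv, cwd=cwd, check=True)

# Show what would be run; call run_all(...) once REPO_URL is filled in.
for argv, subdir in checkout_commands(REPO_URL, COMMIT):
    print(subdir or ".", ":", " ".join(argv))
```

In the build script above, run_all(checkout_commands(REPO_URL, COMMIT), '/content/drive/My Drive/gpuowl') would take the place of the bare git clone step, with the make step following unchanged.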

Or attempt to use one of the past builds posted for Linux & Colab. There were issues seen with each of these when I tried them, and others hit them too. See also the gpuowl runtime error thread.
(Links for available gpuowl downloadable builds are maintained at Also available at the download mirror)

In a subfolder for gpuowl and the most likely GPU model, K80, I tried a downloaded v6.11-380 build for Linux. That version is among the fastest ever on Windows and dates from ~2020-09-07. It repeatedly gave the following error:
./gpuowl.exe: /usr/lib/x86_64-linux-gnu/ version `GLIBCXX_3.4.26' not found (required by ./gpuowl.exe)
The needed libstdc files were present in the download and in the working directory for the run attempt. That error message is the entire log content generated for a run attempt. I haven't been able to resolve that yet; it may require updating gcc on the Colab VM. I was looking for a solution requiring no change to the VM and no new build, so I switched and tried v6.11-366 for Linux.
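One way to narrow down a GLIBCXX error like this is to list which GLIBCXX version tags a given libstdc++ file actually contains, much as strings piped through grep would. A minimal sketch; the function names are hypothetical, and the sample bytes are made up for illustration, not read from a real library.

```python
import re

def glibcxx_versions(data):
    """Return the sorted GLIBCXX version tags embedded in library bytes."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(tuple(int(p) for p in v.decode().split(".")) for v in found)

def provides(data, wanted="3.4.26"):
    """True if the library bytes contain the wanted GLIBCXX version tag."""
    return tuple(int(p) for p in wanted.split(".")) in glibcxx_versions(data)

# Made-up sample standing in for the contents of a libstdc++ file.
sample = b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.25\x00GLIBCXX_3.4.9\x00"
print(glibcxx_versions(sample))  # [(3, 4), (3, 4, 9), (3, 4, 25)]
print(provides(sample))          # False
```

To use it, read the file in binary mode and pass the bytes in, once for the system copy (assumed here to be libstdc++.so.6 in the directory the error names) and once for the copy bundled with the download. The error message names the system directory, which suggests the loader resolved the system copy rather than the bundled one.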

The downloaded v6.11-366 build, which is from ~2020-09-03, hit the missing kernel issue:
2021-09-28 00:19:16 gpuowl v6.11-366-gf887d6e-dirty
2021-09-28 00:19:16 config: -user kriesel -cpu colab5/K80 -d 0 -maxAlloc 10000 -cleanup -block 1000 -log 50000 -proof 10
2021-09-28 00:19:16 device 0, unique id ''
2021-09-28 00:19:16 colab5/K80 61646899 FFT: 3.25M 256:13:512 (18.09 bpw)
2021-09-28 00:19:16 colab5/K80 Expected maximum carry32: 37DF0000
2021-09-28 00:19:17 colab5/K80 OpenCL args "-DEXP=61646899u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=13u -DPM1=0 -DWEIGHT_STEP_MINUS_1=0x1.c25dade4f5b0bp-1 -DIWEIGHT_STEP_MINUS_1=-0x1.df359572a054ap-2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-09-28 00:19:17 colab5/K80 

2021-09-28 00:19:17 colab5/K80 OpenCL compilation in 0.38 s
2021-09-28 00:19:17 colab5/K80 Exception gpu_error: INVALID_KERNEL clSetKernelArg(k, pos, sizeof(value), &value) at clwrap.h:77 setArg
2021-09-28 00:19:17 colab5/K80 Bye
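As a quick consistency check on the log above, the FFT line and the OpenCL args agree with each other. The sketch below assumes the FFT length is 2 x WIDTH x MIDDLE x SMALL_HEIGHT; that factor of 2 is inferred from the log's own numbers matching, not taken from the gpuowl source.

```python
exponent = 61646899                          # from -DEXP
width, middle, small_height = 256, 13, 512   # from -DWIDTH, -DMIDDLE, -DSMALL_HEIGHT

fft_words = 2 * width * middle * small_height  # assumed relation, see above
fft_m = fft_words / 2**20                      # length in units of M (2^20 words)
bpw = exponent / fft_words                     # bits per word

print(fft_words, fft_m, round(bpw, 2))  # 3407872 3.25 18.09
```

That reproduces the "3.25M 256:13:512 (18.09 bpw)" reported in the log, so the run got as far as choosing the FFT and compiling before the kernel error.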
(In a futile attempt at brevity, I've mostly left out the usual noise along the way in the various run attempts here: the occasional permissions issue, missing newline in config.txt or worktodo.txt, unsupported -use option, adding -nospin, typos caught and fixed, -fft 3.5M to avoid the slower 3.25M for LLDC, etc. The exact version is not critical, since extensive benchmarking shows that versions from -318 to -380 are usually within a couple percent of each other in performance on Windows.)

OK, retreat further in time, to the downloaded build for Linux Colab that Fan Ming posted on 2020-06-29, which is ~v6.11-329 judging by dates. It may be e5a8f2c, as on the mirror.
There followed several rounds of the Colab run claiming that the gpuowl file was not there, when it clearly was, then repeated permissions problems. Something or other was running, but appeared to be making no log entries; I could see it in top. Later I realized there's always a lag before gpuowl log output gets committed to disk, on Windows or Linux, and this is Colab on Google Drive, so the lag may be longer.
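A pre-flight check run before launching can separate "file not found" from "file present but not executable". A minimal sketch, assuming the folder layout used in the run cell earlier; preflight and make_executable are hypothetical helper names.

```python
import os
import stat

def preflight(folder, exe="gpuowl", needed=("worktodo.txt",)):
    """Return a list of problems found in the run folder (empty if none)."""
    problems = []
    exe_path = os.path.join(folder, exe)
    if not os.path.isfile(exe_path):
        problems.append(f"missing executable: {exe_path}")
    elif not os.access(exe_path, os.X_OK):
        problems.append(f"not executable: {exe_path}")
    for name in needed:
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            problems.append(f"missing file: {path}")
        elif not os.access(path, os.R_OK | os.W_OK):
            problems.append(f"not readable and writable: {path}")
    return problems

def make_executable(path):
    """Add execute permission for user, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Outside Colab this folder will not exist, so problems are reported.
for problem in preflight("/content/drive/My Drive/gpuowl"):
    print(problem)
```

If the executable is reported as present but not executable, make_executable on its path does roughly what the chmod in the run cell does for the execute bits.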

This gpuowl experience on Colab was a huge contrast to mprime on the same account. That was simply: copy, edit, and copy over local.txt, prime.txt, and the usual mprime files; run; bam, done. One try, no permissions issues or anything, on the same new Colab account and Google Drive, in its own subfolder. That happened first, like Murphy setting a trap, before I attempted gpuowl. All files for both mprime and gpuowl were extracted onto and then copied over from the same server drive by the same method, to Google Drive folders for mprime and gpuowl/K80 created the same way. Weird. Maybe the archive files for the gpuowl Linux versions carried over permission restrictions I did not notice.

The other wrinkle is that Fan Ming's gpuowl is using 93% of a CPU on Linux. That's not supposed to happen. It will be a drag on the mprime performance running in parallel with Gpuowl. Gpuowl's -yield option in config.txt did not matter; not surprising since that was added for Windows 7 and seems to not work for Windows 10 either.
top - 01:24:32 up 31 min,  0 users,  load average: 1.87, 1.92, 1.56

%Cpu(s): 22.3 us, 25.2 sy, 48.6 ni,  3.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13302924 total,  9581668 free,  1005436 used,  2715820 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 12264724 avail Mem 

    394 root      30  10  297100 143784   7332 S  97.1  1.1  15:49.73 mprime    
    407 root      20   0 36.608g 277992  96752 R  90.6  2.1  14:43.88 gfm       
     74 root      20   0  701272 116492  32104 S   0.8  0.9   0:17.11 python3   
     94 root      20   0  129176  17092   5636 S   0.4  0.1   0:06.85 python3   
     64 root      20   0  708236  10256   4524 S   0.2  0.1   0:03.19 dap_mult+ 
      1 root      20   0  339816  50716  32408 S   0.1  0.4   0:02.27 node      
     63 root      20   0  194928  60528  13508 S   0.1  0.5   0:02.57 jupyter-+ 
    218 root      20   0 2614224  66156  30332 S   0.0  0.5   0:04.17 drive     
     16 root      20   0   35892   4760   3660 S   0.0  0.0   0:00.62 tail      
    289 root      20   0    4572    776    716 S   0.0  0.0   0:00.08 tail
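The process rows above can be filtered in code to flag the heavy CPU users. A minimal sketch, assuming top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), since the header row is not shown above; busy_processes is a hypothetical name.

```python
def busy_processes(top_rows, threshold=50.0):
    """Return (command, %CPU) for process rows at or above the threshold."""
    busy = []
    for row in top_rows.splitlines():
        fields = row.split()
        # Process rows start with a numeric PID and have 12 columns.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        cpu = float(fields[8])  # %CPU column in the assumed layout
        if cpu >= threshold:
            busy.append((fields[11], cpu))
    return busy

rows = """
    394 root      30  10  297100 143784   7332 S  97.1  1.1  15:49.73 mprime
    407 root      20   0 36.608g 277992  96752 R  90.6  2.1  14:43.88 gfm
     74 root      20   0  701272 116492  32104 S   0.8  0.9   0:17.11 python3
"""
print(busy_processes(rows))  # [('mprime', 97.1), ('gfm', 90.6)]
```

Here gfm is presumably the Fan Ming gpuowl executable, so the output shows it taking nearly a full core alongside mprime.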
Hmm, maybe gpuowl's running on the cpu OpenCL? No, gpuowl -h includes a device list, and there's no Intel or AMD or other cpu in it:
-device <N>        : select a specific device:
 0  : Tesla K80- not-AMD
Okay, on rereading, maybe V6.11-366 can be made to work by copying in Windows-generated versions of the files "gpuowl-wrap.cpp" or "". They're MIA: not in the Linux archive file I began from, and not available for -366 on my build system for Windows via msys2. is there but not them. Nope, no such files needed in Windows, just, in -364. And using -364's version of "gpuowl-wrap.cpp" or "" on -366 may create new problems.

Maybe a Linux wizard on the forum can provide some solutions.

Top of reference tree:
Attached Files
File Type: 7z gpuowl-6-11-11-colab.7z (191.4 KB, 277 views)

Last fiddled with by kriesel on 2021-09-28 at 15:30 Reason: add latest downloaded-build attempts