mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cloud Computing (https://www.mersenneforum.org/forumdisplay.php?f=134)
-   -   New Google Colab Notebooks For Primality Testing (https://www.mersenneforum.org/showthread.php?t=26522)

danc2 2021-02-23 19:55

~1 Month Results
 
1 Attachment(s)
[QUOTE]Note that with Colab Pro, first time primality tests on the GPU will only take around 2-3 days.[/QUOTE]
To add to Teal's point about how quickly results are returned with the Pro version, attached is a picture of the results from approximately one month and 8 days of testing using Colab Pro. I mostly ran one Colab Pro machine, and later added a second Colab Pro machine I purchased. I was using the Colab Extension (though I had some issues that slowed me down; so, think: "even more results are possible").

The results from `Oracle*`, `pdxEmail`, and `Windows` CPUs can be ignored as they are not from Colab, but everything else is. Please also note that I was without power for 5 days and thus missed out on roughly 6 days of results. In total, the sum of primality results returned in this timeframe of January 16 - February 23rd is [B][SIZE="4"]46[/SIZE][/B]. This is significant because most of these are not small DCs or CERTs.

chalsall 2021-02-23 21:20

[QUOTE=danc2;572346]This is significant because most of these are not small DCs or CERTs.[/QUOTE]

If I may please share, I'm really enjoying this experiment. :tu:


1. A reasonable amount of compute can be "harvested" from Colab.


2. There seem to be quite a few "dimensions" to the compute allotments.


3. While those running the GPU72 Notebook were shut out, others were reporting 12 hours or so of GPU.

3.1. The Google Gods (which may simply be Humans directing machines) act in mysterious ways.


4. Recently, those running the GPU72 Notebook have been getting a bit of compute each day.

4.1. My thirteen (13) instances (spread across five machines in three countries) have to be interacted with, but they always get at least CPU compute for at least 20 minutes.


To be honest, I've been as fascinated watching the experimenters experiment with the Subjects as with anything else.

(I'm reminded of Douglas Adams, and the Mice and the Dolphins (or was it the whales)).

Uncwilly 2021-02-23 22:26

[QUOTE=chalsall;572351]To be honest, I've been as fascinated watching the experimenters experiment with the Subjects as with anything else.

(I'm reminded of Douglas Adams, and the Mice and the Dolphins (or was it the whales)).[/QUOTE]Not Milgram?

chalsall 2021-02-23 22:48

[QUOTE=Uncwilly;572356]Not Milgram?[/QUOTE]

While seminal, in my opinion "lightweight".

That study didn't bring the profit driver function into the equation (although it might have identified psychopaths as interesting subjects).

tdulcet 2021-02-24 15:54

[QUOTE=kriesel;572256]I'm always happy to see someone chip in and contribute to development.[/QUOTE]

No problem, we are happy to help.

[QUOTE=kriesel;572256]Google Colaboratory has resorted at times to requiring ostensibly human image analysis before authorizing a Colab session.
Three by three arrays of little and sometimes unclear images, with a requirement to select each image that contains bicycles, or palm trees, or hills, or buses, etc. (One object category per challenge session.) Sometimes selected images are replaced with additional ones until no qualifying images remain; sometimes it's only the initial set of 9. And there have sometimes been child windows specifying it is for human interactive use, not bots, and requiring click confirmation that yes it's a human at the keyboard.[/QUOTE]

I have only seen this once, and only with the free Colab. However, even if a notebook disconnects, our extension will just automatically reconnect it. I added a new optional feature to our extension which will automatically rotate through the user's Colab tabs when their system is idle or locked (similar to a screen saver, but the screen does not need to be on). This should help prevent the notebooks from being perceived as inactive, particularly for users who are using a dedicated device such as a Raspberry Pi to run their notebooks.

[QUOTE=kriesel;572256]Gpuowl reportedly is faster than CUDALucas on the same gpu model and exponent task.[/QUOTE]

Yeah, I have seen a few posts that claim this, but I do not think anyone has tested yet with all five GPUs currently available on Colab, and I am not sure exactly what procedure they followed to come to that conclusion. Our GPU notebook (which uses my CUDALucas install script) makes several changes to the Makefile before building CUDALucas, which likely affect the resulting performance, including enabling the [C]-O3[/C] optimization and correctly setting the [C]--generate-code[/C] flag for every GPU available on Colab. We also did the cufftbench and threadbench tuning in advance for all five GPUs, up to the 32768K FFT length, which covers exponents up to 580,225,813. You can see the resulting [C]*fft.txt[/C] and [C]*threads.txt[/C] files in our repository [URL="https://github.com/tdulcet/Distributed-Computing-Scripts/tree/master/google-colab/gpu_optimizations"]here[/URL], which list the ms/iter speeds at every FFT length.

kriesel 2021-02-24 17:50

1 Attachment(s)
[QUOTE=tdulcet;572425]I have only seen this once and only with the free Colab. [/QUOTE]In support of chalsall's statement that Google offers Colab for interactive use, not bot use, the image interpretation task used to occur at least daily on one of my several Colab free accounts; same account every time. It doesn't happen often now, but it still comes up.

re gpuowl faster than cudalucas:
[QUOTE]Yeah, I have seen a few posts that claim this, but I do not think anyone has tested yet with all five GPUs currently available on Colab and I am not sure exactly what procedure they followed to come to that conclusion[/QUOTE]I don't have the time now to respond thoroughly to that. But I did enough testing to decide that all my local gpus that could run gpuowl would completely transition from the already-established CUDALucas. I had thoroughly tested and tuned numerous gpu models, from Quadro2000 to GTX1080Ti, in CUDALucas before that. Here's a recent quick comparison on a GTX1080.

Compare LL on CUDALucas to PRP on gpuowl. Same exponent, same host, same gpu, same hour, same environmental and clocking conditions, a GTX1080 for this quick benchmark.

CUDALucas v2.06 May 5 2017 version compiled by flashjh; Windows 10 run environment
[CODE]Starting M240110503 fft length = 13824K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 23 16:14:28 | M240110503 10000 0x5b6b7cbec1bdc015 | 13824K 0.08594 13.4883 134.88s | 37:11:35:46 0.00% |
| Feb 23 16:16:43 | M240110503 20000 0xde34ff2ddb2080a4 | 13824K 0.08789 13.5358 135.35s | 37:13:08:45 0.00% |
| Feb 23 16:18:58 | M240110503 30000 0x14e2c4cd92c29164 | 13824K 0.09180 13.5395 135.39s | 37:13:43:10 0.01% |
| Feb 23 16:21:14 | M240110503 40000 0x5256dd82035447c4 | 13824K 0.08594 13.5488 135.48s | 37:14:08:29 0.01% |
| Feb 23 16:23:29 | M240110503 50000 0xe89ddd5520561b21 | 13824K 0.08594 13.5361 135.36s | 37:14:12:38 0.02% |[/CODE]average ms/it 13.5297
ETA: 240110503 * 0.0135297 sec/iter / 86400 sec/day =~ 37.600 days

Gpuowl v6.11-380 excerpt mid-run of PRP/GEC/proof, 13M fft (1k:13:512):[CODE]
2021-02-23 15:37:13 asr3/gtx1080 240110503 OK 131700000 54.85%; 11875 us/it; ETA 14d 21:36; a5f295da6eddc0a1 (check 5.17s)
2021-02-23 15:47:13 asr3/gtx1080 240110503 OK 131750000 54.87%; 11877 us/it; ETA 14d 21:30; f20a694bd0c842de (check 5.71s)
2021-02-23 15:57:12 asr3/gtx1080 240110503 OK 131800000 54.89%; 11883 us/it; ETA 14d 21:31; 7ddaab01bbd26fcd (check 5.20s)
2021-02-23 16:07:11 asr3/gtx1080 240110503 OK 131850000 54.91%; 11866 us/it; ETA 14d 20:50; 38b6acb7773f3896 (check 5.28s)[/CODE]average ms/it 11.875
ETA start to finish: 240110503 * 0.011875 sec/iter / 86400 sec/day =~ 33.001 days

Raw iteration speed ratio, gpuowl PRP / CUDALucas LL = 13.5297 / 11.875 (equivalently 37.600 / 33.001) =~ 1.139

The fft length difference (13.5M CUDALucas vs 13M gpuowl) only accounts for ~4% of the observed ~14% difference favoring gpuowl (like getting 8 days per week!).
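As a sanity check, the ETA arithmetic above can be reproduced in a few lines. This is just a sketch; the exponent and per-iteration timings are copied from the logs above, and nothing else is assumed:

```python
# Re-deriving the ETA figures above from the logged per-iteration times.
# Exponent and ms/iter values are taken from the CUDALucas and gpuowl logs.
EXPONENT = 240110503

def eta_days(ms_per_iter: float) -> float:
    """Approximate full-run time in days: roughly one iteration per bit of the exponent."""
    return EXPONENT * ms_per_iter / 1000 / 86400

cudalucas = eta_days(13.5297)  # LL, 13824K FFT
gpuowl = eta_days(11.875)      # PRP, 13M (13312K) FFT

print(f"CUDALucas: {cudalucas:.3f} days")      # ~37.600
print(f"gpuowl:    {gpuowl:.3f} days")         # ~33.001
print(f"speedup:   {cudalucas / gpuowl:.3f}")  # ~1.139
# How much of that gap the FFT sizes alone explain:
print(f"FFT-size ratio: {13824 / 13312:.3f}")  # ~1.038, i.e. roughly 4%
```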


What's omitted above is the slightly-more-than-2:1 overall project speed advantage of PRP/GEC/proof over LL, LL DC, and the typically 4% LL TC, which is lost by using CUDALucas. Also lost is error checking: CUDALucas lacks even the relatively weaker Jacobi symbol check, unless you've added it to your builds. The higher the exponent and the longer the run, the less likely a run will complete correctly without GEC.

In P-1, you could perhaps compare my CUDAPm1 fft and threads file timings and estimate P-1 run times. If you try running P-1 tests on Colab I'd be interested in learning how to resolve the zero-residue issue I ran into. [URL]https://www.mersenneforum.org/showpost.php?p=527928&postcount=5[/URL]

Gpuowl P-1 run time scaling for various gpus, including 2 Colab models, can be found [URL="https://www.mersenneforum.org/showpost.php?p=525955&postcount=17"]here[/URL]. Benchmarking on a V100 has been a non-issue, since I don't recall ever encountering one. Lately it's almost entirely T4s, which are more suitable for TF.

danc2 2021-02-24 19:31

[QUOTE][image interpretation task] doesn't happen often now, but it still comes up.[/QUOTE]
I would be curious if Teal has seen this when using the extension or not. The extension can check (clicks on the play button of the first cell) every 5 seconds IIRC (customizable by the user). With this setup, I've never seen the interpretation task.

GPUOwl stuff:
Yes, it would be great if we could use GPUOwl instead of CUDALucas; as great as CUDALucas is, it sounds like more can be done with GPUOwl.

kriesel 2021-02-25 12:46

GTX1060 gpuowl vs. CUDALucas ~58M LL DC
 
Executive summary: Gpuowl 5.8 ms/iter with Jacobi check; CUDALucas 6.25-6.5 ms/iter (no Jacobi check)


Gpuowl v6.11-380 on GTX1060 ~5.806 ms/iter in 58.75M LL DC with Jacobi check:[CODE]2021-02-22 21:04:36 condor/gtx1060 58755607 FFT: 3M 1K:6:256 (18.68 bpw)
2021-02-22 21:04:36 condor/gtx1060 Expected maximum carry32: 50550000
2021-02-22 21:04:36 condor/gtx1060 OpenCL args "-DEXP=58755607u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURAC
Y=1 -DWEIGHT_STEP_MINUS_1=0x8.01304be8dc228p-5 -DIWEIGHT_STEP_MINUS_1=-0xc.ce52411c70cep-6 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-02-22 21:04:39 condor/gtx1060

2021-02-22 21:04:39 condor/gtx1060 OpenCL compilation in 2.52 s
2021-02-22 21:04:39 condor/gtx1060 58755607 LL 0 loaded: 0000000000000004
2021-02-22 21:06:00 condor/gtx1060 102714151 P2 GCD: no factor
2021-02-22 21:06:00 condor/gtx1060 {"status":"NF", "exponent":"102714151", "worktype":"PM1", "B1":"1000000", "B2":"30000000", "fft-length":"5767168", "program":
{"name":"gpuowl", "version":"v6.11-380-g79ea0cc"}, "user":"kriesel", "computer":"condor/gtx1060", "aid":"7DAA6CA7DFF308D0DF638276AF9B5028", "timestamp":"2021-02
-23 03:06:00 UTC"}
2021-02-22 21:14:20 condor/gtx1060 58755607 LL 100000 0.17%; 5807 us/it; ETA 3d 22:37; 39c251c47f602a3d
2021-02-22 21:24:01 condor/gtx1060 58755607 LL 200000 0.34%; 5807 us/it; ETA 3d 22:28; eb46c0fb8d0e94f8
2021-02-22 21:33:41 condor/gtx1060 58755607 LL 300000 0.51%; 5807 us/it; ETA 3d 22:18; ed993c4bb040ddef
2021-02-22 21:43:22 condor/gtx1060 58755607 LL 400000 0.68%; 5807 us/it; ETA 3d 22:07; 54e2c2904288419d
2021-02-22 21:53:03 condor/gtx1060 58755607 LL 500000 0.85%; 5808 us/it; ETA 3d 21:59; 16657e0fba393f7f
2021-02-22 22:02:43 condor/gtx1060 58755607 LL 600000 1.02%; 5808 us/it; ETA 3d 21:49; 7ca0fe4b4db9c724
2021-02-22 22:02:43 condor/gtx1060 58755607 OK 500000 (jacobi == -1)
2021-02-22 22:12:24 condor/gtx1060 58755607 LL 700000 1.19%; 5808 us/it; ETA 3d 21:40; 22aa1cb83c55294c
...
2021-02-25 05:41:26 condor/gtx1060 58755607 LL 35100000 59.74%; 5805 us/it; ETA 1d 14:09; 7810938d88993295
2021-02-25 05:41:26 condor/gtx1060 58755607 OK 35000000 (jacobi == -1)
2021-02-25 05:51:06 condor/gtx1060 58755607 LL 35200000 59.91%; 5804 us/it; ETA 1d 13:59; 5d55d69ab7ca60a9
2021-02-25 06:00:46 condor/gtx1060 58755607 LL 35300000 60.08%; 5804 us/it; ETA 1d 13:49; 5635fb50dc776ab9
2021-02-25 06:10:27 condor/gtx1060 58755607 LL 35400000 60.25%; 5804 us/it; ETA 1d 13:39; 2ef462f9a00916b2
2021-02-25 06:14:25 condor/gtx1060 Stopping, please wait..
2021-02-25 06:14:25 condor/gtx1060 58755607 LL 35441000 60.32%; 5813 us/it; ETA 1d 13:39; bdb95405e8027916
2021-02-25 06:14:25 condor/gtx1060 waiting for the Jacobi check to finish..
2021-02-25 06:15:12 condor/gtx1060 58755607 OK 35441000 (jacobi == -1)
[/CODE]CUDALucas v2.06 May 5 2017, same everything else: nominally 6.248 ms/iter, but actually higher because of oscillation between the 3136K and 3200K fft lengths;
10:51 / 100k iterations = 6.51 ms/iter, 12% longer than Gpuowl, and no Jacobi check:[CODE]Using threads: square 512, splice 128.
Starting M58755607 fft length = 3200K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Feb 25 06:20:15 | M58755607 50000 0x6b790995614a3aa2 | 3200K 0.19189 6.2483 312.41s | 4:05:53:31 0.08% |
Resettng fft.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 50001 with fft length 3136K, 0.09% done

Round off error at iteration = 51500, err = 0.35938 > 0.35, fft = 3136K.
Restarting from last checkpoint to see if the error is repeatable.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 50001 with fft length 3136K, 0.09% done

Round off error at iteration = 51500, err = 0.35938 > 0.35, fft = 3136K.
The error persists.
Trying a larger fft until the next checkpoint.

Using threads: square 512, splice 128.

Continuing M58755607 @ iteration 50001 with fft length 3200K, 0.09% done

| Feb 25 06:25:45 | M58755607 100000 0x39c251c47f602a3d | 3200K 0.18750 6.2484 312.41s | 4:05:48:26 0.17% |
Resettng fft.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 100001 with fft length 3136K, 0.17% done

Round off error at iteration = 100700, err = 0.35156 > 0.35, fft = 3136K.
Restarting from last checkpoint to see if the error is repeatable.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 100001 with fft length 3136K, 0.17% done

Round off error at iteration = 100700, err = 0.35156 > 0.35, fft = 3136K.
The error persists.
Trying a larger fft until the next checkpoint.

Using threads: square 512, splice 128.

Continuing M58755607 @ iteration 100001 with fft length 3200K, 0.17% done

| Feb 25 06:31:06 | M58755607 150000 0x71a49982b1d8c05d | 3200K 0.17969 6.2493 312.46s | 4:05:44:06 0.25% |
Resettng fft.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 150001 with fft length 3136K, 0.26% done

Round off error at iteration = 158700, err = 0.375 > 0.35, fft = 3136K.
Restarting from last checkpoint to see if the error is repeatable.

Using threads: square 512, splice 32.

Continuing M58755607 @ iteration 150001 with fft length 3136K, 0.26% done

Round off error at iteration = 158700, err = 0.375 > 0.35, fft = 3136K.
The error persists.
Trying a larger fft until the next checkpoint.[/CODE]CUDALucas was a great program. Had a lot of fun with it. It has been surpassed and is not being actively maintained.

tdulcet 2021-02-25 14:44

[QUOTE=kriesel;572432]In support of chalsall's statement that Google offers Colab for interactive use, not bot use, the image interpretation task used to occur at least daily on one of my several Colab free accounts; same account every time. It doesn't happen often now, but it still comes up.[/QUOTE]

Our extension is not designed to act like a bot, and I would actually consider that an abuse of it. It exists only to assist users with the otherwise tedious task of checking whether their notebooks will connect/reconnect, to help them maximize their runtime. It is also not designed to be used noninteractively. By default, it will display a desktop notification whenever a notebook connects, reconnects or disconnects due to usage limits. Clicking these notifications opens the tab/window with the notebook so the user can easily monitor the progress and, after it connects, check which GPU/CPU they got. Even with our extension installed, I still manually check my Colab tabs at least hourly to monitor the progress and check our notebooks for errors, as often as I would without the extension.

Note that there are existing add-ons that claim to be able to automatically solve these reCAPTCHAs (I have never tried any of them), such as [URL="https://addons.mozilla.org/en-US/firefox/addon/buster-captcha-solver/"]Buster: Captcha Solver for Humans[/URL], which could potentially be used if this ever becomes problematic in Colab.

[QUOTE=kriesel;572432]But I did enough testing to decide that all my local gpus that could run gpuowl would completely transition from already-established CUDALucas. I had thoroughly tested and tuned for numerous gpu models from Quadro2000 to GTX1080Ti in CUDALucas before that.[/QUOTE]

OK, I have no doubt that GpuOwl is faster on some Nvidia GPUs than CUDALucas, and your results show that for your GTX 1080 and GTX 1060 GPUs. However, I was specifically referring to the Tesla V100, P100, K80, T4 and P4 GPUs available on Colab, and to using my install script to build CUDALucas. I do not think anyone has tested yet with all of those.

For a wavefront first time primality test (with an exponent up to 115,080,019), here are the ms/iter speeds with CUDALucas on Colab using our GPU notebook (all 6272K FFT length):
[LIST][*]Tesla V100: 1.14 ms/iter[*]Tesla P100: 1.74 ms/iter[*]Tesla K80: 6.66 - 7.36 ms/iter[*]Tesla T4: 7.95 - 8.48 ms/iter[*]Tesla P4: 10.24 ms/iter[/LIST]We would be interested in seeing the corresponding ms/iter speeds with GpuOwl on Colab.
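For a rough sense of scale, those per-iteration times convert to wall-clock days per test as days = exponent * ms/iter / 1000 / 86400. A quick sketch using the 115,080,019 wavefront exponent mentioned above (taking the faster end of each quoted range):

```python
# Rough days-per-test from the CUDALucas ms/iter figures quoted above,
# for the wavefront exponent given in the post (6272K FFT).
EXPONENT = 115_080_019

def days_per_test(ms_per_iter: float) -> float:
    """An LL test runs roughly one squaring per bit of the exponent."""
    return EXPONENT * ms_per_iter / 1000 / 86400

# Faster end of each quoted range.
speeds = {
    "Tesla V100": 1.14,
    "Tesla P100": 1.74,
    "Tesla K80": 6.66,
    "Tesla T4": 7.95,
    "Tesla P4": 10.24,
}

for gpu, ms in speeds.items():
    print(f"{gpu}: ~{days_per_test(ms):.1f} days")
```

For the P100 this lands around 2.3 days, consistent with the "2-3 days" figure quoted at the top of the thread.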

[QUOTE=kriesel;572432]And the loss of error checking; not even the relatively weaker Jacobi symbol check in CUDALucas, unless you've added it in your builds. The higher the exponent, the longer the run, and the less likely a run will complete correctly without GEC.[/QUOTE]

All the Tesla GPUs on Colab have ECC memory enabled, so Jacobi and Gerbicz error checking is not needed. You can see this from the [C]ECC Support?[/C] line near the top of the CUDALucas output. Adding Jacobi error checking to CUDALucas is listed in [URL="https://github.com/tdulcet/Distributed-Computing-Scripts#contributing"]the Contributing section[/URL] of the main README, but it would have no effect on Colab.

[QUOTE=kriesel;572508]Gpuowl v6.11-380 on GTX1060 ~5.806 ms/iter in 58.75M LL DC with Jacobi check[/QUOTE]

Note that the [URL="https://github.com/preda/gpuowl/releases"]latest version[/URL] of GpuOwl is v7.2, although it no longer supports LL tests or the Jacobi error check. Supporting GpuOwl would add a lot of complexity to our GPU notebook, as it would have to download and build both v6 and v7 to support LL DC and PRP tests respectively, and then someone would have to write a wrapper to run the correct version based on the next assignment in the worktodo file.
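For what it's worth, the version-selection part of such a wrapper need not be complicated. A minimal sketch, assuming the standard worktodo entry prefixes (Test=, DoubleCheck=, PRP=); the binary paths are hypothetical, for illustration only:

```python
# Hypothetical sketch: choose which gpuowl build to run based on the
# work type of the next recognized entry in a worktodo file.
from typing import Optional

# LL and LL DC entries need v6 (the last version with LL support); PRP can use v7.
# The binary names below are assumptions, not real paths.
BINARY_FOR_WORKTYPE = {
    "Test": "./gpuowl-v6",
    "DoubleCheck": "./gpuowl-v6",
    "PRP": "./gpuowl-v7",
}

def pick_binary(worktodo: str) -> Optional[str]:
    """Return the binary for the first recognized assignment, else None."""
    for line in worktodo.splitlines():
        worktype = line.split("=", 1)[0].strip()
        if worktype in BINARY_FOR_WORKTYPE:
            return BINARY_FOR_WORKTYPE[worktype]
    return None

print(pick_binary("PRP=AID,1,2,104729,-1,76,0"))  # ./gpuowl-v7
```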

[QUOTE=kriesel;572508]CUDALucas was a great program. Had a lot of fun with it. It has been surpassed and is not being actively maintained.[/QUOTE]

As Daniel said in [URL="https://www.mersenneforum.org/showpost.php?p=572263&postcount=17"]post #17[/URL], pull requests are welcome!

[QUOTE=danc2;572447]I would be curious if Teal has seen this when using the extension or not. The extension can check (clicks on the play button of the first cell) every 5 seconds IIRC (customizable by the user). With this setup, I've never seen the interpretation task.[/QUOTE]

Yeah, I am not sure whether the reason I have only seen this once is our extension. It dismisses all other popups, so it is possible that it just dismisses this one too, which would explain why Daniel and I never see it. I would need to see it again to know for sure, so that I can inspect it.

When our extension is set to automatically run the first cell of the notebook (disabled by default), it will check whether the cell is running every minute by default. This is configurable, but I would not recommend users set a value of less than one minute, to prevent Google from thinking they/we are [URL="https://en.wikipedia.org/wiki/Denial-of-service_attack"]DoSing[/URL] their servers.

Prime95 2021-02-25 16:59

[QUOTE=tdulcet;572514]All the Tesla GPUs on Colab have ECC memory enabled, so Jacobi and Gerbicz error checking is not needed.[/QUOTE]

There are other sources of hardware error than memory. Thus, Gerbicz error checking is still beneficial.

[quote]GpuOwl is v7.2, no longer supports any LL tests. This would add a lot of complexity to our GPU notebook, if it were to support GpuOwl, as it would have to download and build both v6 and v7 to support both LL DC and PRP tests respectively and then someone would have to write a wrapper to run the correct version based on the next assignment in the worktodo file.[/QUOTE]

The PrimeNet server will happily accept a PRP test with proof for LL-DC work, so you only need to download one gpuowl version.
Another gpuowl advantage is that it will run P-1 first if necessary, potentially saving a lengthy PRP test altogether.

Also, in prime95 you can cut the amount of disk space required in half. I'll bet gpuowl has a similar option.

PhilF 2021-02-25 19:04

[QUOTE=Prime95;572522]The PrimeNet server will happily accept a PRP test with proof for LL-DC work.[/QUOTE]

I didn't know that! So, would one just manually reserve a LL-DC exponent, PRP test it, and then manually submit the result?



Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.