mersenneforum.org  

Old 2019-11-07, 07:44   #518
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by axn View Post
If you see any 5x timings, then kill the session and reconnect. Hopefully you'll get a better one.
Also, for the FMA3 runs, you might get better timings by enabling Hyperthreaded LL.
I don't see the 5x timings, until the mprime log updates, which is at the end of the run. Mprime is run in background, along with a gpu application, so there is no web browser indication of timing during a run. See the third code block of https://www.mersenneforum.org/showpo...73&postcount=8
Also I am reluctant to give up a session early since it has become unreliable to get one at all. Will try "CpuNumHyperthreads=2" in mprime local.txt next time around.
It would be good if there were a way to determine in the Colab script which GPU model is available, and branch on that to run TF, PRP, LL, or P-1 as appropriate. Does anyone have a code fragment that will do that?
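
One possible fragment, as a minimal sketch: ask `nvidia-smi` for the GPU name and map it to a work type. The names `detect_gpu_name`, `choose_work`, and the `WORK_BY_GPU` table are hypothetical, and which work type suits which GPU is an assumption to adjust, not a recommendation from this thread.

```python
import subprocess

# Hypothetical mapping from a substring of the GPU name to a work type.
# These pairings are illustrative assumptions only.
WORK_BY_GPU = {
    "T4": "tf",
    "P100": "prp",
    "K80": "p-1",
}

def detect_gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if no GPU is usable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            universal_newlines=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # nvidia-smi missing or hung
    if result.returncode != 0:
        return None  # e.g. driver not reachable, as in a CPU-only session
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None

def choose_work(gpu_name):
    """Pick a work type from the GPU name; fall back to CPU-only work without a GPU."""
    if gpu_name is None:
        return "cpu-only"
    for key, work in WORK_BY_GPU.items():
        if key in gpu_name:
            return work
    return "tf"  # assumed default for an unrecognised GPU

if __name__ == "__main__":
    print(choose_work(detect_gpu_name()))
```

In a notebook, the returned string could drive an `if`/`elif` chain that launches the matching program, or skips the GPU application entirely when no GPU was allocated.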

Last fiddled with by kriesel on 2019-11-07 at 07:46
Old 2019-11-07, 10:31   #519
axn
 
Jun 2003

13DD₁₆ Posts

Quote:
Originally Posted by kriesel View Post
I don't see the 5x timings, until the mprime log updates, which is at the end of the run. Mprime is run in background, along with a gpu application, so there is no web browser indication of timing during a run. See the third code block of https://www.mersenneforum.org/showpo...73&postcount=8
Ok. That complicates things. But, you can just run mprime for a while (about 5-10 minutes) in the foreground to observe the iteration value and then either relaunch it in the background if numbers are satisfactory or launch a new instance, can't you?

Quote:
Originally Posted by kriesel View Post
Will try "CpuNumHyperthreads=2" in mprime local.txt next time around.
No. The correct flag is HyperthreadLL=1
Old 2019-11-07, 14:16   #520
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by axn View Post
Ok. That complicates things. But, you can just run mprime for a while (about 5-10 minutes) in the foreground to observe the iteration value and then either relaunch it in the background if numbers are satisfactory or launch a new instance, can't you?

No. The correct flag is HyperthreadLL=1
I could initially run in foreground as you suggest, although that increases personal overhead and I'm trying to move toward more automation, not more manual involvement. A test built into the script that detects which CPU type was allocated and branches on that basis would be better. (Anyone have suggestions for that? Maybe for now I'll just throw !lscpu in at the front and see what I'm getting.) Attempting to launch a new instance may give a different CPU type, or fail to obtain a GPU, or fail to obtain any VM at all. With two different accounts, from different hosts and web browsers, I'm currently getting about a 50% duty cycle on each.
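
A minimal sketch of such a CPU test, assuming the feature flags in `/proc/cpuinfo` (or in `lscpu` output) are enough to tell the allocated CPU types apart. The function names and the three class labels are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo or lscpu style text."""
    for line in cpuinfo_text.splitlines():
        # /proc/cpuinfo uses "flags", lscpu uses "Flags:"
        if line.lower().startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()

def classify_cpu(flags):
    """Classify by widest vector extension present: 'avx512', 'fma3', or 'other'."""
    if "avx512f" in flags:
        return "avx512"
    if "fma" in flags:  # Linux reports FMA3 support as "fma"
        return "fma3"
    return "other"

def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text, returning an empty string if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""

if __name__ == "__main__":
    print(classify_cpu(cpu_flags(read_cpuinfo())))
```

The script could then branch on the returned label, for example to decide whether to keep the session or which settings to write before launching mprime.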

Re the mprime flag you gave, I don't see it in any of the prime95 V29.8b6 documentation. Where did you find that?

From prime95 V29.8b6 undoc.txt:
Code:
The program automatically computes the number of CPUs, hyperthreading, and speed.
This information is used to calculate how much work to get.
If the program did not correctly figure out your CPU information,
you can override the info in local.txt:
    NumCPUs=n
    CpuNumHyperthreads=1 or 2
    CpuSpeed=s
Where n is the number of physical CPUs or cores, not logical CPUs created by
hyperthreading. Choose 1 for non-hyperthreaded and 2 for hyperthreaded. Finally,
s is the speed in MHz.
From readme.txt:
The string "hyperthread" was not present.

From Whatsnew.txt:
Also no mention of any hyperthread-related configuration file entry syntax.

Google Colaboratory is a pretty (sneaky|clever) way of getting us to learn linux and Python, one little bit at a time.
Old 2019-11-07, 14:27   #521
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Here's a first attempt. It terminated immediately because no GPU was available, yet I had still used the dual-task script.
Code:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) CPU @ 2.00GHz
Stepping:            3
CPU MHz:             2000.176
BogoMIPS:            4000.35
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            39424K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=...

Enter your authorization code:
··········
Mounted at /content/drive
/content/drive/My Drive
/content/drive/My Drive/mprime
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

2019-11-07 14:20:55 gpuowl 
2019-11-07 14:20:55 Note: no config.txt file found
2019-11-07 14:20:56 config: -use ORIG_X2 -log 120000 -maxAlloc 10240 -user kriesel -cpu colab/K80 
2019-11-07 14:20:56 411000059 FFT 24576K: Width 256x4, Height 256x4, Middle 12; 16.33 bits/word
2019-11-07 14:20:56 Exception gpu_error:  clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:64 getDeviceIDs
 2019-11-07 14:20:56 Bye
But apparently the mprime instance is still running, because a second try with
Code:
# Notebook to resume a run of mprime on a Colab session
!lscpu  # show which CPU type was allocated
import os.path
from google.colab import drive
# Mount Google Drive only if it is not already mounted
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive/'
!chmod +w '/content/drive/My Drive'
%cd '/content/drive/My Drive/mprime/'
!chmod +x ./mprime
!echo run ./mprime
# Run mprime in the foreground and append a copy of its output to the log
!./mprime -d | tee -a ./mprimelog.txt
says so:
Code:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) CPU @ 2.00GHz
Stepping:            3
CPU MHz:             2000.176
BogoMIPS:            4000.35
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            39424K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities
/content/drive/My Drive
/content/drive/My Drive/mprime
run ./mprime
[Main thread Nov 7 14:30] Mersenne number primality test program version 29.8
[Main thread Nov 7 14:30] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 1 MB, L3 cache size: 39424 KB
Another mprime is already running!
Running !top -d 30 in a separate Colab section confirms it.
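
The same check can be made from the script before launching, as a rough sketch that scans `/proc` for a process with a given command name. The helper name `is_running` is hypothetical, and this assumes a Linux VM where `/proc/<pid>/comm` is readable.

```python
import os

def is_running(name, proc_root="/proc"):
    """Return True if a process whose command name equals `name` is listed under proc_root."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False  # no /proc available
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                # comm holds the executable name, truncated to 15 characters
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited while scanning
    return False

if __name__ == "__main__":
    print("mprime already running:", is_running("mprime"))
```

A notebook cell could skip the launch step when this returns True, instead of relying on mprime's own "already running" message.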

Last fiddled with by kriesel on 2019-11-07 at 14:38
Old 2019-11-07, 14:49   #522
storm5510
Random Account
 
Aug 2009

13×151 Posts

Quote:
Originally Posted by kriesel View Post
...Google Colaboratory is a pretty (sneaky|clever) way of getting us to learn linux and Python, one little bit at a time.
I looked at some Python code yesterday on a different site. It was amazing, to me, how much I could understand having not seen any before.
Old 2019-11-07, 15:17   #523
axn
 
Jun 2003

3²·5·113 Posts

Quote:
Originally Posted by kriesel View Post
Re the mprime flag you gave, I don't see it in any of the prime95 V29.8b6 documentation. Where did you find that?
Used P95 on a windows laptop with HT. Checked the box that says "Use HT for LL test". Looked in local.txt for what changed.

CpuNumHyperthreads is needed if the program doesn't correctly detect presence of HT. It does in kaggle/colab. So that setting doesn't do anything.
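
For scripted use, the entry can be written into local.txt before mprime starts. This is a sketch only: the option name `HyperthreadLL` is taken from the post above and is not confirmed by the V29.8b6 documentation quoted earlier in the thread, the helper `set_option` is hypothetical, and it assumes global options belong before the first `[section]` header.

```python
def set_option(text, key, value):
    """Return config text with key=value set among the global options."""
    lines = text.splitlines()
    # Assumption: global options sit before the first "[section]" header.
    end = next((i for i, l in enumerate(lines) if l.lstrip().startswith("[")),
               len(lines))
    new_line = "{}={}".format(key, value)
    for i in range(end):
        if lines[i].split("=", 1)[0].strip() == key:
            lines[i] = new_line  # replace the existing entry
            break
    else:
        lines.insert(end, new_line)  # no entry yet: add one
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(set_option("NumCPUs=1\n", "HyperthreadLL", 1), end="")
```

In use, the script would read local.txt, pass its contents through `set_option`, and write the result back while mprime is not running.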
Old 2019-11-07, 16:16   #524
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2×67×73 Posts

Quote:
Originally Posted by kriesel View Post
Anyone have suggestions for that? Maybe for now I'll just throw !lscpu in at the front and see what I'm getting.
A quick idea might be to launch mprime with the "-d" option, and either fork it and read its STDOUT from a pipe, or else redirect its STDOUT to a file which is then "tail -f"'ed.

How you'd automatically deal with a restart request is an exercise left to the reader...
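
The pipe variant might look like the following sketch. It starts a command, reads its output line by line, and returns the first `ms/iter` value it sees, matching the progress-line format shown in this thread. The function names and the idea of comparing against a threshold are assumptions for illustration.

```python
import re
import subprocess

# Matches the timing field in lines such as
# "... [34.74%], ms/iter: 24.861, ETA: 16d 08:28"
MS_PER_ITER = re.compile(r"ms/iter:\s*([0-9.]+)")

def parse_ms_per_iter(line):
    """Return the ms/iter value in a progress line, or None if the line has none."""
    m = MS_PER_ITER.search(line)
    return float(m.group(1)) if m else None

def first_timing(cmd, cwd=None):
    """Start cmd and return (process, first ms/iter value seen on its output)."""
    proc = subprocess.Popen(cmd, cwd=cwd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True, bufsize=1)
    for line in proc.stdout:
        ms = parse_ms_per_iter(line)
        if ms is not None:
            return proc, ms
    return proc, None  # process ended without reporting a timing
```

A caller could run `first_timing(["./mprime", "-d"], cwd=...)`, compare the value with whatever it considers a slow session, and then call `proc.terminate()`. If the process is kept instead, something must continue reading `proc.stdout`, otherwise the pipe fills and mprime blocks; terminating and relaunching with output redirected to a file avoids that.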
Old 2019-11-07, 18:44   #525
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100111101₂ Posts

Quote:
Originally Posted by chalsall View Post
A quick idea might be to launch mprime with the "-d" option, and either fork it and read its STDOUT from a pipe, or else redirect its STDOUT to a file which is then "tail -f"'ed.

How you'd automatically deal with a restart request is an exercise left to the reader...
Such as by tee -a?
Code:
# Notebook to resume a run of mprime on a Colab session
!lscpu  # show which CPU type was allocated
import os.path
from google.colab import drive
# Mount Google Drive only if it is not already mounted
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive/'
!chmod +w '/content/drive/My Drive'
%cd '/content/drive/My Drive/mprime/'
!chmod +x ./mprime
!echo run ./mprime
# Run mprime in the foreground and append a copy of its output to the log
!./mprime -d | tee -a ./mprimelog.txt
The destination does not get updated on the fly; only when the mprime session terminates. Perhaps because it's on a Google Drive. Colab console says, if mprime is run in foreground instead of background:
Code:
/content/drive/My Drive
/content/drive/My Drive/mprime
run ./mprime
[Main thread Nov 7 18:25] Mersenne number primality test program version 29.8
[Main thread Nov 7 18:25] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 1 MB, L3 cache size: 39424 KB
[Main thread Nov 7 18:25] Starting worker.
[Comm thread Nov 7 18:25] Sending interim residue 30000000 for M87092557
[Work thread Nov 7 18:25] Worker starting
[Comm thread Nov 7 18:25] Done communicating with server.
[Work thread Nov 7 18:25] Resuming Gerbicz error-checking PRP test of M87092557 using AVX-512 FFT length 4608K, Pass1=192, Pass2=24K, clm=4
[Work thread Nov 7 18:25] Iteration: 30257107 / 87092557 [34.74%].
[Work thread Nov 7 18:26] Iteration: 30260000 / 87092557 [34.74%], ms/iter: 24.861, ETA: 16d 08:28
[Work thread Nov 7 18:30] Iteration: 30270000 / 87092557 [34.75%], ms/iter: 24.566, ETA: 16d 03:45
But the end of mprimelog.txt is still:
Code:
[Work thread Nov 7 18:15] Iteration: 30240000 / 87092557 [34.72%], ms/iter: 24.493, ETA: 16d 02:48
[Work thread Nov 7 18:19] Iteration: 30250000 / 87092557 [34.73%], ms/iter: 24.526, ETA: 16d 03:15
[Main thread Nov 7 18:22] Stopping all worker threads.
[Work thread Nov 7 18:22] Stopping PRP test of M87092557 at iteration 30257106 [34.74%]
[Work thread Nov 7 18:22] Worker stopped.
[Main thread Nov 7 18:22] Execution halted.
[Main thread Nov 7 18:22] Choose Test/Continue to restart.
(as displayed by double-clicking the file in the Google Drive tab. The Colab Files menu shows a more current version, but its narrow file display is hard to work with.)
Old 2019-11-07, 19:07   #526
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2636₁₆ Posts

Quote:
Originally Posted by kriesel View Post
Such as by tee -a? The destination does not get updated on the fly; only when the mprime session terminates.
1. Yup. Tee works too.

1.1. You guys are going to /love/ Linux. You really are!

2. Really? Hmmm...

2.1. I have no time to run experiments myself.

2.2. I haven't attached an instance to a Drive since we all started experimenting with this -- way back at the beginning of September.

2.2.1. How time flies when you're having fun!!!

3. Try running some tests with the filesystem entirely within the nominal Colab File System (FS).

3.1. It should be as simple as copying files over from a Drive into a directory ("/root/" is fine)...

3.2. And then "throwing" mprime "into the background" by way of the Python shell, and then tail / tail -f logs, etc.
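
Steps 3 through 3.2 could be sketched roughly as below. The Drive path comes from the scripts earlier in the thread and `/root/mprime` follows the suggestion above; the function names are hypothetical, and whether the log then updates promptly is exactly what the experiment would need to show.

```python
import os
import shutil
import subprocess
from collections import deque

def copy_to_local(src="/content/drive/My Drive/mprime", dst="/root/mprime"):
    """Copy the mprime directory from Drive to the VM's local disk (dst must not exist yet)."""
    shutil.copytree(src, dst)
    return dst

def start_in_background(workdir, log_name="mprimelog.txt"):
    """Launch ./mprime -d without blocking the cell, appending output to a local log."""
    os.chmod(os.path.join(workdir, "mprime"), 0o755)  # make sure it is executable
    log = open(os.path.join(workdir, log_name), "a")
    return subprocess.Popen(["./mprime", "-d"], cwd=workdir,
                            stdout=log, stderr=subprocess.STDOUT)

def tail(path, n=5):
    """Return the last n lines of a text file, like `tail -n`."""
    with open(path) as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

Typical use would be `proc = start_in_background(copy_to_local())` in one cell and `tail("/root/mprime/mprimelog.txt")` in another. Anything written under `/root` disappears when the VM is reclaimed, so save files and results would still need to be copied back to the Drive periodically.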

HTH. YMMV.
Old 2019-11-08, 07:06   #527
bayanne
 
"Tony Gott"
Aug 2002
Yell, Shetland, UK

2²×83 Posts

An exponent that had been allocated to me, 97930517, has been completed by someone else as well, and their result has been accepted. No problem to me, except that this result has not been cleared from the results.txt file, wherever that may be held. Thus it keeps appearing in the results for my instance name.

Where is that file, and can I clear this entry from it?
Old 2019-11-08, 14:46   #528
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2·7·19·37 Posts

Ok folks, with regard to sessions being a little persistent, I found out something else. If you have a phone that is linked to a Google account, you can 'pick up' a session that is live on a desktop with that same account, and vice versa. This morning I had my work phone at home with a P100 session going. I got into work, fired up my desktop browser, headed to Colab, and fired up the session. The same session was running. Then I closed the browser on the phone. It was a successful hand-off. Just a small way of getting the most out of this.


Has anyone who has been working on using the CPUs tried to set it up under one of the GPU72 sponsored sessions? If you have and it works, then getting Chris to include that in the deployment could get us some nice P-1 or DC work too.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.