mersenneforum.org Colab question

2020-07-29, 22:04   #166
Aramis Wyler

"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA

2×3×5×13 Posts

Yeah, that was not a bug. Though it did stop a few minutes later at 24 hours and 16 minutes. I hit the reconnect and play buttons, and it restarted.

To be clear, I have 4 sessions running at once. Two got about 15 minutes past 24 hours, and I had to restart them. The other 2 are at the 20 hour mark.

Last fiddled with by Aramis Wyler on 2020-07-29 at 22:07

2020-08-01, 17:31   #167
LOBES

Mar 2019
USA

2·3² Posts

Quote:
 Originally Posted by Aramis Wyler Yeah, that was not a bug. Though it did stop a few minutes later at 24 hours and 16 minutes. I hit the reconnect and play buttons, and it restarted. To be clear, I have 4 sessions running at once. Two got about 15 minutes past 24 hours, and I had to restart them. The other 2 are at the 20 hour mark.
Do I need to run each Colab window using a different Google Account? I have two Notebook Access Keys, but when I have them running at the same time on different computers, they seem to do the same work and step on each other's toes.

So should I just create a new Google account, and have the different Access Keys I have on GPU72 run on each?

2020-08-01, 17:44   #168
chalsall
If I May

"Chris Halsall"
Sep 2002

2×4,657 Posts

Quote:
 Originally Posted by LOBES So should I just create a new Google account, and have the different Access Keys I have on GPU72 run on each?
Definitely.

Each Google ***Account*** is given the compute allotment. So if you have two (or more) Notebooks "attached" to a backend, they'll be sharing the resources (even the FS). Different AKeys just means they're stepping on each other.

I've found that having four (4#) accounts per web browser (each in a different tab) is just about optimal for managing the runs. You can use the same AKey for all of them if you want, but I think it's better to have a different one for each, so you know what instance is running what where.

2020-08-01, 18:12   #169
LOBES

Mar 2019
USA

10010₂ Posts

Quote:
 Originally Posted by chalsall Definitely. Each Google ***Account*** is given the compute allotment. So if you have two (or more) Notebooks "attached" to a backend, they'll be sharing the resources (even the FS). Different AKeys just means they're stepping on each other. I've found that having four (4#) accounts per web browser (each in a different tab) is just about optimal for managing the runs. You can use the same AKey for all of them if you want, but I think it's better to have a different one for each, so you know what instance is running what where.
Got it, thanks. I actually had an old Google account that I hadn't used in a long time. I just resurrected it and it's churning away on Colab now. I have just left the two settings as "Let GPU72 Decide".

2020-08-01, 18:49   #170
Aramis Wyler

"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA

186₁₆ Posts

Near as I could tell for the few minutes I was running the free Colab, you got 2 sessions per Google account. When I upgraded to Pro it became 4. I use one account, 4 sessions, 1 session/tab. I have 2 notebook keys, but if I had it to do over again I'd only use 1, because I can eyeball all 4 tabs at a go and they're either all on or all off. Maybe a notebook key per Google account would be the way to go.

I noticed today that you can open a local session if you have a Jupyter server on your PC. I think the likelihood is low, but I can't help but wonder if it wouldn't be faster/easier to run a notebook against my own PC vs MISFIT and mfaktc.

Last fiddled with by Aramis Wyler on 2020-08-01 at 18:50

2020-08-11, 01:25   #171
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

10008₁₀ Posts

Quote:
 Originally Posted by chalsall What!?!?! Is that line correct, or do I have a bug in the payload code? Have you really had a session run 24 hours and 14 minutes? I've *never* seen anything over 12, and that was last year!
I run two paid accounts in alternation, with four Notebooks in each. Usually, I calculate based on an 18 hour run. If I'm going to be asleep when a group times out, I switch to the other before retiring. When I forget to check until later, I have seen runs of 20-23 hours. I currently have 4 just short of 22 hours. I still occasionally get the No GPU message if I confuse which account I am in (forgot to switch after stopping). Going to the other account fixes this. I suppose this means I might be allowed to overlap by an hour or two. I am usually only signed into one account, but I don't think I have to be.
EDIT:
OMFG! I have 8 notebooks running ATM. I expect 4 of them to time out pretty soon, but we'll see.

Last fiddled with by kladner on 2020-08-11 at 01:31

2020-08-15, 03:43   #172
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2³·3²·139 Posts

Now for my next trick, I have the standard 4 P100s running on a paid account, and snagged 2 T4s on a free account. Duration is uncertain on the freebie, but 10 hours is about the limit I've run into. One does have to keep track of accounts to use them in sequence to get the longest runs. On the paid accounts, I have seen close to 24 hours if I forget and let them run till they quit.

I sure wish they would let T4s onto paid accounts. They are 50% faster than P100s, and they use about 28% of the power. I guess they have a lot more P100s, which still kick ass compared to K80s and the like. The 4 that I am using right now also are using about a kilowatt between them.

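For anyone who wants the per-watt comparison spelled out, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in the post above (50% faster, about 28% of the power, about a kilowatt across four P100s). The variable names are made up for illustration, and the results are only as good as those eyeballed numbers.

```python
# Relative figures quoted in the post (T4 vs. P100 for this workload).
t4_speed_vs_p100 = 1.50   # "50% faster"
t4_power_vs_p100 = 0.28   # "about 28% of the power"

# Work done per unit of energy, relative to a P100.
t4_work_per_watt = t4_speed_vs_p100 / t4_power_vs_p100

# "About a kilowatt" across four P100s implies roughly this much per card.
p100_watts_each = 1000 / 4
t4_watts_each = p100_watts_each * t4_power_vs_p100

print(f"T4 work per watt vs. P100: {t4_work_per_watt:.1f}x")
print(f"P100: ~{p100_watts_each:.0f} W each, T4: ~{t4_watts_each:.0f} W each")
```

On those numbers a T4 does a bit over five times the work per watt of a P100.
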
2020-08-15, 11:01   #173
ATH
Einyen

Dec 2003
Denmark

2²×3×5×7² Posts

You should use the P100 for PRP testing with gpuowl; that is what I do. About 21.5 hours for a 92M exponent.

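As a sanity check on that figure, the arithmetic below converts it to a per-iteration time. It assumes a PRP test takes roughly one squaring iteration per bit of the exponent, and it treats "92M" as exactly 92,000,000. Both are simplifications, so read the result as a ballpark.

```python
# Figures from the post: about 21.5 hours for a 92M exponent on a P100.
exponent = 92_000_000   # assumed: "92M" taken as a round number
hours = 21.5

# Assumption: roughly one squaring iteration per bit of the exponent.
iterations = exponent

ms_per_iter = hours * 3600 * 1000 / iterations
tests_per_day = 24 / hours

print(f"~{ms_per_iter:.2f} ms per iteration")
print(f"~{tests_per_day:.2f} tests per 24 hours of uninterrupted runtime")
```
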
2020-08-15, 17:22   #174
Chuck

May 2011
Orange Park, FL

5×173 Posts

Quote:
 Originally Posted by kladner I sure wish they would let T4s onto paid accounts. They are 50% faster than P100s, and they use about 28% of the power. I guess they have a lot more P100s, which still kick ass compared to K80s and the like. The 4 that I am using right now also are using about a kilowatt between them.
How can you find out how much power they are using?
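
One common way to check this from inside a session is to ask `nvidia-smi` for the `power.draw` and `power.limit` fields. The Python sketch below wraps that query. The helper names (`parse_power_csv`, `read_gpu_power`) are made up for illustration, the sample line is invented data in the expected shape rather than a real reading, and this is not necessarily how the figures earlier in the thread were obtained. Some GPUs or drivers report power as not available, in which case the value comes back as `None`.

```python
import subprocess

QUERY = "name,power.draw,power.limit"


def _to_watts(field):
    """Convert a numeric field to float; non-numeric values such as [N/A] become None."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_power_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        name, draw, limit = [field.strip() for field in line.split(",")]
        rows.append({"name": name, "draw_w": _to_watts(draw), "limit_w": _to_watts(limit)})
    return rows


def read_gpu_power():
    """Run nvidia-smi on the current machine (needs an NVIDIA driver installed)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_power_csv(result.stdout)


if __name__ == "__main__":
    # Invented sample line, so the parser can be tried without a GPU attached.
    sample = "Tesla T4, 68.50, 70.00\n"
    print(parse_power_csv(sample))
```

In a Colab cell with a GPU attached, `read_gpu_power()` returns one entry per visible GPU.
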

2020-08-15, 19:26   #175
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

9314₁₀ Posts

AMD now on Colab...

I don't have the time to drill down on the telemetry logs to determine when this started, but...

Today I noticed that one of my instances was running an AMD EPYC 7B12. For anyone interested:

Code:
root@Colab_MAH:~# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7B12
stepping : 0
microcode : 0x1000065
cpu MHz : 2250.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save umip rdpid
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4500.00
TLB size : 3072 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7B12
stepping : 0
microcode : 0x1000065
cpu MHz : 2250.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save umip rdpid
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4500.00
TLB size : 3072 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

mprime doing P-1 work:

Code:
PFactor=N/A,1,2,106896703,-1,77,2
20200815_180354: [Work thread Aug 15 18:03] M106896703 stage 1 is 79.10% complete. Time: 372.068 sec.
20200815_181010: [Work thread Aug 15 18:10] M106896703 stage 1 is 79.92% complete. Time: 376.123 sec.
20200815_181627: [Work thread Aug 15 18:16] M106896703 stage 1 is 80.74% complete. Time: 376.287 sec.
20200815_182239: [Work thread Aug 15 18:22] M106896703 stage 1 is 81.56% complete. Time: 372.221 sec.
20200815_182850: [Work thread Aug 15 18:28] M106896703 stage 1 is 82.38% complete. Time: 370.941 sec.
20200815_183505: [Work thread Aug 15 18:35] M106896703 stage 1 is 83.20% complete. Time: 375.457 sec.
20200815_184122: [Work thread Aug 15 18:41] M106896703 stage 1 is 84.02% complete. Time: 376.390 sec.
20200815_184732: [Work thread Aug 15 18:47] M106896703 stage 1 is 84.84% complete. Time: 370.292 sec.
20200815_185347: [Work thread Aug 15 18:53] M106896703 stage 1 is 85.66% complete. Time: 375.166 sec.

Last fiddled with by chalsall on 2020-08-15 at 19:29 Reason: Added mprime logs, for those interested.

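For those who want to turn log lines like these into a rough time-to-finish, here is a small Python sketch. It parses the last three stage 1 lines from the log above and extrapolates linearly. The function name `stage1_eta_seconds` is made up for illustration. It assumes each "Time:" value is the wall time since the previous progress line (which matches the timestamps) and that the rate stays constant, and it says nothing about stage 2.

```python
import re

# Last three stage 1 progress lines, copied from the mprime log above.
LOG = """\
20200815_184122: [Work thread Aug 15 18:41] M106896703 stage 1 is 84.02% complete. Time: 376.390 sec.
20200815_184732: [Work thread Aug 15 18:47] M106896703 stage 1 is 84.84% complete. Time: 370.292 sec.
20200815_185347: [Work thread Aug 15 18:53] M106896703 stage 1 is 85.66% complete. Time: 375.166 sec.
"""

LINE_RE = re.compile(r"stage 1 is ([\d.]+)% complete\. Time: ([\d.]+) sec")


def stage1_eta_seconds(log_text):
    """Estimate seconds left in stage 1 by linear extrapolation from the log."""
    samples = [(float(pct), float(sec)) for pct, sec in LINE_RE.findall(log_text)]
    if len(samples) < 2:
        raise ValueError("need at least two progress lines")
    first_pct = samples[0][0]
    last_pct = samples[-1][0]
    # Each line's time covers the interval ending at that line, so skip the first.
    elapsed = sum(sec for _, sec in samples[1:])
    sec_per_pct = elapsed / (last_pct - first_pct)
    return (100.0 - last_pct) * sec_per_pct


if __name__ == "__main__":
    eta = stage1_eta_seconds(LOG)
    print(f"stage 1 ETA: about {eta / 3600:.1f} hours")
```

On these three lines the estimate comes out at a little under two hours of stage 1 remaining.
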
2020-08-17, 11:33   #176
Aramis Wyler

"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA

2·3·5·13 Posts

I don't know if something is wrong with the Colab hookup, but I've been doing trial factoring for 11 hours via Colab and only the P-1 work is registering as completed. All of my home trial factoring is registering just fine, but all of the Colab TF is sitting on the current assignments page at 100%, not on the completed assignments page.