mersenneforum.org  

Old 2019-04-12, 14:31   #2762
diep
 
 
Sep 2006
The Netherlands

1100100110₂ Posts

Quote:
Originally Posted by VBCurtis View Post
What you call absurd, everyone else calls normal- from the manufacturers to gamers to enthusiasts doing computation on CUDA cards. The tone of your posts is that you have the One True Way, and everyone else is a fool; that makes you look the fool for being so adamant.

I hope you're not this abrasively dogmatic in real life, too.
It is the standard because they have packed too much horsepower into a handful of square centimeters of air ventilation, but it is not normal if you look at it from a logical viewpoint.

It's just the way a PCIe card clicks into the computer, derived from standards that date back decades, to a time when plug-in cards didn't need much cooling. Nobody has thought of a better solution yet, because the design is kept backward compatible so that it can click into something like an '80s PC, provided you have a newer motherboard inside.

That still doesn't make it a good solution.

The machines that produce these CPUs and GPUs (in Nvidia's case that is TSMC, which produces them using ASML machines) have, for well over a decade or two, preferred just under room temperature as the ideal chip temperature.

If those stuck in the past compliment you, you're doing something wrong.
Old 2019-04-12, 15:23   #2763
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7352₁₀ Posts

Quote:
Originally Posted by GhettoChild View Post
I've made this complaint about previous versions before, and you've managed to code it back in again. It's quite an annoying error because it causes an immense loss of processing time and wastes money on utility bills.

The program doesn't know which interval is most efficient to restart from after a round-off error. It uses the last screen output iteration interval instead of whichever iteration interval is smaller, or at least just the checkpoint iteration interval. My settings check the round-off on each iteration. In the example I'm posting, both my checkpoint writes and my screen output are set to every 100,000 iterations in the ini file. I also pass the command line flag that sets the screen output to 100,000 iterations, but I manually increased this value to 500,000 iterations using the interactive keys while the program was running. Wasteful results below:

CUDALucas2.05.1-CUDA8.0-Windows-x64.exe
Code:
Y
 -- report_iter increased to 500000

|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Apr 10  16:07:21  |  M89951461  27500000  ******************  |  4860K  0.33594  72.8459 10362.19s  |  36:18:35:00  30.57%  |
|  Apr 10  22:58:20  |  M89951461  28000000  ******************  |  4860K  0.33008  49.3178 24658.93s  |  36:06:21:35  31.12%  |
|  Apr 11  05:49:13  |  M89951461  28500000  ******************  |  4860K  0.34375  49.3050 24652.49s  |  35:19:50:03  31.68%  |
|  Apr 11  12:37:13  |  M89951461  29000000  ******************  |  4860K  0.32422  48.9598 24479.94s  |  35:09:31:07  32.23%  |
|  Apr 11  19:29:07  |  M89951461  29500000  ******************  |  4860K  0.34375  49.4265 24713.27s  |  35:01:00:58  32.79%  |
Round off error at iteration = 29891700, err = 0.39063 > 0.35, fft = 4860K.
Restarting from last checkpoint to see if the error is repeatable.

Using threads: square 64, splice 256.

Continuing M89951461 @ iteration 29500001 with fft length 4860K, 32.80% done

Round off error at iteration = 29609100, err = 0.35156 > 0.35, fft = 4860K.
The error persists.
Trying a larger fft until the next checkpoint.

Using threads: square 64, splice 32.

Continuing M89951461 @ iteration 29500001 with fft length 5120K, 32.80% done

z
  -- fft count                      177
  -- current fft                    5120K
  -- smallest fft for this exponent 4860K
  -- largest fft for this exponent  6480K
  -- square threads                 64
  -- splice threads                 32
  -- checkpoint interval            100000
  -- report interval                500000
  -- error check interval           100
  -- error reset percent            85
  -- error limit                    40
  -- polite flag                    1
  -- polite value                   10
  -- sleep flag                     0
  -- sleep value                    100
  -- 64 bit carry flag              0
  -- save all checkpoints flag      0
  -- device number                  0
  -- savefile folder                savefiles
  -- ini file                       CUDALucas.ini
  -- input file                     worktodo.txt
  -- results file                   results.txt
I think that by increasing the screen output interval to 500K iterations interactively, you've told it to save checkpoints only every 500K iterations. It can't restart from a checkpoint file more recent than the latest one that exists. I suggest you set the screen update interval to 100K or 50K, to bring the maximum loss of iterations per resume down to around 1/2 to 1 hour.

Please update to v2.06 (May 5 2017 version). It contains checks for certain invalid interim residues, that v2.05.1 does not.
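
For scale, the cost of the restart shown in the quoted log can be estimated with a small Python sketch. It is not part of CUDALucas; the function names are made up for illustration, and it assumes a constant 49.4265 ms/iteration (the last rate printed before the error) and that a save file exists at every multiple of the chosen interval.

```python
def last_boundary(iteration: int, interval: int) -> int:
    """Most recent multiple of `interval` at or below `iteration`."""
    return (iteration // interval) * interval


def redo_seconds(error_iter: int, interval: int, ms_per_iter: float) -> float:
    """Seconds of work repeated when resuming from the last boundary."""
    lost_iterations = error_iter - last_boundary(error_iter, interval)
    return lost_iterations * ms_per_iter / 1000.0


if __name__ == "__main__":
    ERROR_ITER = 29_891_700   # from the "Round off error" line in the log
    MS_PER_ITER = 49.4265     # last ms/It value printed before the error (assumed constant)
    for interval in (500_000, 100_000, 50_000):
        hours = redo_seconds(ERROR_ITER, interval, MS_PER_ITER) / 3600
        print(f"interval {interval:>7}: resume at "
              f"{last_boundary(ERROR_ITER, interval)}, ~{hours:.2f} h repeated")
```

Under those assumptions, resuming from the 500,000 boundary repeats a little over 5 hours of work, while a 100,000 boundary would repeat a little over 1 hour.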
Old 2019-04-12, 15:50   #2764
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2³×919 Posts

Quote:
Originally Posted by diep View Post
Of course it's important to figure out what it is, as odds are high that it is something simple. Yet an important question is: is your card properly watercooled to room temperature?

For nearly 2 decades already, the chips produced have performed best when at or just under 19C. Deviating from that, so extreme cold or heat, say 40C+, will cause the chip to wear out; errors start to appear then. In short, it's clocked too high then.

You're probably running that GPU 24/24 or close to it, and it just wasn't designed to do that with the default air cooler it has.
Manufacturer's spec is 95C max for the Titan Black. https://www.geforce.com/hardware/des...specifications
Various other models range 87 to 105C max spec.

All my gpus run air cooled. Some have wattage comparable to the Titan Black's 250W rating. Ambient temp around all these systems is higher than the 19C you recommend for gpu running temp. To cool my fleet to such a temperature would be cost prohibitive. To try to put the chip temperature down to 19C might require circulating brine, and cause condensation problems in the systems! They run cooler in CUDALucas, ~80% of rated watts, than in Mfaktc ~100%. Most run 24/7/365. Some are thermally limiting. No gpu here is as cool as 40C, even at idle in a running system. Cumulative error rate is quite acceptable, comparable to prime95 rates. There was one GTX480 that developed bad memory errors and became reliably error prone. That was a factory-overclocked model bought used. It was replaced. This is a summary of gpu-decades of run time experience so far.

Last fiddled with by kriesel on 2019-04-12 at 16:06
Old 2019-04-12, 15:52   #2765
ATH
Einyen
 
 
Dec 2003
Denmark

6552₈ Posts

Quote:
Originally Posted by GhettoChild View Post
CUDALucas2.05.1-CUDA8.0-Windows-x64.exe
Do NOT use version 2.05; it has known errors that will sometimes give bad results. You HAVE to use 2.06 even though it is called "Beta". No one got around to removing that beta tag, but it is the only good version.
Old 2019-04-12, 15:59   #2766
ATH
Einyen
 
 
Dec 2003
Denmark

2×17×101 Posts

Quote:
Originally Posted by diep View Post
The machines that produce these CPUs and GPUs (in Nvidia's case that is TSMC, which produces them using ASML machines) have, for well over a decade or two, preferred just under room temperature as the ideal chip temperature.
If you ask everyone in GIMPS using a GPU, I bet 90%+ are running at 70C or more, and a good number of them are running at 80C or more.

Are you running a GPU with mfaktc or CUDALucas with water cooling? What temperature is it at? Can you back up your extreme claims?

I would love to run a GPU at room temperature, but it does not sound realistic. I have never tried water cooling myself either on CPU or GPU, so I'm hesitant to try it and definitely not on an old Titan Black, maybe in my next computer whenever that will be.

Last fiddled with by ATH on 2019-04-12 at 16:01
Old 2019-04-12, 17:10   #2767
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada

41 Posts

Quote:
Originally Posted by kriesel View Post
I think that by increasing the screen output interval to 500K iterations interactively, you've told it to save checkpoints only every 500K iterations. It can't restart from a checkpoint file more recent than the latest one that exists. I suggest you set the screen update interval to 100K or 50K, to bring the maximum loss of iterations per resume down to around 1/2 to 1 hour.

Please update to v2.06 (May 5 2017 version). It contains checks for certain invalid interim residues, that v2.05.1 does not.
I tested that theory; screen output has no effect on the checkpoint iteration or the "savepoint on exit". The cEXPONENT and tEXPONENT files still update according to the instructed values or events, independent of the screen output value. In other words, when I lowered the screen output to 50,000, the checkpoint on the hard drive did not update, but the restart point on round-off error changed to smaller intervals than even the actual checkpoint value. That resulted in less wasted reprocessing time but 10x more output lines printed on screen.

I have not tried version 2.06 yet because there is only a beta version available, no full release, and 2.05.1 is full release not beta.

Last fiddled with by GhettoChild on 2019-04-12 at 17:16
Old 2019-04-12, 17:31   #2768
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2³·919 Posts

Quote:
Originally Posted by GhettoChild View Post
I tested that theory; screen output has no effect on the checkpoint iteration or the "savepoint on exit". The cEXPONENT and tEXPONENT files still update according to the instructed values or events, independent of the screen output value. In other words, when I lowered the screen output to 50,000, the checkpoint on the hard drive did not update, but the restart point on round-off error changed to smaller intervals than even the actual checkpoint value. That resulted in less wasted reprocessing time but 10x more output lines printed on screen.

I have not tried version 2.06 yet because there is only a beta version available, no full release, and 2.05.1 is full release not beta.
So what checkpoint interval are you specifying in cudalucas.ini or at the command line or interactively?
From cudalucas.ini:
Code:
# ErrorIterations tells how often the roundoff error is checked. Larger values
# give shorter iteration times, but introduce some uncertainty as to the actual
# maximum roundoff error that occurs during the test. Default is 100.
# ReportIterations is the same as the -x option; it determines how often
# screen output is written. Default is 10000.
# CheckpointIterations is the same as the -c option; it determines how often
# checkpoints are written. Default is 100000.
# Each of these values should be of the form k * 10^n with k = 1, 2, or 5.

ErrorIterations=100
ReportIterations=10000
CheckpointIterations=100000
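
The "k * 10^n with k = 1, 2, or 5" rule in the comments above can be checked mechanically. A minimal Python sketch follows; it is not CUDALucas's own parser, the helper names are hypothetical, and it assumes the ini file consists only of `#` comments and `Key=Value` lines as in the excerpt.

```python
SAMPLE_INI = """\
# comment lines start with '#'
ErrorIterations=100
ReportIterations=10000
CheckpointIterations=100000
"""


def parse_ini(text: str) -> dict:
    """Collect Key=Value pairs, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            settings[key.strip()] = value.strip()
    return settings


def is_k_times_power_of_ten(value: int) -> bool:
    """True if value == k * 10**n with k in {1, 2, 5} and n >= 0."""
    if value <= 0:
        return False
    while value % 10 == 0:
        value //= 10          # strip trailing factors of ten
    return value in (1, 2, 5)


if __name__ == "__main__":
    for key, value in parse_ini(SAMPLE_INI).items():
        ok = is_k_times_power_of_ten(int(value))
        print(f"{key}={value}: {'ok' if ok else 'not of the form k * 10^n'}")
```

By this rule the 500000 and 50000 values mentioned in the thread are valid forms, while something like 30000 would not be.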
Old 2019-04-12, 18:34   #2769
ATH
Einyen
 
 
Dec 2003
Denmark

2×17×101 Posts

I sent a message to Dubslow, flashjh and owftheevil and asked them to remove CUDALucas 2.03 and 2.05 from the site and to remove "Beta" from 2.06:

https://sourceforge.net/projects/cudalucas/files/

Last fiddled with by ATH on 2019-04-12 at 18:39
Old 2019-04-12, 19:39   #2770
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2³×919 Posts

Quote:
Originally Posted by ATH View Post
I sent a message to Dubslow, flashjh and owftheevil and asked them to remove CUDALucas 2.03 and 2.05 from the site and to remove "Beta" from 2.06:

https://sourceforge.net/projects/cudalucas/files/
As I recall, flashjh in response to https://www.mersenneforum.org/showpo...postcount=2708 and the next couple messages had added it to his queue. Can't find the post ATM.
Old 2019-04-12, 20:55   #2771
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada

41₁₀ Posts

Quote:
Originally Posted by kriesel View Post
So what checkpoint interval are you specifying in cudalucas.ini or at the command line or interactively?
From cudalucas.ini:
Exactly what I posted: I have 100,000 set in the ini file for both screen output and checkpoint interval. When I run the command line I add "-x 100000", but I don't specify the -t flag at all since it's already in the ini file. After I'm satisfied with the program running, I use the "Y" input key to increase the screen output to every 500,000. Then I just wait throughout the day until an error occurs. It's the same if I lower the screen output to 50000 using "y". No matter what the value is, it uses the screen output interval as the last checkpoint instead of at least the checkpoint interval value, or whichever of the two is the smaller interval.
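
The restart rule asked for here can be written down in a few lines. This Python sketch only illustrates the policies being compared; it is not taken from the CUDALucas source, the function and policy names are invented, and it assumes a usable save file exists at whichever boundary is chosen, which is the open question in this thread.

```python
def restart_point(error_iter: int, checkpoint_interval: int,
                  report_interval: int, policy: str = "smaller") -> int:
    """Iteration boundary to resume from after a round-off error.

    policy "report"     -> last screen-output boundary (behavior described above)
    policy "checkpoint" -> last checkpoint-interval boundary
    policy "smaller"    -> last boundary of whichever interval is smaller
    """
    intervals = {
        "report": report_interval,
        "checkpoint": checkpoint_interval,
        "smaller": min(checkpoint_interval, report_interval),
    }
    if policy not in intervals:
        raise ValueError(f"unknown policy: {policy}")
    interval = intervals[policy]
    return (error_iter // interval) * interval


if __name__ == "__main__":
    # Numbers from the posted log; the log itself prints the boundary plus one.
    error_iter = 29_891_700
    for policy in ("report", "checkpoint", "smaller"):
        print(policy, restart_point(error_iter, 100_000, 500_000, policy))
```

With the posted settings (checkpoint 100,000, report 500,000), the "report" policy lands on 29,500,000, matching the log, while "smaller" would land on 29,800,000.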
Old 2019-04-13, 05:21   #2772
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada

29₁₆ Posts

I think I just now saw the tEXPONENT and cEXPONENT files update in sync with screen output instead of the checkpoint interval this time; but it's hard to be certain.

I'm going to try adding the -c command line flag with 100000 for the checkpoint value to see if this error occurs that way. After that I'll try setting screen output larger than checkpoint right in the ini file to begin with and use no command line flags/switches to see if the error also occurs.
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
