mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

TheJudger 2018-09-30 17:13

Wanted: Test run on a Turing card!
 
Hello,

is anyone with a Turing GPU (e.g. GeForce RTX 2080 or 2080 Ti) willing to run some tests? Honza can't right now; I've asked him already.

Oliver

TheJudger 2018-10-05 20:23

Initial Turing benchmarks (RTX 2080Ti)
 
Hello,

I finally got my hands on a Turing (RTX 20x0 series) card. Because of [URL="https://docs.nvidia.com/cuda/turing-tuning-guide/index.html#turing-tuning"]this[/URL] I was excited, and I was right: Turing is a beast for mfaktc.

Unmodified mfaktc 0.21 sources (just adjusted the Makefile) + CUDA 10.0.130 on Linux:
[CODE]# ./mfaktc.exe -tf 66362159 73 74
mfaktc v0.21 (64bit built)
[...]
CUDA device info
name [B][COLOR="Red"]GeForce RTX 2080 Ti[/COLOR][/B]
compute capability 7.5
max threads per block 1024
max shared memory per MP 65536 byte
number of multiprocessors 68
clock rate (CUDA cores) 1635MHz
memory clock rate: 7000MHz
memory bus width: 352 bit
[...]
got assignment: exp=66362159 bit_min=73 bit_max=74 (28.83 GHz-days)
Starting trial factoring M66362159 from 2^73 to 2^74 (28.83 GHz-days)
k_min = 71160531149400
k_max = 142321062305090
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Oct 05 22:12 | 0 0.1% | 0.630 10m04s | 4118.14 82485 n.a.%
Oct 05 22:12 | 4 0.2% | 0.563 8m59s | 4608.22 82485 n.a.%
Oct 05 22:12 | 9 0.3% | 0.562 8m58s | 4616.42 82485 n.a.%
[...]
Oct 05 22:21 | 4612 99.9% | 0.599 0m01s | 4331.27 82485 n.a.%
Oct 05 22:21 | 4617 100.0% | 0.600 0m00s | 4324.05 82485 n.a.%
no factor for [B][COLOR="red"]M66362159 from 2^73 to 2^74[/COLOR][/B] [mfaktc 0.21 barrett76_mul32_gs]
tf(): total time spent: [B][COLOR="red"]9m 30.800s[/COLOR][/B]

[/CODE]
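As a sanity check on the k_min/k_max lines in that log: they follow directly from the factor form of Mersenne numbers, q = 2kp+1. The sketch below is my reading, not mfaktc's actual source; in particular, the 4620-class alignment is an assumption about why the printed k_min sits a bit below the exact quotient.

```python
# Trial-factoring bounds for M66362159 between 2^73 and 2^74.
# Any factor q of a Mersenne number M(p) = 2^p - 1 has the form
# q = 2*k*p + 1, so a bit range for q translates into a k range.
p = 66362159
bit_min, bit_max = 73, 74

k_min = (1 << bit_min) // (2 * p)
k_max = (1 << bit_max) // (2 * p)

# mfaktc works the k range in 4620 residue classes (the log above counts
# classes up to 4617), so presumably it starts at a class-aligned k:
k_min_aligned = k_min - (k_min % 4620)

print(k_min_aligned, k_max)
```

This reproduces the logged values exactly: k_max = 142321062305090 is the plain quotient, and the class-aligned k_min = 71160531149400 is 3145 below the exact quotient 71160531152545.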

This is a Founders Edition card, starting cold. Power draw is ~260W on average, so it is limited by its power target. Temperature stays a bit below 80°C and the average clock is about 1680MHz once the card is "hot".

[U]New performance king[/U] but slightly behind the [URL="https://mersenneforum.org/showpost.php?p=490784&postcount=2819"]Tesla V100[/URL] in terms of energy efficiency.

Oliver

petrw1 2018-10-05 20:37

[QUOTE=TheJudger;497430]finally I was able to put my hands on a Turing (RTX 20x0 series) card. [...] Turing is a beast for mfaktc. [...]

[U]New performance king[/U] but slightly behind Tesla V100 in terms of energy efficiency.

Oliver[/QUOTE]

Wow... I can hardly wait. Mine is supposed to arrive Tuesday... hopefully up and running within a week. Good to hear it works without any mfaktc compatibility issues.

James Heinrich 2018-10-05 20:42

[url]https://www.mersenne.ca/mfaktc.php[/url] has been updated with the new benchmark.

ET_ 2018-10-05 20:50

What is the Street Price of this monster?

James Heinrich 2018-10-05 21:16

[QUOTE=ET_;497439]What is the Street Price of this monster?[/QUOTE]Seems to be hovering around US$1800 right now, which is getting close to double MSRP, due to high demand and low supply. Things should calm down in a little while when there's more supply.

chalsall 2018-10-05 21:54

[QUOTE=TheJudger;497430][U]New performance king[/U] but slightly behind [URL="https://mersenneforum.org/showpost.php?p=490784&postcount=2819"]Tesla V100[/URL] in terms of energy efficency.[/QUOTE]

Like, um, wow!!! :smile:

Despite the capex, the investment might make sense over the life of the kit based on the TDP.

petrw1 2018-10-05 22:11

[QUOTE=James Heinrich;497442]Seems to be hovering around US$1800 right now, which is getting close to double MSRP, due to high demand and low supply. Things should calm down in a little while when there's more supply.[/QUOTE]

I pre-ordered my 2080Ti right after the announcement for $1200US.

ET_ 2018-10-06 10:24

[QUOTE=petrw1;497448]I pre-ordered my 2080Ti right after the announcement for $1200US.[/QUOTE]

Still expensive, having a GTX 980 idle at home... :sad:

ATH 2018-10-06 15:12

[QUOTE=TheJudger;497147]Hello!


[LIST=1][*]Installed [URL="https://visualstudio.microsoft.com/de/downloads/"]Visual Studio 2017.8 "Community"[/URL][*]Installed [URL="https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64"]CUDA Toolkit 10 for Windows[/URL][*]Installed [URL="http://www.mingw.org/"]MinGW[/URL] as one of many options for [I]GNU Make[/I] on Windows. In the MinGW folder I've copied [I]bin/mingw32-make.exe[/I] to [I]bin/make.exe[/I] because I'm lazy. Careful when updating [I]mingw32-make.exe[/I]...[*]Configure the environment for the [I]"x64 Native Tools Command Prompt"[/I] - add MinGW/bin and CUDA/bin to the PATH variable.[/LIST]
Then just open the [I]"x64 Native Tools Command Prompt"[/I], change into the directory with the mfaktc source files and run[CODE]
make -f Makefile.win[/CODE]

I had to adjust some settings in Makefile.win:[CODE]
CUDA_DIR = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"

CC = cl
CFLAGS = /Ox /Oy /GL /W2 /fp:fast /I$(CUDA_DIR)\include /I$(CUDA_DIR)\include\cudart /nologo

NVCCFLAGS = --ptxas-options=-v
CUFLAGS = -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\Hostx86\x64" -x cu -I$(CUDA_DIR)\/include --machine 64 --compile -Xcompiler "/wd 4819" -DWIN64 -Xcompiler "/EHsc /W3 /nologo /O2 /FS" $(NVCCFLAGS)

# generate code for various compute capabilities
# NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
# NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_60,code=sm_60 # CC 6.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_70,code=sm_70 # CC 7.x GPUs will use this code
# NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 GPUs will use this code[/CODE]

Oliver[/QUOTE]


I compiled mfaktc this way without any errors, but when I run it, it does not work:
ERROR: cudaGetLastError() returned 8: invalid device function


Is CUDA10 too advanced for compute capability 3.5 (Titan Black)? I did add 3.5 in the Makefile and, as I said, there was no error:
NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc

There were some warnings:
[CODE]c:\msys64\home\ath\mfaktc-0.21\src\tf_common.cu(242): warning C4996: 'cudaThreadSynchronize': was declared deprecated
c:\cuda10\include\cuda_runtime_api.h(947): note: see declaration of 'cudaThreadSynchronize'
c:\msys64\home\ath\mfaktc-0.21\src\tf_common_gs.cu(169): warning C4996: 'cudaThreadSynchronize': was declared deprecated
c:\cuda10\include\cuda_runtime_api.h(947): note: see declaration of 'cudaThreadSynchronize'
[... the two warnings above repeat several times ...]
gpusieve.cu(1371): warning C4244: '=': conversion from '__int64' to 'uint32', possible loss of data
gpusieve.cu(1385): warning C4244: '=': conversion from '__int64' to 'uint32', possible loss of data
[... the same C4244 warning at gpusieve.cu lines 1400, 1416, 1450, 1466, 1506, 1522 and 1558 ...]
gpusieve.cu(1273): warning C4996: 'cudaThreadSetCacheConfig': was declared deprecated
c:\cuda10\include\cuda_runtime_api.h(1112): note: see declaration of 'cudaThreadSetCacheConfig'
gpusieve.cu(1599): warning C4996: 'cudaThreadSynchronize': was declared deprecated
c:\cuda10\include\cuda_runtime_api.h(947): note: see declaration of 'cudaThreadSynchronize'
[... the same C4996 warning at gpusieve.cu lines 1621 and 1645 ...]
[/CODE]

kriesel 2018-10-06 17:18

[QUOTE=ATH;497503]I compiled mfaktc this way without any errors but when running it, it does not work:
ERROR: cudaGetLastError() returned 8: invalid device function
...
Is CUDA10 too advanced for compute capability 3.5 (Titan Black) ?[/QUOTE]

OK, you compiled for CUDA10. Have you confirmed the installed NVIDIA driver supports CUDA10? Confirmed the CUDArt....dll is CUDA10 also? (Via file names, or separate tools?) My recollection is that "invalid device function" (error 8) shows up when there's a mismatch.

Have you tried compiling a smaller, simpler sample project? How did that go? One that prints the driver-supported CUDA version, the runtime-DLL-supported version, and the GPU model & CUDA CC level would be good. It could be created quickly from a copy of mfaktc by deleting most of it: keep just the part needed to produce the following output, and either accept a device number as input or spin through integers starting from zero until there's no GPU there:
[CODE]CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 9.10

CUDA device info
name GeForce GTX 480
compute capability 2.0
maximum threads per block 1024
number of multiprocessors 15 (480 shader cores)
clock rate 1451MHz[/CODE]I take [URL]https://docs.nvidia.com/cuda/pdf/CUDA_Compiler_Driver_NVCC.pdf[/URL] page 23 to mean CUDA10 toolkit supports CC 3.0 and up.

Also [URL]https://en.wikipedia.org/wiki/CUDA#GPUs_supported[/URL].

ATH 2018-10-06 19:19

Yes, I installed the drivers that came with CUDA10 and added the CUDA 10 dll files:
[CODE]CUDA version info
binary compiled for CUDA 10.0
CUDA runtime version 10.0
CUDA driver version 10.0[/CODE]

I also tried compiling with CUDA 9.2, which I still have installed, and copied the CUDA 9.2 dll files to the folder. Again it compiled without error, but gave the same error message when running:
[CODE]CUDA version info
binary compiled for CUDA 9.20
CUDA runtime version 9.20
CUDA driver version 10.0[/CODE]


I'll just wait for Oliver's CUDA10 Windows binaries and test whether those work on my card. Otherwise the old binaries I have are good enough; I just wanted to compile them myself if I could.

bayanne 2018-10-07 14:35

[QUOTE=TheJudger;497430]finally I was able to put my hands on a Turing (RTX 20x0 series) card. [...] Turing is a beast for mfaktc. [...][/QUOTE]

Wow, look at the GHz-d/day figures!

kriesel 2018-10-07 15:42

[QUOTE=James Heinrich;497442]Seems to be hovering around US$1800 right now, which is getting close to double MSRP, due to high demand and low supply. Things should calm down in a little while when there's more supply.[/QUOTE]
Today's eBay listings have the RTX 2080 Ti ranging from $1200-1400 in ongoing auctions, and $1500 and up (way up) for buy it now.

James Heinrich 2018-10-07 15:47

[QUOTE=kriesel;497573]Today's eBay listings have the RTX 2080 Ti ranging from $1200-1400 in ongoing auctions, and $1500 and up (way up) for buy it now.[/QUOTE]Buy It Now prices are useful, ongoing auction prices are mostly irrelevant. Most useful is the price of [URL="https://www.ebay.com/sch/i.html?_from=R40&_nkw=rtx+2080+ti&_sacat=0&rt=nc&LH_Sold=1&LH_Complete=1"]recently sold items[/URL], a version of which I use to update the price listing on [URL="http://www.mersenne.ca/mfaktc.php"]my site[/URL].

xx005fs 2018-10-07 19:33

[QUOTE=TheJudger;497430]finally I was able to put my hands on a Turing (RTX 20x0 series) card. [...] Turing is a beast for mfaktc. [...]

[U]New performance king[/U] but slightly behind Tesla V100 in terms of energy efficiency.[/QUOTE]

This is honestly really impressive performance. For $1200 you get about the same trial-factoring performance as a Titan V. Now I'm just hoping the next generation of cards brings an improvement in LL as incredible as this Pascal-to-Turing speed bump in trial factoring.

kriesel 2018-10-07 20:48

[QUOTE=James Heinrich;497574]Buy It Now prices are useful, ongoing auction prices are mostly irrelevant. Most useful is the price of [URL="https://www.ebay.com/sch/i.html?_from=R40&_nkw=rtx+2080+ti&_sacat=0&rt=nc&LH_Sold=1&LH_Complete=1"]recently sold items[/URL], a version of which I use to update the price listing on [URL="http://www.mersenne.ca/mfaktc.php"]my site[/URL].[/QUOTE]
As a possible shopper, I find ongoing auction prices and bid counts interesting, particularly on nearly expired auctions, in the sense that I might want to jump in on one that's appealingly priced below the lowest Buy It Now price of a reputable seller. Those that have 5 days to run yet, not so much.
Looking at the first page of recently sold items (thanks for the link), I saw a range of $1309-1925; auctions seemed to dominate the low end, Buy It Now the high end. (Note that $1309 was a preorder and wait, not actual product in hand.)

kladner 2018-10-09 00:25

I am necessarily out of the TF business for a while. In the course of getting a cooler swap working, I realized that one fan on the GTX 1060 was stuttering and not running up to speed. After I pulled it, I found that the GTX 460 was running at about half of its previous outrageous output. Moving it to the primary PCIe slot did not help.

Consequently, I have a lot fewer power cables cluttering things up. Half speed on the 460 was not worth the power draw. I am now drawing just less than 200 W at the wall, with P95 LL at 4400 MHz. About 80 W idle. No pesky GPUs to fuss over. :sad: CPU Package power about 110-115 W.

The only way now is to see if I can get a 580 or the 570 running. Maybe there [I]was [/I]something special about running the 460 in the secondary slot, with a card in the primary PCIe. :confused2:

I have ditched all my TF work of any description. With cooler weather and no GPUs, at least I can look forward to a power bill drop. :smile:

TheJudger 2018-10-09 19:21

Just uploaded a new set of binaries of mfaktc 0.21 for Windows using CUDA 10.0.[LIST][*][URL="https://mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.win.cuda100.zip"]mfaktc-0.21.win.cuda100.zip[/URL][*][URL="https://mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.win.cuda100.extra-versions.zip"]mfaktc-0.21.win.cuda100.extra-versions.zip[/URL][/LIST]If you're already running mfaktc 0.21 with an older CUDA version there is no need to upgrade; the sources of mfaktc are unmodified.
These CUDA 10.0 binaries are compiled for compute_30 (Kepler), compute_35 (Kepler Update), compute_50 (Maxwell), compute_60 (Pascal), compute_70 (Volta) and compute_75 (Turing).
There is no support for compute_20 (Fermi) or older cards, and only 64bit binaries - the main purpose of these binaries is Volta and Turing GPUs, and for the latter only 64bit drivers are available, so the decision was easy.

Happy factor hunting!
Oliver

P.S. I had no access to Volta or Turing cards running Windows - if someone has such a GPU on Windows, please run the full selftest (e.g. mfaktc*.exe -st) for all 4 binaries and report the results. Thank you!

petrw1 2018-10-09 21:50

I'll have such a beast in a few days.

In the meantime can I get a quick answer to:

"Is MFAKTC capable of stopping/pausing if my PC has to revert to a UPS during a power failure?"

I'll check the config parms when I get home.
I'm not sure it makes sense to have a UPS if the GPU will drain it in a couple minutes.

James Heinrich 2018-10-09 22:02

Assuming your UPS has management software that can trigger events, it's trivial, even if your software of choice doesn't have power-detection capabilities. In my case, if the power is out for more than 5 seconds I get it to automatically run [code]taskkill /IM mfaktc.exe[/code]

storm5510 2018-10-10 00:33

[QUOTE=petrw1;497730]...I'm not sure it makes sense to have a UPS if the GPU will drain it in a couple minutes.[/QUOTE]

Or nothing, in my case. I have an APC. It's too small to handle this 1080 setup even when the AC is on. It sounds its alarm. No GPU usage, it is fine.

My solution was power taps plugged into the non-battery-backed receptacles. It still provides the surge suppression, but no battery.

To reply to your comment: I do not think it is practical either.

kriesel 2018-10-10 01:14

[QUOTE=storm5510;497737]Or nothing, in my case. I have an APC. It's too small to handle this 1080 setup [...][/QUOTE]

A UPS sized to give even a few minutes of runtime on battery for a system with two ~240W GPUs and a couple of multicore cpus fully loaded is >1.5KVA in my experience. That is, 1.5KVA didn't do it; mains drop, UPS inverter overload-faults immediately. A total of 300W of GPU plus 12 cpu cores though, is ok. It will ride out the occasional lights-blinked or brief squirrel-short but not a sustained outage due to runtime limits. A 1.5KVA UPS is typically the top of the consumer/small office offerings readily available in local retail. A 2KVA UPS costs a LOT more than a 1.5.

James Heinrich 2018-10-10 01:47

I have the [url=https://www.cyberpowersystems.com/product/ups/cp1500pfclcd/]CyberPower CP1500PFCLCD[/url], and I'm very pleased with it. Rated at 1500VA / 900W, which should be fine for most setups. Fully loaded you only have a few minutes of runtime, but as noted above, if the outage lasts more than a few seconds I kill mfaktc and Prime95 and the runtime jumps to 15+ minutes.

One gotcha that caught me when I was shopping for a UPS was that cheaper UPSes give a poor approximation of a 120V sine wave, which doesn't play nicely with Active-PFC power supplies: the computer would shut down immediately upon switching to UPS battery. Switching to the slightly more expensive "pure sine wave" version solved the problem.

kriesel 2018-10-10 05:04

[QUOTE=James Heinrich;497742]I have the CyberPower CP1500PFCLCD, and I'm very pleased with it. Rated at 1500VA / 900W [...][/QUOTE]
That seems a very nice unit, but it would probably not handle my Lenovo D20 load, which is directly connected to the AC outlet. I recently tried adding a lowly Quadro 2000 as a third gpu. It was fine at idle, but it caused the system to shut down promptly when a gpu app loaded the ~60W Quadro, so I conclude it was running near max PS load without the Quadro. It has a 1060W active PFC power supply. I assume that is the rated output, so the required input from the UPS would be higher than a kW. I vaguely recall checking it with a Kill A Watt a while ago and concluding it exceeded the wattage rating of my largest UPS (1500/980). An HP Z600 has a 650W supply, which handles less gpu wattage but is ok on my UPSes.

Mark Rose 2018-10-10 18:21

[QUOTE=kriesel;497748]It has a 1060W active PFC power supply. [...] I vaguely recall checking it with a Kill A Watt a while ago and concluding it exceeded the wattage rating of my largest UPS (1500/980).[/QUOTE]

That's a bronze-rated power supply, so you're looking at around 82 to 85% efficiency. Assuming 82%, and the 980 watt limit of the UPS, the internal draw can be no more than about 800 watts without tripping it.

Unless your electricity is super cheap, a Gold or Platinum power supply would pay for itself quickly.
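A quick check of that number, assuming 82% wall-to-DC efficiency (the low end of the Bronze band) and the 980W UPS rating mentioned above:

```python
ups_limit_w = 980        # watt rating of the UPS output (from the post above)
psu_efficiency = 0.82    # assumed low end of the 80 Plus Bronze band

# The UPS supplies the wall-side draw; the PSU delivers roughly
# wall_draw * efficiency to the components ("internal draw").
max_internal_w = ups_limit_w * psu_efficiency
print(round(max_internal_w))   # ~804 W, i.e. "no more than about 800 watts"
```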

kriesel 2018-10-10 21:51

[QUOTE=Mark Rose;497794]That's a bronze rated power supply, so you're looking around 82 to 85% efficient. Assuming 82%, and the 980 watt limit of the UPS, that means the internal draw can be no more than 800 watts without it tripping.

Unless your electricity is super cheap, a Gold or Platinum power supply would pay for itself quickly.[/QUOTE]
What's up with the advertised 80 Plus Platinum power supplies? Some are indicated as 94% efficient.

The HP Z600s are bronze 80+ also, but are oddly shaped so replacement with a more energy efficient type could be a challenge. [URL="https://www.ebay.com/itm/HP-Z600-WorkStation-Power-Supply-HP-s-508548-001-482513-003-650-WATT-TESTED/132814079218"]https://www.ebay.com/itm/HP-z600-Workstation-650W-Power-Supply-508548-001/161652055904?hash=item25a3369f60:rk:1:pf:0[/URL]
They're also limited by having only a single 6-pin gpu power cable.

For the Lenovo I'd be tempted to go for a slightly higher total output, to enable the third gpu.
Electricity is about to become effectively less costly with the beginning of the heating season. Residential rates are around US$0.12/kWh here, so a 1kW base load costs around $1050/year. Natural gas or cut-your-own wood is a much more economical heat source, but does no computing.
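That yearly figure is just rate times hours; a quick sketch with the quoted US$0.12/kWh rate:

```python
rate_usd_per_kwh = 0.12      # residential rate quoted above
base_load_kw = 1.0           # continuous 1 kW base load
hours_per_year = 24 * 365    # 8760

annual_cost_usd = base_load_kw * hours_per_year * rate_usd_per_kwh
print(round(annual_cost_usd))   # ~1051, i.e. "around $1050/year"
```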

storm5510 2018-10-11 00:46

[QUOTE=kriesel;497740]A UPS sized to give even a few minutes of runtime on battery for a system with two ~240W GPUs and a couple of multicore cpus fully loaded is >1.5KVA in my experience. [...][/QUOTE]

To comfortably run what I have (two machines), I would need something above 600W. This is adding a bit of cushion. About the best GPU the older HP could handle would be something like a GTX 1060. What it has now is of no consequence. This 1080 rig can easily pass 300W if I work the i7 along with it. I commonly run [I]Prime95[/I] and a GPU process at the same time.

It is my understanding that a UPS is designed to maintain a running system in order to perform a proper shutdown as soon as possible. Is this incorrect?

James Heinrich 2018-10-11 01:11

[QUOTE=storm5510;497823]It is my understanding that a UPS is designed to maintain a running system in order to perform a proper shutdown as soon as possible. Is this incorrect?[/QUOTE][i]Ideally[/i] a UPS should maintain power to the protected system until normal power is restored. For critical systems this may involve short-term battery backup long enough to get an alternative power source in place (e.g. diesel generator). This generally doesn't apply to home systems (but definitely applies to places like data centres and hospitals).
Most power interruptions at the residential level are very short (< 1s) to short (< 30s), in which case it entirely makes sense [i]not[/i] to immediately trigger a system shutdown when the power will often be restored before the system even shuts down. Assuming your UPS has a nominal runtime of say 5 minutes, and it might typically take 2 minutes to orderly power down your system, the software should be configured to let the system run for 1-2 minutes on battery power before triggering a system shutdown. Most of the time this will let you ride out the short power interruption with no disruption. If it's longer than that then a shutdown is in order, and you want to make sure there's enough battery to shutdown cleanly.
In my case I allow things to run normally for 20 seconds, if the outage is longer than that I terminate mfaktc and Prime95 which cuts system power consumption to 25% of full-load, commensurately extending battery runtime by x4. If the power is still out after 5 minutes I have it set to do an orderly shutdown to conserve battery power. Since my UPS also powers my modem and router with any luck the internet still works and I can continue doing whatever I need online on a laptop or tablet (this happened to me about 6 months ago, power was out for 8+ hours but the UPS kept me online).
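The x4 figure follows from a first-order model in which battery runtime is inversely proportional to load (a rough sketch with illustrative numbers; real batteries do somewhat worse at high load, so nominal ratings are optimistic):

```python
# First-order battery-runtime model: runtime ~ capacity / load.
nominal_runtime_min = 5.0    # example runtime at full load
load_after_kill = 0.25       # killing mfaktc/Prime95 leaves 25% of full load

runtime_after_kill = nominal_runtime_min / load_after_kill
print(runtime_after_kill)    # 20.0 -> a few minutes of runtime becomes ~20
```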

storm5510 2018-10-12 03:20

[QUOTE=James Heinrich;497825][i]Ideally[/i] a UPS should maintain power to the protected system until normal power is restored. For critical systems this may involve short-term battery backup long enough to get an alternative power source in place (e.g. diesel generator). This generally doesn't apply to home systems (but definitely applies to places like data centres and hospitals).
Most power interruptions at the residential level are very short (< 1s) to short (< 30s), in which case it entirely makes sense [i]not[/i] to immediately trigger a system shutdown when the power will often be restored before the system even shuts down. Assuming your UPS has a nominal runtime of say 5 minutes, and it might typically take 2 minutes to orderly power down your system, the software should be configured to let the system run for 1-2 minutes on battery power before triggering a system shutdown. Most of the time this will let you ride out the short power interruption with no disruption. If it's longer than that then a shutdown is in order, and you want to make sure there's enough battery to shutdown cleanly.
In my case I allow things to run normally for 20 seconds; if the outage is longer than that I terminate mfaktc and Prime95, which cuts system power consumption to 25% of full load and commensurately extends battery runtime by about 4x. If the power is still out after 5 minutes I have it set to do an orderly shutdown to conserve battery power. Since my UPS also powers my modem and router, with any luck the internet still works and I can continue doing whatever I need online on a laptop or tablet (this happened to me about 6 months ago: power was out for 8+ hours but the UPS kept me online).[/QUOTE]

Periodically, I can hear the relay(s) in this UPS click. This suggests an extremely short-duration interruption. I never see anything, though. So, the UPS is functioning as it should. When there is an outage here, it typically lasts several hours. They are never brief.

This unit is too small for my i7 and 1080 setup. Its limit is 300W. That would work fine for the HP. What I need to do is get another, larger unit dedicated to this i7 [U]only[/U]. Many have a USB connection, as mine does, so the unit can interact with the OS. I usually set the auto-shutdown to five minutes, just in case I am not here to tend to it manually.

storm5510 2018-10-23 02:23

Scratching my head.
 
1 Attachment(s)
[B]I will put this here because I cannot find anyplace more appropriate.[/B]

Since [I]mersenne.ca[/I] went to an SSL connection, I have not been able to communicate with it. I have searched this entire site looking for clues as to what the problem may be. I am relatively certain the problem is on my end, because I have found no mention of it anywhere else.

Below is a capture of a command prompt window attempting to run James' batch file, which is located on his reservation page. All this sails over my head, so if anyone sees anything obvious, please respond.

Thank you. :confused2:

axn 2018-10-23 02:41

[QUOTE=storm5510;498555][B]I will put this here because I cannot find anyplace more appropriate.[/B]

Since [I]mersenne.ca[/I] went to an SSL connection, I have not been able to communicate with it. I have searched this entire site looking for clues as to what the problem may be. I am relatively certain the problem is on my end, because I have found no mention of it anywhere else.

Below is a capture of a command prompt window attempting to run James' batch file, which is located on his reservation page. All this sails over my head, so if anyone sees anything obvious, please respond.

Thank you. :confused2:[/QUOTE]

Are you trying to access from your work/uni network? In that case, they might be trying to intercept the traffic.

James Heinrich 2018-10-23 03:05

[QUOTE=storm5510;498555]Since [I]mersenne.ca [/I]went to a SSL connection, I have not been able to communicate with it. I have searched this entire site looking for clues as to what the problem may be.[/QUOTE]Apparently you missed my reply to your email (check your spam folder?) :smile:

The old (v1.11.4) version of Wget for Windows from gnuwin32.sourceforge.net doesn't play nice with https (it can download fine but can't authenticate the certificates). You need to use the newer v1.19.4 build available from here:
[url]https://eternallybored.org/misc/wget/[/url]

[SIZE="1"](alternatively you [i]can[/i] modify your wget commands to include [i]--no-check-certificate[/i] and that will work, but getting the newer, working version is the better solution).[/SIZE]

storm5510 2018-10-23 12:03

[QUOTE=James Heinrich;498557]Apparently you missed my reply to your email (check your spam folder?) :smile:

The old (v1.11.4) version of Wget for Windows from gnuwin32.sourceforge.net doesn't play nice with https (it can download fine but can't authenticate the certificates). You need to use the newer v1.19.4 build available from here:
[url]https://eternallybored.org/misc/wget/[/url]

[SIZE="1"](alternatively you [i]can[/i] modify your wget commands to include [i]--no-check-certificate[/i] and that will work, but getting the newer, working version is the better solution).[/SIZE][/QUOTE]

You were correct about the spam folder. It should not have gone there since I have you in my address book. Regardless, I marked it as 'not spam.'

I tried [I]--no-check-certificate[/I] after web-searching to find out exactly where it should be placed in the line. Still no joy. This tells me there is definitely something amiss in my setup. I looked at the firewall settings; I didn't expect to find anything there, but it was worth a try. I don't actually have an environment variable for wget, so I used a fully qualified path instead.

I will download the new one, and congrats on the security upgrade. Most sites on the web are now [I]https[/I]. The way things are, it is almost a [U]must-have[/U].

Thank you for answering this dumb old boy's post. :smile:

kladner 2018-10-23 16:54

[QUOTE=kladner;497666]I am necessarily out of the TF business for a while.....[/QUOTE]
Just filed for RMA with Gigabyte. :smile:
EDIT: Approved within an hour!

storm5510 2018-10-24 02:10

[QUOTE=kladner;498582]Just filed for RMA with Gigabyte. :smile:
EDIT: Approved within an hour![/QUOTE]

I paid for a three-year service plan when I got this Gigabyte GTX 1080. As much as I had to pay for it, at the time (April), I could not risk not having a backup plan. I hope I never need it. However, one never knows.

kladner 2018-10-25 06:01

[QUOTE=storm5510;498630]I paid for a three-year service plan when I got this Gigabyte GTX 1080. As much as I had to pay for it, at the time (April), I could not risk not having a backup plan. I hope I never need it. However, one never knows.[/QUOTE]
There is a 3 year warranty. What does the service plan add?

kriesel 2018-10-27 14:35

[QUOTE=storm5510;497914]Periodically, I can hear the relay(s) in this UPS click. This suggests an extremely short duration interruption.[/QUOTE]There are other possibilities. Maybe it's a UPS that does not only straight passthrough, but also slight step-up or step-down of line voltage as well as running from battery, and there's a relay involved in that. These days, the most frequent cases seem to be passthrough and down-regulation. Line voltages have been ramping up over the decades.
Now nominally 120V, US single-phase voltage used to be called 115V, before that 110, and before that 108, but a Kill A Watt will often show 123V or more at the wall socket. A well-designed UPS protects not only against zero volts, but also sags, surges, and spikes. Just now my wall outlet read 123V, and sagged less than 1V with a toaster load added.

A relay clicking is also something I commonly hear from UPSes doing a battery self-test while continuing to carry the load, at a time selected by the UPS, while the line voltage is still good. That's how UPSes detect it's time to replace the battery and light that expensive LED on the front: the battery fails the UPS self-test.

petrw1 2018-10-28 22:00

1 Attachment(s)
[QUOTE=TheJudger;497720]Just uploaded a new set of binaries of mfaktc 0.21 for Windows using CUDA 10.0.[LIST][*][URL="https://mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.win.cuda100.zip"]mfaktc-0.21.win.cuda100.zip[/URL][*][URL="https://mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.win.cuda100.extra-versions.zip"]mfaktc-0.21.win.cuda100.extra-versions.zip[/URL][/LIST]If you're already running mfaktc 0.21 using an older CUDA version there is no need to upgrade; the mfaktc sources are unmodified.
These CUDA 10.0 binaries are compiled for compute_30 (Kepler), compute_35 (Kepler Update), compute_50 (Maxwell), compute_60 (Pascal), compute_70 (Volta) and compute_75 (Turing).
There is no support for compute_20 (Fermi) or older cards, and only 64bit binaries - the main purpose of these binaries is Volta and Turing GPUs. For the latter there are only 64bit drivers available, so the decision was easy.

Happy factor hunting!
Oliver

P.S. I had no access to Volta and Turing running Windows - if someone has such a GPU running on Windows please run the full selftest (e.g. mfaktc*.exe -st) for all 4 binaries and report results. Thank you![/QUOTE]

I have results for binary: mfaktc-win-64.exe

PASSED!

petrw1 2018-10-28 22:57

HOWEVER....
 
2 Attachment(s)
When I ran an actual factoring assignment …

First the good news (I think): I was getting about 3,800 GHz-d/day.

But after about a minute of running, my screen pixelated like this (mfaktc 1).

Then the PC crashed with this STOPCODE (mfaktc 2).

TheJudger 2018-10-29 19:40

Hello,

is this behaviour repeatable? The pixelated screen [U]looks[/U] like too much OC or a HW failure to me at first glance, but that is not the only option. Did you run some other workloads?

Keep in mind that the selftest is a selftest for the software itself, not a stress test for the HW, and it doesn't put much load on the GPU.

Oliver

petrw1 2018-10-29 20:40

[QUOTE=TheJudger;499042]Hello,

is this behaviour repeatable? The pixelated screen [U]looks[/U] like too much OC or a HW failure to me at first glance, but that is not the only option. Did you run some other workloads?

Keep in mind that the selftest is a selftest for the software itself, not a stress test for the HW, and it doesn't put much load on the GPU.

Oliver[/QUOTE]

Repeated 3 times the same evening.
After the second I tried installing the driver that came with the new monitor (grasping at straws).

I haven't tried any OC of the GPU...just stock settings.
I did notice that the GHz-days/day varied quite a bit, from about 2,500 to 3,800, over that minute before the crash.

Should I try other exponent ranges or bit levels?
Are there any config parameters that might be useful to try?
Could it be a driver issue???

I just didn't know where to start....you (mfaktc issue) or NVIDIA (GPU issue) or my tech support (CPU/MB/RAM issue).

In case it is relevant (though unlikely) it was built with 4x*GB RAM but Windows only sees 24GB.

Thx

TheJudger 2018-10-29 21:31

Hi,

I recommend testing other software in this case. I know that mfaktc in non-selftest mode easily hits the power target on an RTX 2080 Ti, so it is likely similar on other Turing cards. Maybe try something like FurMark to stress your GPU really hard.
24 out of 32 GiB looks like one memory module isn't detected; maybe a tool like CPU-Z can give a hint which one.
I would go step by step:
1. fix memory detection
2. run memtest and/or the Prime95 torture test
3. put some load on the GPU
Just the usual "how to test my system".

Oliver

ATH 2018-10-29 23:31

Launch GPU-Z or MSI Afterburner before you start mfaktc and watch the temperature and fan speed; it could be overheating. In Afterburner you can set a manual fan curve based on temperature; make sure fan speed is at 100% at around 80°C or lower.
[url]https://www.techpowerup.com/gpuz/[/url]
[url]https://www.msi.com/page/afterburner[/url]

kriesel 2018-10-30 00:56

[QUOTE=ATH;499066]Launch GPU-Z or MSI Afterburner before you start mfaktc and watch the temperature and fan speed; it could be overheating. In Afterburner you can set a manual fan curve based on temperature; make sure fan speed is at 100% at around 80°C or lower.
[URL]https://www.techpowerup.com/gpuz/[/URL]
[URL]https://www.msi.com/page/afterburner[/URL][/QUOTE]
Amen to that, plus GPU-Z can log to a file. It looks something like this:
[CODE] Date , GPU Core Clock [MHz] , GPU Memory Clock [MHz] , GPU Load [%] , Memory Usage (Dedicated) [MB] , CPU Temperature [°C] , System Memory Used [MB] ,
2018-10-29 19:35:36 , 448.9 , 478.8 , 0 , 141 , 72.0 , 3329 ,
2018-10-29 19:35:38 , 499.3 , 532.6 , 0 , 139 , 70.0 , 3330 ,
2018-10-29 19:35:41 , 499.3 , 532.6 , 5 , 161 , 64.0 , 3352 ,
2018-10-29 19:35:43 , 499.3 , 532.6 , 2 , 163 , 71.0 , 3355 ,
2018-10-29 19:35:46 , 499.3 , 532.6 , 3 , 161 , 72.0 , 3357 ,
2018-10-29 19:35:48 , 499.3 , 532.6 , 1 , 157 , 72.0 , 3350 ,
2018-10-29 19:35:51 , 510.0 , 544.0 , 3 , 159 , 68.0 , 3341 ,
2018-10-29 19:35:53 , 510.0 , 544.0 , 5 , 157 , 70.0 , 3336 ,
2018-10-29 19:35:56 , 510.0 , 544.0 , 3 , 156 , 74.0 , 3337 ,
2018-10-29 19:35:58 , 510.0 , 544.0 , 1 , 157 , 74.0 , 3338 ,
[/CODE](example above is from a nearly idle Intel Arrandale IGP)
The significant variation in GHz-d/day could indicate high temperature causing the clock to throttle back; on some older models throttling causes ~50% reductions.
Another good app is HWMonitor, from CPUID, which will indicate and log various parameters. And there's nvidia-smi, which also has logging capability.
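If you let GPU-Z log for a while, a few lines of Python will scan the file for overheat events. This is just a sketch assuming the padded-CSV layout shown above (GPU-Z pads every field with spaces and leaves a trailing comma); the second sample row here is made up so there is something to flag, and the column is matched on the word "Temperature" since GPU-Z names it "GPU Temperature" or "CPU Temperature" depending on the device:

```python
import csv
import io

# Made-up sample in GPU-Z's log format; point this at your real log file.
LOG = """Date , GPU Core Clock [MHz] , GPU Temperature [°C] ,
2018-10-29 19:35:36 , 448.9 , 72.0 ,
2018-10-29 19:35:38 , 1350.0 , 83.0 ,
"""

def hot_samples(text, threshold=80.0):
    """Return (timestamp, temperature) pairs at or above the threshold."""
    rows = list(csv.reader(io.StringIO(text)))
    header = [h.strip() for h in rows[0]]
    # First column whose name mentions Temperature.
    t_col = next(i for i, h in enumerate(header) if "Temperature" in h)
    hot = []
    for row in rows[1:]:
        if len(row) <= t_col or not row[t_col].strip():
            continue  # skip short/blank lines
        temp = float(row[t_col])
        if temp >= threshold:
            hot.append((row[0].strip(), temp))
    return hot

print(hot_samples(LOG))
```

Swap `LOG` for `open("gpuz_log.txt").read()` to run it against an actual log, and lower the threshold to match wherever your card starts throttling.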

petrw1 2018-10-30 03:21

My tech support was kind enough to test GPU
 
1 Attachment(s)
Even though I didn't buy it from them, he ran FurMark on their own test machine and got the same "artifacting" I was getting running mfaktc.

He said the card was faulty, and I convinced NVIDIA support of the same, so they are going to replace it (oh well….tick tick).

Also, they got the RAM fixed...it now recognizes all 32GB at 3600.

So I ran a benchmark, but was surprised that the timings it sent to Prime95 are about the same as my son's i7-6700 with stock RAM.

Mark Rose 2018-10-30 15:05

[url]https://www.theinquirer.net/inquirer/news/3065361/nvidias-geforce-rtx-2080-ti-cards-are-reportedly-failing-in-high-numbers[/url]

TheJudger 2018-10-30 18:34

[QUOTE=petrw1;499076]Even though I didn't buy it from them, he ran FurMark on their own test machine and got the same "artifacting" I was getting running mfaktc.

He said the card was faulty, and I convinced NVIDIA support of the same, so they are going to replace it (oh well….tick tick)[/QUOTE]

Thank you for your followup report! I have the feeling that some people feel ashamed (for no reason) when their hardware is faulty and don't report back.

Oliver

kriesel 2018-10-30 19:36

[QUOTE=TheJudger;499125]Thank you for your followup report! I have the feeling that some people feel ashamed (for no reason) when their hardware is faulty and don't report back.

Oliver[/QUOTE]
And thanks much for the report and the various followups. I had been contemplating an RTX2080, which I see in Mark's posted link is also affected. Hopefully NVIDIA gets things straightened out soon.

Mark Rose 2018-10-31 15:01

[QUOTE=kriesel;499127]And thanks much for the report and the various followups. I had been contemplating an RTX2080, which I see in Mark's posted link is also affected. Hopefully NVIDIA gets things straightened out soon.[/QUOTE]

I wouldn't hold back from getting one, but you may need to exchange it, that's all.

kriesel 2018-10-31 17:40

[QUOTE=Mark Rose;499163]I wouldn't hold back from getting one, but you may need to exchange it, that's all.[/QUOTE]
It's not clear what the percentage with early failure is. Any idea or data? I assumed, since failures showed up so soon after release, that the percentage is significant. The title and the article you posted a link to said high numbers.
It linked in turn to [URL]https://www.digitaltrends.com/computing/nvidia-rtx-2080-ti-graphics-cards-dying/[/URL] which indicates Founders Edition high numbers, and also some Gigabyte and ASUS cards.

Anecdotally, there are folks on the forum with 2 of 3 cards failed, or 2 of 2. Failures are generally in under a month, frequently under a week, some as little as a day. RMAs take longer than that short lifetime. Replacements are also failing. Generally, I like NVIDIA hardware. But this sounds bad. [URL]https://forums.geforce.com/default/topic/1078162/geforce-rtx-20-series/rtx-2080ti-massively-die-/2/[/URL]

xx005fs 2018-10-31 19:25

[QUOTE=kriesel;499178]It's not clear what the percentage with early failure is. Any idea or data? I thought since it's shown up early after their release that the percentage is significant. The title and article you posted a link to said high numbers.
It linked in turn to [URL]https://www.digitaltrends.com/computing/nvidia-rtx-2080-ti-graphics-cards-dying/[/URL] which indicates Founders Edition high numbers, and also some Gigabyte and ASUS cards.

Anecdotally, there are folks on the forum with 2 of 3 cards failed, or 2 of 2. Failures are generally in under a month, frequently under a week, some as little as a day. RMAs take longer than that short lifetime. Replacements are also failing. Generally, I like NVIDIA hardware. But this sounds bad. [URL]https://forums.geforce.com/default/topic/1078162/geforce-rtx-20-series/rtx-2080ti-massively-die-/2/[/URL][/QUOTE]

Honestly, that artifacting looks like GDDR6 artifacting, because in my personal experience, when the GPU core crashes it won't artifact first; it will just straight up freeze and turn into a blank screen, and then the driver may or may not auto-restart the device depending on how badly it crashed (I don't know exactly how the auto-restart works; it usually happens with NVIDIA GPUs, not with AMD GPUs). Try underclocking the memory as low as possible, since trial factoring is not memory intensive at all.

petrw1 2018-11-02 21:08

Is this bad luck, or am I missing factors?
 
Overall my GTX 980 GPU is giving me about 1 factor per 100 attempts, which is about right.
But in the last couple of weeks I have completed over 500 TF 73-74 assignments in the 5xM range without a factor.

Recently I ran mfaktc.exe -st and it passed all tests.
Is this enough of a test to trust the card is working and just having a streak of bad luck?

chalsall 2018-11-02 21:27

[QUOTE=petrw1;499379]Is this enough of a test to trust the card is working and just having a streak of bad luck?[/QUOTE]

I wouldn't worry about it too much; such gaps are relatively common. But if you want to be absolutely sure, why don't you run a few jobs where people recently found factors at the current DCTF wavefront:[CODE]Factor=56582819,73,74
Factor=56579363,73,74
Factor=56579071,73,74
Factor=56572361,73,74
Factor=56567509,73,74
Factor=56573833,73,74
Factor=56553979,73,74
Factor=56549137,73,74
Factor=56548663,73,74
Factor=56545807,73,74[/CODE]

petrw1 2018-11-03 20:46

[QUOTE=chalsall;499383]I wouldn't worry about it too much; such gaps are relatively common. But if you want to be absolutely sure, why don't you run a few jobs where people recently found factors at the current DCTF wavefront:[CODE]
[/CODE][/QUOTE]

So, because I like to be sure, I reran 14 in that range that were TF65-66 (took about 10 seconds each). I found all the factors listed, PLUS for exponent 56548031 I found:
--- The factor listed
--- The additional factor TJAOI found
--- AND another factor even TJAOI missed
[url]https://www.mersenne.org/report_exponent/?exp_lo=56548031&full=1[/url]

Then I ran 5 in the TF73-74 range and found all the factors listed.

So I now am going to believe my GPU is working but I'm in a factor funk … OR … Chris subversively assigned me only exponents without a factor :razz:

petrw1 2018-11-05 14:31

[QUOTE=petrw1;499495].

So I now am going to believe my GPU is working but I'm in a factor funk … OR … Chris subversively assigned me only exponents without a factor :razz:[/QUOTE]

Found a factor via TF finally

petrw1 2018-11-05 18:58

[QUOTE=petrw1;499641]Found a factor via TF finally[/QUOTE]

And another 8 hours later

Mark Rose 2018-11-05 19:18

[QUOTE=petrw1;499495]Chris subversively assigned me only exponents without a factor :razz:[/QUOTE]

He does this to me all the time :razz:

chalsall 2018-11-05 19:29

[QUOTE=Mark Rose;499659]He does this to me all the time :razz:[/QUOTE]

LOL... But seriously, apparently I do it to myself as well; out of the last ~450 runs I've only found two factors....

kladner 2018-11-06 01:32

2 Attachment(s)
Hmm. GPU72, 1000 results, sorted by work type. All are either LL or DC TF.
1000 Assignments completed. [COLOR=red]‡[/COLOR] 26 Factor(s) Found!!!.
2.6%? I know this is a lament about dry spells for factors, but how does the period in question compare to the user's overall statistics? I have certainly [U]felt[/U] parched at times, but I generally run ahead of projections for TF. At one point, quite a while ago, I had a startling excess of factors in DCTF. This peak was subsequently subdued by very slow periods, which didn't quite make it to drought status.

Currently, my DCTF efforts are creeping up on projections. LLTF has a comfortable margin above. These too shall pass.
[B]EDIT: Eek! [/B]I just looked at the completion dates. The attached are the first and last few results of the 1000 I have been referencing. They span late 2011 to early 2012. That is a bit early for me to have had any GPU but my faithful first, a Gigabyte GTX 460. However, I did find out pretty quickly how hard I could push that little card.

My most recent 1000 completions turned in 11 factors.[INDENT]When yer hot yer hot
when yer not yer not
[/INDENT][YOUTUBE]0rdF7o08KXw[/YOUTUBE]

kriesel 2018-11-06 02:01

57/4819 here in the past year, barely above 1%, so going 100 or much more between factors would not be surprising.

nofaith628 2018-11-06 03:32

Looking at my overall statistics for GPU72, I have found 1501/52920 exponents. I've been doing whatever assignments Chris gives me. Roughly 2.84%; we have similar percentages, kladner.

petrw1 2018-11-06 05:31

If we look here:
 
[url]https://www.mersenne.org/various/math.php[/url]

[QUOTE]… Looking at past factoring data we see that the chance of finding a factor between 2^x and 2^(x+1) is about 1/x. ...[/QUOTE]

So TF 73-74 should be about 1/74, or 1.35%.

Overall GPU72
[url]https://www.gpu72.com/reports/factor_percentage/[/url]
is close to that with LLTF

But closer to 1% in DCTF.
I believe that is because almost all the exponents we are DCTF'ing have had P-1 done, and the factors P-1 finds will reduce the DCTF percentage somewhat.

My 2 cents worth.
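Under that 1/x heuristic it's easy to put a number on a dry spell. A back-of-the-envelope sketch (my own illustration, assuming each assignment independently has a 1/74 chance of yielding a factor):

```python
def p_no_factor(attempts: int, p: float = 1 / 74) -> float:
    """Probability of finding zero factors in `attempts` independent tries,
    each with per-try success probability p (the 1/x heuristic for 73-74)."""
    return (1 - p) ** attempts

for n in (100, 200, 500):
    print(f"{n} attempts: {p_no_factor(n):.2%} chance of zero factors")
```

By this model a 100-assignment drought happens about a quarter of the time, but 500 straight no-factor results at 73-74 would have only about a 0.1% chance, so either that streak was genuinely very unlucky or the effective per-assignment probability was lower than 1/74 (e.g. because P-1 had already removed some factors, as noted above).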

chalsall 2018-11-06 13:35

[QUOTE=petrw1;499710]I believe that is because almost all the exponents we are DCTF'ing have had P-1 done, and the factors P-1 finds will reduce the DCTF percentage somewhat.[/QUOTE]

That's my take on it. LLTF is almost always done before P-1; DCTF almost always after.

And that reminds me... I really need to update that report to reflect the deeper depths we're now working.

kriesel 2018-11-06 13:55

[QUOTE=kriesel;499689]57/4819 here in the past year, barely above 1%, so going 100 or much more between factors would not be surprising.[/QUOTE]
And now, 2 factored out of 26 in the past day, plus 1 factored of 3 P-1 for good measure. Statistics gonna vary, especially with small sample sizes.

ixfd64 2018-12-12 02:56

I'm running mfaktc on a borrowed MSI gaming laptop with a GeForce GTX 1070 video card. There was a moment today when mfaktc got stuck on a class. However, this wasn't a complete freeze because mfaktc processed the next set of classes when I pressed Ctrl + C. I had to press Ctrl + C a few more times (with classes being processed each time) before mfaktc correctly exited. Has anyone encountered this issue?

It's worth mentioning that the cursor on this laptop sometimes freezes for a short time. I have no idea if these issues are related.

James Heinrich 2018-12-12 03:12

I've seen that kind of behaviour after a display driver crash-and-recover.
For me, I just restart the assignment, because the average runtime of my assignments is about 3 seconds. I'm not sure whether it would be advisable to restart from a known-good checkpoint (or redo the entire assignment); others may have better advice on that matter.

GP2 2018-12-13 01:25

[QUOTE=petrw1;499495]So because I like to be sure I reran 14 in that range that were TF65-66 (took about 10 seconds each). I found all the factors listed PLUS for exponent 56548031 I found:
--- The factor listed
--- The additional factor TJAOI found
--- AND another factor even TJAOI missed
[url]https://www.mersenne.org/report_exponent/?exp_lo=56548031&full=1[/url]
[/QUOTE]

TJAOI didn't miss that factor, he simply hadn't reached it yet.

When TJAOI works on a bit level, he finds factors in increasing numerical order.

From your link, TJAOI found the factor 46877384547124608047 on 2018-10-03. Recently he has been finding factors like [URL="https://www.mersenne.org/report_exponent/?exp_lo=18216343&exp_hi=&full=1"]51742865878987223449[/URL] on 2018-12-05. Your factor was 60339186474018340969, which you found on 2018-11-03. TJAOI will complete the 66-bit level when he reaches 2^66 = 73786976294838206464. All of these factors are within the range of TF 65–66.

Gordon 2018-12-13 13:32

[QUOTE=James Heinrich;502470]I've seen that kind of behaviour after a display driver crash-and-recover.
For me I just restart the assignment, because the average runtime of my assignments is about 3 seconds. I'm not sure if it may be advisable to restart from a known-good checkpoint (or the entire assignment), others may have better advice on that matter.[/QUOTE]

They must be big exponents; my 1060 is taking 32 minutes per test, down in the 400k range.

storm5510 2018-12-13 14:40

[QUOTE=Gordon;502600]They must be big exponents; my 1060 is taking 32 minutes per test, down in the 400k range.[/QUOTE]

The 400,000 range?

Since [B]James Heinrich[/B] went to SSL, his batch file does not appear to let a person select a range. So, I used it as is. Yesterday evening, I was running in the 1210M range on this 1080. It was taking about 0.65 seconds at 1,100 Ghz-days/day on each assignment.

I use [I]MSI Afterburner[/I] to under-clock it sometimes. At 75% capacity, it can still run nearly 1,000 GHz-days/day, with much less heat output: 61°C versus 72°C. I prefer not to run it [U]hot[/U].

James Heinrich 2018-12-13 14:46

[QUOTE=storm5510;502607]Since [B]James Heinrich[/B] went to SSL, his batch file does not appear to let a person select a range.[/QUOTE]Nothing should have changed. If you're having trouble, please email me directly and we can sort it out.

ATH 2018-12-13 16:11

[QUOTE=GP2;502562]TJAOI didn't miss that factor, he simply hadn't reached it yet.

When TJAOI works on a bit level, he finds factors in increasing numerical order.

From your link, TJAOI found the factor 46877384547124608047 on 2018-10-03. Recently he has been finding factors like [URL="https://www.mersenne.org/report_exponent/?exp_lo=18216343&exp_hi=&full=1"]51742865878987223449[/URL] on 2018-12-05. Your factor was 60339186474018340969, which you found on 2018-11-03. TJAOI will complete the 66-bit level when he reaches 2^66 = 73786976294838206464. All of these factors are within the range of TF 65–66.[/QUOTE]

It seems like a very inefficient way of doing it. He must find primes q in the current range in increasing order, then factor q-1 = 2*p1*p2*...*pk, and then check whether q is a factor of each of Mp1, Mp2, ..., Mpk?

GP2 2018-12-13 16:25

[QUOTE=ATH;502622]It seems like a very inefficient way of doing it. He must find primes q in the current range in increasing order, then factor q-1 = 2*p1*p2*...*pk, and then check whether q is a factor of each of Mp1, Mp2, ..., Mpk?[/QUOTE]

In the very long "[URL="https://www.mersenneforum.org/showthread.php?t=19014"]User TJAOI[/URL]" thread there is some discussion and speculation about his methodology.

His progress is pretty rapid, so either he's using an efficient method or he has enormous resources.

James Heinrich 2018-12-13 23:51

If anyone cares, I have a small summary page on TJAOI's activity here:
[url]https://www.mersenne.ca/tjaoi.php[/url]

He's very regular: currently he spits out 3 sequential days of ~2000 factors, then takes a 3-day break, then produces a handful of much larger factors, and repeats that cycle every week.

storm5510 2018-12-15 02:34

[QUOTE=James Heinrich;502692]If anyone cares, I have a small summary page on TJAOI's activity here:
[URL]https://www.mersenne.ca/tjaoi.php[/URL]

He's very regular, currently spitting out 3 sequential days of ~2000 factors, then 3 days break, then a handful of much-larger factors, and repeat that cycle every week.[/QUOTE]

I looked at your link above. He's running a pattern. Is he using [I]mfaktc[/I] to do this, or something else?

James Heinrich 2018-12-15 02:43

Something else. You can find more details about it in the above-linked thread, including [url=https://www.mersenneforum.org/showpost.php?p=502828&postcount=461]a recent post[/url] showing the general technique.

storm5510 2018-12-15 14:16

[QUOTE=James Heinrich;502839]Something else. You can find more details about it in the above-linked thread, including [URL="https://www.mersenneforum.org/showpost.php?p=502828&postcount=461"]a recent post[/URL] showing the general technique.[/QUOTE]


[CODE]\\ PARI/GP: scan primes q == +-1 (mod 8) and report which Mersenne numbers they divide
q=10^3;
while(q<10^5,
  \\ distinct prime divisors of q-1 (first row of factorint's matrix, transposed)
  v=factorint(q-1)~[1,];
  for(i=1,#v,
    \\ q divides 2^v[i]-1 exactly when 2^v[i] == 1 (mod q)
    if(v[i]<10^9&&Mod(2,q)^v[i]==1,
      print(q" divide 2^"v[i]"-1");
      break
    )
  );
  \\ any factor of a Mersenne number is == +-1 (mod 8)
  until(Mod(q,8)==1||Mod(q,8)==7,
    q=nextprime(q+1)
  )
)[/CODE]
I can follow a little of this. It's a language I've not seen before. :no:

GP2 2018-12-15 16:31

[QUOTE=storm5510;502875]I can follow a little of this. It's a language I've not seen before. :no:[/QUOTE]

I think it's [URL="https://pari.math.u-bordeaux.fr/"]PARI/GP[/URL]
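For anyone who doesn't read PARI/GP, the snippet enumerates primes q ≡ ±1 (mod 8), factors q-1, and tests whether 2^d ≡ 1 (mod q) for some prime divisor d of q-1, i.e. whether q divides the Mersenne number 2^d - 1. A rough Python equivalent (my own sketch of the same idea, not TJAOI's actual code; the PARI version's d < 10^9 bound is irrelevant at this small scale):

```python
def prime_divisors(n):
    """Distinct prime divisors of n by trial division (fine for small n)."""
    divs, d = [], 2
    while d * d <= n:
        if n % d == 0:
            divs.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        divs.append(n)
    return divs

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def mersenne_factors(lo, hi):
    """Yield (q, d): prime q in [lo, hi), q == +-1 (mod 8), q divides 2^d - 1."""
    for q in range(lo, hi):
        if q % 8 in (1, 7) and is_prime(q):
            for d in prime_divisors(q - 1):
                if pow(2, d, q) == 1:  # q divides 2^d - 1
                    yield (q, d)
                    break

found = list(mersenne_factors(10**3, 10**5))
print(found[:3])
```

The key property it exploits: any factor q of 2^p - 1 (p prime) satisfies q ≡ 1 (mod 2p) and q ≡ ±1 (mod 8), so the exponents q can divide are found among the prime divisors of q - 1.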

storm5510 2018-12-15 17:02

[QUOTE=GP2;502884]I think it's [URL="https://pari.math.u-bordeaux.fr/"]PARI/GP[/URL][/QUOTE]

Interesting. Everything I know is antiquated: Pascal, COBOL, Fortran, and, shall I say it, B.A.S.I.C. All of these I learned at the local community college when I went there back in the late '80s and early '90s. There was also assembly language. That sailed over my head.

I have experimented with .Net Framework and had some success. All of this is so vast. I have Visual Studio 2017 installed now. I've not spent much time with it though.

Sorry for going [U]off-topic[/U].:blush:

kriesel 2018-12-15 18:22

[QUOTE=storm5510;502889]Interesting. Everything I know is antiquated: Pascal, COBOL, Fortran, and, shall I say it, B.A.S.I.C. All of these I learned at the local community college when I went there back in the late '80s and early '90s. There was also assembly language. That sailed over my head.

I have experimented with .Net Framework and had some success. All of this is so vast. I have Visual Studio 2017 installed now. I've not spent much time with it though.

Sorry for going [U]off-topic[/U].:blush:[/QUOTE]
I think it would be good practice for such code posts to include a comment at the first line in the relevant syntax, identifying the language (and perhaps even compatible version number, & OS if relevant). Computing has become such a Babel over the decades, almost no one can know them all by sight.
Assembly language is great fun, especially when execution time is strictly limited by the necessity to perform all functions within a short time for real time control of robotics with slow processors (think half-millisecond for a well crafted 16 bit multiply via a subroutine, 2/3 millisecond for a 28 bit input square root, built from shift, add, subtract, and conditionals). Or when squeezing out a bit more GIMPS performance from a new flavor of chip, as Woltman routinely does.

ixfd64 2018-12-21 20:51

I might be getting access to a multi-GPU system in the not-too-distant future and have a few questions:

1. It's my understanding that I need an mfaktc instance for each GPU. Is there a way to tell mfaktc to point to a certain folder, or do I need separate copies of mfaktc for each instance?

2. For CLI-only operating systems, mfaktc should work with a terminal multiplexer like tmux or GNU Screen, right?

Thanks!

Mark Rose 2018-12-21 21:24

[QUOTE=ixfd64;503594]I might be getting access to a multi-GPU system in the not-too-distant future and have a few questions:

1. It's my understanding that I need an mfaktc instance for each GPU. Is there a way to tell mfaktc to point to a certain folder, or do I need separate copies of mfaktc for each instance?

2. For CLI-only operating systems, mfaktc should work with a terminal multiplexer like tmux or GNU Screen, right?

Thanks![/QUOTE]

1. I create a directory for each mfaktc instance, and change to that directory to run mfaktc. I actually symlink the binary to the one I compile myself.

2. Yes, I run it in screen. It should work fine in tmux, too.

storm5510 2018-12-23 16:10

A Balancing Act
 
1 Attachment(s)
I have been running a lot of [I]Prime95[/I] recently. I was not fond of running that together with [I]mfaktc[/I]. So, I did a video card swap with my older HP. The HP could not handle the 1080 running at its factory defaults, so I am using [I]MSI Afterburner[/I] to under-clock it.

At the factory defaults, I would get a hard reset after a few seconds. I throttled it back to 65%. At this setting, the temperature stays around 60°C, and it is still running above 900 GHz-days/day. Another concern about heat is the yellow and black power adapter plug. The wire is too thin for this. It is warm to the touch, but nowhere near what I would consider hot. I increased the speed of the case fan to compensate; it is high capacity and moves a lot of air.

The thing about this HP is that I can give it a job, turn the monitor off, and let it run for days, or even weeks, without looking at it. In this setup, I will check it a few times a day, maybe.

After I looked at the attached photo, I saw that I need to do a [U]lot[/U] of cleaning. This brown dust gets into everything.

chris2be8 2018-12-23 16:28

Just running a vacuum cleaner over it may not be enough. I've recently had one system crashing because of dust and fluff built up under the fan clogging the heat sink. I had to scrape the fluff with a straightened out paperclip to get it loose enough for the vacuum cleaner to suck it out.

It would have been easier if I could have taken the fan off the heat sink so I could get at the fluff.

Chris

GP2 2018-12-23 17:52

mfaktc uses 4620 classes = 2 × (2×3×5×7×11)

Why not add × 13 ? Wouldn't it filter even better that way?

TheJudger 2018-12-23 22:15

[QUOTE=GP2;503795]mfaktc uses 4620 classes = 2 × (2×3×5×7×11)

Why not add × 13 ? Wouldn't it filter even better that way?[/QUOTE]

Yes, but there would be even more overhead in handling those residue classes, too. Of course the overhead could be reduced in some places, but it isn't really worth doing so.

Oliver

storm5510 2018-12-23 23:50

[QUOTE=chris2be8;503790]Just running a vacuum cleaner over it may not be enough. I've recently had one system crashing because of dust and fluff built up under the fan clogging the heat sink. I had to scrape the fluff with a straightened out paperclip to get it loose enough for the vacuum cleaner to suck it out.

It would have been easier if I could have taken the fan off the heat sink so I could get at the fluff.

Chris[/QUOTE]

This is a [U]soon[/U] to-do item. I used canned air first, then a Dust Buster vac. The heat-sink on this is a bit strange and I'm not exactly sure how the fan comes off. A great thing to use on fan blades, if you can get it out, is an old toothbrush.

Uncwilly 2018-12-24 02:56

[QUOTE=storm5510;503812]This is a [U]soon[/U] to-do item. I used canned air first, then a Dust Buster vac.[/QUOTE]Using the canned air in one hand and the vac in the other can work well. PrimeMonster or CheeseHead used to extol the virtues of this method.

GP2 2018-12-24 03:19

[QUOTE=TheJudger;503806]Yes, but there would be even more overhead in handling those residue classes, too. Of course the overhead could be reduced in some places, but it isn't really worth doing so.[/QUOTE]

What is the biggest source of overhead? Is it the sieving that is done after the classes are selected, or something to do with the GPU, or something else?

Currently, 960/4620 means that 20.78% of the classes are "classes_needed", and then I guess sieve_init_class filters out some percentage of the remainder.

But if it was possible to go all the way to 2 × (2 × 3 × 5 × 7 × 11 × 13 × 17 × 19 × 23 × 29), then there would be a lot more classes but only 15.79% of those classes would be needed. It would be one-quarter fewer as a percent of the total classes. Presumably in this case sieve_init_class could be configured to do considerably less sieving.

Maybe the number of classes itself could be dynamically adjustable by the program. Presumably you want fewer classes if you are TFing to a smaller number of bits and zipping through each exponent in a couple of seconds, and more classes if you are using much higher TF limits and each exponent may take hours.

The code already has the MORE_CLASSES adjustment between 420 and 4620, which is presumably based on such considerations, but it's fixed at compile time.

I'm sure everything above has already been thought about, so I'm hoping to understand better in what areas the overhead increases and in what ways.

James Heinrich 2018-12-24 03:25

[QUOTE=GP2;503820]The code already has the MORE_CLASSES adjustment between 420 and 4620, which is presumably based on such considerations, but it's fixed at compile time.[/QUOTE]For what it's worth, mfakt[b]o[/b] has a similar option, but it's set in .ini (MoreClasses=1) and not at compile-time.

storm5510 2018-12-24 14:38

[QUOTE=Uncwilly;503819]Using the canned air in one hand and the vac in the other can work well. PrimeMonster or CheeseHead used to extol the virtues of this method.[/QUOTE]

Not a bad idea. :smile:

I found a GPU power adapter cable on Amazon that is an extension. It does not split into two like the one I am using now, and it is heavier-gauge wire as well. For now, I am running it with the side cover off. I was able to sneak the power consumption up to 70% with no issues. Something I have not mentioned is PSU heat: the exhaust outlet on the back is relatively cool to the touch.

Some here may think all this is a bit off-topic. It is [I]mfaktc[/I] that I am running. [I]GPUto72[/I]

TheJudger 2018-12-25 17:43

Hi,

[QUOTE=GP2;503820]What is the biggest source of overhead? Is it the sieving that is done after the classes are selected, or something to do with the GPU, or something else?

Currently, 960/4620 means that 20.78% of the classes are "classes_needed", and then I guess sieve_init_class filters out some percentage of the remainder.

But if it was possible to go all the way to 2 × (2 × 3 × 5 × 7 × 11 × 13 × 17 × 19 × 23 × 29), then there would be a lot more classes but only 15.79% of those classes would be needed. It would be one-quarter fewer as a percent of the total classes. Presumably in this case sieve_init_class could be configured to do considerably less sieving.

Maybe the number of classes itself could be dynamically adjustable by the program. Presumably you want fewer classes if you are TFing to a smaller number of bits and zipping through each exponent in a couple of seconds, and more classes if you are using much higher TF limits and each exponent may take hours.

The code already has the MORE_CLASSES adjustment between 420 and 4620, which is presumably based on such considerations, but it's fixed at compile time.

I'm sure everything above has already been thought about, so I'm hoping to understand better in what areas the overhead increases and in what ways.[/QUOTE]

One source of overhead is that the GPU queue runs empty after each class. You fire a lot of kernel invocations at your GPU and shouldn't assume that they are executed in that order (though it is possible to make sure they run in a specific order). Letting the queue run empty after each class is just simpler and doesn't cost that much.
More classes means the distance between two factor candidates increases - at some point you have to take care of this, too (e.g. 64-bit instead of 32-bit difference values).
Another source of overhead is the fact that mfaktc doesn't handle partial blocks; the last block is always fully used, and with more classes the work above the upper limit increases as well.
You can compare the two versions (420 vs. 4620 classes) and give an estimate of when the next prime should be included in the number of residue classes.

Oliver

GP2 2018-12-25 22:05

[QUOTE=TheJudger;503945]One source of overhead is that the GPU queue runs empty after each class. You fire a lot of kernel invocations at your GPU and shouldn't assume that they are executed in that order (though it is possible to make sure they run in a specific order). Letting the queue run empty after each class is just simpler and doesn't cost that much.

[...]

Another source of overhead is the fact that mfaktc doesn't handle partial blocks; the last block is always fully used, and with more classes the work above the upper limit increases as well.
[/QUOTE]

Is it somehow possible, at least in principle, to accumulate full-size blocks that may include work from more than one class, or even more than one exponent, and then send that as a batch to the GPU?

In other words, how much of the overhead is fundamental to the way the CUDA code works, and how much is just a legacy implementation choice? I know very little about CUDA and am trying to decide if it's worth learning more.

[QUOTE]
More classes means the distance between two factor candidates increases - at some point you have to take care of this, too (e.g. 64-bit instead of 32-bit difference values).[/QUOTE]

But most people will have a 64-bit architecture, I think? Or are you referring to single-precision vs. double-precision speed within the GPU?

preda 2018-12-26 07:43

[QUOTE=GP2;503961]Is it somehow possible, at least in principle, to accumulate full-size blocks that may include work from more than one class or even more than one exponents, and then send that as a batch to the GPU?

In other words, how much of the overhead is fundamental to the way the CUDA code works, and how much is just a legacy implementation choice? I know very little about CUDA and am trying to decide if it's worth learning more.[/QUOTE]

My impression is that you can't easily combine blocks or exponents. This is not because of CUDA but because of the algorithm -- which pre-computes a set of values that are per-exponent, and per-class, once before starting each class.

About learning more -- sure, it's worth it if you have the time and the inclination. The concepts are pretty much exactly the same between CUDA and OpenCL, only the details differ, so you almost get to learn OpenCL at the same time :)

[QUOTE]
But most people will have a 64-bit architecture, I think? Or are you referring to single-precision vs. double-precision speed within the GPU?[/QUOTE]

It's a question of the memory size taken by a large number of 32-bit integers vs. the same number of 64-bit integers.

R. Gerbicz 2018-12-26 12:33

Interesting discussion; there are lots of misunderstandings here.

[QUOTE=TheJudger;503945]
Another source of overhead is the fact that mfaktc doesn't handle partial blocks; the last block is always fully used, and with more classes the work above the upper limit increases as well.
You can compare the two versions (420 vs. 4620 classes) and give an estimate when the next prime should be included in the number of residue classes.
[/QUOTE]

So at any one time you are sieving only one residue class per GPU run? In that case you are being too restrictive; I'll return to this.

[QUOTE=GP2;503820]Currently, 960/4620 means that 20.78% of the classes are "classes_needed", and then I guess sieve_init_class filters out some percentage of the remainder.

But if it was possible to go all the way to 2 × (2 × 3 × 5 × 7 × 11 × 13 × 17 × 19 × 23 × 29), then there would be a lot more classes but only 15.79% of those classes would be needed. It would be one-quarter fewer as a percent of the total classes. [/QUOTE]

Almost: in theory you get that speedup, but only in the sieve. In the mp%q calculations (where mp=2^p-1) it gives no speedup. But there is a trade-off here: a lower sieving cost would allow a higher sieve limit, trading somewhat more sieve time for fewer mp%q calculations, to reach the optimal total computation time.

The sieving interval's length is approx 2^74/M ~ 2.3*10^9 for p~2^26, N1=2^74, N2=2^75 (see below for M=8*p*3*5*7*11*13). That is still way too large for a smallish 32000 (?) sieve length.
[You can try more primes, but using the primes up to 29 is excessive; that gives roughly 11000 for the length.]
Well, I'm not coding in gpu, but here are my thoughts:

[url]https://www.mersenneforum.org/showthread.php?t=22435&page=13[/url] - maybe gap12 is/was the latest code from me. See my prime gap code; the task is comparable, and shows what you can do in an even harder problem where the sieving itself is harder. There it was true sieving - we sieved up to sqrt(n) on the sieve intervals - and we used a large modulus in a much more complicated setup: m=2*3*5*7*11*13*17*19. (You could ask why we didn't use the next prime: as in your case, there is an overhead when you switch intervals.) These tasks have a similar flavor; for one large residue we sieved an interval of ~1e15 numbers.

Returning to your problem:

Say you want a sieve for q=2*k*p+1=N1..N2 in range (say p~2^26 in the current wavefront, and N1=2^74,N2=2^75 a typical(?) input).

using small primes in the sieving prime (and also knowledge of q mod 8=1,7 from quad rec.) :

q=res+k*M for k=k0..k1, and here k0,k1 is independent from res (this k is not the same as above).
M=8*p*3*5*7*11*13.
where k0=floor(N1/M) and k1=floor(N2/M).
With this you will sieve at most two extra numbers per residue class (at most one number < N1 and at most one > N2 per res).
Note that res==1 mod (2*p), because q==1 mod (2*p).

If you have cnt residues (mod M) then you'll sieve NR = cnt*(k1-k0+1) = 11520*(k1-k0+1) numbers in our case (if you're using the primes up to 13). With nothing more than elementary-school arithmetic you can distribute these numbers into C classes almost equally: the numbers in [floor(i*NR/C), floor((i+1)*NR/C)) go into the i-th class/task, for i=0..(C-1).
And this is independent of 11520 and of the number of GPU threads; you can use C=1, C=1000, C=1024, C=20480 (yes, smaller or larger than 11520). From the value of i you can recover the residues and the intervals (for each residue) that you need to do. Yes, different threads could work on the same residue class, and the same thread could do multiple residues (of course not at once in this case)!

About checkpoints:
Don't know how you are doing checkpoint(s), but with 1024 GPU threads and C=10000 you would lose only ~T/20 of the work on average (where T is the total computation time) if you distribute 1000 tasks ten times. (Note that here we are not using all tasks at once, and even though 10000 < 11520, the above 'equal way' distribution works nicely.) For a much deeper task (N2~2^80 or so) use a larger C and/or more small sieve primes (17, 19, 23).

[QUOTE=TheJudger;503945]Hi,
More classes means the distance between two factor candidates increases - at some point you have to take care of this, too (e.g. 64-bit instead of 32-bit difference values).
[/QUOTE]

Using 64 bits? Where in the sieving, when you are using sieve primes up to, say, 32000?

For the sieve initialization in the CPU case I would precompute and store ((2*p) mod R) and (M mod R) for each sieving prime R < 32000 (or whatever limit you are using). But maybe that isn't needed on the GPU; you have to compute it when you start/switch to a new residue anyway, so basically you don't need to store/precompute it.

R. Gerbicz 2018-12-26 13:17

One more minor thought:
If you are using (much) more tasks than threads, then it is possible that the first factor you find is not the smallest prime factor
of mp=2^p-1, but that is no issue for us...
In any case, in these situations it would be possible to scan the whole interval to find the smallest/all prime factors; since the probability of success is small, it would change almost nothing.
Yes, finding a smaller factor is easier/faster, but the difference is tiny, roughly up to a constant multiplier: 1/74 - 1/75 ~ 1/74^2 for N1=2^74, N2=2^75.

TheJudger 2018-12-26 23:32

[QUOTE=GP2;503961]But most people will have a 64-bit architecture, I think? Or are you referring to single-precision vs. double-precision speed within the GPU?[/QUOTE]

Not really: the Maxwell and Pascal generations have only 16-bit integer multiplication in hardware (a 32-bit multiply is constructed from four 16-bit multiplications).

