Old 2018-05-29, 03:10   #2
kriesel
 
GpuOwL run time vs exponent or fft length or version

RX550 data

gpuOwl v2, 5000K fft, on an RX550 gpu with the MSI 18.2.3 driver (Feb 26 2018), in a quick test (~40,000 iterations each), ran at:
short carry 17.3 ms/iter,
medium carry 17.6 ms/iter,
long carry 17.4 ms/iter,
compared to V1.9 gpuOwL on the same gpu, same pcie physical connection, with the April 2017 MSI driver:
10.9 ms/iter for -fft DP -legacy -size 4M;
18.9 ms/iter for -fft M61 -size 4M;
21.4 ms/iter for -fft DP -legacy -size 8M.
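
To put ms/iter figures like these in perspective, they can be converted to an approximate total run time, since a PRP or LL test of exponent p takes about p iterations. The following is only a minimal sketch: the exponent is a hypothetical value chosen for illustration (it is not one of the exponents benchmarked here), the function name is made up, and startup, checkpoint, and other overhead are ignored.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def estimated_days(exponent, ms_per_iter):
    """Approximate run time in days, assuming about `exponent` iterations."""
    return exponent * ms_per_iter / MS_PER_DAY


if __name__ == "__main__":
    # Hypothetical exponent, for illustration only (not from the measurements).
    example_exponent = 77_000_000

    # ms/iter values quoted in the post for the RX550.
    timings = [
        ("gpuOwl v2, short carry", 17.3),
        ("gpuOwL V1.9, -fft DP -legacy -size 4M", 10.9),
    ]
    for label, ms in timings:
        days = estimated_days(example_exponent, ms)
        print(f"{label}: about {days:.1f} days")
```

The estimate is linear in both the exponent and the ms/iter value, so a 5% change in iteration time shifts the projected run time by the same 5%.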

The driver change coincided with an increase of about 5% in iteration time, on the same gpu, in V1.9 gpuOwL. http://www.mersenneforum.org/showpos...&postcount=370
See the first attachment below for V1.9 on an RX550.

See also the 4-program speed comparison in the general reference thread. http://www.mersenneforum.org/showpos...76&postcount=8

For an RX480, my data indicate it is 3.4-3.6 times faster than the RX550,
on the same exponents and gpuOwL versions, at http://www.mersenneforum.org/showpos...&postcount=386 and subsequently.

An Intel IGP HD620 could run V0.5 or V1.9, but it was not worth doing. On mine, running gpuOwL reduced prime95 throughput by more than the gpuOwL throughput gained. More detail on the V0.5 try (LL): http://www.mersenneforum.org/showpos...&postcount=176 (I discontinued running gpuOwl on the IGP. The tradeoff with mfakto there was much better.)
Detail on the V1.9 try (PRP): http://www.mersenneforum.org/showpos...&postcount=285
A listing of V3.5 OpenOwL command line options and fft lengths can be found at http://www.mersenneforum.org/showpos...&postcount=565

Detail on benchmarking V3.3 and V3.5 OpenOwL fft lengths on RX480 can be found at http://www.mersenneforum.org/showpos...&postcount=570

Second attachment below tabulates ms/iteration timings for various versions, V3.x - V3.9, V4.6, and V5.0, and fft lengths, on an RX480, and includes some graphs and ratios.

Third attachment compares V6.2, 5.0, 3.8, 2.0, and 1.9. Each is fastest for some fft length / exponent ranges, except v2.0. The trend line fit for asymptotic scaling of the fastest version versus fft length or exponent gives iteration time proportional to p^1.078, so run time proportional to p^2.078, for exponents 100M < p < ~2520M (6M to 144M fft length).
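
As a rough illustration of what that fit implies, the sketch below scales timings between two exponents using the fitted power 1.078 for iteration time, and one additional power of p for run time because the number of iterations is about p. The function names are made up for this example, and the relation should only be trusted within the fitted range of roughly 100M < p < 2520M.

```python
ITER_TIME_POWER = 1.078  # fitted exponent for iteration time vs. p (from the post)


def iteration_time_ratio(p_new, p_ref):
    """Ratio of ms/iter at p_new to ms/iter at p_ref under the fitted power law."""
    return (p_new / p_ref) ** ITER_TIME_POWER


def run_time_ratio(p_new, p_ref):
    """Ratio of total run times: iteration count (~p) times time per iteration."""
    return (p_new / p_ref) * iteration_time_ratio(p_new, p_ref)


if __name__ == "__main__":
    p_ref = 100_000_000   # lower end of the fitted range
    p_new = 200_000_000   # double the reference exponent
    print(f"iteration time ratio: {iteration_time_ratio(p_new, p_ref):.3f}")
    print(f"run time ratio:       {run_time_ratio(p_new, p_ref):.3f}")
```

Under this fit, doubling the exponent somewhat more than doubles the iteration time and somewhat more than quadruples the total run time.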

Updated timings for RX480 and Radeon VII under Windows 7 and 10 respectively, up to Gpuowl v7.2-69 are included in the fourth and fifth attachments. These are works in progress currently. (Lots of data points, so reading glasses and zoom.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf speeds and limits.pdf (13.3 KB, 298 views)
File Type: pdf openowl v3x 46 50 timings.pdf (28.3 KB, 255 views)
File Type: pdf v6.2 etc benchmarks.pdf (37.6 KB, 259 views)
File Type: pdf owl many versions benchmarked rx480.pdf (79.8 KB, 83 views)
File Type: pdf owl many versions benchmarked radeon7.pdf (72.6 KB, 87 views)

Last fiddled with by kriesel on 2021-04-16 at 20:46 Reason: more benchmarks for Radeon VII