mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2018-08-04, 16:39   #34
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×32×7×43 Posts
Default

Quote:
Originally Posted by ktpn2011 View Post
this is on CPU E8500, Win XP32.
I like code optimization in direct asm.
So now I like to see other code speeds compared to others coding & publish my codes.
I am ready to start optimize other interesting codes..
I like your enthusiasm. Please take all the following as friendly advice. Don't be discouraged by the following. Coding talent and time is in limited supply. I'd like to see it all applied effectively.

I want to reiterate here, that the efficient sequence seems to me to be learning about the available algorithms and their strengths and applicabilities first, selecting an efficient algorithm which scales well to exponents of interest next, coding it efficiently in a high level language next, optimizing (key portions of) it such as implementing in tuned assembly code, last.

Given the meaning of optimization, and the actual and extrapolated relative run times of accepted commonly used GIMPS code and your ASM LL code, your reference to optimization occurs to me as a misuse of the word optimization. Your ASM LL code extrapolates to literally millions of times too slow to be useful in the exponent range of interest now in GIMPS, which is p~44,000,000 and up for double checks, ~81 million and up to 109. (Years or decades projected run time for exponents of 3 to 7 million, that are commonly used as software and hardware quick reliability tests of minutes or hours duration; thousands of years projected run time per exponent for current LL double checks, to ~25 million years for the maximal GIMPS mersenne.org exponent, p ~109.) It is extrapolating to slower than a simple Perl & BigInt library implementation beginning around p=105. I have not made any adjustments for hardware speed differences, just noted the hardware involved.

I used a few high precision timings from your zip file along with your posted timings for exponent between 10k and 20k to regression fit a power function to extrapolate run time, and did similar for a couple other programs and a hypothetical fft implementation. See the attachment.

Code:
# perlll3.pl (C) 2018 Kriesel an adaptation to LL w/o Jacobi check, from perlpm1.pl with Jacobi check testing, with perhaps some lingering unneeded code

sub gettimevals { # reuse my std routine, some of which is not necessary here
  our $localtimestring=localtime; # global so can be used elsewhere
  our $gmtimestring=gmtime;      # "                "
  # pattern matches are presumed to succeed, not tested for success
  my ( $dow, $mon, $dom, $time, $yr ) = $localtimestring =~ /^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)$/;
  my ( $gmdow, $gmmon, $gmdom, $gmtime, $gmyr ) = $gmtimestring =~ /^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)$/;
  $localtimestring = $dow.' '.$yr.' '.$mon.' '.$dom.' '.$time; #reassemble the pieces in my preferred order
  $gmtimestring = $gmdow.' '.$gmyr.' '.$gmmon.' '.$gmdom.' '.$gmtime;
  time();  # this is the return value of the subroutine, integer seconds count from start of the epoch
}

sub zerothecounts {
  $nprimeexponents=0;
  $mersennecount=0;
}

use bigint;
use Math::Prime::Util qw/is_prime/;
use Math::BigInt lib => 'GMP'; #http://perldoc.perl.org/Math/BigInt.html

print "Perlll3.pl v 0.10 Aug 3 2018        (C) 2018 Kriesel\n";

$verbosity=0;
$plo = 132048; # start with an odd prime, and it makes no sense to start with a value so small there's no odd prime < $p 
$phi = 132050; # upper limit of primes p to be tested; recommend < 100K, is ~1 hour/exponent on an i3 M370; scales as ~ p^2.558
if ( $plo %2 == 0 ) { $plo++; } # ensure start with an odd prime

my $r = Math::BigInt->new(1);
my $t64 = Math::BigInt->new(1);
$t64 = 2**64; 
zerothecounts;
$time0 = gettimevals;

for ($p=$plo; $p<$phi; $p+=2 ) {
  if ( is_prime($p) ) { # try on all prime exponents >=p lo, <p hi (no TF or P-1 here, or avoidance of already-known nonprimes)
    $nprimeexponents++;
    $mp = 2**$p-1;
    $r->bone();
    $r->bmul(4);
    $time1 = gettimevals;
    for ( $j=1; $j<=$p-2; $j++) {
      $r->bmodpow(2,$mp);
      $r->bsub(2);
    }
    $time2 = gettimevals;
    $dt = $time2 -$time1;
    if ( $dt == 1 ) { $pls=''; } else { $pls='s'; }
    $r64 = ( $r-> bmod ( $t64 ) );
    if ( $r64 > 0 ) {
      print "p=$p composite, res64 =";
      printf " %016x %d second%s\n",$r64, $dt, $pls;
    } else {
      print "p=$p Mersenne prime in $dt second$pls.\n";
      $mersennecount++;
    }
  }
}
$dt = $time2 -$time0;
print "Number of prime exponents $nprimeexponents, number of Mersenne primes $mersennecount, total run time $dt seconds.\n";
Attached Files
File Type: pdf perl bigint vs asm.pdf (20.7 KB, 113 views)

Last fiddled with by kriesel on 2018-08-04 at 16:50
kriesel is online now   Reply With Quote
Old 2018-08-04, 17:36   #35
ktpn2011
 
Aug 2018
GEORGIA Republic

2810 Posts
Default

kriesel, trust me, I fully understand and never argued, about FFT vs simple Math.
you don't need to calculate, how long it will take..

Start: 18:52:42.906
Prime_86243 RES64: 0000000000000000 19:30:44.578

also, just did with my code Square 12500000 -> 3005sec.
I just think, that, maybe my code is better then compiler's compiled?!
ktpn2011 is offline   Reply With Quote
Old 2018-08-04, 20:10   #36
science_man_88
 
science_man_88's Avatar
 
"Forget I exist"
Jul 2009
Dumbassville

26×131 Posts
Default

Quote:
Originally Posted by ktpn2011 View Post
also, just did with my code Square 12500000 -> 3005sec.
I just think, that, maybe my code is better then compiler's compiled?!
you're either missing prefixes, or a decimal point, or orders of magnitude off. PARIdroid [2,11,0,22860,"5579deb0b"] can complete that 10 million times in roughly 4.581 seconds. ASM can be slower if poorly written, wrong algorithm used, etc.
science_man_88 is offline   Reply With Quote
Old 2018-08-04, 20:24   #37
ktpn2011
 
Aug 2018
GEORGIA Republic

348 Posts
Default

I used simple Squaring algorhitm, as enhanced multiplication case..
what you did not get?
ktpn2011 is offline   Reply With Quote
Old 2018-08-04, 20:27   #38
science_man_88
 
science_man_88's Avatar
 
"Forget I exist"
Jul 2009
Dumbassville

100000110000002 Posts
Default

Quote:
Originally Posted by ktpn2011 View Post
I used simple Squaring algorhitm, as enhanced multiplication case..
what you did not get?
so you think 23.477 seconds is good for a single digit multiply by computer ?
science_man_88 is offline   Reply With Quote
Old 2018-08-04, 20:32   #39
ktpn2011
 
Aug 2018
GEORGIA Republic

22·7 Posts
Default

ah, sorry!
I mean 10^12500000

edit:
1 with 12499968 nulls

Last fiddled with by ktpn2011 on 2018-08-04 at 20:38
ktpn2011 is offline   Reply With Quote
Old 2018-08-04, 20:38   #40
paulunderwood
 
paulunderwood's Avatar
 
Sep 2002
Database er0rr

72618 Posts
Default

If FFT is a bit daunting for you, you could easily implement Karatsuba.
paulunderwood is offline   Reply With Quote
Old 2018-08-04, 20:40   #41
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

541810 Posts
Default

Quote:
Originally Posted by ktpn2011 View Post
kriesel, trust me, I fully understand and never argued, about FFT vs simple Math.
you don't need to calculate, how long it will take..

Start: 18:52:42.906
Prime_86243 RES64: 0000000000000000 19:30:44.578

also, just did with my code Square 12500000 -> 3005sec.
Replacing the extrapolated value for 86243, with the delta time from the above, raises the power of the fit, and worsens the run time estimation at the top end, to over 53,164,000 years.

In http://www.mersenneforum.org/showpos...5&postcount=23 you wrote:
Quote:
So now I like to see other code speeds compared to others coding & publish my codes.
which I thought meant you wanted to see how your ASM LL code compared to other programs for the same function and exponents. Others, that have produced the current leading software, have done the same, and sought help from forum members for ideas, code contributions, or QA. I initially thought/hoped you were attempting to make usefully fast code for GIMPS use. (Which would require switching to fft squaring eventually.)

Anyway, I hope you had fun doing it.

And I always appreciate when someone located & perhaps raised in another country posts in English, because my Japanese, Romanian, Russian, French, etc. is nonexistent.
kriesel is online now   Reply With Quote
Old 2018-08-04, 20:47   #42
ktpn2011
 
Aug 2018
GEORGIA Republic

22×7 Posts
Default

No, what a misunderstanding..
yet I am talking about simple algos.

btw, here is results of your perl on my pc:

Code:
Perlll3.pl v 0.10 Aug 3 2018        (C) 2018 Kriesel
p=11213 Mersenne prime in 6 seconds.
p=19937 Mersenne prime in 23 seconds.
p=21701 Mersenne prime in 29 seconds.
p=23209 Mersenne prime in 33 seconds.
p=44497 Mersenne prime in 156 seconds.
p=86243 Mersenne prime in 732 seconds.
but look, you use
$r->bmodpow(2,$mp);
$r->bsub(2);

while I do
square / sub 2 / mod_for2p1
ktpn2011 is offline   Reply With Quote
Old 2018-08-04, 21:04   #43
ktpn2011
 
Aug 2018
GEORGIA Republic

22×7 Posts
Default

I changed your code to:

Code:
      $r->bpow(2);
      $r->bsub(2);
      $r->bmod($mp);
and now it is faster!


p=86243 Mersenne prime in 204 seconds.!! against 732

Last fiddled with by ktpn2011 on 2018-08-04 at 21:10
ktpn2011 is offline   Reply With Quote
Old 2018-08-05, 16:06   #44
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·32·7·43 Posts
Default

Quote:
Originally Posted by ktpn2011 View Post
I changed your code to:
Code:
      $r->bpow(2);
      $r->bsub(2);
      $r->bmod($mp);
and now it is faster!

p=86243 Mersenne prime in 204 seconds.!! against 732
That's interesting! I didn't try any optimization. Just a two-line core that produced correct res64 sequences;1st draft, print residues and verify correctness; 2nd, add timing per exponent, remove every-iteration residue print; 3rd, add counts and total timing. (4th, change to 3-line code, simplify timing code.) Even draft 2 indicated higher speed than your assembly code at exponents of interest to GIMPS, so there was no need to optimize the perl program, to make the point the assembler code was not optimal.

But you've now demonstrated your assembler code is >11. times slower, on the same hardware and exponent. (At p=86243, 2281.672 sec vs 204 = 11.18...)
On my i3-M370, for p=86243, I get not the 3.59:1 speedup you indicated, but more like 1.95:1. It's nonintuitive why there's any advantage; it's dealing with longer operands on average.
Code:
      $r->bpow(2); # p bit input, square, 2p bit result
      $r->bsub(2); # subtract scalar from 2p bit operand, 2p bit result
      $r->bmod($mp); # 2p bit input, p bit result
vs.
Code:
      $r->bmodpow(2,$mp); # p bit input, square, mod, p bit result
      $r->bsub(2); # subtract scalar from p bit operand, p bit result
The 3-line change gives not only faster at a given exponent, the regression fit indicates a lower power vs. exponent. The difference between fits is now up to exponent0.539, to the relative disadvantage of your assembler code over the 3-step perl code (p2.925 vs p2.386). This extrapolates to the perl code being 140,000 times too slow at the top end, and the ASM code being >53,000,000 too slow, to complete a maximal exponent in a year. CUDALucas on a GTX1080Ti run time is estimated to be about a year for p~109. The lower fit exponent for the perl code indicates the BigInt library may be automatically selecting schoolboy, Karatsuba, etc, as operand size changes.
kriesel is online now   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Primality test based on factorization of n^2+n+1 carpetpool Miscellaneous Math 5 2018-02-05 05:20
Composites that pass Mathematica's pseudoprime test grandpascorpion Math 15 2012-03-23 02:52
Prime numbers Grid, to test an odd integer on 44 Zarck Math 5 2012-03-06 14:43
Faster Lucas-Lehmer test using integer arithmetic? __HRB__ Software 188 2010-08-09 11:13
please help me pass the test. caliman Hardware 2 2007-11-08 06:12

All times are UTC. The time now is 18:13.


Sun Aug 1 18:13:18 UTC 2021 up 9 days, 12:42, 0 users, load averages: 3.08, 3.11, 2.67

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.