mersenneforum.org

The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

TheMawn 2015-05-01 19:28

Well, I only work in a TF environment so maybe that's the difference, but my TF rate increases slightly when my screen goes blank.

My best guess is under Power Options > Edit Plan Settings (for your specific power plan) > Advanced Power Options > PCI Express you might find something. All I have is the one option set to disabled so check that this is what you have also.

Otherwise, the fix is to set the display to always be on but turn it off yourself when you don't need it.

ixfd64 2015-05-02 03:34

I've noticed there hasn't been any new code checked in since late 2013. Has development ceased, or are people still working on it behind the scenes?

henryzz 2015-05-02 10:26

[QUOTE=TheMawn;401446]Well, I only work in a TF environment so maybe that's the difference, but my TF rate increases slightly when my screen goes blank.

My best guess is under Power Options > Edit Plan Settings (for your specific power plan) > Advanced Power Options > PCI Express you might find something. All I have is the one option set to disabled so check that this is what you have also.

Otherwise, the fix is to set the display to always be on but turn it off yourself when you don't need it.[/QUOTE]

Already off.

nucleon 2015-06-21 03:45

Are there any linux binaries around for P-1?

I tried compiling, but epic fails all around. I can only get latest environment, and that needs latest drivers which don't seem to work for me.

-- Craig

owftheevil 2015-06-21 03:59

You use Windows? Linux? I can help with Linux, but someone else will have to jump in if it's Windows.

nucleon 2015-06-21 07:55

[QUOTE=owftheevil;404479]You use Windows? Linux? I can help with Linux, but someone else will have to jump in if it's Windows.[/QUOTE]

I have a windows binary.

I need a binary for linux.

-- Craig

frmky 2015-06-23 06:53

[QUOTE=nucleon;404484]I need a binary for linux.[/QUOTE]
Try this one: [URL="https://www.dropbox.com/s/mr0z8e9pbifla4a/cudapm1-0.20.tar.gz?dl=0"]https://www.dropbox.com/s/mr0z8e9pbifla4a/cudapm1-0.20.tar.gz?dl=0[/URL]

It is compiled with cuda 5.5 for 64-bit linux.

nucleon 2015-06-24 23:46

[QUOTE=frmky;404632]Try this one: [URL="https://www.dropbox.com/s/mr0z8e9pbifla4a/cudapm1-0.20.tar.gz?dl=0"]https://www.dropbox.com/s/mr0z8e9pbifla4a/cudapm1-0.20.tar.gz?dl=0[/URL]

It is compiled with cuda 5.5 for 64-bit linux.[/QUOTE]

Thank you heaps.

Much appreciated.

nucleon 2015-06-26 23:44

I'm running the code from the previous link on a g2.8xlarge instance on AWS. I'm getting approximately 25GHz-days/day P-1 per GPU.

I'll also note that I've found two factors.

-- Craig

chalsall 2015-06-27 13:46

[QUOTE=nucleon;404870]I'm running the code from the previous link on a g2.8xlarge instance on AWS. I'm getting approximately 25GHz-days/day P-1 per GPU.[/QUOTE]

Personally (at least for TF'ing) I find the cg1.4xlarge instances to be better value (in us-east-1d -- "Spot" often less than $0.14 an hour for two Titans).

nucleon 2015-06-28 15:22

1 Attachment(s)
Last week hasn't been that great.

LaurV 2015-11-26 15:57

Does anyone have a v52 binary for win 64? Possibly with rt 5.5 or so?

I took about 30 assignments in 666M which I am TF-ing to ~80 bits, and at the same time (on a different card, in parallel) I do P-1 for the survivors. Stage 1 goes well (FFT size 38416k), but it crashes when entering stage 2. I had that problem long ago (see pages 38-42 of this thread), and it was fixed at the time by playing with the number of threads. I haven't updated cudapm1 since then, though I know there were some fixes. It works well for 333M, but these expos at 666M may be a bit too high...

LaurV 2016-01-18 06:32

Can someone teach me
1. Where is the last available win64 binary for cudaPM1 program?
2. How can I convince it to run only stage 1 of the algorithm?
3. In case of 2, how can I [STRIKE]resume[/STRIKE] [U]extend[/U] B1? (i.e. resuming already finished stage 1, with a larger B1, when I have the last checkpoint file saved at the end of stage 1 run with the older/smaller B1)
Thanks in advance.

ixfd64 2016-02-03 04:43

Two questions:

1. There was talk of adding worktodo.txt parsing to the program. Anyone know whether it has been implemented?
2. I asked this several months ago but didn't get an answer: the code hasn't been updated since late 2013. Has development ceased, or is someone still working on it behind the scenes?

kriesel 2017-05-01 20:32

worktodo; code development
 
[QUOTE=ixfd64;425040]Two questions:

1. There was talk of adding worktodo.txt parsing to the program. Anyone know whether it has been implemented?
2. I asked this several months ago but didn't get an answer: the code hasn't been updated since late 2013. Has development ceased, or is someone still working on it behind the scenes?[/QUOTE]

The worktodo was implemented. It certainly seems to be working on my test installation created days ago.
Some of the names here are the same as on cudalucas etc. Last I heard, flashjh (Jerry) has been working on updating CUDALucas on Windows to reflect code developed in 2013 and address some other bugs and wishlist items.
I'm seeing frequent halts to the cudapm1 program, running it on a GTX480. That's a CC2.0 card, subject to the driver timeout issue for Nvidia driver level >~300, regardless of whether it's running CUDALucas, CUDApm1, or anything else. The cudapm1 error message is:
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(3581) : cudaDeviceSynchronize() Runtime API error 30: unknown error.

I'm also seeing runs of round-off error under 0.08, followed by termination with this message from cudapm1:
err = 0.5 >= 0.40, quitting.
After restarting, it continues fine for about an hour, then has another round-off error and quits.
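
To illustrate what that quit message is checking, here is a minimal sketch. It assumes the usual meaning of round-off error in FFT-based multiplication (the largest distance of any result value from its nearest integer) and takes the 0.40 limit from the message above. The function names are hypothetical and this is not CUDAPm1's actual code.

```python
# Hypothetical sketch, not CUDAPm1's actual code.
ERR_LIMIT = 0.40  # limit taken from the "err = 0.5 >= 0.40, quitting." message


def roundoff_error(values):
    """Largest distance between any value and its nearest integer (0.0 to 0.5)."""
    return max(abs(v - round(v)) for v in values)


def should_quit(err, limit=ERR_LIMIT):
    """True when the observed error reaches the limit."""
    return err >= limit


healthy = [12.0, 7.03125, 3.9375]   # every value is close to an integer
corrupt = [12.0, 7.5, 3.9375]       # one value sits exactly between two integers

for name, data in (("healthy", healthy), ("corrupt", corrupt)):
    err = roundoff_error(data)
    action = "quitting" if should_quit(err) else "continuing"
    print(f"{name}: err = {err} -> {action}")
```

An error of exactly 0.5 is the worst possible value, since the nearest integer is then ambiguous, which fits a hardware fault better than a slowly growing numerical error.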

storm5510 2017-06-12 01:58

[QUOTE=kriesel;458044]...The cudapm1 error message is:
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(3581) : cudaDeviceSynchronize() Runtime API error 30: unknown error...[/QUOTE]

Keep plugging away at it. I believe several around here would like to see you succeed with this. I would. :smile:

kladner 2017-06-12 21:45

[QUOTE=ixfd64;425040]Two questions:

1. There was talk of adding worktodo.txt parsing to the program. Anyone know whether it has been implemented?
2. I asked this several months ago but didn't get an answer: the code hasn't been updated since late 2013. Has development ceased, or is someone still working on it behind the scenes?[/QUOTE]
IIRC, the author of CUDAPM1 is user owftheevil, whom I have not seen on the forum in a while. I don't know if anyone else did more work from the source code. Dubslow or FlashJH (Jerry) are some who might know more.

kriesel 2017-07-03 06:30

compile and sourceforge update request
 
1 Attachment(s)
Hi,

I've taken a stab at some minor cosmetic fixes for CUDAPm1 and its ini file. Could someone (perhaps batalov, flashjh, jgchilders, owftheevil, frmky?) please recompile for Windows (at least for 64-bit & CUDA 5.5; other variations can wait), get an updated .exe to me to test, and post the updated ini file to sourceforge? (It's been a long time since I mucked with any C of any flavor, so I'd like to test it myself first, and I don't have a build environment.)

From reading the forum recently, I see Friday Nov. 22 2013 owftheevil indicated making some of his last source code changes to date. The date of the executable at [URL]https://sourceforge.net/projects/cudapm1/files/CUDAPm1-0.20/[/URL] is a few days earlier than that (Nov 18 2013, [URL]http://www.mersenneforum.org/showthread.php?t=17835&page=39[/URL] post #427), while the date of the .cu file at [URL]https://sourceforge.net/p/cudapm1/code/HEAD/tree/trunk/[/URL] is a few days later (Monday Nov 25 2013), so I'm unsure whether the last fix by owftheevil is in the exe there. (Was r52 fully synced and does it contain the change described in #427 and any changes relating to #444, message date Nov 25 2013?)

Similarly the question "is it current with the latest code" arises with James Heinrich's mirror [URL]http://download.mersenne.ca/CUDAPm1/[/URL] where the exe date is also Monday Nov 18 2013.

So I suspect the available Windows executables currently correspond to r50, not r52.

Thanks,

Ken

kriesel 2017-07-03 06:38

Error 30
 
[QUOTE=storm5510;461081]Keep plugging away at it. I believe several around here would like to see you succeed with this. I would. :smile:[/QUOTE]

Sadly there does not seem to be much hope of resolving that recurrent Error 30. It's an NVIDIA driver issue impacting compute capability 2.0 or lower GPUs in combination with driver releases above around 300, if I recall correctly. There was an effort to persuade NVIDIA to fix it, but supporting older cards for a niche set of users wasn't enough of a priority.

kriesel 2017-07-03 07:00

Err=0.50>=0.40 (failing GPU)
 
[QUOTE=kriesel;458044]... I'm also seeing runs of round-off error under 0.08, followed by termination with this message from cudapm1:
err = 0.5 >= 0.40, quitting.
After restarting, it continues fine for about an hour, then has another round-off error and quits.[/QUOTE]

It appears that was a case of an older GPU declining in reliability. The failures became more frequent over time, to the point where the GPU would get stuck on a particular pass of stage 2 in CUDAPm1 no matter how many restarts were attempted. The problem pass varied from exponent to exponent. The checkpoint files were fine, and another GPU of the same model could carry them to completion without error. Thorough memory testing of the declining GPU showed that while a test of 10 blocks of 25MB ran error-free, several of blocks 23-40 produced errors, even when the card was significantly underclocked. I recommend testing essentially the full memory range. This GPU is likely to be replaced.

flashjh 2017-07-03 15:41

I can compile, I'll see if I can get it today.

storm5510 2017-07-16 00:54

1 Attachment(s)
This is interesting. I wanted to try it and see what happens. :smile:

kriesel 2017-07-16 14:49

device number
 
[QUOTE=storm5510;463503]This is interesting. I wanted to try it and see what happens. :smile:[/QUOTE]

Device numbering in CUDAPm1 is zero-based, if I recall correctly. It is so in CUDALucas. The first GPU device is 0, the second is 1, and so on. It defaults to device 0 if no device is specified on the command line or in the ini file. I think that message happens in any of the following (and possibly other) cases:

- A device number higher than the last device physically present and properly installed is specified, for example -d 2 on a system where only two GPUs, devices 0 and 1, are present.
- A device timeout has occurred and Windows hasn't yet restarted the display device driver, so from the point of view of the OS and app, the GPU is physically present but not available for use.
- A device timeout has occurred and Windows has attempted to restart the display device driver, but a thermal or other issue prevented the GPU from restarting, so the GPU is physically present but not available for use until the issue is resolved, at least temporarily, and the driver is restarted.
- The software was run on a system containing no qualifying device.
- The software was run on a system containing a qualifying device but with no suitable driver successfully installed and active.
- The version being run requires a CUDA level higher than the installed driver supports.
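
The first and fourth cases can be sketched as a simple range check. This is only an illustration of zero-based numbering. The function name and messages are made up, and the device count would in practice come from the CUDA runtime rather than being passed in by hand.

```python
def check_device(requested, device_count):
    """Return None if the zero-based device number is usable, else a reason string.

    Hypothetical helper for illustration; device_count stands in for whatever
    the CUDA runtime reports as the number of visible devices.
    """
    if device_count == 0:
        return "no qualifying device (or no working driver) is visible"
    if not 0 <= requested < device_count:
        return (f"device {requested} is out of range; "
                f"valid numbers are 0..{device_count - 1}")
    return None


# Two GPUs present: -d 0 and -d 1 are valid, -d 2 is not.
for d in (0, 1, 2):
    print(d, check_device(d, device_count=2) or "ok")
```

On a single-GPU system the only valid device number is 0, so a DeviceNumber of 1 in the ini file would fail this check.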

storm5510 2017-07-16 18:30

The 'DeviceNumber' was set at 1 in the configuration file. I changed it to zero. The application became responsive. It doesn't want to go beyond a 1000 iteration average error test.

kriesel 2017-07-18 05:13

CUDAPm1 startup
 
[QUOTE=storm5510;463546]The 'DeviceNumber' was set at 1 in the configuration file. I changed it to zero. The application became responsive. It doesn't want to go beyond a 1000 iteration average error test.[/QUOTE]

It can take a while, minutes, for the next line of output to appear, depending on what the setting for screen output interval is and the exponent or fft length.

For example, on a GTX480, it's nearly four minutes for 50,000 iterations below:
Iteration 1000, average error = 0.19992 x= 0.25 (max error = 0.26172), continuing test.
Iteration 50000 M43158547, 0xdd951715b61e6699, n = 2304K, CUDAPm1 v0.20 err = 0.29688 (3:45 real, 4.4892 ms/iter, ETA 16:46)
Iteration 100000 M43158547, 0xadcc2bec0b8ae426, n = 2304K, CUDAPm1 v0.20 err = 0.29297 (3:42 real, 4.4537 ms/iter, ETA 12:56)
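
As a rough cross-check of those numbers, the wait between output lines is just the iteration count per line times the per-iteration time. A small sketch (the function names are mine, not anything from CUDAPm1):

```python
def interval_seconds(iterations, ms_per_iter):
    """Wall-clock seconds needed for a block of iterations."""
    return iterations * ms_per_iter / 1000.0


def fmt_mmss(seconds):
    """Format whole seconds as m:ss, dropping any fraction."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# 50,000 iterations at the 4.4892 ms/iter reported in the first log line
wait = interval_seconds(50000, 4.4892)
print(f"{wait:.2f} s, about {fmt_mmss(wait)}")
```

That comes to a little under four minutes, in line with the "3:45 real" in the log, so a long silence after the iteration 1000 line is expected with a 50,000 iteration output interval.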

kriesel 2017-07-18 05:16

split error message
 
1 Attachment(s)
Jerry, please see item 8 in the attachment.

storm5510 2017-07-18 16:11

After doing some reading back through the pages here, I found the proper parameter for doing bench tests. The example was "-cufftbench 1 8192 r". It didn't want to respond to this. Then I saw where someone had used a value of "1" in place of the "r". It ran the tests after that.

A cosmetic request: In my humble opinion, the console output lines are way too long. If the program name and version number could be removed, that would help. I had to stretch the console window to the full width of my screen to keep it all on a single line each time.

kriesel 2017-07-29 21:32

-r option in CUDAPm1 not implemented
 
Its presence in the CUDAPm1 help message output seems to be a holdover from its CUDALucas ancestry. Specifying -r on the command line does not result in any residue check tests running in CUDAPm1; it goes straight to continuation of work present in the worktodo file. If I read the source code correctly, the residue check function did not get implemented for CUDAPm1.

kriesel 2017-07-29 21:50

CUDAPm1 bug and feature wish list
 
1 Attachment(s)
The topic and attachment are not intended to be critical of the fine and free development done. My intent is to make its use easier and more productive, and maybe aid further development. These are things I've learned by using the program or very recently looking at the source code. Please feel free to PM me with any additions, corrections or suggestions.

storm5510 2017-08-07 16:03

The server did not understand the results below.

[CODE]M82595957 has a factor: 3960668801233058686019823786839 (P-1, B1=730000, B2=730000, e=0, n=4608K, aid=xxxxxxxxxxxxC10420CBB1142D2B6669 )[/CODE]

I shortened it to this:

[CODE]M82595957 has a factor: 3960668801233058686019823786839 (P-1, B1=730000, B2=730000, e=0, n=4608K)[/CODE]

The server still did not understand. [U]Note[/U]: I replaced some of the AID numbers with an 'x' in the first statement.

Ideas?

GP2 2017-08-07 16:23

[QUOTE=storm5510;465018]The server did not understand the results below.

[CODE]M82595957 has a factor: 3960668801233058686019823786839 (P-1, B1=730000, B2=730000, e=0, n=4608K, aid=xxxxxxxxxxxxC10420CBB1142D2B6669 )[/CODE]

I shortened it to this:

[CODE]M82595957 has a factor: 3960668801233058686019823786839 (P-1, B1=730000, B2=730000, e=0, n=4608K)[/CODE]

The server still did not understand. [U]Note[/U]: I replaced some of the AID numbers with an 'x' in the first statement.

Ideas?[/QUOTE]

It will understand this:

[CODE]
M82595957 has a factor: 3960668801233058686019823786839 (P-1, B1=730000, B2=730000)
[/CODE]
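
For anyone who wants to sanity-check a factor line locally before submitting it, here is a minimal sketch. It assumes only the "M<exponent> has a factor: <factor>" prefix seen in this thread; the helper names are hypothetical and the toy line with M11 is made up for the demonstration. It does not change the line, it only reads it.

```python
import re

# Matches the start of a result line such as "M11 has a factor: 23 (...)".
RESULT_RE = re.compile(r"^M(\d+) has a factor: (\d+)\b")


def parse_factor_line(line):
    """Return (exponent, factor) as ints, or None if the line does not match."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def divides_mersenne(p, f):
    """True if f divides 2**p - 1, i.e. 2**p is congruent to 1 mod f."""
    return pow(2, p, f) == 1


# Toy line for illustration: 2**11 - 1 = 2047 = 23 * 89.
toy = "M11 has a factor: 23 (P-1, B1=10, B2=10)"
p, f = parse_factor_line(toy)
print(p, f, divides_mersenne(p, f))

# The same check applied to the line from this thread.
line = ("M82595957 has a factor: 3960668801233058686019823786839 "
        "(P-1, B1=730000, B2=730000)")
p, f = parse_factor_line(line)
print(p, f, divides_mersenne(p, f))
```

The modular exponentiation is fast even for large exponents, since only the remainder mod f is ever carried.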

James Heinrich 2017-08-07 16:26

It looks like a CudaPm1 result, but it's lacking the program identifier.
The manual results form is, on purpose, very particular about formatting. Do not edit the result lines before attempting to submit them.

storm5510 2017-08-08 04:20

[QUOTE=James Heinrich;465022]It looks like a CudaPm1 result, but it's lacking the program identifier.
The manual results form is, on purpose, very particular about formatting. Do not edit the result lines before attempting to submit them.[/QUOTE]

Guilty! I was playing with a small sorting program and didn't realize it was truncating them. I ran another and formatted this one like the second. Problem solved. :blush:

kriesel 2017-08-14 14:27

cudapm1 bug and wish list update
 
1 Attachment(s)
Here is today's version of the list I am maintaining. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have.

kriesel 2017-08-20 15:54

short of memory in stage 2, repeating residual
 
Is this a known problem? It warns before starting stage 1 there may not be enough memory for stage 2 for an exponent near 300M (wanting about 3% more than the GPU has), goes ahead and completes stage 1 using ~670MB, reports a residual for stage 1, uses about 5/6 of the gpu's 1.5GB memory for stage 2, and despite the earlier memory warning, chugs along in stage 2, one relative prime at a time, reporting the final stage 1 residual with each. Iteration times appear to be normal.

CUDAPm1 v0.20
------- DEVICE 1 -------
name GeForce GTX 480
Compatibility 2.0
clockRate (MHz) 1401
memClockRate (MHz) 1848
totalGlobalMem 1610612736
totalConstMem 65536
l2CacheSize 786432
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 15
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1

CUDA reports 1434M of 1536M GPU memory free.
Index 107
Using threads: norm1 256, mult 128, norm2 128.
Using up to 1584M GPU memory.
WARNING: There may not be enough GPU memory for stage 2!
Selected B1=2660000, B2=17955000, 5.04% chance of finding a factor
Starting stage 1 P-1, M299500177, B1 = 2660000, B2 = 17955000, fft length = 18432K
Doing 3837955 iterations

...

Iteration 3750000 M299500177, 0x16fc277b4c69b54a, n = 18432K, CUDAPm1 v0.20 err = 0.03320 (30:13 real, 36.2668 ms/iter, ETA 53:09)
Iteration 3800000 M299500177, 0xe97a5cb286fcf801, n = 18432K, CUDAPm1 v0.20 err = 0.03418 (30:13 real, 36.2698 ms/iter, ETA 22:56)
M299500177, 0x071ac99b54319724, n = 18432K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 38:41:27
Starting stage 1 gcd.
M299500177 Stage 1 found no factor (P-1, B1=2660000, B2=17955000, e=0, n=18432K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 2660000, b2 = 17955000, d = 2310, e = 2, nrp = 1
Zeros: 778308, Ones: 811452, Pairs: 143662
Processing 1 - 1 of 480 relative primes.
Inititalizing pass... done. transforms: 170, err = 0.03711, (3.02 real, 17.7483 ms/tran, ETA NA)
Transforms: 16700 M299500177, 0x071ac99b54319724, n = 18432K, CUDAPm1 v0.20 err = 0.03711 (5:16 real, 18.8891 ms/tran, ETA 42:23:55)

...

Processing 341 - 341 of 480 relative primes.
Inititalizing pass... done. transforms: 265, err = 0.02988, (5.16 real, 19.4604 ms/tran, ETA 12:27:36)
Transforms: 16664 M299500177, 0x071ac99b54319724, n = 18432K, CUDAPm1 v0.20 err = 0.03125 (5:15 real, 18.8980 ms/tran, ETA 12:22:20)
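
A side note on the "480 relative primes" in the log above: with d = 2310 that number matches the count of residues below d that are coprime to d. The sketch below checks this and, assuming each pass handles nrp of them (which is how I read the "Processing 1 - 1 of 480" lines), shows how many passes that implies. The function names are mine.

```python
from math import ceil, gcd


def relative_primes(d):
    """All residues 1 <= r < d that share no common factor with d."""
    return [r for r in range(1, d) if gcd(r, d) == 1]


def passes_needed(d, nrp):
    """Number of stage 2 passes if each pass covers nrp relative primes."""
    return ceil(len(relative_primes(d)) / nrp)


d = 2310  # 2 * 3 * 5 * 7 * 11, from the "d = 2310" in the log
print(len(relative_primes(d)))       # count of relative primes for this d
print(passes_needed(d, nrp=1))       # one relative prime per pass
print(passes_needed(d, nrp=13))      # thirteen per pass
```

With nrp = 1 that means one pass per relative prime, which is why a memory-limited card takes so many more passes than one that can hold a larger nrp.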

storm5510 2017-08-20 18:17

[QUOTE=kriesel;465987]...Transforms: 16664 [B]M299500177[/B], 0x071ac99b54319724, n = 18432K, CUDAPm1 v0.20 err = 0.03125 (5:15 real, 18.8980 ms/tran, ETA 12:22:20)[/QUOTE]

Is there a particular reason for running an exponent this large?

James Heinrich 2017-08-20 18:27

[QUOTE=storm5510;465999]Is there a particular reason for running an exponent this large?[/QUOTE]
Especially when it already has a [url=http://www.mersenne.ca/exponent/299500177]known 52-bit factor[/url].

kriesel 2017-08-21 06:23

why
 
[QUOTE=James Heinrich;466000]Especially when it already has a [URL="http://www.mersenne.ca/exponent/299500177"]known 52-bit factor[/URL].[/QUOTE]

Thanks for asking. Yes it's a bit off the beaten path. That's the point of this run.

I started it as a joint test of my hardware and the software & its local configuration. Does it find the factor? (That's actually the technique the author of the software described using, for qualifying an installation, but years ago at lower exponents. Sometimes things go wrong at different fft lengths. Test exponents are selected for having a factor that should be found. You're right that that's the opposite of searching to find new factors to screen out LL test candidates.)

And such testing also can shed some light on the following, even if it fails the find-the-known-factor test. What is actual run-time as a function of exponent, so what's reasonable or unreasonable to run on given hardware? Does anything break at high P for CUDAPm1? What are the gpu memory requirements or default usage versus exponent and stage? What is the save file size versus p and stage? If it's memory limited, does the software handle too little gpu memory gracefully? Are there unknown or forgotten bugs that could be smoked out and dealt with before the wave of PrimeNet assignments hit the fft lengths that reveal them? Armies use scouts.

I had already run current-P-1-wavefront assignments, some double-check territory exponents assigned as LLDC that had only B1 done on them, and some current or recent wavefront LL tests assigned that had only B1 done IIRC, so had about half the data for a handy chart or two already, so why not get a few points elsewhere on the log plot?

It's an extension of some of the stuff I've been posting over at [URL]http://www.mersenneforum.org/showthread.php?t=22450&page=3[/URL] as I puzzle things out as a long time GIMPS participant (1996?) but new to gpu use for it.
If that sort of information is already available and assembled somewhere else, and it may well be, I'd love to know where. I've read a lot of threads and thread lists, and haven't found it yet. It's a big haystack. Maybe the future gpu-newbies will find the Available Software thread and find it useful. I would have.

I want to know the capabilities and limitations of the software, generally and in relation to the parameters of the models of gpu I have running (6) or on order (1). Understanding that will help deploy them in the most productive manner.

And finally, it's because it interests me, more than only doing one exponent after another in ascending order at the wavefront, on each gpu or cpu. I currently have a mix of mfaktc, cudapm1, cudalucas, and prime95 running on systems, which mostly are doing production work, cranking out a mix of ECM, LL, DC, TF, & P-1, but I enjoy looking into how things will be different later and what issues may turn up.

kriesel 2017-09-13 21:26

updated benchmark, memory requirements, limits, etc on GTX480
 
1 Attachment(s)
Note that for comparison, a GTX1070 can do M9100xxxx in about 6.5 hours. The GTX480 is limited by both run-time and 1.5GB video memory size.

storm5510 2017-09-14 02:09

[QUOTE=kriesel;466048]...cranking out a mix of ECM, LL, DC, TF, & P-1, but I enjoy looking into how things will be different later and what issues may turn up.[/QUOTE]

[B]Off Topic[/B]: I ran multiple machines for a while. I found my utility bills rather shocking, pardon the pun. It was a 1/3 increase over each billing cycle. So, one machine is used sparingly. Only the newest one runs constantly. :smile:

kriesel 2017-11-12 14:07

Appearance of exponent limit on Quadro 2000
 
Has anyone else seen something similar? A GTX480 had no equivalent problem on the same 84M exponents. This is CUDAPm1 v0.20 on Windows 64-bit Vista.

After a few successful stage 1 and stage 2 p-1 runs of ~83.5M, each following exponent >84M runs through stage 1, but not through stage 1 gcd or stage 2, crashing the program instead.
Behavior is reproducible for exponents 84M+, including after program restarts, logouts, system restarts.

M83496143 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=85D38BAC023FCFF8022AABA05F602C4C CUDAPm1 v0.20)
reported 11/1/17
M83496227 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=A1656CF4111B3B15C4A71186811384FF CUDAPm1 v0.20)
reported 11/2/17
M83496247 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=5F246BFB077E96AA450384EFEC8EC599 CUDAPm1 v0.20)
reported 11/3/17
M83496293 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=725F9720C9179022C18CEA98F646F72E CUDAPm1 v0.20)
reported 11/4/17
M50001781 has a factor: 4392938042637898431087689 (P-1, B1=430000, B2=5000000, e=2, n=2688K CUDAPm1 v0.20)

All 5 exponents attempted above 84M failed:
PFactor=A3B66EB4FAAE78E8F283D5C96AD37A__,1,2,84228073,-1,76,2
PFactor=DC8BDAFB8D89D04B3B35742B11D9CE__,1,2,84228097,-1,76,2
PFactor=C996CF4EA78E42F9610D9789BE1666__,1,2,84228103,-1,76,2
and two more
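
For reference, the fields of those worktodo lines can be pulled apart as sketched below. The field meanings are my assumption, based on the usual Prime95-style Pfactor layout (assignment ID, then k, b, n, c for the number k*b^n+c, then bits already trial factored and tests saved). The class and function names are hypothetical, and the assignment ID is kept exactly as shown above, trailing underscores included.

```python
from dataclasses import dataclass


@dataclass
class PFactorEntry:
    aid: str          # assignment ID, kept verbatim
    k: int
    b: int
    n: int            # the exponent
    c: int
    tf_bits: int      # assumed: bit level already trial factored to
    tests_saved: int  # assumed: primality tests saved if a factor is found


def parse_pfactor(line):
    """Parse one 'PFactor=...' line into a PFactorEntry (hypothetical helper)."""
    keyword, _, rest = line.strip().partition("=")
    if keyword.lower() != "pfactor":
        raise ValueError(f"not a PFactor line: {line!r}")
    fields = rest.split(",")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    k, b, n, c, tf_bits, tests_saved = (int(x) for x in fields[1:])
    return PFactorEntry(fields[0], k, b, n, c, tf_bits, tests_saved)


entry = parse_pfactor("PFactor=A3B66EB4FAAE78E8F283D5C96AD37A__,1,2,84228073,-1,76,2")
print(entry.n, entry.tf_bits, entry.tests_saved)
```

This sketch parses the bit level as an integer, which is enough for the lines shown here.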

A typical event log entry follows. From entry to entry, process id and application start time changes but other event data values do not.

Log Name: Application
Source: Application Error
Date: 11/4/2017 7:23:36 PM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: eagle
Description:
Faulting application CUDAPm1_win64_20131118_CUDA_50.exe, version 0.0.0.0, time stamp 0x5285815f, faulting module CUDAPm1_win64_20131118_CUDA_50.exe, version 0.0.0.0, time stamp 0x5285815f, exception code 0xc0000005, fault offset 0x000000000000dd20, process id 0xd78, application start time 0x01d355cc5142bacb.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Application Error" />
<EventID Qualifiers="0">1000</EventID>
<Level>2</Level>
<Task>100</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2017-11-05T00:23:36.000Z" />
<EventRecordID>256</EventRecordID>
<Channel>Application</Channel>
<Computer>eagle</Computer>
<Security />
</System>
<EventData>
<Data>CUDAPm1_win64_20131118_CUDA_50.exe</Data>
<Data>0.0.0.0</Data>
<Data>5285815f</Data>
<Data>CUDAPm1_win64_20131118_CUDA_50.exe</Data>
<Data>0.0.0.0</Data>
<Data>5285815f</Data>
<Data>c0000005</Data>
<Data>000000000000dd20</Data>
<Data>d78</Data>
<Data>01d355cc5142bacb</Data>
</EventData>
</Event>

Normal progression, 83M:
(end of stage 1)
Iteration 987000 M83496293, 0xf2fb4b229c8521b0, n = 4608K, CUDAPm1 v0.20 err = 0.16919 (0:37 real, 36.8380 ms/iter, ETA 0:39)
Iteration 988000 M83496293, 0x9ad528e521e85730, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:37 real, 36.8401 ms/iter, ETA 0:03)
M83496293, 0x232eab21eaf81e92, n = 4608K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:10:44
Starting stage 1 gcd.
M83496293 Stage 1 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 685000, b2 = 12843750, d = 2310, e = 2, nrp = 13
Zeros: 573917, Ones: 658723, Pairs: 125889
Processing 1 - 13 of 480 relative primes.
Inititalizing pass... done. transforms: 270, err = 0.16406, (5.09 real, 18.8644 ms/tran, ETA NA)
Transforms: 2106 M83496293, 0x52b341a257507f69, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:41 real, 19.4671 ms/tran, ETA 9:14:05)
Transforms: 2010 M83496293, 0x905f255bd35e844b, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:39 real, 19.5838 ms/tran, ETA 9:15:02)
Transforms: 2014 M83496293, 0x673b942ac1fc4ae2, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:40 real, 19.5771 ms/tran, ETA 9:14:52)
...

Processing 469 - 480 of 480 relative primes.
Inititalizing pass... done. transforms: 357, err = 0.17090, (6.88 real, 19.2605 ms/tran, ETA 14:07)
Transforms: 2090 M83496293, 0x284e7914442300ef, n = 4608K, CUDAPm1 v0.20 err = 0.17090 (0:41 real, 19.4700 ms/tran, ETA 13:26)
Transforms: 2058 M83496293, 0xb1c240cc360984b8, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:40 real, 19.5747 ms/tran, ETA 12:46)
Transforms: 2012 M83496293, 0xfa21edbaa82e8d9d, n = 4608K, CUDAPm1 v0.20 err = 0.16992 (0:40 real, 19.5721 ms/tran, ETA 12:07)
Transforms: 1958 M83496293, 0xfdc0e766f0aa5f44, n = 4608K, CUDAPm1 v0.20 err = 0.16992 (0:38 real, 19.5923 ms/tran, ETA 11:28)
Transforms: 1980 M83496293, 0xf808c66bf88da80d, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:39 real, 19.5757 ms/tran, ETA 10:50)
Transforms: 1998 M83496293, 0xed71c1b76d6c0757, n = 4608K, CUDAPm1 v0.20 err = 0.16602 (0:39 real, 19.5754 ms/tran, ETA 10:10)
Transforms: 1910 M83496293, 0x9587bca9e6a92d95, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5884 ms/tran, ETA 9:33)
Transforms: 1902 M83496293, 0xdd50dacef6b94028, n = 4608K, CUDAPm1 v0.20 err = 0.17383 (0:38 real, 19.5907 ms/tran, ETA 8:56)
Transforms: 1930 M83496293, 0x5c01c876ba23af0e, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:38 real, 19.6468 ms/tran, ETA 8:18)
Transforms: 1924 M83496293, 0x4967e5714a906dd8, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.6022 ms/tran, ETA 7:40)
Transforms: 1914 M83496293, 0xb5338d4f9734dcbf, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:38 real, 19.5649 ms/tran, ETA 7:03)
Transforms: 1882 M83496293, 0xb3364da78f68767c, n = 4608K, CUDAPm1 v0.20 err = 0.17969 (0:37 real, 19.5884 ms/tran, ETA 6:26)
Transforms: 1916 M83496293, 0x63c6b998ac49a7a0, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5861 ms/tran, ETA 5:49)
Transforms: 1844 M83496293, 0x9b385d7b61a51d47, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:36 real, 19.5965 ms/tran, ETA 5:13)
Transforms: 1882 M83496293, 0xe0d8af2fcfffed20, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:37 real, 19.5938 ms/tran, ETA 4:36)
Transforms: 1896 M83496293, 0x85a24d9c67bd9496, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5903 ms/tran, ETA 3:59)
Transforms: 1986 M83496293, 0x71a887caf40e5bb7, n = 4608K, CUDAPm1 v0.20 err = 0.17627 (0:39 real, 19.5874 ms/tran, ETA 3:20)
Transforms: 1978 M83496293, 0x65c7d9d6c70197bf, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:39 real, 19.5815 ms/tran, ETA 2:41)
Transforms: 1986 M83496293, 0x8f7ecc43a94105ef, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:39 real, 19.5769 ms/tran, ETA 2:02)
Transforms: 1950 M83496293, 0xaac5ccee0aafbde0, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:38 real, 19.5877 ms/tran, ETA 1:24)
Transforms: 2036 M83496293, 0x34e6f17ecab893b1, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:40 real, 19.5862 ms/tran, ETA 0:44)
Transforms: 2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, CUDAPm1 v0.20 err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)

Stage 2 complete, 1710522 transforms, estimated total time = 9:18:00
Starting stage 2 gcd.
M83496293 Stage 2 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K CUDAPm1 v0.20)

(results.txt entry made, worktodo modified, next exponent started)



Abnormal 84M exponent:
(Stage 1 ends and the program crashes before the gcd, so the stage 1 gcd message is missing. After a restart, attempts to resume at stage 2 also fail.)
Iteration 994000 M84228073, 0xf6fe7d71235ae765, n = 4608K, CUDAPm1 v0.20 err = 0.21875 (0:37 real, 36.8486 ms/iter, ETA 0:55)
Iteration 995000 M84228073, 0xed35e0151d83c908, n = 4608K, CUDAPm1 v0.20 err = 0.22656 (0:36 real, 36.8537 ms/iter, ETA 0:19)
M84228073, 0xc840c55fb78fc6a2, n = 4608K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:15:26
batch wrapper reports cudapm1 exited at Sat 11/04/2017 12:12:38.23
batch wrapper reports CUDAPm1 (re)launch at Sat 11/04/2017 12:12:39.17

(from here repeats except batch wrapper date/time stamps change, until worktodo file is manually modified to remove the stuck exponent)
CUDAPm1 v0.20
Warning: Couldn't parse ini file option UnusedMem; using default.
------- DEVICE 0 -------
name Quadro 2000
Compatibility 2.1
clockRate (MHz) 1251
memClockRate (MHz) 1304
totalGlobalMem 1073741824
totalConstMem 65536
l2CacheSize 262144
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 4
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1

No Quadro 2000 fft.txt file found. Using default fft lengths.
For optimal fft selection, please run
./CUDAPm1 -cufftbench 1 8192 r
for some small r, 0 < r < 6 e.g.
CUDA reports 952M of 1024M GPU memory free.
No Quadro 2000 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDAPm1 -cufftbench 4608 4608 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 512, mult 128, norm2 128.
No stage 2 checkpoint.
Using up to 828M GPU memory.
Selected B1=690000, B2=12937500, 3.07% chance of finding a factor
Using B1 = 690000 from savefile.
Continuing stage 2 from a partial result of M84228073 fft length = 4608K
batch wrapper reports cudapm1 exited at Sat 11/04/2017 12:13:34.24
batch wrapper reports CUDAPm1 (re)launch at Sat 11/04/2017 12:13:36.14

henryzz 2017-11-12 17:51

The 480 has 1.5x as much memory. I suspect that may be the issue.

kriesel 2017-11-13 03:44

1 Attachment(s)
[QUOTE=henryzz;471628]The 480 has 1.5x as much memory. I suspect that may be the issue.[/QUOTE]

Thanks for your reply. I considered that. I think it's too low an exponent for that to be the case. If you see I missed something, please explain.

Maybe the gcd uses much more memory than the rest of stage 1, but I was able to run up to about 290M on a GTX480 to completion, through stage 1 with gcd, and stage 2 with gcd. Observed stage 1 memory usage in CUDAPm1 is close to linear in the exponent, with a regression fit of 54.5 MB + 2.03 bytes times the exponent value p, so I'd expect p=~84.2M to require only about 225 MB in stage 1.
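
As a rough illustration of that fit, here is a small Python sketch. It is not part of CUDAPm1; the function name is made up for the example, and it assumes "MB" means 10^6 bytes.

```python
# Linear fit quoted in the post: stage 1 GPU memory ~ 54.5 MB + 2.03 bytes * p.
# Assumption: "MB" means 10**6 bytes; with MiB the figures shift slightly.
FIT_INTERCEPT_MB = 54.5
FIT_BYTES_PER_EXPONENT = 2.03


def stage1_memory_mb(p: int) -> float:
    """Estimated stage 1 GPU memory use in MB for exponent p (hypothetical helper)."""
    return FIT_INTERCEPT_MB + FIT_BYTES_PER_EXPONENT * p / 1e6


if __name__ == "__main__":
    for p in (83_496_293, 84_228_073, 290_000_000):
        print(f"p = {p:>11,d}: about {stage1_memory_mb(p):.0f} MB")
```

For p = 84,228,073 this gives roughly 225 MB, well under what a 1 GB card reports as free.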

Stage 2 memory usage is impacted by both exponent and nrp selection; it picks nrp to fit within available memory up to an exponent where nrp=1, leaving at least about 200 MB of headroom on the GTX480 (presumably for the code to occupy). From these observations, and extrapolating downward in memory requirement from the two GTX480 runs with nrp=1 for p~250M and 290M, to 824MB required, I'd expect to be able to run up to p=~145M in a 1GB card.

For p=83.5M, the Quadro 2000 supported nrp=13. From the nrp=13 point on the GTX480, at p=120M, it was able to run over double the exponent.

The program log from the 83.5M and 84.2M runs says for stage 2 on the Quadro 2000,
Using up to 828M GPU memory.
The GTX480 says 1332MB for the same exponents.
But I've found that is just an expression of the available memory, not the amount reported by GPU-Z as in use during a stage.

The Quadro 2000 passed a maximum-feasible-size 38-block memory test. (38x25=950MB).

storm5510 2017-11-29 03:56

[QUOTE=kriesel;471677]...The GTX480 says 1332MB for the same exponents....[/QUOTE]

Are you overclocking your GTX 480?

kriesel 2017-11-29 20:42

[QUOTE=storm5510;472646]Are you overclocking your GTX 480?[/QUOTE]

No. I have two, on the same machine. They came with different default clocks, 701 and 725. The 725 I downclock to 702. The 701 has been reliable; the 725/702 has repeatable memory errors in the middle of the address range, that at one time were reduced by downclocking but no longer are. So I use it only for trial factoring, which occupies memory not affected by the errors. I've become an advocate of testing as much gpu memory as possible, from what I've learned on that second GTX480.

storm5510 2017-11-30 00:06

[QUOTE=kriesel;472689]No. I have two, on the same machine. They came with different default clocks, 701 and 725. The 725 I downclock to 702. The 701 has been reliable; the 725/702 has repeatable memory errors in the middle of the address range, that at one time were reduced by downclocking but no longer are. So I use it only for trial factoring, which occupies memory not affected by the errors. I've become an advocate of testing as much gpu memory as possible, from what I've learned on that second GTX480.[/QUOTE]

Interesting! I tried it on mine once. The gain was insignificant. The one I have defaults to 700. If I run a GPU process that causes it to reset itself, then that number drops to 450. It takes a cold-boot to get back to 700.

kriesel 2017-11-30 05:40

[QUOTE=storm5510;472713]Interesting! I tried it on mine once. The gain was insignificant. The one I have defaults to 700. If I run a GPU process that causes it to reset itself, then that number drops to 450. It takes a cold-boot to get back to 700.[/QUOTE]

One of the two goes AWOL at varying intervals. I found that to get reliable P-1 or LL tests, I had to make the 702/memory-error card device zero. If it was device one, and P-1 or LL was set to run on device zero, then when the other card went AWOL, the bad-memory card dropped to device zero and caused problems with the P-1 or LL run. By AWOL I mean the card is physically there, but GPU-Z finds only one device, a running GPU-Z already set to track device one stops displaying its sensor readings, the Windows event log shows a driver restart, and restarted cudapm1 and cudalucas don't find a device one. Clearing that up requires a system restart, e.g. shutdown -r from the command line.

kriesel 2017-12-01 18:21

multiple instances or dissimilar instances per gpu
 
Hi,

Has anyone experimented with running more than one instance of CUDAPm1 on a single GPU?

Reason I ask is I'm used to seeing 100% GPU load in GPU-Z, with a single instance of CUDALucas or CUDAPm1 per GPU, but on a GTX1070 it varies 99-100%. Also I have found gains in running multiple Mfaktc instances, raising the GPU load from 98 to 100%, on a GTX480.

In sharing a single GTX480 GPU between simultaneous single instances of CUDALucas and CUDAPm1, in a quick test, I'm calculating more combined throughput than either running alone, by several percent. Since I'm running numerous GPUs, if that holds up, it's the equivalent of adding another GPU.

Any light you can shed on effects of multiple instances, such as confirming results, or negative results, on various GPU models, would be appreciated.

kriesel 2017-12-04 17:27

[QUOTE=storm5510;472713]Interesting! I tried it on mine once. The gain was insignificant. The one I have defaults to 700. If I run a GPU process that causes it to reset itself, then that number drops to 450. It takes a cold-boot to get back to 700.[/QUOTE]

I have never seen a 450 clock rate on either of my GTX480's, or a lower clock after driver restart or program reset. I have seen them drop from 70x to 405, and then some seconds later down to 50.6, when there's little or no GPU processing load, and go back up with load.

kriesel 2017-12-04 18:10

[QUOTE=kriesel;471608]Has anyone else seen something similar? A GTX480 had no equivalent problem on the same 84M exponents. This is CUDAPm1 v0.20 on Windows 64-bit Vista.

After a few successful stage 1 and stage 2 p-1 runs of ~83.5M, each following exponent >84M runs through stage 1, but not through stage 1 gcd or stage 2, crashing the program instead.
Behavior is reproducible for exponents 84M+, including after program restarts, logouts, system restarts.

M83496143 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=85D38BAC023FCFF8022AABA05F602C4C CUDAPm1 v0.20)
reported 11/1/17
M83496227 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=A1656CF4111B3B15C4A71186811384FF CUDAPm1 v0.20)
reported 11/2/17
M83496247 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=5F246BFB077E96AA450384EFEC8EC599 CUDAPm1 v0.20)
reported 11/3/17
M83496293 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K, aid=725F9720C9179022C18CEA98F646F72E CUDAPm1 v0.20)
reported 11/4/17
M50001781 has a factor: 4392938042637898431087689 (P-1, B1=430000, B2=5000000, e=2, n=2688K CUDAPm1 v0.20)

All 5 exponents attempted above 84M failed:
PFactor=A3B66EB4FAAE78E8F283D5C96AD37A__,1,2,84228073,-1,76,2
PFactor=DC8BDAFB8D89D04B3B35742B11D9CE__,1,2,84228097,-1,76,2
PFactor=C996CF4EA78E42F9610D9789BE1666__,1,2,84228103,-1,76,2
and two more

A typical event log entry follows. From entry to entry, process id and application start time changes but other event data values do not.

Log Name: Application
Source: Application Error
Date: 11/4/2017 7:23:36 PM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: eagle
Description:
Faulting application CUDAPm1_win64_20131118_CUDA_50.exe, version 0.0.0.0, time stamp 0x5285815f, faulting module CUDAPm1_win64_20131118_CUDA_50.exe, version 0.0.0.0, time stamp 0x5285815f, exception code 0xc0000005, fault offset 0x000000000000dd20, process id 0xd78, application start time 0x01d355cc5142bacb.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Application Error" />
<EventID Qualifiers="0">1000</EventID>
<Level>2</Level>
<Task>100</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2017-11-05T00:23:36.000Z" />
<EventRecordID>256</EventRecordID>
<Channel>Application</Channel>
<Computer>eagle</Computer>
<Security />
</System>
<EventData>
<Data>CUDAPm1_win64_20131118_CUDA_50.exe</Data>
<Data>0.0.0.0</Data>
<Data>5285815f</Data>
<Data>CUDAPm1_win64_20131118_CUDA_50.exe</Data>
<Data>0.0.0.0</Data>
<Data>5285815f</Data>
<Data>c0000005</Data>
<Data>000000000000dd20</Data>
<Data>d78</Data>
<Data>01d355cc5142bacb</Data>
</EventData>
</Event>

Normal progression, 83M:
(end of stage 1)
Iteration 987000 M83496293, 0xf2fb4b229c8521b0, n = 4608K, CUDAPm1 v0.20 err = 0.16919 (0:37 real, 36.8380 ms/iter, ETA 0:39)
Iteration 988000 M83496293, 0x9ad528e521e85730, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:37 real, 36.8401 ms/iter, ETA 0:03)
M83496293, 0x232eab21eaf81e92, n = 4608K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:10:44
Starting stage 1 gcd.
M83496293 Stage 1 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 685000, b2 = 12843750, d = 2310, e = 2, nrp = 13
Zeros: 573917, Ones: 658723, Pairs: 125889
Processing 1 - 13 of 480 relative primes.
Inititalizing pass... done. transforms: 270, err = 0.16406, (5.09 real, 18.8644 ms/tran, ETA NA)
Transforms: 2106 M83496293, 0x52b341a257507f69, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:41 real, 19.4671 ms/tran, ETA 9:14:05)
Transforms: 2010 M83496293, 0x905f255bd35e844b, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:39 real, 19.5838 ms/tran, ETA 9:15:02)
Transforms: 2014 M83496293, 0x673b942ac1fc4ae2, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:40 real, 19.5771 ms/tran, ETA 9:14:52)
...

Processing 469 - 480 of 480 relative primes.
Inititalizing pass... done. transforms: 357, err = 0.17090, (6.88 real, 19.2605 ms/tran, ETA 14:07)
Transforms: 2090 M83496293, 0x284e7914442300ef, n = 4608K, CUDAPm1 v0.20 err = 0.17090 (0:41 real, 19.4700 ms/tran, ETA 13:26)
Transforms: 2058 M83496293, 0xb1c240cc360984b8, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:40 real, 19.5747 ms/tran, ETA 12:46)
Transforms: 2012 M83496293, 0xfa21edbaa82e8d9d, n = 4608K, CUDAPm1 v0.20 err = 0.16992 (0:40 real, 19.5721 ms/tran, ETA 12:07)
Transforms: 1958 M83496293, 0xfdc0e766f0aa5f44, n = 4608K, CUDAPm1 v0.20 err = 0.16992 (0:38 real, 19.5923 ms/tran, ETA 11:28)
Transforms: 1980 M83496293, 0xf808c66bf88da80d, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:39 real, 19.5757 ms/tran, ETA 10:50)
Transforms: 1998 M83496293, 0xed71c1b76d6c0757, n = 4608K, CUDAPm1 v0.20 err = 0.16602 (0:39 real, 19.5754 ms/tran, ETA 10:10)
Transforms: 1910 M83496293, 0x9587bca9e6a92d95, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5884 ms/tran, ETA 9:33)
Transforms: 1902 M83496293, 0xdd50dacef6b94028, n = 4608K, CUDAPm1 v0.20 err = 0.17383 (0:38 real, 19.5907 ms/tran, ETA 8:56)
Transforms: 1930 M83496293, 0x5c01c876ba23af0e, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:38 real, 19.6468 ms/tran, ETA 8:18)
Transforms: 1924 M83496293, 0x4967e5714a906dd8, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.6022 ms/tran, ETA 7:40)
Transforms: 1914 M83496293, 0xb5338d4f9734dcbf, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:38 real, 19.5649 ms/tran, ETA 7:03)
Transforms: 1882 M83496293, 0xb3364da78f68767c, n = 4608K, CUDAPm1 v0.20 err = 0.17969 (0:37 real, 19.5884 ms/tran, ETA 6:26)
Transforms: 1916 M83496293, 0x63c6b998ac49a7a0, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5861 ms/tran, ETA 5:49)
Transforms: 1844 M83496293, 0x9b385d7b61a51d47, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:36 real, 19.5965 ms/tran, ETA 5:13)
Transforms: 1882 M83496293, 0xe0d8af2fcfffed20, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:37 real, 19.5938 ms/tran, ETA 4:36)
Transforms: 1896 M83496293, 0x85a24d9c67bd9496, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:37 real, 19.5903 ms/tran, ETA 3:59)
Transforms: 1986 M83496293, 0x71a887caf40e5bb7, n = 4608K, CUDAPm1 v0.20 err = 0.17627 (0:39 real, 19.5874 ms/tran, ETA 3:20)
Transforms: 1978 M83496293, 0x65c7d9d6c70197bf, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:39 real, 19.5815 ms/tran, ETA 2:41)
Transforms: 1986 M83496293, 0x8f7ecc43a94105ef, n = 4608K, CUDAPm1 v0.20 err = 0.16406 (0:39 real, 19.5769 ms/tran, ETA 2:02)
Transforms: 1950 M83496293, 0xaac5ccee0aafbde0, n = 4608K, CUDAPm1 v0.20 err = 0.16797 (0:38 real, 19.5877 ms/tran, ETA 1:24)
Transforms: 2036 M83496293, 0x34e6f17ecab893b1, n = 4608K, CUDAPm1 v0.20 err = 0.17188 (0:40 real, 19.5862 ms/tran, ETA 0:44)
Transforms: 2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, CUDAPm1 v0.20 err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)

Stage 2 complete, 1710522 transforms, estimated total time = 9:18:00
Starting stage 2 gcd.
M83496293 Stage 2 found no factor (P-1, B1=685000, B2=12843750, e=2, n=4608K CUDAPm1 v0.20)

(results.txt entry made, worktodo modified, next exponent started)



Abnormal 84M exponent:
(end of stage 1 crashes before gcd, program restarted attempts to begin at stage 2 fail, stage 1 gcd message missing)
Iteration 994000 M84228073, 0xf6fe7d71235ae765, n = 4608K, CUDAPm1 v0.20 err = 0.21875 (0:37 real, 36.8486 ms/iter, ETA 0:55)
Iteration 995000 M84228073, 0xed35e0151d83c908, n = 4608K, CUDAPm1 v0.20 err = 0.22656 (0:36 real, 36.8537 ms/iter, ETA 0:19)
M84228073, 0xc840c55fb78fc6a2, n = 4608K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:15:26batch wrapper reports cudapm1 exited at Sat 11/04/2017 12:12:38.23
batch wrapper reports CUDAPm1 (re)launch at Sat 11/04/2017 12:12:39.17

(from here repeats except batch wrapper date/time stamps change, until worktodo file is manually modified to remove the stuck exponent)
CUDAPm1 v0.20
Warning: Couldn't parse ini file option UnusedMem; using default.
------- DEVICE 0 -------
name Quadro 2000
Compatibility 2.1
clockRate (MHz) 1251
memClockRate (MHz) 1304
totalGlobalMem 1073741824
totalConstMem 65536
l2CacheSize 262144
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 4
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1

No Quadro 2000 fft.txt file found. Using default fft lengths.
For optimal fft selection, please run
./CUDAPm1 -cufftbench 1 8192 r
for some small r, 0 < r < 6 e.g.
CUDA reports 952M of 1024M GPU memory free.
No Quadro 2000 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDAPm1 -cufftbench 4608 4608 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 512, mult 128, norm2 128.
No stage 2 checkpoint.
Using up to 828M GPU memory.
Selected B1=690000, B2=12937500, 3.07% chance of finding a factor
Using B1 = 690000 from savefile.
Continuing stage 2 from a partial result of M84228073 fft length = 4608K
batch wrapper reports cudapm1 exited at Sat 11/04/2017 12:13:34.24
batch wrapper reports CUDAPm1 (re)launch at Sat 11/04/2017 12:13:36.14[/QUOTE]

The plot thickens. I've successfully run higher exponents (~84.9m) on another Quadro 2000, with the same CUDAPm1 executable image, CUDA5.5 64-bit 20130923 V0.20 executable. BIOS versions on the GPUs differ in the right 6 characters; the problem occurred on the gpu with the lower BIOS version number 70 06 0F 00 0A, and not with 70 06 31 02 01. It was run with no fft file or threads file initially, 512, 128, 128 threads 4608k fft length, then retried to complete with fft and threads files and 256, 256, 32 threads, 4608k fft length and program still failed. The other GPU had fft and threads files created before beginning to run any P-1 attempts, which succeeded. I'm now attempting a new exponent ~84.9m on the unit that had trouble with 84.2m. If that fails I may run a thorough memory test on it. Other possibilities are card swap and retest, and BIOS update. Other ideas?

storm5510 2017-12-05 01:20

[CODE]Transforms: 2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, [COLOR=Red]CUDAPm1 v0.20[/COLOR] err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)[/CODE]I wish the part in [COLOR=Red]red[/COLOR] could be removed. It makes PowerShell, or Command Prompt, almost too wide to fit my screen.

Mark Rose 2017-12-05 02:25

[QUOTE=storm5510;473179][CODE]Transforms: 2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, [COLOR=Red]CUDAPm1 v0.20[/COLOR] err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)[/CODE]I wish the part in [COLOR=Red]red[/COLOR] could be removed. It makes PowerShell, or Command Prompt, almost too wide to fit my screen.[/QUOTE]

Use smaller fonts?

storm5510 2017-12-05 08:19

[QUOTE=Mark Rose;473185]Use smaller fonts?[/QUOTE]

That's an option. I generally run this in PowerShell. :smile:
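
Another option would be to filter the output before it reaches the console. The following is only a sketch in Python: the names `strip_tag` and `filter_stream` are invented here, the version pattern (digits, dot, digits) is an assumption, and it has not been tried against a live CUDAPm1 run, where output buffering through a pipe may delay the progress lines.

```python
import io
import re

# Matches the program tag only where it precedes "err = ", as in the progress
# lines quoted above. Banner and result lines are left untouched.
TAG = re.compile(r"CUDAPm1 v\d+\.\d+ (?=err = )")


def strip_tag(line: str) -> str:
    """Remove the 'CUDAPm1 vX.YY ' tag from a progress line, if present."""
    return TAG.sub("", line)


def filter_stream(src, dst) -> None:
    """Copy lines from src to dst, stripping the tag from each one."""
    for line in src:
        dst.write(strip_tag(line))
        dst.flush()


if __name__ == "__main__":
    # Demo on the line quoted in the post; a real filter would pass
    # sys.stdin and sys.stdout to filter_stream instead.
    sample = io.StringIO(
        "Transforms: 2024 M83496293, 0x4b29a8a5677c72db, n = 4608K, "
        "CUDAPm1 v0.20 err = 0.17578 (0:40 real, 19.5816 ms/tran, ETA 0:04)\n"
    )
    out = io.StringIO()
    filter_stream(sample, out)
    print(out.getvalue(), end="")
```

That shortens each progress line by 14 characters without touching the results.txt entries.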

kriesel 2017-12-05 21:24

[QUOTE=kriesel;473144]The plot thickens. I've successfully run higher exponents (~84.9m) on another Quadro 2000, with the same CUDAPm1 executable image, CUDA5.5 64-bit 20130923 V0.20 executable. BIOS versions on the GPUs differ in the right 6 characters; the problem occurred on the gpu with the lower BIOS version number 70 06 0F 00 0A, and not with 70 06 31 02 01. It was run with no fft file or threads file initially, 512, 128, 128 threads 4608k fft length, then retried to complete with fft and threads files and 256, 256, 32 threads, 4608k fft length and program still failed. The other GPU had fft and threads files created before beginning to run any P-1 attempts, which succeeded. I'm now attempting a new exponent ~84.9m on the unit that had trouble with 84.2m. If that fails I may run a thorough memory test on it. Other possibilities are card swap and retest, and BIOS update. Other ideas?[/QUOTE]

Ok. Same GPU and system that reliably choked on exponents 84228073, 84228097, 84228103, 84228119, 84228229, just successfully ran to completion, M84861479, with same fft length etc. I'd expect the higher exponent to present more of a challenge, not less.

CUDA reports 830M of 1024M GPU memory free.
Index 64
Using threads: norm1 256, mult 256, norm2 32.
Using up to 720M GPU memory.
Selected B1=690000, B2=12420000, 3.05% chance of finding a factor
Starting stage 1 P-1, M84861479, B1 = 690000, B2 = 12420000, fft length = 4608K
Doing 995519 iterations
Iteration 5000 M84861479, 0x85dcbca418bb3656, n = 4608K, CUDAPm1 v0.20 err = 0.27344 (3:03 real, 36.6115 ms/iter, ETA 10:04:24)
...
Iteration 995000 M84861479, 0xb98ed42b48260d4a, n = 4608K, CUDAPm1 v0.20 err = 0.25000 (3:02 real, 36.5191 ms/iter, ETA 0:18)
M84861479, 0x4a2093b79c7bf108, n = 4608K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:06:31
Starting stage 1 gcd.
M84861479 Stage 1 found no factor (P-1, B1=690000, B2=12420000, e=0, n=4608K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 690000, b2 = 12420000, d = 2310, e = 2, nrp = 10
Zeros: 554802, Ones: 637038, Pairs: 121194
Processing 1 - 10 of 480 relative primes.
...

Stage 2 complete, 1766191 transforms, estimated total time = 9:31:47
Starting stage 2 gcd.
M84861479 Stage 2 found no factor (P-1, B1=690000, B2=12420000, e=2, n=4608K CUDAPm1 v0.20)

Weird, but I'll take it. A couple other things I had thought of to try were matching OS and system ram on another box & GPU and retrying there. System that ran the problem 84.2m exponents to completion had a newer Windows OS and twice the system ram.
...
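
For anyone wanting to scan their own logs for the same failure signature, here is a minimal Python sketch. It assumes the log text looks like the excerpts posted in this thread (a "Stage 1 complete" line normally followed by "Starting stage 1 gcd."); the function name is made up for the example.

```python
def stage1_without_gcd(log_lines):
    """Return indexes of 'Stage 1 complete' lines not followed by the gcd message.

    The check looks at the next non-blank line only, which is how the normal
    and abnormal excerpts in this thread differ.
    """
    hits = []
    for i, line in enumerate(log_lines):
        if "Stage 1 complete" not in line:
            continue
        # First non-blank line after the current one, or "" at end of log.
        following = next((l for l in log_lines[i + 1:] if l.strip()), "")
        if "Starting stage 1 gcd" not in following:
            hits.append(i)
    return hits


if __name__ == "__main__":
    normal = [
        "Stage 1 complete, estimated total time = 10:10:44",
        "Starting stage 1 gcd.",
    ]
    abnormal = [
        "Stage 1 complete, estimated total time = 10:15:26batch wrapper reports cudapm1 exited at Sat 11/04/2017 12:12:38.23",
        "batch wrapper reports CUDAPm1 (re)launch at Sat 11/04/2017 12:12:39.17",
    ]
    print(stage1_without_gcd(normal))
    print(stage1_without_gcd(abnormal))
```

A log that ends right after "Stage 1 complete" is also flagged, so a run still in progress at that exact point would show up as a false positive.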

kriesel 2017-12-05 21:44

CUDAPm1 runtime scaling, etc.
 
1 Attachment(s)
The previously posted pdf has been extended to include the effect of GPU ram size on number of relative primes processed in a pass in stage 2, for the exponents near the current primenet manual assignment issue values, ~85,000,000. (See page 4 of the attached pdf.)

Note, nrp has been observed to fluctuate from run to run on the same hardware, and/or identical GPU model on another system, for very similar exponents (examples 1GB, nrp 10 & 13; 1.5GB, 24 & 27). This may be due to some stage 2 runs beginning and selecting an NRP value while another application (mfaktc or cudalucas) was also running on the GPU and occupying some GPU ram. Values tabulated were those first obtained in testing, without attention to GPU sharing. So these values could be considered a lower bound for what should be feasible when not running other gpu applications.

NRP is very linear with available GPU ram, up to 4GB, followed by only slight increase to 8GB, in this exponent range.
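
To put the nrp values in context: the stage 2 logs earlier in the thread show d = 2310 and "of 480 relative primes", with nrp of them handled per pass. A sketch of the arithmetic in Python, assuming the pass count is simply the ceiling of 480 / nrp (inferred from the "Processing 469 - 480 of 480" log line, not taken from the CUDAPm1 source):

```python
from math import gcd


def relative_primes(d: int) -> int:
    """Count 1 <= k < d with gcd(k, d) == 1 (Euler's totient of d)."""
    return sum(1 for k in range(1, d) if gcd(k, d) == 1)


def stage2_passes(d: int, nrp: int) -> int:
    """Number of passes if nrp relative primes are processed per pass."""
    total = relative_primes(d)
    return -(-total // nrp)  # ceiling division


if __name__ == "__main__":
    d = 2310  # value shown in the stage 2 log lines
    print("relative primes:", relative_primes(d))
    for nrp in (10, 13, 24, 27):
        print(f"nrp = {nrp:2d}: {stage2_passes(d, nrp)} passes")
```

With nrp = 13 that is 37 passes, the last one covering only 12 relative primes, which agrees with the 469 - 480 range in the log.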

kriesel 2017-12-06 00:04

[QUOTE=chalsall;473220]If a tree falls in the forest and there is no one around to hear, does it make a sound?[/QUOTE]

Yes, by definition. [url]https://en.wikipedia.org/wiki/Sound[/url]:google:

Slow day? Are you bored and looking to get banned or blocked again?
It would be better if you instead try to contribute something that might be useful or interesting, at least to a newbie. (Or remain below 0 db.)

wblipp 2017-12-09 04:04

[QUOTE=kriesel;473224]Yes, by definition. [url]https://en.wikipedia.org/wiki/Sound[/url][/QUOTE]

Also No, by definition, using the SAME link. It depends on whether you use the physics definition or the physiology definition of sound.

kriesel 2017-12-12 01:00

[QUOTE=wblipp;473519]Also No, by definition, using the SAME link. It depends on whether you use the physics definition or the physiology definition of sound.[/QUOTE]

Well, I'm an engineer, so tend to look toward the physical natural mechanisms. The speed of sound is derivable as a shock wave asymptotically approaching a pressure wave ratio of 1. Would it make sense to say there was no explosion if no one was there to be hurt? Also, I grew up on a farm, near forests, and both farm and forest held far more animals than humans. "No one" refers to humans, but many other creatures also have ears or other sense organs for acoustic signals. There is a sound (acoustic signal), even if the only potential hearer is deaf. To say otherwise is like saying that because an illiterate person looked at your message and got nothing out of it, there was no writing; or a person who doesn't understand English heard you read it, so there was no speech.

kriesel 2018-02-21 18:41

Updated bug and wishlist for cudapm1
 
1 Attachment(s)
Here is today's version of the list I am maintaining. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have.

Cubox 2018-03-04 01:59

Hi kriesel,

I am interested in running p-1 tests with my GPU, but finding information about cudap-1 is difficult.
Is there a place with updated code? The only one I can find is the one from the original author. If I understood correctly, patches have been created; I would prefer to have them applied.

The program being based on CUDALucas, I would guess that the instructions are similar, but if you have any additional information for me, that would be great!
If binaries for Windows exist, I am also happy to test.

kriesel 2018-03-04 06:01

[QUOTE=Cubox;481516]Hi kriesel,

I am interested in running p-1 tests with my GPU, however finding information about cudap-1 is difficult.
Is there a place with updated code? The only one I can find is the one from the original author. If I understood correctly, patches have been created, I would prefer to have them applied.

The program being based on CUDALucas, I will guess that instructions are similar, but if you have any information for me in addition, that would be great!
If binaries for Windows exists, I am also happy to test.[/QUOTE]

If you're referring to the bug and wish list I made, the code edits mentioned in some parts have not been turned into updated executables yet or tested & debugged.

Available software is described periodically at [URL]http://www.mersenneforum.org/showthread.php?t=22450&page=3[/URL]

James Heinrich 2018-03-04 13:24

Windows binaries for CudaPM1 are available at [url]https://download.mersenne.ca/[/url] but they're 5 years old.

kriesel 2018-03-04 18:04

[QUOTE=Cubox;481516]Hi kriesel,

I am interested in running p-1 tests with my GPU, however finding information about cudap-1 is difficult.
Is there a place with updated code? The only one I can find is the one from the original author. If I understood correctly, patches have been created, I would prefer to have them applied.

The program being based on CUDALucas, I will guess that instructions are similar, but if you have any information for me in addition, that would be great!
If binaries for Windows exists, I am also happy to test.[/QUOTE]
What model is your gpu?

Cubox 2018-03-04 21:14

[QUOTE=James Heinrich;481536]Windows binaries for CudaPM1 are available at [url]https://download.mersenne.ca/[/url] but they're 5 years old.[/QUOTE]

I saw those, and do not wish to use them. I would like to ensure the software I run is updated. This is why I am asking here about updates to this code.

[QUOTE=kriesel]What model is your gpu?[/QUOTE]

MSI GTX 1070 8G

I am running CUDALucas2.06beta at the moment, doing some double checking LLs.
The card is stable-ish. Over the 53 DC I have done, only 3 (updated, was 4 before edit) were bad. (One was a stupid overclock I did).

I am willing to compile my binaries and/or help with testing updated code if you have patches.

Cubox 2018-03-04 21:19

[QUOTE=kriesel;481522]If you're referring to the bug and wish list I made, the code edits mentioned in some parts have not been turned into updated executables yet or tested & debugged.

Available software is described periodically at [URL]http://www.mersenneforum.org/showthread.php?t=22450&page=3[/URL][/QUOTE]

The CUDAp-1 software mentioned in your list of mersenne hunting software pdf (very useful for newcomers!) states Jan 2016 as 'Approx date' for CUDAp-1.

[URL]https://sourceforge.net/projects/cudapm1/files/[/URL] has last code update in 2013, last binaries are from 2013 as well.

kriesel 2018-03-05 04:11

[QUOTE=Cubox;481580]I saw those, and do not wish to use them. I would like to ensure the software I run is updated. This is why I am asking here about updates to this code.

MSI GTX 1070 8G

I am running CUDALucas2.06beta at the moment, doing some double checking LLs.
The card is stable-ish. Over the 53 DC I have done, only 3 (updated, was 4 before edit) were bad. (One was a stupid overclock I did).

I am willing to compile my binaries and/or help with testing updated code if you have patches.[/QUOTE]

As far as I know, v0.20, approx Nov 2013, is the latest available executable for Windows. There was something dated June 2015 for linux. Thanks for volunteering to help change that.

What programming experience do you have?
Are you familiar with posting code on sourceforge?

First step is to get the development environment together, and demonstrate to yourself that you can compile and link gpu code and produce something functional. (That doesn't have to be CUDAPm1 initially; could be CUDALucas or mfaktc, or any tiny demo CUDA app for quick turnaround.) I suggest aiming for CUDA6.5 or CUDA8.0, 64-bit Windows executables. (I've seen speed advantages with CUDA6.x over other versions, in CUDALucas with extensive benchmarking. Driver version didn't make any detectable difference. But it can vary vs. card.) The GTX1070 requires CUDA 8, as I recall. A lot of us have older cards that perform faster at lower CUDA levels.

I think NVIDIA CUDA SDK; MS VC Community Edition. Perhaps Jerry (flashjh) could advise how to set up for multiple CUDA levels.

Then we can get into developing a v0.21 beta with some minor tweaks and bug fixes, and go from there.

Six percent bad runs seems a bit high to me (3/53)

Cubox 2018-03-06 03:47

[QUOTE=kriesel;481604]As far as I know, v0.20, approx Nov 2013, is the latest available executable for Windows. There was something dated June 2015 for linux. Thanks for volunteering to help change that.

What programming experience do you have?
Are you familiar with posting code on sourceforge?

First step is to get the development environment together, and demonstrate to yourself that you can compile and link gpu code and produce something functional. (That doesn't have to be CUDAPm1 initially; could be CUDALucas or mfaktc, or any tiny demo CUDA app for quick turnaround.) I suggest aiming for CUDA6.5 or CUDA8.0, 64-bit Windows executables. (I've seen speed advantages with CUDA6.x over other versions, in CUDALucas with extensive benchmarking. Driver version didn't make any detectable difference. But it can vary vs. card.) The GTX1070 requires CUDA 8, as I recall. A lot of us have older cards that perform faster at lower CUDA levels.

I think NVIDIA CUDA SDK; MS VC Community Edition. Perhaps Jerry (flashjh) could advise how to set up for multiple CUDA levels.

Then we can get into developing a v0.21 beta with some minor tweaks and bug fixes, and go from there.

Six percent bad runs seems a bit high to me (3/53)[/QUOTE]

I am good with C, kinda good with C++, used to work on Linux and OSX, not Windows.
I know all about posting source on Github.

I'll try to go compile the latest CUDALucas. I will keep you updated, however due to my free time being an unknown quantity, I might take a few days.

kriesel 2018-03-06 15:03

[QUOTE=Cubox;481663]I will keep you updated, however due to my free time being an unknown quantity, I might take a few days.[/QUOTE]
No problem, I can relate. Some things have waited nearly 5 years, some longer, they can wait a few more days or weeks.

kriesel 2018-03-06 17:34

cudapm1 images
 
[QUOTE=James Heinrich;481536]Windows binaries for CudaPM1 are available at [URL]https://download.mersenne.ca/[/URL] but they're 5 years old.[/QUOTE]
This looks rather comprehensive for Windows binaries, and apparently contains no linux executables.

Clicking on the link at mersenne.ca, [url]http://www.mersenneforum.org/CUDAPm1/[/url], I get a 404 error.

The June 23 2015 Linux build is on sourceforge but not on mersenne.ca. I wonder if that linux version is the only build with [r52] "reduced register use on square kernel", since that sourceforge entry is dated Nov 25 2013, slightly after the newest Windows build (Nov 18 2013). [URL]https://sourceforge.net/p/cudapm1/code/HEAD/tree/trunk/[/URL]

The wiki page at [url]http://mersennewiki.org/index.php/CUDAPm1[/url] is not an article (yet?), so much as 3 links, to James' mirror, the SourceForge folder, and this discussion thread.

kriesel 2018-03-06 23:43

[QUOTE=Cubox;481581]The CUDAp-1 software mentioned in your list of mersenne hunting software pdf (very useful for newcomers!) states Jan 2016 as 'Approx date' for CUDAp-1.

[URL]https://sourceforge.net/projects/cudapm1/files/[/URL] has last code update in 2013, last binaries are from 2013 as well.[/QUOTE]

Sorry, Jan 2016 in the CUDAPm1 date cell was probably a late-night-edit-error. (clLucas not CUDAPm1 as I recall.)
See post 503 in this thread for a hopefully more accurate reflection of the latest CUDAPm1 versions currently available. I'll fix the pdf soon. (Then, hopefully, you'll make it obsolete, by producing something newer...)

kriesel 2018-03-16 17:54

CUDAPm1 bug and wish list update
 
1 Attachment(s)
Here is today's version of the list I am maintaining. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have.

VictordeHolland 2018-03-17 00:23

The current version seems to be working on the GTX1080 Ti with W10 x64 (I didn't do any extensive tests or performance optimizations).

[code]
C:\CUDAPm1_v0.20>CUDAPm1_v0.20.exe 60593041, -b1 1000
CUDAPm1 v0.20
Warning: Couldn't parse ini file option Threads; using default: 256
Warning: Couldn't parse ini file option CheckRoundoffAllIterations; using default: off
Warning: Couldn't parse ini file option Polite; using default: 1
Warning: Couldn't parse ini file option DeviceNumber; using default: 0
Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
Warning: Couldn't parse ini file option UnusedMem; using default.
CUDA reports 9310M of 11264M GPU memory free.
Index 50
No GeForce GTX 1080 Ti threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDAPm1 -cufftbench 3584 3584 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 128, norm2 128.
Using up to 4284M GPU memory.
Starting stage 1 P-1, M60593041, B1 = 1000, B2 = 13320000, fft length = 3584K
Doing 1475 iterations
Running careful round off test for 1000 iterations. If average error > 0.25, the test will restart with a longer FFT.
Iteration 100, average error = 0.01770, max error = 0.02539
Iteration 200, average error = 0.02034, max error = 0.02734
Iteration 300, average error = 0.02122, max error = 0.02734
Iteration 400, average error = 0.02165, max error = 0.02637
Iteration 500, average error = 0.02194, max error = 0.02734
Iteration 600, average error = 0.02210, max error = 0.02686
Iteration 700, average error = 0.02226, max error = 0.02734
Iteration 800, average error = 0.02232, max error = 0.02637
Iteration 900, average error = 0.02238, max error = 0.02637
Iteration 1000, average error = 0.02240 <= 0.25 (max error = 0.02734), continuing test.
M60593041, 0x962b95049cafb7d9, n = 3584K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 0:03
Starting stage 1 gcd.
M60593041 has a factor: 2105528336291622770155712978260232660484461209 (P-1, B1=1000, B2=1000, e=0, n=3584K CUDAPm1 v0.20)
[/code]


fft bench:
[code]
Device GeForce GTX 1080 Ti
Compatibility 6.1
clockRate (MHz) 1607
memClockRate (MHz) 5505

fft max exp ms/iter
1 22133 0.0355
2 43633 0.0390
4 85933 0.0478
32 657719 0.0693
44 898213 0.0791
64 1296011 0.0839
81 1631969 0.0987
96 1927129 0.0989
112 2240863 0.1025
128 2553659 0.1204
160 3176779 0.1251
200 3951977 0.1446
224 4415431 0.1553
256 5031737 0.1925
288 5646379 0.2212
294 5761451 0.2562
320 6259537 0.2708
324 6336103 0.2832
392 7634537 0.3099
400 7786967 0.3304
448 8700169 0.3338
512 9914521 0.3805
576 11125619 0.4453
648 12484649 0.5054
686 13200581 0.5413
800 15343429 0.5486
864 16543493 0.6236
1024 19535569 0.6952
1080 20580341 0.8218
1120 21325891 0.8564
1152 21921901 0.8756
1176 22368691 0.9074
1296 24599717 0.9129
1372 26010389 1.0312
1568 29640913 1.0384
1600 30232693 1.0678
1728 32597297 1.1680
1792 33778141 1.2742
2048 38492887 1.2833
2160 40551479 1.5437
2304 43194913 1.5569
2560 47885689 1.7060
2592 48471289 1.7171
2625 49075057 1.9772
2688 50227213 1.9787
2744 51250889 1.9848
2800 52274087 2.0086
3136 58404433 2.0353
3200 59570449 2.2746
3240 60298969 2.2818
3584 66556463 2.3477
4096 75846319 2.5299
4608 85111207 3.0311
4800 88579669 3.3866
5120 94353877 3.3908
5184 95507747 3.4069
5292 97454309 3.8099
5600 103000823 3.8417
5832 107174381 4.0325
6144 112781477 4.1750
6272 115080019 4.2456
6400 117377567 4.4651
6480 118813021 4.5797
6912 126558077 4.6116
7168 131142761 4.7072
7200 131715607 4.9283
8192 149447533 5.1292
[/code]
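The "max exp" column can be read as the largest exponent each FFT length is expected to handle. As a minimal sketch, assuming the program simply takes the smallest listed length whose max exp is not below the exponent (an assumption, not something checked against the CUDAPm1 source), a few rows from the table above reproduce the 3584K length shown in the M60593041 log. The names `ROWS` and `smallest_fft_for` are made up for illustration.

```python
from bisect import bisect_left

# (fft length in K, max exponent) rows copied from the GTX 1080 Ti table above.
# Only an excerpt around the exponent of interest is included.
ROWS = [(3136, 58404433), (3200, 59570449), (3240, 60298969),
        (3584, 66556463), (4096, 75846319)]

def smallest_fft_for(exponent, rows=ROWS):
    """Return the smallest listed FFT length (in K) whose max exp >= exponent.

    Assumes rows are sorted by max exponent, as they are in the benchmark output.
    """
    max_exps = [m for _, m in rows]
    i = bisect_left(max_exps, exponent)
    if i == len(rows):
        raise ValueError("exponent exceeds every listed FFT length")
    return rows[i][0]

print(smallest_fft_for(60593041))
```

With the full table loaded instead of the excerpt, the same lookup could be used to estimate which row's ms/iter applies to a given exponent.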

kriesel 2018-03-17 16:30

gtx1070 for comparison
 
[QUOTE=VictordeHolland;482568]The current version seems to be working on the GTX1080 Ti with W10 x64 (I didn't do any extensive tests or performance optimizations).[/QUOTE]

Looks like the 1080 Ti is nearly the equal of a pair of GTX1070s.
What's the largest exponent you can successfully run on the 1080 Ti with its 11GB VRAM?
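As a rough check of that impression, the ms/iter figures can be compared at FFT lengths that appear in both the GTX 1080 Ti table above and the GTX 1070 table in this post. This is only a sketch: the dictionary and function names are invented here, and just four shared rows are copied in.

```python
# ms/iter values copied from the two benchmark tables in this thread
# (FFT length in K -> ms/iter). Only a few shared lengths are included.
gtx1080ti = {2048: 1.2833, 3584: 2.3477, 4096: 2.5299, 8192: 5.1292}
gtx1070 = {2048: 2.2744, 3584: 4.2461, 4096: 4.6173, 8192: 9.7002}

def speed_ratios(fast, slow):
    """Return {fft_length: slow_time / fast_time} for lengths in both tables."""
    return {n: slow[n] / fast[n] for n in sorted(fast) if n in slow}

ratios = speed_ratios(gtx1080ti, gtx1070)
for n, r in ratios.items():
    print(f"{n}K: GTX1070 takes {r:.2f}x as long per iteration")
```

For these four rows the ratio stays between 1.7 and 1.9, so a little short of two 1070s per 1080 Ti at these lengths.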

I've run 314M on the 1070 ok, but 628M had problems continuing from the stage 1 gcd or performing it. (I think the former based on GPU-Z indications)

The GTX480's limit was about 290M for stage 2 due to 1.5GB memory size becoming inadequate at nrp=1.

[CODE]Device GeForce GTX 1070
Compatibility 6.1
clockRate (MHz) 1708
memClockRate (MHz) 4004

fft max exp ms/iter
2 43633 0.0606
4 85933 0.0630
8 169409 0.0911
16 333803 0.0913
32 657719 0.0953
64 1296011 0.1109
80 1612249 0.1237
81 1631969 0.1408
96 1927129 0.1428
100 2005673 0.1436
112 2240863 0.1488
120 2397383 0.1716
128 2553659 0.1794
144 2865601 0.1882
160 3176779 0.2148
162 3215629 0.2467
168 3332107 0.2524
200 3951977 0.2622
216 4261051 0.2945
224 4415431 0.2989
225 4434721 0.3248
256 5031737 0.3341
288 5646379 0.3603
320 6259537 0.4237
324 6336103 0.4458
336 6565633 0.5069
392 7634537 0.5102
400 7786967 0.5271
432 8395997 0.5558
448 8700169 0.5791
512 9914521 0.6009
540 10444757 0.7232
576 11125619 0.7246
640 12333809 0.8014
648 12484649 0.8258
672 12936919 0.9232
686 13200581 0.9234
720 13840423 0.9244
800 15343429 0.9298
864 16543493 1.0297
1024 19535569 1.1486
1080 20580341 1.3637
1125 21419011 1.4440
1134 21586693 1.4747
1152 21921901 1.4855
1176 22368691 1.5284
1280 24302527 1.5325
1296 24599717 1.5563
1323 25101101 1.7481
1344 25490893 1.7790
1350 25602229 1.7805
1400 26529691 1.7827
1568 29640913 1.8353
1600 30232693 1.8536
1728 32597297 2.0343
1750 33003301 2.2177
1792 33778141 2.2198
2048 38492887 2.2744
2304 43194913 2.6746
2560 47885689 3.0174
2592 48471289 3.0979
2688 50227213 3.5028
2700 50446621 3.5501
2800 52274087 3.5831
2916 54392209 3.6662
3136 58404433 3.7083
3200 59570449 4.0342
3240 60298969 4.1233
3584 66556463 4.2461
3600 66847171 4.6064
4096 75846319 4.6173
4608 85111207 5.4760
4800 88579669 6.1239
5120 94353877 6.1506
5184 95507747 6.2963
5292 97454309 6.9197
5600 103000823 7.0910
5832 107174381 7.4497
6144 112781477 7.7539
6272 115080019 7.8423
6400 117377567 8.4223
6480 118813021 8.5396
6912 126558077 8.5851
7168 131142761 9.0281
7200 131715607 9.4287
8192 149447533 9.7002
8640 157439981 11.4261
9216 167703023 11.7002
9408 171120919 12.9847
9600 174537299 12.9942
9720 176671801 13.2919
10080 183071879 13.7479
10240 185914837 13.9074
10368 188188471 14.6202
11200 202952693 14.6974
11664 211176269 15.7289
12096 218826341 16.3628
12544 226753511 16.5236
12800 231280639 17.2002
12960 234109067 17.6919
13824 249369863 18.0687
14336 258403573 18.5125
14400 259532291 19.2037
15552 279831199 20.5104
16384 294471259 20.9802
18432 330441847 23.5745
18816 337176443 26.0162
20480 366326371 26.8871
20736 370806323 29.1363
21168 378363589 29.6717
21504 384239189 30.1835
21952 392070229 30.5201
23040 411074273 30.6741
23328 416101459 32.0017
25088 446794913 34.3478
25600 455715121 35.5808
27648 491358173 37.0692
28672 509158127 38.0063
28800 511382147 38.6743
32768 580225813 41.9480
32805 580866907 47.4597
33075 585544397 48.3338
36864 651102253 49.4871
39200 691446799 56.7610
41472 730636397 58.1385
42336 745527179 62.3263
44800 787958201 62.5338
46080 809980289 64.9344
49152 862780273 68.7844
50176 880364279 71.1277
51200 897940567 75.0087
51840 908921869 75.8619
55296 968171579 77.0567
57344 1003244573 78.9115
57600 1007626787 80.0893
65536 1143276383 87.6720
[/CODE]Obtained with, and followed by, something resembling the following (actually run in stages)
[CODE]set exe=cudaPm1_win64_20131118_CUDA_50.exe

set model=GeForce GTX 1070
set ntimes=2
set dev=0

:some gpus can't do the whole span, so are run in portions to obtain some fft results
%exe% -d %dev% -cufftbench 1 32768 1 >>cudapm1start.txt
rename "%model% fft.txt" "%model% fft save.txt"
if errorlevel 1 goto skip
%exe% -d %dev% -cufftbench 32768 65536 1 >>cudapm1start.txt
for %%a in ( 4096 5120 6144 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 4608 4800 5184 5292 5600 5832 6272 6400 6480 6912 7168 7200 8192 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 8640 9216 9408 9600 9720 10080 10240 10368 11200 11664 12096 12544 12800 12960 13824 14336 14400 15552 16384 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 18432 18816 20480 20736 21168 21504 21952 23040 23328 25088 25600 27648 28672 28800 32768 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
:>32m-64M
for %%a in ( 32805 33075 36864 39200 41472 42336 44800 46080 49152 50176 51200 51840 55296 57344 57600 65536 ) do %exe% -d %dev% -cufftbench %%a %%a %ntimes% >>cudapm1start.txt
[/CODE]

kriesel 2018-03-26 16:52

highest exponents successfully run? Issues seen on high exponents?
 
What are the highest exponents you've successfully run in CUDAPm1 through stage 1 including gcd?
Through both stage 1 and stage 2 including gcds?
What hardware was it run on?

If a run failed on a high exponent, what issues were seen?

kriesel 2018-04-25 12:42

Manually reported P-1 results are getting marked as expired assignments
 
FYI: more at [url]http://www.mersenneforum.org/showpost.php?p=486151&postcount=1499[/url]

kriesel 2018-05-27 16:46

Improved recovery from Windows TDRs on old gpus
 
See the detailed writeup at [URL]http://www.mersenneforum.org/showpost.php?p=488288&postcount=37[/URL]

storm5510 2018-05-27 23:57

[QUOTE=kriesel;488460]See the detailed writeup at [URL]http://www.mersenneforum.org/showpost.php?p=488288&postcount=37[/URL][/QUOTE]

I've run it on Windows 10 x64 v1709 and the versions that came before, with no issues on any of them. Now, MS is pushing 1803 at everyone. I had a couple of unrelated applications that would no longer function after the update. This, I have not tried, but will.

storm5510 2018-05-28 00:18

Here is a Windows 10 x64 v1803 Benchmark:

[QUOTE]Device GeForce GTX 1080
Compatibility 6.1
clockRate (MHz) 1835
memClockRate (MHz) 5005

fft max exp ms/iter
1 22133 0.0208
2 43633 0.0279
4 85933 0.0427
32 657719 0.0448
36 738083 0.0618
64 1296011 0.0674
72 1454273 0.0805
80 1612249 0.0871
96 1927129 0.1100
100 2005673 0.1170
108 2162543 0.1219
112 2240863 0.1229
128 2553659 0.1298
144 2865601 0.1413
160 3176779 0.1476
162 3215629 0.1908
200 3951977 0.2008
208 4106587 0.2418
216 4261051 0.2467
225 4434721 0.2572
256 5031737 0.2673
288 5646379 0.3193
320 6259537 0.3488
324 6336103 0.3624
392 7634537 0.4049
400 7786967 0.4346
432 8395997 0.4580
448 8700169 0.4709
512 9914521 0.5011
576 11125619 0.6026
648 12484649 0.6723
686 13200581 0.7345
800 15343429 0.7485
864 16543493 0.8335
1024 19535569 0.9290
1080 20580341 1.1098
1120 21325891 1.1732
1125 21419011 1.1940
1152 21921901 1.2013
1176 22368691 1.2195
1296 24599717 1.2244
1372 26010389 1.4071
1568 29640913 1.4139
1600 30232693 1.4501
1728 32597297 1.5729
1792 33778141 1.7635
2048 38492887 1.7638
2160 40551479 2.1364
2304 43194913 2.1590
2592 48471289 2.3442
2700 50446621 2.7283
2744 51250889 2.7442
3136 58404433 2.7904
3200 59570449 3.1828
3240 60298969 3.2006
3584 66556463 3.2837
4096 75846319 3.5004
4608 85111207 4.2431
5184 95507747 4.7036
5292 97454309 5.3005
5600 103000823 5.4040
5832 107174381 5.6629
6048 111056879 5.8718
6144 112781477 5.9137
6272 115080019 5.9963
6400 117377567 6.2128
6480 118813021 6.4584
6912 126558077 6.5339
7168 131142761 6.6882
7200 131715607 6.9528
8192 149447533 7.1364
9216 167703023 8.5473
9408 171120919 9.5003
9600 174537299 9.6540
9604 174608443 9.9670
9720 176671801 10.1752
9800 178094491 10.2449
10080 183071879 10.4060
10240 185914837 10.4187
10368 188188471 10.9104
11200 202952693 10.9833
11664 211176269 11.5556
12096 218826341 11.9058
12544 226753511 12.3132
12800 231280639 12.6450
12960 234109067 13.1032
13824 249369863 13.3780
14336 258403573 13.6714
14400 259532291 14.0879
16384 294471259 15.1884[/QUOTE]

kriesel 2018-06-05 03:16

Reference Material
 
I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content. (G-rated only;)
General interest gpu related reference material [URL]http://www.mersenneforum.org/showthread.php?t=23371[/URL]
CUDAPm1 P-1 factoring with CUDA on gpus [URL]http://www.mersenneforum.org/showthread.php?t=23389[/URL]
Future updates to material previously posted in this thread will probably occur on the blog threads and not here. Having in-place update without a time limit makes it more manageable there.

kriesel 2018-06-23 22:58

P-1 stage 2 residues not reproducing
 
CUDAPm1 gives 64-bit residues in stage 1 and stage 2. I thought they would reproduce. So look at this. First run, start to finish on a GTX1060, gave in part,
[CODE]Iteration 4050000 M425000083, 0x45bcabd2d9a7a6f7, n = 24192K, CUDAPm1 v0.20 err = 0.26563 (44:19 real, 53.1666 ms/iter, ETA 42:48)
M425000083, 0x03b1ecbe222d57ae, n = 24192K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 60:43:07
Starting stage 1 gcd.
M425000083 Stage 1 found no factor (P-1, B1=2840000, B2=34080000, e=0, n=24192K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 2840000, b2 = 34080000, d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 1 - 4 of 480 relative primes.
Inititalizing pass... done. transforms: 202, err = 0.25000, (5.29 real, 26.1963 ms/tran, ETA NA)
Transforms: 53864 M425000083, 0x240cabc495e881a9, n = 24192K, CUDAPm1 v0.20 err = 0.25000 (25:05 real, 27.9513 ms/tran, ETA 50:04:27)

Processing 5 - 8 of 480 relative primes.
Inititalizing pass... done. transforms: 233, err = 0.21250, (6.80 real, 29.1806 ms/tran, ETA 50:06:05)
Transforms: 54016 M425000083, 0x4113fb6410f7f0d9, n = 24192K, CUDAPm1 v0.20 err = 0.24219 (25:11 real, 27.9686 ms/tran, ETA 49:41:43)

Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20703, (6.93 real, 29.4773 ms/tran, ETA 49:42:03)
Transforms: 54058 M425000083, 0x1f056d902f5168a7, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (25:12 real, 27.9701 ms/tran, ETA 49:17:12)

Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 245, err = 0.20703, (7.19 real, 29.3381 ms/tran, ETA 49:17:15)
Transforms: 54030 M425000083, 0x8bec2d947e1fb288, n = 24192K, CUDAPm1 v0.20 err = 0.24219 (25:11 real, 27.9701 ms/tran, ETA 48:52:07)

Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 251, err = 0.20703, (7.33 real, 29.1833 ms/tran, ETA 48:52:11)
Transforms: 54092 M425000083, 0x896d2f455b59709a, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (25:13 real, 27.9710 ms/tran, ETA 48:26:51)
[/CODE]After it completed, for testing purposes, I copied an early stage 2 interim save file from the 1060 savefile folder, renamed it to a checkfile-type name, and found I also needed a stage 1 file there, or the run would start over from scratch. I put both in the work folder for a GTX1050Ti run and made a corresponding worktodo entry. On the GTX1050Ti I got the following; the stage 2 residues don't match for the same nrp groups: 9-12 on the 1050Ti doesn't match 9-12 on the 1060, etc.
[CODE]on gtx1050ti, from an early stage 2 save file from a GTX1060:
Starting stage 2.
Using b1 = 2840000, b2 = 34080000, d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20313, (9.42 real, 40.0848 ms/tran, ETA 49:43:53)
Transforms: 54058 M425000083, 0xed32e096fa463f09, n = 24192K, CUDAPm1 v0.20 err = 0.25439 (38:33 real, 42.7948 ms/tran, ETA 57:59:19)

Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 245, err = 0.21624, (10.63 real, 43.3778 ms/tran, ETA 58:01:16)
Transforms: 54030 M425000083, 0xb1a4e401a42c9b89, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (38:33 real, 42.8108 ms/tran, ETA 61:49:54)

Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 251, err = 0.20313, (10.92 real, 43.5090 ms/tran, ETA 61:50:48)
Transforms: 54092 M425000083, 0xf7400d0435b23338, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (38:35 real, 42.8107 ms/tran, ETA 63:52:09)
[/CODE]Exponent, b1, b2, d, e, nrp, zeros, ones, and pairs are all the same. The two runs have all of stage 1, the gcd, and nrp 1-8 of stage 2 in common.

9-12 nrp residues and roundoffs differ, between the gtx1060 and gtx1050Ti. Roundoffs are close and at acceptable levels.
13-16 nrp residues and roundoffs differ also.
17-20 nrp residues and roundoffs differ also. Minor differences in roundoff don't concern me; differing residues do. The runs are both CUDAPm1 V0.20 64-bit CUDA 5.5 for Windows, on different host systems of the same model, with the same OS version but a different gpu model.
Maybe I got the wrong stage one file, not quite finished, and that threw it off somehow? Ideas?
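For anyone wanting to repeat this comparison on longer logs, here is a minimal sketch that pulls the interim stage 2 residues out of two captured logs and reports the relative-prime groups where they differ. The regular expressions are written against the log lines shown above; the function names are hypothetical and nothing here comes from CUDAPm1 itself.

```python
import re

# Matches lines such as "Processing 9 - 12 of 480 relative primes."
GROUP_RE = re.compile(r"Processing (\d+) - (\d+) of \d+ relative primes")
# Matches the 64-bit interim residue on a "Transforms: ..." line.
RESIDUE_RE = re.compile(r"^Transforms: \d+ M\d+, (0x[0-9a-f]{16}),")

def stage2_residues(log_text):
    """Map (first, last) relative-prime group -> interim residue string."""
    residues, group = {}, None
    for line in log_text.splitlines():
        m = GROUP_RE.search(line)
        if m:
            group = (int(m.group(1)), int(m.group(2)))
            continue
        m = RESIDUE_RE.match(line.strip())
        if m and group is not None:
            residues[group] = m.group(1)
    return residues

def mismatches(log_a, log_b):
    """Return groups present in both logs whose residues differ."""
    a, b = stage2_residues(log_a), stage2_residues(log_b)
    return {g: (a[g], b[g]) for g in sorted(a.keys() & b.keys()) if a[g] != b[g]}
```

Feeding it the two excerpts above would flag groups 9-12, 13-16 and 17-20, which are the only groups the two excerpts share.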

ET_ 2018-06-24 16:17

[QUOTE=kriesel;490384]CUDAPm1 gives 64-bit residues in stage 1 and stage 2. I thought they would reproduce. [...] Maybe I got the wrong stage one file, not quite finished, and that threw it off somehow? Ideas?[/QUOTE]

IIRC, CUDAPm1 used the available memory and the type of GPU to define the optimal magic numbers for P-1 (b1, b2, d, e, nrp). I didn't look at the code, but I guesstimate that the stage 2 residue to start from was not compatible, or not correctly reshaped, for the GTX 1050Ti.

kriesel 2018-06-24 17:37

[QUOTE=ET_;490434]IIRC, CUDAPm1 used the available memory and the type of GPU to define the optimal magic numbers for P-1 (b1, b2, d, e, nrp). I didn't look at the code, but I guesstimate that the stage 2 residue to start from was not compatible, or not correctly reshaped, for the GTX 1050Ti.[/QUOTE]
I had thought that it would be safe to go from a small-memory gpu to a new larger-memory gpu; more than adequate room to run the bounds, d, e, nrp combination that fit on the more restricted memory gpu. A gpu with more memory would, from a fresh start, probably select bigger bounds to take advantage of the roomier memory on the second gpu, but I think that's an optimization, not a requirement.

The other way around, trying to run something started on a more-memory gpu and transplanted to a less-memory gpu, is likely to fail in stage 2 or perhaps even in stage 1 due to what you describe. The author said so in [url]http://www.mersenneforum.org/showpost.php?p=359086&postcount=421[/url]. I've also found cases where, start to finish on one gpu, the program selects bounds for stage 2 that have no hope of running to successful completion, requiring gigabytes more memory than is available on the gpu on which those bounds get selected by the program. But neither of those corresponds to the case I posted about here.

Looking at read_checkpoint_packed, and other routines, to write a script to export CUDAPm1 savefiles to neutral exchange format, I did not see anything other than these parameters (nothing explicit about how many ROPs or shaders or whatever the gpu had or must have, nor how much memory).
The residue is a word stream, a pretty simple shape. I had the impression the entire save file is 4-byte unsigned integers. (times in seconds). That checked out with the total savefile size, to the byte as I recall.

[CODE]# fread (x_packed, 1, sizeof (unsigned) * (end + 25) , fPtr)
# x_packed[end] = q;
# x_packed[end + 1] = 0; // n
# x_packed[end + 2] = 1; // iteration number
# x_packed[end + 3] = 0; // stage
# x_packed[end + 4] = 0; // accumulated time
# x_packed[end + 5] = 0; // b1
# // 6-9 reserved for extending b1
# // 10-24 reserved for stage 2
# x_packed[end + 10] = b2;
# x_packed[end + 11] = d;
# x_packed[end + 12] = e;
# x_packed[end + 13] = nrp;
# x_packed[end + 14] = 0; // m = number of relative primes already finished
# x_packed[end + 15] = 0; // k = how far done with current crop of relative primes
# x_packed[end + 16] = 0; // t = where to find next relative prime in the bit array
# x_packed[end + 17] = 0; // extra initialization transforms from starting in the middle of a pass
# x_packed[end + 18] = itran_done;
# x_packed[end + 19] = ptran_done + num_tran;
# x_packed[end + 20] = itime;
# x_packed[end + 21] = ptime;
#22-24?[/CODE]The words 0 to end-1 are x_packed. The rest is scalars which the export program I created claims are as follows. Note, these might be from an earlier file than the one I used.
[CODE]Format Mersenne Neutral Exchange d0.4
FileOrigin "CUDAPm1export for Windows" "V0.1 2018-06-23" c425000083s2. 2018 Jun 23 20:52:21 UTC
Type P-1 stage 2
Exponent 425000083
Iteration 4098308
N 24772608
AccumulatedTime 216019
B1 2840000
Reserved6 0
Reserved7 0
Reserved8 0
Reserved9 0
B2 34080000
D 2310
E 2
NRP 4
M 8
K 1229
T 8
Midpasstransforms 0
Itran_done 435
PtrandonePlusNumtran 107880
Itime 12
Ptime 3016
Reserved22 0
Reserved23 0
Reserved24 0
DataFormat binary bytes
CRC32 0x07291d0b
DataBinaryByteCount 53125012
EndOfHeader[/CODE]I see nothing gpu-specific there; no rops or shaders counts, not even choices of thread counts for the 3 phases of the computation.
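To make the layout concrete, here is a minimal Python sketch that splits a savefile into the packed residue and the 25 trailing scalar words, following the comments quoted above. It assumes little-endian 4-byte unsigned words throughout (plausible for files written on x86 Windows, but not verified here), and the field names are labels chosen for this sketch rather than anything defined by CUDAPm1.

```python
import struct

TRAILER_WORDS = 25  # scalar words that follow the packed residue

# Names for trailer words 0..24, following the comments quoted above.
FIELDS = ["q", "n", "iteration", "stage", "accumulated_time", "b1",
          "reserved6", "reserved7", "reserved8", "reserved9",
          "b2", "d", "e", "nrp", "m", "k", "t", "midpass_transforms",
          "itran_done", "ptran_done_plus_num_tran", "itime", "ptime",
          "reserved22", "reserved23", "reserved24"]

def residue_words(q):
    """Number of 32-bit words needed to hold q bits."""
    return (q + 31) // 32

def parse_trailer(data):
    """Split raw savefile bytes into (residue_bytes, dict of trailer scalars).

    Assumes little-endian 4-byte unsigned words throughout.
    """
    if len(data) < 4 * TRAILER_WORDS or len(data) % 4:
        raise ValueError("unexpected savefile size")
    split = len(data) - 4 * TRAILER_WORDS
    values = struct.unpack("<%dI" % TRAILER_WORDS, data[split:])
    return data[:split], dict(zip(FIELDS, values))
```

As a consistency check, `residue_words(425000083) * 4` is 53125012, which agrees with the DataBinaryByteCount in the export header above.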

kriesel 2018-06-25 05:42

prime95 P-1 bug since fixed. Is it present in CUDAPm1?
 
[URL]http://www.mersenneforum.org/showthread.php?t=22776[/URL] shows an issue with prime95 P-1 stage 1 computations, since fixed. Looking at old prime95 source code shows it was present at least back to prime95 v28.5 source, dated 2014, & perhaps earlier, though the code in v27.7, dated 2012, is different. This does not rule out it being present in prime95 P-1 at the time CUDAPm1 was developed, in 2013 (February to November). Since CUDAPm1 development relied on reference to prime95's code and followed it, and CUDAPm1 development and maintenance ended well before the issue was found and fixed in prime95, the issue might also be present in the currently available versions of CUDAPm1.

kriesel 2018-06-27 13:35

B2 reported may not match B2 used
 
In CUDAPm1 v0.20, if a run is continued on a gpu with more memory than it was started on, new bounds are calculated and then the program indicates it will continue with the bounds in the save file. After the run is completed, the result record contains the B2 found from the selection calculation, not the value from the save file that the program indicates was used. Example log excerpts follow.


Using threads: norm1 512, mult 256, norm2 128.
Stage 2 checkpoint found.
Using up to 3780M GPU memory.
Selected B1=3100000, B2=[B]62000000[/B], 3.18% chance of finding a factor
Using B1 = 2840000 from savefile.
Continuing stage 2 from a partial result of M425000083 fft length = 24192K
Starting stage 2.
[B]Using b1 = 2840000, b2 = 34080000[/B], d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20313, (9.42 real, 40.0848 ms/tran, ETA 49:43:53)
Transforms: 54058 M425000083, 0xed32e096fa463f09, n = 24192K, CUDAPm1 v0.20 err = 0.25439 (38:33 real, 42.7948 ms/tran, ETA 57:59:19)
...

Processing 477 - 480 of 480 relative primes.
Inititalizing pass... done. transforms: 299, err = 0.21094, (12.89 real, 43.1271 ms/tran, ETA 39:01)
Transforms: 53916 M425000083, 0x7efe91810f60cfa3, n = 24192K, CUDAPm1 v0.20 err = 0.25000 (38:28 real, 42.8098 ms/tran, ETA 0:46)

Stage 2 complete, 6506485 transforms, estimated total time = 76:55:59
Starting stage 2 gcd.
M425000083 Stage 2 found no factor (P-1, B1=2840000, B2=[B]62000000[/B], e=2, n=24192K CUDAPm1 v0.20)
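A quick way to spot this in saved console output is to compare the B2 on the "Using b1 = ..., b2 = ..." line with the B2 in the stage 2 result line. The sketch below is only an illustration written against the excerpt above: the function names are hypothetical, it strips the [B] tags that were added for emphasis in this post, and it only handles the "Stage 2 found no factor" form of the result line.

```python
import re

def strip_bbcode(text):
    """Remove [B]...[/B] style tags that were added for emphasis in the post."""
    return re.sub(r"\[/?B\]", "", text)

def b2_used_and_reported(log_text):
    """Return (B2 from the 'Using b1 = ..., b2 = ...' line, B2 from the result line).

    Returns None if either line is missing from the log text.
    """
    text = strip_bbcode(log_text)
    used = re.search(r"Using b1 = \d+, b2 = (\d+)", text)
    reported = re.search(r"Stage 2 found .*\(P-1, B1=\d+, B2=(\d+)", text)
    if not used or not reported:
        return None
    return int(used.group(1)), int(reported.group(1))
```

If the two numbers differ, as they do for the run above (34080000 used, 62000000 reported), the result record should probably be corrected by hand before it is submitted.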

kriesel 2018-07-05 16:24

new to me GPU, new CUDAPm1 behavior seen
 
Based on what a 2GB Quadro 4000 and a 3GB GTX 1060 can run, I thought a 2.5GB Quadro 5000 (which is CC 2.0) would also be able to run exponents up to 300M, perhaps higher, in the CUDAPm1 v0.20 x64 CUDA 5.5 20130923 version. It passed a memory test and correctly found the factor for M50001781.

But it failed to run stage 2 on [CODE]M87771547, 0xf6c7342f2bab37fa, n = 5040K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 3:35:59
Starting stage 1 gcd.
M87771547 Stage 1 found no factor (P-1, B1=755000, B2=17365000, e=0, n=5040K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 755000, b2 = 17365000, d = 2310, e = 2, nrp = 48
Zeros: 785147, Ones: 880453, Pairs: 172236
Processing 1 - 48 of 480 relative primes.
Inititalizing pass... )

Quitting, estimated time spent = 0:03
[/CODE]With repeated restarts, this was repeatably quitting after a few seconds of stage 2 with no reason given.

Same thing occurs on [CODE]M200000491, 0x8ef21dc89a0b7d8c, n = 11250K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 16:20:44
Starting stage 1 gcd.
M200000491 Stage 1 found no factor (P-1, B1=1540000, B2=32340000, e=0, n=11250K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1540000, b2 = 32340000, d = 2310, e = 2, nrp = 16
Zeros: 1515937, Ones: 1585823, Pairs: 290684
Processing 1 - 16 of 480 relative primes.
Inititalizing pass... )

Quitting, estimated time spent = 0:03[/CODE]Again, repeated restarts produce "Quitting" after a few seconds. I'm trying a few other exponents. But for now, CUDAPm1 on this model GPU appears incapable of running stage 2 P-1 at exponents of current or future interest (p>88M), for some unknown reason.

The test exponent 50001781 ran with threads norm1 512, mult 256, norm2 128 and fft length 2688K, neither of which appears in the fft file or threads file.

Applicable threads file entries are: (88M)
5040 64 64 32 11.5743

and (200M)
11250 128 64 1024 26.5149

On retrying with 5040 128 64 32 in the threads file, per [URL="http://www.mersenneforum.org/showpost.php?p=359096&postcount=424"]http://www.mersenneforum.org/showpost.php?p=359096&postcount=424[/URL], the 88M exponent progresses.

Any ideas what to do to get M200M running stage 2 successfully?

Are there any CUDA55 or higher executables available with the 20131118 or later code fixes, for Windows?

kriesel 2018-07-19 02:13

new behavior: 16 stage 2 residue values taking turns
 
anomalous Quadro 5000 m350000071 cudapm1 V0.20 20130923 CUDA 5.5 on Windows, interim stage 2 residues:

After a normal-looking stage 1, the 120 residues output in stage 2 at NRP=4 are repetitive, drawn from a very limited set of 16 values,
listed below in ascending order, which look suspicious in their regularity. (I'm used to runs with pseudorandom-looking stage 1 and stage 2 residues.) This exponent/gpu combination had seemingly well-behaved stage 1 residues but peculiarities throughout stage 2.
[CODE]
_____8___4___2___1 difference appearing in the respective bit positions
0xfff7fffbfffdfffe
0xfff7fffbfffdffff

0xfff7fffbfffffffe
0xfff7fffbffffffff

0xfff7fffffffdfffe
0xfff7fffffffdffff

0xfff7fffffffffffe
0xfff7ffffffffffff


0xfffffffbfffdfffe
0xfffffffbfffdffff

0xfffffffbfffffffe
0xfffffffbffffffff

0xfffffffffffdfffe
0xfffffffffffdffff

0xfffffffffffffffe
0xffffffffffffffff[/CODE]End of stage 1 and beginning of stage 2 looked normal. Stage 2 was using 1863MB of 2.5GB on the gpu. At stage 2 wrapup/gcd, it dropped to 746MB.
[CODE]
Iteration 3650000 M350000071, 0xfa26579b34919a34, n = 20412K, CUDAPm1 v0.20 err = 0.12109 (20:01 real, 48.0195 ms/iter, ETA 22:37)
Iteration 3675000 M350000071, 0x3ca8420d52bd5a27, n = 20412K, CUDAPm1 v0.20 err = 0.11719 (20:01 real, 48.0155 ms/iter, ETA 2:37)
M350000071, 0x509e08b93355b407, n = 20412K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 49:05:07
Starting stage 1 gcd.
M350000071 Stage 1 found no factor (P-1, B1=2550000, B2=31875000, e=0, n=20412K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 2550000, b2 = 31875000, d = 2310, e = 2, nrp = 4
Zeros: 1527348, Ones: 1520172, Pairs: 260423
Processing 1 - 4 of 480 relative primes.
Inititalizing pass... done. transforms: 198, err = 0.11328, (4.77 real, 24.0679 ms/tran, ETA NA)
Transforms: 50660 M350000071, 0xfffffffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.11328 (21:53 real, 25.9248 ms/tran, ETA 43:39:45)

Processing 5 - 8 of 480 relative primes.
Inititalizing pass... done. transforms: 229, err = 0.10547, (5.98 real, 26.1210 ms/tran, ETA 43:42:27)
Transforms: 50812 M350000071, 0xfff7fffbfffdffff, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9243 ms/tran, ETA 43:19:29)

Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 231, err = 0.10547, (5.99 real, 25.9324 ms/tran, ETA 43:20:31)
Transforms: 50810 M350000071, 0xfff7fffffffdffff, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9239 ms/tran, ETA 42:57:55)

Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 241, err = 0.10547, (6.24 real, 25.8988 ms/tran, ETA 42:58:31)
Transforms: 50762 M350000071, 0xfff7fffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:56 real, 25.9241 ms/tran, ETA 42:35:58)

Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 247, err = 0.10547, (6.40 real, 25.9017 ms/tran, ETA 42:36:30)
Transforms: 50814 M350000071, 0xfffffffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9239 ms/tran, ETA 42:14:22)
[/CODE]Etc. It concluded with a result line reporting that no factor was found.
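
The interim stage 2 residues in the log above are almost entirely 1 bits (0xfffffffbfffffffe and similar), while the stage 1 residues look random. A minimal sketch of an automated check for that pattern follows. The function names and the 56-bit threshold are assumptions made for illustration; nothing here comes from CUDAPm1 itself.

```python
import re

# Matches the 64-bit interim residue printed in CUDAPm1 log lines, e.g. "0xfffffffbfffffffe".
RESIDUE_RE = re.compile(r"0x([0-9a-fA-F]{16})\b")


def popcount64(residue_hex):
    """Number of 1 bits in a 16-hex-digit residue string."""
    return bin(int(residue_hex, 16)).count("1")


def suspicious_residues(log_lines, min_ones=56):
    """Return residues whose bit count is implausibly high for a random value.

    A healthy residue should have roughly half of its 64 bits set.
    min_ones is a hypothetical threshold chosen for this sketch.
    """
    flagged = []
    for line in log_lines:
        match = RESIDUE_RE.search(line)
        if match and popcount64(match.group(1)) >= min_ones:
            flagged.append(match.group(1))
    return flagged
```

Feeding it the `Transforms:` lines from the Quadro 5000 log should flag every one of them, while the stage 1 `Iteration` lines pass.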

kriesel 2018-07-19 02:17

[QUOTE=kriesel;491191]
Same thing occurs on [CODE]M200000491, 0x8ef21dc89a0b7d8c, n = 11250K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 16:20:44
Starting stage 1 gcd.
M200000491 Stage 1 found no factor (P-1, B1=1540000, B2=32340000, e=0, n=11250K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1540000, b2 = 32340000, d = 2310, e = 2, nrp = 16
Zeros: 1515937, Ones: 1585823, Pairs: 290684
Processing 1 - 16 of 480 relative primes.
Inititalizing pass... )

Quitting, estimated time spent = 0:03[/CODE] (200M)
11250 128 64 1024 26.5149

Any ideas what to do to get M200M running stage 2 successfully?
[/QUOTE]
Doubling norm1 for the 11250K FFT length worked for the 200M exponent.

kriesel 2018-07-19 18:00

[QUOTE=kriesel;492089]anomalous Quadro 5000 m350000071 cudapm1 V0.20 20130923 CUDA 5.5 on Windows, interim stage 2 residues:
End of stage 1 and beginning of stage 2 looked normal. Stage 2 was using 1863MB of 2.5GB on the gpu. At stage 2 wrapup/gcd, it dropped to 746MB.
[CODE]
Iteration 3650000 M350000071, 0xfa26579b34919a34, n = 20412K, CUDAPm1 v0.20 err = 0.12109 (20:01 real, 48.0195 ms/iter, ETA 22:37)
Iteration 3675000 M350000071, 0x3ca8420d52bd5a27, n = 20412K, CUDAPm1 v0.20 err = 0.11719 (20:01 real, 48.0155 ms/iter, ETA 2:37)
M350000071, 0x509e08b93355b407, n = 20412K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 49:05:07
Starting stage 1 gcd.
M350000071 Stage 1 found no factor (P-1, B1=2550000, B2=31875000, e=0, n=20412K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 2550000, b2 = 31875000, d = 2310, e = 2, nrp = 4
Zeros: 1527348, Ones: 1520172, Pairs: 260423
Processing 1 - 4 of 480 relative primes.
Inititalizing pass... done. transforms: 198, err = 0.11328, (4.77 real, 24.0679 ms/tran, ETA NA)
Transforms: 50660 M350000071, 0xfffffffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.11328 (21:53 real, 25.9248 ms/tran, ETA 43:39:45)

Processing 5 - 8 of 480 relative primes.
Inititalizing pass... done. transforms: 229, err = 0.10547, (5.98 real, 26.1210 ms/tran, ETA 43:42:27)
Transforms: 50812 M350000071, 0xfff7fffbfffdffff, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9243 ms/tran, ETA 43:19:29)

Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 231, err = 0.10547, (5.99 real, 25.9324 ms/tran, ETA 43:20:31)
Transforms: 50810 M350000071, 0xfff7fffffffdffff, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9239 ms/tran, ETA 42:57:55)

Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 241, err = 0.10547, (6.24 real, 25.8988 ms/tran, ETA 42:58:31)
Transforms: 50762 M350000071, 0xfff7fffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:56 real, 25.9241 ms/tran, ETA 42:35:58)

Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 247, err = 0.10547, (6.40 real, 25.9017 ms/tran, ETA 42:36:30)
Transforms: 50814 M350000071, 0xfffffffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9239 ms/tran, ETA 42:14:22)
[/CODE]Etc. It concluded with a result line reporting that no factor was found.[/QUOTE]


As a test, I repeated part of stage 2 from very early on, using a GTX1050Ti, and got the following:
[CODE]Continuing stage 2 from a partial result of M350000071 fft length = 20412K
Starting stage 2.
Using b1 = 2550000, b2 = 31875000, d = 2310, e = 2, nrp = 4
Zeros: 1527348, Ones: 1520172, Pairs: 260423
Processing 5 - 8 of 480 relative primes.
Inititalizing pass... done. transforms: 229, err = 0.10156, (7.87 real, 34.3738 ms/tran, ETA 43:43:20)
Transforms: 50812 M350000071, 0x45dfef64c039aeff, n = 20412K, CUDAPm1 v0.20 err = 0.11133 (31:04 real, 36.6751 ms/tran, ETA 52:17:38)

Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 231, err = 0.10156, (8.50 real, 36.7856 ms/tran, ETA 52:20:27)
SIGINT caught, writing checkpoint.
Transforms: 3300 M350000071, 0x8eb67bcffa00c096, n = 20412K, CUDAPm1 v0.20 err = 0.10938 (2:02 real, 36.8244 ms/tran, ETA 52:35:45)

Quitting, estimated time spent = 55:18
[/CODE]So I concluded that probably every bit of stage 2 on the Quadro 5000 was wrong, and restarted stage 2 from directly after the stage 1 gcd on the GTX1050Ti. That caused it to select different parameters because of the greater available GPU memory:

[CODE]Using b1 = 2550000, b2 = 62188750, d = 4620, e = 2, nrp = 14
Zeros: 3004088, Ones: 2961352, Pairs: 536666
Processing 1 - 14 of 960 relative primes.[/CODE]
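
The "480 relative primes" for d = 2310 and "960 relative primes" for d = 4620 in these logs match the count of integers below d that are coprime to d (Euler's totient). The sketch below only checks that arithmetic. That CUDAPm1 derives the figure this way, and that the number of passes is the count divided by nrp and rounded up, are assumptions inferred from the log lines, and the function names are hypothetical.

```python
from math import ceil, gcd


def relative_primes(d):
    """Count the integers in [1, d) that are coprime to d (Euler's totient for d > 1)."""
    return sum(1 for k in range(1, d) if gcd(k, d) == 1)


def stage2_passes(d, nrp):
    """Passes needed if each pass handles nrp relative primes (assumed from the logs)."""
    return ceil(relative_primes(d) / nrp)


print(relative_primes(2310), stage2_passes(2310, 4))    # Quadro 5000 run: d = 2310, nrp = 4
print(relative_primes(4620), stage2_passes(4620, 14))   # GTX1050Ti run: d = 4620, nrp = 14
```

Under those assumptions the Quadro 5000 run needs 120 passes. At roughly 22 minutes per pass that is about 44 hours, in line with the 43:39:45 ETA in its log.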

kriesel 2018-08-02 20:01

comments in worktodo for CUDAPm1!
 
While looking for something else, I stumbled across this:
The source of parse.c indicates that # or \\ or / are comment characters, marking the rest of a line as a comment.

I've confirmed by test that # and \\ work; / did not. My test placed each marker at the beginning of a record, and the line numbers in the warning messages showed which ones did not work.
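
For anyone preprocessing worktodo files outside CUDAPm1, the behavior described above can be sketched as follows. This is not the parse.c logic, only an illustration that treats the two markers confirmed by test (# and a double backslash) as starting a comment. The entry names in the usage example are placeholders, and handling of markers in the middle of a record is an assumption based on the "rest of a line" wording, since the test only placed them at the start.

```python
# Markers confirmed by test in the post above: '#' and two backslashes.
COMMENT_MARKERS = ("#", "\\\\")


def strip_comment(line):
    """Drop everything from the first comment marker onward, then trim whitespace."""
    cut = len(line)
    for marker in COMMENT_MARKERS:
        pos = line.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    return line[:cut].strip()


def active_entries(lines):
    """Return the non-empty records left after comments are removed."""
    stripped = (strip_comment(line) for line in lines)
    return [entry for entry in stripped if entry]


sample = [
    "# whole-line comment",
    "ENTRY_A",                   # placeholder record, not a real worktodo line
    "\\\\ another comment",
    "ENTRY_B  # trailing note",
]
print(active_entries(sample))
```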

kriesel 2018-08-19 18:41

How to build CUDAPm1 for current CUDA levels, development tools available?
 
Yes, there's a makefile, from 2013, for Linux. But my experience is that makefiles that work in Linux don't work in Windows, even in the msys2/mingw64 environment, either as is or with what look like merited edits. Also, NVIDIA's compiler nvcc wants Visual Studio, not g++. And a lot has changed in the 5 years since Windows executables were last posted.

Presumably various paths would need to be updated for the different OS, different CUDA toolkit version, different C++ compiler version, etc.
Also, CUFLAGS would need to be updated to add newly available compute levels, and probably to drop old ones no longer supported by nvcc.

And the CUDAPm1 makefile contains:
L = -lcufft -lcudart -lm -lgmp

Presumably gmp is that of [URL]https://gmplib.org/[/URL] which I've installed.

And the m is this? [URL]https://stackoverflow.com/questions/1033898/why-do-you-have-to-link-the-math-library-in-c[/URL]
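
As a rough illustration of the CUFLAGS update mentioned above, here is a sketch that builds nvcc `-gencode` options for a list of compute capabilities and skips any below a minimum. The function name and the default cutoff of 3.0 are assumptions for illustration. The actual contents of the CUDAPm1 makefile's CUFLAGS are not shown in this thread, and the right cutoff depends on the toolkit version in use.

```python
def gencode_flags(compute_capabilities, min_supported=(3, 0)):
    """Build nvcc -gencode options for each (major, minor) compute capability.

    Capabilities below min_supported are skipped. The default of (3, 0) is
    an assumption for this sketch; check the release notes of your toolkit.
    """
    flags = []
    for major, minor in sorted(set(compute_capabilities)):
        if (major, minor) < min_supported:
            continue  # this toolkit can no longer target the architecture
        arch = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
    return flags


# Example: an old 2.0 card is dropped, 3.5 and 6.1 cards are kept.
print(" ".join(gencode_flags([(2, 0), (3, 5), (6, 1)])))
```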

henryzz 2018-08-20 15:19

[QUOTE=kriesel;494224]Yes, there's a makefile, from 2013, for Linux. But my experience is that makefiles that work in Linux don't work in Windows, even in the msys2/mingw64 environment, either as is or with what look like merited edits. Also, NVIDIA's compiler nvcc wants Visual Studio, not g++. And a lot has changed in the 5 years since Windows executables were last posted.

Presumably various paths would need to be updated for the different OS, different CUDA toolkit version, different C++ compiler version, etc.
Also, CUFLAGS would need to be updated to add newly available compute levels, and probably to drop old ones no longer supported by nvcc.

And the CUDAPm1 makefile contains:
L = -lcufft -lcudart -lm -lgmp

Presumably gmp is that of [URL]https://gmplib.org/[/URL] which I've installed.

And the m is this? [URL]https://stackoverflow.com/questions/1033898/why-do-you-have-to-link-the-math-library-in-c[/URL][/QUOTE]

Is there a reason why Visual Studio is not an option?

kriesel 2018-08-20 17:32

[QUOTE=henryzz;494292]Is there a reason why Visual Studio is not an option?[/QUOTE]
It may be. I'm reluctant to spend a lot. I'm preparing to make a first attempt to build CUDAPm1. It seemed a useful exercise to identify what all code needs to be gathered for compile/link. And the necessary tools. And some understanding (which I'm still working on).

Visual Studio might be part of the process. I have (free) VS 2017 Community Edition installed. Nvcc v9.2 is compatible with VS 2017 but nvcc 8.0 and earlier are not. From what I've read, nvcc preprocesses the CUDA-specific stuff and then uses VS's cl.exe to compile and link. A specific version of nvcc is limited in what versions of VS it will work with, and in what compute capability levels are supported. I have a lot of old gpus with 2.x compute capability. Nvcc 9.x (the only version compatible with VS 2017) doesn't support compute capability 2.x or lower. VS availability for free is limited to only the latest flavor (2017, or 15.x currently). A VS Pro license for ~$500 also gets only the latest flavor. To get access to earlier versions, such as VS2012, which is compatible with many versions of the CUDA toolkit, including as far back as v5.5, seems to require a Pro subscription at $1200 for the first year and $800 annually thereafter. [URL]https://visualstudio.microsoft.com/vs/pricing/[/URL]. Or there are alternatives like used resold software on eBay.
Or maybe I've misunderstood something while climbing this particular learning curve. If so, please share data/corrections.

kriesel 2018-08-20 19:13

ah42 fork
 
Hi,


I stumbled on this a while back, noted it, forgot about it, and recently had another look. Has anyone compiled and run this? If so, how did it compare to the sourceforge version, which is what's mirrored at mersenne.ca?

[URL]https://github.com/ah42/cuda-p1[/URL]

henryzz 2018-08-20 19:57

[QUOTE=kriesel;494300]It may be. I'm reluctant to spend a lot. I'm preparing to make a first attempt to build CUDAPm1. It seemed a useful exercise to identify what all code needs to be gathered for compile/link. And the necessary tools. And some understanding (which I'm still working on).

Visual Studio might be part of the process. I have (free) VS 2017 Community Edition installed. Nvcc v9.2 is compatible with VS 2017 but nvcc 8.0 and earlier are not. From what I've read, nvcc preprocesses the CUDA-specific stuff and then uses VS's cl.exe to compile and link. A specific version of nvcc is limited in what versions of VS it will work with, and in what compute capability levels are supported. I have a lot of old gpus with 2.x compute capability. Nvcc 9.x (the only version compatible with VS 2017) doesn't support compute capability 2.x or lower. VS availability for free is limited to only the latest flavor (2017, or 15.x currently). A VS Pro license for ~$500 also gets only the latest flavor. To get access to earlier versions, such as VS2012, which is compatible with many versions of the CUDA toolkit, including as far back as v5.5, seems to require a Pro subscription at $1200 for the first year and $800 annually thereafter. [URL]https://visualstudio.microsoft.com/vs/pricing/[/URL]. Or there are alternatives like used resold software on eBay.
Or maybe I've misunderstood something while climbing this particular learning curve. If so, please share data/corrections.[/QUOTE]
The old installers can be got through [url]https://visualstudio.microsoft.com/vs/older-downloads/[/url]

There are a few hoops: as part of this you have to subscribe to the developer essentials package (for free), which wasn't obvious to me initially.

kriesel 2018-08-29 21:56

[QUOTE=henryzz;494325]The old installers can be got through [URL]https://visualstudio.microsoft.com/vs/older-downloads/[/URL]

There are a few hoops: as part of this you have to subscribe to the developer essentials package (for free), which wasn't obvious to me initially.[/QUOTE]
Thanks for the tip. The earliest version available there is VS 2013. (I'd hoped to be able to get back to VS2010.)

After multiple failed download attempts via my crappy slow costly ISP (768k/128k DSL, 4.8GB download projected at 14 hours if things were working well, actual 1.5GB max per attempt, multiple days elapsed), the utility contractor working in my neighborhood to install fiber put an end to it by cutting the neighborhood's telco voice/DSL cable. Driving to another location got the 4.8GB ISO downloaded on the first try in under 3 hours. With such slow and unreliable internet, I tend to go for a full install image that can be put on a local file server: download once, and reuse locally.

Crappy-slow-costly-ISP was contacted within 10 minutes of the start of the outage, took an hour of phone time to generate a trouble ticket, projected that repair would begin after a week of no service, and claimed they would process a bill credit. The service cut was on the first day of the billing cycle. I've already received a bill for a full month's service not received or receivable, beginning the day the cable was cut, and the bill did not include the promised credit for the outage. The DSL in this neighborhood runs from the nearest village, miles away, which prevents high speed, instead of running from the nearest hut, a half mile away, which could probably provide 25 Mbps.

James Heinrich 2018-08-29 22:39

I've had ISP troubles like that in the past (it once took an ISP 6 weeks of no internet before they fixed whatever was broken), so I can sympathize. I'm happy to be on 250Mbps service now (4.8GB ISO should complete in under 3 mins). I hope your fiber install is completed soon.

kriesel 2018-08-29 23:32

[QUOTE=James Heinrich;494889]I've had ISP troubles like that in the past (it once took an ISP 6 weeks of no internet before they fixed whatever was broken), so I can sympathize. I'm happy to be on 250Mbps service now (4.8GB ISO should complete in under 3 mins). I hope your fiber install is completed soon.[/QUOTE]Ouch, 6 weeks, that would be trouble. What cut my cable was work to bring service being marketed as a choice of 300M/300M, or 1000M/400M fiber service.
I turned road warrior for a week and now have a spreadsheet documenting open hours and speed tests for the nearest free WiFi. It was beginning to get highly inconvenient as various Prime95 workers ran out of work and completed work was spooled up.

There have been times when the ping times to the nearest university are routinely 1 to 4 seconds or longer (normal is under 70msec). There have been times when the ISP's dns server is quite hosed, and I provided tech support to their "tech support" phone drone, and it stayed broken for days or weeks, and I switched to using opendns not theirs. Etc.

My greedy cell provider is another story. On my ancient plan as is, any fraction of a MB that's not an SMS message is $3. Outbound SMS text is $0.25 each, which is $1.60 and up per KILOBYTE. Nope, can't just add a decent data option to the existing plan; I must switch to a hundreds-per-year-more-costly base plan to add data. (Similar story from 3 different locations I tried. One tried to roll a $200 wireless router purchase into it also.) A base plan without those charges costs more than double per month, an alternative I've recently discovered while trying to just add some tide-me-over slow wireless data connection for one laptop during the DSL outage. If they'd offered something reasonable they could have kept a loyal long-term customer who's already paid them several thousand dollars over the years. In my opinion there are only one or two cell providers charging reasonably and offering plans that fit a range of usage levels, while the others are all upselling and gouging. Time to go phone shopping online.


All times are UTC.