mersenneforum.org  


2018-05-06, 06:16   #276
Madpoo
Serpentine Vermin Jar
Jul 2014
2×13×131 Posts

Quote:
Originally Posted by Prime95
Looking at the full JSON result text, the bad run was done without Gerbicz error-checking. Perhaps the user was getting a lot of rollbacks, looked in undoc.txt, and turned off Gerbicz error checking.

Without looking at Aaron's PHP code I bet the (reliable) text is only output if Gerbicz error-checking was used.
Yeah, I base the reliability on the result being "type=5" (I think; I'm going from memory), which is set if the test had Gerbicz checking. I believe that type is set when the result is checked in.

2018-05-06, 15:54   #277
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts

Quote:
Originally Posted by Madpoo
Yeah, I base the reliability on the result being "type=5" (I think; I'm going from memory), which is set if the test had Gerbicz checking. I believe that type is set when the result is checked in.
It is the result check-in process that looks for the Gerbicz keyword. Depending on whether the keyword is there, it sets the result_state column in a SQL table to either "composite by prp" or "composite by reliable prp". Aaron keys off the result state to print "reliable" in his report (not the prp_type).

Last fiddled with by Prime95 on 2018-05-06 at 15:55

2018-06-05, 02:30   #278
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^4·3·163 Posts

Quote:
Originally Posted by preda
Placing the buffers avoiding "bad memory" is a good idea, but I have not implemented that yet. A question I have about memory testing is how the virtual-to-physical address mapping interacts with bad memory locations. I would suppose that, if the virtual mapping changes, the same bad physical location can show up at different spots in the virtual address space at different points in time, and pinning down the actual (physical) bad location becomes difficult.
Maybe the following will make more sense to you, as you delve into CUDA programming, than it did to me when skimming it. It's the memtest portion of CUDALucas. The number of blocks tested is user-specifiable. When I specified the maximum the program would accept, it reported what seemed to be reproducible block numbers for the error locations on my 725 MHz GTX 480, roughly in the middle third of the block count.

Code:
void memtest(int s, int iter, int device)
{
  int i, j, k, m;
  int q = 60000091;
  int n = 3200 * 1024;
  int rand_int;
  int *i_data;
  double *d_data;
  double *dev_data1;
  int *d_compare;
  int h_compare;
  int total = 0;
  int total_iterations;
  int iterations_done = 0;
  float percent_done = 0.0f;
  timeval time0, time1;
  unsigned long long diff;
  unsigned long long diff1;
  unsigned long long diff2;
  unsigned long long ttime = 0;
  double total_bytes;
  size_t global_mem, free_mem;

  cudaMemGetInfo(&free_mem, &global_mem);
  printf("CUDA reports %lluM of %lluM GPU memory free.\n", (unsigned long long)free_mem/1024/1024, (unsigned long long)global_mem/1024/1024);
  if((size_t) s *1024 * 1024 * 25  > free_mem )
  {
    s = free_mem / 1024 / 1024 / 25;
    printf("Reducing size to %d\n", s);
  }
  printf("\nInitializing memory test using %0.0fMB of memory on device %d...\n", n / 1024.0 * s / 1024.0 * 8.0, device);

  i_data = (int *) malloc (sizeof (int) * n);
  d_data = (double *) malloc (sizeof (double) * n * 5);

  alloc_gpu_mem(n);
  write_gpu_data(q, n);

  srand(time(0));
  for (j = 0; j < n; j++)    i_data[j] = 1;
  cudaMemcpy (g_xint, i_data, sizeof (int) * n, cudaMemcpyHostToDevice);
  apply_weights <<<n /1024,128 >>> (g_x, g_xint, g_ttp, g_s);
  cudaMemcpy (&d_data[0 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);

  for (j = 0; j < n; j++)
  {
    rand_int = rand() % (1 << 18);
    rand_int -= (1 << 17);
    i_data[j] = rand_int;
  }
  cudaMemcpy (g_xint, i_data, sizeof (int) * n, cudaMemcpyHostToDevice);
  apply_weights <<<n /1024,128 >>> (g_x, g_xint, g_ttp, g_s);
  cudaMemcpy (&d_data[1 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);

  cufftExecZ2Z (g_plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE);
  cudaMemcpy (&d_data[2 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);

  square <<< n / (4 * g_thr[0]), g_thr[0] >>> (n, g_x, g_ct);
  cudaMemcpy (&d_data[3 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);

  cufftExecZ2Z (g_plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE);
  cudaMemcpy (&d_data[4 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);

  free(i_data);
  free(g_size1);
  free_gpu(1);

  cutilSafeCall (cudaMalloc ((void **) &d_compare, sizeof (int)));
  cutilSafeCall (cudaMemset (d_compare, 0, sizeof (int)));
  cutilSafeCall (cudaMalloc ((void **) &dev_data1, sizeof (double) * n * s));

  total_iterations = s * 5 * iter;
  iter *= 10000;
  printf("Beginning test.\n\n");
  fflush(NULL);
  gettimeofday (&time0, NULL);
  for(j = 0; j < s; j++)
  {
    m = (j + 1) % s;
    for(i = 0; i < 5; i++)
    {
      cutilSafeCall (cudaMemcpy (&dev_data1[j * n], &d_data[i * n], sizeof (double) * n, cudaMemcpyHostToDevice));
      for(k = 1; k <= iter; k++)
      {
        copy_kernel <<<n / 512, 512 >>> (dev_data1, n, j, m);
        compare_kernel<<<n / 512, 512>>> (&dev_data1[m * n], &dev_data1[j * n], d_compare);
        if(k%100 == 0) cutilSafeThreadSync();
        if(k%10000 == 0)
        {
          cutilSafeCall (cudaMemcpy (&h_compare, d_compare, sizeof (int), cudaMemcpyDeviceToHost));
          cutilSafeCall (cudaMemset (d_compare, 0, sizeof (int)));
          total += h_compare;
          iterations_done++;
          percent_done = iterations_done * 100 / (float) total_iterations;
          gettimeofday (&time1, NULL);
          diff = time1.tv_sec - time0.tv_sec;
          diff1 = 1000000 * diff + time1.tv_usec - time0.tv_usec;
          time0.tv_sec = time1.tv_sec;
          time0.tv_usec = time1.tv_usec;
          ttime += diff1;
          diff2 = ttime  * (total_iterations - iterations_done) / iterations_done / 1000000;
          total_bytes = 244140625 / (double) diff1;
          printf("Position %d, Data Type %d, Iteration %d, Errors: %d, completed %2.2f%%, Read %0.2fGB/s, Write %0.2fGB/s, ETA ",
                  j, i, iterations_done * 100000, total, percent_done, 3.0 * total_bytes, total_bytes);
          print_time_from_seconds ((unsigned) diff2, NULL, 0);
          printf (")\n");
          fflush(NULL);
        }
      }
    }
  }
  printf("Test complete. Total errors: %d.\n", total);
  fflush(NULL);
  cutilSafeCall (cudaFree ((char *) dev_data1));
  cutilSafeCall (cudaFree ((char *) d_compare));
  free((char*) d_data);
}
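
For anyone without a CUDA setup, the walk through the blocks can be mimicked on the CPU. Below is a rough Python sketch, not CUDALucas code: it assumes copy_kernel copies block j into block m and compare_kernel counts differing elements (neither kernel is shown in the listing), and it models a bad cell with a made-up stuck-at-zero fault (bad_addr and bad_mask are hypothetical parameters). Addresses here are plain list indices, so it says nothing about virtual-to-physical mapping.

```python
def memtest(blocks, block_len, bad_addr=None, bad_mask=0,
            patterns=(0xFFFFFFFF, 0x00000000, 0x55555555)):
    """Walk blocks j -> m = (j + 1) % blocks; copy, compare, count mismatches per j."""
    mem = [0] * (blocks * block_len)

    def store(addr, value):
        # Hypothetical fault model: the bad cell cannot hold the bits in bad_mask.
        mem[addr] = value & ~bad_mask if addr == bad_addr else value

    errors = [0] * blocks
    for j in range(blocks):
        m = (j + 1) % blocks
        for pattern in patterns:
            for i in range(block_len):      # upload the test pattern to block j
                store(j * block_len + i, pattern)
            for i in range(block_len):      # assumed copy_kernel: block j -> block m
                store(m * block_len + i, mem[j * block_len + i])
            errors[j] += sum(               # assumed compare_kernel: count differences
                mem[j * block_len + i] != mem[m * block_len + i]
                for i in range(block_len))
    return errors


# One bad cell inside block 5 of 8 simulated blocks.
print(memtest(8, 16, bad_addr=5 * 16 + 3, bad_mask=0x10))
```

In this model the errors are charged to position 4, the source block whose copy lands on the bad cell, and the same position comes up on every run. When the bad cell sits in the source block itself, the corrupted value is copied faithfully and the comparison passes, so that position reports nothing.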

2018-06-05, 04:36   #279
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^4×3×163 Posts

Equivalently, one might ask: how do video RAM testers work?
https://www.raymond.cc/blog/having-p...st-its-memory/

These are about system memory:
https://docs.microsoft.com/en-us/win...ages-in-memory
https://stackoverflow.com/questions/...memory-so-fast

2018-06-05, 08:45   #280
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts
Quote:
Originally Posted by preda
The situation is that, when the jacobi detects the first "sure" error, there is a high probability that there were also undetected errors before. Thus rolling back to a recent point only fixes the visible error, while preserving the hidden errors -- not good.
A thousand times better than nothing. That is why we do DC. For your peace of mind, you can add a config line in the .ini file to roll back to the last known good point, roll back to the beginning, or print a message, bail out (skip) and let the user decide (however, this would complicate things with erasing the lines in the worktodo file).

Last fiddled with by LaurV on 2018-06-05 at 08:45

2018-06-06, 04:51   #281
preda
"Mihai Preda"
Apr 2015
2^2·3·11^2 Posts
Size of PRP save files (checkpoints).

Usually, the PRP data set is composed of two sets of bits of equal size (N bits, N being the exponent), which I call "data" and "check". The "data" bits are the usual PRP residue 3^(2^k) (k being the iteration), while the "check" bits are based on Robert's *cool* PRP error-checking idea.

So, a PRP save-file contains the two sets (data + check). This is about double the size of an equivalent LL save-file (because the LL does not have "check" to save).

An example: for a 300M exponent, a PRP save-file is about 75MB (i.e. 300M*2/8 bytes), and not compressible.
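
The arithmetic behind that figure, as a tiny Python sketch. The function name is made up for illustration, and file headers or other metadata are ignored; only the residue payload is counted.

```python
def prp_savefile_bytes(exponent, with_data=True):
    """Payload size in bytes: one residue of `exponent` bits for "check",
    plus a second one for "data" if it is stored too (headers ignored)."""
    bytes_per_residue = (exponent + 7) // 8   # round up to whole bytes
    return bytes_per_residue * (2 if with_data else 1)


print(prp_savefile_bytes(300_000_000))          # "data" + "check"
print(prp_savefile_bytes(300_000_000, False))   # "check" only
```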

I just realized that, in fact, there is a simple way to derive the "data" bits from the "check" bits. This halves the size of the save-file, as it's no longer required to save the "data"; the "check" alone is enough.

The tradeoff is a bit of computation (equivalent to 2*L iterations, where L is the "Gerbicz error checking" block size) that needs to be done when loading the save-file, to reconstruct "data" from "check".

This size reduction, while nice for the disk, is more relevant if one wants to send "PRP data" over the network in the future.

(Basically, starting with C being the check bits, the sequence
D = C;
repeat(L-1): { D = D^2; D *= C; }
D *= 3;
reconstructs the data D, with all operations done modulo the Mersenne number.)
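
A small Python sketch of why this works, on a toy exponent. It assumes one common way of writing the Gerbicz check, which may differ from gpuowl's exact bookkeeping: the residue after t iterations is u_t = 3^(2^t) mod 2^p-1, and the check after k blocks is C = u_0 * u_L * ... * u_kL. Under that assumption C^(2^L - 1) * 3 equals u_(k+1)L, and the D sequence is just a way of computing that power with L-1 squarings and L-1 multiplications. The function names are made up for illustration.

```python
def prp_residue(p, t, base=3):
    """u_t = base^(2^t) mod (2^p - 1), computed by t squarings."""
    m = (1 << p) - 1
    u = base % m
    for _ in range(t):
        u = u * u % m
    return u


def gerbicz_check(p, L, k, base=3):
    """Assumed check value: C = u_0 * u_L * u_2L * ... * u_kL mod (2^p - 1)."""
    m = (1 << p) - 1
    c, u = 1, base % m
    for i in range(k * L + 1):
        if i % L == 0:          # fold the residue in at every multiple of L
            c = c * u % m
        u = u * u % m
    return c


def data_from_check(c, p, L, base=3):
    """The sequence from the post: D = C; (L-1) times {D = D^2; D *= C}; D *= 3."""
    m = (1 << p) - 1
    d = c
    for _ in range(L - 1):
        d = d * d % m
        d = d * c % m
    return d * base % m


p, L, k = 127, 5, 3
check = gerbicz_check(p, L, k)
print(data_from_check(check, p, L) == prp_residue(p, (k + 1) * L))
```

Note that under this convention the reconstructed "data" is the residue one block after the last residue folded into the check.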

Last fiddled with by preda on 2018-06-06 at 04:52 Reason: de-dup newlines

2018-06-13, 04:19   #282
Madpoo
Serpentine Vermin Jar
Jul 2014
6516₈ Posts

Just a general remark... When the mod hilariously changes the title of the discussion threads, it really hurts searchability down the line. While it may seem amusing, Google and the other crawlers actually do pay attention to things like that, and page titles (derived from the topic title on this forum) carry weight. Mixing up the words or using curious spellings makes it more difficult to find things you might be interested in.

2018-06-14, 02:36   #283
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
23656₈ Posts

Quote:
Originally Posted by Madpoo
Just a general remark... When the mod hilariously changes the title of the discussion threads, it really hurts searchability down the line. While it may seem amusing, Google and the other crawlers actually do pay attention to things like that, and page titles (derived from the topic title on this forum) carry weight. Mixing up the words or using curious spellings makes it more difficult to find things you might be interested in.
I agree.

2018-06-30, 14:30   #284
preda
"Mihai Preda"
Apr 2015
2^2·3·11^2 Posts

Quote:
Originally Posted by R. Gerbicz
With the "robust PRP-3" test we also detect these FFT errors, and if we can survive a block of iterations (1e5 in your code, or 1e6 as you suggest) with a decent probability, then with the same FFT size but a different shift value we can (with high probability) correct the error using only ONE rollback. The extra cost is only O(p) bit operations per rollback; you have to do only a few shifts. The main difference is that we need to survive only 1e6 iterations instead of 8e7. One could say that this is MUCH easier, but that is not quite true, because the errors follow a normal distribution (https://en.wikipedia.org/wiki/68%E2%...80%9399.7_rule).
I did the experiment. Empirically, what I see is an extension of the FFT range of about 0.5%, with L == 20 and doing a verification and possible offset change every L^2 == 400 iterations (a pretty extreme application of the method).

My conclusion is that "variable offset" is not worthwhile as a method of FFT range extension. The extension achieved, ~0.5%, is too small compared to the overhead.

Note: I did the offset change by incrementing the offset of the residue ("mul-2") rather than setting the offset to a random value, but IMO this does not affect the result (0.5%) in a significant way.
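
For readers unfamiliar with the offset trick, here is a minimal Python sketch of the bookkeeping on a toy exponent. It is not gpuowl code and all function names are made up. It relies only on the fact that multiplying by 2 modulo 2^p-1 is a 1-bit cyclic rotation, so a residue stored as x*2^s carries offset 2s mod p after a squaring and s+1 after a "mul-2" step.

```python
def rotl(x, s, p):
    """Rotate the p-bit value x left by s bits; equals x * 2^s mod (2^p - 1) for x < 2^p - 1."""
    s %= p
    mask = (1 << p) - 1
    return ((x << s) | (x >> (p - s))) & mask


def shifted_square(y, s, p):
    """Square a shifted residue y = x * 2^s; the result carries offset 2s mod p."""
    m = (1 << p) - 1
    return y * y % m, (2 * s) % p


def bump_offset(y, s, p):
    """The "mul-2" step: a 1-bit rotation that moves the offset to s + 1."""
    return rotl(y, 1, p), (s + 1) % p


def prp_with_offset(p, iterations, offset=0, bump_every=0):
    """Run PRP-3 squarings on a shifted residue and return the unshifted result."""
    y, s = rotl(3, offset, p), offset % p
    for i in range(1, iterations + 1):
        y, s = shifted_square(y, s, p)
        if bump_every and i % bump_every == 0:
            y, s = bump_offset(y, s, p)
    return rotl(y, p - s, p)    # rotating by p - s undoes the offset


p = 61
print(prp_with_offset(p, 40, offset=17, bump_every=3) == pow(3, 2**40, 2**p - 1))
```

The residue is the same whatever the offset history; only the bit positions seen by the FFT change, which is what the range-extension idea tries to exploit.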

Last fiddled with by preda on 2018-06-30 at 14:35

2020-02-25, 15:41   #285
Fan Ming
Oct 2019
5·19 Posts

Quote:
Originally Posted by R. Gerbicz
The "only" restriction is that you should know a non-trivial divisor of N; you should know this before or after(!) the test.
What about doing LL iterations modulo k*Mp, where k is of suitable size and highly composite (for example, 2^32, 2^61, 3^41)? Would the modulo k*Mp operation be easily handled by IBDWT?
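
To make the idea concrete, a toy Python sketch with plain big-integer arithmetic (no IBDWT involved, so it does not answer the question): run the LL recurrence modulo k*Mp, then reduce the result modulo k and modulo Mp and compare with independent runs. The function name and the choice p = 13, k = 2^32 are only for illustration.

```python
def ll_residue(p, modulus):
    """Run the p-2 Lucas-Lehmer steps s -> s^2 - 2 from s = 4, reducing by `modulus`."""
    s = 4 % modulus
    for _ in range(p - 2):
        s = (s * s - 2) % modulus
    return s


p = 13
mp = (1 << p) - 1           # 8191, a Mersenne prime
k = 1 << 32                 # the known extra factor

big = ll_residue(p, k * mp)
print(big % mp == ll_residue(p, mp))    # agrees with the ordinary LL test
print(big % k == ll_residue(p, k))      # agrees with the cheap run modulo k
```

This only shows that the two reductions are consistent; how good the modulo-k comparison is at catching real errors is a separate matter that the sketch does not address.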

Last fiddled with by Fan Ming on 2020-02-25 at 16:04

2020-02-26, 07:11   #286
Fan Ming
Oct 2019
95₁₀ Posts
Quote:
Originally Posted by Fan Ming
What about doing LL iterations modulo k*Mp, where k is of suitable size and highly composite (for example, 2^32, 2^61, 3^41)? Would the modulo k*Mp operation be easily handled by IBDWT?
For example, if we do LL iterations modulo 2^37-1, as described in Crandall's paper (http://www.faginfamily.net/barry/Pap...Transforms.pdf), and choose N=4, the bit lengths of the words will be {10, 9, 9, 9} and the weight vector a will be {1, 2^(3/4), 2^(1/2), 2^(1/4)}.

If we do LL iterations modulo (2^2)*(2^37-1)=2^39-4, still with N=4, the bit lengths of the words will be (according to C. Percival's paper, http://www.ams.org/journals/mcom/200...02-01419-9.pdf) {10, 9, 10, 8}, and the weight vector a (ditto) {1, 2^(3/4), 2^(1/2), 2^(5/4)}.

Is working modulo such composite numbers (2^39-4 in this example) with IBDWT in fact allowed? If so, in this example we can check, after some iteration, whether the result modulo 2^39-4, reduced modulo 2^2, equals the same iteration computed modulo 2^2 (this 4 is not a good choice, only an example; better candidates should exist). The same would apply to an actual Mp multiplied by a suitable k, as described in the post above.
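
The numbers for the 2^37-1 case can be reproduced with the standard Crandall-Fagin layout: word j holds ceil(q(j+1)/N) - ceil(qj/N) bits, and its weight is 2 raised to ceil(qj/N) - qj/N. A short Python sketch (the function name is made up; it covers only the plain 2^q-1 case and does not attempt the 2^39-4 layout, which follows Percival's generalisation):

```python
from fractions import Fraction
from math import ceil


def ibdwt_layout(q, n):
    """Bit lengths and weight exponents for arithmetic mod 2^q - 1 using n words.

    The weight of word j is 2 ** exps[j]; exponents are kept as exact fractions.
    """
    starts = [ceil(Fraction(q * j, n)) for j in range(n + 1)]   # first bit of each word
    bits = [starts[j + 1] - starts[j] for j in range(n)]
    exps = [starts[j] - Fraction(q * j, n) for j in range(n)]
    return bits, exps


bits, exps = ibdwt_layout(37, 4)
print(bits)
print([str(e) for e in exps])
```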

Last fiddled with by Fan Ming on 2020-02-26 at 07:18

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
