mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   genefer/CUDA (https://www.mersenneforum.org/showthread.php?t=14297)

AG5BPilot 2012-01-13 11:45

[QUOTE=msft;286136]Linux needs the #127 patch for the 100% CPU time issue.
Windows does not need the #127 patch.[/QUOTE]

Shoichiro,

I don't understand why your fix would work. Is the only change you made to add an extra DeviceToHost copy of maxErr?

But, I did see that the init_device() routine makes 4 cuda device calls, and unlike the rest of the software, it does not use cutilSafeCall() to check for errors.

In particular, I think this call may be important:

[code]cudaSetDeviceFlags(cudaDeviceBlockingSync);[/code]

If that were to fail for any reason, CUDA would act as if you were in cudaDeviceScheduleSpin mode, and that would cause the 100% CPU utilization that you're seeing.

However, I do not understand why that call would be failing.

I would suggest adding cutilSafeCall() to the API calls inside the init_device() function. It seems like a good thing to do regardless of whether this is related to the 100% CPU core problem. I'm adding that to my version of the code.

Mike
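
The suggestion above, checking every device call made during initialization, can be sketched in plain C as follows. This is only an illustration of the pattern, not the genefer code: `status_t`, `fake_device_call()`, `check_status()`, `CHECK()` and `init_device_checked()` are hypothetical names, and the stub stands in for real CUDA runtime calls such as cudaSetDeviceFlags() so that the example builds without the CUDA toolkit.

```c
#include <stdio.h>

/* Stand-in for a CUDA-style status code; 0 means success. */
typedef int status_t;

/* Hypothetical stub playing the role of a device call such as
   cudaSetDeviceFlags(); it is not part of any real API. */
static status_t fake_device_call(int should_fail)
{
    return should_fail ? 1 : 0;
}

/* Report where a failing call happened and hand the status back
   so the caller can decide whether to abort. */
static status_t check_status(status_t status, const char *expr,
                             const char *file, int line)
{
    if (status != 0)
        fprintf(stderr, "%s:%d: %s failed with status %d\n",
                file, line, expr, status);
    return status;
}

#define CHECK(call) check_status((call), #call, __FILE__, __LINE__)

/* Sketch of an init routine that stops at the first failing call
   instead of silently continuing. Returns 0 on success, -1 on failure. */
static int init_device_checked(int fail_flags_call)
{
    if (CHECK(fake_device_call(0)) != 0)               /* e.g. select the device */
        return -1;
    if (CHECK(fake_device_call(fail_flags_call)) != 0) /* e.g. set device flags */
        return -1;
    return 0;
}
```

With a wrapper of this kind, a failing flags call would be reported with its file and line instead of passing unnoticed, which at least makes it possible to confirm or rule out that call as a cause.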

msft 2012-01-13 12:35

Hi, AG5BPilot
[QUOTE=AG5BPilot;286150]
I don't understand why your fix would work. Is the only change you made to add an extra DeviceToHost copy of maxErr?
[/QUOTE]
Yes.
I think this code acts as a trigger.
[QUOTE=AG5BPilot;286150]
But, I did see that the init_device() routine makes 4 cuda device calls, and unlike the rest of the software, it does not use cutilSafeCall() to check for errors.
In particular, I think this call may be important:
[code]cudaSetDeviceFlags(cudaDeviceBlockingSync);[/code]
If that were to fail for any reason, CUDA would act as if you were in cudaDeviceScheduleSpin mode, and that would cause the 100% CPU utilization that you're seeing.
However, I do not understand why that call would be failing.
[/QUOTE]
I think that if that call were to fail, the "extra DeviceToHost copy" should not work either.
[QUOTE=AG5BPilot;286150]
I would suggest adding cutilSafeCall() to the API calls inside the init_device() function. It seems like a good thing to do regardless of whether this is related to the 100% CPU core problem. I'm adding that to my version of the code.
[/QUOTE]
The results did not change.

Anyway, it is a strange phenomenon.

msft 2012-01-24 11:28

Hi,
Versions 1.06, 1.061, and 1.062 have a fatal error on 32-bit OSes.
I introduced a bug in the GMP code.
Do not use them on a 32-bit OS.
Thank you,

rroonnaalldd 2012-01-25 11:28

[QUOTE=msft;287104]Hi,
Versions 1.06, 1.061, and 1.062 have a fatal error on 32-bit OSes.
I introduced a bug in the GMP code.
Do not use them on a 32-bit OS.
Thank you,[/QUOTE]

Which bug?
I am not getting any error here or in llrCUDA. My build platform is Lubuntu 11.10 with kernel 3.0.0-15.26.

msft 2012-01-25 12:38

Hi,
llrCUDA was OK.
[code]
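m_b = b;
mpz_init_set_ui(m_Na,m_b);
/* square repeatedly: m_Na = b^m when m is a power of two */
for(j = m; j != 1; j/=2)
    mpz_mul(m_Na,m_Na,m_Na);
/* mpz_size() counts limbs, whose width depends on the platform */
m_Na_size = mpz_size(m_Na);
m_Na_size_byte = m_Na_size*sizeof(long int);
m_a = (long int*) malloc(m_Na_size*sizeof(long int));
/* export the value as a single word of m_Na_size_byte bytes */
mpz_export(m_a,NULL,0,m_Na_size_byte,0,0,m_Na);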
[/code]
Sometimes m_a[m_Na_size-1]==0 on 32-bit Linux.
[code]
if(m_a_32[m_a_32_len-1]== 0) m_a_32_len--;
[/code]
This code is not enough.

msft 2012-01-26 03:44

Hi,
Here is a mini test program.
[code]
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t m;
    int i,j;
    unsigned long long e_m[10];   /* 80-byte export buffer */

    mpz_init_set_ui(m,2);
    for(j = 0; j != 8; j++)
    {
        mpz_mul(m,m,m);           /* m = m^2 */
        for(i=0;i<10;i++) e_m[i]=0;
        /* export m as a single 80-byte word in native byte order */
        mpz_export(e_m,NULL,0,80,0,0,m);
        printf(" mpz_size(m)=%d ",(int) mpz_size(m));
        /* print the buffer from e_m[9] down to e_m[0] */
        for(i=9;i>=0;i--) printf(" %llx",e_m[i]);
        printf("\n");
    }
    mpz_clear(m);
    return 0;
}
[/code]
linux32:
[code]
$ ./a.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=2 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=3 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=5 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=9 0 0 0 0 0 1 0 0 0 0
[/code]
linux64:
[code]
$ ./a.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=2 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=3 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=5 0 0 0 0 0 1 0 0 0 0
[/code]
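
The differing mpz_size(m) columns above can be reproduced with plain arithmetic, without GMP. The sketch below assumes that a GMP limb is 32 bits wide on the 32-bit build and 64 bits wide on the 64-bit build, which is the usual configuration but is not stated in the thread. The helper names (`words_needed`, `bits_of_repeated_square`, `predict_sizes`) are made up for this illustration.

```c
#include <stddef.h>

/* Number of words of word_bits bits needed to hold a value that is
   value_bits bits long (ceiling division). */
static size_t words_needed(size_t value_bits, size_t word_bits)
{
    return (value_bits + word_bits - 1) / word_bits;
}

/* Bit length of 2^(2^k): a one followed by 2^k zero bits.
   Starting from 2 and squaring k times gives exactly this value. */
static size_t bits_of_repeated_square(unsigned k)
{
    return ((size_t)1 << k) + 1;
}

/* Fill out[0..7] with the predicted limb count after each of the
   eight squarings performed by the test program. */
static void predict_sizes(size_t word_bits, size_t out[8])
{
    unsigned k;
    for (k = 1; k <= 8; k++)
        out[k - 1] = words_needed(bits_of_repeated_square(k), word_bits);
}
```

Calling `predict_sizes(32, out)` yields 1, 1, 1, 1, 2, 3, 5, 9 and `predict_sizes(64, out)` yields 1, 1, 1, 1, 1, 2, 3, 5, matching the linux32 and linux64 listings. The exported words are the same on both platforms; only the limb count differs.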

rroonnaalldd 2012-01-28 02:12

Hmm, I could not get it to compile:
[QUOTE]boinc@Lubuntu32:~/Cuda$ gcc a
a: In function `main':
a.c:(.text.startup+0x25): undefined reference to `__gmpz_init_set_ui'
a.c:(.text.startup+0x3c): undefined reference to `__gmpz_mul'
a.c:(.text.startup+0x110): undefined reference to `__gmpz_export'
a.c:(.text.startup+0x2ba): undefined reference to `__gmpz_clear'
collect2: ld returned 1 exit status
[/QUOTE]

rogue 2012-01-28 02:23

[QUOTE=rroonnaalldd;287485]Hmm, I could not get it to compile:[/QUOTE]

Add -lgmp

msft 2012-01-28 03:18

[code]
mpz_mul(m_Na,m_Na,m_Na);
m_Na_size = (mpz_sizeinbase(m_Na,2)+sizeof(long long int)*8-1)/(sizeof(long long int)*8);
m_Na_size_byte = m_Na_size*sizeof(long long int);
[/code]
It's the only answer.
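
To make the difference between the two sizing rules concrete, here is a small GMP-free sketch. `bytes_by_limbs` mirrors the earlier `mpz_size(m_Na)*sizeof(long int)` computation and `bytes_by_long_long` mirrors the formula in the snippet above, with `bit_length` standing in for `mpz_sizeinbase(m_Na,2)`. Both function names and the `limb_bytes` parameter are illustrative assumptions, not part of genefer.

```c
#include <stddef.h>

/* Bytes obtained when sizing by limb count. limb_bytes is assumed to be
   4 on a typical 32-bit GMP build and 8 on a 64-bit one. */
static size_t bytes_by_limbs(size_t bit_length, size_t limb_bytes)
{
    size_t limb_bits = limb_bytes * 8;
    size_t limbs = (bit_length + limb_bits - 1) / limb_bits;
    return limbs * limb_bytes;
}

/* Bytes obtained when rounding the bit length up to whole
   long long units, as in the formula from the post. */
static size_t bytes_by_long_long(size_t bit_length)
{
    size_t unit_bits = sizeof(long long int) * 8;
    size_t units = (bit_length + unit_bits - 1) / unit_bits;
    return units * sizeof(long long int);
}
```

For a 65-bit value such as 2^64, `bytes_by_limbs(65, 4)` gives 12 while `bytes_by_long_long(65)` gives 16 on platforms where long long is 8 bytes, so only the second result is a whole number of long long words.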

rroonnaalldd 2012-01-28 10:04

[QUOTE=rogue;287486]Add -lgmp[/QUOTE]
:redface:



Lubuntu11.10-32:
[QUOTE]boinc@Lubuntu32:~/Cuda$ ./a32.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=2 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=3 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=5 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=9 0 0 0 0 0 1 0 0 0 0
[/QUOTE]

DotschUX-64 (based on Ubuntu8.10):
[QUOTE]boinc@vmware2k-3:~/Cuda$ ./a32.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=2 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=3 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=5 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=9 0 0 0 0 0 1 0 0 0 0
[/QUOTE]

[quote]boinc@vmware2k-3:~/Cuda$ ./a64.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=2 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=3 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=5 0 0 0 0 0 1 0 0 0 0
[/quote]

axn 2012-07-23 08:21

Does anyone have any idea of the relative time spent in the "fft square" routine vs. the "next step" kernels in genefer?


All times are UTC.
