[QUOTE=msft;286136]Linux needs the #127 patch for the 100% CPU time issue.
Windows does not need the #127 patch.[/QUOTE]
Shoichiro, I don't understand why your fix would work. Is the only change you made to add an extra DeviceToHost copy of maxErr?

But I did see that the init_device() routine makes 4 CUDA device calls, and unlike the rest of the software, it does not use cutilSafeCall() to check for errors. In particular, I think this call may be important:
[code]cudaSetDeviceFlags(cudaDeviceBlockingSync);[/code]
If that were to fail for any reason, CUDA would act as if you were in cudaDeviceScheduleSpin mode, and that would cause the 100% CPU utilization that you're seeing. However, I do not understand why that call would be failing.

I would suggest adding cutilSafeCall() to the API calls inside the init_device() function. It seems like a good thing to do regardless of whether this is related to the 100% CPU core problem. I'm adding that to my version of the code.

Mike |
Hi, AG5BPilot
[QUOTE=AG5BPilot;286150]I don't understand why your fix would work. Is the only change you made to add an extra DeviceToHost copy of maxErr?[/QUOTE]
Yes. I think this code acts as a trigger.
[QUOTE=AG5BPilot;286150]But I did see that the init_device() routine makes 4 CUDA device calls, and unlike the rest of the software, it does not use cutilSafeCall() to check for errors. In particular, I think this call may be important:
[code]cudaSetDeviceFlags(cudaDeviceBlockingSync);[/code]
If that were to fail for any reason, CUDA would act as if you were in cudaDeviceScheduleSpin mode, and that would cause the 100% CPU utilization that you're seeing. However, I do not understand why that call would be failing.[/QUOTE]
I think that if that call were failing, the "extra DeviceToHost copy" would not work either.
[QUOTE=AG5BPilot;286150]I would suggest adding cutilSafeCall() to the API calls inside the init_device() function. It seems like a good thing to do regardless of whether this is related to the 100% CPU core problem. I'm adding that to my version of the code.[/QUOTE]
The results did not change. Either way, it is a strange phenomenon. |
Hi,
Ver 1.06, 1.061, and 1.062 have a fatal error on 32-bit OSes. I introduced a bug in the GMP code. Do not use them on a 32-bit OS. Thank you, |
[QUOTE=msft;287104]Hi,
Ver 1.06, 1.061, and 1.062 have a fatal error on 32-bit OSes. I introduced a bug in the GMP code. Do not use them on a 32-bit OS. Thank you,[/QUOTE]
Which bug? I am getting no error here or in llrCUDA. My compile platform is Lubuntu 11.10 with kernel 3.0.0-15.26. |
Hi,
llrCUDA was OK.
[code]
m_b = b;
mpz_init_set_ui(m_Na,m_b);
for(j = m; j != 1; j/=2)
    mpz_mul(m_Na,m_Na,m_Na);
m_Na_size = mpz_size(m_Na);
m_Na_size_byte = m_Na_size*sizeof(long int);
m_a = (long int*) malloc(m_Na_size*sizeof(long int));
mpz_export(m_a,NULL,0,m_Na_size_byte,0,0,m_Na);
[/code]
Sometimes m_a[m_Na_size-1]==0 on linux32.
[code]
if(m_a_32[m_a_32_len-1]== 0) m_a_32_len--;
[/code]
This code is not enough. |
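msft does not spell out why the single check falls short. One possible reading, and it is only an assumption, is that more than one high-order word can end up zero, so a one-time decrement leaves the length too large. A generic way to normalize such a buffer is to strip every zero word from the top. The sketch below uses a hypothetical helper, `trimmed_length`, that is not part of genefer or llrCUDA, and it assumes the words are stored least significant first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from genefer/llrCUDA): returns the length of
 * `words` (least significant word first) after dropping every high-order
 * word that is zero. At least one word is always kept, so the value zero
 * has length 1. */
size_t trimmed_length(const uint32_t *words, size_t len)
{
    assert(words != NULL && len > 0);
    while (len > 1 && words[len - 1] == 0)
        len--;
    return len;
}
```

This only tidies up after the export. The fix msft posts later in the thread takes a different route and sizes the buffer from the bit length instead.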
Hi,
A mini test program:
[code]
#include <stdio.h>
#include <gmp.h>

int main()
{
    mpz_t m;
    int i,j;
    unsigned long long e_m[10];

    mpz_init_set_ui(m,2);
    for(j = 0; j != 8; j++)
    {
        mpz_mul(m,m,m);                      /* m = m*m: 4, 16, 256, ... */
        for(i=0;i<10;i++) e_m[i]=0;
        mpz_export(e_m,NULL,0,80,0,0,m);     /* dump m as one 80-byte word, native endianness */
        printf(" mpz_size(m)=%d ",(int) mpz_size(m));   /* number of limbs in m */
        for(i=9;i>=0;i--) printf(" %llx",e_m[i]);       /* highest 64-bit word first */
        printf("\n");
    }
    mpz_clear(m);
}
[/code]
linux32:
[code]
$ ./a.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=2 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=3 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=5 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=9 0 0 0 0 0 1 0 0 0 0
[/code]
linux64:
[code]
$ ./a.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=2 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=3 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=5 0 0 0 0 0 1 0 0 0 0
[/code] |
Hmm, I could not get it to compile:
[QUOTE]boinc@Lubuntu32:~/Cuda$ gcc a
a: In function `main':
a.c:(.text.startup+0x25): undefined reference to `__gmpz_init_set_ui'
a.c:(.text.startup+0x3c): undefined reference to `__gmpz_mul'
a.c:(.text.startup+0x110): undefined reference to `__gmpz_export'
a.c:(.text.startup+0x2ba): undefined reference to `__gmpz_clear'
collect2: ld returned 1 exit status
[/QUOTE] |
[QUOTE=rroonnaalldd;287485]Hmm, I could not get it to compile:[/QUOTE]
Add -lgmp |
[code]
mpz_mul(m_Na,m_Na,m_Na);
m_Na_size = (mpz_sizeinbase(m_Na,2)+sizeof(long long int)*8-1)/(sizeof(long long int)*8);
m_Na_size_byte = m_Na_size*sizeof(long long int);
[/code]
This is the only answer. |
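The size expression in that snippet is a ceiling division: the bit length of the number divided by the bits per word, rounded up. Here is a minimal sketch of just that arithmetic, with no GMP dependency. The function name `words_for_bits` is made up for illustration; in the thread's code the bit count would come from `mpz_sizeinbase(m_Na, 2)` and the word size from `sizeof(long long int)`.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (name is hypothetical): number of words of
 * `word_bytes` bytes needed to hold a value that is `bits` bits long,
 * i.e. ceil(bits / (word_bytes * 8)). */
size_t words_for_bits(size_t bits, size_t word_bytes)
{
    size_t word_bits = word_bytes * 8;
    assert(word_bits > 0);
    /* Adding word_bits - 1 before dividing rounds the quotient up. */
    return (bits + word_bits - 1) / word_bits;
}
```

As a check against the mini test program's output: 2^32 is 33 bits long, so `words_for_bits(33, 4)` is 2 and `words_for_bits(33, 8)` is 1, matching the mpz_size values printed on linux32 and linux64. Likewise 2^256 is 257 bits long, giving 9 and 5. Counting in 64-bit words this way gives the same result on both platforms, whereas mpz_size counts limbs, whose width depends on the platform.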
[QUOTE=rogue;287486]Add -lgmp[/QUOTE]
:redface:
Lubuntu11.10-32
[QUOTE]boinc@Lubuntu32:~/Cuda$ ./a32.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=2 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=3 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=5 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=9 0 0 0 0 0 1 0 0 0 0
[/QUOTE]
DotschUX-64 based on Ubuntu8.10
[QUOTE]boinc@vmware2k-3:~/Cuda$ ./a32.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=2 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=3 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=5 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=9 0 0 0 0 0 1 0 0 0 0
[/QUOTE]
[quote]boinc@vmware2k-3:~/Cuda$ ./a64.out
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 4
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 10000
mpz_size(m)=1 0 0 0 0 0 0 0 0 0 100000000
mpz_size(m)=2 0 0 0 0 0 0 0 0 1 0
mpz_size(m)=3 0 0 0 0 0 0 0 1 0 0
mpz_size(m)=5 0 0 0 0 0 1 0 0 0 0
[/quote] |
Does anyone have any idea of the relative time spent in the "fft square" routine vs. the "next step" kernels in genefer?
|