|
|
#12 |
|
I moo ablest echo power!
May 2013
29×61 Posts |
|
|
|
|
|
|
#13 | |
|
Aug 2015
2²×5 Posts |
Quote:
@Antonio: Could you tell me where I can get pthreadVC2.dll? What is this file for? Thanks |
|
|
|
|
|
|
#14 | |
|
Aug 2015
2²×5 Posts |
Quote:
Thank you all for the valuable information. As I told you, I am new to this, so sorry for the coming questions :)
1- What is the mapping between C97, C110, etc.? Is C97 a number with 97 digits? Does it follow the conversion ratio of 3.3 bits per digit (C97 = 320 bits)?
2- What are "degree 4 polys"?
3- VBCurtis, you said: "running msieve with -np1 (GPU step) -nps (size opt step) flags both set results in one core fully loaded to do size optimization while the gpu generates a TON of polys but -stage1_norm rejects 98% of them before the -nps step is run." Do you mean that the two steps, "Poly select - GPU" and "Size optimization - CPU", can run in parallel, or must they still run in series? I am confused!
4- If I understand correctly:
-we run "msieve -np1 -nps"
-both steps run in parallel
-we set "-stage1_norm" so that the GPU keeps working on the next poly select output "Value_2" until the CPU finishes the size optimization of the GPU's previous output "Value_1", etc. This time in your example is 2-5 seconds.
How do I set "-stage1_norm"? If I have a multi-core processor, can I still do the size optimization in parallel?
I understood -stage2_norm; you explained it so nicely!!
Thanks a lot guys for your great help and for your time :)
BR |
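On question 1: in factoring threads, C97 conventionally means a composite with 97 decimal digits, and each decimal digit carries log2(10) ≈ 3.32 bits. A minimal Python sketch of that arithmetic follows; the function name is only for illustration and is not part of msieve.

```python
import math

def digits_to_bits(digits: int) -> int:
    """Largest possible bit length of a number with `digits` decimal digits."""
    # A d-digit number is below 10**d, so it needs at most ceil(d * log2(10)) bits.
    return math.ceil(digits * math.log2(10))

for d in (97, 110, 170):
    print(f"C{d}: up to {digits_to_bits(d)} bits")
```

So a C97 is a little over 320 bits; the 3.3 bits-per-digit estimate in the question is close.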
|
|
|
|
|
|
#15 | |
|
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts |
Quote:
2. polys = polynomials. Degree = highest power, as in basic algebra. Wiki the word if you don't remember the concept.
3. If you invoke msieve with -np1 -nps, the program will run a thread on the first step (on GPU), and a thread of your CPU on the size-optimization step. msieve is not otherwise multi-threaded; a multi-core chip is not used. If you really cared, you could invoke stage 1 GPU on its own, then run a bunch of -nps with various save files to use the other cores, but you'd have to figure that out for yourself (the rest of us don't bother).
4. -stage1_norm=1e26 would set the cutoff for saving the poly to 1e26. If you invoke msieve -np1 -nps, the log will show you msieve's default stage 1 norm and stage 2 norm; you then shrink these via guess-and-see-what-happens to alter the output. I did say 2-5/second, with "/" meaning "per", rather than 2-5 seconds for each hit.
I usually decrease stage1_norm by one magnitude (if default is 1e26, I might set it to 1.5e25, or 1e25, or 7e24, whatever results in my GPU turning out a couple hits per second). For stage2_norm, I check the log for default stage 2 norm, then divide by something near 25. The bigger the job, the smaller I make this norm (for big jobs, say GNFS170, I might run poly select for a week, looking for 100 or so hits per day out of the -nps step).
Before more questions, read: msieve readme, msieve -h for all the available flags, Jeff Gilchrist's intro to NFS factoring webpage, and peruse the "polynomial request thread" where much discussion has happened over the last two years about how to use msieve-GPU most effectively. Many tuning parameters are open questions; the guide I give here is the result of my personal experience, rather than definite best practices used by everyone.
-Curtis |
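The rule of thumb above (cut the default stage 1 norm by about one order of magnitude, divide the default stage 2 norm by something near 25) can be sketched in Python. This is only a starting-point calculator under those stated assumptions: the function name is made up for illustration, the default stage 2 value used below is a placeholder rather than a real msieve default, and the real defaults should be read from your own msieve log.

```python
def suggested_norms(default_stage1: float, default_stage2: float,
                    stage1_divisor: float = 10.0, stage2_divisor: float = 25.0):
    """Return (stage1_norm, stage2_norm) starting points from the logged defaults."""
    # Divisors are personal-experience guesses from the post, not fixed constants.
    return default_stage1 / stage1_divisor, default_stage2 / stage2_divisor

# 1e26 is the example default from the post; 2.5e24 is a hypothetical placeholder.
s1, s2 = suggested_norms(1e26, 2.5e24)
print(f"stage1_norm ~ {s1:.2e}, stage2_norm ~ {s2:.2e}")
```

From there, adjust by trial: tighten or loosen the values until the GPU produces a couple of hits per second and the -nps step yields a manageable number of hits per day.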
|
|
|
|
|
|
#16 |
|
Tribal Bullet
Oct 2004
3,541 Posts |
To add to the very good explanation above, GPU stage 1 is multithreaded and you actually can ask for more than one thread to drive the GPU. For 100-digit problems and below, a GPU will crunch through stage 1 so fast that even ignoring the size optimization the GPU is still mostly idle. You can throw 2-3 threads at stage 1 (-t 2 or -t 3 does this) and generate hits about 30% faster just from better GPU utilization, but you still need to boil all those hits down and stage 2 is still a bottleneck. That doesn't change the advice that a GPU is wasted on small problems; even for 150-digit polynomial selection I've found more than one thread is needed to keep the GPU fully occupied.
|
|
|
|
|
|
#17 | |
|
Aug 2015
20₁₀ Posts |
Quote:
VBCurtis and jasonp, thanks a lot for the explanation. VBCurtis, I will read the resources you pointed to, thank you! jasonp, I could get my GPU >50% busy by using "-t 100"; however, the tool crashes if I use "-t 150"!!! One last question: I have two GPUs in my system, but I could not get both of them working; it is always either GPU0 or GPU1. Any suggestions? Thanks again, BR |
|
|
|
|
|
|
#18 |
|
Sep 2009
2·1,039 Posts |
A (probably) CUDA-level issue: it looks as if CUDA 7.5 doesn't support compute capability 1.1 any more. I'm trying to get my new GTX 970 to work, but when I try to make msieve with CUDA=1 I get:
Code:
"/usr/local/cuda-7.5/bin/nvcc" -arch sm_11 -ptx -o stage1_core_sm11.ptx gnfs/poly/stage1/stage1_core_gpu/stage1_core.cu
nvcc fatal : Value 'sm_11' is not defined for option 'gpu-architecture'
Makefile:307: recipe for target 'stage1_core_sm11.ptx' failed
make: *** [stage1_core_sm11.ptx] Error 1
Chris |
|
|
|
|
|
#19 |
|
Tribal Bullet
Oct 2004
DD5₁₆ Posts |
The makefile in the b40c directory also needs to be modified to remove the older targets.
|
|
|
|
|
|
#20 |
|
Sep 2009
2×1,039 Posts |
I've got it to compile cleanly after editing the Makefiles as follows (msieve.orig is as downloaded, msieve-svn988.2 is changed):
Code:
diff msieve.orig/trunk/Makefile msieve-svn988.2/trunk/Makefile
172,173d171
< stage1_core_sm11.ptx \
< stage1_core_sm13.ptx \
305,310d302
<
< stage1_core_sm11.ptx: $(NFS_GPU_HDR)
< $(NVCC) -arch sm_11 -ptx -o $@ $<
<
< stage1_core_sm13.ptx: $(NFS_GPU_HDR)
< $(NVCC) -arch sm_13 -ptx -o $@ $<
Code:
diff msieve.orig/trunk/b40c/Makefile msieve-svn988.2/trunk/b40c/Makefile
20,21d19
< GEN_SM13 = -gencode=arch=compute_13,code=\"sm_13,compute_13\"
< GEN_SM10 = -gencode=arch=compute_10,code=\"sm_10,compute_10\"
35c33
< all: $(LIBNAME)_sm10.$(EXT) $(LIBNAME)_sm13.$(EXT) $(LIBNAME)_sm20.$(EXT)
---
> all: $(LIBNAME)_sm20.$(EXT)
40,45d37
<
< $(LIBNAME)_sm10.$(EXT) : $(DEPS)
< $(NVCC) $(GEN_SM10) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3
<
< $(LIBNAME)_sm13.$(EXT) : $(DEPS)
< $(NVCC) $(GEN_SM13) -o $@ sort_engine.cu $(NVCCFLAGS) $(INC) -O3
Chris |
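For anyone repeating this edit on a different checkout, a small Python sketch can list the Makefile lines that still mention compute 1.x targets, which are the ones the diffs above delete. The regular expression and function name are assumptions made for illustration, not part of msieve, and the sample text only imitates the lines shown in the diffs.

```python
import re

# Matches sm_10..sm_13, compute_10..compute_13, and names like _sm11 or GEN_SM13.
LEGACY_ARCH = re.compile(r"(?:sm|compute)_?1[0-3](?!\d)", re.IGNORECASE)

def legacy_arch_lines(makefile_text: str):
    """Return (line_number, line) pairs that mention a compute 1.x target."""
    return [(n, line) for n, line in enumerate(makefile_text.splitlines(), 1)
            if LEGACY_ARCH.search(line)]

# Sample lines modelled on the Makefiles discussed in this thread.
sample = """\
all: $(LIBNAME)_sm10.$(EXT) $(LIBNAME)_sm13.$(EXT) $(LIBNAME)_sm20.$(EXT)
GEN_SM20 = -gencode=arch=compute_20,code=sm_20
\t$(NVCC) -arch sm_11 -ptx -o $@ $<
"""
for n, line in legacy_arch_lines(sample):
    print(n, line.strip())
```

The sketch only reports candidate lines; deciding what to delete (and fixing the `all:` target so it still lists the sm20 library) is still a manual edit.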
|
|
|
|
|
#21 |
|
Sep 2009
2078₁₀ Posts |
Tested, it works. I've generated several smallish (100-104 digit) polys with it and they sieve OK.
Chris |
|
|
|
|
|
#22 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2³·3·5·7² Posts |
Would this fix it on Windows for later versions of CUDA, or does b40c run with later versions of CUDA on Linux? I have been unable to compile a version on Windows using the latest source (my VS2012 install is messed up permanently). I still need a recent version that works on a Core 2.
|
|
|
|