mersenneforum.org

Use of large memory pages possible with newer linux kernels (https://www.mersenneforum.org/showthread.php?t=1605)

Dresdenboy 2003-12-07 17:48

Use of large memory pages possible with newer linux kernels
 
[url]http://kerneltrap.org/node/view/418[/url]

That sounds very interesting! And it offers new possibilities for efficient FFT implementations.

Dresdenboy 2003-12-08 12:53

Ok, some more details about the use of these large pages:

Oracle managed to get an 8% speedup by using large pages. Although I have little experience in this area, I think the speedup for FFTs will be much larger, because:
[list]
[*]Even if the data is already in the L1 cache, the access time can increase if the memory addresses of that data are spread over many memory pages.
[*]The limited number of TLB entries requires fine-tuning of FFT algorithms to avoid TLB thrashing as much as possible, but this avoidance can make the algorithms less efficient.
[*]Why is it so hard for large FFTs to come even close to the MFLOPS of FFTs running completely inside the L1 (or L2) cache, in times of memory prefetching?
[*]I need at least 2 memory read/write passes to do a large FFT, but today's maximum transfer rate for P4/Opteron/AFX systems (6.4GB/s = reading the 1024k FFT data set up to 750 times per second) is hardly reachable because it drops significantly for large strides.
[/list]
I roughly estimate that a speedup of at least 10-30% could be possible.
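To put rough numbers on the TLB argument, here is a small Python sketch. It is only a toy model, not a measurement: the 4 KiB and 4 MiB page sizes, the 64-entry fully associative LRU TLB, and the column-wise walk over a 1024 x 1024 matrix of doubles are all assumptions chosen for illustration, and the function names are made up.

```python
from collections import OrderedDict

DOUBLE_BYTES = 8              # size of one double-precision element
ROWS = COLS = 1024            # 1024 x 1024 doubles = the "1024k" data set
SMALL_PAGE = 4 * 1024         # assumed 4 KiB base page
LARGE_PAGE = 4 * 1024 * 1024  # assumed 4 MiB large page
TLB_ENTRIES = 64              # hypothetical data-TLB size


def pages_needed(data_bytes, page_bytes):
    """Pages spanned by a contiguous, page-aligned buffer (ceiling division)."""
    return -(-data_bytes // page_bytes)


def strided_tlb_misses(page_bytes, columns=16, entries=TLB_ENTRIES):
    """Walk the first `columns` columns of a row-major matrix, count TLB misses."""
    tlb = OrderedDict()       # page number -> None, least recently used first
    misses = 0
    for col in range(columns):
        for row in range(ROWS):
            page = (row * COLS + col) * DOUBLE_BYTES // page_bytes
            if page in tlb:
                tlb.move_to_end(page)        # hit: mark as most recently used
            else:
                misses += 1
                tlb[page] = None
                if len(tlb) > entries:
                    tlb.popitem(last=False)  # evict the least recently used
    return misses


if __name__ == "__main__":
    data_bytes = ROWS * COLS * DOUBLE_BYTES
    for name, size in (("4 KiB", SMALL_PAGE), ("4 MiB", LARGE_PAGE)):
        print(name, "pages:", pages_needed(data_bytes, size),
              "strided misses:", strided_tlb_misses(size))
    print("full passes per second at 6.4 GB/s:", 6.4e9 / data_bytes)
```

Under these assumptions the 8 MiB data set spans 2048 small pages but only 2 large ones. In the strided walk every one of the 16384 accesses misses the TLB with 4 KiB pages, while with 4 MiB pages there are just 2 misses. The last line gives about 763 passes per second, which matches the "up to 750" figure above.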

Older discussions regarding this topic can be found here:
[url]http://www.mail-archive.com/mersenne@base.com/msg07117.html[/url]

Dresdenboy 2003-12-08 14:26

Here is a paper that gives some numbers for the speedups achieved by using "superpages": [url]http://portal.acm.org/citation.cfm?id=844138&jmp=cit&dl=GUIDE&dl=ACM[/url]

(I think the PDF can only be seen by people who have a subscription.)

Here are some results from memory-intensive benchmarks, including some impressive examples (shown together with the average speedups, because in most cases the speed increase is <10%):

[code]
SPEC benchmark          Speedup by using superpages

vpr                      38.3%
mcf                      67.6%
vortex                   11.2%
bzip2                    14.0%
average for SPECint      11.2%

galgel                   28.9%
art                      12.2%
lucas                    28.0% :cool:
apsi                     82.7%
average for SPECfp       11.0%

and some non-SPEC benchmarks:
FFTW                     54.9% :grin:
Matrix                  654.6% :shock:
[/code]
They implemented it in FreeBSD on the Alpha CPU.

I think these numbers are enough to justify GIMPS-related research in this area, especially as the feature is available in Linux kernel versions 2.5.36+ and 2.6. For first tests, it just needs a change to the system calls that allocate and release memory.
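As a rough idea of what "changing the allocation call" could look like, here is a minimal Python sketch. It is not the interface of the 2.5.36 patch: it assumes the `MAP_HUGETLB` mmap flag found in later Linux kernels, a 2 MiB huge page size, and the flag value 0x40000 (used only when Python's `mmap` module does not expose the constant). The function name `alloc_fft_buffer` is made up, and the code falls back to ordinary pages when no huge pages are available.

```python
import mmap
import sys

HUGE_PAGE = 2 * 1024 * 1024    # assumed huge page size (2 MiB)
# Assumed Linux flag value, used only if the mmap module lacks the constant.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)


def alloc_fft_buffer(nbytes):
    """Return (buffer, used_huge_pages) for an anonymous mapping."""
    # Huge-page mappings must be a multiple of the huge page size.
    length = -(-nbytes // HUGE_PAGE) * HUGE_PAGE
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    if sys.platform.startswith("linux"):
        try:
            return mmap.mmap(-1, length, flags=flags | MAP_HUGETLB), True
        except (OSError, ValueError):
            pass               # no huge pages reserved: use normal pages
    return mmap.mmap(-1, length, flags=flags), False


if __name__ == "__main__":
    buf, used_huge = alloc_fft_buffer(1024 * 1024 * 8)   # 1024k doubles
    print("bytes mapped:", len(buf), "huge pages:", used_huge)
    buf.close()                # releasing the buffer unmaps the memory
```

Only the allocation and release calls differ from a normal buffer, so an FFT routine could use the returned memory unchanged. Whether the huge-page path is taken depends on how the system is configured.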


Dresdenboy 2003-12-08 14:47

The paper (and other material) is accessible to everyone here:

[url]http://www.cs.rice.edu/~jnavarro/superpages/[/url]


All times are UTC.