mersenneforum.org  

2017-06-22, 00:33   #12
ewmayer
2ω=0
 
Sep 2002
República de California

10110111101100₂ Posts

Quote:
Originally Posted by airsquirrels
This is what I'm going to experiment with next, although as mentioned above, it seems like if I have enough cores/resources to fill with enough threads the CPU is doing a bit of this itself.
You might simply first fill each thread's mem-address buffer with a large number of random addresses [properly constrained to lie within the proper mem-chunk and 32-bit aligned, obviously], to obviate the what-is-the-optimal-buffer-size-before-doing-batch-of-reads optimization issue. Definitely curious to see your resulting numbers, in any event.
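
A minimal sketch of that suggestion, in Python for readability rather than speed. The function name, the 1 MiB chunk size, and the four-thread layout are all hypothetical, chosen only for illustration; the actual test program is not shown in this thread. Each thread gets its own list of random byte offsets that stay inside its chunk and are multiples of 4 (32-bit aligned).

```python
import random


def random_aligned_offsets(chunk_start, chunk_bytes, count, align=4, seed=None):
    """Return `count` random byte offsets inside one thread's chunk.

    Every offset is a multiple of `align` (4 bytes = 32-bit aligned), and a
    full `align`-byte read starting at it stays inside the chunk.
    """
    if chunk_start % align != 0:
        raise ValueError("chunk_start must itself be aligned")
    words = chunk_bytes // align  # number of whole aligned words in the chunk
    if words == 0:
        raise ValueError("chunk is smaller than one aligned word")
    rng = random.Random(seed)  # private generator per call; seed is optional
    return [chunk_start + align * rng.randrange(words) for _ in range(count)]


# Hypothetical layout: 4 threads, each owning a 1 MiB slice of one buffer.
CHUNK_BYTES = 1 << 20
per_thread = [
    random_aligned_offsets(t * CHUNK_BYTES, CHUNK_BYTES, count=10_000, seed=t)
    for t in range(4)
]
```

Filling the whole list up front, as suggested, keeps the random-number generation out of the timed read loop.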
2017-06-22, 14:48   #13
airsquirrels
 
"David"
Jul 2015
Ohio

517₁₀ Posts

Quote:
Originally Posted by ewmayer
You might simply first fill each thread's mem-address buffer with a large number of random addresses [properly constrained to lie within the proper mem-chunk and 32-bit aligned, obviously], to obviate the what-is-the-optimal-buffer-size-before-doing-batch-of-reads optimization issue. Definitely curious to see your resulting numbers, in any event.
Initial results for sorting/binning look promising. My work queue/bin implementation is far from efficient, but it does demonstrate that even sequentially navigating a huge 1-3GB organized index for my reads is significantly faster than random access.

Running four threads on the two-core + HT system, I was able to achieve a peak equivalent random read-and-sum rate of 97937 MB/s (up from 11,000 MB/s threaded random!). This test used a bin granularity of 4096KB with 128 slots in each bin. My next step is to iterate over the bin/slot variations.

So far (unique index bins are per thread):
1024KB pages with 128 slots (1GB index): 51590 MB/s
512KB pages with 64 slots (1GB index): 74669 MB/s
256KB (L2 cache) pages with 32 slots (1GB index): 88318 MB/s
// Max slots I could go without swapping on 16GB system
227KB (L2 cache with space) pages with 32 slots (1GB index): 81230 MB/s
4096KB pages with 128 slots (200MB index): 97937 MB/s - L3 Cache?
2730KB pages with 128 slots (393MB index): 97319 MB/s

This isn't an instant win: the current code to build the index is quite slow and the index is quite large, but it is a promising approach.
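
For readers following along, the binning idea described above can be sketched roughly as follows. This is an illustration in Python, not the code that produced the numbers in this post: `build_bins`, `read_binned`, `read_random`, the overflow list, and the small buffer sizes are all assumptions. Each bin covers `bin_bytes` of the buffer and holds at most `slots` pending offsets; reads are then issued bin by bin so that consecutive accesses land in the same small region. Offsets that do not fit in a full bin go to an overflow list here so the checksum still covers every read (how the real program treats them is not stated).

```python
import random

MASK64 = (1 << 64) - 1  # keep the running sum to 64 bits, like a C uint64_t
WORD = 4                # bytes per read (32-bit words)


def build_bins(offsets, buffer_bytes, bin_bytes, slots):
    """Group byte offsets by the bin (address range) they fall into."""
    n_bins = -(-buffer_bytes // bin_bytes)  # ceiling division
    bins = [[] for _ in range(n_bins)]
    overflow = []  # offsets whose bin already holds `slots` entries
    for off in offsets:
        b = bins[off // bin_bytes]
        if len(b) < slots:
            b.append(off)
        else:
            overflow.append(off)
    return bins, overflow


def read_random(words, offsets):
    """Baseline: read and sum in the original (random) order."""
    total = 0
    for off in offsets:
        total += words[off // WORD]
    return total & MASK64


def read_binned(words, bins, overflow):
    """Read and sum bin by bin, then handle the leftovers."""
    total = 0
    for b in bins:
        for off in b:
            total += words[off // WORD]
    for off in overflow:
        total += words[off // WORD]
    return total & MASK64


# Tiny demo with made-up sizes: a 64 KiB buffer and 4096-byte bins of 128 slots.
rng = random.Random(0)
BUFFER_BYTES = 1 << 16
words = [rng.getrandbits(32) for _ in range(BUFFER_BYTES // WORD)]
offsets = [WORD * rng.randrange(len(words)) for _ in range(1_900)]
bins, overflow = build_bins(offsets, BUFFER_BYTES, bin_bytes=4096, slots=128)
```

Python will not show the cache effect itself; the sketch only demonstrates that reordering the reads by bin leaves the sum unchanged while making the access pattern nearly sequential.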

Code:
FillBuffer took 1.54 seconds (2047.20 MB @ 1333.0938 MB/s)
Buffer Full
Sleeping 60 to sync threads
Formed 393216 KB buffer of 786432 bins, each addressing 2730 KB
Filling random queue with 97517568 reads
Formed 393216 KB buffer of 786432 bins, each addressing 2730 KB
Filling random queue with 97517568 reads
Formed 393216 KB buffer of 786432 bins, each addressing 2730 KB
Filling random queue with 97517568 reads
Formed 393216 KB buffer of 786432 bins, each addressing 2730 KB
Filling random queue with 97517568 reads
FillList took 0.49 seconds (371.85 MB @ 754.1983 MB/s)
FillList took 0.49 seconds (371.85 MB @ 753.8254 MB/s)
FillList took 0.49 seconds (371.85 MB @ 758.1578 MB/s)
FillList took 0.50 seconds (371.85 MB @ 744.0179 MB/s)
BuildBins took 6.82 seconds (371.85 MB @ 54.5591 MB/s)
Full bins: 2554092, (97.38 % located)
BuildBins took 6.84 seconds (371.85 MB @ 54.3830 MB/s)
Full bins: 2554092, (97.38 % located)
BuildBins took 7.00 seconds (371.85 MB @ 53.1307 MB/s)
Full bins: 2554092, (97.38 % located)
BuildBins took 7.08 seconds (371.85 MB @ 52.5223 MB/s)
Full bins: 2554092, (97.38 % located)
ReadRandom took 6.48 seconds (11587.69 MB @ 1788.1705 MB/s)
16a3f82c13aeadf9
Pausing...
ReadRandom took 6.47 seconds (11587.69 MB @ 1791.2486 MB/s)
16a3f82c13aeadf9
Pausing...
ReadRandom took 6.39 seconds (11587.69 MB @ 1814.8326 MB/s)
16a3f82c13aeadf9
Pausing...
ReadRandom took 6.40 seconds (11587.69 MB @ 1810.7221 MB/s)
16a3f82c13aeadf9
Pausing...
Running read test
ReadBins took 1.28 seconds (11587.69 MB @ 9041.5889 MB/s)
16d7bd1879d8eb7d
ReadBins took 1.29 seconds (11587.69 MB @ 9017.3944 MB/s)
16d7bd1879d8eb7d
ReadBins took 1.30 seconds (11587.69 MB @ 8900.2444 MB/s)
ReadBins took 1.30 seconds (11587.69 MB @ 8899.9301 MB/s)
16d7bd1879d8eb7d
16d7bd1879d8eb7d
ReadTest took 1.30 seconds (126926.42 MB @ 97319.0807 MB/s)
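
A quick sanity check on the figures in the log above, sketched in Python. The numbers are copied from the log; the 4-byte slot size and the reading of "MB" as 2**20 bytes are assumptions, so treat the derived values as an interpretation rather than a statement about the actual program.

```python
# Figures copied from the log above.
index_kib = 393216   # "Formed 393216 KB buffer"
n_bins = 786432      # "of 786432 bins"
reads = 97517568     # "Filling random queue with 97517568 reads"
buffer_mb = 2047.20  # "FillBuffer ... (2047.20 MB ...)"
read_random_rates = [1788.1705, 1791.2486, 1814.8326, 1810.7221]  # MB/s per thread
read_bins_rates = [9041.5889, 9017.3944, 8900.2444, 8899.9301]    # MB/s per thread

SLOT_BYTES = 4  # ASSUMPTION: each slot stores one 32-bit offset

index_bytes_per_bin = index_kib * 1024 // n_bins   # index space per bin
slots_per_bin = index_bytes_per_bin // SLOT_BYTES  # slots that fit in that space
reads_per_bin = reads / n_bins                     # average queued reads per bin
data_bytes_per_bin = buffer_mb * 2**20 / n_bins    # ASSUMPTION: MB = 2**20 bytes
speedup = sum(read_bins_rates) / sum(read_random_rates)

print(index_bytes_per_bin, slots_per_bin, reads_per_bin)
print(round(data_bytes_per_bin), round(speedup, 2))
```

Under those assumptions each bin gets 512 bytes of index, which is 128 four-byte slots, and the queue averages 124 reads per bin. The data range per bin works out to about 2730 bytes, which hints that the bin sizes reported in the log may be in bytes rather than KB. Summed over the four threads, the per-thread ReadBins rates are roughly five times the ReadRandom rates.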

Last fiddled with by airsquirrels on 2017-06-22 at 14:48

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
