[QUOTE=airsquirrels;461723]This is what I'm going to experiment with next, although as mentioned above, it seems like if I have enough cores/resources to fill with enough threads the CPU is doing a bit of this itself.[/QUOTE]
You might simply first fill each thread's mem-address buffer with a large number of random addresses [properly constrained to lie within the proper mem-chunk and 32-bit aligned, obviously], to obviate the what-is-the-optimal-buffer-size-before-doing-batch-of-reads optimization issue. Definitely curious to see your resulting numbers, in any event. |
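A minimal sketch of the pre-filled address buffer suggested above, in Python for brevity rather than benchmark speed. The function name `fill_address_buffer`, its parameters, and the fixed seed are hypothetical and are not taken from either poster's code. It assumes each read is 4 bytes wide, so every offset is a multiple of 4 and the last word still fits inside the chunk.

```python
import random

WORD_BYTES = 4  # 32-bit alignment, as suggested in the post


def fill_address_buffer(chunk_start, chunk_bytes, count, seed=None):
    """Return `count` random byte offsets inside one thread's memory chunk.

    Every offset is a multiple of WORD_BYTES and leaves room for a full
    word before the end of the chunk. Assumes chunk_start is itself aligned.
    """
    if chunk_start % WORD_BYTES != 0:
        raise ValueError("chunk_start must be 32-bit aligned")
    if chunk_bytes < WORD_BYTES:
        raise ValueError("chunk must hold at least one word")
    rng = random.Random(seed)
    words = chunk_bytes // WORD_BYTES  # number of whole words in the chunk
    return [chunk_start + rng.randrange(words) * WORD_BYTES
            for _ in range(count)]
```

Generating the whole list up front keeps the random-number cost out of the timed read loop, so the buffer size stops being a tuning parameter of the measurement itself.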
[QUOTE=ewmayer;461730]You might simply first fill each thread's mem-address buffer with a large number of random addresses [properly constrained to lie within the proper mem-chunk and 32-bit aligned, obviously], to obviate the what-is-the-optimal-buffer-size-before-doing-batch-of-reads optimization issue. Definitely curious to see your resulting numbers, in any event.[/QUOTE]
Initial results for sorting/binning look promising. My work queue/bin implementation is far from efficient, but it does demonstrate that even sequentially navigating a huge 1-3GB organized index for my reads is significantly faster than random access.

Running four threads on the two-core + HT system, I was able to achieve a peak equivalent random read and sum rate of 97937 MB/s (up from 11,000 MB/s threaded random!). This test used a bin granularity of 4096KB with 128 slots in each bin.

My next step is to iterate over the bin/slot variations. So far (unique index bins are per thread):

1024KB pages with 128 slots (1GB index): 51590 MB/s
512KB pages with 64 slots (1GB index): 74669 MB/s
256KB (L2 cache) pages with 32 slots (1GB index): 88318 MB/s // Max slots I could go without swapping on 16GB system
227KB (L2 cache with space) pages with 32 slots (1GB index): 81230 MB/s
4096KB pages with 128 slots (200MB index): 97937 MB/s - L3 Cache?
2730KB pages with 128 slots (393MB index): 97319 MB/s

This isn't an instant win: the current code to build the index is quite slow and the index is quite large. Still, it is a promising approach.
[CODE]
FillBuffer took 1.54 seconds (2047.20 MB @ 1333.0938 MB/s)
Buffer Full
Sleeping 60 to sync threads
Formed 393216 KB buffer of 786432 bins, each addressing 2730 KB
Filling random queue with 97517568 reads
Formed 393216 KB buffer of 786432 bins, each addressing 2730 KB
Filling random queue with 97517568 reads
Formed 393216 KB buffer of 786432 bins, each addressing 2730 KB
Filling random queue with 97517568 reads
Formed 393216 KB buffer of 786432 bins, each addressing 2730 KB
Filling random queue with 97517568 reads
FillList took 0.49 seconds (371.85 MB @ 754.1983 MB/s)
FillList took 0.49 seconds (371.85 MB @ 753.8254 MB/s)
FillList took 0.49 seconds (371.85 MB @ 758.1578 MB/s)
FillList took 0.50 seconds (371.85 MB @ 744.0179 MB/s)
BuildBins took 6.82 seconds (371.85 MB @ 54.5591 MB/s)
Full bins: 2554092, (97.38 % located)
BuildBins took 6.84 seconds (371.85 MB @ 54.3830 MB/s)
Full bins: 2554092, (97.38 % located)
BuildBins took 7.00 seconds (371.85 MB @ 53.1307 MB/s)
Full bins: 2554092, (97.38 % located)
BuildBins took 7.08 seconds (371.85 MB @ 52.5223 MB/s)
Full bins: 2554092, (97.38 % located)
ReadRandom took 6.48 seconds (11587.69 MB @ 1788.1705 MB/s)
16a3f82c13aeadf9
Pausing...
ReadRandom took 6.47 seconds (11587.69 MB @ 1791.2486 MB/s)
16a3f82c13aeadf9
Pausing...
ReadRandom took 6.39 seconds (11587.69 MB @ 1814.8326 MB/s)
16a3f82c13aeadf9
Pausing...
ReadRandom took 6.40 seconds (11587.69 MB @ 1810.7221 MB/s)
16a3f82c13aeadf9
Pausing...
Running read test
ReadBins took 1.28 seconds (11587.69 MB @ 9041.5889 MB/s)
16d7bd1879d8eb7d
ReadBins took 1.29 seconds (11587.69 MB @ 9017.3944 MB/s)
16d7bd1879d8eb7d
ReadBins took 1.30 seconds (11587.69 MB @ 8900.2444 MB/s)
ReadBins took 1.30 seconds (11587.69 MB @ 8899.9301 MB/s)
16d7bd1879d8eb7d
16d7bd1879d8eb7d
ReadTest took 1.30 seconds (126926.42 MB @ 97319.0807 MB/s)
[/CODE] |
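The binning idea in the post above can be sketched roughly as follows. This is not the poster's implementation: the names `build_bins`, `read_binned`, and `read_random` are hypothetical, and so is the treatment of full bins. The post does not say what happens to a read whose bin has no free slot, so this sketch simply sets such reads aside in an overflow list and services them last. It is written in Python to show the ordering logic only, since an interpreted loop will not reproduce the cache effects being measured.

```python
def build_bins(addresses, page_bytes, slots):
    """Group addresses by the page they fall in.

    Returns (bins, overflow). `bins` maps page index -> list of at most
    `slots` addresses; `overflow` holds addresses whose bin was already
    full (an assumption of this sketch, not the original code).
    """
    bins = {}
    overflow = []
    for addr in addresses:
        bucket = bins.setdefault(addr // page_bytes, [])
        if len(bucket) < slots:
            bucket.append(addr)
        else:
            overflow.append(addr)
    return bins, overflow


def read_binned(buffer, bins, overflow):
    """Sum one byte per address, walking pages in ascending order so that
    all reads to one page happen together."""
    total = 0
    for page in sorted(bins):
        for addr in bins[page]:
            total += buffer[addr]
    for addr in overflow:  # leftover reads, still in their original order
        total += buffer[addr]
    return total


def read_random(buffer, addresses):
    """Baseline: sum one byte per address in the order given."""
    return sum(buffer[addr] for addr in addresses)
```

Because the sum does not depend on visiting order, `read_binned` and `read_random` return the same total whenever every address ends up in either a bin or the overflow list, which gives a cheap correctness check before timing anything.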