[Neo] force preloading into memory

Erik Ask erikask at maths.lth.se
Tue Apr 20 10:42:49 CEST 2010


Tobias Ivarsson wrote:
> The speedup you are seeing is because of caching. Items that are used are
> loaded into an in-memory structure that does not need to go through any
> filesystem API, memory-mapped or not. The best way to load things into the
> cache is to run the query once to touch everything that needs to be loaded.
>
> Pre-adapting the memory-maps as you suggest would give some speedup to the
> actual process of the first query, but that time would be spent in startup
> instead, meaning that the time from cold start to completed first query
> would be exactly the same.
>
> Cheers,
> Tobias
>
> On Mon, Apr 19, 2010 at 6:31 PM, Erik Ask <ask.erik at gmail.com> wrote:
>
>   
>> Hello
>>
>> I'm getting really slow performance when working against the HD. A
>> given set of queries can take up to 10 minutes when performed the
>> first time. Repeating the same set of queries a second time executes
>> in seconds (2-5). As far as I can tell from watching in jconsole, the
>> heap behaves in almost exactly the same manner (a slowly rising
>> slope) for both transactions (each set of queries has its own
>> transaction), so it seems the speedup is due to memory mapping. I've
>> tinkered with the settings, but is there a way of explicitly forcing
>> the IO mapper to preload all or part of the node store and
>> relationship store? Am I right to assume that initially nothing is IO
>> mapped and these buffers build up during runtime as requests are
>> made? Is there any way of tuning access to the HD?
>>
>> greetz
>> _______________________________________________
>> Neo mailing list
>> User at lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>>     
>
>
>
>   
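If I follow you, the warm-up you describe amounts to running something
like the pass below once after startup. This is only a sketch against the
embedded Java API (method names are from the 1.x javadoc as far as I
recall, and the store path is made up):

    import org.neo4j.graphdb.*;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    public class WarmUp {
        public static void main(String[] args) {
            GraphDatabaseService db = new EmbeddedGraphDatabase("/path/to/db"); // made-up path
            Transaction tx = db.beginTx();
            try {
                // Touch every node, its properties and its relationships once,
                // so their records end up in the cache.
                for (Node node : db.getAllNodes()) {
                    for (String key : node.getPropertyKeys()) {
                        node.getProperty(key);
                    }
                    for (Relationship rel : node.getRelationships()) {
                        rel.getOtherNode(node);
                    }
                }
                tx.success();
            } finally {
                tx.finish();
            }
            db.shutdown();
        }
    }
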
But then I don't understand the purpose of loading the files into memory. I
thought the memory mapping was there to copy as much of a file as possible
into memory, serve all subsequent lookups from there, and, if needed, evict
parts when non-loaded regions of the file are requested more often than
loaded ones. That would result in one HD read per node/relationship
(assuming everything fits into memory and nothing has to be evicted), as
opposed to searching for entries in the file, which would require lots of
reads and comparisons. The amount of data that needs to be loaded into
memory just doesn't seem to warrant that much time being spent: I could
easily copy files several times the size of my complete DB in less time
than it takes to run my query sets.
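
To make that concrete, this is the kind of sequential read I have in mind;
it is essentially the same I/O a plain file copy does, and it only has to
touch the node and relationship store files (the directory and exact file
names here are only illustrative):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class PreRead {
        // Pull a file through a read loop once so the OS page cache holds it.
        static void touch(File f) throws IOException {
            byte[] buf = new byte[1 << 20];
            FileInputStream in = new FileInputStream(f);
            try {
                while (in.read(buf) != -1) {
                    // discard the data; the point is just to fault the pages in
                }
            } finally {
                in.close();
            }
        }

        public static void main(String[] args) throws IOException {
            File dir = new File("/path/to/db"); // illustrative store directory
            touch(new File(dir, "neostore.nodestore.db"));          // node store
            touch(new File(dir, "neostore.relationshipstore.db"));  // relationship store
        }
    }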

