[Neo] Neo4j failing transactions after a day of tiny load

Dmitri Livotov dmitri at livotov.eu
Mon Feb 15 12:00:13 CET 2010


Hi,


> Looks like a commit fails and then the TM tries to rollback the
> transaction but that also fails. Only thing TM can do then is to block
>   
Yes, but it seems it did not block: attempts to start a new transaction 
were not put on hold but threw "unable to start transaction" exceptions 
instead. In our servlets we've wrapped all neo4j access logic in 
try/catch(Throwable e) blocks, so there should be no uncaught 
exceptions.
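
One caveat with a blanket catch(Throwable): it also catches
java.lang.Error subclasses such as OutOfMemoryError, so such an error
can go unnoticed unless it is logged explicitly. Below is a minimal
sketch of a wrapper that handles ordinary exceptions but logs and
rethrows Errors. The names GuardedCall and runGuarded are hypothetical
and are not taken from neo4j or from the webapp discussed here.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedCall {
    /**
     * Runs the given work. Ordinary exceptions are recorded and reported
     * as a failed call; Errors (e.g. OutOfMemoryError) are recorded and
     * then rethrown so they are never silently swallowed.
     */
    public static boolean runGuarded(Runnable work, List<String> log) {
        try {
            work.run();
            return true;
        } catch (Exception e) {
            // Recoverable failure: record it and let the caller continue.
            log.add("request failed: " + e);
            return false;
        } catch (Error e) {
            // JVM-level problem: record it, then propagate.
            log.add("FATAL JVM error: " + e);
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<String>();
        System.out.println(runGuarded(() -> { }, log));
        System.out.println(runGuarded(() -> {
            throw new IllegalStateException("Unable to start transaction");
        }, log));
        System.out.println(log);
    }
}
```

In a servlet, the "log" list would be replaced by whatever logger the
application already uses.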

> all other running and new transactions from executing until the failed
> transaction has been resolved. The strange thing is that the original
> exception that caused the commit to fail is not logged (only the
> exception thrown on the following rollback call is logged).
>   
Could it be a problem in JTA rather than in neo4j itself?

> A commit fail could be caused by an OutOfMemoryError or no more disk
> space. If an OutOfMemoryError is thrown it could explain why some log
> messages are missing.
>
> Could you try re-running this using the neo4j-kernel 1.0-SNAPSHOT
> while monitoring heap usage and available disk space?
>
>   
Disk space is fine: there is 35 GB free on that server. 
Regarding the OOM: we did not find any OOM exceptions; however, we'll 
increase the heap by 1 GB and also add heap monitoring to check this 
along with the snapshot version.
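
The heap and disk monitoring mentioned above could be sketched roughly
as follows, using only the standard library. The class name
HeapDiskMonitor, the log line format and the use of "." as the database
directory are assumptions for illustration; the real store path and
logging setup would differ.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeapDiskMonitor {
    private static final long MB = 1024L * 1024L;

    /** Builds one log line with current heap usage and usable disk space. */
    public static String snapshot(File dbDir) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / MB;
        long maxMb = rt.maxMemory() / MB;              // very large if the JVM has no limit
        long diskFreeMb = dbDir.getUsableSpace() / MB; // 0 if the path does not exist
        return "heapUsedMb=" + usedMb
                + " heapMaxMb=" + maxMb
                + " diskFreeMb=" + diskFreeMb;
    }

    /** Prints a snapshot every periodSeconds on a daemon thread. */
    public static ScheduledExecutorService start(final File dbDir, long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heap-disk-monitor");
            t.setDaemon(true); // do not keep the JVM alive just for monitoring
            return t;
        });
        ses.scheduleAtFixedRate(
                () -> System.out.println(snapshot(dbDir)),
                0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical directory; point this at the database folder in practice.
        ScheduledExecutorService ses = start(new File("."), 1);
        Thread.sleep(2500);
        ses.shutdownNow();
    }
}
```

Started once at webapp startup, this would leave a trail of heap and
disk figures in the log leading up to any failed commit.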

Thanks,
Dmitri


> Regards,
> -Johan
>
> On Mon, Feb 15, 2010 at 10:41 AM, Dmitri Livotov <dmitri at livotov.eu> wrote:
>   
>> Morning!
>>
>> Last weekend we set up a small load test with approx. 20 threads in
>> total from a single jmeter machine in order to see how the database
>> behaves over a long period under constant load. The test requests
>> were simple:
>>
>> - (r/w) - random node read by primary key, modification of 10 properties
>> and commit
>> - (r/o) - random node read by primary key, traverse and iterate
>> traversal results
>>
>>
>> We started jmeter on Friday evening (19:00) and the database failed on
>> Saturday at about 14:00. After restarting the app server around 16:00 we
>> ran the tests again and the database failed on Sunday at about 19:00. The
>> diagnostics are strange: suddenly it fails to begin a new transaction
>> and says "Unable to start transaction". There are no further messages
>> or stack traces, just this one.
>>
>> Today we went through our server logs once again, and here is how it
>> fails in more detail:
>>
>> Suddenly,
>> "org.neo4j.kernel.impl.transaction.TransactionFailureException: Unable
>> to commit transaction. Caused by:
>> javax.transaction.HeuristicMixedException: Unable to rollback ---> error
>> code in commit: -1 ---> error code for rollback: 0" error appears. Then,
>> all subsequent requests fail with "Unable to start transaction". Only a
>> JVM restart solves the problem: if we just redeploy the webapp, neo4j
>> will not start, complaining that it cannot obtain a lock on the database
>> files, the same message you get if you try to run two neo4j instances
>> on the same database folder. So it looks like some thread keeps running
>> in memory and holding the lock on the data files.
>>
>> I'm including below the beginning of server.log from the moment
>> the first failure appears.
>>
>> I'm not sure whether this is an internal neo4j problem or something from
>> JTA, so I would appreciate your comments/suggestions. Hope we'll be able
>> to figure this out.
>>
>> Dmitri
>>
>> P.S. To clarify neo4j instance usage in the webapp: the neo4j instance
>> is initialized within a singleton class. This class is first touched
>> from a servlet context listener when the web app starts up, so the
>> database gets initialized at that phase. The servlets only use the
>> singleton class to get that neo4j instance.
>>
>>     