Our production system hangs because one Quartz thread holds a lock on the quartz_locks table (the TRIGGER_ACCESS row).
We use Quartz 1.6.5.
I have enclosed thread dumps from all four nodes of our system, taken at the time of the hang.
In the thread dump of node 4 we see the following trace:
"QuartzScheduler_PersistentScheduler-supizas4.pcs.portinfolink.com1264070532061_MisfireHandler" prio=10 tid=0x087f6800 nid=0x43a6 waiting for monitor entry [0x820fe000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.selectMisfiredTriggersInStates(StdJDBCDelegate.java:311)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.recoverMisfiredJobs(JobStoreSupport.java:926)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.doRecoverMisfires(JobStoreSupport.java:3126)
at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.manage(JobStoreSupport.java:3887)
at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.run(JobStoreSupport.java:3907)
The stack shows that this thread already owns the TRIGGER_ACCESS lock (acquired in JobStoreSupport.doRecoverMisfires), yet it then blocks waiting for a Java monitor.
This one lock brings our entire system down.
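To make the locking pattern concrete, here is a minimal self-contained sketch of what we believe the MisfireHandler does, based on our reading of StdRowLockSemaphore and JobStoreSupport in the 1.6.x sources (the class name, JDBC URL and credentials below are placeholders of our own; this is a paraphrase, not actual Quartz code):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Sketch of the row-lock pattern as we understand it; not Quartz source.
    public class TriggerAccessLockSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "quartz", "secret");
            conn.setAutoCommit(false);
            try {
                // Every scheduler node runs something like this before touching
                // trigger state; the second node to arrive blocks inside the
                // database until the first one commits or rolls back.
                PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM quartz_locks WHERE LOCK_NAME = ? FOR UPDATE");
                ps.setString(1, "TRIGGER_ACCESS");
                ResultSet rs = ps.executeQuery();
                rs.next(); // the TRIGGER_ACCESS row is now locked by this session

                // ... misfire recovery runs here while the row lock is held; if
                // this thread stalls on a JVM monitor, the lock is never released
                // and every other node piles up behind it ...

                conn.commit(); // releases the row lock
            } finally {
                conn.close();
            }
        }
    }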
We are in the midst of detailed performance testing of our system, and the system hangs rather frequently.
At that point we see a large number of threads piling up to obtain the lock on the quartz_locks table.
We have taken two thread dumps so far, each just after the system hung. You have seen one; the other
shows exactly the same pattern. Notice that in both cases the dump reports a 'deadlock' on a Java monitor,
but that deadlock seems unrelated to the threads blocking in the Quartz code.
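One extra analysis step we could take: report the monitor deadlock from inside the running server with java.lang.management, to see whether the Quartz threads are actually part of the cycle. A minimal sketch (the class name, and the idea of wiring it into a diagnostic servlet or timer, are ours):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Prints the threads involved in a JVM monitor deadlock, if any.
    // findMonitorDeadlockedThreads() is available since Java 5.
    public class DeadlockProbe {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids == null) {
                System.out.println("No monitor deadlock detected.");
                return;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
                System.out.println(info.getThreadName()
                        + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
    }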
Both Quartz threads that block do so at locations where they are simply adding items to a newly instantiated
local list (StdJDBCDelegate.java:2927 and StdJDBCDelegate.java:311).
The closest synchronization I can find is the next() call on an OracleResultsetImpl (our application server is Oracle AS).
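For reference, a minimal reconstruction of the pattern we see at those two lines (our own paraphrase, not the actual StdJDBCDelegate source):

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.LinkedList;
    import java.util.List;

    // Paraphrase of the blocked code path: the loop only appends to a freshly
    // created local list, so nothing of ours synchronizes here. The only
    // synchronized call in sight is rs.next(), which enters the Oracle driver.
    public class MisfiredTriggerScan {
        static List readTriggerKeys(ResultSet rs) throws SQLException {
            List keys = new LinkedList(); // local, unshared list
            while (rs.next()) { // driver call; could block on a driver monitor
                keys.add(rs.getString("TRIGGER_NAME") + "."
                        + rs.getString("TRIGGER_GROUP"));
            }
            return keys;
        }
    }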
Our problem is that our tests take considerable time and human resources to set up and execute.
Furthermore, we are set to go to production with the version under test next week.
Tests now frequently fail due to the locked 'TRIGGER_ACCESS' row of the quartz_locks table.
Do you see a temporary workaround, or additional things we could do to analyze this problem?
Our only option at the moment is to bring down the server from which the hanging database session originates.
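A less drastic variant we are considering (a sketch; it assumes Oracle 10g or later, where v$session exposes BLOCKING_SESSION, and an account privileged to read it): query the database for the session that holds the row, so a DBA can kill just that session with ALTER SYSTEM KILL SESSION instead of bouncing the whole server.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Lists sessions that are blocked and the session blocking them,
    // so the offending session can be killed individually by a DBA.
    public class FindBlockingSession {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "dba_user", "secret");
            Statement st = conn.createStatement();
            ResultSet rs = st.executeQuery(
                    "SELECT sid, serial#, blocking_session, seconds_in_wait "
                  + "FROM v$session WHERE blocking_session IS NOT NULL");
            while (rs.next()) {
                System.out.println("session " + rs.getLong("sid")
                        + "," + rs.getLong("serial#")
                        + " waits on session " + rs.getLong("blocking_session")
                        + " for " + rs.getLong("seconds_in_wait") + "s");
            }
            conn.close();
        }
    }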
We have not tried going back to a previous version of the JVM; there are too many competing priorities to try this in the short term.
As indicated, we suspected a correlation with the deadlocks reported in the thread dump.
Instrumentation classes of the profiler were involved in the deadlock trace, so we tried a number of runs without the instrumentation.
We had no 'hanging' Quartz locks in those runs.
However, some performance-related concerns have arisen from those tests; I will post a new message in this forum to address them.
Hope you will respond.