| Author |
Message |
05/10/2011 13:02:39
|
loriente
journeyman
Joined: 05/10/2011 12:10:29
Messages: 37
Offline
|
Several times we have seen the scheduler get stuck at startup waiting for a DB connection. In each case we found an Oracle blocking session already in place, so Quartz would sit there waiting and waiting. After the DBA killed the blocking session, the app and the Quartz scheduler started just fine.
We are running quartz-1.8.4 against Oracle 10g. Has anyone experienced a similar situation where Quartz gets stuck at startup? How did you solve it?
What could cause a lingering, already existing Quartz DB session?
Please let me know if I can provide more details to help identify/diagnose the issue.
Thanks,
nicolas.loriente
|
|
|
 |
05/10/2011 19:00:55
|
jhouse
seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline
|
Never heard of this before.
How are you creating your datasources (via Quartz config, or within app server) ?
|
|
|
 |
05/12/2011 08:40:05
|
loriente
journeyman
Joined: 05/10/2011 12:10:29
Messages: 37
Offline
|
@jhouse thanks for your reply.
Our datasource is created by Tomcat and looked up through Spring JNDI. We are using Spring's LocalDataSourceJobStore.
I have a little more info about the issue. The problem seems to be that the application server was shut down right when Quartz was holding a lock ( SELECT * FROM SCHEMA.QRTZ_LOCKS WHERE LOCK_NAME = :1 FOR UPDATE ).
When the application is restarted it just hangs, as it seems Quartz tries to get the same lock that the old orphaned Oracle session is still holding. Not only that, but any request from other nodes in the cluster just stays in line, waiting forever for the lock (held by the orphaned Oracle session) to be released.
This seems to be an issue that will occur very often as applications/nodes are shut down or restarted (especially in dev/QA environments), given that Quartz does its cluster check-in every 20 seconds. Indeed, this is happening almost every single time we deploy to our dev environment.
How can we avoid these orphaned blocking sessions holding the Quartz lock when the app is shut down, or even when a node fails?
I appreciate any advice.
Thanks,
nicolas.loriente
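
The hang described in this post can be pictured with an in-JVM analogy. The sketch below is not Quartz or Oracle code: the `ReentrantLock` is an assumed stand-in for the QRTZ_LOCKS row lock, the thread that exits without unlocking stands in for the orphaned Oracle session, and the class and method names are hypothetical. It only illustrates why a second node keeps waiting while the lock's owner is gone but the lock itself was never released.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Analogy only: rowLock plays the role of the QRTZ_LOCKS row lock,
// and the "orphaned-session" thread plays the role of the orphaned Oracle session.
class OrphanedLockDemo {

    /**
     * Returns true if a second "node" obtains the lock within waitMillis
     * after the first holder has gone away without releasing it.
     */
    static boolean secondNodeAcquires(long waitMillis) {
        ReentrantLock rowLock = new ReentrantLock();

        // First "node": takes the lock and terminates without calling unlock().
        Thread orphan = new Thread(rowLock::lock, "orphaned-session");
        orphan.start();
        try {
            orphan.join();
            // Second "node": uses a bounded wait instead of blocking forever.
            boolean acquired = rowLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                rowLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("second node acquired lock: " + secondNodeAcquires(200));
    }
}
```

Here `secondNodeAcquires` returns false, because nothing ever releases the lock on behalf of the holder that went away. In the database case the equivalent release is the rollback of the dead session's transaction, which is the step under discussion in this thread.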
|
|
|
 |
05/13/2011 02:57:43
|
jhouse
seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline
|
This sounds like a question for the Oracle DBAs.
I'm very familiar with (responsible for) many deployments of Quartz against Oracle and have never run into this issue, and indeed there are many thousands of others using Quartz + Oracle.
It sounds like some setting in your Oracle setup is keeping it from quickly recognizing the dropped connection (session) and rolling back its in-progress transaction (and thereby releasing its row locks).
|
|
|
 |
05/13/2011 08:29:16
|
loriente
journeyman
Joined: 05/10/2011 12:10:29
Messages: 37
Offline
|
@jhouse
I'm following up with the DBA; in the meantime, let me share my setup.
We are using Spring SchedulerFactoryBean and our datasource is a Tomcat datasource:
SchedulerFactoryBean Code:
<bean name="quartzScheduler" class="org.springframework.scheduling.quartz.SchedulerFactoryBean" >
  <property name="jobFactory">
    <bean class="com.scheduling.job.factory.SpringBeanJobFactory" />
  </property>
  <property name="dataSource" ref="dataSource" />
  <property name="transactionManager" ref="transactionManager" />
  <property name="taskExecutor" ref="taskExecutor" />
  <property name="quartzProperties">
    <util:properties location="classpath:./META-INF/props/quartz.properties">
      <prop key="org.quartz.jobStore.tablePrefix">${hibernate.default_schema}.QRTZ_</prop>
    </util:properties>
  </property>
  <property name="applicationContextSchedulerContextKey" value="applicationContext" />
  <property name="waitForJobsToCompleteOnShutdown" value="true" />
</bean>
quartz.properties Code:
org.quartz.scheduler.instanceName=QuartzScheduler
org.quartz.scheduler.instanceId=AUTO
org.quartz.jobStore.class=org.springframework.scheduling.quartz.LocalDataSourceJobStore
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
org.quartz.jobStore.isClustered=true
org.quartz.jobStore.clusterCheckinInterval=20000
org.quartz.jobStore.misfireThreshold=60000
org.quartz.jobStore.selectWithLockSQL=SELECT * FROM {0}LOCKS WHERE LOCK_NAME = ? FOR UPDATE
org.quartz.plugin.triggHistory.class=org.quartz.plugins.history.LoggingTriggerHistoryPlugin
org.quartz.plugin.triggHistory.triggerFiredMessage=Trigger {1}.{0} fired job {6}.{5} at: {4, date, HH:mm:ss dd/MM/yyyy}
org.quartz.plugin.triggHistory.triggerCompleteMessage=Trigger {1}.{0} completed firing job {6}.{5} at {4, date, HH:mm:ss dd/MM/yyyy} with resulting trigger instruction code: {9}
org.quartz.plugin.jobHistory.class=org.quartz.plugins.history.LoggingJobHistoryPlugin
org.quartz.plugin.jobHistory.jobSuccessMessage=Job {1}.{0} fired at: {2, date, dd/MM/yyyy HH:mm:ss} result=OK
org.quartz.plugin.jobHistory.jobFailedMessage=Job {1}.{0} fired at: {2, date, dd/MM/yyyy HH:mm:ss} result=ERROR
context.xml Code:
<Resource
  name="jdbc/ourDb"
  auth="Container"
  type="oracle.jdbc.pool.OracleDataSource"
  driverClassName="oracle.jdbc.driver.OracleDriver"
  factory="oracle.jdbc.pool.OracleDataSourceFactory"
  url="jdbc:oracle:thin:@//host:port/service"
  user="user"
  password="password"
  implicitCachingEnabled="true"
  connectionCachingEnabled="true"
  connectionCacheName="ourDBCache"
  connectionCacheProperties="{MinLimit=10, MaxLimit=200, InitialLimit=30, MaxStatementsLimit=0, ConnectionWaitTimeout=10, AbandonedConnectionTimeout=60}" />
QuartzScheduler tx:advice Code:
<!-- Quartz Scheduler Transaction Config -->
<aop:config>
  <aop:pointcut id="quartzSchedulerPointcut"
    expression="execution(* org.quartz.Scheduler.*(..))" />
  <aop:advisor advice-ref="quartzSchedulerAdvice"
    pointcut-ref="quartzSchedulerPointcut" />
</aop:config>
<!-- Quartz Scheduler Transaction Propagation -->
<tx:advice id="quartzSchedulerAdvice">
  <tx:attributes>
    <tx:method name="get*" read-only="true" propagation="SUPPORTS" />
    <tx:method name="set*" read-only="true" propagation="SUPPORTS" />
    <tx:method name="is*" read-only="true" propagation="SUPPORTS" />
    <tx:method name="insert*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="update*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="delete*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="schedule*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="pause*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="resume*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="run*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="update*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="delete*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="toggle*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
    <tx:method name="clone*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
  </tx:attributes>
</tx:advice>
We are running with autocommit set to false. We don't set up a nonTransactionalDataSource, as per the Spring Javadoc: "With a non-XA DataSource and local Spring transactions, a single DataSource argument is sufficient."
Have you needed to use values other than defaults for the following properties?
org.quartz.jobStore.txIsolationLevelSerializable
org.quartz.jobStore.txIsolationLevelReadCommitted
org.quartz.jobStore.acquireTriggersWithinLock
org.quartz.jobStore.lockHandler.class
Please let me know if you spot something wrong in our configuration.
Thanks,
nicolas.loriente
|
|
|
 |
05/13/2011 12:48:43
|
loriente
journeyman
Joined: 05/10/2011 12:10:29
Messages: 37
Offline
|
This is happening every single time we shut down and restart. In dev that is every 2 hours, when we do a new deployment. We need to get the DBA to kill the sessions and only then restart the app.
We have also seen this happen at startup when there are NO orphaned blocking sessions (because we killed them all), as if Quartz is locking itself.
Does no one from those thousands of implementations you mention have any ideas? Has any one of those implementations gone through similar issues?
I've posted my configuration, which seems pretty plain to me. We are not doing anything special. So I don't understand how we can be the only ones having this issue out of thousands of implementations.
Any idea is appreciated. Thanks!
nicolas.loriente
|
|
|
 |
05/13/2011 13:00:01
|
loriente
journeyman
Joined: 05/10/2011 12:10:29
Messages: 37
Offline
|
This is what I got from DBA:
"There is no easy option to resolve this, especially when the CLIENT process requests an "EXPLICIT LOCK" by doing a "FOR UPDATE" and then crashes/exits and leaves the server thread (on oracle db server) running."
I don't know if he is correct here or not as I'm not a DB person.
Jhouse, out of all the Oracle implementations you personally worked on you've never seen this issue?
For us it has become the norm. Every time we shut down/restart for a deployment it occurs. It is very rare that it doesn't.
Does the configuration look OK?
Thanks
nicolas.loriente
|
|
|
 |
05/17/2011 22:13:13
|
jhouse
seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline
|
> Jhouse, out of all the Oracle implementations you personally worked on you've never seen this issue?
That's right. And with thousands of folks using Oracle with Quartz, you're the first one ever (in over a decade) to complain of this - so I'm very inclined to think it is something to do with your setup.
In all my experience with Oracle (which is considerable) - and in line with how other databases (such as PostgreSQL) act - when the connection dies, Oracle rolls back the work (which releases the locks).
The exception to this is when XA connections are being used, and the connection/process dies between phase 1 and phase 2 of the commit. In that case Oracle (correctly) flags the transaction as "in doubt". The locks are then still held until one of the following occurs: 1- the process is restarted and the TM tells Oracle how to complete phase 2 (commit or rollback), 2- the transaction timeout is reached, 3- a DBA resolves the in-doubt transaction manually.
|
|
|
 |
05/18/2011 12:49:45
|
loriente
journeyman
Joined: 05/10/2011 12:10:29
Messages: 37
Offline
|
Thanks for your reply. We are not using XA connections. This is what I'm getting from the DBA:
"connections that linger on oracle database have to be manually addressed"
He even mentioned that he has seen long-running operations stay alive on the Oracle side even though the client has already disconnected (e.g. running a stored procedure from a SQL client and then disconnecting).
It's a bit puzzling since I'm getting two conflicting opinions. On one side you are saying that Oracle should notice the dropped connection and roll back the existing session. On the other side the DBA is telling me this is well known and that lingering sessions need manual intervention (killing), as Oracle won't do that on its own.
I want to make sure that I've done everything correctly on my end and that there is nothing left I can do from a Quartz point of view. If that is the case, it is up to the DBAs to provide a solution for orphaned/hung sessions.
Do you mind taking a close look at my earlier post, where I posted my configuration and Quartz properties, and letting me know if they look OK to you?
Also, could setting these properties to something other than the defaults have a positive effect on the issue at hand?
Code:
org.quartz.jobStore.txIsolationLevelSerializable
org.quartz.jobStore.txIsolationLevelReadCommitted
org.quartz.jobStore.acquireTriggersWithinLock
org.quartz.jobStore.lockHandler.class
I would really appreciate any comments from people who are using clustered Quartz on an Oracle database.
I would also really appreciate comments from any Oracle guru who happens to read this post.
Thanks,
nicolas.loriente
|
|
|
 |
05/19/2011 18:40:45
|
jhouse
seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline
|
There's nothing in your above properties that catches my eye as odd.
Also, the other properties you are inquiring about (related to isolation levels and locking) should have no bearing.
On my honor I swear Quartz is used a LOT against Oracle (by myself and thousands of others), both clustered and unclustered, and I've never heard of this before.
I'll try to make an inquiry or two with some DBAs that I know to see if they know what could lead to this.
Maybe most DBAs enable this property but yours has disabled it?
http://www.toadworld.com/KNOWLEDGE/KnowledgeXpertforOracle/tabid/648/TopicID/NET7G/Default.aspx
james
|
|
|
 |
05/23/2011 15:04:34
|
loriente
journeyman
Joined: 05/10/2011 12:10:29
Messages: 37
Offline
|
@jhouse
I'm using Spring's SchedulerFactoryBean, and its implementation of the destroy lifecycle method calls scheduler.shutdown(true).
Questions:
1. Does Quartz do any cleanup of locks, etc. when a shutdown is requested?
2. What happens if Quartz gets the row lock and a shutdown is requested before it has finished using it? Does Quartz release the lock before shutting down, or does it just shut down without cleaning up DB resources? (Same as question 1, just more detailed.)
Just trying to make sure that Quartz is doing its cleanup and that this issue is only on the DB end.
Thanks,
nicolas.loriente
|
|
|
 |
05/23/2011 19:55:23
|
jhouse
seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline
|
If Quartz is allowed to shutdown(), rather than the process just being killed, then yes, all locks will be explicitly released.
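
The distinction above, between a shutdown that is allowed to run and a process that is simply killed, can be sketched with a JVM shutdown hook that gives the shutdown call a bounded amount of time. This is only an illustration under stated assumptions: `stopScheduler` is a hypothetical stand-in for `scheduler.shutdown(true)`, `GracefulShutdown` and `runWithin` are made-up names, and the 30-second budget is an assumed value. In the Spring setup from this thread the container already triggers the shutdown through the bean's destroy method, so the sketch speaks only to the timing concern. A shutdown hook runs on a normal exit or on SIGTERM, but not when the process is killed with SIGKILL (`kill -9`).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class GracefulShutdown {

    /** Runs shutdownAction and waits up to timeoutMillis; returns true if it completed in time. */
    static boolean runWithin(Runnable shutdownAction, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "graceful-shutdown");
            t.setDaemon(true); // a hung action must not keep the JVM alive
            return t;
        });
        Future<?> done = executor.submit(shutdownAction);
        try {
            done.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            done.cancel(true); // interrupt the action; it did not finish in time
            return false;
        } catch (ExecutionException e) {
            return false; // the action itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for scheduler.shutdown(true).
        Runnable stopScheduler = () -> System.out.println("scheduler stopped");

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            boolean clean = runWithin(stopScheduler, 30_000); // assumed 30 s budget
            System.out.println(clean ? "clean shutdown" : "shutdown did not finish in time");
        }));
    }
}
```

If the stop script waits less time than the shutdown needs before force-killing the process, the hook never completes and any transaction still open at that moment is left for the database to clean up.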
|
|
|
 |
05/24/2011 09:59:36
|
loriente
journeyman
Joined: 05/10/2011 12:10:29
Messages: 37
Offline
|
@jhouse
One of the things I started suspecting was that they were killing the app server before the Quartz scheduler had a chance to shut down gracefully.
We haven't seen the issue since last week. Since then I've been checking the logs to see whether Quartz was shut down properly, and indeed every one of those times Quartz shut down gracefully.
I haven't been able to make the connection yet since the problem hasn't happened again (I should go search the older logs), but I wouldn't be surprised if that is the cause of our issue.
I'll let you know when I can pinpoint the problem.
Thanks,
nicolas.loriente
|
|
|
 |
06/20/2012 11:25:48
|
hawala
neo
Joined: 06/20/2012 11:24:42
Messages: 1
Offline
|
@loriente @jhouse
Do you have any update on this one? We are facing the exact same problem when using Quartz in a clustered environment with Oracle.
|
|
|
 |
07/25/2012 14:02:35
|
snide
neo
Joined: 07/25/2012 13:58:22
Messages: 1
Offline
|
We're seeing this behavior as well in our development environment where many (approx 6) are sharing a common database. We're going to try and increase the time we allow for graceful shutdown and I've pinged the DBA crew to see if we can add the SQLNET.EXPIRE_TIME setting to the server. I think that should mitigate the issue. I'll update later with the results.
|
|
|
 |
|
|