[Logo] Terracotta Discussion Forums
Oracle orphaned session - Scheduler stuck at startup
Forum Index -> Quartz
Author Message
loriente

journeyman

Joined: 05/10/2011 12:10:29
Messages: 37
Offline

We have several times seen the scheduler get stuck at startup waiting for a DB connection. In each case we found an existing blocking Oracle session, so Quartz would just sit there waiting. After the DBA killed the blocking session, the app and the Quartz scheduler started just fine.

We are running quartz-1.8.4 against Oracle 10g. Has anyone experienced a similar situation where Quartz gets stuck at startup? How did you solve it?

What could cause a lingering (already existing) Quartz DB session?

Please let me know if I can provide more details to help identify/diagnose the issue.

Thanks,

nicolas.loriente
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline


Never heard of this before.

How are you creating your datasources (via Quartz config, or within the app server)?
loriente

journeyman

Joined: 05/10/2011 12:10:29
Messages: 37
Offline

@jhouse thanks for your reply.

Our datasource is created by Tomcat and looked up through Spring JNDI. We are using Spring's LocalDataSourceJobStore.

I have a little more info about the issue. The problem seems to be that the application server was shut down right when Quartz was holding a lock (SELECT * FROM SCHEMA.QRTZ_LOCKS WHERE LOCK_NAME = :1 FOR UPDATE).

When the application is restarted it just hangs, apparently because Quartz tries to get the same lock that the old orphaned Oracle session is still holding. Not only that, but every request from the other nodes in the cluster also queues up, waiting forever for the lock (held by the orphaned Oracle session) to be released.

This seems likely to occur very often, since applications/nodes are shut down or restarted regularly (especially in dev/QA environments) and Quartz does its cluster check-in every 20 seconds. And indeed it happens almost every single time we deploy to our dev environment.

How can we avoid these orphaned blocking sessions holding the Quartz lock when the app is shut down, or even when a node fails?


I appreciate any advice.

Thanks,

nicolas.loriente
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline


This sounds like a question for the Oracle DBAs.

I'm very familiar with (responsible for) many deployments of Quartz against Oracle and have never run into this issue, and indeed there are many thousands of others using Quartz + Oracle.


It sounds like some setting in your Oracle setup is keeping it from quickly recognizing the dropped connection (session) and rolling back its in-progress transaction (and thereby releasing its row locks).
loriente

journeyman

Joined: 05/10/2011 12:10:29
Messages: 37
Offline

@jhouse

I'm following up with the DBA. In the meantime, let me share my setup.

We are using Spring SchedulerFactoryBean and our datasource is a Tomcat datasource:

SchedulerFactoryBean Code:
 <bean name="quartzScheduler" class="org.springframework.scheduling.quartz.SchedulerFactoryBean" >
 	<property name="jobFactory">    
 		<bean class="com.scheduling.job.factory.SpringBeanJobFactory" />  
 	</property>  
 	<property name="dataSource" ref="dataSource" />  
 	<property name="transactionManager" ref="transactionManager" />
 	<property name="taskExecutor" ref="taskExecutor" />  
 	<property name="quartzProperties">    
 		<util:properties location="classpath:./META-INF/props/quartz.properties">
 			<prop key="org.quartz.jobStore.tablePrefix">${hibernate.default_schema}.QRTZ_</prop>
 		</util:properties>
 	</property>  
 	<property name="applicationContextSchedulerContextKey" value="applicationContext" />  
 	<property name="waitForJobsToCompleteOnShutdown" value="true" />
 </bean>
 


quartz.properties Code:
 org.quartz.scheduler.instanceName=QuartzScheduler 
 org.quartz.scheduler.instanceId=AUTO 
 
 org.quartz.jobStore.class=org.springframework.scheduling.quartz.LocalDataSourceJobStore
 org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
 org.quartz.jobStore.isClustered=true 
 org.quartz.jobStore.clusterCheckinInterval=20000 
 org.quartz.jobStore.misfireThreshold=60000
 org.quartz.jobStore.selectWithLockSQL=SELECT * FROM {0}LOCKS WHERE LOCK_NAME = ? FOR UPDATE
 
 org.quartz.plugin.triggHistory.class=org.quartz.plugins.history.LoggingTriggerHistoryPlugin
 org.quartz.plugin.triggHistory.triggerFiredMessage=Trigger {1}.{0} fired job {6}.{5} at: {4, date, HH:mm:ss dd/MM/yyyy}
 org.quartz.plugin.triggHistory.triggerCompleteMessage=Trigger {1}.{0} completed firing job {6}.{5} at {4, date, HH:mm:ss dd/MM/yyyy} with resulting trigger instruction code: {9}
 org.quartz.plugin.jobHistory.class=org.quartz.plugins.history.LoggingJobHistoryPlugin
 org.quartz.plugin.jobHistory.jobSuccessMessage=Job {1}.{0} fired at: {2, date, dd/MM/yyyy HH:mm:ss} result=OK
 org.quartz.plugin.jobHistory.jobFailedMessage=Job {1}.{0} fired at: {2, date, dd/MM/yyyy HH:mm:ss} result=ERROR
 


context.xml Code:
 <Resource
     name="jdbc/ourDb"
     auth="Container"
     type="oracle.jdbc.pool.OracleDataSource"
     driverClassName="oracle.jdbc.driver.OracleDriver"
     factory="oracle.jdbc.pool.OracleDataSourceFactory"
     url="jdbc:oracle:thin:@//host:port/service"
     user="user"
     password="password"
     implicitCachingEnabled="true"
     connectionCachingEnabled="true"
     connectionCacheName="ourDBCache"
     connectionCacheProperties="{MinLimit=10, MaxLimit=200, InitialLimit=30, MaxStatementsLimit=0, ConnectionWaitTimeout=10, AbandonedConnectionTimeout=60}" />
 


QuartzScheduler tx:advice Code:
  <!-- Quartz Scheduler Transaction Config -->
 <aop:config>
 	<aop:pointcut id="quartzSchedulerPointcut"
 			expression="execution(* org.quartz.Scheduler.*(..))" />
 	<aop:advisor advice-ref="quartzSchedulerAdvice"
 			pointcut-ref="quartzSchedulerPointcut" />
 </aop:config>	
 	
 <!--  Quartz Scheduler Transaction Propagation -->
 <tx:advice id="quartzSchedulerAdvice">
 	<tx:attributes>		 
 		<tx:method name="get*" read-only="true" propagation="SUPPORTS" />
 		<tx:method name="set*" read-only="true" propagation="SUPPORTS" />
 		<tx:method name="is*" read-only="true" propagation="SUPPORTS" />
 		<tx:method name="insert*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 		<tx:method name="update*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 		<tx:method name="delete*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 		<tx:method name="schedule*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 		<tx:method name="pause*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 		<tx:method name="resume*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 		<tx:method name="run*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 		<tx:method name="toggle*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 		<tx:method name="clone*" read-only="false" propagation="REQUIRED" rollback-for="QuartzWrapperException" />
 	</tx:attributes>
 </tx:advice>
 


We are running with autocommit set to false. We don't set up a nonTransactionalDataSource because, per the Spring javadoc, "With a non-XA DataSource and local Spring transactions, a single DataSource argument is sufficient."

Have you needed to use values other than defaults for the following properties?

org.quartz.jobStore.txIsolationLevelSerializable
org.quartz.jobStore.txIsolationLevelReadCommitted
org.quartz.jobStore.acquireTriggersWithinLock
org.quartz.jobStore.lockHandler.class

Please let me know if you spot something wrong in our configuration.

Thanks,

nicolas.loriente
loriente

journeyman

Joined: 05/10/2011 12:10:29
Messages: 37
Offline

This is happening every single time we shut down and restart, and in dev that is every 2 hours, when we do a new deployment. We need to get the DBA to kill the sessions and only then restart the app.

We have also seen this happen at startup when there are NO orphaned blocking sessions (because we killed them all), as if Quartz were locking itself.

Does nobody from those thousands of implementations you mention have any ideas? Has any of them gone through somewhat similar issues?

I've posted my configuration, which seems pretty plain to me. We are not doing anything special, so I don't understand how we can be the only ones having these issues out of thousands of implementations.

Any idea is appreciated. Thanks!

nicolas.loriente

loriente

journeyman

Joined: 05/10/2011 12:10:29
Messages: 37
Offline

This is what I got from the DBA:

"There is no easy option to resolve this, especially when the CLIENT process requests a “EXPLICIT LOCK” by doing a “FOR UPDATE” and then crashes/exits and leaves the server thread (on oracle db server) running."

I don't know if he is correct here or not as I'm not a DB person.

Jhouse, out of all the Oracle implementations you personally worked on you've never seen this issue?

For us it has become the norm. Every time we shut down/restart for deployment it occurs. It is very rare that it doesn't.

Does the configuration look OK?

Thanks

nicolas.loriente
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline


> Jhouse, out of all the Oracle implementations you personally worked on you've never seen this issue?

That's right. And with thousands of folks using Oracle with Quartz, you're the first one ever (in over a decade) to complain of this - so I'm very inclined to think it is something to do with your setup.

In all my experience with Oracle (which is considerable) - and in line with how other databases (such as PostgreSQL) act - when the connection dies, Oracle rolls back the work (which releases the locks).

The exception to this is when XA connections are being used, and the connection/process dies between phase 1 and phase 2 of the commit. In that case Oracle (correctly) flags the transaction as "in doubt". The locks are then still held until one of the following occurs: 1- the process is restarted and the TM tells Oracle how to complete phase 2 (commit or rollback), 2- the transaction timeout is reached, 3- a DBA resolves the in-doubt transaction manually.

loriente

journeyman

Joined: 05/10/2011 12:10:29
Messages: 37
Offline

Thanks for your reply. We are not using XA connections. This is what I'm getting from the DBA:

connections that linger on oracle database have to be manually addressed


He even mentioned that he has seen long-running operations stay alive on the Oracle side even though the client had already disconnected (e.g. running a stored procedure from a SQL client and then disconnecting).

It's a bit puzzling since I'm getting two conflicting opinions. On one side, you are saying that Oracle should notice the dropped connection and roll back the existing session. On the other side, the DBA is telling me this is well known and that lingering sessions need manual intervention (killing), as Oracle won't do that on its own.

I want to make sure that I've done everything correctly on my end and that there is nothing left I can do from a Quartz point of view. If that is the case, then it is up to the DBAs to provide a solution for orphaned/hung sessions.

Do you mind taking a close look at my earlier post, where I posted my configuration and Quartz properties, and letting me know if they look OK to you?

Also, could setting these properties to something other than the defaults have a positive effect on the issue at hand?

Code:
org.quartz.jobStore.txIsolationLevelSerializable
 org.quartz.jobStore.txIsolationLevelReadCommitted
 org.quartz.jobStore.acquireTriggersWithinLock
 org.quartz.jobStore.lockHandler.class 


I would really appreciate any comments from people who are using clustered Quartz on an Oracle database.
I would also really appreciate any comments from any Oracle guru who happens to read this post.

Thanks,


nicolas.loriente
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline


There's nothing in your above properties that catches my eye as odd.

Also the other properties you are inquiring about (related to isolation levels and locking) should have no bearing.

On my honor I swear Quartz is used a LOT against Oracle (by myself and thousands of others), both clustered and unclustered, and I've never heard of this before.

I'll try to make an inquiry or two with some DBAs that I know to see if they know what could lead to this.

Maybe most DBAs enable this property but yours has disabled it?

http://www.toadworld.com/KNOWLEDGE/KnowledgeXpertforOracle/tabid/648/TopicID/NET7G/Default.aspx

james
loriente

journeyman

Joined: 05/10/2011 12:10:29
Messages: 37
Offline

@jhouse

I'm using Spring's SchedulerFactoryBean, whose implementation of the destroy lifecycle method calls scheduler.shutdown(true).

Questions:
1. Does Quartz do any cleanup of locks, etc. when a shutdown is requested?
2. What happens if Quartz gets the row lock and a shutdown is requested before it has finished using it? Does Quartz release the lock before shutting down, or does it just shut down without cleaning up DB resources? (Same as question 1, just more detailed.)

Just trying to make sure that Quartz is doing its cleanup and that this issue is only on the DB end.


Thanks,

nicolas.loriente
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline



If Quartz is allowed to shutdown(), rather than the process just being killed, then yes, all locks will be explicitly released.
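
To illustrate the distinction, here is a minimal sketch of an orderly shutdown path for a standalone JVM. StoppableScheduler and RecordingScheduler are hypothetical stand-ins for org.quartz.Scheduler, used only so the example compiles without Quartz on the classpath; in a real application the hook body would call shutdown(true) on the actual scheduler. In a Spring/Tomcat setup like the one described above, SchedulerFactoryBean's destroy callback already plays this role, provided the container itself is stopped cleanly.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class GracefulShutdownSketch {

    /** Hypothetical stand-in mirroring the shape of Scheduler.shutdown(boolean). */
    interface StoppableScheduler {
        void shutdown(boolean waitForJobsToComplete) throws Exception;
    }

    /** Hypothetical test double: only records that shutdown was requested. */
    static class RecordingScheduler implements StoppableScheduler {
        final AtomicBoolean shutDown = new AtomicBoolean(false);

        @Override
        public void shutdown(boolean waitForJobsToComplete) {
            shutDown.set(true);
        }
    }

    /** Builds (but does not register) a hook that stops the scheduler. */
    static Thread shutdownHookFor(StoppableScheduler scheduler) {
        return new Thread(() -> {
            try {
                // true = wait for running jobs to finish before returning
                scheduler.shutdown(true);
            } catch (Exception e) {
                System.err.println("Scheduler shutdown failed: " + e);
            }
        }, "scheduler-shutdown-hook");
    }

    public static void main(String[] args) {
        RecordingScheduler scheduler = new RecordingScheduler();
        // Registered hooks run when the JVM exits in an orderly way.
        Runtime.getRuntime().addShutdownHook(shutdownHookFor(scheduler));
        System.out.println("Hook registered; an orderly exit lets it run.");
    }
}
```

A JVM shutdown hook runs on normal exit and on SIGTERM, but not on kill -9. That last case is the one where nothing on the client side gets a chance to clean up, and the database is left to detect the dead connection on its own.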

loriente

journeyman

Joined: 05/10/2011 12:10:29
Messages: 37
Offline

@jhouse

One of the things I started suspecting was that they were killing the app server before the Quartz scheduler had a chance to shut down gracefully.

We haven't seen the issue since last week, and since then I've been checking the logs to see whether Quartz was shut down properly; indeed, every one of those times Quartz shut down gracefully.

I haven't been able to make the connection yet, since the problem hasn't happened again (I should go search the logs), but I wouldn't be surprised if that is the cause of our issue.

I'll let you know when I can pinpoint the problem.

Thanks,


nicolas.loriente
hawala

neo

Joined: 06/20/2012 11:24:42
Messages: 1
Offline

@loriente @jhouse

Do you have any update on this one? We are facing the exact same problem when using Quartz in a clustered environment with Oracle.
snide

neo

Joined: 07/25/2012 13:58:22
Messages: 1
Offline

We're seeing this behavior as well in our development environment where many (approx 6) are sharing a common database. We're going to try and increase the time we allow for graceful shutdown and I've pinged the DBA crew to see if we can add the SQLNET.EXPIRE_TIME setting to the server. I think that should mitigate the issue. I'll update later with the results.
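
For reference, a minimal sketch of what that server-side setting might look like. SQLNET.EXPIRE_TIME goes in sqlnet.ora on the database server and is expressed in minutes; the value 10 below is only an illustrative assumption, not something tested in this thread.

```
# sqlnet.ora on the Oracle database server (not on the application host)
# Probe each client connection every 10 minutes. Sessions whose client
# no longer answers are cleaned up, which rolls back their open
# transaction and releases its row locks.
SQLNET.EXPIRE_TIME = 10
```

With the default of 0 the probe is disabled, so an orphaned session may keep holding the QRTZ_LOCKS row until someone kills it.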
 