Terracotta Discussion Forums
Overlapping stateful job executions in a cluster
Forum Index -> Quartz
zutai

neo

Joined: 03/05/2012 07:36:35
Messages: 2
Offline

Hi,

Here is my environment:
a Tomcat cluster of 2 nodes running Quartz 2.1.3 in cluster mode, using JobStoreTX with an Oracle database.

My test:
I have a very simple stateful job which does the following:
- prints "start"
- waits for 90 seconds
- prints "end"

I schedule it every 20 seconds with a cron trigger (using the MisfireHandlingInstructionDoNothing setting).

When both nodes are up, there is no problem: because the job is stateful, the scheduler waits until the current execution ends before firing a new one.
The cluster works properly; some executions run on node A and others on node B.

However, I have a problem in the following case:
While the job is executing on Node A, I shut this node down.
Node A waits for the job to end before completing its shutdown.

The problem is that between the moment I call "shutdown" and the moment the job ends, Node B starts another execution of the job!

Can anyone explain what is happening?


Here is what we can see in the log files:

NODE A:
2012-03-05 11:25:35,170 [http-nio-/0.0.0.0-9380-exec-13] Scheduler BigSchedulerGlobal_$_cma11330943135100 started.
2012-03-05 11:26:20,032 [BigSchedulerGlobal_Worker-3] Quartz Test Job Start.
2012-03-05 11:27:50,029 [BigSchedulerGlobal_Worker-3] Quartz Test Job End.
2012-03-05 11:27:50,083 [BigSchedulerGlobal_Worker-8] Quartz Test Job Start.
2012-03-05 11:27:59,293 [stop children - Catalina:j2eeType=WebModule,name=//localhost/casino,J2EEApplication=none,J2EEServer=none] Scheduler BigSchedulerGlobal_$_cma11330943135100 shutting down.
2012-03-05 11:27:59,293 [stop children - Catalina:j2eeType=WebModule,name=//localhost/casino,J2EEApplication=none,J2EEServer=none] Scheduler BigSchedulerGlobal_$_cma11330943135100 paused.
2012-03-05 11:27:59,306 [stop children - Catalina:j2eeType=WebModule,name=//localhost/casino,J2EEApplication=none,J2EEServer=none] Scheduler BigSchedulerGlobal_$_cma11330943135100 shutdown complete.
2012-03-05 11:29:20,079 [BigSchedulerGlobal_Worker-8] Quartz Test Job End.


NODE B:
2012-03-05 11:25:37,282 [http-nio-/0.0.0.0-8380-exec-18] Scheduler BigSchedulerGlobal_$_cma11330943137232 started.
2012-03-05 11:28:14,829 [QuartzScheduler_BigSchedulerGlobal-cma11330943137232_ClusterManager] ClusterManager: detected 1 failed or restarted instances.
2012-03-05 11:28:14,829 [QuartzScheduler_BigSchedulerGlobal-cma11330943137232_ClusterManager] ClusterManager: Scanning for instance "cma11330943135100"'s failed in-progress jobs.
2012-03-05 11:28:14,982 [QuartzScheduler_BigSchedulerGlobal-cma11330943137232_ClusterManager] ClusterManager: ......Cleaned-up 1 other failed job(s).
2012-03-05 11:28:20,022 [BigSchedulerGlobal_Worker-10] Quartz Test Job Start.
2012-03-05 11:29:27,905 [BigSchedulerGlobal_Worker-6] Quartz Test Job Start.
2012-03-05 11:29:50,020 [BigSchedulerGlobal_Worker-10] Quartz Test Job End.
2012-03-05 11:30:20,021 [BigSchedulerGlobal_Worker-5] Quartz Test Job Start.
2012-03-05 11:30:57,901 [BigSchedulerGlobal_Worker-6] Quartz Test Job End.
2012-03-05 11:31:20,021 [BigSchedulerGlobal_Worker-3] Quartz Test Job Start.
2012-03-05 11:31:50,019 [BigSchedulerGlobal_Worker-5] Quartz Test Job End.
2012-03-05 11:32:20,031 [BigSchedulerGlobal_Worker-2] Quartz Test Job Start.
2012-03-05 11:32:50,024 [BigSchedulerGlobal_Worker-3] Quartz Test Job End.
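
To make the overlap concrete, the timestamps in the logs above can be compared directly. The following is a small sketch in plain Java with no Quartz dependency; the class and method names are made up for illustration, and the timestamps are copied from the log lines above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper (hypothetical names): measures intervals between log timestamps.
class LogTimeline {
    // Matches the timestamp format of the log lines, e.g. "2012-03-05 11:27:50,083".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static LocalDateTime parse(String timestamp) {
        return LocalDateTime.parse(timestamp, FMT);
    }

    // Milliseconds from one timestamp to a later one.
    static long millisBetween(LocalDateTime from, LocalDateTime to) {
        return Duration.between(from, to).toMillis();
    }

    // Milliseconds during which [startA, endA] and [startB, endB] overlap (0 if they do not).
    static long overlapMillis(LocalDateTime startA, LocalDateTime endA,
                              LocalDateTime startB, LocalDateTime endB) {
        LocalDateTime from = startA.isAfter(startB) ? startA : startB;
        LocalDateTime to = endA.isBefore(endB) ? endA : endB;
        return Math.max(0L, millisBetween(from, to));
    }

    public static void main(String[] args) {
        // Node A, Worker-8: the execution that was running when shutdown was called.
        LocalDateTime aStart = parse("2012-03-05 11:27:50,083");
        LocalDateTime aEnd = parse("2012-03-05 11:29:20,079");
        // Node B, Worker-10: the first execution started after the failure was detected.
        LocalDateTime bStart = parse("2012-03-05 11:28:20,022");
        LocalDateTime bEnd = parse("2012-03-05 11:29:50,020");

        // Time both executions were running at once.
        System.out.println(overlapMillis(aStart, aEnd, bStart, bEnd)); // prints 60057

        // Time from Node A's "shutting down" line to Node B's "detected 1 failed" line.
        LocalDateTime shutdown = parse("2012-03-05 11:27:59,293");
        LocalDateTime detected = parse("2012-03-05 11:28:14,829");
        System.out.println(millisBetween(shutdown, detected)); // prints 15536
    }
}
```

So the two nodes ran the job at the same time for about 60 seconds, and Node B flagged Node A as failed about 15.5 seconds after Node A logged "shutting down". Applying the same calculation to Node B's own Worker-10 and Worker-6 lines also gives a non-zero overlap (about 22 seconds).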
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline

Which shutdown() method are you calling on Node A? (Are you passing "true"?) The wording of your message is just ambiguous enough that I want to be sure.
zutai

neo

Joined: 03/05/2012 07:36:35
Messages: 2
Offline

To be honest, I've tried both; it makes no difference.
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Offline



The problem appears to be that the shutdown sequence stops the job store (which stops the ClusterManager from doing check-ins) before the jobs have completed, so the other node thinks this node has failed.

However, in the case of shutdown(true) we do not shut down the job store until after the jobs are complete, so I cannot understand how this would happen.
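
The failure-detection idea described above can be pictured with a simplified heartbeat model. This is only a sketch of the general technique in plain Java, not Quartz's actual ClusterManager code; the class and method names are hypothetical, and the 7.5-second check-in interval and grace period are assumed values (the check-in interval used in the original setup is not stated in this thread).

```java
// Simplified heartbeat model (hypothetical names, not Quartz code).
class CheckinModel {
    // A peer is treated as failed once its last check-in is older than
    // one check-in interval plus a grace period.
    static boolean looksFailed(long lastCheckinMillis, long checkinIntervalMillis,
                               long graceMillis, long nowMillis) {
        return lastCheckinMillis + checkinIntervalMillis + graceMillis < nowMillis;
    }

    public static void main(String[] args) {
        long interval = 7_500L; // assumed check-in interval
        long grace = 7_500L;    // assumed grace period
        long lastCheckin = 0L;  // the node's final check-in, taken as t = 0

        // 10 s after the final check-in the peer is still considered alive.
        System.out.println(looksFailed(lastCheckin, interval, grace, 10_000L)); // prints false
        // 16 s after the final check-in the peer is considered failed,
        // even if one of its jobs is still running.
        System.out.println(looksFailed(lastCheckin, interval, grace, 16_000L)); // prints true
    }
}
```

Under these assumed values a node that stops checking in would be flagged roughly 15 seconds later, which is in the same range as the gap between Node A's "shutting down" line (11:27:59) and Node B's "detected 1 failed or restarted instances" line (11:28:14) in the logs, although that match may be coincidental. The point of the model is that the surviving node only sees missing check-ins; it has no way to tell that a job on the silent node is still executing.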
 