| Author |
Message |
12/21/2009 10:30:11
|
jos
neo
Joined: 12/21/2009 10:22:10
Messages: 5
Offline
|
I am running an application in Tomcat 6 and, to schedule operations, I use Quartz 1.6.0.
All triggers fire correctly according to the schedule defined for each one. Recently, I added a new job whose trigger suddenly changes to the ERROR state after a few executions.
Example row in the QRTZ_TRIGGERS table (after the state changed):
Code:
TRIGGER_NAME   = app8
TRIGGER_GROUP  = batch
JOB_NAME       = app8
JOB_GROUP      = batch
IS_VOLATILE    = 0
DESCRIPTION    = NULL
NEXT_FIRE_TIME = 1254877500000
PREV_FIRE_TIME = 1254875068404
PRIORITY       = 5
TRIGGER_STATE  = ERROR
TRIGGER_TYPE   = CRON
START_TIME     = 0
END_TIME       = 0
CALENDAR_NAME  = NULL
MISFIRE_INSTR  = 0
JOB_DATA       = NULL
This issue only occurs in environments with multiple application servers.
I have already added a Quartz logger to the log4j configuration and set its level to DEBUG, but the logs do not make the cause of this issue clear. I have also created a job listener class (implements JobListener) and a scheduler listener class (implements org.quartz.SchedulerListener). With these listeners I get more information about each execution, but I still don't understand why this trigger suddenly changes to the ERROR state.
Below, I include logs of an execution just before the state changed to error.
Code:
2009-12-04 11:05:00,028 INFO Worker-3 jobToBeExecuted JobExecutionContext: trigger: 'batch.app8 job: batch.app8 fireTime: 'Fri Dec 04 11:05:00 GMT 2009 scheduledFireTime: Fri Dec 04 11:05:00 GMT 2009 previousFireTime: 'Fri Dec 04 10:05:00 GMT 2009 nextFireTime: Fri Dec 04 12:05:00 GMT 2009 isRecovering: false refireCount: 0
2009-12-04 11:05:00,028 INFO Worker-3 Starting job8
...
2009-12-04 11:05:01,339 INFO Worker-3 Finish job8
2009-12-04 11:05:01,340 INFO Worker-3 jobWasExecuted JobExecutionContext: trigger: 'batch.app8 job: batch.app8 fireTime: 'Fri Dec 04 11:05:00 GMT 2009 scheduledFireTime: Fri Dec 04 11:05:00 GMT 2009 previousFireTime: 'Fri Dec 04 10:05:00 GMT 2009 nextFireTime: Fri Dec 04 12:05:00 GMT 2009 isRecovering: false refireCount: 0
The lines “Starting job8” and “Finish job8” are written by my job class whereas the line “jobWasExecuted” is written by the Job listener class that I added recently.
One thing I don't understand: according to the logs, this job was scheduled to run next on Fri Dec 04 12:05:00 GMT 2009. What could prevent it from running correctly at that time? And why does this not happen with other jobs? On the other hand, why is this trigger sometimes fired and executed correctly and, at other times, changed to the ERROR state?
Can anyone help?
Thanks
|
|
|
 |
12/21/2009 23:13:41
|
jhouse
seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Online
|
The most likely cause (almost always) of a trigger going to ERROR state is that the job class cannot be loaded or newInstance() fails (no public no-arg constructor).
Perhaps the job class is not on the classpath on all nodes: missing on some (where it fails) but present on others (where it succeeds).
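One way to test this theory on each node is a small startup check that tries to load the job class by name and looks for a public no-arg constructor. The following is only a sketch using plain JDK reflection: the class name `JobClassCheck` and the method `problemWith` are made up for illustration and are not part of Quartz, and it assumes the check runs under the same class loader the application uses for its jobs.

```java
import java.lang.reflect.Modifier;

/** Hypothetical startup check (not a Quartz class): can this node instantiate the job class? */
final class JobClassCheck {

    /** Returns null if the class looks instantiable here, otherwise a short description of the problem. */
    static String problemWith(String className) {
        final Class<?> cls;
        try {
            // Load without initializing; we only want to know whether the class is visible on this node.
            cls = Class.forName(className, false, JobClassCheck.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return "cannot load " + className + ": " + e;
        }
        int mods = cls.getModifiers();
        if (!Modifier.isPublic(mods)) {
            return className + " is not public";
        }
        if (cls.isInterface() || Modifier.isAbstract(mods)) {
            return className + " is abstract or an interface";
        }
        try {
            cls.getConstructor(); // getConstructor() only finds public constructors
        } catch (NoSuchMethodException e) {
            return className + " has no public no-arg constructor";
        }
        return null;
    }
}
```

Logging the result of `problemWith(...)` together with the host name at startup on every node should show quickly whether one node differs from the others.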
|
|
|
 |
12/22/2009 03:57:26
|
jos
neo
Joined: 12/21/2009 10:22:10
Messages: 5
Offline
|
But that is not the case: when I restart the server, this job runs for the first few hours and then the state changes to ERROR. The classpath does not change while the application is running...
Thoughts?
Thanks in advance
|
|
|
 |
12/22/2009 07:17:25
|
jhouse
seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Online
|
I'm speculating that where it runs first is on a node that does have the job class properly in its path.
|
|
|
 |
12/22/2009 11:05:59
|
jos
neo
Joined: 12/21/2009 10:22:10
Messages: 5
Offline
|
But if that happened (the classpath being wrong on some nodes), the loading exception would be logged by the Quartz logger, wouldn't it? I have searched the Quartz log files on each node and did not find any exception or error...
How can I know which node executed the job and changed the state to error?
Thanks
|
|
|
 |
12/24/2009 03:31:55
|
jos
neo
Joined: 12/21/2009 10:22:10
Messages: 5
Offline
|
Just another question: is there any way for my trigger to recover from the ERROR state?
Even if I cannot figure out why the trigger changes to ERROR, it would be great to know a workaround to make the trigger fire again. Currently, once the state changes to ERROR, the trigger does not fire again until I manually update the Quartz database.
Thanks
|
|
|
 |
01/04/2010 03:19:42
|
jos
neo
Joined: 12/21/2009 10:22:10
Messages: 5
Offline
|
Can anybody tell me whether there is any way to recover a trigger from the ERROR state (regardless of the root cause of the state change)?
Or can anybody tell me how I can log every trigger change to the ERROR state?
Thanks
Jos
|
|
|
 |
11/04/2010 02:58:46
|
pzaharie
neo
Joined: 11/04/2010 02:56:44
Messages: 2
Offline
|
Hi Jos, did you ever discover how to recover from the ERROR state?
Thanks
Pavel
|
|
|
 |
11/04/2010 06:51:30
|
jhouse
seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654
Online
|
Recovering from error state takes manual intervention:
* Correcting the problem with the job class (either getting it into the classpath, or adding a public no-arg constructor, or making the class 'public', etc.)
* Executing the SQL: update qrtz_triggers set trigger_state = 'WAITING' where trigger_state = 'ERROR'
We currently require manual intervention because, if Quartz retried automatically, it would create a rapid spin loop whenever the trigger went back into ERROR state (that is, whenever the underlying problem had not been resolved).
All instances of the trigger going to ERROR state should be logged by Quartz with a very clear message (at ERROR level).
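The update above resets every trigger that is in ERROR state. To reset only one trigger, the same statement can be narrowed by the TRIGGER_NAME and TRIGGER_GROUP columns shown in the table row earlier in this thread. Below is a minimal JDBC sketch of that idea. The class `ErrorTriggerReset` and its `reset` method are hypothetical names, not Quartz API. It assumes the default `qrtz_` table prefix and the Quartz 1.x schema; other versions or prefixes may need extra key columns or a different table name.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper (not part of Quartz): resets a single trigger from ERROR to WAITING. */
final class ErrorTriggerReset {

    // Same update as above, limited to one trigger. Assumes the default qrtz_ table prefix.
    static final String RESET_SQL =
            "UPDATE qrtz_triggers SET trigger_state = 'WAITING' "
          + "WHERE trigger_state = 'ERROR' AND trigger_name = ? AND trigger_group = ?";

    /** Returns the number of rows updated: 0 if the trigger was not in ERROR state. */
    static int reset(Connection conn, String triggerName, String triggerGroup) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(RESET_SQL)) {
            ps.setString(1, triggerName);
            ps.setString(2, triggerGroup);
            return ps.executeUpdate();
        }
    }
}
```

For the row posted at the top of the thread, the call would be `ErrorTriggerReset.reset(conn, "app8", "batch")`. As with the manual update, it only makes sense to run it after the underlying problem has been fixed.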
|
|
|
 |
11/04/2010 08:53:31
|
pzaharie
neo
Joined: 11/04/2010 02:56:44
Messages: 2
Offline
|
Hi,
In my particular case the problem is an OutOfMemoryError that happens on the server from time to time. Sometimes, when it happens during job instantiation, the trigger goes into the ERROR state. (By the way, Quartz catches Throwable here and wraps the error in an exception, effectively bypassing the JVM OutOfMemoryError handlers.) After the next restart everything is fine with the job, but the trigger won't fire again. I discovered that calling "resume" may push it to fire again, but not every time, and from reading the docs I guess this is not intended behavior. The ideal solution for us would be to recover within some scheduler startup listener, or similar, using the Quartz API, but that seems impossible right now unless we use direct JDBC, right?
Thanks
Pavel
|
|
|
 |
02/24/2012 02:02:04
|
archenro
neo
Joined: 01/27/2012 05:53:37
Messages: 8
Offline
|
Thanks for:
sql update qrtz_triggers set trigger_state = 'WAITING' where trigger_state = 'ERROR'
|
|
|
 |
02/24/2012 02:02:40
|
archenro
neo
Joined: 01/27/2012 05:53:37
Messages: 8
Offline
|
I also used the API: on error, I unschedule the trigger and schedule it again.
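A rough sketch of that control flow, kept independent of any Quartz version, might look like the following. `SchedulerOps` is a hypothetical facade invented for this example, not a Quartz type. In Quartz 1.6 its three methods would, to my understanding, be backed by `Scheduler.getTriggerState(name, group)`, `Scheduler.unscheduleJob(name, group)` and `Scheduler.scheduleJob(trigger)`, with the adapter handling `SchedulerException`. The check is assumed to run from outside the trigger's own firing path, for example at startup or from a periodic task.

```java
/** Hypothetical sketch (not Quartz API): re-create a trigger that is stuck in ERROR state. */
final class ErrorTriggerRescheduler {

    /** Made-up facade over the scheduler; an adapter would delegate to the real Scheduler. */
    interface SchedulerOps {
        boolean isInErrorState(String triggerName, String triggerGroup);
        void unschedule(String triggerName, String triggerGroup);
        void schedule(String triggerName, String triggerGroup);
    }

    /** Returns true if the trigger was in ERROR state and has been unscheduled and scheduled again. */
    static boolean rescheduleIfError(SchedulerOps ops, String triggerName, String triggerGroup) {
        if (!ops.isInErrorState(triggerName, triggerGroup)) {
            return false; // nothing to do for healthy triggers
        }
        ops.unschedule(triggerName, triggerGroup); // drop the trigger that is stuck in ERROR
        ops.schedule(triggerName, triggerGroup);   // store it again so it can fire
        return true;
    }
}
```

One caveat, as far as I know: if the job is not durable, unscheduling its only trigger may also remove the job itself, so the adapter may need to re-add the job or use a replace-style call such as `rescheduleJob` instead.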
|
|
|
 |
05/04/2012 10:21:57
|
first123
neo
Joined: 05/04/2012 10:18:24
Messages: 1
Offline
|
Hi Archenro,
We are facing the same issue. Could you please explain how you used the API to unschedule and reschedule the trigger? I tried to implement this in a TriggerListener, but once the TRIGGER_STATE reaches ERROR, the TriggerListener is no longer called, so my code never runs. It would be very helpful if you could let me know how you implemented your workaround.
Thanks a lot in advance!
|
|
|
 |
|
|