Terracotta Discussion Forums
Trigger changed suddenly to error state
Forum Index -> Quartz
jos

neo

Joined: 12/21/2009 10:22:10
Messages: 5

I am running an application in Tomcat 6 and, to schedule operations, I use Quartz 1.6.0.

All triggers fire correctly according to the schedule defined for each one. Recently, I added a new job whose trigger suddenly changes to ERROR state after a few executions.

Example row from the QRTZ_TRIGGERS table (after the state changed):

Code:
TRIGGER_NAME    app8
TRIGGER_GROUP   batch
JOB_NAME        app8
JOB_GROUP       batch
IS_VOLATILE     0
DESCRIPTION     NULL
NEXT_FIRE_TIME  1254877500000
PREV_FIRE_TIME  1254875068404
PRIORITY        5
TRIGGER_STATE   ERROR
TRIGGER_TYPE    CRON
START_TIME      0
END_TIME        0
CALENDAR_NAME   NULL
MISFIRE_INSTR   0
JOB_DATA        NULL
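
To read the fire times in the row above: Quartz stores NEXT_FIRE_TIME and PREV_FIRE_TIME as milliseconds since the Unix epoch. Below is a minimal sketch for decoding them in plain Java. The class and method names are hypothetical (not part of Quartz), and it uses java.time, so it assumes Java 8 or later rather than the JVM this Tomcat 6 setup runs on.

```java
import java.time.Instant;

// Hypothetical helper, not part of Quartz: decodes the epoch-millisecond
// values found in the NEXT_FIRE_TIME / PREV_FIRE_TIME columns.
final class FireTimeDecoder {

    // Converts a millisecond timestamp to an ISO-8601 string in UTC.
    static String toUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        // Values copied from the example row.
        System.out.println("NEXT_FIRE_TIME = " + toUtc(1254877500000L));
        System.out.println("PREV_FIRE_TIME = " + toUtc(1254875068404L));
    }
}
```

Run as written, this prints 2009-10-07T01:05:00Z for NEXT_FIRE_TIME and 2009-10-07T00:24:28.404Z for PREV_FIRE_TIME.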
 


This issue only occurs in environments with multiple application servers.

I have already added a Quartz logger to the log4j configuration and set its level to DEBUG, but the logs do not make the cause of this issue clear. I have also created a job listener class (implements JobListener) and a scheduler listener class (implements org.quartz.SchedulerListener). These listeners give me more information about each execution, but I still don't understand why this trigger suddenly changes to ERROR state.

Below, I include logs of an execution just before the state changed to error.

Code:
2009-12-04 11:05:00,028 INFO Worker-3 jobToBeExecuted JobExecutionContext: trigger: 'batch.app8 job: batch.app8 fireTime: 'Fri Dec 04 11:05:00 GMT 2009 scheduledFireTime: Fri Dec 04 11:05:00 GMT 2009 previousFireTime: 'Fri Dec 04 10:05:00 GMT 2009 nextFireTime: Fri Dec 04 12:05:00 GMT 2009 isRecovering: false refireCount: 0
 2009-12-04 11:05:00,028 INFO Worker-3 Starting job8
 ...
 2009-12-04 11:05:01,339 INFO Worker-3 Finish job8
 2009-12-04 11:05:01,340 INFO Worker-3 jobWasExecuted JobExecutionContext: trigger: 'batch.app8 job: batch.app8 fireTime: 'Fri Dec 04 11:05:00 GMT 2009 scheduledFireTime: Fri Dec 04 11:05:00 GMT 2009 previousFireTime: 'Fri Dec 04 10:05:00 GMT 2009 nextFireTime: Fri Dec 04 12:05:00 GMT 2009 isRecovering: false refireCount: 0


The lines “Starting job8” and “Finish job8” are written by my job class whereas the line “jobWasExecuted” is written by the Job listener class that I added recently.

One thing I don't understand: according to the logs, this job was scheduled to run next at Fri Dec 04 12:05:00 GMT 2009. What could prevent it from running correctly at that time? Why does this not happen with other jobs? And why is this trigger fired and executed correctly some of the time, while at other times it changes to ERROR state?

Can anyone help?

Thanks
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654


The most likely (almost always) cause of a trigger going to ERROR state is that the job class cannot be loaded, or that newInstance() fails (no public no-arg constructor).

Perhaps the job class is missing from the classpath on some nodes (where it fails) but present on others (where it succeeds).
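
A rough way to test this on each node is a small reflection check, sketched below in plain Java (Java 7 or later is assumed for the multi-catch). The class and method names are hypothetical and not part of Quartz. The "public" and "not abstract" checks are added assumptions about what newInstance() needs beyond a public no-arg constructor.

```java
import java.lang.reflect.Modifier;

// Hypothetical diagnostic, not part of Quartz: checks whether a class can be
// loaded and would be instantiable through a public no-arg constructor.
final class JobClassCheck {

    // Returns null if the class looks instantiable, otherwise a short reason.
    static String problemWith(String className) {
        final Class<?> cls;
        try {
            // Load the class without running its static initializers.
            cls = Class.forName(className, false, JobClassCheck.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return "cannot be loaded: " + e;
        }
        if (!Modifier.isPublic(cls.getModifiers())) {
            return "class is not public";
        }
        if (cls.isInterface() || Modifier.isAbstract(cls.getModifiers())) {
            return "class is abstract or an interface";
        }
        try {
            cls.getConstructor(); // only finds public constructors
        } catch (NoSuchMethodException e) {
            return "no public no-arg constructor";
        }
        return null;
    }

    public static void main(String[] args) {
        // Pass fully qualified job class names as arguments.
        for (String name : args) {
            String problem = problemWith(name);
            System.out.println(name + ": " + (problem == null ? "OK" : problem));
        }
    }
}
```

The result is only meaningful if the check runs with the same classpath (and class loader) as the web application on that node, for example when called from inside the application itself.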
jos

neo

Joined: 12/21/2009 10:22:10
Messages: 5

But that cannot be the case here: when I restart the server, this job runs correctly for the first few hours, and only then does the state change to ERROR. The classpath does not change while the application is running...

Thoughts?


Thanks in advance
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654


I'm speculating that the node where it runs first is one that does have the job class properly on its classpath.
jos

neo

Joined: 12/21/2009 10:22:10
Messages: 5

But if that were the case (the classpath being wrong on some nodes), the loading exception would be logged by the Quartz logger, wouldn't it? I have searched the Quartz log files on each node and did not find any exception or error...


How can I know which node executed the job and changed the state to error?


Thanks
jos

neo

Joined: 12/21/2009 10:22:10
Messages: 5

Just another question: is there any way for my trigger to recover from the ERROR state?

Even if I cannot figure out why the trigger changes to ERROR, it would be great to know a workaround that makes the trigger fire again. Currently, once the state changes to ERROR, the trigger does not fire again until I manually update the Quartz database.

Thanks
jos

neo

Joined: 12/21/2009 10:22:10
Messages: 5

Can anybody tell me if there is any way to recover a trigger from the ERROR state (regardless of the root cause of the state change)?

Or can anybody tell me how I can log every trigger change to ERROR state?


Thanks
Jos

pzaharie

neo

Joined: 11/04/2010 02:56:44
Messages: 2

Hi Jos, did you ever discover how to recover from error state?

Thanks
Pavel
jhouse

seraphim
Joined: 11/06/2009 15:29:56
Messages: 1654

Recovering from error state takes manual intervention:

* Correcting the problem with the job class (either getting it into the classpath, or adding a public no-arg constructor, or making the class 'public', etc.)

* Executing the SQL: update qrtz_triggers set trigger_state = 'WAITING' where trigger_state = 'ERROR'


We currently require manual intervention because, if Quartz retried automatically, it would create a rapid spin loop whenever the trigger went back into ERROR state (that is, whenever the underlying problem had not been resolved).

All instances of a trigger going to ERROR state should be logged by Quartz with a very clear message (at ERROR level).
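
If the blanket update is too broad, the same fix can be applied to a single trigger over plain JDBC. The sketch below is only an illustration: the class and method names are hypothetical (not part of Quartz), and it assumes the default QRTZ_ table prefix and the column names shown in the QRTZ_TRIGGERS row earlier in this thread.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper, not part of Quartz: applies the manual SQL fix to one
// trigger instead of to every trigger that is in ERROR state.
final class ErrorTriggerReset {

    // Assumes the default QRTZ_ prefix; adjust if the job store uses another.
    static final String RESET_SQL =
        "UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = 'WAITING' "
      + "WHERE TRIGGER_NAME = ? AND TRIGGER_GROUP = ? AND TRIGGER_STATE = 'ERROR'";

    // Returns the number of rows updated (0 if the trigger was not in ERROR).
    static int reset(Connection conn, String triggerName, String triggerGroup)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(RESET_SQL)) {
            ps.setString(1, triggerName);
            ps.setString(2, triggerGroup);
            return ps.executeUpdate();
        }
    }
}
```

For the example row this would be called as ErrorTriggerReset.reset(conn, "app8", "batch"), followed by a commit if the connection is not in auto-commit mode. As with the plain SQL, it only makes sense after the underlying problem has been corrected.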
pzaharie

neo

Joined: 11/04/2010 02:56:44
Messages: 2

Hi,

In my particular case the problem is an OutOfMemoryError that occurs on the server from time to time. Sometimes, when it happens during job instantiation, the trigger goes into ERROR state. (By the way, Quartz catches Throwable here and wraps the error in an exception, effectively bypassing the JVM's OutOfMemoryError handlers.) After the next restart everything is fine with the job, but the trigger won't fire again. I discovered that calling "resume" may push it to fire again, but not every time, and from reading the docs I guess this is not intentional behavior. The ideal solution for us would be to recover from within a scheduler startup listener or similar, using the Quartz API, but that seems impossible right now unless we use direct JDBC, right?

Thanks
Pavel
archenro

neo

Joined: 01/27/2012 05:53:37
Messages: 8

Thanks for:
sql update qrtz_triggers set trigger_state = 'WAITING' where trigger_state = 'ERROR'
archenro

neo

Joined: 01/27/2012 05:53:37
Messages: 8

I also used the API: on error, I unschedule the trigger and schedule it again.
first123

neo

Joined: 05/04/2012 10:18:24
Messages: 1

Hi Archenro,

We are facing the same issue. Could you please explain how you used the API to unschedule and reschedule the trigger? I tried to implement this in a TriggerListener, but once TRIGGER_STATE reaches ERROR, the TriggerListener is no longer called, so my code never runs. It would be very helpful if you could describe how you implemented your workaround.

Thanks a lot in advance!

 