[Logo] Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
falling behind in job scheduling
Forum Index -> Quartz
dbcsumo

neo

Joined: 08/20/2014 16:35:48
Messages: 4
Offline

I'm trying to figure out how best to deal with a problem where the Quartz scheduler seems to be falling behind. When the system is in a steady state, everything keeps up, but once I get past around 40,000 job runs an hour (e.g. because some jobs are temporarily slow to execute, so other jobs have to wait), I seem to enter a death spiral where jobs fall further and further behind.

I have plenty of threads available in my thread pool (outside of the periods when jobs are temporarily slow to execute), and the CPU on the box doesn't seem to be maxed out, so I suspect the bottleneck is in the scheduler itself. The job store is SQL-backed (Amazon RDS); I'll include the code for creating the scheduler below.

Unfortunately, I'm not sure which parameters to bump up to get the scheduler to behave better. Quite possibly I just need a bigger database instance, though I've already bumped that up once. I also noticed the batch size used when acquiring triggers (JobStoreSupport.acquireNextTriggers()); I have it at the default of 1. Should I increase it? (I'm not sure how much that would help, since the code loops over each acquired trigger and talks to the database inside that loop, but I don't understand the code very well.) Or there could easily be something else I'm missing.
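For reference, the batch size is a scheduler-level setting rather than a job-store one. A minimal sketch, assuming the scheduler were configured through StdSchedulerFactory properties instead of DirectSchedulerFactory; the object name `BatchAcquisitionConfig` is hypothetical, and any concrete values passed in are illustrations, not recommendations. DirectSchedulerFactory also has longer createScheduler overloads that take a maximum batch size and a batch time window; check the 2.1.7 Javadoc for the exact parameter order.

```scala
import java.util.Properties

object BatchAcquisitionConfig {
  // Builds the properties that control how many triggers the scheduler
  // thread acquires per call to JobStoreSupport.acquireNextTriggers().
  // Hypothetical helper; the values passed in are the caller's assumptions.
  def batchProperties(maxCount: Int, fireAheadWindowMs: Long): Properties = {
    val props = new Properties()
    // Maximum number of triggers acquired in one batch (Quartz default: 1).
    props.setProperty(
      "org.quartz.scheduler.batchTriggerAcquisitionMaxCount", maxCount.toString)
    // How far ahead of their fire time (in ms) triggers may be acquired (default: 0).
    props.setProperty(
      "org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow", fireAheadWindowMs.toString)
    // The Quartz configuration docs say JDBC job stores must acquire triggers
    // inside the DB lock when batching; set it explicitly only when batching.
    if (maxCount > 1)
      props.setProperty("org.quartz.jobStore.acquireTriggersWithinLock", "true")
    props
  }
}
```

These keys would be merged with the usual thread-pool and job-store properties before passing them to `new StdSchedulerFactory(props)`.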

From thread dumps, the scheduler thread is almost always in JobStoreSupport.acquireNextTriggers() or in JobStoreSupport.triggersFired(), with the former somewhat more common than the latter.

I'm using Quartz 2.1.7. (I looked at the Quartz 2.2.1 code; it wasn't obvious that there were changes that would be relevant to this in 2.2.1, but I could easily be missing something.)

Any suggestions would be greatly appreciated. Here's the code I use to configure the scheduler:

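Code:
   import org.quartz.Scheduler
   import org.quartz.impl.DirectSchedulerFactory
   import org.quartz.simpl.{RAMJobStore, SimpleThreadPool}
   import org.quartz.spi.JobStore

   // Creates and registers a scheduler backed by the given job store;
   // the in-memory RAMJobStore is used unless another store is passed in.
   def createScheduler(name: String, threadCount: Int, jobStore: JobStore = new RAMJobStore): Scheduler = {
     val tp = new SimpleThreadPool(threadCount, Thread.NORM_PRIORITY)
     tp.setThreadNamePrefix(name)

     // Registers the new scheduler in the factory's repository under `name`.
     DirectSchedulerFactory.getInstance().createScheduler(
       name,
       "%s-instance id".format(name),
       tp,
       jobStore)

     // Looks the newly registered scheduler back up by name.
     DirectSchedulerFactory.getInstance().getScheduler(name)
   }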
 


where the jobStore comes from:

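Code:
   import org.quartz.impl.jdbcjobstore.{JobStoreSupport, JobStoreTX}
   import org.quartz.utils.{DBConnectionManager, PoolingConnectionProvider}

   // Builds a JDBC job store whose connections come from a pool registered
   // with Quartz's DBConnectionManager under `dsName`.
   // rdsConnectionProperties (not shown) builds the pool's java.util.Properties.
   def createRdsJobStore(dsName: String, dbUrl: String, dbUser: String, dbPassword: String): JobStoreSupport = {
     val jobStore = new JobStoreTX
     jobStore.setDriverDelegateClass("org.quartz.impl.jdbcjobstore.StdJDBCDelegate")
     jobStore.setDataSource(dsName)

     // Register the pooled connection provider that the job store looks up by name.
     val connectionManager = DBConnectionManager.getInstance
     val provider = new PoolingConnectionProvider(rdsConnectionProperties(dbUrl, dbUser, dbPassword))
     connectionManager.addConnectionProvider(dsName, provider)
     jobStore
   }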
dbcsumo

I did some network measurements: the individual SQL statements that acquireNextTriggers() issues take about 10 ms each. acquireNextTriggers() issues one batch query plus three queries per trigger returned, so with a batch size of 1 that's 40 ms per trigger, and with an unbounded batch size it approaches 30 ms per trigger. (These are rough numbers; the 10 ms figure isn't very precise.)

So, if I'm reading the code correctly, SQL-backed Quartz with my database setup maxes out at roughly 120,000 triggered jobs an hour (3,600,000 ms per hour / 30 ms per trigger), assuming everything other than the SQL statements is free. That's more than I'm seeing, but not by a huge factor.
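The arithmetic above can be written as a small cost model. This is a sketch under the post's own assumptions (about 10 ms per SQL round trip; one batch query plus three queries per acquired trigger; nothing else costs time); the object and method names are hypothetical.

```scala
object AcquireCostModel {
  // Assumed cost of one SQL round trip, from the rough measurement above.
  val msPerQuery = 10.0

  // acquireNextTriggers(): one batch query plus three queries per trigger
  // returned, amortized over a batch of `batchSize` triggers.
  def acquireMsPerTrigger(batchSize: Int): Double =
    msPerQuery * (1 + 3.0 * batchSize) / batchSize

  // Upper bound on triggers per hour if acquisition were the only cost.
  def triggersPerHour(msPerTrigger: Double): Double =
    3600.0 * 1000.0 / msPerTrigger
}
```

With a batch size of 1 the model gives 40 ms per trigger, a ceiling of 90,000 triggers an hour; only as the batch size grows does it approach the 30 ms / 120,000 figure.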
dbcsumo

And then triggersFired() issues another 4 or 5 SQL statements for each trigger, if I'm reading it right? My timing experiments also didn't account for transactions: in my thread dumps, about half the threads are executing statements within the transaction and half are committing it.

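So, putting all of that together, it's no surprise that I'm getting about 40,000 triggers an hour: at roughly 10 ms per statement, 4 acquisition queries (batch size 1) plus 4 or 5 triggersFired() queries come to 80 to 90 ms per trigger, before counting commit time. I guess that's just inherent to the current design of JobStoreSupport?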
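Extending the same rough model to include triggersFired() gives a sketch of the combined ceiling. It assumes, as the thread dumps suggest, that acquisition and firing run serially on the single scheduler thread, and it deliberately ignores commit time, so it is an optimistic estimate; the names are hypothetical.

```scala
object CombinedCostModel {
  // Assumed cost of one SQL round trip, from the earlier rough measurement.
  val msPerQuery = 10.0

  // Per-trigger SQL cost: acquisition (one batch query plus three per trigger,
  // amortized over batchSize) plus the queries issued by triggersFired().
  // Commit time is NOT included.
  def msPerTrigger(batchSize: Int, firedQueries: Int): Double =
    msPerQuery * ((1 + 3.0 * batchSize) / batchSize + firedQueries)

  // Upper bound on triggers per hour for one serial scheduler thread.
  def triggersPerHour(batchSize: Int, firedQueries: Int): Double =
    3600.0 * 1000.0 / msPerTrigger(batchSize, firedQueries)
}
```

With a batch size of 1, the model gives 45,000 triggers an hour for 4 triggersFired() queries and 40,000 for 5, which lines up with the throughput observed before adding any commit overhead.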
dbcsumo

I filed https://jira.terracotta.org/jira/browse/QTZ-461 for this.