Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Messages posted by: bradleyw
Profile for bradleyw -> Messages posted by bradleyw [53]
Originally, we had EhCache 2.5.1 libs deployed to Tomcat/lib and our app deployed to Tomcat/webapps/ROOT.

We made a change as I described in that post to switch some Sets/Maps stored in our cache values to ConcurrentHashMaps and ConcurrentHashMap-backed Sets.
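Concretely, the change was along these lines (field name is illustrative, not our exact code):

Code:
 // before: a plain HashSet field inside the cached value object
 //     private Set<String> includedAssetIds = new HashSet<String>();

 // after: a ConcurrentHashMap-backed set (java.util.Collections / java.util.concurrent)
 private final Set<String> includedAssetIds =
         Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());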

After that change, we started seeing the following error on startup:

Code:
 May 09, 2014 4:56:35 PM org.apache.catalina.core.StandardWrapperValve invoke
 SEVERE: Servlet.service() for servlet action threw exception
 java.lang.ClassNotFoundException: javax.transaction.TransactionManager
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.getDeclaredFields0(Native Method)
   at java.lang.Class.privateGetDeclaredFields(Class.java:2397)
   at java.lang.Class.getDeclaredFields(Class.java:1806)
   at net.sf.ehcache.pool.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:266)
   at net.sf.ehcache.pool.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:229)
   at net.sf.ehcache.pool.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:160)
   at net.sf.ehcache.pool.sizeof.SizeOf.deepSizeOf(SizeOf.java:73)
   at net.sf.ehcache.pool.impl.DefaultSizeOfEngine.sizeOf(DefaultSizeOfEngine.java:173)
   at net.sf.ehcache.pool.impl.AbstractPoolAccessor.add(AbstractPoolAccessor.java:63)
   at net.sf.ehcache.store.MemoryStore.put(MemoryStore.java:258)
   at net.sf.ehcache.store.MemoryStore.fill(MemoryStore.java:237)
   at net.sf.ehcache.store.FrontEndCacheTier.put(FrontEndCacheTier.java:259)
   at net.sf.ehcache.Cache.putInternal(Cache.java:1489)
   at net.sf.ehcache.Cache.put(Cache.java:1417)
   at net.sf.ehcache.Cache.put(Cache.java:1382)
   at com.hannonhill.cascade.cache.DefaultEhcacheIndexBlockRenderCache.store(DefaultEhcacheIndexBlockRenderCache.java:695)
 


It would happen on the first request and would go away after that. This was odd because:

  • Our app supplies jta.jar, which contains javax.transaction.TransactionManager, in our app's WEB-INF/lib directory. Plus, the error goes away on the next request, which suggests some kind of classloading race condition (see the quick classloader check sketched below this list).
  • None of our cache data structures appear to reference the TransactionManager. We do make cache calls from within a Spring-annotated transaction, but I'm not sure why EhCache would suddenly need the TransactionManager while computing the size of our cache entries.
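To sanity-check which classloaders can actually see the class, a quick diagnostic along these lines can be dropped into a scratch servlet or JSP (sketch only; the idea is to compare the webapp's context classloader with whichever loader loaded the EhCache classes, since those differ when EhCache sits in Tomcat/lib):

Code:
 // sketch: probe visibility of javax.transaction.TransactionManager from the
 // webapp's context classloader vs. the classloader that loaded EhCache itself
 String cls = "javax.transaction.TransactionManager";
 ClassLoader webappLoader = Thread.currentThread().getContextClassLoader();
 ClassLoader ehcacheLoader = net.sf.ehcache.CacheManager.class.getClassLoader();
 for (ClassLoader loader : new ClassLoader[] { webappLoader, ehcacheLoader })
 {
     try
     {
         Class.forName(cls, false, loader);
         System.out.println(loader + " -> visible");
     }
     catch (ClassNotFoundException e)
     {
         System.out.println(loader + " -> NOT visible");
     }
 }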

    Then, we moved EhCache into our app's libs instead of having it in Tomcat/lib and this error mysteriously went away.

    We're fine with this change, but:

  • Why couldn't EhCache find the TransactionManager before?
  • Is there any problem with deploying EhCache to our webapp's WEB-INF/lib? This article seems to recommend deploying it to Tomcat/common/lib, but I was wondering if that is a legacy recommendation, since Tomcat/common doesn't even exist in Tomcat 6+.

    Thanks!

    We recently made a change to our cache entries so they are now value objects composed of three Sets of Strings and one String.

    Previously, the Sets were simple HashSets. But we converted them to be ConcurrentHashMap-backed sets:

    Code:
     Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
     


    because we were running into issues with multiple threads accessing these sets at once.

    Since that change, we've been seeing the warning:

    2014-05-16 10:59:48,928 WARN [ObjectGraphWalker] : The configured limit of 1,000 object references was reached while attempting to calculate the size of the object graph. Severe performance degradation could occur if the sizing operation continues. This can be avoided by setting the CacheManger or Cache <sizeOfPolicy> elements maxDepthExceededBehavior to "abort" or adding stop points with @IgnoreSizeOf annotations. If performance degradation is NOT an issue at the configured limit, raise the limit value using the CacheManager or Cache <sizeOfPolicy> elements maxDepth attribute. For more information, see the Ehcache configuration documentation.
     


    We're running EhCache 2.8.2 and our cache configuration looks like:
    Code:
     	<cache 
     		name="indexBlockRenderCache" 
     		maxBytesLocalHeap="100m"
     		eternal="true" 
     		overflowToOffHeap="false"
     		overflowToDisk="true" 
     		diskPersistent="false"
     		maxBytesLocalDisk="2G"
     		/>
     


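    If we do end up needing the workaround the warning suggests, my understanding is that it would be a <sizeOfPolicy> element nested inside the cache definition, something like the sketch below (the maxDepth value is just an example, and this only raises or aborts the sizing walk rather than explaining why the graph is deep in the first place):
    Code:
     	<cache 
     		name="indexBlockRenderCache" 
     		maxBytesLocalHeap="100m"
     		eternal="true" 
     		overflowToOffHeap="false"
     		overflowToDisk="true" 
     		diskPersistent="false"
     		maxBytesLocalDisk="2G"
     		>
     		<!-- example only: raise the reference limit, or set maxDepthExceededBehavior="abort" -->
     		<sizeOfPolicy maxDepth="10000" maxDepthExceededBehavior="continue"/>
     	</cache>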
    I've set a breakpoint at this warning and looked at both the cache key and the cache value to try to figure out how we could be hitting it, but I can't figure it out.

    Our key object:
    Code:
     public class IndexBlockRenderCacheKey implements Serializable
     {
         private static final long serialVersionUID = 8527160051856966360L;
         private final String id;
         private final boolean isContent;
     
         // if isContent is true, the following flags do not matter
         private final boolean indexRegularContent;
         private final boolean indexSystemMetadata;
         private final boolean indexUserMetadata;
         private final boolean indexAccessRights;
         private final boolean includeWorkflowInformation;
     ....
     


    Our value object:
    Code:
     public class IndexBlockRenderCacheValue implements Serializable
     {
         private static final long serialVersionUID = 4153994794523856780L;
         private final Set<String> includedIndexBlockIds;
         private final Set<String> includedAssetIds;
         private final Set<String> includedStructuredDataAssetIds;
         private final String jdomElementAsString;
     ...
     


    The Sets of Strings all appear to be empty, so I can't figure out how we'd be anywhere close to the 1,000 object reference limit.

    Any ideas here? Is there an issue with computing the size of ConcurrentHashMap-backed sets?
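    In case it's relevant to the answer: the other escape hatch the warning mentions is an @IgnoreSizeOf stop point. If we went that route, I believe it would look roughly like the sketch below on our value object (trimmed down, and not something we've actually tried):
    Code:
     import java.io.Serializable;
     import java.util.Set;
     
     import net.sf.ehcache.pool.sizeof.annotations.IgnoreSizeOf;
     
     public class IndexBlockRenderCacheValueSketch implements Serializable
     {
         private static final long serialVersionUID = 1L;
     
         // the SizeOf walker stops here and does not descend into the set
         @IgnoreSizeOf
         private final Set<String> includedAssetIds;
     
         public IndexBlockRenderCacheValueSketch(Set<String> includedAssetIds)
         {
             this.includedAssetIds = includedAssetIds;
         }
     }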
    I'm revisiting this discussion to try to better understand why using ReadCommitted or Serializable explicitly would be better than using MySQL's default RepeatableRead.

    Specifically, I'm not clear on why moving toward either more isolation (Serializable) or less isolation (Read Committed) would result in fewer deadlocks.

    Thanks for any info you can provide on this!
    The org.quartz.impl.jdbcjobstore.oracle.OracleDelegate makes mention of a jdbcDriverVendor property that I can't seem to find in the documentation.

    Is setting this property
    Code:
     org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
    sufficient?
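    To be explicit about what I'm asking: the line in our quartz.properties would be something like the following, assuming org.quartz.jobStore.driverDelegateClass is actually the property in question (that part is my assumption; I'm not sure how jdbcDriverVendor relates to it):
    Code:
     org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.oracle.OracleDelegate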
    Forgive me if this has been answered. I checked these forums and the FAQ.

    Does Quartz support Oracle 11g specifically?
    Sorry to bump this thread, but I'm still wondering about this part:


    If it was using Repeatable Read, I'm confused why either going towards more tx isolation (Serializable) or going towards less isolation (ReadCommitted) would both help with tx timeouts.
     


    Any insights?
    I guess I'm a little confused about the purpose of these settings.

    Before we used either org.quartz.jobStore.txIsolationLevelSerializable or org.quartz.jobStore.txIsolationLevelReadCommitted, we were having frequent transaction timeouts. At that time, was Quartz defaulting to the database's default, which for InnoDB tables in MySQL is Repeatable Read, or does Quartz always specify an isolation level?
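    One thing I can check on my side is what the connections Quartz actually gets report for isolation. A diagnostic sketch (dataSource here stands in for whatever backs org.quartz.jobStore.dataSource in our setup, so it's purely illustrative):
    Code:
     import java.sql.Connection;
     
     import javax.sql.DataSource;
     
     // diagnostic sketch: print the JDBC isolation level reported by a pooled connection
     // (2 = READ_COMMITTED, 4 = REPEATABLE_READ, 8 = SERIALIZABLE)
     public class IsolationCheck
     {
         public static void print(DataSource dataSource) throws Exception
         {
             Connection conn = dataSource.getConnection();
             try
             {
                 System.out.println("transaction isolation = " + conn.getTransactionIsolation());
             }
             finally
             {
                 conn.close();
             }
         }
     }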

    If it was using Repeatable Read, I'm confused why either going towards more tx isolation (Serializable) or going towards less isolation (ReadCommitted) would both help with tx timeouts.
    We have now run into a problem where the ReadCommitted isolation level conflicts with systems that use statement-based binary logging and replication in MySQL.

    Given that the default isolation level for InnoDB tables in MySQL is "repeatable read", should we consider moving to something more isolated, like Serializable, instead? Under which circumstances (types of databases, clustering) would you recommend using org.quartz.jobStore.txIsolationLevelSerializable instead of org.quartz.jobStore.txIsolationLevelReadCommitted?
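    For reference, the two settings I'm weighing in quartz.properties (the ReadCommitted one is what we run today; they're alternatives, not meant to be combined):
    Code:
     # current
     org.quartz.jobStore.txIsolationLevelReadCommitted = true
     
     # alternative under consideration
     #org.quartz.jobStore.txIsolationLevelSerializable = true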
    Sorry to repeat my question, but would you recommend this setting in MySQL for both clustered and non-clustered instances?
    Thanks for the info. We are using InnoDB tables which use row-locking by default, as far as I know. There are definitely indexes on the table.

    I just added:
    Code:
     org.quartz.jobStore.txIsolationLevelReadCommitted = true
     

    to our quartz config and this seems to have made a significant difference. Previously I was getting those errors every few minutes without fail. So far, I haven't seen any.

    Is this something you would recommend us doing for all of our supported db vendors or just MySQL? We support MySQL 4.1/5.0, SQL Server 2k5/2k8, and Oracle 10g.

    Should we be using this setting in non-load-balanced environments as well?
    I'm also a member of Mike's organization. 1.8.3 appears to have addressed this issue with a single application server. However, we're seeing the issues when load-balancing our application (multiple Tomcat servers).

    I have verified that our Quartz config contains
    Code:
     org.quartz.jobStore.isClustered = true
     


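    For completeness, the other cluster-related properties I'd expect to sit alongside that one look something like this (values here are illustrative rather than a dump of our exact config):
    Code:
     org.quartz.jobStore.isClustered = true
     # each node needs a unique instance id; AUTO generates one per scheduler instance
     org.quartz.scheduler.instanceId = AUTO
     # how often (in ms) each node checks in with the cluster
     org.quartz.jobStore.clusterCheckinInterval = 20000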
    The stack trace that we're seeing regularly is below. I'm working on getting a thread dump.
    Code:
     2010-09-10 11:21:39,199 ERROR [ErrorLogger] : An error occured while firing trigger 'DEFAULT.Publish Request Check'
     org.quartz.JobPersistenceException: Couldn't update states of blocked triggers: Lock wait timeout exceeded; try restarting transaction [See nested exception: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction]
     	at org.quartz.impl.jdbcjobstore.JobStoreSupport.triggerFired(JobStoreSupport.java:2925)
     	at org.quartz.impl.jdbcjobstore.JobStoreSupport$38.execute(JobStoreSupport.java:2846)
     	at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3763)
     	at org.quartz.impl.jdbcjobstore.JobStoreSupport.triggerFired(JobStoreSupport.java:2840)
     	at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:320)
     Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
     	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1056)
     	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:957)
     	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3376)
     	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3308)
     	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1837)
     	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1961)
     	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2543)
     	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1737)
     	at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2022)
     	at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1940)
     	at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1925)
     	at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:101)
     	at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.updateTriggerStatesForJobFromOtherState(StdJDBCDelegate.java:1695)
     	at org.quartz.impl.jdbcjobstore.JobStoreSupport.triggerFired(JobStoreSupport.java:2918)
     	... 4 more
     
    It's worth reiterating that our Application thread represents a Quartz job that is running in a worker thread and is actually attempting to unschedule other jobs.
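    For reference, the shape of what that job does is roughly the sketch below (Quartz 1.x API; the trigger and group names are made up):
    Code:
     import org.quartz.Job;
     import org.quartz.JobExecutionContext;
     import org.quartz.JobExecutionException;
     import org.quartz.SchedulerException;
     
     // sketch: a job that unschedules another trigger from inside execute()
     public class UnschedulingJobSketch implements Job
     {
         public void execute(JobExecutionContext context) throws JobExecutionException
         {
             try
             {
                 // hypothetical trigger name and group
                 context.getScheduler().unscheduleJob("someOtherTrigger", "DEFAULT");
             }
             catch (SchedulerException e)
             {
                 throw new JobExecutionException(e);
             }
         }
     }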

    Is this, in and of itself, a problem in Quartz? We're sort of stuck on this issue at the moment. Any feedback would be appreciated.

    Thanks!
    I guess I don't understand what the misfire threshold is for, then. I kind of thought it was a way to warn you of over-utilization of resources (too small a worker thread pool / long-running jobs).
    Hmm, the job could take anywhere between a few minutes and a few hours depending on the size. Would we be OK with a misfire threshold as long as a few hours? Is there some other way of indicating that we either don't really care if a particular job misfires because it runs for a long time, or don't need to hear about it in the logs?
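    To make that concrete, I believe the knob in question is the job store's misfire threshold, e.g. something like this in quartz.properties (the value shown is four hours in milliseconds, purely as an example; I understand the default to be 60000):
    Code:
     # how "late" a trigger may fire before it is considered misfired (milliseconds)
     org.quartz.jobStore.misfireThreshold = 14400000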
    Hi, I have a stateful job that repeats infinitely and executes every 5s. Most of the time it completes immediately. Occasionally, it will run for 5m or more (when it has actual work to do).

    I'm noticing that I always see a message in the log about a misfire if it runs for a long time and I'm assuming that's because it was supposed to fire during that interval.
    Code:
     INFO  [LocalDataSourceJobStore] : Handling 1 trigger(s) that missed their scheduled fire-time.
     


    I'm wondering:
    1. Is there something I can do so that this particular job doesn't spam the logs when it runs for a long time?
    2. Would a change in the misfire policy make any difference? I'm using the smart misfire policy right now, which works great in terms of when the job is scheduled again, but it still likes to remind me that it misfired. (See the sketch just below.)
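    Rough sketch of the kind of change I mean for (2), using the Quartz 1.x SimpleTrigger API (trigger name is made up, and I haven't verified that this actually quiets the log message):
    Code:
     import org.quartz.SimpleTrigger;
     
     // sketch: the 5s repeating trigger with an explicit misfire instruction
     // instead of the default MISFIRE_INSTRUCTION_SMART_POLICY
     SimpleTrigger trigger = new SimpleTrigger("fiveSecondCheck", "DEFAULT",
             SimpleTrigger.REPEAT_INDEFINITELY, 5000L);
     trigger.setMisfireInstruction(
             SimpleTrigger.MISFIRE_INSTRUCTION_RESCHEDULE_NEXT_WITH_REMAINING_COUNT);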
     