[Logo] Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
  [Search] Search   [Recent Topics] Recent Topics   [Members]  Member Listing   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
[Expert]
Messages posted by: hostalp  XML
Profile for hostalp -> Messages posted by hostalp [8]
Author Message
Yes I can confirm the same. If the DB goes down while the Quartz scheduler instance is running, the QuartzSchedulerThread goes wild trying to get a new connection repeatedly (as fast as it can), easily able to fully utilize 1 CPU core by doing just that.

It's exactly the same situation as described by schmidma above, with the most important part of the stack trace of the affected thread being (Quartz 2.2.1):
Code:
         at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108)
         at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165)
         at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3784)
         at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2756)
         at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:272)
So as a simple workaround we created extended RMIBootstrapCacheLoaderFactory and RMIBootstrapCacheLoader that allow delayed start of bootstrap thread. Then we configured the delay to be long enough to ensure the Hibernate session factory is already available for the bootstrap.
It isn't the best solution, but as a simple one it works.
We use the configuration below to achieve one cache manager instance for Hibernate as well as for our caches. This works well, but in cluster (RMI replication) when the Hibernate StandardQueryCache is replicated, there's one problem.
https://hibernate.atlassian.net/browse/HHH-6162 - there's a reference to the Hibernate session factory held by some query cache objects and the only way to live with it is to name the Hibernate session factory (that's what we've done via hibernate.session_factory_name.
With our configuration the cache manager starts quite earlier than the session factory binds its JNDI name.
When the cache replication initializes, the cache manager will receive objects (with reference to the session factory) from the already running remote instance and because it happens before the session factory is locally available it will complain like shown below.

Now some questions:
- Will the below exception cause the replication for the StandardQueryCache region to not initialize at all, or will it be (silently) retried at the background and thus succeed later (when the session factory is available)?

- Any ideas for how to prevent that from happen at all? We tried to somewhat postpone the cache manager startup to later time (when the session factory exists) but no success so far. The entity manager factory initialization kicks in before session factory of course (and it takes some time so that there's some decent time gap - depending on the environment from few to few tens of seconds) and the cache manager is already required at this point. Session factory initialization follows afterwards and that's too late.

Ehcache 2.7.4, Hibernate 3.6.4, Spring 3.0.7

Code:
[Bootstrap Thread for cache org.hibernate.cache.StandardQueryCache] WARN  RMIBootstrapCacheLoader [] [] - Error asynchronously performing bootstrap. The cause was: Error bootstrapping from remote peer. Message was: error unmarshalling return; nested exception is: 
 	java.io.InvalidObjectException: Could not find a SessionFactory named: hibernate/SessionFactory
 net.sf.ehcache.distribution.RemoteCacheException: Error bootstrapping from remote peer. Message was: error unmarshalling return; nested exception is: 
 	java.io.InvalidObjectException: Could not find a SessionFactory named: hibernate/SessionFactory
 	at net.sf.ehcache.distribution.RMIBootstrapCacheLoader.doLoad(RMIBootstrapCacheLoader.java:176)
 	at net.sf.ehcache.distribution.RMIBootstrapCacheLoader$BootstrapThread.run(RMIBootstrapCacheLoader.java:107)
 Caused by: 
 java.rmi.UnmarshalException: error unmarshalling return; nested exception is: 
 	java.io.InvalidObjectException: Could not find a SessionFactory named: hibernate/SessionFactory
 	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:185)
 	at net.sf.ehcache.distribution.RMICachePeer_Stub.getKeys(Unknown Source)
 	at net.sf.ehcache.distribution.RMIBootstrapCacheLoader.doLoad(RMIBootstrapCacheLoader.java:146)
 	... 1 more
 Caused by: 
 java.io.InvalidObjectException: Could not find a SessionFactory named: hibernate/SessionFactory
 	at org.hibernate.impl.SessionFactoryImpl.readResolve(SessionFactoryImpl.java:755)
 ...

Now brief configuration info:
Cache manager stuff - it's just used to ensure only one cache manager instance is used throughout the application for both Hibernate and our own caches:
Code:
    <bean id="cacheManager" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
 	</bean>
 
 	<bean id="localConfigurableCacheRegionFactory" class="...ConfigurableCacheRegionFactory">
 		<property name="cacheManager" ref="cacheManager"/>
 	</bean>
 	
 	<bean id="abstractCacheFactory" class="...AbstractCacheBean" abstract="true">
         <property name="cacheManager" ref="cacheManager" />
     </bean>

Then Hibernate config follows:
Code:
    <bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
     	<property name="persistenceUnitManager" ref="persistenceUnitManager" />
     	<property name="persistenceProvider">
     	 	<bean class="...ConfigurableHibernatePersistence">  
 	            <property name="interceptor">  
     	            <bean class="...HintSupportInterceptor"/>  
         	    </property>  
         	</bean> 
     	</property>
     	<property name="jpaVendorAdapter">
     		<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
 				<property name="showSql" value="${hibernate.showSql}" />
 				<property name="databasePlatform" value="${dataSource.dialect}" />
     		</bean>
     	</property>
     	<property name="jpaPropertyMap" ref="${transaction.mode}PropertyMap" />
     </bean>
 
 	<util:map id="webspherePropertyMap">
 		<entry key="hibernate.archive.autodetection" value=""/>
 		<entry key="hibernate.transaction.manager_lookup_class" value="org.hibernate.transaction.WebSphereExtendedJTATransactionLookup"/>
 		<entry key="hibernate.current_session_context_class" value="jta"/>
 		<entry key="hibernate.cache.region.factory_class" value="...ConfigurableCacheRegionFactory"/>
 		<entry key="hibernate.cache.use_query_cache" value="true"/>
 		<entry key="hibernate.generate_statistics" value="${hibernate.statistics}" />
 		<entry key="hibernate.session_factory_name" value="hibernate/SessionFactory" />
 	</util:map>
 
 	<bean id="sessionFactory" factory-bean="entityManagerFactory" factory-method="getSessionFactory" />

And our own caches are configured like:
Code:
	<bean id="nodeDao"
 		class="...JpaNodeDao">
 		<property name="cache">
 			<bean parent="abstractCacheFactory">
 				<property name="cacheName"
 					value="...NodeCache" />
 			</bean>
 		</property>
 	</bean>
Adding 2 full thread dumps, check out the following threads, any ideas welcome:
Found one Java-level deadlock:
=============================
"WebContainer : 109":
waiting for ownable synchronizer 0xfffffd7d04e59a18, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "WebContainer : 58"
"WebContainer : 58":
waiting for ownable synchronizer 0xfffffd7d04e5a058, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "WebContainer : 59"
"WebContainer : 59":
waiting for ownable synchronizer 0xfffffd7d04e596f8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "WebContainer : 69"
"WebContainer : 69":
waiting for ownable synchronizer 0xfffffd7d04e5a058, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "WebContainer : 59"


We'll probably try -XX:+UseMembar for the following test, however we're running out of time at that target environment and won't be able to perform much more tests.

alexsnaps wrote:
Still can't help with this... Also, you see threads waiting to acquire lock, but do you see threads holding locks ? Or do these threads hang forever ? 
We don't have full thread dump at the moment because it was running unattended through the night. We'll try to gather some if this issue occurs in one of the following runs.

And yes, those threads remained in that state until one my colleague killed the JVM process in the morning.

Note that at approximately the same time there was an exhaustion of DB one connection pool, though that datasource was a different one, used by different component against separate DB schema.
During load tests we randomly encountered state of lots of threads being stuck like shown below.
Because these were automated tests running on night we didn't capture full thread dumps (below are just stack traces coming from the application server hung thread notifications). We'll perform some more tests and if we encounter the same state we'll gather full thread dumps then.

Below you can see typical stack traces observed for all those hung threads, they refer to caches that are configured as read-write and they're replicated via RMI in the cluster by the way (I think we can disregards the read-write + RMI replication consistency issues described for example here: http://lists.terracotta.org/pipermail/ehcache-list/2011-April/000354.html for now as these shouldn't be related, the reason why they're read-write is simply coming from the history when nonstrict-read-write cache elements were being put and then removed which didn't cause what we required)

at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.get(SelectableConcurrentHashMap.java:690)
at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.get(SelectableConcurrentHashMap.java:308)
at net.sf.ehcache.store.MemoryStore.get(MemoryStore.java:326)
at net.sf.ehcache.Cache.get(Cache.java:1634)
at org.hibernate.cache.ehcache.internal.regions.EhcacheTransactionalDataRegion.get(EhcacheTransactionalDataRegion.java:115)
at org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy.get(AbstractReadWriteEhcacheAccessStrategy.java:77)
at org.hibernate.cache.ehcache.internal.nonstop.NonstopAwareEntityRegionAccessStrategy.get(NonstopAwareEntityRegionAccessStrategy.java:133)
at org.hibernate.event.internal.DefaultLoadEventListener.loadFromSecondLevelCache(DefaultLoadEventListener.java:546)
at org.hibernate.event.internal.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:419)
at org.hibernate.event.internal.DefaultLoadEventListener.load(DefaultLoadEventListener.java:204)
at org.hibernate.event.internal.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:260)
at org.hibernate.event.internal.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:148)
at org.hibernate.internal.SessionImpl.fireLoad(SessionImpl.java:1078)
at org.hibernate.internal.SessionImpl.internalLoad(SessionImpl.java:1005)
at org.hibernate.type.EntityType.resolveIdentifier(EntityType.java:651)
at org.hibernate.type.ManyToOneType.assemble(ManyToOneType.java:266)
at org.hibernate.type.TypeHelper.assemble(TypeHelper.java:113)
at org.hibernate.cache.internal.StandardQueryCache.get(StandardQueryCache.java:163)
at org.hibernate.loader.Loader.getResultFromQueryCache(Loader.java:2443)
at org.hibernate.loader.Loader.listUsingQueryCache(Loader.java:2351)
at org.hibernate.loader.Loader.list(Loader.java:2324)
at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:490)
at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:355)
at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:195)
at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1247)
at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:264)
... 
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
at org.hibernate.cache.ehcache.internal.regions.EhcacheTransactionalDataRegion.writeLock(EhcacheTransactionalDataRegion.java:210)
at org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy.putFromLoad(AbstractReadWriteEhcacheAccessStrategy.java:102)
at org.hibernate.cache.ehcache.internal.nonstop.NonstopAwareEntityRegionAccessStrategy.putFromLoad(NonstopAwareEntityRegionAccessStrategy.java:195)
at org.hibernate.engine.internal.TwoPhaseLoad.doInitializeEntity(TwoPhaseLoad.java:224)
at org.hibernate.engine.internal.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:134)
at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:1103)
at org.hibernate.loader.Loader.processResultSet(Loader.java:960)
at org.hibernate.loader.Loader.doQuery(Loader.java:910)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:341)
at org.hibernate.loader.Loader.doList(Loader.java:2516)
at org.hibernate.loader.Loader.doList(Loader.java:2502)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2332)
at org.hibernate.loader.Loader.list(Loader.java:2327)
at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:490)
at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:355)
at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:195)
at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1247)
at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:264)
... 
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.remove(SelectableConcurrentHashMap.java:548)
at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.remove(SelectableConcurrentHashMap.java:377)
at net.sf.ehcache.store.MemoryStore.remove(MemoryStore.java:359)
at net.sf.ehcache.store.NotifyingMemoryStore.evict(NotifyingMemoryStore.java:75)
at net.sf.ehcache.store.MemoryStore.removeElementChosenByEvictionPolicy(MemoryStore.java:609)
at net.sf.ehcache.store.MemoryStore.access$600(MemoryStore.java:95)
at net.sf.ehcache.store.MemoryStore$Participant.evict(MemoryStore.java:1243)
at net.sf.ehcache.pool.impl.FromLargestCachePoolEvictor.freeSpace(FromLargestCachePoolEvictor.java:49)
at net.sf.ehcache.pool.impl.AtomicPoolAccessor.add(AtomicPoolAccessor.java:71)
at net.sf.ehcache.pool.impl.AbstractPoolAccessor.add(AbstractPoolAccessor.java:67)
at net.sf.ehcache.store.MemoryStore.put(MemoryStore.java:266)
at net.sf.ehcache.Cache.putInternal(Cache.java:1500)
at net.sf.ehcache.Cache.put(Cache.java:1426)
at net.sf.ehcache.Cache.put(Cache.java:1391)
at org.hibernate.cache.ehcache.internal.regions.EhcacheTransactionalDataRegion.put(EhcacheTransactionalDataRegion.java:141)
at org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy.putFromLoad(AbstractReadWriteEhcacheAccessStrategy.java:107)
at org.hibernate.cache.ehcache.internal.nonstop.NonstopAwareEntityRegionAccessStrategy.putFromLoad(NonstopAwareEntityRegionAccessStrategy.java:195)
at org.hibernate.engine.internal.TwoPhaseLoad.doInitializeEntity(TwoPhaseLoad.java:224)
at org.hibernate.engine.internal.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:134)
at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:1103)
at org.hibernate.loader.Loader.processResultSet(Loader.java:960)
at org.hibernate.loader.Loader.doQuery(Loader.java:910)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:341)
at org.hibernate.loader.Loader.doList(Loader.java:2516)
at org.hibernate.loader.Loader.listUsingQueryCache(Loader.java:2361)
at org.hibernate.loader.Loader.list(Loader.java:2324)
at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:490)
at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:355)
at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:195)
at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1247)
at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:264)
... 
Hibernate 4.1.9, Ehcache 2.7.0, java version "1.7.0_17" (WebSphere Application Server 8.5.5.0 at Solaris 10 x86_64)
Basically I didn't want to start a new thread for that, however it's similar in its nature - threads hang in very similar manner under heavy load. Of course I can create a new thread, however I don't have much more information available at this moment. The caches affected all seem to be configured as read-write and they're also replicated (RMI) - and yes I'm aware of this: http://lists.terracotta.org/pipermail/ehcache-list/2011-April/000354.html which shouldn't be related to this case.
We've got similar problems with read-write ehcache regions used from hibernate. Threads get stuck at:
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.get(SelectableConcurrentHashMap.java:690)
at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.get(SelectableConcurrentHashMap.java:308)
at net.sf.ehcache.store.MemoryStore.get(MemoryStore.java:326)
at net.sf.ehcache.Cache.get(Cache.java:1634)
at org.hibernate.cache.ehcache.internal.regions.EhcacheTransactionalDataRegion.get(EhcacheTransactionalDataRegion.java:115)
at org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy.get(AbstractReadWriteEhcacheAccessStrategy.java:77)
at org.hibernate.cache.ehcache.internal.nonstop.NonstopAwareEntityRegionAccessStrategy.get(NonstopAwareEntityRegionAccessStrategy.java:133)
at org.hibernate.event.internal.DefaultLoadEventListener.loadFromSecondLevelCache(DefaultLoadEventListener.java:546)
... 

at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
at org.hibernate.cache.ehcache.internal.regions.EhcacheTransactionalDataRegion.writeLock(EhcacheTransactionalDataRegion.java:210)
at org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy.putFromLoad(AbstractReadWriteEhcacheAccessStrategy.java:102)
at org.hibernate.cache.ehcache.internal.nonstop.NonstopAwareEntityRegionAccessStrategy.putFromLoad(NonstopAwareEntityRegionAccessStrategy.java:195)
at org.hibernate.engine.internal.TwoPhaseLoad.doInitializeEntity(TwoPhaseLoad.java:224)
... 
Hibernate 4.1.9, Ehcache 2.7.0, java version "1.7.0_17" (WebSphere Application Server 8.5 at Solaris 10 x86_64)

EDIT: We've found this likely related stuff:
http://forums.terracotta.org/forums/posts/list/6091.page
http://stackoverflow.com/questions/7809768/debugging-hibernate-ehcache-deadlock
Could it be that? We'll try to check it, but it probably isn't the same case we in our case we go via ehcache (in each thread) only once, not twice like there.
 
Profile for hostalp -> Messages posted by hostalp [8]
Go to:   
Powered by JForum 2.1.7 © JForum Team