[Logo] Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
  [Search] Search   [Recent Topics] Recent Topics   [Members]  Member Listing   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
[Expert]
Ehcache Deadlock  XML
Forum Index -> Ehcache
Author Message
jclawson

journeyman

Joined: 06/17/2010 11:03:46
Messages: 22
Offline

I was hoping that I could get some help on deciphering a deadlock we have seen occur in both production and qa. It appears to be ehcache. The symptom is that we will try to select from a table and the http thread will never return. I have taken client/server state dumps as well as thread dumps. I will try to post the relevent parts of the dump here, but if that's not enough I can try to attach them. (They are quite large)

We are using terracotta 3.4.0 and ehcache 2.3.0. I know... its old. We are going to try and upgrade. But for now, I would really appreciate some help in learning how to use these logs to determine what is happening.

Thread Dump (blocked threads only)


"http-50200-6" Id=371 WAITING on java.util.concurrent.Semaphore$NonfairSync@5c1b1aa5
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
at com.tc.object.locks.LockStateNode$PendingLockHold.park(LockStateNode.java:179)
at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:723)
at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:701)
at com.tc.object.locks.ClientLockImpl.lock(ClientLockImpl.java:52)
at com.tc.object.locks.ClientLockManagerImpl.lock(ClientLockManagerImpl.java:98)
at com.tc.object.bytecode.ManagerImpl.lock(ManagerImpl.java:747)
at org.terracotta.locking.TerracottaLock.lock(TerracottaLock.java:89)
at org.terracotta.modules.ehcache.concurrency.TcSync.lock(TcSync.java:27)
at net.sf.ehcache.hibernate.regions.EhcacheTransactionalDataRegion.writeLock(EhcacheTransactionalDataRegion.java:148)
at net.sf.ehcache.hibernate.strategy.AbstractReadWriteEhcacheAccessStrategy.putFromLoad(AbstractReadWriteEhcacheAccessStrategy.java:88)
at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:195)
at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:982)
at org.hibernate.loader.Loader.doQuery(Loader.java:857)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
at org.hibernate.loader.Loader.doList(Loader.java:2533)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276)
at org.hibernate.loader.Loader.list(Loader.java:2271)
at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:119)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1716)
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:347)
at com.qualys.platform.persistence.hibernate.HibernatePaginatedListPage.create(HibernatePaginatedListPage.java:160)
at com.qualys.platform.persistence.hibernate.HibernatePaginatedListPage.create(HibernatePaginatedListPage.java:169)
at com.qualys.platform.persistence.domain.AbstractDao.find(AbstractDao.java:445)
at com.qualys.platform.web.extjs.grid.GridConfig$GridDataProvider.execute(GridConfig.java:212)
at com.qualys.portal.backoffice.module.backoffice.dwr.CustomerService.getCustomers(CustomerService.java:117)
at com.qualys.portal.backoffice.module.backoffice.dwr.CustomerService$$EnhancerByGuice$$17905382.CGLIB$getCustomers$6(<generated>)
at com.qualys.portal.backoffice.module.backoffice.dwr.CustomerService$$EnhancerByGuice$$17905382$$FastClassByGuice$$58a5f9f9.invoke(<generated>)
at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
at com.qualys.platform.security.shiro.authorization.SecurityAnnotationInterceptor$1.proceed(SecurityAnnotationInterceptor.java:45)
at org.apache.shiro.authz.aop.AuthorizingMethodInterceptor.invoke(AuthorizingMethodInterceptor.java:40)
at com.qualys.platform.security.shiro.authorization.SecurityAnnotationInterceptor.invoke(SecurityAnnotationInterceptor.java:57)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
at com.qualys.platform.security.aop.intercept.AuthenticationInterceptor.invoke(AuthenticationInterceptor.java:33)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
at com.qualys.portal.backoffice.module.backoffice.dwr.CustomerService$$EnhancerByGuice$$17905382.getCustomers(<generated>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.directwebremoting.impl.CreatorModule$1.doFilter(CreatorModule.java:235)
at com.qualys.platform.web.dwr.request.RequestTemplateFilter.doFilter(RequestTemplateFilter.java:22)
at org.directwebremoting.impl.CreatorModule$1.doFilter(CreatorModule.java:223)
at com.qualys.platform.web.dwr.AuditLogAjaxFilter.doFilter(AuditLogAjaxFilter.java:34)
at org.directwebremoting.impl.CreatorModule$1.doFilter(CreatorModule.java:223)
at org.directwebremoting.impl.CreatorModule.executeMethod(CreatorModule.java:241)
at org.directwebremoting.impl.DefaultRemoter.execute(DefaultRemoter.java:379)
at com.qualys.platform.web.dwr.PlatformRemoter.execute(PlatformRemoter.java:167)
at com.qualys.platform.web.dwr.request.PlatformCallHandler.handle(PlatformCallHandler.java:55)
at org.directwebremoting.servlet.UrlProcessor.handle(UrlProcessor.java:120)
at org.directwebremoting.servlet.DwrServlet.doPost(DwrServlet.java:141)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.qualys.platform.security.web.DataSecuritySessionFilter.doFilter(DataSecuritySessionFilter.java:28)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.qualys.platform.security.web.AuthenticationFilter.doFilter(AuthenticationFilter.java:50)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:373)
at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:306)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:83)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.persist.PersistFilter.doFilter(PersistFilter.java:89)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.tcInvoke(SessionValve55.java:99)
at org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.invoke(SessionValve55.java:86)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
LOCKED : [StringLockID(SOAQWxhTYURTu3PMBqFy.session.invalidator)]
WAITING TO LOCK: [LongLockID(-2128867798)]

 



And several other http threads are waiting on the same lock (replaced with "..." for stacks (they are the same)):


"http-50200-5" Id=370 WAITING on java.util.concurrent.Semaphore$NonfairSync@57e788d9
...
LOCKED : [StringLockID(0s2RXcioPPWuGRvwX1mK.session.invalidator), StringLockID(YWcaB9FW7A5DTwAVDHEM.session.invalidator)]
WAITING TO LOCK: [LongLockID(-2128867798)]

----------------------------------------------------------------

"http-50200-4" Id=369 WAITING on java.util.concurrent.Semaphore$NonfairSync@30b6b26a
...
LOCKED : [StringLockID(YWcaB9FW7A5DTwAVDHEM.session.invalidator)]
WAITING TO LOCK: [LongLockID(-2128867798)]

-----------------------------------------------------------------

"http-50200-2" Id=367 WAITING on java.util.concurrent.Semaphore$NonfairSync@4f81354c
...
LOCKED : [StringLockID(YWcaB9FW7A5DTwAVDHEM.session.invalidator)]
WAITING TO LOCK: [LongLockID(-2128867798)]

-----------------------------------------------------------------

"http-50200-1" Id=366 WAITING on java.util.concurrent.Semaphore$NonfairSync@3b498096
...
LOCKED : [StringLockID(SOAQWxhTYURTu3PMBqFy.session.invalidator)]
WAITING TO LOCK: [LongLockID(-2128867798)]

 


Next I took a client dump of this node. Shown here are some relevant portions of the client dump:



[dump] --> ClientLockImpl : StringLockID(0s2RXcioPPWuGRvwX1mK.session.invalidator)
GC Cycle Count : 127
Greediness : GREEDY_READ
State:
LockHold : Thread(http-50200-5) ID[77] : READ

[dump] --> ClientLockImpl : StringLockID(SOAQWxhTYURTu3PMBqFy)
GC Cycle Count : 127
Greediness : GREEDY_WRITE
State:

[dump] --> ClientLockImpl : StringLockID(SOAQWxhTYURTu3PMBqFy.session.invalidator)
GC Cycle Count : 127
Greediness : GREEDY_READ
State:
LockHold : Thread(http-50200-6) ID[76] : READ
LockHold : Thread(http-50200-1) ID[73] : READ

[dump] --> ClientLockImpl : StringLockID(YWcaB9FW7A5DTwAVDHEM)
GC Cycle Count : 0
Greediness : GREEDY_WRITE
State:

[dump] --> ClientLockImpl : StringLockID(YWcaB9FW7A5DTwAVDHEM.session.invalidator)
GC Cycle Count : 0
Greediness : GREEDY_READ
State:
LockHold : Thread(http-50200-8) ID[93] : READ
LockHold : Thread(http-50200-5) ID[77] : READ
LockHold : Thread(http-50200-4) ID[74] : READ
LockHold : Thread(http-50200-2) ID[72] : READ

[dump] --> ClientLockImpl : LongLockID(-2128867798)
GC Cycle Count : 127
Greediness : FREE
State:
PendingLockHold : Thread(http-50200-1) ID[73] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-2) ID[72] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-4) ID[74] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-6) ID[76] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-5) ID[77] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...

 



And finally some relevant data from the server log



[dump] Lock Info
[dump] StringLockID(SOAQWxhTYURTu3PMBqFy)
[dump] Contexts [ ServerLockContext [clientID=ClientID[250], state=GREEDY_HOLDER:WRITE, threadID=ThreadID=[-9223372036854775808]] ]
[dump] isRecalled=false

[dump] Lock Info
[dump] StringLockID(SOAQWxhTYURTu3PMBqFy.session.invalidator)
[dump] Contexts [ ServerLockContext [clientID=ClientID[250], state=GREEDY_HOLDER:READ, threadID=ThreadID=[-9223372036854775808]] ]
[dump] isRecalled=false

[dump] Lock Info
[dump] LongLockID(-2128867798)
[dump] Contexts [ ServerLockContext [clientID=ClientID[247], state=GREEDY_HOLDER:WRITE, threadID=ThreadID=[-9223372036854775808]] , ServerLockContext [clientID=ClientID[240], state=PENDING:WRITE, threadID=ThreadID=[76]] , ServerLockContext [clientID=ClientID[240], state=PENDING:READ, threadID=ThreadID=[22]] , ServerLockContext [clientID=ClientID[250], state=PENDING:WRITE, threadID=ThreadID=[73]] , ServerLockContext [clientID=ClientID[250], state=PENDING:WRITE, threadID=ThreadID=[72]] , ServerLockContext [clientID=ClientID[250], state=PENDING:WRITE, threadID=ThreadID=[74]] , ServerLockContext [clientID=ClientID[250], state=PENDING:WRITE, threadID=ThreadID=[76]] , ServerLockContext [clientID=ClientID[250], state=PENDING:WRITE, threadID=ThreadID=[77]] , ServerLockContext [clientID=ClientID[249], state=PENDING:WRITE, threadID=ThreadID=[91]] , ServerLockContext [clientID=ClientID[249], state=PENDING:WRITE, threadID=ThreadID=[90]] , ServerLockContext [clientID=ClientID[249], state=PENDING:WRITE, threadID=ThreadID=[87]] ]
[dump] isRecalled=true

[dump] Lock Info
[dump] StringLockID(0s2RXcioPPWuGRvwX1mK.session.invalidator)
[dump] Contexts [ ServerLockContext [clientID=ClientID[250], state=GREEDY_HOLDER:READ, threadID=ThreadID=[-9223372036854775808]] ]
[dump] isRecalled=false

[dump] Lock Info
[dump] StringLockID(YWcaB9FW7A5DTwAVDHEM)
[dump] Contexts [ ServerLockContext [clientID=ClientID[250], state=GREEDY_HOLDER:WRITE, threadID=ThreadID=[-9223372036854775808]] ]
[dump] isRecalled=false

[dump] Lock Info
[dump] StringLockID(YWcaB9FW7A5DTwAVDHEM.session.invalidator)
[dump] Contexts [ ServerLockContext [clientID=ClientID[250], state=GREEDY_HOLDER:READ, threadID=ThreadID=[-9223372036854775808]] ]
[dump] isRecalled=false



 



Thank you so much!!

jclawson

journeyman

Joined: 06/17/2010 11:03:46
Messages: 22
Offline

It kinda looks like ClientID[247] is holding the lock that everyone wants. Here are some relevant dump parts from ClientID[247]:


[dump] --> ClientLockImpl : LongLockID(-2128867798)
GC Cycle Count : 127
Greediness : RECALLED_WRITE
State:
LockHold : Thread(HazelcastReindexer-processor-2) ID[78] : WRITE
PendingLockHold : Thread(HazelcastReindexer-processor-4) ID[79] : READ : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(HazelcastReindexer-processor-3) ID[80] : WRITE : delegated=false, awarded=false, refused=false, Not Delegated
PendingLockHold : Thread(PortalScheduleCluster_Worker-24) ID[111] : WRITE : delegated=false, awarded=false, refused=false, Not Delegated

[dump] --> ClientLockImpl : LongLockID(-2128852325)
GC Cycle Count : 127
Greediness : GREEDY_WRITE
State:
LockHold : Thread(QwebHostUpdateQueueProcessor-processor-2) ID[148] : WRITE



 



Here is the thread dump for HazelcastReindexer-processor-2


"HazelcastReindexer-processor-2" Id=406 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@70d61741
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.__RWL__tc_lock(ReentrantReadWriteLock.java:594)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java)
at org.terracotta.modules.ehcache.coherence.IncoherentNodesSet.acquireReadLock(IncoherentNodesSet.java:59)
at org.terracotta.modules.ehcache.store.ClusteredStore.doGet(ClusteredStore.java:277)
at org.terracotta.modules.ehcache.store.ClusteredStore.getQuiet(ClusteredStore.java:270)
at net.sf.ehcache.Cache.searchInStoreWithoutStats(Cache.java:1791)
at net.sf.ehcache.Cache.get(Cache.java:1411)
at net.sf.ehcache.hibernate.regions.EhcacheTransactionalDataRegion.get(EhcacheTransactionalDataRegion.java:91)
at net.sf.ehcache.hibernate.strategy.AbstractReadWriteEhcacheAccessStrategy.putFromLoad(AbstractReadWriteEhcacheAccessStrategy.java:90)
at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:195)
at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:982)
at org.hibernate.loader.Loader.doQuery(Loader.java:857)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
at org.hibernate.loader.Loader.loadEntity(Loader.java:2037)
at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:86)
at org.hibernate.loader.entity.EntityLoader.loadByUniqueKey(EntityLoader.java:159)
at org.hibernate.persister.entity.AbstractEntityPersister.loadByUniqueKey(AbstractEntityPersister.java:1775)
at org.hibernate.type.EntityType.loadByUniqueKey(EntityType.java:681)
at org.hibernate.type.EntityType.resolve(EntityType.java:441)
at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:139)
at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:982)
at org.hibernate.loader.Loader.doQuery(Loader.java:857)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
at org.hibernate.loader.Loader.loadEntity(Loader.java:2037)
at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:86)
at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:76)
at org.hibernate.persister.entity.AbstractEntityPersister.load(AbstractEntityPersister.java:3268)
at org.hibernate.event.def.DefaultLoadEventListener.loadFromDatasource(DefaultLoadEventListener.java:496)
at org.hibernate.event.def.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:477)
at org.hibernate.event.def.DefaultLoadEventListener.load(DefaultLoadEventListener.java:227)
at org.hibernate.event.def.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:285)
at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:152)
at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:1090)
at org.hibernate.impl.SessionImpl.get(SessionImpl.java:1005)
at org.hibernate.impl.SessionImpl.get(SessionImpl.java:998)
at org.hibernate.ejb.AbstractEntityManagerImpl.find(AbstractEntityManagerImpl.java:614)
at org.hibernate.ejb.AbstractEntityManagerImpl.find(AbstractEntityManagerImpl.java:589)
at com.qualys.portal.application.module.indexing.business.reindex.HazelcastReindexer.call(HazelcastReindexer.java:91)
at com.qualys.portal.application.module.indexing.business.reindex.HazelcastReindexer.call(HazelcastReindexer.java:1)
at com.qualys.portal.application.module.core.business.thread.ProcessorPool$ProcessorThread$1.run(ProcessorPool.java:112)
at com.qualys.platform.security.PlatformSecurity.sudo(PlatformSecurity.java:343)
at com.qualys.platform.security.PlatformSecurity.sudo(PlatformSecurity.java:332)
at com.qualys.platform.security.SecurityUtil.sudo(SecurityUtil.java:134)
at com.qualys.portal.application.module.core.business.thread.ProcessorPool$ProcessorThread.run(ProcessorPool.java:100)

 


This thread dump didn't come with the nice "LOCKED:" or "WAITING TO LOCK:" parts. So... I don't know what lock he is really waiting on besides FairSync@70d61741. A TON of threads (maybe 20 or more?) appear to be waiting on this same lock.

I am assuming this situation might be... It almost looks like hibernate / ehcache is trying to acquire write locks on 2 different caches without returning the first lock and in 1 case does it in one order, and the other case a different order resulting in a deadlock. I am just guessing here....

I am wondering why terracotta reports that hazelcast-processor-2 has a WRITE lock ("LockHold : Thread(HazelcastReindexer-processor-2) ID[78] : WRITE") when it is clearly doing a read lock: "at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java)"

Thanks.
jclawson

journeyman

Joined: 06/17/2010 11:03:46
Messages: 22
Offline

There are a couple of threads that are doing native queries causing ehcache to clear. It is waiting on FairSync@70d61741 but I could see that a full ehcache clear might exacerbate this issue. We definitely need to fix this :-/.

Any other thoughts?
jclawson

journeyman

Joined: 06/17/2010 11:03:46
Messages: 22
Offline

It happened again .


Client log from locked node:

[dump] --> ClientLockImpl : LongLockID(-2128863062)
GC Cycle Count : 127
Greediness : FREE
State:
PendingLockHold : Thread(http-50200-8) ID[94] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-11) ID[73] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-5) ID[74] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-10) ID[72] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-2) ID[93] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-3) ID[102] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-15) ID[104] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-4) ID[103] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-14) ID[105] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-9) ID[101] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-13) ID[100] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-6) ID[106] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-7) ID[107] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-1) ID[108] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-16) ID[109] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-22) ID[110] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-19) ID[111] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-21) ID[113] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-17) ID[114] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-20) ID[112] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-18) ID[116] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-23) ID[115] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-28) ID[121] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-25) ID[118] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-27) ID[120] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-24) ID[117] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-29) ID[122] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-30) ID[123] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-26) ID[119] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-32) ID[125] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-31) ID[124] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-33) ID[126] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-38) ID[131] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-36) ID[128] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-37) ID[130] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-35) ID[129] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-34) ID[127] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-40) ID[133] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-43) ID[135] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-41) ID[134] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-42) ID[136] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-39) ID[132] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-44) ID[137] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-45) ID[138] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
PendingLockHold : Thread(http-50200-46) ID[139] : WRITE : delegated=true, awarded=false, refused=false, Called remote.lock(...)...
 


Server log showing who owns the lock "LongLockID(-2128863062)"

[dump] Lock Info
[dump] LongLockID(-2128863062)
[dump] Contexts [ ServerLockContext [clientID=ClientID[253], state=GREEDY_HOLDER:WRITE, threadID=ThreadID=[-9223372036854775808]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[94]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[73]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[74]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[72]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[93]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[102]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[104]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[103]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[105]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[101]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[100]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[106]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[107]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[108]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[109]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[110]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[111]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[113]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[114]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[112]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[116]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[115]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[121]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[118]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[120]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[117]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[122]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[123]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[119]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[125]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[124]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[126]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[131]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[128]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[130]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[129]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[127]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[133]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[135]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[134]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[136]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[132]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[137]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[138]] , ServerLockContext [clientID=ClientID[251], state=PENDING:WRITE, threadID=ThreadID=[139]] ]
[dump] isRecalled=true
 


thead from node owning lock "LongLockID(-2128863062)"


"PortalScheduleCluster_Worker-3" Id=160 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@666594d3
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.__RWL__tc_lock(ReentrantReadWriteLock.java:594)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java)
at org.terracotta.modules.ehcache.coherence.IncoherentNodesSet.acquireReadLock(IncoherentNodesSet.java:59)
at org.terracotta.modules.ehcache.store.ClusteredStore.doGet(ClusteredStore.java:277)
at org.terracotta.modules.ehcache.store.ClusteredStore.getQuiet(ClusteredStore.java:270)
at net.sf.ehcache.Cache.searchInStoreWithoutStats(Cache.java:1791)
at net.sf.ehcache.Cache.get(Cache.java:1411)
at net.sf.ehcache.hibernate.regions.EhcacheTransactionalDataRegion.get(EhcacheTransactionalDataRegion.java:91)
at net.sf.ehcache.hibernate.strategy.AbstractReadWriteEhcacheAccessStrategy.putFromLoad(AbstractReadWriteEhcacheAccessStrategy.java:90)
at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:195)
at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:982)
at org.hibernate.loader.Loader.doQuery(Loader.java:857)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
at org.hibernate.loader.Loader.doList(Loader.java:2533)
at org.hibernate.loader.Loader.listUsingQueryCache(Loader.java:2319)
at org.hibernate.loader.Loader.list(Loader.java:2268)
at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:119)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1716)
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:347)
at com.qualys.platform.persistence.domain.AbstractDao.findById(AbstractDao.java:135)
at com.qualys.platform.persistence.domain.AbstractDao.get(AbstractDao.java:110)
at com.qualys.portal.application.module.scan.business.ScanWatcher.execute(ScanWatcher.java:72)
at com.qualys.portal.application.module.was.business.scan.WasScanWatcher$$EnhancerByGuice$$3c962cb3.CGLIB$execute$4(<generated>)
at com.qualys.portal.application.module.was.business.scan.WasScanWatcher$$EnhancerByGuice$$3c962cb3$$FastClassByGuice$$9aa1c5d9.invoke(<generated>)
at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
at com.qualys.platform.persistence.config.internal.guicepersist.HibernateLocalTxInterceptor.invoke(HibernateLocalTxInterceptor.java:56)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
at com.qualys.portal.application.module.was.business.scan.WasScanWatcher$$EnhancerByGuice$$3c962cb3.execute(<generated>)
at com.qualys.platform.quartz.guice.GuiceJobFactory$JobWrapper.execute(GuiceJobFactory.java:113)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
LOCKED : [LongLockID(-2128863062)]
 


And finally... the tc client logs from the node that owns the lock


[dump] --> ClientLockImpl : LongLockID(-2128863062)
GC Cycle Count : 127
Greediness : RECALLED_WRITE
State:
LockHold : Thread(PortalScheduleCluster_Worker-3) ID[76] : WRITE
 
jclawson

journeyman

Joined: 06/17/2010 11:03:46
Messages: 22
Offline

Any help?
ari

seraphim

Joined: 05/24/2006 14:23:21
Messages: 1665
Location: San Francisco, CA
Offline

What version of Terracotta is this? Looks to me on a 60 second glance that the lock is held by a client and simultaneously recalled by the server which is a violation of the locking state machine inside the core.

Let's start with version and go from there.

--Ari
[WWW]
jclawson

journeyman

Joined: 06/17/2010 11:03:46
Messages: 22
Offline

terracotta 3.4.0 and ehcache 2.3.0
ari

seraphim

Joined: 05/24/2006 14:23:21
Messages: 1665
Location: San Francisco, CA
Offline

can you upgrade right away? I am not sure because I am not part of the TC day-to-day team but I think those guys might not be supporting 3.4.0 anymore.

There are several updates on the 3.4 line. And the 3.5 line has been GA over 1.5 years. And the 3.6 line is out and GA and very stable as well.

I think there might have been a few races you want to upgrade away from, to be blunt.

--Ari
[WWW]
ari

seraphim

Joined: 05/24/2006 14:23:21
Messages: 1665
Location: San Francisco, CA
Offline

BTW, I am basing this advice on only a very cursory scan of your post. It looked to me like you had clients waiting on a lock that one client thinks it has and yet the server is reporting the lock is recalled and server-owned at the moment.

FWIW,

Greediness is important to know here:
1. GREEDY READ == all clients can hold a greedy read lock at the same time. It means no writer is present and holding the lock in the cluster. Also, if a writer on that lock appears, readers must yield.
2. GREEDY WRITE == I have the lock and I am holding on to it locally even if I may not be using it at the moment. READERS must wait for me to yield the lock till they are allowed into the critical code segment it protects.
3. RECALL == server saying to a client "hey, greedy dude, give up your lock right now." It is almost lock preemption but not truly in that it will not forcibly take the lock. It just raises a flag with the client that after a lock release, you must yield the lock.

So a lock can be stuck in attempt to recall due to bad GC on a client holding that lock, for example. Then you would tune HA settings in your tc-config.xml. But I thought I saw the clients waiting and the server having it yet waiting at the same time which sounds 3.4.0-ish to me.

--Ari
[WWW]
jclawson

journeyman

Joined: 06/17/2010 11:03:46
Messages: 22
Offline

Thanks for your help. I really appreciate it.

We will upgrade terracotta soon I hope. In the mean time we have switched from using ehcache as our hibernate second level cache to hazelcast. This was easier than deploying a new terracotta server and requires zero down time.
ari

seraphim

Joined: 05/24/2006 14:23:21
Messages: 1665
Location: San Francisco, CA
Offline

Good luck with that.

--Ari
[WWW]
 
Forum Index -> Ehcache
Go to:   
Powered by JForum 2.1.7 © JForum Team