Author |
Message |
11/18/2010 01:27:04
|
targit
journeyman
Joined: 11/18/2010 01:17:53
Messages: 10
Offline
|
Hi,
We're using Ehcache 2.3.0 standalone. We have strange problems under heavy concurrent access with BlockingCache: some threads never wake up and remain in the WAITING state, which eventually brings down our system.
thread dump:
Thread: ajp-0.0.0.0-8010-2 : priority:5, demon:true, threadId:129, threadState:WAITING, lockName:java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6bad4311
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:53)
net.sf.ehcache.constructs.blocking.BlockingCache.put(BlockingCache.java:204)
de.company.webdb.caching.CacheServiceBean.put(CacheServiceBean.java:166)
We have over 200 threads in this same state!
Any ideas?
Ehcache 1.6.2 works under the same scenario with no problems!
Java version:
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
|
|
|
11/18/2010 02:15:13
|
alexsnaps
consul
Joined: 06/19/2009 09:06:00
Messages: 484
Offline
|
We have noticed similar behavior under certain circumstances.
But are the 200 threads waiting for the write lock?
We are currently evaluating the best way to address this, so your input is more than welcome.
Thanks!
|
Alex Snaps (Terracotta engineer) |
|
|
11/18/2010 02:58:57
|
targit
journeyman
Joined: 11/18/2010 01:17:53
Messages: 10
Offline
|
Yes, all waiting for the write lock.
What more information do you need?
What do you suggest as a workaround? Falling back to 1.6.2?
We are planning to use JGroups replication in the future. Is it possible to use that feature with 1.6.2?
|
|
|
11/18/2010 08:18:40
|
etsai
master
Joined: 07/31/2007 10:14:38
Messages: 72
Offline
|
This may be a JVM issue. Please use JDK 1.6.0_21 or higher.
See the following links:
https://jira.terracotta.org/jira/browse/DEV-4685
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6822370
|
|
|
11/18/2010 14:55:13
|
alexsnaps
consul
Joined: 06/19/2009 09:06:00
Messages: 484
Offline
|
If that works out for you, please let us know.
Thanks!
|
Alex Snaps (Terracotta engineer) |
|
|
11/19/2010 01:42:08
|
mmatook
neo
Joined: 03/31/2009 15:53:01
Messages: 3
Offline
|
I have seen this problem occur on quad-core, quad-socket systems under high load (1000 concurrent threads get stuck). It appears to be linked to a JVM bug (should be fixed in JDK 1.6.0_18 or higher).
A temporary workaround could be the -XX:+UseMembar JVM parameter; it seemed to help in some cases (if upgrading the JDK is not an option).
In any case, let us know how it goes.
|
|
|
11/19/2010 05:03:10
|
targit
journeyman
Joined: 11/18/2010 01:17:53
Messages: 10
Offline
|
Thanks for the help.
We will first try the newest JDK (1.6.0_22) and, if needed, the -XX:+UseMembar VM flag.
I'll report the results.
|
|
|
11/19/2010 11:25:52
|
abellas
neo
Joined: 11/19/2010 11:18:10
Messages: 4
Location: Orlando, FL
Offline
|
I, too, am having the exact same issue:
"jrpp-733" prio=5 tid=1194 WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at com.tc.object.locks.LockStateNode$PendingLockHold.park(LockStateNode.java:172)
at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:731)
at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:710)
at com.tc.object.locks.ClientLockImpl.lock(ClientLockImpl.java:50)
at com.tc.object.locks.ClientLockManagerImpl.lock(ClientLockManagerImpl.java:97)
at com.tc.object.bytecode.ManagerImpl.lock(ManagerImpl.java:728)
at com.tc.object.bytecode.ManagerUtil.beginLock(ManagerUtil.java:208)
at org.terracotta.collections.BasicLockStrategy.beginLock(BasicLockStrategy.java:12)
at org.terracotta.collections.ConcurrentDistributedMapDso.beginLock(ConcurrentDistributedMapDso.java:964)
at org.terracotta.collections.ConcurrentDistributedMapDso.get(ConcurrentDistributedMapDso.java:181)
at org.terracotta.collections.ConcurrentDistributedMapDsoArray.get(ConcurrentDistributedMapDsoArray.java:154)
at org.terracotta.collections.ConcurrentDistributedMap.get(ConcurrentDistributedMap.java:165)
at org.terracotta.cache.impl.DistributedCacheImpl.getNonExpiredEntry(DistributedCacheImpl.java:175)
at org.terracotta.cache.impl.DistributedCacheImpl.getNonExpiredEntryCoherent(DistributedCacheImpl.java:115)
at org.terracotta.cache.impl.DistributedCacheImpl.getTimestampedValue(DistributedCacheImpl.java:153)
at org.terracotta.modules.ehcache.store.ClusteredStore.get(ClusteredStore.java:210)
at net.sf.ehcache.Cache.searchInMemoryStoreWithStats(Cache.java:1695)
at net.sf.ehcache.Cache.get(Cache.java:1335)
at net.sf.ehcache.Cache.get(Cache.java:1306)
at coldfusion.tagext.io.cache.ehcache.GenericEhcache.get(GenericEhcache.java:75)
at coldfusion.tagext.io.cache.CacheTagHelper.getFromCache(CacheTagHelper.java:237)
at coldfusion.runtime.CFPage.CacheGet(CFPage.java:8183)
at cfCacheManager2ecfc1027664017$funcASSOCIATECACHEKEYEVICTIONSTORES.runFunction(C:\-------\service\utility\CacheManager.cfc:68)
We are definitely using the latest JDK - that was one of the items on our checklist. We upgraded to 1.6.0_22 on all clients and Terracotta servers. I will also try this JVM flag and report back... I'm thrilled to have found a forum thread discussing my exact issue (seemingly, so far).
|
|
|
11/23/2010 10:34:40
|
abellas
neo
Joined: 11/19/2010 11:18:10
Messages: 4
Location: Orlando, FL
Offline
|
The param didn't help things, we still have a couple dozen hung threads matching my previous post. We added the parameter to the clients though... I thought that made sense, but we're going to try it with the server, too.
Does anyone have any tips on how to more closely inspect what it is that's hanging up those threads? What confuses me is that the Terracotta server isn't overly stressed out on CPU, network, or memory when this is happening. I just have a hard time accepting the idea that the ColdFusion client is unable to contact or get a response back from Terracotta - if that's how I should be interpreting these hung threads.
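One generic way to inspect hung threads more closely (besides taking jstack dumps): query ThreadMXBean from inside the JVM to list which threads are parked on a lock and, where the JVM can tell, which thread owns that lock. This is a self-contained sketch, not specific to the ColdFusion/Terracotta setup above; the demo thread names are placeholders.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Programmatic alternative to jstack: report threads stuck in WAITING on
// a lock, together with the lock's current owner (if known to the JVM).
public class StuckThreadReport {
    public static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // dumpAllThreads(true, true) also captures locked monitors and
        // ownable synchronizers such as ReentrantReadWriteLock.
        for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
            if (ti.getThreadState() == Thread.State.WAITING && ti.getLockName() != null) {
                sb.append(ti.getThreadName())
                  .append(" waiting on ").append(ti.getLockName())
                  .append(", owner: ").append(ti.getLockOwnerName())
                  .append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo: park one thread on a monitor, then report it.
        Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) { }
            }
        }, "demo-waiter");
        waiter.start();
        Thread.sleep(300);                     // give the thread time to park
        System.out.print(report());            // lists "demo-waiter waiting on ..."
        synchronized (lock) { lock.notifyAll(); }
        waiter.join();
    }
}
```

Comparing a few of these reports taken minutes apart shows whether the same threads stay parked on the same lock (truly stuck) or are merely slow.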
|
|
|
11/23/2010 11:50:55
|
targit
journeyman
Joined: 11/18/2010 01:17:53
Messages: 10
Offline
|
We tested JDK 1.6.0_22. Same issues.
Now we will try the -XX:+UseMembar VM flag.
I'll report the results.
|
|
|
11/29/2010 01:32:40
|
targit
journeyman
Joined: 11/18/2010 01:17:53
Messages: 10
Offline
|
We have the same problems with -XX:+UseMembar.
The app ran for 5 days with no problems, then crashed in the same way:
many threads waiting to be woken up.
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:53)
net.sf.ehcache.constructs.blocking.BlockingCache.put(BlockingCache.java:204)
de.pantarhei.webdb.caching.CacheServiceBean.put(CacheServiceBean.java:204)
See the attached thread dump.
Any ideas how to solve this problem?
targit
PS: I wonder why the JDK's ConcurrentHashMap uses the same ReentrantLock mechanism and doesn't block. Maybe a signal/notifyAll on the lock is being missed?
Attachment: threaddump.txt (654 KB), downloaded 533 time(s)
|
|
|
11/29/2010 08:28:46
|
abellas
neo
Joined: 11/19/2010 11:18:10
Messages: 4
Location: Orlando, FL
Offline
|
While we have not directly solved the problem, we have a workaround. The hung threads were occurring after large mark-sweep GCs in ColdFusion. Rather than focus on the hung threads, we focused on getting the GC cycles under control.
Moving to the concurrent collector (-XX:+UseConcMarkSweepGC) and altering the generation ratio to increase the size of the young generation helped us prevent the large mark-sweeps that were causing the hung threads.
So the mystery still stands, and we'll keep working to understand it better. In the meantime, our immediate problem is solved.
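For anyone trying the same approach, the flag set described above might look roughly like this. The heap size and ratio below are illustrative placeholders, not the poster's actual values:

```shell
# Hypothetical JVM options sketching the workaround above: CMS for the
# old generation, plus a larger young generation (NewRatio=2 makes the
# young gen one third of the heap) so fewer objects are promoted and
# full mark-sweep collections become rarer.
java -Xms2g -Xmx2g \
     -XX:+UseConcMarkSweepGC \
     -XX:NewRatio=2 \
     -XX:+PrintGCDetails \
     -jar app.jar
```

Enabling -XX:+PrintGCDetails lets you confirm from the GC log whether the large mark-sweep pauses actually disappear after the change.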
|
|
|
11/29/2010 11:06:23
|
targit
journeyman
Joined: 11/18/2010 01:17:53
Messages: 10
Offline
|
We are already using -XX:+UseConcMarkSweepGC, but it doesn't help.
We are planning to fall back to 1.6.2 :(
|
|
|
11/29/2010 14:09:24
|
steve
ophanim
Joined: 05/24/2006 14:22:53
Messages: 619
Offline
|
Does anyone have a reproducible case that we can take a look at? We would love to help track it down.
|
Want to post to this forum? Join the Terracotta Community |
|
|
11/30/2010 00:25:55
|
alexsnaps
consul
Joined: 06/19/2009 09:06:00
Messages: 484
Offline
|
Sorry to ask the obvious, but are you sure no code path does a get() on the BlockingCache and then fails to do a put() on a cache miss?
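The pitfall Alex is describing comes from BlockingCache's contract: a get() that misses acquires a per-key write lock that is only released by a matching put() from the same thread, so a caller that bails out after a miss leaves every later reader parked forever. The sketch below emulates that contract with a plain ReentrantLock per key (MiniBlockingCache is a stand-in for illustration, not ehcache's actual implementation) and shows the safe calling pattern:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Stand-in for ehcache's BlockingCache contract: get() on a miss keeps a
// per-key lock held until the caller put()s a value for that key.
class MiniBlockingCache {
    private final ConcurrentHashMap<String, String> values = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    String get(String key) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        String v = values.get(key);
        if (v != null) {
            lock.unlock();       // hit: release immediately
        }
        return v;                // miss: lock stays held until put()
    }

    void put(String key, String value) {
        values.put(key, value);
        ReentrantLock lock = lockFor(key);
        if (lock.isHeldByCurrentThread()) {
            lock.unlock();       // wakes up threads parked in get()
        }
    }
}

public class BlockingCachePattern {
    public static void main(String[] args) {
        MiniBlockingCache cache = new MiniBlockingCache();
        String v = cache.get("key");
        try {
            if (v == null) {
                v = "loaded";            // expensive load goes here
                cache.put("key", v);     // MUST happen on every miss path,
            }                            // or other threads park forever
        } catch (RuntimeException e) {
            cache.put("key", "");        // release the lock even on failure
            throw e;
        }
        System.out.println(v);
    }
}
```

If any code path (an exception in the loader, an early return) skips the put(), every subsequent get() for that key queues up on the write lock, which matches the growing pile of WAITING threads in the dumps above.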
|
Alex Snaps (Terracotta engineer) |
|
|
|