Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Strange Problems under heavy load with Blocking cache
targit

journeyman

Joined: 11/18/2010 01:17:53
Messages: 10
Offline

Hi,

We are using Ehcache 2.3.0 standalone and have strange problems under heavy concurrent access with BlockingCache. Some threads never wake up and remain in the WAITING state, which eventually crashes our system.

thread dump:

Thread: ajp-0.0.0.0-8010-2 : priority:5, demon:true, threadId:129, threadState:WAITING, lockName:java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6bad4311

sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:53)
net.sf.ehcache.constructs.blocking.BlockingCache.put(BlockingCache.java:204)
de.company.webdb.caching.CacheServiceBean.put(CacheServiceBean.java:166)


We have over 200 threads in the same state!

Any ideas?

Ehcache 1.6.2 works in the same scenario with no problems.


using java:
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
alexsnaps

consul

Joined: 06/19/2009 09:06:00
Messages: 484
Offline

We have noticed similar behavior under certain circumstances.
But are the 200 threads waiting for the write lock?
We are currently evaluating what's the best way to address that, so your input is more than welcome.
Thanks!

Alex Snaps (Terracotta engineer)
targit

journeyman

Joined: 11/18/2010 01:17:53
Messages: 10
Offline

Yes, all of them are waiting for the write lock.
What further information do you need?

What do you recommend as a workaround? Falling back to 1.6.2?

We are planning to use JGroups replication in the future. Is it possible to use this feature with 1.6.2?
etsai

master

Joined: 07/31/2007 10:14:38
Messages: 72
Offline

This may be a JVM issue. Please use JDK 1.6.0_21 or higher.

Refer to the following links:

https://jira.terracotta.org/jira/browse/DEV-4685

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6822370

alexsnaps

consul

Joined: 06/19/2009 09:06:00
Messages: 484
Offline

If that works out for you, please let us know.
Thanks!

Alex Snaps (Terracotta engineer)
mmatook

neo

Joined: 03/31/2009 15:53:01
Messages: 3
Offline

I have seen this problem occur on quad-core, quad-socket systems under high load (1000 concurrent threads get stuck). It appears to be linked to a JVM bug (should be fixed in JDK 1.6.0_18 or higher).

A temporary workaround could be to use the -XX:+UseMembar parameter; it seemed to help in some cases (if upgrading the JDK is not an option).

In any case, let us know how it goes.
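
Because both suggestions depend on which JVM build and flags the process really runs with, it can help to check them from inside the running application. The following is only a minimal sketch using the standard management API; the class name JvmFlagCheck and the method hasFlag are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Ehcache or Terracotta.
final class JvmFlagCheck {

    /** Returns true if the exact flag string appears among the JVM input arguments. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with (does not include the main class or its arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
        System.out.println("UseMembar set   = " + hasFlag(jvmArgs, "-XX:+UseMembar"));
    }
}
```

Running this inside the application server itself (rather than from a separate command line) shows what that process was started with, which may differ from what a start script appears to set.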
targit

journeyman

Joined: 11/18/2010 01:17:53
Messages: 10
Offline

Thanks for the help.

We will first try the newest JDK (1.6.0_22) and then, if needed, the JVM option -XX:+UseMembar.

I'll report the results.

abellas

neo
Joined: 11/19/2010 11:18:10
Messages: 4
Location: Orlando, FL
Offline

I, too, am having the exact same issue:

"jrpp-733" prio=5 tid=1194 WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at com.tc.object.locks.LockStateNode$PendingLockHold.park(LockStateNode.java:172)
at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:731)
at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:710)
at com.tc.object.locks.ClientLockImpl.lock(ClientLockImpl.java:50)
at com.tc.object.locks.ClientLockManagerImpl.lock(ClientLockManagerImpl.java:97)
at com.tc.object.bytecode.ManagerImpl.lock(ManagerImpl.java:728)
at com.tc.object.bytecode.ManagerUtil.beginLock(ManagerUtil.java:208)
at org.terracotta.collections.BasicLockStrategy.beginLock(BasicLockStrategy.java:12)
at org.terracotta.collections.ConcurrentDistributedMapDso.beginLock(ConcurrentDistributedMapDso.java:964)
at org.terracotta.collections.ConcurrentDistributedMapDso.get(ConcurrentDistributedMapDso.java:181)
at org.terracotta.collections.ConcurrentDistributedMapDsoArray.get(ConcurrentDistributedMapDsoArray.java:154)
at org.terracotta.collections.ConcurrentDistributedMap.get(ConcurrentDistributedMap.java:165)
at org.terracotta.cache.impl.DistributedCacheImpl.getNonExpiredEntry(DistributedCacheImpl.java:175)
at org.terracotta.cache.impl.DistributedCacheImpl.getNonExpiredEntryCoherent(DistributedCacheImpl.java:115)
at org.terracotta.cache.impl.DistributedCacheImpl.getTimestampedValue(DistributedCacheImpl.java:153)
at org.terracotta.modules.ehcache.store.ClusteredStore.get(ClusteredStore.java:210)
at net.sf.ehcache.Cache.searchInMemoryStoreWithStats(Cache.java:1695)
at net.sf.ehcache.Cache.get(Cache.java:1335)
at net.sf.ehcache.Cache.get(Cache.java:1306)
at coldfusion.tagext.io.cache.ehcache.GenericEhcache.get(GenericEhcache.java:75)
at coldfusion.tagext.io.cache.CacheTagHelper.getFromCache(CacheTagHelper.java:237)
at coldfusion.runtime.CFPage.CacheGet(CFPage.java:8183)
at cfCacheManager2ecfc1027664017$funcASSOCIATECACHEKEYEVICTIONSTORES.runFunction(C:\-------\service\utility\CacheManager.cfc:68)


We are definitely using the latest JDK; that was one of our checklist items to try and help things. We upgraded to 1.6.0_22 on all clients and Terracotta servers. I will also try this JVM option and report back... I'm thrilled to have found a forum thread talking about my exact issue (seemingly, so far).
abellas

neo
Joined: 11/19/2010 11:18:10
Messages: 4
Location: Orlando, FL
Offline

The param didn't help things, we still have a couple dozen hung threads matching my previous post. We added the parameter to the clients though... I thought that made sense, but we're going to try it with the server, too.

Does anyone have any tips on how to more closely inspect what it is that's hanging up those threads? What confuses me is that the Terracotta server isn't overly stressed out on CPU, network, or memory when this is happening. I just have a hard time accepting the idea that the ColdFusion client is unable to contact or get a response back from Terracotta - if that's how I should be interpreting these hung threads.
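
One way to look more closely at hung threads from inside the JVM is the standard ThreadMXBean, which can report what each thread is waiting on and, for exclusively held local locks, which thread owns it. This is only a sketch: WaitingThreadReport and describeWaiters are made-up names, and as far as I know the owner is reported only for exclusive holders (not for read-lock holders) and only for locks inside the same JVM, so a clustered Terracotta lock held on another node would not show an owner here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ehcache or Terracotta.
final class WaitingThreadReport {

    /** Lists every thread currently waiting on a lock, with the owner when the JVM reports one. */
    static List<String> describeWaiters() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for monitor/synchronizer details if this JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        List<String> lines = new ArrayList<String>();
        for (ThreadInfo info : infos) {
            if (info == null || info.getLockName() == null) {
                continue; // this thread is not waiting on any lock
            }
            String owner = info.getLockOwnerName(); // null when no exclusive owner is known
            lines.add(info.getThreadName() + " [" + info.getThreadState() + "] waits on "
                    + info.getLockName() + " owned by " + (owner == null ? "<unknown>" : owner));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeWaiters()) {
            System.out.println(line);
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Deadlock detection covering java.util.concurrent locks where supported.
        long[] deadlocked = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        System.out.println("deadlocked threads: " + (deadlocked == null ? 0 : deadlocked.length));
    }
}
```

If many threads wait on the same lock name and an owner is listed, the owner's own stack trace is usually the next thing to look at.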
targit

journeyman

Joined: 11/18/2010 01:17:53
Messages: 10
Offline

We tested JDK 1.6.0_22. Same issues.

Now we will try the JVM option -XX:+UseMembar.

I'll report the results.
targit

journeyman

Joined: 11/18/2010 01:17:53
Messages: 10
Offline

We have the same problems with -XX:+UseMembar.
The app ran for 5 days with no problems but then crashed with the same symptoms:
many threads waiting to be woken up.

sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:53)
net.sf.ehcache.constructs.blocking.BlockingCache.put(BlockingCache.java:204)
de.pantarhei.webdb.caching.CacheServiceBean.put(CacheServiceBean.java:204)

See the attached thread dump.

Any ideas on how to solve this problem?


targit

PS: I wonder why the JDK's ConcurrentHashMap, which uses the same ReentrantLock mechanism, does not block. Maybe a wake-up call (something like notifyAll()) is being missed?
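
On the wake-up question: with java.util.concurrent locks there is no separate notify step; queued threads are released when the current holder calls unlock(), and they stay parked for as long as any holder has not done so. The sketch below only illustrates that behavior with a plain ReentrantReadWriteLock. It does not touch Ehcache's internal locks, and LockQueueProbe and summarize are made-up names.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration only; not Ehcache code.
final class LockQueueProbe {

    /** One-line summary of who holds the lock and roughly how many threads are queued behind it. */
    static String summarize(ReentrantReadWriteLock lock) {
        return "writeLocked=" + lock.isWriteLocked()
                + " readHolds=" + lock.getReadLockCount()
                + " queued=" + lock.getQueueLength(); // getQueueLength() is an estimate
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock(); // simulate a hold that has not been released yet

        Thread writer = new Thread(() -> {
            lock.writeLock().lock(); // parks here, like the threads in the dump
            lock.writeLock().unlock();
        }, "writer");
        writer.start();

        while (!lock.hasQueuedThread(writer)) {
            Thread.sleep(10); // wait until the writer is queued
        }
        System.out.println("while held:    " + summarize(lock));

        lock.readLock().unlock(); // releasing the hold is what wakes the queued writer
        writer.join();
        System.out.println("after release: " + summarize(lock));
    }
}
```

If the writer in a real dump never proceeds, this suggests looking for a thread that still holds the lock rather than for a lost notification, although a JVM-level bug as discussed earlier in the thread cannot be ruled out by this sketch.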
Attachment: threaddump.txt (654 Kbytes)

abellas

neo
Joined: 11/19/2010 11:18:10
Messages: 4
Location: Orlando, FL
Offline

While we have not directly solved the problem, we have a workaround. The hung threads were occurring after large mark-sweep GCs in ColdFusion. Rather than focus on the hung threads, we focused on getting the GC cycles under control.

Moving to concurrent GC (-XX:+UseConcMarkSweepGC) and altering the ratio to increase the size of the young generation helped us prevent the large mark sweeps that were resulting in the hung threads.
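
To check whether hangs line up with GC activity in another setup, the JVM's collector statistics can be sampled from inside the application. This is a rough sketch with the standard GarbageCollectorMXBean API; GcWatch, totalGcMillis and snapshot are made-up names, and the 5-second interval and 12 samples are arbitrary choices.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for correlating hangs with GC activity.
final class GcWatch {

    /** Sum of accumulated collection time in milliseconds across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** One entry per collector: name, number of collections, accumulated time. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) throws InterruptedException {
        long previous = totalGcMillis();
        for (int i = 0; i < 12; i++) { // about one minute of samples
            Thread.sleep(5000);
            long current = totalGcMillis();
            System.out.println("GC time in last 5 s: " + (current - previous) + " ms " + snapshot());
            previous = current;
        }
    }
}
```

A sample in which the GC time approaches the length of the interval, shortly before threads pile up, would point in the same direction as the observation above.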

So the mystery still stands, and it's something we'll work toward in order to understand that better. In the meantime, our immediate problem is solved.
targit

journeyman

Joined: 11/18/2010 01:17:53
Messages: 10
Offline

We are already using -XX:+UseConcMarkSweepGC, but it doesn't help.

We plan to fall back to 1.6.2 :(
steve

ophanim

Joined: 05/24/2006 14:22:53
Messages: 619
Offline

Does anyone have a reproducible case that we can take a look at? Would love to help track it down

alexsnaps

consul

Joined: 06/19/2009 09:06:00
Messages: 484
Offline

Sorry to ask the obvious, but are you sure no code path does a get() on the BlockingCache and then doesn't do a put() in case of a cache miss?
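
To make the question concrete: with a blocking cache, a get() that misses leaves the calling thread holding the lock for that key until the same thread calls put(). The sketch below is a toy model built only on java.util.concurrent, not Ehcache's implementation; ToyBlockingCache, CacheCaller and lookup are made-up names. It shows how wrapping the load in try/finally keeps an exception in the loader from leaking the lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Toy stand-in for a blocking cache: a miss in get() leaves the caller holding the key's lock. */
final class ToyBlockingCache {
    private final ConcurrentHashMap<String, Object> store = new ConcurrentHashMap<String, Object>();
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<String, ReentrantLock>();

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    /** Returns the cached value, or null with the key's lock still held by the caller. */
    Object get(String key) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        Object value = store.get(key);
        if (value != null) {
            lock.unlock(); // hit: nothing to load, release immediately
        }
        return value;
    }

    /** Stores the value (null means "nothing to cache") and releases the lock taken by a missed get(). */
    void put(String key, Object value) {
        ReentrantLock lock = lockFor(key);
        if (!lock.isHeldByCurrentThread()) {
            lock.lock();
        }
        try {
            if (value != null) {
                store.put(key, value);
            }
        } finally {
            lock.unlock();
        }
    }

    boolean isLocked(String key) {
        return lockFor(key).isLocked();
    }
}

final class CacheCaller {
    /** Read-through lookup that always pairs a missed get() with a put(), even if the loader throws. */
    static Object lookup(ToyBlockingCache cache, String key, Function<String, Object> loader) {
        Object value = cache.get(key);
        if (value != null) {
            return value;
        }
        Object loaded = null;
        try {
            loaded = loader.apply(key);
            return loaded;
        } finally {
            cache.put(key, loaded); // runs on success and on failure, so the lock is always released
        }
    }
}
```

In this model, a code path that returns or throws between get() and put() leaves the key locked, and every later thread asking for that key parks behind it. With the real BlockingCache, the release after a failed load is usually written as a put of an Element with a null value; please verify that against the Javadoc of your Ehcache version.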

Alex Snaps (Terracotta engineer)
 