Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Messages posted by: cdennis
Yes, currently the disk store implementation used within the CompoundStore does not use a striped set of RandomAccessFiles on read, so the diskAccessStripes property does not have any effect with the new disk store. As far as I can see, however, there is no reason this could not be implemented in the new disk store. It looks like there will be some more work done on the stores in the near future, so this could potentially be added as a feature then.
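For anyone searching later: the property in question is (if I'm remembering its placement correctly) the diskAccessStripes attribute on the cache element in ehcache.xml, something like the illustrative snippet below, and as noted above it is currently ignored by the new disk store.

Code:
 <!-- Illustrative only: classic disk store striping; currently has no effect
      with the new CompoundStore-based disk store. -->
 <cache name="someCache"
        maxElementsInMemory="10000"
        overflowToDisk="true"
        diskAccessStripes="4"/>
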
The exception is coming from the allocate/free algorithm in the disk storage component of Ehcache. The disk store is trying to free a region of the file that has already been partially freed (you can see that the two regions logged partially overlap). Something is obviously at fault here, either in Ehcache or in your usage of the cache (which might imply a documentation error on our part).
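To make "partially overlapping" concrete, here is a small illustration (a hypothetical Region class, not Ehcache's internal type) of how a freed region can overlap an already-free region without being contained by it, which is the situation the allocator is objecting to:

Code:
 // Hypothetical illustration only -- not Ehcache's internal allocator code.
 final class Region {
   final long start; // first byte of the region
   final long end;   // last byte of the region (inclusive)

   Region(long start, long end) {
     this.start = start;
     this.end = end;
   }

   /** True if the two regions share at least one byte. */
   boolean overlaps(Region other) {
     return this.start <= other.end && other.start <= this.end;
   }

   /** True if this region completely contains the other. */
   boolean contains(Region other) {
     return this.start <= other.start && other.end <= this.end;
   }
 }

 // e.g. freeing [100, 299] while [200, 399] is already free: the regions
 // overlap but neither contains the other.
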

Does this failure occur reliably or regularly? If so, could you describe your usage pattern for the cache in a little more detail: what operations are being performed on the cache at the time of the failure? The log message indicates that the failure occurs during initialization of a Spring bean; what does the bean do during initialization?
Saratbeesa, you are correct that there is what I consider a bug in MemoryEfficientByteArrayOutputStream. I have created a JIRA item to track the fix here: https://jira.terracotta.org/jira/browse/EHC-754
Since the various posts here seem to be discussing subtly different issues around Ehcache storage sizes, I'm going to post a number of different responses.

Firstly: Ray, here is my analysis of the amount of heap used by a single <Integer, Integer> in-memory Ehcache mapping (all figures assume a 32-bit VM)...

Key Object: Integer (16 bytes)
Value Object: Integer (16 bytes)

Element Object: {
Obj Overhead: 8 bytes
Key Ref: 4 bytes
Value Ref : 4 bytes
version: 8 bytes
hit count: 8 bytes
ttl value: 4 bytes
tti value: 4 bytes
ElementEvictionData Ref: 4 bytes
lastUpdateTime value: 8 bytes
cacheDefaultLifespan value: 1 byte (logically 1 bit)
} : 56 bytes

DefaultElementEvictionData: {
Obj Overhead: 8 bytes
creationTime: 8 bytes
lastAccessedTime: 8 bytes
} : 24 bytes

HashEntry: {
Obj Overhead: 8 bytes
Key Ref: 4 bytes
hash value: 4 bytes
Next Ref: 4 bytes
Value Ref: 4 bytes
} : 24 bytes

So in total this gives you a minimum memory cost for each element of : 16 + 16 + 56 + 24 + 24 = 136 bytes.

For 100,000 elements this equals 13.6MB (using base-10 multipliers), or roughly 13MiB in base-2. This is 400,000 bytes below your direct heap measurement, i.e. an extra 4 bytes per element, which can easily be accounted for by the HashEntry[] table inside the cache. That is actually better than I would expect: since the hash table has a load factor of 0.75, I would expect 4/0.75 ≈ 5.3 bytes of table per HashEntry instance on average.
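For anyone who wants to reproduce this kind of direct heap measurement, here is a rough sketch (the cache name, constructor arguments and element count are illustrative, and Runtime-based measurement is only approximate even with the GC hints shown):

Code:
 import net.sf.ehcache.Cache;
 import net.sf.ehcache.CacheManager;
 import net.sf.ehcache.Element;

 public class HeapCostEstimate {
   public static void main(String[] args) throws InterruptedException {
     CacheManager manager = CacheManager.create();
     // Memory-only, eternal cache big enough that nothing is evicted.
     Cache cache = new Cache("sizing-test", 200000, false, true, 0, 0);
     manager.addCache(cache);

     long before = usedHeap();
     for (int i = 0; i < 100000; i++) {
       cache.put(new Element(Integer.valueOf(i), Integer.valueOf(i)));
     }
     long after = usedHeap();

     System.out.println("Approximate bytes per mapping: " + ((after - before) / 100000.0));
     manager.shutdown();
   }

   private static long usedHeap() throws InterruptedException {
     for (int i = 0; i < 3; i++) { System.gc(); Thread.sleep(100); } // coax a collection
     Runtime rt = Runtime.getRuntime();
     return rt.totalMemory() - rt.freeMemory();
   }
 }
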

Applying the same correction to your full-blown use case leaves us with a cache using around 2.5GB of heap. I'm obviously not aware of what else your application is doing, but it's entirely reasonable that application overhead plus GC overhead could require a 5GB heap.
I'm assuming that your test is multi-threaded here. If so, what is almost certainly happening is that the multiple accessing threads are racing to perform eviction. In effect, multiple threads all observe the memory store to be over capacity and all try to flush elements to disk, causing too many evictions. This happens especially when the number of accessing threads is similar to the number of in-memory elements.

If you're seeing this in a single-threaded test then the problem is definitely more interesting/concerning. If you can provide a simple test case then I can see if anything more serious is happening.
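For reference, the kind of multi-threaded pattern I mean is roughly the following (sizes and names are illustrative; this is a sketch of the racing scenario rather than a definitive reproduction):

Code:
 import java.util.ArrayList;
 import java.util.List;
 import java.util.concurrent.CountDownLatch;

 import net.sf.ehcache.Cache;
 import net.sf.ehcache.CacheManager;
 import net.sf.ehcache.Element;

 public class EvictionRaceSketch {
   public static void main(String[] args) throws Exception {
     CacheManager manager = CacheManager.create();
     // Small memory store with disk overflow; eternal elements.
     final Cache cache = new Cache("race-test", 16, true, true, 0, 0);
     manager.addCache(cache);

     final int threads = 16; // similar to maxElementsInMemory, which is where the race bites
     final CountDownLatch start = new CountDownLatch(1);
     List<Thread> workers = new ArrayList<Thread>();
     for (int t = 0; t < threads; t++) {
       final int id = t;
       Thread worker = new Thread(new Runnable() {
         public void run() {
           try {
             start.await();
           } catch (InterruptedException e) {
             return;
           }
           // Each thread can observe the memory store over capacity at the
           // same time and independently flush elements to disk.
           for (int i = 0; i < 1000; i++) {
             cache.put(new Element("key-" + id + "-" + i, Integer.valueOf(i)));
           }
         }
       });
       workers.add(worker);
       worker.start();
     }
     start.countDown();
     for (Thread worker : workers) {
       worker.join();
     }
     System.out.println("In-memory elements: " + cache.getMemoryStoreSize());
     manager.shutdown();
   }
 }
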

Thanks,

Chris
The fact that this fails inside the OSGi container but works fine outside makes me wonder if it could somehow be classloader related. Do you see any stack traces in your application logs (or on the console) indicating that something went wrong during the disk write? That could help us diagnose this.
I have posted a reply to your original post here:

http://forums.terracotta.org/forums/posts/list/3464.page#19351

Chris
You are quite correct that the bug is caused by a change in the internal implementation of LinkedBlockingQueue in JDK 1.6u19. I have created a JIRA issue to track progress on a future fix. Until it is fixed, the best workaround is to revert to 1.6u17, which is the version we currently run our internal testing against.

For your reference, and in case you wish to watch progress on it the JIRA I have created is: https://jira.terracotta.org/jira/browse/CDV-1472

Regards,

Chris
We believe that this is actually a bug in Google App Engine itself: the offending class, AtomicLongFieldUpdater, is on the white-list of JDK classes, yet even a minimal test case fails with the same exception on the production servers despite working perfectly in the development environment.
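For reference, the minimal test case boils down to something like the following (class and field names here are illustrative, not the exact code attached to the bug reports):

Code:
 import java.util.concurrent.atomic.AtomicLongFieldUpdater;

 public class UpdaterSmokeTest {
   // The field must be a volatile long for AtomicLongFieldUpdater to accept it.
   private volatile long counter;

   private static final AtomicLongFieldUpdater<UpdaterSmokeTest> UPDATER =
       AtomicLongFieldUpdater.newUpdater(UpdaterSmokeTest.class, "counter");

   public static void main(String[] args) {
     UpdaterSmokeTest test = new UpdaterSmokeTest();
     // This works in the development environment but fails on the
     // production App Engine runtime despite the class being white-listed.
     UPDATER.incrementAndGet(test);
     System.out.println("counter = " + test.counter);
   }
 }
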

The associated Google bug filing is:

http://code.google.com/p/googleappengine/issues/detail?id=2769

The Google bug is also linked from the TC issue:

https://jira.terracotta.org/jira/browse/EHC-617

As I stated in the TC JIRA issue, the current plan is to wait and see what Google decide to do with this bug before making changes in Ehcache. Unfortunately this means it will not be fixed in the upcoming release. However, assuming Google do fix the App Engine bug, Ehcache should (barring any other problems) then work correctly.

Hope this explains the current situation fully,

Chris
Cool, that's not a problem. The locking system changed almost entirely in 3.2, so I'm more than happy to chase a few bogus bugs in the hope of finding a real one.

Chris
The LinkedBlockingQueue in TC is just an adapted version of the JDK LBQ. Both use a ReentrantLock under the hood to provide exclusion and signalling for reading threads, so the TC version should provide the same interruptibility.

Is the failure to interrupt the taking threads happening every time, or just occasionally? If you could give me a failing test case I'll hopefully be able to figure out whether this is a problem with TC.
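For reference, the kind of standalone test that would help looks roughly like this (plain java.util.concurrent here; in your clustered setup the queue would be a shared root, so treat this purely as a sketch of the shape of the test):

Code:
 import java.util.concurrent.LinkedBlockingQueue;

 public class TakeInterruptTest {
   public static void main(String[] args) throws InterruptedException {
     final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>();

     Thread taker = new Thread(new Runnable() {
       public void run() {
         try {
           // Blocks until an element arrives or the thread is interrupted.
           queue.take();
           System.out.println("take() returned an element");
         } catch (InterruptedException expected) {
           System.out.println("take() was interrupted as expected");
         }
       }
     });

     taker.start();
     Thread.sleep(1000);  // let the taker block inside take()
     taker.interrupt();   // should cause take() to throw InterruptedException
     taker.join(5000);
     if (taker.isAlive()) {
       System.out.println("FAILURE: taker thread was not interrupted");
     }
   }
 }
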

Chris
As far as I am aware it should interrupt the client code (see https://jira.terracotta.org/jira/browse/CDV-893). If you could let me know which Terracotta version you are using, and whether the failure is easily repeatable, that would be useful.

Thanks,

Chris
There are two bugs that may be causing this problem:

1. If you are running Ehcache express (i.e. without a custom boot jar) then your cache is running in serialization mode. This means that before Terracotta stores your keys and values on the server it converts them into a detached form: values are serialized into byte arrays, and keys are serialized and then converted into Strings. Currently the clustered Ehcache instance underlying the Hibernate cache does not convert these keys back into their original form when Ehcache.getKeys() is called (via Region.toMap()). A quick way to see which case you are hitting is sketched after this list. There is already an existing JIRA item tracking this bug (https://jira.terracotta.org/jira/browse/CDV-1444).

2. If you are running in full-blown DSO mode then it is also possible that you are using an identity cache to back your Hibernate cache. In 1.7.2 there was a bug whereby even identity caches were detaching their keys before storing them in Terracotta (https://jira.terracotta.org/jira/browse/CDV-1445). This is now fixed. However, I would strongly recommend not using an identity cache with Hibernate - this will in fact be impossible in the upcoming "Darwin" release.
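As mentioned in point 1, a quick (illustrative) way to check what the underlying cache is actually storing is to dump the key classes straight from the Ehcache instance backing the Hibernate region; the cache name below is hypothetical and should be replaced with your region name:

Code:
 import net.sf.ehcache.Cache;
 import net.sf.ehcache.CacheManager;

 public class KeyTypeCheck {
   public static void main(String[] args) {
     CacheManager manager = CacheManager.getInstance();
     // Hypothetical region name; use the region/cache name from your Hibernate config.
     Cache cache = manager.getCache("com.example.domain.Account");
     for (Object key : cache.getKeys()) {
       // In serialization mode, with the CDV-1444 bug, keys come back as Strings
       // rather than as the original key objects.
       System.out.println(key.getClass().getName() + " : " + key);
     }
   }
 }
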

Hope this clears things up for you,

Chris
There is actually an outstanding JIRA open for the addition of a tim-synchronizedlist: https://jira.terracotta.org/jira/browse/FORGE-258

tim-synchronizedcollection, which you are currently using, only contains locking configuration for Collections$SynchronizedCollection and not for SynchronizedList. SynchronizedList is a subclass of SynchronizedCollection, so some of that locking already partly covers SynchronizedList. From this I assume you are calling the add(int, Object) method, which is unique to SynchronizedList, rather than add(Object), which should be correctly autolocked by the tim-synchronizedcollection locking.

You should, however, be able to add the correct locking configuration for the SynchronizedList class (and possibly its SynchronizedRandomAccessList subclass) to your own tc-config.xml in order to get things working. Have a look at the configuration snippet in the tim-synchronizedcollection code if you need an example to work from (http://svn.terracotta.org/fisheye/browse/TerracottaForge/tim-collections/trunk/tim-synchronizedcollection/src/main/resources/terracotta.xml?r=HEAD)
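As a starting point, the extra locking configuration would look something like the following (written from memory against the DSO config schema, so treat it as a sketch and check it against the terracotta.xml linked above):

Code:
 <locks>
   <!-- Autolock the methods of the JDK synchronized-list wrappers. -->
   <autolock>
     <method-expression>* java.util.Collections$SynchronizedList.*(..)</method-expression>
     <lock-level>write</lock-level>
   </autolock>
   <autolock>
     <method-expression>* java.util.Collections$SynchronizedRandomAccessList.*(..)</method-expression>
     <lock-level>write</lock-level>
   </autolock>
 </locks>
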

Hope this clears things up,

Chris
Though this isn't documented anywhere (as of yet), it is possible to use the locality API with a CDM/CSM. Although the root CDM itself is not compatible with the locality API (even though it is a TCMap through instrumentation - the real reason for the failure is a little more complex than that...), the constituent maps of the root CDM are compatible. So you can use the locality API through code like the following:

Code:
 for (Map m : cdm.getConstituentMaps()) {
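   // Each constituent map is compatible with the locality API, even though the root CDM itself is not.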
   Set orphanKeys = clusterInfo.getKeysForOrphanedValues(m);
   //do stuff here
 }


Hope this helps you...

Chris
 