Terracotta Discussion Forums (legacy read-only archive)
Messages posted by: cdennis
Hi,

First let me answer your specific questions:

1) There's nothing in the server lock dump that will help you figure that out. All that can be read from the lock you are referencing is that Client-48 is holding a greedy (aka VM-level) write lock on it, and that ThreadID-1 in Client-62 is trying to grab a read lock on the same lock. If you want to figure out what's going on on the client side you'll need a client dump (this may already be produced in the Terracotta client logs when you trigger the server dump).

2) What reasoning brought you to the number 10 for the number of locks you expect to see? The presence of a lock in the dump here doesn't mean a thread is holding it. It could be that the client is holding the lock at the VM level, so that a lock request from a thread on that client doesn't require the client to talk to the server for the lock to be awarded. So if you have a small number of threads that lock on a large number of keys, you can end up with each client holding a large number of VM-level locks (see the sketch after this list). The VM-level holds are slowly gc'ed back to the server, so the lock count is usually a function of the number of unique locks used in a given time period.

3) The ThreadID -9223372036854775808 is Long.MIN_VALUE; it is used to identify lock holds that are VM-level and not specific to any one thread.
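As a sketch of how the situation in point 2 arises (hypothetical DSO-style code; the 'manyKeys' collection is made up):

import java.util.List;

public class GreedyLockSketch {
    private List<Object> manyKeys;   // assumed shared/clustered collection

    public void touchAll() {
        for (Object key : manyKeys) {
            synchronized (key) {   // under DSO each shared object's monitor
                                   // maps to a clustered lock
                // ... read or mutate shared state ...
            }
            // The thread's hold ends here, but the client keeps a greedy
            // VM-level hold on the lock until the server's lock GC recalls it.
        }
    }
}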

If you can attach the full set of client and server logs (with all the dump information from both, including the thread dumps) then we should be able to track down what is happening here.

Chris
To my knowledge no one has performed such a comparison (although it is an interesting idea), but I can try to theorize (without going into detail on the implementation of BigMemory) about what you might see in such a test.

The three big problems I'd expect you to face with open source (community edition) Ehcache disk stores are:

1) In open source, only the values are stored on disk; the keys and the metadata mapping keys to values are still stored in heap (which is not true for BigMemory). This means the heap would still be the limiting factor on cache size.
2) The open source disk store is designed to be backed by a single disk (conventionally a spinning disk, although some people do use SSDs now), so the backend is less concurrent (especially with regard to writing) than Enterprise BigMemory, since the bottleneck is expected to be at the hardware level.
3) The serialization performed by the open source disk store is less space efficient, so serialized values have much larger overheads.
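For reference, the kind of open source configuration being compared here looks roughly like this (a sketch using the Ehcache 2.x programmatic API; the cache name and sizes are made up):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;

public class DiskStoreSketch {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        Cache cache = new Cache(
            new CacheConfiguration("exampleCache", 10000) // 10,000 Elements on heap
                .overflowToDisk(true)                     // values spill to disk...
                .maxElementsOnDisk(5000000));             // ...but their keys stay on heap
        manager.addCache(cache);
    }
}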

Hope this makes sense,

Chris
Yes, an open source Terracotta server instance can utilize more than one CPU, although depending on the workload you may not be able to completely saturate all CPUs at any given time.
This possible bug is now being tracked here: https://jira.terracotta.org/jira/browse/CDV-1551
An absurdly late reply I know, but just to close this thread out so that anyone who finds it can get a complete answer.

When you have two overlapping locks with an unordered release like your example, the way things behave is that a transaction is closed and shipped to the server whenever a lock is released. So when lock1 is unlocked, any changes made either under lock1 alone or under lock1 and lock2 are shipped to the server. Then when lock2 is unlocked, any changes made since unlocking lock1 are shipped to the server.
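In code, the scenario looks something like this (a sketch; the lock and field names are made up, and the locks/fields are assumed to be clustered):

import java.util.concurrent.locks.ReentrantLock;

public class OverlappingLocks {
    final ReentrantLock lock1 = new ReentrantLock();
    final ReentrantLock lock2 = new ReentrantLock();
    int sharedA, sharedB, sharedC;   // assumed shared (clustered) state

    public void overlap() {
        lock1.lock();
        sharedA = 1;         // change under lock1 alone
        lock2.lock();
        sharedB = 2;         // change under lock1 and lock2
        lock1.unlock();      // first transaction closes: the changes to
                             // sharedA and sharedB ship to the server
        sharedC = 3;         // change made after unlocking lock1
        lock2.unlock();      // second transaction closes: the change to
                             // sharedC ships to the server
    }
}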

Hope that makes sense,

Chris
This may be a little late, as you may have already solved this problem.

My suspicion from the error messages you are seeing is that the "com.dreammatcher.model.State" cache is a Hibernate 2nd level cache, and that you are attempting to access it directly rather than through Hibernate (do you also have use_structured_entries set?). This isn't going to work, because Hibernate does not store your domain objects directly in the 2nd level cache: a domain object is intimately associated with a given session, while the 2nd level cache is designed to cache objects across multiple sessions. What's stored in the 2nd level cache is actually much closer to the database entry than to your domain objects.

When you try to access the 2nd level cache directly you get the internal representation of your domain object (when use_structured_entries is true this is a Map), which obviously cannot be cast to your domain object directly.
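Concretely, the failure looks something like this (a sketch; the cache key is made up, and the exact internal entry type varies by Hibernate version):

import com.dreammatcher.model.State;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

public class DirectAccessSketch {
    Object readDirectly(Object someEntityCacheKey) {  // hypothetical key
        Cache cache = CacheManager.getInstance().getCache("com.dreammatcher.model.State");
        Object raw = cache.get(someEntityCacheKey).getObjectValue();
        // With use_structured_entries=true, 'raw' is a Map of property names
        // to values (Hibernate's structured entry), not a State instance, so:
        return (State) raw;   // throws ClassCastException
    }
}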

Hope this makes everything clear,

Chris
As you have noticed, the open source disk stores do not store the entire cache on disk. Instead the values are stored on disk, and the keys are stored in memory, mapped to pointer objects which reference where on disk the associated value can be found. This means each on-disk mapping has an associated in-memory overhead equal to its key size plus a fixed overhead per mapping.

Assuming that you cannot increase the heap size and/or cache less of your data set, there are really only a couple of other avenues open to you. Firstly, if your keys are large objects you can reduce their size (although large keys aren't that common). Secondly, you may be able to reduce the number of mappings that you store by restructuring your 5 million key/value pairs into a smaller number of mappings with much larger value objects. The value objects in this case may even be maps themselves, which hold the original value objects (see the sketch below).
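As an illustration of the second approach (the grouping function and types here are made up, and assume your keys partition naturally into groups):

import java.util.HashMap;
import java.util.Map;
import net.sf.ehcache.Cache;
import net.sf.ehcache.Element;

public class GroupingSketch {
    // Hypothetical: derive a group key from an original key,
    // e.g. a prefix or a hash bucket.
    static String groupOf(String key) {
        return key.substring(0, Math.min(4, key.length()));
    }

    // Turn 5 million (key -> value) mappings into far fewer
    // (groupKey -> Map<key, value>) mappings, so fewer keys live on heap.
    static void load(Cache cache, Map<String, Object> originalEntries) {
        Map<String, Map<String, Object>> groups = new HashMap<String, Map<String, Object>>();
        for (Map.Entry<String, Object> e : originalEntries.entrySet()) {
            String groupKey = groupOf(e.getKey());
            Map<String, Object> group = groups.get(groupKey);
            if (group == null) {
                group = new HashMap<String, Object>();
                groups.put(groupKey, group);
            }
            group.put(e.getKey(), e.getValue());
        }
        for (Map.Entry<String, Map<String, Object>> g : groups.entrySet()) {
            cache.put(new Element(g.getKey(), g.getValue()));
        }
    }
}

The trade-off is that a whole group is serialized and faulted in or out of the disk store together, so reading a single original value means reading its entire group.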

Hope this helps,

Chris
The stack trace you quote indicates that the client is disconnected from the cluster. Is there anything suspicious in the TC client/server logs? If you attach a tar.gz of a full cluster thread dump and the relevant time slice of the client and server logs, then I may be able to get a better idea of what's going on.

It's possible that you are hitting a bug that has been fixed in a more recent release, although none springs to mind immediately. If you have the ability to upgrade to the latest release and can try to reproduce it there, that would be great.
JIRA filed: https://jira.terracotta.org/jira/browse/CDV-1538
I believe the change in behavior that you are seeing is due to an optimization put in place to avoid needlessly updating the last accessed timestamp. This avoids creating a write transaction that must be pushed over the network to the TC server array on every cache hit.

The optimization doesn't bother to update the timestamp of an Element on a clustered cache until the Element is halfway to expiry. Unfortunately, the measure for "halfway to expiry" is currently based on the cache's configured TTI, and not on the Element's TTI. In your case you have a cache configured with a 120 second TTI, which means the Element timestamps won't be updated until the Elements haven't been touched for 60 seconds. This would be fine, except for the fact that the Element you are testing on has a custom TTI of 5 seconds. Hence it expires due to its TTI before the timestamp ever gets updated.
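A sketch of the scenario (cache name, sizes, and key are made up; the Terracotta clustering configuration is omitted):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

public class CustomTtiSketch {
    public static void main(String[] args) {
        Cache cache = new Cache(
            new CacheConfiguration("example", 10000).timeToIdleSeconds(120));
        CacheManager.create().addCache(cache);

        Element e = new Element("key", "value");
        e.setTimeToIdle(5);   // custom 5 second TTI on this one Element
        cache.put(e);

        // On an unclustered cache, a get() inside every 5 second window keeps
        // this Element alive. On a clustered cache the timestamp is only
        // written once the Element has been idle for half the *cache* TTI
        // (60 seconds), so it expires 5 seconds after the put no matter how
        // often it is read.
    }
}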

I'm going to file a JIRA issue to fix this behavior so that the decision above is made based on the Element's own TTI, which will correct this behavior for custom-lifespan Elements. Note however that this will not change the behavior you are seeing to match that of an unclustered cache, where the timestamp is updated on every access.

In the short term, avoiding custom TTI/TTL Elements will obviously help; and if you do need them, arranging for the custom TTI/TTL Elements to have longer lifespans than the cache default will also 'fix' things.

Hope this clarifies things for you.

Chris

P.S. I will post back here with a link to the JIRA issue once it is filed.
Hi Augusto,

Thanks for filing EHC-805. I've just checked in a fix to trunk, which should go out in the next release of ehcache-core.
Chris
Associated JIRA item is here: https://jira.terracotta.org/jira/browse/EHC-793
You're right about this, and it was a problem that I wrestled with when writing the CompoundStore as a replacement for the previous two-store system. Unfortunately it's not possible to just look at the memory store keys, since the two stores (memory and disk) now share a single map structure (and hence a single key set). They differ only in what kind of value object the key maps to (i.e. memory maps to an Element, disk maps to a 'pointer' to a file offset). The reason we switched to this system is that it allows the flushing/faulting of values to and from disk to be perfectly coherent, and hence we can correctly implement the ConcurrentMap-style methods (putIfAbsent, for example). The keys are therefore all stored in the same 'place', and we cannot tell whether they are in memory or on disk until we look at the type of the associated value (which is what the filter effectively does; see the sketch below).
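A highly simplified sketch of that structure (illustrative only, not the actual Ehcache internals):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import net.sf.ehcache.Element;

public class CompoundStoreSketch {
    // One map backs both tiers; only the value type differs:
    //   in-memory mapping: key -> Element (the live value)
    //   on-disk mapping:   key -> marker object holding the file offset
    final ConcurrentMap<Object, Object> store = new ConcurrentHashMap<Object, Object>();

    // Finding the in-memory keys means inspecting each value's type,
    // which is effectively what the eviction-sample filter does:
    List<Object> inMemoryKeys() {
        List<Object> keys = new ArrayList<Object>();
        for (Map.Entry<Object, Object> entry : store.entrySet()) {
            if (entry.getValue() instanceof Element) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }
}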

One thing we could consider doing here is, if the memory store is small enough and the ratio of disk capacity to memory capacity is large enough, caching the in-memory keys in a secondary data structure. As long as this structure was kept reasonably consistent (i.e. no long-term drift) with the real in-memory key set, we could select our in-memory eviction sample from it instead. I'll file a JIRA detailing this problem and suggesting this tactic as a possible solution. If you register for a JIRA account you will be able to watch the issue to track progress.
Assuming that you are using the beta release of ehcache-core-ee, this is a known bug. It is already fixed, and the fix will ship in the upcoming GA release of ehcache-core-ee.
I think I can see the potential for this to occur in pre-2.1.0 ehcache-core disk store code.

The basic race that I think is possible is:

Thread 1 : Grabs a reference to the DiskElement (pointer to on disk location) for key A
Thread 2 : Removes (either explicitly or via the expiry thread) the mapping for key A and marks the associated area of the disk as free.
Thread X : Reuses the now freed area of the disk for storing the value associated with a different key.
Thread 1 : Tries to read the element pointed to by the DiskElement it is holding. If it's 'lucky' it sees the value written by Thread X (and you see a ClassCastException). If it's 'unlucky' it gets an EOFException because the serialized stream is longer than it expects (since it represents a different value). (See the sketch below.)
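A rough sketch of that read path (hypothetical types and fields; not the actual 2.0.x source):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

public class DiskRaceSketch {
    // Hypothetical stand-in for the real on-disk pointer type.
    static class DiskElement { long offset; int length; }

    Map<Object, DiskElement> diskMap;   // shared key -> pointer map
    RandomAccessFile data;              // single backing data file

    // Thread 1's read path:
    Object read(Object keyA) throws IOException {
        DiskElement pointer = diskMap.get(keyA);  // grab the pointer
        // ... window: Thread 2 removes keyA and frees its disk region,
        //     then Thread X stores a different key's value there ...
        data.seek(pointer.offset);                // seek to a stale offset
        byte[] bytes = new byte[pointer.length];
        data.readFully(bytes);                    // EOFException if the new
                                                  // stream has a different length
        return deserialize(bytes);                // or deserializes the wrong
                                                  // value -> ClassCastException
    }

    Object deserialize(byte[] bytes) { /* hypothetical helper */ return null; }
}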

If I'm right on this then you shouldn't see this failure with just the MemoryStore in 2.0.1, and you should also not see this failure with overflow-to-disk in 2.1.0+ since the internals of the disk store were changed and I believe this race is/was fixed.
 