Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Messages posted by: cdennis

I'll try to answer your questions in the order you asked them:

1. Yes, if you create a separate cache for each concurrent user, then separate offheap allocations will be made for each cache.

2. If you either remove a cache from the cache manager or shut down the cache manager, then the space used by the cache(s) will be reclaimed during the shutdown procedure.

I'm not sure I fully understand your use-case, but it seems like you're creating caches that are used by only a single thread, and only temporarily during this file-parsing process, after which you want to dispose of them. You may find things work better if you can re-use the caches you create across different parsing jobs, and potentially share them across multiple threads. That way you may not need to create so many caches with such short lifetimes.
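
If you do end up disposing of caches when a parsing job finishes, the reclamation path is just the standard CacheManager calls. A minimal sketch, assuming the Ehcache 2.x API and a hypothetical cache name:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;

    public class ParseJobCacheExample {   // hypothetical example class
        public static void main(String[] args) {
            CacheManager manager = CacheManager.getInstance();
            Cache parseCache = manager.getCache("parseJobCache"); // hypothetical cache name

            // ... use parseCache while parsing the file ...

            // Removing the cache from its manager releases the offheap space it was using.
            manager.removeCache("parseJobCache");

            // Alternatively, shutting the manager down reclaims the space used by every
            // cache it owns (only do this once you are finished with all of them).
            manager.shutdown();
        }
    }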

Chris
I've answered this question here: http://forums.terracotta.org/forums/posts/list/5481.page

Chris
The Ehcache interface has always defined the getKeys method (and its variants) as returning Lists. The main motivation for this was that in previous implementations keys could be stored multiple times in the cache (e.g. in memory and on disk). This meant the more technically correct type of Set couldn't be used, as it would force de-duplication of the keys, which would be an expense the user might not care to incur.

Many of the current Store implementations (the internal classes used to provide the actual cache storage) now present their key sets as Java Set instances. These obviously do not support indexed access, so there is an impedance mismatch between the Set returned by the Store implementation and the List returned by Ehcache. We therefore have to wrap the Set to implement as much of the List functionality as is feasible. Indexed access (especially for very large caches) is not feasible, as it would require us to make a complete copy of the key set in the Java heap and then store it in a List. What the wrapper (SetAsList) essentially exposes is a List whose only functional methods are the ones defined by Collection. It is a List by type, but a Collection by functionality; use it as a Collection and you should have no problems.
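
To make that concrete, here's a minimal sketch (assuming the Ehcache 2.x API and a hypothetical cache name) of treating the returned List purely as a Collection:

    import java.util.List;
    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;

    public class KeyIterationExample {   // hypothetical example class
        public static void main(String[] args) {
            Cache cache = CacheManager.getInstance().getCache("myCache"); // hypothetical name

            // getKeys() is typed as a List, but only the Collection-level operations
            // are guaranteed to work cheaply: iterate, check membership, take the size.
            List keys = cache.getKeys();
            for (Object key : keys) {
                System.out.println(key);
            }
            System.out.println("key count: " + keys.size());

            // Indexed access such as keys.get(0) is exactly what the wrapper cannot
            // support efficiently, so avoid it.
        }
    }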

I hope this helps clear things up,

Chris
If you could perform a full cluster dump via the dev-console and then attach the resulting client and server log files to this thread, we can have a look at the logging to see if there's anything that points us toward a cause. In particular, this will give us a complete dump of the lock-manager state on all the clients and server(s).

Thanks,

Chris

P.S. Feel free to snip out any sensitive portions of your logs.
I'm not really involved with this code, but my understanding is that the main (only?) thing you'll miss is the CPU utilization data in the dev console.
We've seen similar crashes caused by Sigar for other users on a variety of different JVMs. We're pretty certain that this is a Sigar issue; the best solution for you would be to disable the Sigar libraries (tc.property: sigar.enabled=false).
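
For reference, a tc.property like this normally goes in your tc.properties file, or can be passed as a JVM system property with the com.tc. prefix. A sketch, assuming the usual Terracotta 3.x property mechanisms:

    # tc.properties
    sigar.enabled = false

    # or, as a JVM argument on the affected process
    -Dcom.tc.sigar.enabled=false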
I have posted a report of the bug in the Oracle forums:
http://forums.oracle.com/forums/thread.jspa?threadID=2208477&tstart=0

Although, as Steve mentions, if you have a support contract you are likely to achieve far more traction than we will.
Associated TC JIRA ticket: https://jira.terracotta.org/jira/browse/CDV-1570
It would appear from our internal testing that there was a regression in R28.1 of JRockit: there is a fairly obvious bug in the NIO code in the failing versions. Now that we understand what the bug is, we can safely work around it in our own code without being constrained by any Oracle timelines. I've not personally done any work in our low-level network communications code, but I believe this bug should not occur in the 3.5.x releases (unless you are tweaking tc.properties values). If you use Hibernate, I would avoid 3.5.0 and wait for 3.5.1, which will be released very soon.

I will file a public JIRA for this so that you can track the issue directly. I'll post back here with the link shortly.

Chris
I have a fairly strong suspicion that this is a bug in JRockit. I'm going to have a look at your logs and see if I can reproduce this locally, and then if I'm right I'll try to produce a smaller test case so we can file it with Oracle.

Chris
The l1.cachemanager.enabled setting needs to be applied to the Terracotta client processes, not the server processes. This may be happening automatically for you, depending on exactly how you have set the property on the server, but it's probably something worth checking.
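
One way to be certain the setting reaches the clients (assuming the usual system-property override for tc.properties, with a placeholder for whichever value you intend) is to pass it on each client JVM's command line rather than via the server:

    # on each Terracotta client (L1) JVM, not the server
    -Dcom.tc.l1.cachemanager.enabled=<value>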
I think the best suggestion is that you try running with Hibernate's built-in Hashtable-based second-level cache (org.hibernate.cache.HashtableCacheProvider). If you still see the problem with this second-level cache provider, then the cause is most likely either a bug in Hibernate or something in your configuration or use of it.
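
For reference, a sketch of the switch, assuming Hibernate 3.x property names (adjust to wherever you keep your Hibernate configuration):

    # hibernate.properties (or the equivalent entries in hibernate.cfg.xml)
    hibernate.cache.provider_class=org.hibernate.cache.HashtableCacheProvider
    hibernate.cache.use_second_level_cache=true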

If everything works okay with the Hashtable provider, then the next course of action would be to try to create a test case so that we can reproduce this internally.

Chris
Seems like the best course of action, then, would be to cross our fingers for a repeat occurrence and grab a fresh set of logs and client/server dumps when it happens.

Chris
Having looked at the 3.4.1 release source code, I think the InterruptedException logging is a red herring. Although the logging is a little overly verbose, the interrupt is being propagated correctly and nothing looks wrong there. We'll know more when we have a full set of logs to inspect.
The simplest way to get a full cluster dump is via the "Terracotta Developer Console". With the developer console connected to your cluster, navigate to the following panel using the tree structure on the left-hand side of the GUI: Platform -> Diagnostics -> Cluster dumps. If you then click the "Take Cluster State Dump" button, you should, once the operation is complete, find a complete state dump for each client and server in their corresponding log files.

I'll have a look at the source code for the 3.4.1 release and see, firstly, whether we are remiss in handling the interrupt correctly and, secondly, if so, whether an interrupt at that point could cause problems with releasing the lock.
 