[Logo] Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Messages posted by: saratbeesa  XML
Profile for saratbeesa -> Messages posted by saratbeesa [14]
We have run into a strange issue lately using the SelfPopulatingCache construct in ehcache 2.1.0.

We started getting EOFExceptions consistently when trying to read elements from the disk store. I'm not sure of the real reason behind this, but it seems to happen while de-serializing the Element object from the byte[]. Here is the stack trace of the error:

Code:
 Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2776)
 at java.io.ObjectInputStream.readInt(ObjectInputStream.java:950)
 at net.sf.ehcache.Element.readObject(Element.java:797)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1849)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
 at net.sf.ehcache.store.compound.factories.DiskStorageFactory.read(DiskStorageFactory.java:264)
 at net.sf.ehcache.store.compound.factories.DiskPersistentStorageFactory.retrieve(DiskPersistentStorageFactory.java:201)
 


With this exception stack trace in my server logs, I see two strange behaviors on the server:

  • The thread responsible for populating the element in the SelfPopulatingCache gets stuck in the WAITING state forever.
  • I was not able to shut down the cache manager at all; I had to kill the process forcibly.

    As usual, I dug into the code and found that ehcache acquires a read lock on the key object before reading it from the actual cache. If a runtime error is thrown before the lock can be released (in this case an EOFException), control jumps straight to the catch block, which tries to acquire a write lock on the same key in order to insert a null element into the cache. I believe this is what makes the thread wait forever: the write-lock acquisition blocks because the thread itself still holds the read lock on that key. I have made a fix along these lines...

    Original Code :

    Code:
             Sync lock = getLockForKey(key);
             acquiredLockForKey(key, lock, LockType.READ);
         Element element = cache.get(key);
             lock.unlock(LockType.READ);
     


    Patched Code:

    Code:
             Sync lock = getLockForKey(key);
             acquiredLockForKey(key, lock, LockType.READ);
             Element element = null;
              try
              {
                 element = cache.get(key);
              }
              finally
              {
                 lock.unlock(LockType.READ);
              }
     


    With this fix, the unthinkable happened. Not only did the thread resume execution successfully, I also stopped getting the EOFException in the first place and could no longer reproduce the failure. I tried both the original jar and the patched jar against the same use case: the original jar fails every time, and the patched jar works every time.

    Even if the results of my experiment are inconclusive, it makes sense to put the lock/unlock code inside a try/finally block, just to make sure we don't hang the JVM when a runtime error occurs while reading the element from ehcache. What do you think?
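    The hang described above can be reproduced without ehcache at all. The following is a standalone sketch using plain java.util.concurrent (not ehcache's Sync API; the class and method names are illustrative): a read lock that leaks on an exception blocks the catch block's later write-lock request, because a held read lock cannot be upgraded.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: a read lock leaked on an exception blocks the
// catch block's attempt to take the write lock on the same key.
public class LeakedReadLockDemo {

    // Original flow: the read lock is never released when the read throws,
    // so the write-lock request in the catch block can never succeed.
    static boolean leakyGet(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            throw new RuntimeException("simulated EOFException during cache.get");
        } catch (RuntimeException e) {
            // tryLock stands in for the blocking acquire; it returns false
            // because this thread still holds the (non-upgradable) read lock
            return lock.writeLock().tryLock();
        }
    }

    // Patched flow: the read lock is released in a finally block before the
    // write lock is requested, so the acquisition succeeds.
    static boolean safeGet(ReentrantReadWriteLock lock) {
        boolean readFailed = false;
        lock.readLock().lock();
        try {
            throw new RuntimeException("simulated EOFException during cache.get");
        } catch (RuntimeException e) {
            readFailed = true;
        } finally {
            lock.readLock().unlock();
        }
        return readFailed && lock.writeLock().tryLock();
    }
}
```

    With a blocking acquire instead of tryLock, the first variant is exactly the forever-WAITING thread from the bullet list above.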

    Regards,
    Sarat kumar Beesa.
  • Hi All,

    I have been tuning our disk store to improve concurrency while reading elements from disk. I found in the XSD that there is an attribute called 'diskAccessStripes' which can improve the performance of both reads and writes in the disk store.

    However, I got curious, dug into the code, and found that this attribute is honored only by the legacy DiskStore, which is enabled only when we add the system property Code:
    -Dnet.sf.ehcache.use.classic.lru=true
    to our server startup script.

    Is it true that this feature is available only with the legacy disk store? Is it also possible to use it with the new DiskStorageFactory?

    Regards,
    Sarat kumar Beesa.
    I have finally found the cause of this exception. The problem is with the data structure we store in ehcache: it contains many collections and Maps that are not synchronized. While the ehcache spool worker serializes the object into the stream, if another thread modifies one of those collections or Maps, the result is the 'OptionalDataException'.

    This is also why the error is random in nature and surfaces at random places in the code: it only happens when serialization coincides with an update of the object. So the recommended solution is to synchronize access to these collection and Map objects. Once they were synchronized, the problem did not happen again.
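    One way to apply that fix is to never hand a live, concurrently mutated collection to ObjectOutputStream, and instead serialize a snapshot taken under the same lock that writers use. A minimal sketch (class and method names are ours, not ehcache's):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Sketch of the fix: serialize a defensive copy taken while holding the
// collection's monitor, so writers using the same lock cannot interleave
// with the spool worker's serialization.
public class SnapshotSerializer {

    static byte[] serializeSnapshot(List<String> list) throws IOException {
        List<String> snapshot;
        synchronized (list) {
            // copy under the lock; the stream then sees a stable object graph
            snapshot = new ArrayList<>(list);
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(snapshot);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static List<String> deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (List<String>) ois.readObject();
        }
    }
}
```

    Writers must synchronize on the same list object (e.g. a Collections.synchronizedList wrapper) for this to hold.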
    We too faced a similar issue while working with the ehcache standalone server. Our JMeter test results showed that the downloaded element size (as a byte[]) is always 37KB, no matter what the actual element size is. For example, even if the actual Element object is 10KB, the remaining 27KB are junk bytes.

    So we started digging into the code and found that this happens while serializing the elements in the 'MemoryEfficientByteArrayOutputStream' class. This class tracks object sizes in a static variable and uses the last value as a best guess for the size of the next object to be serialized. This approach can make serialization faster because there are no incremental re-allocations of the serialized byte[].

    In our case, one of the objects we put in the cache was 37KB, and once we downloaded that object from the ehcache server, the stream started assuming the same size for all subsequent cache requests.
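    The strategy (and the symptom) can be sketched in a few lines; this is an illustration of the idea, not ehcache's actual source, and the class name is ours:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the size-guessing strategy: the last observed serialized size
// is kept in a static and used as the initial buffer capacity for the
// next stream, so the common case needs no incremental re-allocation.
public class SizeGuessingStream extends ByteArrayOutputStream {

    private static final AtomicInteger lastSize = new AtomicInteger(64);

    public SizeGuessingStream() {
        super(lastSize.get()); // best guess: the previous element's size
    }

    @Override
    public void close() throws IOException {
        lastSize.set(size()); // remember this element's size for next time
        super.close();
    }

    // Handing out the raw backing buffer instead of toByteArray() is the
    // kind of shortcut that produces the symptom above: a 10KB element
    // arrives padded to the 37KB guess with junk tail bytes.
    public byte[] rawBuffer() {
        return buf;
    }
}
```

    The guess itself is harmless; the junk bytes appear only if the untrimmed buffer, rather than a copy of the first size() bytes, is what gets sent to the client.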

    pmcreddy,

    The same logic is used when serializing elements to disk as well, so it makes sense that you see a large difference in disk space utilization. Try estimating the size by taking the largest object in the cache and multiplying it by the number of elements in the cache; it should come close to the 4GB value you're seeing now.

    FYI, We're currently using ehcache 2.1.0
    Latest update on this ...

    I observed that if we don't modify the object after inserting it into ehcache, everything works fine. If we modify the object in any way, it leads to this exception.

    For example, I have a really huge data structure that I want to maintain in Ehcache; it is basically one huge object with several objects, lists, and maps inside it. We load this object from the database in steps, each step populating the object little by little until, at the end of all the steps, the object is ready. The problem is that I store the object in ehcache first and then update it by reading it back in the subsequent steps.

    Can the disk store handle this amount of pressure? FYI, I'm loading about 100K objects from the database.

    Following is the cache declared in the ehcache xml

    Code:
     <cache name="service_cache" maxElementsInMemory="10000"
     		maxElementsOnDisk="10000000" eternal="true" overflowToDisk="true"
     		diskSpoolBufferSizeMB="20"
     		diskPersistent="true" memoryStoreEvictionPolicy="LRU" statistics="false" />
     

    We have migrated our Ehcache implementation to the latest 2.1. We observed that 2.1 comes with great performance improvements and is blazingly fast compared to 1.6.2. But the problem we're facing now is that Ehcache 2.1 fails while de-serializing the Element object from the disk store, with the following exception:

    Code:
     20:12:23,906 ERROR [STDERR] net.sf.ehcache.CacheException: java.io.OptionalDataException
     20:12:23,906 ERROR [STDERR]     at net.sf.ehcache.store.compound.factories.DiskPersistentStorageFactory.retrieve(DiskPersistentStorageFactory.java:209)
     20:12:23,906 ERROR [STDERR]     at net.sf.ehcache.store.compound.factories.DiskPersistentStorageFactory.retrieve(DiskPersistentStorageFactory.java:59)
     


    I repeated the same case with the Ehcache 1.6.2 API and it works without any problem. However, I noticed that the Ehcache 2.1 API does custom serialization and de-serialization of the Element object, adding primitive values (two integers per Element) to the stream. Ehcache 1.6.2 doesn't do this, and my guess is that this is the reason it worked.

    The documentation of the OptionalDataException class says the exception signals an object read that fails because of unread primitive data in the stream, and this is exactly the kind of primitive handling the code does while de-serializing the Element object.

    Following is the code snippet that shows this...

    Code:
     private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
             in.defaultReadObject();
             elementEvictionData = new DefaultElementEvictionData(TimeUtil.toMillis(in.readInt()), TimeUtil.toMillis(in.readInt()));
         }
     


    I also checked the code for DefaultElementEvictionData; it's just a POJO with two millisecond fields. So I was wondering: why can't we serialize that object here instead of serializing the primitives?

    Is my analysis correct ? Has anyone faced the same problem before ?
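    For what it's worth, the failure mode is easy to reproduce with plain java.io, independent of ehcache. This is a generic sketch of a writer/reader mismatch (our own demo class, not ehcache code): writeObject puts a primitive into the stream, readObject asks for an object while that primitive is still pending, and deserialization fails with OptionalDataException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Generic repro of an OptionalDataException caused by a writer/reader
// mismatch in custom (de)serialization methods.
public class MismatchDemo {

    static class PrimitiveFirst implements Serializable {
        private static final long serialVersionUID = 1L;
        transient String name = "x";

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(7);          // primitive written before the object
            out.writeObject(name);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // BUG: forgets in.readInt(); the pending primitive data makes
            // this readObject() call throw OptionalDataException
            name = (String) in.readObject();
        }
    }

    static Object roundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new PrimitiveFirst());
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return ois.readObject();
        }
    }
}
```

    When writer and reader consume the primitives symmetrically, as Element.readObject does against Element.writeObject, the round trip is fine; the exception suggests the stream contents and the reader got out of step somehow.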

    Thanks and Regards,
    Sarat kumar Beesa.
    Here's what I know about Ehcache.

    >>1. Does ECache run in its own JVM or the client/App server JVM? If on separate JVM can it run on separate machine - then what is the communication protocol from client to ECache.

    If you're asking whether we can set up Ehcache as a standalone server with many thin clients connecting to it to fetch cache data, then the answer is yes: Ehcache does support a standalone installation. The supported communication protocols are REST and SOAP.

    >> 2. What governs the size of in-memory cache- the configuration only talks about number of objects. Depending on size of object the memory consumption would vary. The ultimate memory constraint would be JVM memory size 2GB in 32 bit?

    Right now, you can only configure the number of elements that can be stored in the memory store. I don't think Ehcache has configuration to set thresholds on memory usage; it'd be cool to have that feature, though!

    >>3. Distributed (or replicated cache) - does each node have all data - or there is some concept of partitioning of data?

    Out of the box, Ehcache Core supports a replicated cache. If you're looking for a distributed cache, you might want to check out its integration with Terracotta.


    Sarat

    Thanks Alex,

    It's an interesting solution, though; I hadn't thought in that direction at all. I'll give it a try.

    By the way, I learned that ActiveMQ also has a similar problem, and they solved it by integrating with the Spring XML itself: we can literally paste the ActiveMQ XML inside the Spring XML and declare it as configuration to the ActiveMQ API.

    Refer to section 'Using Spring 2.0' @ http://activemq.apache.org/how-do-i-embed-a-broker-inside-a-connection.html

    We too observed the same problem some time back. I believe this happens due to the heavy disk fragmentation caused by frequent disk swaps.

    The Ehcache disk store maintains a dedicated 'free space' list holding references to free regions ('holes', I'd call them) that can be re-used when a new object comes in. When a new object is pushed to disk, it checks whether any free region is big enough to hold it; if none is, the object is simply appended at the end of the data file.

    What we found out later is that an object read from disk was modified by the application, and when it was spooled back it no longer fit in its original place because its size had changed (we have Strings as member variables :) ), so it was always appended at the end of the file, thereby increasing its size.

    It turns out there is no de-fragmentation in Ehcache, and in fact the free-node list is written into the .index file so that the free space can be recovered for recycling after a restart.

    To avoid this problem, I'd recommend increasing maxElementsInMemory to a higher value so that the number of disk swaps is reduced, and avoiding frequent flushes. However, we still haven't figured out a work-around that fixes the problem completely. I was thinking we should have a mechanism to de-fragment the free-space list (maybe re-shuffling the objects so that the free regions are pushed to the end of the file), but that has performance implications as well.
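    The growth pattern described above can be demonstrated with a toy model of a first-fit free-space list (illustrative only, not the actual ehcache DiskStore code): freed regions are recycled only when a new element fits inside one of them, so a grow-on-update workload makes the file length climb even though the live data volume is roughly constant.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a first-fit free-space list with no coalescing and no
// compaction, mirroring the behavior described in the post.
public class FreeListModel {
    private final List<int[]> freeList = new ArrayList<>(); // {offset, size}
    private int fileLength = 0;

    // Returns the offset where a block of 'size' bytes is placed.
    int allocate(int size) {
        for (int[] hole : freeList) {
            if (hole[1] >= size) {   // first fit: carve from this hole
                int offset = hole[0];
                hole[0] += size;
                hole[1] -= size;
                return offset;
            }
        }
        int offset = fileLength;     // no hole fits: append to the file
        fileLength += size;
        return offset;
    }

    void free(int offset, int size) {
        freeList.add(new int[] {offset, size}); // no coalescing / defrag
    }

    int fileLength() {
        return fileLength;
    }
}
```

    Every update that grows an element by even one byte abandons its old region and appends, which is exactly why String-bearing elements inflate the data file.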

    What do you guys think of that ?

    Thanks and Regards,
    Sarat.
    Hi all,

    We have implemented the Ehcache solution for caching within our product (version 1.6.2), and we manage to inject the Ehcache cache manager entirely through Spring's application context XML, which works really well.

    Now we want to use the new 'cache writer' feature introduced in Ehcache 2.0, but we are stuck with a configuration problem. We have a database at the backend that acts as the master copy of all our data, and we want to use Ehcache's write-through mechanism to keep our cache in sync with the database.

    Since the write-through implementation factory and classes have to be defined in the ehcache XML itself, we are not able to access our DAO API from within the writer class, because all our POJOs are defined in the Spring configuration XML. For example, we can't get a DB connection from the shared datasource, since it is defined in the Spring XML.

    In short, we cannot access Spring beans from inside the ehcache XML. Does Ehcache support this?

    Has anyone faced similar problems before?
    Is there a work-around that we can use ?

    Any pointers here would be of great help to us..

    Sarat kumar.
    OK, the fact is that ehcache cannot recover cache data after an abnormal system crash, unless you use Terracotta as a backbone for your application.

    However, it recovers cache data perfectly if you shut down your server normally and restart it.

    It might also work if you periodically flush the cache from within your program (or at least at significant places in your business logic). Use this option sparingly, as it leads to performance degradation if you flush your cache more often than you should, because all requests will then go directly to the 'synchronized' DiskStore.
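    The periodic-flush idea can be wired up with a plain scheduler. In this sketch the Runnable is a stand-in for a call like cache.flush() on a diskPersistent cache; the class name and the interval are our own choices, and the scheduling itself is just java.util.concurrent:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: run a flush task at a fixed interval on a single background
// thread, and stop it cleanly on shutdown.
public class PeriodicFlusher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public PeriodicFlusher(Runnable flush, long periodMillis) {
        // Keep the period generous: every flush funnels work through the
        // synchronized DiskStore, so flushing too often hurts throughput.
        scheduler.scheduleAtFixedRate(flush, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdown();
    }
}
```

    Flushing at business-logic checkpoints instead of on a timer gives the same durability with fewer DiskStore round trips, when your code has natural "transaction complete" points.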


    Lately, we have witnessed a curious behavior of Ehcache in our production system: the size of the *.data file keeps growing whenever we flush the cache.

    I searched Google for answers but didn't find any satisfactory solutions, so I decided to jump into the code and see what's happening. I found that ehcache maintains a free-block list in the DiskStore, which is later used for storing other elements on disk; after claiming file space from the OS, ehcache does not release it back immediately. It turns out that if an element doesn't fit into any free block, it is simply appended at the end of the file.

    In our system, cache elements are updated from within our server logic and their sizes keep varying. Most of them grew past what the free-block list can accommodate, so the file size keeps growing continuously.

    Is there a mechanism through which we can signal ehcache to de-fragment and compact the data file back to its original size?

    FYI, in one of our use cases the data file had grown to 350MB, and when we cleared all the data and loaded it again it came down to 30MB. That's quite an amount of fragmentation, isn't it?
    Greg,

    You're right indeed, we don't need to shut down the JVM to achieve this. I tried shutting down the cache manager and re-creating it, and it seems to work. Thanks for the info.

    There was a small problem, though: while the cache manager is shutting down, it takes a while (our cache is kind of large, 500K users in memory) to flush all its elements to disk. Throughout this time the server rejects requests, since any operation on the cache or CacheManager throws an error.

    Is there a way to avoid this? In fact, this is why I asked for a re-load of the configuration instead of a re-start of the cache manager. But I suppose there's no such feature in ehcache (as per amiller), am I right?
    We're using the Ehcache RMI replication model in our web application. Right now we're trying to tune ehcache attributes like the number of elements in memory, TTL, etc. to best suit our application's runtime behavior.

    The problem is that every time we change the configuration, we have to restart the container to make the CacheManager load the latest ehcache.xml. Since we have many other subsystems that initialize at server startup, this consumes a lot of time, every time.

    Is there a way for Ehcache to recognize changes to the ehcache.xml file and reload its cache manager without restarting the server?

    I tried doing it programmatically by creating a new CacheManager with the same ehcache.xml, but instead of using the same diskstore path, it created an 'ehcache_created_<<Timestamp>>' folder and stored all the cache files under it. This is not desirable, because the cache data stored under the original diskstore path is not accessible to the newly created cache manager, so I can't simply replace the existing CacheManager with the new object.

    Any help or pointers here would be very helpful.

    Regards,
    Sarat kumar.
    Powered by JForum 2.1.7 © JForum Team