We recently ran into a strange issue while using the SelfPopulatingCache construct in ehcache 2.1.0.
We started getting EOFExceptions consistently when trying to read elements from the disk store. I'm not sure of the real cause, but it seems to happen while deserializing the Element object from the byte[]. The stack trace of the error follows.
Code:
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2776)
at java.io.ObjectInputStream.readInt(ObjectInputStream.java:950)
at net.sf.ehcache.Element.readObject(Element.java:797)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1849)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at net.sf.ehcache.store.compound.factories.DiskStorageFactory.read(DiskStorageFactory.java:264)
at net.sf.ehcache.store.compound.factories.DiskPersistentStorageFactory.retrieve(DiskPersistentStorageFactory.java:201)
With this exception stack trace in my server logs, I see two odd behaviors on the server:
The thread responsible for populating the element in the SelfPopulatingCache gets stuck in the WAITING state forever.
I was not able to shut down the cache manager at all; I had to kill the process forcibly to regain control.
As usual, I dug into the code and found that ehcache acquires a read lock on the key before reading it from the underlying cache. If a runtime error is thrown before the lock is released (in this case, one caused by the EOFException), control goes straight to the catch block, which tries to acquire a write lock in order to insert a null element into the cache. I believe this is what makes the thread wait forever: the read lock on that key was never released, so the write lock is never granted. I made the following fix.
Original Code :
Code:
Sync lock = getLockForKey(key);
acquiredLockForKey(key, lock, LockType.READ);
Element element = cache.get(key);
lock.unlock(LockType.READ);
Patched Code:
Code:
Sync lock = getLockForKey(key);
acquiredLockForKey(key, lock, LockType.READ);
Element element = null;
try
{
element = cache.get(key);
}
finally
{
lock.unlock(LockType.READ);
}
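The hang itself can be reproduced in isolation. The following is only a minimal sketch, not ehcache code: it assumes the Sync lock behaves like the JDK's ReentrantReadWriteLock, which does not allow a thread holding the read lock to acquire the write lock. The class name, the method names and failingRead() are hypothetical stand-ins for the cache read that throws, and the write lock is attempted with a timeout so the demo returns instead of waiting forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LeakedReadLockDemo {

    // Hypothetical stand-in for cache.get(key) failing with a runtime error.
    static Object failingRead() {
        throw new IllegalStateException("simulated read failure");
    }

    // Tries to take the write lock for up to 200 ms; true if it was granted.
    static boolean tryWriteLock(ReentrantReadWriteLock lock) {
        try {
            if (lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                lock.writeLock().unlock();
                return true;
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Shape of the original code: the unlock is skipped when the read throws.
    static boolean writeLockAvailableAfterUnsafeRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        try {
            lock.readLock().lock();
            failingRead();
            lock.readLock().unlock(); // never reached
        } catch (IllegalStateException e) {
            // the catch block is where the write lock is requested
        }
        return tryWriteLock(lock);
    }

    // Shape of the patched code: the read lock is always released in finally.
    static boolean writeLockAvailableAfterSafeRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        try {
            lock.readLock().lock();
            try {
                failingRead();
            } finally {
                lock.readLock().unlock();
            }
        } catch (IllegalStateException e) {
            // the read lock has already been released at this point
        }
        return tryWriteLock(lock);
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + writeLockAvailableAfterUnsafeRead());
        System.out.println("safe: " + writeLockAvailableAfterSafeRead());
    }
}
```

In the unsafe variant the write lock request times out because the same thread still holds the read lock; with a plain lock() call instead of tryLock() it would block indefinitely, which matches the WAITING thread described above.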
With the patched code in place, the unthinkable happened. Not only did the thread resume its execution successfully, but I also stopped getting the EOFException in the first place, and I was unable to reproduce the problem after the fix. I have tried both the original jar and the patched jar: for the same use case, the original jar fails every time and the patched jar works every time.
Even if the results of my experiment are inconclusive, it does make sense to put the unlock inside a try/finally block, just to make sure we don't hang the thread if there's a runtime error while reading the element from ehcache. What do you guys think?
Regards,
Sarat kumar Beesa.