Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)

persistence strategy and disk usage
arobase

Hi,

Could somebody explain to me how the persistence works, and why disk usage is so huge compared to memory usage?
When 1.9 GB of memory is used, I have 440 GB in the cache/persistence/cachedata directory.
I would like to know how to size my disk space.

My persistence configuration is <persistence strategy="localRestartable" synchronousWrites="true"/>

One clue: the data in memory is updated often.
Does the disk store some kind of history?

Thanks a lot in advance for any answers.

Benoit.
twu

What versions are you using?

You mention that the data is updated often. Is there any data alongside this frequently updated data that does not change frequently? There was an issue that we fixed around an interaction between very frequently updated data and nearly static data that might explain the size discrepancy.

Would it be possible for you to attach the ehcache.xml you used and describe a bit about what kind of load is placed on each cache?
lijie

We use bigmemory-3.7.2.jar with the config:
<persistence strategy="localRestartable" synchronousWrites="false"/>

The /persistence/cachedata directory has grown huge, for example:
513M seg000000005.frs
513M seg000000006.frs
513M seg000000007.frs
513M seg000000008.frs
513M seg000000009.frs
513M seg000000010.frs
513M seg000000011.frs
513M seg000000012.frs
513M seg000000013.frs
513M seg000000014.frs
513M seg000000015.frs
513M seg000000016.frs
513M seg000000017.frs
...

How can we control the number and size of these files?
Thanks.
Chris3x

I've run into a similar issue (cache size was exploding as I was building up the caches), and came to the same conclusion that it must be some sort of journalling issue. I rewrote the initialization to avoid lots of small changes to existing cache elements (instead inserting each element only once it was fully formed), which seemed to fix that. The issue I have now is that as updates trickle in, the disk store slowly grows (disproportionately more than I'm inserting), until it runs the machine out of disk space.
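
Roughly what I changed, in simplified form (the key, values, and cache name here are invented; this is just the shape of the fix, using the standard Ehcache 2.x Cache/Element API):

import java.util.ArrayList;
import java.util.List;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

public class InitPatterns {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        Cache cache = new Cache(new CacheConfiguration("demo", 10000).eternal(true));
        manager.addCache(cache);

        // Before: mutate-and-re-put. With localRestartable persistence
        // configured (omitted here), every put appends a fresh copy of the
        // growing value to the log: N items -> N log records for one key.
        cache.put(new Element("key", new ArrayList<String>()));
        for (int i = 0; i < 1000; i++) {
            @SuppressWarnings("unchecked")
            List<String> partial = (List<String>) cache.get("key").getObjectValue();
            partial.add("item-" + i);
            cache.put(new Element("key", partial));
        }

        // After: build the value fully, then insert once -> one log record.
        List<String> complete = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            complete.add("item-" + i);
        }
        cache.put(new Element("key", complete));

        manager.shutdown();
    }
}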


The obvious thing that occurs to me to try is to turn off the async writes for the disk store (assuming our guess about what's going wrong is correct, i.e. that multiple changes to an element before the first change can be flushed to disk are causing this behavior).

Any advice would be welcome. I would also like this mechanism explained some more, because I assume you must have some sort of compaction scheme to deal with this, and I'm wondering why it isn't working for me (as I don't deal with a lot of updates once initialised).
twu

What's your access pattern like? There's a known issue with the BigMemory Go 3.7.x line that could cause unbounded growth of the log when there's a mix between static data and dynamic data in a single log.

For example, this issue can manifest itself if you have a cache that is loaded once at the start of the application and effectively read-only after that point, while there's another cache that is constantly being overwritten.
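
In sketch form, that workload looks something like this (cache names are invented, and the localRestartable persistence configuration is omitted):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

public class MixedWorkload {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        Cache reference = new Cache(new CacheConfiguration("referenceData", 100000).eternal(true));
        Cache live = new Cache(new CacheConfiguration("liveQuotes", 1000).eternal(true));
        manager.addCache(reference);
        manager.addCache(live);

        // Loaded once at startup, effectively read-only afterwards. With
        // restartable persistence these records land in the oldest log
        // segments and stay live forever.
        for (int i = 0; i < 100000; i++) {
            reference.put(new Element("ref-" + i, "static-" + i));
        }

        // Constantly overwritten: each put appends another record to the
        // same shared log. On 3.7.x the long-lived static records could keep
        // old segments from being reclaimed, so the log grew without bound.
        for (int i = 0; i < 1000000; i++) {
            live.put(new Element("quote-" + (i % 100), Long.valueOf(System.nanoTime())));
        }

        manager.shutdown();
    }
}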

That issue has been fixed in the BigMemory Go 4.0.x line. You can retest with that to see if the issue is still there.
Chris3x

I'm using BigMemory 4.0.4.

I tried turning on synchronous writes, but it still keeps growing with updates.

Any ideas how I can debug the issue?
Chris3x

Still struggling with this, so I thought I would add some config details:

updateCheck="false"
maxBytesLocalOffHeap="10g"
overflowToDisk="false"
overflowToOffHeap="true"

Then the caches have eternal="true" and maxEntriesLocalHeap set, and synchronousWrites="false".

Could eternal="true" be causing the disk store to continue to grow? My in-memory side isn't noticeably increasing.
twu

Synchronous writes don't really have a direct effect on the size on disk relative to the size in memory. There's no write coalescing before a batch of changes gets dumped off to disk, so if you do 3 puts over the same key, we'll write down 3 puts. Where synchronous writes could hurt is the overhead incurred by having to write that often.
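
Concretely: if an element serializes to about 1 KB and you overwrite it 10 times, roughly 10 KB lands in the log for that one key, and the space only comes back once the compactor copies the single live record forward and the old segment files get deleted.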

Do you know what kind of access pattern your app is doing? Is it just writing a bunch of keys and overwriting all of them? Or overwriting bits at random? Or is it never overwriting?

Some other useful info would be to take a thread dump and see if the compactor thread is sleeping, or just not keeping up. If you could post the log files somewhere (assuming they don't contain any sensitive info) that would be useful. If not, posting a reproducible use case would be helpful.
Chris3x

Thanks for the feedback. I'll try to attach with JMX and see what the compactor thread is doing.


The access pattern is just updating a small subset of random keys and occasionally adding new ones. My test system is just updating the same keys.

So you're saying that any update should grow the disk store, but the compactor should come along and reduce that? I'm seeing rapid growth: once every cache is initialised I'm using about 8 GB of disk, but it doesn't take many updates to push this out to 20 GB+, with only ~1% of keys updated and no significant growth in value sizes, AFAIK.
Chris3x

Checking my test rig via JMX: it reports 2 compactor threads, and they spend 99.9%+ of their time in a wait state. This does look like a good lead.
twu

Yes, any update will grow the disk store. The compactor will run based on a trigger condition (time or number of replaces/removes), and run based on a policy to evacuate live data out of the oldest file(s). Old files are then deleted.
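
To make the mechanics concrete, here is a toy model of this style of log compaction. It is illustrative only, not the actual Fast Restart Store code, and it triggers on a live-to-total record ratio rather than the real time/replace/remove triggers described above:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an append-only log split into segments, plus a compactor
// that evacuates live records out of the oldest segment so its file can
// be deleted. NOT the actual FRS implementation.
public class ToyCompactor {
    static class Record {
        final String key;
        final String value;
        Record(String key, String value) { this.key = key; this.value = value; }
    }

    static final int SEGMENT_CAPACITY = 4;
    private final Deque<List<Record>> segments = new ArrayDeque<List<Record>>();
    private final Map<String, Record> live = new HashMap<String, Record>(); // latest record per key

    void put(String key, String value) {
        append(key, value);
        maybeCompact();
    }

    private void append(String key, String value) {
        List<Record> tail = segments.peekLast();
        if (tail == null || tail.size() >= SEGMENT_CAPACITY) {
            tail = new ArrayList<Record>();
            segments.addLast(tail);        // roll over to a new segment file
        }
        Record r = new Record(key, value);
        tail.add(r);                       // append-only: stale copies stay behind
        live.put(key, r);                  // index now points at the newest copy
    }

    private void maybeCompact() {
        int total = 0;
        for (List<Record> seg : segments) total += seg.size();
        // Trigger condition (toy version: too much garbage in the log).
        if (segments.size() > 1 && (double) live.size() / total < 0.5) {
            List<Record> oldest = segments.removeFirst();
            for (Record r : oldest) {
                if (live.get(r.key) == r) {
                    append(r.key, r.value); // evacuate live data to the newest segment
                }
            }
            // The oldest segment now holds no live data; the real store
            // deletes its file at this point.
        }
    }

    public static void main(String[] args) {
        ToyCompactor log = new ToyCompactor();
        for (int i = 0; i < 40; i++) {
            log.put("key-" + (i % 3), "v" + i); // heavy overwriting of a few keys
        }
        System.out.println("segments kept: " + log.segments.size());
    }
}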

Is the write frequency high? Is the test just writing in a tight loop? There's a bit of 'niceness' built into the compactor that tries to give priority to user writes. With enough user writes flooding the channel, it's possible that the compactor is not keeping up. What the compactor is doing should be visible if you take a few thread dumps.
twu

Do you know where the compactor thread is waiting? There's a semaphore that it waits on to start, and another spot it waits on (write to disk).
Chris3x

The thread dump says it's in the run method of class CompactorImpl, at line 125.
Chris3x

My test isn't writing keys in a tight loop; I'm running the refresh method on SelfPopulatingCache fairly often.
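
For reference, a simplified version of what my test does (cache name and loader are made up; SelfPopulatingCache and CacheEntryFactory are the standard Ehcache 2.x constructs):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

public class RefreshExample {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        Cache backing = new Cache(new CacheConfiguration("data", 10000).eternal(true));
        manager.addCache(backing);

        // Factory invoked on cache misses, and for every key on refresh().
        CacheEntryFactory factory = new CacheEntryFactory() {
            public Object createEntry(Object key) {
                return "value-for-" + key; // hypothetical loader
            }
        };
        SelfPopulatingCache cache = new SelfPopulatingCache(manager.getEhcache("data"), factory);

        cache.get("some-key"); // miss -> factory populates the entry

        // refresh() re-creates and re-puts every element currently in the
        // cache, so with restartable persistence each call appends a full
        // copy of the cache contents to the log.
        cache.refresh();

        manager.shutdown();
    }
}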
twu

How often are you refreshing? The SelfPopulatingCache.refresh() method will wind up re-putting every element in the cache, which in restartable log terms means the whole cache is going to get copied over. In theory this should have generated enough garbage to trip the compactor run threshold. It's strange that it didn't.

Are there multiple caches in your test case?
 