Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Messages posted by: MWelch
Profile for MWelch -> Messages posted by MWelch [15]
This isn't as much of an Ehcache specific question as it is a general question about a particular usage scenario and the caching pattern that might be used to address it.

I'm building a server-client system where the clients will only be in use for short bursts at a time. During their off time, they are really "off", not just disconnected. The server holds both a light and a heavy set of data that the client needs to know about whenever it is "on". Fundamentally, I want to figure out how to keep the two different data sets in sync between the client and the server.

The light data is actually pretty easy. I don't even need to sync it really. I can use a small database accessible to both the server and all of the clients, and if necessary do some small amount of caching on the client which is cleared whenever a client turns back "on".
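Concretely, the light-data handling I have in mind is something like the following (a minimal Ehcache sketch; the cache name, the loader interface, and clearing the cache at startup are just my own conventions, not anything Ehcache prescribes):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

class LightDataCache {

    private final Cache cache;

    LightDataCache() {
        // Assumes a cache named "lightData" is defined in ehcache.xml.
        CacheManager manager = CacheManager.getInstance();
        this.cache = manager.getCache("lightData");
        // The client was genuinely "off", so anything cached before is suspect: start clean.
        cache.removeAll();
    }

    // Returns the cached value, loading it from the shared database on a miss.
    Object get(String key, LightDataLoader loader) {
        Element hit = cache.get(key);
        if (hit != null) {
            return hit.getObjectValue();
        }
        Object value = loader.loadFromDatabase(key);
        cache.put(new Element(key, value));
        return value;
    }

    // Illustrative callback standing in for however the shared database is read.
    interface LightDataLoader {
        Object loadFromDatabase(String key);
    }
}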

It's the heavy data that I'm not 100% sure about. Certainly I have many ideas about how to manually keep such data in sync, but I am assuming that there must be a well-established pattern for doing this, as it's certainly not a requirement unique to my situation. More specifically, the heavy data is images; potentially hundreds of megabytes of them.

Might someone point me in the right direction?
Thanks for the reply.

(a) Yes, I found the toolkit. Indeed it does seem interesting; however, there are a couple of concerns, the most important of which is that it appears to be quite dependent on the application being clustered. While our application is indeed clustered in its largest deployment instance, the vast majority of our deployments are in single-server mode. As things currently stand in our application, there's no trace of Terracotta in those single-server deployments. I did play around with the toolkit and wrapped it with our own code to hide some of the cluster details, but there are still a few rough spots. For instance, it's difficult to dynamically determine if one is a member of a cluster. If you try to create a client and connect to the server without a server being present, the client connection mechanism just goes on and on and on trying to make a connection, never timing out. You can set a limit to the number of retries, but if you do then bam!!! The app actually exits if no connection is made. Not exactly an ideal result. Despite these niggles, it is a relatively simple approach to some clustering needs, and if we can get it to meet our needs, it seems like a viable approach for our application as opposed to custom DSO roots and locks.
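For what it's worth, the workaround we've been experimenting with for the endless-retry behaviour is simply to run the client creation on another thread and enforce our own timeout. A rough sketch, where connectToTerracotta() is only a stand-in for whatever client-creation call is actually used, not a real toolkit API:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class GuardedClusterConnector {

    // Tries to create the cluster client but gives up after the timeout,
    // instead of retrying forever or exiting the JVM when retries run out.
    Object connectOrNull(long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Object> attempt = executor.submit(new Callable<Object>() {
                public Object call() throws Exception {
                    return connectToTerracotta(); // stand-in for the real client-creation call
                }
            });
            return attempt.get(timeout, unit);
        } catch (TimeoutException noServer) {
            return null; // no server reachable: carry on in single-server mode
        } catch (Exception failed) {
            return null; // connection failed for some other reason
        } finally {
            // Abandons the attempt from our point of view; the underlying
            // connect may or may not honour the interrupt.
            executor.shutdownNow();
        }
    }

    private Object connectToTerracotta() {
        throw new UnsupportedOperationException("replace with the actual toolkit call");
    }
}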

(b) As to whether it meets our needs, that's kind of what my original post was trying to determine. We have an archaic search system (that we hope to soon replace with something more centralized, but not yet), and when the search index of one of the cluster members is updated, we need to notify the other cluster members to index the same item. There are a couple of other similar situations that we have, until now, handled with Spring events clustered with a much older version of Terracotta. I'm hoping that there's a simple way to do some similar messaging that we can use until we replace our search system at the end of this year.
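For context, our current wiring is nothing more exotic than standard Spring event plumbing; the clustering of the events is what the old Terracotta integration provided. Roughly (class names are ours and purely illustrative):

import org.springframework.context.ApplicationEvent;
import org.springframework.context.ApplicationListener;

// Our own event type: fired on a node after it has (re)indexed an item.
class ItemIndexedEvent extends ApplicationEvent {

    private final String itemId;

    ItemIndexedEvent(Object source, String itemId) {
        super(source);
        this.itemId = itemId;
    }

    String getItemId() {
        return itemId;
    }
}

// On every node: react to the event by indexing the same item locally.
class ReindexListener implements ApplicationListener {

    public void onApplicationEvent(ApplicationEvent event) {
        if (event instanceof ItemIndexedEvent) {
            reindexLocally(((ItemIndexedEvent) event).getItemId());
        }
    }

    private void reindexLocally(String itemId) {
        // hook into our (soon to be replaced) search system here
    }
}

// Publishing side, wherever the local index is updated:
//   applicationEventPublisher.publishEvent(new ItemIndexedEvent(this, itemId));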

(c) I guess I'll throw in some general feedback related to finding stuff on the site here. First, between the products page and the documentation, it's a bit difficult to determine what is offered in the commercial versus the open source version of the product. Yes, there is a nice, clear differentiation page, but then the docs and some other pages muddle things by using different terminology and only referring to the enterprise version.

In general though, the ability to find information is improved from where it was a year or so ago.
We've been using a very old version of Terracotta for several years, and we've finally decided to upgrade, but things have changed quite a bit. I'm noticing large sections of the documentation encouraging users not to use DSO but to instead use the "standard" or "express" version of the application. Some of the arguments made for doing this are pretty compelling so I wanted to check if our use case would allow us to migrate from DSO.

We currently only really use Terracotta for some caching and messaging. I know that the caching is a no-brainer in the express product, but for messaging we used Spring Events. We've known for a long time that this usage of Spring Events wasn't supported in newer versions of Terracotta; it's one of the reasons why we took so long to upgrade. I had already been thinking about how to get rid of these Spring Events and use the tim-messaging module instead, but that would mean using the DSO product. If we wanted to migrate away from DSO, then what might we do instead with the "express" product to meet this need?
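One idea I've been toying with, purely as a sketch and not something I've seen recommended anywhere: use a Terracotta-clustered Ehcache as a shared "reindex requests" board that every node polls. Assuming a cache named reindexRequests that is configured as clustered in ehcache.xml, it might look something like this:

import java.util.List;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Sketch: a clustered cache used as a shared notice board. The node that
// updated its index puts the item id in; every node polls periodically and
// re-indexes anything newer than what it has already seen.
class ReindexBoard {

    private final Cache board;

    ReindexBoard(CacheManager manager) {
        this.board = manager.getCache("reindexRequests"); // assumed clustered cache
    }

    // Called on the node that just updated its local index.
    void requestReindex(String itemId) {
        board.put(new Element(itemId, Long.valueOf(System.currentTimeMillis())));
    }

    // Called periodically on every node, e.g. from a scheduled task.
    // Returns the newest timestamp seen so the caller can pass it back next time.
    long pollAndReindex(long lastSeen, SearchIndexer indexer) {
        long newest = lastSeen;
        List keys = board.getKeys();
        for (Object key : keys) {
            Element e = board.get(key);
            if (e == null) {
                continue;
            }
            long requestedAt = ((Long) e.getObjectValue()).longValue();
            if (requestedAt > lastSeen) {
                indexer.reindex((String) key);
                newest = Math.max(newest, requestedAt);
            }
        }
        return newest;
    }

    // Illustrative hook onto whatever search system a node runs.
    interface SearchIndexer {
        void reindex(String itemId);
    }
}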

jvoegele wrote:

The URL for the Terracotta repository is:
http://www.terracotta.org/download/reflector/releases
 

This redirects to "http://repo.terracotta.org/maven2/" which appears to be down. Is this a temporary condition?

ilevy wrote:
I think that if an object is materialized on any node, it will fault in a certain (configurable) number of reachable objects, not the whole graph. 

Very interesting. Thank you. I'll experiment with this and see what happens.
Let's say that I am using Terracotta to cluster objects that are made up of very deep graphs. If I create that object on one node of the cluster and another node requires a part of the data in that graph, is the entire object copied to the second node, or is it just the data that was needed, with the rest being paged in as needed?

I ask because our current domain model is highly interconnected. Basically every object in our system is connected to every other object along some path or another. We have been using a graph database, and it implements this model quite well; however, we have run into a number of little problems and we're now looking for an alternative such as Terracotta backed by a normal relational database, or even just Terracotta by itself.

MWelch wrote:
Thank you for the reply, however the linked page did not seem to contain any useful information and I don't understand how an XML serialization library is related to this topic. I'm clearly missing something.  

Whoops. Color me embarrassed. You have to be logged in to see the full content of the linked page. Now I see it. Thanks.

mj wrote:
these cases will be handled by Terracotta "automagically":

http://www.terracotta.org/web/display/orgsite/Operations+Guide#OperationsGuide-DeployingANewVersionofYourApplicationSoftware

For more complex changes, you'll need to create appropriate update-routines (e.g. copy fields from one bean to another) or you'll have to dump/modify/reload your whole data with external tools (e.g. XStream: http://xstream.codehaus.org/). 

Thank you for the reply, however the linked page did not seem to contain any useful information and I don't understand how an XML serialization library is related to this topic. I'm clearly missing something.
I just read the blog posting called "Kill Your Database with Terracotta" (http://willcode4beer.com/design.jsp?set=kill_your_db) that was linked from the .org site here, and one thing that I don't yet understand is what happens when the object model being persisted is modified in some manner.

For example, let's say you have an Employee object that has the attributes "firstname", "lastname", and "phone". You run version 1.0 of your application and some Employee objects are created and put into the Terracotta stack. Terracotta persists its own state to disk so that even if the application shuts down, that data is safe. Let's say you then deploy version 1.5 of your application and now there's a new attribute of the Employee object called "address". Or worse, let's say you removed the "phone" attribute and replaced it with "workphone" and "mobilephone" attributes. How do you migrate those objects that were persisted under the old model? Or do you?
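To make the question concrete, the second case seems to require an update routine along these lines (EmployeeV1, EmployeeV2, and the field mapping are purely illustrative):

// Purely illustrative: the old shape of the clustered object under version 1.0 ...
class EmployeeV1 {
    String firstname;
    String lastname;
    String phone;
}

// ... and the new shape after the 1.5 deployment.
class EmployeeV2 {
    String firstname;
    String lastname;
    String workphone;
    String mobilephone;
}

class EmployeeMigration {

    // Copy-fields update routine: the removed "phone" attribute has to be
    // mapped onto one of the new attributes by some rule we'd have to choose.
    static EmployeeV2 migrate(EmployeeV1 old) {
        EmployeeV2 updated = new EmployeeV2();
        updated.firstname = old.firstname;
        updated.lastname = old.lastname;
        updated.workphone = old.phone; // assume the old number was a work number
        updated.mobilephone = null;    // no data for this attribute yet
        return updated;
    }
}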
That's fantastic. Thanks!
This is from the "Design Patterns" part of the .org website:

http://www.terracotta.org/web/display/orgsite/Write+Behind+to+SOR

I'm interested in this, but other than this one reference, I cannot find anything else related to this topic. Is this functionality documented? Is there an example anywhere?
I have no great love for the database so a solution that can be durable but not require the DB is certainly something I'd consider.

ari wrote:
As for the write-up on TC as a write-behind cache of the db, thus inverting the model and making the cache the SoR, I can get on that now. No one has written it till now.

I owe it to several folks by now. What are your time frames? Could you maybe PM me the use case so I can make sure I am not wasting your time? (Or post it here if you are able to.) 

There is no rush on my end. I am merely investigating the use of IMDGs in general. During my last few projects I have grown increasingly frustrated with the standard pattern of achieving scalability in mid-size, data-centric applications (small being the simple internal apps and large being the huge, multi-million-user apps that require extremely customized infrastructure). These mid-size apps still need to be clustered, but they shouldn't require the specialized infrastructure of a large-scale app. There are plenty of solutions like distributed caches for Hibernate, but I just can't help but feel that there is a better, faster way. I've experimented with in-memory object databases, Oracle Coherence, and GigaSpaces EDG. Each has its pros and cons. I guess Coherence has been the best fit so far, but it has a double whammy working against it: 1) being from Oracle, a difficult (to put it kindly) company to work with, and 2) being prohibitively expensive.

I'd like to find a solution, or at least repeatable pattern, that would help me on these kinds of mid-size, data-centric applications.
Thank you for your reply.

ari wrote:
1. The easy way: just use TC as a cache through a Map as your interface. TC is already durable to disk so you don't need some fancy IMDG to scale out your storage or to get pseudo-persistence via replication. Its fast and easy. You will need a Queue and a separate thread/JVM flushing the changes to the map asynchronously to the DB. 

This approach sounds interesting. Are there any articles or best practices I might look at that would help me to do a proof of concept for this kind of setup?
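To check that I've read the suggestion correctly, here is my rough understanding of it in plain Java; nothing below is Terracotta-specific, and in the real setup the map and queue would be the clustered structures while writeToDatabase() would be the actual JDBC/ORM call:

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// My reading of the suggestion: the map is the system of record (clustered and
// durable via Terracotta), the queue records pending changes, and a separate
// thread (or JVM) drains the queue and writes to the database asynchronously.
class WriteBehindStore {

    private final Map<String, Object> store = new ConcurrentHashMap<String, Object>();
    private final BlockingQueue<String> pendingWrites = new LinkedBlockingQueue<String>();

    // Writes hit the map first; the database only sees them later.
    void put(String key, Object value) {
        store.put(key, value);
        pendingWrites.offer(key);
    }

    Object get(String key) {
        return store.get(key);
    }

    // Runs in its own thread/JVM and flushes changes to the database.
    Runnable flusher() {
        return new Runnable() {
            public void run() {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        String key = pendingWrites.take();
                        writeToDatabase(key, store.get(key));
                    }
                } catch (InterruptedException stop) {
                    Thread.currentThread().interrupt();
                }
            }
        };
    }

    private void writeToDatabase(String key, Object value) {
        // placeholder: the real implementation would persist via JDBC/Hibernate
    }
}

The flusher would be started once, e.g. with new Thread(store.flusher()).start(), possibly in a dedicated JVM as suggested. Is that roughly the shape of what you had in mind?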

A separate, but ultimately related, question: Since Terracotta uses a centralized server, is it impossible to have an in-memory data set larger than the memory capacity of that one server? Other products might tackle this by partitioning the data across several servers, but I'm not sure how Terracotta would handle it.
I hate to be so bold as to ask someone who might answer this to read such a long blog post, but is there a Terracotta solution that achieves something like what is described here:
http://natishalom.typepad.com/nati_shaloms_blog/2008/03/scaling-out-mys.html

In short, Nati describes an In Memory Data Grid (IMDG) that acts as an asynchronous tier between an application and a database. The actual system of record is the IMDG and transactions are run against that instead of the database.

Oracle Coherence seems to dominate this IMDG space right now, but the product is prohibitively expensive for startups or small businesses and dealing with Oracle is not fun even under the best conditions.
 