Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Messages posted by: ari
I believe this is the fastest, most scalable approach to partitioning based on Terracotta:
http://www.terracotta.org/confluence/display/labs/WorkManager

Also, the Terracotta Tuning Part 2 online training is not exactly on the topic of Master / Worker, but it talks about how to partition workload across JVMs so that the work doesn't force the data to move:

http://feeds.feedburner.com/~r/TerracottaVideoPodcasts/~3/168044666/i=tcvideo%252Fonline_training%252FTerracottaTuningPart2%252FTerracottaTuningPart2.mov

Let us know if this helps...
A few things here:

1. The TC Server in production should be run with a backup server. We call it a PASSIVE. The passive ensures that the client will keep running without a hiccup.

2. If you want the client to learn that the server is dead, take a look at what we call CLUSTER MEMBERSHIP EVENTS. They are not a set of exceptions, but instead events that you can register for, on a separate thread from the one blocked on the TC server I/O (see the sketch after this list).

3. If your TC Server has no PASSIVE backup AND is not in persistent mode, it will not allow ANY of your JVMs back in when you restart the TC server process.
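
As a rough illustration of point 2, here is a hedged sketch of registering for those events over JMX from inside your application JVM. The ObjectName and the event payload handling are placeholders (check the docs or browse the MBeans in JConsole for the real names); only the standard javax.management usage is meant literally.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.ObjectName;

public class ClusterEventLogger {
    public static void register() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Placeholder ObjectName: substitute the actual Terracotta cluster MBean name.
        ObjectName clusterBean = new ObjectName("org.terracotta:type=ClusterEvents");
        mbs.addNotificationListener(clusterBean, new NotificationListener() {
            public void handleNotification(Notification n, Object handback) {
                // React to join / leave / quarantine events here, off the threads
                // that may be blocked on TC server I/O.
                System.out.println("Cluster event: " + n.getType() + " - " + n.getMessage());
            }
        }, null, null);
    }
}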

Now, all of these are separate from your issue with the assertion error, as far as I can tell. But this is all to say that your test is somewhat invalid, because TC is not designed to be HA / restartable / reconnectable when tested the way you are testing it.

Please let us know if you have a reason to test this way or if this was indeed just an attempt to test us in a failure scenario. If the latter, please look into all the above options and then try again.

Let us know either way if this helps OR if the things I suggest are not appropriate. We are here to help.

Thanks,

--Ari
Thanks Fabrizio...

Someone should have answered this for you...but the last time I used our TreeCache replacement module, I built it from source control, just like you.

I had remained quiet on this thread thinking someone must have put it into an automated build but that might not be the case.

We will make sure to get the build automated at some point and post back to this thread when we are done. Again, I could be wrong and there might be one out there, but like you, I could not find it.

https://jira.terracotta.org/jira/browse/CDV-438

--Ari
The problem might be that our comms implementation looks up the local machine's hostname when André doesn't expect it to.

So far we know that you cannot change your hostname because you do not have permissions. But as the others on this thread pointed out, we haven't seen this issue before largely because the TC client libraries are using your tc-config.xml to decide where to connect and YOU are in control of your tc-config.

If you are running your TC Server on the same machine as your app, then make sure your <servers> block says:

<servers>
  ...
  <server host="localhost" ...
</servers>
...

If localhost doesn't work, try 127.0.0.1. Just make sure it does not say "%h". If you are running your TC Server on another host (such as andre_cova), then try using its IP as returned by ipconfig at the Windows command prompt (as Taylor stated).
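
If it is still unclear what the JVM thinks the local hostname is, a quick check like this (plain java.net, nothing Terracotta-specific) prints what the machine resolves to, which usually tells you whether localhost, 127.0.0.1, or the real IP belongs in the host attribute:

import java.net.InetAddress;

public class WhoAmI {
    public static void main(String[] args) throws Exception {
        // Print what this JVM resolves the local host to.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostname: " + local.getHostName());
        System.out.println("address:  " + local.getHostAddress());
    }
}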


Read more about config variables here:
http://www.terracotta.org/confluence/display/docs1/Configuration+Guide+and+Reference#ConfigurationGuideandReference-ConfigurationVariables
I do not recall seeing anyone on our mailing lists or on this forum testing it before. Please go ahead and test it, for all of our benefit, if you have time.

Here's a starting point:
http://unserializableone.blogspot.com/2007/04/performance-comparision-between.html

BTW, don't be surprised by your results...we do different clustering optimizations for different collections, all behind the scenes. So Javolution's performance will depend on its internal data structures and lock-striping approach.

Cheers...
I should be clearer here. I am not suggesting you marshal your app data into String format just to take advantage of compression. If you do, you can end up giving up other performance optimizations for specific data types.

Simple rule: if it is a String, it will compress with Terracotta and could go very fast. If it is not a String, leave it in the app-native data type, and if you have performance problems, use this forum to help sort them out. In other words, don't assume that Strings are your best bet for performance.

So, looking at your original post, you did not specify the existing data type you are using for this static data, so take my recommendation with a grain of salt.
Also,

make sure to store the data as a String. We have compression built into the product that automatically compresses Strings when they travel on the network between your JVM and ours.
My initial answer was only that oswego utils CAN work with Terracotta, not that they all do. I should have been more specific.

Further, we should clarify that oswego support is not meant to imply that there is a config module or performance-optimized version of anything. If any part of oswego works, it will work the same way your own roots, locks, DMI, and wait / notify would work. That is, if an oswego construct works with Terracotta, it will work when you specify all the config by hand.

Sorry for the confusion.
Right...try running the TC process on server1 through a TCP proxy and then killing the proxy instead of yanking the plug (a throwaway sketch is below). You're uncovering a problem with having your JVM and ours on the same machine, but it is not explicitly a bug. As Saravan pointed out, this is as designed, but it is an edge case of clustering in general that you will tickle if you insist on clustering 2 servers and 2 clients, with one of each on each of 2 boxes.
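
To be concrete, a throwaway proxy along these lines is enough for the test: point the client's tc-config at the proxy's port, then kill the proxy process to simulate the network failure. The host name and ports here are only illustrative, not a recommendation.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class TinyTcpProxy {
    public static void main(String[] args) throws Exception {
        final int listenPort = 9520;          // port the client's tc-config points at
        final String targetHost = "server1";  // machine actually running the TC server
        final int targetPort = 9510;          // port the TC server listens on

        ServerSocket server = new ServerSocket(listenPort);
        while (true) {
            Socket client = server.accept();
            Socket upstream = new Socket(targetHost, targetPort);
            pump(client.getInputStream(), upstream.getOutputStream());
            pump(upstream.getInputStream(), client.getOutputStream());
        }
    }

    // Copy bytes one way between the two sockets on a background thread.
    private static void pump(final InputStream in, final OutputStream out) {
        new Thread(new Runnable() {
            public void run() {
                try {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        out.write(buf, 0, n);
                        out.flush();
                    }
                } catch (Exception ignored) {
                } finally {
                    try { out.close(); } catch (Exception e) { }
                }
            }
        }).start();
    }
}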
Why can't you write a shell or other script in the meantime, so that dev writes part of the config, ops writes part, and the two get merged?

I know the merge is not technically hard, but it seems that if dev and ops were to get periodic copies of each other's half of the config, then the key issue of dev potentially tweaking the wrong settings goes away.

tc-config-dev.xml and tc-config-ops.xml get merged, and dev knows to ONLY EDIT tc-config-dev.xml. A minimal sketch of such a merge is below.
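
Something like this is all I have in mind; it is a hedged sketch, not a supported tool. The file names and the assumption that dev owns everything outside the sections operations controls are illustrative only.

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MergeTcConfig {
    public static void main(String[] args) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document ops = db.parse(new File("tc-config-ops.xml"));  // owned by operations
        Document dev = db.parse(new File("tc-config-dev.xml"));  // owned by development

        // Copy every top-level section from the dev file into the ops document,
        // which already holds the sections that operations controls.
        NodeList devSections = dev.getDocumentElement().getChildNodes();
        for (int i = 0; i < devSections.getLength(); i++) {
            Node section = devSections.item(i);
            if (section.getNodeType() == Node.ELEMENT_NODE) {
                ops.getDocumentElement().appendChild(ops.importNode(section, true));
            }
        }

        // Write the merged result; this is the file actually handed to Terracotta.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(ops), new StreamResult(new File("tc-config.xml")));
    }
}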

Is that a viable workaround?
Ok, here are the answers I confirmed with the engineering team:

1. With respect to the TC Server, the active and passive should be of similar size. As I had suggested above, the passive can end up slowing down the active under certain circumstances.

2. With respect to other workloads on the passive: currently the active waits until the passive receives the transaction, and any delay would slow the cluster. For the same reason that the passive should be sized similarly to the active, it should also be running only the same workload as the active (avoid giving the passive extra work).

3. WRT weighting or otherwise rigging the election, the only way to ensure that a specific node becomes active is to start that node first, let it become active and then start the rest. Since stopping all TC servers during the election (in order to force an outcome) is not viable for a failover scenario, this cannot / should not be used to get one particular passive to become active. We would have to add a weighting feature. You should add a JIRA feature request, IMO.
Sure...

On 2-core machines, with fewer than 20 nodes and transactions that each change hundreds of bytes, we have seen up to 60K ops per second. If each transaction changes thousands of bytes, the ops per second will drop to the order of 6K. It depends on the type of change too. You might be able to enable compression and turn large changes to String data in your app into tiny updates to Terracotta, for example.

As for the other questions:

1. Not all servers have to be the same. This is true for your app nodes as well as for our TC nodes.

2. What will happen when the 2ndary is smaller than the primary under networked HA? Well, the 2ndary will take longer to apply changes if its CPU is slower, and will apply fewer ops per second than the primary. From your question, I gather that you want to do what we did at one company I worked at, which is to have smaller-scale disaster recovery capacity. With smaller-capacity servers, you will serve some fraction fewer transactions per second. And, with networked HA, if the 2ndary is weaker, you can end up slowing down your active TC node as well as your app nodes. I will leave it to someone on our TC server core team to confirm whether or not the 2ndary must ACK a write before it is complete, and if that is true under all locking semantics (concurrent vs. read/write).

3. If the backup goes through a spike of CPU and I/O, it will lose capacity for keeping up with the primary. Again, not a good idea but our core team will have to confirm if it will just fall behind or will actually slow down the whole app cluster.

4. If the rate of change is faster than the rate of sync to TC, there are throttling components and buffers inside our communication stack so that Tomcat will slow down along with TC. But this will not slow down ALL Tomcat instances, only those that need to write shared objects. Reads will happen locally almost all the time, assuming your Tomcat load balancer is sticky (as opposed to round robin). If reads happen against local heap, then TC's relative performance at any given moment is irrelevant. Note also that even clustered locks are sometimes optimized to be greedy and can lock locally without waiting for the TC server to grant the lock. Again, the answer is vague--sorry for that--but it is use-case specific.

5. If the client cannot connect to the server, the default behavior is to hang, retrying. If the client WAS connected in the past and loses its connection, the server will quarantine the client and not let it back in. If the client NEVER connected, it will keep running through the list of TC servers in its config, trying to find one to connect to. In all cases, JMX CLUSTER MEMBERSHIP EVENTS (you can search for that as a topic on our site) will tell your app node what state it is in, when it joins and leaves the cluster, and the reason its status changed. Our field engineers have tools, available under a consulting engagement, that allow your Tomcat node to divorce from the cluster and run stand-alone when it has been quarantined.

6. Weighted server elections: I will look into it. The core clustering engine we use is capable of many things we do not actually expose to users, and I am not sure what is possible.
When you say you are running 64-bit Linux servers with 4GB of RAM with persistence on, do you mean Tomcat session persistence? Or do you mean that those are Terracotta servers running in persistent mode?

I think there is a sizing guide on our site. But, the basic rule of thumb is that TC, when tuned properly, should not be CPU bound.

The way our customers who are in production have sized successfully is by running tests with a load generator tool (specific to your app) against 1, 2, 4, 8, and more of their application nodes. Using NMON or some such tool, gather and then analyze CPU on your application nodes and on the TC server.

You are looking for total CPU utilization and I/O wait.

Specifically, if your app node runs at 30% CPU when connected to Terracotta, try the same load with 2 app nodes. If each runs at 15%, then try doubling and quadrupling the load to get back to 30% and 60% respectively. If your CPU scales linearly, then your app nodes are easy to size. Just make a business decision as to how much CPU utilization is too much, and then do the math to arrive at a total node count with TC in the mix.
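
For what it's worth, the node-count math is nothing more than this kind of back-of-the-envelope arithmetic; the numbers below are made up and assume roughly linear CPU scaling per node:

public class SizingEstimate {
    public static void main(String[] args) {
        double cpuPerNodeAtTestLoad = 0.30;  // one app node ran at 30% CPU under the test load
        double peakLoadMultiple = 8.0;       // expected peak load relative to the test load
        double maxAcceptableCpu = 0.60;      // business decision: keep each node under 60% CPU

        // Total CPU demand at peak, expressed in "nodes running at 100% CPU".
        double totalCpuDemand = cpuPerNodeAtTestLoad * peakLoadMultiple;

        // Nodes needed so each stays under the acceptable ceiling.
        int nodesNeeded = (int) Math.ceil(totalCpuDemand / maxAcceptableCpu);
        System.out.println("App nodes needed: " + nodesNeeded);  // prints 4 with these numbers
    }
}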

As for the TC Server, check its CPU at the same time that you are running all the above load tests. If its CPU scales linearly, then you can do simple math for TC's scale as well. The only catch is to watch not just total CPU utilization but I/O wait as well. If I/O wait is scaling linearly with load, your TC instance will become I/O bound at a certain scale. In such cases, more RAM is more important than CPU. If your I/O wait stays near zero but your CPU utilization is either too high for your comfort level or scaling too quickly as you add app nodes, then you will benefit from more / faster CPUs.

This is a simplification, but it should get you there. As another piece of info, our users range in size, and their TC Servers range from 2-core 2GHz (or so) 32-bit machines with 2GB of RAM to 8-core 3GHz (or so) 64-bit machines with 32GB of RAM (or more).

Their disks range from simple 5400 RPM drives to 4 striped 15K RPM SCSI drives, depending on the I/O characteristics of the app and how important disk-based persistence is.

Yes it does. We started there, actually.

http://blog.terracottatech.com/2005/08/fun_with_dso_and_cyclic_barrie.html

Now, ConcurrentHashMap from JDK 1.5's java.util.concurrent is probably going to perform better because we have done some tuning to support it. But you should try the 1.4 version and let us know if you run into issues.
BTW,

I looked at your config and noticed that getHellos() is autolocked, which is good. But, assuming it is a getter, you should set the lock-level to read; it is currently write, which has higher overhead. If you intend to performance-test this app, read locks for getters and write locks for setters can deliver significant performance improvements without significant tuning effort. A rough illustration of the pattern is below.
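
For reference, this is the shape of code that advice applies to; the class and method names are just illustrative, and the read/write lock levels themselves live in your tc-config, not in the Java:

import java.util.ArrayList;
import java.util.List;

public class HelloHolder {
    private final List<String> hellos = new ArrayList<String>();

    // Pure getter: a read-level autolock lets many nodes read concurrently.
    public synchronized List<String> getHellos() {
        return hellos;
    }

    // Mutator: keep a write-level autolock so the change is clustered safely.
    public synchronized void addHello(String hello) {
        hellos.add(hello);
    }
}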
 