Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)

Hi, I am trying to test the performance of the Terracotta product.

I am using this scenario:

1) Terracotta server running on a Linux server with Java 1.4.2_04

2) Terracotta client running on Windows XP with Java 1.4.2_05

3) Terracotta client running on a Linux server with Java 1.4.2_04

In the attachment you can find the client's main class, the InfoHash class, and the config file that I use to perform the tests.

I run the first client with these arguments:

0 1000 N

which inserts 1000 objects (InfoHash) into the HashMap with keys from 0 to 999.

I run the second client with these arguments:

1000 2000 N

which inserts 1000 objects (InfoHash) into the HashMap with keys from 1000 to 1999.

The results are:

1) If I run only one client, it completes the whole test in ~2.3 sec.

2) If I run the clients in parallel, the times increase: each client takes ~10 sec.

Is there a way to improve the performance? For us, these results are not very good.

It seems that the server is the bottleneck of the system.

Could the 4-client limit (if it refers to channels rather than to the number of applications) affect performance?

If so, could you give us a trial license and commercial information?

Do you have any benchmark results that could give us some idea of the performance?

We appreciate any help.

Thanks

Hello:

Thanks for the posting. There seem to be several questions in here:

1.
The client application greedily acquires locks from the Terracotta server, and it flushes fine-grained object mutations to the Terracotta server at a transaction boundary that occurs "naturally" in Java, i.e. at the end of the synchronized block. From your test code, we see that there is a tight loop and the synchronized block is exercised on every iteration of the loop. This is perhaps an extreme case, so if your test case permits, you could synchronize outside the loop (or in batches of a certain size); you would see much better numbers.

2.
Yes, the Terracotta server is a hub, and it should be sized/scaled as a function of the clustered I/O your application does. Currently only an Active/Passive configuration is supported, although in the future multiple active Terracotta servers could be deployed.

3.
The 4-client limit is only a licensing limit; technically there is no limit. Of course, each client creates some overhead on the Terracotta server, but as mentioned earlier, the most important factors for Terracotta server scalability are the amount of clustered I/O and the extent of inter-node contention on any single object (since locking is fine-grained as well).

4.
We can email you a license key (note, of course, that what you download can be used in a dev environment; the only caveat is that the Terracotta server will exit after 10 hours and will need to be restarted) as well as commercial/pricing information based on your application needs (Terracotta-Sessions and Terracotta-Spring are free for up to 4 nodes).
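Point 1 above (synchronizing outside the loop, or in batches) can be sketched in plain Java. This is a minimal sketch, not Terracotta-specific code: `BatchedLocking`, `insert`, the batch size, and the two String fields of `InfoHash` are hypothetical (the real InfoHash was only in the attachment), and it uses Java 5 generics even though the clients here run 1.4.2. What it shows is that each synchronized block, which the reply describes as the transaction boundary where DSO flushes changes, now covers `batchSize` puts instead of one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the poster's InfoHash class: the real class was only
// in the attachment, so these two String fields are an assumption.
class InfoHash {
    final String first;
    final String second;

    InfoHash(String first, String second) {
        this.first = first;
        this.second = second;
    }
}

class BatchedLocking {
    // Inserts keys start..end-1, entering one synchronized block per batch of
    // batchSize puts instead of one block per put. Returns the number of blocks
    // entered, i.e. the number of lock acquisitions on the shared map.
    static int insert(Map<String, InfoHash> shared, int start, int end, int batchSize) {
        int blocks = 0;
        for (int batchStart = start; batchStart < end; batchStart += batchSize) {
            int batchEnd = Math.min(batchStart + batchSize, end);
            // One lock acquisition per batch (assumed to be one transaction under DSO).
            synchronized (shared) {
                for (int i = batchStart; i < batchEnd; i++) {
                    shared.put(String.valueOf(i), new InfoHash("pippo" + i, "pluto" + i));
                }
            }
            blocks++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        Map<String, InfoHash> shared = new HashMap<String, InfoHash>();
        int blocks = insert(shared, 0, 1000, 100); // like running "0 1000 N"
        System.out.println(shared.size() + " entries, " + blocks + " synchronized blocks");
        // prints "1000 entries, 10 synchronized blocks"
    }
}
```

With 1000 keys and a batch size of 100, the loop enters 10 synchronized blocks instead of 1000. Larger batches mean fewer lock round trips but hold the lock longer, which increases waiting when several clients write at the same time.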

We will be happy to field a call and/or discuss further.
Thanks,
Sreeni Iyer

ebagini,

In addition to the information you've received from SIyer, our development team took a look at your example, and they have some suggestions:
***********
There are several performance bottlenecks built into your code.

For example, look at the following fragment of HelloWorld.java:

for (int i = inizioC; i < fineC; i++) {
    if (key.equals("N")) {
        //String l=pippo1[i];
        synchronized (pippo1[i]) { // **** 1
            hellos.put(pippo1[i], new InfoHash("pippo" // **** 2
                    + hellos.size(), "pluto" + hellos.size()));
        }
        ....

This uses a fine-grained lock to modify the collection. The trouble is that the hellos collection is a Hashtable, which is implicitly synchronized and autolocked by DSO.

Also, it is not clear why you need to lock on the key value when doing an insert. And the value of pippo1[i] can change between the lines marked 1 and 2 above.

It should give you a performance boost if those puts are batched up into a temporary Map and then inserted using hellos.putAll(batchedHellos).
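The putAll suggestion might look like this minimal sketch. `PutAllBatching`, `insertRange` and the `InfoHash` fields are hypothetical; `hellos`, `batchedHellos` and `pippo1` are named after the fragment above, and `pippo1` is assumed to be a `String[]`. The batch is built in a local HashMap that no other thread can see, so filling it needs no lock, and it is then published with a single `Hashtable.putAll` call, which is itself a synchronized method. How many clustered lock round trips that turns into depends on how DSO autolocks Hashtable; the reply only says it should be faster. Java 5 generics are used for readability.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical stand-in for the poster's InfoHash (the real class was attached, not shown).
class InfoHash {
    final String first;
    final String second;

    InfoHash(String first, String second) {
        this.first = first;
        this.second = second;
    }
}

class PutAllBatching {
    // Builds entries for pippo1[start..end-1] in a local map that no other thread can see,
    // so no locking is needed while filling it, then publishes them with one putAll call.
    // Hashtable.putAll is synchronized on the table: one lock acquisition per range
    // instead of one synchronized block per key, and no extra lock on the key itself.
    static void insertRange(Hashtable<String, InfoHash> hellos, String[] pippo1, int start, int end) {
        Map<String, InfoHash> batchedHellos = new HashMap<String, InfoHash>();
        for (int i = start; i < end; i++) {
            batchedHellos.put(pippo1[i], new InfoHash("pippo" + i, "pluto" + i));
        }
        hellos.putAll(batchedHellos);
    }

    public static void main(String[] args) {
        String[] pippo1 = new String[2000];
        for (int i = 0; i < pippo1.length; i++) {
            pippo1[i] = "key" + i;
        }
        Hashtable<String, InfoHash> hellos = new Hashtable<String, InfoHash>();
        insertRange(hellos, pippo1, 0, 1000);    // like running "0 1000 N"
        insertRange(hellos, pippo1, 1000, 2000); // like running "1000 2000 N"
        System.out.println(hellos.size());       // prints 2000
    }
}
```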

The code you posted only adds new elements to the collection, so it does not really show Terracotta's advantages for field-level incremental changes. I'd suggest adding the following scenario:

1. insert all elements into the distributed collection
2. randomly choose a key from that collection
3. enter a synchronized block on that key instance
4. retrieve the collection element for that key
5. modify the retrieved instance
6. repeat steps 2-5 N times
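The six steps above can be sketched as follows, assuming a Hashtable as the shared collection and String keys. `RandomUpdateTest`, its methods, and the `updates` counter on `InfoHash` are hypothetical; the counter is there so that step 5 changes one field of an existing object instead of creating a new one, which is the field-level incremental change the reply refers to. The `keys` array holds the same String instances that were put into the map, so step 3 locks on the actual key object.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.Random;

// Hypothetical mutable stand-in for InfoHash; "updates" is an invented field so that
// step 5 modifies an existing object in place.
class InfoHash {
    String first;
    String second;
    int updates;

    InfoHash(String first, String second) {
        this.first = first;
        this.second = second;
    }
}

class RandomUpdateTest {
    // Step 1: insert all elements; returns the key instances for locking later.
    static String[] populate(Map<String, InfoHash> shared, int size) {
        String[] keys = new String[size];
        for (int i = 0; i < size; i++) {
            keys[i] = String.valueOf(i);
            shared.put(keys[i], new InfoHash("pippo" + i, "pluto" + i));
        }
        return keys;
    }

    // Steps 2-6: n times, pick a random key, lock on that key instance,
    // fetch the element and modify it in place.
    static void randomUpdates(Map<String, InfoHash> shared, String[] keys, int n, Random random) {
        for (int round = 0; round < n; round++) {
            String key = keys[random.nextInt(keys.length)]; // step 2
            synchronized (key) {                             // step 3
                InfoHash info = shared.get(key);             // step 4
                info.updates++;                              // step 5: field-level change
                info.second = "pluto" + round;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, InfoHash> shared = new Hashtable<String, InfoHash>();
        String[] keys = populate(shared, 1000);
        randomUpdates(shared, keys, 10000, new Random(42L));
        int total = 0;
        for (InfoHash info : shared.values()) {
            total += info.updates;
        }
        System.out.println(total + " field updates"); // prints "10000 field updates"
    }
}
```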

To make that even more realistic, your test could use a Map of Lists of InfoHash objects and randomly choose and update several InfoHash elements from the list instance for a given key.
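For the Map-of-Lists variant, one possible sketch locks on the list instance for the chosen key (a choice made here, not stated in the reply) and updates several randomly chosen elements inside that single synchronized block, so one transaction carries several small field changes. `MapOfListsTest`, `perRound` and the `updates` field are hypothetical.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical mutable stand-in for InfoHash ("updates" is an invented field).
class InfoHash {
    String first;
    String second;
    int updates;

    InfoHash(String first, String second) {
        this.first = first;
        this.second = second;
    }
}

class MapOfListsTest {
    // Maps keys "0".."keyCount-1" to lists of perKey InfoHash objects.
    static void populate(Map<String, List<InfoHash>> shared, int keyCount, int perKey) {
        for (int k = 0; k < keyCount; k++) {
            List<InfoHash> list = new ArrayList<InfoHash>();
            for (int j = 0; j < perKey; j++) {
                list.add(new InfoHash("pippo" + k + "-" + j, "pluto" + k + "-" + j));
            }
            shared.put(String.valueOf(k), list);
        }
    }

    // Each round picks a random key and, inside one synchronized block on that key's
    // list, updates perRound randomly chosen elements of the list.
    static void randomUpdates(Map<String, List<InfoHash>> shared, int keyCount,
                              int rounds, int perRound, Random random) {
        for (int r = 0; r < rounds; r++) {
            List<InfoHash> list = shared.get(String.valueOf(random.nextInt(keyCount)));
            synchronized (list) {
                for (int u = 0; u < perRound; u++) {
                    InfoHash info = list.get(random.nextInt(list.size()));
                    info.updates++;
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<InfoHash>> shared = new Hashtable<String, List<InfoHash>>();
        populate(shared, 1000, 10);
        randomUpdates(shared, 1000, 5000, 3, new Random(42L));
        int total = 0;
        for (List<InfoHash> list : shared.values()) {
            for (InfoHash info : list) {
                total += info.updates;
            }
        }
        System.out.println(total); // prints 15000 (5000 rounds x 3 updates)
    }
}
```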
***********

So, as SIyer suggested, we are available to discuss further, either here or on a call if you like.

Regards,

Hi, thanks for the reply.
I'll try to describe our intentions below.
The purpose of the code I posted is to see how Terracotta performs
under a stress test (for this reason I've chosen to synchronize on each iteration).
My application context is as follows:
I have a "cluster" (different JVMs on different hardware nodes) that shares a HashMap. The nodes access this structure concurrently to execute some services, reading and/or writing multiple keys in different pieces of code within a transaction for consistency, so I need to manage the transaction explicitly with synchronized blocks.
I need high performance (this application is a SIP application server in a telco environment). I'd like to understand whether the performance I've seen (which is not very good) is due to a configuration problem or, as you suggest, an implementation problem. Could you suggest how to handle this scenario and how to prepare a stress test for this context?
Do you have a benchmark of your system that I can read?
Let me know if you need more information.
Thanks.
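The pattern described in this post (several keys read and written inside one explicitly synchronized transaction) might look like the following sketch. It is hypothetical: `MultiKeyTransaction`, `transfer`, the Integer values and the key names stand in for whatever the SIP services actually store, and it assumes, as the first reply explains, that the end of a synchronized block is where Terracotta flushes the changes. Locking on the shared map makes the two-key update atomic, but if that monitor is clustered it also serializes every such transaction across all nodes, which is the inter-node contention the replies identify as the main scalability factor.

```java
import java.util.HashMap;
import java.util.Map;

class MultiKeyTransaction {
    // Hypothetical service operation touching two keys. Both reads and both writes
    // happen while holding the shared map's monitor, so any other block that also
    // synchronizes on the map sees either none or both of the changes.
    static boolean transfer(Map<String, Integer> shared, String from, String to, int amount) {
        synchronized (shared) {
            Integer fromValue = shared.get(from);
            Integer toValue = shared.get(to);
            if (fromValue == null || toValue == null || fromValue.intValue() < amount) {
                return false; // nothing was written
            }
            shared.put(from, fromValue.intValue() - amount);
            shared.put(to, toValue.intValue() + amount);
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final Map<String, Integer> shared = new HashMap<String, Integer>();
        shared.put("a", 1000);
        shared.put("b", 1000);
        // Two threads move amounts in opposite directions; the total must stay 2000.
        Thread forward = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 10000; i++) {
                    transfer(shared, "a", "b", 1);
                }
            }
        });
        Thread backward = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 10000; i++) {
                    transfer(shared, "b", "a", 1);
                }
            }
        });
        forward.start();
        backward.start();
        forward.join();
        backward.join();
        synchronized (shared) {
            System.out.println(shared.get("a") + shared.get("b")); // prints 2000
        }
    }
}
```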

Hello ebagini:

Apologies for the delay in responding.

1.
If you could enumerate your H/W (cpu, ram), O/S and VM heap parameters, it would help us run comparable tests.

2.
As you mentioned, this test is an extreme case, so the closer the test is to your real app, the better it will indicate what the clustering overhead would be. To that end, it would be useful to account for the nature of the operations on the HashMap (how many gets versus puts) and the extent of contention on those puts/gets across nodes (i.e. simultaneous access to the same key from different nodes).

3.
In this worst-case test, there are no other "configuration" optimizations that immediately jump out (apart from the ones previously mentioned, such as hashtable.put already being implicitly synchronized, over and above the synchronized block around the put in HelloWorld.java). That said, upcoming versions feature improvements for the case where the latency is due to object creation, as well as support for the Java 1.5 concurrent collections framework, which does well compared to plain old HashMaps with explicit synchronization around accessor methods.

4.
I will look into some of our internal benchmarks and find one that resembles this "extreme" case - and get back to you.
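Point 2 above asks for a test that reflects the real get/put mix and cross-node contention. A hypothetical driver for that could take the read percentage and the key range as parameters: giving each node a disjoint key range models no contention, while overlapping ranges model simultaneous access to the same keys. `MixedWorkload`, `run` and its parameters are illustrative only. Each call on the Hashtable is individually synchronized, so under DSO autolocking each operation would presumably be its own small transaction (an assumption, not something stated in the thread). `System.nanoTime` requires Java 5.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.Random;

class MixedWorkload {
    // Runs ops operations against the shared map. readPercent (0-100) sets the
    // get/put mix; keys are drawn from [keyOffset, keyOffset + keyRange), so
    // overlapping ranges on different nodes simulate contention on the same keys.
    // Returns {gets, puts}.
    static int[] run(Map<String, String> shared, int ops, int readPercent,
                     int keyOffset, int keyRange, long seed) {
        Random random = new Random(seed);
        int gets = 0;
        int puts = 0;
        for (int i = 0; i < ops; i++) {
            String key = String.valueOf(keyOffset + random.nextInt(keyRange));
            if (random.nextInt(100) < readPercent) {
                shared.get(key);
                gets++;
            } else {
                shared.put(key, "value-" + i);
                puts++;
            }
        }
        return new int[] {gets, puts};
    }

    public static void main(String[] args) {
        Map<String, String> shared = new Hashtable<String, String>();
        long startNanos = System.nanoTime();
        // 80% reads over keys 0..999; a second node could use the same range to add contention.
        int[] counts = run(shared, 100000, 80, 0, 1000, 1L);
        long elapsedMs = (System.nanoTime() - startNanos) / 1000000L;
        System.out.println(counts[0] + " gets, " + counts[1] + " puts in " + elapsedMs + " ms");
    }
}
```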

Best,
Iyer.

The Linux servers (one for the Terracotta server and one for the client) are HP DL380s running Red Hat Linux ES 3, each with 4 CPUs (Intel(R) Xeon(TM) CPU 2.80GHz) and 1 GB of RAM. The Windows XP Professional SP2 PC is an HP nc6120 with 512 MB of RAM and a 1.6 GHz Pentium M.
The VM options are: -Xms128M -Xmx128M

In the attachment you can find a document that explains the scenario.

Let me know if there is another way to implement this scenario for better performance.
Thanks.

Hello:
Apologies for the delay in responding. Hope you've received the email with the license-key etc.

Yes, the test case makes sense. There isn't anything else one can do with the config-file settings; what remains is refactoring the app to define synchronization boundaries appropriately, so as to minimize the number of round trips to the Terracotta server for lock acquisition/release.

Another suggestion is to try 2.0.1, which is now publicly available for download. It features performance improvements, especially for cluster-wide object creation, which your pseudo-code indicates your test does a lot of.

Thanks.