Terracotta Discussion Forums (legacy read-only archive)
Messages posted by: davidm
Thanks Ari.

you shouldn't get frustrated with any increase in resource utilization that you see 

hmm - are you saying that this type of CPU increase on a DSO client is expected? We are seeing a 2-3x CPU increase on the DSO clients even after I tuned the locking as follows:

Locking optimization: We were using ConcurrentHashMap for our caches and found that it generated a large number of locks, since our app reads the caches heavily under load. So I (temporarily) changed to a regular HashMap and added TC write locks around the cache write operations. I also closely examined the lock traces from the admin tool and removed whatever other unnecessary locking I could find. The only TC locks we have configured now are write locks - no read locks. I'm not sure that is OK for us on a long-term basis, but short term I just want to reduce the number of locks generated.
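
For reference, the tuned write path now looks roughly like this - a sketch only, with illustrative names (TunedCache, putInCache and getFromCache are not our real classes or methods):

Code:
import java.util.HashMap;
import java.util.Map;

// Sketch of the tuned write path (illustrative names). Writes synchronize
// on the map, so a tc-config <autolock> with <lock-level>write</lock-level>
// matching putInCache becomes a distributed write lock; reads take no TC
// lock at all - the short-term tradeoff described above.
class TunedCache {
    private static final Map<String, Object> cache = new HashMap<String, Object>();

    static void putInCache(String key, Object value) {
        synchronized (cache) {
            cache.put(key, value);
        }
    }

    static Object getFromCache(String key) {
        return cache.get(key); // no read lock configured
    }
}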

This strategy helped enormously in reducing the load on the TC server. The admin lock trace shows the number of locks greatly reduced, and the TC server now runs at ~5% CPU when we load test the app, which is great. But I think I am almost at the point where reducing locks further is going to be difficult.

Class instrumentation: So my original question was about reducing the amount of class instrumentation. Currently we instrument pretty much everything, with some exceptions.
Code:
<instrumented-classes>
  <include><class-expression>com.aplia.platform..*</class-expression></include>
  <!-- exclude aplia cglib classes -->
  <exclude>com.aplia.platform..DAO</exclude>
  <!-- But exclude Tomcat internals to improve performance of webapp load -->
  <exclude>org.apache.coyote..*</exclude>
  <exclude>org.apache.catalina..*</exclude>
  <exclude>org.apache.jasper..*</exclude>
  <exclude>org.apache.tomcat..*</exclude>
</instrumented-classes>
I read that this impacts class-loading time (which is fine for us), but could it also account for the DSO client CPU increase (2-3x) to the extent we are seeing? If so, I will take the time to prune the class instrumentation down to the bare minimum; a sketch of what that might look like is below.
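
If pruning does turn out to matter, a minimal config might include only the packages whose classes actually reach the shared cache graph. A sketch (the exact package list is a guess on my part and would need auditing):

Code:
<instrumented-classes>
  <!-- Sketch: include only packages that appear in the shared graph -->
  <include><class-expression>com.aplia.platform.cache..*</class-expression></include>
  <include><class-expression>com.aplia.platform.entity..*</class-expression></include>
  <include><class-expression>com.aplia.platform.site..*</class-expression></include>
  <include><class-expression>com.aplia.platform.problemset..*</class-expression></include>
</instrumented-classes>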

Your load generator (is it round robin, random, or sticky in nature?)  

For the load test we have 3 JBOSS app servers fronted by an F5 load balancer. The F5 is configured to use predictive load balancing but once a user is logged in the session is sticky via an F5 cookie.

Your cluster (not running in a cluster? Problem #1. Running in a cluster where other stuff is happening? Problem #2. etc.)  

I'm not sure I follow your meaning here. Everything runs on its own dedicated box: the JBoss app servers and the database are all dedicated servers, and the TC server runs on its own box in our load test.

With respect to the tuning docs - yes, I have read them, but I will go and review them again to make sure I didn't miss anything.
Hi - we have now spent quite a bit of time paring our tc-config.xml <locks> down to the bare minimum. We also used the admin tool to target code causing excessive locks. This has resulted in a very significant reduction in the CPU load of the TC server.

We have also achieved some reduction in the CPU load generated by the DSO clients (the JBoss app servers), but the app servers still exhibit roughly 2x the CPU load they had before we introduced TC.

What further optimizations can we employ to reduce the CPU overhead on the DSO clients? For example, how much mileage will we get from carefully pruning the number of classes instrumented by <instrumented-classes>? I read that reducing this number of classes improves startup time, but will it also reduce DSO client CPU?

Are there any other avenues we can pursue to reduce DSO client CPU? I am surprised we are seeing 2x DSO client CPU under heavy load testing. Is this expected, or do we have something wrong?


Thanks.
Ok thanks. This helps my understanding..

But it does not explain why we are seeing such a performance degradation of the app, since from what you just posted most reads of the cache will read the local JVM copy (the cache is pretty static). I will keep researching it..

BTW - when does the 2.6 version get released?
Steve - thanks for this info, it's helpful to understand at a very high level what DSO does internally.

Distributed locks are only used when you "tell us" to make a synchronized block into a distributed lock
Ok, just so I fully understand - you are saying that an application's read of a shared DSO cache that is not wrapped with a tc-config.xml <lock><lock-level>read will not generate a TC server round trip. Is that correct?

In that case I don't understand when, and by what mechanism, a DSO client's copy of the cache gets updated. i.e. how does a DSO client know that the copy of a cached object that was faulted into its local JVM was updated by another JVM? Can you describe a little more about how this works? From the 60,000-foot perspective I need to understand the cost of sharing these caches via DSO vs. just having them local to the JVM. I think this is the root of our performance issues..

Related to this: in our app (with the DSO usage I have already described in this post), when should we use <lock><lock-level>read? Is it correct that we never specify it in tc-config.xml, or should we be specifying it?

2) The admin tool gives great info on lock tracking. I am seeing a lot of read activity on the caches from the JSP layer of the app, so that might be an area to refactor.

Thanks..

Steve -

1) CPU bound - no, not maxed out on either the TC server or the clients. But the client CPU is much higher (at least 2-3x) with TC enabled vs. without. We are load testing with ~5000 virtual user threads hitting 3 application servers (the DSO clients). The TC server is running at about 50% CPU average, with higher spikes.

2) and 3) I have viewed these stats but need to run the load test again to really get them analysed and documented.

My central question is really about the operation of TC during the (many) cache reads that the application performs. Does each read of each field of each cached object require a TC server round trip to acquire a read lock? That alone seems like it would have the potential to degrade performance significantly (vs. local JVM cache access) if the number of shared-object reads is large.


Hi Himanshu - thanks for your response.

1) What is the datastructure you are using to share your data

We have 7 caches that hold business objects. Each cache uses a ConcurrentHashMap that maps a string key to the associated business object. The app grabs an object from the cache by key and then accesses various fields on the object. If the object is not in the cache, it is built from the database and then stored in the cache.

In many cases the values in the cache map themselves represent a fairly complex object graph, e.g. each business object may contain other collections and many other references to 'child' objects. As a hierarchical graph it can go down 5 or 6 levels of reference; a rough sketch is below.
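
To make the shape concrete, a rough sketch of the nesting (class and field names are illustrative, not our real model):

Code:
import java.util.List;

// Sketch of a cache value's graph (names illustrative).
class Assignment {                     // cache value: level 1
    List<Problem> problems;           // level 2
    Course course;                    // level 2
}

class Problem {
    List<AnswerOption> answerOptions; // level 3
    GenericState state;               // level 3; the graph continues below this
}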

2) What is your root

We have an ArrayList that contains the 7 caches. It is the single DSO root:
<root>
<field-name>com.aplia.platform.cache.DistributedCacheManager.distributedCacheList</field-name>
<root-name>DistributedCacheList</root-name>
</root>

and the code instantiates the various caches as follows..
distributedCacheList.add(new AssignmentCache());
distributedCacheList.add(new AssignmentCompletionCache());
distributedCacheList.add(new ContextCache());
etc.

3) How are you configuring your locks (are there unnecessary ones ?).

Possibly there are unnecessary locks, but I'm not sure. Firstly, we rely on the built-in TC ConcurrentHashMap support and do not explicitly configure <autolocks> for direct access to the keys or values in this map. Having read a business object from the cache, the application then mainly reads values from business object fields, but it occasionally updates the map value's object graph.

We then took care of these writes (by adding a TC <autolock>) for each UnlockedSharedObjectException ('Attempt to access a shared object outside the scope of a shared lock') as it occurred. The app is pretty old and in some cases thread synchronization was missing, so we added JVM synchronization as appropriate. After doing this we have ~10 <autolock> stanzas that specify write locks around the application code that updates the cache values' object graphs.

Under load there are a large number of read accesses to the caches, and I suspect this large volume of cache reads is causing the performance problems. Prior to Terracotta the caches existed separately on each of the 5 app server machines and were synchronized using JGroups. So before TC, reads of the caches were local JVM reads; after TC they are DSO shared-object reads.


*** So Question ***: What is the % overhead of a read from a DSO shared object vs. a local JVM read? I assume that for these read operations the thread must acquire a distributed read lock, which requires a TC server round trip? Is that correct? So every read access to every object in the cache graph generates a TC server round trip and lock acquisition, whereas previously it was just a local JVM cache read - do I have that correct?


4) What are the levels of locks (do you have unnecessary write locks?).

The only <autolocks> we explicitly specify are lock-level 'write', and from my understanding they are required to serialize updates to the cache object graph. We do not specify lock-level 'read' anywhere - should we?

5) What is the number of reads/writes per application transaction (average)?

I am not sure, but could figure it out if required. I think the number of reads is pretty large under the load test we are running; the admin console shows ~1500 tps on the TC server.

First, it would be great if you could review the tc-config.xml and check for anything dumb and obvious. If that shows up nothing, I can start to analyse the application's rate of cache access.

<tc:tc-config xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-3.xsd" xmlns:tc="http://www.terracotta.org/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!--Tell DSO where the Terracotta server can be found;
See the Terracotta DSO Guide for additional information.-->


<servers>
<!--
<server host="%i" name="sample">
<data>data/server-data</data>
<logs>logs/server-logs</logs>
</server>
-->

<server host="app4">
<data>data/server-data</data>
<logs>logs/server-logs</logs>
</server>
</servers>

<!--Tell DSO where to put the generated client logs
See the Terracotta DSO Guide for additional information.-->
<clients>
<logs>C:/Java/jboss-4.0.5.GA/server/default/log/%(webserver.log.name)</logs>
<!--
<modules>
<module name="clustered-apache-struts-1.1" version="1.0.0"/>
</modules>
-->
</clients>

<application>
<dso>

<locks>
<autolock>
<method-expression>* com.aplia.platform.cache.DistributedCacheManager.registerClusteredNode(..)</method-expression>
<lock-level>write</lock-level>
</autolock>
<autolock>
<method-expression>* com.aplia.platform.cache.DistributedCacheManager.createCaches(..)</method-expression>
<lock-level>write</lock-level>
</autolock>
<autolock>
<method-expression>* com.aplia.platform.cache.NRUCache$PruningThread.initializeSweep(..)</method-expression>
<lock-level>write</lock-level>
</autolock>
<autolock>
<method-expression>* com.aplia.platform.cache.NRUCache$PruningThread.updateSweep(..)</method-expression>
<lock-level>write</lock-level>
</autolock>


<!-- NRU-wrapped objects are in the shared root graph -->
<!--
<autolock>
<method-expression>* com.aplia.platform.cache.NRUCache$NRUCachedObject.getObject(..)</method-expression>
<lock-level>write</lock-level>
</autolock>
-->

<!-- Problem set object (and other shared objects?) updates some fields which are implemented by com.aplia..GenericState -->
<autolock>
<method-expression>* com.aplia.platform.GenericState.*(..)</method-expression>
<lock-level>write</lock-level>
</autolock>

<!-- entity is base class for many DSO shared objects that store field state. This method (below) is synchronized -->
<autolock>
<method-expression>* com.aplia.platform.entity.Entity.getColumnName(..)</method-expression>
<lock-level>write</lock-level>
</autolock>

<!-- *************** Course *************** -->

<!-- Course updates 'courseWeeks' field when Course is shared under the context cache root -->
<autolock>
<method-expression>* com.aplia.platform.site.Course.getCourseWeeks(..)</method-expression>
<lock-level>write</lock-level>
</autolock>

<autolock>
<method-expression>* com.aplia.platform.site.Course.getTextbook(..)</method-expression>
<lock-level>write</lock-level>
</autolock>

<!-- *************** ProblemSet *************** -->

<autolock>
<method-expression>* com.aplia.platform.problemset.ProblemSet.setProblemsValid(..)</method-expression>
<lock-level>write</lock-level>
</autolock>

<autolock>
<method-expression>* com.aplia.platform.problemset.ProblemSet.setOrderValid(..)</method-expression>
<lock-level>write</lock-level>
</autolock>

<autolock>
<method-expression>* com.aplia.platform.problemset.ProblemSet.getProblemsValid(..)</method-expression>
<lock-level>read</lock-level>
</autolock>

<autolock>
<method-expression>* com.aplia.platform.problemset.ProblemSet.getOrderValid(..)</method-expression>
<lock-level>read</lock-level>
</autolock>

<!-- *************** ProblemS *************** -->

<autolock>
<method-expression>* com.aplia.platform.problemset.MultiChoiceProblem.getAnswerOptions(..)</method-expression>
<lock-level>write</lock-level>
</autolock>

<!-- *************** Deleted *************** -->
<!--
<autolock>
<method-expression>* com.aplia.platform.cache.DistributedCacheManager.getCacheInstance(..)</method-expression>
<lock-level>write</lock-level>
</autolock>
-->
</locks>


<!--Our app requires these custom objects/classes to be shared - the following declarations
tell DSO which ones they are. When the app runs under DSO, instances of these classes
will broadcast changes in their state.

A good idiom when writing an app that you intend to cluster via TC DSO is to group the
classes you wish to share under a single package (although if you follow the MVC pattern
this tends to happen naturally) - this way the list of classes you wish to instrument
can be concise-->
<instrumented-classes>
<!--Include all classes for DSO instrumentation-->
<!--
<include><class-expression>*..*</class-expression></include>
-->
<!-- Include aplia classes since they are present in distributed caches and sessions -->
<include><class-expression>com.aplia.platform..*</class-expression></include>

<!-- exclude aplia cglib classes -->
<exclude>com.aplia.platform..DAO</exclude>
<!--<exclude>com.aplia.platform..*</exclude>-->

<!--But exclude Tomcat internals to improve performance of webapp load-->
<exclude>org.apache.coyote..*</exclude>
<exclude>org.apache.catalina..*</exclude>
<exclude>org.apache.jasper..*</exclude>
<exclude>org.apache.tomcat..*</exclude>

</instrumented-classes>

<!--Tell DSO which applications in your web container are using DSO-->
<!--
<web-applications>
<web-application>af</web-application>
<web-application>ROOT</web-application>
<web-application>root</web-application>
</web-applications>
-->

<!--specify the roots of the objects that are to be distributed -->

<roots>
<root>
<field-name>com.aplia.platform.cache.DistributedCacheManager.clusterNodeList</field-name>
<root-name>AppServerNodeList</root-name>
</root>
<root>
<field-name>com.aplia.platform.cache.DistributedCacheManager.distributedCacheList</field-name>
<root-name>DistributedCacheList</root-name>
</root>

<!--
<root>
<field-name>com.aplia.platform.cache.AssignmentCache.instance</field-name>
<root-name>AssignmentCache</root-name>
</root>
<root>
<field-name>com.aplia.platform.cache.AssignmentCompletionCache.instance</field-name>
<root-name>AssignmentCompletionCache</root-name>
</root>
<root>
<field-name>com.aplia.platform.cache.ContextCache.instance</field-name>
<root-name>ContextCache</root-name>
</root>
<root>
<field-name>com.aplia.platform.cache.FailedUserLoginsCache.instance</field-name>
<root-name>FailedUserLoginsCache</root-name>
</root>
<root>
<field-name>com.aplia.platform.cache.ProblemSetCache.instance</field-name>
<root-name>ProblemSetCache</root-name>
</root>
<root>
<field-name>com.aplia.platform.cache.TopicCache.instance</field-name>
<root-name>TopicCache</root-name>
</root>
<root>
<field-name>com.aplia.platform.cache.DueDateExtensionCache.instance</field-name>
<root-name>DueDateExtensionCache</root-name>
</root>
-->
</roots>

</dso>
</application>
</tc:tc-config>

Hi - we are trying TC with our app and have noticed a pretty dramatic slowdown in user response times when we load test the application after enabling it to use TC.

TC Server is running on a dedicated (reasonably powerful) 2 x dual core machine with 4GB memory.

The admin console shows a transaction rate of ~1500 per second. Is this considered 'high load' for TC Server?

What is the best way we can identify the 'low hanging fruit' in our setup that we can optimize to improve performance?

Thanks

Ok, great, thanks. This is good information..

Why not include it in the docs? (Or at least I did not find it described there so clearly.)

I'd imagine that a large percentage of TC usage in applications is for the Java collections classes. From what I understand, you are saying that developers can pretty much ignore the specification of TC locks in tc-config.xml when using those classes. If this is true, a statement to that effect in the docs seems like it would be worthwhile.
Ok - very interesting, that makes sense.

1) Which other classes are pre-instrumented like this?

2) So pre-instrumentation is convenient, but can you describe the locks that the pre-instrumentation generates? As the docs point out, one major issue for TC scalability is controlling excessive TC lock generation. Can you give me an idea of what locks the ConcurrentHashMap will generate?

3) What is the recommendation for code that needs to iterate over all the objects in the map, e.g. to remove unused entries? Typically we do that with an Iterator over the entire map, as in the sketch below. What sort of locks will the pre-instrumentation generate for that type of access? Are there better ways (to reduce TC locks) to perform a task like this?
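
To make question 3 concrete, the sweep we do today is shaped roughly like this (a sketch with illustrative names; isStale is a hypothetical staleness check):

Code:
import java.util.Iterator;
import java.util.Map;

// Sketch of the full-map sweep (illustrative). The open question is what
// distributed locks each next()/remove() generates under DSO.
static void prune(Map<String, Object> cache) {
    for (Iterator<Map.Entry<String, Object>> it = cache.entrySet().iterator(); it.hasNext();) {
        Map.Entry<String, Object> entry = it.next();
        if (isStale(entry.getValue())) { // hypothetical staleness check
            it.remove();
        }
    }
}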

I will also re-create the exception mentioned above and send that.
Thanks
Hi - we are getting started with TC with our web app and have a basic question on how best to configure the tc-config <locks> when using ConcurrentHashMap.
We use ConcurrentHashMap as a database cache in the classic way:

import java.util.concurrent.ConcurrentHashMap;

class DBCache {
    static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<String, Object>();

    static Object get(String key) {
        Object cached = cache.get(key);
        if (cached == null) {
            cached = getFromDb(key); // load the object from the database
            cache.put(key, cached);
        }
        return cached;
    }
}

Note that there is no explicit synchronized block in our code since ConcurrentHashMap provides the synchronization it requires but still allows a large degree of concurrent access. This highly concurrent behaviour is what we want.

So the question is: when this cache becomes distributed by DSO, what tc-config locks do we need to configure? During our initial config we actually configured no locks for the DBCache.get() method.

Question: With no locks configured we don't get exceptions from TC (during the cache.put()), and I don't understand why. From the documentation I understood that TC requires every update of a distributed object to be wrapped in a TC transaction, which is defined by a <lock> configured in tc-config.xml.

So: do we need <locks>, and if so what types? And if we do need locks, why are no exceptions thrown when they are absent?

When the cache is instantiated it also creates a thread that periodically prunes the cache (using cache.remove()), removing objects that have not been recently used. The code in this thread also has no TC <locks> configured in tc-config, yet this thread does hit an exception when it accesses the cache. (A sketch of the thread's shape is below.)
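
For concreteness, the pruning thread is shaped roughly like this (a sketch only; notRecentlyUsed is a hypothetical staleness check and our real code differs):

Code:
// Sketch of the pruning thread (illustrative). cache.remove() touches the
// shared map from a thread with no TC lock configured, and that access is
// where the exception appears.
class PruningThread extends Thread {
    public void run() {
        while (!isInterrupted()) {
            for (String key : DBCache.cache.keySet()) {
                if (notRecentlyUsed(key)) {    // hypothetical staleness check
                    DBCache.cache.remove(key); // the exception is raised here
                }
            }
            try {
                Thread.sleep(60000); // sweep once a minute (illustrative)
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}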

Thanks for your help..

OK, great thanks. That makes sense.

However in the same servlet (which is not clustered) we get a different exception with this code:-

public class DistributedCacheManager extends Servlet {

private static ArrayList<AbstractCache> distributedCacheList = new ArrayList<AbstractCache>();

where AbstractCache is an abstract class.

distributedCacheList is a root in tc-config.xml :-
<root>
<field-name>com.aplia.platform.cache.DistributedCacheManager.distributedCacheList</field-name>
<root-name>DistribuitedApplicationCaches</root-name>
</root>


Trace:-

[2008-03-11 17:23:45,965,STDERR] java.lang.ExceptionInInitializerError
[2008-03-11 17:23:45,980,STDERR] at sun.reflect.GeneratedSerializationConstructorAccessor39.newInstance(Unknown Source)
[2008-03-11 17:23:45,980,STDERR] at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
[2008-03-11 17:23:45,980,STDERR] at com.tc.object.TCObjectFactoryImpl.getNewPeerObject(TCObjectFactoryImpl.java:89)
[2008-03-11 17:23:45,980,STDERR] at com.tc.object.TCObjectFactoryImpl.getNewPeerObject(TCObjectFactoryImpl.java:67)
[2008-03-11 17:23:45,980,STDERR] at com.tc.object.ClientObjectManagerImpl.createNewPeer(ClientObjectManagerImpl.java:1094)
[2008-03-11 17:23:45,980,STDERR] at com.tc.object.ClientObjectManagerImpl.createNewPeer(ClientObjectManagerImpl.java:1055)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.TCObjectImpl.createPeerObjectIfNecessary(TCObjectImpl.java:179)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.TCObjectImpl.hydrate(TCObjectImpl.java:107)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.ClientObjectManagerImpl.lookup(ClientObjectManagerImpl.java:526)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.ClientObjectManagerImpl.lookupObject(ClientObjectManagerImpl.java:423)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.ClientObjectManagerImpl.lookupObject(ClientObjectManagerImpl.java:412)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.applicator.ListApplicator.hydrate(ListApplicator.java:60)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.TCClassImpl.hydrate(TCClassImpl.java:155)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.TCObjectImpl.hydrate(TCObjectImpl.java:112)
[2008-03-11 17:23:45,996,STDERR] at com.tc.object.ClientObjectManagerImpl.lookup(ClientObjectManagerImpl.java:526)
[2008-03-11 17:23:46,012,STDERR] at com.tc.object.ClientObjectManagerImpl.lookupObject(ClientObjectManagerImpl.java:423)
[2008-03-11 17:23:46,012,STDERR] at com.tc.object.ClientObjectManagerImpl.lookupRootOptionallyCreateOrReplace(ClientObjectManagerImpl.java:836)
[2008-03-11 17:23:46,012,STDERR] at com.tc.object.ClientObjectManagerImpl.lookupOrCreateRoot(ClientObjectManagerImpl.java:615)
[2008-03-11 17:23:46,012,STDERR] at com.tc.object.ClientObjectManagerImpl.lookupOrCreateRoot(ClientObjectManagerImpl.java:598)
[2008-03-11 17:23:46,012,STDERR] at com.tc.object.bytecode.ManagerImpl.lookupOrCreateRoot(ManagerImpl.java:287)
[2008-03-11 17:23:46,012,STDERR] at com.tc.object.bytecode.ManagerImpl.lookupOrCreateRoot(ManagerImpl.java:266)
[2008-03-11 17:23:46,027,STDERR] at com.tc.object.bytecode.ManagerUtil.lookupOrCreateRoot(ManagerUtil.java:130)
[2008-03-11 17:23:46,027,STDERR] at com.aplia.platform.cache.DistributedCacheManager.__tc_setdistributedCacheList(DistributedCacheManager.java)
[2008-03-11 17:23:46,027,STDERR] at com.aplia.platform.cache.DistributedCacheManager.<clinit>(DistributedCacheManager.java:24)
Hi - thanks for the very rapid response..

No - DistributedCacheManager is not a clustered object. It's actually a servlet.

clusterNodeList is a static field inside that servlet :-

private static ArrayList<ClusterNode> clusterNodeList;

David
Hi - I am getting this exception and do not understand why, since the code in the stack trace is within an <autolock>.

TC-CONFIG
-------------

<autolock>
<method-expression>* com.aplia.platform.cache.DistributedCacheManager.registerClusteredNode(..)</method-expression>
<lock-level>write</lock-level>
</autolock>

METHOD-CODE
-----------------
private static ArrayList<ClusterNode> clusterNodeList;

public synchronized void registerClusteredNode(ClusterNode node) {
    if (clusterNodeList == null) {
        clusterNodeList = new ArrayList<ClusterNode>();
    }
    clusterNodeList.add(node);
}

STACK TRACE
----------------

at com.tc.object.tx.ClientTransactionManagerImpl.getTransaction(ClientTransactionManagerImpl.java:278)
at com.tc.object.tx.ClientTransactionManagerImpl.checkWriteAccess(ClientTransactionManagerImpl.java:291)
at com.tc.object.bytecode.ManagerImpl.checkWriteAccess(ManagerImpl.java:681)
at com.tc.object.bytecode.ManagerUtil.checkWriteAccess(ManagerUtil.java:375)
at java.util.ArrayList.add(ArrayList.java)
at com.aplia.platform.cache.DistributedCacheManager.registerClusteredNode(DistributedCacheManager.java:33)
at com.aplia.platform.cache.DistributedCacheManager.init(DistributedCacheManager.java:53)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1105)
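
For reference, one variant I have been considering (a sketch only, unverified) synchronizes on the shared list itself rather than on the servlet instance, so that the monitor the autolock latches onto belongs to a shared object:

Code:
// Unverified sketch: lock on the shared root rather than the (unshared)
// servlet instance. Assumes the list is created eagerly so it is never null.
private static ArrayList<ClusterNode> clusterNodeList = new ArrayList<ClusterNode>();

public void registerClusteredNode(ClusterNode node) {
    synchronized (clusterNodeList) {
        clusterNodeList.add(node);
    }
}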
Taylor - this is indeed the same issue. Applying the changes outlined in that thread corrects it.

It will be great when all this additional info is available in the regular doc!

David
Hi - We have a problem that appears to be caused by introducing Terracotta into our load-balancing architecture: Apache/mod_jk loses its ability to keep sessions sticky, i.e. to route requests for existing sessions back to the worker node that created the session.

ARCHITECTURE - We are using Apache and mod_jk as a load balancer in front of JBoss 4.0.5 app servers (the worker nodes), which in turn use embedded Tomcat 5.5 as the servlet engine.

What we are trying to do is use Terracotta to replicate sessions between the JBOSS/Tomcat app servers.

To make sessions sticky in the Apache/mod_jk/Tomcat architecture we need to modify jbossweb-tomcat55.sar/jboss-service.xml to set the property <attribute name="UseJK">true</attribute>, which causes Tomcat to append the worker name to the session id. This is required so that Apache/mod_jk routes requests for existing sessions to the correct app server (worker node).

This is described in the JBoss document here:

http://www.jboss.org/wiki/Wiki.jsp?page=UsingMod_jk1.2WithJBoss

So this setting changes a session id like 1E6F8002396FE484925912 to 1E6F8002396FE484925912.worker1, and mod_jk then routes the request back to worker1, i.e. makes the session sticky.
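
For context, the appended worker name comes from the jvmRoute we set on the Tomcat Engine, roughly like this (a sketch; the value shown is illustrative):

Code:
<!-- jbossweb-tomcat55.sar/server.xml (sketch; value illustrative) -->
<Engine name="jboss.web" defaultHost="localhost" jvmRoute="worker1">
  ...
</Engine>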

Now, when we introduce TC into the environment, it appears that the worker name appended to the session id is removed.
TC is correctly replicating sessions between the worker nodes, but the omission of the worker name from the session id causes sessions to lose stickiness: they are not routed back to the original worker node.

Is there some additional configuration we need to make in TC, mod_jk or Tomcat to make this work correctly?

Thanks for your help.
David





 