Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Messages posted by: ssubbiah
Unfortunately we couldn't spot anything funny in the thread dump you sent.

Could you also take thread dumps of the L2 when the transaction rate flatlines? Take a series of 5-10 thread dumps while the txn rate is flatlined and send them along. Hopefully that will give us more information.

Also, can you please give us more info on your use case? Specifically, we would like to know how many L1s there are, the producer/consumer ratio, how many objects you are adding to the queue, the rough size/type of the objects, etc.

Oops, sorry about that. I edited the post to link to the correct one. That post talks about Eclipse, but it holds true for regular VMs too.

Let me know if you are still having trouble getting a thread dump.
Sure, post your tc-config along with the server and client logs.

To take a thread dump on Linux/Solaris, you can issue the following command:

kill -3 <pid>

where pid is the process id of your Java process (here, the L2 and L1 pids).
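If you need a series of dumps, a small shell loop works too (the count and interval here are only suggestions; adjust them to taste):

for i in 1 2 3 4 5 6 7 8 9 10; do kill -3 <pid>; sleep 5; done

The dumps go to the JVM's stdout/console log, so grab that file afterwards.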

To take a thread dump on Windows, check out this thread: http://forums.terracotta.org/forums/posts/list/803.page#4772

Yes, it takes some time to instrument a lot of classes, but it is a one-time penalty you pay. There is also some overhead in executing an instrumented class, since we have to check whether the object is shared or not. Then there is the overhead of publishing the data to the cluster.

That being said, it does look like there is something wrong with the timings that you are seeing.

Can you take some thread dumps of the L2 and the L1 when the transaction rate flatlines?

Also, if you can share your app, that would help too. My email is ssubbiah at terracottatech dot com.

Cheers
Hi Darin,

We have pushed a fix for a problem with a similar symptom, which we were able to reproduce in our test environment. Please try our latest nightly builds from trunk to see if it helps your case.

thanks,
Saravanan
Hi Darin,

I think you may have stepped on a bug that we are currently fixing in 2.6 (trunk). I imagine we will be pushing the fix into trunk in a couple of days. I will update this thread when we do.

You could then maybe try it out with a nightly build to see if it fixes your problem.

thanks,
Saravanan
One more question: are these machines bass and guinness some kind of virtual hosts running on the same physical machine? It seems like, and I am only guessing here, there is some issue with the network setup.

One thing I noticed in the logs is that the bass server log initially prints the server name as bass.wtcdev.com, but later in the logs the server name is resolved as bass.wtdev.com (note that wtcdev becomes wtdev):


2008-02-25 11:15:37,309 [main] INFO com.tc.l2.ha.L2HACoordinator - This L2 Node ID = NodeID[tcp://bass.wtcdev.com:9530]

...

2008-02-25 11:15:43,419 [pool-1-thread-1] WARN com.tc.net.groups.TribesGroupManager - Message from non-existing member org.apache.catalina.tribes.membership.MemberImpl[tcp://bass.wtdev.com:9530,bass.wtdev.com,9530, alive=1203956141321,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 }, payload={}, command={}, domain={}, ] . Adding this node to nodes = {NodeID[tcp://guinness.wtcdev.com:9530]=[ NodeID[tcp://guinness.wtcdev.com:9530] => org.apache.catalina.tribes.membership.MemberImpl[tcp://guinness.wtcdev.com:9530,guinness.wtcdev.com,9530, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 }, payload={}, command={}, domain={}, ] ]}

 



One other reason I think this may be due to some funny network setup issue is that you are able to reproduce this consistently, while we are seeing this issue for the first time.

More details about your network setup might help. One thing I would like you to try is to see what nslookup reports for each server name from each of the servers, as in the example below.
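For example, from each of the two boxes:

nslookup bass.wtcdev.com
nslookup bass.wtdev.com
nslookup guinness.wtcdev.com

If the two spellings resolve differently, or the wtdev name resolves at all, that could explain the name mismatch in the logs.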

Another thing (if these are in fact virtual hosts) is that this may be due to the fact that we bind to 0.0.0.0 by default. Can you try specifying a bind address in the tc-config for each server so that we bind only to those interfaces?

Let us know the results of these.

thanks,
Saravanan
Hi jhaile,

Can you please attach the logs from both servers? From what you posted, it does look a little weird. That assertion is thrown because it looks like the node that sent the message also ended up receiving it. Full logs may give us more clues.

Is this consistently reproducible? BTW, we are revamping the communication stack in 2.6, so this issue should not be present there.


thanks,
Saravanan

Terracotta always uses transactions when writing to Berkeley DB JE. In persistent mode the transactions are made durable and hence should be on disk on commit, so I don't think you have to call sync() explicitly.

Also, calling Environment.sync() only works if you are calling it in-process with the L2, if I am not wrong.

You can tune any JE parameter through tc.properties. You just have to prepend "l2.berkeleydb." to the JE property name.
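For example, to give JE a bigger cache you could add something like the following to tc.properties (je.maxMemoryPercent is a standard JE property; the value here is only an illustration):

l2.berkeleydb.je.maxMemoryPercent = 50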

Cheers,
Saravanan
I did some testing on my Fedora box and thought I would post some results here.

Check out http://ipsysctl-tutorial.frozentux.net/chunkyhtml/tcpvariables.html

In particular, tcp_retries2 seems to be of interest here. The default value for tcp_retries2 on my Fedora box was 15. Looking at the packets with tcpdump, you can see that once you pull the plug, the stack tries to resend the data that many times, each time increasing the interval between resends.
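If you want to watch the retransmissions yourself, tcpdump on the client box will show them; the interface and host below are just placeholders for your setup:

tcpdump -i eth0 host <l2-host>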

When I reduced the value of tcp_retries2 to 3 on my Fedora box, I got good results: the L1s failed over to the other L2 before the reconnect window closed, and the cluster proceeded forward.

echo 3 > /proc/sys/net/ipv4/tcp_retries2
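To make the change survive a reboot, you can also put it in /etc/sysctl.conf (plain Linux behaviour, nothing Terracotta specific):

net.ipv4.tcp_retries2 = 3

and reload it with "sysctl -p".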

Note that tcp_retries2 together with the retransmission timeout determines the actual timeout, and the retransmission timeout is connection dependent. Also, be careful while setting this value: on an unreliable network with a lot of packet collisions etc., packets may legitimately need to be resent many times.

I am sure someone already posted this: we are coming up with a new L1-L2 health check feature in 2.6 so that users don't have to tweak these TCP settings anymore to get this right.

cheers,
Saravanan
Hi dpope,

The problem that you hit with 2.5.0 seems different from the one that was originally posted, even though the symptom looks similar.

Unfortunately, in this case the logs didn't give us any clue as to why it happened. One possible reason could be that the machine was so overloaded and unresponsive (because of 100% CPU, swapping, etc.) that the transaction timed out. Do you think that could be the case?

Is this reproducible? If so, would you be able to share the app with us so that we can debug this scenario? You can contact me directly at ssubbiah at terracottatech dot com if you prefer.

thanks,
Saravanan


We have been floating around the idea of multithreading the apply stage for some time, and the idea is very similar to the one you are proposing, except that there would be an n:m ratio between the number of clients and the apply threads, and transactions from a given client would always be assigned to the same thread, thus maintaining transaction ordering.

We haven't implemented this idea yet since we have never hit a scenario where the apply stage is the bottleneck. Depending on the use case, the bottleneck turns out to be the disk, Sleepycat, the network, or lock contention. I can imagine the apply stage becoming the bottleneck when you make many small transactions to the same set of objects in a loop.


On the other hand, the CPU usage never goes beyond 25% on my 8-core machine, no matter how many transactions are committed from the clients.

Are you looking at the overall CPU usage? Can you look at each individual CPU's usage? You can use nmon or a similar tool to look at it. It will be interesting to see if one or two cores are pinned at 100% usage, in which case your use case may benefit from a multithreaded apply stage.
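If you have the sysstat package installed, mpstat also gives a quick per-core view (5-second samples here):

mpstat -P ALL 5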

One thing to note is that even in non-persistent mode you may be hitting the disk if your data doesn't fit in memory, so the disk may be the bottleneck too. Once again, nmon is a great tool for looking at your disk throughput.



The precondition is that a client should not enter object A's synchronization block until all the transactions on A are fully applied in its VM, no matter whether the transactions modify A or not.

This is exactly what is happening in the client right now.

If you think you have a strong use case for multithreading the apply stage, please share your app with us if possible. We will be glad to look at it.

thanks,
Saravanan

We are trying to understand how this happened. The log files give us some clues, but not enough to pinpoint the problem. Some more information would be useful.

1) It seems like the L2 servers were stopped and restarted multiple times. Looking at log file 10, I see that the 192.168.100.102 machine joined and left the cluster multiple times. Can you please gzip and send us all the logs of the prior runs from both L2s, if you have them?

2) Did you have the cluster up and running for a long time, or did this happen immediately? I see that the global transaction ID reached 345840. Do you think about that many transactions were created by the L1 before this happened? This could be a very important clue in understanding this problem.

3) Is this problem reproducible? If so, can you give us the steps?

4) Is your application using DMI?

5) Do you have the Sleepycat data files (*.jdb files) from this run?

Any other information that you can provide would also be very useful.

thanks,
Saravanan
Hi,

This is not expected behavior. In fact, I suspect this may be some environment issue; it is hard to tell from just this exception.

Can you please attach the entire server logs (active & passive) along with the client (L1) logs?

Also, are there some steps we can follow to reproduce this?

thanks,
Saravanan
The original problem that you saw was due to a transient communication error between the ACTIVE and PASSIVE servers.


2007-11-20 10:56:02,775 [WorkerThread(group_events_dispatch_stage,0)] WARN com.terracottatech.console - NodeID[tcp://172.42.1.191:9530] left the cluster
2007-11-20 10:56:02,775 [WorkerThread(group_events_dispatch_stage,0)] WARN com.terracottatech.console - NodeID[tcp://172.42.1.191:9530] left the cluster
2007-11-20 10:56:02,776 [WorkerThread(channel_life_cycle_stage,0)] INFO com.tc.objectserver.handler.ChannelLifeCycleHandler - Received transport disconnect. Shutting down client ChannelID=[-100]
2007-11-20 10:56:02,784 [WorkerThread(channel_life_cycle_stage,0)] INFO com.tc.objectserver.persistence.impl.TransactionStoreImpl - shutdownClient() : Removing txns from DB : 0
2007-11-20 10:56:02,788 [WorkerThread(group_events_dispatch_stage,0)] INFO com.tc.l2.ha.L2HACoordinator - NodeID[tcp://172.42.1.191:9530] joined the cluster
2007-11-20 10:56:02,788 [WorkerThread(group_events_dispatch_stage,0)] INFO com.terracottatech.console - NodeID[tcp://172.42.1.191:9530] joined the cluster
 



We use Tribes internally for group communication between the L2s, and I think that in rare cases the health check between the active and passive servers gives false positives. The exception that you see is just a side effect of that.

We are currently looking at ways of solving this.

Meanwhile, the only time we have been able to reproduce this is when there were long GC pauses in the L2 because the machine was running out of resources and swapping to disk (full GCs on the order of minutes). Once we tuned the memory settings, the problem was no longer reproducible.

Did you get this when your server was doing a long full GC? You can run jstat, look at the GC times, and see if you can tune the VM to have shorter GC cycles.
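For example (the 5-second interval is just a suggestion):

jstat -gcutil <pid> 5000

The FGCT/GCT columns show cumulative GC time, so you can see whether long full GCs line up with the disconnects.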

As far as 2.5-stable1 is concerned, it is not a final release yet. We are doing a lot of testing and will be happy to fix issues if you give us details. Can you please send us the stack trace and logs, or open a JIRA?

thanks,
Saravanan
 