I have 2 Terracotta session servers (v3.70) running in active/standby mode.
There are 14 app servers running Tomcat 6.0 and connecting to the session servers.
Problem:
The Tomcat servers sometimes crash and are unable to recover. This has happened on prod a few times, and I have been able to reproduce it in our test env. Load seems to be a contributing factor, but it is not the root cause: sometimes the problem happens when the load is normal.
Based on what I see in the TC server and client log files, I believe this has something to do with the Terracotta setup.
I've attached the following info:
1. TC server log
2. TC client log
3. Tomcat thread dump
4. TC config
Below are some of the errors I found after a load test. The problem starts after 13:40 in the log file.
2012-10-01 13:41:45,960 [L1_L2:TCComm Main Selector Thread_R (listen 0.0.0.0:64336)] ERROR com.tc.net.protocol.transport.TransportHandshakeErrorHandlerForL1 - com.tc.net.protocol.transport.TransportHandshakeErrorContext: com.tc.net.protocol.transport.TransportHandshakeErrorContext: "Client Cannot Reconnect. ConnectionID(17.c8dd256f25c04e5eabe8bfff0c9ffbea.31348ea2-7275-420a-9e1f-e41b63c9fadc-13a1a1eea42)[] not found. Connection attempts from the Terracotta node at 10.32.4.81:39190 are being rejected by the Terracotta server array."
Message Class: com.tc.net.protocol.transport.TransportMessageImpl
Sealed: true, Header Length: 32, Data Length: 426, Total Length: 458
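To see which of the app servers are being rejected and when, the logs can be scanned for this error. The following Python is only a rough sketch: it assumes every rejection line follows the format of the one quoted above, and the names `REJECT_RE`, `find_rejections` and `count_by_host` are made up for illustration, not part of Terracotta.

```python
import re
from collections import Counter

# Matches the "Client Cannot Reconnect" error in the format quoted above.
# Assumption: the timestamp is at the start of the line and the node
# address is an IPv4 address followed by a port.
REJECT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Client Cannot Reconnect\. ConnectionID\((?P<conn>[^)]*)\).*"
    r"Terracotta node at (?P<host>[\d.]+):(?P<port>\d+)"
)


def find_rejections(lines):
    """Yield (timestamp, host, connection_id) for each rejection line."""
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            yield match.group("ts"), match.group("host"), match.group("conn")


def count_by_host(lines):
    """Count rejected reconnect attempts per client host."""
    return Counter(host for _, host, _ in find_rejections(lines))
```

Passing an open log file to `count_by_host` gives a per-host tally, which should show whether the rejections come from one app server or from all of them.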