deepan_c
neo
Joined: 08/15/2012 07:22:18
Hi,
We are new to Terracotta 3.5.4, which we use for clustering Software AG's webMethods server.
We run two Terracotta servers in networked active-passive mode. The cluster is not very stable (we may well have the configuration wrong):
[list]
1. Frequent SPLIT BRAIN issues; one of the servers frequently drops out of the cluster.
2. The logs show long GC warnings:
Code:
2012-08-12 00:55:01,173 [L2_L1:TCComm Main Selector Thread_R (listen 193.9.194.45:9510)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - dkcph-wmprd1.dk.dfds.root:62189 might be in Long GC. Ping-probe cycles completed since last reply : 1
3. Recently we had a problem with both servers:
a. Server1: after a SPLIT BRAIN scenario it shut down and did not restart automatically.
Code:
2012-08-12 01:02:48,834 [WorkerThread(l2_state_change_stage, 0)] INFO com.tc.objectserver.tx.ServerTransactionManager - Waiting for txns to complete
2012-08-12 01:02:48,834 [WorkerThread(l2_state_change_stage, 0)] INFO com.tc.objectserver.tx.ServerTransactionManager - No more txns in the system.
2012-08-12 01:02:48,836 [WorkerThread(l2_state_change_stage, 0)] INFO com.tc.objectserver.tx.ResentTransactionSequencer - Making callback com.tc.objectserver.gtx.GlobalTransactionIDLowWaterMarkProvider$2@7ac0a26f pending since in State[ ADD_RESENT ] resent txns size : 0
2012-08-12 01:02:54,801 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - State[ ACTIVE-COORDINATOR ] received Zap Node request from another State[ ACTIVE-COORDINATOR ]
NodeID : NodeID[193.9.194.45:9510] Error Type : Two or more Active servers detected in the cluster Details : State[ ACTIVE-COORDINATOR ] Received Election Won Msg : L2StateMessage [ NodeID[193.9.194.44:9510], type = ELECTION_WON_ALREADY, Enrollment [ NodeID[193.9.194.44:9510], isNew = false, weights = 9223372036854775807,9223372036854775807 ]]. A Terracotta server tried to join the mirror group as a second ACTIVE
2012-08-12 01:02:54,802 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 0
2012-08-12 01:02:54,803 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - A Terracotta server tried to join the mirror group as a second ACTIVE : My weights = 0,-9223372036854775808,0,212247656539430,-4422222123843855360 Other servers weights = 5,-48487641866,15220,212247656604966,-927924186950064572
2012-08-12 01:02:54,804 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : dkcph-wmprd1 Subsystem: CLUSTER_TOPOLOGY Message: SPLIT BRAIN, dkcph-wmprd1 and dkcph-wmprd2 are ACTIVE
2012-08-12 01:02:54,804 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - NodeID[193.9.194.45:9510] wins : Backing off : Exiting !!!
2012-08-12 01:02:54,804 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : dkcph-wmprd1 Subsystem: CLUSTER_TOPOLOGY Message: NodeID[193.9.194.45:9510] has more clients. Exiting!!
2012-08-12 01:02:54,806 [WorkerThread(receive_group_message_stage, 0)] ERROR com.terracottatech.dso - Marking the object db as dirty ...
2012-08-12 01:02:54,828 [WorkerThread(receive_group_message_stage, 0)] ERROR com.terracottatech.console - This Terracotta server instance shut down because of a conflict or communication failure with another Terracotta server instance. The database must be manually wiped before it can be started and allowed to rejoin the cluster.
2012-08-12 01:02:54,828 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.server.TCServerMain - ExitState : CallbackOnExitState[Throwable: class com.tc.exception.ZapServerNodeException; RestartNeeded: true]; AutoRestart: true
2012-08-12 01:02:56,830 [CommonShutDownHook] INFO com.terracottatech.dso - L2 Exiting...
b. Server2, which had been running fine, went down when the disk ran out of space. When we checked, the data file was more than 14 GB in size and had filled the disk it is installed on. This happened while there was no load on the server (this is a new production environment, not yet live).
[/list]
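To line up the events above across both server logs, a small script along the following lines might help. It is only a sketch, not a Terracotta tool: the names `scan_events`, `summarize`, and `MARKERS` are made up for illustration, and the marker strings are assumed from the log excerpts quoted above, so they may not cover every relevant message in the full logs.

```python
import re
import sys
from collections import Counter

# Marker substrings copied from the log excerpts quoted in this post.
# Assumption: the rest of the log files use the same wording.
MARKERS = {
    "long_gc": "might be in Long GC",
    "split_brain": "SPLIT BRAIN",
    "zap_request": "received Zap Node request",
    "dirty_db": "Marking the object db as dirty",
}

# Timestamp prefix as it appears in the excerpts, e.g. 2012-08-12 01:02:54,804
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def scan_events(lines):
    """Return (timestamp, event_name) pairs for lines that contain a marker."""
    events = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # continuation line without its own timestamp
        for name, needle in MARKERS.items():
            if needle in line:
                events.append((match.group(1), name))
    return events


def summarize(events):
    """Count how often each event type occurred."""
    return Counter(name for _, name in events)


if __name__ == "__main__":
    # Usage: python scan_tc_logs.py terracotta-server1.log terracotta-server2.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            events = scan_events(handle)
        print(path, dict(summarize(events)))
        for stamp, name in events:
            print(f"  {stamp}  {name}")
```

Running it over both attached logs and comparing timestamps would show whether the Long GC warnings tend to come shortly before each SPLIT BRAIN event, which is one plausible (but unconfirmed) link between issues 1 and 2.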
Attached the logs for both servers (1 and 2), along with tc-config.xml.
Please let me know whether we have any configuration wrong, or whether we should tune some parameters to address the issues mentioned above.
Regards
Deepan
| Filename | Description | Filesize | Downloaded |
| terracotta-server2.log | Server 2 log file. This server went down after running out of disk space; its data file had grown to 15 GB. | 1908 Kbytes | 22 time(s) |
| tc-config.xml | Terracotta config XML | 7 Kbytes | 95 time(s) |
| terracotta-server1.log | Server 1 log file. This server didn't come back up after the SPLIT BRAIN scenario. | 248 Kbytes | 116 time(s) |