Terracotta Discussion Forums
Terracotta not stable. Very huge data file
deepan_c

neo

Joined: 08/15/2012 07:22:18
Messages: 1

Hi,
We are new to Terracotta and are using version 3.5.4 to cluster Software AG's webMethods server.
We run two Terracotta servers in networked active-passive mode. The setup is not stable (we may well have the configuration wrong).
[list]
1. Frequent SPLIT BRAIN events, after which one of the servers drops out of the cluster.
2. The logs show long GC warnings:
Code:
2012-08-12 00:55:01,173 [L2_L1:TCComm Main Selector Thread_R (listen 193.9.194.45:9510)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - dkcph-wmprd1.dk.dfds.root:62189 might be in Long GC. Ping-probe cycles completed since last reply : 1
 

3. Recently we had the following problems with the servers:
a. Server 1 shut down after a SPLIT BRAIN scenario and did not restart automatically.

Code:
2012-08-12 01:02:48,834 [WorkerThread(l2_state_change_stage, 0)] INFO com.tc.objectserver.tx.ServerTransactionManager - Waiting for txns to complete
 2012-08-12 01:02:48,834 [WorkerThread(l2_state_change_stage, 0)] INFO com.tc.objectserver.tx.ServerTransactionManager - No more txns in the system.
 2012-08-12 01:02:48,836 [WorkerThread(l2_state_change_stage, 0)] INFO com.tc.objectserver.tx.ResentTransactionSequencer - Making callback com.tc.objectserver.gtx.GlobalTransactionIDLowWaterMarkProvider$2@7ac0a26f pending since in State[ ADD_RESENT ] resent txns size : 0
 2012-08-12 01:02:54,801 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - State[ ACTIVE-COORDINATOR ] received Zap Node request from another State[ ACTIVE-COORDINATOR ]
 NodeID : NodeID[193.9.194.45:9510] Error Type : Two or more Active servers detected in the cluster Details : State[ ACTIVE-COORDINATOR ] Received Election Won Msg : L2StateMessage [ NodeID[193.9.194.44:9510], type = ELECTION_WON_ALREADY, Enrollment [ NodeID[193.9.194.44:9510], isNew = false, weights = 9223372036854775807,9223372036854775807 ]]. A Terracotta server tried to join the mirror group as a second ACTIVE
 2012-08-12 01:02:54,802 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.net.core.TCConnectionManager - Active connections : 0 out of 0
 2012-08-12 01:02:54,803 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - A Terracotta server tried to join the mirror group as a second ACTIVE : My weights = 0,-9223372036854775808,0,212247656539430,-4422222123843855360 Other servers weights = 5,-48487641866,15220,212247656604966,-927924186950064572
 2012-08-12 01:02:54,804 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : dkcph-wmprd1  Subsystem: CLUSTER_TOPOLOGY Message: SPLIT BRAIN, dkcph-wmprd1 and dkcph-wmprd2 are ACTIVE
 2012-08-12 01:02:54,804 [WorkerThread(receive_group_message_stage, 0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - NodeID[193.9.194.45:9510] wins : Backing off : Exiting !!!
 2012-08-12 01:02:54,804 [WorkerThread(receive_group_message_stage, 0)] FATAL tc.operator.event - NODE : dkcph-wmprd1  Subsystem: CLUSTER_TOPOLOGY Message: NodeID[193.9.194.45:9510] has more clients. Exiting!!
 2012-08-12 01:02:54,806 [WorkerThread(receive_group_message_stage, 0)] ERROR com.terracottatech.dso - Marking the object db as dirty ...
 2012-08-12 01:02:54,828 [WorkerThread(receive_group_message_stage, 0)] ERROR com.terracottatech.console - This Terracotta server instance shut down because of a conflict or communication failure with another Terracotta server instance. The database must be manually wiped before it can be started and allowed to rejoin the cluster.
 
 2012-08-12 01:02:54,828 [WorkerThread(receive_group_message_stage, 0)] INFO com.tc.server.TCServerMain - ExitState : CallbackOnExitState[Throwable: class com.tc.exception.ZapServerNodeException; RestartNeeded: true]; AutoRestart: true
 2012-08-12 01:02:56,830 [CommonShutDownHook] INFO com.terracottatech.dso - L2 Exiting...
 

b. Server 2, which had been running fine, went down when its disk ran out of space. When we checked, the data file was more than 14 GB and had filled the disk it is installed on. This happened while there was no load on the server (this is a new production environment that is not yet live).
[/list]

I have attached the logs for both servers (1 and 2), along with tc-config.xml.

Please let us know whether any of our configuration is wrong, or whether we should tune some parameters to address the issues above.

Regards
Deepan
 Filename terracotta-server2.log
 Description Server 2 log file. This server went down after running out of disk space. Its data file had reached 15 GB.
 Filesize 1908 Kbytes

 Filename tc-config.xml
 Description Terracotta config XML
 Filesize 7 Kbytes

 Filename terracotta-server1.log
 Description Server 1 log file. This server did not restart after the SPLIT BRAIN scenario.
 Filesize 248 Kbytes

ericm

jedi

Joined: 01/27/2011 17:23:34
Messages: 117

It looks like you had a network disruption around 1 AM on 8/12/2012.

2012-08-12 01:01:58,003 [L2_L1:TCWorkerComm # 0_R] INFO com.tc.net.core.TCConnection - error reading from channel java.nio.channels.SocketChannel[connected local=/193.9.194.45:9510 remote=/193.9.194.44:62189]: An existing connection was forcibly closed by the remote host

Then everything goes south from there.
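
One way to check that ordering yourself is to pull the relevant lines out of each server log and sort them by timestamp. Below is a minimal sketch in Python, assuming the line layout seen in the excerpts in this thread (`timestamp [thread] LEVEL logger - message`). The `MARKERS` list and the `timeline` helper are hypothetical names chosen for illustration; the marker strings are copied from the quoted log lines and may need extending for other failure modes.

```python
import re
from datetime import datetime

# Layout assumed from the log lines quoted in this thread:
#   2012-08-12 01:02:54,804 [thread name] LEVEL logger - message
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>.*?)\] (?P<level>[A-Z]+) (?P<rest>.*)$"
)

# Substrings taken from the excerpts above (illustrative, not exhaustive).
MARKERS = (
    "might be in Long GC",
    "forcibly closed",
    "Zap Node request",
    "SPLIT BRAIN",
    "L2 Exiting",
)


def timeline(lines, markers=MARKERS):
    """Return (timestamp, level, marker) tuples sorted by time."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # continuation lines carry no timestamp; skip them
        hit = next((m for m in markers if m in match.group("rest")), None)
        if hit is not None:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, match.group("level"), hit))
    return sorted(events)
```

Passing the lines of both server logs to `timeline` (for example, the contents of the two attached log files concatenated into one list) gives a single ordered view, which makes it easier to see whether the dropped connection precedes the SPLIT BRAIN message.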

Are you running in a virtual environment?
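
The question matters because a paused virtual machine (or a long GC, as the ConnectionHealthCheckerImpl warning in the first post hints) can look like a dead peer to the health checker. As a rough sketch, and assuming the worst-case detection formula from Terracotta's high-availability documentation as I recall it, the window can be estimated from the health-check settings. The parameter names below are shortened stand-ins for the `l2.healthcheck.l2.*` properties (`ping.idletime`, `ping.interval`, `ping.probes`, `socketConnectTimeout`, `socketConnectCount`); please verify both the names and the formula against the documentation for 3.5.4 before relying on them.

```python
def max_detection_ms(idle_ms, interval_ms, probes, connect_timeout, connect_count):
    """Estimate the worst-case time before a silent peer is declared dead.

    Assumed formula (to be verified against the Terracotta HA docs):
      idletime + socketConnectCount *
          (ping.interval * ping.probes + socketConnectTimeout * ping.interval)
    """
    per_cycle_ms = interval_ms * probes + connect_timeout * interval_ms
    return idle_ms + connect_count * per_cycle_ms


# Illustrative values only; they are NOT taken from the attached tc-config.xml.
example_settings = dict(
    idle_ms=5000,        # ping.idletime
    interval_ms=1000,    # ping.interval
    probes=3,            # ping.probes
    connect_timeout=5,   # socketConnectTimeout (a multiple of ping.interval)
    connect_count=10,    # socketConnectCount
)
window_ms = max_detection_ms(**example_settings)
```

If pauses on the hosts can approach `window_ms`, the two servers may each conclude the other is gone, which would be consistent with the two-ACTIVE situation in the log.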

Eric Mizell (Terracotta Engineer)
 