| Author |
Message |
05/08/2008 22:43:19
|
cljhyjs
journeyman
Joined: 05/07/2008 03:22:42
Messages: 10
Offline
|
Hi, I have two servers running as a 2.6-stable4 cluster. When I shut down the active server, the other server became active, but it printed the message below in the console:
"2008-05-09 06:35:40,949 INFO - Unable to find communications stack. ConnectionID(2.e3efe3c35a364bcf9647f0271fad1554) not found. This is usually caused by a client from a prior run trying to illegally reconnect to the server. While that client is being rejected, everything else should proceed as normal. "
Why?
|
|
|
 |
05/09/2008 08:24:47
|
zeeiyer
consul
Joined: 05/24/2006 14:28:28
Messages: 493
Offline
|
This perhaps means one of your client JVMs did not connect to the standby Terracotta server and is being rejected from joining the cluster - is that what you observed?
If so, you have to look into your client-reconnect-window (in tc-config.xml) and l2.l1reconnect settings (in tc.properties), and at what your client and server were doing when this happened, to find out why one of your client JVMs was not able to connect to the standby server.
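To check which l2.l1reconnect values a given tc.properties file actually contains, a small sketch like the following could be used. It only filters properties by the `l2.l1reconnect` prefix mentioned above. The two key names and their values in the sample text are assumptions for illustration, not taken from this thread, so verify them against the tc.properties shipped with your Terracotta version.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.TreeMap;

public class ReconnectSettings {
    /** Loads properties from text in the usual key = value layout. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns, sorted by name, every property whose name starts with the given prefix. */
    public static TreeMap<String, String> withPrefix(Properties props, String prefix) {
        TreeMap<String, String> out = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                out.put(name, props.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample content: key names and values are assumed, not from the thread.
        String sample = "l2.l1reconnect.enabled = true\n"
                      + "l2.l1reconnect.timeout.millis = 5000\n"
                      + "some.other.setting = 1\n";
        withPrefix(parse(sample), "l2.l1reconnect")
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

In practice the text would come from the real tc.properties file rather than an inline string.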
|
Sreeni Iyer, Terracotta. |
|
|
 |
05/09/2008 12:55:50
|
gkeim
ophanim
Joined: 12/05/2006 10:22:37
Messages: 685
Location: Terracotta, Inc.
Offline
|
Is this happening on Linux?
|
Gary Keim (terracotta developer) |
|
|
 |
05/10/2008 08:48:56
|
cljhyjs
journeyman
Joined: 05/07/2008 03:22:42
Messages: 10
Offline
|
Yes, it is happening on Linux.
|
|
|
 |
05/13/2008 14:13:27
|
gkeim
ophanim
Joined: 12/05/2006 10:22:37
Messages: 685
Location: Terracotta, Inc.
Offline
|
This probably means you have an old client that was once connected to that server still lingering about. If this is not the case, please try to provide more details or a script to reproduce the problem.
|
Gary Keim (terracotta developer) |
|
|
 |
05/14/2008 19:56:32
|
cljhyjs
journeyman
Joined: 05/07/2008 03:22:42
Messages: 10
Offline
|
I found another exception with the cluster. It was fatal and brought both Terracotta servers down. The log below was shown in the console:
2008-05-15 03:31:22,910 [WorkerThread(group_events_dispatch_stage,0)] INFO com.terracottatech.console - NodeID[192.168.100.55:9530] joined the cluster
2008-05-15 03:31:22,910 [TCComm Main Selector Thread (listen 0:0:0:0:0:0:0:0:9530)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. TCGroupManager - Health monitoring agent started for 192.168.100.55:48075
2008-05-15 03:31:23,074 [WorkerThread(group_handshake_message_stage,0)] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: TCGroupManager - Connection to [192.168.100.55:48075] CLOSED. Health Monitoring for this node is now disabled.
2008-05-15 03:31:23,403 [WorkerThread(receive_group_message_stage,0)] INFO com.tc.l2.objectserver.ReplicatedObjectManagerImpl - Send response to Active's query : known id lists = 1387586
2008-05-15 03:31:25,615 [WorkerThread(receive_group_message_stage,0)] WARN com.tc.l2.ha.L2HAZapNodeRequestProcessor - Terminating due to Zap request from NodeID : NodeID[192.168.100.55:9530] Error Type : Newly Joined Node Contains dirty database. (Please clean up DB and restart node) Details : Nodes joining the cluster after startup shouldnt have any Objects. NodeID[192.168.100.50:9530] contains 1387586 Objects !!! : Exception :
java.lang.Throwable
at com.tc.l2.objectserver.ReplicatedObjectManagerImpl.handleObjectListResponse(ReplicatedObjectManagerImpl.java:165)
at com.tc.l2.objectserver.ReplicatedObjectManagerImpl.handleClusterObjectMessage(ReplicatedObjectManagerImpl.java:146)
at com.tc.l2.objectserver.ReplicatedObjectManagerImpl.messageReceived(ReplicatedObjectManagerImpl.java:120)
at com.tc.net.groups.TCGroupManagerImpl.fireMessageReceivedEvent(TCGroupManagerImpl.java:588)
at com.tc.net.groups.TCGroupManagerImpl.messageReceived(TCGroupManagerImpl.java:548)
at com.tc.objectserver.handler.ReceiveGroupMessageHandler.handleEvent(ReceiveGroupMessageHandler.java:22)
at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:142)
2008-05-15 03:31:25,615 [WorkerThread(receive_group_message_stage,0)] WARN com.terracottatech.console - Terminating due to Zap request from NodeID : NodeID[192.168.100.55:9530] Error Type : Newly Joined Node Contains dirty database. (Please clean up DB and restart node) Details : Nodes joining the cluster after startup shouldnt have any Objects. NodeID[192.168.100.50:9530] contains 1387586 Objects !!! : Exception :
java.lang.Throwable
at com.tc.l2.objectserver.ReplicatedObjectManagerImpl.handleObjectListResponse(ReplicatedObjectManagerImpl.java:165)
at com.tc.l2.objectserver.ReplicatedObjectManagerImpl.handleClusterObjectMessage(ReplicatedObjectManagerImpl.java:146)
at com.tc.l2.objectserver.ReplicatedObjectManagerImpl.messageReceived(ReplicatedObjectManagerImpl.java:120)
at com.tc.net.groups.TCGroupManagerImpl.fireMessageReceivedEvent(TCGroupManagerImpl.java:588)
at com.tc.net.groups.TCGroupManagerImpl.messageReceived(TCGroupManagerImpl.java:548)
at com.tc.objectserver.handler.ReceiveGroupMessageHandler.handleEvent(ReceiveGroupMessageHandler.java:22)
at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:142)
2008-05-15 03:31:25,615 [CommonShutDownHook] INFO com.terracottatech.dso - L2 Exiting...
|
|
|
 |
05/14/2008 20:46:37
|
ssubbiah
jedi
Joined: 05/24/2006 14:25:22
Messages: 115
Location: Saravanan Subbiah
Offline
|
I only see one server going down. Did the active server go down too? If so, please post both logs.
This exception is normal when you start a passive server with a persistent database. The active server is asking the passive server to quit because there is data in the persistent data store. If you clean up the store and then restart the passive server, this won't happen.
In future TC versions, this will be automatic.
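To spot this case in a server log, the Zap line can be picked apart with a regular expression. The following is only a sketch: the class name and the pattern are illustrative, and the message layout is assumed from the single WARN line quoted earlier in this thread, so other Terracotta versions may word it differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZapLine {
    // Layout assumed from the WARN line posted above; not an official log format.
    private static final Pattern ZAP = Pattern.compile(
            "Zap request from NodeID : NodeID\\[([^\\]]+)\\] Error Type : (.+?) Details : "
            + ".*?NodeID\\[([^\\]]+)\\] contains (\\d+) Objects");

    // The WARN line from the thread, kept here so the sketch is self-contained.
    public static final String SAMPLE =
            "2008-05-15 03:31:25,615 [WorkerThread(receive_group_message_stage,0)] WARN "
            + "com.tc.l2.ha.L2HAZapNodeRequestProcessor - Terminating due to Zap request from NodeID : "
            + "NodeID[192.168.100.55:9530] Error Type : Newly Joined Node Contains dirty database. "
            + "(Please clean up DB and restart node) Details : Nodes joining the cluster after startup "
            + "shouldnt have any Objects. NodeID[192.168.100.50:9530] contains 1387586 Objects !!! : Exception :";

    /** Address of the node that sent the Zap request, or null if the line does not match. */
    public static String requester(String line) {
        Matcher m = ZAP.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Address of the node reported as holding leftover objects, or null if no match. */
    public static String dirtyNode(String line) {
        Matcher m = ZAP.matcher(line);
        return m.find() ? m.group(3) : null;
    }

    /** Number of objects reported in the data store, or -1 if the line does not match. */
    public static long objectCount(String line) {
        Matcher m = ZAP.matcher(line);
        return m.find() ? Long.parseLong(m.group(4)) : -1L;
    }

    /** True when the line is a Zap request whose error type mentions a dirty database. */
    public static boolean isDirtyDatabaseZap(String line) {
        Matcher m = ZAP.matcher(line);
        return m.find() && m.group(2).contains("dirty database");
    }

    public static void main(String[] args) {
        System.out.println("requested by: " + requester(SAMPLE));
        System.out.println("dirty node:   " + dirtyNode(SAMPLE));
        System.out.println("object count: " + objectCount(SAMPLE));
    }
}
```

A match on `isDirtyDatabaseZap` points at the node whose persistent store needs to be cleaned before it is restarted as a passive.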
cheers,
|
Saravanan Subbiah
Terracotta Engineer |
|
|
 |
05/14/2008 20:56:51
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1665
Location: San Francisco, CA
Offline
|
I think you should step back a moment. It sounds like you may have several things misconfigured. What are you trying to do? Are you in production or running a test? Are you trying to test what happens when TC servers fail? When clients fail?
You are definitely encountering several configuration issues, but nothing that we see thus far is a bug in the software. With a bit more information we should be able to help.
Can you share your tc-config.xml?
Can you explain the test you are trying to run?
Can you explain a bit about what you did / what happened when you found these errors? Was the system down when you expected it to be up and so you scanned the logs looking for problems? Or were you explicitly testing TC active / passive failover?
More info please.
--Ari
|
|
|
 |
05/14/2008 20:58:46
|
cljhyjs
journeyman
Joined: 05/07/2008 03:22:42
Messages: 10
Offline
|
Yes, when the passive server went down, the active server went down too. Attached is the Terracotta cluster L2 log.
Attachment: log.txt (terracotta cluster log, 32 Kbytes)
|
|
|
 |
05/14/2008 21:23:25
|
ssubbiah
jedi
Joined: 05/24/2006 14:25:22
Messages: 115
Location: Saravanan Subbiah
Offline
|
Again, the log is for the passive server (192.168.100.55).
Can you attach the log from the active server (192.168.100.50)?
From the passive server's log, I see that there may have been some transient network problem between the active and the passive for about a second or so.
2008-05-14 08:40:14,031 [WorkerThread(group_events_dispatch_stage,0)] WARN com.tc.l2.ha.L2HACoordinator - NodeID[192.168.100.50:9530] left the cluster
....
2008-05-14 08:40:15,274 [WorkerThread(group_events_dispatch_stage,0)] INFO com.tc.l2.ha.L2HACoordinator - NodeID[192.168.100.50:9530] joined the cluster
This caused the active to request the passive to quit. If you want to protect against such transient network failures, there are some configuration parameters. Our field engineers will be able to help you tune them.
I still don't see the active server quitting.
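The "about a second or so" figure can be read directly off the two log lines quoted above. As a minimal sketch (the class and method names are made up for illustration, and it assumes every line starts with a timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` layout seen in these logs):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ClusterGap {
    // Timestamp layout at the start of each log line quoted in this thread.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Parses the leading 23-character timestamp of a log line. */
    public static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), TS);
    }

    /** Milliseconds between a "left the cluster" line and the later "joined the cluster" line. */
    public static long gapMillis(String leftLine, String joinedLine) {
        return Duration.between(timestampOf(leftLine), timestampOf(joinedLine)).toMillis();
    }

    public static void main(String[] args) {
        String left = "2008-05-14 08:40:14,031 [WorkerThread(group_events_dispatch_stage,0)] WARN "
                + "com.tc.l2.ha.L2HACoordinator - NodeID[192.168.100.50:9530] left the cluster";
        String joined = "2008-05-14 08:40:15,274 [WorkerThread(group_events_dispatch_stage,0)] INFO "
                + "com.tc.l2.ha.L2HACoordinator - NodeID[192.168.100.50:9530] joined the cluster";
        System.out.println(gapMillis(left, joined) + " ms"); // prints "1243 ms"
    }
}
```

For the two lines above the gap comes out at 1243 ms, which matches the one-second network blip described in the post.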
cheers,
|
Saravanan Subbiah
Terracotta Engineer |
|
|
 |
05/14/2008 21:37:49
|
cljhyjs
journeyman
Joined: 05/07/2008 03:22:42
Messages: 10
Offline
|
Thanks. Attached is the active server log.
How do I set the configuration parameters that protect against such transient network failures?
Attachment: nohup.out (30 Kbytes)
|
|
|
 |
05/14/2008 22:06:48
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1665
Location: San Francisco, CA
Offline
|
You shouldn't simply configure Terracotta to "fix" transient network failures. I think your network / machines / operating systems are not configured right. Saravanan, correct me if I am wrong, but shouldn't cljhyjs fix the network and not try to work around the problem using Terracotta?
--Ari
|
|
|
 |
05/14/2008 22:52:38
|
cljhyjs
journeyman
Joined: 05/07/2008 03:22:42
Messages: 10
Offline
|
ari wrote:
I think you should step back a moment. It sounds like you may have several things misconfigured. What are you trying to do? Are you in production or running a test? Are you trying to test what happens when TC servers fail? When clients fail?
You are definitely encountering several configuration issues, but nothing that we see thus far is a bug in the software. With a bit more information we should be able to help.
Can you share your tc-config.xml?
Can you explain the test you are trying to run?
Can you explain a bit about what you did / what happened when you found these errors? Was the system down when you expected it to be up and so you scanned the logs looking for problems? Or were you explicitly testing TC active / passive failover?
More info please.
--Ari
OK, thank you for the response!
I am just running a test. I currently run a system that has millions of users. The peak load is 10,000 concurrent access requests per second, and up to 10,000 user sessions are added every second. I would like to use Terracotta, but I want to know whether it is feasible.
Currently I'm doing a performance test of session sharing. I have two servers (Dell PC servers, 2 CPUs at 2.4 GHz, 6 GB memory), each with Terracotta installed, and 4 web servers, each with Tomcat 5.5 installed.
Here is the test result:
When the session count rose to 1.8 million, I restarted the standby server. The following error then occurred and brought both servers down:
Attached are the tc-config files.
Attachment: tc-config-server.xml (terracotta server config file, 3 Kbytes)
Attachment: tc-config-tomcat.xml (4 Kbytes)
|
|
|
 |
|
|