12/05/2006 01:29:19
rtlusty
Is there an elegant way to maintain a shared register of cluster nodes?
It seems easy enough to maintain a cluster-shared list of nodes where each member adds itself on startup and removes itself on shutdown.
But what can I do if a cluster node dies without unregistering? The Terracotta DSO server probably knows about it, but is there any way to be notified of such an event?
12/05/2006 10:37:57
gkeim
Are you referring to clustered servers, in the hot-backup failover scenario, or the set of clients currently attached to a server?
Gary Keim (Terracotta developer)
12/05/2006 17:31:22
steve
You've read our collective minds. For our next release we are planning to add a few cool features around this:
1) You will be able to mark a map as your node map, and it will always be kept up to date with the current list of nodes. You'll be able to use this to a) keep track of which nodes are currently connected and b) store values by node as an indirect form of communication between nodes.
2) Eventing. You'll be able to specify methods of your choosing to be called on node join, node leave, connect, and disconnect. A purely illustrative sketch of what such callbacks might look like is below.
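To be clear, nothing here is a committed API; this is just a hypothetical sketch of the kind of listener interface we have in mind, with made-up names:

public interface ClusterMembershipListener {
    void nodeJoined(String nodeId);            // some node entered the cluster
    void nodeLeft(String nodeId);              // some node left, cleanly or by death
    void thisNodeConnected(String nodeId);     // this node (re)connected to the TC server
    void thisNodeDisconnected(String nodeId);  // this node lost its TC server connection
}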
12/05/2006 21:53:01
kbhasin
In the meantime, you can use Terracotta's cluster-wide coordination to detect when a node leaves the cluster.
Each node grabs a lock at startup (registering). Once all nodes have started up (you can use java.util.concurrent.CyclicBarrier from JDK 1.5 to detect this), every node tries to grab a lock held by another node (this is flexible; e.g., you can have a buddy system, or every node looking out for every other node, etc.). If node1 acquires the lock held by node2, it means node2 is down, and you can take appropriate action (redistribute traffic, etc.).
Since Terracotta provides cluster-wide coordination of threads, you can achieve all of the above with plain POJOs. A minimal sketch follows.
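Here is a minimal sketch of the buddy-lock idea. It assumes heartbeatLocks is configured as a Terracotta root and the synchronized blocks are declared as cluster-wide locks in your tc-config.xml; the class and method names are illustrative, not a real API. Under DSO, when a node's JVM dies, the server releases the locks it held, which is what unblocks the watcher.

import java.util.HashMap;
import java.util.Map;

public class NodeRegistry {
    // Shared root (assumed): one heartbeat lock object per node, keyed by name.
    private static final Map<String, Object> heartbeatLocks = new HashMap<String, Object>();

    // Called once at startup: acquire this node's lock and hold it forever.
    public static void register(final String nodeName) {
        final Object lock = new Object();
        new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {                 // held until this JVM dies
                    synchronized (heartbeatLocks) {   // publish only once held
                        heartbeatLocks.put(nodeName, lock);
                    }
                    try {
                        Thread.sleep(Long.MAX_VALUE); // park forever, lock held
                    } catch (InterruptedException ignored) { }
                }
            }
        }, nodeName + "-heartbeat").start();
    }

    // Called by a buddy: blocks until nodeName's JVM dies and Terracotta
    // releases its cluster-wide lock, then runs the failure action.
    public static void watch(String nodeName, Runnable onDeath) {
        Object lock;
        synchronized (heartbeatLocks) {
            lock = heartbeatLocks.get(nodeName);
        }
        if (lock == null) return;   // never registered, or already cleaned up
        synchronized (lock) {       // acquiring this means the buddy is gone
            onDeath.run();
        }
    }
}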
Regards,
Kunal Bhasin,
Terracotta, Inc.
12/06/2006 11:33:31
tgautier
I actually created some code a while back that does what Kunal suggests. If you have a burning use case, shoot me an email and I can send you the code.
I put the project on hold because I was solving the problem at the "user" level, while I knew Steve and the engineering team were thinking about solving it at the TC level ("under the hood," so to speak), which has better knowledge of cluster membership.
If you have thoughts on how the interface should look, I urge you to join our tc-dev mailing list and start a thread, or go to the public wiki and start an incubator project, or at least a feature discussion board, around it.
Just for clarity's sake, the lock-holding trick works very well; it's just a bit tricky to implement. The basic mechanics are this:
One node becomes a "master". You can easily designate a master using "userland" code by acquiring a master lock and never letting it go.
The master spins off a thread that reads items from a queue. When a node starts, it creates a shared lock that is for that node only, "drops into the lock" (i.e., enters the monitor and never leaves), and then puts the lock into the queue.
The master receives this lock from the queue and spins off a thread that tries to acquire it. When that node leaves the cluster, for any reason, the master acquires the lock and can do whatever task you like, including updating a shared map that records cluster membership.
There is one last tricky case: when the master itself fails. On startup, the new master has to know which nodes are in the cluster and spin off threads for all of the locks/nodes already there; essentially, it has to restore the lock-waiting state the previous master would have had. A rough sketch of the whole pattern is below.
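Here is a rough sketch of that pattern, assuming masterLock, joinQueue, and members are configured as Terracotta roots with cluster-wide locking; all class and method names are made up for illustration, not part of any Terracotta API.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MembershipMaster {
    // Assumed Terracotta roots:
    private static final Object masterLock = new Object();
    private static final BlockingQueue<NodeLock> joinQueue =
            new LinkedBlockingQueue<NodeLock>();
    private static final Map<String, NodeLock> members =
            new HashMap<String, NodeLock>();

    static class NodeLock {
        final String nodeName;
        NodeLock(String nodeName) { this.nodeName = nodeName; }
    }

    // Node side: create our lock, enter its monitor forever, then announce it.
    public static void join(final String nodeName) {
        final NodeLock nl = new NodeLock(nodeName);
        new Thread(new Runnable() {
            public void run() {
                synchronized (nl) {                 // "drop into the lock"...
                    try {
                        joinQueue.put(nl);          // ...then tell the master
                        Thread.sleep(Long.MAX_VALUE);
                    } catch (InterruptedException ignored) { }
                }
            }
        }, nodeName + "-holder").start();
    }

    // Every node calls this; whichever one gets masterLock becomes the master
    // and never lets go. On master failover, it first re-watches known nodes.
    public static void runAsMasterCandidate() throws InterruptedException {
        synchronized (masterLock) {
            synchronized (members) {
                for (NodeLock nl : members.values()) {
                    watch(nl);                      // restore lock-waiting state
                }
            }
            while (true) {
                NodeLock nl = joinQueue.take();     // a node just joined
                synchronized (members) { members.put(nl.nodeName, nl); }
                watch(nl);
            }
        }
    }

    // One watcher thread per node: acquiring the node's lock means it died.
    private static void watch(final NodeLock nl) {
        new Thread(new Runnable() {
            public void run() {
                synchronized (nl) {
                    synchronized (members) { members.remove(nl.nodeName); }
                }
            }
        }, "watch-" + nl.nodeName).start();
    }
}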
All of this is pretty tricky synchronization code, and it usually takes some time to get just right.
As I said, I already wrote code that does all of this work for you; if you want or need it, you're more than welcome to it. Just let me know.
Taylor