Terracotta Discussion Forums (LEGACY READ-ONLY ARCHIVE)
Loss of network
Bill

neo

Joined: 12/07/2006 07:33:16
Messages: 3
Offline

I have just installed Terracotta and the first thing I wanted to try was to see what would happen if the network went down. What happened was that the Administrator Console disconnected and the Shared JTables locked up. Ok, I can see how that makes sense. However, when the network was reestablished the Administrator Console could not reconnect and the applications stayed locked up. Is that the expected behavior?
badari

journeyman

Joined: 12/07/2006 08:36:15
Messages: 19
Offline

Bill,

In order to answer your question, let me introduce some nomenclature:

L2 == Terracotta Server
L1 == client JVM / your app's JVM

The short answer: if the L2 process stays running but the socket connections are lost, the L1s are not allowed back in. If the L2 dies, all L1s are allowed back as long as they reconnect within the configurable timeout (check the XML schema for that config setting). The only way for an orphaned L1 to get back into the cluster is to be restarted; this ensures there are no inconsistencies across the cluster.
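
To make the rule above concrete, here is a minimal Java sketch of the decision as described in this post. It is a toy model, not Terracotta code: the class ReconnectPolicy, its Decision enum, and the reconnectWindow parameter are all hypothetical names standing in for the server's behavior and the configurable timeout.

```java
import java.time.Duration;

/** Toy model of the reconnect rule described in this thread; not Terracotta code. */
final class ReconnectPolicy {
    enum Decision { ALLOW, REJECT }

    // Hypothetical stand-in for the configurable reconnect timeout.
    private final Duration reconnectWindow;

    ReconnectPolicy(Duration reconnectWindow) {
        this.reconnectWindow = reconnectWindow;
    }

    /**
     * @param serverRestarted    true if the L2 process died and came back;
     *                           false if it kept running and only the socket dropped
     * @param sinceServerRestart time elapsed since the L2 came back up
     *                           (ignored when serverRestarted is false)
     */
    Decision decide(boolean serverRestarted, Duration sinceServerRestart) {
        if (!serverRestarted) {
            // The L2 stayed up and lost the connection: the L1 is orphaned
            // and can only rejoin by being restarted.
            return Decision.REJECT;
        }
        // The L2 was restarted: L1s may rejoin while the window is still open.
        return sinceServerRestart.compareTo(reconnectWindow) <= 0
                ? Decision.ALLOW
                : Decision.REJECT;
    }
}
```

With a window of, say, 120 seconds (an arbitrary value chosen for illustration), an L1 that reconnects 30 seconds after an L2 restart would be allowed back, while an L1 whose socket dropped while the L2 kept running would be rejected regardless of timing.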

Couple of notes:
1. In the very next release, we will provide a callback for you to take action when an L1 is orphaned from the cluster by a network failure.
2. We are discussing internally how to change the current behavior. From our head of engineering (Steve):
We could go to things like: if no locks are held, the L1 can reconnect; or we can allow it to reconnect within a certain amount of time; and a few other things. None are planned for the next release as of yet, except giving more visibility into when it happens. For sessions only, we may come up with a "continue after disconnect" mode, and maybe even a reconnect, but with basically a new session manager (meaning that you lose all the clustered sessions if you, an L1, cannot reach an L2, but you create a new empty sessions HashMap and keep running standalone).
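
As a rough illustration of the "continue after disconnect" idea quoted above, here is a minimal Java sketch. Everything in it is an assumption made for illustration: FallbackSessionStore and its onOrphaned() callback are hypothetical names, not a Terracotta API, and the clustered map is represented by an ordinary Map passed in by the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of a "continue after disconnect" session store; not a Terracotta API. */
final class FallbackSessionStore {
    // Currently active session map: clustered at first, local after a disconnect.
    private final AtomicReference<Map<String, Object>> sessions;
    private volatile boolean standalone = false;

    FallbackSessionStore(Map<String, Object> clusteredSessions) {
        this.sessions = new AtomicReference<>(clusteredSessions);
    }

    /** Hypothetical callback, invoked when this L1 can no longer reach an L2. */
    void onOrphaned() {
        // All clustered sessions are lost from this JVM's point of view;
        // continue with a new, empty, purely local map.
        sessions.set(new ConcurrentHashMap<>());
        standalone = true;
    }

    Object get(String sessionId) {
        return sessions.get().get(sessionId);
    }

    void put(String sessionId, Object session) {
        sessions.get().put(sessionId, session);
    }

    boolean isStandalone() {
        return standalone;
    }
}
```

The trade-off is the one Steve describes: the application keeps serving requests, but every session created before the disconnect is gone on this node, and sessions created afterwards are not shared with the rest of the cluster.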

The other key thing to watch out for, given our current behavior, is how you simulate an outage. Pulling the plug on a network connection returns an immediate link-level error to the OS, and thus to the L1s and L2s. That test will just result in behavior similar to what you are seeing. Killing the L2 with kill -9 and restarting it (within the timeout) will test our claimed behavior. Pause an L1 with Ctrl-Z if you want to see the cluster's ability to abandon that L1, orphan it, and thus keep moving.

Please post a JIRA issue if you want one of the different behaviors that Steve is contemplating for the future. Alternatively, feel free to continue this thread with us here on the forum until we have answered your questions.

Hope this helps.

--Ari
Bill

neo

Joined: 12/07/2006 07:33:16
Messages: 3
Offline

Thanks for the info. What I was doing was pulling the plug on the router. What I expected was that everything would pause until the network came back. As I don't know all the issues involved I'm not qualified to make any suggestions as to what the behavior should be.
tgautier

seraphim

Joined: 06/05/2006 12:19:26
Messages: 1781
Offline

By default, the Terracotta server does not store data to the disk, which is required if you want to kill the L2 and bring it back without loss of function to the L1s.

There is a setting in the config file that controls this behavior; it is in the servers/server section and is named "persistence".

To enable permanent-store mode, which will allow you to kill the L2 server, you need to change the persistence mode from the default, "temporary-swap-only", to "permanent-store".

The setting would look like this:

config.xml:
[code]
...
<servers>
  <server ...>
    <persistence>
      <mode>permanent-store</mode>
    </persistence>
  </server>
</servers>
...
[/code]


For detailed information on the config settings, consult our online documentation at http://www.terracotta.org or read the sample config file provided in the download in the config-sample directory.
Regards,

Taylor
Bill

neo

Joined: 12/07/2006 07:33:16
Messages: 3
Offline

I haven't looked at what happens if an L2 fails yet. The main problem I have is that the network around here goes up and down a lot. If that breaks the whole thing, then that limits the kinds of jobs I can run (i.e. no jobs that run for weeks). If I had money I could build my own subnet, but I don't have money.
 