We have implemented a lot of clustering infrastructure on top of Terracotta, e.g. a specialized map-reduce implementation with sophisticated recovery when nodes fail, a task channel (somewhat like your master-worker), clustered locks, barriers, cluster membership and control services, etc.
Ehcache is just that, a distributed cache, and cannot be the answer...
Should we infer that Terracotta is NOT going to support DSO going forward and that Java 1.6 is the last supported version? If the answer is yes, that is very disappointing. I have pushed for Terracotta adoption at all my jobs for the last 6 years (including introducing Terracotta to one of your 9 big customers listed on your home page...).
|
|
|
Hi,
When can we expect support for DSO clustering with Java 7? Our (quite extensive) use of DSO clustering fails right away with Terracotta 3.7.3 + JDK1.7.0_<7 to 13>.
Trying to run the pojo samples, I get:
LINUX 64b:
Starting BootJarTool...
2013-02-05 13:05:04,432 INFO - Terracotta 3.7.3, as of 20130116-060539 (Revision unknown-21992 by cruise@su10vmo118 from 3.7.3)
2013-02-05 13:05:04,844 INFO - Successfully loaded base configuration from file at '/home/oflo/Terracotta/terracotta-3.7.3/platform/samples/pojo/chatter/./tc-config.xml'.
********************************* WARNING **********************************
* Not including instrumented ConcurrentHashMap in boot jar
****************************************************************************
2013-02-05 13:05:06,981 INFO - Creating boot JAR file at '/home/oflo/Terracotta/terracotta-3.7.3/lib/dso-boot/dso-boot-oracle_linux_170_07.jar'...
2013-02-05 13:05:07,135 INFO - Successfully created boot JAR file at '/home/oflo/Terracotta/terracotta-3.7.3/lib/dso-boot/dso-boot-oracle_linux_170_07.jar'.
Error occurred during initialization of VM
java.lang.NullPointerException
at java.util.Hashtable.__tc_put(Hashtable.java:542)
at java.util.Hashtable.put(Hashtable.java)
at java.lang.System.initProperties(Native Method)
at java.lang.System.initializeSystemClass(System.java:1115)
WINDOWS7 32b:
Starting BootJarTool...
2013-02-05 13:11:55,843 INFO - Terracotta 3.7.3, as of 20130116-060539 (Revision unknown-21992 by cruise@su10vmo118 from 3.7.3)
2013-02-05 13:11:56,696 INFO - Successfully loaded base configuration from file at 'C:\Users\oflorescu\terracotta-3.7.3\platform\samples\pojo\chatter\TC-CON~1.XML'.
********************************* WARNING **********************************
* Not including instrumented ConcurrentHashMap in boot jar
****************************************************************************
2013-02-05 13:11:58,886 INFO - Creating boot JAR file at 'C:\Users\oflorescu\terracotta-3.7.3\lib\dso-boot\DSO-BO~1.JAR'...
2013-02-05 13:11:59,108 INFO - Successfully created boot JAR file at 'C:\Users\oflorescu\terracotta-3.7.3\lib\dso-boot\DSO-BO~1.JAR'.
Error occurred during initialization of VM
java.lang.NoSuchFieldError: offset
at java.lang.String.getCharsFast(String.java)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:416)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at sun.nio.cs.FastCharsetProvider.lookup(FastCharsetProvider.java:119)
at sun.nio.cs.FastCharsetProvider.charsetForName(FastCharsetProvider.java:136)
at java.nio.charset.Charset.lookup2(Charset.java:487)
at java.nio.charset.Charset.lookup(Charset.java:475)
at java.nio.charset.Charset.defaultCharset(Charset.java:618)
at sun.nio.cs.StreamEncoder.forOutputStreamWriter(StreamEncoder.java:56)
at java.io.OutputStreamWriter.<init>(OutputStreamWriter.java:111)
at java.io.PrintStream.<init>(PrintStream.java:104)
at java.io.PrintStream.<init>(PrintStream.java:151)
at java.lang.System.initializeSystemClass(System.java:1141)
Looking at JavaLangStringAdapter.java, it is still instrumenting based on the old java.lang.String class ("offset" member, etc).
Thanks
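The NoSuchFieldError above is consistent with the running JDK's java.lang.String no longer declaring the fields the instrumentation expects. A minimal sketch for checking that on any given JDK with plain reflection follows; the class name StringLayoutCheck and the helper declaresField are made up for illustration and are not part of Terracotta.

```java
import java.lang.reflect.Field;

// Hypothetical helper, not Terracotta code: reports which private fields
// java.lang.String declares on the JDK that runs this class.
final class StringLayoutCheck {

    /** Returns true if the given class itself declares a field with that name. */
    static boolean declaresField(Class<?> type, String name) {
        for (Field f : type.getDeclaredFields()) {
            if (f.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        // "offset" is the field named in the NoSuchFieldError; "count" is its usual companion.
        for (String name : new String[] {"value", "offset", "count"}) {
            System.out.println("String." + name + " declared: "
                    + declaresField(String.class, name));
        }
    }
}
```

Running it once on a JDK 6 build and once on one of the failing JDK 7 builds should show whether "offset" is still declared there.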
|
|
|
Thanks twu! Do you have a bug tracking number for the fix?
|
|
|
Considering that we have not yet managed to reproduce this deadlock, and that we cannot yet upgrade, I would like to know whether later versions (3.5.x or 3.6.x) include fixes in this area that could have resolved it. If I cannot easily reproduce it, how can I prove that 3.5.4 fixes it? Thanks for your help.
|
|
|
We are NOT using Shibboleth; our application is clustered using straight DSO clustered collections, etc. It was their post that pointed me to the root cause. The aggressive opts are actually appealing to us in a couple of specific cases/customer workloads, so supporting TreeMap would allow us to run those customers on clusters AND use the aggressive opts.
|
|
|
We recently experienced long kernel pauses (Linux versions below 2.6.38) under severe memory pressure, and the L1s were dropped. When we restarted them, we noticed that the Terracotta server was not responsive, and when we forced a thread dump, we found a clear deadlock. The server had the default logging level, and the log reveals nothing for the hours prior to the L1s being dropped.
Below are the 2 deadlocked threads:
Java stack information for the threads listed above:
===================================================
"OOO Connection Restore Timer":
at com.tc.net.protocol.transport.MessageTransportBase.isConnected(MessageTransportBase.java:191)
- waiting to lock <0x00000000e0bdbb70> (a com.tc.net.protocol.transport.MessageTransportStatus)
at com.tc.net.protocol.delivery.OnceAndOnlyOnceProtocolNetworkLayerImpl.sendMessage(OnceAndOnlyOnceProtocolNetworkLayerImpl.java:364)
at com.tc.net.protocol.delivery.OnceAndOnlyOnceProtocolNetworkLayerImpl.close(OnceAndOnlyOnceProtocolNetworkLayerImpl.java:278)
at com.tc.net.protocol.tcm.AbstractMessageChannel.close(AbstractMessageChannel.java:135)
at com.tc.net.protocol.tcm.ServerMessageChannelImpl.close(ServerMessageChannelImpl.java:18)
at com.tc.net.protocol.tcm.ChannelManagerImpl.notifyChannelEvent(ChannelManagerImpl.java:103)
at com.tc.net.protocol.tcm.AbstractMessageChannel.fireEvent(AbstractMessageChannel.java:243)
at com.tc.net.protocol.tcm.AbstractMessageChannel.fireTransportDisconnectedEvent(AbstractMessageChannel.java:203)
at com.tc.net.protocol.tcm.AbstractMessageChannel.notifyTransportDisconnected(AbstractMessageChannel.java:199)
at com.tc.net.protocol.tcm.ServerMessageChannelImpl.notifyTransportDisconnected(ServerMessageChannelImpl.java:18)
at com.tc.net.protocol.delivery.OnceAndOnlyOnceProtocolNetworkLayerImpl.connectionRestoreFailed(OnceAndOnlyOnceProtocolNetworkLayerImpl.java:445)
at com.tc.net.protocol.delivery.OOOReconnectionTimeout.restoreConnectionFailed(OOOReconnectionTimeout.java:73)
- locked <0x00000000e0bdb1d8> (a com.tc.net.protocol.delivery.OOOReconnectionTimeout)
at com.tc.net.protocol.delivery.OOOReconnectionTimeout$TimeoutTimerTask.run(OOOReconnectionTimeout.java:90)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
"L2_L1:TCWorkerComm # 3_R":
at com.tc.net.protocol.delivery.OOOReconnectionTimeout.notifyTransportConnected(OOOReconnectionTimeout.java:54)
- waiting to lock <0x00000000e0bdb1d8> (a com.tc.net.protocol.delivery.OOOReconnectionTimeout)
at com.tc.net.protocol.transport.AbstractMessageTransport.fireTransportEvent(AbstractMessageTransport.java:105)
at com.tc.net.protocol.transport.AbstractMessageTransport.fireTransportConnectedEvent(AbstractMessageTransport.java:69)
at com.tc.net.protocol.transport.ServerMessageTransport.handleAck(ServerMessageTransport.java:88)
at com.tc.net.protocol.transport.ServerMessageTransport.verifyAndHandleAck(ServerMessageTransport.java:76)
at com.tc.net.protocol.transport.ServerMessageTransport.receiveTransportMessageImpl(ServerMessageTransport.java:50)
- locked <0x00000000e0bdbb70> (a com.tc.net.protocol.transport.MessageTransportStatus)
at com.tc.net.protocol.transport.MessageTransportBase.receiveTransportMessage(MessageTransportBase.java:91)
- locked <0x00000000e0bdbbb0> (a java.lang.Object)
at com.tc.net.protocol.transport.ServerStackProvider$MessageSink.putMessage(ServerStackProvider.java:238)
at com.tc.net.protocol.transport.WireProtocolAdaptorImpl.addReadData(WireProtocolAdaptorImpl.java:54)
at com.tc.net.protocol.ProtocolSwitch.addReadData(ProtocolSwitch.java:50)
at com.tc.net.core.TCConnectionImpl.addNetworkData(TCConnectionImpl.java:687)
at com.tc.net.core.TCConnectionImpl.doReadInternal(TCConnectionImpl.java:365)
at com.tc.net.core.TCConnectionImpl.doRead(TCConnectionImpl.java:227)
at com.tc.net.core.CoreNIOServices$CommThread.selectLoop(CoreNIOServices.java:625)
at com.tc.net.core.CoreNIOServices$CommThread.run(CoreNIOServices.java:294)
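The dump shows a classic lock-order inversion: one thread holds the OOOReconnectionTimeout monitor and waits for MessageTransportStatus, while the other holds MessageTransportStatus and waits for OOOReconnectionTimeout. The sketch below reproduces that shape with two plain Object monitors and detects it with the standard ThreadMXBean API. It is an illustration only, not Terracotta code; the class, method, and thread names are invented, and the two locks merely stand in for the monitors in the dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Illustration only: two daemon threads take the same two monitors in opposite order.
final class LockOrderInversionDemo {

    private static void startDaemon(String name, final Object first, final Object second,
                                    final CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                // Wait until the other thread also holds its first monitor.
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds 'second' and wants 'first'.
                }
            }
        }, name);
        t.setDaemon(true); // so the stuck threads do not keep the JVM alive
        t.start();
    }

    /** Provokes the inversion and returns how many threads the JVM reports as deadlocked. */
    static int provokeAndCountDeadlocked() {
        Object timeoutLock = new Object(); // stand-in for OOOReconnectionTimeout
        Object statusLock = new Object();  // stand-in for MessageTransportStatus
        CountDownLatch latch = new CountDownLatch(2);
        startDaemon("restore-timer", timeoutLock, statusLock, latch);
        startDaemon("comm-thread", statusLock, timeoutLock, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int attempt = 0; attempt < 100; attempt++) {
                long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock is found
                if (ids != null) {
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndCountDeadlocked());
    }
}
```

The same ThreadMXBean call can be polled from a watchdog inside a process, which reports this kind of deadlock without waiting for someone to force a thread dump.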
|
|
|
- Terracotta 3.5.0
- Java 1.6.0_29
- Tomcat 6.x
Our app fails to start and the client log has:
Error occurred during initialization of VM
java.lang.NoSuchMethodError: java.util.LinkedHashMap$Entry.<init>(ILjava/lang/Object;Ljava/lang/Object;Ljava/util/HashMap$Entry;)V
at java.util.LinkedHashMap.init(Unknown Source)
at java.util.HashMap.<init>(Unknown Source)
at java.util.LinkedHashMap.<init>(Unknown Source)
at java.io.ExpiringCache$1.<init>(ExpiringCache.java:47)
at java.io.ExpiringCache.<init>(ExpiringCache.java:47)
at java.io.ExpiringCache.<init>(ExpiringCache.java:42)
at java.io.UnixFileSystem.<init>(UnixFileSystem.java:127)
at java.io.FileSystem.getFileSystem(Native Method)
at java.io.File.<clinit>(File.java:127)
at java.lang.Runtime.loadLibrary0(Runtime.java:819)
at java.lang.System.loadLibrary(System.java:1028)
at java.lang.System.initializeSystemClass(System.java:1086)
It turns out this is caused by using -XX:+AggressiveOpts, which (according to this posting: https://wiki.shibboleth.net/confluence/display/SHIB2/IdPClusterIssues) enables the experimental TreeMap added in 1.6.0_14.
Is there a plan to support that TreeMap implementation?
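Until that is supported, a start-up guard can at least make the cause obvious instead of leaving a NoSuchMethodError during VM initialization. The following is a minimal sketch using the standard RuntimeMXBean API; the class name AggressiveOptsCheck and the helper hasFlag are hypothetical, and the check only helps in a JVM that gets far enough to run application code (for example a launcher or the server process).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical guard, not part of Terracotta: looks for a JVM flag among the input arguments.
final class AggressiveOptsCheck {

    /** Returns true if the exact flag appears in the given JVM argument list. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main), e.g. -Xmx, -XX:...
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (hasFlag(jvmArgs, "-XX:+AggressiveOpts")) {
            System.err.println("WARNING: -XX:+AggressiveOpts is set; "
                    + "DSO boot jar classes may not match the JDK collection classes.");
        } else {
            System.out.println("-XX:+AggressiveOpts is not set.");
        }
    }
}
```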
|
|
|
I created a simple test that runs under Terracotta (it is NOT sharing anything) and has a structure similar to the real application, but stripped down. It looks like a class loader issue (Terracotta 3.5.0); does it have anything to do with this code from BootJarTool.java?
Code:
private final void addInstrumentedJavaUtilConcurrentFutureTask() {
  if (!Vm.isJDK15Compliant()) { return; }
  final Map instrumentedContext = new HashMap();

  TransparencyClassSpec spec = this.configHelper.getOrCreateSpec("java.util.concurrent.FutureTask");
  spec.setHonorTransient(true);
  spec.setCallConstructorOnLoad(true);
  spec.markPreInstrumented();
  changeClassName("java.util.concurrent.FutureTaskTC", "java.util.concurrent.FutureTaskTC",
                  "java.util.concurrent.FutureTask", instrumentedContext, true);

  this.configHelper.addWriteAutolock("* java.util.concurrent.FutureTask$Sync.*(..)");

  spec = this.configHelper.getOrCreateSpec("java.util.concurrent.FutureTask$Sync");
  spec.setHonorTransient(true);
  spec.markPreInstrumented();
  spec.addDistributedMethodCall("managedInnerCancel", "()V", true);
  changeClassName("java.util.concurrent.FutureTaskTC$Sync", "java.util.concurrent.FutureTaskTC",
                  "java.util.concurrent.FutureTask", instrumentedContext, true);
}
My test code is below. If you run it as a regular Java app, it logs
"java.lang.Exception: test exception
... exiting"
If you start it as a Terracotta app (note that there is NOTHING shared), you get
"java.lang.ClassCastException: java.util.concurrent.FutureTask cannot be cast to java.util.concurrent.RunnableFuture
... exiting"
The line that throws the ClassCastException is:
"for (RunnableFuture<T> future : submittedTasks) { ..."
Code:
package com.acme.test;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RunnableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TestThreadPoolExecutor {

    public static final TestThreadPoolExecutor testPool =
        new TestThreadPoolExecutor( 4, 4, 0L,
            TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());

    private ThreadPoolExecutor delegate;

    TestThreadPoolExecutor( int corePoolSize, int maximumPoolSize, long keepAliveTime,
            TimeUnit unit, BlockingQueue<Runnable> workQueue ) {
        delegate = new ThreadPoolExecutor(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue );
    }

    public <T> Future<T> submit(Runnable task, T result) {
        return delegate.submit(task, result);
    }

    public void execute(Runnable task) {
        delegate.execute(task);
    }

    public BlockingQueue<Runnable> getQueue() {
        return delegate.getQueue();
    }

    public List<Runnable> shutdownNow() {
        return delegate.shutdownNow();
    }

    public static void main(String args[]) {
        Tasks<String> tasks = new Tasks<String>();
        int nofTasks = 10;
        final Random random = new Random();
        for(int counter = 0; counter < nofTasks; ++counter) {
            tasks.start(new Runnable(){
                @Override
                public void run() {
                    try {
                        Thread.sleep(random.nextInt(1000));
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }}, "task " + Integer.toString(counter));
        }
        try {
            tasks.waitForAll();
        } catch(ClassCastException cce) {
            System.err.println(cce);
        } catch(RuntimeException re) {
            System.err.println(re.getCause());
        } finally {
            System.out.println("... exiting");
            testPool.shutdownNow();
        }
    }

    public static final class Tasks<T> {
        private final List<RunnableFuture<T>> submittedTasks = new ArrayList<RunnableFuture<T>>();
        private final TestThreadPoolExecutor exec = testPool;
        private boolean doneSubmitting;
        volatile boolean forceException = true;

        public void start(Runnable task, final T result) {
            if (doneSubmitting) {
                throw new IllegalStateException(
                    "Tasks.start() not allowed to be called after Tasks.waitForAll()");
            }
            addTask((FutureTask<T>) exec.submit(task, result));
        }

        private void addTask(RunnableFuture<T> wrappedTask) {
            submittedTasks.add(wrappedTask);
        }

        public List<T> waitForAll() {
            doneSubmitting = true;
            List<T> results = new ArrayList<T>(submittedTasks.size());
            boolean done = false;
            try {
                // Go through the whole list to gather the results.
                for (Future<T> task : submittedTasks) {
                    try {
                        results.add(task.get());
                        if (forceException) {
                            throw new Exception("test exception");
                        }
                    } catch (ExecutionException e) {
                        Throwable t = e.getCause();
                        if (t instanceof RuntimeException) {
                            throw ((RuntimeException) t);
                        } else {
                            throw new RuntimeException(t);
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
                done = true;
            } finally {
                if (!done) {
                    for (RunnableFuture<T> future : submittedTasks) {
                        future.cancel(false);
                    }
                }
            }
            return results;
        }
    }
}
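A small diagnostic can help narrow down whether the instrumented FutureTask has lost the RunnableFuture interface or whether two class loaders are involved. The sketch below is not part of the test above; the class FutureTypeProbe and its methods are invented for illustration. It prints the class hierarchy, class loaders, and directly implemented interfaces of whatever ThreadPoolExecutor.submit returns. On a stock JDK 6 or later it should report that FutureTask implements RunnableFuture; comparing that output with a run under the DSO boot jar would show where the two differ.

```java
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RunnableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic: inspects the runtime type of the Future returned by submit().
final class FutureTypeProbe {

    /** Lists each class in the hierarchy with its loader and directly implemented interfaces. */
    static String describe(Object o) {
        StringBuilder sb = new StringBuilder();
        for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass()) {
            sb.append(c.getName())
              .append(" (loader=").append(c.getClassLoader()).append(")\n");
            for (Class<?> itf : c.getInterfaces()) {
                sb.append("  implements ").append(itf.getName())
                  .append(" (loader=").append(itf.getClassLoader()).append(")\n");
            }
        }
        return sb.toString();
    }

    /** Submits a no-op task and reports whether the returned Future is a RunnableFuture. */
    static boolean submitReturnsRunnableFuture() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            Future<String> future = pool.submit(() -> { }, "done");
            System.out.print(describe(future));
            return future instanceof RunnableFuture;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("is RunnableFuture: " + submitReturnsRunnableFuture());
    }
}
```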
The configuration is the following (adding RunnableFuture to the additional boot jar classes does not change anything):
Code:
<?xml version="1.0" encoding="UTF-8"?>
<con:tc-config xmlns:con="http://www.terracotta.org/config">
  <servers>
    <server host="%i" name="localhost">
      <dso-port bind="0.0.0.0">9510</dso-port>
      <jmx-port bind="0.0.0.0">9520</jmx-port>
      <data>terracotta/server-data</data>
      <logs>terracotta/server-logs</logs>
      <statistics>terracotta/cluster-statistics</statistics>
    </server>
  </servers>
  <clients>
    <logs>terracotta/client-logs</logs>
  </clients>
  <application>
    <dso>
      <instrumented-classes/>
      <additional-boot-jar-classes>
        <include>java.util.concurrent.RunnableFuture</include>
      </additional-boot-jar-classes>
    </dso>
  </application>
</con:tc-config>
Environment (to make it short, JDK 1.6.0_25, 64b, Windows):
Code:
awt.toolkit: sun.awt.windows.WToolkit
com.sun.management.jmxremote:
com.sun.management.jmxremote.authenticate: false
file.encoding: Cp1252
file.encoding.pkg: sun.io
file.separator: \
h2.maxFileRetry: 8
java.awt.graphicsenv: sun.awt.Win32GraphicsEnvironment
java.awt.printerjob: sun.awt.windows.WPrinterJob
java.class.path: C:\eclipse\plugins\org.terracotta.dso_3.5.0.r17406_v20110326\lib\resources;C:\eclipse\plugins\org.terracotta.dso_3.5.0.r17406_v20110326\lib\tc.jar
java.class.version: 50.0
java.endorsed.dirs: C:\Java\jdk1.6.0_25\jre\lib\endorsed
java.ext.dirs: C:\Java\jdk1.6.0_25\jre\lib\ext;C:\windows\Sun\Java\lib\ext
java.home: C:\Java\jdk1.6.0_25\jre
java.io.tmpdir: C:\Users\ADMINI~1\AppData\Local\Temp\
java.library.path: C:\Java\jdk1.6.0_25\bin;.;C:\windows\Sun\Java\bin;C:\windows\system32;C:\windows;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\WIDCOMM\Bluetooth Software\;C:\Program Files\WIDCOMM\Bluetooth Software\syswow64;C:\Program Files (x86)\MySQL\MySQL Server 5.1\bin;C:\Program Files\Microsoft Windows Performance Toolkit\;C:\Java\jdk1.6.0_25\bin
java.rmi.server.randomIDs: true
java.runtime.name: Java(TM) SE Runtime Environment
java.runtime.version: 1.6.0_25-b06
java.specification.name: Java Platform API Specification
java.specification.vendor: Sun Microsystems Inc.
java.specification.version: 1.6
java.vendor: Sun Microsystems Inc.
java.vendor.url: http://java.sun.com/
java.vendor.url.bug: http://java.sun.com/cgi-bin/bugreport.cgi
java.version: 1.6.0_25
java.vm.info: mixed mode
java.vm.name: Java HotSpot(TM) 64-Bit Server VM
java.vm.specification.name: Java Virtual Machine Specification
java.vm.specification.vendor: Sun Microsystems Inc.
java.vm.specification.version: 1.0
java.vm.vendor: Sun Microsystems Inc.
java.vm.version: 20.0-b11
line.separator:
os.arch: amd64
os.name: Windows 7
os.version: 6.1
path.separator: ;
sun.arch.data.model: 64
sun.boot.class.path: C:\Java\jdk1.6.0_25\jre\lib\resources.jar;C:\Java\jdk1.6.0_25\jre\lib\rt.jar;C:\Java\jdk1.6.0_25\jre\lib\sunrsasign.jar;C:\Java\jdk1.6.0_25\jre\lib\jsse.jar;C:\Java\jdk1.6.0_25\jre\lib\jce.jar;C:\Java\jdk1.6.0_25\jre\lib\charsets.jar;C:\Java\jdk1.6.0_25\jre\lib\modules\jdk.boot.jar;C:\Java\jdk1.6.0_25\jre\classes
sun.boot.library.path: C:\Java\jdk1.6.0_25\jre\bin
sun.cpu.endian: little
sun.cpu.isalist: amd64
sun.desktop: windows
sun.io.unicode.encoding: UnicodeLittle
sun.java.command: com.tc.server.TCServerMain
sun.java.launcher: SUN_STANDARD
sun.jnu.encoding: Cp1252
sun.management.compiler: HotSpot 64-Bit Tiered Compilers
sun.os.patch.level: Service Pack 1
tc.config: C:tmp\tc-config.xml
tc.install-root: C:\eclipse\plugins\org.terracotta.dso_3.5.0.r17406_v20110326
tc.server.name: localhost
user.country: US
user.dir: C:\Users\Administrator\test
user.home: C:\Users\Administrator
user.language: en
user.name: Administrator
user.timezone: America/Los_Angeles
user.variant:
|
|
|
Un-selecting the box in the admin console seems to have no effect (at least before 2.7.3; with the latest version it is hard to say without debugging the process or watching the network, since it is the last version :o)
It seems to me that AdminClientPanel.java:728 should be
if (isEnabled() && m_updateCheckerControlAction.isUpdateCheckEnabled())
|
|
|
Terracotta version: 2.7.1
The cluster was happy for quite a while, and then switched into a mode where a thread requesting the W lock of a shared RRWL gets stuck for hours, but eventually gets the W lock and can progress. It just so happens that some workers need a lock (not shared) that this stuck thread acquires prior to requesting the cluster-wide W lock, so it became very apparent. I saw CDV-940 as fixed in 2.7.3, so I am trying to figure out whether this bug is the one causing this behavior for us, and whether we should upgrade (we are very close to release and wanted to avoid moving to a newer Terracotta version this late in the cycle).
The L1 and L2 log files reveal nothing, so I am pasting below the stack trace for the stuck thread (the workers are blocked on the locked java.util.HashMap@36f2ecb1; I am showing just one of them). Thanks for any clarifications.
Stuck thread:
"301100868@qtp0-262" - Thread t@323
java.lang.Thread.State: WAITING on java.lang.Object@230e66d8
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at com.tc.object.lockmanager.impl.ClientLock.waitForTryLock(ClientLock.java:583)
at com.tc.object.lockmanager.impl.ClientLock.basicLock(ClientLock.java:204)
at com.tc.object.lockmanager.impl.ClientLock.lock(ClientLock.java:118)
at com.tc.object.lockmanager.impl.ClientLock.tryLock(ClientLock.java:103)
at com.tc.object.lockmanager.impl.ClientLockManagerImpl.tryLock(ClientLockManagerImpl.java:337)
at com.tc.object.lockmanager.impl.StripedClientLockManagerImpl.tryLock(StripedClientLockManagerImpl.java:159)
at com.tc.object.lockmanager.impl.ThreadLockManagerImpl.tryLock(ThreadLockManagerImpl.java:59)
at com.tc.object.tx.ClientTransactionManagerImpl.tryBegin(ClientTransactionManagerImpl.java:154)
at com.tc.object.bytecode.ManagerImpl.tryBegin(ManagerImpl.java:341)
at com.tc.object.bytecode.ManagerImpl.tryMonitorEnter(ManagerImpl.java:553)
at com.tc.object.bytecode.ManagerUtil.tryMonitorEnter(ManagerUtil.java:503)
at java.util.concurrent.locks.ReentrantReadWriteLock$DsoLock.tryLock(ReentrantReadWriteLock/java:58)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(Unknown Source)
at com.hp.dharma.lockmanager.RWLockManagerImpl._aquireLock(RWLockManagerImpl.java:162)
at com.hp.dharma.lockmanager.RWLockManagerImpl.lock(RWLockManagerImpl.java:74)
at com.iconclude.dharma.commons.repo.LockManager.lockWrite(LockManager.java:30)
at com.iconclude.dharma.commons.repo.LocalConnection.executeWrite(LocalConnection.java:1131)
at com.iconclude.dharma.commons.repo.LocalConnection.start(LocalConnection.java:86)
at com.iconclude.dharma.commons.repo.RepoService.openConnection(RepoService.java:175)
at com.iconclude.dharma.commons.repo.RepoService.openWorkspaceConnection(RepoService.java:258)
at com.iconclude.dharma.commons.repo.RepoService.getConnectionForRemoteClient(RepoService.java:219)
- locked java.util.HashMap@36f2ecb1
at com.iconclude.dharma.commons.repo.servlet.HttpRepoService.getLocalConnection(HttpRepoService.java:342)
at com.iconclude.dharma.commons.repo.servlet.HttpRepoService.doOpen(HttpRepoService.java:1149)
at com.iconclude.dharma.commons.repo.servlet.HttpRepoService.doService(HttpRepoService.java:97)
at com.iconclude.dharma.commons.http.AbstractChainedHttpService.service(AbstractChainedHttpService.java:56)
at com.iconclude.dharma.commons.http.AbstractChainedHttpService.service(AbstractChainedHttpService.java:63)
at com.iconclude.dharma.http.HttpDispatchService.service(HttpDispatchService.java:57)
at com.iconclude.dharma.http.DharmaServlet.doGet(DharmaServlet.java:34)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
at com.iconclude.dharma.commons.util.http.DharmaFilterToBeanProxy.doFilter(DharmaFilterToBeanProxy.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at com.iconclude.dharma.commons.util.http.DharmaFilterToBeanProxy.doFilter(DharmaFilterToBeanProxy.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at com.iconclude.dharma.commons.util.http.DharmaFilterToBeanProxy.doFilter(DharmaFilterToBeanProxy.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at com.iconclude.dharma.commons.util.http.DharmaFilterToBeanProxy.doFilter(DharmaFilterToBeanProxy.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Locked ownable synchronizers:
- None
Worker waiting for the non-DSO regular lock:
"1301700068@qtp0-243" - Thread t@304
java.lang.Thread.State: BLOCKED on java.util.HashMap@36f2ecb1 owned by: 301100868@qtp0-262
at com.iconclude.dharma.commons.repo.RepoService.getConnectionForRemoteClient(RepoService.java:212)
at com.iconclude.dharma.commons.repo.servlet.HttpRepoService.getLocalConnection(HttpRepoService.java:342)
at com.iconclude.dharma.commons.repo.servlet.HttpRepoService.doPing(HttpRepoService.java:300)
at com.iconclude.dharma.commons.repo.servlet.HttpRepoService.doService(HttpRepoService.java:99)
at com.iconclude.dharma.commons.http.AbstractChainedHttpService.service(AbstractChainedHttpService.java:56)
at com.iconclude.dharma.commons.http.AbstractChainedHttpService.service(AbstractChainedHttpService.java:63)
at com.iconclude.dharma.http.HttpDispatchService.service(HttpDispatchService.java:57)
at com.iconclude.dharma.http.DharmaServlet.doGet(DharmaServlet.java:34)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
at com.iconclude.dharma.commons.util.http.DharmaFilterToBeanProxy.doFilter(DharmaFilterToBeanProxy.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at com.iconclude.dharma.commons.util.http.DharmaFilterToBeanProxy.doFilter(DharmaFilterToBeanProxy.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at com.iconclude.dharma.commons.util.http.DharmaFilterToBeanProxy.doFilter(DharmaFilterToBeanProxy.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at com.iconclude.dharma.commons.util.http.DharmaFilterToBeanProxy.doFilter(DharmaFilterToBeanProxy.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Locked ownable synchronizers:
- None
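The two traces show a local monitor (the HashMap) being held while the thread waits for the write lock, which is why every worker that needs the HashMap stalls too. One way to limit the blast radius is to bound the wait for the write lock and to make that attempt before entering the local synchronized block. The sketch below shows the idea with a plain java.util.concurrent ReentrantReadWriteLock; the class BoundedWriteLock and its method are invented for illustration, and whether a DSO-clustered lock honors the timeout in the same way in 2.7.1 is an assumption that would need testing.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper: runs an action under the write lock, giving up after a timeout.
final class BoundedWriteLock {

    /**
     * Tries to take the write lock for at most timeoutMs milliseconds.
     * Returns true if the action ran, false if the lock could not be acquired in time.
     */
    static boolean runWithWriteLock(ReentrantReadWriteLock lock, long timeoutMs, Runnable action) {
        boolean acquired;
        try {
            acquired = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        if (!acquired) {
            return false; // caller can log, retry, or fail the request instead of blocking others
        }
        try {
            action.run();
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean ran = runWithWriteLock(lock, 500, () -> System.out.println("holding write lock"));
        System.out.println("action ran: " + ran);
    }
}
```

With this shape, a request that cannot get the cluster-wide lock fails fast, and the local monitor is only taken once the write lock is already held.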
|
|
|
Sure no problem. Now, where do I find the contributor agreement? I searched the site but nothing came up. Thanks,
|
|
|
Looking at the code, it seems that those properties have to be named
com.tc.l2.nha.dirtydb.autoDelete and com.tc.l2.nha.autoRestart
I have patched 2.7.1 (2.7.2 seems to be the same in that area) and implemented rolling of those backups based on a property called com.tc.l2.nha.dirtydb.rolling=<number of most recent backups to keep>. The default is 0, and a value <= 0 means no rolling.
I have attached my changes (look for l2.nha.dirtydb.rolling and DirtyObjectDbCleaner.trimDirtyObjectDbBackups()).
The code can be freely used if deemed worthy. Cheers,
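For readers without the attachment, the rolling rule described above can be sketched as a pure function. This is not the attached patch; the class BackupRolling, the method backupsToDelete, and the backup names in main are made up, and the sketch assumes backup names sort chronologically (for example because they embed a timestamp).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of "keep the N most recent backups"; not the attached patch.
final class BackupRolling {

    /**
     * Returns the backups that should be deleted so that only the 'keep' most recent remain.
     * A value of keep <= 0 means no rolling, so nothing is deleted.
     */
    static List<String> backupsToDelete(List<String> backupNames, int keep) {
        if (keep <= 0) {
            return Collections.emptyList();
        }
        List<String> sorted = new ArrayList<String>(backupNames);
        Collections.sort(sorted); // oldest first, assuming names sort chronologically
        int excess = sorted.size() - keep;
        if (excess <= 0) {
            return Collections.emptyList();
        }
        return new ArrayList<String>(sorted.subList(0, excess));
    }

    public static void main(String[] args) {
        // Made-up backup names with a sortable date suffix.
        List<String> names = Arrays.asList("backup-20090103", "backup-20090101", "backup-20090102");
        System.out.println(backupsToDelete(names, 2));
    }
}
```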
|
|
|
It works, thanks. Yep, the start-tc-server script restarts the server for that specific error code; I should have paid attention to it. We are not using it (we have our own way to "baby-sit" the L2 process), which is why the whole thing was failing for me.
There is another problem though: the backups keep accumulating, as there is no roll-over mechanism...
|
|
|
Very simple scenario (we are using 2.7.1):
- the L2s are configured for HA in networked-active-passive mode
- the machine that hosts the stand-by L2 got rebooted
- the L2 failed to come up because of a dirty DB
- the active L2 machine died, and the whole cluster died as there was no stand-by to take over
Is there a way to tell the L2(s) to delete the DB automatically when they come up? Thanks,
|
|
|
Yep, and no success...
How do you set the proxy info for the Eclipse plugin's TIM update manager? Is there some way of adding that to tim-get.properties?
|
|
|