Author |
Message |
09/01/2008 22:40:03
|
smartbelief
neo
Joined: 09/01/2008 21:02:13
Messages: 4
Offline
|
Hi,
I have a problem when the tc client gets lots of concurrent requests:
Code:
2008-09-02 11:11:07,298 [TCComm Main Selector Thread] INFO com.tc.net.core.TCConnectionManager - error event on connection com.tc.net.core.TCConnectionJDK14@745028604: connected: true, closed: false local=127.0.0.1:27868 remote=127.0.0.1:9510 connect=[Tue Sep 02 10:55:57 CST 2008] idle=728ms [90549 read, 8598825 write]: Broken pipe
After that, the tc client disconnected from the tc server, and all the request-handling threads blocked waiting for responses from tc.
BTW, I am using terracotta-2.6.2 with the EHCache module.
|
|
|
09/02/2008 18:05:28
|
smartbelief
|
I got this error again today:
Code:
2008-09-02 19:34:10,555 [TCComm Main Selector Thread] INFO com.tc.net.core.TCConnectionManager - error event on connection com.tc.net.core.TCConnectionJDK14@1688936871: connected: true, closed: false local=192.168.0.216:62066 remote=192.168.0.216:9510 connect=[Mon Sep 01 11:49:53 CST 2008] idle=117ms [39108458 read, 623439915 write]: Broken pipe
2008-09-02 19:34:11,248 [TCComm Main Selector Thread] WARN com.tc.net.core.CoreNIOServices - Exception trying to shutdown socket output: Transport endpoint is not connected
|
|
|
09/02/2008 21:09:37
|
tgautier
seraphim
Joined: 06/05/2006 12:19:26
Messages: 1781
Offline
|
Ok. It will help if you explain the environment and situation in more detail.
|
|
|
09/02/2008 23:01:04
|
smartbelief
|
I am using ehcache to cache contacts.
tc-config.xml:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<con:tc-config xmlns:con="http://www.terracotta.org/config">
  <servers>
    <server host="127.0.0.1">
      <dso-port>9510</dso-port>
      <jmx-port>9520</jmx-port>
      <data>terracotta/server-data</data>
      <logs>terracotta/server-logs</logs>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
  </servers>
  <clients>
    <logs>terracotta/client-logs</logs>
    <modules>
      <module name="tim-ehcache-1.3" version="1.1.1"/>
    </modules>
  </clients>
  <application>
    <dso>
      <instrumented-classes>
        <include>
          <class-expression>com.pqs.contact.Contact</class-expression>
          <honor-transient>true</honor-transient>
        </include>
      </instrumented-classes>
      <roots>
        <root>
          <field-name>com.pqs.contact.ContactProvider.cacheManager</field-name>
        </root>
      </roots>
      <locks>
        <autolock>
          <method-expression>void com.pgs.contact.ContactItem.*(..)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
    </dso>
  </application>
</con:tc-config>
source:
Code:
package com.pqs.Contact;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class ContactProvider {
    public CacheManager cacheManager = new CacheManager();
    private Cache contactCache = null;

    /**
     * Returns the Contact for the given username.
     *
     * @param username the username to search for.
     * @return the contact associated with the ID.
     * @throws com.pqs.Contact.UserNotFoundException if the ID does not correspond
     * to a known entity on the server.
     */
    public Contact getContact(String username) throws UserNotFoundException {
        if (contactCache == null) {
            cacheManager.addCache("contact");
            contactCache = cacheManager.getCache("contact");
        }
        if (contactCache == null) {
            throw new UserNotFoundException("Could not load caches");
        }
        Element contactEle = contactCache.get(username);
        if (contactEle == null) {
            Contact contact = new Roster(username);
            contactEle = new Element(username, contact);
            contactCache.put(contactEle);
        } else {
            System.out.println("getRoster from cache:" + username);
        }
        Contact contact = (Contact) contactEle.getValue();
        return contact;
    }
}
When I started 1,000 users calling getContact() simultaneously, the error above occurred.
Then I tried wrapping the code in getContact() in synchronized(cacheManager) { }, and it runs smoothly now, but I want to know why this happened.
source modified:
Code:
package com.pqs.Contact;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class ContactProvider {
    public CacheManager cacheManager = new CacheManager();
    private Cache contactCache = null;

    /**
     * Returns the Contact for the given username.
     *
     * @param username the username to search for.
     * @return the contact associated with the ID.
     * @throws com.pqs.Contact.UserNotFoundException if the ID does not correspond
     * to a known entity on the server.
     */
    public Contact getContact(String username) throws UserNotFoundException {
        synchronized (cacheManager) {
            if (contactCache == null) {
                cacheManager.addCache("contact");
                contactCache = cacheManager.getCache("contact");
            }
            if (contactCache == null) {
                throw new UserNotFoundException("Could not load caches");
            }
            Element contactEle = contactCache.get(username);
            if (contactEle == null) {
                Contact contact = new Roster(username);
                contactEle = new Element(username, contact);
                contactCache.put(contactEle);
            } else {
                System.out.println("getRoster from cache:" + username);
            }
            Contact contact = (Contact) contactEle.getValue();
            return contact;
        }
    }
}
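For readers comparing the two versions: the unsynchronized getContact() does a check-then-act on contactCache (test for null, create the cache, assign the field), so two threads can both see null and both try to create it. Whether that is what caused the broken pipe is not established in this thread. The sketch below only illustrates the single-JVM side of the idea in plain Java (8 or later), with no Ehcache or Terracotta dependency. The class name ContactStore and the String "contact" values are hypothetical stand-ins, and it does not model Terracotta's clustered locking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ContactProvider; an in-JVM map replaces the Ehcache cache.
class ContactStore {
    // Created once at construction, so there is no lazy "if (cache == null)" check to race on.
    private final ConcurrentHashMap<String, String> contacts = new ConcurrentHashMap<>();
    // Counts how many times a contact had to be built (cache misses).
    private final AtomicInteger loads = new AtomicInteger();

    String getContact(String username) {
        // ConcurrentHashMap.computeIfAbsent applies the loader at most once per key,
        // even when many threads ask for the same missing key at the same time.
        return contacts.computeIfAbsent(username, name -> {
            loads.incrementAndGet();
            return "contact:" + name;
        });
    }

    int loadCount() {
        return loads.get();
    }
}
```

Compared with synchronizing the whole method, this keeps cache hits from queuing behind one another inside a single JVM. It says nothing about how the same access pattern behaves once the object is shared through Terracotta.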
|
|
|
09/03/2008 05:05:10
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1665
Location: San Francisco, CA
Offline
|
How do you start the 1000 threads? On 1 JVM? On many?
You have a write lock on getContact() so if you are starting the threads across JVMs, they will all contend with each other to acquire the write lock.
How do you generate load on this? How many of the 1K threads are getting a cache miss and having to generate a contact? And how many are simply reading an already-loaded contact?
--Ari
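The difference between a read that hits the cache and a miss that has to create a contact can be illustrated inside one JVM with the JDK's ReentrantReadWriteLock. This is only an analogy for the contention described above, not Terracotta's clustered lock implementation or its configuration. The class name ReadMostlyContacts and the String values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical read-mostly store: hits share a read lock, misses take the write lock.
class ReadMostlyContacts {
    private final Map<String, String> contacts = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    String getContact(String username) {
        // Fast path: many readers may hold the read lock at the same time.
        lock.readLock().lock();
        try {
            String hit = contacts.get(username);
            if (hit != null) {
                return hit;
            }
        } finally {
            lock.readLock().unlock();
        }
        // Slow path: a miss needs the exclusive write lock to populate the map.
        lock.writeLock().lock();
        try {
            // Re-check, because another thread may have loaded the entry in the meantime.
            String existing = contacts.get(username);
            if (existing == null) {
                existing = "contact:" + username;
                contacts.put(username, existing);
            }
            return existing;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

If every call takes the exclusive lock, as with a write-level lock around the whole method, the 1K threads run one at a time even when all of them are only reading.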
|
|
|
09/03/2008 17:59:09
|
smartbelief
|
The request threads come from 2 other JVMs.
All requests simply read an already-loaded contact, which is composed of a huge XML stanza.
|
Attachment: tc.JPG (system structure diagram, 36 Kbytes)
|
|
|
09/03/2008 20:19:43
|
ari
|
What's your tc server doing when your client gets a broken pipe? Do your server logs show an OOME or an assertion failure and exit, perchance?
Can you reproduce this / do you still have the server logs? Please attach them here. I suspect you are doing something like one of the following:
1. You have too much parallel load going at your application from a single JVM (2 JVMs each with 500 - 1000 threads is what I think you are doing, no?)
2. You have not tuned your app to handle the payloads you are sending through Terracotta. You might be running out of memory, especially if your updates are large enough.
I also think that you are asking why your threads block when you get a broken pipe. The threads will block while a TC server is not available. This is why you always run 2 TC servers in active / passive mode in production. In your case, the connection between the client and the tc server gets severed, simulating a TC outage, and then all your threads trying to write will block. I wouldn't worry about this issue until we work out why you are getting the broken pipe.
--Ari
|
|
|
09/03/2008 20:22:04
|
ari
|
BTW, have you done the simple arithmetic of:
XML_Payload x number of parallel threads = total number of bytes sent / second to TC.
If that # is > 1Gbit / second I seriously doubt your test will ever succeed, till you bring more machines into the mix. There are ways to get more than 1Gbit / sec but it will take much more work than this simple test I think you are running.
What is that arithmetic in your test, please? KBytes / sec? MBytes / sec? Gigabytes / sec?
--Ari
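The arithmetic above can be sketched in a few lines of Java. The payload size and request rate below are made-up placeholders, since the thread never states the real numbers; substitute measured values. The class and method names are hypothetical.

```java
// Rough estimate of the traffic sent to the TC server: payload size x requests per second.
class ThroughputEstimate {
    // 1 Gbit/s in bits per second (decimal giga, as used for network links).
    static final long ONE_GBIT_PER_SEC = 1_000_000_000L;

    // payloadBytes: size of one XML payload in bytes.
    // requestsPerSecond: total requests per second across all threads.
    static long bitsPerSecond(long payloadBytes, long requestsPerSecond) {
        return payloadBytes * requestsPerSecond * 8L;
    }

    static boolean exceedsGigabit(long payloadBytes, long requestsPerSecond) {
        return bitsPerSecond(payloadBytes, requestsPerSecond) > ONE_GBIT_PER_SEC;
    }

    public static void main(String[] args) {
        long payloadBytes = 200 * 1024; // assumption: 200 KB per contact, not a figure from this thread
        long requestsPerSecond = 1000;  // assumption: 1,000 threads, one request each per second
        long bits = bitsPerSecond(payloadBytes, requestsPerSecond);
        System.out.println(bits + " bits/s, over 1 Gbit/s: "
                + exceedsGigabit(payloadBytes, requestsPerSecond));
    }
}
```

With these placeholder numbers the estimate comes to roughly 1.6 Gbit/s, which is already past the 1 Gbit/s mark mentioned above.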
|
|
|
|