| Author |
Message |
06/12/2008 02:03:53
|
richardw
journeyman
Joined: 05/16/2008 03:38:29
Messages: 33
Offline
|
Hi,
We are running 2 Terracotta servers (2.6.2 nightly) in network-active-passive mode, and we are clustering our HTTP sessions with Jetty and Wicket. We have 8 active clients.
The server data directory (the terracotta/server-data one) is massive: about 14 GB, up from 8 GB about 18 hours ago. If it keeps growing like this, we will start to run into some serious problems.
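For a rough sense of the growth rate, the figures above (8 GB to 14 GB in about 18 hours) can be turned into a quick estimate. This is only a minimal sketch: the 100 GB volume size is a hypothetical value chosen for illustration, and the estimate assumes growth stays linear, which nothing in this thread establishes.

```python
def growth_rate_gb_per_hour(start_gb, end_gb, hours):
    """Average growth of the data directory over the observed window."""
    return (end_gb - start_gb) / hours


def hours_until_full(current_gb, capacity_gb, rate_gb_per_hour):
    """Hours left before the volume fills, assuming linear growth."""
    if rate_gb_per_hour <= 0:
        return float("inf")  # not growing, so it never fills
    return max(0.0, (capacity_gb - current_gb) / rate_gb_per_hour)


# Figures from the post: 8 GB -> 14 GB in about 18 hours.
rate = growth_rate_gb_per_hour(8, 14, 18)

# HYPOTHETICAL capacity, not taken from the post.
capacity_gb = 100
remaining = hours_until_full(14, capacity_gb, rate)

print(f"growth rate: {rate:.2f} GB/hour")
print(f"hours until a {capacity_gb} GB volume fills: {remaining:.0f}")
```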
The number of active sessions, while quite large (~2000), seems to remain fairly constant, so I am quite sure that old sessions are being removed. Most sessions are quite small, since we get a lot of traffic from search engine crawlers, which create a new session on each request.
Also the terracotta garbage collection is taking a large amount of time and seems to be running all of the time. I have attached a screenshot of the admin console.
Is there something we are doing wrong?
Thanks,
Richard
|
Attachment: Screenshot-Terracotta Administrator Console.png (66 Kbytes)
|
 |
06/12/2008 03:03:16
|
gbevin
praetor
Joined: 07/04/2007 09:09:42
Messages: 210
Offline
|
Hi,
I can't really comment on the state of our Wicket support, but I know that a while ago it was not optimal, partly due to the way sessions are used in Wicket.
Just based on your screenshot and description, it seems to me that the session attributes might be constantly replaced with new values and new instances. This creates a lot of garbage that has to be handled by the server. If this garbage is created at a rate that is faster than what the current garbage collection cycles can handle, the data on the server will gradually increase.
You might want to try tuning the garbage collection cycles to make it run more frequently, as described here:
http://www.terracotta.org/confluence/display/docs1/Configuration+Guide+and+Reference#ConfigurationGuideandReference-%2Ftc%3Atcconfig%2Fservers%2Fserver%2Fdso%2Fgarbagecollection
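As a rough illustration of what that setting looks like, the sketch below renders a garbage-collection element for the server's dso section of the Terracotta config file. The element names (garbage-collection, enabled, verbose, interval) and the assumption that the interval is given in seconds are inferred from the path in the link above and from the 2.x configuration reference as best recalled, so check them against the guide for your version. The 900-second value is only an example.

```python
import xml.etree.ElementTree as ET


def garbage_collection_fragment(interval_seconds, verbose=False, enabled=True):
    """Render a <garbage-collection> element as a string.

    ASSUMPTION: the child element names and the seconds unit follow the
    Terracotta 2.x config schema; verify them against the reference guide.
    """
    gc = ET.Element("garbage-collection")
    ET.SubElement(gc, "enabled").text = str(enabled).lower()
    ET.SubElement(gc, "verbose").text = str(verbose).lower()
    ET.SubElement(gc, "interval").text = str(interval_seconds)
    return ET.tostring(gc, encoding="unicode")


# Example value only: run DGC every 15 minutes with verbose output on.
fragment = garbage_collection_fragment(15 * 60, verbose=True)
print(fragment)
```

The printed fragment would sit under servers/server/dso in the config file, again assuming the path shown in the link.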
Hope this helps,
Geert
|
Want to post to this forum? Join the Terracotta Community |
|
|
 |
06/12/2008 07:19:42
|
richardw
journeyman
Joined: 05/16/2008 03:38:29
Messages: 33
Offline
|
OK, thanks. I have restarted using a 15-minute interval and a 1.5 GB JVM (it was 512 MB before). I will post back my findings.
Just before my restart, my server data dir had grown to 28 GB.
|
|
|
 |
06/12/2008 09:48:13
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1356
Location: San Francisco, CA
Offline
|
Did you increase load on the system in the last day? The DGC is running for hours (3 - 7MM milliseconds == roughly 1 - 2 hours) and is finding tens of millions of candidate objects.
2 days ago it was running for shorter periods and finding less garbage. What changed?
Definitely need to bump your DGC interval. It looks like you are in production. Are you? If yes, then don't do what I am about to suggest there but in a lab under load instead. Try DGC very aggressive (like every couple of minutes) and see how that impacts overall throughput. Then you have a good bounding point for analysis. It seems like we already know that the default DGC is not keeping up because your disk is growing FAST.
But don't forget to explain to yourself why the last 24 hours got a lot worse than before. As gbevin points out, you could have issues other than DGC. Hard to tell, but this is why I suggest (a) explain to yourself the sudden jump in garbage and then (b) tune your DGC somewhere other than production.
Cheers,
--Ari
|
|
|
 |
06/12/2008 10:02:45
|
richardw
journeyman
Joined: 05/16/2008 03:38:29
Messages: 33
Offline
|
Hi,
Sorry, I should have said this before. We only turned our prod servers over to Terracotta yesterday; that's why you see the massive jump. We did some load testing but didn't factor in all of the traffic we get from web crawlers, even though we were hammering our office internet connection on our test site.
We are currently running GC once every 15 minutes, and my disk usage is at 3 GB now and seems to be slowly increasing. The last GC took 1,677,112 ms and removed about 2.5 million garbage objects.
Is there any way I can speed up the garbage collection? I'm not really up on JVM tuning myself; the only change we made was to give it more memory.
Thanks,
Richard
|
|
|
 |
06/12/2008 10:09:44
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1356
Location: San Francisco, CA
Offline
|
You mean you are tuning DGC, right?
So, if you are speaking about DGC, your DGC is too slow right now. Here's the math:
1. 15 minute interval
2. 1,677,112 ms per run
3. About 27.95 minutes (1,677,112 ms / 60,000 ms per minute) to run a single pass!!!
In other words, you are skipping 1 in 2 DGCs because the previous one is still running when the next one wants to fire. I hope that made sense.
You have to get DGC to a point where the duration < interval. This is why I said try something hyper-aggressive like every minute outside a production environment. See what happens. The other alternative is to not make all these objects shared. Perhaps some could be transient to TC (with an on-load hook). I am concerned that you are sending 9MM objects of garbage to the TC server in less than 15 or 30 minutes.
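The arithmetic above can be sketched as follows, assuming (as described in this thread) that DGC passes are scheduled at a fixed interval, that only one pass runs at a time, and that a scheduled start falling inside a running pass is skipped. The function name is hypothetical; the duration and the 2.5 million object count come from the earlier post.

```python
def dgc_schedule(interval_ms, duration_ms):
    """Return (skipped starts per pass, ms of headroom before one more skip).

    ASSUMPTION: fixed-interval scheduling where a start that comes due
    while a pass is still running is dropped rather than queued.
    """
    skipped = duration_ms // interval_ms
    margin_ms = (skipped + 1) * interval_ms - duration_ms
    return skipped, margin_ms


interval_ms = 15 * 60 * 1000   # 15-minute interval
duration_ms = 1_677_112        # last DGC pass, from the post

skipped, margin_ms = dgc_schedule(interval_ms, duration_ms)
duration_min = duration_ms / 60_000

# Throughput of the last pass: about 2.5 million objects removed.
objects_per_second = 2_500_000 / (duration_ms / 1000)

print(f"pass duration: {duration_min:.2f} minutes")
print(f"scheduled starts skipped per pass: {skipped}")
print(f"headroom before one more skip: {margin_ms / 1000:.0f} seconds")
print(f"collection rate: {objects_per_second:.0f} objects/second")
```

Under these assumptions a pass only stops causing skips once its duration drops below the interval, which is the duration < interval condition stated above.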
--Ari
|
|
|
 |
06/12/2008 20:47:16
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1356
Location: San Francisco, CA
Offline
|
I should have mentioned the doubly scary thing, which is that in the example above from your production logs, you have only about 2 minutes to go before you end up skipping 2 DGCs (1,677,112 ms is about 27.95 minutes, and the second skip happens at 30). Thus your giant backlog and growing disk, I think.
(At 30 minutes DGC duration you will have missed the one that was to fire 15 minutes after you started, and the one that was to fire 15 minutes after that. We only run one at a time.)
Definitely shrink your DGC window but also ask yourself why so much garbage?
--Ari
|
|
|
 |
06/13/2008 00:31:35
|
steve
consul
Joined: 05/24/2006 14:22:53
Messages: 424
Online
|
Maybe take a snapshot of things using the cluster visualization tool and someone can take a look. Also, having a lot of memory on the machine that ISN'T used by the Java heap makes DGC work faster. There are other tuning params that the field guys should be able to help with.
|
|
|
 |
06/13/2008 03:35:56
|
gbevin
praetor
Joined: 07/04/2007 09:09:42
Messages: 210
Offline
|
This FAQ item might also be handy:
http://www.terracotta.org/confluence/display/wiki/TechnicalFAQ#TechnicalFAQ-WhatisDGCandwhyshouldItuneit.AndifIneedto%2ChowshouldItuneDGC.
|
|
|
 |
06/13/2008 03:54:56
|
richardw
journeyman
Joined: 05/16/2008 03:38:29
Messages: 33
Offline
|
OK, I'll try that.
Where is je.properties? I can't find it.
Thanks,
Richard
|
|
|
 |
06/13/2008 04:03:53
|
gbevin
praetor
Joined: 07/04/2007 09:09:42
Messages: 210
Offline
|
I think it's a properties file that you need to create and put in the "terracotta/server-data/objectdb" directory. I'm not 100% sure, will ask someone here to weigh in on this. Thanks for asking!
|
|
|
 |
06/13/2008 04:13:27
|
njain
master
Joined: 01/03/2007 06:41:59
Messages: 70
Offline
|
The je properties are set in the tc.properties file.
Create a file named 'tc.properties' in the $TC_INSTALL_DIR/lib directory. Add the je properties as name-value pairs, prepending the 'l2.berkeleydb.' prefix to each property name.
For example, to override je.lock.nLockTables with the value 10, add the following line to tc.properties:
l2.berkeleydb.je.lock.nLockTables=10
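If several je settings need to be carried over, the prefixing rule can be applied mechanically. A minimal sketch, assuming a plain mapping of property names to values; the helper name is hypothetical, and only je.lock.nLockTables comes from this post.

```python
PREFIX = "l2.berkeleydb."


def to_tc_properties(je_settings):
    """Turn je property names into tc.properties lines with the prefix added."""
    lines = []
    for name, value in je_settings.items():
        # Leave names alone if they already carry the prefix.
        key = name if name.startswith(PREFIX) else PREFIX + name
        lines.append(f"{key}={value}")
    return lines


lines = to_tc_properties({"je.lock.nLockTables": 10})
print("\n".join(lines))
```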
|
Regards,
Nitin Jain
Terracotta, Inc.
Join the Terracotta Community
|
|
|
 |
06/13/2008 04:41:47
|
ssubbiah
master
Joined: 05/24/2006 14:25:22
Messages: 98
Location: Saravanan Subbiah
Offline
|
je.properties holds the properties used by Berkeley DB and can be used as explained here.
But an easy way to configure any je property in Terracotta is to add the prefix "l2.berkeleydb." to it and add it to the tc.properties file that is used to tune Terracotta. More on tc.properties can be found here.
The properties mentioned for tuning the cleaner thread will only help if DGC is fast but the cleaner thread is falling behind in cleaning up space. In your case you are creating a lot of garbage, and hence DGC is taking a long time to delete those objects. I would suggest turning the verbose GC setting on in the config the next time you have a chance to restart; it will print more details about how long each stage took.
BTW, 2.7 will have many optimizations around DGC which will make it run faster for various use cases.
|
Saravanan Subbiah
Terracotta Engineer |
|
|
 |
06/13/2008 06:58:44
|
richardw
journeyman
Joined: 05/16/2008 03:38:29
Messages: 33
Offline
|
Using the verbose GC switch, I am seeing that a GC takes between 0.06 and 0.1 seconds and a Full GC takes around 5 seconds.
I'm using the suggestions found here: http://www.terracotta.org/confluence/display/wiki/TechnicalFAQ#TechnicalFAQ-WhatisDGCandwhyshouldItuneit.AndifIneedto%2ChowshouldItuneDGC.
And the DGC runs every 1 minute, but it is still taking a long time to run (the last one was around 15 minutes, and I am still waiting on the next one after 25 minutes).
I'm finding it quite difficult to replicate our prod load on the test system though, so I can't be 100% sure that these changes will reflect what will happen on prod.
I don't think I can control how much garbage gets created. If I look under the classes section of the admin console, most of the top classes are Wicket ones, and our actual data objects don't get sent to the Terracotta server, only the id.
|
|
|
 |
06/13/2008 09:40:30
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1356
Location: San Francisco, CA
Offline
|
Well,
It is always hard to replicate production load in test. I never found a way, honestly. The question I was asking was how hot does your TC server run (CPU, memory, etc.) and how much slower does it go w/ DGC every minute.
If the answer is that all DGC takes 15+ minutes, then the TC server is not going to run any hotter at 1 minute, 2 minute, or 15 minute intervals. It will just be running DGC all the time :(
And so it will have predictable performance but the disk in your use case is growing faster than DGC can clean it.
You could go ahead and try 1 minute in prod then.
But I think we need to shift focus back to gbevin's point. Wicket is creating garbage on every page request. Someone like gbevin should step in here and help us figure this out (perhaps with Jonathan or Eelco's help).
Sorry I can't be of more service on this one.
--Ari
|
|
|
 |
|
|