| Author |
Message |
03/02/2009 15:24:14
|
ed_b_71
neo
Joined: 03/02/2009 15:14:06
Messages: 2
Offline
|
Hi,
I need to manage live sessions in a deployment with a lot of traffic. The requirements are for 100,000 tps - that's right, 100K creations and deletions of sessions per second - and up to 10 million sessions in memory.
I plan to stripe the sessions across multiple machines with affinity - say 5 machines each handling 2 million sessions - but can Terracotta handle these kinds of numbers?
The session objects are going to be small, say 2K each.
Just wanted to get a gut feel from the community.
thanks!
|
|
|
 |
03/02/2009 22:12:02
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1356
Location: San Francisco, CA
Offline
|
Wow. 100K tps @ 2K per object == 200MB/sec, or about 1.6 Gbits per sec.
Can Terracotta do this? Likely yes, but depending on _many_ factors.
We can do 100K tps when the transactions are highly localized and you are not bound by disk I/O when using our striped array in the FX product. When you say session, is this a sticky HTTP session? If yes, we have a chance. And, when you say session, can we assume fewer than 2KB change per request? And, most importantly, is this peak or sustained? What's the new session creation rate? In an average second, how many sessions will be created and how many would be garbage collected? What's the TTL on sessions? After all this, the answer might be yes. I suggest you get on the phone with us and talk it through.
I also suggest you be prepared to partition sub-clusters.
What would you do if Terracotta cannot do this? I ask because we believe we have one of the fastest technologies on the planet for such use cases. We may need as few as 10 commodity Linux / Intel machines (perhaps with SSD or running w/ no disk persistence). But most other technologies that we come up against can only do 10 - 100 writes / sec per node so you would need 1000 - 10K servers in a "grid" to support this sort of throughput.
How are you thinking of tackling this w/o Terracotta and how do we get on a call to discuss the details so that we can make a promise or say "no go" based on accurate info?
--Ari
|
|
|
 |
03/03/2009 13:35:29
|
ed_b_71
neo
Joined: 03/02/2009 15:14:06
Messages: 2
Offline
|
The idea is to use in-memory storage only - nothing written to disk at this level of throughput - with availability provided by a backup TC server array.
The session objects stick around for about 10 minutes (average session duration) and have a size of 2K.
100K tps is the rate at which these session objects are created and removed: 50K created per second, 50K deleted per second. This is the peak-hour rate; the sustained rate is 50% of peak.
Session objects only get added and then removed (never modified). They are queried a number of times.
I can provide affinity externally to make sure that under normal operating conditions the object is added, queried and deleted on a single server instance, but I need the data to be available on other instances in case of failure.
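A minimal Java sketch of the kind of external affinity described above, assuming a simple hash-based router (not something from Terracotta or any load balancer product). The class name `AffinityRouter` and the server names are hypothetical. Each session id maps to a fixed "home" server; if that server is marked down, the router walks to the next server in the list. It only decides where a request goes; making the session data readable on the fallback server is assumed to be the job of the clustered store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical hash-based affinity router with simple failover. */
final class AffinityRouter {
    private final List<String> servers;

    AffinityRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
    }

    /**
     * Returns the server that should handle sessionId, skipping any server
     * listed in down. Returns null if every server is down.
     */
    String route(String sessionId, Set<String> down) {
        int n = servers.size();
        // floorMod keeps the index non-negative even for negative hash codes.
        int home = Math.floorMod(sessionId.hashCode(), n);
        for (int i = 0; i < n; i++) {
            String candidate = servers.get((home + i) % n);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Server names are made up for illustration.
        AffinityRouter router = new AffinityRouter(
                Arrays.asList("tc-a", "tc-b", "tc-c", "tc-d", "tc-e"));
        String home = router.route("session-42", Collections.<String>emptySet());
        String backup = router.route("session-42", Collections.singleton(home));
        System.out.println("home=" + home + " backup=" + backup);
    }
}
```

Under normal conditions every create, query and delete for a given session id lands on the same server, which is what keeps the traffic local.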
Alternative -
I have built a similar system, a C++ implementation, at about a third of these requirements (partitioned), using Berkeley DB as an in-memory cache.
The solution was partitioned, as I am thinking of doing with TC: most load-bearing servers ran with a subset of the overall sessions in cache, and a master kept the entire set to handle failure of the subset servers. We used replication between the subset servers and the master.
It does not have all the CAP characteristics of Terracotta, but it worked well for the high throughput requirements.
I am investigating Java-based technologies for the next version, so I was looking at EhCache to store up to 2 million sessions and TC for availability and replication.
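A rough Java sketch of the per-server access pattern described in this post (add once, query several times, remove, never modify). It uses plain `java.util.concurrent.ConcurrentHashMap`, not EhCache or Terracotta APIs, and shows no clustering configuration. The class name `SessionStore` and its method names are hypothetical. The `concurrencyLevel` constructor argument controls the number of lock segments on older JDKs; on Java 8 and later it is only a sizing hint.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory store for small, immutable session objects. */
final class SessionStore {
    private final ConcurrentMap<String, byte[]> sessions;

    SessionStore(int expectedSessions, int concurrencyLevel) {
        // Pre-size the map so it does not have to resize repeatedly under load.
        sessions = new ConcurrentHashMap<>(expectedSessions, 0.75f, concurrencyLevel);
    }

    /** Adds a session. Returns false if the id already exists; sessions are never updated. */
    boolean create(String id, byte[] payload) {
        return sessions.putIfAbsent(id, payload) == null;
    }

    /** Returns the session payload, or null if the id is unknown. */
    byte[] lookup(String id) {
        return sessions.get(id);
    }

    /** Removes a session. Returns true if it was present. */
    boolean delete(String id) {
        return sessions.remove(id) != null;
    }

    int size() {
        return sessions.size();
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore(1024, 64);
        store.create("session-1", new byte[2000]); // roughly a "2K" session
        System.out.println("found: " + (store.lookup("session-1") != null));
        store.delete("session-1");
        System.out.println("size after delete: " + store.size());
    }
}
```

Because entries are never modified after creation, `putIfAbsent` and `remove` are the only write operations the store needs.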
|
|
|
 |
03/03/2009 20:31:55
|
ari
seraphim
Joined: 05/24/2006 14:23:21
Messages: 1356
Location: San Francisco, CA
Offline
|
Ok, so let's break this down a sec:
50K creates every second
50K garbage every second
10 minute TTL
no updates
perfect locality
a few quick calcs:
10 * 60 = 600 secs * 50K objects = 30,000,000 objects in the system at steady state operation
and 50K objects created per sec
let's say DGC runs every minute, it would have 50K * 60 = 3,000,000 objects to clean every DGC run.
30MM objects *2K each = 60GB data
6GB deleted every minute.
100MB / second sent to the array (50K objects * 2K each)
100MB / second removed from the array's disks.
0 read traffic because of your load balancer. In fact, this will basically work out to near zero lock contention with our ConcurrentStringMap or ConcurrentHashMap with lots of segments. It's purely a write-only use case as far as Terracotta is concerned.
I would say this feels like a 10-node TC Server (L2) array (x2 for HA - 20 nodes).
At 10 TC Server instances in an FX array you divide everything by 10 (good striping),
so each L2 will see:
10MB / second new objects
10MB / second disk I/O
6GB of data
300,000 objects of garbage per minute
3MM objects total
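The quick calcs above are steady state = arrival rate * lifetime, then divided across stripes. A small Java sketch that reproduces them, assuming "2K" means 2,000 bytes, one DGC run per minute, and 10 evenly loaded L2s (the class and method names are hypothetical). Note that 50K creates per second at 2K each comes to 100MB / second of new data across the array, or 10MB / second per L2.

```java
/** Back-of-envelope sizing for the numbers quoted in this thread. */
final class SessionSizing {
    static final long CREATES_PER_SEC = 50_000;   // from the thread: 50K creates/sec at peak
    static final long TTL_SECONDS = 10 * 60;      // from the thread: 10 minute session lifetime
    static final long OBJECT_BYTES = 2_000;       // assumption: "2K" taken as 2,000 bytes
    static final long DGC_INTERVAL_SECONDS = 60;  // assumption: one DGC run per minute
    static final int STRIPES = 10;                // assumption: 10 active L2s, evenly striped

    /** Objects alive at steady state: arrival rate times lifetime. */
    static long liveObjects() {
        return CREATES_PER_SEC * TTL_SECONDS;
    }

    /** Bytes held at steady state. */
    static long liveBytes() {
        return liveObjects() * OBJECT_BYTES;
    }

    /** Objects that become garbage between two DGC runs. */
    static long garbagePerDgcRun() {
        return CREATES_PER_SEC * DGC_INTERVAL_SECONDS;
    }

    /** Bytes of new objects written per second across the whole array. */
    static long writeBytesPerSec() {
        return CREATES_PER_SEC * OBJECT_BYTES;
    }

    public static void main(String[] args) {
        System.out.printf("live objects:      %,d (%,d per L2)%n",
                liveObjects(), liveObjects() / STRIPES);
        System.out.printf("live data (bytes): %,d (%,d per L2)%n",
                liveBytes(), liveBytes() / STRIPES);
        System.out.printf("garbage per DGC:   %,d (%,d per L2)%n",
                garbagePerDgcRun(), garbagePerDgcRun() / STRIPES);
        System.out.printf("write bytes/sec:   %,d (%,d per L2)%n",
                writeBytesPerSec(), writeBytesPerSec() / STRIPES);
    }
}
```

Changing `STRIPES`, `TTL_SECONDS` or `DGC_INTERVAL_SECONDS` shows how sensitive the per-L2 load is to each assumption.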
You definitely need Terracotta FX for this (paid product). What's more, you need the FX gen2 array coming out in the next release (a few weeks away now). Do you wanna pilot it? On a Gen1 array (in production at paid customers for the last year+) this might work as well.
But you tell us: do you want a 20 L2 cluster with Terracotta? And do you want to pay for the software?
We would love to work on this use case, but it is quite a lot of I/O and it sounds like the data is not too valuable, so I must ask: why store it in TC to begin with?
Might make sense to run w/o disk persistence and run 10 64-bit L2's each with a 6 - 10GB heap. Not sure w/o some hands-on study through our services group.
Your call. Send me a personal message or email me at ari AT terracotta DOT org if you want to move fwd.
In short, this is a BIG use case. That should be evident to you from the fact that an IBM z-Series computer is good for 25K tps. If a $1MM machine can't do your use case and we can in 20 or so $5K machines == $100K HW, I think our solution is good...but only you know if it will be good enough for you.
BTW, if you partition the cluster up into, say, 5 separate 2-L2 arrays, you will find we can almost definitely do this use case almost out of the box.
--Ari
|
|
|
 |