Why only a single thread for applying transactions?
Forum Index -> Terracotta DSO
jie
Joined: 12/06/2007 02:31:27
Messages: 6

I am investigating using Terracotta in our system, which needs a POJO cluster with HA, linear scalability, and high performance. Terracotta is the best of the approaches I have investigated so far. However, I'm wondering why Terracotta uses only a single thread for applying transactions. In my test, a single client (L1) commits 10,000+ transactions per second. The tc-server seems to have a limit of 10,000 tps no matter how powerful the hardware is (8 cores, Gigabit network, 4 GB RAM, no persistence). I eventually concluded that the single-threaded APPLY_CHANGES_STAGE may be the bottleneck. Furthermore, since only one thread applies transactions, additional CPUs do not help.

I guess that preserving transaction order is the primary reason for applying transactions on a single thread. How about guaranteeing only the order of transactions that come from the same client? That would let us use multiple threads (one thread per client) to apply transactions, and I think one apply thread per client can still keep the transaction order. The precondition is that a client should not enter object A's synchronization block until all the transactions on A are fully applied in its VM, no matter whether the transactions modify A or not.
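A minimal Java sketch of the one-apply-thread-per-client idea, assuming each transaction can be treated as an opaque `Runnable` keyed by a client ID. `PerClientApplier` and its methods are hypothetical names, not Terracotta's API; the point is only that a single-threaded executor per client preserves that client's order while different clients proceed in parallel.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Terracotta server code: every client (L1) gets its own
// single-threaded executor, so that client's transactions are applied in arrival order,
// while transactions from different clients can be applied in parallel.
class PerClientApplier {
    private final Map<String, ExecutorService> lanes = new ConcurrentHashMap<>();

    // Queue the "apply" work for one transaction on the lane owned by its client.
    void submit(String clientId, Runnable applyTxn) {
        lanes.computeIfAbsent(clientId, id -> Executors.newSingleThreadExecutor())
             .execute(applyTxn);
    }

    // Stop accepting work, then wait (bounded) for every lane to drain.
    void shutdown() throws InterruptedException {
        for (ExecutorService lane : lanes.values()) {
            lane.shutdown();
        }
        for (ExecutorService lane : lanes.values()) {
            lane.awaitTermination(10, TimeUnit.SECONDS);
        }
    }
}

class PerClientApplierDemo {
    public static void main(String[] args) throws InterruptedException {
        PerClientApplier applier = new PerClientApplier();
        Map<String, List<Integer>> applied = new ConcurrentHashMap<>();
        for (int seq = 0; seq < 5; seq++) {
            final int s = seq;
            for (String client : new String[] {"client-a", "client-b"}) {
                applier.submit(client, () ->
                    applied.computeIfAbsent(client, c -> new CopyOnWriteArrayList<>()).add(s));
            }
        }
        applier.shutdown();
        // Each client's list comes out in order: [0, 1, 2, 3, 4]
        System.out.println(applied);
    }
}
```

One design caveat: this creates one thread per connected client and never retires lanes, so a real server would also need lane cleanup on disconnect and some cap on the thread count.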

Am I right or not?
jie

I found a similar question at https://jira.terracotta.org/jira//browse/CDV-100, where Eugene Kuleshov said that "... it seems like we'll have a single thread for applying transactions. Not sure if that won't be a bottle neck on high event volumes." Has anybody thought about this? Or is using multiple threads to apply transactions obviously wrong?
kbhasin
Joined: 12/04/2006 13:08:21
Messages: 284

We should also keep in mind that even though APPLY_CHANGES_STAGE is single threaded, that is not the only thing the Server is doing. There are many other stages and threads doing a lot of work. For the apply stage to be the bottleneck, there would need to be a very high volume of fine-grained transactions.

Have you considered batching the transactions by using more coarse-grained locks?
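To illustrate what batching with coarser locks means in code, here is a plain-Java sketch. It assumes the usual Terracotta DSO model, in which the changes made while holding a clustered lock (for example an autolocked `synchronized` block configured in tc-config.xml) are committed as one transaction. `Orders` and its `lockAcquisitions` counter are hypothetical; the counter only stands in for the number of transactions the server would have to apply.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch. Assumption: under Terracotta DSO these blocks would be autolocked,
// so each outermost synchronized block becomes one clustered transaction.
class Orders {
    private final List<String> items = new ArrayList<>();
    private int lockAcquisitions = 0; // hypothetical stand-in for "transactions committed"

    // Fine-grained: one lock acquisition (assumed: one transaction) per item.
    void addEach(List<String> batch) {
        for (String item : batch) {
            synchronized (this) {
                lockAcquisitions++;
                items.add(item);
            }
        }
    }

    // Coarse-grained: the whole batch is changed under a single lock acquisition.
    void addAll(List<String> batch) {
        synchronized (this) {
            lockAcquisitions++;
            items.addAll(batch);
        }
    }

    synchronized int size() {
        return items.size();
    }

    synchronized int lockAcquisitions() {
        return lockAcquisitions;
    }
}

class OrdersDemo {
    public static void main(String[] args) {
        List<String> batch = List.of("a", "b", "c", "d");
        Orders fine = new Orders();
        fine.addEach(batch);
        Orders coarse = new Orders();
        coarse.addAll(batch);
        System.out.println(fine.lockAcquisitions() + " vs " + coarse.lockAcquisitions()); // 4 vs 1
    }
}
```

Both objects end up with the same four items, but the fine-grained version produces one transaction per item for the server to apply, while the coarse-grained version amortizes that cost over the batch at the price of holding the lock longer.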

Regards,

Kunal Bhasin,
Terracotta, Inc.

Be a part of the Terracotta community: Join now!
jie

Thanks for the reply.

Our system receives thousands of requests from clients per second, and that may grow to 10,000+ requests per second in the near future. Each request produces at least one transaction, no matter how coarse-grained the transaction is. What I worry about is whether the tc-server can handle 10,000+ tps. Unfortunately, I found that the server's throughput depends on the APPLY_CHANGES_STAGE, which runs on a single thread and definitely has an upper limit (10,000 tps on our machine). Also, our system cannot be partitioned into separate subsystems, as in the case described in "Terracotta's Scalability Story" (link: http://www.theserverside.com/tt/articles/article.tss?l=TerracottaScalabilityStory).
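As a rough back-of-the-envelope check, assuming the apply stage is the only limit and every transaction costs the same to apply: a ceiling of 10,000 tps on one thread leaves about 100 µs of apply time per transaction, and k perfectly independent apply threads could at best raise that ceiling k-fold (an idealized bound that ignores every other stage).

```latex
t_{\text{apply}} \le \frac{1\ \text{s}}{10\,000\ \text{transactions}} = 100\ \mu\text{s per transaction},
\qquad
X_{\max}(k) \le k \times 10\,000\ \text{tps}
```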

If it is possible to apply transactions with multiple threads, why doesn't Terracotta do it? Would it break the order of transactions, or would there be other bottlenecks such as "COMMIT_TRANSACTION_STAGE"?

My wish is that the performance of the tc-server scales up with the number of processors, which would, in theory, allow it to support an unlimited number of L1 clients. Is that possible?
kbhasin

Putting a number to TPS (Transactions Per Second) can always be a little misleading, as it really depends on what each transaction is doing. In many use cases, we have seen system-wide TPS orders of magnitude higher than what you are seeing.

Increasing or decreasing the throughput of any one stage can affect the others and hence the overall system throughput. Because Terracotta is a general-purpose solution, we have tuned it to perform well across a wide array of use cases. Of course, in certain use cases it is possible to tune specific areas of the system to better serve that particular use case. You will also be glad to know that we have certain features planned on our roadmap, such as active-active TC Servers, to achieve theoretically unlimited scale.

This sounds like a perfect opportunity for you to engage with our Professional Services team so we can analyze your use case and tune Terracotta to meet your requirements.

Can I invite you to have a phone conversation? Please send me a PM (private message) so we can arrange a good time to talk.

Regards,

Kunal Bhasin,
Terracotta, Inc.

kbhasin

I am also curious to know how you determined that the APPLY_STAGE is the bottleneck. One way to know for sure would be to turn on stage monitoring logging on the server. To do this, add the following properties to a file named tc.properties and drop it in the lib directory of the TC installation:

Code:
 tc.stage.monitor.enabled = true
 tc.stage.monitor.delay = 5000
 


You can then share the terracotta-server.log file so we can analyze it. Basically, a stage that consistently shows a high number of pending transactions could be the bottleneck, provided it does not recover.
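The rule of thumb above (a stage whose depth stays high and does not recover) can be checked mechanically. The sketch below is a hypothetical helper, not a Terracotta tool. It assumes the stage monitor writes a line ending in `Stage Depths` followed by lines ending in `<stage_name> : <depth>`, as in the log excerpt posted later in this thread, and it flags stages that are at or above a chosen threshold in every snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Terracotta tool. Assumes each stage-monitor dump starts with a
// line ending in "Stage Depths", followed by lines ending in "<stage_name> : <depth>".
class StageBacklogDetector {
    // Tail of e.g. "... StageManagerImpl - apply_changes_stage : 127"
    private static final Pattern DEPTH = Pattern.compile(" - (\\w+) : (\\d+)\\s*$");

    // Split the log into snapshots: one map of stage -> queue depth per "Stage Depths" dump.
    static List<Map<String, Integer>> parseSnapshots(List<String> logLines) {
        List<Map<String, Integer>> snapshots = new ArrayList<>();
        Map<String, Integer> current = null;
        for (String line : logLines) {
            if (line.trim().endsWith("Stage Depths")) {
                current = new HashMap<>();
                snapshots.add(current);
                continue;
            }
            Matcher m = DEPTH.matcher(line);
            if (current != null && m.find()) {
                current.put(m.group(1), Integer.parseInt(m.group(2)));
            }
        }
        return snapshots;
    }

    // Stages whose depth is >= threshold in every snapshot, i.e. a backlog that never recovered.
    static TreeSet<String> persistentBacklogs(List<Map<String, Integer>> snapshots, int threshold) {
        TreeSet<String> stuck = new TreeSet<>();
        if (snapshots.isEmpty()) {
            return stuck;
        }
        stuck.addAll(snapshots.get(0).keySet());
        for (Map<String, Integer> snapshot : snapshots) {
            stuck.removeIf(stage -> snapshot.getOrDefault(stage, 0) < threshold);
        }
        return stuck;
    }
}

class StageBacklogDemo {
    public static void main(String[] args) {
        String p = "2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - ";
        List<String> log = List.of(
            p + "Stage Depths",
            p + "apply_changes_stage : 127",
            p + "broadcast_changes_stage : 42",
            // A second dump with invented numbers, for illustration only.
            p + "Stage Depths",
            p + "apply_changes_stage : 130",
            p + "broadcast_changes_stage : 0");
        System.out.println(StageBacklogDetector.persistentBacklogs(
            StageBacklogDetector.parseSnapshots(log), 100)); // [apply_changes_stage]
    }
}
```

Requiring the threshold in every snapshot is what separates a stage that is merely busy at one instant from one that never catches up, which is why a single snapshot is not enough to draw a conclusion.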

We already have certain properties that can be tuned to increase or decrease the number of threads in certain stages (e.g. commit, fault, flush) through tc.properties.

Regards,

Kunal Bhasin,
Terracotta, Inc.

jie

Thanks for your explanation. I agree that the current implementation of the tc-server is well tuned and that refactoring a single stage may not help.

I'm investigating several POJO clustering approaches for our next-generation system, and the investigation will last at least a couple of months. Perhaps it is better to continue this discussion after our team has reached a consensus on this issue. We may well need your professional support and consulting in the near future, but that is still undecided.

By the way, the active-active server is very interesting, and we would be very glad if you published your latest roadmap.

Thanks again.
jie


Actually, I noticed that tc.properties contains some important tuning options and logging switches while I was trying to find the bottleneck. Here is a piece of the server log:

2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - Stage Depths
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - =================================
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - apply_changes_stage : 127
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - apply_complete_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - broadcast_changes_stage : 42
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - channel_life_cycle_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - client_handshake_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - commit_changes_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - gc_result_processing_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - group_events_dispatch_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - hydrate_message_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - jmx_events_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - jmxremote_connect_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - jmxremote_tunnel_stage : 0
2007-12-06 17:58:19,335 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - l2_state_change_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - l2_state_message_handler_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - managed_object_fault_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - managed_object_flush_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - managed_object_request_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - object_id_batch_request_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - object_sync_request_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - object_sync_send_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - objects_sync_dehydrate_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - objects_sync_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - process_transaction_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - recall_objects_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - request_batch_global_transaction_id_sequ : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - request_lock_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - respond_to_lock_request_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - respond_to_request_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - send_managed_object_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - server_transaction_ack_processing_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - transaction_acknowledgement_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - transaction_lookup_stage : 0
2007-12-06 17:58:19,336 [SEDA Stage Monitor] INFO com.tc.async.impl.StageManagerImpl - transaction_relay_stage : 0

On the other hand, the CPU usage never goes beyond 25% on my 8-core machine, no matter how many transactions are committed from clients. It seems that 2 threads are very busy and the others are sleeping or blocked. These 2 threads may be the apply-stage and the commit-stage. Alternatively, the process-transaction-stage may share one CPU with the apply-stage, since they are tightly coupled. And I do not think I/O has reached its upper limit, because the network traffic is far from heavy.
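The aggregate figure alone cannot distinguish between the two explanations: on an 8-core machine, 25% overall utilization is the same amount of CPU time as two fully busy cores, but also as eight cores that are each 25% busy.

```latex
25\% \times 8\ \text{cores} = 2\ \text{cores' worth of CPU time},
\qquad
2 \times 100\% + 6 \times 0\% \;=\; 8 \times 25\% \;=\; 200\%\ \text{of one core}
```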

My point is that a one-apply-thread-per-client architecture scales well with the number of clients, since its throughput scales up with the number of CPUs. Of course, the precondition is that this architecture does not break the transaction order.

Thanks.
kbhasin

How many clients (clustered application nodes) are you running this test with? Is there some locality of reference? I ask because I also see some pending transactions in the BROADCAST stage, which suggests the data could be partitioned further. It is difficult to analyze this from a single snapshot of the logs. Can you provide a log file from a fairly long test run?

I think it will be worthwhile to have a conversation even if you are not ready to engage at a professional support level. I would love to provide some input and best practices so you get the most favorable results from Terracotta during this two-month evaluation.

Regards,

Kunal Bhasin,
Terracotta, Inc.

ssubbiah
Joined: 05/24/2006 14:25:22
Messages: 71

We have been floating the idea of multithreading the apply stage for some time, and it is very similar to the one you are proposing, except that there would be an n:m ratio between clients and apply threads, and transactions from a given client would always be assigned to the same thread, thus maintaining transaction ordering.

We haven't implemented this idea yet, since so far we have never hit a scenario where the apply stage is the bottleneck. Depending on the use case, the bottleneck we hit is the disk, Sleepycat, the network, or lock contention. I can imagine the apply stage being the bottleneck when you make many small transactions to the same set of objects in a loop.
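A minimal sketch of the n:m scheme described above, assuming a fixed pool of m single-threaded lanes and a numeric client ID. `StripedApplyStage` and `laneFor` are hypothetical names, not Terracotta code. Because a client always maps to the same lane, its transactions stay in order, while clients on different lanes are applied in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of an n:m apply stage: m single-threaded lanes shared by n clients.
// A client is always mapped to the same lane, so its transactions stay in order;
// different clients may share a lane or run on different lanes in parallel.
class StripedApplyStage {
    private final List<ExecutorService> lanes = new ArrayList<>();

    StripedApplyStage(int threads) {
        for (int i = 0; i < threads; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    // Stable client -> lane mapping; floorMod keeps the index non-negative.
    int laneFor(long clientId) {
        return (int) Math.floorMod(clientId, (long) lanes.size());
    }

    void submit(long clientId, Runnable applyTxn) {
        lanes.get(laneFor(clientId)).execute(applyTxn);
    }

    // Stop accepting work, then wait (bounded) for all lanes to drain.
    void shutdown() throws InterruptedException {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        for (ExecutorService lane : lanes) {
            lane.awaitTermination(10, TimeUnit.SECONDS);
        }
    }
}

class StripedApplyDemo {
    public static void main(String[] args) throws InterruptedException {
        StripedApplyStage stage = new StripedApplyStage(4);
        for (long client = 0; client < 10; client++) {
            System.out.println("client " + client + " -> lane " + stage.laneFor(client));
        }
        stage.shutdown();
    }
}
```

Unlike one thread per client, the thread count stays fixed as clients come and go; the trade-off is that a busy client can delay other clients that map to the same lane.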


jie wrote:
On the other hand, the CPU usage never goes beyond 25% on my 8-core machine, no matter how many transactions are committed from clients.

Are you looking at the overall CPU usage? Can you look at the usage of each individual CPU? You can use nmon or a similar tool for that. It will be interesting to see whether one or two cores are pinned at 100% usage, in which case your use case may benefit from multithreading the apply stage.
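nmon shows per-core load; to see which Java threads are responsible, per-thread CPU time can be sampled through the standard `java.lang.management.ThreadMXBean`. This sketch only sees the JVM it runs in, so using it against the tc-server would require running it inside that process or adapting it to a remote JMX connection (both assumptions, not shown here). Whether the server's worker threads are named after their stages is also an assumption; the log excerpt above only shows a thread named `SEDA Stage Monitor`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch: samples per-thread CPU time in the JVM this code runs in, over a short interval,
// and reports each thread's CPU use as a percentage of one core (100% = one core pinned).
class HotThreads {
    static final class Usage {
        final String name;
        final double percentOfOneCore;

        Usage(String name, double percentOfOneCore) {
            this.name = name;
            this.percentOfOneCore = percentOfOneCore;
        }

        @Override
        public String toString() {
            return String.format("%-40s %6.1f%%", name, percentOfOneCore);
        }
    }

    static List<Usage> sample(long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> result = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot report per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // nanoseconds, or -1 if unavailable
        }
        long start = System.nanoTime();
        Thread.sleep(intervalMillis);
        long elapsed = System.nanoTime() - start;
        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            ThreadInfo info = mx.getThreadInfo(ids[i]);
            if (before[i] < 0 || after < 0 || info == null) {
                continue; // thread died or CPU time unavailable
            }
            result.add(new Usage(info.getThreadName(), 100.0 * (after - before[i]) / elapsed));
        }
        // Busiest threads first.
        result.sort((a, b) -> Double.compare(b.percentOfOneCore, a.percentOfOneCore));
        return result;
    }
}

class HotThreadsDemo {
    public static void main(String[] args) throws InterruptedException {
        // A deliberately busy thread, so at least one entry should be near 100%.
        Thread busy = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x++;
            }
        }, "busy-worker");
        busy.setDaemon(true);
        busy.start();
        for (HotThreads.Usage u : HotThreads.sample(500)) {
            System.out.println(u);
        }
        busy.interrupt();
    }
}
```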

One thing to note is that even in non-persistent mode you may be hitting the disk if your data doesn't fit in memory, so the disk may be the bottleneck too. Once again, nmon is a great tool for looking at your disk throughput.



jie wrote:
The precondition is that a client should not enter object A's synchronization block until all the transactions on A are fully applied in its VM, no matter whether the transactions modify A or not.

This is exactly what is happening in the client right now.
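A hypothetical illustration of that rule, not Terracotta's client implementation: the client counts received-but-not-yet-applied transactions per lock, and a thread that wants to enter the synchronized block guarded by that lock first waits until the count reaches zero. Keying by lock name (rather than by object) is an assumption that captures "no matter whether the transactions modify A or not"; `ApplyGate` and its methods are invented names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Terracotta's client code: counts received-but-unapplied
// transactions per lock; a thread entering that lock's synchronized block waits for zero.
class ApplyGate {
    private final Map<String, Integer> pending = new HashMap<>();

    // A transaction that was committed under `lockName` has arrived from the server.
    synchronized void received(String lockName) {
        pending.merge(lockName, 1, Integer::sum);
    }

    // That transaction has been fully applied to the local heap.
    synchronized void applied(String lockName) {
        int left = pending.getOrDefault(lockName, 0) - 1;
        if (left > 0) {
            pending.put(lockName, left);
        } else {
            pending.remove(lockName);
        }
        notifyAll(); // wake threads waiting in awaitFullyApplied
    }

    // Called before entering the synchronized block guarded by `lockName`.
    synchronized void awaitFullyApplied(String lockName) throws InterruptedException {
        while (pending.getOrDefault(lockName, 0) > 0) {
            wait();
        }
    }

    synchronized int pendingFor(String lockName) {
        return pending.getOrDefault(lockName, 0);
    }
}

class ApplyGateDemo {
    public static void main(String[] args) throws InterruptedException {
        ApplyGate gate = new ApplyGate();
        gate.received("A");
        gate.received("A");
        Thread applier = new Thread(() -> {
            gate.applied("A");
            gate.applied("A");
        });
        applier.start();
        gate.awaitFullyApplied("A"); // returns only after both transactions on "A" are applied
        System.out.println("pending on A: " + gate.pendingFor("A")); // pending on A: 0
        applier.join();
    }
}
```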

If you think you have a strong use case for multithreading the apply stage, please share your app with us if possible. We will be glad to look at it.

thanks,
Saravanan


Saravanan Subbiah
Terracotta Engineer
jie

Thanks for all the explanations. Just as you said, my test keeps committing small transactions on a small set of objects in an infinite loop. I ran that extreme test to find the upper limit of Terracotta's throughput. Although normal applications would not behave like that, it does expose the theoretical best-case performance of Terracotta as a POJO clustering solution, right?

I agree that the bottleneck in a real system probably lies in the disk or network, but wherever the bottleneck is, the tc-server will have an upper limit unless the server itself scales out. Maybe the active-active servers that kbhasin mentioned can do this? I am really looking forward to unlimited throughput.

Either way, I must admit Terracotta is very cool: "no API", "no serialization", "cluster-wide transactions", "LRU/LFU expiration", and so on. I hope our team will eventually decide to use Terracotta.
 