
Testing Lightning Node Performance

Joost Jager

March 30, 2021

Update 2021-04-01: Added Eclair result.

Introduction

Well-known Lightning community member Alex Bosworth recently opened a poll on Twitter:

“Why is it that we can't publish the current transactions per second statistic on the Lightning Network?”

The most common answer:

“Just too much privacy”

We may not be able to provide TPS reports for the network as a whole, but it sure is possible to measure the capability of node implementations.

Why is this relevant? Are we hitting limits yet? Maybe not, but one of the distinguishing features of the Lightning Network is its ability to handle micropayments. A popular use of micropayments is the streaming of sats from listeners to podcasters. If these types of streaming micropayment applications take off in a big way, transaction rates are likely to go up dramatically.

To give an example: a weekly podcast is listened to by 500,000 people. These people stream sats by the minute. The podcast length is 50 minutes. This makes the total number of payments for one episode 25 million. There are roughly 600,000 seconds in a week. If the listening is completely evenly distributed throughout the week, this would work out to ~40 transactions per second. In reality, the peak load right after the release of the episode is much higher. And this is just the figure for a single podcast.
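The arithmetic behind this example can be checked with a few lines of Python. The inputs are the figures from the paragraph above; the only refinement is that the exact number of seconds in a week (604,800) is used instead of the rounded 600,000.

```python
# Worked version of the podcast example; all inputs come from the text above.
listeners = 500_000
episode_minutes = 50                    # one payment per listener per minute
seconds_per_week = 7 * 24 * 60 * 60     # 604,800 ("roughly 600,000")

payments_per_episode = listeners * episode_minutes
average_tps = payments_per_episode / seconds_per_week

print(f"payments per episode: {payments_per_episode:,}")
print(f"average TPS over one week: {average_tps:.1f}")
```

This gives an average of about 41 payments per second, in line with the ~40 quoted above. It is an average only; it says nothing about the peak right after release.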

It is not always easy to improve the performance of an existing system. Generally speaking, systems fare better if performance is taken into account early on. As we are considering building streaming payments functionality into our payment infrastructure, we decided to dive deeper into this subject while there is still time. Where are we at with node performance, and are we ready to serve the world? We’ve put several different node configurations to the test and are sharing the results with you in this blog post.

Test setup

There are endless ways to test a node implementation. We kept it simple by using just two nodes that share a set of channels and measuring transaction throughput. To maximise parallelism, multiple worker processes are running. Each worker continually requests invoices from the receiver node and then instructs the sender node to pay those invoices. We also tested invoiceless keysend payments, because keysend (or its successor AMP) is likely to become a building block for streaming money. In this case, the worker process skips requesting the invoice.
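The worker loop described above can be sketched roughly as follows. This is not the code from our repository: `FakeReceiver`, `FakeSender`, `worker` and `run_test` are hypothetical names, the nodes are in-memory stand-ins rather than real Lightning nodes, and threads are used where the actual test uses separate worker processes. It only illustrates the shape of the loop and how a throughput figure falls out of it.

```python
import threading
import time
import uuid
from typing import Tuple


class FakeReceiver:
    """Hypothetical stand-in for the receiving node's invoice API."""

    def add_invoice(self, amount_sat: int) -> str:
        return f"invoice-{uuid.uuid4().hex}-{amount_sat}"


class FakeSender:
    """Hypothetical stand-in for the sending node's payment API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.paid = 0

    def pay_invoice(self, invoice: str) -> None:
        with self._lock:
            self.paid += 1

    def keysend(self, amount_sat: int) -> None:
        with self._lock:
            self.paid += 1


def worker(sender: FakeSender, receiver: FakeReceiver,
           payments: int, keysend: bool) -> None:
    for _ in range(payments):
        if keysend:
            # Invoiceless payment: no round trip to the receiver first.
            sender.keysend(1)
        else:
            invoice = receiver.add_invoice(1)
            sender.pay_invoice(invoice)


def run_test(workers: int, payments_per_worker: int,
             keysend: bool = False) -> Tuple[int, float]:
    """Run all workers in parallel; return (payments completed, TPS)."""
    sender, receiver = FakeSender(), FakeReceiver()
    threads = [
        threading.Thread(target=worker,
                         args=(sender, receiver, payments_per_worker, keysend))
        for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = max(time.perf_counter() - start, 1e-9)
    return sender.paid, sender.paid / elapsed


if __name__ == "__main__":
    total, tps = run_test(workers=4, payments_per_worker=1000)
    print(f"{total} payments, {tps:.0f} per second (in-memory stand-ins only)")
```

The numbers this sketch prints say nothing about real node performance, because the stand-ins do no cryptography and no disk writes.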

[Figure: test setup]

The test is made up of Docker containers that are orchestrated via docker-compose, making it easy to run anywhere and experiment with different hardware and software configurations.

Both node containers run on the same machine and disk. This makes the test less realistic, but we still consider the results indicative of performance.

For more details on the test setup, see our GitHub repository.

Results

The test was executed on the following machine:

  • Google Cloud n2d-standard-8 instance (8 vCPUs, 32 GB memory)

  • 100 GB zonal pd-ssd with ext4 filesystem

  • Ubuntu 20.04 LTS

The table below shows the test results for various configurations. The values shown were obtained after completing 10,000 payments.

Configuration                            Transactions per second
LND (bbolt)                              33
LND (bbolt, keysend payments)            35
LND (single-instance etcd)               4
LND (etcd cluster)                       4
c-lightning (sqlite, single channel)     61
eclair (sqlite)                          12

For the LND+bbolt+keysend configuration, we let the test run until it reached 5,000,000 payments. This run shows that the transaction rate declines as the total number of completed payments increases. An hourly dip in throughput is also noticeable.

[Figure: TPS chart]

Another factor that comes into play with large payment volumes is disk space usage. For the configuration above, we ended up with the following database sizes on disk:

Node              Total size    Size per payment
Sending node      64.2 GB       12.8 KB
Receiving node    13.8 GB       2.8 KB

Note that this is the required disk space for just 1/5 of the payments that are generated for one episode of the popular podcast in the example above.
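The per-payment figures and the comparison with the podcast example can be reproduced as below. Two assumptions are ours and not measured: decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), and linear growth of the database when extrapolating from 5 million to 25 million payments. The function names are purely illustrative.

```python
GB = 10**9   # decimal units assumed
KB = 10**3

MEASURED_PAYMENTS = 5_000_000
EPISODE_PAYMENTS = 25_000_000           # from the podcast example
DB_SIZE_GB = {"sending": 64.2, "receiving": 13.8}


def per_payment_kb(size_gb: float, payments: int) -> float:
    """Average database bytes per payment, expressed in KB."""
    return size_gb * GB / payments / KB


def extrapolated_gb(size_gb: float, payments: int, target: int) -> float:
    """Naive linear extrapolation; real growth may well differ."""
    return size_gb * target / payments


for node, size_gb in DB_SIZE_GB.items():
    kb = per_payment_kb(size_gb, MEASURED_PAYMENTS)
    full = extrapolated_gb(size_gb, MEASURED_PAYMENTS, EPISODE_PAYMENTS)
    print(f"{node} node: {kb:.1f} KB per payment, "
          f"~{full:.0f} GB for one episode (linear assumption)")
```

Under the linear assumption, a single episode would need on the order of 320 GB on the sending node and 70 GB on the receiving node.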

For zooming in on disk space usage, Oliver Gugger's fork of boltbrowser is instrumental. It shows the allocated size for the various categories of data. In the case of the receiving node, the tool shows about 4 GB for invoice data and 8 GB for channel state.

There is always the possibility that we’ve implemented some part of the test in a sub-optimal way. If that is the case, please let us know how we can improve the numbers, or submit a PR directly to the GitHub repository. We are also happy to receive PRs that extend the test to other implementations.

Bottlenecks

To get an idea of potential bottlenecks, a CPU profile can be pulled from the node containers while the test is running. Below is a flame graph for a receiving LND (bbolt+keysend) node.

[Figure: flame graph]

There is a lot of complexity in a Lightning implementation, but when it comes down to performance this graph shows that there are only two high-impact operations (marked in purple):

  • Database access

  • Cryptographic operations

Optimising the use of cryptographic operations and minimising database access are obvious candidates for performance gains. With database access in particular it is important to be cognisant of the impact that various styles of locking have. Especially in scenarios with high degrees of parallelism, minimising unrelated operations within the context of an exclusive database lock can be a big performance boost.
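The locking point can be illustrated with a minimal sketch. It is not taken from any node implementation: `db`, `db_lock`, `store_slow` and `store_fast` are hypothetical names, a dictionary stands in for the database, and a SHA-256 hash stands in for the more expensive cryptographic operations a node performs.

```python
import hashlib
import threading

db_lock = threading.Lock()   # stands in for an exclusive database lock
db = {}                      # stands in for the database itself


def store_slow(key: str, payload: bytes) -> None:
    """Hashes while holding the lock, so other writers wait for it."""
    with db_lock:
        digest = hashlib.sha256(payload).digest()   # unrelated to the write
        db[key] = digest


def store_fast(key: str, payload: bytes) -> None:
    """Hashes first, then holds the lock only for the write itself."""
    digest = hashlib.sha256(payload).digest()
    with db_lock:
        db[key] = digest


if __name__ == "__main__":
    payload = b"x" * 1_000_000
    threads = [
        threading.Thread(target=store_fast, args=(f"payment-{i}", payload))
        for i in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"stored {len(db)} entries")
```

Both functions store the same data. The difference is how long each writer holds the exclusive lock, which is what limits throughput when many payments are processed in parallel.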

Conclusion

A first cautious and debatable conclusion from the test results is that the tested configurations may not be ready yet to handle streaming payments for the masses.

To facilitate higher transaction rates, scaling out is a possibility. But there is a limit to the number of instances that anyone wants to run. Costs increase and multiple instances may also lower the efficiency of channel operations on-chain. Any performance improvement on the instance level is therefore more than welcome.

This also raises the question of what throughput an optimal implementation is able to achieve. And can further scaling beyond that limit be realised via changes on the protocol level, for example by using lighter cryptography and/or loosening the requirements for disk writes?

Let us know your thoughts on this!

2021 Bottle Pay. All rights reserved.