April 29, 2022
Slowly but steadily, Lightning is maturing. The system has come a long way, but there is still lots of work to be done. With nodes being operated in an increasingly professional way, priorities shift. We’ve moved past the initial goal of just being able to complete a payment and raised our expectations in certain areas.
One such area is reliability. If Lightning is to serve the world, it needs to become more reliable than it currently is.
In this blog post we will zoom in on one requirement for a reliable Lightning operation: fail-over for incoming payments.
Lightning invoices currently always contain a single destination node for the payment. This means that the specific destination node must be running and connected to the network in order to receive the payment. If that node is down, the invoice becomes unusable: a bad experience for the payer, and possibly missed revenue for the payee.
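To make this concrete: decoding an invoice with lnd's zpay32 library exposes that single destination key. A minimal sketch (the invoice string is a placeholder):

```go
package main

import (
	"fmt"
	"log"

	"github.com/btcsuite/btcd/chaincfg"
	"github.com/lightningnetwork/lnd/zpay32"
)

func main() {
	// A BOLT 11 invoice commits to exactly one destination node key.
	bolt11 := "lnbc1..." // placeholder; substitute a real invoice

	inv, err := zpay32.Decode(bolt11, &chaincfg.MainNetParams)
	if err != nil {
		log.Fatal(err)
	}

	// If the node behind this key is unreachable, the invoice cannot
	// be settled until it expires.
	fmt.Printf("destination: %x\n", inv.Destination.SerializeCompressed())
}
```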
A typical solution is to set up multiple nodes. If invoice expiry durations are short and downtime is planned, it is possible to gracefully spin down a node by waiting for all invoices generated by that node to reach a final state. During that time, new invoices should only be generated on the remaining nodes.
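As an illustration, such a drain procedure could look like the sketch below, with the node-specific operations left as hypothetical callbacks:

```go
package drain

import (
	"context"
	"time"
)

// drainNode is a sketch of a graceful spin-down. The callbacks are
// hypothetical: stopIssuing disables new invoice creation on the node,
// openInvoices reports how many of its invoices have not yet reached a
// final state, and shutdown stops the node.
func drainNode(ctx context.Context, stopIssuing func(),
	openInvoices func() int, shutdown func() error) error {

	// From now on, new invoices are generated on the remaining nodes only.
	stopIssuing()

	// Wait for every outstanding invoice to settle, cancel or expire.
	for openInvoices() > 0 {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(10 * time.Second):
		}
	}

	return shutdown()
}
```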
This, however, does not work for unexpected downtime. An invoice that has already been passed on to a third party becomes impossible to settle.
Over the past few months, we have worked towards a solution for this problem. We have developed what we call the Lightning Multiplexer. The name is borrowed from the electronics domain. Even though a traditional multiplexer works somewhat differently, we felt the analogy was close enough to justify the name.
Multiplexer is a massively stripped-down version of a full Lightning node, almost to the point that there is nothing left. It has no channels, no external network connections, and it does not partake in P2P gossip. It does have its own node key for signing invoices, and it is able to decode onion packets.
The intended setup is a single instance of Multiplexer in conjunction with a number of full nodes.
Senders direct their payments to Multiplexer, addressed by its node key, routing through any of the full nodes. When one of the full nodes detects a payment that is to be forwarded to Multiplexer, an alternative processing flow is entered. Instead of trying to look up the outgoing channel (which doesn't exist), the routing node contacts Multiplexer directly to obtain the preimage for the HTLC and then settles it immediately. The payment is short-circuited, with the final hop being no more than a call to obtain the preimage.
If one of the full nodes goes down, senders will keep looking for alternative routes to reach Multiplexer. If full node A is down, they will try to reach Multiplexer via full node B. Full node B is also aware of Multiplexer and is therefore able to settle the payment too. This is what realises fail-over behaviour.
Because Multiplexer has no channels, it is impossible to broadcast its location in the network through Lightning P2P gossip. Senders will not know how to reach it. This is why Multiplexer invoices include route hints. Route hints describe a route from one or more publicly known nodes to the destination. In the case of the example above, there will be two route hints. One hinting a path from full node A to Multiplexer and another one hinting from full node B to Multiplexer.
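As a sketch, those hints could be built with lnd's zpay32 types along the following lines. The channel ids are virtual placeholders, since no real channel to Multiplexer exists, and the fee and CLTV values shown are illustrative only:

```go
package hints

import (
	"github.com/btcsuite/btcd/btcec/v2"
	"github.com/lightningnetwork/lnd/zpay32"
)

// lastHop describes one hinted final hop towards Multiplexer.
type lastHop struct {
	nodeKey       *btcec.PublicKey // publicly known full node
	virtualChanID uint64           // placeholder id; no real channel exists
}

// routeHints builds one route hint per full node. Each hint describes a
// single-hop path from that public node to the (private) Multiplexer.
func routeHints(hops []lastHop) [][]zpay32.HopHint {
	var hints [][]zpay32.HopHint
	for _, h := range hops {
		hints = append(hints, []zpay32.HopHint{{
			NodeID:                    h.nodeKey,
			ChannelID:                 h.virtualChanID,
			FeeBaseMSat:               0,
			FeeProportionalMillionths: 0,
			CLTVExpiryDelta:           40,
		}})
	}
	return hints
}
```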
It is worth noting that senders are not able to distinguish Multiplexer from any other private node.
Multiplexer takes over the invoice handling logic from LND completely and it has its own Postgres-based invoice database where it keeps track of the state of invoices. The invoice databases of the LND instances are no longer used.
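Purely as an illustration of the kind of state involved, a table along these lines could back that database; the actual schema is in the repository:

```go
package db

// Hypothetical sketch of invoice state tracked in Postgres; column
// names and types are illustrative, not Multiplexer's real schema.
const schema = `
CREATE TABLE IF NOT EXISTS invoices (
    payment_hash     BYTEA PRIMARY KEY,
    payment_preimage BYTEA NOT NULL,
    amount_msat      BIGINT NOT NULL,
    expires_at       TIMESTAMPTZ NOT NULL,
    settled_at       TIMESTAMPTZ
);`
```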
Invoices are solely created by Multiplexer and signed with its node key. The LND nodes are not involved in the creation process, other than their node keys being listed in the route hints.
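A minimal sketch of that creation path, again using lnd's zpay32 library. The signCompact callback is a hypothetical stand-in for producing a 65-byte recoverable ECDSA signature with the Multiplexer node key:

```go
package invoice

import (
	"time"

	"github.com/btcsuite/btcd/chaincfg"
	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/lightningnetwork/lnd/lnwire"
	"github.com/lightningnetwork/lnd/zpay32"
)

// createInvoice sketches how Multiplexer could assemble and sign an
// invoice. hints are the per-full-node route hints; the full nodes
// themselves play no part in signing.
func createInvoice(paymentHash [32]byte, amt lnwire.MilliSatoshi,
	hints [][]zpay32.HopHint,
	signCompact func(hash []byte) ([]byte, error)) (string, error) {

	opts := []func(*zpay32.Invoice){
		zpay32.Amount(amt),
		zpay32.Description("example payment"),
		zpay32.Expiry(15 * time.Minute),
	}
	// One route hint per full node, so senders can find Multiplexer.
	for _, hint := range hints {
		opts = append(opts, zpay32.RouteHint(hint))
	}

	inv, err := zpay32.NewInvoice(
		&chaincfg.MainNetParams, paymentHash, time.Now(), opts...,
	)
	if err != nil {
		return "", err
	}

	// BOLT 11 signs the SHA-256 hash of the human-readable part plus
	// the tagged fields, using the Multiplexer node key.
	return inv.Encode(zpay32.MessageSigner{
		SignCompact: func(msg []byte) ([]byte, error) {
			return signCompact(chainhash.HashB(msg))
		},
	})
}
```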
For the settlement of invoices, Multiplexer interfaces with LND on the HTLC level through the HTLC interceptor API. HTLCs that come in via the interceptor stream are inspected by Multiplexer. If an HTLC pays to a known invoice, a settle message containing the HTLC preimage is sent back to the node that accepted the HTLC. At that point, Multiplexer marks the invoice as settled in its database.
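Below is a minimal sketch of that loop from the Multiplexer side, using LND's routerrpc API. lookupPreimage is a hypothetical stand-in for the invoice database lookup; multi-part handling, described next, adds a step in between:

```go
package intercept

import (
	"context"

	"github.com/lightningnetwork/lnd/lnrpc/routerrpc"
	"google.golang.org/grpc"
)

// interceptLoop handles one full node's HTLC interceptor stream.
func interceptLoop(ctx context.Context, conn *grpc.ClientConn,
	lookupPreimage func(paymentHash []byte) ([]byte, bool)) error {

	stream, err := routerrpc.NewRouterClient(conn).HtlcInterceptor(ctx)
	if err != nil {
		return err
	}

	for {
		htlc, err := stream.Recv()
		if err != nil {
			return err
		}

		// By default, hand the HTLC back for normal forwarding.
		resp := &routerrpc.ForwardHtlcInterceptResponse{
			IncomingCircuitKey: htlc.IncomingCircuitKey,
			Action:             routerrpc.ResolveHoldForwardAction_RESUME,
		}

		// If the HTLC pays to a known invoice, settle it on the
		// spot by returning the preimage to the accepting node.
		if preimage, ok := lookupPreimage(htlc.PaymentHash); ok {
			resp.Action = routerrpc.ResolveHoldForwardAction_SETTLE
			resp.Preimage = preimage
		}

		if err := stream.Send(resp); err != nil {
			return err
		}
	}
}
```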
This mechanism works for multi-part payments too, even when the parts come in through different full nodes. Multi-part, multi-node payments allow the sender to utilise the full liquidity across all nodes for a single payment, improving reliability along a different dimension. More generally, and this also applies to single-part payments, senders can choose a route through whichever node is best for them, for example to minimise the routing fee.
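A sketch of how such a set of parts could be tracked, with hypothetical types; a real implementation also has to deal with timeouts and HTLC expiry:

```go
package intercept

import "github.com/lightningnetwork/lnd/lnrpc/routerrpc"

// part is one HTLC belonging to a multi-part payment. Parts may arrive
// via the interceptor streams of different full nodes.
type part struct {
	node       string                // full node that accepted the HTLC
	circuitKey *routerrpc.CircuitKey // identifies the HTLC on that node
	amtMsat    uint64
}

// paymentSet accumulates the parts paying to a single invoice.
type paymentSet struct {
	invoiceAmtMsat uint64
	parts          []part
	totalMsat      uint64
}

// addPart records an incoming part and reports whether the set is now
// complete. Only then is each part settled on the node where it arrived,
// and the invoice marked as settled in the database.
func (s *paymentSet) addPart(p part) (complete bool) {
	s.parts = append(s.parts, p)
	s.totalMsat += p.amtMsat
	return s.totalMsat >= s.invoiceAmtMsat
}
```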
It is critically important that every HTLC is always passed on to Multiplexer for handling. This applies to both the onchain and the offchain resolution paths. Inconsistent behaviour can lead to loss of funds.
The existing API implementation in LND did not offer strong enough guarantees for this. Therefore we’ve upstreamed several PRs to improve and extend the interceptor API to make it safe to use with Multiplexer:
htlcswitch: add htlc interceptor failure control #6177
htlcswitch: add an always on mode to htlc interceptor #6232
htlcswitch: interceptor expiry check #6212
contractcourt: onchain htlc interceptor #6219
In the future, support for node implementations other than LND may be added. Using a mix of implementations further increases the resiliency of the system.
In the setup described above, a full node is no longer a single point of failure for a payment. With Multiplexer, though, a new single point of failure is introduced. So did we really gain anything? We think the answer is yes.
The Multiplexer database itself can be replicated, so we don’t count it as being an additional single point of failure.
The logic contained in Multiplexer is limited. It tracks incoming HTLCs in memory, and when a set of HTLCs is complete, the corresponding invoice in the database is marked as settled. There is far less that can go wrong compared to all the failure modes of a full Lightning node. If Multiplexer runs on an orchestration platform such as Kubernetes and crashes, a new instance can be brought up automatically with minimal downtime.
One scenario that may be problematic is a bug in Multiplexer from which it cannot recover. In a more distributed setup, that bug may be present too, but may not get triggered on all instances. For the moment, we consider this an acceptable risk. But if needed, the code can be extended to support multiple instances.
We are open-sourcing the code for Multiplexer, as we believe that we all have an obligation to do the best we can to further the development of the Lightning network.
Code and documentation can be found in our repository. Please note that this is a work in progress and should be considered experimental.
Feedback, testing or other comments are more than welcome. If you are a developer and interested in getting involved more deeply in our open-source efforts, then check out our dedicated hiring page.
Multiplexer is one of a series of upgrades that we want to introduce to the Lightning community. We are committed to making Lightning the payment rail of the future. To do this, we must work together to ready the network for the next phase of real-world adoption.