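Solana has announced a three-part mitigation plan to improve the stability and resilience of the network: replacing UDP with QUIC for transaction forwarding, adding stake-weighted Quality of Service (QoS), and introducing a fee market with priority fees.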
These measures target the heavy traffic responsible for two of the three recent incidents. Although the proposed changes may seem abstract or deeply technical to much of the community, the concepts are not new; they are borrowed from other, mature systems. In this article, we break down the technicalities and explain them in simple terms.
The current Solana validator client (v1.10) already lays the groundwork for some of these improvements, which will be iterated on until they perform well in production. Fee prioritization is targeted for the v1.11 release, according to the official announcement.
Solana has so far used the User Datagram Protocol (UDP) to transmit transactions between nodes in the network. Nodes send transactions over UDP directly to the leader (the staked node responsible for proposing the block in a given slot) without first establishing a connection. UDP provides neither congestion control nor delivery confirmation. When the network is congested, the leader cannot handle the volume of incoming traffic, so some packets get dropped. Even at quiet times, some packet loss is normal. By sending the same transaction multiple times, users increase the chance that at least one copy arrives.
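To make the resending trade-off concrete, here is a small Python sketch. It assumes every copy is lost independently with the same probability, which is optimistic: under real congestion, losses are correlated and extra copies make congestion worse. The 30% loss rate and the traffic figure are purely illustrative. Each extra copy raises the chance of delivery, but it also multiplies the load the leader has to absorb.

```python
def delivery_probability(loss_rate: float, copies: int) -> float:
    """Chance that at least one of `copies` independent sends arrives."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies are lost with probability loss_rate ** copies
    # (assumes losses are independent, which is optimistic under congestion).
    return 1.0 - loss_rate ** copies


def offered_load(tx_per_second: int, copies: int) -> int:
    """Packets per second reaching the leader when each transaction is sent `copies` times."""
    return tx_per_second * copies


# Illustrative 30% loss rate and 1,000 distinct transactions per second.
for n in (1, 2, 5):
    print(n, round(delivery_probability(0.3, n), 4), offered_load(1_000, n))
```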
In contrast to UDP, the Transmission Control Protocol (TCP) offers more sophisticated features, but to provide them it requires a session, i.e. a connection established between the client and the server before data is sent. The receiver acknowledges (“acks”) packets, which lets the sender know when to slow down under heavy traffic. TCP also retransmits lost packets: when the sender stops receiving acks, it assumes something was lost, resends the missing data, and reduces its sending rate.
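The “slow down when acks stop” behavior can be sketched as additive-increase/multiplicative-decrease (AIMD), the classic idea behind TCP congestion control. This is a deliberately simplified Python model (one step per round trip, no slow start, timeouts, or fast retransmit), not how any particular TCP implementation behaves.

```python
from __future__ import annotations


def aimd_window(acks: list[bool], initial: float = 1.0, increase: float = 1.0,
                decrease: float = 0.5, minimum: float = 1.0) -> list[float]:
    """Trace a simplified AIMD congestion window, one entry per round trip.

    True means acks arrived for that round; False means they stopped,
    which the sender treats as a loss.
    """
    window = initial
    trace = []
    for acked in acks:
        if acked:
            window += increase  # additive increase: probe for more bandwidth
        else:
            window = max(minimum, window * decrease)  # multiplicative decrease: back off
        trace.append(window)
    return trace


print(aimd_window([True, True, True, False, True]))
```

The window grows steadily while acks keep coming and is cut in half as soon as they stop, which is the backpressure UDP lacks.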
TCP is not ideal for every use case, though. In particular, it delivers all traffic in order: if one piece of data is lost, everything after it must wait until it is retransmitted (a problem known as head-of-line blocking). That is a poor fit for Solana transactions, which are independent of each other.
QUIC is an alternative to TCP with similar features, such as a session that enables backpressure to slow the sender down, but it also supports separate streams within a single connection. If the packet carrying one transaction is dropped, it does not block the remaining ones.
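A toy Python comparison of the two delivery models, assuming each number is one transaction and a single packet is lost. Retransmission is left out, so the output is what the receiver can use before the lost packet is resent. It illustrates head-of-line blocking (later data waiting behind a lost packet), not either protocol's wire format.

```python
from __future__ import annotations


def deliver_in_order(sent: list[int], lost: set[int]) -> list[int]:
    """Single ordered stream (TCP-like): nothing past a gap reaches the application."""
    delivered = []
    for seq in sent:
        if seq in lost:
            break  # later packets wait until the lost one is retransmitted
        delivered.append(seq)
    return delivered


def deliver_per_stream(sent: list[int], lost: set[int]) -> list[int]:
    """One stream per transaction (QUIC-like): a loss stalls only its own stream."""
    return [seq for seq in sent if seq not in lost]


transactions = [1, 2, 3, 4, 5]
print(deliver_in_order(transactions, lost={2}))
print(deliver_per_stream(transactions, lost={2}))
```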
Solana is a permissionless network. Anyone running a Solana client is a “node” in the network and can send messages to the leader. Nodes can operate as validators (signing and sending votes) and/or expose their RPC (Remote Procedure Call) interface to receive messages from applications such as wallets and DEXs and forward them to the leader.
The leader listens on a UDP port, while RPC nodes listen on a TCP port. Because the leader schedule is public, sophisticated players running algorithmic strategies (“bots”) can send transactions directly to the leader, bypassing RPC nodes that would only add latency. When the leader is spammed this way, the network becomes congested and performance deteriorates. To address this, the UDP port the leader uses to receive transactions will be replaced by a QUIC port.
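Because the schedule is public, anyone can work out who the next leaders are. The Python sketch below assumes a schedule in the shape returned by Solana's `getLeaderSchedule` JSON-RPC method (validator identity mapped to slot indices relative to the first slot of the epoch). The identities and slot numbers are made up, and fetching the data over RPC is left out.

```python
from __future__ import annotations


def upcoming_leaders(schedule: dict[str, list[int]], epoch_start_slot: int,
                     current_slot: int, count: int) -> list[tuple[int, str]]:
    """Return (slot, leader identity) for `count` slots starting at `current_slot`."""
    # Invert the schedule: absolute slot -> leader identity.
    slot_to_leader: dict[int, str] = {}
    for identity, offsets in schedule.items():
        for offset in offsets:
            slot_to_leader[epoch_start_slot + offset] = identity
    upcoming = []
    for slot in range(current_slot, current_slot + count):
        leader = slot_to_leader.get(slot)
        if leader is not None:  # slots outside the known schedule are skipped
            upcoming.append((slot, leader))
    return upcoming


# Made-up schedule for an epoch starting at slot 1_000.
schedule = {
    "ValidatorA": [0, 1, 2, 3, 8, 9, 10, 11],
    "ValidatorB": [4, 5, 6, 7],
}
print(upcoming_leaders(schedule, epoch_start_slot=1_000, current_slot=1_002, count=4))
```

A bot that knows the next few leaders can aim its transactions at them directly, which is exactly the traffic pattern described above.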
Quality of Service (“QoS”) is the practice of prioritizing certain types of traffic when there is more traffic than the network can handle.
Last January, after automated trading strategies (aka “liquidator bots”) degraded Solana's performance by spamming the network with more than 2 million packets per second, mostly duplicate messages, Anatoly Yakovenko announced in a tweet that they would bring the QoS concept to Solana.
Currently, the leader tries to process transactions as soon as they arrive. Because IP addresses are verifiable through QUIC, validators will be able to prioritize and limit traffic from specific connections. Instead of validators and RPC nodes blasting transactions at the leader as fast as they can, effectively DoS’ing it, they would each hold a persistent QUIC connection. When the network gets congested, it will be possible to identify connections carrying large amounts of traffic and apply policies to them, such as limiting how many messages a node can send (“throttling”). These policies are known as QoS.
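Throttling a connection is commonly done with a token bucket: each sender earns tokens at a fixed rate up to a burst cap, and each packet spends one. The Python sketch below keys buckets by peer address; the rates, the keying, and the class names are hypothetical and are not taken from Solana's QoS code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float        # tokens (packets) earned per second
    capacity: float    # maximum burst size
    tokens: float      # tokens currently available
    last: float        # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last packet, up to the cap.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop or deprioritize this packet


class ConnectionThrottle:
    """Hypothetical per-connection throttle keyed by the peer's address."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: dict[str, TokenBucket] = {}

    def accept(self, peer: str, now: float) -> bool:
        bucket = self.buckets.get(peer)
        if bucket is None:
            # A new connection starts with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[peer] = bucket
        return bucket.allow(now)


throttle = ConnectionThrottle(rate=2.0, capacity=3.0)
print([throttle.accept("203.0.113.7", now=0.0) for _ in range(4)])
```

A burst beyond the cap is refused, while other peers keep their own independent budgets; this per-connection bookkeeping only makes sense once senders hold identifiable sessions.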
Internally, stake-weighted QoS means queuing transactions in different channels depending on the sender, weighted by the amount of SOL staked. Because excess messages from non-staked nodes will most likely be dropped by the leader, non-staked nodes will be incentivized to send transactions to staked nodes first, rather than directly to the leader, for a better chance of execution.
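One way to picture stake weighting is to split the leader's packet budget in proportion to each sender's stake, with an optional small share reserved for unstaked senders. The function name, the reserved share, and the integer rounding below are assumptions for illustration; the article does not describe how Solana sizes its channels.

```python
from __future__ import annotations


def stake_weighted_quota(stakes: dict[str, int], capacity: int,
                         unstaked_share: float = 0.0) -> dict[str, int]:
    """Split a packet budget across senders in proportion to their stake.

    Senders with zero stake split `unstaked_share` of the capacity equally;
    with the default of 0.0 they get nothing, i.e. their packets are dropped.
    """
    unstaked = [sender for sender, stake in stakes.items() if stake == 0]
    staked = {sender: stake for sender, stake in stakes.items() if stake > 0}
    unstaked_budget = int(capacity * unstaked_share) if unstaked else 0
    staked_budget = capacity - unstaked_budget
    total_stake = sum(staked.values())

    quota: dict[str, int] = {}
    for sender, stake in staked.items():
        quota[sender] = staked_budget * stake // total_stake  # rounds down
    for sender in unstaked:
        quota[sender] = unstaked_budget // len(unstaked)
    return quota


print(stake_weighted_quota({"A": 600, "B": 300, "C": 0}, capacity=1_000, unstaked_share=0.1))
```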
According to Anatoly, validators will be responsible for shaping their own traffic and applying policies that protect them from abuse. For example, if a particular node sends huge numbers of transactions, validators can take action even if that node is staked, ignoring the connections it has established in order to protect network performance.
Solana fees are currently fixed and charged per signature required in a transaction (5,000 lamports, or 0.000005 SOL, per signature). When competition in a specific market is high, users risk not getting their transactions executed. With a fixed fee, there is no way to signal priority or to compete by paying more. Lacking alternatives, users (usually bots) spam transactions at the leader (and upcoming leaders) in the hope that at least one succeeds. In many situations, this behavior generates more traffic than the network can process.
A priority fee will soon be added to Solana, allowing users to specify an arbitrary “additional fee” that is collected when the transaction is executed and included in a block. This mechanism would not only help the network prioritize time-sensitive transactions but would also tend to reduce the number of invalid or duplicate messages sent by algorithms, since speculative operations can become unprofitable as their total cost rises.
The ratio of this fee to the requested compute units (a measure of the computational work the program needs to perform all of its operations) will serve as the transaction’s execution priority weight. Nodes will use this ratio to prioritize the transactions they forward to the leader. Additional fees will be treated exactly like today’s base fee: 50% of the fees paid will be collected by the leader and 50% will be burned.
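Putting the numbers from this section together in a Python sketch: the base fee is 5,000 lamports per signature, the priority weight is the additional fee divided by the requested compute units, and the total fee is split 50/50 between the leader and the burn. The `Tx` class, the example values, and giving any odd lamport to the leader are our own assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

LAMPORTS_PER_SOL = 1_000_000_000
BASE_FEE_PER_SIGNATURE = 5_000  # lamports


@dataclass
class Tx:
    name: str
    signatures: int
    additional_fee: int  # priority fee, in lamports
    compute_units: int   # requested compute units

    def base_fee(self) -> int:
        return BASE_FEE_PER_SIGNATURE * self.signatures

    def priority(self) -> float:
        # Priority weight: additional fee per requested compute unit.
        return self.additional_fee / self.compute_units

    def fee_split(self) -> tuple[int, int]:
        """(lamports collected by the leader, lamports burned)."""
        total = self.base_fee() + self.additional_fee
        burned = total // 2
        return total - burned, burned  # assumption: an odd lamport goes to the leader


def order_by_priority(txs: list[Tx]) -> list[Tx]:
    return sorted(txs, key=lambda tx: tx.priority(), reverse=True)


swap = Tx("swap", signatures=1, additional_fee=10_000, compute_units=200_000)
mint = Tx("mint", signatures=2, additional_fee=20_000, compute_units=100_000)
transfer = Tx("transfer", signatures=1, additional_fee=0, compute_units=1_000)
print([tx.name for tx in order_by_priority([swap, mint, transfer])])
print(mint.fee_split())
```

With these inputs, `mint` (0.2 lamports per compute unit) would be forwarded ahead of `swap` (0.05) and `transfer` (0).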
At this point, you might imagine several blocks being filled entirely with transactions targeting an NFT mint. However, there is a limit on how long each account can be locked for writing within a single slot (600 to 800 milliseconds). The remaining block space can be filled with transactions that write to other accounts, even if they offer a smaller priority fee. High-priority transactions trying to write to an account that has already reached its limit will have to wait for the next block.
Each Solana transaction declares up front the accounts it will access, marking as writable the portion of the state it will modify. This allows transactions to be executed in parallel as long as they are independent, i.e. they do not access the same accounts. If two transactions access the same account and at least one of them writes to it, they cannot be processed in parallel, because they affect the same state.
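The parallelism rule can be written as a conflict check, assuming (as Solana's runtime does) that read locks can be shared while write locks are exclusive. The account names in this Python sketch are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AccountAccess:
    reads: frozenset[str]   # accounts the transaction only reads
    writes: frozenset[str]  # accounts the transaction modifies


def conflicts(a: AccountAccess, b: AccountAccess) -> bool:
    """True if the two transactions cannot run in parallel."""
    # A write conflicts with any other access to the same account;
    # two reads of the same account do not conflict.
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


pay_alice = AccountAccess(reads=frozenset({"oracle"}), writes=frozenset({"alice"}))
pay_bob = AccountAccess(reads=frozenset({"oracle"}), writes=frozenset({"bob"}))
update_oracle = AccountAccess(reads=frozenset(), writes=frozenset({"oracle"}))

print(conflicts(pay_alice, pay_bob))        # both only read "oracle"
print(conflicts(pay_alice, update_oracle))  # one writes what the other reads
```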
The Solana team argues that the priority fee will then behave like a set of parallel auctions, affecting only the “hot market” rather than the global price: the fee can grow for the specific queue of transactions trying to write to a contested account without raising fees elsewhere.
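The effect of a per-account write limit on block packing can be sketched with a greedy scheduler: take transactions in order of priority and defer any whose writable accounts have exhausted their budget. In this Python sketch, the abstract `cost` units stand in for the write-lock time limit described above; all limits and names are illustrative, not the validator's real scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PendingTx:
    name: str
    priority: float           # additional fee per compute unit
    cost: int                 # abstract execution cost
    writable: frozenset[str]  # accounts the transaction write-locks


def pack_block(txs: list[PendingTx], block_limit: int,
               account_limit: int) -> tuple[list[str], list[str]]:
    """Greedily fill one block by priority under a per-account write budget.

    Returns (included, deferred) names; deferred transactions wait for a later block.
    """
    used_block = 0
    used_by_account: dict[str, int] = {}
    included: list[str] = []
    deferred: list[str] = []
    for tx in sorted(txs, key=lambda t: t.priority, reverse=True):
        fits_block = used_block + tx.cost <= block_limit
        fits_accounts = all(
            used_by_account.get(acc, 0) + tx.cost <= account_limit for acc in tx.writable
        )
        if fits_block and fits_accounts:
            included.append(tx.name)
            used_block += tx.cost
            for acc in tx.writable:
                used_by_account[acc] = used_by_account.get(acc, 0) + tx.cost
        else:
            deferred.append(tx.name)
    return included, deferred


pending = [
    PendingTx("mint1", 10.0, 20, frozenset({"nft_mint"})),
    PendingTx("mint2", 9.0, 20, frozenset({"nft_mint"})),
    PendingTx("mint3", 8.0, 20, frozenset({"nft_mint"})),
    PendingTx("swap", 1.0, 20, frozenset({"amm_pool"})),
    PendingTx("transfer", 0.5, 10, frozenset({"alice"})),
]
print(pack_block(pending, block_limit=100, account_limit=40))
```

Here `mint3` is deferred even though it outbids `swap` and `transfer`, because `nft_mint` has used up its budget; the fee competition stays local to that account.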
How does a user know what fee to pay to get a mint? RPC nodes will need to estimate an adequate fee, most likely with a simple statistical method, for example the average cost of similar transactions over the previous N blocks, or a quantile of those costs. The best method will depend on the market and on whether fees for similar transactions are volatile (high demand) or stable (lower demand).
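Both estimators mentioned above fit in a few lines of Python. The input layout (a list of fees per recent block) and the nearest-rank quantile definition are assumptions; an RPC node could just as well use another quantile method or weight recent blocks more heavily.

```python
from __future__ import annotations

import math
from statistics import mean


def estimate_fee(recent_fees: list[list[int]], method: str = "mean",
                 q: float = 0.75) -> int:
    """Suggest a priority fee (lamports) from fees paid in the last N blocks.

    `recent_fees` holds, per block, the priority fees paid by transactions
    similar to the one being submitted.
    """
    fees = sorted(fee for block in recent_fees for fee in block)
    if not fees:
        return 0
    if method == "mean":
        return math.ceil(mean(fees))
    if method == "quantile":
        # Nearest-rank quantile: smallest observed fee covering a fraction q of samples.
        rank = max(1, math.ceil(q * len(fees)))
        return fees[rank - 1]
    raise ValueError(f"unknown method: {method}")


last_blocks = [[100, 200], [300, 400], [1_000]]
print(estimate_fee(last_blocks), estimate_fee(last_blocks, "quantile", q=0.5))
```

A single 1,000-lamport outlier pulls the mean up to 400, while the median (q=0.5) stays at 300; which one is preferable depends on how volatile the market is.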
In practice, the priority fee could have a global effect if parallel auctions are not implemented in the validator client. Because RPC nodes and users set the fee themselves, during periods of intense traffic applications will likely bid for priority even when they do not interact with the “hot market”, driving up fees for other, lower-demand dApps.
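The gap between a global and a per-account view can be shown with a median over recent transactions, taken either over all of them or only over those that wrote to the same accounts. The history in this Python sketch is invented: three busy NFT-mint transactions and two quiet ones.

```python
from __future__ import annotations

from statistics import median


def suggested_fee(history: list[tuple[frozenset[str], int]],
                  writable: frozenset[str] | None = None) -> float:
    """Median priority fee over past (writable accounts, fee) records.

    With `writable=None` every transaction counts (a global market); otherwise
    only transactions that wrote to one of the same accounts count (a local market).
    """
    fees = [fee for accounts, fee in history if writable is None or accounts & writable]
    return median(fees) if fees else 0


history = [
    (frozenset({"nft_mint"}), 50_000),
    (frozenset({"nft_mint"}), 60_000),
    (frozenset({"nft_mint"}), 70_000),
    (frozenset({"alice"}), 0),
    (frozenset({"amm_pool"}), 100),
]
print(suggested_fee(history), suggested_fee(history, frozenset({"amm_pool"})))
```

Estimated globally, a quiet swap on `amm_pool` would be told to pay 50,000 lamports; restricted to its own account, the suggestion drops to 100.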
This article covered the three pieces Solana is actively working on to deal with congestion: changing the transaction protocol from UDP to QUIC, adding stake-weighted QoS to prioritize transactions, and introducing a fee market in which fees rise with demand. All three improvements aim to address the degraded performance Solana has frequently experienced.
We hope this article has clarified these concepts and the motivations behind the choices being made. Exploring the Solana source code would be an essential next step to investigate the exact metrics QoS uses to select or drop transactions, the mechanism behind fee increases (and decreases), and other questions that remain open.
I would like to thank the Chorus One team for the enlightening discussions and knowledge sharing, especially Ruud van Asseldonk for the technical review, and Xavier Meegan for the support.