Background
Incentivized testnets have been one of the significant innovations in the crypto space. Cosmos led the way with Game of Stakes, and since then, they have become a core activity for bootstrapping new networks.
Incentivized testnets are powerful in many ways. They enable validators to build up the skills they need to deploy, upgrade, and maintain the new network. But more than that, testnets help the validators learn how to operate as a community, where they work together to solve problems and make the network more robust. Often the incentives include rewards for maintaining high uptime or for passing a security audit. They may even include activities besides running validators, e.g., rewards for content production. Incentivized testnets also give token holders a chance to evaluate the skills of each validator, to help them decide with whom to stake once the mainnet is launched.
A key part of running testnets is to break things. The premise is that the more successful attacks there are on a testnet, the more prepared a network is for mainnet. The best sign that a network has been put through its paces is to find the Holy Grail of bugs: a priority one, severe bug that is a showstopper for the network launch.
This week, Chorus One engineer Reisen did just that. He demonstrated a critical flaw in the Solana network that allowed him to steal 500M SOL tokens on the testnet. What follows is the story of how he did it.
A Step Back
To start the story, we must go back to when Reisen joined the Chorus One team in June of 2019. Over the summer, Chorus was contracted by Solana to build out StrongGate, a solution that enables high-availability validators on the Solana network. This involved Reisen getting deep into some of the core Solana code. The Solana codebase would challenge any developer, even a very experienced Rust and functional programming expert like Reisen. That is because the Solana project has innovated on so many fronts in parallel. Solana is built around eight vital technological innovations (as described in this blog post). All these innovations were delivered in Version 1 of the Solana codebase, so when Reisen dove into the codebase, there was still a lot of development activity going on. And none of the eight technologies is easy to grasp. Proof of History is a new model for distributed timekeeping, Tower BFT is a new consensus algorithm, Turbine is a block propagation protocol influenced by BitTorrent, Archivers are a decentralized file system, and that’s only half of them. Any one of these on its own would challenge any developer. So it was no easy task to dive into the codebase.
When Solana’s incentivized testnet (Tour de SOL) came around, Reisen was determined to come up with the best exploit. It was always in the back of his mind as we looked through the codebase in late 2019. But it wasn’t until Tour de SOL kicked off in February that he focused his attention on the problem.
The First Attacks
Designing distributed computing protocols in Byzantine environments is especially hard. The typical attacks we see on these networks involve crafting malformed packets or launching denial of service attacks. So this is where Reisen started.
His first attack was a joint effort with Certus One. Reisen had identified an issue at network launch time. Each node announces the height of the last block it has seen. The node with the most recent block is then responsible for sharing the latest block(s) to bootstrap the nodes that are joining the network. Reisen realized that this could be used to sabotage the network. By advertising a very high block height, he could control the launch. But the question was how best to exploit this. This is where the amazing Certus One team came in. They built a superb mechanism to slow down the snapshot delivery so that nodes would be effectively stalled trying to get access to the latest chain data before the network could start. They also prepared a compression (or zip) bomb that we deployed but that, unfortunately, was never activated. And they designed the pièce de résistance, a beautiful piece of ASCII art to add to the mischief.
The attack worked! We had successfully attacked the network. Independently, Certus also launched other denial of service attacks on the network. It was such a pleasure for us to partner with Certus on this, as they have long been the rock stars of testnet attacks.
The fun had begun!
The Search For The Big Attack
So far, so good. But Reisen wanted more. He sensed there was a more significant attack possible. And then, last week, he spotted an issue in the code. At this point, it was just a suspicion that an attack might be possible. But he wasn’t sure.
So over a couple of sleepless nights, he set up a test environment to build and test the exploit. Over the weekend, the Chorus team got early indications that the attack was viable. But could Reisen make it happen in the Tour de SOL testnet? We had to be patient as Reisen waited for the right opportunity.
This was the big one. It was the most critical exploit you can imagine in a crypto network: stealing unlimited funds. Reisen had found a bug that allowed a block producer to inject a transaction that could steal funds from any account without knowing the private key. And the network was utterly oblivious to anything being wrong.
But could we launch such a severe attack on Tour de SOL? Reisen felt that we should run it by someone on the Solana team first. So we showed the exploit to Michael Vines, Head of Engineering at Solana. And it didn’t work at first. Reisen thought he’d messed up. Maybe there was something wrong with his test environment. He couldn’t reproduce it with Michael. But then, after seven attempts, it worked! We had reproduced it on the Solana Soft Launch testnet.
Now the big question. Could we make it work on Tour de SOL? Again the answer at first was no. It failed. Then it failed again. But at least Michael had seen the exploit working, so we still marked it down as a big success.
But as you may have guessed by now, this wasn’t good enough for Reisen. He needed to see this through. So he left the exploit on a loop. And an hour after we demonstrated the issue to Michael… The transaction went through. WE DID IT! We stole 500 Million SOL tokens from the genesis account on the Tour de SOL testnet.
Deep Dive Into the Issue
Let’s start with some terminology. Solana has slots, periods where a particular validator adds transactions. It’s helpful to think of a slot as a block in a more traditional blockchain context. Leaders are elected for a slot. They get to decide which transactions to include in that slot, during which time they broadcast these transactions to the other validators. Validators run in one of two modes: Broadcast and Replay. The leader broadcasts, while the other validators are in replay mode.
As is typical in blockchains, each transaction contains a set of public keys for the relevant accounts involved in the transaction. In the case of a simple token transfer, this is the sender and receiver public keys. For the token transfer transaction to be valid, it must be signed by the sender’s private key.
Simplified Transaction Format: [Signature, Sender Key, Receiver Key, Payload].
Check: Does Signature belong to Sender Key?
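To make that check concrete, here is a minimal sketch in Python. Solana signs transactions with Ed25519, which the Python standard library does not provide, so this sketch swaps in a Lamport one-time signature built from SHA-256 purely for illustration. The `Transaction` class and every function name below are hypothetical and are not Solana's actual types.

```python
import hashlib
import secrets
from dataclasses import dataclass


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Lamport one-time keypair: 256 pairs of random secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    # Reveal one secret from each pair, selected by the message digest bits.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(signature[i]) == public[i][bit]
               for i, bit in enumerate(_bits(message)))


@dataclass
class Transaction:
    signature: list
    sender_key: list       # sender's public key (stand-in for an Ed25519 key)
    receiver_key: bytes    # opaque receiver identifier, simplified for the sketch
    payload: bytes

    def message(self) -> bytes:
        sender_bytes = b"".join(a + b for a, b in self.sender_key)
        return sender_bytes + self.receiver_key + self.payload


def signature_belongs_to_sender(tx: Transaction) -> bool:
    return verify(tx.sender_key, tx.message(), tx.signature)


alice_private, alice_public = generate_keypair()
mallory_private, _ = generate_keypair()

honest = Transaction([], alice_public, b"bob", b"transfer 10")
honest.signature = sign(alice_private, honest.message())

# Mallory names Alice as the sender but can only sign with her own key.
forged = Transaction([], alice_public, b"mallory", b"transfer 500000000")
forged.signature = sign(mallory_private, forged.message())

print(signature_belongs_to_sender(honest))  # True
print(signature_belongs_to_sender(forged))  # False
```

The point of the sketch is only that verification needs nothing but the sender's public key, so any validator can run it on any transaction it receives.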
If you sign a transaction with a key that isn’t the correct one, then validators will fail the signature verification step. But this is not what happens when the leader injects a transaction.
The leader broadcasts the transactions they have included in the slot. Each validator receives those transactions in what’s called replay mode. When transactions are replayed to a validator from the leader, the signature of each transaction should be verified. Instead, the validator does a “light” check on the fields of the transaction. It looks to confirm that the correct keys are in place. There is a field in the transaction to indicate if the signatures have been previously checked (presumably by the node that accepted the transaction). But, and this is the critical flaw, the receiving validator does not actually re-run this validation of the signature, i.e., there is no explicit check by the validator to verify that the owner of the account from which funds are being withdrawn has signed the transaction. In effect, the receiving validator trusts the transaction coming from the leader. This was the critical issue Reisen identified.
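The difference between trusting the leader and re-verifying can be sketched with a toy model in Python. This is not Solana's Rust code: the `signatures_checked` field, the two `replay_*` functions, and the other names are hypothetical, and the "signature" is modelled simply as the name of the key that produced it rather than real cryptography. We also assume, for illustration, that the flawed path relies on the "already checked" flag, and that an invalid transaction is skipped rather than the whole slot being rejected.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    signed_by: str                     # toy signature: name of the key that produced it
    sender: str
    receiver: str
    amount: int
    signatures_checked: bool = False   # hypothetical "already verified" flag


def signature_is_valid(tx: Transaction) -> bool:
    """Stand-in for real cryptographic verification against the sender's key."""
    return tx.signed_by == tx.sender


def has_expected_fields(tx: Transaction) -> bool:
    """The 'light' check: fields are present and sane, nothing cryptographic."""
    return bool(tx.sender) and bool(tx.receiver) and tx.amount > 0


def apply_transfer(balances: dict, tx: Transaction) -> None:
    balances[tx.sender] -= tx.amount
    balances[tx.receiver] = balances.get(tx.receiver, 0) + tx.amount


def replay_trusting_leader(balances: dict, slot: list) -> None:
    """Flawed replay: trusts the flag set upstream and never re-verifies."""
    for tx in slot:
        if has_expected_fields(tx) and tx.signatures_checked:
            apply_transfer(balances, tx)


def replay_with_verification(balances: dict, slot: list) -> None:
    """Safer replay: each validator checks every signature itself."""
    for tx in slot:
        if has_expected_fields(tx) and signature_is_valid(tx):
            apply_transfer(balances, tx)


# A malicious leader injects a transfer out of an account it does not control.
theft = Transaction(signed_by="leader", sender="genesis", receiver="leader",
                    amount=500_000_000, signatures_checked=True)

flawed = {"genesis": 500_000_000, "leader": 0}
replay_trusting_leader(flawed, [theft])
print(flawed)  # {'genesis': 0, 'leader': 500000000}

fixed = {"genesis": 500_000_000, "leader": 0}
replay_with_verification(fixed, [theft])
print(fixed)   # {'genesis': 500000000, 'leader': 0}
```

In the flawed path the transaction passes because nothing ties the signature to the sender's key. In the second path the same transaction is dropped, at the cost of every validator repeating the signature work.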
The steps he took were as follows:
But why did we have trouble reproducing the attack on the two testnets? The issue was that sometimes the malicious transaction the leader submitted never made it into the chain. The Solana network is so fast that it’s hard for a leader to inject transactions fast enough. But by retrying in a loop, the transaction was finally accepted into the chain. Reisen had succeeded through sheer grit and determination and found a way to steal 500M SOL on the Tour de SOL testnet.
The Solana Team Response
The Solana team’s response has been great.
The Solana codebase is excellent. The Rust compiler ensures type safety, which rules out whole classes of bugs. And the code is written defensively so that all inputs are checked. We just happened to find the one case where the robust checks we see everywhere else in the code were missed.
And now the hard work for Solana starts. Of course, questions must be asked about how this issue was missed in the recent security audit. But Anatoly (the Solana CEO) has given clear instructions to the team to review all the code, especially the crypto signing pieces.
We think this is the shock that every network needs as it prepares for the mainnet launch. We do not doubt that the Solana team will rise to the challenge to ensure that an issue like this never occurs again.
But at Chorus One we’re thrilled with our work! Reisen got his attack. And he got the much-deserved kudos from Anatoly (which is always lovely!).
And he got a new nickname: Reisen Hood, Prince of Crypto Thieves.
It’s been an amazing week for us. By a strange coincidence, Monday also saw some other big news for Chorus One. We launched Anthem, our multi-network staking platform, which allows token holders to track and manage Proof of Stake portfolios and earnings. Users can create a personalized staking dashboard for any Cosmos address, with detailed data and charts to cover your ATOM staking portfolio. Support for other networks is underway and will be coming soon. So please check it out at https://anthem.chorus.one.