Blog

Security and Key Management Practices at Chorus One

Chorus One

August 7, 2023

•

Time to Read: 7 minutes

5 min read

August 7, 2023

•

5 min read

Prologue

Recent upgrades to Ethereum, such as The Merge and Shapella, were as transformative to the underlying technical algorithms of Ethereum, as they were for the underpinning economy. Previously, hardware enthusiasts and enterprises worked to provide security for the network and were generously rewarded in the process, while today the main driving force that makes Ethereum work securely is their validator community. While there are some similarities between miners and validators -- they both secure the chain via some technological process -- there are many striking differences between the two, particularly in the technology involved.

This article will focus on the technical aspects of being part of a validator community, risks associated with this role, and Chorus One’s measures to mitigate staking risks in our products.

‍

Cryptography keys used in Ethereum Proof-of-Stake network

Before 2022, Ethereum's main network used a Proof-of-Work (PoW) system called ETHash to create new blocks and handle transactions. Miners using this system had to solve complex puzzles using powerful computers to create blocks of transactions. These blocks were verified with a code unique to the miner, commonly referred to as a (ECDSA) cryptographic key. The EVM then transferred the transaction fees and additional inflation rewards into an account identified by a public key of that miner key pair. This cryptographic key pair that the miner received was known as a coinbase address, and was configured directly in software the miner used.

Now, things are different. Ethereum no longer requires complex puzzle to create blocks anymore. Instead of miners, a specific type of software called Validator Client creates new blocks. Running validators involves locking some funds on-chain, and then producing blocks when that validator is chosen as the next block producer by the randomness of the consensus algorithm. Since there is no hardware-intensive puzzle solving involved, and the randomness consensus algorithm is very lightweight and has allowed the network to reduce the hardware involved and energy spent within Ethereum blockchain. On the economic side, however, this also had profound changes.

First, they changed how rewards are distributed. Instead of the reward going to one address, it is now split into two parts: consensus and execution, which can be specified in separate accounts/addresses. It is split in this manner:

Execution rewards: fees collected for transactions in a block. These are sent to a special address set in the Ethereum software as the "fee recipient address."

Consensus Rewards: These are based on inflation and given to validators who make blocks and also to those who vote on blocks made by other validators. This consensus reward goes to an address set when a new validator is added. It can't be changed later and is called the "withdrawal address."

Second, the very algorithm used to sign the blocks was changed to involve a BLS algorithm instead of the previous ECDSA (however, the individual transaction signatures still utilize ECDSA for convenience and compatibility reasons). Next we will explain why this matters.

‍

More differences between withdrawal and validator addresses

Now, there are two addresses involved in the reward process instead of just one cryptographic key pair. This means each validator has two key pairs. One of these key pairs is called BLS, and it's always used by the Ethereum client software to sign block information. This helps verify the address where fees should go, which is included in the block's data.

The block production process itself does not require the holder to also keep the ECDSA keys to the fee recipient address, so potentially every Ethereum address can be specified in the Validator Client to receive execution rewards. The withdrawal (ECDSA) key pair, however, is never loaded into a client, and is immutably bound to every Ethereum validator during the validator creating process.

For validators, this means that there exists a non-custodial process which allows them to lock their Ethereum coins in the blockchain while still having control over those funds and any future consensus rewards, while the actual Validator Clients can be run by a separate entity. Such an entity would only hold validator BLS signing keys and not the withdrawal ECDSA keys.

‍

Let’s focus on validator keys

The Validator BLS key pair, which is important for confirming and creating new blocks, is traditionally generated from random information using a secure method that keeps the private keys unpredictable. Chorus One has its own in-house tool that makes generating keys fast and secure, which you can read more about here.

After creating the keys, the on-chain verification and loading of the deposit transaction data both kick off in parallel, in a process known as ‘Voting period’, and usually takes around 16 hours. During this time, Ethereum nodes from across the globe will read the deposit transaction log, and vote on whether the signature included in the transaction is correct. When enough votes for correctness are collected, the voting period elapses. However, the validator does not immediately start to fulfill its duty of securing the chain, but due to a limitation in the amount of validators that can become active per block, some additional time is spent in an activation queue. More information on this process can be found here.

Once the activation queue period has passed, the validator clients where the keys have been loaded will start fulfilling Ethereum blockchain duties. Only after this point is that specific duties can be assigned and the keys will start being used to produce signatures and blocks.

This process is visualized on the following image:

Risks involved for the validator key holders and operators

The biggest danger for validator key holders is if those keys are stolen, because attacker in possession of the victim validator keys can produce messages infringing to security of a blockchain, which will lead to slashing the validator that incurs a large penalty (more than 1Eth per validator).Currently, it is not possible for a perpetrator in this scenario to profit from such operation, however the possibility to vandalize the ledger still exists and is the most significant risk when operating the validator keys. Thus, participation in staking usually involves consistent and well-thought security practices that prevent unauthorized access to the validator key seed. It is also important to understand that a single seed can be used to generate multiple validators; the more validators were created from a single seed, the bigger the potential impact from leaking the keys.

Besides this, other risks arise from the operation of running validators.

‍

Client software risks

The Ethereum community often discusses the risks associated with client software diversity, which refers to having different types of software implementations. Ethereum offers various open-source validator options, and users can even create their own software. However, most users prefer established open-source options. Validator software is complex and can have bugs that lead to penalties. To minimize risk, stakers should use different validator software types to avoid simultaneous problems that could result in increased penalties for everyone, known as an "inactivity leak."

It's better for stakers to run multiple types of validators to reduce risk. Currently, Ethereum has good diversity at the consensus layer, but there's an issue with one software type dominating the execution layer. Bugs often appear in specific software versions, but Ethereum runs multiple versions simultaneously to reduce risks. Those using a service to stake Ethereum should make sure the service uses diverse software types for both main agreement and action parts of Ethereum.

‍

Withdrawal risks

Withdrawal risks arise from possible issues with the withdrawal keys, like if they are accidentally revealed or if access to the wallet's private key (ECDSA) is lost. As of now, Ethereum lacks mechanisms to regain access to validator withdrawal once the wallet is lost. This underscores the significance of using a trustworthy wallet, maintaining backups, or relying on a reputable custody provider to safeguard the withdrawal seed. It's essential to ensure the correct wallet public key is used when setting up the validator.

‍

Chorus One operations and practices

At Chorus One, we operate over 8000 validators on the Ethereum Mainnet for various customers, drawing from years of experience without encountering any slashable offense. In the following sections, we'll delve into the techniques we've developed to oversee validator operations, along with the software and infrastructure controls we've implemented. These measures are aimed at minimizing risks for our customers.

‍

Validator key security

Validator key security is at heart of our operation. We ensure that validator keys are never stored on disk without encryption. We utilize cloud-based Vault software, implementing zero-trust access controls, to securely store and provide validator keys to validator clients throughout their lifecycle and operation.

‍

Vault access control

We employ Vault access control policies to ensure that only software clients have access to validator key content. We also segment access for different processes, ensuring that each validator client process can access only a specific set of keys. These keys are guaranteed to be unique across all processes. While generating each validator's private BLS key, we use a strong source of randomness to minimize the possibility of collisions. Furthermore, an SQL database with a unique constraint on the validator's public key field is used to ensure that generated validator keys are never reused for new validators, even if a validator is exited later.

To maintain transparency, we maintain an append-only log of all operations on the Vault storage, and we routinely review it for any anomalies. When it comes to data transfers involving validator keys, they exclusively occur through TLS encrypted channels. Additionally, backups of validator key storage are encrypted with multiple keys, requiring the authorization of multiple individuals to restore from the backup. Each mnemonic used for a validator's BLS private key is unique and exclusively assigned to that specific validator. This approach further minimizes the risk of key leakage.

‍

Slashing Protection Database

For each validator client, we maintain a local slashing protection database. Additionally, we utilize the Web3Signer signing service, which employs a centralized slashing protection database. This dual-layer approach offers enhanced security. In the event of potential glitches or bugs in our cloud platform that might result in two instances of the same process running with the same local slashing protection database, the centralized Web3Signer database acts as a safeguard against double signing by our validators.

The Web3Signer centralized database is replicated across multiple data centers, ensuring redundancy and availability. An automated fail-over mechanism is also in place to address any downtime in a data center. The protective measures employed by Web3Signer to prevent double signing are depicted in the illustration below.

‍

Geographically distributed data-centers

At the core of our infrastructure lies a network of public Ethereum nodes that actively engage in the Ethereum consensus and execution process. These nodes establish dependable infrastructure pathways that support the seamless functioning of the Ethereum network. These public nodes are strategically positioned across various geographical data centers, ensuring redundancy and reliability.

Within our validator clients, we've implemented load-balancing mechanisms. This ensures that if one of the data centers experiences an outage, our validator clients seamlessly transition to utilizing Ethereum nodes from other operational data centers.

‍

Node Monitoring

Alongside the usual health checks for Ethereum APIs, our load-balancing strategy incorporates personalized health assessments for Ethereum nodes. For instance, if an Ethereum node's connected peers experience a sudden drop, our load balancer redirects validator traffic away from that node. This action prevents any potential issues with attestation or block propagation.

At Chorus One, we adopt a safeguard by running various implementations for both the consensus and execution layers in parallel. This approach ensures that any bugs in a single client implementation won't impact all of our nodes. The visual depiction of the infrastructure alignment between public nodes and validators is illustrated in the diagram below.

Our validator client software connects to the public Ethereum nodes, which are hosted on lightweight cloud appliances situated in proximity to the public node hosts. We maintain distinct validator client processes for different customers, ensuring that validators from separate customers don't share the same process memory.

We employ cloud automation software to facilitate automated upgrades for the client process. This includes an instant rollback feature triggered by automated health checks if any misconfiguration is detected that could potentially result in penalties.

‍

Infrastructure Resilience‍

Our infrastructure platform, Kubernetes, operates on top of public cloud providers, ensuring that only a single instance of each validator client process is active at any given time. This is achieved through the utilization of StatefulSet resources, which terminate old processes before launching new ones during restarts.

Our automated validator client updates undergo thorough testing before implementation. Updates are applied exclusively to Ethereum mainnet validators that have been rigorously evaluated and proven effective in privatenet and public testnet environments prior to deployment. The process of automated upgrades and rollbacks is visually depicted in the diagram below.

The controls and mechanisms mentioned above are sophisticated and prioritize security and safety over maintaining uptime. For instance, our validator client software integrates slashing protection to prevent signing attestation in scenarios where true double signing could occur, or if there's an issue with the centralized slashing protection database service.

Another example pertains to the potential downtime of the Vault service, which could lead to validator clients being unable to load signing keys and thus unable to sign on time. To address this, we implement continuous monitoring for all validators and the underlying infrastructure, generating automated alerts if any issues arise. To ensure comprehensive oversight, even in cases where our internal monitoring might falter, we employ a separate process of on-chain monitoring. This process involves scraping Ethereum blockchain APIs from an isolated set of public Ethereum nodes. It raises alerts if any penalties are detected with Ethereum validators.

Our team of rotating on-call engineers is available round the clock to respond to these alerts promptly and troubleshoot any potential problems with validator clients.

About Chorus One

Chorus One is one of the biggest institutional staking providers globally operating infrastructure for 40+ Proof-of-Stake networks including Ethereum, Cosmos, Solana, Avalanche, and Near amongst others. Since 2018, we have been at the forefront of the PoS industry and now offer easy enterprise-grade staking solutions, industry-leading research, and also invest in some of the most cutting-edge protocols through Chorus Ventures.

‍