Welcome to our eth2 insights series, presented by Elias Simos–Protocol Specialist at Bison Trails. In this series, Elias reveals insights he uncovered in his joint research effort with Sid Shekhar (Blockchain Research lead at Coinbase), eth2 Medalla—A journey through the underbelly of eth2’s final call before takeoff.
In this third post, Elias discusses the parameters that govern validator effectiveness in eth2, proposes a methodology to rank validators, and examines how validators in Medalla were distributed along those parameters.
Validators are the key consensus party in eth2. They are required to perform a range of useful tasks, and, if they perform their duties correctly and add value to the network, they stand to earn rewards for it. More concretely, validators in eth2 are rewarded for proposing blocks, getting attestations included, and whistleblowing on protocol violations. In Phase 0, things are a little simpler as whistleblower rewards are folded into the proposer function.
While block proposer rewards far exceed attester rewards per unit, every active validator is called upon to attest once per epoch, so the bulk of the predictable rewards in Phase 0 will come from attestations, at a ratio of approximately 7:1 vs proposer rewards.
Taking into account the protocol rules that govern rewards, it quickly becomes apparent that not all validators will be made equal—and therefore the distribution of rewards across validators will not be uniform!
It then follows that if validators are interested in optimizing for rewards, which is what an economically rational actor would do, there is merit to thinking of their existence in eth2 in terms of their “effectiveness.”
In the grand scheme of things, being a proposer in eth2 is like a “party round” for validators. The protocol picks 32 block proposers in every epoch, and tasks them with committing attestations (and later transaction data) on-chain and working towards finalizing a block.
The probability of becoming a proposer in Phase 0, all else equal, is then 32/n where n is the total number of active validators on the network. As the number of network participants grows, the probability of proposing a block diminishes; and so it did in Medalla!
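As a rough illustration of how quickly that probability shrinks, the 32/n approximation can be sketched in Python. The function name and the validator counts are illustrative assumptions, and the sketch ignores any weighting by effective balance, in line with the "all else equal" framing.

```python
SLOTS_PER_EPOCH = 32  # the protocol picks 32 block proposers per epoch


def proposer_probability(active_validators: int) -> float:
    """Approximate per-epoch chance of being picked as a proposer (32/n)."""
    if active_validators <= 0:
        raise ValueError("active_validators must be positive")
    # Cap at 1.0 for tiny validator sets, where 32/n would exceed certainty.
    return min(1.0, SLOTS_PER_EPOCH / active_validators)


if __name__ == "__main__":
    # Hypothetical validator-set sizes, chosen only to show the trend.
    for n in (20_000, 40_000, 70_000):
        print(f"{n:>6} validators -> {proposer_probability(n):.5f} per epoch")
```

The cap at 1.0 is a design choice of this sketch; it only matters for validator sets smaller than 32.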
In Phase 0, a proposer is rewarded by the protocol for (i) including pending attestations collected from the P2P network, and (ii) including proposer and attester slashing proofs. The more valuable the attestations that the proposer includes are, the higher their reward will be.
As a result, there are two key vectors along which we can score proposers on an effectiveness spectrum: (a) how often a proposer actually proposes a block when they are called upon to do so, and (b) how many valuable attestations they manage to include in the blocks they propose.
Of the two, what a proposer can more effectively control is how many blocks they actually propose in the slots the protocol allocates to them, which is a function of their overall uptime. The number of valuable attestations included in a proposed block depends not only on the proposer, but also on attesters sending their votes in on time and on aggregators aggregating valuable attestations for the proposer to include.
A coarse way to score proposer effectiveness is then the simple ratio proposed_slots : total_slots_attributed.
However, given that validators enter the network at different points in time, it is wise to control for the time each validator has been active (measured in epochs) and the diminishing probability of being selected as a proposer, as the total set of activated validators increases.
In eth2data.github.io we introduced a further optimization to the ratio in order to capture the difficulty factor, by dividing the time-weighted ratio of proposed_slots : total_slots_attributed by the probability of proposing at least once, given a validator’s activation epoch in the Testnet. This is the complement of the probability that they were allocated 0 slots over the epochs they have been active for (n), such that:
P(at least 1 proposal) = 1 − P(0_proposals over n epochs)
[Figure: probability of proposing at least once, plotted over ~14,700 epochs]
With this in mind we defined proposer effectiveness as:
n = epochs active
P_ratio = proposed_slots / total_slots_attributed
P_time_weight = 1 / epochs active
P_prob_weight = 1−P(0_proposals)*n
P_effectiveness = [P_ratio * P_time_weight] / P_prob_weight
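The definitions above can be sketched in Python as follows. This is a minimal illustration rather than the code behind eth2data.github.io: the function names are hypothetical, and `prob_at_least_one_proposal` assumes a constant validator-set size and independent epochs, which is a simplification that the source does not state. The probability weight is passed in as a plain argument, so a different estimate can be swapped in.

```python
from typing import Optional

SLOTS_PER_EPOCH = 32


def prob_at_least_one_proposal(epochs_active: int, active_validators: int) -> float:
    """Chance of at least one proposal over `epochs_active` epochs.

    ASSUMPTION (not from the source): the validator set size is constant and
    epochs are independent, so P(0 proposals over n epochs) = (1 - 32/N) ** n.
    """
    if epochs_active < 0 or active_validators <= 0:
        raise ValueError("epochs_active must be >= 0 and active_validators > 0")
    p_epoch = min(1.0, SLOTS_PER_EPOCH / active_validators)
    return 1.0 - (1.0 - p_epoch) ** epochs_active


def proposer_effectiveness(
    proposed_slots: int,
    total_slots_attributed: int,
    epochs_active: int,
    p_prob_weight: float,
) -> Optional[float]:
    """[P_ratio * P_time_weight] / P_prob_weight, following the text.

    Returns None when the score is undefined (no slots attributed, no epochs
    active, or a zero probability weight); that convention is an assumption.
    """
    if total_slots_attributed == 0 or epochs_active == 0 or p_prob_weight == 0:
        return None
    p_ratio = proposed_slots / total_slots_attributed
    p_time_weight = 1 / epochs_active
    return (p_ratio * p_time_weight) / p_prob_weight


if __name__ == "__main__":
    # Hypothetical validator: active for 1,000 epochs in a 50,000-strong set,
    # proposed in 18 of the 20 slots it was allocated.
    weight = prob_at_least_one_proposal(1_000, 50_000)
    print(proposer_effectiveness(18, 20, 1_000, weight))
```

Because the raw score is divided by both the epochs active and the probability weight, its absolute value is small; it is meant for ranking validators against each other rather than for reading as a percentage.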
Given that the probability of proposing at least once diminishes fast in the epochs closer to the present, dividing by P_prob_weight disproportionately boosts the overall score of proposers that were activated recently.
A further set of optimizations to the P_effectiveness score, introduced here to control for the distortions described above, are:
[Figure: distribution of the ~70k validators that we tracked in Medalla, scored for P_effectiveness]
At the edges of the distribution, we observe a large concentration of really ineffective proposers (10% of the whole) that have missed virtually every opportunity they had to propose a block, and very effective proposers (5% of the whole) that have proposed in every slot they were allotted. Between those we find an exceedingly large representation of proposers at the 40% level of P_effectiveness. The remainder approximately follows a normal distribution with a mode of 0.35.
So far so good, but proposer effectiveness is only ~⅛ of the story...
Broadly, the two key variables for attesters to optimize along in eth2 are (i) getting valuable attestations included on-chain (a function of uptime) and (ii) the inclusion delay. In plain terms, from an attestation rewards point of view, to optimize for rewards the attester (a) must always participate in consensus and (b) must participate swiftly when their time comes.
For example, with 1 slot being the minimum possible delay, an inclusion delay of 2 slots leads to the attester receiving only half of the maximum reward available.
[Figure: expected rewards “moderator” as a function of inclusion delay]
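A small Python sketch of the rewards moderator, assuming it scales as 1/delay. That assumption is consistent with the two data points given here (a delay of 1 earns the maximum, a delay of 2 earns half), but the function name and the range of delays printed are illustrative only.

```python
def reward_moderator(inclusion_delay: int) -> float:
    """Fraction of the maximum attestation reward kept at a given delay.

    ASSUMPTION: the moderator is 1 / inclusion_delay, with delays measured
    in slots and 1 being the minimum possible delay.
    """
    if inclusion_delay < 1:
        raise ValueError("inclusion_delay must be at least 1 slot")
    return 1.0 / inclusion_delay


if __name__ == "__main__":
    for delay in range(1, 6):
        print(f"delay of {delay} slot(s) -> {reward_moderator(delay):.2f} of max reward")
```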
Given the above, the main categories we focused on when scoring for attester effectiveness are:
aggregate inclusion delay: measured as the average of the inclusion delays a validator has been subject to over time. A score of 1 means that the validator always got their attestations in without a delay, and maximized their rewards potential.
uptime ratio: measured as the number of valuable attestations vs the time a validator has been active (in epochs). A ratio of 1 implies that the validator has been responsive in every round they have been called to attest in.
We defined the attester effectiveness score as:
A_effectiveness = uptime ratio * 100 / aggregate inclusion delay
To make the score easier to generalize, A_effectiveness is normalized so that it tops out at 100%.
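A minimal Python sketch of the attester score follows. It assumes one attestation duty per epoch and treats every included attestation as "valuable"; both are simplifications, and the function name and input shape are hypothetical. With an uptime ratio of at most 1 and a delay of at least 1, the score cannot exceed 100, so the final clamp is only a guard against inconsistent inputs.

```python
from statistics import mean
from typing import Sequence


def attester_effectiveness(inclusion_delays: Sequence[int], epochs_active: int) -> float:
    """A_effectiveness = uptime ratio * 100 / aggregate inclusion delay.

    `inclusion_delays` holds one delay (in slots, minimum 1) per attestation
    that was included on-chain. ASSUMPTION: one attestation duty per epoch.
    """
    if epochs_active <= 0:
        raise ValueError("epochs_active must be positive")
    if not inclusion_delays:
        return 0.0  # never attested: no uptime, so the score is zero
    if min(inclusion_delays) < 1:
        raise ValueError("inclusion delays must be at least 1 slot")
    uptime_ratio = len(inclusion_delays) / epochs_active
    aggregate_delay = mean(inclusion_delays)
    return min(100.0, uptime_ratio * 100 / aggregate_delay)


if __name__ == "__main__":
    # Hypothetical validator: 10 epochs active, 8 attestations included.
    print(attester_effectiveness([1, 1, 2, 1, 1, 3, 1, 1], epochs_active=10))
```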
Here we find that 12% of the active validators in the first 14,500 epochs of Medalla scored below 10% in A_effectiveness, clearly a derivative of the fact that Medalla is a Testnet with no real value at stake, and likely exaggerated by the roughtime incident.
The final step for arriving at a comprehensive validator effectiveness score is combining both the attester and proposer effectiveness scores into one “master” score.
To simplify the process we define the master validator effectiveness score as:
V_effectiveness = A_effectiveness * ⅞ + P_effectiveness * ⅛
V_effectiveness reflects the ecosystem-wide expectation of the distribution of ETH rewards that validators stand to achieve by performing their attester and proposer duties, respectively.
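The weighting can be written out as a short Python sketch. It assumes that both input scores have already been normalized to the same 0 to 100 scale; the function and constant names are illustrative.

```python
ATTESTER_WEIGHT = 7 / 8  # share of expected rewards from attesting
PROPOSER_WEIGHT = 1 / 8  # share of expected rewards from proposing


def validator_effectiveness(a_effectiveness: float, p_effectiveness: float) -> float:
    """V_effectiveness = A_effectiveness * 7/8 + P_effectiveness * 1/8.

    ASSUMPTION: both scores are expressed on the same scale (e.g. 0 to 100).
    """
    return a_effectiveness * ATTESTER_WEIGHT + p_effectiveness * PROPOSER_WEIGHT


if __name__ == "__main__":
    # Hypothetical validator: strong attester, mediocre proposer.
    print(validator_effectiveness(80.0, 40.0))
```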
Given the much heavier weighting of attester effectiveness in the score, the distribution of V_effectiveness ends up looking very much like that of A_effectiveness.
In order to test the accuracy of this approach to calculating V_effectiveness, we plotted the score against the total rewards that validators achieved over their lifetime in Medalla and tested how good a predictor of rewards V_effectiveness can be.
To ensure that the comparison is “apples-to-apples,” for the purpose of the exercise we only selected the group of validators that were active at genesis (20k unique indices).
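One simple way to run such a test is a correlation coefficient between the score and the rewards. The Python sketch below uses Pearson correlation, which is an assumption: the text does not say which measure was used. The sample values are made up for illustration and are not Medalla data.

```python
from math import sqrt
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up scores and lifetime rewards for five hypothetical validators.
    v_effectiveness = [12.0, 35.0, 58.0, 74.0, 91.0]
    rewards = [-0.4, 0.1, 0.5, 0.6, 0.9]
    print(pearson_correlation(v_effectiveness, rewards))
```

Restricting the sample to validators with the same activation point, as described above, keeps exposure time from acting as a hidden variable in the comparison.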
When broadening the correlation matrix to capture a wider range of variables of interest, we find that the correlation between V_effectiveness and rewards drops to 50%, likely because the different entry points, and exposure to varying network conditions, become a moderating factor.
A few more observations worth surfacing are (i) that uptime is a lot more closely correlated to V_effectiveness than the inclusion delay is, and is thus likely what operators should optimize for first; (ii) that V_effectiveness improved significantly as Medalla matured, precisely because the aggregate inclusion delay improved greatly, as demonstrated by the 70% positive correlation between the delay and epochs_active; and (iii) that, as far as the attestation surplus introduced on-chain is concerned, there is only a weak relationship between a validator’s effectiveness and the amount of surplus info they load the chain with, meaning that the top performers, from a rewards perspective, may be only marginally smaller net “pollutants.”
When testing for validator effectiveness vs rewards in groups of validators aggregated by their most common denominator (a mix of graffiti, self id name, common withdrawal keys, and common eth1 deposit address), the results are equally strong!
The correlation between rewards and V_effectiveness stands at 80%, while in the distribution view we can discern three groups of operators that performed poorly (C), averagely (B), or well (A).
Zooming into the top performers (A) and surveying client choice, we were able to identify the client for only 30% of the ~17k validator indices in the group. 93% of those identified as either Prysm or Lighthouse, with the representation from Teku, Nimbus and Lodestar in the group at under 1% of the sample. Given that this distribution is not representative of the population, and that the unidentified 70% recorded the highest validator effectiveness score, there is no conclusion to extract here with respect to the relationship between performance and client choice.
Zooming out again to the population level and segmenting the view of V_effectiveness distributions by client choice, Prysm and Lighthouse score at the top, with Teku, Lodestar and Nimbus following.
Perhaps the most robust finding here is that, even among the top performing clients, there is a significantly wide distribution in in-group validator effectiveness, commensurate with the picture that the aggregate distribution of validators along their effectiveness score paints. This strongly hints that client choice has only so much to do with validator performance. The remainder likely leans on the strength of the operator’s design choices!
In this post, we developed a feature complete validator effectiveness methodology that is not only a strong predictor of ETH rewards achieved, but also takes into consideration a relative approach to scoring by normalizing scores.
Given that the majority of the factors governing rewards in eth2 are relative to the state of the network, we believe that, while computationally more demanding, the approach we introduce here paints a more accurate picture than methodologies that take a nominal view.
We also found that validators running the Prysm and Lighthouse clients recorded better performance in Medalla, on aggregate. However, given the ever changing macro-level conditions in the Testnet, as well as the fact that segmenting for client choice by graffiti is an imperfect way to do it, there are no strong conclusions we can come to with respect to the client choice and performance relationship.
It’s worth underlining, however, that what is crystal clear is that a large chunk of the determinants of performance lie outside of client choice and more closely relate to the robustness of the operator’s set-up.