Welcome to our eth2 insights series, presented by Elias Simos–Protocol Specialist at Bison Trails. In this series, Elias reveals insights he uncovered in his joint research effort with Sid Shekhar (Blockchain Research lead at Coinbase), eth2 Medalla—A journey through the underbelly of eth2’s final call before takeoff.
In this third post, Elias discusses the different parameters that govern validator effectiveness in eth2, proposes a methodology to rank validators, and examines how validators were distributed along those dimensions in Medalla.
Validators are the key consensus party in eth2. They are required to perform a range of useful tasks, and, if they perform their duties correctly and add value to the network, they stand to earn rewards for it. More concretely, validators in eth2 are rewarded for proposing blocks, getting attestations included, and whistleblowing on protocol violations. In Phase 0, things are a little simpler as whistleblower rewards are folded into the proposer function.
While the block proposer rewards far exceed those of attesters per unit, as every active validator is called upon to attest once per epoch, the bulk of the predictable rewards in Phase 0 will come from attestations—at a ratio of approximately 7:1 vs proposer rewards.
Taking into account the protocol rules that govern rewards, it quickly becomes apparent that not all validators will be made equal—and therefore the distribution of rewards across validators will not be uniform!
It then follows that if validators are interested in optimizing for rewards, which is what an economically rational actor would do, there is merit to thinking of their existence in eth2 in terms of their “effectiveness.”
In the grand scheme of things, being a proposer in eth2 is like a “party round” for validators. The protocol picks 32 block proposers in every epoch, and tasks them with committing attestations (and later transaction data) on-chain and working towards finalizing a block.
The probability of becoming a proposer in Phase 0, all else equal, is then 32/n where n is the total number of active validators on the network. As the number of network participants grows, the probability of proposing a block diminishes; and so it did in Medalla!
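For instance, with 70,000 active validators, each validator's per-epoch chance of proposing is 32/70,000 ≈ 0.046%, i.e. roughly one proposal every ~2,200 epochs, or about 10 days at 6.4 minutes per epoch.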
In Phase 0, a proposer is rewarded by the protocol for (i) including pending attestations collected from the P2P network, and (ii) including proposer and attester slashing proofs. The more valuable the attestations that the proposer includes are, the higher their reward will be.
As a result, there are two key vectors along which we can score proposers on an effectiveness spectrum: (a) how often a proposer actually proposes a block when they are called upon to do so, and (b) how many valuable attestations they manage to include in the blocks they propose.
Of the two, what a proposer can more directly control is how many blocks they actually propose from the slots the protocol allocates to them—a function of their overall uptime. This is because the number of valuable attestations a proposer includes in a proposed block depends not only on the proposer, but also on the attesters sending their votes in on time and on the aggregators aggregating valuable attestations for the proposer to include.
A coarse way to score proposer effectiveness, then, is the simple ratio proposed_slots : total_slots_attributed.
However, given that validators enter the network at different points in time, it is wise to control for the time each validator has been active (measured in epochs) and the diminishing probability of being selected as a proposer, as the total set of activated validators increases.
In eth2data.github.io we introduced a further refinement to the ratio in order to capture the difficulty factor: dividing the time-weighted ratio of proposed_slots : total_slots_attributed by the probability of proposing at least once, given a validator's activation epoch in the Testnet. This is the complement of the probability that they were allocated 0 slots over the n epochs they have been active for, such that:
P(p≥1) = 1 − P(p0)^n
Plotted over Medalla's ~14,700 epochs, this probability approaches 1 for validators active since genesis and falls off sharply for those activated closer to the present.
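As a back-of-the-envelope check, here is a minimal sketch of that curve, assuming a constant validator-set size (in reality Medalla's set grew over time, so the real computation uses the per-epoch set size):

    # Probability of being allotted at least one proposal slot over
    # `epochs_active` epochs, with 32 proposers drawn per epoch from a
    # set of `n_validators`. Constant set size is a simplifying assumption.
    def prob_at_least_one_proposal(epochs_active, n_validators):
        p_zero_per_epoch = 1 - 32 / n_validators
        return 1 - p_zero_per_epoch ** epochs_active

    for epochs in (100, 1_000, 5_000, 14_700):
        print(epochs, round(prob_at_least_one_proposal(epochs, 70_000), 3))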
With this in mind, we defined proposer effectiveness as:

n = epochs_active
P_ratio = proposed_slots / total_slots_attributed
P_time_weight = 1 / n
P_prob_weight = 1 − P(0_proposals)^n
P_effectiveness = [P_ratio * P_time_weight] / P_prob_weight
Given that the probability of proposing at least once diminishes fast in the epochs closer to the present, dividing by P_prob_weight disproportionately boosts the overall score of recently activated proposers.
A further set of optimizations to the P_effectiveness score was introduced to control for the distortions described above.
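Putting the pieces together, here is a minimal sketch of the score as defined above (function and parameter names are illustrative; the production pipeline behind eth2data.github.io may differ):

    # Proposer effectiveness: time-weighted proposal ratio, divided by the
    # probability of having been selected at least once while active.
    def proposer_effectiveness(proposed, allotted, epochs_active, n_validators):
        if allotted == 0 or epochs_active == 0:
            return 0.0  # no opportunities yet; treat the score as 0
        p_ratio = proposed / allotted
        p_time_weight = 1 / epochs_active
        p_prob_weight = 1 - (1 - 32 / n_validators) ** epochs_active
        return (p_ratio * p_time_weight) / p_prob_weight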
This is what the distribution of the ~70k validators that we tracked in Medalla looks like when scored for P_effectiveness.

At the edges of the distribution, we observe a large concentration of really ineffective proposers (10% of the whole) that have missed virtually every opportunity they had to propose a block, and of very effective proposers (5% of the whole) that have proposed in every slot they were allotted. Between those extremes we find an exceedingly large representation of proposers at the 40% level of P_effectiveness. The remainder approximately follows a normal distribution with a mode of 0.35.
So far so good, but proposer effectiveness is only ~⅛ of the story...
Broadly, the two key variables for attesters to optimize along in eth2 are (i) getting valuable attestations included on-chain (a function of uptime) and (ii) the inclusion delay. In plain terms, from an attestation rewards point of view, to optimize for rewards the attester (a) must always participate in consensus and (b) must participate swiftly when their time comes.
For example, with 1 being the minimum possible delay, an inclusion delay of 2 slots leads to the attester only receiving half of the maximum reward available. The expected rewards “moderator” distribution is presented below.
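As a quick sketch of that scaling (assuming, as described above, that the reward factor is simply the reciprocal of the delay):

    # Reward "moderator" as a function of inclusion delay: an attestation
    # included with a delay of d slots earns 1/d of the maximum reward.
    for delay in range(1, 9):
        print(f"delay={delay} slot(s) -> reward factor {1 / delay:.3f}")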
Given the above, the main categories we focused our attention on when scoring for attester effectiveness are:

- aggregate inclusion delay - measured as the average of the inclusion delays a validator has been subject to over time. A score of 1 means that the validator always got their attestations in without a delay, and maximized their rewards potential.
- uptime ratio - measured as the number of valuable attestations vs the time a validator has been active (in epochs). A ratio of 1 implies that the validator has been responsive in every round they have been called to attest in.

We defined the attester effectiveness score as:
A_effectiveness = uptime ratio * 100 / aggregate inclusion delay
To improve generalizability, we normalize the A_effectiveness score so that it tops out at 100%.
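A minimal sketch of this score, with the normalization applied as a simple cap (names are illustrative):

    # Attester effectiveness: uptime ratio scaled to a percentage and
    # penalized by the average inclusion delay, capped at 100%.
    def attester_effectiveness(uptime_ratio, avg_inclusion_delay):
        if avg_inclusion_delay < 1:
            return 0.0  # delays below the protocol minimum of 1 are invalid
        # the cap only binds in edge cases where measured uptime exceeds 1
        return min(uptime_ratio * 100 / avg_inclusion_delay, 100.0)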
Here we find that 12% of the active validators in the first ~14,500 epochs of Medalla scored below 10% in A_effectiveness–clearly a consequence of the fact that Medalla is a Testnet with no real value at stake, and likely exaggerated by the roughtime incident.
The final step for arriving at a comprehensive validator effectiveness score is combining both the attester and proposer effectiveness scores into one “master” score.
To simplify the process we define the master validator effectiveness score as:
V_effectiveness = A_effectiveness * ⅞ + P_effectiveness * ⅛
such that V_effectiveness reflects the ecosystem-wide expectation of the distribution of ETH rewards that validators stand to achieve by performing their attester and proposer duties, respectively.
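In code, the master score is a one-liner combining the two (a sketch):

    # Master score: attester duties carry 7/8 of expected rewards,
    # proposer duties the remaining 1/8.
    def validator_effectiveness(a_eff, p_eff):
        return a_eff * 7 / 8 + p_eff * 1 / 8

For example, a perfect attester record (100) paired with a mediocre proposer record (40) yields 100 * 7/8 + 40 * 1/8 = 92.5, illustrating how heavily attester duties dominate the score.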
Given the much heavier weighting of attester effectiveness in the score, the distribution of V_effectiveness ends up looking very much like that of A_effectiveness.
To test the accuracy of this approach to calculating V_effectiveness, we plotted the score against the total rewards that validators achieved over their lifetime in Medalla and tested how good a predictor of rewards V_effectiveness can be.
To ensure that the comparison is “apples-to-apples,” for the purpose of the exercise we only selected the group of validators that were active at genesis (20k unique indices).
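A minimal sketch of that test, assuming a hypothetical per-validator export with activation epochs, scores, and lifetime rewards (file and column names are illustrative):

    import pandas as pd

    df = pd.read_csv("medalla_validators.csv")      # hypothetical export
    genesis = df[df["activation_epoch"] == 0]       # validators active at genesis
    print(genesis["v_effectiveness"].corr(genesis["lifetime_rewards"]))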
When broadening the correlation matrix to capture a wider range of variables of interest, we find that the correlation between V_effectiveness and rewards drops to 50%—likely because the different entry points, and exposure to varying network conditions, become a moderating factor.
A few more observations worth surfacing are (i) that uptime is a lot more closely correlated to V_effectiveness than the inclusion delay is—and is thus likely what operators should optimize for first; (ii) that V_effectiveness improved significantly as Medalla matured—precisely because the aggregate inclusion delay improved greatly, as demonstrated by the 70% positive correlation between the delay and epochs_active; and (iii) that, as far as the attestation surplus introduced on-chain goes, there is only a weak relationship between a validator's effectiveness and the amount of surplus info they load the chain with—meaning that the top performers, from a rewards perspective, may only be marginally less net "pollutants."
When testing for validator effectiveness vs rewards in groups of validators aggregated by their most common denominator (a mix of graffiti, self id name, common withdrawal keys, and common eth1 deposit address), the results are equally strong!
The correlation between rewards and V_effectiveness stands at 80%, while in the distribution view we can discern three groups of operators that performed poorly (C), average (B), or well (A).
Zooming into the top performers (A) and surveying for client choice, we were able to identify only 30% of the ~17k validator indices in the group. 93% of those identified ran either Prysm or Lighthouse, with the representation of Teku, Nimbus, and Lodestar in the group at under 1% of the sample. Given that this distribution is not representative of the population, and that the unidentified 70% recorded the highest validator effectiveness scores, there is no conclusion to extract here with respect to the relationship between performance and client choice.
Zooming out again to the population level and segmenting the view of V_effectiveness distributions by client choice, Prysm and Lighthouse score at the top, with Teku, Lodestar, and Nimbus following.
What is perhaps the most robust finding here is that, even among the top-performing clients, there is a significantly wide distribution of in-group validator effectiveness–commensurate with the picture that the aggregate distribution of validators along their effectiveness score paints. This is a strong hint that client choice has only so much to do with validator performance. The remainder likely leans on the strength of the operator's design choices!
In this post, we developed a feature-complete validator effectiveness methodology that is not only a strong predictor of the ETH rewards achieved, but also takes a relative approach to scoring by normalizing scores.
Given that the majority of the factors governing rewards in eth2 are relative to the state of the network, we believe that, while computationally more demanding, the approach we introduce here paints a more accurate picture than methodologies that take a nominal view.
We also found that validators running the Prysm and Lighthouse clients recorded better performance in Medalla, on aggregate. However, given the ever-changing macro-level conditions in the Testnet, as well as the fact that segmenting for client choice by graffiti is imperfect, there are no strong conclusions we can draw with respect to the relationship between client choice and performance.
It’s worth underlining, however, that a large chunk of the determinants of performance clearly lie outside of client choice and relate more closely to the robustness of the operator’s set-up.