
eth2 Insights: Validator Effectiveness

The third post in our eth2 Insights Series discusses the parameters governing validator effectiveness in eth2 and how validators were distributed along those parameters in Medalla.


By Elias Simos · Nov 16 2020

Welcome to our eth2 Insights Series, presented by Elias Simos–Protocol Specialist at Bison Trails. In this series, Elias reveals insights he uncovered in his joint research effort with Sid Shekhar (Blockchain Research lead at Coinbase), eth2 Medalla—A journey through the underbelly of eth2’s final call before takeoff.

In this third post, Elias discusses the different parameters that govern validator effectiveness in eth2, proposes a methodology to rank validators, and examines how validators were distributed along those parameters in Medalla.


Key highlights and findings

  • In this post, I introduce a set of optimizations on the approach sketched out in eth2data.github.io: (i) normalizing the proposer and attester effectiveness scores, and (ii) aggregating the two into a “master” validator effectiveness score.
  • At a high level, validator effectiveness as defined here appears to be an excellent predictor of the ETH rewards potential of a validator in eth2.
  • We expect that the distribution of validator effectiveness and rewards earned in Mainnet will vary greatly.
  • Prysm and Lighthouse appear to have a clear edge over their competition in driving validator performance.
  • It seems, however, that client choice has only so much to do with validator performance. The remainder likely leans on the strength of the operator’s design choices.

Intro

Validators are the key consensus party in eth2. They are required to perform a range of useful tasks, and, if they perform their duties correctly and add value to the network, they stand to earn rewards for it. More concretely, validators in eth2 are rewarded for proposing blocks, getting attestations included, and whistleblowing on protocol violations. In Phase 0, things are a little simpler as whistleblower rewards are folded into the proposer function.

While the block proposer rewards far exceed those of attesters per unit, as every active validator is called upon to attest once per epoch, the bulk of the predictable rewards in Phase 0 will come from attestations—at a ratio of approximately 7:1 vs proposer rewards.

Taking into account the protocol rules that govern rewards, it quickly becomes apparent that not all validators will be made equal—and therefore the distribution of rewards across validators will not be uniform!

It then follows that if validators are interested in optimizing for rewards, which is what an economically rational actor would do, there is merit to thinking of their existence in eth2 in terms of their “effectiveness.”


Proposer effectiveness

In the grand scheme of things, being a proposer in eth2 is like a “party round” for validators. The protocol picks 32 block proposers in every epoch, and tasks them with committing attestations (and later transaction data) on-chain and working towards finalizing a block.

The probability of becoming a proposer in Phase 0, all else equal, is then 32/n where n is the total number of active validators on the network. As the number of network participants grows, the probability of proposing a block diminishes; and so it did in Medalla!


Figure 1: Probability of proposing a block in every epoch in Medalla
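The diminishing probability described above can be sketched in a few lines. This is an illustrative approximation (all names are ours, not from the original analysis code), assuming uniform random selection across the active validator set and ignoring the small chance of one validator drawing multiple slots in the same epoch:

```python
# Approximate per-epoch probability of being selected as a block
# proposer in eth2 Phase 0: 32 slots per epoch, one proposer per slot.

SLOTS_PER_EPOCH = 32

def proposal_probability(active_validators: int) -> float:
    """Approximate probability of proposing in a single epoch."""
    return SLOTS_PER_EPOCH / active_validators

# The probability shrinks as the validator set grows, as it did in Medalla:
for n in (20_000, 50_000, 80_000):
    print(f"{n:>6} validators -> {proposal_probability(n):.4%} per epoch")
```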


In Phase 0, a proposer is rewarded by the protocol for (i) including pending attestations collected from the P2P network, and (ii) including proposer and attester slashing proofs. The more valuable the attestations that the proposer includes are, the higher their reward will be.

As a result, there are two key vectors along which we can score proposers on an effectiveness spectrum: (a) how often a proposer actually proposes a block when they are called upon to do so, and (b) how many valuable attestations they manage to include in the blocks they propose.

Of the two, what a proposer can more effectively control is how many blocks they actually propose from the slots the protocol allocates to them—a function of their overall uptime. This is because the number of valuable attestations a proposer includes in a proposed block depends not only on the proposer, but also on the attesters sending their votes in on time and on the aggregators packaging valuable attestations for the proposer to include.

A coarse way to then score for proposer effectiveness is a simple ratio of proposed_slots : total_slots_attributed.

However, given that validators enter the network at different points in time, it is wise to control for the time each validator has been active (measured in epochs) and the diminishing probability of being selected as a proposer, as the total set of activated validators increases.

In eth2data.github.io we introduced a further optimization to the ratio in order to capture the difficulty factor, by dividing the time-weighted ratio of proposed_slots : total_slots_attributed by the probability of proposing at least once, given a validator’s activation epoch in the Testnet. This is the complement of the probability that they were allocated 0 slots in every one of the n epochs they have been active for, such that:

P(p≥1) = 1 − P(p0)^n

Which, plotted over ~14,700 epochs, looks like this:


Figure 2: Probability of proposing a block at least once in Medalla, over activation epochs


With this in mind we defined proposer effectiveness as:

n = epochs_active
P_ratio = proposed_slots / total_slots_attributed
P_time_weight = 1 / n
P_prob_weight = 1 − P(0_proposals)^n

P_effectiveness = (P_ratio * P_time_weight) / P_prob_weight

Given that the probability of proposing at least once diminishes fast in the epochs closer to the present, dividing by P_prob_weight disproportionately boosts the overall scores of recently activated proposers.
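Putting the pieces together, the score can be sketched as below. This is a hedged illustration, not the original analysis code: the names and the per_epoch_miss parameter (the per-epoch probability of drawing zero proposal slots, roughly 1 − 32/N for N active validators) are our assumptions.

```python
# Sketch of the P_effectiveness score defined above (illustrative names).

def proposer_effectiveness(proposed_slots: int,
                           attributed_slots: int,
                           epochs_active: int,
                           per_epoch_miss: float) -> float:
    if attributed_slots == 0 or epochs_active == 0:
        return 0.0
    p_ratio = proposed_slots / attributed_slots   # share of allotted slots used
    p_time_weight = 1 / epochs_active             # control for time active
    # Probability of proposing at least once over the active window
    p_prob_weight = 1 - per_epoch_miss ** epochs_active
    return (p_ratio * p_time_weight) / p_prob_weight
```

Note how a recently activated validator (small epochs_active) has a small p_prob_weight, which inflates the raw score.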

To control for the distortions described above, we introduce two further optimizations to the P_effectiveness score:

  1. normalizing the set of scores and giving them a percentile score out of 100
  2. excluding the last 2,000 epochs from the overall calculation


This is what the distribution of ~70k validators that we tracked in Medalla looks like, when scored for P_effectiveness.


Figure 3: Distribution of proposer effectiveness scores in Medalla


At the edges of the distribution, we observe a large concentration of really ineffective proposers (10% of the whole) that have missed virtually every opportunity they had to propose a block, and very effective proposers (5% of the whole) that have proposed in every slot they were allotted. Between those we find an exceedingly large representation of proposers at the 40% level of P_effectiveness. The remainder approximately follows a normal distribution with a mode of 0.35.

So far so good, but proposer effectiveness is only ~⅛ of the story...


Attester effectiveness

Broadly, the two key variables for attesters to optimize along in eth2 are (i) getting valuable attestations included on-chain (a function of uptime) and (ii) the inclusion delay. In plain terms, from an attestation rewards point of view, to optimize for rewards the attester (a) must always participate in consensus and (b) must participate swiftly when their time comes.

Rewards emission in eth2 is designed in such a way that the later an attestation gets included, the lesser the rewards that are emitted back to the attester.

For example, with 1 being the minimum possible delay, an inclusion delay of 2 slots leads to the attester only receiving half of the maximum reward available. The expected rewards “moderator” distribution is presented below.


Figure 4: Inclusion distance vs the corresponding moderator (%) of the max attester reward
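The moderator in Figure 4 scales as the reciprocal of the inclusion delay. A minimal sketch follows; this is a simplification for illustration (in the actual spec, a share of the attester reward is also carved out for the proposer who includes the attestation):

```python
from fractions import Fraction

def delay_moderator(inclusion_delay: int) -> Fraction:
    """Fraction of the maximum inclusion reward earned at a given delay."""
    if inclusion_delay < 1:
        raise ValueError("minimum possible inclusion delay is 1 slot")
    return Fraction(1, inclusion_delay)

# Delay 1 keeps the full reward; delay 2 halves it, and so on.
print([str(delay_moderator(d)) for d in (1, 2, 4, 8, 32)])
```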


Given the above, the two main categories we focused on when scoring for attester effectiveness are:

  1. Aggregate inclusion delay: measured as the average of the inclusion delays a validator has been subject to over time. A score of 1 means that the validator always got their attestations in without delay, and maximized their rewards potential.
  2. Uptime ratio: measured as the number of valuable attestations vs the time a validator has been active (in epochs). A ratio of 1 implies that the validator has been responsive in every round they have been called to attest in.

We defined the attester effectiveness score as:

A_effectiveness = uptime ratio * 100 / aggregate inclusion delay

To improve generalizability, we normalize the A_effectiveness score so that it tops out at 100%.
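A sketch of the attester score and its normalization, with illustrative function and variable names that are our assumptions rather than the original analysis code:

```python
# A_effectiveness = uptime_ratio * 100 / aggregate inclusion delay,
# then rescaled so the best validator in the set scores exactly 100.

def attester_effectiveness(included_attestations: int,
                           epochs_active: int,
                           avg_inclusion_delay: float) -> float:
    uptime_ratio = included_attestations / epochs_active
    return uptime_ratio * 100 / avg_inclusion_delay

def normalize(scores: list[float]) -> list[float]:
    """Rescale scores so the set tops out at 100."""
    top = max(scores)
    return [s * 100 / top for s in scores]

# A validator that attested every epoch with no delay scores 100:
print(attester_effectiveness(100, 100, 1.0))
```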


Figure 5: Distribution of attester effectiveness across 75,000 validator indices in Medalla


Here we find that 12% of the active validators in the first 14,500 epochs of Medalla scored below 10% in A_effectiveness: clearly a consequence of Medalla being a Testnet with no real value at stake, and likely exaggerated by the Roughtime incident.


Validator effectiveness

The final step for arriving at a comprehensive validator effectiveness score is combining both the attester and proposer effectiveness scores into one “master” score.

To simplify the process we define the master validator effectiveness score as:

V_effectiveness = A_effectiveness * ⅞ + P_effectiveness * ⅛

such that V_effectiveness reflects the ecosystem-wide expectation of the distribution of ETH rewards that validators stand to achieve by performing their attester and proposer duties, respectively.
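The master score is then a one-liner; the ⅞ : ⅛ weights mirror the approximate 7:1 split of attester vs proposer rewards noted earlier in the post (function name is illustrative):

```python
# Master validator effectiveness score, weighted toward attester duties.

def validator_effectiveness(a_eff: float, p_eff: float) -> float:
    return a_eff * 7 / 8 + p_eff * 1 / 8

# A perfect proposer record can only lift a mediocre attester so far:
print(validator_effectiveness(50.0, 100.0))
```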

Given the much heavier weighting of attester effectiveness in the score, the distribution of V_effectiveness ends up looking very much like that of A_effectiveness.


Figure 6: Distribution of validator effectiveness across 75,000 validator indices in Medalla


In order to test the accuracy of this approach to calculating V_effectiveness, we plotted the score against the total rewards that the validators achieved over their lifetime in Medalla and tested for how good a predictor of rewards V_effectiveness can be.

To ensure that the comparison is “apples-to-apples,” for the purpose of the exercise we only selected the group of validators that were active at genesis (20k unique indices).
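As a rough illustration of the kind of test involved (the actual analysis ran an OLS regression over the ~20k genesis validators; the data and names below are entirely hypothetical), a Pearson correlation between effectiveness scores and rewards can be computed with the standard library:

```python
import statistics as stats

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = stats.fmean(xs), stats.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy data in which rewards rise with effectiveness:
effectiveness = [10, 30, 50, 70, 90]
rewards_eth = [0.2, 0.8, 1.1, 1.6, 2.1]
print(round(pearson_r(effectiveness, rewards_eth), 3))
```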


Figure 7: Validator effectiveness score vs ETH rewards achieved over Medalla’s lifecycle, for validators that were active since Genesis



Table 1: OLS regression results table of ETH rewards (dependent variable) vs validator effectiveness score (independent variable)


The result was an astounding 86.5% correlation between the two variables—meaning that the validator effectiveness score is an excellent predictor of the ETH rewards a validator stands to achieve in eth2!

When broadening the correlations matrix to capture a wider range of variables of interest, we find that the correlation between V_effectiveness and rewards drops to 50%—likely because the different entry points, and exposure to varying network conditions, become a moderating factor.


Figure 8: Correlations between key validator effectiveness variables


A few more observations worth surfacing:

  1. Uptime is much more closely correlated with V_effectiveness than inclusion delay is, and is thus likely what operators should optimize for first.
  2. V_effectiveness improved significantly as Medalla matured, precisely because the aggregate inclusion delay improved greatly, as demonstrated by the 70% positive correlation between the delay and epochs_active.
  3. As for the attestation surplus introduced on-chain, there is only a weak relationship between a validator’s effectiveness and the amount of surplus info they load the chain with, meaning that the top performers, from a rewards perspective, may be only marginally less net “pollutants.”

When testing for validator effectiveness vs rewards in groups of validators aggregated by their most common denominator (a mix of graffiti, self id name, common withdrawal keys, and common eth1 deposit address), the results are equally strong!
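A minimal sketch of this grouping step, with entirely hypothetical operator identifiers, scores, and rewards (the real grouping keyed off graffiti, self-id names, withdrawal keys, and eth1 deposit addresses):

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (operator_group, v_effectiveness, eth_rewards)
validators = [
    ("op-a", 92.0, 1.9), ("op-a", 88.0, 1.7),
    ("op-b", 61.0, 1.1), ("op-b", 55.0, 0.9),
    ("op-c", 12.0, 0.1), ("op-c", 18.0, 0.2),
]

groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
for operator, eff, rewards in validators:
    groups[operator].append((eff, rewards))

# Group-level means let us compare operators rather than single indices.
for operator, rows in sorted(groups.items()):
    mean_eff = fmean(e for e, _ in rows)
    mean_rewards = fmean(r for _, r in rows)
    print(operator, round(mean_eff, 1), round(mean_rewards, 2))
```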


Figure 9: Validator effectiveness score vs ETH rewards and distribution of validator effectiveness scores grouped by operator group


The correlation between rewards and V_effectiveness stands at 80%, while in the distribution view we can discern three groups of operators that performed poorly (C), average (B), or well (A).

Zooming into the top performers (A) and filtering by client identifier, we find that, surprisingly, validators running Lodestar and Teku performed better than those running Lighthouse or Prysm, with a big caveat: the representation of validators running Lodestar and Teku in group (A) is too small to be statistically significant.


Table 2: Summary view of client choice of top performing operator groups


Finally, when zooming back out to the population level and segmenting the view of V_effectiveness distributions by client choice, it becomes clearer why Prysm, and more recently Lighthouse, dominate client choice among validators in Medalla.


Figure 10: Validator effectiveness score distributions by client choice–population view



Table 3: Summary view of validator effectiveness by client choice



Prysm and Lighthouse appear to differentiate significantly from other clients with respect to the average validator performance recorded in Medalla. The result becomes even more pronounced when segmenting out the bottom performers (group C), which likely represent a type of negligence that we can’t expect on Mainnet, where real ETH is at stake.

On the other end of the distribution, the Nimbus client appears to be the one with the most ground to cover.

What is perhaps the most valuable finding of all, however, is that even among the top performing clients there is a wide in-group distribution of validator effectiveness. This is a strong hint that client choice has only so much to do with validator performance. The remainder likely leans on the strength of the operator’s design choices!


Concluding remarks

In this post, we developed a feature-complete validator effectiveness methodology that is not only a strong predictor of the ETH rewards achieved, but also takes a relative approach to scoring by normalizing scores.

Given that the majority of the factors governing rewards in eth2 are relative to the state of the network, we believe that, while computationally more demanding, the approach introduced here paints a more accurate picture than methodologies that take a nominal view.

We also found that the lead the Prysm and Lighthouse clients currently have in market adoption is well-founded, as they very likely enable better performance. It’s worth underlining, however, that seemingly a large chunk of the determinants of performance lie outside of client choice and more closely relate to the robustness of the operator’s set-up.

More broadly, taking the 10,000 ft view, and using the framing developed over the course of this post, we can liken client software development to a performance sport. This is what the protocol rewards for, after all. Clients will find adoption when they help users achieve maximum rewards: first by playing defense (e.g. helping improve a validator’s security and uptime), and then by playing offense (e.g. by getting attestations included on-chain fast).

But, as with most performance sports, the distribution of winners and not-winners is a power law! And when there are only a handful of categories to compete on, the power law distribution becomes less easily breakable—eventually yielding to a network concentrated around one client choice and all the systemic risks that that implies.

If client variety and a more equal distribution across the whole network is desired, protocol designers should perhaps consider adding to the parameters that the protocol rewards for (e.g. lean attestations, better aggregation, etc), so that different clients can specialize in different niches and cater to user groups with different preference sets.


More eth2 insights

  • The first post in our eth2 Insights Series reveals important insights into the performance, and importance, of validator attestation aggregation in the eth2 network.
  • The second post in our eth2 Insights Series zooms into slashings in Medalla, examining their correlates and probable causes.
  • The third post in our eth2 Insights Series discusses the parameters governing validator effectiveness in eth2 and how validators were distributed along those parameters in Medalla.
  • The fourth post in our eth2 Insights Series discusses Medalla’s arc of development, the metrics to gauge overall network health, and shares perspective on eth2 Mainnet.
  • Read our Q&A with Elias on the challenges faced during the eth2 Medalla Data Challenge and how data-informed research can help improve eth2 in the progression to a mainnet launch.

For Individuals

Are you an individual with a large amount of ETH? Please contact us to learn how to participate in eth2. ETH holders can also participate with LiquidStake powered by Bison Trails.



Become an eth2 Pioneer

It’s not too late to be an eth2 Pioneer. Learn more about the eth2 Pioneer Program for enterprise. We want you to have early access to build on the Beacon Chain!



About Bison Trails


Our mission is to provide superior infrastructure on multiple blockchains, to strengthen the entire ecosystem, and enable the pioneers of tomorrow.

Pioneering Blockchain Infrastructure®

Bison Trails is a blockchain infrastructure company based in New York City. We built a platform for anyone who wants to participate in 19 new chains effortlessly. We also make it easy for anyone building Web 3.0 applications to connect to blockchain data from 27 protocols with QT. Our goal is for the entire blockchain ecosystem to flourish by providing robust infrastructure for the pioneers of tomorrow.



THIS DOCUMENT IS FOR INFORMATIONAL PURPOSES ONLY. PLEASE DO NOT CONSTRUE ANY SUCH INFORMATION OR OTHER MATERIAL CONTAINED IN THIS DOCUMENT AS LEGAL, TAX, INVESTMENT, FINANCIAL, OR OTHER ADVICE. THIS DOCUMENT AND THE INFORMATION CONTAINED HEREIN IS NOT A RECOMMENDATION OR ENDORSEMENT OF ANY DIGITAL ASSET, PROTOCOL, NETWORK OR PROJECT. HOWEVER, BISON TRAILS (INCLUDING ITS AFFILIATES AND/OR EMPLOYEES) MAY HAVE, OR MAY IN THE FUTURE HAVE, A SIGNIFICANT FINANCIAL INTEREST IN, AND MAY RECEIVE COMPENSATION FOR SERVICES RELATED TO, ONE OR MORE OF THE DIGITAL ASSETS, PROTOCOLS, NETWORKS, ENTITIES, PROJECTS AND/OR VENTURES DISCUSSED HEREIN.

THE RISK OF LOSS IN CRYPTOCURRENCY, INCLUDING STAKING, CAN BE SUBSTANTIAL AND NOTHING HEREIN IS INTENDED TO BE A GUARANTEE AGAINST THE POSSIBILITY OF LOSS. THIS DOCUMENT AND THE CONTENT CONTAINED HEREIN ARE BASED ON INFORMATION WHICH IS BELIEVED TO BE RELIABLE AND HAS BEEN OBTAINED FROM SOURCES BELIEVED TO BE RELIABLE BUT BISON TRAILS MAKES NO REPRESENTATION OR WARRANTY, EXPRESS OR IMPLIED, AS TO THE FAIRNESS, ACCURACY, ADEQUACY, REASONABLENESS OR COMPLETENESS OF SUCH INFORMATION.

ANY USE OF BISON TRAILS’ SERVICES MAY BE CONTINGENT ON COMPLETION OF BISON TRAILS’ ONBOARDING PROCESS, INCLUDING ENTRANCE INTO APPLICABLE LEGAL DOCUMENTATION AND WILL BE, AT ALL TIMES, SUBJECT TO AND GOVERNED BY BISON TRAILS’ POLICIES, INCLUDING WITHOUT LIMITATION, ITS TERMS OF SERVICE AND PRIVACY POLICY, AS MAY BE AMENDED FROM TIME TO TIME.
