
eth2 Medalla Data Challenge: Q&A with Protocol Specialist Elias Simos

A deep dive with Elias into the challenges faced during the eth2 Medalla Data Challenge and how data-informed research can help improve eth2 in the progression to a mainnet launch.


By Melissa Nelson · Oct 30 2020

Q: Hey Elias! How did you get involved in the eth2 Medalla Data challenge, and how did your collaboration on the project come to take place?

I first saw it come up in one of the channels on the Bison Trails Slack. I've been doing work with data for the best part of the last three years; in my previous work I led the design and product development of a big data tool that helped us ingest all kinds of data from the market and then interpret it through different models. Between that experience, my training as an economist, and, I suppose, somewhat of an innate tendency, measuring things, playing around with data, and trying to figure out the truth behind it all comes a little more naturally to me.

So I saw this and thought it was a great idea–eth2 is a big initiative for Bison Trails, and besides, we could probably do a pretty good job at it–we had everything we needed, and people were supportive. After drafting a rough project roadmap, I started building up my understanding of eth2, how the protocol works, and what each key parameter and data field represents, so that I could both understand how the system works and interpret the data it produces.


"The context of what happened in the testnet overall is a very useful thing to have—both as a timestamp before we move on to the production phase of eth2, and also as an enabler in the iterative software development process of all the different parties that are participating in this ecosystem."


Then, as I started thinking a little bit more about where we'd get the necessary data and how we'd manipulate and process it, I quickly realized that in order to produce a high-quality piece of work, and not limit the scope of what I'd look at, I would need help. Thankfully, having been in the industry for a little while and having worked with blockchain data before, I knew Sid Shekhar, who now works at Coinbase as a Blockchain Research Lead. He and I talked about it for a bit and thought it was actually a pretty good fit. Sid brought that crucial extra technical oomph that is key for a high-performing team on a project like this!


"It's an intricate game. It's a dance of incentives between different network participants, so to speak. So, in this testnet phase, the point is to understand how consensus rules work in a production level environment, how they are enforced, and what the outliers are that potentially you weren't expecting to find."


Q: What is the importance of conducting data analysis like this in the testnet phase of eth2?

Medalla is a testnet emulating Phase 0 of eth2, where no transactions are enabled and only the consensus rules are implemented–meaning that this is a game of recording the truth, finalizing what we've recorded, and agreeing that it is the truth, so that we can accept it as such at any time in the future. It's an intricate game. It's a dance of incentives between different network participants, so to speak. So, in this testnet phase, the point is to understand how consensus rules work in a production-level environment, how they are enforced, and what the outliers are that you potentially weren't expecting to find.

When you put things into a live production environment you might end up with outcomes that you didn't plan for–and sometimes possibly adverse outcomes! It's particularly important that this takes place in a testnet because, quite literally, there is no real value at stake. And precisely because that is the case, doing deep data analysis on this environment can surface powerful insights that help ecosystem participants make sense of all the weird and wonderful things that took place in this game, and derive actionable lessons from them.

To give you an example–maybe a lot of people got slashed. What was going on when all these people were slashed? Was there an interesting correlation that we observed at the same time? Was there something else that was significant, that was also happening around the same time or before, that could also help explain why the slashings took place? Was it the operator’s fault? The client’s fault? Something else?
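
To give a sense of what that looks like in practice, here is a minimal sketch of the kind of cross-tabulation involved–the schema and numbers are hypothetical, not our actual Medalla data:

```python
import pandas as pd

# Hypothetical joined dataset: one row per slashing event, tagged with
# the epoch it happened in and the client the validator was running.
slashings = pd.DataFrame({
    "epoch":  [2800, 2801, 2801, 2802, 3550],
    "client": ["prysm", "lighthouse", "prysm", "prysm", "teku"],
})

# First question: are slashings clustered in time?
per_epoch = slashings.groupby("epoch").size()
print(per_epoch[per_epoch > 1])  # bursts suggest a shared cause

# Second question: do they skew toward one client or operator?
print(slashings["client"].value_counts(normalize=True))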

Creating a report like the one we did for the eth2 data challenge is really all about aligning on and retelling the story of what happened and how it happened. It is also about creating metrics to help tell that story–metrics that are hopefully transferable onto mainnet, and the multiple phases it's potentially going to roll out in, to help validators, operators, clients, and developers agree on a more common language.


Q: What challenges did you overcome in compiling this data?

First off, the main challenge was actually getting the data. I worked on getting data out of the beaconcha.in API, one of the first block explorers for eth2, and I believe I was their first customer for the paid version! So I spoke to the team, wrote a bunch of scripts while building up context on the data, and started downloading it into tables. Downloading the data in and of itself was a non-trivial process, as downloads and syncs of some of the larger datasets were getting interrupted frequently because of the sheer size of the data set.
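
To give a flavor of what those scripts looked like, here is a minimal sketch of a resumable download loop–assuming the beaconcha.in epoch endpoint and a simple checkpoint file; the real scripts, endpoints, and ranges differed:

```python
import json
import os
import time

import requests

API_BASE = "https://beaconcha.in/api/v1"  # public beaconcha.in API
CHECKPOINT = "attestations.checkpoint"    # last epoch fetched successfully
OUT_FILE = "attestations.jsonl"
LAST_EPOCH = 15000                        # illustrative end of the range

def last_done() -> int:
    """Resume from the checkpoint file if a previous run was interrupted."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    return -1

def fetch_epoch(epoch: int) -> dict:
    """Fetch one epoch, retrying with backoff on timeouts and rate limits."""
    for attempt in range(5):
        try:
            r = requests.get(f"{API_BASE}/epoch/{epoch}", timeout=30)
            if r.status_code == 200:
                return r.json()
            time.sleep(2 ** attempt)  # back off on 429s / 5xx
        except requests.RequestException:
            time.sleep(2 ** attempt)
    raise RuntimeError(f"epoch {epoch} failed after retries")

for epoch in range(last_done() + 1, LAST_EPOCH + 1):
    payload = fetch_epoch(epoch)
    with open(OUT_FILE, "a") as out:
        out.write(json.dumps(payload) + "\n")
    with open(CHECKPOINT, "w") as ckpt:
        ckpt.write(str(epoch))  # mark progress so a crash resumes here
```

The checkpoint file is what makes the interruptions survivable: a killed run picks up at the last completed epoch instead of starting the multi-gigabyte pull from scratch.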


"Besides the actual timeline delivery, the difficulty of the whole endeavor was a challenge. If you have to face these types of difficulties entirely alone, it's a whole different ball game. Teams rock in general. And a good team rocks exponentially."


After downloading the whole six gigabytes of attestations data–the largest chunk–I realized that the whole thing was wrong. It was random numbers! It would not correlate with what you would find on the front end of beaconcha.in or on other block explorers. I spoke to the beaconcha.in team and they were really open to feedback and helped us out–credit to them! These teams are fairly lean–we're not talking about huge corporations, we're talking about developers who are definitely mission-driven, but also strapped for time.

Once the team fixed it, we downloaded the data again. Then we hit the next hurdle: in the form we had it, the data consisted of attestations grouped together to save space in the network's P2P layer, so what we were seeing was an array of validator indices per attestation. In order to see where there were duplicates, and what kind of duplicates they were, we needed to blow this out, expanding our dataset from 15 million rows to 200 million rows to disaggregate these grouped attestations. As it turned out, if we didn't want to wait until next year for the results to come out, we would have to put more computing power behind those calculations.
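
For illustration, here is roughly what that blow-out step looks like in pandas, with a made-up miniature of the schema:

```python
import pandas as pd

# Made-up miniature of the schema: one row per aggregate attestation
# included on-chain; `validators` is the array of indices that signed it.
agg = pd.DataFrame({
    "slot":       [1024, 1024, 1025],
    "committee":  [3, 3, 3],
    "validators": [[11, 42, 97], [42, 97], [11, 97]],
})

# The blow-out: one row per (attestation, validator) pair.
# This is the step that took the dataset from ~15M to ~200M rows.
flat = agg.explode("validators").rename(columns={"validators": "validator"})

# Duplicate inclusions now show up as repeated (slot, committee, validator)
# triples–here, validators 42 and 97 were included twice for slot 1024.
dupes = flat[flat.duplicated(["slot", "committee", "validator"], keep=False)]
print(dupes)
```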

The challenge around all of this was that we were working towards a deadline. It's not like we had all the time in the world–every single hindrance we found along the way meant that we had even less time to write and submit the project. You can imagine, then: first the data being wrong took us back maybe three days, then not being able to blow out this calculation took us back maybe a day, then setting up the virtual machine and getting the data up there took us another half day, and then debugging the code so that we could run it all took us another day. And remember, this whole project–the bulk of the work–took place in about two weeks! I just enumerated five or six days that were effectively lost.

Thankfully we were two people, so we were able to manage it. We were doing parallel work and more exploratory stuff in the meantime–as the hurdles we met were being ironed out. I'm super thankful that I had such a rockstar teammate, Sid Shekhar, on board to be able to share the load. Besides the actual timeline delivery, the difficulty of the whole endeavor was a challenge. If you have to face these types of difficulties entirely alone, it's a whole different ball game. Teams rock in general. And a good team rocks exponentially!


Q: Were there any eth2 testnet insights your research found that could help improve eth2 as we move towards mainnet?

Yeah, absolutely. One example is that we started looking into how votes get aggregated in eth2, and found votes that weren't double votes or slashable offenses but, rather, the same vote getting included multiple times in the chain. Say every committee has about 128 members; then the number of votes you should see per committee per slot is 128. We saw something around 200 votes, which means there was a surplus of about 70 to 80% in votes that were included. We called that attestation bloat, as it inflates the amount of information that gets included in the chain. While it's not working against the chain and protocol rules per se, it is surplus information. Some of it is entirely justified by how messages get propagated in the network, but an 80% surplus (almost twice as much as you would expect) should probably raise some eyebrows.
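
As a back-of-the-envelope version of that metric (illustrative numbers only; real committee sizes in Medalla varied):

```python
import pandas as pd

# One row per vote included on-chain, duplicates and all
# (e.g. the disaggregated table from the blow-out step above).
votes = pd.DataFrame({
    "slot":      [1024] * 6 + [1025] * 2,
    "committee": [3] * 6 + [3] * 2,
    "validator": [11, 42, 97, 42, 97, 42, 11, 97],
})

COMMITTEE_SIZE = 4  # stand-in; Medalla committees were typically ~128

included = votes.groupby(["slot", "committee"]).size().rename("included")
# Bloat: how far inclusion exceeds one vote per committee member.
bloat_pct = 100 * (included / COMMITTEE_SIZE - 1)
print(bloat_pct)  # e.g. 50% bloat at slot 1024 in this toy data
```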

The more numerous these surplus observations are, the more difficult it becomes for the protocol to enforce some of its rules–like slashing, for example. The way you find misconduct is by surveying the past state of the chain. If you have a leaner state of attestations–effectively, a smaller history–the surveyor can look through all the inputs much faster, meaning it can look for more of these events and capture more of them.
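
To make "surveying the past state" concrete, here is a toy pass over attestation history that flags double votes–hypothetical fields only; a real detector checks the full eth2 slashing conditions, including surround votes:

```python
from collections import defaultdict

# Toy records: (validator, target_epoch, vote_root). A double vote is two
# *different* votes by the same validator for the same target epoch.
history = [
    (11, 300, "0xaa"),
    (42, 300, "0xaa"),
    (42, 300, "0xbb"),  # validator 42 voted twice for target epoch 300
    (97, 301, "0xcc"),
]

seen = defaultdict(set)
for validator, target, root in history:
    seen[(validator, target)].add(root)

# One linear scan over the included history: the leaner that history is,
# the faster (and more exhaustively) this survey can run.
double_votes = [key for key, roots in seen.items() if len(roots) > 1]
print(double_votes)  # [(42, 300)]
```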

Effectively, by leaving that attestation bloat unchecked, you make the work of those who enforce the punitive protocol rules harder, and therefore you make the network and the protocol perform less like they are intended to.

If you have an unfair environment, or a perceived unfair environment like that, it degrades the very purpose of the network. We all join to play by the same set of rules. But if one of the key rules that we're expecting to be enforced is actually not being enforced effectively, then people who go against protocol rules can more easily get away with it. And if that's the case, the marginal net return for misconduct increases! So it plays to the overall long-term health of the network to solve these issues now, before they snowball into something bigger.
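
That incentive argument fits in one line–a sketch with made-up numbers, not eth2's actual reward and penalty schedule:

```python
# E[return] = gain * P(not caught) - penalty * P(caught)
# Anything that lowers P(caught)–e.g. enforcement slowed by attestation
# bloat–raises the expected return of misconduct.

def expected_return(gain: float, penalty: float, p_caught: float) -> float:
    return gain * (1 - p_caught) - penalty * p_caught

print(expected_return(gain=1.0, penalty=32.0, p_caught=0.99))  # -31.67: not worth it
print(expected_return(gain=1.0, penalty=32.0, p_caught=0.02))  # +0.34: misconduct pays
```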


Q: How do you think that this research will benefit the eth2 community moving forward?

What we thought was so interesting, and important, about this exercise was enabling the community to reflect on and frame what happened in the testnet, show how it happened in numbers, introduce metrics to measure what happened, and accurately describe the conditions that were present during, or underlying, significant events in the testnet.

People in this industry work cross-collaboratively, but they also work in silos. I think the context of what happened in Medalla overall is a very useful thing to have–both as a timestamp before we move on to the production phase of eth2, and also as an enabler in the iterative software development process of all the different parties participating in this ecosystem.


"I hope that through our work, by leveraging a data-driven storytelling approach driven together with concrete metrics, we'll help people express the things that they observe in ways that can then be more easily diffused as knowledge across the community."


One thing that was a pronounced hindrance to climbing the eth2 learning curve was the relative fragmentation of information. There is no absolute source of truth out there, though people are trying their best! We found it difficult to navigate the information maze because there was no single source of truth about the latest state of the protocol.

Take, for example, trying to find out how a protocol rule is supposed to work, when it's enforced, under what conditions, and how that is reflected in a data field in the “Matrix.” We found that, in many cases, either the story isn't told with much context, or it's just a bunch of math, or the context is conflicting, or it's a stream of conversation that happens somewhere on a Reddit forum, a blog, or a chat–which makes it very hard to piece things together.

Everybody has different ways and frameworks of understanding what's going on, and language is somewhat fragmented. I hope that through our work, by leveraging a data-driven storytelling approach together with concrete metrics, we'll help people express the things that they observe in ways that can then be more easily diffused as knowledge across the community.


More eth2 insights

  • The first post in our eth2 Insights Series reveals important insights into the performance, and importance, of validator attestation aggregation in the eth2 network.
  • The second post in our eth2 Insights Series zooms into slashings in Medalla, examining their correlates and probable causes.
  • The third post in our eth2 Insights Series discusses the parameters governing validator effectiveness in eth2 and how validators were distributed along those parameters in Medalla.
  • The fourth post in our eth2 Insights Series discusses Medalla’s arc of development, the metrics to gauge overall network health, and shares perspective on eth2 Mainnet.

For Individuals

Are you an individual with a large amount of ETH? Please contact us to learn how to participate in eth2. ETH holders can also participate with LiquidStake powered by Bison Trails.



Become an eth2 Pioneer

It’s not too late to be an eth2 Pioneer. Learn more about the eth2 Pioneer Program for enterprise. We want you to have early access to build on the Beacon Chain!



—Interview by Melissa Nelson





