Blockchain client types have different uses within the context of a network. Examining the differences between the four basic client types can help you choose the right type for your needs.
Blockchains are decentralized networks that require many parties to sync with the network in a peer-to-peer manner. Each decentralized network may support multiple software implementations, or clients, which enable a user to take part in the network. A user can run one of these implementations in different configurations, called client types or nodes, and these nodes are the parties that make up the decentralized network.
Typically each node stores its own copy of the blockchain, and keeps track of incoming transactions in order to have an up-to-date view of the network. These nodes are necessary to ensure the correctness of the chain, prevent malicious activity from occurring, and maintain decentralized consensus.
However, the storage and computing requirements of blockchains can prove a barrier to entry for some users, and it is technically difficult to run nodes that require a full download of the blockchain. Thus, new ways are needed to trustlessly sync with the chain in a secure manner.
This post introduces the basic client types of a blockchain network—full nodes, archive (or archival) nodes, light clients, and stateless clients—detailing the characteristics of each and explaining their use cases within the context of a blockchain network.
The most standard type of node is a full node. Full nodes store the entire blockchain data on disk and verify all of the rules of the network, which include tasks such as participating in block validation, receiving and verifying all transactions, and generally serving the network with data.
Full nodes must also store a copy of the state, a data structure that holds the status of users in the network, such as the UTXO set in Bitcoin or all of the accounts and balances in Ethereum. In those networks the state is represented as a Merkle Tree and modified Merkle Patricia Trie, respectively. Full nodes are distinct from miners as miners simply reorder or remove transactions from the data received by nodes and then perform the mining process to solve cryptographic puzzles.
While clients in general must follow a formal specification, a given network can have different client implementations. For example, the Ethereum network consists mostly of Geth and Parity nodes (mostly Geth), and eth2 will support a large variety of client implementations including Prysm, Lighthouse, and Lodestar.
However, full nodes must keep track of a significant amount of information, which requires large amounts of storage and bandwidth (SSDs must be used due to the volume of read/write operations). In fact, as of early October 2020, the Bitcoin blockchain was around 300 GB and the Ethereum blockchain was around 500 GB on Geth. Additionally, although full nodes require 24/7 uptime and a high level of technical knowledge to maintain, there is generally no direct economic incentive to run one. As a result, many users running full nodes are businesses such as exchanges or infrastructure providers who rely on full nodes’ other benefits.
While running a full node is technically challenging, there are benefits to doing so. First, running a full node is the most secure way to access a network. It guarantees maximum self-sovereignty because you can trustlessly verify that all network rules are being followed. By running a full node you also improve the decentralization and overall health of the network by acting as a data provider and protecting other clients from being tricked by malicious nodes or miners.
Full nodes help secure the network by verifying all of the transactions, rather than just those which are relevant to them, and further secure the network by alerting other client types of invalid blocks. In some networks, certain other client types rely on full nodes to verify transactions, and cannot access the blockchain without connecting to a full node.
In some networks you may also be able to receive rewards for running a full node. For example, Celo aims to address the lack of economic incentives for operating a full node. The Celo network allows individuals who run non-validating full nodes to set gateway fees for answering requests and forwarding transactions on behalf of other types of clients. We hope to see further experimentation to incentivize participants to run full nodes in future blockchain networks.
While full nodes already store a large amount of data, archival nodes take it even further by storing everything included in a full node along with an archive of the historical states of the chain.
| Full Node | Archival Node |
| --- | --- |
| Stores current balance of any account in chain | Stores full balance history of any account in chain |
| Stores state of last few blocks in network | Stores history of every state change in network |
| Has information needed to re-compute historical network data | Has historical network data stored, does not need to re-compute to find |
An archival node can therefore be thought of as a full node with a massive amount of cached historical data. Importantly, an archival node does not provide any more validation or security as compared to full nodes.
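The difference between re-computing and looking up historical data can be illustrated with a toy ledger. This is a minimal sketch, not how any real client stores state: the `FullNode` and `ArchivalNode` classes, the `balance_at` method, and the transfer format are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# A toy block is a list of transfers: (sender, recipient, amount).
# All names in this sketch are hypothetical.
Block = list[tuple[str, str, int]]


def apply_block(state: dict[str, int], block: Block) -> dict[str, int]:
    """Return a new state after applying every transfer in the block."""
    new_state = dict(state)
    for sender, recipient, amount in block:
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


@dataclass
class FullNode:
    """Keeps every block but only the latest state."""
    genesis: dict[str, int]
    blocks: list[Block] = field(default_factory=list)
    latest: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.latest = dict(self.genesis)

    def add_block(self, block: Block) -> None:
        self.blocks.append(block)
        self.latest = apply_block(self.latest, block)

    def balance_at(self, account: str, height: int) -> int:
        # Historical data is re-computed by replaying blocks from genesis.
        state = dict(self.genesis)
        for block in self.blocks[:height]:
            state = apply_block(state, block)
        return state.get(account, 0)


@dataclass
class ArchivalNode(FullNode):
    """Additionally caches the state after every block."""
    history: list[dict[str, int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        super().__post_init__()
        self.history = [dict(self.genesis)]

    def add_block(self, block: Block) -> None:
        super().add_block(block)
        self.history.append(dict(self.latest))

    def balance_at(self, account: str, height: int) -> int:
        # Historical data is a direct lookup; no replay is needed.
        return self.history[height].get(account, 0)
```

Both classes return the same answers for any height. The archival variant trades storage (one cached state per block) for lookup speed, which mirrors the point that it adds no extra validation or security.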
As of October 2020, archival nodes on Ethereum occupy more than 5.3 TB of data. Syncing an archival node on a network with this much data traditionally takes approximately two weeks, though an infrastructure-as-a-service product may reduce that time dramatically: the Bison Trails ETH archival node is production ready in a few hours. Because spinning up an archival node on one’s own is such a heavy lift, very few archival nodes are actually run on the network, and they are typically run by entities such as block explorers, data analytics companies, and infrastructure providers.
Because full nodes have intense storage and uptime requirements, most users choose not to run them. Light clients, however, make blockchain networks accessible to resource-constrained devices, offering high security while requiring little computing power.
As a low-resource node, a light client allows users to sync with a blockchain in a cryptographically secure manner without actually storing the whole blockchain. Light clients can be used to know the state of an account, check that a transaction was confirmed, or watch for logged events.
Light clients operate by downloading and verifying a chain of block headers and requesting any other relevant information, such as transaction data, from full nodes. The header is the smallest unit that forms a chain and each header refers back to the previous block’s header. The block header stores a condensed version of information in the block, including the hash of the previous block, the timestamp, and the Merkle tree root.
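The way headers link to one another can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Header` fields follow the three items named above, the serialization and the use of a single SHA-256 are choices made for this example, and real networks also check proof of work or validator signatures, which are omitted here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    """Hypothetical, stripped-down block header."""
    prev_hash: bytes    # hash of the previous block's header
    timestamp: int
    merkle_root: bytes  # commitment to the block's contents

    def hash(self) -> bytes:
        payload = self.prev_hash + self.timestamp.to_bytes(8, "big") + self.merkle_root
        return hashlib.sha256(payload).digest()


def verify_header_chain(headers: list[Header]) -> bool:
    """Check that each header points to the hash of the header before it."""
    return all(
        child.prev_hash == parent.hash()
        for parent, child in zip(headers, headers[1:])
    )
```

Because each header commits to its predecessor, altering any earlier header changes its hash and breaks every link after it.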
This Merkle tree root is a representation of the state of a block such as the set of all transactions and it can be thought of as a condensed fingerprint of the information about the block. The goal of a light client is to verify and archive the headers and verify received information against the Merkle tree. Only the portion of the state that is relevant to the light client needs to be verified, and proofs received from full nodes can be verified against the Merkle root in the block header.
While light clients do not need to be run 24/7, they must connect to intermediary full nodes in order to request data and interact with the blockchain. Even so, verification is trust minimized, since the proofs can be checked whether or not the light client trusts the full node.
In Bitcoin, the method above is known as Simplified Payment Verification (SPV), and SPV clients trust downloaded headers as long as they belong to the longest chain. For a given transaction, full nodes provide light clients with an SPV proof and a Merkle path to the transaction in the tree as the data needed to verify the transaction. This method can be used for cross-chain interactions such as bridges or sidechains.
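Checking a Merkle path against the root in a header can be sketched as follows. This is a simplified illustration rather than Bitcoin's exact format: it uses a single SHA-256 and its own leaf encoding, and the function names `merkle_root_and_path` and `verify_inclusion` are hypothetical. The first function plays the role of the full node that builds the proof, and the second is what a light client would run.

```python
import hashlib

# Each path entry is (sibling_hash, sibling_is_on_the_right).
Path = list[tuple[bytes, bool]]


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_path(leaves: list[bytes], index: int) -> tuple[bytes, Path]:
    """Full-node side: return the root and the path for leaves[index].

    Assumes a non-empty list of leaves.
    """
    level = [h(leaf) for leaf in leaves]
    path: Path = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        sibling = index ^ 1
        path.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], path


def verify_inclusion(leaf: bytes, path: Path, root: bytes) -> bool:
    """Light-client side: fold the path up to the root and compare."""
    node = h(leaf)
    for sibling, sibling_on_right in path:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root
```

The light client only needs the leaf, the path, and the root from a header it already trusts. The path length grows with the logarithm of the number of transactions, which is why the proof stays small even for large blocks.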
Light clients are well suited for low capacity users, such as those using smartphones or browser extensions, and enable them to maintain a high security assurance about the state of a chain. While light clients do not write data to the network, they do make blockchains more accessible to a variety of users.
The design space of light clients is enormous and there is always room for improvement and need for more features. Light clients can use techniques from cryptography and distributed systems to construct complex yet innovative solutions. Below are some examples of cutting edge light client designs.
Celo’s ultralight client Plumo uses a mix of different cryptography techniques to achieve lightweight validation. In general, doing SPV verification for Proof of Stake networks is expensive as one needs to verify that ⅔ of the validating stake has signed on a block for a given header (and blocks occur frequently). Celo improves upon this by using epoch-based syncing where only one header is downloaded per epoch. In Celo, the validator set only changes once per epoch, and one epoch is one day, so the load on light clients is already drastically reduced as they only need to verify headers once per day rather than once per block.
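The two ideas in this paragraph, the stake threshold and the per-epoch saving, can be sketched with simple arithmetic. This is an illustration only and not Plumo's actual code: the function names are hypothetical, and the block time passed to `headers_per_day` is an assumed parameter, since the text above only states that one epoch is one day.

```python
def has_two_thirds_stake(stake_by_validator: dict[str, int], signers: set[str]) -> bool:
    """True if the signers hold at least two thirds of the total stake."""
    total = sum(stake_by_validator.values())
    signed = sum(stake for name, stake in stake_by_validator.items() if name in signers)
    # Integer comparison avoids floating-point rounding. Some protocols
    # require strictly more than two thirds; adjust the comparison if so.
    return 3 * signed >= 2 * total


def headers_per_day(block_time_seconds: int, epoch_based: bool) -> int:
    """Headers a light client must check per day, assuming one epoch = one day."""
    return 1 if epoch_based else 86_400 // block_time_seconds
```

With an assumed block time of 5 seconds, per-block syncing means checking 86,400 / 5 = 17,280 headers per day, while epoch-based syncing checks one.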
Further, cryptographic primitives such as BLS signatures can be used to aggregate the validators’ signatures, and SNARKs (proofs used to verify the correctness of a computation without having to execute it oneself) can be submitted by full nodes to prove the light client protocol, which consists of checking the header signatures of the last header of each epoch and any validator set changes. In fact, using SNARKs, one could relatively quickly prove validator set changes over the span of months. It is estimated that around 4 million gas is required to validate a proof for a given epoch and 20,000 gas for each additional transaction afterwards.
Light clients are also being improved through research. For context, storage and bandwidth requirements for SPV proofs scale linearly with the chain length and can still be a burden in larger blockchains. Flyclient is an efficient method for light client block header verification, which improves upon a technique called Non-Interactive Proofs of Proof-of-Work (NIPoPoWs).
Flyclient improves over previous NIPoPoW protocols by being compatible with variable difficulty and hashrate, and it also offers short inclusion proofs, which are 10x smaller than previous solutions. Flyclient operates by downloading only a logarithmic number of block headers (instead of having to download every block header) while storing only a single block header between executions.
With Flyclient, one can prove the whole chain is valid using as little information as possible, enabling easier cross-chain interoperability in decentralized protocols that require light client verification. Zcash specifically plans to use Flyclient research to implement a ZEC - ETH bridge (tZEC will implement a light client verification of the Zcash blockchain inside an Ethereum smart contract).
Blockchains depend on a shared state that corresponds to the values in a block at a given time. As explained earlier, the state changes after transactions are executed and is typically stored in a tree data structure such as a Merkle tree or Merkle Patricia Trie. However, the state can become very large, and rebuilding the tree for the purposes of verification can also become expensive. This can make node sync times very long, making it harder to run nodes and ultimately decreasing how many nodes are run.
A research initiative called “Stateless Ethereum” has the goal of making nodes in Ethereum easier and faster to spin up by having nodes require the bare minimum amount of information to ensure the validity of the state. This will enable nodes to begin functioning in minutes rather than days, which is an enormous improvement over the status quo.
The most traditional way to sync a node is by using the full sync method, which involves starting at the genesis block to sync, or alternatively by using fast sync, which starts requesting blocks from a trusted checkpoint and then switches to full sync as soon as it catches up. The closest iteration of Stateless Ethereum at the moment consists of using a sync mode called beam sync, which only pulls the data it needs to execute changes to the state instead of downloading the whole state.
In beam sync, a client begins watching and executing transactions as they happen and requests a witness (proof) for each block covering any information it does not have. Afterwards, the client can gradually rely more on its locally computed state as it builds up its own stored history of transactions.
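The role of a witness can be illustrated with a toy state tree. This is a minimal sketch and not Ethereum's actual witness format or trie: it uses a plain binary Merkle tree over sorted accounts, pads odd levels with a fixed empty hash, and only handles a debit that touches one account. The names `build_witness` and `stateless_debit` are hypothetical.

```python
import hashlib

Path = list[tuple[bytes, bool]]  # (sibling hash, sibling is on the right)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


EMPTY = h(b"")  # fixed padding node for odd-sized levels


def leaf(account: str, balance: int) -> bytes:
    return h(f"{account}:{balance}".encode())


def fold(node: bytes, path: Path) -> bytes:
    """Hash a leaf up to the root along its path."""
    for sibling, on_right in path:
        node = h(node + sibling) if on_right else h(sibling + node)
    return node


def build_witness(state: dict[str, int], account: str) -> tuple[bytes, Path]:
    """Run by a node holding full state: return the state root and a path."""
    names = sorted(state)
    level = [leaf(name, state[name]) for name in names]
    index = names.index(account)
    path: Path = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(EMPTY)
        path.append((level[index ^ 1], (index ^ 1) > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], path


def stateless_debit(pre_root: bytes, account: str, claimed_balance: int,
                    path: Path, amount: int) -> bytes:
    """Run by a client holding no state: verify the witness, execute the
    debit, and return the new state root."""
    if fold(leaf(account, claimed_balance), path) != pre_root:
        raise ValueError("witness does not match the state root")
    if claimed_balance < amount:
        raise ValueError("insufficient balance")
    return fold(leaf(account, claimed_balance - amount), path)
```

The client never sees the other accounts. It trusts only the state root, and the same path that proves the old balance also lets it compute the root after the change.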
It is worth noting that statelessness is a spectrum and a truly stateless client would not actually store any state itself (instead only storing the latest transactions, together with witnesses, to execute the next block). In practice, there will likely be a spectrum of stateful nodes where some nodes provide full information and some receive selected portions of it. For example, full state nodes would compute a witness and attach it to a block, whereas partial state nodes would only keep state for a small number of blocks, or would simply watch the state relevant to them and request the rest of the data from witnesses (zero state nodes would rely entirely on witnesses to verify blocks). Ethereum hopes to use stateless clients in eth2, since validators will be shuffled around shards and will need to get up to speed for validation quickly.
Most light clients operate under the trust assumption that the majority of miners/validators are honest and simply check that the majority of miners/validators have supported a given block rather than verifying the block themselves. However, a set of malicious nodes might be able to attack light clients and submit invalid blocks.
A strategy to protect against such behavior is to introduce a system of alerts so that honest full nodes can report an invalid block to light clients. Specifically, fraud proofs can be used to report dishonest behavior and additionally weaken the honest majority assumption. If a verifying node processes a block and finds that it is invalid, it can create a “fraud proof” containing information from the block and Merkle tree to convince any light client that the block is invalid.
Light clients could simply take this proof and verify the block themselves even if they are given no other data. With fraud proofs, light clients have full assurance about the state of a blockchain and are provided with a better security model as long as there is at least one honest node (1 of N). In a stateless validation setting, light clients would need to verify individual blocks only if they hear alarms (where the alarms are verifiable).
However, what happens if an attacker creates an invalid block but does not release data about the block (called the data availability problem)? Fishermen (actors who check for invalid blocks) would not have enough data to prove that the block is invalid. Furthermore, the resulting game between the fishermen and the attacker would become complicated, as the attacker could publish the data at any time if accused of bad behavior.
One solution is to create “proofs” of data availability by the use of erasure codes (e.g. Reed-Solomon codes), a coding technique that allows a piece of information to be divided into many pieces (codes) but reconstructed with only a subset of the pieces. Using erasure codes, light clients would be able to prove the data availability of a block probabilistically by downloading only certain chunks of data.
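The "reconstruct from a subset" property can be shown with a small Reed-Solomon-style sketch. It is an illustration under simplifying assumptions, not a production codec: data symbols are integers below a prime `P`, encoding evaluates the interpolating polynomial at extra points, and the names `encode` and `reconstruct` are hypothetical.

```python
P = 2**31 - 1  # a prime; all arithmetic is done modulo P


def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (x, y) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: list[int], n: int) -> list[tuple[int, int]]:
    """Extend k data symbols into n chunks (x, y).
    Chunks 0..k-1 carry the original data; the rest are redundancy."""
    base = list(enumerate(data))
    return [(x, lagrange_eval(base, x)) for x in range(n)]


def reconstruct(chunks: list[tuple[int, int]], k: int) -> list[int]:
    """Recover the k data symbols from any k distinct chunks."""
    subset = chunks[:k]
    return [lagrange_eval(subset, x) for x in range(k)]
```

With `k` data symbols extended to `n = 2k` chunks, any half of the chunks is enough to rebuild the data, so an attacker has to withhold more than half of them to make the block unrecoverable.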
Another source of improvement is using SNARKs or STARKs to create validity proofs, which are cryptographically verifiable proofs that allow block producers to prove to clients that a block satisfies some arbitrarily complex conditions. The light client would simply need to download the header, verify the proof, and then randomly sample some Merkle tree branches of erasure coded data for data availability checks.
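Why a handful of random samples gives strong assurance can be seen from a short calculation. The sketch below rests on two assumptions that are not stated in the text above: the attacker must withhold at least a fixed fraction of the chunks to block reconstruction (one half for a code that doubles the data), and the client samples chunks independently and uniformly. The function names are hypothetical.

```python
import math


def undetected_probability(samples: int, withheld_fraction: float = 0.5) -> float:
    """Upper bound on the chance that every sampled chunk is available
    even though the block cannot be reconstructed."""
    return (1 - withheld_fraction) ** samples


def samples_needed(target: float, withheld_fraction: float = 0.5) -> int:
    """Smallest number of samples that pushes the bound to target or below."""
    return math.ceil(math.log(target) / math.log(1 - withheld_fraction))
```

Each additional sample halves the bound under the default assumption, so the number of samples needed grows only with the logarithm of the desired confidence and not with the size of the block.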
It is clear that a wide ecosystem of client types is required in order to serve a variety of blockchain users and use cases and to keep blockchain networks truly healthy. While decentralized blockchain networks cannot exist without full nodes, the barriers to entry remain high and not every user can run a full node.
Light and stateless clients are therefore necessary to improve the accessibility and decentralization of blockchains by increasing participation—and are simply more convenient for most users. The easier validation is, the greater the chance that new nodes can sync with a chain, which makes the network as a whole more resilient to attacks.
The future of blockchain clients is exciting. As new research is implemented in blockchain clients, we will see designs that are drastically more functional, performant, and accessible. Novel cryptographic tools such as SNARKs and STARKs will accelerate the progress of light clients and lead to improvements in areas such as sharding and cross-chain protocols, or simply enable use cases that have not been imagined yet.
As we develop these systems further, by developing different technologies and adopting new trust models, our definition of validation and even decentralization may change as well. Lightweight verification has already enabled more robust social coordination with 1 of N trust models and we now realize that everyone is no longer required to validate everything in a blockchain.
And so, perhaps someday, anyone with just a smartphone and internet connection will be securely connected to blockchain networks and we will all have access to a truly global financial system.
Anyone building products and services with blockchain data needs access to reliable read/write nodes, as nodes are the access points into the entire ecosystem. But, developing and managing decentralized and resilient node infrastructure in-house is not a simple task—especially when trying to support a diverse range of blockchain protocols. Relying on a provider that rate-limits data usage, or only supports a few networks, is not an option for many businesses that anticipate rapid growth.
QT by Bison Trails is an infrastructure product designed for companies and entrepreneurs facing the challenges of developing and managing decentralized and resilient node infrastructure as they build secure Web 3.0 applications today. QT provides a robust link between off-chain systems and blockchain networks, making it significantly easier for companies to add blockchain support and expand their protocol coverage without investing to develop capabilities in-house.
“Whether you’re an established company looking to free up engineering resources, or you’re a team just getting started in the blockchain space and you want to build something with secure and reliable access to these chains, QT clusters make it fast and easy to build on any of these blockchains. We’re thankful to our early QT customers who helped make this a better product than anything that exists in the ecosystem.”
Joe Lallouz, CEO
In addition to offering full nodes, we’re proud to offer archival nodes as part of QT clusters. QT Archival includes complete block-by-block information about the state of the network, data not included in a full node’s ledger. Data and Machine Learning (ML) companies can make use of archival nodes without the hassle and expense of maintaining them in-house.
Bison Trails is a blockchain infrastructure company based in New York City. We built a platform for anyone who wants to participate in 19 new chains effortlessly. We also make it easy for anyone building Web 3.0 applications to connect to blockchain data from 27 protocols with QT. Our goal is for the entire blockchain ecosystem to flourish by providing robust infrastructure for the pioneers of tomorrow.