Ethereum Cleanup Plan: A Long-term Solution to Address On-chain Bloat and Complexity

Ethereum's Possible Future: The Purge

Author: Vitalik Buterin

One of the challenges facing Ethereum is that, by default, any blockchain protocol's bloat and complexity grow over time. This happens in two places:

Historical data: every transaction ever made and every account ever created must be stored permanently by all clients, and downloaded by any new client doing a full sync. This causes client load and sync time to keep increasing over time, even if the chain's capacity stays the same.

Protocol functionality: Adding new features is much easier than removing old ones, leading to increased code complexity over time.

For Ethereum to sustain itself over the long term, we need strong counter-pressure against both of these trends, reducing complexity and bloat over time. At the same time, we must preserve one of the key properties that makes blockchains great: permanence. You can put an NFT, a love letter in transaction calldata, or a smart contract holding a million dollars on chain, go into a cave for ten years, and come out to find it still there, waiting for you to read and interact with. For dapps to fully decentralize and confidently remove their upgrade keys, they need to be confident that their dependencies will not upgrade in ways that break them - especially L1 itself.

If we are determined to strike a balance between these two demands, and to minimize or reverse bloat, complexity, and decay while preserving continuity, it is absolutely possible. Living organisms can do it: while most organisms age over time, a lucky few do not. Even social systems can have extremely long lifespans. Ethereum has already succeeded in some cases: proof of work is gone, the SELFDESTRUCT opcode has mostly vanished, and beacon chain nodes already store old data for only up to six months. Finding this path for Ethereum in a more generalized way, and moving toward a long-term stable end state, is the ultimate challenge for Ethereum's long-term scalability, technical sustainability, and even security.


The Purge: main objectives

Reduce client storage requirements by minimizing or eliminating the need for every node to permanently store all history, and perhaps eventually even state.

Reduce protocol complexity by eliminating unnecessary features.

Contents:

History expiry

State expiry

Feature cleanup

History expiry

What problem does it solve?

As of the time of writing, a fully synced Ethereum node requires about 1.1 TB of disk space for the execution client, plus another few hundred gigabytes for the consensus client. The vast majority of this is history: data about historical blocks, transactions, and receipts, most of which are many years old. This means that even if the gas limit never increases at all, a node's size will keep growing by a few hundred gigabytes per year.

What is it and how does it work?

A key simplifying feature of the history storage problem is that, because each block points to the previous block via hash links (and other structures), consensus on the present is sufficient for consensus on history. As long as the network has consensus on the latest block, any historical block, transaction, or state (account balance, nonce, code, storage) can be provided by any single participant along with a Merkle proof, and the proof allows anyone else to verify its correctness. While consensus is an N/2-of-N trust model, history is a 1-of-N trust model.
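The hash-link argument above can be sketched in a few lines: given only the hash of a trusted head, a run of blocks served by a single untrusted peer can be fully verified by walking the parent-hash links backwards. The block format and names below are a toy stand-in (SHA-256 over JSON instead of Ethereum's Keccak over RLP):

```python
# Sketch (not a real client): verifying history against a trusted head
# using only hash links. All structures here are hypothetical.
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents (stand-in for Ethereum's RLP + Keccak)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_chain(n: int) -> list[dict]:
    """Build a toy chain where each block commits to its parent's hash."""
    chain = [{"number": 0, "parent_hash": "0" * 64, "txs": ["genesis"]}]
    for i in range(1, n):
        chain.append({"number": i,
                      "parent_hash": block_hash(chain[-1]),
                      "txs": [f"tx-{i}"]})
    return chain

def verify_history(trusted_head_hash: str, blocks: list[dict]) -> bool:
    """Check that `blocks` (oldest first) link hash-to-hash up to the
    trusted head. Any single peer can serve them; we trust no one."""
    expected = trusted_head_hash
    for block in reversed(blocks):
        if block_hash(block) != expected:
            return False
        expected = block["parent_hash"]
    return True

chain = make_chain(10)
head = block_hash(chain[-1])
assert verify_history(head, chain)          # honest history checks out
tampered = [dict(b) for b in chain]
tampered[3]["txs"] = ["evil-tx"]
assert not verify_history(head, tampered)   # any tampering breaks a link
```

This is what makes the 1-of-N trust model work: a single honest (or even dishonest) data provider is enough, because verification is purely local.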

This gives us many options for how to store history. A natural choice is a network where each node stores only a small fraction of the data. This is how torrent networks have worked for decades: while the network as a whole stores and distributes millions of files, each participant stores and distributes only a few of them. Perhaps counterintuitively, this approach does not even necessarily reduce the robustness of the data. If, by making nodes cheaper to run, we can get to a network with 100,000 nodes where each node stores a random 10% of the history, then every piece of data is replicated 10,000 times - exactly the same replication factor as a 10,000-node network where every node stores everything.
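The replication arithmetic above can be checked with a tiny simulation (scaled down from 100,000 nodes; all numbers are illustrative): if each node stores a random 10% of the items, the expected number of copies of each item is simply nodes × 10%.

```python
# Sketch: replication factor when each node stores a random fraction of
# history. Scaled down from the article's 100,000-node example.
import random

random.seed(0)
NODES, ITEMS, FRACTION = 2_000, 50, 0.10

copies = [0] * ITEMS
for _ in range(NODES):
    # Each node picks a random 10% slice of the history items to store.
    for item in random.sample(range(ITEMS), int(ITEMS * FRACTION)):
        copies[item] += 1

expected = NODES * FRACTION  # 200 copies of each item, on average
assert sum(copies) / len(copies) == expected
# Every item ends up replicated close to the expected factor.
assert all(abs(c - expected) / expected < 0.5 for c in copies)
```

The point is that robustness depends on the replication factor, not on every node storing everything; spreading random slices across many cheap nodes achieves the same redundancy.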

Ethereum has already started moving away from the model where all nodes store all history forever. Consensus blocks (i.e. the parts related to proof-of-stake consensus) are stored for about 6 months. Blobs are stored for about 18 days. EIP-4444 aims to introduce a one-year storage period for historical blocks and receipts. The long-term goal is to have a unified period (perhaps around 18 days) during which every node is responsible for storing everything, and then a peer-to-peer network of Ethereum nodes storing older data in a distributed fashion.

Erasure codes can be used to increase robustness while keeping the replication factor the same. In fact, blobs are already erasure-coded to support data availability sampling. The simplest solution may well be to reuse that erasure coding, and to put execution and consensus block data into blobs as well.
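The idea behind erasure coding can be sketched with a toy Reed-Solomon-style code over a small prime field (purely illustrative; the real blob scheme in EIP-4844 uses KZG commitments over a large field): k data chunks become n coded chunks, and any k surviving chunks suffice to recover the data.

```python
# Toy Reed-Solomon-style erasure code: the data defines a polynomial of
# degree < k, coded chunks are its evaluations, and ANY k chunks can be
# interpolated back into the data. Illustration only, not EIP-4844.
P = 65537  # small prime field; real schemes use a much larger field

def encode(data: list[int], n: int) -> list[tuple[int, int]]:
    """Treat `data` as polynomial coefficients and evaluate at n points;
    each (x, y) pair is one coded chunk."""
    def poly(x: int) -> int:
        return sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def mul_linear(b: list[int], a: int) -> list[int]:
    """Multiply polynomial b (coefficient list) by (x - a), mod P."""
    r = [0] * (len(b) + 1)
    for i, c in enumerate(b):
        r[i] = (r[i] - a * c) % P
        r[i + 1] = (r[i + 1] + c) % P
    return r

def decode(chunks: list[tuple[int, int]], k: int) -> list[int]:
    """Recover the k coefficients from any k chunks via Lagrange
    interpolation over GF(P)."""
    xs, ys = zip(*chunks[:k])
    coeffs = [0] * k
    for j in range(k):
        basis, denom = [1], 1
        for m in range(k):
            if m != j:
                basis = mul_linear(basis, xs[m])
                denom = denom * (xs[j] - xs[m]) % P
        scale = ys[j] * pow(denom, -1, P) % P
        for i in range(k):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % P
    return coeffs

data = [42, 7, 123, 9999]               # k = 4 original chunks
chunks = encode(data, 8)                # n = 8 coded chunks (2x expansion)
assert decode(chunks[::2], 4) == data   # any 4 survivors suffice
assert decode(chunks[2:6], 4) == data
```

Compare this with plain replication: with a 2x expansion, replication survives the loss of one specific copy, while the erasure code survives the loss of any half of the chunks.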


What are the connections with existing research?

EIP-4444;

Torrents and EIP-4444;

Portal Network;

Portal network and EIP-4444;

Distributed storage and retrieval of SSZ objects in Portal;

How to increase gas limit (Paradigm).

What is left to do, and what are the trade-offs?

The main remaining work is to build and integrate a concrete distributed solution for storing history - at least execution history, but eventually consensus history and blobs as well. The two simplest solutions are to drop in an existing torrent library, or to use Portal Network, an Ethereum-native solution. Once either is in place, we can turn on EIP-4444. EIP-4444 itself does not require a hard fork, but it does require a new network protocol version, so it is valuable to enable it for all clients at the same time; otherwise there is a risk of clients failing because they connect to other nodes expecting to download the full history, but do not actually get it.

The main trade-off is in how hard we try to make "ancient" history available. The easiest solution would be to just stop storing ancient history tomorrow, and rely on existing archive nodes and various centralized providers to replicate it. This is easy, but it weakens Ethereum's standing as a permanent record repository. The harder but safer path is to first build and integrate the torrent network for storing history in a distributed way. Here, "how hard we try" has two dimensions:

How hard do we try to ensure that a maximally large set of nodes really stores all the data?

How deeply do we integrate historical storage into the protocol?

A maximally paranoid approach to (1) would involve proofs of custody: effectively requiring every proof-of-stake validator to store some percentage of history, and to periodically cryptographically prove that it is doing so. A more moderate approach is to set a voluntary standard for the percentage of history each client stores.
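A minimal sketch of the proof-of-custody idea, with an entirely hypothetical challenge scheme: the validator's response binds a private secret, a fresh nonce, and the full chunk contents, so it cannot be produced by someone who is not actually storing the chunk.

```python
# Hypothetical proof-of-custody challenge sketch. The names and the
# scheme are illustrative, not Ethereum's actual custody design (which
# reveals or ZK-proves the secret rather than sharing it up front).
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

class Validator:
    def __init__(self, secret: bytes, assigned_chunks: dict[int, bytes]):
        self.secret = secret
        self.chunks = assigned_chunks      # chunk index -> chunk data

    def respond(self, chunk_index: int, nonce: bytes) -> bytes:
        # The response commits to secret, nonce, AND chunk contents, so
        # it cannot be precomputed or produced without the chunk.
        return h(self.secret, nonce, self.chunks[chunk_index])

def verify(response: bytes, secret: bytes, nonce: bytes,
           true_chunk: bytes) -> bool:
    # The challenger recomputes the expected response from the true data.
    return response == h(secret, nonce, true_chunk)

chunk = b"block 1,000,000 body"
v = Validator(secret=b"s3cret", assigned_chunks={0: chunk})
resp = v.respond(0, nonce=b"challenge-42")
assert verify(resp, b"s3cret", b"challenge-42", chunk)
assert not verify(resp, b"s3cret", b"challenge-43", chunk)  # stale nonce
```

The fresh nonce per challenge is what prevents a validator from answering once, discarding the data, and replaying the old answer.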

For (2), a basic implementation involves only work that is already done today: Portal already stores ERA files containing the entire Ethereum history. A more thorough implementation would involve actually hooking this into the sync process, so that someone who wants to sync a full-history node or an archive node could do so by syncing directly from the Portal network, even if no other archive nodes are online.

How does it interact with other parts of the roadmap?

If we want to make it extremely easy to run or spin up a node, then reducing history storage requirements is arguably even more important than statelessness: of the 1.1 TB a node requires, about 300 GB is state, and the remaining ~800 GB is history. The vision of an Ethereum node running on a smartwatch and syncing in just a few minutes is only achievable if both statelessness and EIP-4444 are implemented.

Limiting history storage also makes it more viable for newer Ethereum node implementations to support only the latest version of the protocol, which makes them much simpler. For example, many lines of code can now safely be removed because the empty storage slots created during the 2016 DoS attacks have all been deleted. Now that the switch to proof of stake is history, clients can safely remove all proof-of-work-related code.

State expiry

What problem does it solve?

Even if we eliminate the need for clients to store history, a client's storage requirement will keep growing by about 50 GB per year as the state keeps growing: account balances and nonces, contract code, and contract storage. Users can pay a one-time fee and impose a storage burden on present and future Ethereum clients forever.

State is harder to "expire" than history, because the EVM is fundamentally designed around the assumption that once a state object is created, it exists forever and can be read by any transaction at any time. If we introduce statelessness, there is an argument that this problem isn't so bad: only a specialized class of block builders needs to actually store state, while all other nodes (even inclusion list production!) can run statelessly. However, there is also a view that we don't want to lean too heavily on statelessness, and that eventually we may want state expiry to keep Ethereum decentralized.


What is it and how does it work?

Today, when you create a new state object (which can happen in one of three ways: (i) sending ETH to a new account, (ii) creating a new account with code, (iii) setting a previously untouched storage slot), that state object stays in the state forever. What we want instead is for objects to automatically expire over time. The key challenge is doing this in a way that achieves three goals:

Efficiency: No need for extensive additional computations to run the expiry process.

User-friendliness: If someone enters a cave for five years and comes back, they should not lose access to their Ether, ERC20, NFT, and CDP positions...

Developer-friendliness: developers should not have to switch to a completely unfamiliar mental model. Additionally, applications that are ossified today and no longer updated should continue to work reasonably well.

The problem is easy to solve if you don't have to meet these goals. For example, you could have every state object also store an expiry-date counter (which could be extended by burning ETH, and which might be extended automatically on any read or write), and have a process that loops through the state removing expired objects. However, this introduces extra computation (and even extra storage requirements), and it certainly fails the user-friendliness requirement. Developers would also struggle to reason about edge cases where stored values sometimes reset to zero. If you make the expiry timer contract-wide, that technically makes a developer's life easier, but it makes the economics harder: developers must think about how to "pass through" ongoing storage costs to their users.
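The strawman scheme just described can be sketched as follows (all names and the RENT_PERIOD constant are illustrative; this is the naive version being criticized, not a real proposal):

```python
# Naive per-object expiry: every state object carries an expiry block,
# reads/writes (funded by burned ETH) extend it, and a sweep loop
# deletes whatever has lapsed. Illustrative strawman only.
RENT_PERIOD = 100  # blocks of life bought per access/burn

class State:
    def __init__(self):
        self.objects = {}  # address -> (value, expiry_block)

    def touch(self, addr: str, value: int, now: int) -> None:
        # Reading or writing an object resets its expiry clock.
        self.objects[addr] = (value, now + RENT_PERIOD)

    def sweep(self, now: int) -> list[str]:
        # The costly part: looping over the state to evict expired
        # objects, and the user-hostile part: values silently vanish.
        expired = [a for a, (_, exp) in self.objects.items() if exp <= now]
        for a in expired:
            del self.objects[a]
        return expired

s = State()
s.touch("alice", 100, now=0)
s.touch("bob", 50, now=0)
s.touch("alice", 100, now=90)        # alice renews before expiry
assert s.sweep(now=120) == ["bob"]   # bob lapsed at block 100
assert "alice" in s.objects
```

The sketch makes both failure modes concrete: `sweep` is an extra full-state loop, and anyone who stops touching their objects (the cave-dweller) silently loses them.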

These are problems the Ethereum core development community has worked on for many years, including proposals like "blockchain rent" and "regenesis". Eventually, we combined the best parts of the proposals and converged on two categories of "known least-bad solutions":

  • Partial state-expiry solutions
  • Address-period-based state-expiry proposals

Partial state expiry

The partial state-expiry proposals all work on the same principle. We split the state into chunks. Everyone permanently stores the "top-level map" of which chunks are empty or non-empty. The data within each chunk is only stored if it has been accessed recently. There is a "resurrection" mechanism by which chunk data that is no longer stored can be brought back.
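The shared principle can be sketched as follows (a hypothetical structure, not any specific EIP; SHA-256 over a sorted repr stands in for a real state commitment):

```python
# Sketch of partial state expiry: a permanently-stored top-level map of
# chunk commitments, full chunk data kept only while recently accessed,
# and resurrection by re-supplying data matching the commitment.
import hashlib

def commit(data: dict) -> str:
    """Toy commitment to a chunk's contents (stand-in for a Merkle root)."""
    return hashlib.sha256(repr(sorted(data.items())).encode()).hexdigest()

class ChunkedState:
    KEEP_FOR = 10  # periods a chunk's full data survives after access

    def __init__(self):
        self.top_level = {}   # chunk id -> commitment (kept forever)
        self.data = {}        # chunk id -> (full data, last access time)

    def write(self, chunk_id: str, data: dict, now: int) -> None:
        self.top_level[chunk_id] = commit(data)
        self.data[chunk_id] = (data, now)

    def expire(self, now: int) -> None:
        # Drop full data for stale chunks; the tiny commitment stays.
        for cid in [c for c, (_, t) in self.data.items()
                    if now - t > self.KEEP_FOR]:
            del self.data[cid]

    def resurrect(self, chunk_id: str, claimed: dict, now: int) -> bool:
        # Anyone can revive a chunk by proving the supplied data matches
        # the permanently-stored top-level commitment.
        if commit(claimed) != self.top_level.get(chunk_id):
            return False
        self.data[chunk_id] = (claimed, now)
        return True

s = ChunkedState()
s.write("chunk-7", {"alice": 100}, now=0)
s.expire(now=20)
assert "chunk-7" not in s.data                  # full data dropped...
assert "chunk-7" in s.top_level                 # ...commitment kept
assert not s.resurrect("chunk-7", {"alice": 999}, now=21)  # wrong data
assert s.resurrect("chunk-7", {"alice": 100}, now=21)      # valid proof
```

This is how the cave-dweller keeps their assets: the data may fall out of everyone's storage, but the commitment never does, so anyone holding a copy can always revive it.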

The main differences between these proposals are: (i) how we define "recently", and (ii) how we define "chunk".
