Ethereum Storage Roadmap: Challenges and Opportunities

TL;DR

  • The increasing storage demands present significant challenges for Ethereum nodes.
  • Some clients have started to prune historical data due to storage constraints, leading to inconsistent storage behavior among full nodes in the network.
  • To ensure alignment across all clients, historical data pruning is being standardized in EIP-4444 and EIP-4844.
  • Consequently, recovering the latest L1 or L2 state by replaying historical data relies on centralized, out-of-protocol services, prompting the exploration of more decentralized, Ethereum-aligned solutions.
  • Ethereum Portal network is a lightweight, decentralized P2P network for all types of Ethereum data including historical data. It is designed for resource-constrained devices and provides Ethereum JSON-RPC service. The historical network and beacon network are almost ready to use.
  • EthStorage network is an incentivized modular storage network for EIP-4844 BLOBs. To store a BLOB, a user calls the L1 storage contract put() method with the BLOB hash and a fee in ETH. The fee will be gradually distributed to storage providers upon submitting a valid proof of storage of off-chain BLOBs over time. The EthStorage testnet is running on Ethereum Sepolia testnet with multiple community participants successfully proving their local storage.
  • Future initiatives include the development of a decentralized state network, the implementation of proof of storage for dynamic-sized data, and decentralized access directly from browsers.

Acknowledgement: Many thanks to Piper Merriam from EF, Karthik Raju from Polychain, and Qiang from EthStorage for providing feedback on the article.

Background

On Oct 22, 2023, Péter Szilágyi, the renowned Go-Ethereum (Geth) dev lead, expressed his deep concerns on Twitter. He pointed out that while Geth clients preserve all historical data, other Ethereum clients like Nethermind and Besu can be configured to operate without certain historical Ethereum data, such as historical block bodies and headers. This makes all clients inconsistent and is unfair to Geth. It sparked intense discussions and debates surrounding the Ethereum Storage problem within the Ethereum roadmap.

The Storage Challenge

Why do Nethermind and Besu opt to cease storing historical data? What issues underlie this decision? From my perspective, there are two primary root causes:

  • The storage requirements for an Ethereum client are becoming increasingly demanding.
  • There is no in-protocol incentive or penalty to store Ethereum historical data.

The first reason stems from the escalating storage demands of running an Ethereum client. To delve into the specific requirements, the following pie chart illustrates the distribution of storage costs for a fresh Geth node, as of block 18,779,761 on December 13, 2023.

where

  • Total storage requirement: 925.39 GB
  • Historical data (blocks/receipts): Approximately 628.69 GB
  • State in Merkle Patricia Trie (MPT): Approximately 269.74 GB

The second reason is the absence of in-protocol incentives or penalties for storing historical blocks. While the protocol requires nodes to store all historical data, it provides no mechanism to encourage storage or penalize non-compliance. Storing and sharing historical data becomes purely altruistic, and a node is free to prune all historical data without facing any adverse consequences. In contrast, validators must maintain the latest full state to avoid proposing or voting for an invalid block, either of which risks the loss of incentives.

Consequently, when the storage cost becomes a substantial burden for a node, it's not surprising that some node operators choose to prune historical data. Opting to run without historical data can result in significant storage cost savings, reducing it from approximately 1TB to around 300GB.
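The savings quoted above follow directly from the storage breakdown. The short sketch below only redoes that arithmetic with the three figures from this article; the variable names are illustrative.

```python
# Storage figures quoted in this article for a fresh Geth node
# (block 18,779,761, December 13, 2023), in GB.
TOTAL_GB = 925.39
HISTORY_GB = 628.69   # historical blocks and receipts
STATE_GB = 269.74     # state in the Merkle Patricia Trie


def share(part_gb: float, total_gb: float = TOTAL_GB) -> float:
    """Return part_gb as a percentage of total_gb."""
    return 100.0 * part_gb / total_gb


# Whatever is neither history nor state (indexes, metadata, and so on).
other_gb = TOTAL_GB - HISTORY_GB - STATE_GB

# Footprint of the same node if all historical data were pruned.
after_pruning_gb = TOTAL_GB - HISTORY_GB

print(f"history share: {share(HISTORY_GB):.1f}%")
print(f"state share:   {share(STATE_GB):.1f}%")
print(f"other:         {other_gb:.2f} GB")
print(f"after pruning: {after_pruning_gb:.2f} GB")
```

Historical data accounts for roughly two thirds of the node's footprint, which is why pruning it brings the total down to about 300GB.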

The Nethermind configuration to run a node without historical block bodies, saving ~460GB of storage at the time of writing.

The challenge of storage is expected to intensify with the upcoming Ethereum Data Availability (DA) upgrade. The path towards fully scaling Ethereum DA commences with EIP-4844 in DenCun, introducing a fixed-sized Binary large object (BLOB) accompanied by an independent fee model known as blobGasPrice. Each BLOB is set at 128KB, and EIP-4844 permits each block to contain up to 6 BLOBs. To enhance data scalability, the plan involves implementing 1D Reed-Solomon code, allowing for 32 BLOBs per block initially and eventually reaching 256 BLOBs per block at full scaling. With the Ethereum DA operating at full data capacity with 256 BLOBs per block, one year of Ethereum DA network is projected to accept approximately 80 TB of data, surpassing the storage capacities of most Ethereum nodes.
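The 80 TB projection can be checked with a few lines of arithmetic. This is a rough sketch that assumes every 12-second slot produces a block carrying the maximum number of BLOBs, and that 128KB means 128 * 1024 bytes; both are simplifying assumptions, not protocol guarantees.

```python
BLOB_BYTES = 128 * 1024            # one BLOB, assuming 128KB = 128 * 1024 bytes
SLOT_SECONDS = 12                  # block time
SECONDS_PER_YEAR = 365 * 24 * 3600
TIB = 2 ** 40


def yearly_blob_bytes(blobs_per_block: int) -> int:
    """Upper bound on BLOB bytes accepted in one year if every block is full."""
    blocks_per_year = SECONDS_PER_YEAR // SLOT_SECONDS
    return blobs_per_block * BLOB_BYTES * blocks_per_year


# The three capacity levels mentioned above: EIP-4844, 1D Reed-Solomon, full scaling.
for blobs in (6, 32, 256):
    print(f"{blobs:>3} BLOBs/block -> {yearly_blob_bytes(blobs) / TIB:.1f} TiB/year")
```

At 256 BLOBs per block the bound comes out at roughly 80 TiB per year, matching the projection above.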

Ethereum Storage Roadmap and Consequence

Vitalik’s tweet of the Ethereum roadmap, in which the Purge mainly deals with storage.

The escalating storage costs have garnered attention from researchers within the Ethereum ecosystem. To address this and ensure alignment across all clients, several proposals are in development to explicitly prune storage. The two main proposals are:

  1. EIP-4444: Bound Historical Data in Execution Clients: This proposal allows a client to prune historical blocks older than one year. Assuming an average block size of 100K, the historical block data is capped at approximately 250 GB (100K * (3600 * 24 * 365) / 12, given the block time = 12s).
  2. EIP-4844: Shard BLOB Transactions: EIP-4844 discards BLOBs older than 18 days. This is a more aggressive approach compared to EIP-4444, capping the historical BLOB size at around 100 GB ((18 * 3600 * 24) * 128K * 6 / 12, given block time = 12s).
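Both caps come from the same formula: the number of blocks in the retention window times the bytes kept per block. The sketch below reproduces the two estimates, reading "K" as 1024 bytes and reporting results in GiB; the helper name is illustrative.

```python
KIB = 1024
GIB = 1024 ** 3
BLOCK_TIME_S = 12


def retained_bytes(window_days: int, bytes_per_block: int) -> int:
    """Bytes a node must keep for a retention window of window_days."""
    blocks_in_window = window_days * 24 * 3600 // BLOCK_TIME_S
    return blocks_in_window * bytes_per_block


# EIP-4444: one year of blocks at an assumed average of 100K per block.
history_cap = retained_bytes(365, 100 * KIB)

# EIP-4844: 18 days of BLOBs at the maximum of 6 BLOBs of 128K per block.
blob_cap = retained_bytes(18, 6 * 128 * KIB)

print(f"EIP-4444 history cap: {history_cap / GIB:.1f} GiB")
print(f"EIP-4844 BLOB cap:    {blob_cap / GIB:.1f} GiB")
```

The results land close to the 250 GB and 100 GB figures quoted above; the small differences depend on whether decimal or binary units are used.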

What is the consequence of pruning historical data from all clients? The main one is that a fresh node cannot synchronize to the latest state via “full sync” - a synchronization to replay the transactions from the genesis block to the latest block. Instead, we have to resort to a “snap sync” or “state sync” to synchronize the latest state from Ethereum peers. This approach is already implemented in Geth and runs as the default sync.

Similarly, this consequence also applies to all L2s: a fresh L2 node cannot derive the latest L2 state by replaying L2 blocks from the L2 genesis using data from Ethereum. Further, since the L1 nodes do not maintain the L2 state, a “snap sync” approach for L2 cannot derive the latest L2 state from L1, breaking an important L2 assumption of inheriting Ethereum security guarantees. The projected solution will rely on third-party services such as Infura, Etherscan, or the L2 projects themselves to store a copy of historical L2 data or state, which is centralized and backed only by out-of-protocol, indirect incentives.

The core questions we are asking are

  • Can we have a better decentralized solution, both in terms of storage and access, to the problem?
  • Is it possible to build an Ethereum-aligned, trust-minimized solution (e.g., on top of an L1 contract) with a direct incentive?
  • With all these, can we pave the way to an in-protocol, direct-incentive solution for storing and accessing Ethereum data in a fully decentralized way within the Ethereum roadmap?

Solutions

Solution 1: Ethereum Portal Network

The Ethereum Portal network serves as a lightweight, decentralized access network to the Ethereum protocol. Offering Ethereum JSON-RPC methods such as eth_call and eth_getBlockByNumber, it translates JSON-RPC requests into P2P requests to a distributed hash table, similar to the IPFS network. Unlike IPFS, which permits the storage of any data type and is susceptible to spam, the Portal P2P network exclusively hosts Ethereum data, such as historical headers and bodies. This is achieved through a built-in light-client verification technique within the Portal network.
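From the caller's side, a Portal client looks like any other JSON-RPC endpoint. As a minimal sketch, the snippet below builds the request body for eth_getBlockByNumber following the standard Ethereum JSON-RPC conventions; how the body is delivered (HTTP, IPC, port) depends on the Portal client and is not shown here. The helper names are illustrative.

```python
import json


def build_rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )


def block_by_number_request(number: int, full_transactions: bool = False) -> str:
    """Request body for eth_getBlockByNumber.

    Ethereum JSON-RPC encodes quantities as 0x-prefixed hex strings; the
    second parameter selects full transaction objects versus hashes only.
    """
    return build_rpc_request("eth_getBlockByNumber", [hex(number), full_transactions])


# Block number taken from the storage measurement earlier in this article.
request_body = block_by_number_request(18_779_761)
print(request_body)
```

The Portal client is then responsible for turning such a request into DHT lookups and verifying the returned data before answering.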

A significant feature of the Portal network is its design for lightweight operation and compatibility with resource-constrained devices. It can run on top of a node with a few megabytes of storage and low memory, promoting decentralization. Even a cellphone or a Raspberry Pi device can potentially join the network and contribute to the availability of Ethereum data.

The development of the Portal network aligns with the Ethereum client diversity philosophy, with clients written in Rust, JavaScript, Nim, and Go. The beacon network and history network are ready for use, while the state network is actively under development. Notably, the Portal network does not provide direct incentives for data storage—all nodes in the network operate altruistically.

Running a Portal network (Trin) with a 100MB storage limit.

Solution 2: EthStorage Network

The EthStorage network is a decentralized incentivized storage network specifically designed to store EIP-4844 BLOBs, supported by a grant from the ESP program.

  • Minimal Trust: Unlike existing solutions that need a centralized data bridge, EthStorage relies on the consensus of Ethereum and the $1/m$ trust model of permissionless EthStorage storage providers. Storing a BLOB works as follows: a user signs a BLOB-carrying transaction that calls the put(key, blob_idx) method of the storage contract. The storage contract then records the BLOB hash and notifies the storage providers with an event. After receiving the event, the storage providers download and store the BLOB directly from the Ethereum DA network, circumventing the data bridge issue.
  • Align Storage Cost with Incentive: When calling the put() method, a storage fee must be sent (via msg.value) and deposited in the contract. This storage fee is gradually distributed to storage providers over time upon successful submission and verification of a storage proof. Compared to the current Ethereum storage fee model, which pays a one-time storage fee to the proposer, the storage fee paid over time follows a discounted cash flow model, assuming the storage cost decreases relative to ETH over time. This innovation introduced by EthStorage aligns the fee paid by users with storage provider contributions over time.
  • Proof of Storage: The proof of storage is inspired by data availability sampling, but the sampling in EthStorage is performed against stored BLOBs over time rather than against the BLOBs of a proposed block. To efficiently verify the sampling on-chain, EthStorage heavily leverages smart contracts and the latest developments in SNARK technologies.
  • Permissionless Network: Any node in EthStorage can be paid as a storage provider as long as it stores data and periodically submits proof of storage on-chain.
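To make the discounted cash flow idea from the list above concrete, here is a toy model in which the payout rate decays exponentially, so that a single upfront fee can fund storage indefinitely. The exponential form, the decay rate, and the function name are assumptions chosen for illustration; they are not taken from the EthStorage contract.

```python
import math


def payout(upfront_fee: float, decay_per_year: float,
           t_start: float, t_end: float) -> float:
    """Portion of the upfront fee released for proofs covering [t_start, t_end].

    Assumes a payout rate of upfront_fee * decay * exp(-decay * t), whose
    integral from 0 to infinity equals upfront_fee. Times are in years.
    """
    if not 0 <= t_start <= t_end:
        raise ValueError("expected 0 <= t_start <= t_end")
    return upfront_fee * (
        math.exp(-decay_per_year * t_start) - math.exp(-decay_per_year * t_end)
    )


FEE = 1.0     # hypothetical upfront fee, in ETH
DECAY = 0.5   # hypothetical yearly discount rate

# Payout earned for each of the first ten years of proven storage.
yearly = [payout(FEE, DECAY, year, year + 1) for year in range(10)]
for year, amount in enumerate(yearly):
    print(f"year {year}: {amount:.4f} ETH")
print(f"still locked after 10 years: {FEE - sum(yearly):.4f} ETH")
```

Each year pays less than the previous one, and the total released never exceeds the deposit, which is the property that lets a one-time fee cover ongoing storage if costs fall over time.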

From a blockchain modularity perspective, EthStorage functions as an Ethereum Layer 2, but collects storage fees instead of transaction fees. By indexing BLOB hashes on-chain, EthStorage is an Ethereum modular storage layer with significant storage scalability and cost savings, targeting about 1000x.

In terms of development, EthStorage is already integrated with EIP-4844 on the Ethereum Sepolia testnet. A stress test on EthStorage and the Ethereum Sepolia testnet has been conducted, writing hundreds of gigabytes of BLOBs to EthStorage. More than 50 community participants joined the network and successfully proved their local storage.

The EthStorage network's primary advantage lies in providing a decentralized, direct incentive on top of Ethereum—a pioneering feature, as far as our current knowledge extends. However, a limitation of the network is that it is specifically tailored for fixed-size BLOBs.

The dashboard of EthStorage on Ethereum Devnet-12ALT

Projecting the Future

Ethereum storage, though less spotlighted, holds significant importance within the Ethereum ecosystem. As the Ethereum network is experiencing rapid growth, the storage and accessibility of Ethereum data emerge as critical challenges. While the Portal network and EthStorage network are in their early stages, we envision several intriguing directions for the long term:

  • Decentralized Low-Latency Access to Ethereum State: Accessing the Ethereum state in a decentralized and verifiable manner is a critical yet challenging task. In a traditional DHT setup, querying an account typically requires multiple queries for the internal trie nodes stored on different P2P nodes, which often leads to considerably long latency. The key question is how to exploit the structure of the state tree to accelerate access, which is being addressed by the upcoming state network of the Ethereum Portal network.
  • Integration Between Portal Network and EthStorage Network: The Portal network can seamlessly extend its support to include BLOBs within the network, a step already partially taken by the EthStorage team. A natural progression would be to unite these networks to offer a decentralized JSON-RPC network capable of calling contracts with access to BLOBs. Combining the application logic in the contracts and the scaled BLOB storage by EthStorage, we enable new dApps on Ethereum such as dynamic decentralized websites (e.g., decentralized twitter/youtube/wikipedia/etc).
  • Decentralized Access From Browsers: Similar to ipfs:// protocol used to access the data in the IPFS network, there is a growing need for an Ethereum-native access protocol from browsers to unlock the vast potential of Ethereum's rich data. This data encompasses a wide spectrum, ranging from token ownership and balances to NFT images and dynamic decentralized websites, all made possible by the capabilities of smart contracts and future Ethereum storage. In this realm, the web3:// protocol, as defined in ERC-4804/6860, is currently undergoing active development to fulfill this purpose.
  • Advanced Proof of Storage for Dynamic-Sized Data: Beyond fixed BLOBs, exploring advanced proof of storage becomes imperative to address dynamic-sized data, such as historical blocks or even state objects. Developing sophisticated algorithms can enhance the adaptability of storage solutions.

We hope that these endeavors collectively contribute to the Ethereum roadmap, laying the groundwork for future decentralized storage solutions within the Ethereum ecosystem.


About EthStorage

EthStorage is a modular and decentralized storage Layer 2 that offers programmable key-value storage powered by DA. It enables long-term DA solutions for Rollups and opens up new possibilities for fully on-chain applications like games, social networks, AI, etc.

For more information, please join EthStorage’s community: