Simple Definition of Data Availability
- Data availability refers to the ability for transaction data to be made available for nodes to download.
- "Data availability" and the "data availability problem" are terms used to refer to a specific problem faced in various blockchain scaling strategies.
- The data availability problem asks: when a new block is produced, how can nodes be sure that all of the data in that block was actually published to the network?
- The dilemma is that if a block producer doesn't release all of the data in a block, no one can detect whether a malicious transaction is hidden within that block.
- For more information about data availability, this post by Celestia Labs co-founder Mustafa Al-Bassam is a good place to start.
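The dilemma above can be illustrated with a minimal sketch. It assumes a hypothetical block header that commits to the block's transactions with a single SHA-256 hash; the `commitment` and `check_block` names, and the commitment scheme itself, are illustrative assumptions and not the format used by any particular chain.

```python
import hashlib
from typing import List, Optional


def commitment(transactions: List[bytes]) -> str:
    """Hypothetical data commitment: a hash over length-prefixed transactions."""
    h = hashlib.sha256()
    for tx in transactions:
        h.update(len(tx).to_bytes(4, "big"))  # length prefix avoids ambiguous concatenation
        h.update(tx)
    return h.hexdigest()


def check_block(header_commitment: str, published: Optional[List[bytes]]) -> str:
    """Classify a block from the point of view of a node trying to download it."""
    if published is None:
        # Only the header was released: the node cannot inspect any transaction.
        return "unavailable"
    if commitment(published) != header_commitment:
        # Data was released, but it is not the data the header commits to.
        return "mismatch"
    return "available"


txs = [b"alice->bob:5", b"bob->carol:2"]
header = commitment(txs)

print(check_block(header, txs))      # full data published
print(check_block(header, None))     # data withheld
print(check_block(header, txs[:1]))  # only part of the data published
```

In the withheld case the node holds a well-formed header but has nothing to check it against, which is why a hidden invalid transaction would go undetected.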
Longer Definition of Data Availability
- Data availability refers to the availability of transactions in a block that is appended to the tip of the chain. During consensus, validators download the block to verify its availability. If the block contains any transactions that are withheld by a validator, the block is unavailable and will be rejected as invalid.
- Put another way, data availability is the condition of whether or not transaction data was made available for nodes to download when a block was proposed.
- Verifying data availability is the only way to prevent data withholding, a devastating attack that breaks the fundamental security of any blockchain. In the event that a block is proposed where the underlying data is unavailable, the rest of the network won’t be able to confirm the validity of the transactions in the block, or won’t be able to perform a state transition using the update from the proposed block.
- In traditional blockchains, data availability is verified by requiring full nodes to download all the block data. This approach does not scale, hence the need for specialized schemes such as data availability sampling which allow nodes to verify data availability without downloading the entire block.
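The intuition behind data availability sampling can be sketched with a short calculation. This is a simplified model under stated assumptions, not a description of any specific protocol: it assumes the block is encoded so that a producer must withhold at least some fraction of its shares to make it unrecoverable (the value 0.25 below is purely illustrative), and that each sample is an independent, uniformly random share request. The function names are hypothetical.

```python
import math
import random
from typing import List


def miss_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that every independent uniform sample lands on an available share."""
    return (1.0 - withheld_fraction) ** samples


def samples_needed(withheld_fraction: float, target_confidence: float) -> int:
    """Smallest sample count whose detection probability reaches the target."""
    return math.ceil(
        math.log(1.0 - target_confidence) / math.log(1.0 - withheld_fraction)
    )


def sample_block(available: List[bool], samples: int, rng: random.Random) -> bool:
    """Request random shares; True only if every requested share was served."""
    return all(available[rng.randrange(len(available))] for _ in range(samples))


# Illustrative assumption: at least 25% of shares must be withheld to hide data.
f = 0.25
n = samples_needed(f, 0.99)
print(n, 1.0 - miss_probability(f, n))

# A toy block of 1024 shares with the first quarter withheld.
shares = [i >= 256 for i in range(1024)]
print(sample_block(shares, n, random.Random()))
```

Under this model the number of samples depends only on the withheld fraction and the desired confidence, not on the block size, which is the sense in which sampling avoids downloading the entire block.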
Data Availability vs Data Retrievability
- Data availability is only concerned with the availability of a block at the time it is proposed by a validator. Once the block has completed the consensus process, has been appended to the tip of the chain, and has propagated throughout the network, the ability to download transactions from that block is what we call retrievability.
- This distinction is important because retrievability is a different problem from availability.
Articles on Data Availability
- The original paper which proposes a fraud and data availability proof system to increase light client security and to scale blockchains (by Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin)
- More accessible and shorter version of the above
- A rollup centric roadmap for Ethereum
- Scaling ETH in 2020 and beyond — a talk by Vitalik
- Podcast with John Adler on the Data Availability Problem
- Podcast with Ismail Khoffi on Celestia
- Recording of a Twitter Space with zkSync’s Angela Lu, Arbitrum’s Daniel Goldman, and Fuel Labs’ John Adler giving us a Rollup’s Perspective on Ethereum’s Data Sharding Roadmap
- A note on data availability and erasure coding