A New Era of Web3 Data Access: An Analysis of Indexer Technology and Comparison of Mainstream Projects

Data is the core of blockchain technology and the foundation for building decentralized applications (dApps). Much of the current industry discussion revolves around data availability (DA), which ensures that network participants can access the latest transaction data for verification. However, the equally important issue of data accessibility is often overlooked.

In the era of modular blockchains, DA solutions have become indispensable. They ensure that all participants can access transaction data, validate it in real time, and maintain network integrity. However, a DA layer is more like a billboard than a database: data is not stored permanently and is deleted over time.

In contrast, data accessibility focuses on the ability to retrieve historical data, which is crucial for dApp development and blockchain analysis. Although less discussed, data accessibility is as important as data availability. The two play different but complementary roles in the blockchain ecosystem, and comprehensive data management must address both to support robust and efficient blockchain applications.

Traditional methods of blockchain data retrieval

Since its inception, blockchain technology has reshaped application infrastructure and enabled dApps in areas such as gaming, finance, and social networks. However, building these dApps requires access to large amounts of blockchain data, which is both difficult and expensive to obtain.

For dApp developers, one option is to self-host and run archive RPC nodes. These nodes store all historical blockchain data, allowing for full access. However, the maintenance costs are high, and the querying capabilities are limited, making it difficult to retrieve data in the format required by developers. Running cheaper nodes is another option, but their data retrieval capabilities are limited, which may affect the operation of the dApp.

Another method is to use commercial RPC node services. These providers bear the cost of running and managing nodes and serve data through RPC endpoints. Public RPC endpoints are free but rate-limited, which can hurt the user experience. Private RPC endpoints perform better, but even simple data retrieval takes many round trips, which makes this approach inefficient and hard to scale.
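
To make the round-trip problem concrete, the sketch below builds the JSON-RPC payloads needed to scan a block range for one event type with `eth_getLogs`. It only constructs the requests and sends nothing. The chunk size of 2,000 blocks and the all-zero contract address are assumptions chosen for illustration; real providers impose their own range limits.

```python
import json

# keccak256("Transfer(address,address,uint256)"), the standard ERC-20 Transfer topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

# Placeholder address used only for illustration (hypothetical contract).
EXAMPLE_CONTRACT = "0x" + "0" * 40


def build_get_logs_requests(address, topic0, start_block, end_block, chunk_size=2_000):
    """Return one eth_getLogs JSON-RPC payload per block-range chunk."""
    payloads = []
    for request_id, from_block in enumerate(
        range(start_block, end_block + 1, chunk_size), start=1
    ):
        to_block = min(from_block + chunk_size - 1, end_block)
        payloads.append({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "eth_getLogs",
            "params": [{
                "address": address,
                "topics": [topic0],
                "fromBlock": hex(from_block),
                "toBlock": hex(to_block),
            }],
        })
    return payloads


payloads = build_get_logs_requests(EXAMPLE_CONTRACT, TRANSFER_TOPIC, 0, 999_999)
print(len(payloads))                      # number of separate RPC calls required
print(json.dumps(payloads[0], indent=2))  # the first request body
```

Under these assumptions, scanning one million blocks for a single event type already takes 500 separate requests, before any decoding, filtering, or joining of the results.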

A better alternative: blockchain indexer

Blockchain indexers organize on-chain data and load it into databases for querying, which is why they are often called the "Google of blockchain." They index blockchain data and expose it through query interfaces such as SQL-like languages and GraphQL APIs. This gives developers a unified, standardized way to retrieve exactly the information they need, which greatly simplifies data access.
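
As a rough illustration of what such a query interface looks like from the developer's side, the sketch below builds the body of a GraphQL POST request. The `swaps` entity, its fields, and the filter arguments are a hypothetical schema invented for this example; each indexer or subgraph defines its own.

```python
import json

# Hypothetical schema: a `swaps` entity with pool, amounts, and timestamp fields.
SWAPS_QUERY = """
query RecentSwaps($pool: String!, $first: Int!) {
  swaps(where: {pool: $pool}, first: $first, orderBy: timestamp, orderDirection: desc) {
    id
    amount0
    amount1
    timestamp
  }
}
"""


def build_graphql_request(pool_address, limit=10):
    """Serialize a GraphQL query and its variables into a JSON request body."""
    return json.dumps({
        "query": SWAPS_QUERY,
        # Addresses are lowercased here on the assumption that the index stores them that way.
        "variables": {"pool": pool_address.lower(), "first": limit},
    })


body = build_graphql_request("0xABCDEF0000000000000000000000000000000001", limit=5)
print(body)
```

A single request of this kind replaces the many chunked RPC calls that a raw node would need for the same result.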

Different types of indexers optimize data retrieval in various ways:

  1. Full Node Indexer: Runs a complete blockchain node to directly extract data, ensuring data is complete and accurate, but requires a large amount of storage and processing power.

  2. Lightweight Indexer: Relies on full nodes to fetch specific data on demand, which reduces storage requirements but may increase query time.

  3. Dedicated Indexer: Optimized for specific data types or blockchains, such as NFT data or DeFi transactions.

  4. Aggregated Indexer: Extracts data from multiple blockchains and sources, including off-chain information, providing a unified query interface suitable for multi-chain dApps.

Ethereum alone requires about 3 TB of storage, and this figure keeps growing with the chain. Indexer protocols deploy multiple indexers, enabling efficient indexing and high-speed querying of large amounts of data at a scale that RPC alone cannot achieve.

Indexers also allow complex queries, easy data filtering, and extraction of data for later analysis. Some indexers can aggregate data from multiple sources, so a multi-chain dApp does not need a separate API for each chain. Because they are distributed across multiple nodes, indexers offer better security and performance, whereas RPC providers may suffer outages because of their centralized nature.

Overall, compared to RPC node services, indexers improve the efficiency and reliability of data retrieval while sparing developers the cost of deploying their own nodes. This makes blockchain indexer protocols the preferred choice for dApp developers.

![Development of Web3 Data Access: Introduction to Indexers and Related Projects](https://img-cdn.gateio.im/webp-social/moments-16396b955382c2c74010c264affdca46.webp)

Indexer application scenarios

Building a dApp requires retrieving and reading blockchain data to run its services. This applies to all kinds of dApps, including DeFi protocols, NFT platforms, games, and even social networks, because they must read data before they can execute transactions.

DeFi

DeFi protocols need a range of information to quote users specific prices, rates, and fees. Automated market makers (AMMs) need the price and liquidity of a given pool to calculate swap rates, while lending protocols need the utilization rate to determine lending rates and liquidation debt ratios. This information must be fed into the dApp before it can calculate the rate at which a user's transaction will execute.
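
The two calculations mentioned above can be sketched as follows. This is a minimal example under stated assumptions, not any specific protocol's formula: the AMM is modeled as a constant-product pool (x · y = k) with a 0.3% fee, and the lending rate uses a simple two-slope ("kinked") curve whose parameters are invented for illustration. In practice the reserves and utilization inputs are what an indexer or node would supply.

```python
def constant_product_quote(amount_in, reserve_in, reserve_out, fee=0.003):
    """Output amount for a swap against a constant-product pool (x * y = k)."""
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)


def utilization(total_borrowed, total_supplied):
    """Share of supplied liquidity that is currently borrowed."""
    if total_supplied == 0:
        return 0.0
    return total_borrowed / total_supplied


def borrow_rate(util, base=0.02, slope1=0.10, slope2=1.00, kink=0.80):
    """Illustrative two-slope rate curve: gentle below the kink, steep above it."""
    if util <= kink:
        return base + slope1 * util / kink
    return base + slope1 + slope2 * (util - kink) / (1 - kink)


# Example inputs (made-up pool reserves and market totals).
tokens_out = constant_product_quote(amount_in=10, reserve_in=1_000, reserve_out=1_000)
rate = borrow_rate(utilization(total_borrowed=900, total_supplied=1_000))
print(tokens_out, rate)
```

Both functions are only as accurate as their inputs, which is why stale or slow reserve and utilization data translates directly into wrong quotes.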

Games

GameFi requires fast indexing and data access to give users a smooth gaming experience. Only with quick data retrieval and execution can Web3 games match Web2 games on performance and attract more users. These games need data such as land ownership, in-game token balances, and in-game actions. Using indexers helps them maintain a stable data flow and reliable uptime, which supports a consistent gaming experience.

NFT

NFT marketplaces and lending platforms need to index data to access various information, such as NFT metadata, ownership and transfer data, royalty information, etc. Quickly indexing such data can avoid browsing through each NFT individually to find ownership or attribute data.
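
The idea of indexing ownership once instead of inspecting each NFT can be sketched by replaying transfer events into a lookup table. The `Transfer` record below is a simplified, hypothetical stand-in for decoded ERC-721 `Transfer` logs, and the addresses are made up.

```python
from dataclasses import dataclass

ZERO_ADDRESS = "0x" + "0" * 40  # mints come from, and burns go to, the zero address


@dataclass(frozen=True)
class Transfer:
    block_number: int
    log_index: int
    token_id: int
    sender: str
    recipient: str


def build_owner_index(transfers):
    """Replay transfers in chain order and return {token_id: current_owner}."""
    owners = {}
    for t in sorted(transfers, key=lambda t: (t.block_number, t.log_index)):
        if t.recipient == ZERO_ADDRESS:
            owners.pop(t.token_id, None)  # token was burned
        else:
            owners[t.token_id] = t.recipient
    return owners


def tokens_owned_by(owners, address):
    """All token IDs currently held by one address."""
    return sorted(token_id for token_id, owner in owners.items() if owner == address)


events = [
    Transfer(100, 0, 1, ZERO_ADDRESS, "0xalice"),
    Transfer(100, 1, 2, ZERO_ADDRESS, "0xalice"),
    Transfer(105, 0, 1, "0xalice", "0xbob"),
    Transfer(110, 3, 2, "0xalice", ZERO_ADDRESS),
]
owners = build_owner_index(events)
print(owners, tokens_owned_by(owners, "0xbob"))
```

Once the table is built, an ownership lookup is a dictionary read rather than a per-token call to the chain.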

Whether it's DeFi AMMs that require price and liquidity information, or SocialFi applications that need to update new user posts, quickly retrieving data is essential for the normal operation of dApps. With the help of indexers, they can efficiently and accurately retrieve data, providing a smooth user experience.

Analysis

Indexers provide a way to extract specific data from raw blockchain data (including smart contract events within each block). This opens the door to more targeted data analysis and, in turn, more comprehensive insights.

For example, perpetual trading protocols can identify which tokens have high trading volumes and generate fees, thus deciding whether to list them as perpetual contracts on the platform. DEX developers can create dashboards for their own products to gain insights into which liquidity pools have the highest returns or strongest liquidity. They can also create public dashboards, allowing developers to query any type of data they want to display on the charts flexibly.
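
A dashboard query of this kind boils down to a group-and-rank step over indexed events. The sketch below assumes swap records have already been decoded into dictionaries with `token` and `usd_amount` keys, and applies a flat 0.3% fee rate; the field names, the fee, and the sample values are all hypothetical.

```python
from collections import defaultdict


def rank_tokens_by_volume(swaps, fee_rate=0.003):
    """Aggregate USD volume per token and rank tokens from highest to lowest."""
    volume = defaultdict(float)
    for swap in swaps:
        volume[swap["token"]] += swap["usd_amount"]
    ranked = sorted(volume.items(), key=lambda item: item[1], reverse=True)
    return [
        {"token": token, "volume_usd": total, "fees_usd": total * fee_rate}
        for token, total in ranked
    ]


sample_swaps = [
    {"token": "AAA", "usd_amount": 1_000.0},
    {"token": "BBB", "usd_amount": 5_000.0},
    {"token": "AAA", "usd_amount": 2_500.0},
]
for row in rank_tokens_by_volume(sample_swaps):
    print(row)
```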

As there are multiple blockchain indexers available, identifying the differences between indexing protocols is crucial to ensure developers select the indexer that best meets their needs.

Overview of Blockchain Indexers

The Graph

The Graph is the first indexing protocol launched on Ethereum, allowing easy access to previously hard-to-reach transaction data. It uses subgraph definitions and filters to collect subsets of data from the blockchain, such as all transactions related to a specific liquidity pool.

Under its Proof of Indexing mechanism, indexers stake the native token GRT to provide indexing and query services, and delegators can delegate their tokens to indexers. Curators signal on high-quality subgraphs, helping indexers decide which subgraphs to index in order to earn the most query fees. As part of its move toward greater decentralization, The Graph plans to sunset its hosted service and require subgraphs to upgrade to its network, with an upgrade indexer provided to ease the transition.

Its infrastructure brings the average cost per million queries to $40, significantly lower than self-hosted nodes. By using file data sources, it also supports parallel indexing of both on-chain and off-chain data, enabling efficient data retrieval.

Rewards for The Graph's indexers have risen steadily over the past few quarters. This is partly due to higher query volume and partly to the rising token price, which may reflect the project's plans to integrate AI-assisted queries in the future.

Subsquid

Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates large amounts of on-chain and off-chain data, protected by zero-knowledge proofs. As a decentralized worker network, each node is responsible for storing a specific subset of block data, accelerating data retrieval by quickly identifying the nodes that hold the necessary data.
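
The routing idea, where each worker holds a subset of block data and a query goes only to the workers that hold the relevant blocks, can be sketched as below. This is not Subsquid's actual scheduling algorithm: the fixed-size chunks, the round-robin placement, and the replication factor are all assumptions made for illustration.

```python
def assign_chunks(num_chunks, workers, replicas=2):
    """Place each block chunk on `replicas` workers using round-robin placement."""
    return {
        chunk: [workers[(chunk + r) % len(workers)] for r in range(replicas)]
        for chunk in range(num_chunks)
    }


def workers_for_range(assignment, chunk_size, from_block, to_block):
    """Return {chunk: workers} for every chunk overlapping the requested block range."""
    first_chunk = from_block // chunk_size
    last_chunk = to_block // chunk_size
    return {
        chunk: assignment[chunk]
        for chunk in range(first_chunk, last_chunk + 1)
        if chunk in assignment
    }


workers = ["worker-0", "worker-1", "worker-2"]
assignment = assign_chunks(num_chunks=4, workers=workers)
print(workers_for_range(assignment, chunk_size=1_000, from_block=1_500, to_block=2_100))
```

Because the lookup is pure arithmetic on block numbers, a client can find the right workers without asking every node in the network.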

Subsquid supports real-time indexing, which allows blocks to be indexed before they are finalized. It also lets developers store data in the format or destination of their choice, such as BigQuery, Parquet, or CSV, which makes analysis easier. Additionally, subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, enabling no-code deployment.

During its testnet phase, Subsquid reported impressive figures: over 80,000 testnet users, more than 60,000 deployed Squid indexers, and more than 20,000 verified developers on the network. It has since launched its data lake mainnet.

In addition to indexing, the Subsquid Network data lake can also replace RPC in use cases such as analytics, ZK/TEE co-processors, AI agents, and oracles.

SubQuery

SubQuery is a decentralized middleware infrastructure network that provides RPC and indexed data services. It initially supported the Polkadot and Substrate networks and has since expanded to over 200 chains. It operates much like The Graph and also uses Proof of Indexing: indexers index data and serve query requests, while delegators stake their tokens with indexers. However, instead of curators, it introduces consumers, who submit purchase orders that signal guaranteed income for indexers.

SubQuery plans to introduce data nodes that support sharding, so that nodes do not have to continuously synchronize all new data with one another, which should improve query efficiency and move the network toward greater decentralization. Users can pay a fee of approximately 1 SQT token per 1,000 requests, or set custom fees for indexers through the protocol.
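
The flat rate quoted above translates into a simple budget estimate. The request volume in the example is invented, and the rate is the approximate figure from the text rather than a fixed protocol constant.

```python
def sqt_cost(requests, sqt_per_thousand=1.0):
    """Estimated fee in SQT for a number of requests at a flat per-1,000 rate."""
    return requests / 1_000 * sqt_per_thousand


# Hypothetical dApp making 2.5 million requests in a billing period.
print(sqt_cost(2_500_000))
```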

Although SubQuery only launched its token earlier this year, the issuance rewards for nodes and delegators have increased in USD value on a month-on-month basis, reflecting the growing number of query services offered on its platform. Since the TGE, the total amount of staked SQT has increased from 6 million to 125 million, highlighting the growth in network participation.

![The Development of Web3 Data Access: Introduction to Indexers and Related Projects](https://img-cdn.gateio.im/webp-social/moments-53dbb4fd659cf6a7184990c886901658.webp)

Covalent

Covalent is a decentralized indexing network in which Block Specimen Producer (BSP) nodes create copies of blockchain data through bulk export and publish proofs on the Covalent L1 chain. Block Results Producer (BRP) nodes then refine this data according to set rules, selecting the data that meets the requirements.

Through a unified API, developers can easily extract relevant blockchain data in a consistent request and response format, without having to write complex queries to access the data. The CQT token, which can be settled on Moonbeam, can be used as a means of payment to extract these pre-configured datasets from network operators.

Covalent rewards seem to show an overall upward trend from the first quarter of 2023 to the first quarter of 2024, partly due to the increase in the price of the Covalent token CQT.

Considerations for Choosing an Indexer

Data Customizability

Some indexers (such as Covalent) are general-purpose indexers that provide standard, pre-configured datasets solely through APIs. They are fast, but they lack the flexibility that developers with custom dataset requirements need. Using an indexer framework allows more customized data processing tailored to a specific application.

Security

Indexed data must be secure; otherwise, dApps built on these indexers are also vulnerable to attack. For example, if transactions and wallet balances can be manipulated, a dApp may lose liquidity, which harms its users. All indexers rely on token staking as a baseline security measure, and some indexer solutions add proofs to strengthen security further.

Subsquid offers both optimistic and zero-knowledge proofs, while Covalent publishes proofs that contain block hashes. The Graph takes an optimistic approach, with a dispute window during which indexer query responses can be challenged. SubQuery generates Merkle Mountain Range (MMR) proofs for each block by hashing all the data stored in its database for that block.
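
To show how a Merkle Mountain Range commits to per-block data, here is a simplified sketch. It is not SubQuery's implementation: the use of SHA-256, the one-byte leaf and node prefixes, and the way peaks are folded into a single root are assumptions made for illustration. Each appended leaf stands for the serialized data an indexer stored for one block.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MerkleMountainRange:
    """Append-only accumulator that keeps only the peaks of its perfect subtrees."""

    def __init__(self):
        self.peaks = []       # (height, hash) pairs, left to right, heights strictly decreasing
        self.leaf_count = 0

    def append(self, leaf_data: bytes) -> None:
        node = (0, sha256(b"\x00" + leaf_data))   # 0x00 prefix marks a leaf
        # Merge while the rightmost peak has the same height as the new node.
        while self.peaks and self.peaks[-1][0] == node[0]:
            height, left_hash = self.peaks.pop()
            node = (height + 1, sha256(b"\x01" + left_hash + node[1]))
        self.peaks.append(node)
        self.leaf_count += 1

    def root(self) -> bytes:
        """Fold the peaks from right to left into one commitment."""
        if not self.peaks:
            return sha256(b"")
        accumulator = self.peaks[-1][1]
        for _, peak_hash in reversed(self.peaks[:-1]):
            accumulator = sha256(b"\x01" + peak_hash + accumulator)
        return accumulator


mmr = MerkleMountainRange()
for block_number in range(5):
    mmr.append(f"indexed-data-for-block-{block_number}".encode())
print([height for height, _ in mmr.peaks], mmr.root().hex())
```

Two indexers that processed the same blocks identically end up with the same root, so a mismatch points to a divergence in the indexed data without comparing whole databases.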

Speed and Scalability

As blockchains grow and transaction volumes increase, indexing large amounts of data becomes more demanding and requires more processing power and storage. Maintaining efficiency gets harder as networks expand, and indexer protocols have introduced solutions to meet these growing demands.

For example, Subsquid scales horizontally by adding more nodes to store data, so capacity grows as hardware is added. The Graph provides parallel streaming of data for faster synchronization, while SubQuery introduces node sharding to speed up the synchronization process.

Supported Networks

Although most blockchain activity still takes place on Ethereum, other blockchains are becoming increasingly popular. For example, Layer 2 networks, Solana, Move-based blockchains, and Bitcoin ecosystem chains each have a growing developer base and level of activity, and they also require indexing services.

An indexer protocol that supports chains not covered by its competitors can capture more market share and fees.
