Web3 Data Access: An Introduction to Indexers and Related Projects

Author: Geng Kai, DFG

The Importance of Data in Blockchains

Data is the key to blockchain technology and is the foundation for developing decentralized applications (dApps). While most current discussions revolve around Data Availability (DA) - ensuring that every network participant can access the most recent transaction data for validation - there is also an equally important aspect that is often overlooked: Data Accessibility.

In the era of modular blockchain, DA solutions have become indispensable. These solutions ensure that all participants can use transaction data to achieve real-time validation and maintain the integrity of the network. However, the function of the DA layer is more like a billboard than a database. This means that data will not be stored indefinitely; it will be deleted over time, just like a new poster will eventually replace the old one on a billboard.

Data accessibility, on the other hand, concerns the ability to retrieve historical data, which is essential for developing dApps and conducting blockchain analysis, since both depend on an accurate view of past data. Although it is discussed far less often, data accessibility is just as important as data availability. The two play different but complementary roles in the blockchain ecosystem, and a comprehensive data management approach must address both to support robust and efficient blockchain applications.

How was blockchain data retrieved before?

Since its inception, blockchain technology has reshaped digital infrastructure and driven the creation of dApps in fields such as gaming, finance, and social networks. However, building these dApps requires access to a large amount of blockchain data, which is both difficult and costly to obtain.

For dApp developers, one option is to host and run their own archive RPC nodes. These nodes store all historical blockchain data from the beginning of the chain, allowing full access to it. However, archive nodes are costly to maintain and offer limited query capabilities, so developers often cannot query data in the format they need. Running cheaper nodes is an option, but their data retrieval capabilities are limited, which may impede the operation of dApps.

Another method is to use commercial RPC (remote procedure call) node providers. These providers are responsible for the cost and management of nodes, and provide data through RPC endpoints. Public RPC endpoints are free but have rate limits that may negatively impact the user experience of dApps. Private RPC endpoints offer better performance by reducing congestion, but even simple data retrieval requires a significant amount of back-and-forth communication. This makes them request-heavy and inefficient for complex data queries. Additionally, private RPC endpoints are often difficult to scale and lack compatibility across different networks.
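
To make the "back-and-forth" concrete, here is a minimal sketch of what scanning a block range looks like over plain JSON-RPC. It only builds the request payloads (no network call is made); the block numbers are arbitrary, and the helper name `block_requests` is made up for illustration. `eth_getBlockByNumber` is the standard Ethereum JSON-RPC method for fetching one block.

```python
import json


def block_requests(start: int, end: int) -> list[dict]:
    """Build one eth_getBlockByNumber request per block in [start, end]."""
    return [
        {
            "jsonrpc": "2.0",
            "id": i,
            "method": "eth_getBlockByNumber",
            # Block number as a hex string; True asks for full transaction objects.
            "params": [hex(n), True],
        }
        for i, n in enumerate(range(start, end + 1), start=1)
    ]


# Scanning just 100 blocks already means 100 separate requests,
# before any filtering or decoding happens on the client side.
requests_needed = block_requests(19_000_000, 19_000_099)
print(len(requests_needed))
print(json.dumps(requests_needed[0]))
```

An indexer does this scanning once, ahead of time, and stores the result in a queryable form.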

Better Alternative: Blockchain Indexer

Blockchain indexers play a crucial role in organizing on-chain data and sending it to databases for easy querying, which is why they are often referred to as the 'Google of blockchain.' They work by indexing blockchain data and making it readily available through SQL-like query languages and APIs such as GraphQL. By providing a unified interface for querying data, indexers allow developers to retrieve the desired information quickly and accurately using standardized query languages, greatly simplifying the process.
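
As a rough illustration of such a unified query interface, the sketch below builds the JSON body of a single GraphQL request. The `swaps` entity, its fields, and the pool address are hypothetical; each real indexer defines its own schema, so treat this as the general shape of a request rather than any project's actual API.

```python
import json


def build_query(pool: str, first: int = 5) -> str:
    """Return the JSON body for one GraphQL request (hypothetical schema)."""
    query = """
    query RecentSwaps($pool: String!, $first: Int!) {
      swaps(first: $first, orderBy: timestamp, orderDirection: desc,
            where: { pool: $pool }) {
        id
        timestamp
        amountUSD
      }
    }
    """
    return json.dumps({"query": query, "variables": {"pool": pool, "first": first}})


# One request describes the filter, the ordering, and the fields to return.
body = build_query("0x0000000000000000000000000000000000000000", first=3)
print(body)
```

The point of the design is that filtering and sorting happen on the indexer side, so the client sends one request instead of many.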

Different types of indexers optimize data retrieval in various ways:

  1. Full node indexers: These indexers run full blockchain nodes and extract data directly from them, ensuring data integrity and accuracy, but requiring a large amount of storage and processing power.
  2. Lightweight indexers: These indexers rely on full nodes to fetch specific data as needed, reducing storage requirements but potentially increasing query time.
  3. Specialized indexers: These indexers are designed for certain types of data or specific blockchains, optimizing retrieval for specific use cases, such as NFT data or DeFi transactions.
  4. Aggregators: These indexers extract data from multiple blockchains and sources, including off-chain information, and provide a unified query interface, which is particularly useful for multi-chain dApps.

Ethereum alone requires about 3TB of storage on an Erigon archive node, and that figure will keep increasing as the blockchain grows. An indexer protocol deploys multiple indexers that can index large amounts of data efficiently and serve queries quickly, something that RPC alone cannot achieve.

Indexers also allow complex queries, with easy filtering and extraction of data based on different criteria. Some indexers can aggregate data from multiple sources, which avoids deploying multiple APIs in multi-chain dApps. Because they are distributed across multiple nodes, indexers provide enhanced security and performance, while RPC providers may experience interruptions and downtime due to their centralized nature.

Overall, compared to RPC node providers, indexers improve the efficiency and reliability of data retrieval, while also reducing the cost of deploying individual nodes. This makes the blockchain indexer protocol the preferred choice for dApp developers.

Indexer Use Cases

As mentioned earlier, building a dApp requires retrieving and reading blockchain data to operate its services. This applies to any type of dApp, including DeFi, NFT platforms, games, and even social networks, as these platforms need to read data before executing any transactions.

DeFi

DeFi protocols require different information to provide users with specific prices, ratios, fees, etc. Automated Market Makers (AMMs) require price and liquidity information about certain liquidity pools to calculate swap rates, while lending protocols require utilization rates to determine borrowing rates and debt liquidation ratios. This information must be fed into the dApp before it can calculate the rates at which users' transactions will execute.
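
The sketch below shows why these inputs matter, using two common textbook formulas as assumptions: a constant-product AMM (x * y = k) with a 0.3% fee, and a linear interest-rate model driven by utilization. Neither formula nor any of the parameter values comes from a specific protocol discussed here; they only illustrate that the result depends entirely on indexed pool data.

```python
def swap_output(reserve_in: float, reserve_out: float, amount_in: float,
                fee: float = 0.003) -> float:
    """Output amount for a constant-product pool (assumed model, x * y = k)."""
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)


def utilization(total_borrowed: float, total_supplied: float) -> float:
    """Share of supplied liquidity that is currently borrowed."""
    return 0.0 if total_supplied == 0 else total_borrowed / total_supplied


def borrow_rate(util: float, base: float = 0.02, slope: float = 0.20) -> float:
    """Hypothetical linear rate model: the rate rises with utilization."""
    return base + slope * util


# Reserves and loan totals are the values a dApp would read from an indexer.
print(swap_output(reserve_in=1_000.0, reserve_out=2_000_000.0, amount_in=10.0))
print(borrow_rate(utilization(total_borrowed=50.0, total_supplied=200.0)))
```

If the reserves or loan totals are stale, both results are wrong, which is why fast and accurate data retrieval matters for DeFi.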

Gaming

GameFi requires fast indexing and access to data to ensure smooth gameplay. Only through very fast data retrieval and execution can Web3 games match the performance of Web2 games and thereby attract more users. These games require data such as land ownership, in-game token balances, and in-game operations. With indexers, they can better ensure a steady data flow and reliable uptime for a smooth gaming experience.

NFT

NFT markets and lending platforms need indexed data to access various information, such as NFT metadata, ownership and transfer data, and royalty information. Rapid indexing of such data avoids having to inspect each NFT individually to find its owner or attributes.
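
A minimal sketch of that idea: replay transfer events once and keep a lookup table of current owners. The `Transfer` record and the sample addresses are made up for illustration; a real indexer would decode these events from contract logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    block: int
    token_id: int
    sender: str
    recipient: str


def build_owner_index(transfers: list[Transfer]) -> dict[int, str]:
    """Replay transfers in block order; the last recipient is the current owner."""
    owners: dict[int, str] = {}
    # sorted() is stable, so events within the same block keep their input order.
    for t in sorted(transfers, key=lambda t: t.block):
        owners[t.token_id] = t.recipient
    return owners


events = [
    Transfer(block=12, token_id=1, sender="0xalice", recipient="0xbob"),
    Transfer(block=10, token_id=1, sender="0xmint", recipient="0xalice"),
    Transfer(block=11, token_id=2, sender="0xmint", recipient="0xcarol"),
]
owner_index = build_owner_index(events)
print(owner_index[1])  # a single dictionary lookup instead of scanning every event
```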

Whether it's a DeFi Automated Market Maker (AMM) that requires price and liquidity information, or a SocialFi application that needs to update new user posts, the ability to quickly retrieve data is crucial for the normal operation of dApps. With indexers, they can efficiently and accurately retrieve data, providing a smooth user experience.

Analysis

Indexers provide a way to extract specific data from raw blockchain data, including the smart contract events in each block. This enables more targeted data analysis and, in turn, more comprehensive insights.

For example, a perpetual trading protocol can identify which tokens have high trading volume and which generate the most fees, and use this to decide whether to list those tokens as perpetual contracts on its platform. DEX developers can create dashboards for their products to gain insight into which liquidity pools have the highest returns or the deepest liquidity. They can also create public dashboards that let developers flexibly query any type of data and display it on charts.
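
The kind of analysis described here can be sketched as a simple aggregation over indexed trade records. The token symbols and USD amounts below are invented sample data, and `volume_by_token` is a hypothetical helper, not part of any indexer SDK.

```python
from collections import defaultdict


def volume_by_token(trades: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sum USD volume per token and rank tokens from highest to lowest volume."""
    totals: dict[str, float] = defaultdict(float)
    for token, usd_amount in trades:
        totals[token] += usd_amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Sample (token, USD amount) pairs standing in for indexed swap events.
sample_trades = [("ETH", 100.0), ("ARB", 40.0), ("ETH", 50.0), ("OP", 75.0)]
for token, volume in volume_by_token(sample_trades):
    print(token, volume)
```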

Because multiple blockchain indexers are available, it is important to understand the differences between indexer protocols so that developers can choose the one most suitable for their needs.

Blockchain Indexer Overview

The Graph

The Graph is the first indexer protocol launched on Ethereum, which enables easy querying of transaction data that was previously difficult to access. It uses subgraphs to define and filter subsets of data collected from the blockchain, such as all transactions related to the Uniswap v3 USDC/ETH pool.

Using proofs of indexing, indexers stake the native token GRT to provide indexing and querying services, and delegators can choose to stake their tokens with an indexer. Curators signal which subgraphs are high quality, helping indexers decide which subgraphs to index in order to earn the most query fees. As The Graph transitions to a more decentralized network, it will eventually cease its hosted service and require subgraphs to upgrade to its network, while providing an upgrade indexer.

Its infrastructure brings the average cost per million queries to $40, which is much lower than the cost of self-hosted nodes. Using file data sources, it also supports parallel indexing of on-chain and off-chain data for efficient data retrieval.
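
For a sense of scale, the arithmetic behind that figure can be sketched as follows. The $40 per million queries comes from the text above; the monthly query volume is an arbitrary example, and actual pricing varies by indexer and over time.

```python
COST_PER_MILLION_QUERIES_USD = 40.0  # average figure quoted above


def query_cost_usd(num_queries: int) -> float:
    """Estimated cost in USD for a given number of queries at the average rate."""
    return num_queries / 1_000_000 * COST_PER_MILLION_QUERIES_USD


# Example: a dApp serving 2.5 million queries in a month.
print(query_cost_usd(2_500_000))
```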

The Graph's indexer rewards have been rising steadily over the past few quarters. This is partly due to an increase in query volume, but it is also attributed to the rise in the token price as the project plans to integrate AI-assisted queries in the future.

Subsquid

Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates large amounts of on-chain and off-chain data and protects it through zero-knowledge proofs. As a decentralized network of workers, each node is responsible for storing data from a specific subset of blocks, speeding up the data retrieval process by quickly identifying nodes that hold the required data.
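
The idea of locating the node that holds a given block can be sketched with a sorted list of chunk boundaries and a binary search. This is only an illustration of the principle; `ChunkRouter`, the chunk sizes, and the worker names are hypothetical and do not reflect Subsquid's actual assignment scheme.

```python
from bisect import bisect_right


class ChunkRouter:
    """Map a block number to the worker storing the chunk that contains it."""

    def __init__(self, chunk_starts: list[int], workers: list[str]) -> None:
        if len(chunk_starts) != len(workers) or chunk_starts != sorted(chunk_starts):
            raise ValueError("need one worker per chunk, with ascending chunk starts")
        self.chunk_starts = chunk_starts
        self.workers = workers

    def worker_for(self, block: int) -> str:
        # Index of the last chunk whose first block is <= the requested block.
        i = bisect_right(self.chunk_starts, block) - 1
        if i < 0:
            raise ValueError(f"block {block} is before the first stored chunk")
        return self.workers[i]


router = ChunkRouter([0, 1_000_000, 2_000_000], ["worker-a", "worker-b", "worker-c"])
print(router.worker_for(1_500_000))
```

Adding capacity in this model means adding more chunks and workers, which is the essence of horizontal scaling.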

Subsquid also supports real-time indexing, allowing blocks to be indexed before they are finalized. It lets developers store data in the format of their choice, such as Parquet or CSV files, which makes it easier to analyze with tools like BigQuery. Additionally, subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, so existing subgraphs can be deployed without code changes.

Even during its testnet phase, Subsquid posted impressive statistics, with over 80,000 testnet users, over 60,000 Squid indexers deployed, and over 20,000 verified developers on the network. On June 3rd, Subsquid launched the mainnet of its data lake.

In addition to indexing, the Subsquid Network data lake can also replace RPCs in use cases such as analysis, ZK/TEE coprocessors, AI agents, and Oracles.

SubQuery

SubQuery is a decentralized middleware infrastructure network that provides RPC and indexed data services. It initially supported the Polkadot and Substrate networks and has since expanded to over 200 chains. It operates similarly to The Graph: indexers use proofs of indexing to index data and serve query requests, and delegators stake tokens with indexers. However, instead of relying on curators, it lets consumers submit purchase orders, which secures income for the indexer.

It will introduce SubQuery data nodes that support sharding to prevent continuous synchronization of new data between each node, thereby optimizing query efficiency and moving towards greater decentralization. Users can choose to pay about 1 SQT token for every 1000 requests as computational fees, or set custom fees for indexers through the protocol.
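
The default fee mentioned above works out as follows. The rate of about 1 SQT per 1,000 requests is taken from the text; the request count is an arbitrary example, and custom indexer fees are not modeled.

```python
SQT_PER_THOUSAND_REQUESTS = 1.0  # approximate default rate quoted above


def request_fee_sqt(num_requests: int) -> float:
    """Approximate fee in SQT for a number of requests at the default rate."""
    return num_requests / 1_000 * SQT_PER_THOUSAND_REQUESTS


# Example: 250,000 requests at the default rate.
print(request_fee_sqt(250_000))
```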

Although SubQuery only launched its token earlier this year, the issuance rewards for nodes and delegators have risen in US dollar terms, indicating a continuous increase in the number of queries served on its platform. Since the token generation event (TGE), the total amount of staked SQT has increased from 6 million to 125 million, highlighting the growth in network participation.

Covalent

Covalent is a decentralized indexer network that creates replicas of blockchain data by batch exporting them through Block Specimen Producer (BSP) network nodes and publishes the proofs on the Covalent L1 blockchain. This data is then refined and filtered by Block Result Producer (BRP) nodes according to set rules to select the required data.

With a unified API, developers can easily extract relevant blockchain data in a consistent request and response format without writing custom complex queries to access the data. The pre-configured datasets can be accessed from network operators using CQT tokens settled on Moonbeam as a payment method.

Covalent's rewards show an overall rising trend from Q1 2023 to Q1 2024, partly due to the increase in the price of the Covalent token (CQT).

Notes on Selecting an Indexer

Customizability of the data

Some indexers (such as Covalent) are generic indexers, providing only standard pre-configured datasets through APIs. While they may be fast, they do not offer flexibility for developers who need custom datasets. An indexer framework, by contrast, allows more customized data processing to meet application-specific needs.

Security

Indexed data must be secure, otherwise dApps built on these indexers can be vulnerable to attacks. For example, if transactions and wallet balances can be manipulated, the dApp may lose liquidity, thereby affecting its users. While all indexers achieve some level of security by having indexers stake tokens, some indexer solutions also use proofs to further enhance security.

Subsquid provides options using optimistic and zero-knowledge proofs, while Covalent also publishes proofs containing block hash values. The Graph offers a dispute period for indexer queries in an optimistic challenge window, while SubQuery generates Merkle Mountain Range proofs for each block, calculating a hash over all data stored in its database for that block.
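
These proof schemes all rest on hashing indexed data so that tampering becomes detectable. The sketch below computes a plain binary Merkle root with SHA-256. It is a deliberate simplification for illustration, not SubQuery's Merkle Mountain construction or any project's actual proof format, and the record strings are made up.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root hash of a simple binary Merkle tree over the given records."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical records an indexer stored for one block.
records = [b"transfer:1:alice:bob", b"transfer:2:carol:dave", b"swap:3:eth:usdc"]
honest_root = merkle_root(records)

# Changing a single record changes the root, so a published root exposes tampering.
tampered_root = merkle_root([b"transfer:1:alice:mallory"] + records[1:])
print(honest_root.hex())
print(honest_root != tampered_root)
```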

Speed and Scalability

As a blockchain continues to grow, its transaction volume also increases, which makes indexing more cumbersome because it requires more processing power and storage space. Maintaining efficiency becomes harder as the network grows, but indexer protocols have introduced solutions to meet these growing needs.

For example, Subsquid achieves horizontal scaling by adding more nodes to store data, and it can scale as hardware improves. The Graph provides parallel streaming data to synchronize data faster, while SubQuery introduces node sharding to speed up the synchronization process.

Supported Networks

Although most blockchain activity still takes place on Ethereum, other blockchains are becoming more and more popular over time. For example, Layer 2s, Solana, Move-based blockchains, and Bitcoin ecosystem chains all have their own growing developer communities and activity, which also require indexing services.

Supporting chains that other indexer protocols do not support can capture more market share and fees. Indexing data-intensive networks (such as Solana) is not easy, and so far, only Subsquid has successfully provided indexing support for them.

Conclusion

While indexers are widely used in dApp development, their potential is still immense, especially when integrated with AI. With AI becoming increasingly popular in Web2 and Web3, its ability to improve depends on access to relevant data for training models and developing AI agents. Ensuring data integrity is critical for AI applications as it can prevent models from being fed biased or inaccurate information.

In the field of indexer solutions, Subsquid has made significant progress in performance and user metrics. Users have started to experiment with using Subsquid to build AI agents, demonstrating the platform's versatility and potential in the evolving data indexing field. In addition, tools like AutoAgora help indexers use AI to provide dynamic pricing for query services on The Graph, while SubQuery supports multiple AI networks (such as OriginTrail and Oraichain) for transparent data indexing.

The integration of artificial intelligence and indexers is expected to enhance data accessibility and availability in the blockchain ecosystem. By leveraging AI technology, indexers can provide more efficient and accurate data retrieval, enabling developers to build more complex dApps and analytical tools. As AI and indexers continue to develop together, we remain optimistic about the future of data indexing and its role in shaping the decentralized digital landscape.
