The field of artificial intelligence (AI) is evolving rapidly, and access to high-quality, domain-specific data has become a critical factor in the success of AI projects. Because data is now treated as a form of capital, there have been numerous efforts to establish data marketplaces that let data circulate efficiently and transparently, which in turn benefits the growth and adoption of AI technologies. However, although both decentralized and centralized data marketplaces exist, none has proven effective in practice, owing to several inherent challenges.

Problems with Existing Data Marketplaces

1. Limited and Unbalanced Supply

One of the primary issues facing current data marketplaces is the limited and unbalanced supply of data. Studies have shown that 90% of AI projects fail for lack of suitable data for training or fine-tuning models. Demand for data is highly customized, while the supply of data, especially domain-specific data, is very limited. This imbalance creates a significant bottleneck for AI development and adoption.

2. High Friction Cost

Data is a unique commodity, and its value is highly dependent on its quality. However, the quality of data cannot be fully assessed until it is purchased, leading to a high risk of scams and fraud in the market. To mitigate this risk, buyers often rely on trustworthy intermediaries to inspect and verify the data quality before making a purchase. This reliance on middlemen adds significant friction costs to the data acquisition process, making it expensive and inefficient for organizations.

Trust-less Synthetic Data Marketplace with MIZU

To address the challenges of the existing data marketplace, we introduce MIZU, a data generation network that enables the creation of the first trust-less synthetic data marketplace. MIZU targets both problems described above: limited supply and high friction costs.

1. Unlimited and Demand-Driven Supply

MIZU’s data generation network empowers developers to build and deploy customized data pipelines for generating synthetic datasets. The flexibility of the network allows developers to generate data that meets specific requirements, fulfilling the demand for customized data while simultaneously increasing the overall data supply. With MIZU, data consumers can post their demands first, and developers can subsequently create tailored data pipelines to address those specific use cases.
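The demand-first flow can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the DataRequest fields, the sentiment_pipeline example, and the fulfills check are hypothetical names, not part of any published MIZU interface.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataRequest:
    """A hypothetical demand posted by a data consumer."""
    request_id: str
    required_fields: Tuple[str, ...]
    num_records: int


def sentiment_pipeline(request: DataRequest, seed: int = 0) -> List[Dict[str, str]]:
    """A toy pipeline a developer might write in response to the request."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    labels = ["positive", "negative", "neutral"]
    return [
        {"text": f"sample review {i}", "label": rng.choice(labels)}
        for i in range(request.num_records)
    ]


def fulfills(request: DataRequest, records: List[Dict[str, str]]) -> bool:
    """Check that the generated records match what the consumer asked for."""
    if len(records) != request.num_records:
        return False
    required = set(request.required_fields)
    return all(required.issubset(record.keys()) for record in records)


request = DataRequest("req-001", ("text", "label"), num_records=5)
records = sentiment_pipeline(request, seed=42)
print(fulfills(request, records))
```

The point of the sketch is the ordering: the request exists before the pipeline does, so the pipeline is written against a concrete specification rather than a guess about what buyers want.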

2. Trust-less Data Quality Evaluation

MIZU ensures that all datasets within its network are generated by open-source data pipelines. This transparency enables data buyers to easily track and verify the data generation process on-chain and evaluate the quality of the data by assessing the data pipelines. Buyers can also run the data pipeline to generate sample data for further analysis before committing to the purchase of the final datasets. By eliminating the need for intermediaries and providing a trust-less environment, MIZU streamlines the data trading process, making it efficient and trustworthy.
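One way a buyer might check a pipeline before purchase is sketched below. It assumes that a SHA-256 digest of the pipeline source is what gets recorded on-chain and that the pipeline exposes a generate(seed, n) function; both are assumptions for illustration, since the text does not describe the on-chain record format. The registered digest is passed in as a plain string rather than read from any real chain.

```python
import hashlib
from typing import Dict, List

# Hypothetical open-source pipeline, distributed as source text.
PIPELINE_SOURCE = '''
def generate(seed, n):
    import random
    rng = random.Random(seed)
    return [{"id": i, "value": rng.randint(0, 99)} for i in range(n)]
'''


def pipeline_digest(source: str) -> str:
    """Digest of the pipeline source (assumed to be what is registered)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def verify_and_sample(source: str, registered_digest: str,
                      seed: int, n: int) -> List[Dict[str, int]]:
    """Refuse to run a pipeline whose source differs from the registered one,
    then generate a small sample for inspection."""
    if pipeline_digest(source) != registered_digest:
        raise ValueError("pipeline source does not match the registered digest")
    namespace: dict = {}
    # A real buyer would run untrusted pipeline code in a sandbox.
    exec(source, namespace)
    return namespace["generate"](seed, n)


registered = pipeline_digest(PIPELINE_SOURCE)  # stands in for the on-chain value
sample = verify_and_sample(PIPELINE_SOURCE, registered, seed=7, n=3)
print(len(sample))
```

Because the pipeline is seeded, the buyer can regenerate the same sample at any time and compare it against what the seller delivers.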

Miscellaneous

Dataset Discovery via Staking

To promote high-quality datasets within the MIZU network, a rating mechanism will be introduced. Community members can stake MIZU tokens to vote for specific datasets. Datasets with high rankings will have better monetization opportunities, and a portion of the revenue generated will be shared among the stakers. This incentivizes the community to actively participate in the curation and promotion of valuable datasets.
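The text does not specify how rankings are computed or how revenue is divided. As a minimal sketch, assume datasets are ranked by total staked tokens and that the stakers' portion of revenue is split pro rata by stake. The 25% staker share, the participant names, and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (staker, dataset) -> staked MIZU tokens
Stakes = Dict[Tuple[str, str], float]


def rank_datasets(stakes: Stakes) -> List[Tuple[str, float]]:
    """Rank datasets by total stake, highest first (assumed ranking rule)."""
    totals: Dict[str, float] = defaultdict(float)
    for (_staker, dataset), amount in stakes.items():
        totals[dataset] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def staker_payouts(stakes: Stakes, dataset: str, revenue: float,
                   staker_share: float) -> Dict[str, float]:
    """Split the stakers' portion of a dataset's revenue pro rata by stake."""
    pool = revenue * staker_share
    on_dataset = {s: a for (s, d), a in stakes.items() if d == dataset}
    total = sum(on_dataset.values())
    if total == 0:
        return {}
    return {staker: pool * amount / total for staker, amount in on_dataset.items()}


stakes = {
    ("alice", "dataset_a"): 300.0,
    ("bob", "dataset_a"): 100.0,
    ("carol", "dataset_b"): 250.0,
}
print(rank_datasets(stakes))
print(staker_payouts(stakes, "dataset_a", revenue=1000.0, staker_share=0.25))
```

Under these assumptions, a staker's payout grows both with their share of the stake on a dataset and with that dataset's revenue, which is what ties curation effort to dataset quality.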

Dataset Pricing

For popular datasets, data owners can adopt a per-copy pricing model. Because a dataset can be sold to many buyers, the price of each copy can be set below the production cost while total revenue across all copies still exceeds that cost, giving the data owner a higher overall income than a single sale would. The per-copy pricing model makes high-quality datasets more accessible to a wider range of buyers, fostering the growth and adoption of AI technologies.
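The arithmetic behind per-copy pricing can be made concrete with a short sketch. The production cost, price, and sales figures are hypothetical numbers chosen for illustration, and the function names are not part of MIZU.

```python
import math


def break_even_copies(production_cost: float, price_per_copy: float) -> int:
    """Smallest number of copies whose total revenue covers the production cost."""
    if price_per_copy <= 0:
        raise ValueError("price_per_copy must be positive")
    return math.ceil(production_cost / price_per_copy)


def owner_profit(production_cost: float, price_per_copy: float,
                 copies_sold: int) -> float:
    """Total revenue from all copies minus the one-time production cost."""
    return price_per_copy * copies_sold - production_cost


# Hypothetical dataset: costs 10,000 to produce, sold at 500 per copy.
cost, price = 10_000, 500
print(break_even_copies(cost, price))   # copies needed to recover the cost
print(owner_profit(cost, price, 50))    # profit if 50 copies are sold
```

With these figures each copy costs a buyer one twentieth of the production cost, yet the owner recovers that cost after 20 sales and profits on every sale beyond that.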