Introduction
As the reform of market-oriented allocation of data elements deepens, institutional development, value release, and collaborative governance have become key issues in promoting high-quality development of the digital economy. Against this backdrop, experts, scholars, local government officials, and business representatives gathered to explore new paths for the high-quality development of data elements.
Reducing Data Bias in AI Training
Question: With the stock of available training data gradually reaching its limit, what will be the future trends in the sources and construction methods of training data for large models?
Zhang Linghan: Data is the core foundational element of AI model training and is essential for large models to compete through differentiation and to continuously advance their capabilities. The quality of the training corpus directly determines the capabilities of large AI models and affects the compliance and fairness of their output. Future corpus construction should focus on three dimensions. First, clarify the legitimacy of network data sources, excluding unauthorized personal information, infringing content, and illegal data from the training corpus to prevent low-quality and harmful data from entering training. Second, coordinate copyright rules to clarify the reasonable boundaries of offline data use, balancing data utilization with copyright protection so that copyright disputes do not choke off data supply. Third, promote cross-domain rules for data circulation and transactions, improve data supply incentive mechanisms, and encourage lawful, compliant data sharing and trading, providing institutional guarantees for high-quality corpus construction. Compared with market data, data held by public service institutions such as government departments and research institutes is inherently authoritative, accurate, and broad in coverage; it can enrich the dimensions of training data, effectively reduce data bias in model training, and enhance the fairness and reliability of AI output.
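The source-legitimacy screening described above can be pictured as a simple compliance filter over corpus records. This is an illustrative sketch only: the record fields, risk flags, and function names are assumptions for demonstration, not any prescribed standard, and real pipelines would rely on upstream PII and copyright detectors.

```python
# Hypothetical sketch of a corpus compliance filter: keep only records
# whose source is authorized and which carry no unresolved personal-
# information or copyright flags. All field names are assumptions.
from dataclasses import dataclass

@dataclass
class CorpusRecord:
    text: str
    source: str              # e.g. "public-sector", "licensed", "web-crawl"
    licensed: bool           # provenance/authorization confirmed
    contains_pii: bool       # flagged by an assumed upstream PII detector
    copyright_disputed: bool # flagged by an assumed rights-clearance step

def is_compliant(rec: CorpusRecord) -> bool:
    """A record enters training only if its source is lawful and clear."""
    return rec.licensed and not rec.contains_pii and not rec.copyright_disputed

def filter_corpus(records):
    """Drop unauthorized, infringing, or illegal records before training."""
    return [r for r in records if is_compliant(r)]

sample = [
    CorpusRecord("open government statistics", "public-sector", True, False, False),
    CorpusRecord("scraped forum post with names", "web-crawl", False, True, False),
]
print(len(filter_corpus(sample)))  # 1
```

In this toy run, only the authorized public-sector record survives the filter, mirroring the point that public-institution data can both enrich and de-bias the training corpus.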
Adapting Regulatory Models to AI Technology
Question: How should we optimize governance and regulation of AI and algorithms in the face of rapid technological iteration?
Zhang Linghan: Current AI and algorithm governance can no longer rely solely on after-the-fact remedies; the regulatory focus should shift toward prevention and in-process control, adapting regulatory models to the pace of AI iteration. A more comprehensive preventive governance system should be established, improving core institutional tools such as filing, labeling, evaluation, safe harbors, and regulatory sandboxes. Strengthening real-time oversight of AI and algorithms is essential to achieving transparent, standardized regulation. Based on the principle of information disclosure, we should enhance algorithm transparency, requiring companies to disclose the data sources, decision-making processes, and logic of algorithms that affect public interests and personal rights. We should also conduct algorithm impact assessments grounded in public participation, focusing on potential risks such as algorithmic bias, data abuse, and harm to rights, and inviting the public, experts, and regulators to participate so that errors and biases in algorithms are identified and corrected promptly. Finally, we need to implement reason-giving rules: when AI makes decisions affecting users' rights, the basis, process, and rationale must be clearly explained to them, safeguarding their rights to know and to supervise.
Establishing Reasonable Trust Standards
Question: In the deep application of AI, how can we prevent damage caused by AI hallucinations? If damage occurs due to hallucinations, how should we delineate responsibility boundaries?
Zhang Linghan: If users accept erroneous content generated by AI hallucinations, their rights may be harmed. Service providers must inform users of the risks and guide them to trust AI rationally, reducing harm from hallucinations at the source. AI service providers should be required to prominently remind users that "this content is generated by AI and is for reference only," guiding users to view AI output rationally and minimizing the risk of blind trust. It is also crucial to clarify the boundaries of the reasonable-trust standard for highly capable AI. When AI systems approach or even exceed the cognitive abilities of ordinary users, the standard for users' reasonable trust in AI-generated content can vary significantly. Institutional design must therefore specify the conditions and standards under which users may reasonably rely on generated content, tailored to different scenarios. Finally, the duty of care and the distribution of responsibility among model providers, system deployers, and tool providers must be clarified. Generative AI systems are typically composed of models, platforms, and tools, which differ significantly in technical control, risk foreseeability, and actual involvement. The strength of each party's duty of care should be assessed based on model generality, the risk level of the application scenario, and the specific product design and deployment method.
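The two duties described above, prominent labeling and a scenario-dependent trust standard, can be sketched in a few lines. This is a hypothetical illustration: the notice wording, the risk tiers, and the function names are assumptions for demonstration, not a statutory formula.

```python
# Hypothetical sketch: (1) prepend a prominent AI-generation notice to
# model output, and (2) vary the reliance standard by scenario risk.
# Notice text and risk categories are assumptions, not legal wording.

AI_NOTICE = "This content is generated by AI and is for reference only."

def label_ai_output(text: str) -> str:
    """Prepend the disclosure notice so users see it before the content."""
    return f"[{AI_NOTICE}]\n{text}"

def requires_human_review(scenario_risk: str) -> bool:
    """Higher-risk scenarios warrant a stricter trust standard: output
    should not be relied upon without human review. Tiers are illustrative."""
    return scenario_risk in {"medical", "legal", "financial"}

print(label_ai_output("Example answer."))
print(requires_human_review("medical"))  # True
print(requires_human_review("casual"))   # False
```

The point of the second function is the scenario-tailoring argument: the same generated text carries a different reasonable-trust standard in a medical consultation than in casual use, so the duty of care attaches differently.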