Data is the lifeblood of generative AI applications: these apps are ultimately only as good as the data they train on. It is therefore critical to maintain policies and procedures specifically designed to ensure a continuous supply of high-quality data. I refer to this overall effort as “data stewardship,” and below is a (very) rough draft of what it looks like. (Those of you familiar with the CIS-20 Cybersecurity Controls will recognize the structural similarity.) The framework can also be used by data consumers (i.e., companies that build generative AI applications) and by AI auditors.
Basic Controls
- Data Inventory Controls
- Continuous Data Vulnerability Management (ties in with data observability practices)
- Secure Configuration for Data
- Maintenance, Monitoring, and Analysis of Data
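To make the “continuous data vulnerability management” control concrete: in practice it often reduces to running declarative quality rules against every incoming batch, in the spirit of data observability tooling. A minimal sketch in Python (the rule names, record shape, and thresholds here are illustrative assumptions, not part of any standard):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityRule:
    """A named check applied to every incoming batch of records."""
    name: str
    check: Callable[[list[dict]], bool]

def run_rules(batch: list[dict], rules: list[QualityRule]) -> list[str]:
    """Return the names of all rules the batch violates."""
    return [r.name for r in rules if not r.check(batch)]

# Illustrative rules: non-empty batch, no missing labels, bounded duplication.
rules = [
    QualityRule("non_empty", lambda b: len(b) > 0),
    QualityRule("labels_present",
                lambda b: all(rec.get("label") is not None for rec in b)),
    QualityRule("low_duplication",
                lambda b: len({rec["text"] for rec in b}) / max(len(b), 1) > 0.9),
]

batch = [{"text": "a", "label": 1}, {"text": "b", "label": None}]
violations = run_rules(batch, rules)  # flags the missing label
```

A batch that violates any rule would be quarantined rather than passed downstream; the same rule registry doubles as the artifact an auditor reviews.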
Foundational Controls
- Data Storage Protections
- Data Threat Defenses
- Data Provenance Protections
- Secure Configuration for all Data Sources
- Data Sources Boundary Defense
- Controlled Access to Data Sources
- Audit and Control (for the above)
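Of the foundational controls, data provenance is the most mechanical: record, for every dataset snapshot, a tamper-evident fingerprint tied to its source. A hedged sketch using content hashing (the record shape, the `source` field, and the log-entry format are my assumptions, not a prescribed schema):

```python
import hashlib
import json

def fingerprint(records: list[dict]) -> str:
    """Deterministic SHA-256 over a canonical serialization of the records."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def provenance_entry(source: str, records: list[dict]) -> dict:
    """One append-only log entry tying a dataset snapshot to its origin."""
    return {"source": source,
            "sha256": fingerprint(records),
            "count": len(records)}

entry = provenance_entry("vendor-feed-1", [{"text": "a"}, {"text": "b"}])
# Any later mutation of the records changes the fingerprint,
# so re-hashing at audit time flags silent tampering.
```

Appending these entries to write-once storage gives the “Audit and Control” item something concrete to audit against.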
Organizational Controls
- Implement Data Stewardship Program
- Data Incident Response Management
- Fuzzing Tests and other Red Team Exercises
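A fuzzing exercise against the data path can start very simply: randomly corrupt known-good records and verify the ingestion code rejects them cleanly instead of crashing or silently accepting garbage. A minimal sketch (the `load_record` validator is a stand-in for a real ingestion API, and the mutation strategy is deliberately naive):

```python
import random

def load_record(raw: str) -> dict:
    """Stand-in ingestion validator: accepts only 'text=<non-empty>' inputs."""
    key, _, value = raw.partition("=")
    if key != "text" or not value:
        raise ValueError(f"rejected: {raw!r}")
    return {"text": value}

def mutate(seed: str, rng: random.Random) -> str:
    """Corrupt one random position of a valid input with a printable char."""
    i = rng.randrange(len(seed))
    return seed[:i] + chr(rng.randrange(32, 127)) + seed[i + 1:]

def fuzz(seed: str, trials: int = 100) -> int:
    """Count inputs the loader rejected; an unexpected crash surfaces here."""
    rng = random.Random(0)  # fixed seed so red-team runs are reproducible
    rejected = 0
    for _ in range(trials):
        try:
            load_record(mutate(seed, rng))
        except ValueError:
            rejected += 1
    return rejected
```

The interesting outcomes are the mutants that are *not* rejected: each one is a candidate hole in the ingestion boundary worth a manual look.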