Responsible Training Data: Three Important Aspects

There is no doubt that artificial intelligence (AI) will continue to develop rapidly over the next few years and become ever more closely woven into our daily lives. Businesses must now take responsibility for implementing responsible AI: maximizing transparency, reducing bias, and guiding the ethical application of the technology. After all, a well-functioning AI should serve everyone fairly. The decisions made today about responsible policies and agreements will determine the future of AI and, in turn, how AI shapes our own future. Data is the foundation of these efforts. It sits at the heart of every AI technique and directly impacts model performance; a model is only as good as the data it is trained on, which is why data is a key area where AI practitioners can genuinely make a difference when setting governance practices. In AI projects, data scientists spend most of their time collecting and labeling data, and in doing so there are three aspects that matter most: protecting data privacy, reducing data bias, and acquiring data ethically.

Data Privacy

As an AI practitioner, your foremost concern should be data privacy and security. There is already legislation in this area, and an organization's data-processing agreements must be compliant: examples include the internationally recognized ISO standards on personal-information protection, the European Union's General Data Protection Regulation (GDPR), and similar requirements around the world. Your business must adhere to the data standards of every region where it operates and serves customers. In some parts of the world, data-protection regulations may not exist or may not be harmonized; in any case, working toward responsible AI means adopting data-security management measures and protecting the people who supply your data. Individual consent should be obtained before personal data is used, and safeguards should be in place to prevent any improper use of personally identifiable information (PII). If it is unclear what kind of security protocols belong in your data-management practices, consider partnering with a third-party data provider for data collection: such providers already have security protocols in place and can guide you in handling data safely.
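
As an illustration, here is a minimal Python sketch of one such safeguard: pseudonymizing direct identifiers and masking obvious PII patterns before records enter a training set. The field names (`speaker_id`, `transcript`, `consent_on_file`), the salt handling, and the regexes are hypothetical placeholders for illustration, not a prescribed schema.

```python
import hashlib
import re

# Hypothetical salt; in practice, keep secrets out of source code.
SALT = "replace-with-a-secret-salt"

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable, irreversible token."""
    return hashlib.sha256((SALT + identifier).encode("utf-8")).hexdigest()[:16]

def redact_free_text(text: str) -> str:
    """Mask obvious PII patterns (emails, phone numbers) in free text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def sanitize_record(record: dict) -> dict:
    """Return a copy of a training record with direct identifiers removed."""
    return {
        "speaker_id": pseudonymize(record["speaker_id"]),  # stable join key
        "transcript": redact_free_text(record["transcript"]),
        "consent_on_file": record.get("consent_on_file", False),
    }

record = {
    "speaker_id": "user-4821",
    "transcript": "Call me at +1 555 010 4477 or jane@example.com.",
    "consent_on_file": True,
}
print(sanitize_record(record))
```

Pattern-based redaction like this is only a first line of defense; dedicated PII-detection tooling and human review remain necessary for anything sensitive.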

Data Bias

It is a simple fact of AI development that biased data leads to biased results. But once you consider that every collection method can inadvertently introduce bias into an AI model, the situation becomes much more complicated. For example, say you are building a speech recognition model, perhaps for a car. Speech itself varies in pitch, accent, filler words, and grammar (not to mention languages and dialects). If you want the model to work for drivers of different demographics and backgrounds, you need data that represents each use case. If you collect mostly male voices, the resulting model will often struggle to recognize female voices; indeed, many mainstream voice-based products on the market today share this problem because their models were not exposed to enough variety during training. The challenge, then, is to curate a complete and fair dataset that covers all use cases and edge cases. If you want to create an AI product that works for every user, first make sure the training data covers all users.
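
One concrete practice is to audit how the corpus is distributed across the attributes you care about before training. Below is a minimal sketch of such an audit; the metadata fields (`speaker_gender`, `accent`) and the `min_share` thresholds are assumptions for illustration, not a standard.

```python
from collections import Counter

# Hypothetical utterance metadata; real corpora carry similar fields.
utterances = [
    {"speaker_gender": "female", "accent": "US", "duration_s": 4.2},
    {"speaker_gender": "male", "accent": "US", "duration_s": 3.1},
    {"speaker_gender": "male", "accent": "UK", "duration_s": 5.0},
    {"speaker_gender": "male", "accent": "US", "duration_s": 2.7},
]

def coverage_report(rows, field, min_share=0.2):
    """Flag groups whose share of the corpus falls below min_share."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    for group, n in counts.most_common():
        share = n / total
        flag = "  <-- underrepresented" if share < min_share else ""
        print(f"{field}={group}: {n} utterances ({share:.0%}){flag}")

coverage_report(utterances, "speaker_gender", min_share=0.4)
coverage_report(utterances, "accent", min_share=0.3)
```

A report like this only surfaces gaps; closing them still means collecting more data from the underrepresented groups rather than simply reweighting what you have.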

Data Acquisition

By data acquisition we mean the ethics of how data is sourced and how the people who prepare it are treated. Ideally, if you provide data, you should be compensated for it (and should know that you are the provider), whether in money or in services. In practice, a great deal of data is collected without our knowledge, and the lines of data ownership are blurry. For example, if you are on a video call for work, who owns the rights to the voice data the call generates? Your company? The video-calling provider? The call participants? In all cases, companies committed to responsible AI should be open about what data they collect, from whom, and when, and should, where possible, appropriately compensate the individuals who provide it. Acquiring data is not always the hard part, though; making it usable is often the bigger effort. You need many people to clean and filter the data so that it is valuable to the project, and even more to annotate it with accurate labels. These people must be treated fairly: fair pay, open lines of communication, privacy, and decent working conditions. Legislation here consists largely of employment law and laws prohibiting modern slavery, but companies can go a step further and ensure their data labelers are treated ethically.
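
One lightweight way to make that openness auditable is to attach a provenance and consent record to every contributed item and check it before any reuse. The sketch below is illustrative only; the `ProvenanceRecord` fields are hypothetical, not an established schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """Minimal audit trail attached to each contributed data item."""
    contributor_id: str          # pseudonymized, never a real name
    collected_at: datetime
    purpose: str                 # what the contributor agreed to
    consent_obtained: bool
    compensation: str            # e.g. payment, service credit
    labeler_ids: list = field(default_factory=list)

def can_use_for(record: ProvenanceRecord, purpose: str) -> bool:
    """Only use data for the purpose the contributor consented to."""
    return record.consent_obtained and record.purpose == purpose

rec = ProvenanceRecord(
    contributor_id="c7f3a1",
    collected_at=datetime.now(timezone.utc),
    purpose="speech model training",
    consent_obtained=True,
    compensation="paid per recording",
)
print(can_use_for(rec, "speech model training"))  # True
print(can_use_for(rec, "voice cloning"))          # False
```

Keeping the purpose field explicit makes consent checkable in code, so data collected for one model cannot silently drift into another project.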
