Is your Org Data ready for AI?
Nowadays everyone is talking about Artificial Intelligence (AI). AI is definitely a very big shift. In the coming years it will get even bigger, and it may change how we do business and how we work. We are still at an early stage, though, and a lot remains to be done.
Many organizations are still working out which use case is the best one for trying AI.
Before any organization dives deep into its AI journey, I think a solid data management and data strategy plan is the key to a successful AI project. If you have bad data, your AI app can’t do much: AI depends completely on the data behind it, because that is what it operates on. So, before planning an AI project, finding out the quality of the data in your organization should be the top priority.
This document will help you assess the quality of the data in your org and give you some ideas on how to improve it.
Do you have Good or Bad Data?
Data that is complete, accurate, and valid, and that can be used for business processes, is considered good data. So how do you decide? You need to answer the following questions:
- What is the business objective and what data is required to support that?
- How are you using that data?
- Where is that data stored?
- Is the data static, or does it keep getting updated?
- Is there any governance standard for the data?
Once you have answered the questions above, run some reports on the data. Assuming the data is stored in Data Cloud or in a Salesforce object, run the reports and look for the following:
- Do you have all the data, or is some missing? Identify the stakeholders who can help you find out.
- Do you have duplicate records in the data set?
- Does the data follow a standard pattern? Or does California appear as ‘CA’, ‘Calif’, ‘California’, ‘Cali’, etc.? 😊
- Identify the key data and check whether all of it is present.
- Check the last-updated date. If the data has not been updated for a long time, it is stale. According to one survey, 20 CEOs leave their jobs every 30 minutes, so you can imagine how fast data changes in the real world.
- If you are building an AI project to answer customers’ FAQs quickly, check that you have up-to-date knowledge articles in place and when they were last updated.
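The checks above can also be sketched in code. The following is a minimal Python illustration, assuming the records have been exported from a report as a list of dictionaries. The field names (`Name`, `Email`, `State`, `LastModifiedDate`), the alias table, and the one-year staleness threshold are all assumptions made for the example, not part of any Salesforce API.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical mapping of free-text variants to one standard value.
STATE_ALIASES = {"ca": "CA", "calif": "CA", "cali": "CA", "california": "CA"}

# Assumed key fields; replace with the attributes your business objective needs.
REQUIRED_FIELDS = ("Name", "Email", "State")


def missing_fields(record):
    """Return the required fields that are empty or absent in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def duplicate_keys(records, key="Email"):
    """Return key values (compared case-insensitively) that appear more than once."""
    counts = Counter(
        str(r[key]).strip().lower() for r in records if r.get(key)
    )
    return sorted(k for k, n in counts.items() if n > 1)


def normalize_state(value):
    """Map known variants to the standard code; leave unknown values unchanged."""
    cleaned = value.strip()
    return STATE_ALIASES.get(cleaned.lower(), cleaned)


def is_stale(last_modified, today, max_age_days=365):
    """A record counts as stale if it was not updated within max_age_days."""
    return (today - last_modified) > timedelta(days=max_age_days)


# Made-up sample records for illustration only.
records = [
    {"Name": "Acme", "Email": "ops@acme.example", "State": "Calif",
     "LastModifiedDate": date(2021, 3, 1)},
    {"Name": "Acme Inc", "Email": "OPS@acme.example", "State": "CA",
     "LastModifiedDate": date(2024, 11, 5)},
    {"Name": "Globex", "Email": "", "State": "California",
     "LastModifiedDate": date(2024, 12, 20)},
]
today = date(2025, 1, 1)

for r in records:
    print(r["Name"], missing_fields(r), normalize_state(r["State"]),
          is_stale(r["LastModifiedDate"], today))
print("Duplicate emails:", duplicate_keys(records))
```

Matching duplicates on a single field is a deliberate simplification; real duplicate detection usually compares several attributes and tolerates small spelling differences.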
Data management Plan:
The first step toward better data quality is to have a data management plan.
- Set a standard for data quality. Give each record a score based on its data attributes: assign a value for record age, accuracy of key attributes, duplicates, etc. Make that score a formula field and make it visible in reports to key management people, so that the right people can see whether the data is being managed properly.
- Use proper formatting for certain fields, e.g. date format, currency, etc.
- Wherever possible, use picklists (drop-downs) rather than text fields to minimize data errors.
- Determine who owns the records and who is responsible for reviewing and updating them, and assign those people as owners.
- Determine the appropriate level of security and privacy for data.
- Determine the data monitoring process.
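To make the record-scoring idea from the first point concrete, here is a rough Python sketch of how such a score could be computed. In Salesforce this logic would live in a formula field; the weights, the key fields, and the linear freshness decay below are assumptions chosen for illustration, and a real plan would agree them with the data owners.

```python
from datetime import date

# Assumed weights that add up to 100; not a Salesforce standard.
WEIGHTS = {"completeness": 40, "freshness": 40, "uniqueness": 20}

# Hypothetical key attributes for the record type being scored.
KEY_FIELDS = ("Name", "Email", "Phone")


def quality_score(record, today, is_duplicate=False, max_age_days=365):
    """Score a record from 0 (poor) to 100 (good)."""
    # Completeness: share of key fields that are filled in.
    filled = sum(1 for f in KEY_FIELDS if str(record.get(f) or "").strip())
    completeness = filled / len(KEY_FIELDS)

    # Freshness: falls linearly from 1 (updated today) to 0
    # (not updated for max_age_days or longer).
    age_days = (today - record["LastModifiedDate"]).days
    freshness = min(1.0, max(0.0, 1.0 - age_days / max_age_days))

    # Uniqueness: a record flagged as a duplicate earns nothing here.
    uniqueness = 0.0 if is_duplicate else 1.0

    score = (WEIGHTS["completeness"] * completeness
             + WEIGHTS["freshness"] * freshness
             + WEIGHTS["uniqueness"] * uniqueness)
    return round(score)


# Made-up record for illustration only.
record = {"Name": "Acme", "Email": "ops@acme.example", "Phone": "",
          "LastModifiedDate": date(2025, 1, 1)}
print(quality_score(record, today=date(2025, 1, 1)))
```

A single number per record is easy to put on a dashboard, which is the point of the exercise: managers can see at a glance whether scores are trending up or down.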
How to implement a data management plan in Salesforce:
- Mark the key fields as required, so that people do not miss those data attributes.
- Use validation rules to improve data quality, so that data standards are maintained.
- Create Flows to automate processes. If you can automate something, invest the time to do it; it will reduce the chance of errors.
- Create separate page layouts and assign them to the right audiences, so that people see only the information relevant to them.
- Create reports and dashboards and make them available to the right management people, so that action can be taken in time and people get involved.
- Use AppExchange apps to validate the data or remove duplicates as needed.
- Use custom fields as needed to improve data quality, e.g. a picklist for a fixed set of values, or date fields that enforce a proper date format.
Data Risk areas:
Once you have clean data, you also need to keep in mind the risks involved with data and AI projects. There are 3 main risk areas:
1. Data Leaks:
Data leaks happen when you expose data to an unauthorized person, or expose data that is not supposed to be exposed at all, e.g. personally identifiable information (PII). To prevent data leaks, we should have a secure data retrieval policy, a data masking policy, and a zero-data-retention policy in place (the model should not store the data; once the work is over, the model should no longer have access to any of it).
2. Regulatory Requirements:
Each country and each industry has its own regulatory requirements, e.g. CCPA, GDPR, HIPAA, and the EU AI Act. We need to be aware of these regulations and apply the right policies based on the industry and the country where the business operates.
3. Reputation Harm:
AI can sometimes generate harmful or irrelevant content if the model’s training material was inadequate, or as the result of a cyber-attack. So we should have technical guardrails to protect against cyber-attacks, and a screening process in place to review responses for toxicity.
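As an illustration of the data masking idea mentioned under data leaks, the sketch below replaces email addresses and US-style phone numbers with placeholders before text is sent to a model. The two patterns and the function name `mask_pii` are simplified assumptions for the example; real PII detection needs much broader coverage (names, addresses, IDs) and is usually handled by a dedicated platform feature or tool.

```python
import re

# Simplified patterns for illustration; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text):
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


# Made-up prompt for illustration only.
prompt = "Customer jane.doe@example.com called from 555-123-4567 about a refund."
print(mask_pii(prompt))
```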
If multiple data risks are possible in your AI project, you can’t handle all of them at once. In that case, use a scoring mechanism to capture the severity of each risk and the likelihood of it happening. Give each category a score, handle the category with the highest risk score first, and work down from there.
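One common way to build such a score, assumed here for illustration, is to rate severity and likelihood on a 1 to 5 scale and multiply them. The ratings in this Python sketch are placeholders, not an assessment of any real project.

```python
# Hypothetical 1-5 ratings for each risk category (placeholder values).
risks = [
    {"category": "Data leaks", "severity": 5, "likelihood": 3},
    {"category": "Regulatory requirements", "severity": 4, "likelihood": 2},
    {"category": "Reputation harm", "severity": 3, "likelihood": 4},
]


def risk_score(risk):
    """Assumed scoring rule: severity multiplied by likelihood."""
    return risk["severity"] * risk["likelihood"]


def prioritize(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=risk_score, reverse=True)


for r in prioritize(risks):
    print(f'{r["category"]}: {risk_score(r)}')
```

With these placeholder ratings, data leaks would be handled first, then reputation harm, then regulatory requirements. Your own ratings will differ, so the order will too.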
Thank you for reading. I hope this document gives you some insight into data quality and the possible risks involved with your AI project.