Learning & Innovation Research Manager | Project Management Institute (PMI) | Spain
Data quality and quantity are particularly important as we think about leveraging AI on projects. Considerations include the diversity and comprehensiveness of the data available to us.
Have you ever encountered unexpected challenges or pitfalls while using data in your projects? How did you navigate the situation and find a resolution?
Randi Krueger | Lecturer of Management | Southern Utah University (SUU) | Cedar City, UT, United States
You have to have a good understanding of how reliable and accurate the data is. The philosophy of garbage in, garbage out applies in all data contexts.
Robert Taruwona | Director | Resource Towers Investments (Pvt) Ltd | Harare, Ha, Zimbabwe
The unexpected challenges with your project data could be the result of "unknown unknowns," which require pre-emptive strategies or mechanisms to be put in place. One such mechanism could be a stand-by crack team with members drawn from diverse backgrounds, such as data science, security, audit, and legal, to tackle these challenges rather than leaving them to the assigned PM alone. Those are my thoughts. Thank you.
Anonymous
We experience this all the time in the consulting world when integrating all kinds of data. The data is usually not clean or normalized, and it takes hours (and dollars) to pre-massage it into the shape a given project needs. For LLMs this could be less of a problem, because errors tend to average out across the large amounts of accumulated data. However, this will also homogenize the data and solutions and could lead to overall model collapse.
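To make the "pre-massage" step concrete, here is a minimal sketch of normalizing records that arrive from two sources with different field names, date formats, and currency formatting. The field aliases, date formats, and record shapes are hypothetical examples, not taken from any real project; a real integration would derive them from each source's documentation.

```python
from datetime import datetime

# Hypothetical mapping from each source's column names to one canonical schema.
FIELD_ALIASES = {
    "customer": "customer_id", "cust_id": "customer_id", "customer_id": "customer_id",
    "amt": "amount", "amount_usd": "amount", "amount": "amount",
    "date": "order_date", "order_dt": "order_date", "order_date": "order_date",
}

# Date formats assumed to appear across the sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(value: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def parse_amount(value) -> float:
    """Strip currency symbols and thousands separators, e.g. '$1,200.50' -> 1200.5."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(str(value).replace("$", "").replace(",", "").strip())


def normalize(record: dict) -> dict:
    """Rename fields to the canonical schema and coerce their types."""
    out = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key.strip().lower())
        if canonical is None:
            continue  # drop fields the target schema does not use
        out[canonical] = value
    out["customer_id"] = str(out["customer_id"]).strip().upper()
    out["amount"] = parse_amount(out["amount"])
    out["order_date"] = parse_date(out["order_date"])
    return out


if __name__ == "__main__":
    source_a = {"cust_id": " c-001", "amt": "$1,200.50", "order_dt": "03/15/2024"}
    source_b = {"Customer": "C-001", "amount_usd": 80, "date": "2024-03-16"}
    for rec in (source_a, source_b):
        print(normalize(rec))
```

Note that the order of `DATE_FORMATS` matters when a string is ambiguous (for example `03/04/2024`), so in practice each source's format is better pinned explicitly than guessed.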
I think it is very important to have good, accurate data if we want to leverage AI on projects. A few things to keep in mind:
1. Data Quality - we need good-quality data, which can be achieved through data cleaning.
2. Data Quantity - a sufficient quantity of data is required for the AI model to perform well. We need to take measures to obtain effective datasets.
3. Unclear data - for our AI model to function well, the data has to capture the problem we are trying to solve. This can be achieved by improving the features and building a dataset that represents the problem at hand.
4. Last but not least, we need to test and monitor our model to make sure it is working as expected.
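Points 1, 2 and 4 can be sketched in a few lines. This is a minimal illustration, assuming records are dicts with hypothetical `feature` and `label` fields, a hypothetical minimum row count, and a toy threshold rule standing in for a trained model; it shows the order of operations (clean, check quantity, hold out test data, evaluate), not a production workflow.

```python
import random

MIN_TRAINING_ROWS = 100  # hypothetical floor for "enough" training data


def clean(records, required=("feature", "label")):
    """Point 1: drop records missing a required field, then exact duplicates."""
    seen, cleaned = set(), []
    for rec in records:
        if any(rec.get(field) is None for field in required):
            continue
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the whole record
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle reproducibly and hold out a fraction of records for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, records):
    """Point 4: share of held-out records the model labels correctly."""
    hits = sum(1 for r in records if model(r["feature"]) == r["label"])
    return hits / len(records)


if __name__ == "__main__":
    rng = random.Random(0)
    raw = []
    for i in range(300):
        x = round(rng.random(), 3)
        raw.append({"id": i, "feature": x, "label": x > 0.5})
    raw.append(dict(raw[0]))                                 # exact duplicate
    raw.append({"id": 999, "feature": None, "label": True})  # missing value

    data = clean(raw)
    train, test = train_test_split(data)
    # Point 2: warn when the cleaned training set is too small to rely on.
    if len(train) < MIN_TRAINING_ROWS:
        print(f"warning: only {len(train)} training rows")

    model = lambda x: x > 0.5  # stand-in for a real trained model
    print(len(raw), "raw ->", len(data), "clean;", len(train), "train /", len(test), "test")
    print("held-out accuracy:", accuracy(model, test))
```

The important design choice is that the test set is split off before any model is fitted, so the accuracy figure reflects data the model has not seen.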
Anonymous
That issue is always a challenge, especially when the tool through which the data is collected isn't completely reliable or holistic. I often find that you have to narrow how that data is used, to avoid generalizing from data that is no longer useful once new data arrives.
When we face new challenges in the projects we work on, we can very often encounter many difficulties. When data is huge and heterogeneous, the complexity of our work grows dramatically. It is therefore important to constantly monitor the model to catch duplicate or unstructured data, as well as imprecise or inconsistent data.
Continuous oversight of the model and its outputs has helped me prevent reputational and legal issues.
Use an iterative approach to data analysis and modeling: test assumptions, validate hypotheses, and refine the methods based on feedback and insights gained from the data.
Pravat Jena | Senior Manager - Model Governance | Thomson Reuters | Bangalore, India
For AI/ML projects, data quality is a major concern. Once a model goes to production, we can't control the quality of the input data, so monitoring should be in place for models in production. It should cover model performance metrics, their thresholds, alerting frequency, the actions to be taken, and so on. Both the data quality and the model's predictions can change over time, so we have to monitor both data drift and model drift in production. For example, if the accuracy of the model's predictions goes down over time, we may have to retrain the model on the new dataset (in addition to the old training dataset).
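A monitoring policy like the one described above might be sketched as follows. This is a minimal illustration under stated assumptions: data drift is measured with the Population Stability Index (PSI) on a single numeric feature, model drift is a simple accuracy floor, and both thresholds in the hypothetical `POLICY` dict are placeholders (PSI > 0.25 is a commonly cited rule of thumb, not a universal standard). Real deployments would typically track many features and metrics on a schedule.

```python
import math

# Hypothetical monitoring policy: metric thresholds that trigger an alert.
POLICY = {
    "min_accuracy": 0.85,  # assumed acceptable floor for live accuracy
    "max_psi": 0.25,       # common rule of thumb for "significant" drift
}


def psi(expected, actual, bins=10):
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline's range; out-of-range production
    values are clamped into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins so the log term stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def check_model(baseline_feature, live_feature, live_accuracy, policy=POLICY):
    """Compare live data and performance with the policy; return alerts and an action."""
    alerts = []
    drift = psi(baseline_feature, live_feature)
    if drift > policy["max_psi"]:
        alerts.append(f"data drift: PSI={drift:.2f}")
    if live_accuracy < policy["min_accuracy"]:
        alerts.append(f"model drift: accuracy={live_accuracy:.2f}")
    action = "retrain on old + new data" if alerts else "no action"
    return alerts, action


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]        # feature values at training time
    shifted = [0.5 + i / 200 for i in range(100)]   # production values drifted upward
    print(check_model(baseline, baseline, live_accuracy=0.91))
    print(check_model(baseline, shifted, live_accuracy=0.78))
```

Checking the input distribution separately from accuracy matters because labels for production data often arrive late; PSI can flag a shift before the accuracy drop is measurable.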
Alan Wang | Data Engineer | Serra | California, United States
The most common challenge I've seen is that the larger the data grows, the harder it becomes to maintain, especially at the volumes AI currently needs. Putting proper guardrails in place around how your data flows is essential. These could include:
- Building out proper data architecture so that your data pipelines are scalable. I've seen many cases where people move data with one-off scripts, and as the data grows, those scripts stop being practical.
- Having proper observability into your data, which is also very important. This could mean unit tests/checks for what you expect to see at each level of the data pipeline.
- Doing proper QA and maintaining good communication between the data team and the data stakeholders (i.e., business users, data analysts, marketing, finance). The data team should know the context for the data these end users expect, and the end users should communicate consistently with the data team about what they expect, as it always changes.
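The idea of checks at each level of the pipeline can be sketched as stages that validate their own output before handing it on. This is a minimal, stdlib-only illustration; the `Check` class, the check helpers, and the `order_id`/`amount` schema are hypothetical, and dedicated data-quality tools offer much richer versions of the same pattern.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """A named expectation about a stage's output rows."""
    name: str
    passed: Callable[[List[dict]], bool]


def not_empty() -> Check:
    return Check("not_empty", lambda rows: len(rows) > 0)


def no_nulls(column: str) -> Check:
    return Check(f"no_nulls({column})",
                 lambda rows: all(r.get(column) is not None for r in rows))


def unique(column: str) -> Check:
    return Check(f"unique({column})",
                 lambda rows: len({r[column] for r in rows}) == len(rows))


def run_stage(name, func, rows, checks):
    """Run one pipeline stage, then validate its output before passing it on."""
    out = func(rows)
    failures = [c.name for c in checks if not c.passed(out)]
    if failures:
        raise ValueError(f"stage {name!r} failed checks: {failures}")
    print(f"stage {name!r}: {len(out)} rows, checks passed")
    return out


def dedupe_and_convert(rows):
    """Example transform: keep the first row per order_id, add amount in cents."""
    seen, out = set(), []
    for r in rows:
        if r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        out.append({**r, "amount_cents": round(r["amount"] * 100)})
    return out


if __name__ == "__main__":
    raw = [
        {"order_id": 1, "amount": 12.5},
        {"order_id": 1, "amount": 12.5},  # duplicate from a retried load
        {"order_id": 2, "amount": 3.99},
    ]
    extracted = run_stage("extract", list, raw, [not_empty(), no_nulls("amount")])
    transformed = run_stage("transform", dedupe_and_convert, extracted,
                            [unique("order_id"), no_nulls("amount_cents")])
    print(transformed)
```

Failing loudly at the stage where an expectation breaks makes it much easier to trace a problem than discovering it later in a dashboard or model.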
In a recent Forbes article, “Generative AI Exposes Users To New Security Risks,” technology and science writer Wayne Rash wrote about the overwhelming and very attractive benefits GenAI brings to business and consumer computing: tools that use GenAI to access your data and then use that access to make tasks easier and faster, support decision making, and provide insight into those tasks. However, Rash also relays warnings about new security risks from Adir Gruss, co-founder and CTO of Aim Security, who points out that those risks are directly related to the way GenAI functions.
According to Gruss, GenAI will introduce “significant and highly unique security challenges, particularly concerning personal privacy, security, and a range of ethical issues.” He claims that GenAI models create new attack vectors that are unique to them. These include prompt injection, which can bypass built-in safety measures: “When an attacker manipulates the output of an LLM (large language model) or GenAI chatbot to gain unauthorized access or to bypass security guardrails.” That is very alarming.
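One widely discussed way to limit the damage from prompt injection is to treat the model's output as untrusted input and only act on it through an allowlist, checked against the authenticated user's permissions. The sketch below is a hypothetical illustration of that least-privilege idea, assuming the model proposes actions as JSON; the action names, roles, and JSON shape are invented for the example. It does not prevent injection itself; it only constrains what an injected instruction can make the system do.

```python
import json

# Hypothetical allowlist: the only actions the assistant may trigger,
# which roles may trigger them, and which parameters each accepts.
ALLOWED_ACTIONS = {
    "search_docs": {"roles": {"viewer", "editor", "admin"}, "params": {"query"}},
    "create_ticket": {"roles": {"editor", "admin"}, "params": {"title", "body"}},
}


def validate_action(llm_output: str, user_role: str) -> dict:
    """Parse an action proposed by the model and reject anything not allowlisted.

    The model's output is treated as untrusted input: an injected prompt may
    change what the model proposes, but not what this check permits.
    `user_role` must come from the authenticated session, never from the model.
    """
    try:
        proposal = json.loads(llm_output)
    except json.JSONDecodeError:
        raise PermissionError("output is not a well-formed action") from None
    if not isinstance(proposal, dict):
        raise PermissionError("output is not a well-formed action")

    name = proposal.get("action")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {name!r} is not allowlisted")
    spec = ALLOWED_ACTIONS[name]
    if user_role not in spec["roles"]:
        raise PermissionError(f"role {user_role!r} may not run {name!r}")

    params = proposal.get("params", {})
    if not isinstance(params, dict) or not set(params) <= spec["params"]:
        raise PermissionError(f"unexpected parameters for {name!r}")
    return {"action": name, "params": params}


if __name__ == "__main__":
    attempts = [
        ('{"action": "search_docs", "params": {"query": "data drift"}}', "viewer"),
        ('{"action": "export_all_customers"}', "viewer"),  # injected request
        ('{"action": "create_ticket", "params": {"title": "x"}}', "viewer"),
    ]
    for output, role in attempts:
        try:
            print("allowed:", validate_action(output, role))
        except PermissionError as err:
            print("blocked:", err)
```

A check like this would sit alongside other controls (input filtering, output review, logging), since no single layer is a complete defense.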
Any thoughts on addressing such risks, folks?