Learning & Innovation Research Manager | Project Management Institute (PMI) | Spain
Data quality and quantity are particularly important as we think about leveraging AI on projects. Considerations include the diversity and comprehensiveness of the data that is available to us.
Have you ever encountered unexpected challenges or pitfalls while using data in your projects? How did you navigate the situation and find a resolution?
Elizabeth Massura | Principal Marketo Consultant | Acxiom | Chicago, IL, United States
Sometimes an idea for data to incorporate into a marketing initiative or dashboard brings to light that the data is not readily available. It might not be collected at all, or it's in a format that isn't very usable for the purpose at hand. We might talk about alternative data that is available to use that will get us part of the way there for the project at hand, as well as plan how to begin collecting and/or organizing the data so it can be used in the future.
Patricia White | Educator/Trainer | UMUC | Orange Park, FL, United States
How do you navigate unexpected data challenges in your projects?
You have to understand the problem, explore the data, clean it before using it, apply some of the advanced techniques, and find alternative resources if possible. You always have to adapt to unexpected data issues as they come up. Document everything so you can learn from the past.
Anonymous
Data accuracy is a big issue. Whenever we implement an automated process within our ERP system, we have to troubleshoot it several times to ensure accuracy.
Ryan Talavera | Program Manager | Elbit America | Merrimack, NH, United States
I've run into more challenges with data than I care to admit. When I've encountered these issues, I've typically engaged the data owner(s) and any required supporting resources to review the issue and the data that I believe is inaccurate. Once the issue has been confirmed, I've worked with the team to understand the cause of the error and (although not with 100% success) to ensure that the underlying cause is corrected in a way that prevents recurrence. I can confirm that, in the absence of periodic verification of any corrective action, issues are more likely than not to recur. The recurrence is usually due to someone pulling up an old template or reverting to the data source or process that originally caused the issue. This highlights the importance of robust data governance, including regular verification of data, models, and adherence to controls.
Understanding the data and preparing it in a well-structured form is essential. Discussing issues with data providers and making the necessary updates is also important for mitigating unexpected data challenges.
Data quality and quantity are critical when working on AI projects. Here are some challenges I've encountered or observed, along with strategies used to address them:
1. Data Insufficiency
Challenge: The dataset was too small or not representative enough to train an accurate model. This resulted in poor model performance and overfitting.
Resolution: Augmented the dataset using data synthesis techniques, such as data augmentation for image data or creating synthetic samples for tabular data. Additionally, sought out external datasets or combined multiple sources to increase diversity.
2. Data Imbalance
Challenge: Imbalanced classes led to a model that performed well on the majority class but poorly on the minority class.
Resolution: Applied techniques like oversampling the minority class (e.g., SMOTE) or undersampling the majority class. Used class-weighting in the model training phase to give more importance to the minority class. Also, considered using anomaly detection models if the imbalance was extreme.
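As a rough illustration of the class-weighting idea mentioned above, the sketch below computes inverse-frequency weights as n_samples / (n_classes * class_count), so rarer classes receive larger weights. The function name and the "ok"/"fraud" labels are hypothetical, chosen only for the example; this is not necessarily how the poster's models were weighted.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical labels: 90 majority ("ok") and 10 minority ("fraud") examples.
labels = ["ok"] * 90 + ["fraud"] * 10
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

Weights like these are typically passed to a training routine so that errors on the minority class cost more. Whether weighting, oversampling, or undersampling works best depends on the dataset.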
3. Data Inconsistency & Quality Issues
Challenge: Inconsistent formatting, missing values, and outliers affected model accuracy.
Resolution: Implemented rigorous data preprocessing steps, including standardizing formats, imputing missing values with appropriate strategies (mean, median, or model-based imputation), and normalizing/transforming data. For outliers, we either removed them or used robust algorithms less sensitive to such anomalies.
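A minimal sketch of two of the preprocessing steps named above, median imputation and outlier detection, using only the Python standard library. The 1.5 x IQR rule for flagging outliers and the sample values are assumptions made for illustration; the post does not say which outlier rule was used.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical column with two missing entries and one suspicious reading.
raw = [10, 12, None, 11, 13, 250, None, 12]
clean = impute_median(raw)
outliers = iqr_outliers(clean)

print("imputed:", clean)
print("flagged as outliers:", outliers)
```

The median is used here rather than the mean because a single extreme value (such as 250 in the sample) would pull a mean-based fill far from the typical reading.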
4. Biased Data
Challenge: Models trained on biased datasets produced skewed results that favored certain demographics.
Resolution: Conducted bias audits by analyzing model predictions across different subgroups. To mitigate this, balanced the dataset as much as possible, applied debiasing techniques, or even went back to collect more diverse data when feasible. Regularly engaged with domain experts to understand potential biases better.
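One simple form a bias audit can take is comparing a metric such as accuracy across subgroups. The sketch below assumes each record carries a group label; the group names "A" and "B" and the predictions are made up for illustration, and accuracy is only one of several metrics an audit might compare.

```python
from collections import defaultdict

def accuracy_by_group(groups, y_true, y_pred):
    """Return the fraction of correct predictions within each subgroup."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, truth, pred in zip(groups, y_true, y_pred):
        totals[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / totals[group] for group in totals}

# Hypothetical audit data: two subgroups with four records each.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

report = accuracy_by_group(groups, y_true, y_pred)
gap = max(report.values()) - min(report.values())

print(report)
print("largest accuracy gap between groups:", gap)
```

A large gap does not by itself prove the model is unfair, but it is a useful prompt to look at how each subgroup is represented in the training data.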
5. Data Privacy & Security Concerns
Challenge: Working with sensitive or personally identifiable information (PII) raised concerns about data privacy and regulatory compliance.
Resolution: Anonymized the data and implemented differential privacy techniques where appropriate. Worked closely with the legal and compliance teams to ensure all data usage adhered to relevant privacy laws. In some cases, opted for federated learning to train models without exposing raw data.
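To make the differential privacy idea above more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It assumes a query with sensitivity 1 (adding or removing one person changes the count by at most 1) and a privacy budget named epsilon. The function name and the age data are hypothetical, and a production system would normally rely on a vetted privacy library rather than hand-rolled noise.

```python
import random

def dp_count(records, predicate, epsilon, rng):
    """Return a noisy count using the Laplace mechanism (sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two Exponential(rate=epsilon) draws is Laplace
    # noise with scale 1/epsilon, the scale a sensitivity-1 query needs.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise

# Hypothetical records: ages of 200 people, 50 of them aged 65 or over.
ages = [30] * 150 + [70] * 50
rng = random.Random(0)  # seeded only so the sketch is repeatable
noisy = dp_count(ages, lambda age: age >= 65, epsilon=1.0, rng=rng)

print("noisy count of people aged 65+:", round(noisy, 2))
```

Smaller values of epsilon add more noise and give stronger privacy, at the cost of less accurate answers.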
6. Concept Drift
Challenge: Over time, the nature of the data changed, causing the model to become less effective (e.g., in financial or consumer behavior models).
Resolution: Set up continuous monitoring systems to detect concept drift and retrained models regularly or incorporated adaptive learning mechanisms. Established processes for frequent model validation and updating.
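Monitoring of the kind described above is often built on a distribution-shift score. Strictly speaking, the sketch below measures a shift in one input feature (data drift), which is a common proxy signal rather than a direct measure of concept drift. It computes the Population Stability Index (PSI) between a reference sample and a recent sample. The bin edges, the sample data, and the 0.25 alert threshold are assumptions for illustration; the threshold is a commonly quoted rule of thumb, not a standard.

```python
import math

def psi(expected, actual, edges):
    """Population Stability Index between a reference and a recent sample."""
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The bin index is the number of edges the value meets or exceeds.
            counts[sum(v >= e for e in edges)] += 1
        # Floor each share at a tiny value so the logarithm is always defined.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical feature values: a reference window and a shifted recent window.
reference = list(range(100))
recent = [x + 40 for x in range(100)]
edges = [25, 50, 75]

score = psi(reference, recent, edges)
if score > 0.25:  # assumed alert threshold
    print(f"PSI {score:.2f}: distribution shift detected, consider retraining")
```

In practice a score like this would be computed on a schedule for each important feature, with retraining or review triggered when it stays above the chosen threshold.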
Key Takeaway
The key is to be proactive about data assessment and constantly iterate on data cleaning and enrichment strategies. Having a flexible, adaptive approach and engaging with domain experts at various stages also significantly helps in resolving data-related challenges.
Have you faced similar issues, or do you foresee any specific data challenges in your upcoming projects?
Yes, I have encountered unexpected data challenges in my projects, particularly with data quality and representativeness. In one instance, I discovered that a dataset lacked diversity, which affected the reliability of the AI model we were developing. To address this, I conducted a thorough assessment of our data sources to identify gaps, then supplemented the dataset with additional sources to ensure more balanced representation. This iterative approach, along with close collaboration with stakeholders, helped us refine our model and achieve more accurate, inclusive outcomes.
Salman Chohan | Senior Project Manager | TPL Maps | Islamabad, IS, Pakistan
To navigate unexpected data challenges, we should continuously monitor the behavior of our data models and work on improving them, optimizing them, and adding new data sets.
Dale Nolan | Senior Services Consultant | GE Healthcare | Trophy Club, TX, United States
The most significant data breach threat I've encountered in my projects has come not from outside sources but from external team members intent on sabotaging the project for personal gain.
Anonymous
Hi Claudia, we focus heavily on the integrity of all data.