What do you guys think about putting too much unstructured data into GenAI, or the "machine" we want to "teach" our historical cases?

Artificial Intelligence

Bingye Yu Branson, Mo, United States

Hey everyone~ I was directed here from the GenAI Landscape page on the PMI website. As a human, I easily feel overwhelmed after being thrown tons of information, and I lose direction easily. I feel like I am taking advantage of these talented AIs...

Posted: May 10, 2024 3:46 AM


Mohamed El-Zanaty QA/QC Manager| Kharafi National SAE Alexandria, Egypt

Hi Bingye,

Putting too much unstructured data into a machine learning model, such as a GenAI model, without proper preprocessing and structuring can have several implications:

1. Noise and Irrelevance: Unstructured data often contains noise and irrelevant information. Without careful preprocessing, this noise can negatively impact the learning process, leading to inaccurate or biased models.

2. Scalability and Efficiency: Large volumes of unstructured data can pose scalability and efficiency challenges for machine learning algorithms. Processing and analyzing such data require significant computational resources and may result in longer training times.

3. Interpretability: Unstructured data may lack clear labels or metadata, making it challenging to interpret the outputs of machine learning models. Understanding how the model makes decisions and providing explanations for its predictions becomes more difficult when working with unstructured data.

4. Data Quality and Consistency: Unstructured data often varies in quality and consistency, which can affect the performance and reliability of machine learning models. It's essential to address data quality issues through preprocessing and data cleaning techniques before feeding the data into the model.
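A minimal sketch of the cleaning step described in point 4, assuming the historical cases are short free-text notes held in a Python list. The function names, the `min_words` threshold, and the sample cases are hypothetical illustrations, not part of any specific tool: the code normalizes text, drops near-empty records, and removes duplicates that only differ in case or punctuation.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, apply Unicode NFKC folding, strip punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)      # replace punctuation/symbols with spaces
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text


def clean_corpus(records: list[str], min_words: int = 3) -> list[str]:
    """Normalize records, drop near-empty ones, and remove duplicates after normalization."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in records:
        doc = normalize(raw)
        if len(doc.split()) < min_words:  # too short to carry useful signal
            continue
        if doc in seen:                   # same case once case/punctuation noise is removed
            continue
        seen.add(doc)
        cleaned.append(doc)
    return cleaned


# Hypothetical historical case notes
cases = [
    "Supplier delivered LATE; project slipped 2 weeks!",
    "supplier delivered late   project slipped 2 weeks",
    "N/A",
    "Scope change approved without a cost review.",
]
print(clean_corpus(cases))
# → ['supplier delivered late project slipped 2 weeks', 'scope change approved without a cost review']
```

Real preprocessing pipelines usually go further (spelling fixes, near-duplicate detection, removing personal data), but even this much keeps the obvious noise out of what the model sees.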

To mitigate these challenges, it's crucial to follow best practices for data preprocessing, feature engineering, and model validation. This includes tasks such as text normalization, feature extraction, and validation against labeled datasets to ensure the model's accuracy and generalization capabilities. Additionally, incorporating domain knowledge and human expertise can help guide the data preprocessing and model development process effectively.
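To make "feature extraction" and "validation against labeled datasets" more concrete, here is a small sketch under loud assumptions: bag-of-words term counts as the features, a nearest-centroid classifier (a deliberately simple stand-in technique, not what any particular GenAI product does), and a tiny hypothetical labeled set of cases tagged "schedule" or "cost". The point is the workflow: extract features, train on some labeled cases, then measure accuracy on cases the model has not seen.

```python
import math
from collections import Counter


def features(doc: str) -> Counter:
    """Bag-of-words term counts: a very simple form of feature extraction."""
    return Counter(doc.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(labeled: list[tuple[str, str]]) -> dict[str, Counter]:
    """Sum term counts per label, giving one prototype (centroid direction) per label."""
    prototypes: dict[str, Counter] = {}
    for doc, label in labeled:
        prototypes.setdefault(label, Counter()).update(features(doc))
    return prototypes


def predict(prototypes: dict[str, Counter], doc: str) -> str:
    """Pick the label whose prototype is most similar to the document."""
    vec = features(doc)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))


def accuracy(prototypes: dict[str, Counter], holdout: list[tuple[str, str]]) -> float:
    """Share of held-out labeled cases the model classifies correctly."""
    hits = sum(predict(prototypes, doc) == label for doc, label in holdout)
    return hits / len(holdout)


# Hypothetical labeled historical cases
train_set = [
    ("supplier delivered late", "schedule"),
    ("vendor shipment delayed two weeks", "schedule"),
    ("budget overrun on materials", "cost"),
    ("unplanned cost for extra labor", "cost"),
]
holdout = [
    ("supplier shipment delayed", "schedule"),
    ("materials cost overrun", "cost"),
]
model = train(train_set)
print(accuracy(model, holdout))  # → 1.0
```

With real data the holdout set would be much larger, and a low score there is the signal that the training data needs more cleaning or better labels before it is worth "teaching" to a model.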

Regards,
Mohamed

Posted: May 10, 2024 5:05 AM

Abolfazl Yousefi Darestani Manager, Quality and Continuous Improvement| Hörmann-TNR Industrial Doors Newmarket, Ontario, Canada

It depends on the type of AI that you are using. You can give it a try and review the result. If it meets your requirements, then you are good to go.

Posted: May 10, 2024 7:24 AM

Keith Novak Tukwila, Wa, United States

I had the opportunity to ask the same question of a big data scientist at Citibank a few years ago, who was a guest speaker at a distributed computing class I took as part of my master's program. His answer was: the more data, the better. While I agree with Mohamed's points on the issues with unstructured data and how it can muddy the waters, preprocessing that data is relatively straightforward.

Not for me, of course; I'm not a data scientist. But given someone with the right skills and tools, structure can be added in many different ways relatively easily. As a real-life example, a colleague of mine is a data scientist at Microsoft who was working on the Xbox. The live audio of players talking to each other is as unstructured as you can get. His processing techniques found a very strong correlation between very specific obscenities used by the players and specific system-level issues they encountered.

I can only imagine the presentation he made to his bosses explaining how he could tell what was wrong with their network based on the top 10 words and phrases, which would probably get me fired if I used them in a business meeting. But the point still stands: the right algorithms can find structure in that data and provide valuable insights.
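The colleague's actual method isn't described, so this is only a hypothetical sketch of one common way to find word-to-issue associations in transcripts: compute the "lift" of each term, i.e. how much more often an issue occurs in sessions containing that term than in sessions overall. It assumes the audio has already been transcribed to text and each session is tagged with whether a system issue occurred; the sample sessions and the `min_count` threshold are invented, with polite stand-in words.

```python
from collections import Counter


def term_lift(sessions: list[tuple[str, bool]], min_count: int = 2) -> dict[str, float]:
    """For each term, lift = P(issue | term appears) / P(issue)."""
    base_rate = sum(issue for _, issue in sessions) / len(sessions)
    seen: Counter = Counter()        # number of sessions containing the term
    with_issue: Counter = Counter()  # ...of which also had a system issue
    for text, issue in sessions:
        for term in set(text.lower().split()):  # count each term once per session
            seen[term] += 1
            if issue:
                with_issue[term] += 1
    return {
        term: (with_issue[term] / n) / base_rate
        for term, n in seen.items()
        if n >= min_count and base_rate > 0  # skip rare terms and the no-issue edge case
    }


# Hypothetical transcribed sessions: (text, had_system_issue)
sessions = [
    ("ugh so laggy", True),
    ("laggy again", True),
    ("nice shot", False),
    ("nice one again", False),
]
lifts = term_lift(sessions)
ranking = sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # → [('laggy', 2.0), ('again', 1.0), ('nice', 0.0)]
```

A lift well above 1.0 flags a term worth investigating; it shows correlation, not cause, so an engineer still has to confirm what the players were actually reacting to.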

Posted: May 10, 2024 12:38 PM (Updated by author: May 10, 2024 12:38 PM)

Md. Golam Rob Talukdar

Community Champion

Project Manager| AWR Development (BD) Ltd. Cox's Bazar, Bangladesh

Unstructured data can enhance AI systems by providing rich context and holistic insights. However, challenges include data quality, dimensionality, and resource intensity. Balancing structured and unstructured data is key for effective AI training.

Posted: May 11, 2024 2:43 AM

Fabian Crosa

Community Champion

PMO Leader | Speaker & Mentor | Content Leader – PMOGA Latin America Hub| Catholic University of Uruguay Montevideo, Montevideo, Uruguay

The AI cannot tell whether the data we give it is true or not, and its answers are based on that data, so it is very important to feed it well-structured, relevant, and accurate data so that it can process it correctly.

Posted: May 11, 2024 10:45 AM

Sergio Luis Conte Helping to create solutions for everyone| Worldwide based Organizations Buenos Aires, Argentina

You can put whatever you like on top of a foundation model, for example ChatGPT. The only thing you will be able to do is fine-tuning, unless you work at one of those companies able to spend billions and billions of dollars to create their own foundation model. My recommendation is to take a look at PMI's free courses (both) to understand the mechanics of GenAI. After that, take a look at the OpenAI site to understand the architectural mechanics. But in the end, you can put in whatever you want. In fact, if you want to experiment on your own, you can download foundation models to your computer for free and build prototypes. Check the LM Studio site.

Posted: May 11, 2024 7:37 PM
