As artificial intelligence (AI) becomes more popular across various industries, understanding its capabilities is becoming increasingly important. For professional associations and non-profits venturing into the AI realm, knowing what to watch for is crucial. This post will shed light on a recent study on AI performance over time and why it matters for your organization.
A research paper by Lingjiao Chen, Matei Zaharia, and James Zou from Stanford University and UC Berkeley titled "How Is ChatGPT's Behavior Changing over Time?" has recently offered compelling insights into the performance of AI over time [1]. Focusing on two popular large language models (LLMs), GPT-3.5 and GPT-4, the researchers reveal that AI behavior is not static; it evolves and changes over time.
In their study, the authors evaluated the performance of these models on four diverse tasks at two points in time, March 2023 and June 2023 [1]. These tasks included solving math problems, answering sensitive or dangerous questions, generating code, and visual reasoning.
The findings were illuminating: the models' performance varied significantly over time [1]. For instance, GPT-4's accuracy at identifying prime numbers dropped from about 98% in March 2023 to roughly 2% in June 2023, while GPT-3.5 improved on the same task over the same period [1].
Such variability can undermine the usability and reliability of AI models in practical settings. If an AI model's responses to a prompt change unpredictably, they can disrupt downstream workflows and hinder reproducibility of results, making it difficult to integrate AI into larger projects or systems [1].
The authors also found the willingness of these models to answer sensitive questions decreased over time, and the frequency of formatting mistakes in code generation increased [1]. These fluctuations underline the need for continuous monitoring of AI quality, especially in sensitive areas where accuracy and consistency are paramount.
So what does this mean for your organization? First, AI is not a set-it-and-forget-it tool. Its performance can change over time as models are updated with new data, user feedback, and design changes [1]. Continuous monitoring of AI performance is therefore essential to ensure it consistently meets your organization's needs.
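In practice, continuous monitoring can be as simple as re-running a fixed "golden" suite of prompts on a schedule and logging the accuracy each time. The sketch below illustrates the idea in Python; `call_model` is a hypothetical placeholder for whatever client library your AI vendor provides, and the prompts are illustrative only.

```python
# Minimal sketch of periodic AI quality monitoring: run a fixed "golden"
# prompt suite against a model and track accuracy over time.
# `call_model` is a stand-in for a real vendor API call (assumption).
from datetime import date

GOLDEN_SUITE = [
    {"prompt": "Is 7 a prime number? Answer yes or no.", "expected": "yes"},
    {"prompt": "Is 9 a prime number? Answer yes or no.", "expected": "no"},
]

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical for this sketch)."""
    return "yes"

def run_suite(suite) -> float:
    """Return the fraction of prompts answered as expected."""
    correct = 0
    for case in suite:
        answer = call_model(case["prompt"]).strip().lower()
        if answer.startswith(case["expected"]):
            correct += 1
    return correct / len(suite)

if __name__ == "__main__":
    accuracy = run_suite(GOLDEN_SUITE)
    # Append this line to a log so runs can be compared month over month.
    print(f"{date.today().isoformat()} accuracy={accuracy:.0%}")
```

Comparing these logged scores across months is exactly the kind of check that would have surfaced the drift the researchers observed.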
In addition, transparency is key. The authors of the study call for greater clarity on when and how updates to AI models occur and how they affect the models' behavior [1]. As organizations looking to leverage AI, asking vendors about their update practices along with the potential impact on AI behavior can help mitigate unforeseen challenges.
It’s important to consider the potential variability in AI performance when planning and designing AI-integrated systems. A change in AI behavior could impact system performance, so having contingency plans in place is prudent.
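One practical contingency is to validate every AI response before it reaches downstream systems and route anything unexpected to human review. The sketch below is a hypothetical illustration, not a vendor-specific recipe; `call_model` and the inquiry-classification scenario are assumptions made for the example.

```python
# Hypothetical contingency pattern: validate AI output before it reaches
# downstream systems, and fall back to human review when validation fails.
# `call_model` is a placeholder; no real vendor API is implied.
import json

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical for this sketch)."""
    return "Sure! Here is the answer: maybe."

def classify_inquiry(text: str) -> dict:
    """Ask the model for structured JSON; route to humans if the format drifts."""
    raw = call_model(
        "Classify this member inquiry as JSON "
        f'{{"category": "...", "urgency": "low|high"}}: {text}'
    )
    try:
        result = json.loads(raw)
        if {"category", "urgency"} <= result.keys():
            return {"status": "auto", **result}
    except json.JSONDecodeError:
        pass
    # Contingency path: the model's formatting drifted, so queue for review.
    return {"status": "needs_human_review", "raw": raw}

print(classify_inquiry("My membership renewal failed twice."))
```

The point of the pattern is that a change in model behavior, such as the increase in formatting mistakes the study documented, degrades gracefully into a review queue instead of silently corrupting downstream workflows.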
As professional associations and non-profits continue to explore AI, understanding these dynamics will be crucial in effectively harnessing the power of AI. The world of AI is complex, but with careful planning, continuous monitoring, and a commitment to understanding this evolving field, organizations can leverage AI to its full potential.
If you don’t have a roadmap for AI or a plan to monitor its performance in your organization, Cimatri can help. Our customized AI Roadmap for Associations will position your association to succeed with AI. Reach out today for a free consultation.
[1]: Chen, L., Zaharia, M., & Zou, J. (2023). How Is ChatGPT's Behavior Changing over Time? arXiv preprint arXiv:2307.09009.