Using ChatGPT to Analyze Your Test Data

Ari Mahpour
|  Created: October 18, 2023  |  Updated: April 18, 2024
In Using ChatGPT for Automated Testing, we talked about easy ways to get Generative AI to write test scripts and software libraries, making it simple to communicate with your test equipment. Now that you’re familiar with automating your test scripts, it’s time to automate the processing of your test data as well. Using ChatGPT for data analysis can be simple and streamlined. In this article we’re going to look at how to do just that with fresh test data from my recent battery tests.

Background

I recently ran some tests to profile a smart battery and wanted to see how accurate some of the reporting registers were, specifically:

  1. Temperature
  2. Voltage
  3. Current

 

The output data I grabbed from my test script was in CSV format so it was a breeze to import into Microsoft Excel. Naturally, like most engineers, I could have used spreadsheets with fancy tables, charts, and some Excel formula magic to find the data that I was looking for. I decided, however, that what I really wanted was an assistant to whom I could put a series of questions and get back the data I was looking for.

With a natural language model I can ask questions about my data in plain English and get the results I’m looking for. What’s even nicer is the recent addition of the Advanced Data Analysis plugin (formerly known as “Code Interpreter”) on GPT-4. It not only shows me the Python code it writes as it processes my instructions, but it can also graph data for me using the Matplotlib library in Python. Note that I’m referring to a paid version of ChatGPT, but you can still get most of what you’re looking for in the free version of ChatGPT (or other Generative AI systems).

Testing the Waters

Before I started I wanted to get a feel for what my AI could and could not do for me. I uploaded my CSV test data to ChatGPT and asked simple questions about the data such as “What is the average temperature reported from the battery?” I discovered very quickly that the dataset I gave it was too big. I had two choices: trim down my data or ask it to provide me with commands to run against my data locally. I decided to do both. The Advanced Data Analysis plugin shows you the code snippet (in Python) that it runs on OpenAI’s machines. Once I provided GPT-4 with my trimmed-down file, it was able to start analyzing the data.

Throughout this article you will see snippets of code that it provides, which can also be executed locally against the larger set of data. Having GPT-4 execute the code first is a key component that was missing when ChatGPT first came out. I would frequently get generated code that did not work. With Advanced Data Analysis, that stopped happening (since it attempts to correct itself after a failure).
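To give a feel for what running such a snippet locally might look like, here is a minimal sketch of the average-temperature question using pandas. The column name `temperature` and the chunk size are assumptions for illustration, not the headers of my actual file, and a small in-memory CSV stands in for the large file so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for the large CSV on disk; column names and values are made up.
SAMPLE_CSV = io.StringIO(
    "timestamp,temperature\n"
    "0,24.0\n"
    "1,25.0\n"
    "2,26.0\n"
    "3,25.0\n"
)


def mean_of_column(csv_source, column, chunksize=2):
    """Average one column while reading the file a few rows at a time."""
    total = 0.0
    count = 0
    # chunksize makes read_csv yield DataFrames instead of loading everything.
    for chunk in pd.read_csv(csv_source, usecols=[column], chunksize=chunksize):
        values = chunk[column].dropna()
        total += values.sum()
        count += len(values)
    return total / count if count else float("nan")


average_temperature = mean_of_column(SAMPLE_CSV, "temperature")
print(f"Average temperature: {average_temperature:.2f}")
```

For a real file you would pass its path instead of `SAMPLE_CSV` and pick a much larger chunk size; reading in chunks is only one way to cope with a dataset that is too big to upload or load at once.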

When you first load your data it’s good to test the waters a little bit. To start, I give ChatGPT a chance to analyze my data and let me know what it thinks it can do:

Figure 1: Initial data loading

After that it gives me suggestions on what I might like to see:

Figure 2: Analyses suggestions

I’m curious about the correlation matrix. I’m wondering how temperature correlates to voltage and current so I’ll ask:

Show me the correlation between temperature, supply voltage/current, and load voltage/current.

It gives me a nice breakdown on how all the variables correlate:

Figure 3: Correlation matrix

And if I ask ChatGPT to plot it for me I get:

Figure 4: Correlation matrix plot

It gives me some extra correlations, which I really don’t care for, but I’m fine with ignoring them.
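If you run this locally, a correlation matrix limited to the columns you care about could be sketched as follows. The column names and readings are made up for illustration; the real CSV headers will differ.

```python
import pandas as pd

# Made-up readings; column names are hypothetical stand-ins for the CSV headers.
df = pd.DataFrame({
    "temperature": [25.0, 25.5, 26.1, 26.8, 27.2],
    "supply_voltage": [12.0, 12.0, 12.1, 12.1, 12.2],
    "supply_current": [1.00, 1.10, 1.25, 1.40, 1.50],
    "load_voltage": [11.8, 11.7, 11.7, 11.6, 11.5],
    "load_current": [0.0, 0.5, 0.5, 1.0, 1.0],
})

# Selecting columns up front keeps unwanted correlations out of the result.
columns_of_interest = [
    "temperature",
    "supply_voltage",
    "supply_current",
    "load_voltage",
    "load_current",
]

# DataFrame.corr() computes pairwise Pearson correlation by default.
correlation_matrix = df[columns_of_interest].corr()
print(correlation_matrix.round(3))
```

Each cell is the correlation between the row variable and the column variable, so the matrix is symmetric and its diagonal is 1.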

Data Validation

Now that I’ve performed some data exploration I’d like to move on to validating my data sets. As mentioned above, I want to get a sense of how well the smart battery reports its internal voltage and current. The simplest way to track this is to read the supply and load power off my instruments and compare them to the reported battery power. Obviously there will be some loss across the cables and an efficiency drop in the charger, so we will just look at the correlation between the two and determine our accuracy from there.

While the natural language model that ChatGPT provides is very good, it’s certainly not perfect. There are certain concepts that it may not completely understand. If you ask it to perform statistical analysis it usually understands what to do but it may not always understand simple engineering terms. To account for this I try to make my requests as simple as possible to prevent issues downstream. I ask ChatGPT to create three new columns:

Let's create three new columns of data:

  1. Supply Power: Supply Voltage * Supply Current
  2. Load Power: Load Voltage * Load Current
  3. Battery Power: abs(Voltage * Current)/1000/1000
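In pandas, the three columns requested above might be computed as in the sketch below. The column names and sample values are hypothetical. The two divisions by 1000 mirror the request as written; they would fit battery registers that report in milli-units, but that is my reading of the formula rather than something the data states.

```python
import pandas as pd

# Made-up rows: two while charging, one while discharging.
df = pd.DataFrame({
    "supply_voltage": [12.0, 12.0, 0.0],
    "supply_current": [1.5, 1.4, 0.0],
    "load_voltage": [0.0, 0.0, 11.0],
    "load_current": [0.0, 0.0, 2.0],
    "voltage": [11800, 11900, 11200],   # battery-reported voltage register
    "current": [1400, 1300, -1900],     # battery-reported current register
})

df["supply_power"] = df["supply_voltage"] * df["supply_current"]
df["load_power"] = df["load_voltage"] * df["load_current"]
# abs() removes the sign, so charge and discharge both show as positive power.
df["battery_power"] = (df["voltage"] * df["current"]).abs() / 1000 / 1000

print(df[["supply_power", "load_power", "battery_power"]])
```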

I also need to indicate that when my load power = 0 (i.e. the DC electronic load is not sinking current from the device under test) we are charging. When load power is non-zero, that is an indication that we are discharging the battery. Again, in order not to confuse GPT-4, I mention this in my request like so:

Plot load power and battery power against timestamps but only when load power is non-zero

The result I get is a nice plot:

Figure 5: Plot of Load and Battery power

Now that the data looks good I can ask for a correlation and other analyses:

Show me the correlation between the two (only in a nonzero setting)

Figure 6: Correlation calculation
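The same discharge-only correlation can be sketched locally by masking on load power first. The column names and values are invented for illustration.

```python
import pandas as pd

# Made-up samples: the first two rows are charging (load power is zero).
df = pd.DataFrame({
    "load_power": [0.0, 0.0, 10.0, 12.0, 14.0, 16.0],
    "battery_power": [8.0, 8.2, 10.4, 12.5, 14.3, 16.6],
})

# Keep only the rows where the electronic load is sinking power.
discharging = df[df["load_power"] != 0]

# Series.corr() returns the Pearson correlation coefficient by default.
correlation = discharging["load_power"].corr(discharging["battery_power"])
print(f"Load vs. battery power correlation: {correlation:.4f}")
```

On real instrument data an "off" load may read slightly above zero due to noise, in which case a small threshold instead of an exact comparison with 0 may be the safer mask.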

And when I ask for other stats:

What is the mean difference, standard deviation, and variance between the two (non-zero only) [in percentages]

Figure 7: Stats on Load and Battery power
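A request like this leaves room for interpretation. One plausible reading, sketched below, is to express the battery power as a percentage difference from the load power for each sample and then summarize that series. The column names and values are made up, and this is not necessarily the calculation ChatGPT chose.

```python
import pandas as pd

# Made-up samples; the first row is charging and is excluded below.
df = pd.DataFrame({
    "load_power": [0.0, 10.0, 20.0, 40.0],
    "battery_power": [5.0, 10.5, 20.4, 41.2],
})

discharging = df[df["load_power"] != 0]

# Per-sample difference, relative to the instrument reading, in percent.
pct_diff = (
    (discharging["battery_power"] - discharging["load_power"])
    / discharging["load_power"]
    * 100
)

stats = {
    "mean_pct": pct_diff.mean(),
    "std_pct": pct_diff.std(),  # sample standard deviation (ddof=1)
    "var_pct": pct_diff.var(),  # sample variance (ddof=1)
}
print(stats)
```

Note that the variance of a percentage series is in squared percentage points, so the standard deviation is usually the easier number to interpret.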

This helps me validate my hunch that the two are tightly coupled since there’s only a little cable loss. I now perform the same analysis on the charging portion:

Let's perform the same analysis but plotting supply power and battery power against timestamps when load power is 0

Figure 8: Plot of Supply and Battery power

As you can see I have some glitches. This is due to the smart battery telemetry being out of sync with my instruments. Ideally, I should move into charge mode (i.e. turn the electronic load off and turn the power supply on) and then wait another few seconds before collecting telemetry. The reason is that a series of commands gets sent between the battery and charger to “negotiate” how much power to deliver to the battery. Think of this as a dumbed down version of USB-C Power Delivery (if you’re familiar with the concept).

Unfortunately, I did not bake that into my telemetry collection script so I now have anomalies to filter out. No worries, ChatGPT can do that too. After a few iterations I “train” my AI to “learn” what is considered a glitch and what isn’t (e.g. what is considered an outlier that isn’t accounted for in standard statistical analysis functions) and then it applies that knowledge:

Figure 9: Filtered plot of Supply and Battery power
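The criteria ChatGPT ended up with came out of a few rounds of conversation and are not reproduced here. As one possible local approach, the sketch below simply drops samples taken within a settling window after each switch between charging and discharging, which is the "wait a few seconds" idea applied after the fact. The 3 second window, the column names, and the data are all assumptions for illustration.

```python
import pandas as pd

SETTLE_SECONDS = 3.0  # arbitrary illustration value, not a measured figure

# Made-up telemetry: discharging for two samples, then charging.
df = pd.DataFrame({
    "timestamp": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "load_power": [10.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "supply_power": [0.0, 0.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0],
    "battery_power": [9.8, 9.9, 9.7, 2.0, 15.0, 16.1, 16.0, 16.2],
})

charging = df["load_power"] == 0

# True on the first row after each mode switch (never on the very first row).
mode_changed = charging.astype(int).diff().fillna(0) != 0

# Timestamp of the most recent switch, carried forward to later rows.
last_switch = df["timestamp"].where(mode_changed).ffill()
seconds_since_switch = df["timestamp"] - last_switch

# Rows before any switch have no reference time and are treated as settled.
settled = seconds_since_switch.isna() | (seconds_since_switch >= SETTLE_SECONDS)

filtered_charging = df[charging & settled]
print(filtered_charging)
```

A fixed window is crude; if the negotiation time varies, a filter based on how far battery power deviates from its recent median may work better.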

When I ask for the same stats I get a full breakdown:


Figure 10: Stats on filtered plot of Supply and Battery power

This is good news. I expected a bigger difference between the values here than in the discharge case because there is loss in both the cable and the power converter.

Data Scrubbing

ChatGPT can also act as a scrubber for your data by looking for anomalies. In the past I’ve seen some glitches where the temperature register went a bit haywire. I ask ChatGPT to “tell me if you see any glitches in my data” and it returns with:

Figure 11: Looking for anomalies on Temperature data
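If you want a quick local check of the same kind, one common approach is to compare each sample against a rolling median and flag large deviations. This is a generic technique I am substituting for illustration, not the method ChatGPT used, and the window size, threshold, and readings are invented.

```python
import pandas as pd

MAX_DEVIATION = 5.0  # same units as the temperature register; arbitrary value

# Made-up temperature readings with one obvious spike.
temperature = pd.Series([25.0, 25.1, 25.2, 25.1, 80.0, 25.3, 25.4, 25.3, 25.5])

# A centered rolling median is barely affected by a single bad sample.
rolling_median = temperature.rolling(window=5, center=True, min_periods=1).median()

deviation = (temperature - rolling_median).abs()
glitches = temperature[deviation > MAX_DEVIATION]

print(glitches)
```

An empty `glitches` series would match what I saw in this test, where the register behaved itself.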

This can give you a good starting point in identifying glitches. In this particular test it looks like the temperature register was functioning correctly. Now consider this power efficiency graph that I asked ChatGPT to plot for me:

Figure 12: Plotting power efficiency

It’s quite evident that there are some abnormalities in my data. It’s certainly not possible to have efficiency > 100% and GPT-4 knows it as well:

Figure 13: ChatGPT’s take on the anomaly

After my investigation I discover that there was, indeed, a glitch across the communication bus. I ask ChatGPT to filter out all the data points where efficiency is > 100%. At this point I can request a download link to grab the filtered data, plot it again, or filter it further.
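A local version of that last filtering step might look like the sketch below. It assumes efficiency is defined as battery power divided by supply power, which is my guess at a reasonable definition rather than a formula taken from the chat, and the column names and values are made up.

```python
import pandas as pd

# Made-up charging samples; the second row mimics a bus glitch.
df = pd.DataFrame({
    "supply_power": [18.0, 18.0, 18.0, 18.0],
    "battery_power": [16.2, 19.5, 16.0, 16.4],
})

# Assumed definition: power reaching the battery over power supplied.
df["efficiency_pct"] = df["battery_power"] / df["supply_power"] * 100

# Physically impossible readings (> 100 %) are dropped. Rows where supply
# power is zero produce inf or NaN here and fail this comparison too.
plausible = df[df["efficiency_pct"] <= 100]

# to_csv() with no path returns the CSV as a string; pass a filename to save.
csv_text = plausible.to_csv(index=False)
print(csv_text)
```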

Conclusion

In this article we looked at ways to utilize ChatGPT to analyze, filter, and manipulate the data from our test reports. We got our feet wet by uploading the test data and asking ChatGPT what it could do, and then moved into data validation and scrubbing. The key takeaway here is that we can utilize natural language to provide ourselves with guided explorations through our data set. In the coming months we’ll see other players in the AI space start to provide features like this built into their applications. As we move into this ecosystem of pairing, we will use AI to support and guide us through our analysis rather than replace us. As AI becomes more ubiquitous in data processing, our capabilities as individuals will grow, unlocking countless possibilities and potential.

About Author

Ari is an engineer with broad experience in designing, manufacturing, testing, and integrating electrical, mechanical, and software systems. He is passionate about bringing design, verification, and test engineers together to work as a cohesive unit.
