Testing out ChatGPT’s new Code Interpreter

OpenAI’s latest plugin turns ChatGPT into a tool for data cleaning, preprocessing, analysis, visualisation and predictive modelling tasks, among other things. Some have hailed it ‘the new data scientist’, but is it all it’s cracked up to be? Real World Data Science takes Code Interpreter for a test drive.

AI
Large language models
Coding
Data analysis
Author

Lee Clewley

Published

July 19, 2023

On July 6, 2023, OpenAI began rolling out the Code Interpreter plugin to users of its ChatGPT Plus service. But what exactly is this, and what functionality does it offer?

Code Interpreter runs code and lets you upload data, so you can use ChatGPT for data cleaning, preprocessing, analysis, visualisation and predictive modelling tasks, among other things. The tool holds great promise for programmers and analysts alike, with the potential to streamline coding workflows as well as to put an automated data analyst at your fingertips.

To use Code Interpreter, you need to enable it in the ChatGPT settings (at the time of writing, this requires a paid ChatGPT Plus subscription).

Screenshot of ChatGPT Plus setting, showing Code Interpreter plugin option.

Now, let’s take it for a bit of a spin by uploading the stroke prediction dataset from Kaggle.

The stroke prediction dataset

The World Health Organization (WHO) identifies stroke as the second leading cause of death worldwide, accounting for roughly 11% of all fatalities.

Kaggle’s stroke prediction dataset is used to forecast the likelihood of a patient suffering a stroke, taking into account various input parameters such as age, gender, presence of certain diseases, and smoking habits. Each row in the dataset offers pertinent information about an individual patient.
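
For readers who want to follow along outside ChatGPT, the dataset can be loaded and inspected in a few lines of pandas. This is a minimal sketch, assuming the CSV has been downloaded from Kaggle under its usual filename (adjust the path to wherever your copy lives):

```python
import pandas as pd

# Assumed filename for the Kaggle download -- adjust if yours differs.
df = pd.read_csv("healthcare-dataset-stroke-data.csv")

print(df.shape)                      # number of patients and columns
print(df.dtypes)                     # column types (age, gender, bmi, etc.)
print(df["stroke"].value_counts())   # how imbalanced the outcome is
```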

On loading this dataset into ChatGPT Code Interpreter, one is greeted with:

Screenshot from ChatGPT, showing Code Interpreter's initial review of an uploaded stroke prediction dataset.

The user is asked: “Please let me know what analysis or operations you’d like to perform on this dataset. For instance, we can perform exploratory data analysis, data cleaning, data visualization, or predictive modelling.”

It seems quite a bold claim. So, I asked it to do all of the above.

Screenshot from ChatGPT, showing Code Interpreter's overview explanation of planned analysis steps.

Exploratory data analysis

Screenshot of ChatGPT Code Interpreter's exploratory data analysis outputs.

This is a good, useful summary. Missing values in bmi are imputed with the median, a choice the user can later revisit, since the code for this step is provided.

Screenshot of code output from ChatGPT Code Interpreter, showing how to set missing values in dataset to the median value.
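
If you would rather handle the imputation yourself, the same step is a one-liner in pandas. A minimal sketch, assuming the Kaggle CSV filename used above (median imputation is just one reasonable default; you could substitute a model-based approach):

```python
import pandas as pd

df = pd.read_csv("healthcare-dataset-stroke-data.csv")

# Impute missing bmi values with the median, mirroring Code Interpreter's choice.
df["bmi"] = df["bmi"].fillna(df["bmi"].median())

print(df["bmi"].isna().sum())  # should print 0 once imputation has run
```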

Data visualisation

Next, the visualisations of the variables are shown along with a correlation heatmap. Users can toggle between the visualisations and the code. The outputs are pretty useful, except for one mistake: id shouldn’t be included as part of the heatmap.

Screenshot of ChatGPT Code Interpreter's description of visualisations it will create, along with partial code for doing so.

Histograms and bar plots created by ChatGPT Code Interpreter for variables in the Kaggle stroke prediction dataset.

Correlation heatmap for variables in the Kaggle stroke prediction dataset.
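
Dropping id before computing correlations is a simple fix. Here is a minimal sketch of how the heatmap might be rebuilt without it, assuming the same CSV and the usual pandas/seaborn stack:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("healthcare-dataset-stroke-data.csv")

# Correlate only the genuinely numeric variables; id is an arbitrary identifier
# and has no place in a correlation heatmap.
numeric = df.select_dtypes(include="number").drop(columns=["id"])
corr = numeric.corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation heatmap (id excluded)")
plt.tight_layout()
plt.show()
```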

Things start to go seriously awry when Code Interpreter tries to create a predictive model.

The predictive model is garbage

From the screenshot below, you can see that lumping all the data into a predictive model creates some highly spurious results. Age is a factor, as it should be, as is hypertension – indeed, those with hypertension in this dataset are around three times more likely to have a stroke than those without. In reality, there are also significant effects from glucose level and smoking, as well as a slight BMI effect in this small, unbalanced dataset. However, work_type_children having a large positive effect is alarming and plainly wrong.

Screenshot showing ChatGPT Code Interpreter's most important features for predicting stroke. The inclusion of 'work_type_children' is wrong: it says that 'individuals who are children are more likely to have a stroke', but goes on to explain that 'this might be the result of an imbalance in the dataset or noise, as in reality, children generally have a lower risk of stroke'.
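
Code Interpreter does not show its full modelling code in the screenshot, so the exact model is an assumption on my part. The sketch below uses a one-hot-encoded logistic regression, the kind of model that would produce per-feature coefficients like those reported, including a spurious one for work_type_children:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("healthcare-dataset-stroke-data.csv")
df["bmi"] = df["bmi"].fillna(df["bmi"].median())

# One-hot encode the categorical variables; drop id, which carries no signal.
X = pd.get_dummies(df.drop(columns=["id", "stroke"]), drop_first=True)
y = df["stroke"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Inspect the coefficients behind the "feature importance" story.
print(pd.Series(model.coef_[0], index=X.columns).sort_values())
```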

It is very evident from the table below that the positive coefficient on children is spurious.

Screenshot of table from ChatGPT code interpreter, showing 'number of individuals' and 'number of strokes' for each 'work type'. Figures for children are 687 individuals and 2 strokes.
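
That sanity check is easy to reproduce directly from the data. A minimal sketch, again assuming the same CSV, that cross-tabulates work type against the stroke outcome:

```python
import pandas as pd

df = pd.read_csv("healthcare-dataset-stroke-data.csv")

# Count individuals and strokes per work type, plus the raw stroke rate.
counts = pd.crosstab(df["work_type"], df["stroke"], margins=True)
rates = df.groupby("work_type")["stroke"].mean().rename("stroke_rate")

print(counts)
print(rates)

# With only a couple of strokes among several hundred children, any large
# positive effect for work_type_children is almost certainly spurious.
```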

So, where does this leave our thinking about Code Interpreter?

Discussion

My test case is possibly an unfair one. The sort of study presented to Code Interpreter is one that requires careful analysis, and it uses a relatively small, tricky dataset whose difficulties are compounded by missing data. It’s therefore not surprising that, in this context, an automated analysis fails to shine in all respects.

To be fair, OpenAI themselves describe the plugin as an “eager junior programmer”. And as would be the case with a real junior programmer or junior data scientist, you’d expect a more experienced hand to be guiding an analysis like the one I asked for – someone who can sense-check results, point out errors, and offer suggestions for fixes and improvements.

Despite some stumbles in this demo, OpenAI’s “junior programmer” represents a real step forward in the ChatGPT offering, and it is particularly impressive that one can toggle between code and charts without having to worry about coding at all.

At this stage, I would argue that Code Interpreter may be useful for quick summaries, visualisations, basic data cleaning and some preliminary investigation. However, based on what I’ve seen so far, it is clear to me that highly trained statisticians won’t be replaced anytime soon.

About the author
Lee Clewley is a member of the editorial board of Real World Data Science and head of applied AI in GSK’s AI and Machine Learning Group, R&D.
Copyright and licence
© 2023 Lee Clewley

This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence. Thumbnail image by charlesdeluvio on Unsplash.

How to cite
Clewley, Lee. 2023. “Testing out ChatGPT’s new Code Interpreter.” Real World Data Science, July 19, 2023. URL