Gemini vs. ChatGPT: Who Writes Better Code?

Ari Mahpour
|  Created: March 1, 2024  |  Updated: April 2, 2024

Gone are the days when electrical engineers could get by without writing a single line of code. Coding has become a required skill for engineers of all types. While engineers don’t need to know how to build a full-stack web application, they do need basic scripting skills; performing analysis on large data sets, for example, requires some programming knowledge. For more complex tasks, engineers can find themselves in a lurch, sometimes spending hours searching online for a particular function or solution. With the release of ChatGPT, the world has undoubtedly changed, and so has the landscape of automated code generation. Engineers without a strong programming background can now write high-quality code with the help of Generative AI.

In previous articles, such as Using ChatGPT for Altium Scripting and Using ChatGPT for Automated Testing, we looked at how to leverage Generative AI, specifically ChatGPT, to write code for us. Until now, Google Bard (the ChatGPT competitor) has lagged behind, but Google’s new model, Gemini, is advertised as outperforming its competitor on the Massive Multitask Language Understanding (MMLU) benchmark:

Figure 1: Gemini’s MMLU performance metric. Source: https://deepmind.google/technologies/gemini/#capabilities

In this article, we’re going to stack up ChatGPT’s most sophisticated engine, GPT4, against Gemini Advanced (formerly known as Ultra), Google’s most advanced AI engine, to see which is better at generating code.

Boilerplate Code

Performing a fully exhaustive comparison between the two engines would be extremely challenging. Instead, we’re going to look at how well each engine generates boilerplate code in Python (one of the best-supported languages for code-writing LLMs). This isn’t an exact science, as “good” versus “bad” can be very subjective, so we’ll grade each engine simply on whether the code runs and how well it understood our instructions.

Task 1: Recursive File Property Printout

To get started let’s look at a very simple scripting task:

Develop a Python automation script that, given an absolute path to a folder, retrieves the metadata of each file recursively as a dictionary object. Print each dictionary object line by line and include the absolute path.

Let’s first analyze the code that comes back from GPT4:

import os

def get_file_metadata(directory):
    for foldername, subfolders, filenames in os.walk(directory):
        for filename in filenames:
            filepath = os.path.join(foldername, filename)
            metadata = {
                'filename': filename,
                'path': filepath,
                'size': os.path.getsize(filepath),
                'last_modified': os.path.getmtime(filepath)
            }
            print(metadata)

get_file_metadata(os.path.dirname(os.path.abspath(__file__)))

It’s concise, it works, and it gets the job done. I’d give this a pass. It’s also an exceptionally trivial task but that’s okay.

Now let’s look at the code that came back from Gemini Advanced:

import os

def get_file_metadata(file_path):
    metadata = {}
    try:
        stat = os.stat(file_path)
        metadata['name'] = os.path.basename(file_path)
        metadata['size'] = stat.st_size  # Size in bytes
        metadata['modified_time'] = stat.st_mtime  # Last modified timestamp
        metadata['accessed_time'] = stat.st_atime  # Last accessed timestamp
    except OSError:
        print(f"Error accessing file: {file_path}")
    return metadata

def retrieve_metadata_recursively(folder_path):
    for root, _, files in os.walk(folder_path):
        for file in files:
            file_path = os.path.join(root, file)
            metadata = get_file_metadata(file_path)
            if metadata:
                print(f"Absolute Path: {file_path}")  # Print the absolute path
                print(metadata)

if __name__ == '__main__':
    retrieve_metadata_recursively(os.path.dirname(os.path.abspath(__file__)))

Gemini is a bit more verbose, although it did add some error handling, which is nice. It’s somewhat lengthy, but it works, so we’ll give it a pass.

Task 2: Web Scraper

For the next task I wanted to see how well both engines can understand a given language and create a parser based on that language. In this case we’re going to build a web scraper using the following prompt:
Write a Python script using BeautifulSoup or Scrapy that scrapes the top news headlines from 'https://news.ycombinator.com' and outputs them to a dictionary object.

Here is a snippet of the page source from 'https://news.ycombinator.com'. Use this to determine how to parse out the tags.

<html lang="en" op="news"><head><meta name="referrer" content="origin"><meta name="viewport" content="width=device-width, initial-scale=1.0"><link rel="stylesheet" type="text/css" href="news.css?J16btoAd8hqdkSoIdLSk">
        <link rel="icon" href="y18.svg">

</html>

As you will notice, I included some HTML in my prompt. My actual prompt contained more HTML, but I truncated it for this article. Here’s what GPT4 generated:

import requests
from bs4 import BeautifulSoup

# URL of the site to scrape
url = 'https://news.ycombinator.com'

# Send a GET request to the URL
response = requests.get(url)

# Parse the HTML content of the page using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all the news headlines
headlines = soup.find_all('tr', class_='athing')

# Dictionary to store headlines where key is the rank and value is the headline text
news_headlines = {}

# Loop through each headline, extract and store it in the dictionary
for headline in headlines:
    rank = headline.find('span', class_='rank').text.strip('.')
    title = headline.find('span', class_='titleline').text
    news_headlines[rank] = title

# Output the dictionary
print(news_headlines)

This code works fine and does what it’s supposed to. I was hoping it would intuit that I also wanted the links to the articles, but I wasn’t explicit, so I can’t count that against GPT4’s grade.

Here is what Gemini generated:

import requests
from bs4 import BeautifulSoup

# Send an HTTP GET request to the Hacker News website
response = requests.get("https://news.ycombinator.com/")

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")

# Find all the 'athing' class elements, which contain the news articles
articles = soup.find_all("tr", class_="athing")

# Create an empty dictionary to store the headlines
headlines = {}

# Loop through each article and extract the headline and URL
for article in articles:
    # Find the headline element
    headline_element = article.find("span", class_="titleline")
    headline = headline_element.find("a").text.strip()

    # Find the URL element
    url_element = headline_element.find("a")
    url = url_element["href"]

    # Add the headline and URL to the dictionary
    headlines[headline] = url

# Print the dictionary of headlines
print(headlines)

This also works and is quite similar to GPT4’s output, but this code actually extracts the URL as well. That’s nice but, again, not an expectation. On this challenge I would say they are pretty much tied; perhaps Gemini scores an extra point for being a mind reader.
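As a side note, combining the two approaches takes only a small tweak. Here’s a minimal sketch (assuming the same 'athing' and 'titleline' markup both engines relied on) that keys by rank like GPT4’s version while also capturing the URL like Gemini’s:

import requests
from bs4 import BeautifulSoup

# Fetch and parse the Hacker News front page
soup = BeautifulSoup(requests.get('https://news.ycombinator.com').text, 'html.parser')

# Map each article's rank to a (title, URL) pair
news_headlines = {}
for row in soup.find_all('tr', class_='athing'):
    rank = row.find('span', class_='rank').text.strip('.')
    link = row.find('span', class_='titleline').find('a')
    news_headlines[rank] = (link.text, link['href'])

print(news_headlines)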

Task 3: Stock Web Application

For this task I want to create a web application that draws out stock charts of a few different tickers:

Create a web application in Python that shows charts of the following stocks: RIVN, TSLA, NKLA, FSR

This one was definitely trickier. It requires a frontend component (i.e., the charts) and a backend component (i.e., retrieving the data from a stock data source). I’ve generated countless applications with GPT4 and wanted to see how Gemini stacked up against it. Here’s what GPT4 generated for me:

from flask import Flask, render_template_string
import yfinance as yf
import plotly.graph_objects as go
from datetime import datetime

app = Flask(__name__)

@app.route('/')
def home():
    stocks = ['RIVN', 'TSLA', 'NKLA', 'FSR']
    start = datetime(2021, 1, 1)
    end = datetime.now()

    fig = go.Figure()

    for stock in stocks:
        df = yf.download(stock, start=start, end=end)
        fig.add_trace(go.Scatter(x=df.index, y=df.Close, mode='lines', name=stock))

    graph_html = fig.to_html(full_html=False)

    return render_template_string("""
        <html>
            <body>
                {{ graph_html | safe }}
            </body>
        </html>
    """)

if __name__ == '__main__':
    app.run(debug=True)

Figure 2: Screenshot of GPT4 generated web application

It’s simple, clean, and to the point. No frills but gets the job done. Here’s what Gemini generated for me:

import flask
from flask import render_template
import yfinance as yf
import plotly.graph_objects as go

app = flask.Flask(__name__)

# Stock symbols
tickers = ['RIVN', 'TSLA', 'NKLA', 'FSR']

# Fetch stock data using yfinance
def get_stock_data(ticker):
    start_date = '2023-02-10'  # Adjust start date as needed
    end_date = '2024-02-10'  # Today's date
    return yf.download(ticker, start=start_date, end=end_date)

@app.route('/')
def index():
    stock_data = {}
    for ticker in tickers:
        stock_data[ticker] = get_stock_data(ticker)

    charts = []
    for ticker, data in stock_data.items():
        fig = go.Figure(data=[go.Candlestick(x=data.index,
                                             open=data['Open'],
                                             high=data['High'],
                                             low=data['Low'],
                                             close=data['Close'])])
        fig.update_layout(title=ticker)
        charts.append(fig.to_html(full_html=False, include_plotlyjs='cdn'))

    return render_template('index.html', charts=charts)

if __name__ == '__main__':
    app.run(debug=True)

It also required me to create a new file, index.html, in a new directory called templates:

<!DOCTYPE html>
<html>
<head>
    <title>Stock Charts</title>
    <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
</head>
<body>
    {% for chart in charts %}
        <div id="{{ loop.index0 }}"></div> 
        <script>
            Plotly.newPlot('{{ loop.index0 }}', {{ chart | safe }});
        </script>
    {% endfor %}
</body>
</html>

Figure 3: Screenshot of Gemini Advanced generated web application

This was definitely fancier, as it used candlestick charts and abstracted the HTML content away into a separate template file. Again, both were functional; thus, both receive a passing grade.

Failures and Hallucinations

When GPT4 first came out, it took time to learn how to “prompt” it properly. When using Generative AI to write code, details matter. As a result, many of us have become more competent “Prompt Engineers” over the past year. Unfortunately, this can introduce some bias when stacking GPT4 up against other engines, and I willingly admit that: I know how GPT4 “works” better than I know how Gemini works, and the way I craft prompts for GPT4 might be different from what Gemini expects. Regardless, I still ran into some very fundamental problems with Gemini.

One common issue I found with Gemini’s interpretation of instructions was in generating unit tests. My initial goal was to cover unit testing in this article but, out of frustration, I gave up on Gemini entirely because it never followed directions. For example, I would ask Gemini to write a unit test using pytest and encapsulate it in a class. It would generate the test using unittest instead, blatantly ignoring my request, though it would encapsulate the code within a class. I would correct it, and it would acknowledge that it had accidentally used unittest instead of pytest. It would then rewrite the code using pytest but forget to put it into a class. When I requested a mock, it defaulted to unittest’s mock rather than the pytest-style mocking I asked for. These are nuances, but important ones when engaging with Generative AI.
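For reference, here is the kind of output I was asking for: a pytest-style test encapsulated in a class. This is just a minimal sketch; the add function under test is hypothetical and only there for illustration:

import pytest

def add(a, b):
    # Hypothetical function under test, for illustration only
    return a + b

class TestAdd:
    # pytest collects 'test_' methods inside classes prefixed with 'Test'
    def test_adds_positive_numbers(self):
        assert add(2, 3) == 5

    @pytest.mark.parametrize('a, b, expected', [(0, 0, 0), (1, -1, 0)])
    def test_parametrized_cases(self, a, b, expected):
        assert add(a, b) == expected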

Figure 4: Gemini Advanced not following directions

Troubleshooting failures was another pain point. GPT4’s reasoning engine has proven to be quite powerful when debugging errors in Python. Gemini…not so much. When I asked it to fix certain issues, it simply rewrote the code with added indents or swapped variables…an utterly useless response.

Sometimes Gemini just wouldn’t work. It said it couldn’t process my request. In other cases it started commenting about the…upcoming election?

Figure 5: Gemini Advanced confused about the upcoming elections?

Conclusion

Overall, the quality of the boilerplate code generated by Gemini was quite competitive with GPT4’s. The experience and the reasoning engine, however, left much to be desired. Luckily for Google, those pieces are part of the implementation around the LLM rather than the LLM itself. In other words, the Gemini LLM seems fundamentally quite good, on par with GPT4, but the code wrapped around it that drives the chat experience needs some help. In time, we will probably see Google iterate on that code and enhance the experience and reasoning engine.

About Author

Ari is an engineer with broad experience in designing, manufacturing, testing, and integrating electrical, mechanical, and software systems. He is passionate about bringing design, verification, and test engineers together to work as a cohesive unit.
