Gone are the days when Electrical Engineers could get by without writing a single line of code. Knowing how to code has become a required skill for engineers of all types. While engineers don’t need to know how to write a full-stack web application, it’s important that they have basic scripting skills. Performing analysis on large data sets, for example, requires some programming knowledge. For more complex tasks, engineers can find themselves in the lurch, sometimes spending hours searching online for a particular function or solution. With the release of ChatGPT, the world has undoubtedly changed, and so has the landscape for automated code generation. Engineers without a strong programming background can now write high-quality code with the help of Generative AI.
In previous articles, such as Using ChatGPT for Altium Scripting and Using ChatGPT for Automated Testing, we looked at how to leverage Generative AI, specifically ChatGPT, to write code for us. Until now, Google Bard (the ChatGPT competitor) had lagged behind, but a new model, Gemini, is advertised as outperforming its competitor on the Massive Multitask Language Understanding (MMLU) benchmark:
Figure 1: Gemini’s MMLU performance metric. Source: https://deepmind.google/technologies/gemini/#capabilities
In this article we’re going to stack up ChatGPT’s most sophisticated engine, GPT4, against Gemini Advanced (formerly known as Ultra), Google’s most advanced AI engine, to see which is better at generating code.
Performing a fully exhaustive comparison between the two engines would be extremely challenging. In this article we’re going to look at how well these two engines generate boilerplate code written in Python (as it is one of the best-supported languages for code-writing LLMs). This isn’t an exact science, as “good” versus “bad” can be very subjective. Instead, we’ll grade each engine simply on whether the code runs and how well it understood our instructions.
To get started let’s look at a very simple scripting task:
Develop a Python automation script that, given an absolute path to a folder, retrieves the metadata of each file recursively as a dictionary object. Print each dictionary object line by line and include the absolute path.
Let’s first analyze the code that comes back from GPT4:
import os

def get_file_metadata(directory):
    for foldername, subfolders, filenames in os.walk(directory):
        for filename in filenames:
            filepath = os.path.join(foldername, filename)
            metadata = {
                'filename': filename,
                'path': filepath,
                'size': os.path.getsize(filepath),
                'last_modified': os.path.getmtime(filepath)
            }
            print(metadata)

get_file_metadata(os.path.dirname(os.path.abspath(__file__)))
It’s concise, it works, and it gets the job done. I’d give this a pass. It’s also an exceptionally trivial task but that’s okay.
Now let’s look at the code that came back from Gemini Advanced:
import os

def get_file_metadata(file_path):
    metadata = {}
    try:
        stat = os.stat(file_path)
        metadata['name'] = os.path.basename(file_path)
        metadata['size'] = stat.st_size  # Size in bytes
        metadata['modified_time'] = stat.st_mtime  # Last modified timestamp
        metadata['accessed_time'] = stat.st_atime  # Last accessed timestamp
    except OSError:
        print(f"Error accessing file: {file_path}")
    return metadata

def retrieve_metadata_recursively(folder_path):
    for root, _, files in os.walk(folder_path):
        for file in files:
            file_path = os.path.join(root, file)
            metadata = get_file_metadata(file_path)
            if metadata:
                print(f"Absolute Path: {file_path}")  # Print the absolute path
                print(metadata)

if __name__ == '__main__':
    retrieve_metadata_recursively(os.path.dirname(os.path.abspath(__file__)))
Gemini is a bit more verbose, although it did add some error handling, which is a nice touch. It’s somewhat lengthy, but it works, so we’ll give it a pass as well.
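One small note on both scripts: os.path.getmtime() and the st_mtime/st_atime fields return raw Unix timestamps, so the printed metadata shows epoch seconds rather than readable dates. If you wanted friendlier output, a tweak along these lines would do it (my own sketch, not part of either engine’s output):

from datetime import datetime

# Sketch (not generated by either engine): convert an epoch timestamp,
# such as the 'last_modified' value printed above, into a readable string.
def format_timestamp(epoch_seconds):
    return datetime.fromtimestamp(epoch_seconds).strftime('%Y-%m-%d %H:%M:%S')

# Usage: metadata['last_modified'] = format_timestamp(os.path.getmtime(filepath))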
For the next task I wanted to see how well both engines can understand a given language and create a parser based on that language. In this case we’re going to build a web scraper using the following prompt:
Write a Python script using BeautifulSoup or Scrapy that scrapes the top news headlines from 'https://news.ycombinator.com' and outputs them to a dictionary object.
Here is a snippet page source from 'https://news.ycombinator.com'. Use this to determine how to parse out the tags.
<html lang="en" op="news"><head><meta name="referrer" content="origin"><meta name="viewport" content="width=device-width, initial-scale=1.0"><link rel="stylesheet" type="text/css" href="news.css?J16btoAd8hqdkSoIdLSk">
As you will notice, I included some HTML in my prompt. My actual prompt contained more HTML, but I truncated it for this article. Here’s what GPT4 generated:
import requests
from bs4 import BeautifulSoup

# URL of the site to scrape
url = 'https://news.ycombinator.com'

# Send a GET request to the URL
response = requests.get(url)

# Parse the HTML content of the page using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all the news headlines
headlines = soup.select('.titleline > a')

# Dictionary to store headlines where key is the rank and value is the headline text
headlines_dict = {}

# Loop through each headline, extract and store it in the dictionary
for rank, headline in enumerate(headlines, start=1):
    headlines_dict[rank] = headline.get_text()

# Output the dictionary
print(headlines_dict)
This code works fine and does what it’s supposed to. I was hoping it would know, intuitively, that I wanted the link to the articles but I wasn’t explicit so I can’t use that as a mark against GPT4’s grade.
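That said, grabbing the links would only be a small change. As a rough sketch (my own addition, not GPT4’s output), the anchor tags already carry the article URL in their href attribute:

# Sketch (my addition, not GPT4's output): store the headline and its link,
# reusing the 'headlines' list of anchor tags from the script above.
for rank, headline in enumerate(headlines, start=1):
    headlines_dict[rank] = {
        'title': headline.get_text(),
        'url': headline.get('href')
    }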
Here is what Gemini generated:
import requests
from bs4 import BeautifulSoup

# Send an HTTP GET request to the Hacker News website
response = requests.get('https://news.ycombinator.com')

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all the 'athing' class elements, which contain the news articles
articles = soup.find_all('tr', class_='athing')

# Create an empty dictionary to store the headlines
headlines = {}

# Loop through each article and extract the headline and URL
for article in articles:
    # Find the URL element
    title_link = article.find('span', class_='titleline').find('a')
    # Add the headline and URL to the dictionary
    headlines[title_link.get_text()] = title_link['href']

# Print the dictionary of headlines
print(headlines)
This also works and is quite similar to GPT4’s output, but this code actually extracts the URL. That’s nice but, again, not an expectation. In this challenge I would say they are pretty much the same, though perhaps Gemini scores an extra point for being a mind reader.
For this task I want to create a web application that draws out stock charts of a few different tickers:
Create a web application in Python that shows charts of the following stocks: RIVN, TSLA, NKLA, FSR
This one was definitely trickier. It requires a frontend component (i.e., the graphs) and a backend component (i.e., retrieving the data from a stock data source). I’ve generated countless applications with GPT4 and wanted to see how Gemini stacked up against it. Here’s what GPT4 generated for me:
from flask import Flask, render_template_string

app = Flask(__name__)

@app.route('/')
fig = go.Figure()
for stock in stocks:
graph_html = fig.to_html(full_html=False)
return render_template_string("""
Figure 2: Screenshot of GPT4 generated web application
It’s simple, clean, and to the point. No frills but gets the job done. Here’s what Gemini generated for me:
import flask

app = flask.Flask(__name__)

# Stock symbols
# Fetch stock data using yfinance

@app.route('/')
charts = []
return render_template('index.html', charts=charts)

if __name__ == '__main__':
It also required me to create a new file, index.html, in a new directory called templates:
<!DOCTYPE html>
Figure 3: Screenshot of Gemini Advanced generated web application
This was definitely fancier, as it included a histogram and abstracted the HTML content away into a separate template file. Again, both were functional, so both receive a passing grade.
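For anyone who wants to try something similar end to end, here is a minimal, self-contained sketch of this kind of app. To be clear, this is my own illustration rather than either engine’s output; it assumes the flask, yfinance, and plotly packages are installed and inlines the page HTML with render_template_string so no separate templates/index.html is needed:

from flask import Flask, render_template_string
import plotly.graph_objects as go
import yfinance as yf

app = Flask(__name__)

# My own sketch (not output from GPT4 or Gemini); assumes flask, yfinance, and plotly.
STOCKS = ['RIVN', 'TSLA', 'NKLA', 'FSR']

@app.route('/')
def index():
    charts = []
    for symbol in STOCKS:
        # Pull one year of daily price history for the ticker
        # (delisted tickers may come back empty)
        history = yf.Ticker(symbol).history(period='1y')
        fig = go.Figure()
        fig.add_trace(go.Scatter(x=history.index, y=history['Close'], name=symbol))
        fig.update_layout(title=symbol)
        # Convert each figure into an embeddable HTML fragment
        charts.append(fig.to_html(full_html=False, include_plotlyjs='cdn'))
    # Render everything on one page without a separate template file
    return render_template_string('<html><body>{{ charts|safe }}</body></html>',
                                  charts=''.join(charts))

if __name__ == '__main__':
    app.run(debug=True)

Run it, browse to http://127.0.0.1:5000, and you should get one line chart per ticker.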
When GPT4 first came out, it took time to understand how to “prompt” it properly. When using Generative AI to write code, details matter. As a result, a lot of us have become more competent “Prompt Engineers” over this past year. Unfortunately, this can introduce some bias when stacking GPT4 up against other engines, and I will readily admit that. I understand how GPT4 “works” better than I understand Gemini, and the way I craft prompts for GPT4 might differ from what Gemini expects. Regardless, I still ran into some very fundamental problems with Gemini.
One recurring issue I found with how Gemini interprets instructions involved generating unit tests. My initial goal was to cover unit testing in this article but, out of frustration, I gave up on Gemini entirely because it never followed directions. For example, I would ask Gemini to write a unit test using pytest and encapsulate it in a class. It would generate the test using unittest instead, blatantly ignoring my request, but would encapsulate the code within a class. I would correct it, and it would acknowledge that it had accidentally used unittest instead of pytest. It would then rewrite the code using pytest but forget about putting it into a class. When I asked it to use a mock, it defaulted to unittest’s mock rather than the pytest-style mocker fixture (from the pytest-mock plugin). These are nuances, but they matter when engaging with Generative AI.
Figure 4: Gemini Advanced not following directions
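For reference, the following is roughly the shape of what I was asking Gemini to produce: a pytest-style test encapsulated in a class that uses the mocker fixture from pytest-mock instead of unittest.mock directly. It is a minimal sketch of my own (it assumes pytest and pytest-mock are installed), and the parse_title function is a hypothetical stand-in for the real code under test:

import requests

# Hypothetical function under test, included only so the sketch is self-contained.
def parse_title(url):
    response = requests.get(url)
    return response.text

class TestParseTitle:
    # pytest collects this class automatically; 'mocker' comes from pytest-mock.
    def test_returns_page_text(self, mocker):
        # Patch requests.get so the test never touches the network
        fake_response = mocker.Mock(text='<title>Hello</title>')
        mocker.patch('requests.get', return_value=fake_response)
        assert parse_title('https://news.ycombinator.com') == '<title>Hello</title>'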
Troubleshooting failures was another pain point. GPT4’s reasoning engine has proved to be pretty powerful when debugging errors in Python. Gemini…not so much. When I asked it to fix certain issues, it simply tried to rewrite the code with added indents or swapped variables…an utterly useless response.
Sometimes Gemini just wouldn’t work. It said it couldn’t process my request. In other cases it started commenting about the…upcoming election?
Figure 5: Gemini Advanced confused about the upcoming elections?
Overall, the quality of the boilerplate code generated by Gemini was pretty competitive with GPT4’s. The experience and its reasoning engine, however, left much to be desired. Luckily for Google, those pieces are part of the implementation around the LLM and not the LLM itself. In other words, the Gemini LLM seems to be fundamentally quite good and on par with GPT4, but the code written around it that powers the chat experience needs some help. In time, we will probably see Google iterate on that code and enhance the experience and reasoning engine.