AI inference engines are becoming increasingly popular at the edge: machine learning algorithms and even large language models are being ported to embedded systems in both the hobbyist and commercial spaces. In this project, we will build a real-time face detector using an Orange Pi 5 powered by the Rockchip RK3588 processor. In other words, we will detect human faces in our webcam's video stream in real time.
To get started, we're going to use an Orange Pi 5, a USB webcam, and the code in my public repository. You can refer to Getting Started with the Orange Pi 5 and the Rockchip RK3588 Processor for instructions on setting up your Orange Pi 5. What's great about this example is that it can also run on a PC or any other high-powered embedded device running Linux (e.g., a Raspberry Pi 5). Once your Orange Pi 5 (or other device) is up and running, we need to install a few libraries. This example is designed around Linux (specifically Ubuntu), so we'll assume that you have access to the Debian package manager. From a terminal, run the following command:
sudo apt update -y && sudo apt install -y python3 python3-pip libopencv-dev python3-opencv v4l-utils
This will install all the necessary packages to run the demo. Install the OpenCV Python package as well using pip:
pip3 install opencv-python
You can also consult the README for more steps to validate your setup.
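If you'd like an extra sanity check beyond the README, the short snippet below (my own illustration, not part of the repository) confirms that OpenCV imports correctly and that the first webcam can deliver a frame:

# check_setup.py - a quick sanity check for OpenCV and the webcam (illustrative, not from the repository)
import cv2

print("OpenCV version:", cv2.__version__)

# Open the first video device (/dev/video0) and try to grab a single frame.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()

if ok:
    print("Webcam OK, frame shape:", frame.shape)
else:
    print("Could not read a frame; check your webcam with 'v4l2-ctl --list-devices'")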
In AI Vision with the Kria KV260 Vision AI Starter Kit, we demonstrated how to build a face detection AI inference engine using an FPGA, but the level of complexity was extremely high. In this example, we're going to explore a significantly easier way of doing the same thing. It's easier because we offload the complexity to CPUs or GPUs, both of which we'll explore in this article. The real beauty is how well optimized OpenCV is for modern hardware. Using the library completely abstracts away all of the machine learning complexity that we experienced when building our own neural network on an FPGA.
The code is quite simple. We can break it down into a few steps (a condensed sketch follows the list):
1. Load the pre-trained Haar cascade classifier from its XML file.
2. Open the webcam and grab frames from the video stream.
3. Convert each frame to grayscale and run the classifier over it to find faces.
4. Draw a rectangle around each detected face and display the annotated frame.
5. Repeat until the user quits.
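As a rough illustration of those steps (the repository's face_detection.py may be structured differently, and the CPU/GPU selection logic is omitted here), a minimal version of the loop could look like this, using the Haar frontal-face model that ships with OpenCV:

# face_detection_sketch.py - a minimal sketch of the detection loop, not the repository's exact implementation.
import cv2

# Load the pre-trained Haar frontal-face model bundled with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)  # first webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Haar cascades operate on grayscale images.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Draw a blue rectangle around each detected face (BGR color order).
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

    cv2.imshow("Face Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()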
The real magic here is in the "Cascade Classifier" that we're leveraging with OpenCV. This is a machine learning-based tool for object detection that works through a series of progressively complex stages (similar to the neural network concept). Think of it like a finely tuned assembly line, where each stage has one job: to detect specific features like edges, corners, or shapes. If a region of the image passes all these inspection checkpoints, then it's flagged as a detected object. If not, it's discarded early to save time and processing power.
A cascade classifier is all about speed and efficiency. It uses predefined features (i.e., patterns like edges or contrasts) and processes them in stages, making it perfect for real-time tasks on devices with limited processing capabilities. Neural networks, on the other hand, play in a different league. They automatically learn features directly from data so they can handle more complex and varied scenarios. That power, however, comes at a price: neural networks demand way more computational resources and time. Cascade classifiers are fast and lightweight but less flexible, while neural networks are robust but resource-hungry. It's all about the right tool for the job.
In our case, we get to use a pre-trained model, the Haar Frontal Face Detector (in the form of an XML file), that knows exactly what features to look for in a face and what to filter out. Practically speaking, this is good enough for the simple example we've been exploring. As mentioned above, it won't be as precise as a neural network, but it's still good enough for most simple face detection use cases.
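If you want to experiment with the classifier yourself, the main knobs are the arguments to detectMultiScale. The values below are illustrative, not necessarily what the repository uses, and test.jpg is a placeholder image path:

# Illustrative parameter tuning for the Haar cascade (example values, not the repository's settings).
import cv2

model = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(model)

# Load a placeholder image and convert it to grayscale, as Haar cascades expect.
gray = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2GRAY)

# scaleFactor:  image shrink step between scan passes (closer to 1.0 = slower but more thorough)
# minNeighbors: overlapping detections required to keep a face (higher = fewer false positives)
# minSize:      ignore candidate regions smaller than this, in pixels
faces = detector.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=6, minSize=(60, 60))
print(f"Found {len(faces)} face(s)")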
To run the code, all you need to do is:
python3 face_detection.py --use-gpu
Or skip the --use-gpu flag if you don't have a GPU on your device (i.e., you're not using an Orange Pi 5). At this point, a small window should pop up, and a blue box should appear over your face. In my case, one popped up immediately.
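The repository handles the --use-gpu flag internally, and I haven't reproduced its exact approach here; one common way to push OpenCV work onto a GPU from Python is the transparent OpenCL API (T-API), sketched below on the assumption that your device exposes a working OpenCL driver:

# A sketch of GPU offload via OpenCV's transparent OpenCL API (T-API); this is one possible approach,
# not necessarily how face_detection.py implements --use-gpu.
import cv2

if cv2.ocl.haveOpenCL():
    cv2.ocl.setUseOpenCL(True)
    print("OpenCL enabled:", cv2.ocl.useOpenCL())

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()

if ok:
    gpu_frame = cv2.UMat(frame)                          # UMat keeps data on the GPU where possible
    gray = cv2.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY)   # runs through OpenCL when enabled
    result = gray.get()                                  # copy back to a NumPy array
    print("Grayscale frame shape:", result.shape)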
This project demonstrates how accessible real-time face detection has become with tools like OpenCV and devices like the Orange Pi 5. By leveraging the lightweight and efficient Cascade Classifier (loaded from a pre-trained XML file), we've built a functional example without the complexities of neural networks or FPGA programming. While this approach has its limitations, such as handling varied lighting or angles, it's a perfect entry point for experimenting with edge AI.
With just a few libraries and minimal setup, you can replicate this project on an embedded device or even a standard PC. As AI inference engines continue to improve, expect to see more sophisticated models running on resource-constrained devices, making advanced AI more accessible to everyone. To view the repository containing all the code to get started, visit https://gitlab.com/ai-examples/orange-pi-face-detection.