Detection Studio: Complete Tutorial – Face Recognition + YOLOv8 Object Detection in One Project

Introduction

So here is the thing – most computer vision tutorials out there teach you either face recognition or object detection. You rarely find a project that combines both in one clean, working desktop application. That is exactly what Detection Studio does.

This is a Python-based desktop GUI app built with tkinter that lets you:

  • Capture face images straight from your webcam
  • Train a personal face recognition model in one click
  • Run live detection that simultaneously identifies known faces and detects objects, animals, and people using YOLOv8

Everything runs 100% offline on your local machine. No cloud API, no subscriptions, no internet dependency after setup. Just Python doing serious computer vision work on your webcam feed.

In this tutorial, I will walk you through every part of this project — how it is built, how to set it up, how to use all three tabs, what happens under the hood when you train a model, and how to plug in your own custom YOLOv8 model. Let’s get into it.

What Makes This Project Different

Before we jump into setup, I want to explain why this particular combination of technologies is interesting.

Most face recognition demos use a simple script: load a photo, compare encodings, print a name. That is fine for learning. But Detection Studio is built for practical, real-world use. Here is what sets it apart:

Two models working together in one frame. YOLOv8 handles bounding box detection for people and objects. When a “person” bounding box is found, the face_recognition library kicks in to identify who that person is. You get both object-level detection and face-level identification in the same video frame, at the same time.

A proper GUI so anyone can use it. Enter a name, click Capture. Click Train. Click Start Detection. That is genuinely all it takes. You do not need to edit any config files or run Python from the terminal every time you want to add a new person.

Extensible with your own models. If you have trained a custom YOLOv8 model for your own use case — safety equipment detection, product recognition, vehicle classification — you can load it directly into Detection Studio and it will run alongside face recognition automatically.


Tech Stack Overview

Here is everything that powers the app and what each component does:

Library            Role
tkinter            Desktop GUI: window, tabs, buttons, camera canvas
face_recognition   128-dim face encoding, comparison, identification
dlib               Powers face_recognition under the hood
ultralytics        YOLOv8 model loading and inference
opencv-python      Webcam capture, frame processing, bounding box drawing
Pillow             Image handling between OpenCV and tkinter
pickle             Saving and loading trained face .pkl model files
numpy              Array operations for encoding comparisons

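The combination works well in practice because the two libraries cover different jobs: face_recognition handles face matching (its underlying dlib model reports 99.38% accuracy on the Labeled Faces in the Wild benchmark), while the YOLOv8 nano model is light enough to run in real time on many modern CPUs.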


Prerequisites

Before installing anything, make sure you have the following ready:

Python 3.9 or higher. Python 3.10 or 3.11 is recommended. Avoid Python 3.12 for now as some dlib wheels may not be available yet for all platforms.

CMake — required on Windows to compile dlib if a prebuilt wheel is not available for your Python version. Download from cmake.org and make sure you check “Add CMake to system PATH” during installation.

Visual Studio Build Tools — Windows only. Download from the Microsoft website and install the “Desktop development with C++” workload. This gives the compiler that dlib needs to build from source.

A working webcam — built-in laptop camera or USB webcam, both work fine.

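If you are on Mac or Linux, you can skip the Visual Studio Build Tools step. You will still need CMake and a C++ compiler whenever pip has to build dlib from source (for example, the Xcode Command Line Tools on macOS, or the build-essential and cmake packages on Debian/Ubuntu). With those in place, dlib compiles cleanly on both platforms.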


Installation

Step 1 — Clone the Repository

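Download and extract the source code (or clone the repository if you prefer), then open a terminal in the extracted project folder. All commands below are run from that folder.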

Step 2 — Install Dependencies

pip install -r requirements.txt

The requirements.txt includes all necessary packages:

face_recognition
ultralytics
opencv-python
Pillow
numpy
dlib

Windows users: If dlib fails to install, make sure CMake and Visual Studio Build Tools are installed and your terminal has been restarted after adding CMake to PATH. Then run pip install dlib separately before running the full requirements install. This isolates the compilation step and gives you clearer error messages if something goes wrong.

Installation takes anywhere from 3 to 10 minutes depending on your machine and internet speed. The dlib compilation step on Windows is the slow part — the terminal may look frozen for a couple of minutes. That is normal.
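Once the install finishes, it can help to confirm that every package is importable from the interpreter you are about to use. Here is a minimal sketch using only the standard library. The module list mirrors requirements.txt (note that opencv-python is imported as cv2 and Pillow as PIL), and missing_modules is a hypothetical helper name of mine, not something that ships with Detection Studio.

```python
import importlib.util
import sys

# Import names for the packages listed in requirements.txt.
# opencv-python is imported as "cv2" and Pillow as "PIL".
REQUIRED_MODULES = ["face_recognition", "ultralytics", "cv2", "PIL", "numpy", "dlib"]


def missing_modules(names):
    """Return the module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Show which Python is running and what it cannot see.
print("Interpreter:", sys.executable)
print("Missing:", missing_modules(REQUIRED_MODULES) or "none")
```

If the interpreter path printed here is not the one inside your virtual environment, that usually explains a ModuleNotFoundError better than a failed install does.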

Step 3 — Run the App

python main.py

If a window opens with three tabs at the top — Capture Faces, Train Model, Detect — you are fully set up and ready to go.

App Overview — Three Tabs, One Workflow

The entire workflow of Detection Studio follows a simple three-step sequence, one tab at a time:

Capture Faces  →  Train Model  →  Detect

You only need to go through steps one and two once per person. After that, detection runs from the saved model every time.

Tab 1 — Capture Faces

This is where you register new people into your face database.

How to use it:

Open the Capture Faces tab. You will see a large black camera preview area on the left and a control panel on the right with a text input field and two buttons – Capture Faces and Clear / Reset.

  1. Type the person’s name in the Person Name field. Use a clean name without spaces if possible — e.g. Rahul or Sara_K.
  2. Click Capture Faces.
  3. Look directly into the camera. The app will detect your face automatically.
  4. Hold reasonably still. The app captures 15 face images in sequence.
  5. Once 15 images are saved, the capture stops automatically.
  6. The images are stored in data/faces/<PersonName>/.

You can register multiple people by repeating this process — just change the name each time. Each person gets their own folder under data/faces/.
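To make the folder layout concrete, here is a small sketch of how the 15 output paths for one person could be generated. This is my own illustration, not the app's actual code: the helper names are hypothetical, the three-digit filenames follow the project tree shown later in this tutorial, and the rule of replacing spaces with underscores and dropping other special characters is an assumption.

```python
import re
from pathlib import Path

CAPTURE_COUNT = 15  # number of images per capture session, as described above


def safe_person_name(raw):
    """Turn user input into a folder-friendly name (assumed cleaning rule)."""
    cleaned = re.sub(r"\s+", "_", raw.strip())
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "", cleaned)
    if not cleaned:
        raise ValueError("person name is empty after cleaning")
    return cleaned


def face_image_path(person, index, root="data/faces"):
    """Path of the index-th capture, e.g. data/faces/Rahul/001.jpg."""
    return Path(root) / safe_person_name(person) / f"{index:03d}.jpg"


def capture_paths(person, root="data/faces"):
    """All paths for one capture session, numbered from 001."""
    return [face_image_path(person, i, root) for i in range(1, CAPTURE_COUNT + 1)]
```

For example, capture_paths("Sara K") would produce paths under data/faces/Sara_K/, from 001.jpg through 015.jpg.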

Tips for better capture quality:

Good lighting is the single biggest factor in recognition accuracy. Face the light source rather than having it behind you. Avoid harsh shadows across your face.

Vary your angle slightly across the 15 captures — tilt your head a little left, a little right, look slightly up and down. This gives the model more data to work with and improves recognition from non-frontal angles.

Keep only one face in frame during capture. Multiple faces confuse the detection logic and the wrong face may be captured.

If you wear glasses regularly, capture with glasses on. If you want the system to recognise you with and without glasses, do two separate capture sessions — one with, one without — and name them differently (e.g. Rahul_glasses and Rahul).

Use Clear / Reset if you made a mistake — it wipes the current capture session so you can start fresh.

Tab 2 — Train Model

After capturing faces for one or more people, switch to the Train Model tab to build your face recognition model.

What you see on this tab:

  • Detected Face Folders — a list of all person folders found in data/faces/. These are the people who will be included in the trained model.
  • Saved Models — a list of previously trained .pkl model files from data/trained/.
  • Train Model button — starts the training process.
  • Active Model label at the bottom — shows the filename of the currently loaded model.
  • Use and Del buttons — activate a selected saved model or delete it.

How training works:

When you click Train Model, the app loops through every image in every person folder under data/faces/. For each image, it uses face_recognition (which uses dlib under the hood) to extract a 128-dimensional face encoding: a numerical vector that summarises the face so that images of the same person produce vectors that lie close together.

It then pairs each encoding with the person’s name, for every image across every person (conceptually, encoding_vector → person_name). These pairs get serialised into a .pkl file using Python’s pickle module and saved to data/trained/ with a timestamp in the filename (e.g. face_model_20260427_124947.pkl).

The new model is automatically set as the active model.

Training time depends on how many face images you have. For 3-5 people with 15 images each, expect 30 to 90 seconds. For larger databases it scales linearly.
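The training loop described above can be sketched roughly as follows. Treat it as an illustration under assumptions rather than the app's real implementation: the function names are mine, the .pkl layout (a dict with "encodings" and "names" lists) is a guess, and I only look at .jpg files. The face_recognition calls used here (load_image_file and face_encodings) are the library's documented functions.

```python
import pickle
from datetime import datetime
from pathlib import Path


def encode_image(image_path):
    """Return the first 128-d encoding found in the image, or None."""
    # Imported lazily so the rest of the sketch runs without dlib installed.
    import face_recognition

    image = face_recognition.load_image_file(str(image_path))
    encodings = face_recognition.face_encodings(image)
    return encodings[0] if encodings else None


def train_face_model(faces_dir, out_dir, encode=encode_image):
    """Encode every image under faces_dir and pickle the result."""
    encodings, names = [], []
    for person_dir in sorted(Path(faces_dir).iterdir()):
        if not person_dir.is_dir():
            continue
        for image_path in sorted(person_dir.glob("*.jpg")):
            encoding = encode(image_path)
            if encoding is not None:  # skip images where no face was found
                encodings.append(encoding)
                names.append(person_dir.name)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    model_path = out_dir / f"face_model_{stamp}.pkl"
    with model_path.open("wb") as fh:
        # Assumed file layout: two parallel lists.
        pickle.dump({"encodings": encodings, "names": names}, fh)
    return model_path
```

Calling train_face_model("data/faces", "data/trained") would write a timestamped file in the same naming style as the example above. One general caution with pickle: only load .pkl files you created yourself, since unpickling can execute arbitrary code.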

At detection time, when a face is spotted in the webcam feed:

  1. A new 128-dim encoding is computed for the detected face in real time.
  2. This encoding is compared to every stored encoding in the .pkl model using Euclidean distance.
  3. The closest matching encoding below the tolerance threshold identifies the person.
  4. If no stored encoding is close enough — the label shows as “Unknown”.

This is why capturing varied angles matters. More diverse training images means more reference encodings to compare against, which makes the matching more robust.
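The four matching steps above boil down to a nearest-neighbour search with a cut-off. Here is a minimal sketch in plain Python so it runs without any dependencies. The function name identify is hypothetical, and the tolerance of 0.6 is the default used by face_recognition's compare_faces; the app's own setting may differ. In the real library, face_recognition.face_distance computes the same Euclidean distances with numpy.

```python
import math


def identify(query, known_encodings, known_names, tolerance=0.6):
    """Return (name, distance) for the closest stored encoding.

    The name is "Unknown" when nothing is within the tolerance.
    """
    best_name, best_distance = "Unknown", float("inf")
    for encoding, name in zip(known_encodings, known_names):
        distance = math.dist(query, encoding)  # Euclidean distance
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance <= tolerance:
        return best_name, best_distance
    return "Unknown", best_distance
```

Because every stored encoding is a candidate, a person with 15 varied reference images has 15 chances to land within the tolerance, which is the practical reason varied captures help.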

Tab 3 — Detect

This is the main operational tab. All live detection happens here.

Detection Mode Selection

The left panel shows six detection mode options as radio buttons:

Face Recognition — Combines YOLOv8 person detection with the trained .pkl face model. YOLOv8 finds person bounding boxes, face recognition labels who each person is.

Object Detection — Standard YOLOv8 with the full COCO dataset (80 classes). Detects chairs, bottles, phones, laptops, cars, and dozens of other common objects.

Animal Detection — Filtered YOLOv8 output showing only animal classes — cats, dogs, birds, horses, cows, and so on.

Person Detection — Detects and counts people in frame without face identification. Useful for footfall counting or basic crowd monitoring.

All Detection — Runs everything simultaneously. Faces are identified, objects are labelled, animals are spotted — all with bounding boxes.

Custom Detection — Load your own trained YOLOv8 .pt model. Details in the Custom Model section below.
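Since the built-in modes share the same COCO model, the difference between them largely comes down to which class IDs are kept. A rough sketch of that filter is below. The mode strings and the function name are hypothetical, and the 0.30 default mirrors the app's confidence slider; the class IDs themselves are the standard COCO indices used by the pretrained YOLOv8 models (0 is person, 14 to 23 are bird through giraffe).

```python
PERSON_CLASS_ID = 0
# COCO animal classes: bird, cat, dog, horse, sheep, cow,
# elephant, bear, zebra, giraffe.
ANIMAL_CLASS_IDS = frozenset(range(14, 24))


def keep_detection(mode, class_id, confidence, threshold=0.30):
    """Decide whether a detection should be drawn in the given mode."""
    if confidence < threshold:
        return False
    if mode == "animal":
        return class_id in ANIMAL_CLASS_IDS
    if mode in ("person", "face"):
        return class_id == PERSON_CLASS_ID
    # "object" and "all" keep every class above the threshold.
    return True
```

With Ultralytics, the same restriction can also be requested at inference time through the classes argument of predict, which avoids post-filtering altogether.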

Other Controls

Confidence Threshold slider — default is 0.30. This controls how confident the model needs to be before drawing a bounding box. Lower values detect more objects but include more false positives. Higher values are stricter but may miss objects. For most use cases 0.25–0.40 is a good range.

Enable Speak — when checked, the app announces detected names and object labels via text-to-speech. The Delay (s) spinner controls how many seconds must pass between announcements so it does not repeat the same label constantly.

Face Recognition File — pre-filled with the path to the active .pkl model. You can also manually browse to a specific model file here if you want to use a non-active one.

Start Detection — opens the webcam and begins live inference. The live video feed appears on the right panel with bounding boxes and labels drawn in real time.

Stop — ends detection and closes the camera feed.

Fullscreen — expands the detection view to full screen. Useful for demos.
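The Delay (s) setting for Enable Speak is essentially a throttle. Here is one way such a throttle could work, sketched with the standard library only. The class name is hypothetical, and tracking the delay per label (rather than one global timer) is my assumption about the behaviour, not something confirmed by the app.

```python
import time


class AnnouncementThrottle:
    """Allow each label to be announced at most once per delay window."""

    def __init__(self, delay_seconds, clock=time.monotonic):
        self.delay_seconds = delay_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_spoken = {}

    def should_speak(self, label):
        """Return True and record the time if the label may be spoken now."""
        now = self._clock()
        last = self._last_spoken.get(label)
        if last is not None and now - last < self.delay_seconds:
            return False
        self._last_spoken[label] = now
        return True
```

In a detection loop you would check should_speak(label) before handing the label to whatever text-to-speech call is in use.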

Custom YOLOv8 Model Integration

One of the most powerful features of Detection Studio is the ability to load any custom-trained YOLOv8 model — not just the default COCO model.

What You Need

To use Custom Detection mode, you need two files:

  1. A YOLOv8 weights file in .pt format (your trained model)
  2. A class names file in .yaml or .yml format (the dataset config used during training)

The .yaml file should contain a names field like this:

names:
  0: helmet
  1: no_helmet
  2: safety_vest
  3: no_vest

How to Load Your Custom Model

  1. Go to the Detect tab
  2. Select Custom Detection mode
  3. Click Browse .pt and select your YOLOv8 weights file
  4. Click Browse .yaml and select your class names config file
  5. Click Start Detection

Combined Face + Custom Object Detection

Here is where it gets genuinely useful. If your custom model’s class list includes a class named person, the app will automatically run face recognition on any person bounding box detected by your custom model. This means you get:

  • Your custom object classes (e.g. helmets, vests, products, vehicles) detected and labelled
  • Faces of any detected people identified by name simultaneously

All in one video frame, with no code changes needed. This makes the app directly applicable to real scenarios like site safety monitoring (detecting PPE compliance while also identifying workers), access control, or retail analytics.
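For the curious, reading the names field and checking for a person class could look something like the sketch below. The function names are hypothetical and this is not the app's actual loader. It assumes PyYAML is available (it is installed as a dependency of ultralytics) and handles both forms that Ultralytics dataset configs use for names: an index-to-name mapping, as in the example above, or a plain list.

```python
def normalise_names(names):
    """Convert a names mapping or list into {class_id: class_name}."""
    if isinstance(names, dict):
        return {int(key): str(value) for key, value in names.items()}
    return {index: str(value) for index, value in enumerate(names)}


def load_class_names(yaml_path):
    """Read the names field from a dataset .yaml or .yml file."""
    # Imported lazily so normalise_names can be used without PyYAML.
    import yaml

    with open(yaml_path, "r", encoding="utf-8") as fh:
        config = yaml.safe_load(fh)
    return normalise_names(config["names"])


def has_person_class(names):
    """True if the model has a class named exactly "person"."""
    return "person" in names.values()
```

With the helmet example above, has_person_class would return False, so only the four PPE classes would be drawn and no face recognition would run.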

Understanding the Detection Pipeline

Here is what actually happens every frame during live detection:

Step 1 — Frame Capture. OpenCV reads a frame from the webcam using the camera_index specified in settings.json.

Step 2 — YOLOv8 Inference. The frame is passed to the YOLOv8 model. The model returns bounding box coordinates, class IDs, and confidence scores for every detected object above the confidence threshold.

Step 3 — Filtering by Mode. Depending on the selected detection mode, boxes are filtered to show only relevant classes. In Animal mode, only animal class IDs are kept. In Person mode, only class 0 (person) is kept.

Step 4 — Face Recognition (if applicable). For every bounding box identified as a “person” class, the app crops that region of the frame and passes it to face_recognition.face_encodings(). The resulting encoding is compared against all stored encodings in the active .pkl model. The closest match below the tolerance threshold is used as the label.

Step 5 — Drawing. Bounding boxes, class labels, and confidence scores are drawn onto the frame using OpenCV. Recognised face names replace generic “person” labels.

Step 6 — Display. The annotated frame is converted from OpenCV’s BGR format to RGB, then from a numpy array to a PIL Image, then rendered onto the tkinter canvas. This happens every frame, continuously, until Stop is clicked.
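To tie the steps together, here is a stripped-down sketch of Steps 1, 2 and 5 using OpenCV and Ultralytics. It is deliberately simplified and is not the app's code: the function names are mine, the face recognition step is left out, the mode filter is skipped, and a plain cv2.imshow window stands in for the tkinter canvas. The yolov8n.pt weights file is the standard nano checkpoint, which Ultralytics downloads on first use.

```python
def clamp_box(x1, y1, x2, y2, width, height):
    """Clip a bounding box to the frame so crops never go out of range."""
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(width, int(x2)), min(height, int(y2))
    return x1, y1, x2, y2


def annotate_frame(frame, model, threshold=0.30):
    """Run YOLOv8 on one BGR frame and draw boxes with labels on it."""
    import cv2

    result = model(frame, conf=threshold, verbose=False)[0]
    height, width = frame.shape[:2]
    for box in result.boxes:
        x1, y1, x2, y2 = clamp_box(*box.xyxy[0].tolist(), width, height)
        label = f"{result.names[int(box.cls[0])]} {float(box.conf[0]):.2f}"
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, label, (x1, max(15, y1 - 5)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return frame


def run_preview(camera_index=0):
    """Read frames from the webcam until 'q' is pressed."""
    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    capture = cv2.VideoCapture(camera_index)
    try:
        while capture.isOpened():
            ok, frame = capture.read()
            if not ok:
                break
            cv2.imshow("preview", annotate_frame(frame, model))
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Calling run_preview() opens the default camera. To extend the sketch towards Step 4, the clamped box is what you would use to crop the person region, and remember that face_recognition expects RGB images, so the crop needs cv2.cvtColor(crop, cv2.COLOR_BGR2RGB) first.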


Runtime Data and File Structure

Understanding where data is stored helps with backup and troubleshooting.

detection-studio/
│
├── main.py                        # Entry point
├── requirements.txt               # All Python dependencies
├── settings.json                  # Camera index, active model path
│
├── data/
│   ├── faces/                     # Captured face images
│   │   ├── Rahul/
│   │   │   ├── 001.jpg
│   │   │   ├── 002.jpg
│   │   │   └── ... (15 images)
│   │   └── Sara/
│   │       └── ...
│   │
│   └── trained/                   # Trained face recognition models
│       ├── face_model_20260427_124947.pkl
│       └── face_model_20260420_093015.pkl

settings.json stores your camera index and the path to the currently active face model. It is auto-updated when you change the active model in the Train Model tab.
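If you ever want to read or edit settings.json from your own scripts, a defensive loader might look like the sketch below. The camera_index key comes from this tutorial; the active_model key name is a placeholder of mine, since the exact key the app uses is not shown here, so check your own settings.json before relying on it.

```python
import json
from pathlib import Path

# "camera_index" is the key named in this tutorial.
# "active_model" is a hypothetical key name used for illustration.
DEFAULT_SETTINGS = {"camera_index": 0, "active_model": None}


def load_settings(path="settings.json"):
    """Load settings, falling back to defaults for missing keys or file."""
    settings = dict(DEFAULT_SETTINGS)
    file = Path(path)
    if file.exists():
        settings.update(json.loads(file.read_text(encoding="utf-8")))
    return settings


def save_settings(settings, path="settings.json"):
    """Write settings back as indented JSON."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Merging onto defaults means a hand-edited file with a missing key still loads instead of raising a KeyError later.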

The data/ folder contains everything important. Back it up if you want to preserve your face database and trained models. Moving the data/ folder to another machine with the app installed is all you need to transfer your entire setup.

Each .pkl file is a fully self-contained face recognition model. It holds all face encodings and the associated person names. You can maintain multiple .pkl files for different scenarios — for example, one for home use and one for an office setup — and switch between them using the Use button on the Train Model tab.


Troubleshooting

dlib fails to install on Windows

This is the most common setup issue. The error usually looks like:

error: command 'cmake' failed: No such file or directory

or

CMake must be installed to build the following extensions: dlib

Fix: Install CMake from cmake.org, making sure to check “Add CMake to system PATH” during installation. Then install Visual Studio Build Tools from Microsoft and select the “Desktop development with C++” workload. Restart your terminal completely after both installs, then retry.

Camera shows black screen or “Could not open camera” error

Fix: Open settings.json and change the camera_index value. Index 0 is usually the built-in laptop camera. Index 1 or 2 is typically a USB webcam. Try each value until the correct camera opens.
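Rather than editing the file and restarting for each guess, you can probe the indices in one go. The following is a small sketch, assuming OpenCV is installed; probe_cameras is a hypothetical helper, and the opener parameter exists only so the logic can be exercised without real hardware.

```python
def probe_cameras(indices=range(3), opener=None):
    """Return the camera indices that open and deliver a frame."""
    if opener is None:
        import cv2  # imported lazily; only needed for real cameras

        opener = cv2.VideoCapture
    working = []
    for index in indices:
        capture = opener(index)
        try:
            # isOpened() alone is not enough: some devices open
            # but never return a frame.
            if capture.isOpened() and capture.read()[0]:
                working.append(index)
        finally:
            capture.release()
    return working
```

Running print(probe_cameras()) lists the usable indices; put one of them into camera_index in settings.json.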

Face not being detected during capture — counter stays at 0

This means the face detection algorithm cannot find a face in the frame. Common causes:

  • Poor or uneven lighting — try facing a window or lamp directly
  • Face too far from camera — move closer so your face fills more of the frame
  • More than one face in frame — make sure yours is the only face visible to the camera
  • Heavy accessories like hats, scarves, or masks blocking facial landmarks

Person shows as “Unknown” during detection

This means the face encoding from the live feed is not close enough to any stored encoding in the model. Common causes:

  • Very different lighting between capture and detection sessions
  • Different angle from training data
  • Not enough training images — try capturing 30 to 50 images per person instead of just 15
  • Model trained before the latest captures — retrain after adding more images

ModuleNotFoundError after installing requirements

If you see No module named 'face_recognition' or similar errors even after installing, you are likely running from a different Python environment than the one where you installed packages. Activate your virtual environment first, then run the app.

Poor real-time performance or very low frame rate

The YOLOv8 nano model (the default) runs well on most modern CPUs. If performance is poor, lower the input resolution in the settings or check whether another application is heavily using the CPU. For significantly better performance, a machine with a CUDA-capable GPU will run inference much faster.

What to Build Next

Detection Studio is a solid foundation. Here are some natural extensions once you are comfortable with the base app:

Attendance logging — write detected face names and timestamps to a CSV or SQLite database during detection. Useful for classroom or workplace attendance tracking.

Alert system — trigger a notification or play a sound when an unrecognised face (Unknown) is detected. Useful for basic access control.

Multi-camera support — extend the capture and detection logic to handle multiple camera feeds simultaneously.

Web dashboard — serve the annotated video feed over a local web server so it can be viewed in a browser on any device on your network.

Custom model training — train your own YOLOv8 model on a domain-specific dataset (safety equipment, product types, vehicle models) and load it into Detection Studio using the Custom Detection mode.
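As a starting point for the first of these ideas, attendance logging to CSV could be as small as the sketch below. Everything here is illustrative: the function name, the attendance.csv filename, and the two-column layout are my own choices, and it assumes unrecognised faces carry the label "Unknown" as described earlier.

```python
import csv
from datetime import datetime
from pathlib import Path


def log_attendance(name, csv_path="attendance.csv", now=None):
    """Append one (name, timestamp) row; skip unrecognised faces."""
    if name == "Unknown":
        return False
    now = now or datetime.now()
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["name", "timestamp"])  # header on first write
        writer.writerow([name, now.isoformat(timespec="seconds")])
    return True
```

In a live loop you would want to combine this with some rate limiting, otherwise the same person gets logged on every frame.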

Final Thoughts

Detection Studio demonstrates something important about modern computer vision — combining powerful pre-trained models with a practical interface makes the technology accessible for real use cases, not just academic demos.

The face_recognition library handles the hard work of face encoding with high accuracy. YOLOv8 handles real-time object detection fast enough to run on a regular CPU. tkinter makes it all accessible without requiring anyone to touch the command line after initial setup.

The project structure is clean, the data flow is straightforward, and the extensibility with custom models makes it genuinely useful beyond just a learning exercise.

If you run into issues during setup, the troubleshooting section above covers the most common problems. The dlib installation on Windows is the trickiest part — everything else is standard Python package management.

Article Information

Author: zealyen.it

Last Updated: April 27, 2026

This article is part of our practical learning series focused on embedded systems, STM32, Arduino, and IoT.
