Introduction
In this blogpost we'll be using datasets generated with Roboflow to benchmark YOLOv6 and YOLOv7 performance on three popular GPU machines offered by Paperspace.
Roboflow is a computer vision platform with a large number of useful features around data annotation, model training, and data compatibility. YOLOv6 and YOLOv7 are state-of-the-art real-time object detection models popular in computer vision.
This post isn’t intended to be a deep dive into YOLO model architecture but rather to highlight how easy it is to use Roboflow datasets out of the box with different model types and then to train on these datasets with various GPUs from Paperspace.
Let's get started!
Setup
We'll be using two datasets in this tutorial – a set of single-class aerial images of sheep and a set of multi-class images of Clash of Clans bases. Details about each dataset are below.
In this tutorial we'll train two object detection models (YOLOv6 and YOLOv7) on two different datasets across three different types of GPUs to show how to determine which GPU may be best for a given workload.
First we'll be benchmarking using 5 epochs. We'll then extrapolate these results out to 100 epochs to estimate the training times and costs we'll need to account for – and then we'll go ahead and train the most promising combinations for the full 100 epochs and detail the results.
The benchmarking process and code are available here.
Datasets
We'll be working with two datasets produced by Roboflow. These datasets are as follows:
Dataset | Type | Image size | Training images | Validation images | Testing images | Link |
---|---|---|---|---|---|---|
Aerial Sheep | Single-class | 3840 x 2160 | 1203 | 350 | 174 | https://universe.roboflow.com/riis/aerial-sheep/dataset/1 |
Clash of Clans | Multi-class | 640 x 640 | 88 | 24 | 13 | https://universe.roboflow.com/find-this-base/clash-of-clans-vop4y/dataset/5 |
Now that the table is set, let's walk through how to generate and download datasets like these from Roboflow.
Downloading data from Roboflow
One of the great things about Roboflow is Roboflow Universe, which provides a ton of different projects and datasets that can be pulled to use with a wide range of models.
In this tutorial we will be training YOLOv6 and YOLOv7 models. There are some quick and easy steps to download the data needed for them (in the proper format) on Roboflow. We'll show you now how to prepare the Clash of Clans dataset.
First we'll head over to the Clash of Clans project page. Next, we'll select Download.
From there, we'll select meituan/PyTorchv6 as our export format and then we'll make sure show download code is enabled.
We should now see that Roboflow has generated a snippet we can use to import the dataset into our project.
That's all there is to it! We now have a snippet that we can inject into our notebook that will download the Roboflow dataset into our project. Excellent!
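For reference, a download snippet of this kind might look something like the sketch below. This is an assumption based on the public roboflow Python package, not the exact snippet from the project page: the helper names parse_universe_url and download_dataset are hypothetical, the "mt-yolov6" format string is our best guess for the meituan YOLOv6 export, and the API key is a placeholder to replace with your own.

```python
from urllib.parse import urlparse


def parse_universe_url(url):
    """Split a Roboflow Universe dataset URL into (workspace, project, version)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path layout: /<workspace>/<project>/dataset/<version>
    if len(parts) != 4 or parts[2] != "dataset":
        raise ValueError(f"Unexpected Universe URL: {url}")
    workspace, project, _, version = parts
    return workspace, project, int(version)


def download_dataset(url, api_key, export_format="mt-yolov6"):
    """Hypothetical helper: download a dataset version (pip install roboflow)."""
    from roboflow import Roboflow  # imported lazily so the parser works without it

    workspace, project, version = parse_universe_url(url)
    rf = Roboflow(api_key=api_key)
    # "mt-yolov6" is an assumed format name; use the one shown in your own snippet.
    return rf.workspace(workspace).project(project).version(version).download(export_format)


CLASH_URL = "https://universe.roboflow.com/find-this-base/clash-of-clans-vop4y/dataset/5"
print(parse_universe_url(CLASH_URL))
# dataset = download_dataset(CLASH_URL, api_key="YOUR_API_KEY")  # needs network access
```

The download call is left commented out because it needs a valid API key and network access. The parsing step runs on its own and shows which parts of the dataset link map to the workspace, project, and version.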
Setting up the test environment in Paperspace
We'll be using Gradient Notebooks to run a simple benchmarking environment on Paperspace.
The notebook file is located in this GitHub repo. We can pull the repo directly into a notebook at the time we create a new notebook.
In the Paperspace console, we'll first navigate to Gradient, which is Paperspace's machine learning platform backed by powerful GPUs, and then create a new notebook within a project.
We'll then select the PyTorch 1.12 runtime.
Next, we'll select the machine. In this case we'll start off with the P6000 GPU knowing that we can stop the notebook and restart on a different machine at any time.
Next we'll toggle Advanced options and paste the URL of the testing repo into the Workspace URL field.
Now we can start the notebook and we should see that our notebook has entered the Running state. Nice!
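Since we may stop the notebook and restart it on a different machine, it can be handy to confirm which GPU the runtime actually sees before benchmarking. The cell below is a minimal sketch, not part of the benchmarking repo. The describe_gpu helper is our own name, and it falls back to a plain message if PyTorch or a GPU is not available.

```python
def describe_gpu():
    """Return a short description of the first GPU that PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "No CUDA GPU visible to PyTorch"
    props = torch.cuda.get_device_properties(0)
    memory_gb = props.total_memory / 1024**3
    return f"{props.name} with {memory_gb:.1f} GB of GPU memory"


print(describe_gpu())
```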
Now all that we have left to do is to inject the snippet that we grabbed from Roboflow into the appropriate code cells. We'll make sure to do that and then we should find ourselves with a working benchmarking notebook.
Training and benchmarking
For this object detection task we ran two different YOLO models, MT-YOLOv6 and YOLOv7 PyTorch.
Let's take a look at some of the details of the two models we'll be running. Let's be sure to note the size difference between them as that will play into the training times that we might expect to see when we start training.
YOLOv6 basic network details
Model | YOLOv6 |
Layers | 295 |
Parameters | 17.2M |
GFLOPS | 44.2 |
YOLOv7 basic network details
Model | YOLOv7 |
Layers | 415 |
Parameters | 37.2M |
GFLOPS | 105.4 |
In our testing, we used two different models across two different datasets powered by three different GPU machines – for a total of twelve different combinations.
For each run we measured the average epoch time over a sample of 5 epochs. We then extrapolated this training sample out to 100 epochs for the purposes of the comparison below. Once we establish these baselines, we'll then train the most promising combinations for the full 100 epochs.
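The post doesn't spell out how the epoch times were collected, so here is one simple way to take such a sample. It is a sketch under assumptions: run_epoch is a hypothetical callable that trains for one epoch, and fake_epoch is a stand-in so the example runs anywhere.

```python
import time


def average_epoch_time(run_epoch, epochs=5):
    """Call run_epoch once per epoch and return the mean wall-clock seconds."""
    durations = []
    for epoch in range(epochs):
        start = time.perf_counter()
        run_epoch(epoch)  # hypothetical callable that trains for one epoch
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def fake_epoch(epoch):
    # Stand-in for real training work so the sketch runs without a GPU.
    time.sleep(0.01)


avg = average_epoch_time(fake_epoch, epochs=5)
print(f"Average epoch time: {avg:.3f} s")
```

In practice the YOLO training scripts log per-epoch progress themselves, so reading the times from those logs would work just as well.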
To extrapolate to 100 epochs, we multiplied the average epoch time by 100 to estimate the total training time, and then multiplied that time (in hours) by the on-demand hourly price of each GPU machine to determine an estimated training cost. This way we can compare the training time as well as the estimated cost of each benchmark.
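The extrapolation is simple arithmetic, and a short sketch makes it concrete. The function name estimate_training_cost is ours. The inputs are the P5000 / YOLOv6 / Sheep figures reported in the results (3 min 53 sec per epoch at $0.78 per hour).

```python
def estimate_training_cost(avg_epoch_seconds, hourly_price, epochs=100):
    """Extrapolate an average epoch time to a full run and price it per hour."""
    total_hours = avg_epoch_seconds * epochs / 3600
    return total_hours, total_hours * hourly_price


# P5000 on the sheep dataset with YOLOv6: 3 min 53 sec per epoch at $0.78/hour.
hours, cost = estimate_training_cost(3 * 60 + 53, 0.78)
print(f"{hours:.2f} hours, ${cost:.2f}")  # prints: 6.47 hours, $5.05
```

This assumes epoch time stays constant over the run and ignores setup, validation, and idle time, so it should be read as an estimate and not as a bill.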
The following Paperspace GPU machines were compared:
GPU Type | GPU Memory | TFLOPS (SP) | Tensor Cores | CPUs | RAM | $ per hour |
---|---|---|---|---|---|---|
V100 | 16 GB | 14 | 640 | 8 | 30 GB | $2.30 |
Quadro P5000 | 16 GB | 8.9 | 0 | 8 | 30 GB | $0.78 |
RTX A6000 | 48 GB | 38.7 | 336 | 8 | 45 GB | $1.89 |
Results
The twelve training runs resulted in the following benchmarks.
The performance metrics are detailed below. Note that in each of the following four tables, we've extrapolated 5 epochs out to 100 epochs to get an estimate for the full training times and costs.
YOLOv6 - Sheep dataset
Machine Type | Model | Datasets | Average Epoch Time | On-Demand Hourly Pricing | Training Cost (100 epochs) |
---|---|---|---|---|---|
P5000 | YOLOv6 | Sheep | 3 min 53 sec | $0.78 | $5.05 |
A6000 | YOLOv6 | Sheep | 1 min 28 sec | $1.89 | $4.62 |
V100 | YOLOv6 | Sheep | 1 min 25 sec | $2.30 | $5.43 |
YOLOv6 - Clash of Clans dataset
Machine Type | Model | Datasets | Average Epoch Time | On-Demand Hourly Pricing | Training Cost (100 epochs) |
---|---|---|---|---|---|
P5000 | YOLOv6 | Clash of Clans | 9 seconds | $0.78 | $0.20 |
A6000 | YOLOv6 | Clash of Clans | 5 seconds | $1.89 | $0.26 |
V100 | YOLOv6 | Clash of Clans | 8 seconds | $2.30 | $0.51 |
YOLOv7 - Sheep dataset
Machine Type | Model | Datasets | Average Epoch Time | On-Demand Hourly Pricing | Training Cost (100 epochs) |
---|---|---|---|---|---|
P5000 | YOLOv7 | Sheep | 5 min 24 sec | $0.78 | $7.02 |
A6000 | YOLOv7 | Sheep | 1 min 24 sec | $1.89 | $4.41 |
V100 | YOLOv7 | Sheep | 1 min 51 sec | $2.30 | $7.09 |
YOLOv7 - Clash of Clans dataset
Machine Type | Model | Datasets | Average Epoch Time | On-Demand Hourly Pricing | Training Cost (100 epochs) |
---|---|---|---|---|---|
P5000 | YOLOv7 | Clash of Clans | 34 seconds | $0.78 | $0.74 |
A6000 | YOLOv7 | Clash of Clans | 12 seconds | $1.89 | $0.63 |
V100 | YOLOv7 | Clash of Clans | 30 seconds | $2.30 | $1.92 |
Results from training YOLOv6 and YOLOv7 for 5 epochs
In 3/4 training environments, the newer A6000 GPU machine with 48 GB of GPU memory performed the most cost-effective training.
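To double-check that claim, we can recompute the estimated 100-epoch cost for every combination and pick the cheapest machine in each. The sketch below copies the epoch times (converted to seconds) and hourly prices from the tables above. The variable and function names are our own.

```python
PRICES = {"P5000": 0.78, "A6000": 1.89, "V100": 2.30}  # $ per hour

# Average epoch time in seconds, copied from the four benchmark tables.
EPOCH_SECONDS = {
    ("YOLOv6", "Sheep"): {"P5000": 233, "A6000": 88, "V100": 85},
    ("YOLOv6", "Clash of Clans"): {"P5000": 9, "A6000": 5, "V100": 8},
    ("YOLOv7", "Sheep"): {"P5000": 324, "A6000": 84, "V100": 111},
    ("YOLOv7", "Clash of Clans"): {"P5000": 34, "A6000": 12, "V100": 30},
}


def cost_for_100_epochs(seconds_per_epoch, gpu):
    """Estimated cost of 100 epochs on the given GPU machine."""
    return seconds_per_epoch * 100 / 3600 * PRICES[gpu]


cheapest = {}
for combo, timings in EPOCH_SECONDS.items():
    costs = {gpu: cost_for_100_epochs(sec, gpu) for gpu, sec in timings.items()}
    cheapest[combo] = min(costs, key=costs.get)
    print(combo, "->", cheapest[combo], f"${costs[cheapest[combo]]:.2f}")
```

The A6000 comes out cheapest everywhere except YOLOv6 on Clash of Clans, where epochs are so short that the P5000's lower hourly price wins.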
It should be no surprise that training on the Aerial Sheep dataset required more compute time and power. In addition to having more images, the Aerial Sheep dataset also features substantially larger images in terms of resolution. It's clear from the data that bigger dataset images benefit from machines with higher TFLOPS performance.
As we scaled up image sizes and dataset sizes, we started to see some of the benefits of the higher-end GPUs like the A6000 and V100 in terms of processing time.
In this case the A6000 processes Aerial Sheep the fastest for YOLOv7 and comes within a few seconds per epoch of the V100 for YOLOv6. So even though it’s not the cheapest GPU per hour, it still works out to be the cheapest over a longer training cycle. These are the kinds of tradeoffs that are useful to identify when preparing for longer training runs.
Taking it one step further
Now that we’ve looked at estimated costs, let’s actually run the models over the 100 epochs for our selected combinations and see what performance we are getting.
We'll be performing the full 100 epoch training interval with the help of the P5000 and the A6000 machines and we'll be measuring the performance of the model using mAP@0.5.
If you would like to know more about mAP as a metric, please check out Evaluating Object Detection Models Using Mean Average Precision (mAP) from the Paperspace blog.
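As a quick intuition for the "@0.5" part: a predicted box only counts as a correct detection when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below illustrates just that matching rule with made-up boxes. It does not compute mAP itself, and the function names are ours.

```python
def iou(box_a, box_b):
    """Intersection over union for two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count."""
    return iou(pred_box, true_box) >= threshold


truth = (0, 0, 10, 10)  # made-up ground-truth box
print(is_match((1, 1, 10, 10), truth))  # IoU = 0.81, counts as a match
print(is_match((5, 0, 15, 10), truth))  # IoU = 1/3, does not count
```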
Note that in each of the two following tables we've trained for the full duration of 100 epochs rather than extrapolating.
YOLOv6 and YOLOv7 - Clash of Clans dataset
Machine Type | Model | Datasets | Training Time (100 epochs) | On-demand Hourly Pricing | Training Cost (100 epochs) | mAP@0.5 |
---|---|---|---|---|---|---|
P5000 | YOLOv6 | Clash of Clans | 12 min 32 seconds | $0.78 | $0.15 | 0.082 |
P5000 | YOLOv7 | Clash of Clans | 22 min 1 second | $0.78 | $0.27 | 0.068 |
YOLOv6 and YOLOv7 - Sheep dataset
Machine Type | Model | Datasets | Training Time (100 epochs) | On-demand Hourly Pricing | Training Cost (100 epochs) | mAP@0.5 |
---|---|---|---|---|---|---|
A6000 | YOLOv6 | Sheep | 1 hour 32 min 53 seconds | $1.89 | $2.93 | 0.933 |
A6000 | YOLOv7 | Sheep | 2 hours 29 min 56 seconds | $1.89 | $4.72 | 0.918 |
Results from training YOLOv6 and YOLOv7 for 100 epochs
One of the clearest takeaways from the longer training run is that for the datasets we're working with, neither YOLOv6 nor YOLOv7 is giving significantly better mAP@0.5 results than the other.
Secondly, we should note that, as expected, YOLOv6 is running in a little over half the time of YOLOv7 (roughly 57% on Clash of Clans and 62% on Aerial Sheep).
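The ratio can be checked directly from the training times in the two tables above. The sketch below parses the duration strings as written and compares YOLOv6 against YOLOv7 for each dataset. The helper name to_seconds is our own.

```python
import re

UNIT_SECONDS = {"hour": 3600, "min": 60, "sec": 1}


def to_seconds(text):
    """Convert strings like '1 hour 32 min 53 seconds' to seconds."""
    matches = re.findall(r"(\d+)\s*(hour|min|sec)", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in matches)


# (YOLOv6 time, YOLOv7 time) for the full 100-epoch runs.
RUNS = {
    "Clash of Clans (P5000)": ("12 min 32 seconds", "22 min 1 second"),
    "Sheep (A6000)": ("1 hour 32 min 53 seconds", "2 hours 29 min 56 seconds"),
}

for name, (v6, v7) in RUNS.items():
    ratio = to_seconds(v6) / to_seconds(v7)
    print(f"{name}: YOLOv6 took {ratio:.0%} of the YOLOv7 training time")
```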
Finally, as related to the datasets themselves, we can see that the Clash of Clans dataset, while fun to work with, has not provided enough data to get accurate results with either model. This is not entirely surprising as the dataset has more than 10 classes with a small number of images.
The Aerial Sheep dataset, meanwhile, is much larger and has a single prediction class, and we’re seeing mAP@0.5 of >0.9, a much better outcome for our training runs.
Up next
We hope this blogpost helped you to understand what's possible when using GPUs from Paperspace and datasets from Roboflow. Be sure to give a follow to @hellopaperspace and @roboflow to keep up with all the latest and greatest computer vision projects that are coming out.
And if you really enjoyed this benchmark and want to spin up your own object detection project using YOLOv6 or YOLOv7, make a free account on Paperspace and be sure to check out Roboflow Universe.