Benchmarking YOLOv6 and YOLOv7 on Paperspace Gradient with the help of Roboflow datasets

In this guide we'll be pairing Gradient Notebooks with Roboflow datasets to run a training benchmark and compare training costs for YOLO object detection models across multiple GPU types.

a month ago   •   9 min read

By Joshua Robison


In this blogpost we'll be using datasets generated with Roboflow to benchmark YOLOv6 and YOLOv7 performance on three popular GPU machines offered by Paperspace.

Roboflow is a computer vision platform with a large number of useful features around data annotation, model training, and data compatibility. YOLOv6 and YOLOv7 are state-of-the-art real-time object detection models that are popular in computer vision.

This post isn't intended to be a deep dive into YOLO model architecture. Instead, it highlights how easy it is to use Roboflow datasets out of the box with different model types, and then to train on these datasets with various GPUs from Paperspace.

Let's get started!


We'll be using two datasets in this tutorial – a set of single-class aerial images of sheep and a set of multi-class images of Clash of Clans bases. Details about each dataset are below.

The goal of this tutorial is to train two object detection models (YOLOv6 and YOLOv7) on different datasets across three types of GPUs, and to show how to determine which GPU may be the best fit for a given job.

First we'll be benchmarking using 5 epochs. We'll then extrapolate these results out to 100 epochs to estimate the training times and costs we'll need to account for – and then we'll go ahead and train the most promising combinations for the full 100 epochs and detail the results.

The benchmarking process and code are available here.


We'll be working with two datasets produced by Roboflow. These datasets are as follows:

| Dataset | Type | Image size | Training images | Validation images | Testing images |
|---|---|---|---|---|---|
| Aerial Sheep | Single-class | 3840 x 2160 | 1203 | 350 | 174 |
| Clash of Clans | Multi-class | 640 x 640 | 88 | 24 | 13 |
(Left) Example image from Aerial Sheep dataset. (Right) Example image from Clash of Clans dataset.

Now that the table is set, let's walk through how to generate and download datasets like these from Roboflow.

Downloading data from Roboflow

One of the great things about Roboflow is Roboflow Universe, which provides a ton of different projects and datasets that can be pulled to use with a wide range of models.

The Roboflow Universe is filled with lots of handy datasets and pre-trained models

In this tutorial we will be training YOLOv6 and YOLOv7 models. There are some quick and easy steps to download the data needed for them (in the proper format) on Roboflow. We'll show you now how to prepare the Clash of Clans dataset.

First we'll head over to the Clash of Clans project page. Next, we'll select Download.

Downloading the Clash of Clans dataset from Roboflow

From there, we'll select meituan/YOLOv6 as our export format and make sure that show download code is enabled.

We'll select the meituan/YOLOv6 format with show download code enabled

We should now see that Roboflow has generated a snippet we can use to import the dataset into our project.

Copy the snippet generated by Roboflow

That's all there is to it! We now have a snippet that we can inject into our notebook that will download the Roboflow dataset into our project. Excellent!
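For reference, the generated snippet usually looks something like the sketch below. This is only an illustration built on the `roboflow` Python package: the workspace name, project name, version number, and the `ROBOFLOW_API_KEY` environment variable are placeholders we made up, and the `"mt-yolov6"` format string is our understanding of what the meituan/YOLOv6 export uses. Copy the exact values from the snippet Roboflow generates for you.

```python
import os


def download_roboflow_dataset(workspace, project, version, export_format, api_key=None):
    """Download one version of a Roboflow project in the given export format."""
    # Imported lazily so this helper can be defined before `pip install roboflow`.
    from roboflow import Roboflow

    # Assumption: the API key is stored in an environment variable of our choosing.
    api_key = api_key or os.environ["ROBOFLOW_API_KEY"]
    rf = Roboflow(api_key=api_key)
    return rf.workspace(workspace).project(project).version(version).download(export_format)


# Example call with placeholder identifiers (replace with the values from your snippet):
# dataset = download_roboflow_dataset("your-workspace", "your-project", 1, "mt-yolov6")
# print(dataset.location)  # local folder that holds the exported images and labels
```

Keeping the API key out of the notebook itself makes it easier to share the notebook without leaking credentials.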

Setting up the test environment in Paperspace

We'll be using Gradient Notebooks to run a simple benchmarking environment on Paperspace.

The notebook file is located in this GitHub repo. We can pull the repo directly into a notebook at the time we create a new notebook.

In the Paperspace console, we'll first navigate to Gradient, which is Paperspace's machine learning platform backed by powerful GPUs, and then create a new notebook within a project.

We'll then select the PyTorch 1.12 runtime.

Select the PyTorch runtime

Next, we'll select the machine. In this case we'll start off with the P6000 GPU knowing that we can stop the notebook and restart on a different machine at any time.

Select a machine such as the P6000 GPU

Next we'll toggle Advanced options and paste the URL of the testing repo into the Workspace URL field.

Benchmarking repo on GitHub: gradient-ai/roboflow-yolo-benchmark (notebook: YOLO-training.ipynb on the main branch)
Add the workspace URL to automatically pull the benchmarking repo into the notebook

Now we can start the notebook and we should see that our notebook has entered the Running state. Nice!

Our notebook is now running on a P6000 GPU
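Because we can stop the notebook and restart it on a different machine, it can be worth confirming which GPU the session actually sees before timing anything. Here is a minimal sketch using PyTorch's CUDA helpers; the `describe_gpu` function name is our own and is not part of the benchmarking repo.

```python
def describe_gpu():
    """Return the name of the first visible CUDA device, or None if there is none."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_name(0)


print(describe_gpu())
```

Recording this value alongside each timing makes it harder to mix up results after switching machine types.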

Now all that we have left to do is to inject the snippet that we grabbed from Roboflow into the appropriate code cells. We'll make sure to do that and then we should find ourselves with a working benchmarking notebook.

Training and benchmarking

For this object detection task we ran two different YOLO models, MT-YOLOv6 and YOLOv7 PyTorch.

Let's take a look at some of the details of the two models we'll be running. Let's be sure to note the size difference between them as that will play into the training times that we might expect to see when we start training.

YOLOv6 basic network details

Model YOLOv6
Layers 295
Parameters 17.2M

YOLOv7 basic network details

Model YOLOv7
Layers 415
Parameters 37.2M
GFLOPS 105.4

In our testing, we used two different models across two different datasets powered by three different GPU machines – for a total of twelve different combinations.

For each run we measured the average epoch time over a sample of 5 epochs. We then extrapolated this sample out to 100 epochs for the comparison below. Once we've established these baselines, we'll train the most promising combinations for the full 100 epochs.
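If you want to reproduce this kind of measurement in your own training loop, one simple approach is to time each epoch and average the results. The sketch below is a generic illustration, not the code used for this benchmark: `train_one_epoch` is a hypothetical callable standing in for whatever runs a single epoch in your setup.

```python
import time
from statistics import mean


def average_epoch_seconds(train_one_epoch, epochs=5):
    """Run `epochs` epochs and return the mean wall-clock time per epoch."""
    durations = []
    for epoch in range(epochs):
        start = time.perf_counter()
        train_one_epoch(epoch)  # hypothetical: runs exactly one training epoch
        durations.append(time.perf_counter() - start)
    return mean(durations)
```

The first epoch often includes one-time setup such as data caching, so a short sample like this is best treated as a rough estimate.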

To extrapolate to 100 epochs, we multiplied the average epoch time by 100 to get an estimated training time, then multiplied that time (in hours) by the on-demand hourly price of each GPU machine to get an estimated training cost. This way we can compare the training time as well as the estimated cost of each benchmark.
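The arithmetic behind the estimate is simple enough to sketch in a few lines. The function name below is our own; the example values (3 min 53 sec per epoch at $0.78 per hour) are taken from the P5000 row of the YOLOv6 Sheep table further down.

```python
def estimate_training_cost(avg_epoch_seconds, hourly_price, epochs=100):
    """Extrapolate an average epoch time to a full run and price it."""
    hours = avg_epoch_seconds * epochs / 3600  # seconds -> hours
    return hours, hours * hourly_price


hours, cost = estimate_training_cost(3 * 60 + 53, 0.78)
print(f"{hours:.2f} h, ${cost:.2f}")  # 6.47 h, $5.05
```

This assumes every epoch takes about as long as the sampled ones and that billing is proportional to run time.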

The following Paperspace GPU machines were compared:

| GPU Type | GPU Memory | TFLOPS (SP) | Tensor Cores | CPUs | RAM | $ per hour |
|---|---|---|---|---|---|---|
| V100 | 16 GB | 14 | 640 | 8 | 30 GB | $2.30 |
| Quadro P5000 | 16 GB | 8.9 | 0 | 8 | 30 GB | $0.78 |
| RTX A6000 | 48 GB | 38.7 | 336 | 8 | 45 GB | $1.89 |


The twelve training runs resulted in the following benchmarks.

The detailed performance metrics are listed below. Note that in each of the following four tables, we've extrapolated 5 epochs out to 100 epochs to get an estimate for the full training times and costs.

YOLOv6 - Sheep dataset

| Machine Type | Model | Dataset | Average Epoch Time | On-Demand Hourly Pricing | Training Cost (100 epochs) |
|---|---|---|---|---|---|
| P5000 | YOLOv6 | Sheep | 3 min 53 sec | $0.78 | $5.05 |
| A6000 | YOLOv6 | Sheep | 1 min 28 sec | $1.89 | $4.62 |
| V100 | YOLOv6 | Sheep | 1 min 25 sec | $2.30 | $5.43 |

YOLOv6 - Clash of Clans dataset

| Machine Type | Model | Dataset | Average Epoch Time | On-Demand Hourly Pricing | Training Cost (100 epochs) |
|---|---|---|---|---|---|
| P5000 | YOLOv6 | Clash of Clans | 9 sec | $0.78 | $0.20 |
| A6000 | YOLOv6 | Clash of Clans | 5 sec | $1.89 | $0.26 |
| V100 | YOLOv6 | Clash of Clans | 8 sec | $2.30 | $0.51 |

YOLOv7 - Sheep dataset

| Machine Type | Model | Dataset | Average Epoch Time | On-Demand Hourly Pricing | Training Cost (100 epochs) |
|---|---|---|---|---|---|
| P5000 | YOLOv7 | Sheep | 5 min 24 sec | $0.78 | $7.02 |
| A6000 | YOLOv7 | Sheep | 1 min 24 sec | $1.89 | $4.41 |
| V100 | YOLOv7 | Sheep | 1 min 51 sec | $2.30 | $7.09 |

YOLOv7 - Clash of Clans dataset

| Machine Type | Model | Dataset | Average Epoch Time | On-Demand Hourly Pricing | Training Cost (100 epochs) |
|---|---|---|---|---|---|
| P5000 | YOLOv7 | Clash of Clans | 34 sec | $0.78 | $0.74 |
| A6000 | YOLOv7 | Clash of Clans | 12 sec | $1.89 | $0.63 |
| V100 | YOLOv7 | Clash of Clans | 30 sec | $2.30 | $1.92 |

Results from training YOLOv6 and YOLOv7 for 5 epochs

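In three of the four model and dataset combinations, the newer A6000 GPU machine with 48 GB of GPU memory was the most cost-effective option. The exception was YOLOv6 on the Clash of Clans dataset, where the P5000 came out cheapest.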
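To double-check that summary, here is a small sketch that recomputes the estimated 100-epoch cost for every combination and picks the cheapest machine. The epoch times and prices are copied from the tables above; the dictionary layout and function name are our own.

```python
# On-demand hourly prices in USD, from the GPU comparison table.
PRICES = {"P5000": 0.78, "A6000": 1.89, "V100": 2.30}

# Average epoch time in seconds for each (model, dataset) combination.
EPOCH_SECONDS = {
    ("YOLOv6", "Sheep"): {"P5000": 233, "A6000": 88, "V100": 85},
    ("YOLOv6", "Clash of Clans"): {"P5000": 9, "A6000": 5, "V100": 8},
    ("YOLOv7", "Sheep"): {"P5000": 324, "A6000": 84, "V100": 111},
    ("YOLOv7", "Clash of Clans"): {"P5000": 34, "A6000": 12, "V100": 30},
}


def cheapest_gpu(seconds_by_gpu, epochs=100):
    """Return (gpu, estimated cost) for the cheapest machine over `epochs` epochs."""
    costs = {
        gpu: secs * epochs / 3600 * PRICES[gpu]
        for gpu, secs in seconds_by_gpu.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


winners = {combo: cheapest_gpu(times) for combo, times in EPOCH_SECONDS.items()}
for (model, dataset), (gpu, cost) in winners.items():
    print(f"{model} / {dataset}: {gpu} at ${cost:.2f}")
```

Swapping in your own epoch times and current prices is all it takes to rerun the comparison for a different project.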

It should be no surprise that training on the Aerial Sheep dataset required more compute time and power. In addition to having more images, the Aerial Sheep dataset also features substantially larger images in terms of resolution. It's clear from the data that bigger dataset images benefit from machines with higher TFLOPS performance.

As image sizes and dataset sizes scale up, we start to see some of the benefits of the higher-end GPUs like the A6000 and V100 in terms of processing time.

In this case the A6000 processes Aerial Sheep fastest for YOLOv7 and comes within a few seconds of the V100 for YOLOv6 (1 min 28 sec versus 1 min 25 sec per epoch). So even though it's not the cheapest GPU per hour, it still works out cheapest over a longer training cycle. These are the kinds of tradeoffs that are useful to identify when preparing for longer training runs.
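One way to reason about this tradeoff: a pricier GPU is only cheaper overall when its speedup exceeds its price ratio. The sketch below applies that rule to the A6000 and P5000 using the YOLOv6 numbers from the tables above; the function name is ours.

```python
def faster_gpu_is_cheaper(slow_secs, fast_secs, slow_price, fast_price):
    """Compare speedup against price ratio for two machines on the same job."""
    speedup = slow_secs / fast_secs          # how many times faster the pricier GPU is
    price_ratio = fast_price / slow_price    # how many times more it costs per hour
    return speedup, price_ratio, speedup > price_ratio


# YOLOv6 on Aerial Sheep: P5000 at 233 s/epoch vs A6000 at 88 s/epoch.
print(faster_gpu_is_cheaper(233, 88, 0.78, 1.89))

# YOLOv6 on Clash of Clans: P5000 at 9 s/epoch vs A6000 at 5 s/epoch.
print(faster_gpu_is_cheaper(9, 5, 0.78, 1.89))
```

The A6000 costs about 2.4 times as much per hour as the P5000. On Aerial Sheep it is about 2.6 times faster, so it wins on cost. On Clash of Clans it is only 1.8 times faster, so the P5000 stays cheaper.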

Taking it one step further

Now that we've looked at estimated costs, let's actually run the models for the full 100 epochs on our selected combinations and see what performance we get.

We'll be performing the full 100 epoch training interval with the help of the P5000 and the A6000 machines and we'll be measuring the performance of the model using mAP@0.5.

If you would like to know more about mAP as a metric, please check out Evaluating Object Detection Models Using Mean Average Precision (mAP) from the Paperspace blog.
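As a quick illustration of the 0.5 in mAP@0.5: it is an intersection over union (IoU) threshold, meaning a predicted box only counts as a correct detection if its IoU with a ground-truth box of the same class is at least 0.5. Here is a minimal sketch of the IoU calculation, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples. This is a convention we picked for the example, not necessarily the label format either model uses.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping region, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10 x 10 boxes offset by half their width overlap by one third.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)) >= 0.5)  # False
```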

Note that in each of the two following tables we've trained for the full duration of 100 epochs rather than extrapolating.

YOLOv6 and YOLOv7 - Clash of Clans dataset

| Machine Type | Model | Dataset | Training Time (100 epochs) | On-Demand Hourly Pricing | Training Cost (100 epochs) | mAP@0.5 |
|---|---|---|---|---|---|---|
| P5000 | YOLOv6 | Clash of Clans | 12 min 32 sec | $0.78 | $0.15 | 0.082 |
| P5000 | YOLOv7 | Clash of Clans | 22 min 1 sec | $0.78 | $0.27 | 0.068 |

YOLOv6 and YOLOv7 - Sheep dataset

| Machine Type | Model | Dataset | Training Time (100 epochs) | On-Demand Hourly Pricing | Training Cost (100 epochs) | mAP@0.5 |
|---|---|---|---|---|---|---|
| A6000 | YOLOv6 | Sheep | 1 hr 32 min 53 sec | $1.89 | $2.93 | 0.933 |
| A6000 | YOLOv7 | Sheep | 2 hr 29 min 56 sec | $1.89 | $4.72 | 0.918 |

Results from training YOLOv6 and YOLOv7 for 100 epochs

One of the clearest takeaways from the longer training run is that for the datasets we're working with, neither YOLOv6 nor YOLOv7 is giving significantly better mAP@0.5 results than the other.

Secondly, we should note that, as expected, YOLOv6 finishes in a little over half the time of YOLOv7: roughly 57% of the YOLOv7 training time on Clash of Clans and 62% on Aerial Sheep.
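The time comparison can be worked out directly from the two 100-epoch tables. The sketch below converts the reported durations to seconds and computes the YOLOv6 to YOLOv7 ratio for each dataset; the helper and variable names are our own.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds duration to total seconds."""
    return hours * 3600 + minutes * 60 + seconds


# (YOLOv6 time, YOLOv7 time) for the full 100-epoch runs reported above.
RUNS = {
    "Clash of Clans (P5000)": (
        to_seconds(minutes=12, seconds=32),
        to_seconds(minutes=22, seconds=1),
    ),
    "Sheep (A6000)": (
        to_seconds(hours=1, minutes=32, seconds=53),
        to_seconds(hours=2, minutes=29, seconds=56),
    ),
}

ratios = {name: v6 / v7 for name, (v6, v7) in RUNS.items()}
for name, ratio in ratios.items():
    print(f"{name}: YOLOv6 took {ratio:.0%} of the YOLOv7 time")
```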

Finally, as related to the datasets themselves, we can see that the Clash of Clans dataset, while fun to work with, has not provided enough data to get accurate results with either model. This is not entirely surprising as the dataset has more than 10 classes with a small number of images.

The Aerial Sheep dataset, meanwhile, is much larger and has a single prediction class. There we're seeing mAP@0.5 above 0.9, a much better outcome for our training runs.

Up next

We hope this blogpost helped you to understand what's possible when using GPUs from Paperspace and datasets from Roboflow. Be sure to give a follow to @hellopaperspace and @roboflow to keep up with all the latest and greatest computer vision projects that are coming out.

And if you really enjoyed this benchmark and want to spin up your own object detection project using YOLOv6 or YOLOv7, make a free account on Paperspace and be sure to check out Roboflow Universe.
