Move Quickly, Think Deeply: How Research Is Done @ Paperspace ATG



By Harsh Sikka

[12/2/2021 Update: This article contains information about Gradient Experiments. Experiments are now deprecated. For more information on current Gradient Resources, please see the Gradient Docs]

The Advanced Technologies Group is an R&D-focused team here at Paperspace, comprising ML Engineers and Researchers. As a group, we're interested in exploring advanced topics in deep learning, data engineering, computer systems, and UI/UX, with the downstream intent of building intelligent applications. If our work sounds interesting to you, consider applying for our Research Fellowships!

In this post, we'll give a broad overview of the tools and practices the Advanced Technologies Group, or ATG, uses to explore research ideas, framed as a high-level research workflow. Many of our research topics sit at the intersection of fields like Deep Learning and Computer Systems. We tend to move fast and tackle ambitious, computationally intensive experiments, and because we have powerful tools and compute available through Paperspace's Gradient platform, we can pursue research questions that more traditional academic groups sometimes avoid.

Here, we've outlined the general progression of a research workflow that we've found useful for the types of projects we tackle. We'll start with the initial exploratory phase, where we scope out the problem and get preliminary results. Then we'll cover how we scale up our experiments on the Paperspace Cloud. Finally, we'll cover how we version our experiments and keep track of internal progress on research agendas.


Keeping up with the ML firehose

The sheer volume of ideas that are shared and papers that are published in the field of Machine Learning is almost incomprehensible. It's enormously difficult to keep up with every new idea that pops up daily, and many of these are of course incremental improvements on fundamental breakthroughs. At the ATG, our researchers join the team with specific ideas they intend to pursue, and usually some idea of how to get there. Past ideas and projects have included GPU Kernel programming, Adversarial Learning Schemes, and Neural Architecture Search.

We've worked to introduce a culture of deep inter-area collaboration within the ATG, and ideas often shift to include the expertise of another interested member of the team, or anyone else at Paperspace in general. We're also open to topics that many ML theoreticians shy away from, including interpretability, design, and Human-in-the-Loop systems. New ideas are shared through Lunch and Learn talks, reading group meetings, and a general open culture that allows anyone to strike up a conversation about an interesting project with anybody else. We've had software engineers, project managers, and deep learning researchers excitedly discussing the implications of modularity and pruning in deep neural networks. It's an awesome experience. Bright people, natural curiosity, and a very collaborative culture lead to many incredible ideas and projects forming here at Paperspace.

Exploring an idea: The bread and butter of research

To those less familiar with research, diving in can seem ambiguous and daunting, especially if your only experience is reading papers and seeing final results. The reality of experimentation, especially here at the ATG, is that we start out with a small extension of, or question about, some experimental results. We may try to reproduce the results of a paper or test an idea in a new domain. Naturally, interesting ideas and extensions emerge as we come to better understand the implications and underpinnings of the work.

When a novel idea starts to form as a result of this process, we scope it down to something empirically or theoretically testable. It is crucial to keep this as scoped down and simple as possible, so that the mechanisms at play or the desired result are clearly and directly visible. As an example, suppose we want to test a new pruning mechanism. Rather than jumping straight to complex architectures like ResNet, we would first train a simple fully connected, feedforward network and test the pruning mechanism there. Then we might add a CNN to the exploratory code and test the pruning mechanism on the new architecture.
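To make that concrete, here's a minimal sketch of what such a first test might look like. It uses simple magnitude pruning on a small fully connected MNIST model as an illustrative stand-in for whatever mechanism is actually under test; it isn't from a specific ATG codebase.

```python
import numpy as np
import tensorflow as tf

# Small feedforward baseline: the simplest setting in which the
# pruning mechanism's effect is directly visible.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128)

def prune_by_magnitude(layer, sparsity=0.5):
    """Zero out the smallest-magnitude weights in a Dense layer.
    Illustrative stand-in for the pruning mechanism under test."""
    weights, bias = layer.get_weights()
    threshold = np.quantile(np.abs(weights), sparsity)
    weights[np.abs(weights) < threshold] = 0.0
    layer.set_weights([weights, bias])

for layer in model.layers:
    prune_by_magnitude(layer)

# Check held-out accuracy after pruning; only once this behaves as
# expected would we move on to a CNN or a larger benchmark.
model.evaluate(x_test, y_test, batch_size=512)
```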

Whether reimplementing another paper's results or trying out a scoped-down idea of your own, the goal is to have a high level of granularity and control in the process. In our team, we find Gradient Notebooks to be an invaluable tool here. Gradient Notebooks let us use containers with pre-installed libraries and software, and give us a Jupyter notebook interface with access to a shared workspace that allows for quick iteration and exploration.

Since moving fast and testing many small, scoped-down possibilities is key to building conceptual and empirical understanding, we make very frequent use of this feature. We've also recently been exploring the use of the Gradient SDK inside notebooks, allowing us to kick off Experiments and larger workloads quickly as well. If we generate a useful result, we can store it in the shared workspace storage and use it in more serious follow-on experiments if we'd like. Additionally, if some part of the research is computationally intensive even when scoped down to a proof-of-concept experiment, Gradient lets us specify what kind of GPU we'd like powering our notebook, something we aren't able to do on other services like Google Colab or with a local Jupyter notebook install.
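As a rough illustration, kicking off a run from inside a notebook with the Gradient SDK looked something like the sketch below. Keep in mind that Experiments are now deprecated (see the update note above), and the client, method, and parameter names here are recalled from the old SDK, so treat them as assumptions and check the Gradient Docs for current equivalents.

```python
from gradient import ExperimentsClient  # deprecated SDK; names may differ

client = ExperimentsClient(api_key="YOUR_API_KEY")

# Kick off a single-node run against the same shared workspace the
# notebook uses; all values below are placeholders, not real IDs.
experiment_id = client.create_single_node(
    name="pruning-poc",
    project_id="your-project-id",
    machine_type="P4000",
    container="tensorflow/tensorflow:latest-gpu",
    command="python train.py --sparsity 0.5",
    workspace_url="https://github.com/your-org/your-repo",
)
client.start(experiment_id)
```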

With great initial results come large follow-on experiments.

Whoa. The initial explorations of our new idea yielded some very interesting results. Our hypothesis may just be correct, so now what? Well, in many fields, including Deep Learning, your method or result should really be tested on some larger benchmark tasks. Some of these may be small, but some can be quite computationally intensive.

Larger experiments tend to be more structured and involve a fair degree of software engineering. They take longer to set up and are tested a little more rigorously to make sure training is in fact happening. This is usually when folks on our team shift to more organized codebases rather than monolithic files. We'll start applying design principles and documenting some of the engineering decisions. For a researcher, a notebook interface starts to feel lacking at this larger scale, since most of the time is no longer spent rerunning cells with small tweaks and rapidly redesigning the codebase.
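The shape of that shift is roughly the following: a configurable entrypoint with the experiment's knobs exposed as flags, instead of values edited inline in a notebook cell. This is a generic sketch under our own naming, not an ATG codebase; the model and dataset are placeholders.

```python
import argparse
import tensorflow as tf

def parse_args():
    parser = argparse.ArgumentParser(description="Training entrypoint")
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=128)
    return parser.parse_args()

def build_model(lr):
    # Placeholder model; in a real experiment this lives in its own module.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def main():
    args = parse_args()
    (x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
    x_train, x_val = x_train / 255.0, x_val / 255.0
    model = build_model(args.lr)
    model.fit(x_train, y_train,
              epochs=args.epochs,
              batch_size=args.batch_size,
              validation_data=(x_val, y_val))

if __name__ == "__main__":
    main()
```

Because every knob is a flag, the same codebase can back many runs at once: each job simply invokes the script with different arguments.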

At ATG, we have access to Gradient's Experiments interface, which lets us treat computationally intensive runs of a particular codebase as jobs. These Experiments run our specified code with access to that same shared workspace we mentioned earlier. The result is the ability to spin up multiple experiments in parallel and get results quickly. We also make use of multinode features and distributed training where appropriate. Gradient also automatically parses statistics about our model processes, so we get useful analytics around performance and other important metrics.

A quick note on tooling: we tend to use TensorFlow because of its expansive ecosystem and support for large systems-level experiments. We've also used PyTorch and find it very useful.

Experiment versioning with Gradient CI

An ongoing problem in ML research, and perhaps CS research in general, is deciding how to version your research models and experiments. As researchers, we sometimes find that tweaking small values in our codebase, like hyperparameters, can have an enormous effect on our results. But does changing the learning rate from 0.001 to 0.005 constitute an entirely new experiment that we should keep track of? At the ATG, we've taken inspiration from our software engineering roots and decided that any useful committed change should constitute an experiment version. After all, the cost of a lost experiment is almost certainly higher than the cost of tracking many incremental experiments. Paperspace's GradientCI tool tracks changes and automatically runs them as experiments if we so desire. It will also automatically generate a useful report of the various metrics we want, in a similar manner to the Gradient client.
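Independent of GradientCI, the convention is easy to apply in your own training code: stamp every run with the commit it was launched from. Here's a minimal sketch of that idea; the file layout and helper names are our illustration, not part of any Gradient API.

```python
import json
import subprocess
from pathlib import Path

def current_commit():
    """Short hash of the checked-out commit; the experiment's version."""
    return subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()

def record_experiment(hyperparams, metrics, out_dir="runs"):
    """Write a small JSON record keyed by commit hash, so every committed
    tweak (e.g. lr 0.001 -> 0.005) is a tracked experiment version."""
    commit = current_commit()
    path = Path(out_dir) / f"experiment-{commit}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(
        {"commit": commit, "hyperparams": hyperparams, "metrics": metrics},
        indent=2,
    ))
    return path

# Example usage with placeholder values:
record_experiment({"lr": 0.005, "epochs": 10}, {"val_acc": 0.0})
```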

There is no right way to do research!

There really isn't. Research processes should be a combination of what makes sense for the sort of work you're doing and what makes your research group feel comfortable and excited. At ATG, we pull from a combined engineering and research background, and we've found the approach outlined above very useful for testing out a ton of interesting ideas in the areas of DL and Systems.

Moving from flexible tooling like notebooks to more powerful interfaces like Experiments follows the natural flow of the research work we're doing, and allows us to leverage software engineering best practices to be even more productive. As our team grows and we collaborate and build even closer ties with other world-class researchers around the globe, we hope to further improve our open, collaborative, and curious culture.

Interested in joining Paperspace? You can check out our openings here!
