How to use Arena

Arena enables you to build and deploy agents faster than ever before, across industries including finance, logistics, defence and robotics.

For more support, please book a demo.

For the best user experience, we recommend using Google Chrome.

How do I start using Arena?
Getting started with Arena
Creating your account

1. Visit the account registration page
2. Enter your details and use a strong password
3. Click Register to create your account

Note: Once your account is created, you'll be automatically enrolled in the Free plan (no credit card required) and directed to the account setup page.

Logging back in

Access your Arena account at any time with these simple steps:

1. Visit the login page
2. Enter your credentials and click the Sign In button

Forgotten your password?

1. Visit the Forgot Password page
2. Enter your registered email address
3. Check your email for a password reset link (valid for 24 hours)
4. Click the link and create a new password
5. Log in with your new credentials

Plans & pricing
Note: All new accounts automatically start with the Free plan. No credit card required to get started!

When you first sign up for Arena, you'll be automatically enrolled in our Free plan. This gives you immediate access to the platform so you can start training agents right away.

Upgrading your plan

Upgrading your Arena plan is simple and can be done at any time. Your upgrade will take effect immediately.

1. Navigate to your Account Settings by clicking on your profile icon in the top-right corner
2. Select Billing and members from the dropdown menu
3. Click on the Change plan button
4. Choose your desired plan from the available options
5. Enter your payment information (credit card or PayPal)
6. Review the charges and confirm your upgrade

Important: When upgrading, you'll be charged a prorated amount for the remainder of the current billing cycle. Your next full charge will occur on your regular billing date.

Downgrading your plan

You can downgrade your plan at any time. The downgrade will take effect at the end of your current billing cycle.

- No partial refunds are provided for downgrades
- You'll retain access to your current plan features until the end of the billing period
- Ensure your usage will fit within the lower plan's limits before downgrading

Topping up training credits

If you are running low on training credits, you can simply top up without upgrading your plan.

1. Navigate to your Account Settings by clicking on your profile icon in the top-right corner
2. Select Usage from the dropdown menu
3. Navigate to the Need more training credits? section and click the Buy now button
4. Select the package that suits your needs or add a custom amount
5. Click Purchase and follow the checkout steps

Usage & limits
Checking your usage

Monitor your resource usage and limits to ensure you're staying within your plan's boundaries. You'll receive notifications before reaching limits. Once a limit is exceeded, new experiments will be saved but not started until the next billing cycle, or until you upgrade or top up your credits.

1. Navigate to your Account Settings by clicking on your profile icon in the top-right corner
2. Select Usage from the dropdown menu
3. View your plan's statistics including:
- Training credits available in your plan and how many have been used
- Storage utilised and available
- Deployment slots used and available
- Team member slots available and utilised

Tip: Usage resets on the first day of each billing cycle. Unused resources do not roll over to the next month.

Viewing your daily & monthly usage

Track your credit consumption and balance throughout each billing period with detailed usage statements.

1. Navigate to your Account Settings by clicking on your profile icon in the top-right corner
2. Select Usage from the dropdown menu
3. Select the billing period you want to review using the month dropdown (e.g., "August 2025")
4. View your monthly summary including:
- Balance brought forward - Credits carried over from the previous month
- Purchased - Credits added through top-ups and plan renewal during the month
- Consumed - Total credits used for experiments and compute resources
- Closing balance - Remaining credits at the end of the period

Daily transaction breakdown

The statement table shows all daily transactions for the selected month:

- Date - When the transaction occurred
- Top Up - Credits purchased on that day
- Credits Used - Credits consumed by your experiments
- Balance - Running balance after each transaction

Downloading your statement

Click the Download button next to any daily transaction to export a detailed CSV breakdown of the selected month’s activity. This includes:
- Individual experiment costs
- Compute resource usage per experiment
- Experiment IDs for reference

Note: Daily statements provide granular tracking of your usage, making it easy to allocate costs across projects or teams and identify usage patterns.
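
If you want to analyse a downloaded statement programmatically, a minimal sketch is shown below. It assumes pandas and hypothetical column names such as experiment_id and credits_used; check the headers in your own CSV export, as they may differ.

```python
import pandas as pd

# Hypothetical file and column names; use the headers from your own export.
statement = pd.read_csv("arena_statement_2025-08.csv")

# Total credits consumed per experiment, largest first, to help allocate
# costs across projects or teams.
costs_per_experiment = (
    statement.groupby("experiment_id")["credits_used"]
    .sum()
    .sort_values(ascending=False)
)
print(costs_per_experiment)
```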

Billing & invoices
Payment methods

Arena accepts multiple payment methods for your convenience:
- Credit cards (Visa, MasterCard, American Express)
- Debit cards with online payment capability
- Wire transfer (Enterprise plans only)
- Purchase orders (Enterprise plans only)

Updating billing information

1. Go to Billing & members
2. Find the Billed to section and select Edit information
3. Here you can edit the contact name and billing email address

Updating payment method

1. Go to Billing & members
2. Find the Payment method section and select the Edit method button
3. Here, you can delete the existing payment method or add a new one

Security Note: All payment information is encrypted and processed through PCI-compliant payment processors. Arena does not store credit card numbers.

Accessing your invoices

1. Navigate to your Account Settings by clicking on your profile icon in the top-right corner
2. Select Billing & members
3. In the Invoice section, find the invoice you need and click View Invoice

Note: All invoices are automatically generated and available for download in your account.

Team member management
Note: For Professional, Business and Enterprise plans, you can add members to your workspace.

Adding team members

1. Navigate to your Account Settings by clicking on your profile icon in the top-right corner
2. Click the Manage members button
3. Enter the email address and click Invite to add the user to your workspace

Removing team members

1. Navigate to your Account Settings by clicking on your profile icon in the top-right corner
2. Click the Manage members button
3. The members table shows a list of team members and their roles
4. Use the Bin icon to remove members

How can I upload and validate a custom environment?
Arena can be used to train reinforcement learning agents on your custom environments.

1. Navigate to the Custom Environments section on the sidebar.
2. Select the New Environment button on the top right.
3. A pop up window will appear to start setting up your custom environment.

4. Enter the Name and Description of your environment.
5. Select whether it is a Single-agent or Multi-agent environment.

6. Upload your environment file(s) using the File or Folder buttons.

Note: If your environment is more complex than a single file, you can upload a folder containing nested folders and scripts. At the highest level within the directory, a requirements.txt file should be included to specify dependencies, and a config.yaml file should also be included within a configs directory to outline any arguments that the environment will require.

7. Click Next to either upload your Requirements file or type out your dependencies in the box on the right.

8. Move on to upload a config file or provide any arguments required by the constructor of your environment class.

9. After the Configuration step, move on to select the path to your environment class.

Note: Arena will automatically detect these and display a list of possible options. Select the entrypoint you want to use from the list.

Tip: You also have the option to rename the first version of the environment. "v1" is the default version name; however, you can change this in the entry field at the bottom of the pop up window.


10. View or make changes to your file(s) by selecting the files from the directory.
11. Once you have made changes, click the Validate button to run the validation checks against the entire environment. If you would like to save your changes, simply click Save.

Note: If you want to validate the environment but do not want to run test episodes to see if you are able to train on the environment, simply untick the Run random episodes in validation tickbox on the top right and then click Validate.

Note: If you make changes to any of the files and would like to keep the original version, simply click the Save button and enter a new version name. This will create a new version of your environment.

12. View the various versions and the results of the validation checks by selecting the custom environment name in the breadcrumb at the top of the page. This will open up the full list of versions in an accordion.

Note: Once the environment has been successfully validated, you can select it as an environment in the Environment section of the training workflow (follow the "How do I train an agent?" FAQ which walks you through the training workflow). When you navigate to this section you can locate your custom environment by searching for it or looking through the Environments table.
What format should my custom environment be in?
Custom environments must be implemented in Python and must follow the Gymnasium API. Uploaded environments are validated against this API, and you will not be able to commence training until all of these checks have completed successfully.

If uploaded environments use a render method, the environment will only be rendered during validation if an rgb_array render mode is provided.

Custom environment packages should be uploaded as a folder, containing all necessary files. At the highest level within the directory, a requirements.txt file should be included to specify dependencies, and a config.yaml file within a configs directory to outline any arguments that the environment will require. You can also upload a single script if your environment is contained entirely within this script.
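
For reference, the sketch below shows the rough shape of a single-file environment that satisfies the Gymnasium API, including an rgb_array render mode so it can be rendered during validation. The class name, spaces and reward logic are placeholders for your own environment, not a template supplied by Arena.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class MyCustomEnv(gym.Env):
    """Minimal single-agent environment following the Gymnasium API (illustrative)."""

    metadata = {"render_modes": ["rgb_array"], "render_fps": 30}

    def __init__(self, render_mode=None, max_steps=200):
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.render_mode = render_mode
        self.max_steps = max_steps
        self._steps = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._steps = 0
        observation = self.observation_space.sample()
        return observation, {}

    def step(self, action):
        self._steps += 1
        observation = self.observation_space.sample()
        reward = 1.0 if action == 1 else 0.0  # placeholder reward logic
        terminated = False
        truncated = self._steps >= self.max_steps
        return observation, reward, terminated, truncated, {}

    def render(self):
        if self.render_mode == "rgb_array":
            # Return an RGB frame as a (height, width, 3) uint8 array.
            return np.zeros((64, 64, 3), dtype=np.uint8)
```

If you upload a folder rather than a single script, remember to include a requirements.txt file at the top level and a config.yaml file inside a configs directory, as described above.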
How do I train an agent?
Leverage our novel evolutionary hyperparameter optimisation to train the most performant reinforcement learning agents in a fraction of the time.

Start by understanding the Group, Project, Experiment and Agent structure that makes it easier to use Arena.

Group: A Group is the top-level organization unit. It helps you manage and categorize multiple projects, allowing teams to collaborate and share resources.
Project: Within a Group, you create Projects. Each Project is a workspace where you can manage related experiments and keep all the work for a specific goal or task together.
Experiment: An Experiment is part of a Project and is where you define and run your tests. It's the environment where you configure the settings, parameters, and data to train an agent.
Agent: The Agent is what you train in an Experiment. It's the entity that learns and improves based on the data and parameters you've set in the Experiment.

1. To begin, create a group by selecting the Experiment Groups tab on the sidebar.
2. Select the New Group button.


3. A pop up window will appear - enter a Name for your group and give it a Description before saving it.


4. Select the View button on the new group tile - this will take you to a new page allowing you to create projects for your experiments.


5. Select the New Project or the Create Project button if you are creating the first project within the group.
6. A pop up will appear allowing you to enter a project Name and Description.


7. Once a project has been created, you can now host multiple experiments within this project. To create an experiment in your project, select the New Experiment button on the top right.
8. A pop up will appear allowing you to enter an experiment Name and Description.

9. Once an experiment is created, you will be taken into the experiment creation flow. To train an agent you will need to first select the resources you wish to use. Each resource shows the specs as well as the cost per hour.

Note: You will be able to update your resources later in the experiment flow if you wish to change them by selecting the Edit resources button on the bottom right of the window.

10. Once you have selected your resources, click the Next button on the top right of the screen to move on to choosing an environment. This can either be one of the many popular environments available on the platform or a custom environment which has been previously uploaded.

11. After selecting the environment, you will be shown a summary of the environment, the validation checks, and a graph showing the reward distribution of a random agent for your reference, as well as a rendered GIF of your environment if it has a render function and an rgb_array render mode.

12. Click Next to be taken to the Agent section where you will be able to select an algorithm that is compatible with your chosen environment. Here you will also be able to edit the neural network settings and algorithm parameters once you have selected the algorithm you wish to use. The Arena platform also features novel implementations such as the ability to use the SimBa architecture introduced by Sony. This can be toggled on at the top of the Neural network section once an algorithm has been selected.

Note: DQN, DQN Rainbow and PPO are algorithms used to train discrete-action type agents. DDPG, TD3 and PPO are algorithms used to train continuous-action type agents. If you need to learn about any of the given parameters or specifications, you can hover over the information icons provided. MLP (linear layers) or CNN (convolutional layers stacked with linear layers) architectures are loaded based on whether the environment has vector or image observations respectively; a minimal sketch after this walkthrough illustrates the idea. Those architectures can be modified by adding or deleting layers directly on the page.

Tip: The AgileRL framework documentation offers a great overview of SOTA reinforcement learning algorithms and techniques along with references.

13. After completing the Agent section, click the Next button to proceed to the Training setup. All experiment configurations are pre-populated with template values and can be updated. The various training sections include settings for environment vectorization (each agent in the population trains on its own set of vectorized environments), as well as the replay buffer and its parameters for off-policy algorithms.

14. Thereafter, you can edit the hyperparameter optimisation settings in the next section, HPO. By default, HPO is turned on and can be toggled off if needed.

Note: AgileRL Arena uses evolutionary population training with agent mutations. Tournament selection occurs at the end of each epoch, whereby the fittest agents are preserved, cloned and mutated. Tournament selection uses a random subset of the population to select the next fittest agent, and elitism determines whether the overall fittest agent is always kept. The selected agents are then mutated according to random probabilities that you can set manually. The mutations are applied to evolve the neural network architectures and algorithm learning hyperparameters.

15. When you are done with the HPO section, click the Next button, which will take you to the Summary page. Here you can see a summary of the experiment settings.

16. Once you have verified the settings, you can either start training immediately or save the experiment for later using the Train or Save buttons on the top right of the page. This will take you to the experiment list, which shows all your experiments and their statuses.
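
As mentioned in the note in the Agent section above, the network type follows the observation type: an MLP (linear layers) for vector observations and a CNN (convolutional layers stacked with linear layers) for image observations. The PyTorch snippet below is a minimal sketch of that idea, with hypothetical layer sizes; it is not the code Arena uses internally.

```python
import torch.nn as nn


def build_network(observation_shape, num_actions):
    """Choose an MLP or CNN based on the observation shape (illustrative only)."""
    if len(observation_shape) == 1:
        # Vector observations -> MLP made of linear layers.
        return nn.Sequential(
            nn.Linear(observation_shape[0], 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )
    # Image observations (channels, height, width) -> convolutional layers
    # stacked with linear layers.
    channels = observation_shape[0]
    return nn.Sequential(
        nn.Conv2d(channels, 32, kernel_size=8, stride=4),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2),
        nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(256),
        nn.ReLU(),
        nn.Linear(256, num_actions),
    )
```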
How can I monitor an experiment?

Visualising experiments and experiment controls
Running and Completed experiments can be viewed on the Results tab. You can apply further controls to your experiments: Stop training and Edit mutation parameters during an experiment, and View logs both during and after an experiment, the latter of which may be helpful in making decisions for a live experiment.

Experiments currently in training will be listed as Running, and you will be able to visualise their performance results in real time. You can show or hide the data for each experiment using the eye-shaped icon (shown as whole or crossed, respectively). If you think an experiment has converged early, or that mutation probabilities and ranges need to change, you can click the relevant icon at the end of each row in the Running table. This gives you the ability to Stop or Edit the experiment.

Updating Hyperparameter Optimisation
Over the course of an experiment, you may want to adjust the mutations to boost the learning of the agents. You can do this in two ways: adjusting the probability of specific mutations, and adjusting the range of the target changes, such as the learning step. To do this, select the Edit mutation parameters button in the experiment menu. A popup will appear which will allow you to update the mutation parameters in real time.

Note: You can also disable the mutation process for the agents by setting probabilities to 0 for all mutation types except None.

Looking at the logs
To match a particular mutation and agent selection to a performance change in one of the visualisation plots, you can filter the logs by time range using the Run Query button on the top right along with the start/end date and time selection tool. You can then inspect the fitness evaluations for each agent and judge the quality of the current hyperparameter selection. You can also search directly for a particular keyword or substring in the logs.

Resuming a stopped experiment
An experiment which has run its course (succeeded) or has been halted can be resumed from the Experiments page. It will run again for the same number of steps it was originally configured with at creation. In other words, an experiment of 1 million steps will always be scheduled to run for that many steps each time it is run.

How can I analyse the performance of my agent?
Once training has been started, population performance can be tracked via the Results page. Performance metrics of an experiment will be updated periodically, until the experiment completes (and the visualisations reach their final state).
How to visualise results of an experiment
One or more experiments may be selected for visualisation. When more than one experiment is selected for visualisation, it is possible to directly compare the performance of distinct algorithms (or environments) simultaneously, on the same plots. A visualisation will load the data from an experiment run. The eye-shaped icon is used to show or hide this data from the plots of an experiment.

Users have the option to view default plots or create custom graphs. The accordion on the right of the graph page is split into three sections: the Custom Charts section, where users can create their own graphs; the Default Training Charts section, which shows preconfigured graphs (explained below); and the Hyperparameter and Network Optimization Charts section, which displays default charts showing how the hyperparameters evolve over time.

Note: Plots only appear once an experiment has produced data to show. Experiments which have just been created will not have data immediately.

Visualising a single run
Visualising your experiment as a single run yields insight into the evolutionary HPO. To do so, select or expose only one of the experiments. This will reveal metrics for each of the agents in the population: the score (taken from the active training or learning episodes), the fitness (evaluated separately, without training), the learning algorithm losses, the episode time (calculated from the evaluations) and the number of steps through time (to compare the speed of different agents).

In any of the plots, clicking on one of the legends allows you to hide or display the entity in the plot (in the case of this single experiment inspection: each agent in the population).

Comparing two or more runs
When comparing two or more runs, the visualisations switch from the agent-level to population-level. Now, for each experiment, you’ll be able to see the averaged score across agents within the population, the best fitness, the average learning algorithm losses, the average episode time, and the number of steps taken.

Experiment summaries and comparison
There will be a table beneath the performance plots, which can be expanded for each domain (e.g. Environment, Algorithm…), to display specification summaries of the selected visualisation experiments side-by-side.

A concise summary can be extracted by filtering the display to show only the differences across experiments, which can be useful for determining the best training configuration.
How can I deploy and use my trained agent?
Deployment is done from the Experiments page by selecting the Deploy button on the row of the experiment you wish to deploy. A pop up will then appear from which you can pick the checkpoint you wish to deploy. By default, the latest checkpoint is selected.

After you have selected Confirm, the agent will be added to the Deployments page permanently. There, it can be activated and de-activated via the ‘Connect’ and ‘Disconnect’ icons on its corresponding row.

The Deployments page comes with a default API script that you should use for querying deployed agents. A connection is established to a deployed agent via its URL. An API key token is also provided for each experiment; this is passed to the API for authentication (with OAuth2). A query must supply a batch of one or more environment states for which the agent returns actions (with the general objective of stepping through the corresponding environments) while the agent remains deployed natively in AgileRL Arena. The expected HTTP response is 200 (success), returned along with the action value(s) and an acknowledgement of the input batch size. Otherwise, an error code is returned along with a JSON response where available.
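
As a rough illustration, a deployed agent could be queried along the lines of the sketch below. The deployment URL, API key and payload structure (a "states" key holding a batch of observations) are hypothetical placeholders; use the default API script and the values shown on your Deployments page for the exact format.

```python
import requests

# Hypothetical placeholders: copy the real URL and token from your Deployments page.
DEPLOYMENT_URL = "https://example-arena-deployment/predict"
API_KEY = "YOUR_API_KEY_TOKEN"

# A batch containing one environment state (several states can be sent at once).
payload = {"states": [[0.1, -0.3, 0.0, 0.7]]}

response = requests.post(
    DEPLOYMENT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)

if response.status_code == 200:
    # Expected: the action value(s) plus an acknowledgement of the input batch size.
    print(response.json())
else:
    # Otherwise an error code is returned, with a JSON body where available.
    print(response.status_code, response.text)
```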
How does evolutionary hyperparameter optimisation work?
Hyperparameter optimisation (HPO) for reinforcement learning (RL) has traditionally been particularly difficult compared to other types of machine learning. This is for several reasons, including the relative sample inefficiency of RL and its sensitivity to hyperparameters.

AgileRL significantly improves HPO for reinforcement learning through the use of evolutionary algorithms to reduce overall training time whilst making the process more robust. Evolutionary algorithms have been shown to allow faster, automatic convergence to optimal hyperparameters than other HPO methods by taking advantage of shared memory between a population of agents acting in identical environments.

At regular intervals, after learning from shared experiences, a population of agents can be evaluated in an environment. Through tournament selection, the best agents are selected to survive until the next generation, and their offspring are mutated to further explore the hyperparameter space. Eventually, the optimal hyperparameters for learning in a given environment can be reached in significantly fewer steps than are required by other HPO methods.

Our evolutionary approach achieves HPO in a single training run, whereas Bayesian methods require multiple sequential training runs to achieve similar, and often inferior, results.
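
To make the mechanism concrete, the sketch below shows one simplified generation of this process: evaluate the population, optionally keep the overall fittest agent (elitism), fill the rest of the next generation with tournament winners, and mutate their hyperparameters. It is an illustrative outline under assumed names (for example, agents exposing a learning_rate attribute), not AgileRL's actual implementation.

```python
import copy
import random


def evolve_population(population, evaluate, tournament_size=3, elitism=True, lr_mutation_prob=0.2):
    """One simplified generation of tournament selection and mutation (illustrative only)."""
    # Evaluate the fitness of every agent in identical environments.
    fitnesses = [evaluate(agent) for agent in population]

    next_population = []
    if elitism:
        # Elitism: keep the overall fittest agent unchanged.
        best_index = max(range(len(population)), key=fitnesses.__getitem__)
        next_population.append(copy.deepcopy(population[best_index]))

    while len(next_population) < len(population):
        # Tournament selection: the fittest agent from a random subset survives.
        contenders = random.sample(range(len(population)), tournament_size)
        winner_index = max(contenders, key=fitnesses.__getitem__)
        child = copy.deepcopy(population[winner_index])

        # Mutate learning hyperparameters with some probability
        # (architecture mutations, e.g. adding or removing layers, would go here too).
        if random.random() < lr_mutation_prob:
            child.learning_rate *= random.choice([0.5, 2.0])

        next_population.append(child)

    return next_population
```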

Interactive library

Check out the interactive library to help you get the most out of Arena.