Start by creating a group, then set up a project within that group. Inside the project, you can create an experiment, where you can train an agent. This structure helps you keep your work organised.
Note:
Group: A Group is the top-level organization unit. It helps you manage and categorize multiple projects, allowing teams to collaborate and share resources.
Project: Within a Group, you create Projects. Each Project is a workspace where you can manage related experiments and keep all the work for a specific goal or task together.
Experiment: An Experiment is part of a Project and is where you define and run your tests. It's the environment where you configure the settings, parameters, and data to train an agent.
Agent: The Agent is what you train in an Experiment. It's the entity that learns and improves based on the data and parameters you've set in the Experiment.
To start, you will need to create a group by selecting the Groups tab on the sidebar and then selecting the New Group button.
Once you have created a group, select the View button on the new tile, which will take you to a new page where you can create projects for your experiments.
Select the New Project button, or the Create Project button if you are creating the first project within the group. A pop-up will appear allowing you to enter a project name and description.
Once a project has been created, you can host multiple experiments within it. To create an experiment in your project, select the New Experiment button on the project page. A pop-up will appear allowing you to enter an experiment name and description.
Once an experiment is created, you will be taken into the experiment creation flow. To train an agent, you will first have to choose an environment. This can be either one of the many popular environments available on the platform or a custom environment that has previously been uploaded.
Once you have selected an environment, the validation results will be displayed. The Environment Summary table contains a summary of the operational parameters of the selected environment, as well as its known category (if not custom) and saved version. The Environment Test Results from the environment validation are displayed in the top-right quadrant of the page. Finally, a random-agent rollout of the environment is displayed, with the scores plotted over 100 episodes and the environment rendering, if available.
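To make it concrete what the random-agent rollout measures, here is a minimal sketch using the Gymnasium API; the environment name is a placeholder for whichever environment you selected, and this is not the platform's internal validation code.

```python
import gymnasium as gym
import numpy as np

# Illustrative only: roll out a random policy and record the score per episode,
# mirroring the random-agent rollout plot shown on the validation page.
env = gym.make("CartPole-v1")  # stand-in for the selected environment
scores = []
for episode in range(100):
    obs, info = env.reset()
    done, episode_reward = False, 0.0
    while not done:
        action = env.action_space.sample()  # the "random agent"
        obs, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
        done = terminated or truncated
    scores.append(episode_reward)

print(f"Mean random-agent score over {len(scores)} episodes: {np.mean(scores):.2f}")
```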
Next, you will be prompted to select an algorithm that is compatible with your chosen environment. DQN, DQN Rainbow and PPO are used to train agents with discrete action spaces, while DDPG, TD3 and PPO are used to train agents with continuous action spaces. If you need to learn more about any of the parameters or specifications, hover over the information icons provided. The AgileRL framework documentation also offers a great overview of SOTA reinforcement learning algorithms and techniques, along with references.
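As a rough illustration of how the action-space type constrains the algorithm choice, the sketch below simply restates the compatibility rules above using Gymnasium space types; it is not the platform's internal selection logic.

```python
import gymnasium as gym

def compatible_algorithms(env: gym.Env) -> list[str]:
    """Return the algorithms listed above that match the env's action space."""
    if isinstance(env.action_space, gym.spaces.Discrete):
        return ["DQN", "DQN Rainbow", "PPO"]   # discrete-action algorithms
    if isinstance(env.action_space, gym.spaces.Box):
        return ["DDPG", "TD3", "PPO"]          # continuous-action algorithms
    raise ValueError(f"Unsupported action space: {env.action_space}")

print(compatible_algorithms(gym.make("CartPole-v1")))  # ['DQN', 'DQN Rainbow', 'PPO']
print(compatible_algorithms(gym.make("Pendulum-v1")))  # ['DDPG', 'TD3', 'PPO']
```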
An MLP (linear layers) or a CNN (convolutional layers followed by linear layers) is loaded depending on whether the environment has vector or image observations, respectively. These architectures can be modified by adding or deleting layers directly on the page.
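The two architecture families correspond roughly to the following PyTorch sketches; the layer counts and sizes are placeholders, and the actual defaults (and the editable fields) live in the UI.

```python
import torch.nn as nn

# Vector observations -> MLP: a stack of linear layers.
mlp = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),    # 8 = example observation size
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),               # 4 = example number of actions
)

# Image observations -> CNN: convolutional layers followed by linear layers.
cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(256), nn.ReLU(),  # LazyLinear infers the flattened input size
    nn.Linear(256, 4),
)
```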
After completing the Algorithm section, click the Next button to proceed to the Training setup. All experiment configurations are pre-populated with template values. The training sections include environment vectorization (each agent in the population trains on its own set of vectorized environments) and, for off-policy algorithms, the replay buffer and its parameters.
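A minimal sketch of what those two settings control, using Gymnasium vectorized environments and a toy replay buffer; the environment name, number of environments and buffer capacity are illustrative values, not the platform's configuration keys.

```python
import random
from collections import deque
import gymnasium as gym

# Environment vectorization: each agent steps several environment copies in parallel.
num_envs = 8                                   # illustrative vectorization setting
envs = gym.make_vec("CartPole-v1", num_envs=num_envs)

# Replay buffer (off-policy algorithms only): a bounded store of transitions
# that the learner samples from.
buffer = deque(maxlen=100_000)                 # illustrative buffer capacity

obs, info = envs.reset()
actions = envs.action_space.sample()           # one action per vectorized env
next_obs, rewards, terminations, truncations, info = envs.step(actions)
for i in range(num_envs):
    buffer.append((obs[i], actions[i], rewards[i], next_obs[i], terminations[i]))

batch = random.sample(buffer, k=min(len(buffer), 4))   # sample a training batch
```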
AgileRL Arena uses evolutionary population training with agent mutations. At the end of each epoch, tournament selection determines which agents are preserved, cloned and mutated: the fittest agent is selected from a random subset of the population, with optional elitism (always keeping the overall fittest agent). The selected agents are then mutated according to probabilities that you can set manually. Mutations evolve both the neural network architectures and the algorithms' learning hyperparameters.
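The sketch below illustrates the general idea of tournament selection with elitism followed by probabilistic mutation; it is a simplified stand-in with toy dictionary "agents" and placeholder mutation operations, not AgileRL's actual implementation.

```python
import copy
import random

def tournament_selection(population, fitnesses, tournament_size=3, elitism=True):
    """Build the next generation: optionally keep the overall fittest agent (elitism),
    then repeatedly pick the fittest of a random subset and clone it."""
    ranked = sorted(zip(population, fitnesses), key=lambda pair: pair[1], reverse=True)
    next_gen = [copy.deepcopy(ranked[0][0])] if elitism else []
    while len(next_gen) < len(population):
        contenders = random.sample(ranked, k=tournament_size)
        winner = max(contenders, key=lambda pair: pair[1])[0]
        next_gen.append(copy.deepcopy(winner))
    return next_gen

def mutate(agent, arch_prob=0.2, hp_prob=0.2):
    """Apply mutations with user-set probabilities (placeholder operations)."""
    if random.random() < arch_prob:
        agent["hidden_sizes"].append(64)            # e.g. grow the network architecture
    if random.random() < hp_prob:
        agent["lr"] *= random.choice([0.5, 2.0])    # e.g. perturb a learning hyperparameter
    return agent

# Illustrative usage with toy agents represented as dicts:
population = [{"hidden_sizes": [64], "lr": 1e-3} for _ in range(4)]
fitnesses = [10.0, 42.0, 7.0, 25.0]
next_gen = tournament_selection(population, fitnesses)
population = [next_gen[0]] + [mutate(a) for a in next_gen[1:]]  # elite kept unmutated
```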
When you are done with the Training section, select the Next button to go to the Resources page, which displays a summary of the experiment settings. On this page, you can select the size of the experiment's compute node.
Note: For the Beta, all compute nodes are set to Medium and cannot be changed.
You can either start training immediately by clicking the Train button in the top-right corner shown in the figure above, or simply save the experiment and submit a training job later. Once you click Train or Save, you will be redirected to the Experiments page, which will show the new experiment as well as previous experiments.