- γ ("gamma") - This is the discount rate. Each step bender takes in an episode lessens the rewards gained on the following steps, which encourages Bender to clear the board as fast as possible.
- Ɛ ("epsilon") - This is the likelihood of selecting a non-optimal action. The agent may take an action based on pure randomness as opposed to comparing outcomes. This is the tendency to explore new options, versus the tendency to exploit any previous training.
- η ("eta") - The learning rate. Rewards gained have the discount (gamma) rate applied, and then have the learning rate applied. This further reduces the impact of a single instance of a reward being gained.
-
Notifications
You must be signed in to change notification settings - Fork 0
NelsonG6/BenderWorld
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
Reinforcement Learning with Bender
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published