This is a submission of the second training epoch of Flappy Bird using Deep Q-Learning, built with Python and TensorFlow.
The first epoch was trained with the following settings:
- replay buffer pre-filled with 1000 random iterations
- training episodes: 20000
- learning rate for the Adam optimizer: 1e-5
- sinusoidal epsilon function with starting epsilon: 1.0
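The exact form of the sinusoidal epsilon function isn't given above; a minimal sketch of one plausible schedule, assuming a linearly decaying envelope modulated by a cosine (the cycle count `n_cycles` and the floor `eps_min` are illustrative, not from the original):

```python
import math

def sinusoidal_epsilon(episode, total_episodes=20000, eps_start=1.0,
                       eps_min=0.01, n_cycles=4):
    """Sinusoidal epsilon schedule (illustrative sketch).

    A linearly decaying envelope is modulated by a cosine so that
    exploration periodically rebounds instead of decaying monotonically.
    n_cycles and eps_min are assumed values, not taken from the notes.
    """
    decay = 1.0 - episode / total_episodes   # linear envelope: 1 -> 0
    # cosine term oscillates between 0 and 1, n_cycles times over training
    wave = 0.5 * (1.0 + math.cos(2 * math.pi * n_cycles * episode / total_episodes))
    return max(eps_min, eps_start * decay * wave)
```

Starting at epsilon = 1.0, the schedule oscillates while trending downward, bottoming out at the floor by the final episode.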
Model-free Q-Learning in an MDP-style environment.
Utilized code from Berkeley's CS188 Reinforcement Learning project.
Introduced an epsilon decay to provide a transition from early exploration to late exploitation.
Q-Learning parameters:
- alpha = 0.1
- epsilon = 1.0
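The pieces described above can be sketched as a minimal tabular agent: an epsilon-greedy policy, the standard Q-learning backup with alpha = 0.1, and a multiplicative epsilon decay from 1.0 toward a floor. The class layout, the decay rate, and gamma are illustrative assumptions, not the CS188 code itself:

```python
import random
from collections import defaultdict

class QLearningAgent:
    """Minimal tabular Q-learning agent (sketch, not the CS188 source).

    gamma, epsilon_min, and epsilon_decay are assumed values.
    """
    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=1.0,
                 epsilon_min=0.05, epsilon_decay=0.995):
        self.q = defaultdict(float)   # (state, action) -> Q-value, default 0
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.epsilon = epsilon
        self.epsilon_min, self.epsilon_decay = epsilon_min, epsilon_decay

    def get_action(self, state):
        # epsilon-greedy: explore with probability epsilon, else exploit
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Q-learning backup: Q <- Q + alpha * (target - Q)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def decay_epsilon(self):
        # multiplicative decay: early exploration -> late exploitation
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)
```

Calling `decay_epsilon()` once per episode shrinks epsilon geometrically until it hits the floor, which is what produces the exploration-to-exploitation transition mentioned above.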
Tabular Q-Learning on CartPole-v1.
Utilized code from Berkeley's CS188 Q-Learning project.
Discretized the continuous state space with bins per dimension (cart_x, cart_velocity, pole_theta, pole_velocity): [5, 10, 20, 10].
Introduced an epsilon decay to provide a transition from early exploration to late exploitation.
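The discretization step above can be sketched as follows, using the [5, 10, 20, 10] bin counts from the notes. The clipping ranges are assumptions: Gym's CartPole reports unbounded velocities, so finite limits have to be chosen by hand, and the ones below are illustrative:

```python
import numpy as np

# Per-dimension bin counts from the notes:
# cart_x, cart_velocity, pole_theta, pole_velocity
BINS = np.array([5, 10, 20, 10])

# Assumed clipping ranges (velocities are unbounded in Gym, so these
# limits are hand-picked for illustration, not from the original code).
LOW = np.array([-2.4, -3.0, -0.21, -3.0])
HIGH = np.array([2.4, 3.0, 0.21, 3.0])

def discretize(obs):
    """Map a continuous CartPole observation to a tuple of bin indices."""
    obs = np.clip(obs, LOW, HIGH)
    ratios = (obs - LOW) / (HIGH - LOW)        # normalize each dim to [0, 1]
    idx = (ratios * BINS).astype(int)          # uniform binning
    idx = np.minimum(idx, BINS - 1)            # keep the upper edge in-range
    return tuple(int(i) for i in idx)
```

The resulting tuple is hashable, so it can be used directly as a key in a tabular Q-value dictionary.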