To gain an understanding of how more complex RL algorithms work with the FlexSim model, which includes state-machine logic, multi-agent passenger management, and a complex observation space
How
Read and understood how the learning algorithm interfaces with FlexSim via the state machine diagrams
Read about how the system uses action masking (a vector that flags which actions are currently valid) to constrain the decisions it outputs
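The idea of action masking can be sketched in a few lines. This is an illustrative toy, not the actual FlexSim/RL interface: invalid actions have their scores set to negative infinity so they can never be selected.

```python
import math

def mask_logits(logits, mask):
    """Replace the score of each invalid action with -inf so it cannot win."""
    return [score if valid else -math.inf for score, valid in zip(logits, mask)]

def masked_argmax(logits, mask):
    """Pick the highest-scoring action among the valid ones only."""
    masked = mask_logits(logits, mask)
    return max(range(len(masked)), key=lambda i: masked[i])

# 4 candidate actions; only actions 0 and 2 are currently valid.
# Action 1 has the highest raw score, but the mask rules it out.
print(masked_argmax([1.0, 5.0, 3.0, 2.0], [1, 0, 1, 0]))  # → 2
```

The same masking trick works during training (masked logits before sampling) and during inference (masked argmax), which is why the mask is part of what the environment sends to the agent.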
Ran the corresponding Python files to see it in action (the environment, the training, and inference with the trained model)
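The environment script follows the usual reset/step pattern that the training and inference scripts then drive. A minimal stand-in (all names, observations, and rewards here are invented for illustration, not FlexSim's actual interface) looks like this:

```python
import random

class ToyEnv:
    """Tiny stand-in for the FlexSim environment: exposes reset()/step()
    the way a typical RL environment does. Details are illustrative."""

    def __init__(self, horizon=10):
        self.horizon = horizon  # episode length in steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self._obs()

    def _obs(self):
        # Observation: current step plus a noisy feature.
        return [self.t, random.random()]

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # action 1 is the "good" action
        done = self.t >= self.horizon
        return self._obs(), reward, done

# Drive one episode with random actions, as an untrained agent would.
env = ToyEnv()
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, done = env.step(random.choice([0, 1]))
    total += reward
print("episode return:", total)
```

Training then repeatedly runs such episodes to improve the policy, and inference runs the same loop but queries the trained model for each action.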
Learned about the command tensorboard --logdir=. to open the TensorBoard interface and view model results without going through the command palette interface
Noticed the difference in the FlexSim model's performance when the trained RL model makes decisions rather than taking random actions
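The performance gap between a trained policy and random actions can be made concrete with a toy rollout. Here the "trained" policy is a hand-written stand-in that always picks the rewarding action (an assumption for illustration, not the real model):

```python
import random

def rollout(policy, steps=100, seed=0):
    """Run a toy episode: action 1 yields reward 1.0, action 0 yields 0.0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        state = rng.random()            # dummy observation
        action = policy(state, rng)
        total += 1.0 if action == 1 else 0.0
    return total

random_policy = lambda state, rng: rng.choice([0, 1])
trained_policy = lambda state, rng: 1   # stand-in for a policy that learned the good action

print("trained return:", rollout(trained_policy))  # → trained return: 100.0
print("random return: ", rollout(random_policy))   # roughly half of the maximum
```

The same comparison, run inside FlexSim, is what makes the improvement from training visible.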
So What
Completing and understanding this bridges the simpler concepts from HelloWorld to the more complex agent models to be studied in the future