In goal-based reinforcement learning tasks, the Euclidean distance is often used as the heuristic function for policy selection, but it performs poorly on tasks whose state space is not continuous in Euclidean space. To address this problem, Laplacian eigenmaps, a manifold learning method with relatively low computational complexity, is introduced, and a heuristic policy selection method based on spectral graph theory is proposed. The proposed method is suited to tasks in which the state space is continuous on a manifold whose intrinsic dimension is easy to estimate and in which the connections between adjacent states form an undirected graph. Simulation results in a grid world verify the effectiveness of the proposed method.
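The idea can be illustrated with a minimal sketch. It is not the authors' implementation: the grid layout, the function names (`adjacency`, `laplacian_eigenmap`, `spectral_heuristic`), the unit edge weights, and the choice of a one-dimensional embedding are all assumptions made for illustration. The sketch builds the undirected state graph of a small grid world with a wall, embeds the states with Laplacian eigenmaps, and uses the distance to the goal in the embedded space as the heuristic.

```python
import numpy as np

# Hypothetical grid world: '#' is a wall, '.' is a free cell.
# The free cells form a U-shaped corridor, so two cells can be close in
# Euclidean terms yet far apart along the reachable path.
GRID = [
    ".....",
    "####.",
    ".....",
]

def adjacency(grid):
    """Build the undirected 4-neighbour graph over free cells (unit weights)."""
    cells = [(r, c) for r, row in enumerate(grid)
             for c, ch in enumerate(row) if ch != "#"]
    index = {cell: i for i, cell in enumerate(cells)}
    W = np.zeros((len(cells), len(cells)))
    for (r, c), i in index.items():
        for dr, dc in ((1, 0), (0, 1)):  # down and right cover every edge once
            j = index.get((r + dr, c + dc))
            if j is not None:
                W[i, j] = W[j, i] = 1.0
    return index, W

def laplacian_eigenmap(W, dim):
    """Embed the graph by solving L y = lambda D y (assumes a connected graph)."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.diag(deg) - W
    # Symmetric normalisation turns the generalised problem into a standard one.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    # Skip the trivial eigenvector (eigenvalue 0) and map back to y = D^{-1/2} u.
    return d_inv_sqrt[:, None] * eigvecs[:, 1:dim + 1]

def spectral_heuristic(Y, index, state, goal):
    """Distance between a state and the goal in the embedded space."""
    return float(np.linalg.norm(Y[index[state]] - Y[index[goal]]))

def euclidean(a, b):
    """Plain Euclidean distance between two grid coordinates."""
    return float(np.hypot(a[0] - b[0], a[1] - b[1]))

index, W = adjacency(GRID)
Y = laplacian_eigenmap(W, dim=1)  # the corridor is intrinsically one-dimensional

goal = (2, 0)
far = (0, 0)   # directly above the goal, but 10 steps away around the wall
near = (2, 2)  # 2 steps from the goal along the bottom row

print("Euclidean:", euclidean(far, goal), euclidean(near, goal))
print("Spectral: ", spectral_heuristic(Y, index, far, goal),
      spectral_heuristic(Y, index, near, goal))
```

In this layout both `far` and `near` are at the same Euclidean distance from the goal, so a Euclidean heuristic cannot tell them apart, whereas the embedded distance ranks `far` as much farther away. A dense eigendecomposition is used here only because the graph is tiny; a larger state space would likely call for a sparse eigensolver.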