Bibliographic record · Lookup and access
Article

TNCOA: Efficient Exploration via Observation‐Action Constraint on Trajectory‐Based Intrinsic Reward

Jingxiang Ma et al. · Wiley · 2026

Open access available

Access to the resource

The resource is identified as open access; whether the link leads directly to the full text has not been automatically confirmed.

Abstract

Efficient exploration is critical in handling sparse rewards and partial observability in deep reinforcement learning. However, most existing intrinsic reward methods based on novelty rely on single‐step observations or Euclidean distances. These approaches struggle to capture trajectory‐level novelty and often perform poorly in partially observable settings. Moreover, they typically ignore the role of actions in driving observation changes, as not all actions lead to meaningful state transitions. To overcome these limitations, we propose a trajectory‐level novelty measure that estimates the novelty of a state by comparing current observations with past ones along the trajectory. To focus on meaningful exploration, we incorporate the mutual information between actions and trajectory novelty to filter out random fluctuations and retain only novelty caused by the agent's actions. Additionally, we introduce a first‐visit constraint on observation–action pairs, rewarding only interactions that result in state transitions to enhance exploration efficiency. We conducted experiments in the MiniGrid‐ObstructedMaze environment characterised by complex object interactions and sparse rewards. Results demonstrate that our method achieves state‐of‐the‐art performance in convergence speed and average returns. Furthermore, it shows strong generalisation on high‐dimensional Atari benchmarks and demonstrates robust performance in more challenging MiniGrid variants. Implementation code is available at: https://github.com/MurrayMa0816/TNCOA.
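
Illustrative note. The first‐visit constraint on observation–action pairs mentioned in the abstract can be pictured with a small sketch. This is not the authors' implementation (that is in the repository linked above); the class name `FirstVisitBonus`, the fixed bonus value, the use of hashable observations, and the "observation changed" test for a state transition are all assumptions made only for illustration.

```python
from typing import Hashable, Set, Tuple


class FirstVisitBonus:
    """Hypothetical first-visit bonus on (observation, action) pairs.

    Assumption: observations are hashable (e.g. tuples), and an action
    counts as causing a state transition if the observation changes.
    """

    def __init__(self, bonus: float = 1.0) -> None:
        self.bonus = bonus
        # Pairs that have already been rewarded once.
        self.seen: Set[Tuple[Hashable, int]] = set()

    def reset(self) -> None:
        """Forget all visits, e.g. at the start of a new episode."""
        self.seen.clear()

    def reward(self, obs: Hashable, action: int, next_obs: Hashable) -> float:
        # No bonus if the action did not change the observation.
        if obs == next_obs:
            return 0.0
        key = (obs, action)
        # No bonus if this pair was already rewarded.
        if key in self.seen:
            return 0.0
        self.seen.add(key)
        return self.bonus
```

Under these assumptions, calling `reward` twice with the same observation and action yields the bonus only the first time, and an action that leaves the observation unchanged never earns it.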

How to cite

APA 7

Ma, J., et al. (2026). TNCOA: Efficient Exploration via Observation‐Action Constraint on Trajectory‐Based Intrinsic Reward. https://doi.org/10.1049/cit2.70100

MLA

Ma, Jingxiang, et al. "TNCOA: Efficient Exploration via Observation‐Action Constraint on Trajectory‐Based Intrinsic Reward." 2026. https://doi.org/10.1049/cit2.70100.

Chicago

Ma, Jingxiang, et al. 2026. "TNCOA: Efficient Exploration via Observation‐Action Constraint on Trajectory‐Based Intrinsic Reward." https://doi.org/10.1049/cit2.70100.

Harvard

Ma, J. et al. 2026, TNCOA: Efficient Exploration via Observation‐Action Constraint on Trajectory‐Based Intrinsic Reward, Wiley, available at: https://doi.org/10.1049/cit2.70100 [Accessed 28 Jun. 2026].


Resource details

Title: TNCOA: Efficient Exploration via Observation‐Action Constraint on Trajectory‐Based Intrinsic Reward
Author / contributors: Jingxiang Ma et al.
Publisher: Wiley
Year of publication: 2026
ISSN: 2468-2322
Language: English
