Bibliographic record · Lookup and access
Article

Temporal Dependency‐Aware Trajectory‐Level Behavioural Metric for Exploration in Reinforcement Learning

Anjie Zhu et al. · Wiley · 2026

Open access available

Resource identified as open access; whether the link leads directly to the full text has not been automatically confirmed.

Abstract

Intrinsic motivation serves as the predominant paradigm of exploration in reinforcement learning. In pursuit of an informative and robust state representation, the behavioural metric groups behaviourally equivalent states together, which share the same single‐step reward and transition distribution. However, due to the presence of uninformative rewards and the dynamic nature of procedurally generated environments, these behavioural metric‐based approaches could limit the effectiveness of the learnt state representations, potentially leading to a representation collapse and an ineffective exploration. Therefore, a more comprehensive and generalisable behavioural metric is needed to overcome the above issues. In this work, we approach the exploration problem from a novel perspective, extending beyond the conventional single‐step assessments to encompass a long‐term consideration of the whole trajectory. Specifically, we propose a novel trajectory‐level behavioural metric (TBM) that exploits temporal dependencies of the trajectory and captures the underlying sequential information of behaviour patterns. To achieve an effective trajectory representation for exploration, we develop a pivotal state identifier (PSI) and a trajectory return estimator (TRE) to distinguish the diverse contributions of individual states in the trajectory. Moreover, an auxiliary representation regulariser is developed to promote the diversity and informativeness of the trajectory representation, mitigating the risk of representation mode collapse. Extensive experiments and empirical analysis conducted on procedurally generated environments showcase the superior performance of our proposed framework.

How to cite


APA 7

Zhu, A., et al. (2026). Temporal Dependency‐Aware Trajectory‐Level Behavioural Metric for Exploration in Reinforcement Learning. https://doi.org/10.1049/cit2.70109

MLA

Zhu, Anjie, et al. "Temporal Dependency‐Aware Trajectory‐Level Behavioural Metric for Exploration in Reinforcement Learning." 2026. https://doi.org/10.1049/cit2.70109.

Chicago

Zhu, Anjie, et al. 2026. "Temporal Dependency‐Aware Trajectory‐Level Behavioural Metric for Exploration in Reinforcement Learning." https://doi.org/10.1049/cit2.70109.

Harvard

Zhu, A. et al. 2026, Temporal Dependency‐Aware Trajectory‐Level Behavioural Metric for Exploration in Reinforcement Learning, Wiley, available at: https://doi.org/10.1049/cit2.70109 [Accessed 29 Jun. 2026].


Resource details


Title: Temporal Dependency‐Aware Trajectory‐Level Behavioural Metric for Exploration in Reinforcement Learning
Author / contributors: Anjie Zhu et al.
Publisher: Wiley
Publication year: 2026
ISSN: 2468-2322
Language: eng
