Abstract
This letter optimizes resource blocks allocated for channel estimation and data transmission in multiple-input and single-output (MISO) ultra-reliable low-latency communication (URLLC) systems. The goal is to improve the resource utilization efficiency subject to a reliability constraint. Considering that wireless channels are correlated in the temporal domain, and the channel estimation is not error-free, the problem is formulated as a partial observation Markov decision process (POMDP), which is a sequential decision-making problem with partial observations. To solve this problem, we develop a constrained deep reinforcement learning (DRL) algorithm, namely Cascaded-Action Twin Delayed Deep Deterministic policy (CA-TD3). We train the policy by using a primal domain method and compare it with a primal-dual method and an existing benchmark. Two channel models are considered for evaluation: the first-order autoregressive correlated channel model and the clustered delay line (CDL) channel model. The simulation results show that the proposed primal CA-TD3 method converges faster than the primal-dual method and achieves over 30% improvement in resource utilization efficiency compared to the benchmark.
| Original language | English |
|---|---|
| Pages (from-to) | 1796-1800 |
| Number of pages | 5 |
| Journal | IEEE Wireless Communications Letters |
| Volume | 12 |
| Issue number | 10 |
| DOIs | |
| State | Published - 1 Oct 2023 |
| Externally published | Yes |
Keywords
- Ultra-reliable and low-latency communications
- deep reinforcement learning
- resource allocation
Fingerprint
Dive into the research topics of 'Deep Reinforcement Learning for Improving Resource Utilization Efficiency of URLLC With Imperfect Channel State Information'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver