>>995
Amortizing intractable inference in large language models
https://arxiv.org/abs/2310.04363

A standard method for approximate sampling from intractable posterior distributions is Markov chain
Monte Carlo (MCMC), but it is difficult to craft proposal distributions that mix quickly between
modes for language data (Miao et al., 2019; Zhang et al., 2020a; Lew et al., 2023), and
inference on each new input may be prohibitively slow.
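To see why proposal design matters, here is a minimal sketch (not from the paper) of random-walk Metropolis–Hastings on a toy bimodal target: a narrow proposal rarely carries the chain across the low-density gap between modes, which is the slow-mixing failure the paper describes for language data. All names and parameters here are illustrative.

```python
import math
import random

def metropolis_hastings(log_p, x0, proposal_std, n_steps, seed=0):
    """Random-walk Metropolis-Hastings with a Gaussian proposal.

    Returns the full chain of samples. Mixing between well-separated
    modes depends heavily on proposal_std, the step size.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, proposal_std)
        # Accept with probability min(1, p(x_new) / p(x)).
        # Tiny constant guards against log(0) when rng.random() == 0.
        if math.log(rng.random() + 1e-300) < log_p(x_new) - log_p(x):
            x = x_new
        samples.append(x)
    return samples

def log_p(x):
    # Toy bimodal target: equal mixture of unit Gaussians at -3 and +3.
    return math.log(0.5 * math.exp(-0.5 * (x - 3.0) ** 2)
                    + 0.5 * math.exp(-0.5 * (x + 3.0) ** 2))

def mode_crossings(samples):
    # Count sign changes, i.e. jumps between the two modes.
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))

# Narrow proposal: the chain gets stuck in the mode it starts in.
narrow = metropolis_hastings(log_p, x0=-3.0, proposal_std=0.3, n_steps=5000)
# Wider proposal: the chain can jump across the gap between modes.
wide = metropolis_hastings(log_p, x0=-3.0, proposal_std=3.0, n_steps=5000)
```

On language data there is no obvious analogue of "widen the step size": token-level proposals that preserve fluency tend to make small local moves, so the chain mixes slowly, and the whole procedure must be rerun from scratch for every new input — the motivation for amortized inference.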