Text mining policy: Classifying forest and landscape restoration policy agenda with neural information retrieval

7 Aug 2019 · John Brandt

Dozens of countries have committed to restoring the ecological functionality of 350 million hectares of land by 2030. To achieve restoration at this scale, the values and priorities of multi-sectoral stakeholders must be aligned and integrated with national-level commitments and other development agenda. Although misalignment across scales of policy and between stakeholders is a well-known barrier to implementing restoration, fast-paced policymaking in multi-stakeholder environments complicates the monitoring and analysis of governance and policy. In this work, we assess the potential of machine learning to identify restoration policy agenda across diverse policy documents. We introduce an unsupervised neural information retrieval architecture that leverages transfer learning and word embeddings to create high-dimensional representations of paragraphs. Policy agenda labels are recast as information retrieval queries, so that policies are classified by thresholding the cosine similarity between paragraph embeddings and query embeddings. This approach achieves an F1-score of 0.83 measured across 14 policy agenda in 31 policy documents from Malawi, Kenya, and Rwanda, indicating that automated text mining can provide reliable, generalizable, and efficient analyses of restoration policy.
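
The classification step described above, in which labels are treated as queries and a paragraph is assigned a label when the cosine similarity of the two embeddings passes a threshold, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the three-dimensional toy word vectors, the averaging `embed` function (a stand-in for the neural encoder), the query texts, and the threshold of 0.8.

```python
import numpy as np

# Hypothetical toy word vectors; a real system would load pretrained embeddings.
VOCAB = {
    "trees": np.array([1.0, 0.0, 0.0]),
    "planting": np.array([0.9, 0.1, 0.0]),
    "agroforestry": np.array([0.8, 0.2, 0.0]),
    "irrigation": np.array([0.0, 1.0, 0.0]),
    "water": np.array([0.1, 0.9, 0.0]),
    "tax": np.array([0.0, 0.0, 1.0]),
}

def embed(text, vocab=VOCAB):
    """Average the vectors of known tokens (assumed stand-in for the encoder)."""
    vectors = [vocab[t] for t in text.lower().split() if t in vocab]
    if not vectors:
        return np.zeros(len(next(iter(vocab.values()))))
    return np.mean(vectors, axis=0)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def classify(paragraphs, queries, threshold=0.8):
    """Return, per paragraph, the set of labels whose query passes the threshold."""
    query_vecs = {label: embed(text) for label, text in queries.items()}
    results = []
    for paragraph in paragraphs:
        p_vec = embed(paragraph)
        results.append(
            {label for label, q_vec in query_vecs.items()
             if cosine(p_vec, q_vec) >= threshold}
        )
    return results

# Policy agenda labels recast as retrieval queries (illustrative labels only).
queries = {"reforestation": "planting trees", "irrigation": "irrigation water"}
paragraphs = ["agroforestry trees", "water tax"]
labels = classify(paragraphs, queries)
```

Because no labeled training data is involved, the threshold is the only decision parameter in this sketch; raising it trades recall for precision, which is the balance an F1-score summarizes.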
