Enhanced multi-tuple extraction for materials: integrating pointer networks and augmented attention
Abstract
Extracting reliable, tuple-level information from materials texts is essential for data-driven design, yet multi-tuple sentences remain difficult due to intertwined semantics, syntactic complexity, and sparse supervision for higher-density cases. In this study, we address this challenge by formulating information extraction as an integrated process that couples entity extraction with tuple allocation. The framework combines an entity extraction module based on MatSciBERT with pointer networks and an allocation module that models inter- and intra-entity attention to enforce tuple coherence. Using the mechanical properties of multi-principal-element alloys as a case study, we define the target schema and evaluate exact-match tuple accuracy. Experiments on tuple extraction yield F1 scores of 0.96, 0.95, 0.85, and 0.75 on datasets containing one to four tuples per sentence, respectively, and 0.85 on a randomly curated set. Ablations show that the allocation module is the most critical component and that inter-entity attention contributes more than intra-entity attention. Error analyses attribute the density-related decline mainly to semantic overlap and syntactic complexity, with upstream extraction errors prominent under sparse supervision and allocation errors concentrated in structurally complex templates. The approach delivers precise structured outputs suitable for downstream analysis and offers a domain-adaptable alternative to prompt-based large language models when strict correctness is required.
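To make the two-stage design concrete, the following is a minimal sketch (not the authors' implementation) of the two components named in the abstract: a pointer-network head that predicts entity span boundaries over contextual token embeddings, and an inter-entity attention scorer that decides which extracted entities belong to the same tuple. All module names, dimensions, and the random stand-in for MatSciBERT embeddings are illustrative assumptions.

```python
# Hedged sketch of the abstract's two stages; names and shapes are illustrative.
import torch
import torch.nn as nn

class PointerSpanHead(nn.Module):
    """Pointer-network-style head: predicts start/end positions of entity spans."""
    def __init__(self, hidden: int):
        super().__init__()
        self.start = nn.Linear(hidden, 1)  # pointer logits for span starts
        self.end = nn.Linear(hidden, 1)    # pointer logits for span ends

    def forward(self, token_states):       # token_states: (batch, seq_len, hidden)
        start_logits = self.start(token_states).squeeze(-1)
        end_logits = self.end(token_states).squeeze(-1)
        return start_logits, end_logits

class InterEntityAllocator(nn.Module):
    """Attention over entity representations, then pairwise same-tuple scoring."""
    def __init__(self, hidden: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.pair_score = nn.Bilinear(hidden, hidden, 1)

    def forward(self, entity_reprs):        # entity_reprs: (batch, n_entities, hidden)
        ctx, _ = self.attn(entity_reprs, entity_reprs, entity_reprs)
        n = ctx.size(1)
        left = ctx.unsqueeze(2).expand(-1, n, n, -1).contiguous()
        right = ctx.unsqueeze(1).expand(-1, n, n, -1).contiguous()
        return self.pair_score(left, right).squeeze(-1)  # (batch, n, n) logits

# Toy usage: random tensors stand in for MatSciBERT token and span embeddings.
tokens = torch.randn(2, 32, 768)                     # (batch, seq_len, hidden)
start_logits, end_logits = PointerSpanHead(768)(tokens)
entities = torch.randn(2, 5, 768)                    # pooled span representations
pair_logits = InterEntityAllocator(768)(entities)    # same-tuple scores per pair
print(start_logits.shape, end_logits.shape, pair_logits.shape)
```

In this reading, thresholding or clustering the pairwise logits would group entities into tuples, which matches the paper's framing of allocation as enforcing coherence across extracted entities; the exact scoring and decoding used by the authors may differ.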
Keywords
AI for materials, multi-tuple extraction, MatSciBERT, attention mechanism
Cite This Article
Hei M, Zhang Z, Liu Q, Pan Y, Zhao X, Peng Y, Ye Y, Zhang X, Bai S. Enhanced multi-tuple extraction for materials: integrating pointer networks and augmented attention. J Mater Inf 2025;5:[Accept]. http://dx.doi.org/10.20517/jmi.2025.75