
Authors: Mironov V., Gusarenko A., Yusupova N.     Published in No. 6(96), 24 December 2021
Rubric: Software engineering

Programmatic extraction of data from word-processing documents: a situation-oriented approach

The article discusses applying the situation-oriented approach to the programmatic processing of word-processing documents. The documents in question are prepared by the user in Microsoft Word or a comparable word processor and are later used as data sources. Because the Office Open XML and OpenDocument formats are open, the concept of virtual documents mapped to ZIP archives can be applied to give programs access to the XML components of such documents in a situation-oriented environment. The article substantiates the importance of agreeing in advance on where information is placed in the document, for example by using pre-prepared templates, so that it can later be found and retrieved. For the DOCX and ODT formats, it discusses the use of key phrases, bookmarks, content controls, and custom XML components to organize extraction of the entered data. For each option, tree-like models of access to the extracted data are built, together with the corresponding XPath expressions. The choice of option depends on the functionality and limitations of the word processor, and the options differ in how difficult it is to develop a blank template, to enter the data, and to program the extraction. The applied solution is based on entering article metadata through content controls placed in a blank template and bound to elements of a custom XML component. The developed hierarchical situational model (HSM) extracts the XML component, loads it into a DOM object, and applies XSLT transformations to obtain the resulting data: an error report and JavaScript code for subsequent use of the extracted metadata.
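The extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' HSM implementation: it only shows the general idea of treating a DOCX file as a ZIP archive, reading a custom XML component from it, and producing extracted data plus an error report. The part name customXml/item1.xml, the element names (article, title, author, keywords), and the helper functions are assumptions made for illustration, and plain ElementTree path queries stand in for the XSLT transformations used in the article.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical metadata schema; these element names are assumptions.
REQUIRED_FIELDS = ("title", "author", "keywords")


def extract_metadata(docx_bytes, part_name="customXml/item1.xml"):
    """Read one custom XML part from a DOCX (ZIP) archive.

    Returns a (data, errors) pair: the non-empty fields found and a
    list of messages for required fields that are missing or empty.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read(part_name))
    data, errors = {}, []
    for field in REQUIRED_FIELDS:
        # ElementTree supports only a limited subset of XPath.
        node = root.find(f"./{field}")
        text = (node.text or "").strip() if node is not None else ""
        if text:
            data[field] = text
        else:
            errors.append(f"missing or empty field: {field}")
    return data, errors


def make_sample_docx():
    """Build a tiny in-memory archive standing in for a filled-in template."""
    xml = (
        "<article>"
        "<title>Sample title</title>"
        "<author>A. Author</author>"
        "<keywords></keywords>"
        "</article>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("customXml/item1.xml", xml)
    return buffer.getvalue()


data, errors = extract_metadata(make_sample_docx())
print(data)
print(errors)
```

In this sketch the sample archive contains only the custom XML part; a real DOCX file would also carry word/document.xml and other components, and the name of the custom XML part may differ from the one assumed here.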

Keywords

situation-oriented database, hierarchical situational models, virtual document, open text format, scientific article metadata, Open Journal Systems, DOCX, ODT

The author:

Mironov V.

Degree:

Doctor of Technical Sciences, Professor, Ufa State Aviation Technical University

Location:

Ufa

The author:

Gusarenko A.

Degree:

PhD in Computer Science, Ufa State Aviation Technical University (UGATU)

Location:

Ufa

The author:

Yusupova N.

Degree:

Doctor of Technical Sciences, Professor, Dean, Ufa State Aviation Technical University

Location:

Ufa