IT management

Performance management

Existing process mining techniques often exhibit low robustness to common real-world data issues, such as noisy traces and incomplete event logs. This paper presents the development and validation of an integrated three-stage approach that combines statistical anomaly filtering, probabilistic and temporal reconstruction of missing events, and adaptive process model synthesis. The study addresses the following tasks: a critical review of classical process discovery algorithms; formalization of anomaly filtering based on Isolation Forest and of missing-event reconstruction using probabilistic and temporal metrics; and the design of an adaptive mechanism that selects the noise threshold from the normalized entropy of subprocess variability. The approach is implemented as a Python software module using the pm4py, scikit-learn, and NumPy libraries. Experiments on synthetic datasets generated with varying noise levels and proportions of missing events confirm the robustness of the proposed method. The resulting models are evaluated with the Fitness, Generalization, Simplicity, and F1-score metrics and compared against those produced by the Alpha Miner, Heuristics Miner, and Inductive Miner algorithms. Under high noise and log incompleteness, the proposed approach yields a statistically significant improvement in the quality of the discovered process models, providing a basis for robust business process analytics systems that can operate on data from real information systems.
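The filtering stage can be illustrated with a minimal sketch that rests on several assumptions. Traces are plain Python lists of activity names, standing in for a pm4py event log. Each trace is described by three hypothetical features: its length, its number of distinct activities, and the relative frequency of its rarest directly-follows pair. The mapping from normalized variant entropy to the noise threshold is a simple linear rule chosen only for illustration; the paper's actual features and mapping are not reproduced here. The sketch uses scikit-learn's `IsolationForest`, whose `contamination` parameter sets the share of traces treated as outliers.

```python
import math
from collections import Counter

import numpy as np
from sklearn.ensemble import IsolationForest


def normalized_variant_entropy(traces):
    """Shannon entropy of the trace-variant distribution divided by log(#variants)."""
    counts = Counter(tuple(t) for t in traces)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(len(counts))


def adaptive_contamination(h_norm, low=0.01, high=0.2):
    # Hypothetical linear rule: the more variable the log, the larger the
    # share of traces the forest is allowed to flag as noise.
    return low + (high - low) * h_norm


def trace_features(traces):
    """Hypothetical per-trace features: length, distinct activities, rarest DF-pair share."""
    df_counts = Counter()
    for t in traces:
        df_counts.update(zip(t, t[1:]))
    total_pairs = sum(df_counts.values()) or 1
    rows = []
    for t in traces:
        shares = [df_counts[p] / total_pairs for p in zip(t, t[1:])]
        rows.append([len(t), len(set(t)), min(shares, default=0.0)])
    return np.array(rows, dtype=float)


def filter_noisy_traces(traces, random_state=0):
    """Drop traces that Isolation Forest labels as outliers (-1)."""
    h_norm = normalized_variant_entropy(traces)
    contamination = adaptive_contamination(h_norm)
    forest = IsolationForest(
        n_estimators=200, contamination=contamination, random_state=random_state
    )
    labels = forest.fit_predict(trace_features(traces))
    kept = [t for t, label in zip(traces, labels) if label == 1]
    return kept, h_norm, contamination


# Toy log: a dominant variant, a legitimate reordering, and two noisy traces.
log = (
    [["a", "b", "c", "d"]] * 90
    + [["a", "c", "b", "d"]] * 8
    + [["a", "z", "z", "z", "z", "z", "b", "q", "d"]] * 2
)
kept, h_norm, contamination = filter_noisy_traces(log)
print(len(kept), round(h_norm, 3), round(contamination, 3))
```

The rarest-pair feature is what separates noise from legitimate reorderings in this toy log. Noisy traces introduce directly-follows pairs that almost never occur elsewhere, so they get isolated after very few random splits.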
This paper proposes a model, based on an ensemble of artificial neural networks, for forecasting deviations of an industrial enterprise's external and internal environmental factors from their planned values. Improving the accuracy of such forecasts is a pressing research challenge. Accuracy plays a key role in developing enterprise production programs and enables sound competitive strategies for relationships with suppliers and customers. Forecast accuracy is significantly affected by instability in internal environmental factors (data that directly characterize production processes in terms of their impact on output volumes and product quality) and in external factors (demand volume, delivery times, and the quality of components and raw materials). In the problem statement, instability is understood not as the variability of factor values per se, but as fluctuations in their generalized characteristics over the entire data set, such as irregular data arrival and the presence of anomalies. The novelty of the results lies in two contributions. The first is the proposed structure of a neural network model that forecasts deviations of external and internal environmental factors from planned indicators under irregular data arrival and data anomalies. The second is an algorithm for applying this model. The model is an ensemble of three neural network submodels, built on convolutional and recurrent architectures, that forecast factors of the enterprise's internal and external environments (with decomposition into micro- and macro-environments). The model accounts for the mutual influence of these factors' instability through a long short-term memory (LSTM) network at its output. This network aggregates the submodel outputs into the final forecast.
A model experiment showed that accounting for factor instability increases the accuracy of forecasting deviations of external and internal environmental factors from planned values.
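The aggregation step can be sketched in NumPy with untrained random weights. The three submodels are replaced by hypothetical placeholder forecast series. Their labels (internal, external micro-environment, external macro-environment) are also an assumption about how the three submodels divide the factors. A single-layer LSTM reads their stacked outputs one time step at a time and maps its final hidden state to one aggregated forecast. This shows only the data flow of an LSTM aggregator; the paper's layer sizes, training procedure, and convolutional/recurrent submodels are not reproduced.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMAggregator:
    """Single-layer LSTM that fuses per-step submodel forecasts into one output.

    Weights are random (untrained); this only illustrates the forward pass.
    """

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_inputs + n_hidden)
        # Stacked weights for the input, forget, candidate, and output gates.
        self.W = rng.normal(0.0, scale, (4 * n_hidden, n_inputs + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.b[n_hidden:2 * n_hidden] = 1.0  # common forget-gate bias initialization
        self.w_out = rng.normal(0.0, scale, n_hidden)
        self.n_hidden = n_hidden

    def forward(self, x_seq):
        """x_seq: array of shape (T, n_inputs); returns a scalar forecast."""
        H = self.n_hidden
        h = np.zeros(H)
        c = np.zeros(H)
        for x_t in x_seq:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i = sigmoid(z[:H])
            f = sigmoid(z[H:2 * H])
            g = np.tanh(z[2 * H:3 * H])
            o = sigmoid(z[3 * H:])
            c = f * c + i * g          # update cell state
            h = o * np.tanh(c)         # update hidden state
        return float(self.w_out @ h)


# Hypothetical placeholder outputs of three submodels over 12 periods
# (deviation-from-plan forecasts); real values would come from the CNN/RNN submodels.
rng = np.random.default_rng(1)
internal = rng.normal(0.0, 1.0, 12)
external_micro = rng.normal(0.0, 1.0, 12)
external_macro = rng.normal(0.0, 1.0, 12)
stacked = np.column_stack([internal, external_micro, external_macro])  # (12, 3)

aggregator = LSTMAggregator(n_inputs=3, n_hidden=8)
final_forecast = aggregator.forward(stacked)
print(final_forecast)
```

An LSTM is used at the output because its gates let it weigh each submodel's contribution differently over time. In trained form, this is how it could capture mutual influence between the factor groups.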

Teacher’s portfolio

IT development

This article proposes a model for automatically generating educational tests based on domain-adaptive fine-tuning of large language models (LLMs). Traditional methods of developing test items are time-consuming and limited to narrow subject areas. An analysis of existing approaches that generate test items solely by prompting pre-trained LLMs revealed key limitations: unstable results, insufficient control over question structure, and the need for post-editing. These limitations motivate the development of specialized solutions. A pipeline is proposed for fine-tuning the T-Lite 1.0 language model (7 billion parameters) with the LoRA technique on a dataset of 4,000 validated test items in higher-education disciplines. A distinctive feature of the method is its use of chain-of-thought reasoning to structure item generation by decomposing it into components: topic, goal, format, prerequisite knowledge, wording, and expected response. An inference pipeline was developed for the item generation system. It integrates multimodal processing of input data (text and images), automatic content segmentation, and multi-stage item generation using the specialized Qwen2-VL and Gemma-2-27b-it models. The model was tested and deployed in the independent assessment ecosystem of the i-exam.ru portal. This extended the functionality of existing online testing services by making it possible to enlarge the bank of uncompromised test items. Such a bank is particularly important for the reliability of online testing procedures such as FEPO and FIEB. This deployment confirms the model's ability to generate high-quality educational tests that meet psychometric requirements and are suitable for use in the educational process.
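The LoRA technique named above can be illustrated with a minimal NumPy sketch of one adapted linear layer. It assumes the standard formulation, in which a frozen weight matrix W is augmented by a trainable low-rank product B A scaled by alpha / r. The rank, scaling, and initialization values here are illustrative defaults, not the settings used for T-Lite 1.0. Because B starts at zero, the adapted layer initially reproduces the pretrained one. After training, the update can be merged into W, so inference cost does not grow.

```python
import numpy as np


class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, W, r=8, alpha=16, seed=0):
        d_out, d_in = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                                   # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, (r, d_in))    # trainable down-projection
        self.B = np.zeros((d_out, r))                # trainable up-projection, zero init
        self.scale = alpha / r

    def __call__(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        return self.A.size + self.B.size

    def merged_weight(self):
        """Fold the adapter into W for deployment (no extra inference cost)."""
        return self.W + self.scale * (self.B @ self.A)


rng = np.random.default_rng(42)
W = rng.normal(size=(6, 5))
layer = LoRALinear(W, r=2, alpha=4)
x = rng.normal(size=5)
print(np.allclose(layer(x), W @ x))      # True: zero-initialized B leaves W unchanged

layer.B = rng.normal(size=(6, 2))        # stand-in for values learned during fine-tuning
print(np.allclose(layer(x), layer.merged_weight() @ x))  # True
print(layer.trainable_params())          # 2 * (5 + 6) = 22
```

Consider a hypothetical 4096 × 4096 projection with r = 8. The adapter has 8 · (4096 + 4096) = 65,536 trainable parameters, versus 16,777,216 in the frozen matrix, about 0.4%. This is what makes fine-tuning a 7-billion-parameter model on a few thousand items practical.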

Software engineering

Economic modeling frequently requires assessing the interdependence among multiple socio-economic indicators. These factors typically have a multidimensional structure, which allows them to be examined from various perspectives. The aim of this study is to develop a method, based on mathematical modeling, that enables a comprehensive assessment of macroeconomic processes, identifies significant characteristics of socio-economic entities, and incorporates them into forecasting. The approach relies on impulse response functions and forecast error variance decompositions estimated from vector autoregressive (VAR) models. A generalized analysis of these characteristics of inter-factor interaction provides a holistic understanding of the phenomena under investigation. At the forecasting stage, the method explicitly accounts for how the influence of multidimensional factors differs in strength across their aspects. This helps identify priority measures for enhancing regional socio-economic development. The method was tested empirically in a case study of the relationship between social factors of migration flows and employment levels in the Russian Federation. A panel vector autoregression (PVAR) model was chosen as the econometric framework, a choice driven by both the research objectives and the structure of the available dataset. The analysis covered three social dimensions of migration flows: the dynamics of arrivals and departures of the working-age population, education level, and citizenship status. Generalized impulse response functions and forecast error variance decompositions were derived from the estimated models. Based on these results, a model ensemble was constructed, a forecast incorporating the relative importance of each variable was generated, and a comparative analysis was performed. The use of econometric modeling techniques ensures both the scientific rigor and the practical relevance of the research.
All computations were performed in R.
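The core computations can be sketched in Python with NumPy (the study itself used R) under several simplifying assumptions:

- The model is a first-order panel VAR, estimated by pooled OLS after removing unit fixed effects by demeaning. This within estimator carries a small-T bias, and applied PVAR work often uses GMM instead.
- Generalized impulse responses follow the Pesaran-Shin form Phi_h Sigma e_j / sqrt(sigma_jj).
- The generalized variance decomposition is normalized so that each row sums to one.

The data are simulated, and the coefficient matrix is hypothetical.

```python
import numpy as np


def fit_panel_var1(panel):
    """Within-transformed pooled OLS for y_it = A y_i,t-1 + u_it.

    panel: array of shape (n_units, T, k). Returns (A, Sigma).
    """
    n_units, T, k = panel.shape
    demeaned = panel - panel.mean(axis=1, keepdims=True)   # remove unit fixed effects
    Y = demeaned[:, 1:, :].reshape(-1, k)
    X = demeaned[:, :-1, :].reshape(-1, k)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)            # Y ≈ X @ coef
    resid = Y - X @ coef
    sigma = resid.T @ resid / (resid.shape[0] - k)
    return coef.T, sigma


def generalized_irf(A, sigma, horizon, shock):
    """Pesaran-Shin GIRF: Phi_h Sigma e_j / sqrt(sigma_jj), h = 0..horizon-1."""
    phi, out = np.eye(A.shape[0]), []
    for _ in range(horizon):
        out.append(phi @ sigma[:, shock] / np.sqrt(sigma[shock, shock]))
        phi = A @ phi                                        # Phi_{h+1} = A Phi_h for a VAR(1)
    return np.array(out)


def generalized_fevd(A, sigma, horizon):
    """Row-normalized generalized FEVD; entry (i, j) = share of i's variance due to j."""
    k = A.shape[0]
    phi, num, den = np.eye(k), np.zeros((k, k)), np.zeros(k)
    for _ in range(horizon):
        ps = phi @ sigma
        num += ps ** 2 / np.diag(sigma)                      # (e_i' Phi Sigma e_j)^2 / sigma_jj
        den += np.diag(ps @ phi.T)                           # e_i' Phi Sigma Phi' e_i
        phi = A @ phi
    theta = num / den[:, None]
    return theta / theta.sum(axis=1, keepdims=True)


# Simulated panel: 30 hypothetical regions, 40 periods, 3 indicators.
rng = np.random.default_rng(42)
A_true = np.array([[0.5, 0.1, 0.0],
                   [0.2, 0.4, 0.1],
                   [0.0, 0.3, 0.3]])
n_units, T, k = 30, 40, 3
panel = np.zeros((n_units, T, k))
for u in range(n_units):
    y = np.zeros(k)
    for t in range(T):
        y = A_true @ y + rng.normal(0.0, 1.0, k)
        panel[u, t] = y
panel += rng.normal(0.0, 2.0, (n_units, 1, k))               # unit fixed effects

A_hat, sigma_hat = fit_panel_var1(panel)
girf = generalized_irf(A_hat, sigma_hat, horizon=10, shock=0)
fevd = generalized_fevd(A_hat, sigma_hat, horizon=10)
print(np.round(fevd, 3))
```

Generalized (rather than Cholesky-orthogonalized) responses do not depend on the ordering of variables. This matters when there is no natural causal ordering among indicators such as migration and employment.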

Algorithmic efficiency

Developing efficient parallel programs for multicomputers is a complex task that requires solving numerous problems. On the one hand, a parallel program must have the required non-functional properties, such as low overhead for organizing parallel processing and a balanced load across computational nodes and processor cores. On the other hand, parallel programs are subject to new classes of errors that do not occur in sequential programs. To address these issues, automatic parallel program construction systems are being developed. Such systems take over the tasks of executing a program on a multicomputer and ensuring its non-functional properties, which allows the user to focus on the applied problem. This paper has two goals. The first is to assess how mature popular automatic parallel program construction systems are, in terms of both performance and user experience. The second is to identify the classes of problems for which these systems are worth using. The paper compares these systems with each other and with the MPI and OpenMP technologies. The comparison is based on the criteria that matter most to users when choosing a technology (performance, completeness of documentation, ease of installation, debugging, and optimization, etc.). Common problems of such systems discussed in the paper include incomplete documentation, bugs, and non-obvious behavior. Nevertheless, most of these systems can construct parallel programs with acceptable performance. At the same time, their relatively simple, high-level input languages make it possible to implement an applied algorithm with less effort than parallel programming with MPI and OpenMP requires.
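Performance is commonly compared with strong-scaling metrics. The minimal sketch below assumes wall-clock times measured for the same problem size. It computes speedup S = T1 / Tp and parallel efficiency E = S / p, where T1 is the sequential time and p the number of workers. The technology names and timings in the usage lines are hypothetical placeholders, not results from the paper.

```python
import time


def measure(fn, *args, repeats=3):
    """Best-of-N wall-clock time of fn(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup_and_efficiency(t_serial, t_parallel, workers):
    """Strong scaling: S = T1 / Tp, E = S / p."""
    if t_serial <= 0 or t_parallel <= 0 or workers <= 0:
        raise ValueError("times and worker count must be positive")
    s = t_serial / t_parallel
    return s, s / workers


def scaling_table(t_serial, runs):
    """runs: {technology: {workers: parallel time}} -> {technology: {workers: (S, E)}}."""
    return {
        tech: {p: speedup_and_efficiency(t_serial, tp, p) for p, tp in sorted(times.items())}
        for tech, times in runs.items()
    }


# Hypothetical placeholder timings (seconds), not measurements from the paper.
table = scaling_table(100.0, {"MPI": {4: 27.0, 8: 15.0}, "system X": {4: 30.0, 8: 18.0}})
for tech, rows in table.items():
    for p, (s, e) in rows.items():
        print(f"{tech:>9} p={p}: speedup {s:.2f}, efficiency {e:.2f}")
```

Taking the best of several runs reduces the effect of system noise. Reporting efficiency alongside speedup shows how much of the added hardware each technology actually uses.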

Software engineering

The practical migration of applications to Linux-family operating systems is hampered by the lack of effective mechanisms for centrally managing roaming user profiles. Such mechanisms would let users log in from any computer in the domain, functionally similar to the Roaming User Profiles component of Windows. Unlike Windows, where application and system settings are stored in a single registry, Linux keeps them in numerous hidden files and directories in the user's home folder. This leads to intensive data transfers, which lengthen session startup and shutdown and increase the load on the disk subsystem during metadata scanning. The paper proposes an approach to centralized management of roaming user profiles based on QCOW2 disk images. It relies on the copy-on-write mechanism, under which all changes made during a session are saved and kept entirely on the local machine. This eliminates the main drawbacks of network file systems and file synchronization. A structure for a roaming user profile system in Linux environments has been developed. It is accompanied by a methodology for ensuring the integrity, performance, and fault tolerance of profiles, which removes the dependence of performance on network quality characteristic of the registry-based method. Several indicators were measured on a prototype: the volume of network transfers, login time, file read and write speeds during a session, and resilience to network connection failures. All showed a noticeable improvement over those typical of known methods. The approach can serve as a basis for creating mobile workplaces in heterogeneous Linux environments.
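The copy-on-write idea behind the QCOW2 approach can be sketched with an in-memory model, assuming a profile image divided into fixed blocks. Reads fall through to a shared read-only base unless the session has written that block. Writes land only in a local delta. At logout, only the dirty blocks need to be transferred or merged. With real images, the analogous steps use two commands. `qemu-img create -f qcow2 -b base.qcow2 -F qcow2 overlay.qcow2` creates an overlay backed by a base image, and `qemu-img commit overlay.qcow2` writes its changes back. The file names are placeholders, and the paper's exact workflow is not reproduced here.

```python
class CowOverlay:
    """Block-level copy-on-write view over a read-only base image (in-memory sketch)."""

    def __init__(self, base_blocks):
        self._base = tuple(base_blocks)   # shared base, never modified
        self._delta = {}                  # block index -> bytes written this session

    def _check(self, idx):
        if not 0 <= idx < len(self._base):
            raise IndexError(f"block {idx} outside image")

    def read(self, idx):
        # Serve from the local delta if the block was written, else from the base.
        self._check(idx)
        return self._delta.get(idx, self._base[idx])

    def write(self, idx, data):
        self._check(idx)
        self._delta[idx] = data           # base stays untouched

    def dirty_blocks(self):
        """Blocks that would have to be sent back at logout."""
        return sorted(self._delta)

    def commit(self):
        """Return a new base with session changes merged in."""
        merged = list(self._base)
        for idx, data in self._delta.items():
            merged[idx] = data
        return tuple(merged)


base = (b"dconf-v1", b"bashrc", b"cache-0", b"cache-1")   # hypothetical profile blocks
session = CowOverlay(base)
session.write(0, b"dconf-v2")
print(session.read(0), session.read(1))   # b'dconf-v2' b'bashrc'
print(session.dirty_blocks())            # [0]
print(session.commit())
```

Because unchanged blocks are never copied, the transfer cost at logout scales with what the session modified rather than with the number of dotfiles in the home directory.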

Laboratory

Research of processes and systems

The effectiveness of operations on fuzzy data (fuzzy sets and relations) in modeling under uncertainty is largely determined by the complexity of fuzzy computing. Such computing is usually based on L. Zadeh's Extension Principle applied to a crisp function. Operations on the membership functions of fuzzy sets and relations based on the Extension Principle are equivalent to interval operations on their α-levels, which are much simpler to compute. The proof of this result is based on H. Nguyen's theorem. Fuzzy computing has been generalized to the case of a fuzzy mapping between membership functions of fuzzy sets and relations based on the Generalized Fuzzy Extension Principle. However, an open problem remains for the transition from operations on membership functions of fuzzy sets and relations to alternative operations on their α-levels. It is not yet known how to represent the Generalized Fuzzy Extension Principle in this setting or how to prove that it can be implemented there. This paper describes the implementation of a fuzzy mapping between fuzzy sets by means of a fuzzy relation, based on the fuzzy composition of the membership functions of fuzzy sets and fuzzy relations. An alternative interpretation of the Generalized Fuzzy Extension Principle is proposed, based on the fuzzy composition of the characteristic functions of α-levels of fuzzy sets and relations. A fuzzy computing method built on this principle is also proposed. The paper proves that the two approaches yield equivalent fuzzy composition results. It also compares them in terms of computational complexity and the degree of parallelism achievable in their implementation.
The proposed Generalized Fuzzy Extension Principle and the fuzzy computing method based on it can significantly simplify fuzzy computing. They replace arithmetic operations on the real values of the membership functions of fuzzy sets and relations with non-numerical (logical) operations on the values of the characteristic functions of their α-levels.
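The equivalence can be illustrated on finite universes with a minimal sketch. It assumes the standard max-min (sup-min) composition B(y) = max_x min(A(x), R(x, y)) as the fuzzy mapping. The direct version works on membership values. The α-level version builds crisp α-cuts of A and R and composes their characteristic functions with Boolean AND/OR only. It then reassembles B from the cuts by the decomposition μ_B(y) = max{α : y ∈ B_α}. This is a generic illustration, not the paper's own algorithm. The α-levels are processed independently of each other, so they could be handled in parallel.

```python
import numpy as np


def supmin_composition(mu_a, mu_r):
    """Direct max-min composition: B(y) = max_x min(A(x), R(x, y))."""
    return np.max(np.minimum(mu_a[:, None], mu_r), axis=0)


def alpha_level_composition(mu_a, mu_r):
    """Same mapping via crisp alpha-cuts and Boolean operations only."""
    levels = np.unique(np.concatenate([mu_a.ravel(), mu_r.ravel()]))  # sorted ascending
    levels = levels[levels > 0]
    mu_b = np.zeros(mu_r.shape[1])
    for alpha in levels:                         # levels are independent -> parallelizable
        a_cut = mu_a >= alpha                    # characteristic function of A_alpha
        r_cut = mu_r >= alpha                    # characteristic function of R_alpha
        b_cut = np.any(a_cut[:, None] & r_cut, axis=0)   # Boolean (OR of AND) composition
        mu_b[b_cut] = alpha                      # ascending loop keeps the largest alpha
    return mu_b


mu_a = np.array([0.2, 0.7, 1.0])                 # fuzzy set A on X = {x0, x1, x2}
mu_r = np.array([[0.5, 0.9],                     # fuzzy relation R on X x Y, Y = {y0, y1}
                 [0.3, 0.6],
                 [0.8, 0.4]])
print(supmin_composition(mu_a, mu_r))            # [0.8 0.6]
print(alpha_level_composition(mu_a, mu_r))       # [0.8 0.6]
```

The two functions agree because y belongs to the α-cut of the composition exactly when some x has both A(x) ≥ α and R(x, y) ≥ α. That condition is the Boolean composition of the two cuts.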

Information security

Data protection

The article examines the problem of dynamic cyberthreat detection in distributed Internet of Things (IoT) systems. It addresses the limited adaptability of static intrusion detection systems and the vulnerability of machine learning models to adversarial attacks. The aim of the work is to improve the effectiveness and timeliness of cyberthreat detection in distributed IoT systems. To this end, it uses generative models that can simulate normal and abnormal node behavior while accounting for environmental variability. The method, based on generative adversarial models and contrastive learning, computes anomaly scores for time windows of IoT data and makes decisions using a threshold rule. A computational experiment was conducted on the open N-BaIoT dataset for Mirai-family attack scenarios. It compared statistical, linear, and autoencoder-based anomaly detection methods on windowed representations of IoT data. The experiment showed that the chosen feature representation provides high detection effectiveness with short inference times. It also showed that the autoencoder achieves the best F1-scores across the scenarios considered. The results confirm the potential of the proposed generative method for analyzing IoT traffic time series. They also support its further development for intelligent network security monitoring tools at the edge and gateway levels of IoT systems.
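The scoring-and-threshold decision rule can be sketched with one of the baseline families the experiment compares: a linear (PCA) reconstruction-error detector on sliding windows. The generative adversarial and contrastive model itself is not reproduced. The sketch makes several assumptions:

- NumPy 1.20 or newer is available (for `sliding_window_view`).
- Two synthetic traffic features replace the N-BaIoT features.
- A hypothetical flood-like attack is modeled as a level shift in one feature.
- The threshold is set at the 99th percentile of anomaly scores on attack-free training windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, width):
    """(T, n_features) series -> (T - width + 1, n_features * width) window vectors."""
    w = sliding_window_view(series, window_shape=width, axis=0)  # (T-width+1, F, width)
    return w.reshape(w.shape[0], -1)


class LinearWindowDetector:
    """PCA reconstruction error as anomaly score, plus a percentile threshold rule."""

    def __init__(self, n_components=2, quantile=0.99):
        self.n_components = n_components
        self.quantile = quantile

    def fit(self, windows):
        self.mean_ = windows.mean(axis=0)
        _, _, vt = np.linalg.svd(windows - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        self.threshold_ = np.quantile(self.score(windows), self.quantile)
        return self

    def score(self, windows):
        centered = windows - self.mean_
        recon = centered @ self.components_.T @ self.components_
        return np.mean((centered - recon) ** 2, axis=1)

    def predict(self, windows):
        return self.score(windows) > self.threshold_   # True = anomalous window


def synth_traffic(T, rng, attack=None):
    """Two correlated synthetic features; attack adds a level shift to feature 0."""
    t = np.arange(T)
    base = np.sin(2 * np.pi * t / 20)
    f0 = base + rng.normal(0.0, 0.05, T)
    f1 = 2.0 * base + rng.normal(0.0, 0.05, T)
    if attack is not None:
        f0[attack[0]:attack[1]] += 5.0
    return np.column_stack([f0, f1])


rng = np.random.default_rng(7)
width = 5
detector = LinearWindowDetector().fit(make_windows(synth_traffic(1000, rng), width))
test_windows = make_windows(synth_traffic(400, rng, attack=(200, 260)), width)
flags = detector.predict(test_windows)
print(flags[:190].mean(), flags[200:256].mean())  # false-alarm rate, detection rate
```

The attack is caught because it breaks the learned correlation between the two features. Window vectors then leave the low-dimensional subspace that normal traffic occupies. An autoencoder generalizes the same idea to nonlinear structure.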