

  

“Journal of Applied Informatics” is a peer-reviewed scientific journal with an international editorial board and authorship, covering a significant part of the Russian IT field. Its publications address the theory and application of computer modeling and information technologies in various professional areas. The journal is indexed in the Russian Science Citation Index on the Web of Science platform.

In accordance with the decision of the Higher Attestation Commission of the Ministry of Education and Science of the Russian Federation, the journal is included in the «List of Leading Peer-Reviewed Scientific Journals and Publications authorized to publish main dissertation results».

A method for dynamic detection of cyber threats in distributed Internet of Things systems based on generative models

The article examines the problem of dynamic cyberthreat detection in distributed Internet of Things systems, addressing the limited adaptability of static intrusion detection systems and the vulnerability of machine learning models to adversarial influences. The aim of the work is to improve the effectiveness of cyberthreat detection in distributed IoT systems based on efficiency and timeliness criteria by using generative models capable of simulating normal and abnormal node behavior while accounting for environmental variability. A method based on generative adversarial models and contrastive learning is employed to generate anomaly estimates for IoT data time windows and make decisions based on a threshold rule. A computational experiment was conducted on the open N-BaIoT dataset for Mirai family attack scenarios, comparing statistical, linear, and autoencoder-based anomaly detection methods on windowed representations of IoT data. It was demonstrated that the selected feature description ensures high cyberthreat detection efficiency with short inference times, and the use of an autoencoder yields the best F1-score values across the scenarios considered. The obtained results confirm the potential for further implementation of the proposed generative method for analyzing IoT traffic time sequences and its application in intelligent network security monitoring tools at the edge and gateway levels of IoT systems.
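The threshold rule over windowed anomaly scores can be illustrated with a minimal sketch. It is not the authors' generative method: a simple statistical baseline (largest z-score of window features) is swapped in, and the window size, the feature choice, the threshold, and the traffic values are all hypothetical.

```python
from statistics import mean, pstdev


def window_features(series, size):
    """Split a series into non-overlapping windows and return (mean, std) per window."""
    return [
        (mean(series[i:i + size]), pstdev(series[i:i + size]))
        for i in range(0, len(series) - size + 1, size)
    ]


def fit_baseline(normal_features):
    """Per-feature mean and std, estimated on windows assumed to be benign."""
    columns = list(zip(*normal_features))
    # Fall back to 1.0 when a feature is constant, to avoid division by zero.
    return [(mean(c), pstdev(c) or 1.0) for c in columns]


def anomaly_score(features, baseline):
    """Largest absolute z-score across the features of one window."""
    return max(abs(x - m) / s for x, (m, s) in zip(features, baseline))


def detect(series, size, baseline, threshold):
    """Threshold rule: flag a window when its anomaly score exceeds the threshold."""
    return [
        anomaly_score(f, baseline) > threshold
        for f in window_features(series, size)
    ]


# Hypothetical per-interval packet counts: a benign trace, then a trace with a burst.
benign = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9]
baseline = fit_baseline(window_features(benign, 4))
flags = detect([10, 11, 9, 10, 90, 95, 88, 92], 4, baseline, threshold=3.0)
print(flags)
```

In this sketch only the scoring function would change if an autoencoder or a generative model replaced the statistical baseline; the windowing and the threshold decision stay the same.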

A method for predicting bank customer churn based on an ensemble machine learning model

This paper presents research aimed at developing a method for predicting customer churn at a commercial bank using machine learning models (including deep artificial neural networks) to process customer data, and at creating software tools that implement the method. The object of the study is a commercial bank, and the subject is its activity in the B2C segment, which covers commercial interaction between businesses and individuals. The research is relevant because banks are increasingly introducing digital services to reduce non-operating costs, in particular those associated with retaining customers, since attracting new customers costs much more than keeping existing ones. The scientific novelty lies in the developed method for predicting commercial bank customer churn and in the algorithm underlying the software that implements it. The proposed ensemble forecasting model is based on three classification algorithms: k-means, random forest, and multilayer perceptron. To aggregate the outputs of the individual models, a trainable tree of Mamdani-type fuzzy inference systems is proposed. The ensemble model is trained in two stages: first the three classifiers are trained, and then the tree of fuzzy inference systems is trained on the data obtained from their outputs. In the proposed method the ensemble model produces a static forecast, whose results are used in a dynamic forecast performed in two variants: one based on the recursive least squares method and one based on a convolutional neural network. Model experiments on a synthetic dataset taken from the Kaggle website showed that the ensemble model achieves higher binary classification quality than each individual model.

A model for automatic generation of educational tests from unstructured text based on domain-adaptive retraining of large language models

This article proposes a model for automatically generating educational tests based on domain-adaptive retraining of large language models (LLMs). Traditional methods for developing test items are time-consuming and limited to narrow subject areas. An analysis of existing approaches that generate test items solely by prompting pre-trained LLMs revealed key limitations: unstable results, insufficient control over question structure, and the need for post-editing, which justifies the development of specialized solutions. A pipeline is proposed for retraining the T-Lite 1.0 language model (7 billion parameters) using the LoRA technique on a dataset of 4000 validated test tasks in higher education disciplines. A distinctive feature of the proposed method is the use of chain-of-thought to structure the task generation process by decomposing it into components: topic, goal, format, prerequisite knowledge, wording, and expected response. An inference pipeline for the task generation system was developed, integrating multimodal processing of input data (text/image), automatic content segmentation, and multi-stage task generation using the specialized Qwen2-VL and Gemma-2-27b-it models. The model was tested and deployed in the independent assessment ecosystem of the i-exam.ru portal. This expanded the functionality of existing online testing services by providing additional capacity to expand the uncompromised database of test-based assignments, which is particularly important for ensuring the reliability of online testing procedures such as FEPO and FIEB. This deployment confirms the model’s ability to generate high-quality educational tests that meet psychometric requirements and are suitable for use in the educational process.

A neural network algorithm for identifying and removing outliers in noisy data sets

Outliers in statistical data, which result from erroneously collected information, are often an obstacle to the successful application of machine learning methods in many subject areas. The presence of outliers in training data sets reduces the accuracy of machine learning models and in some cases makes these methods impossible to apply. Existing outlier detection methods are unreliable: they are fundamentally unable to detect some types of outliers, and they often classify observations that are not outliers as outliers. Recently emerging neural network methods for outlier detection are free from this drawback, but they are not universal, since the ability of a neural network to detect outliers depends both on its architecture and on the problem being solved. The purpose of this study is to develop an algorithm for creating and using neural networks that can correctly detect outliers regardless of the problem being solved. This goal is achieved by exploiting the property of some specially constructed neural networks to show the largest training errors on the observations that are outliers. Using this property, together with a series of computational experiments whose results were generalized in a mathematical formula that modifies a corollary of the Arnold–Kolmogorov–Hecht-Nielsen theorem, made it possible to achieve the stated goal.
The developed algorithm proved especially effective in forecasting and controlling interdependent thermophysical and chemical-energy-technological processes of ore raw material processing at operating metallurgical enterprises, where outliers in statistical data are almost inevitable and where, without identifying and excluding them, building neural network models of acceptable accuracy is generally impossible.
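The idea of treating the observations with the largest training errors as outlier candidates can be sketched as follows. This is only an illustration under stated assumptions: a least-squares line stands in for the specially constructed neural network, and the cutoff rule (a multiple of the median error, with a hypothetical `factor`) is not the formula from the article.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    return slope, my - slope * mx


def flag_outliers(xs, ys, factor=5.0):
    """Return indices whose training error exceeds `factor` times the median error."""
    slope, intercept = fit_line(xs, ys)
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    cutoff = factor * median(errors)
    return [i for i, e in enumerate(errors) if e > cutoff]


# Hypothetical data: a clean linear trend with one corrupted observation.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[6] = 40  # injected outlier
outliers = flag_outliers(xs, ys)
print(outliers)
```

After the flagged observations are removed, the model would be refitted on the cleaned data; repeating the procedure is a common extension but is not shown here.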

Algorithm for steganographic information protection in video files based on a diffusion-probabilistic model with noise reduction

This paper presents a study aimed at developing a steganography algorithm for hiding text messages in video files. The algorithm is based on a denoising diffusion probabilistic model implemented as a deep artificial neural network. It consists of two parts, one for the sender and one for the receiver of the message. On the sending side, handwritten images (signatures) of the characters in the hidden message string are synthesized and their frequency is aligned; forward diffusion is then applied to the signatures, producing a noisy image that is embedded into the video stego container. On the receiving side, the signatures are extracted from the video content and reverse diffusion is performed to recover the signatures of the handwritten characters, which are then recognized by a convolutional neural network. The novelty of the research lies in the original algorithm for steganographic information protection in video files and in a modified signature embedding method derived from least significant bit replacement. The method embeds the bytes that describe the pixel brightness levels of the signature, bit by bit, into the same bit position of the blue channel across a sequence of 8 frames of the video stego container. This made it possible to significantly reduce the visible changes to the video content when the middle significant bits of the stego container, rather than the least significant bits, are replaced. This, in turn, provides greater resistance to compression attacks when transmitting information over the stego channel.
The practical significance of the research lies in the developed software, which was used to test the algorithm for steganographic information protection in video files. The tests showed high values of the peak signal-to-noise ratio and the structural similarity index when information is embedded in the middle bits of the bytes that set the brightness of the stego container pixels.
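The bitwise embedding of one signature byte across 8 frames can be sketched as below. This is a simplified illustration, not the authors' implementation: the bit position `bit_pos=3`, the most-significant-bit-first ordering, and the sample blue-channel values are assumptions, and the diffusion stages are omitted.

```python
def embed_byte(blue_values, payload, bit_pos=3):
    """Write the 8 bits of `payload` into bit `bit_pos` of 8 blue-channel values.

    `blue_values` holds the blue value of the same pixel in 8 consecutive frames.
    """
    if len(blue_values) != 8:
        raise ValueError("exactly 8 frame values are required")
    stego = []
    for i, blue in enumerate(blue_values):
        bit = (payload >> (7 - i)) & 1           # most significant bit goes first
        cleared = blue & ~(1 << bit_pos) & 0xFF  # clear the target bit
        stego.append(cleared | (bit << bit_pos))
    return stego


def extract_byte(blue_values, bit_pos=3):
    """Read bit `bit_pos` from each of the 8 values and reassemble the byte."""
    payload = 0
    for blue in blue_values:
        payload = (payload << 1) | ((blue >> bit_pos) & 1)
    return payload


# Hypothetical blue values of one pixel across 8 frames.
frames = [120, 121, 119, 122, 118, 120, 123, 121]
stego = embed_byte(frames, 0xA7)
recovered = extract_byte(stego)
print(stego, hex(recovered))
```

Each carrier value changes by at most `2 ** bit_pos`, so a higher bit position trades visual fidelity for robustness; setting `bit_pos=0` gives classic least significant bit replacement.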

Algorithms for composing efficient business models

Effective business management involves a variety of current goals facing the business and therefore requires the construction of appropriate models of efficient business. The article presents two business problems that share the common target of improving business efficiency but differ in their current goals. Creating or developing any business involves drawing up a specific business plan, including a list of development areas whose implementation will increase efficiency. The first problem concerns the phased implementation of all efficiency improvement areas so as to ultimately obtain the greatest effect from their realization. The second concerns increasing efficiency by implementing only some of the areas from the initial list, subject to certain constraints, for example limited company resources. To construct models for these problems, the article substantiates and proposes an efficiency criterion and develops Algorithms 1 and 2, which make it possible to build efficient business models that account for the difference in current goals. Algorithm 1 is a multi-stage algorithm that generates individual sets of efficiency improvement areas for solving the tasks at hand. Algorithm 2, executed at each stage of Algorithm 1, is based on the Pareto optimality method, supplemented to account for the features and objectives of the current tasks set for the business.
The use of these algorithms has made it possible to build efficient business models that not only deliver the economic effect inherent in each efficiency improvement area but also ensure additional growth of that effect, driven by the properties of the developed algorithms.
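The Pareto optimality step can be illustrated with a minimal sketch that keeps only the non-dominated improvement areas. It is not the authors' Algorithm 2: the two criteria (`effect` to maximize, `cost` to minimize), the area names, and the numbers are hypothetical, and the article's additional task-specific adjustments are not modeled.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on both criteria and strictly better on one."""
    no_worse = a["effect"] >= b["effect"] and a["cost"] <= b["cost"]
    strictly_better = a["effect"] > b["effect"] or a["cost"] < b["cost"]
    return no_worse and strictly_better


def pareto_front(options):
    """Keep the options that no other option dominates."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


# Hypothetical efficiency improvement areas with an expected effect and a cost.
areas = [
    {"name": "automation", "effect": 8, "cost": 5},
    {"name": "training", "effect": 5, "cost": 2},
    {"name": "rebranding", "effect": 4, "cost": 6},
    {"name": "logistics", "effect": 8, "cost": 7},
]
front = [a["name"] for a in pareto_front(areas)]
print(front)
```

In a multi-stage scheme, one way to proceed would be to select from the front at each stage, remove the chosen areas from the list, and recompute the front for the next stage.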

An approach to the design of a neural network for the formation of an individual trajectory of knowledge testing

The paper discusses the implementation of an adaptive testing system based on artificial neural network (ANN) modules, which must solve the problem of intelligently choosing the next question and thereby form an individual testing trajectory. The aim of the work is to increase the accuracy with which the ANN determines the difficulty level of the next test question for two types of architecture: feedforward (FNN, Feedforward Neural Network) and recurrent with long short-term memory (LSTM, Long Short-Term Memory). The data affecting training quality are analyzed, and input layer architectures for the feedforward ANN that significantly improved network quality are considered. To solve the problem of choosing the thematic block of the question, a hybrid module structure is proposed that includes the ANN itself and a software module for algorithmic processing of the ANN outputs. The feasibility of using feedforward ANNs in comparison with the LSTM architecture was studied, the input parameters of the network were identified, and various ANN architectures and training parameters were compared (weight update algorithms, loss functions, number of training epochs, batch sizes). The choice of a feedforward network in the structure of the hybrid module for selecting a thematic block is substantiated. The results were obtained using the high-level Keras library, which makes it possible to get started quickly in the early stages of research and obtain first results. Training was traditionally carried out over a large number of epochs.

An ensemble neural network model for planning production programs under conditions of instability of external and internal environmental factors

A model is proposed for forecasting deviations of external and internal environmental factors at an industrial enterprise from their planned values. The model is based on an ensemble of artificial neural networks. Improving the accuracy of such forecasts is a pressing research challenge, as accuracy plays a key role in developing enterprise production programs and enables the sound development of competitive strategies for relationships with suppliers and customers. Forecast accuracy is significantly affected by instability in internal environmental factors (data directly characterizing production processes in terms of their impact on output volumes and product quality) and external factors (demand volume, delivery times, and the quality of components and raw materials). In the research problem statement, instability is understood not as the variability of factor values per se, but as fluctuations in their generalized characteristics relating to the entire data set. Such characteristics include irregular data arrival and the presence of anomalies in the data. The novelty of the research lies in the proposed structure of the neural network model for forecasting deviations of external and internal environmental factors from planned indicators under irregular data arrival and anomalies in the data, as well as in the algorithm for its application. The model is based on an ensemble of three neural network submodels built on convolutional and recurrent architectures that forecast factors in the internal and external environments of the enterprise (taking into account the decomposition into micro and macro environments). The mutual influence of the instability of these factors is taken into account by a long short-term memory network at the model output, which aggregates the results of the submodels to produce the final forecast.
The results of the model experiment showed that taking into account the instability of factors allows for increased accuracy in forecasting deviations of external and internal environmental factors from planned values.