
Journal archive

№6(90) December 2020

Content:

Teacher’s portfolio

IT development

Author: O. Kultygin

The relevance of the topic considered in the article lies in solving the problems of designing expert systems for industrial enterprises on the basis of big data technology. The purpose of the study is to analyze the methodologies applied at the design stage of an enterprise information system and to develop algorithms for the operation of an expert system that works with big data. The problem statement, in brief, is to analyze the big data technologies available on the market, assess their applicability to expert systems, and identify the main stages of working with big data at industrial enterprises. The problem of using Big Data has become extremely urgent: companies and corporations that lead in information technology and business are searching for optimal solutions for managing the huge volume of constantly incoming information and for its in-depth analysis, looking for ways to profit from the data at their disposal and to derive new data from the existing ones. Developing an expert system in-house proves more cost effective. The methods used are the analysis and design notations IDEF0, DFD, IDEF1 and IDEF3, methods of functional (structural) design, and methods of object-oriented design. The result obtained is a method of using big data to create an expert system for an industrial enterprise; implementing such an expert system in-house is considerably cheaper than purchasing ready-made software systems.

IT and education

Educational environment

Author: N. Prokimnov

The effect of learning depends on many factors, among the most important of which are a plan of practical exercises and laboratory works that accurately reflects the purposes and didactic units of the course being studied, and a sufficient provision of practical training with methodological guidelines and software tools. Decisions regarding the choice of all these elements depend on a number of conditions, such as the rules and standards in force in a particular educational environment, the format of training, the composition of the student body, and others. The paper proposes a generalized framework plan for conducting laboratory workshops in modeling and simulation courses, summarizing the author's methodological experience. The main prerequisites and principles underlying the composition of the workshops are presented, and the set of tools used to perform practical tasks is characterized. Brief descriptions are given of the goals of each practical task of the generalized plan, the task settings, and the software tools used to solve them. The ideas and principles presented in the paper can be useful for teachers planning practical classes on modeling and simulation, as well as for developing their methodological and instrumental support.

Software engineering

Algorithmic efficiency

The main goal of the research is to develop a publicly available tonal-thematic dictionary for Russian that makes it possible to identify the semantic orientation of groups of economic texts and to determine their sentiment (tonal) characteristics. The article describes the main stages of compiling the dictionary using machine learning methods (clustering, word frequency selection, correlogram construction) and expert evaluation, which determined the tonality and expanded the dictionary with terms from similar foreign dictionaries. The empirical base of the research included annual reports of companies, news from ministries and the Central Bank of the Russian Federation, financial tweets of companies, and RBC news articles in the area of "Economics, Finance, Money and Business". The compiled dictionary differs from previous ones in the following ways: 1) it is one of the first dictionaries that can be used to rate the tone of economic and financial texts in Russian on 5 degrees of tonality; 2) it allows rating the tonality and content of a text across 12 economic topics (e.g., macroeconomics, monetary policy, stock and commodity markets, etc.); 3) the final version of the EcSentiThemeLex dictionary is included in the software package (library) ‘rulexicon’ for the R and Python programming environments. Step-by-step examples of using the developed library in the R environment are given; they allow evaluating the tone and thematic focus of an economic or financial text with concise code. The structure of the library makes it possible to assess original texts without prior lemmatization (reduction to elementary forms). The tonal-thematic dictionary EcSentiThemeLex, with all word forms compiled in this work, will simplify the solution of applied problems of text analysis in the financial and economic sphere and can also potentially serve as a basis for increasing the number of relevant studies in the Russian literature.
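The abstract describes dictionary-based scoring of tone and topic. The following is a minimal Python sketch of that general idea; the lexicon file, its column names, the 5-point tone scale encoding, and the scoring rule are assumptions for illustration and do not reproduce the actual rulexicon API.

```python
# Hypothetical illustration of dictionary-based tone/topic scoring.
# The file name, column layout and tone encoding are assumptions,
# not the actual structure of the EcSentiThemeLex / rulexicon package.
import csv
from collections import Counter

def load_lexicon(path):
    """Load a word form -> (tone, topic) mapping from a CSV with columns
    'wordform', 'tone' (e.g. -2..+2) and 'topic'."""
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            lexicon[row["wordform"].lower()] = (int(row["tone"]), row["topic"])
    return lexicon

def score_text(text, lexicon):
    """Return the mean tone of matched word forms and topic frequencies.
    Word forms are matched directly, so no lemmatization is required."""
    tones, topics = [], Counter()
    for token in text.lower().split():
        token = token.strip('.,!?;:()"')
        if token in lexicon:
            tone, topic = lexicon[token]
            tones.append(tone)
            topics[topic] += 1
    mean_tone = sum(tones) / len(tones) if tones else 0.0
    return mean_tone, topics

# Usage (lexicon.csv is a hypothetical file in the format described above):
# tone, topics = score_text("some economic text", load_lexicon("lexicon.csv"))
```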

Author: Anna Kuznetsova

Average precision (AP), the area under the Precision – Recall curve, is the de facto standard for comparing the quality of algorithms for classification, information retrieval, object detection, etc. However, traditional Precision – Recall curves usually have a zigzag shape, which makes it difficult to calculate the average precision and to compare algorithms. This paper proposes a statistical approach to the construction of Precision – Recall curves when assessing the quality of algorithms for object detection in images. The approach is based on calculating Statistical Precision and Statistical Recall. Instead of the traditional confidence level, a statistical confidence level is calculated for each image as the percentage of objects detected. For each threshold value of the statistical confidence level, the total number of correctly detected objects (Integral TP) and the total number of background objects mistakenly assigned by the algorithm to one of the classes (Integral FP) are calculated. Next, the Precision and Recall values are computed. Statistical Precision – Recall curves, unlike traditional ones, are guaranteed to be monotonically non-increasing. At the same time, the Statistical Average Precision of object detection algorithms on small test datasets turns out to be less than the traditional Average Precision. On relatively large test image datasets, these differences are smoothed out. A comparison of conventional and statistical Precision – Recall curves is given on a specific example.
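A rough Python sketch of the statistical Precision – Recall construction as the abstract describes it is given below. The per-image input structure (tuples of TP, FP and ground-truth counts) and the exact accumulation rule are assumptions for illustration, not the author's reference implementation.

```python
# Hypothetical sketch: statistical Precision-Recall from per-image detection counts.
# Each image contributes (detected_tp, false_positives, ground_truth_objects);
# its "statistical confidence level" is the share of its objects that were detected.
import numpy as np

def statistical_pr_curve(per_image):
    """per_image: list of (tp, fp, n_gt) tuples, one per test image.
    Returns arrays of (recall, precision) points, one per threshold."""
    conf = np.array([tp / n_gt if n_gt else 0.0 for tp, fp, n_gt in per_image])
    tps = np.array([tp for tp, fp, n_gt in per_image])
    fps = np.array([fp for tp, fp, n_gt in per_image])
    total_gt = sum(n_gt for _, _, n_gt in per_image)

    precisions, recalls = [], []
    for thr in np.unique(conf)[::-1]:          # sweep thresholds from high to low
        keep = conf >= thr                     # images passing the threshold
        integral_tp = tps[keep].sum()          # Integral TP
        integral_fp = fps[keep].sum()          # Integral FP
        precisions.append(integral_tp / max(integral_tp + integral_fp, 1))
        recalls.append(integral_tp / max(total_gt, 1))
    return np.array(recalls), np.array(precisions)

def average_precision(recalls, precisions):
    """Area under the Precision-Recall curve via the trapezoidal rule."""
    order = np.argsort(recalls)
    return float(np.trapz(precisions[order], recalls[order]))
```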

Information security

Models and methods

At the moment, dirty data, that is, low-quality data, is becoming one of the main problems in effectively solving Data Mining tasks. Since source data are accumulated from a variety of sources, the probability of obtaining dirty data is very high. In this regard, one of the most important tasks that has to be solved during the implementation of the Data Mining process is the initial processing (cleaning) of data, i.e. preprocessing. It should be noted that preprocessing calendar data is a rather time-consuming procedure that can take up to half of the entire time of implementing the Data Mining technology. The time spent on the data cleaning procedure can be reduced by automating this process using specially designed tools (algorithms and programs). At the same time, it should be remembered that the use of such tools does not guarantee one hundred percent cleaning of "dirty" data, and in some cases may even introduce additional errors into the source data. The authors developed a model for automated preprocessing of calendar data based on parsing and regular expressions. The proposed algorithm is characterized by flexible configuration of preprocessing parameters, fairly simple implementation and high interpretability of results, which in turn provides additional opportunities for analyzing unsuccessful results of applying the Data Mining technology. Although the proposed algorithm is not a tool for cleaning absolutely all types of dirty calendar data, it functions successfully in a significant share of real practical situations.
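A small Python sketch of regex-based cleaning of calendar values in the spirit described above. The accepted input formats and the ISO output format are assumptions chosen for illustration; the authors' actual parsing rules may differ.

```python
# Hypothetical sketch of regex-based cleaning of "dirty" calendar data.
# The recognized input formats and the ISO output are illustrative assumptions.
import re
from datetime import date

# Each pattern maps a raw form to (day, month, year) group indices.
PATTERNS = [
    (re.compile(r"^(\d{1,2})[./-](\d{1,2})[./-](\d{4})$"), (1, 2, 3)),   # 31.12.2020
    (re.compile(r"^(\d{4})[./-](\d{1,2})[./-](\d{1,2})$"), (3, 2, 1)),   # 2020-12-31
]

def normalize_date(raw):
    """Return an ISO date string for a recognized value, or None for dirty input."""
    raw = raw.strip()
    for pattern, (d_idx, m_idx, y_idx) in PATTERNS:
        m = pattern.match(raw)
        if not m:
            continue
        day, month, year = int(m.group(d_idx)), int(m.group(m_idx)), int(m.group(y_idx))
        try:
            return date(year, month, day).isoformat()   # also validates ranges
        except ValueError:
            return None                                  # e.g. 31.02.2020
    return None

# Values that cannot be normalized stay flagged (None) for manual inspection.
raw_values = ["31.12.2020", "2020/12/31", "31.02.2020", "next monday"]
cleaned = {v: normalize_date(v) for v in raw_values}
```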

Processes and systems modeling

Author: A. Veselov

The use of simulation models is of great importance in designing modern computer equipment and digital electronics. At first, monolithic models were widely used for this purpose; however, they worked well only while their size remained relatively small. For this reason, developers gradually abandoned monolithic models in favor of distributed models, which increase simulation speed and expand the limits of admissible model size. Particular attention is now paid to hierarchical distributed models, which make it possible to investigate the behavior of the devices being designed at different levels of detail. Such models noticeably expanded the permissible model sizes and increased the speed of work. However, distributed models have the disadvantage that their effectiveness depends noticeably not only on the number of components they contain, but also on the size of those components. The paper presents the results of a study of the effect of introducing an additional upper hierarchical level on the performance of distributed models based on Petri nets. This method of modifying distributed models increases their speed over a wide range of model sizes, with the most significant effect achieved in distributed models containing a large number of small components. The maximum speed of models modified in this way can be an order of magnitude higher than that of unmodified ones. As a result, in addition to the overall increase in the efficiency of the modified hierarchical distributed models, the performance of modified distributed models with subordinate components of different sizes is also significantly equalized.
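For readers unfamiliar with the modeling basis, the following is a minimal Python sketch of a place/transition Petri net and its firing rule in a textbook formulation; it is only an illustration of the underlying formalism and does not reproduce the paper's hierarchical distributed architecture.

```python
# Minimal place/transition Petri net: a textbook firing rule used only to
# illustrate the modeling basis, not the paper's distributed implementation.
def enabled(transition, marking):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[p] >= w for p, w in transition["in"].items())

def fire(transition, marking):
    """Consume tokens from input places and produce tokens in output places."""
    new = dict(marking)
    for p, w in transition["in"].items():
        new[p] -= w
    for p, w in transition["out"].items():
        new[p] = new.get(p, 0) + w
    return new

def run(transitions, marking, max_steps=100):
    """Fire the first enabled transition repeatedly until none is enabled."""
    for _ in range(max_steps):
        ready = [t for t in transitions if enabled(t, marking)]
        if not ready:
            break
        marking = fire(ready[0], marking)
    return marking

# Toy example: a producer/consumer fragment with a bounded buffer place.
transitions = [
    {"in": {"free": 1}, "out": {"buffer": 1}},   # produce
    {"in": {"buffer": 1}, "out": {"free": 1}},   # consume
]
print(run(transitions, {"free": 2, "buffer": 0}, max_steps=5))
```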

Laboratory

Research of processes and systems

Demographic indicators are an important element of state programs for the development of Russia, and operational monitoring of demographic development is key to the successful implementation of these programs. Government statistics are often published with a delay, which does not allow their use for operational monitoring and planning. The approach presented in this work allows rapid assessment of demographic processes and the formation and forecasting of short-term demographic trends based on query statistics from Google Trends. The relationships between search queries and demographic indicators are analyzed using Pearson's correlation. The analysis uses annual data (total fertility rate, abortions per 100 births, abortions per 1000 women, marriages and divorces per 1000 population) and monthly data (number of births, number of marriages and divorces) on births, marriages and abortions, with and without lags. The analysis is carried out on data for Russia as a whole and for the eight most populated regions: Moscow, Moscow Region, Krasnodar Territory, St. Petersburg, Rostov Region, Sverdlovsk Region, Republic of Tatarstan, and Republic of Bashkortostan. Using the time series available in Google Trends since 2004, some demographic indicators can be predicted from data on related search queries using the ARIMA model. Thus, query data can be used to supplement demographic data, when building multiple regression models for demographic calculations, or as a proxy variable.
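A hypothetical Python sketch of the analysis pipeline described above: correlating a Google Trends query series with a monthly demographic series and forecasting the indicator with ARIMA. The file names, column names, lag length and ARIMA order are assumptions for illustration, not the authors' actual data or specification.

```python
# Hypothetical pipeline: Pearson correlation between query interest and a
# demographic indicator, plus a short-term ARIMA forecast of the indicator.
import pandas as pd
from scipy.stats import pearsonr
from statsmodels.tsa.arima.model import ARIMA

# Assumed CSVs with a monthly DatetimeIndex: 'trends.csv' holds a query-interest
# column 'query'; 'births.csv' holds the indicator column 'births'.
trends = pd.read_csv("trends.csv", index_col=0, parse_dates=True)["query"]
births = pd.read_csv("births.csv", index_col=0, parse_dates=True)["births"]

# Pearson correlation with and without a lag between queries and the indicator.
aligned = pd.concat([trends, births], axis=1).dropna()
r0, p0 = pearsonr(aligned["query"], aligned["births"])
lagged = pd.concat([trends.shift(9), births], axis=1).dropna()   # 9-month lag, arbitrary example
r9, p9 = pearsonr(lagged["query"], lagged["births"])
print(f"no lag: r={r0:.2f} (p={p0:.3f}); 9-month lag: r={r9:.2f} (p={p9:.3f})")

# Short-term ARIMA forecast of the demographic indicator itself.
model = ARIMA(births, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))   # forecast six months ahead
```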

The goal of the work is to describe a new method of studying objects in the form of a set of information tasks. The method includes a simplicial analysis of the cognitive structure of the object of study and consists of several stages. At the first stage, the set of basic factors is revealed, pairwise comparison of factors is carried out, and the cognitive model is formed as an adjacency matrix at the 1st level of the hierarchy. Factors are then grouped to form the 2nd level of the hierarchy, components are combined in the cognitive structure of the 3rd level, and the components of the 3rd level are detailed at the 4th level. A series of simulation experiments is conducted to test the stability of the detailed structure of the cognitive model, and the implicit relationships between the underlying factors are studied. The method was tested on the example of the cognitive model of students' "lifestyle". The components "living conditions", "cognitive dissonance" and "performance" are grouped at the second level of the hierarchy. A simulation experiment established the presence of pulse resonance in the detailed structure of the 4th level of the hierarchy. Simplicial analysis, whose purpose is the ordering of the elements, was then performed; the simulation experiment repeated after the simplicial analysis gives a result that corresponds to the theory. The influence of an individual's cognitive dissonance on "activity" was revealed; the "activity" factor, in turn, affects cognitive dissonance. This method is needed to identify significant factors, detect hidden trends, and implement measures of social control.
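A minimal Python sketch of an impulse (pulse) simulation on a cognitive map represented by an adjacency matrix, of the kind such experiments rely on. The matrix values, factor names and update rule are illustrative assumptions and do not reproduce the paper's "lifestyle" model.

```python
# Hypothetical pulse simulation on a signed cognitive map (adjacency matrix).
# Matrix values and factor names are invented for illustration only.
import numpy as np

# A[i, j] is the assumed influence of factor i on factor j.
factors = ["living conditions", "cognitive dissonance", "activity", "performance"]
A = np.array([
    [0.0, -0.5,  0.4,  0.3],
    [0.0,  0.0, -0.6, -0.4],
    [0.0,  0.5,  0.0,  0.7],
    [0.0,  0.0,  0.0,  0.0],
])

def impulse_process(A, pulse, steps=10):
    """Propagate an initial pulse through the map: p(t+1) = A.T @ p(t),
    accumulating factor values x(t+1) = x(t) + p(t+1)."""
    x, p = np.zeros(A.shape[0]), np.array(pulse, dtype=float)
    history = [x.copy()]
    for _ in range(steps):
        p = A.T @ p
        x = x + p
        history.append(x.copy())
    return np.array(history)

# Inject a unit pulse into "living conditions" and inspect the response of the other factors.
trajectory = impulse_process(A, pulse=[1.0, 0.0, 0.0, 0.0], steps=5)
for name, value in zip(factors, trajectory[-1]):
    print(f"{name:20s} {value:+.2f}")
```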

IT management

Performance management

Research into the effectiveness of equipment repair work is of great practical and economic importance, as confirmed by many publications devoted to monitoring and diagnostic tools for various equipment. This work is devoted to modeling the repair of technological equipment for various purposes, operating under conditions of uncertainty and risk. The proposed study recommends a technology that uses an insurance fund performing two functions: 1) it accumulates payments, at different intervals, for carrying out current, emergency and major repair work; 2) it pays for these works as necessary. The mathematical description of the organization of equipment repairs is based on a random risk process, which in our case describes the state of the insurance fund. To model this process, a simulation approach is proposed that involves creating a modeling program which generates sample values of a special type; these values are then processed to obtain indicators of the effectiveness of the repair work. Resource-cost and financial risks are proposed as such indicators, and software has been created to assess them. Computational experiments with the modeling program made it possible to estimate the proposed risk indicators and to conclude that, to reduce them, the frequency of payments to the insurance fund should differ depending on the type of repair work. The scientific novelty of this work lies in the introduction of an insurance fund, the description of its state by a random process, the mathematical description of risk indicators for assessing the effectiveness of repair work, and the creation of a modeling program based on an event-driven approach.
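A small Monte Carlo sketch in Python of the fund-balance idea the abstract describes: regular contributions flow into the fund and random repair costs are paid out of it. The contribution amount, repair probabilities and cost ranges are invented for illustration and are not taken from the paper.

```python
# Hypothetical Monte Carlo sketch of an insurance-fund balance for repairs.
# Contribution amounts, failure probabilities and repair costs are invented.
import random

def simulate_fund(months=120, contribution=100.0, seed=0):
    """Simulate the monthly fund balance: regular contributions come in,
    random repair costs are paid out. Returns the share of months with a
    negative balance (a crude risk proxy) and the final balance."""
    rng = random.Random(seed)
    balance, negative_months = 0.0, 0
    for _ in range(months):
        balance += contribution                       # periodic payment into the fund
        if rng.random() < 0.30:                       # current (routine) repair
            balance -= rng.uniform(20, 60)
        if rng.random() < 0.05:                       # emergency repair
            balance -= rng.uniform(200, 500)
        if rng.random() < 0.01:                       # major overhaul
            balance -= rng.uniform(1000, 2000)
        negative_months += balance < 0
    return negative_months / months, balance

risk, final_balance = simulate_fund()
print(f"share of months in deficit: {risk:.2%}, final balance: {final_balance:.1f}")
```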