
Authors

Meksheneva Zhanna V.

Degree
Cand. Sci. (Econ.), Associate Professor, Information Management and Information and Communication Technologies Department named after Professor V. V. Dik, Synergy University
E-mail
zhmeksheneva@synergy.ru
Location
Moscow, Russia
Articles

Text sentiment analysis in banking

The paper presents the author's approach to sentiment analysis of online Russian-language messages about the activities of banks. The study data are customer reviews of banks in general and of their products, services, and quality of service, posted on the Banki.ru portal. Sentiment analysis is treated as a binary classification task on a set of positive and negative reviews. A vector space model with a tf-idf weighting scheme was used to represent the collected and preprocessed texts. The following algorithms, with optimal parameters selected by grid search, were used for the binary classification task: naive Bayes classifier, support vector machine, logistic regression, random forest, and gradient boosting. Standard metrics, namely precision, recall, and F-measure, were used to evaluate classification quality. On these metrics, the best results were obtained with the model based on the support vector machine. Topic modeling was also carried out using latent Dirichlet allocation to identify the most typical topics of customer messages. It was concluded that the most popular message topics are "cards" and "quality of service". The results can be used by banks to automate monitoring of their reputation in the media and to route client requests. The work relied heavily on Python libraries for web scraping, machine learning, and natural language processing.
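To make the text representation and the evaluation step concrete, here is a minimal standard-library sketch of tf-idf weighting and of the metrics, assuming they are precision, recall, and F-measure. It is not the paper's pipeline: the whitespace tokenizer, the unsmoothed idf formula `log(N / df)`, and the function names `tfidf_vectors` and `precision_recall_f1` are illustrative assumptions, and library implementations typically use smoothed variants.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumes whitespace tokenization, relative term frequency, and the
    unsmoothed inverse document frequency log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy reviews, invented for illustration only.
reviews = ["great service great card", "terrible service"]
vectors = tfidf_vectors(reviews)
metrics = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

A term that occurs in every document ("service" in the toy reviews) receives zero weight, which is why tf-idf emphasizes words that distinguish one review from another.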

Accuracy estimating of highly noisy signals digital processing using heuristic algorithms

Heuristic algorithms are often used as an alternative for problems that have high computational complexity or lack an exact solution, since they yield the desired result quickly. They usually have no strict mathematical justification, but their use is justified on practical grounds. Formally, algorithms that use approximate methods can be classified as heuristic. However, their lack of determinism often makes it hard to evaluate the accuracy of the solution obtained. The paper considers a methodical approach to assessing the accuracy of heuristic algorithms designed to determine the shape and parameters of a useful signal against a background of strong noise. It is based on the method of analogy and consists in modeling an artificial signal with given parameters together with background noise similar in its characteristics to additive white Gaussian noise. The noise component is generated in software using a pseudo-random number generator. Such generators are included among the built-in functions of almost all high-level programming languages. A comparative analysis of the characteristics of real and artificial noise is presented, which shows that the problem can be solved by numerical modeling. Accuracy estimates are obtained for the parameters of an artificial signal separated from the noise component by heuristic algorithms based on piecewise linear approximation and on averaging. The paper also considers smoothing of empirical data by replacing the discrete signal with an equivalent set of quadratic functions whose parameters provide a piecewise parabolic approximation of its shape. This procedure eliminates the residual signal bounce that inevitably results from linearization and allows subsequent recording at any sampling rate.
Thus, the proposed approach makes it possible to quantify the accuracy of heuristic algorithms used to determine the expected signal parameters.
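The evaluation idea described in this abstract can be sketched as follows: build an artificial signal with known parameters, add pseudo-random Gaussian noise, apply a heuristic, and measure the error against the known signal. This is only an illustration under stated assumptions. The sine test signal, the noise level, the interval width of 50 samples, the RMSE error measure, and all function names are hypothetical choices, not values taken from the paper.

```python
import math
import random


def make_test_signal(n, period):
    """Artificial signal with known parameters (a sine is assumed here)."""
    return [math.sin(2 * math.pi * i / period) for i in range(n)]


def add_gaussian_noise(signal, sigma, seed):
    """Add pseudo-random Gaussian noise from a seeded generator."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def interval_average(samples, width):
    """Replace each partition interval by its mean value."""
    out = []
    for start in range(0, len(samples), width):
        chunk = samples[start:start + width]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def piecewise_linear_fit(samples, width):
    """Least-squares straight line fitted independently on each interval."""
    out = []
    for start in range(0, len(samples), width):
        chunk = samples[start:start + width]
        m = len(chunk)
        if m < 2:  # a single sample cannot define a slope
            out.extend(chunk)
            continue
        x_mean = (m - 1) / 2
        y_mean = sum(chunk) / m
        sxx = sum((x - x_mean) ** 2 for x in range(m))
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(chunk))
        slope = sxy / sxx
        out.extend(y_mean + slope * (x - x_mean) for x in range(m))
    return out


def rmse(estimate, reference):
    """Root-mean-square error between two equally long sequences."""
    total = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return math.sqrt(total / len(reference))


clean = make_test_signal(n=1000, period=1000)
noisy = add_gaussian_noise(clean, sigma=1.0, seed=0)
errors = {
    "noisy": rmse(noisy, clean),
    "average": rmse(interval_average(noisy, 50), clean),
    "linear": rmse(piecewise_linear_fit(noisy, 50), clean),
}
```

Because the clean signal is known, each entry of `errors` is a direct accuracy estimate for the corresponding heuristic, which is not available when working with real recordings.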

Highly noisy signal waveform restoration based on integro-differential transform and integral curve approximation

In digital signal processing, restoring the shape of a signal in the presence of a high level of noise is one of the main problems. Its relevance is due to the widespread use of digital technologies, and it becomes particularly acute in areas where interference inevitably affects the quality of signal registration, recognition, and interpretation. A common type of naturally occurring interference is thermal noise, which is directly related to the operation of measuring and recording equipment. This kind of noise cannot be eliminated completely, but modern digital processing methods can significantly reduce its negative impact. Researchers' attention is increasingly focused on developing heuristic algorithms that offer alternative ways of suppressing the noise component while preserving the form of the useful signal. These algorithms can find approximate solutions where traditional analytical and technical methods lose their effectiveness. They are designed to adapt to the stochastic nature of thermal noise and offer a reasonable compromise between computational effort and the accuracy with which the useful signal is reproduced. This article continues previously published research on the development of heuristic algorithms for recovering the shape of heavily distorted discrete signals. The goal is to propose an alternative approach based on the sequential application of numerical integration and differentiation combined with an integral curve approximation procedure. As a result, the influence of the noise component is eliminated, and the restored signal retains the informative components of the useful signal. The efficiency of the proposed algorithm was determined using a test signal with superimposed artificial noise generated by a pseudo-random number generator.
The results were compared with two previously developed heuristic algorithms: one based on piecewise linear approximation by the least squares method and another based on averaging instantaneous signal values over partition intervals. The analysis showed that the developed algorithm compares favorably with these algorithms in accuracy and is more efficient when processing discrete nonperiodic signals contaminated with natural noise.
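One possible reading of the integrate, approximate, differentiate sequence is sketched below. The abstract does not say how the integral curve is approximated, so this example assumes a sliding-window least-squares line fitted to the running integral, whose slope is taken as the restored sample. The trapezoidal rule, the window half-width of 25, the sine test signal, and all function names are assumptions for illustration and should not be read as the authors' algorithm.

```python
import math
import random


def cumulative_integral(samples, dt):
    """Trapezoidal running integral; same length as samples, starts at 0."""
    total, curve = 0.0, [0.0]
    for a, b in zip(samples, samples[1:]):
        total += 0.5 * (a + b) * dt
        curve.append(total)
    return curve


def local_slope(curve, dt, half_width):
    """Differentiate by fitting a least-squares line in a sliding window.

    For a window symmetric about sample i, the fitted slope reduces to
    sum(k * curve[i + k]) / (dt * sum(k * k)). The window shrinks near
    the edges, and the two end points use a one-sided difference.
    """
    n = len(curve)
    if n < 2:
        raise ValueError("need at least two samples")
    restored = []
    for i in range(n):
        h = min(half_width, i, n - 1 - i)
        if h == 0:
            j = 1 if i == 0 else i
            restored.append((curve[j] - curve[j - 1]) / dt)
            continue
        num = sum(k * curve[i + k] for k in range(-h, h + 1))
        den = dt * sum(k * k for k in range(-h, h + 1))
        restored.append(num / den)
    return restored


def rmse(estimate, reference):
    """Root-mean-square error between two equally long sequences."""
    total = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return math.sqrt(total / len(reference))


# Test signal with artificial noise from a seeded pseudo-random generator.
rng = random.Random(1)
clean = [math.sin(2 * math.pi * i / 1000) for i in range(1000)]
noisy = [s + rng.gauss(0.0, 1.0) for s in clean]

integral = cumulative_integral(noisy, dt=1.0)
restored = local_slope(integral, dt=1.0, half_width=25)
error_before = rmse(noisy, clean)
error_after = rmse(restored, clean)
```

Integration accumulates the zero-mean noise into a comparatively smooth curve, so fitting that curve and differentiating the fit suppresses sample-to-sample fluctuations while the slow trend of the useful signal survives.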