  • Publication
    Make Deep Networks Shallow Again
    (SciTePress, 2023-11-15)
    Hrycej, Tomas; Fred, Ana; Coenen, Frans; Bernardino, Jorge
    Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. Their main shortcoming has been, for a long time, the vanishing gradient which prevented the numerical optimization algorithms from acceptable convergence. An important special case of network architecture, frequently used in computer vision applications, consists of using a stack of layers of the same dimension. For this architecture, a breakthrough has been achieved by the concept of residual connections: an identity mapping parallel to a conventional layer. This concept substantially alleviates the vanishing gradient problem and is thus widely used. The focus of this paper is to show the possibility of substituting the deep stack of residual layers with a shallow architecture with comparable expressive power and similarly good convergence properties. A stack of residual layers can be expressed as an expansion of terms similar to the Taylor expansion. This expansion suggests the possibility of truncating the higher-order terms and obtaining an architecture consisting of a single broad layer composed of all initially stacked layers in parallel. In other words, a sequential deep architecture is substituted by a parallel shallow one. Prompted by this theory, we investigated the performance capabilities of the parallel architecture in comparison to the sequential one. The computer vision datasets MNIST and CIFAR10 were used to train both architectures for a total of 6,912 combinations of varying numbers of convolutional layers, numbers of filters, kernel sizes, and other meta-parameters. Our findings demonstrate a surprising equivalence between the deep (sequential) and shallow (parallel) architectures. Both layouts produced similar results in terms of training and validation set loss. This discovery implies that a wide, shallow architecture can potentially replace a deep network without sacrificing performance. Such substitution has the potential to simplify network architectures, improve optimization efficiency, and accelerate the training process.
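The expansion argument can be illustrated with a toy linear example (a hedged sketch, not code from the paper: the residual branches are random linear maps, so the expansion is exact up to the truncated higher-order terms):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_layers = 8, 3

# Random linear maps standing in for the residual branches f_i;
# real networks would use nonlinear layers, this is illustrative only.
Ws = [rng.normal(scale=0.02, size=(dim, dim)) for _ in range(n_layers)]

def sequential(x):
    # Deep stack of residual layers: x <- x + f_i(x)
    for W in Ws:
        x = x + W @ x
    return x

def parallel_truncated(x):
    # First-order truncation of the expansion: one broad layer with
    # all branches applied to the original input in parallel
    return x + sum(W @ x for W in Ws)

x = rng.normal(size=dim)
# The two layouts agree up to the discarded higher-order terms,
# which are products of two or more (small) branch matrices.
err = np.linalg.norm(sequential(x) - parallel_truncated(x))
```

With small branch weights the discrepancy is dominated by second-order products of the branch matrices, which is why the truncation is plausible near initialization.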
  • Publication
    FinBERT-FOMC: Fine-Tuned FinBERT Model with Sentiment Focus Method for Enhancing Sentiment Analysis of FOMC Minutes
    (Association for Computing Machinery, 2023-11-25)
    Chen, Ziwei; Gössi, Sandro; Kim, Wonseong; Handschuh, Siegfried
    In this research project, we used the financial texts published by the Federal Open Market Committee (FOMC), known as the FOMC Minutes, for sentiment analysis. The pre-trained FinBERT model, a state-of-the-art transformer-based model trained for NLP tasks in finance, was utilized for this purpose. The focus of this research has been on improving the predictive performance of complex financial sentences, as our problem analysis has shown that such sentences pose a significant challenge to existing models. To accomplish this objective, the original FinBERT model was fine-tuned for domain-specific sentiment analysis. A strategy referred to as Sentiment Focus (SF) was utilized to reduce the complexity of sentences, making them more amenable to accurate sentiment predictions. To evaluate the efficacy of our method, we curated a manually labeled test dataset comprising 1,375 entries. The results demonstrated an overall improvement of 5% in accuracy when using SF-enhanced fine-tuned FinBERT over the original FinBERT model. In cases of complex sentences containing conjunctions like "but", "while", and "though" with contradicting sentiments, our fine-tuned model outperformed the original FinBERT by a margin of 17.4%.
    CCS Concepts: • Computing methodologies → Natural language processing; Supervised learning by classification; Neural networks; • Applied computing → Economics.
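Why contrastive conjunctions are hard can be made concrete with a small illustration (hypothetical code, not the paper's SF implementation): splitting a sentence at "but", "while", or "though" yields clauses whose sentiments may disagree, so a single sentence-level label is ambiguous.

```python
import re

# Contrastive conjunctions named in the abstract as the hard cases.
CONTRASTIVE = ("but", "while", "though")

def split_on_contrast(sentence: str) -> list[str]:
    # Split on any contrastive conjunction (non-capturing group, so the
    # conjunction itself is dropped); strip stray spaces and commas.
    pattern = r"\b(?:%s)\b" % "|".join(CONTRASTIVE)
    parts = re.split(pattern, sentence, flags=re.IGNORECASE)
    return [p.strip(" ,") for p in parts if p.strip(" ,")]

clauses = split_on_contrast(
    "Inflation pressures eased, but labor market conditions remained tight."
)
# → ['Inflation pressures eased', 'labor market conditions remained tight.']
```

Each clause could plausibly carry the opposite sentiment of the other, which is exactly the situation where a whole-sentence classifier struggles.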
  • Publication
    Number of Attention Heads vs. Number of Transformer-encoders in Computer Vision
    (SciTePress, 2022-10)
    Hrycej, Tomas
    Determining an appropriate number of attention heads on the one hand, and of transformer-encoders on the other, is an important choice for Computer Vision (CV) tasks using the Transformer architecture. Computing experiments confirmed the expectation that the total number of parameters has to satisfy the condition of overdetermination (i.e., the number of constraints significantly exceeding the number of parameters). Then, good generalization performance can be expected. This sets the boundaries within which the number of heads and the number of transformers can be chosen. If the role of context in images to be classified can be assumed to be small, it is favorable to use multiple transformers with a low number of heads (such as one or two). In classifying objects whose class may heavily depend on the context within the image (i.e., the meaning of a patch being dependent on other patches), the number of heads is as important as that of transformers.
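The overdetermination condition can be sketched as a simple count (the figures below are made-up placeholders, not values from the paper): the number of training constraints, i.e. samples times output values per sample, should clearly exceed the number of trainable parameters.

```python
def overdetermination_ratio(n_samples: int, outputs_per_sample: int,
                            n_params: int) -> float:
    # Constraints = training samples x output values per sample;
    # a ratio well above 1 indicates an overdetermined system.
    return (n_samples * outputs_per_sample) / n_params

# Hypothetical example: 50,000 images, 10 class outputs each,
# a model with 100,000 trainable parameters.
ratio = overdetermination_ratio(n_samples=50_000,
                                outputs_per_sample=10,
                                n_params=100_000)
# ratio == 5.0 → overdetermined, so good generalization is plausible
```

In this reading, the head/encoder trade-off is constrained from above: any configuration whose parameter total pushes the ratio toward or below 1 is ruled out before the context question even arises.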
  • Publication
    Training Neural Networks in Single vs. Double Precision
    (SciTePress, 2022-10)
    Hrycej, Tomas
    The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single and double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and Root Mean Square Propagation (RMSprop) (a first-order algorithm) has been investigated. Neural networks with one to five fully connected hidden layers, moderate or strong nonlinearity, and up to 4 million network parameters have been optimized for Mean Square Error (MSE). The training tasks have been set up so that their MSE minimum was known to be zero. Computing experiments have disclosed that single precision can keep up (with superlinear convergence) with double precision as long as line search finds an improvement. First-order methods such as RMSprop do not benefit from double precision. However, for moderately nonlinear tasks, CG is clearly superior. For strongly nonlinear tasks, both algorithm classes find only solutions that are fairly poor in terms of mean square error relative to the output variance. CG with double floating-point precision is superior whenever the solutions have the potential to be useful for the application goal.
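The precision gap that limits line search can be demonstrated directly (an illustrative sketch, not the paper's experiment): a loss improvement smaller than single precision's machine epsilon relative to the current loss value simply vanishes in float32 arithmetic, while double precision still registers it.

```python
import numpy as np

eps32 = np.finfo(np.float32).eps   # ~1.19e-07
eps64 = np.finfo(np.float64).eps   # ~2.22e-16

loss = np.float32(1.0)
tiny_improvement = np.float32(1e-8)   # below eps32 relative to the loss

# In single precision the subtraction rounds back to the old loss,
# so a line search sees "no improvement" and stalls.
stalls_in_single = (loss - tiny_improvement) == loss

# In double precision the same improvement is representable.
registers_in_double = (
    np.float64(loss) - np.float64(tiny_improvement)
) == np.float64(loss)
```

This is one plausible mechanism behind the abstract's observation: single precision keeps up only "as long as line search finds an improvement", because near convergence the remaining improvements fall below float32 resolution.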
  • Publication
    Deep Watching: Towards New Methods of Analyzing Visual Media in Cultural Studies
    (2019-07)
    Gassner, Sebastian; Howanitz, Gernot; Radisch, Erik; Rehbein, Malte
    A large number of digital humanities projects focus on text. This medial limitation may be attributed to the abundance of well-established quantitative methods applicable to text. Cultural Studies, however, analyse cultural expressions in a broad sense, including different non-textual media, physical artefacts, and performative actions. It is, to a certain extent, possible to transcribe these multi-medial phenomena in textual form; however, this transcription is difficult to automate and some information may be lost. Thus, quantitative approaches which directly access media-specific information are a desideratum for Cultural Studies. Visual media constitute a significant part of cultural production. In our paper, we propose Deep Watching as a way to analyze visual media (films, photographs, and video clips) using cutting-edge machine learning and computer vision algorithms. Unlike previous approaches, which were based on generic information such as frame differences (Howanitz 2015) or color distribution (Burghardt/Wolff 2016), or relied on manual annotation altogether (Dunst/Hartel 2016), Deep Watching makes it possible to automatically identify visual information (symbols, objects, persons, body language, visual configuration of the scene) in large image and video corpora. To a certain extent, Tilton and Arnold’s Distant-Viewing Toolkit uses a comparable approach (Tilton/Arnold 2018). However, by means of our customized training of state-of-the-art convolutional neural networks for object detection and face recognition we can, in comparison to this toolkit, automatically extract more information about individual frames and their contexts.
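The "generic information" baseline the abstract contrasts itself with, frame differencing, is simple enough to sketch (synthetic frames, purely illustrative of the prior approach, not of Deep Watching itself):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 4x4 grayscale "frames" standing in for video frames.
frame_a = rng.integers(0, 256, size=(4, 4)).astype(np.float64)
frame_b = frame_a.copy()     # identical next frame: no change
frame_c = 255.0 - frame_a    # inverted frame: drastic change (a "cut")

def mean_abs_diff(f1, f2):
    # Generic frame-difference score: mean absolute pixel change.
    return float(np.mean(np.abs(f1 - f2)))

same_score = mean_abs_diff(frame_a, frame_b)   # 0.0 for identical frames
cut_score = mean_abs_diff(frame_a, frame_c)    # large for a hard cut
```

Such a score detects shot boundaries but says nothing about *what* is in the frame, which is precisely the gap that learned object detection and face recognition are meant to close.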
  • Publication
    Representational Capacity of Deep Neural Networks: A Computing Study
    (SCITEPRESS - Science and Technology Publications, 2019-09)
    Hrycej, Tomas
    There is some theoretical evidence that deep neural networks with multiple hidden layers have a potential for more efficient representation of multidimensional mappings than shallow networks with a single hidden layer. The question is whether it is possible to exploit this theoretical advantage for finding such representations with help of numerical training methods. Tests using prototypical problems with a known mean square minimum did not confirm this hypothesis. Minima found with the help of deep networks have always been worse than those found using shallow networks. This does not directly contradict the theoretical findings—it is possible that the superior representational capacity of deep networks is genuine while finding the mean square minimum of such deep networks is a substantially harder problem than with shallow ones.
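A deep-vs-shallow comparison of the kind described presupposes networks of roughly matched size. A quick sketch of matching fully connected parameter counts (the layer widths are made up for illustration, not taken from the paper):

```python
def dense_params(sizes):
    # Weights plus biases for consecutive fully connected layers:
    # a layer mapping a inputs to b outputs has a*b weights + b biases.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

# One wide hidden layer vs. three narrow hidden layers,
# chosen so the totals nearly coincide.
shallow = dense_params([100, 298, 1])          # 30,397 parameters
deep = dense_params([100, 100, 100, 100, 1])   # 30,401 parameters
```

With parameter counts held (nearly) equal, any difference in the mean square minima reached can be attributed to depth and to the optimization difficulty it introduces, rather than to raw capacity.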
    Scopus© Citations 2
  • Publication
    Singular Value Decomposition and Neural Networks
    (Springer, 2019-09)
    Hrycej, Tomas; Tetko, Igor V.; Kůrková, Věra; Karpov, Pavel; Theis, Fabian
    Scopus© Citations 7