  • Publication
    Make Deep Networks Shallow Again
    (SciTePress, 2023-11-15) Hrycej, Tomas; Ana Fred; Frans Coenen; Jorge Bernardino
    Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. Their main shortcoming has long been the vanishing gradient, which prevented numerical optimization algorithms from converging acceptably. An important special case of network architecture, frequently used in computer vision applications, consists of a stack of layers of the same dimension. For this architecture, a breakthrough was achieved by the concept of residual connections: an identity mapping parallel to a conventional layer. This concept substantially alleviates the vanishing gradient problem and is thus widely used. The focus of this paper is to show the possibility of substituting the deep stack of residual layers with a shallow architecture of comparable expressive power and similarly good convergence properties. A stack of residual layers can be expressed as an expansion of terms similar to the Taylor expansion. This expansion suggests truncating the higher-order terms, yielding an architecture consisting of a single broad layer composed of all the initially stacked layers in parallel. In other words, a deep sequential architecture is substituted by a shallow parallel one. Prompted by this theory, we investigated the performance capabilities of the parallel architecture in comparison to the sequential one. The computer vision datasets MNIST and CIFAR10 were used to train both architectures for a total of 6,912 combinations of varying numbers of convolutional layers, numbers of filters, kernel sizes, and other meta-parameters. Our findings demonstrate a surprising equivalence between the deep (sequential) and shallow (parallel) architectures. Both layouts produced similar results in terms of training and validation set loss. This discovery implies that a wide, shallow architecture can potentially replace a deep network without sacrificing performance. Such a substitution has the potential to simplify network architectures, improve optimization efficiency, and accelerate the training process.
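    The truncation argument can be made concrete: composing two residual blocks gives (I + F2)∘(I + F1) = I + F1 + F2 + F2∘F1, and dropping the higher-order cross term leaves I + F1 + F2, i.e., all blocks applied in parallel to the same input. Below is a minimal sketch of the two layouts (illustrative only; the paper's experiments use convolutional layers on MNIST and CIFAR10, not these toy linear blocks):

```python
# A minimal PyTorch sketch (assumed names; not the authors' code) contrasting
# a deep stack of residual layers with its truncated, parallel expansion.
import torch
import torch.nn as nn

class ResidualStack(nn.Module):
    """Sequential layout: x -> (I + F_n) o ... o (I + F_1)(x)."""
    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)]
        )

    def forward(self, x):
        for f in self.blocks:
            x = x + f(x)  # identity mapping parallel to a conventional layer
        return x

class ParallelShallow(nn.Module):
    """Truncated expansion: x -> x + F_1(x) + ... + F_n(x), one broad layer."""
    def __init__(self, dim: int, width: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(width)]
        )

    def forward(self, x):
        return x + sum(f(x) for f in self.branches)  # all blocks in parallel

x = torch.randn(8, 64)
print(ResidualStack(64, 4)(x).shape, ParallelShallow(64, 4)(x).shape)
```

    At equal depth and width the two layouts have the same parameter count; the parallel variant simply has a sequential depth of one.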
  • Publication
    When Truth Matters - Addressing Pragmatic Categories in Natural Language Inference (NLI) by Large Language Models (LLMs)
    (2023-07) Kalouli, Aikaterini-Lida
    In this paper, we focus on the ability of large language models (LLMs) to accommodate different pragmatic sentence types, such as questions, commands, and sentence fragments, for natural language inference (NLI). On the commonly used notion of logical inference, nothing can be inferred from a question, a command, or an incomprehensible sentence fragment. We find MNLI, arguably the most important NLI dataset, and hence models fine-tuned on this dataset, insensitive to this fact. Using a symbolic semantic parser, we develop, and make publicly available, fine-tuning datasets designed specifically to address this issue, with promising results. We also make a first exploration of ChatGPT's concept of entailment.
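    A minimal sketch (not the paper's code) of the phenomenon tested here: an MNLI fine-tuned checkpoint is forced to choose an inference label even when the premise is a question, from which, on the logical notion of inference, nothing should follow. The checkpoint name is an assumption, a standard publicly available MNLI model:

```python
# Hedged illustration: probe an MNLI fine-tuned model with a question premise.
# The model still distributes probability over entailment labels, although a
# question has no truth value to infer from.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-large-mnli"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Did the committee approve the budget?"   # a question: no truth value
hypothesis = "The committee approved the budget."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1).squeeze()
for label, p in zip(model.config.id2label.values(), probs):
    print(f"{label}: {p:.3f}")
```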
  • Publication
    Enhancing Educational Dialogues: A Reinforcement Learning Approach for Generating AI Teacher Responses
    (Association for Computational Linguistics, 2023-07-13)
    Reinforcement Learning remains an underutilized method of training and fine-tuning Language Models (LMs) despite recent successes. This paper presents a simple approach to fine-tuning a language model with Reinforcement Learning to achieve competitive performance on the BEA 2023 Shared Task, whose goal is to automatically generate teacher responses in educational dialogues. We utilized the novel NLPO algorithm, which masks out tokens during generation to direct the model towards generations that maximize a reward function. We report results both for the t5-base model (220 million parameters, from the HuggingFace repository) submitted to the leaderboard, which, despite its comparatively small size, achieved good performance on the test and dev sets, and for GPT-2 (124 million parameters). The results show that, despite maximizing only one of the evaluation metrics as a reward function, our model scores highly on the other metrics as well.
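    NLPO itself is implemented in the RL4LMs library; rather than guess its API, the sketch below shows the overall shape of reward-driven fine-tuning with a bare REINFORCE-style update. The model choice, the prompt, and the toy reward are illustrative assumptions, not the paper's setup:

```python
# Illustrative only: a single REINFORCE step on GPT-2. NLPO additionally masks
# low-probability tokens during generation; that part is omitted here.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def toy_reward(text: str) -> float:
    # Placeholder: the paper maximizes one of the shared-task metrics instead.
    return float(len(text.split()) > 5)

prompt = tokenizer("Teacher:", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=20, do_sample=True,
                     pad_token_id=tokenizer.eos_token_id)
reward = toy_reward(tokenizer.decode(out[0]))

# REINFORCE: minimize the reward-weighted negative log-likelihood of the sample.
nll = model(out, labels=out).loss
loss = reward * nll
optimizer.zero_grad()
loss.backward()
optimizer.step()
```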
  • Publication
    Educational Chatbots for Collaborative Learning: Results of a Design Experiment in a Middle School
    Educational chatbots promise many benefits for teaching and learning. Although chatbot use cases in this research field are growing rapidly, most studies focus on individual users rather than on collaborative group settings. To address this issue, this paper investigates how chatbot-mediated learning can be designed to support middle school students in team-based assignments. Using an educational design research approach, quality indicators of educational chatbots were derived from the literature and served as a guideline for the development of the chatbot Tubo (meaning tutoring bot). Tubo is part of a web-based team learning environment in which students can chat with each other and collaboratively work on their group assignments. As a team member and tutor of each group, Tubo guides the students through the learning journey with different scaffolding elements and helps with content-related questions. As part of a first design cycle, the chatbot application was tested with a class of a technical vocational school in Switzerland. The feedback received suggests that the approach of team-based learning with chatbots has great potential from the students' and teachers' points of view. However, the role distribution of the individual group members may have to be specified further to address the needs of autonomous as well as more control-oriented students.
  • Publication
    Number of Attention Heads vs. Number of Transformer-encoders in Computer Vision
    (SciTePress, 2022-10) Hrycej, Tomas
    Determining an appropriate number of attention heads on the one hand and of transformer-encoders on the other is an important choice for Computer Vision (CV) tasks using the Transformer architecture. Computing experiments confirmed the expectation that the total number of parameters has to satisfy the condition of overdetermination (i.e., the number of constraints significantly exceeding the number of parameters); then, good generalization performance can be expected. This sets the boundaries within which the number of heads and the number of transformers can be chosen. If the role of context in the images to be classified can be assumed to be small, it is favorable to use multiple transformers with a low number of heads (such as one or two). In classifying objects whose class may heavily depend on the context within the image (i.e., the meaning of a patch depending on other patches), the number of heads is as important as the number of transformers.
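    The overdetermination condition lends itself to a back-of-the-envelope check: the number of constraints (training samples times output dimensions) must significantly exceed the number of trainable parameters. The sketch below uses illustrative numbers, not the paper's configurations; note that the head count does not enter the parameter total, since heads partition the same projection matrices, which is why heads and encoders can be traded off within a fixed parameter budget:

```python
# Rough check of the overdetermination condition (illustrative numbers only).
def encoder_params(d_model: int, d_ff: int) -> int:
    """Approximate parameter count of one transformer-encoder block.

    Attention projections (Q, K, V, output): 4 * d_model^2 plus biases;
    feed-forward: two d_model x d_ff layers; two LayerNorms. The number of
    attention heads does not change the total.
    """
    attn = 4 * (d_model * d_model + d_model)
    ff = d_model * d_ff + d_ff + d_ff * d_model + d_model
    norms = 4 * d_model  # two LayerNorms, scale and shift each
    return attn + ff + norms

n_samples, n_classes = 50_000, 10          # e.g., CIFAR10
constraints = n_samples * n_classes
for depth in (2, 4, 8, 12):
    params = depth * encoder_params(d_model=256, d_ff=1024)
    print(f"{depth} encoders: {params:,} params, "
          f"overdetermined: {constraints > params}")
```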
  • Publication
    Training Neural Networks in Single vs. Double Precision
    (SciTePress, 2022-10) Hrycej, Tomas
    The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single and double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and Root Mean Square Propagation (RMSprop) (a first-order algorithm) has been investigated. Neural networks with one to five fully connected hidden layers, moderate or strong nonlinearity, and up to 4 million parameters have been trained to minimize the Mean Square Error (MSE). The training tasks were set up so that their MSE minimum was known to be zero. Computing experiments disclosed that single precision can keep up (with superlinear convergence) with double precision as long as line search finds an improvement. First-order methods such as RMSprop do not benefit from double precision. However, for moderately nonlinear tasks, CG is clearly superior. For strongly nonlinear tasks, both algorithm classes find only solutions that are fairly poor in terms of MSE relative to the output variance. CG with double floating-point precision is superior whenever the solutions have the potential to be useful for the application goal.
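    The precision floor is easy to demonstrate on a toy problem. The sketch below (plain gradient descent on a zero-noise least-squares task, standing in for the paper's CG and RMSprop experiments) runs the identical fit in single and double precision; the attainable MSE differs by many orders of magnitude once updates approach machine epsilon:

```python
# Self-contained illustration (not the paper's setup): the same least-squares
# fit in float32 and float64. The true MSE minimum is exactly zero.
import numpy as np

def fit(dtype, steps=20_000, lr=0.1):
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 32)).astype(dtype)
    w_true = rng.standard_normal(32).astype(dtype)
    y = X @ w_true                      # zero-noise target
    w = np.zeros(32, dtype=dtype)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= dtype(lr) * grad
    return np.mean((X @ w - y) ** 2)

print("float32 final MSE:", fit(np.float32))
print("float64 final MSE:", fit(np.float64))
```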
  • Publication
    AL: An Adaptive Learning Support System for Argumentation Skills
    Recent advances in Natural Language Processing (NLP) offer the opportunity to analyze the argumentation quality of texts. This can be leveraged to provide students with individual and adaptive feedback in their personal learning journeys. To test whether individual feedback on their argumentation helps students write more convincing texts, we developed AL, an adaptive IT tool that provides students with feedback on the argumentation structure of a given text. In a study with 54 students, we compared AL to a proven argumentation support tool. We found that students using AL wrote more convincing texts with better formal quality of argumentation than those using the traditional approach. The measured technology acceptance yielded promising results for using this tool as a feedback application in different learning settings. The results suggest that learning applications based on NLP may be beneficial for developing students' writing and reasoning in traditional learning settings.
  • Publication
    A Corpus for Argumentative Writing Support in German
    In this paper, we present a novel annotation approach to capture claims and premises of arguments, and their relations, in student-written persuasive peer reviews on business models written in German. We propose an annotation scheme, based on annotation guidelines, that makes it possible to model claims and premises as well as support and attack relations, capturing the structure of argumentative discourse in student-written peer reviews. We conducted an annotation study with three annotators on 50 persuasive essays to evaluate our annotation scheme. The obtained inter-rater agreement of α = 0.57 for argument components and α = 0.49 for argumentative relations indicates that the proposed annotation scheme successfully guides annotators to moderate agreement. Finally, we present our freely available corpus of 1,000 persuasive student-written peer reviews on business models, along with our annotation guidelines, to encourage future research on the design and development of argumentative writing support systems for students.
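    For readers wishing to reproduce agreement figures of this kind, here is a hedged sketch of how Krippendorff's alpha is typically computed for nominal labels with the krippendorff PyPI package (the data below are made up; the paper's units and labels follow its annotation scheme):

```python
# Hedged sketch (not the paper's evaluation code): Krippendorff's alpha for
# three annotators labeling the same units, with one missing annotation.
import numpy as np
import krippendorff

# Rows: annotators; columns: annotation units (e.g., clauses in a review).
# 0 = claim, 1 = premise, np.nan = not annotated.
reliability_data = np.array([
    [0, 1, 1, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 1, np.nan],
])
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.2f}")
```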