Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Engineering

TÜBİTAK

Journal

Natural language processing

Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

Text-To-SQL: A Methodical Review Of Challenges And Models, Ali Buğra Kanburoğlu, Faik Boray Tek May 2024

Turkish Journal of Electrical Engineering and Computer Sciences

This survey focuses on Text-to-SQL, the automated translation of natural language queries into SQL queries. We first describe the problem and its main challenges. Then, following the PRISMA systematic review methodology, we survey the existing Text-to-SQL review papers in the literature. We apply the same method to extract proposed Text-to-SQL models and classify them by the evaluation metrics and benchmarks they use. We highlight the accuracies achieved by various models on Text-to-SQL datasets and discuss execution-guided evaluation strategies. We present insights into model training times and implementations of different models. We also explore the availability of Text-to-SQL datasets in non-English …
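As a rough illustration of what evaluating a Text-to-SQL model by execution can look like, the sketch below runs a predicted query and a reference query against the same database and compares the returned rows. It is not taken from the survey: the function name `execution_match`, the toy `city` table, and the choice of SQLite are all assumptions made for this example.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries return the same rows (order ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to run counts as a mismatch.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as multisets so row order does not matter but duplicates do.
    return Counter(predicted_rows) == Counter(gold_rows)


# Toy in-memory database (hypothetical schema and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("Ankara", 5), ("Izmir", 4), ("Bursa", 3)],
)

gold = "SELECT name FROM city WHERE population > 3"
same_result = execution_match(conn, "SELECT name FROM city WHERE population >= 4", gold)
different_result = execution_match(conn, "SELECT name FROM city", gold)
invalid_query = execution_match(conn, "SELEC name FROM city", gold)
```

Comparing results rather than query strings accepts predictions that are written differently from the reference but return the same rows; it can also accept a wrong query that happens to match on a small database.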


Exploring The Impact Of Training Datasets On Turkish Stance Detection, Muhammed Said Zengin, Berk Utku Yenisey, Mücahid Kutlu Nov 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Stance detection has garnered considerable attention from researchers due to its broad range of applications, including fact-checking and social computing. While state-of-the-art stance detection models are usually based on supervised machine learning methods, their effectiveness relies heavily on the quality of the training data. This problem is more prevalent in the stance detection task because the stance of a text is intimately tied to the target under consideration. While numerous datasets exist for stance detection, determining their suitability for a specific target can be challenging. In this work, we focus on Turkish stance detection and explore the impact of training data …


Solving Turkish Math Word Problems By Sequence-To-Sequence Encoder-Decoder Models, Esin Gedik, Tunga Güngör Mar 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Solving math word problems (MWP) is a challenging task due to the semantic gap between natural language texts and mathematical equations. The main purpose of the task is to take a written math problem as input and produce a proper equation as output for solving that problem. This paper describes a sequence-to-sequence (seq2seq) neural model for automatically solving Turkish MWPs based on their semantic meanings in the text. It comprises a bidirectional encoder to comprehend the semantics of the problem by encoding the input sequence and a decoder with attention to extract the equation by tracking the semantic meanings of …


Diacritics Correction In Turkish With Context-Aware Sequence To Sequence Modeling, Asiye Tuba Özge, Özge Bozal, Umut Özge Sep 2022

Turkish Journal of Electrical Engineering and Computer Sciences

Digital texts in many languages contain missing or misused diacritics, which makes it hard for natural language processing applications to disambiguate the meaning of words. Therefore, diacritics restoration is a crucial step in natural language processing applications for many languages. In this study, we approach this problem as a bidirectional transformation between diacritical letters and their ASCII counterparts, rather than unidirectional diacritic restoration. We propose a context-aware character-level sequence to sequence model for this transformation. The model is language independent in the sense that no language-specific feature extraction is necessary other than the utilization of word embeddings and is …
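To make the two directions of this transformation concrete, the sketch below maps Turkish diacritical letters to their ASCII counterparts and then lists the words that a given ASCII form could stand for. It is only an illustration of why the reverse direction needs context, not the sequence to sequence model the authors propose; the names `asciify` and `candidates` and the toy vocabulary are assumptions made for this example.

```python
# Turkish letters with diacritics and their usual ASCII counterparts.
# "\u0130" is the capital dotted I.
TO_ASCII = str.maketrans({
    "ç": "c", "ğ": "g", "ı": "i", "ö": "o", "ş": "s", "ü": "u",
    "Ç": "C", "Ğ": "G", "\u0130": "I", "Ö": "O", "Ş": "S", "Ü": "U",
})


def asciify(text):
    """Diacritics -> ASCII: a deterministic, character-by-character mapping."""
    return text.translate(TO_ASCII)


def candidates(ascii_word, vocabulary):
    """ASCII -> diacritics: every vocabulary word that collapses to ascii_word."""
    return sorted(word for word in vocabulary if asciify(word) == ascii_word)


# Toy vocabulary (hypothetical): "acı" (pain, bitter) and "açı" (angle)
# both become "aci" once diacritics are removed.
vocabulary = {"acı", "açı", "ad"}
ambiguous = candidates("aci", vocabulary)
```

Removing diacritics never needs context, while restoring them can yield several valid words, which is the ambiguity a context-aware model has to resolve.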


Event-Related Microblog Retrieval In Turkish, Çağrı Toraman Mar 2022

Turkish Journal of Electrical Engineering and Computer Sciences

Microblogs, such as tweets, are short messages in which users are able to share any opinion and information. Microblogs are mostly related to real-life events reported in news articles. Finding event-related microblogs is important to analyze online social networks and understand public opinion on events. However, finding such microblogs is a challenging task due to the dynamic nature of microblogs and their limited length. In this study, assuming that news articles are given as queries and microblogs as documents, we find event-related microblogs in Turkish. In order to represent news articles and microblogs, we examine encoding methods, namely traditional bag-of-words …


Detecting And Correcting Automatic Speech Recognition Errors With A New Model, Recep Sinan Arslan, Necaattin Barışçı, Nursal Arıcı, Sabri Koçer Jan 2021

Turkish Journal of Electrical Engineering and Computer Sciences

The purpose of automatic speech recognition (ASR) systems is to recognize speech signals obtained from people and convert them into text so that they can be processed by a computer. Although many ASR applications are versatile and widely used in the real world, they still generate relatively inaccurate results. They tend to generate spelling errors in recognized words, especially in noisy environments, in situations where the vocabulary size is increased, and at times when the input speech is of poor quality. The permanent presence of errors in ASR systems has led to the need to find alternative methods for automatic …


Efficient Turkish Tweet Classification System For Crisis Response, Saed Alqaraleh, Merve Işık Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

This paper presents a Turkish tweet classification system for crisis response based on convolutional neural networks. The system can classify the information available before or during a crisis. In addition, a preprocessing model was implemented and integrated as a part of the developed system. This paper presents the first ever Turkish tweet dataset for crisis response, which can be widely used and improve similar studies. This dataset has been carefully preprocessed, annotated, and well organized. It is suitable to be used by all the well-known natural language processing tools. Extensive experimental work, using our produced Turkish tweet dataset …


Automated Labeling Of Terms In Medical Reports In Serbian, Aldina Avdic, Ulfeta Marovac, Dragan Jankovic Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Nowadays, many electronic health reports (EHRs) are stored daily. They consist of a structured part and an unstructured section written in natural language. Due to the limited time for medical examination, EHRs are short reports which often contain errors and abbreviations. Therefore, it is a challenge to process an EHR and extract knowledge from this part of the text for different purposes. This paper compares the results of three proposed methods for automatic labeling of medical terms in unstructured parts of EHRs. All words are categorized as words within the medical domain (symptoms, diagnoses, therapies, anatomy, specialties etc.) and …


Automatic Concept Identification Of Software Requirements In Turkish, Fatma Bozyiğit, Özlem Aktaş, Deniz Kılınç Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Software requirements include descriptions of the features of the target system and express the expectations of users. In the analysis phase, requirements are transformed into easy-to-understand conceptual models that facilitate communication between stakeholders. Although creating conceptual models from requirements is mostly done manually by analysts, the number of models that automate this process has increased recently. Most of the models and tools are developed to analyze requirements in English, and there is no study for agglutinative languages such as Turkish or Finnish. In this study, we propose an automatic concept identification model that transforms Turkish requirements into Unified Modeling Language …


A Hybrid Sentiment Analysis Method For Turkish, Buket Erşahin, Özlem Aktaş, Deniz Kılınç, Mustafa Erşahin Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

This paper presents a hybrid methodology for Turkish sentiment analysis, which combines the lexicon-based and machine learning (ML)-based approaches. On the lexicon-based side, we use a sentiment dictionary that is extended with a synonyms lexicon. Besides this, we tackle the classification problem with three supervised classifiers, naive Bayes, support vector machines, and J48, on the ML side. Our hybrid methodology combines these two approaches by generating a new lexicon-based value according to our feature generation algorithm and feeds it as one of the features to machine learning classifiers. Despite the linguistic challenges caused by the morphological structure of Turkish, the …
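The idea of feeding a lexicon-based value to a classifier as one more feature can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' feature generation algorithm, which the abstract does not spell out: the toy lexicon, the summed-polarity score, and the names `lexicon_score` and `features` are all hypothetical.

```python
from collections import Counter

# Toy sentiment lexicon (hypothetical entries and polarities).
LEXICON = {"güzel": 1.0, "harika": 1.0, "kötü": -1.0, "berbat": -1.0}


def lexicon_score(tokens):
    """Lexicon side: sum the polarities of the tokens found in the lexicon."""
    return sum(LEXICON.get(token, 0.0) for token in tokens)


def features(text, vocabulary):
    """Bag-of-words counts followed by the lexicon score as the last feature."""
    tokens = text.split()
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary] + [lexicon_score(tokens)]


vocabulary = ["film", "son"]
positive = features("harika film", vocabulary)
mixed = features("film harika ama son kötü", vocabulary)
```

The resulting vectors could then be passed to any supervised classifier, such as the naive Bayes, support vector machine, or J48 classifiers named in the abstract; the classifier itself is left out to keep the sketch dependency-free.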


Implementing Universal Dependency, Morphology, And Multiword Expression Annotation Standards For Turkish Language Processing, Umut Sulubacak, Gülşen Eryiğit Jan 2018

Turkish Journal of Electrical Engineering and Computer Sciences

Released only a year ago as the outputs of a research project ("Parsing Web 2.0 Sentences", supported in part by a TÜBİTAK 1001 grant (No. 112E276) and a part of the ICT COST Action PARSEME (IC1207)), IMST and IWT are currently the most comprehensive Turkish dependency treebanks in the literature. This article introduces the final states of our treebanks, as well as a newly integrated hierarchical categorization of the multiheaded dependencies and their organization in an exclusive deep dependency layer in the treebanks. It also presents the adaptation of recent studies on standardizing multiword expression and named entity annotation schemes …


Relation Extraction Via One-Shot Dependency Parsing On Intersentential, Higher-Order, And Nested Relations, Gözde Gül Şahin, Erdem Emekligil, Seçil Arslan, Onur Ağın, Gülşen Eryiğit Jan 2018

Turkish Journal of Electrical Engineering and Computer Sciences

Despite the emergence of digitalization, people still interact with institutions via traditional means such as submitting free formatted petitions, orders, or applications. These noisy documents generally consist of complex relations that are nested, higher-order, and intersentential. Most of the current approaches address extraction of only sentence-level and binary relations from grammatically correct text and generally require high-level linguistic features coming from preprocessors such as a parts-of-speech tagger, chunker, or syntactic parser. In this article, we focus on extracting complex relations in order to automate the task of understanding user intentions. We propose a novel language-agnostic and noise-immune approach that does …


Unsupervised Learning Of Allomorphs In Turkish, Burcu Can Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

One morpheme may have several surface forms that correspond to allomorphs. In English, -ed and -d are surface forms of the past tense morpheme, and -s, -es, and -ies are surface forms of the plural or present tense morpheme. Turkish has a large number of allomorphs due to its morphophonemic processes. One morpheme can have tens of different surface forms in Turkish. This leads to a sparsity problem in natural language processing tasks in Turkish. Detection of allomorphs has not been studied much because of its difficulty. For example, -tü and -di are Turkish allomorphs (i.e. past tense morpheme), but …


Temporal Logic Extension For Self-Referring, Nonexistence, Multiple Recurrence, And Anterior Past Events, Şadi Evren Şeker Jan 2015

Turkish Journal of Electrical Engineering and Computer Sciences

This study focuses on the possible extensions of current temporal logics. In this study, 4 extensions are proposed: self-referring events, nonexisting events, multiple recurrence of events, and an improvement on anterior past events. Each of these extensions is on a different level of temporal logics. The main motivation behind the extensions is the temporal analysis of Turkish. Similar to temporal logic studies built on other natural languages, like French, Ukrainian, Italian, Korean, English, or Romanian, this is the first time that the Turkish language has been deeply questioned in the sense of computable temporal logic using the view of a …


Automatic Knowledge Extraction For Filling In Biography Forms From Turkish Texts, İlknur Pehlivan, Zeynep Orhan Jan 2011

Turkish Journal of Electrical Engineering and Computer Sciences

This study presents a method for building an automatic knowledge extraction system for filling in biography forms from Turkish texts. Several biographies are analyzed in order to choose the set of biography categories to be studied. The fields of the biography form to be created are also defined based on this analysis. Information extraction techniques are used for implementation. A separate testing platform is designed to evaluate the accuracy of the extracted data. Results of the testing platform have shown this study to be a promising process to be further developed especially for creating forms in the Turkish language.