Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network


Articles 1 - 6 of 6

Full-Text Articles in Entire DC Network

Inéire: An Interpretable NLP Pipeline Summarizing Inclusive Policy Making Concerning Migrants In Ireland, Arefeh Kazem, Arjumand Younus, Mingyeong Jeon, Muhammad Atif Qureshi, Simon Caton Aug 2023


Articles

Reaching marginal and other migrant communities to elicit their political views and opinions is a well-known challenge. Social media has enabled a certain amount of online activism and participation, especially in societies with abundant multicultural identities. However, it can be quite challenging to isolate the voice of the migrant in English-speaking countries, especially with an abundance of content in English on social media. In this paper, we pursue a case study of Ireland’s Twitter landscape, specifically migrant and native activists. We present a methodology that can accurately (>80%) isolate the Irish migrant voice with as little as 25 …


Know An Emotion By The Company It Keeps: Word Embeddings From Reddit/Coronavirus, Alejandro García-Rudolph, David Sanchez-Pinsach, Dietmar Frey, Eloy Opisso, Katryna Cisek, John Kelleher Jan 2023


Articles

Social media is a crucial communication tool (e.g., with 430 million monthly active users in online forums such as Reddit) and is a target of Natural Language Processing (NLP) techniques. One such technique, word embeddings, is based on the quotation, “You shall know a word by the company it keeps,” highlighting the importance of context in NLP. Meanwhile, “Context is everything in Emotion Research.” Therefore, we aimed to train a model (W2V) for generating word associations (also known as embeddings) using a popular Coronavirus Reddit forum, validate them using public evidence and apply them to the discovery of context for specific …


Research On Medical Question Answering System Based On Knowledge Graph, Zhixue Jiang, Chengying Chi, Yun Yun Zhan Jan 2021


Articles

To meet the need of patients and doctors for efficient question answering, this system integrates medical professional knowledge, knowledge graphs, and question answering systems that conduct man-machine dialogue through natural language. The system targets the medical field, uses crawler technology to collect data from vertical medical websites, and uses diseases as the core entity to construct a knowledge graph containing 44,000 knowledge entities of 7 types and 300,000 entities of 11 kinds. The graph is stored in the Neo4j graph database, and rule-based matching methods and string-matching algorithms are used to construct a domain lexicon to classify and query questions. This system …


An Ensemble Approach For Annotating Source Code Identifiers With Part-Of-Speech Tags, Christian D. Newman, Michael J. Decker, Reem S. Alsuhaibani, Anthony Peruma, Mohamed Wiem Mkaouer, Satyajit Mohapatra, Tejal Vishnoi, Marcos Zampieri, Timothy Sheldon, Emily Hill Jan 2021


Articles

This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine learning and the output from multiple part-of-speech taggers to annotate natural language text at a higher quality than the part-of-speech taggers are able to obtain independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE, and Stanford. We study the quality of the ensemble's annotations on five different types of identifier names: function, class, attribute, parameter, and declaration statement, at the level of both individual words and full identifier names. We also study and discuss the weaknesses of our tagger to …
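
The idea of combining several taggers' outputs can be illustrated with a minimal sketch. Note the assumptions: the paper describes a machine-learned combiner, whereas the code below substitutes a plain per-word majority vote with a tie-break, which is a simpler stand-in and not the authors' method. The function name `ensemble_tag`, the tag labels, and the example votes are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def ensemble_tag(votes, fallback="stanford"):
    """Pick one tag per word from several taggers' outputs.

    votes: dict mapping a tagger name to its list of tags, one per word.
    Uses a per-word majority vote; on a tie, the tag from the
    `fallback` tagger is used (an assumption made for this sketch).
    """
    lengths = {len(tags) for tags in votes.values()}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of words")

    result = []
    for i in range(lengths.pop()):
        counts = Counter(tags[i] for tags in votes.values())
        top_tag, top_count = counts.most_common(1)[0]
        tied = [tag for tag, n in counts.items() if n == top_count]
        result.append(top_tag if len(tied) == 1 else votes[fallback][i])
    return result


# Hypothetical tagger outputs for the words of an identifier such as
# "get_user_name"; the tag labels are illustrative only.
example_votes = {
    "swum": ["V", "N", "N"],
    "posse": ["V", "NM", "N"],
    "stanford": ["N", "NM", "N"],
}
print(ensemble_tag(example_votes))  # ['V', 'NM', 'N']
```

A learned combiner would replace the vote with a classifier trained on the taggers' outputs, which is what allows an ensemble to outperform each tagger on its own.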


Comparing Tagging Suggestion Models On Discrete Corpora, Bojan Bozic, Andre Rios, Sarah Jane Delany Jan 2020


Articles

This paper aims to investigate the methods for the prediction of tags on a textual corpus that describes diverse data sets based on short messages; as an example, the authors demonstrate the usage of methods based on hotel staff inputs in a ticketing system as well as the publicly available StackOverflow corpus. The aim is to improve the tagging process and find the most suitable method for suggesting tags for a new text entry.


Languages For Different Health Information Readers: Multitrait-Multimethod Content Analysis Of Cochrane Systematic Reviews Textual Summary Formats, Jasna Karačić, Pierpaolo Dondio, Ivan Buljan, Darko Hren, Ana Marušić Jan 2019


Articles

Background: Although subjective expressions and linguistic fluency have been shown to be important factors in processing and interpreting textual facts, analyses of these traits in textual health information for different audiences are lacking. We analyzed the readability and the linguistic, psychological, and emotional characteristics of different textual summary formats of Cochrane systematic reviews. Methods: We performed a multitrait-multimethod cross-sectional study of press releases available on the Cochrane website (n = 162) and the corresponding scientific abstracts (n = 158), Cochrane Clinical Answers (n = 35), and plain language summaries in English (n = 156), French (n = 101), German (n = 41), and Croatian (n = 156). We used SMOG index …