Full-Text Articles in Entire DC Network
Inéire: An Interpretable NLP Pipeline Summarizing Inclusive Policy Making Concerning Migrants In Ireland, Arefeh Kazem, Arjumand Younus, Mingyeong Jeon, Muhammad Atif Qureshi, Simon Caton
Articles
Reaching marginal and other migrant communities to elicit their political views and opinions is a well-known challenge. Social media has enabled a certain amount of online activism and participation, especially in societies with abundant multicultural identities. However, it can be quite challenging to isolate the voice of the migrant in English-speaking countries, especially given the abundance of English-language content on social media. In this paper, we pursue a case study of Ireland’s Twitter landscape, focusing on migrant and native activists. We present a methodology that can accurately (>80%) isolate the Irish migrant voice with as little as 25 …
Know An Emotion By The Company It Keeps: Word Embeddings From Reddit/Coronavirus, Alejandro García-Rudolph, David Sanchez-Pinsach, Dietmar Frey, Eloy Opisso, Katryna Cisek, John Kelleher
Articles
Social media is a crucial communication tool (e.g., online forums such as Reddit have 430 million monthly active users) and a target of Natural Language Processing (NLP) techniques. One such technique, word embeddings, rests on the maxim “You shall know a word by the company it keeps,” which highlights the importance of context in NLP. Meanwhile, “Context is everything in Emotion Research.” We therefore aimed to train a Word2Vec (W2V) model to generate word associations (also known as embeddings) from a popular Coronavirus Reddit forum, validate them against public evidence, and apply them to the discovery of context for specific …
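The idea of knowing a word "by the company it keeps" can be illustrated with a minimal sketch. This is not the authors' Word2Vec training: it swaps in a simpler count-based method that records which words appear within a small window of each word and compares those context vectors with cosine similarity. The toy corpus, the window size, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(vectors, word, topn=3):
    """Rank the other words by how similar their contexts are to the given word's."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical toy corpus (pre-tokenized), not data from the paper.
corpus = [
    "i feel anxious about the virus".split(),
    "i feel scared about the lockdown".split(),
    "i feel happy about the vaccine".split(),
]
vectors = cooccurrence_vectors(corpus)
print(most_similar(vectors, "anxious"))
```

Because "anxious", "scared", and "happy" occur in identical contexts here, they end up with identical context vectors; a trained embedding model such as Word2Vec learns dense vectors with a similar property from much larger corpora.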
Research On Medical Question Answering System Based On Knowledge Graph, Zhixue Jiang, Chengying Chi, Yun Yun Zhan
Articles
To meet patients' and doctors' need for efficient question answering, this system integrates professional medical knowledge, a knowledge graph, and a question answering system that conducts human-machine dialogue in natural language. The system focuses on the medical domain: it uses web crawlers to collect data from vertical medical websites and, with diseases as the core entity, constructs a knowledge graph containing 44,000 knowledge entities of 7 types and 300,000 entities of 11 kinds. The graph is stored in a Neo4j graph database, and rule-based matching methods and string-matching algorithms are used to construct a domain lexicon for classifying and querying questions. This system …
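The lexicon-based question analysis described above might look roughly like the following minimal sketch, assuming a small hand-written lexicon and keyword rules. The lexicon entries, the question types, and the function names are all hypothetical, and the step that would turn the result into a Neo4j query is not shown.

```python
# Hypothetical domain lexicon (surface form -> entity type); a real system
# would build this from the crawled knowledge graph.
LEXICON = {
    "diabetes": "disease",
    "type 2 diabetes": "disease",
    "metformin": "drug",
    "fever": "symptom",
}

# Hypothetical rules: question type -> trigger keywords.
QUESTION_RULES = {
    "disease_symptom": ["symptom", "sign"],
    "disease_drug": ["drug", "medicine", "treat"],
    "disease_cause": ["cause", "why"],
}


def find_entities(question, lexicon=LEXICON):
    """Longest-match-first substring matching of lexicon terms in the question."""
    text = question.lower()
    found = []
    for term in sorted(lexicon, key=len, reverse=True):
        idx = text.find(term)
        if idx != -1:
            found.append((term, lexicon[term]))
            # Blank out the matched span so shorter, overlapping terms
            # (e.g. "diabetes" inside "type 2 diabetes") are not matched again.
            text = text[:idx] + " " * len(term) + text[idx + len(term):]
    return found


def classify(question, rules=QUESTION_RULES):
    """Return every question type whose trigger keywords occur in the question."""
    text = question.lower()
    return [qtype for qtype, keywords in rules.items() if any(k in text for k in keywords)]


q = "What are the symptoms of type 2 diabetes?"
print(find_entities(q), classify(q))
```

Matching the longest lexicon terms first keeps a specific entity from being shadowed by a shorter one it contains.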
An Ensemble Approach For Annotating Source Code Identifiers With Part-Of-Speech Tags, Christian D. Newman, Michael J. Decker, Reem S. Alsuhaibani, Anthony Peruma, Mohamed Wiem Mkaouer, Satyajit Mohapatra, Tejal Vishnoi, Marcos Zampieri, Timothy Sheldon, Emily Hill
Articles
This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine learning and the output from multiple part-of-speech taggers to annotate natural language text at a higher quality than the individual taggers achieve independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE, and Stanford. We study the quality of the ensemble's annotations on five types of identifier names (function, class, attribute, parameter, and declaration statement) at the level of both individual words and full identifier names. We also study and discuss the weaknesses of our tagger to …
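Combining several taggers' outputs can be sketched as follows. Note that the paper's ensemble uses machine learning over the taggers' outputs; this minimal sketch swaps in plain per-word majority voting with a fixed tie-breaking order, and it assumes all three taggers share one tagset and one word split. The tags and the priority order are hypothetical.

```python
from collections import Counter


def ensemble_tag(predictions, priority):
    """Combine per-word tags from several taggers by majority vote.

    predictions: tagger name -> list of tags, one per word of the identifier
                 (all taggers assumed to share one tagset and tokenization).
    priority:    every tagger name, in the order used to break ties.
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of words")
    (n,) = lengths
    result = []
    for i in range(n):
        votes = Counter(tags[i] for tags in predictions.values())
        best = max(votes.values())
        tied = {tag for tag, count in votes.items() if count == best}
        # Among the most-voted tags, keep the one proposed by the highest-priority tagger.
        result.append(next(predictions[name][i] for name in priority if predictions[name][i] in tied))
    return result


# Hypothetical outputs for the identifier getUserName split into get/user/name.
predictions = {
    "SWUM": ["V", "NM", "N"],
    "POSSE": ["V", "NM", "N"],
    "Stanford": ["NM", "N", "N"],
}
print(ensemble_tag(predictions, priority=["POSSE", "SWUM", "Stanford"]))
```

A learned ensemble can outperform voting because it can learn, for example, which tagger tends to be right for a given identifier type; voting is only the simplest baseline.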
Comparing Tagging Suggestion Models On Discrete Corpora, Bojan Bozic, Andre Rios, Sarah Jane Delany
Articles
This paper investigates methods for predicting tags on textual corpora consisting of short messages, using two diverse data sets as examples: hotel staff inputs to a ticketing system and the publicly available StackOverflow corpus. The aim is to improve the tagging process and to find the most suitable method for suggesting tags for a new text entry.
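As one illustration of suggesting tags for a new text entry, here is a minimal nearest-neighbour sketch: rank previously tagged entries by word overlap (Jaccard similarity) with the new text and vote over the tags of the top *k*. This is an assumed baseline for illustration, not necessarily one of the methods compared in the paper, and the tickets below are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lower-cased bag of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags(new_text, corpus, k=3, n_tags=2):
    """Suggest tags for new_text from its k most similar tagged entries.

    corpus: list of (text, tags) pairs.
    """
    query = tokenize(new_text)
    ranked = sorted(corpus, key=lambda item: jaccard(query, tokenize(item[0])), reverse=True)
    votes = Counter(tag for _, tags in ranked[:k] for tag in tags)
    return [tag for tag, _ in votes.most_common(n_tags)]


# Hypothetical hotel-ticket examples, not data from the paper.
tickets = [
    ("shower drain blocked in room 12", ["plumbing"]),
    ("tap leaking in bathroom", ["plumbing"]),
    ("wifi not working in lobby", ["network"]),
    ("tv remote broken", ["electronics"]),
]
print(suggest_tags("bathroom drain blocked", tickets, k=2, n_tags=1))
```

Short messages share few words, so overlap-based similarity is brittle; that sparsity is one reason to compare several models on such corpora.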
Languages For Different Health Information Readers: Multitrait-Multimethod Content Analysis Of Cochrane Systematic Reviews Textual Summary Formats, Jasna Karačić, Pierpaolo Dondio, Ivan Buljan, Darko Hren, Ana Marušić
Articles
Background: Although subjective expressions and linguistic fluency have been shown to be important factors in processing and interpreting textual facts, analyses of these traits in textual health information for different audiences are lacking. We analyzed the readability and the linguistic, psychological, and emotional characteristics of different textual summary formats of Cochrane systematic reviews. Methods: We performed a multitrait-multimethod cross-sectional study of press releases available on the Cochrane website (n=162) and the corresponding scientific abstracts (n=158), Cochrane Clinical Answers (n=35), and plain language summaries in English (n=156), French (n=101), German (n=41), and Croatian (n=156). We used SMOG index …
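For reference, the commonly cited SMOG readability formula is grade = 1.0430 × √(polysyllables × 30 / sentences) + 3.1291, where polysyllables are words of three or more syllables. The following minimal sketch computes it; the vowel-group syllable counter is a rough heuristic assumed here for illustration (the abstract does not say which tool the study used), and dictionary-based counters are more accurate.

```python
import math
import re


def count_syllables(word):
    """Approximate syllables as runs of vowels (treating y as a vowel)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent 'e' (as in "make") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def smog_index(text):
    """SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


print(round(smog_index("Systematic reviews summarise evidence. Patients read them."), 2))
```

The factor 30 / sentences rescales the polysyllable count to the 30-sentence sample the index was originally defined on, so shorter texts can still be scored.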