Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons


Articles 1 - 6 of 6

Full-Text Articles in Data Science

The Detection Of Sexual Harassment And Chat Predators Using Artificial Neural Network, Noor Amer Hamzah, Ban N. Dhannoon Dec 2021


Karbala International Journal of Modern Science

The vast increase in the use of social media sites like Twitter and Facebook has led to frequent sexual harassment on the Internet, which is considered a major societal problem. This paper aims to detect sexual harassment and cyber predators at an early phase. We used deep learning, specifically bidirectional long short-term memory (BiLSTM). Word representations that map text to real-valued vectors are carefully reviewed. For chat sexual predator detection, the proposed model achieved its best result with an F0.5-score of 0.927 and an accuracy of 97.27%. For sexual harassment detection in comments, it achieved an F0.5-score of 0.925 and an accuracy of 99.12%.
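For reference, the F0.5-score reported above is the F-beta measure with beta = 0.5, which weights precision more heavily than recall. The following is a minimal sketch, assuming precision and recall have already been computed; the function name `f_beta` and the example values are illustrative and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F-beta score, a weighted harmonic mean of precision and recall.

    beta < 1 favors precision; beta > 1 favors recall; beta = 1 gives F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values, chosen only to show the effect of beta.
example_f05 = f_beta(precision=0.9, recall=0.6, beta=0.5)
example_f1 = f_beta(precision=0.9, recall=0.6, beta=1.0)
```

With these illustrative inputs the F0.5 value is higher than the F1 value, because precision exceeds recall and beta = 0.5 gives precision more weight.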


Fine-Grained Detection Of Hate Speech Using BERToxic, Yakoob Khan Jun 2021


Dartmouth College Undergraduate Theses

This thesis describes our approach towards the fine-grained detection of hate speech using deep learning. We leverage the transformer encoder architecture to propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text and utilizes additional post-processing steps to refine the prediction boundaries. The post-processing steps involve (1) labeling character offsets between consecutive toxic tokens as toxic and (2) assigning a toxic label to words that have at least one token labeled as toxic. Through experiments, we show that these two post-processing steps improve the performance of our model by 4.16% on …
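The two post-processing steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: tokens are assumed to arrive as `(start, end, is_toxic)` character spans in reading order, words are assumed to be whitespace-delimited, and the names `Token` and `refine_toxic_offsets` are hypothetical.

```python
import re
from typing import List, Set, Tuple

# (start offset, end offset exclusive, predicted toxic?) -- assumed input shape.
Token = Tuple[int, int, bool]


def refine_toxic_offsets(text: str, tokens: List[Token]) -> Set[int]:
    """Return the set of character offsets labeled toxic after post-processing."""
    toxic: Set[int] = set()

    # Raw prediction: every character inside a token predicted as toxic.
    for start, end, is_toxic in tokens:
        if is_toxic:
            toxic.update(range(start, end))

    # Step 1: label the characters between two consecutive toxic tokens as toxic.
    for (_, prev_end, prev_toxic), (next_start, _, next_toxic) in zip(tokens, tokens[1:]):
        if prev_toxic and next_toxic:
            toxic.update(range(prev_end, next_start))

    # Step 2: a word with at least one toxic token becomes toxic as a whole.
    for match in re.finditer(r"\S+", text):
        w_start, w_end = match.start(), match.end()
        if any(t and s < w_end and e > w_start for s, e, t in tokens):
            toxic.update(range(w_start, w_end))

    return toxic


# Hypothetical example: "idiots" is split into the subword tokens "idiot" + "##s".
sample_text = "you stupid idiots here"
sample_tokens = [(0, 3, False), (4, 10, True), (11, 16, True), (16, 17, False), (18, 22, False)]
sample_offsets = refine_toxic_offsets(sample_text, sample_tokens)
```

In this example, step 1 fills the space between "stupid" and "idiot", and step 2 extends the label to the trailing "s" of "idiots", so the predicted span covers "stupid idiots" as one contiguous range.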


Lexical Complexity Prediction With Assembly Models, Aadil Islam Jun 2021


Dartmouth College Undergraduate Theses

Tuning the complexity of one's writing is essential to presenting ideas in a logical, intuitive manner to audiences. This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context. We assemble a feature engineering-based model and a deep neural network model with an underlying Transformer architecture based on BERT. While BERT itself performs competitively, our feature engineering-based model helps in extreme cases, e.g., separating instances of easy and neutral difficulty. Our handcrafted features comprise a breadth of lexical, semantic, syntactic, and novel phonetic measures. Visualizations of BERT …


Automated Analysis Of RFPs Using Natural Language Processing (NLP) For The Technology Domain, Sterling Beason, William Hinton, Yousri A. Salamah, Jordan Salsman May 2021


SMU Data Science Review

Much progress has been made in text analysis, specifically within the statistical domain of Term Frequency (TF) and Inverse Document Frequency (IDF). However, there is much room for improvement, especially in the area of discovering emerging trends. Emerging Trend Detection Systems (ETDS) depend on ingesting a collection of textual data and on TF/IDF to identify new or up-trending topics within the corpus. However, the tremendous rate of change and the amount of digital information present a challenge that makes it almost impossible for a human expert to spot emerging trends without relying on an automated ETDS. Since the U.S. Government …
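As background for the TF/IDF weighting mentioned above, here is a minimal sketch of one common formulation: relative term frequency multiplied by log(N / document frequency). The abstract does not say which variant the authors use, so the formula, the function name `tf_idf`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score each term in each tokenized document with tf * log(N / df)."""
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))

    scores: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return scores


# Toy corpus, invented for this example.
toy_docs = [["cloud", "security", "cloud"], ["security", "audit"]]
toy_scores = tf_idf(toy_docs)
```

A term that appears in every document (here "security") scores zero, while a term concentrated in one document (here "cloud") scores higher, which is the property a trend detector relies on to surface distinctive topics.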


Semantic Classification Of Multidialectal Arabic Social Media, Tom Rishel May 2021


Dissertations

Arabic is one of the most widely used languages in the world, but due in part to its morphological and syntactic richness, resources for automated processing of Arabic are relatively rare. Arabic takes three primary forms: Classical Arabic as seen in the Qur’an and other classical texts; Modern Standard Arabic (MSA) as seen in newspapers, formal documents, and other written text intended for widespread distribution; and dialectal Arabic as used in common speech and informal communication. Social media posts are often written in informal language and may include non-standard spellings, abbreviations, emoticons, hashtags, and emojis. Dialectal Arabic is commonly used …


Improving Space Efficiency Of Deep Neural Networks, Aliakbar Panahi Jan 2021


Theses and Dissertations

Language models employ a very large number of trainable parameters. Despite being highly overparameterized, these networks often achieve good out-of-sample test performance on the original task and easily fine-tune to related tasks. Recent observations involving, for example, the intrinsic dimension of the objective landscape and the lottery ticket hypothesis indicate that training often actively involves only a small fraction of the parameter space. Thus, the question remains of how large a parameter space needs to be in the first place. Evidence from recent work on model compression, parameter sharing, factorized representations, and knowledge distillation increasingly shows that models can be …