Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Anomaly detection

Articles 1 - 30 of 35

Full-Text Articles in Computer Engineering

Unveiling Anomalies: A Survey On XAI-Based Anomaly Detection For IoT, Esin Eren, Feyza Yildirim Okay, Suat Özdemir May 2024

Turkish Journal of Electrical Engineering and Computer Sciences

In recent years, the rapid growth of the Internet of Things (IoT) has raised concerns about the security and reliability of IoT systems. Anomaly detection is vital for recognizing potential risks and ensuring the optimal functionality of IoT networks. However, traditional anomaly detection methods often lack transparency and interpretability, hindering the understanding of their decisions. As a solution, Explainable Artificial Intelligence (XAI) techniques have emerged to provide human-understandable explanations for the decisions made by anomaly detection models. In this study, we present a comprehensive survey of XAI-based anomaly detection methods for IoT. We review and analyze various XAI techniques, including …


Machine Learning For Intrusion Detection Into Unmanned Aerial System 6G Networks, Faisal Alrefaei May 2024

Doctoral Dissertations and Master's Theses

Progress in the development of wireless network technology has played a crucial role in the evolution of societies and provided remarkable services over the past decades. It remotely offers the ability to execute critical missions and effective services that meet the user's needs. This advanced technology integrates cyber and physical layers to form cyber-physical systems (CPS), such as the Unmanned Aerial System (UAS), which consists of an Unmanned Aerial Vehicle (UAV), ground network infrastructure, communication link, etc. Furthermore, it plays a crucial role in connecting objects to create and develop the Internet of Things (IoT) technology. Therefore, the emergence of …


Learning And Analysis Of Dynamic Models For Grid Discrete Events Based On Log Information, Danlong Zhu, Yunqi Yan, Ying Chen, Jiaqi Zhang, Longxing Jin, Wei Fu Oct 2023

Journal of System Simulation

Abstract: With the increasing scale of the power grid, the massive amount of log information generated by devices in the power grid poses a challenge to the manual analysis of abnormal grid conditions. The log information generated during the operation of the power grid has typical discrete sequential characteristics. By analyzing the log information of grid alarm messages, a station event transition probability model and an event sequence risk calculation method are proposed to effectively model and analyze the abnormal operation level of primary and secondary systems in substations. The proposed method not only successfully identifies the event sequences corresponding to …
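
The idea of an event transition probability model with a sequence risk score can be illustrated with a minimal sketch. The abstract is truncated, so this is not the authors' method: it assumes a first-order Markov chain with add-one smoothing, and it defines risk as the average negative log-probability of a sequence's transitions. The event names, `fit_transitions`, and `sequence_risk` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_transitions(sequences, smoothing=1.0):
    """Estimate first-order transition probabilities P(next | current).

    Assumption: add-`smoothing` (Laplace) smoothing so unseen transitions
    keep a small non-zero probability.
    """
    counts = defaultdict(Counter)
    events = set()
    for seq in sequences:
        events.update(seq)
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    events = sorted(events)
    probs = {}
    for cur in events:
        total = sum(counts[cur].values()) + smoothing * len(events)
        probs[cur] = {
            nxt: (counts[cur][nxt] + smoothing) / total for nxt in events
        }
    return probs


def sequence_risk(seq, probs, floor=1e-6):
    """Average negative log-probability of the transitions in `seq`.

    Higher values mean the sequence is less typical of the training logs.
    Transitions involving events never seen in training use `floor`.
    """
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    nll = 0.0
    for cur, nxt in steps:
        p = probs.get(cur, {}).get(nxt, floor)
        nll -= math.log(p)
    return nll / len(steps)


# Hypothetical alarm logs, one list of events per station and time window.
history = [
    ["breaker_open", "breaker_close", "normal"],
    ["breaker_open", "breaker_close", "normal"],
    ["voltage_dip", "normal"],
]
probs = fit_transitions(history)
usual = sequence_risk(["breaker_open", "breaker_close", "normal"], probs)
unusual = sequence_risk(["normal", "voltage_dip", "breaker_open"], probs)
```

In this toy data the sequence that follows frequently observed transitions receives a lower risk than the one built from transitions never seen in the history.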


Transforming Temporal-Dynamic Graphs Into Time-Series Data For Solving Event Detection Problems, Kutay Taşci, Fuat Akal Sep 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Event detection on temporal-dynamic graphs aims at detecting significant events based on deviations from the normal behavior of the graphs. With the widespread use of social media, many real-world events manifest as social media interactions, making them suitable for modeling as temporal-dynamic graphs. This paper presents a workflow for event detection on temporal-dynamic graphs using graph representation learning. Our workflow leverages generated embeddings of a temporal-dynamic graph to reframe the problem as an unsupervised time-series anomaly detection task. We evaluated our workflow on four distinct real-world social media datasets and compared our results with the related work. The results show …


Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn Mar 2023

SMU Data Science Review

Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that can create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data; thus, enabling the identification of abnormal network events as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …


Variational Autoencoder-Based Anomaly Detection In Time Series Data For Inventory Record Inaccuracy, Halil Arğun, Sadettin Emre Alptekin Jan 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Retail companies monitor inventory stock levels regularly and manage them based on forecasted sales to sustain their market position. Inventory accuracy, measured by the difference between the warehouse stock records and the actual inventory, is critical for preventing stockouts and shortages. The root causes of inventory inaccuracy are employee or customer theft, product damage or spoilage, and wrong shipments. In this paper, we aim at detecting inaccurate stocks of one of Turkey's largest supermarket chains using the variational autoencoder (VAE), which is an unsupervised learning method. Based on the findings, we showed that VAE is able to model the …


Anomaly Detection In Multi-Seasonal Time Series Data, Ashton Taylor Williams Jan 2023

Browse all Theses and Dissertations

Most of today’s time series data contain anomalies and multiple seasonalities, and accurate anomaly detection in these data is critical to almost any type of business. However, most mainstream forecasting models used for anomaly detection can only incorporate one or no seasonal component into their forecasts and cannot capture every known seasonal pattern in time series data. In this thesis, we propose a new multi-seasonal forecasting model for anomaly detection in time series data that extends the popular Seasonal Autoregressive Integrated Moving Average (SARIMA) model. Our model, named multi-SARIMA, utilizes a time series dataset’s multiple pre-determined seasonal trends to increase …


A Secure And Efficient IIoT Anomaly Detection Approach Using A Hybrid Deep Learning Technique, Bharath Reedy Konatham Jan 2023

Browse all Theses and Dissertations

The Industrial Internet of Things (IIoT) refers to a set of smart devices, i.e., actuators, detectors, smart sensors, and autonomous systems connected throughout the Internet to help achieve the purpose of various industrial applications. Unfortunately, IIoT applications are increasingly integrated into insecure physical environments leading to greater exposure to new cyber and physical system attacks. In the current IIoT security realm, effective anomaly detection is crucial for ensuring the integrity and reliability of critical infrastructure. Traditional security solutions may not apply to IIoT due to new dimensions, including extreme energy constraints in IIoT devices. Deep learning (DL) techniques like Convolutional …


Anomaly Detection In Rotating Machinery Using Autoencoders Based On Bidirectional LSTM And GRU Neural Networks, Krishna Patra, Rabi Narayan Sethi, Dhiren Kkumar Behera May 2022

Turkish Journal of Electrical Engineering and Computer Sciences

A time series anomaly is a form of anomalous subsequence that indicates future faults will occur. The development of novel techniques for detecting this type of anomaly is significant for real-time system monitoring. Several algorithms have been used to classify anomalies successfully. However, time series anomaly detection algorithms have not been studied well. In this research, we use a new hybrid autoencoder based on bidirectional LSTM and GRU neural networks to detect whether a machine is operating normally. An autoencoder is trained on a set of 12 features taken from healthy operating data gathered promptly after a planned maintenance period using vibration …


Hyperspectral RX Anomaly Detection Method Based On The Fusion Of Spatial And Spectral Feature, Liu Xuan, Xiangyang Li, He Fang, Jianwei Zhao, Fenggan Zhang Jan 2022

Journal of System Simulation

Abstract: To address the problem that the hyperspectral anomaly detection algorithm does not make full use of the spatial information of the hyperspectral image and the detection accuracy is limited, a FSSRX (Fusing Spatial and Spectral Reed-Xiaoli) anomaly detection algorithm that fuses spatial and spectral information is proposed to improve the accuracy of hyperspectral anomaly detection. In the FSSRX algorithm, the spatial feature of hyperspectral images is first extracted by the EMAP (Extended Multi-attribute Profile) method and the abnormal score of each pixel in spatial features is then calculated with the RX detector. Meanwhile, RX anomaly detection is carried out directly on the …
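
The RX detector that the abstract builds on can be sketched as follows. This shows only the basic global Reed-Xiaoli score (squared Mahalanobis distance of each pixel spectrum from the scene mean), not the EMAP feature extraction or the fusion step of FSSRX. It assumes NumPy is available; the function name `rx_scores` and the synthetic cube are hypothetical.

```python
import numpy as np


def rx_scores(cube):
    """Global RX score for every pixel of a (rows, cols, bands) cube.

    The score is the squared Mahalanobis distance between a pixel's
    spectrum and the mean spectrum, using the scene-wide covariance.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # Pseudo-inverse tolerates a singular or ill-conditioned covariance.
    cov_inv = np.linalg.pinv(cov)
    # Row-wise quadratic form x^T C^{-1} x without building an N x N matrix.
    scores = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return scores.reshape(rows, cols)


# Synthetic example: Gaussian background with one implanted anomalous pixel.
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 20, 5))
cube[3, 7] += 10.0
scores = rx_scores(cube)
peak = np.unravel_index(np.argmax(scores), scores.shape)
```

A fused detector in the spirit of the abstract would compute one such score map on the spatial features and another on the raw spectra before combining them; how the combination is done is not recoverable from the truncated text.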


Statistics-Based Anomaly Detection And Correction Method For Amazon Customer Reviews, Ishani Chatterjee Dec 2021

Dissertations

People nowadays use the Internet to project their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source of information for data analytics, sentiment analysis, natural language processing, etc. The most critical challenge is interpreting this data and capturing the sentiment behind these expressions. Sentiment analysis is the process of analyzing, processing, and drawing inferences from subjective texts and the views they express. Companies use sentiment analysis to understand public opinions, perform market research, analyze brand reputation, recognize customer experiences, and study social media influence. According to the different needs for aspect granularity, …


Network Traffic Anomaly Detection Method For Imbalanced Data, Shuqin Dong, Bin Zhang Mar 2021

Journal of System Simulation

Abstract: Aiming at the poor detection performances caused by the low feature extraction accuracy of rare traffic attacks from scarce samples, a network traffic anomaly detection method for imbalanced data is proposed. A traffic anomaly detection model is designed, in which the traffic features in different feature spaces are learned by alternating activation functions, architectures, corrupted rates and dropout rates of stacked denoising autoencoder (SDA), and the low accuracy in extracting features of rare traffic attacks in a single space is solved. A batch normalization algorithm is designed, and the Adam algorithm is adopted to train parameters of …


A Systematic Review Of Convolutional Neural Network-Based Structural Condition Assessment Techniques, Sandeep Sony, Kyle Dunphy, Ayan Sadhu, Miriam A M Capretz Jan 2021

Electrical and Computer Engineering Publications

With recent advances in non-contact sensing technology such as cameras, unmanned aerial and ground vehicles, the structural health monitoring (SHM) community has witnessed a prominent growth in deep learning-based condition assessment techniques of structural systems. These deep learning methods rely primarily on convolutional neural networks (CNNs). The CNN networks are trained using a large number of datasets for various types of damage and anomaly detection and post-disaster reconnaissance. The trained networks are then utilized to analyze newer data to detect the type and severity of the damage, enhancing the capabilities of non-contact sensors in developing autonomous SHM systems. In recent …


Hyperspectral Image Anomaly Detection Based On Background Reconstruction, Xiaorui Song, Zou Ling, Lingda Wu, Wanpeng Xu Jul 2020

Journal of System Simulation

Abstract: In the anomaly detection of hyperspectral images (HSIs), aiming at the difficulty of distinguishing the abnormal target from the background and the low accuracy of background prediction, a new HSI anomaly detection algorithm based on background sparse reconstruction is proposed. An online dictionary learning method is used to estimate the background spectral dictionary. The estimated background image is sparsely reconstructed using the learned dictionary. The estimated background image is subtracted from the original image to get the residual image. The anomaly detection is achieved by using the local RX detector to traverse the residual image. The effectiveness of the …


Application Of Distributed Clustering In Anomaly Detection Of Farm Environment Data, Deng Li, Honglin Pang, Ling Wang, Minrui Fei Jun 2020

Journal of System Simulation

Abstract: The massive farm environment data stored in the distributed system should be dealt with so as to provide abnormal environment reference and make preventive strategies for crop yield. Considering the characteristics of the farm environment data, the Dirichlet Process Mixture Model (DPMM) clustering is implemented with the farm environment data on Hadoop and the anomaly detection method of the farm environment is proposed based on clustering analysis. Under the framework of MapReduce, Map stage implements the distribution of the sample points to the models; Reduce stage completes the update of models and the number of clusters. The performance has …


Hierarchical Anomaly Detection For Time Series Data, Ryan E. Sperl Jan 2020

Browse all Theses and Dissertations

With the rise of Big Data and the Internet of Things, there is an increasing availability of large volumes of real-time streaming data. Unusual occurrences in the underlying system will be reflected in these streams, but any human analysis will quickly become out of date. There is a need for automatic analysis of streaming data capable of identifying these anomalous behaviors as they occur, to give ample time to react. In order to handle many high-velocity data streams, detectors must minimize the processing requirements per value. In this thesis, we have developed a novel anomaly detection method which makes use …


Real-Time Anomaly Detection And Mitigation Using Streaming Telemetry In SDN, Çağdaş Kurt, Osman Ayhan Erdem Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Measurement and monitoring are crucial for various network tasks such as traffic engineering, anomaly detection, and intrusion prevention. The success of critical capabilities such as anomaly detection and prevention depends on whether the utilized network measurement method is able to provide granular, near real-time, low-overhead measurement data or not. In addition to the measurement method, the anomaly detection and mitigation algorithm is also essential for recognizing normal and abnormal traffic patterns in such a huge amount of measured data with high accuracy and low latency. Software-defined networking is an emerging concept to enable programmable and efficient measurement functions for these …


Minos: Unsupervised Netflow-Based Detection Of Infected And Attacked Hosts, And Attack Time In Large Networks, Mousume Bhowmick Aug 2019

Boise State University Theses and Dissertations

Monitoring large-scale networks for malicious activities is increasingly challenging: the amount and heterogeneity of traffic hinder the manual definition of IDS signatures and deep packet inspection. In this thesis, we propose MINOS, a novel fully unsupervised approach that generates an anomaly score for each host allowing us to classify with high accuracy each host as either infected (generating malicious activities), attacked (under attack), or clean (without any infection). The generated score of each hour is able to detect the time frame of being attacked for an infected or attacked host without any prior knowledge. MINOS automatically creates a personalized traffic …


Towards Efficient Intrusion Detection Using Hybrid Data Mining Techniques, Fadi Salo Jun 2019

Electronic Thesis and Dissertation Repository

The enormous development in the connectivity among different types of networks poses significant concerns in terms of privacy and security. As such, the exponential expansion in the deployment of cloud technology has produced a massive amount of data from a variety of applications, resources and platforms. In turn, the rapid rate and volume of high-dimensional data creation has begun to pose significant challenges for data management and security. Handling redundant and irrelevant features in high-dimensional space has been a long-standing challenge for network anomaly detection. Eliminating such features with spectral information not only speeds up the classification process, but …


Abnormal Behavior Detection Via Super-Pixels Time Context Feature, Chen Ying, Dandan He Jan 2019

Journal of System Simulation

Abstract: In order to accurately locate the abnormal behavior, an anomaly detection method based on time context features of super-pixels is proposed. For feature representation, the video frames are first segmented into super-pixels. The super-pixels of foreground are then selected according to their pixel ratios of foreground. Super-pixels matching adjacent frames are selected based on the gray-level histogram and the information of location to enhance the temporal context of super-pixel features. The statistical values of the multilayer histogram of optical flow of matched super-pixels are taken as the feature for detection. In the phase of detection, the sparse combination learning algorithm …


Adapted K-Nearest Neighbors For Detecting Anomalies On Spatio–Temporal Traffic Flow, Youcef Djenouri, Asma Belhadi, Jerry Chun-Wei Lin, Alberto Cano Jan 2019

Computer Science Publications

Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the coming flow distribution probabilities, the inliers are stored to enrich the …
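
The loop described above (score an incoming flow distribution against a stored database, then keep it if it is an inlier) can be sketched with a plain k-nearest-neighbors distance score. This is a generic kNN outlier detector, not the adapted algorithm from the paper; the distance measure, the threshold, and the names `knn_score` and `process` are assumptions made for illustration.

```python
import math


def knn_score(candidate, database, k=3):
    """Mean Euclidean distance from `candidate` to its k nearest stored rows."""
    dists = sorted(math.dist(candidate, row) for row in database)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def process(candidate, database, threshold, k=3):
    """Flag `candidate` as an outlier if its kNN score exceeds `threshold`.

    Inliers are appended to `database`, so the reference set grows over time.
    """
    score = knn_score(candidate, database, k)
    is_outlier = score > threshold
    if not is_outlier:
        database.append(candidate)
    return is_outlier, score


# Hypothetical flow distributions: share of traffic in three time slots.
fdp_db = [
    [0.50, 0.30, 0.20],
    [0.48, 0.32, 0.20],
    [0.52, 0.28, 0.20],
    [0.49, 0.31, 0.20],
]
normal_flag, normal_score = process([0.51, 0.29, 0.20], fdp_db, threshold=0.1)
odd_flag, odd_score = process([0.05, 0.05, 0.90], fdp_db, threshold=0.1)
```

The first candidate is close to the stored distributions and is added to the database; the second is far from all of them and is reported as an outlier without being stored.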


On Spectral Analysis Of The Internet Delay Space And Detecting Anomalous Routing Paths, Gonca Gürsun Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Latency is one of the most critical performance metrics for a wide range of applications. Therefore, it is important to understand the underlying mechanisms that give rise to the observed latency values and diagnose the ones that are unexpectedly high. In this paper, we study the Internet delay space via robust principal component analysis (RPCA). Using RPCA, we show that the delay space, i.e. the matrix of measured round trip times between end hosts, can be decomposed into two components: the estimated latency between end hosts with respect to the current state of the Internet and the inflation on the …


Graph Analysis Of Network Flow Connectivity Behaviors, Hangyu Hu, Xuemeng Zhai, Mingda Wang, Guangmin Hu Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Graph-based approaches have been widely employed to facilitate the analysis of network flow connectivity behaviors, with the aim of understanding the impacts and patterns of network events. However, existing approaches suffer from lack of connectivity-behavior information and loss of network event identification. In this paper, we propose network flow connectivity graphs (NFCGs) to capture network flow behavior for modeling social behaviors from network entities. Given a set of flows, edges of an NFCG are generated by connecting pairs of hosts that communicate with each other. To preserve more information about network flows, we also embed node-ranking values and edge-weight vectors into the original …


Importance-Based Signal Detection And Parameter Estimation With Applications To New Particle Search, Hatice Doğan, Nasuf Sönmez, Güleser Kalayci Demir Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

One of the hardest challenges in data analysis is perhaps the detection of rare anomalous data buried in a huge normal background. We study this problem by constructing a novel method, which is a combination of the Kullback-Leibler importance estimation procedure based anomaly detection algorithm and linear discriminant classifier. We choose to illustrate it with the example of charged Higgs boson (CHB) search in particle physics. Indeed, the Large Hadron Collider experiments at CERN ensure that CHB signal must be a tiny effect within the irreducible W-boson background. In simulations, different CHB events with different characteristics are produced and judiciously …


Cyber Data Anomaly Detection Using Autoencoder Neural Networks, Spencer A. Butt Mar 2018

Theses and Dissertations

The Department of Defense requires a secure presence in the cyber domain to successfully execute its stated mission of deterring war and protecting the security of the United States. With potentially millions of logged network events occurring on defended networks daily, a limited staff of cyber analysts require the capability to identify novel network actions for security adjudication. The detection methodology proposed uses an autoencoder neural network optimized via design of experiments for the identification of anomalous network events. Once trained, each logged network event is analyzed by the neural network and assigned an outlier score. The network events with …


Anomaly Inference Based On Heterogeneous Data Sources In An Electrical Distribution System, Yachen Tang Jan 2018

Dissertations, Master's Theses and Master's Reports

Harnessing heterogeneous data sets would improve system observability. While the current metering infrastructure in the distribution network has been utilized for operational purposes to tackle abnormal events, such as weather-related disturbance, the new normal we face today can be at a greater magnitude. Strengthening the inter-dependencies as well as incorporating new crowd-sourced information can enhance operational aspects such as system reconfigurability under extreme conditions. Such resilience is crucial to the recovery from any catastrophic events. This dissertation focuses on the anomaly of potential foul play within an electrical distribution system, both primary and secondary networks as …


Online Growing Neural Gas For Anomaly Detection In Changing Surveillance Scenes, Qianru Sun, Hong Liu, Tatsuya Harada Apr 2017

Research Collection School Of Computing and Information Systems

Anomaly detection is still a challenging task for video surveillance due to complex environments and unpredictable human behaviors. Most existing approaches train offline detectors using manually labeled data and predefined parameters, and struggle to model changing scenes. This paper introduces a neural network based model called online Growing Neural Gas (online GNG) to perform unsupervised learning. Unlike a parameter-fixed GNG, our model updates learning parameters continuously, for which we propose several online neighbor-related strategies. Specific operations, namely neuron insertion, deletion, learning rate adaptation and stopping criteria selection, get upgraded to online modes. In the anomaly detection stage, the …


Hoeffding Tree Algorithms For Anomaly Detection In Streaming Datasets: A Survey, Asmah Muallem, Sachin Shetty, Jan W. Pan, Juan Zhao, Biswajit Biswal Jan 2017

Computational Modeling & Simulation Engineering Faculty Publications

This survey aims to deliver an extensive and well-constructed overview of using machine learning for the problem of detecting anomalies in streaming datasets. The objective is to assess the effectiveness of Hoeffding Trees as a machine learning solution for the problem of detecting anomalies in streaming cyber datasets. In this survey we categorize the existing research on Hoeffding Trees that is relevant to this type of study into the following: surveying distributed Hoeffding Trees, surveying ensembles of Hoeffding Trees and surveying existing techniques using Hoeffding Trees for anomaly detection. These categories are referred to as compositions …


Preprocessing Techniques To Support Event Detection Data Fusion On Social Media Data, Brandon T. Davis Jun 2016

Theses and Dissertations

This thesis focuses on the collection and preprocessing of streaming social media feeds for metadata as well as visual and textual information. Today, news media has been the main source of immediate news events, large and small. However, the information conveyed by these news sources is delayed due to the lack of proximity and general knowledge of the event. Such news sources have started relying on social media for initial knowledge of these events. Previous works focused on captured textual data from social media as a data source to detect events. This preprocessing framework aims to facilitate the data fusion …


“Time For Some Traffic Problems”: Enhancing E-Discovery And Big Data Processing Tools With Linguistic Methods For Deception Detection, Erin S. Crabb Jan 2014

Journal of Digital Forensics, Security and Law

Linguistic deception theory provides methods to discover potentially deceptive texts to make them accessible to clerical review. This paper proposes the integration of these linguistic methods with traditional e-discovery techniques to identify deceptive texts within a given author’s larger body of written work, such as their sent email box. First, a set of linguistic features associated with deception are identified and a prototype classifier is constructed to analyze texts and describe the features’ distributions, while avoiding topic-specific features to improve recall of relevant documents. The tool is then applied to a portion of the Enron Email Dataset to illustrate how …