Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Kennesaw State University

Theses/Dissertations

Articles 1 - 30 of 109

Full-Text Articles in Physical Sciences and Mathematics

Semantic Segmentation Of Point Cloud Sequences Using Point Transformer V3, Marion Sisk Apr 2024

Master's Theses

Semantic segmentation of point clouds is a fundamental step for many autonomous systems, including automobiles. In autonomous driving systems, LiDAR sensors are frequently used to produce point cloud sequences that allow the system to perceive the environment and navigate safely. Modern machine learning techniques for segmentation have predominantly focused on single-scan segmentation; however, sequence segmentation has often proven to perform better on common segmentation metrics. Using the popular Semantic KITTI dataset, we show that by providing point cloud sequences to a segmentation pipeline based on Point Transformer v3, we increase the segmentation performance by between seven and fifteen percent when compared …
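The abstract does not say how the scans in a sequence are combined. One common way to give a segmenter temporal context is to transform earlier scans into the current scan's coordinate frame using per-scan sensor poses and concatenate them with a relative time index. The sketch below assumes each scan comes with a 4×4 world-from-sensor pose matrix; the function name `stack_scans` is hypothetical, and this is an illustration of the general idea rather than the thesis's pipeline.

```python
import numpy as np

def stack_scans(scans, poses):
    """Merge a sequence of LiDAR scans into the frame of the last scan.

    scans: list of (N_i, 3) arrays of x, y, z points, oldest first.
    poses: list of (4, 4) world-from-sensor transforms, one per scan (assumed).
    Returns a (sum N_i, 4) array: x, y, z in the current frame plus a
    relative time index (0 = current scan, -1 = previous scan, ...).
    """
    current_from_world = np.linalg.inv(poses[-1])
    n = len(scans)
    merged = []
    for k, (pts, pose) in enumerate(zip(scans, poses)):
        # Homogeneous coordinates so rotation and translation apply in one matmul
        homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
        # sensor_k -> world -> current sensor frame
        in_current = (current_from_world @ pose @ homog.T).T[:, :3]
        # Relative time index as an extra per-point feature
        t = np.full((pts.shape[0], 1), float(k - (n - 1)))
        merged.append(np.hstack([in_current, t]))
    return np.vstack(merged)
```

The extra time column lets a downstream network distinguish current points from older ones, which matters for classes such as moving vehicles.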


Mathematical Modeling For Dental Decay Prevention In Children And Adolescents, Mahdiyeh Soltaninejad Apr 2024

Dissertations

The high prevalence of dental caries among children and adolescents, especially those from lower socio-economic backgrounds, is a significant nationwide health concern. Early prevention, such as dental sealants and fluoride varnish (FV), is essential, but access to this care remains limited and disparate. In this research, a national dataset is used to assess sealants' reach and effectiveness in preventing tooth decay, with a particular focus on second molars that emerge during early adolescence, a current gap in the knowledge base. FV is recommended to be delivered during medical well-child visits to children who are not seeing a dentist. Challenges and facilitators in …


Cm-Ii Meditation As An Intervention To Reduce Stress And Improve Attention: A Study Of Ml Detection, Spectral Analysis, And Hrv Metrics, Sreekanth Gopi Dec 2023

Master of Science in Computer Science Theses

Students frequently face heightened stress due to academic and social pressures, particularly in demanding fields like computer science and engineering. These challenges are often associated with serious mental health issues, including ADHD (Attention Deficit Hyperactivity Disorder), depression, and an increased risk of suicide. The average student attention span has notably decreased from 2½ minutes to just 47 seconds, and now it typically takes about 25 minutes to switch attention to a new task (Mark, 2023). Research findings suggest that over 95% of individuals who die by suicide have been diagnosed with depression (Shahtahmasebi, 2013), and almost 20% of students …


Elucidating The Impacts Of Learning Communities On Retention, Student Sense Of Belonging, And Self-Efficacy In Stem, Honour Williams Jul 2023

Master of Science in Chemical Sciences Theses

Student sense of belonging (SoB) and self-efficacy (SE) are necessary aspects of promoting achievement and success in the STEM fields. This study focuses on how learning communities affect students' SoB and SE in relation to STEM as well as demographics, including race, gender, and first-generation status. Students enrolled in either the introductory chemistry lecture or the first-year experience course (students grouped by similar STEM major) were asked to complete a STEM attributes survey (Likert scale) to assess their overall SoB to the university and their chosen major as well as their level of SE in relation to those fields. To …


On Training Neurons With Bounded Compilations, Lance Kennedy Jul 2023

Master of Science in Computer Science Theses

Knowledge compilation offers a formal approach to explaining and verifying the behavior of machine learning systems, such as neural networks. Unfortunately, compiling even an individual neuron into a tractable representation such as an Ordered Binary Decision Diagram (OBDD) is an NP-hard problem. In this thesis, we consider the problem of training a neuron from data, subject to the constraint that it has a compact representation as an OBDD. Our approach is based on the observation that a neuron can be compiled into an OBDD in polytime if (1) the neuron has integer weights, and (2) its aggregate weight is bounded. …
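The polytime observation can be illustrated with a textbook construction: build the OBDD by recursion on (variable index, partial weighted sum) and memoize, so the number of distinct subproblems is at most n × (2W + 1), where W is the sum of absolute weights. The sketch below assumes the neuron is a threshold unit over Boolean inputs that outputs 1 when the weighted sum reaches the threshold; the function names are hypothetical, and this shows only the compilation step, not the thesis's training method.

```python
from functools import lru_cache

FALSE, TRUE = 0, 1  # terminal node ids; internal nodes get ids >= 2

def compile_neuron(weights, threshold):
    """Compile f(x) = [sum_i w_i * x_i >= threshold], x in {0,1}^n, into a
    reduced OBDD with variable order x_0 < x_1 < ... < x_{n-1}.

    Returns (root_id, nodes) where nodes maps id -> (var_index, low, high).
    """
    n = len(weights)
    # Smallest / largest sum still reachable from variables i..n-1
    min_rest = [0] * (n + 1)
    max_rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_rest[i] = min_rest[i + 1] + min(0, weights[i])
        max_rest[i] = max_rest[i + 1] + max(0, weights[i])

    unique = {}  # (var_index, low, high) -> id; merges isomorphic subgraphs
    nodes = {}

    @lru_cache(maxsize=None)
    def build(i, partial):
        # Stop early once the output no longer depends on the remaining inputs
        if partial + min_rest[i] >= threshold:
            return TRUE
        if partial + max_rest[i] < threshold:
            return FALSE
        low = build(i + 1, partial)                # branch x_i = 0
        high = build(i + 1, partial + weights[i])  # branch x_i = 1
        if low == high:  # redundant test: skip this node
            return low
        key = (i, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2
            nodes[unique[key]] = key
        return unique[key]

    return build(0, 0), nodes

def evaluate(root, nodes, x):
    """Follow the OBDD from the root to a terminal for input bits x."""
    node = root
    while node not in (FALSE, TRUE):
        i, low, high = nodes[node]
        node = high if x[i] else low
    return node == TRUE
```

Memoizing on the partial sum is what keeps the diagram small: with bounded integer weights, each level has only polynomially many distinct partial sums, and hence polynomially many nodes.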


Using Machine Learning Techniques To Model Encoder/Decoder Pair For Non-Invasive Electroencephalographic Wireless Signal Transmission, Ernst Fanfan Jul 2023

Master of Science in Computer Science Theses

This study investigated the application and enhancement of Non-Invasive Brain-Computer Interfaces (NI-BCIs), focusing on improving the efficiency and effectiveness of this technology for individuals with severe physical limitations. The core research goal was to address the limitations of wires, noise, and invasive procedures often associated with BCI technology. The key proposed solution involves developing an optimized Encoder/Decoder (E/D) pair using machine learning techniques, particularly those borrowed from Generative Adversarial Networks (GAN) and other Deep Neural Networks, to minimize data transmission and ensure robustness against data degradation. The study highlighted the crucial role of machine learning in self-adjusting and isolating …


Influence Of Woody Vegetation Composition And Structure On Fuels And Prescribed Fire In Mountain Longleaf Restoration, Collin J. Anderson Jun 2023

Master of Science in Integrative Biology Theses

Longleaf pine (LLP) ecosystems have experienced a widespread ecological state shift, largely due to fire exclusion, which has allowed mesophytes (shade-tolerant, often fire-sensitive species) to encroach, reducing flammability and biodiversity through a process known as “mesophication.” Although prescribed fire is commonly used to reverse mesophication, fire behavior, and thus the utility of prescribed fire for this purpose, is poorly characterized in mixed pine-hardwood stands with mesophyte encroachment. This study aimed to identify mechanisms by which tree composition, structure, and fuels contribute to fire behavior, focusing on the understudied mountain longleaf pine (MLLP) ecoregion in northwest Georgia. I hypothesized that woody vegetation …


Investigating The Activity Of Alternative Warheads For Targeted Covalent Inhibition Of The Inhibitor Vertebrate Lysozyme Protein From Pseudomonas Aeruginosa, Katie Hambrick Jun 2023

Master of Science in Chemical Sciences Theses

Pseudomonas aeruginosa (P. aeruginosa) is a Gram-negative bacterium that causes blood and lung infections in hospital environments due to its ability to survive on improperly sterilized medical equipment. P. aeruginosa has developed several multidrug resistance mechanisms that make it very difficult to treat with current antibiotics.¹ This creates a need for a new class of antibiotics that P. aeruginosa’s resistance mechanisms cannot overcome.

The primary goal of this project was to develop a small library of inhibitors that could later be incorporated into lead compounds for novel antibiotic drug discovery. One of P. …


Mutualism In Architecture, Tj Rottenberg May 2023

Bachelor of Architecture Theses - 5th Year

Mutualism is a term more commonly used in ecology and can be defined as a relationship between organisms of different species in which each benefits. In architectural terms, mutualism can be defined as a relationship between different owners, buildings, typologies, or programs in which each benefits. Such relationships already exist in many forms, but not always in a mutually beneficial way. I propose that architecture be designed and built so that it relates to its surrounding infrastructure in a way that creates a mutually beneficial system, or ecosystem.

In nature there are …


Perspective Sky: A New Architectural Typology For Astronomy, Brendan Lydic May 2023

Bachelor of Architecture Theses - 5th Year

This thesis aims to reconnect modern humans to the night sky and the universe around us. This connection has been lost to a multitude of barriers and distractions: physical barriers such as air and light pollution, and distractions such as technology and overwhelming world events. I aim to restore this connection by creating a new architectural typology for the observation of and education about the night sky, the cosmos, and astronomy. It will serve as a site of pilgrimage, where visitors of all ages can re-engage with the stars and reintroduce themselves to the perspective of our ancestors. The questions I …


Quantification Of Various Types Of Biases In Large Language Models, Sudhashree Sayenju Apr 2023

Doctor of Data Science and Analytics Dissertations

Natural Language Processing (NLP) systems are used throughout the internet, from search engines and language translation to more advanced systems such as voice assistants and customer service. Since humans are always on the receiving end of NLP technologies, it is very important to analyze whether the Large Language Models (LLMs) in use are biased and therefore unfair. Most research on NLP bias has focused on societal stereotype biases embedded in LLMs. Our research, however, addresses several types of bias present in LLMs, namely model class level bias, stereotype bias, and domain bias. Model class …


Reducing Restaurant Inventory Costs Through Sales Forecasting, Tyler Mason, Chris Schoen, Trevor Gilbert, Jonathan Enriquez Apr 2023

Senior Design Project For Engineers

Family Restaurant is a local restaurant in the greater Atlanta area that serves a variety of dishes that include an assortment of 19 different proteins. Currently, Family Restaurant places protein orders based on business intuition and tends to overstock and sometimes understock. To minimize inventory costs by reducing overstocking and preventing understocking of proteins, we applied Facebook Prophet (FB Prophet), ARIMA, and XGBoost machine learning models to predict protein demand and then fed these results into a Fixed Time Period inventory model to make an overall order suggestion based on the specified time period. We trained our models on …
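The abstract does not give the inventory model's equations. A common textbook form of the fixed-time-period (periodic review) model orders enough to cover forecast demand over the review period T plus the lead time L, plus safety stock, minus stock on hand I: q = d̄(T + L) + z·σ_d·√(T + L) − I. The sketch below assumes the demand forecasts are summarized as a mean and standard deviation of daily demand, and that daily demand is roughly normal and independent; the function name and parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def fixed_period_order(mean_daily_demand, sd_daily_demand, review_days,
                       lead_days, on_hand, service_level=0.95):
    """Order quantity for a fixed-time-period (periodic review) model.

    Covers expected demand over the protection interval (review + lead time)
    plus safety stock for the target service level, minus current inventory.
    Assumes daily demand is roughly normal and independent across days.
    """
    horizon = review_days + lead_days
    z = NormalDist().inv_cdf(service_level)  # z-score for the service level
    safety_stock = z * sd_daily_demand * sqrt(horizon)
    target_level = mean_daily_demand * horizon + safety_stock
    return max(0.0, target_level - on_hand)  # never order a negative amount
```

For example, with a forecast mean of 10 units per day, a standard deviation of 2, a 7-day review period, a 3-day lead time, and 30 units on hand, `fixed_period_order(10, 2, 7, 3, 30)` returns about 80.4 units at a 95% service level.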


Azomethine-Containing Moieties For Simple And Degradable Conjugated Polymers, Kimberley Bartlett Apr 2023

Master of Science in Chemical Sciences Theses

Conjugated polymers have been studied for various applications such as electrochromics, organic photovoltaics, organic light-emitting diodes, thin film transistors, biosensors, and energy storage materials. They have received significant attention from many researchers because they are relatively inexpensive and possess a high degree of synthetic tailorability that enables tuning of optical, electrochemical, and physical properties. However, many state-of-the-art polymers require synthetic protocols that involve multiple arduous steps and harmful chemical reagents and that produce toxic byproducts. Additionally, the synthesized polymers are not typically designed to be degradable and will accumulate during waste disposal without end-of-life management strategies. Therefore, there is a need …


Fairness And Privacy In Machine Learning Algorithms, Neha Bhargava Dec 2022

Master of Science in Computer Science Theses

Roughly 2.5 quintillion bytes of data are generated daily in this digital era. Manually processing such huge amounts of data to extract useful information is nearly impossible; machine learning algorithms, with their ability to process enormous amounts of data in a fast, cost-effective, and scalable way, have become the preferred choice for gleaning useful insights and solving business problems in many domains. With this widespread use of machine learning algorithms, there have always been concerns about the ethical issues that may arise from the use of this modern technology. While achieving high accuracies, …


A Maturity Model Of Data Modeling In Self-Service Business Intelligence Software, Anna Kurenkov Dec 2022

Master of Science in Information Technology Theses

Although Self-Service Business Intelligence (SSBI) is continually being adopted in various industries, there is a lack of research focused on data modeling in SSBI. This research aims to fill that gap by proposing a maturity model for SSBI data modeling that is generalizable across different software and applicable to users of all technical backgrounds. Through an extensive literature review, a five-tier maturity model was proposed, explained, and instantiated in Power BI and Tableau. Testing showed the model to be simple and intuitive, and the research concludes that the model is applicable to enterprise SSBI environments. This research is …


Debiasing Cyber Incidents – Correcting For Reporting Delays And Under-Reporting, Seema Sangari Aug 2022

Doctor of Data Science and Analytics Dissertations

This research addresses two key problems in the cyber insurance industry: reporting delays and under-reporting of cyber incidents. Both problems must be understood to obtain a true picture of cyber incident rates. Reporting delays arise because incidents are not detected in a timely manner, whereas under-reporting arises because cyber incidents frequently go unreported due to brand damage, reputational risk, and eventual financial impacts.

The problem of reporting delays in cyber incidents is addressed by generating the distribution of reporting delays and fitting parametric distributions to it over the given domain. The reporting delay distribution was found to …
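The abstract does not name the parametric families that were fitted. As an illustration only, the sketch below fits two assumed candidate families, exponential and lognormal, by maximum likelihood and keeps the one with the lower AIC (2k − 2·log-likelihood, where k is the number of fitted parameters). The function names are hypothetical. Once a delay CDF F(t) is chosen, F(t) is the expected fraction of incidents that occurred t days ago and have already been reported.

```python
import math
import statistics

def fit_exponential(delays):
    """MLE for an exponential delay distribution: rate = 1 / mean."""
    rate = 1.0 / statistics.fmean(delays)
    loglik = len(delays) * math.log(rate) - rate * sum(delays)
    return {"rate": rate}, loglik

def fit_lognormal(delays):
    """MLE for a lognormal: mean and population std. dev. of log-delays."""
    logs = [math.log(d) for d in delays]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    loglik = sum(
        -math.log(d * sigma * math.sqrt(2 * math.pi))
        - (math.log(d) - mu) ** 2 / (2 * sigma ** 2)
        for d in delays
    )
    return {"mu": mu, "sigma": sigma}, loglik

def select_delay_model(delays):
    """Fit each candidate and return (name, params, aic) with the lowest AIC.
    Delays must be strictly positive (e.g., days from occurrence to report)."""
    candidates = {
        "exponential": (fit_exponential, 1),  # (fitter, number of parameters)
        "lognormal": (fit_lognormal, 2),
    }
    best = None
    for name, (fit, k) in candidates.items():
        params, loglik = fit(delays)
        aic = 2 * k - 2 * loglik
        if best is None or aic < best[2]:
            best = (name, params, aic)
    return best
```

Other candidate families (Weibull, gamma, and so on) can be added to the `candidates` table in the same way; AIC penalizes the extra parameters so that a more flexible family wins only when it fits noticeably better.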


Growth Outcomes Of Pseudomonas Aeruginosa Inhibitor Of Vertebrate Lysozyme Knockouts In Conditions Mimicking The Cystic Fibrosis Lung Environment, Amani Gaddy Jul 2022

Master of Science in Chemical Sciences Theses

Pseudomonas aeruginosa (PA) is a Gram-negative bacterium often found in cystic fibrosis (CF) patients that can lead to declining lung function and premature death in 80% of infected patients when microcolonies form within the mucin of the lung. Because of PA's high capacity for antibiotic resistance, an alternative defense strategy against bacterial invasion by PA is to harness the antibacterial activity of our own innate immune system, using elements such as lysozyme. Pseudomonas aeruginosa inhibitor of vertebrate lysozyme class 1 (Ivyp1) is a periplasmic protein produced by Gram-negative bacteria that inhibits the enzymatic activity of …


The Stakeholder-Profile Framework For Tacit Knowledge Acquisition In Requirements Elicitation Interviews, Rasha Eltigani May 2022

Master of Science in Software Engineering Theses

The stakeholder’s tacit knowledge is a crown jewel of requirements elicitation, and in turn of software development at large. This critical element strongly influences the outcome and quality of the requirements, and therefore of the development endeavor as a whole. Because it is tacit by nature, this knowledge is covert and deeply hidden within stakeholders’ minds, making it extremely difficult to articulate and relay, and even harder to elicit and use. Additionally, the literature reports a scarcity of theorizations and solutions for addressing this challenge, posing a key and …


The Identification And Quantification Of Organophosphate Flame Retardants In Raw Materials Utilized In Polyurethane Production, Lindsay Tudor May 2022

Master of Science in Chemical Sciences Theses

Flame retardant additives are utilized in various polyurethane applications to comply with industry and flammability standards. Federal regulations and restrictive standards continue to tighten controls on certain impurities found in flame retardant blends. These restrictions include the limitation of organophosphates. Extensive research has been conducted to examine the effects of organophosphates in humans. To date, researchers have predominantly focused on the final product, mainly furniture and bedding products, rather than the raw materials utilized in production. Industrial chemists continue to grapple with the development of an effective method for detecting and quantifying organophosphate compounds in various flame retardant mixtures as …


Analyzing The Interactions Of Thermoresponsive Coacervate-Forming Biodegradable Polyester Encapsulation On Model Protein Structure And Activity, Conner Casterline May 2022

Master of Science in Chemical Sciences Theses

Protein therapeutics show high efficacy in the treatment of various diseases, including cancer and diabetes. However, their treatment cost is generally higher than that of other therapeutics, mainly due to in vivo protein degradation. This drawback creates demand for more efficient delivery methods that preserve the function and integrity of protein therapeutics. Thermoresponsive coacervate-forming biodegradable polyesters (TR-PEs) are a thermoresponsive molecular packaging system used in protein therapeutic research. The term coacervate refers to a phase-separated solution in which a dense polymer phase separates from the aqueous phase to form nanodroplets within solution, capturing bioactive molecules. Limited research has examined whether TR-PEs can encapsulate and …


Novel Instance-Level Weighted Loss Function For Imbalanced Learning, Trent Geisler May 2022

Doctor of Data Science and Analytics Dissertations

Binary classification using imbalanced datasets remains a challenge. Typically, supervised learning algorithms minimize the binary cross-entropy objective function to determine the final parameter estimates. This objective function assumes an equal class distribution between the minority (i.e., events) and majority (i.e., non-events) classes, which almost never exists in real-world modeling. In the imbalanced data setting, the equal class distribution is grossly violated, and the resulting parameter estimates are biased toward the majority class. To overcome the bias and improve model generalization, we focus on modifying the original binary cross-entropy objective function by uniquely weighting each minority class observation. We base our …
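The abstract is truncated before it describes how the weights are chosen, so the sketch below only shows the general shape of an instance-weighted binary cross-entropy: each minority (positive) observation i gets its own weight w_i, while majority observations keep weight 1, giving L = −(1/N) Σ [w_i·y_i·log p_i + (1 − y_i)·log(1 − p_i)]. The function name is hypothetical, and computing the weights is left to the caller.

```python
import math

def instance_weighted_bce(y_true, p_pred, minority_weights, eps=1e-12):
    """Binary cross-entropy where each positive (minority) example has its
    own weight; negatives keep weight 1.

    y_true: 0/1 labels; p_pred: predicted probabilities of class 1;
    minority_weights: per-example weights (ignored where y_true == 0).
    With all weights equal to 1 this reduces to the standard BCE.
    """
    total = 0.0
    for y, p, w in zip(y_true, p_pred, minority_weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -w * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)
```

A common class-level baseline sets every positive weight to n_negative / n_positive; an instance-level scheme instead lets w_i vary from one minority observation to the next.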


Determining The Structure And Activity Of A Pseudomonas Aeruginosa Protein, Ivyp2, Katherine Letsinger May 2022

Master of Science in Chemical Sciences Theses

Various biophysical methods were employed to structurally characterize and assess the activity of an important resistance factor (Ivyp2) from the multidrug-resistant Gram-negative bacterium Pseudomonas aeruginosa. This opportunistic pathogen accounts for approximately 10% of all hospital-acquired infections in the United States and contains a number of virulence factors that aid in its ability to infect and colonize immunocompromised individuals and those with cystic fibrosis. One of these factors, inhibitor of vertebrate lysozyme (Ivy), neutralizes the lytic activity of lysozyme, an antimicrobial enzyme of the innate immune system that hydrolyzes the linkages between bacterial cell wall subunits. …


Probing Convergence In Orthogonal Conjugated Catalysis By T4 Lysozyme Utilizing Biophysical Characterization, William Turner May 2022

Master of Science in Chemical Sciences Theses

Conjugated polymers have become highly attractive because they afford unique material properties that make them promising for a wide range of applications, such as photovoltaics and drug delivery systems. However, these conjugated polymers require extensive synthetic steps involving hazardous organic solvents or metal-based catalysts, yielding toxic waste streams. To remedy this, enzymes have emerged as a highly valuable alternative for synthesizing these polymers, because they can be produced under environmentally benign conditions and have played a pivotal role in various biosynthetic strategies in recent years. Serving as model systems, lysozymes have been shown to polymerize 2-ethynylpyridine (2-EP) via …


Synthesis, Characterization, And Thermal Investigation Of Metal Phosphites And Potential Implications For Astrobiology, Kimberly Faye Meyberg Apr 2022

Master of Science in Chemical Sciences Theses

The role of phosphorus in biochemistry is well understood. However, the route by which phosphorus was incorporated into early biomolecules on the prebiotic Earth is uncertain. Phosphate, the most prevalent species of phosphorus found in Earth’s geological record, is insoluble and unreactive with organics in aqueous environments. While the most abundant biogenic elements (C, N, H, O, and S) can be found in a volatile phase under terrestrial conditions, phosphorus cannot, suggesting that minerals must have been the main sources of phosphorus on the early Earth. One possible explanation is that phosphite was a major source of reduced, reactive phosphorus …


Simplified Synthesis Of Conjugated Polymers Enabled Via 1,4-Dihydropyrrolo[3,2-B]Pyrrole, Kenneth-John Jack Bell Apr 2022

Master of Science in Chemical Sciences Theses

Conjugated polymers have attracted significant attention as the active layer material in organic electronics, such as organic photovoltaics and light-emitting diodes, partly due to the ability to influence a broad range of properties through structural design motifs. However, high-performance conjugated polymers require numerous synthetic steps, generate toxic waste, and demand harsh reaction conditions, all of which add costs that inhibit their widespread use. Therefore, an emphasis on reducing synthetic complexity and using abundant, commercially available starting materials is needed for organic electronics to reach their full potential. Dihydropyrrolo[3,2-b]pyrrole (H2DPP) chromophores offer a simple one-pot synthesis …


A Distance-Based Clustering Framework For Categorical Time Series: A Case Study In Episodes Of Care Healthcare Delivery System, Lauren Staples Dec 2021

Doctor of Data Science and Analytics Dissertations

Understanding how compensation structures influence overall healthcare costs is a central issue in health economics. Episodes of Care (EoC) is a compensation structure that bundles payments for healthcare interventions that belong to a well-defined health event. Since the variation of clinical pathways can drive the cost of healthcare, this research uses sequences of medical billing codes in Perinatal Episodes of Care claims data to study the extent of that variation by equating it to the number of reproducible clusters found. This research proposes a methodological framework to detect reproducible clusters in an unsupervised problem where the true number of clusters …
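The abstract does not specify the distance measure or clustering algorithm used. As one illustrative combination (both are assumptions), the sketch below compares sequences of billing codes with Levenshtein edit distance and groups them with a simple k-medoids loop, which works directly on a precomputed distance matrix. Function names and the example codes in the test are hypothetical, and the sketch does not address how reproducible clusters are detected.

```python
def levenshtein(a, b):
    """Edit distance between two sequences of categorical codes."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def k_medoids(dist, k, max_iter=100):
    """Alternate assignment and medoid update on a distance matrix."""
    n = len(dist)
    # Farthest-first initialization: start at item 0, then repeatedly add
    # the item farthest from all medoids chosen so far
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = []
    for _ in range(max_iter):
        # Assign each item to its nearest medoid
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member with the smallest total distance
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids
```

k-medoids is used here because it needs only pairwise distances, which suits categorical sequences that have no meaningful mean.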


Integrated Machine Learning Approaches To Improve Classification Performance And Feature Extraction Process For Eeg Dataset, Mohammad Masum Jul 2021

Doctor of Data Science and Analytics Dissertations

Epilepsy is a chronic neurological disorder caused by abnormal activity of brain neurons and affects approximately 50 million people worldwide. Epilepsy can affect patients’ health and lead to life-threatening emergencies. Early detection of epilepsy enables timely treatment, which is highly effective in preventing seizures. The electroencephalogram (EEG) signal, which contains valuable information about electrical activity in the brain, is a standard neuroimaging tool used by clinicians to monitor and diagnose epilepsy. Visually inspecting the EEG signal is an expensive, tedious, and error-prone practice. Moreover, different neurophysiologists may reach different results for an identical reading. Thus, …


Method Development Of Standard Dilution Analysis On Molecules And Dissolution Studies Of Ibuprofen Tablets Along With Common Beverage Constituents, Scott Richardson Jul 2021

Master of Science in Chemical Sciences Theses

The United States Pharmacopeia sets the standards for the manufacturing, storage, and analysis of medicinal formulations. One such analysis, dissolution testing, evaluates the rate at which a medicinal formulation forms a solution in order to predict in vivo drug release. Dissolution testing of ibuprofen tablets alone and in the presence of ascorbic acid or caffeine was performed to mimic administration with orange juice or caffeinated soft drinks and to assess their impact on the dissolution rate of ibuprofen. Results using the external calibration method produced a dissolution rate of ibuprofen that decreased 4% in the presence of ascorbic acid and increased 1% in …


Desorption Of Non-Exchangeable Radiocesium In Soil From Fukushima Prefecture, Japan, Grayson Phillips Jul 2021

Master of Science in Chemical Sciences Theses

The Fukushima Daiichi Nuclear Power Plant was struck by an earthquake and a tsunami, resulting in the meltdown of four of the six reactor cores operating at the plant. As a result, nuclear waste was released from the plant, contaminating the soil in the region. Most of the contamination was sequestered within the top few inches of the soil, but unfortunately the contaminant, radiocesium, formed a non-exchangeable bond. The Japanese government bagged this soil and has stored it in fields surrounding the exclusion zone. Long-term storage facilities have not been determined. This is a study of the available resources to determine if, …


Using Big Data Analytics To Optimize Practical Large Databases, Po-Chun Lu Jul 2021

Master of Science in Computer Science Theses

Big data analytics is gaining popularity among enterprises for optimizing their business processes, in settings ranging from retailers and supply chains to online shopping stores. However, existing practical raw data are far from usable for this goal. Therefore, a good data pre-processing approach is required and is a key step to success. We propose to study the effectiveness of data pre-processing and the business process based on a real-world database. Our methodology involves natural language processing. Our key goal is to study appropriate methods with big data analysis techniques that can handle errors, ambiguity, and repeated descriptions caused by human languages. …