Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

2019

Articles 1 - 30 of 36

Full-Text Articles in Physical Sciences and Mathematics

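Topik Modeling Penelitian Dosen Jptei Uny Pada Google Scholar Menggunakan Latent Dirichlet Allocation (Topic Modeling of JPTEI UNY Lecturers' Research on Google Scholar Using Latent Dirichlet Allocation), Akhsin Nurlayli, Moch. Ari Nasichuddin Dec 2019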

Elinvo (Electronics, Informatics, and Vocational Education)

Mapping lecturers' research topics is necessary to determine the research tendencies of a department or study program. This study applies topic modeling to the titles of publications by lecturers of the Department of Electronics and Informatics Engineering Education, Universitas Negeri Yogyakarta (JPTEI UNY), collected from Google Scholar. The topic modeling method used is Latent Dirichlet Allocation (LDA), a generative probabilistic model for finding the semantic structure of a corpus based on hierarchical Bayesian analysis. After the topic modeling process, the results showed that JPTEI UNY lecturers tend to have …
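
The abstract names LDA but not the inference procedure. As a rough illustration only, the sketch below fits LDA with collapsed Gibbs sampling, one standard approach, in plain Python. The function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, the iteration count, and the toy titles are assumptions, not the authors' configuration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, phi): theta[d][k] estimates p(topic k | document d) and
    phi[k][w] estimates p(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]               # tokens of topic k in document d
    n_kw = [defaultdict(int) for _ in range(K)]  # occurrences of word w in topic k
    n_k = [0] * K                                # total tokens assigned to topic k
    z = []                                       # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts ...
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ... resample its topic from the full conditional ...
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # ... and put it back under the new topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
           for k in range(K)]
    return theta, phi


if __name__ == "__main__":
    # Hypothetical lower-cased, tokenized publication titles.
    titles = ["robot sensor control", "sensor network robot",
              "learning media student", "student learning evaluation"]
    theta, phi = lda_gibbs([t.split() for t in titles], num_topics=2)
    for k, dist in enumerate(phi):
        print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

With real titles one would normally remove stop words first and choose the number of topics by inspection or a coherence score; the most probable words of each topic in `phi` are what gets read as a research tendency.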


The Application Of Gray-Scale Level-Set Method In Segmentation Of Concrete Deck Delamination Using Infrared Images, Chongsheng Cheng, Zhigang Shen Nov 2019

Department of Construction Engineering and Management: Faculty Publications

Conventional nondestructive delamination detection of concrete pavements through thermography is often based on temperature contrasts between delaminated and sound areas. Non-uniform backgrounds caused by environmental conditions often make it difficult for contrast-based methods to robustly differentiate delaminated areas from sound areas. Instead of focusing on temperature contrast, this study proposes a temperature-gradient-based level set method (LSM) to detect boundaries for delamination segmentation. A modified edge indicator function is developed to represent the normalized temperature gradient of a thermal image. An experimental study was conducted to evaluate its applicability and stability for boundary detection in terms of …
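
The paper's modified edge indicator is not given in the abstract. For orientation, the sketch below computes the classic edge-stopping function of edge-based level set models, g = 1 / (1 + k·|∇T|²), on a temperature grid whose gradient is first normalized by its maximum. The sharpness parameter `k` and all function names are hypothetical; this is not the authors' modified function.

```python
def _central_diff(values, i):
    """Derivative estimate at index i: central difference inside, one-sided at the ends."""
    if len(values) == 1:
        return 0.0
    lo, hi = max(i - 1, 0), min(i + 1, len(values) - 1)
    return (values[hi] - values[lo]) / (hi - lo)


def gradient_magnitude(img):
    """|grad T| for a 2-D temperature grid given as a list of rows."""
    cols = list(zip(*img))  # column views for the vertical derivative
    return [
        [(_central_diff(row, c) ** 2 + _central_diff(cols[c], r) ** 2) ** 0.5
         for c in range(len(row))]
        for r, row in enumerate(img)
    ]


def edge_indicator(img, k=100.0):
    """Classic g = 1 / (1 + k * |grad T|_norm^2), with the gradient scaled to [0, 1].

    g stays near 1 in flat regions and drops toward 0 on strong temperature
    gradients, which is where an evolving level set should stop.
    """
    mag = gradient_magnitude(img)
    peak = max(max(row) for row in mag) or 1.0  # avoid dividing by zero on a flat image
    return [[1.0 / (1.0 + k * (m / peak) ** 2) for m in row] for row in mag]
```

Because g responds to gradients rather than to absolute temperature, a slowly varying background shifts it much less than it would shift a fixed contrast threshold, which matches the motivation the abstract gives for a gradient-based approach.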


Information Extraction From Primary Care Visits To Support Patient-Provider Interactions, Daniel Baruch Gutstein Nov 2019

College of Computing and Digital Media Dissertations

The extent of electronic health record system usage in clinical settings has affected the dynamic between clinicians and patients and has thus been connected to physician morale and the quality of care patients receive. Recent research has also uncovered a correlation between physician burnout and negative physician attitudes toward electronic health record systems. To begin exploring the relationship between electronic health record usage, physician burnout, and patient care, it is necessary to first analyze patient-provider interactions in terms of verbal features such as turn-taking and non-verbal features such as eye contact. While previous works have sought …


Numerical, Secondary Big Data Quality Issues, Quality Threshold Establishment, & Guidelines For Journal Policy Development, Anita Lee-Post, Ram Pakath Nov 2019

Marketing & Supply Chain Faculty Publications

An information systems (IS) researcher may obtain Big Data from primary or secondary data sources. Sometimes, acquiring primary Big Data is infeasible due to availability, accessibility, cost, time, and/or complexity considerations. In this paper, we focus on Big Data-based IS research and discuss ways in which one may, post hoc, establish quality thresholds for numerical Big Data obtained from secondary sources. We also present guidelines for developing journal policies aimed at ensuring the veracity and verifiability of such data when used for research purposes.


Automated Morgan Keenan Classification Of Observed Stellar Spectra Collected By The Sloan Digital Sky Survey Using A Single Classifier, Michael J. Brice, Răzvan Andonie Oct 2019

All Faculty Scholarship for the College of the Sciences

The classification of stellar spectra is a fundamental task in stellar astrophysics. Two standard classification methods, k-nearest neighbors and random forest, are applied to stellar spectra from the Sloan Digital Sky Survey to classify them automatically. Stellar spectra are high-dimensional data, so their dimensionality is reduced using astronomical knowledge, because the classifiers work in a low-dimensional space. These methods are used to assign stellar spectra a complete Morgan Keenan classification (spectral type and luminosity class) with a single classifier. The motion of stars (radial velocity) causes complications for machine learning through its effect on the feature matrix when classifying stellar spectra. Due to the nature …
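
A minimal k-nearest-neighbors sketch of the single-classifier idea: each training label is a full MK class (for example "G2V"), so one vote predicts spectral type and luminosity class together. The two-number feature vectors stand in for the reduced features and are hypothetical, as is `k = 3`. The non-relativistic Doppler correction is a standard formula for moving spectra to the rest frame, not necessarily how the authors handled radial velocity.

```python
import math
from collections import Counter

C_KMS = 299_792.458  # speed of light in km/s


def to_rest_frame(wavelengths, radial_velocity_kms):
    """Undo a non-relativistic Doppler shift: lambda_rest = lambda_obs / (1 + v / c)."""
    factor = 1.0 + radial_velocity_kms / C_KMS
    return [w / factor for w in wavelengths]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training spectra (Euclidean distance)."""
    nearest = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # ties go to the label seen first, i.e. the nearer one


if __name__ == "__main__":
    # Hypothetical reduced features (e.g. two line-strength indices) per spectrum.
    train_X = [(0.10, 0.90), (0.15, 0.85), (0.80, 0.20), (0.85, 0.25), (0.50, 0.50)]
    train_y = ["G2V", "G2V", "M0III", "M0III", "K0V"]
    print(knn_predict(train_X, train_y, (0.12, 0.88)))
```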


What About The Environment?: Exploring The Neglected Third Dimension Of Antimicrobial Resistance, Paige E. Montfort Oct 2019

Independent Study Project (ISP) Collection

Antimicrobial resistance (AMR) is one of the most urgent and complex health risks of our time, with links to human health, animal health, and the environment. However, most research and policy related to AMR has been dedicated to human and animal health, while the third dimension, the environment, has been relatively neglected. Conversations about this problem have begun, but gaps in understanding remain. This study explores the key barriers that have hindered progress on the environmental aspect of AMR and some of the solutions that have begun to be used, or could be used, to overcome these barriers. …


Eavesdropping Hackers: Detecting Software Vulnerability Communication On Social Media Using Text Mining, Susan Mckeever, Brian Keegan, Andrei Quieroz Sep 2019

Conference papers

The cybersecurity field is striving to find new forms of protection against hacker attacks. An emerging approach is to investigate security-related messages exchanged on Deep Web, Dark Web, and even Surface Web channels. This approach can be supported by supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations, and dimensionality reduction approaches in terms of their accuracy in detecting software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost the detection accuracy of our models. In addition, we examine how …
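
The abstract does not say which sampling approaches were compared. Random oversampling of the minority class is one common baseline, sketched below; the function name and the toy labels (1 = vulnerability-related message) are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out
```

Oversampling should be applied to the training split only, after the train/test split; otherwise duplicated minority messages leak into the evaluation data and inflate the reported accuracy.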


Integrating Intelligent Decision Support Services Into Analytical Management Systems, Mastura Zaynutdinova, Odilbek Asqaraliyev Sep 2019

Bulletin of TUIT: Management and Communication Technologies

The article discusses an approach to designing organizational management systems that makes it possible to use the theoretical and practical results of artificial intelligence research in their design. It emphasizes the importance of using accumulated experience to support decision-making in organizational management systems. In addition, recommendations are developed for intelligent decision support in integrated management systems and for the introduction of service technologies. Using management decision support as an example, optimal solutions are put forward for intelligent information processing and for creating services suited to expert systems. …


Precision Agriculture Gis Technologies For Mississippi, 1st. Edition, Amelia A.A. Fox Aug 2019

College of Agriculture & Life Sciences Publications and Scholarship

Precision agriculture is meant to improve on-farm efficiency in hopes of ultimately increasing profitability while also protecting the environment. However, this difficult process almost always includes the proper management and interpretation of data. Therefore, it is imperative that those involved in making such decisions are educated in these processes. In a data-driven world, this textbook is a valuable resource for those who want to learn how to use their data to make better-informed on-farm decisions.


Development Of An Algorithm For The Intelligent Decision Support Systems In The Field Of Tax Authorities, Odilbek Askaraliev Aug 2019

Bulletin of TUIT: Management and Communication Technologies

An algorithm for intelligent decision support in integrated management systems has been developed: its main components are described, its structure is formed, and algorithms are developed that operate in several modes to increase management efficiency, using tax authorities as an example. An alternative-solution structure has been designed and developed for management decision-making, along with an integrated functional structure for the overall decision-making system. The operation of the proposed intelligent decision-making system for an integrated management system is shown, and ways of improving management efficiency through intelligent decision support are analyzed. An …


Model Of The State Of Threats To The Access Control System, Durdona Irgasheva Aug 2019

Bulletin of TUIT: Management and Communication Technologies

This article presents a threat state model of access control. The model makes it possible to calculate the probabilities that threats affect the access control system and the probability that this system is breached, taking into account a generalized algorithm for carrying out external threats, and it identifies the need to develop additional components of the access control system designed to identify and classify attacks.


Methodology Of Management Monitoring For Flow Rivers Issue Issue, Toshtemir Khojakulov Aug 2019

Bulletin of TUIT: Management and Communication Technologies

The article reviews the development of all the goals of the UzB, which are “... shaped in the form of global aspirations, and each government sets its own national goals.” In cases where several states are monitoring a single transboundary watershed, efforts should be made to agree on targets for all countries. Water quality comparison.


Innovative Solutions For State Medicaid Programs To Leverage Their Data, Build Their Analytic Capacity, And Create Evidence-Based Policy, Lauren Adams, Susan Kennedy, Lindsay Allen, Andrew Barnes, Tom Bias, Dushka Crane, Paul Lanier, Rachel Mauk, Shamis Mohamoud, Nathan Pauly, Jeffery C. Talbert, Cynthia Woodcock, Kara Zivin, Julie Donohue Aug 2019

Pharmacy Practice and Science Faculty Publications

As states have embraced additional flexibility to change coverage of and payment for Medicaid services, they have also faced heightened expectations for delivering high-value care. Efforts to meet these new expectations have increased the need for rigorous, evidence-based policy, but states may face challenges finding the resources, capacity, and expertise to meet this need. Drawing on state-university partnerships in more than 20 states, this commentary describes innovative solutions for states that want to leverage their own data, build their analytic capacity, and create evidence-based policy. From an integrated web-based system to improve long-term care to evaluating the impact of permanent …


Data-Driven Decision-Support For Process Improvement Through Predictions Of Bed Occupancy Rates, Kar Way Tan, Qi You Ng, Francis Ngoc Hoang Long Nguyen, Sean Shao Wei Lam Aug 2019

Research Collection School Of Computing and Information Systems

Managing bed utilization and ensuring that supply keeps up with demand is not an easy task in a large public hospital with many medical disciplines. The bed managers who make decisions on reserving and allocating beds centrally require high-dimensional data from several hospital information systems supporting the emergency room, specialized clinics, and bed management processes. In this work, we put together an automated process for cleaning, consolidating, and integrating data from several hospital information systems into the reports that bed managers require to analyse bed occupancy across more than thirty medical disciplines. To prevent bed crunch situations …


Mathematical And Computer Simulation Of The Processes Of Two-Phase Joint Gas Filtration And Water In A Porous Environment, Elmira Nazirova Jul 2019

Bulletin of TUIT: Management and Communication Technologies

A mathematical model, methods and algorithms for the numerical solution of problems of joint gas-water filtration in porous media are considered. The mathematical model of the process of non-stationary joint gas-water filtration in a porous medium is described by a system of nonlinear differential equations of parabolic type. In the numerical solution of the boundary value problem of gas displacement by water in a porous medium, the differential sweeping method is used for systems of differential-difference equations. The system of differential-difference equations with respect to the gas pressure function is nonlinear, therefore, an iterative method is used for it, based …
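
The sweep method named in the abstract has a well-known fully discrete counterpart for tridiagonal systems, the Thomas algorithm. The sketch below is a deliberately simplified stand-in, not the paper's gas-water model: one backward-Euler step of a single 1-D nonlinear parabolic equation u_t = (k(u) u_x)_x, where the nonlinearity is handled by Picard iteration and each linearized system is solved by a forward sweep and back substitution. The coefficient `k(u)`, grid spacing, and Dirichlet boundary values are hypothetical.

```python
def thomas(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0] and c[-1] are ignored)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward sweep
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_step(u, dt, dx, k, left, right, tol=1e-10, max_iter=50):
    """One backward-Euler step of u_t = (k(u) u_x)_x with fixed end values.

    Picard iteration: evaluate k at the latest iterate, solve the resulting
    linear tridiagonal system by the sweep, repeat until the iterates agree.
    """
    n = len(u)
    r = dt / dx ** 2
    it = list(u)
    for _ in range(max_iter):
        kf = [0.5 * (k(it[i]) + k(it[i + 1])) for i in range(n - 1)]  # face coefficients
        a, b, c, d = [0.0] * n, [1.0] * n, [0.0] * n, list(u)
        d[0], d[-1] = left, right  # Dirichlet rows: x[0] = left, x[-1] = right
        for i in range(1, n - 1):
            a[i], c[i] = -r * kf[i - 1], -r * kf[i]
            b[i] = 1.0 + r * (kf[i - 1] + kf[i])
        new = thomas(a, b, c, d)
        converged = max(abs(p - q) for p, q in zip(new, it)) < tol
        it = new
        if converged:
            break
    return it
```

Because k is positive, each linearized matrix is diagonally dominant, which keeps the sweep stable without pivoting.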


Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan May 2019

Dissertations

Spatial and temporal dependencies are ubiquitous properties of data in numerous domains, and the popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. Spatial and temporal attributes not only provide complementary, useful perspectives but also pose new challenges for how they are represented and integrated into the learning procedure. In this dissertation, these spatial and temporal dependencies are explored in three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly to represent the dependencies in each scenario.

First, dependencies among discrete, continuous and repeated observations are studied …


On Properties Of Distance-Based Entropies On Fullerene Graphs, Modjtaba Ghorbani, Matthias Dehmer, Mina Rajabi-Parsa, Abbe Mowshowitz, Frank Emmert-Streib May 2019

Publications and Research

In this paper, we study several distance-based entropy measures on fullerene graphs. These include the topological information content of a graph $I_a(G)$, a degree-based entropy measure, the eccentric-entropy $I_{fs}(G)$, the Hosoya entropy $H(G)$, and, finally, the radial centric information entropy $H_{ecc}$. We compare these measures on two infinite classes of fullerene graphs, denoted $A_{12n+4}$ and $B_{12n+6}$. We chose these measures because they are easily computable and capture meaningful graph properties. To demonstrate their utility, we investigate the Pearson correlation between them on the fullerene graphs.
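
The abstract lists a degree-based entropy without defining it. The sketch below assumes one common definition, the Shannon entropy of normalized degrees p_i = deg(v_i) / Σ_j deg(v_j), together with a plain Pearson correlation for comparing two measures across a family of graphs. The adjacency-dict representation is an assumption, and generators for the two fullerene classes are not shown.

```python
import math


def degree_entropy(adj):
    """Shannon entropy of p_i = deg(v_i) / sum_j deg(v_j) for an adjacency dict."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    total = sum(degrees)
    return -sum((d / total) * math.log(d / total) for d in degrees if d > 0)


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Every fullerene graph is 3-regular, so under this particular definition all p_i equal 1/n and the entropy reduces to log n; it then tracks only the number of vertices. The distance-based measures in the paper need shortest-path information that this sketch does not compute.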


Do Misperceptions Of Peer Drinking Influence Personal Drinking Behavior? Results From A Complete Social Network Of First-Year College Students, Melissa J. Cox, Angelo M. Dibello, Matthew K. Meisel, Miles Q. Ott, Shannon R. Kenney, Melissa A. Clark, Nancy P. Barnett May 2019

Statistical and Data Sciences: Faculty Publications

This study considered the influence of misperceptions of typical versus self-identified important peers' heavy drinking on personal heavy drinking intentions and frequency utilizing data from a complete social network of college students. The study sample included data from 1,313 students (44% male, 57% White, 15% Hispanic/Latinx) collected during the fall and spring semesters of their freshman year. Students provided perceived heavy drinking frequency for a typical student peer and up to 10 identified important peers. Personal past-month heavy drinking frequency was assessed for all participants at both time points. By comparing actual with perceived heavy drinking frequencies, measures of misperceptions …


Watersheds For Semi-Supervised Classification, Aditya Challa, Sravan Danda, B. S.Daya Sagar, Laurent Najman May 2019

Journal Articles

The watershed technique from mathematical morphology (MM) is one of the most widely used operators for image segmentation. Recently, watersheds have been adapted to edge-weighted graphs, allowing for wider applicability. However, a few questions remain to be answered: How do the boundaries of the watershed operator behave? Which loss function does the watershed operator optimize? How does the watershed operator relate to existing ideas from machine learning? In this letter, a framework is developed that allows one to answer these questions. This is achieved by generalizing the maximum margin principle to maximum margin partition and proposing a generic solution, morphMedian, resulting …


Forecasting The Number Of Monthly Active Facebook And Twitter Worldwide Users Using Arma Model, Qasem Abu Al-Haija, Qian Mao, Kamal Al Nasr Apr 2019

Computer Science Faculty Research

In this study, an Auto-Regressive Moving Average (ARMA) model of optimal order is developed to estimate and forecast the short-term future numbers of monthly active Facebook and Twitter users worldwide. To pick the optimal estimation order, we analyzed the model order versus the corresponding model error in terms of final prediction error. The simulation results showed that the optimal model orders for the Facebook and Twitter time series are ARMA[5, 5] and ARMA[3, 3], respectively, since they correspond to the minimum acceptable prediction error values. Besides, the optimal models recorded a high level of …
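
Fitting a full ARMA(p, q) model requires nonlinear estimation, so the sketch below swaps in the simpler pure autoregressive case to illustrate the order-versus-error analysis: AR(p) models are fitted by Yule-Walker (Levinson-Durbin recursion) for p = 1..max_order, and the order minimizing one common form of Akaike's final prediction error, FPE = σ̂²_p (N + p) / (N − p), is selected. The simulated AR(2) series stands in for the user-count data; this is not the authors' ARMA fit.

```python
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of the mean-removed series."""
    n = len(x)
    m = sum(x) / n
    xc = [v - m for v in x]
    return [sum(xc[t] * xc[t + lag] for t in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, max_order):
    """Yule-Walker AR fits for orders 1..max_order: list of (coefficients, noise variance)."""
    phi, err, fits = [], r[0], []
    for p in range(1, max_order + 1):
        kappa = (r[p] - sum(phi[j] * r[p - 1 - j] for j in range(p - 1))) / err
        phi = [phi[j] - kappa * phi[p - 2 - j] for j in range(p - 1)] + [kappa]
        err *= 1.0 - kappa ** 2
        fits.append((phi, err))
    return fits


def select_ar_order(x, max_order):
    """Pick the AR order minimizing FPE = sigma2 * (N + p) / (N - p)."""
    n = len(x)
    fits = levinson_durbin(autocovariance(x, max_order), max_order)
    fpe = [err * (n + p) / (n - p) for p, (_, err) in enumerate(fits, start=1)]
    best = min(range(len(fpe)), key=fpe.__getitem__) + 1
    return best, fpe, fits


if __name__ == "__main__":
    # Simulated AR(2) series standing in for the (unavailable) monthly user counts.
    rng = random.Random(1)
    x, prev1, prev2 = [], 0.0, 0.0
    for t in range(2200):
        v = 0.6 * prev1 - 0.5 * prev2 + rng.gauss(0, 1)
        prev2, prev1 = prev1, v
        if t >= 200:  # drop burn-in
            x.append(v)
    best, fpe, _ = select_ar_order(x, max_order=6)
    print("selected AR order:", best)
```

For an ARMA search one would loop over a grid of (p, q), fit each model with a nonlinear estimator, and use d = p + q parameters in the FPE penalty.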


A Grammar For Reproducible And Painless Extract-Transform-Load Operations On Medium Data, Benjamin S. Baumer Apr 2019

Statistical and Data Sciences: Faculty Publications

Many interesting datasets available on the Internet are of a medium size—too big to fit into a personal computer’s memory, but not so large that they would not fit comfortably on its hard disk. In the coming years, datasets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstructions hamper thorough peer-review and thus disrupt the forward progress of science. We propose a predictable and pipeable framework for R (the state-of-the-art statistical computing environment) …


Exploring And Visualizing Household Electricity Consumption Patterns In Singapore: A Geospatial Analytics Approach, Yong Ying Tan, Tin Seong Kam Apr 2019

Research Collection School Of Computing and Information Systems

Although Singapore is a small country-state, its electricity consumption is said to be non-homogeneous: exploratory data analysis showed that the distributions of electricity consumption differ across and within administrative boundaries and dwelling types. Local indicators of spatial association (LISA) were calculated for public housing postal codes using June 2016 data to discover local clusters of households based on electricity consumption patterns. A detailed walkthrough of the analytical process describes the R packages and framework used in the R environment. The LISA results are visualized at three levels: country level, regional level, and planning subzone level. At …
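
The authors worked in R; the sketch below is only an illustration, in Python, of the usual local Moran's I statistic behind LISA: I_i = (z_i / m₂) Σ_j w_ij z_j, where z are deviations from the mean, m₂ = Σ z² / n, and binary contiguity weights are row-standardized. The neighbor structure and values are hypothetical, and the permutation test normally used to judge significance is omitted.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j with row-standardized weights.

    values: one observation per area (e.g. mean consumption per postal code).
    neighbors: dict mapping an area index to the indices of its neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    out = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0  # spatial lag of z
        out.append(z[i] * lag / m2)
    return out
```

A positive I_i marks an area similar to its neighbors (high-high or low-low, told apart by the sign of z_i); a negative I_i marks a spatial outlier.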


Bridge Deck Delamination Segmentation Based On Aerial Thermography Through Regularized Grayscale Morphological Reconstruction And Gradient Statistics, Chongsheng Cheng, Zhexiong Shang, Zhigang Shen Mar 2019

Department of Construction Engineering and Management: Faculty Publications

Environmental and surface-texture-induced temperature variation across the bridge deck is a major source of error in delamination detection through thermography. This type of external noise poses a significant challenge for conventional quantitative methods such as global thresholding and k-means clustering. An iterative top-down approach is proposed for delamination segmentation based on grayscale morphological reconstruction. A weight-decay function was used to regularize the reconstruction for regional maxima extraction. The mean and coefficient of variation of the temperature gradient estimated from delamination boundaries were used for discrimination. The proposed approach was tested on a lab experiment and an in-service bridge deck. The …
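
The weight-decay regularization is not described in the abstract, so the sketch below shows only the unregularized building block: grayscale reconstruction by dilation with 4-connectivity, and the "h-dome" transform derived from it, a standard way to extract regional maxima that rise at least roughly h above their surroundings. The height `h` and the function names are assumptions.

```python
def reconstruct_by_dilation(marker, mask):
    """Grayscale reconstruction of `marker` under `mask` (4-connectivity), iterated to stability."""
    rows, cols = len(mask), len(mask[0])
    rec = [[min(marker[r][c], mask[r][c]) for c in range(cols)] for r in range(rows)]
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                best = rec[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and rec[rr][cc] > best:
                        best = rec[rr][cc]
                best = min(best, mask[r][c])  # never grow above the mask
                if best > rec[r][c]:
                    rec[r][c] = best
                    changed = True
    return rec


def h_domes(img, h):
    """img minus its reconstruction from img - h: nonzero only on domes (regional maxima)."""
    marker = [[v - h for v in row] for row in img]
    rec = reconstruct_by_dilation(marker, img)
    return [[img[r][c] - rec[r][c] for c in range(len(img[0]))] for r in range(len(img))]
```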


Thermographic Laplacian-Pyramid Filtering To Enhance Delamination Detection In Concrete Structure, Chongsheng Cheng, Ri Na, Zhigang Shen Mar 2019

Department of Construction Engineering and Management: Faculty Publications

Despite decades of effort using thermography to detect delamination in concrete decks, challenges remain in removing environmental noise from thermal images. The performance of conventional temperature-contrast approaches can be significantly limited by environment-induced non-uniform temperature distributions across the image. Time-series-based methodologies have been found to be robust to spatial temperature non-uniformity but require an extended period of data collection. This paper introduces a new empirical image filtering method that enhances delamination detection using blob detection, a technique that originated in computer vision. The proposed method employs a Laplacian of Gaussian filter to achieve multi-scale detection of abnormal thermal patterns by …
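
The abstract stops before the filtering details, so the sketch below shows only the textbook multi-scale blob-detection recipe it refers to: evaluate the scale-normalized Laplacian of Gaussian σ²·∇²G at a pixel for several σ and keep the scale with the strongest response. For a blob warmer than its surroundings the response is negative, and for a disk of radius R the extremum lies roughly at σ ≈ R/√2. Kernel truncation at 3σ, edge replication, and the function names are assumptions.

```python
import math


def log_kernel(sigma):
    """Zero-sum Laplacian-of-Gaussian kernel truncated at 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    k = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            q = (x * x + y * y) / (2 * sigma * sigma)
            row.append(-(1 - q) * math.exp(-q) / (math.pi * sigma ** 4))
        k.append(row)
    mean = sum(map(sum, k)) / len(k) ** 2  # remove truncation bias: flat input -> 0
    return [[v - mean for v in row] for row in k]


def scale_normalized_log(img, r, c, sigma):
    """sigma^2 * (LoG * img) at pixel (r, c), replicating edge pixels."""
    k = log_kernel(sigma)
    rad = len(k) // 2
    rows, cols = len(img), len(img[0])
    acc = 0.0
    for dy in range(-rad, rad + 1):
        for dx in range(-rad, rad + 1):
            rr = min(max(r + dy, 0), rows - 1)
            cc = min(max(c + dx, 0), cols - 1)
            acc += k[dy + rad][dx + rad] * img[rr][cc]
    return sigma ** 2 * acc


def characteristic_scale(img, r, c, sigmas):
    """Scale with the most negative response, i.e. the best match for a warm blob at (r, c)."""
    return min(sigmas, key=lambda s: scale_normalized_log(img, r, c, s))
```

A cool anomaly would give a positive response instead, so the sign convention has to follow whether the defects appear warmer or cooler than sound concrete in a given image.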


Streaming Feature Grouping And Selection (Sfgs) For Big Data Classification, Noura Helal Hamad Al Nuaimi Mar 2019

Dissertations

Real-time data has always been an essential element for organizations when the speed of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis for continuing to benefit from the data they generate. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real time: it allows the data stream to be processed as it arrives. Streaming data means that the data are generated dynamically and the full stream is unknown or even infinite. This data becomes …


Untapped Potential Of Clinical Text For Opioid Surveillance, Amy L. Olex, Tamas Gal, Majid Afshar, Dmitriy Dligach, Niranjan Karnik, Travis Oakes, Brihat Sharma, Meng Xie, Bridget T. Mcinnes, Julian Solway, Abel Kho, William Cramer, F. Gerard Moeller Jan 2019

Wright Center for Clinical and Translational Research Works

Accurate surveillance is needed to combat the growing opioid epidemic. To investigate the potential volume of missed opioid overdoses, we compare opioid overdose (OOD) encounters identified by ICD-10-CM codes and by a natural language processing (NLP) pipeline in two different medical systems. Our results show that the NLP pipeline identified a larger percentage of OOD encounters than ICD-10-CM codes did. Thus, incorporating sophisticated NLP techniques into current diagnostic methods has the potential to improve surveillance of the incidence of opioid overdoses.


Explorobot: Rapid Exploration With Chart Automation, Tamara Matthews, Rohan Goel, John Mcauley Jan 2019

Conference papers

General-purpose visualization tools are used by people with varying degrees of data literacy. Often the user is not a professional analyst or data scientist and uses the tool infrequently to support one aspect of their job. This can present difficulties, because the user's unfamiliarity with visualization practice and infrequent use of the tool can result in long processing times, inaccurate data representations, or inappropriate visual encodings. To address this problem, we developed a visual analytics application called exploroBOT, which automatically generates visualizations and the exploration guidance path (an associated network of decision points, mapping nodes where visualizations change). These …


Detecting Special-Cause Variation 'Events' From Process Data Signatures, Timothy M. Young, Olga Khaliukova, Nicolas André, Alexander Petutschnigg, Timothy G. Rials, Chung-Hao Chen Jan 2019

Electrical & Computer Engineering Faculty Publications

The ability to detect the special-cause variation of incoming feedstocks from advanced sensor technology is invaluable to manufacturers. Many on-line sensors produce data signatures that require further off-line statistical processing for interpretation by operational personnel. However, early detection of changes in variation in incoming feedstocks may be imperative to promote early-stage preventive measures. A method is proposed in this applied study for developing control bands to quantify the variation of data signatures in the context of statistical process control (SPC). Control bands based on pointwise prediction intervals constructed from the Bonferroni Inequality and Bayesian smoothing splines are developed. Applications using …
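
The paper builds its control bands from Bonferroni-adjusted pointwise prediction intervals around Bayesian smoothing splines. The simplified sketch below omits the smoothing and uses a normal approximation: at each of the m sample points of a signature, the interval is mean ± z·s·√(1 + 1/n) computed from n in-control reference signatures, with z taken at level 1 − α/(2m), so by the Bonferroni inequality the whole band covers a new in-control signature with probability of at least about 1 − α. The reference data and function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def bonferroni_bands(reference, alpha=0.05):
    """Pointwise prediction bands with Bonferroni-adjusted level across m points."""
    n, m = len(reference), len(reference[0])
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))
    bands = []
    for j in range(m):
        col = [sig[j] for sig in reference]
        mu, s = mean(col), stdev(col)
        half = z * s * (1 + 1 / n) ** 0.5  # prediction (not confidence) interval width
        bands.append((mu - half, mu + half))
    return bands


def out_of_control(signature, bands):
    """Indices where a new signature leaves the band: candidate special-cause events."""
    return [j for j, (v, (lo, hi)) in enumerate(zip(signature, bands)) if not lo <= v <= hi]
```

With few reference signatures a t quantile with n − 1 degrees of freedom would be more appropriate than the normal quantile used here.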


Classification As Catachresis: Double Binds Of Representing Difference With Semiotic Infrastructure, Lindsay Poirier Jan 2019

Statistical and Data Sciences: Faculty Publications

Background: This article explores the results of a three-year ethnographic study of how semiotic infrastructures (digital standards and frameworks such as taxonomies, schemas, and ontologies that encode the meaning of data) are designed. Analysis: It examines debates over best practices in semiotic infrastructure design, such as how much complexity adopted languages should characterize versus how restrictive they should be. It also discusses political and pragmatic considerations that affect what information is represented in an information system, and how. Conclusion and implications: This article suggests that all databased representations are forms of data power, and that examining semiotic infrastructure design provides insight …


Knowledge Management Overview Of Feature Selection Problem In High-Dimensional Financial Data: Cooperative Co-Evolution And Map Reduce Perspectives, A. N. M. Bazlur Rashid, Tonmoy Choudhury Jan 2019

Research outputs 2014 to 2021

The term "big data" characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate, respectively, the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features, …