Computer Engineering | Open Access Articles | Digital Commons Network™

On Resource-Efficiency And Performance Optimization In Big Data Computing And Networking Using Machine Learning, Wuji Liu

Dissertations

Due to the rapid transition from traditional experiment-based approaches to large-scale, computationally intensive simulations, next-generation scientific applications typically involve complex numerical modeling and extreme-scale simulations. Such model-based simulations often generate colossal amounts of data, which must be transferred over high-performance network (HPN) infrastructures to remote sites and analyzed against experimental or observational data at high-performance computing (HPC) facilities. Optimizing the performance of both data transfer in HPN and simulation-based model development on HPC is critical to enabling and accelerating knowledge discovery and scientific innovation. However, such processes generally involve an enormous set of attributes including domain-specific model parameters, network transport …


Benchmarking Small-Dataset Structure-Activity-Relationship Models For Prediction Of Wnt Signaling Inhibition, Mahtab Kokabi

Masters Theses

Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools to expedite drug discovery processes and therapeutics development. Given the cost of acquiring large training datasets, it is useful to examine whether QSAR analysis can reasonably predict drug activity with only a small dataset (size < 100) and to benchmark these small-dataset QSAR models in application-specific studies. To this end, we present a systematic benchmarking study on small-dataset QSAR models built for prediction of effective Wnt signaling inhibitors, which are essential to therapeutics development for prevalent human diseases (e.g., cancer). Specifically, we examined a total of 72 two-dimensional (2D) QSAR models based on 4 best-performing algorithms, 6 commonly used molecular fingerprints, and 3 typical fingerprint lengths. We trained these models using a training dataset (56 compounds), benchmarked their performance on 4 figures-of-merit (FOMs), and examined their prediction accuracy using an external validation dataset (14 compounds). Our data show that model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of drug compounds; 2) algorithms are selected to reduce the number of false predictions due to class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs. These results may provide general guidelines for developing high-performance small-dataset QSAR models for drug activity prediction.
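The model count in this abstract follows from a full factorial design: 4 algorithms × 6 fingerprints × 3 fingerprint lengths = 72 models. The sketch below only illustrates that arithmetic and the 56/14 split. The abstract does not name the algorithms, fingerprints, or lengths, so every label in the code is a hypothetical placeholder, and nothing here reproduces the author's actual pipeline.

```python
from itertools import product

# Placeholder labels only: the abstract gives the counts, not the names.
algorithms = [f"algorithm_{i}" for i in range(1, 5)]      # 4 algorithms
fingerprints = [f"fingerprint_{i}" for i in range(1, 7)]  # 6 fingerprint types
lengths = [f"length_{i}" for i in range(1, 4)]            # 3 fingerprint lengths

# Full factorial grid: one model configuration per combination.
model_grid = list(product(algorithms, fingerprints, lengths))
n_models = len(model_grid)

# Dataset sizes as stated in the abstract.
n_train, n_validation = 56, 14
n_total = n_train + n_validation
train_fraction = n_train / n_total

print(f"model configurations: {n_models}")
print(f"compounds: {n_total}, training fraction: {train_fraction:.2f}")
```

Under these counts the grid contains 72 configurations, and the 70 compounds are split 80/20 between training and external validation.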


Recipe For Disaster, Zac Travis

MFA Thesis Exhibit Catalogs

Today’s rapidly advancing algorithmic processes generate predictions in common applications, including speech recognition, natural language (text) generation, search engine prediction, social media personalization, and product recommendations. These algorithmic processes rapidly sort through streams of computational calculations and personal digital footprints to predict, make decisions, translate, and attempt to mimic human cognitive function as closely as possible. This is known as machine learning.

The project Recipe for Disaster was developed by exploring automation in technology, specifically through the use of machine learning and recurrent neural networks. These algorithmic models feed on large amounts of data as a …

