Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Full-Text Articles in Statistics and Probability

The Impact Of Data Preparation And Model Complexity On The Natural Language Classification Of Chinese News Headlines, Torrey J. Wagner, Dennis Guhl, Brent T. Langhals Mar 2024

Faculty Publications

Given the emergence of China as a political and economic power in the 21st century, there is increased interest in analyzing Chinese news articles to better understand developing trends in China. Because of the volume of the material, automating the categorization of Chinese-language news articles by headline text or titles can be an effective way to sort the articles into categories for efficient review. A 383,000-headline dataset labeled with 15 categories from the Toutiao website was evaluated via natural language processing to predict topic categories. The influence of six data preparation variations on the predictive accuracy of four algorithms was …
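As a rough illustration of headline classification, the Python sketch below trains a multinomial naive Bayes classifier on overlapping character bigrams, one common way to tokenize Chinese text, which has no spaces between words. It is not the authors' pipeline: the abstract does not name its four algorithms or its six data preparation variations. The class name `NaiveBayesHeadlines`, the bigram choice, and the toy headlines and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Split a headline into overlapping character n-grams (no word segmentation needed)."""
    text = text.strip()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesHeadlines:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, n=2):
        self.n = n

    def fit(self, headlines, labels):
        self.class_counts = Counter(labels)        # documents per category (priors)
        self.token_counts = defaultdict(Counter)   # n-gram counts per category
        self.vocab = set()
        for text, label in zip(headlines, labels):
            grams = char_ngrams(text, self.n)
            self.token_counts[label].update(grams)
            self.vocab.update(grams)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.token_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(doc_count / self.total_docs)
            for g in grams:
                score += math.log((counts[g] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Made-up toy headlines, not drawn from the Toutiao dataset.
    train_x = ["股市大涨创新高", "央行宣布降息", "球队赢得冠军", "足球比赛结果"]
    train_y = ["finance", "finance", "sports", "sports"]
    model = NaiveBayesHeadlines().fit(train_x, train_y)
    print(model.predict("股市新高"), model.predict("足球冠军"))
```

A real 15-category experiment would add a held-out test split and compare several classifiers; this sketch only shows the character-level representation and the scoring rule.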


Lightning Forecast From Chaotic And Incomplete Time Series Using Wavelet De-Noising And Spatiotemporal Kriging, Jared K. Nystrom, Raymond Hill, Andrew J. Geyer, Joseph J. Pignatiello Jr., Eric Chicken Oct 2023

Faculty Publications

Purpose: Present a method to impute missing data from a chaotic time series, in this case lightning prediction data, and then use that completed dataset to create lightning prediction forecasts.

Design/Methodology/Approach: Spatiotemporal kriging is used to estimate data that are autocorrelated in both space and time. Feeding these estimates into an imputation methodology completes the dataset used for lightning prediction.

Findings: The techniques provided prove robust to the chaotic nature of the data, and the resulting time series displays evidence of smoothing while also preserving the signal of interest for lightning prediction.

Abstract © Emerald Publishing …
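The kriging-based imputation step can be illustrated with ordinary kriging. This is a minimal sketch, assuming a separable space-time covariance (an exponential kernel in space multiplied by an exponential kernel in time) with made-up sill and range values; the abstract does not state the authors' covariance or variogram model, and the wavelet de-noising named in the title is not shown. The names `separable_cov`, `ordinary_kriging`, and `impute_missing` are hypothetical.

```python
import numpy as np


def separable_cov(p, q, sill=1.0, range_space=10.0, range_time=3.0):
    """Assumed separable covariance between (x, y, t) points: exp in space times exp in time."""
    d_space = np.hypot(p[0] - q[0], p[1] - q[1])
    d_time = abs(p[2] - q[2])
    return sill * np.exp(-d_space / range_space) * np.exp(-d_time / range_time)


def ordinary_kriging(points, values, target, **cov_kwargs):
    """Estimate the value at `target` as a weighted sum of observed values."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(points)
    # Augmented system: covariances among observations, plus a Lagrange
    # row/column forcing the weights to sum to 1 (unbiasedness).
    a = np.ones((n + 1, n + 1))
    a[n, n] = 0.0
    for i in range(n):
        for j in range(n):
            a[i, j] = separable_cov(points[i], points[j], **cov_kwargs)
    b = np.ones(n + 1)
    for i in range(n):
        b[i] = separable_cov(points[i], target, **cov_kwargs)
    weights = np.linalg.solve(a, b)[:n]
    return float(weights @ values), weights


def impute_missing(points, values, **cov_kwargs):
    """Replace NaN entries with kriging estimates built from the observed entries."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    filled = values.copy()
    for idx in np.flatnonzero(~observed):
        filled[idx], _ = ordinary_kriging(
            points[observed], values[observed], points[idx], **cov_kwargs
        )
    return filled


if __name__ == "__main__":
    # Toy (x, y, t) sites; the last reading is missing.
    pts = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1), (0.5, 0.5, 0.5)]
    vals = [2.0, 4.0, 3.0, 5.0, np.nan]
    print(impute_missing(pts, vals))
```

Without a nugget term this estimator interpolates exactly at observed sites; for noisy data such as lightning observations a nugget (or prior de-noising) would usually be added.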


Improved N-Dimensional Data Visualization From Hyper-Radial Values, Todd J. Paciencia, Trevor J. Bihl, Kenneth W. Bauer Jan 2019

Faculty Publications

Higher-dimensional data, which are becoming common in many disciplines due to big data problems, are inherently difficult to visualize in a meaningful way. While many visualization methods exist, they are often difficult to interpret, involve multiple plots and overlaid points, or require simultaneous interpretations. This research adapts and extends hyper-radial visualization, a technique used to visualize Pareto fronts in multi-objective optimization, into an n-dimensional visualization tool. Hyper-radial visualization is seen to offer many advantages by presenting a low-dimensional representation of data through easily understood calculations. First, hyper-radial visualization is extended for use with general multivariate data. Second, a method …
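For reference, the baseline hyper-radial calculation (HRC) min-max normalizes each variable to [0, 1], splits the variables into two groups, and maps each observation to the 2-D point whose coordinates are the per-group values of HRC = sqrt((F_1^2 + ... + F_n^2) / n), where F_i are the normalized values in that group. The sketch below implements only that baseline, not the paper's extensions, which the truncated abstract does not describe; the function names and the example grouping are assumptions.

```python
import numpy as np


def minmax_normalize(data):
    """Scale each column to [0, 1]; constant columns map to 0 instead of dividing by zero."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0
    return (data - lo) / span


def hyper_radial(data, group_a):
    """Map each row to (HRC of group_a columns, HRC of the remaining columns)."""
    norm = minmax_normalize(data)
    n_cols = norm.shape[1]
    group_a = sorted(set(group_a))
    group_b = [j for j in range(n_cols) if j not in group_a]
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one column")
    # Root-mean-square of the normalized values within each group.
    hrc_a = np.sqrt((norm[:, group_a] ** 2).mean(axis=1))
    hrc_b = np.sqrt((norm[:, group_b] ** 2).mean(axis=1))
    return np.column_stack([hrc_a, hrc_b])


if __name__ == "__main__":
    rows = [[0, 0, 0, 0], [1, 1, 1, 1], [0.5, 0, 0.5, 0]]
    print(hyper_radial(rows, group_a=[0, 1]))
```

Each coordinate stays in [0, 1], so every point fits on one 2-D scatter plot; how the variables are grouped changes the picture, which is one of the choices an extended method has to address.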