Open Access. Powered by Scholars. Published by Universities.®
Artificial Intelligence and Robotics Commons™
- Keyword
- Machine Learning (cs.LG) (9)
- Learning systems (8)
- Machine learning (6)
- Optimization (6)
- Reinforcement learning (6)
- Stochastic systems (5)
- Artificial Intelligence (cs.AI) (4)
- Deep learning (4)
- Antennas (3)
- Artificial intelligence (3)
- Computational linguistics (3)
- Condition (3)
- Internet of things (3)
- Learn+ (3)
- Optimization and Control (math.OC) (3)
- Simple++ (3)
- Wireless networks (3)
- Auto encoders (2)
- Benchmarking (2)
- Classification tasks (2)
- Computer vision (2)
- Constrained optimization (2)
- Design (2)
- Distributed optimization (2)
- Feature extraction (2)
- Feature selection (2)
- Federated learning (2)
- Geometry (2)
- Health care (2)
- Learning strategy (2)
Articles 1 - 30 of 83
Full-Text Articles in Artificial Intelligence and Robotics
Bare-Bones Based Salp Swarm Algorithm For Text Document Clustering, Mohammed Azmi Al-Betar, Ammar Kamal Abasi, Ghazi Al-Naymat, Kamran Arshad, Sharif Naser Makhadmeh
Machine Learning Faculty Publications
Text Document Clustering (TDC) is a challenging optimization problem in unsupervised machine learning and text mining. The Salp Swarm Algorithm (SSA) has been found to be effective in solving complex optimization problems. However, the SSA’s exploitation phase requires improvement to solve the TDC problem effectively. In this paper, we propose a new approach, known as the Bare-Bones Salp Swarm Algorithm (BBSSA), which leverages Gaussian search equations, inverse hyperbolic cosine control strategies, and greedy selection techniques to create new individuals and guide the population towards solving the TDC problem. We evaluated the performance of the BBSSA on six benchmark datasets from …
A Study On Feature Selection Using Multi-Domain Feature Extraction For Automated K-Complex Detection, Yabing Li, Xinglong Dong, Kun Song, Xiangyun Bai, Hongye Li, Fakhreddine Karray
Machine Learning Faculty Publications
Background: K-complex detection plays a significant role in the field of sleep research. However, manual annotation for electroencephalography (EEG) recordings by visual inspection from experts is time-consuming and subjective. Therefore, there is a necessity to implement automatic detection methods based on classical machine learning algorithms. However, due to the complexity of EEG signals, current feature extraction methods always produce features of low relevance to k-complex detection, which leads to a great performance loss for the detection. Hence, finding compact yet effective integrated feature vectors becomes a crucial core task in k-complex detection. Method: In this paper, we first extract multi-domain features based …
Reinforcement Learning Approach To Stochastic Vehicle Routing Problem With Correlated Demands, Zangir Iklassov, Ikboljon Sobirov, Ruben Solozabal, Martin Takac
Machine Learning Faculty Publications
We present a novel end-to-end framework for solving the Vehicle Routing Problem with stochastic demands (VRPSD) using Reinforcement Learning (RL). Our formulation incorporates the correlation between stochastic demands through other observable stochastic variables, thereby offering an experimental demonstration of the theoretical premise that non-i.i.d. stochastic demands provide opportunities for improved routing solutions. Our approach bridges the gap in the application of RL to VRPSD and consists of a parameterized stochastic policy optimized using a policy gradient algorithm to generate a sequence of actions that form the solution. Our model outperforms previous state-of-the-art metaheuristics and demonstrates robustness to changes in the …
A Multi-Layer Information Dissemination Model And Interference Optimization Strategy For Communication Networks In Disaster Areas, Yuexia Zhang, Yang Hong, Mohsen Guizani, Sheng Wu, Peiying Zhang, Ruiqi Liu
Machine Learning Faculty Publications
The communication network in disaster areas (CNDA) can disseminate key disaster information in time and provide basic information support for decision-making and rescue. Therefore, it is of great significance to study the information dissemination mechanism of CNDA. However, a CNDA is vulnerable to interference, which affects information dissemination and rescue. To solve this problem, this paper establishes a multi-layer information dissemination model of CNDA (MMND), which models the CNDA from the perspective of the degree distribution of nodes. The information dissemination process and equilibrium state in CNDA are analyzed by an improved dynamic dissemination method. Then, the effects of the …
Arabic Dysarthric Speech Recognition Using Adversarial And Signal-Based Augmentation, Massa Baali, Ibrahim Almakky, Shady Shehata, Fakhri Karray
Machine Learning Faculty Publications
Despite major advancements in Automatic Speech Recognition (ASR), the state-of-the-art ASR systems struggle to deal with impaired speech even with high-resource languages. In Arabic, this challenge gets amplified, with added complexities in collecting data from dysarthric speakers. In this paper, we aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach. To this effect, we first propose a signal-based approach to generate dysarthric Arabic speech from healthy Arabic speech by modifying its speed and tempo. We also propose a second stage Parallel Wave Generative (PWG) adversarial model that is trained on an English dysarthric …
FOOCTTS: Generating Arabic Speech With Acoustic Environment For Football Commentator, Massa Baali, Ahmed Ali
Machine Learning Faculty Publications
This paper presents FOOCTTS, an automatic pipeline for a football commentator that generates speech with background crowd noise. The application takes text from the user and applies text pre-processing such as vowelization, followed by the commentator's speech synthesizer. Our pipeline includes Arabic automatic speech recognition for data labeling, CTC segmentation, transcription vowelization to match speech, and fine-tuning the TTS. Our system is capable of generating speech with its acoustic environment from only 15 minutes of football commentator recordings. Our prototype is generalizable and can be easily applied to different domains and languages.
S2CD: Self-Heuristic Speaker Content Disentanglement For Any-To-Any Voice Conversion, Pengfei Wei, Xiang Yin, Chunfeng Wang, Zhonghao Li, Xinghua Qu, Zhiqiang Xu, Zejun Ma
Machine Learning Faculty Publications
In this paper, we propose a Self-heuristic Speaker Content Disentanglement (S2CD) model for any to any voice conversion without using any external resources, e.g., speaker labels or vectors, linguistic models, and transcriptions. S2CD is built on the disentanglement sequential variational autoencoder (DSVAE), but improves DSVAE structure at the model architecture level from three perspectives. Specifically, we develop different structures for speaker and content encoders based on their underlying static/dynamic property. We further propose a generative graph, modelled by S2CD, so as to make S2CD well mimic the multi-speaker speech generation process. Finally, we propose a self-heuristic way to introduce bias …
Linear Classifier: An Often-Forgotten Baseline For Text Classification, Yu Chen Lin, Si An Chen, Jie Jyun Liu, Chih Jen Lin
Machine Learning Faculty Publications
Large-scale pre-trained language models such as BERT are popular solutions for text classification. Due to the superior performance of these advanced methods, nowadays, people often directly train them for a few epochs and deploy the obtained model. In this opinion paper, we point out that this way may only sometimes get satisfactory results. We argue the importance of running a simple baseline like linear classifiers on bag-of-words features along with advanced methods. First, for many text data, linear methods show competitive performance, high efficiency, and robustness. Second, advanced models such as BERT may only achieve the best results if properly …
Adversarial Alignment For Source Free Object Detection, Qiaosong Chu, Shuyan Li, Guangyi Chen, Kai Li, Xiu Li
Machine Learning Faculty Publications
Source-free object detection (SFOD) aims to transfer a detector pre-trained on a label-rich source domain to an unlabeled target domain without seeing source data. While most existing SFOD methods generate pseudo labels via a source-pretrained model to guide training, these pseudo labels usually contain high noises due to heavy domain discrepancy. In order to obtain better pseudo supervisions, we divide the target domain into source-similar and source-dissimilar parts and align them in the feature space by adversarial learning. Specifically, we design a detection variance-based criterion to divide the target domain. This criterion is motivated by a finding that larger detection …
Corruption-Tolerant Algorithms For Generalized Linear Models, Bhaskar Mukhoty, Debojyoti Dey, Purushottam Kar
Machine Learning Faculty Publications
This paper presents SVAM (Sequential Variance-Altered MLE), a unified framework for learning generalized linear models under adversarial label corruption in training data. SVAM extends to tasks such as least squares regression, logistic regression, and gamma regression, whereas many existing works on learning with label corruptions focus only on least squares regression. SVAM is based on a novel variance reduction technique that may be of independent interest and works by iteratively solving weighted MLEs over variance-altered versions of the GLM objective. SVAM offers provable model recovery guarantees superior to the state-of-the-art for robust regression even when a constant fraction of training …
Stability-Based Generalization Analysis For Mixtures Of Pointwise And Pairwise Learning, Jiahuan Wang, Jun Chen, Hong Chen, Bin Gu, Weifu Li, Xin Tang
Machine Learning Faculty Publications
Recently, some mixture algorithms of pointwise and pairwise learning (PPL) have been formulated by employing the hybrid error metric of “pointwise loss + pairwise loss” and have shown empirical effectiveness on feature selection, ranking and recommendation tasks. However, to the best of our knowledge, the learning theory foundation of PPL has not been touched in the existing works. In this paper, we try to fill this theoretical gap by investigating the generalization properties of PPL. After extending the definitions of algorithmic stability to the PPL setting, we establish the high-probability generalization bounds for uniformly stable PPL algorithms. Moreover, explicit convergence …
Joint Flood Risks In The Grand River Watershed, Poornima Unnikrishnan, Kumaraswamy Ponnambalam, Nirupama Agrawal, Fakhri Karray
Machine Learning Faculty Publications
According to the World Meteorological Organization, since 2000, there has been an increase in global flood-related disasters by 134 percent compared to the previous decades. Efficient flood risk management strategies necessitate a holistic approach to evaluating flood vulnerabilities and risks. Catastrophic losses can occur when the peak flow values in the rivers in a basin coincide. Therefore, estimating the joint flood risks in a region is vital, especially when frequent occurrences of extreme events are experienced. This study focuses on estimating the joint flood risks due to river flow extremes in the Grand River watershed in Canada. For this purpose, …
On The Accelerated Noise-Tolerant Power Method, Zhiqiang Xu
Machine Learning Faculty Publications
We revisit the acceleration of the noise-tolerant power method for which, despite previous studies, the results remain unsatisfactory as they are either wrong or suboptimal, also lacking generality. In this work, we present a simple yet general and optimal analysis via noise-corrupted Chebyshev polynomials, which allows a larger iteration rank p than the target rank k, requires less noise conditions in a new form, and achieves the optimal iteration complexity (Equation presented) for some q satisfying k ≤ q ≤ p in a certain regime of the momentum parameter. Interestingly, it shows dynamic dependence of the noise tolerance on the …
Towards Carbon Neutrality: Prediction Of Wave Energy Based On Improved GRU In Maritime Transportation, Zhihan Lv, Nana Wang, Ranran Lou, Yajun Tian, Mohsen Guizani
Machine Learning Faculty Publications
Efficient use of renewable energy is one of the critical measures to achieve carbon neutrality. Countries have introduced policies to put carbon neutrality on the agenda to achieve relatively zero emissions of greenhouse gases and to cope with the crisis brought about by global warming. This work analyzes the wave energy with high energy density and wide distribution based on understanding of various renewable energy sources. This study provides a wave energy prediction model for energy harvesting. At the same time, the Gated Recurrent Unit network (GRU), Bayesian optimization algorithm, and attention mechanism are introduced to improve the model's performance. …
Channel-Resilient Deep-Learning-Driven Device Fingerprinting Through Multiple Data Streams, Nora Basha, Bechir Hamdaoui, Kathiravetpillai Sivanesan, Mohsen Guizani
Machine Learning Faculty Publications
Enabling accurate and automated identification of wireless devices is critical for allowing network access monitoring and ensuring data authentication for large-scale IoT networks. RF fingerprinting has emerged as a solution for device identification by leveraging the transmitters' inevitable hardware impairments that occur during manufacturing. Although deep learning is proven efficient in classifying devices based on hardware impairments, the performance of deep learning models suffers greatly from variations of the wireless channel conditions, across time and space. To the best of our knowledge, we are the first to propose leveraging MIMO capabilities to mitigate the channel effect and provide a channel-resilient …
Differentially Private Stochastic Convex Optimization In (Non)-Euclidean Space Revisited, Jinyan Su, Changhong Zhao, Di Wang
Machine Learning Faculty Publications
In this paper, we revisit the problem of Differentially Private Stochastic Convex Optimization (DP-SCO) in Euclidean and general ℓ_p^d spaces. Specifically, we focus on three settings that are still far from well understood: (1) DP-SCO over a constrained and bounded (convex) set in Euclidean space; (2) unconstrained DP-SCO in ℓ_p^d space; (3) DP-SCO with heavy-tailed data over a constrained and bounded set in ℓ_p^d space. For problem (1), for both convex and strongly convex loss functions, we propose methods whose outputs could achieve (expected) excess population risks that are only dependent on the Gaussian width of the constraint set, rather …
A Hybrid Artificial Intelligence Model For Detecting Keratoconus, Zaid Abdi Alkareem Alyasseri, Ali H. Al-Timemy, Ammar Kamal Abasi, Alexandru Lavric, Husam Jasim Mohammed, Hidenori Takahashi, Jose Arthur Milhomens Filho, Mauro Campos, Rossen M. Hazarbassanov, Siamak Yousefi
Machine Learning Faculty Publications
Machine learning models have recently provided great promise in diagnosis of several ophthalmic disorders, including keratoconus (KCN). Keratoconus, a noninflammatory ectatic corneal disorder characterized by progressive cornea thinning, is challenging to detect as signs may be subtle. Several machine learning models have been proposed to detect KCN, however most of the models are supervised and thus require large well-annotated data. This paper proposes a new unsupervised model to detect KCN, based on adapted flower pollination algorithm (FPA) and the k-means algorithm. We will evaluate the proposed models using corneal data collected from 5430 eyes at different stages of KCN severity …
Impact Of Digital Twins And Metaverse On Cities: History, Current Situation, And Application Perspectives, Zhihan Lv, Wen Long Shang, Mohsen Guizani
Machine Learning Faculty Publications
To promote the expansion and adoption of Digital Twins (DTs) in Smart Cities (SCs), a detailed review of the impact of DTs and digitalization on cities is made to assess the progression of cities and standardization of their management mode. Combined with the technical elements of DTs, the coupling effect of DTs technology and urban construction and the internal logic of DTs technology embedded in urban construction are discussed. Relevant literature covering the full range of DTs technologies and their applications is collected, evaluated, and collated, relevant studies are concatenated, and relevant accepted conclusions are summarized by modules. First, the …
A Damped Newton Method Achieves Global O(1/k²) And Local Quadratic Convergence Rate, Slavomír Hanzely, Dmitry Kamzolov, Dmitry Pasechnyuk, Alexander Gasnikov, Peter Richtárik, Martin Takáč
Machine Learning Faculty Publications
In this paper, we present the first stepsize schedule for Newton method resulting in fast global and local convergence guarantees. In particular, a) we prove an O(1/k²) global rate, which matches the state-of-the-art global rate of cubically regularized Newton method of Polyak and Nesterov (2006) and of regularized Newton method of Mishchenko (2021) and Doikov and Nesterov (2021), b) we prove a local quadratic rate, which matches the best-known local rate of second-order methods, and c) our stepsize formula is simple, explicit, and does not require solving any subproblem. Our convergence proofs hold under affine-invariance assumptions closely related to …
AMP: Automatically Finding Model Parallel Strategies With Heterogeneity Awareness, Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang
Machine Learning Faculty Publications
Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks. However, training big models requires strong distributed system expertise to carefully design model-parallel execution strategies that suit the model architectures and cluster setups. In this paper, we develop AMP, a framework that automatically derives such strategies. AMP identifies a valid space of model parallelism strategies and efficiently searches the space for high-performing strategies by leveraging a cost model designed to capture the heterogeneity of the model and cluster specifications. Unlike existing methods, AMP is specifically tailored to support complex models composed of uneven layers …
AutoMS: Automatic Model Selection For Novelty Detection With Error Rate Control, Yifan Zhang, Haiyan Jiang, Haojie Ren, Changliang Zou, Dejing Dou
Machine Learning Faculty Publications
Given an unsupervised novelty detection task on a new dataset, how can we automatically select a “best” detection model while simultaneously controlling the error rate of the best model? For novelty detection analysis, numerous detectors have been proposed to detect outliers on a new unseen dataset based on a score function trained on available clean data. However, due to the absence of labeled anomalous data for model evaluation and comparison, there is a lack of systematic approaches that are able to select the “best” model/detector (i.e., the algorithm as well as its hyperparameters) and achieve certain error rate control simultaneously. …
Efficient (Soft) Q-Learning For Text Generation With Limited Good Data, Han Guo, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu
Machine Learning Faculty Publications
Maximum likelihood estimation (MLE) is the predominant algorithm for training text generation models. This paradigm relies on direct supervision examples, which is not applicable to many emerging applications, such as generating adversarial attacks or generating prompts to control language models. Reinforcement learning (RL) on the other hand offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward. Yet previous RL algorithms for text generation, such as policy gradient (on-policy RL) and Q-learning (off-policy RL), are often notoriously inefficient or unstable to train due to the large sequence space and the sparse reward received only …
Factored Adaptation For Non-Stationary Reinforcement Learning, Fan Feng, Biwei Huang, Kun Zhang, Sara Magliacane
Machine Learning Faculty Publications
Dealing with non-stationarity in environments (e.g., in the transition dynamics) and objectives (e.g., in the reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). While most current approaches model the changes as a single shared embedding vector, we leverage insights from the recent causality literature to model non-stationarity in terms of individual latent change factors, and causal graphs across different environments. In particular, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that jointly learns both the causal structure in terms of a factored MDP, and a factored representation of …
Independence Testing-Based Approach To Causal Discovery Under Measurement Error And Linear Non-Gaussian Models, Haoyue Dai, Peter Spirtes, Kun Zhang
Machine Learning Faculty Publications
Causal discovery aims to recover causal structures generating the observational data. Despite its success in certain problems, in many real-world scenarios the observed variables are not the target variables of interest, but the imperfect measures of the target variables. Causal discovery under measurement error aims to recover the causal graph among unobserved target variables from observations made with measurement error. We consider a specific formulation of the problem, where the unobserved target variables follow a linear non-Gaussian acyclic model, and the measurement process follows the random measurement error model. Existing methods on this formulation rely on non-scalable over-complete independent component …
On PAC Learning Halfspaces In Non-Interactive Local Privacy Model With Public Unlabeled Data, Jinyan Su, Jinhui Xu, Di Wang
Machine Learning Faculty Publications
In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP). To breach the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue in this direction. Specifically, we consider the problem under the standard setting instead of the large margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches that are based on the Massart noise model and self-supervised learning and show that it is possible to achieve sample …
Rare Gems: Finding Lottery Tickets At Initialization, Kartik Sreenivasan, Jy Yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Eric Xing, Kangwook Lee, Dimitris Papailiopoulos
Machine Learning Faculty Publications
Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming “train, prune, re-train” approach. Frankle & Carbin [9] conjecture that we can avoid this by training lottery tickets, i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work [11, 41] presents concrete evidence that current algorithms for finding trainable networks at initialization fail simple baseline comparisons, e.g., against training random sparse subnetworks. Finding lottery tickets that train to better accuracy compared to simple baselines remains an open …
Unpaired Image-To-Image Translation With Density Changing Regularization, Shaoan Xie, Qirong Ho, Kun Zhang
Machine Learning Faculty Publications
Unpaired image-to-image translation aims to translate an input image to another domain such that the output image looks like an image from that domain while important semantic information is preserved. Inferring the optimal mapping with unpaired data is impossible without making any assumptions. In this paper, we make a density changing assumption where image patches of high probability density should be mapped to patches of high probability density in another domain. Then we propose an efficient way to enforce this assumption: we train the flows as density estimators and penalize the variance of density changes. Despite its simplicity, our method …
Zeroth-Order Hard-Thresholding: Gradient Error Vs. Expansivity, William De Vazelhes, Hualin Zhang, Huimin Wu, Xiao Tong Yuan, Bin Gu
Machine Learning Faculty Publications
ℓ0 constrained optimization is prevalent in machine learning, particularly for high-dimensional problems, because it is a fundamental approach to achieve sparse learning. Hard-thresholding gradient descent is a dominant technique to solve this problem. However, first-order gradients of the objective function may be either unavailable or expensive to calculate in a lot of real-world problems, where zeroth-order (ZO) gradients could be a good surrogate. Unfortunately, whether ZO gradients can work with the hard-thresholding operator is still an unsolved problem. To solve this puzzle, in this paper, we focus on the ℓ0 constrained black-box stochastic optimization problems, and propose a new stochastic …
Zeroth-Order Negative Curvature Finding: Escaping Saddle Points Without Gradients, Hualin Zhang, Huan Xiong, Bin Gu
Machine Learning Faculty Publications
We consider escaping saddle points of nonconvex problems where only the function evaluations can be accessed. Although a variety of works have been proposed, the majority of them require either second or first-order information, and only a few of them have exploited zeroth-order methods, particularly the technique of negative curvature finding with zeroth-order methods which has been proven to be the most efficient method for escaping saddle points. To fill this gap, in this paper, we propose two zeroth-order negative curvature finding frameworks that can replace Hessian-vector product computations without increasing the iteration complexity. We apply the proposed frameworks to …
Hyperfast Second-Order Local Solvers For Efficient Statistically Preconditioned Distributed Optimization, Pavel Dvurechensky, Dmitry Kamzolov, Aleksandr Lukashevich, Soomin Lee, Erik Ordentlich, César A. Uribe, Alexander Gasnikov
Machine Learning Faculty Publications
Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameter by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed Statistically Preconditioned Accelerated Gradient (SPAG) method [1] has complexity bounds superior to other such algorithms but requires an exact solution for computationally intensive auxiliary optimization problems at every iteration. In this paper, we propose an Inexact SPAG (InSPAG) and explicitly characterize the accuracy by which the corresponding auxiliary subproblem needs to be solved to guarantee …