1 code implementation • 7 Jun 2024 • Jitai Hao, Weiwei Sun, Xin Xin, Qi Meng, Zhumin Chen, Pengjie Ren, Zhaochun Ren
We store and update the parameters of larger adapters on the CPU.
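A minimal PyTorch sketch of this idea (module and names here are hypothetical, not the paper's implementation): the adapter's parameters are allocated, and therefore updated by the optimizer, in CPU memory, while the frozen base model stays on the GPU.

import torch
import torch.nn as nn

class CPUAdapter(nn.Module):
    # Bottleneck adapter whose parameters live (and are updated) on the CPU.
    def __init__(self, hidden_dim, bottleneck_dim):
        super().__init__()
        # nn.Linear allocates its parameters on the CPU by default; we keep them there.
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x):
        # Move the GPU activation to the CPU, apply the adapter, move the result back.
        out = self.up(torch.relu(self.down(x.cpu())))
        return x + out.to(x.device)

An optimizer built over only these adapter parameters then performs all of its updates in the cheaper, larger CPU memory.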
no code implementations • 22 Mar 2024 • Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen
This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.
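For reference, the two update rules being compared, in standard notation (not necessarily the paper's): SGDM iterates $m_t = \beta m_{t-1} + g_t$ and $w_{t+1} = w_t - \eta\, m_t$, while Adam iterates $m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$, $v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$ and $w_{t+1} = w_t - \eta\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$, where $g_t$ is the stochastic gradient and $\hat{m}_t, \hat{v}_t$ are the bias-corrected moments.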
1 code implementation • 24 Nov 2023 • Rui Zhang, Qi Meng, Zhi-Ming Ma
To this end, we propose the Physical Invariant Attention Neural Operator (PIANO) to decipher and integrate the physical invariants (PI) for operator learning from PDE series with various physical mechanisms.
no code implementations • 16 Jun 2023 • Wei Chen, Weitao Du, Zhi-Ming Ma, Qi Meng
We study a new kind of SDE that arose from research on optimization in machine learning; we call it the power-law dynamic because its stationary distribution cannot have a sub-Gaussian tail and instead obeys a power law.
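A one-dimensional illustration of how a power-law tail arises (our computation, not the paper's exact setting): for the dynamic $dw_t = -\mu w_t\, dt + \sigma(w_t)\, dB_t$ with quadratic diffusion $\sigma^2(w) = \sigma_0 + \sigma_1 w^2$, the stationary Fokker-Planck solution is $p(w) \propto (\sigma_0 + \sigma_1 w^2)^{-(1 + \mu/\sigma_1)}$, whose tail decays polynomially rather than like a Gaussian.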
1 code implementation • ICLR 2023 • Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.
Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)
no code implementations • 20 Feb 2023 • Xinquan Huang, Wenlei Shi, Qi Meng, Yue Wang, Xiaotian Gao, Jia Zhang, Tie-Yan Liu
Neural networks have shown great potential in accelerating the solution of partial differential equations (PDEs).
1 code implementation • 10 Feb 2023 • Rui Zhang, Qi Meng, Rongchan Zhu, Yue Wang, Wenlei Shi, Shihua Zhang, Zhi-Ming Ma, Tie-Yan Liu
To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
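A concrete instance of such a probabilistic representation (a standard fact, stated for illustration): the heat equation $\partial_t u = \kappa \Delta u$ with initial condition $u(\cdot, 0) = u_0$ satisfies $u(x, t) = \mathbb{E}[u_0(x + \sqrt{2\kappa}\, B_t)]$ by the Feynman-Kac formula, so the macroscopic field $u$ can be approximated by Monte Carlo averages over simulated particle paths and used as a training target for an unsupervised neural solver.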
no code implementations • 31 Oct 2022 • Zihan Wang, Qi Meng, HaiFeng Lan, Xinrui Zhang, Kehao Guo, Akshat Gupta
While Speech Emotion Recognition (SER) is a common application for popular languages, it remains a challenge for low-resource languages, i.e., languages with no pretrained speech-to-text recognition models.
no code implementations • 21 Aug 2022 • Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen
In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.
no code implementations • 20 Jun 2022 • Rui Zhang, Peiyan Hu, Qi Meng, Yue Wang, Rongchan Zhu, Bingguang Chen, Zhi-Ming Ma, Tie-Yan Liu
To this end, we propose the \emph{Deep Random Vortex Method} (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation.
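For context (standard fluid dynamics, not DRVM-specific notation): in two dimensions the incompressible Navier-Stokes equations take the vorticity form $\partial_t \omega + (u \cdot \nabla)\omega = \nu \Delta \omega$, with the velocity $u$ recovered from the vorticity $\omega$ through the Biot-Savart law; random vortex methods simulate $\omega$ as an ensemble of diffusing particles that carry vorticity.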
1 code implementation • 13 Apr 2022 • Peiyan Hu, Qi Meng, Bingguang Chen, Shiqi Gong, Yue Wang, Wei Chen, Rongchan Zhu, Zhi-Ming Ma, Tie-Yan Liu
Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.
1 code implementation • 26 Oct 2021 • Weitao Du, He Zhang, Yuanqi Du, Qi Meng, Wei Chen, Bin Shao, Tie-Yan Liu
In this paper, we propose a framework to construct SE(3) equivariant graph neural networks that can approximate the geometric quantities efficiently.
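As a reminder of the property being enforced (standard definition): a map $f$ on 3D point clouds is SE(3)-equivariant if $f(Rx + t) = \rho(R) f(x)$ for every rotation $R \in SO(3)$ and translation $t \in \mathbb{R}^3$, where $\rho$ is a representation of the rotation group; e.g., predicted forces should rotate with the input, $f(Rx + t) = R f(x)$, while predicted energies should be invariant, $f(Rx + t) = f(x)$.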
no code implementations • NeurIPS 2021 • Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance when the prior and the posterior are jointly optimized.
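In symbols (our notation, restating the sentence above): writing $C = \mathbb{E}[\nabla\ell\, \nabla\ell^{\top}]$ for the expected gradient covariance, the result states that the optimal noise covariance is $\Sigma^{\star} = C^{1/2}$.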
no code implementations • 8 Oct 2021 • Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
The momentum acceleration technique is widely adopted in many optimization algorithms.
9 code implementations • NeurIPS 2021 • Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu
Dropout is a powerful and widely used technique to regularize the training of deep neural networks.
Ranked #4 on Machine Translation on WMT2014 English-French
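One dropout-based regularization scheme in this vein, sketched minimally (the formulation below is our illustration, not quoted from the entry): run the model twice on the same batch so that dropout draws two independent masks, and add a symmetric KL penalty between the two predictive distributions to the task loss.

import torch.nn.functional as F

def two_pass_dropout_loss(model, x, y, alpha=1.0):
    # Two stochastic forward passes: dropout samples a different mask each time.
    logits1, logits2 = model(x), model(x)
    task = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))
    # Symmetric KL divergence between the two predictive distributions.
    p = F.log_softmax(logits1, dim=-1)
    q = F.log_softmax(logits2, dim=-1)
    consistency = 0.5 * (
        F.kl_div(p, q, log_target=True, reduction="batchmean")
        + F.kl_div(q, p, log_target=True, reduction="batchmean")
    )
    return task + alpha * consistency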
1 code implementation • ICLR 2022 • Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu
Denoising diffusion probabilistic models have recently been proposed to generate high-quality samples by estimating the gradient of the data density.
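To unpack "the gradient of the data density" (standard background, not specific to this paper): the score function is $\nabla_x \log p(x)$, and denoising diffusion models learn it implicitly by predicting the noise $\epsilon$ in $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$, i.e. minimizing $\mathbb{E}_{t, x_0, \epsilon} \|\epsilon - \epsilon_\theta(x_t, t)\|^2$; the learned network relates to the score via $\nabla_{x_t} \log p(x_t) \approx -\epsilon_\theta(x_t, t)/\sqrt{1-\bar{\alpha}_t}$.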
no code implementations • 8 Jun 2021 • Shiqi Gong, Qi Meng, Yue Wang, Lijun Wu, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervision signal in the training of neural ODEs (NODEs).
no code implementations • 31 May 2021 • Ziming Liu, Bohan Wang, Qi Meng, Wei Chen, Max Tegmark, Tie-Yan Liu
Energy conservation is a basic physics principle, the breakdown of which often implies new physics.
no code implementations • NAACL 2021 • Zhen Wu, Lijun Wu, Qi Meng, Yingce Xia, Shufang Xie, Tao Qin, Xinyu Dai, Tie-Yan Liu
Therefore, in this paper, we integrate different dropout techniques into the training of Transformer models.
Ranked #4 on Machine Translation on IWSLT2014 English-German
1 code implementation • 11 Dec 2020 • Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu
Besides GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their fast training.
no code implementations • 24 Jun 2020 • Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Specifically, we show that the covariance of the SGD noise in a local region around a local minimum is a quadratic function of the state.
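A one-dimensional illustration of why the noise covariance depends on the state (our example, not the paper's): for least squares $\ell_i(w) = \frac{1}{2}(x_i w - y_i)^2$, the per-sample gradient is $g_i(w) = x_i (x_i w - y_i)$, so the noise variance $\mathrm{Var}_i[g_i(w)] = \mathbb{E}[x_i^2 (x_i w - y_i)^2] - (\mathbb{E}[x_i (x_i w - y_i)])^2$ is a quadratic function of $w$.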
no code implementations • 18 Oct 2019 • Juanping Zhu, Qi Meng, Wei Chen, Zhi-Ming Ma
Based on the basis-path set, the G-SGD algorithm significantly outperforms the conventional SGD algorithm in optimizing neural networks.
no code implementations • 25 Sep 2019 • Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
Optimization algorithms such as stochastic gradient descent optimize neural networks in the vector space of weights, which is not positively scale-invariant.
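Concretely (a standard observation): for a ReLU network, rescaling a hidden unit's incoming weights by any $c > 0$ and its outgoing weights by $1/c$ leaves the computed function unchanged, since $\frac{w_2}{c}\,\mathrm{ReLU}(c\, w_1 x) = w_2\,\mathrm{ReLU}(w_1 x)$; distances and gradients in weight space, however, do change under such rescaling.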
no code implementations • 25 Sep 2019 • Xufang Luo, Qi Meng, Wei Chen, Tie-Yan Liu
Hence, new algorithms that optimize directly in the path space (which is proven to be positively scale-invariant, or PSI) were developed, such as Stochastic Gradient Descent (SGD) in the path space, and SGD in the path space was shown to be superior to SGD in the weight space.
no code implementations • ICLR 2019 • Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process}?
1 code implementation • 14 Mar 2019 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu
In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in the settings of planning and learning.
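For reference (standard form of the operator; the dynamic schedule is the paper's contribution): the Boltzmann softmax of action values is $\mathrm{boltz}_{\beta}(Q(s, \cdot)) = \sum_a Q(s, a)\, e^{\beta Q(s, a)} / \sum_a e^{\beta Q(s, a)}$, which interpolates between the mean ($\beta = 0$) and the max ($\beta \to \infty$); DBS lets the inverse temperature $\beta_t$ grow across iterations so that the update approaches the Bellman max operator in the limit.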
no code implementations • 6 Mar 2019 • Mingyang Yi, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
That is to say, a minimum with balanced values of basis paths is more likely to be flat and to generalize better.
no code implementations • 27 Sep 2018 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Tie-Yan Liu
We then propose the dynamic Boltzmann softmax (DBS) operator to enable convergence to the optimal value function in value iteration.
no code implementations • 27 Sep 2018 • Xufang Luo, Qi Meng, Di He, Wei Chen, Yunhong Wang, Tie-Yan Liu
Based on our observations, we formally define the expressiveness of the state extractor as the rank of the matrix composed of the representations.
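A sketch of how such a measure could be computed (function and names are ours, for illustration): stack the representations of a batch of states into a matrix and take its numerical rank.

import torch

def representation_rank(extractor, states):
    # Expressiveness proxy: rank of the matrix whose rows are state representations.
    with torch.no_grad():
        reps = extractor(states)  # shape: (num_states, feature_dim)
    return torch.linalg.matrix_rank(reps).item()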
no code implementations • 21 Sep 2018 • Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
In this paper, we propose to transfer the Q-function learned in the source task to serve as the target of Q-learning in the new task when certain safety conditions are satisfied.
no code implementations • 19 Sep 2018 • Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu
Motivated by this, we propose a new norm \emph{Basis-path Norm} based on a group of linearly independent paths to measure the capacity of neural networks more accurately.
no code implementations • 8 May 2018 • Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then we conduct theoretical analysis on the convergence rates of ASGD algorithm based on the continuous approximation.
no code implementations • 11 Feb 2018 • Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process}?
1 code implementation • NeurIPS 2017 • Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu
We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain with a much smaller data size.
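A sketch of the GOSS sampling step (variable names ours): keep the top $a \times 100\%$ of instances by gradient magnitude, uniformly sample $b \times 100\%$ of the remainder, and up-weight the sampled small-gradient instances by $(1 - a)/b$ so the information-gain estimate stays approximately unbiased.

import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    # Gradient-based One-Side Sampling: returns selected indices and instance weights.
    rng = np.random.default_rng(seed)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))   # indices sorted by |gradient|, descending
    top_k, rand_k = int(a * n), int(b * n)
    top = order[:top_k]                      # always keep the large-gradient instances
    sampled = rng.choice(order[top_k:], size=rand_k, replace=False)
    indices = np.concatenate([top, sampled])
    weights = np.ones(len(indices))
    weights[top_k:] = (1.0 - a) / b          # compensate the down-sampled remainder
    return indices, weights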
no code implementations • 29 Sep 2017 • Qi Meng, Wei Chen, Yue Wang, Zhi-Ming Ma, Tie-Yan Liu
First, we give a mathematical formulation for the practical data processing procedure in distributed machine learning, which we call data partition with global/local shuffling.
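A toy sketch of the two schemes (our code; the round-robin split is only for illustration): global shuffling permutes the full dataset before partitioning it across $M$ machines, while local shuffling partitions first and lets each machine shuffle only its own shard.

import random

def global_shuffle_partition(data, m, seed=0):
    # Shuffle the whole dataset once, then split it across m machines.
    data = list(data)
    random.Random(seed).shuffle(data)
    return [data[i::m] for i in range(m)]

def local_shuffle_partition(data, m, seed=0):
    # Split first; each machine then shuffles only its local shard.
    shards = [list(data)[i::m] for i in range(m)]
    for i, shard in enumerate(shards):
        random.Random(seed + i).shuffle(shard)
    return shards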
no code implementations • NeurIPS 2016 • Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu
After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration.
no code implementations • ICML 2017 • Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu
We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD.
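In symbols (as we recall the delay-compensation idea; treat the exact form as an assumption): when a worker's gradient $g(w_t)$ arrives after the master has already advanced to $w_{t+\tau}$, the stale gradient is corrected by a first-order expansion whose Hessian is approximated by the outer product of gradients, i.e. $g(w_{t+\tau}) \approx g(w_t) + \lambda\, g(w_t) \odot g(w_t) \odot (w_{t+\tau} - w_t)$.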
no code implementations • 27 Sep 2016 • Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.
no code implementations • 27 Sep 2016 • Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG).
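For reference, the R-ERM objective in its standard form: $\min_w \frac{1}{n} \sum_{i=1}^{n} \ell(f_w(x_i), y_i) + \lambda R(w)$, where $\ell$ is a loss function, $R$ a regularizer, and $\lambda > 0$ the regularization coefficient; GD, SGD, and SVRG differ only in how they estimate the gradient of the empirical-risk term.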