Eduard Gorbunov

[June 29, 2025]
2 papers accepted to ICLR 2025, 1 paper accepted to ICML 2025, 6 new preprints, and
an Assistant Professor position!
- I am happy to report that I co-authored 2 papers accepted to ICLR 2025:
one on the improved analysis of first-order methods for convex (L0,L1)-smooth optimization,
one on Random Reshuffling and Local steps under (L0,L1)-smoothness
- I also have one paper on the high-probability convergence of AdaGrad/Adam-type methods under the heavy-tailed noise accepted to ICML 2025
- New preprints: see Publications (58-64)
- Starting August 1st, I will be joining MBZUAI as an Assistant Professor of Statistics and Data Science! I will have several positions for interns and postdocs - feel free to get in touch if you are interested!
---------------------------------------------------------
[October 10, 2024]
4 papers were accepted to NeurIPS 2024, and 1 new preprint appeared!
- I am happy to report that I co-authored 4 papers accepted to NeurIPS 2024 this year:
one on Federated Learning with Random Reshuffling, compression, and local steps,
one on Byzantine robustness with partial participation of clients,
one on a new scale-invariant version of AdaGrad,
and one on inexact second-order and quasi-Newton methods for variational inequalities
- New preprint: Methods for Convex (L0,L1)-Smooth Optimization: Clipping, Acceleration, and Adaptivity
---------------------------------------------------------
[September 22, 2024]
Paper at EMNLP!
I am happy to report that our joint work with Viktor Moskvoretskii, Nazarii Tupitsa,
Chris Biemann, Samuel Horváth, and Irina Nikishina title "Low-Resource Machine Translation through the Lens of Personalized Federated Learning"
got accepted to EMNLP 2024 (Findings)!
---------------------------------------------------------
[July 25, 2024]
Some updates:
- New preprint: Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed
- New preprint: Low-Resource Machine Translation through the Lens of Personalized Federated Learning
- I gave a talk at PODL 2024, at INSAIT (invited talk), and at EURO 2024 (where I also chaired 2 sessions) on Byzantine-robust learning in the case of partial participation [slides] [shorter version]
- I gave a talk at EUROPT 2024 on the last-iterate convergence of Extragradient-based methods [slides]
- I gave my first oral presentation at ICML! [slides] [poster] [paper]
---------------------------------------------------------
[June 1, 2024]
Some updates:
- I was promoted to Research Scientist at MBZUAI on April 1
- I attended AISTATS 2024, where I presented this work and Kaja presented this work
- One paper was accepted to ICML 2024 for oral presentation
- I gave a keynote talk at NETYS 2024, where I discussed our recent results on Byzantine-robustness in the case of partial participation [slides]
- New preprint: Exploring Jacobian Inexactness in Second-Order Methods for Variational Inequalities: Lower Bounds, Optimal Algorithms and Quasi-Newton Approximations
---------------------------------------------------------
[March 5, 2024]
New preprint:
Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad
---------------------------------------------------------
[February 11, 2024]
2 new preprints and 1 talk at FLOW:
- Zeroth-order Median Clipping for Non-Smooth Convex Optimization Problems with Heavy-tailed Symmetric Noise
- Federated Learning Can Find Friends That Are Beneficial
- Talk at FLOW: Variance Reduction for Byzantine-Robust Distributed Optimization
[slides] [video]
---------------------------------------------------------

[January 20, 2024]
2 papers accepted to AISTATS 2024:
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems
- Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates
---------------------------------------------------------
[November 28, 2023]
Another portion of updates:

- 7 new preprints:
- Intermediate Gradient Methods with Relative Inexactness
- High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise
- Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates
- Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems
- Byzantine-Tolerant Methods for Distributed Variational Inequalities
- Byzantine Robustness and Partial Participation Can Be Achieved Simultaneously: Just Clip Gradient Differences

- Talk at TES Conference on TES Conference on Mathematical Optimization for Machine Learning [slides]

- 3 papers accepted to NeurIPS 2023:
- Single-Call Stochastic Extragradient Methods for Structured Non-monotone Variational Inequalities: Improved Analysis under Weaker Conditions
- Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance
- Byzantine-Tolerant Methods for Distributed Variational Inequalities
---------------------------------------------------------
[1 August, 2023]
Some recent updates:
- 3 new preprints: 1 on heavy-tailed bandits, 1 on so-called partial personalization in FL, and 1 on the new version of distributed gradient clipping
- I gave a talk at Oberseminar at LT Group, University of Hamburg on June 6, 2023 [slides]
- Unfortunately, I didn't get my US visa and I couldn't participate at ICML this year :(
But you can check the posters here:
[poster on negative comonotonicity]
[poster on high-probability convergence]
---------------------------------------------------------
[25 April, 2023]
2 papers got accepted to ICML 2023:

1. "Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity" - joint work with Adrien Taylor,
Samuel Horváth, Gauthier Gidel. Recently, I presented this work at "PEP talks" in Belgium. [slides] [video]

2. "High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance " - joint work with Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik.

I would like to thank my co-authors for amazing work!
---------------------------------------------------------
[23 April, 2023]
It was silent here... Here are some updates:

- 1 paper accepted to AISTATS 2023
- 1 paper accepted to ICLR 2023
- I made a talk at "PEP talks" in Belgium
[paper] [slides] [video]
- 5 new preprints: 1 is a survey on gradient-free methods for convex optimization, 1 on high-probability analysis without bounded variance assumption, 1 on single-call Stochastic Extragradient, 1 on Byzantine-Robust SVRG, and 1 preprint is a short note on unified analysis of SGD prepared for the Encyclopedia of Optimization
---------------------------------------------------------
[25 October, 2022]
New paper out: "Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity" - joint work with
Adrien Taylor, Samuel Horváth, and Gauthier Gidel.
---------------------------------------------------------
[8 October, 2022]
Talk at MBZUAI Workshop on Collaborative Learning
I presented "Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top" - our recent work with Samuel Horváth, Peter Richtárik and Gauthier Gidel.
[slides]
---------------------------------------------------------
[8 October, 2022]
Outstanding reviewer award (NeurIPS 2022)!
I am happy to share that I got outstanding reviewer award at NeurIPS 2022 and a free registration to the conference.
---------------------------------------------------------
[15 September, 2022]
2 papers accepted to NeurIPS 2022:

1. Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise - joint work with Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel.

2. Last-Iterate Convergence of Optimistic Gradient Method for Monotone Variational Inequalities - joint work with Adrien Taylor and Gauthier Gidel.

The list of accepted papers is available here.
---------------------------------------------------------
[9 September, 2022]
Talk at All-Russian Seminar (in Russian).
Title: ``Methods with Clipping for Stochastic Optimization and Variational Inequalities with Heavy-Tailed Noise''.
[video] [slides]
---------------------------------------------------------
[5 September, 2022]
New position! I have started my postdoc position at MBZUAI located in Abu Dhabi, UAE. I will work here in the groups of Samuel Horváth and Martin Takáč. I am excited to be here and looking forward to collaborate with new colleagues and students.
---------------------------------------------------------
[29 August, 2022]
New paper out: "Smooth Monotone Stochastic Variational Inequalities and Saddle Point Problems - Survey" - joint work with Aleksandr Beznosikov, Boris Polyak,
Dmitry Kovalev, Alexander Gasnikov.
---------------------------------------------------------
[13 July, 2022]
Outstanding reviewer award (ICML 2022)!
I am happy to share that I got outstanding reviewer award at ICML 2022 and a free registration to the conference.
---------------------------------------------------------
[3 July, 2022]
Talk at MOTOR 2022.
I presented "Distributed Methods with Absolute Compression and Error Compensation" - joint work with Marina Danilova.
[slides]
---------------------------------------------------------
[14 June, 2022]
New paper out: "Federated Optimization Algorithms with Random Reshuffling and Gradient Compression" - joint work with Abdurakhmon Sadiev, Grigory Malinovsky, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik.
---------------------------------------------------------
[2 June, 2022]
New paper out: "Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise" - joint work with Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel.
---------------------------------------------------------
[1 June, 2022]
New paper out: "Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top" - joint work with Samuel Horváth, Peter Richtárik, Gauthier Gidel.
---------------------------------------------------------
[17 May, 2022]
New paper out: "Last-Iterate Convergence of Optimistic Gradient Method for Monotone Variational Inequalities" - joint work with Adrien Taylor and Gauthier Gidel.
---------------------------------------------------------
[15 May, 2022]
Two papers accepted to ICML 2022:

1. "Secure Distributed Training at Scale" - joint work with Alexander Borzunov, Michael Diskin and Max Ryabinin.

2. "3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation" - joint work with Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, and Zhize Li.

The list of accepted papers is available here.
---------------------------------------------------------
[25 April, 2022]
Talk at Lagrange Workshop on Federated Learning.
I presented "Secure Distributed Training at Scale" - joint work with Alexander Borzunov, Michael Diskin and
Max Ryabinin.
[slides]
---------------------------------------------------------
[8 April, 2022]
One paper accepted to MOTOR 2022: "Distributed Methods with Absolute Compression and Error Compensation" - joint work with Marina Danilova.
---------------------------------------------------------
[28-29 March, 2022]
I presented 2 papers at AISTATS 2022:
1. Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity - joint work with Nicolas Loizou and Gauthier Gidel. [poster] [page on AISTATS website]
2. Stochastic Extragradient: General Analysis and Improved Rates - joint work with Hugo Berard, Gauthier Gidel and Nicolas Loizou. [poster] [page on AISTATS website]
---------------------------------------------------------
[13 March, 2022]
Talk at Rising Stars in AI Symposium 2022:
I presented our work "Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity" (joint work with
Nicolas Loizou and Gauthier Gidel). For me it was the first offline conference since 2020.
[slides] [video]
---------------------------------------------------------
[4 March, 2022]
New paper out: "Distributed Methods with Absolute Compression and Error Compensation" - joint work with Marina Danilova.
---------------------------------------------------------
[16 February, 2022]
Talk at Vector Institute: together with Max Ryabinin we presented our NeurIPS 2022 paper "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices" - joint work with Vsevolod Plokhotnyuk and Gennady Pekhimenko.
[slides]
---------------------------------------------------------
[15 February, 2022]
New paper out: "Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods" - joint work with Aleksandr Beznosikov, Hugo Berard, Nicolas Loizou
---------------------------------------------------------
[2 February, 2022]
New paper out: "3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation" - joint work with Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, and Zhize Li.
---------------------------------------------------------
[18 January, 2022]
2 papers accepted to AISTATS 2022:

1. Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity - joint work with Nicolas Loizou and Gauthier Gidel.

2. Stochastic Extragradient: General Analysis and Improved Rates - joint work with Hugo Berard, Gauthier Gidel and Nicolas Loizou.

The list of accepted papers is available here.
---------------------------------------------------------
[23 December, 2021]
PhD defense: I am happy to share that today I defended my dissertation "Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps" and received a degree of the candidate of physical and mathematical sciences (Russian equivalent of PhD).
[dissertation] [slides of the defense] [video]
---------------------------------------------------------
[20 December, 2021]
I gave a talk at MLO EPFL internal seminar about the paper "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices" - joint work with Max Ryabinin, Vsevolod Plokhotnyuk, and
Gennady Pekhimenko.
[slides] [arXiv] [NeurIPS 2021 Proceedings]
---------------------------------------------------------
[10 December, 2021]
NeurIPS 2021 poster presentation: together with
Max Ryabinin we presented our paper "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices" (joint work with Vsevolod Plokhotnyuk and Gennady Pekhimenko) at NeurIPS 2021 poster session. Although the conference was virtual, we had great discussions with several people.
[poster] [arXiv] [NeurIPS 2021 Proceedings]
---------------------------------------------------------
[1 December, 2021]
I gave a talk at MTL MLOpt internal seminar about the recent paper "Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity" - joint work with Nicolas Loizou and Gauthier Gidel.
[slides] [arXiv]
---------------------------------------------------------
[17 November, 2021]
I gave a talk at All-Russian Optimization Seminar about the recent paper "Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity" - joint work with Nicolas Loizou and Gauthier Gidel.
[video (in Russian)] [slides] [arXiv]
---------------------------------------------------------
[16 November, 2021]
New paper out: "Stochastic Extragradient: General Analysis and Improved Rates" - joint work with Hugo Berard, Gauthier Gidel, Nicolas Loizou.
Abstract: The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequalities problems (VIP) appearing in various machine learning tasks. However, several important questions regarding the convergence properties of SEG are still open, including the sampling of stochastic gradients, mini-batching, convergence guarantees for the monotone finite-sum variational inequalities with possibly non-monotone terms, and others. To address these questions, in this paper, we develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner. Besides standard setups, like Same-Sample SEG under Lipschitzness and monotonicity or Independent-Samples SEG under uniformly bounded variance, our approach allows us to analyze variants of SEG that were never explicitly considered in the literature before. Notably, we analyze SEG with arbitrary sampling which includes importance sampling and various mini-batching strategies as special cases. Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions.
---------------------------------------------------------
[3 November, 2021]
I gave a talk at Federated Learning One-World Seminar about the recent paper "Secure Distributed Training at Scale" - joint work with Alexander Borzunov, Michael Diskin and Max Ryabinin.

[video] [slides] [arXiv]
---------------------------------------------------------

[20 October, 2021]
Paper accepted to NeurIPS 2021 workshop:
A short version of "EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback" - joint work with lyas Fatkhullin, Igor Sokolov, Zhize Li, and Peter Richtárik - was accepted to the NeurIPS 2021 workshop OPT2021.
---------------------------------------------------------
[15 October, 2021]
NeurIPS 2021 - Outstanding Reviewer Award:
I got "outstanding reviewer award'' (top 8 % of reviewers) and a free registration to the conference.

---------------------------------------------------------

[8 October, 2021]
New paper out: "Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity" - joint work with Nicolas Loizou and Gauthier Gidel.
Abstract: Extragradient method (EG) Korpelevich [1976] is one of the most popular methods for solving saddle point and variational inequalities problems (VIP). Despite its long history and significant attention in the optimization community, there remain important open questions about convergence of EG. In this paper, we resolve one of such questions and derive the first last-iterate O(1/K) convergence rate for EG for monotone and Lipschitz VIP without any additional assumptions on the operator. The rate is given in terms of reducing the squared norm of the operator. Moreover, we establish several results on the (non-)cocoercivity of the update operators of EG, Optimistic Gradient Method, and Hamiltonian Gradient Method, when the original operator is monotone and Lipschitz.
---------------------------------------------------------
[7 October, 2021]
New paper out: "EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback" - joint work with lyas Fatkhullin, Igor Sokolov, Zhize Li, and Peter Richtárik.
Abstract: First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators. However, existing theory of EF relies on very strong assumptions (e.g., bounded gradients), and provides pessimistic convergence rates (e.g., while the best known rate for EF in the smooth nonconvex regime, and when full gradients are compressed, is O(1/T^{2/3}), the rate of gradient descent in the same regime is O(1/T)). Recently, Richtárik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. EF21 removes the aforementioned theoretical deficiencies of EF and at the same time works better in practice. In this work we propose six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum and bidirectional compression. Several of these techniques were never analyzed in conjunction with EF before, and in cases where they were (e.g., bidirectional compression), our rates are vastly superior.
---------------------------------------------------------
[28 September, 2021]
Paper accepted to NeurIPS 2021:
"Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices" - joint work with Max Ryabinin, Vsevolod Plokhotnyuk, Gennady Pekhimenko.
Abstract: Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many real-world applications, such as federated learning and cloud-based distributed training, operate on unreliable devices with unstable network bandwidth. As a result, these applications are restricted to using parameter servers or gossip-based averaging protocols. In this work, we lift that restriction by proposing Moshpit All-Reduce -- an iterative averaging protocol that exponentially converges to the global average. We demonstrate the efficiency of our protocol for distributed optimization with strong theoretical guarantees. The experiments show 1.3x speedup for ResNet-50 training on ImageNet compared to competitive gossip-based strategies and 1.5x speedup when training ALBERT-large from scratch using preemptible compute nodes.
---------------------------------------------------------
[16 September, 2021]
New scholarship: I received A. M. Raigorodskii Scholarship for Contribution to the Development of Numerical Optimization Methods. The scholarship has different levels and I won the main one. Great thanks to A. M. Raigorodskii for the support!
---------------------------------------------------------
[24 July, 2021]
ICML 2021 - top 10% of reviewers: I am among the top 10% of reviewers. Free registration to the conference was a good bonus :)
---------------------------------------------------------
[21 June, 2021]
New paper out: "Secure Distributed Training at Scale" - joint work with Alexander Borzunov, Michael Diskin
and Max Ryabinin.
Abstract: Some of the hardest problems in deep learning can be solved with the combined effort of many independent parties, as is the case for volunteer computing and federated learning. These setups rely on high numbers of peers to provide computational resources or train on decentralized datasets. Unfortunately, participants in such systems are not always reliable. Any single participant can jeopardize the entire training run by sending incorrect updates, whether deliberately or by mistake. Training in presence of such peers requires specialized distributed training algorithms with Byzantine tolerance. These algorithms often sacrifice efficiency by introducing redundant communication or passing all updates through a trusted server. As a result, it can be infeasible to apply such algorithms to large-scale distributed deep learning, where models can have billions of parameters. In this work, we propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency. We rigorously analyze this protocol: in particular, we provide theoretical bounds for its resistance against Byzantine and Sybil attacks and show that it has a marginal communication overhead. To demonstrate its practical effectiveness, we conduct large-scale experiments on image classification and language modeling in presence of Byzantine attackers.
---------------------------------------------------------
[10 June, 2021]
New paper out: "Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise" - joint work with Marina Danilova, Innokentiy Shibaev, Pavel Dvurechensky and Alexander Gasnikov.
Abstract: Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds with the dependence on the confidence level that is either negative-power or logarithmic but under an additional assumption of sub-Gaussian (light-tailed) noise distribution that may not hold in practice, e.g., in several NLP tasks. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with Hölder-continuous gradients, and for both methods, we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.
---------------------------------------------------------
[8 June, 2021]
Start of the internship at Mila, Montreal!
I started my internship at the group of Gauthier Gidel. It is my pleasure to work with Gauthier! I will be here till the end of September.
---------------------------------------------------------
[8 May, 2021]
Paper accepted to ICML 2021:
"MARINA: Faster Non-Convex Distributed Learning with Compression" - joint work with Konstantin Burlachenko, Zhize Li, Peter Richtárik.
Abstract: We develop and analyze MARINA: a new communication efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication compression strategy based on the compression of gradient differences which is reminiscent of but different from the strategy employed in the DIANA method of Mishchenko et al (2019). Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance. To the best of our knowledge, the communication complexity bounds we prove for MARINA are strictly superior to those of all previous first order methods. Further, we develop and analyze two variants of MARINA: VR-MARINA and PP-MARINA. The first method is designed for the case when the local loss functions owned by clients are either of a finite sum or of an expectation form, and the second method allows for partial participation of clients -- a feature important in federated learning. All our methods are superior to previous state-of-the-art methods in terms of the oracle/communication complexity. Finally, we provide convergence analysis of all methods for problems satisfying the Polyak-Lojasiewicz condition.
---------------------------------------------------------
[19 March, 2021]
ICLR 2021: I was recognized as an "outstanding reviewer" and got a free registration to the conference.

---------------------------------------------------------

[10 March, 2021]
I gave a talk at Federated Learning One-World Seminar about the recent paper "MARINA: Faster Non-Convex Distributed Learning with Compression" - joint work with Konstantin Burlachenko, Zhize Li, and Peter Richtárik.

[video] [slides] [arXiv]
---------------------------------------------------------

[4 March 2021]
New paper out: "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices" - joint work with Max Ryabinin, Vsevolod Plokhotnyuk, Gennady Pekhimenko.
Abstract: Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many real-world applications, such as federated learning and cloud-based distributed training, operate on unreliable devices with unstable network bandwidth. As a result, these applications are restricted to using parameter servers or gossip-based averaging protocols. In this work, we lift that restriction by proposing Moshpit All-Reduce -- an iterative averaging protocol that exponentially converges to the global average. We demonstrate the efficiency of our protocol for distributed optimization with strong theoretical guarantees. The experiments show 1.3x speedup for ResNet-50 training on ImageNet compared to competitive gossip-based strategies and 1.5x speedup when training ALBERT-large from scratch using preemptible compute nodes.
---------------------------------------------------------
[26 February 2021]
New scholarship: I received A. M. Raigorodskii Scholarship for Contribution to the Development of Numerical Optimization Methods. The scholarship has different levels and I won the main one. Great thanks to A. M. Raigorodskii for the support!
---------------------------------------------------------
[15 February 2021]
New paper out: "MARINA: Faster Non-Convex Distributed Learning with Compression" - joint work with
Konstantin Burlachenko, Zhize Li, Peter Richtárik.
Abstract: We develop and analyze MARINA: a new communication efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication compression strategy based on the compression of gradient differences which is reminiscent of but different from the strategy employed in the DIANA method of Mishchenko et al (2019). Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance. To the best of our knowledge, the communication complexity bounds we prove for MARINA are strictly superior to those of all previous first order methods. Further, we develop and analyze two variants of MARINA: VR-MARINA and PP-MARINA. The first method is designed for the case when the local loss functions owned by clients are either of a finite sum or of an expectation form, and the second method allows for partial participation of clients -- a feature important in federated learning. All our methods are superior to previous state-of-the-art methods in terms of the oracle/communication complexity. Finally, we provide convergence analysis of all methods for problems satisfying the Polyak-Lojasiewicz condition.
---------------------------------------------------------
[10 February 2021]
ICML 2021 Expert Reviewer invitation: I accepted an invitation to serve as an expert reviewer for ICML 2021.

---------------------------------------------------------

[23 January 2021]
Paper accepted to AISTATS 2021:
"Local SGD: Unified Theory and New Efficient Methods" - joint work with Filip Hanzely and Peter Richtárik.
Abstract: We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as a special case of our general framework, including Local-SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both random and deterministic number of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
---------------------------------------------------------
[19 January 2021]
I gave a talk "Linearly Converging Error Compensated SGD" at NeurIPS New Year AfterParty (Yandex).
[slides] [video] [paper]
---------------------------------------------------------
[11 December 2020]
New paper out: "Recent Theoretical Advances in Non-Convex Optimization" - joint work with Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Sergey Guminov, Dmitry Kamzolov and Innokentiy Shibaev.
Abstract: Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems could not be solved efficiently in a reasonable time. Then we give a list of problems that can be solved efficiently to find the global minimizer by exploiting the structure of the problem as much as it is possible. Another way to deal with non-convexity is to relax the goal from finding the global minimum to finding a stationary point or a local minimum. For this setting, we first present known results for the convergence rates of deterministic first-order methods, which are then followed by a general theoretical analysis of optimal stochastic and randomized gradient schemes, and an overview of the stochastic first-order methods. After that, we discuss quite general classes of non-convex problems, such as minimization of α-weakly-quasi-convex functions and functions that satisfy Polyak-Lojasiewicz condition, which still allow obtaining theoretical convergence guarantees of first-order methods. Then we consider higher-order and zeroth-order/derivative-free methods and their convergence rates for non-convex optimization problems.
---------------------------------------------------------
[6-12 December 2020]
I attended NeurIPS 2020!
The conference was organized perfectly. In particular, poster sessions in Gather Town were great: this tool made them as vivid as possible. I presented the paper "Linearly Converging Error Compensated SGD" - joint work with Dmitry Kovalev, Dmitry Makarenko and Peter Richtárik. Marina Danilova presented our joint work with Alexander Gasnikov - "Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping".
[poster] [poster]
---------------------------------------------------------
[26 November 2020]
New paper out: "Recent theoretical advances in decentralized distributed convex optimization" - joint work with Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh and Alexander Gasnikov.
Abstract: In the last few years, the theory of decentralized distributed convex optimization has made significant progress. The lower bounds on communications rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the non-distributed setup. In particular, we provide our recent results that have not been published yet, and that could be found in details only in arXiv preprints.
---------------------------------------------------------
[3 November, 2020]
New paper out: "Local SGD: Unified Theory and New Efficient Methods" - joint work with Filip Hanzely and
Peter Richtárik.
Abstract: We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as a special case of our general framework, including Local-SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both random and deterministic number of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
---------------------------------------------------------
[23 October, 2020]
New paper out (Spotlight @ NeurIPS 2020):
"Linearly Converging Error Compensated SGD" - joint work with Dmitry Kovalev, Dmitry Makarenko, and Peter Richtárik.
Abstract: In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and SGD with delayed updates (D-SGD). Via a single theorem, we derive the complexity results for all the methods that fit our framework. For the existing methods, this theorem gives the best-known complexity results. Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization and derive the convergence rates for these methods beating the state-of-the-art results. In order to illustrate the strength of our framework, we develop 16 new methods that fit this. In particular, we propose the first method called EC-SGD-DIANA that is based on error-feedback for biased compression operator and quantization of gradient differences and prove the convergence guarantees showing that EC-SGD-DIANA converges to the exact optimum asymptotically in expectation with constant learning rate for both convex and strongly convex objectives when workers compute full gradients of their loss functions. Moreover, for the case when the loss function of the worker has the form of finite sum, we modified the method and got a new one called EC-LSVRG-DIANA which is the first distributed stochastic method with error feedback and variance reduction that converges to the exact optimum asymptotically in expectation with a constant learning rate.
---------------------------------------------------------
[21 October, 2020]
I am among the top 10% reviewers at NeurIPS 2020!
I am happy to contribute to the community through reviewing and extremely happy and proud that my work as a reviewer was highly evaluated. Free registration is a nice bonus :)

---------------------------------------------------------

[7 October, 2020]
I gave a talk on Federated Learning One-World Seminar about the recent paper "Linearly Converging Error Compensated SGD" - joint work with Dmitry Kovalev, Dmitry Makarenko, and Peter Richtárik.

[video] [slides] [arXiv]
---------------------------------------------------------

[26 September, 2020]
Two papers accepted to NeurIPS 2020!

1. "Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping" - joint work with
Marina Danilova and Alexander Gasnikov - got accepted for a poster presentation.
[arXiv] [video from MLSS 2020] [slides from MLSS 2020]

2. "Linearly Converging Error Compensated SGD" - joint work with Dmitry Kovalev, Dmitry Makarenko and
Peter Richtárik - got accepted for a spotlight presentation. The paper will be available on arXiv soon.
[video from MLSS 2020] [slides from MLSS 2020]

I would like to thank my great co-authors!
---------------------------------------------------------
[22 September, 2020]
It is my pleasure to join Yandex.Research-MIPT Lab as a junior researcher!
---------------------------------------------------------
[17 September, 2020]
Teaching in the Fall 2020 semester. This semester I will be one of the lecturers for a fresh course "Optimization methods at ML" at MIPT that we developed together with Alexander Gasnikov, Marina Danilova, Alexander Maslovsky, and Alexander Rogozin.
---------------------------------------------------------
[1 September, 2020]
My remote internship at Peter Richtárik's group has started today and will last for 6 months! It is a great pleasure for me to be a part of Peter's group again!
---------------------------------------------------------
[26-28 August, 2020]
I am attending AISTATS 2020, which is held virtually this year for the reasons you all know. I am glad to present our joint work with Filip Hanzely, and Peter Richtárik called "A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent".
[video]
---------------------------------------------------------
[13 August, 2020]
Great news: our paper "An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization" with Pavel Dvurechensky and Alexander Gasnikov was accepted to European Journal of Operational Research!
---------------------------------------------------------
[3 August, 2020]
I am happy to announce that our paper "Stochastic Three Points Method for Unconstrained Smooth Minimization" with El Houcine Bergou and Peter Richtárik was accepted to SIAM Journal on Optimization!
---------------------------------------------------------
[2 August, 2020]
I have arrived at Sirius university (Sochi, Russia). I will give here 2 lectures and will be a mentor for a couple of students research projects on optimization. I will be here till the end of August.
---------------------------------------------------------
[8 July, 2020]
I gave a talk "On the convergence of SGD-like methods for convex and non-convex optimization problems" on Russian Optimization Seminar.
[video] [slides]
---------------------------------------------------------
[28 June - 10 July, 2020]
It was my pleasure to take part in online Machine Learning Summer School 2020. I have met a lot of interesting people, listened to amazing lectures, and taken part in several round tables. Finally, I have presented our joint work with Dmitry Kovalev, Dmitry Makarenko and Peter Richtárik called "Linearly Converging Error Compensated SGD".
[video] [slides]

Marina Danilova also presented our joint work with Alexander Gasnikov called "Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping".
[video] [slides]
---------------------------------------------------------
[16 June, 2020]
I have defended my Master thesis at MIPT!
---------------------------------------------------------
[22 May, 2020]
New paper out: "Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping" - joint work with Marina Danilova and Alexander Gasnikov.
Abstract: In this paper, we propose a new accelerated stochastic first-order method called clipped-SSTM for smooth convex stochastic optimization with heavy-tailed distributed noise in stochastic gradients and derive the first high-probability complexity bounds for this method closing the gap in the theory of stochastic optimization with heavy-tailed noise. Our method is based on a special variant of accelerated Stochastic Gradient Descent (SGD) and clipping of stochastic gradients. We extend our method to the strongly convex case and prove new complexity bounds that outperform state-of-the-art results in this case. Finally, we extend our proof technique and derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
---------------------------------------------------------
[29 April, 2020]
I got accepted to The Machine Learning Summer School this year which will be fully virtual. There were more than 1300 applications and only 180 available slots, so, I am incredibly pleased to be one of those 180 students and looking forward to taking part in this school.
---------------------------------------------------------
[27-30 April, 2020]
I took part in the first fully virtual conference - ICLR 2020!
I have presented our paper "A Stochastic Derivative Free Optimization Method with Momentum" - joint work with Adel Bibi, Ozan Sener, El Houcine Bergou and Peter Richtárik.
I enjoyed the online format and met a lot of amazing people there. Great thanks to the organizers!
[video]
---------------------------------------------------------
[2 February, 2020]
I have arrived at KAUST. I will be here till the end of March to work with Peter Richtárik as a visiting student.

Update: due to the situation with COVID-19 I have decided to come back to Moscow earlier (no, there is no confirmed cases at KAUST). I have arrived in Moscow on 14 March.
---------------------------------------------------------
[15 January, 2020]
I got Huawei scholarship for MIPT bachelor and master students for academic and reseacrh achievements!
The prize is 125 000 Russian rubles.
---------------------------------------------------------
[07 January, 2020]
Paper accepted to AISTATS 2020!
Our paper "A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent" - joint work with Filip Hanzely and Peter Richtárik - got accepted to AISTATS 2020. The conference will take place in Palermo, Sicily, Italy during June 3-5.
---------------------------------------------------------
[20 December, 2019]
Paper accepted to ICLR 2020!
Our paper "A Stochastic Derivative Free Optimization Method with Momentum" - joint work with Adel Bibi, Ozan Sener, El Houcine Bergou and Peter Richtárik - got accepted to ICLR 2020. The conference will take place in Addis Ababa, Ethiopia during April 26-30.

The paper was presented last week at the NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop, here is our poster.
---------------------------------------------------------
[26 November, 2019]
New paper out: "Derivative-Free Method For Decentralized Distributed Non-Smooth Optimization" - joint work with Aleksandr Beznosikov and Alexander Gasnikov.
Abstract: In this paper, we propose new derivative-free method which is based on the Sliding Algorithm from Lan (2016, 2019) for the convex composite optimization problem that includes two terms: smooth one and non-smooth one. We prove the convergence rate for the new method that matches the corresponding rate for the first-order method up to a factor proportional to the dimension of the space. We apply this method for the decentralized distributed optimization and prove the bounds for the number of communication rounds for this method that matches the lower bounds. We prove the bound for the number of zeroth-order oracle calls per node that matches the similar state-of-the-art bound for the first-order decentralized distributed optimization up to to the factor proportional to the dimension of the space.
---------------------------------------------------------
[20 November, 2019]
New paper out: "Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization" - joint work with Darina Dvinskikh and Alexander Gasnikov.
Abstract: We consider stochastic convex optimization problems with affine constraints and develop several methods using either primal or dual approach to solve it. In the primal case, we use special penalization technique to make the initial problem more convenient for using optimization methods. We propose algorithms to solve it based on Similar Triangles Method with Inexact Proximal Step for the convex smooth and strongly convex smooth objective functions and methods based on Gradient Sliding algorithm to solve the same problems in the non-smooth case. We prove the convergence guarantees in the smooth convex case with deterministic first-order oracle.

We propose and analyze three novel methods to handle stochastic convex optimization problems with affine constraints: SPDSTM, R-RRMA-AC-SA2 and SSTM_sc. All methods use stochastic dual oracle. SPDSTM is the stochastic primal-dual modification of STM and it is applied for the dual problem when the primal functional is strongly convex and Lipschitz continuous on some ball. We extend the result from Dvinskikh & Gasnikov (2019) for this method to the case when only biased stochastic oracle is available. R-RRMA-AC-SA2 is an accelerated stochastic method based on the restarts of RRMA-AC-SA2 from Foster et al. (2019) and SSTM_sc is just stochastic STM for strongly convex problems. Both methods are applied to the dual problem when the primal functional is strongly convex, smooth and Lipschitz continuous on some ball and use stochastic dual first-order oracle. We develop convergence analysis for these methods for the unbiased and biased oracles respectively.

Finally, we apply all aforementioned results and approaches to solve the decentralized distributed optimization problem and discuss the optimality of the obtained results in terms of communication rounds and the number of oracle calls per node.
---------------------------------------------------------
[18 October, 2019]
Today I gave a talk based on our recent paper with Filip Hanzely and Peter Richtárik at SIERRA, INRIA.
Peter, thanks for your slides that you prepared for ICCOPT 2019 Summer School.
[slides]
---------------------------------------------------------
[06 October - 26 October, 2019]
I am visiting Adrien Taylor and Francis Bach at SIERRA, INRIA.
---------------------------------------------------------
[02 October, 2019]
2 papers got accepted to NeurIPS workshops:

1) "An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization" - joint work with Pavel Dvurechensky and Alexander Gasnikov - got accepted to the workshop "Beyond First Order Methods in ML"

2) "A Stochastic Derivative Free Optimization Method with Momentum" - joint work with Adel Bibi, Ozan Sener, El Houcine Bergou and Peter Richtárik - got accepted to the workshop "Optimization Foundations for Reinforcement Learning"
---------------------------------------------------------
[23 June, 2019]
New book out (in Russian): "Lecture Notes on Stochastic Processes" - joint work with Alexander Gasnikov, Sergey Guz, Elena Chernousova, Maksim Shirobokov, Egor Shulgin.
---------------------------------------------------------
[30 May, 2019]
New paper out: "A Stochastic Derivative Free Optimization Method with Momentum" - joint work with Adel Bibi, Ozan Sener, El Houcine Bergou and Peter Richtárik.
Abstract: We consider the problem of unconstrained minimization of a smooth objective function in R^d in setting where only function evaluations are possible. We propose and analyze stochastic zeroth-order method with heavy ball momentum. In particular, we propose, SMTP, a momentum version of the stochastic three-point method (STP). We show new complexity results for non-convex, convex and strongly convex functions. We test our method on a collection of learning to continuous control tasks on several MuJoCo environments with varying difficulty andcompare against STP, other state-of-the-art derivative-free optimization algorithms and against policy gradient methods. SMTP significantly outperforms STP and all other methods that we considered in our numerical experiments. Our second contribution is SMTP with importance sampling which we call SMTP_IS. We provide convergence analysis of this method for non-convex, convex and strongly convex objectives.
---------------------------------------------------------
[27 May, 2019]
New paper out: "A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent" - joint work with Filip Hanzely and Peter Richtárik.
Abstract: In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions, convergence analyses, have different applications, and which have been developed separately in various communities. We show that our framework includes methods with and without the following tricks, and their combinations: variance reduction, importance sampling, mini-batch sampling, quantization, and coordinate sub-sampling. As a by-product, we obtain the first unified theory of SGD and randomized coordinate descent (RCD) methods, the first unified theory of variance reduced and non-variance-reduced SGD methods, and the first unified theory of quantized and non-quantized methods. A key to our approach is a parametric assumption on the iterates and stochastic gradients. In a single theorem we establish a linear convergence result under this assumption and strong-quasi convexity of the loss function. Whenever we recover an existing method as a special case, our theorem gives the best known complexity result. Our approach can be used to motivate the development of new useful methods, and offers pre-proved convergence guarantees. To illustrate the strength of our approach, we develop five new variants of SGD, and through numerical experiments demonstrate some of their properties.
---------------------------------------------------------
[10 April, 2019]
I got the Ilya Segalovich Award - Yandex scientific scholarship! The award includes scholarship (total amount - 350,000 Russian rubles), insternship offer at Yandex.Research, travel grant to attend one international conference and personal mentor from Yandex.
[video] [about award] [about winners]
---------------------------------------------------------
[23 March, 2019]
New paper out: "On Dual Approach for Distributed Stochastic Convex Optimization over Networks" - joint work with Darina Dvinskikh, Alexander Gasnikov, Pavel Dvurechensky and Cesar A. Uribe.
Abstract: We introduce dual stochastic gradient oracle methods for distributed stochastic convex optimization problems over networks. We estimate the complexity of the proposed method in terms of probability of large deviations. This analysis is based on a new technique that allows to bound the distance between the iteration sequence and the solution point. By the proper choice of batch size, we can guarantee that this distance equals (up to a constant) to the distance between the starting point and the solution.
---------------------------------------------------------
[10 February, 2019]
New paper out: "Stochastic Three Points Method for Unconstrained Smooth Minimization" - joint work with El Houcine Bergou and Peter Richtárik.
Abstract: In this paper we consider the unconstrained minimization problem of a smooth function in R^n in a setting where only function evaluations are possible. We design a novel randomized derivative-free algorithm --- the stochastic three points (STP) method --- and analyze its iteration complexity. At each iteration, STP generates a random search direction according to a certain fixed probability law. Our assumptions on this law are very mild: roughly speaking, all laws which do not concentrate all measure on any halfspace passing through the origin will work. For instance, we allow for the uniform distribution on the sphere and also distributions that concentrate all measure on a positive spanning set.
Given a current iterate x, STP compares the objective function at three points: x, x+αs and x−αs, where α>0 is a stepsize parameter and s is the random search direction. The best of these three points is the next iterate. We analyze the method STP under several stepsize selection schemes (fixed, decreasing, estimated through finite differences, etc). We study non-convex, convex and strongly convex cases.
We also propose a parallel version for STP, with iteration complexity bounds which do not depend on the dimension n.
arXiv: 1901.09269
---------------------------------------------------------
[08 February, 2019]
I am a reviewer for ICML 2019! This is my first time being a reviewer for such a conference.
---------------------------------------------------------
[26 January, 2019]
New paper out: "Distributed learning with compressed gradient differences" - joint work with Konstantin Mishchenko, Martin Takáč and Peter Richtárik.
Abstract: Training very large machine learning models requires a distributed computing approach, with communication of the model updates often being the bottleneck. For this reason, several methods based on the compression (e.g., sparsification and/or quantization) of the updates were recently proposed, including QSGD (Alistarh et al., 2017), TernGrad (Wen et al., 2017), SignSGD (Bernstein et al., 2018), and DQGD (Khirirat et al., 2018). However, none of these methods are able to learn the gradients, which means that they necessarily suffer from several issues, such as the inability to converge to the true optimum in the batch mode, inability to work with a nonsmooth regularizer, and slow convergence rates. In this work we propose a new distributed learning method - DIANA - which resolves these issues via compression of gradient differences. We perform a theoretical analysis in the strongly convex and nonconvex settings and show that our rates are vastly superior to existing rates. Our analysis of block quantization and differences between l2 and l∞ quantization closes the gaps in theory and practice. Finally, by applying our analysis technique to TernGrad, we establish the first convergence rate for this method.
arXiv: 1901.09269
---------------------------------------------------------
[13 January - 24 February, 2019]
I am visiting Peter Richtárik at KAUST.
---------------------------------------------------------
[05 September, 2018]
The paper "Stochastic spectral and conjugate descent methods" - joint work with Dmitry Kovalev, Elnur Gasanov and Peter Richtárik - accepted to The Thirty-second Annual Conference on Neural Information Processing Systems (NIPS).
Abstract: The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves, and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We develop and analyze also inexact variants of these methods where the spectral and conjugate directions are allowed to be approximate only. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.

The conference will be in Montreal, Canada during December 3-8, 2018.
---------------------------------------------------------
[01 July - 06 July, 2018]
I am attending the 23rd International Symposium on Mathematical Programming in Bordeaux, France. My talk is scheduled on Friday, 06 July, in the section "New methods for stochastic optimization and variational inequalities".
The Talk: ”An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization”
[slides]
---------------------------------------------------------
[27 June - 28 June, 2018]
I am in Grenoble, attending the Grenoble Optimization Days 2018. By the way I have the new most favorite photo!

Here I read the excellent book by my advisor Alexander Gasnikov.
------------------------------------------------------------

[25 June, 2018]
Today I successfully defended my Bachelor diploma.
Thesis: "Accelerated Directional Searchs and Gradient-Free Methods with non-Euclidean prox-structure"
---------------------------------------------------------
[10 June - 15 June, 2018]
I am in Traditional Youth School ”Control, Information and Optimization” (Voronovo, Russia) organized by Boris Polyak and Elena Gryazina.

I presented a poster "An Accelerated Directional Derivative Method for SmoothStochastic Convex Optimization". My work was chosen and I gave a talk there. I won third prize for this talk in competitions of best talks among participants.
[slides of the talk]
---------------------------------------------------------
[14 April, 2018]
I am attending workshop ”Optimization at Work” at MIPT (Moscow, Russia) with a talk ”An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization” .
[slides] [video]
---------------------------------------------------------
[10 April, 2018]
New paper out: "On the upper bound for the mathematical expectation of the norm of a vector uniformly distributed on the sphere and the phenomenon of concentration of uniform measure on the sphere" - joint work with Evgeniya Vorontsova and Alexander Gasnikov.
Abstract: We considered the problem of obtaining upper bounds for the mathematical expectation of the q-norm (2⩽q⩽∞) of the vector which is uniformly distributed on the unit Euclidean sphere.
---------------------------------------------------------
[8 April, 2018]
New paper out: "An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization" - joint work with Pavel Dvurechensky and Alexander Gasnikov.
Abstract: We consider smooth stochastic convex optimization problems in the context of algorithms which are based on directional derivatives of the objective function. This context can be considered as an intermediate one between derivative-free optimization and gradient-based optimization. We assume that at any given point and for any given direction, a stochastic approximation for the directional derivative of the objective function at this point and in this direction is available with some additive noise. The noise is assumed to be of an unknown nature, but bounded in the absolute value. We underline that we consider directional derivatives in any direction, as opposed to coordinate descent methods which use only derivatives in coordinate directions. For this setting, we propose a non-accelerated and an accelerated directional derivative method and provide their complexity bounds. Despite that our algorithms do not use gradient information, our non-accelerated algorithm has a complexity bound which is, up to a factor logarithmic in problem dimension, similar to the complexity bound of gradient-based algorithms. Our accelerated algorithm has a complexity bound which coincides with the complexity bound of the accelerated gradient-based algorithm up to a factor of square root of the problem dimension, whereas for existing directional derivative methods this factor is of the order of problem dimension. We also extend these results to strongly convex problems. Finally, we consider derivative-free optimization as a particular case of directional derivative optimization with noise in the directional derivative and obtain complexity bounds for non-accelerated and accelerated derivative-free methods. Complexity bounds for these algorithms inherit the gain in the dimension dependent factors from our directional derivative methods.
---------------------------------------------------------
[25 February, 2018]
New paper out: "An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization" - joint work with Pavel Dvurechensky and Alexander Gasnikov.
Abstract: We consider an unconstrained problem of minimization of a smooth convex function which is only available through noisy observations of its values, the noise consisting of two parts. Similar to stochastic optimization problems, the first part is of a stochastic nature. On the opposite, the second part is an additive noise of an unknown nature, but bounded in the absolute value. In the two-point feedback setting, i.e. when pairs of function values are available, we propose an accelerated derivative-free algorithm together with its complexity analysis. The complexity bound of our derivative-free algorithm is only by a factor of \sqrt{n} larger than the bound for accelerated gradient-based algorithms, where n is the dimension of the decision variable. We also propose a non-accelerated derivative-free algorithm with a complexity bound similar to the stochastic-gradient-based algorithm, that is, our bound does not have any dimension-dependent factor. Interestingly, if the solution of the problem is sparse, for both our algorithms, we obtain better complexity bound if the algorithm uses a 1-norm proximal setup, rather than the Euclidean proximal setup, which is a standard choice for unconstrained problems.
---------------------------------------------------------
[10 February, 2018]
New paper out: "Stochastic spectral and conjugate descent methods" - joint work with Dmitry Kovalev, Elnur Gasanov and Peter Richtárik.
Abstract: The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves, and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We develop and analyze also inexact variants of these methods where the spectral and conjugate directions are allowed to be approximate only. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.
---------------------------------------------------------
[9 February, 2018]
I started my final semester as a Bachelor student today. This spring I will be a tutor for the course "Algorithms and Models of Computation".
---------------------------------------------------------
[5 February - 7 February, 2018]
I am attending KAUST Research Workshop on Optimization and Big Data with poster ”Stochastic Spectral Descent Methods” based on our joint work with Dmitry Kovalev, Elnur Gasanov and Peter Richtárik.
---------------------------------------------------------
[14 January - 08 February, 2018]
I am visiting Peter Richtárik at KAUST.
---------------------------------------------------------
[25 November, 2017]
Today I got a prize for the best talk in Section of Information Transmission Problems, Data Analysis and Optimization, 60th Scientific Conference of MIPT.
The talk: ”About accelerated Directional Search with non-Euclidean prox-structure”
[slides]
---------------------------------------------------------
[27 October, 2017]
I am attending workshop ”Optimization at Work” at MIPT (Moscow, Russia) with a talk ”Accelerated Directional Search with non-Euclidean prox-structure”.
[slides]
---------------------------------------------------------
[30 September, 2017]
New paper out: "Accelerated Directional Search with non-Euclidean prox-structure" - joint work with Evgeniya Vorontsova and Alexander Gasnikov.
Abstract: In the paper we propose an accelerated directional search method with non-euclidian prox-structure. We consider convex unconstrained optimization problem in R^n. For simplicity we start from the zero point. We expect in advance that 1-norm of the solution is close enough to its 2-norm. In this case the standard accelerated Nesterov's directional search method can be improved. In the paper we show how to make Nesterov's method n-times faster (up to a log n-factor) in this case. The basic idea is to use linear coupling, proposed by Allen-Zhu & Orecchia in 2014, and to make Grad-step in 2-norm, but Mirr-step in 1-norm. We show that for constrained optimization problem this approach stable upon an obstacle.
---------------------------------------------------------
[29 July, 2017]
I am happy to announce that I started working with Peter Richtárik as a member of his research group at MIPT.

About me

Skills and Interests

Education

BSc in Applied Mathematics Moscow Institute of Physics and Technology

MSc in Applied Mathematics Moscow Institute of Physics and Technology

PhD in Computer Science
Moscow Institute of Physics and Technology

Work Experience

Moscow Institute of Physics and Technology

Moscow Institute of Physics and Technology

Huawei, Moscow

Huawei-MIPT group, Moscow

Institute for Information Transmission Problems of Russian Academy of Sciences, Moscow

Moscow Institute of Physics and Technology

Russian Presidential Academy of National Economy and Public Administration, Moscow

Moscow Institute of Physics and Technology

National Research University Higher School of Economics

Yandex.Research-MIPT Lab

Mila

MBZUAI

MBZUAI

MBZUAI

News

About me

Skills and Interests

Education

BSc in Applied Mathematics Moscow Institute of Physics and Technology

MSc in Applied Mathematics Moscow Institute of Physics and Technology

PhD in Computer Science Moscow Institute of Physics and Technology

Work Experience

Moscow Institute of Physics and Technology

Moscow Institute of Physics and Technology

Huawei, Moscow

Huawei-MIPT group, Moscow

Institute for Information Transmission Problems of Russian Academy of Sciences, Moscow

Moscow Institute of Physics and Technology

Russian Presidential Academy of National Economy and Public Administration, Moscow

Moscow Institute of Physics and Technology

National Research University Higher School of Economics

Yandex.Research-MIPT Lab

Mila

MBZUAI

MBZUAI

MBZUAI

News

PhD in Computer Science
Moscow Institute of Physics and Technology