
About me

I am on the job market for a tenure-track Assistant Professor position!

I am a postdoctoral fellow at MBZUAI, hosted by Samuel Horváth and Martin Takáč. Previously, I worked as a junior researcher at MIPT and as a research consultant (remote postdoc) at Mila, in the group of Gauthier Gidel. I obtained my PhD at MIPT (Phystech School of Applied Mathematics and Informatics), where I worked under the supervision of professors Alexander Gasnikov and Peter Richtárik. My research interests include Stochastic Optimization and its applications to Machine Learning, Distributed Optimization, Derivative-Free Optimization, and Variational Inequalities. In 2019, I received the Ilya Segalovich Award, a highly selective Yandex scholarship.

Google scholar
ResearchGate
arXiv
X (Twitter)
CV

E-mail: eduard.gorbunov@mbzuai.ac.ae

Skills and Interests

Computer skills:
  • Operating systems: Microsoft Windows, Linux, Mac OS
  • Programming languages: Python, LaTeX, C, C++
Languages:
  • Russian (native)
  • English (advanced)
Interests:
  • Football: 9 years at a football school in Rybinsk, Russia. I also played for an amateur team.
  • Wakesurfing, Gym, Hiking

Education

Sep 2014 - Jul 2018

BSc in Applied Mathematics, Moscow Institute of Physics and Technology

Department of Control and Applied Mathematics
Thesis: Accelerated Directional Searches and Gradient-Free Methods with non-Euclidean prox-structure
Advisor: Alexander Gasnikov

Sep 2018 - Jul 2020

MSc in Applied Mathematics, Moscow Institute of Physics and Technology

Phystech School of Applied Mathematics and Informatics
Thesis: Derivative-free and stochastic optimization methods, decentralized distributed optimization
Advisor: Alexander Gasnikov

Sep 2020 - Dec 2021

PhD in Computer Science, Moscow Institute of Physics and Technology

Phystech School of Applied Mathematics and Informatics
Thesis: Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps
Advisors: Alexander Gasnikov and Peter Richtárik
Links: slides and video of the defense

Work Experience

Aug 2017 - Oct 2019

Moscow Institute of Physics and Technology

Researcher at Peter Richtárik's group

Feb 2018 - Jun 2019

Moscow Institute of Physics and Technology

Teaching assistant for the following courses: Algorithms and Models of Computation, Probability Theory

May 2019 - Aug 2019

Huawei, Moscow

Intern at the Media Laboratory (research, C++ programming)

Aug 2019 - Jul 2020

Huawei-MIPT group, Moscow

Researcher

Nov 2019 - Apr 2020

Institute for Information Transmission Problems of Russian Academy of Sciences, Moscow

Junior researcher at Department of Mathematical Methods of Predictive Modelling

Feb 2020 - Dec 2020

Moscow Institute of Physics and Technology

Junior researcher at Laboratory of Numerical Methods of Applied Structural Optimization

Feb 2020 - Dec 2020

Russian Presidential Academy of National Economy and Public Administration, Moscow

Junior researcher at Joint Research Laboratory of Applied Mathematics, RANEPA-MIPT

May 2020 - Aug 2022

Moscow Institute of Physics and Technology

Junior researcher at Laboratory of Advanced Combinatorics and Network Applications

May 2020 - Dec 2021

National Research University Higher School of Economics

Research Assistant at International Laboratory of Stochastic Algorithms and High-Dimensional Inference

Sep 2020 - Jan 2022

Yandex.Research-MIPT Lab

Junior researcher at the joint Lab of Yandex.Research and Moscow Institute of Physics and Technology

Feb 2022 - May 2022

Mila

Research consultant at Mila, Montreal
Group of Gauthier Gidel

Sep 2022 - Now

MBZUAI

Postdoctoral Fellow at MBZUAI, Abu Dhabi
Groups of Samuel Horváth and Martin Takáč

News

  • [8 October, 2021]
    New paper out:
    "Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity" - joint work with Nicolas Loizou and Gauthier Gidel.
    Abstract: Extragradient method (EG) Korpelevich [1976] is one of the most popular methods for solving saddle point and variational inequalities problems (VIP). Despite its long history and significant attention in the optimization community, there remain important open questions about convergence of EG. In this paper, we resolve one of such questions and derive the first last-iterate O(1/K) convergence rate for EG for monotone and Lipschitz VIP without any additional assumptions on the operator. The rate is given in terms of reducing the squared norm of the operator. Moreover, we establish several results on the (non-)cocoercivity of the update operators of EG, Optimistic Gradient Method, and Hamiltonian Gradient Method, when the original operator is monotone and Lipschitz.
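
    As a quick illustration of the update rule analyzed in the paper, here is a minimal sketch (my own, not code from the paper) of the extragradient iteration on a toy monotone operator; the stepsize 1/(2L) is a common textbook choice.

      import numpy as np

      # Toy monotone Lipschitz operator: F(z) = A z with A antisymmetric
      # (an unconstrained bilinear saddle-point problem).
      rng = np.random.default_rng(0)
      B = rng.standard_normal((5, 5))
      A = B - B.T
      F = lambda z: A @ z

      L = np.linalg.norm(A, 2)       # Lipschitz constant of F
      gamma = 1.0 / (2.0 * L)        # a standard safe stepsize
      z = rng.standard_normal(5)

      for k in range(1000):
          z_half = z - gamma * F(z)  # extrapolation step
          z = z - gamma * F(z_half)  # update uses the operator at the extrapolated point

      # The paper's last-iterate rate is stated in terms of this quantity.
      print(np.linalg.norm(F(z)) ** 2)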
    ---------------------------------------------------------
  • [7 October, 2021]
    New paper out:
    "EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback" - joint work with lyas Fatkhullin, Igor Sokolov, Zhize Li, and Peter Richtárik.
    Abstract: First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators. However, existing theory of EF relies on very strong assumptions (e.g., bounded gradients), and provides pessimistic convergence rates (e.g., while the best known rate for EF in the smooth nonconvex regime, and when full gradients are compressed, is O(1/T^{2/3}), the rate of gradient descent in the same regime is O(1/T)). Recently, Richtárik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. EF21 removes the aforementioned theoretical deficiencies of EF and at the same time works better in practice. In this work we propose six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum and bidirectional compression. Several of these techniques were never analyzed in conjunction with EF before, and in cases where they were (e.g., bidirectional compression), our rates are vastly superior.
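
    To make the mechanism concrete, here is a simplified single-machine simulation of the basic EF21 update on a toy quadratic (my own sketch with an ad hoc stepsize, not the authors' code); Top-K plays the role of the contractive compressor.

      import numpy as np

      def top_k(v, k):                     # contractive compressor
          out = np.zeros_like(v)
          idx = np.argsort(np.abs(v))[-k:]
          out[idx] = v[idx]
          return out

      rng = np.random.default_rng(0)
      n, d, k, gamma = 10, 50, 5, 1e-2
      A = [rng.standard_normal((20, d)) for _ in range(n)]
      b = [rng.standard_normal(20) for _ in range(n)]
      grad_i = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / 20   # local gradients

      x = np.zeros(d)
      g_local = [grad_i(i, x) for i in range(n)]              # workers' gradient estimators
      g = sum(g_local) / n                                    # server's aggregate

      for t in range(5000):
          x = x - gamma * g                                   # server step
          for i in range(n):
              c = top_k(grad_i(i, x) - g_local[i], k)         # compress the *difference*
              g_local[i] += c                                 # worker state update
              g += c / n                                      # server adds compressed messages

      # Full-gradient norm of the average loss (should decrease toward 0).
      print(np.linalg.norm(sum(grad_i(i, x) for i in range(n)) / n))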
    ---------------------------------------------------------
  • [28 September, 2021]
    Paper accepted to NeurIPS 2021:
    "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices" - joint work with Max Ryabinin, Vsevolod Plokhotnyuk, Gennady Pekhimenko.
    Abstract: Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many real-world applications, such as federated learning and cloud-based distributed training, operate on unreliable devices with unstable network bandwidth. As a result, these applications are restricted to using parameter servers or gossip-based averaging protocols. In this work, we lift that restriction by proposing Moshpit All-Reduce -- an iterative averaging protocol that exponentially converges to the global average. We demonstrate the efficiency of our protocol for distributed optimization with strong theoretical guarantees. The experiments show 1.3x speedup for ResNet-50 training on ImageNet compared to competitive gossip-based strategies and 1.5x speedup when training ALBERT-large from scratch using preemptible compute nodes.
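
    A toy illustration of the group-averaging idea behind Moshpit All-Reduce (my own simplification, not the paper's fault-tolerant protocol): peers arranged in a grid average within rows and then within columns, which already recovers the exact global average on a full grid.

      import numpy as np

      rng = np.random.default_rng(0)
      values = rng.standard_normal((4, 4))   # one scalar parameter per peer, 16 peers in a 4x4 grid

      round1 = np.tile(values.mean(axis=1, keepdims=True), (1, 4))  # average inside each row
      round2 = np.tile(round1.mean(axis=0, keepdims=True), (4, 1))  # then inside each column

      print(np.allclose(round2, values.mean()))  # True: every peer holds the global mean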
    ---------------------------------------------------------
  • [16 September, 2021]
    New scholarship: I received the A. M. Raigorodskii Scholarship for Contribution to the Development of Numerical Optimization Methods. The scholarship has several levels, and I won the main one. Many thanks to A. M. Raigorodskii for the support!
    ---------------------------------------------------------
  • [24 July, 2021]
    ICML 2021 - top 10% of reviewers:
    I am among the top 10% of reviewers. Free registration to the conference was a good bonus :)
    ---------------------------------------------------------
  • [21 June, 2021]
    New paper out: "Secure Distributed Training at Scale" 
    - joint work with Alexander Borzunov, Michael Diskin
    and Max Ryabinin.
    Abstract: Some of the hardest problems in deep learning can be solved with the combined effort of many independent parties, as is the case for volunteer computing and federated learning. These setups rely on high numbers of peers to provide computational resources or train on decentralized datasets. Unfortunately, participants in such systems are not always reliable. Any single participant can jeopardize the entire training run by sending incorrect updates, whether deliberately or by mistake. Training in presence of such peers requires specialized distributed training algorithms with Byzantine tolerance. These algorithms often sacrifice efficiency by introducing redundant communication or passing all updates through a trusted server. As a result, it can be infeasible to apply such algorithms to large-scale distributed deep learning, where models can have billions of parameters. In this work, we propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency. We rigorously analyze this protocol: in particular, we provide theoretical bounds for its resistance against Byzantine and Sybil attacks and show that it has a marginal communication overhead. To demonstrate its practical effectiveness, we conduct large-scale experiments on image classification and language modeling in presence of Byzantine attackers.
    ---------------------------------------------------------
  • [10 June, 2021]
    New paper out: "Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise"
    - joint work with Marina Danilova, Innokentiy Shibaev, Pavel Dvurechensky and Alexander Gasnikov.
    Abstract: Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds with the dependence on the confidence level that is either negative-power or logarithmic but under an additional assumption of sub-Gaussian (light-tailed) noise distribution that may not hold in practice, e.g., in several NLP tasks. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with Hölder-continuous gradients, and for both methods, we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.
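
    Both methods in the paper rely on gradient clipping; below is a minimal sketch (my own, not the accelerated clipped methods from the paper) of the clipping operator inside a plain SGD loop with heavy-tailed (Student-t) noise.

      import numpy as np

      def clip(g, lam):                        # rescale g whenever its norm exceeds lam
          norm = np.linalg.norm(g)
          return g if norm <= lam else (lam / norm) * g

      rng = np.random.default_rng(0)
      x, gamma, lam = np.ones(10), 0.05, 1.0   # toy objective f(x) = 0.5*||x||^2, minimum at 0
      for t in range(5000):
          g = x + rng.standard_t(df=2, size=10)   # true gradient plus infinite-variance noise
          x = x - gamma * clip(g, lam)
      # With a constant stepsize the iterates settle in a small noise-dominated
      # neighborhood of the optimum x* = 0.
      print(np.linalg.norm(x))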
    ---------------------------------------------------------
  • [8 June, 2021]
    Start of the internship at Mila, Montreal!
    I started my internship in the group of Gauthier Gidel. It is my pleasure to work with Gauthier! I will be here till the end of September.
    ---------------------------------------------------------
  • [8 May, 2021]
    Paper accepted to ICML 2021:
     "MARINA: Faster Non-Convex Distributed Learning with Compression" - joint work with 
    Konstantin Burlachenko, Zhize Li, Peter Richtárik.
    Abstract: We develop and analyze MARINA: a new communication efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication compression strategy based on the compression of gradient differences which is reminiscent of but different from the strategy employed in the DIANA method of Mishchenko et al (2019). Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance. To the best of our knowledge, the communication complexity bounds we prove for MARINA are strictly superior to those of all previous first order methods. Further, we develop and analyze two variants of MARINA: VR-MARINA and PP-MARINA. The first method is designed for the case when the local loss functions owned by clients are either of a finite sum or of an expectation form, and the second method allows for partial participation of clients -- a feature important in federated learning. All our methods are superior to previous state-of-the-art methods in terms of the oracle/communication complexity. Finally, we provide convergence analysis of all methods for problems satisfying the Polyak-Lojasiewicz condition.
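
    A compact sketch of the MARINA update on a toy quadratic (my own simplification with hand-picked parameters, not the paper's code): with small probability p the workers send exact gradients, otherwise they send Rand-K-compressed gradient differences that the server adds to its running estimator.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d, k, p, gamma = 10, 50, 5, 0.1, 0.02
      A = [rng.standard_normal((20, d)) for _ in range(n)]
      b = [rng.standard_normal(20) for _ in range(n)]
      grad_i = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / 20

      def rand_k(v):                               # unbiased sparsifier: E[rand_k(v)] = v
          idx = rng.choice(d, size=k, replace=False)
          out = np.zeros_like(v)
          out[idx] = v[idx] * d / k
          return out

      x = np.zeros(d)
      g = sum(grad_i(i, x) for i in range(n)) / n  # start from the exact gradient
      for t in range(3000):
          x_new = x - gamma * g
          if rng.random() < p:                     # rare full synchronization
              g = sum(grad_i(i, x_new) for i in range(n)) / n
          else:                                    # cheap step: compressed differences only
              g = g + sum(rand_k(grad_i(i, x_new) - grad_i(i, x)) for i in range(n)) / n
          x = x_new
      print(np.linalg.norm(sum(grad_i(i, x) for i in range(n)) / n))   # full-gradient norm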
    ---------------------------------------------------------
  • [19 March, 2021]
    ICLR 2021: I was recognized as an "outstanding reviewer" and got a free registration to the conference.
    ---------------------------------------------------------
  • [4 March 2021]
    New paper out: "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices" - joint work with Max Ryabinin, Vsevolod Plokhotnyuk, Gennady Pekhimenko. 
    Abstract: Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many real-world applications, such as federated learning and cloud-based distributed training, operate on unreliable devices with unstable network bandwidth. As a result, these applications are restricted to using parameter servers or gossip-based averaging protocols. In this work, we lift that restriction by proposing Moshpit All-Reduce -- an iterative averaging protocol that exponentially converges to the global average. We demonstrate the efficiency of our protocol for distributed optimization with strong theoretical guarantees. The experiments show 1.3x speedup for ResNet-50 training on ImageNet compared to competitive gossip-based strategies and 1.5x speedup when training ALBERT-large from scratch using preemptible compute nodes.
    ---------------------------------------------------------
  • [26 February 2021]
    New scholarship:
    I received the A. M. Raigorodskii Scholarship for Contribution to the Development of Numerical Optimization Methods. The scholarship has several levels, and I won the main one. Many thanks to A. M. Raigorodskii for the support!
    ---------------------------------------------------------
  • [15 February 2021]
    New paper out:
    "MARINA: Faster Non-Convex Distributed Learning with Compression" - joint work with
    Konstantin Burlachenko, Zhize Li, Peter Richtárik.
    Abstract: We develop and analyze MARINA: a new communication efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication compression strategy based on the compression of gradient differences which is reminiscent of but different from the strategy employed in the DIANA method of Mishchenko et al (2019). Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance. To the best of our knowledge, the communication complexity bounds we prove for MARINA are strictly superior to those of all previous first order methods. Further, we develop and analyze two variants of MARINA: VR-MARINA and PP-MARINA. The first method is designed for the case when the local loss functions owned by clients are either of a finite sum or of an expectation form, and the second method allows for partial participation of clients -- a feature important in federated learning. All our methods are superior to previous state-of-the-art methods in terms of the oracle/communication complexity. Finally, we provide convergence analysis of all methods for problems satisfying the Polyak-Lojasiewicz condition.
    ---------------------------------------------------------
  • [10 February 2021]
    ICML 2021 Expert Reviewer invitation:  I accepted an invitation to serve as an expert reviewer for ICML 2021.
    ---------------------------------------------------------
  • [23 January 2021]
    Paper accepted to AISTATS 2021:
    "Local SGD: Unified Theory and New Efficient Methods" - joint work with Filip Hanzely and Peter Richtárik.
    Abstract: We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as a special case of our general framework, including Local-SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both random and deterministic number of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions. 
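
    For readers unfamiliar with the baseline covered by this framework, here is a bare-bones Local SGD / FedAvg loop on a toy problem (my own sketch): each worker runs H local gradient steps and the models are then averaged, which costs one communication round.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d, H, gamma = 10, 20, 5, 0.05
      A = [rng.standard_normal((30, d)) for _ in range(n)]
      b = [rng.standard_normal(30) for _ in range(n)]
      grad_i = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / 30

      x = np.zeros(d)
      for r in range(200):                   # communication rounds
          models = []
          for i in range(n):
              y = x.copy()
              for h in range(H):             # H local steps without communication
                  y -= gamma * grad_i(i, y)
              models.append(y)
          x = np.mean(models, axis=0)        # averaging = one communication
      # With heterogeneous data, vanilla Local SGD reaches only a neighborhood of
      # the optimum ("client drift"); removing this drift is one of the points of the paper.
      print(np.linalg.norm(sum(grad_i(i, x) for i in range(n)) / n))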
    ---------------------------------------------------------
  • [19 January 2021]
    I gave a talk "Linearly Converging Error Compensated SGD" at NeurIPS New Year AfterParty (Yandex).
    [slides] [video] [paper]
    ---------------------------------------------------------
  • [11 December 2020]
    New paper out: "Recent Theoretical Advances in Non-Convex Optimization"
    - joint work with Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Sergey Guminov, Dmitry Kamzolov and Innokentiy Shibaev.
    Abstract: Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems could not be solved efficiently in a reasonable time. Then we give a list of problems that can be solved efficiently to find the global minimizer by exploiting the structure of the problem as much as it is possible. Another way to deal with non-convexity is to relax the goal from finding the global minimum to finding a stationary point or a local minimum. For this setting, we first present known results for the convergence rates of deterministic first-order methods, which are then followed by a general theoretical analysis of optimal stochastic and randomized gradient schemes, and an overview of the stochastic first-order methods. After that, we discuss quite general classes of non-convex problems, such as minimization of α-weakly-quasi-convex functions and functions that satisfy Polyak-Lojasiewicz condition, which still allow obtaining theoretical convergence guarantees of first-order methods. Then we consider higher-order and zeroth-order/derivative-free methods and their convergence rates for non-convex optimization problems.
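
    For reference, the Polyak-Lojasiewicz (PL) condition mentioned in this abstract (and in the MARINA entries above) is the standard inequality

      \frac{1}{2}\,\|\nabla f(x)\|_2^2 \;\ge\; \mu \left( f(x) - f^* \right) \quad \text{for all } x,

    for some \mu > 0: despite allowing non-convexity, it is enough for linear convergence of gradient descent on smooth objectives.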
    ---------------------------------------------------------
  • [6-12 December 2020]
    I attended NeurIPS 2020!
    The conference was organized perfectly. In particular, poster sessions in Gather Town were great: this tool made them as vivid as possible. I presented the paper "Linearly Converging Error Compensated SGD" - joint work with Dmitry Kovalev, Dmitry Makarenko and Peter Richtárik. Marina Danilova presented our joint work with Alexander Gasnikov - "Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping".
    [poster] [poster]
    ---------------------------------------------------------
  • [26 November 2020]
    New paper out: "Recent theoretical advances in decentralized distributed convex optimization" 
    - joint work with Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh and Alexander Gasnikov.
    Abstract: In the last few years, the theory of decentralized distributed convex optimization has made significant progress. The lower bounds on communications rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the non-distributed setup. In particular, we provide our recent results that have not been published yet, and that could be found in details only in arXiv preprints.
    ---------------------------------------------------------
  • [3 November, 2020]
    New paper out: "Local SGD: Unified Theory and New Efficient Methods" 
    - joint work with Filip Hanzely and
    Peter Richtárik.
    Abstract: We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as a special case of our general framework, including Local-SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both random and deterministic number of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
    ---------------------------------------------------------
  • [23 October, 2020]
    New paper out (Spotlight @ NeurIPS 2020):
    "Linearly Converging Error Compensated SGD" - joint work with Dmitry Kovalev, Dmitry Makarenko, and Peter Richtárik.
    Abstract: In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and SGD with delayed updates (D-SGD). Via a single theorem, we derive the complexity results for all the methods that fit our framework. For the existing methods, this theorem gives the best-known complexity results. Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization and derive the convergence rates for these methods beating the state-of-the-art results. In order to illustrate the strength of our framework, we develop 16 new methods that fit this. In particular, we propose the first method called EC-SGD-DIANA that is based on error-feedback for biased compression operator and quantization of gradient differences and prove the convergence guarantees showing that EC-SGD-DIANA converges to the exact optimum asymptotically in expectation with constant learning rate for both convex and strongly convex objectives when workers compute full gradients of their loss functions. Moreover, for the case when the loss function of the worker has the form of finite sum, we modified the method and got a new one called EC-LSVRG-DIANA which is the first distributed stochastic method with error feedback and variance reduction that converges to the exact optimum asymptotically in expectation with a constant learning rate.
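
    For context, here is a sketch of the classical error-feedback mechanism that this unified analysis covers as a special case (my own toy implementation, not the paper's EC-SGD-DIANA): each worker communicates a Top-K-compressed version of "error memory + gradient step" and keeps the residual locally.

      import numpy as np

      def top_k(v, k):
          out = np.zeros_like(v)
          idx = np.argsort(np.abs(v))[-k:]
          out[idx] = v[idx]
          return out

      rng = np.random.default_rng(0)
      n, d, k, gamma = 10, 50, 5, 1e-2
      A = [rng.standard_normal((20, d)) for _ in range(n)]
      b = [rng.standard_normal(20) for _ in range(n)]
      grad_i = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / 20

      x = np.zeros(d)
      e = [np.zeros(d) for _ in range(n)]              # local error buffers
      for t in range(5000):
          msgs = []
          for i in range(n):
              v = e[i] + gamma * grad_i(i, x)          # memory plus the new gradient step
              c = top_k(v, k)                          # only this part is communicated
              e[i] = v - c                             # the rest stays in the buffer
              msgs.append(c)
          x = x - np.mean(msgs, axis=0)
      # With a constant stepsize, this plain error-feedback scheme in general reaches
      # only a neighborhood of the optimum in the heterogeneous setting; EC-SGD-DIANA
      # from the paper converges to the exact optimum in this regime.
      print(np.linalg.norm(sum(grad_i(i, x) for i in range(n)) / n))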
    ---------------------------------------------------------
  • [21 October, 2020]
    I am among the top 10% of reviewers at NeurIPS 2020!
    I am happy to contribute to the community through reviewing and extremely happy and proud that my work as a reviewer was highly evaluated. Free registration is a nice bonus :)
    ---------------------------------------------------------
  • [26 September, 2020]
    Two papers accepted to NeurIPS 2020!

    1. "Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping" - joint work with
    Marina Danilova and Alexander Gasnikov - got accepted for a poster presentation.
    [arXiv] [video from MLSS 2020] [slides from MLSS 2020]

    2. "Linearly Converging Error Compensated SGD" - joint work with Dmitry Kovalev, Dmitry Makarenko and
    Peter Richtárik - got accepted for a spotlight presentation. The paper will be available on arXiv soon.
    [video from MLSS 2020] [slides from MLSS 2020]

    I would like to thank my great co-authors!
    ---------------------------------------------------------
  • [22 September, 2020]
    It is my pleasure to join Yandex.Research-MIPT Lab as a junior researcher!
    ---------------------------------------------------------
  • [17 September, 2020]
    Teaching in the Fall 2020 semester.
    This semester I will be one of the lecturers for a new course at MIPT, "Optimization Methods in ML", which we developed together with Alexander Gasnikov, Marina Danilova, Alexander Maslovsky, and Alexander Rogozin.
    ---------------------------------------------------------
  • [1 September, 2020]
    My remote internship at Peter Richtárik's group has started today and will last for 6 months! It is a great pleasure for me to be a part of Peter's group again!
    ---------------------------------------------------------
  • [26-28 August, 2020]
    I am attending AISTATS 2020, which is being held virtually this year for the reasons you all know. I am glad to present our joint work with Filip Hanzely and Peter Richtárik called "A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent".
    [video]
    ---------------------------------------------------------
  • [13 August, 2020]
    Great news: our paper "An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization" with Pavel Dvurechensky and Alexander Gasnikov was accepted to European Journal of Operational Research!
    ---------------------------------------------------------
  • [3 August, 2020]
    I am happy to announce that our paper "Stochastic Three Points Method for Unconstrained Smooth Minimization" with El Houcine Bergou and Peter Richtárik was accepted to SIAM Journal on Optimization!
    ---------------------------------------------------------
  • [2 August, 2020]
    I have arrived at Sirius University (Sochi, Russia). I will give 2 lectures here and mentor a couple of student research projects on optimization. I will be here till the end of August.
    ---------------------------------------------------------
  • [8 July, 2020]
    I gave a talk "On the convergence of SGD-like methods for convex and non-convex optimization problems" on Russian Optimization Seminar.
    [video] [slides
    ---------------------------------------------------------
  • [28 June - 10 July, 2020]
    It was my pleasure to take part in the online Machine Learning Summer School 2020. I met a lot of interesting people, listened to amazing lectures, and took part in several round tables. Finally, I presented our joint work with Dmitry Kovalev, Dmitry Makarenko and Peter Richtárik called "Linearly Converging Error Compensated SGD".
    [video] [slides]

    Marina Danilova also presented our joint work with Alexander Gasnikov called "Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping".
    [video] [slides]
    ---------------------------------------------------------
  • [16 June, 2020]
    I have defended my Master's thesis at MIPT!
    ---------------------------------------------------------
  • [22 May, 2020]
    New paper out: "Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping" - joint work with Marina Danilova and Alexander Gasnikov.
    Abstract: In this paper, we propose a new accelerated stochastic first-order method called clipped-SSTM for smooth convex stochastic optimization with heavy-tailed distributed noise in stochastic gradients and derive the first high-probability complexity bounds for this method closing the gap in the theory of stochastic optimization with heavy-tailed noise. Our method is based on a special variant of accelerated Stochastic Gradient Descent (SGD) and clipping of stochastic gradients. We extend our method to the strongly convex case and prove new complexity bounds that outperform state-of-the-art results in this case. Finally, we extend our proof technique and derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
    ---------------------------------------------------------
  • [29 April, 2020]
    I got accepted to the Machine Learning Summer School, which will be fully virtual this year. There were more than 1300 applications for only 180 available slots, so I am incredibly pleased to be one of those 180 students and am looking forward to taking part in this school.
    ---------------------------------------------------------
  • [27-30 April, 2020]
    I took part in the first fully virtual conference - ICLR 2020!
    I have presented our paper "A Stochastic Derivative Free Optimization Method with Momentum" - joint work with Adel Bibi, Ozan Sener, El Houcine Bergou and Peter Richtárik.
    I enjoyed the online format and met a lot of amazing people there. Great thanks to the organizers!
    [video]
    ---------------------------------------------------------
  • [2 February, 2020]
    I have arrived at KAUST. I will be here till the end of March to work with Peter Richtárik as a visiting student.

    Update:
    Due to the situation with COVID-19, I decided to come back to Moscow earlier (no, there are no confirmed cases at KAUST). I arrived in Moscow on 14 March.
    ---------------------------------------------------------
  • [15 January, 2020]
    I got the Huawei scholarship for MIPT bachelor's and master's students for academic and research achievements!
    The prize is 125,000 Russian rubles.
    ---------------------------------------------------------
  • [07 January, 2020]
    Paper accepted to AISTATS 2020!
    Our paper "A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent" - joint work with Filip Hanzely and Peter Richtárik - got accepted to AISTATS 2020. The conference will take place in Palermo, Sicily, Italy during June 3-5.
    ---------------------------------------------------------
  • [20 December, 2019]
    Paper accepted to ICLR 2020!
    Our paper "A Stochastic Derivative Free Optimization Method with Momentum" - joint work with Adel Bibi, Ozan Sener, El Houcine Bergou and Peter Richtárik - got accepted to ICLR 2020. The conference will take place in Addis Ababa, Ethiopia during April 26-30.

    The paper was presented last week at the NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop, here is our poster.
    ---------------------------------------------------------
  • [26 November, 2019]
    New paper out: "Derivative-Free Method For Decentralized Distributed Non-Smooth Optimization" 
    - joint work with Aleksandr Beznosikov and Alexander Gasnikov.
    Abstract: In this paper, we propose a new derivative-free method which is based on the Sliding Algorithm from Lan (2016, 2019) for the convex composite optimization problem that includes two terms: a smooth one and a non-smooth one. We prove the convergence rate for the new method that matches the corresponding rate for the first-order method up to a factor proportional to the dimension of the space. We apply this method for the decentralized distributed optimization and prove the bounds for the number of communication rounds for this method that matches the lower bounds. We prove the bound for the number of zeroth-order oracle calls per node that matches the similar state-of-the-art bound for the first-order decentralized distributed optimization up to a factor proportional to the dimension of the space.
    ---------------------------------------------------------
  • [20 November, 2019]
    New paper out:
    "Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization" - joint work with Darina Dvinskikh and Alexander Gasnikov.
    Abstract: We consider stochastic convex optimization problems with affine constraints and develop several methods using either primal or dual approach to solve it. In the primal case, we use special penalization technique to make the initial problem more convenient for using optimization methods. We propose algorithms to solve it based on Similar Triangles Method with Inexact Proximal Step for the convex smooth and strongly convex smooth objective functions and methods based on Gradient Sliding algorithm to solve the same problems in the non-smooth case. We prove the convergence guarantees in the smooth convex case with deterministic first-order oracle. 

    We propose and analyze three novel methods to handle stochastic convex optimization problems with affine constraints: SPDSTM, R-RRMA-AC-SA2 and SSTM_sc. All methods use stochastic dual oracle. SPDSTM is the stochastic primal-dual modification of STM and it is applied for the dual problem when the primal functional is strongly convex and Lipschitz continuous on some ball. We extend the result from Dvinskikh & Gasnikov (2019) for this method to the case when only biased stochastic oracle is available. R-RRMA-AC-SA2 is an accelerated stochastic method based on the restarts of RRMA-AC-SA2 from Foster et al. (2019) and SSTM_sc is just stochastic STM for strongly convex problems. Both methods are applied to the dual problem when the primal functional is strongly convex, smooth and Lipschitz continuous on some ball and use stochastic dual first-order oracle. We develop convergence analysis for these methods for the unbiased and biased oracles respectively.  

    Finally, we apply all aforementioned results and approaches to solve the decentralized distributed optimization problem and discuss the optimality of the obtained results in terms of communication rounds and the number of oracle calls per node.

    ---------------------------------------------------------
  • [18 October, 2019]
    Today I gave a talk based on our recent paper with Filip Hanzely and Peter Richtárik at SIERRA, INRIA.
    Peter, thanks for your slides that you prepared for ICCOPT 2019 Summer School.
    [slides]
    ---------------------------------------------------------
  • [06 October - 26 October, 2019]
    I am visiting Adrien Taylor and Francis Bach at SIERRA, INRIA
    ---------------------------------------------------------
  • [02 October, 2019]
    2 papers got accepted to NeurIPS workshops:

    1) "An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization" - joint work with Pavel Dvurechensky and Alexander Gasnikov - got accepted to the workshop "Beyond First Order Methods in ML"

    2) "A Stochastic Derivative Free Optimization Method with Momentum" - joint work with Adel Bibi, Ozan Sener, El Houcine Bergou and Peter Richtárik
     - got accepted to the workshop "Optimization Foundations for Reinforcement Learning"
    --------------------------------------------------------- 
  • [23 June, 2019]
    New book out
    (in Russian): "Lecture Notes on Stochastic Processes" - joint work with Alexander Gasnikov, Sergey Guz, Elena Chernousova, Maksim Shirobokov, Egor Shulgin.
    ---------------------------------------------------------
  • [30 May, 2019]
    New paper out: "A Stochastic Derivative Free Optimization Method with Momentum" - joint work with Adel Bibi, Ozan Sener, El Houcine Bergou and Peter Richtárik
    Abstract: We consider the problem of unconstrained minimization of a smooth objective function in R^d in a setting where only function evaluations are possible. We propose and analyze a stochastic zeroth-order method with heavy ball momentum. In particular, we propose SMTP, a momentum version of the stochastic three-point method (STP). We show new complexity results for non-convex, convex and strongly convex functions. We test our method on a collection of continuous control tasks on several MuJoCo environments with varying difficulty and compare against STP, other state-of-the-art derivative-free optimization algorithms and against policy gradient methods. SMTP significantly outperforms STP and all other methods that we considered in our numerical experiments. Our second contribution is SMTP with importance sampling which we call SMTP_IS. We provide convergence analysis of this method for non-convex, convex and strongly convex objectives.
    ---------------------------------------------------------
  • [27 May, 2019]
    New paper out: "A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent" - joint work with Filip Hanzely and Peter Richtárik.
    Abstract: In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions, convergence analyses, have different applications, and which have been developed separately in various communities. We show that our framework includes methods with and without the following tricks, and their combinations: variance reduction, importance sampling, mini-batch sampling, quantization, and coordinate sub-sampling. As a by-product, we obtain the first unified theory of SGD and randomized coordinate descent (RCD) methods, the first unified theory of variance reduced and non-variance-reduced SGD methods, and the first unified theory of quantized and non-quantized methods. A key to our approach is a parametric assumption on the iterates and stochastic gradients. In a single theorem we establish a linear convergence result under this assumption and strong-quasi convexity of the loss function. Whenever we recover an existing method as a special case, our theorem gives the best known complexity result. Our approach can be used to motivate the development of new useful methods, and offers pre-proved convergence guarantees. To illustrate the strength of our approach, we develop five new variants of SGD, and through numerical experiments demonstrate some of their properties.
    ---------------------------------------------------------
  • [10 April, 2019]
    I got the Ilya Segalovich Award - Yandex scientific scholarship!
    The award includes a scholarship (350,000 Russian rubles in total), an internship offer at Yandex.Research, a travel grant to attend one international conference, and a personal mentor from Yandex.
    [video] [about award] [about winners]
    ---------------------------------------------------------
  • [23 March, 2019]
    New paper out: "On Dual Approach for Distributed Stochastic Convex Optimization over Networks" -
    joint work with Darina Dvinskikh, Alexander Gasnikov, Pavel Dvurechensky and Cesar A. Uribe.
    Abstract: We introduce dual stochastic gradient oracle methods for distributed stochastic convex optimization problems over networks. We estimate the complexity of the proposed method in terms of probability of large deviations. This analysis is based on a new technique that allows to bound the distance between the iteration sequence and the solution point. By the proper choice of batch size, we can guarantee that this distance equals (up to a constant) to the distance between the starting point and the solution.
    ---------------------------------------------------------
  • [10 February, 2019]
    New paper out: "Stochastic Three Points Method for Unconstrained Smooth Minimization" - joint work with         El Houcine Bergou and Peter Richtárik
    Abstract: In this paper we consider the unconstrained minimization problem of a smooth function in R^n in a setting where only function evaluations are possible. We design a novel randomized derivative-free algorithm --- the stochastic three points (STP) method --- and analyze its iteration complexity. At each iteration, STP generates a random search direction according to a certain fixed probability law. Our assumptions on this law are very mild: roughly speaking, all laws which do not concentrate all measure on any halfspace passing through the origin will work. For instance, we allow for the uniform distribution on the sphere and also distributions that concentrate all measure on a positive spanning set.
    Given a current iterate x, STP compares the objective function at three points: x, x+αs and x−αs, where α>0 is a stepsize parameter and s is the random search direction. The best of these three points is the next iterate. We analyze the method STP under several stepsize selection schemes (fixed, decreasing, estimated through finite differences, etc). We study non-convex, convex and strongly convex cases. 
    We also propose a parallel version for STP, with iteration complexity bounds which do not depend on the dimension n. 
    arXiv: 1901.09269 
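
    Since the update is so simple, here is a direct toy implementation of the STP iteration described above (my own sketch, fixed stepsize):

      import numpy as np

      rng = np.random.default_rng(0)
      f = lambda x: 0.5 * np.sum((x - 1.0) ** 2)   # toy smooth objective, minimizer x = 1

      d, alpha = 20, 0.1
      x = np.zeros(d)
      for t in range(20000):
          s = rng.standard_normal(d)
          s /= np.linalg.norm(s)                   # uniform random direction on the sphere
          x = min([x, x + alpha * s, x - alpha * s], key=f)   # keep the best of the three points
      # Decreases toward the minimum value 0, up to a neighborhood set by the fixed stepsize.
      print(f(x))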
    ---------------------------------------------------------
  • [08 February, 2019]
    I am a reviewer for ICML 2019! This is my first time being a reviewer for such a conference.
    ---------------------------------------------------------
  • [26 January, 2019]
    New paper out:  
    "Distributed learning with compressed gradient differences" - joint work with Konstantin Mishchenko, Martin Takáč and Peter Richtárik.
    Abstract: Training very large machine learning models requires a distributed computing approach, with communication of the model updates often being the bottleneck. For this reason, several methods based on the compression (e.g., sparsification and/or quantization) of the updates were recently proposed, including QSGD (Alistarh et al., 2017), TernGrad (Wen et al., 2017), SignSGD (Bernstein et al., 2018), and DQGD (Khirirat et al., 2018). However, none of these methods are able to learn the gradients, which means that they necessarily suffer from several issues, such as the inability to converge to the true optimum in the batch mode, inability to work with a nonsmooth regularizer, and slow convergence rates. In this work we propose a new distributed learning method - DIANA - which resolves these issues via compression of gradient differences. We perform a theoretical analysis in the strongly convex and nonconvex settings and show that our rates are vastly superior to existing rates. Our analysis of block quantization and differences between l2 and l∞ quantization closes the gaps in theory and practice. Finally, by applying our analysis technique to TernGrad, we establish the first convergence rate for this method.
    arXiv: 1901.09269
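
    A compact sketch of the DIANA mechanism on a toy problem (my own, with hand-picked parameters): every worker quantizes the difference between its gradient and a local shift h_i, so the shifts gradually learn the local gradients and the compression error vanishes at the optimum.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d, k, gamma = 10, 50, 5, 0.015
      alpha = k / d                                   # shift stepsize = 1/(omega + 1) for Rand-K
      A = [rng.standard_normal((20, d)) for _ in range(n)]
      b = [rng.standard_normal(20) for _ in range(n)]
      grad_i = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / 20

      def rand_k(v):                                  # unbiased quantizer: E[rand_k(v)] = v
          idx = rng.choice(d, size=k, replace=False)
          out = np.zeros_like(v)
          out[idx] = v[idx] * d / k
          return out

      x = np.zeros(d)
      h = [np.zeros(d) for _ in range(n)]             # local shifts
      for t in range(6000):
          deltas = [rand_k(grad_i(i, x) - h[i]) for i in range(n)]   # compressed differences
          g_hat = np.mean([h[i] + deltas[i] for i in range(n)], axis=0)
          for i in range(n):
              h[i] += alpha * deltas[i]               # shifts track the local gradients
          x = x - gamma * g_hat
      print(np.linalg.norm(sum(grad_i(i, x) for i in range(n)) / n))  # full-gradient norm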
    ---------------------------------------------------------
  • [13 January - 24 February, 2019]
    I am visiting Peter Richtárik at KAUST
    --------------------------------------------------------- 
  • [05 September, 2018]
    The paper "Stochastic spectral and conjugate descent methods" - joint work with Dmitry Kovalev, Elnur Gasanov and Peter Richtárik  - accepted to The Thirty-second Annual Conference on Neural Information Processing Systems (NIPS).
    Abstract: The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves, and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We develop and analyze also inexact variants of these methods where the spectral and conjugate directions are allowed to be approximate only. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.

    The conference will be in Montreal, Canada during December 3-8, 2018.
    ---------------------------------------------------------
  • [01 July - 06 July, 2018]
    I am attending the 23rd International Symposium on Mathematical Programming in Bordeaux, France. My talk is scheduled for Friday, 6 July, in the section "New methods for stochastic optimization and variational inequalities".
    The Talk: ”An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization”
    [slides]
    ---------------------------------------------------------
  • [27 June - 28 June, 2018]
    I am in Grenoble, attending the Grenoble Optimization Days 2018. By the way, I have a new favorite photo!
    Here I read the excellent book by my advisor Alexander Gasnikov.
    ---------------------------------------------------------
  • [25 June, 2018]
    Today I successfully defended my Bachelor's thesis.
    Thesis: "Accelerated Directional Searchs and Gradient-Free Methods with non-Euclidean prox-structure"
    ---------------------------------------------------------
  • [10 June - 15 June, 2018]
    I am at the Traditional Youth School ”Control, Information and Optimization” (Voronovo, Russia), organized by Boris Polyak and Elena Gryazina.

    I presented a poster "An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization". My work was selected for a talk, and I won third prize in the competition for the best talks among participants.
    [slides of the talk]
    ---------------------------------------------------------
  • [14 April, 2018]
    I am attending the workshop ”Optimization at Work” at MIPT (Moscow, Russia) with a talk ”An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization”.
    [slides] [video]
    ---------------------------------------------------------
  • [10 April, 2018]
    New paper out: "On the upper bound for the mathematical expectation of the norm of a vector uniformly distributed on the sphere and the phenomenon of concentration of uniform measure on the sphere" - joint work with Evgeniya Vorontsova and Alexander Gasnikov.
    Abstract: We considered the problem of obtaining upper bounds for the mathematical expectation of the q-norm (2⩽q⩽∞) of the vector which is uniformly distributed on the unit Euclidean sphere.
    ---------------------------------------------------------
  • [8 April, 2018]
    New paper out: "
    An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization" - joint work with Pavel Dvurechensky and Alexander Gasnikov.
    Abstract: We consider smooth stochastic convex optimization problems in the context of algorithms which are based on directional derivatives of the objective function. This context can be considered as an intermediate one between derivative-free optimization and gradient-based optimization. We assume that at any given point and for any given direction, a stochastic approximation for the directional derivative of the objective function at this point and in this direction is available with some additive noise. The noise is assumed to be of an unknown nature, but bounded in the absolute value. We underline that we consider directional derivatives in any direction, as opposed to coordinate descent methods which use only derivatives in coordinate directions. For this setting, we propose a non-accelerated and an accelerated directional derivative method and provide their complexity bounds. Despite that our algorithms do not use gradient information, our non-accelerated algorithm has a complexity bound which is, up to a factor logarithmic in problem dimension, similar to the complexity bound of gradient-based algorithms. Our accelerated algorithm has a complexity bound which coincides with the complexity bound of the accelerated gradient-based algorithm up to a factor of square root of the problem dimension, whereas for existing directional derivative methods this factor is of the order of problem dimension. We also extend these results to strongly convex problems. Finally, we consider derivative-free optimization as a particular case of directional derivative optimization with noise in the directional derivative and obtain complexity bounds for non-accelerated and accelerated derivative-free methods. Complexity bounds for these algorithms inherit the gain in the dimension dependent factors from our directional derivative methods.
    ---------------------------------------------------------
  • [25 February, 2018]
    New paper out: "An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization" - joint work with Pavel Dvurechensky and Alexander Gasnikov.
    Abstract: We consider an unconstrained problem of minimization of a smooth convex function which is only available through noisy observations of its values, the noise consisting of two parts. Similar to stochastic optimization problems, the first part is of a stochastic nature. On the opposite, the second part is an additive noise of an unknown nature, but bounded in the absolute value. In the two-point feedback setting, i.e. when pairs of function values are available, we propose an accelerated derivative-free algorithm together with its complexity analysis. The complexity bound of our derivative-free algorithm is only by a factor of \sqrt{n} larger than the bound for accelerated gradient-based algorithms, where n is the dimension of the decision variable. We also propose a non-accelerated derivative-free algorithm with a complexity bound similar to the stochastic-gradient-based algorithm, that is, our bound does not have any dimension-dependent factor. Interestingly, if the solution of the problem is sparse, for both our algorithms, we obtain better complexity bound if the algorithm uses a 1-norm proximal setup, rather than the Euclidean proximal setup, which is a standard choice for unconstrained problems.
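
    A small sketch of the two-point feedback idea (my own, not the paper's accelerated method): approximate a directional derivative from two function values and use the resulting random-direction estimator in a plain descent loop.

      import numpy as np

      rng = np.random.default_rng(0)
      f = lambda x: 0.5 * np.sum(x ** 2)      # toy smooth convex objective, minimum at 0

      d, tau, gamma = 20, 1e-4, 0.05
      x = np.ones(d)
      for t in range(5000):
          e = rng.standard_normal(d)
          e /= np.linalg.norm(e)              # random direction on the unit sphere
          dd = (f(x + tau * e) - f(x - tau * e)) / (2 * tau)   # two function values per step
          x = x - gamma * d * dd * e          # d * dd * e is a nearly unbiased gradient estimator
      print(f(x))                             # close to the minimum value 0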
    ---------------------------------------------------------
  • [10 February, 2018]
    New paper out:  "Stochastic spectral and conjugate descent methods" - joint work with Dmitry Kovalev, Elnur Gasanov and Peter Richtárik.
    Abstract: The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves, and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We develop and analyze also inexact variants of these methods where the spectral and conjugate directions are allowed to be approximate only. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.
    ---------------------------------------------------------
  • [9 February, 2018]
    I started my final semester as a Bachelor's student today. This spring I will be a tutor for the course "Algorithms and Models of Computation".
    ---------------------------------------------------------
  • [5 February - 7 February, 2018]
    I am attending KAUST Research Workshop on Optimization and Big Data with poster ”Stochastic Spectral Descent Methods” based on our joint work with Dmitry Kovalev, Elnur Gasanov and Peter Richtárik.
    ---------------------------------------------------------
  • [14 January - 08 February, 2018]
    I am visiting Peter Richtárik at KAUST.
    ---------------------------------------------------------
  • [25 November, 2017]
    Today I got the prize for the best talk in the Section of Information Transmission Problems, Data Analysis and Optimization at the 60th Scientific Conference of MIPT.
    The talk: ”About accelerated Directional Search with non-Euclidean prox-structure”
    [slides]
    ---------------------------------------------------------
  • [27 October, 2017] 
    I am attending the workshop ”Optimization at Work” at MIPT (Moscow, Russia) with a talk ”Accelerated Directional Search with non-Euclidean prox-structure”.
    [slides]
    ---------------------------------------------------------
  • [30 September, 2017]
    New paper out: "Accelerated Directional Search with non-Euclidean prox-structure" - joint work with Evgeniya Vorontsova and Alexander Gasnikov.
    Abstract: In the paper we propose an accelerated directional search method with non-Euclidean prox-structure. We consider a convex unconstrained optimization problem in R^n. For simplicity we start from the zero point. We expect in advance that the 1-norm of the solution is close enough to its 2-norm. In this case the standard accelerated Nesterov's directional search method can be improved. In the paper we show how to make Nesterov's method n-times faster (up to a log n-factor) in this case. The basic idea is to use linear coupling, proposed by Allen-Zhu & Orecchia in 2014, and to make the Grad-step in the 2-norm, but the Mirr-step in the 1-norm. We show that for constrained optimization problems this approach is stable upon an obstacle.
    ---------------------------------------------------------
  • [29 July, 2017] 
    I am happy to announce that I started working with Peter Richtárik as a member of his research group at MIPT. 
