# Master Projects (Masterarbeiten)

If you are interested in one of the following topics, please contact the person listed with the respective topic.

If you don't find a compelling topic in our list, we are always keen on hearing about your ideas in the area of our research interests (see the Research section of our website). You are most welcome to discuss your interests with us.

Also, we have a list of finished diploma theses on our website.

## List of Projects

- Early prediction of sepsis from clinical data
- Approximation theory of deep neural networks
- Distribution-preserving lossy compression
- Phase transitions for matrix separation
- Estimation of fractal dimensions
- Analyzing cost functions for generative adversarial networks
- Learned priors for variational autoencoders
- Theoretical guarantees for stochastic gradient descent

### Early prediction of sepsis from clinical data (SA/DA)

The aim of this project is to develop and evaluate an algorithm for early prediction of sepsis from clinical data, and to submit the algorithm to the PhysioNet/Computing in Cardiology Challenge 2019 [1]. Sepsis is a life-threatening condition that occurs when the body's response to infection causes tissue damage, organ failure, or death. In the U.S., nearly 1.7 million people develop sepsis and 270,000 people die from sepsis each year, costing U.S. hospitals more than any other health condition at $24 billion/year [1]. Please see the challenge website [1] for more information.

This project offers the opportunity to gain experience in learning from noisy, variable-length clinical time series data with missing entries. Image credit [1].

Type of project: 10%-30% theory, 70%-90% programming

Prerequisites: Programming, linear algebra, experience with deep learning software is a plus

Supervisor: Michael Tschannen

Professor:
Helmut Bölcskei

References:

[1]
Early prediction of sepsis from clinical data: The PhysioNet/Computing in Cardiology Challenge 2019
[Link to Document]

### Approximation theory of deep neural networks (SA/DA)

Deep neural networks approximate functions by arranging neurons in layers that are connected by weighted edges. Deep learning-based methods have recently shown stellar performance in various practical applications, such as speech recognition [1], image classification [2], handwritten digit recognition [3], and game intelligence [4]. This success led to a renewed interest in understanding theoretical properties of deep neural networks.

In this project, you will study various theoretical treatises on benefits of depth in neural networks. In [5], [6] it is shown that in the approximation of certain function classes shallow networks need exponentially many more nodes than deep neural networks. Similarly, in [7] it is demonstrated that approximation of a simple radial function by 2-layer neural networks requires exponentially many nodes in the input dimension, whereas the same function can be approximated by a small 3-layer feedforward network. Smooth functions are also known to be approximated more efficiently by deeper neural networks [8]. The purpose of this project is to evaluate these results in the light of a new theory [9], [10] which introduces the concept of Kolmogorov-optimal approximation characterizing connectivity and memory requirements of the approximating network and the complexity of the function class to be approximated.

Type of project: 100% theory

Prerequisites: Strong mathematical background, knowledge on (deep) neural networks

Supervisor: Recep Gül

Professor:
Helmut Bölcskei

References:

[1]
G. Hinton, L. Deng, D. Yu, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," *IEEE Signal Process. Mag.*, vol. 29, no. 6, pp. 82–97, 2012.
[Link to Document]

[2]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in *Advances in Neural Information Processing Systems 25*, Curran Associates, Inc., pp. 1097–1105, 2012.
[Link to Document]

[3]
Y. LeCun, L. D. Jackel, L. Bottou, et al., "Comparison of learning algorithms for handwritten digit recognition," *International Conference on Artificial Neural Networks*, pp. 53–60, 1995.
[Link to Document]

[4]
D. Silver, A. Huang, C. J. Maddison, et al., "Mastering the game of Go with deep neural networks and tree search," *Nature*, vol. 529, no. 7587, pp. 484–489, 2016.
[Link to Document]

[5]
M. Telgarsky, "Benefits of depth in neural networks," *arXiv:1602.04485*, 2016.
[Link to Document]

[6]
M. Telgarsky, "Representation benefits of deep feedforward networks," *arXiv:1509.08101*, 2015.
[Link to Document]

[7]
R. Eldan, O. Shamir, "The power of depth for feedforward neural networks," *arXiv:1512.03965*, 2016.
[Link to Document]

[8]
D. Yarotsky, "Error bounds for approximations with deep ReLU networks," *arXiv:1610.01145*, 2017.
[Link to Document]

[9]
H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen, "Optimal approximation with sparsely connected deep neural networks,"* SIAM Journal on Mathematics of Data Science*, vol. 1, no. 1, pp. 8-45, 2019.
[Link to Document]

[10]
P. Grohs, D. Perekrestenko, D. Elbrächter, and H. Bölcskei, "Deep neural network approximation theory," *IEEE Transactions on Information Theory*, Jan. 2019, submitted.
[Link to Document]

### Distribution-preserving lossy compression (SA/DA)

Recent advances in extreme image compression [1] allow artifact-free image reconstruction even at very low bitrates. Motivated by these results, [2] formalizes the concept of distribution-preserving lossy compression (DPLC), which optimizes the compression rate-distortion tradeoff under the constraint of the reconstructed (decompressed) samples following the empirical distribution of the training data. Specifically, a DPLC system (almost) perfectly reconstructs the training data when enough bits are allocated to the compressed representation. When zero bits are assigned to the compressed representation it learns a (deep) generative model of the data, and for intermediate bitrates DPLC smoothly interpolates between matching the distribution of the training data and perfectly reconstructing the training samples (cf. the figure on the left; the numbers at the top correspond to different rates (in bits per pixel) and each row corresponds to a different decoder realization). The DPLC framework introduced in [2] was so far applied to images only. This project shall explore new applications such as speech compression.

Type of project: 20%-40% theory, 60%-80% programming, depending on the student's preference

Prerequisites: Programming, linear algebra, experience with deep learning software is a plus

Supervisor: Michael Tschannen

Professor:
Helmut Bölcskei

References:

[1]
E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. Van Gool, "Generative adversarial networks for extreme learned image compression," arXiv:1804.02958, 2018.
[Link to Document]

[2] M. Tschannen, E. Agustsson, and M. Lučić, "Deep generative models for distribution-preserving lossy compression," arXiv:1805.11057, 2018. [Link to Document]

### Phase transitions for matrix separation (DA)

A phase transition is a sharp division of the parameter space of a data processing problem into a region of success and a region of failure. This phenomenon was studied for the matrix separation problem [1, 2] and for the matrix completion problem [3] under specific recovery algorithms. Information-theoretic phase transitions do not depend on specific recovery algorithms and reveal what is possible in principle. Recently, information-theoretic phase transitions were obtained for the matrix completion problem [4] and for the signal separation problem [5].

The first goal of this project is to build on the techniques developed in [4, 5] to characterize information-theoretic phase transitions for the matrix separation problem, which in its traditional incarnation separates a low-rank matrix from a sparse matrix. The second goal is to investigate pairs of matrix structures (beyond low-rank and sparse) that allow for separation and to characterize their information-theoretic phase transitions.

Type of project: 80% theory, 20% simulation

Prerequisites: Strong mathematical background, measure theory, probability theory

Supervisor: Erwin Riegler

Professor:
Helmut Bölcskei

References:

[1] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp, “Living on the edge: A geometric theory of phase transitions in convex optimization,” *Information and Inference*, vol. 3, no. 3, pp. 224–294, Jun. 2014.
[Link to Document]

[2] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, “Rank-sparsity incoherence for matrix decomposition,” *SIAM Journal on Optimization*, vol. 21, no. 2, pp. 572–596, Jun. 2011.
[Link to Document]

[3] E. J. Candés and B. Recht, “Exact matrix completion via convex optimization,” *Foundations of Computational Mathematics*, vol. 9, no. 6, pp. 717–772, Dec. 2009.
[Link to Document]

[4]
E. Riegler, D. Stotz, and H. Bölcskei, “Information-theoretic limits of matrix completion,” *Proc. IEEE Int. Symp. on
Inf. Theory (ISIT)*, pp. 106–110, Jun. 2015. [Link to Document]

[5]
D. Stotz, E. Riegler, E. Agustsson, and H. Bölcskei, “Almost lossless analog signal separation and probabilistic uncertainty relations,” *IEEE Trans. Inf. Theory*, vol. 63, no. 9, pp. 5445-5460, Sep. 2017.
[Link to Document]

### Estimation of fractal dimensions (DA)

Fractal dimensions such as the box-counting dimension or the Rényi information dimension of a random vector often appear as a measure of the vector's description complexity. Recently, fractal dimensions were shown to characterize fundamental limits in information-theoretic problems such as analog data compression [2, 3] and communication over interference channels [4 - 6]. The exact computation of fractal dimensions is, in general, hard, as explicit formulae are available only for certain classes of well-behaved probability distributions.

The goal of this project is to develop, based on the results in [1], a statistical theory and corresponding algorithms for the estimation of fractal dimensions. The resulting estimation procedures shall then be used to gain insights into the fractal dimension of sources with structure relevant to information-theoretic problems.

Type of project: 60% theory, 40% simulation

Prerequisites: Strong mathematical background, measure theory, probability theory

Supervisor: Erwin Riegler

Professor:
Helmut Bölcskei

References:

[1] C. D. Cutler, “A review of the theory and estimation of fractal dimension,” in *Dimension estimation and models*, H. Tong, Ed., Nonlinear Time Series and Chaos, vol. 1, pp. 1–107, World Scientific, 1993.

[2] Y. Wu and S. Verdú, “Rényi information dimension: Fundamental limits of almost lossless analog compression,” *IEEE Trans. Inf. Theory*, vol. 56, no. 8, pp. 3721–3748, Aug. 2010. [Link to Document]

[3]
D. Stotz, E. Riegler, E. Agustsson, and H. Bölcskei, “Almost lossless analog signal separation and probabilistic uncertainty relations,” *IEEE Trans. Inf. Theory*, vol. 63, no. 9,\
pp. 5445-5460, Sep. 2017.
[Link to Document]

[4]
Y. Wu, S. Shamai (Shitz), and S. Verdú, “Information dimension and the degrees of freedom of the interference channel,” *IEEE Trans. Inf. Theory*, vol. 61, no. 1, pp. 256–279, Jan. 2015. [Link to Document]

[5]
D. Stotz and H. Bölcskei, “Degrees of freedom in vector interference channels,” *IEEE Transactions on Information Theory*, vol. 62, no. 7, pp. 4172–4197, Jul. 2016.
[Link to Document]

[6]
D. Stotz and H. Bölcskei, “Characterizing degrees of freedom through additive combinatorics,” *IEEE Transactions on Information Theory*, vol. 62, no. 11, pp. 6423–6435, Nov. 2016.
[Link to Document]

### Analyzing cost functions for generative adversarial networks (SA/DA)

Generative adversarial networks (GANs) [1] have led to remarkable results in machine learning, in particular in image generation tasks [2]. The GAN-learning problem is a two-player game between the so-called generator, who learns how to generate samples resembling the training data, and the discriminator, who learns how to discriminate between real and fake data points. Both players aim at minimizing their own cost function until a Nash-equilibrium is achieved.

The goal of this project is to analyze–-mathematically and possibly experimentally–-different cost functions in the context of GAN-learning for image generation tasks.

Type of project: 80%-90% theory, 10-20% simulation

Prerequisites: Analysis, linear algebra, probability theory, Python

Supervisor: Michael Tschannen

Professor:
Helmut Bölcskei

References:

[1] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” *Proc. of Neural Information Processing Systems (NIPS), pp. 2672–2680*, 2014. [Link to Document]

[2] A. Radford, L. Metz, and S. Chintala “Unsupervised representation learning with deep convolutional generative adversarial networks,” *arXiv:1511.06434*, 2015. [Link to Document]

### Learned priors for variational autoencoders (SA/DA)

Variational autoencoders (VAEs) [1] are a popular method for variational inference thanks to their elegant formulation allowing for straightforward implementation and stable training. However, scaling VAEs to high-resolution and complex image data sets was found to be challenging. A promising direction towards learning more powerful VAEs is to use flexible, learned priors, as explored in [2]. This project aims to benchmark existing methods for learning such priors, explore variations thereof, and propose new methods. Image credit [2].

Type of project: 20%-40% theory, 60%-80% programming

Prerequisites: Programming, linear algebra, experience with deep learning software is a plus

Supervisor: Michael Tschannen

Professor:
Helmut Bölcskei

References:

[1]
D. Kingma and M. Welling, "Auto-encoding variational Bayes," International Conference on Learning Representations, 2014.
[Link to Document]

[2] J. Tomczak and M. Welling, "VAE with a VampPrior," International Conference on Artificial Intelligence and Statistics, 2018. [Link to Document]

### Theoretical guarantees for stochastic gradient descent (DA)

Over the last decade deep neural networks have become a staple method of machine learning. A neural network effectively implements a function as an alternating composition of affine maps and nonlinearities. The nonlinearity typically acts componentwise and is fixed, whereas the coefficients of the affine maps, often called the parameters of the neural network, need to be learned by solving the optimization problem of minimizing a loss function. This is usually done through stochastic gradient descent algorithms [1].

This approach to machine learning works incredibly well, as evidenced by groundbreaking applications of deep learning in optical character recognition [2], image classification [3], and speech recognition [4]. However, theoretical results describing conditions under which stochastic gradient descent is guaranteed to solve a given learning task are scarce. The goal of this project is to explore the nature of the stochastic gradient descent algorithm as applied to simple learning scenarios, and, if possible, to derive conditions on the architecture and the parameters of the neural network that yield success guarantees.

Type of project: 60% theory/literature research, 40% own derivations and proofs

Prerequisites: Applied analysis background, basic familiarity with neural networks

Supervisor: Verner Vlačić

Professor:
Helmut Bölcskei

References:

[1]
D. Kingma and J. Ba, "Adam: A method for stochastic optimization," *Proceedings of the 3rd International Conference on Learning Representations*, 2015.
[Link to Document]

[2]
Y. LeCun, L. D. Jackel, L. Bottou, et al., "Comparison of learning algorithms for handwritten digit recognition," *International Conference on Articial Neural Networks*, pp. 53–60, 1995.
[Link to Document]

[3]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in *Advances in Neural Information Processing Systems 25*, Curran Associates, Inc., pp. 1097–1105, 2012.
[Link to Document]

[4]
G. Hinton, L. Deng, D. Yu, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," *IEEE Signal Process. Mag.*, vol. 29, no. 6, pp. 82–97, 2012.
[Link to Document]