Abstract:
Inplace rotation of an array involves nothing but moving data around in computer memory. As modern computer architectures involve several layers of caching, it is a surprisingly non-trivial problem. We describe a competitively fast algorithm (as indicated by measurements) and asymptotically count the number of data moves in the best, worst, and average case. It turns out that this task is equivalent to determining the expected sum of remainders encountered in a run of the Euclidean algorithm, which in turn can be estimated using tools from analytic number theory.
I shall motivate, describe and discuss this algorithm in detail an also compare it to several other reasonable choices.
(Joint work with Valentin Blomer)
Motivated by mathematical models for experimental evolution, we consider a large population whose size is kept fixed over the generations and in which every once in a while some randomly chosen individual experiences a beneficial mutation, leading to a slightly increased reproduction rate of its offspring. In the so-called Gerrish-Lenski parameter regime, typically a finite number of such offspring populations (of mesoscopic size) is present together with one resident type (of macroscopic size). These subpopulations perform a contest for becoming resident, a phenomenon addressed as clonal interference. In this way some of the fitness increments caused by the beneficial mutations may get lost, leading to a reduction of the long-time increase of fitness. For the so-called Moran model (a prominent model in mathematical population genetics which we will explain in the talk), it turns out that, in the limit of infinite population size, the rescaled logarithmic sizes of the contending subpopulations constitute a system of interacting piecewise linear trajectories, whose source of randomness is a Poisson point process. We present a corresponding large population limit result, investigate the long-time increase of fitness in the limiting system of "Poissonian interacting trajectories" and relate it to heuristic predictions.
Reaction networks are widely used to model systems in biology, chemistry, ecology, epidemiology, economics, and other fields. Everywhere, one of the main challenges in their analysis is the lack of precise information about the quantities involved, or even the exact mathematical form of the interactions. In this talk, I will present a qualitative framework to detect or rule out certain dynamical behaviors, using only the structure of the network and no numerical parameters.
The framework builds on a multiscale method and reduction techniques that identify unstable subnetworks which may become dominant and trigger changes in the dynamics. This makes it possible to analyze and classify the resulting bifurcations, interpreting these subnetworks as "patterns" or "motifs" associated with specific dynamical behaviors. I will show examples and recurring structures, with a focus on autocatalysis and periodic oscillations.
This colloquium delves into two nonlinear physics problems in
cosmology, each characterised by an infinite hierarchy of cumulants, essential for describing probability distributions. By examining cosmological statistics with inherent symmetries, we leverage key mathematical theorems to simplify the path integral connecting initial and final statistics, akin to steepest-descent methods. This approach, grounded in large deviations theory, enables us to trace the gravitational evolution of the Universe's large-scale matter structure into the nonlinear regime. Consequently, we predict significant non-Gaussian features of the late-time density field from first principles, offering promising avenues to explore fundamental physics with forthcoming galaxy surveys. In analyzing the collisionless dynamics of dark matter, we encounter an infinite hierarchy of coupled differential equations for the cumulants, typically truncated using fluid-like approximations. These approximations fall short in capturing multiple fluid streams due to the exclusion of higher cumulants. We propose revisiting closure schemes based on the correspondence principle, which bridges semiclassical and classical dynamics, to achieve an approximate closure with a finitely generated set of cumulants, rather than a finite number.
In this talk, we consider three questions arising in phylogenetics, based on recent work with Michael Fuchs and François Bienvenu. First, how many species need to be sampled at the present in order for their most recent common ancestor in an evolutionary tree to lie close to its root? Second, what is the distribution of the 'distance’ between two random Cayley trees (a question arising in tumor research)? Third, how likely is it that a binary phylogenetic tree (with an arbitrary number of leaves) can be uniquely reconstructed from just three multi-state characters?
Abstract:
For dynamical random processes or other stochastic models, all measurable quantities are characterized by probability distributions. In the best case, one can calculate analytically these distributions, covering the full range of support, which includes the low-probability tails. Unfortunately, most systems cannot be solved analytically, one has to use numerical simulations. When applying standard algorithms, this means running the system K times on a computer, the smallest empirical rate one can observe is 1/K. In order to access the tails of the distributions, to cover a much large range of the support, i.e., targeting probabilities such as 10^(-100) or much smaller, one can use sophisticated large-deviation algorithms.
A standard approach is based on tilting or biasing the underlying model distribution such that rare events become frequent. For a convenient sampling of such biased distributions, here a Markov chain Monte Carlo method is explained, where the configurations are vectors of originally [0,1] uniformly or Gaussian distributed random numbers. The vectors generated by the Markov chain are used in the simulation to drive the model, while the model output is used to calculate the bias.
An example applications discussed here are properties of random graphs, e.g. the distribution of the size of the largest component for Erdös-Renyi random graphs. Also, fractional Brownian motion is considered, here the distribution of end points for a system with an absorbing boundary as well as the distribution of first-passage areas, where the left tail can be analyzed for probability densities as low as 10^(-190). Also few other models are briefly covered such as convex hulls of random walks, biological sequence alignment, or non-equilibrium work processes.
Abstract:
This talk presents a generic framework for analytic combinatorics, enabling efficient enumeration and sampling of diverse combinatorial structures. Key features include rapid enumeration via Newton iteration, precise Boltzmann sampling with dynamic arbitrary-precision algorithms, and automated workflows for classes defined by context-free and multiple context-free grammars (MCFGs). While broadly applicable, the presentation will focus on applications in (bio-)chemistry, including the enumeration and sampling of monosubstituted hydrocarbons with stereoinformation and of RNA secondary structures with pseudoknots.
(joined work with Casper Asbjørn Eriksen and Markus Nebel)
Abstract:
Understanding the potential for cooperation among bacteria in the gut environment is crucial for advancing microbiota engineering and optimization. This talk will first review well-established mechanisms that sustain cooperation and then contextualize these mechanisms within the gut environment. More specifically, biological data related to bacterial clustering in the cecum of mice, obtained within our research group, will be presented, along with our strategies for leveraging this information. To this end, we will introduce ongoing work on a simplified mathematical model of bacterial growth in the mouse cecum and discuss its implications. The talk will focus on modeling challenges and rely solely on elementary mathematics, making it suitable for a broad audience.
Abstract:
Compared to artificial neural networks (ANNs), the brain learns faster, generalizes better to new situations and consumes much less energy. ANNs are motivated by the functioning of the brain but differ in several crucial aspects. For instance, ANNs are deterministic while biological neural networks (BNNs) are stochastic. Moreover, it is biologically implausible that the learning of the brain is based on gradient descent. In this talk we look at biological neural networks as a statistical method for supervised learning. We relate the local updating rule of the connection parameters in BNNs to a zero-order optimization method and derive some first statistical risk bounds.
 
Abstract:
While methods for meta-analysis, i.e. the weighted summary of the results of several studies, are meanwhile routinely used in the context of intervention studies, there is still a need for newly developed statistical approaches for meta-analysis of diagnostic test accuracy studies. This is due to the complex underlying data structure with an at least bivariate outcome consisting of sensitivity and specificity. Thereby, sensitivity and specificity refer to the conditional probabilities of a positive or negative test result in the population of diseased and non-diseased, respectively.
A particular challenge arises when individual studies report not only a pair of sensitivity and specificity, but complete receiver operating characteristic (ROC) curves, in which sensitivity and specificity are evaluated at several different diagnostic thresholds. One way to appropriately deal with these data is to adapt statistical methods from survival analysis, such as the application of bivariate time-to-event models for interval- censored data.
In this talk, the general framework of the meta-analysis of diagnostic test accuracy studies will be presented with special respect to statistical challenges and practical applications. Furthermore, newly developed, innovative methods for the complex situation of meta-analysis of ROC curves will be introduced and discussed.
Abstract:
An essential feature in modern data science, especially in machine learning as well as high-dimensional statistics, are large sample sizes and large parameter space dimensions. As a consequence, the design of methods for uncertainty quantification is characterized by a tension between numerically feasible and efficient algorithms and approaches which satisfy theoretically justified statistical properties.
In this talk we discuss a Bayesian MCMC-based method with a stochastic Metropolis-Hastings as a potential solution. By calculating acceptance probabilities on batches, a stochastic Metropolis-Hastings step saves computational costs, but reduces the effective sample size. We show that this obstacle can be avoided by a simple correction term. We study statistical properties of the resulting stationary distribution of the chain if the corrected stochastic Metropolis-Hastings approach is applied to sample from a Gibbs posterior distribution in a nonparametric regression setting. Focusing on deep neural network regression, we prove a PAC-Bayes oracle inequality which yields optimal contraction rates and we analyze the diameter and show high coverage probability of the resulting credible sets.
The talk is based on joint work with Sebastian Bieringer, Gregor Kasieczka and Maximilian F. Steffen.
Abstract:
Structural plasticity and other types of network remodeling are prominent and well-documented processes during brain development, maturation, learning, and aging. Stunningly high turnover rates of neuronal connectivity have been observed under baseline conditions, and even more so upon stimulation or perturbation. However, these findings raise questions about the underlying biological mechanisms and biological function of this drastic form of brain plasticity. Since direct experiments are notoriously difficult to perform and analyze, formal models are crucial for predicting and understanding the emergent properties of graphs or networks with highly dynamic structure. To this end, we consider ensembles of directed multi-graphs with constrained indegrees and outdegrees, the Directed Configuration Model, as this reflects well the limited material resources of brain networks. Similar to the law of mass action in chemistry, we outline a kinetic theory of random graph formation and structural self-organization. We study the equilibria of these networks and describe the transient effects of perturbation. We apply our theoretical findings to biological neural networks of the brain, where the vertices are neurons and the directed edges are chemical synapses between neurons. In this setting, stimulation leads to perturbation of the network structure, followed by dynamic reconfiguration and convergence to a new structural equilibrium that is different from the previous one, reminiscent of hysteresis in solid state physics. I will describe some emergent properties of this new model, including Hebb's rule (“neurons that fire together wire together”) and self-repair of networks under appropriate conditions. For certain perturbation strategies, this provides us with models for brain maturation during adolescence and for engram formation in learning and memory. I will also discuss possible links to psychological phenomena like classical conditioning and describe new machine learning strategies based on the self-organization of networks that can be derived from our biologically motivated theory.
Abstract:
I will explain an astonishingly accurate, yet pretty simple approximation to evaluate thermodynamic equilibrium functions in quantum mechanics. The method rests on the Lanczos procedure, thus can be understood with matrix-vector procedures. Although originally designed to approximate extreme eigenvalues it could be shown that the method can easily be extended to address problems at finite temperature. A mathematical as well as a physical interpretation will be offered.
Abstract:
The class of hidden Markov models (HMMs) is a popular tool for modelling time series driven by underlying states. HMMs can be used for example to relate animal movement processes to underlying behavioural modes, to infer the disease state of a patient from biomarkers, or to predict extreme share returns as a function of the underlying nervousness of the financial market. After a general introduction to this versatile class of time series models, I will focus on one particular extension of HMMs, namely model formulations that attempt to capture periodic variation in the state-switching dynamics (e.g. daily variation in animal behaviour, or seasonality in economic markets). Such periodic variation is commonly modelled using trigonometric functions, but can alternatively and more flexibly be incorporated using cyclic splines. I will showcase these modelling approaches in case studies, in which I will also illustrate a positive consequence of such periodic modelling, namely that for periodic HMMs the distributions of the dwell times in the different system states can deviate substantially from a geometric distribution as it would be implied by a homogeneous HMM.
 
Abstract:
In this talk I shall discuss classical risk processes and risk processes with level dependent premium rate. I shall present a purely probabilistic approach to processes with level-dependent rate, which allows one to obtain upper and lower bounds for ruin probabilities.
 
Abstract:
Randomized clinical trials are the preferred study design for answering medical research questions in an evidence-based way. Apart from a research idea, different aspects have to be decided on when planning a clinical trial, e.g. the treatments to compare with, the endpoints and the sample size.
Clinical trials can only be deemed successful if they come along with a valid sample size planning. Sample sizes should be large enough to detect an existing relevant effect with a sufficient probability. At the same time, they are supposed to be as small as possible to keep possible study related risks low and to not postpone a potential market approval unnecessarily.
After a short introduction to clinical trials, we consider the basic concept of sample size planning as well as more advanced statistical methods for updating sample sizes during ongoing trials.
Abstract:
Diffusion models have been one of the fundamental building blocks of generative deep learning techniques such as "text-to-image" models. In essence, diffusion models reflect a novel, neural network guided sampling strategy. They apply for heavily complex probability distributions. They are based on the general idea that only artificial intelligence techniques can lead one the way from random number generation to, for example, random images showing beaches with palm trees. I will discuss the mathematical and computational foundations of these models. I will also point out why such models may have the potential in attempts to "generate random life".
 
Abstract:
In the talk we will merge the emerging field of scRNA sequencing data with the established field of phylogenomics. We will show that a simple transformation of the complex and high-dimensional scRNA data and the subsequent phylogenetic analysis leads to surprisingly clear visualisations and can thus provide the basis for new biological insights. The talk will be rounded off with some illustrative examples.
This is joint work with Christiane Elgert and Julia Naas.