Institutional Repository
-
- Supplemental Figures and Data: Small-Scale Process Engineering for Inverse Phase Miniemulsion Polymerization of Hydrogel Nanoparticles for Therapeutic Applications
- This data set supports the process development work described in the associated manuscript, capturing the transition from bench-scale synthesis to small-scale process development of hydrogel nanoparticles. It includes the raw and processed quantitative data for experimental variables tested during this scale-down effort. Additionally, the data set provides the original red-green channel versions of fluorescence microscopy images used in cell studies. In the manuscript, these images were adjusted to be accessible for readers with red-green color vision deficiency; here, the unmodified channel versions are made available to ensure scientific transparency and reproducibility.
-
- Estimation for Dyadic-Dependent Exponential Random Graph Models
- Graphs are the primary mathematical representation for networks, with nodes or vertices corresponding to units (e.g., individuals) and edges corresponding to relationships. Exponential Random Graph Models (ERGMs) are widely used for describing network data because of their simple structure as an exponential function of a sum of parameters multiplied by their corresponding sufficient statistics. As with other exponential family settings, the key computational difficulty is determining the normalizing constant for the likelihood function, a quantity that depends only on the data. In ERGMs for network data, the normalizing constant often makes parameter estimation intractable for large graphs when the model involves dependence among dyads in the graph. One way to deal with this problem is to approximate the likelihood function by something tractable, e.g., by using the method of pseudo-likelihood estimation suggested in the early literature. In this paper, we describe the family of ERGMs and explain the increasing complexity that arises from imposing different edge dependence and homogeneous parameter assumptions. We then compare maximum likelihood (ML) and maximum pseudo-likelihood (MPL) estimation schemes with respect to existence and related degeneracy properties for ERGMs involving dependencies among dyads.
-
- Generic Identification of Binary-Valued Hidden Markov Processes
- The generic identification problem is to decide whether a stochastic process (X_t) is a hidden Markov process and, if so, to infer its parameters for all but a subset of parametrizations that form a lower-dimensional subvariety in parameter space. The partial answers available so far depend on extra assumptions on the processes, which are usually centered around stationarity. Here we present a general solution for binary-valued hidden Markov processes. Our approach is rooted in algebraic statistics and is hence geometric in nature. We find that the algebraic varieties associated with the probability distributions of binary-valued hidden Markov processes are zero sets of determinantal equations, which draws a connection to well-studied objects from algebra. As a consequence, our solution allows for algorithmic implementation based on elementary (linear) algebraic routines.
-
- Binary hidden Markov models and varieties
- This paper closely examines HMMs in which all the hidden random variables are binary. Its main contributions are (1) a birational parametrization for every such HMM, with an explicit inverse for recovering the hidden parameters in terms of observables, (2) a semialgebraic model membership test for every such HMM, and (3) minimal defining equations for the 4-node fully binary model, comprising 21 quadrics and 29 cubics, which were computed using Gröbner bases in the cumulant coordinates of Sturmfels and Zwiernik. The new model parameters in (1) are rationally identifiable in the sense of Sullivant, Garcia-Puente, and Spielvogel, and each model's Zariski closure is therefore a rational projective variety of dimension 5. Gröbner basis computations for the model and its graph are found to be considerably faster using these parameters. In the case of two hidden states, item (2) supersedes a previous algorithm of Schönhuth which is only generically defined, and the defining equations (3) yield new invariants for HMMs of all lengths ≥ 4. Such invariants have been used successfully in model selection problems in phylogenetics, and one can hope for similar applications in the case of HMMs.
-
- Learning Coefficient in Bayesian Estimation of Restricted Boltzmann Machine
- We consider the real log canonical threshold for the learning model in Bayesian estimation. This threshold corresponds to a learning coefficient of generalization error in Bayesian estimation, which serves to measure learning efficiency in hierarchical learning models [30, 31, 33]. In this paper, we clarify the ideal which gives the log canonical threshold of the restricted Boltzmann machine and consider the learning coefficients of this model.
-
- The precision space of interpolatory cubature formulæ
- Methods from Commutative Algebra and Numerical Analysis are combined to address a problem common to many disciplines: the estimation of the expected value of a polynomial of a random vector using a linear combination of a finite number of its values. In this work we remark on the error estimation in cubature formulæ for polynomial functions and introduce the notion of a precision space for a cubature rule.
-
- The degeneration of the Grassmannian into a toric variety and the calculation of the eigenspaces of a torus action
- Using the method of degenerating a Grassmannian into a toric variety, we calculate formulas for the dimensions of the eigenspaces of the action of an n-dimensional torus on a Grassmannian of planes in an n-dimensional space.
-
- Varieties with maximum likelihood degree one
- We show that algebraic varieties with maximum likelihood degree one are exactly the images of reduced A-discriminantal varieties under monomial maps with finite fibers. The maximum likelihood estimator corresponding to such a variety is Kapranov’s Horn uniformization. This extends Kapranov’s characterization of A-discriminantal hypersurfaces to varieties of arbitrary codimension.
-
- Maximum Likelihood for Matrices with Rank Constraints
- Maximum likelihood estimation is a fundamental optimization problem in statistics. We study this problem on manifolds of matrices with bounded rank. These represent mixtures of distributions of two independent discrete random variables. We determine the maximum likelihood degree for a range of determinantal varieties, and we apply numerical algebraic geometry to compute all critical points of their likelihood functions. This led to the discovery of maximum likelihood duality between matrices of complementary ranks, a result proved subsequently by Draisma and Rodriguez.
-
- Uncovering Proximity of Chromosome Territories using Classical Algebraic Statistics
- Exchange type chromosome aberrations (ETCAs) are rearrangements of the genome that occur when chromosomes break and the resulting fragments rejoin with fragments from other chromosomes or from other regions within the same chromosome. ETCAs are commonly observed in cancer cells and in cells exposed to radiation. The frequency of these chromosome rearrangements is correlated with their spatial proximity and can therefore be used to infer the three-dimensional organization of the genome. Extracting statistical significance of spatial proximity from cancer and radiation data has remained somewhat elusive because of the sparsity of the data. Here we propose a new approach to study the three-dimensional organization of the genome using algebraic statistics. We test our method on a published data set of irradiated human blood lymphocyte cells. We provide a rigorous method for testing the overall organization of the genome and, in agreement with previous results, we find a random relative positioning of chromosomes, with the exception of the chromosome pairs {1,22} and {13,14}, which have a significantly larger number of ETCAs than the rest of the chromosome pairs, suggesting their spatial proximity. We conclude that algebraic methods can successfully be used to analyze genetic data and have potential applications to larger and more complex data sets.
-
- A linear-algebraic tool for conditional independence inference
- In this note, we propose a new linear-algebraic method for the implication problem among conditional independence statements, which is inspired by the factorization characterization of conditional independence. First, we give a criterion in the case of a discrete strictly positive density and relate it to an earlier linear-algebraic approach. Then, we extend the method to the case of a discrete density that need not be strictly positive. Finally, we provide a computational result in the case of six variables.
-
- A Family of Quasisymmetry Models
- We present a one-parameter family of models for square contingency tables that interpolates between the classical quasisymmetry model and its Pearsonian analogue. Algebraically, this corresponds to deformations of toric ideals associated with graphs. Our discussion of the statistical issues centers around maximum likelihood estimation.
-
- Markov degree of configurations defined by fibers of a configuration
- We consider a series of configurations defined by fibers of a given base configuration. We prove that the Markov degree of the configurations is bounded from above by the Markov complexity of the base configuration. As important examples of base configurations, we consider incidence matrices of graphs and study the maximum Markov degree of configurations defined by fibers of the incidence matrices. In particular, we give a proof that the Markov degree for two-way transportation polytopes is three.
-
- On Polyhedral Approximations of Polytopes for Learning Bayesian Networks
- The motivation for this paper is the geometric approach to statistical learning Bayesian network (BN) structures. We review three vector encodings of BN structures. The first one has been used by Jaakkola et al. [9] and also by Cussens [4], the other two use special integral vectors formerly introduced, called imsets [18, 20]. The topic is the comparison of outer polyhedral approximations of the corresponding polytopes. We show how to transform the inequalities suggested by Jaakkola et al. [9] into the framework of imsets. The result of our comparison is the observation that the implicit polyhedral approximation of the standard imset polytope suggested in [21] gives a tighter approximation than the (transformed) explicit polyhedral approximation from [9]. As a consequence, we confirm a conjecture from [21] that the above-mentioned implicit polyhedral approximation of the standard imset polytope is an LP relaxation of that polytope. In the end, we review recent attempts to apply the methods of integer programming to learning BN structures and discuss the task of finding suitable explicit LP relaxation in the imset-based approach.
-
- Higher Connectivity of Fiber Graphs of Gröbner Bases
- Fiber graphs of Gröbner bases from contingency tables are important in statistical hypothesis testing, where one studies random walks on these graphs using the Metropolis-Hastings algorithm. The connectivity of the graphs has implications on how fast the algorithm converges. In this paper, we study a class of fiber graphs with elementary combinatorial techniques and provide results that support a recent conjecture of Engström: the connectivity is given by the minimum vertex degree.
-
- Betti Numbers of Cut Ideals of Trees
- Cut ideals, introduced by Sturmfels and Sullivant, are used in phylogenetics and algebraic statistics. We study the minimal free resolutions of cut ideals of tree graphs. By employing basic methods from topological combinatorics, we obtain upper bounds for the Betti numbers of this type of ideals. These take the form of simple formulas on the number of vertices, which arise from the enumeration of induced subgraphs of certain incomparability graphs associated to the edge sets of trees.
-
- Tying Up Loose Strands: Defining Equations of the Strand Symmetric Model
- The strand symmetric model is a phylogenetic model designed to reflect the symmetry inherent in the double-stranded structure of DNA. We show that the set of known phylogenetic invariants for the general strand symmetric model of the three leaf claw tree entirely defines the ideal. This knowledge allows one to determine the vanishing ideal of the general strand symmetric model of any trivalent tree. Our proof of the main result is computational. We use the fact that the Zariski closure of the strand symmetric model is the secant variety of a toric variety to compute the dimension of the variety. We then show that the known equations generate a prime ideal of the correct dimension using elimination theory.
-
- The maximum likelihood degree of Fermat hypersurfaces
- We study the critical points of the likelihood function over the Fermat hypersurface. This problem is related to one of the main problems in statistical optimization: maximum likelihood estimation. The number of critical points over a projective variety is a topological invariant of the variety and is called maximum likelihood degree. We provide closed formulas for the maximum likelihood degree of any Fermat curve in the projective plane and of Fermat hypersurfaces of degree 2 in any projective space. Algorithmic methods to compute the ML degree of a generic Fermat hypersurface are developed throughout the paper. Such algorithms heavily exploit the symmetries of the varieties we are considering. A computational comparison of the different methods and a list of the maximum likelihood degrees of several Fermat hypersurfaces are available in the last section.
-
- On the Connectivity of Fiber Graphs
- We consider the connectivity of fiber graphs with respect to Gröbner basis and Graver basis moves. First, we present a sequence of fiber graphs using moves from a Gröbner basis and prove that their edge-connectivity is lowest possible and can have an arbitrarily large distance from the minimal degree. We then show that graph-theoretic properties of fiber graphs do not depend on the size of the right-hand side. This provides a counterexample to a conjecture of Engström on the node-connectivity of fiber graphs. Our main result shows that the edge-connectivity in all fiber graphs of this counterexample is best possible if we use moves from Graver basis instead.