Search results
(1 - 9 of 9)
- Title
- SEQUENTIAL MONTE CARLO METHODS FOR PARAMETER ESTIMATION, DYNAMIC STATE ESTIMATION AND CONTROL IN POWER SYSTEMS
- Creator
- Maldonado, Daniel Adrian
- Date
- 2017-05
- Description
The estimation, operation and control of electrical power systems have always involved a degree of uncertainty. It is expected that, with the introduction of technologies such as distributed generation and demand-side management, the ability of system operators to forecast the dynamic behavior of the system will deteriorate and, as a result, the cost of keeping the system stable will increase. Sequential Monte Carlo, or particle filtering, is a family of algorithms that perform efficient inference in non-linear dynamic systems by exploiting their structure, without assuming linearity or normality. In this thesis we provide two novel ways of employing these algorithms for inference and control of power systems. First, we motivate the use of Bayesian statistics in load modelling by introducing a novel statistical model that captures the aggregated response of a set of loads. We then use the model to characterize load from measurement data and prior information using the Sequential Monte Carlo algorithm. Second, we introduce Model Predictive Control for power system stabilization. We present the Sequential Monte Carlo algorithm as a way of solving the stochastic Model Predictive Control problem and compare its performance to existing regulators. In addition, Model Predictive Control is applied to load shedding. Finally, we test the performance of the algorithm in a large power system scenario.
Ph.D. in Electrical Engineering, May 2017
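The particle filtering at the heart of this thesis can be illustrated with a minimal bootstrap filter. The sketch below is a generic toy, not the thesis's power-system model: the one-dimensional dynamics, noise levels, and Gaussian measurement model are all assumptions made for illustration.

```python
# Minimal bootstrap particle filter sketch (hypothetical 1-D model).
import numpy as np

def bootstrap_particle_filter(y, n_particles=1000, q_std=0.5, r_std=1.0, seed=0):
    """Estimate the filtering mean of a nonlinear state-space model."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)          # initial particle cloud
    means = []
    for y_t in y:
        # Propagate: a toy nonlinear transition stands in for the dynamics.
        x = 0.9 * x + np.sin(x) + rng.normal(0.0, q_std, n_particles)
        # Weight by the Gaussian measurement likelihood p(y_t | x_t).
        log_w = -0.5 * ((y_t - x) / r_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means.append(np.sum(w * x))                 # posterior-mean estimate
        # Resample (multinomial) to combat weight degeneracy.
        x = rng.choice(x, size=n_particles, p=w)
    return np.array(means)

# Example: filter a short synthetic measurement sequence.
obs = np.array([0.1, 0.4, 1.2, 0.8, 0.3])
print(bootstrap_particle_filter(obs))
```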
- Title
- DEEP LEARNING FOR IMAGE PROCESSING WITH APPLICATIONS TO MEDICAL IMAGING
- Creator
- Zarshenas, Amin
- Date
- 2019
- Description
Deep learning is a subfield of machine learning concerned with algorithms that learn hierarchical data representations. Deep learning has proven extremely successful in many computer vision tasks, including object detection and recognition. In this thesis, we aim to design deep-learning models that better perform image processing, and we tackle three important problems: natural image denoising, computed tomography (CT) dose reduction, and bone suppression in chest radiography (chest x-rays: CXRs).

As the first contribution of this thesis, we aimed to answer some of the most critical design questions for the task of natural image denoising. To this end, we defined a class of deep learning models, called neural network convolution (NNC), and investigated several modules for designing NNC for image processing. Based on our analysis, we designed a deep residual NNC (R-NNC) for this task. One of the important challenges in image denoising concerns scenarios in which the images have varying noise levels. Our analysis showed that training a single R-NNC on images at multiple noise levels results in a network that cannot handle very high noise levels and sometimes blurs the high-frequency information in less noisy areas. To address this problem, we designed and developed two new deep-learning structures, namely noise-specific NNC (NS-NNC) and a DeepFloat model, for the task of image denoising at varying noise levels. Our models achieved the highest denoising performance compared to state-of-the-art techniques.

As the second contribution of the thesis, we aimed to tackle the task of CT dose reduction by means of our NNC. Studies have shown that high CT doses can dramatically increase the risk of radiation-induced cancer in patients; therefore, it is very important to reduce the radiation dose as much as possible. For this problem, we introduced a mixture of anatomy-specific (AS) NNC experts. The basic idea is to train multiple NNC models for anatomic segments with different characteristics and merge the predictions based on the segmentations. Our phantom and clinical analyses showed that more than 90% dose reduction can be achieved with our AS NNC model.

We then exploited our findings from image denoising and CT dose reduction to tackle the challenging task of bone suppression in CXRs. Most lung nodules that are missed by radiologists, as well as by computer-aided detection systems, overlap with bones in CXRs. Our purpose was to develop an imaging system to virtually separate ribs and clavicles from lung nodules and soft tissue in CXRs. To achieve this, we developed a mixture of anatomy-specific, orientation-frequency-specific (ASOFS) expert deep NNC models. While this model was able to decompose the CXRs, we also employed our deep R-NNC to achieve even higher bone suppression performance. Our model was able to create bone and soft-tissue images from single CXRs, without requiring specialized equipment or increasing the radiation dose.
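For readers unfamiliar with residual denoising networks, a minimal sketch follows. It is an illustrative stand-in only: the layer counts, widths, and noise-predicting residual head are assumptions, not the architecture of the thesis's R-NNC.

```python
# Minimal residual denoising network sketch in PyTorch (assumed design).
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    def __init__(self, channels=1, width=64, depth=5):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, noisy):
        # Residual learning: the network predicts the noise, and the clean
        # image is recovered by subtracting it from the input.
        return noisy - self.body(noisy)

model = ResidualDenoiser()
x = torch.randn(1, 1, 64, 64)      # a fake noisy image patch
print(model(x).shape)              # torch.Size([1, 1, 64, 64])
```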
- Title
- LOW DIMENSIONAL SIGNAL SETS FOR RADAR APPLICATIONS
- Creator
- Alphonse Joseph Rajkumar, Sebastian Anand
- Date
- 2018
- Description
In this dissertation we present a view in which radar signals are elements of a high-dimensional signal set, whose dimension equals the number of discrete samples (M) of the signal. Because radar signals must satisfy certain conditions for good performance, most lie in much smaller subsets or subspaces. By developing lower-dimensional signal spaces that approximate the regions where radar signals live, we can realize a potential advantage from the greater parametric simplicity. In this dissertation we apply this low-dimensional signal concept to radar signal processing, focusing on radar signal design and radar signal estimation: signal design falls under radar measures, and signal estimation under radar countermeasures.

In the signal design problem, one searches for the signal element that has small sidelobes and also satisfies certain constraints, such as bandwidth occupancy and autocorrelation (AC) mainlobe width. The sidelobe levels are quantified by the Peak Sidelobe Ratio (PSLR) and the Integrated Sidelobe Ratio (ISLR); we use a linear combination of these two metrics as the cost function to determine the quality of the designed signal. There has been considerable effort in designing parameterized signal sets, including our proposed Asymmetric Time Exponentiated Frequency Modulated (ATEFM) signal and Odd Polynomial Frequency Signal (OPFS). Our contribution is to demonstrate that the best signal elements from these low-dimensional signal sets (LDSS) mostly outperform the best signal elements chosen randomly from the radar signal subset of dimensionality M. Since searching for the best signal element in an LDSS requires fewer computational resources, it is prudent to search the low-dimensional signal sets.

In the signal estimation problem, we try to estimate the signal transmitted by a non-cooperating radar and intercepted by multiple passive sensors. The intercepted signals often have low SNR, and only a few of them may be available for signal estimation. The predominant method for estimating radar signals is Principal Component Analysis (PCA). When the SNR is low (< 0 dB), the PCA method needs a large number of intercepted signals to produce accurate estimates. Our contribution is to demonstrate that, by limiting the search for the best signal estimate to the low-dimensional signal sets, one can obtain more accurate estimates of the unknown transmitted signal at low SNRs with fewer sensors than PCA requires.
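The PSLR/ISLR cost function named in this abstract can be sketched directly. The mainlobe isolation below is deliberately crude (a single sample at zero lag), and the weight `alpha` is an assumption; the dissertation's exact definitions may differ.

```python
# Sketch of the sidelobe metrics PSLR and ISLR from a signal's autocorrelation.
import numpy as np

def sidelobe_metrics(s):
    ac = np.correlate(s, s, mode="full")    # autocorrelation (conjugated lag products)
    p = np.abs(ac) ** 2
    peak = p.argmax()                       # mainlobe at zero lag
    side = np.delete(p, peak)               # crude: everything but the peak sample
    pslr = 10 * np.log10(side.max() / p[peak])
    islr = 10 * np.log10(side.sum() / p[peak])
    return pslr, islr

def cost(s, alpha=0.5):
    pslr, islr = sidelobe_metrics(s)
    return alpha * pslr + (1 - alpha) * islr   # linear combination, as in the abstract

# Example: a linear FM (chirp) signal of M = 128 samples.
t = np.arange(128)
chirp = np.exp(1j * np.pi * 0.8 * t**2 / 128)
print(cost(chirp))
```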
- Title
- Statistical Experimental Design and Modeling for Complex Data
- Creator
- Huang, Xiao
- Date
- 2018
- Description
The ability to handle complex data is essential for new research findings and business success today. With increased complexity, data can be difficult to collect with designed experiments or difficult to analyze with statistical models. Both kinds of difficulty are addressed in this dissertation.

The first part of this dissertation (Chapters 2 and 3) addresses complex data collection by considering two design-of-experiments problems. In Chapter 2, we consider the Bayesian A-optimal design problem under a hierarchical probabilistic model involving both quantitative and qualitative response variables; we derive the objective function and develop an efficient optimization algorithm. In Chapter 3, we consider the A/B-testing problem and propose a novel discrepancy-based approach for designing such an experiment. As the numerical examples show, A/B-testing experiments designed in this way achieve better group balance and parameter estimation.

In the second part of this dissertation (Chapters 4 and 5), we focus on analyzing complex data with Gaussian process (GP) models, which are widely used for analyzing data with highly nonlinear relationships and for emulating complex systems. In Chapter 4, we apply and extend the GP model to analyze in-cylinder pressure data resulting from experiments on a newly developed dual-fuel engine; the resulting model incorporates different data types and achieves good prediction accuracy. In Chapter 5, a generalized functional ANOVA GP model is proposed to tackle the difficulty of high-dimensional feature spaces, and we develop an efficient algorithm for building such a model from the perspective of multiple kernel learning. The proposed approach outperforms traditional MLE-based GP models in both computational efficiency and prediction accuracy.
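As a point of reference for the GP modeling in Chapters 4 and 5, here is a minimal Gaussian process regression sketch. The RBF kernel and its hyperparameters are assumptions for illustration, not the dissertation's fitted models.

```python
# Minimal GP regression: posterior mean and variance on 1-D data.
import numpy as np

def rbf_kernel(a, b, length=1.0, scale=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return scale * np.exp(-0.5 * d2 / length**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)      # K^{-1} y
    mean = Ks @ alpha                        # posterior mean
    var = rbf_kernel(x_test, x_test) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(var)

x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=20)
mu, v = gp_predict(x, y, np.linspace(0, 5, 50))
print(mu[:3], v[:3])
```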
- Title
- WIENER-HOPF FACTORIZATION FOR TIME-INHOMOGENEOUS MARKOV CHAINS AND BAYESIAN ESTIMATIONS FOR DIAGONALIZABLE BILINEAR STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS
- Creator
- Cheng, Ziteng
- Date
- 2021
- Description
This thesis consists of two major parts and contributes to two areas of research in stochastic analysis: (i) Wiener-Hopf factorization (WHf) for Markov chains, and (ii) statistical inference for stochastic partial differential equations (SPDEs).

WHf for Markov chains is a methodology for computing the expectations of certain functionals of an underlying Markov chain. Most results on WHf for Markov chains are set in the framework of time-homogeneous Markov chains. The major contributions of this thesis in this area are:
- We extend the classical theory to the framework of time-inhomogeneous Markov chains.
- In particular, we establish the existence and uniqueness of solutions for a new class of operator Riccati equations.
- We connect the solution of the Riccati equation to expectations of interest related to a time-inhomogeneous Markov chain.

Statistical inference for SPDEs concerns estimating the parameters of an SPDE from available and relevant observations of the underlying phenomenon that the SPDE models. The contributions of this thesis in this area are:
- We conduct statistical inference for a diagonalizable SPDE driven by a multiplicative noise of special structure, using the spectral approach, and show that the corresponding statistical model fits the classical uniform asymptotic normality (UAN) paradigm.
- We prove a Bernstein-von Mises type result that strengthens existing results in the literature.
- We prove the asymptotic consistency, asymptotic normality and asymptotic efficiency of two Bayesian-type estimators.
- Title
- Fast Automatic Bayesian Cubature Using Matching Kernels and Designs
- Creator
- Rathinavel, Jagadeeswaran
- Date
- 2019
- Description
Automatic cubatures approximate multidimensional integrals to user-specified error tolerances. In many real-world integration problems the analytical solution is unavailable or difficult to compute, so one uses numerical algorithms that approximate the value of the integral. For high-dimensional integrals, quasi-Monte Carlo (QMC) methods are very popular. QMC methods are equal-weight quadrature rules whose quadrature points are chosen deterministically, unlike Monte Carlo (MC) methods, where the points are chosen randomly; integration lattice nodes and digital nets are the most popular families of quadrature points. These methods treat the integrand as a deterministic function. An alternative approach, called Bayesian cubature, postulates the integrand to be an instance of a Gaussian stochastic process.

For high-dimensional problems it is difficult to adaptively change the sampling pattern, but one can automatically determine the sample size, $n$, given a fixed and reasonable sampling pattern. We take this approach from a Bayesian perspective. We assume a Gaussian process parameterized by a constant mean and a covariance function defined by a scale parameter and a function specifying how the integrand values at two different points in the domain are related. These parameters are estimated from integrand values or are given non-informative priors. This leads to a credible interval for the integral, and the sample size, $n$, is chosen to make the credible interval for the Bayesian posterior error no greater than the desired error tolerance. However, the process just outlined typically requires vector-matrix operations with a computational cost of $O(n^3)$. Our innovation is to pair low discrepancy nodes with matching kernels, which lowers the computational cost to $O(n \log n)$.

We begin the thesis by introducing the Bayesian approach to calculating the posterior cubature error and by defining our automatic Bayesian cubature. Although much of this material is known, it develops the necessary foundations. The major contributions of this thesis include the following:
1) The fast Bayesian transform is introduced. This generalizes the techniques that speed up Bayesian cubature when the kernel matches the low discrepancy nodes.
2) The fast Bayesian transform approach is demonstrated using two methods: a) rank-1 lattice sequences with shift-invariant kernels, and b) Sobol' sequences with Walsh kernels. Both are implemented as fast automatic Bayesian cubature algorithms in the Guaranteed Automatic Integration Library (GAIL).
3) Additional numerical implementation techniques are developed: a) rewriting the covariance kernel to avoid cancellation error, b) gradient descent for the hyperparameter search, and c) non-integer kernel order selection.

The thesis concludes by applying our fast automatic Bayesian cubature algorithms to three sample integration problems. We show that our algorithms are faster than basic Bayesian cubature and that they provide answers within the error tolerance in most cases. The Bayesian cubatures that we develop are guaranteed for integrands belonging to a cone of functions that reside in the middle of the sample space; the concept of a cone of functions is also explained briefly.
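The automatic stopping rule described above can be sketched as follows. This naive version uses $O(n^3)$ linear algebra with a generic shift-invariant product kernel and i.i.d. points; the thesis's contribution is precisely to replace these with matching lattice or Sobol' designs to reach $O(n \log n)$. The kernel, zero prior mean, scale estimate, and credible-interval multiplier here are all assumptions.

```python
# Naive Bayesian cubature on [0,1]^d: double n until the credible interval
# for the integral is within the tolerance.
import numpy as np

def b2(u):                       # Bernoulli polynomial B2(u) = u^2 - u + 1/6
    return u * u - u + 1.0 / 6.0

def kernel(x, t, gamma=1.0):     # shift-invariant product kernel; each
    frac = (x[:, None, :] - t[None, :, :]) % 1.0   # int_0^1 k(., t) dt = 1
    return np.prod(1.0 + gamma * b2(frac), axis=2)

def bayes_cubature(f, d, tol=1e-3, n0=64, n_max=2**14, seed=0):
    rng = np.random.default_rng(seed)
    n = n0
    while True:
        x = rng.random((n, d))                    # stand-in for lattice nodes
        y = f(x)
        K = kernel(x, x) + 1e-8 * np.eye(n)       # jitter for stability
        kinv_y = np.linalg.solve(K, y)
        kinv_1 = np.linalg.solve(K, np.ones(n))
        mean = kinv_y.sum()                       # k_I = 1 vector, so 1' K^{-1} y
        s2 = y @ kinv_y / n                       # empirical-Bayes scale estimate
        var = s2 * max(1.0 - kinv_1.sum(), 0.0)   # posterior variance, clamped
        if 2.58 * np.sqrt(var) <= tol or n >= n_max:   # ~99% credible interval
            return mean, n
        n *= 2

# Example: integrate 1 + prod_j cos(2 pi x_j) over [0,1]^3 (true value 1).
est, n_used = bayes_cubature(lambda x: 1.0 + np.prod(np.cos(2 * np.pi * x), axis=1), d=3)
print(est, n_used)
```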
- Title
- Latent Price Model for Market Microstructure: Estimation and Simulation
- Creator
- Yin, Yuan
- Date
- 2023
- Description
This thesis focuses on exploring and solving several problems based on partially observed diffusion models. The thesis has two parts.

In the first part, we present a tractable sufficient condition for the consistency of maximum likelihood estimators (MLEs) in partially observed diffusion models, stated in terms of stationary distributions of the associated test processes, under the assumption that the set of unknown parameter values is finite. We illustrate the tractability of this sufficient condition by verifying it in the context of a latent price model of market microstructure. Finally, we describe an algorithm for computing MLEs in partially observed diffusion models and test it on historical data to estimate the parameters of the latent price model.

In the second part, we provide a thorough analysis of the particle filtering algorithm for estimating the conditional distribution in partially observed diffusion models; specifically, we focus on estimating the distribution of unobserved processes using observed data. The algorithm involves several steps and assumptions, which are described in detail. We also examine the convergence of the algorithm and identify sufficient conditions under which it converges. Finally, we derive an explicit upper bound on the convergence rate of the algorithm, which depends on the set of parameters and the choice of time frequency. This bound provides a measure of the algorithm's performance and can be used to optimize its parameters to achieve faster convergence.
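The finite-parameter MLE search in the first part, combined with the particle filter analyzed in the second, suggests a simple sketch: score each candidate parameter by its particle-filter log-likelihood and take the argmax. The toy latent dynamics, noise scales, and candidate set below are assumptions, not the thesis's latent price model.

```python
# MLE over a finite parameter set via particle-filter log-likelihoods.
import numpy as np

def pf_loglik(y, theta, n=500, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)
    ll = 0.0
    for y_t in y:
        x = x + theta + 0.2 * rng.normal(size=n)    # toy latent dynamics
        log_w = -0.5 * (y_t - x) ** 2               # unnormalized Gaussian log-density
        w = np.exp(log_w - log_w.max())
        ll += log_w.max() + np.log(w.mean())        # log-likelihood increment
        x = rng.choice(x, size=n, p=w / w.sum())    # resample
    return ll

y_obs = np.cumsum(0.1 + 0.2 * np.random.default_rng(2).normal(size=50))
candidates = [-0.1, 0.0, 0.1, 0.2]
mle = max(candidates, key=lambda th: pf_loglik(y_obs, th))
print("MLE over the finite set:", mle)
```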
- Title
- Machine Learning On Graphs
- Creator
- He, Jia
- Date
- 2022
- Description
Deep learning has revolutionized many machine learning tasks in recent years, with successful applications ranging from computer vision and natural language processing to speech recognition. The success is due partly to the availability of large amounts of data and fast-growing computing resources (i.e., GPUs and TPUs), and partly to recent advances in deep learning technology. Neural networks, in particular, have been used successfully to process regular data such as images and videos. However, for many applications with graph-structured data, the irregular structure of graphs means that many powerful deep learning operations cannot be readily applied. In recent years, there has been growing interest in extending deep learning to graphs.

We first propose graph convolutional networks (GCNs) for the task of classification or regression on time-varying graph signals, where the signal at each vertex is given as a time series. An important element of GCN design is filter design. We consider filtering signals in either the vertex (spatial) domain or the frequency (spectral) domain, and propose two basic architectures. In the spatial GCN architecture, the GCN uses a graph shift operator as the basic building block to incorporate the underlying graph structure into the convolution layer. The spatial filter directly utilizes the graph connectivity information: it defines the filter as a polynomial in the graph shift operator and obtains convolved features that aggregate the neighborhood information of each node. In the spectral GCN architecture, a frequency filter is used instead. A graph Fourier transform operator or a graph wavelet transform operator first transforms the raw graph signal to the spectral domain; the spectral GCN then uses the coefficients from the graph Fourier or wavelet transform to compute the convolved features. The spectral filter is defined using the graph's spectral parameters. There are additional challenges in processing time-varying graph signals, since the signal value at each vertex changes over time; the GCNs are designed to recognize different spatiotemporal patterns in high-dimensional data defined on a graph. The proposed models have been tested on simulated and real data for graph signal classification and regression. For the classification problem, we consider the power line outage identification problem using simulated data; the experimental results show that the proposed models can successfully classify abnormal signal patterns and identify the outage location. For the regression problem, we use the New York City bike-sharing demand dataset to predict station-level hourly demand, with prediction accuracy superior to other models.

We next study graph neural network (GNN) models, which have been widely used for learning graph-structured data. Because graph learning tasks must be permutation-invariant, a basic element of graph neural networks is the invariant and equivariant linear layer. Previous work by Maron et al. (2019) provided a maximal collection of invariant and equivariant linear layers and a simple deep neural network model, called k-IGN, for graph data defined on k-tuples of nodes. The expressive power of k-IGN is equivalent to the k-Weisfeiler-Lehman (WL) algorithm in graph isomorphism tests. However, the dimensions of the invariant and equivariant layers are the k-th and 2k-th Bell numbers, respectively; such high complexity makes k-IGNs computationally infeasible for k > 3.
We show that a much smaller dimension for the linear layers is sufficient to achieve the same expressive power. We provide two sets of orthogonal bases for the linear layers, each with only 3(2k − 1) − k basis elements. Based on these linear layers, we develop the neural network models GNN-a and GNN-b, and show that for graph data defined on k-tuples, GNN-a and GNN-b achieve the expressive power of the k-WL and (k + 1)-WL algorithms in graph isomorphism tests, respectively. In molecular prediction tasks on benchmark datasets, we demonstrate that low-order neural network models consisting of the proposed linear layers achieve better performance than other neural network models. In particular, order-2 GNN-b and order-3 GNN-a both have 3-WL expressive power, but use a much smaller basis and hence much less computation time than known neural network models.

Finally, we study generative neural network models for graphs. Generative models are often used in semi-supervised or unsupervised learning, and we address two types of generative tasks. In the first task, we generate a component of a large graph, such as predicting whether a link exists between a pair of selected nodes, or predicting the label of a selected node or edge. The encoder embeds the input graph into a latent vector space via vertex embedding, and the decoder uses the vertex embedding to compute the probability of a link or node label. In the second task, we generate an entire graph. The encoder embeds each input graph to a point in the latent space (graph embedding), and the generative model then generates a graph from a sampled point in the latent space. Unlike previous work, we use the proposed equivariant and invariant layers in the inference model for all tasks: the inference model learns vertex/graph embeddings, and the generative model learns the generative distributions. Experiments on benchmark datasets have been performed for a range of tasks, including link prediction, node classification, and molecule generation. The results show that the high expressive power of the inference model directly improves the latent space embedding, and hence the generated samples.
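The spatial filter construction described above (a polynomial in the graph shift operator) admits a very short sketch. The normalized-adjacency shift operator and the coefficients below are assumptions chosen for illustration.

```python
# Spatial graph filter: H(S) x = sum_k theta_k S^k x aggregates k-hop
# neighborhood information of each node.
import numpy as np

def graph_filter(S, x, theta):
    out = np.zeros_like(x, dtype=float)
    Skx = x.astype(float)
    for th in theta:         # theta_0 * x + theta_1 * S x + theta_2 * S^2 x + ...
        out += th * Skx
        Skx = S @ Skx
    return out

# Example: a 4-node path graph with a degree-normalized shift operator.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
d = A.sum(1)
S = A / np.sqrt(d[:, None] * d[None, :])     # symmetric normalization D^{-1/2} A D^{-1/2}
x = np.array([1.0, 0.0, 0.0, 0.0])           # a graph signal
print(graph_filter(S, x, theta=[0.5, 0.3, 0.2]))
```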
- Title
- Algorithms for Discrete Data in Statistics and Operations Research
- Creator
- Schwartz, William K.
- Date
- 2021
- Description
This thesis develops mathematical background for the design of algorithms for discrete-data problems, two in statistics and one in operations research. Chapter 1 gives some background on what chapters 2 to 4 have in common and defines basic terminology that the other chapters use.

Chapter 2 offers a general approach to modeling longitudinal network data, including exponential random graph models (ERGMs), that vary according to certain discrete-time Markov chains (the abstract of chapter 2 borrows heavily from the abstract of Schwartz et al., 2021). It connects conditional and Markovian exponential families, permutation-uniform Markov chains, various (temporal) ERGMs, and statistical considerations such as dyadic independence and exchangeability. Markovian exponential families are explored in depth to prove that they, and only they, have exponential-family finite-sample distributions with the same parameter as that of the transition probabilities. Many new statistical and algebraic properties of permutation-uniform Markov chains are derived. We introduce exponential random ?-multigraph models, motivated by our result on replacing ? observations of a permutation-uniform Markov chain of graphs with a single observation of a corresponding multigraph. Our approach simplifies the analysis of some network and autoregressive models from the literature. Removing models' temporal dependence, but not their interpretability, permitted us to offer closed-form expressions for maximum likelihood estimators that previously had no closed form available.

Chapter 3 designs novel, exact, conditional tests of statistical goodness-of-fit for mixed membership stochastic block models (MMSBMs) of networks, both directed and undirected. The tests employ a χ²-like statistic from which we define p-values for the general null hypothesis that the observed network's distribution is in the MMSBM, as well as for the simple null hypothesis that the distribution is in the MMSBM with specified parameters. For both tests the alternative hypothesis is that the distribution is unconstrained, and both assume we have observed the block assignments. As exact tests that avoid asymptotic arguments, they are suitable for both small and large networks. Further, we provide and analyze a Monte Carlo algorithm to compute the p-value for the simple null hypothesis. In addition to our rigorous results, simulations demonstrate the validity of the test and the convergence of the algorithm. As a conditional test, it requires that the algorithm sample the fiber of a sufficient statistic. In contrast to the Markov chain Monte Carlo samplers common in the literature, our algorithm is an exact simulation, so it is faster, more accurate, and easier to implement. Computing the p-value for the general null hypothesis remains an open problem because it depends on an intractable optimization problem. We discuss the two schools of thought evident in the literature on how to deal with such problems, and we recommend a future research program to bridge the gap between them.

Chapter 4 investigates an auctioneer's revenue-maximization problem in combinatorial auctions, in which bidders express demand for discrete packages of multiple units of multiple, indivisible goods. The auctioneer's NP-complete winner determination problem (WDP) is to fit these packages together within the available supply to maximize the sum of the bids. To shorten the path practitioners traverse from legalese auction rules to computer code, we offer a new WDP formalism that reflects how government auctioneers sell billions of dollars of radio-spectrum licenses in combinatorial auctions today. It models common tie-breaking rules by maximizing a sum of bid vectors lexicographically. After a novel pre-solving technique based on package bids' marginal values, we develop an algorithm for the WDP. In developing the algorithm's branch-and-bound part, adapted to lexicographic maximization, we discover a partial explanation of why the classical WDP has been solved successfully using the linear programming relaxation: it equals the Lagrangian dual. We adapt the relaxation to lexicographic maximization. The algorithm's dynamic-programming part retrieves already-computed partial solutions from a novel data structure suited specifically to our WDP formalism. Finally, we show that the data structure can "warm start" a popular algorithm for solving for opportunity-cost prices.
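The Monte Carlo p-value for the simple null hypothesis can be sketched generically: draw networks from the specified null, recompute the test statistic, and report the (add-one corrected) fraction at least as extreme. The Erdős-Rényi null and triangle-count statistic below are placeholders, not the thesis's MMSBM constructions or its χ²-like statistic.

```python
# Generic Monte Carlo p-value for a simple null hypothesis.
import numpy as np

def mc_p_value(t_obs, sample_null, statistic, n_sims=9999, seed=0):
    rng = np.random.default_rng(seed)
    t_sim = np.array([statistic(sample_null(rng)) for _ in range(n_sims)])
    # add-one correction keeps the p-value valid (never exactly zero)
    return (1 + np.sum(t_sim >= t_obs)) / (n_sims + 1)

# Toy example: Erdos-Renyi null, triangle count as the statistic.
def sample_null(rng, n=30, p=0.1):
    a = rng.random((n, n)) < p
    a = np.triu(a, 1)
    return a + a.T                               # symmetric adjacency matrix

def statistic(a):
    # Number of triangles: trace(A^3) / 6.
    return np.trace(np.linalg.matrix_power(a.astype(float), 3)) / 6

print(mc_p_value(t_obs=4.0, sample_null=sample_null, statistic=statistic))
```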