Search results
(1 - 3 of 3)
- Title
- Advances in Machine Learning: Theory and Applications in Time Series Prediction
- Creator
- London, Justin J.
- Date
- 2021
- Description
- A new time series modeling framework for forecasting, prediction, and regime switching with recurrent neural networks (RNNs) using machine learning is introduced. In this framework, we replace the perceptron with an econometric modeling unit: this cell/unit is functionally dedicated to processing the prediction component of the econometric model. These supervised learning methods overcome the parameter estimation and convergence problems of traditional econometric autoregression (AR) models, whose MLE and expectation-maximization (EM) estimators are computationally expensive, assume linearity and Gaussian-distributed errors, and suffer from the curse of dimensionality. Because of these estimation problems and the small number of lags that can be estimated, AR models are limited in their ability to capture long memory or long-range dependencies. Plain RNNs, on the other hand, suffer from the vanishing gradient problem, which likewise limits their ability to retain long memory. We introduce a new class of RNN models, the $\alpha$-RNN and the dynamic $\alpha_{t}$-RNN, that avoid these problems by utilizing an exponential smoothing parameter. We also introduce MS-RNNs, MS-LSTMs, and MS-GRUs, novel models that overcome the limitations of MS-AR models while still enabling regime (Markov) switching and detection of structural breaks in the data. These models have long memory, can handle non-linear dynamics, and require neither data stationarity nor distributional assumptions on the errors. They therefore make no assumptions about the data-generating process and better capture temporal dependencies, leading to higher forecasting and prediction accuracy than traditional econometric models and plain RNNs. At the same time, the partial autocorrelation function and econometric tools such as the ADF, Ljung-Box, and AIC test statistics can be used to determine the optimal sequence lag length to feed into these RNN models and to diagnose serial correlation. The new framework can characterize the non-linear partial autocorrelation of a time series and directly capture dynamic effects such as trends and seasonality. The optimal sequence lag order can greatly influence prediction performance on test data. This structure also makes ML models more interpretable, since traditional econometric models are embedded into the RNNs. Embedding econometric models into RNNs allows firms to improve prediction accuracy over traditional econometric or traditional ML models by creating a hybrid of a well-understood econometric model and an ML model: in theory, the econometric model should focus on the portion of the estimation error best managed by a traditional model, while the ML component should focus on the non-linear portion. This combined structure is a step toward explainable AI and lays the groundwork for econometric AI.
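A minimal PyTorch sketch of the exponential-smoothing recurrence this abstract describes, assuming a single learnable smoothing parameter applied to the hidden state; the cell name and layout here are hypothetical reconstructions, not the author's implementation:

```python
import torch
import torch.nn as nn

class AlphaRNNCell(nn.Module):
    """Hypothetical sketch: a plain RNN cell whose hidden state is
    exponentially smoothed by a learnable alpha in (0, 1)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.lin = nn.Linear(input_size + hidden_size, hidden_size)
        # Unconstrained parameter mapped to alpha in (0, 1) by a sigmoid.
        self.alpha_raw = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # Candidate hidden state from the usual recurrent update.
        h_tilde = torch.tanh(self.lin(torch.cat([x, h_prev], dim=-1)))
        alpha = torch.sigmoid(self.alpha_raw)
        # Exponential smoothing: a convex mix of new and old state.
        return alpha * h_tilde + (1.0 - alpha) * h_prev

cell = AlphaRNNCell(input_size=1, hidden_size=8)
h = torch.zeros(1, 8)
for t in range(20):  # unroll over a toy sequence
    h = cell(torch.randn(1, 1), h)
```

Because alpha stays in (0, 1), the old state decays geometrically rather than through repeated matrix multiplication, which is one way such a cell can hold memory longer than a plain RNN.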
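The lag-selection and serial-correlation diagnostics the abstract mentions can be run with statsmodels; below is a small illustration on a synthetic AR(2) series (the data and the 2/sqrt(N) significance cut-off are stand-ins, not taken from the thesis):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, pacf
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
# Toy AR(2) series standing in for real data.
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

adf_stat, adf_pvalue = adfuller(y)[:2]  # unit-root / stationarity test
print(f"ADF p-value: {adf_pvalue:.4f}")

# Partial autocorrelations; lags outside ~2/sqrt(N) suggest the lag order.
p = pacf(y, nlags=20)
threshold = 2 / np.sqrt(len(y))
print("candidate lags:", [k for k in range(1, 21) if abs(p[k]) > threshold])

# Ljung-Box test for leftover serial correlation at a chosen lag.
print(acorr_ljungbox(y, lags=[10], return_df=True))
```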
- Title
- Image Synthesis with Generative Adversarial Networks
- Creator
- Ouyang, Xu
- Date
- 2023
- Description
- Image synthesis refers to the process of generating new images from an existing dataset, with the objective of creating images that closely resemble the target images learned from the source data distribution. The technique has a wide range of applications, including transforming captions into images, deblurring blurred images, and enhancing low-resolution images. In recent years, deep learning techniques, particularly Generative Adversarial Networks (GANs), have achieved significant success in this field. A GAN consists of a generator (G) and a discriminator (D) and employs adversarial learning to synthesize images. Researchers have developed various strategies to improve GAN performance, such as controlling the learning rates of the two models and modifying the loss functions. This thesis focuses on image synthesis from captions using GANs and aims to improve the quality of the generated images. The study is divided into four main parts. In the first part, we investigate an LSTM conditional GAN that generates images from captions: word2vec vectors serve as the caption features, an LSTM combines the information in these features, and a conditional GAN generates the images. In the second part, to improve the quality of generated images, we address convergence speed and enhance GAN performance using an adaptive WGAN update strategy based on a comparison of loss-change ratios between G and D. We demonstrate that this update strategy applies to the Wasserstein GAN (WGAN) and to other GANs that utilize WGAN-related loss functions. In the third part, to further enhance the quality of synthesized images, we investigate a transformer-based Uformer GAN for image restoration and propose a two-step refinement strategy: we first train a Uformer model until convergence, then train a Uformer GAN using the restoration results obtained in the first step. In the fourth part, to generate fine-grained images from captions, we delve into the Recurrent Affine Transformation (RAT) GAN for fine-grained text-to-image synthesis; by incorporating an auxiliary classifier in the discriminator and employing a contrastive learning method, we improve the accuracy and fine-grained detail of the synthesized images. Throughout this thesis, we strive to enhance the capabilities of GANs in various image synthesis applications and contribute valuable insights to the field of deep learning and image processing.
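A hypothetical sketch of the first part's LSTM conditional generator: word2vec-style token vectors are summarized by an LSTM and concatenated with noise before decoding to an image. All names and sizes below are illustrative assumptions, not the thesis architecture:

```python
import torch
import torch.nn as nn

class CaptionToImageG(nn.Module):
    """Sketch of an LSTM-conditioned generator: token vectors are
    summarized by an LSTM, concatenated with noise, and decoded."""

    def __init__(self, word_dim: int = 300, hidden: int = 128, z_dim: int = 100):
        super().__init__()
        self.lstm = nn.LSTM(word_dim, hidden, batch_first=True)
        self.decode = nn.Sequential(
            nn.Linear(hidden + z_dim, 256), nn.ReLU(),
            nn.Linear(256, 64 * 64), nn.Tanh(),  # 64x64 grayscale image
        )

    def forward(self, word_vecs: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(word_vecs)      # final hidden state
        cond = torch.cat([h_n[-1], z], dim=-1)  # caption condition + noise
        return self.decode(cond).view(-1, 1, 64, 64)

g = CaptionToImageG()
caption = torch.randn(4, 12, 300)  # batch of 12-token word2vec captions
img = g(caption, torch.randn(4, 100))
```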
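One plausible reading of the second part's adaptive update strategy, sketched as a toy WGAN loop in PyTorch: each step compares relative loss-change ratios for the generator and critic and trains only the network the rule selects. The specific ratio and decision rule here are assumptions, not the thesis's exact algorithm:

```python
import torch
import torch.nn as nn

# Toy WGAN on 2-D data; the adaptive part decides *which* network to
# update at each step from the relative change in its loss.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)

def critic_loss() -> torch.Tensor:
    real = torch.randn(64, 2) * 0.5 + 2.0   # stand-in real batch
    fake = G(torch.randn(64, 8)).detach()
    return D(fake).mean() - D(real).mean()  # WGAN critic loss

def gen_loss() -> torch.Tensor:
    return -D(G(torch.randn(64, 8))).mean()

prev_d, prev_g = critic_loss().item(), gen_loss().item()
for step in range(200):
    cur_d, cur_g = critic_loss().item(), gen_loss().item()
    # Relative loss-change ratios for critic and generator (assumed form).
    r_d = abs(cur_d - prev_d) / (abs(prev_d) + 1e-8)
    r_g = abs(cur_g - prev_g) / (abs(prev_g) + 1e-8)
    if r_d < r_g:
        # Critic loss is moving more slowly: give the critic a step.
        opt_d.zero_grad()
        critic_loss().backward()
        opt_d.step()
        for p in D.parameters():  # standard WGAN weight clipping
            p.data.clamp_(-0.01, 0.01)
    else:
        opt_g.zero_grad()
        gen_loss().backward()
        opt_g.step()
    prev_d, prev_g = cur_d, cur_g
```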
- Title
- Machine Learning On Graphs
- Creator
- He, Jia
- Date
- 2022
- Description
- Deep learning has revolutionized many machine learning tasks in recent years, with successful applications ranging from computer vision and natural language processing to speech recognition. The success is due partly to the availability of large amounts of data and fast-growing computing resources (i.e., GPUs and TPUs), and partly to recent advances in deep learning technology. Neural networks, in particular, have been used successfully to process regular data such as images and videos. For many applications with graph-structured data, however, the irregular structure of graphs means that many powerful deep learning operations cannot be readily applied, and in recent years there has been growing interest in extending deep learning to graphs. We first propose graph convolutional networks (GCNs) for the task of classification or regression on time-varying graph signals, where the signal at each vertex is given as a time series. An important element of GCN design is filter design. We consider filtering signals in either the vertex (spatial) domain or the frequency (spectral) domain, and propose two basic architectures. In the spatial GCN architecture, the GCN uses a graph shift operator as the basic building block to incorporate the underlying graph structure into the convolution layer. The spatial filter directly utilizes the graph connectivity information: it is defined as a polynomial in the graph shift operator, and the convolved features it produces aggregate the neighborhood information of each node. In the spectral GCN architecture, a frequency filter is used instead. A graph Fourier transform operator or a graph wavelet transform operator first transforms the raw graph signal to the spectral domain, and the spectral GCN then uses the coefficients from the graph Fourier or graph wavelet transform to compute the convolved features. The spectral filter is defined using the graph's spectral parameters. Time-varying graph signals pose additional challenges, since the signal value at each vertex changes over time; the GCNs are designed to recognize different spatiotemporal patterns in high-dimensional data defined on a graph. The proposed models have been tested on simulated and real data for graph signal classification and regression. For the classification problem, we consider the power line outage identification problem using simulated data; the experimental results show that the proposed models can successfully classify abnormal signal patterns and identify the outage location. For the regression problem, we use the New York City bike-sharing demand dataset to predict station-level hourly demand, with prediction accuracy superior to other models.
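The spatial filter described above, a polynomial in a graph shift operator S (e.g., an adjacency matrix or Laplacian), can be sketched as follows; this is an illustrative PyTorch version, not the thesis code:

```python
import torch
import torch.nn as nn

class GraphPolyFilter(nn.Module):
    """Degree-K polynomial graph filter: y = sum_k theta_k S^k x.
    Each power of S aggregates a one-hop-larger neighborhood."""

    def __init__(self, K: int):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(K + 1) * 0.1)

    def forward(self, S: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        out, xk = self.theta[0] * x, x
        for k in range(1, len(self.theta)):
            xk = S @ xk                      # shift the signal on the graph
            out = out + self.theta[k] * xk
        return out

# 5-node cycle graph (adjacency as shift operator), scalar signal per node.
S = torch.roll(torch.eye(5), 1, dims=0) + torch.roll(torch.eye(5), -1, dims=0)
y = GraphPolyFilter(K=3)(S, torch.randn(5, 1))
```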
We next study graph neural network (GNN) models, which are widely used for learning on graph-structured data. Because graph learning tasks are permutation-invariant, a basic element of graph neural networks is the invariant and equivariant linear layer. Previous work by Maron et al. (2019) provided a maximal collection of invariant and equivariant linear layers and a simple deep neural network model, called k-IGN, for graph data defined on k-tuples of nodes, and showed that the expressive power of k-IGN is equivalent to the k-Weisfeiler-Lehman (WL) algorithm in graph isomorphism tests. However, the dimensions of the invariant and equivariant layers are the k-th and 2k-th Bell numbers, respectively; such high complexity makes k-IGNs computationally infeasible for k > 3. We show that a much smaller dimension for the linear layers is sufficient to achieve the same expressive power. We provide two sets of orthogonal bases for the linear layers, each with only 3(2k − 1) − k basis elements. Based on these linear layers, we develop the neural network models GNN-a and GNN-b, and show that for graph data defined on k-tuples of nodes, GNN-a and GNN-b achieve the expressive power of the k-WL algorithm and the (k + 1)-WL algorithm, respectively, in graph isomorphism tests. In molecular prediction tasks on benchmark datasets, we demonstrate that low-order neural network models consisting of the proposed linear layers achieve better performance than other neural network models. In particular, order-2 GNN-b and order-3 GNN-a both have 3-WL expressive power but use a much smaller basis, and hence much less computation time, than known neural network models. Finally, we study generative neural network models for graphs. Generative models are often used in semi-supervised or unsupervised learning, and we address two types of generative tasks. In the first task, we generate a component of a large graph, such as predicting whether a link exists between a pair of selected nodes or predicting the label of a selected node or edge. The encoder embeds the input graph into a latent vector space via vertex embedding, and the decoder uses the vertex embeddings to compute the probability of a link or a node label. In the second task, we generate an entire graph. The encoder embeds each input graph to a point in the latent space (graph embedding), and the generative model then generates a graph from a sampled point in the latent space. Different from previous work, we use the proposed equivariant and invariant layers in the inference model for all tasks. The inference model learns the vertex/graph embeddings, and the generative model learns the generative distributions. Experiments have been performed on benchmark datasets for a range of tasks, including link prediction, node classification, and molecule generation. The results show that the high expressive power of the inference model directly improves the latent space embedding, and hence the generated samples.
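For the first generative task, here is a minimal encoder/decoder sketch with an inner-product link decoder, in the spirit of the description above; the one-layer encoder is a stand-in for the proposed invariant/equivariant layers, and all names are illustrative:

```python
import torch
import torch.nn as nn

class LinkPredictor(nn.Module):
    """Sketch: a one-layer message-passing encoder produces vertex
    embeddings Z; an inner-product decoder scores candidate edges."""

    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.enc = nn.Linear(in_dim, emb_dim)

    def forward(self, A: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        Z = torch.relu(self.enc(A @ X))  # neighborhood-aggregated embeddings
        # P(edge i-j) modeled as the sigmoid of the embedding inner product.
        return torch.sigmoid(Z @ Z.t())

A = torch.bernoulli(torch.full((6, 6), 0.3))  # toy adjacency, symmetrized
A = ((A + A.t()) > 0).float()
probs = LinkPredictor(in_dim=6, emb_dim=4)(A, torch.eye(6))
```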