Search results
(1 - 6 of 6)
- Title
- MODELING THE INFORMATION CONTENT OF THE LIMIT ORDER BOOK BY BAGGING
- Creator
- Li, Wenyi
- Date
- 2018
- Description
-
I propose a bagging tree framework to study the information content of the limit order book in the U.S. equity market. By measuring the predictability and profitability of the order book data up to 5 levels, I find that the limit order book is informative. In addition to market orders, limit orders behind the best bid and ask prices also contribute to short-term future price movements. Finally, I design simple strategies to show that this information content can be effectively and consistently translated into economic value. My results may have important implications for both researchers and market practitioners.
- Title
- Removing Confounds in Text Classification for Computational Social Science
- Creator
- Landeiro Dos Reis, Virgile
- Date
- 2018
- Description
-
Nowadays, one can use social media and other online platforms to communicate with friends and family, write a review for a product, ask questions about a topic of interest, or even share details of private life with the rest of the world. The ever-increasing amount of user-generated content has provided researchers with data that can offer insights on human behavior. Because of that, the field of computational social science - at the intersection of machine learning and social sciences - has soared in recent years, especially within the field of public health research. However, working with large amounts of user-generated data creates new issues. In this thesis, we propose solutions for two problems encountered in computational social science and related to confounding bias. First, because of the anonymity provided by online forums, social networks, or other blogging platforms through the common usage of usernames, it is hard to get accurate information about users such as gender, age, or ethnicity. Therefore, although collecting data on a specific topic is made easier, conducting an observational study with this type of data is not simple. Indeed, when one wishes to run a study to measure the effect of a variable on another variable, one needs to control for potential confounding variables. In the case of user-generated data, these potential confounding variables are at best noisily observed or inferred and at worst not observed at all. In this work, we wish to provide a way to use these inferred latent attributes in order to conduct an observational study while reducing the effect of confounding bias as much as possible. We first present a simple matching method in a large-scale observational study.
Then, we propose a method to retrieve relevant and representative documents through adaptive query building in order to build the treatment and control groups of an observational study. Second, we focus on the problem of controlling for confounding variables when the influence of these variables on the target variable of a classification problem changes over time. Although identifying and controlling for confounding variables has been assiduously studied in empirical social science, it is often neglected in text classification. This can be explained by the fact that, if we assume that the impact of confounding variables does not change between the training and the testing data, then prediction accuracy should only be slightly affected. Yet, this assumption often does not hold when working with user-generated text. Because of this, computational science studies are at risk of reaching false conclusions when based on text classifiers that do not control for confounding variables. In this document, we propose to build a classifier that is robust to confounding bias shift, and we show that we can build such a classifier in different situations: when there are one or more observed confounding variables, when there is one noisily predicted confounding variable, or when the confounding variable is unknown but can be detected through topic modeling.
- Title
- Language, Perception, and Causal Inference in Online Communication
- Creator
- Wang, Zhao
- Date
- 2021
- Description
-
With the proliferation of social media platforms, online communication is becoming increasingly popular. The nature of a wide audience and rapid spread of information make these platforms attractive to public entities, organizations, and individuals. Marketers use these platforms to advertise their products and collect customer feedback (e.g., Amazon, Airbnb, Yelp, IMDB). Politicians use these platforms to directly speak with the public and canvass for votes (e.g., Twitter, YouTube, Snapchat). Individuals use these platforms to connect with friends and share daily life (e.g., Twitter, Facebook, Instagram, Weibo). The various platforms allow users to build a public image and increase their reputation quickly and cheaply. However, due to the lack of regulations and low effort of online communication, some users try to manage their public impression using vague and tricky expressions during communication, making it hard for the audience to identify the authenticity of the public messages. Studies across many disciplines have shown that words and language play an important role in effective communication but the nature and extent of this role remain murky. Prior works have investigated wording effects on audience perception, but we still need automatic methods to estimate the causal effect of lexical choice on human perception at large scale. Getting insights into the treatment effect of subtle linguistic signals is crucial for intelligent language understanding and text analysis. The causal estimation of wording effects on perception also provides us with an alternative way to understand the causal relationship between word features and perception labels. Comparing these with the correlational associations between features and labels, which are typically learned by statistical machine learning models, we find inconsistencies between the causal and correlational associations.
These inconsistencies suggest possible spurious correlations in text classification, and it is important to address this issue by applying causal inference knowledge to guide statistical classifiers. In this thesis, our first goal is to investigate wording effects in online communication and study causal inference in text. We start from a deceptive marketing task to quantify entities' word commitment from online public messaging and identify potentially inauthentic entities. We then propose several frameworks to estimate the causal effects of word choice on audience perception by adapting Individual Treatment Effect estimation from the causal inference literature to our problem of Lexical Substitution Effect estimation. The findings from these projects motivate us to explore our second goal of applying causal inference knowledge to improve statistical model robustness. Specifically, we study the causal and correlational associations in text and discover possible spurious correlations in text classifiers. Then, by extending the causal discovery, we propose two frameworks to improve text classifier robustness and fairness either by directly removing bias correlations or by training a robust model with automatically generated counterfactual samples.
- Title
- Advances in Machine Learning: Theory and Applications in Time Series Prediction
- Creator
- London, Justin J.
- Date
- 2021
- Description
-
A new time series modeling framework for forecasting, prediction and regime switching for recurrent neural networks (RNNs) using machine learning is introduced. In this framework, we replace the perceptron with an econometric modeling unit. This cell/unit is functionally dedicated to processing the prediction component from the econometric model. These supervised learning methods overcome the parameter estimation and convergence problems of traditional econometric autoregression (AR) models that use MLE and expectation-maximization (EM) methods, which are computationally expensive, assume linearity and Gaussian-distributed errors, and suffer from the curse of dimensionality. Consequently, due to these estimation problems and the lower number of lags that can be estimated, AR models are limited in their ability to capture long memory or dependencies. On the other hand, plain RNNs suffer from the vanishing gradient problem, which also limits their ability to have long memory. We introduce a new class of RNN models, the $\alpha$-RNN and dynamic $\alpha_{t}$-RNNs, that do not suffer from these problems by utilizing an exponential smoothing parameter. We also introduce MS-RNNs, MS-LSTMs, and MS-GRUs, novel models that overcome the limitations of MS-ARs but enable regime (Markov) switching and detection of structural breaks in the data. These models have long memory, can handle non-linear dynamics, and do not require data stationarity or assume error distributions. Thus, they make no assumptions about the data generating process and have the ability to better capture temporal dependencies, leading to better forecasting and prediction accuracy over traditional econometric models and plain RNNs. Yet, the partial autocorrelation function and econometric tools, such as the ADF, Ljung-Box, and AIC test statistics, can be used to determine optimal sequence lag lengths to input into these RNN models and to diagnose serial correlation.
The new framework has the capacity to characterize the non-linear partial autocorrelation of time series and directly capture dynamic effects such as trends and seasonality. The optimal sequence lag order can greatly influence prediction performance on test data. This structure provides more interpretability to ML models since traditional econometric models are embedded into RNNs. The ability to embed econometric models into RNNs will allow firms to improve prediction accuracy compared to traditional econometric or traditional ML models by creating a hybrid utilizing a well understood traditional econometric model and an ML model. In theory, the traditional econometric model should focus on the portion of the estimation error that is best managed by a traditional model and the ML model should focus on the non-linear portion of the model. This combined structure is a step towards explainable AI and lays the framework for econometric AI.
- Title
- Unsupervised Learning of Visual Odometry Using Direct Motion Modeling
- Creator
- Andrei, Silviu Stefan
- Date
- 2020
- Description
-
Data for supervised learning of ego-motion and depth from video is scarce and expensive to produce. Consequently, recent work has focused on unsupervised learning methods and achieved remarkable results which surpass in some instances the accuracy of supervised methods. Many unsupervised approaches rely on predicted monocular depth and so ignore motion information. Moreover, unsupervised methods which do incorporate motion information do so only indirectly by designing the depth prediction network as an RNN. Hence, none of the existing methods model motion directly. In this work, we show that it is possible to achieve superior pose estimation results by modeling motion explicitly. Our method uses a novel learning-based formulation for depth propagation and refinement which transforms predicted depth maps from the current frame onto the next frame, where they serve as a prior for predicting the next frame's depth map. Experimental results demonstrate that the proposed approach surpasses state-of-the-art techniques for the pose prediction task while being better or on par with other methods for the depth prediction task.
- Title
- SOLID-STATE SMART PLUG DEVICE
- Creator
- Deng, Zhixi
- Date
- 2022
- Description
-
Electrical faults are a leading cause of residential fires, and flexible power cords are particularly susceptible to metal or insulation degradation that may lead to a variety of electrical faults. Smart Plugs are a type of plug-in device controlling electrical loads via wireless communication for the consumer market. However, there is a lack of circuit protection features in existing Smart Plug products. Moreover, there is no previous product or research on Smart Plugs with circuit protection features. This thesis introduces a new Smart Plug 2.0 concept which offers all-in-one protection against over-current, arc, and ground faults in addition to the smart features in Smart Plug products. It aims at preventing fire and shock hazards caused by degraded or damaged power cords and electrical connections in homes and offices. It offers microsecond-scale time resolution to detect and respond to a fault condition, and significantly reduces the electrothermal stress on household electrical wires and loads. A new arc fault detection method is developed using machine learning models based on load current di/dt events. The Smart Plug 2.0 concept has been validated experimentally. A 120V/10A solid-state Smart Plug 2.0 prototype using power MOSFETs is designed and tested. It has experimentally demonstrated the comprehensive protection features against all types of electrical faults.