Search results
(41 - 60 of 124)
Pages
- Title
- NETWORK SELECTION WITH LOAD MANAGEMENT IN HETEROGENEOUS WIRELESS NETWORKS
- Creator
- Ahmed, Syed Qutubuddin
- Date
- 2013, 2013-12
- Description
-
With the growth of data-capable, multi-interface wireless and mobile devices, a lot of research work is being done on handover management and network selection in heterogeneous environments. Many researchers have proposed strategies and policies for selecting the appropriate network. These policies differ from each other for various reasons: the state and load of the networks, the number and types of wireless networks considered, and the preferences of users all contribute to the differences among them. As a result, the applicability and efficiency of those policies depend upon particular situations and circumstances. In this research we propose a new concept that helps utilize these various policies in a manner that gives better results in the long run and across various kinds of situations. We maintain a pool of available policies, and our proposed method selects the policy that is most appropriate for the current state of the user. We model this problem as a Markov Decision Process. The overall goal is that a user should be able to select an appropriate wireless network according to its service requirements and seamlessly hand over to that network regardless of the underlying wireless technology being used. Since different sets of methodologies can exist to deal with this issue, we also propose an alternative mechanism that facilitates connecting a particular user to the most appropriate network in a way that is beneficial to the overall network and its users as a whole. A trusted third-party entity receives handover requests from a set of users along with their preferences, takes into consideration the current network state of available service providers, and assigns each user to an appropriate network resource. We call this mechanism "Network Assisted Network Selection (NANS)"; it combines network-based, service-based, and user-based criteria for network selection, and uses the Generalized Assignment Problem (GAP) to assign network resources to a set of users.
The Mobile Access Gateway (MAG) is a component of Proxy Mobile IPv6 (PMIPv6) which provides network-layer transparent mobility to mobile nodes (MNs). A MAG serves a local geographical area, and mobile nodes in its vicinity may attach to it to obtain mobility services from its controlling PMIPv6 domain. Since the MAG is the point of attachment of mobile nodes, negotiated and guaranteed quality of service (QoS) is affected by service disruptions and overload of the MAG. To avoid and minimize the degradation of quality of service, we propose effective mechanisms to share the load of an affected MAG with the MAG(s) that are operating under normal conditions. We propose to hand over certain mobile nodes to other MAGs depending upon their geographical serving area and current capacity. Furthermore, the location of a mobile node, its quality of service profile, its direction of motion, and its multi-interface capability are major factors in selecting the mobile nodes for handover.
Ph.D. in Computer Engineering, December 2013
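The abstract names a Markov Decision Process over a pool of selection policies but gives no formulation. Purely as an illustration of that idea, the sketch below runs value iteration on a toy MDP in which states are coarse user contexts and actions are candidate network-selection policies; every state, policy name, and number is hypothetical, not taken from the dissertation.

```python
import numpy as np

# Hypothetical toy MDP: states are coarse user contexts, actions are
# candidate network-selection policies from the pool (illustrative only).
states = ["idle", "browsing", "streaming"]
policies = ["signal-strength-first", "load-balancing", "user-preference"]
n_s, n_a = len(states), len(policies)

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a] = next-state distribution
R = rng.uniform(0, 1, size=(n_s, n_a))             # R[s, a] = expected utility (QoS, cost, ...)
gamma = 0.9                                        # discount factor

V = np.zeros(n_s)
for _ in range(500):                               # value iteration to a fixed point
    Q = R + gamma * P @ V                          # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

best = Q.argmax(axis=1)
for s, a in zip(states, best):
    print(f"in state '{s}', apply policy '{policies[a]}'")
```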
- Title
- SCHEDULING FOR THROUGHPUT OPTIMIZATION IN WIMAX NETWORKS
- Creator
- Nusairat, Ashraf
- Date
- 2011-03-21, 2011-05
- Description
-
WiMAX emerged as one of the important Broadband Wireless Access (BWA) networks based on OFDMA technology and is anticipated to be an alternative to wired broadband networks. WiMAX supports different emerging applications with different Quality of Service (QoS) requirements, such as voice over IP (VoIP), video conferencing, voice conferencing, and online gaming. These emerging wireless applications have high throughput demands and pose a challenge to the underlying Radio Access Network (RAN) scheduling algorithms. Efficient allocation of WiMAX shared resources such as subchannels is critical to meeting the high throughput demand. The WiMAX resource allocation algorithms determine which users to schedule, how to allocate subcarriers to them, and how to determine the appropriate power level for each user on each subcarrier. In WiMAX, the DL TDD OFDMA subframe structure is a rectangular area of N subchannels × K time slots. Users are assigned rectangular bursts in the downlink subframe. The burst size varies based on the user's channel quality and the data to be transmitted for the assigned user. In this dissertation we study the problem of assigning users to DL bursts in a WiMAX TDD OFDMA system with the objective of maximizing downlink system throughput for the PUSC subchannelization permutation mode. We show that finding the optimal burst assignment that maximizes throughput is NP-hard. We study this problem following two distinct approaches: (1) Integer Programming approach: we formulate the problem as an IP problem and then relax it to an LP; we propose different methods to resolve conflicts resulting from the LP relaxation, and through extensive simulations we compare the performance of the proposed conflict resolution methods to the optimal solution. (2) Best Channel approach: we propose several efficient and effective methods to assign bursts to users based on channel quality; we prove that our Best Channel burst assignment method achieves a throughput within a constant factor of the optimal, and through extensive simulations with real system parameters we study the performance of the Best Channel burst assignment method. To the best of our knowledge, we are the first to study the problem of DL burst assignment in the DL OFDMA subframe for the PUSC subchannelization permutation mode taking the user's channel quality into consideration in the assignment process.
Ph.D. in Computer Science, May 2011
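The IP-then-LP approach is only named in the abstract. As a hedged sketch of the general pattern (not the dissertation's actual formulation), the toy LP below relaxes a burst-assignment IP: x[u, b] = 1 if user u gets candidate burst b, maximizing rate subject to each user taking at most one burst and each burst serving at most one user; real instances also forbid geometric overlaps in the subframe, which are omitted here. The rates are invented.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance (hypothetical): 3 users, 4 candidate rectangular bursts.
# rate[u, b] = throughput if user u is assigned candidate burst b.
rate = np.array([[5.0, 3.0, 0.0, 2.0],
                 [4.0, 6.0, 1.0, 0.0],
                 [0.0, 2.0, 7.0, 3.0]])
n_users, n_bursts = rate.shape
nvar = n_users * n_bursts                 # x[u, b] flattened row-major

# Each user gets at most one burst; each burst serves at most one user.
A_ub, b_ub = [], []
for u in range(n_users):
    row = np.zeros(nvar); row[u * n_bursts:(u + 1) * n_bursts] = 1
    A_ub.append(row); b_ub.append(1)
for b in range(n_bursts):
    row = np.zeros(nvar); row[b::n_bursts] = 1
    A_ub.append(row); b_ub.append(1)

# LP relaxation: 0 <= x <= 1, maximize total rate (linprog minimizes).
res = linprog(c=-rate.ravel(), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, 1)] * nvar, method="highs")
x = res.x.reshape(n_users, n_bursts)
print(np.round(x, 2))   # fractional entries, if any, are the conflicts to resolve
```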
- Title
- SPECTRUM ALLOCATION ALGORITHMS IN WIRELESS NETWORKS
- Creator
- Xu, Ping
- Date
- 2011-04-26, 2011-05
- Description
-
All wireless devices rely on access to the radio frequency spectrum, which has been chronically regulated by static spectrum allocation policies. With the recent fast growth of spectrum-based services and devices, the remaining spectrum available for future wireless services is being exhausted; this is known as the spectrum scarcity problem. The current fixed spectrum allocation scheme leads to significant spectrum white spaces (spectral, temporal, and geographic), where many allocated spectrum blocks are used only in certain geographical areas and/or in brief periods of time. In this work, we design and analyze various spectrum allocation algorithms for better spectrum utilization and study some fundamental performance bounds for networks with opportunistic spectrum utilization. We first propose spectrum allocation algorithms for the offline model, in which all spectrum requests are known when the allocation decision is made. We then also address the problems in the online model, where the allocation decision must be made when only a few spectrum requests are known. In the online model, we focus on two different cases: the first assumes no statistics of future spectrum requests are known, and the second assumes some statistics are known or can be learned. For all these models, we design efficient spectrum allocation methods and analytically prove most of them are asymptotically optimal. Our extensive simulation results also verify our theoretical conclusions.
Ph.D. in Computer Science, May 2011
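The abstract distinguishes the offline from the online model without giving the allocation rules. Purely as an illustration of the online case with no prior statistics, the sketch below applies a secretary-style threshold rule for leasing a single idle channel: observe an initial fraction of requests to set a threshold, then accept the first later bid above it. Everything concrete here is an assumption, not the dissertation's algorithm.

```python
import random

def online_single_channel(bids, observe_frac=1.0 / 2.718):
    """Accept the first bid exceeding the best seen in an observation phase.

    Secretary-style rule for an online setting with no prior statistics;
    observe_frac ~ 1/e is the classical choice.
    """
    n = len(bids)
    k = max(1, int(n * observe_frac))
    threshold = max(bids[:k])
    for i in range(k, n):
        if bids[i] > threshold:
            return i, bids[i]          # lease the channel to this request
    return n - 1, bids[-1]             # fall back to the last request

random.seed(1)
bids = [random.uniform(0, 10) for _ in range(50)]
idx, value = online_single_channel(bids)
print(f"accepted request {idx} with bid {value:.2f} (offline best: {max(bids):.2f})")
```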
- Title
- AUTOMATED SLICING METHODS FOR LARGE EVENT TRACES
- Creator
- Smith, Raymond D.
- Date
- 2012-05-02, 2012-05
- Description
-
Many long-running computer systems record events as they execute, resulting in a dynamic record of system behavior. In large systems, the event trace may contain thousands of entries, and when faced with a problem for analysis, programmers must sort through many disparate events to find those that are related to the system behavior under study and eliminate those that are not. In this research we investigated automatic reduction of event traces to reduce the volume of events and assist in the analysis of the behavior of large systems. Our approach was to adapt the techniques used in program slicing to compute event trace slices as a means of reduction. Two methods for slicing of event traces were proposed and investigated. The Event Dependence Based (EDB) method uses information available in the event trace to identify dependences between events and to compute an event trace slice that meets a slicing criterion. The Model Dependence Based (MDB) method incorporates the use of an executable state-based system model to achieve further reduction of traces. The method identifies model-based dependences in the trace to compute trace slices. An experimental study was performed on simulated systems, representative of state-based software systems present in industry, to analyze and compare the EDB and MDB slicing methods. Both methods provided significant reduction of event traces, particularly for systems with a low degree of sharing and interaction among resources. However, the MDB method significantly outperformed the EDB method for systems with a high degree of resource sharing.
Ph.D. in Computer Science, May 2012
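Event-trace slicing is described only at a high level. The sketch below shows the generic backward-slice computation that the EDB method's description suggests: given dependence edges between trace events, collect the transitive predecessors of a criterion event. The trace and its edges are invented for illustration.

```python
from collections import defaultdict

def backward_slice(dep_edges, criterion):
    """Return the indices of events the criterion transitively depends on.

    dep_edges: (src, dst) pairs meaning event dst depends on event src.
    Generic backward reachability; a stand-in for EDB's dependence analysis.
    """
    preds = defaultdict(list)
    for src, dst in dep_edges:
        preds[dst].append(src)
    stack, in_slice = [criterion], {criterion}
    while stack:
        e = stack.pop()
        for p in preds[e]:
            if p not in in_slice:
                in_slice.add(p)
                stack.append(p)
    return sorted(in_slice)

# Hypothetical 8-event trace: event 7 (the failure) is the slicing criterion.
edges = [(0, 2), (1, 2), (2, 5), (3, 4), (5, 7), (6, 7)]
print(backward_slice(edges, criterion=7))   # -> [0, 1, 2, 5, 6, 7]
```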
- Title
- ANYTIME ACTIVE LEARNING DISSERTATION
- Creator
- Ramirez Loaiza, Maria E.
- Date
- 2016, 2016-05
- Description
-
Machine learning is a subfield of artificial intelligence which deals with algorithms that can learn from data. These methods provide computers with the ability to learn from past data and make predictions for new data. A few examples of machine learning applications include automated document categorization, spam detection, speech recognition, face detection and recognition, language translation, and self-driving cars. A common scenario for machine learning is supervised learning, where the algorithm analyzes known examples to train a model that can identify a concept. For instance, given example documents that are pre-annotated as personal, work, family, etc., a machine learning algorithm can be trained to automate organizing your documents folder. In order to train a model that makes as few mistakes as possible, the algorithm needs many training examples (e.g., documents and their categories). Obtaining these examples often involves consulting the human user/expert whose time is limited and valuable. Hence, the algorithm needs to utilize the human's time as efficiently as possible by focusing on the most cost-effective and informative examples that would make learning more efficient. Active learning is a technique where the algorithm selects which examples would be most cost-effective and beneficial for consultation with the human. In a typical active learning setting, the algorithm simply chooses the examples that should be asked to the expert. In this thesis, we take this one step further: we observe that we can make even better use of the expert's time by showing not the full example but only the relevant pieces of it, so that the expert can focus on what is relevant and can provide the answer faster. For example, in document classification, the expert does not need to see the full document to categorize it; if the algorithm can show only the relevant snippet to the expert, the expert should be able to categorize the document much faster. However, automatically finding the relevant snippet is not a trivial task; showing an incorrect snippet can either hinder the expert's ability to provide an answer at all (if the snippet is irrelevant) or even cause the expert to provide incorrect information (if the snippet is misleading). For this to work, the algorithm needs to find a snippet to show the expert, estimate how much time the expert will spend on that snippet, and predict if the expert will return an answer at all. Further, the algorithm would estimate the likelihood of the expert returning the correct answer. Similar to anytime algorithms that can find better solutions as they are given more time, we call the proposed set of methods anytime active learning, where the experts are expected to give better answers as they are shown longer snippets. In this thesis, we focus on three aspects of anytime active learning: i) anytime active learning with document truncation, where the algorithm assumes that the first words, sentences, and paragraphs of the document are most informative and it has to decide on the snippet length, i.e., where to truncate the document; ii) given a document, the algorithm optimizes for both snippet location and length; and lastly, iii) the algorithm chooses not only the snippet location and size but also which documents to choose snippets from, so that the snippet length, the correctness of the expert's response, and the informativeness of the document are all optimized in a unified framework.
Ph.D. in Computer Science, May 2016
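As a toy rendering of the truncation variant (i), the sketch below scores candidate truncation lengths by expected value per unit of annotation time: an assumed saturating probability of a correct label that grows with snippet length, divided by an assumed linear reading-time model. The functional forms and every constant are hypothetical, not the thesis's models.

```python
import math

def expected_utility(k_words, p_max=0.95, growth=0.05, read_rate=3.5, overhead=2.0):
    """Expected value per second of showing the first k_words of a document.

    p_correct: assumed saturating probability the expert labels correctly
    after seeing k words; reading time: assumed linear model (words/sec + overhead).
    """
    p_correct = p_max * (1.0 - math.exp(-growth * k_words))
    seconds = k_words / read_rate + overhead
    return p_correct / seconds

# Choose where to truncate the document: the length with the best trade-off.
candidates = [10, 25, 50, 100, 200, 400]
best = max(candidates, key=expected_utility)
print({k: round(expected_utility(k), 3) for k in candidates})
print("truncate at", best, "words")
```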
- Title
- SPECTRUM SHARING OPPORTUNITY FOR LTE AND AIRCRAFT RADAR IN THE 4.2 - 4.4 GHZ BAND
- Creator
- Singh, Rohit
- Date
- 2017, 2017-07
- Description
-
The Federal Communications Commission (FCC) states that America is facing a spectrum crunch and there is no easy way to meet this increasing demand; hence spectrum sensing and sharing have received significant attention in the spectrum community. Spectrum is an increasingly scarce natural resource which needs to be used to the fullest. Using modern techniques, spectrum bands can be reused such that the new users do not interfere with the current users in a band. There are many bands in the RF spectrum which are underutilized and can be reused in the space-time domain. A number of bands have been recognized as candidates for spectrum sharing. In this dissertation, we consider the 4.2–4.4 GHz band, which is dedicated for use by the radar altimeters fixed on aircraft to measure their elevation above the earth's surface. This spectrum is currently underutilized and, with care, can be shared with other technologies. This thesis examines the current use of this spectrum as a function of time and location and presents a methodology for assessing whether harmful interference is experienced by either the incumbent radar usage or a proposed secondary wireless broadband user. However, this band is a potential "safety of life" spectrum which is used by aircraft during landing and takeoff. Improper sharing of this band could cause interference at the radar, which would result in false altitude detection by the radar. Because of its advanced technology, LTE can be a good sharing candidate for this sensitive band. We propose sharing this band with small cells (perhaps inside buildings) in urban and/or suburban areas, where there is a high demand for LTE and the attenuation from the environment is high enough to cause little interference at the radar altimeters. In this thesis, we propose to detect the aircraft (i.e., the altimeter radars) using the Automatic Dependent Surveillance-Broadcast (ADS-B) data which is broadcast by an aircraft. This aircraft detection mechanism helps us take intelligent sharing approaches with LTE in the space-time domain. Since the performance of the radar altimeter is safety-of-life critical, a deep understanding of coexistence between these systems is necessary to evaluate whether sharing is feasible. Given the availability of historical ADS-B data, we perform what we believe is an appropriate analysis of Chicagoland and propose the implementation of a mix of Exclusion and Coordination zones in this area in the space-time domain. The novelty of this work is to develop spectrum sharing opportunities with radars that are highly transient and whose locations are unpredictable due to emergencies, traffic, or weather. This thesis presents a method for evaluating the potential for spectrum sharing between ground-based LTE systems and commercial radar altimeters.
M.S. in Computer Science, July 2017
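As an illustrative (not the thesis's) exclusion-zone calculation, the sketch below takes an aircraft position reported via ADS-B and a small-cell location, computes the great-circle distance with the haversine formula, and mutes the cell inside an assumed exclusion radius. The coordinates and the radius are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical ADS-B report (aircraft over Chicagoland) and an LTE small cell.
aircraft = (41.978, -87.905)     # near O'Hare
small_cell = (41.835, -87.627)   # downtown Chicago
EXCLUSION_RADIUS_KM = 30.0       # assumed, for illustration only

d = haversine_km(*aircraft, *small_cell)
action = "mute small cell" if d < EXCLUSION_RADIUS_KM else "transmit allowed"
print(f"distance {d:.1f} km -> {action}")
```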
- Title
- CHARACTERIZATION AND MODELING OF A COMMERCIAL NATIONWIDE WI-FI HOTSPOT NETWORK
- Creator
- Divgi, Gautam
- Date
- 2014, 2014-12
- Description
-
We present a thorough analysis of a commercial nationwide Wi-Fi hotspot network. The analysis is approached in two ways: characterization and modeling. First we characterize the network from a five-month-long log of user activity and traffic collected by a wireless network service provider operating hotspots in restaurants, serviced apartments, hotels, and airports all over Australia. The users are categorized based on their account time limits to analyze the impact of account stratification on the overall user behavior. A similarity index is developed to compare two data sets; it is used to quantitatively measure how similar or different various types of accounts are. The user population in the network is found to be highly fluctuating, hence user-specific, population-independent metrics are proposed to manage this transience. We also introduce metrics to measure account time and data utilization. We then follow through with detailed modeling of session and traffic parameters. We develop the truncated log-logistic (T-LL) distribution, which can model light- and heavy-tailed data using a modification of Lavalette's law. A novel method to fit the T-LL distribution to data by minimizing a goodness-of-fit metric is presented. The T-LL distribution and the fitting method are subsequently used to model session and traffic parameters of the network based on the categorization methodology developed previously. We address concerns about the specificity of the model by using it to model other publicly available Wi-Fi network traces. The ability of the introduced T-LL distribution to model both light- and heavy-tailed data makes it uniquely qualified for modeling web file sizes. Thus we extend the applicability of the introduced model by fitting it to publicly available web file size data. The T-LL models outperform those of the Pareto and lognormal distributions currently used to model such data.
Ph.D. in Computer Science, December 2014
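The fitting method is described only as "minimizing a goodness-of-fit metric". The sketch below shows that general pattern under stated assumptions: a plain log-logistic CDF truncated at the sample maximum (a stand-in, since the dissertation's T-LL modifies Lavalette's law), with the Kolmogorov-Smirnov statistic minimized over (scale, shape) via scipy, on synthetic heavy-tailed data.

```python
import numpy as np
from scipy.optimize import minimize

def ll_cdf(x, alpha, beta):
    """Plain log-logistic CDF (assumed stand-in for the T-LL's base form)."""
    return 1.0 / (1.0 + (x / alpha) ** (-beta))

def truncated_cdf(x, alpha, beta, x_max):
    return ll_cdf(x, alpha, beta) / ll_cdf(x_max, alpha, beta)

def ks_statistic(params, data):
    alpha, beta = np.exp(params)            # optimize in log-space for positivity
    n = len(data)
    ecdf = np.arange(1, n + 1) / n
    model = truncated_cdf(data, alpha, beta, data[-1])
    return np.max(np.abs(model - ecdf))

rng = np.random.default_rng(7)
data = np.sort(rng.pareto(2.0, 2000) + 1.0)  # synthetic heavy-tailed "session sizes"

res = minimize(ks_statistic, x0=np.log([1.0, 2.0]), args=(data,),
               method="Nelder-Mead")
alpha, beta = np.exp(res.x)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, KS={res.fun:.4f}")
```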
- Title
- TOWARDS THE OPTIMAL CONFIGURATION OF USING SSDS UNDER HYBRID PARALLEL I/O AND STORAGE SYSTEMS
- Creator
- Feng, Bo
- Date
- 2014, 2014-05
- Description
-
The performance gap between computing devices and storage devices has continuously widened during the past few decades. This issue causes many I/O problems, even in the field of supercomputing. On one hand, the computing facilities grow very fast and supercomputers are getting ever more powerful, while traditional storage devices, such as hard disk drives (HDDs), fail to keep pace with this growth. On the other hand, applications from both industry and the sciences are becoming data-intensive, meaning that I/O is in high demand. Newly emerging non-volatile memory (NVM), such as flash-based solid state drives (SSDs), has become popular in both the consumer and enterprise markets. Datacenters and supercomputing centers have already glimpsed this transition and are beginning to deploy SSDs in their I/O systems, but SSDs are still costly compared to HDDs. Substantial work has been done using SSDs to accelerate I/O and storage systems. However, to the best of our knowledge, some fundamental questions remain to be addressed, such as what type of storage configuration is suitable for HDD-SSD heterogeneity. Therefore, in this study, we built a high-performance hybrid parallel I/O and storage simulator to simulate these configurations. We also implemented an algorithm to approach an optimal configuration using SSDs under parallel I/O and storage systems. This methodology consists of tracing users' applications, analyzing users' requirements including hardware properties, and generating configuration suggestions. The experiments show its fidelity, with a minimal error rate of 2%, and practical scalability up to 256 processes. The results of this study can help system designers either optimize current systems or predict larger-scale designs of parallel systems in the future.
M.S. in Computer Science, May 2014
- Title
- EXPLOITING KNOWLEDGE IN UNSUPERVISED OPEN INFORMATION EXTRACTION
- Creator
- Merhav, Yuval
- Date
- 2012-12-03, 2012-12
- Description
-
The extraction of structured information from text is a long-standing challenge in Natural Language Processing (NLP) which has been reinvigorated by the ever-increasing availability of user-generated textual content online. The ability to extract interesting and important pieces of information from text documents is crucial for large-scale language understanding, which powers modern Web search engines. The field of Open Information Extraction (Open IE) offers a way to automatically discover relations from large and heterogeneous text collections. Since it is difficult to obtain adequate training data for Open IE, unsupervised approaches that rely on rules and clustering are popular. However, the major trend in unsupervised Open IE has been to borrow algorithms and low-level features from other applications such as search, relying on previous work that has proved successful in other domains. This thesis argues that it is essential to use domain and external knowledge in Open IE, and proposes several ways of doing so to achieve substantial performance improvements over state-of-the-art systems. We use three main knowledge sources: (1) a large corpus of unstructured text that is used to learn a language model over relations, which can be incorporated into a weighting scheme that outperforms the common tf-idf weighting scheme; (2) an external knowledge base such as Wikipedia that is used to extract fine-grained types of entities, which yield a better understanding of how relations are expressed in English; and (3) domain knowledge extracted from the blogosphere (e.g., the degree of a node in the network) that is used to improve performance at scale.
Ph.D. in Computer Science, December 2012
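The abstract contrasts a relation language-model weighting with tf-idf but gives neither formula. The sketch below computes both in their textbook forms over a toy corpus of extracted relation phrases, purely to make the contrast concrete; the corpus and the smoothing constant are assumptions, not the thesis's scheme.

```python
import math
from collections import Counter

# Toy corpus: each "document" is the bag of relation phrases extracted from it.
docs = [["born in", "lives in", "works for"],
        ["born in", "married to"],
        ["works for", "works for", "acquired by"]]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in docs)
    return tf * math.log(len(docs) / df)

def relation_lm(term, docs, smoothing=1.0):
    """Corpus-level unigram LM over relations (additive smoothing assumed)."""
    counts = Counter(t for d in docs for t in d)
    total = sum(counts.values())
    vocab = len(counts)
    return (counts[term] + smoothing) / (total + smoothing * vocab)

for rel in ["works for", "married to"]:
    print(rel,
          "| tf-idf in doc 2:", round(tf_idf(rel, docs[2], docs), 3),
          "| LM prob:", round(relation_lm(rel, docs), 3))
```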
- Title
- RECEIVER INITIATED MAC PROTOCOL FOR WIRELESS SENSOR NETWORK
- Creator
- Duan, Sze Ching Eric
- Date
- 2012-04-30, 2012-05
- Description
-
In wireless sensor networks, wireless devices should wake up only as often as necessary to communicate with neighbors while keeping performance high. On the other hand, it is also important that wireless devices remain asleep as much as possible to maintain low power consumption. Thus a power-efficient duty-cycle MAC protocol is required in wireless sensor networks. This thesis proposes a MAC-layer duty-cycle protocol which uses a receiver-initiated wake-up mechanism to allow devices to keep their transceivers off most of the time and thereby maintain energy efficiency. This thesis also measures the energy consumption of the protocol and compares it with other popular mechanisms, showing that the approach successfully reduces energy consumption compared to other popular protocols.
M.S. in Computer Science, May 2012
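The protocol mechanics are only summarized above. Below is a minimal, assumption-laden simulation of the receiver-initiated idea: the receiver sleeps on a duty cycle and broadcasts a short beacon each time it wakes, and a sender with pending data stays silent until it hears the beacon, then transmits. All timings and energy numbers are invented, not the thesis's measurements.

```python
# Minimal receiver-initiated duty-cycle sketch (all parameters hypothetical).
WAKE_PERIOD = 100       # ms between receiver wake-ups
BEACON_COST = 0.2       # mJ to send the wake-up beacon
RX_COST_PER_MS = 0.05   # mJ/ms with the radio on
TX_COST = 1.5           # mJ to send one data frame

def simulate(data_arrivals_ms, sim_time_ms=1000, rx_window_ms=5):
    energy, delivered = 0.0, []
    for t in range(0, sim_time_ms, WAKE_PERIOD):
        # Receiver wakes, beacons, and listens briefly.
        energy += BEACON_COST + rx_window_ms * RX_COST_PER_MS
        # Senders whose data arrived before this beacon now transmit.
        for a in data_arrivals_ms:
            if a <= t and a not in delivered:
                energy += TX_COST
                delivered.append(a)
    pending = [a for a in data_arrivals_ms if a not in delivered]
    return energy, delivered, pending

energy, delivered, pending = simulate([30, 140, 450, 990])
print(f"energy={energy:.1f} mJ, delivered arrivals {delivered}, still pending {pending}")
```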
- Title
- SEMANTIC ONTOLOGIES FOR THE PUBLICATION OF SPECTRUM MEASUREMENT PROVENANCE
- Creator
- Faurie, Eric A.
- Date
- 2016, 2016-05
- Description
-
Measurement-based spectrum research isn't new, but there is a renewed interest in understanding how the spectrum is being utilized. With the modern prevalence of connected devices and our increasing reliance on wireless technologies, there is increasing demand for additional spectrum. The question of how to meet this demand largely depends on how the spectrum is being used today, and thus a need for advanced measurement-based research has emerged. Spectrum measurement and analysis is complicated: the data is multi-dimensional and dynamic in time, space, and frequency. Signal behavior is governed by complex mathematics, and spectrum use is regulated by government agencies across the world. Data collection relies on a complex system of expensive hardware where the physical attributes of antennas, analyzers, and deployment locations all impact the data that is collected. These variables and concerns must all be considered while deploying a spectrum measurement system. This paper presents the Semantic Spectrum Ontology (SSO), a model which aids researchers in designing and deploying spectrum measurement systems and publishing their data as community resources. The SSO exists within the paradigm of the Semantic Web and links into the wider semantic graph by extending the W3C's Semantic Sensor Network Ontology (SSN). The Semantic Spectrum Ontology also presents two new semantic constructs. The Scientific Provenance Model allows researchers to publish in-depth metadata concerning the measurements and the conditions under which they were collected, and the Scientific Property Model creates a framework for encoding knowledge from various sources, including domain experts and machine learning statistics. These two models were constructed specifically for the SSO but were generalized to allow for their application within any ontology representing any scientific field.
M.S. in Computer Science, May 2016
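As a taste of how such an ontology extension might be used in practice (not the SSO's actual terms), the sketch below uses rdflib to publish one spectrum observation with provenance-like metadata against the W3C SOSA/SSN vocabulary the SSO extends; the example.org namespace and every property under it are hypothetical stand-ins.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

SSN = Namespace("http://www.w3.org/ns/ssn/")
SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/spectrum#")   # hypothetical SSO-like namespace

g = Graph()
g.bind("sosa", SOSA); g.bind("ssn", SSN); g.bind("ex", EX)

obs = EX["obs-001"]
g.add((obs, RDF.type, SOSA.Observation))
g.add((obs, SOSA.madeBySensor, EX["analyzer-42"]))
g.add((obs, SOSA.observedProperty, EX["powerSpectralDensity"]))
g.add((obs, SOSA.hasSimpleResult, Literal(-97.3, datatype=XSD.double)))
# Provenance-style metadata about collection conditions (illustrative only).
g.add((obs, EX.antennaGainDbi, Literal(6.0, datatype=XSD.double)))
g.add((obs, EX.deploymentLocation, Literal("rooftop, downtown")))

print(g.serialize(format="turtle"))
```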
- Title
- SENTIMENT ANALYSIS BASED ON APPRAISAL THEORY AND FUNCTIONAL LOCAL GRAMMARS
- Creator
- Bloom, Kenneth
- Date
- 2011-08, 2011-12
- Description
-
Much of the past work in structured sentiment extraction has been evaluated in ways that summarize the output of a sentiment extraction technique for a particular application. In order to get a true picture of how accurate a sentiment extraction system is, however, it is important to see how well it performs at finding individual mentions of opinions in a corpus. Past work also focuses heavily on mining opinion/product-feature pairs from product review corpora, which has led to sentiment extraction systems assuming that the documents they operate on are review-like: that each document concerns only one topic, that there are lots of reviews on a particular product, and that the product features of interest are frequently recurring phrases. Based on existing linguistics research, this dissertation introduces the concept of an appraisal expression, the basic grammatical unit by which an opinion is expressed about a target. The IIT sentiment corpus, intended to present an alternative to both of these assumptions that have pervaded sentiment analysis research, consists of blog posts annotated with appraisal expressions to enable the evaluation of how well sentiment analysis systems find individual appraisal expressions. This dissertation introduces FLAG, an automated system for extracting appraisal expressions. FLAG operates using a three-step process: (1) identifying attitude groups using a lexicon-based shallow parser, (2) identifying potential structures for the rest of the appraisal expression by identifying patterns in a sentence's dependency parse tree, and (3) selecting the best appraisal expression for each attitude group using a discriminative reranker. FLAG achieves an overall accuracy of 0.261 F1 at identifying appraisal expressions, which is good considering the difficulty of the task.
Ph.D. in Computer Science, December 2011
- Title
- EFFICIENT ALGORITHMS FOR POWER ASSIGNMENT PROBLEMS
- Creator
- Qiao, Kan
- Date
- 2015, 2015-05
- Description
-
Power assignment problems take as input a directed simple graph G = (V, E) and a cost function c : E → R⁺. A solution to this problem assigns every vertex a nonnegative power p(v). Let B(p) denote the set of all the links established between pairs of nodes in V under the power assignment p; we use H = (V, B(p)) to denote the spanning subgraph of G created by this power assignment. The minimization problem then is to find the power assignment minimizing the total power ∑_{v∈V} p(v), subject to H satisfying a specific property. Four variants of this problem are discussed in this paper: (a) Min-Power Strong Connectivity: H = (V, B(p)) is strongly connected. (b) Min-Power Broadcast: H = (V, B(p)) has a path from the fixed source z to every other vertex. (c) Min-Power Connectivity with 2-level power (Symmetric): c : E → {0, 1} and H = (V, B(p)) is connected. (d) Min-Power Strong Connectivity with 2-level power (Asymmetric): c : E → {0, 1} and H = (V, B(p)) is strongly connected. We give the exact solution using an improved integer linear program for problems (a) and (b) (we do not have a separate section for the integer linear program of the Min-Power Broadcast problem since it is very similar to Min-Power Strong Connectivity). Then we try to speed up the current best approximation algorithms while preserving their approximation ratios. For problem (a), we give a fast variant of the 1.85-approximation algorithm with running time O(n² log² n). For problem (b), we give a fast variant of the 2(1 + ln n)-approximation algorithm for the most general cost model with running time O(n³) and a fast variant of the 4.2-approximation algorithm for the 2-dimensional cost model with running time O(nm), where n = |V| and m = |E|. For both problems (c) and (d), we give 5/3-approximation algorithms that run in O(m α(n)), where α(n) is the inverse Ackermann function.
Ph.D. in Computer Science, May 2015
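To make the reconstructed notation concrete, here is the standard formulation this family of problems uses, written out in LaTeX. It restates the abstract's own definitions rather than adding anything from the dissertation.

```latex
% A vertex's power must cover the cost of every link it initiates:
%   (u,v) \in B(p) \iff p(u) \ge c(u,v).
\begin{align*}
  \text{minimize}\quad   & \sum_{v \in V} p(v) \\
  \text{subject to}\quad & H = (V, B(p)) \text{ satisfies the required property,} \\
                         & B(p) = \{ (u,v) \in E : p(u) \ge c(u,v) \}, \qquad p(v) \ge 0 .
\end{align*}
```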
- Title
- LIGHTLY SUPERVISED MACHINE LEARNING FOR CLASSIFYING ONLINE SOCIAL DATA
- Creator
- Mohammady Ardehaly, Ehsan
- Date
- 2017, 2017-05
- Description
-
Classifying latent attributes of social media users has many applications in public health, politics, and marketing. For example, web-based studies of public health require monthly estimates of the health status and demographics of users based on their public communications. Most existing approaches are based on supervised learning. Supervised learning requires human-annotated labeled data, which can be expensive, and many attributes such as health are hard to annotate at the user level. In this thesis, we investigate classification algorithms that use population statistical constraints such as demographics, names, polls, and social network followers to predict individual user attributes. For example, the racial makeup of counties, taken from the U.S. Census, is a source of light supervision for training classification models. These statistics are usually easy to obtain, and a large amount of unlabeled data from social media sites (e.g., Twitter) is available. Learning from Label Proportions (LLP) is a lightly supervised approach in which the training data consists of multiple sets of unlabeled samples and only their label distributions are known. Because social media users are not a representative sample of the population and the constraints are noisy, existing LLP models (e.g., linear models, label regularization) are insufficient. We develop several new LLP algorithms that extend LLP to deal with this bias, including bag selection and robust classification models. We also propose a scalable model to infer political sentiment from high-temporal-resolution big data and estimate the daily conditional probability of different attributes, as a supplement to polls for social scientists. Because constraints are often not available in some domains (e.g., blogs), we propose a self-training algorithm to gradually adapt a classifier trained on social media to a different but similar field. We also extend our framework to deep learning and provide empirical results for demographic classification using the user profile image. Finally, when both text and a profile image are available for a user, we provide a co-training algorithm to iteratively improve the accuracy of both the image and text classifiers, and apply an ensemble method to achieve the highest precision.
Ph.D. in Computer Science, May 2017
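Label regularization is named above as a baseline LLP method. The sketch below shows its usual shape under stated assumptions: a logistic model trained so that the average predicted positive rate within each bag of unlabeled users matches that bag's known label proportion, here with a squared-error bag loss (published versions typically use a KL term). The bags and proportions are synthetic, not Census data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic bags: each bag is unlabeled feature rows plus a known positive
# rate, e.g. Twitter users grouped by county + that county's Census proportion.
bags = []
for prop in [0.2, 0.5, 0.8]:
    y = rng.random(200) < prop
    X = rng.normal(0, 1, (200, 3)) + y[:, None] * np.array([1.5, -1.0, 0.5])
    bags.append((X, prop))

w = np.zeros(3)
lr = 0.5
for _ in range(300):                       # gradient descent on the bag-level loss
    grad = np.zeros(3)
    for X, prop in bags:
        p = 1.0 / (1.0 + np.exp(-X @ w))   # per-user P(y=1)
        diff = p.mean() - prop             # squared-error label regularizer
        grad += 2 * diff * ((p * (1 - p)) @ X) / len(X)
    w -= lr * grad

for X, prop in bags:
    p = 1.0 / (1.0 + np.exp(-X @ w))
    print(f"target proportion {prop:.2f} -> predicted mean {p.mean():.2f}")
```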
- Title
- WIRELESS LINK SCHEDULING UNDER PHYSICAL INTERFERENCE MODEL
- Creator
- Ma, Chao
- Date
- 2014, 2014-07
- Description
-
Latency minimization and capacity maximization are fundamental combinatorial optimization problems in wireless networks. Given a set of communication links in a multihop wireless network, the former computes a schedule satisfying all link demands with the shortest latency, while the latter aims at selecting a maximum feasible subset of these links. We study both the Shortest Link Schedule (SLS) and Maximum Independent Set of Links (MISL) from a theoretical perspective, striving for generalized algorithmic treatments and provable approximation guarantees. Wireless devices are prone to radio frequency interference emanating from other devices. Interference can be a major inhibitor of transmission performance, degrading signal quality or even causing communication to fail. Several models have been used for modeling wireless interference over the past decades. In contrast to graph-based protocol models, which assume that interference ends at some boundary, we consider the more realistic SINR-based physical interference model. Under the physical interference model, SLS and MISL are hard to solve due to the technical obstacles caused by the ambient noise and the non-local, additive nature of interference. In this dissertation, we consider both fixed transmission powers and power control. We explore the nature of interference under the physical interference model and propose a generalization of independent sets that is capable of modeling independent sets of wireless links. In addition, we present constant-approximation algorithms for MISL with monotone and sub-linear power assignment in both unidirectional and bidirectional modes, and for MISL with sub-mean power assignment in bidirectional mode. We also present a constant-approximation algorithm for Maximum Weighted Independent Set of Links (MWISL) with linear power assignment in both unidirectional and bidirectional modes. For MISL with power control in unidirectional mode, we develop a constant-approximation algorithm with the canonical iterative power assignment.
Ph.D. in Computer Science, July 2014
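The SINR-based physical interference model the dissertation builds on has a standard feasibility test, which the sketch below implements for a candidate set of links: each link's SINR must clear a threshold given path loss, noise, and the summed interference from all other transmitters. The geometry, powers, and constants are invented for illustration.

```python
import numpy as np

def sinr_feasible(tx, rx, power, beta=2.0, kappa=3.0, noise=1e-9):
    """Check whether all links can transmit simultaneously under the
    physical (SINR) interference model: for every link i,
        SINR_i = P_i * d(tx_i, rx_i)^-kappa
                 / (noise + sum_{j != i} P_j * d(tx_j, rx_i)^-kappa) >= beta.
    """
    d = np.linalg.norm(tx[:, None, :] - rx[None, :, :], axis=2)  # d[j, i] = |tx_j - rx_i|
    recv = power[:, None] * d ** (-kappa)                        # recv[j, i]
    signal = np.diag(recv)
    interference = recv.sum(axis=0) - signal
    return np.all(signal / (noise + interference) >= beta)

rng = np.random.default_rng(3)
tx = rng.uniform(0, 100, (4, 2))            # 4 candidate links in a 100x100 area
rx = tx + rng.uniform(-5, 5, (4, 2))        # receivers near their transmitters
power = np.full(4, 1.0)                     # a fixed uniform power assignment
print("feasible set:", sinr_feasible(tx, rx, power))
```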
- Title
- Data used to develop #Polar scores
- Creator
- Culotta, Aron, Hemphill, Libby, Heston, Matthew
- Date
- 2013, 2016
- Description
-
We present a new approach to measuring political polarization, including a novel algorithm and open source Python code, which leverages Twitter content to produce measures of polarization for both users and hashtags. #Polar scores provide advantages over existing measures because they (1) can be calculated throughout the legislative cycle, (2) allow for easy differentiation between users with similar scores, (3) are chamber-agnostic, and (4) are a generic approach that can be applied beyond the U.S. Congress. #Polar scores leverage available information such as party labels, word frequency, and hashtags to create an accessible, straightforward algorithm for estimating polarity using text. (From the paper: Hemphill, L., Culotta, A., and Heston, M. (forthcoming) #Polar Scores: Measuring partisanship using social media content. Journal of Information Technology & Politics.)
The dataset contains one plain-text TSV file with the following information for each of the 55,244 tweets used to develop #Polar scores: tweet_id, created_at, user_id, screen_name, tag, shortid, sex, party, state, chamber, name. The file contains one row per hashtag, and therefore tweets may appear more than once. The Python code for calculating #Polar scores is available here: http://doi.org/10.5281/zenodo.53888
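Since the record spells out the TSV schema, a minimal loading sketch follows. The filename is hypothetical and the file is assumed to lack a header row; the column names come straight from the description above.

```python
import pandas as pd

# Hypothetical local filename; column names are from the dataset description.
# Assumes the TSV has no header row (drop `names=` if it does).
cols = ["tweet_id", "created_at", "user_id", "screen_name", "tag",
        "shortid", "sex", "party", "state", "chamber", "name"]
df = pd.read_csv("polar_tweets.tsv", sep="\t", names=cols, dtype={"tweet_id": str})

# One row per (tweet, hashtag) pair: deduplicate tweets before counting.
print(df["tag"].value_counts().head())
print(df.drop_duplicates("tweet_id").groupby("party").size())
```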
- Title
- Inefficiencies in resource allocation games
- Creator
- Tota, Praneeth
- Date
- 2019
- Description
-
This thesis addresses a problem that has been debated by the academic community, the government, and the industry at large: how unfair is a tiered Internet compared to an open Internet? On one hand we have an open Internet, in which all data is treated equally and the Internet service providers have no say when it comes to pricing differentiation; on the other hand we have a tiered Internet, in which the ISPs can charge different amounts based on certain constraints such as the type of data or the content provider. The architecture of the Internet imposes certain constraints which require mechanisms to efficiently allocate resources among all the competing participants, who concern themselves only with their own best interests without considering the social benefit as a whole. We consider one such mechanism, known as proportional sharing, in which the resource, or bandwidth, is divided among the participants based on their bids. An efficient allocation is one which maximizes the aggregate utility of the users. We consider inelastic demand with price-anticipating participants and ensure market clearing. We examine a tiered Internet in which the ISPs can partition the bandwidth based on certain constraints and charge a premium for better service. The participants involved come from all economic classes, so they have different amounts of wealth at their disposal. We quantify the relative loss incurred by the participants in lower economic classes as compared to the higher economic classes. We also calculate the loss of efficiency caused by competition among the participants as compared to the optimal social allocation.
- Title
- A NEW SATISFIABILITY SOLVER OF THE FEATURE LANGUAGE EXTENSION
- Creator
- Ai, Jieling
- Date
- 2012-04-23, 2012-05
- Description
-
We introduce a satisfiability solver for first-order formulas written in a modern object-oriented programming language such as Java, which is the programming language that we use to implement the solver. The variables in the first-order formula can be of any data type definable in the host programming language. The first application of the solver is to detect interaction conditions among programs written in the Feature Language Extensions (FLX); accordingly, it also determines the satisfying conditions of the formula if the formula is satisfiable. FLX is a set of programming language constructs designed to allow the programmer to develop interaction features as reusable program modules [25]. Interaction detection is equivalent to automating the task of finding where to make code changes if the interacting features are implemented with conventional programming languages. The solver requires that predicates in the formula contain no functional elements. This restriction should not reduce the kind of programs that can be written in the host programming language. FLX provides language support for the solver: its language constructs allow the programmer to provide semantic guidance to the solver on the data types that they define, and allow the compiler to enforce the standards required of the first-order formula. While the first application of the solver is to analyze programs written in FLX, it should be useful to other applications that need such a solver to process variables used in software directly.
M.S. in Computer Science, May 2012
- Title
- A COMPARATIVE STUDY OF FEATURE INTEGRATION WITH FLX AND ASPECTJ
- Creator
- Ramakrishna Reddy, Niranjana Sompura
- Date
- 2014, 2014-07
- Description
-
Feature Language Extensions (FLX) and AspectJ are two sets of programming language constructs designed to enable the programmer to modularize interacting features, or equivalently crosscutting concerns, that cannot be modularized with a mainstream programming language. The two approaches are quite different. The purpose of this thesis is to compare how effective they are in feature integration, such as whether already developed features will need to be modified. The study was conducted by integrating a set of features of the familiar computer blackjack game. The blackjack game is interesting because it has features that execute some programs of the basic game iteratively and recursively. We found that with AspectJ we need to modify existing feature code or repeat feature code under certain integration scenarios. We discuss the underlying reasons why these cases occur and in some cases suggest methods to overcome them.
M.S. in Computer Science, July 2014
- Title
- IMPROVING FAULT TOLERANCE FOR EXTREME SCALE SYSTEMS
- Creator
- Berrocal, Eduardo
- Date
- 2017, 2017-05
- Description
-
Mean Time Between Failures (MTBF), now measured in days or hours, is expected to drop to minutes on exascale machines. In this thesis, a new approach for failure prediction based on the Void Search (VS) algorithm is presented. VS is used primarily in astrophysics for finding areas of space that have a very low density of galaxies. We explore its potential for failure prediction using environmental information and compare it to well-known prediction methods. Another important issue for the HPC community is that next-generation supercomputers are expected to have more components and consume several times less energy per operation. Hence, supercomputer designers are pushing the limits of miniaturization and energy-saving strategies. Consequently, the number of soft errors is expected to increase dramatically in the coming years. While mechanisms are in place to correct or at least detect soft errors, a percentage of those errors pass unnoticed by the hardware. Techniques that leverage certain properties of iterative HPC applications (such as the smoothness of the evolution of a particular dataset) can be used to detect silent errors at the application level. Results show that it is possible to detect a large number of corruptions (i.e., above 90% in some cases) with less than 100% overhead using these techniques. Nevertheless, these data-analytic solutions are still far from fully protecting applications to a level comparable with more expensive solutions such as full replication. In this thesis, partial replication is explored to overcome this limitation. More specifically, it has been observed that not all processes of an MPI application experience the same level of data variability at exactly the same time. Thus, one can smartly choose and replicate only those processes for which the lightweight data-analytic detectors would perform poorly. Results indicate that this new approach can protect the analyzed MPI applications with 7–70% less overhead (depending on the application) than that of full duplication, with similar detection recall.
Ph.D. in Computer Science, May 2017
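The smoothness-based detection idea is only described in words above. As a hedged sketch, the detector below extrapolates each new value of an iterative dataset from its recent history and flags a silent data corruption when the observed value breaks an adaptive error bound; the time series, the linear predictor, and the bound are illustrative choices, not the thesis's exact detectors.

```python
import numpy as np

def detect_silent_errors(series, window=4, k=5.0, warmup=5):
    """Flag points that break the smooth evolution of an iterative dataset.

    Predict each value by linear extrapolation over the previous `window`
    points and flag it when the residual exceeds k times the median residual
    of points accepted so far (an adaptive bound). Illustrative only.
    """
    flags, residuals = [], []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        slope = (hist[-1] - hist[0]) / (window - 1)   # local linear trend
        resid = abs(series[t] - (hist[-1] + slope))
        if len(residuals) >= warmup and resid > k * (np.median(residuals) + 1e-12):
            flags.append(t)                           # suspected silent corruption
        else:
            residuals.append(resid)                   # only clean points update the bound
    return flags

# Smooth synthetic evolution (e.g., one grid point across iterations),
# with a bit-flip-like spike injected at iteration 60. Note the corrupted
# value also pollutes the next prediction, so its neighbor may be flagged too.
t = np.arange(100, dtype=float)
series = np.sin(t / 15.0) + 0.001 * t
series[60] += 0.5
print("flagged iterations:", detect_silent_errors(series))
```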