Search results
(1 - 11 of 11)
- Title
- INFLUENCE OF TIE STRENGTH ON HOSTILITY IN SOCIAL MEDIA
- Creator
- Radfar, Bahar
- Date
- 2019
- Description
- Online anti-social behavior, such as cyberbullying, harassment, and trolling, is a widespread problem that threatens free discussion and has negative physical and mental health consequences for victims and communities. While prior work has proposed automated methods to identify toxic situations such as hostility, it has focused on individual words. A bag of keywords alone is not enough, as words may carry different meanings depending on the relationship between the participants of a discussion. In this paper, we consider the friendship between the sender and the target of a hostile conversation. First, we study the characteristics of different types of relationships. Then, we set our goal to be more accurate hostility detection with fewer false alarms. Thus, we aim to detect both the presence and intensity of hostile comments based on linguistic and social features derived from our well-defined relationships. To evaluate our approach, we introduce a corpus of over 12,000 annotated tweets drawn from more than 170,000 tweets. Next, we extract useful features, such as relationship type and tweet length, to feed into our Long Short-Term Memory (LSTM) and Logistic Regression (LR) classifiers. By considering the relationship type in the classifier model, we improve the hostility detection AUC by close to 5% compared to the baseline method, and the F1 score increases by 4% as well.
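The feature design described above (linguistic features combined with the sender-target relationship type) can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline: the relationship categories, feature names, and encoding are assumptions.

```python
# Hypothetical sketch: augmenting word-level features with the sender-target
# relationship type before feeding a classifier, as the abstract describes.
# RELATIONSHIP_TYPES and the chosen features are assumptions, not the thesis's setup.

RELATIONSHIP_TYPES = ["stranger", "acquaintance", "friend"]  # assumed categories

def build_features(keyword_hits: int, tweet_length: int, relationship: str) -> list[float]:
    """Concatenate linguistic features with a one-hot relationship encoding."""
    one_hot = [1.0 if relationship == r else 0.0 for r in RELATIONSHIP_TYPES]
    return [float(keyword_hits), float(tweet_length)] + one_hot

# The same hostile keywords can then be weighted differently between friends
# and strangers by the downstream LSTM or LR classifier.
features = build_features(keyword_hits=2, tweet_length=14, relationship="friend")
print(features)  # [2.0, 14.0, 0.0, 0.0, 1.0]
```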
- Title
- Concurrency and Locality Aware GPGPU Thread Group Scheduling
- Creator
- Nosek, Janusz M
- Date
- 2018
- Description
- Graphics Processing Units (GPUs) once served the limited function of rendering graphics. With technological advances, these devices gained new purposes beyond graphics. Most modern GPUs have exposed their APIs to allow processing of data beyond the display, leading to a revolution in computing where instructions and intensive tasks can be offloaded to these now General Purpose Graphics Processing Units (GPGPUs). Many compute- and memory-intensive tasks have utilized GPGPUs for acceleration, and these devices are especially prevalent in the financial, pharmaceutical, and automotive industries. While computing resources have increased exponentially, memory resources have not, creating a limiting factor known as the memory wall. GPUs were designed as application-specific processing units for the streaming data access patterns found in graphical applications. They are successful at their original purpose, but when extended to general-purpose problems they meet the same memory-wall data access problem as their CPU counterparts; they can be even more susceptible to the effects of latency due to the locality and concurrency of instructions relative to data. This thesis reviews the current GPGPU landscape, including the design of current scheduling systems and simulators, GPGPU architecture, and a way of computing and describing the memory access penalty with Concurrent Average Memory Access Time (C-AMAT). We have devised a solution that manipulates the number of scheduled thread groups so that a GPGPU's processing units match their current memory state as defined by C-AMAT. Our solution results in an increase in IPC, a reduction in C-AMAT, and a decrease in memory misses. The solution also has different effects on different types of computing problems, with the highest improvements achieved for compute-intensive memory patterns: as much as a 12% improvement in instructions per cycle and a 14% reduction in C-AMAT.
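The metric the abstract names, C-AMAT, is commonly defined as the number of cycles during which at least one memory access is outstanding, divided by the total number of accesses, so that overlapped (concurrent) accesses amortize the measured penalty. A toy measurement of that definition, with made-up access intervals, might look like:

```python
# Minimal sketch of measuring Concurrent Average Memory Access Time (C-AMAT):
# cycles with at least one outstanding memory access, divided by the number of
# accesses. The (start, end) cycle intervals below are hypothetical.

def c_amat(access_intervals: list[tuple[int, int]]) -> float:
    """access_intervals: (start_cycle, end_cycle) per memory access, end exclusive."""
    busy_cycles = set()
    for start, end in access_intervals:
        busy_cycles.update(range(start, end))  # overlapping cycles counted once
    return len(busy_cycles) / len(access_intervals)

# Two fully overlapping 4-cycle accesses: 4 busy cycles / 2 accesses = 2.0,
# half the cost of a single isolated 4-cycle access.
print(c_amat([(0, 4), (0, 4)]))  # 2.0
print(c_amat([(0, 4)]))          # 4.0
```

This is why increasing the concurrency of scheduled thread groups can lower C-AMAT even when individual access latencies are unchanged.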
- Title
- PRIVACY PRESERVING BAG PREPARATION FOR LEARNING FROM LABEL PROPORTION
- Creator
- Yan, Xinzhou
- Date
- 2018
- Description
- We apply privacy-preserving data mining (PPDM) standards to the Learning from Label Proportions (LLP) model to create a privacy-preserving machine learning framework. We design the data preparation step for the LLP framework to meet the PPDM standards. In the data preparation step, we develop a bag selection method that boosts the accuracy of the LLP model by more than 7%. In addition, we propose three K-anonymous aggregation methods for the datasets, which incur almost zero accuracy loss and are very robust. After the K-anonymity step, we apply differential privacy to the LLP model and ensure low accuracy loss. Because of the LLP model's special loss function, not only is it possible to replace all the feature vectors within each bag with the bag's mean feature vector, but the accuracy loss caused by differential privacy can also be bounded by a small number. The loss function ensures low accuracy loss when training the LLP model on a PPDM dataset. We evaluate the PPDM LLP model on two datasets, the Adult dataset and an Instagram comment dataset; both give empirical evidence of low accuracy loss after applying the PPDM LLP model.
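One concrete idea in the abstract, replacing every feature vector within a bag by the bag's mean vector, can be sketched directly; because the LLP loss depends on bags only through aggregates, this substitution need not hurt accuracy. The bag contents below are made up for illustration.

```python
# Hedged sketch of bag-level aggregation for LLP: within each bag, every
# record is replaced by the shared mean feature vector, hiding individual rows.

def mean_vector(bag: list[list[float]]) -> list[float]:
    """Component-wise mean of the feature vectors in one bag."""
    n = len(bag)
    return [sum(col) / n for col in zip(*bag)]

def anonymize_bag(bag: list[list[float]]) -> list[list[float]]:
    """Replace every record in the bag with the bag's mean vector."""
    m = mean_vector(bag)
    return [list(m) for _ in bag]

bag = [[1.0, 2.0], [3.0, 4.0]]
print(anonymize_bag(bag))  # [[2.0, 3.0], [2.0, 3.0]]
```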
- Title
- Unsupervised Learning of Visual Odometry Using Direct Motion Modeling
- Creator
- Andrei, Silviu Stefan
- Date
- 2020
- Description
- Data for supervised learning of ego-motion and depth from video is scarce and expensive to produce. Consequently, recent work has focused on unsupervised learning methods and achieved remarkable results, in some instances surpassing the accuracy of supervised methods. Many unsupervised approaches rely on predicted monocular depth and so ignore motion information. Moreover, the unsupervised methods that do incorporate motion information do so only indirectly, by designing the depth prediction network as an RNN. Hence, none of the existing methods model motion directly. In this work, we show that it is possible to achieve superior pose estimation results by modeling motion explicitly. Our method uses a novel learning-based formulation for depth propagation and refinement that transforms the predicted depth map of the current frame onto the next frame, where it serves as a prior for predicting the next frame's depth map. Experimental results demonstrate that the proposed approach surpasses state-of-the-art techniques on the pose prediction task while being better than or on par with other methods on the depth prediction task.
- Title
- Integrity based landmark generation: A method to generate landmark configurations that guarantee mobile robot localization safety
- Creator
- Chen, Yihe
- Date
- 2020
- Description
- From the bronze-age city of Nineveh to modern metropolises like Tokyo, traffic shapes cities and profoundly affects people's lives. Just as the widespread adoption of the automobile transformed cities in the early 20th century, we now stand on the eve of another traffic revolution. With the spread of autonomous and semi-autonomous robotic applications, it is important for urban designers to design or retrofit urban environments that are safe and friendly to autonomous robots. As more robots are deployed in life-critical situations, such as autonomous passenger vehicles, it is imperative to consider their safety, and in particular their localization safety. While it would be ideal to guarantee safety in any environment without having to physically modify said environment, this is not always possible, and one may have to add landmarks or active beacons to reach an acceptable level of safety for landmark-based localization. Localization safety is assessed using integrity, the primary safety metric in open-sky aviation applications, which has recently been applied to mobile robots and can account for the impact of rarely occurring, undetected faults. Conventional integrity monitoring methods depend heavily on GPS, while traditional Global Navigation Satellite System - Inertial Measurement Unit (GNSS-IMU) based localization does not apply in metropolitan areas because of the signal blocking and multipath problems caused by high-rise structures. Thus, this dissertation concentrates on feature-based integrity monitoring. The dissertation formulates the environmental localization safety problem as a systematic optimization problem: given the robot's trajectory and the current landmark map, add the minimal number of new landmarks at locations such that the integrity risk along the trajectory is below a given safety threshold. It proposes two algorithms to solve the problem: the Integrity-based Landmark Generator (I-LaG) and Fast I-LaG. I-LaG adds fewer landmarks but is relatively computationally expensive; Fast I-LaG is less computationally intensive at the expense of more landmarks. Both simulation and experimental results are presented.
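The optimization problem stated above can be illustrated with a greedy loop: keep adding the candidate landmark that most reduces estimated integrity risk until the risk drops below the threshold. This is only a stand-in sketch; the risk model below is a toy and does not represent I-LaG's actual integrity computation.

```python
# Hypothetical greedy sketch of the landmark-placement optimization:
# add candidate landmarks until estimated integrity risk along the trajectory
# falls below a safety threshold. `risk_fn` is a placeholder, not I-LaG.

def greedy_landmarks(candidates, risk_fn, threshold):
    """Repeatedly add the candidate that most reduces risk until the threshold is met."""
    chosen = []
    while risk_fn(chosen) > threshold and len(chosen) < len(candidates):
        best = min(
            (c for c in candidates if c not in chosen),
            key=lambda c: risk_fn(chosen + [c]),
        )
        chosen.append(best)
    return chosen

# Toy risk model: each added landmark halves the remaining risk.
risk = lambda landmarks: 0.4 * (0.5 ** len(landmarks))
print(greedy_landmarks(["a", "b", "c"], risk, threshold=0.15))  # ['a', 'b']
```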
- Title
- Understanding Location Bias in Fake News Datasets of Twitter
- Creator
- Patil, Kayenat Kailas
- Date
- 2023
- Description
- Fake news tends to spread faster and wider than real news. It has a greater impact and can lead to negative and dangerous outcomes. With the world spending an increasing amount of time on mobile devices, people tend to get more of their news from their preferred social media platforms, which have become part of our daily lives, whether to keep in touch with friends and family, follow celebrity gossip, or shop. In 2022, the average time a person spent per day on social media platforms was about 147 minutes [1], indicating an increase in time spent scrolling through information online. Fake news has become a widespread phenomenon in recent years, thanks in part to the rapid spread of information through social media and other online channels. It is increasingly important to explore and understand fake news and its impact on society, as well as to develop effective tools and methods for detecting and combating it. Several factors can interfere with the successful detection of fake news, and machine learning models often fall prey to biases that result in inaccurate predictions. Several biases have been identified, such as age, gender, and sex, among others. In this thesis, we explore location as a form of bias and ask whether it hinders prediction. We look at location from two perspectives. First, we take location as coordinates, in the form of latitude and longitude, and analyze the likelihood that a tweet coming from a given location is fake. Second, we consider location as an entity and use a natural language processing model to see whether it is able to predict if a given tweet is fake, masking the location mentioned in the tweet and analyzing how the performance of the model changes. Machine learning models can play an important role in fake news detection by analyzing large amounts of data and identifying patterns and indicators that suggest a piece of information may be false or misleading, but they are often susceptible to bias. By studying biases in machine learning models on fake news datasets, we can develop more effective tools for identifying fake news and take steps toward mitigating it, ultimately helping to protect the integrity of information and promote informed decision-making in society.
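The second perspective described above, masking location mentions before classification, can be sketched as a simple substitution step. The location list, mask token, and example tweet are assumptions for illustration; the thesis uses an NLP model rather than string matching.

```python
# Illustrative sketch of the masking experiment: replace location mentions in a
# tweet with a placeholder token before it reaches the classifier, so that the
# effect of location on the prediction can be measured. All names are made up.

def mask_locations(tweet: str, locations: list[str], token: str = "[LOC]") -> str:
    """Return the tweet with every known location mention replaced by `token`."""
    for loc in locations:
        tweet = tweet.replace(loc, token)
    return tweet

masked = mask_locations("Flooding reported in Chicago today", ["Chicago"])
print(masked)  # Flooding reported in [LOC] today
```

Comparing model performance on the masked versus unmasked text isolates how much the prediction depends on the location mention itself.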
- Title
- Effect of Pre-Processing Data on Fairness and Fairness Debugging using GOPHER
- Creator
- Sarkar, Mousam
- Date
- 2023
- Description
- At present, artificial intelligence contributes heavily to decision-making processes. Bias in machine learning models has existed throughout, and recent studies directly use eXplainable Artificial Intelligence (XAI) approaches to identify and study it. The problem of locating bias and then mitigating it has been addressed by Gopher [1], which generates interpretable top-k explanations for the unfairness of a model and identifies the subsets of training data that are the root cause of this unfair behavior. We utilize this system to study the effect of pre-processing on bias through provenance. We implement the concept of data lineage by tagging data points during and after the pre-processing stage. Our methodology and results provide a useful point of reference for studying the relationship between pre-processing data and the unfairness of a machine learning model.
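The lineage idea described above can be sketched as tagging each raw row with a stable id that survives pre-processing, so that unfair training subsets identified later can be traced back to the original records. The field names and the example transform are assumptions, not the actual Gopher implementation.

```python
# Sketch of data lineage via tagging: attach a provenance id before
# pre-processing and carry it through each transform. Names are hypothetical.

def tag_rows(rows):
    """Attach a provenance id to every raw row."""
    return [{"pid": i, "data": row} for i, row in enumerate(rows)]

def preprocess(tagged):
    """Example transform (feature scaling) that preserves the provenance tag."""
    return [{"pid": t["pid"], "data": [x / 10.0 for x in t["data"]]} for t in tagged]

processed = preprocess(tag_rows([[10.0, 20.0], [30.0, 40.0]]))
print([t["pid"] for t in processed])  # [0, 1]
```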
- Title
- PIMMINER: A HIGH-PERFORMANCE PIM ARCHITECTURE-AWARE GRAPH MINING FRAMEWORK
- Creator
- Su, Jiya
- Date
- 2022
- Description
- Graph mining applications, such as subgraph pattern matching and mining, are widely used in real-world domains such as bioinformatics, social network analysis, and computer vision. Such applications are considered a new class of data-intensive applications that generate massive irregular computation workloads and memory accesses, which degrade performance and scalability significantly. Leveraging emerging hardware, such as processing-in-memory (PIM) technology, could potentially accelerate such applications. In this paper, we propose PIMMiner, a high-performance PIM architecture-aware graph mining framework. We first identify that the current PIM architecture cannot be fully utilized by graph mining applications. Next, we propose a set of optimizations that enhance locality and internal bandwidth utilization, and reduce remote bank accesses and load imbalance, through cohesive algorithm and architecture co-design. We compare PIMMiner with several state-of-the-art graph mining frameworks and show that it outperforms all of them significantly.
- Title
- Enhancing Explanation Generation in the CaJaDE system using Interactive User Feedback
- Creator
- Lee, Juseung
- Date
- 2022
- Description
- In today's data-driven world, it is becoming increasingly difficult to interpret and understand query results after they have gone through several manipulation steps, especially on a large database. There is a need for automated techniques that explain query results in a meaningful way. A recent study, CaJaDE (Context-Aware Join-Augmented Deep Explanations), presents a novel approach to generating explanations of query results that include crucial contextual information. However, it becomes difficult to interpret explanations because the search space increases exponentially. In this thesis, we propose a new approach that introduces a user interaction model to enhance the generation of explanations in the CaJaDE system. We implemented a user interaction model that consists of three modules: User Selection, Recommendation Score, and User Rating. With these modules, our approach guides users while they explore relevant join graphs and involves them in the decision-making process during join graph generation. We demonstrate through performance experiments and a user study that our approach is an effective method for helping users understand explanations.
- Title
- TOPICDP – ENSURING DIFFERENTIAL PRIVACY FOR TOPIC MINING
- Creator
- Sharma, Jayashree
- Date
- 2021
- Description
- Topic mining enables applications to recognize patterns and draw insights from text data, which can be used for applications such as sentiment analysis and the building of recommender systems and classifiers. The text data can be a set of documents, emails, or product feedback and reviews. Each document is analyzed using probabilistic models and statistical analysis to discover patterns that reflect underlying topics. TopicDP is a differentially private topic mining technique that injects well-calibrated Gaussian noise into the matrix output of the topic mining model generated by the LDA algorithm. This method ensures differential privacy and good utility of the topic mining model. We derive smooth sensitivity for the Gaussian mechanism via sensitivity sampling, which addresses the major challenge of high sensitivity in topic mining for differential privacy. Furthermore, we theoretically prove the differential privacy guarantee and utility error bounds of TopicDP. Finally, we conduct extensive experiments on two real-world text datasets (Enron email and Amazon product reviews), and the experimental results demonstrate that TopicDP achieves better privacy-preserving performance for topic mining than other state-of-the-art differential privacy mechanisms.
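The core mechanism named above, adding calibrated Gaussian noise to a matrix output, can be sketched with the classic (epsilon, delta) calibration. The sensitivity and privacy parameters below are placeholders; TopicDP's contribution is deriving a smooth sensitivity via sampling, which is not shown here.

```python
# Hedged sketch of the Gaussian mechanism: perturb each entry of a topic-word
# matrix with N(0, sigma^2) noise, sigma calibrated by the standard bound
# sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon. Parameters are toy values.
import math
import random

def gaussian_mechanism(matrix, sensitivity, epsilon, delta):
    """Return a copy of `matrix` with per-entry Gaussian noise added."""
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return [[v + random.gauss(0.0, sigma) for v in row] for row in matrix]

random.seed(0)  # for reproducibility of the sketch
noisy = gaussian_mechanism([[0.2, 0.8], [0.5, 0.5]],
                           sensitivity=0.1, epsilon=1.0, delta=1e-5)
print(len(noisy), len(noisy[0]))  # 2 2
```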
- Title
- Evaluating Speech Separation Through Pre-Trained Deep Neural Network Models
- Creator
- Prabhakar, Deeksha
- Date
- 2023
- Description
- Speaker separation involves separating individual speakers from a mixture of voices or background noise, known as the "cocktail party problem": the ability to focus on a specific sound while filtering out other distractions. In this analysis, we propose obtaining features present in the original data and then evaluating the impact they have on a model's ability to separate mixed audio streams. The dataset is prepared so that these feature values can be used as predictor variables for various models, such as Logistic Regression, Decision Trees, SVM (both RBF and linear kernels), XGBoost, and AdaBoost, to identify the most influential features, that is, the features that lead to better separation. These results are then analyzed to determine the features that most affect the separation of audio streams. Initially, 400 audio streams are selected from the VoxCeleb dataset and combined to form 200 single utterances. After the mixes are obtained, the pre-trained SpeechBrain model sepformer-whamr is used. This model separates the audio mixes given as input and obtains two outputs that should be as close as possible to the originals. A feature list for the 400 chosen audios is obtained, and the effect of certain features on the model's capability to distinguish between multiple audio sources in a mixed recording is assessed. Two analysis techniques, permutation feature importance and SHAP values, are used to determine which features have more effect on separation. Our hypothesis is that the features contributing most to a good separation are invariant across datasets. To test this hypothesis, we obtain 1,000 audio streams from the Mozilla Common Voice dataset and apply the same experimental methodology described above. Our results demonstrate that the features we extract from the VoxCeleb dataset are indeed invariant and aid in separating the audio streams of the Mozilla Common Voice dataset.
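One of the two analysis techniques named in the abstract, permutation feature importance, works by shuffling a single feature column and measuring how much a scoring function degrades. A minimal sketch follows; the toy model, data, and scoring function are made up for illustration.

```python
# Minimal sketch of permutation feature importance: shuffle one feature column
# and report the resulting drop in score. A larger drop means the feature
# matters more. The dataset and score below are illustrative only.
import random

def permutation_importance(score_fn, X, col, seed=0):
    """Drop in score after shuffling column `col` of dataset X."""
    rng = random.Random(seed)
    shuffled = [row[:] for row in X]
    values = [row[col] for row in shuffled]
    rng.shuffle(values)
    for row, v in zip(shuffled, values):
        row[col] = v
    return score_fn(X) - score_fn(shuffled)

# Toy score: how well column 0 alone tracks a hidden target.
target = [1.0, 2.0, 3.0, 4.0]
score = lambda X: -sum((row[0] - t) ** 2 for row, t in zip(X, target))
X = [[1.0, 9.0], [2.0, 9.0], [3.0, 9.0], [4.0, 9.0]]
print(permutation_importance(score, X, col=0) >= 0.0)  # True
```

Averaging the drop over several shuffles (several seeds) gives a more stable estimate than a single permutation.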