Computer-mediated communication is becoming the most convenient and important way of sharing and exchanging information. The large volume and diversity of user generated content, as well as the pervasive user opinions on the web, make existing text processing methods inefficient and ineffective. Hence, there is a need for better ways of analyzing and utilizing user generated content. My thesis focuses on user generated data and is composed of two main parts: sentiment analysis and content analysis. For sentiment analysis, I present a case study in which I use machine learning techniques to analyze real-world survey responses. Supervised techniques are used to classify customers' loyalty based on their comments and to estimate a Net Promoter Score (NPS), a widely used indicator that summarizes survey results with a single estimator. I define three patterns to support the extraction of generalized sentiment-bearing expressions, and design a set of heuristic rules to detect both explicit and implicit negations. By altering existing dependencies with the detected negations and generalized sentiment-bearing expressions, I am able to construct more accurate sentiment features. The results demonstrate that generalized dependency-based features are more effective than standard features. For content analysis, the thesis addresses the problem of summarizing user generated content. I focus on two sub-problems: how to summarize the novel information in user generated content, and how to present the evolutionary theme threads of temporal text collections with summaries. I design a specific topic model for each of these two summarization tasks.
To discover similar and supplemental topics in user opinions with respect to the descriptive text provided by a publisher, I propose a semi-supervised generative model that casts the local publisher's descriptive fields as a prior of a resembling topic. The most representative sentences in user opinions are classified based on their sentiment and used to construct a summary of the comments. To track changes of topics in temporal text collections, I extend the probabilistic model to the sentence level and use named entities to make the extracted theme threads easier to understand. Experimental results demonstrate the effectiveness of the proposed models.
Ph.D. in Computer Science, December 2012
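The Net Promoter Score mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard NPS definition (on a 0-10 rating scale, promoters score 9-10, passives 7-8, detractors 0-6, and NPS is the percentage of promoters minus the percentage of detractors). The function names and the label strings are hypothetical, and the labels are assumed to come from some loyalty classifier; this is not the estimator used in the thesis.

```python
from collections import Counter


def label_from_rating(rating):
    """Map a 0-10 rating to a loyalty label (standard NPS buckets)."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    if rating >= 9:
        return "promoter"
    if rating >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(labels):
    """Return NPS in [-100, 100] from an iterable of loyalty labels.

    Labels are assumed to be "promoter", "passive", or "detractor",
    for example as predicted by a classifier over customer comments.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    # Percentage of promoters minus percentage of detractors.
    return 100.0 * (counts["promoter"] - counts["detractor"]) / total


if __name__ == "__main__":
    predicted = ["promoter", "promoter", "passive", "detractor"]
    print(net_promoter_score(predicted))
```

Passives count toward the total but not toward either side of the difference, so a classifier that confuses passives with promoters shifts the estimated score even when detractors are identified correctly.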