<?xml version='1.0' encoding='utf-8'?>
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.7" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-7.xsd">
   <name>
      <role>
         <roleTerm type="text" authority="marcrelator" authorityURI="http://id.loc.gov/vocabulary/relators" valueURI="http://id.loc.gov/vocabulary/relators/cre">creator</roleTerm>
      </role>
      <namePart>Cho, Jason Y.</namePart>
   </name>
   <titleInfo>
      <title>Independence and Graphical Models for Fitting Real Data</title>
   </titleInfo>
   <originInfo>
      <dateCreated keyDate="yes">2023</dateCreated>
   </originInfo>
   <note displayLabel="Degree Awarded">Spring 2023</note>
   <typeOfResource authority="aat" valueURI="http://vocab.getty.edu/page/aat/300028029">Thesis</typeOfResource>
   <name type="corporate">
      <affiliation>Illinois Institute of Technology</affiliation>
   </name>
   <name type="corporate">
      <namePart>MATH / Applied Mathematics</namePart>
   </name>
   <name authority="wikidata" authorityURI="https://www.wikidata.org" valueURI="https://www.wikidata.org/wiki/Q102111462">
      <role>
         <roleTerm type="text" authority="marcrelator" authorityURI="http://id.loc.gov/vocabulary/relators" valueURI="http://id.loc.gov/vocabulary/relators/ths">advisor</roleTerm>
      </role>
      <namePart>Kaul, Hemanshu</namePart>
   </name>
   <subject>
      <topic>Mathematics</topic>
   </subject>
   <subject>
      <topic>Fisher</topic>
   </subject>
   <subject>
      <topic>Graphical</topic>
   </subject>
   <subject>
      <topic>Hypothesis</topic>
   </subject>
   <subject>
      <topic>Likelihood</topic>
   </subject>
   <subject>
      <topic>Metropolis</topic>
   </subject>
   <subject>
      <topic>Statistical model</topic>
   </subject>
   <language>
      <languageTerm type="code" authority="rfc3066">en</languageTerm>
   </language>
   <abstract>Given a real-life dataset whose attributes take categorical values, with a corresponding r(1) × r(2) × … × r(m) contingency table with nonzero rows or nonzero columns, we test the goodness-of-fit of various independence models to the dataset using a variation of Metropolis-Hastings that uses Markov bases as a tool to obtain a Monte Carlo estimate of the p-value. This variation of Metropolis-Hastings can be found in Algorithm 3.1.1. Next we consider the problem: "Out of all possible undirected graphical models, each associated to some graph with m vertices, that we test to fit on our dataset, which one best fits the dataset?" Here, the m attributes are labeled as the vertices of the graph. Testing every model would require 2^(mC2) goodness-of-fit tests, since there are 2^(mC2) possible undirected graphs on m vertices. Instead, we consider a backward-selection algorithm based on the likelihood-ratio test. We start with the complete graph G = K(m) and call the corresponding undirected graphical model ℳ(G) the parent model. Then, for each edge e in E(G), we repeatedly apply the likelihood-ratio test to test the relative fit of the model ℳ(G-e), the child model, vs. ℳ(G), the parent model, where ℳ(G-e) ⊆ ℳ(G). More details on this iterative process can be found in Algorithm 4.1.3. For our dataset, we use the alcohol dataset found at https://www.kaggle.com/datasets/sooyoungher/smoking-drinking-dataset, where the four attributes we use are "Gender" (male, female), "Age", "Total cholesterol (mg/dL)", and "Drinks alcohol or not?". After testing the goodness-of-fit of three independence models corresponding to the independence statements "Gender vs Drink or not?", "Age vs Drink or not?", and "Total cholesterol vs Drink or not?", we found that the data came from a distribution from the two independence models corresponding to "Age vs Drink or not?" and "Total cholesterol vs Drink or not?". After applying the backward-selection likelihood-ratio method to the alcohol dataset, we found that the data came from a distribution from the undirected graphical model associated to the complete graph minus the edge {"Total cholesterol", "Drink or not?"}.</abstract>
   <physicalDescription>
      <digitalOrigin>born digital</digitalOrigin>
      <internetMediaType>application/pdf</internetMediaType>
   </physicalDescription>
   <accessCondition type="useAndReproduction" displayLabel="rightsstatements.org">In Copyright</accessCondition>
   <accessCondition type="useAndReproduction" displayLabel="rightsstatements.orgURI">http://rightsstatements.org/page/InC/1.0/</accessCondition>
   <accessCondition type="restrictionOnAccess">Restricted Access</accessCondition>
   <identifier type="hdl">http://hdl.handle.net/10560/islandora:1025126</identifier>
</mods>