<?xml version="1.0" encoding="UTF-8"?>
<mods
    xmlns="http://www.loc.gov/mods/v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="3.7"
    xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-7.xsd">
    <titleInfo>
        <title>Towards In-Network Semantic Analysis: A Case Study involving Spam Classification</title>
    </titleInfo>

    <name>
        <namePart>Gueyraud, Cyprien</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">Creator</roleTerm>
        </role>
        <description>Graduate student</description>
        <affiliation>cgueyraud@hawk.iit.edu</affiliation>
    </name>
    <name>
        <namePart>Sultana, Nik</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">Creator</roleTerm>
        </role>
    </name>

    <name type="corporate">
        <namePart>CS / Computer Science</namePart>
        <affiliation>Illinois Institute of Technology</affiliation>
        <role>
            <roleTerm type="text">Affiliated department</roleTerm>
        </role>
    </name>

    <subject>
        <topic>Computer Science</topic>
    </subject>

    <originInfo>
        <dateCreated encoding="w3cdtf" keyDate="yes">2023-03-06</dateCreated>

        <dateIssued encoding="w3cdtf" />
    </originInfo>

    <typeOfResource>Master's project</typeOfResource>

    <language>
        <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
    </language>

    <abstract>Analyzing free-form natural language expressions “in the network” (that is, on programmable switches and smart
        NICs) would enable packet-handling decisions that are based on the textual content of flows. This analysis would
        support richer, latency-critical data services that depend on language analysis—such as emergency response,
        misinformation classification, customer support, and query-answering applications. But packet forwarding and
        processing decisions usually rely on simple analyses based on table look-ups that are keyed on well-defined (and
        usually fixed-size) header fields. P4 is the state-of-the-art domain-specific language for programming network
        equipment, but, to the best of our knowledge, analyzing free-form text using P4 has not yet been investigated.
        Although there is an increasing variety of P4-programmable commodity network hardware available, using P4
        presents considerable technical challenges for text analysis since the language lacks loops and fractional
        datatypes. This paper presents the first Bayesian spam classifier written in P4 and evaluates it using a
        standard dataset. The paper contributes techniques for the tokenization, analysis, and classification of
        free-form text using P4, and investigates trade-offs between classification accuracy and resource usage. It
        shows how classification accuracy can be tuned between 69.1% and 90.4%, and how resource usage can be reduced to
        6% by trading off accuracy. It uses the spam-filtering use case to motivate the need for more research into
        in-network text analysis to enable future “semantic analysis” applications in programmable networks.
    </abstract>

    <accessCondition type="restrictionOnAccess">open access</accessCondition>

    <recordInfo>
        <languageOfCataloging>
            <languageTerm authority="iso639-2b" type="code">eng</languageTerm>
        </languageOfCataloging>
    </recordInfo>
    <identifier type="hdl">http://hdl.handle.net/10560/islandora:1012248</identifier>
</mods>
