Search results

Title: Machine Learning at the Bureau of Labor Statistics
Creator: Ellis, Robert, Kannan, Vinesh
Date: 2019-11-21
Description: Vinesh Kannan (CS '19) shares his experiences working as a data science fellow at the Bureau of Labor Statistics (BLS). Vinesh worked on the team that produces occupation and wage data used by policymakers, hiring staff, job seekers, and researchers across the country. He helped improve machine learning systems at the BLS by automatically identifying problematic training data and classifying rare jobs. Vinesh offers advice for students who may be interested in applying for the 2020 Civic Digital Fellowship, a program that recruits university students at all levels to spend a summer working on civic technology projects with various federal agencies.
Sponsorship: College of Science, Department of Computer Science, Department of Applied Mathematics, Machine Learning at IIT

Title: Semantics and further Use-Cases and Evaluation of the C-Saw language
Creator: Zhu, Henry, Zhao, Junyong, Sultana, Nik
Date: 2023-03-09
Description: This report provides supplementary technical details to the conference paper that introduced C-Saw, a language for expressing software architecture patterns. It also provides additional examples of using C-Saw and supplementary evaluation details, and it defines the formal semantics of the language.

Title: Continuous Generalization of 2’s Complement Arithmetic
Creator: Patel, Shivam
Date: 2022-11-26

Title: Towards In-Network Semantic Analysis: A Case Study involving Spam Classification
Creator: Gueyraud, Cyprien, Sultana, Nik
Date: 2023-03-06
Description: Analyzing free-form natural language expressions “in the network” (that is, on programmable switches and smart NICs) would enable packet-handling decisions that are based on the textual content of flows. This analysis would support richer, latency-critical data services that depend on language analysis, such as emergency response, misinformation classification, customer support, and query-answering applications. But packet forwarding and processing decisions usually rely on simple analyses based on table look-ups that are keyed on well-defined (and usually fixed-size) header fields. P4 is the state-of-the-art domain-specific language for programming network equipment, but, to the best of our knowledge, analyzing free-form text using P4 has not yet been investigated. Although an increasing variety of P4-programmable commodity network hardware is available, using P4 presents considerable technical challenges for text analysis, since the language lacks loops and fractional datatypes. This paper presents the first Bayesian spam classifier written in P4 and evaluates it using a standard dataset. The paper contributes techniques for the tokenization, analysis, and classification of free-form text using P4, and investigates trade-offs between classification accuracy and resource usage. It shows how classification accuracy can be tuned between 69.1% and 90.4%, and how resource usage can be reduced to 6% by trading off accuracy. It uses the spam-filtering use case to motivate the need for more research into in-network text analysis to enable future “semantic analysis” applications in programmable networks.
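The abstract above names two constraints that shape any P4 text classifier: no loops and no fractional datatypes. The Python sketch below illustrates one common way a naive Bayes classifier can be made to fit such constraints; it is an illustration under stated assumptions, not the paper's design. It assumes per-token log-likelihood ratios are computed offline and stored as scaled integers (fixed point) in a lookup table, so classification reduces to table look-ups, integer additions, and a comparison over a fixed token budget. The names `SCALE`, `MAX_TOKENS`, `train_table`, and `classify`, the whitespace tokenizer, and the Laplace smoothing are all hypothetical choices made for this example.

```python
import math
from collections import Counter

SCALE = 1000      # hypothetical fixed-point scale: 1.0 in log space -> 1000
MAX_TOKENS = 8    # hypothetical fixed token budget (stands in for an unrolled pipeline)


def train_table(spam_docs, ham_docs, alpha=1.0):
    """Offline (control-plane) step, assumed for illustration: compute per-token
    log P(w|spam) - log P(w|ham) with Laplace smoothing, rounded to integers."""
    spam_counts = Counter(w for d in spam_docs for w in d.split())
    ham_counts = Counter(w for d in ham_docs for w in d.split())
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + alpha * len(vocab)
    ham_total = sum(ham_counts.values()) + alpha * len(vocab)
    table = {}
    for w in vocab:
        llr = (math.log((spam_counts[w] + alpha) / spam_total)
               - math.log((ham_counts[w] + alpha) / ham_total))
        table[w] = round(llr * SCALE)  # integer entry, no fractions at match time
    # Class prior as an integer log-ratio, also precomputed offline.
    prior = round((math.log(len(spam_docs)) - math.log(len(ham_docs))) * SCALE)
    return table, prior


def classify(text, table, prior):
    """Data-plane-style step: only table look-ups and integer additions over
    at most MAX_TOKENS tokens; tokens missing from the table contribute 0."""
    tokens = text.split()[:MAX_TOKENS]
    score = prior
    for i in range(MAX_TOKENS):  # fixed trip count, mirroring an unrolled pipeline
        if i < len(tokens):
            score += table.get(tokens[i], 0)
    return "spam" if score > 0 else "ham"


if __name__ == "__main__":
    spam = ["win cash now", "free cash prize", "win a free prize"]
    ham = ["meeting at noon", "lunch at noon tomorrow", "project meeting notes"]
    table, prior = train_table(spam, ham)
    print(classify("free cash", table, prior))
    print(classify("meeting at noon", table, prior))
```

Because the decision depends only on the sign of an integer sum, the rounding introduced by `SCALE` is one plausible knob for trading accuracy against table size, in the spirit of the accuracy/resource trade-off the abstract describes; how the paper itself tunes that trade-off is not stated here.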

repository.iit
