Traditional compute-centric scientific discovery has led to a growing gap between computation power and storage capabilities. In the data explosion era, where data analysis is essential for scientific discovery, slow storage systems have led to the research conundrum known as the I/O bottleneck. Scientists have proposed several optimizations to address the I/O bottleneck. However, selecting and applying the appropriate optimization is a complex task, often left to the users. Additionally, the explosion of data has led to the proliferation of applications as well as storage technologies. This has created a complex matching problem for users between diverse application requirements and heterogeneous storage resources. We need to move towards a Self-Programmable storage system that can automatically understand the I/O requirements of applications, transparently leverage the heterogeneity of storage, and reconfigure itself dynamically by utilizing application and storage information. In this work, we present the Jal System for building Self-Programmable storage. The Jal System consists of three layers: the application layer, the transfer layer, and the storage layer. The application layer automatically extracts I/O requirements from applications using a source-code-based profiler. The storage layer defines a data abstraction, using a shared log store, to efficiently unify heterogeneous storage resources under a single platform. Finally, the transfer layer defines data management algorithms that consider multi-application and multi-storage information to optimize data operations. Additionally, we illustrate the benefits of utilizing the technologies within the Jal System on modern scientific AI applications.
Our evaluations have demonstrated that each technology within the Jal System can accelerate I/O for modern scientific workflows. We have implemented software, tools, and system libraries for modern HPC systems. In the future, we envision building a fully integrated system that efficiently utilizes all the Jal System technologies. Additionally, we plan to extend the strategies and techniques in the Jal System to other scientific domains such as AI and IoT.