Software

Our group develops, collaborates on, and releases open research software in atomistic simulation, machine-learned interatomic potentials, lifelong learning for large language models, and molecular AI. All original software contributions are made publicly available through our GitHub repositories.

GitHub: github.com/pythonpanda2

1. Long-Range Physics and Attention Models

Reciprocal-Space Attention (RSA)

RSA is a next-generation architecture that captures long-range interactions directly in reciprocal (Fourier) space using Fourier positional encoding and kernelized attention. It provides linear-scaling, charge-agnostic modeling of electrostatics and dispersion within machine-learned interatomic potentials. The framework can be combined with any short-range MLIP backbone in an end-to-end differentiable fashion, giving seamless access to long-range physics and enabling modeling at density-functional-theory accuracy at scale.

Repository:  reciprocal_space_attention
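
As a rough sketch of the idea (not the repository's implementation), the example below combines Fourier features of atomic positions with a linear, kernelized attention update, so no N × N attention matrix is ever formed; the layer sizes, wavevectors, and variable names are all hypothetical.

```python
# Hypothetical sketch: linear-scaling attention over Fourier features of
# atomic positions. Shapes and names are illustrative only.
import torch

def fourier_features(pos, k_vectors):
    """Encode positions (N, 3) with wavevectors (K, 3) -> (N, 2K) features
    [cos(k.r), sin(k.r)], a stand-in for a reciprocal-space encoding."""
    phase = pos @ k_vectors.T                      # (N, K)
    return torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)

def linear_attention(q, k, v):
    """Kernelized attention: the softmax is replaced by a positive feature
    map, so the update costs O(N) in the number of atoms."""
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1
    kv = k.T @ v                                   # (d_k, d_v)
    z = q @ k.sum(dim=0, keepdim=True).T           # (N, 1) normalizer
    return (q @ kv) / (z + 1e-9)

# Toy usage: 100 atoms, 32 wavevectors, 16-dim short-range node features.
pos = torch.rand(100, 3) * 10.0
k_vectors = torch.randn(32, 3)
feats = torch.randn(100, 16)                       # from a short-range MLIP backbone

to_q, to_k, to_v = torch.nn.Linear(64, 32), torch.nn.Linear(64, 32), torch.nn.Linear(16, 16)
phi = fourier_features(pos, k_vectors)             # (100, 64)
long_range = linear_attention(to_q(phi), to_k(phi), to_v(feats))   # (100, 16) update
```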

2. Active Learning for Atomistic Simulation and MLIPs

Active-learning workflow for Gaussian Approximation Potentials (GAP)

An automated active-learning workflow for training Gaussian Approximation Potentials (GAP) for atomistic simulations. The workflow closes the loop between sampling, uncertainty estimation, and model refinement.

Repository:  active-learning-md
Tutorial / hands-on materials:  psik-workshop-AL-GAP
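
The loop being automated looks roughly like the sketch below, with a toy scikit-learn Gaussian process standing in for a GAP model and a cheap analytic function standing in for DFT labeling; the real workflow drives the actual fitting and sampling codes.

```python
# Schematic active-learning loop: sample -> estimate uncertainty -> label -> refit.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def reference_energy(x):                            # stand-in for an expensive DFT label
    return np.sin(3 * x).ravel()

rng = np.random.default_rng(0)
pool = rng.uniform(0, 3, size=(200, 1))             # candidate configurations
train_x = pool[:5].copy()                           # small initial training set
train_y = reference_energy(train_x)

for _ in range(5):                                  # active-learning rounds
    gp = GaussianProcessRegressor().fit(train_x, train_y)
    mean, std = gp.predict(pool, return_std=True)   # per-candidate uncertainty
    pick = np.argsort(std)[-10:]                    # most uncertain candidates
    train_x = np.vstack([train_x, pool[pick]])      # "run DFT" on them and refit
    train_y = np.concatenate([train_y, reference_energy(pool[pick])])
    pool = np.delete(pool, pick, axis=0)
```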

GPU-accelerated active learning with Gaussian Processes / Deep Kernel Learning

A scalable active-learning framework implemented in PyTorch/GPyTorch. It uses Gaussian process regression and deep kernel learning on GPUs to accelerate query selection and model improvement.

Repository:  ECG_ActiveLearning
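
A minimal deep kernel learning setup in GPyTorch of the kind this framework builds on might look like the following; the network sizes, training loop, and variance-based acquisition step are illustrative, not the repository's code.

```python
# Deep kernel learning GP: a small neural feature extractor feeds an RBF kernel,
# trained by maximizing the exact marginal log-likelihood (on GPU if available).
import torch
import gpytorch

class DKLRegression(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.extractor = torch.nn.Sequential(       # learned feature map g(x)
            torch.nn.Linear(train_x.shape[-1], 32), torch.nn.ReLU(),
            torch.nn.Linear(32, 2),
        )
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.extractor(x)                        # deep kernel: k(g(x), g(x'))
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z))

device = "cuda" if torch.cuda.is_available() else "cpu"
train_x = torch.rand(100, 8, device=device)          # placeholder descriptors
train_y = torch.sin(train_x.sum(-1))                 # placeholder targets

likelihood = gpytorch.likelihoods.GaussianLikelihood().to(device)
model = DKLRegression(train_x, train_y, likelihood).to(device)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train(); likelihood.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward(); optimizer.step()

# Acquisition: query the pool points with the largest predictive variance.
model.eval(); likelihood.eval()
pool = torch.rand(500, 8, device=device)
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    variance = likelihood(model(pool)).variance
query_idx = variance.topk(10).indices                # next candidates to label
```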

AL4GAP: Ensemble active learning at scale

AL4GAP is an automated workflow for fitting machine-learned interatomic potentials (MLIPs) across combinatorial chemical spaces. It uses Cray SmartSim for ensemble-style active learning on leadership-class high-performance computing systems.

Repository:  AL4GAP_JCP
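
Setting the SmartSim orchestration aside, the committee-style selection underlying this kind of ensemble active learning can be sketched as below; the bootstrap GP members stand in for the workflow's parallel models, and all data are placeholders.

```python
# Query-by-committee selection: refit each member on a bootstrap of the data,
# then label the candidates on which the committee disagrees most.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
train_x = rng.uniform(0, 1, (40, 3))                # e.g., compositions / configurations
train_y = train_x.sum(axis=1) + 0.05 * rng.normal(size=40)
pool = rng.uniform(0, 1, (300, 3))

committee = []
for _ in range(8):                                  # bootstrap ensemble members
    idx = rng.integers(0, len(train_x), len(train_x))
    committee.append(GaussianProcessRegressor().fit(train_x[idx], train_y[idx]))

preds = np.stack([m.predict(pool) for m in committee])   # (members, pool)
disagreement = preds.std(axis=0)                          # committee spread per candidate
to_label = np.argsort(disagreement)[-20:]                 # send these for reference calculations
```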

3. Lifelong / Continual Learning for Large Language Models

Catastrophic forgetting mitigation in chemistry-oriented LLMs

This project studies catastrophic forgetting in a Mistral-7B model fine-tuned on sequential chemistry tasks (e.g., reaction yield prediction) and develops continual / lifelong learning strategies that preserve prior knowledge while learning from new chemistry data.

JAX / Equinox implementation:  CL_MISTRAL7B_REACT
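
As one common illustration of forgetting mitigation (not necessarily the strategy used in this repository), the JAX sketch below adds an elastic weight consolidation (EWC) penalty that anchors parameters important for an earlier task while training on a new one; the model and data are toy placeholders.

```python
# EWC in JAX: new-task loss plus a quadratic penalty weighted by a diagonal
# Fisher estimate from the old task. Illustrative only.
import jax
import jax.numpy as jnp

def task_loss(params, x, y):
    pred = x @ params["w"] + params["b"]             # toy linear "model"
    return jnp.mean((pred - y) ** 2)

def fisher_diagonal(params, x, y):
    """Diagonal Fisher estimate from squared gradients on the old task."""
    grads = jax.grad(task_loss)(params, x, y)
    return jax.tree_util.tree_map(lambda g: g ** 2, grads)

def ewc_loss(params, old_params, fisher, x, y, lam=10.0):
    """New-task loss plus a penalty anchoring weights important for the old task."""
    weighted = jax.tree_util.tree_map(lambda f, p, p0: f * (p - p0) ** 2,
                                      fisher, params, old_params)
    penalty = jax.tree_util.tree_reduce(lambda acc, leaf: acc + leaf.sum(), weighted, 0.0)
    return task_loss(params, x, y) + 0.5 * lam * penalty

# Toy usage on two sequential "tasks".
key1, key2 = jax.random.split(jax.random.PRNGKey(0))
x1, y1 = jax.random.normal(key1, (64, 4)), jnp.ones(64)       # old task
x2, y2 = jax.random.normal(key2, (64, 4)), -jnp.ones(64)      # new task
params = {"w": jnp.zeros(4), "b": jnp.zeros(())}

fisher = fisher_diagonal(params, x1, y1)             # computed after training on task 1
old_params = params
grads = jax.grad(ewc_loss)(params, old_params, fisher, x2, y2)
params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)  # one SGD step
```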

4. Molecular Machine Learning

MOLAN: Machine Learning Workflow for Molecular Analysis

MOLAN provides a suite of tools for unsupervised learning, supervised learning, and inverse-design modeling of molecular melting points.

Repository:  molan
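
A supervised-learning step of the kind MOLAN covers can be sketched as below, using RDKit descriptors and a scikit-learn regressor; the molecules and melting-point labels are placeholders, not MOLAN data.

```python
# Descriptor-based melting-point regression sketch (illustrative only).
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor

def featurize(smiles):
    """A few simple RDKit descriptors as the molecular representation."""
    mol = Chem.MolFromSmiles(smiles)
    return [Descriptors.MolWt(mol), Descriptors.MolLogP(mol),
            Descriptors.TPSA(mol), Descriptors.NumRotatableBonds(mol)]

smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCCCC"]
melting_points_K = np.array([159.0, 278.7, 289.8, 192.0, 178.0])   # placeholder labels

X = np.array([featurize(s) for s in smiles])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, melting_points_K)
print(model.predict([featurize("CCCO")]))            # prediction for a new molecule
```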

AI4PFAS: Deep learning for PFAS toxicity

AI4PFAS provides a suite of tools for molecular machine learning and deep learning to assess toxicity in PFAS-class compounds.

Repository:  AI4PFAS
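
The basic ingredient of such models can be sketched as a fingerprint-based classifier, as below; the SMILES strings, toxicity labels, and model choice are illustrative and do not reflect the AI4PFAS datasets or architectures.

```python
# Morgan-fingerprint toxicity classifier sketch (illustrative only).
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.linear_model import LogisticRegression

def morgan_bits(smiles, n_bits=1024):
    """Binary Morgan fingerprint as a simple molecular representation."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(list(fp), dtype=np.int8)

# Placeholder PFAS-like SMILES with made-up binary toxicity labels.
smiles = ["C(F)(F)(F)C(F)(F)C(=O)O", "C(F)(F)(F)C(F)(F)S(=O)(=O)O", "CCO", "CCCC"]
labels = np.array([1, 1, 0, 0])

X = np.stack([morgan_bits(s) for s in smiles])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict_proba([morgan_bits("C(F)(F)(F)C(=O)O")])[0, 1])   # predicted toxicity probability
```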