Developing an Intelligent Wet-Lab Protocol Planner and Validator
This guide walks you through building a Wet-Lab Protocol Planner & Validator that helps streamline experimental design and execution. Written in Python, the system combines rule-based text parsing with a small, locally hosted language model to interpret and optimize laboratory protocols. It is organized into four modules: ProtocolParser extracts structured information such as procedural steps, durations, and temperature conditions from protocol text; InventoryManager checks reagent availability and expiry dates; SchedulePlanner builds a timeline and flags opportunities to run steps in parallel; and SafetyValidator flags potential biosafety and chemical hazards. A language model then suggests ways to make the protocol more efficient, closing the loop between analysis, planning, validation, and refinement.
Setting Up the Environment and Loading the Model
We start by importing the necessary Python libraries and loading the Salesforce CodeGen-350M-mono model locally, which keeps inference lightweight and avoids any dependence on external APIs. On a GPU runtime such as Google Colab, the model is loaded in 16-bit floating point with automatic device mapping to reduce memory use and speed up inference.
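```python
import re
import json
import pandas as pd
from datetime import datetime, timedelta
from collections import defaultdict
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODELNAME = "Salesforce/codegen-350M-mono"

print("Loading CodeGen model (approx. 30 seconds)...")
tokenizer = AutoTokenizer.from_pretrained(MODELNAME)
# CodeGen defines no padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

# float16 halves memory on a GPU; many CPU kernels lack half-precision
# support, so fall back to float32 when no GPU is available
dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(
    MODELNAME,
    torch_dtype=dtype,
    device_map="auto",  # requires the `accelerate` package
)
print("✔ Model successfully loaded!")
```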
Parsing Protocols: Extracting Key Experimental Details
The ProtocolParser class turns raw protocol text into a list of steps with their durations, temperature requirements, and safety considerations. A regular expression detects numbered step headings (for example, "1. Coating"), and the next few lines under each heading are scanned for timing and temperature information. The same lines are checked for safety keywords covering biosafety levels, chemical hazards, sharps, light sensitivity, and flammability, so that these can be surfaced later by the validator.
```python
class ProtocolParser:
    """Extracts numbered steps and their timing, temperature and safety cues."""

    def readprotocol(self, text):
        steps = []
        lines = text.split('\n')
        for i, line in enumerate(lines, 1):
            # Step headings look like "1. Coating (Day 1, 4°C overnight)"
            stepmatch = re.search(r'^(\d+)\.\s+(.+)', line.strip())
            if stepmatch:
                num, name = stepmatch.groups()
                # `i` is 1-based, so lines[i:i+4] are the four lines below the heading
                context = '\n'.join(lines[i:min(i + 4, len(lines))])
                duration = self.extractduration(context)
                temp = self.extracttemp(context)
                safety = self.checksafety(context)
                steps.append({
                    'step': int(num),
                    'name': name,
                    'durationmin': duration,
                    'temp': temp,
                    'safety': safety,
                    'line': i,
                    'details': context[:200],
                })
        return steps

    def extractduration(self, text):
        """Estimate a step's duration in minutes (30 if none is stated)."""
        text = text.lower()
        if 'overnight' in text:
            return 720  # treat overnight as 12 hours
        match = re.search(r'(\d+)\s*(?:hour|hr|h)s?(?!\w)', text)
        if match:
            return int(match.group(1)) * 60
        # Try ranges such as "10-15 minutes" before single values; otherwise
        # the single-value pattern would stop at "15 minutes"
        match = re.search(r'(\d+)\s*-\s*(\d+)\s*(?:min|minute)', text)
        if match:
            return (int(match.group(1)) + int(match.group(2))) // 2
        match = re.search(r'(\d+)\s*(?:min|minute)s?', text)
        if match:
            return int(match.group(1))
        return 30  # default duration

    def extracttemp(self, text):
        """Map temperature cues to a label; anything unrecognised counts as RT."""
        text = text.lower()
        # The lookbehinds stop "14°" or "-4°" from being read as 4 °C
        if re.search(r'(?<![\d-])4\s*°', text):
            return '4°C'
        if re.search(r'(?<!\d)37\s*°', text):
            return '37°C'
        if re.search(r'-(?:20|80)\s*°', text):
            return 'FREEZER'
        # "room temp", "RT", "ambient" and no cue at all are treated as RT
        return 'RT'

    def checksafety(self, text):
        """Return keyword-based safety flags for a block of text."""
        flags = []
        textlower = text.lower()
        if re.search(r'bsl-[23]|biosafety', textlower):
            flags.append('BSL-2/3')
        if re.search(r'caution|corrosive|hazard|toxic', textlower):
            flags.append('HAZARD')
        if 'sharp' in textlower or 'needle' in textlower:
            flags.append('SHARPS')
        if 'dark' in textlower or 'light-sensitive' in textlower:
            flags.append('LIGHT-SENSITIVE')
        if 'flammable' in textlower:
            flags.append('FLAMMABLE')
        return flags
```
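The step-heading and duration rules can be tried in isolation. The sketch below is a simplified, standalone re-implementation; the function names `parse_step_headings` and `minutes_from_text` are hypothetical and not part of the planner. It shows how a numbered heading splits into a number and a name, and it tries the range pattern before the single-value pattern, since otherwise "10-15 minutes" would be read as 15.

```python
import re

# Hypothetical standalone helpers that mirror the ProtocolParser rules.
STEP_RE = re.compile(r'^(\d+)\.\s+(.+)')


def parse_step_headings(text):
    """Return (step_number, name) pairs for lines like '2. Blocking (Day 2)'."""
    steps = []
    for line in text.split('\n'):
        m = STEP_RE.match(line.strip())
        if m:
            steps.append((int(m.group(1)), m.group(2)))
    return steps


def minutes_from_text(text):
    """Estimate a duration in minutes, falling back to 30 when nothing is stated."""
    text = text.lower()
    if 'overnight' in text:
        return 720
    m = re.search(r'(\d+)\s*(?:hour|hr|h)s?(?!\w)', text)
    if m:
        return int(m.group(1)) * 60
    # Ranges first: a single-value search would stop at "15 minutes"
    m = re.search(r'(\d+)\s*-\s*(\d+)\s*(?:min|minute)', text)
    if m:
        return (int(m.group(1)) + int(m.group(2))) // 2
    m = re.search(r'(\d+)\s*(?:min|minute)s?', text)
    if m:
        return int(m.group(1))
    return 30


if __name__ == "__main__":
    demo = "1. Coating\n   - Incubate overnight\n2. Blocking\n   - Incubate 1 hour"
    print(parse_step_headings(demo))                    # [(1, 'Coating'), (2, 'Blocking')]
    print(minutes_from_text("Incubate 10-15 minutes"))  # 12
```

Averaging the two ends of a range is one choice among several; taking the upper bound would give a more conservative schedule.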
Managing Reagent Inventory with Fuzzy Matching and Expiry Checks
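The InventoryManager class loads reagent stock data from CSV text and checks the availability and expiry status of the reagents a protocol needs. It uses a lightweight form of fuzzy matching, accepting any inventory entry that contains one of the first two words of a reagent name, so that small naming variations still resolve. It flags low stock (fewer than 10 units) as well as reagents that have expired or will expire within 30 days, helping to prevent experimental delays.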
```python
class InventoryManager:
    """Checks reagents against an inventory CSV (reagent, quantity, unit, expiry, lot)."""

    def __init__(self, csvtext):
        from io import StringIO
        self.df = pd.read_csv(StringIO(csvtext))
        self.df['expiry'] = pd.to_datetime(self.df['expiry'])

    def checkavailability(self, reagentlist):
        issues = []
        now = datetime.now()
        for reagent in reagentlist:
            # Normalise separators, then accept any inventory row containing one of
            # the first two words (escaped so names are matched literally)
            reagentclean = reagent.lower().replace('_', ' ').replace('-', ' ')
            keywords = [re.escape(w) for w in reagentclean.split()[:2]]
            if not keywords:
                continue
            matches = self.df[self.df['reagent'].str.lower().str.contains(
                '|'.join(keywords), na=False, regex=True
            )]
            if matches.empty:
                issues.append(f"❌ {reagent}: NOT FOUND IN INVENTORY")
            else:
                row = matches.iloc[0]
                if row['expiry'] < now:
                    issues.append(f"⚠️ {reagent}: EXPIRED on {row['expiry'].date()} (lot {row['lot']})")
                elif (row['expiry'] - now).days < 30:
                    issues.append(f"⚠️ {reagent}: EXPIRING SOON ({row['expiry'].date()}, lot {row['lot']})")
                if row['quantity'] < 10:
                    issues.append(f"⚠️ {reagent}: LOW STOCK ({row['quantity']} {row['unit']} left)")
        return issues

    def extractreagents(self, protocoltext):
        reagents = set()
        patterns = [
            # Capitalised name followed by a reagent noun, e.g. "Blocking buffer"
            (r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+(?:antibody|buffer|solution)', 0),
            # Acronyms such as PBS-T, BSA or TMB; must stay case-sensitive, or
            # every ordinary word would match
            (r'\b([A-Z]{2,}(?:-[A-Z0-9]+)?)\b', 0),
            # Verb followed by a reagent phrase, e.g. "dilute capture antibody"
            (r'(?:add|use|prepare|dilute)\s+([a-z\-]+\s*(?:antibody|buffer|substrate|solution))',
             re.IGNORECASE),
        ]
        for pattern, flags in patterns:
            matches = re.findall(pattern, protocoltext, flags)
            reagents.update(m.strip() for m in matches if len(m) > 2)
        # Sort so the 15-item cap is deterministic across runs
        return sorted(reagents)[:15]
```
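The keyword match can pick the wrong row when two inventory entries share a word: "detection antibody" contains "antibody", so it also matches the "capture antibody" row that comes first in the CSV. One alternative, swapped in here rather than taken from the original, is similarity scoring with Python's standard `difflib`. The sketch also takes "today" as a parameter so expiry checks are reproducible. The helper names `best_inventory_match` and `expiry_status`, and the 0.6 similarity cutoff, are assumptions for illustration.

```python
import difflib
from datetime import date


def best_inventory_match(name, inventory_names, cutoff=0.6):
    """Return the inventory name most similar to `name`, or None below `cutoff`."""
    # Compare case-insensitively but return the name as written in the inventory
    lowered = {n.lower(): n for n in inventory_names}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


def expiry_status(expiry, today, warn_days=30):
    """Classify a lot as 'expired', 'expiring soon' or 'ok' relative to `today`."""
    if expiry < today:
        return "expired"
    if (expiry - today).days < warn_days:
        return "expiring soon"
    return "ok"


if __name__ == "__main__":
    stock = ["capture antibody", "detection antibody", "streptavidin HRP", "BSA"]
    print(best_inventory_match("Detection Antibody", stock))
    print(best_inventory_match("Streptavidin-HRP", stock))
    print(expiry_status(date(2025, 10, 15), today=date(2025, 10, 1)))
```

Similarity scoring still needs a sensible cutoff: set it too low and unrelated reagents match, too high and harmless spelling differences are reported as missing.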
Designing Efficient Experiment Schedules and Parallelization
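The SchedulePlanner module builds a timeline for the protocol, starting from a given time of day. When a step runs longer than eight hours, such as an overnight incubation, the remaining steps are moved to 09:00 on the next day. Steps longer than 60 minutes are marked as candidates for parallel work, and whenever such a step is followed by one at the same temperature, the pair is reported as overlappable, with the shorter of the two durations counted as time saved. Overlapping steps in this way reduces idle time and raises lab throughput.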
```python
class SchedulePlanner:
    def makeschedule(self, steps, starttime="09:00"):
        """Lay steps back to back; a step over 8 h pushes the rest to the next morning."""
        schedule = []
        dayone = datetime.strptime(f"2025-01-01 {starttime}", "%Y-%m-%d %H:%M")
        current = dayone
        for step in steps:
            end = current + timedelta(minutes=step['durationmin'])
            schedule.append({
                'step': step['step'],
                'name': step['name'][:40],
                'start': current.strftime("%H:%M"),
                'end': end.strftime("%H:%M"),
                'duration': step['durationmin'],
                'temp': step['temp'],
                # Derived from the calendar date, so it stays correct past day 9
                'day': (current.date() - dayone.date()).days + 1,
                'canparallelize': step['durationmin'] > 60,
                'safety': ', '.join(step['safety']) if step['safety'] else 'None',
            })
            if step['durationmin'] > 480:  # longer than 8 hours, e.g. overnight
                # Resume at the first 09:00 at or after the end of the long step
                resume = end.replace(hour=9, minute=0, second=0, microsecond=0)
                current = resume if resume >= end else resume + timedelta(days=1)
            else:
                current = end
        return schedule

    def optimizeparallelization(self, schedule):
        """Report adjacent same-temperature steps that could overlap."""
        parallelgroups = []
        idletimesaved = 0
        for i, step in enumerate(schedule):
            if step['canparallelize'] and i + 1 < len(schedule):
                nextstep = schedule[i + 1]
                if step['temp'] == nextstep['temp']:
                    # Overlapping two steps saves the length of the shorter one
                    saved = min(step['duration'], nextstep['duration'])
                    parallelgroups.append(
                        f"✨ Steps {step['step']} & {nextstep['step']} can overlap → Save {saved} minutes"
                    )
                    idletimesaved += saved
        return parallelgroups, idletimesaved
```
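`optimizeparallelization` only compares adjacent pairs of steps. To see longer same-temperature runs at a glance, which is what "batching steps with similar temperatures" amounts to, the schedule can be grouped into consecutive runs. The sketch below uses `itertools.groupby`; the helper `group_by_temperature` and its output keys are hypothetical, and it assumes the `step`, `temp` and `duration` keys that `makeschedule` produces.

```python
from itertools import groupby


def group_by_temperature(schedule):
    """Group *consecutive* schedule entries that share a temperature into batches."""
    batches = []
    # groupby only merges neighbours, so the protocol's step order is preserved
    for temp, items in groupby(schedule, key=lambda s: s['temp']):
        items = list(items)
        batches.append({
            'temp': temp,
            'steps': [s['step'] for s in items],
            'total_minutes': sum(s['duration'] for s in items),
        })
    return batches


if __name__ == "__main__":
    demo = [
        {'step': 1, 'temp': '4°C', 'duration': 720},
        {'step': 2, 'temp': 'RT', 'duration': 60},
        {'step': 3, 'temp': 'RT', 'duration': 120},
        {'step': 4, 'temp': '37°C', 'duration': 30},
    ]
    for batch in group_by_temperature(demo):
        print(batch)
```

Grouping only neighbouring steps is deliberate: reordering wet-lab steps to build larger batches would usually change the experiment itself.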
Ensuring Laboratory Safety Through Automated Validation
The SafetyValidator class applies a set of simple safety rules: it checks each step for pH values outside a safe range (5.0 to 11.0), biosafety-level requirements, and chemical hazards. It also issues warnings for steps involving sharps, light-sensitive reagents, or flammable substances, helping to maintain a safe working environment.
```python
class SafetyValidator:
    RULES = {
        'phrange': (5.0, 11.0),
        # The two entries below are defined but not yet consulted by validate()
        'templimits': {'4°C': (2, 8), '37°C': (35, 39), 'RT': (20, 25)},
        'maxconcurrentinstruments': 3,
    }

    def validate(self, steps):
        risks = []
        for step in steps:
            # Matches e.g. "pH 9.6" or "ph7" in the step details
            phmatch = re.search(r'ph\s*(\d+\.?\d*)', step['details'].lower())
            if phmatch:
                ph = float(phmatch.group(1))
                if not (self.RULES['phrange'][0] <= ph <= self.RULES['phrange'][1]):
                    risks.append(f"⚠️ Step {step['step']}: pH {ph} OUTSIDE SAFE RANGE")
            if 'BSL-2/3' in step['safety']:
                risks.append(f"🛡️ Step {step['step']}: BSL-2 cabinet REQUIRED")
            if 'HAZARD' in step['safety']:
                risks.append(f"🧪 Step {step['step']}: Full PPE and chemical hood REQUIRED")
            if 'SHARPS' in step['safety']:
                risks.append(f"💉 Step {step['step']}: Use sharps container and needle safety protocols")
            if 'LIGHT-SENSITIVE' in step['safety']:
                risks.append(f"🌙 Step {step['step']}: Perform in dark or amber tubes")
            if 'FLAMMABLE' in step['safety']:
                risks.append(f"🔥 Step {step['step']}: Keep away from open flames and ignition sources")
        return risks
```
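The `RULES` dictionary also defines `templimits`, which `validate` never consults. As a sketch of how those limits might be used, the hypothetical function `check_temperature_log` below compares logged readings against the allowed range for each step's temperature label. The `(step, label, measured_celsius)` log format is an assumption; the planner itself records no measured temperatures.

```python
# Same ranges as SafetyValidator.RULES['templimits'], in °C
TEMP_LIMITS = {'4°C': (2, 8), '37°C': (35, 39), 'RT': (20, 25)}


def check_temperature_log(readings, limits=TEMP_LIMITS):
    """Return warnings for (step, label, measured_celsius) tuples outside their range.

    `readings` uses a hypothetical log format; labels without a defined
    range (e.g. 'FREEZER') are skipped rather than flagged.
    """
    warnings = []
    for step, label, measured in readings:
        if label not in limits:
            continue
        low, high = limits[label]
        if not (low <= measured <= high):
            warnings.append(
                f"⚠️ Step {step}: measured {measured}°C outside {label} range ({low}-{high}°C)"
            )
    return warnings


if __name__ == "__main__":
    log = [(1, '4°C', 6.5), (3, 'RT', 27.0), (6, 'FREEZER', -18.0)]
    for w in check_temperature_log(log):
        print(w)
```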
Integrating AI for Protocol Optimization and Refinement
The core agent loop orchestrates the whole workflow: parsing the protocol, checking inventory, validating safety, and building the schedule. It then prompts the CodeGen model for optimization suggestions, such as batching steps that share a temperature or pre-warming instruments; if generation fails, a fixed fallback tip is returned instead.
```python
def llmcall(prompt, maxtokens=200):
    """Sample a completion from the local CodeGen model, with a fixed fallback tip."""
    try:
        inputs = tokenizer(
            prompt, return_tensors="pt", truncation=True, max_length=512
        ).to(model.device)
        outputs = model.generate(
            **inputs,  # passes input_ids and attention_mask
            max_new_tokens=maxtokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
        # Decode only the newly generated tokens instead of slicing the decoded text
        newtokens = outputs[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(newtokens, skip_special_tokens=True).strip()
    except Exception:
        # Any generation failure (e.g. out of memory) falls back to a generic tip
        return "Consider batching steps with similar temperatures and pre-warming instruments."


def agentloop(protocoltext, inventorycsv, starttime="09:00"):
    """Run parsing, inventory, safety, scheduling and LLM suggestions end to end."""
    print("\n🔬 Starting protocol analysis...\n")
    parser = ProtocolParser()
    steps = parser.readprotocol(protocoltext)
    print(f"📄 Extracted {len(steps)} protocol steps")

    inventory = InventoryManager(inventorycsv)
    reagents = inventory.extractreagents(protocoltext)
    print(f"🧪 Detected {len(reagents)} reagents: {', '.join(reagents[:5])}...")
    invissues = inventory.checkavailability(reagents)

    validator = SafetyValidator()
    safetyrisks = validator.validate(steps)

    planner = SchedulePlanner()
    schedule = planner.makeschedule(steps, starttime)
    parallelopts, timesaved = planner.optimizeparallelization(schedule)
    totaltime = sum(s['duration'] for s in schedule)
    optimizedtime = totaltime - timesaved

    optprompt = (f"Protocol contains {len(steps)} steps totaling {totaltime} minutes. "
                 "Suggest key bottleneck optimizations:")
    optimization = llmcall(optprompt, maxtokens=80)

    return {
        'steps': steps,
        'schedule': schedule,
        'inventoryissues': invissues,
        'safetyrisks': safetyrisks,
        'parallelization': parallelopts,
        'timesaved': timesaved,
        'totaltime': totaltime,
        'optimizedtime': optimizedtime,
        'aioptimization': optimization,
        'reagents': reagents,
    }
```
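The prompt sent to CodeGen states only the step count and the total time, which gives the model very little to work with. One option is to include a line per scheduled step. The sketch below uses a hypothetical helper, `build_optimization_prompt`, and assumes the `day`, `start`, `end`, `step`, `name`, `temp` and `duration` keys produced by `makeschedule`; whether the richer context actually improves CodeGen's suggestions is untested here.

```python
def build_optimization_prompt(schedule, max_steps=12):
    """Summarise a schedule (list of dicts from makeschedule) into an LLM prompt."""
    total = sum(s['duration'] for s in schedule)
    lines = [f"Protocol with {len(schedule)} steps, {total} minutes in total:"]
    # Cap the number of step lines so the prompt stays short
    for s in schedule[:max_steps]:
        lines.append(
            f"- Day {s['day']} {s['start']}-{s['end']}: step {s['step']} "
            f"'{s['name']}' at {s['temp']} ({s['duration']} min)"
        )
    lines.append("Suggest key bottleneck optimizations:")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        {'step': 1, 'name': 'Coating', 'day': 1, 'start': '09:00', 'end': '21:00',
         'temp': '4°C', 'duration': 720},
        {'step': 2, 'name': 'Blocking', 'day': 2, 'start': '09:00', 'end': '10:00',
         'temp': 'RT', 'duration': 60},
    ]
    print(build_optimization_prompt(demo))
```

The resulting string could be passed to `llmcall` in place of `optprompt`. Because `llmcall` truncates its input to 512 tokens, the `max_steps` cap helps keep long protocols from being cut off mid-list.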
Generating User-Friendly Outputs: Checklists and Gantt Charts
To make the results usable at the bench, the system converts them into a Markdown checklist and a CSV export of the schedule that can be imported into Gantt chart tools. Together these summarize step timings, the reagent pick-list, safety and inventory alerts, and optimization tips, giving laboratory personnel clear guidance.
```python
def generatechecklist(results):
    """Render the agent results as a Markdown checklist."""
    total = results['totaltime']
    md = "# 🔬 Wet-Lab Protocol Checklist\n\n"
    md += f"Total Steps: {len(results['schedule'])}\n"
    md += f"Estimated Duration: {total} minutes ({total // 60}h {total % 60}m)\n"
    md += (f"Optimized Duration: {results['optimizedtime']} minutes "
           f"(time saved: {results['timesaved']} minutes)\n\n")
    md += "## ⏱️ Timeline\n"
    currentday = 0  # start at 0 so the "Day 1" heading is emitted as well
    for item in results['schedule']:
        if item['day'] > currentday:
            md += f"\n### Day {item['day']}\n"
            currentday = item['day']
        parallelicon = " 🔄" if item['canparallelize'] else ""
        md += (f"- [ ] {item['start']}-{item['end']} | Step {item['step']}: "
               f"{item['name']} ({item['temp']}){parallelicon}\n")
    md += "\n## 🧪 Reagent Pick-List\n"
    for reagent in results['reagents']:
        md += f"- [ ] {reagent}\n"
    md += "\n## ⚠️ Safety & Inventory Alerts\n"
    allissues = results['safetyrisks'] + results['inventoryissues']
    if allissues:
        for issue in allissues:
            md += f"- {issue}\n"
    else:
        md += "- ✅ No critical issues detected\n"
    md += "\n## ✨ Optimization Tips\n"
    for tip in results['parallelization']:
        md += f"- {tip}\n"
    md += f"- 💡 AI Suggestion: {results['aioptimization']}\n"
    return md


def generateganttcsv(schedule):
    """Return the schedule as CSV text, one row per step."""
    df = pd.DataFrame(schedule)
    return df.to_csv(index=False)
```
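`generateganttcsv` writes `start` and `end` as clock times only, so rows from different days cannot be told apart by time alone, and a step that runs past midnight appears to end before it starts. A sketch of one fix converts each row into full ISO 8601 timestamps using only the standard library. It assumes a chosen calendar date for day 1 plus the `day`, `start` and `duration` fields from `makeschedule`; the function names and the `Task`/`Start`/`Finish` column names are hypothetical choices, not a format any particular Gantt tool requires.

```python
import csv
import io
from datetime import date, datetime, timedelta


def schedule_to_gantt_rows(schedule, first_day):
    """Convert schedule dicts (day, start 'HH:MM', duration in min) to absolute times."""
    rows = []
    for s in schedule:
        hour, minute = map(int, s['start'].split(':'))
        start = datetime(first_day.year, first_day.month, first_day.day, hour, minute)
        start += timedelta(days=s['day'] - 1)
        # Compute the finish from the duration so steps crossing midnight stay correct
        finish = start + timedelta(minutes=s['duration'])
        rows.append({
            'Task': f"Step {s['step']}: {s['name']}",
            'Start': start.isoformat(timespec='minutes'),
            'Finish': finish.isoformat(timespec='minutes'),
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows with the standard csv module."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['Task', 'Start', 'Finish'])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    demo = [
        {'step': 1, 'name': 'Coating', 'day': 1, 'start': '20:00', 'duration': 720},
        {'step': 2, 'name': 'Blocking', 'day': 2, 'start': '09:00', 'duration': 60},
    ]
    print(rows_to_csv(schedule_to_gantt_rows(demo, first_day=date(2025, 1, 1))))
```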
Demonstration with a Sample ELISA Protocol
We test the system on a sample ELISA protocol for cytokine detection together with a small reagent inventory. The agent parses the protocol, checks reagent stocks, schedules the steps, identifies parallelization opportunities, and prints a checklist along with Gantt-chart CSV data. The example shows how the planner can act as an automated assistant for laboratory planning.
```python
# Step headings must be numbered ("1. ...") for ProtocolParser to detect them
SAMPLEPROTOCOL = """ELISA Protocol for Cytokine Detection

1. Coating (Day 1, 4°C overnight)
   - Dilute capture antibody to 2 µg/mL in coating buffer (pH 9.6)
   - Add 100 µL per well to 96-well plate
   - Incubate at 4°C overnight (12-16 hours)
   - BSL-2 cabinet required

2. Blocking (Day 2)
   - Wash plate 3× with PBS-T (200 µL/well)
   - Add 200 µL blocking buffer (1% BSA in PBS)
   - Incubate 1 hour at room temperature

3. Sample Incubation
   - Wash 3× with PBS-T
   - Add 100 µL diluted samples/standards
   - Incubate 2 hours at room temperature

4. Detection Antibody
   - Wash 5× with PBS-T
   - Add 100 µL biotinylated detection antibody (0.5 µg/mL)
   - Incubate 1 hour at room temperature

5. Streptavidin-HRP
   - Wash 5× with PBS-T
   - Add 100 µL streptavidin-HRP (1:1000 dilution)
   - Incubate 30 minutes at room temperature
   - Work in dark

6. Development
   - Wash 7× with PBS-T
   - Add 100 µL TMB substrate
   - Incubate 10-15 minutes (monitor color development)
   - Add 50 µL stop solution (2M H2SO4) - CAUTION: corrosive
"""

SAMPLEINVENTORY = """reagent,quantity,unit,expiry,lot
capture antibody,500,µg,2025-12-31,AB123
blocking buffer,500,mL,2025-11-30,BB456
PBS-T,1000,mL,2026-01-15,PT789
detection antibody,8,µg,2025-10-15,DA321
streptavidin HRP,10,mL,2025-12-01,SH654
TMB substrate,100,mL,2025-11-20,TM987
stop solution,250,mL,2026-03-01,SS147
BSA,100,g,2024-09-30,BS741"""

results = agentloop(SAMPLEPROTOCOL, SAMPLEINVENTORY, starttime="09:00")
print("\n" + "=" * 70)
print(generatechecklist(results))
print("\n" + "=" * 70)
print("\n📊 Gantt CSV Preview (first 400 characters):\n")
print(generateganttcsv(results['schedule'])[:400])
print("\n🎯 Time Savings Achieved:", f"{results['timesaved']} minutes through parallelization")
```
Conclusion: Advancing Wet-Lab Automation with AI-Driven Protocol Planning
This project shows how agentic AI can help improve reproducibility, safety, and efficiency in wet-lab workflows. By converting unstructured experimental text into actionable, validated plans, the system automates tasks such as reagent management, safety checks, and schedule optimization. Running CodeGen on-device keeps protocol data on the local machine while still providing suggestions for identifying and easing bottlenecks. The resulting planner produces Gantt-chart data, Markdown checklists, and AI-generated optimization advice, laying the groundwork for more fully autonomous laboratory planning tools.

