Table 2 Algorithm: model-based DQN for materials design
[1] Input: Design space X, initial dataset D0, GPR surrogate model M, DQN agent A, number of RL training episodes N, episode length L (number of design decisions/actions^a per episode), batch size Btrain for Q-network updates, optimization type: minimization | |
[2] Output: Best material design x* | |
[3] Initialize GPR surrogate model M with D0, D = D0 | |
[4] Initialize DQN agent A with Q-network Q(s, a) | |
[5] Initialize replay buffer R | |
[6] while not terminated do | // Until max iterations or target performance met |
[7] Train/Update GPR model M using D | |
[8] // Agent training stage: learn from surrogate model; train DQN for N episodes | |
[9] for episode = 1 to N do | // Training budget for DQN agent |
[10] s = get_initial_state() | // Initialize with predefined or random state |
[11] for step = 1 to L do | // Make L sequential design decisions/actions |
[12] a = ε-greedy(Q(s,·), X) | // Select action within design space |
[13] s' = next_state(s, a) | // State transition from s to s' |
[14] r = M(s) - M(s') | // Get reward using GPR, assuming minimization |
[15] Store (s, a, r, s') in R | |
[16] Update Q(s, a) using a random minibatch of Btrain transition tuples (s, a, r, s') sampled from R | |
[17] s = s' | |
[18] end for | |
[19] end for | |
[20] // Design stage: propose and evaluate one new material design^b | |
[21] s = get_initial_state() | // Initialize with predefined or random state |
[22] for step = 1 to L do | // Make L sequential design decisions/actions |
[23] a = ε-greedy(Q(s,·), X) | // Propose materials design action |
[24] s' = next_state(s, a) | // State transition from s to s' |
[25] s = s' | |
[26] end for | |
[27] xnext = s | |
[28] ynext = f(xnext) | // Evaluate the new design |
[29] D = D ∪ {(xnext, ynext)} | // Update dataset with experiment results |
[30] end while | |
[31] return x* = argmin_{x ∈ D} f(x) | // Best design found; argmin because the objective is minimized |
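To make the steps of Table 2 concrete, the following Python sketch implements the same loop under several illustrative assumptions that are not from the source: the design x is a vector of d discrete levels, each action raises or lowers one component by one level, f_true is a toy quadratic standing in for the real experiment, the GPR surrogate is scikit-learn's GaussianProcessRegressor, and the Q-network is a small PyTorch MLP trained without a target network for brevity. All names and hyperparameter values (d, N_LEVELS, f_true, N_OUTER, etc.) are hypothetical placeholders.

```python
# Minimal sketch of the model-based DQN loop in Table 2 (toy assumptions, see text).
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

d, N_LEVELS = 4, 8                                     # design variables, levels per variable
ACTIONS = [(i, s) for i in range(d) for s in (-1, 1)]  # (component index, +/-1 step)
GAMMA, B_TRAIN, N_EPISODES, L_STEPS, N_OUTER = 0.95, 32, 50, 10, 20

def f_true(x):
    """Expensive 'experiment' -- here a toy quadratic to be minimized."""
    return float(np.sum((np.asarray(x) - 3.0) ** 2))

def random_state():
    return tuple(random.randrange(N_LEVELS) for _ in range(d))

def next_state(s, a):
    i, step = ACTIONS[a]
    s2 = list(s)
    s2[i] = int(np.clip(s2[i] + step, 0, N_LEVELS - 1))
    return tuple(s2)

class QNet(nn.Module):
    """Q-network mapping a state vector to one Q-value per discrete action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, len(ACTIONS)))
    def forward(self, s):
        return self.net(s)

def eps_greedy(q, s, eps=0.1):
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(q(torch.tensor(s, dtype=torch.float32)).argmax())

# [3]-[5] initial dataset D0, Q-network, replay buffer
D = [(x, f_true(x)) for x in (random_state() for _ in range(5))]
q = QNet()
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
R = deque(maxlen=10_000)

for _ in range(N_OUTER):                               # [6] fixed budget of real evaluations
    X = np.array([x for x, _ in D], dtype=float)
    y = np.array([v for _, v in D])
    M = GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True).fit(X, y)  # [7]

    # [8]-[19] agent training stage: rewards come only from the GPR surrogate
    for _ in range(N_EPISODES):
        s = random_state()                             # [10] random initial state
        for _ in range(L_STEPS):
            a = eps_greedy(q, s)                       # [12] epsilon-greedy action
            s2 = next_state(s, a)                      # [13] state transition
            r = float(M.predict([list(s)])[0] - M.predict([list(s2)])[0])  # [14] minimization
            R.append((s, a, r, s2))                    # [15] store transition
            if len(R) >= B_TRAIN:                      # [16] Q-update on a random minibatch
                batch = random.sample(list(R), B_TRAIN)
                sb = torch.tensor([b[0] for b in batch], dtype=torch.float32)
                ab = torch.tensor([b[1] for b in batch])
                rb = torch.tensor([b[2] for b in batch], dtype=torch.float32)
                s2b = torch.tensor([b[3] for b in batch], dtype=torch.float32)
                with torch.no_grad():
                    target = rb + GAMMA * q(s2b).max(dim=1).values
                pred = q(sb).gather(1, ab[:, None]).squeeze(1)
                loss = nn.functional.mse_loss(pred, target)
                opt.zero_grad(); loss.backward(); opt.step()
            s = s2

    # [20]-[29] design stage: roll out the trained policy, then run one real experiment
    s = random_state()
    for _ in range(L_STEPS):
        s = next_state(s, eps_greedy(q, s))            # [23] epsilon-greedy, as in the table
    D.append((s, f_true(s)))                           # [28]-[29] evaluate and augment dataset

x_best, y_best = min(D, key=lambda xy: xy[1])          # [31] best (minimum) design found
print("best design:", x_best, "objective:", y_best)
```

In an actual application, the state encoding, the action set, and the fixed N_OUTER budget on line [6] would be replaced by the problem-specific design space, design actions, and stopping criterion (maximum iterations or a target property value) described in the main text.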