Introduction

Industry 5.0 aims to establish a manufacturing framework that prioritizes human-centric and efficient processes, fostering seamless collaboration between humans and robots1. In this paradigm, human-robot collaboration (HRC) becomes a cornerstone of modern innovative manufacturing systems2,3,4,5. An ideal HRC scenario involves delegating repetitive, low-skill, and ergonomically challenging tasks to robots, thereby alleviating the physical strain on humans. At the same time, it emphasizes the importance of human intelligence and robotic dexterity in both operational and cognitive functions6. In response to increasingly diversified demands, manufacturing is transitioning from mass production to customized assembly7,8. As a result, robots with attributes such as speed, strength, repeatability, and precision are being integrated into manufacturing systems9. In this context, humans, programmable robots, and computer numerical control manufacturing systems operate in distinct physical spaces, each performing specific tasks and leveraging their unique advantages10.

However, the resilience of a manufacturing system extends beyond robotic operations and must also be adaptable to fluctuations in orders and disruptions11. Traditional mass production lacks the flexibility required to offer small-batch personalized products in response to changes or interruptions12. Flexible manufacturing systems (FMS), based on HRC, show great promise in addressing this limitation. In such systems, humans handle material supply and maintenance, while robots manage the transportation of tools and materials to processing machines, facilitating small-batch customized manufacturing13. This approach enables multi-variety mixed-line production and reduces the need for large inventories of materials and finished products, thereby improving resource utilization, as shown in Fig. 1.

Fig. 1: HRC flexible manufacturing system.

The manufacturing system comprises a worker operator responsible for loading and unloading materials, with items stored in a line-side warehouse. A flexible handling robot transports workpieces, materials, and tools to flexible processing equipment for machining. After processing, the completed workpieces and resources are returned to the line-side warehouse. The system includes three flexible computer numerical control machines, each dedicated to machining standard deburring tools for automotive welding lines. These machines process a total of 12 types of components made from materials such as cast iron, aluminum alloy, and copper. The manufacturing processes encompass rough and finish machining, cleaning, inspection, and marking.

Existing FMS for HRC often face challenges, such as complex production processes, frequent faults, and fluctuating order demands, leading to complicated management and limited real-time responsiveness14. To address these issues, current FMS typically rely on simple rule-based resource scheduling15,16, which works well in specific scenarios but results in significantly lower equipment utilization rates in others. Moreover, to ensure timely delivery, substantial quantities of production materials are preemptively prepared, leading to resource wastage and hindering the widespread adoption of flexible manufacturing models17.

Traditional decision-making methods, which rely on workshop managers, are often inefficient and lead to significant resource waste. Workshop scheduling plays a critical role as the “brain” of the entire FMS, responsible for the optimal allocation of resources and task sequencing to meet production goals. A well-designed scheduling plan can enhance overall workshop efficiency without increasing resource input18,19. Solutions to workshop scheduling problems are generally divided into exact18,20,21,22 and approximate23,24,25,26 methods. Exact methods, such as branch-and-bound and dynamic programming, can achieve theoretically optimal solutions27. However, they are computationally intensive and time-consuming, making them unsuitable for the real-time demands of FMS. Consequently, approximate methods, including heuristic dispatching rules (HDRs)15, metaheuristic algorithms28,29,30, deep reinforcement learning (DRL)24,31,32, genetic programming (GP)25,33, and gene expression programming (GEP)34, have become increasingly widely used. However, metaheuristic algorithms may struggle to adapt quickly to changes because FMS environments are dynamic and large in scale. HDRs and DRL methods are favored for their lower computational complexity and faster response times, but they may not consistently produce satisfactory results35. Although GP and GEP offer strong generalization capabilities, they are essentially random search processes that may not yield high-quality dynamic scheduling solutions within short time frames36.

In recent years, with the increasing application of large language models (LLMs) across various domains37,38,39,40,41,42,43, manufacturing systems have increasingly relied on AI to enhance quality, productivity, and overall performance44. The integration of LLMs with evolutionary algorithms has opened new opportunities for prompt engineering45 and automated algorithm design36,41,46. Building on recent pioneering work, such as the early 2024 publication in Nature by the Google team, where LLMs combined with evolutionary algorithms achieved a new benchmark in combinatorial optimization41, this research explores the transformative potential of LLMs within the domain of FMS. This approach benefits from the ability of LLMs to generate highly adaptable HDRs through meticulously designed prompts and iterative feedback mechanisms. By leveraging their language understanding and generation capabilities, LLMs can rapidly acquire domain-specific knowledge from training datasets and generate high-quality solutions in testing scenarios in a remarkably short timeframe. For example, models like ChatGPT and ChatGLM have been successfully used to evolve HDRs, addressing complex dynamic job shop scheduling challenges36. However, applying this approach to flexible manufacturing scheduling systems in HRC requires tackling specific challenges, such as machine selection within FMS, which complicates the direct application of such evolutionary frameworks in these contexts. Moreover, sharing production data with online LLMs often conflicts with data security requirements in manufacturing enterprises, making local deployment an essential consideration. While smaller models may lack sufficient inference capability, larger models demand high-performance hardware, resulting in prohibitively high costs.

To overcome these challenges, this paper proposes an evolutionary framework for flexible manufacturing scheduling based on LLMs. The framework enhances HDR design through supervised fine-tuning (SFT) of the local qwen2.5-coder-7b model, built upon the population self-evolution (SeEvo) approach. This method trains on multiple cases, collects the resulting HDRs, and ensures a high level of adaptability and efficiency in deployment. In the deployment phase, contextual prompts enable the rapid generation of high-quality scheduling plans for HRC manufacturing systems, ensuring a quick response to dynamic production environments. By combining the latest advances in LLMs with evolutionary techniques, this framework opens new opportunities for addressing the complexities of real-time, dynamic manufacturing scheduling, significantly advancing the field and pushing the boundaries of AI-driven optimization.

Results

Dynamic flexible manufacturing system scheduling performance testing

In recent years, the application of machine learning in FMS scheduling has experienced a surge, resulting in the development of rich datasets and benchmark tests. However, DRL, which is widely used by researchers to address dynamic workshop scheduling problems, does not consistently outperform traditional HDRs47,48. Similarly, while evolutionary frameworks such as GP25 and GEP34 have been employed for automatic algorithm design, these methods lack effective guidance for their search. Their reliance on extensive random search limits their exploitation and exploration capabilities36.

To address this issue, we develop SeEvo, a language-guided heuristic framework designed to efficiently generate scheduling solutions for dynamic FMS environments by leveraging the capabilities of LLMs. Using a local qwen2.5-coder-7b model, we generate evolutionary prompts that guide the evolution of initial seed heuristic algorithms while continuously collecting effective datasets. The evolutionary process is inspired by individual co-evolution, individual self-evolution, and collective evolution36. Additionally, unlike online LLMs such as ChatGPT, which are used in the literature, we further enhance our local qwen2.5-coder-7b through SFT to improve HDR evolution efficiency and address data privacy concerns. Notably, the effectiveness of SeEvo’s outputs depends on the reasoning capabilities of the LLM and the quality of HDRs accumulated across multiple cases. For instance, directly using open-source LLMs may result in limited reasoning capabilities, making it challenging to generate effective HDRs based on current cases and prompts quickly. This is especially true when evolutionary reflection prompts produce flawed or nonsensical inputs, causing both local LLMs and online LLMs like ChatGLM4 to fail to generate better HDRs. Similarly, inaccurate LLM outputs can lead to erroneous scheduling plans. For these reasons, we evaluate SeEvo’s performance from three dimensions: (1) the accuracy of the generated HDRs, (2) reasoning ability, and (3) the quality of rapid inference on the test set.

As shown in Fig. 2a, qwen2.5-sft-7b achieves HDR generation accuracy close to gpt3.5. Although not perfectly accurate, this represents a significant improvement over the pre-fine-tuned qwen2.5-coder-7b. In Fig. 2b, a comparative experiment of the SeEvo framework is presented across 50 generations and 10 test cases, benchmarked against traditional methods such as GP, GEP, as well as online LLMs like gpt3.5 (gpt-3.5-turbo-ca), glm3 (GLM-3-Turbo), glm4 (GLM-4), the pre-fine-tuned qwen2.5-coder-7b, and qwen2.5-sft-7b. The results demonstrate that qwen2.5-sft-7b outperforms both traditional automatic algorithm design methods and online LLMs, including the pre-fine-tuned qwen2.5-coder-7b. Additionally, to evaluate the generalization and robustness of the framework, we conduct a benchmark comparison on a test set of 200 cases against DRL, GP, GEP, and 10 HDRs. Performance is measured by the relative deviation of each method’s solution from the current best solution. The boxplot results (Fig. 2c) indicate that LLM-guided scheduling methods consistently outperform traditional approaches. Notably, qwen2.5-sft-7b demonstrates exceptional performance, with a median relative deviation close to zero and a Gap ratio below 1% for the majority of cases. This indicates both high stability and superiority, significantly surpassing other non-LLM-guided methods, and suggests its potential as a valuable assistant for HRC manufacturing.
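The relative-deviation metric used above can be illustrated with a minimal Python sketch. The text does not give the exact formula, so the definition below, (makespan - best) / best with the best taken per case across all compared methods, is an assumption; the function names and the demo makespans are hypothetical.

```python
from statistics import median
from typing import Dict, List


def gap_ratio(makespan: float, best_makespan: float) -> float:
    """Relative deviation from the best-known makespan (assumed definition)."""
    if best_makespan <= 0:
        raise ValueError("best_makespan must be positive")
    return (makespan - best_makespan) / best_makespan


def gap_ratios_per_method(results: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """results maps a method name to its makespan on each test case."""
    n_cases = len(next(iter(results.values())))
    # Best-known makespan per case, taken across all compared methods.
    best = [min(r[i] for r in results.values()) for i in range(n_cases)]
    return {
        method: [gap_ratio(r[i], best[i]) for i in range(n_cases)]
        for method, r in results.items()
    }


# Hypothetical makespans for two methods on three cases.
demo = {"method_a": [100.0, 210.0, 95.0], "method_b": [110.0, 200.0, 95.0]}
ratios = gap_ratios_per_method(demo)
medians = {method: median(values) for method, values in ratios.items()}
```

Under this definition, a median close to zero means that a method matches the best-known makespan on at least half of the cases.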

Fig. 2: Performance evaluation of the scheduling method for dynamic FMS based on LLMs.

a Comparison of the success rates of different LLMs in generating HDRs. The results show that gpt3.5 achieves the highest success rate, followed by glm4 and qwen2.5-sft-7b. b Convergence performance curves for various methods in 10 random cases. The qwen2.5-sft-7b method demonstrates stronger convergence, indicating its powerful search and exploration capabilities. c Box plot of the relative deviation for different scheduling methods across 200 test cases. The performance of qwen2.5-sft-7b is significantly superior to all other methods. d Box plot of the Gap ratio for different LLM methods and their associated evolutionary strategies (SeEvo vs. ReEvo). The combination of qwen2.5-sft-7b and the SeEvo strategy yields the best performance. e Comparison of the number of best solutions obtained (bar chart) and the average makespan (line chart) by each LLM method across 200 test cases. Qwen2.5-sft-7b (SeEvo) performs best on both key metrics, achieving the highest number of best solutions and the lowest average makespan.

Additionally, to validate the effectiveness of the novel individual self-evolution mechanism in SeEvo, we perform an ablation study. By comparing the complete SeEvo strategy with a simplified version (denoted ReEvo, which lacks the individual self-evolution mechanism), we find that under the same LLM conditions, the SeEvo strategy outperforms ReEvo, as demonstrated by a smaller Gap ratio between its solutions and the best makespan (Fig. 2d). The combination of qwen2.5-sft-7b and the SeEvo strategy yields the best performance, with the most concentrated distribution of Gap ratios. This advantage holds for every LLM tested: SeEvo consistently outperforms ReEvo. Among the three API-based models, glm3 shows slightly better results. It is worth noting that while glm4 demonstrates strong exploration performance in Fig. 2b, its performance significantly decreases in Fig. 2c–e. This is mainly attributable to our experimental design, where each LLM performs a rapid iteration on the test cases based on its knowledge base of 20 HDRs, making the outcome highly dependent on the quality of that specific knowledge base. We do not explore multiple knowledge bases further, as the primary focus of this paper is the design of the SeEvo method and the local fine-tuning pipeline. Online LLMs are included only to validate the effectiveness of our method and framework, as their adoption in manufacturing is often limited by data privacy concerns. Finally, we compare the number of best solutions and the average makespan obtained by the different LLM methods across the 200 test cases. The results show that qwen2.5-sft-7b (SeEvo) not only finds the highest number of best solutions but also achieves the lowest average makespan, once again confirming its superior overall performance.

Machining flexible manufacturing system scheduling performance testing

To validate the effectiveness of the SeEvo method in a real-world FMS, tests are conducted using operational data from the flexible production line at the headquarters of Guangzhou MINO Equipment Co., Ltd. This production line consists of seven vertical computer numerical control centers capable of mixed-line manufacturing. The system processes a variety of mechanical products, including angle seats, bases, sliding plates, connecting blocks, three-axis bases, roller bases, tray bases, and manual lubrication tables, across multiple orders. The line also features a stacker crane, a palletizer, three flexible handling robots, and two workers (Fig. 3a). In this study, the workshop processes four types of products, each with multiple orders of varying quantities. Orders arrive dynamically based on random user demand and are entered by an operator, who supplies the corresponding materials to a line-side buffer. Machine faults, captured by sensors, occur unpredictably. The optimization objective is to minimize the makespan. The architecture of the FMS is depicted in Fig. 3b. At its core is an intelligent management and control system driven by the scheduling algorithm, which enables centralized resource management, intelligent allocation, efficient process scheduling, and adaptive production in response to dynamic disturbances.

Fig. 3: Architecture of the FMS.

a Schematic of the physical entity (left) and the 3D virtual environment (right) of the FMS line, including the core components of the production line: computer numerical control systems, flexible handling robots, and workers. In a typical process, the worker secures a workpiece onto a pallet, which is then transported by the robot to a computer numerical control system for machining. b The cyber-physical integration architecture of the system. An intelligent management and control platform is established under the manufacturing execution system, with the production planning and scheduling module at its core. This module generates scheduling plans by integrating inputs such as process modeling, production tasks, and optimization objectives. Through a data acquisition gateway and communication interfaces, it dispatches commands to the physical production line and collects real-time data, thus creating a closed-loop control system.

In 54 test cases simulating a real-world production scenario, the SeEvo framework is benchmarked against several baseline methods. The results demonstrate that the SeEvo framework, integrated with the fine-tuned qwen2.5-sft-7b model, outperforms the others. It achieves the lowest median Gap ratio with the most concentrated distribution, indicating superior solution quality and stability. Moreover, for all tested LLMs, the SeEvo method consistently outperforms its simplified version, ReEvo (Fig. 4a).

Fig. 4: Performance evaluation and analysis of generated HDRs in real-world manufacturing scenarios.

a Box plot comparing the optimization performance of different LLM methods across 54 scheduling scenarios. Qwen2.5-sft-7b (SeEvo) exhibits the best performance, with the smallest median and distribution range for the Gap ratio. b Performance of each method across the same 54 real-world scenarios on best-known solutions and average makespan. c Heatmap of the relative Gap for different LLM methods across various problem scales (from 3 × 10 to 5 × 15). The heatmap shows that qwen2.5-sft-7b (SeEvo) consistently maintains the lowest relative Gap across most tested scales. d Heatmap comparing the performance of all LLM-based methods against the HDR originally used by the factory (Before). The results indicate that all LLM-generated methods significantly outperform the original HDR. e–g Comparison of HDRs generated by different methods. The HDR generated by qwen2.5-sft-7b (SeEvo) (e) is well-structured and highly interpretable. In contrast, the HDRs generated by GEP (f) and GP (g), while effective, are mathematically complex, lack intuitive physical meaning, and exhibit poor interpretability.

A comprehensive performance evaluation further confirms this advantage. On key metrics, such as the number of best solutions obtained and the average minimum makespan, the qwen2.5-sft-7b configuration consistently performs better than the others. In contrast, the HDR originally used by the production line (denoted “Before”) achieves the best solution in only one scenario (3 × 13) and performs poorly in the majority of scenarios, indicating its limited generalization capability (Fig. 4b). A heatmap analysis visually represents the performance of different methods across various production scales. The qwen2.5-sft-7b (SeEvo) configuration maintains the smallest relative Gap (indicated by dark green) in almost all scale combinations, demonstrating its robust performance (Fig. 4c). Most importantly, all LLM-generated methods consistently and significantly outperform the original HDR across all test scenarios, emphasizing the practical value of the framework (Fig. 4d).

Beyond its quantitative performance advantages, the HDR generated by qwen2.5-sft-7b (Fig. 4e) is structurally clear and logical, effectively integrating sub-policies with physical meanings. For example, in workpiece selection, it combines the principle of selecting the job with the highest completion percentage while using a forward-looking term for fine-tuning, thus balancing the current state with future trends. For machine selection, it aims to assign a workpiece to a machine with the most balanced load and stability, taking into account the processing complexity of the workpiece itself, potentially prioritizing a “long-duration task” for a machine with greater idle capacity.
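To make the structure described above concrete, the following Python sketch shows what such an interpretable HDR might look like. It is not the rule shown in Fig. 4e: the field names (`done_ops`, `total_ops`, `remaining_time`), the weight `lookahead_weight`, and the exact scoring terms are hypothetical choices made only for illustration.

```python
from typing import Dict, List


def select_job(jobs: List[Dict], lookahead_weight: float = 0.1) -> Dict:
    """Pick the job with the highest completion percentage, adjusted by a
    small forward-looking term based on remaining work (assumed form)."""
    max_remaining = max(job["remaining_time"] for job in jobs) or 1.0

    def score(job: Dict) -> float:
        completion = job["done_ops"] / job["total_ops"]
        lookahead = job["remaining_time"] / max_remaining
        return completion - lookahead_weight * lookahead

    return max(jobs, key=score)


def select_machine(op_times: Dict[str, float], machine_load: Dict[str, float]) -> str:
    """Pick the eligible machine whose queued load plus this operation's
    processing time is smallest, so long operations go to idle machines."""
    return min(op_times, key=lambda m: machine_load[m] + op_times[m])


# Hypothetical state: two waiting jobs and two eligible machines.
waiting = [
    {"id": "J1", "done_ops": 1, "total_ops": 4, "remaining_time": 30.0},
    {"id": "J2", "done_ops": 3, "total_ops": 4, "remaining_time": 10.0},
]
chosen_job = select_job(waiting)
chosen_machine = select_machine({"M1": 5.0, "M2": 7.0}, {"M1": 20.0, "M2": 3.0})
```

Each term in such a rule maps to a quantity an operator can read off the shop floor, which is the sense in which the generated HDR is interpretable.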

In contrast, the HDRs generated by traditional automated design methods, such as GEP and GP (Fig. 4f, g), while effective, are mathematically complex and verbose, with limited interpretability. This black-box nature complicates understanding and debugging in real-world production environments. This comparison highlights the unique advantage of the SeEvo framework: it not only discovers high-performance scheduling strategies but also ensures these strategies are human-understandable, facilitating their application in other complex FMS scenarios, such as aerospace skin manufacturing.

Discussion

This paper presents an LLM-based SeEvo framework for FMS production scheduling, which integrates three stages: individual co-evolution, individual self-evolution, and collective evolution. During the application and testing phases, we input the prompts and 20 pre-collected HDRs into the fine-tuned LLM for inference using the SeEvo framework, with the results directly applied to the FMS. The results significantly outperform existing HDR-based scheduling approaches. Additionally, the framework demonstrates the ability to generate high-performance HDRs within just one minute, offering a novel solution for the application of LLMs in intelligent manufacturing. Notably, the fine-tuned LLM requires only a single 4090D GPU to complete the inference experiments, significantly reducing the cost of utilizing online LLM APIs. When two GPUs are used, the inference speed is comparable to proprietary closed-source models, such as ChatGLM3. These findings underscore the considerable potential of the SeEvo method in FMS scheduling and confirm its effectiveness as a tool for generating scheduling plans.

Despite its numerous advantages, the SeEvo method faces several challenges during the inference process. The selection of training cases, as well as the inherent limitations of the evolutionary logic underlying large-scale test solutions, complicates the evaluation of the SeEvo method. Furthermore, the model’s dependence on a predefined set of HDRs may limit its scalability and adaptability in more realistic scenarios. Future research should focus on constructing larger-scale knowledge bases and developing efficient retrieval mechanisms through a knowledge augmented generation (KAG)49 framework. Additionally, it will be crucial to explore advanced fine-tuning techniques, such as group relative policy optimization (GRPO), which could further enhance the model’s performance, particularly in domains that require more complex and adaptive capabilities.

Method

Evolutionary mechanism based on LLMs

Within the SeEvo framework, LLMs perform two key roles: the Reflector LLM, which generates guiding prompts for individuals, and the Generator LLM, which produces individual HDRs. Unlike traditional hyper-heuristic algorithms such as GP and GEP, which rely on fixed encoding structures and function sets, each individual in SeEvo is a code block generated directly by the LLM. These individuals are only required to adhere to a predefined function signature, including the function name, inputs, and outputs, thus overcoming the limitations associated with fixed encoding length and complexity.
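The signature constraint can be checked mechanically before an individual enters the population. The sketch below is one possible way to do this in Python; the expected function name `heuristic` and the parameter names `jobs` and `machines` are hypothetical, since the text does not specify the actual signature.

```python
import inspect
from typing import Callable

EXPECTED_NAME = "heuristic"              # hypothetical function name
EXPECTED_PARAMS = ("jobs", "machines")   # hypothetical parameter names


def load_hdr(code: str) -> Callable:
    """Execute an LLM-generated code block and return the HDR function
    if it matches the predefined signature; raise ValueError otherwise."""
    namespace: dict = {}
    # Untrusted code: in practice this belongs in an isolated subprocess.
    exec(code, namespace)
    func = namespace.get(EXPECTED_NAME)
    if not callable(func):
        raise ValueError(f"code block does not define {EXPECTED_NAME}()")
    params = tuple(inspect.signature(func).parameters)
    if params != EXPECTED_PARAMS:
        raise ValueError(f"expected parameters {EXPECTED_PARAMS}, got {params}")
    return func


good_code = "def heuristic(jobs, machines):\n    return jobs[0], machines[0]\n"
hdr = load_hdr(good_code)
```

Because only the signature is fixed, the body of each individual can vary freely in length and structure, unlike a GP tree or a GEP chromosome.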

The overall evolutionary process of SeEvo (Fig. 5) is implemented through an iterative loop consisting of three core stages: individual co-evolution, individual self-evolution, and collective evolution. The specific implementation details are as follows:

Fig. 5: The SeEvo framework: an evolutionary process for heuristic algorithms driven by LLMs.

a Individual co-evolution reflection and crossover: The system randomly selects two parent individuals for performance comparison. This comparison then serves as input for the Reflector LLM to generate an in-depth analysis of their respective strengths and weaknesses. The resulting analysis acts as an evolutionary instruction, guiding the two parents to generate new offspring. b Individual self-evolution reflection and crossover: The system provides feedback to the Reflector LLM on the performance trajectory of each individual before and after an evolution step. Based on this, the LLM generates targeted suggestions for improvement, guiding the individual to self-optimize and produce a new offspring. c Collective evolution reflection and mutation: This stage integrates all co-evolutionary and self-evolutionary knowledge accumulated in stages (a) and (b). This global information is submitted to the Reflector LLM to generate a macro-level insight into the overall evolutionary direction. This high-level guidance is specifically used to direct the mutation of the best parent individual in the current population, facilitating deeper exploration from the current best-known solution.

Population initialization

The LLM generates an initial population of HDRs based on the task specifications and a seed HDR. The prompt engineering process used for this initialization is depicted in Fig. 6.

Fig. 6: Prompt engineering design.

The design of the prompt engineering, including individual co-evolution reflection, individual self-evolution reflection, collective evolution reflection, crossover, and mutation.

Individual co-evolution and crossover

Two parent HDRs are randomly selected from the current population for performance comparison. The evaluation result (e.g., superior or inferior performance on test cases) is fed back to the Reflector LLM. The system guides the LLM to analyze the performance differences in depth and to generate instructive recommendations for improvement. This comparative mechanism provides feedback akin to a language gradient, even in the absence of a continuous reward signal. The analysis and recommendations serve as evolutionary instructions, guiding the Generator LLM to produce two new offspring HDRs based on this parent pair.
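The pairwise comparison step can be sketched as follows. This is a minimal illustration, not the prompt shown in Fig. 6: the prompt wording, the function name `build_coevolution_prompt`, and the use of makespan as the comparison criterion are assumptions.

```python
import random
from typing import Dict, Tuple


def build_coevolution_prompt(
    population: Dict[str, str],
    makespan: Dict[str, float],
    rng: random.Random,
) -> Tuple[str, str, str]:
    """Randomly pick two parents, rank them by makespan (lower is better),
    and build a comparison prompt for the Reflector LLM."""
    first, second = rng.sample(sorted(population), 2)
    better, worse = sorted((first, second), key=lambda name: makespan[name])
    prompt = (
        "Two heuristic dispatching rules were evaluated on the same cases.\n"
        f"[Better, makespan {makespan[better]}]\n{population[better]}\n"
        f"[Worse, makespan {makespan[worse]}]\n{population[worse]}\n"
        "Explain why the first rule performs better and give concise hints "
        "for designing improved rules."
    )
    return better, worse, prompt


# Hypothetical two-member population with evaluated makespans.
pop = {"hdr_a": "def heuristic(jobs, machines): ...",
       "hdr_b": "def heuristic(jobs, machines): return jobs[0], machines[0]"}
scores = {"hdr_a": 120.0, "hdr_b": 100.0}
better, worse, prompt = build_coevolution_prompt(pop, scores, random.Random(0))
```

Only the ordering of the two parents is passed on, which is why this mechanism works without a continuous reward signal.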

Individual self-evolution and crossover

In this stage (Fig. 5b), the system feeds the performance trajectory of each individual before and after co-evolution back to the Reflector LLM. The LLM is prompted to reflect on the changes in performance: if performance has declined or stagnated, the LLM analyzes potential causes and generates reverse prompts to prevent further failures. If performance has improved, the LLM synthesizes successful experiences to generate optimization prompts that amplify strengths. These targeted recommendations guide the individual’s self-optimization, resulting in the generation of a new offspring. The crossover operation in this stage mirrors that in the co-evolution stage.
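The branching logic of this reflection step can be written down in a few lines. The sketch below assumes makespan is the performance measure (lower is better); the labels `"optimization"` and `"reverse"` and the prompt wording are hypothetical.

```python
from typing import Tuple


def self_evolution_request(code: str, before: float, after: float) -> Tuple[str, str]:
    """Build the request sent to the Reflector LLM from an individual's
    makespan before and after co-evolution (lower is better)."""
    if after < before:
        # Improvement: ask the LLM to summarize what worked.
        kind = "optimization"
        task = "Summarize what made this rule better and how to amplify it."
    else:
        # Decline or stagnation: ask for causes and what to avoid.
        kind = "reverse"
        task = "Analyze why this rule did not improve and what to avoid next."
    prompt = f"Makespan changed from {before} to {after}.\n{code}\n{task}"
    return kind, prompt


kind_up, _ = self_evolution_request("def heuristic(jobs, machines): ...", 100.0, 90.0)
kind_flat, _ = self_evolution_request("def heuristic(jobs, machines): ...", 100.0, 100.0)
```

Treating stagnation the same as decline follows the description above, where both cases trigger a reverse prompt.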

Collective evolution and mutation

This stage (Fig. 5c) provides macro-level control over the population’s evolutionary trajectory. Long-term reflection data from previous iterations, co-evolutionary reflections from the current round, and self-evolutionary reflections are integrated. The Reflector LLM synthesizes this global information to generate insights into the overall evolutionary direction. This high-level guidance directs the mutation of the best-performing parent individual in the current population, encouraging a more thorough exploration of the current optimal solution. The number of HDRs generated depends on the mutation probability.
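Putting the three stages together, one iteration might be organized as in the sketch below. The LLM calls are abstracted as plain callables (`reflect`, `generate`) so that the control flow can be run without a model. The number of mutants (`ceil(mutation_prob * population size)`) and the final truncation selection are assumptions, as the text does not specify them.

```python
import math
import random
from typing import Callable, List, Optional


def seevo_iteration(
    population: List[str],
    reflect: Callable[[str], str],              # prompt -> reflection text
    generate: Callable[[str, List[str]], str],  # (instruction, parents) -> code
    evaluate: Callable[[str], float],           # code -> makespan
    mutation_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    rng = rng or random.Random(0)
    fitness = {code: evaluate(code) for code in population}

    # Stage 1: individual co-evolution (reflection + crossover).
    p1, p2 = rng.sample(population, 2)
    better, worse = sorted((p1, p2), key=lambda c: fitness[c])
    co_note = reflect(f"Compare better HDR:\n{better}\nwith worse HDR:\n{worse}")
    children = [generate(co_note, [better, worse]) for _ in range(2)]

    # Stage 2: individual self-evolution (reflection + crossover).
    self_notes, grandchildren = [], []
    for child in children:
        before, after = fitness[better], evaluate(child)
        note = reflect(f"Makespan went from {before} to {after} for:\n{child}")
        self_notes.append(note)
        grandchildren.append(generate(note, [child]))

    # Stage 3: collective evolution (reflection + mutation of the best parent).
    summary = reflect("Summarize:\n" + "\n".join([co_note, *self_notes]))
    best_parent = min(population, key=lambda c: fitness[c])
    n_mutants = math.ceil(mutation_prob * len(population))  # assumed rule
    mutants = [generate(summary, [best_parent]) for _ in range(n_mutants)]

    # Assumed selection: keep the best individuals at constant population size.
    candidates = population + children + grandchildren + mutants
    return sorted(candidates, key=evaluate)[: len(population)]
```

In a real run, `reflect` and `generate` would wrap calls to the Reflector and Generator LLMs, and `evaluate` would run the scheduling simulation on the training cases.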

Model training, knowledge base construction, and fast inference

To enhance the LLM’s performance on the specific scheduling problem and facilitate rapid deployment, a comprehensive pipeline is designed, encompassing data generation, cleaning, model fine-tuning, knowledge base construction, and fast inference (Fig. 7).

Fig. 7: Data cleaning, fine-tuning, knowledge base construction, and fast inference framework.

a Evolutionary data generation: The qwen2.5-coder-7b model is used with the SeEvo framework to run 50 rounds of evolution for each of the 200 random cases. The complete interaction logs from this process, including all successful evolutionary instructions and HDRs, are collected. b Data cleaning and SFT: The raw data is filtered to retain only the samples that lead to performance improvement. Specifically, if an offspring’s HDR outperforms its parent, the instruction-response pair that prompted this optimization is selected. These high-quality datasets are then used to perform LoRA fine-tuning on the qwen2.5-coder-7b. c HDRs knowledge base collection: The qwen2.5-sft-7b is employed to perform another 50 rounds of deep evolution on 20 training cases. The resulting evolved HDRs are collected to build a high-quality knowledge base, providing high-quality examples for the subsequent fast inference stage. d Fast HDR generation: The system calls the qwen2.5-sft-7b and utilizes the HDRs from the knowledge base as the initial individuals. With just a single iteration of the SeEvo framework, the system can generate a high-quality scheduling solution for a new problem within one minute.

Data generation and cleaning

Initially, the SeEvo framework is executed using the base qwen2.5-coder-7b model on 200 randomly generated scheduling cases, with each case iterated for 50 rounds. During this process, complete interaction records from each evolutionary step are systematically collected, including all reflective prompts (inputs) and their corresponding generated HDRs (outputs), which form the raw dataset (Fig. 7a). Each HDR is evaluated in an isolated subprocess, with up to 20 subprocesses running in parallel. If any subprocess encounters an error or times out, it is terminated without affecting the main evolutionary process. The main process then evaluates the outcomes by reading results from designated text files.
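The isolated evaluation scheme described above can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the authors' implementation: the file naming, the convention that each script writes its makespan to a path passed as its first argument, the 10 s default timeout, and the infinite penalty for failures are all hypothetical.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List

FAILURE_MAKESPAN = float("inf")  # assumed penalty; lower makespan is better


def evaluate_in_subprocess(code: str, workdir: Path, index: int, timeout_s: float) -> float:
    """Run one evaluation script in its own process and read its result file."""
    script = workdir / f"hdr_{index}.py"
    result_file = workdir / f"result_{index}.txt"
    script.write_text(code, encoding="utf-8")
    try:
        # subprocess.run kills the child process if the timeout expires.
        subprocess.run(
            [sys.executable, str(script), str(result_file)],
            timeout=timeout_s, check=True, capture_output=True,
        )
        return float(result_file.read_text(encoding="utf-8").strip())
    except (subprocess.SubprocessError, OSError, ValueError):
        # Crash, timeout, missing file, or unparsable output: penalize only.
        return FAILURE_MAKESPAN


def evaluate_population(codes: List[str], max_workers: int = 20,
                        timeout_s: float = 10.0) -> List[float]:
    """Evaluate all scripts with at most max_workers subprocesses in parallel."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(evaluate_in_subprocess, c, workdir, i, timeout_s)
                       for i, c in enumerate(codes)]
            return [f.result() for f in futures]
```

A faulty HDR therefore costs at most one timeout interval and never interrupts the main evolutionary loop.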

The data is then cleaned by retaining only the samples that result in performance improvements. Specifically, when the performance of an offspring HDR exceeds that of its parent, the corresponding instruction-response pair is retained (Fig. 7b). Any failed executions result in very low fitness scores for those HDRs, ensuring that they are excluded during the selection phase. This process ensures that only high-quality, performance-enhancing instances are retained in the dataset, contributing to more reliable results in subsequent evaluations.
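The cleaning criterion amounts to a single filter over the interaction log. In the sketch below, the record fields (`parent_makespan`, `child_makespan`) are hypothetical, and makespan is assumed to be the performance measure, so a failed execution is represented by an infinite makespan.

```python
from typing import Dict, List


def filter_improving_pairs(records: List[Dict]) -> List[Dict]:
    """Keep only the instruction-response pairs whose offspring HDR has a
    strictly lower makespan than its parent."""
    return [r for r in records if r["child_makespan"] < r["parent_makespan"]]


# Hypothetical raw log entries: improved, unchanged, and failed offspring.
raw_log = [
    {"instruction": "hint A", "response": "code A",
     "parent_makespan": 100.0, "child_makespan": 95.0},
    {"instruction": "hint B", "response": "code B",
     "parent_makespan": 100.0, "child_makespan": 100.0},
    {"instruction": "hint C", "response": "code C",
     "parent_makespan": 100.0, "child_makespan": float("inf")},
]
cleaned = filter_improving_pairs(raw_log)
```

Using a strict inequality discards stagnant offspring as well as failed ones, so every retained pair documents an actual improvement.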

Supervised fine-tuning

Using the curated high-quality data, SFT is performed on the base model. The fine-tuning process is conducted with LLaMA-Factory50 (https://github.com/hiyouga/LLaMA-Factory) on a single A800-80GB GPU. During SFT, successful evolutionary instructions (reflective prompts) are treated as the “instruction”, and the improved individual HDRs as the “output”. This process yields the qwen2.5-sft-7b model, which exhibits enhanced problem-solving capabilities.
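A minimal sketch of how the cleaned pairs might be serialized for SFT is given below. It assumes an Alpaca-style record layout with `instruction`, `input`, and `output` fields; the source field names `reflection_prompt` and `improved_hdr` are hypothetical, and the exact dataset schema used in the experiments is not stated in the text.

```python
import json
from typing import Dict, List


def to_sft_records(pairs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Map each cleaned pair to an instruction/output training record."""
    return [
        {
            "instruction": pair["reflection_prompt"],  # successful evolutionary instruction
            "input": "",                               # no additional context in this sketch
            "output": pair["improved_hdr"],            # improved HDR code
        }
        for pair in pairs
    ]


def dump_sft_dataset(pairs: List[Dict[str, str]]) -> str:
    """Serialize the records as a JSON array."""
    return json.dumps(to_sft_records(pairs), ensure_ascii=False, indent=2)


example_pairs = [{
    "reflection_prompt": "Prefer jobs closer to completion; balance machine load.",
    "improved_hdr": "def heuristic(jobs, machines):\n    return jobs[0], machines[0]\n",
}]
dataset_json = dump_sft_dataset(example_pairs)
```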

High-quality HDRs knowledge base construction

To provide a high-quality initial population for the fast inference stage, another 50 rounds of deep evolution are performed using the fine-tuned qwen2.5-sft-7b on 20 representative training instances. The high-quality HDRs generated during these iterations are collected to form an elite knowledge base consisting of 20 HDRs (Fig. 7c). While the current knowledge base is limited, primarily due to the suboptimal performance of traditional vector-matching methods in this context, future work will focus on constructing larger-scale knowledge bases and enabling their efficient retrieval through a KAG49 framework.

Fast HDR generation

During the online application or testing phase (Fig. 7d), the system invokes the fine-tuned qwen2.5-sft-7b model and uses HDRs from the knowledge base as the initial population. A single complete iteration of the SeEvo framework (i.e., sequential individual co-evolution, individual self-evolution, and collective evolution) generates a high-quality solution for a new scheduling problem in under one minute.

Intelligent HRC flexible manufacturing system

System architecture

The FMS (Fig. 3b) comprises three primary modules: the flexible production line, the management and control system, and the LLM-based scheduling module. The production line hardware includes multiple computer numerical control systems, handling robots, an automated warehouse, buffer positions, tools, and fixtures. The management system is responsible for data coordination, resource management, and task allocation. The LLM-based scheduling module acts as the core decision-making unit, receiving inputs such as the production line model, process flows, production orders, and optimization objectives, which are processed through SeEvo to generate scheduling solutions.

Our system has seven machines, and each workpiece consists of multiple operations. For many of these operations, multiple alternative machines are available for processing, and different types of workpieces have distinct technological routes. These characteristics align closely with the flexible job shop scheduling problem (FJSP), where jobs (workpieces) require a sequence of operations, each of which can be performed on one of several alternative machines. This routing flexibility and the associated machine assignment decisions are central to the scheduling challenge in our system, making FJSP a natural fit for modeling our manufacturing environment.

The use of FJSP allows us to efficiently address the complexities of machine assignment, job sequencing, and processing uncertainties, all of which are critical in optimizing the performance of our intelligent HRC-FMS.

Scheduling processing

In this flexible manufacturing system, the production flow for a workpiece involves several steps, including manual loading, robotic handling, multi-operation machine processing, and manual fixture changes. This scenario is modeled as a flexible job shop scheduling problem characterized by dynamic order arrivals, random machine faults, and fuzzy processing times. To capture time fluctuations in real-world production, a fuzziness of ±2 min is applied to the processing time of each operation in the dataset. The production cycle is computed using an event-driven simulation model, in which each operation is released into the pool of jobs awaiting processing only after the preceding operation of the same workpiece has been completed (Fig. 8).
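
The event-driven release mechanism can be illustrated with a minimal sketch. It assumes that every operation has at least one eligible machine, that the ±2 min fuzziness is drawn uniformly (the distribution is our assumption), and that machine faults and dynamic arrivals are omitted for brevity. The function `simulate` and the `priority` signature are hypothetical and do not reproduce the authors' simulator.

```python
import heapq
import random

def simulate(jobs, priority, fuzz=0.0, seed=0):
    """jobs[j] is a list of operations; each operation maps machine id -> nominal time.
    priority(job, op_index, machine, nominal_time, now) returns a score (highest wins).
    Returns the makespan. Assumes every operation has at least one eligible machine."""
    rng = random.Random(seed)
    free = {m for job in jobs for op in job for m in op}  # idle machines
    pool = {j: 0 for j, job in enumerate(jobs) if job}    # released ops: job -> op index
    events = []                                           # (finish_time, job, op_index, machine)
    now = 0.0
    while pool or events:
        # Dispatch released operations onto idle machines until none fits.
        while True:
            candidates = [(j, k, m, jobs[j][k][m])
                          for j, k in pool.items()
                          for m in jobs[j][k] if m in free]
            if not candidates:
                break
            j, k, m, nominal = max(candidates, key=lambda c: priority(*c, now))
            actual = max(0.0, nominal + rng.uniform(-fuzz, fuzz))  # fuzzy duration
            del pool[j]
            free.remove(m)
            heapq.heappush(events, (now + actual, j, k, m))
        # Advance the clock to the next completion and collect simultaneous ones.
        now, j, k, m = heapq.heappop(events)
        finished = [(j, k, m)]
        while events and events[0][0] == now:
            _, j2, k2, m2 = heapq.heappop(events)
            finished.append((j2, k2, m2))
        for j, k, m in finished:
            free.add(m)
            if k + 1 < len(jobs[j]):
                pool[j] = k + 1  # release the next operation only now
    return now

# Two jobs, two machines, shortest-processing-time rule, no fuzziness.
jobs = [
    [{0: 3, 1: 5}, {1: 2}],
    [{0: 4}, {0: 1, 1: 3}],
]
spt = lambda job, op, machine, nominal, now: -nominal
makespan = simulate(jobs, spt)
```

Passing `fuzz=2.0` perturbs each dispatched duration by up to 2 min, while the dispatching rule still sees only the nominal time.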

Fig. 8: HRC flexible manufacturing scheduling evolution framework.

The evolutionary framework consists of two phases: the self-evolution phase and the online application phase. The self-evolution phase comprises two parts: the flexible job shop environment and the SFT LLM. In this phase, HDRs are evolved for 20 cases using the SeEvo framework. In the online application phase, the best HDR is derived through a single run of SeEvo and then applied to the HRC flexible manufacturing scheduling system to improve production efficiency.

Online scheduling execution framework

The proposed framework is divided into two phases: the self-evolution phase and the online application phase (Fig. 8). During the self-evolution phase, SeEvo undergoes extensive evolution across multiple training cases to create a high-quality HDRs knowledge base. In the online application phase, when new production tasks or dynamic disturbances (e.g., urgent orders or machine faults) arise, the system uses the HDRs from the knowledge base as the initial population. A single rapid iteration is then performed to generate an optimized scheduling plan. The generated HDRs are primarily used for job selection. When multiple jobs compete for the same machine, the HDR calculates a selection probability for each candidate job, and the system assigns the job with the highest probability for processing. This mechanism enables the system to respond to dynamic events in seconds, maintaining ongoing operations while rescheduling subsequent tasks in real time. Because current operations are not interrupted, the approach avoids the semi-finished products and production disruptions that such interruptions would cause.
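
The job-selection step can be sketched as follows. The text states only that the HDR yields a selection probability per candidate job. The softmax normalisation used here, the candidate fields, and the example rule are our assumptions for illustration.

```python
import math

def select_job(candidates, hdr):
    """Score each candidate with the HDR, normalise to probabilities, pick the highest.
    Softmax normalisation is an assumption of this sketch."""
    scores = [hdr(c) for c in candidates]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(weights)
    probs = [w / total for w in weights]
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs

# Hypothetical jobs competing for one machine (times in minutes).
candidates = [
    {"job": "A", "proc_time": 30, "remaining_work": 90},
    {"job": "B", "proc_time": 20, "remaining_work": 150},
    {"job": "C", "proc_time": 45, "remaining_work": 60},
]
# Hypothetical HDR: favour jobs with much work remaining and a short next operation.
example_hdr = lambda c: (c["remaining_work"] - c["proc_time"]) / 60.0
chosen, probs = select_job(candidates, example_hdr)
```

Because the job with the highest probability is always taken, the normalisation does not change which job is chosen. It only makes scores comparable across decision points.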

Data generation and experimental design methodology

In the intelligent HRC flexible manufacturing system, data generation is crucial for creating realistic test scenarios with which to validate and optimize the scheduling system. The experiments use two types of generated data: randomly generated cases and simulation data based on actual processing information.

Randomly generated cases

The random generation process mimics real-world uncertainties and operational complexities in manufacturing. The generation parameters are as follows:

1. Order Quantity: The number of orders is randomly chosen between 3 and 10.
2. Workpiece Quantity per Order: Each order consists of a randomly generated number of workpieces, ranging from 10 to 20 pieces.
3. Machine Availability: The number of machines is randomly set between 5 and 10.
4. Operations per Workpiece: Each workpiece undergoes a random number of operations, ranging from 5 to 7 steps.
5. Machine Failures: To simulate real-world disruptions, the number of machine failures is randomly generated between 0 and 3 occurrences.
6. Fuzzy Processing Time: Processing time is assigned with a slight variation (±2 min) to introduce some uncertainty and model real-world fluctuations in time.
7. Machines per Operation: The number of machines available for each operation is randomly chosen between 0 and 3 machines.
8. Processing Time per Workpiece: Each operation has a random processing time between 20 and 60 min, represented as a random integer.

From this methodology, 200 distinct random cases are generated to cover various possible scenarios that the system might encounter in real-world conditions.
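
A generator following the parameter ranges above might look like the sketch below. Two choices are our assumptions rather than the authors' procedure. The stated range of 0 to 3 eligible machines is clamped to at least one so that every operation remains schedulable. A single processing time is drawn per operation and shared by all of its eligible machines. The function name and dictionary keys are hypothetical.

```python
import random

def generate_case(rng):
    """Draw one random scheduling case from the stated parameter ranges."""
    n_machines = rng.randint(5, 10)                 # 5-10 machines
    orders = []
    for _ in range(rng.randint(3, 10)):             # 3-10 orders
        workpieces = []
        for _ in range(rng.randint(10, 20)):        # 10-20 workpieces per order
            ops = []
            for _ in range(rng.randint(5, 7)):      # 5-7 operations per workpiece
                # Stated range is 0-3; clamped to >= 1 here (assumption).
                n_eligible = max(1, rng.randint(0, 3))
                eligible = rng.sample(range(n_machines), n_eligible)
                time_min = rng.randint(20, 60)      # integer processing time, 20-60 min
                ops.append({m: time_min for m in eligible})
            workpieces.append(ops)
        orders.append(workpieces)
    return {
        "n_machines": n_machines,
        "n_failures": rng.randint(0, 3),            # 0-3 machine failures
        "fuzz_min": 2,                              # +/- 2 min fuzzy processing time
        "orders": orders,
    }

# 200 cases, one seed each, so that every case is reproducible.
cases = [generate_case(random.Random(seed)) for seed in range(200)]
```

Seeding each case separately makes it possible to regenerate any individual case without replaying the whole sequence.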

Simulation data based on actual processing information

This data type is derived from actual shop floor processing information, ensuring high fidelity to real-world production environments. It helps validate the system by matching the simulation closely with actual production data.

1. Number of Orders: 3 to 5 orders are simulated.
2. Workpiece Quantity per Order: Each order contains 10 to 15 workpieces.
3. Machine Failures: Random machine failures are introduced, mirroring the unpredictability of real manufacturing systems.
4. Processing Information: The specific processing information for each workpiece is derived directly from real-world data collected in a machine shop, ensuring that the simulation accurately reflects actual operational realities.

By combining both randomly generated and real-world simulation data, the system can better respond to dynamic, unpredictable manufacturing conditions, ensuring that the scheduling solutions are not only theoretically sound but also practical and reliable in real-world settings.