# Deriving a Design Space with DOE and Risk Estimate

David Wang is a Senior Data Scientist at Sartorius Stedim Data Analytics based out of Singapore. David has a Ph.D. in Chemical Engineering, M. Eng. in Control Theory and Engineering, and B. Sci. in Mathematics and Statistics. He works on improving efficiency, optimality and profitability of product development, process development, and manufacturing processes (Pharma, Biopharma, Chemical, Food & Feed, Life Sciences, etc.) by applying advanced data analytics technology, where the work is focused on data-driven modelling, process monitoring, process control, Design of Experiments, PAT and QbD.

DOE is the key tool for QbD and Design Space.

Design of Experiments (DOE) is increasingly used in the biopharmaceutical industry for product and process development, both to satisfy regulatory expectations around Quality by Design (QbD) and to derive a combination of parameters that ensures the product meets the defined quality attributes, also known as a Design Space. DOE is a technique for planning experiments and subsequently analyzing the data obtained. It allows us to vary several experimental parameters simultaneously and systematically, so that sufficient information is obtained from a minimum number of experiments. Based on the data, a mathematical model of the studied process (e.g., a protein purification protocol or a chromatography step) is created. The model can then be used to find an optimum for the process and to understand the influence of the experimental parameters on the outcome. Modern software is used to create the experimental designs, to fit the model, and to visualize the generated information (Figure 1 shows an example of finding a region where the quality attribute is satisfied).
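As a minimal sketch of the workflow described above, the Python snippet below builds a small two-factor design in coded units and fits a quadratic response-surface model by least squares. The design type (face-centred central composite), the model form, the coefficients, and the simulated Yield values are all illustrative assumptions; they are not data from the insulin case study and not output from MODDE.

```python
import itertools
import numpy as np

def ccd_two_factors(n_center=3):
    """Face-centred central composite design for two factors, coded -1..+1."""
    corners = list(itertools.product([-1.0, 1.0], repeat=2))
    axial = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
    center = [(0.0, 0.0)] * n_center  # replicated centre points
    return np.array(corners + axial + center)

def quadratic_terms(X):
    """Expand factor settings into the columns of a full quadratic model:
    intercept, x1, x2, x1*x2, x1^2, x2^2."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1**2, x2**2])

def fit_quadratic(X, y):
    """Ordinary least-squares estimate of the model coefficients."""
    coef, *_ = np.linalg.lstsq(quadratic_terms(X), y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted response at the given factor settings."""
    return quadratic_terms(X) @ coef

# Hypothetical "true" process, used only to simulate measured Yield values.
rng = np.random.default_rng(0)
true_coef = np.array([85.0, 3.0, -2.0, 1.5, -6.0, -4.0])

design = ccd_two_factors()  # 11 runs: 4 corners, 4 axial, 3 centre
yield_obs = quadratic_terms(design) @ true_coef + rng.normal(0.0, 0.5, len(design))

coef = fit_quadratic(design, yield_obs)
print("fitted coefficients:", np.round(coef, 2))
print("predicted Yield at centre:", np.round(predict(coef, [[0.0, 0.0]]), 2))
```

Eleven runs are enough here to estimate the six model terms with some degrees of freedom left over for the residual error, which is the sense in which a designed experiment gets sufficient information from few experiments.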

Design space has to address risk management.

Although DOE can derive a region that supposedly satisfies the quality attributes, taking this region as the design space is overoptimistic, because there is always uncertainty from model error and measurement accuracy. If this uncertainty is ignored, simply selecting an optimum and taking it as the operating setpoint carries a high risk of failure. Figure 2 illustrates that, given uncertainty from parameter variation, model error, the measurement system, and process variability, a realistic model prediction (Yield) is a probability distribution rather than a single value. Not every point on the acceptable side of the minimum response contour (Yield = 80%) in Figure 1 can therefore be expected to guarantee >80% yield, and the design space in the insulin purification example could be smaller than the region displayed in Figure 1.
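The argument above can be stated compactly. The notation is an assumption introduced here for illustration and is not taken from the software: let $f$ be the fitted DOE model, $x_0$ the intended setpoint (Salt and EtOH), $\delta$ the random deviation of the factors from that setpoint, and $\varepsilon$ the combined model and measurement error. The realized Yield and the probability of meeting the specification are then

$$
Y = f(x_0 + \delta) + \varepsilon, \qquad p(x_0) = \Pr\left(Y \ge 80\,\%\right).
$$

A design space at certainty level $1 - \alpha$ is the set of setpoints

$$
\mathcal{D}_{\alpha} = \left\{\, x_0 : p(x_0) \ge 1 - \alpha \,\right\},
$$

whereas the region read directly from the contour plot is $\{\, x_0 : f(x_0) \ge 80\,\% \,\}$, which ignores both $\delta$ and $\varepsilon$. For small $\alpha$, $\mathcal{D}_{\alpha}$ is typically the smaller of the two regions.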

How to define a region that will fulfil the product specification profile with a quality estimate.

Sartorius-Stedim Biotech experts typically use MODDE® Design of Experiments Solution for Biopharm process development and services.

They have gained significant experience through successfully delivering many diverse process development projects, and they have trained a number of biopharma scientists and engineers who can easily use the software.

Given the ranges over which the parameters are expected to vary, Monte Carlo simulation on the DOE model estimates a new region that satisfies the quality attribute under a risk analysis criterion. It generates a probability contour plot that shows the region at different levels of certainty, which gives us confidence in selecting the operating setpoint. In addition, the software has an Optimizer tool that can derive a robust setpoint located at the centre of the (often highly irregular) design space volume, far from all the boundaries. This further reduces the risk (Figure 3).
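The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm MODDE uses: the quadratic model coefficients, the normally distributed factor disturbances, the residual standard deviation, and the "farthest from the boundary" rule for the robust setpoint are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical quadratic DOE model in coded units (x1 = Salt, x2 = EtOH).
COEF = np.array([85.0, 3.0, -2.0, 1.5, -8.0, -6.0])

def predict_yield(x1, x2, coef=COEF):
    """Point prediction of Yield (%) from the assumed quadratic model."""
    return (coef[0] + coef[1] * x1 + coef[2] * x2 + coef[3] * x1 * x2
            + coef[4] * x1**2 + coef[5] * x2**2)

def prob_meeting_spec(x1, x2, rng, spec=80.0, factor_sd=0.1,
                      resid_sd=1.0, n_sim=2000):
    """Monte Carlo estimate of P(Yield >= spec) at one setpoint.

    Factor disturbances and model/measurement error are both assumed normal.
    """
    d1 = rng.normal(0.0, factor_sd, n_sim)
    d2 = rng.normal(0.0, factor_sd, n_sim)
    noise = rng.normal(0.0, resid_sd, n_sim)
    y = predict_yield(x1 + d1, x2 + d2) + noise
    return float(np.mean(y >= spec))

def probability_map(grid, seed=1, **kwargs):
    """Probability of meeting the spec on a grid; rows index x2, columns x1."""
    rng = np.random.default_rng(seed)
    return np.array([[prob_meeting_spec(x1, x2, rng, **kwargs) for x1 in grid]
                     for x2 in grid])

def robust_setpoint(grid, pmap, certainty=0.99):
    """Grid point inside the accepted region that is farthest from any
    rejected grid point. The edges of the grid itself are not treated as
    boundaries in this simplified version."""
    xx, yy = np.meshgrid(grid, grid)
    inside = pmap >= certainty
    if not inside.any():
        return None
    pts_in = np.column_stack([xx[inside], yy[inside]])
    pts_out = np.column_stack([xx[~inside], yy[~inside]])
    if len(pts_out) == 0:
        centre = pts_in.mean(axis=0)
        return float(centre[0]), float(centre[1])
    dist = np.linalg.norm(pts_in[:, None, :] - pts_out[None, :, :], axis=2)
    best = pts_in[np.argmax(dist.min(axis=1))]
    return float(best[0]), float(best[1])

grid = np.linspace(-1.0, 1.0, 21)
pmap = probability_map(grid)
setpoint = robust_setpoint(grid, pmap)
print("share of grid with >= 99% certainty:", np.mean(pmap >= 0.99))
print("robust setpoint (coded Salt, EtOH):", setpoint)
```

Contouring `pmap` at levels such as 0.5 and 0.99 gives a probability contour plot of the kind described above; the accepted region shrinks as the required certainty rises.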

The setpoint analysis provided by the software is another useful tool: for any given setting, it lets us impose practical adjustments to the allowable parameter ranges and evaluate the consequences of such changes. Coming back to the insulin case study, by adjusting the Salt and EtOH setpoints and their disturbances, we can inspect the predicted Yield distribution (Figure 4). With this, we can understand how the Yield changes with the Salt and EtOH settings and their disturbances, and determine proven acceptable ranges (PAR) for the parameters.
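A rough sketch of this kind of what-if analysis is shown below. It is not the MODDE implementation: the model coefficients, the uniform disturbance within each allowed range, the residual standard deviation, and the 1% failure limit are hypothetical assumptions, and a real PAR determination would rest on more evidence than a single simulation.

```python
import numpy as np

# Hypothetical quadratic DOE model in coded units (Salt, EtOH).
COEF = np.array([85.0, 3.0, -2.0, 1.5, -8.0, -6.0])

def predict_yield(salt, etoh, coef=COEF):
    """Point prediction of Yield (%) from the assumed quadratic model."""
    return (coef[0] + coef[1] * salt + coef[2] * etoh + coef[3] * salt * etoh
            + coef[4] * salt**2 + coef[5] * etoh**2)

def simulate_yield(setpoint, half_range, resid_sd=1.0, n_sim=20000, seed=0):
    """Simulated Yield values when each factor varies uniformly within
    setpoint +/- half_range and normal model/measurement error is added."""
    rng = np.random.default_rng(seed)
    salt = rng.uniform(setpoint[0] - half_range[0], setpoint[0] + half_range[0], n_sim)
    etoh = rng.uniform(setpoint[1] - half_range[1], setpoint[1] + half_range[1], n_sim)
    return predict_yield(salt, etoh) + rng.normal(0.0, resid_sd, n_sim)

def failure_rate(y, spec=80.0):
    """Fraction of simulated batches that miss the Yield specification."""
    return float(np.mean(y < spec))

def widest_salt_range(setpoint, etoh_half_range, candidates, max_fail=0.01):
    """Largest candidate Salt half-range whose failure rate stays within
    max_fail. Stops at the first failing candidate, which assumes the
    failure rate does not decrease as the range widens."""
    best = None
    for h in sorted(candidates):
        y = simulate_yield(setpoint, (h, etoh_half_range))
        if failure_rate(y) > max_fail:
            break
        best = float(h)
    return best

setpoint = (0.2, -0.15)
tight = simulate_yield(setpoint, (0.05, 0.05))
wide = simulate_yield(setpoint, (1.0, 1.0))
print("failure rate, tight ranges:", failure_rate(tight))
print("failure rate, wide ranges: ", failure_rate(wide))
print("1st percentile of Yield, tight ranges:", np.round(np.percentile(tight, 1), 2))

salt_par = widest_salt_range(setpoint, 0.1, np.linspace(0.1, 1.0, 10))
print("widest acceptable Salt half-range (coded units):", salt_par)
```

Changing `setpoint` and `half_range` plays the role of moving the sliders: each change yields a new simulated Yield distribution whose lower tail can be compared against the specification.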

Figure 1. Use of the model to visualize and identify the region where Yield is larger than 80% in insulin purification, with Salt and EtOH as factors.

Figure 2. The effect of variation in Salt, EtOH, and model error on the Yield. The outcome of a model prediction should be a probability distribution that accounts for model error, measurement error, and variation in the parameters. Insulin purification example: Salt and EtOH as parameters, Yield as response.

Figure 3. The probability contour plot after Monte Carlo simulation of the DOE model. The green area shows 99.9% certainty that Yield > 80%. The outermost contour line indicates 50% certainty that Yield > 80%. The crosshairs show the robust setpoint, and the dotted frame shows the design space hypercube in 2D.

Figure 4. Setpoint analysis functionality in MODDE. We can adjust the Salt and EtOH settings and their disturbance ranges with the sliders (above) to inspect their distributions and the resulting distribution of Yield (below).