A Critical Review of Algae Process Optimization Using Design of Experiment Methodologies

Bohnen FM; Brück TB

doi:https://doi.org/10.47739/2333-7117/1015

Research Article | Open Access

Bohnen FM^1*, Brück TB^1,2

^1. Bohnen & Brück Sustainable Innovations UG, Germany
^2. Division of Industrial Biocatalysis, Department of Chemistry, Technical University of Munich (TUM), Germany

Corresponding Authors

Frank Bohnen, Bohnen & Brück Sustainable Innovations UG, Kattowitzer Str.34, D-51065 Köln, Germany, Tel: +49-1771911792

Abstract

Climate change, limited fossil resources and steady population growth drive the industrial development of sustainable biomass-based processes for the production of fuels, platform and specialty chemicals. In this context, phototrophic microalgae cultivation processes are poised to play a central role in the development of next-generation bioprocesses due to high cellular growth rates and excellent yields of value-adding products. Additionally, microalgae cultivation does not compete with food production and does not require agricultural land. However, industrial-scale realization of microalgae cultivation has been hampered by technical issues such as optimal light and gas supply and the economic configuration of the bioreactor system. The resulting multi-parameter space complicates targeted process optimization. To streamline experimental process optimization, mathematical modeling using design of experiment (DoE) methodologies can be applied. Initial experimental data can be used to refine primary mathematical models, so DoE can support an iterative process optimization. However, DoE model choice and optimization in a dependent multi-parameter space is complex and requires diligent analysis of model design and parameter outcome. Testing the data for normality is essential to derive a viable model. If these prerequisites are not met, misguided approaches to process optimization may result. In this study we reanalyze published data and demonstrate methods to streamline the interpretation of DoE data. Further, methodologies and strategies for data transformation are presented that improve model evolution. To the best of our knowledge, this is the first time that a sequential strategy for DoE model evolution was applied to an algae cultivation process. Guiding iterative process optimization for algae cultivation by robust DoE models will significantly contribute to accelerating process scale-up and time-to-market scenarios. Ultimately, these factors determine the success of an industrial process design.

Citation

Bohnen FM, Brück TB (2013) A Critical Review of Algae Process Optimization Using Design of Experiment Methodologies. JSM Biotechnol Bioeng 1(3): 1015.

Keywords

•   Algae processes optimization
•   Cultivation parameter
•   Design of experiment
•   Analysis tools
•   Box-Cox transformation

INTRODUCTION

The imminent end of fossil fuel resources, climate change and a continuously growing world population drive the development of sustainable fuel and commodity processes [1]. However, current industrial processes for the production of renewable energy carriers are associated with the redirection of food to fuel feedstocks, destruction of agricultural land and excessive use of potable water [1-2]. Therefore, there is an urgent need to establish new technologies that do not compete with food production and do not result in land use change. A leading technology option is the photoautotrophic cultivation of microalgae species for the production of biofuels, renewable chemicals, commodities and even pharmaceuticals [3-8].

Significant advantages of microalgae technology are the efficient use of CO2, rapid growth rates and the accumulation of high concentrations of value-adding products [3-10]. Further, algae can be grown on marginal lands using salt water instead of potable fresh water. Therefore, there is no competition with the production of food resources required to maintain an ever-growing population [3]. The efficiency of algae cultivation, and hence the yield of biogenic feedstocks for renewable processes, is highly dependent on the microalgae species, the cultivation mode and the cultivation conditions [3-11]. Numerous studies have examined the cultivation of various algae strains under different cultivation modes, particularly comparing different photobioreactor configurations with open pond cultivation systems [3-9]. This is of particular value for estimating the economic efficiency of specific cultivation methods [9]. However, these studies were predominantly conducted under different cultivation conditions, with varying media composition, gas input, illumination and temperature regimes [4-8]. Therefore, the data from these studies are mostly not directly comparable.

Deriving robust and scalable algae cultivation processes, however, requires systematic studies of process parameters [9]. Of particular interest is the relative importance of each cultivation parameter, as this will determine the strategy for downstream optimization processes. To achieve process parameter prioritization, mathematical models can rapidly guide the experimental validation procedures. To this end, a few algae cultivation studies have employed the mathematical toolbox of design of experiment (DoE) methodologies [10-11].

However, the application of design of experiments methodology is complex with pitfalls that can lead to data misinterpretation. In this paper we re-analyze literature data [10] and discuss the design of experiments methodologies. Ultimately, this DoE based approach can be a basis for a systematic strategy targeted at the optimization of algae cultivation processes. To derive a scalable, robust process it is imperative to obtain an exact mathematical description of complex algae systems, which can guide process optimization to find the optimal growth conditions.

METHODS

DoE models and data sets for the cultivation of the saline, lipid-forming microalga Dunaliella tertiolecta Butcher (ATCC 30929), which covered media composition and illumination regimes, were taken from the literature [10]. The study examined the effects of culture media components such as KNO3 (N-source) and NaCl as well as the importance of the illumination on biomass and triglyceride yields. A key finding was that the concentration of the nitrogen source governed the biomass and lipid yield, while the illumination regime was a subordinate process parameter. To test the validity of these models we reanalyzed the DoE data sets using the design of experiments tools of the Statistica 9 software suite [12]. To examine the validity of the literature data we included tests for normality and analysis of variance (ANOVA), and used the terms that are statistically relevant at the ≥ 95% probability level for mathematical model building and regression analysis [13].

RESULTS AND DISCUSSION

Re-evaluation of Dunaliella sp. growth parameters according to

i. Dependence of biomass, lipid yield on light intensity and medium salt concentrations

The influence of light intensity on algae growth is one of the most critical variables for any algae cultivation scale up, since photosynthesis rate is a function of light intensity [2,4]. However, intensive illumination particularly at the beginning of the cultivation phase can be counterproductive as it leads to bleaching of photosynthetic pigments and cell death [4]. For industrial scale up of cultivation systems it is of critical importance to determine the optimal operating window in terms of light intensity, since illumination for outdoor algae biomass production is dependent on the geographic latitude and the average climate conditions. For small laboratory applications the illumination can be tuned by lamps, but electricity costs need to be controlled and can be an economic bottleneck. In any economic scenario measures have to be taken to apply the optimum light intensity to the growth media [14].

Algae-based lipids are regarded as the future oil source for biodiesel [3]. Massart and Hantson report on the influence of light intensity, potassium nitrate and sodium chloride on biomass productivity and lipid content [10]. The authors used Dunaliella tertiolecta (ATCC 30929) cultured in a modified NORO medium. For optimal phototrophic growth, carbon dioxide was supplied as the sole carbon source by bubbling air (0.036% v/v CO2) through the medium. The effects of light intensity, potassium nitrate and sodium chloride concentrations were investigated by DoE methodologies using a face-centered response surface design with three factors (independent variables) at the levels 100, 200, 300 [μmol m-2 s-1] for light intensity, 1, 2, 3 [g l-1] for potassium nitrate and 10, 30, 50 [g l-1] for sodium chloride. The dependent variables (output variables) measured were biomass productivity [mg l-1 day-1] and lipid content [wt% of dry mass] after 6 days of culture. We tested the validity of the applied model and the resulting data sets using a test for normality and ANOVA.
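The layout of such a design can be sketched in a few lines of Python. This is a minimal illustration, assuming a standard face-centered central composite layout (cube corners, face centers and one center point). The number of center-point replicates used in the original study is not stated here, so the single center run is an assumption, and the function name `face_centered_design` is hypothetical.

```python
from itertools import product

# Factor levels (low, center, high) as reported for the re-analyzed study [10].
LEVELS = {
    "light": (100, 200, 300),  # μmol m-2 s-1
    "kno3": (1, 2, 3),         # g l-1
    "nacl": (10, 30, 50),      # g l-1
}

def face_centered_design(levels):
    """Return the runs of a face-centered central composite design."""
    names = list(levels)
    # Factorial part: every corner of the cube (-1 or +1 for each factor).
    coded = [list(corner) for corner in product((-1, 1), repeat=len(names))]
    # Axial part: one factor at -1 or +1, all others at the center (alpha = 1).
    for i in range(len(names)):
        for a in (-1, 1):
            point = [0] * len(names)
            point[i] = a
            coded.append(point)
    # One center point; the replicate count is an assumption of this sketch.
    coded.append([0] * len(names))
    # Map the coded levels -1, 0, +1 to the real factor settings.
    return [{n: levels[n][c + 1] for n, c in zip(names, run)} for run in coded]

design = face_centered_design(LEVELS)
for run in design:
    print(run)
```

With three factors this layout gives 8 corner runs, 6 face-center runs and 1 center run. Because the axial points sit on the faces of the cube, every factor is only ever set to one of its three reported levels.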

ii. Normality Test of available data sets

For the proper analysis of designed experiments a normal distribution of the dependent variables is required, since the ANOVA and regression models used here rest on a normality premise. In our re-examination of the data [10] we applied the Shapiro-Wilk test for normality (Figure 1) to biomass productivity [mg l-1 day-1] and lipid content [wt% of dry mass].

The normal probability plot (Figure 1) shows that the dependent variables biomass productivity and dry biomass follow a normal distribution, while lipid content did not pass the normality test. If a data set adheres to a normal distribution, the data points should follow a diagonal line, and the probability value should be greater than the chosen probability level. The sample distribution in Figure 1 indicates that the data for biomass productivity fit the diagonal line much better than the data for lipid content. This visual impression is confirmed by the outcome of the Shapiro-Wilk test. The probability level is set to p = 0.05, and the null hypothesis of the test is that the data follow a normal distribution. For biomass productivity the Shapiro-Wilk test yields p = 0.20483, which is greater than 0.05. Therefore the null hypothesis cannot be rejected, and the data can be regarded as normally distributed. For lipid content the Shapiro-Wilk test yields p = 0.00077, which is smaller than 0.05. In this case the null hypothesis is rejected: the data for lipid content do not follow a normal distribution. To obtain normality for lipid content, the raw data can be subjected to a mathematical transformation. Although ANOVA is regarded as very robust even for non-normally distributed data [6], it can potentially lead to inconsistent models. In the course of this report we demonstrate that the authors' modeling and interpretation of the influence of light intensity on the lipid content was limited. We will further show how the Box-Cox transformation of the lipid content data leads to more robust models and data interpretation. Our re-analysis indicates that the original authors' data interpretation can be improved.
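The decision rule can be written out explicitly. In the usual formulation of the Shapiro-Wilk test the null hypothesis is that the sample comes from a normal distribution, so a small p-value speaks against normality. The sketch below only encodes this decision step for the two p-values reported above. It does not compute the test statistic itself, and the function name `shapiro_wilk_decision` is hypothetical.

```python
def shapiro_wilk_decision(p_value, alpha=0.05):
    """Interpret a Shapiro-Wilk p-value; H0 is 'the data are normally distributed'."""
    if p_value < alpha:
        return "reject H0: data deviate from normality"
    return "fail to reject H0: data are compatible with normality"

# p-values reported in the text for the two dependent variables.
reported = {"biomass productivity": 0.20483, "lipid content": 0.00077}
decisions = {name: shapiro_wilk_decision(p) for name, p in reported.items()}

for name, decision in decisions.items():
    print(f"{name}: p = {reported[name]} -> {decision}")
```

When the raw measurements are available, the p-value itself can be obtained from a statistics package, for example `scipy.stats.shapiro` in Python, as an alternative to the Statistica suite used in this study.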

iii. ANOVA and modeling for biomass productivity

To determine the statistically relevant terms, an analysis of variance was performed. The ANOVA is based on an equation describing the linear, quadratic and two-factor interaction terms for the response of biomass productivity to light intensity, potassium nitrate and sodium chloride (see Equation 1).

Equation 1:

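$$\text{Biomass productivity} = k_0 + k_1[\mathrm{KNO_3}] + k_2[\mathrm{NaCl}] + k_3[\mathrm{Light}] + k_{11}[\mathrm{KNO_3}]^2 + k_{22}[\mathrm{NaCl}]^2 + k_{33}[\mathrm{Light}]^2 + k_{12}[\mathrm{KNO_3}][\mathrm{NaCl}] + k_{13}[\mathrm{KNO_3}][\mathrm{Light}] + k_{23}[\mathrm{NaCl}][\mathrm{Light}] \qquad \text{[Eq. 1]}$$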

Our re-calculation of the statistically relevant equation terms for biomass productivity confirms the authors' findings. In statistical terms only k2 [NaCl], k3 [Light], k33 [Light]^2 and k0 (intercept) contribute at a 95% confidence level to the biomass productivity response. The influence of the nitrogen source in the chosen range is statistically insignificant. Consequently, the model should only contain the coefficients k0, k2, k3 and k33 and is therefore reduced to the statistically relevant terms (see Equation 2).

Equation 2:

$$\text{Biomass productivity} = k_0 + k_2[\mathrm{NaCl}] + k_3[\mathrm{Light}] + k_{33}[\mathrm{Light}]^2 \qquad \text{[Eq. 2]}$$
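The relation between the full model and the reduced model can be made concrete with a short sketch: the reduced model is the full second-order model with the insignificant coefficients removed. The function names and the coefficient values below are hypothetical placeholders chosen for illustration. They are not the fitted coefficients of this study.

```python
def quadratic_terms(kno3, nacl, light):
    """Return the ten regressors of the full second-order model (Equation 1)."""
    return {
        "intercept": 1.0,
        "KNO3": kno3,
        "NaCl": nacl,
        "Light": light,
        "KNO3^2": kno3 ** 2,
        "NaCl^2": nacl ** 2,
        "Light^2": light ** 2,
        "KNO3*NaCl": kno3 * nacl,
        "KNO3*Light": kno3 * light,
        "NaCl*Light": nacl * light,
    }

def predict(coefficients, kno3, nacl, light):
    """Evaluate a model; terms missing from `coefficients` count as removed."""
    terms = quadratic_terms(kno3, nacl, light)
    return sum(coefficients.get(name, 0.0) * value for name, value in terms.items())

# Placeholder coefficients for a reduced model with the structure of Equation 2.
# These numbers are invented for illustration and are NOT fitted values.
reduced = {"intercept": 1.0, "NaCl": 2.0, "Light": 3.0, "Light^2": 0.5}

# Evaluate at the center point of the design (KNO3 = 2, NaCl = 30, Light = 200).
print(predict(reduced, kno3=2, nacl=30, light=200))
```

Representing a model as a mapping from term name to coefficient keeps the term list identical for all models, so a refinement step only has to delete keys.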

In the original paper the authors used all equation terms for modeling. When the results of the all-factor equation (Equation 1) are compared with our statistically refined equation (Equation 2), the differences in the predicted values are negligible. However, the confidence and prediction intervals are significantly smaller for the statistically refined model, and its predictions are therefore more accurate.

Table 1 shows the calculation for the factor levels 200 [μmol m-2 s-1] for light intensity, 2 [g l-1] for potassium nitrate and 30 [g l-1] for sodium chloride.

The improved predictability of Equation 2 can be explained by statistical descriptors. The mean square residual, or mean square error, is lower for the refined model (Equation 2) since the residual degrees of freedom are higher for the refined model than for the all-factor model (Equation 1). This outcome exemplifies that the essence of any effective DoE model building is the iterative elimination of statistically insignificant terms, which leads to a simpler and statistically more accurate model. Therefore, the simplified process model in Equation 2 allows a faster and more exact model interpretation.

Further, the mean square (MS) residual is a measure of how well the fit corresponds to the data points. In this context, a low value for the MS residual indicates an improved mathematical description of the process. If the model leads to a higher degrees of freedom value (df-value), more degrees of freedom are available for the error term calculation and the confidence and prediction intervals become smaller. Therefore, the refined model in Equation 2 is more powerful for process prediction purposes. In fact, the narrowing of the confidence and prediction intervals indicates that the refined model in Equation 2 is more suitable for establishing industrial process scale-up, since the derived risk analysis will be more accurate.
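The link between degrees of freedom and interval width can be written out with the standard least-squares expressions. The notation is introduced here for illustration and is not taken from the original report: $n$ is the number of runs, $p$ the number of fitted coefficients including the intercept, $X$ the model matrix and $x_0$ the vector of model terms at the point to be predicted.

$$MS_{res} = \frac{SS_{res}}{df_{res}}, \qquad df_{res} = n - p$$

$$\text{Confidence interval:} \quad \hat{y}_0 \pm t_{1-\alpha/2,\; df_{res}} \sqrt{MS_{res}\; x_0^{T}(X^{T}X)^{-1}x_0}$$

$$\text{Prediction interval:} \quad \hat{y}_0 \pm t_{1-\alpha/2,\; df_{res}} \sqrt{MS_{res}\,\bigl(1 + x_0^{T}(X^{T}X)^{-1}x_0\bigr)}$$

Removing a term lowers $p$ and raises $df_{res}$. The residual sum of squares $SS_{res}$ cannot decrease when a term is removed, but if the removed term explained little variation the gain in $df_{res}$ dominates, so $MS_{res}$ falls. The $t$ quantile also shrinks as $df_{res}$ grows. Both effects narrow the intervals, which matches the behavior described above.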

iv. ANOVA and modeling for lipid content

In the original report [10] the data for the lipid content were not subjected to a normality test. Consequently, ANOVA and model building were performed with the raw, non-normalized data.

By contrast, in our study application of the Shapiro-Wilk test for the lipid content confirmed a non-normal data distribution. In consequence, the normality prerequisite for variance analysis is not given for the lipid content.

For the determination of the statistically relevant factors the original publication describes an analysis of variance. The ANOVA is based on an equation describing the linear, quadratic and two-factor interaction terms for the response of lipid content to light intensity, potassium nitrate and sodium chloride (see Equation 3).

Equation 3:

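$$\text{Lipid content} = k_0 + k_1[\mathrm{KNO_3}] + k_2[\mathrm{NaCl}] + k_3[\mathrm{Light}] + k_{11}[\mathrm{KNO_3}]^2 + k_{22}[\mathrm{NaCl}]^2 + k_{33}[\mathrm{Light}]^2 + k_{12}[\mathrm{KNO_3}][\mathrm{NaCl}] + k_{13}[\mathrm{KNO_3}][\mathrm{Light}] + k_{23}[\mathrm{NaCl}][\mathrm{Light}] \qquad \text{[Eq. 3]}$$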

In the original paper the authors inferred from the ANOVA that only the terms k0 (intercept), k2 [NaCl] and k22 [NaCl]^2 are statistically significant. Therefore it was concluded that light intensity has no statistical relevance. Consequently, the proposed model for Dunaliella sp. was summarized as shown in Equation 4.

Equation 4:

$$\text{Lipid content} = k_0 + k_2[\mathrm{NaCl}] + k_{22}[\mathrm{NaCl}]^2 \qquad \text{[Eq. 4]}$$

In the original paper the authors nevertheless used all coefficients for the modeling (Table 2). Inconsistently with the reduced model, it was concluded on the basis of Equation 3 that medium light intensity at the highest NaCl concentration gives the best results regarding lipid content and that no photo-inhibition takes place.

The ANOVA in the original paper shows that the quadratic term for light intensity has a p-value of 0.08. This is close to the chosen probability level of p = 0.05, which indicates that in a refined model the term [Light]^2 could be of statistical relevance. For clarification, the threshold of p = 0.05 corresponds to the 95% probability level used throughout this report: a term is regarded as statistically relevant if its p-value falls below 0.05. The null hypothesis assumes that the term is not statistically relevant.

However, the derivation of a more accurate model based on statistically relevant data should follow a procedure of consecutive elimination of the least relevant term, followed by an ANOVA for the model without the excluded term. Since the exclusion of terms changes the model, the ANOVA results and regression coefficients change as well. When the statistically least relevant terms are eliminated step by step, the p-values of the remaining terms usually evolve towards lower numbers, since the model becomes more accurate in statistical terms.
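The step-by-step procedure described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Statistica workflow used in this study: terms are ranked by the two-sided t-test p-value of their regression coefficient, the intercept is always kept, the design is given in coded levels (-1, 0, +1), and the response `y` is synthetic (a known NaCl and Light^2 effect plus small fixed deviations). All function names are hypothetical.

```python
import math
from itertools import product

def solve_ols(X, y):
    """Least squares via the normal equations; returns (beta, inverse of X'X)."""
    p = len(X[0])
    # Augmented matrix [X'X | I], inverted by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    inv = [row[p:] for row in A]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    return beta, inv

def t_two_sided_p(t, df, steps=2000):
    """Two-sided Student t p-value by Simpson integration (coarse for huge |t|)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    h = abs(t) / steps
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    s = pdf(0.0) + pdf(abs(t))
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * pdf(k * h)
    return max(0.0, 1.0 - 2 * s * h / 3)

def backward_elimination(names, X, y, alpha=0.05, keep=("intercept",)):
    """Drop the least significant term, refit, repeat until all p < alpha."""
    names, X = list(names), [list(r) for r in X]
    while True:
        beta, inv = solve_ols(X, y)
        df = len(y) - len(names)
        resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
        ms_res = sum(e * e for e in resid) / df  # must be > 0 (noisy data)
        pvals = {n: t_two_sided_p(beta[j] / math.sqrt(ms_res * inv[j][j]), df)
                 for j, n in enumerate(names)}
        candidates = [n for n in names if n not in keep]
        if not candidates or max(pvals[n] for n in candidates) < alpha:
            return dict(zip(names, beta)), pvals
        j = names.index(max(candidates, key=pvals.get))
        names.pop(j)
        X = [r[:j] + r[j + 1:] for r in X]

# Coded face-centered design: 8 corners, 6 face centers, 1 center point.
coded = [list(c) for c in product((-1, 1), repeat=3)]
coded += [[a if k == i else 0 for k in range(3)] for i in range(3) for a in (-1, 1)]
coded.append([0, 0, 0])
names = ["intercept", "KNO3", "NaCl", "Light", "KNO3^2", "NaCl^2", "Light^2",
         "KNO3*NaCl", "KNO3*Light", "NaCl*Light"]
X = [[1.0, a, b, c, a * a, b * b, c * c, a * b, a * c, b * c] for a, b, c in coded]
# Synthetic response (NOT data from the study): NaCl and Light^2 effects plus noise.
noise = [0.05, -0.08, 0.02, 0.11, -0.04, -0.09, 0.07, 0.01,
         -0.03, 0.06, -0.10, 0.04, -0.02, 0.08, -0.05]
y = [10 + 3 * b - 2 * c * c + e for (a, b, c), e in zip(coded, noise)]

model, pvals = backward_elimination(names, X, y)
print(sorted(model))
```

Refitting after every single removal is the essential point of the procedure: the p-values of the remaining terms are recomputed with the updated residual degrees of freedom, so a term that looks borderline in the full model can become significant in a reduced one.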

v. Modeling of the raw lipid content data

For the evolution of a statistical model for the non-transformed data we followed the consecutive term elimination described above, with an ANOVA after each step. Figure 2 shows the sequential evolution of the model. Examination of Figure 2 indicates that the term [Light]^2 moves below the 0.05 p-threshold in iteration step 4 and is therefore statistically relevant at the 95% probability level. Contrary to the original report, the ANOVA thus indicates that the quadratic term of light intensity is statistically significant.

This is a major difference regarding the mathematical model of the experimental data. The claim that the system is independent of light intensity should be revised, since the refined model indicates that photo-inhibition takes place [15,16].

The re-evaluation of the raw data leads to a different model (refined raw data model):

Equation 5:

$$\text{Lipid content} = k_0 + k_2[\mathrm{NaCl}] + k_{22}[\mathrm{NaCl}]^2 + k_{33}[\mathrm{Light}]^2$$

The iterative modeling approach yields the following statistically relevant terms: linear and quadratic sodium chloride concentration and quadratic light intensity. Again, variation of the potassium nitrate concentration has no significant impact on the lipid content. The influence of sodium chloride is predominant, and high light intensities lead to photo-inhibition.

vi. Modeling of the Box-Cox transformed lipid content data

If dependent variables do not follow a normal distribution, the raw data should be transformed. In this respect, a Box-Cox transformation of the data is a straightforward methodology that can be applied to identify a suitable transformation. In principle, the Box-Cox transformation is a power transformation that can shift the raw data towards normality [13]. The most important parameter resulting from this procedure is the lambda (λ) value. If λ ≠ 0, λ is the exponent applied in the power transformation of each dependent data point; if λ = 0, each data point of the dependent variable is transformed to its natural logarithm. As indicated in various literature reports, the exact λ value need not be used; a nearby convenient value is sufficient [13,17]. In this respect, Table 2 summarizes λ values and the corresponding transformation procedures that can be applied to normalize experimental data [3,18,17].

We ran a Box-Cox transformation on the raw data for lipid content. The iteration led to λ = -0.0183, which is close to zero. Following the rules in Table 2, the lipid content data were transformed to their natural logarithm. To validate whether the transformed data ln(lipid content) follow a normal distribution, the Shapiro-Wilk test was applied.
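The transformation and the choice of λ can be illustrated with a short sketch. It assumes the standard Box-Cox definition, (y^λ - 1)/λ for λ ≠ 0 and ln(y) for λ = 0, and estimates λ by a simple grid search over the profile log-likelihood of an independent normal sample, which is a simplification of what a statistics package does for a regression model. The rounding table in `practical_transform` is a common rule of thumb and an assumption of this sketch, not a copy of Table 2. The sample values are synthetic.

```python
import math

def box_cox(values, lam):
    """Box-Cox power transform; reduces to ln(y) at lambda = 0."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def profile_log_likelihood(values, lam):
    """Profile log-likelihood of lambda for an independent normal sample."""
    n = len(values)
    z = box_cox(values, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)

def best_lambda(values, grid=None):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    grid = grid or [k / 100 for k in range(-200, 201)]
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))

def practical_transform(lam):
    """Round lambda to the nearest convenient transformation (rule of thumb)."""
    options = {-1.0: "1/y", -0.5: "1/sqrt(y)", 0.0: "ln(y)",
               0.5: "sqrt(y)", 1.0: "y (no transformation)", 2.0: "y^2"}
    nearest = min(options, key=lambda k: abs(k - lam))
    return options[nearest]

# Synthetic, right-skewed sample (NOT data from the study): logs are symmetric.
sample = [math.exp(x) for x in (-2, -1, 0, 1, 2)]
print(best_lambda(sample))

# The lambda value reported above for the lipid content data.
print(practical_transform(-0.0183))
```

Rounding λ to a convenient value keeps the transformed response interpretable: a logarithm is easier to reason about and to back-transform than a power of -0.0183.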

Figure 3 shows that the natural logarithm-transformed data passed the test for normality; they were therefore used for the ANOVA and model evolution. The initial formula for the modeling is shown in Equation 6.

Equation 6:

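$$\ln(\text{Lipid content}) = k_0 + k_1[\mathrm{KNO_3}] + k_2[\mathrm{NaCl}] + k_3[\mathrm{Light}] + k_{11}[\mathrm{KNO_3}]^2 + k_{22}[\mathrm{NaCl}]^2 + k_{33}[\mathrm{Light}]^2 + k_{12}[\mathrm{KNO_3}][\mathrm{NaCl}] + k_{13}[\mathrm{KNO_3}][\mathrm{Light}] + k_{23}[\mathrm{NaCl}][\mathrm{Light}]$$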

Using the consecutive elimination of the least significant term, followed by an ANOVA of the reduced model, the evaluation converged to a model with the statistically significant terms [NaCl], [NaCl]^2 and [Light]^2, leading to Equation 7.

Equation 7:

$$\ln(\text{Lipid content}) = k_0 + k_2[\mathrm{NaCl}] + k_{22}[\mathrm{NaCl}]^2 + k_{33}[\mathrm{Light}]^2$$

The analysis of the natural logarithm-transformed data yields the same statistically relevant terms as our revised raw data analysis. At first sight it might therefore seem unnecessary to apply a Box-Cox transformation. However, a comparison of the models shows significant differences.

vii. Comparison of the different models for lipid content

Table 3 shows the regression results, the mean square residuals, the degrees of freedom for the error term calculation, the prediction results for the center point of the design at [KNO3] = 2, [NaCl] = 30 and [Light] = 200 for model comparison, and the adjusted regression coefficients for the ANOVA effect estimates. The regression equations for the refined models (Equation 5 and Equation 7) contain fewer terms than the all-term model (Equation 3), as can be deduced from the degrees of freedom. Therefore, the error term calculation is more accurate for Equation 5 and Equation 7, whereas the all-term model (Equation 3) has broader confidence and prediction limits.

Although the refined models (Equation 5 and Equation 7) contain identical statistically relevant terms, the model using Box-Cox transformed data (Equation 7) is more accurate. In contrast to the other models, the Box-Cox transformed data meet the normality premise. The ANOVA calculations indicate that the MS residual for the transformed data is much lower. Consequently, the error term calculation is more accurate. For comparability, the predictions for the natural logarithm-transformed data were back-transformed using the exponential function (e^x). It can be deduced from Table 3 that the confidence and prediction intervals are narrowest for the model using Box-Cox transformed data and widest for the all-term raw data model. In addition, the regression coefficients for the ANOVA effect estimates are best for our recalculation (Equation 7).
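The back-transformation step can be sketched as follows. The inverse is applied to the prediction and to both interval limits, which is valid because the transformation is monotonic. The function names are hypothetical, and the numbers in the usage lines are invented for illustration. They are not values from Table 3.

```python
import math

def inverse_box_cox(z, lam):
    """Invert the Box-Cox transform; for lambda = 0 this is exp(z)."""
    if abs(lam) < 1e-12:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

def back_transform(prediction, lower, upper, lam=0.0):
    """Map a prediction and its interval limits back to the original scale."""
    return tuple(inverse_box_cox(v, lam) for v in (prediction, lower, upper))

# Hypothetical ln-scale prediction with a symmetric interval of +/- 0.2.
pred, low, high = back_transform(3.0, 2.8, 3.2)
print(f"prediction = {pred:.2f}, interval = [{low:.2f}, {high:.2f}]")
```

An interval that is symmetric on the logarithmic scale becomes asymmetric after back-transformation, with the upper limit lying further from the prediction than the lower limit. This should be kept in mind when back-transformed intervals are compared with intervals from raw data models.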

The lipid content calculated using the all-term model (Equation 3) and the refined raw data model (Equation 5) is not meaningful in statistical terms, since the confidence limits are very broad. Only the model using the transformed data (Equation 7) allows an accurate mathematical description and could be used for industrial scale-up purposes.

CONCLUSIONS

Limited fossil resources, climate change and steady population growth drive the industrial development of sustainable biomass-based processes for the production of fuels, platform and specialty chemicals [1]. Due to excellent biomass and product yields, phototrophic microalgae cultivation processes are poised to play a central role in the development of next-generation renewable processes [2-4]. Microalgae cultivation does not compete with food production and does not require agricultural land. However, industrial-scale realization of microalgae cultivation is complicated by technical issues such as optimal light and gas supply and the economic configuration of the bioreactor system. The resulting multi-parameter space complicates experimental process optimization. To streamline experimental process optimization, mathematical modeling using design of experiment (DoE) methodologies can be applied [10]. However, devising meaningful models that can guide experimental process optimization requires exact analysis of process parameters and needs to adhere to the statistical prerequisites for model building. Incoherent models and data analysis may prevent accurate process optimization. In this study, we have re-analyzed literature data and DoE models for the optimization of Dunaliella tertiolecta cultivation in a closed bioreactor system [10]. We demonstrate that the model applied in the literature can be further improved, and we present strategies and models that lead to a more accurate interpretation of the resulting data. In contrast to the reported mathematical model, our refined model clearly indicates that the production of algae biomass and associated triglycerides is significantly dependent on the light supply to the culture. If initial data sets, e.g. the light dependence of triglyceride formation [10], do not pass a normality test, methods such as the Box-Cox transformation should be applied to obtain normality. The transformed and normally distributed data can then be utilized for sequential model building and parameter examination. For rapid technical scale-up and commercialization of algae cultivation procedures, high-accuracy process modeling is required. Strict application of the DoE strategies presented here, together with the statistical prerequisites, will significantly improve algae cultivation processes, thereby accelerating their technical scale-up and commercialization. Good DoE models will not only accelerate commercialization but also enable an accurate risk analysis, because confidence and prediction limits are taken into account.

In summary, design of experiment is a very powerful tool for the optimization of biotechnological processes and for the mathematical modeling of the statistically relevant terms at a given probability level [18]. However, to avoid misinterpretation and to obtain accurate results, data analysis must follow several rules [19].

The DoE methodology requires normally distributed data [17,19,20]. After completion of the experimental series, a test for normality is a prerequisite. In this context, the Shapiro-Wilk test is a robust method. Failure of the normality test is usually caused by a skewed distribution pattern or by the experimental set-up.

To transform skewed distributions, the Box-Cox transformation is the method of choice. If the Box-Cox transformed data pass the test for normality, the evaluation should be continued with the transformed data. If the Box-Cox transformation fails, the whole data set and the experimental set-up have to be reviewed regarding e.g. inaccurate measurements, analytical precision, changes in raw materials or reaction conditions [21].

Accurate mathematical modeling should follow a consecutive elimination of the statistically least relevant terms based on the analysis of variance (ANOVA) until all terms of the mathematical model show statistical relevance at a given probability level. As a result, simplified models with enhanced statistical relevance are obtained [22].

If a Box-Cox transformation was applied, the mathematical model has to be back-transformed using the inverse of the function applied to the data [13].

The resulting mathematical description contains only statistically relevant terms and allows a solid scientific interpretation of the microorganism's response to the selected cultivation parameters [23].

In this study we analyzed published data applying the rules and prerequisites of DoE model building. We demonstrated that accurate model evolution results in improved predictability.

Our analysis adopting the sequential model evolution clearly indicates that this methodology leads to robust model building and simplified data interpretation.

To the best of our knowledge, this is the first time that a sequential model evolution approach [22] was adopted for algae process optimization [10,11,15,14,23-31]. Guiding iterative process optimization for algae cultivation by robust DoE models will significantly contribute to accelerating process scale-up and time-to-market scenarios. Ultimately, these factors determine the success of an industrial process design.