# Three-dimensional convolutional neural network (3D-CNN) for heterogeneous material homogenization

###### Abstract

Homogenization is a technique commonly used in multiscale computational science and engineering for predicting the collective response of heterogeneous materials and extracting effective mechanical properties. In this paper, a three-dimensional deep convolutional neural network (3D-CNN) is proposed to predict the effective material properties for representative volume elements (RVEs) with random spherical inclusions. The high-fidelity dataset generated by a computational homogenization approach is used for training the 3D-CNN models. The inference results of the trained networks on unseen data indicate that the network is capable of capturing the microstructural features of RVEs and produces an accurate prediction of effective stiffness and Poisson’s ratio. The benefits of the 3D-CNN over conventional finite-element-based homogenization with regard to computational efficiency, uncertainty quantification and model transferability are discussed in sequence. We find that the salient features of the 3D-CNN approach make it a potentially suitable alternative for facilitating material design with fast product design iteration and efficient uncertainty quantification.

###### keywords:

3D-CNN, convolutional neural network, deep learning, transfer learning, multiscale homogenization, heterogeneous material

Journal: Computational Materials Science

## 1 Introduction

The last few decades have seen widespread application of heterogeneous materials in the automotive industry and in civil, aerospace and mechanical engineering. These materials possess superior mechanical properties attributed to their unique architecture and complex microstructure. Most common among these materials are concrete, alloys, polymers and reinforced composites. A primary assumption generally made for computational modeling of composite materials is that these materials are periodic at the microscopic scale and that the periodic microstructures can be approximated by representative volume elements (RVEs). To develop composite materials with unusual combinations of properties, it is crucial to understand the effects of various characteristics of the RVE (microstructure, constituent phase, volume fraction, etc.) on the macroscopic material properties.

For most composite design problems, effective material properties are used instead of taking all the constituents and the microstructure into consideration. Much effort has been devoted to developing mathematical and/or numerical approaches for calculating the effective/homogenized material properties. The homogenization theory, which was originally developed to study partial differential equations (PDEs) with rapidly oscillating coefficients hornung2012homogenization, has been widely used to describe the mechanics of periodic microstructures of composites. Numerous homogenization approaches have been developed to calculate effective properties which can subsequently be used for macroscopic structural analysis. These approaches can be classified into three categories aboudi2012: (1) Analytical methods, e.g., the Voigt and Reuss model voigt; reuss1929; (2) Semi-analytical methods, e.g., generalized method of cells (GMC) aboudi2004gmc, self-consistent scheme (SCS) scs1968; scs1978, Mori-Tanaka method mori1973; (3) Numerical methods, e.g., finite element (FE) feyel1999fe2; feyel2000fe2; feyel2003fe2; miehe2002fe; smit1998fe; terada2001fe, boundary element (BE) kaminski1999bem; okada2001bem, fast Fourier transforms (FFT) lee2011fft; eisenlohr2013fft. Each of the aforementioned approaches has its pros and cons. For example, the Voigt and Reuss model provides quick but rough upper and lower bounds for various properties of a heterogeneous material; however, the gap between the bounds grows with the volume fraction (VF) of inclusions and the degree of phase contrast kanoute2009review. Although the numerical methods involve complicated discretizations and expensive computations, they make it possible to homogenize materials with arbitrary microstructures and constitutive models. 
These methods have been shown to be effective to model multiscale material behavior in both linear kaminski1999bem; terada2001fe; yuan2008homo and nonlinear miehe2002fe; feyel1999fe2; feyel2003fe2; feyel2000fe2; yuan2008homo; hain2008numerical; liu2014regularized; liu2016nonlocal problems given the properly defined material constituents and microstructure. However, when it comes to iterative computational design of composites with desired properties, these numerical approaches are not suitable owing to the huge computational cost fritzen2018two and high-dimensional sample space olson1997computational; lookman2019active; fujii2001composite.
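To make the remark about the Voigt and Reuss bounds concrete, the following minimal sketch evaluates both bounds for a two-phase composite. It applies the classical rule-of-mixtures formulas directly to Young's modulus, which is a common simplification and an assumption here; the phase moduli are the matrix and inclusion values reported in Table 1, and the function names are illustrative only.

```python
def voigt_bound(vf, e_matrix, e_inclusion):
    """Upper (iso-strain) bound: volume-weighted arithmetic mean."""
    return vf * e_inclusion + (1.0 - vf) * e_matrix


def reuss_bound(vf, e_matrix, e_inclusion):
    """Lower (iso-stress) bound: volume-weighted harmonic mean."""
    return 1.0 / (vf / e_inclusion + (1.0 - vf) / e_matrix)


E_MATRIX, E_INCLUSION = 68.9, 379.2  # GPa, phase moduli from Table 1

for vf in (0.02, 0.10, 0.28):
    upper = voigt_bound(vf, E_MATRIX, E_INCLUSION)
    lower = reuss_bound(vf, E_MATRIX, E_INCLUSION)
    # The gap between the bounds widens as the inclusion VF increases.
    print(f"VF={vf:.2f}: Reuss={lower:.1f} GPa, Voigt={upper:.1f} GPa, "
          f"gap={upper - lower:.1f} GPa")
```

Over the VF range considered in this work the gap widens steadily, which is why such bounds alone are of limited use for design.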

With the recent prevalence of data science, many machine learning (ML) approaches have been applied to material modeling, analysis and design. A novel framework named materials knowledge systems (MKS) landi2010mks; fast2011mks; kalidindi2015mks was formulated to exploit the merits of both analytical and numerical approaches. MKS has its theoretical roots in statistical continuum mechanics theory kroner1986statistical, in which the structure-property linkage of the material is expressed as a polynomial series sum. Each term of the series is a product of local microstructure-related statistics and their corresponding physics-related (or influence) coefficients landi2010mks, which reflect the underlying knowledge of the localization relationship. The core of MKS is employing the discrete Fourier transform (DFT) to calibrate these coefficients to the results obtained from finite element analysis (FEA). This framework has shown computational efficiency, a data-driven nature and remarkable accuracy in a variety of works landi2010mks; fast2011mks; kalidindi2015mks; yabansu2014mks; gupta2015mks. There are also other applications of ML approaches in computational materials and mechanics. Fritzen and Kunc fritzen2018two proposed a two-stage data-driven homogenization approach for nonlinear solids. Lookman *et al.* lookman2019active employed an active learning approach to navigate the search space and identify candidates for guiding experiments or computations. A surrogate model and a utility function are used for selecting among the unexplored data.

Traditional ML techniques rely largely on the feature engineering which is time-consuming and requires expert knowledge lecun2015deep. Deep learning (DL) approaches have been developed to address this problem. Typical DL approaches, such as fully connected neural networks (FC-NN), convolutional neural network (CNN) and long short-term memory (LSTM), can automatically find the most salient features to be learned. These approaches have demonstrated tremendous success in a variety of applications such as speech recognition, computer vision (CV), natural language processing (NLP), etc. They turned out to excel at discovering the intricate structures within high-dimensional data lecun2015deep. Some of the recent applications of DL approaches on material science include material classification zheng2016; Bell2015, defect classification masci2012; cha2017deep; faghih2016deep, microstructure identification azimi2018; chowdhury2016image, microstructure reconstruction li2018transfer; li2018GAN, composite strength prediction yeh1998modeling, etc. In this paper, we are mostly concerned with the works employing DL to address multiscale problems of composites, particularly in the context of homogenization. For example, Lu *et al.* lu2018data adopted neural networks (NN) to establish a surrogate model for electric conduction homogenization. By substituting the RVE calculations with the data-driven model in multiscale modeling, a drastic saving of computational cost (of the order of ) was achieved compared with the FE method feyel1999fe2. Le *et al.* le2015computational proposed a decoupled computational homogenization approach for nonlinear elastic materials using NN to approximate the effective potential. Li *et al.* li2018transfer employed the transfer learning idea on CNN for microstructure reconstruction. 
Bhattacharjee and Matouš bhattacharjee2016nonlinear performed both homogenization and localization on heterogeneous hyperelastic materials using a digital database and the manifold-based nonlinear reduced order model (MNROM). The mapping between the macroscopic loading conditions and the reduced space is realized through NN. Yang *et al.* yang18gan applied generative adversarial networks (GAN) to generate microstructures with desired material properties. Cang *et al.* cang2017microstructure implemented a convolutional deep belief network (CDBN) to automate a two-way conversion between microstructures and their lower-dimensional feature representations. Bostanabad *et al.* bost2016 adopted a supervised learning approach to characterize and reconstruct stochastic microstructures.

Most of the above studies are image-based and perform representation learning within a 2D space. To fully capture the salient features of the microstructure, the 3D geometry should be considered. Very recently, Yang *et al.* yang2018dl showed the potential of the three-dimensional CNN (3D-CNN) for effective elastic modulus homogenization of composites and demonstrated its advantages over traditional sophisticated physics-inspired approaches. In this work, we leverage the capability of the 3D-CNN and design a network architecture for predicting the effective material properties of composites with complex heterogeneous microstructure. In particular, we consider a composite material whose microstructure can be modeled as a two-phase (matrix/inclusion) representative volume element (RVE) with randomly distributed inclusions. A diverse group of RVEs, or virtual experiment samples, has been created with different inclusion VFs and spatial distributions, so that the sample space is large enough to include the intrinsic features of the material. Finite element analysis is then performed for each of the samples to obtain the effective moduli through linear homogenization. The geometric information of the RVEs has been pre-processed into a structured (Euclidean) grid that the 3D-CNN can accept. The networks are then trained, verified and tested on synthetic data. The salient features of the proposed 3D-CNN approach include: (1) It provides an end-to-end solution for predicting the effective material properties of the composites with high efficiency and good accuracy given the geometric information of the corresponding RVEs; (2) It is able to reproduce the probability distribution of the material properties for input characterized by uncertainty; and (3) Its transferability makes it extremely convenient to add supplementary data or to train a model for new datasets that come from different microstructure configurations. 
It is worth noting that the proposed 3D-CNN approach is more advantageous for heterogeneous materials with multiple constituents and extremely complex microstructure since it has demonstrated extraordinary ability in handling high-dimensional inputs ji3DCNN; maturana3dCNN; kamnitsas3dcnn; yang2018dl.

The rest of the paper is organized as follows. Section 2 describes the proposed methodology. Specifically, the generation of the training dataset (based on 2000 RVEs) is presented in Section 2.1, together with the pre-processing procedures: the conversion of the raw data into the input format of the 3D-CNN model, the computational homogenization approach used to obtain the labels, and the rescaling of the labels. In Section 2.2, the basic concepts and mathematical operations involved in the 3D-CNN are briefly introduced. Section 3 presents the numerical results. We first conduct a series of parametric tests on the hyperparameters of the 3D-CNN to find an optimal network architecture. Then a comparison between the 3D-CNN prediction and the FEA result is made with regard to accuracy and efficiency in Section 3.2. The benefits of the 3D-CNN approach over traditional FEM are discussed. Uncertainty quantification (UQ) is conducted in Section 3.3 to evaluate the performance of the current 3D-CNN model on input with uncertainty. In Section 3.4, the transferability of the proposed 3D-CNN model to a dataset representing a different type of composite microstructure is investigated. Section 4 is devoted to the conclusions of the paper and the outlook for future work.

## 2 Methodologies

### 2.1 Generation of dataset and preprocessing

In the present study, we consider particle-reinforced composites, e.g., metal matrix composites, whose microstructure can be represented by a parametric two-phase RVE model with a matrix phase and a particle phase. We generate 2000 RVE samples with the volume fraction (VF) of inclusions ranging from 2% to 28% to establish the dataset (see Fig. 1(a)). The radius of each spherical inclusion follows a uniform distribution in the range of 0.05 to 0.1 mm while the edge length of the cubic RVE is 1.0 mm. The spherical inclusions within the RVE are randomly distributed based on the Hierarchical Random Sequential Adsorption (HRSA) algorithm bai2014auto, which can achieve a user-defined VF. Generally, RVEs with low inclusion VF demonstrate greater randomness in the spatial distribution of particles, resulting in significant randomness of the effective elastic moduli. To resolve this issue, we impose an exponential distribution on the number of samples with regard to the VF, as shown in Fig. 2, so that the samples are more likely to cover the manifold of the relationship between random RVEs and the effective elastic properties. This practice is meant to better capture the spatial characteristics of the RVE during training.

Preprocessing is required to convert the geometric data (or discretized mesh data) of RVEs into Euclidean grids, the input format that a 3D-CNN can take. We resample the phase information of RVEs on fixed Cartesian grids. In particular, these RVEs are converted to voxels in which the matrix phase is denoted by 0 and the inclusion phase by 1 (see Fig. 1(b)). Given the center location and radius of every inclusion, a level-set function is used to assign a binary phase value $\phi$ to the voxel with coordinate $(x, y, z)$, namely,

$$\phi(x, y, z) = \begin{cases} 1, & \text{if } \min\limits_{1 \le i \le N} \left[ (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 - r_i^2 \right] \le 0, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$

where $N$ is the total number of inclusions; $x_i$, $y_i$, $z_i$ and $r_i$ are the center coordinates and radius of the $i$th spherical inclusion, respectively. It is noted that a grid size of $101 \times 101 \times 101$ (voxel length of 0.01 mm) is selected in order to cover all the microstructural details within the RVEs since the minimum radius for the spherical inclusion is 0.05 mm.
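The voxelization step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the preprocessing code used in this work: the grid points are taken at a spacing of 0.01 mm over a 1.0 mm RVE, periodic images of inclusions that cross the RVE boundary are ignored, and the function name `voxelize_spheres` as well as the two example inclusions are hypothetical.

```python
import numpy as np


def voxelize_spheres(centers, radii, n_voxels=101, length=1.0):
    """Return an (n, n, n) uint8 array: 1 inside any sphere, 0 in the matrix."""
    coords = np.linspace(0.0, length, n_voxels)  # grid-point coordinates (mm)
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    phase = np.zeros((n_voxels,) * 3, dtype=np.uint8)
    for (cx, cy, cz), r in zip(centers, radii):
        # Level-set test: the squared distance to the center minus r^2 is <= 0.
        inside = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r ** 2
        phase[inside] = 1
    return phase


# Two hypothetical inclusions with radii at the ends of the 0.05 to 0.1 mm range.
centers = np.array([[0.3, 0.3, 0.3], [0.7, 0.6, 0.5]])
radii = np.array([0.10, 0.05])
voxels = voxelize_spheres(centers, radii)
print(voxels.shape, voxels.mean())  # grid shape and voxel-based VF estimate
```

The mean of the binary array gives a voxel-based estimate of the inclusion VF, which is a convenient sanity check against the VF requested from the RVE generator.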

Table 1: Material properties of the constituent phases.

Materials | Young’s modulus (GPa) | Poisson’s ratio |
---|---|---|
Matrix | 68.9 | 0.33 |
Inclusion | 379.2 | 0.21 |

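The deep learning method falls into the category of supervised learning in which the training data need to be labelled. In this paper, linear elastic materials are considered for both the matrix and inclusion phases. The material properties of each single phase used in this study are given in Table 1. Since the considered composite is assumed to be orthotropic, its constitutive tensor has 9 independent variables from which the following vector of effective material properties can be obtained:

$$\mathbf{y} = \left[ E_{11}, E_{22}, E_{33}, G_{12}, G_{23}, G_{31}, \nu_{12}, \nu_{13}, \nu_{21}, \nu_{23}, \nu_{31}, \nu_{32} \right]^{T} \tag{2}$$

where $\mathbf{y}$ denotes the label for each RVE sample; $E$’s, $G$’s and $\nu$’s denote the effective elastic modulus, shear modulus and Poisson’s ratio, respectively, along different directions. The computational homogenization is conducted based on the framework of the classical mathematical homogenization theory guedes1990homo; yuan2008homo via FEM. Specifically, the homogenized constitutive tensor $\bar{L}_{ijkl}$ can be calculated through averaging over the entire volume $\Theta$ of the RVE, expressed as

$$\bar{L}_{ijkl} = \frac{1}{|\Theta|} \int_{\Theta} \Sigma_{ij}^{kl}(\boldsymbol{\xi}) \, d\Theta \tag{3}$$

in which $\Sigma_{ij}^{kl}(\boldsymbol{\xi})$ is the stress influence function with regard to the fine-scale coordinate $\boldsymbol{\xi}$. It can be interpreted as the fine-scale stress induced by a unit overall strain $\bar{\varepsilon}_{kl}$. The implementation of numerical homogenization is achieved by solving an RVE (or unit cell) problem under periodic boundary conditions (PBCs) and unit thermal strain yuan2008homo. The components of the constitutive tensor can then be obtained by averaging the resulting stress field $\sigma_{ij}^{kl}$ over the volume, given by

$$\bar{L}_{ijkl} = \frac{1}{|\Theta|} \int_{\Theta} \sigma_{ij}^{kl}(\boldsymbol{\xi}) \, d\Theta \tag{4}$$

The constitutive tensor can be represented in the Voigt notation, written as

$$\bar{\mathbf{L}} = \begin{bmatrix} \bar{L}_{1111} & \bar{L}_{1122} & \bar{L}_{1133} & 0 & 0 & 0 \\ \bar{L}_{2211} & \bar{L}_{2222} & \bar{L}_{2233} & 0 & 0 & 0 \\ \bar{L}_{3311} & \bar{L}_{3322} & \bar{L}_{3333} & 0 & 0 & 0 \\ 0 & 0 & 0 & \bar{L}_{2323} & 0 & 0 \\ 0 & 0 & 0 & 0 & \bar{L}_{3131} & 0 \\ 0 & 0 & 0 & 0 & 0 & \bar{L}_{1212} \end{bmatrix} \tag{5}$$

The inverse of $\bar{\mathbf{L}}$ results in the so-called compliance matrix shown as follows, from which the vector of effective material properties can be calculated.

$$\bar{\mathbf{L}}^{-1} = \begin{bmatrix} \dfrac{1}{E_{11}} & -\dfrac{\nu_{21}}{E_{22}} & -\dfrac{\nu_{31}}{E_{33}} & 0 & 0 & 0 \\ -\dfrac{\nu_{12}}{E_{11}} & \dfrac{1}{E_{22}} & -\dfrac{\nu_{32}}{E_{33}} & 0 & 0 & 0 \\ -\dfrac{\nu_{13}}{E_{11}} & -\dfrac{\nu_{23}}{E_{22}} & \dfrac{1}{E_{33}} & 0 & 0 & 0 \\ 0 & 0 & 0 & \dfrac{1}{G_{23}} & 0 & 0 \\ 0 & 0 & 0 & 0 & \dfrac{1}{G_{31}} & 0 \\ 0 & 0 & 0 & 0 & 0 & \dfrac{1}{G_{12}} \end{bmatrix} \tag{6}$$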
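The step from the homogenized constitutive matrix to the engineering constants can be illustrated with a short NumPy sketch. It assumes a $6 \times 6$ orthotropic matrix in Voigt order (11, 22, 33, 23, 13, 12) with engineering shear strains; the function names are hypothetical, and the isotropic matrix built from the Table 1 matrix-phase values serves only as a check case, not as a homogenization result.

```python
import numpy as np


def engineering_constants(stiffness):
    """Extract E, G and nu from a 6x6 orthotropic stiffness matrix.

    Assumed Voigt order: 11, 22, 33, 23, 13, 12 (engineering shear strains).
    """
    s = np.linalg.inv(stiffness)        # invert the constitutive matrix
    e = 1.0 / np.diag(s)[:3]            # E11, E22, E33
    g = 1.0 / np.diag(s)[3:]            # G23, G13, G12
    # nu[i, j] = nu_ij = -S[j, i] * E_i; the diagonal entries are not meaningful.
    nu = -s[:3, :3].T * e[:, None]
    return e, g, nu


def isotropic_stiffness(e, nu):
    """Isotropic 6x6 stiffness matrix, used here only as a check case."""
    lam = e * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = e / (2.0 * (1.0 + nu))
    c = np.zeros((6, 6))
    c[:3, :3] = lam
    c[np.arange(3), np.arange(3)] = lam + 2.0 * mu
    c[np.arange(3, 6), np.arange(3, 6)] = mu
    return c


# Check case: the matrix phase of Table 1 (E = 68.9 GPa, nu = 0.33).
e, g, nu = engineering_constants(isotropic_stiffness(68.9, 0.33))
print(e, g, nu[0, 1])
```

For an isotropic input the three Young's moduli coincide and every off-diagonal Poisson's ratio equals the input value, which gives a quick way to validate the index conventions before applying the routine to homogenized RVE results.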

The entire dataset is randomly divided into training, validation and testing sets with a ratio of 1400:300:300. The training set is used for learning the parameters (i.e., weights and biases) of the 3D-CNN (see Section 2.2) while the validation set is used to tune the hyperparameters (i.e., the architecture) of the 3D-CNN. The validation set is also adopted as a regularizer via early stopping, i.e., training is stopped when the loss function on the validation set increases, as this is a sign of overfitting to the training dataset ripley1996pattern. The testing set, which is unseen during the training process, serves to confirm and evaluate the actual predictive power of the trained deep learning model.
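A random 1400:300:300 partition of the 2000 samples can be obtained by shuffling the sample indices once, as in the minimal sketch below. The seed value and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed only for reproducibility
n_samples = 2000

# Shuffle all sample indices once, then cut at 1400 and 1700.
indices = rng.permutation(n_samples)
train_idx, val_idx, test_idx = np.split(indices, [1400, 1700])

print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting index arrays rather than the voxel data themselves keeps the large 3D inputs in place and makes the partition easy to store alongside the dataset.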

Since the RVEs in this paper are generated artificially, we can directly extract the microstructure information from the formatted data. However, how to obtain the phase information of samples from field measurements is an issue of interest. Nondestructive imaging techniques such as X-ray micro-tomography stienon2009xray; proudhon2007xray; Alp2020, 3-D atom probe kelly2007atom and automated serial sectioning spowart2006automated have made it possible to capture 3D material microstructures. These imaging techniques are characterized by high resolution. For example, synchrotron radiation micro-tomography is able to sample a microstructure with a resolution of 2048 voxels in each dimension betz2007imaging. Therefore, it is promising to incorporate field measurement techniques into the current framework with appropriate down-sampling of the microstructure data. Nevertheless, this is beyond the scope of the current study.

### 2.2 3D convolutional neural network

The convolutional neural network (CNN or ConvNet) was originally proposed to solve computer vision problems. LeCun *et al.* lecun1989 designed one of the very first CNNs to successfully recognize handwritten digits in the 1990s. The applications of CNNs were limited by the computational power available at that time. In recent years, the CNN approach has been revived owing to the huge advancements in computational hardware such as general-purpose graphics processing units (GPUs). The CNN differs from classical FC-NNs in its weight-sharing mechanism. In this study, we propose a 3D-CNN architecture (see Fig. 3) for inferring homogenized/effective material properties (e.g., elastic moduli, shear moduli and Poisson’s ratios) from given microstructure configurations (e.g., discretized distribution of material phases).

The 3D-CNN takes the preprocessed phase voxels as the input. The subsequent convolutional layers, with 3D convolution filters and pooling operations, are the critical components of the CNN. As indicated in Fig. 4, the 3D filter scans over the phase voxels and applies the convolutional operation (dot product of tensors) to produce the feature map. The weights and biases of each filter are trained to extract the salient features from the input. Stride, padding and filter size are a few common hyperparameters defining convolutional operations. Stride denotes the step size by which the filters move each time. For instance, a stride length of 1 means the filters scan the volume voxel by voxel. To preserve the spatial size of the output, it is convenient to pad the input with zero-valued voxels. For example, the input and output sizes in Fig. 4 will be identical if the convolution operations are conducted with a stride of 1 and 2-layer zero padding. Pooling layers are usually added between successive convolutional layers in the CNN. They progressively reduce the spatial size of the data by down-sampling the voxel values. Pooling operations may compute the maximum or average value within a volume. Fig. 5 demonstrates how the max-pooling operation works. The activation layers are employed to introduce nonlinearity into the CNN. An activation layer takes a single number and applies a fixed mathematical function to it. Some typical activation functions are the Rectified Linear Unit (ReLU) $f(x) = \max(0, x)$, the Sigmoid function $f(x) = 1/(1 + e^{-x})$ and the tanh function $f(x) = \tanh(x)$. Among these nonlinear functions, ReLU (see Fig. 6) is preferred and thus selected owing to its cheap arithmetic operation and excellent convergence properties with the stochastic gradient descent (SGD) algorithm compared with the Sigmoid or tanh functions. The mathematical expression of the output value $v_{ij}^{xyz}$ at position $(x, y, z)$ on the $j$th feature map in the $i$th 3D convolutional layer can be written as ji3DCNN

$$v_{ij}^{xyz} = f\left( b_{ij} + \sum_{m=1}^{M_{i-1}} \sum_{p=0}^{P_i - 1} \sum_{q=0}^{Q_i - 1} \sum_{r=0}^{R_i - 1} w_{ijm}^{pqr} \, v_{(i-1)m}^{(x+p)(y+q)(z+r)} \right) \tag{7}$$

where $f$ denotes the element-wise ReLU function; $b_{ij}$ is the common bias for the $j$th feature map; $w_{ijm}^{pqr}$ is the $(p, q, r)$th value of the 3D filter for the $j$th feature map at the $i$th layer associated with the $m$th feature map in the $(i-1)$th layer; $M_{i-1}$ is the number of feature maps at the $(i-1)$th layer; and $P_i \times Q_i \times R_i$ denotes the size of the 3D filter at the $i$th layer. In this paper, a constant filter size of $5 \times 5 \times 5$ is used throughout the convolutional layers.
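The operations described for Eq. (7) can be written out explicitly as a naive NumPy loop. The sketch below is meant only to expose the arithmetic (a stride-1 convolution without padding, followed by ReLU, and a non-overlapping max pooling); deep learning libraries implement the same operations far more efficiently. The array layouts, function names, pooling window of 2 and random test volume are assumptions made for this illustration.

```python
import numpy as np


def conv3d_relu(inputs, weights, bias):
    """Stride-1 3D convolution without padding, followed by ReLU.

    inputs:  (M, D, H, W)     M input feature maps
    weights: (J, M, P, Q, R)  J filters, each spanning all M input maps
    bias:    (J,)             one common bias per output feature map
    """
    n_out, _, p, q, r = weights.shape
    _, d, h, w = inputs.shape
    out = np.zeros((n_out, d - p + 1, h - q + 1, w - r + 1))
    for j in range(n_out):
        for x in range(out.shape[1]):
            for y in range(out.shape[2]):
                for z in range(out.shape[3]):
                    patch = inputs[:, x:x + p, y:y + q, z:z + r]
                    out[j, x, y, z] = bias[j] + np.sum(weights[j] * patch)
    return np.maximum(out, 0.0)  # element-wise ReLU


def max_pool3d(feature_maps, size=2):
    """Non-overlapping max pooling; voxels that do not fill a window are dropped."""
    c, d, h, w = feature_maps.shape
    d2, h2, w2 = d // size, h // size, w // size
    trimmed = feature_maps[:, :d2 * size, :h2 * size, :w2 * size]
    blocks = trimmed.reshape(c, d2, size, h2, size, w2, size)
    return blocks.max(axis=(2, 4, 6))


rng = np.random.default_rng(0)
volume = rng.integers(0, 2, size=(1, 9, 9, 9)).astype(float)  # toy binary phase field
filters = rng.normal(size=(2, 1, 5, 5, 5))
features = conv3d_relu(volume, filters, np.zeros(2))
print(features.shape, max_pool3d(features).shape)
```

With a $5 \times 5 \times 5$ filter and no padding, each spatial dimension shrinks by 4 voxels per convolutional layer, which is why the feature maps of the toy example are $5 \times 5 \times 5$.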

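FC layers are employed at the end of the 3D-CNN, where the neurons of two neighboring layers are fully interconnected. FC layers take the flattened tensor from the previous hidden layer as the input and map it to the desired output, which is exactly the vector of effective material properties with length of 12 as shown in Eq. (2). The connection between two adjacent layers, here from the $l$th to the $(l+1)$th, can be expressed concisely in the form of tensor operations, given by

$$\mathbf{a}^{(l+1)} = \sigma\left( \mathbf{W}^{(l)} \mathbf{a}^{(l)} + \mathbf{b}^{(l)} \right) \tag{8}$$

where $\mathbf{a}^{(l)}$ and $\mathbf{a}^{(l+1)}$ are the input and output for the $l$th layer; $\sigma$ denotes the Sigmoid activation function acting element-wise; $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$ are the weight matrix and bias vector between the $l$th and the $(l+1)$th FC layers. The weights and biases in the FC layers are also trainable parameters of the 3D-CNN. The mean square error (MSE) between the 3D-CNN’s prediction $\hat{\mathbf{y}}$ and the ground truth $\mathbf{y}$ of the training dataset is adopted as the loss function, given by

$$\mathcal{L}(\mathbf{W}, \mathbf{b}; \mathcal{D}) = \frac{1}{12N} \sum_{n=1}^{N} \sum_{k=1}^{12} \left( \hat{y}_k^{(n)} - y_k^{(n)} \right)^2 \tag{9}$$

where $\mathcal{D}$ denotes the training dataset $\{(\mathbf{X}^{(n)}, \mathbf{y}^{(n)})\}_{n=1}^{N}$ of phase voxels $\mathbf{X}^{(n)}$ and labels $\mathbf{y}^{(n)}$, $N$ denotes the total number of samples, and $k$ denotes the index of the component of the effective properties vector. The optimal parameters $\mathbf{W}^{*}$, $\mathbf{b}^{*}$ can be obtained by minimizing the loss function, namely,

$$\{\mathbf{W}^{*}, \mathbf{b}^{*}\} = \underset{\mathbf{W}, \mathbf{b}}{\arg\min} \; \mathcal{L}(\mathbf{W}, \mathbf{b}; \mathcal{D}) \tag{10}$$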

A common issue facing DNN-based approaches is the overfitting brought about by their extraordinary approximation ability. Several treatments are considered in this paper. Firstly, it is noted that there is a scale difference between the outputs of elastic (or shear) modulus and Poisson’s ratio, which might cause problems for the optimization. For example, an output variable with a large range of values could result in large error gradient values, causing weight values to change dramatically and making the learning process unstable bishop1995neural. Therefore, label rescaling is employed here to address this problem. The elastic (or shear) moduli and Poisson’s ratios are scaled separately into the range of 0 to 1 by min-max scaling, e.g.,

$$\tilde{\mathbf{y}}_k = \frac{\mathbf{y}_k - \min(\mathbf{y}_k)}{\max(\mathbf{y}_k) - \min(\mathbf{y}_k)} \tag{11}$$

where $\mathbf{y}_k$ denotes the output component vector while $\tilde{\mathbf{y}}_k$ is the corresponding scaled output. In addition to label rescaling, early stopping girosi1995regularization and sample shuffling during training are adopted as regularizers to alleviate overfitting.
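A minimal sketch of the label rescaling is given below, assuming each of the output components is scaled independently using the minimum and maximum observed over the training labels. The helper names and the small label array (one modulus column and one Poisson's ratio column) are hypothetical and serve only to show the round trip between physical and scaled values.

```python
import numpy as np


def fit_min_max(labels):
    """Column-wise minimum and maximum over the training labels."""
    return labels.min(axis=0), labels.max(axis=0)


def scale(labels, lo, hi):
    """Map every column into the range [0, 1]."""
    return (labels - lo) / (hi - lo)


def unscale(scaled, lo, hi):
    """Map network outputs back to physical units."""
    return scaled * (hi - lo) + lo


# Hypothetical labels: column 0 is a modulus in GPa, column 1 a Poisson's ratio.
labels = np.array([[70.0, 0.329],
                   [85.0, 0.315],
                   [110.0, 0.298]])
lo, hi = fit_min_max(labels)
scaled = scale(labels, lo, hi)
print(scaled)
print(unscale(scaled, lo, hi))
```

The minimum and maximum should be computed on the training set only and stored, so that predictions on new RVEs can be mapped back to physical units with the same transformation.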

In this paper, a filter size of $5 \times 5 \times 5$, a stride length of 1 and no padding are configured for the convolutional layers. Max pooling is applied in the pooling layers. The ReLU function is selected as the activation function due to the aforementioned merits. Other hyperparameters, such as the number of filters and the depth of the convolutional and FC layers, are selected through the parametric tests in Section 3.1. An adaptive learning rate optimization algorithm, Adam kingma2014adam, is used for the training of the 3D-CNN models.

## 3 Results

In this section, the performance of the proposed 3D-CNN for heterogeneous material homogenization is evaluated. A series of parametric tests on the network hyperparameters (e.g., filter size, depth, width) of the 3D-CNN is conducted to find a suitable architecture for the current application. Then the trained 3D-CNN is used to predict the effective properties on the testing dataset with 300 RVEs. The performance of the 3D-CNN is discussed based on a comparison between the model inference and the results produced by traditional FEA. Since the randomness of the inclusion distribution is a significant aspect of naturally occurring heterogeneous materials, uncertainty quantification is conducted on an independent dataset that imitates input with uncertainty. Finally, the transferability of the trained 3D-CNN model to a new dataset (for RVEs with different inclusion shapes) is examined. The proposed 3D-CNN architecture is implemented with the high-level neural networks API Keras 2015keras using Python 3.7. Our networks are trained on a platform equipped with an NVIDIA GeForce GTX 1080 Ti GPU and an Intel Core i9-7980XE CPU.

### 3.1 Design of the 3D-CNN architecture

A typical CNN involves dozens of hyperparameters that control the learning process of the network. These include the number of filters, filter size, learning rate, number of hidden layers and batch size, to name just a few. The huge sample space makes it nearly impossible to find an optimal combination of hyperparameters. Therefore, the hyperparameters are usually searched in a trial-and-error manner within a small sample space. Fortunately, some rules of thumb for selecting the hyperparameters can be applied here. For example, the number of filters in a convolutional layer should reflect the richness of the characteristic features within the input. It usually depends on the number of samples and the complexity of the task krizhevsky2012imagenet. The number of FC layers and neurons directly determines the total number of parameters (weights and biases) and thus affects the representational power of the network cybenko1989approximation. Therefore, it is natural to select the hyperparameter combination based on the underlying physical and mathematical interpretation of the “knowledge” to be learned. In this section, we evaluate different 3D-CNN architectures with varying numbers of hidden layers and filters. The MSE on the validation dataset, defined in Eq. (9), is used to measure the performance of each 3D-CNN architecture.

As mentioned in Section 2.1, the Cartesian grid used to sample the RVE is of size $101 \times 101 \times 101$ so that the smallest inclusion, with a radius of 0.05 mm, can be captured. In our design of the 3D-CNN architecture, we fix the filter size at 5 in all three dimensions so that it is identical to the size of the smallest inclusion. The batch size during training is set to 25 according to the memory space available on the hardware. For each architecture, the trained model with the best performance, i.e., the lowest MSE, within 1000 epochs is saved for later inference. This is the commonly used technique referred to above as early stopping. Table 2 provides the configurations of each 3D-CNN architecture. The convolutional layers and fully connected layers are denoted by Conv() and FC(), respectively. The values within the brackets of Conv() indicate the number of filters and the filter size. Similarly, the values within the brackets of FC() represent the number of neurons (width) in each layer. For example, Conv(32, 5) means the convolutional layer has 32 filters of size $5 \times 5 \times 5$, while FC(64, 32) means the FC part is composed of two layers whose widths are 64 and 32, respectively.

The corresponding MSE of each architecture is listed in Table 2. It can be inferred from Case 2 and Case 5 that increasing the number of filters in each convolutional layer does not necessarily improve the prediction performance. A large network width might cause overfitting to the training dataset. A similar situation is met when increasing the number of convolutional layers (e.g., Case 4) or FC layers (e.g., Case 7) on the basis of Case 2. Moreover, the comparison between Cases 1-3 demonstrates that two FC layers with 64 and 32 neurons, respectively, deliver the best prediction performance on the unseen validation dataset. Taking both the accuracy and efficiency of the listed architectures into account, the 3D-CNN architecture with the hyperparameters shown in Case 2 is employed in the remainder of this paper.
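To give a feeling for the size of the Case 2 network, the feature-map dimensions and the number of trainable parameters can be tallied layer by layer. The sketch below rests on assumptions that are not stated explicitly here: one max-pooling layer with a $2 \times 2 \times 2$ window follows each convolutional layer, and the last FC layer is followed by an output layer with 12 neurons. The helper names are illustrative.

```python
def conv_out(size, kernel=5, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2):
    """Spatial output size of non-overlapping pooling (assumed 2x2x2 window)."""
    return size // pool


size, channels = 101, 1          # 101^3 binary phase voxels, one input channel
n_params = 0
for n_filters in (16, 16, 32):   # Case 2: Conv(16,5)+Conv(16,5)+Conv(32,5)
    n_params += (5 ** 3 * channels + 1) * n_filters  # filter weights + one bias each
    size = pool_out(conv_out(size))
    channels = n_filters
    print(f"Conv({n_filters},5) + pool -> {channels} maps of {size}^3")

flat_features = channels * size ** 3
n_inputs = flat_features
for width in (64, 32, 12):       # FC(64, 32) plus an assumed 12-neuron output layer
    n_params += (n_inputs + 1) * width
    n_inputs = width

print("flattened features:", flat_features)
print("trainable parameters:", n_params)
```

Under these assumptions most of the parameters sit in the first FC layer, which is consistent with the observation that widening the FC layers does not necessarily improve the validation MSE.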

To check how the material phase information (input) is transformed through the multiple convolutional layers, a group of example feature map slices is visualized in Fig. 7. The feature map is 3D in the present 3D-CNN approach. However, for easier visualization, we only show slices of the feature map. Typically, the colored area in the feature map is called the activated region, which represents the feature extracted from the input. In our application, the activated region reflects the microstructural characteristics that the convolutional filters capture. It can be seen that the first convolutional layer preserves most of the details in the original input. As we go deeper into the convolutional layers, the feature map becomes more abstract because it represents high-level characteristics that are less visually recognizable.

Table 2: Configurations of the 3D-CNN architectures and the corresponding MSE on the validation dataset.

No. | Model description | MSE |
---|---|---|
1 | Conv(16, 5)+Conv(16, 5)+Conv(32, 5)+FC(32, 16) | 2.82 |
2 | Conv(16, 5)+Conv(16, 5)+Conv(32, 5)+FC(64, 32) | 2.79 |
3 | Conv(16, 5)+Conv(16, 5)+Conv(32, 5)+FC(128, 64) | 2.89 |
4 | Conv(16, 5)+Conv(16, 5)+Conv(16, 5)+Conv(32, 5)+FC(64, 32) | 6.33 |
5 | Conv(16, 5)+Conv(32, 5)+Conv(32, 5)+FC(64, 32) | 3.61 |
6 | Conv(16, 5)+Conv(16, 5)+Conv(16, 5)+FC(64, 32) | 3.44 |
7 | Conv(16, 5)+Conv(16, 5)+Conv(32, 5)+FC(64, 32, 32) | 2.86 |

### 3.2 Prediction of effective properties

In this part, the performance of the trained 3D-CNN model is evaluated on the testing dataset, which consists of 300 RVEs with the same VF range (i.e., 2%-28%). The predictions and ground truth (obtained through FEA) for the effective properties of each RVE sample are shown as scatter plots in Fig. 8. With the baseline given as a red line, we can see that the trained model gives accurate predictions for the 12 components of Young’s modulus, shear modulus and Poisson’s ratio. It is also observed that the predictions on the samples with low VFs, e.g., the bottom-left part of the scatter plots for the moduli ($E$’s and $G$’s) and the upper-right part for the Poisson’s ratios ($\nu$’s), are as accurate as those for samples with high VFs, though larger randomness is present for RVEs with low VFs. Recall that, in Section 2.1, an exponential distribution of the number of samples against VF is imposed while generating the datasets. As a result, the number of low-VF samples is much greater than the number of high-VF samples, which alleviates the issue of low-VF-induced uncertainty. To measure the prediction performance quantitatively, we calculate the mean absolute relative error (MARE) for each component, defined as

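$$\mathrm{MARE}_k = \frac{1}{N} \sum_{n=1}^{N} \left| \frac{\hat{y}_k^{(n)} - y_k^{(n)}}{y_k^{(n)}} \right| \times 100\% \tag{12}$$

where $\hat{y}_k^{(n)}$ and $y_k^{(n)}$ are the prediction and ground truth of the $k$th component for the $n$th test sample, and $N$ is the number of test samples. The results are summarized in Table 3. It is seen that the MAREs for all 12 components are below 0.55%.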
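The MARE of each component can be computed directly from the arrays of predictions and FEA ground truth, as in the minimal sketch below. The function name and the small two-component example are hypothetical and are not data from this work.

```python
import numpy as np


def mare_percent(pred, truth):
    """Mean absolute relative error per output component, in percent.

    pred, truth: arrays of shape (n_samples, n_components)
    """
    return 100.0 * np.mean(np.abs(pred - truth) / np.abs(truth), axis=0)


# Hypothetical example: two samples, one modulus and one Poisson's ratio.
truth = np.array([[100.0, 0.200],
                  [100.0, 0.200]])
pred = np.array([[101.0, 0.198],
                 [99.0, 0.202]])
print(mare_percent(pred, truth))
```

Because the error is normalized by the ground truth of each sample, the measure is comparable across components with very different magnitudes, such as moduli and Poisson's ratios.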

MARE (%) | 0.45 | 0.42 | 0.47 | 0.48 | 0.50 | 0.53 | 0.22 | 0.23 | 0.24 | 0.22 | 0.25 | 0.22 |

The efficiency of the proposed 3D-CNN approach is also evaluated by contrasting the computational time of 3D-CNN inference with that of finite element analysis (FEA), as shown in Fig. 9. Note that inference is defined as the prediction operation on new input data by the trained 3D-CNN model. It is well known that GPU parallelization is heavily exploited in deep learning models for both network training and inference. However, to make the comparison fair, we also collect the averaged CPU time consumed by the 3D-CNN by performing inference on the CPU. The hardware configurations are given at the beginning of Section 3. It is noted that the CPU time of FEA depends largely on the number of discrete elements of the RVE. In our test, the number of tetrahedral elements in the discretized RVEs increases from 7705 for VF=2.13% to 26136 for VF=28.22% to maintain a reliable discretization. We collect the averaged computational time of 10 different RVEs for each fixed VF. For the 3D-CNN inference, however, the computational time is theoretically independent of VF since all the RVEs are sampled with $101 \times 101 \times 101$ voxels. We collect the computational time for 300 RVEs covering all VFs. It is seen from Fig. 9 that the GPU-based 3D-CNN inference provides a 25× speedup for the low-VF samples and up to a 50× speedup for the highest VF. Even on the CPU, the 3D-CNN outperforms traditional FEA for VFs greater than 12%.

Another aspect that cannot be neglected is the computational time for training the 3D-CNN model. For the training dataset with 1400 RVEs considered in this paper, it takes about 35 hours on the GPU to obtain a satisfactory trained model. Nevertheless, this high computational demand is a one-off cost: once the model is trained, inference can be conducted on any new RVE that falls into the same ensemble. Even if the new RVE comes from another type of composite, the transferability of the trained 3D-CNN, discussed in Section 3.4, will largely reduce the time expense. We will verify that transfer learning makes it convenient to add supplementary data to the 3D-CNN or to train a model on new datasets, which accounts for new scenarios and enhances the generalizability of the trained model.

### 3.3 Uncertainty quantification

Modelling of natural composites usually involves uncertainty. The uncertainty may come from measurement error, microstructural randomness, mixture of materials and other natural (or artificial) sources. Predicting the effective properties in a probabilistic/statistical sense, such as obtaining the mean value and standard deviation (SD), would provide a better reference for engineering and designing materials.

Strictly speaking, the output of a trained 3D-CNN is deterministic for a given input. Therefore, the uncertainty of the 3D-CNN output is largely determined by the variance of the input. To verify that our 3D-CNN model is capable of preserving the uncertainty of the effective properties for the particle reinforced composite, we manually introduce uncertainty into the dataset to be evaluated, in the framework of Monte Carlo simulation. In particular, we generate groups of RVE samples whose VFs follow Gaussian distributions (i.e., means of 7%, 14% and 21% for three configurations, and an identical standard deviation of 0.7%). In each configuration, 200 RVEs are generated. The details of the uncertainty quantification (UQ) dataset are listed in Table 4.

| Mean of VF (%) | SD of VF (%) | Number of RVE samples |
|---|---|---|
| 7 | 0.7 | 200 |
| 14 | 0.7 | 200 |
| 21 | 0.7 | 200 |
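The Monte Carlo procedure for the three configurations can be sketched as follows. This is a simplified illustration and not the authors' pipeline: in the actual study each sampled VF corresponds to a generated RVE geometry that is passed through the trained 3D-CNN, whereas here a hypothetical scalar function `surrogate_modulus` (a made-up linear map with arbitrary constants) stands in for the network so that the example runs with the standard library alone.

```python
import random
import statistics


def surrogate_modulus(vf_percent: float) -> float:
    """Hypothetical stand-in for the trained 3D-CNN (NOT the paper's model).

    Maps a volume fraction in percent to a normalized modulus that
    grows linearly with VF; the constants are arbitrary.
    """
    return 1.0 + 0.09 * vf_percent


def monte_carlo_uq(mean_vf: float, sd_vf: float = 0.7,
                   n_samples: int = 200, seed: int = 0):
    """Sample VFs from a Gaussian and propagate them through the surrogate."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    vfs = [rng.gauss(mean_vf, sd_vf) for _ in range(n_samples)]
    outputs = [surrogate_modulus(vf) for vf in vfs]
    # Sample mean and SD, i.e. the parameters of a Gaussian fitted to the outputs.
    return statistics.mean(outputs), statistics.stdev(outputs)


# The three configurations: VF means of 7%, 14%, 21%, SD of 0.7%, 200 samples each.
results = {}
for seed, mean_vf in enumerate((7.0, 14.0, 21.0)):
    results[mean_vf] = monte_carlo_uq(mean_vf, seed=seed)

for mean_vf, (mu, sd) in results.items():
    print(f"VF mean {mean_vf:4.1f}%: output mean {mu:.3f}, output SD {sd:.3f}")
```

Because the network output is deterministic for a given input, all of the spread in the outputs comes from the sampled inputs, which is why comparing the fitted mean and SD against the FEA reference is a meaningful check.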

Fig. 10 presents the predicted distributions of the modulus and Poisson’s ratio components in comparison with the reference ground truth. The histograms are fitted by Gaussian distributions whose mean and standard deviation parameters are also listed. It can be seen that the trained 3D-CNN predicts the probabilistic distributions well: the errors in the mean value of all the components are less than 1%, while the predicted standard deviations are very close to, but slightly larger than, the ground truth values.

The predicted distributions of effective properties for the three VF cases are shown in Fig. 11. The modulus components are positively correlated with the VF while the Poisson’s ratio components are negatively correlated with it, which is in accordance with the Voigt/Reuss models voigt; reuss1929. In short, the 3D-CNN’s ability to reproduce the probabilistic distribution of the effective properties, together with its high computational efficiency (as discussed in Section 3.2), makes it a promising approach for probabilistic design of engineering composites du2002efficient; chen2006probabilistic.
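For reference, the classical Voigt and Reuss estimates for a two-phase composite can be written as below. The notation is assumed here for illustration and is not taken from the paper: $f$ is the inclusion volume fraction, and $E_{\mathrm{i}}$ and $E_{\mathrm{m}}$ are the Young’s moduli of the inclusion and the matrix.

$$E_{\mathrm{V}} = f E_{\mathrm{i}} + (1-f) E_{\mathrm{m}}, \qquad \frac{1}{E_{\mathrm{R}}} = \frac{f}{E_{\mathrm{i}}} + \frac{1-f}{E_{\mathrm{m}}}$$

Differentiating with respect to $f$ gives

$$\frac{\mathrm{d}E_{\mathrm{V}}}{\mathrm{d}f} = E_{\mathrm{i}} - E_{\mathrm{m}}, \qquad \frac{\mathrm{d}E_{\mathrm{R}}}{\mathrm{d}f} = E_{\mathrm{R}}^{2}\left(\frac{1}{E_{\mathrm{m}}} - \frac{1}{E_{\mathrm{i}}}\right)$$

Both derivatives are positive when the inclusions are stiffer than the matrix ($E_{\mathrm{i}} > E_{\mathrm{m}}$), so both estimates increase with VF. A rule-of-mixtures estimate of the same form for the Poisson’s ratio, $\nu = f \nu_{\mathrm{i}} + (1-f)\nu_{\mathrm{m}}$, decreases with VF when $\nu_{\mathrm{i}} < \nu_{\mathrm{m}}$. These two conditions are assumptions consistent with the reported trends; they are not stated in this section.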

### 3.4 Transferability of the trained model

A major assumption required by many DL approaches is that the training data and future data must come from the same generator or source. In other words, they must be in the same feature space and follow the same distribution pan2009survey. In many real-world applications, this assumption may not hold. In these cases, if the knowledge learned by the DL model can be transferred, the effort of retraining the model on new datasets is largely reduced. Transferability refers to the ease of transferring the learned knowledge from a trained model to a different but related problem. Transfer learning is usually achieved by transferring the pre-trained model to a new model with additional trainable parameters that are fitted to the new dataset of interest (e.g., adding layers to the trained network while fixing the network parameters transferred from the original model). The need for transfer learning arises when the acquired data can easily become outdated, or when the target data is intractable (or costly) to obtain but a less rich dataset is available.

To examine the transferability of the previously trained 3D-CNN model, we consider a new dataset of RVEs with ellipsoidal inclusions. The major and minor radii of the ellipsoids are randomly and independently generated within a prescribed interval. The overall range of the VF is the same as in the previous dataset (i.e., 2%-28%). Following a similar procedure to that in Section 2.1, a much smaller dataset with only 320 samples is generated, with the sample number as an exponential function of the VF. The entire dataset is divided into training, validation and testing sets with the ratio of 200:60:60. We transfer the trained 3D-CNN model with the architecture described in Case 2 of Table 2, and establish a new 3D-CNN network by adding one convolution layer before flattening, i.e., Conv(32, 5), and activating the trainable parameters in the last FC layer (see Fig. 3). In this way, we try to generalize the 3D-CNN trained for RVEs with spherical inclusions to the case of ellipsoidal inclusions (see Fig. 12 for an example). The transfer learning (TL) model fine-tuned with the new dataset is compared with the model trained from scratch (TS) with regard to the learning curve and prediction performance.

The learning curves for both cases (i.e., TL *vs.* TS) are shown in Fig. 13, where the $x$-axis denotes the epoch and the $y$-axis denotes the loss function value. It can be seen that the initial loss is much lower for the TL model, which indicates that the model transferred from spherical inclusions can already capture the latent features of RVEs with ellipsoidal inclusions well. The asymptote of the TS convergence curve is much higher than that of the TL model. Given a small training dataset, the TL model converges much faster, taking only dozens of epochs for the loss to decrease to a level close to that of our best model discussed in Section 3.1. This demonstrates that we can successfully transfer the knowledge of a pre-trained 3D-CNN model and fine-tune it to achieve good accuracy at a particularly low training expense. Therefore, transfer learning might help overcome problems such as a lack of data and the high computational cost of training a large model. These challenges are especially critical in field measurements, where rich RVE data are costly to obtain. The prediction performance of the TL and TS models is compared in Fig. 14. It is evident that the TL model outperforms the TS model in terms of both the bias and the variance of the effective properties. The averaged MAREs of the TL and TS models over all the components are 0.43% and 1.36%, respectively.

## 4 Conclusions

In this paper, a 3D-CNN approach is proposed for determining the effective/homogenized properties of heterogeneous materials. In particular, we consider RVEs reinforced by randomly distributed particle inclusions (i.e., spherical and ellipsoidal inclusions). The geometries of the RVEs are generated using the Hierarchical Random Sequential Adsorption (HRSA) algorithm bai2014auto and labeled for training the 3D-CNN model via FEA-based linear homogenization. The proposed 3D-CNN architecture consists of multiple hidden 3D convolution layers, pooling operations, flattening and FC layers. A parametric study of the network hyperparameters has been conducted to determine the network architecture with the best inference performance. The proposed approach was tested in a series of numerical experiments on inference accuracy, computational efficiency, uncertainty quantification (UQ) ability and transferability. The results show the promising potential of the proposed approach to advance efficient design and analysis of heterogeneous composite materials composed of representative microstructures.

It is worth mentioning that the comparison with the FEA results shows that the 3D-CNN model can reproduce the effective material properties with high accuracy (e.g., a maximum MARE of around 0.5% over the 12 components). Also, the 3D-CNN inference is computationally more efficient than the traditional FEA, achieving a speed-up of 25× to 50× on the GPU. In addition, the UQ study verifies that the trained 3D-CNN is capable of accurately predicting the probabilistic distributions of the effective material properties, in the framework of Monte Carlo simulation, when uncertain inputs are provided.

In summary, the proposed 3D-CNN offers the following benefits: (1) It provides an end-to-end solution for predicting the effective material properties from 3D phase voxels, which can be obtained via parametric modeling or advanced imaging techniques such as X-ray micro-tomography and 3D atom probe; (2) It is able to reproduce the effective properties with high accuracy and computational efficiency, which would enable faster product design iteration or design optimization for composite materials; (3) The 3D-CNN model preserves the probabilistic distribution of the effective material properties for inputs with uncertainty, which makes it a promising approach for probabilistic engineering design; (4) The knowledge learned by the 3D-CNN model can be transferred to a different type of composite at a very low training expense, so that good prediction performance can still be achieved on a small new dataset with the help of transfer learning. This characteristic becomes significant when RVE data are costly to obtain.

Nevertheless, some issues of interest regarding the 3D-CNN model remain to be studied in the future, including, for example: (1) investigating the universality of transfer learning on other heterogeneous materials such as fiber-reinforced or polymer composites; (2) extending the current 3D-CNN to model composites with nonlinear material properties (to this end, the load condition on each RVE must be considered as part of the input for the networks); (3) applying the trained model or retraining a generative model for microstructure generation with desired effective properties yang18gan; li2018GAN.

## Acknowledgement

The authors would like to thank Dr. Hao Sun and Dr. Ruiyang Zhang, from the Department of Civil and Environmental Engineering at Northeastern University, for their constructive suggestions and comments on designing the proposed network.

## Data Availability

The datasets and computer codes are available upon request from the authors.