Review of porosity uncertainty estimation methods in computed tomography dataset

X-ray computed tomography is a common tool for non-destructive testing and analysis. One major application of this imaging technique is 3D porosity identification and quantification, which involves image segmentation of the analysed dataset. This segmentation step, which is most commonly performed using a global thresholding algorithm, has a major impact on the results of the analysis. Therefore, a thorough description of the workflow and a general uncertainty estimation should be provided alongside the results of porosity analysis to ensure a certain level of confidence and reproducibility. A review of current literature in the field shows that a sufficient workflow description and an uncertainty estimation of the result are often missing. This work provides recommendations on how to report the processing steps for porosity evaluation in computed tomography data using global thresholding, and reviews the methods for the estimation of the general uncertainty in porosity measurements.


Introduction
Porous materials and samples are common in a wide range of scientific fields. For instance, permeability and reservoir characteristics of porous rocks are useful parameters in the oil and gas industry and related fluid research [1], whereas in paleontology, the shape and volume of pores are used to identify fossils [2]. Porosity analysis is also widely used in engineering [3], manufacturing [4,5], or material development [6] as an indicator of a material's strength [7]. Pores in metals often indicate crack initiation locations in cyclic loading applications, and they even influence static strength and ductility of materials, making non-destructive porosity testing valuable for quality control purposes [8,9].
Original Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Generally, porosity refers to a measurement of the presence of voids within a sample. Allaby [10] defines pores as voids that can be empty or filled with trapped gas and/or fluids, and surrounded by any type of material. The shape and volume of pores are connected to the material's formation, temperature, environment and type [12]. Pores can be cylindrical, slits, conical, spherical, ink bottle-like, and interstitial [13], but they can also feature more complex shapes. Pores can form networks that may or may not be accessible from the outside of the object (open porosity), or they can be isolated (closed porosity) [14]. Pore characteristics influence bulk density, mechanical strength, and thermal conductivity of objects [8].
Various methods can be used to analyze the porosity of a sample. The choice of a method depends on the type of porosity present in the particular sample, as well as on other parameters. X-ray computed tomography (CT) is among the most widespread and broadly applicable methods. CT is an imaging modality which is based on the absorption of x-rays in materials [15], and makes non-destructive three-dimensional analysis of samples and their internal structures possible [16]. The output of a typical CT measurement is a set of cross-sectional slices stacked in a 3D volume (figure 1, steps 1 and 2). This forms a grid of voxels, which are volumetric elements with a specific gray value determined by the density and atomic number of materials contained within, and the x-ray energy used [17]. The edge length of a voxel influences the best possible resolution of a measurement. The scanning, tomographic reconstruction, and subsequent analysis of a CT dataset are all potential sources of uncertainty and variation between measurements (figure 1).
The various error sources in figure 1 influence the quality of the resulting images, which in turn has a major impact on pore segmentation and the subsequent porosity measurement [18]. In this case, quality refers to the combination of noise level, contrast between material and pores, and sharpness of edges between the two regions [19]. As an example of the complex influences of the image quality on porosity assessment, image denoising may decrease the amount of noise falsely detected as pores, but it may also cause some smaller pores to be blurred and therefore missed, skewing the results [20,21]. Due to this, it is important to report on the various sample, measurement, and processing parameters used in a porosity study, and to take uncertainty into account [18]. Data used in studies can be shared through data repositories such as the GigaScience Database, see Goodman [22], as in the case of Du Plessis et al [23]. The workflows and protocols can also be shared on services like Protocols.io [24].
Characterization of porosity in a CT dataset is directly related to the segmentation procedure, the partitioning of a volume into two or more separate sections (e.g. material and voids). Segmentation is based on the intrinsic characteristics of voxels or regions of the volume, such as gray values, edges, or texture [28].
Figure 1. The creation of a 3D dataset from a given object using x-ray CT has three steps: (1) the scan or measurement, (2) reconstruction, and (3) segmentation. Then, analysis (4) of the volume can be done. Errors that occur during step 1 can lead to tomographic artifacts (discrepancies between an object and its image). In steps 2-4, other types of errors can lead to uncertainty in the final analysed results. The red lines show the focus of this work. Figure inspired by Villarraga-Gómez et al [25], Smet et al [26] and Hiller and Reindl [27].
One of the most widely used segmentation methods is thresholding [29], where voxels are separated into distinct categories based on a threshold set for one or more of the characteristics mentioned above. Thresholding can be global or locally adaptive [30,31]. The latter is mainly used for complex objects, where the optimal threshold value may change throughout the dataset [32]. On the other hand, global thresholding defines a single threshold value for the entire dataset, influencing all further analysis and interpretations [29]. For its simplicity, ease of use, and ease of access, global thresholding is the go-to segmentation method in many CT data analyses.
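As a minimal illustration of global thresholding, the sketch below applies one threshold to every voxel and reports porosity as the fraction of voxels classified as pore. It assumes that pores appear darker than the material (lower gray values) and that the volume is available as a flat sequence of gray values. The function name `porosity_fraction` and the toy data are hypothetical and are not taken from any of the cited works.

```python
def porosity_fraction(gray_values, threshold):
    """Return the fraction of voxels whose gray value lies below the threshold.

    Assumption: pores are darker than the surrounding material, so every
    voxel below the single global threshold is counted as pore.
    """
    values = list(gray_values)
    if not values:
        raise ValueError("the volume contains no voxels")
    pore_voxels = sum(1 for gray in values if gray < threshold)
    return pore_voxels / len(values)


# Toy volume flattened to a list: three dark (pore) voxels, seven bright ones.
toy_volume = [12, 15, 200, 210, 190, 20, 205, 198, 202, 195]
toy_porosity = porosity_fraction(toy_volume, threshold=100)
print(f"porosity = {toy_porosity:.1%}")
```

Because the same threshold is applied everywhere, any shift in its value changes the pore count of the whole dataset at once, which is why the choice of threshold dominates the result.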
Porosity analysis may yield different results based on the chosen method of segmentation and researcher input [33]. The conclusions drawn from a study can be ambiguous if the methodology used for segmentation is not clearly described. This has caused some researchers to call for standardisation [18,34-37]. It is commonplace to describe measurement parameters used for data acquisition in porosity studies, such as the tube voltage and current, and the voxel size. However, to ensure reproducibility, the applied segmentation approach should be described and the uncertainty of the results should be estimated too. Otherwise, the reliability of these studies runs the risk of being disputed [26,29,38,39].
Figure 2. The relationship between accuracy and precision shown using multiple measurements (small red dots) and a reference value (big green dot). Precision can be estimated from several measurements. A standalone CT measurement (small blue dot) has no reference value (the green dot is unknown) and a different approach must be adopted for precision and uncertainty estimation. This figure was inspired by Pospíšil and Ludvík [48], and Taylor [49].
Measurements are typically expressed with an error ratio, a confidence interval, or a standard deviation presented as a ± value. This value is based on the cumulative effect of device, measurement and processing errors [27,40].
Knigge [41] states that a measurement does not need to be accurate (close to the reference value), but its precision should be known (figure 2). The accuracy refers to the closeness between a measured value and a reference value [42,43]. A precise measurement is not necessarily close to the reference value, but it has little variability when repeated. Thus, precision quantifies the reproducibility and level of uncertainty of a measurement. International metrological standards for estimating the measurement uncertainty exist [44-46], but they cannot be directly applied for the purposes discussed here, as they do not provide specific guidelines for the uncertainty of 3D CT data segmentation [47].
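The distinction between accuracy and precision can be made concrete with a short sketch. It assumes that precision is expressed as the sample standard deviation of repeated measurements and accuracy as the offset (bias) of their mean from a reference value; both choices, the function name `summarize_measurements`, and all numbers are illustrative assumptions.

```python
import statistics


def summarize_measurements(measurements, reference=None):
    """Return (mean, precision, bias) for repeated measurements.

    precision: sample standard deviation, i.e. the spread between repeats.
    bias: mean minus the reference value; None when no reference is
          available, in which case accuracy cannot be assessed.
    """
    mean = statistics.mean(measurements)
    precision = statistics.stdev(measurements)  # needs at least two values
    bias = None if reference is None else mean - reference
    return mean, precision, bias


# Hypothetical porosity values in percent: tightly grouped (precise) but
# offset from the hypothetical reference value of 4.0 % (not accurate).
repeats = [4.8, 5.1, 5.0, 4.9, 5.2]
mean, precision, bias = summarize_measurements(repeats, reference=4.0)
print(f"mean = {mean:.2f} %, spread = {precision:.2f} %, bias = {bias:+.2f} %")

# Without a reference value only the precision can be reported.
_, precision_only, unknown_bias = summarize_measurements(repeats)
```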
Two causes of errors [42] can affect the precision and accuracy of a result, namely: systematic error (affects all measurements in a similar manner, observable through repeated measurements [50,51]), and random error (mainly caused by operator errors, and affects individual measurements and thus the measurement precision [52]). Acquisition, hardware, and reconstruction discrepancies all significantly influence the final porosity evaluation.
A complex analysis of uncertainties in CT measurements is the domain of metrology and of some industrial fields, where calibrated devices or calibration methodologies are used [51,53,54]. Without a ground truth, which is a measurement that is considered to have the exact true value, accuracy of a measurement cannot be assessed [55]. In terms of precision, the influence of uncertainties stemming from the measurement process is a complex issue, and it is already the subject of thorough research [40]. In contrast, uncertainties associated with segmentation are seldom referenced or explained thoroughly within current literature.
This work offers an overview of CT data segmentation methods that use global thresholding and are commonly used for porosity analysis. Crucial aspects of the segmentation process, which should be disclosed in studies, are identified in the section 'Thresholding and reproducibility'. The need for a thorough description of approaches used in studies is supported by a systematic review of recent relevant literature. Methods for the uncertainty estimation of porosity analysis in CT data are discussed in the section 'Uncertainties of CT data segmentation'. Due to financial and time constraints, researchers may often only have access to a single CT dataset for analysis [9], so particular attention is paid to those methods that can be used in these situations. Uncertainty estimation is still needed in such cases, but the approaches to perform it may be less obvious. We hope to provide a practical overview of the possibilities available to researchers to ensure the reproducibility of their results.

Thresholding and reproducibility
Global thresholding methods can be divided into manual, semi-automatic, and automatic [56], depending on the extent of operator involvement in the selection of the threshold value. Automatic algorithms calculate a threshold objectively based on the characteristics of the input dataset, such as voxel grayscale values and features of the image histogram (figure 3). Common automatic algorithms include minimum error thresholding [57], Otsu's method [58], valley-emphasis [59], optimal thresholding [60], histogram concavity analysis [61], iterative thresholding (isodata method) [62], entropy-based thresholding [63], Bayesian thresholding [64], and others. The results of these may be used directly or further fine-tuned manually. Manual threshold selection is subjective and observer-dependent, and it is usually based on visualizing the segmentation result on a slice of the CT dataset and tuning it until it is satisfactory [35]. The potential human bias inherent in manual thresholding may lead to larger differences between data segmented by different operators. Despite this, manual thresholding is still very common due to its simplicity.
One of the simplest global thresholding methods is ISO50, which sets a threshold at the mean of two extreme peak values in the grayscale histogram of a dataset [65]. Results of this method tend to be satisfactory when the analysed histogram is bi- or multi-modal (figure 3) [66]. However, Horner et al [67] showed that ISO50 might be influenced by local variations in the image, in which case the threshold should be modified accordingly. It is also challenging to use it with low porosity values because the histogram of such data may lack a clear peak corresponding to pore values.
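The ISO50 idea can be sketched in a few lines. The peak search used here is a simplifying assumption and not a published procedure: the occupied gray range is split at its midpoint and the most frequent gray value on each side is taken as the pore peak and the material peak. The function name `iso50_threshold` and the toy data are hypothetical.

```python
from collections import Counter


def iso50_threshold(gray_values):
    """Return the mean of the dark (pore) peak and the bright (material) peak.

    Assumption: the two peaks are the most frequent gray values in the lower
    and upper half of the occupied gray range, respectively.
    """
    counts = Counter(gray_values)
    if not counts:
        raise ValueError("the volume contains no voxels")
    midpoint = (min(counts) + max(counts)) / 2
    dark = {gray: n for gray, n in counts.items() if gray <= midpoint}
    bright = {gray: n for gray, n in counts.items() if gray > midpoint}
    if not dark or not bright:
        raise ValueError("two separate gray-value populations are required")
    dark_peak = max(dark, key=dark.get)
    bright_peak = max(bright, key=bright.get)
    return (dark_peak + bright_peak) / 2


# Toy bimodal data: pore peak at gray value 20, material peak at 200.
toy_voxels = [20] * 5 + [25] * 2 + [30] + [180] * 3 + [200] * 10 + [210] * 4
iso50 = iso50_threshold(toy_voxels)
print(f"ISO50 threshold = {iso50}")
```

When the pore peak is weak, as in low-porosity data, the dark-side maximum found this way may be noise rather than a true peak, which mirrors the limitation noted above.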
Otsu's thresholding [58] is another simple method, and along with Kittler's thresholding [57], it is one of the most used algorithms for porosity segmentation (table 2). Similar to ISO50, the results of Otsu's method are affected by the modality of the histogram [29]. Algorithms such as Otsu's method can also be used for multilevel thresholding, which may be used to classify datasets into pores, grains and high-density inclusions (figures 3(e) and (f)) [68,69]. In cases where a dataset's histogram is approximately unimodal, the algorithms mentioned above are likely to perform poorly, and a threshold can be set using probability-based algorithms [59].
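Otsu's method selects the threshold that maximises the between-class variance of the two resulting gray-value classes. The sketch below is a dependency-free illustration for integer gray values, not the implementation of any particular software package. The function name `otsu_threshold`, the convention that voxels at or below the returned value belong to the dark class, and the toy data are assumptions.

```python
from collections import Counter


def otsu_threshold(gray_values):
    """Return the gray value t that maximises the between-class variance.

    Convention (assumption): voxels with gray value <= t form the dark
    class (pores); all others form the bright class (material).
    """
    counts = Counter(gray_values)
    levels = sorted(counts)
    total = sum(counts.values())
    total_sum = sum(gray * n for gray, n in counts.items())

    best_t, best_score = None, -1.0
    dark_n, dark_sum = 0, 0
    for gray in levels[:-1]:  # the last level would leave the bright class empty
        dark_n += counts[gray]
        dark_sum += gray * counts[gray]
        bright_n = total - dark_n
        dark_mean = dark_sum / dark_n
        bright_mean = (total_sum - dark_sum) / bright_n
        # Proportional to the between-class variance (constant 1/total**2 dropped).
        score = dark_n * bright_n * (dark_mean - bright_mean) ** 2
        if score > best_score:
            best_t, best_score = gray, score
    if best_t is None:
        raise ValueError("at least two distinct gray values are required")
    return best_t


# Toy bimodal data: a dark population around 10-12, a bright one around 200-205.
toy_voxels = [10] * 4 + [12] * 3 + [200] * 8 + [205] * 5
otsu_t = otsu_threshold(toy_voxels)
otsu_porosity = sum(1 for gray in toy_voxels if gray <= otsu_t) / len(toy_voxels)
print(f"Otsu threshold = {otsu_t}, porosity = {otsu_porosity:.0%}")
```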
There is no consensus in the scientific community about which thresholding method is ideal for porosity analysis in CT data. In fact, the wide selection of published specialized algorithms in various fields suggests that the effectiveness of an algorithm changes with the dataset type and application [29,37,70,71]. There is, however, an agreement that the reproducibility of manual segmentation is lower than that of an automated or trainable procedure, as remarked by Kalasová et al [72].
A round robin test, in which the porosity in a specific scan is evaluated by multiple operators, was reported by Du Plessis et al [18]. This study found a good agreement in the qualitative pore distribution assessment of ten operators, but quantitative results varied significantly, partly due to the low porosity content in the test sample used. Works of Zikmund et al [11,73] and Baveye et al [37] also feature a comparison of various manual segmentation strategies in addition to algorithm-based ones. Significant discrepancies were found both within and between these groups. Therefore, automated methods do not ensure an accurate or reliable result either, as the choice of algorithm significantly impacts the threshold value and the obtained results (figure 3) [26,29,37,38,74,75].
Regardless of the segmentation method used, porosity analysis needs to be reproducible for a reliable inter-study comparison of results. This means that parameters of the segmentation process should be described and explained thoroughly, as remarked in multiple works [18,34-36,73].

Analysis of published porosity methodologies
To assess the trends regarding reproducibility in the current literature, we chose 53 articles from geosciences and material sciences (industry, engineering, metrology, agriculture, and cultural heritage) that deal with porosity analysis in CT data, and analysed their segmentation methodologies (table 2). The articles were selected through Google Scholar using the keywords CT, Porosity, Segmentation, Global Thresholding, Quantitative analysis, and Uncertainty evaluation. Our selection was narrowed down to articles that were cited at least once.
Thirty of the 53 articles (57%) featured either no description of the segmentation procedure, or a description that was not sufficient for their results to be reliably reproducible. Ten of these articles (19%) had a description that was not accompanied by any visualization of the histogram and threshold value. Out of the remaining 23 articles (43%), ten (19%) featured a sufficient description and provided an example CT slice showing segmentation results, along with either a grayscale histogram of the slice, or an estimation of the uncertainty of the results. These results can be considered reproducible, but not optimally so. Only 13 (25%) of the surveyed articles disclosed all parameters needed to ensure measurement reproducibility, including a description, an example slice along with its histogram, an uncertainty estimation or the threshold value, and a mention of the software used.
Over the observed period (1992-2020), the overall thoroughness of thresholding methodology descriptions seems to not have changed. There are no clear distinctions between methodology descriptions in the various fields of study, except that works dealing with soil porosity (21% of the studied articles) are more prone to inter-study comparison, and therefore they generally include a more thorough definition of the parameters used for thresholding.
The examined studies are mostly based on a single segmentation method (43% of the examined articles), followed by comparison (28%) and combination (19%) of segmentation methods. Otsu's method is the most commonly used (36%), both on its own and in comparison to, or in combination with, other techniques. It is closely followed by manual global thresholding segmentation (34%). This is fairly consistent across the fields, which shows that the choice of thresholding method is probably mainly dependent on the operator experience and sample type.
A mention of the software used, which is present in 68% of the selected articles, can aid in the reproducibility of results. The most commonly mentioned software in table 2 includes VG Studio (15%; Volume Graphics GmbH, Heidelberg/D), ImageJ/Fiji (15%; National Institutes of Health, Bethesda, Maryland/US and LOCI, University of Wisconsin-Madison, Madison, Wisconsin/US), and Avizo (13%; ThermoFisher Scientific, Waltham, Massachusetts/US). We have observed that industrial and engineering fields tend to utilize VG Studio and Avizo, whereas the selection of software used in geosciences and geology is broader. This may be because the complex samples in geosciences are difficult to process in general purpose image processing software, forcing researchers to use more specialized solutions.

Reproducibility
Analysis of table 2 and previous work done by Taina et al [76], Lievers and Pilkey [77], and Iassonov et al [74] has led to the identification of six major parameters of the thresholding process, which should be included in a study to ensure result reproducibility. These parameters are: the histogram shape, an image of a slice from the dataset showing pore determination, the chosen threshold value, a description of the thresholding procedure, an estimation of the uncertainty, and the name and version of the software used. The parameters are also listed in table 1 [74,76,77].
The most common thresholding algorithms operate on the grayscale histograms of images, and manually selected thresholding is often partly determined by the histogram shape, too. Therefore, including an example histogram in a study can aid its reproducibility. Likewise, including a slice illustrating the final segmentation and the threshold value provides a valuable visual and numeric reference. Documenting the software used, along with its version, may also be relevant for reproducibility. In some types of software, the internal workings of the algorithms may not be accessible to the end user (these algorithms are called black boxes). Due to this, analyses carried out in different software may be hard to compare, and the name and version of the software used for a given study becomes relevant.
An important but often omitted piece of information (table 2) in terms of reproducibility is the estimation of the porosity result uncertainty. Porosity measurement in CT is directly related to the segmentation process, which affects the size, shape, distribution, and total volume of pores [71]. Different segmentation methods may lead to an over- or underestimation and increased uncertainty of porosity (figure 3) [33]. An uncertainty range will therefore help other researchers assess whether their reproduced results differ significantly from the outcomes of the original study. However, uncertainty of porosity in CT data may not always be straightforward to estimate. The next section goes over a selection of possible approaches to perform this estimation in a variety of scenarios.

Uncertainties of CT data segmentation
Several studies were conducted concerning the uncertainty estimation of both manual and automatic threshold selection in CT datasets [51,53]. These works describe various procedures for estimating the uncertainty of porosity analysis (table 3), which can be separated into empirical, analytical, and sensitivity approaches. All these procedures mitigate different kinds of errors in the final uncertainty estimation.
Empirical procedures are based on the comparison of a CT dataset with a reference. This reference may take the form of a calibrated workpiece or a measurement conducted using another calibrated method. Such procedures are demanding, costly, and impractical for very complex or non-homogeneous samples. Since these methods require a reference measurement, they will increase the error sources of step 1 in figure 1 (mostly systematic errors) but will reduce operator errors.
Analytical approaches use various statistical concepts to quantify uncertainty. They usually utilize multiple porosity measurements in some way, so they can be time-consuming and potentially expensive. This makes them unsuitable for large amounts of data. Despite this, analytical approaches are applicable to a large variety of samples. When this approach is used, data processing errors are increased, and statistical and operator errors are reduced.
Sensitivity approaches are based on the operator's behavior, knowledge of the dataset, and expectations. Here, an uncertainty of the operator's measurement is estimated using some heuristics. Methods in this category can be unreliable if certain rules are not followed, but they are usually quick, cheap, and easy to apply on any dataset and in a wide range of software (table 3). Since sensitivity approaches are based on the experience of the operator and visualization of the data, they are expected to reduce systematic and analytical errors, but increase random errors [73].
It should be emphasized that the uncertainty estimated using any of these methods is relative, not absolute. Various methods in the three categories described above are suitable for different scenarios, depending on the number of samples, operators, time, and other resources available.
The following text discusses the methods in table 3, and offers recommendations concerning their proper application. As the number of available datasets is likely to be a major factor when choosing an appropriate method for uncertainty estimation, the text is primarily divided into approaches for multiple datasets and for a single dataset. Despite their importance, approaches that require multiple datasets are described briefly, as they have already been exhaustively described in the literature. Single-dataset methods are often easier to apply and fit a wide range of datasets, but they are rarely explained in sufficient detail. For this reason, we describe those approaches more thoroughly.

Multiple datasets
Empirical approach.
Correlation of results of various measurement methods for the same sample (figure 4(c); table 3) is a common approach. For example, Taud et al [11] and Robin et al [130] estimated the uncertainty of their porosity measurement by comparing the results of CT and helium injection measurements. Taud et al [11] found an uncertainty of about ±2%. This approach is straightforward and reliable, but the methods that are compared must be selected carefully, and the measurement must be clearly planned out beforehand. Additional demands are placed on the researchers who choose this approach, as they need both access to, and the know-how for, multiple measurement methods. Different methods are suited for evaluating different types of porosity, making the comparison complex. The voxel size of a CT scan strongly influences measurement results, particularly in samples with a wide range of pore sizes (e.g. concrete). For more precise results, multiple CT scans might be required [138]. Similarly, other quantification methods are limited to specific ranges of pore sizes, making a direct comparison between methods challenging. Additionally, if a chosen method is destructive, the non-destructive nature of CT may no longer be an advantage.
A calibrated object is usually used in metrological studies to evaluate the results of the analyses of objects with simple and reproducible shapes [53,66,102]. The calibrated workpiece is measured using a method that is widely recognized as accurate (figure 4(a)), such as coordinate measuring machines [53] or any calibrated higher-resolution optical device [40,131]. This is an accurate and precise approach, but it is hardly applicable in most porosity measurement scenarios, where the assessed objects are complex and no calibration reference for porosity exists [139].

Analytical approach.
Inter-study comparisons are viable in cases where different measurements of the same sample type can be found in the literature. For instance, Kerckhofs et al [132] compared their CT porosity results to another study using the same device and parameters on the same sample type (figure 4(b)). Correlation of different studies is possible when a thorough description of the measurement and segmentation is available, which eliminates any ambiguity of the process. In theory, only variations between the samples play a role in this case. This is an easily applicable and potentially reliable method. However, measurement and segmentation must be clearly described in the compared studies, as was noted before. As it stands, this method is rarely applied.
The multiple scan approach is suitable for studies that require high resolution and precision [133], and in which measurement time and the amount of processed data are not major concerns. An object is measured multiple times, producing several CT datasets (figure 4(d)). These are then processed, the results are averaged, and the uncertainty can be quantified statistically. The precision of this method increases with the number of datasets, but so do the demands in terms of cost, time, and processing load.
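A minimal sketch of the statistics behind the multiple scan approach follows. It assumes that the scans are independent repeats of the same object and that the uncertainty of the averaged porosity is expressed as the standard error of the mean, which shrinks with the square root of the number of scans. The function name `repeated_scan_summary` and the porosity values are hypothetical.

```python
import math
import statistics


def repeated_scan_summary(porosities):
    """Return (mean, standard deviation, standard error of the mean)."""
    if len(porosities) < 2:
        raise ValueError("at least two scans are needed to estimate a spread")
    mean = statistics.mean(porosities)
    spread = statistics.stdev(porosities)
    # The standard error decreases as more scans are added (1 / sqrt(n)).
    standard_error = spread / math.sqrt(len(porosities))
    return mean, spread, standard_error


# Hypothetical porosity results (in percent) from five scans of one object.
scan_porosities = [2.1, 2.3, 2.2, 2.4, 2.0]
scan_mean, scan_spread, scan_sem = repeated_scan_summary(scan_porosities)
print(f"porosity = {scan_mean:.2f} ± {scan_sem:.2f} % (n = {len(scan_porosities)})")
```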

Single dataset
Empirical approach.
Scanning a sample of interest together with a reference object of a similar density material with known porosity is a possible approach to uncertainty analysis (figure 5(a)) [131]. Both the sample and the reference object are analysed in the same way, and the measured porosity of both is calculated. The resulting value for the reference object can be compared to its known porosity, adding confidence to the measurement. This method can confirm the quality of a CT image, which helps the quantification of the minimum detectable pore size for the selected CT scan settings, and the combination of several objects in one scan enables a direct comparison of the results [131]. The drawback of this method is that a bigger field of view is needed, potentially reducing the image quality. It is also required to know the material of the measured sample beforehand, as the reference material should have a similar x-ray density to the sample. A large reference library would therefore be needed for multi-material samples and routine material measurements.

Analytical approach.
Multiple location averaging is suitable for samples with a homogeneous porosity distribution, and it is useful when the number of measurements is restricted (figure 5(b)). It is similar to the multiple scan approach, except it assesses multiple locations within a single dataset instead of multiple datasets [71]. This is a low-cost and simple method, which provides acceptable results if pores are distributed evenly across the chosen locations. However, its results can be biased if the dataset is not homogeneous, if the number of chosen regions-of-interest is insufficient, or if the choice of regions is biased. Observer error and mathematical uncertainty are major factors here, and the method is considered time-consuming and not particularly reliable [134].
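Multiple location averaging can be sketched as follows. The example assumes that each region-of-interest has already been extracted as a flat sequence of gray values, that pores are darker than the material, and that one global threshold is shared by all regions. The function name `roi_porosities`, the threshold, and the data are hypothetical.

```python
import statistics


def roi_porosities(rois, threshold):
    """Return the porosity (fraction of voxels below threshold) of each region."""
    return [sum(1 for gray in roi if gray < threshold) / len(roi) for roi in rois]


# Three hypothetical regions-of-interest taken from one dataset.
rois = [
    [10, 200, 210, 190, 205, 195, 20, 198, 202, 207],
    [15, 205, 12, 199, 201, 190, 18, 208, 203, 196],
    [11, 204, 207, 197, 193, 209, 201, 199, 206, 192],
]
per_region = roi_porosities(rois, threshold=100)
region_mean = statistics.mean(per_region)
region_spread = statistics.stdev(per_region)
print(f"porosity = {region_mean:.1%} ± {region_spread:.1%} over {len(rois)} regions")
```

A large spread between regions is itself informative: it suggests that the porosity is not homogeneous and that this method may not be appropriate for the dataset.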
Multi-segmentation based on various thresholding methods is suitable for datasets where uneven pore distribution hinders the use of the previous method (figure 5(b)). Vieira et al [135] averaged the results of manual thresholding and two algorithms, and estimated the uncertainty of this averaged reference dataset. Panigrahi et al [136] used a similar, although more automated approach. The core idea of this method is to perform multiple segmentations on a single dataset. The results of these can then be averaged to yield a single resulting porosity value, and their variance can be assessed to determine uncertainty. It is inexpensive, requiring only a single dataset and operator, and relatively time-efficient, as it can easily be applied on any dataset and with any software. One drawback may be the higher time demand associated with multiple segmentations. This method may also lead to a high uncertainty, depending on the particular threshold values and choice of algorithms (for example, see the differences between the two methods in figure 3(f)), and human supervision may be required in some cases to make sure the results of automatic methods are not erroneous.

Sensitivity approach.
Multiple manual thresholds set by different operators can be applied in a similar manner to the previous method, especially if the required manpower is readily available (figure 5(c)). Zikmund et al [73] asked 20 CT and porosity experts to independently select a threshold to separate pores and material in a single CT slice. They then averaged the threshold values to represent the mean opinion of the entire group. When compared with a different method, Zikmund et al [73] concluded that the two showed good compliance. The ease of application and general-purpose nature of this method are its main strengths. On the other hand, finding enough expert operators for a particular study may be difficult, and depending on their experience and number, biased or overly uncertain results may be obtained [37].
Three segmentations at −n%, 0%, and +n% of an initial threshold (n being an arbitrary number) are also suitable for any situation where uncertainty needs to be assessed using only a single dataset (figure 5(d)). The three results are averaged, and the precision of this average is then estimated based on their difference [73]. The initial threshold choice can be manual or automatic, as shown in figure 3. Then ±n% is applied from this chosen threshold and the porosity is calculated for each threshold value. From figure 3, the ±n% will be added to the ISO50 (light green) or Otsu (dark red dashed line) threshold. Zikmund et al [73] chose ±1% for their study. The choice of n requires a careful and informed optimization by an operator, who needs to adapt it to the particular grayscale range and contrast of a dataset (figure 3). This method can be used on any dataset, with any segmentation procedure and in all study fields. It is cost- and time-effective, and easily reproducible. It is crucial to keep in mind that the results here are strongly influenced by the initial threshold choice, as well as the choice of n.
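The three-threshold procedure can be sketched as below. The text above does not give the exact formula used to turn the difference between the three results into a precision value, so the half-range (half of the difference between the largest and smallest porosity) is used here purely as an assumption. The function name `threshold_sensitivity`, the value of n, and the data are likewise hypothetical, and pores are assumed to be darker than the material.

```python
def threshold_sensitivity(gray_values, threshold, n_percent):
    """Segment at threshold * (1 - n%), threshold, and threshold * (1 + n%).

    Returns (mean porosity, half-range of the three porosities).
    Assumption: the half-range stands in for the precision estimate.
    """
    values = list(gray_values)
    if not values:
        raise ValueError("the volume contains no voxels")

    def porosity(t):
        return sum(1 for gray in values if gray < t) / len(values)

    factor = n_percent / 100
    thresholds = (threshold * (1 - factor), threshold, threshold * (1 + factor))
    results = [porosity(t) for t in thresholds]
    mean = sum(results) / len(results)
    half_range = (max(results) - min(results)) / 2
    return mean, half_range


# Toy data with several voxels close to the initial threshold of 100.
toy_voxels = [10, 20, 96, 98, 102, 104, 150, 200, 210, 220]
sens_mean, sens_half_range = threshold_sensitivity(toy_voxels, threshold=100, n_percent=5)
print(f"porosity = {sens_mean:.0%} ± {sens_half_range:.0%}")
```

The toy data deliberately contain many voxels near the threshold, so a small shift moves a large share of them between classes; this is the situation in which the choice of n matters most.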
Region erosion-dilation is very similar in concept and execution to the previous method. First, a global threshold is set using any appropriate method, and pores are segmented (figure 5(a)). Then, morphological erosion and dilation are applied to the initial segmented volume, simulating multiple passes of manual segmentation. To express uncertainty, the initial porosity is divided by that calculated from the dilated and eroded volumes, yielding a ±n% range of the porosity number. The amount of erosion and dilation is a qualified empirical estimation. For instance, Kalasová et al [137] used 0.3 pixels of erosion-dilation, which they set after thorough testing. This uncertainty estimation method is quick, low-cost and applicable to any dataset in any field. Similar to some of the previously mentioned methods, the uncertainty can be highly over- or underestimated depending on the initial threshold and the amount of erosion-dilation chosen.
All uncertainty estimation procedures described here have their specific advantages and drawbacks. In order to choose an appropriate method, we recommend first defining the field of study and the level of precision needed (table 3). Then, the choice of appropriate methods can be further narrowed down based on the number of samples and the number and types of measurements that can be performed. It is common that only one object and a single measurement of it are available. In that case, we suggest using one of the sensitivity approaches, especially the last two mentioned. These do not require a high commitment in terms of time, resources, or expertise. Although the results of these procedures are strongly dependent on the choices of the operator, a reliable uncertainty estimate can be achieved if the rules outlined above are followed.
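The region erosion-dilation estimate described above can be illustrated with a dependency-free sketch. It differs from the cited study in one important respect, which is an assumption of this example: erosion and dilation are applied in whole-voxel steps with a face-connected (6-neighbour) structuring element, whereas a sub-voxel amount such as 0.3 pixels would require a surface-based implementation. The segmented pores are assumed to be given as a set of (z, y, x) voxel coordinates, and all names and data are hypothetical.

```python
# Face-connected (6-neighbourhood) offsets in (z, y, x) order.
NEIGHBOUR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dilate(pore_voxels, shape):
    """Grow the pore region by one voxel, staying inside the volume."""
    grown = set(pore_voxels)
    for z, y, x in pore_voxels:
        for dz, dy, dx in NEIGHBOUR_OFFSETS:
            nz, ny, nx = z + dz, y + dy, x + dx
            if 0 <= nz < shape[0] and 0 <= ny < shape[1] and 0 <= nx < shape[2]:
                grown.add((nz, ny, nx))
    return grown


def erode(pore_voxels):
    """Shrink the pore region by one voxel: keep voxels whose six neighbours are all pore."""
    return {
        (z, y, x)
        for z, y, x in pore_voxels
        if all((z + dz, y + dy, x + dx) in pore_voxels for dz, dy, dx in NEIGHBOUR_OFFSETS)
    }


def porosity_range(pore_voxels, shape):
    """Return the porosity of the eroded, initial, and dilated segmentation."""
    total = shape[0] * shape[1] * shape[2]
    eroded = len(erode(pore_voxels)) / total
    initial = len(pore_voxels) / total
    dilated = len(dilate(pore_voxels, shape)) / total
    return eroded, initial, dilated


# Toy segmentation: one cubic pore of 4 x 4 x 4 voxels in a 10 x 10 x 10 volume.
shape = (10, 10, 10)
pores = {(z, y, x) for z in range(3, 7) for y in range(3, 7) for x in range(3, 7)}
low, initial, high = porosity_range(pores, shape)
print(f"porosity = {initial:.1%} (range {low:.1%} to {high:.1%})")
```

For a pore only a few voxels wide, a one-voxel step changes the porosity several-fold, which illustrates why the amount of erosion-dilation has to be chosen with care and why the estimate can be strongly over- or underestimated.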

Conclusions
CT is one of the leading methods for non-destructive 3D material testing, and it is an ideal tool for porosity evaluation, combining volume, distribution, and shape information in a single measurement and dataset.
The segmentation procedure has a large impact on the obtained porosity value. The global uncertainty can be reduced through calibration of the measurement or by the use of a selected segmentation algorithm, but in any case, this uncertainty has to be estimated and noted alongside the final porosity result. The precision of the porosity value can be estimated from a single dataset, but an assessment of its accuracy requires a reference value. The uncertainty estimation methods outlined in this work take into account whether or not a reference value is available. The best method for each study can be freely chosen from the provided overview according to the available data and values.
This estimation has to be coupled with a thorough description of the experimental method (i.e. quality) and the segmentation procedure to ensure a proper comparison and reproducibility between porosity studies. As demonstrated in this work, many otherwise well-developed studies lack these features, which may lead to wrong interpretations and complicate any attempt to compare different studies.
There is a great number of segmentation procedures, and as the ideal approach often depends on the particular dataset, standardisation is not possible across all fields. Instead, this work aims to promote transparency of the methodologies used in various studies, with a focus on the ubiquitous global thresholding techniques.
We hope that a thorough description of the segmentation procedure, as well as uncertainty estimation, will become commonplace in future studies. It is the only way to verify the reliability of the results, which is of high importance in scientific studies utilizing x-ray CT.

Data availability statement
No new data were created or analysed in this study.

Acknowledgments
We acknowledge CzechNanoLab Research Infrastructure supported by MEYS CR (LM2018110). Jozef Kaiser gives thanks to the support of the Grant FSI-S-20-6353.

Contributions
T Z and J K defined the topic and obtained the funding for this research, V J analysed the literature and interpreted the results, and M Z participated in the article construction. All the figures were created by V J and M Z. A D P verified the overall validity of the results, and J Š and Z S reviewed the manuscript with all other authors.