FinTech, bank risk-taking, and credit allocation
Model
To test the research hypotheses, we construct the following econometric models. First, we estimate a baseline regression to test Hypothesis H1, specified in Eq. (1):
$$\begin{array}{l}Ris{k}_{it}={a}_{0}+{a}_{1}FinTec{h}_{it}+{a}_{2}R\_siz{e}_{it}+{a}_{3}Ca{r}_{it}+{a}_{4}Ld{r}_{it}\\\qquad\qquad+\,{a}_{5}Li{q}_{it}+{a}_{6}Ro{e}_{it}+{a}_{7}Inves{t}_{it}+{a}_{8}Gd{p}_{t}\\\qquad\qquad +\,{a}_{9}M{2}_{t}+Yeareffect+Bankeffect+{\varepsilon }_{it}\end{array}$$
(1)
In Eq. (1), \(Risk_{it}\) denotes the risk-taking of bank \(i\) in year \(t\), measured by the non-performing loan ratio (\(NPL_{it}\)) and the loan loss provision rate (\(LLSR_{it}\)). \(FinTech_{it}\) measures the extent of FinTech application by the bank. A significantly negative coefficient \(a_1\) would indicate that FinTech reduces bank risk-taking, whereas a significantly positive \(a_1\) would suggest that FinTech adoption by commercial banks increases it. The control variables are scale expansion (\(R\_size_{it}\)), the capital adequacy ratio (\(Car_{it}\)), the deposit-to-loan ratio (\(Ldr_{it}\)), the liquidity ratio (\(Liq_{it}\)), the return on net assets (\(Roe_{it}\)), the investment return ratio (\(Invest_{it}\)), economic growth (\(Gdp_t\)), and monetary policy (\(M2_t\)). Yeareffect and Bankeffect denote the year and bank fixed effects, respectively. To address heteroskedasticity and serial correlation, standard errors are clustered at the bank and year levels.
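To illustrate how the bank and year effects in Eq. (1) absorb time-invariant bank traits and common annual shocks, the sketch below implements a two-way within (demeaning) estimator for a single regressor. It is a minimal sketch under simplifying assumptions: a balanced panel, FinTech as the only regressor, and no clustered standard errors. The function names and the simulated panel are hypothetical; with an unbalanced panel and the full set of controls, a dedicated panel-regression package would be the practical choice.

```python
from collections import defaultdict


def two_way_demean(values, banks, years):
    """Subtract bank means and year means, then add back the grand mean.

    This removes bank and year fixed effects exactly only in a balanced panel.
    """
    bank_sum, bank_n = defaultdict(float), defaultdict(int)
    year_sum, year_n = defaultdict(float), defaultdict(int)
    for v, b, t in zip(values, banks, years):
        bank_sum[b] += v
        bank_n[b] += 1
        year_sum[t] += v
        year_n[t] += 1
    grand = sum(values) / len(values)
    return [v - bank_sum[b] / bank_n[b] - year_sum[t] / year_n[t] + grand
            for v, b, t in zip(values, banks, years)]


def within_slope(risk, fintech, banks, years):
    """OLS slope of demeaned Risk on demeaned FinTech (the analogue of a1)."""
    y = two_way_demean(risk, banks, years)
    x = two_way_demean(fintech, banks, years)
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def simulate_panel(true_a1=-0.5):
    """Hypothetical noise-free panel of 3 banks x 4 years with a known a1."""
    bank_effects = {"A": 1.0, "B": 2.0, "C": 0.5}
    risk, fintech, banks, years = [], [], [], []
    for i, bank in enumerate(bank_effects):
        for t, year in enumerate(range(2015, 2019)):
            x = float((i + 1) * (t + 1))  # synthetic FinTech index
            risk.append(bank_effects[bank] + 0.1 * t + true_a1 * x)
            fintech.append(x)
            banks.append(bank)
            years.append(year)
    return risk, fintech, banks, years


if __name__ == "__main__":
    print(round(within_slope(*simulate_panel()), 6))  # -0.5
```

Because the simulated risk series is built with a true coefficient of -0.5 and no noise, the within estimator recovers that value; with real, unbalanced data the simple two-way demeaning shown here would no longer be exact.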
The effectiveness of FinTech in mitigating bank risk-taking may differ across credit segments. To assess how the risk-mitigating effect of FinTech varies by credit type and to test Hypothesis H2, we add an interaction term between \(R\_loan\) and FinTech to the model, as specified in Eq. (2):
$$\begin{array}{c}Ris{k}_{it}={\beta }_{0}+{\beta }_{1}R\_loa{n}_{it}+{\beta }_{2}FinTec{h}_{it}+{\beta }_{3}R\_loa{n}_{it}\times FinTec{h}_{it}+\,{\beta }_{4}R\_siz{e}_{it}+{\beta }_{5}Ca{r}_{it}+{\beta }_{6}Ld{r}_{it}\\ +\,{\beta }_{7}Li{q}_{it}+{\beta }_{8}Ro{e}_{it}+{\beta }_{9}Inves{t}_{it}+{\beta }_{10}Gd{p}_{t}+\,{\beta }_{11}M{2}_{t}+Yeareffect+Bankeffect+{\varepsilon }_{it}\end{array}$$
(2)
In Eq. (2), \(R\_loan_{it}\) is the credit allocation variable, measured separately for micro and small credit, retail credit, and corporate credit. Taking micro and small credit as an example, a significantly positive \(\beta_{1}\) together with a significantly negative \(\beta_{3}\) means that expanding micro and small credit raises risk-taking but that FinTech weakens this effect; that is, FinTech mitigates the risk associated with the expansion of micro and small credit. Conversely, a significantly positive \(\beta_{3}\) would imply that FinTech adoption may amplify the risk associated with the expansion of micro and small credit.
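The interpretation of \(\beta_1\) and \(\beta_3\) follows from the marginal effects implied by Eq. (2): \(\partial Risk/\partial R\_loan = \beta_1 + \beta_3 FinTech\) and \(\partial Risk/\partial FinTech = \beta_2 + \beta_3 R\_loan\). The sketch below evaluates these expressions for hypothetical coefficient values (not estimates from this study) and computes the FinTech level at which the effect of credit expansion changes sign. Standard errors of the marginal effects, which would require the estimated coefficient covariance matrix, are not computed.

```python
def fintech_effect(beta2, beta3, r_loan):
    """Marginal effect of FinTech on Risk at a given credit share: beta2 + beta3*R_loan."""
    return beta2 + beta3 * r_loan


def credit_effect(beta1, beta3, fintech):
    """Marginal effect of R_loan on Risk at a given FinTech level: beta1 + beta3*FinTech."""
    return beta1 + beta3 * fintech


def fintech_threshold(beta1, beta3):
    """FinTech level at which the credit-expansion effect changes sign: -beta1/beta3."""
    if beta3 == 0:
        raise ValueError("beta3 = 0: FinTech does not moderate the credit effect")
    return -beta1 / beta3


if __name__ == "__main__":
    b1, b3 = 0.8, -0.4  # hypothetical coefficients, not estimates from the paper
    for level in (0.0, 1.0, 2.0):
        print(level, credit_effect(b1, b3, level))
    print("threshold:", fintech_threshold(b1, b3))
```

With \(\beta_1>0\) and \(\beta_3<0\), the risk effect of credit expansion shrinks as FinTech rises and reaches zero at \(-\beta_1/\beta_3\).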
In the final part of the analysis, we divide the sample banks into two groups: national commercial banks and local commercial banks. To capture regional disparities, local commercial banks are further classified into those located in the eastern, central, and western regions. Estimating Eq. (2) on each subsample allows us to compare the effectiveness of FinTech in reducing risk-taking across banks of different scales and locations, which provides the test of Hypothesis H3.
Variables. (1) Explained variables. Banks, as the backbone of traditional finance, are crucial to the stability of the financial system. Following Cheng and Qu (2020) and Zhang et al. (2022), we measure commercial banks’ risk-taking with the non-performing loan ratio and the loan loss provision rate. The non-performing loan ratio (NPL), calculated as non-performing loans divided by total loans, is a key measure of the passive risk borne by banks: a higher NPL indicates greater exposure to loan defaults. The loan loss provision rate (LLSR), calculated as loan loss provisions divided by total loans, reflects the bank’s expectation of future non-performing loans and the provisions it sets aside against them. A higher LLSR indicates that the bank is less confident in its risk prevention and is relatively less willing and able to take on risk actively.
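A minimal sketch of the two risk measures as ratios of balance-sheet items. The function names and the figures in the example are hypothetical, and using total loans as the LLSR denominator is an assumption based on the usual definition of the loan loss provision rate.

```python
def npl_ratio(non_performing_loans, total_loans):
    """Non-performing loan ratio (%): non-performing loans / total loans."""
    if total_loans <= 0:
        raise ValueError("total_loans must be positive")
    return 100.0 * non_performing_loans / total_loans


def llsr(loan_loss_provisions, total_loans):
    """Loan loss provision rate (%): provisions / total loans (assumed denominator)."""
    if total_loans <= 0:
        raise ValueError("total_loans must be positive")
    return 100.0 * loan_loss_provisions / total_loans


if __name__ == "__main__":
    # Hypothetical balance-sheet figures, e.g. in 100 million yuan
    print(npl_ratio(15.0, 1000.0), llsr(30.0, 1000.0))  # 1.5 3.0
```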
(2) Explanatory variables. Existing methodologies for assessing FinTech development can be broadly classified into three categories: Firstly, the China Digital Inclusive Finance Index, derived from Ant Group’s transaction data, which evaluates FinTech development across dimensions such as coverage breadth, usage depth, and digital support services (Cai et al., 2024); Secondly, macro or regional FinTech indices based on indicators like Baidu FinTech keyword indices, FinTech patent counts, or regional FinTech loan volumes (Zhao et al., 2022; Jia, 2024); Thirdly, bank-specific FinTech development indices constructed using text mining and web crawling techniques to quantify FinTech keyword frequencies (Wu et al., 2023; Fang et al., 2023).
Because commercial banks are actively adopting FinTech and undergoing strategic transformation, macro or regional indices may not adequately capture differences in FinTech application among individual banks. We therefore adopt the third approach, using text analysis and web crawling to quantify bank-specific FinTech development. From the perspectives of competition theory, technology acceptance theory, and agency theory, banks’ technology disclosures in the news are shaped by market pressure, user trust, and governance constraints, which makes a news-based FinTech index a reliable indicator of their actual FinTech maturity (Acharya and Ryan, 2016; Chen et al., 2019; Li and Xu, 2025). Because FinTech activity is diverse and spans many variables, factor analysis can condense these variables into a few representative factors. This simplifies the data structure while retaining the essential characteristics of FinTech, providing a systematic framework for constructing a FinTech index (Cheng and Qu, 2020). Accordingly, we construct the FinTech index by factor analysis in the following steps:
Step 1: Determine keywords. The keyword library is based on the FinTech technologies most commonly used in bank lending. Big data, cloud computing, artificial intelligence, blockchain, and the Internet of Things are the five most widely applied technologies and strongly affect information transmission and risk management in the credit process of commercial banks, so we select them as core keywords. Big data technology captures and processes massive amounts of data to screen enterprises comprehensively; cloud computing processes data agilely and supplies high-quality data as a factor of production; artificial intelligence automatically monitors the operation and management of borrowing enterprises, moves risk identification earlier in the lending process, and detects misuse of funds and potential default risk in a timely manner; blockchain makes transactions transparent and ensures that transaction information is authentic and traceable; and the Internet of Things integrates resources along the industry chain, supporting the digitalization and intelligent operation of finance.
Step 2: Generate variables. Using the Baidu News search engine, we count the annual number of news articles that mention both a bank and a keyword; for example, the query “Bank of China + Big Data” retrieves news pages containing both “Bank of China” and “big data”. Repeating this for each keyword yields the variables big data (\(BG_{it}\)), blockchain (\(BK_{it}\)), artificial intelligence (\(AI_{it}\)), Internet of Things (\(IT_{it}\)), and cloud computing (\(CC_{it}\)).
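Steps 1 and 2 amount to building one query per bank-keyword pair and counting matching articles per year. The sketch below mimics that logic on a small in-memory corpus. It does not call Baidu News; the English keyword glosses and the `SAMPLE_CORPUS` snippets are hypothetical, and case-insensitive substring matching stands in for the search engine’s own matching rules.

```python
# Hypothetical English glosses of the five core keywords; real Baidu News
# queries would presumably use the Chinese terms.
KEYWORDS = {
    "BG": "big data",
    "BK": "blockchain",
    "AI": "artificial intelligence",
    "IT": "internet of things",
    "CC": "cloud computing",
}

# Hypothetical news snippets standing in for search results
SAMPLE_CORPUS = [
    {"year": 2020, "text": "Bank of China expands big data risk control"},
    {"year": 2020, "text": "Bank of China pilots blockchain trade finance"},
    {"year": 2020, "text": "Big data and cloud computing at Bank of China"},
    {"year": 2021, "text": "Bank of China upgrades its big data platform"},
    {"year": 2020, "text": "A rural bank adopts big data credit scoring"},
]


def build_queries(bank_name):
    """Queries of the form '<bank> + <keyword>' for each FinTech dimension."""
    return {var: f"{bank_name} + {kw}" for var, kw in KEYWORDS.items()}


def count_mentions(articles, bank_name, keyword, year):
    """Number of articles in `year` whose text mentions both bank and keyword."""
    bank, kw = bank_name.lower(), keyword.lower()
    return sum(
        1
        for a in articles
        if a["year"] == year and bank in a["text"].lower() and kw in a["text"].lower()
    )


def bank_year_row(articles, bank_name, year):
    """One bank-year observation of the five keyword variables."""
    return {var: count_mentions(articles, bank_name, kw, year)
            for var, kw in KEYWORDS.items()}


if __name__ == "__main__":
    print(build_queries("Bank of China")["BG"])
    print(bank_year_row(SAMPLE_CORPUS, "Bank of China", 2020))
```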
Step 3: Conduct preliminary tests. We assess whether the variables are suitable for factor analysis using the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity, computed as follows:
$$KMO=\frac{\sum\sum_{i\ne j}{r}_{ij}^{2}}{\sum\sum_{i\ne j}{r}_{ij}^{2}+\sum\sum_{i\ne j}{\alpha }_{ij}^{2}}$$
(3)
$${\chi }^{2}=-\left[(N-1)-\frac{2p+5}{6}\right]\ln|R|$$
(4)
In Eq. (3), \({r}_{ij}\) is the simple correlation coefficient between variables \(i\) and \(j\), and \({\alpha }_{ij}\) is their partial correlation coefficient, controlling for the remaining variables. In Eq. (4), \(\chi^2\) is the test statistic, \(N\) is the sample size, \(p\) is the number of variables, \(R\) is the correlation coefficient matrix, and \(|R|\) is its determinant. Under the null hypothesis that \(R\) is an identity matrix, \(\chi^2\) approximately follows a chi-square distribution with \(p(p-1)/2\) degrees of freedom.
Applying Eqs. (3) and (4) yields a KMO value of 0.875, and the Bartlett test statistic is significant (p < 0.001), indicating that the keyword variables are correlated and suitable for factor analysis.
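The sketch below computes the KMO measure from a correlation matrix, obtaining partial correlations from the inverse correlation matrix as \(\alpha_{ij}=-r^{ij}/\sqrt{r^{ii}r^{jj}}\), where \(r^{ij}\) denotes the \((i,j)\) element of \(R^{-1}\), together with the textbook form of Bartlett’s statistic, \(\chi^2=-[(N-1)-(2p+5)/6]\ln|R|\) with \(p(p-1)/2\) degrees of freedom. It is a dependency-free illustration using a hypothetical 3×3 correlation matrix and N = 100, not the software used in this study.

```python
import math


def invert_with_det(m):
    """Gauss-Jordan elimination with partial pivoting: returns (inverse, determinant)."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the determinant's sign
        det *= a[col][col]
        piv = a[col][col]
        a[col] = [x / piv for x in a[col]]
        for k in range(n):
            if k != col:
                f = a[k][col]
                a[k] = [x - f * y for x, y in zip(a[k], a[col])]
    return [row[n:] for row in a], det


def kmo(r):
    """Overall KMO: squared correlations over (squared correlations + squared partials)."""
    inv, _ = invert_with_det(r)
    p = len(r)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += r[i][j] ** 2
                a2 += partial ** 2
    return r2 / (r2 + a2)


def bartlett_sphericity(r, n_obs):
    """Textbook Bartlett statistic -[(N-1) - (2p+5)/6] ln|R| and its degrees of freedom."""
    p = len(r)
    _, det = invert_with_det(r)
    chi2 = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2


if __name__ == "__main__":
    R = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]  # hypothetical
    print(round(kmo(R), 4))  # 0.6923
    print(bartlett_sphericity(R, n_obs=100))
```

A p-value would then come from the chi-square survival function, for example `scipy.stats.chi2.sf(chi2, df)`.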
Step 4: Extract common factors. Using principal component analysis, we retain factors whose eigenvalues exceed 1 (the Kaiser criterion). Each eigenvalue \(\lambda\) of the correlation matrix \(R\) and its eigenvector \({\bf{v}}\) satisfy:
$$R{\bf{v}}=\lambda {\bf{v}}$$
(5)
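In Eq. (5), \({\bf{v}}\) is the unit-length eigenvector corresponding to \(\lambda\). Since the loading of variable \(j\) on component \(k\) is \(l_{jk}=\sqrt{\lambda_k}\,v_{jk}\), each eigenvalue equals the sum of squared loadings on its component:
$${\lambda }_{k}={\bf{v}}_{k}^{T}R\,{\bf{v}}_{k}=\mathop{\sum }\limits_{i=1}^{p}\mathop{\sum }\limits_{j=1}^{p}{v}_{ik}{R}_{ij}{v}_{jk}=\mathop{\sum }\limits_{j=1}^{p}{l}_{jk}^{2}$$
(6)
In Eq. (6), \({R}_{ij}\) is the element of the correlation matrix \(R\), \(v_{jk}\) is the \(j\)-th element of the eigenvector \({\bf{v}}_{k}\), and \(l_{jk}\) is the corresponding loading. Because the trace of \(R\) equals \(p\), component \(k\) explains a share \(\lambda_k/p\) of the total variance. The single extracted common factor accounts for 72.83% of the variance, indicating high explanatory power.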
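To make the eigenvalue and variance-share calculations concrete, the sketch below finds the leading eigenpair of a correlation matrix by power iteration, forms loadings \(\sqrt{\lambda}\,v_j\), and reports \(\lambda/p\). Power iteration is a swapped-in numerical method chosen to keep the example dependency-free; the study presumably relied on a statistics package. It recovers only the first component, which is the one retained here. As a worked figure, a 72.83% share with five keyword variables corresponds to \(\lambda \approx 0.7283\times 5 \approx 3.64\). The correlation matrix in the example is hypothetical.

```python
import math


def mat_vec(r, v):
    """Matrix-vector product R v."""
    return [sum(rij * vj for rij, vj in zip(row, v)) for row in r]


def leading_eigenpair(r, tol=1e-12, max_iter=10_000):
    """Power iteration for the largest eigenvalue of a correlation matrix R.

    Returns (lambda, v) with v of unit length, so that R v is close to lambda v.
    A uniform start works when correlations are positive, because the leading
    eigenvector then has all positive entries.
    """
    p = len(r)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(max_iter):
        w = mat_vec(r, v)
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    lam = sum(a * b for a, b in zip(v, mat_vec(r, v)))  # Rayleigh quotient v'Rv
    return lam, v


def loadings_and_share(r):
    """Eigenvalue, loadings l_j = sqrt(lambda) * v_j, and variance share lambda / p."""
    lam, v = leading_eigenpair(r)
    loadings = [math.sqrt(lam) * x for x in v]
    return lam, loadings, lam / len(r)


if __name__ == "__main__":
    R = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]]  # hypothetical
    lam, loadings, share = loadings_and_share(R)
    print(f"lambda={lam:.4f}, share={share:.2%}")
```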
Step 5: Calculate factor scores. The loading matrix is rotated orthogonally using the varimax (variance-maximizing) method, and the factor score coefficient matrix is estimated by the regression method as follows:
$$\hat{F}={B}^{T}{R}^{-1}X$$
(7)
In Eq. (7), \(\hat{F}\) is the matrix of estimated factor scores, \(B^{T}\) is the transpose of the factor loading matrix, \(R^{-1}\) is the inverse of the correlation matrix, and \(X\) is the matrix of standardized observed variables. Because only one common factor has an eigenvalue greater than 1, we use the score of this factor as the FinTech index.
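With a single factor, Eq. (7) reduces to weighting each standardized variable by the corresponding element of \(R^{-1}\mathbf{b}\), where \(\mathbf{b}\) is the loading vector. The sketch below implements this regression-method scoring in plain Python. The correlation matrix, loadings, and data rows in the example are hypothetical inputs that would come from Steps 3 and 4 in practice, and the rotation step is omitted because an orthogonal rotation of a single factor can at most flip its sign.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            m[k] = [x - f * y for x, y in zip(m[k], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def standardize(rows):
    """Column-wise z-scores using the sample standard deviation."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in rows]


def factor_scores(rows, r, loadings):
    """Regression-method scores for one factor: F_hat = b' R^{-1} z per observation."""
    weights = solve(r, loadings)  # score coefficients R^{-1} b
    z = standardize(rows)
    return [sum(w * zi for w, zi in zip(weights, row)) for row in z]


if __name__ == "__main__":
    R = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]  # hypothetical
    b = [(2 / 3) ** 0.5] * 3  # first-component loadings for this R
    rows = [[1, 2, 3], [2, 1, 0], [4, 4, 5], [0, 1, 2]]  # hypothetical counts
    print(factor_scores(rows, R, b))
```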
Our FinTech index sets itself apart from those of scholars like Zhang et al. (2022) and Geng et al. (2023) by concentrating on the development levels of specific core FinTech technologies within commercial banks. Additionally, we source our data from Baidu news texts instead of traditional financial news databases. This decision is driven by the extensive media coverage that commercial banks’ FinTech endeavors receive. As the largest Chinese-language search engine globally, Baidu provides exhaustive indexing of these initiatives, offering a robust indicator of both the level of FinTech activity and the degree of public interest in banks’ FinTech developments.
Credit allocation. Extending Hu et al. (2024), we measure bank credit allocation along three dimensions: micro and small credit, retail credit, and corporate credit. Each dimension is measured as the share of the corresponding loan category in total loans. Micro and small credit is the year-end balance, or the total annual amount, of operating loans below one million yuan extended to micro-enterprises, individual entrepreneurs, and small business owners. Retail credit is the sum of consumer, personal, and mortgage loans, gross of loan loss provisions. Corporate credit is the gross amount of loans extended to businesses, also before deducting loan loss provisions.
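The credit allocation variables are simple ratios; the sketch below computes them from hypothetical loan balances, and the function name is illustrative. Under the definitions above, loans to micro-enterprises also count as loans to businesses, so the three shares need not sum to one.

```python
def credit_allocation(micro_small, retail, corporate, total_loans):
    """Shares of each credit category in total loans (the R_loan variables).

    The categories can overlap (micro-enterprise loans are also corporate
    loans), so the shares are not forced to sum to one.
    """
    if total_loans <= 0:
        raise ValueError("total_loans must be positive")
    return {
        "micro_small": micro_small / total_loans,
        "retail": retail / total_loans,
        "corporate": corporate / total_loans,
    }


if __name__ == "__main__":
    # Hypothetical year-end balances, gross of loan loss provisions
    print(credit_allocation(120.0, 350.0, 600.0, 1000.0))
```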
(3) Control variables. The scale of banks is captured by the scale expansion (R_size) from Balyuk (2023), noting that while a larger scale can improve risk management and profitability, it might also increase the inclination towards riskier investments, raising the risk of bankruptcy. The capital adequacy ratio (Car), as indicated by He et al. (2023), measures banks’ risk resistance, with higher values suggesting stronger capabilities to absorb risks, thereby safeguarding stability and stakeholder interests. Liquidity is assessed through the deposit-to-loan ratio (Ldr) and the liquidity ratio (Liq), following Wang et al. (2021), where higher values indicate greater liquidity but may also signal capital underutilization. Profitability is gauged by the return on net assets (Roe) and the investment return ratio (Invest), according to Li et al. (2022), with higher values reflecting a stronger drive for credit expansion, a higher risk appetite, and a better capacity to diversify risks. Macroeconomic factors are controlled for by including the GDP growth rate (Gdp) and the money supply growth rate (M2), following Fang et al. (2023), as these variables reflect the overall economic climate and the stance of monetary policy.
Data sources and descriptive statistics
Commercial banks have increasingly integrated FinTech into their operations in response to the rise of Internet finance. We therefore take 2015 as the starting year and use a sample of commercial banks over 2015-2022. The sample comprises 17 national banks (5 state-owned commercial banks and 12 national joint-stock banks) and 570 local banks (130 urban commercial banks, 410 rural commercial banks, 3 rural cooperative banks, 10 rural credit cooperatives, and 17 private banks), for a total of 587 banks. Figure 3 illustrates the distribution of sample banks across provinces relative to the national total, and a detailed breakdown is provided in Appendix Table 1. Bank data are drawn from quarterly, semi-annual, and annual reports and other public disclosures, and macroeconomic data are obtained from the National Bureau of Statistics. To mitigate the influence of outliers, continuous variables are winsorized at the 1st and 99th percentiles.

This figure illustrates the geographic distribution of the 587 sample banks across China, including both national and regional institutions. National banks account for 17 banks (2.90%) of the sample. Among regional banks, Zhejiang contributes the largest share (68 banks, 11.58%), followed by Jiangsu (55 banks, 9.37%) and Shandong (55 banks, 9.37%). Other provinces with relatively high representation include Anhui (37 banks, 6.30%), Sichuan (35 banks, 5.96%), Fujian (31 banks, 5.28%), and Shanxi (31 banks, 5.28%). Liaoning (28 banks, 4.77%) and Henan (21 banks, 3.58%) also have notable proportions. Medium to small shares are observed in Inner Mongolia (15 banks, 2.56%), Yunnan (14 banks, 2.39%), Jilin (13 banks, 2.21%), Shaanxi (8 banks, 1.36%), Gansu (6 banks, 1.02%), Ningxia (6 banks, 1.02%), and Chongqing (4 banks, 0.68%). The least represented provinces include Shanghai, Qinghai, and Hainan (3 banks each, 0.51%) and Tibet (1 bank, 0.17%).
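The 1%/99% winsorization described above can be sketched as clipping each variable at its empirical 1st and 99th percentiles. This version assumes pooled winsorization over the full sample and uses linear-interpolation percentiles (NumPy’s default rule); the study does not state its percentile convention, and other statistical packages may use a slightly different one.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (NumPy's default rule)."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def winsorize(values, lower=0.01, upper=0.99):
    """Clip values at the lower and upper sample quantiles of the pooled sample."""
    if not values:
        raise ValueError("values must be non-empty")
    s = sorted(values)
    lo_cut, hi_cut = quantile(s, lower), quantile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


if __name__ == "__main__":
    data = list(range(1, 101))  # hypothetical variable with values 1..100
    w = winsorize(data)
    print(min(w), max(w))
```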
The descriptive statistics in Table 2 show that the non-performing loan ratio (NPL) and the loan loss provision rate (LLSR) have relatively large standard deviations, indicating considerable variation across banks. The FinTech index has a mean close to zero, as expected for a factor score computed from standardized variables, but its large standard deviation and range suggest uneven FinTech adoption across banks. The credit allocation indicators (R1_loan, R2_loan, R3_loan) have low means and medians but wide ranges, suggesting diverse lending structures across banks. The control variables also display substantial heterogeneity in bank characteristics, underscoring the need to control for them when estimating the effects of the main variables of interest.
