Towards reliable energy storage: a review of machine learning-driven SOH estimation for electrochemical energy storage systems

Shenglu Guo; Xuecheng Qian; Mengyang Chen; Yifan Du; Jilei Ye; Lili Liu; Yuping Wu

doi:10.20517/energyz.2025.08


Review | Open Access | 17 Jun 2026



Energy Z 2026, 2, 200011.

10.20517/energyz.2025.08 | © The Author(s) 2026.


Abstract

State-of-health (SOH) estimation is a prerequisite for the safe and efficient operation of electrochemical energy storage systems (ESS), but practical deployment remains limited by incomplete observability, inconsistent health labels, domain shift, and insufficient trustworthiness under changing operating conditions. This review synthesizes machine-learning-based SOH estimation across electrochemical ESS, covering lithium-ion, sodium-ion, and aqueous batteries, supercapacitors, and battery-supercapacitor hybrid configurations. The literature is organized along an end-to-end machine-learning-driven pipeline, including health target definition and labeling, field data and observability, health-indicator representation, model learning and generalization, benchmarking and trustworthiness, and deployment-oriented integration. The review shows that reliable SOH estimation does not depend solely on model architecture, but on the alignment among target semantics, signal availability, feature representation, supervision regime, evaluation design, and operational objective. It further highlights that health semantics are device-dependent and should not be transferred mechanically across batteries and supercapacitors. Representative learning strategies are discussed, including structured regression, temporal deep models, transfer and online adaptation, and physics-guided or uncertainty-aware learning. The review also extends the discussion from component-level diagnosis to system-level integration in batteries, supercapacitors, and hybrid ESS. Future progress will depend on observability-aware target design, robust evaluation, and trustworthy deployment rather than algorithmic novelty alone.


Keywords

Electrochemical energy storage systems, supercapacitors, lithium-ion batteries, machine learning, state-of-health estimation


INTRODUCTION

Driven by rapid power-sector decarbonization and the large-scale integration of variable renewable energy, modern energy systems increasingly require fast, reliable flexibility to balance supply and demand. Battery-based storage is widely recognized as a key short-duration flexibility resource, and the International Energy Agency estimates that global energy storage capacity needs to expand sixfold by 2030 to support accelerated renewable deployment^[1]. In parallel, electrification in transport, industry, and buildings is expanding the role of electrical energy storage as a cross-sector infrastructure component rather than a niche technology. ESS are therefore emerging as a core infrastructure for both electrified end-use sectors and modern power systems, enabling efficient energy utilization, resilient operation, and deep decarbonization.

Across today’s applications, ESS are being deployed at multiple scales and in diverse contexts, spanning electric mobility, industrial electrification, grid-connected assets, and microgrids. In transportation, batteries enable the electrification of vehicles, while in power systems, stationary storage is increasingly used to provide fast flexibility and enhance operational reliability under dynamic conditions. In stationary deployments, battery packs are typically aggregated into racks and interfaced to the AC grid through power conversion systems under energy management system (EMS) and battery management system (BMS) supervision^[2]; supercapacitors are often coupled on the DC side via bidirectional DC-DC converters to buffer fast transients and reduce battery stress. As deployment expands, the performance, safety, and economic value of storage assets become increasingly dependent on reliable state estimation over long service periods, where aging, temperature variations, and non-stationary duty cycles can substantially alter energy and power delivery capabilities.

Within this landscape, electrochemical energy storage has become the dominant technology family for short- to medium-duration storage and for many high-power buffering tasks. This review focuses on machine learning-driven state-of-health (ML-SOH) estimation for electrochemical ESS, covering battery ESS built on lithium-ion, sodium-ion, and aqueous batteries, as well as hybrid configurations that couple batteries with supercapacitors. To motivate the inclusion of multiple chemistries and device types, Table 1 positions representative electrochemical storage technologies by energy, power, lifetime, response, cost, and application^[3-7]. Supercapacitors are included because their positioning is distinct from battery technologies, being primarily associated with high-power buffering and fast transient support, while hybrid battery-supercapacitor systems are increasingly considered to improve dynamic performance and reduce battery stress. Rather than treating all electrochemical storage technologies as a chemically uniform category, this review uses a broader systems title to examine which elements of the ML-SOH pipeline remain transferable across device classes and which remain chemistry- or device-specific. This distinction is necessary because recent application-oriented and cross-electrochemistry studies show that target definition, relabeling, recalibration, and validation cannot be assumed to transfer directly from lithium-ion practice to other storage families^[8,9].

Table 1

Electrochemical energy storage technologies

Technology	Energy	Power	Lifetime	Response	Cost	Application
Lithium-ion	High	Moderate	Moderate	Fast	High	Mobile
Sodium-ion	Moderate	Moderate	Moderate	Fast	Medium	Emerging
Lead-acid	Low	Moderate	Low	Moderate	Low	Backup
Flow battery	Moderate	Low	High	Slow	High	Stationary
Supercapacitor	Low	High	Very high	Ultra-fast	High	Power
Hybrid system	Balanced	High	High	Fast	Variable	Grid

Table 1 is used to position representative electrochemical storage technologies in the broader systems context. The detailed methodological synthesis in this review is centered on lithium-ion batteries and supercapacitors, while other chemistries are included mainly to clarify transferability limits and scope boundaries.

In this broader framing, the technologies considered here are not discussed to the same depth. Lithium-ion batteries and supercapacitors provide the main basis for the methodological analysis, whereas lead-acid and flow batteries are included mainly to clarify where SOH semantics and estimation logic cease to be directly transferable. Lead-acid batteries have long supported their own SOH estimation literature, while flow batteries highlight cases in which health interpretation is shaped more strongly by electrolyte-side imbalance and system-level operating interactions than by conventional cell-centered battery proxies^[10-12].

From a system perspective, SOH is not merely a cell-level diagnostic metric. In electrochemical ESS, SOH directly influences system-level power capability, round-trip efficiency, and thermal margin, thereby affecting the frequency of derating actions, service availability, and operational safety margin^[13-16]. This system relevance is amplified in long-life deployments where duty cycles vary over time due to market signals, renewable intermittency, and operational modes. Therefore, SOH estimation is a system-level enabler for availability, lifetime-aware dispatch, maintenance scheduling, and risk control.

Table 2 compares major SOH estimation strategies in terms of signal basis, label requirement, strength, limitations, and scenarios. The comparison shows that method selection is determined not only by algorithmic form, but also by observability, supervision, and deployment constraints. In practical applications, this distinction is critical because field operation is usually limited to voltage, current, and temperature measurements under non-stationary conditions.

Table 2

Comparison of major SOH estimation methods

Strategy	Signal basis	Label need	Strength	Limitation	Scenario
Physics-based	Model-constrained	Sparse	Consistency	Calibration	Diagnosis
Feature-based ML	HIs	Moderate	Efficiency	Sensitivity	Small-data SOH
Probabilistic ML	HIs	Moderate	Uncertainty	Simplicity	Risk-aware SOH
Sequence DL	Time series	Dense	Dynamics	Opacity	Long-horizon tracking
Transfer learning	Source-target features	Sparse target	Portability	Domain gap	Cross-domain SOH
Online learning	Streaming signals	Incremental	Adaptation	Drift error	Field updating
Physics-guided ML	Signals + priors	Moderate	Robustness	Complexity	Deployment

“Label need” denotes the relative amount or availability of supervision required for stable model development. “Scenario” denotes the primary application setting. For supercapacitors, the dominant health labels usually shift from capacity-oriented targets to capacitance retention and ESR growth, which further affects feature design and deployment interpretation. SOH: State-of-health; HI: health indicator; DL: deep learning; ML: machine learning; ESR: equivalent series resistance.

Machine learning has become a prominent pathway for SOH estimation because it can extract degradation-relevant signatures from field-available signals without requiring full-fidelity electrochemical models. Nevertheless, reliable ML deployment in safety- and reliability-critical energy storage must address recurring pitfalls: label definition and consistency, domain shift across temperature and duty cycles, data leakage in evaluation, interpretability of learned health indicators, and trustworthiness under distributional changes. Recent literature further emphasizes uncertainty-aware and confidence-bounded health outputs as an important ingredient for risk-aware operation and maintenance decisions^[17-19]. At the same time, purely data-driven estimation is not sufficient for all deployment settings. In safety-critical ESS applications, physically informed constraints, model-based consistency checks, and uncertainty-aware outputs are increasingly recognized as necessary complements to black-box prediction, especially when observability is incomplete and operating conditions drift over time.

Accordingly, this review is organized along an end-to-end ML-SOH pipeline covering problem definition and labeling, field data and observability, health indicators and feature representation, model learning and generalization, benchmarking and trustworthiness, and deployment-oriented synthesis. Within this structure, the discussion extends beyond lithium-ion-centered literature to multi-chemistry contexts, supercapacitor aging signatures, and hybrid-system interfaces. Figure 1 illustrates the ML-driven SOH estimation pipeline for electrochemical ESS. Existing reviews have examined machine-learning-based SOH estimation from several important angles, including lithium-ion-centered syntheses of data, features, and algorithms, reviews focused on real-world data challenges, and broader discussions of application-oriented battery machine learning^[8,20-22]. However, these perspectives do not fully resolve a question that is central to deployable SOH estimation in electrochemical ESS: how target semantics, field observability, label construction, feature representation, evaluation validity, and operational use must be aligned when device class, chemistry, and system role change. This review is positioned at that interface. Its novelty, therefore, lies less in expanding the list of algorithms than in redefining the analytical framework of ML-SOH across electrochemical storage applications. Rather than offering another algorithm-by-algorithm summary, it treats ML-SOH as a coupled engineering workflow and uses that framing to connect observability, health semantics, evaluation design, and deployment relevance across electrochemical storage applications.


Figure 1. ML-driven SOH estimation pipeline for electrochemical ESS. ML: Machine learning; SOH: state-of-health; ESS: energy storage system.

In summary, this review makes three contributions. First, it reframes machine-learning-based SOH estimation as a deployment-oriented engineering workflow rather than an isolated model-selection problem by linking health-target definition, field observability, feature representation, model learning, evaluation protocol, and operational objective into a single analytical chain. Second, it develops a device- and system-aware comparison framework for electrochemical storage health, showing that SOH semantics, useful proxies, and deployment meaning cannot be transferred mechanically across batteries, supercapacitors, and hybrid interfaces. Third, it shifts the focus of the review from nominal predictive accuracy to deployment validity by synthesizing the practical failure modes that repeatedly undermine ML-SOH in real applications - especially label inconsistency, domain shift, evaluation leakage, limited interpretability, sparse observability, and insufficient uncertainty awareness - and translating them into benchmark and reporting implications for reliable energy-storage operation.

SOH TARGETS AND DEFINITIONS

Electrochemical ESS are deployed under diverse duty cycles and sensing constraints, so state of health should not be treated as a single universal number. A more defensible view is to treat health as a family of targets that quantify different forms of capability loss and risk accumulation over time. This framing is consistent with application-oriented SOH reviews that emphasize well-defined references and operational relevance rather than purely laboratory metrics, and with system-level discussions in battery ESS that connect health to operational constraints and management actions.

In this review, we align health targets with operational decisions and distinguish four field-relevant targets: energy capability, resistance and impedance growth, power capability, and thermal safety margin. Energy capability is commonly quantified through capacity retention under a specified reference protocol and remains the most widely used end-of-life indicator for batteries. Resistance and impedance growth capture efficiency loss and heat generation, and they often trigger conservative power limiting in practice. Power capability can be formalized as the maximum deliverable or absorbable power over a specified time horizon while satisfying voltage and thermal constraints. Representative state-of-power formulations explicitly tie the power limit to constraint sets and prediction horizons, making the health-constraint link operationally meaningful^[23]. Thermal safety margin connects health to risk control because aging-induced changes increase heat generation and can shrink the allowable operating envelope.
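The link between resistance growth and power capability can be illustrated with a minimal sketch. It assumes a simple internal-resistance cell model (terminal voltage equals open-circuit voltage minus current times resistance) and considers only an instantaneous voltage floor and a current ceiling; it does not reproduce the horizon-based or thermally constrained formulations of^[23]. The function name and all numerical values are hypothetical.

```python
def max_discharge_power_w(ocv_v, r_ohm, v_min_v, i_max_a):
    """Instantaneous discharge power limit for an internal-resistance model.

    Assumed model: v_terminal = ocv_v - i * r_ohm. The admissible current is
    capped by the voltage floor, the current ceiling, and the current at
    which the model's delivered power peaks (ocv / 2R).
    """
    if r_ohm <= 0:
        raise ValueError("resistance must be positive")
    i_voltage_limit = (ocv_v - v_min_v) / r_ohm   # current that hits v_min
    i_peak_power = ocv_v / (2.0 * r_ohm)          # maximizer of (ocv - iR) * i
    current_a = max(0.0, min(i_voltage_limit, i_max_a, i_peak_power))
    terminal_v = ocv_v - current_a * r_ohm
    return terminal_v * current_a


# Hypothetical cell: the same limits, with resistance doubled by aging.
fresh_power_w = max_discharge_power_w(ocv_v=3.7, r_ohm=0.05, v_min_v=3.0, i_max_a=20.0)
aged_power_w = max_discharge_power_w(ocv_v=3.7, r_ohm=0.10, v_min_v=3.0, i_max_a=20.0)
power_retention = aged_power_w / fresh_power_w
```

In this simplified setting the voltage floor is the binding constraint, so doubling the resistance halves the deliverable power even though capacity is unchanged, which is why power-oriented and energy-oriented targets should be reported separately.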

Credible health reporting requires a transparent baseline and reference protocol. A measured initial baseline obtained after commissioning under a specified procedure should be distinguished from the manufacturer nameplate rating, which is intended for specification and may not match the measured starting point of a deployed asset. Standardized tests provide a common language for reporting reference performance. For lithium-ion cells, IEC 62660-1 specifies performance testing procedures for capacity and power and requires that conditions and reporting details be stated for comparability^[24]. For supercapacitors, IEC 62391-1 establishes generic specifications and test methods, including capacitance and internal resistance measurement procedures^[25]. Because test methods can produce systematically different capacitance values, reporting the exact characterization method is essential when capacitance is used as a health proxy^[26,27].
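The effect of the chosen baseline can be made concrete with a short sketch. It assumes the common capacity-retention definition (measured capacity divided by a stated reference capacity); the numerical values are hypothetical and do not come from any dataset discussed in this review.

```python
def capacity_soh(measured_capacity_ah, reference_capacity_ah):
    """Capacity-retention SOH as a fraction of an explicitly stated reference."""
    if reference_capacity_ah <= 0:
        raise ValueError("reference capacity must be positive")
    return measured_capacity_ah / reference_capacity_ah


# Hypothetical asset: nameplate rating versus measured commissioning baseline.
nameplate_ah = 2.2       # manufacturer specification
commissioning_ah = 2.1   # measured after commissioning under a stated protocol
current_ah = 1.89        # latest reference capacity check

soh_vs_nameplate = capacity_soh(current_ah, nameplate_ah)
soh_vs_baseline = capacity_soh(current_ah, commissioning_ah)
baseline_gap = soh_vs_baseline - soh_vs_nameplate
```

With these illustrative numbers the same measurement maps to roughly 86% or 90% SOH depending on the reference, a gap comparable to the estimation errors often reported for ML models, so the reference must be stated alongside any SOH value.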

Ground-truth labeling is therefore a central practical issue. Strong labels typically come from controlled reference tests, such as capacity checks, standardized pulse tests, or impedance measurements, but these are often available only during commissioning or maintenance windows. In contrast, real deployments frequently rely on proxy labels derived from operational segments, such as short-pulse responses, rest-and-relaxation features, and trends in direct-current internal resistance surrogates. Roman and colleagues highlighted that the reliability of data-driven pipelines hinges on consistent label definitions and evaluation protocols, especially when models are expected to generalize across operating conditions^[17]. Thelen and coauthors further argued that health estimates used for decision-making should be accompanied by uncertainty quantification to reflect limited observability and distribution shifts in the field^[18]. These considerations define a comparability boundary: quantitative performance claims are meaningful only when methods are evaluated under consistent label definitions, test procedures, and operating envelopes; otherwise, results should be framed as qualitative guidance. More generally, label quality in ML-SOH should be judged along three dimensions: fidelity, comparability, and timeliness. Fidelity refers to how closely the label reflects the actual degradation state; comparability concerns whether labels obtained under different protocols or operating envelopes can be placed on a common basis; timeliness indicates whether the label can support online, quasi-online, or only offline health updating. These dimensions help explain why seemingly similar SOH studies may not be directly comparable even when they report the same target name. 
This is also why online prognostics and health management-oriented studies increasingly treat label construction and update cadence as part of the estimation method itself rather than as a separate preprocessing detail^[28].
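As an illustration of how a strong capacity label can be derived from a reference discharge, the following sketch integrates logged current over time (coulomb counting with the trapezoidal rule). It assumes discharge current is recorded as positive and that the samples cover one complete reference discharge; the function name and the sample values are hypothetical, and sensor bias or drift, which accumulates in such integrals, is not handled.

```python
def discharge_capacity_ah(time_s, current_a):
    """Trapezoidal coulomb counting over one reference discharge.

    Assumes discharge current is positive and timestamps are non-decreasing.
    """
    if len(time_s) != len(current_a):
        raise ValueError("time and current must have the same length")
    total_ampere_seconds = 0.0
    for k in range(1, len(time_s)):
        dt = time_s[k] - time_s[k - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        total_ampere_seconds += 0.5 * (current_a[k] + current_a[k - 1]) * dt
    return total_ampere_seconds / 3600.0  # ampere-seconds to ampere-hours


# Hypothetical one-hour constant-current discharge at 2 A.
capacity_label_ah = discharge_capacity_ah([0.0, 1800.0, 3600.0], [2.0, 2.0, 2.0])
```

The fidelity of such a label depends on the discharge covering the full reference window; applied to a partial operational segment, the same integral yields only a proxy.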

Although the above principles apply broadly, practical SOH proxies and their observability vary systematically across chemistries and device types. For lithium-ion, sodium-ion, and many aqueous batteries, capacity retention and resistance growth remain dominant proxies, but their identifiability is constrained by the availability of repeatable operating segments and consistent thermal conditions. Recent reviews further indicate that sodium-ion batteries should not be treated as a direct extension of lithium-ion SOH practice. Aqueous battery systems introduce additional chemistry-specific constraints related to electrolyte stability, interfacial side reactions, dissolution, and gas evolution, all of which can alter degradation pathways and the interpretation of health proxies^[29,30].

For supercapacitors, health semantics differ more fundamentally from those of batteries, as reflected in the shift from energy-oriented proxies to power-oriented observables such as capacitance and equivalent series resistance (ESR). Whereas battery SOH is usually interpreted through capacity fade, resistance growth, and energy-delivery loss over relatively long horizons, supercapacitor health is more directly reflected in capacitance retention, equivalent series resistance increase, leakage behavior, and transient power deterioration^[31]. As a result, battery-style SOH definitions cannot simply be transferred to supercapacitors without redefining the target in relation to pulse-power delivery, thermal loss, and short-timescale dynamic response. Recovery phenomena and rest conditions can also alter apparent capacitance during aging tests, which reinforces the need to explicitly report characterization procedures and interpretation assumptions when capacitance is used as a health proxy^[27,32]. Earlier mechanistic and model-based studies established aging models for electrochemical double-layer capacitors and enabled lifetime simulation under dynamic applications by tracking parameter evolution^[33,34]. More recent literature extends this line from mechanism-oriented aging interpretation to data-driven health estimation, including machine-learning reviews focused on capacitance- and RUL-oriented prediction and field-data frameworks built on tram operation logs^[35-37]. This distinction also matters at the hybrid-system level: once batteries and supercapacitors are coupled to redistribute energy and power stress, a single scalar SOH may no longer be sufficient, and system-relevant health reporting is better expressed through component-wise indicators or constraint-based power-margin formulations^[38]. These findings support a device-aware health framework rather than a battery-centered definition extended by analogy.
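The two dominant supercapacitor proxies can be sketched from a constant-current discharge record. The example assumes the textbook relations C = I·Δt/ΔV over a chosen voltage window and ESR = ΔV/ΔI at a current step; it does not reproduce the specific windows or timing prescribed by IEC 62391-1, and all names and values are hypothetical.

```python
def capacitance_f(current_a, t_start_s, v_start, t_end_s, v_end):
    """Capacitance from a constant-current discharge: C = I * dt / dV."""
    delta_v = v_start - v_end
    if delta_v <= 0:
        raise ValueError("voltage must fall during discharge")
    return current_a * (t_end_s - t_start_s) / delta_v


def esr_ohm(v_before_step, v_after_step, current_step_a):
    """ESR from the instantaneous voltage jump at a current step."""
    if current_step_a == 0:
        raise ValueError("current step must be non-zero")
    return abs(v_before_step - v_after_step) / abs(current_step_a)


# Hypothetical fresh and aged measurements under the same (stated) procedure.
fresh_c = capacitance_f(10.0, 0.0, 2.16, 32.4, 1.08)
aged_c = capacitance_f(10.0, 0.0, 2.16, 27.0, 1.08)
fresh_esr = esr_ohm(2.70, 2.67, 10.0)
aged_esr = esr_ohm(2.70, 2.655, 10.0)

capacitance_retention = aged_c / fresh_c   # capacitance-oriented health
esr_growth = aged_esr / fresh_esr          # resistance-oriented health
```

Reporting both ratios separately, together with the voltage window and rest conditions used, avoids collapsing two distinct degradation signatures into a single battery-style number.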

Hybrid battery and supercapacitor systems combine distinct degradation behaviors, so health reporting should avoid collapsing component degradation modes into a single ambiguous index. A consistent description is either multi-output, reporting battery and supercapacitor health separately, or system-oriented, reporting an aggregate power capability margin under clearly stated constraints and horizons^[13,31,39]. At the module and pack level, additional complications arise from series-connected cells and cell-to-cell variability. For example, incremental capacity analysis can be informative for health assessment under controlled conditions, but module-level application requires careful handling of series-connected behavior and operating-segment consistency^[40]. These considerations highlight why health targets, baselines, and labels must be specified before comparing estimation methods.
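For readers unfamiliar with incremental capacity analysis, a minimal cell-level sketch is given below. It assumes a low-rate charge segment with voltage and cumulative charge sampled together, and uses a plain finite difference with a minimum voltage step to limit noise amplification; practical pipelines typically add smoothing, and module-level use would additionally require the handling of series-connected behavior noted above. Names and values are hypothetical.

```python
def incremental_capacity(voltage_v, charge_ah, min_dv=0.005):
    """Finite-difference dQ/dV curve as (midpoint voltage, dQ/dV) pairs.

    Samples are skipped until the voltage has moved by at least min_dv,
    which avoids dividing by near-zero voltage differences on plateaus.
    """
    if len(voltage_v) != len(charge_ah):
        raise ValueError("voltage and charge must have the same length")
    points = []
    last_v, last_q = voltage_v[0], charge_ah[0]
    for v, q in zip(voltage_v[1:], charge_ah[1:]):
        dv = v - last_v
        if abs(dv) >= min_dv:
            points.append(((v + last_v) / 2.0, (q - last_q) / dv))
            last_v, last_q = v, q
    return points


# Hypothetical charge segment with a steeper charge uptake around 3.15 V.
ic_curve = incremental_capacity(
    [3.0, 3.1, 3.2, 3.3], [0.0, 0.1, 0.4, 0.5], min_dv=0.05
)
peak_voltage, peak_value = max(ic_curve, key=lambda point: point[1])
```

Peak position and height extracted in this way are typical candidate health indicators, but they are only comparable across cycles when the underlying segments share current rate and temperature.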

The same point also helps define the role of lead-acid and flow batteries within a broader electrochemical storage review. In lead-acid systems, SOH interpretation is strongly influenced by sulfation, corrosion, and active-mass shedding, and the associated estimation methods have developed along a path that is not simply an extension of lithium-ion practice^[10]. Flow batteries present a different case because recent studies suggest that their health meaning is tied more closely to electrolyte imbalance, recoverable capacity effects, and flow-coupled operating behavior, so the direct transfer of conventional battery proxies is equally problematic^[11,12].

Table 3 maps field-oriented health targets and representative SOH proxies across electrochemical storage systems in terms of representation, observability, sensitivity, and applicability. With these targets and labeling conventions established, the next section reviews field data availability and observability constraints that determine which estimation strategies are feasible in real operation.

Table 3

Field-oriented health targets and proxies across electrochemical storage devices

System	Proxy	Representation	Observability	Sensitivity	Limitation	Applicability
Lithium-ion	Capacity	Integral	Partial	High	Protocol-dependent	Energy
Lithium-ion	Resistance	Differential	Accessible	Moderate	Condition-coupled	Power
Sodium-ion	Capacity	Integral	Partial	Moderate	Immature data	Energy
Aqueous	Voltage	Curve	Accessible	Moderate	Low resolution	Stationary
Supercapacitor	Capacitance	Linear	Accessible	High	Recovery effects	Power
Supercapacitor	ESR	Transient	Accessible	High	Temperature-sensitive	Power
Hybrid	Multi-output	Combined	Partial	Moderate	Coupling effects	System

Observability indicates whether a proxy can be reliably extracted under typical field operating conditions.

PRACTICAL DATA FOUNDATIONS FOR SOH ESTIMATION

In practical ESS, SOH estimation rarely relies on a single measurement or a single modeling step. Instead, it usually emerges from a workflow in which field signals are screened, preprocessed, converted into usable representations, and then linked to health outputs through estimation models. Figure 2 illustrates this general workflow under practical observability constraints.

Figure 2. The general flow of SOH estimation. RNN: Recurrent neural network; GRU: gated recurrent unit; BiLSTM: bidirectional long short-term memory; CNN: convolutional neural network; EMD: empirical mode decomposition; GPR: Gaussian process regression; SOH: state-of-health.

Data quality and preprocessing

In practical SOH estimation, data often determine the upper bound of model performance before model architecture does. Even a sophisticated learning framework can only be as reliable as the labels it is trained on, the signals that are observable in real operation, and the range of operating conditions represented in the training set. For this reason, recent methodological work has increasingly treated data foundations as a primary technical issue rather than a background condition, emphasizing that label quality, observability, and operating-domain coverage jointly define the credibility of SOH estimation results^[8,41,42]. A core difficulty is that real electrochemical ESS rarely provide laboratory-grade observability. In most deployments, continuously available measurements are limited to voltage, current, and temperature, sometimes supplemented by control commands, operating modes, and fault logs. By contrast, measurements that are highly informative for degradation, such as impedance spectra, pressure, or gas evolution, are usually unavailable online or only obtainable during maintenance. This mismatch between field observability and degradation physics is one of the main reasons why strong performance in laboratory studies does not automatically translate to robust field deployment^[43]. Nonstationary operation further complicates the data problem. Signatures learned under a single temperature band, charging protocol, or duty profile may lose validity when the same asset later participates in another service or operates under a new control policy. Cross-condition robustness therefore cannot be inferred from a single average error metric. Instead, practical SOH estimation requires explicit consideration of domain shift, source-target similarity, and evaluation settings that separate within-domain fit from across-domain generalization^[44,45]. 
The variables most commonly used as inputs are voltage, current, and temperature, as they can capture major aging-related factors; however, high-frequency sampling across long operations can accumulate measurement errors and propagate uncertainty into the estimator inputs.
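One way to separate within-domain fit from across-domain generalization is to hold out an entire operating domain at a time. The sketch below assumes each record carries a domain tag such as a temperature band; the field names and records are hypothetical, and the same pattern applies to grouping by cell, site, or duty profile.

```python
from collections import defaultdict


def leave_one_group_out(records, group_key):
    """Yield (held_out_group, train_indices, test_indices) per group.

    Every record of the held-out group goes to the test set, so no sample
    from that operating domain can leak into training.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[record[group_key]].append(index)
    for held_out, test_indices in sorted(groups.items()):
        train_indices = [
            i for group, indices in groups.items() if group != held_out
            for i in indices
        ]
        yield held_out, train_indices, test_indices


# Hypothetical records tagged by temperature band.
records = [
    {"cell": "A", "temp_band": "cold", "soh": 0.95},
    {"cell": "A", "temp_band": "cold", "soh": 0.93},
    {"cell": "B", "temp_band": "warm", "soh": 0.90},
    {"cell": "C", "temp_band": "warm", "soh": 0.88},
    {"cell": "D", "temp_band": "hot", "soh": 0.80},
]
splits = list(leave_one_group_out(records, "temp_band"))
```

Reporting the error of each held-out domain separately, rather than one pooled average, makes the across-domain degradation visible instead of hiding it behind within-domain performance.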

These observations lead to two practical requirements for deployable SOH estimation. First, data quality control should be treated as an integral part of the estimation method rather than as a secondary preprocessing step. Second, benchmarking protocols should reflect field constraints instead of assuming laboratory-grade observability, dense labels, and idealized operating regularity.

Field observability

To connect data limitations with method design, field observability can be organized into three practical levels. The first level is routine observability, which includes continuously available measurements such as voltage, current, temperature, timestamps, operating states, and event logs. These signals support online estimation and are the backbone of most deployable SOH methods, but they often provide only indirect access to degradation. The second level is opportunistic observability, which includes informative segments that arise naturally during operation, such as charge segments, discharge segments, rest periods, relaxation windows, and pulse-like transients induced by control actions. These segments are often more informative than routine streams for estimating practical health proxies, particularly resistance-related and dynamic-response-related quantities. However, their occurrence is policy-dependent, and their repeatability can be poor across sites and services. The third level is maintenance observability, which includes capacity checks, standardized pulse tests, and impedance measurements. These measurements are sparse, but they usually support stronger labels and more interpretable health inference. As a result, the same nominal SOH target can have very different ground-truth quality depending on whether it is derived from routine signals, opportunistic segments, or maintenance diagnostics. This difference is often hidden in published studies and is one of the main sources of inconsistency in cross-paper comparison^[17,18,25].
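Opportunistic observability can be illustrated by extracting rest windows from a routine current stream. The sketch assumes that a rest is any contiguous run in which the absolute current stays below a small threshold for a minimum duration; both thresholds are hypothetical and would need to be chosen per asset and sensor.

```python
def find_rest_segments(time_s, current_a, max_abs_current_a=0.05,
                       min_duration_s=600.0):
    """Return (start_index, end_index) pairs of sufficiently long rest runs."""
    if len(time_s) != len(current_a):
        raise ValueError("time and current must have the same length")
    segments = []
    start = None
    for i, current in enumerate(current_a):
        resting = abs(current) <= max_abs_current_a
        if resting and start is None:
            start = i                      # a rest run begins
        elif not resting and start is not None:
            if time_s[i - 1] - time_s[start] >= min_duration_s:
                segments.append((start, i - 1))
            start = None                   # the rest run ended
    if start is not None and time_s[-1] - time_s[start] >= min_duration_s:
        segments.append((start, len(current_a) - 1))
    return segments


# Hypothetical log sampled every 300 s: one long rest and one short rest.
log_time_s = [0.0, 300.0, 600.0, 900.0, 1200.0, 1500.0, 1800.0]
log_current_a = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
rest_segments = find_rest_segments(log_time_s, log_current_a)
```

Counting how often such qualifying segments occur per site or service is a simple way to quantify the policy dependence and poor repeatability discussed above before committing to a relaxation-based health indicator.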

Open datasets

Open datasets play a central role in ML-driven SOH research because they determine which methods can be reproduced, compared, and stress-tested. At the same time, available datasets are unevenly distributed across chemistries or device families. Lithium-ion datasets are the most mature and abundant, while openly accessible datasets for sodium-ion batteries, aqueous batteries, and supercapacitors remain relatively limited. As a result, the evidence base for ML-SOH is still strongly shaped by lithium-ion-centric benchmarks. Accordingly, cross-chemistry ML-SOH should presently be interpreted more as a transferability problem than as a fully benchmarked consensus across all electrochemical storage families, with sodium-ion batteries, aqueous batteries, and supercapacitors serving mainly as comparative or emerging domains rather than equally mature benchmark ecosystems^[35-38].

NASA

In 2008, NASA published the first comprehensive battery dataset^[46], followed by a subsequent “random usage dataset” in 2014^[47]. The 2008 dataset contains cycling data for lithium-ion batteries tested at multiple temperatures (4 °C, 24 °C, and 43 °C)^[46]. These batteries were charged using a constant current-constant voltage (CC-CV) protocol with varied discharge methods. The dataset records in-cycle parameters (current, voltage, temperature) as well as cycle-level measurements (discharge capacity, impedance). The 2014 dataset focuses on randomized usage patterns and features lithium cobalt oxide (LCO) batteries (nominal capacity: 2.2 Ah). These cells were divided into seven groups of four, tested at either room temperature or 40 °C. Five groups underwent CC-CV charging followed by randomized discharge currents (seven distinct profiles), while the remaining two groups experienced fully randomized charge/discharge protocols. The dataset provides in-cycle measurements (current, voltage, temperature) along with periodic capacity checks and electrochemical impedance spectroscopy (EIS) data every 50 cycles.

The main advantages of the NASA datasets lie in their clear cycle-level structure and strong historical influence, which make them a common starting point for ML-SOH benchmarking. Their limitation is that they remain laboratory-centric and cover a relatively narrow set of cell types and profile designs^[47-50].

The Center for Advanced Life Cycle Engineering

The Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland has conducted extensive battery cycling tests, generating three main datasets in addition to other battery data not primarily intended for SOH estimation^[51]. The LCO Prism CS2 Battery Dataset contains experimental data from 15 lithium cobalt oxide batteries categorized into six types (Types 1-6) based on testing conditions^[52]. Type 1 includes four 0.9 Ah cells, while Type 2 comprises four 1.1 Ah cells, with Types 3 through 6 containing one or two 1.1 Ah cells each. All cells were charged using a CC-CV protocol (0.5 C constant current to 4.2 V, followed by constant voltage until the current dropped below 0.05 A) and then cycled at 23 °C under varying partial charge/discharge depths and current rates. Another dataset focuses on 16 LCO pouch cells (1.5 Ah) tested at 25 °C to examine the impact of different state-of-charge (SOC) ranges and discharge currents on battery performance. The third dataset involves 12 CX2 batteries (1.35 Ah) that follow the same charging protocol but are divided into six testing groups, where Types 1-2 contain four cells each tested similarly to CS2 cells, while Types 3-6 feature individual cells subjected to unique cycling conditions, including temperature variations from 25 °C to 55 °C. All datasets provide detailed cycling parameters, including current, voltage, and temperature measurements.

CALCE datasets have also been widely used because they provide multiple cell tests under controlled protocols and often include relatively rich documentation. This makes them valuable for benchmarking sensitivity to protocol variation and cell-to-cell variability. However, the coexistence of multiple test conditions and metadata styles means that careless pooling can introduce confounding effects, especially when temperature, chemistry, or policy differences align with train-test splits^[51].

The Oxford Battery Intelligence Laboratory

The Oxford Battery Intelligence Laboratory released comprehensive degradation data in 2017 for eight 740 mAh lithium-ion batteries cycled at 40 °C under CC-CV charging and urban Artemis discharge profiles^[53]. Organized into two subsets, the dataset includes cyclically recorded measurements of voltage, thermal conditions, and discharge capacity obtained from characterization tests performed every hundred cycles. The Energy Trading Dataset documents a year-long experiment where six cells followed real-world grid-trading current profiles, recording monthly capacity measurements alongside operational current, voltage, and temperature data. The Path Dependence Dataset examines aging sequence effects using twelve lithium nickel cobalt aluminum oxide (NCA)/graphite 18650 cells divided into four groups subjected to different cycling-rest patterns: Groups 1-2 underwent 1-day cycling followed by 5-day calendar aging at C/2 or C/4 rates, while Groups 3-4 experienced extended 2-day cycling with 10-day rest periods under similar conditions. Together, these datasets provide detailed insights into battery degradation mechanisms under varied operational scenarios.

Oxford datasets provide an important bridge toward stationary and long-horizon perspectives. The energy trading dataset records batteries operated under grid-trading current profiles over an extended period, together with monthly capacity checks, thereby making it particularly relevant for non-vehicle storage applications. The path dependence dataset is valuable because it shows that degradation is influenced not only by cumulative throughput but also by the ordering and temporal structure of operational stress. These datasets are therefore highly informative for understanding nonstationarity and protocol dependence, although their labels are relatively sparse compared with laboratory cycling datasets^[53,54].

The Massachusetts Institute of Technology

The Toyota Research Institute (TRI), together with MIT and Stanford University, published a battery cycling dataset comprising 124 commercial lithium iron phosphate (LFP)/graphite cells (A123 Systems APR18650M1A) tested under accelerated charging conditions^[55,56]. The 1.1 Ah (3.3 V nominal) cells were cycled to failure in a temperature-controlled environment (30 °C) using a 48-channel Arbin LBT system. The study employed customized fast-charging protocols following either one-step or two-step patterns denoted as “C1(Q1)-C2”, where C1/C2 represent constant-current phases and Q1 indicates the SOC transition point (terminating at 80% SOC before switching to standard 1C CC-CV). Voltage limits were maintained at 3.6 V (upper) and 2.0 V (lower) per manufacturer specifications, with all cells undergoing 4C discharges. Resistance characterization involved averaging ten charge-phase current pulses (±3.6C, 30-33 ms duration) at 80% SOC. Notably, prolonged cycling caused some cells to reach voltage cutoffs prematurely during fast charging, resulting in extended constant-voltage phases.

This dataset, often cited through the Severson et al. study, has become one of the most influential benchmarks for early-life battery-life prediction. Its importance lies not only in sample size, but also in the combination of strong cell-to-cell variability and systematically varied fast-charging policies. That makes it especially useful for testing whether early-cycle features are genuinely informative and whether train-test splits are robust to hidden correlations. For battery ESS, however, its relevance is indirect: it is highly valuable for methodological benchmarking and feature discovery, but less representative of the slower, more heterogeneous, and operationally constrained duty cycles typical of stationary storage. A further limitation is that improper random splitting can still mix correlated trajectories, leading to data leakage and overoptimistic conclusions^[41].
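
The leakage risk noted above can be made concrete with a short Python sketch of a cell-wise split, in which all cycles of a given cell fall on one side of the train-test boundary. The record layout `(cell_id, features, label)`, the function name `split_by_cell`, and the toy data are assumptions introduced for illustration only; they are not part of any of the datasets discussed here.

```python
import random
from collections import defaultdict


def split_by_cell(records, test_fraction=0.25, seed=0):
    """Split cycle-level records so that each cell appears in only one subset.

    `records` is a list of (cell_id, features, label) tuples; this layout is
    an assumption made for the sketch.
    """
    by_cell = defaultdict(list)
    for rec in records:
        by_cell[rec[0]].append(rec)

    # Shuffle cell identifiers, not individual cycles.
    cell_ids = sorted(by_cell)
    random.Random(seed).shuffle(cell_ids)

    n_test = max(1, round(len(cell_ids) * test_fraction))
    test_cells = set(cell_ids[:n_test])

    train = [r for c in cell_ids if c not in test_cells for r in by_cell[c]]
    test = [r for c in cell_ids if c in test_cells for r in by_cell[c]]
    return train, test


# Hypothetical toy data: 8 cells with 5 cycles each.
records = [
    (f"cell_{i}", [cycle], 1.0 - 0.01 * cycle)
    for i in range(8)
    for cycle in range(5)
]
train, test = split_by_cell(records)
```

A random split at the cycle level would instead place neighbouring cycles of the same cell in both subsets, which is the kind of correlated-trajectory mixing described above.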

Table 4 summarizes representative open datasets commonly used in ML-driven SOH studies. Presenting them at the dataset level rather than the institution level clarifies differences in chemistry, cell format, nominal capacity, and operating conditions, all of which are relevant to model comparability and transferability.

Table 4

The open-source battery datasets

Dataset Name	Battery type	Battery format	Battery capacity	Operating conditions	Source
NASA Aging	Li-ion	18650 cylindrical	2.0 Ah	CC-CV; multiple temperatures; EIS; run-to-failure	NASA^[46]
NASA Random Usage	LCO/graphite	18650 cylindrical	2.2 Ah	Randomized cycling; 25-40 °C; reference cycles	NASA^[47]
CALCE CS2	LiCoO₂	Prismatic	1.1 Ah	Partial cycling; variable temperature	CALCE^[51,52]
CALCE PL	LiCoO₂-based	Pouch	1.5 Ah	Partial cycling; SOC-window variation	CALCE^[51]
CALCE CX2	LiCoO₂	Prismatic	1.35 Ah	Protocol variation; pulsed discharge; variable temperature	CALCE^[51]
Oxford OBDD1	Li-ion	Pouch	0.74 Ah	40 °C; Artemis discharge; periodic characterization	Oxford^[53]
TRI-MIT Fast Charging	LFP/graphite	18650 cylindrical	1.1 Ah	30 °C; 72 fast-charge policies; run-to-failure	TRI/data.matr.io^[55,56]

CC-CV: Constant current-constant voltage; EIS: electrochemical impedance spectroscopy; SOC: state of charge; LFP: lithium iron phosphate; LCO: lithium cobalt oxide; NCA: lithium nickel cobalt aluminum oxide; CALCE: The Center for Advanced Life Cycle Engineering; TRI: The Toyota Research Institute.

Taken together, these representative open datasets provide a useful basis for algorithm development, feature validation, and protocol-sensitive benchmarking, but they remain only partially representative of practical energy storage deployment. Their common shortcoming is that most are cell-level, laboratory-generated, and label-rich only under controlled protocols. By contrast, practical battery ESS operate at the module, rack, or system level, under long-horizon, policy-driven, and often nonstationary conditions, with sparse maintenance labels and limited observability^[57]. This gap explains why cross-dataset generalization, uncertainty-aware estimation, and deployment-oriented benchmarking deserve to be treated as central themes rather than supplementary remarks. In other words, the problem is not only whether there are enough data, but whether the available data capture the operational diversity that a reliable SOH estimator must face in real energy storage applications.

HEALTH INDICATORS AND FEATURE REPRESENTATION

Health indicators (HIs) form the bridge between the problem formulation in Section SOH TARGETS AND DEFINITIONS and the field observability constraints discussed in Section PRACTICAL DATA FOUNDATIONS FOR SOH ESTIMATION. Once SOH targets are defined and the practically available signals are clarified, the next question is not simply which model to use, but which representations of voltage, current, temperature, time history, or transient response actually carry transferable degradation information. In this sense, HI design is not a secondary preprocessing step. It is a central methodological decision that shapes model accuracy, interpretability, robustness, and deployment feasibility. The importance of feature representation has been emphasized in both earlier and more recent studies. Zheng et al.^[58] discussed the diagnostic value of incremental-capacity and differential-voltage analyses for capturing degradation-related changes, while the review in^[20] further noted that feature extraction and selection remain critical in ML-based SOH estimation when labels are sparse and operating conditions are non-stationary.

HI taxonomy and physical meaning

Curve-derived indicators

The most intuitive class of HIs is derived directly from charging and discharging curves. These indicators include charge or discharge duration, constant-voltage stage duration, voltage plateau displacement, peak position and peak magnitude on incremental-capacity (IC) or differential-voltage (DV) curves, and other descriptors obtained from reshaped electrochemical trajectories. Their value lies in compressing a long electrochemical process into a small number of physically interpretable markers. In lithium-ion batteries, IC/DV features are especially useful because changes in peak location, height, and area often track loss of cyclable lithium, active-material degradation, or increased polarization. Charging-curve-based studies have further shown the practical value of curve-derived descriptors for SOH estimation^[59]. However, not all such features are equally robust, a limitation that has been discussed more explicitly in recent studies. In^[60], the ranking of candidate IC features changed once robustness across charge rate and temperature was included in the assessment, indicating that features selected under one protocol do not necessarily remain optimal under another. A broader discussion of this issue is also given in^[20], where feature usefulness is considered together with data realism and operating-condition variability.
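
As a minimal illustration of how an IC peak descriptor can be obtained, the following Python sketch estimates dQ/dV by central differences and returns the position and height of the largest peak. The function names, the `step` parameter, and the logistic-shaped synthetic charging curve are assumptions made for the example; practical pipelines usually add filtering or resampling, which is omitted here.

```python
import math


def ic_curve(voltage, capacity, step=2):
    """Central-difference estimate of dQ/dV at each interior sample.

    Assumes `voltage` is strictly increasing over the charging segment and
    `capacity` is the cumulative charge at each voltage sample.
    """
    v_mid, dq_dv = [], []
    for k in range(step, len(voltage) - step):
        dv = voltage[k + step] - voltage[k - step]
        dq = capacity[k + step] - capacity[k - step]
        v_mid.append(voltage[k])
        dq_dv.append(dq / dv)
    return v_mid, dq_dv


def main_peak(v_mid, dq_dv):
    """Return (voltage, height) of the largest IC peak."""
    k = max(range(len(dq_dv)), key=dq_dv.__getitem__)
    return v_mid[k], dq_dv[k]


# Synthetic charging curve (hypothetical): a 2 Ah logistic step centred at 3.7 V.
voltage = [3.4 + 0.005 * k for k in range(121)]
capacity = [2.0 / (1.0 + math.exp(-(v - 3.7) / 0.03)) for v in voltage]

v_mid, dq_dv = ic_curve(voltage, capacity)
peak_v, peak_h = main_peak(v_mid, dq_dv)
```

On this synthetic curve the peak sits at the centre of the logistic step; on measured data, the same two numbers would be tracked over cycles as candidate HIs.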

For the present review, curve-derived indicators should not be treated as an unstructured pool of candidate features. A more useful interpretation is to regard them as observability-dependent summaries. Full-profile IC/DV analysis can be highly informative under laboratory protocols, but the same indicators may become unavailable or distorted under partial charging, irregular user operation, or incomplete discharge windows. This distinction matters for real ESS deployment. A feature that is highly informative under standardized cycling but rarely observable online has limited practical value, whereas a slightly less expressive feature that can be extracted robustly from partial operational segments may be more valuable for deployment-oriented SOH estimation^[60,61].

Pulse- and response-based indicators

A second category of HIs is extracted from short transient responses rather than from complete charge-discharge trajectories. Typical examples include voltage drop under current pulses, relaxation slope after excitation removal, short-window resistance surrogates, and energy- or power-related descriptors computed from operational events. These indicators are attractive because they are more compatible with field conditions, where complete reference cycles are rare but local perturbations, drive events, or charging fragments are common. In practical terms, pulse- and response-based HIs often provide a compromise between physical relevance and online availability, particularly when the target application emphasizes power fade or dynamic capability rather than capacity loss alone^[61,62]. This class of indicators is also closely related to resistance-sensitive and short-window observables emphasized in rapid SOH estimation and feature-fusion studies^[63,64].
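
A short-window resistance surrogate of the kind mentioned above can be sketched as the ratio of voltage change to current change around a current step. The Python example below is a simplified illustration: the function name `pulse_resistance`, the averaging windows, and the synthetic 20 mΩ pulse are assumptions, and no particular test standard is implied.

```python
def pulse_resistance(time_s, current_a, voltage_v, t_step, window_s=0.5):
    """Resistance surrogate |dV/dI| around a current step at time `t_step`.

    Samples are averaged over `window_s` seconds before and after the step.
    """
    def window_mean(values, t_lo, t_hi):
        picked = [v for t, v in zip(time_s, values) if t_lo <= t < t_hi]
        if not picked:
            raise ValueError("no samples in the requested window")
        return sum(picked) / len(picked)

    i_before = window_mean(current_a, t_step - window_s, t_step)
    i_after = window_mean(current_a, t_step, t_step + window_s)
    v_before = window_mean(voltage_v, t_step - window_s, t_step)
    v_after = window_mean(voltage_v, t_step, t_step + window_s)

    delta_i = i_after - i_before
    if delta_i == 0:
        raise ValueError("no current step detected")
    return abs((v_after - v_before) / delta_i)


# Synthetic pulse (hypothetical): rest, then a 5 A discharge across 20 mOhm.
time_s = [k / 10 for k in range(41)]
current_a = [0.0 if t < 2.0 else -5.0 for t in time_s]
voltage_v = [3.70 if t < 2.0 else 3.60 for t in time_s]

r_pulse = pulse_resistance(time_s, current_a, voltage_v, t_step=2.0)
```

Because the post-step window also contains early polarization, the value depends on `window_s`, which mirrors the excitation-dependence of pulse features discussed in this subsection.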

This class of HIs is also important because it broadens SOH estimation beyond a purely capacity-centric perspective. In many ESS applications, the first operational symptom of degradation is not the loss of nominal energy, but a reduction in fast power delivery, an increase in ohmic drop, or a change in recovery behavior after transient loading. Accordingly, pulse-response HIs are particularly relevant when the health target includes dispatchable power, thermal burden, or dynamic efficiency. Their main limitation is that they depend strongly on the excitation condition itself: if pulse amplitude, duration, temperature, or SOC window are not comparable, the same response feature may encode both aging and operating-condition bias. This is precisely why pulse features should be discussed together with observability and domain shift, rather than as universally valid signals^[61,62,65]. This perspective is also consistent with fast-charging SOH studies showing that diagnostically useful health information can still be extracted from operationally constrained charging segments rather than from full reference cycles^[66].

Impedance-related indicators

Impedance-related HIs occupy a special position because they are directly linked to power capability, internal losses, and electrochemical kinetics. A battery may still retain substantial capacity while already exhibiting degraded high-power performance, in which case impedance growth becomes a more actionable health descriptor than capacity fade. This is why impedance-based SOH assessment is repeatedly highlighted as crucial in applications where power capability matters. Depending on sensing availability, impedance information may appear in full EIS measurements, simplified frequency-domain descriptors, geometric indicators from Nyquist curves, or resistance surrogates derived from current-voltage response during operation^[20,62]. However, impedance-based HIs should not be idealized. Full EIS remains highly informative but is often unavailable in routine ESS operation, while simplified resistance features are more practical but less specific. In impedance-oriented studies, both geometric descriptors from Nyquist curves and timescale-related quantities extracted from impedance trajectories have been explored as useful SOH features; however, their use in online systems remains constrained by sensing cost, excitation requirements, and data-processing complexity. In practice, the key issue is not whether impedance is informative, but whether the selected impedance proxy can actually be obtained under the intended operating regime. Recent monitoring perspectives further stress that impedance-linked observables remain among the most operationally meaningful signatures when high-power performance and safety margins are of concern^[67].

Thermal- and efficiency-related indicators

Thermal and efficiency-related HIs are sometimes underused in SOH studies, but they are increasingly relevant for real systems. Degradation affects Joule heating, polarization heat, round-trip efficiency, and energy throughput under load. As a result, temperature rise, thermal gradients, heat-generation tendency, or energy-efficiency decline can serve as indirect markers of health evolution, especially in applications where operational safety and thermal margin are central concerns. Field-oriented and multimodal studies have increasingly used energy-based and power-based quantities as practical health surrogates, especially in settings where laboratory reference tests are unavailable but operational logs remain accessible^[68]. These indicators are especially valuable because they make clear that HI design should follow the health target rather than the other way around. If the target problem is capacity certification under controlled cycling, curve-derived features may dominate. If the target is thermal-safe dispatch in a real ESS, then temperature- or efficiency-related HIs may be more informative than a single capacity proxy. A related implication is that a useful HI taxonomy is not merely descriptive. It helps align the feature space with the actual decision context of the storage system. This alignment is one of the main differences between benchmark-driven SOH estimation and deployment-oriented SOH estimation^[18].
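
An energy-based surrogate such as round-trip efficiency can be computed directly from operational logs. The Python sketch below integrates power separately over charging and discharging samples; the sign convention (positive current means charging), the left-rectangle integration, and the synthetic log are assumptions for illustration. The ratio is only meaningful when the device returns to approximately the same SOC at the end of the window.

```python
def round_trip_efficiency(time_s, current_a, voltage_v):
    """Ratio of discharged to charged energy over one logged window.

    Assumes positive current denotes charging and uses a left-rectangle rule.
    """
    e_charge = 0.0
    e_discharge = 0.0
    for k in range(len(time_s) - 1):
        dt = time_s[k + 1] - time_s[k]
        energy = voltage_v[k] * current_a[k] * dt  # joules
        if energy > 0:
            e_charge += energy
        else:
            e_discharge -= energy
    if e_charge == 0:
        raise ValueError("window contains no charging energy")
    return e_discharge / e_charge


# Synthetic log (hypothetical): 1 h charge at 3.8 V, then 1 h discharge at 3.5 V.
time_s = [60.0 * k for k in range(121)]
current_a = [2.0] * 60 + [-2.0] * 61
voltage_v = [3.8] * 60 + [3.5] * 61

eta = round_trip_efficiency(time_s, current_a, voltage_v)
```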

Figure 3 illustrates how practically available signals are transformed into different HI representations and how these representations acquire distinct health meanings in batteries and supercapacitors.

Figure 3. Health indicators and representation pathways across batteries and supercapacitors. ESR: Equivalent series resistance.

While the examples in this Section focus mainly on batteries and supercapacitors, the same representation logic also applies to other electrochemical storage chemistries, although the observability and physical meaning of specific indicators may differ.

Device-specific signatures

Although the HI taxonomy above provides a common representation framework, the physical meaning and practical value of a given indicator remain device-dependent. This distinction is particularly important for electrochemical ESS, where batteries and supercapacitors may share observable signals but differ substantially in health semantics and service objectives. A device-specific discussion is therefore necessary before moving from representation to model design.

For batteries, HI design is typically organized around four degradation manifestations: capacity fade, resistance growth, polarization increase, and loss of energy or power delivery under load. Curve-derived features are useful because they reflect changes in electrochemical staging and reaction pathways, while resistance- and impedance-related features capture internal transport loss and degraded power capability. In practical terms, battery HIs are meaningful when they link measurable signal distortion to a health consequence that matters for operation - for example, reduced usable energy, increased voltage sag, or stronger heat generation during the same duty cycle. This is why battery SOH estimation literature has progressively shifted from single-capacity proxies toward multi-HI formulations that combine energy, resistance, and temporal descriptors. Interpretable short-window voltage and feature-parameter studies similarly show that these manifestations are often best captured by compact indicators with clear physical meaning rather than by indiscriminate feature accumulation^[69].

Another important feature of battery HIs is that they often evolve across relatively long horizons. Many useful indicators become apparent only after repeated cycling, cumulative throughput, or staged changes in charging behavior. This gives battery health signatures a strong temporal structure and partly explains why sequential learning models perform well once meaningful HIs have been extracted. Yet the longer-timescale nature of battery degradation also means that apparent HI stability on laboratory datasets can be misleading if the same indicator depends strongly on protocol regularity. This point is illustrated in both domain-guided and real-world studies. The feature-selection study in^[58] explicitly evaluates IC features under robustness constraints, while^[70] extracts health-related indicators from online vehicle driving data. Taken together, these studies suggest that physically informed and operationally extractable indicators are often more valuable for deployment than large but fragile feature pools.

For supercapacitors, health semantics are fundamentally different. Their value is tied much more directly to high-power buffering, rapid charge-discharge behavior, and cycling endurance than to high specific energy. Accordingly, the most meaningful HIs are usually capacitance retention, equivalent series resistance (ESR) growth, leakage or self-discharge behavior, and transient power deterioration. Reviews on supercapacitor management and reliability consistently treat these variables as the core observables for monitoring, protection, and lifetime assessment. In other words, a supercapacitor is rarely judged healthy simply because it can still store charge; it must also preserve low-loss, fast-response, and thermally acceptable high-power behavior^[71,72]. This supercapacitor-centered view is also reinforced by recent review work on reliability, management, and deployment constraints in power-buffering applications^[72,73].
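
For a constant-current discharge, capacitance and ESR can be approximated from the initial voltage drop and the subsequent voltage slope. The Python sketch below uses a simplified two-point estimate; the function name, the ideal linear discharge, and the numerical values (100 F, 10 mΩ, 10 A) are assumptions chosen for illustration and do not follow any specific test standard.

```python
def supercap_indicators(v_rest, time_s, voltage_v, current_a):
    """Two-point estimates of ESR and capacitance from a constant-current discharge.

    `v_rest` is the voltage just before the discharge starts; `voltage_v[0]` is
    the first sample after the current is applied.
    """
    esr = (v_rest - voltage_v[0]) / current_a
    capacitance = current_a * (time_s[-1] - time_s[0]) / (voltage_v[0] - voltage_v[-1])
    return capacitance, esr


# Synthetic discharge (hypothetical): C = 100 F, ESR = 10 mOhm, I = 10 A.
v_rest = 2.7
time_s = [float(t) for t in range(11)]
voltage_v = [2.6 - 0.1 * t for t in time_s]

capacitance, esr = supercap_indicators(v_rest, time_s, voltage_v, current_a=10.0)
retention = capacitance / 100.0  # relative to an assumed initial capacitance of 100 F
```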

This distinction becomes even more important when aging phenomena are considered. During supercapacitor aging, performance degradation is typically observed as a decrease in capacitance and an increase in ESR, but this evolution is not always monotonic. Under power cycling, rest periods can induce partial recovery, especially in apparent capacitance, while ESR may also change with interruption duration^[32]. Temperature and voltage stress are repeatedly identified as dominant drivers of supercapacitor aging, and self-discharge or leakage behavior can become a reliability concern in long-duration idle conditions. Therefore, supercapacitor HIs must be interpreted with attention to duty cycle, rest behavior, and stress condition, rather than borrowed directly from battery aging practice^[35,74].

Batteries and supercapacitors do share observable signals such as voltage, current, temperature, and operational history. They also share a methodological affinity with data-driven learning, because both device classes generate rich time-series information. However, shared signals do not imply shared health semantics. The same voltage response can indicate energy fade in a battery but rising ESR in a supercapacitor. The same current pulse may reveal polarization growth in one device and fast power degradation in the other. The term “capacity” can be misleading if transferred across device classes without qualification: in batteries, it is tied to deliverable charge over a relatively broad energy window; in supercapacitors, capacitance is tied much more directly to charge-voltage proportionality and short-timescale power support. Consequently, the value of a feature depends not only on how easy it is to extract, but also on whether its physical interpretation remains valid for the target device class^[71,72]. Recent degradation studies likewise emphasize that capacitance fade, ESR growth, and lifecycle dispersion depend strongly on temperature, duty profile, and mission structure^[75-77].

For this reason, a battery-centered HI framework should not be extended to supercapacitors by analogy. This is also why the present Section emphasizes feature representation rather than generic feature engineering. A feature should be judged not only by predictive utility, but also by semantic validity within the target electrochemical system.

The same principle extends beyond the battery-supercapacitor contrast: in lead-acid batteries, degradation signatures must be interpreted against sulfation- and corrosion-dominated aging, whereas in flow batteries the meaning of observable signals is more tightly coupled to electrolyte-side imbalance and flow-dependent operating behavior, so even shared measurements remain informative only when read within the health semantics of the target device class^[10-12].

Robustness of HIs under domain shift

A good HI is not merely correlated with SOH; it should remain informative when the operating domain changes. This requirement is difficult to satisfy because many widely used indicators are entangled with temperature, current rate, SOC window, and protocol structure. IC peaks can shift or blur under different charge rates. Voltage-time features can stretch or compress with thermal conditions. Pulse-response features can vary due to excitation design rather than actual degradation. The IC feature-selection study in^[60] makes this point particularly clear: once robustness across charge rate and temperature is included in the assessment, the ranking of candidate features changes accordingly. This means that an HI selected under one protocol may not remain optimal under another.
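
The effect of including robustness in feature assessment can be illustrated with a small Python sketch that scores each candidate HI by its worst-case absolute Pearson correlation with SOH across operating conditions. This scoring rule is an assumption introduced here for illustration and is not the procedure of^[60]; the feature names, condition labels, and values are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_features(data, conditions=None):
    """Rank features by their minimum |correlation| with SOH over `conditions`."""
    conditions = list(data) if conditions is None else conditions
    names = list(data[conditions[0]]["features"])
    scores = {
        name: min(
            abs(pearson(data[c]["features"][name], data[c]["soh"]))
            for c in conditions
        )
        for name in names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical measurements under two operating conditions.
soh = [1.00, 0.95, 0.90, 0.85, 0.80]
data = {
    "0.5C_25C": {"soh": soh, "features": {
        "peak_height": [5.0, 4.5, 4.0, 3.5, 3.0],
        "cv_time": [10.0, 12.0, 11.0, 14.0, 15.0]}},
    "1C_40C": {"soh": soh, "features": {
        "peak_height": [4.0, 4.2, 3.9, 4.1, 4.0],
        "cv_time": [11.0, 12.0, 13.5, 14.0, 16.0]}},
}

single_condition = rank_features(data, conditions=["0.5C_25C"])
robust = rank_features(data)
```

In this toy example, `peak_height` ranks first when only one condition is considered, whereas `cv_time` ranks first once the second condition is included, which is the kind of ranking change described above.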

This issue becomes even more severe in real systems because operating windows are rarely complete or repeatable. Electric-vehicle (EV) charging sessions may terminate before the constant-voltage stage stabilizes, stationary ESS dispatch may involve shallow cycling within restricted SOC regions, and module- or pack-level data may exhibit cell inconsistency that is absent from cell-level laboratory experiments. Recent multimodal and real-world studies emphasize that point features extracted in idealized cell tests may not transfer directly to packs and field assets, precisely because operational context changes the observability of degradation. Therefore, HI robustness should be treated as an empirical, deployment-dependent property rather than an intrinsic property of the feature itself^[78,79].

A second source of HI fragility is that feature usefulness often changes over the aging trajectory itself. Some indicators are highly sensitive in early degradation but saturate later. Others remain nearly flat for a long time and then become informative only near the end of life. This stage dependence is well aligned with the broader probabilistic health literature, which argues that uncertainty and interpretability should be assessed together with the phase of degradation rather than assuming uniform feature quality across the full lifecycle. In practice, this means that HI selection should avoid treating all aging stages as statistically homogeneous.

For supercapacitors, the problem is compounded by recovery behavior. Under cycling tests, apparent capacitance can partially recover during rest, while ESR may also respond to interruption duration. As a result, a feature that appears monotonic under uninterrupted stress may become non-monotonic in realistic mission profiles. Batteries can exhibit analogous, though usually less pronounced, issues through relaxation, hysteresis, or usage-pattern transitions. The broader lesson is that “feature monotonicity” is not a universal property; it is conditional on how the device is stressed, observed, and rested. This is one reason why highly ranked benchmark features can still fail once moved into realistic operating schedules^[80,81].
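
The conditional nature of feature monotonicity can be quantified with a simple sign-based score, |n₊ − n₋| / (n − 1), where n₊ and n₋ count positive and negative successive differences. The Python sketch below applies it to two hypothetical capacitance trajectories, one under uninterrupted stress and one with rest-induced recovery; the values are invented for illustration.

```python
def monotonicity(series):
    """Sign-based monotonicity score in [0, 1]; 1 means strictly monotonic."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = sum(1 for d in diffs if d < 0)
    return abs(n_pos - n_neg) / len(diffs)


# Hypothetical capacitance trajectories (in farads).
uninterrupted = [100.0, 99.0, 98.0, 97.0, 96.0, 95.0]
with_rest_recovery = [100.0, 99.0, 98.0, 99.2, 98.1, 97.0, 98.3, 97.2]

m_uninterrupted = monotonicity(uninterrupted)
m_with_rest = monotonicity(with_rest_recovery)
```

The same underlying fade yields a score of 1 without rests and a markedly lower score once recovery events appear, even though the long-term trend is unchanged.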

From a deployment perspective, the best HI is rarely the most elaborate one. A robust indicator should satisfy four practical conditions: it should be physically interpretable, extractable from realistically available signals, reasonably stable under expected operating variation, and informative for the health target that matters to the application. This principle favors compact, observable, and semantically meaningful feature sets over exhaustive but fragile combinations. It also explains why recent domain-guided and real-world studies increasingly prefer a small number of physically grounded indicators to large feature pools whose correlations may collapse under data shift^[82].

This Section therefore suggests a shift in emphasis: HI design should be evaluated by deployment value, not only by benchmark utility. In other words, the question is not simply whether a feature can improve root-mean-square error (RMSE) on a curated dataset, but whether it preserves meaning when labels are sparse, operating profiles are irregular, and uncertainty must be managed explicitly. An important consequence is that representation design cannot be separated from model trustworthiness. Once HI quality becomes deployment-dependent, model choice, evaluation protocol, uncertainty quantification, and conservative fallback are no longer independent algorithmic decisions. They become distinct aspects of the same deployment problem, which motivates the discussion in Section MODELS UNDER DIFFERENT DATA REGIMES.

MODELS UNDER DIFFERENT DATA REGIMES

Machine learning has become a major route for state-of-health estimation because it can infer degradation-related patterns directly from measured data without requiring a complete electrochemical model. Early data-driven studies have shown that battery health can be estimated via empirical mappings between operational variables and degradation states. For example, Bayesian probabilistic models were used for battery lifespan estimation^[83], while support vector machine learning was employed to characterize nonlinear degradation behavior from operational measurements^[84]. Similar ideas were also explored in supercapacitor studies, where neural-network-based methods were investigated for early cycle-life prediction and degradation modeling^[85]. These studies established a common starting point for later work: once a degradation-relevant representation is available, machine learning can provide a practical mapping from measurable signals to health states.

As the field developed, the main question was no longer whether machine learning could be used for SOH estimation, but how different model families perform under varying data conditions. Earlier reviews summarized several recurring directions in ML-based SOH research, including hyperparameter optimization, ensemble and transfer learning, algorithm combination, and online updating^[86,87]. More recent work has extended these directions through deep temporal models, attention-based architectures, decomposition-assisted frameworks, uncertainty-aware learners, and physics-guided hybrids. These developments are important not simply because they introduce more complex architectures, but also because they address different data structures, supervision regimes, and deployment requirements. For this reason, machine learning modeling is discussed here as a set of learning strategies shaped by the data regime, generalization requirement, and deployment risks rather than as a list of model names.

Model selection under different data regimes

Regression and probabilistic models

A large proportion of early SOH studies were developed in a structured-feature setting, in which degradation information had already been condensed into curated health indicators before the learning step. Under this condition, traditional regressors remained attractive because the main task was not representation discovery from raw sequences, but the construction of a stable mapping from selected HIs to SOH labels. Bayesian probabilistic modeling^[83] and support vector machine learning^[84] are representative examples of this stage in battery research. A similar methodological logic appeared in supercapacitor studies, where lifecycle prediction was examined using neural-network-based methods and other structured-input learning approaches^[85]. Although the devices differ, these studies share the same assumption: the degradation-relevant information has already been compressed into structured inputs, so the learning task is robust regression rather than deep representation learning. Similar conclusions appear in recent comparative studies that revisit structured-feature pipelines and show that carefully curated inputs can still compete strongly when data volume is limited^[88].

This family of methods remains relevant because many practical SOH workflows still rely on curated indicators extracted from voltage, current, temperature, capacity, or resistance-related measurements. When the representation is already physically informative, and the available labels are limited, simpler models often provide important advantages. They usually require fewer samples, are easier to train and validate, and remain easier to interpret than deeper architectures. These properties are particularly important in electrochemical ESS, where field labels are often sparse, and model maintainability can be as important as benchmark accuracy.

Probabilistic learners deserve particular attention in this context. Early Bayesian formulations already implied that point estimation alone was insufficient for lifecycle prediction^[83]. More recent probabilistic battery-health studies further argue that uncertainty should be treated as part of the estimation output rather than as a post hoc refinement. Under limited supervision, probabilistic regressors can therefore serve not only as predictors of SOH, but also as support tools for maintenance planning, derating, and other risk-sensitive operational decisions. This point is also increasingly connected to trust calibration, conservative maintenance actions, and action-aware decision support in recent uncertainty-oriented work^[89].
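
To show what treating uncertainty as part of the estimation output can look like, the following Python sketch fits a one-feature Bayesian linear regression with a Gaussian prior and a known noise level, and returns a predictive mean and standard deviation. It is a generic textbook formulation used here as an assumption, not the model of^[83]; the noise and prior settings and the HI-SOH pairs are hypothetical.

```python
import math


def fit_bayes_linear(x, y, noise_std=0.01, prior_std=10.0):
    """Posterior over (intercept, slope) for y = w0 + w1 * x + Gaussian noise."""
    alpha = 1.0 / prior_std ** 2   # prior precision
    beta = 1.0 / noise_std ** 2    # noise precision (assumed known)
    # Posterior precision matrix A = alpha * I + beta * Phi^T Phi, rows [1, x_i].
    a00 = alpha + beta * len(x)
    a01 = beta * sum(x)
    a11 = alpha + beta * sum(v * v for v in x)
    det = a00 * a11 - a01 * a01
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det  # covariance S = A^-1
    b0 = beta * sum(y)
    b1 = beta * sum(u * v for u, v in zip(x, y))
    mean = (s00 * b0 + s01 * b1, s01 * b0 + s11 * b1)
    return mean, (s00, s01, s11), noise_std


def predict(model, x_new):
    """Predictive mean and standard deviation at `x_new`."""
    (m0, m1), (s00, s01, s11), noise_std = model
    mean = m0 + m1 * x_new
    var = noise_std ** 2 + s00 + 2.0 * s01 * x_new + s11 * x_new ** 2
    return mean, math.sqrt(var)


# Hypothetical normalized HI values and SOH labels.
hi_values = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
soh_labels = [1.002, 0.958, 0.921, 0.879, 0.842, 0.799]

model = fit_bayes_linear(hi_values, soh_labels)
mean_in, std_in = predict(model, 0.5)    # inside the training range
mean_out, std_out = predict(model, 2.0)  # far outside the training range
```

The predictive standard deviation grows when the query moves away from the labeled range, which is the behavior that makes such outputs usable for derating or maintenance decisions rather than point estimates alone.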

Deep sequence models

As operational datasets became longer and more complex, SOH estimation gradually shifted from structured-feature regression toward sequence-oriented learning. This transition is evident in studies based on convolutional, recurrent, attention-based, and decomposition-assisted frameworks. These models are not simply more complicated versions of earlier regressors. Rather, they are designed for a different data regime, in which degradation information is distributed across local patterns, long-range temporal dependencies, and interactions among multiple channels or scales. Architectures derived from the broader attention literature also contributed to this transition by making long-range dependence modeling practical for battery degradation sequences^[90].

Representative studies illustrate this transition clearly. A convolutional neural network-gated recurrent unit-squeeze-and-excitation (CNN-GRU-SE) framework was proposed for battery SOH estimation by combining convolutional feature extraction, recurrent sequence learning, and squeeze-and-excitation mechanisms^[91]. Transformer-based architectures were later introduced for battery capacity-sequence modeling^[92], and subsequent studies combined complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) with Transformer-based learning to separate long-term degradation trends from regeneration-like fluctuations before prediction^[93]. Capacity regeneration was further examined explicitly by decomposing the degradation trajectory prior to Transformer-based estimation^[94]. In each case, the central idea is similar: sequence structure itself is treated as a source of health information rather than as a simple carrier of static features.

Hybrid sequence models have become especially common because no single architecture captures all aspects of degradation equally well. Convolutional modules are effective for extracting local temporal motifs, recurrent modules preserve sequential dependence, and attention mechanisms provide another route to modeling long-horizon interactions. Decomposition-assisted frameworks add a further layer by separating global aging trends from oscillatory or regeneration-related components before learning. This logic is also visible in convolutional neural network-long short-term memory (CNN-LSTM) combinations, variational mode decomposition (VMD)-assisted frameworks, and attention-enhanced recurrent models. The VMD-LSTM-GRU model in^[95,96], for example, was introduced to mitigate the effects of noise and regeneration, while the CEEMDAN-Transformer framework in^[93] separated oscillatory components from the global aging trajectory before estimation. These models are therefore better understood as architecture-signal matching strategies than as arbitrary algorithm combinations^[97].
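
The decomposition step can be illustrated with a deliberately simple stand-in. The sketch below separates a capacity sequence into a smooth trend and a residual using a centered moving average; this is not CEEMDAN or VMD, and it is offered only to show the general idea of presenting a downstream learner with a trend component and a fluctuation component. The function names and the window length are assumptions.

```python
def moving_average_trend(capacity, window=5):
    """Centered moving average; the window shrinks near the sequence edges."""
    half = window // 2
    trend = []
    for i in range(len(capacity)):
        lo = max(0, i - half)
        hi = min(len(capacity), i + half + 1)
        trend.append(sum(capacity[lo:hi]) / (hi - lo))
    return trend


def decompose(capacity, window=5):
    """Split a capacity sequence into (trend, residual) with trend + residual = capacity."""
    trend = moving_average_trend(capacity, window)
    residual = [c - t for c, t in zip(capacity, trend)]
    return trend, residual
```

Under this simplification, regeneration-like bumps would mostly appear in the residual, while the trend carries the long-term fade. Note that a centered window uses future cycles, so in a forecasting setting the decomposition would need to be restricted to past data to avoid leakage.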

A similar development can be found in supercapacitor SOH estimation. Deep belief networks optimized by Bayesian optimization and Hyperband were used for supercapacitor SOH estimation^[98]. A staged bidirectional long short-term memory (BiLSTM)-H∞ (H-infinity) observer framework was proposed for capacitance-based SOH estimation, in which BiLSTM was first used to estimate capacitance, and the degradation trajectory was then updated iteratively by an H∞ observer^[99]. Gaussian process regression was also applied to model the degradation trend of electrochemical double-layer capacitors under high-temperature cycling^[100]. Although these studies use different model forms, they share the same motivation: when degradation evolves over long horizons and the signal is noisy, sequence-aware or probabilistic learners often preserve health trajectory information better than static one-shot regressors.

The advantages of deep temporal models, however, should not be overstated. Their flexibility comes with greater dependence on data scale, split design, and preprocessing consistency. A deep architecture may perform very well on highly regular protocols while exploiting correlations specific to the benchmark rather than transferable across operating regimes. For this reason, deep temporal learning is most convincing when the input modality genuinely requires richer representation learning, rather than when additional depth is introduced simply to improve numerical performance under narrow conditions.

Adaptive and physics-guided learning

Adaptive and physics-guided learning become important when the training setting no longer matches the target deployment condition. In SOH estimation, this mismatch may arise from sparse target-domain labels, changing operating regimes, cross-dataset differences, or the need to preserve physical consistency under limited supervision. Under such conditions, transfer learning, online updating, and physics-guided strategies should be understood not as isolated algorithmic variants, but as different responses to distribution mismatch and model unreliability^[86].

Transfer-learning-based battery capacity prediction provides a representative example of this logic^[101]. Its significance lies not only in the predictive framework itself, but in the assumption that knowledge learned under one data regime can be transferred to another with limited additional supervision. More recent work has made this idea even more explicit. Domain-adaptive deep learning has been used for battery SOH estimation without requiring additional degradation experiments on the target batteries^[102], showing that transferable health representations can still be learned across manufacturers and datasets when adaptation is handled carefully. Taken together, these studies indicate that transfer learning in SOH estimation is most valuable when it addresses representation mismatch and sparse supervision under changing operating conditions, rather than simply adding another model layer^[81].
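
One simple way to picture transfer under sparse target supervision is to fit a source model first and then refit on the few target labels while penalizing departure from the source parameters. The sketch below does this for a one-feature linear model in closed form. It is a toy illustration of the transfer idea and not the domain-adaptation method of the cited works; the function name, the feature, and the penalty weight lam are assumptions.

```python
def fit_towards_prior(xs, ys, prior=(0.0, 0.0), lam=0.0):
    """Least squares for y = b + w * x with an L2 pull towards prior = (b, w).

    With lam = 0 this is ordinary least squares (needs at least two distinct x).
    With lam > 0 the solution is drawn towards the prior, which keeps the fit
    well-posed even when only one target label is available.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: (Phi^T Phi + lam * I) [b, w]^T = Phi^T y + lam * prior.
    a11, a12, a22 = n + lam, sx, sxx + lam
    r1 = sy + lam * prior[0]
    r2 = sxy + lam * prior[1]
    det = a11 * a22 - a12 * a12
    b = (a22 * r1 - a12 * r2) / det
    w = (a11 * r2 - a12 * r1) / det
    return b, w


# Hypothetical usage: many source labels, a single target label.
source_model = fit_towards_prior([0, 1, 2, 3], [1.0, 0.9, 0.8, 0.7])
target_model = fit_towards_prior([2.0], [0.75], prior=source_model, lam=1.0)
```

The penalty weight controls how strongly the target model trusts the source regime; a larger lam is the more cautious choice when target labels are very scarce.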

A related development is the growing use of physics-guided and hybrid learning. Earlier studies often treated data-driven and physics-based methods as separate categories, whereas more recent work increasingly combines them^[103]. The motivation is straightforward: a purely black-box model may fit the data well, yet remain difficult to validate or interpret under unseen conditions. This issue is especially relevant in battery management, where estimation error can affect maintenance, derating, and safety-related control. Recent studies therefore embed physical priors, degradation constraints, or consistency relations into neural estimators to improve robustness without sacrificing the flexibility of machine learning. Physics-guided battery-health research has argued that combining mechanism awareness with data efficiency is one of the most promising directions for SOH estimation under realistic operating variability^[18].
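
As a minimal sketch of how a degradation constraint can enter the training objective, the function below adds a penalty to the mean squared error whenever the predicted SOH rises between consecutive cycles. The monotonic-fade prior is an assumption made for illustration; it would be too strict for cells that show capacity regeneration, and it is not taken from the cited studies. The function name and the weight lam are likewise hypothetical.

```python
def physics_guided_loss(pred, target, lam=1.0):
    """Mean squared error plus a penalty on predicted SOH increases.

    pred, target: SOH sequences ordered by cycle index.
    lam: weight of the monotonicity penalty (assumed value).
    """
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    # Only upward steps are penalized; downward steps cost nothing.
    rises = [max(0.0, pred[k + 1] - pred[k]) for k in range(n - 1)]
    penalty = sum(r * r for r in rises) / max(1, n - 1)
    return mse + lam * penalty
```

Setting lam to zero recovers a purely data-driven loss, so the same code path can be used to compare constrained and unconstrained training.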

These developments suggest that transfer learning, online updating, and physics-guided learning are best viewed as different responses to the same underlying problem: the training distribution is incomplete, the target distribution is variable, and a static black-box learner is rarely sufficient on its own. What differs among them is the strategy used to bridge that gap: knowledge transfer across domains, incremental correction during operation, or structural regularization based on physical understanding.

Figure 4 summarizes the correspondence between representative data regimes and the model families most commonly adopted for SOH estimation, highlighting the primary advantage that motivates each choice.

Figure 4. Representative learning strategies for SOH estimation under different data regimes. SOH: State-of-health.

Generalization and evaluation bias

Good performance on a benchmark does not necessarily imply that a model has learned transferable degradation behavior. This problem has become increasingly apparent in SOH estimation because many widely used datasets are generated under clean and highly structured laboratory protocols. The fast-charging benchmark introduced by Severson et al.^[55] is a representative example. It enabled early-life lifetime prediction at an unprecedented scale and greatly accelerated data-driven battery prognostics. At the same time, it exposed a broader issue: when charge protocols, cycling patterns, and measurement structures are highly regular, a model may exploit correlations that are informative for the benchmark but much weaker in practice.

The same concern extends to other widely used datasets, including NASA, CALCE, Oxford, and other laboratory-derived benchmarks discussed in Section PRACTICAL DATA FOUNDATIONS FOR SOH ESTIMATION. Strong benchmark performance is therefore necessary but not sufficient. The more demanding question is whether the learned relationship remains valid when the operating regime, chemistry, manufacturer, or sampling structure changes. Domain shift is central here. In electrochemical ESS, it is rarely a marginal issue; rather, it is one of the defining conditions of practical deployment.

A related difficulty is data leakage. In SOH estimation, leakage often arises not from direct duplication of training and test samples, but from correlations that survive after splitting. Adjacent cycles from the same cell, overlapping temporal windows, decomposition performed before train-test separation, normalization statistics computed on the full dataset, or protocol fragments that are nearly identical on both sides of the split can all make the evaluation appear more favorable than it actually is. This issue is especially serious for expressive sequence models, which can exploit hidden regularities without making the shortcut obvious.

For this reason, split design should be treated as part of the methodological claim rather than as a minor implementation detail. If the intended claim is generalization across assets, cell-wise splits are more meaningful than random sample splits. If the target is robustness across temperatures, protocols, or manufacturers, condition-wise splits become necessary. If the application concerns forecasting or online updating, time-wise splits are essential. Once this principle is accepted, performance numbers can no longer be interpreted in isolation. They must be read together with the split logic, preprocessing scope, and the relationship between source and target domains.
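
The split logic described above can be sketched as follows for the cell-wise case, together with normalization statistics computed on the training portion only. The record schema (a list of dicts with a `cell_id` key and a feature such as `ic_peak`) and all function names are hypothetical.

```python
import random


def cellwise_split(records, test_fraction=0.25, seed=0):
    """Assign whole cells to train or test so no cell appears on both sides."""
    cells = sorted({r["cell_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(cells)
    n_test = max(1, round(len(cells) * test_fraction))
    test_cells = set(cells[:n_test])
    train = [r for r in records if r["cell_id"] not in test_cells]
    test = [r for r in records if r["cell_id"] in test_cells]
    return train, test


def fit_scaler(train, key):
    """Mean and standard deviation of one feature, from training records only."""
    values = [r[key] for r in train]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, (var ** 0.5) or 1.0  # fall back to 1.0 for a constant feature


def apply_scaler(records, key, scaler):
    """Standardize a feature using statistics fitted elsewhere (the training set)."""
    mean, std = scaler
    return [(r[key] - mean) / std for r in records]
```

A condition-wise or time-wise split follows the same pattern, with the grouping key changed from the cell identifier to the operating condition or to a cutoff on the cycle index. Any decomposition or windowing step would likewise be applied after the split, not before it.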

Recent studies make this point increasingly explicit. Domain-adaptive frameworks address cross-dataset and cross-manufacturer mismatches directly rather than remaining within a single narrow protocol^[102]. Multimodal and real-vehicle investigations further indicate that feature usefulness itself may change once operating context becomes part of the data-generating process^[104]. Generalization is therefore not only a property of the trained model; it is also a property of the supervision regime and the evaluation design.

Uncertainty and trustworthiness

In practical SOH estimation, a model should do more than return a point estimate. It should also indicate how much confidence can be placed in that estimate. This need is not new. Early probabilistic work, such as the Bayesian battery lifespan model in^[83] and the Gaussian-process-based supercapacitor degradation modeling in^[100], already showed that uncertainty-aware learning had value in lifecycle estimation. What has changed in recent years is that uncertainty is no longer regarded as a secondary refinement. It is increasingly treated as part of the information required for action.

Uncertainty in SOH estimation may arise from noisy measurements, sparse labels, imperfect model structure, incomplete domain coverage, or operating conditions absent from the training data. Trustworthiness should therefore not be reduced to a single confidence interval. In electrochemical ESS, at least three capabilities are central. The first is uncertainty quantification, which attaches confidence information to the estimate. The second is out-of-distribution (OOD) awareness, which detects when the current input no longer resembles the regime represented in training. In this sense, OOD detection is not separate from SOH estimation, but part of determining whether a health estimate remains valid under the current operating regime. The third is conservative output behavior, meaning that the estimator or its supervisory layer should revert to safer actions when confidence is low or the distribution has shifted.
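
The three capabilities can be combined in a small supervisory gate, sketched below under strong simplifications. The OOD check is a per-feature z-score against training statistics, which is only one of many possible detectors, and the thresholds, feature names, and returned action labels are all assumptions made for illustration.

```python
def gate_estimate(soh_mean, soh_std, features, train_stats,
                  max_std=0.03, max_z=3.0):
    """Decide whether an SOH estimate can be used normally.

    soh_mean, soh_std: point estimate and its predictive standard deviation.
    features: current input features, e.g. {"temperature_c": 27.0}.
    train_stats: per-feature (mean, std) computed on the training data.
    max_std, max_z: assumed thresholds for confidence and OOD checks.
    """
    reasons = []
    # Capability 1: uncertainty quantification feeds the confidence check.
    if soh_std > max_std:
        reasons.append("low_confidence")
    # Capability 2: flag inputs far from the regime represented in training.
    for name, value in features.items():
        mean, std = train_stats[name]
        if abs(value - mean) / std > max_z:
            reasons.append("ood:" + name)
    # Capability 3: fall back to a conservative action if any check fails.
    action = "conservative" if reasons else "normal"
    return {"soh": soh_mean, "action": action, "reasons": reasons}
```

What the conservative action means in practice (for example tighter power limits or a request for a reference measurement) is a system design decision that lies outside this sketch.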

Recent literature has made these requirements more explicit. Probabilistic battery-health reviews argue that uncertainty quantification should be treated as a necessary component of diagnostics and prognostics rather than as a supplementary layer^[18]. A similar lesson can be drawn from field-oriented supercapacitor studies. The framework based on tram operational data^[37] showed that sparse and incomplete field fragments can still support SOH prediction, but only when fragment selection, label aggregation, and result interpretation are handled with great care. This is important not only because it concerns supercapacitors, but also because it places incomplete observability and low-resolution field data at the center of the estimation problem itself.

Trustworthiness is therefore not an optional layer added after model development. It is the natural culmination of the earlier Sections. Section SOH TARGETS AND DEFINITIONS defined what should be labeled; Section PRACTICAL DATA FOUNDATIONS FOR SOH ESTIMATION clarified what can be observed; Section HEALTH INDICATORS AND FEATURE REPRESENTATION examined what degradation information can be represented; and the present Section shows that model choice is only one component of a broader reliability problem. A trustworthy estimator is not simply the one with the lowest RMSE. It is the one whose representation, supervision, evaluation protocol, and confidence behavior remain aligned with the intended operating context.

Overall, machine-learning-based SOH estimation is better understood as a family of learning strategies matched to different data conditions than as a ranking of algorithm names. Traditional regressors remain effective when informative HIs and limited labels are available. Deep temporal and hybrid sequence models become valuable when degradation information is distributed across long, noisy, and structured trajectories. Transfer learning, online updating, and physics-guided learning become particularly relevant when target-domain data are sparse, distributions shift, or physical consistency must be preserved. Across all model families, the decisive issue is no longer only whether a model fits the data, but whether it generalizes under realistic evaluation settings, resists leakage, communicates uncertainty, and supports reliable decision-making under changing operating conditions.

EVALUATION AND DEPLOYMENT OF SOH ESTIMATION METHODS

Benchmarking and evaluation for deployable SOH estimation

A large number of machine learning models have been reported for SOH estimation, but comparison across studies remains difficult because datasets, label definitions, operating conditions, and evaluation protocols are rarely aligned. Recent reviews have therefore emphasized that performance reported on curated datasets should not be interpreted as a direct indicator of deployability in real ESS^[20].

This issue is evident in the way predictive error is commonly reported. Metrics such as RMSE and mean absolute error (MAE) are strongly influenced by the underlying dataset, including cycling protocol, temperature range, sampling structure, and the definition of the health target itself. As a result, a model that performs well under controlled laboratory conditions may not retain the same level of performance once the operating regime becomes less regular or less observable.
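
Because both metrics depend on the data they are computed over, it can help to report them per operating condition rather than only in aggregate. The following sketch computes RMSE and MAE for each condition label; the row layout `(condition, y_true, y_pred)` and the function names are assumptions.

```python
import math
from collections import defaultdict


def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def errors_by_condition(rows):
    """Group residuals by condition and report RMSE, MAE, and sample count.

    rows: iterable of (condition, y_true, y_pred) tuples.
    """
    groups = defaultdict(list)
    for condition, y_true, y_pred in rows:
        groups[condition].append(y_pred - y_true)
    return {
        c: {"rmse": rmse(e), "mae": mae(e), "n": len(e)}
        for c, e in groups.items()
    }
```

A breakdown of this kind makes it visible when a low aggregate error is driven mainly by the most regular protocol in the dataset.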

For this reason, recent literature increasingly treats benchmarking as a multi-dimensional problem rather than an accuracy-only exercise. In addition to predictive performance, comparison of SOH estimation methods has begun to incorporate data requirements, observability dependence, robustness to domain shift, interpretability, uncertainty awareness, and computational feasibility. These factors are particularly important in electrochemical ESS, where signals are often incomplete, labels are sparse, and operating profiles are non-stationary.

Viewed from this perspective, different model families occupy different positions in the deployment landscape. Regression-based and probabilistic models remain effective when structured health indicators are available and labels are reliable, especially because they retain relatively strong interpretability and data efficiency^[88]. Deep temporal models, including LSTM- and Transformer-based frameworks, are more suitable when degradation information is distributed across long and structured time-series data^[90]. Transfer learning and online updating methods become more attractive when source and target conditions differ and target-domain supervision is limited. Physics-guided approaches further strengthen consistency and robustness by constraining learning with electrochemical knowledge^[103].

Taken together, these studies do not support a universal ranking of model families. Instead, they indicate that method suitability depends on the interaction among data condition, feature representation, supervision regime, and deployment objective. This perspective underlies the comparison framework summarized in Table 5.

Table 5

Deployment-oriented comparison of representative SOH estimation strategies

Strategy	Regime	Representation	Observability	Strength	Limitation	Deployment	Trustworthiness
Regression	Structured	Engineered	Partial	Interpretability	Temporality	Routine	Limited
Probabilistic	Sparse	Engineered	Partial	Uncertainty	Scalability	Safety	Explicit
Temporal	Sequential	Continuous	Dense	Dynamics	Opacity	Data-rich	Weak
Transfer	Cross-domain	Shared	Partial	Adaptation	Misalignment	Heterogeneous	Conditional
Physics-guided	Hybrid	Constrained	Informed	Consistency	Complexity	Operational	Strong

SOH: State-of-health.

System-level integration of SOH estimation outputs

In practice, the usefulness of an SOH estimator is determined not only by model-level performance, but also by how its outputs are integrated into system operation. In real ESS, health outputs are rarely used in isolation; instead, they are embedded in control, scheduling, protection, and maintenance processes. Recent studies have increasingly emphasized that the value of an SOH estimator is ultimately determined by the way its outputs are translated into system-level decisions^[20,39].

In battery systems, SOH estimation is closely linked to operational limits such as allowable power, thermal margin, and maintenance interval. Capacity fade reduces the usable energy available for dispatch, while resistance growth may tighten power constraints and increase heat generation under the same duty cycle. For this reason, battery SOH is often operationalized not as an isolated scalar, but as a quantity that informs derating, dispatch planning, and risk management under prevailing operating conditions.
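
A first-order worked example may clarify how these quantities interact. The sketch below assumes a capacity-based SOH that scales usable energy proportionally, and it models heat generation as ohmic loss only (I²R); both are simplifications, and the function names and numbers are hypothetical.

```python
def usable_energy_kwh(nominal_energy_kwh, soh_capacity):
    """Usable energy, assuming it scales linearly with capacity-based SOH."""
    return nominal_energy_kwh * soh_capacity


def joule_heat_w(current_a, resistance_ohm):
    """Ohmic heat generation, I^2 * R, ignoring other heat sources."""
    return current_a ** 2 * resistance_ohm


def derated_current_a(current_a, resistance_growth):
    """Current that keeps I^2 * R at its beginning-of-life value.

    resistance_growth: ratio of present to initial resistance (e.g. 1.44).
    """
    return current_a / resistance_growth ** 0.5


# Hypothetical pack: 100 kWh nominal, 80 % capacity SOH, resistance up by 44 %.
energy_now = usable_energy_kwh(100.0, 0.80)
current_limit = derated_current_a(10.0, 1.44)
```

Under these assumptions, a 44 % rise in resistance calls for a current reduction by a factor of 1.2 to hold ohmic heating constant, which illustrates why resistance growth tightens power limits even when capacity fade is moderate.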

The integration logic is different in supercapacitor systems. As discussed in earlier sections, the dominant health variables are more closely associated with capacitance retention, equivalent series resistance, and fast-response capability than with battery-style energy capacity^[71-74]. This means that supercapacitor SOH estimation is more directly connected to transient power support and short-timescale efficiency. Under such conditions, state estimation and health estimation should be interpreted jointly in the context of high-frequency cycling, power buffering, and recovery behavior.

The complexity becomes greater in hybrid battery-supercapacitor systems, where the two devices play complementary but non-interchangeable roles. Batteries provide the main energy support, whereas supercapacitors absorb rapid power fluctuations and short-timescale transients. This division of labor has been repeatedly shown to reduce battery current stress and slow degradation, which explains why hybridization is often introduced as a lifetime-oriented design choice rather than only a power-management option. At the same time, recent reviews show that the hybrid literature remains dominated by sizing and energy-management strategies, while coupled health representation remains comparatively underdeveloped^[105-108]. This gap matters for SOH estimation: because battery aging and supercapacitor degradation evolve under different stress channels, system health can no longer be represented adequately by a single scalar metric.

A more consistent interpretation is to distinguish two levels of health reporting. The first is component-wise, in which battery and supercapacitor health are estimated separately and then interpreted jointly at the supervisory level. The second is system-oriented, in which the outputs of lower-level estimators are translated into aggregate quantities such as power capability margin, efficiency loss, thermal burden, or maintenance urgency under specified operating constraints. Health-relevant energy-management studies already point in this direction by optimizing power allocation to explicitly account for battery lifetime extension and supercapacitor utilization under realistic operating conditions^[109,110]. Viewed in this way, SOH estimation in hybrid systems is better treated as a hierarchical problem that links component-level degradation to system-level operating consequences, rather than as a direct extension of single-device diagnostics.

Decision guide for method selection and open directions

Given the diversity of data conditions and application scenarios, the selection of an SOH estimation strategy remains a central methodological issue. Recent studies increasingly suggest that method choice should be guided by data availability, label quality, and deployment requirements rather than by algorithmic preference alone.

When structured health indicators are available and labels are reliable, conventional regression and probabilistic models remain attractive because they offer relatively high interpretability and data efficiency. When degradation information is distributed across long, noisy, and structured time-series data, deep temporal models become more suitable. Under cross-domain conditions with limited target-domain supervision, transfer learning and online updating are more likely to preserve performance. In safety-critical settings, physics-guided and uncertainty-aware learning becomes especially important because reliability depends not only on predictive accuracy, but also on consistency and confidence.
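
The selection logic in this paragraph can be written down as a small rule set. The sketch below is a direct paraphrase of the four conditions above and not a validated decision procedure; the function name, argument names, and returned labels are assumptions.

```python
def suggest_strategy(structured_his=False, labels_reliable=False,
                     long_sequences=False, domain_shift=False,
                     target_labels_scarce=False, safety_critical=False):
    """Return candidate model families for the described data conditions.

    Several families may apply at once; an empty list means none of the
    conditions discussed in the text has been met.
    """
    suggestions = []
    if structured_his and labels_reliable:
        suggestions.append("regression/probabilistic")
    if long_sequences:
        suggestions.append("deep temporal")
    if domain_shift and target_labels_scarce:
        suggestions.append("transfer/online updating")
    if safety_critical:
        suggestions.append("physics-guided/uncertainty-aware")
    return suggestions
```

Returning a list rather than a single label reflects the point that these strategies are often combined, for example a deep temporal model with an uncertainty-aware output in a safety-critical application.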

This selection logic is summarized conceptually in Figure 5. The figure links data condition, label regime, and deployment demand to representative modeling strategies, thereby emphasizing that method choice is not a one-dimensional preference for a particular algorithm. Instead, it is a constrained decision problem shaped by observability, supervision, and operational consequence.

Figure 5. Constraint-driven selection logic for SOH estimation methods. SOH: State-of-health.

Despite substantial progress, several challenges remain open. One is the lack of standardized datasets and evaluation protocols, which continues to limit the comparability of existing studies. Another is the limited evidence for robustness under real-world conditions, since a large share of the literature is still validated primarily on laboratory data. Uncertainty quantification and OOD detection also remain underdeveloped in practical deployments. In addition, system-level health estimation for hybrid ESS is still emerging and lacks mature evaluation frameworks.

Future research is therefore likely to focus on four converging directions: the development of more standardized and realistic benchmarks, improved cross-domain generalization, deeper integration of physical knowledge with data-driven learning, and the advancement of trustworthy machine learning frameworks for practical energy storage operation. Progress along these lines is likely to determine whether ML-based SOH estimation can move from promising methodology to robust infrastructure for real electrochemical ESS.

CONCLUSIONS AND OUTLOOKS

State-of-health estimation has become a core requirement for electrochemical ESS because health evolution directly affects energy availability, power capability, thermal margin, maintenance planning, and operational safety. As storage deployment moves from laboratory validation to real applications, SOH estimation should no longer be viewed as a purely cell-level diagnostic task, but as a system-relevant inference problem shaped by limited observability, non-stationary operating conditions, and the need for reliable decision support.

This review has organized recent progress along an end-to-end ML-driven SOH estimation pipeline, covering health-target definition and labeling, field data foundations, health-indicator representation, model learning, evaluation, and deployment-oriented selection. A central conclusion is that SOH estimation performance is not determined solely by model architecture. Instead, it emerges from the interaction among target definition, signal availability, feature representation, supervision regime, evaluation design, and deployment objective. Health representation remains one of the most decisive stages in this pipeline, because the usefulness of voltage, current, temperature, pulse response, impedance, and efficiency-related quantities depends on whether they remain observable, stable, and semantically valid under realistic operating conditions. This issue is especially important across device classes: batteries and supercapacitors may share measurable signals, but their health meanings are not interchangeable.

Overall, machine learning for SOH estimation is better understood as a family of deployment-oriented learning strategies than as a ranking of isolated algorithms. Regression and probabilistic models remain valuable when structured indicators and limited labels are available; deep sequence models become advantageous when degradation is distributed across long and noisy trajectories; transfer and online learning are important under sparse target-domain data and evolving conditions; and physics-guided learning becomes increasingly relevant when physical consistency and trustworthiness must be preserved. Future progress will depend less on increasing model complexity alone than on tighter integration across observability-aware target definition, device-consistent health representation, robust evaluation, uncertainty-aware learning, and deployment-oriented method selection.

DECLARATIONS

Authors’ contributions

Investigation: Guo, S.; Chen, M.

Resources: Guo, S.; Chen, M.

Writing - original draft: Guo, S.; Chen, M.

Writing and revising: Qian, X.

Writing - review & editing: Du, Y.; Liu, L.; Wu, Y.

Supervision: Ye, J.; Liu, L.

Project administration: Ye, J.; Liu, L.

Funding acquisition: Wu, Y.

Availability of data and materials

Not applicable.

AI and AI-assisted tools statement

During the preparation of this manuscript, the AI tool ChatGPT (version 5.5, released 2026-04-24) was used solely for language editing. The tool did not influence the study design, data collection, analysis, interpretation, or the scientific content of the work. All authors take full responsibility for the accuracy, integrity, and final content of the manuscript.

Financial support and sponsorship

This work was financially supported by the National Natural Science Foundation of China (52131306), Project on Carbon Emission Peak and Neutrality of Jiangsu Province (BE2022031-3, BE2022031-4), the National Key Research and Development Program of China (2021YFB2400400), Fundamental Research Funds for the Central Universities (2242023R10001), Start-up Research Fund of Southeast University (RF1028623005), and the Big Data Computing Center of Southeast University.

Conflicts of interest

Wu, Y. is Editor-in-Chief of the journal Energy Z. Wu, Y. was not involved in any steps of editorial processing, notably including reviewers' selection, manuscript handling and decision making. The other authors declare that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

REFERENCES

1. International Energy Agency. Batteries and secure energy transitions: outlook for battery demand and supply. Paris: IEA; 2024. Available from: https://www.iea.org/reports/batteries-and-secure-energy-transitions/outlook-for-battery-demand-and-supply. [Accessed on 2026-6-11].

2. Rey, S. O.; Romero, J. A.; Romero, L. T.; et al. Powering the future: a comprehensive review of battery energy storage systems. Energies 2023, 16, 6344.

3. Dunn, B.; Kamath, H.; Tarascon, J. Electrical energy storage for the grid: a battery of choices. Science 2011, 334, 928-35.

4. Hirsh, H. S.; Li, Y.; Tan, D. H. S.; Zhang, M.; Zhao, E.; Meng, Y. S. Sodium-ion batteries paving the way for grid energy storage. Adv. Energy. Mater. 2020, 10, 2001274.

5. Sinha, A. P.; Thomas, T. S.; Mandal, D. Emerging role of aqueous batteries in next generation energy-dense sustainable storage. Chem. Commun. 2025, 61, 14843-69.

6. Ling, C. A review of the recent progress in battery informatics. npj. Comput. Mater. 2022, 8, 33.

7. Lv, C.; Zhou, X.; Zhong, L.; et al. Machine learning: an advanced platform for materials development and state prediction in lithium-ion batteries. Adv. Mater. 2021, 34, 2101474.

8. Wang, Y. Application-oriented design of machine learning paradigms for battery science. npj. Comput. Mater. 2025, 11, 89.

9. Liu, C.; Deng, Z.; Zhang, X.; Bao, H.; Cheng, D. Battery state of health estimation across electrochemistry and working conditions based on domain adaptation. Energy 2024, 297, 131294.

10. Jiang, S.; Song, Z. A review on the state of health estimation methods of lead-acid batteries. J. Power. Sources. 2022, 517, 230710.

11. Cantera, M.; Rivas, F.; Lubián, L.; Rubio-presa, R.; Ventosa, E.; Cámara, J. M. State-of-health classification of redox flow batteries using neural networks. J. Energy. Storage. 2025, 137, 118605.

12. Zheng, Q.; Shi, X.; Cai, Y.; An, L.; Zhang, D. Artificial intelligence-empowered modeling and management of flow batteries: a mini-review. Fut. Batteries. 2025, 7, 100107.

13. Nazaralizadeh, S.; Banerjee, P.; Srivastava, A. K.; Famouri, P. Battery energy storage systems: a review of energy management systems and health metrics. Energies 2024, 17, 1250.

14. Berecibar, M.; Gandiaga, I.; Villarreal, I.; Omar, N.; Van Mierlo, J.; Van Den Bossche, P. Critical review of state of health estimation methods of Li-ion batteries for real applications. Renew. Sust. Energ. Rev. 2016, 56, 572-87.

15. Yang, B.; Qian, Y.; Li, Q.; et al. Critical summary and perspectives on state-of-health of lithium-ion battery. Renew. Sust. Energ. Rev. 2024, 190, 114077.

16. Wassiliadis, N.; Steinsträter, M.; Schreiber, M.; et al. Quantifying the state of the art of electric powertrains in battery electric vehicles: range, efficiency, and lifetime from component to system level of the Volkswagen ID.3. eTransportation 2022, 12, 100167.

17. Roman, D.; Saxena, S.; Robu, V.; Pecht, M.; Flynn, D. Machine learning pipeline for battery state-of-health estimation. Nat. Mach. Intell. 2021, 3, 447-56.

18. Thelen, A.; Huan, X.; Paulson, N.; Onori, S.; Hu, Z.; Hu, C. Probabilistic machine learning for battery health diagnostics and prognostics - review and perspectives. npj. Mater. Sustain. 2024, 2, 14.

19. Wang, Z.; Zhao, X.; Fu, L.; Zhen, D.; Gu, F.; Ball, A. D. A review on rapid state of health estimation of lithium-ion batteries in electric vehicles. Sust. Energy. Technol. Assess. 2023, 60, 103457.

20. Wang, Y.; Guo, S.; Cui, Y.; et al. A comprehensive review of machine learning-based state of health estimation for lithium-ion batteries: data, features, algorithms, and future challenges. Renew. Sust. Energ. Rev. 2025, 224, 116125.

21. Shu, X.; Shen, J.; Guo, F.; et al. Towards real-world battery health intelligence: a review of machine learning advances and challenges in SOH estimation. eTransportation 2025, 26, 100509.

22. Chen, H.; Chen, Y.; Sun, C.; et al. Towards practical data-driven battery state of health estimation: Advancements and insights targeting real-world data. J. Energy. Chem. 2025, 110, 657-80.

23. Wu, M.; Qin, L.; Wu, G. State of power estimation of power lithium-ion battery based on an equivalent circuit model. J. Energy. Storage. 2022, 51, 104538.

24. International Electrotechnical Commission. IEC 62660-1:2018 Secondary lithium-ion cells for the propulsion of electric road vehicles - Part 1: Performance testing. Geneva: IEC; 2018. Available from: https://webstore.iec.ch/en/publication/28965. [Accessed on 2026-6-11].

25. International Electrotechnical Commission. IEC 62391-1:2022 Fixed electric double-layer capacitors for use in electric and electronic equipment - Part 1: Generic specification. Geneva: IEC; 2022. Available from: https://webstore.iec.ch/en/publication/66557. [Accessed on 2026-6-11].

26. Khaleghi, S.; Hosen, M. S.; Karimi, D.; et al. Developing an online data-driven approach for prognostics and health management of lithium-ion batteries. Appl. Energy. 2022, 308, 118348.

27. Yang, H. A comparative study of supercapacitor capacitance characterization methods. J. Energy. Storage. 2020, 29, 101316.

28. Liu, S.; Nie, Y.; Tang, A.; Li, J.; Yu, Q.; Wang, C. Online health prognosis for lithium-ion batteries under dynamic discharge conditions over wide temperature range. eTransportation 2023, 18, 100296.

29. Xu, W.; Liu, L.; Li, M.; et al. Comprehensive review on capacity degradation mechanisms and state-of-health estimation of sodium-ion batteries. J. Energy. Storage. 2025, 132, 117725.

30. Ahn, H.; Kim, D.; Lee, M.; Nam, K. W. Challenges and possibilities for aqueous battery systems. Commun. Mater. 2023, 4, 37.

31. Torregrossa, D.; Paolone, M. Modelling of current and temperature effects on supercapacitors ageing. Part I: review of driving phenomenology. J. Energy. Storage. 2016, 5, 85-94.

32. Chaari, R.; Briat, O.; Vinassa, J. Capacitance recovery analysis and modelling of supercapacitors during cycling ageing tests. Energy. Convers. Manage. 2014, 82, 37-45.

33. Bohlen, O.; Kowal, J.; Sauer, D. U. Ageing behaviour of electrochemical double layer capacitors. J. Power. Sources. 2007, 172, 468-75.

34. Bohlen, O.; Kowal, J.; Sauer, D. U. Ageing behaviour of electrochemical double layer capacitors: Part II. Lifetime simulation model for dynamic applications. J. Power. Sources. 2007, 173, 626-32.

35. Sawant, V.; Deshmukh, R.; Awati, C. Machine learning techniques for prediction of capacitance and remaining useful life of supercapacitors: A comprehensive review. J. Energy. Chem. 2023, 77, 438-51.

36. Hossain Lipu, M.; Rahman, M. A.; Mansor, M.; et al. Data driven health and life prognosis management of supercapacitor and lithium-ion battery storage systems: developments, implementation aspects, limitations, and future directions. J. Energy. Storage. 2024, 98, 113172.

37. Xu, C.; Zhang, C.; Wu, M.; An, Z.; Yang, H. A supercapacitor state of health prediction framework based on tram field data. J. Power. Sources. 2025, 659, 238384.

38. Reveles-Miranda, M.; Ramirez-Rivera, V.; Pacheco-Catalán, D. Hybrid energy storage: Features, applications, and ancillary benefits. Renew. Sust. Energ. Rev. 2024, 192, 114196.

39. Waseem, M.; Ahmad, M.; Parveen, A.; Suhaib, M. Battery technologies and functionality of battery management system for EVs: Current status, key challenges, and future prospectives. J. Power. Sources. 2023, 580, 233349.

40. Krupp, A.; Ferg, E.; Schuldt, F.; Derendorf, K.; Agert, C. Incremental capacity analysis as a state of health estimation method for lithium-ion battery modules with series-connected cells. Batteries 2020, 7, 2.

41. Shu, X.; Shen, S.; Shen, J.; et al. Protocol for state-of-health prediction of lithium-ion batteries based on machine learning. STAR. Protocols. 2022, 3, 101272.

42. Tao, S.; Ma, R.; Zhao, Z.; et al. Generative learning assisted state-of-health estimation for sustainable battery recycling with random retirement conditions. Nat. Commun. 2024, 15, 10154.

43. Bian, X.; Zou, C.; Fridholm, B.; Sundvall, C.; Wik, T. Smart sensing breaks the accuracy barrier in battery state monitoring. Energy. Storage. Mater. 2025, 80, 104410.

44. Sahoo, S.; Hariharan, K. S.; Agarwal, S.; et al. Transfer learning based generalized framework for state of health estimation of Li-ion cells. Sci. Rep. 2022, 12, 13173.

45. Duan, C.; Le, H.; Wu, D. Lithium-ion battery state-of-health estimation using intra-domain and cross-domain transfer learning: mitigating domain shift based on Wasserstein distance. J. Energy. Storage. 2025, 132, 117601.

46. Saxena, A.; Goebel, K. Li-ion Battery Aging Datasets [dataset]. NASA Ames Prognostics Center of Excellence; 2008. Available from: https://data.nasa.gov/dataset/li-ion-battery-aging-datasets. [Accessed on 2026-6-11].

47. Bole, B.; Kulkarni, C. S.; Daigle, M. Randomized battery usage data set [dataset]. NASA Prognostics Data Repository; 2014. Available from: https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/. [Accessed on 2026-6-11].

48. He, W.; Williard, N.; Osterman, M.; Pecht, M. Prognostics of lithium-ion batteries based on Dempster-Shafer theory and the Bayesian Monte Carlo method. J. Power. Sources. 2011, 196, 10314-21.

49. Williard, N.; He, W.; Osterman, M.; Pecht, M. Comparative analysis of features for determining state of health in lithium-ion batteries. IJPHM. 2020, 4.

50. Xing, Y.; Ma, E. W.; Tsui, K.; Pecht, M. An ensemble model for predicting the remaining useful performance of lithium-ion batteries. Microelectron. Reliab. 2013, 53, 811-20.

51. Center for Advanced Life Cycle Engineering (CALCE). Battery Data [dataset]. University of Maryland. Available from: https://calce.umd.edu/battery-data. [Accessed on 2026-6-11].

52. Deng, Y.; Ying, H.; E, J.; et al. Feature parameter extraction and intelligent estimation of the state-of-health of lithium-ion batteries. Energy 2019, 176, 91-102.

53. Reniers, J. M.; Mulder, G.; Howey, D. A. Oxford energy trading battery degradation dataset [dataset]. University of Oxford; 2020.

54. University of Oxford. Oxford battery degradation: path dependence dataset [dataset]. University of Oxford.

55. Severson, K. A.; Attia, P. M.; Jin, N.; et al. Data-driven prediction of battery cycle life before capacity degradation. Nat. Energy. 2019, 4, 383-91.

56. Toyota Research Institute. data.matr.io: Experimental data platform [dataset]. Available from: https://data.matr.io/. [Accessed on 2026-6-11].

57. Mun, T.; Noh, C.; Lee, S. Comparative analysis of DCIR and SOH in field-deployed ESS considering thermal non-uniformity using linear regression. Energies 2025, 18, 5640.

58. Zheng, L.; Zhu, J.; Lu, D. D.; Wang, G.; He, T. Incremental capacity analysis and differential voltage analysis based state of charge and capacity estimation for lithium-ion batteries. Energy 2018, 150, 759-69.

59. Guo, Z.; Qiu, X.; Hou, G.; Liaw, B. Y.; Zhang, C. State of health estimation for lithium ion batteries based on charging curves. J. Power. Sources. 2014, 249, 457-62.

60. Su, L.; Tao, S.; Chen, Y.; Zou, C.; Zhang, X. Incremental capacity feature selection for lithium-ion battery state of health estimation considering estimation capability and efficiency. Cell. Rep. Phys. Sci. 2026, 7, 103083.

61. Lanubile, A.; Bosoni, P.; Pozzato, G.; Allam, A.; Acquarone, M.; Onori, S. Domain knowledge-guided machine learning framework for state of health estimation in lithium-ion batteries. Commun. Eng. 2024, 3, 168.

62. Tian, J.; Xiong, R.; Shen, W. A review on state of health estimation for lithium ion batteries in photovoltaic systems. eTransportation 2019, 2, 100028.

63. Chen, L.; Lü, Z.; Lin, W.; Li, J.; Pan, H. A new state-of-health estimation method for lithium-ion batteries through the intrinsic relationship between ohmic internal resistance and capacity. Measurement 2018, 116, 586-95.

64. Lin, M.; Wu, D.; Meng, J.; Wu, J.; Wu, H. A multi-feature-based multi-model fusion method for state of health estimation of lithium-ion batteries. J. Power. Sources. 2022, 518, 230774.

65. Tan, R.; Lu, X.; Cheng, M.; Li, J.; Huang, J.; Zhang, T. Forecasting battery degradation trajectory under domain shift with domain generalization. Energy. Storage. Mater. 2024, 72, 103725.

66. Zhou, R.; Zhu, R.; Huang, C.; Peng, W. State of health estimation for fast-charging lithium-ion battery based on incremental capacity analysis. J. Energy. Storage. 2022, 51, 104560.

67. Zeng, X.; Berecibar, M. Emerging sensor technologies and physics-guided methods for monitoring automotive lithium-based batteries. Commun. Eng. 2025, 4, 44.

68. Liu, H.; Li, C.; Hu, X.; et al. Multi-modal framework for battery state of health evaluation using open-source electric vehicle data. Nat. Commun. 2025, 16, 1137.

69. Li, G.; Li, B.; Li, C.; Wang, S. State-of-health rapid estimation for lithium-ion battery based on an interpretable stacking ensemble model with short-term voltage profiles. Energy 2023, 263, 126064.

70. Sun, R.; Chen, J.; Piao, C. Battery health features extraction and state of health estimation based on real-time online vehicle driving data. J. Power. Sources. 2025, 645, 236784.

71. Naseri, F.; Karimi, S.; Farjah, E.; Schaltz, E. Supercapacitor management system: A comprehensive review of modeling, estimation, balancing, and protection techniques. Renew. Sust. Energ. Rev. 2022, 155, 111913.

72. Dutta, A.; Mitra, S.; Basak, M.; Banerjee, T. A comprehensive review on batteries and supercapacitors: development and challenges since their inception. Energy. Storage. 2022, 5, e339.

73. Şahin, M.; Blaabjerg, F.; Sangwongwanich, A. A comprehensive review on supercapacitor applications and developments. Energies 2022, 15, 674.

74. Liu, S.; Wei, L.; Wang, H. Review on reliability of supercapacitors in energy storage applications. Appl. Energy. 2020, 278, 115436.

75. Guo, F.; Lv, H.; Wu, X.; et al. A machine learning method for prediction of remaining useful life of supercapacitors with multi-stage modification. J. Energy. Storage. 2023, 73, 109160.

76. Li, Y.; Stroe, D.; Cheng, Y.; Sheng, H.; Sui, X.; Teodorescu, R. On the feature selection for battery state of health estimation based on charging-discharging profiles. J. Energy. Storage. 2021, 33, 102122.

77. Stroe, D.; Schaltz, E. Lithium-ion battery state-of-health estimation using the incremental capacity analysis technique. IEEE. Trans. Ind. Appl. 2020, 56, 678-85.

78. Guo, F.; Wu, X.; Liu, L.; et al. Prediction of remaining useful life and state of health of lithium batteries based on time series feature and Savitzky-Golay filter combined with gated recurrent unit neural network. Energy 2023, 270, 126880.

79. Son, S.; Jeong, S.; Kwak, E.; Kim, J.; Oh, K. Integrated framework for SOH estimation of lithium-ion batteries using multiphysics features. Energy 2022, 238, 121712.

80. Xing, J.; Zhang, H.; Zhang, J. Remaining useful life prediction of lithium batteries based on principal component analysis and improved Gaussian process regression. Int. J. Electrochem. Sci. 2023, 18, 100048.

81. Tarar, M. O.; Naqvi, I. H.; Khalid, Z.; Pecht, M. Accurate prediction of remaining useful life for lithium-ion battery using deep neural networks with memory features. Front. Energy. Res. 2023, 11, 1059701.

82. Wang, Y.; Zhu, J.; Cao, L.; et al. A generalizable method for capacity estimation and RUL prediction in lithium-ion batteries. Ind. Eng. Chem. Res. 2023, 63, 345-57.

83. Ng, S. S.; Xing, Y.; Tsui, K. L. A naive Bayes model for robust remaining useful life prediction of lithium-ion battery. Appl. Energy. 2014, 118, 114-23.

84. Patil, M. A.; Tagade, P.; Hariharan, K. S.; et al. A novel multistage support vector machine based approach for Li ion battery remaining useful life estimation. Appl. Energy. 2015, 159, 285-97.

85. Ren, J.; Lin, X.; Liu, J.; et al. Engineering early prediction of supercapacitors’ cycle life using neural networks. Mater. Today. Energy. 2020, 18, 100537.

86. Li, X.; Yu, D.; Vilsen, S. B.; Stroe, D. I. The development of machine learning-based remaining useful life prediction for lithium-ion batteries. J. Energy. Chem. 2023, 82, 103-21.

87. Li, J.; Zhao, S.; Miah, M. S.; Niu, M. Remaining useful life prediction of lithium-ion batteries via an EIS based deep learning approach. Energy. Rep. 2023, 10, 3629-38.

88. Zhang, X.; Xu, Y.; Gong, Z. A feature fusion optimization algorithm for predicting the remaining useful life of lithium-ion batteries. Energy. Rep. 2023, 9, 142-53.

89. Wang, Z.; Liu, N.; Chen, C.; Guo, Y. Adaptive self-attention LSTM for RUL prediction of lithium-ion batteries. Inform. Sciences. 2023, 635, 398-413.

90. Vaswani, A.; Shazeer, N.; Parmar, N.; et al. Attention is all you need. Adv. Neural. Inf. Process. Syst. 2017, 30, 6000-10.

91. Chen, X.; Chen, M.; Fang, W.; Ye, J.; Liu, L.; Wu, Y. Improving lithium-ion battery state of health estimation with an integrated convolutional neural network, gated recurrent unit, and squeeze-and-excitation model. Phys. Scr. 2025, 100, 036004.

92. Chen, D.; Hong, W.; Zhou, X. Transformer network for remaining useful life prediction of lithium-ion batteries. IEEE. Access. 2022, 10, 19621-28.

93. Wang, Z.; Liu, Y.; Wang, F.; Wang, H.; Su, M. Capacity and remaining useful life prediction for lithium-ion batteries based on sequence decomposition and a deep-learning network. J. Energy. Storage. 2023, 72, 108085.

94. Cai, Y.; Li, W.; Zahid, T.; Zheng, C.; Zhang, Q.; Xu, K. Early prediction of remaining useful life for lithium-ion batteries based on CEEMDAN-transformer-DNN hybrid model. Heliyon 2023, 9, e17754.

95. Hu, L.; Wang, W.; Ding, G. RUL prediction for lithium-ion batteries based on variational mode decomposition and hybrid network model. SIViP. 2023, 17, 3109-17.

96. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE. Trans. Signal. Process. 2014, 62, 531-44.

97. Zhao, L.; Song, S.; Wang, P.; Wang, C.; Wang, J.; Guo, M. A MLP-Mixer and mixture of expert model for remaining useful life prediction of lithium-ion batteries. Front. Comput. Sci. 2023, 18, 185329.

98. Haris, M.; Hasan, M. N.; Qin, S. Early and robust remaining useful life prediction of supercapacitors using BOHB optimized deep belief network. Appl. Energy. 2021, 286, 116541.

99. Lou, G.; Lin, W.; Huang, G.; Xiang, W. A two-stage online remaining useful life prediction framework for supercapacitors based on the fusion of deep learning network and state estimation algorithm. Eng. Appl. Artif. Intell. 2023, 123, 106399.

100. Roman, D.; Saxena, S.; Bruns, J.; Valentin, R.; Pecht, M.; Flynn, D. A machine learning degradation model for electrochemical capacitors operated at high temperature. IEEE. Access. 2021, 9, 25544-53.

101. Chou, J.; Wang, F.; Lo, S. Predicting future capacity of lithium-ion batteries using transfer learning method. J. Energy. Storage. 2023, 71, 108120.

102. Lu, J.; Xiong, R.; Tian, J.; Wang, C.; Sun, F. Deep learning to estimate lithium-ion battery state of health without additional degradation experiments. Nat. Commun. 2023, 14, 2760.

103. Borah, M.; Wang, Q.; Moura, S.; Sauer, D. U.; Li, W. Synergizing physics and machine learning for advanced battery management. Commun. Eng. 2024, 3, 134.

104. Tian, H.; Xi, C.; Zhang, Q. A framework for estimating battery state of health using multi-source domain adaptation and real vehicle data. J. Energy. Storage. 2025, 136, 118449.

105. Gopi, C. V. M.; Ramesh, R. Review of battery-supercapacitor hybrid energy storage systems for electric vehicles. Results. Eng. 2024, 24, 103598.

106. Rezaei, H.; Abdollahi, S. E.; Abdollahi, S.; Filizadeh, S. Energy management strategies of battery-ultracapacitor hybrid storage systems for electric vehicles: Review, challenges, and future trends. J. Energy. Storage. 2022, 53, 105045.

107. Peng, X.; Wang, C.; Liu, Y.; et al. Critical advances in re-engineering the cathode-electrolyte interface in alkali metal-oxygen batteries. Energy. Mater. 2022, 1, 100011.

108. Liu, L.; Zhou, C.; Fang, W.; Hou, Y.; Wu, Y. Rational design of Ru/TiO₂/CNTs as cathode: promotion of cycling performance for aprotic lithium-oxygen battery. Energy. Mater. 2023.

109. Zhu, T.; Wills, R. G.; Lot, R.; Ruan, H.; Jiang, Z. Adaptive energy management of a battery-supercapacitor energy storage system for electric vehicles based on flexible perception and neural network fitting. Appl. Energy. 2021, 292, 116932.

110. Ma, B.; Guo, X.; Li, P. Adaptive energy management strategy based on a model predictive control with real-time tuning weight for hybrid energy storage system. Energy 2023, 283, 129128.

About This Article

Disclaimer/Publisher’s Note: All statements, opinions, and data contained in this publication are solely those of the individual author(s) and contributor(s) and do not necessarily reflect those of OAE and/or the editor(s). OAE and/or the editor(s) disclaim any responsibility for harm to persons or property resulting from the use of any ideas, methods, instructions, or products mentioned in the content.

Copyright

© The Author(s) 2026. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
