Statistical methods for exoplanet detection with radial velocities

Exoplanets can be detected with various observational techniques. Among them, radial velocity (RV) has the key advantages of revealing the architecture of planetary systems and measuring planetary mass and orbital eccentricities. RV observations are poised to play a key role in the detection and characterization of Earth twins. However, the detection of such small planets is not yet possible due to very complex, temporally correlated instrumental and astrophysical stochastic signals. Furthermore, exploring the large parameter space of RV models exhaustively and efficiently presents difficulties. In this review, we frame RV data analysis as a problem of detection and parameter estimation in unevenly sampled, multivariate time series. The objective of this review is two-fold: to introduce the motivation, methodological challenges, and numerical challenges of RV data analysis to nonspecialists, and to unify the existing advanced approaches in order to identify areas for improvement.


INTRODUCTION
Until recently, our Solar System was the only laboratory in which to study theories of planetary formation and evolution, and to search for life beyond Earth. The presence of planets outside the Solar System, or exoplanets, was uncertain until the detection of planets orbiting a pulsar (Wolszczan & Frail 1992). The study of exoplanets as a scientific field accelerated with the detection of 51 Peg b, a planet of minimum ∼ 0.5 Jupiter mass orbiting a Sun-like star in 4.2 days (Mayor & Queloz 1995).
The discovery of 51 Peg b, and hundreds of additional exoplanets, was based on the radial velocity method, which is the focus of the present work, and relies on the following principle. A star around which a planet revolves has a periodic reflex motion, so it moves back and forth toward the observer with a certain velocity: its radial velocity (RV). By acquiring spectra of a given star at different times, the observer can measure a time series of RV through the Doppler effect. The amplitude of the variations in RV is proportional to the planetary mass, and its shape depends on the orbital eccentricity.
Mass is one of the most fundamental parameters of a planet. When the radius is measured separately via photometry, the combination of mass and radius gives the density, essential to characterizing the internal structure of the planet, and the surface gravity, key to interpreting measurements of its atmosphere (Kempton et al. 2018, Batalha et al. 2019). The eccentricity is important to understand the formation of planetary systems, especially in multiplanetary systems (e.g. Jurić & Tremaine 2008).

Radial velocity (RV): velocity of a given star projected onto the observer-star direction.
Stellar spectrum: number of photons collected from a given star per wavelength.
Spectrograph: instrument used to measure the light flux as a function of wavelength; can be used to measure the radial velocity of stars.
Photometry: number of photons collected from a celestial body, here a star, in a broad spectral band as a function of time.
Transit: passage of a planet between a star and an observer; results in a periodic dip in the flux received.

The RV method plays a key role in understanding the demographics of planetary systems by allowing the detection of planets spanning a much wider range of orbital periods than other detection techniques. Moreover, it does not require a precise orientation of the orbital plane, thereby allowing RVs to characterize planetary systems in which the planets have significant mutual inclinations. As such, RV measurements have a unique potential to reveal the architecture of planetary systems with periods up to ∼ 10 years.
One long-term goal of the scientific community working on exoplanets is to detect and characterize the atmospheres of a population of potentially Earth-like planets, and in particular to search for evidence of life. Characterizing the atmosphere of Earth-like planets is beyond the reach of current facilities, but projects of ground-based instruments such as PCS@ELT (Kasper et al. 2021) and ANDES@ELT, direct imaging space mission concepts LUVOIR and HabEx, and the nulling interferometer LIFE (ESA; Quanz et al. 2021) hope to achieve this by 2050.
Measuring the mass of terrestrial planets to an accuracy of 20% or better is essential to interpreting direct imaging and atmospheric observations (Batalha et al. 2019, Crass et al. 2021). Additionally, detecting Earth-like planets with RV measurements prior to the start of a future direct imaging mission would substantially increase its yield. The RV technique is poised to play a key role in the detection and characterization of Earth-like planets around Sun-like stars, but we do not yet know if it can achieve the required accuracy. The Earth produces a velocity variation of the Sun of RV⊕ = 9 cm/s, which we use as a standard unit. The latest generation of spectrographs (e.g., ESPRESSO (Echelle SPectrograph for Rocky Exoplanets and Stable Spectroscopic Observations), EXPRES (EXtreme PREcision Spectrometer), and NEID (NN-EXPLORE Exoplanet Investigations with Doppler spectroscopy)) has demonstrated on-sky precision over a few nights of 3 to 4.5 RV⊕ (Lin et al. 2022). If the measurement noise were independent, then 200 measurements would suffice to measure the mass of an Earth clone with 20% precision, but intrinsic stellar variability causes complex, temporally correlated radial velocity signals of the order of at least 5 RV⊕. As an example, the RV of the Sun as measured with the HARPS-N spectrograph has a 22 RV⊕ standard deviation. Characterizing Earth-like planets requires better methods for mitigating stellar variability and instrument systematics. This can be viewed as a particular instance of a general problem: detecting and characterizing periodic signals in multivariate, unevenly sampled time series corrupted by complex noise.
The present work overviews the efforts undertaken to make Earth twins detectable. The RV data products are presented in §2. We present an overview of the different challenges in §3. We then present RV analysis in a recursive manner: Given summary statistics extracted from raw data and a statistical model of these, the question of how one detects planets and estimates their orbital elements is discussed in §4. Given the summary statistics, the question of how one builds a statistical model of the data is treated in §5. How to exploit information in lower-level data products to extract accurate RVs and useful summary statistics is discussed in §6.

Doppler shifts
The radial velocity (RV) of a star, its velocity projected onto the line of sight, can be measured due to the Doppler effect. If a source emits a photon with wavelength λ0 and has a velocity v relative to an observer, then the wavelength of the photon received is given by

λ = λ0 (1 + k · v/c) / √(1 − |v|²/c²), 1.

where c is the speed of light, · is the scalar product, and k is the unit vector from the observer to the source (Einstein 1905). The spectrum of a star contains thousands of absorption lines, short intervals of wavelengths for which photons are absorbed in the upper parts of stellar atmospheres. As the star moves, the Doppler effect causes the apparent wavelength of spectral lines to change (see Fig. 1). An observer who acquires the spectrum of a given star multiple times can measure changes in the apparent wavelength of each spectral line as a function of time, t. This maps to measuring RV(t) ≡ k · v(t) in Eq. 1. For more details on the definition of RV, see Lindegren & Dravins (2003) and Lovis & Fischer (2010).
Measuring the radial velocity from a spectrum has many steps, including the correction of instrumental effects. In this review, we start with the estimate of the true stellar spectrum, de-biased from instrumental effects, and transformed to be equivalent to what an observer at the solar system's center of mass would see. The earlier steps of analysis are briefly presented in Appendix §B.1.

Forward model of planetary effects
An orbiting planet causes a reflex motion of the star, thus creating RV variations. More precisely, the star and the planet periodically traverse elliptical orbits. The star's velocity projected onto the line of sight, or RV, at time t depends on the planet's mass and orbital parameters and is given by the so-called Keplerian signal,

f(t; K, P, e, ω, M0) = K [cos(ω + ν(t; e, P, M0)) + e cos ω], 2.
where K, P, e, ω, and M0 are the velocity amplitude, the orbital period, the orbital eccentricity, the argument of periastron and the mean anomaly describing the motion of the star. The true anomaly, ν, is an angle that parametrizes the position of the star on its orbit (see Appendix §A and Murray & Correia 2010).
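As an illustration, Eq. 2. can be evaluated numerically by solving Kepler's equation M = E − e sin E for the eccentric anomaly E and converting E to the true anomaly ν. The sketch below follows this standard recipe in plain NumPy with a Newton-Raphson solver; the function names are ours.

```python
import numpy as np

def solve_kepler(M, e, tol=1e-12, max_iter=100):
    """Solve Kepler's equation M = E - e*sin(E) for the eccentric anomaly E
    by Newton-Raphson iteration (vectorized over M)."""
    M = np.atleast_1d(np.asarray(M, dtype=float))
    E = M.copy()  # adequate starting guess for moderate eccentricities
    for _ in range(max_iter):
        dE = (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
        E -= dE
        if np.max(np.abs(dE)) < tol:
            break
    return E

def keplerian_rv(t, K, P, e, omega, M0):
    """Keplerian RV signal of Eq. 2: f(t) = K*[cos(omega + nu(t)) + e*cos(omega)]."""
    M = M0 + 2.0 * np.pi * np.asarray(t, dtype=float) / P  # mean anomaly
    E = solve_kepler(np.mod(M, 2.0 * np.pi), e)
    # true anomaly from eccentric anomaly
    nu = 2.0 * np.arctan2(np.sqrt(1.0 + e) * np.sin(E / 2.0),
                          np.sqrt(1.0 - e) * np.cos(E / 2.0))
    return K * (np.cos(omega + nu) + e * np.cos(omega))
```

For e = 0 this reduces to a pure sinusoid, K cos(ω + 2πt/P + M0), recovering the circular-orbit approximation used by periodograms.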
Keplerian signal: RV signal as a function of time resulting from the orbit of a planet, as given in Eq. 2 and shown in Fig. 2.
The eccentricity of the ellipse is confined to [0, 1) for a bound system undergoing periodic motion. Orbits with e ≃ 0 are nearly circular and can be well-approximated by the leading terms of a series expansion in the mean anomaly. Orbits with e ≥ 0.3 become obviously elongated, and the motion becomes noticeably uneven in time, typically leading to a sharp rise and slow fall (or vice versa), as shown in Fig. 2. Very high eccentricities (0.9 < e < 1 − R⋆/a, where R⋆ is the stellar radius and a is the mean star-planet separation) are very rare, but can create numerical difficulties.

Figure 1: (a) Small portion of a spectrum in rest frame (red, dotted) and in observer frame (blue, solid). (b) Difference in flux between the Doppler-shifted and rest-frame spectra. For illustration, we show a large Doppler shift typical of a binary star system. Abbreviation: v, relative velocity of the stars.

The signal amplitude depends on the mass of the star and the planet, as well as on the orbital eccentricity, according to the formula

K = (2πG/P)^(1/3) m sin i / ((M + m)^(2/3) √(1 − e²)), 3.

where M is the stellar mass, m is the planet mass, G is the gravitational constant, and i is the angle between the plane in which the planet orbits and the plane perpendicular to the line of sight (Perryman 2011). Eq. 3. shows that the RV semi-amplitude is proportional to m/P^(1/3): the more massive the star, and the smaller and farther from its star a planet is, the more difficult the planet is to detect. Each observed star has a proper motion due to the motion of the star (and the Sun) around the galactic center. On the timescale of RV observations, ∼ 1 to 20 years, this appears as a constant or in some cases a linear trend that must be added to Eq. 2. Furthermore, most Sun-like stars host multiple planets (He et al. 2021). In principle, planet-planet interactions cause deviations from the Keplerian model, but for the vast majority of planetary systems, the motion of the star can be well-approximated over the timescale of RV surveys as the linear superposition of the Keplerian orbits due to each planet. Hence, our nominal physical model for a time series of RV measurements of a star hosting n planets with parameters (Kj, Pj, ej, ωj, M0,j)j=1,..,n is

RV(t) = c0 + c1 t + Σ_{j=1}^{n} f(t; Kj, Pj, ej, ωj, M0,j). 4.
The RV amplitude of planet j, Kj, is proportional to m sin i (see Eq. 3.). In the absence of information on i, the planetary mass m cannot be determined unambiguously. Constraints on i can be obtained if the planet transits or, in the case of systems with multiple high-mass short-period planets, by detecting planet-planet interactions (e.g. Laughlin & Chambers 2001, Nelson et al. 2016, Rosenthal et al. 2019).

Figure 2: Radial velocity (RV) signature of a single planet. Each panel shows the RV signature over one orbital period for a planet with a given orbital eccentricity e. All curves are computed with the same RV semiamplitude K, and all panels have the same y-axis scale.

When the data are acquired with different instruments, it is important to account for the fact that they might have different zero-velocity references. Hence, if there are m different instruments, c0 in Eq. 4. should be replaced by Σ_{j=1}^{m} c0^j χj(t), where χj(t) = 1 if the measurement at t is taken by instrument j and 0 otherwise.
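This instrument-offset bookkeeping can be made concrete as a linear design matrix: one indicator column χj per instrument plus a common drift column, fitted by least squares. A minimal sketch; the instrument names and data values are invented for illustration.

```python
import numpy as np

def linear_design_matrix(t, instrument_ids):
    """Design matrix for the non-Keplerian linear part of Eq. 4:
    one zero-point offset c0^j per instrument (indicator chi_j),
    plus a common linear drift c1 * t."""
    t = np.asarray(t, dtype=float)
    instruments = sorted(set(instrument_ids))
    cols = [np.array([1.0 if iid == j else 0.0 for iid in instrument_ids])
            for j in instruments]
    cols.append(t)  # drift column
    return np.column_stack(cols)

# Example: 5 epochs taken with two (hypothetical) instruments
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ids = ["HARPS", "HARPS", "HIRES", "HIRES", "HIRES"]
rv = np.array([1.0, 1.1, 3.0, 3.1, 3.2])
X = linear_design_matrix(t, ids)
coeffs, *_ = np.linalg.lstsq(X, rv, rcond=None)  # [c0_HARPS, c0_HIRES, c1]
```

In a full analysis these columns would simply augment the nonlinear Keplerian model rather than be fit on their own.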

Basic model of observed data
To detect the variations due to planets, astronomers measure the RV of a given star at several irregularly spaced epochs, usually from 20 to 1,000. Each star is only observable when it is high above the horizon, typically for only a few hours per night, and weather can prevent stars from being observed. Furthermore, a star is typically observable for ∼ 4-8 months, as it must not be too close to the Sun. Finally, most observatories support multiple science programs, so observations on a given star might be interrupted for a period from hours to weeks.
Each RV measurement has a nominal uncertainty depending on the number of photons received, as well as the properties of the star, spectrograph, and data reduction technique (Bouchy et al. 2001). Astronomers typically assume independent Gaussian measurement noise at each time ti, but in order to reduce the risk of underestimating uncertainties (e.g., unmodeled planets, instrumental noise or stellar variability), an extra Gaussian noise term, often called "jitter", is added to the model (Ford 2006), leading to

RVobs(ti) = RV(ti) + ϵ(ti), 5.
ϵ(ti) ∼ N(0, σ²ti + σ²J), 6.

where RV(t) is defined in Eq. 4., RVobs(ti) is the measured RV, and the assumed noise for the measurement made at ti, ϵ(ti), follows a Gaussian distribution of variance σ²ti + σ²J. Fig. 3 shows an example of a two-planet fit on the RV data of HD 114783.

Figure 3: (a) Radial velocity (RV) measurements of the star HD 114783. The gray curve shows the maximum likelihood 2-planet model. (b) RV residuals relative to the 2-planet model above. Error bars reflect reported measurement uncertainties (σti) and do not reflect the 3 m/s of jitter (σJ). Points are color-coded to indicate whether they are before or after an instrument upgrade that resulted in an unknown RV offset. Data are from Rosenthal et al. (2021).
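Under this model, the log-likelihood of the residuals (data minus the model of Eq. 4.) is that of independent Gaussians whose variances are the quadrature sum of the reported uncertainty and the jitter. A minimal sketch, with a function name of our own choosing:

```python
import numpy as np

def log_likelihood_jitter(resid, sigma_meas, sigma_jitter):
    """Gaussian log-likelihood of RV residuals with an extra white-noise
    'jitter' term added in quadrature to the reported uncertainties."""
    var = sigma_meas**2 + sigma_jitter**2
    return -0.5 * np.sum(resid**2 / var + np.log(2.0 * np.pi * var))
```

In practice σJ is treated as a free parameter and inferred jointly with the orbital elements; note that increasing σJ always inflates the variance, so large residuals are penalized less at the cost of a larger normalization term.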

Other Processes Affecting RVs
The model in Equations 4.-6. has been extensively used to estimate the masses and orbital parameters of exoplanets (e.g., Wright & Howard 2009, Bonomo et al. 2017). For favorable stars, it is useful for analyzing RV signals with amplitudes over 3-10 m/s = 33-110 RV⊕, enough to characterize short-period planets with masses greater than Neptune's. However, detecting less massive planets requires very precise observations and/or many observations. Accurately interpreting their RV signatures requires greater sophistication, because multiple potential noise sources become relevant and require more detailed models for ϵ(t). These sources are listed below and presented in more detail in Appendix B.
2.4.1. Stellar effects. The principle of Doppler spectroscopy is to measure the velocity of a light source with respect to an observer, so if the hot gas in different parts of the stellar surface has different brightnesses and velocities, these regions have different impacts on the data. Several physical processes on the surface of stars have an RV signature.
Convection near the surface of the star creates outward and inward motion of the gas. This creates a pattern of evolving convection cells. This process, referred to as granulation, has an effect on RV measurements that can be described as a stationary noise with a Lorentzian or super-Lorentzian power spectrum (Harvey 1985, Dumusque et al. 2011b). Granulation effects on the RV have been simulated in detail (e.g., Dravins et al. 2021). Local enhancement of the magnetic field at the surfaces of stars might result in regions darker or brighter than their surroundings, called spots and faculae, respectively (Schrijver 2002, Strassmeier 2009). This creates an imbalance in flux from the approaching and receding sides of the rotating star, and reduces upward convection, which has an additional net RV effect. The effect of magnetic activity on RV has been studied in Saar & Donahue (1997), Dumusque et al. (2011a) and Haywood et al. (2016), and simulated in Lagrange et al. (2010), Boisse et al. (2012), Dumusque et al. (2014) and Gilbertson et al. (2020). Active regions grow in size rapidly and disappear more gradually. The visible stellar surface is similar but not identical to itself after one stellar rotation, which results in quasi-periodic RV variations.
The rate of appearance, size, and location of spots and faculae vary over a ∼ 10 year timescale on solar-type stars, although it can be down to ∼ 1 year for M-type stars (Cortés-Zuleta et al. 2023). This affects the amplitude of short-term RV variations due to magnetic activity (up to 25 m s⁻¹; Lovis et al. 2011) and changes the net RV effect over these timescales. Appendix §B.2 provides a description of known stellar effects affecting RVs, and we refer the reader to Cegla (2019), Meunier (2021), and Crass et al. (2021) for reviews.

Other effects.
Residual instrumental systematics and absorption of certain wavelengths by the Earth's atmosphere can result in spurious RV signatures of the order of a few RV⊕ (e.g. Cunha et al. 2014, Artigau et al. 2014, Smette et al. 2015). As they are not amenable to a concise statistical description, we refer the reader to Appendix §B.1 and to Halverson et al. (2016) and Cretignier et al. (2021) for details.

Model of RV
The planets only affect the Doppler shift, which must be estimated and converted to RV measurements. We can express the RV extracted from the spectra, R̂V(t), as the sum of: RV(t), the RV at time t due to the motion of the center of mass of the star (in particular due to planets); RVcontam(t), the RV caused by stellar variability and instrument systematics; and ϵ(t), the inevitable photon noise. Then we have

R̂V(ti) = RV(ti) + RVcontam(ti) + ϵ(ti), 7.

and we decompose RV(t) as

RV(ti) = RVplanets(ti) + RVg(ti), 8.

where RVplanets(ti) and RVg(ti) are respectively induced by planets and by other gravitational effects such as companion stars or proper motion in the galaxy. In Eq. 4., RVg(t) = c0 + c1 t.
In order to interpret RV measurements properly, we must adequately model RVcontam(t). Unlike planetary RV signals, RV variations induced by the star and instrument do not have a constant frequency, phase, and amplitude (see §4.4), and stellar processes and systematics not only cause a Doppler shift in the spectrum but also cause the shape of the spectrum to vary with time (see Figure 4) and do not affect all lines identically (e.g. Thompson et al. 2017, Wise et al. 2018, Cretignier et al. 2020a, Ning et al. 2019, Al Moulla et al. 2022a, Lafarga et al. 2023). The shape variations of the spectra can be used to predict the associated RVcontam(t) signal, either in a machine learning approach with a training set of spectra or in a statistical framework. In the following section, we show how these ideas can be expressed in a general statistical formalism.
CCF mask: function of the wavelength equal to unity except on wavelength ranges corresponding to spectral lines (see Pepe et al. 2002).
Ancillary indicator: time series of N scalars extracted from a time series of N spectra, summarizing a shape variation of the spectrum, for instance the variations of the average spectral line, or of the amplitude of a specific line.

Figure 4: Schematic cross-correlation function (CCF): the scalar product of a mask and the spectrum as a function of the wavelength offset between the two, which can be visualized as an average spectral line. (a) The blue curve shows the CCF of the quiet stellar surface, and the red curve shows the CCF when a dark spot is present on the surface. (b) The CCF when the spot is present has a different full width at half maximum (FWHM) than the CCF of the quiet stellar surface and is asymmetric. The difference between the average position of the lower and upper parts of the CCF, known as the bisector inverse slope (BIS), is nonzero. As the star rotates, the spot CCF will have a different Doppler shift: The bump in the red CCF moves from left to right, which also causes the shape of the spectra to change with time, in particular the BIS and FWHM. The spot in this figure covers 20% of the surface, whereas spots on the Sun cover ∼0.01-0.1% of the surface.

Challenges of RV data analysis
Our understanding of exoplanets can be thought of as a language where each planet is a word, and each planetary system is a sentence. As in information theory (Shannon 1948), our knowledge of this language comes through a noisy and biased communication channel. In the long term, we want to understand the full language, but first we must learn to interpret each sentence: What planets orbit a given star? The Bayesian formalism is well suited to describe such situations, and indeed it offers a compact way to present the different problems of RV data analysis (Sandford et al. 2019 also use an information-theoretic framework for exoplanets).
In our analogy, the message received is the raw data. The most fundamental data product is the number of electrons counted on each CCD detector of the spectrograph. The sentence to be decoded is represented by vectors θ and η. The vector θ = (θ1, ..., θn) represents the planetary system, where n is the number of planets (which can also be treated as an unknown variable) and θj the orbital elements of the planet indexed by j (see Eq. 2.). The vector η includes all other relevant non-planetary parameters, such as an offset, a drift, the stellar rotation period, and noise amplitude and timescale.
Let us denote the lowest-level data by D. Our goal is to obtain a meaningful joint posterior distribution for η and θ knowing D, p(θ, η | D), but D is corrupted by hundreds of instrumental effects, and computing the posterior directly is currently unmanageable. To be usable, D needs to be reduced to an estimate of the RV time series R̂V = (R̂V(ti))i=1,..,N and ancillary indicators I = (Ij(ti))j=1..p, i=1..N, such that for each j, Ij(t) is a time series of length N summarizing the variations of the shape of the spectrum (see Fig. 4.b). With the notations of Eqs. 7. and 8., Bayes' theorem gives

p(θ, η | R̂V, I) ∝ p(R̂V, I | θ, η) p(θ, η), 9.

where the likelihood p(R̂V, I | θ, η) is well defined because RV(t) depends deterministically on θ and η. Below, we denote y = (R̂V, I), potentially with I = ∅. We will refer to p(θ, η) as the prior distribution and p(y | θ, η) as the likelihood. In some cases, to emphasize that the likelihood is restricted to models with exactly n planets, we will denote the likelihood by p(y | θ, η, n). This formalism also accounts for the case where photometric data of the target star are available: the photometry time series can be considered as another indicator, with the added complexity that the data are not acquired at the same epochs as the spectra.
Reduced data: vector y concatenating the RV time series and ancillary indicators, extracted from the raw data D.
Likelihood: probability distribution of the reduced data y knowing the model parameters θ and η; denoted by p(y | θ, η).

Based on Eq. 9, we can identify several choices to be made: (a) a reduction method transforming the observational raw data into summary statistics R̂V, I that estimate the RV and provide useful indicators of spectral variability; (b) a model, that is, a likelihood p(y | θ, η) and prior distribution p(θ, η); the key difference with step (a) is that the effect of potential planets is now explicitly included in the likelihood definition; (c) a decision method for confidently claiming the detection of planetary signals and estimating their masses and orbital elements for a given model, or a collection of models. In the Bayesian formalism above, the decision is based on the posterior distribution p(θ, η | y) and must include a discussion of the sensitivity of the result to the choices made in (a) and (b). The Bayesian formalism only serves here to compactly present points (a)-(c) and is not always used in practice. Points (a), (b), and (c) are treated recursively in §6, §5, and §4, respectively, and must be completed by (d) a set of practical and reliable numerical methods for performing the necessary calculations. Numerical aspects are highlighted when relevant.

Likelihood
Suppose that we have at our disposal an RV time series, a few ancillary indicators and a statistical model of them. We want to determine how many planets can be confidently detected, and what their orbital elements are.
In §3, we presented the different steps of exoplanet detection, and in particular the need for a likelihood function describing the distribution of the data y as a function of the model parameters θ and η, describing respectively the planets and all other effects. The likelihood is commonly assumed to be Gaussian,

p(y | θ, η) = exp(−(1/2) (y − g(t; θ))ᵀ V(η)⁻¹ (y − g(t; θ))) / √((2π)^N |V(η)|), 10.

where |V(η)| is the determinant of the covariance matrix V(η) and g(t; θ) is the mean model. For instance, supposing that y is the RV time series and using model 4.-6., θ = (Kj, Pj, ej, ωj, M0,j)j=1,..,n, g is the sum of Keplerians and an affine function (see Eq. 4.), η = (σJ, c0, c1), and V is diagonal with i-th element σ²ti + σ²J. If RVcontam is non-zero, it might be modelled by a non-diagonal matrix V(η) and/or a linear model of stellar activity indicators in g. This form of the likelihood is used whether the data y is the RV time series only, a single ancillary indicator, or the concatenation of the time series of RV and ancillary indicators (possibly also including photometric observations of the target star). We return to how to specify further g(t; θ) and V(η) in §5.
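Numerically, a Gaussian likelihood of the form of Eq. 10. is usually evaluated via a Cholesky factorization of V(η), which avoids forming the inverse explicitly and yields the log-determinant essentially for free. A sketch of this standard computation:

```python
import numpy as np

def gaussian_log_likelihood(y, mean, V):
    """Log of the Gaussian likelihood for residuals r = y - g,
    evaluated stably via the Cholesky factorization of the covariance V."""
    r = y - mean
    L = np.linalg.cholesky(V)        # V = L L^T
    alpha = np.linalg.solve(L, r)    # |alpha|^2 = r^T V^-1 r
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    n = len(y)
    return -0.5 * (alpha @ alpha + log_det + n * np.log(2.0 * np.pi))
```

For a diagonal V this reduces to the independent-Gaussian case; for correlated noise models (e.g., Gaussian process kernels) the same code applies with a dense V.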
Assessing the statistical significance of a putative RV signal is remarkably challenging. We review three broad approaches: periodogram-based methods (§4.2.2), Bayesian model comparison (§4.3), and, in §4.4, approaches that do not aim to specify an explicit model for nuisance signals but aim to obtain robust detections.

Planet Detection via Periodograms
The decision on the planets detected and their orbital elements should make use of the full posterior of orbital elements, or in a frequentist setting the full likelihood. However, computing the posterior distribution of elements given in Eq. 9. or exploring exhaustively the parameter space to evaluate the likelihood is only possible with the latest numerical tools, and is computationally intensive. Realistic datasets often have millions of local likelihood maxima. Historically, the exploration of likelihood modes was done with periodogram methods, which are still extensively used due to their speed, numerical stability, and ability to unveil the dominating frequencies in the data.
4.2.1. Definition. Given a time series, for instance the RV time series or the time series of a given ancillary indicator, periodograms compare the log-likelihoods of two models for a grid of frequencies: a base model H0 and a model Kω including H0 plus a periodic component at frequency ω. In periodograms, the Keplerian model is often replaced with an approximation for which, once a small number of parameters (such as ω) are fixed, maximizing the likelihood is a convex problem.
The first and simplest case is the Lomb-Scargle periodogram (Lomb 1976, Scargle 1982), where the base model H0 is Gaussian white noise, and Kω is the same white noise plus a sine function. Both H0 and Kω are described with likelihoods of the form of Eq. 10., with

H0 : g = 0, 11.
Kω : g(t; A, B, ω) = A cos ωt + B sin ωt, 12.

and V = diag((σ²i)i=1,...,N) in both models. In Kω, g is the exact RV signature of a planet on a circular orbit (e = 0) with orbital period P = 2π/ω and is a good approximation when eK ≤ σRV,ideal. For a given ω, maximizing the likelihood with respect to A and B is equivalent to minimizing the sum of squares, and it is a linear problem. The Lomb-Scargle periodogram is then the difference of the log-likelihoods of models H0 and Kω as a function of ω.

Figure 5: (a) Periodograms of the ancillary indicators (∆BIS and ∆ log R′HK (Noyes 1984)) and RV time series (RV) of the ESPRESSO (Echelle SPectrograph for Rocky Exoplanets and Stable Spectroscopic Observations) data of the star TOI 178 (Leleu et al. 2021). The time series of ∆BIS, ∆ log R′HK and RV are shown in Fig. 7. Periodograms are expressed in normalized reduction in sum of squares (RSS), that is, the difference between the residual sum of squares after fitting H0 and Kω, divided by that of the H0 residuals (see Eqs. 11. and 12.). The periodogram presents peaks around a period of 1 day, due to the repetition of observations every 24 hours ± 2-3 h (Dawson & Fabrycky 2010). All three time series present power at ∼ 40 days, due to stellar rotation. (b) In orange, the periodogram of the RV time series when ∆ log R′HK and ∆BIS are included as linear predictors in the H0 and Kω models. This damps the amplitude of the stellar rotation signal, and a peak at 3.2 days emerges. The red and light brown curves correspond to adding a correlated Gaussian noise model in the covariance, with a Gaussian kernel of amplitude 1 m/s and timescale 2 or 10 days. Adding correlated noise models always damps power at low frequencies (for details, see Delisle et al. (2020a)).
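The Lomb-Scargle construction above can be sketched directly as a weighted least-squares fit of A and B at each trial frequency, reporting the normalized reduction in sum of squares. This is a minimal illustration, not an optimized implementation (fast algorithms exist), and the function name is ours.

```python
import numpy as np

def ls_periodogram(t, y, sigma, omegas):
    """Lomb-Scargle-type periodogram: for each trial frequency omega,
    fit K_omega: g = A cos(omega t) + B sin(omega t) by weighted least
    squares and return the reduction in chi-squared relative to H0: g = 0."""
    w = 1.0 / sigma**2
    chi2_0 = np.sum(w * y**2)  # chi-squared of the null model g = 0
    power = np.empty(len(omegas))
    for k, omega in enumerate(omegas):
        X = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
        # weighted normal equations: (X^T W X) beta = X^T W y
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        chi2 = np.sum(w * (y - X @ beta)**2)
        power[k] = (chi2_0 - chi2) / chi2_0  # normalized RSS reduction
    return power
```

Following Eqs. 11.-12., this sketch includes no constant offset; generalized variants add an offset column (and possibly activity indicators) to the design matrix.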
The principle of the periodogram can be extended with more complex definitions of H0 and Kω and/or by interpreting the periodogram in a Bayesian context. The null hypothesis H0 can be complexified (e.g. Baluev 2008), the periodic signals might be chosen as nonsinusoidal (e.g. Baluev 2013a, 2015b), and the assumption that the noise is uncorrelated can be dropped (e.g. Delisle et al. 2020a). Figure 5 shows an example of the application of different types of periodograms. A comprehensive list of existing periodograms is given in Appendix C.

4.2.2. Periodogram-based model comparison.
The usual way to determine whether a periodogram supports the detection of a periodic signal is to compute the probability distribution of the maximum value of the periodogram, maxω P(ω), under a null hypothesis H0. We compute the maximum of the periodogram of the data to be analyzed, Pd, on a grid of frequencies Ω = (ωi)i=1..N, and define a false alarm probability, FAP, as p(maxω∈Ω P(ω) ⩾ Pd | H0).
The FAP is most robustly estimated by generating signals under the null hypothesis; the estimated FAP is the fraction of simulations with a maximum peak greater than Pd. This is computationally expensive, as it requires ∼ n simulations for a precision of 100/n% on the FAP. Since astronomers often aim for a FAP of < 0.1% to 0.01%, n > 10⁴ is needed.
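This Monte Carlo estimate can be sketched as follows, for any periodogram function with the signature (t, y, sigma, omegas); the function name and interface are our own.

```python
import numpy as np

def bootstrap_fap(t, sigma, p_data, omegas, periodogram, n_sims=1000, seed=0):
    """Estimate the false alarm probability by simulating time series under
    the null hypothesis H0 (independent Gaussian noise with the reported
    uncertainties) and counting how often the highest periodogram peak
    exceeds the one observed in the real data, p_data."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_sims):
        y_sim = rng.normal(0.0, sigma)  # one synthetic dataset under H0
        if np.max(periodogram(t, y_sim, sigma, omegas)) >= p_data:
            count += 1
    return count / n_sims
```

Variants of this scheme shuffle or resample the residuals instead of drawing fresh Gaussian noise, which relaxes the assumption that the uncertainties are exactly known.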
A semianalytical approach allows one to simulate a reduced number of signals and to fit a generalized extreme value distribution (Süveges 2014) to the empirical distribution. The analytical approach approximates the FAP thanks to the theory of extreme values of stochastic processes using the Rice formula (Baluev 2008, 2009, 2013), even for complex periodic shapes (Baluev 2013a, 2015b) or correlated noise (Delisle et al. 2020a). These analytical approximations are very accurate in practice if at least a few tens of data points are available (Süveges et al. 2015).

4.2.3. Pros and cons.
Periodograms are fast and numerically stable, and the advent of accurate analytical estimates of the FAP, even in the case where signals are searched for simultaneously (Baluev 2013) or with Gaussian correlated noise models (Delisle et al. 2020a), further simplifies their use. They provide a useful visual diagnostic of the frequency content of the signal. However, they have drawbacks.
First, most periodograms scan for the orbital period of one planet at a time. For multiple-planet systems, periodograms can be applied iteratively: first to identify the most conspicuous planet in the data, then to look for a (j + 1)th planet in the residuals of the best-fit model using j planets. However, periodograms are sensitive to aliasing, or spectral leakage (Dawson & Fabrycky 2010). Aliases of different signals can add destructively or coherently, so, even if the highest peak of a periodogram is statistically significant, its period may not match that of any physical signal. This can be avoided by searching for multiple signals simultaneously (Baluev 2013). A brute-force search for two signals at once is expensive, and allowing for more signals rapidly becomes prohibitive. Other approaches, based on sparse recovery, allow one to search for several signals at low computational cost (Hara et al. 2017). Stellar activity causes low-frequency signals which can be mistaken for planets. It is better not only to search for several planets simultaneously but also to fit the parameters of stellar activity and planetary signals jointly. As discussed in §4.1, stellar variability is often modeled as correlated RV noise using a non-diagonal covariance matrix V that requires additional parameters that must be inferred from the data. This means that the log-likelihood is non-convex for a given orbital period. Marginalizing (or even optimizing) over the kernel parameters dramatically increases the computational cost of periodograms, although it can be done (Delisle et al. 2018).

Bayesian Approach to Planet Detection
A more principled approach to comparing models with different numbers of planets is to use directly the Bayesian formalism of Eq. 9. The first method consists in computing the Bayesian evidence, also called marginal likelihood, of the n-planet model: letting Θn denote the parameter space of all possible combinations of n planets, the evidence is obtained by integrating over Θn the likelihood p(y | θ, η, n), given by Eq. 10, weighted by the prior. If one can compute the Bayesian evidences for models with n + 1 and n planets, then their ratio gives the "Bayes factor" (Kass & Raftery 1995). If the noise model is correct, then when one considers more planets than are justified by the given data set, the evidence decreases as more planets are added. For instance, when the priors on the different parameters are considered independent, the prior term decreases geometrically with the number of planets while the model does not yield a significantly higher likelihood. Bayesian model comparison for exoplanet detection was suggested by Gregory (2005, 2007a), Ford & Gregory (2007), Tuomi & Kotiranta (2009) and Ford et al. (2011), and has become one of the primary methods for establishing the statistical significance of detections. Astronomers compute the Bayes factors for increasing n, starting at n = 0, until they fall below a certain threshold. The literature contains various heuristics for the interpretation of Bayes factors (e.g., Jeffreys 1961). Unfortunately, following such guidelines blindly can be very dangerous. For important scientific discoveries, astrophysicists routinely demand that a frequentist test rejects a null hypothesis with a p-value of ∼ 10^−3 or even ∼ 10^−7 before publishing a result. In such situations, a Bayesian would demand a posterior odds ratio (i.e., prior odds ratio times the Bayes factor) exceeding 10^3 or even 10^7 before publishing a discovery.
A Bayes factor strongly favoring an (n + 1)-planet model over an n-planet model does not necessarily imply that there must be an (n + 1)th planet: the physical or statistical models used may be inaccurate. It is important to check the dependence of the results on the adopted model.
Bayes factors compare models with different numbers of planets. This is insufficient to claim a confident detection of a planet, which requires knowledge of the orbital elements to a certain accuracy: there is little value in a planet detection without estimates of its size and period. Well-defined orbital elements correspond to a sharp posterior mode, whose existence can be checked with the posterior samples of orbital elements. This can be done with periodograms, or directly with the posterior distribution of orbital elements if it can be estimated reliably. If so, the detection criterion can be constructed to convey information about the planets' location. For instance, a detection claim can be defined as: "there is a planet with orbital elements in Θ", where Θ is a region of the parameter space (Brewer & Donovan 2015, Handley et al. 2015b, Hara et al. 2022c). In that case, the criterion minimizing the number of missed detections for a certain tolerance to false ones is the false inclusion probability (FIP, Hara et al. 2022a), i.e., the probability of not having a planet with orbital elements in Θ.
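Given samples from a joint posterior over the number of planets and their elements (e.g., from a trans-dimensional sampler), the FIP of a period range can be estimated as the fraction of posterior samples containing no planet in that range. A minimal sketch, with an invented function name and data layout (one array of periods per posterior sample):

```python
import numpy as np

def fip_for_bin(posterior_samples, pmin, pmax):
    """posterior_samples: list of per-sample period arrays (possibly empty),
    drawn from a joint posterior over the number of planets and their
    elements. Returns the estimated posterior probability that NO planet
    has its period in [pmin, pmax] -- the FIP of that bin."""
    n_without = sum(
        1 for periods in posterior_samples
        if not np.any((np.asarray(periods) >= pmin)
                      & (np.asarray(periods) <= pmax))
    )
    return n_without / len(posterior_samples)
```

A low FIP for a narrow period bin is then a detection criterion that, unlike a bare Bayes factor, carries information about where in parameter space the planet lies.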
A significant barrier to more routine adoption of Bayesian model comparison is the difficulty of computing the Bayesian evidence accurately. Nelson et al. (2020) compared different methods to compute the Bayesian evidence and found that the agreement between them ranges from a factor of ∼ 1 for 0-planet models to ∼ 10^2 for 2-planet models. Additionally, for some methods, the observed dispersion of Bayes factor estimates from multiple runs of the same method was often significantly greater than the reported uncertainties. They recommend evaluating the numerical uncertainty based on several independent runs. When a fast estimate is needed, a Laplace approximation is advised over heuristics such as the AIC or BIC. Numerical details are discussed further in Appendix D.

Qualitative approach to planet detection
The methods presented in §4.2 and §4.3 rely on a complete model of the signal. All the alternative hypotheses are made explicit and are compared to one another. This gives meaningful statistical significance indicators and measures of uncertainty. However, if none of the noise models captures the stochastic variation in the data, the inferences are unreliable. A second approach consists, in the formalism of Eq. 8, in extracting meaningful information about RVplanets(t) without explicitly specifying a model for RVcontam(t). Such approaches also might serve another purpose: diagnosing unanticipated effects.
On the timescale of RV observations, unless there are strong gravitational interactions between the planets, planetary signals are purely periodic, unlike stellar and instrumental signals. To diagnose whether a signal is truly periodic, Schuster (1898) and Mortier & Collier Cameron (2017) compute classical periodograms by adding one point at a time and checking that the amplitude of the peak corresponding to the candidate planet increases steadily. Alternatively, Gregory (2016) and Hara et al. (2022b) use the Bayesian framework described in §4.3 and add an apodization factor to the Keplerians: the model of Eq. 2 is multiplied by a Gaussian term e^{−(t−t0)²/(2τ²)}, where t0 and τ are free parameters. If the signal is consistent, the probability that τ exceeds the total observation time-span should be high.
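The apodization idea can be illustrated with a toy model: a sinusoid multiplied by the Gaussian envelope above, with (t0, τ) chosen by grid search and the sinusoid amplitudes solved linearly. This is a simplified stand-in for the Keplerian model and the Bayesian treatment of the cited works; `apodized_fit` is an invented name. For a genuinely persistent signal, the preferred τ is much larger than the time-span.

```python
import numpy as np

def apodized_fit(t, y, period, tau_grid, t0_grid):
    """Fit y ~ [a cos + b sin](2 pi t / period) * exp(-(t-t0)^2/(2 tau^2))
    by grid search over (t0, tau); a, b are solved by linear least squares.
    Returns the (t0, tau) pair minimizing chi^2."""
    best = (None, None, np.inf)
    for tau in tau_grid:
        for t0 in t0_grid:
            env = np.exp(-(t - t0)**2 / (2 * tau**2))
            X = np.column_stack([env * np.cos(2 * np.pi * t / period),
                                 env * np.sin(2 * np.pi * t / period)])
            beta = np.linalg.lstsq(X, y, rcond=None)[0]
            chi2 = np.sum((y - X @ beta)**2)
            if chi2 < best[2]:
                best = (t0, tau, chi2)
    return best[0], best[1]
```

Applied to a noiseless sinusoid spanning 200 days, the fit selects the largest τ in the grid, consistent with a coherent (planet-like) signal; a decaying quasi-periodic signal would instead prefer a finite τ.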
Another line of work consists in searching for periodic signals in the data without specifying a parametric form. Zucker (2015) suggests using a Hoeffding test. Zucker (2018) applies the formalism of distance correlation to evaluate the statistical dependence between a cyclic variable with period P and the data; this is further explored by Binnenfeld et al. (2021), who aim to find RV variations which are statistically independent of the spectral shape variations. This work uses spectral information through all pairwise distances between spectra, and shows promising results.

Parameter estimation and uncertainty quantification
Once the dominant local modes of the likelihood or posterior have been identified by a periodogram analysis or with a random sampler with good convergence properties, more computationally intensive methods can be employed for finer parameter estimation in this region. The posterior distribution of the orbital elements can be evaluated with MCMC algorithms (Ford 2005). Brute-force random-walk MCMC requires carefully chosen proposal distributions for systems with up to four planets (Ford 2006). Modern studies typically use ensemble samplers (Nelson et al. 2014, Foreman-Mackey et al. 2013), adaptive Metropolis sampling (Delisle et al. 2018), Hamiltonian and/or geometric MCMC samplers (e.g., Papamarkou et al. 2021), or pre-marginalisation with a Laplace approximation of the evidence over the linear parameters (Price-Whelan et al. 2017). In each of these methods, the orbital periods are initialised at multiple values very near the dominant signals found by the periodogram analysis. The results are generally reliable if the likelihood is dominated by a single mode and the initial estimate of the period falls into that mode. Testing numerical convergence is essential when using MCMC or other iterative algorithms. Some nested samplers have shown good performance in blindly locating different likelihood maxima (Brewer & Donovan 2015, Handley et al. 2015a,b, Faria et al. 2022).
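As a toy illustration of the sampling-plus-convergence-check workflow, the sketch below runs random-walk Metropolis on a circular-orbit model with known period (two parameters: semi-amplitude K and phase), from several starting points, and computes the Gelman-Rubin statistic across chains. This is deliberately minimal and is not one of the samplers cited above; all function names are illustrative.

```python
import numpy as np

def log_post(params, t, y, period, sigma):
    """Log-posterior of a circular orbit with known period; flat prior
    on K in [0, 100] (illustrative bounds) and on the phase."""
    K, phi = params
    if K < 0 or K > 100:
        return -np.inf
    model = K * np.sin(2 * np.pi * t / period + phi)
    return -0.5 * np.sum((y - model)**2) / sigma**2

def metropolis(t, y, period, sigma, start, n_steps, step, rng):
    """Plain random-walk Metropolis; returns the chain of (K, phi)."""
    chain = np.empty((n_steps, 2))
    cur = np.array(start, float)
    lp = log_post(cur, t, y, period, sigma)
    for i in range(n_steps):
        prop = cur + step * rng.standard_normal(2)
        lp_prop = log_post(prop, t, y, period, sigma)
        if np.log(rng.uniform()) < lp_prop - lp:
            cur, lp = prop, lp_prop
        chain[i] = cur
    return chain

def gelman_rubin(chains):
    """R-hat across chains for one parameter; chains has shape (m, n)."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)                  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)
```

In practice one would monitor R-hat (values near 1 indicate the chains agree), effective sample sizes, and trace plots before trusting any posterior summary.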

MODEL: SPECIFYING THE PRIORS AND LIKELIHOODS
Different choices of priors and likelihoods have a very strong impact on the detection of planets. Priors strongly influence both the detection and the estimation of orbital elements for low-amplitude signals, where the likelihood is less constraining (Hara et al. 2022c). To avoid priors that are unrealistically diffuse, a possibility is to use well-tested reference priors and to exclude orbital configurations that are unstable (Tamayo et al. 2020, Stalport et al. 2022). A discussion of the influence of priors and a list of commonly used ones is provided in Appendix E.1; we now focus on the likelihood, p(RV | I, D) (see Eq. 9).
Historically, the first method was to represent the RV variations induced by the instrument and star as a linear combination of the ancillary indicators, RVcontam = Σj αj Ij. This neglects the fact that the indicators are themselves noisy. The efficiency of this method is also limited by the fact that there can be phase shifts between activity-induced changes in RVs and ancillary indicators, reducing their correlation (Lanza et al. 2018). For rotation-linked variability, it can be useful to generalize to models that predict RVcontam at time t using spectra taken at times near t (Collier Cameron et al. 2019). Another way to account for the signals corrupting RVs is to represent them as correlated Gaussian noise. One can still follow the formalism of Eq. 10, assuming that here y is the RV time series, but now specifying the matrix V with a kernel, which gives the correlation between the value of the stellar RV signal at t and t + ∆t. A list of common kernels is provided in Appendix E.2. The correlation of RVs due to stellar variability can be expressed as a sum of the correlations due to each of the processes described in Section 2.4 (see Fig. 6). However, using a Gaussian process representation of the RV only has serious limitations: it acts as a frequency filter, and thus might spuriously decrease the significance of viable planet candidates. To use the information contained in the ancillary indicators while accounting for their random nature and complex relationships to one another, recent works use Eq. 10 with the data y being the concatenation of the RV and ancillary indicator time series, in the framework of Gaussian processes.
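The first, linear-combination approach above amounts to a least-squares decorrelation of the RVs against the indicators. A minimal sketch (the function name is illustrative); as noted in the text, it ignores the noise in the indicators themselves:

```python
import numpy as np

def decorrelate(rv, indicators):
    """Least-squares fit rv ~ sum_j alpha_j * I_j + offset, returning the
    coefficients and the decorrelated residuals. Noise in the indicators
    is ignored, which biases the coefficients for noisy indicators."""
    X = np.column_stack(list(indicators) + [np.ones_like(rv)])
    alpha = np.linalg.lstsq(X, rv, rcond=None)[0]
    return alpha, rv - X @ alpha
```

When the contamination truly is a noiseless linear function of the indicators, this recovers it exactly; phase shifts between RVs and indicators, as discussed above, degrade it.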
Gaussian processes were introduced to the exoplanet community by Aigrain et al. (2012), who showed that when nearly contemporaneous photometric and RV observations of a star are available, the effect of a stellar spot on the RV can be predicted through the photometric flux and its derivative. For most targets, continuous photometric measurements are unavailable, but one can estimate the derivative of the flux at any time using Gaussian processes (GPs). These are stochastic processes G(t), functions of a variable t, such that for any n values t1, ..., tn, (G(t1), ..., G(tn)) follows a multivariate Gaussian distribution. In general, t can be a vector, but for our purpose we define it as the time t. This implies that the GP is defined by two quantities: its mean m(t) and its kernel k(t, t′), equal to the covariance of G(t) and G(t′) (Rasmussen & Williams 2005, Aigrain & Foreman-Mackey 2022). Aigrain et al. (2012) use photometric data to predict the RV variation. We can go a step further and analyze RVs and ancillary indicators simultaneously by building a likelihood p(RV, I | D), typically in the GP framework. RVs and ancillary indicators are expressed as linear combinations of one or several latent Gaussian processes and their derivatives (Rajpaul et al. 2015, Gilbertson et al. 2020, Gordon et al. 2020, Barragán et al. 2022, Tran et al. 2023). Since G(t + ∆t) ≈ G(t) + Ġ(t)∆t, using the derivative can account for small phase shifts between the RVs and the indicators. The data y in Eq. 10 is then the concatenation of the RV vector and the ancillary indicators. Fig. 7 shows an example of RV and ancillary indicators modeled simultaneously with a GP and its derivative.
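The joint covariance of series written as aᵢG + bᵢĠ follows from differentiating the kernel: Cov(G(t), Ġ(t′)) = ∂k/∂t′ and Cov(Ġ(t), Ġ(t′)) = ∂²k/∂t∂t′. A sketch for the squared-exponential kernel, where these derivatives are analytic (the cited works use other kernels; the function names are illustrative):

```python
import numpy as np

def se_blocks(t, sigma=1.0, ell=1.0):
    """Covariance blocks for a squared-exponential GP G and its
    derivative Gdot: K = Cov(G, G), K01 = Cov(G, Gdot),
    K10 = Cov(Gdot, G), K11 = Cov(Gdot, Gdot)."""
    dt = t[:, None] - t[None, :]
    K = sigma**2 * np.exp(-dt**2 / (2 * ell**2))
    K01 = K * dt / ell**2
    K10 = -K01
    K11 = K * (1 / ell**2 - dt**2 / ell**4)
    return K, K01, K10, K11

def joint_cov(t, coeffs, sigma=1.0, ell=1.0):
    """coeffs: list of (a_i, b_i), with series_i = a_i G + b_i Gdot.
    Returns the covariance matrix of the concatenated series, i.e. the
    V matrix of Eq. 10 for the augmented data (before adding
    measurement-error and jitter terms)."""
    K, K01, K10, K11 = se_blocks(t, sigma, ell)
    blocks = [[a1 * a2 * K + a1 * b2 * K01 + b1 * a2 * K10 + b1 * b2 * K11
               for (a2, b2) in coeffs] for (a1, b1) in coeffs]
    return np.block(blocks)
```

The resulting matrix is symmetric positive semi-definite by construction, and would be completed with diagonal measurement-error and jitter terms before evaluating the likelihood.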
Representing RVs and ancillary indicators as a linear combination of a GP and its derivative is only one way to specify the covariance in Eq. 10; other options have been proposed. For example, the GP framework can be used to model simultaneously the RVs derived in different spectral bands (Cale et al. 2021). The kernel of the extended data y can also be based on explicit physical models of the star (Luger et al. 2021a,b, Hara & Delisle 2023). Furthermore, the assumption that the likelihood is Gaussian and stationary can be relaxed. Hara & Delisle (2023) quantify non-Gaussianity with an analytical model of stellar variability. Camacho et al. (2023) model the RV and ancillary indicator time series with a GP regression network. It allows, in particular, consideration of noise with heavy tails and offers ways to account for non-stationarity due to the magnetic cycle.
Evaluating the likelihood of Eq. 10 for certain values of the parameters requires the log determinant of V(η) and the computation of {y − g(t; θ)}ᵀ V⁻¹(η){y − g(t; θ)}. With standard algorithms, the computational cost scales as O(N³) with the size N of the dataset y, and becomes prohibitive for N ≳ 10³. This prevents using many indicators on stars with hundreds of points, or analysing the tens of thousands of RV measurements of the Sun. Fortunately, for certain choices of the kernel, the covariance matrix has a "semi-separable" expression and the computational cost then scales as O(N) (the CELERITE framework; Foreman-Mackey et al. 2017, Foreman-Mackey 2018). Similarly, TemporalGPs.jl allows for O(N) computation of a broad class of one-dimensional kernel functions in Julia. The CELERITE framework is generalised by Delisle et al. (2020b) to the S+LEAF framework, which models covariances as a sum of semi-separable and "LEAF" matrices. Quasi-separable kernels provide even greater flexibility for modeling multivariate time series (tinygp). Delisle et al. (2022) show that if the RVs and ancillary indicators are linear combinations of a Gaussian process G(t) and its derivatives, and the covariance of G(t) has an S+LEAF form, the computation of the likelihood of the augmented data is still O(N).
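The gain from structured kernels can be seen in the simplest case: for an exponential (Ornstein-Uhlenbeck) kernel, the process is Markovian, so the likelihood factorizes into one-step conditionals and can be evaluated in O(N), exactly matching the O(N³) dense computation. This toy comparison is not the CELERITE or S+LEAF algorithm itself, which handles much richer kernels:

```python
import numpy as np

def loglik_direct(t, y, sigma, tau):
    """O(N^3) Gaussian log-likelihood with kernel sigma^2 exp(-|dt|/tau)."""
    C = sigma**2 * np.exp(-np.abs(t[:, None] - t[None, :]) / tau)
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (y @ np.linalg.solve(C, y) + logdet
                   + len(y) * np.log(2 * np.pi))

def loglik_markov(t, y, sigma, tau):
    """Same log-likelihood in O(N): the exponential-kernel process is
    Markovian, so y_i | y_{i-1} ~ N(phi y_{i-1}, sigma^2 (1 - phi^2))
    with phi = exp(-(t_i - t_{i-1}) / tau)."""
    ll = -0.5 * (y[0]**2 / sigma**2 + np.log(2 * np.pi * sigma**2))
    for i in range(1, len(y)):
        phi = np.exp(-(t[i] - t[i - 1]) / tau)
        var = sigma**2 * (1 - phi**2)
        r = y[i] - phi * y[i - 1]
        ll += -0.5 * (r**2 / var + np.log(2 * np.pi * var))
    return ll
```

Semi-separable frameworks generalise this recursion to sums of exponentials, damped oscillators, and their derivatives, while keeping the linear scaling.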

ANALYSIS METHODS: DEEPER LEVELS
So far, we have seen different methods to detect planets and estimate their orbital elements given priors and likelihood in §4. We have seen how the model can be built based on a certain reduction of the data in §5. We now present techniques to extract physical information from the spectra.
Our goal is to identify conceptual similarities between the different methods, rather than to assess their performance. While the community has begun to compare methods, proper evaluations are challenging since one does not know the true RV of real stars, due to potential undetected planets (except for our Sun; Collier Cameron et al. 2019, Dumusque et al. 2020, Lin et al. 2022). We suggest metrics to compare approaches in §7.

Estimating RV
For a fixed stellar spectrum, the RV is well defined and can be extracted from the spectrum using the multiple techniques described below. However, at the sub-m/s level (or a few RV⊕), the shape of the spectrum changes, so there is no unambiguous definition of the RV, and each method to estimate it relies on more or less explicit assumptions. Depending on how this shape change is modelled, the estimate of the Doppler shift changes.
Figure 7: Gaussian process regression applied to the time series of RV and of the ancillary indicators ∆BIS and ∆ log R′HK (Noyes 1984) for 46 measurements obtained on the star TOI 178 with the ESPRESSO instrument (Leleu et al. 2021). Each of the three time series yi(t), i = 1, 2, 3 is represented by a model yi(t) = αiG(t) + βiĠ(t) + ϵm(t) + ϵJ(t). The Gaussian process G has a stochastic harmonic oscillator (SHO) kernel (Foreman-Mackey et al. 2017) with three parameters, fitted with the S+LEAF package. The processes ϵm(t) and ϵJ(t) represent the measurement error and jitter. The brown line shows the mean of the posterior predictive distribution for each time series, and the shaded yellow areas represent ±1 standard deviation of αiG(t) + βiĠ(t). The purple curve represents the variations predicted by the best-fit Keplerian model for the planet, whose presence is confirmed by transits (Leleu et al. 2021). Other abbreviations: BIS, bisector inverse slope; ∆ log R′HK, ancillary indicator derived from each spectrum.

Three main families of methods for extracting the velocity from each spectrum separately are in use. As discussed in §3, the historic method consists of cross-correlating the spectrum with an idealised mask (Baranne et al. 1979, Pepe et al. 2002). Denoting by f(λ) the flux as a function of the wavelength λ and by M the mask, the cross-correlation function (CCF) can be written as CCF = f ∗ M,
where ∗ is the convolution operator. This method can be viewed as computing an average line shape. Each line contributes to improving the SNR, but also contributes a bias, since the true line wavelengths are unknown. Additionally, the CCF loses information about the differences in shape of each line. Often, many lines are excluded from the CCF mask to reduce contamination from lines likely to contribute a significant bias. Alternatively, Lienhard et al. (2022) propose a least-squares deconvolution technique to estimate a common spectral line profile. A second approach, template matching, consists in measuring the RV based on a template or model spectrum and a Taylor expansion of the spectrum as a function of velocity (Connes 1985, Bouchy et al. 2001, Anglada-Escudé & Tuomi 2012, Astudillo-Defru et al. 2015, Jones et al. 2022); related work has been done by Cretignier et al. (2022). This approach is based on the first-order approximation f(λ) ≈ f0(λ) − (v/c) λ (df0/dλ)(λ), where f0(λ) is a reference spectrum and c is the speed of light. Both the CCF and template-matching approaches aim to reduce the impact of stellar variability by careful selection of the lines/wavelengths included, but lack a mechanism to recognize stellar variability. Some authors measure the RV of each line (or ∼ 2Å "chunk" of the spectrum) separately (Dumusque 2018, Artigau et al. 2022) and compute a weighted average RV after a statistical rejection of outliers, or perform independent RV estimates in several bandwidths (Feng et al. 2017, Zechmeister et al. 2018). The rationale is to disentangle signals which affect all wavelengths identically (planet candidates) from other signals.
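The template-matching linearization can be checked numerically on a single synthetic Gaussian absorption line: shifting the template by a known velocity and solving the linearized least-squares problem recovers that velocity. The wavelength grid, line parameters, and function name below are invented for illustration.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light, m/s

def template_rv(lam, flux, template_flux, dtemplate_dlam):
    """Linearized template matching: flux(lam) ~ f0(lam) - (v/c) lam f0'(lam).
    Solves for v by least squares (uniform weights for simplicity)."""
    design = -lam * dtemplate_dlam / C_LIGHT
    return np.sum(design * (flux - template_flux)) / np.sum(design**2)

# Toy demo: one Gaussian absorption line, Doppler-shifted by 3 m/s.
lam = np.linspace(4999.0, 5001.0, 2001)                      # Angstrom
f0 = lambda l: 1.0 - 0.6 * np.exp(-(l - 5000.0)**2 / (2 * 0.05**2))
v_true = 3.0                                                 # m/s
flux = f0(lam / (1 + v_true / C_LIGHT))                      # shifted line
dlam = np.gradient(f0(lam), lam)                             # template slope
v_est = template_rv(lam, flux, f0(lam), dlam)
```

In real pipelines the weights come from photon noise, telluric and variability-prone wavelengths are masked, and the template is built from the observations themselves.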
A third approach builds a forward model of the spectrum: the observed flux is modeled as a stellar spectrum evaluated at wavelengths shifted to a frame that accounts for v_bc(t), the known motion of the observatory relative to the solar system barycenter (the "barycentric correction"), multiplied by an atmospheric transmission profile T(·) and convolved with the instrument response IP(·). Variations on forward modeling have focused on telluric effects (Butler et al. 1996, Hirano et al. 2020) or on stellar variability (Bedell et al. 2019, Gilbertson et al. 2020, Jones et al. 2022). Ongoing research is developing computationally tractable approaches for including both simultaneously (e.g., StellarSpectraObservationFitting.jl).
As methods become more sophisticated, the estimation of the velocity and fitting of spectral shape tend to be made simultaneously on the time series of spectra. In principle, this mitigates the chance that the Doppler shift is contaminated by shape changes, and uses temporal structure to further constrain activity.

Estimating nuisance RV signals
To put Eq. 9. into practice, one must specify a statistical model for p(I, RVcontam | θ, η). First, one chooses the method of dimension reduction to obtain I, and then a form for the likelihood which describes I and RVcontam.
Dimension reduction has been applied to real and synthetic spectra. Davis et al. (2017) applied PCA to spectra generated with SOAP 2.0. Since real stellar spectra are more complicated, their analysis provides a lower limit on the number of PCA components needed to accurately reconstruct solar spectra (ranging from 1 to 4 depending on the spectral resolution and SNR). Analysis of solar data suggests that only 6-13 basis vectors are necessary to model solar variability at the resolution and SNR of HARPS-N observations, and at least four of those are clearly linked to effects due to the instrument or unique to sun-as-a-star observations (Collier Cameron et al. 2021). Together, these suggest that reducing spectra to an RV and two to six indicators is a fruitful direction for future research.
Dimension reduction and estimation of contaminating RV can be done simultaneously. Existing methods extract ancillary indicators either in each spectrum separately, from a time series of CCFs (or other summaries of the spectra), or from the time series of spectra themselves. In the following subsections we describe the associated methods.
6.2.1. Extraction spectrum by spectrum. Some methods extract activity indicators from each spectrum separately. For example, one can measure line shapes (or deviations from their time-average). Line-shape indicators were first extracted from the CCF, in particular its asymmetry and its width (Queloz et al. 2009). Other indicators involve fitting Gaussian distributions on each side of the CCF (Figueira et al. 2013). Holzer et al. (2021) fit a superposition of Gauss-Hermite functions to each line to build shape indicators.
Another approach is to measure properties of specific absorption lines of interest. The utility of different lines depends on the effective temperature of the host star. For Sun-like stars, log R′HK and the so-called S-index are popular magnetic activity indicators based on emission in the core of the Calcium II H&K lines (Noyes 1984, Wright et al. 2004). For cooler stars, the Hα line is a more useful indicator (Kürster et al. 2003). The unsigned magnetic field estimated from the spectrum can be a powerful indicator of RVcontam and might generalize better than any individual line. Santerne et al. (2015) and Lanza et al. (2018) find that activity indicators can have relatively weak correlations with the RV signals contaminating the data. One contributing factor is the expected time lag between RVs and other indicators, noted in the case of RV and photometry (Santos et al. 2003, Queloz et al. 2009) and of other indicators (Suárez Mascareño et al. 2017). This time lag can be partially mitigated in models where the derivative of the latent Gaussian process appears (Rajpaul et al. 2015) or with adaptive correlation (Simola et al. 2022).
6.2.2. Methods using stellar variability indicators. More generally, spectra can be reduced to a set of summary statistics that serve as stellar variability indicators and their time series can be analyzed jointly with the RVs. By utilizing temporal information, we may obtain less noisy indicators.
The most common reduction is to compute the CCF. However, good indicators should be insensitive to true Doppler shifts, so Collier Cameron et al. (2021) proposed Scalpels, which analyzes the autocorrelation of the CCF. The resulting time series of CCF autocorrelations is analyzed with a PCA, and the PCA scores are used as variability indicators. Since the autocorrelation function is insensitive to shifts, true Doppler shifts will not affect the resulting variability indicators. de Beurs et al. (2022) explored more flexible supervised learning approaches to predicting RVcontam from a training data set of CCFs, either simulated or solar. It is unclear whether the greater flexibility of neural networks will outweigh their added complexity and difficulty of interpretation relative to linear regression or Scalpels, particularly given the limited size of the datasets available for stars other than the Sun. Cretignier et al. (2022) transform the spectra into "shells" instead of CCFs, in an effort to reduce the amount of information lost when averaging lines of different depths. The PCA scores of the shells are used as spectral indicators, but are first orthonormalised with respect to the shell corresponding to a pure Doppler shift.
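The shift-invariance at the heart of this idea can be demonstrated in a few lines: autocorrelate each CCF, then take PCA scores of the stacked autocorrelations. This is a cartoon of the approach, not the published Scalpels pipeline; the function name and CCF shapes are invented.

```python
import numpy as np

def acf_indicators(ccfs, n_components=2):
    """Autocorrelate each CCF (insensitive to pure velocity shifts),
    mean-subtract across epochs, and return PCA scores via SVD."""
    acfs = np.array([np.correlate(c, c, mode='full') for c in ccfs])
    acfs = acfs - acfs.mean(axis=0)
    U, S, Vt = np.linalg.svd(acfs, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```

CCFs that differ only by a velocity shift yield essentially identical autocorrelations, hence near-zero scores, while genuine shape changes (here, varying line depth) produce large scores; it is the latter that trace stellar variability.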
More research is needed to develop effective means of comparing choices of summary statistics, variability indicators, and the corresponding likelihoods. Several methods were explored in the EXPRES Stellar Signals Project, where several teams analysed observations of four stars by EXPRES. One key finding was that while many methods could reduce the root mean square (RMS) of the RV observations, the estimates of RVcontam differ significantly across methods. Without knowing the true velocity, it was impossible to determine which methods are best. In §7.2, we present several ideas for a robust comparison of the different methods, leveraging the existing and upcoming RV observations of the Sun with different instruments.
Sun-as-a-star observations allow the testing and validation of methods for mitigating stellar variability. Three methods, Scalpels (Collier Cameron et al. 2021), linear regression on CCFs, and neural networks (de Beurs et al. 2022), have been shown to significantly reduce the level of RV variability in solar observations. Existing and upcoming comparisons of Sun-as-a-star observations from multiple instruments, including a new generation of more highly stabilized spectrographs, will help disentangle solar RV variations due to solar variability from instrument-specific signals, and refine the comparisons of the different analysis methods.
6.2.3. Methods using the time series of spectra. Rajpaul et al. (2020) propose an alternative approach: measuring a ∆RV from each pair of spectra and reconstructing the RV time series (modulo a constant), giving greater weight to pairs that are more similar. To reduce the computational cost, they split the spectra into many small chunks to be analyzed separately.
In principle, one could perform inference on the entire spectroscopic time series. Jones et al. (2022) introduced Doppler-constrained PCA on the time series of spectra to characterize stellar variability and provided a proof of concept on simulated solar observations. Bedell et al. (2019) applied a similar method to real observations, emphasizing modeling of telluric contamination and neglecting stellar variability. They regularized the variables using ℓ1 and ℓ2 norm constraints. More recent work has developed computationally efficient implementations that simultaneously model both stellar and telluric variability, as well as accounting for the instrumental profile and allowing for physically-informed regularization schemes.

Summary
RV is poised to play a key role in the study of exoplanets in the next decade as a primary way to measure their masses; to detect interesting, nontransiting planets; to have a more complete view of planetary system architectures; and to provide targets whose atmospheres will be further characterized by spectro-imaging or interferometry. Currently, characterizing an Earth twin is out of reach, and improving data analysis techniques will play a fundamental role.
We presented the different steps of RV data analysis, separating them into three problems (see Section 3): (a) reducing the information in the spectrum to an RV time series, (b) modeling nuisance signals and the prior information on planetary and nuisance parameters, and (c) deciding how many planets are present and what their orbital elements are. Each of these steps requires numerical methods, whose convergence should be carefully checked. Since there are multiple reasonable choices for (a), (b), and (c), researchers should perform multiple analyses with different assumptions to determine whether key conclusions are sensitive to these choices, especially of the likelihood and prior.

Future work
The exoplanet community is now well equipped to analyze RV observations when planetary signals dominate stellar and instrumental variability. When planetary and corrupting signals are of similar amplitude, the fact that different methods yield different results shows that more work remains to be done. We believe that further research in steps (a) and (b) is crucial.
The ability of RVs to detect and measure the mass of potential Earth twins is critically linked to how precisely the contaminating RV signals, i.e., the RV signals not due to the motion of the center of mass of the star (RVcontam in Equations 7-8), can be predicted, and how accurately the uncertainty on these predictions can be quantified. Correcting these signals can be done either in a statistical framework, by building a likelihood function, or with supervised learning techniques trained to predict the RV contamination signal from spectral shapes. The new models may be data-driven or physics-driven, especially for RV signals originating from the star, for which there is a substantial modeling effort (see Appendix B.2) yet to be translated into data analysis techniques. Further research in step (a) is also required to ensure that all the information contained in the spectra is used and that the RVs and indicators derived are reliable summary statistics.
We believe that one of the most critical questions is how to validate the choices made at each step as effective tools for detecting and characterizing low-mass planets in the presence of stellar variability. We propose three ways forward, the first two concerning the validation of reduction and modelling (steps a and b), and the third the whole process (a-c).
The first approach is Bayesian model comparison, to evaluate the relative merits of different stellar activity and instrument systematics models. Ideally, this would be applied to datasets with no ambiguity about the presence or absence of planetary signals. For example, since we know the true solar velocity, we can evaluate the RVcontam signal estimated by each method applied to Sun-as-a-star observations. A similar approach may be possible for some other stars hosting a massive planet on an eccentric orbit, since one can exclude the possibility of additional planets for a wide range of orbital periods based on orbital stability considerations (Stalport et al. 2022). Large and computationally demanding computer models could provide data for training and testing models, though they must be validated beforehand. The observations of the Sun by different high-precision spectrographs can be leveraged to disentangle instrument-specific noise from stellar variability.
Second, for stars where one acquires a sufficiently large number of observations, one could evaluate the accuracy and precision of a model mitigating stellar variability via its ability to predict RVcontam (see Eq. 7.) and stellar variability indicators at times not used in the training of the model. Given multiple processes operating on a variety of timescales, one must carefully design the training, testing and validation procedure. For example, if one obtained multiple spectra per night and stellar variability were operating on timescales of weeks, then one could trivially predict the RVcontam from another observation on the same night, without learning how to recognize stellar variability in line shapes or depth ratios. Additionally, one must be careful that cross validation is not undermined by researchers effectively trying many strategies and reporting results from those that appear to work best.
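The leakage concern raised above can be addressed mechanically with time-blocked splits and a guard gap, so that no training point lies within the correlation timescale of the test block. A sketch with an invented function name; real designs would also stratify over activity cycles and observing seasons.

```python
import numpy as np

def time_blocked_folds(times, n_folds=5, gap=5.0):
    """Assign contiguous time blocks to folds and drop training points
    within `gap` (days) of the test block, so that short-timescale
    stellar variability cannot leak across the train/test split."""
    order = np.argsort(times)
    blocks = np.array_split(order, n_folds)
    for test_idx in blocks:
        tmin, tmax = times[test_idx].min(), times[test_idx].max()
        train_mask = (times < tmin - gap) | (times > tmax + gap)
        yield np.where(train_mask)[0], test_idx
```

With `gap` set to a few correlation times of the activity model, same-night pairs can no longer be used to predict each other, and the test measures genuine generalization.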
The two approaches above pertain to validating models for contaminating signals. The most convincing method for validating the whole process may be planet injection-recovery tests. One group of researchers would inject planets into real or simulated datasets. Another group would blindly analyze large ensembles of the simulated datasets. This would allow each proposed data reduction method and likelihood to be evaluated differentially, i.e., by comparing the inferred velocities from multiple simulated data sets generated from the same true data set. Given the inevitably arbitrary nature of labeling some signals as "confident" exoplanet detections, it is not sufficient for teams to label which signals they believe are due to an exoplanet. Instead, we recommend that they provide a list of all putative signals along with a quantitative measure of each signal's statistical significance (FAP, Bayes factor, FIP; see §4). Then, the number of false detections can be computed as a function of the number or properties of missed planets. To generate the data, one approach would be to remove telluric and instrumental effects, introduce a large number of artificial Doppler shifts, reinject the telluric and instrumental effects, and generate many new synthetic data sets (perhaps adding additional noise along the way). One must be careful in implementing this approach, as an error in the injection process could become a feature learned by data-driven methods that would not be available in realistic data sets.
In this review, we adopted the viewpoint that RV data analysis should be treated as a whole, from the most basic data products to the final decisions. It will be crucial to build accurate models of instrument systematics and stellar noise, to leverage as much as possible of the information contained in the spectra, and to build reliable metrics to validate the different methods. Progress will most likely stem from combining a deep knowledge of the instruments and of the astronomical context with formal approaches, which will provide adequate tools to represent and analyze the data.

A. DEFINITIONS
For convenience we reproduce here the formalism used in the main text. We denote by RV(t) the radial velocity (RV) of a given star due to the motion of its center of mass at time t. We denote by RVcontam(t) the RV signal caused by stellar variability and instrument systematics, and by ϵ(t) the photon noise. From an estimated Doppler shift of the spectrum, the radial velocity measured at time t, \widehat{RV}(t), is

\widehat{RV}(t) = RVplanets(t) + RVg(t) + RVcontam(t) + ϵ(t), 17.
where RVplanets(t) is the RV signal caused by the planets and RVg(t) is the RV signal caused by other gravitational effects, such as the proper motion of the star in the Galaxy or the influence of another star. Let us suppose that a star is observed by a spectrograph at N different times ti, i = 1, .., N. The time series of measured radial velocities is denoted by RV = (\widehat{RV}(ti))i=1,..,N. We denote by D(ti) the lowest-level data product at time ti: the fluxes measured on the spectrograph detector and various housekeeping data (temperature, pressure, etc.). In principle, we could base our inference on the time series D = (D(ti))i=1,..,N. Loosely speaking, we want to use non-Doppler variations of D, in particular variations of the shape of spectral lines and color-dependent effects, to estimate RVcontam(t) as closely as possible. We can also leverage the fact that, unless there are strong dynamical interactions between planets, the planetary signals are purely periodic, while systematics and stellar activity are not.
We denote by θ the parameters of the planets orbiting the star and by η all other parameters describing the signal (offsets, noise levels, parameters of the noise kernel). We have θ = (θ1, ..., θn), where n is the number of planets (which is also a variable) and θj the orbital elements of the planet indexed by j (see Eq. 23.). D is reduced to an estimate of the RV time series RV and ancillary indicators I = (Ij(ti))j=1..p,i=1..N: p time series of length N summarizing the variations of the shape of the spectrum. The goal is not to lose too much information in the reduction process. We thus want p(θ, η | D) ≈ p(θ, η | I, RV), 19.
and thanks to the Bayes formula, we have

p(θ, η | I, RV) = p(I, RV | θ, η) p(θ, η) / p(I, RV). 20.
Equality 20. relies on the fact that RV(t) depends deterministically on θ, η. Below, we denote y = (RV, I), potentially with I = 0. We will refer to p(θ, η) as the prior distribution and to p(y | θ, η) as the likelihood. The notation p(θ, η | y, n) refers to the posterior restricted to models with exactly n planets. The likelihood is commonly assumed to be Gaussian,

p(y | θ, η) = (2π)^{-N/2} |V(η)|^{-1/2} exp(-(1/2)(y - g)ᵀ V(η)^{-1} (y - g)), 21.

where |V(η)| is the determinant of the covariance matrix V(η). If y = RV (and we assume non-interacting Keplerian orbits), then the vector g = (g(ti))i=1..N is such that

g(ti) = Σ_{j=1}^{n} f(ti; θj) + h(η)(ti), 22.

for some user-defined function h(η), for instance a constant intercept c0 (also called offset) modelling the uncertainty on the reference zero velocity of the instruments. The function f is defined as f(t; K, P, e, ω, M0) = K [cos (ω + ν(t; e, P, M0)) + e cos ω] 23.
where ν(t; e, P, M0) is the true anomaly, obtained from Kepler's equation: denoting the mean anomaly by M(t) = 2πt/P + M0, 24. the eccentric anomaly E(t) solves E − e sin E = M(t), 25. and tan(ν/2) = √((1 + e)/(1 − e)) tan(E/2). 26.
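The Keplerian model above can be evaluated numerically by solving Kepler's equation with a Newton iteration, then applying Eq. 23. A minimal sketch (the function names and the Newton scheme are illustrative choices):

```python
import numpy as np

def true_anomaly(t, P, e, M0, tol=1e-12, max_iter=100):
    """Solve Kepler's equation E - e sin E = M by Newton iteration
    (starting from E = M), then convert the eccentric anomaly E to the
    true anomaly nu."""
    M = 2.0 * np.pi * np.asarray(t, dtype=float) / P + M0   # mean anomaly
    E = M.copy()
    for _ in range(max_iter):
        dE = (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
        E -= dE
        if np.max(np.abs(dE)) < tol:
            break
    return 2.0 * np.arctan2(np.sqrt(1.0 + e) * np.sin(E / 2.0),
                            np.sqrt(1.0 - e) * np.cos(E / 2.0))

def keplerian_rv(t, K, P, e, omega, M0):
    """Eq. 23.: f(t; K, P, e, omega, M0) = K [cos(omega + nu) + e cos omega]."""
    nu = true_anomaly(t, P, e, M0)
    return K * (np.cos(omega + nu) + e * np.cos(omega))
```

For e = 0 the model reduces to a pure sinusoid K cos(ω + 2πt/P + M0), a useful sanity check.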
We treat the case where y is a concatenation of RV and ancillary indicator time series in §E.3 of this supplemental appendix.
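For reference, the Gaussian likelihood of Eq. 21. is usually evaluated in log form via a Cholesky factorization of the covariance matrix, which avoids explicitly inverting V. A standard sketch (the function name is ours):

```python
import numpy as np

def gaussian_loglike(y, g, V):
    """Gaussian log-likelihood of Eq. 21.:
    log p(y | theta, eta) = -(1/2)[ r^T V^-1 r + log|V| + N log(2 pi) ],
    with residuals r = y - g, computed via a Cholesky factorization of V
    (numerically stabler than forming V^-1)."""
    r = np.asarray(y, dtype=float) - np.asarray(g, dtype=float)
    L = np.linalg.cholesky(V)
    alpha = np.linalg.solve(L, r)      # then alpha^T alpha = r^T V^-1 r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (alpha @ alpha + logdet + r.size * np.log(2.0 * np.pi))
```

The same routine applies whether V is a diagonal white-noise matrix or a dense correlated-noise kernel evaluated at the observation times.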

B. UNDERSTANDING THE SIGNAL
In §B.1, we briefly present how the RV and activity indicators are extracted from the raw data: in the formalism of §A, how the data is treated to go from the raw measurements D to one-dimensional spectra, and from there to the activity indicators and RVs, y = (I, RV). In §B.2 we present the efforts undertaken to understand stellar signals. For exoplanet detection, this can be seen as building the knowledge needed to express a realistic likelihood p(y | θ, η). Practical likelihoods are given in §E.2.

B.1. Early stages of signal processing: calibration, wavelength solution, reduction
To measure the radial velocity of a given star, its light is collected by a telescope and, in modern high-precision instruments, an optical fiber in the focal plane of the telescope feeds the spectrograph. (In some earlier instruments, a slit was used instead of fibers, but these are being phased out for high-precision radial velocity surveys.) The spectrograph diffracts the input light and records the resulting spectrum on a detector (e.g. CCD arrays for spectrographs operating in the visible). Depending on their position, pixels receive light at a certain wavelength. More precisely, to measure radial velocities, a certain design of spectrograph, called an échelle spectrograph, is used. Such a design has the particularity of dispersing the light both vertically and horizontally, which allows one to obtain very high resolution spectra with a wide spectral range on a small detector, thereby reducing the costs of the detector array and of the cryostats that cool the detector. The raw data product of a single measurement is the flux measured on a two-dimensional CCD array. The extraction of the radial velocity and activity indicators follows many steps. The objective of the first steps is to compute a one-dimensional spectrum: the flux of the star as a function of wavelength. From one exposure to the next, the response of the instrument varies because of variations in its environmental conditions (temperature, pressure), even for the most recent instruments, placed in vacuum and temperature-controlled. Accounting for these changes relies on so-called calibration steps, where the response of the instrument is evaluated with reference lamps whose properties are known.
There are two types of lamps: one emitting white light (a tungsten lamp, a laser-driven light source (LDLS), or a continuum laser), and one with reference emission lines at known wavelengths, such as ThAr or uranium lamps, a Fabry-Perot interferometer, or a laser frequency comb. With a Fabry-Perot interferometer, we do not know the position of the lines, but the line spacing is known and their drift on a timescale of a few hours is very low, so such a lamp can be used to measure the drift over the course of one night. Presenting in detail the different steps of the calibration process is well beyond the scope of the present work, and we refer the reader to Hatzes (2016) for an introduction (see also Baranne et al. 1996). We here briefly describe important steps, which might be performed differently from one instrument to the next: (1) Removing the "bad pixels" with aberrant values; (2) Correcting for the "flat field": the detector is illuminated with a uniform white light to locate the pixels which receive flux, and to correct for the different responses of the pixels; (3) Extracting the orders: not all the pixels of the detector receive photons from the star. This step includes finding the positions of the pixels receiving these photons and modeling any cross contamination due to light from another order, fiber, or scattering in the instrument; (4) Computing the wavelength solution: the wavelength of the photons falling on each pixel depends on its position. Prior to the observations of the target star, the detector is illuminated with the calibration lamp with reference lines. The pixels corresponding to the emission lines are attributed a wavelength, and the pixels in between are assigned wavelengths via interpolation, with a polynomial or more sophisticated techniques (e.g. Dumusque 2018, Cersullo et al. 
2019); (5) Measuring the drift of the instrument: between the wavelength-solution step and the observation of a given star, the wavelength solution of the instrument changes. To evaluate this, pixels close to those receiving the stellar flux are illuminated through a second fiber with a reference lamp, both when the wavelength solution is computed prior to the night's observations and during the exposure on the star of interest; the differences are used to evaluate the drift of the instrument; and (6) Barycentric correction: the Earth rotates and moves in the Solar system, resulting in a Doppler signature with an amplitude of ≃ 30 km/s. This contribution is estimated from an ephemeris of the Solar system to cm/s accuracy and accounted for by transforming the wavelength solution to refer to wavelengths in the Solar system barycenter frame (neglecting the gravitational redshift from the Sun) (Kanodia & Wright 2018, Tronsgaard et al. 2019). See Halverson et al. (2016) for a review of the error budget corresponding to the steps outlined above.
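Step (4) is, at its simplest, a regression problem: fit a pixel-to-wavelength mapping to the reference lines, then evaluate it for all pixels. An illustrative sketch (the polynomial degree, function name, and numbers are arbitrary; real pipelines use more sophisticated models):

```python
import numpy as np

def wavelength_solution(line_pixels, line_wavelengths, degree=3):
    """Fit a polynomial pixel -> wavelength mapping to the reference
    emission lines of the calibration lamp; pixels between lines get
    their wavelength by evaluating the polynomial (interpolation)."""
    coeffs = np.polyfit(line_pixels, line_wavelengths, deg=degree)
    return np.poly1d(coeffs)
```

Given a list of identified lamp lines (pixel position, laboratory wavelength), the returned function assigns a wavelength to every pixel of the order.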
Once the spectrum is extracted, its spectral energy distribution is affected by the black-body profile of the star and by the observing conditions (airmass, clouds, optical filter ageing, reduced fiber transmission, and lower CCD response efficiency in the blue portion of the spectrum). A step of so-called "color correction" is performed to scale the observed continuum toward a reference template (see e.g. Malavolta et al. 2017). A similar color correction can be obtained by normalizing the stellar continuum (see e.g. Xu et al. 2019, Cretignier et al. 2020b) - a step which is unnecessary when extracting the velocity line by line (Dumusque 2018, Artigau et al. 2022) - before re-injecting a reference color. From the one-dimensional spectrum, the radial velocity is extracted with different methods, described in section 6 of the main text. The one-dimensional spectrum is still affected by many instrument systematics, by the absorption of the Earth's atmosphere at certain wavelengths, and by sunlight scattered by the Moon (e.g. Bonomo et al. 2010, Roy et al. 2020), which have an important effect on the data. There is much to gain from better understanding these effects. Once an effect is suspected to occur, one can compute the velocities before and after the procedure aimed at correcting the said effect; the difference of velocities gives the contribution of the effect. This methodology has been used in Cunha et al. (2014), Artigau et al. (2014), Kjaersgaard et al. (2021), Allart et al. (2022), and Wang et al. (2022) for the correction of the Earth's atmospheric absorption, and in Cretignier et al. (2021) for various other effects. In the latter case, the systematic effects can be seen on a color-coded plot of spectra stacked onto one another.
In §2 of the main text, we separated the analysis of radial velocities into three steps: (a) Reduction: from the raw spectrum, we obtain an estimate of the radial velocity and of other stellar variability indicators. These indicators can be the time series of one-dimensional spectra at the available wavelengths themselves (Bedell et al. 2019) or, in the majority of cases, a few summary statistics; (b) Modeling: we build a likelihood model of the indicators, possibly attributing them prior distributions; and (c) Parameter estimation & decisions: based on this information, we decide which planets have been detected and characterize the masses and orbital elements of each detected planet. Typically, it is also important to retrieve some information about the host star (often from separate astronomical techniques), since the stellar properties affect the inferred planet mass and climate. In the present section, we have seen that step (a) can in fact be subdivided into obtaining the one-dimensional spectrum, applying further corrections, and finally extracting indicators. Similarly, step (c) combines the detection of planets and the estimation of their orbital elements into one logical step; in practice, these are often implemented separately for computational convenience.
Overall, the goal of step (a) is two-fold. First, we aim to obtain stellar variability indicators and RV estimates that have the intended meaning. In particular, the RV that we measure should result from a pure Doppler shift of the spectrum. Second, the information contained in the spectrum should be condensed with as little loss as possible in the RV time series and indicators.

B.2. Stellar signal
Understanding the impact of the variability of stellar surfaces on the signal is important to reach a precision below 1 m/s. Several types of effects on the stellar surface affect the measured spectra and imprint onto the estimated radial velocities.
The physical processes taking place in stellar surfaces, how they vary depending on the type of star, and their impact on the data have been investigated through simulations and through observations of the Sun and of other stars. For a review of stellar effects on RVs, we refer the reader to Cegla (2019) and Meunier (2021).
Stars can be locally brighter and darker because of local excesses of magnetic field (see Fig. 8). As the observed star rotates, such a feature breaks the balance between the approaching and receding stellar limbs, respectively blue-shifted and red-shifted. Furthermore, these features evolve in time (see Fig. 8b).
Figure 8: (a) Full surface of the Sun; sunspots are the regions that appear darker than the continuum. The solar surface appears darker at the limb, a fact known as the limb-darkening effect. (b) The same group of sunspots observed at two different dates (top: Jan 6, 2012; bottom: Jan 8, 2012). The granular aspect of the continuum comes from the convective motion of the gas at the stellar surface. (c) SDO observation of the Sun at wavelength 1700 Å. The bright regions are called faculae; notice that facular areas surround the spots (darker regions). Both faculae and spots inhibit the upward convective motion of the gas. Faculae cover more area than spots, but have a smaller temperature contrast with the continuum. The latitude, number, and lifetime of the spots vary along stellar magnetic cycles, on the timescale of a few years (11 years for the Sun). (d) Inouye Solar Wave Front Correction (WFC) image, captured Jan 28, 2020, at 530 nanometers. The granulated structures around the spot are due to convection: hot plasma moves upwards at the center of granules, cools down, and sinks between granules (darker intergranular regions). The contribution of the brighter intragranular regions exceeds that of the intergranular ones, creating a so-called convective blueshift in observed spectra.
Convection at the surface of the star creates a so-called granulation pattern (see Fig. 8d, and Cegla (2019) for a review). The effect of the changing patterns of convection on the radial velocity and indicators changes with stellar spectral type (Meunier et al. 2017) and has been simulated in several works: Cegla et al. (2013), Meunier et al. (2015), Cegla et al. (2018), Dravins et al. (2021). In simulations, the effect of this process on the RV appears to be correlated with its effect on activity indicators (Meunier & Lagrange 2013), a fact that has yet to be fully leveraged for the correction of the RV granulation effect. Stellar variability models need to be calibrated. This was initially done through observations with the HARPS spectrograph of the solar light scattered by the asteroid 4 Vesta, with simultaneous high-resolution observations of the solar surface by the Solar Dynamics Observatory (SDO). From the surface observations, and based on the physical understanding of the RV effect, the expected RV is computed and compared to the HARPS RV measurements. The lifetime and properties of granules on the Sun can also be derived from observations of the Sun (e.g. Hirzberger et al. 1999). An interesting observation is that, even though the lifetime of granules is ∼ 15 min, averaging the solar surface for 1h does not completely average out their effect. Furthermore, in recent years, several telescopes were built to observe the Sun and feed the high-precision spectrographs used to detect exoplanets at night. Such telescopes exist for HARPS-N (Dumusque et al. 2015, Phillips et al. 2016), the HARPS and NIRPS spectrographs, NEID (Lin et al. 2022), EXPRES (Llama et al. 2022), and the Keck Planet Finder (KPF). Several years of high-resolution solar spectra are publicly available from HARPS-N and NEID. And astronomers are beginning to draw conclusions about the Sun's variability based on Sun-as-a-star observations (e.g., Collier Cameron et al. 2019, Miklos et al. 
2020, Al Moulla et al. 2022b). Another well-explored area consists of studying spectroscopic observations of other stars, in particular the radial velocities and activity indicators. There has been substantial work on the observation of M dwarfs (e.g. Bonfils et al. 2007, Forveille et al. 2009, Suárez Mascareño et al. 2018, Zechmeister et al. 2018), in particular on the periods of the exhibited variability, using the CARMENES sample (Schöfer et al. 2019, Lafarga et al. 2020). It was shown in particular that some activity indicators exhibit a closed-loop pattern: when the RVs acquired at times ti, i = 1, ..., N are plotted not against time but as a function of an activity indicator at times ti, the figure exhibits a loop. The behaviour of the indicators in Sun-like stars has been studied in Gomes da Silva et al. (2011), Gomes da Silva et al. (2012), Figueira et al. (2013), Santerne et al. (2015), Lanza et al. (2018), and Suárez Mascareño et al. (2020). The effect of granulation on RVs and photometry has been studied in Sulis et al. (2022) with simultaneous CHEOPS and ESPRESSO observations of two stars, of spectral types G and F.
More recently, attention has been given to the color-dependent effect of stellar activity: depending on their wavelength, spectral lines are affected differently (Thompson et al. 2017, Wise et al. 2018, Cretignier et al. 2020a, Ning et al. 2019, Al Moulla et al. 2022a, Lafarga et al. 2023). This aspect will be crucial to leverage all the information present in the spectrum to estimate the contaminating signal in RVs (see Eq. 18.). This knowledge might be coupled to data-driven approaches leveraging all the spectral information (Bedell et al. 2019, Gully-Santiago & Morley 2022). In addition to magnetic activity and granulation, acoustic oscillations of the stars also affect the data (Dumusque et al. 2011b, Chaplin et al. 2019). The most important oscillations for solar-type stars (known as p modes) have characteristic periods of a few to several minutes. Stellar meridional winds (Beckers 2007, Makarov 2010), as well as gravitational redshift (Cegla et al. 2012), also affect the data at the level of a few cm/s.
To conclude this section, let us note that the problem laid out in §A has some similarities with the setting of a technique called Doppler imaging, which relies on the following principle. As we have seen above, the presence of magnetically active regions on the surface of the star changes the shape of the spectral lines. Conversely, one can infer the position and size of magnetic regions from the change of shape of the spectrum (for stars that rotate sufficiently fast). The inference is usually done on a weighted average of the spectral lines (Khokhlova 1976, Goncharskii et al. 1977, 1982, Vogt & Penrod 1983, Vogt et al. 1987). From spectropolarimetric observations, one can also retrieve some properties of the magnetic field (Donati et al. 1997). In §A, we do not aim at retrieving the distribution of stellar spots from a change of shape of the spectrum, but at estimating the contaminating RV signal. Another approach consists of estimating the distribution of stellar spots with a Doppler imaging approach, and predicting the corresponding RV signal (Lanza et al. 2011, Petit et al. 2015, Barnes et al. 2017, Yu et al. 2019, Klein et al. 2022); see Barnes et al. (2015, 2017), Hébrard et al. (2016), and Baroch et al. (2020) for the particular case of M dwarfs. While this can work well for stars with one or two large active regions, the problem becomes underconstrained once one allows for several active regions at once and/or smaller active regions that evolve on shorter timescales. Luger et al. (2021a) explicitly builds a likelihood for the Doppler imaging problem and provides a framework for marginalizing over surface maps. While this can work well for stars with large active regions, applying it to active regions more typical of Sun-like stars has yet to be proven practical, as it would require a very high-order spherical harmonic expansion of the solar surface.

C. PERIODOGRAMS
Let us suppose we have a time series with N entries, y = (y(ti))i=1,..,N. We want to determine whether y exhibits periodic variations, here potentially due to planets. A common way to do this is to use a family of methods called periodograms. Such methods consist in computing the difference of log-likelihoods of a null-hypothesis model H0 and an alternative model Kω, including H0 and one or several periodic signals. The simplest form of periodogram is the Lomb-Scargle periodogram (Lomb 1976, Scargle 1982). With the notations of Eq. 21., the two models are Gaussian multivariate distributions such that H0 : g = 0, 27. and Kω : g(A, B, ω) = A cos ωt + B sin ωt, 28.
where A and B are chosen to maximize the log-likelihood of Kω, which is here equivalent to finding the A and B minimizing the sum of squares Σ_{i=1}^{N} (y(ti) − A cos ωti − B sin ωti)² / σi², where σi is the error on measurement i. The null hypothesis expresses how the data is expected to behave in the absence of periodic components. For the Lomb-Scargle periodogram, the null hypothesis is that the data is a realization of a Gaussian, uncorrelated noise.
In general, denoting by θK and θH the parameters of models Kω and H0, the periodogram of the time series y evaluated at frequency ω is

P(ω) = max_{θK} log p(y | θK, Kω) − max_{θH} log p(y | θH, H0). 29.

Other definitions are possible; in particular, some authors normalize the periodogram by max_{θH} log p(y | θH, H0) (Baluev 2008). The periodogram is computed on a grid of frequencies Ω = (ωi). We compute the maximum of the periodogram of the data to be analysed y, which we denote by Pd, and define a false alarm probability (FAP): the probability that the maximum of the periodogram exceeds Pd knowing that H0 is true. A small FAP means the hypothesis H0 is unlikely. If one generates M signals following the hypothesis H0 and computes their periodograms, the FAP is simply the number of times the maximum of the periodogram exceeds Pd, divided by M. This brute-force approach gives a precision of 100/M % on the FAP, so that evaluating whether the FAP is below 0.01 % requires at least 10,000 simulations. However, thanks to the theory of extreme values of stochastic processes, it is possible to obtain analytical approximations of the FAP associated with certain definitions of the periodogram, which greatly facilitates the use of periodograms (Baluev 2008, 2009, 2015b, Delisle et al. 2020a). In the following, we will make explicit whether such a formula exists. In between the brute-force and analytical approaches, the method of Süveges (2014) consists in generating a few datasets following the null hypothesis and fitting a generalised extreme value (GEV) distribution to the cumulative distribution function of the periodogram maxima. The periodogram has been extended to definitions of the likelihoods of H0 and Kω more complex than those of the Lomb-Scargle periodogram. The models are often assumed to be Gaussian, each with a likelihood of the form of Eq. 21. In Ferraz-Mello (1981), Cumming et al. (1999), Reegen (2007), and Zechmeister & Kürster (2009), the Lomb-Scargle periodogram is generalized by fitting for c0, as defined in Eq. 22., at each trial period.
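The brute-force FAP computation described above can be sketched as follows for the white-noise Lomb-Scargle case. This is a didactic implementation, not a production tool; the function names and the frequency grid are illustrative:

```python
import numpy as np

def ls_periodogram(t, y, sigma, omegas):
    """Periodogram as a difference of maximized log-likelihoods: for each
    trial frequency, half the chi^2 decrease obtained when fitting
    A cos(omega t) + B sin(omega t) by weighted least squares, relative to
    the null model H0: g = 0 (white Gaussian noise with errors sigma)."""
    w = 1.0 / sigma**2
    chi2_null = np.sum(w * y**2)
    power = np.empty(len(omegas))
    for k, om in enumerate(omegas):
        X = np.column_stack([np.cos(om * t), np.sin(om * t)])
        Xw = X * w[:, None]
        ab = np.linalg.solve(X.T @ Xw, Xw.T @ y)   # best-fit (A, B)
        power[k] = 0.5 * (chi2_null - np.sum(w * (y - X @ ab)**2))
    return power

def brute_force_fap(t, sigma, omegas, p_obs, M=100, rng=None):
    """Fraction of M white-noise simulations under H0 whose periodogram
    maximum exceeds the observed maximum p_obs (precision ~ 100/M %)."""
    rng = np.random.default_rng(rng)
    n_exceed = 0
    for _ in range(M):
        y_sim = rng.normal(0.0, sigma)
        if ls_periodogram(t, y_sim, sigma, omegas).max() > p_obs:
            n_exceed += 1
    return n_exceed / M
```

The simulation loop is exactly where the analytical FAP formulas cited above save computation: they replace the M periodogram evaluations with a closed-form estimate.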
Ford (2008) extends the definition of H0 to c0 1 + c1 t, where t = (ti)i=1..N and 1 is a vector of ones. These are particular instances of the case considered in Baluev (2008), where H0 is a linear model Mx, with M an N × p matrix and x a vector of size p, H0 : g(x) = Mx, 31.
where x, A and B are fitted at each trial frequency to maximize the likelihood of model Kω. This framework can be applied to fitting separate offsets c0 for each instrument: if there are p instruments, one simply defines Mij = 1 if the measurement at time ti is taken by instrument j, and Mij = 0 otherwise. This framework also allows one to approximately compare n + 1 vs. n planets instead of 1 vs. 0, by including in the columns of M the cos ωit and sin ωit of the frequencies ωi, i = 1, .., n of the planets found so far. Baluev (2008) provides an accurate analytical false alarm probability. So far, V has been assumed fixed. The noise model can also be generalized by allowing for an unknown σJ fitted at each frequency (Baluev 2009). Here, the noise is still assumed to be temporally uncorrelated, which is unrealistic in many cases due to the presence of stellar activity and instrumental systematics. The analytical FAP formula is generalised in Delisle et al. (2020a) to an arbitrary, fixed covariance matrix V, thus extending the formula of Sulis et al. (2016) to the case of uneven sampling and non-stationary Gaussian noises. It is also possible to fit the parameters of a non-white noise model at each trial frequency, but this is costly and there is no analytical formula for the FAP (Delisle et al. 2018).
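The per-instrument offset columns can be made concrete with a small sketch (the function name is ours); the assembled matrix is then fitted jointly with the sinusoid by linear least squares:

```python
import numpy as np

def offsets_plus_sinusoid(t, instrument_ids, omega):
    """Columns of the linear model of Eq. 31.: one indicator column per
    instrument (M_ij = 1 if observation i was taken by instrument j,
    0 otherwise), plus cos/sin columns at the trial frequency omega."""
    instruments = np.unique(instrument_ids)
    M = (np.asarray(instrument_ids)[:, None] == instruments[None, :]).astype(float)
    return np.column_stack([M, np.cos(omega * t), np.sin(omega * t)])
```

A single call to np.linalg.lstsq then recovers the instrument offsets and the sinusoid amplitudes simultaneously at each trial frequency.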
Another line of work generalizes the form of the periodic function. Instead of fitting a sinusoid, the Kω model can be defined as a Keplerian model (Cumming 2004, Zechmeister & Kürster 2009). This is particularly useful to detect signals of planets with high orbital eccentricity. Baluev (2015b) provides a rigorous analytical FAP in that case, as a special case of analytical FAPs for non-sinusoidal signals (Baluev 2013a).
Furthermore, instead of searching for one periodicity, it is possible to search for multiple planets simultaneously (Baluev 2013), and to jointly model the RVs and other stellar ancillary indicators (Jones et al. 2022).
Instead of maximizing the likelihood of Kω over all its parameters, one can marginalize over some of them. Typically, the parameters of Kω split into θ1, the frequency, and θ2, parameters on which the model depends linearly; in that case, marginalization can be done with analytical expressions (Cumming 2004, Mortier et al. 2015, Feng et al. 2017). Marginalizing can also be done via the Laplace approximation (Ford 2008, Nelson et al. 2020) or by using numerical quadrature. The latter method has been used to marginalize over a noise term σJ at each trial frequency (Nelson et al. 2020, Appendix A3). Conceptual and practical differences between using the maximization or the marginalization of the likelihood are discussed in Baluev (2022).
The type of periodogram to use depends on the trade-off between precision and speed needed. The availability of an accurate analytical FAP is the major driver of speed, since there is then no need to compute the periodogram several times. The analytical estimates listed above are reliable for signals with a FAP of 0.1 or lower. All periodograms involving a linear fit with an analytical FAP (Baluev 2008, Delisle et al. 2020a) are very fast (a few seconds on a typical dataset, on a laptop). Keplerian or multi-frequency periodograms can take longer (several minutes at least).

D. BAYESIAN MODEL COMPARISON: COMPUTATIONAL CHALLENGES
Bayesian model comparison involves computing the Bayesian evidence, or marginal likelihood. The evidence of an n-planet model is

p(y | n) = ∫ p(y | θ, η, n) p(θ, η | n) dθ dη, 34.

and the ratio p(y | n+1)/p(y | n), referred to as a Bayes factor, is a very common statistical significance metric for exoplanet detection. If n planets have been confidently detected and the Bayes factor is above a certain threshold, the detection of an (n+1)th planet is claimed. The computation of the evidence and a reliable estimation of the posterior distribution p(θ, η | y, n) are both crucial for a useful Bayesian detection criterion that results in the posterior probability for a planet to be within a confined region of the parameter space (Brewer & Donovan 2015, Hara et al. 2022c).
Evaluating an integral over a large parameter space is notoriously difficult. One of the difficulties of radial velocity data is that the model contains periodic signals. The frequency resolution is approximately 2π/Tobs, where Tobs is the total timespan of the observations. As a result, in a frequency range [ω1, ω2] of width Ω = ω2 − ω1, the number of local likelihood maxima is approximately Nlocmax = ΩTobs/2π. If there are k periodic signals with frequencies in the same range, we should expect ∼ Nlocmax^k local maxima when varying the frequencies. Furthermore, the geometry of Keplerian signals is such that there are many local maxima at high eccentricity (Baluev 2015b, Hara et al. 2019). To compute Eq. 34., two families of methods are used: Markov chain Monte Carlo (MCMC) and nested sampling. In both cases, special attention must be paid to the exploration of the complex parameter space. MCMC methods rely on the definition of a proposal distribution, which must lead to samples both jumping between distant local maxima and exploring the dominant modes. The numerical difficulty of nested sampling algorithms is to efficiently find a sequence of independent samples following the prior distribution restricted to regions of the parameter space with sufficient likelihood. Ford & Gregory (2007) compared several methods for estimating the Bayesian evidence, and found that several performed particularly poorly for our standard model. Methods that performed well for the 1-planet model were: the Laplace approximation, importance sampling, the "ratio estimator" (Nelson et al. 2016), and parallel-tempering MCMC (Gregory 2007b). Subsequently, several authors have applied nested sampling algorithms (Brewer 2014, Handley et al. 2015b, Faria et al. 2016, Buchner 2021), which couple the estimation of the posterior distribution with that of the Bayesian evidence (Eq. 34).
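To make the Laplace approximation mentioned above concrete, here is a one-dimensional sketch with a finite-difference Hessian. The toy posterior (a Gaussian likelihood with a Gaussian prior, for which the approximation is exact) and the function name are our illustrative choices:

```python
import numpy as np

def laplace_log_evidence(log_post_unnorm, theta_hat, h=1e-4):
    """One-dimensional Laplace approximation to the evidence (Eq. 34.):
    log Z ~ log[p(y | theta_hat) p(theta_hat)] + (1/2) log(2 pi / H),
    where H is minus the second derivative of the unnormalized log
    posterior at its mode theta_hat (central finite differences)."""
    f0 = log_post_unnorm(theta_hat)
    fp = log_post_unnorm(theta_hat + h)
    fm = log_post_unnorm(theta_hat - h)
    H = -(fp - 2.0 * f0 + fm) / h**2
    return f0 + 0.5 * np.log(2.0 * np.pi / H)
```

For the multimodal RV posteriors described above, such a local approximation must be applied mode by mode, which is precisely why it degrades for models with several planets.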
In order to better understand the strengths and limitations of approaches for computing Bayesian evidences, Nelson et al. (2020) designed a data challenge in coordination with the 3rd Extreme Precision Radial Velocity workshop (Penn State, August 14-17, 2017). Nine participants/teams were given two sets of priors, four likelihoods (corresponding to 0, 1, 2 and 3-planet models), and 6 simulated datasets. Teams compared their results for the first dataset during the workshop, refined their methods, and submitted updated results. Collectively, the teams reported estimates of the marginal likelihood and its uncertainty using fifteen computational methods.
As expected, some methods (e.g., the Bayes Information Criterion, Chib's estimator) gave highly discrepant results for Bayes factors. The Laplace approximation typically agreed with other high-quality methods to within a factor of ∼ 10^2 for 0 to 2-planet models and ∼ 10^4 for a 3-planet model. Among the remaining methods, there was general agreement on the Bayes factor to within a factor of ∼ 10 for 1-planet models, and ∼ 10^2 for 2 and 3-planet models. Methods that demonstrated broad agreement included: combining MCMC with importance sampling using either the Perrakis estimator (Perrakis et al. 2014) or the ratio estimator (Nelson et al. 2016), variational Bayes with importance sampling (Beaujean & Caldwell 2013), and multiple forms of nested sampling (Veitch & Vecchio 2010, Feroz et al. 2009, Feroz & Skilling 2013). It is unclear why diffusive nested sampling (using DNest4; Brewer 2016) sometimes gave results significantly discrepant from the other Bayesian methods.
The significant dispersion in numerical estimates of Bayes factors emphasizes the importance of adopting a threshold Bayes factor much larger than is typically recommended in the literature (often 150, in the context of much simpler models). At first, it may seem concerning that different estimates of Bayes factors can vary by a few orders of magnitude. In practice, Bayes factors for RV datasets are often so extreme that an uncertainty of "only" a few orders of magnitude is acceptable. Of course, publication biases can result in planet discoveries being preferentially reported when the Bayes factor is "only" a few to several orders of magnitude from unity.
Another important result of this data challenge was that the observed dispersion of Bayes factor estimates was often significantly greater than the uncertainty predicted by some methods. Therefore, when using iterative algorithms, it is important to compute and report uncertainties for Bayes factors based on multiple independent runs.
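To illustrate why run-to-run scatter should be reported, the following sketch (our own toy example, not taken from the data challenge; all variable names are ours) compares a Laplace approximation and a simple Monte Carlo evidence estimate on a conjugate Gaussian model where the marginal likelihood is known analytically. On this linear-Gaussian model the Laplace approximation is exact, while the Monte Carlo estimate fluctuates between independent runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: y_i ~ N(mu, sigma^2) with known sigma, prior mu ~ N(0, tau^2).
sigma, tau, n = 1.0, 10.0, 30
y = rng.normal(0.5, sigma, size=n)

# Exact log-evidence: marginally, y is N(0, sigma^2 I + tau^2 J).
C = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
logZ_exact = -0.5 * (y @ np.linalg.solve(C, y)
                     + np.linalg.slogdet(C)[1]
                     + n * np.log(2.0 * np.pi))

def loglike(mu):
    return -0.5 * np.sum((y - mu)**2) / sigma**2 - 0.5 * n * np.log(2.0 * np.pi * sigma**2)

def logprior(mu):
    return -0.5 * mu**2 / tau**2 - 0.5 * np.log(2.0 * np.pi * tau**2)

# Laplace approximation around the posterior mode (exact for this model).
post_var = 1.0 / (n / sigma**2 + 1.0 / tau**2)
mu_map = post_var * np.sum(y) / sigma**2
logZ_laplace = loglike(mu_map) + logprior(mu_map) + 0.5 * np.log(2.0 * np.pi * post_var)

# Simple Monte Carlo: average the likelihood over prior draws; repeating
# independent runs gives an empirical uncertainty, as recommended in the text.
def mc_logZ(n_draws=20000):
    mus = rng.normal(0.0, tau, size=n_draws)
    ll = np.array([loglike(m) for m in mus])
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))

runs = np.array([mc_logZ() for _ in range(5)])
```

The spread of `runs` is a direct, assumption-light uncertainty estimate for the Monte Carlo evidence, in the spirit of the multiple-run recommendation above.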

E.1. Priors
The choice of priors affects both orbital parameter estimates and model comparison to determine the number of planets, especially for low amplitude signals, where the likelihood is flatter (Hara et al. 2022c). Based on a combination of astrophysical and statistical considerations, Ford & Gregory (2007) recommend a uniform prior for the velocity offset, p(c0) ∼ U[−c0,max, c0,max], where U represents a uniform prior with the given bounds. For minimally informed priors of scale variables, Ford & Gregory (2007) introduce a prior distribution for a real variable z ∈ [0, zmax], defined for 0 < z0, zmax by

p_FG(z | z0, zmax) = [ (z0 + z) log(1 + zmax/z0) ]^{−1}. 35.

They recommend p(K | P) ∼ p_FG(K | K0, Kmax (Pmin/P)^{1/3}) and p(σJ) ∼ p_FG(σJ | σJ,0, Kmax). Physically motivated choices for Pmin and Kmax vary with the target star properties. For instance, the values of K and P corresponding to a planet that would be disintegrated by tidal forces, that is, orbiting within the Roche limit, are excluded. Typical values for a Sun-like star would be Pmin = 0.2 days and Kmax = 3.6 km/s. Non-zero values of K0 and σJ,0 are necessary to ensure proper priors. Ford & Gregory (2007) recommend K0 = σJ,0 = σRV,0 √(50/N_obs), where σRV,0 is the typical measurement precision and N_obs is the number of observations, so as to provide sensitivity to planets within the realm of detection without overly penalizing models for planets that are essentially impossible to detect with available data. Posterior distributions for e and ω vary considerably with the quality of the data. As geometric considerations dictate the prior for ω, the only priors that are likely to affect the estimation of orbital parameters are those of K and e.
a For Sun-like stars, astrophysical considerations justify hard limits of Pmin ≃ 0.2 d and Pmax ≃ 10^8 yr. However, theory predicts that the occurrence rate drops dramatically at much shorter orbital periods (≃ 10^3 yr). In practice, RV surveys are limited by the survey duration, and the choice of Pmax is only important when performing model comparison.
b Empirically, the planet occurrence rate decreases for periods less than 3 d. For an analytical reference prior that limits how much very-short period planets are overweighted relative to nature, one might adopt P ∼ p_FG(3 d, Pmax), truncated to assign no probability to periods less than Pmin.
c The intrinsic occurrence rate of small planets may continue rising to smaller amplitudes than are typically detectable for the foreseeable future. In practice, we recommend setting K0 = σ_RV,1obs √(50/N_obs), so as to reflect the limiting precision of an RV survey, where σ_RV,1obs is the measurement precision of a single observation and N_obs is the number of observations. Since there will not be strong evidence for a planet with K less than the detection limit of a survey, there is no benefit to assigning substantial prior mass to such planets.
d For Sun-like stars and Pmin = 0.2 d, Kmax ≃ 3.6 km/s. Larger RV signals would be due to perturbations by a stellar binary companion, rather than an exoplanet.
e Keplerian orbital models develop pathologically narrow likelihood spikes for e ≃ 1. Formally, one could avoid these by imposing a maximum eccentricity, e_max(P) = 1 − (r_p,min/AU)(P/yr)^{−2/3}, so as to ensure that the planet's pericenter distance exceeds the Roche limit, r_p,min. In practice, most MCMC and nested samplers do not find the pathological solutions for e ≃ 1, and thus practitioners set e_max = 1 but ignore the contributions from e ≃ 1 solutions. Kipping (2014) uses a beta prior fitted on the eccentricities of known giant planets, which disfavours high eccentricities.
f Geometric considerations provide a strong motivation for this choice.
g For surveys of Sun-like stars in the solar neighborhood, a dispersion of σ_disp ≃ 40 km/s would be well-motivated by knowledge of Milky Way structure. While astrophysical knowledge could motivate alternative choices for other populations, in practice this has minimal effect as long as the prior assigns non-zero mass to the true mean radial velocity offset. In practice, most practitioners assign a uniform prior.
h σ_c1 ≃ 10 m/s/yr is representative of the acceleration due to a wide binary companion. The specific choice of prior is unlikely to affect parameters for any well-characterized planet once the timespan of observations exceeds two orbital periods.
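As a concrete illustration, the Ford & Gregory prior density above is straightforward to implement and to sample by inversion of its analytic CDF. The sketch below is our own, with hyperparameter values for K taken from the recommendations above for a Sun-like star; it also checks the normalization numerically.

```python
import numpy as np

def p_fg(z, z0, zmax):
    """Modified-Jeffreys density of Ford & Gregory (2007) on [0, zmax]."""
    norm = np.log(1.0 + zmax / z0)
    return np.where((z >= 0) & (z <= zmax), 1.0 / ((z0 + z) * norm), 0.0)

def sample_fg(u, z0, zmax):
    """Inverse-CDF sampling: maps u ~ U[0, 1] to z ~ p_fg."""
    return z0 * ((1.0 + zmax / z0) ** u - 1.0)

# Hyperparameters for K on a Sun-like star (see notes c and d above):
sigma_rv1, n_obs = 1.0, 100                # single-measurement precision (m/s)
K0 = sigma_rv1 * np.sqrt(50.0 / n_obs)     # ~ 0.71 m/s
Kmax = 3600.0                              # 3.6 km/s, expressed in m/s

# The density should integrate to 1 over [0, Kmax] (trapezoidal check).
z = np.linspace(0.0, Kmax, 200001)
f = p_fg(z, K0, Kmax)
total = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))
```

The inverse CDF follows from F(z) = log(1 + z/z0)/log(1 + zmax/z0), so prior draws cost one exponentiation each; this is convenient when initializing samplers.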
The choices made so far treat the different orbital elements as statistically independent. However, some combinations of parameters are less likely than others. For instance, Steffen et al. (2010) use a Rayleigh prior on eccentricity stemming from dynamical scattering models in the solar system. In multiplanetary systems, certain parameter configurations lead to dynamically unstable systems and can be ruled out. One can enforce so-called AMD stability, a sufficient condition for the system to be stable, or use randomly selected samples to set the initial conditions of an n-body simulation of the system, dropping samples that end up violating a stability condition (Hara et al. 2020, Stalport et al. 2022). To speed up computations, the stability of a given set of orbital elements can be predicted with a trained neural network (Tamayo et al. 2020).
While the uncertainties on orbital elements obtained from posterior distributions for RV-detected exoplanets are usually insensitive to the priors, the priors on P, K and σJ can significantly affect the marginalized likelihood and thus can influence Bayesian model comparison. Indeed, the broader the priors, the more costly it is to add a planet. The effect of the P and K priors is discussed in Hara et al. (2022c).
One could attempt to update the priors above based on the substantial improvements in our understanding of the distribution of exoplanets over the past decade (e.g., He et al. 2020, Kipping 2014). This is particularly difficult in RV, which is subject to complex observational biases due to frequent human intervention in observational strategies. Thus, at present we recommend as standard practice using analytic priors such as those above, potentially refined by a stability analysis. In our nominal model (see Eq. 4.), we add a trend c1, for which we recommend p(c1) ∼ N(0, σ_c1^2), with σ_c1 = 10 m/s/yr, representative of an acceleration due to a wide binary companion. Posteriors for c0, c1 and σJ are typically well-constrained, but of secondary interest. The priors on the correlated noise model must be chosen carefully and with relatively strict bounds, otherwise the noise model can absorb viable planetary signals.

E.2. Contaminating signals as Correlated Noise
Signals induced by stellar activity and instrumental systematics can be represented as zero-mean Gaussian noise with a covariance function conveying its self-similarity as a function of time. Below, we focus on the representation of stellar noises, as there is no consensus on stochastic representations for systematics. However, the models adopted can also absorb some of the instrumental signals.
E.2.1. Granulation. Section 2.4.1 of the main document introduced a type of stellar variability called granulation. The RV variations induced by this phenomenon are modelled as a random process with a super-Lorentzian profile, which has a power spectral density of the form P(ω) = S0 / (1 + (ω/ω0)^a), where S0, ω0 and a are free parameters (Harvey 1985, Dumusque et al. 2011b, Cegla et al. 2018), or as a sum of such processes. A value of a ∼ 4 seems to be favoured by photometric and RV observations (Kallinger et al. 2014), although there can be a higher discrepancy (Dumusque et al. 2011b, Cegla et al. 2018). With this value, the covariance matrix V in Eq. 21 can be built from the kernel

kg(∆t) = Sg ωg e^{−ωg ∆t/√2} cos(ωg ∆t/√2 − π/4), 36.

where Sg and ωg parametrize the amplitude and correlation time-scale. Based on asteroseismic observations, astronomers typically invoke 2-4 granulation timescales, potentially ranging from minutes to days. Theoretical considerations also provide guidance for relationships between granulation amplitudes and timescales as a function of stellar properties (Guo et al. 2022). Cegla et al. (2019) study the correlations between the RV and ancillary indicator time series and find in particular that the so-called line equivalent width and the total photometric flux correlate with RV, although the correlation weakens with increased stellar rotational velocity.
E.2.2. Magnetic activity. Next, we consider the effect of magnetically active regions (e.g., stellar spots and faculae). Large active regions can persist for more than one rotation period, leading to a quasi-periodicity in the measured RVs, since each active region grows, shrinks and changes in strength and shape over its lifetime. As stellar rotation periods are often comparable to plausible orbital periods, the quasi-periodic nature of active regions makes them particularly problematic when trying to detect and build confidence in a putative planet candidate. One kernel commonly used to model this behaviour (Aigrain et al. 2012, Haywood et al. 2014),

k_QP(∆t) = η1^2 exp( −∆t^2/(2 η2^2) − sin^2(π ∆t/η3)/(2 η4^2) ), 37.

is the product of a Gaussian decay with time-scale η2 and a periodic function with period η3 that is often associated with the stellar rotation period. The parameter η4 controls how close ∆t must be to a multiple of η3 for there to be a strong correlation. Low values of η4 lead to a very peaked correlation around the multiples of η3, which is unphysical, and η4 ∈ [0.5, 2] is often used. On Sun-like stars, spots have a tendency to appear at diametrically opposed longitudes of the star. Based on simulations, Perger et al. (2021) suggest using the quasi-periodic cosine (QPC) kernel

k_QPC(∆t) = exp(−2 ∆t^2/λ^2) [ h1^2 cos(2π ∆t/P) + h2^2 cos(4π ∆t/P) ], 38.

where h1, h2, λ and P are the free parameters of the kernel, representing the amplitude of the quasi-periodic part of the kernel, the amplitude of the first harmonic of rotation due to the presence of spots at opposed longitudes, the coherence timescale of the noise, and the stellar rotation period. See Nicholson & Aigrain (2022) for a discussion of the QP and QPC kernel parameters and stellar properties. Foreman-Mackey et al. (2017) suggest using the kernel corresponding to a stochastically excited harmonic oscillator, the SHO kernel,

k_SHO(τ) = S0 ω0 Q e^{−ω0 τ/(2Q)} × { cosh(η ω0 τ) + sinh(η ω0 τ)/(2ηQ), 0 < Q < 1/2; 2 (1 + ω0 τ), Q = 1/2; cos(η ω0 τ) + sin(η ω0 τ)/(2ηQ), 1/2 < Q }, 39.

with η = |1 − 1/(4Q^2)|^{1/2},
where S0, ω0 and Q are the free parameters of the kernel, describing its amplitude, natural frequency and quality factor, respectively. The higher the quality factor, the more the power spectral density (PSD) is peaked around the natural frequency. For low quality factors, the PSD decreases with frequency, and might represent granulation. The kernel 36. is a SHO kernel with Q = 1/√2. The covariance matrix corresponding to a SHO kernel has the advantage of being semi-separable, such that computing V^{−1}(y − g) (see Eq. 10.) has an O(N) cost.
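The piecewise SHO kernel is easy to implement directly. The sketch below is our own dense implementation (not the celerite code, which instead exploits semi-separability for O(N) cost); it checks numerically that setting Q = 1/√2 recovers the granulation form S0 ω0 e^{−ω0 τ/√2} cos(ω0 τ/√2 − π/4), and that the resulting covariance matrix (with a white-noise term on the diagonal) is positive definite.

```python
import numpy as np

def k_sho(tau, S0, w0, Q):
    """SHO kernel of Foreman-Mackey et al. (2017), evaluated at lags tau."""
    tau = np.abs(np.asarray(tau, dtype=float))
    pref = S0 * w0 * Q * np.exp(-w0 * tau / (2.0 * Q))
    if Q == 0.5:
        return pref * 2.0 * (1.0 + w0 * tau)
    eta = np.sqrt(abs(1.0 - 1.0 / (4.0 * Q**2)))
    if Q < 0.5:   # overdamped: hyperbolic branch
        return pref * (np.cosh(eta * w0 * tau)
                       + np.sinh(eta * w0 * tau) / (2.0 * eta * Q))
    # underdamped: oscillatory branch
    return pref * (np.cos(eta * w0 * tau)
                   + np.sin(eta * w0 * tau) / (2.0 * eta * Q))

tau = np.linspace(0.0, 20.0, 500)
S0, w0 = 1.0, 2.0 * np.pi / 25.0   # ~25-day natural period

# Q = 1/sqrt(2) reduces to the granulation kernel form.
k_q = k_sho(tau, S0, w0, 1.0 / np.sqrt(2.0))
k_gran = S0 * w0 * np.exp(-w0 * tau / np.sqrt(2.0)) \
         * np.cos(w0 * tau / np.sqrt(2.0) - np.pi / 4.0)

# Covariance on irregular times + white noise is positive definite.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 60))
V = k_sho(t[:, None] - t[None, :], S0, w0, 3.0) + 0.1 * np.eye(t.size)
L = np.linalg.cholesky(V)   # succeeds only for a positive definite V
```

The reduction at Q = 1/√2 follows from cos x + sin x = √2 cos(x − π/4), with 2ηQ = 1 at that quality factor.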
Quasi-periodic kernels can be useful for stars with large, long-lived active regions, but most active regions on relatively quiet stars like the Sun do not last a full rotation period. Based on modeling simulated solar time series, Gilbertson et al. (2020) recommend a kernel built from the Matérn-5/2 kernel,

k_M5/2(∆t) = σ^2 (1 + √5 ∆t/ρ + 5 ∆t^2/(3ρ^2)) e^{−√5 ∆t/ρ}, 40.

with a characteristic timescale η3, where the strength parameters η1 ≃ 0.1069 and η2 ≃ 1.154 were calibrated on simulated active regions for the Sun. Other kernels in use include Gaussian and exponential kernels, or kernels based on the physics of the star (Baluev 2018, Luger et al. 2021b). In Fig. 9, we represent the kernels described above with different values of their parameters.
Figure 9: Kernel functions: functions of the time lag ∆t between two observations, normalized so that they equal 1 at ∆t = 0. The covariance matrix V in Eq. 21. is such that its element (i, j) is a sum of kernel functions evaluated at ∆t = ti − tj. Each color corresponds to a different family of kernels: quasi-periodic (purple, Eq. 37., with η1 = 1 m/s, η2 = 50 days, η3 = 25 days, η4 = 0.5, 1.25, 2 corresponding to dotted, dashed and plain lines respectively), quasi-periodic cosine (yellow, Eq. 38., with λ0 = 100 days, ω0 = 0.31, h2/h1 = 0, 0.5, 1 corresponding to dotted, dashed and plain lines respectively), SHO (red, Eq. 39., with S0 = 1 m/s, ω0 = 2π/25 rad/day, Q = 1/2, 2, 5 corresponding to dotted, dashed and plain lines respectively), and Matérn 5/2 (blue, Eq. 40., with σ = 1, ρ = 1, 3, 5 days corresponding to dotted, dashed and plain lines respectively).
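A minimal sketch of how these kernels enter the covariance matrix V of Eq. 21 (our own illustration; parameter values are arbitrary, and our parametrizations follow one common convention that may differ from others by constant factors): a quasi-periodic activity kernel and a Matérn-5/2 kernel are summed with a white-noise term, and positive definiteness is verified via a Cholesky factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

def k_qp(dt, eta1, eta2, eta3, eta4):
    """Quasi-periodic kernel: Gaussian decay times a periodic term."""
    return eta1**2 * np.exp(-dt**2 / (2.0 * eta2**2)
                            - np.sin(np.pi * dt / eta3)**2 / (2.0 * eta4**2))

def k_m52(dt, sigma, rho):
    """Matern-5/2 kernel with amplitude sigma and timescale rho."""
    x = np.sqrt(5.0) * np.abs(dt) / rho
    return sigma**2 * (1.0 + x + x**2 / 3.0) * np.exp(-x)

# Irregularly sampled observation times (days), as in real RV campaigns.
t = np.sort(rng.uniform(0.0, 200.0, 80))
dt = t[:, None] - t[None, :]

# V = rotation-timescale activity kernel + short-timescale kernel + white noise.
V = (k_qp(dt, eta1=3.0, eta2=50.0, eta3=25.0, eta4=1.25)
     + k_m52(dt, sigma=1.0, rho=3.0)
     + 0.5**2 * np.eye(t.size))

L = np.linalg.cholesky(V)   # succeeds only if V is positive definite
```

Each diagonal element of V is the sum of the kernel variances at zero lag plus the white-noise variance, here 3^2 + 1 + 0.25.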
The analytical forms of the kernels listed above rely on qualitative considerations and fits to simulations, but do not result from physical first principles. The approach taken by Luger et al. (2021a) expresses a physical linear transformation giving the spectrum as a function of the state of the stellar surface. It considers that the brightness of the surface of the star, decomposed in spherical harmonics, is a Gaussian process. Applying the linear transformation to the surface yields a two-dimensional Gaussian process representation of the time series of spectra. The framework of Luger et al. (2021a) is primarily intended for Doppler imaging: reconstructing the stellar surface from a time series of spectra (e.g., Vogt & Penrod 1983, Vogt et al. 1987). Hara & Delisle (2023) present another way to derive kernels with a physical interpretation for radial velocity, indicators and photometry - either considered individually or jointly - which takes into account the varying properties of spots and faculae along the magnetic cycle.
In the scientific literature dedicated to RV, kernels modelling stellar activity have been assumed to be functions of |t′ − t| (except in Baluev 2015a). However, the kernel can depend not only on the time difference between measurements, but also on their absolute values. Such non-stationarity is observed for instance in instrumental calibration noise (Delisle et al. 2020b). Stationarity is not fully justified, since the magnetic cycles of stars modulate the strength of stellar activity from year to year (see section 2.4.1 of the main document), but is realistic over a few years.
E.2.3. Oscillations. Stellar oscillations also contribute to the observed RV signal. For the Sun, the characteristic timescale for p-mode oscillations is ∼ 5 minutes, and they are usually dealt with through observational strategies (Dumusque et al. 2011b, Chaplin et al. 2019). In the case of solar-type stars, the oscillation RVs come from a complex spectrum of ∼ 30 p-modes, each with a lifetime of days. Astronomers often model these with a continuous mixture of modes with a Gaussian envelope for their characteristic amplitudes. Such oscillations can be described by a stochastic harmonic oscillator kernel. In practice, the exposure time is usually a non-negligible fraction of the characteristic timescale, so one should integrate the kernel over the exposure times; see Luhn et al. (2022).

E.2.4. Computational Considerations.
Evaluating the likelihood Eq. 21 for given values of the parameters requires the log-determinant of V(η) and the quadratic form {y − g(t; θ)}^T V(η)^{−1} {y − g(t; θ)}. With standard algorithms the computational cost scales as O(N^3) with the size N of the dataset y, while more advanced or approximate algorithms can reduce this considerably. As seen in §D, evaluating the posterior distribution and Bayesian evidence requires millions of likelihood evaluations, which makes the analysis of large datasets computationally challenging. Fortunately, the covariance matrix is semi-separable for certain kernels, and then the computational cost scales as O(N). Foreman-Mackey et al. (2017) present the celerite framework 10, which models covariances as sums of constant, cosine and sine functions. They suggest modeling stellar activity with the covariance of a stochastically excited harmonic oscillator. Similarly, TemporalGPs.jl 11 allows for O(N) computation of a broad class of 1-d kernel functions in Julia. The celerite framework is generalised by Delisle et al. (2020b) to the S+LEAF framework, which models covariances as a sum of semi-separable and so-called LEAF matrices 12, useful to model the noise due to the calibration of the instrument. Quasi-separable kernels provide even greater flexibility for modeling multivariate time series (tinygp 13).
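For reference, the dense O(N^3) likelihood evaluation that these frameworks accelerate can be written in a few lines. The sketch below is our own (the kernel and data are arbitrary placeholders); the Cholesky factorization provides both the quadratic form and the log-determinant in a numerically stable way.

```python
import numpy as np

def gp_loglike(t, y, kernel, sigma):
    """Dense Gaussian log-likelihood with zero mean model:
    -0.5 * [ y^T V^{-1} y + log det V + N log(2 pi) ].
    Cholesky costs O(N^3); celerite-style solvers reach O(N)
    for semi-separable kernels."""
    dt = np.abs(t[:, None] - t[None, :])
    V = kernel(dt) + np.diag(sigma**2)
    L = np.linalg.cholesky(V)              # V = L L^T
    alpha = np.linalg.solve(L, y)          # alpha = L^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (alpha @ alpha + logdet + y.size * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 50))
y = rng.normal(0.0, 1.0, t.size)
sigma = np.full(t.size, 0.5)                  # per-point measurement errors
kernel = lambda dt: np.exp(-dt / 5.0)         # exponential kernel, 5-day timescale

ll = gp_loglike(t, y, kernel, sigma)

# Cross-check against the direct (less stable) formula.
V = kernel(np.abs(t[:, None] - t[None, :])) + np.diag(sigma**2)
ll_direct = -0.5 * (y @ np.linalg.solve(V, y)
                    + np.linalg.slogdet(V)[1]
                    + t.size * np.log(2.0 * np.pi))
```

Since V = L L^T, the quadratic form equals ||L^{-1} y||^2 and the log-determinant is twice the sum of the logs of the diagonal of L.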

E.3. Incorporating Ancillary Time Series With Gaussian Processes
Due to the high flexibility of correlated noise models, true RV variations due to planets can often be explained as correlated noise. In principle, one can compare the marginalized likelihoods of models with no, one or multiple planets using a correlated noise model, but when the parameters of the noise model of Eq. 37. are treated as free parameters in the computation of the Bayesian evidence, this often leads to overfitting that increases the likelihoods and decreases the evidence for planets, including those that are actually present. Hence, it is highly desirable to incorporate more information in order to constrain the stellar activity model more tightly. So far, we have assumed that the measured RVs are analyzed separately from any ancillary time series (e.g., other summary statistics computed from the spectra, or photometry observations), but, as argued in §2.4.1. of the main document, features on the surface of the star should affect other properties of the spectrum. Jointly analysing a multivariate time series constrains the effects of stellar variability and improves power for detecting planets.
10 https://github.com/exoplanet-dev/celerite2
11 https://github.com/JuliaGaussianProcesses/TemporalGPs.jl
12 https://gitlab.unige.ch/Jean-Baptiste.Delisle/spleaf
13 https://tinygp.readthedocs.io/en/stable/index.html
In principle, one could analyze the time series of successive spectra, but this is computationally challenging and may be difficult to interpret, so astronomers typically analyze low-dimensional time series composed of the measured RVs, additional summary statistics computed from the spectra and, when available, photometric observations. Potential summary statistics include classical activity indicators (e.g., excess emission in the cores of the Ca II H&K lines), the shape of spectral lines (e.g., FWHM & BIS, see Fig. 4, §3.2 of the main text), measurements of RVs and/or spectral line shapes from distinct regions of the spectrum (e.g., from different spectral orders, ≃ 2 Å chunks of the spectrum, or individual lines), or the entire spectrum itself. In practice, when using a large number of summary statistics, dimension reduction (e.g., principal component analysis, PCA) is often applied.
As stellar activity is particularly problematic, most effort to date has focused on modeling magnetic activity. Aigrain et al. (2012) argue that the RV effect of stellar spots could be predicted based on measurements of their photometric effect. They model the flux of light as a function of time t as Ψ0{1 − F(t)}, where Ψ0 is a constant with dimensions of flux and F(t) models the time variability of the photometric signal, and model the RV contamination as −αḞ(t)F(t) + βF(t)^2, with α, β > 0 estimated from the light flux. For most targets, continuous photometric measurements are unavailable, as observational constraints often lead to large gaps in the time series. One can estimate F and Ḟ by modeling F as a Gaussian process (GP). GPs are stochastic processes G(t), functions of a variable t, such that for any n values t1, ..., tn, (G(t1), ..., G(tn)) follows a multivariate Gaussian distribution. In general, t can be a vector, but for our purpose we define it as the time t. This implies that the GP is defined by two quantities: its mean m(t) and its kernel k(t, t′), equal to the covariance of G(t) and G(t′) (Rasmussen & Williams 2005, Aigrain & Foreman-Mackey 2022). The posterior probability distribution of G(t) conditioned on noisy measurements of G at t1, ..., tn can be computed efficiently from formulae 2.23-2.24 in Rasmussen & Williams (2005). Its mean provides an interpolator for measurements at the desired times. By modelling F(t) as a Gaussian process, one can obtain an estimate of its derivative (and the corresponding uncertainty) at the times of spectroscopic observations. The above model can be considered as a special case of a multivariate Gaussian process latent variable model. Based on astrophysical considerations, Rajpaul et al. (2015) postulated that the effect of stellar activity on the measured RVs and two activity indicators (BIS and log R′HK) could be described by a latent Gaussian process G(t),

∆RV(t) = Vc G(t) + Vr Ġ(t),
BIS(t) = Bc G(t) + Br Ġ(t),
log R′HK(t) = Lc G(t), 41.
where Vc, Vr, Bc, Br and Lc are free parameters. The process G(t) is characterised by its mean function, chosen as null, and there are various possible choices of kernel parametrising the noise, as discussed in §E.2. This framework, implemented in Barragán et al. (2022) 14, has been further generalised by Jones et al. (2022), Gilbertson et al. (2020) 15 and Delisle et al. (2022) 16, where the RV and arbitrary ancillary indicators are modelled as linear combinations of a Gaussian process and its first two derivatives.
In the framework of Rajpaul et al. (2015) and Jones et al. (2022), the expression for the likelihood is the same as Eq. 10., except that the data y is the concatenation of the RVs and the ancillary indicators. The expression for V involves the covariances of G(t) and its derivatives, which can be expressed analytically (Rajpaul et al. 2015, Jones et al. 2022). With the modified likelihood, the Bayesian model comparison framework applies here also. It is also possible to model jointly the radial velocities computed in different spectral bands, exploiting the fact that the amplitude of stellar activity effects changes with wavelength (Reiners et al. 2010, Cale et al. 2021). As mentioned in §E.2.2, Luger et al. (2021a) build a Gaussian process model of the time series of spectra assuming that the dominant source of variability is related to stellar rotation. Suppose there are m wavelength bins centered at wavelengths λi, i = 1, ..., m, and N measurements at times tj, j = 1, ..., N. Instead of having a model for 3 time series as in 41., they have m time series, the flux at λi as a function of time tj, F(λi, tj), and they compute the means and covariances of F(λi, tj) and F(λi′, tj′) for all combinations of i, j, i′, j′. Code is available at https://github.com/rodluger/starry_process. An alternative data-driven approach models the flux at every pixel as a function of time, simultaneously modeling variability in both the stellar spectrum and atmospheric absorption, without assuming that stellar variability is rotationally modulated (code available at https://github.com/christiangil/StellarSpectraObservationFitting.jl). Most analyses in the literature assume a stationary noise model. For observing campaigns spanning many years, it may be advantageous to allow for non-stationary GP kernels to account for changes due to long-term magnetic activity cycles. A general framework based on Gaussian process regression networks is provided in Camacho et al. (2023).

E.4. Robustness to a change of model
Although the physics of the Doppler shift and of planetary motion is precisely understood, the complex astrophysics responsible for stellar variability can only be approximated by a GP model. If the likelihood or priors are poorly chosen, the results may be unreliable. In particular, if correlated noise is ignored, one can easily be overconfident about the statistical significance of a putative planet. For real planets, the mass and eccentricity estimators can be biased (Zakamska et al. 2011, Damasso et al. 2019, Hara et al. 2019). In extreme cases, one can recognize remnant correlated noise by subtracting the best-fit planetary model (or, better, subtracting the predicted planetary signals while marginalizing over planetary models) and looking for evidence of correlated residuals. Whether there is correlation in the residuals can be assessed by computing the Bayesian evidence for different kernels describing them. Another approach consists in computing periodograms (see §4.2) (e.g., Baluev 2013b) or variograms (Hara et al. 2019) from the residuals, or evaluating the predictive capabilities of the model (Hara et al. 2022a). We stress again the importance of testing the sensitivity of the results to prior and likelihood choices.
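As an illustration of the variogram diagnostic, the sketch below (our own simplified version; Hara et al. 2019 use a more elaborate estimator) compares the empirical semi-variogram of white residuals against residuals drawn from an exponential kernel on the same irregular time sampling. White noise yields a flat variogram at the noise variance, whereas correlated residuals show a suppressed variogram at lags shorter than the correlation timescale.

```python
import numpy as np

rng = np.random.default_rng(2)

def variogram(t, r, bins):
    """Empirical semi-variogram: 0.5 * mean of (r_i - r_j)^2 over pairs
    whose lag |t_i - t_j| falls in each bin."""
    dt = np.abs(t[:, None] - t[None, :])
    dv = 0.5 * (r[:, None] - r[None, :])**2
    iu = np.triu_indices(t.size, k=1)         # each pair counted once
    lags, vals = dt[iu], dv[iu]
    idx = np.digitize(lags, bins)
    return np.array([vals[idx == i].mean() for i in range(1, len(bins))])

# Irregular sampling over a 100-day campaign.
t = np.sort(rng.uniform(0.0, 100.0, 300))

# White residuals vs. residuals with an exponential 5-day correlation.
white = rng.normal(0.0, 1.0, t.size)
V = np.exp(-np.abs(t[:, None] - t[None, :]) / 5.0)
corr = np.linalg.cholesky(V + 1e-10 * np.eye(t.size)) @ rng.normal(0.0, 1.0, t.size)

bins = np.array([0.0, 1.0, 20.0])             # short-lag and long-lag bins
g_white = variogram(t, white, bins)
g_corr = variogram(t, corr, bins)
```

For the exponential kernel, the expected semi-variogram is 1 − exp(−h/5) at lag h, so `g_corr` rises from near zero at short lags toward the white-noise level, while `g_white` stays flat.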
As discussed in §E.1, when working with datasets of fewer than ∼ 20 observations and low amplitude signals (e.g., close to the average noise level), it is important to perform a sensitivity test to check that the key conclusions are robust to different (but still reasonable) assumptions for the likelihood and prior.
15 https://github.com/christiangil/GPLinearODEMaker.jl
16 https://gitlab.unige.ch/Jean-Baptiste.Delisle/spleaf