Market Regimes, Sentiment Analysis, and the Federal Reserve
Market regime detection is usually presented as a black box: some hidden model labels the market as bullish, bearish, or neutral, and you are asked to trust the output. In this post, I want to take a different approach.
Using sentiment scores derived from Federal Reserve statements and publications, we will build a simple and inspectable regime indicator for sector ETFs. We will show how policy language can be converted into a structured macro signal, and align this with historical prices.
The Empirical Markets API is useful here because it provides both price data and sentiment scores.
Sentiment Data
The sentiment dataset used here is the fed-market-sectors topic from the Empirical Markets API. It provides a historical time series of sector-level scores derived from Federal Reserve publications, along with a short reasoning field for each observation. The available dimensions include overall, financials, energy, technology, and other major sectors. Scores are mapped to a discrete scale from strongly bearish (1) to strongly bullish (5).
This matters because raw macro text is hard to use directly. A score lets us work with the data statistically, while the reasoning field preserves interpretability. This combination is more useful than either one alone: a score without rationale is opaque, but rationale without a structured series is difficult to analyze empirically.
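For concreteness, here is a small sketch of what one observation looks like once fetched. The field names and the 1-to-5 label map match the records shown later in this post, but the values in this snippet are illustrative rather than actual API output:

```python
# Illustrative shape of a single sentiment observation (the values are made up;
# the field names and 1-5 value_map follow the records shown in this post).
value_map = {
    "1": "Strongly Bearish",
    "2": "Slightly Bearish",
    "3": "Neutral",
    "4": "Slightly Bullish",
    "5": "Strongly Bullish",
}

observation = {
    "date": "2008-10-15",
    "dimension": "Financials",
    "score": 1,
    "reasoning": "Severely tightened credit conditions ...",
}

# The topic's value_map translates the numeric score into a readable label.
observation["score_name"] = value_map[str(observation["score"])]
print(observation["score_name"])  # Strongly Bearish
```

This score-plus-rationale pairing is exactly what makes the series auditable: the number feeds the statistics, and the text explains the number.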
In this post, the workflow is straightforward. We use the Empirical Markets API to pull historical sentiment for each sector, align it with the price history of a corresponding ETF, and then build a simple regime detection model.
Obtaining Sector-Level Sentiment
The code blocks below show you how to collect sector-level sentiment scores. The steps are straightforward:
- Authenticate to the Empirical Markets API.
- Make a GET request to https://api.empiricalmarkets.com/v1/ai/sentiment/fed-market-sectors to get a list of available dimensions (sectors) for the fed-market-sectors sentiment topic.
- For each sector, make a GET request to https://api.empiricalmarkets.com/v1/ai/sentiment/fed-market-sectors/dimensions/{sector} to obtain a time series of historical sentiment scores and rationales.
import os
import requests
from dotenv import load_dotenv
load_dotenv() # load environment variables from .env file
API_AUTH_HEADERS = {"Content-Type": "application/x-www-form-urlencoded"}
API_HOST = "https://api.empiricalmarkets.com"
def get_request_headers() -> dict:
    api_key = os.getenv("EM_API_KEY")
    if api_key is None:
        raise ValueError("API key not found. Please set the EM_API_KEY environment variable.")
    r = requests.post(
        f"{API_HOST}/v1/token",
        data={"api_key": api_key},
        headers=API_AUTH_HEADERS,  # token endpoint expects form-encoded credentials
    )
    r.raise_for_status()
    token = r.json().get("access_token")
    return {"Authorization": f"Bearer {token}"}
request_headers = get_request_headers()
print(str(request_headers)[:50] + "...}")
{'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5...}
import time
import requests
def get_me(request_headers: dict) -> dict:
    r = requests.get(f"{API_HOST}/v1/me", headers=request_headers)
    r.raise_for_status()
    return r.json()
me_info = get_me(request_headers)
# Use the rate limit from the /me endpoint; fall back to 1 req/s (free tier)
_rate_limit_per_second = me_info.get("api_rate_limit_per_second") or 1
_min_interval = 1.0 / _rate_limit_per_second # e.g. 1.0s for free, 0.33s for 3 req/s
def rate_limited_get(url: str, **kwargs) -> requests.Response:
    """Drop-in replacement for requests.get that respects the tier rate limit."""
    response = requests.get(url, **kwargs)
    time.sleep(_min_interval)
    return response
print(f"Built rate limit utility with allowed rate limit: {_rate_limit_per_second} req/s. The program will sleep {_min_interval:.2f}s between API calls.")
Built rate limit utility with allowed rate limit: 3 req/s. The program will sleep 0.33s between API calls.
import pandas as pd
def get_sector_sentiment(sector: str) -> pd.DataFrame:
    """Fetch the historical sentiment data for a given sector."""
    r = rate_limited_get(f"{API_HOST}/v1/ai/sentiment/fed-market-sectors/dimensions/{sector}", headers=request_headers)
    r.raise_for_status()
    json_data = r.json()
    value_map = json_data["value_map"]
    df = pd.DataFrame(json_data["sentiment"])
    df["score_name"] = df["score"].astype(str).map(value_map)
    df["date"] = pd.to_datetime(df["date"])
    return df
def get_all_sector_sentiments(request_headers: dict) -> pd.DataFrame:
    """Query the list of available sectors, and fetch the historical sentiment data for each."""
    r = rate_limited_get(f"{API_HOST}/v1/ai/sentiment/fed-market-sectors", headers=request_headers)
    r.raise_for_status()
    fed_sector_sentiment_info = r.json()
    all_data = []
    for sector in fed_sector_sentiment_info["dimensions"]:
        print(f"Fetching sentiment data for sector: {sector}")
        df = get_sector_sentiment(sector)
        all_data.append(df)
    all_data = pd.concat(all_data, ignore_index=True)
    all_data = all_data.sort_values("date").reset_index(drop=True)
    return all_data.set_index("date")
sentiment_df = get_all_sector_sentiments(request_headers)
sentiment_df
Fetching sentiment data for sector: consumer-discretionary
Fetching sentiment data for sector: consumer-staples
Fetching sentiment data for sector: energy
Fetching sentiment data for sector: financials
Fetching sentiment data for sector: healthcare
Fetching sentiment data for sector: industrials
Fetching sentiment data for sector: materials
Fetching sentiment data for sector: overall
Fetching sentiment data for sector: real-estate
Fetching sentiment data for sector: technology
Fetching sentiment data for sector: utilities
| model_name | model_version | dimension | score | reasoning | score_name | |
|---|---|---|---|---|---|---|
| date | ||||||
| 1996-10-30 | Newton | 1.0.0 | Consumer Discretionary | 3 | Mixed reports on consumer spending and auto sa... | Neutral |
| 1996-10-30 | Newton | 1.0.0 | Utilities | 3 | No specific mention implies no significant cha... | Neutral |
| 1996-10-30 | Newton | 1.0.0 | Financials | 3 | The financial sector is stable to slightly str... | Neutral |
| 1996-10-30 | Newton | 1.0.0 | Real Estate | 3 | Mixed conditions with regional strengths and w... | Neutral |
| 1996-10-30 | Newton | 1.0.0 | Industrials | 3 | Manufacturing is stable to stronger with regio... | Neutral |
| ... | ... | ... | ... | ... | ... | ... |
| 2026-03-04 | Newton | 1.0.0 | Consumer Staples | 3 | Consumer spending faced challenges due to econ... | Neutral |
| 2026-03-04 | Newton | 1.0.0 | Financials | 4 | Financial services activity was stable to up, ... | Slightly Bullish |
| 2026-03-04 | Newton | 1.0.0 | Materials | 3 | Nonlabor cost pressures were noted, including ... | Neutral |
| 2026-03-04 | Newton | 1.0.0 | Technology | 2 | Layoffs in technology services were reported, ... | Slightly Bearish |
| 2026-03-04 | Newton | 1.0.0 | Utilities | 3 | Utilities faced moderate price growth with tar... | Neutral |
2585 rows × 6 columns
Obtaining Historical Prices
After collecting the historical sentiment scores for each sector, the next step is to align those scores with historical prices for further analysis. For this, too, we can use the Empirical Markets API, this time with GET requests to the https://api.empiricalmarkets.com/v1/ohlc/tickers/{ticker} endpoint.
from datetime import datetime
def get_ticker_ohlc_data(
    ticker: str,
    request_headers: dict,
    start_date: datetime = datetime(1996, 1, 1),
    chunk_size: int = 1260,
) -> pd.DataFrame:
    print(f"Fetching OHLC data for ticker: {ticker}")
    frames = []
    params = {
        "limit": chunk_size,
        "start_date": start_date.strftime("%Y-%m-%d"),
    }
    while True:
        r = rate_limited_get(
            f"{API_HOST}/v1/ohlc/tickers/{ticker}",
            params=params,
            headers=request_headers,
        )
        # rate_limited_get already sleeps between calls, so no extra delay is needed here
        r.raise_for_status()
        payload = r.json()
        data = payload["data"]
        if not data:
            break
        frames.append(pd.DataFrame(data))
        date_cursor = payload["date_cursor"]
        if not date_cursor["has_more"]:
            break
        # next page: swap start_date for the exclusive date_cursor
        params = {
            "limit": chunk_size,
            "date_cursor": date_cursor["next_cursor"],
        }
    if not frames:
        return pd.DataFrame()
    df = pd.concat(frames, ignore_index=True)
    df["date"] = pd.to_datetime(df["date"])
    return df.set_index("date").sort_index()
ohlc_data = {
"SPY": get_ticker_ohlc_data("SPY", request_headers),
"XLF": get_ticker_ohlc_data("XLF", request_headers),
"XLE": get_ticker_ohlc_data("XLE", request_headers),
}
ohlc_data["SPY"]
Fetching OHLC data for ticker: SPY
Fetching OHLC data for ticker: XLF
Fetching OHLC data for ticker: XLE
| ticker | asset_class | frequency | open | high | low | close | volume | |
|---|---|---|---|---|---|---|---|---|
| date | ||||||||
| 1996-01-02 | SPY | etfs | D | 36.540131 | 36.977139 | 36.502940 | 36.977139 | 514400.0 |
| 1996-01-03 | SPY | etfs | D | 37.097996 | 37.191003 | 36.893475 | 37.079430 | 610300.0 |
| 1996-01-04 | SPY | etfs | D | 37.125903 | 37.265385 | 36.428558 | 36.726086 | 1129700.0 |
| 1996-01-05 | SPY | etfs | D | 36.484374 | 36.744711 | 36.400649 | 36.651704 | 302400.0 |
| 1996-01-08 | SPY | etfs | D | 36.781902 | 36.837659 | 36.735368 | 36.791184 | 179900.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2026-04-07 | SPY | etfs | D | 656.650000 | 659.610000 | 651.060000 | 659.220000 | 69980362.0 |
| 2026-04-08 | SPY | etfs | D | 676.390000 | 677.080000 | 671.460000 | 676.010000 | 93606114.0 |
| 2026-04-09 | SPY | etfs | D | 674.840000 | 681.160000 | 673.770000 | 679.910000 | 57134445.0 |
| 2026-04-10 | SPY | etfs | D | 681.320000 | 682.030000 | 678.450000 | 679.460000 | 42253456.0 |
| 2026-04-13 | SPY | etfs | D | 677.410000 | 686.300000 | 676.580000 | 686.100000 | 54185819.0 |
7619 rows × 8 columns
Exploring the Sentiment Data
Before building the algorithm to detect regimes, it helps to inspect the data a bit. A regime model is only as credible as the data going into it. To do this, we'll join the price data we collected with the sentiment scores.
This is not meant to prove causality. It is more of a diagnostic step. If the sentiment series is useful for trading purposes, then periods of unusually bearish language should cluster around well-known stress episodes. Conversely, periods of more constructive language should appear during healthier risk environments.
import matplotlib.pyplot as plt
import numpy as np
plt.style.use("dark_background")
def _ffill_price_sentiment_df(df: pd.DataFrame) -> pd.DataFrame:
    """Forward fill score and close so dates present in only one series pick up the latest value from the other."""
    df = df.sort_index()
    df["score"] = df["score"].ffill()
    df["close"] = df["close"].ffill()
    return df
def plot_price_vs_sector_sentiment(
    ticker: str,
    sector_name: str,
    sentiment_df: pd.DataFrame,
    ohlc_data: dict[str, pd.DataFrame]
):
    plot_df = (
        sentiment_df[sentiment_df["dimension"] == sector_name][["score", "score_name"]]
        .merge(
            ohlc_data[ticker]["close"],
            how="outer",
            left_index=True,
            right_index=True
        )
    )
    plot_df = _ffill_price_sentiment_df(plot_df).dropna()
    fig, ax = plt.subplots(figsize=(12, 8))
    np.log(plot_df["close"]).plot(ax=ax, label=f"{ticker} (Log)")
    ax2 = ax.twinx()
    ax2.bar(
        plot_df.index,
        plot_df["score"],
        width=pd.Timedelta(days=20),  # adjust if your sentiment dates are closer/farther apart
        alpha=0.5,
        label="Sentiment Score",
        color="orange",
        align="center",
    )
    ax.set_title(f"{sector_name} Sector Sentiment vs {ticker} (Log)")
    ax.set_ylabel(f"{ticker} (Log)")
    ax2.set_ylabel(f"FOMC {sector_name} Sentiment Score")
    h1, l1 = ax.get_legend_handles_labels()
    h2, l2 = ax2.get_legend_handles_labels()
    fig.legend(
        h1 + h2,
        l1 + l2,
        loc="lower center",
        bbox_to_anchor=(0.5, 0.93),
        ncol=2,
        columnspacing=1.0,
        handlelength=1.0,
    )
    plt.show()
plot_price_vs_sector_sentiment(ticker="SPY", sector_name="Overall", sentiment_df=sentiment_df, ohlc_data=ohlc_data)
plot_price_vs_sector_sentiment(ticker="XLF", sector_name="Financials", sentiment_df=sentiment_df, ohlc_data=ohlc_data)
plot_price_vs_sector_sentiment(ticker="XLE", sector_name="Energy", sentiment_df=sentiment_df, ohlc_data=ohlc_data)
Interpretable Snapshots
One of the more useful features of this dataset is that each observation includes a short reasoning statement. This makes the series easier to validate. Instead of treating the sentiment score as an unexplained label, we can inspect the dates with the most bearish readings and ask whether the explanation matches the macro backdrop.
For example, the most bearish observation for Financials was during the 2008 crisis, with language centered on tightening credit conditions and deteriorating loan quality. For Technology, the most bearish reading appears during the post-dot-com downturn, with the rationale pointing to weakness in computers, semiconductors, and telecommunications.
def _sector_most_bearish_sentiment(df: pd.DataFrame, sector_name: str) -> dict:
    """Identify the date with the most bearish sentiment score for a given sector."""
    sector_df = df[df["dimension"] == sector_name].reset_index()
    most_bearish_row = sector_df.loc[sector_df["score"].idxmin()]
    return most_bearish_row.to_dict()
_sector_most_bearish_sentiment(sentiment_df, "Financials")
{'date': Timestamp('2008-10-15 00:00:00'),
'model_name': 'Newton',
'model_version': '1.0.0',
'dimension': 'Financials',
'score': 1,
'reasoning': 'Severely tightened credit conditions and deteriorating loan quality across districts pose a significant threat to financial sector stability, indicating a strongly bearish sentiment.',
'score_name': 'Strongly Bearish'}
_sector_most_bearish_sentiment(sentiment_df, "Technology")
{'date': Timestamp('2001-08-08 00:00:00'),
'model_name': 'Newton',
'model_version': '1.0.0',
'dimension': 'Technology',
'score': 1,
'reasoning': 'The decline in manufacturing activity, particularly in computers, semiconductors, and telecommunications, suggests strong bearish sentiment for the technology sector.',
'score_name': 'Strongly Bearish'}
This does not “prove” the signal, but it is an important sanity check. If extreme readings line up with intuitive economic narratives, the series becomes much easier to trust as an input into a broader regime framework.
Sentiment Regime Detection
With the raw data validated, we can define a regime model. The goal is not to optimize a trading system. It is to create a transparent state variable that summarizes the policy-language backdrop of the Federal Reserve.
The pipeline is intentionally simple:
- Smooth the sentiment series with an exponential moving average to reduce one-off noise.
- Standardize the smoothed series with a rolling z-score so that extreme sentiment is measured relative to its own history.
- Label each date as risk on, risk off, or neutral based on a z-score threshold.
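Before wiring this into the full dataset, the three steps can be sketched on a toy series. The window sizes and threshold below are illustrative, not the values used in the sector charts later:

```python
import numpy as np
import pandas as pd

# Toy sketch of the regime pipeline on a synthetic sentiment series.
scores = pd.Series([3, 3, 4, 4, 5, 5, 5, 2, 1, 1, 3, 3], dtype=float)

# Step 1: smooth with an EMA to reduce one-off noise.
smooth = scores.ewm(span=4, adjust=False).mean()

# Step 2: standardize with a rolling z-score.
roll_mean = smooth.rolling(window=8, min_periods=4).mean()
roll_std = smooth.rolling(window=8, min_periods=4).std(ddof=0)
z = (smooth - roll_mean) / roll_std

# Step 3: threshold into regimes (1 = risk on, -1 = risk off, 0 = neutral).
regime = np.select([z >= 1.0, z <= -1.0], [1, -1], default=0)
print(pd.DataFrame({"score": scores, "z": z.round(2), "regime": regime}))
```

The sharp drop from 5 to 1 in the toy series pushes the z-score below the threshold, so those dates get labeled risk off, which is exactly the behavior the full pipeline relies on.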
I like this approach because every step is interpretable. There is no hidden classifier, no latent-state estimation, and no optimization routine fitting itself to a specific backtest window. If the model behaves oddly, you can inspect the exact series and threshold that produced the label.
In the charts below, green shading marks risk on periods, red marks risk off, and gray marks neutral periods. These labels should be interpreted as macro sentiment regimes, not direct buy or sell signals. The main use case is contextual, as it characterizes the backdrop in which price action is unfolding.
REGIME_LABEL_MAP = {1: "risk_on", 0: "neutral", -1: "risk_off"}
def join_price_history(ticker: str, sector_name: str, sentiment_df: pd.DataFrame, ohlc_data: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Join FOMC Sentiment to OHLC Prices."""
    joined_df = ohlc_data[ticker][["close"]].merge(
        sentiment_df[sentiment_df["dimension"] == sector_name],
        how="outer",
        left_index=True,
        right_index=True
    )
    joined_df = _ffill_price_sentiment_df(joined_df).dropna()
    return joined_df
def compute_regimes(
    df_price: pd.DataFrame,
    sector_name: str,
    ema_span: int = 30,
    z_window: int = 300,
    z_threshold: float = 2.0,
) -> pd.DataFrame:
    """Calculate rolling sentiment z-score and create regime labels for a given sector."""
    sector_df = df_price[df_price["dimension"] == sector_name].copy()
    sector_df = sector_df.sort_index()
    # Smooth sentiment and standardize
    sector_df["score_smooth"] = sector_df["score"].ewm(span=ema_span, adjust=False).mean()
    roll_mean = sector_df["score_smooth"].rolling(window=z_window, min_periods=max(ema_span, 63)).mean()
    roll_std = sector_df["score_smooth"].rolling(window=z_window, min_periods=max(ema_span, 63)).std(ddof=0)
    sector_df["z"] = (sector_df["score_smooth"] - roll_mean) / roll_std
    # Regime classification
    sector_df["regime"] = np.select([sector_df["z"] >= z_threshold, sector_df["z"] <= -z_threshold], [1, -1], default=0)
    sector_df["regime_label"] = sector_df["regime"].map(REGIME_LABEL_MAP)
    return sector_df
def _shade_regimes(ax, index: pd.DatetimeIndex, regime: pd.Series):
    current, start = None, None
    for t, r in regime.items():
        if current is None:
            current, start = r, t
        elif r != current:
            color = "green" if current == 1 else ("red" if current == -1 else "gray")
            alpha = 0.2 if color == "gray" else 0.5
            ax.axvspan(start, t, color=color, alpha=alpha, lw=0)
            current, start = r, t
    if current is not None and len(index):
        color = "green" if current == 1 else ("red" if current == -1 else "gray")
        alpha = 0.2 if color == "gray" else 0.5
        ax.axvspan(start, index[-1], color=color, alpha=alpha, lw=0)
def plot_price_with_regimes(sector_df: pd.DataFrame, ticker: str, sector_name: str):
    """
    sector_df: output of compute_regimes
    """
    sector_df = sector_df.dropna(subset=["z"])
    fig, ax = plt.subplots(figsize=(12, 7))
    np.log(sector_df["close"]).plot(ax=ax, label=f"${ticker.upper()} (Log Close)")
    # z-score on secondary axis
    ax2 = ax.twinx()
    sector_df["z"].clip(-3, 3).plot(ax=ax2, color="orange", alpha=0.7, label="FOMC Sentiment z-score")
    _shade_regimes(ax, sector_df.index, sector_df["regime"])
    ax.set_title(f"{sector_name} Sentiment Regimes vs ${ticker.upper()}")
    ax.set_ylabel(f"${ticker.upper()} (log)")
    ax2.set_ylabel("Sentiment z-score")
    h1, l1 = ax.get_legend_handles_labels()
    h2, l2 = ax2.get_legend_handles_labels()
    fig.legend(h1 + h2, l1 + l2, loc="lower center", bbox_to_anchor=(0.5, 0.93), ncol=2)
    plt.show()
# overall sentiment vs spy
overall = compute_regimes(
join_price_history("SPY", "Overall", sentiment_df, ohlc_data),
sector_name="Overall",
ema_span=20,
z_window=252,
z_threshold=1.5
)
plot_price_with_regimes(overall, ticker="SPY", sector_name="Overall")
# financials sentiment vs xlf
financials = compute_regimes(
join_price_history("XLF", "Financials", sentiment_df, ohlc_data),
sector_name="Financials",
ema_span=20,
z_window=252,
z_threshold=1.5
)
plot_price_with_regimes(financials, ticker="XLF", sector_name="Financials")
# energy sentiment vs xle
energy = compute_regimes(
join_price_history("XLE", "Energy", sentiment_df, ohlc_data),
sector_name="Energy",
ema_span=20,
z_window=252,
z_threshold=1.5
)
plot_price_with_regimes(energy, ticker="XLE", sector_name="Energy")
Conclusion
Policy language is often treated as something qualitative, subjective, and difficult to use systematically. This analysis shows that it can be turned into a structured, interpretable signal that is useful for regime classification. By combining historical sentiment scores with price data, we can build a transparent framework for identifying when the macro backdrop is becoming more supportive or more adverse.
The Empirical Markets API makes that workflow practical by exposing both historical sentiment scores and the rationale behind them in a format that is easy to query. If you already think in terms of market regimes, this kind of sentiment series is a good complement to price-based or macro-based indicators. And if you are skeptical of text-derived signals, that skepticism is exactly why an interpretable workflow matters: you can inspect the history, challenge the assumptions, and decide whether the signal earns a place in your process.
The information presented in this analysis is for educational and informational purposes only. It does not constitute financial, investment, or trading advice, and should not be interpreted as a recommendation to buy, sell, or hold any securities. The methods shown in this article are for analysis only and should not be used for live trading. Past performance does not indicate future results.