AI Invest Arena

$ cat README.md

The Premise

README

7 AI models. $100,000 each. Real market prices. One question: which model makes the best investment decisions?

Think of it like a fantasy football league, but for AI stock picking. Each AI model is a team manager running its own strategy using the same playbook and the same data. The only difference is the AI brain making the decisions.

Every week, each model independently analyzes the stock market, picks stocks, and places trades through a paper brokerage. It runs an 8-step analysis pipeline that works like a Monday morning investment meeting at a Wall Street firm: different specialists present their findings one after another, and the CIO makes the final call.

The experiment started on January 20, 2026. The goal: find out whether the model you choose actually matters when every other variable is held constant. Same prompts. Same tools. Same data. Same rules. Different brains. This dashboard tracks the results.

The Weekly Pipeline

8 STEPS

Every Monday at 9:45 AM Eastern, a cron job fires and kicks off the analysis for all models in parallel. Each model runs through this pipeline independently:

Step 1: The Economist Speaks

The Macro Agent checks interest rates, sector performance, commodity prices, and geopolitical news, then declares the market regime, for example "risk-on growth" or "late-cycle caution warranted." This constrains every decision downstream.

Step 2: The Scout Goes Hunting

The Screener Agent runs 4-10 stock screens with different lenses (quality, value, growth, defensive) and surfaces 25-30 candidates. Filters for $1B+ market cap, $5M+ daily volume, NYSE/NASDAQ only. Stocks that appear across multiple screens get priority.
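The filtering and multi-screen priority described in this step might look roughly like the sketch below. The thresholds come from the text; the `Candidate` shape, its field names, and `rankCandidates` are hypothetical and only illustrate the idea, not the Screener Agent's actual code.

```typescript
// Hypothetical shape of one screen hit; field names are illustrative only.
interface Candidate {
  symbol: string;
  marketCapUsd: number;
  avgDailyDollarVolume: number;
  exchange: string;
}

const MIN_MARKET_CAP = 1_000_000_000; // $1B+ market cap
const MIN_DAILY_VOLUME = 5_000_000;   // $5M+ daily volume
const ALLOWED_EXCHANGES = new Set(["NYSE", "NASDAQ"]);

// Merge hits from several screens, drop anything failing the filters,
// and rank symbols by how many screens surfaced them (ties: alphabetical).
function rankCandidates(screens: Candidate[][], limit = 30): string[] {
  const hits = new Map<string, number>();
  for (const screen of screens) {
    const seen = new Set<string>(); // count a symbol at most once per screen
    for (const c of screen) {
      const passes =
        c.marketCapUsd >= MIN_MARKET_CAP &&
        c.avgDailyDollarVolume >= MIN_DAILY_VOLUME &&
        ALLOWED_EXCHANGES.has(c.exchange);
      if (!passes || seen.has(c.symbol)) continue;
      seen.add(c.symbol);
      hits.set(c.symbol, (hits.get(c.symbol) ?? 0) + 1);
    }
  }
  return Array.from(hits.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map((entry) => entry[0]);
}
```

Counting each symbol once per screen keeps a single noisy screen from inflating a stock's priority.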

Step 3: Portfolio State

The pipeline fetches the current portfolio (holdings, cash, equity) and the economic calendar. Existing positions get added to the analysis list even if they weren't screened — they might need to be sold.

Step 4: The Research Team Digs In

Six specialist agents work through the candidate list in two parallel waves. Wave 1: Fundamental + Valuation + Technical. Wave 2: Sentiment + Catalyst + Risk. Each scores every stock 0-100 independently. If one agent fails, the others keep going.
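A minimal sketch of the two-wave, failure-tolerant fan-out, assuming each analyst is an async function that returns a score per symbol. The `Analyst` signature, `runWave`, and `runResearch` are hypothetical names; the real pipeline's interfaces are not shown on this page.

```typescript
// Scores from one analyst, keyed by ticker symbol (0-100 each).
type Scores = Record<string, number>;
// Hypothetical analyst signature: takes the candidate list, returns scores.
type Analyst = (symbols: string[]) => Promise<Scores>;

// Run one wave concurrently. Promise.allSettled never rejects, so a failed
// analyst is recorded as null while its siblings still deliver results.
async function runWave(
  wave: Record<string, Analyst>,
  symbols: string[],
): Promise<Record<string, Scores | null>> {
  const names = Object.keys(wave);
  const settled = await Promise.allSettled(names.map((name) => wave[name](symbols)));
  const results: Record<string, Scores | null> = {};
  settled.forEach((outcome, i) => {
    results[names[i]] = outcome.status === "fulfilled" ? outcome.value : null;
  });
  return results;
}

// Waves run one after another; the analysts inside each wave run in parallel.
async function runResearch(
  waves: Record<string, Analyst>[],
  symbols: string[],
): Promise<Record<string, Scores | null>> {
  const all: Record<string, Scores | null> = {};
  for (const wave of waves) {
    Object.assign(all, await runWave(wave, symbols));
  }
  return all;
}
```

`Promise.allSettled` is used instead of `Promise.all` because the latter would reject as soon as any single analyst fails, which contradicts "the others keep going."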

Step 5: The CIO Decides

The Orchestrator combines all six scores into a composite ranking, applies macro adjustments and risk overrides, assigns A/B/C ratings, and produces buy/sell/hold instructions. Capital preservation overrides upside. If nothing qualifies, it holds cash.

Step 6: The Trader Builds Orders

The Constructor converts recommendations into exact trade orders — ticker, share count, dollar amount — delegating all portfolio math to dedicated calculation tools. It never does sizing math itself.

Step 7: Orders Execute

Sells go first to free up cash, then buys. Full exits use Alpaca's close-position API. Buys use dollar amounts. Each model trades through its own isolated paper brokerage account.
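The sells-before-buys ordering can be sketched as a stable partition of the order list. The `TradeOrder` shape below is a hypothetical illustration; it is not the payload the pipeline actually sends to Alpaca.

```typescript
// Hypothetical order shape for illustration only.
interface TradeOrder {
  side: "buy" | "sell";
  symbol: string;
  notionalUsd?: number;    // buys are sized in dollars
  closePosition?: boolean; // full exits go through the close-position API
}

// Put every sell ahead of every buy, keeping the original order within each group.
function sequenceOrders(orders: TradeOrder[]): TradeOrder[] {
  const sells = orders.filter((o) => o.side === "sell");
  const buys = orders.filter((o) => o.side === "buy");
  return sells.concat(buys);
}
```

Running sells first means the cash they free up is available when the dollar-denominated buys are submitted.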

Step 8: Results Published

Every agent's analysis, every trade, and every portfolio snapshot gets stored in Convex. The website pulls from this database to show the leaderboard and model detail pages in real time.

Agent Workflow

MULTI-AGENT DAG

How raw data becomes a trade order. The Macro Agent sets the regime, the Screener's candidates fan out to six analysts running in two parallel waves, and the Orchestrator synthesizes everything into buy/sell decisions.

Macro Agent (The Economist): 6 tools

Screener Agent (The Scout): 1 tool

Sentiment Agent (The Mood Reader): 7 tools

Fundamental Agent (The Accountant): 6 tools

Valuation Agent (The Appraiser): 5 tools

Catalyst Agent (The Event Watcher): 7 tools

Risk Agent (The Risk Manager): 8 tools

Technical Analyst (The Chart Reader): 5 tools

Orchestrator Agent (The CIO)

Portfolio Constructor (The Trader): 4 tools

Order Execution: Alpaca API

Stages: ENTRY · ANALYSIS · SYNTHESIS · EXECUTION

The Agents

10 SPECIALISTS

Each agent has a narrow job. The economist doesn't pick stocks, the risk manager doesn't care about momentum, and the trader doesn't second-guess the research team. They receive structured inputs, call specific data tools, and produce structured JSON.

MACRO · The Economist

Assesses the market environment before anyone looks at stocks. Interest rates, sector flows, commodities, geopolitical news. Declares the regime and constrains every decision downstream.

SCREENER · The Scout

Runs 4-10 stock screens with different lenses and surfaces 25-30 candidates. Filters for liquidity, market cap, and quality signals. Prefers stocks that pass multiple screens.

FUNDAMENTAL · The Accountant

Evaluates competitive moats, balance sheet strength, cash flow quality, profitability, and management alignment. A stock with deteriorating fundamentals gets caught here.

VALUATION · The Appraiser

Triangulates fair value using PEG, FCF yield, sector relative valuation, and analyst consensus. Builds bull, base, and bear case price targets. Prevents the system from overpaying.

TECHNICAL · The Chart Reader

Measures trend quality, momentum persistence, relative strength versus SPY, and risk/reward based on support and resistance levels. Identifies falling knives.

SENTIMENT · The Mood Reader

Weighs institutional positioning over headlines. Tracks insider buying, analyst upgrades, options market sentiment, and whether the crowd is getting too bullish or too bearish.

CATALYST · The Event Watcher

Maps upcoming earnings, FDA decisions, regulatory rulings, and product launches. Earnings within 5 days cap the score at 60, and so do binary events with unclear outcomes.

RISK · The Risk Manager

Assumes everything goes wrong. Quantifies downside, determines investability, and sets max position sizes. Score above 85 means auto-reject, no exceptions. Risk overrides conviction.

ORCHESTRATOR · The CIO

Pure synthesis. Computes composite scores, applies macro adjustments and risk overrides, assigns A/B/C ratings, and produces buy/sell/hold instructions. Does no data fetching.

CONSTRUCTOR · The Trader

Converts recommendations into exact trade orders. Delegates all portfolio math to calculation tools. Sells execute before buys. Full exits use the close-position API to avoid fractional share issues.
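Two of the agent descriptions above state hard numeric rules: the Catalyst cap at 60 and the Risk auto-reject above 85. A minimal sketch of both follows, assuming "within 5 days" is inclusive and counts from today; the type and function names are hypothetical.

```typescript
const CATALYST_CAP = 60;      // ceiling near earnings or unclear binary events
const RISK_REJECT_ABOVE = 85; // risk scores above this are auto-rejected

// Hypothetical input shape for the Catalyst rule.
interface CatalystInput {
  rawScore: number;              // 0-100 before any cap
  daysToEarnings: number | null; // null when no earnings date is known
  unclearBinaryEvent: boolean;   // e.g. a pending ruling with no clear lean
}

// Apply the cap only when one of the two trigger conditions holds.
function capCatalystScore(input: CatalystInput): number {
  const earningsSoon =
    input.daysToEarnings !== null &&
    input.daysToEarnings >= 0 &&
    input.daysToEarnings <= 5; // assumption: the 5-day window is inclusive
  return earningsSoon || input.unclearBinaryEvent
    ? Math.min(input.rawScore, CATALYST_CAP)
    : input.rawScore;
}

// "Score above 85 means auto-reject": strictly greater than 85.
function isAutoRejected(riskScore: number): boolean {
  return riskScore > RISK_REJECT_ABOVE;
}
```

The cap only ever lowers a score: a raw Catalyst score of 45 with earnings tomorrow stays at 45.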

The Models

7 COMPETITORS

Each model gets its own Alpaca paper trading account, its own MCP server connection, and its own results directory. Identical prompts, identical tools, identical starting capital. The LLM is the only variable.

DeepSeek V4 Pro (DeepSeek)

Gemini 3.1 Pro Preview (Google)

GLM-5.1 (Zhipu AI)

GPT-5.5 (OpenAI)

Grok 4.3 (xAI)

Kimi K2.6 (Moonshot AI)

MiniMax M2.7 (MiniMax)

Portfolio Constraints

HARD LIMITS

Max Positions: 8

Min Invested: 85%

Max Single Position: 40%

Starting Capital: $100,000

The portfolio can hold at most 8 positions. At least 85% of capital must be invested — the system can't park everything in cash. No single stock can exceed 40%. Sector clusters (tech, cyclical, defensive, financial) have their own exposure caps to prevent concentration.
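The three numeric limits can be checked with a small validator like the sketch below. `checkPortfolio` and its input shape are hypothetical, and the sector-cluster caps are left out because their values are not given here.

```typescript
const LIMITS = { maxPositions: 8, minInvestedPct: 85, maxSinglePositionPct: 40 };

// positions maps ticker -> market value in dollars; equity is cash plus holdings.
// Returns a list of human-readable violations (empty when compliant).
function checkPortfolio(positions: Record<string, number>, equity: number): string[] {
  const violations: string[] = [];
  const symbols = Object.keys(positions);
  if (symbols.length > LIMITS.maxPositions) {
    violations.push(`holds ${symbols.length} positions, max is ${LIMITS.maxPositions}`);
  }
  let invested = 0;
  for (const symbol of symbols) {
    const value = positions[symbol];
    invested += value;
    // Cross-multiply instead of dividing, to avoid floating-point noise at the boundary.
    if (value * 100 > equity * LIMITS.maxSinglePositionPct) {
      violations.push(`${symbol} exceeds ${LIMITS.maxSinglePositionPct}% of equity`);
    }
  }
  if (invested * 100 < equity * LIMITS.minInvestedPct) {
    violations.push(`less than ${LIMITS.minInvestedPct}% of equity is invested`);
  }
  return violations;
}
```

For example, a $100,000 account holding $45,000 of one stock and $20,000 of another breaks two rules: the first position is over 40%, and only 65% of capital is invested.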

The Scoring System

COMPOSITE

Six agents each score every stock on a 0-100 scale. The Orchestrator combines them into a single weighted composite in which risk is inverted: a stock with a risk score of 80 enters the formula as (100 - 80) = 20, which at a 0.20 weight adds 4 points to the composite. Higher risk makes the stock less desirable. Fundamental and valuation carry the top weight (0.20 each, tied with inverted risk) because business quality and price discipline are the foundation.

scoring-formula.ts

// Six agents score each stock 0-100
// Risk is inverted: higher risk = lower contribution
composite =
    fundamental   * 0.20   // business quality
  + valuation     * 0.20   // price discipline
  + (100 - risk)  * 0.20   // capital preservation
  + technical     * 0.15   // trend & momentum
  + catalyst      * 0.15   // event timing
  + sentiment     * 0.10   // crowd positioning

// Rating tiers
//   80+   → A (Strong Buy)    max 40% position
//   65-79 → B (Buy)           max 25% position
//   < 65  → C (Pass)          no buy
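The formula above can be written as a small runnable function. This is a sketch: `AgentScores`, `compositeScore`, and `rate` are hypothetical names, weights are expressed in percent so integer scores give exact results, and composites that fall between 79 and 80 are assumed to rate B. It does not include the Orchestrator's macro adjustments or risk overrides.

```typescript
// One stock's six agent scores, each on a 0-100 scale.
interface AgentScores {
  fundamental: number;
  valuation: number;
  risk: number;
  technical: number;
  catalyst: number;
  sentiment: number;
}

// Weights in percent (they sum to 100), mirroring the formula above.
const WEIGHTS = { fundamental: 20, valuation: 20, risk: 20, technical: 15, catalyst: 15, sentiment: 10 };

function compositeScore(s: AgentScores): number {
  const weighted =
    s.fundamental * WEIGHTS.fundamental +
    s.valuation * WEIGHTS.valuation +
    (100 - s.risk) * WEIGHTS.risk + // risk is inverted
    s.technical * WEIGHTS.technical +
    s.catalyst * WEIGHTS.catalyst +
    s.sentiment * WEIGHTS.sentiment;
  return weighted / 100;
}

type Rating = "A" | "B" | "C";

// Map a composite to its rating tier and maximum position size.
function rate(composite: number): { rating: Rating; maxPositionPct: number } {
  if (composite >= 80) return { rating: "A", maxPositionPct: 40 };
  if (composite >= 65) return { rating: "B", maxPositionPct: 25 };
  return { rating: "C", maxPositionPct: 0 }; // Pass: no buy
}
```

As a worked case, scores of fundamental 90, valuation 70, risk 30, technical 80, catalyst 60, and sentiment 50 give 18 + 14 + 14 + 12 + 9 + 5 = 72, which rates B with a 25% position cap.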

The Data Backbone

32 TOOLS / 10 CATEGORIES

Agents don't browse the web or read raw data feeds. They call structured tools through the Model Context Protocol. The tool server — called Parzival — handles retries, rate limiting, circuit breaking, and batch processing so agents don't have to. Data comes from Alpaca, Yahoo Finance, FRED, and Tavily.

get_market_indices — SPY, QQQ, DIA, IWM, VIX

get_sector_performance — 11 GICS sector ETFs

get_interest_rates — Treasury yields, Fed Funds, yield curve

get_commodity_currency_data — gold, oil, dollar

get_economic_calendar — upcoming releases

get_correlation_matrix — cross-stock correlation

get_stock_details — prices, quotes, volume (up to 200 symbols)

get_company_profile — business, sector, executives

get_financial_statements — income, balance sheet, cash flow

get_peer_group — trading and business peers
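Of the reliability features the tool server is said to provide, circuit breaking is the least self-explanatory. The sketch below shows the general pattern only; it is not Parzival's implementation, and the class name, threshold, and cooldown are assumptions chosen for illustration.

```typescript
// Generic circuit breaker: after `threshold` consecutive failures, refuse
// calls until `cooldownMs` has passed, then let a probe call through.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 3, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // May a call be attempted at time `now` (milliseconds)?
  canRequest(now: number): boolean {
    if (this.openedAt === null) return true;       // closed: traffic flows
    return now - this.openedAt >= this.cooldownMs; // open until the cooldown elapses
  }

  // A successful call resets the failure count and closes the circuit.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // A failed call counts toward the threshold and trips (or re-trips) the breaker.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```

A caller would check `canRequest(Date.now())` before hitting a data provider and report the outcome afterwards, so a provider that keeps failing is not hammered with further requests during the cooldown.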

Safety Rails

GUARDRAILS

Paper trading only. Every dollar is simulated. The Alpaca paper trading API mimics real market conditions, but no actual money moves.

Separate accounts. Each model has its own isolated brokerage account. One model's bad week can't affect another.

Compliance checks. No single stock can exceed 40% of the portfolio. At least 85% of capital must be invested. Stocks flagged as "not investable" by the Risk Agent are rejected regardless of how good their other scores look.

Circuit breakers. If the Macro Agent detects dangerous conditions, the system restricts or halts new purchases. In EMERGENCY mode, only sells are allowed.
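The last two rails act as filters on proposed orders, which can be sketched as follows. Only the EMERGENCY mode is named in the text, so the `MacroMode` type below models just that mode plus a hypothetical `NORMAL`; `ProposedOrder` and `applySafetyRails` are also hypothetical names.

```typescript
// Only EMERGENCY is named in the text; NORMAL is a placeholder for "no restriction".
type MacroMode = "NORMAL" | "EMERGENCY";

// Hypothetical minimal order shape.
interface ProposedOrder {
  side: "buy" | "sell";
  symbol: string;
}

// Keep all sells; drop buys in EMERGENCY mode or for symbols the Risk Agent
// flagged as not investable.
function applySafetyRails(
  orders: ProposedOrder[],
  mode: MacroMode,
  notInvestable: Set<string>,
): ProposedOrder[] {
  return orders.filter((order) => {
    if (order.side === "sell") return true;      // sells are always allowed
    if (mode === "EMERGENCY") return false;      // EMERGENCY: only sells
    return !notInvestable.has(order.symbol);     // Risk Agent veto on buys
  });
}
```

Because the veto applies only to buys, a holding that later becomes "not investable" can still be sold.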

FAQ

QUESTIONS

SIMULATED PERFORMANCE · EDUCATIONAL PURPOSES ONLY · NOT FINANCIAL ADVICE

About 1ROK AI Portfolio Competition


Is this real money?

How often do models trade?

Can I use this as investment advice?

What data do the agents see?

Is the comparison fair?

What happens when something fails?

What are circuit breakers?