7 AI models. $100,000 each. Real market prices. One question: which model makes the best investment decisions?
Think of it like a fantasy football league, but for AI stock picking. Each AI model is a team manager running its own strategy using the same playbook and the same data. The only difference is the AI brain making the decisions.
Every week, each model independently analyzes the stock market, picks stocks, and places trades through a paper brokerage. It runs an 8-step analysis pipeline that works like a Monday morning investment meeting at a Wall Street firm: different specialists present their findings one after another, and the CIO makes the final call.
The experiment started on January 20, 2026. The goal: find out whether the model you choose actually matters when every other variable is held constant. Same prompts. Same tools. Same data. Same rules. Different brains. This dashboard tracks the results.
The Weekly Pipeline
8 STEPS
Every Monday at 9:45 AM Eastern, a cron job fires and kicks off the analysis for all models in parallel. Each model runs through this pipeline independently:
Step 1: The Economist Speaks
The Macro Agent checks interest rates, sector performance, commodity prices, and geopolitical news, then declares the market regime, such as "risk-on growth" or "late-cycle caution warranted." This regime constrains every decision downstream.
Step 2: The Scout Goes Hunting
The Screener Agent runs 4-10 stock screens with different lenses (quality, value, growth, defensive) and surfaces 25-30 candidates. Filters for $1B+ market cap, $5M+ daily volume, NYSE/NASDAQ only. Stocks that appear across multiple screens get priority.
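The filter and the multi-screen priority described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Candidate` shape, its field names, and the `shortlist` function are assumptions, and "$5M+ daily volume" is read here as dollar volume.

```typescript
// Hypothetical candidate shape; field names are assumptions for illustration.
interface Candidate {
  ticker: string;
  marketCapUsd: number;
  avgDailyVolumeUsd: number;
  exchange: string;
  screens: string[]; // names of the screens that surfaced this stock
}

const MIN_MARKET_CAP_USD = 1_000_000_000; // $1B+
const MIN_DAILY_VOLUME_USD = 5_000_000; // $5M+
const ALLOWED_EXCHANGES = new Set(["NYSE", "NASDAQ"]);

// Drop names that fail the size, liquidity, or exchange filter, then
// rank stocks that appeared in more screens ahead of the rest.
function shortlist(candidates: Candidate[], limit = 30): Candidate[] {
  return candidates
    .filter(
      (c) =>
        c.marketCapUsd >= MIN_MARKET_CAP_USD &&
        c.avgDailyVolumeUsd >= MIN_DAILY_VOLUME_USD &&
        ALLOWED_EXCHANGES.has(c.exchange),
    )
    .sort((a, b) => b.screens.length - a.screens.length)
    .slice(0, limit);
}
```

The default `limit` of 30 mirrors the upper end of the 25-30 candidate range; the real Screener Agent may choose its cutoff differently.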
Step 3: Portfolio State
The pipeline fetches the current portfolio (holdings, cash, equity) and the economic calendar. Existing positions get added to the analysis list even if they weren't screened — they might need to be sold.
Step 4: The Research Team Digs In
Six specialist agents work through the candidate list in two parallel waves. Wave 1: Fundamental + Valuation + Technical. Wave 2: Sentiment + Catalyst + Risk. Each scores every stock 0-100 independently. If one agent fails, the others keep going.
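One way to get "two parallel waves" where a failing agent does not stop the others is to catch errors per agent and run the waves back to back. The sketch below assumes a hypothetical `Agent` interface and records a failed agent as `null`; the actual pipeline may handle failures differently.

```typescript
// Hypothetical agent shape: returns a 0-100 score per ticker.
type Scores = Record<string, number>;
interface Agent {
  name: string;
  run: (tickers: string[]) => Promise<Scores>;
}

// Run all agents of one wave concurrently. A thrown error is caught per
// agent and recorded as null, so the rest of the wave still completes.
async function runWave(agents: Agent[], tickers: string[]): Promise<Map<string, Scores | null>> {
  const results = await Promise.all(
    agents.map(async (agent): Promise<[string, Scores | null]> => {
      try {
        return [agent.name, await agent.run(tickers)];
      } catch {
        return [agent.name, null];
      }
    }),
  );
  return new Map(results);
}

// Wave 2 starts only after wave 1 has settled.
async function runResearch(
  wave1: Agent[],
  wave2: Agent[],
  tickers: string[],
): Promise<Map<string, Scores | null>> {
  const merged = await runWave(wave1, tickers);
  const second = await runWave(wave2, tickers);
  second.forEach((scores, name) => merged.set(name, scores));
  return merged;
}
```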
Step 5: The CIO Decides
The Orchestrator combines all six scores into a composite ranking, applies macro adjustments and risk overrides, assigns A/B/C ratings, and produces buy/sell/hold instructions. Capital preservation overrides upside. If nothing qualifies, it holds cash.
Step 6: The Trader Builds Orders
The Constructor converts recommendations into exact trade orders — ticker, share count, dollar amount — delegating all portfolio math to dedicated calculation tools. It never does sizing math itself.
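The kind of calculation tool the Constructor could delegate to might look like the sketch below. The function name, its parameters, and the rounding to whole cents are all assumptions for illustration; the source does not describe the real tools' signatures.

```typescript
// Hypothetical sizing result; not the project's actual tool output.
interface SizedOrder {
  ticker: string;
  notionalUsd: number; // dollar amount to buy
  estimatedShares: number; // derived from the last price
}

// Convert a target portfolio weight into a dollar amount and share estimate.
function sizeBuy(
  ticker: string,
  equityUsd: number,
  targetWeight: number, // e.g. 0.25 for 25% of the portfolio
  lastPriceUsd: number,
): SizedOrder {
  if (lastPriceUsd <= 0) throw new Error("price must be positive");
  // Round the dollar amount to whole cents (an assumption).
  const notionalUsd = Math.round(equityUsd * targetWeight * 100) / 100;
  return { ticker, notionalUsd, estimatedShares: notionalUsd / lastPriceUsd };
}
```

Keeping this arithmetic in a deterministic function rather than in the model's output is the design choice the step describes: the LLM decides what to buy, and the tool decides how many dollars that is.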
Step 7: Orders Execute
Sells go first to free up cash, then buys. Full exits use Alpaca's close-position API. Buys use dollar amounts. Each model trades through its own isolated paper brokerage account.
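The sells-before-buys ordering can be sketched as a simple partition of the order list. The `Order` type below is a hypothetical shape for illustration and is not Alpaca's request schema.

```typescript
// Hypothetical order shape; not Alpaca's actual API payload.
type Order =
  | { side: "sell"; ticker: string; closePosition: boolean }
  | { side: "buy"; ticker: string; notionalUsd: number };

// Put every sell ahead of every buy, keeping the original relative
// order inside each group, so cash is freed before it is spent.
function executionSequence(orders: Order[]): Order[] {
  const sells = orders.filter((o) => o.side === "sell");
  const buys = orders.filter((o) => o.side === "buy");
  return [...sells, ...buys];
}
```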
Step 8: Results Published
Every agent's analysis, every trade, and every portfolio snapshot gets stored in Convex. The website pulls from this database to show the leaderboard and model detail pages in real time.
Agent Workflow
MULTI-AGENT DAG
How raw data becomes a trade order. The Macro Agent sets the regime, the Screener fans out to six parallel analysts, and the Orchestrator synthesizes everything into buy/sell decisions.
Macro Agent (The Economist): 6 tools
Screener Agent (The Scout): 1 tool
Sentiment Agent (The Mood Reader): 7 tools
Fundamental Agent (The Accountant): 6 tools
Valuation Agent (The Appraiser): 5 tools
Catalyst Agent (The Event Watcher): 7 tools
Risk Agent (The Risk Manager): 8 tools
Technical Analyst (The Chart Reader): 5 tools
Orchestrator Agent (The CIO)
Portfolio Constructor (The Trader): 4 tools
Order Execution (Alpaca API)
ENTRY · ANALYSIS · SYNTHESIS · EXECUTION
The Agents
10 SPECIALISTS
Each agent has a narrow job. The economist doesn't pick stocks, the risk manager doesn't care about momentum, and the trader doesn't second-guess the research team. They receive structured inputs, call specific data tools, and produce structured JSON.
MACRO · The Economist
Assesses the market environment before anyone looks at stocks. Interest rates, sector flows, commodities, geopolitical news. Declares the regime and constrains every decision downstream.
SCREENER · The Scout
Runs 4-10 stock screens with different lenses and surfaces 25-30 candidates. Filters for liquidity, market cap, and quality signals. Prefers stocks that pass multiple screens.
FUNDAMENTAL · The Accountant
Evaluates competitive moats, balance sheet strength, cash flow quality, profitability, and management alignment. A stock with deteriorating fundamentals gets caught here.
VALUATION · The Appraiser
Triangulates fair value using PEG, FCF yield, sector relative valuation, and analyst consensus. Builds bull, base, and bear case price targets. Prevents the system from overpaying.
TECHNICAL · The Chart Reader
Measures trend quality, momentum persistence, relative strength versus SPY, and risk/reward based on support and resistance levels. Identifies falling knives.
SENTIMENT · The Mood Reader
Weighs institutional positioning over headlines. Tracks insider buying, analyst upgrades, options market sentiment, and whether the crowd is getting too bullish or too bearish.
CATALYST · The Event Watcher
Maps upcoming earnings, FDA decisions, regulatory rulings, and product launches. An earnings report within 5 days caps the score at 60, and so does any binary event with an unclear outcome.
RISK · The Risk Manager
Assumes everything goes wrong. Quantifies downside, determines investability, and sets max position sizes. Score above 85 means auto-reject, no exceptions. Risk overrides conviction.
ORCHESTRATOR · The CIO
Pure synthesis. Computes composite scores, applies macro adjustments and risk overrides, assigns A/B/C ratings, and produces buy/sell/hold instructions. Does no data fetching.
CONSTRUCTOR · The Trader
Converts recommendations into exact trade orders. Delegates all portfolio math to calculation tools. Sells execute before buys. Full exits use close-position API to avoid fractional share issues.
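Two of the hard rules in the agent descriptions, the Catalyst cap at 60 and the Risk auto-reject above 85, are simple enough to sketch directly. The function and field names below are hypothetical, and treating "within 5 days" as 0 to 5 days inclusive is an assumption.

```typescript
const CATALYST_CAP = 60;
const RISK_AUTO_REJECT_ABOVE = 85;

// Hypothetical event context for one stock.
interface CatalystContext {
  daysToEarnings: number | null; // null when no earnings date is known
  unclearBinaryEvent: boolean;
}

// Cap the catalyst score when earnings are imminent or a binary event
// with an unclear outcome is pending; otherwise leave it unchanged.
function capCatalystScore(raw: number, ctx: CatalystContext): number {
  const earningsSoon =
    ctx.daysToEarnings !== null && ctx.daysToEarnings >= 0 && ctx.daysToEarnings <= 5;
  return earningsSoon || ctx.unclearBinaryEvent ? Math.min(raw, CATALYST_CAP) : raw;
}

// A risk score above 85 rejects the stock regardless of other scores.
function isAutoRejected(riskScore: number): boolean {
  return riskScore > RISK_AUTO_REJECT_ABOVE;
}
```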
The Models
7 COMPETITORS
Each model gets its own Alpaca paper trading account, its own MCP server connection, and its own results directory. Identical prompts, identical tools, identical starting capital. The LLM is the only variable.
Deepseek V4 Pro (DeepSeek)
Gemini 3.1 Pro Preview (Google)
GLM-5.1 (Zhipu AI)
GPT-5.5 (OpenAI)
Grok 4.3 (xAI)
Kimi K2.6 (Moonshot AI)
MiniMax M2.7 (MiniMax)
Portfolio Constraints
HARD LIMITS
Max Positions: 8
Min Invested: 85%
Max Single Position: 40%
Starting Capital: $100,000
The portfolio can hold at most 8 positions. At least 85% of capital must be invested, so the system can't park everything in cash. No single stock can exceed 40% of the portfolio. Sector clusters (tech, cyclical, defensive, financial) have their own exposure caps to prevent concentration.
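A compliance check over the three numeric limits could be sketched as below. The `Position` shape and function name are hypothetical, and sector caps are left out because the source does not state their values.

```typescript
// Hypothetical position shape for illustration.
interface Position {
  ticker: string;
  marketValueUsd: number;
}

const MAX_POSITIONS = 8;
const MIN_INVESTED = 0.85;
const MAX_SINGLE_POSITION = 0.4;

// Return a human-readable message for each violated limit.
function constraintViolations(positions: Position[], cashUsd: number): string[] {
  const invested = positions.reduce((sum, p) => sum + p.marketValueUsd, 0);
  const equity = invested + cashUsd;
  const violations: string[] = [];
  if (equity <= 0) return ["portfolio has no equity"];
  if (positions.length > MAX_POSITIONS) {
    violations.push(`holds ${positions.length} positions, max is ${MAX_POSITIONS}`);
  }
  if (invested / equity < MIN_INVESTED) {
    violations.push("less than 85% of capital is invested");
  }
  for (const p of positions) {
    if (p.marketValueUsd / equity > MAX_SINGLE_POSITION) {
      violations.push(`${p.ticker} exceeds 40% of the portfolio`);
    }
  }
  return violations;
}
```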
The Scoring System
COMPOSITE
Six agents each score every stock on a 0-100 scale. The Orchestrator combines them into a single weighted composite in which risk is inverted: a stock with a risk score of 80 enters the formula as (100 - 80) = 20 before its weight is applied, so higher risk makes the stock less desirable. Fundamental, valuation, and inverted risk carry the most weight (20% each) because business quality, price discipline, and capital preservation are the foundation.
scoring-formula.ts
// Six agents score each stock 0-100
// Risk is inverted: higher risk = lower contribution
composite = fundamental * 0.20   // business quality
          + valuation * 0.20     // price discipline
          + (100 - risk) * 0.20  // capital preservation
          + technical * 0.15     // trend & momentum
          + catalyst * 0.15      // event timing
          + sentiment * 0.10     // crowd positioning

// Rating tiers
// 80+   → A (Strong Buy)  max 40% position
// 65-79 → B (Buy)         max 25% position
// < 65  → C (Pass)        no buy
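For readers who want to run the formula, here is a minimal typed version using the same weights and tiers. The type and function names are illustrative, and one assumption is made: a non-integer composite between 79 and 80 is treated as a B.

```typescript
interface AgentScores {
  fundamental: number;
  valuation: number;
  risk: number; // higher means riskier; inverted below
  technical: number;
  catalyst: number;
  sentiment: number;
}

// Weighted composite with the weights from scoring-formula.ts.
function compositeScore(s: AgentScores): number {
  return (
    s.fundamental * 0.2 +
    s.valuation * 0.2 +
    (100 - s.risk) * 0.2 +
    s.technical * 0.15 +
    s.catalyst * 0.15 +
    s.sentiment * 0.1
  );
}

interface Rating {
  grade: "A" | "B" | "C";
  maxPositionPct: number; // 0 means no buy
}

// Map a composite score to its rating tier and position cap.
function rate(composite: number): Rating {
  if (composite >= 80) return { grade: "A", maxPositionPct: 40 };
  if (composite >= 65) return { grade: "B", maxPositionPct: 25 };
  return { grade: "C", maxPositionPct: 0 };
}
```

Because the six weights sum to 1.0, the composite stays on the same 0-100 scale as the individual scores.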
The Data Backbone
32 TOOLS / 10 CATEGORIES
Agents don't browse the web or read raw data feeds. They call structured tools through the Model Context Protocol. The tool server — called Parzival — handles retries, rate limiting, circuit breaking, and batch processing so agents don't have to. Data comes from Alpaca, Yahoo Finance, FRED, and Tavily.
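As an illustration of the retry handling a tool server like this takes off the agents' hands, a generic retry wrapper with exponential backoff might look like the following. It is a sketch, not Parzival's implementation: the function name, attempt count, and delays are assumptions.

```typescript
// Retry an async call up to maxAttempts times, doubling the wait
// between attempts. Rethrows the last error if every attempt fails.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```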
Paper trading only. Every dollar is simulated. The Alpaca paper trading API mimics real market conditions, but no actual money moves.
Separate accounts. Each model has its own isolated brokerage account. One model's bad week can't affect another.
Compliance checks. No single stock can exceed 40% of the portfolio. At least 85% of capital must be invested. Stocks flagged as "not investable" by the Risk Agent are rejected regardless of how good their other scores look.
Circuit breakers. If the Macro Agent detects dangerous conditions, the system restricts or halts new purchases. In EMERGENCY mode, only sells are allowed.
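The EMERGENCY rule can be sketched as a filter over the Orchestrator's instructions. Only the EMERGENCY mode is named in the text; the `Instruction` shape and function name are hypothetical, and the "restricts" behavior for less severe conditions is not modeled because the source does not specify it.

```typescript
// Hypothetical instruction shape for illustration.
interface Instruction {
  ticker: string;
  side: "buy" | "sell";
}

// In EMERGENCY mode only sells pass through; any other mode is left
// untouched in this sketch.
function applyCircuitBreaker(mode: string, instructions: Instruction[]): Instruction[] {
  return mode === "EMERGENCY"
    ? instructions.filter((i) => i.side === "sell")
    : instructions;
}
```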
SIMULATED PERFORMANCE · EDUCATIONAL PURPOSES ONLY · NOT FINANCIAL ADVICE