AI Practice Management · Legal Ethics · Thought Leadership
Why AI Performance Data Now Belongs On The Managing Partner’s Desk
By Jeff Howell, Esq. · AI Legal Strategist
The Bottom Line
The Mercor APEX benchmark shows that AI is already capable of producing substantive associate-level work. At the same time, accuracy and consistency vary widely between models. Some tools help a law firm produce stronger work product; other tools quietly introduce risk. Managing partners now need three competencies to use AI in a way that is ethical, safe, and strategically aligned with firm goals: Tool Literacy, Verification Protocols, and Strategic Governance. These competencies determine whether AI strengthens the firm or endangers it.
APEX: From Hype To Measurable Performance
This article uses relevant findings from the AI Productivity Index (APEX) published by Mercor. APEX evaluates how leading AI models perform on realistic tasks in law, investment banking, consulting, and medicine. The full benchmark can be found here: https://www.mercor.com/apex/.
APEX evaluates associate-level work. The tasks require structured reasoning, multi-hour analysis, drafting, and complex written advice. These are tasks that real lawyers perform, not simple multiple-choice questions or trivia prompts.
For law firm leaders, the message is straightforward. AI is now a direct influence on legal work product quality, risk, and client perception. The question has shifted from whether firms should use AI to what level of competency partners must have in order to use AI responsibly and competitively.
Based on the APEX data and the ethical duty of technology competence, three competencies are required at the managing partner level:
- Tool Literacy to understand what AI can and cannot do in legal tasks.
- Verification Protocols to ensure AI supports the work instead of silently damaging accuracy.
- Strategic Governance to select the right models based on documented performance differences.
Competency 1: Tool Literacy
According to APEX, even the highest-performing models do not reach expert human performance. The top model in the benchmark, GPT-5 (Thinking High), achieved a mean score of 64.2 percent.
Some models perform at a much lower level. The lowest-scoring model in APEX, Phi-4 Multimodal, produced a mean score of 20.7 percent. The difference between these two models is 43.5 percentage points (64.2 minus 20.7 equals 43.5).
For legal tasks specifically:
- The average performance across models was approximately 56.9 percent.
- The top performance in the law domain was 70.5 percent, achieved by GPT-5 (Thinking High).
These numbers show that AI is neither weak nor fully reliable. AI is capable enough to meaningfully shape legal work, yet fallible enough that every firm must understand its limits. Tool Literacy for managing partners means understanding those limits clearly and teaching lawyers to work alongside AI rather than over-rely on it.
AI is now strong enough to help your associates and still imperfect enough to threaten your accuracy if you use it without discipline.
– Jeff Howell, Esq.
Tool Literacy includes:
- Understanding that different models have very different profiles and strengths.
- Recognizing that APEX tests multi-hour legal tasks, not superficial chat exchanges.
- Knowing that high-performing models still produce errors that map directly to malpractice risk.
- Tying AI capability to your duty of competence and supervision, not to marketing language.
Competency 2: Verification Protocols
APEX highlights that AI models are not deterministic. The same model can produce different scores on the same prompt across different runs. Across all models, the average spread between a model's lowest and highest scores over three runs on the same task was 11.9 percentage points.
This means that even when prompts remain identical, outputs can vary. Sometimes the model produces a strong answer. Other times it misses key reasoning steps or citations. Firms that do not build verification processes allow these silent failures to travel into filings, motions, and client communications.
Effective Verification Protocols for a law firm should include:
- Use-case limits that define where AI may assist and where human drafting remains primary.
- Mandatory review by a supervising attorney before AI-assisted work leaves the firm.
- Citation checks inside your research platforms for any case or statute generated by AI.
- Usage logs that capture which matters, which models, and which reviewers were involved (a minimal sketch appears at the end of this section).
- Alignment with bar guidance on confidentiality, supervision, and technology risk.
The goal is not to eliminate AI errors. The goal is to eliminate unobserved AI errors that slip into client work.
– Jeff Howell, Esq., Founder, Lex Wire Journal
When you know from APEX that models can swing more than ten points across runs, you can justify verification as a governance requirement, not as an optional best practice.
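For firms whose IT or knowledge-management staff want a concrete starting point for the usage-log item above, the sketch below shows one possible shape for a per-matter AI usage log entry, written in Python. It is illustrative only; every field name (matter_id, model_name, reviewing_attorney, and so on) is an assumption chosen for this example, not a standard drawn from APEX or from any bar guidance.

```python
from dataclasses import dataclass, asdict
from datetime import date
from pathlib import Path
import csv

# Illustrative sketch only: field names are assumptions for this example,
# not a standard taken from APEX or from any bar guidance.
@dataclass
class AIUsageLogEntry:
    matter_id: str            # internal client-matter number
    model_name: str           # specific model and version used
    task_description: str     # what the model was asked to do
    reviewing_attorney: str   # supervising lawyer who verified the output
    citations_verified: bool  # whether AI-generated citations were checked
    usage_date: str           # ISO date the tool was used

# Example: record one AI-assisted task and append it to a running CSV log.
entry = AIUsageLogEntry(
    matter_id="2025-0147",
    model_name="gpt-5-thinking-high",
    task_description="First-pass summary of deposition transcript",
    reviewing_attorney="J. Howell",
    citations_verified=True,
    usage_date=date.today().isoformat(),
)

log_path = Path("ai_usage_log.csv")
write_header = not log_path.exists() or log_path.stat().st_size == 0

with log_path.open("a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(asdict(entry).keys()))
    if write_header:
        writer.writeheader()  # header row only when the log file is new
    writer.writerow(asdict(entry))
```

Whatever form the log takes, a spreadsheet maintained by staff works just as well as a script. The point is that these fields exist and are filled in for every matter where AI assisted the work.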
Competency 3: Strategic Governance
APEX shows major gaps between model families. Closed-source models produced an average score of 55.2 percent. Open-source models produced an average of 45.8 percent.
At the model level:
- Best overall model: GPT-5 (Thinking High) at 64.2 percent.
- Best law domain model: GPT-5 (Thinking High) at 70.5 percent.
- Lowest model: Phi-4 Multimodal at 20.7 percent.
The difference between 64.2 and 20.7 is a 43.5-point performance gap. That is the difference between a model that can be trusted as a useful junior associate and a model that may regularly miss basic criteria on complex matters.
Strategic Governance means:
- Selecting models based on benchmarked performance rather than vendor promises.
- Mapping which models are best for which practice areas based on domain-specific results.
- Considering open-source versus closed-source trade-offs in light of performance, security, and cost.
- Updating your model choices as new benchmarks and model releases arrive.
- Treating model selection as a leadership and risk decision, not only an IT decision.
Firm Assessment Checklist
Use this checklist to evaluate your current AI readiness in partner meetings or executive sessions.
Tool Literacy
- Partners can explain in plain language the strengths and weaknesses of the models they're using.
- The firm maintains a documented list of which models are used and for which workflows.
- Associates receive training on when to rely on AI and when not to.
- Internal guidance explicitly connects AI use to the ethical duties of competence and supervision.
Verification Protocols
- All AI-assisted work product is reviewed by a responsible lawyer before leaving the firm.
- High-risk matters have written limits on how AI may be used.
- All citations generated by AI are manually verified in Westlaw, Lexis, or internal resources.
- AI usage is logged per matter, creating a clear audit trail of which tools influenced which work.
Strategic Governance
- Benchmarks like APEX are referenced when evaluating or changing AI vendors.
- Domain-specific performance is considered when matching models to practice areas.
- The AI vendor and model stack is reviewed at least once per year.
- A plan exists for what happens if a model is deprecated, loses performance, or becomes a regulatory concern.
From Insight To Implementation
Most firms do not need a full-time Chief AI Officer yet. They do need someone who can translate the current state of AI tools into real policies, training, and workflows that align with ethics and strategy. A fractional AI practice management function gives the firm leadership-level guidance without adding a full-time role, and helps ensure that AI improves client service instead of silently increasing risk.
About the Author
Jeff Howell, Esq., is a dual-licensed attorney and AI legal strategist who helps law firms build structured authority in an era of AI-driven search and discovery, aligning AI tools with ethical obligations to minimize risk and manage compliance.
