
Go-to-market filter: “Data Advantage × Distribution Advantage”

  • Writer: Anmol Shantha Ram
  • May 17
  • 3 min read

“Look for two things: (1) a data advantage that no one else has, and (2) a distribution advantage the customer already owns.

If you’re missing either, think twice before building or fine-tuning a custom LLM.”


This week we had a candid conversation and reality check with the superbly impressive Ali Ghodsi, CEO of Databricks, on the challenges and opportunities defining the future of AI in business.


1. Large-scale model progress has hit a “soft wall”

  • Scaling laws kept delivering exactly the quality increases researchers predicted—until ~mid-2023.

  • Every major lab can still squeeze small gains, but nobody wants to ship a true next-generation model (“4 → 4.1 → 3.7…”) because the jump is no longer dramatic.

  • Without inference-time compute tricks (e.g., “reasoning” / tree-of-thought / MoE routing at decode time) the narrative might already feel like a burst bubble.

  • Databricks’ own DBRX briefly led the open-source charts, proving how fast the window of leadership now closes.


Implication: Breakthroughs over the next 18–24 months are more likely to come from novel training paradigms, retrieval, or tool-use than from brute-force pre-training runs.


2. Inference-time reasoning helps, but it doesn’t generalise

  • Inference-time techniques such as chain-of-thought reasoning in models like OpenAI’s GPT-4o or Anthropic’s Claude 3.5 mostly guide the model to find an answer; they don’t make the base weights smarter.

Implication: For enterprises this means good latency/quality trade-offs on well-trodden tasks, but not yet a silver bullet for domain-specific reliability.


3. Enterprises still struggle with two mundane problems

  1. Data semantics – Nobody has a complete, machine-readable definition of what every column, acronym, or metric means. Even top-tier reasoning models add almost no lift when the semantic layer is missing.

  2. Evaluation and reliability – The industry lacks robust, domain-specific evals. Public leaderboards are “p-hacked”; enterprises need bespoke yardsticks to trust answers that can cost or save millions.
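To make the semantic-layer gap concrete, here is a minimal sketch (with invented column names and definitions) of what a machine-readable semantic layer could look like, plus a check for the gaps that leave a model guessing:

```python
# Hypothetical semantic layer for a sales table. Column names and definitions
# are invented for illustration; a real layer would cover every column, acronym
# and metric the business uses.
SEMANTIC_LAYER = {
    "arr": "Annual recurring revenue in USD, summed over active subscriptions.",
    "acv": "Annual contract value in USD at signing; excludes usage overages.",
    "churned_at": "Date the customer's last subscription lapsed (NULL if active).",
}

def undefined_columns(table_columns, layer=SEMANTIC_LAYER):
    """Columns a model would have to guess at: no machine-readable meaning."""
    return [c for c in table_columns if c not in layer]
```

Any column `undefined_columns` flags is a semantic gap that even a top-tier reasoning model will fill with a guess.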


4. Databricks’ dual AI strategy

  1. “Data Intelligence” Assistant – natural-language interface that answers questions over a company’s numerical data (English → SQL/Python → viz).

    Progress depends far more on building semantic context offline than on swapping in bigger LLMs.

  2. Custom-Model Factory – a services + platform motion that:

    • Ingests proprietary documents.

    • Generates large volumes of synthetic data to compensate for enterprises’ inevitable data sparsity (15 T tokens ≫ any corporate corpus).

    • Builds custom eval suites & judges.

    • Fine-tunes / configures the best open or closed model for that use case.
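A minimal sketch of the assistant’s English → SQL step described above, assuming a generic prompt-assembly flow (the schema, definitions, and function names below are invented, and the actual LLM call is omitted). The point it illustrates: the semantic context is built offline and injected into the prompt, so swapping in a bigger model changes little when that context is missing:

```python
# Sketch of the English -> SQL step. The schema, definitions and prompt wording
# are invented stand-ins; the LLM call itself is left out.
SEMANTIC_CONTEXT = """\
Table sales(arr FLOAT, region TEXT, closed_at DATE)
arr = annual recurring revenue in USD; region uses ISO country codes."""

def build_sql_prompt(question: str, context: str = SEMANTIC_CONTEXT) -> str:
    """Assemble the prompt: offline semantic context + the user's question."""
    return (
        "You translate business questions into SQL.\n"
        f"Schema and definitions:\n{context}\n"
        f"Question: {question}\nSQL:"
    )
```

`build_sql_prompt("Total ARR by region for 2024?")` yields a prompt any capable model can act on; without the `context` block, no model, however large, can know what `arr` means.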


5. Go-to-market filter: “Data Advantage × Distribution Advantage”

Databricks now vets AI opportunities by asking:

  1. Do you own a unique data asset competitors cannot easily replicate?

  2. Can you reach users at meaningful scale?

If either answer is “no”, the customer is discouraged from embarking on an expensive custom-AI build (e.g., DIY HR-handbook bots).


6. Reality check on internal AI productivity

  • Databricks uses copilots everywhere (Cursor, GitHub Copilot, custom sales/HR bots).

  • Coding time is only ~20 % of an engineer’s week; copilots boost that slice, but they don’t touch the other 80 % (design, alignment, meetings, roadmap).

  • Meeting summaries, PM copilots, and collaborative agents “still suck” → <10 % productivity lift.

  • The hardest unsolved problem: an agent that can align 2 000 engineers’ roadmaps and mediate organisational politics.


7. Synthetic data and synthetic evals are the next battlefield

Because customers refuse to label at scale, Databricks is betting on:

  • Programmatic generation of high-quality instruction / domain pairs.

  • Automatic generation of domain-specific eval sets + LLM judges.

  • Tight iteration loops between eval → synthetic data → fine-tune.
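A toy sketch of that eval → synthetic data → fine-tune loop. Every function here is a hypothetical stand-in (a real pipeline would call an LLM to generate pairs and a fine-tuning API); the stubs just make the control flow concrete:

```python
# Toy iteration loop: eval -> synthetic data -> fine-tune. All names invented.
EVAL_SET = {"What does ACV mean?": "annual contract value"}

def run_evals(model):
    """Score on a domain-specific eval set; return score and failed questions."""
    failures = [q for q, a in EVAL_SET.items() if a not in model.get(q, "")]
    return 1 - len(failures) / len(EVAL_SET), failures

def generate_synthetic_pairs(failures):
    """Stand-in for LLM-generated instruction/answer pairs targeting failures."""
    return {q: EVAL_SET[q] for q in failures}  # a real generator can't peek like this

def fine_tune(model, pairs):
    """Stand-in 'fine-tune': fold the synthetic pairs into the model."""
    return {**model, **pairs}

model = {}  # starts knowing nothing
for _ in range(3):  # tight loop: eval -> synthesise -> tune
    score, failures = run_evals(model)
    if not failures:
        break
    model = fine_tune(model, generate_synthetic_pairs(failures))
```

The design choice the loop encodes: the eval set drives what synthetic data gets generated, so eval quality bounds everything downstream.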


8. Reliable AI is multi-year away, not multi-month

Ali places reliability for mission-critical enterprise AI 3–5 years away, not “the next quarter.”

Key blockers:

  1. High-fidelity semantic layers.

  2. Better eval frameworks.

  3. Tooling that can orchestrate data, models, and human oversight at scale.


9. CEO-level automation is still science fiction

  • Even with full access to emails, Slack, calendar and multimodal context, an “AI chief-of-staff” remains a research project.

  • Startups should automate well-bounded, low-context tasks first; the CEO role is the worst place to start.


10. What would move the needle the fastest?

Ali’s hierarchy of needs:

  1. Robust, domain-specific evals.

  2. Cheap generation of high-quality synthetic data aligned to those evals.

  3. Tooling to integrate that pipeline seamlessly into enterprise MLOps.

Solve those. Reliability and adoption will follow.






 
 
 


© 2024 by Anmol Shantha Ram
