What AI should I use? (May 2025)

This is an important question that requires the following detailed explanation:

Each AI system offers several LLMs (large language models), and each model performs better at different tasks. Some are good at research or coding, while others are better at examining images or creating videos.
It isn’t possible to subscribe to a single AI system, even a ‘multimodal’ one, and expect it to do everything well, although this may change soon.

The term “Multimodal AI” means the AI can handle multiple types of input, such as text, images, video, handwritten documents, audio and Excel spreadsheets.
However, each LLM is better at some inputs than others, and the results will vary depending on which AI you use.

Most AI systems require a subscription. The free options are very limited: they will not give the same level of detail, quality or accuracy, you will not have access to the advanced LLMs on offer, and you cannot process as much data as input or output.

There is one important concept to understand before going any further.
The “context window” is the amount of information the AI can process or remember at one time, measured in “tokens”. It includes the information going into the AI (what you type or upload as documents/images) and what it feeds back to you as text, code or images. Input and output share the same token limit, so keep that in mind.
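To make the shared budget concrete, here is a minimal Python sketch. It assumes you already know (or have estimated) how many tokens your prompt uses and how long a reply you expect; `fits_in_context` is a hypothetical helper for illustration, not part of any AI provider’s tools.

```python
def fits_in_context(input_tokens: int, expected_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the expected reply fit in the context window.

    Input (what you type or upload) and output (what the AI writes back)
    draw on the same token budget, so the two are added together.
    """
    return input_tokens + expected_output_tokens <= context_window


# A 100,000-token novel plus a 5,000-token summary in a 128k window:
print(fits_in_context(100_000, 5_000, 128_000))  # True
# The same request in a 4k window:
print(fits_in_context(100_000, 5_000, 4_000))    # False
```

The second check fails even though the reply is short, because the input alone is already larger than the whole window.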

A “token” (ChatGPT/Gemini) can be estimated with a simple rule of thumb for English text: 1 token ≈ 4 text characters, including spaces and punctuation. The exact count depends on each model’s tokenizer.
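The rule of thumb above can be turned into a quick estimate. This Python sketch simply divides the character count by 4 and rounds up; `estimate_tokens` is a hypothetical name, and because real tokenizers vary by model and language, the result is only an approximation.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the rule of thumb 1 token ≈ 4 characters.

    Spaces and punctuation count as characters. Real tokenizers differ
    between models and languages, so treat the result as an approximation.
    """
    # Ceiling division: any leftover characters count as one more token.
    return -(-len(text) // 4)


sample = "What AI should I use?"  # 21 characters
print(estimate_tokens(sample))  # 6
```

By the same rule, a 400,000-character novel comes out at 100,000 tokens.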

Some AI systems support 128k (128,000) or 200k tokens, some offer 1m (1 million), and 2m is just becoming available. *Free versions offer much smaller limits.
As a typical example, 100 tokens is roughly 75 words, or about one paragraph.
A single A4 page of 500 words is about 600 to 800 tokens.
A typical novel of 80,000 words (about 400,000 text characters) is around 100,000 tokens.
750,000 words (e.g. an English dictionary) is about 1 million tokens.
A 44-million-word encyclopaedia is about 58.6 million tokens.
* Pay careful attention to how many tokens each LLM can process, as exceeding this can cause truncated or poor results.
A list of AI systems and their context window sizes is located further down this page.
Images also count as tokens but the calculation is complex and not critically important for general use.
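The word-based examples above follow the same rule of thumb (100 tokens ≈ 75 words). A rough Python sketch, assuming English text; `words_to_tokens` is a hypothetical helper:

```python
def words_to_tokens(word_count: int) -> int:
    """Rough token estimate from a word count, using 100 tokens ≈ 75 words."""
    return round(word_count * 100 / 75)


examples = {
    "A4 page (500 words)": 500,
    "Novel (80,000 words)": 80_000,
    "Dictionary (750,000 words)": 750_000,
    "Encyclopaedia (44 million words)": 44_000_000,
}

for name, words in examples.items():
    print(f"{name}: ~{words_to_tokens(words):,} tokens")
```

This gives about 667 tokens for the A4 page (inside the 600 to 800 range), 1,000,000 for 750,000 words and about 58.7 million for the encyclopaedia. The novel comes out at about 106,667 tokens, a little above the 100,000 figure, because the word-based and character-based rules of thumb don’t agree exactly.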

The top AI systems for quick reference are:

OpenAI (ChatGPT) / Sora
Google Gemini / Google Veo 2
Anthropic Claude Sonnet
xAI Grok
Meta Llama
DeepSeek R1
Writer

Most people would do extremely well with just Google Gemini and ChatGPT subscriptions, which give some of the most accurate and extensive results.

Websites that offer combined access to multiple LLMs, usually for a very low price, often have hidden rate limits, question limits and lower token limits than direct access to the AI provider. You also have to trust that service, as it can see all your questions and answers.
The typical £20/$20 a month for direct access to an LLM such as Gemini or ChatGPT (OpenAI) usually gives you more privacy options, access to experimental or newer models, larger input and output limits, compatibility with dedicated phone apps, memory and fewer rate limits.

Optimal AI/LLMs Ranked by Use Case (May 2025)

Use Case	Optimal LLMs / Systems (Ranked: Best first, based on May 2025 info)
General Chat / Reasoning / Multimodality	1. OpenAI GPT-4o / 4.1 / o3: State-of-the-art conversational ability, strong reasoning (especially o3/o4 models), excellent integrated multimodality (GPT-4o handles text, audio, image input/output smoothly). 2. Google Gemini 2.5 Pro: Leading native multimodality (text, image, audio, video input), strong reasoning performance on benchmarks, large context. 3. Anthropic Claude 3.7 Sonnet: Excellent reasoning (especially [R] variant), reliable, good instruction following, strong vision capabilities, transparent “thinking mode”. 4. xAI Grok 3: Noted for strong reasoning capabilities and access to real-time information. 5. DeepSeek R1: Top-performing open-source model for reasoning tasks.
Coding	1. Anthropic Claude 3.7 Sonnet [R]: Frequently cited as top performer on coding benchmarks (like SWE-bench), excellent for complex tasks and debugging (via “thinking mode”). 2. Google Gemini 2.5 Pro: Top scores on some benchmarks, proven ability for complex code generation (e.g., functional simulators). 3. OpenAI GPT-4.1 / o3: Very strong coding performance, good instruction following, potentially excels at code reviews. Available in GitHub Copilot. 4. DeepSeek R1: Leading open-source model specifically strong in coding and mathematical reasoning. 5. Meta Llama 3 Series: Highly capable and widely used open-source option for various coding tasks.
Image Generation (Integrated LLM Capability)	1. Google Gemini / Imagen 3: Often rated highest for photorealism and overall image quality in recent comparisons. Integrated into the Gemini platform. 2. OpenAI GPT-4o: Produces high-quality images with excellent text rendering and prompt adherence directly within chat interfaces like ChatGPT. Note: For specific needs, specialized tools often excel: Midjourney (artistic/stylized results), Stable Diffusion / FLUX.1 (open-source, high customization), Ideogram (best-in-class text rendering), Adobe Firefly (professional integration, commercial use focus).
Video Generation (Integrated/Announced Foundational Models)	1. Google Veo 2: Reported to generate high-quality, coherent video with good understanding of physics and cinematic styles. 2. OpenAI Sora: Capable of generating highly creative and complex scenes, though output quality and access can be variable. Note: The market heavily relies on specialized platforms: Runway (Gen-3, creative tools), Pika (image-to-video), Luma AI (Dream Machine/Ray2 for realism), Kling (realistic motion), Synthesia (AI avatars), Adobe Firefly Video (professional editing).
Large Context Tasks / Document Analysis	1. Google Gemini 1.5 / 2.5 Pro: Industry-leading context window (up to 2 Million tokens reported), ideal for analyzing vast amounts of text or code. 2. OpenAI GPT-4.1: Massive 1 Million token context window, very capable for long document comprehension. 3. Writer Palmyra X5: New enterprise-focused model with 1 Million tokens, claiming good speed and cost-efficiency for agentic workflows. 4. Meta Llama 4 Scout: Released April 2025 with an advertised context window of up to 10 Million tokens. 5. Anthropic Claude 3.7 Sonnet: Very large context window (200k tokens), sufficient for most long document analysis tasks.
Medical Research Applications (Applying LLMs)	1. Google Gemini 1.5/2.5 Pro: Unmatched context window is ideal for processing and finding insights within extensive medical literature, patient data (anonymized), or clinical trial results. 2. Anthropic Claude 3.7 Sonnet: High reliability, strong reasoning, and large context make it suitable for in-depth analysis of complex research papers and medical documents where accuracy is critical. 3. OpenAI GPT-4o/4.1: Powerful general analytical and summarization capabilities applicable to research tasks; large context in 4.1 is a plus. Note: Progress often relies on specialized AI tools fine-tuned on medical data for tasks like image analysis (e.g., MindGlide), drug discovery, or risk prediction.
Legal Applications (Applying LLMs)	1. Anthropic Claude 3.7 Sonnet: Strong reasoning, reliability, and ability to handle long documents make it a preferred choice for tasks like contract review, analysis, and legal research requiring high accuracy. 2. Google Gemini 1.5/2.5 Pro: Extremely large context window provides an advantage when dealing with voluminous case files, discovery documents, or legislation. 3. OpenAI GPT-4o/4.1: Highly capable for drafting legal documents, summarizing cases, and assisting with legal research queries. Note: The legal industry heavily utilizes specialized AI platforms trained on legal data: CoCounsel (Casetext), Harvey.ai, Lex Machina, LEGALFLY, Westlaw Edge, Lexis+, Plexus, etc., often provide superior domain-specific performance.
Open Source Development / Customization	1. Meta Llama 3 Series (3.1, 3.3): Generally considered the leading open-source family for overall performance, versatility, large ecosystem, and strong community support. 2. DeepSeek R1: Top choice if the primary need is state-of-the-art open-source reasoning and coding performance (permissive MIT license). 3. Mistral Large 2 / Small 3: Very strong and efficient open-weight models, popular alternative to Llama. 4. Qwen 2 Series (Alibaba): Excellent performance, particularly noted for multilingual tasks, coding, and math. 5. Gemma 3 (Google) / Phi-4 (Microsoft): High-quality, smaller open models suitable for specific tasks or deployment in resource-constrained environments.

AI Model Leaderboard Top 10 (May 2025)

This table shows representative performance data for top Large Language Models based on information available at the start of May 2025. Rankings and metrics can change rapidly; for the latest live data, please refer to sources like Artificial Analysis. The rankings measure intelligence, not image generation or video creation.

Model	Creator	Context Window	Artificial Analysis Intelligence Index
o4-mini (high)	OpenAI	200k	70
Gemini 2.5 Pro	Google	1m	68
o3	OpenAI	128k	67
Grok 3 mini Reasoning (high)	xAI	1m	67
o3-mini (high)	OpenAI	200k	66
o3-mini	OpenAI	200k	63
o1	OpenAI	200k	62
Llama 3.1 Nemotron Ultra 253B Reasoning	NVIDIA	128k	61
Gemini 2.5 Flash (Reasoning)	Google	1m	60
DeepSeek R1	DeepSeek	128k	60

Notes:

Context window sizes (e.g., 1m = 1 million tokens, 200k = 200,000 tokens). Note: Free editions of these same LLMs offer much smaller context windows.
Multimodal capabilities vary between models.
Artificial Analysis Intelligence Index: A standardised set of tests that ranks AI systems on complex knowledge, reasoning/understanding, coding and a range of subjects such as ‘Math, Physics, Chemistry, Law, Engineering, Economics, Health, Psychology, Business, Biology, Philosophy and Computer Science’.
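The table writes context windows in shorthand. The Python sketch below converts that shorthand into token counts, treating k as 1,000 and m as 1,000,000 as the note above does, and checks which of a few listed models could take the 100,000-token novel from the examples near the top of this page in one go. `parse_window` is a hypothetical helper, and the check ignores the tokens needed for the AI’s reply.

```python
# A few context windows copied from the leaderboard table (May 2025 snapshot).
CONTEXT_WINDOWS = {
    "o4-mini (high)": "200k",
    "Gemini 2.5 Pro": "1m",
    "o3": "128k",
    "DeepSeek R1": "128k",
}


def parse_window(size: str) -> int:
    """Convert shorthand such as '128k' or '1m' into a token count."""
    size = size.strip().lower()
    multipliers = {"k": 1_000, "m": 1_000_000}
    if size and size[-1] in multipliers:
        return int(float(size[:-1]) * multipliers[size[-1]])
    return int(size)  # plain numbers such as '4000'


novel_tokens = 100_000  # the typical novel from the token examples
for model, size in CONTEXT_WINDOWS.items():
    window = parse_window(size)
    print(f"{model}: {window:,} tokens, novel fits: {novel_tokens <= window}")
```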

Additional Links:

A good YouTube channel for the latest AI news, features and image/video generation abilities is @theAIsearch

TeamAI is a multi-AI solution that lets you use a range of AI systems for one price. It is also good for business teams.

Abacus is a single AI assistant giving you access to ALL the top LLMs, with search, video, image and coding all included. *$10 per month

AI-Search is a repository of AI tools/apps which may or may not be better than using the major AI LLMs directly.

Synthesia AI Tools List has a good selection of AI options to choose from in addition to our list here.

Additional AI Systems:

DreaminaAI: An amazing art/image AI. See the web page for ideas!
Jasper AI: A powerful AI content creation platform for generating various types of content, from marketing copy to social media posts.
AdCreative.ai: Most used AI tool for advertising and marketing.
Vidu: Advanced AI video generator.
Synthesia: A video creation platform that uses AI to create realistic videos from text.
Runway: A versatile AI video creation platform with features like text-to-video, image-to-video, and motion tools.
Midjourney: a leading image generation AI system.
KlingAI: Next level AI video generator.
ElevenLabs: An AI voice and audio tool for creating versatile audio assets.
Suno: An AI music creator for turning text prompts into songs and other audio.
Murf: AI text to speech.
AudioX: Anything to audio creator AI.
Udio: Create any song you can imagine. https://www.udio.com/
Adobe Firefly: An AI image generator with features like Generative Fill and Expand, integrating with Adobe’s suite.
Canva: A visual platform that offers AI-powered tools for design and content creation.
QuillBot: An AI tool for summarizing, paraphrasing, and checking grammar, particularly useful for research.
Tidio: A customer service AI solution.
Planner5d: Floor plans to 3D renders with AI assisted design.
Maket: Allows anyone to design & plan their new build or renovation project in a few simple steps.
DeepL: The world’s most advanced language translator. There is also a writing tool.
OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication!
OmniHuman: Hyper-realistic videos from just a single image and a motion signal like audio or video input.
OmniSVG: Vector graphics generator AI.
ByteDance UNO: Amazing AI image generator that includes image blending, virtual try-on, story generation and more.
ByteDance InfiniteYou: Image generation using your own images. Recreate yourself or others.
SparkAudio TTS: Advanced voice cloner, with code available on GitHub.
AI Portrait: Forget photography for your personal portrait or website profile. Use this AI tool for amazing results.
Hugging Face: A range of AI apps, including Qwen3, DeepSite, Dia (text to speech), Thera Super Upscale and more.
Pinokio: A system for easily running local AI tools, typically found on GitHub, on your PC. Wide range of ready-to-run apps.

Research & Learning:

Semantic Scholar: A free, AI-powered research tool for scientific literature, offering advanced search and filtering capabilities.
Perplexity: A search engine that uses LLMs to provide AI-generated answers with citations.
Stanford HAI’s AI Index: A comprehensive resource for data and insights on artificial intelligence.
Litmaps: Scientific literature research.
Gemini 2.5 “Deep Research” or ChatGPT “Deep Research” is often the best research tool to use.

Medical:

Diagnostic Support

IBM Watson Health (now part of Merative): Known for oncology decision support and mining medical literature.
Google DeepMind: Specializes in retinal disease diagnosis and has demonstrated performance at or above expert level in radiology.

Medical Imaging

Aidoc: AI-powered radiology tool for detecting anomalies in CT scans (e.g. strokes, pulmonary embolism).
Arterys: Cloud-based platform for AI-assisted imaging, particularly useful in cardiology and oncology.
Zebra Medical Vision: Offers a wide suite of FDA-cleared AI algorithms for radiology.

Administrative and Clinical Documentation

Nuance Dragon Medical One (Microsoft): Voice-to-text with AI assistance, streamlines documentation.
Suki AI: Smart digital assistant for doctors that automates note-taking and EHR integration.

Patient Interaction & Triage

Babylon Health: AI chatbot for symptom checking and triage, integrated with telemedicine.
Infermedica: AI triage and pre-diagnosis tools used by health systems worldwide.

Drug Discovery & Research

BenevolentAI, Atomwise, Insilico Medicine: Use AI to accelerate drug discovery and target identification.

Medical Consulting:

Heidi Health is a reputable AI-powered medical scribe platform that has been gaining traction among clinicians in the UK, including NHS practices. It is designed to assist healthcare professionals by transcribing patient consultations and generating clinical notes, thereby reducing administrative burdens and allowing more focus on patient care.

Real-Time Transcription: Heidi listens to patient-clinician interactions during consultations and generates detailed medical notes in real time. Clinicians can review and approve these notes before they are added to the patient’s medical record, ensuring accuracy and completeness.
Customizable Templates: The platform offers various note formats, such as SOAP and CHEDDAR, and allows for customization to fit individual clinician preferences. This flexibility helps in maintaining consistency and efficiency in documentation.
Integration with EHR Systems: Heidi can integrate with existing Electronic Health Record (EHR) systems, facilitating seamless incorporation into current workflows.
Compliance and Data Security: Heidi Health adheres to strict data protection regulations, including GDPR and NHS standards. It employs advanced encryption and hosts data within the UK to ensure patient information is handled securely and confidentially.
Time Efficiency: By automating the documentation process, Heidi allows clinicians to spend more time with patients, improving the quality of care and communication during consultations.

Nuance Dragon Medical One – by Microsoft (UK/USA)

Purpose: Real-time voice transcription and clinical documentation.
Why it’s good: Doctors can speak naturally during consultations, and it generates accurate, structured notes in the EHR.
Best for: General practitioners, specialists, and hospital use.

Suki AI (USA)

Purpose: AI assistant for clinical documentation.
Why it’s good: Learns individual doctor preferences, integrates with major EHRs, and generates SOAP notes with voice commands.
Best for: Busy outpatient settings and private practices.

Glass AI (Glass Health)

Purpose: Generates differential diagnoses and clinical plans based on text input.
Why it’s good: Doctors input a patient summary, and it suggests clinical reasoning and next steps.
Best for: Supporting junior clinicians or enhancing diagnostic thinking.

Infermedica

Purpose: AI triage and symptom checker.
Why it’s good: Helps streamline the first part of a consultation by pre-screening symptoms before the doctor sees the patient.
Best for: Telemedicine and pre-visit intake.

Abridge (USA)

Purpose: Automatic consultation summarization.
Why it’s good: Records conversations with patient consent and generates summaries for both doctors and patients.
Best for: In-person or telehealth visits where notes and follow-ups are important.