Using LLM APIs

LLM APIs provide easy access to powerful language models without managing infrastructure. Simply send text prompts and receive AI-generated responses via HTTP requests.

Advantage: No GPUs to manage. You pay per token, scale instantly, and get access to the latest models.

Popular LLM APIs

OpenAI API

GPT-4, GPT-3.5-turbo, DALL-E, Whisper.

Models: gpt-4, gpt-3.5-turbo
Pricing: ~$0.01-0.06 per 1K tokens
Features: Function calling, vision, audio

Anthropic API

Claude 3 (Opus, Sonnet, Haiku).

200K context window
Strong reasoning and analysis
Constitutional AI for safety

Google AI (Gemini)

Gemini Pro, Gemini Ultra.

Multimodal capabilities
Free tier available
Integrated with Google Cloud

Basic API Usage

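A minimal sketch of calling an OpenAI-style chat completions endpoint over raw HTTP with only the standard library. It assumes an `OPENAI_API_KEY` environment variable; the endpoint and payload shape follow OpenAI's documented chat API, so adapt the URL, headers, and model name for other providers.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(prompt, model="gpt-3.5-turbo"):
    """Build the JSON payload for a chat completion request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

def call_llm(prompt, api_key):
    """Send the request and return the model's reply text."""
    payload = build_chat_request(prompt)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:  # only hits the network when a key is configured
        print(call_llm("Say hello in one sentence.", key))
```

In practice you would use the provider's official client library, which adds retries, streaming, and typed errors on top of the same request shape.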

Key Parameters

temperature

Controls randomness (0-2).

0: Deterministic, focused
0.7: Balanced
1.5+: Creative, diverse

max_tokens

Maximum response length.

Limits output size
Affects cost
~4 chars per token
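The four-characters-per-token rule of thumb above gives a quick way to budget output size and cost before calling the API. A minimal sketch (the price constant is illustrative; real prices vary by model):

```python
def estimate_tokens(text):
    """Rough token count using the ~4 characters per token rule of thumb."""
    return max(1, len(text) // 4)

def estimate_cost(text, price_per_1k=0.002):
    """Approximate cost in dollars for the given text (illustrative price)."""
    return estimate_tokens(text) / 1000 * price_per_1k
```

For exact counts, use the provider's tokenizer (e.g. the `tiktoken` library for OpenAI models); the heuristic is only for quick budgeting.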

top_p

Nucleus sampling (0-1).

Alternative to temperature
0.9: Sample only from tokens in the top 90% of probability mass

presence_penalty

Encourages new topics by penalizing tokens that have already appeared (-2 to 2).

Positive: More diverse topics
Negative: Stay on topic
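The parameters above are all fields of the same request body, so they are usually chosen together. The presets below are illustrative starting points for different task types, not official recommendations:

```python
def sampling_params(task):
    """Pick rough sampling presets by task type (illustrative heuristic)."""
    presets = {
        # Factual tasks: low temperature for focused, reproducible answers.
        "factual": {"temperature": 0.2, "top_p": 1.0},
        # General assistance: the commonly cited balanced default.
        "balanced": {"temperature": 0.7, "top_p": 1.0},
        # Creative writing: hotter sampling plus a presence penalty
        # to push the model toward new topics.
        "creative": {"temperature": 1.2, "top_p": 0.9, "presence_penalty": 0.6},
    }
    params = {"max_tokens": 256}  # cap output length (and cost) in every case
    params.update(presets[task])
    return params
```

Note that most providers advise tuning temperature or top_p, not both at once.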

Prompt Engineering

Be Specific

✗ Vague
"Tell me about AI"
✓ Specific
"Explain 3 key differences between supervised and unsupervised learning"

Provide Context

"You are an expert Python developer. Review this code for bugs and suggest improvements: [code]"

Use Examples (Few-Shot)

"Classify sentiment:
Text: 'I love this!' → Positive
Text: 'Terrible experience' → Negative
Text: 'It was okay' → ?"
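A few-shot prompt like the one above maps naturally onto alternating chat messages, with each labeled example split into a user turn and an assistant turn. A sketch, assuming an OpenAI-style messages format:

```python
def few_shot_messages(examples, query):
    """Build a few-shot sentiment-classification prompt as chat messages."""
    messages = [{
        "role": "system",
        "content": "Classify the sentiment of each text as Positive, "
                   "Negative, or Neutral. Reply with the label only.",
    }]
    # Each example becomes a user/assistant pair the model can imitate.
    for text, label in examples:
        messages.append({"role": "user", "content": f"Text: {text!r}"})
        messages.append({"role": "assistant", "content": label})
    # The unlabeled query goes last, so the model completes the pattern.
    messages.append({"role": "user", "content": f"Text: {query!r}"})
    return messages

examples = [("I love this!", "Positive"), ("Terrible experience", "Negative")]
messages = few_shot_messages(examples, "It was okay")
```

The same message list can be passed directly as the `messages` field of a chat completion request.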

Best Practices

Use system messages to set behavior and context
Start with lower temperature for factual tasks
Implement rate limiting and error handling
Cache responses when possible to save costs
Monitor token usage to control expenses
Use streaming for better UX on long responses
Validate and sanitize user inputs
Never expose API keys in client-side code
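Two of these practices, caching and retries with backoff, fit in a small wrapper. In this sketch `send` is a placeholder for whatever function performs the real API call; a production version should catch the client library's specific exception types rather than bare `Exception`, and use a persistent cache:

```python
import hashlib
import random
import time

_cache = {}

def cached_call(prompt, send, max_retries=3):
    """Call send(prompt) with response caching and exponential backoff."""
    # Cache key: hash of the prompt, so identical prompts are not re-billed.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]
    for attempt in range(max_retries):
        try:
            result = send(prompt)
            _cache[key] = result
            return result
        except Exception:
            if attempt == max_retries - 1:
                raise  # retries exhausted; surface the error to the caller
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(2 ** attempt + random.random())
```

Rate limiting, input validation, and streaming still belong in separate layers; this only covers the retry-and-cache part of the list above.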

Key Takeaway: LLM APIs make powerful AI accessible. Focus on prompt engineering, manage costs with token limits, and always handle errors gracefully.