Hi, I'm Martin 👋

Software & Cloud Engineer. I help businesses make production systems (including AI-powered ones) faster, cheaper, and more reliable.

Services

Performance & Cost Fix Sprint

What I Offer

Hands-on analysis of production performance and cloud costs
Includes AI/LLM-specific issues: high inference costs, slow response times, unreliable outputs
Identification of the single highest-impact bottleneck
Direct implementation of targeted fixes
Improvements to observability where needed to validate impact
Clear before/after metrics and next steps

How I Work

I start with real data and production visibility, not assumptions.

I isolate the few issues that create most of the latency, instability, or cost, then fix one of them end-to-end. Scope is intentionally narrow to avoid risky or unnecessary changes.

The goal is a measurable improvement within a short, defined engagement.

I implement changes directly and verify results with metrics.

This includes AI and LLM systems. If your product uses AI and something feels slow, expensive, or unreliable, that falls within this work too — not just traditional backend and infrastructure issues.

Typical Outcomes

2–10x faster API endpoints
20–60% lower cloud costs
Stabilized production systems
30–70% lower AI/LLM costs

€1,500

14-day engagement

Implementation included

How Engagements Work

1. Initial call
2. Production analysis
3. Bottleneck isolation
4. Fix implementation
5. Before/after metrics

Consulting Call

Have a specific performance or cloud question? I'll review your situation and explain what's likely going wrong, what matters, and what doesn't.

Custom Engagements

Need something different? I also offer follow-up implementation help, deeper reviews, or ongoing advisory work depending on your needs and constraints.

Previous Work

Results that speak for themselves

From AI platforms to mobile apps, I've helped teams achieve measurable improvements in performance, scalability, and cost efficiency.

Artifimo

Built a complete AI automation platform from scratch. Achieved sub-200ms response times on LLM orchestration and 99.9% uptime across all client deployments.

Problem: High latency on LLM orchestration calls

Fix: End-to-end pipeline architecture redesign

Result: Sub-200ms responses, 99.9% uptime

Actiko

Implemented intelligent caching and RAG optimization that reduced API costs by 65% while improving response quality scores by 40%.

Problem: High API costs with declining quality

Fix: Intelligent caching + RAG optimization

Result: 65% cost reduction, 40% quality boost

VOWCE

Optimized speech-to-text pipeline achieving real-time transcription with 95% accuracy. Reduced app bundle size by 35% through code splitting.

Problem: Slow transcription, oversized app bundle

Fix: Pipeline optimization + code splitting

Result: Real-time transcription, 35% smaller bundle

JobCue

Architected scalable interview processing system handling 1000+ concurrent sessions. Reduced infrastructure costs by 50% through smart resource allocation.

Problem: Interview system couldn't scale

Fix: Smart resource allocation architecture

Result: 1000+ concurrent sessions, 50% cost cut

Postmate

Built high-throughput content generation pipeline. Implemented queue workers that process 10,000+ posts daily with zero downtime.

Problem: Content pipeline bottleneck

Fix: Queue worker architecture

Result: 10,000+ posts/day, zero downtime

Sentimenty

Delivered enterprise-grade feedback system with real-time analytics. Achieved 60ms average page load through edge caching and optimization.

Problem: Slow page loads, no real-time data

Fix: Edge caching + rendering optimization

Result: 60ms page loads, real-time analytics

CloseUp.Pics

Engineered GPU inference pipeline with 3x faster image generation. Built monitoring stack that reduced debugging time by 80%.

Problem: Slow image generation, blind debugging

Fix: GPU inference pipeline + monitoring stack

Result: 3x faster generation, 80% less debug time

IrreglY

Developed scalable mobile reporting system serving thousands of users. Implemented efficient geospatial queries with sub-100ms response times.

Problem: Slow geospatial queries at scale

Fix: Geospatial query optimization

Result: Sub-100ms responses, thousands of users

PromptFern

Built AI recommendation monitoring across ChatGPT, Claude, Perplexity, and Gemini with real-time alerts and a 60s refresh loop. Implemented audit trails and multilingual tracking to surface brand mentions worldwide, reducing analysis time by 70%.

Problem: Manual brand mention tracking across AI platforms

Fix: Real-time multi-platform monitoring loop

Result: 70% less analysis time, global coverage