
Inference on GKE Private Clusters
Setting up an inference service without access to the Internet. Deploying an inference service on...

So you’ve built your AI application prototype. You used your own local GPU to run the AI model, or...

At Google, our team (Google Cloud Samples) uses Gemini to produce thousands of samples in batches. In doing so, we’ve learned that the biggest hurdle isn’t the AI, it’s our own expectations about these tools.

My team at Google is automating sample code generation and maintenance. Part of that is using...

Authenticated in the CLI but still getting "Could not load default credentials"? Let's bridge the gap between gcloud and your application code.

Neal Sample called it the "Lumberjack Paradox": AI gives us a chainsaw, but we risk forgetting how to use the axe. In this post, I explore why code samples are the critical "line of representation" for modern engineering, and why fragmented documentation isn't just confusing developers—it's poisoning the AI models they rely on.

Written in cooperation with Aron Eidelman. As organizations race to deploy powerful GPU-accelerated...

In the age of artificial intelligence and machine learning, there is a constant need for powerful...

As I entered the office today, it was clear that physical desktop computers are becoming a rarity....

Quick run-down of one of the interactive demos that was presented at Next 2025, from the architecture to the products and features showcased.

This post will guide you through deploying a simple “Hello, World!” application on Cloud Run. You’ll...

"Wpadła śliwka w .... Google Cloud" 😉 Recently, thanks to the Ministry of Digital Affairs, there's...

While you were out shopping or cleaning up around the house, have you ever wondered what an item is...

Compressing keys and values to reduce the cache size is MLA’s key innovation. Attention is the...

Learn how to streamline exposing AI models using LangChain and LangServe, deployed on Google Kubernetes Engine (GKE).

Learn how to leverage Google Kubernetes Engine (GKE) to deploy an AI-powered LangChain application backed by a local instance of Gemma 2.

Learn how to leverage Google Kubernetes Engine (GKE) to deploy an AI-powered LangChain application backed by Gemini.

Fresh is the most popular web framework built on Deno. With the imminent Deno 2.0 launch, now is a...

Let's learn what LangChain is, how it can help simplify development of AI-powered applications, and how to get started.

Recently I needed to pull data from the GitHub API and publish to a Google Sheet so I could share...

Node.js 22.6.0 adds a new option for lightweight TypeScript support. What's nice about this is that...

Node.js has experimental support for building a single executable application, or SEA, which is what...

Unix is well-known for advocating the philosophy that commands should do one thing and do it...

This post continues my AI exploration series with a look at the open source solution Ray and how it...

Context: I'm a decision record enthusiast and will absolutely write more about this in the future....

Generative AI has potential applications far beyond chatbots and Retrieval Augmented Generation. For...

LangChain4j 0.32.0 was released yesterday, including my pull request with support for lots of new...

The Vertex AI SDK for Python now offers local tokenization. This feature allows you to calculate...

Prelude: As I’m focusing a lot on Generative AI, I’m curious about how things work under...

The happy users of Gemini Advanced, the powerful AI web assistant powered by the Gemini model, can...