
GCP AI/ML and Vertex AI Complete Guide: From Model Training to Production Deployment

19 min read
Tags: Vertex AI, GCP AI, Machine Learning, AutoML, Gemini API, MLOps, BigQuery ML, TensorFlow, Generative AI, Enterprise AI


Want to adopt AI in your company but don't know where to start?

Training your own model is too complex, but using ready-made APIs might not be flexible enough?

GCP's AI services offer solutions ranging from "no-code" to "fully customized." This article will introduce you to GCP's AI ecosystem, from the Vertex AI platform to Gemini API, helping you find the best entry point.

Want to understand GCP's core services first? Please refer to "GCP Complete Guide: From Beginner Concepts to Enterprise Practice."


GCP AI/ML Service Ecosystem Overview

GCP's AI services aren't just one product—they're an entire ecosystem.

Google Cloud AI Market Position and Advantages

What advantages does Google have in AI?

Technical Foundation:

  • TensorFlow is open-sourced by Google
  • TPU (Tensor Processing Unit) is developed by Google
  • Transformer architecture (the basis of GPT, BERT) was invented by Google

Practical Experience:

  • Google Search, YouTube recommendations, Gmail spam filtering all use ML
  • These experiences are reflected in GCP's AI service design

Unique Advantages:

  • The most powerful data analytics platform (BigQuery)
  • Native AI infrastructure (TPU)
  • Complete MLOps toolchain

Choosing Between Pre-trained APIs and Custom Models

GCP AI services fall into two categories:

Pre-trained APIs (Ready-made):

  • Call the API directly to use
  • No training data needed
  • No ML knowledge required
  • Suitable for: common tasks, quick validation

Custom Models (Train your own):

  • Train with your data
  • Can optimize for specific needs
  • Requires ML knowledge or using AutoML
  • Suitable for: special requirements, seeking best results

How to Choose?

| Scenario | Choice | Reason |
| --- | --- | --- |
| Recognize common objects | Vision API | Already trained |
| Detect product defects | AutoML Vision | Need your own data |
| Translate common languages | Translation API | Quality is already good |
| Translate technical terms | Custom model | Requires domain knowledge |
| Quick prototype validation | Pre-trained API | Get results quickly |
| Seeking best results | Custom model | Targeted optimization |

AI Service Architecture Diagram

GCP AI Service Layers:

┌─────────────────────────────────────────────────────┐
│  Application Layer: Gemini API, Agent Builder       │
├─────────────────────────────────────────────────────┤
│  Platform Layer: Vertex AI                          │
│  ┌───────────┬────────┬───────────┬──────────────┐  │
│  │ Workbench │ AutoML │ Pipelines │ Model Garden │  │
│  └───────────┴────────┴───────────┴──────────────┘  │
├─────────────────────────────────────────────────────┤
│  Data Layer: BigQuery, Cloud Storage                │
├─────────────────────────────────────────────────────┤
│  Infrastructure: GPU, TPU, Compute Engine           │
└─────────────────────────────────────────────────────┘

Vertex AI Platform Deep Dive

Vertex AI is GCP's unified AI platform. All ML work can be completed here.

Vertex AI Core Features

What does Vertex AI integrate?

| Feature | Description | Previous Service |
| --- | --- | --- |
| Workbench | Jupyter Notebook environment | AI Platform Notebooks |
| Training | Model training service | AI Platform Training |
| Prediction | Model deployment service | AI Platform Prediction |
| AutoML | Automated machine learning | AutoML Vision/NL/Tables |
| Pipelines | ML workflow | Kubeflow Pipelines |
| Feature Store | Feature management | New feature |
| Model Registry | Model version management | New feature |
| Model Garden | Pre-trained model library | New feature |

Benefits:

  • One interface to manage all ML work
  • Seamless integration between tools
  • Unified permissions and billing management

Workbench (Jupyter Notebook Environment)

The first step in ML is usually opening a Notebook to explore data.

Workbench Types:

| Type | Features | Suitable For |
| --- | --- | --- |
| Managed Notebooks | Fully managed, quick start | Most users |
| User-Managed Notebooks | More control | Need custom configuration |

Create Workbench Instance:

gcloud workbench instances create my-notebook \
  --location=asia-east1-b \
  --machine-type=n1-standard-4

Pre-installed Tools:

  • JupyterLab
  • TensorFlow, PyTorch
  • Pandas, Scikit-learn
  • BigQuery connector
  • Git integration

Model Registry Management

Trained models need version management.

Features:

  • Model version tracking
  • Model metadata management
  • Deployment status tracking
  • A/B testing support

Upload Model to Registry:

from google.cloud import aiplatform

aiplatform.init(project='my-project', location='asia-east1')

model = aiplatform.Model.upload(
    display_name='my-model',
    artifact_uri='gs://my-bucket/model/',
    serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest'
)

Pipelines Workflow Automation

Automate the entire ML workflow.

What a Pipeline includes:

  1. Data loading
  2. Data preprocessing
  3. Model training
  4. Model evaluation
  5. Model deployment

Using Kubeflow Pipelines SDK:

from kfp import dsl, compiler

# Placeholder components -- replace the bodies with real logic.
@dsl.component
def load_data() -> str:
    return 'gs://my-bucket/data.csv'

@dsl.component
def train_model(data: str) -> str:
    return 'gs://my-bucket/model/'

@dsl.component
def deploy_model(model: str):
    print(f'Deploying {model}')

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    data_op = load_data()
    train_op = train_model(data=data_op.output)
    deploy_model(model=train_op.output)

# Compile, then submit the resulting file as a Vertex AI PipelineJob
compiler.Compiler().compile(my_pipeline, 'pipeline.yaml')

Feature Store Engineering

Features are the core of ML. Feature Store helps you manage them.

What problems does it solve?

  • Training and inference use the same features
  • Features can be shared across teams
  • Feature version management
  • Point-in-time correctness
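
Point-in-time correctness is the easiest of these to get wrong. The idea can be sketched locally with pandas (hypothetical `features` and `labels` frames, not a Feature Store API call): each training label may only be joined with feature values recorded at or before the label's own timestamp, never future ones.

```python
import pandas as pd

# Hypothetical feature history: one row per (user, timestamp) update.
features = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2'],
    'timestamp': pd.to_datetime(['2024-01-01', '2024-02-01', '2024-01-15']),
    'purchase_count': [3, 7, 1],
})

# Training labels collected at specific points in time.
labels = pd.DataFrame({
    'user_id': ['u1', 'u2'],
    'timestamp': pd.to_datetime(['2024-01-20', '2024-02-01']),
    'churned': [0, 1],
})

# merge_asof picks, per label row, the latest feature value at or
# before the label timestamp -- never a future value (no leakage).
training_set = pd.merge_asof(
    labels.sort_values('timestamp'),
    features.sort_values('timestamp'),
    on='timestamp',
    by='user_id',
)
print(training_set[['user_id', 'purchase_count', 'churned']])
```

Note that `u1`'s label at 2024-01-20 gets the 2024-01-01 feature value (3), not the later 2024-02-01 value (7). Feature Store does this bookkeeping for you at scale.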

Use Cases:

  • User features (age, preferences, behavior)
  • Product features (category, price, rating)
  • Real-time features (recent clicks, cart status)

AutoML: No-Code AI Modeling

Can you train ML models without writing code? AutoML makes this possible.

How AutoML Works

AutoML automatically handles:

  1. Data exploration and cleaning
  2. Feature engineering
  3. Model architecture search
  4. Hyperparameter tuning
  5. Model training
  6. Model evaluation

You only need to:

  1. Prepare labeled data
  2. Upload to Vertex AI
  3. Click "Train"
  4. Wait for completion

AutoML Vision (Image Recognition)

Supported Tasks:

  • Single-label classification (What is this?)
  • Multi-label classification (What things are there?)
  • Object detection (Where is it?)

Data Requirements:

  • Minimum 100 images per category
  • Recommended 1,000+ images for better results
  • Supports JPG, PNG, BMP, GIF
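
Vertex AI image datasets are commonly imported from a CSV manifest that maps Cloud Storage URIs to labels. A minimal generator for such a manifest (the bucket paths and labels below are placeholders):

```python
import csv
import io

def build_import_manifest(examples):
    """examples: list of (gcs_uri, label) tuples -> CSV manifest text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for uri, label in examples:
        writer.writerow([uri, label])
    return buf.getvalue()

manifest = build_import_manifest([
    ('gs://my-bucket/images/ok_001.jpg', 'ok'),
    ('gs://my-bucket/images/defect_001.jpg', 'defect'),
])
print(manifest)
```

Upload the resulting file to Cloud Storage and point the dataset import at it; check the Vertex AI docs for the exact manifest variants (e.g. optional train/validation/test split columns).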

Use Cases:

  • Manufacturing: Defect detection
  • Retail: Product classification
  • Healthcare: Medical imaging assistance

AutoML Natural Language (Text Analysis)

Supported Tasks:

  • Text classification (sentiment analysis, topic classification)
  • Entity extraction (find names, places, organizations)
  • Sentiment analysis (positive, negative, neutral)

Data Requirements:

  • Minimum 1,000 documents
  • At least 100 per category
  • Supports plain text or CSV

Use Cases:

  • Customer service: Auto-classify complaints
  • Media: News topic classification
  • Social: Sentiment analysis

AutoML Tables (Structured Data)

Supported Tasks:

  • Classification (Will this customer churn?)
  • Regression (How many will this product sell?)

Data Requirements:

  • Minimum 1,000 rows of data
  • At least 2 feature columns
  • Supports CSV or BigQuery tables

Use Cases:

  • Finance: Credit risk assessment
  • Retail: Sales forecasting
  • Marketing: Customer churn prediction

AutoML Use Cases and Limitations

Good for AutoML:

  • No ML team
  • Want to quickly validate ideas
  • Task is a standard type
  • Data volume is not particularly large

Not suitable for AutoML:

  • Need cutting-edge model performance
  • Have complex custom requirements
  • Extremely large data volume (custom training more cost-effective)
  • Need special architectures (like GAN, reinforcement learning)

Cost Considerations:

  • AutoML charges by training hour
  • Training an image model costs about $3-20/hour
  • Complex tasks may require tens of hours of training

Gemini API and Generative AI

The hottest AI technology in 2024-2025: Generative AI.

Gemini Model Version Comparison (Pro / Flash / Ultra)

| Model | Features | Suitable For | Price |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | Ultra-fast, low cost | Real-time apps, high-volume requests | Lowest |
| Gemini 1.5 Pro | Balanced performance and cost | General business apps | Medium |
| Gemini 1.5 Flash | Fast response | Conversation systems, lightweight tasks | Lower |
| Gemini Ultra | Best performance | Complex reasoning, professional tasks | Highest |


Selection Recommendations:

  • Start with Flash for prototyping
  • Evaluate Pro after confirming feasibility
  • Only use Ultra when truly needed

API Calls and Billing

Basic Call Example:

import google.generativeai as genai

genai.configure(api_key='YOUR_API_KEY')

model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content('Explain what machine learning is')

print(response.text)

Calling from Vertex AI:

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project='my-project', location='asia-east1')

model = GenerativeModel('gemini-1.5-pro')
response = model.generate_content('Write a product description')

print(response.text)

Billing Method:

  • Charged by tokens (input + output)
  • 1,000 English words ≈ 700-900 tokens
  • Different models have different prices
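
Because billing is per token, a back-of-envelope estimator helps when sizing a workload. The prices below are illustrative placeholders only; always check the current Vertex AI price list:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimate a single call's cost in USD.

    Prices are per 1M tokens; substitute current list prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Illustrative Flash-class prices: $0.075/M input, $0.30/M output.
cost = estimate_cost(2_000, 500,
                     input_price_per_m=0.075, output_price_per_m=0.30)
print(f'${cost:.6f} per call')
```

Multiply by expected daily request volume to get a monthly budget estimate before committing to a model tier.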

Prompt Engineering Best Practices

A good prompt looks like this:

You are a professional product copywriter.

Task: Write a 50-word promotional copy for the following product.

Product Information:
- Name: Ultra-lightweight Laptop
- Weight: 900g
- Features: 16-hour battery life, military-grade durability

Requirements:
1. Use clear, professional English
2. Tone is lively but professional
3. Emphasize lightweight and battery advantages

Prompt Techniques:

  • Role Setting: Tell the model what role it is
  • Clear Task: Clearly state what to do
  • Provide Examples: Give one or two expected output examples
  • Specify Format: JSON? Bullet points? Paragraphs?
  • Set Constraints: Word count, language, tone
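
These five techniques can be wrapped in a small helper so prompts stay consistent across an application. This is a hypothetical utility, not part of any SDK:

```python
def build_prompt(role, task, examples=None, output_format=None,
                 constraints=None):
    """Assemble a structured prompt from role, task, examples,
    format, and constraints."""
    parts = [f'You are {role}.', f'Task: {task}']
    if examples:
        parts.append('Examples:')
        parts.extend(f'- {e}' for e in examples)
    if output_format:
        parts.append(f'Output format: {output_format}')
    if constraints:
        parts.append('Requirements:')
        parts.extend(f'{i}. {c}' for i, c in enumerate(constraints, 1))
    return '\n'.join(parts)

prompt = build_prompt(
    role='a professional product copywriter',
    task='Write a 50-word promotional copy for the product below.',
    output_format='a single paragraph',
    constraints=['Use clear, professional English',
                 'Emphasize lightweight and battery advantages'],
)
print(prompt)
```

Centralizing prompt assembly like this also makes A/B testing of prompt variants much easier.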

Enterprise Application Cases

Case 1: Customer Service Auto-Reply

  • Use Gemini to understand customer questions
  • Find answers from knowledge base
  • Generate natural language responses

Case 2: Document Summarization

  • Upload lengthy reports
  • Auto-generate key summaries
  • Extract key data

Case 3: Code Assistance

  • Explain existing code
  • Generate test cases
  • Suggest refactoring directions

Case 4: Content Generation

  • Product descriptions
  • Marketing copy
  • Technical documentation

BigQuery ML: SQL-Driven Machine Learning

Can data analysts do ML? They can with SQL.

BQML Supported Model Types

| Model Type | SQL Command | Suitable Tasks |
| --- | --- | --- |
| Linear Regression | LINEAR_REG | Predict values |
| Logistic Regression | LOGISTIC_REG | Binary classification |
| K-Means | KMEANS | Customer segmentation |
| Time Series | ARIMA_PLUS | Trend forecasting |
| XGBoost | BOOSTED_TREE_CLASSIFIER | Complex classification |
| DNN | DNN_CLASSIFIER | Deep learning |
| AutoML Tables | AUTOML_CLASSIFIER | Automated ML |

Create and Train Model Syntax

Create Model:

CREATE OR REPLACE MODEL `my_dataset.sales_forecast`
OPTIONS(
  model_type='ARIMA_PLUS',
  time_series_timestamp_col='date',
  time_series_data_col='sales',
  time_series_id_col='product_id'
) AS
SELECT
  date,
  product_id,
  sales
FROM
  `my_dataset.sales_data`
WHERE
  date < '2024-01-01'

Forecast:

SELECT *
FROM ML.FORECAST(
  MODEL `my_dataset.sales_forecast`,
  STRUCT(30 AS horizon, 0.95 AS confidence_level)
)

Evaluate Model:

SELECT *
FROM ML.EVALUATE(MODEL `my_dataset.my_model`)

Use Cases and Performance Considerations

Good for BQML:

  • Data is already in BigQuery
  • Team is familiar with SQL
  • Want to quickly validate ideas
  • Task is standard classification/regression

Not suitable for BQML:

  • Need cutting-edge performance
  • Task requires custom architecture
  • Image, audio, and other unstructured data

Cost Tips:

  • Training costs calculated by data processed
  • Complex models take longer to train
  • Can set training budget limits

AI/ML Cost Planning and Optimization

AI projects can easily go over budget. Good cost planning is important.

Training vs Inference Cost Structure

Training Costs:

  • One-time cost
  • Charged by compute time
  • GPU/TPU costs are high
  • Can use Spot VMs to save money

Inference Costs:

  • Ongoing cost
  • Charged by predictions or time
  • Need to consider 24/7 running costs
  • Batch inference is cheaper than real-time

Cost Comparison Example:

| Item | Training Cost | Inference Cost (Monthly) |
| --- | --- | --- |
| Small Model | $50-200 | $100-300 |
| Medium Model | $500-2,000 | $500-1,500 |
| Large Model | $5,000-20,000 | $2,000-10,000 |

GPU/TPU Selection and Cost Comparison

GPU Options:

| GPU | Memory | Suitable For | Hourly Cost |
| --- | --- | --- | --- |
| T4 | 16GB | Inference, small training | ~$0.35 |
| L4 | 24GB | Balanced | ~$0.70 |
| A100 40GB | 40GB | Large training | ~$3.00 |
| A100 80GB | 80GB | Very large models | ~$4.00 |
| H100 | 80GB | Latest and most powerful | ~$8.00 |

TPU Options:

| TPU | Suitable For | Hourly Cost |
| --- | --- | --- |
| v2-8 | Medium training | ~$4.50 |
| v3-8 | Large training | ~$8.00 |
| v5e | Inference optimized | ~$1.20 |

Selection Recommendations:

  • Development phase → T4 or L4
  • Production training → A100
  • TensorFlow large models → TPU
  • Inference service → T4 or v5e

Batch Inference Cost Reduction

Real-time vs Batch Inference:

| Type | Latency | Cost | Suitable For |
| --- | --- | --- | --- |
| Real-time (Online) | Milliseconds | Higher | Real-time apps |
| Batch | Minutes to hours | Lower | High-volume processing |

Batch Inference Use Cases:

  • Daily customer score updates
  • Product recommendation pre-calculation
  • Report data analysis
  • Historical data backfill

Cost Difference: Batch inference can be 60-80% cheaper than real-time inference.
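
A rough sketch of where that difference comes from: an online endpoint bills for provisioned nodes around the clock, while a batch job bills only while it runs. The node price below is an illustrative placeholder:

```python
def monthly_online_cost(node_price_per_hour, node_count=1, hours=730):
    """Online endpoints bill for provisioned nodes 24/7 (~730 h/month)."""
    return node_price_per_hour * node_count * hours

def monthly_batch_cost(node_price_per_hour, jobs_per_month,
                       hours_per_job, node_count=1):
    """Batch jobs bill only while running."""
    return node_price_per_hour * node_count * jobs_per_month * hours_per_job

# Illustrative: a ~$0.75/hour prediction node.
online = monthly_online_cost(0.75)                                   # 24/7 endpoint
batch = monthly_batch_cost(0.75, jobs_per_month=30, hours_per_job=2)  # nightly job
print(f'online ${online:.0f}/mo vs batch ${batch:.0f}/mo')
```

With these example numbers the nightly batch job costs an order of magnitude less than an always-on endpoint, which is why pre-computation wins whenever latency allows it.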


Enterprise AI Adoption Best Practices

From POC to production—how do enterprise AI projects progress?

Path from POC to Production

Phase 1: Exploration and Definition (2-4 weeks)

  • Confirm business problem
  • Assess data availability
  • Define success metrics
  • Evaluate technical feasibility

Phase 2: POC (4-8 weeks)

  • Small-scale data validation
  • Quickly build prototype
  • Verify if results meet targets
  • Estimate production environment costs

Phase 3: Development (8-16 weeks)

  • Complete data processing pipeline
  • Model tuning
  • Build MLOps processes
  • Integrate with existing systems

Phase 4: Launch (4-8 weeks)

  • Performance testing
  • Gradual rollout
  • Monitoring and alerting setup
  • Documentation and knowledge transfer

Common Failure Reasons:

  • Skipping POC and going straight to development
  • Underestimating data cleaning work
  • No clear success metrics
  • No MLOps leading to maintenance difficulties

MLOps and Model Monitoring

What MLOps includes:

  • Version control (data, code, models)
  • Automated training pipeline
  • Automated model deployment
  • Continuous monitoring and retraining

Model Monitoring Metrics:

  • Prediction performance (accuracy, recall)
  • Data drift
  • Concept drift
  • Latency and throughput
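
Data drift, for example, boils down to comparing the serving-time feature distribution against the training distribution. Vertex AI Model Monitoring computes distance measures for you; as a conceptual stand-in, here is a Population Stability Index (PSI) sketch in NumPy:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 major drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins to avoid log(0).
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 10_000)    # training distribution
same = rng.normal(0, 1, 10_000)     # serving data, no drift
shifted = rng.normal(1.0, 1, 10_000)  # serving data, mean shifted
print(psi(train, same), psi(train, shifted))
```

In production you would compute this per feature on a schedule and alert when the index crosses a threshold, which is exactly what the managed monitoring job automates.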

Vertex AI Model Monitoring:

from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project='my-project', location='asia-east1')

endpoint = aiplatform.Endpoint('endpoint-id')

# Monitoring is configured as a job attached to the endpoint.
# Exact parameter names can vary across SDK versions; check the
# google-cloud-aiplatform reference for yours.
job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name='my-monitoring-job',
    endpoint=endpoint,
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),  # hours
    alert_config=model_monitoring.EmailAlertConfig(
        user_emails=['[email protected]']
    ),
    objective_configs=model_monitoring.ObjectiveConfig(
        drift_detection_config=model_monitoring.DriftDetectionConfig(
            drift_thresholds={'feature_name': 0.3}  # hypothetical feature
        )
    ),
)

Data Governance and Compliance

Data Privacy:

  • PII de-identification
  • Data minimization principle
  • Access control
  • Usage logging and tracking

Model Compliance:

  • Model explainability
  • Bias detection and mitigation
  • Decision transparency
  • Human review mechanism

GCP Compliance Tools:

  • Data Loss Prevention (DLP): Automatically detect and mask sensitive data
  • Cloud Audit Logs: Record all operations
  • VPC Service Controls: Network-level isolation

For security details, see "GCP Security and Cloud Armor Protection Complete Guide."


Want to Adopt AI in Your Enterprise?

From Gemini to building your own LLM, there are many choices but also many pitfalls.

Schedule AI Adoption Consultation and let experienced professionals help you avoid pitfalls.

CloudInsight's AI Adoption Services:

  • Requirements Assessment: Clarify business needs, confirm if AI is the best solution
  • Technology Selection: Use ready-made APIs or train your own?
  • POC Planning: Quickly validate feasibility and effectiveness
  • Cost Estimation: Complete cost estimation for training, inference, and maintenance
  • Architecture Design: Complete solution from data to deployment

Conclusion: Building Your GCP AI Strategy

GCP's AI services are comprehensive. The key is finding the right entry point for you.

Selection Recommendations:

| Your Situation | Recommended Solution |
| --- | --- |
| Want to quickly try AI | Gemini API |
| Have data but no ML team | AutoML |
| Data is in BigQuery | BigQuery ML |
| Have ML team wanting more control | Vertex AI Custom Training |
| Need complete MLOps | Vertex AI Pipelines |

Recommendations for Different Roles:

For Business Executives:

  • Start with Gemini for internal efficiency tools
  • Accumulate experience from small projects
  • Expand investment after success

For Engineers:

  • Get familiar with the Vertex AI platform
  • Practice AutoML and custom training
  • Understand MLOps best practices

For Data Analysts:

  • Start with BigQuery ML
  • Gradually learn AutoML
  • Collaborate with engineering teams

AI adoption is a journey, not a single project. Start small, keep learning, and gradually scale up.

FAQ

Q1: Vertex AI vs. direct Gemini API — when to pick which?

Vertex AI is "enterprise-grade Gemini"; Gemini API is "direct usage." Differences: (1) Gemini API (AI Studio) — (A) more free quota (15 req/min); (B) data defaults to being used for Google model improvement; (C) fits individual developers, prototyping, non-commercial use; (2) Vertex AI Gemini — (A) enterprise contracts, SLA, data residency guarantees; (B) data isn't used for model training (contractual guarantee); (C) supports VPC-SC (confining AI traffic to your VPC); (D) complete audit logging; (E) integrated with IAM and Workload Identity. Commercial use must use Vertex AI — not for pricing reasons, but for contracts and data protection. Pricing comparison: nearly identical (Gemini 2.0 Flash ~$0.075/M tokens input, $0.30/M tokens output). Practical guidance: (A) prototyping — Gemini API, rapid idea validation; (B) formal development — immediately migrate to Vertex AI; (C) already on GCP — use Vertex AI, simpler IAM integration; (D) other cloud but wanting Gemini — use Vertex AI + Workload Identity Federation.

Q2: How much does fine-tuning a Vertex AI model cost?

Depends on model size, data volume, training hours, practical range $50–10,000+. (1) Supervised Fine-Tuning (SFT) on Gemini — billed by tokens; Gemini 1.5 Flash ~$8/M training tokens, Gemini 1.5 Pro ~$80. Fine-tuning Flash on 5,000 examples (500 tokens each) costs ~$20; Pro ~$200. (2) RLHF (Reinforcement Learning from Human Feedback) — more expensive, requires preference data, each training run $1,000–5,000. (3) Custom Training (Vertex AI Training) — GPU/TPU by the hour, A100 at $3.5–4/hour, H100 at $10+/hour; training a mid-size model (10B parameters, 1B tokens) ~$500–3,000. Cost-saving tips: (1) Try prompt engineering or few-shot first — fine-tuning is often unnecessary; a good prompt with 5–10 examples achieves 95% of the effect; (2) Use smaller models — Flash instead of Pro; (3) Quality beats quantity — 1,000 high-quality examples beat 10,000 noisy data points; (4) Use batch API, not online — batch is 50% cheaper. When fine-tuning is actually needed: (A) the model needs domain-specific knowledge (medical, legal, company-specific terminology); (B) output format is very fixed (internal report formats); (C) prompt engineering tried but insufficient. Only if at least one is true.

Q3: BigQuery ML vs. Vertex AI AutoML — which should data analysts use?

BigQuery ML better for data analysts; AutoML better for those with ML background. (1) BigQuery ML — (A) write ML models in SQL; (B) zero data movement when data already in BigQuery; (C) supports logistic/linear regression, matrix factorization, time series (ARIMA), DNN, boosted trees, AutoML tabular; (D) lowest barrier for data analysts — SQL skill is enough for ML. (2) Vertex AI AutoML — (A) UI-based, no coding needed; (B) auto-selects models and tunes hyperparameters; (C) covers tabular, image, video, text — broadest scope; (D) still low-barrier but data needs export+import into Vertex, one more step than BigQuery-native. Selection guidance: (A) data in BigQuery + simple prediction — BigQuery ML (classification, regression, time series); (B) need image/NLP models — Vertex AI AutoML; (C) need deep learning or ensembles — Vertex AI Custom Training + TensorFlow/PyTorch; (D) structured data prediction with best accuracy — BigQuery ML with AutoML tabular method. Real cases: sales forecasting, customer churn, trend forecasting use BigQuery ML; image recognition, document classification use Vertex AutoML.

Q4: What free / open-source models does Vertex AI Model Garden offer?

Quite a few — Model Garden aggregates Google's own + third-party + open-source models, one-click deploy. 2025 main categories: (1) Google models (some with free quota) — Gemini series, PaLM, Imagen, Chirp (speech), MedPaLM (medical); (2) Open-source (weights free; you pay compute) — Llama 3.1/3.2/3.3, Gemma series, Mistral/Mixtral, DeepSeek, Qwen, Stable Diffusion, BERT, T5; (3) Third-party commercial models — Anthropic Claude (paid), AI21, Cohere. Deployment methods: (A) Vertex AI Endpoints — 24/7 hosted GPU/TPU, fits steady traffic; (B) Batch Prediction — one-off bulk inference, cheap but not real-time; (C) Import to Colab Enterprise — trial/experimentation. Cost considerations: (A) open-source model compute costs similar to self-hosting (GPU $3–10/hour), but without Gemini API's "pay-per-token" convenience; (B) fits private model requirements (sensitive data, offline capability); (C) if only inference is needed, Gemini API is cheaper (no idle GPU cost). Practical guidance: 99% of use cases pick Gemini (paid is fine, good performance, zero ops); open-source self-hosting only for specialized needs.

Q5: What are the most common pitfalls when enterprises integrate Gemini into product features?

Five major pitfalls. (1) Runaway hallucinations — model confidently generates incorrect information. Fix: use Grounding (Vertex AI Search or Google Search grounding) to tie model output to real data; add response_schema to enforce structured output, preventing free-form drift. (2) Runaway costs — post-launch request volume spikes, long context burns budget. Fix: (A) implement quotas first (per user, per endpoint); (B) use cache + deduplication — don't re-query identical prompts; (C) try smaller model first (Flash), upgrade to Pro only when needed; (D) enable Context Caching to save on repeated context cost (50%+ savings). (3) Prompt Injection exploitation — users craft malicious prompts to bypass restrictions. Fix: input-side validation, output-side content filter, use Model Armor (Vertex AI's prompt injection protection). (4) Latency issues — Gemini Pro responses take 2–5 seconds, impacting UX. Fix: (A) use streaming response to let UI render incrementally; (B) switch simple tasks to Flash (~500ms); (C) use Batch API for bulk tasks. (5) Compliance audits — customers ask "is data used for training?" "where does data reside?" Fix: (A) use Vertex AI (not Gemini API) for enterprise contractual guarantees; (B) pick specific regions (e.g., asia-east1 Taiwan region); (C) enable VPC-SC to restrict data boundary; (D) retain complete audit logs for compliance review.



