@atemate
Last active December 26, 2024 15:25
GCP cheatsheet
| Feature/Use Case | AlloyDB | Cloud Spanner | Bigtable | Firestore | Firebase Realtime Database | Memorystore | Cloud SQL | Cloud Storage | Dataproc (HBase) |
|---|---|---|---|---|---|---|---|---|---|
| Type | Relational (PostgreSQL-compatible) | Relational (NewSQL, globally distributed) | NoSQL (wide-column database) | NoSQL (document-oriented) | NoSQL (real-time JSON-based) | In-memory key-value store | Relational (MySQL, PostgreSQL, SQL Server) | Object storage | NoSQL (Hadoop-based) |
| Key Use Case | High-performance relational DB with analytical capabilities | Globally distributed relational DB | Real-time, high-throughput, low-latency workloads | Real-time apps, mobile, web apps | Real-time syncing for mobile/web | Low-latency caching | General-purpose relational DB | File/object storage, backups | Batch processing on HDFS |
| Scaling | Horizontal compute, separate storage | Horizontal scaling, global consistency | Horizontal scaling, massive scale | Horizontally scalable with regional/global options | Horizontally scalable | Scales horizontally by sharding | Vertical scaling (VM-based limits) | Unlimited horizontal | Limited by cluster resources |
| Transactions | Strong consistency (ACID) | Strong consistency (ACID) | Limited to row-level atomicity | ACID transactions across multiple documents | Limited atomic writes | Not transactional | Strong consistency (ACID) | Not applicable | Limited to HBase capabilities |
| Schema | Relational (PostgreSQL-compatible) | Relational (Spanner SQL) | NoSQL (wide-column schema) | NoSQL (schema-flexible JSON) | NoSQL (schema-flexible JSON) | Key-value | Relational (SQL-compliant) | No schema | NoSQL (wide-column schema) |
| Performance | High for mixed OLTP/OLAP workloads | Optimized for global consistency | Optimized for low-latency, high-throughput | Low-latency real-time updates | Ultra-low latency for sync | Ultra-low latency (sub-ms) | Good for standard workloads | Dependent on access patterns | Good for batch/analytical loads |
| Query Language | SQL (PostgreSQL) | SQL (Spanner SQL) | Limited SQL-like queries | NoSQL-style queries | NoSQL-style queries | Not applicable | SQL (MySQL/PostgreSQL/SQL Server) | Not queryable | HBase shell/MapReduce |
| Regional/Global | Regional | Global | Regional | Regional/Global | Regional | Regional | Regional | Global | Regional |
| Integration | PostgreSQL ecosystem | Google ecosystem | Google ecosystem | Google ecosystem | Google ecosystem | Redis/Memcached-compatible APIs | MySQL/PostgreSQL ecosystem | General-purpose | Hadoop ecosystem |
| Cost Efficiency | Mid-range | High cost for small workloads | Cost-effective for high throughput | Cost-effective for real-time apps | Very cost-effective for small apps | Cost-efficient for caching | Mid-range for general workloads | Low for static storage | Varies with cluster size |
| Best For | Mixed transactional and analytical workloads | Multi-region/global apps requiring strict consistency | IoT, analytics, logging | Mobile/web apps needing real-time sync | Mobile/web apps, IoT, low-complexity sync | Caching and session storage | Standard relational apps | File storage, backups, archives | Analytical processing on big data |

When to Use Which?

  1. Relational Needs:

    • AlloyDB: PostgreSQL workloads with advanced analytics.
    • Cloud SQL: Traditional relational databases for smaller-scale apps.
    • Cloud Spanner: Global-scale relational apps requiring strong consistency.
  2. NoSQL Needs:

    • Firestore: Structured data with real-time sync for apps.
    • Firebase Realtime Database: Low-latency syncing for simpler apps.
    • Bigtable: Massive-scale NoSQL for analytics or time-series data.
  3. In-Memory Needs:

    • Memorystore: For ultra-fast caching or session data.
  4. Unstructured Data:

    • Cloud Storage: Binary data like images, videos, and backups.
  5. Big Data Processing:

    • Bigtable: High-throughput analytics.
    • Dataproc (HBase): Hadoop-based batch processing.
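
The list above can be read as a small lookup table. The following is a minimal sketch of that idea in Python, not an official selection tool: the `Need` enum, the `CANDIDATES` mapping, and the two yes/no questions in `pick_relational` are hypothetical names and simplifications introduced here for illustration. Real choices also depend on cost, region, and workload details.

```python
from enum import Enum


class Need(Enum):
    """Hypothetical labels for the five categories in the list above."""
    RELATIONAL = "relational"
    NOSQL = "nosql"
    IN_MEMORY = "in_memory"
    UNSTRUCTURED = "unstructured"
    BIG_DATA = "big_data"


# Candidate services per category, copied from the cheatsheet list.
CANDIDATES = {
    Need.RELATIONAL: {
        "AlloyDB": "PostgreSQL workloads with advanced analytics",
        "Cloud SQL": "traditional relational databases for smaller-scale apps",
        "Cloud Spanner": "global-scale relational apps requiring strong consistency",
    },
    Need.NOSQL: {
        "Firestore": "structured data with real-time sync for apps",
        "Firebase Realtime Database": "low-latency syncing for simpler apps",
        "Bigtable": "massive-scale NoSQL for analytics or time-series data",
    },
    Need.IN_MEMORY: {
        "Memorystore": "ultra-fast caching or session data",
    },
    Need.UNSTRUCTURED: {
        "Cloud Storage": "binary data like images, videos, and backups",
    },
    Need.BIG_DATA: {
        "Bigtable": "high-throughput analytics",
        "Dataproc (HBase)": "Hadoop-based batch processing",
    },
}


def suggest(need: Need) -> list[str]:
    """Return the services the cheatsheet lists for one category, in list order."""
    return list(CANDIDATES[need])


def pick_relational(global_scale: bool, heavy_analytics: bool) -> str:
    """Narrow the relational options with two illustrative yes/no questions."""
    if global_scale:
        return "Cloud Spanner"
    if heavy_analytics:
        return "AlloyDB"
    return "Cloud SQL"


if __name__ == "__main__":
    for name in suggest(Need.NOSQL):
        print(f"{name}: {CANDIDATES[Need.NOSQL][name]}")
    print(pick_relational(global_scale=False, heavy_analytics=True))
```

Checking global scale first is a design choice of this sketch: it treats the need for global consistency as the hardest requirement to retrofit, so it takes priority over analytics.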

(c) ChatGPT

| Service | Purpose | Key Features | Common Use Cases | Integration |
|---|---|---|---|---|
| Dataflow | Stream and batch data processing using Apache Beam. | Serverless, autoscaling, unified stream and batch processing, supports windowing and watermarking. | ETL, real-time analytics, log analysis, IoT data processing. | BigQuery, Pub/Sub, Cloud Storage, Vertex AI (formerly AI Platform). |
| Dataprep | Cloud-based data preparation and cleaning tool (by Trifacta). | Drag-and-drop UI, data profiling, intelligent suggestions, serverless, integrates with cloud storage. | Preprocessing datasets for ML models, cleaning messy data, preparing datasets for BI tools. | BigQuery, Cloud Storage, Sheets, Dataflow. |
| Dataproc | Managed Apache Hadoop and Spark service for big data processing. | Quick cluster setup, autoscaling, preemptible VMs, native integration with GCP tools. | Data transformation, scalable workflows, machine learning on large datasets. | Cloud Storage, BigQuery, Dataflow, Vertex AI. |
| Datastore | NoSQL document database for web and mobile apps. | Scalable, high availability, automatic sharding, ACID transactions, JSON-like documents. | Storing semi-structured data, user profiles, product catalogs, scalable backends for mobile/web apps. | App Engine, Compute Engine, GKE, Firestore. |
| Data Catalog | Centralized metadata management service for discovering and managing datasets. | Searchable metadata, data lineage tracking, custom tagging, security controls. | Organizing datasets, enhancing discoverability, managing metadata across data lakes and warehouses. | BigQuery, Dataplex, Dataflow, Pub/Sub. |
| Dataplex | Unified data management and governance across data lakes and data warehouses. | Automated metadata management, data quality checks, policy enforcement, unified data view. | Ensuring data governance, building trusted data lakes, centralizing analytics-ready data. | BigQuery, Cloud Storage, Data Catalog, Looker. |
| Data Fusion | Fully managed data integration service for building ETL pipelines visually. | Drag-and-drop pipeline building, pre-built connectors, cloud-native ETL processing. | Simplified data integration, combining data from multiple sources, enterprise-grade data pipelines. | BigQuery, Cloud Storage, JDBC connectors. |
| Data Studio (now Looker Studio) | Business intelligence tool for creating customizable dashboards and reports. | Interactive dashboards, real-time data visualization, customizable templates, easy sharing. | Building executive dashboards, visualizing business KPIs, creating sharable and interactive reports. | BigQuery, Sheets, Cloud Storage. |
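
The Dataflow row mentions windowing. As a rough illustration of what a fixed (tumbling) window does, here is a plain-Python sketch that groups timestamped events into 60-second buckets and sums each bucket. It deliberately does not use the Apache Beam SDK; the function names and sample events are made up for this example, and it ignores watermarks and late data, which a real streaming pipeline has to handle.

```python
from collections import defaultdict


def fixed_windows(events, size_s):
    """Group (timestamp_seconds, value) pairs into fixed windows.

    Each event lands in the window whose start is its timestamp
    rounded down to a multiple of size_s.
    """
    windows = defaultdict(list)
    for ts, value in events:
        window_start = (ts // size_s) * size_s
        windows[window_start].append(value)
    return dict(windows)


def sum_per_window(events, size_s):
    """Sum the values in each fixed window, ordered by window start."""
    grouped = fixed_windows(events, size_s)
    return {start: sum(values) for start, values in sorted(grouped.items())}


if __name__ == "__main__":
    # Hypothetical sensor readings: (seconds since start, reading).
    events = [(0, 1), (30, 2), (61, 5), (119, 1), (120, 4)]
    print(sum_per_window(events, 60))  # {0: 3, 60: 6, 120: 4}
```

In a batch job all events are available up front, so grouping like this is enough. In a stream, the system must also decide when a window is complete, which is the role watermarks play.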
