atemate/GCP_dataXXX_products.md

## GCP_databases.md

| Feature/Use Case | AlloyDB | Cloud Spanner | Bigtable | Firestore | Firebase Realtime Database | Memorystore | Cloud SQL | Cloud Storage | Dataproc (HBase) |
|---|---|---|---|---|---|---|---|---|---|
| Type | Relational (PostgreSQL-compatible) | Relational (NewSQL, globally distributed) | NoSQL (wide-column database) | NoSQL (document-oriented) | NoSQL (real-time JSON-based) | In-memory key-value store | Relational (MySQL, PostgreSQL, SQL Server) | Object Storage | NoSQL (Hadoop-based) |
| Key Use Case | High-performance relational DB with analytical capabilities | Globally distributed relational DB | Real-time, high-throughput, low-latency workloads | Real-time apps, mobile, web apps | Real-time syncing for mobile/web | Low-latency caching | General-purpose relational DB | File/object storage, backups | Batch processing on HDFS |
| Scaling | Horizontal compute, separate storage | Horizontal scaling, global consistency | Horizontal scaling, massive scale | Horizontally scalable with regional/global options | Horizontally scalable | Scales horizontally by sharding | Vertical scaling (VM-based limits) | Unlimited horizontal | Limited by cluster resources |
| Transactions | Strong consistency (ACID) | Strong consistency (ACID) | Limited to row-level atomicity | Document-level atomicity | Limited atomic writes | Not transactional | Strong consistency (ACID) | Not applicable | Limited to HBase capabilities |
| Schema | Relational (PostgreSQL-compatible) | Relational (Spanner SQL) | NoSQL (wide-column schema) | NoSQL (schema-flexible JSON) | NoSQL (schema-flexible JSON) | Key-value | Relational (SQL-compliant) | No schema | NoSQL (wide-column schema) |
| Performance | High for mixed OLTP/OLAP workloads | Optimized for global consistency | Optimized for low-latency, high-throughput | Low-latency real-time updates | Ultra-low latency for sync | Ultra-low latency (sub-ms) | Good for standard workloads | Dependent on access patterns | Good for batch/analytical loads |
| Query Language | SQL (PostgreSQL) | SQL (Spanner SQL) | Limited SQL-like queries | NoSQL-style queries | NoSQL-style queries | Not applicable | SQL (MySQL/PostgreSQL/SQL Server) | Not queryable | HBase shell/MapReduce |
| Regional/Global | Regional | Global | Regional | Regional/Global | Regional | Regional | Regional | Global | Regional |
| Integration | PostgreSQL ecosystem | Google ecosystem | Google ecosystem | Google ecosystem | Google ecosystem | Redis/Memcached-compatible APIs | MySQL/PostgreSQL ecosystem | General-purpose | Hadoop ecosystem |
| Cost Efficiency | Mid-range | High for small workloads | Cost-effective for high throughput | Cost-effective for real-time apps | Very cost-effective for small apps | Cost-efficient for caching | Mid-range for general workloads | Low for static storage | Varies with cluster size |
| Best For | Mixed transactional and analytical workloads | Multi-region/global apps requiring strict consistency | IoT, analytics, logging | Mobile/web apps needing real-time sync | Mobile/web apps, IoT, low-complexity sync | Caching and session storage | Standard relational apps | File storage, backups, archives | Analytical processing on big data |

### When to Use Which?

**Relational Needs:**

- AlloyDB: PostgreSQL workloads with advanced analytics.
- Cloud SQL: Traditional relational databases for smaller-scale apps.
- Cloud Spanner: Global-scale relational apps requiring strong consistency.

**NoSQL Needs:**

- Firestore: Structured data with real-time sync for apps.
- Firebase Realtime Database: Low-latency syncing for simpler apps.
- Bigtable: Massive-scale NoSQL for analytics or time-series data.

**In-Memory Needs:**

- Memorystore: For ultra-fast caching or session data.

**Unstructured Data:**

- Cloud Storage: Binary data like images, videos, and backups.

**Big Data Processing:**

- Bigtable: High-throughput analytics.
- Dataproc (HBase): Hadoop-based batch processing.
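
The guidance above can be read as a two-level lookup: first the category of need, then the specific requirement. The following is only a minimal sketch of that reading. The `RECOMMENDATIONS` table, the `recommend` function, and all of the key strings are hypothetical labels invented for illustration; they are not GCP API identifiers.

```python
# Hypothetical lookup mirroring the "When to Use Which?" guidance.
# Category and requirement keys are illustrative labels, not GCP identifiers.
RECOMMENDATIONS = {
    "relational": {
        "postgresql_with_analytics": "AlloyDB",
        "traditional_smaller_scale": "Cloud SQL",
        "global_strong_consistency": "Cloud Spanner",
    },
    "nosql": {
        "structured_realtime_sync": "Firestore",
        "simple_low_latency_sync": "Firebase Realtime Database",
        "massive_scale_time_series": "Bigtable",
    },
    "in_memory": {
        "caching_or_sessions": "Memorystore",
    },
    "unstructured": {
        "binary_objects": "Cloud Storage",
    },
    "big_data": {
        "high_throughput_analytics": "Bigtable",
        "hadoop_batch": "Dataproc (HBase)",
    },
}


def recommend(category: str, requirement: str) -> str:
    """Return the service suggested for a (category, requirement) pair."""
    try:
        return RECOMMENDATIONS[category][requirement]
    except KeyError:
        # Unknown labels are reported rather than guessed at.
        raise ValueError(
            f"no recommendation for {category!r} / {requirement!r}"
        ) from None


if __name__ == "__main__":
    print(recommend("relational", "global_strong_consistency"))
    print(recommend("nosql", "massive_scale_time_series"))
```

A flat table like this keeps the mapping easy to audit against the list, at the cost of ignoring trade-offs (cost, region, existing ecosystem) that a real selection would weigh.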


(c) ChatGPT

  
## GCP_dataXXX_products.md

| Service | Purpose | Key Features | Common Use Cases | Integration |
|---|---|---|---|---|
| Dataflow | Stream and batch data processing using Apache Beam. | Serverless, autoscaling, unified stream and batch processing, supports windowing and watermarking. | ETL, real-time analytics, log analysis, IoT data processing. | BigQuery, Pub/Sub, Cloud Storage, AI Platform. |
| Dataprep | Cloud-based data preparation and cleaning tool (by Trifacta). | Drag-and-drop UI, data profiling, intelligent suggestions, serverless, integrates with cloud storage. | Preprocessing datasets for ML models, cleaning messy data, preparing datasets for BI tools. | BigQuery, Cloud Storage, Sheets, Dataflow. |
| Dataproc | Managed Apache Hadoop and Spark service for big data processing. | Quick cluster setup, autoscaling, preemptible VMs, native integration with GCP tools. | Data transformation, scalable workflows, machine learning on large datasets. | Cloud Storage, BigQuery, Dataflow, Vertex AI. |
| Datastore | NoSQL document database for web and mobile apps. | Scalable, high availability, automatic sharding, ACID transactions, JSON-like documents. | Storing semi-structured data, user profiles, product catalogs, scalable backends for mobile/web apps. | App Engine, Compute Engine, GKE, Firestore. |
| Data Catalog | Centralized metadata management service for discovering and managing datasets. | Searchable metadata, data lineage tracking, custom tagging, security controls. | Organizing datasets, enhancing discoverability, managing metadata across data lakes and warehouses. | BigQuery, Dataplex, Dataflow, Pub/Sub. |
| Dataplex | Unified data management and governance across data lakes and data warehouses. | Automated metadata management, data quality checks, policy enforcement, unified data view. | Ensuring data governance, building trusted data lakes, centralizing analytics-ready data. | BigQuery, Cloud Storage, Data Catalog, Looker. |
| Data Fusion | Fully managed data integration service for building ETL pipelines visually. | Drag-and-drop pipeline building, pre-built connectors, cloud-native ETL processing. | Simplified data integration, combining data from multiple sources, enterprise-grade data pipelines. | BigQuery, Cloud Storage, JDBC connectors. |
| Data Studio | Business intelligence tool for creating customizable dashboards and reports. | Interactive dashboards, real-time data visualization, customizable templates, easy sharing. | Building executive dashboards, visualizing business KPIs, creating sharable and interactive reports. | BigQuery, Sheets, Looker Studio, Cloud Storage. |