Cloud SQL is a fully managed relational database service that supports MySQL, PostgreSQL, and Microsoft SQL Server. For horizontal scalability, consider Cloud Spanner instead.

Cloud SQL

Example (diagram, courtesy Google): building a containerized app with a scalable database.

Managed service features include:

  • Automatic patching and updates.
  • User administration.
  • Support for many clients (example below), such as:
    • gcloud sql
    • App Engine
    • Toad
    • SQL Workbench, etc.
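
For example, you can open a session from the gcloud CLI (my-instance is a placeholder instance name):

gcloud sql connect my-instance --user=root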

Instance (maximum capacity)

  • 30TB Storage
  • 416 GB RAM
  • 40k IOPS

Services

  • Replica services
  • Backups
  • Import/Export
  • Scaling
    • Up: increase machine capacity (requires a restart); see the sketch after this list.
    • Out: add read replicas.
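
A minimal sketch with the gcloud CLI, assuming an existing instance named my-instance (the instance, replica, and tier names are placeholders):

# Scale up: change the machine type (the instance restarts).
gcloud sql instances patch my-instance --tier=db-custom-4-15360

# Scale out: add a read replica.
gcloud sql instances create my-replica --master-instance-name=my-instance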

Connecting to Cloud SQL

  • Within the same GCP region: use private IP.
  • From outside GCP: use the Cloud SQL Auth Proxy (see the sketch after this list).
    • Authorized networks: alternatively, authorize specific IP addresses to connect directly (use SSL if doing so).
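
A minimal sketch of running the (v1) Cloud SQL Auth Proxy locally, with my-project:us-central1:my-instance as a placeholder instance connection name:

./cloud_sql_proxy -instances=my-project:us-central1:my-instance=tcp:3306

Clients can then connect to 127.0.0.1:3306 as if the database were local.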

Private IP

Migration from MySQL to Cloud SQL


Cloud Spanner

  • Relational database structure with non-relational horizontal scaling.
  • Scales up to petabytes.
  • Automatic sharding.
  • Transactional consistency at global scale.
  • Automatic synchronous replication.
  • Uptime SLA of up to 99.999% (multi-region instances).

Benefits of replication

  • Data availability
  • Geographic locality
  • Single database experience
  • Easier application development

Types:

  • read-write replicas,
  • read-only replicas, and
  • witness replicas.

read-write replicas

  • Maintain a full copy of your data.
  • Serve reads.
  • Can vote whether to commit a write.
  • Participate in leadership election.
  • Are eligible to become a leader.
  • Are the only type used in single-region instances.

read-only replicas

  • As the name suggests are read-only.
  • Maintain a full copy of your data, which is replicated from read-write replicas.
  • Serve reads.
  • Do not participate in voting to commit writes. Hence, the location of the read-only replicas never contributes to write latency.
  • Can usually serve stale reads without needing a round-trip to the default leader region.
  • Are not eligible to become a leader.

witness replicas

  • Are only used in multi-region instances.
  • Do not maintain a full copy of data.
  • Do not serve reads.
  • Vote whether to commit writes.
  • Participate in leader election but are not eligible to become leader.
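
Multi-region instances combine these replica types according to the chosen configuration. As a sketch, creating one with the gcloud CLI (the instance name is a placeholder; nam3 is one of the North America multi-region configurations):

gcloud spanner instances create my-spanner-instance \
    --config=nam3 \
    --description="Demo instance" \
    --nodes=1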

Creating a Database in Spanner

  1. Clone the repo: git clone https://github.com/googleapis/python-spanner
  2. Move to the samples directory: cd python-spanner/samples/samples
  3. Create a virtual environment and install the dependencies.

requirements.txt:

google-cloud-spanner==3.2.0
futures==3.3.0; python_version < "3"

Create and activate the virtual environment, then install the requirements:

virtualenv env
source env/bin/activate
pip install -r requirements.txt

Sample code to create a database:

# Imports the Google Cloud Client Library.
from google.cloud import spanner

def create_database(instance_id, database_id):
    """Creates a database and tables for sample data."""
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)

    database = instance.database(
        database_id,
        ddl_statements=[
            """CREATE TABLE Singers (
            SingerId     INT64 NOT NULL,
            FirstName    STRING(1024),
            LastName     STRING(1024),
            SingerInfo   BYTES(MAX)
        ) PRIMARY KEY (SingerId)""",
            """CREATE TABLE Albums (
            SingerId     INT64 NOT NULL,
            AlbumId      INT64 NOT NULL,
            AlbumTitle   STRING(MAX)
        ) PRIMARY KEY (SingerId, AlbumId),
        INTERLEAVE IN PARENT Singers ON DELETE CASCADE""",
        ],
    )

    operation = database.create()

    print("Waiting for operation to complete...")
    operation.result(120)

    print("Created database {} on instance {}".format(database_id, instance_id))

Creating Backups

Create a backup

from datetime import datetime, timedelta

# Imports the Google Cloud Client Library.
from google.cloud import spanner

def create_backup(instance_id, database_id, backup_id, version_time):
    """Creates a backup for a database."""
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)

    # Create a backup
    expire_time = datetime.utcnow() + timedelta(days=14)
    backup = instance.backup(backup_id, database=database, expire_time=expire_time, version_time=version_time)
    operation = backup.create()

    # Wait for backup operation to complete.
    operation.result(1200)

    # Verify that the backup is ready.
    backup.reload()
    assert backup.is_ready() is True

    # Get the name, create time, and backup size.
    print(
        "Backup {} of size {} bytes was created at {} for version of database at {}".format(
            backup.name, backup.size_bytes, backup.create_time, backup.version_time
        )
    )
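
Example invocation (IDs are placeholders; version_time must fall within the database's version retention period):

create_backup("test-instance", "example-db", "example-backup", version_time=datetime.utcnow() - timedelta(minutes=5))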

Restore database from backup

# Imports the Google Cloud Client Library.
from google.cloud import spanner

def restore_database(instance_id, new_database_id, backup_id):
    """Restores a database from a backup."""
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)

    # Start restoring an existing backup to a new database.
    backup = instance.backup(backup_id)
    new_database = instance.database(new_database_id)
    operation = new_database.restore(backup)

    # Wait for restore operation to complete.
    operation.result(1600)

    # Newly created database has restore information.
    new_database.reload()
    restore_info = new_database.restore_info
    print(
        "Database {} restored to {} from backup {} with version time {}.".format(
            restore_info.backup_info.source_database,
            new_database_id,
            restore_info.backup_info.backup,
            restore_info.backup_info.version_time
        )
    )
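
Example invocation, restoring into a new database (IDs are placeholders):

restore_database("test-instance", "restored-db", "example-backup")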

When to use

  • You have outgrown a single-instance database.
  • Sharding is required for better throughput.
  • Transactional consistency is required.
  • Strong global consistency is required.
  • Database consolidation: combining multiple different databases into one service.

Cloud Firestore.

  • Simplifies storage of data.
  • Live sync and offline support.
  • ACID Transactions.
  • Multi-region replication.

Data Model

Firestore is a NoSQL, document-oriented database. Unlike a SQL database, there are no tables or rows. Instead, you store data in documents, which are organized into collections.

  • Similar to MongoDB.
  • Data is organized into collections.
  • Documents are sets of key-value pairs.
  • Hierarchy: Collections -> Documents

Example of how this looks: below, a users collection holds two documents, alovelace and aturing.

users
    alovelace
        first : "Ada"
        last : "Lovelace"
        born : 1815

    aturing
        first : "Alan"
        last : "Turing"
        born : 1912

Every document in Firestore is uniquely identified by its location within the database. The previous example showed a document alovelace within the collection users. To refer to this location in your code, you can create a reference to it.

a_lovelace_ref = db.collection(u'users').document(u'alovelace')

We can also give the path to a document or collection as a string, with components separated by a forward slash (/). For example, to create a reference to the alovelace document:

a_lovelace_ref = db.document(u'users/alovelace')

Both snippets above refer to the same document.
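
A minimal sketch of reading the document back through the reference (assuming the users collection from the example above exists):

doc = a_lovelace_ref.get()
if doc.exists:
    print(doc.to_dict())  # {'first': 'Ada', 'last': 'Lovelace', 'born': 1815}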

When to choose.

  • Adaptable database: the schema might change.
  • Scales down to zero.
  • Low maintenance, with no overhead when scaling up to terabytes.

Best practices

  • Database location
    • Closest to users.
    • Multi-region for better availability.
    • Regional location for lower costs and lower write latency (if the app is latency sensitive).
  • Document IDs
    • Avoid the IDs . and ..
    • Avoid / in document IDs (since / is the path separator).
    • Avoid monotonically increasing document IDs such as id1, id2, id3 (see the sketch after this list).
  • Field names
    • Avoid the following characters, which require extra escaping: ., [, ], *, ` (backtick).
  • Indexes
    • Avoid having too many indexes.
    • Be aware that indexing fields with monotonically increasing values, such as timestamps, can lead to hotspots that impact latency for applications with high read and write rates.
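
To avoid monotonically increasing document IDs, let Firestore assign one. A minimal sketch, reusing the users collection from earlier (db is a firestore.Client instance):

# Calling document() with no arguments generates a random document ID.
new_user_ref = db.collection(u'users').document()
new_user_ref.set({u'first': u'Grace', u'last': u'Hopper', u'born': 1906})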

Example: writing a document.

import datetime

from google.cloud import firestore

# Project ID is determined by the GCLOUD_PROJECT environment variable
db = firestore.Client()

data = {
    u'stringExample': u'Hello, World!',
    u'booleanExample': True,
    u'numberExample': 3.14159265,
    u'dateExample': datetime.datetime.now(),
    u'arrayExample': [5, True, u'hello'],
    u'nullExample': None,
    u'objectExample': {
        u'a': 5,
        u'b': True
    }
}

db.collection(u'data').document(u'one').set(data)

Cloud Bigtable.

  • Suitable when transactional consistency is not required.
  • Fully managed NoSQL database at petabyte scale with very low latency.
  • Scales for throughput.
  • Runs core Google services such as Maps, Analytics, and Gmail.
  • Supports the Apache HBase API.
  • Integrates with Hadoop, Cloud Dataflow, and Cloud Dataproc.

Data Model.

  • Table
    • Contains multiple rows.
  • Row
    • A row key and one or more columns.
    • Rows are sorted alphabetically by row key.
    • The row key design is very important.
    • Design the key so that related data is stored close together.
    • Example: area-code/city if we are creating a population graph.
  • Column
    • A column consists of a column family and a column qualifier, delimited by a : (colon) character.
    • Example: content:html
  • Column family
    • Column families physically colocate a set of columns and their values, often for performance reasons.
    • Each row in a table has the same column families, though a given row might not store anything in a given column family.
  • Column qualifier
    • Mutable.
    • Added to a column family to index a given piece of data.
    • Example: content:html or content:pdf
  • Cell
    • A combination of row, column family, and column qualifier.
    • Contains a value and a timestamp, which represents the value's version.
    • Example: "com.cnn.www" | t6 | contents:html = "<html> ... "
  • Timestamp
    • A timestamp is written alongside each value.
    • Used for versioning.
    • By default, the timestamp represents the time when the data was written, but you can specify a different timestamp when you put data into the cell (see the sketch after this list).
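
A minimal sketch of writing a cell with an explicit timestamp using the google-cloud-bigtable Python client (the project, instance, and table IDs are placeholders; the table and column family are assumed to exist already):

import datetime

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("test")

# Write one cell; the supplied timestamp becomes the value's version.
row = table.direct_row(b"row1")
row.set_cell("cf", b"a", b"value1", timestamp=datetime.datetime.utcnow())
row.commit()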

Storage Model.

  • Stores data in massively scalable tables, each of which is a sorted key/value map.
  • A table contains rows and columns.
  • Each row describes a single entity.
  • Columns contain individual values for each row and are grouped into column families.

Conceptual View

Conceptually, the table is a multi-dimensional map. The following is only a mock-up.

{
  "com.cnn.www": {
    contents: {
      t6: contents:html: "<html>..."
      t5: contents:html: "<html>..."
      t3: contents:html: "<html>..."
    }
    anchor: {
      t9: anchor:cnnsi.com = "CNN"
      t8: anchor:my.look.ca = "CNN.com"
    }
    people: {}
  }
  "com.example.www": {
    contents: {
      t5: contents:html: "<html>..."
    }
    anchor: {}
    people: {
      t5: people:author: "John Doe"
    }
  }
}

The mock-up above illustrates the different components of the data:

  • Row com.cnn.www.
  • Column contents:html.
  • Column Family contents, anchor.
  • Column Qualifier html, cnnsi.
  • Cell com.cnn.www | t6 | contents:html: "<html>..." .
  • Timestamp t6, t5.

Example: a Twitter-style follows table, where each row represents a user and each column an account they follow.

Hbase Shell - Table Operations

Creating a Table.

hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds

=> Hbase::Table - test

Use the list command to confirm your table exists

hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds

=> ["test"]

Now use the describe command to see details, including configuration defaults

hbase(main):003:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE =>
'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f
alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE
 => '65536'}
1 row(s)
Took 0.9998 seconds

To put data into your table, use the put command.

hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds

hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds

hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds

Scan the table to view all data:

hbase(main):006:0> scan 'test'
ROW                                      COLUMN+CELL
 row1                                    column=cf:a, timestamp=1421762485768, value=value1
 row2                                    column=cf:b, timestamp=1421762491785, value=value2
 row3                                    column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds

Get a single row:

hbase(main):007:0> get 'test', 'row1'
COLUMN                                   CELL
 cf:a                                    timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
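
On Cloud Bigtable itself, the cbt CLI offers equivalent operations; a sketch (assuming the project and instance are already configured in ~/.cbtrc):

cbt createtable test
cbt createfamily test cf
cbt set test row1 cf:a=value1
cbt read test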

Bigtable Architecture


  • Tablets are stored on Colossus, Google’s file system.
  • Stored in SSTable format.

Importantly, data is never stored in Cloud Bigtable nodes themselves; each node has pointers to a set of tablets that are stored on Colossus. As a result:

  • Rebalancing tablets from one node to another is very fast, because the actual data is not copied.
  • Cloud Bigtable simply updates the pointers for each node.
  • Recovery from the failure of a Cloud Bigtable node is very fast, because only metadata needs to be migrated to the replacement node.
  • When a Cloud Bigtable node fails, no data is lost.

When to use Bigtable.

  • You store more than 1 TB of data.
  • High volumes of writes.
  • Read/write latency below 10 ms is required.
  • HBase API compatibility is needed.

Cloud Memorystore.

Memorystore for Redis provides a fully managed service, powered by the Redis in-memory data store, for building application caches with sub-millisecond data access.

  • In-memory data store service.
  • Redis on GCP infra.
  • High availability, with patching and monitoring handled for you.
  • Sub-millisecond latency.
  • Instances up to 300 GB, with up to 12 Gbps of network throughput.
  • Existing Redis deployments can be migrated to Memorystore.
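
A minimal sketch of connecting with the redis-py client (the host IP is a placeholder for your instance's private IP):

import redis

# Memorystore instances are reached over private IP from within the VPC.
r = redis.Redis(host="10.0.0.3", port=6379)
r.set("greeting", "hello")
print(r.get("greeting"))  # b'hello'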
