Cloud SQL is a fully managed relational database service that supports MySQL, PostgreSQL, and Microsoft SQL Server. For horizontal scalability, consider Cloud Spanner instead.
Cloud SQL
Managed service features include:
- Automatic patching and updates.
- User administration.
- Cloud SQL supports many clients, for example:
  - `gcloud sql`
  - App Engine
  - Toad
  - SQL Workbench
Instance limits (per instance):
- Up to 30 TB of storage
- Up to 416 GB of RAM
- Up to 40k IOPS
Services
- Replica services
- Backups
- Import/Export
- Scaling
  - Up: increase machine capacity (restart required).
  - Out: add read replicas.
Connecting to Cloud SQL
- Within the same GCP region: use private IP.
- From outside GCP: use the Cloud SQL Proxy.
Authorized Networks
If not connecting over SSL or through the proxy, authorize a specific client IP address (an authorized network) to connect.
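A minimal connection sketch, assuming the Cloud SQL Proxy is already running locally and forwarding to a MySQL instance; the user, password, and database names below are placeholders:

```python
import pymysql

# Assumes the Cloud SQL Proxy is listening on 127.0.0.1:3306, e.g. started with:
#   ./cloud_sql_proxy -instances=PROJECT:REGION:INSTANCE=tcp:3306
# Credentials and database name are placeholders.
connection = pymysql.connect(
    host="127.0.0.1",
    port=3306,
    user="root",
    password="your-password",
    database="guestbook",
)
try:
    with connection.cursor() as cursor:
        cursor.execute("SELECT NOW()")
        print(cursor.fetchone())
finally:
    connection.close()
```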
Migration from MySQL to Cloud SQL
See the migration guide linked in the Help section below.
Cloud Spanner
- Relational database structure with non-relational horizontal scaling.
- Scales up to petabytes.
- Automatic sharding.
- Transactional consistency at global scale.
- Automatic synchronous replication.
- High-availability uptime SLA (up to 99.999% for multi-region instances).
Replication benefits:
- Data availability
- Geographic locality
- Single database experience
- Easier application development
Types: `read-write` replicas, `read-only` replicas, and `witness` replicas.
`read-write` replicas:
- Maintain a full copy of your data.
- Serve reads.
- Can vote whether to commit a write.
- Participate in leadership election.
- Are eligible to become a leader.
- Are the only type used in single-region instances.
`read-only` replicas:
- As the name suggests, are read-only.
- Maintain a full copy of your data, which is replicated from read-write replicas.
- Serve reads.
- Do not participate in voting to commit writes. Hence, the location of the read-only replicas never contributes to write latency.
- Can usually serve stale reads without needing a round-trip to the default leader region.
- Are not eligible to become a leader.
`witness` replicas:
- Are only used in multi-region instances.
- Do not maintain a full copy of data.
- Do not serve reads.
- Vote whether to commit writes.
- Participate in leader election but are not eligible to become leader.
Creating a Database in Spanner.
- Clone the repo:
  `git clone https://github.com/googleapis/python-spanner`
- Move to the samples directory:
  `cd python-spanner/samples/samples`
- Create a virtual environment and install the dependencies (requirements.txt pins `google-cloud-spanner==3.2.0`, plus `futures==3.3.0` on Python 2 only):
  `virtualenv env`
  `source env/bin/activate`
  `pip install -r requirements.txt`
Sample code to create a database:

```python
# Imports the Google Cloud Client Library.
from google.cloud import spanner


def create_database(instance_id, database_id):
    """Creates a database and tables for sample data."""
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)
    database = instance.database(
        database_id,
        ddl_statements=[
            """CREATE TABLE Singers (
                SingerId   INT64 NOT NULL,
                FirstName  STRING(1024),
                LastName   STRING(1024),
                SingerInfo BYTES(MAX)
            ) PRIMARY KEY (SingerId)""",
            """CREATE TABLE Albums (
                SingerId   INT64 NOT NULL,
                AlbumId    INT64 NOT NULL,
                AlbumTitle STRING(MAX)
            ) PRIMARY KEY (SingerId, AlbumId),
            INTERLEAVE IN PARENT Singers ON DELETE CASCADE""",
        ],
    )

    # database.create() returns a long-running operation.
    operation = database.create()

    print("Waiting for operation to complete...")
    operation.result(120)

    print("Created database {} on instance {}".format(database_id, instance_id))
```
Creating Backups
Create a backup
```python
# Imports the Google Cloud Client Library.
from datetime import datetime, timedelta

from google.cloud import spanner


def create_backup(instance_id, database_id, backup_id, version_time):
    """Creates a backup for a database."""
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)

    # Create a backup that expires in 14 days.
    expire_time = datetime.utcnow() + timedelta(days=14)
    backup = instance.backup(
        backup_id, database=database, expire_time=expire_time, version_time=version_time
    )
    operation = backup.create()

    # Wait for backup operation to complete.
    operation.result(1200)

    # Verify that the backup is ready.
    backup.reload()
    assert backup.is_ready() is True

    # Get the name, create time and backup size.
    backup.reload()
    print(
        "Backup {} of size {} bytes was created at {} for version of database at {}".format(
            backup.name, backup.size_bytes, backup.create_time, backup.version_time
        )
    )
```
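For example, to back up the database as it existed five minutes ago (placeholder IDs; `version_time` must fall within the database's version-retention period):

```python
from datetime import datetime, timedelta

version_time = datetime.utcnow() - timedelta(minutes=5)
create_backup("my-instance", "example-db", "example-db-backup", version_time)
```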
Restore database from backup
```python
# Imports the Google Cloud Client Library.
from google.cloud import spanner


def restore_database(instance_id, new_database_id, backup_id):
    """Restores a database from a backup."""
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)

    # Start restoring an existing backup to a new database.
    backup = instance.backup(backup_id)
    new_database = instance.database(new_database_id)
    operation = new_database.restore(backup)

    # Wait for restore operation to complete.
    operation.result(1600)

    # Newly created database has restore information.
    new_database.reload()
    restore_info = new_database.restore_info
    print(
        "Database {} restored to {} from backup {} with version time {}.".format(
            restore_info.backup_info.source_database,
            new_database_id,
            restore_info.backup_info.backup,
            restore_info.backup_info.version_time,
        )
    )
```
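And to restore that backup into a new database on the same instance (placeholder IDs again):

```python
restore_database("my-instance", "restored-db", "example-db-backup")
```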
When to use
- You have outgrown a single-instance database.
- Sharding is required for better throughput.
- Transactional consistency is required.
- Strong global consistency is required.
- Database consolidation: combining multiple different databases into one service.
Cloud Firestore.
- Simplifies storage of data.
- Live sync and offline support.
- ACID Transactions.
- Multi-region replication.
Data Model
Firestore is a NoSQL, document-oriented database. Unlike a SQL database, there are no tables or rows.
Instead, you store data in documents, which are organized into collections.
- Similar to MongoDB.
- Data is organized into `collections`; a `document` is a set of key-value pairs.
- Hierarchy: collections -> list of documents.
An example of how this looks. Below we have a `users` collection with two documents, `alovelace` and `aturing`:

```
users
├── alovelace
│     first : "Ada"
│     last  : "Lovelace"
│     born  : 1815
└── aturing
      first : "Alan"
      last  : "Turing"
      born  : 1912
```
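A minimal sketch of creating these two documents with the Python client, assuming `db` is a `firestore.Client()` (as in the sample further below):

```python
# Each set() call creates (or overwrites) one document in the users collection.
db.collection(u'users').document(u'alovelace').set(
    {u'first': u'Ada', u'last': u'Lovelace', u'born': 1815}
)
db.collection(u'users').document(u'aturing').set(
    {u'first': u'Alan', u'last': u'Turing', u'born': 1912}
)
```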
Every document in Firestore is uniquely identified by its location within the database.
The previous example showed a document `alovelace` within the collection `users`.
To refer to this location in your code, you can create a reference to it.
```python
a_lovelace_ref = db.collection(u'users').document(u'alovelace')
```
We can also give the path to a document or collection as a string, with components separated by a forward slash (`/`). For example, to create a reference to the `alovelace` document:

```python
a_lovelace_ref = db.document(u'users/alovelace')
```
Both of the snippets above create a reference to the same document.
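Either reference can then be used to read the document back:

```python
# get() returns a DocumentSnapshot; to_dict() converts it to a plain dict.
doc = a_lovelace_ref.get()
if doc.exists:
    print(doc.to_dict())  # {'first': 'Ada', 'last': 'Lovelace', 'born': 1815}
```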
When to choose.
- Adaptable database: the schema might change.
- Scales down to zero.
- Low maintenance, with no overhead scaling up to terabytes.
Best practices
- Database location
  - Choose the location closest to your users.
  - Multi-region for better availability and durability.
  - Regional location for lower costs and lower write latency (if the app is latency-sensitive).
- Document IDs
  - Avoid using `.` and `..`.
  - Avoid `/` in the document ID (since `/` is the path separator).
  - Avoid monotonically increasing document IDs, e.g. `id1`, `id2`, `id3`, …
- Field names
  - Avoid the following characters: `.`, `[`, `]`, `*`, and `` ` `` (backtick).
- Indexes
  - Avoid having too many indexes.
  - Be aware that indexing fields with monotonically increasing values, such as timestamps, can lead to hotspots, which impact latency for applications with high read and write rates.
A sample write with various field types:

```python
import datetime

from google.cloud import firestore

# Project ID is determined by the GCLOUD_PROJECT environment variable.
db = firestore.Client()

data = {
    u'stringExample': u'Hello, World!',
    u'booleanExample': True,
    u'numberExample': 3.14159265,
    u'dateExample': datetime.datetime.now(),
    u'arrayExample': [5, True, u'hello'],
    u'nullExample': None,
    u'objectExample': {
        u'a': 5,
        u'b': True
    }
}

db.collection(u'data').document(u'one').set(data)
```
Cloud Bigtable.
- Suitable when transactional consistency is not required.
- Fully managed NoSQL database at petabyte scale with very low latency.
- Scales for throughput.
- Powers core Google services such as Maps, Analytics, and Gmail.
- Supports the Apache HBase API.
- Integrates with Hadoop, Cloud Dataflow, and Cloud Dataproc.
Data Model.
- Table
  - Contains multiple rows.
- Row
  - A row key plus one or more columns.
  - Rows are sorted alphabetically by row key.
  - Row key design is very important: related data should be kept close together.
  - Example: `area-code/city` if we are creating a population graph (see the write sketch after this list).
- Column
  - A column in HBase consists of a column family and a column qualifier, delimited by a `:` (colon) character.
  - Example: `contents:html`
- Column Family
  - Column families physically colocate a set of columns and their values, often for performance reasons.
  - Each row in a table has the same column families, though a given row might not store anything in a given column family.
- Column Qualifier
  - Mutable.
  - Added to a column family to provide an index within it.
  - Example: `contents:html` or `contents:pdf`
- Cell
  - The combination of `row`, `column family`, and `column qualifier`.
  - Contains a value and a timestamp, which represents the value's version.
  - Example: `"com.cnn.www" | t6 | contents:html = "<html> ... "`
- Timestamp
  - A timestamp is written alongside each value and is used for versioning.
  - By default (in HBase), the timestamp represents the time on the `RegionServer` when the data was written, but you can specify a different timestamp value when you put data into the cell.
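A sketch of writing one cell with the `google-cloud-bigtable` Python client, tying together the row-key advice and the explicit timestamp; the project, instance, table, and `stats` column-family names are placeholders, and the table is assumed to already exist with that family:

```python
import datetime

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("population")

# Row key keeps related data adjacent: area code first, then city.
row = table.direct_row(b"408#san-jose")
row.set_cell(
    "stats",                               # column family (assumed to exist)
    b"population",                         # column qualifier
    b"1030000",                            # value
    timestamp=datetime.datetime.utcnow(),  # explicit cell version
)
row.commit()
```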
Storage Model.
- Stores data in massively scalable tables, each of which is a sorted `key`/`value` map.
- A table contains `rows` and `columns`: a `row` describes a single entity, and `columns` contain individual values for each row. Related columns are grouped into a `column family`.
Conceptual View
A multi-dimensional map (this is only a mock-up):
```
{
  "com.cnn.www": {
    contents: {
      t6: contents:html: "<html>..."
      t5: contents:html: "<html>..."
      t3: contents:html: "<html>..."
    }
    anchor: {
      t9: anchor:cnnsi.com = "CNN"
      t8: anchor:my.look.ca = "CNN.com"
    }
    people: {}
  }
  "com.example.www": {
    contents: {
      t5: contents:html: "<html>..."
    }
    anchor: {}
    people: {
      t5: people:author: "John Doe"
    }
  }
}
```
The mock-up above illustrates the different components of the data:
- Row: `com.cnn.www`
- Column: `contents:html`
- Column families: `contents`, `anchor`
- Column qualifiers: `html`, `cnnsi.com`
- Cell: `com.cnn.www | t6 | contents:html: "<html>..."`
- Timestamps: `t6`, `t5`
Example: Twitter followers.
HBase Shell - Table Operations
Creating a Table.
```
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds

=> Hbase::Table - test
```
Use the `list` command to confirm your table exists:
```
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds

=> ["test"]
```

Now use the `describe` command to see details, including configuration defaults:

```
hbase(main):003:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.9998 seconds
```
To put data into your table, use the `put` command:

```
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds

hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds

hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds
```
Use the `scan` command to view the table data:

```
hbase(main):006:0> scan 'test'
ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1421762485768, value=value1
 row2                 column=cf:b, timestamp=1421762491785, value=value2
 row3                 column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
```
Get a single row with the `get` command:

```
hbase(main):007:0> get 'test', 'row1'
COLUMN                CELL
 cf:a                 timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
```
Bigtable Architecture
- Tablets are stored on Colossus, Google’s file system.
- Stored in SSTable format.
Importantly, data is never stored in Cloud Bigtable nodes themselves; each node has pointers to a set of tablets that are stored on Colossus. As a result:
- Rebalancing tablets from one node to another is very fast, because the actual data is not copied.
- Cloud Bigtable simply updates the pointers for each node.
- Recovery from the failure of a Cloud Bigtable node is very fast, because only metadata needs to be migrated to the replacement node.
- When a Cloud Bigtable node fails, no data is lost.
When to use Bigtable.
- Store >1TB of data.
- High volumes of writes.
- Low-latency reads/writes (<10 ms).
- HBase API compatibility.
Cloud Memorystore.
Memorystore for Redis provides a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access.
- In-memory data store service.
- Redis on GCP infrastructure.
- Provides high availability, automatic patching, and monitoring.
- Sub-millisecond latency.
- Instances up to 300 GB, with up to 12 Gbps network throughput.
- Existing Redis deployments can be migrated to Memorystore.
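A minimal sketch of using a Memorystore instance from a client VM in the same VPC, with the standard `redis-py` library; the host IP below is a placeholder for your instance's private IP:

```python
import redis

# Connect to the Memorystore instance (IP is a placeholder; 6379 is the Redis default port).
r = redis.Redis(host="10.0.0.3", port=6379)
r.set("greeting", "hello")
print(r.get("greeting"))  # b'hello'
```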
Help
- Cloud SQL - Private IP
- Cloud SQL - Connecting Overview
- Cloud SQL - Exporting Data
- Cloud SQL - Migration from MySQL to Cloud SQL
- Cloud Spanner - Backups
- Cloud Spanner - Replication
- Cloud Spanner - Getting Started - Python
- Firestore - Data Model
- Firestore - Quickstart
- Cloud Bigtable - Hbase
- Cloud Bigtable - Overview
- Cloud Memorystore - Overview