Wednesday, 19 September 2018

Introduction to Google Cloud Platform


  1. Introduction to Google Cloud Platform (Google Cloud Platform Meetup)

  2. Agenda • Why Google Cloud? • Infrastructure underpinning Google Cloud • Components of Google Cloud • Compute Services • Networking Services • Storage Services • Big Data • Machine Learning

  3. Why Google Cloud? “Google Cloud is underpinned by the same infrastructure and innovation that powers Google products.” “Google has scaled seven products, each of which has over a billion users. Every single day, Google handles 1.4 petabytes of information in Gmail alone, with 99.97% availability.” “We are at the beginning of what’s possible with the cloud.” - Sundar Pichai (GCP Next 16 keynote)

  4. Why Google Cloud? “Google's ability to build, organize, and operate a huge network of servers and fiber optic cables with an efficiency and speed that rocks physics on its heels. This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds.” - Wired

  5. Google’s Network Infrastructure: a global, meshed fiber backbone network interconnecting data centers, with 70+ edge points of presence in 33 countries and elements within ISP and access networks. Read more at: https://peering.google.com/#/infrastructure https://cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html http://www.wired.com/2015/06/google-reveals-secret-gear-connects-online-empire/

  6. Compute Services

  7. Compute Engine • Configurable custom machine types • Live migration • Up to 2 Gbps networking between VMs • Instance metadata and startup scripts • HTTP(S) and network load balancing • APIs for autoscaling and group management • Sub-hourly (per-minute) billing and automatic sustained use discounts • Preemptible VMs (similar to AWS Spot Instances)
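
To make the billing bullet concrete, here is a minimal Python sketch of how per-minute billing and sustained use discounts combine. It assumes the 2016-era rules: usage billed per minute with a 10-minute minimum (as in the Compute Engine feature list further down this page), and the N1 sustained use schedule in which each successive quarter of the month is charged at 100%, 80%, 60%, and 40% of the base rate. Pricing has changed since, so the function names and tier table are illustrative assumptions only.

```python
import math

# Assumed 2016-era N1 sustained use tiers: each quarter of the month
# is billed at a decreasing fraction of the base rate.
TIER_MULTIPLIERS = (1.0, 0.8, 0.6, 0.4)


def billed_minutes(runtime_seconds: float) -> int:
    """Round runtime up to whole minutes, with an assumed 10-minute minimum."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    return max(10, math.ceil(runtime_seconds / 60))


def sustained_use_multiplier(fraction_of_month: float) -> float:
    """Effective price multiplier for a VM running this fraction of the month."""
    if not 0 < fraction_of_month <= 1:
        raise ValueError("fraction_of_month must be in (0, 1]")
    remaining = fraction_of_month
    cost = 0.0
    for multiplier in TIER_MULTIPLIERS:
        portion = min(remaining, 0.25)  # each tier covers at most a quarter month
        cost += portion * multiplier
        remaining -= portion
        if remaining <= 0:
            break
    return cost / fraction_of_month
```

Under these assumptions, a VM that runs the whole month gets a multiplier of 0.7 (a 30% discount), and one that runs half the month gets 0.9.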

  8. Container Engine • Kubernetes-based container orchestration • Uses underlying Compute Engine resources • Declarative syntax for orchestrating and scheduling Docker containers • Managed logging, monitoring, and scaling
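
Container Engine's declarative model means you describe the desired state and Kubernetes works to maintain it. As a sketch, the Python function below builds a Kubernetes Deployment manifest as a plain dictionary. The app name, image path, and port are hypothetical placeholders, and apps/v1 is the current stable Deployment API rather than the beta API available when these slides were written.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 3, port: int = 8080) -> dict:
    """Build a Deployment that keeps `replicas` copies of one container running."""
    labels = {"app": name}  # the selector must match the pod template labels
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image path; replace with your own registry and project.
    print(json.dumps(deployment_manifest("hello-web", "gcr.io/my-project/hello-web:v1"), indent=2))
```

Saving the printed JSON to a file and running kubectl apply -f on it asks the cluster to converge on that state. You do not issue start or stop commands for individual containers.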

  9. App Engine • Managed runtime for Java, Go, Python, and PHP • Local SDK for development, testing, and deployment • Autoscaling based on demand • Free daily quota, usage-based billing • 60-second request timeout • Can’t write to the local filesystem • Limits on third-party software

  10. Load Balancing • HTTP(S) and network load balancing • HTTP(S) load balancing and autoscaling across Compute Engine regions • Single anycast external IP, which simplifies DNS setup • No pre-warming required; scales to 1 million+ QPS • Policy-based autoscaling of instance groups • Network load balancing for TCP and UDP traffic within a Compute Engine region • Only healthy instances handle traffic
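
The "only healthy instances handle traffic" point can be illustrated with a toy round-robin selector that skips backends failing their health checks. This is a hypothetical, single-process sketch of the idea, not how Google's load balancers are implemented. The Backend and HealthyRoundRobin names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Backend:
    name: str
    healthy: bool = True  # would be updated by periodic health checks


class HealthyRoundRobin:
    """Toy round-robin balancer that never hands out an unhealthy backend."""

    def __init__(self, backends: Iterable[Backend]) -> None:
        self.backends: List[Backend] = list(backends)
        self._index = 0

    def next_backend(self) -> Backend:
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._index]
            self._index = (self._index + 1) % len(self.backends)
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backends available")
```

With backends a, b (unhealthy), and c, successive calls return a, c, a. Once b passes its health checks again, it rejoins the rotation automatically.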

  11. Cloud DNS • Fully managed, scalable, and highly available DNS • 100% availability SLA • Programmatically manage zones and records with a RESTful API • Powered by Google's global network of anycast name servers • Managed zones for projects • Cost-effective pricing tiers

  12. Cloud Storage • Highly scalable, immutable object/blob store • Standard variant (high availability and low latency) • Durable Reduced Availability variant (lower availability at lower cost) • Nearline Storage for archiving, backup, and DR (~3 s response) • No capacity planning required • All options accessed through the same API • Can be mounted as a file system using Cloud Storage FUSE (gcsfuse)
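
Because every storage class is reached through the same API, an upload looks the same regardless of class. The sketch below builds, but does not send, a simple media upload request against the Cloud Storage JSON API using only the Python standard library. The bucket name, object name, and access token are placeholders. A token can be obtained with gcloud auth print-access-token, although in practice the official client library is usually more convenient.

```python
import urllib.parse
import urllib.request

UPLOAD_ROOT = "https://storage.googleapis.com/upload/storage/v1/b"


def media_upload_request(bucket: str, object_name: str, data: bytes, token: str,
                         content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build a single-request (uploadType=media) object upload."""
    # The object name goes in the query string, so '/' must be percent-encoded.
    query = urllib.parse.urlencode({"uploadType": "media", "name": object_name})
    url = f"{UPLOAD_ROOT}/{urllib.parse.quote(bucket, safe='')}/o?{query}"
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Bearer {token}"},
    )
```

Passing the request to urllib.request.urlopen performs the upload. Because objects are immutable, uploading to an existing name replaces the whole object rather than modifying it in place.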

  13. Cloud Datastore • NoSQL database that can scale to billions of rows • Fully managed service • Automatically handles sharding and replication • Support for ACID transactions and SQL-like queries (GQL) • Fast and highly scalable • Local development tools • Access from anywhere through a RESTful interface • Free daily quota

  14. Cloud Bigtable • Massively scalable NoSQL • For large workloads: terabytes to petabytes of data • Low latency and high throughput • Accessed using the HBase API • Native compatibility with the Hadoop ecosystem • Replicated storage • Role-based ACLs • Encryption of data in flight and at rest • Used by Google Analytics and Gmail
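
Bigtable stores rows sorted lexicographically by row key, so most schema design comes down to choosing that key. A common pattern for time series is to prefix the key with an entity ID and append a reversed timestamp, so the newest rows for each entity sort first. The sketch below assumes millisecond timestamps and a hypothetical device ID prefix.

```python
# Largest timestamp this scheme supports; keeps every reversed value at 13 digits.
MAX_TS_MILLIS = 10**13 - 1


def row_key(device_id: str, ts_millis: int) -> bytes:
    """Build a row key of the form '<device_id>#<reversed timestamp>'."""
    if not 0 <= ts_millis <= MAX_TS_MILLIS:
        raise ValueError("timestamp out of range")
    reversed_ts = MAX_TS_MILLIS - ts_millis
    # Zero-padding makes lexicographic order match numeric order.
    return f"{device_id}#{reversed_ts:013d}".encode("utf-8")
```

Sorted keys for the same device put later readings first. A prefix scan on "sensor-42#" therefore returns the most recent data without reading the whole history, while the device prefix keeps each device's rows together.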

  15. Cloud SQL • Managed MySQL • Package and pay-per-use billing plans • Second Generation Cloud SQL is currently in beta • Vertical scaling for reads and writes • Horizontal scaling for reads • Seamless integration with App Engine and Compute Engine • Data is automatically encrypted • Automatic failover for high availability

  16. Big Data Services (Fully Managed) • BigQuery: analytics data warehouse; stream data at 100,000 rows per second • Dataflow: stream and batch processing of data; unified programming model • Pub/Sub: scalable and reliable enterprise messaging middleware • Dataproc: managed Hadoop, Spark, Pig, and Hive at affordable pricing

  17. BigQuery • Fully managed, petabyte-scale analytics data warehouse • Near real-time interactive analysis of massive datasets • Columnar storage for performance • SQL-like syntax for querying • Scale storage and compute separately • Pay for the storage and compute you use • Benefit from integration points developed by partners

  18. Dataflow • Unified programming model for developing and executing scalable, reliable data pipelines • Support for ETL, analytics, real-time computation, and process orchestration • Processes data using Compute Engine instances • Open-source Java SDK for developing custom extensions • Benefit from integrations developed by GCP partners
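
Dataflow's unified model applies the same transforms to bounded (batch) and unbounded (streaming) data, typically grouping events into time windows. The pure-Python sketch below counts events per fixed window using each event's own timestamp. It illustrates the concept only and is not Dataflow code. In the Apache Beam SDK that Dataflow now runs, the equivalent step would be windowing with FixedWindows followed by a per-key count.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def fixed_window_counts(events: Iterable[Tuple[int, str]],
                        window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count (timestamp, key) events per (window_start, key) fixed window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts: Counter = Counter()
    for ts, key in events:
        # Assign by event time: the window is [window_start, window_start + size).
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)
```

With 60-second windows, events at t=0 and t=59 for key "a" land in the window starting at 0, while t=60 starts a new window. This assignment is the same whether the events come from a file or a live stream.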

  19. Dataproc • Fully managed Hadoop, Spark, Pig, and Hive • Clusters can be resized at any time, even while jobs are running • Clusters are billed minute by minute • Clusters can use preemptible instances to further reduce cost • RESTful API and integration with the Google Cloud SDK • Easy to move existing ETL pipelines without redevelopment

  20. Cloud Pub/Sub • Scalable and reliable messaging middleware • Based on proven Google technologies • Guaranteed at-least-once delivery with low latency • Supports both pull and push delivery • Fully managed and global by design, taking advantage of all GCP regions • Supports offline consumers
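
At-least-once delivery means a subscriber can occasionally receive the same message more than once, so handlers should be idempotent. The sketch below shows one approach: deduplicating on the message ID that Pub/Sub assigns to each message, which a redelivered message keeps. The class name and the process callback are hypothetical, and the handler is independent of any client library.

```python
from typing import Callable, Set


class DedupingHandler:
    """Process each message ID at most once within this handler.

    The in-memory set is a simplification: a real subscriber would persist
    seen IDs (or make the processing itself idempotent) so duplicates are
    caught across restarts and across multiple subscriber instances.
    """

    def __init__(self, process: Callable[[bytes], None]) -> None:
        self._process = process
        self._seen: Set[str] = set()

    def handle(self, message_id: str, data: bytes) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False  # redelivery: acknowledge again, but skip side effects
        self._process(data)
        self._seen.add(message_id)  # record only after processing succeeds
        return True
```

Recording the ID only after processing succeeds means a crash mid-processing leads to a retry rather than a lost message. That is the same trade-off at-least-once delivery makes.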

  21. Cloud Datalab • Interactive tool for large-scale exploratory data analysis and visualization • Based on Jupyter notebooks (IPython) • Code, documentation, results, and visualizations all in notebook format • Runs on Google App Engine • Python, SQL, and JavaScript for data analysis • Google Charts or matplotlib for visualization • Easy deployment of transformation and analysis models to BigQuery

  22. Cloud Machine Learning • Cloud Machine Learning is currently in alpha • Fully managed, large-scale machine learning platform • Integrated with Cloud Storage and BigQuery • Uses the open-source TensorFlow framework that powers Google Photos and the Cloud Speech API • Integrated with Cloud Dataflow for preprocessing • Google has built custom Tensor Processing Units (TPUs) for efficiently running machine learning workloads • http://venturebeat.com/2016/05/18/google-is-bringing-custom-tensor-processing-units-to-its-public-cloud/ • http://www.infoworld.com/article/3072569/cloud-computing/googles-cloud-strategy-becomes-clearer-with-tensorflow.html

  23. Translate API • Simple API for translating an arbitrary string into any supported language • Programmatically detect a document’s language • Support for dozens of languages • Highly scalable, high-quality translation • Client libraries for Python, Java, Go, and other languages • Usage and billing calculated per million characters • You can try it out in the APIs Explorer

  24. Prediction API • Predicts trends based on historical data • Use cases: – Categorizing emails as spam or non-spam – Product recommendations – Assessing whether posted comments have positive or negative sentiment • Data replicated using Cloud Storage • Fast and reliable (most queries take less than 200 ms) • RESTful API with client libraries for many popular languages

  25. Cloud Vision API • Image analysis based on powerful machine learning models • Classifies images into thousands of categories • Detects individual objects and faces within an image • API improves over time as new insights are added • Detects different types of inappropriate content • Analyzes emotional facial attributes • Optical Character Recognition (OCR) to detect text, with automatic language identification
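
Each capability on this slide corresponds to a feature type in a single annotate request. The sketch below builds the JSON body for the Vision API's v1 images:annotate REST method, requesting labels, faces, inappropriate-content (SafeSearch) detection, and OCR in one call. The helper name and default feature list are illustrative assumptions. Authentication (an API key or OAuth token) is left out.

```python
import base64

# REST endpoint for the Cloud Vision API (v1); the body below is POSTed here.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

# Feature types matching the capabilities listed on the slide.
DEFAULT_FEATURES = ("LABEL_DETECTION", "FACE_DETECTION", "SAFE_SEARCH_DETECTION", "TEXT_DETECTION")


def annotate_body(image_bytes: bytes, features=DEFAULT_FEATURES, max_results: int = 10) -> dict:
    """Build the JSON body for a single-image images:annotate request."""
    return {
        "requests": [
            {
                # Inline images are sent base64-encoded.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": f, "maxResults": max_results} for f in features],
            }
        ]
    }
```

Serializing the result with json.dumps and POSTing it to VISION_ENDPOINT returns one response per entry in "requests". Batching several images in one call just means appending more entries.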

  26. Cloud Speech API • Currently in alpha • Audio-to-text conversion powered by neural network models • Recognizes over 80 languages and variants • Ability to filter inappropriate content • Returns partial results in real time as they become available • Built-in noise handling for a variety of environments • API improves over time as new insights are added

  27. What’s Next? GCP Blog: https://cloudplatform.googleblog.com/ GCP Docs: https://cloud.google.com/docs/

Sunday, 9 September 2018

Google Cloud Platform Training in Hyderabad – Google Cloud Platform



Google Cloud Engine Hyderabad – Google Cloud Platform

Google Compute Engine lets users run large-scale workloads on virtual machines hosted on Google’s infrastructure. It is part of Google Cloud Platform.
 Google Compute Engine features:
  • High-performance virtual machines
  • Minute-level billing (10-minute minimum)
  • Fast VM provisioning
  • Persistent block storage (SSD and standard)
  • Native Load Balancing

What You Will Learn

Scale and develop your applications with Google App Engine’s runtime environment
Get to grips with the request-handling mechanism and write request handlers
Take a deep dive into Google’s distributed, highly scalable NoSQL datastore and design your application around it
Implement powerful search with the scalable datastore
Perform long-running tasks in the background using task queues
Write compartmentalized apps using multitenancy, memcache, and other Google App Engine runtime services
Handle web requests using the CGI, WSGI, and multi-threaded configurations
Deploy, tweak, and manage apps in production on Google App Engine
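
Several of these topics revolve around request handlers. App Engine's Python runtimes serve WSGI applications, so a request handler is ultimately a callable like the one below. This is a framework-free sketch: the greeting logic and the name query parameter are hypothetical. Real App Engine apps usually put a framework such as webapp2 or Flask on top of WSGI.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """Minimal WSGI handler that greets the caller by the ?name= query parameter."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"Hello, {name}!".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve_locally(port: int = 8080) -> None:
    """Run the handler with the standard library's reference server (local testing only)."""
    with make_server("", port, application) as server:
        server.serve_forever()
```

Calling serve_locally() and visiting http://localhost:8080/?name=GCP returns "Hello, GCP!". On App Engine, the runtime supplies the server and you deploy only the application callable.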

Course Outline

1. Getting Started
Creating a Compute Engine Project
Enabling Billing
Adding Team Members
Compute Engine Resources
Manage Compute Engine Resources
2. Instances
Creating an Instance Using the Developers Console
Accessing an Instance Using the Developers Console
Deleting an Instance Using the Developers Console
Creating an Instance Using gcloud
Instance Attributes
Accessing an Instance Using gcloud
Deleting an Instance Using gcloud
Creating an Instance Programmatically
Creating an Instance Using a Service Account
Selecting an Access Mode
Cleaning Up
3. Storage: Persistent Disk
Compute Engine Storage Options at a Glance
Persistent Disk
Persistent Disk Performance
Create a Persistent Disk Using Developers Console
Create a Persistent Disk Using gcloud
Attaching/Detaching a PD to/from a Running VM
Create a Persistent Disk Programmatically
Persistent Disk Snapshots
4. Storage: Cloud Storage
Understanding BLOB Storage
Getting Started
Introducing gsutil
Using Cloud Storage from Your Code
Configuring Access Control
Understanding ACLs
Using Default Object ACLs
Understanding Object Immutability
Understanding Strong Consistency
5. Storage: Cloud SQL and Cloud Datastore
Cloud SQL
Getting Started
Creating Databases and Tables
Running SQL Queries
Cloud Datastore
Getting Started
Creating and Viewing Entities via the Developers Console
Creating and Retrieving Entities Programmatically from a VM
Bring Your Own Database
6. Networking
A Short Networking Primer
Network Addresses and Routing
Transmission Control Protocol (TCP)
The Domain Name System (DNS)
Hypertext Transfer Protocol (HTTP)
Load Balancing
Firewalls
Default Networking
Configuring Firewall Rules
Configuring Load Balancing
Reserving External IP Addresses
Configuring Networks
Understanding Networking Costs
Understanding Routing
Selecting an Access Mode
7. Advanced Topics
Startup Scripts
gcloud compute
Literal-Value Approach
Local-File Approach
Cloud-Storage Approach
Publicly Available Approach
API Approach
Custom Images
Creating a Custom Image
Using a Custom Image
Metadata
Metadata Server
Metadata Entries
Project Metadata
Instance Metadata
Data Formats
Default Versus Custom
Project-Level Custom Metadata
Instance-Level Custom Metadata
wait_for_change URL parameter
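
The metadata topics above can be tried from inside any Compute Engine VM. The sketch below uses only the Python standard library to read a value from the metadata server, optionally passing wait_for_change so the request blocks until the value changes. The paths (project/project-id, instance/attributes/...) follow the documented metadata layout. The helper names are hypothetical, and the network call succeeds only when run on a VM.

```python
import urllib.parse
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def metadata_request(path: str, wait_for_change: bool = False) -> urllib.request.Request:
    """Build a request for a metadata path such as 'instance/attributes/startup-script'."""
    url = f"{METADATA_ROOT}/{path.lstrip('/')}"
    if wait_for_change:
        # Hanging GET: the server responds only when the value changes.
        url += "?" + urllib.parse.urlencode({"wait_for_change": "true"})
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def read_metadata(path: str, wait_for_change: bool = False, timeout: float = 300.0) -> str:
    """Fetch a metadata value; works only from inside a Compute Engine VM."""
    req = metadata_request(path, wait_for_change)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, read_metadata("project/project-id") returns the project ID. A loop around read_metadata("instance/attributes/my-key", wait_for_change=True), where my-key is a hypothetical custom attribute, lets a VM react when that attribute is updated.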

Google Cloud Certified Professional Data Engineer Certification Preparation Guide by Kiran Vasadi




I’m Kiran Kumar Vasadi, a Big Data analyst working with Google Cloud Platform, BigQuery, Azure HDInsight, Hadoop (HDFS, Sqoop, Hive, HBase, Spark), Tableau, Talend, Microsoft Business Intelligence (SSIS, SSRS), Power BI, and IBM Cognos.





Here I am sharing my Google Cloud Certified Professional Data Engineer preparation guide.
Let’s dive right in; here is the preparation I followed:

Data Engineer
Goal: Assess trade-offs in processing, analyzing, and storing data on Google Cloud Platform.

– Focus on when to use Dataflow vs. Dataproc
– Focus on different data retention strategies and their trade-offs
– Focus on big data processing workflows
– Focus on the use cases for Bigtable and BigQuery and their trade-offs
– Some case studies for choosing data sources
My feedback on the exam:
  • Check the scope of this exam; be prepared for design questions on database models, optimization, and troubleshooting
  • Know BigQuery vs. Bigtable vs. Datastore vs. Cloud SQL
  • Know Dataflow and how it handles batch and stream processing
  • Read as much as you can and play with machine learning!
  • How to share datasets, queries, and reports comes up often; don’t underestimate security aspects
  • Understand the Hadoop ecosystem and learn about the typical big data lifecycle on GCP
Final Thoughts
The exam was great, but it does not beat real-world experience with the platform. I felt like I was back in high school studying. Good general test-taking and study practices were the key. These exams are a great starting point for your journey to the cloud. If you’d like to learn more, feel free to drop me a line on any of my social media or email channels.


Good luck to everyone taking this exam!