
OCI Data Science 2025 Professional Certification – Module 2 Summary

This document summarizes Module 2: Introduction and Configuration for the OCI Data
Science Professional Certification 2025. It provides friendly explanations, key concepts, and
likely exam topics for each lesson, followed by a one-page reference table.

Lesson 1 – Introduction to OCI Data Science Configuration


This lesson introduces the structure and purpose of configuring Oracle Cloud Infrastructure
(OCI) for Data Science. You learn how tenancy, compartments, groups, policies, and
networking form the foundation for secure and organized Data Science workloads. Key
takeaway: configuration ensures proper access control, scalability, and integration across
OCI services.

🧠 1. Overview: What You’re Learning

This lesson introduces you to:

 The history and evolution of data science.

 The Oracle Cloud Infrastructure (OCI) Data Science service and how it fits into
Oracle’s AI ecosystem.

 Key concepts, terminology, and components you’ll use in real OCI projects.

It sets the stage for later lessons on configuration, model building, deployment, and
automation.

📜 2. The Historical Background (Fun but Important)

Oracle starts with history to show how ideas of simplicity, data, and learning evolved:

Thinker / Era | Concept | Modern Data Science Connection
William Ockham (1300s) | Ockham's Razor — prefer simple explanations | In ML, we aim for the simplest model that fits the data (avoid overfitting).
Tobias Mayer (1700s) | More data improves accuracy | Data quantity and quality are key to model performance.
Arthur Samuel (1952) | Coined the term machine learning | ML systems improve automatically through experience.
John Tukey (1962) | Predicted data analysis as a science | Early vision of "data-driven decision making."
IBM Deep Blue (1997) | Beat Garry Kasparov using computation | Demonstrated the power of algorithms + compute.
DJ Patil & Jeff Hammerbacher (2008) | Popularized the term data science | Unified ML, stats, and engineering under one role.

👉 Exam tip: You don’t need to memorize dates, but understand how simplicity, data, and
learning from experience form the foundation of data science.

🌍 3. Modern Data Science Use Case — Employee Attrition

Oracle uses a business example: predicting employee attrition (turnover).


Why? Because it’s a real, relatable business problem that combines:

 Data collection (HR, satisfaction, salaries),

 ML modeling (classification/prediction),

 Insights for business (retain talent).

👉 Exam tip: Expect scenario questions like:

“Which OCI service would a data scientist use to build an attrition prediction model end-to-
end?”

Answer: OCI Data Science.

🧩 4. Oracle’s AI and ML Ecosystem — How It Fits Together

Oracle’s approach is layered. Picture three levels:

1️⃣ Data layer (foundation):


All data types — structured (databases), unstructured (text, images, IoT, social media).

2️⃣ Middle layer (services):

 Machine Learning Services:


Used by data scientists to build, train, deploy, and manage custom models.
→ Tools: OCI Data Science, Oracle Database ML, Data Labeling.

 AI Services:
Pre-built models you can use via API (vision, language, speech, anomaly detection).
→ No need to build from scratch.
3️⃣ Top layer (applications):
Business apps, analytics, and custom systems that consume AI results.

👉 Exam tip: Be able to explain the difference between AI services and Machine Learning
services.

Feature | AI Services | ML Services
Purpose | Ready-to-use ML models | Build custom models
Who uses | App developers, business users | Data scientists
Customization | Minimal (train with your data) | Full control
Example | OCI Language, OCI Vision | OCI Data Science

⚙️5. OCI Data Science — The Core Service

This is the heart of the certification.


OCI Data Science helps data scientists through the entire ML lifecycle:

Build → Train → Deploy → Manage

Supports:

 Python and open-source tools

 JupyterLab notebooks

 Compute flexibility (CPU/GPU)

 Full integration with OCI services (security, storage, etc.)

🔑 The 3 Core Principles

Principle | What It Means | Example / Benefit
1. Accelerate | Simplify setup; access big compute easily | Run Jupyter notebooks on cloud GPUs without managing servers
2. Collaborate | Teams share assets, ensure reproducibility | Shared projects, shared model catalog
3. Enterprise-grade | Security, governance, automation | OCI handles maintenance, access control, auditing
🧰 6. Key Terminology You Must Know (Exam Critical)

Term | What It Means | Exam Example
Project | A container/workspace for organizing all assets (notebooks, models, jobs). | "Where do you organize your team's data science work?"
Notebook Session | Managed JupyterLab environment with compute and storage. | "Which OCI component provides a managed Jupyter environment?"
Conda | Environment and package manager for Python libraries. | "What tool manages dependencies in OCI Data Science notebooks?"
ADS SDK (Accelerated Data Science) | Oracle's Python library that automates ML steps (data prep, AutoML, model explainability, etc.). | "Which SDK helps automate the ML workflow in OCI Data Science?"
Model Catalog | Centralized repository for storing, sharing, and managing models. | "Where are models stored and versioned in OCI Data Science?"
Model Deployment | Exposes a model as an HTTP endpoint for real-time predictions. | "How do you operationalize a trained model?"
Data Science Job | Automated, repeatable ML task (training, evaluation, batch scoring). | "Which feature runs scheduled ML tasks on managed infrastructure?"

🧩 7. Ways to Access OCI Data Science

Access Method | Description | Example
OCI Console | Browser interface (most common) | Click and create projects, notebook sessions
REST API | Full programmatic control | Automate deployment pipelines
SDKs | Code-based access (Python, Java, etc.) | Python SDK for model deployment
CLI | Command-line control | Fast scripting or automation
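
For instance, the SDK route might look like the following minimal sketch using the OCI Python SDK (the compartment OCID is a placeholder; inside a notebook session you could swap the config file for a resource principal signer):

import oci

# Load credentials from the local config file
config = oci.config.from_file("~/.oci/config", "DEFAULT")
ds_client = oci.data_science.DataScienceClient(config)

# List Data Science projects in a compartment
projects = ds_client.list_projects(compartment_id="ocid1.compartment.oc1..example")
for project in projects.data:
    print(project.display_name, project.lifecycle_state)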


🌐 8. OCI Regions

 Regions = physical data centers where OCI services run.

 OCI Data Science is available across global, government, and dedicated regions.

 Regions are expanding — always check oracle.com/cloud for updates.

👉 Exam tip: You might get a question like:

“Where does OCI Data Science run?”


✅ Answer: In OCI regions (commercial, government, or dedicated).

🎯 9. Key Takeaways for Certification

✅ Know the purpose of OCI Data Science: to manage the end-to-end ML lifecycle using
Python and open-source tools.
✅ Understand the difference between AI services vs ML services.
✅ Memorize key components: Projects, Notebook Sessions, Model Catalog, Model
Deployment, Data Science Jobs.
✅ Understand collaboration and governance features (enterprise-grade security, team
sharing).
✅ Recognize integration points: ADS SDK, Conda environments, REST APIs, SDKs, and CLI.
✅ Be familiar with the general Oracle AI architecture (Data → ML/AI Services →
Applications).

🧩 Example Certification-Style Questions

1. Which OCI service is primarily used by data scientists to build and deploy
custom ML models?
→ OCI Data Science.

2. What is the purpose of the Model Catalog in OCI Data Science?


→ To store, manage, and share trained models with metadata and versioning.

3. What is the difference between AI Services and Machine Learning Services in OCI?
→ AI Services = pre-built models; ML Services = build your own models.

4. Which environment does OCI Data Science use for interactive coding?
→ JupyterLab notebooks.
5. What is the role of the ADS SDK?
→ To simplify and automate data science workflows like data prep, AutoML, and
model deployment.

6. What’s one key advantage of using OCI Data Science vs local laptop notebooks?
→ Access to scalable, managed compute and integrated OCI security.

Lesson 2 – Tenancy Configuration Basics


This lesson explains how OCI resources are organized within a tenancy. Key components include:
- Compartments: logical containers for resources.
- User Groups and Dynamic Groups: manage access for users and resources.
- Policies: define permissions with syntax like 'Allow group <name> to manage data-science-family in compartment <name>'.

You also learn about the verbs (inspect, read, use, manage) that define access levels, and the required policies for Data Science, including metrics, logs, and network access.

🧠 1. What is ADS SDK?

ADS (Accelerated Data Science) SDK is Oracle’s Python library that helps automate and
simplify the entire machine learning lifecycle on OCI.

Think of ADS as your “data science assistant” inside OCI Data Science.

It’s built by data scientists for data scientists, with two main goals:

1. Integrate seamlessly with other OCI services.

2. Accelerate every step — from data connection → visualization → model training → deployment → explainability.

⚙️2. The Two Versions of ADS

Version | Description | Where You Use It
Public (Open Source) | Available on GitHub and PyPI — anyone can install it. | Your laptop, local JupyterLab, or any non-OCI environment.
OCI Integrated Version | Installed automatically inside OCI Data Science environments. Includes AutoML and ML Explainability. | Within OCI Data Science Notebooks.
👉 Exam tip: AutoML and Explainability are only available in the OCI version, not the public
one.

🧩 3. Key Goals of ADS SDK

ADS SDK is meant to:

 Integrate OCI services securely (Vault, Autonomous Database, Big Data Service).

 Simplify data science tasks like data exploration, feature engineering, training, and
deployment.

 Automate repetitive work with a few lines of Python.

Example:

Instead of manually connecting to databases or tuning parameters, you can do it with ADS classes like ADSTuner, DatasetFactory, or ADSModel.

🔑 4. Key ADS SDK Features and What They Do

Stage | What ADS Does | Key Classes / Features
1. Connect to Data | Securely access data from anywhere (OCI, DBs, S3, etc.) | DatasetFactory, SecretKeeper, OCIAuth, OCI Object Storage, Vault
2. Data Visualization & Exploration | Understand data through plots, correlations, summaries | Smart Plotting, Feature Types
3. Feature Engineering | Suggest or perform data transformations automatically | ADSDataset, Feature Type System
4. Model Training | Automate model creation, tuning, and evaluation | AutoML, ADSTuner, ADSModel
5. Model Evaluation | Compare models and view metrics easily | ADSEvaluator
6. Model Interpretability | Explain what your model learned (local/global explainability) | Explainability Module (PDP, ALE, What-if)
7. Model Deployment | Push models to production in the OCI Model Catalog | ADSModel, ModelArtifact, ModelDeployment
🧮 5. Connecting to Data Sources (Know These!)

ADS simplifies connecting to many types of data securely and efficiently.

Data Source Type | How ADS Connects | Notes
Local Storage | Uses block storage inside the notebook session | Fastest for small datasets
OCI Object Storage | Access via the oci:// protocol directly in pandas (pd.read_csv('oci://bucket/file.csv')) | Uses fsspec to act like local files
Oracle Databases / Autonomous DB (ADB) | Secure connection using SecretKeeper and credentials stored in OCI Vault | Keeps credentials safe
OCI Big Data Service (HDFS) | Connect directly to Hadoop-based data without copying | —
3rd Party Clouds | S3, Google Cloud Storage, Azure Blob/Data Lake, Dropbox | Requires simple configuration
Web Data | Read files directly from HTTP/HTTPS into pandas | Useful for public datasets
NoSQL Databases | Use DatasetFactory to connect and query | Works for non-relational sources

👉 Exam tip: You may get a question like:

“Which ADS component securely manages database credentials?”


✅ SecretKeeper (stores secrets in OCI Vault).
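
As a concrete illustration of the oci:// pattern, here is a rough sketch (the bucket, namespace, and file names are placeholders, and the exact storage_options wiring can vary by ADS/ocifs version):

import pandas as pd
import ads
from ads.common.auth import default_signer

# Inside a notebook session, authenticate as the resource itself
ads.set_auth(auth="resource_principal")

# fsspec/ocifs lets pandas treat Object Storage paths like local files
df = pd.read_csv(
    "oci://my-bucket@my-namespace/hr/attrition.csv",
    storage_options=default_signer(),
)
print(df.head())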

📊 6. Data Visualization & Feature Types

ADS includes smart plotting and feature type classes that:

 Automatically pick the right chart (bar, scatter, histogram, etc.).

 Provide summary statistics and correlation heatmaps.

 Allow reusable visualizations across projects.

👉 Example: If you define “Age” as a numerical feature type, ADS will always visualize it with
histograms or box plots.
This standardization improves consistency and saves time.

🧬 7. Feature Engineering (Where Models Get Smarter)

ADS can automatically analyze your dataset and suggest transformations to improve
model performance:

 Handle categorical encoding

 Manage missing values / imputation

 Suggest new derived features

All of this is done with the ADSDataset class, which wraps around a pandas dataframe.

“Feature engineering is often the difference between a good model and a great model.”
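
A minimal sketch of that workflow, assuming the classic DatasetFactory interface and a placeholder CSV with an "Attrition" target column:

from ads.dataset.factory import DatasetFactory

# Open a dataset and declare the prediction target
ds = DatasetFactory.open("employee_attrition.csv", target="Attrition")

# Ask ADS to suggest transformations (imputation, encoding, derived features)...
ds.suggest_recommendations()

# ...or apply its recommended transformations in one call
transformed_ds = ds.auto_transform()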

🤖 8. Model Training and Optimization

ADS helps you train models automatically or manually.

Two main tools:

Tool Purpose

AutoML (Oracle Labs) Automatically builds and compares models — picks best one.

ADSTuner Performs hyperparameter optimization manually or automatically.

Once trained, ADS can:

 Package models as artifacts,

 Save them in the Model Catalog, and

 Push them into production.

👉 Exam tip: “Which ADS component performs hyperparameter optimization?”


✅ ADSTuner
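
A small sketch of how ADSTuner is typically driven (assuming the ads.hpo interface; the estimator and trial budget here are arbitrary choices for illustration):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from ads.hpo.search_cv import ADSTuner
from ads.hpo.stopping_criterion import NTrials

X, y = load_iris(return_X_y=True)

# ADSTuner picks a default search space for the estimator and runs 10 trials
tuner = ADSTuner(LogisticRegression(max_iter=1000), cv=3)
tuner.tune(X, y, exit_criterion=[NTrials(10)], synchronous=True)

print(tuner.best_params, tuner.best_score)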

📈 9. Model Evaluation

After training, you use ADSEvaluator to:

 Compare different models side-by-side,


 Automatically pick proper metrics for classification or regression,

 Generate charts and performance summaries automatically.

Example:

 For binary classification: AUC, accuracy, confusion matrix.

 For regression: RMSE, MAE, R².

No need to manually plot each — ADS does it for you.
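
A hedged sketch of comparing two classifiers with ADSEvaluator (assuming the ADSModel and ADSData wrappers from the ads.common and ads.evaluations modules):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from ads.common.model import ADSModel
from ads.common.data import ADSData
from ads.evaluations.evaluator import ADSEvaluator

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = ADSModel.from_estimator(LogisticRegression(max_iter=5000).fit(X_train, y_train))
rf = ADSModel.from_estimator(RandomForestClassifier().fit(X_train, y_train))

# Side-by-side metrics and charts appropriate for binary classification
evaluator = ADSEvaluator(ADSData(X_test, y_test), models=[lr, rf])
evaluator.show_in_notebook()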

🔍 10. Model Interpretability & Explainability

One of the most powerful parts of ADS!

ADS provides both:

Type | Description | Example
Local Explainability | Understand why the model made a specific prediction | "Why did this employee get predicted to leave?"
Global Explainability | Understand overall model behavior and feature importance | "Which features influence attrition the most?"

Techniques used:

 PDP (Partial Dependence Plots) – show how features affect predictions.

 ALE (Accumulated Local Effects) – similar to PDP, but more accurate for
correlated features.

 What-If Scenarios – modify inputs and observe how predictions change.

👉 Exam tip: Be able to describe local vs global explainability and mention tools like PDP
and ALE.
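
ADS's explainability module only ships in the OCI environment, so as a generic illustration of the PDP idea here is a scikit-learn sketch on a toy dataset (this is not the ADS API):

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Show how the predicted probability changes as one feature varies,
# averaged over the rest of the data (a global explanation)
PartialDependenceDisplay.from_estimator(clf, X, features=["mean radius"])
plt.show()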

🚀 11. Model Deployment

Once your model is trained and saved in the Model Catalog, ADS makes it easy to deploy.

ADS Model Framework lets you:

 Deploy models from any library (scikit-learn, TensorFlow, PyTorch, XGBoost, AutoML).

 Deploy even generic models.

 Automatically create an HTTP endpoint in OCI for real-time predictions.

Integration with OCI Logging:

 Records prediction logs and access logs for monitoring and debugging.

👉 Exam tip:

“How does ADS help operationalize models?”


✅ It provides deployment classes that publish models as HTTP endpoints integrated with
OCI logging.
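
A rough end-to-end sketch of that flow with the GenericModel class (assuming it runs inside a notebook session with resource principal auth, and that the conda environment slug below is one published in your tenancy):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from ads.model.generic_model import GenericModel

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Wrap the trained estimator and stage its artifact locally
model = GenericModel(estimator=clf, artifact_dir="./model_artifact")
model.prepare(inference_conda_env="generalml_p38_cpu_v1", force_overwrite=True)

model.save(display_name="iris-demo")        # registers the artifact in the Model Catalog
model.deploy(display_name="iris-endpoint")  # publishes an HTTP endpoint
print(model.predict(X[:2].tolist()))        # invoke the deployed endpoint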

🧩 12. Summary — What You Should Remember

Concept | Why It Matters
ADS SDK | Oracle's Python library for automating data science in OCI
Two Versions | Public (basic) vs OCI (with AutoML + Explainability)
SecretKeeper | Securely stores database credentials in OCI Vault
ADSTuner | Hyperparameter optimization
ADSDataset | Simplifies data cleaning and feature engineering
ADSEvaluator | Compares and evaluates multiple models
Explainability Tools | Local & global explanations with PDP/ALE
ADSModel Framework | Easily deploys models into OCI production
Integration | Works seamlessly with OCI Object Storage, DBs, Big Data, SDKs

🧠 13. Example Certification-Style Questions

1. What is the purpose of the ADS SDK?


→ To simplify and automate the end-to-end machine learning lifecycle on OCI.

2. Which component allows secure storage and access to database credentials?


→ SecretKeeper (integrated with OCI Vault).

3. What are two key features available only in the OCI version of ADS SDK?
→ AutoML and ML Explainability.
4. Which class helps perform hyperparameter tuning?
→ ADSTuner.

5. What does the ADSDataset class do?


→ Wraps a pandas DataFrame and provides feature engineering tools and
recommendations.

6. How does ADS support model interpretability?


→ Through local/global explainability tools like PDPs, ALE plots, and what-if testing.

7. Which component allows deploying trained models into OCI production environments?
→ ADSModel Framework.

8. How can you connect pandas directly to OCI Object Storage?


→ Using the oci:// protocol with fsspec integration.

Lesson 3 – Configure Tenancy with OCI Resource Manager


Instead of manually creating compartments, groups, and policies, you can use the OCI
Resource Manager (ORM) Data Science template. The ORM stack automates configuration
by creating user and dynamic groups, matching rules, and policies. Steps: create a stack →
select the Data Science template → select your compartment → run the stack. You can also
access Terraform scripts from Oracle’s GitHub for customization.

Friendly summary — Tenancy & configuration (Lesson 3)

Nice — this lesson covers the tenancy basics you must know for OCI Data Science:
compartments, user groups, dynamic groups, and policies. Below I’ll explain each
concept in plain language, point out the exam-important bits, give concrete examples
(policy syntax and dynamic group rules), and finish with likely certification questions and
short answers.

1) High-level idea (one-liner)

Organize resources with compartments, group people and resources with user groups
and dynamic groups, then grant permissions by writing policies that tie groups →
actions (verbs) → resource types → compartments.

2) Concepts explained for a student

Compartments

 Logical folders for OCI resources (like projects or workspaces).


 Used to isolate and control access: policies target groups in a compartment.

 Plan first: decide how you’ll split resources (environments, teams, projects) before
creating compartments.

Practical exam tip: Know that access is granted at the group + compartment level.

User groups

 Collections of human user accounts.

 Admins create users, create groups, then add users to groups.

 Policies reference groups to grant human users permissions.

Exam tip: “Create users → create groups → add users to groups.”

Dynamic groups

 Special groups, but membership is resources (not humans).

 Membership is determined by matching rules (attributes of resources, e.g., resource type + compartment).

 Useful to let notebook sessions, job runs, or deployments act as principals and call
other OCI services (for example, let a notebook access object storage).

Key idea: Dynamic groups let resources act like users — you write policies that grant those
resources permissions.

Example matching rule (pseudo):

ALL {resource.type = 'datasciencenotebooksession', resource.compartment.id = 'ocid1.compartment.oc1..aaaaaaa...'}

(In practice, use the exact rule syntax Oracle documents for Data Science resource types.)

Policies (the enforcement rules)

 Syntax pattern:

Allow <group> to <verb> <resource-type> in compartment <compartment-name>

 Components:

o Group: user group or dynamic group name.

o Verb: access level (inspect, read, use, manage).

o Resource type: a single resource type (data-science-job) or a family (data-science-family).

o Compartment: the target compartment.

Verb hierarchy (least → most permissive):

 inspect — list/see resources, minimal metadata

 read — inspect + get resource and metadata

 use — read + operate on the resource (not create/delete usually)

 manage — full permissions incl. create/delete

3) Practical policy examples (copy/learn these forms)

A. Allow data scientists to manage all Data Science resources in a compartment

Allow group DataScienceUsers to manage data-science-family in compartment DataScience-Compartment

B. Allow Data Science notebook sessions (dynamic group) to manage data science resources

Allow dynamic-group DS-Resources to manage data-science-family in compartment DataScience-Compartment

C. Metrics & logging access (common required statements)

Allow group DataScienceUsers to read metrics in compartment DataScience-Compartment
Allow dynamic-group DS-Resources to use log-content in compartment DataScience-Compartment
Allow group DataScienceUsers to manage log-groups in compartment DataScience-Compartment
Allow group DataScienceUsers to use log-content in compartment DataScience-Compartment

D. Virtual networking (if using custom VCNs)

Allow service data-science to use virtual-network-family in compartment DataScience-Compartment
Allow group DataScienceUsers to use virtual-network-family in compartment DataScience-Compartment
Allow dynamic-group DS-Resources to use virtual-network-family in compartment DataScience-Compartment

E. Useful: grant Object Storage access

Allow group DataScienceUsers to manage object-family in compartment DataScience-Compartment
Allow dynamic-group DS-Resources to manage object-family in compartment DataScience-Compartment

Small tip: use aggregate resource families (like data-science-family) to simplify policies
instead of listing many resource types.

4) Typical workflow to configure tenancy for Data Science

1. Plan compartment structure (team/project/env boundaries).

2. Create compartment(s) in Identity console. Grab the OCID if needed.

3. Create users, then create user groups, and add users to groups.

4. Create dynamic group(s) with matching rules for notebook sessions, job runs,
model deployments (scoped to the compartment).

5. Create policies: required policies (data-science manage, metrics/logging), plus auxiliary policies (object storage, networking) as needed.

6. Test: launch a notebook session and verify it can access required services per the
policies.

5) Exam-focused key points to memorize

 Access is group + compartment based.

 Dynamic groups contain resources, not humans. They use matching rules.

 Policy syntax: Allow <group> to <verb> <resource> in compartment <name>

 Verb meanings (inspect / read / use / manage) — know differences.


 Required policies for Data Science: allow the user group to manage data-science-family; allow the dynamic group to manage data-science-family; plus metrics- and logging-related allow statements.

 Useful extra policies: object storage (object-family), virtual network (virtual-network-family) if using custom networking.

6) Likely certification questions (with model answers)

1. Q: What is a dynamic group in OCI and when would you use it?
A: A dynamic group is a group whose members are resource principals (cloud
resources) determined by matching rules. Use it to grant permissions to resources
(e.g., notebook sessions) so they can call other OCI services securely.

2. Q: Write the policy statement to allow the DataScienceTeam group to fully manage
Data Science resources in the Prod-DS compartment.
A: Allow group DataScienceTeam to manage data-science-family in compartment
Prod-DS

3. Q: Which policy verb would you use to let a user list resources but not view or
change them?
A: inspect

4. Q: You created a notebook session in compartment DS-Work. Which construct lets that notebook act as a principal to read from object storage?
A: Create a dynamic group with a matching rule for notebook sessions in DS-Work, then write a policy that allows that dynamic group to read/use/manage object-family in the compartment.

5. Q: What are two required policies for Data Science resources to access metrics and
logs?
A: Examples: Allow group <users> to read metrics in compartment <compartment>
and Allow dynamic-group <dynGroup> to use log-content in compartment
<compartment> (plus manage log-groups/use log-content for user group).

6. Q: Why plan compartments before creating them?


A: Because compartments are logical isolation boundaries used in policies; good
planning prevents complex cross-compartment access and simplifies governance.

7) Quick checklist you can memorize

 Plan compartments.

 Create users → create groups → add users to groups.


 Create dynamic groups with matching rules for notebook sessions, job runs, and model deployments.

 Add required policies: data-science manage (user group + dynamic group), metrics/logging access.

 Add useful policies: object storage, virtual network family (if needed).

 Test resource access.

Friendly Summary — Lesson 4: Configure Tenancy with OCI Resource Manager (ORM)

This lesson builds directly on Lesson 3 — you already know how to configure
compartments, user groups, dynamic groups, and policies manually.
Now you’ll see a faster, automated way: using Oracle Resource Manager (ORM) with the
Data Science Service template.

Let’s go through it step by step — clear, simple, and focused on what’s exam-relevant.

🧱 Big Picture

Instead of creating all the tenancy setup by hand (user group, dynamic group, policies), you
can let Oracle Resource Manager do it automatically using a predefined Data Science
Service template.

Think of ORM as Oracle’s version of a “click-and-deploy Terraform automation” inside OCI.

🧱 Key Concepts

1️⃣ Oracle Resource Manager (ORM)

 A managed Terraform service in OCI.

 Lets you automate provisioning of resources through stacks (Terraform configuration bundles).

 Each “stack” = configuration + variables + jobs that run “plan/apply/destroy.”

✅ Why it matters:
You can configure your Data Science tenancy (groups, policies, dynamic groups)
automatically and consistently — less manual setup, fewer typos, faster onboarding.

2️⃣ Data Science Service Template


 A pre-built stack available in the OCI console under the “Service → Data Science”
section.

 When you run it, it automatically creates:

o A user group — with a name you specify.

o A dynamic group — with matching rules for:

 datasciencenotebooksession

 datasciencemodeldeployment

 datasciencejobrun

o A policy — with the key required statements for Data Science.

🔐 The Policy Includes:

1. Allow your user group to manage data-science-family in the compartment.

2. Allow the dynamic group to manage data-science-family in the compartment.

3. Allow your user group to read metrics in the compartment.

4. Allow the dynamic group to use log content in the compartment.

✅ Essentially, it automates all the “required policy setup” from Lesson 3.

3️⃣ Steps to Run the ORM Stack

Here’s the friendly “cookbook” process:

1. Open OCI Console → navigate to Resource Manager → Stacks.

2. Click Create Stack.

3. Under Template as Origin, select Service → Data Science → Select Template.

4. Pick your compartment where you want the Data Science setup.

5. (Optional) Fill in variables — group names, etc.

6. Choose to run “Apply” immediately so it deploys right away.

7. Wait for the job to finish — this provisions your user group, dynamic group, and
policies.
8. Finally, add your users to the created user group manually (since it just creates the
empty group).

✅ Done! Your tenancy is now configured automatically for OCI Data Science.

4️⃣ Terraform Option (for customization)

 The same configuration is available as Terraform scripts in a public GitHub repository (official Oracle sample).

 You can:

o Clone/edit the script,

o Modify names, add extra policies,

o Run via ORM or your own Terraform CLI.

This is useful if you want repeatable automation across multiple environments (e.g., dev,
test, prod).

🧩 Why Use ORM Instead of Manual Setup?

Manual Configuration | ORM Template
Many console steps (create groups, write policies) | One stack run
Risk of missing policies | Fully tested Oracle defaults
Time-consuming for large orgs | Repeatable, version-controlled
Useful for learning internals | Best for production consistency

✅ Exam tip: Know that both methods achieve the same result — ORM just automates the
manual tenancy setup.

🎓 What to Remember for Certification

 ORM = managed Terraform in OCI.

 Data Science Service Template automatically creates:

o One user group (you name it),

o One dynamic group (you name it, with notebook, job, and deployment rules),

o One policy (with 4 key statements).

 You still need to add users to the group manually afterward.

 Template available only in the OCI Console (not via SDK or CLI directly).

 Alternative: same setup via Terraform from GitHub repo.

 Stack workflow: Create → Select Template → Choose Compartment → Run Apply → Add Users.

🧠 Likely Certification Questions

1. Q: What is Oracle Resource Manager (ORM)?


A: It’s a managed Terraform service in OCI that lets you automate provisioning of
resources through stacks.

2. Q: Which resources does the Data Science Service Template automatically create?
A: A user group, a dynamic group (with matching rules for notebook sessions, model
deployments, and job runs), and a policy with key permissions for metrics, logs, and
data-science-family.

3. Q: What must you still do manually after running the stack?


A: Add users to the created user group.

4. Q: Where is the Data Science Service Template available?


A: Only in the OCI Console (though the Terraform script is public on GitHub).

5. Q: What is the advantage of using ORM instead of manual setup?


A: Automation, consistency, and reduced risk of missing required policies.

6. Q: Where can you find the Terraform version of this configuration?


A: In Oracle’s public GitHub repository (link provided in the course).

🧭 Quick Visual Summary

[Data Science Template in ORM]
        ↓
Creates user group + dynamic group + policies
        ↓
You add users to the user group
        ↓
Tenancy ready for Data Science (metrics, logs, model deploy)

Lesson 4 – Networking for Data Science


Networking enables communication between Data Science resources and other OCI services
or the internet. Key components include:
- VCN (Virtual Cloud Network), Subnets, and VNICs for private connectivity.
- DRG (Dynamic Routing Gateway), NAT Gateway, and Service Gateway for routing.

You can choose between:


1. Default Networking: OCI-managed network with internet and service gateway access.
2. Custom Networking: Your own subnet configuration, required for private data or on-prem access.

🌐 Lesson 5 — Networking for Data Science (Friendly Certification Summary)

This lesson helps you understand how OCI networking supports Data Science
workloads — not in deep networking detail, but just enough to know how your notebook
sessions, job runs, and model deployments connect to data, code, and services.

🧭 Lesson Goal

Understand the networking components and connectivity options that Data Science
workloads use in Oracle Cloud Infrastructure (OCI).

🧱 Part 1: Key OCI Networking Components

These are the building blocks of OCI networking. You don’t need to memorize every router,
but you should know what each one does and how it affects Data Science workloads.

Component | Description | Key Purpose
VCN (Virtual Cloud Network) | Your private network in OCI, like a virtual data center. | Holds all your subnets and networking gateways.
Subnet | A segment within a VCN; contains compute resources and VNICs. | Defines shared rules like route tables, security lists, and DHCP options.
VNIC (Virtual Network Interface Card) | Network interface attached to an instance. | Determines how your resource (VM, notebook, etc.) connects internally or externally.
DRG (Dynamic Routing Gateway) | Router for private traffic between your VCN and on-premises networks, or between VCNs in different regions. | Enables hybrid or multi-region private connectivity.
NAT Gateway | Allows outbound-only internet connections from private subnets. | Lets private resources reach the internet without exposing themselves to incoming traffic.
Service Gateway | Enables private network traffic to Oracle services (like Object Storage, Logging, etc.). | Lets private resources interact with OCI services without using public IPs.

🧩 How They Fit Together

Imagine your VCN as a neighborhood:

 VCN = the entire gated community.

 Subnets = individual streets.

 VNICs = driveways connecting houses (instances) to the street.

 Gateways = the neighborhood’s gates or roads to other areas (internet, OCI services, or corporate networks).

So, your Data Science workloads (notebooks, jobs, model deployments) live inside
these subnets and need proper gateways to access data, libraries, or APIs.

🧠 Part 2: Data Science Networking Patterns

Every Data Science workload needs network access — for example, to:

 Pull code or libraries (GitHub, PyPI)

 Access datasets (Object Storage, databases)


 Connect to other workloads (Data Flow, Functions)

 Send logs or metrics

OCI offers two networking modes for Data Science resources:

⚙️Option 1 — Default Networking

Best for: quick setup, public resources, Object Storage, general OCI access.

 Your workload gets a secondary VNIC connected to an Oracle-managed subnet.

 That subnet already has:

o NAT Gateway → outbound internet access

o Service Gateway → access to OCI services privately

 You don’t have to create or configure any VCNs or policies yourself.

✅ Pros:

 Fastest and simplest.

 Fully managed by OCI.

 No extra setup or permissions required.

⚠️Limitations:

 Can’t access private enterprise networks (like on-prem or private Git servers).

 Limited control over routing and security lists.

⚙️Option 2 — Custom Networking (Bring Your Own Network)

Best for: enterprise setups, private data, or controlled environments.

 You select a subnet from your own VCN when creating the workload.

 The Data Science service attaches to it via a secondary VNIC.

 Connectivity and access depend on how your subnet and gateways are configured.

✅ Pros:

 Can access private corporate networks or private resources.

 Full control over routing, gateways, and security.


 Enables hybrid setups (via DRG or FastConnect).

⚠️Requirements:

 You must have proper policies (from the tenancy configuration lesson).

 Requires coordination with network administrators.

 More setup time.

⚡ Part 3: Quick Setup Demo (VCN Wizard)

If you’re setting up networking yourself (not via Resource Manager), you can use the VCN
Wizard:

1. In the OCI Console, go to Networking → Virtual Cloud Networks.

2. Click Start VCN Wizard.

3. Choose Create VCN with Internet Connectivity.

4. Give it a name (e.g., “DS-VCN”).

5. Leave advanced settings as default.

6. Click Next → Create.

7. Wait a moment — the wizard creates:

o VCN

o Subnet

o Internet Gateway

o NAT Gateway

o Route Tables and Security Lists

8. Click View Virtual Cloud Network to see the full setup.

✅ Note:
If you already configured your tenancy using OCI Resource Manager (Lesson 4), you may
not need this step, since the template can create networking for you automatically.

🧩 Part 4: When to Use Which Option


Scenario | Recommended Networking
Accessing public datasets, PyPI, or OCI Object Storage | Default networking
Training models that read from a private DB or corporate Git | Custom networking
Multi-region or hybrid environment | Custom networking with DRG
Quick testing / sandbox | Default networking

🧠 Exam Tips

 Default networking = OCI-managed, fastest, internet + OCI access only.

 Custom networking = your own VCN/subnet, for private assets and full control.

 VCN Wizard can create a ready-to-use network for testing.

 NAT Gateway allows outbound internet, Service Gateway gives private OCI access.

 DRG enables private connections to on-premises or other VCNs.

 Subnets define shared rules for VNICs (route tables, security lists, DHCP).

 Data Science workloads connect via a secondary VNIC — this is how they attach to
your network.

🧭 Visual Summary

+--------------------------+

| Oracle Cloud Network |

+--------------------------+

| VCN |

| ├── Subnet (Public) |

| │ └─ Internet GW |

| ├── Subnet (Private) |

| │ ├─ NAT Gateway |

| │ └─ Service GW |
| └── DRG (optional) |

+--------------------------+

↑ Data Science workloads (Notebook, Job, Model Deployment)

| connect via secondary VNIC to either default or custom subnet

🎓 What You Should Remember

 VCN = virtual private network in OCI.

 Subnets define how your workloads connect.

 VNICs attach workloads to networks.

 Gateways control external or private access.

 Default networking = easiest start.

 Custom networking = for private enterprise access.

 VCN Wizard simplifies setup.

 Policies may be required for custom networking.

Lesson 5 – Authenticate OCI APIs


Authentication ensures your Data Science resources can securely access other OCI services
via APIs. Two main authentication methods:
1. Resource Principals – Recommended: automatic, secure, no manual key management,
tokens cached 15 minutes.
2. Config File + API Key – Manual: user-based authentication, requires uploading config and
key files.
Resource Principals are ideal for jobs and deployments, while config files suit interactive
notebooks.

🔐 Lesson 6 — Authenticate OCI APIs (Friendly Certification Summary)

This final lesson of Module 2: Introduction and Configuration focuses on how to authenticate your Data Science workloads and users when they interact with OCI APIs.

Authentication is all about proving who you are to OCI before accessing its services.

🎯 Lesson Goal
Understand how Data Science resources (notebooks, jobs, model deployments)
authenticate with OCI APIs using:

 Resource Principals

 Configuration files and API keys

…and know when to use each method for your workloads.

🧠 First: Authentication vs Authorization

These two concepts often get mixed up, but you’ll see both in OCI exams.

Concept | What it means | Example
Authentication | Proving identity – "Who are you?" | Logging in using your OCI credentials or key.
Authorization | Checking permissions – "What are you allowed to do?" | Your IAM policy allows you to manage object storage.

👉 This lesson = Authentication only


(Authorization was covered earlier in Lesson 2 – Tenancy Configuration.)

🧩 Why Authentication Matters for Data Science

Your notebook session, job, or model deployment often needs to:

 Read/write data in Object Storage,

 Launch jobs in Data Flow,

 Interact with APIs of other OCI services.

To do that, the code running inside your Data Science environment must authenticate to
OCI — just like a person logging in.

🔑 Two Main Authentication Methods

1️⃣ Resource Principals (Recommended)

A resource principal lets a resource itself (like your notebook session or job run) act as a
recognized OCI identity.
How it works:

 OCI automatically assigns certificates to the resource.

 The resource uses these certificates to authenticate securely to OCI.

 No need to store or upload API keys.

 Tokens are automatically rotated and managed by OCI.

Supported in:

 Data Science Notebook Sessions

 Job Runs

 Model Deployments

✅ Advantages

 Most secure (no manual key handling)

 Ideal for automated jobs — no user interaction needed

 No credential leakage risk

 Automatic rotation of tokens

⚠️Considerations

 Token (Resource Principal Token, RPT) is cached for 15 minutes.

o If you change IAM policies or dynamic groups → wait 15 mins for effects.

 Works seamlessly with ADS SDK, OCI SDK, and CLI, if you specify it.
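
For example, with the ADS SDK "specifying it" is a single call (a minimal sketch):

import ads

# Route all subsequent ADS SDK calls through the resource principal of the
# notebook session, job run, or model deployment running this code
ads.set_auth(auth="resource_principal")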

2️⃣ Config File + API Key (User Principal)

This is the traditional method, using your personal IAM user credentials.

How it works:

 You create an OCI configuration file (~/.oci/config) that defines your profile
(tenancy OCID, user OCID, key file, etc.).

 You upload or generate a .pem private key.

 Your code uses this profile to authenticate as you (your IAM user).
✅ When to Use

 For interactive notebook sessions where you want to run code as your user.

 When resource principals aren’t configured.

⚠️Limitations

 You must upload config and key files manually.

 Less secure (keys are static and could be exposed).

 Not ideal for non-interactive workloads (e.g., job runs).

🧰 Example: Creating Config and Key in a Notebook

If you need to authenticate using the config + key method inside a notebook:

1. Open JupyterLab Launcher.

2. Click Notebook Examples → api_keys notebook.

3. Follow steps to create:

o OCI config file (usually in /home/datascience/.oci/config)

o .pem key file

4. Define your profile inside the config file.
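
For reference, a profile in that config file looks roughly like this (every value below is a placeholder):

[DEFAULT]
user=ocid1.user.oc1..<your-user-ocid>
fingerprint=aa:bb:cc:dd:ee:...
key_file=/home/datascience/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<your-tenancy-ocid>
region=us-ashburn-1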

Then, you can authenticate using:

import oci

# Load the DEFAULT profile from the config file created above
config = oci.config.from_file("~/.oci/config", "DEFAULT")

# Every request made by this client is signed with your user's API key
object_storage_client = oci.object_storage.ObjectStorageClient(config)

🧠 Resource Principals in Code

To use Resource Principals in Python (with the OCI or ADS SDK):

import oci
import ads

# ADS SDK: one call switches all ADS operations to the resource principal
ads.set_auth(auth="resource_principal")

# OCI Python SDK: build a resource principals signer and pass it to any client
auth = oci.auth.signers.get_resource_principals_signer()
object_storage_client = oci.object_storage.ObjectStorageClient(config={}, signer=auth)

✅ This tells your code:


“Authenticate as the resource (the notebook or job), not the human user.”

🧠 Certification Focus — Key Distinctions

Topic | Resource Principal | Config File + API Key
Identity Type | OCI resource (notebook, job, etc.) | IAM user
Setup Required | None — automatic | Manual upload of config + key
Security | High — managed by OCI | Medium — keys can be exposed
Token Validity | 15 minutes (auto-refresh) | Static until rotated manually
Ideal For | Jobs, model deployments, automation | Interactive notebooks
Credential Storage | None | Stored in file system
Rotation | Automatic | Manual

🧭 Quick Recap

 OCI APIs require authentication before interaction.

 Resource Principals = preferred and secure method.

 Config File + API Key = manual, user-based method.

 Resource Principal tokens last 15 minutes and refresh automatically.

 Use ADS SDK, OCI SDK, or CLI to interact programmatically.

 Authorization (permissions) handled separately through IAM policies.

⚡ Exam Tips
1. Definition: Resource principals = IAM identities assigned to OCI resources (like
Data Science jobs).

2. Caching: Resource principal token = valid for 15 minutes.

3. Security: Resource principals are safer because no keys are stored.

4. Fallback: If resource principal isn’t explicitly set, the SDK defaults to using config +
API key.

5. Config files: Must include tenancy OCID, user OCID, key fingerprint, private key
path, region, and profile.

6. Automation: Always prefer resource principals for non-interactive services.

🧩 Visual Summary

+-----------------------------+

| OCI Identity & Access Mgmt |

+-----------------------------+

+--------------------------------------+

| Authentication Methods |

+--------------------------------------+

| 1. Resource Principal (Recommended) |

| • Automatic |

| • Secure (certificates, no keys) |

| • Token cached 15 min |

| • Best for jobs, deployments |

| |

| 2. Config + API Key (Manual) |

| • User-based IAM identity |

| • Requires key + config upload |


| • Best for notebook users |

+--------------------------------------+

🧾 You Should Remember

 ✅ Resource Principal = default and secure method for workloads.

 ✅ Config + API key = manual user authentication.

 ⏱️15-minute cache for resource principal token.

 ⚙️Used with ADS SDK, OCI Python SDK, or CLI.

 🚫 Authentication ≠ Authorization — permissions depend on IAM policies.
