OCI Data Science 2025 Professional Certification – Module 2 Summary
This document summarizes Module 2: Introduction and Configuration for the OCI Data
Science Professional Certification 2025. It provides friendly explanations, key concepts, and
likely exam topics for each lesson, followed by a one-page reference table.
Lesson 1 – Introduction to OCI Data Science Configuration
This lesson introduces the structure and purpose of configuring Oracle Cloud Infrastructure
(OCI) for Data Science. You learn how tenancy, compartments, groups, policies, and
networking form the foundation for secure and organized Data Science workloads. Key
takeaway: configuration ensures proper access control, scalability, and integration across
OCI services.
🧠 1. Overview: What You’re Learning
This lesson introduces you to:
The history and evolution of data science.
The Oracle Cloud Infrastructure (OCI) Data Science service and how it fits into
Oracle’s AI ecosystem.
Key concepts, terminology, and components you’ll use in real OCI projects.
It sets the stage for later lessons on configuration, model building, deployment, and
automation.
📜 2. The Historical Background (Fun but Important)
Oracle starts with history to show how ideas of simplicity, data, and learning evolved:
| Thinker / Era | Concept | Modern Data Science Connection |
|---|---|---|
| William Ockham (1300s) | Ockham's Razor — prefer simple explanations | In ML, we aim for the simplest model that fits the data (avoid overfitting). |
| Tobias Mayer (1700s) | More data improves accuracy | Data quantity and quality are key to model performance. |
| Arthur Samuel (1952) | Coined the term "machine learning" | ML systems improve automatically through experience. |
| John Tukey (1962) | Predicted data analysis as a science | Early vision of "data-driven decision making." |
| IBM Deep Blue (1997) | Beat Garry Kasparov using computation | Demonstrated the power of algorithms + compute. |
| DJ Patil & Jeff Hammerbacher (2008) | Popularized the "data science" term | Unified ML, stats, and engineering under one role. |
👉 Exam tip: You don’t need to memorize dates, but understand how simplicity, data, and
learning from experience form the foundation of data science.
🌍 3. Modern Data Science Use Case — Employee Attrition
Oracle uses a business example: predicting employee attrition (turnover).
Why? Because it’s a real, relatable business problem that combines:
Data collection (HR, satisfaction, salaries),
ML modeling (classification/prediction),
Insights for business (retain talent).
👉 Exam tip: Expect scenario questions like:
"Which OCI service would a data scientist use to build an attrition prediction model end-to-end?"
Answer: OCI Data Science.
🧩 4. Oracle’s AI and ML Ecosystem — How It Fits Together
Oracle’s approach is layered. Picture three levels:
1️⃣ Data layer (foundation):
All data types — structured (databases), unstructured (text, images, IoT, social media).
2️⃣ Middle layer (services):
Machine Learning Services:
Used by data scientists to build, train, deploy, and manage custom models.
→ Tools: OCI Data Science, Oracle Database ML, Data Labeling.
AI Services:
Pre-built models you can use via API (vision, language, speech, anomaly detection).
→ No need to build from scratch.
3️⃣ Top layer (applications):
Business apps, analytics, and custom systems that consume AI results.
👉 Exam tip: Be able to explain the difference between AI services and Machine Learning
services.
| Feature | AI Services | ML Services |
|---|---|---|
| Purpose | Ready-to-use ML models | Build custom models |
| Who uses | App developers, business users | Data scientists |
| Customization | Minimal (train with your data) | Full control |
| Example | OCI Language, OCI Vision | OCI Data Science |
⚙️ 5. OCI Data Science — The Core Service
This is the heart of the certification.
OCI Data Science helps data scientists through the entire ML lifecycle:
Build → Train → Deploy → Manage
Supports:
Python and open-source tools
JupyterLab notebooks
Compute flexibility (CPU/GPU)
Full integration with OCI services (security, storage, etc.)
🔑 The 3 Core Principles
| Principle | What It Means | Example / Benefit |
|---|---|---|
| 1. Accelerate | Simplify setup; access big compute easily | Run Jupyter notebooks on cloud GPUs without managing servers |
| 2. Collaborate | Teams share assets, ensure reproducibility | Shared projects, shared model catalog |
| 3. Enterprise-grade | Security, governance, automation | OCI handles maintenance, access control, auditing |
🧰 6. Key Terminology You Must Know (Exam Critical)
| Term | What It Means | Exam Example |
|---|---|---|
| Project | A container/workspace for organizing all assets (notebooks, models, jobs). | "Where do you organize your team's data science work?" |
| Notebook Session | Managed JupyterLab environment with compute and storage. | "Which OCI component provides a managed Jupyter environment?" |
| Conda | Environment and package manager for Python libraries. | "What tool manages dependencies in OCI Data Science notebooks?" |
| ADS SDK (Accelerated Data Science) | Oracle's Python library that automates ML steps (data prep, AutoML, model explainability, etc.). | "Which SDK helps automate the ML workflow in OCI Data Science?" |
| Model Catalog | Centralized repository for storing, sharing, and managing models. | "Where are models stored and versioned in OCI Data Science?" |
| Model Deployment | Exposes a model as an HTTP endpoint for real-time predictions. | "How do you operationalize a trained model?" |
| Data Science Job | Automated, repeatable ML task (training, evaluation, batch scoring). | "Which feature runs scheduled ML tasks on managed infrastructure?" |
🧩 7. Ways to Access OCI Data Science
| Access Method | Description | Example |
|---|---|---|
| OCI Console | Browser interface (most common) | Click and create projects, notebook sessions |
| REST API | Full programmatic control | Automate deployment pipelines |
| SDKs | Code-based access (Python, Java, etc.) | Python SDK for model deployment |
| CLI | Command-line control | Fast scripting or automation |
🌐 8. OCI Regions
Regions = physical data centers where OCI services run.
OCI Data Science is available across global, government, and dedicated regions.
Regions are expanding — always check oracle.com/cloud for updates.
👉 Exam tip: You might get a question like:
“Where does OCI Data Science run?”
✅ Answer: In OCI regions (commercial, government, or dedicated).
🎯 9. Key Takeaways for Certification
✅ Know the purpose of OCI Data Science: to manage the end-to-end ML lifecycle using
Python and open-source tools.
✅ Understand the difference between AI services vs ML services.
✅ Memorize key components: Projects, Notebook Sessions, Model Catalog, Model
Deployment, Data Science Jobs.
✅ Understand collaboration and governance features (enterprise-grade security, team
sharing).
✅ Recognize integration points: ADS SDK, Conda environments, REST APIs, SDKs, and CLI.
✅ Be familiar with the general Oracle AI architecture (Data → ML/AI Services →
Applications).
🧩 Example Certification-Style Questions
1. Which OCI service is primarily used by data scientists to build and deploy
custom ML models?
→ OCI Data Science.
2. What is the purpose of the Model Catalog in OCI Data Science?
→ To store, manage, and share trained models with metadata and versioning.
3. What is the difference between AI Services and Machine Learning Services in
OCI?
→ AI Services = pre-built models; ML Services = build your own models.
4. Which environment does OCI Data Science use for interactive coding?
→ JupyterLab notebooks.
5. What is the role of the ADS SDK?
→ To simplify and automate data science workflows like data prep, AutoML, and
model deployment.
6. What’s one key advantage of using OCI Data Science vs local laptop notebooks?
→ Access to scalable, managed compute and integrated OCI security.
Lesson 2 – Tenancy Configuration Basics
This lesson explains how OCI resources are organized within a tenancy. Key components
include:
- Compartments: logical containers for resources.
- User Groups and Dynamic Groups: manage access for users and resources.
- Policies: define permissions with syntax like 'Allow group <name> to manage data-science-family in compartment <name>'.
You also learn about verbs (inspect, read, use, manage) defining access levels, and required
policies for Data Science, including metrics, logs, and network access.
🧠 1. What is ADS SDK?
ADS (Accelerated Data Science) SDK is Oracle’s Python library that helps automate and
simplify the entire machine learning lifecycle on OCI.
Think of ADS as your “data science assistant” inside OCI Data Science.
It’s built by data scientists for data scientists, with two main goals:
1. Integrate seamlessly with other OCI services.
2. Accelerate every step — from data connection → visualization → model training →
deployment → explainability.
⚙️ 2. The Two Versions of ADS
| Version | Description | Where You Use It |
|---|---|---|
| Public (Open Source) | Available on GitHub and PyPI — anyone can install it. | Your laptop, local JupyterLab, or non-OCI environment. |
| OCI Integrated Version | Installed automatically inside OCI Data Science environments. Includes AutoML and ML Explainability. | Within OCI Data Science Notebooks. |
👉 Exam tip: AutoML and Explainability are only available in the OCI version, not the public
one.
🧩 3. Key Goals of ADS SDK
ADS SDK is meant to:
Integrate OCI services securely (Vault, Autonomous Database, Big Data Service).
Simplify data science tasks like data exploration, feature engineering, training, and
deployment.
Automate repetitive work with a few lines of Python.
Example:
Instead of manually connecting to databases or tuning parameters,
you can do it with ADS classes like ADSTuner, DatasetFactory, or ADSModel.
🔑 4. Key ADS SDK Features and What They Do
| Stage | What ADS Does | Key Classes / Features |
|---|---|---|
| 1. Connect to Data | Securely access data from anywhere (OCI, DBs, S3, etc.) | DatasetFactory, SecretKeeper, OCIAuth, OCI Object Storage, Vault |
| 2. Data Visualization & Exploration | Understand data through plots, correlations, summaries | Smart Plotting, Feature Types |
| 3. Feature Engineering | Suggest or perform data transformations automatically | ADSDataset, Feature Type System |
| 4. Model Training | Automate model creation, tuning, and evaluation | AutoML, ADSTuner, ADSModel |
| 5. Model Evaluation | Compare models and view metrics easily | ADSEvaluator |
| 6. Model Interpretability | Explain what your model learned (local/global explainability) | Explainability Module (PDP, ALE, What-if) |
| 7. Model Deployment | Push models to production in OCI Model Catalog | ADSModel, ModelArtifact, ModelDeployment |
🧮 5. Connecting to Data Sources (Know These!)
ADS simplifies connecting to many types of data securely and efficiently.
| Data Source Type | How ADS Connects | Notes |
|---|---|---|
| Local Storage | Uses block storage inside the notebook session | Fastest for small datasets |
| OCI Object Storage | Access via the oci:// protocol directly in pandas (pd.read_csv('oci://bucket/file.csv')) | Uses fsspec to act like local files |
| Oracle Databases / Autonomous DB (ADB) | Secure connection using SecretKeeper and credentials stored in OCI Vault | Keeps credentials safe |
| OCI Big Data Service (HDFS) | Connect directly to Hadoop-based data without copying | |
| 3rd-Party Clouds | S3, Google Cloud Storage, Azure Blob/Data Lake, Dropbox | Requires simple configuration |
| Web Data | Read files directly from HTTP/HTTPS into pandas | Useful for public datasets |
| NoSQL Databases | Use DatasetFactory to connect and query | Works for non-relational sources |
👉 Exam tip: You may get a question like:
“Which ADS component securely manages database credentials?”
✅ SecretKeeper (stores secrets in OCI Vault).
📊 6. Data Visualization & Feature Types
ADS includes smart plotting and feature type classes that:
Automatically pick the right chart (bar, scatter, histogram, etc.).
Provide summary statistics and correlation heatmaps.
Allow reusable visualizations across projects.
👉 Example: If you define “Age” as a numerical feature type, ADS will always visualize it with
histograms or box plots.
This standardization improves consistency and saves time.
🧬 7. Feature Engineering (Where Models Get Smarter)
ADS can automatically analyze your dataset and suggest transformations to improve
model performance:
Handle categorical encoding
Manage missing values / imputation
Suggest new derived features
All of this is done with the ADSDataset class, which wraps around a pandas dataframe.
“Feature engineering is often the difference between a good model and a great model.”
🤖 8. Model Training and Optimization
ADS helps you train models automatically or manually.
Two main tools:
| Tool | Purpose |
|---|---|
| AutoML (Oracle Labs) | Automatically builds and compares models — picks the best one. |
| ADSTuner | Performs hyperparameter optimization manually or automatically. |
Once trained, ADS can:
Package models as artifacts,
Save them in the Model Catalog, and
Push them into production.
👉 Exam tip: “Which ADS component performs hyperparameter optimization?”
✅ ADSTuner
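To make "hyperparameter optimization" concrete, here is a dependency-free sketch of the search loop a tool like ADSTuner automates. The scoring function and parameter grid are invented for illustration only; ADSTuner's real API is different and is documented in the ADS SDK.

```python
import itertools

# Toy stand-in for what a tuner such as ADSTuner automates: evaluate
# candidate hyperparameters against a validation score and keep the best.
# The scoring function is made up -- it is not a real model.

def validation_score(depth, lr):
    # Pretend the best validation score occurs at depth=5, lr=0.1.
    return -abs(depth - 5) - 10 * abs(lr - 0.1)

def grid_search(depths, lrs):
    best_params, best_score = None, float("-inf")
    for depth, lr in itertools.product(depths, lrs):
        s = validation_score(depth, lr)
        if s > best_score:
            best_params, best_score = {"depth": depth, "lr": lr}, s
    return best_params

best = grid_search([2, 3, 5, 8, 10], [0.01, 0.05, 0.1, 0.5])
print(best)  # {'depth': 5, 'lr': 0.1}
```

ADSTuner adds smarter search strategies and parallel trials on top of this basic loop, so you rarely write it by hand.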
📈 9. Model Evaluation
After training, you use ADSEvaluator to:
Compare different models side-by-side,
Automatically pick proper metrics for classification or regression,
Generate charts and performance summaries automatically.
Example:
For binary classification: AUC, accuracy, confusion matrix.
For regression: RMSE, MAE, R².
No need to manually plot each — ADS does it for you.
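As a quick look at what ADSEvaluator computes under the hood for binary classification, here is a hand-rolled accuracy and confusion matrix on toy labels. The data is made up; in practice ADS produces these metrics and charts for you.

```python
# Hand-rolled versions of two binary-classification metrics ADSEvaluator
# reports: the confusion matrix and accuracy. Labels are toy values.

def confusion_matrix(y_true, y_pred):
    """Return counts as a dict: tp, fp, fn, tn (positive class = 1)."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            cm["tp"] += 1
        elif t == 0 and p == 1:
            cm["fp"] += 1
        elif t == 1 and p == 0:
            cm["fn"] += 1
        else:
            cm["tn"] += 1
    return cm

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # e.g., 1 = employee left
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model's predictions
print(confusion_matrix(y_true, y_pred))  # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(accuracy(y_true, y_pred))          # 0.75
```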
🔍 10. Model Interpretability & Explainability
One of the most powerful parts of ADS!
ADS provides both:
| Type | Description | Example |
|---|---|---|
| Local Explainability | Understand why the model made a specific prediction | "Why did this employee get predicted to leave?" |
| Global Explainability | Understand overall model behavior and feature importance | "Which features influence attrition the most?" |
Techniques used:
PDP (Partial Dependence Plots) – show how features affect predictions.
ALE (Accumulated Local Effects) – similar to PDP, but more accurate for
correlated features.
What-If Scenarios – modify inputs and observe how predictions change.
👉 Exam tip: Be able to describe local vs global explainability and mention tools like PDP
and ALE.
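A minimal sketch of what a PDP computes, assuming a toy linear "model" invented for illustration: fix one feature at each grid value, average the model's predictions over the rest of the dataset, and plot the averages. ADS generates these plots for you through its explainability module.

```python
# Partial dependence, by hand: clamp one feature to each grid value,
# average predictions over the dataset. The "model" is a toy function.

def model_predict(row):
    # Toy model: attrition risk grows with overtime, shrinks with salary.
    return 0.5 + 0.3 * row["overtime"] - 0.2 * row["salary"]

def partial_dependence(data, feature, grid):
    pd_values = []
    for value in grid:
        preds = []
        for row in data:
            modified = dict(row)
            modified[feature] = value  # clamp the feature to the grid value
            preds.append(model_predict(modified))
        pd_values.append(sum(preds) / len(preds))
    return pd_values

data = [{"overtime": 0, "salary": 1.0},
        {"overtime": 1, "salary": 0.5},
        {"overtime": 1, "salary": 2.0}]
# Average predicted risk at overtime=0 vs overtime=1:
print(partial_dependence(data, "overtime", [0, 1]))
```

The rising curve over the grid is exactly the "overtime pushes attrition up" signal a PDP makes visible.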
🚀 11. Model Deployment
Once your model is trained and saved in the Model Catalog, ADS makes it easy to deploy.
ADS Model Framework lets you:
Deploy models from any library (scikit-learn, TensorFlow, PyTorch, XGBoost,
AutoML).
Deploy even generic models.
Automatically create an HTTP endpoint in OCI for real-time predictions.
Integration with OCI Logging:
Records prediction logs and access logs for monitoring and debugging.
👉 Exam tip:
“How does ADS help operationalize models?”
✅ It provides deployment classes that publish models as HTTP endpoints integrated with
OCI logging.
🧩 12. Summary — What You Should Remember
| Concept | Why It Matters |
|---|---|
| ADS SDK | Oracle's Python library for automating data science in OCI |
| Two Versions | Public (basic) vs OCI (with AutoML + Explainability) |
| SecretKeeper | Securely stores database credentials in OCI Vault |
| ADSTuner | Hyperparameter optimization |
| ADSDataset | Simplifies data cleaning and feature engineering |
| ADSEvaluator | Compares and evaluates multiple models |
| Explainability Tools | Local & global explanations with PDP/ALE |
| ADSModel Framework | Easily deploys models into OCI production |
| Integration | Works seamlessly with OCI Object Storage, DBs, Big Data, SDKs |
🧠 13. Example Certification-Style Questions
1. What is the purpose of the ADS SDK?
→ To simplify and automate the end-to-end machine learning lifecycle on OCI.
2. Which component allows secure storage and access to database credentials?
→ SecretKeeper (integrated with OCI Vault).
3. What are two key features available only in the OCI version of ADS SDK?
→ AutoML and ML Explainability.
4. Which class helps perform hyperparameter tuning?
→ ADSTuner.
5. What does the ADSDataset class do?
→ Wraps a pandas DataFrame and provides feature engineering tools and
recommendations.
6. How does ADS support model interpretability?
→ Through local/global explainability tools like PDPs, ALE plots, and what-if testing.
7. Which component allows deploying trained models into OCI production
environments?
→ ADSModel Framework.
8. How can you connect pandas directly to OCI Object Storage?
→ Using the oci:// protocol with fsspec integration.
Lesson 3 – Configure Tenancy with OCI Resource Manager
Instead of manually creating compartments, groups, and policies, you can use the OCI
Resource Manager (ORM) Data Science template. The ORM stack automates configuration
by creating user and dynamic groups, matching rules, and policies. Steps: create a stack →
select the Data Science template → select your compartment → run the stack. You can also
access Terraform scripts from Oracle’s GitHub for customization.
Friendly summary — Tenancy & configuration (Lesson 3)
Nice — this lesson covers the tenancy basics you must know for OCI Data Science:
compartments, user groups, dynamic groups, and policies. Below I’ll explain each
concept in plain language, point out the exam-important bits, give concrete examples
(policy syntax and dynamic group rules), and finish with likely certification questions and
short answers.
1) High-level idea (one-liner)
Organize resources with compartments, group people and resources with user groups
and dynamic groups, then grant permissions by writing policies that tie groups →
actions (verbs) → resource types → compartments.
2) Concepts explained for a student
Compartments
Logical folders for OCI resources (like projects or workspaces).
Used to isolate and control access: policies target groups in a compartment.
Plan first: decide how you’ll split resources (environments, teams, projects) before
creating compartments.
Practical exam tip: Know that access is granted at the group + compartment level.
User groups
Collections of human user accounts.
Admins create users, create groups, then add users to groups.
Policies reference groups to grant human users permissions.
Exam tip: “Create users → create groups → add users to groups.”
Dynamic groups
Special groups, but membership is resources (not humans).
Membership is determined by matching rules (attributes of resources, e.g.,
resource type + compartment).
Useful to let notebook sessions, job runs, or deployments act as principals and call
other OCI services (for example, let a notebook access object storage).
Key idea: Dynamic groups let resources act like users — you write policies that grant those
resources permissions.
Example matching rule (pseudo):
(instance.compartment.id = 'ocid1.compartment.oc1..aaaaaaa...') AND (resource.type =
'dataScienceNotebookSession')
(In practice, use the exact rule syntax Oracle documents for Data Science resource types.)
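A sketch of how matching rules for the Data Science resource types might be assembled in the ALL {...} form OCI uses. Verify the exact rule syntax against the OCI documentation; the compartment OCID below is a placeholder.

```python
# Assemble dynamic-group matching rules for the Data Science resource
# types covered in this module. The rule form and resource-type names
# should be double-checked against the OCI docs; the OCID is fake.

DS_RESOURCE_TYPES = [
    "datasciencenotebooksession",
    "datasciencemodeldeployment",
    "datasciencejobrun",
]

def matching_rule(resource_type, compartment_ocid):
    return (f"ALL {{resource.type = '{resource_type}', "
            f"resource.compartment.id = '{compartment_ocid}'}}")

for rt in DS_RESOURCE_TYPES:
    print(matching_rule(rt, "ocid1.compartment.oc1..example"))
```

Each generated rule makes one class of Data Science resources (scoped to the compartment) a member of the dynamic group.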
Policies (the enforcement rules)
Syntax pattern:
Allow <group> to <verb> <resource-type> in compartment <compartment-name>
Components:
o Group: user group or dynamic group name.
o Verb: access level (inspect, read, use, manage).
o Resource type: single resource (data-science-job) or family (data-science-family).
o Compartment: the target compartment.
Verb hierarchy (least → most permissive):
inspect — list/see resources, minimal metadata
read — inspect + get resource and metadata
use — read + operate on the resource (not create/delete usually)
manage — full permissions incl. create/delete
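The verb hierarchy can be modeled as a simple inclusion chain, which is handy for reasoning about "which verb is enough?" exam questions:

```python
# The four policy verbs form a strict inclusion chain:
# inspect < read < use < manage. Each verb includes everything below it.

VERB_LEVEL = {"inspect": 1, "read": 2, "use": 3, "manage": 4}

def covers(granted_verb, required_verb):
    """True if granting `granted_verb` also allows `required_verb`-level operations."""
    return VERB_LEVEL[granted_verb] >= VERB_LEVEL[required_verb]

print(covers("manage", "read"))   # True  -- manage includes read
print(covers("inspect", "use"))   # False -- inspect is the weakest verb
```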
3) Practical policy examples (copy/learn these forms)
A. Allow data scientists to manage all Data Science resources in a compartment
Allow group DataScienceUsers to manage data-science-family in compartment DataScience-Compartment
B. Allow Data Science notebook sessions (dynamic group) to manage data science resources
Allow dynamic-group DS-Resources to manage data-science-family in compartment DataScience-Compartment
C. Metrics & logging access (common required statements)
Allow group DataScienceUsers to read metrics in compartment DataScience-Compartment
Allow dynamic-group DS-Resources to use log-content in compartment DataScience-Compartment
Allow group DataScienceUsers to manage log-groups in compartment DataScience-Compartment
Allow group DataScienceUsers to use log-content in compartment DataScience-Compartment
D. Virtual networking (if using custom VCNs)
Allow service data-science to use virtual-network-family in compartment DataScience-Compartment
Allow group DataScienceUsers to use virtual-network-family in compartment DataScience-Compartment
Allow dynamic-group DS-Resources to use virtual-network-family in compartment DataScience-Compartment
E. Useful: grant object storage access
Allow group DataScienceUsers to manage object-family in compartment DataScience-Compartment
Allow dynamic-group DS-Resources to manage object-family in compartment DataScience-Compartment
Small tip: use aggregate resource families (like data-science-family) to simplify policies
instead of listing many resource types.
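The recurring statement forms above can also be generated programmatically, so group and compartment names live in one place. A small sketch, using the same placeholder names as the examples:

```python
# Generate the four "required" Data Science policy statements from
# section 3 for a given user group, dynamic group, and compartment.
# Names are placeholders; the statement forms mirror the examples above.

def required_ds_policies(user_group, dynamic_group, compartment):
    return [
        f"Allow group {user_group} to manage data-science-family in compartment {compartment}",
        f"Allow dynamic-group {dynamic_group} to manage data-science-family in compartment {compartment}",
        f"Allow group {user_group} to read metrics in compartment {compartment}",
        f"Allow dynamic-group {dynamic_group} to use log-content in compartment {compartment}",
    ]

for stmt in required_ds_policies("DataScienceUsers", "DS-Resources",
                                 "DataScience-Compartment"):
    print(stmt)
```

This is essentially what the Resource Manager template in the next lesson automates for you.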
4) Typical workflow to configure tenancy for Data Science
1. Plan compartment structure (team/project/env boundaries).
2. Create compartment(s) in Identity console. Grab the OCID if needed.
3. Create users, then create user groups, and add users to groups.
4. Create dynamic group(s) with matching rules for notebook sessions, job runs,
model deployments (scoped to the compartment).
5. Create policies: required policies (data-science manage, metrics/logging), plus
auxiliary policies (object storage, networking) as needed.
6. Test: launch a notebook session and verify it can access required services per the
policies.
5) Exam-focused key points to memorize
Access is group + compartment based.
Dynamic groups contain resources, not humans. They use matching rules.
Policy syntax: Allow <group> to <verb> <resource> in compartment <name>
Verb meanings (inspect / read / use / manage) — know differences.
Required policies for Data Science: allow the user group to manage data-science-family; allow the dynamic group to manage data-science-family; plus metrics- and logging-related allow statements.
Useful extra policies: object storage (object-family), virtual network (virtual-network-family) if using custom networking.
6) Likely certification questions (with model answers)
1. Q: What is a dynamic group in OCI and when would you use it?
A: A dynamic group is a group whose members are resource principals (cloud
resources) determined by matching rules. Use it to grant permissions to resources
(e.g., notebook sessions) so they can call other OCI services securely.
2. Q: Write the policy statement to allow the DataScienceTeam group to fully manage
Data Science resources in the Prod-DS compartment.
A: Allow group DataScienceTeam to manage data-science-family in compartment Prod-DS
3. Q: Which policy verb would you use to let a user list resources but not view or
change them?
A: inspect
4. Q: You created a notebook session in compartment DS-Work. Which construct lets
that notebook act as a principal to read from object storage?
A: Create a dynamic group with a matching rule for notebook sessions in DS-Work, then write a policy that allows that dynamic group to read/use/manage object-family in the compartment.
5. Q: What are two required policies for Data Science resources to access metrics and
logs?
A: Examples: Allow group <users> to read metrics in compartment <compartment>
and Allow dynamic-group <dynGroup> to use log-content in compartment
<compartment> (plus manage log-groups/use log-content for user group).
6. Q: Why plan compartments before creating them?
A: Because compartments are logical isolation boundaries used in policies; good
planning prevents complex cross-compartment access and simplifies governance.
7) Quick checklist you can memorize
Plan compartments.
Create users → create groups → add users to groups.
Create dynamic groups with matching rules for notebook sessions, job runs, model
deployments.
Add required policies: data-science manage (user group + dynamic group),
metrics/logging access.
Add useful policies: object storage, virtual network family (if needed).
Test resource access.
Friendly Summary — Lesson 4: Configure Tenancy with OCI Resource Manager (ORM)
This lesson builds directly on Lesson 3 — you already know how to configure
compartments, user groups, dynamic groups, and policies manually.
Now you’ll see a faster, automated way: using Oracle Resource Manager (ORM) with the
Data Science Service template.
Let’s go through it step by step — clear, simple, and focused on what’s exam-relevant.
Big Picture
Instead of creating all the tenancy setup by hand (user group, dynamic group, policies), you
can let Oracle Resource Manager do it automatically using a predefined Data Science
Service template.
Think of ORM as Oracle’s version of a “click-and-deploy Terraform automation” inside OCI.
🧱 Key Concepts
1️⃣ Oracle Resource Manager (ORM)
A managed Terraform service in OCI.
Lets you automate provisioning of resources through stacks (Terraform
configuration bundles).
Each “stack” = configuration + variables + jobs that run “plan/apply/destroy.”
✅ Why it matters:
You can configure your Data Science tenancy (groups, policies, dynamic groups)
automatically and consistently — less manual setup, fewer typos, faster onboarding.
2️⃣ Data Science Service Template
A pre-built stack available in the OCI console under the “Service → Data Science”
section.
When you run it, it automatically creates:
o A user group — with a name you specify.
o A dynamic group — with matching rules for:
datasciencenotebooksession
datasciencemodeldeployment
datasciencejobrun
o A policy — with the key required statements for Data Science.
🔐 The Policy Includes:
1. Allow your user group to manage data-science-family in the compartment.
2. Allow the dynamic group to manage data-science-family in the compartment.
3. Allow your user group to read metrics in the compartment.
4. Allow the dynamic group to use log content in the compartment.
✅ Essentially, it automates all the “required policy setup” from Lesson 3.
3️⃣ Steps to Run the ORM Stack
Here’s the friendly “cookbook” process:
1. Open OCI Console → navigate to Resource Manager → Stacks.
2. Click Create Stack.
3. Under Template as Origin, select Service → Data Science → Select Template.
4. Pick your compartment where you want the Data Science setup.
5. (Optional) Fill in variables — group names, etc.
6. Choose to run “Apply” immediately so it deploys right away.
7. Wait for the job to finish — this provisions your user group, dynamic group, and
policies.
8. Finally, add your users to the created user group manually (since it just creates the
empty group).
✅ Done! Your tenancy is now configured automatically for OCI Data Science.
4️⃣ Terraform Option (for customization)
The same configuration is available as Terraform scripts on a public GitHub
repository (official Oracle sample).
You can:
o Clone/edit the script,
o Modify names, add extra policies,
o Run via ORM or your own Terraform CLI.
This is useful if you want repeatable automation across multiple environments (e.g., dev,
test, prod).
🧩 Why Use ORM Instead of Manual Setup?
| Manual Configuration | ORM Template |
|---|---|
| Many console steps (create groups, write policies) | One stack run |
| Risk of missing policies | Fully tested Oracle defaults |
| Time-consuming for large orgs | Repeatable, version-controlled |
| Useful for learning internals | Best for production consistency |
✅ Exam tip: Know that both methods achieve the same result — ORM just automates the
manual tenancy setup.
🎓 What to Remember for Certification
ORM = managed Terraform in OCI.
Data Science Service Template automatically creates:
o One user group (you name it),
o One dynamic group (you name it, with notebook, job, and deployment
rules),
o One policy (with 4 key statements).
You still need to add users to the group manually afterward.
Template available only in the OCI Console (not via SDK or CLI directly).
Alternative: same setup via Terraform from GitHub repo.
Stack workflow: Create → Select Template → Choose Compartment → Run Apply →
Add Users.
🧠 Likely Certification Questions
1. Q: What is Oracle Resource Manager (ORM)?
A: It’s a managed Terraform service in OCI that lets you automate provisioning of
resources through stacks.
2. Q: Which resources does the Data Science Service Template automatically create?
A: A user group, a dynamic group (with matching rules for notebook sessions, model
deployments, and job runs), and a policy with key permissions for metrics, logs, and
data-science-family.
3. Q: What must you still do manually after running the stack?
A: Add users to the created user group.
4. Q: Where is the Data Science Service Template available?
A: Only in the OCI Console (though the Terraform script is public on GitHub).
5. Q: What is the advantage of using ORM instead of manual setup?
A: Automation, consistency, and reduced risk of missing required policies.
6. Q: Where can you find the Terraform version of this configuration?
A: In Oracle’s public GitHub repository (link provided in the course).
🧭 Quick Visual Summary
[Data Science Template in ORM]
↓
Creates user group + dynamic group + policies
↓
You add users to the user group
↓
Tenancy ready for Data Science (metrics, logs, model deploy)
Lesson 4 – Networking for Data Science
Networking enables communication between Data Science resources and other OCI services
or the internet. Key components include:
- VCN (Virtual Cloud Network), Subnets, and VNICs for private connectivity.
- DRG (Dynamic Routing Gateway), NAT Gateway, and Service Gateway for routing.
You can choose between:
1. Default Networking: OCI-managed network with internet and service gateway access.
2. Custom Networking: Your own subnet configuration, required for private data or on-prem access.
🌐 Lesson 5 — Networking for Data Science (Friendly Certification Summary)
This lesson helps you understand how OCI networking supports Data Science
workloads — not in deep networking detail, but just enough to know how your notebook
sessions, job runs, and model deployments connect to data, code, and services.
🧭 Lesson Goal
Understand the networking components and connectivity options that Data Science
workloads use in Oracle Cloud Infrastructure (OCI).
🧱 Part 1: Key OCI Networking Components
These are the building blocks of OCI networking. You don’t need to memorize every router,
but you should know what each one does and how it affects Data Science workloads.
| Component | Description | Key Purpose |
|---|---|---|
| VCN (Virtual Cloud Network) | Your private network in OCI, like a virtual data center. | Holds all your subnets and networking gateways. |
| Subnet | A segment within a VCN; contains compute resources and VNICs. | Defines shared rules like route tables, security lists, DHCP options. |
| VNIC (Virtual Network Interface Card) | Network interface attached to an instance. | Determines how your resource (VM, notebook, etc.) connects internally or externally. |
| DRG (Dynamic Routing Gateway) | Router for private traffic between your VCN and on-premises networks, or between VCNs in different regions. | Enables hybrid or multi-region private connectivity. |
| NAT Gateway | Allows outbound-only internet connections from private subnets. | Lets private resources reach the internet without exposing themselves to incoming traffic. |
| Service Gateway | Enables private network traffic to Oracle services (like Object Storage, Logging, etc.). | Lets private resources interact with OCI services without using public IPs. |
🧩 How They Fit Together
Imagine your VCN as a neighborhood:
VCN = the entire gated community.
Subnets = individual streets.
VNICs = driveways connecting houses (instances) to the street.
Gateways = the neighborhood’s gates or roads to other areas (internet, OCI
services, or corporate networks).
So, your Data Science workloads (notebooks, jobs, model deployments) live inside
these subnets and need proper gateways to access data, libraries, or APIs.
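The containment idea (every subnet's address range lives inside its VCN's range) can be checked with Python's standard ipaddress module. The CIDR blocks below are example values, not required OCI defaults:

```python
import ipaddress

# Subnets carve up the VCN's address space: a subnet's CIDR block must
# fall inside the VCN's CIDR block. Example ranges, chosen arbitrarily.

vcn = ipaddress.ip_network("10.0.0.0/16")     # the whole "gated community"
subnet = ipaddress.ip_network("10.0.1.0/24")  # one "street" inside it

print(subnet.subnet_of(vcn))  # True -- the subnet lives inside the VCN
print(subnet.num_addresses)   # 256 addresses available on this street
```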
🧠 Part 2: Data Science Networking Patterns
Every Data Science workload needs network access — for example, to:
Pull code or libraries (GitHub, PyPI)
Access datasets (Object Storage, databases)
Connect to other workloads (Data Flow, Functions)
Send logs or metrics
OCI offers two networking modes for Data Science resources:
⚙️ Option 1 — Default Networking
Best for: quick setup, public resources, Object Storage, general OCI access.
Your workload gets a secondary VNIC connected to an Oracle-managed subnet.
That subnet already has:
o NAT Gateway → outbound internet access
o Service Gateway → access to OCI services privately
You don’t have to create or configure any VCNs or policies yourself.
✅ Pros:
Fastest and simplest.
Fully managed by OCI.
No extra setup or permissions required.
⚠️Limitations:
Can’t access private enterprise networks (like on-prem or private Git servers).
Limited control over routing and security lists.
⚙️Option 2 — Custom Networking (Bring Your Own Network)
Best for: enterprise setups, private data, or controlled environments.
You select a subnet from your own VCN when creating the workload.
The Data Science service attaches to it via a secondary VNIC.
Connectivity and access depend on how your subnet and gateways are configured.
✅ Pros:
Can access private corporate networks or private resources.
Full control over routing, gateways, and security.
Enables hybrid setups (via DRG or FastConnect).
⚠️Requirements:
You must have proper policies (from the tenancy configuration lesson).
Requires coordination with network administrators.
More setup time.
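When you create a workload with custom networking, you supply your subnet's OCID in the session's configuration details. Below is a hedged sketch of the request payload as a plain dictionary; the field names mirror the OCI Data Science API's `NotebookSessionConfigurationDetails`, and all OCID values are placeholders, not real identifiers:

```python
# Sketch of a create-notebook-session payload using custom networking.
# Field names follow the OCI Data Science REST API; OCIDs are placeholders.
payload = {
    "projectId": "ocid1.datascienceproject.oc1..example",
    "compartmentId": "ocid1.compartment.oc1..example",
    "displayName": "custom-network-notebook",
    "notebookSessionConfigurationDetails": {
        "shape": "VM.Standard.E4.Flex",
        # Supplying a subnetId selects custom networking; omitting it
        # falls back to the Oracle-managed default network.
        "subnetId": "ocid1.subnet.oc1..example",
    },
}

uses_custom_networking = "subnetId" in payload["notebookSessionConfigurationDetails"]
print(uses_custom_networking)  # True
```

The key design point: default vs. custom networking is chosen per workload at creation time, simply by whether a subnet is specified.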
⚡ Part 3: Quick Setup Demo (VCN Wizard)
If you’re setting up networking yourself (not via Resource Manager), you can use the VCN
Wizard:
1. In the OCI Console, go to Networking → Virtual Cloud Networks.
2. Click Start VCN Wizard.
3. Choose Create VCN with Internet Connectivity.
4. Give it a name (e.g., “DS-VCN”).
5. Leave advanced settings as default.
6. Click Next → Create.
7. Wait a moment — the wizard creates:
o VCN
o Subnet
o Internet Gateway
o NAT Gateway
o Route Tables and Security Lists
8. Click View Virtual Cloud Network to see the full setup.
✅ Note:
If you already configured your tenancy using OCI Resource Manager (Lesson 4), you may
not need this step, since the template can create networking for you automatically.
🧩 Part 4: When to Use Which Option

| Scenario | Recommended Networking |
|---|---|
| Accessing public datasets, PyPI, or OCI Object Storage | Default networking |
| Training models that read from a private DB or corporate Git | Custom networking |
| Multi-region or hybrid environment | Custom networking with DRG |
| Quick testing / sandbox | Default networking |
🧠 Exam Tips
Default networking = OCI-managed, fastest, internet + OCI access only.
Custom networking = your own VCN/subnet, for private assets and full control.
VCN Wizard can create a ready-to-use network for testing.
NAT Gateway allows outbound internet, Service Gateway gives private OCI access.
DRG enables private connections to on-premises or other VCNs.
Subnets define shared rules for VNICs (route tables, security lists, DHCP).
Data Science workloads connect via a secondary VNIC — this is how they attach to
your network.
🧭 Visual Summary
+--------------------------+
| Oracle Cloud Network |
+--------------------------+
| VCN |
| ├── Subnet (Public) |
| │ └─ Internet GW |
| ├── Subnet (Private) |
| │ ├─ NAT Gateway |
| │ └─ Service GW |
| └── DRG (optional) |
+--------------------------+
↑ Data Science workloads (Notebook, Job, Model Deployment)
| connect via secondary VNIC to either default or custom subnet
🎓 What You Should Remember
VCN = virtual private network in OCI.
Subnets define how your workloads connect.
VNICs attach workloads to networks.
Gateways control external or private access.
Default networking = easiest start.
Custom networking = for private enterprise access.
VCN Wizard simplifies setup.
Policies may be required for custom networking.
Lesson 5 – Authenticate OCI APIs
Authentication ensures your Data Science resources can securely access other OCI services
via APIs. Two main authentication methods:
1. Resource Principals – Recommended: automatic, secure, no manual key management,
tokens cached 15 minutes.
2. Config File + API Key – Manual: user-based authentication, requires uploading config and
key files.
Resource Principals are ideal for jobs and deployments, while config files suit interactive
notebooks.
🔐 Lesson 6 — Authenticate OCI APIs (Friendly Certification Summary)
This final lesson of Module 2: Introduction and Configuration focuses on how to
authenticate your Data Science workloads and users when they interact with OCI APIs.
Authentication is all about proving who you are to OCI before accessing its services.
🎯 Lesson Goal
Understand how Data Science resources (notebooks, jobs, model deployments)
authenticate with OCI APIs using:
Resource Principals
Configuration files and API keys
…and know when to use each method for your workloads.
🧠 First: Authentication vs Authorization
These two concepts often get mixed up, but you’ll see both in OCI exams.
| Concept | What it means | Example |
|---|---|---|
| Authentication | Proving identity – "Who are you?" | Logging in using your OCI credentials or key. |
| Authorization | Checking permissions – "What are you allowed to do?" | Your IAM policy allows you to manage Object Storage. |
👉 This lesson = Authentication only
(Authorization was covered earlier in Lesson 2 – Tenancy Configuration.)
🧩 Why Authentication Matters for Data Science
Your notebook session, job, or model deployment often needs to:
Read/write data in Object Storage,
Launch jobs in Data Flow,
Interact with APIs of other OCI services.
To do that, the code running inside your Data Science environment must authenticate to
OCI — just like a person logging in.
🔑 Two Main Authentication Methods
1️⃣ Resource Principals (Recommended)
A resource principal lets a resource itself (like your notebook session or job run) act as a
recognized OCI identity.
How it works:
OCI automatically assigns certificates to the resource.
The resource uses these certificates to authenticate securely to OCI.
No need to store or upload API keys.
Tokens are automatically rotated and managed by OCI.
Supported in:
Data Science Notebook Sessions
Job Runs
Model Deployments
✅ Advantages
Most secure (no manual key handling)
Ideal for automated jobs — no user interaction needed
No credential leakage risk
Automatic rotation of tokens
⚠️Considerations
The token (Resource Principal Token, RPT) is cached for 15 minutes.
o If you change IAM policies or dynamic groups, allow up to 15 minutes for the change to take effect.
Works seamlessly with the ADS SDK, OCI SDK, and CLI when you specify resource principal as the authentication method.
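The 15-minute cache explains why an IAM policy change may not be picked up immediately: the old token keeps being reused until its TTL expires. A simplified, illustrative sketch of time-based token caching (not the actual RPT implementation):

```python
import time

class TokenCache:
    """Illustrative time-based token cache, mimicking the ~15-minute
    resource principal token (RPT) cache. Not the real OCI implementation."""

    def __init__(self, fetch_token, ttl_seconds=15 * 60):
        self._fetch_token = fetch_token  # callable that obtains a fresh token
        self._ttl = ttl_seconds
        self._token = None
        self._fetched_at = None

    def get(self):
        now = time.monotonic()
        if self._token is None or now - self._fetched_at >= self._ttl:
            self._token = self._fetch_token()  # refresh only after TTL expires
            self._fetched_at = now
        return self._token

# A policy change only becomes visible once a *new* token is fetched:
counter = {"n": 0}
def fake_fetch():
    counter["n"] += 1
    return f"token-{counter['n']}"

cache = TokenCache(fake_fetch, ttl_seconds=15 * 60)
print(cache.get())  # token-1
print(cache.get())  # token-1  (still cached; a policy change now would not be seen yet)
```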
2️⃣ Config File + API Key (User Principal)
This is the traditional method, using your personal IAM user credentials.
How it works:
You create an OCI configuration file (~/.oci/config) that defines your profile
(tenancy OCID, user OCID, key file, etc.).
You upload or generate a .pem private key.
Your code uses this profile to authenticate as you (your IAM user).
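A typical `~/.oci/config` profile looks like the following (all values below are placeholders, not real OCIDs or fingerprints):

```ini
[DEFAULT]
user=ocid1.user.oc1..exampleuser
fingerprint=20:3b:97:13:55:1c:5b:0d:d3:37:d8:50:4e:c5:3a:34
tenancy=ocid1.tenancy.oc1..exampletenancy
region=us-ashburn-1
key_file=/home/datascience/.oci/oci_api_key.pem
```

Note that these are exactly the fields the exam tips call out: tenancy OCID, user OCID, key fingerprint, private key path, region, and the profile name in brackets.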
✅ When to Use
For interactive notebook sessions where you want to run code as your user.
When resource principals aren’t configured.
⚠️Limitations
You must upload config and key files manually.
Less secure (keys are static and could be exposed).
Not ideal for non-interactive workloads (e.g., job runs).
🧰 Example: Creating Config and Key in a Notebook
If you need to authenticate using the config + key method inside a notebook:
1. Open JupyterLab Launcher.
2. Click Notebook Examples → api_keys notebook.
3. Follow steps to create:
o OCI config file (usually in /home/datascience/.oci/config)
o .pem key file
4. Define your profile inside the config file.
Then you can authenticate using:

```python
import oci

# Load the DEFAULT profile from the uploaded config file
config = oci.config.from_file("~/.oci/config", "DEFAULT")
object_storage_client = oci.object_storage.ObjectStorageClient(config)
```
🧠 Resource Principals in Code
To use Resource Principals in Python with the OCI SDK:

```python
import oci

# Authenticate as the resource itself (notebook session, job run, ...)
signer = oci.auth.signers.get_resource_principals_signer()

# Pass an empty config plus the signer; no API keys are involved
object_storage_client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
```

With the ADS SDK, the equivalent is a single call: `ads.set_auth(auth="resource_principal")`.
✅ This tells your code:
“Authenticate as the resource (the notebook or job), not the human user.”
🧠 Certification Focus — Key Distinctions
| Topic | Resource Principal | Config File + API Key |
|---|---|---|
| Identity Type | OCI resource (notebook, job, etc.) | IAM user |
| Setup Required | None — automatic | Manual upload of config + key |
| Security | High — managed by OCI | Medium — keys can be exposed |
| Token Validity | 15 minutes (auto-refresh) | Static until rotated manually |
| Ideal For | Jobs, model deployments, automation | Interactive notebooks |
| Credential Storage | None | Stored in file system |
| Rotation | Automatic | Manual |
🧭 Quick Recap
OCI APIs require authentication before interaction.
Resource Principals = preferred and secure method.
Config File + API Key = manual, user-based method.
Resource Principal tokens last 15 minutes and refresh automatically.
Use ADS SDK, OCI SDK, or CLI to interact programmatically.
Authorization (permissions) handled separately through IAM policies.
⚡ Exam Tips
1. Definition: Resource principals = IAM identities assigned to OCI resources (like
Data Science jobs).
2. Caching: Resource principal token = valid for 15 minutes.
3. Security: Resource principals are safer because no keys are stored.
4. Fallback: If resource principal isn’t explicitly set, the SDK defaults to using config +
API key.
5. Config files: Must include tenancy OCID, user OCID, key fingerprint, private key
path, region, and profile.
6. Automation: Always prefer resource principals for non-interactive services.
🧩 Visual Summary
+-----------------------------+
| OCI Identity & Access Mgmt |
+-----------------------------+
+--------------------------------------+
| Authentication Methods |
+--------------------------------------+
| 1. Resource Principal (Recommended) |
| • Automatic |
| • Secure (certificates, no keys) |
| • Token cached 15 min |
| • Best for jobs, deployments |
| |
| 2. Config + API Key (Manual) |
| • User-based IAM identity |
| • Requires key + config upload |
| • Best for notebook users |
+--------------------------------------+
🧾 You Should Remember
✅ Resource Principal = default and secure method for workloads.
✅ Config + API key = manual user authentication.
⏱️15-minute cache for resource principal token.
⚙️Used with ADS SDK, OCI Python SDK, or CLI.
🚫 Authentication ≠ Authorization — permissions depend on IAM policies.