Quickstart
Introduction
Welcome to Sequor! This guide will walk you through setting up Sequor and creating your first data integration flow. You’ll build a GitHub repository health monitor that:
- Fetches your GitHub repositories via API
- Collects open issues for each repository
- Transforms this data to calculate health metrics (resolution rates, issue age, etc.)
- Creates a private GitHub Gist with a formatted markdown report
This example demonstrates Sequor’s key capabilities: API integration, data transformation with SQL, and publishing results — all without writing application code.
Let’s get started!
Prerequisites
Before starting, make sure you have:
- Completed the Installation steps
- Activated your Sequor virtual environment:
source ~/sequor-venv/bin/activate
Step 1: Initialize an environment
# Create a Sequor environment 'dev' to store credentials and logs
sequor env init dev
This creates a file at ~/.sequor/envs/dev.yaml
with the necessary structure.
Step 2: Create a new project
Sequor uses a project-based structure to organize your integration flows. To create a new project:
# Navigate to where you want to store your projects
cd ~/projects # or any directory of your choice
# Initialize a new Sequor project
sequor init sequor-github-integration
# Navigate to the project directory
cd sequor-github-integration
Step 3: Link project to environment
# Create a local reference to your environment to avoid passing it as a CLI arg on every execution
echo "env: dev" > env.yaml
Step 4: Store credentials in environment
Edit ~/.sequor/envs/dev.yaml
to add your GitHub API credentials:
variables:
github_bearer_token: "<GitHub personal access token>"
Replace <GitHub personal access token>
with your token.
How to create GitHub token
- Log in to your GitHub account > Settings [click on your profile picture in the top right corner] > Developer settings > Personal access tokens > Tokens (classic) → Generate new token button > Generate new token (classic)
- Configure your token:
- Add a note: Sequor
- Set an expiration date: choose based on your needs
- Select the required scopes/permissions:
repo
(orpublic_repo
for access only to the public ones) - used in this demo to fetch repos and issuesgist
- used in this demo to create a gist with repo health report
- Click Generate token button (at the bottom of the page) > copy and save your token in
~/.sequor/envs/dev.yaml
as shown above.
Step 5: Create HTTP source to access GitHub
In the project directory, create a GitHub source in ./sources/github.yaml
with the following content:
type: http
variables:
api_version: "2022-11-28"
auth:
type: bearer_token
token: "{{ var('github_bearer_token') }}"
Notice how we use jinja expression to get the value of the github_bearer_token
environment variable.
Step 6: Create DuckDB source to use as stage database
Sequor comes with built-in connector to DuckDB, an embedded database with full SQL support designed for analytics. DuckDB will be used as a stage database in this guide.
Create a DuckDB source in ./sources/duckdb.yaml
:
type: duckdb
conn_str: "duckdb:///~/sequor-stage.duckdb"
Step 6: Create flow
Create a flow definition in ./flows/github_repo_health.yaml
:
steps:
- op: http_request
id: get_repos
request:
source: "github"
url: "https://api.github.com/user/repos"
method: GET
headers:
"Accept": "application/vnd.github+json"
"X-GitHub-Api-Version": "{{ var('api_version') }}"
response:
parser_expression: |
def evaluate(context, response):
if response.status_code == 200:
repos = response.json() # array of objects representing repos
for repo in repos:
# Extract the owner username
repo["owner_username"] = repo["owner"]["login"]
return {
"tables": [{
"source": "duckdb",
"table": "github_repos",
"data": repos,
"model": {
"columns": [
{"name": "id", "type": "text"},
{"name": "name", "type": "text"},
{"name": "full_name", "type": "text"},
{"name": "owner_username", "type": "text"},
{"name": "open_issues_count", "type": "text"},
{"name": "forks_count", "type": "text"},
{"name": "watchers_count", "type": "text"}
]
}
}]
}
else:
raise Exception("Error response code: " + str(response.status_code) + "; body: " + response.text)
- op: http_request
id: get_issues
for_each:
source: "duckdb"
table: "github_repos"
as: repo
request:
source: "github"
url_expression: |
def evaluate(context):
return f"https://api.github.com/repos/{ context.var('repo').get('owner_username') }/{ context.var('repo').get('name') }/issues"
method: GET
headers:
"Accept": "application/vnd.github+json"
"X-GitHub-Api-Version": "{{ var('api_version') }}"
response:
parser_expression: |
def evaluate(context, response):
if response.status_code == 200:
issues = response.json()
for issue in issues:
issue["repo_id"] = context.var("repo").get("id")
return {
"tables": [{
"source": "duckdb",
"table": "github_issues",
"data": issues,
"model": {
"columns": [
{"name": "repo_id", "type": "text"},
{"name": "id", "type": "text"},
{"name": "title", "type": "text"},
{"name": "state", "type": "text"},
{"name": "created_at", "type": "text"},
{"name": "updated_at", "type": "text"}
]
}
}]
}
else:
raise Exception("Error response code: " + str(response.status_code) + "; body: " + response.text)
- op: transform
id: create_health_report
source: duckdb
query: |
SELECT
r.name AS repository_name,
r.full_name AS full_repository_name,
COUNT(CASE WHEN i.state = 'open' THEN 1 END) AS actual_open_issues,
COUNT(i.id) AS total_issues,
CASE WHEN COUNT(i.id) > 0 THEN
ROUND((COUNT(CASE WHEN i.state = 'closed' THEN 1 END) * 100.0 / COUNT(i.id))::numeric, 2)
ELSE 0 END AS issue_resolution_rate,
ROUND(AVG(CASE WHEN i.state = 'open' THEN
EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 END)::numeric, 1) AS avg_age_open_issues_days,
COUNT(CASE WHEN i.state = 'open' AND
EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 > 30 THEN 1 END) AS stale_issues_count,
CASE
WHEN COUNT(CASE WHEN i.state = 'open' THEN 1 END) > 10 AND
COUNT(CASE WHEN i.state = 'open' AND EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 > 30 THEN 1 END) > 5
THEN 'Needs attention'
WHEN COUNT(CASE WHEN i.state = 'open' THEN 1 END) > 0 AND
AVG(CASE WHEN i.state = 'open' THEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 END) > 60
THEN 'Stale'
WHEN COUNT(i.id) = 0 THEN 'No activity'
WHEN COUNT(i.id) > 0 AND COUNT(CASE WHEN i.state = 'open' THEN 1 END) < 5 THEN 'Healthy'
ELSE 'Moderate'
END AS health_status
FROM github_repos r
LEFT JOIN github_issues i ON r.id = i.repo_id
GROUP BY r.id, r.name, r.full_name, r.open_issues_count, r.forks_count, r.watchers_count;
target_table: "github_repo_health_report"
- op: http_request
id: create_gist
request:
source: "github"
url: "https://api.github.com/gists"
method: POST
headers:
"Accept": "application/vnd.github+json"
"X-GitHub-Api-Version": "{{ var('api_version') }}"
body_format: json
body_expression: |
def evaluate(context):
# Get the health report data
report_data = context.sources.query("duckdb", """
SELECT * FROM github_repo_health_report
""")
# Format the data as markdown table
markdown = "# GitHub Repository Health Report\n\n"
markdown += "| Repository | Open Issues | Total Issues | Resolution Rate (%) | Avg Age (days) | Stale Issues | Health Status |\n"
markdown += "|------------|-------------|--------------|---------------------|----------------|--------------|---------------|\n"
for row in report_data:
markdown += f"| {row['repository_name']} | {row['actual_open_issues']} | {row['total_issues']} | {row['issue_resolution_rate']} | {row['avg_age_open_issues_days']} | {row['stale_issues_count']} | {row['health_status']} |\n"
# Create gist payload
return {
"description": "GitHub Repository Health Report",
"public": False,
"files": {
"repo_health_report.md": {
"content": markdown
}
}
}
response:
parser_expression: |
def evaluate(context, response):
if response.status_code == 201:
gist_data = response.json()
return {
"variables": {
"gist_url": gist_data["html_url"]
},
"tables": [{
"source": "duckdb",
"table": "github_gist_created",
"data": [gist_data],
"model": {
"columns": [
{"name": "id", "type": "text"},
{"name": "html_url", "type": "text"},
{"name": "created_at", "type": "text"}
]
}
}]
}
else:
raise Exception("Error creating gist: " + str(response.status_code) + "; body: " + response.text)
- op: print
id: print_gist_url
message: "Gist URL created: {{ var('gist_url') }}"
Step 7: Understanding the Flow
Before running the flow, let’s examine its key components:
HTTP Operations with Response Parsing
The http_request: get_repos
operation shows how Sequor handles API request and responses:
- Fetches repositories via GitHub’s REST API
- Uses a parser to transform JSON responses into relational tables
- Defines a schema that serves dual purposes: creating database tables and guiding how JSON fields map to table columns
- Extracts
owner_username
from the nestedowner -> login
object, demonstrating how to flatten nested data to match your schema
Record Iteration with Dynamic Parameters
The http_request: get_issues
operation demonstrates Sequor’s iteration capabilities:
- Processes each record from the github_repos table using for_each
- Dynamically constructs URLs with Python expressions
- Access the current record data via
context.var('repo').get('name')
to build repository-specific endpoints - Enriches issues by adding
repo_id
to establish relationships between issues and repositories
SQL Transformation
The transform: create_health_report
operation showcases SQL-powered tranformation and analytics:
- Joins repositories and issues tables
- Calculates metrics like resolution rates and issue age
- Produces an analysis report with standardized health status indicators
Downstream Publishing
The http_request: create_gist
operation demonstrates how to publish results:
- Uses inline SQL to query the report table
- Transforms data into a markdown table
- Sends the formatted report to GitHub’s Gist API
- Stores the URL of the created gist as a Sequor variable to print the URL for convenient access to your report
Step 8: Run your flow
sequor run github_repo_health
That’s it! Your flow will execute, fetching repositories, analyzing issues, and creating a health report. When the flow completes, you’ll see the direct URL to your newly created report in the output:
2025-04-17 14:00:33,764 INFO [sequor.ops.print]: Message: Gist URL created: https://gist.github.com/<your_user_name>/<id_of_created_gist>
You can also find the created gist among your other gists at https://gist.github.com/<your_user_name>
Pro tip: These debug CLI commands will be invaluable when developing Sequor flows:
- Run a specific operation:
sequor run github_repo_health --op-id "get_repos"
- Preview HTTP responses:
sequor run github_repo_health --op-id "get_repos" --debug-httprequest-preview-trace
. This runs only the HTTP request part ofhttp_request
and shows the response — perfect for understanding the response format before writing your parsing logic.
Next steps
Nice work! You’ve just built and run your first Sequor flow.
Start integrating your systems by exploring our prebuilt integrations for Salesforce, Mailchimp, BigCommerce, and more at https://github.com/paloaltodatabases/sequor-integrations .