Skip to Content
Getting StartedQuickstart

Quickstart

Introduction

Welcome to Sequor! This guide will walk you through setting up Sequor and creating your first data integration flow. You’ll build a GitHub repository health monitor that:

  1. Fetches your GitHub repositories via API
  2. Collects open issues for each repository
  3. Transforms this data to calculate health metrics (resolution rates, issue age, etc.)
  4. Creates a private GitHub Gist with a formatted markdown report

This example demonstrates Sequor’s key capabilities: API integration, data transformation with SQL, and publishing results — all without writing application code.

Let’s get started!

Prerequisites

Before starting, make sure you have:

  • Completed the Installation steps
  • Activated your Sequor virtual environment:
source ~/sequor-venv/bin/activate

Step 1: Initialize an environment

# Create a Sequor environment 'dev' to store credentials and logs sequor env init dev

This creates a file at ~/.sequor/envs/dev.yaml with the necessary structure.

Step 2: Create a new project

Sequor uses a project-based structure to organize your integration flows. To create a new project:

# Navigate to where you want to store your projects cd ~/projects # or any directory of your choice # Initialize a new Sequor project sequor init sequor-github-integration # Navigate to the project directory cd sequor-github-integration
# Create a local reference to your environment to avoid passing it as a CLI arg on every execution echo "env: dev" > env.yaml

Step 4: Store credentials in environment

Edit ~/.sequor/envs/dev.yaml to add your GitHub API credentials:

variables: github_bearer_token: "<GitHub personal access token>"

Replace <GitHub personal access token> with your token.

How to create GitHub token

  • Log in to your GitHub account > Settings [click on your profile picture in the top right corner] > Developer settings > Personal access tokens > Tokens (classic)Generate new token button > Generate new token (classic)
  • Configure your token:
    • Add a note: Sequor
    • Set an expiration date: choose based on your needs
    • Select the required scopes/permissions:
      • repo (or public_repo for access only to the public ones) - used in this demo to fetch repos and issues
      • gist - used in this demo to create a gist with repo health report
  • Click Generate token button (at the bottom of the page) > copy and save your token in ~/.sequor/envs/dev.yaml as shown above.

Step 5: Create HTTP source to access GitHub

In the project directory, create a GitHub source in ./sources/github.yaml with the following content:

github.yaml
type: http variables: api_version: "2022-11-28" auth: type: bearer_token token: "{{ var('github_bearer_token') }}"

Notice how we use jinja expression to get the value of the github_bearer_token environment variable.

Step 6: Create DuckDB source to use as stage database

Sequor comes with built-in connector to DuckDB, an embedded database with full SQL support designed for analytics. DuckDB will be used as a stage database in this guide.

Create a DuckDB source in ./sources/duckdb.yaml:

duckdb.yaml
type: duckdb conn_str: "duckdb:///~/sequor-stage.duckdb"

Step 6: Create flow

Create a flow definition in ./flows/github_repo_health.yaml:

github_repo_health.yaml
steps: - op: http_request id: get_repos request: source: "github" url: "https://api.github.com/user/repos" method: GET headers: "Accept": "application/vnd.github+json" "X-GitHub-Api-Version": "{{ var('api_version') }}" response: parser_expression: | def evaluate(context, response): if response.status_code == 200: repos = response.json() # array of objects representing repos for repo in repos: # Extract the owner username repo["owner_username"] = repo["owner"]["login"] return { "tables": [{ "source": "duckdb", "table": "github_repos", "data": repos, "model": { "columns": [ {"name": "id", "type": "text"}, {"name": "name", "type": "text"}, {"name": "full_name", "type": "text"}, {"name": "owner_username", "type": "text"}, {"name": "open_issues_count", "type": "text"}, {"name": "forks_count", "type": "text"}, {"name": "watchers_count", "type": "text"} ] } }] } else: raise Exception("Error response code: " + str(response.status_code) + "; body: " + response.text) - op: http_request id: get_issues for_each: source: "duckdb" table: "github_repos" as: repo request: source: "github" url_expression: | def evaluate(context): return f"https://api.github.com/repos/{ context.var('repo').get('owner_username') }/{ context.var('repo').get('name') }/issues" method: GET headers: "Accept": "application/vnd.github+json" "X-GitHub-Api-Version": "{{ var('api_version') }}" response: parser_expression: | def evaluate(context, response): if response.status_code == 200: issues = response.json() for issue in issues: issue["repo_id"] = context.var("repo").get("id") return { "tables": [{ "source": "duckdb", "table": "github_issues", "data": issues, "model": { "columns": [ {"name": "repo_id", "type": "text"}, {"name": "id", "type": "text"}, {"name": "title", "type": "text"}, {"name": "state", "type": "text"}, {"name": "created_at", "type": "text"}, {"name": "updated_at", "type": "text"} ] } }] } else: raise Exception("Error response code: " + str(response.status_code) + "; body: " + response.text) - op: transform id: create_health_report source: duckdb query: | SELECT r.name AS repository_name, r.full_name AS full_repository_name, COUNT(CASE WHEN i.state = 'open' THEN 1 END) AS actual_open_issues, COUNT(i.id) AS total_issues, CASE WHEN COUNT(i.id) > 0 THEN ROUND((COUNT(CASE WHEN i.state = 'closed' THEN 1 END) * 100.0 / COUNT(i.id))::numeric, 2) ELSE 0 END AS issue_resolution_rate, ROUND(AVG(CASE WHEN i.state = 'open' THEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 END)::numeric, 1) AS avg_age_open_issues_days, COUNT(CASE WHEN i.state = 'open' AND EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 > 30 THEN 1 END) AS stale_issues_count, CASE WHEN COUNT(CASE WHEN i.state = 'open' THEN 1 END) > 10 AND COUNT(CASE WHEN i.state = 'open' AND EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 > 30 THEN 1 END) > 5 THEN 'Needs attention' WHEN COUNT(CASE WHEN i.state = 'open' THEN 1 END) > 0 AND AVG(CASE WHEN i.state = 'open' THEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 END) > 60 THEN 'Stale' WHEN COUNT(i.id) = 0 THEN 'No activity' WHEN COUNT(i.id) > 0 AND COUNT(CASE WHEN i.state = 'open' THEN 1 END) < 5 THEN 'Healthy' ELSE 'Moderate' END AS health_status FROM github_repos r LEFT JOIN github_issues i ON r.id = i.repo_id GROUP BY r.id, r.name, r.full_name, r.open_issues_count, r.forks_count, r.watchers_count; target_table: "github_repo_health_report" - op: http_request id: create_gist request: source: "github" url: "https://api.github.com/gists" method: POST headers: "Accept": "application/vnd.github+json" "X-GitHub-Api-Version": "{{ var('api_version') }}" body_format: json body_expression: | def evaluate(context): # Get the health report data report_data = context.sources.query("duckdb", """ SELECT * FROM github_repo_health_report """) # Format the data as markdown table markdown = "# GitHub Repository Health Report\n\n" markdown += "| Repository | Open Issues | Total Issues | Resolution Rate (%) | Avg Age (days) | Stale Issues | Health Status |\n" markdown += "|------------|-------------|--------------|---------------------|----------------|--------------|---------------|\n" for row in report_data: markdown += f"| {row['repository_name']} | {row['actual_open_issues']} | {row['total_issues']} | {row['issue_resolution_rate']} | {row['avg_age_open_issues_days']} | {row['stale_issues_count']} | {row['health_status']} |\n" # Create gist payload return { "description": "GitHub Repository Health Report", "public": False, "files": { "repo_health_report.md": { "content": markdown } } } response: parser_expression: | def evaluate(context, response): if response.status_code == 201: gist_data = response.json() return { "variables": { "gist_url": gist_data["html_url"] }, "tables": [{ "source": "duckdb", "table": "github_gist_created", "data": [gist_data], "model": { "columns": [ {"name": "id", "type": "text"}, {"name": "html_url", "type": "text"}, {"name": "created_at", "type": "text"} ] } }] } else: raise Exception("Error creating gist: " + str(response.status_code) + "; body: " + response.text) - op: print id: print_gist_url message: "Gist URL created: {{ var('gist_url') }}"

Step 7: Understanding the Flow

Before running the flow, let’s examine its key components:

HTTP Operations with Response Parsing

The http_request: get_repos operation shows how Sequor handles API request and responses:

  • Fetches repositories via GitHub’s REST API
  • Uses a parser to transform JSON responses into relational tables
  • Defines a schema that serves dual purposes: creating database tables and guiding how JSON fields map to table columns
  • Extracts owner_username from the nested owner -> login object, demonstrating how to flatten nested data to match your schema

Record Iteration with Dynamic Parameters

The http_request: get_issues operation demonstrates Sequor’s iteration capabilities:

  • Processes each record from the github_repos table using for_each
  • Dynamically constructs URLs with Python expressions
  • Access the current record data via context.var('repo').get('name') to build repository-specific endpoints
  • Enriches issues by adding repo_id to establish relationships between issues and repositories

SQL Transformation

The transform: create_health_report operation showcases SQL-powered tranformation and analytics:

  • Joins repositories and issues tables
  • Calculates metrics like resolution rates and issue age
  • Produces an analysis report with standardized health status indicators

Downstream Publishing

The http_request: create_gist operation demonstrates how to publish results:

  • Uses inline SQL to query the report table
  • Transforms data into a markdown table
  • Sends the formatted report to GitHub’s Gist API
  • Stores the URL of the created gist as a Sequor variable to print the URL for convenient access to your report

Step 8: Run your flow

sequor run github_repo_health

That’s it! Your flow will execute, fetching repositories, analyzing issues, and creating a health report. When the flow completes, you’ll see the direct URL to your newly created report in the output:

2025-04-17 14:00:33,764 INFO [sequor.ops.print]: Message: Gist URL created: https://gist.github.com/<your_user_name>/<id_of_created_gist>

You can also find the created gist among your other gists at https://gist.github.com/<your_user_name>

💡

Pro tip: These debug CLI commands will be invaluable when developing Sequor flows:

  • Run a specific operation: sequor run github_repo_health --op-id "get_repos"
  • Preview HTTP responses: sequor run github_repo_health --op-id "get_repos" --debug-httprequest-preview-trace. This runs only the HTTP request part of http_request and shows the response — perfect for understanding the response format before writing your parsing logic.

Next steps

Nice work! You’ve just built and run your first Sequor flow.

Start integrating your systems by exploring our prebuilt integrations for Salesforce, Mailchimp, BigCommerce, and more at https://github.com/paloaltodatabases/sequor-integrations.

Last updated on