Quickstart

Introduction

Welcome to Sequor! This guide will walk you through setting up Sequor and creating your first data integration flow. You’ll build a GitHub repository health monitor that:

Fetches your GitHub repositories via API
Collects open issues for each repository
Transforms this data to calculate health metrics (resolution rates, issue age, etc.)
Creates a private GitHub Gist with a formatted markdown report

This example demonstrates Sequor’s key capabilities: API integration, data transformation with SQL, and publishing results — all without writing application code.

Let’s get started!

Prerequisites

Before starting, make sure you have:

Completed the Installation steps
Activated your Sequor virtual environment:


source ~/sequor-venv/bin/activate

Step 1: Initialize an environment


# Create a Sequor environment 'dev' to store credentials and logs
sequor env init dev

This creates a file at ~/.sequor/envs/dev.yaml with the necessary structure.

Step 2: Create a new project

Sequor uses a project-based structure to organize your integration flows. To create a new project:


# Navigate to where you want to store your projects
cd ~/projects  # or any directory of your choice
 
# Initialize a new Sequor project
sequor init sequor-github-integration
 
# Navigate to the project directory
cd sequor-github-integration

Step 3: Link project to environment


# Create a local reference to your environment to avoid passing it as a CLI arg on every execution
echo "env: dev" > env.yaml

Step 4: Store credentials in environment

Edit ~/.sequor/envs/dev.yaml to add your GitHub API credentials:


variables:
  github_bearer_token: "<GitHub personal access token>"

Replace <GitHub personal access token> with your token.

How to create GitHub token

Log in to your GitHub account > Settings [click on your profile picture in the top right corner] > Developer settings > Personal access tokens > Tokens (classic) → Generate new token button > Generate new token (classic)
Configure your token:
- Add a note: Sequor
- Set an expiration date: choose based on your needs
- Select the required scopes/permissions:
  - repo (or public_repo for access only to the public ones) - used in this demo to fetch repos and issues
  - gist - used in this demo to create a gist with repo health report
Click Generate token button (at the bottom of the page) > copy and save your token in ~/.sequor/envs/dev.yaml as shown above.

Step 5: Create HTTP source to access GitHub

In the project directory, create a GitHub source in ./sources/github.yaml with the following content:

github.yaml


type: http
variables:
    api_version: "2022-11-28"
auth: 
    type: bearer_token
    token: "{{ var('github_bearer_token') }}"

Notice how we use jinja expression to get the value of the github_bearer_token environment variable.

Step 6: Create DuckDB source to use as stage database

Sequor comes with built-in connector to DuckDB, an embedded database with full SQL support designed for analytics. DuckDB will be used as a stage database in this guide.

Create a DuckDB source in ./sources/duckdb.yaml:

duckdb.yaml


type: duckdb
conn_str: "duckdb:///~/sequor-stage.duckdb"

Step 6: Create flow

Create a flow definition in ./flows/github_repo_health.yaml:

github_repo_health.yaml


steps:
  - op: http_request
    id: get_repos
    request:
      source: "github"
      url: "https://api.github.com/user/repos"
      method: GET
      headers:
        "Accept": "application/vnd.github+json"
        "X-GitHub-Api-Version": "{{ var('api_version') }}"
    response:
      success_status: [200]
      tables: 
        - source: "duckdb"
          table: "github_repos"
          columns: {
            "id": "text", "name": "text", "full_name": "text", "owner_username": "text", 
            "open_issues_count": "text", "forks_count": "text", "watchers_count": "text"
          }
          data_expression: |
            repos = response.json()  # array of objects representing repos
            for repo in repos:
              # Extract the owner username
              repo["owner_username"] = repo["owner"]["login"]
            return repos
 
  - op: http_request
    id: get_issues
    for_each:
      source: "duckdb"
      table: "github_repos"
      as: repo
    request:
      source: "github"
      url_expression: f"https://api.github.com/repos/{ var('repo').get('owner_username') }/{ var('repo').get('name') }/issues"
      method: GET
      headers:
        "Accept": "application/vnd.github+json"
        "X-GitHub-Api-Version": "{{ var('api_version') }}"
    response:
      success_status: [200]
      tables: 
        - source: "duckdb"
          table: "github_issues"
          columns: {"repo_id": "text", "id": "text", "title": "text", "state": "text", "created_at": "text", "updated_at": "text"}
          data_expression: | 
            issues = response.json()
            for issue in issues:
              issue["repo_id"] = var("repo").get("id")
            return issues 
   
  - op: transform
    id: create_health_report
    source: duckdb
    query: |
      SELECT
        r.name AS repository_name,
        r.full_name AS full_repository_name,
        COUNT(CASE WHEN i.state = 'open' THEN 1 END) AS actual_open_issues,
        COUNT(i.id) AS total_issues,
        CASE WHEN COUNT(i.id) > 0 THEN 
          ROUND((COUNT(CASE WHEN i.state = 'closed' THEN 1 END) * 100.0 / COUNT(i.id))::numeric, 2)
          ELSE 0 END AS issue_resolution_rate,
        ROUND(AVG(CASE WHEN i.state = 'open' THEN 
          EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 END)::numeric, 1) AS avg_age_open_issues_days,
        COUNT(CASE WHEN i.state = 'open' AND 
          EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 > 30 THEN 1 END) AS stale_issues_count,
        CASE
          WHEN COUNT(CASE WHEN i.state = 'open' THEN 1 END) > 10 AND 
              COUNT(CASE WHEN i.state = 'open' AND EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 > 30 THEN 1 END) > 5 
              THEN 'Needs attention'
          WHEN COUNT(CASE WHEN i.state = 'open' THEN 1 END) > 0 AND 
              AVG(CASE WHEN i.state = 'open' THEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - i.created_at::timestamp))/86400 END) > 60 
              THEN 'Stale'
          WHEN COUNT(i.id) = 0 THEN 'No activity'
          WHEN COUNT(i.id) > 0 AND COUNT(CASE WHEN i.state = 'open' THEN 1 END) < 5 THEN 'Healthy'
          ELSE 'Moderate'
        END AS health_status
      FROM github_repos r
      LEFT JOIN github_issues i ON r.id = i.repo_id
      GROUP BY r.id, r.name, r.full_name, r.open_issues_count, r.forks_count, r.watchers_count;
    target_table: "github_repo_health_report"
  
  - op: http_request
    id: create_gist
    request:
      source: "github"
      url: "https://api.github.com/gists"
      method: POST
      headers:
        "Accept": "application/vnd.github+json"
        "X-GitHub-Api-Version": "{{ var('api_version') }}"
      body_format: json
      body_expression: |
        # Get the health report data
        report_data = query("duckdb", """
          SELECT * FROM github_repo_health_report
        """)
        
        # Format the data as markdown table
        markdown = "# GitHub Repository Health Report\n\n"
        markdown += "| Repository | Open Issues | Total Issues | Resolution Rate (%) | Avg Age (days) | Stale Issues | Health Status |\n"
        markdown += "|------------|-------------|--------------|---------------------|----------------|--------------|---------------|\n"
        
        for row in report_data:
          markdown += f"| {row['repository_name']} | {row['actual_open_issues']} | {row['total_issues']} | {row['issue_resolution_rate']} | {row['avg_age_open_issues_days']} | {row['stale_issues_count']} | {row['health_status']} |\n"
        
        # Create gist payload
        return {
          "description": "GitHub Repository Health Report",
          "public": False,
          "files": {
            "repo_health_report.md": {
              "content": markdown
            }
          }
        }
    response:
      success_status: [201]
      tables: 
        - source: "duckdb"
          table: "github_gist_created"
          columns: {"id": "text", "html_url": "text", "created_at": "text"}
          data_expression: return [response.json()]
      variables:
        gist_url_expression: response.json()["html_url"]
 
  - op: print
    id: print_gist_url
    message: "Gist URL created: {{ var('gist_url') }}"

Step 7: Understanding the Flow

Before running the flow, let’s examine its key components:

HTTP Operations with Response Parsing

The http_request: get_repos operation shows how Sequor handles API request and responses:

Fetches repositories via GitHub’s REST API
Uses a parser to transform JSON responses into relational tables
Defines a schema that serves dual purposes: creating database tables and guiding how JSON fields map to table columns
Extracts owner_username from the nested owner -> login object, demonstrating how to flatten nested data to match your schema

Record Iteration with Dynamic Parameters

The http_request: get_issues operation demonstrates Sequor’s iteration capabilities:

Processes each record from the github_repos table using for_each
Dynamically constructs URLs with Python expressions
Access the current record data via var('repo').get('name') to build repository-specific endpoints
Enriches issues by adding repo_id to establish relationships between issues and repositories

SQL Transformation

The transform: create_health_report operation showcases SQL-powered tranformation and analytics:

Joins repositories and issues tables
Calculates metrics like resolution rates and issue age
Produces an analysis report with standardized health status indicators

Downstream Publishing

The http_request: create_gist operation demonstrates how to publish results:

Uses inline SQL to query the report table
Transforms data into a markdown table
Sends the formatted report to GitHub’s Gist API
Stores the URL of the created gist as a Sequor variable to print the URL for convenient access to your report

Step 8: Run your flow


sequor run github_repo_health

That’s it! Your flow will execute, fetching repositories, analyzing issues, and creating a health report. When the flow completes, you’ll see the direct URL to your newly created report in the output:


2025-04-17 14:00:33,764 INFO [sequor.ops.print]: Message: Gist URL created: https://gist.github.com/<your_user_name>/<id_of_created_gist>

You can also find the created gist among your other gists at https://gist.github.com/<your_user_name>

💡

Pro tip: These debug CLI commands will be invaluable when developing Sequor flows:

Run a specific operation: sequor run github_repo_health --op-id "get_repos"
Preview HTTP responses: sequor run github_repo_health --op-id "get_repos" --debug-httprequest-preview-trace. This runs only the HTTP request part of http_request and shows the response — perfect for understanding the response format before writing your parsing logic.

Next steps

Nice work! You’ve just built and run your first Sequor flow.

Start integrating your systems by exploring our prebuilt integrations for Salesforce, Mailchimp, BigCommerce, and more at https://github.com/paloaltodatabases/sequor-integrations .