Getting started with Databox Connector Builder

The Connector Builder gives users the ability to quickly create a connection between Databox and any data source.

With the Connector Builder, you’ll be able to:

  • Connect to a data source via a defined authentication protocol
  • Extract data from a data source using defined API calls
  • Transform data to metrics
  • Load data to Databox

In this quick tutorial, we'll create a basic GitHub connector that will authorize with GitHub using Oauth2 protocol, define an API endpoint, make an API call, transform the data and, finally, push the data to Databox. Databox Designer can then be used to visualize the data.

Here’s what you need to do.

  1. Login to the Builder environment
  2. Write a Manifest file
  3. Connect the data source
  4. Run tasks and check the returned data
  5. Transform and push the data
  6. Visualize and verify the data
  7. Submit for review

1. Login

Once you've received your login data and logged in to Webapp, simply click the link Build connectors next to your user name. This will take you to the Builder environment.

logo

2. Manifest File

First, you need to define the manifest file. The manifest file is a simple YAML format file that consists of several sections. These sections give you the ability to describe the tasks you want to achieve in a simple, logical way.

Think of the manifest file as the blueprint of a connector. It defines all components that the connector needs to recognize in order to communicate with a 3rd party API.

The sections are:

  • Info: basic connector info and metrics are defined here
  • Auth: API authentication type and any data needed to authenticate
  • API: defines all of the API calls that will be made
  • UserSelect: optional, for additional choices upon connect
  • Tasks: runs API calls and extracts desired data
  • Transformations: PHP functions that will manipulate data from tasks
  • Batches: specifies which tasks will run and when it will happen

Actual PHP code, if needed at all, will be written only in the Transformations section.

Now, let’s explore what’s necessary to build a usable connector so that you can achieve your goal of creating some functional databoards.

Info section

Info:
  name: GitHub
  author: Databox Demo
  description: "Online project hosting using Git. Includes source-code browser, in-line editing, wikis, and ticketing."
  avatar: https://cdn1.databox.com/images/apps/s/GitHub.png
  avatarBig: https://cdn1.databox.com/images/apps/m/GitHub.png
  timezone: UTC
  popup:
    width: 1090
    height: 750
  category:
    - IT & Engineering
  profile: repo.name
  metrics:
    - key: contributors
      name: Contributors
      dataType: OverallTotal
      intervals: [AllTime,Today,Yesterday,Last7Days,Last30Days,MTD,LastMonth]
    - key: branches
      name: Branches
      dataType: OverallTotal
      intervals: [AllTime,Today,Yesterday,Last7Days,Last30Days,MTD,LastMonth]

First, we must give our connector a unique name. Enter a description, along with links to two connector avatar pictures (must differ in size), set timezone as UTC, set the popup size in pixels, select the category and define metrics. Make sure you choose an avatar picture that best describes this connector.

Please refer to Manifest for all possible field types and details on each field.

Auth section

Auth:
  provider: oauth2
  parameters:
    client_id: '573d6XXXXXXXXXXXXb6a9'
    client_secret: '691d2XXXXXXXXXXXXXXXXXXXXXe7a74'
    response_type: code
    grant_type: 'authorization_code'
    authorization_endpoint: 'https://github.com/login/oauth/authorize'
    access_token_endpoint: 'https://github.com/login/oauth/access_token'
    scope: repo

This is where we define the provider type. GitHub uses OAuth2, so we'll use just that for this example. Parameters are defined according to GitHub API docs, and our Auth section docs docs.

Visit your GitHub OAuth settings page and choose “Register a new application”

logo

Populate all fields, like this:

logo

Once the form is submitted, you will be presented with Client ID and Client Secret values. These are needed for the OAuth authentication. Enter them in the Auth section as seen above.

We’ll also visit the GitHub OAuth docs to familiarize ourselves with mandatory fields for OAuth2 and API calls, as well as what the values should be.

API section

In the API section, we define all calls that get sent to the API, along with parameters and the http method type.

API:
  baseUrl: https://api.github.com/
  operations:
    Default:
      parameters:
        page:
          type: string
          location: query
          default: 1
        per_page:
          type: string
          location: query
          default: 100
    GetUserRepos:
      extends: Default
      httpMethod: GET
      uri: user/repos
    GetContributors:
      httpMethod: GET
      uri: repos/{repo.name}/contributors

GitHub API URL is entered as baseUrl. Operations are as follows:

Default parameters page and per_page are defined. These are parameters that will be set for every request that uses them with extends keyword.

GetUserRepos is the arbitrary name we gave to the GET API call user/repos. It's good practice to keep it close to the actual API call name. It uses the Default parameters via extends keyword. URI user/repos will be called.

GetContributors is another API call. URI will use the parameter {repo.name} which is set during the initial connect and will be substituted by chosen repo name.

UserSelect section

UserSelect:
    - id: repo
      description: Select repository.
      method: Tasks.GetUserRepos
      transform: '[].{id:id name:full_name}'

Optional section. This is used when an additional action is necessary upon connect. For GitHub, this is the repository selection. Upon initial selection, a window entitled description is displayed, followed by the execution of the GetUserRepos task, which returns all the user's repos. The user must then select the repo he/she wishes to use.

Jmespath transform is used on reply to filter just the data we need. More on that shortly.

The chosen repo is then stored in the database and used in all subsequent calls that need a specific repo.

Tasks section

Tasks:
  - name: GetUserRepos
    operation: GetUserRepos
    parameters:
      page: 1
    paginate:
      property: page
      max_pages: 20
      array_merge: true
  - name: GetContributors
    operation: GetContributors
    transform: '[].{id:id login:login}'
  - name: GetBranches
    operation: GetBranches
    transform: 'length([])'

When tasks are defined, each task is tied to arbitrary API call via operation keyword. An API call is run and the JSON is returned. Additional parameters may also be supplied, and pagination may be added to the Task.

Task usually returns larger amounts of data than we need. That's why data can be filtered and cleaned up before use. Jmespath format is used for this purpose. Its syntax may be used to transform values. For easier development and debugging later on, it’s good practice to filter out all the data we don't need

Transformations section

Transformations:
  - name: GetContributors
    from: GetContributors
    use: getContributors
  - name: GetBranches
    from: GetBranches
    use:
      - metrickey: branches

To begin with, transformation is named via the name field. The Task is then linked to it with the from directive, and finally, the PHP function that will transform the data is named with the use keyword. In our example, getContributors will be the function.

The next transformation named GetBranches is even simpler because it doesn't need any PHP code. Output of the task GetBranches is simply pushed as a metric key named branches. Since the output of GetBranches is just a number, this shortcut can be used to omit any PHP code.

Basically, we can do anything with data that the native API didn't provide, or we can push the data that we get. Data is then subsequently pushed to Databox. More on this later.

Batches section

Batch:
  - id: default
    runs:
      - use: GetContributors
      - use: GetBranches
  - id: contributors
    runs:
      - use: GetContributors
  - id: branches
    runs:
      - use: GetBranches

Batches define which tasks/transformations will run on initial connect, daily (id: default) and hourly (id: metrickeyname). In this example, default batch starts upon initial connect and runs two tasks/transformations. After that, two hourly batches are defined. Their id is set to the metric keys used.

Based on the id set, batches are run as following:

  • history -> runs only when the user connects for the first time
  • default -> runs when the user connects, plus once daily. Run time depends on the timezone selected (at 4 AM)
  • nightly -> runs 1 minute after midnight, UTC timezone
  • <metric_name> -> runs every hour if the user used this metrics in the Designer. Doesn't run for non-active metrics

3. Connect the datasource

Great! The Manifest is now finally ready for testing and for our first connection. After saving it, we can now try to connect our connector to GitHub.

logo

A window will popup for us to authorize this connection, then another popup from the UserSelect section will appear. The GitHub repository must be selected.

After a successful connect, the menu is populated with some further options:

logo

If everything went well, our datasource will now be connected. We can proceed with running Tasks, checking if the API call was executed correctly and verifying data.

4. Tasks

Select Tasks from the menu, so that the Run Tasks sandbox environment is displayed. From here, we can run each Task and inspect the API call, raw response data and mapped response filtered data.

All of our Manifest-defined tasks are listed here:

logo

So, let's choose the second task called GetContributors and run it.

logo

You might be wondering, “what just happened?” Essentially, the Task ran with the GetContributors API call. The request sent is visible in the Request tab. We can see the GET uri and all of the other headers sent.

If we go over to the Response tab, we can see the whole response GitHub API returned. We'll omit the listing since it's too big.

Now we move to the Mapped Response tab. Output here is much cleaner, since we've applied the jmespath transformation transform: '[].{id:id login:login}' to the JSON response. This transformation basically only keeps id and login fields, and purges all other data from the original output.

logo

5. Transformations

We can now be sure that our Task is running smoothly, and that data is returned and filtered. It’s time for the final step, reformatting the data to Databox’s format and pushing it to Databox’s database servers. This is the only step that might require some PHP code, which can be very simple, depending on the API call and what we need to achieve.

We’ll navigate back to the connectors screen and choose the Transformation logic link. This time, we get an editor window with some pre-filled PHP code to make things easier. Code can be edited here or copy & pasted from your favorite IDE/editor. Let's add a custom function:

public function getContributors($response)
{
    $noOfContributors = is_array($response)
        ? count($response)
        : 0;

    $push = [[
        'date'          => (new \DateTime())->format('Y-m-d'),
        '$contributors' => $noOfContributors
    ]];

    return $push;
}

We must check to see if $response is of array type. If it is, we count the number of array elements. We have to format the data that will be returned to Databox (the $push array is created for this purpose). It has two elements, the date (which must be present in every push call) and the metric key (named $contributors in our case). The metric key name is always preceded by $. If it isn't, then we're pushing an attribute.

Click ‘Push’ in the upper right corner to actually push this result to Databox’s database:

logo

6. Visualize data

Now that we’ve finished all of the previous steps (i.e., transformed the data and pushed to Databox), we know that the connector is in a good working form and it's time to make a final test to visualize and validate the connectors data.

Connect your data

Navigate back to your Sandbox account in Databox, click the "Data manager" tab at the top of the screen and hit the green "+ New connection" button. Your previously built connector tile will appear. Hover over the tile and click connect.

logo

Visualize and validate your data

Click the "Designer" tab at the top of your screen to visualize your connector's data.

Click Visualization types on the left side of your screen, then drag-and-drop the Number datablock onto the datacard in the center of your screen.

logo

All that's left to do is to setup the datablock by choosing your previously connected Data source, desired Metric and Date range from the options in the Properties panel on the right side of your screen.

logo

... and the datablock should populate the number from the metric value:

logo

Here is an article and how-to video that details how to use the Databox designer.

7. Submit for review

Congratulations, you've created your first connector! To make it available for your clients, it must be submitted for review.

To do so, click the “Submit to Review” menu option on your connector tile.

logo

Our data analytics team will be notified and will follow up and proceed with the verification process.

8. Connector example

The example below is also available in the Databox Connector Builder as a sample when you're creating new manifest or Transformation logic.

## GitHub connector
Info:
    name:        Sample GitHub
    author:      Brian May
    description: Enter description here.
    avatar:      https://cdn1.databox.com/images/apps/s/GitHub.png
    avatarBig:   https://cdn1.databox.com/images/apps/m/GitHub.png
    timezone:    UTC
    popup:
        width:  1090
        height: 750
    category:
        - IT & Engineering
    profile: repo.name
    metrics:
        - key: stars
          name: Stars
          dataType: OverallTotal
          intervals: [AllTime, Today, Yesterday, Last7Days, Last30Days, MTD]
        - key: commits.daily
          name: Commits
          intervals: [Today, Yesterday, Last7Days, Last30Days, MTD]
          dataType: DailyLatestSum
          attributes:
              - key: user
                name: Contributors by commits
                intervals: [Today, Yesterday, Last7Days, Last30Days, MTD]
                dataType: DailyLatestSum
        - key: releases
          name: Releases
          dataType: Diff
          intervals: [Today, Yesterday, Last7Days, Last30Days, MTD]


Auth:
    provider: oauth2
    parameters:
        client_id:     'a8795139f121cffe9020'
        client_secret: '2d1b981731aefda584c975749e60750251693402'
        grant_type:    'authorization_code'
        authorization_endpoint: 'https://github.com/login/oauth/authorize'
        access_token_endpoint:  'https://github.com/login/oauth/access_token'
        scope: repo

    ## SAMPLE: Basic authentication
    # provider: basic
    # parameters:
    #     - name:  profile
    #       label: Name
    #       note:  <FIELD DESCRIPTION>
    #     - name:  username
    #       label: API Key
    #       type:  password
    #     - name:  password
    #       value: X
    #       type:  hidden

    ## SAMPLE: OAuth1 authentication
    # provider: oauth1
    # parameters:
    #     consumer_key:    '<YOUR OAUTH1 CONSUMER KEY>'
    #     consumer_secret: '<YOUR OAUTH1 CONSUMER SECRET>'
    #     authorization_endpoint: https://api.example.com/oauth/authorize
    #     request_token_endpoint: https://api.example.com/oauth/request_token
    #     access_token_endpoint:  https://api.example.com/oauth/access_token


UserSelect:
    - id: repo
      description: Select repository.
      method: Tasks.GetUserRepos
      transform: '[].{id:id name:full_name}'


API:
    baseUrl: https://api.github.com/
    operations:
        ## Helper operation - adds parameters where extended
        Default:
            parameters:
                page:
                    type: string
                    location: query
                    ## default will be sent in request (if not overridden in task)
                    default: 1
                per_page:
                    type: string
                    location: query
                    default: 100

        GetUserRepos:
            ## Here we extend Default operation. That way we don't need to
            ## add 'page' and 'per_page' parameters
            extends: Default
            httpMethod: GET
            uri: user/repos

        GetRepo:
            httpMethod: GET
            uri: repos/{repo.name}

        GetCommits:
            extends: Default
            httpMethod: GET
            uri: repos/{repo.name}/commits
            parameters:
                since:
                    type: string
                    location: query

        GetReleases:
            extends: Default
            httpMethod: GET
            uri: repos/{repo.name}/releases


Tasks:
    ## Task for UserSelect above
    - name: GetUserRepos
      operation: GetUserRepos
      parameters:
          page: 1
      paginate:
          property: page
          max_pages: 20
          array_merge: true

    ## Connector tasks
    - name: GetStars
      operation: GetRepo
      transform: stargazers_count
    - name: GetCommitsHistory
      operation: GetCommits
      transform: '[].{author:commit.author.name commited_at:commit.author.date}'
      parameters:
          page: 1
          since: '{date:30daysAgo}'
      ## paginate with 'page' parameter above
      paginate:
          property: page
          max_pages: 40
          array_merge: true
    - name: GetReleases
      operation: GetReleases
      transform: '[].{name:name,draft:draft,prerelease:prerelease,created_at:created_at,published_at:published_at}'


Transformations:
  - name: GetStars
    from: GetStars
    use: getStars
  - name: GetCommitsHistory
    from: GetCommitsHistory
    use: getCommitsHistory
  - name: GetReleases
    from: GetReleases
    use: getReleases


Batch:
    ## Default batch - runs once per day (and right after successful authentication)
    - id: default
      runs:
        - use: GetStars
        - use: GetCommitsHistory
        - use: GetReleases

    ## Batch for each metric
    - id: stars
      runs:
        - use: GetStars
    - id: commits.daily
      runs:
        - use: GetCommitsHistory
    - id: releases
      runs:
        - use: GetReleases
    ## We also need separate batch for attribute 'name'
    - id: commits.daily|name
      runs:
        - use: GetCommitsHistory
<?php

namespace Databox\SampleGitHub;

class Transform
{
    private $settings;
    private $storage;

    public function __construct($settings, $storage)
    {
        $this->settings = $settings;
        $this->storage  = $storage;
    }

    // In case we'd like to store information to settings
    // Used when we need to know the value from prevoius fetch
    private function saveToSettings($field, $value)
    {
        $this->settings[$field] = $value;
        $this->storage->saveSettings($this->settings, true);
    }

    private function getFromSettings($field)
    {
        return isset($this->settings[$field]) ? $this->settings[$field] : null;
    }

    public function getStars($response)
    {
        $push = [[
            'date'   => (new \DateTime())->format('Y-m-d'),
            '$stars' => $response
        ]];

        return $push;

        // EXAMPLE
        // We'd like to store current stars number to use it later
        $this->saveToSettings('starsNumber', $response);

        // Get 'starsNumber' from settings (returns null if not set)
        $stars = $this->getFromSettings('starsNumber');
    }

    public function getReleases($response)
    {
        $push = [];

        foreach ($response as $release) {
            if ($release['draft']) {
                continue;
            }

            // We push '1' on release timestamp (data type Diff)
            $push[] = [
                'date'      => $release['created_at'],
                '$releases' => 1
            ];
        }

        return $push;
    }

    private function getCommits($response, $date)
    {
        $commits         = 0;
        $commitsByAuthor = [];

        $dateString = $date->format('Y-m-d');

        foreach ($response as $issue) {
            $commitedAt = (new \DateTime($issue['commited_at']))->format('Y-m-d');

            if ($commitedAt === $dateString) {
                $commits += 1;

                if (isset($commitsByAuthor[$issue['author']])) {
                    $commitsByAuthor[$issue['author']] += 1;
                } else {
                    $commitsByAuthor[$issue['author']] = 1;
                }
            }
        }

        $push[] = [
            'date'           => $dateString,
            '$commits.daily' => $commits
        ];

        foreach ($commitsByAuthor as $author => $commitCount) {
            $push[] = [
                'date'           => $dateString,
                '$commits.daily' => $commitCount,
                'user'           => $author
            ];
        }

        return $push;
    }

    public function getCommitsHistory($response)
    {
        $push     = [];
        $noOfDays = 0;
        $day      = (new \DateTime())->setTime(0, 0, 0);

        while ($noOfDays < 30) {
            $pushCommits = $this->getCommits($response, $day);
            $push        = array_merge($push, $pushCommits);

            $noOfDays += 1;
            $day->modify('-1 day');
        }

        return $push;
    }
}