Getting started with Databox Connector Builder
The Connector Builder gives users the ability to quickly create a connection between Databox and any data source.
With the Connector Builder, you’ll be able to:
- Connect to a data source via a defined authentication protocol
- Extract data from a data source using defined API calls
- Transform data to metrics
- Load data to Databox
In this quick tutorial, we'll create a basic GitHub connector that will authorize with GitHub using Oauth2 protocol, define an API endpoint, make an API call, transform the data and, finally, push the data to Databox. Databox Designer can then be used to visualize the data.
Here’s what you need to do.
- Login to the Builder environment
- Write a Manifest file
- Connect the data source
- Run tasks and check the returned data
- Transform and push the data
- Visualize and verify the data
- Submit for review
1. Login
Once you've received your login data and logged in to Webapp, simply click the link Build connectors
next to your user name. This will take you to the Builder environment.
2. Manifest File
First, you need to define the manifest file. The manifest file is a simple YAML format file that consists of several sections. These sections give you the ability to describe the tasks you want to achieve in a simple, logical way.
Think of the manifest file as the blueprint of a connector. It defines all components that the connector needs to recognize in order to communicate with a 3rd party API.
The sections are:
- Info: basic connector info and metrics are defined here
- Auth: API authentication type and any data needed to authenticate
- API: defines all of the API calls that will be made
- UserSelect: optional, for additional choices upon connect
- Tasks: runs API calls and extracts desired data
- Transformations: PHP functions that will manipulate data from tasks
- Batches: specifies which tasks will run and when it will happen
Actual PHP code, if needed at all, will be written only in the Transformations section.
Now, let’s explore what’s necessary to build a usable connector so that you can achieve your goal of creating some functional databoards.
Info section
Info:
name: GitHub
author: Databox Demo
description: "Online project hosting using Git. Includes source-code browser, in-line editing, wikis, and ticketing."
avatar: https://cdn1.databox.com/images/apps/s/GitHub.png
avatarBig: https://cdn1.databox.com/images/apps/m/GitHub.png
timezone: UTC
popup:
width: 1090
height: 750
category:
- IT & Engineering
profile: repo.name
metrics:
- key: contributors
name: Contributors
dataType: OverallTotal
intervals: [AllTime,Today,Yesterday,Last7Days,Last30Days,MTD,LastMonth]
- key: branches
name: Branches
dataType: OverallTotal
intervals: [AllTime,Today,Yesterday,Last7Days,Last30Days,MTD,LastMonth]
First, we must give our connector a unique name. Enter a description, along with links to two connector avatar pictures (must differ in size), set timezone as UTC, set the popup size in pixels, select the category and define metrics. Make sure you choose an avatar picture that best describes this connector.
Please refer to Manifest for all possible field types and details on each field.
Auth section
Auth:
provider: oauth2
parameters:
client_id: '573d6XXXXXXXXXXXXb6a9'
client_secret: '691d2XXXXXXXXXXXXXXXXXXXXXe7a74'
response_type: code
grant_type: 'authorization_code'
authorization_endpoint: 'https://github.com/login/oauth/authorize'
access_token_endpoint: 'https://github.com/login/oauth/access_token'
scope: repo
This is where we define the provider type. GitHub uses OAuth2, so we'll use just that for this example. Parameters are defined according to GitHub API docs, and our Auth section docs docs.
Visit your GitHub OAuth settings page and choose “Register a new application”
Populate all fields, like this:
Once the form is submitted, you will be presented with Client ID and Client Secret values. These are needed for the OAuth authentication. Enter them in the Auth section as seen above.
We’ll also visit the GitHub OAuth docs to familiarize ourselves with mandatory fields for OAuth2 and API calls, as well as what the values should be.
API section
In the API section, we define all calls that get sent to the API, along with parameters and the http method type.
API:
baseUrl: https://api.github.com/
operations:
Default:
parameters:
page:
type: string
location: query
default: 1
per_page:
type: string
location: query
default: 100
GetUserRepos:
extends: Default
httpMethod: GET
uri: user/repos
GetContributors:
httpMethod: GET
uri: repos/{repo.name}/contributors
GitHub API URL is entered as baseUrl. Operations
are as follows:
Default
parameters page and per_page are defined. These are parameters that will be set for every request that uses them with extends
keyword.
GetUserRepos
is the arbitrary name we gave to the GET API call user/repos
. It's good practice to keep it close to the actual API call name. It uses the Default
parameters via extends
keyword. URI user/repos
will be called.
GetContributors
is another API call. URI will use the parameter {repo.name}
which is set during the initial connect and will be substituted by chosen repo name.
UserSelect section
UserSelect:
- id: repo
description: Select repository.
method: Tasks.GetUserRepos
transform: '[].{id:id name:full_name}'
Optional section. This is used when an additional action is necessary upon connect. For GitHub, this is the repository selection. Upon initial selection, a window entitled description
is displayed, followed by the execution of the GetUserRepos task, which returns all the user's repos. The user must then select the repo he/she wishes to use.
Jmespath transform is used on reply to filter just the data we need. More on that shortly.
The chosen repo is then stored in the database and used in all subsequent calls that need a specific repo.
Tasks section
Tasks:
- name: GetUserRepos
operation: GetUserRepos
parameters:
page: 1
paginate:
property: page
max_pages: 20
array_merge: true
- name: GetContributors
operation: GetContributors
transform: '[].{id:id login:login}'
- name: GetBranches
operation: GetBranches
transform: 'length([])'
When tasks are defined, each task is tied to arbitrary API call via operation
keyword. An API call is run and the JSON is returned. Additional parameters
may also be supplied, and pagination
may be added to the Task.
Task usually returns larger amounts of data than we need. That's why data can be filtered and cleaned up before use. Jmespath format is used for this purpose. Its syntax may be used to transform
values. For easier development and debugging later on, it’s good practice to filter out all the data we don't need
Transformations section
Transformations:
- name: GetContributors
from: GetContributors
use: getContributors
- name: GetBranches
from: GetBranches
use:
- metrickey: branches
To begin with, transformation is named via the name
field. The Task is then linked to it with the from
directive, and finally, the PHP function that will transform the data is named with the use
keyword. In our example, getContributors
will be the function.
The next transformation named GetBranches
is even simpler because it doesn't need any PHP code. Output of the task GetBranches
is simply pushed as a metric key named branches
. Since the output of GetBranches
is just a number, this shortcut can be used to omit any PHP code.
Basically, we can do anything with data that the native API didn't provide, or we can push the data that we get. Data is then subsequently pushed to Databox. More on this later.
Batches section
Batch:
- id: default
runs:
- use: GetContributors
- use: GetBranches
- id: contributors
runs:
- use: GetContributors
- id: branches
runs:
- use: GetBranches
Batches define which tasks/transformations will run on initial connect, daily (id: default
) and hourly (id: metrickeyname
). In this example, default
batch starts upon initial connect and runs two tasks/transformations. After that, two hourly batches are defined. Their id
is set to the metric keys used.
Based on the id
set, batches are run as following:
history
-> runs only when the user connects for the first timedefault
-> runs when the user connects, plus once daily. Run time depends on the timezone selected (at 4 AM)nightly
-> runs 1 minute after midnight, UTC timezone<metric_name>
-> runs every hour if the user used this metrics in the Designer. Doesn't run for non-active metrics
3. Connect the datasource
Great! The Manifest is now finally ready for testing and for our first connection. After saving it, we can now try to connect our connector to GitHub.
A window will popup for us to authorize this connection, then another popup from the UserSelect
section will appear. The GitHub repository must be selected.
After a successful connect, the menu is populated with some further options:
If everything went well, our datasource will now be connected. We can proceed with running Tasks, checking if the API call was executed correctly and verifying data.
4. Tasks
Select Tasks from the menu, so that the Run Tasks sandbox environment is displayed. From here, we can run each Task and inspect the API call, raw response data and mapped response filtered data.
All of our Manifest-defined tasks are listed here:
So, let's choose the second task called GetContributors
and run it.
You might be wondering, “what just happened?” Essentially, the Task ran with the GetContributors
API call. The request sent is visible in the Request tab. We can see the GET uri and all of the other headers sent.
If we go over to the Response tab, we can see the whole response GitHub API returned. We'll omit the listing since it's too big.
Now we move to the Mapped Response tab. Output here is much cleaner, since we've applied the jmespath transformation transform: '[].{id:id login:login}'
to the JSON response. This transformation basically only keeps id
and login
fields, and purges all other data from the original output.
5. Transformations
We can now be sure that our Task is running smoothly, and that data is returned and filtered. It’s time for the final step, reformatting the data to Databox’s format and pushing it to Databox’s database servers. This is the only step that might require some PHP code, which can be very simple, depending on the API call and what we need to achieve.
We’ll navigate back to the connectors screen and choose the Transformation logic
link. This time, we get an editor window with some pre-filled PHP code to make things easier. Code can be edited here or copy & pasted from your favorite IDE/editor. Let's add a custom function:
public function getContributors($response)
{
$noOfContributors = is_array($response)
? count($response)
: 0;
$push = [[
'date' => (new \DateTime())->format('Y-m-d'),
'$contributors' => $noOfContributors
]];
return $push;
}
We must check to see if $response
is of array type. If it is, we count the number of array elements. We have to format the data that will be returned to Databox (the $push array is created for this purpose). It has two elements, the date (which must be present in every push call) and the metric key (named $contributors
in our case). The metric key name is always preceded by $
. If it isn't, then we're pushing an attribute.
Click ‘Push’ in the upper right corner to actually push this result to Databox’s database:
6. Visualize data
Now that we’ve finished all of the previous steps (i.e., transformed the data and pushed to Databox), we know that the connector is in a good working form and it's time to make a final test to visualize and validate the connectors data.
Connect your data
Navigate back to your Sandbox account in Databox, click the "Data manager" tab at the top of the screen and hit the green "+ New connection" button. Your previously built connector tile will appear. Hover over the tile and click connect.
Visualize and validate your data
Click the "Designer" tab at the top of your screen to visualize your connector's data.
Click Visualization types
on the left side of your screen, then drag-and-drop the Number
datablock onto the datacard in the center of your screen.
All that's left to do is to setup the datablock by choosing your previously connected Data source
, desired Metric
and Date range
from the options in the Properties panel on the right side of your screen.
... and the datablock should populate the number from the metric value:
Here is an article and how-to video that details how to use the Databox designer.
7. Submit for review
Congratulations, you've created your first connector! To make it available for your clients, it must be submitted for review.
To do so, click the “Submit to Review” menu option on your connector tile.
Our data analytics team will be notified and will follow up and proceed with the verification process.
8. Connector example
The example below is also available in the Databox Connector Builder as a sample when you're creating new manifest or Transformation logic.
## GitHub connector
Info:
name: Sample GitHub
author: Brian May
description: Enter description here.
avatar: https://cdn1.databox.com/images/apps/s/GitHub.png
avatarBig: https://cdn1.databox.com/images/apps/m/GitHub.png
timezone: UTC
popup:
width: 1090
height: 750
category:
- IT & Engineering
profile: repo.name
metrics:
- key: stars
name: Stars
dataType: OverallTotal
intervals: [AllTime, Today, Yesterday, Last7Days, Last30Days, MTD]
- key: commits.daily
name: Commits
intervals: [Today, Yesterday, Last7Days, Last30Days, MTD]
dataType: DailyLatestSum
attributes:
- key: user
name: Contributors by commits
intervals: [Today, Yesterday, Last7Days, Last30Days, MTD]
dataType: DailyLatestSum
- key: releases
name: Releases
dataType: Diff
intervals: [Today, Yesterday, Last7Days, Last30Days, MTD]
Auth:
provider: oauth2
parameters:
client_id: 'a8795139f121cffe9020'
client_secret: '2d1b981731aefda584c975749e60750251693402'
grant_type: 'authorization_code'
authorization_endpoint: 'https://github.com/login/oauth/authorize'
access_token_endpoint: 'https://github.com/login/oauth/access_token'
scope: repo
## SAMPLE: Basic authentication
# provider: basic
# parameters:
# - name: profile
# label: Name
# note: <FIELD DESCRIPTION>
# - name: username
# label: API Key
# type: password
# - name: password
# value: X
# type: hidden
## SAMPLE: OAuth1 authentication
# provider: oauth1
# parameters:
# consumer_key: '<YOUR OAUTH1 CONSUMER KEY>'
# consumer_secret: '<YOUR OAUTH1 CONSUMER SECRET>'
# authorization_endpoint: https://api.example.com/oauth/authorize
# request_token_endpoint: https://api.example.com/oauth/request_token
# access_token_endpoint: https://api.example.com/oauth/access_token
UserSelect:
- id: repo
description: Select repository.
method: Tasks.GetUserRepos
transform: '[].{id:id name:full_name}'
API:
baseUrl: https://api.github.com/
operations:
## Helper operation - adds parameters where extended
Default:
parameters:
page:
type: string
location: query
## default will be sent in request (if not overridden in task)
default: 1
per_page:
type: string
location: query
default: 100
GetUserRepos:
## Here we extend Default operation. That way we don't need to
## add 'page' and 'per_page' parameters
extends: Default
httpMethod: GET
uri: user/repos
GetRepo:
httpMethod: GET
uri: repos/{repo.name}
GetCommits:
extends: Default
httpMethod: GET
uri: repos/{repo.name}/commits
parameters:
since:
type: string
location: query
GetReleases:
extends: Default
httpMethod: GET
uri: repos/{repo.name}/releases
Tasks:
## Task for UserSelect above
- name: GetUserRepos
operation: GetUserRepos
parameters:
page: 1
paginate:
property: page
max_pages: 20
array_merge: true
## Connector tasks
- name: GetStars
operation: GetRepo
transform: stargazers_count
- name: GetCommitsHistory
operation: GetCommits
transform: '[].{author:commit.author.name commited_at:commit.author.date}'
parameters:
page: 1
since: '{date:30daysAgo}'
## paginate with 'page' parameter above
paginate:
property: page
max_pages: 40
array_merge: true
- name: GetReleases
operation: GetReleases
transform: '[].{name:name,draft:draft,prerelease:prerelease,created_at:created_at,published_at:published_at}'
Transformations:
- name: GetStars
from: GetStars
use: getStars
- name: GetCommitsHistory
from: GetCommitsHistory
use: getCommitsHistory
- name: GetReleases
from: GetReleases
use: getReleases
Batch:
## Default batch - runs once per day (and right after successful authentication)
- id: default
runs:
- use: GetStars
- use: GetCommitsHistory
- use: GetReleases
## Batch for each metric
- id: stars
runs:
- use: GetStars
- id: commits.daily
runs:
- use: GetCommitsHistory
- id: releases
runs:
- use: GetReleases
## We also need separate batch for attribute 'name'
- id: commits.daily|name
runs:
- use: GetCommitsHistory
<?php
namespace Databox\SampleGitHub;
class Transform
{
private $settings;
private $storage;
public function __construct($settings, $storage)
{
$this->settings = $settings;
$this->storage = $storage;
}
// In case we'd like to store information to settings
// Used when we need to know the value from prevoius fetch
private function saveToSettings($field, $value)
{
$this->settings[$field] = $value;
$this->storage->saveSettings($this->settings, true);
}
private function getFromSettings($field)
{
return isset($this->settings[$field]) ? $this->settings[$field] : null;
}
public function getStars($response)
{
$push = [[
'date' => (new \DateTime())->format('Y-m-d'),
'$stars' => $response
]];
return $push;
// EXAMPLE
// We'd like to store current stars number to use it later
$this->saveToSettings('starsNumber', $response);
// Get 'starsNumber' from settings (returns null if not set)
$stars = $this->getFromSettings('starsNumber');
}
public function getReleases($response)
{
$push = [];
foreach ($response as $release) {
if ($release['draft']) {
continue;
}
// We push '1' on release timestamp (data type Diff)
$push[] = [
'date' => $release['created_at'],
'$releases' => 1
];
}
return $push;
}
private function getCommits($response, $date)
{
$commits = 0;
$commitsByAuthor = [];
$dateString = $date->format('Y-m-d');
foreach ($response as $issue) {
$commitedAt = (new \DateTime($issue['commited_at']))->format('Y-m-d');
if ($commitedAt === $dateString) {
$commits += 1;
if (isset($commitsByAuthor[$issue['author']])) {
$commitsByAuthor[$issue['author']] += 1;
} else {
$commitsByAuthor[$issue['author']] = 1;
}
}
}
$push[] = [
'date' => $dateString,
'$commits.daily' => $commits
];
foreach ($commitsByAuthor as $author => $commitCount) {
$push[] = [
'date' => $dateString,
'$commits.daily' => $commitCount,
'user' => $author
];
}
return $push;
}
public function getCommitsHistory($response)
{
$push = [];
$noOfDays = 0;
$day = (new \DateTime())->setTime(0, 0, 0);
while ($noOfDays < 30) {
$pushCommits = $this->getCommits($response, $day);
$push = array_merge($push, $pushCommits);
$noOfDays += 1;
$day->modify('-1 day');
}
return $push;
}
}