November 5, 2019

Exporting and Downloading Datasets

The DiVA API enables you to define a Marqeta platform dataset and export it as a compressed CSV file, using either Zip or Gzip compression. After the export completes, you use the API to download the compressed file.

Exporting a dataset as a file

You can export any dataset as a CSV file by sending a GET request to the appropriate endpoint. To construct your endpoint URL, start with the URL you would use to retrieve that same dataset in JSON format, for example:

/views/authorizations/month?program=my_program

Then insert the export_type path parameter (/csv) before the query string, for example:

/views/authorizations/month/csv?program=my_program

By default, the resulting dataset is compressed as a gz file. You can compress it as a zip file by including the compress query parameter, for example:

/views/authorizations/month/csv?compress=zip&program=my_program
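The URL construction described above can be sketched as a small helper. This is only an illustration, not part of the DiVA API: the function name build_export_url and its parameters are hypothetical, and the sketch assumes the dataset path you pass in carries no query string of its own.

```python
from urllib.parse import urlencode


def build_export_url(view_path, params, compress=None):
    """Return the CSV export URL for a JSON dataset path (hypothetical helper).

    view_path - path of the JSON dataset, e.g. "/views/authorizations/month"
    params    - dict of query parameters, e.g. {"program": "my_program"}
    compress  - None for the default gz compression, or "zip"
    """
    query = dict(params)
    if compress is not None:
        # Put the compress parameter first, as in the example URL above
        query = {"compress": compress, **query}

    # Insert the export_type path parameter (/csv) before the query string
    url = view_path.rstrip("/") + "/csv"
    if query:
        url += "?" + urlencode(query)
    return url


print(build_export_url("/views/authorizations/month", {"program": "my_program"}))
print(build_export_url("/views/authorizations/month", {"program": "my_program"}, compress="zip"))
```

The two calls print the gz and zip export URLs shown in the examples above.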

Because the export operation is processed asynchronously, you should receive an immediate 202 Accepted response. The JSON-formatted response body contains a token that you use to download your dataset file, for example:

{
    "token": "db63c24d8307c24b7e17d33735114dc8f807838a.csv.gz"
}

Downloading the exported file

The API returns up to 1,048,576 rows in a file export and can take several minutes to generate the file.

To retrieve your file, send a GET request to the /download?token={my_download_token} endpoint, where {my_download_token} is the value of the token field that was returned in response to your export request, for example:

/download?token=db63c24d8307c24b7e17d33735114dc8f807838a.csv.gz

Note
The token value includes two filename extensions (for example, .csv.gz). You must include these extensions in your request URL.

The API returns one of these responses:

  • If the job is not finished: The 202 "Accepted" HTTP response code and a plain-text body containing the word Pending.

  • If the job is finished: The 200 "OK" HTTP response code and the file as an application/octet-stream.

  • If the job has expired: The 410 "Gone" HTTP response code. Completed jobs expire after 60 minutes.
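The three responses listed above can be mapped to job states in client code. The following is a minimal sketch; the function name describe_download_status and the state labels are hypothetical, and any status code not listed above is simply reported as unexpected.

```python
def describe_download_status(status_code):
    """Map a /download HTTP status code to a job state (hypothetical helper)."""
    states = {
        202: "pending",  # job not finished; body is the plain-text word Pending
        200: "ready",    # job finished; body is the file (application/octet-stream)
        410: "expired",  # completed jobs expire after 60 minutes
    }
    return states.get(status_code, "unexpected")


for code in (202, 200, 410, 401):
    print(code, describe_download_status(code))
```

A polling loop would typically keep retrying only while the state is pending and stop on any other state.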

When saving your file, use the same filename extensions you used in your URL request, for example: my_downloaded_file.csv.gz
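One way to keep the extensions consistent is to derive the local filename from the token itself. This sketch assumes the token has the form shown in the examples (a string with no dots, followed by the extensions); the helper name local_filename and the default stem are hypothetical.

```python
def local_filename(token, stem="my_downloaded_file"):
    """Build a local filename that keeps the token's filename extensions."""
    # Split "<hash>.csv.gz" at the first dot into the hash and its extensions
    _, _, extensions = token.partition(".")
    if not extensions:
        raise ValueError("token has no filename extensions: " + token)
    return stem + "." + extensions


print(local_filename("db63c24d8307c24b7e17d33735114dc8f807838a.csv.gz"))
```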

The following Python example shows how you can export a dataset and then download the exported file in CSV format:

import time

import pandas as pd
import requests
from requests.auth import HTTPBasicAuth

# Constants for HTTP response codes
RC_SUCCESS = 200
RC_ACCEPTED = 202
RC_UNAUTHORIZED = 401

# Build the Basic Authentication credentials
username = "APPLICATION_TOKEN"  # replace APPLICATION_TOKEN with your application token
password = "ACCESS_TOKEN"  # replace ACCESS_TOKEN with your access token
basic_auth = HTTPBasicAuth(username, password)


# Download an exported file with the specified token
# Parameters:
# file_token - token of the file to download
# auth - authentication credentials
# base_url - base API path for the download URL
# retry_seconds - maximum time to retry, in seconds
# Returns a pandas DataFrame, or None if the file was not ready in time
def getCSV(file_token, auth, base_url, retry_seconds=300):

    # Set timeout to current time plus maximum time to retry
    timeout = time.time() + retry_seconds

    # Build URL to download exported file
    download_file_url = base_url + "/download?token=" + file_token

    # Check whether the file is ready for download
    code = requests.head(download_file_url, auth=auth).status_code
    while code != RC_SUCCESS and time.time() < timeout:
        time.sleep(1)
        # Retry the status check
        code = requests.head(download_file_url, auth=auth).status_code

    if code != RC_SUCCESS:
        return None  # status check timed out

    # Status check succeeded - the file is ready to download
    download_response = requests.get(download_file_url, auth=auth)

    # Save the response content into a temporary file
    with open("temp.csv.gz", "wb") as file:
        file.write(download_response.content)

    # Read the CSV content from the gzipped file, skipping malformed lines
    # (on_bad_lines requires pandas 1.3 or later; it replaces error_bad_lines)
    return pd.read_csv("temp.csv.gz", compression="gzip", on_bad_lines="skip")


# Build URL to export dataset for resource of interest (e.g. cards) in desired file format (e.g. CSV)
api_base_path = "https://diva-api.marqeta.com/data/v2"
resource_format_path = "/views/cards/detail/csv"
program_selector = "?program=MY_PROGRAM"  # replace MY_PROGRAM with the name of your program
export_dataset_url = api_base_path + resource_format_path + program_selector

# Invoke request to export the dataset
export_response = requests.get(export_dataset_url, auth=basic_auth)

if export_response.status_code == RC_ACCEPTED:  # export request succeeded

    # Obtain the CSV file token from the response
    export_file_token = export_response.json().get("token")

    # Call the getCSV function to download the CSV file
    data = getCSV(file_token=export_file_token, auth=basic_auth, base_url=api_base_path)

    if data is None:
        print("Failure: No timely response")
    else:
        print("Success: Dataset length = " + str(len(data)))

elif export_response.status_code == RC_UNAUTHORIZED:
    print("Failure: Unauthorized access")  # authentication failed

else:
    print("Failure: Unknown error")  # export request failed
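The example above handles only the default gz compression. If you request zip compression instead, the downloaded bytes need different handling. The following sketch uses only the Python standard library and rests on several assumptions that this page does not confirm: that a zip export token ends in .zip, that the archive contains a single CSV member, and that the CSV text is UTF-8 encoded. The helper name extract_csv_text is hypothetical.

```python
import gzip
import io
import zipfile


def extract_csv_text(content, token):
    """Return the CSV text from downloaded bytes (hypothetical helper).

    content - raw bytes of the downloaded file
    token   - download token, whose extension indicates the compression
    """
    if token.endswith(".gz"):
        return gzip.decompress(content).decode("utf-8")  # assumes UTF-8 text
    if token.endswith(".zip"):  # assumed extension for zip exports
        with zipfile.ZipFile(io.BytesIO(content)) as archive:
            # Assumes the archive holds a single CSV member
            first_member = archive.namelist()[0]
            return archive.read(first_member).decode("utf-8")
    raise ValueError("unrecognized extension in token: " + token)
```

You could pass download_response.content and the export token to this helper, then load the returned text into pandas with pd.read_csv(io.StringIO(text)).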
