AWS Glue to Lambda via JSON

Parsing Hive/OpenX JSON SerDe from AWS Glue in Node.js

John Elliott
2 min read · Aug 23, 2021

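Spark ETL jobs on AWS Glue write JSON output that JavaScript cannot parse with a single JSON.parse() call: each record is a separate JSON object on its own line, and nothing joins the records into one document. According to AWS, the format is the one read by the native Hive JSON SerDe or the OpenX JSON SerDe.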

This is an example of the JSON produced by Glue:

{ "forename" : "John", "surname" : "Elliott" }
{ "forename" : "Henrik", "surname" : "Larsson" }

However, for JavaScript to deserialize the above with a single JSON.parse() call, it should look like this:

[{ "forename" : "John", "surname" : "Elliott" },
{ "forename" : "Henrik", "surname" : "Larsson" }]

As we can see, the records need to be separated by commas and wrapped in an array to be parsed without error.
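To make the difference concrete, here is a small sketch that feeds both forms of the sample data to JSON.parse(). The helper name isParseable is made up for this illustration and is not part of any library.

```javascript
// The two records as Glue writes them: one JSON object per line.
const glueOutput =
  '{ "forename" : "John", "surname" : "Elliott" }\n' +
  '{ "forename" : "Henrik", "surname" : "Larsson" }';

// The same records wrapped in an array and separated by a comma.
const asArray =
  '[{ "forename" : "John", "surname" : "Elliott" },\n' +
  '{ "forename" : "Henrik", "surname" : "Larsson" }]';

// Hypothetical helper: true if JSON.parse() accepts the text, false if it throws.
function isParseable(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isParseable(glueOutput)); // false
console.log(isParseable(asArray)); // true
console.log(JSON.parse(asArray)[1].surname); // Larsson
```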

The code snippet below is a JavaScript / Node.js function that performs the above transformation. The output string can then be parsed using JSON.parse().

/**
 * Takes a Hive / OpenX JSON SerDe formatted string (one JSON object per line)
 * and converts it to a JSON array string that JSON.parse() accepts
 */
function convertJSONSerDe(jsonSerDe) {
  // split into lines and drop blank ones (e.g. after a trailing newline)
  const lines = jsonSerDe.split(/\r?\n/).filter((line) => line.trim() !== "");

  // join the records with commas; splitting on line breaks rather than on "}"
  // keeps nested objects and "}" characters inside string values intact
  const jsonString = lines.join(",");

  // wrap in square brackets to convert to array
  return "[" + jsonString + "]";
}
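As an alternative to building one large string, each line can be parsed on its own. The sketch below is not from the original snippet: parseJsonLines is a hypothetical helper, and it assumes every record sits on exactly one line. Because it never rewrites the text, it does not depend on where braces appear inside a record.

```javascript
// Hypothetical helper: parse one-object-per-line JSON into an array of objects.
function parseJsonLines(text) {
  return text
    .split(/\r?\n/) // handle both Unix and Windows line endings
    .filter((line) => line.trim() !== "") // skip blank lines, e.g. a trailing newline
    .map((line) => JSON.parse(line)); // throws a SyntaxError on a malformed line
}

// Sample input with a nested object and a "}" inside a string value.
const sample =
  '{ "name" : "a}b", "address" : { "city" : "Perth" } }\n' +
  '{ "name" : "c" }\n';

const records = parseJsonLines(sample);
console.log(records.length); // 2
console.log(records[0].address.city); // Perth
```

The trade-off is that a single malformed line aborts the whole parse, so a caller that needs to skip bad records would have to wrap the JSON.parse() call in its own try/catch.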

This is useful if, for example, you want to process the output of AWS Glue in Lambda. A typical use case could be to aggregate a large dataset using Spark / AWS Glue, write the results to S3 in JSON format, and then use CloudWatch Events to trigger the Lambda function when the file is created. The Lambda function gets the S3 bucket and object key from the event, reads the file, converts its contents to valid JSON and finally uses the parsed data as necessary.
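The Lambda side of that flow could be sketched roughly as follows. Several things here are assumptions rather than anything from the article: makeHandler and readObjectAsString are hypothetical names, the S3 read is injected as a function so the sketch runs without the AWS SDK, and the event is assumed to have the shape of an S3 "object created" notification. An event delivered through CloudWatch Events may carry the bucket and key in different fields.

```javascript
// Hypothetical factory: readObjectAsString(bucket, key) must return a Promise
// of the object's text. In a real Lambda it would wrap an S3 GetObject call.
function makeHandler(readObjectAsString) {
  return async function handler(event) {
    // Assumed event shape: an S3 "object created" notification.
    const record = event.Records[0].s3;
    const bucket = record.bucket.name;
    // Keys in S3 notifications are URL-encoded, with spaces sent as "+".
    const key = decodeURIComponent(record.object.key.replace(/\+/g, " "));

    const text = await readObjectAsString(bucket, key);

    // One JSON object per line, as Glue writes it.
    return text
      .split(/\r?\n/)
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  };
}

// Local dry run with a stubbed reader instead of S3.
const fakeReader = async (bucket, key) =>
  '{ "forename" : "John" }\n{ "forename" : "Henrik" }\n';
const sampleEvent = {
  Records: [
    { s3: { bucket: { name: "example-bucket" }, object: { key: "glue-output/part+0.json" } } },
  ],
};
makeHandler(fakeReader)(sampleEvent).then((rows) => console.log(rows.length)); // 2
```

Injecting the reader keeps the parsing logic testable on a laptop; in a deployed function the exported handler would be the result of calling makeHandler with a real S3-backed reader.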

Sources:

  1. JsonSerde — A read/write SerDe for JSON Data https://github.com/rcongiu/Hive-JSON-Serde
  2. Examples provided by AWS of Athena / Glue-valid JSON https://aws.amazon.com/premiumsupport/knowledge-center/error-json-athena/

John Elliott

Enterprise cloud, analytics and ML. Computer Science and MBA educated. Triple AWS certified. Scottish but living in Australia.