On Thursday we launched some add-ons for AWS Step Functions, on which I helped a bit. As usual, there’s a nice Jeff Barr blog. This is to add design notes and extra color.
Our announcement describes these as “Integrations” — internally, while we were building them, we called them Connectors, and I’m going to stick with that because it has one less syllable and feels idiomatic.
Tl;dr · Up till now, Step Functions knew how to hand work to Lambda functions and to polling “Activity Workers”. As of now, it can also make use of DynamoDB (read/write), Batch (start a job in either fire-and-forget or wait-for-completion mode), ECS (regular and Fargate flavors), SNS (write-only), SQS (write-only), Glue (like Batch, async or sync), and SageMaker (same).
Of course, you could do all this before, by running a little Lambda function to call whatever API, but now Step Functions knows how to make those calls directly. Which means fewer Lambdas to own and maintain. Also this should run a little faster without an interposed function, and finally, Step Functions can be smarter about dealing with retries and throttling and so on.
How it works · Nothing essential in the Amazon States Language has changed. Just as before, you use a Task State to get work done, and you identify the work in the value of the Resource field, which is a URI. Used to be, the only URIs recognized were Lambda ARNs and Activity-worker Task ARNs. Yeah, as far as I know, nobody’s ever registered AWS’s “arn” URI scheme, but for all practical purposes they’re perfectly good URIs.
So all we really had to do to make this work was teach Step Functions to recognize new flavors of ARNs: one for each of the operations I mentioned above. For example, the Resource value that requests fetching a DynamoDB item is arn:aws:states:::dynamodb:getItem. All the other stuff in Task states about Retriers and Catchers and so on goes on working just as it did before.
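To make the “flavors of ARNs” idea concrete, here is a minimal Python sketch that sorts a Resource value into Lambda, Activity, or Connector by the shape of its ARN. The function name classify_resource and its return strings are made up for illustration, and the account number and names in the sample ARNs are placeholders. This is not how the service is actually implemented, just one way to picture the dispatch.

```python
def classify_resource(resource: str) -> str:
    """Classify a Task state's Resource value by the shape of its ARN.

    Hypothetical helper for illustration only; not part of any AWS SDK.
    """
    parts = resource.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        return "unknown"
    service, region, account = parts[2], parts[3], parts[4]
    # Lambda function ARN: arn:aws:lambda:<region>:<account>:function:<name>
    if service == "lambda" and parts[5] == "function":
        return "lambda"
    # Activity ARN: arn:aws:states:<region>:<account>:activity:<name>
    if service == "states" and parts[5] == "activity":
        return "activity"
    # Connector ARNs leave region and account empty,
    # e.g. arn:aws:states:::dynamodb:getItem
    if service == "states" and region == "" and account == "" and len(parts) >= 7:
        return f"connector:{parts[5]}:{parts[6]}"
    return "unknown"


if __name__ == "__main__":
    for arn in (
        "arn:aws:lambda:us-east-1:123456789012:function:my-fn",
        "arn:aws:states:us-east-1:123456789012:activity:my-worker",
        "arn:aws:states:::dynamodb:getItem",
    ):
        print(arn, "->", classify_resource(arn))
```

The only thing that distinguishes a Connector here is the empty region and account slots followed by a service name and an operation name, which is all the example ARN in this post tells us.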
This notion of short strings that identify a “unit of information or service” is a straightforward use of URIs, and shouldn’t be surprising to anyone who understands how the Web works.
In most cases, the implementation is simple enough; the service just feeds the appropriate input data to the indicated API. But a couple of the new Connectors go further, for example running a Batch job in synchronous mode. It turns out that the Batch service only has a fire-and-forget API, so what the service does in this case is write a rule into the caller’s CloudWatch Events account which catches Batch’s job-finished event and routes it to an SQS queue. Step Functions keeps a long-poll posted on that queue to find out when the work is done.
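For a feel of what such a rule might look like, here is a hedged Python sketch. The exact rule Step Functions writes isn't described here, so this is a guess at its shape: the field names (“aws.batch”, “Batch Job State Change”, jobId, status) reflect my understanding of the events Batch emits, and both function names and the sample job ID are hypothetical. The matcher implements only a tiny subset of event-pattern semantics.

```python
def batch_done_pattern(job_id: str) -> dict:
    """Build an event pattern that catches one Batch job finishing.

    Assumption: field names follow the Batch job-state-change event shape.
    """
    return {
        "source": ["aws.batch"],
        "detail-type": ["Batch Job State Change"],
        "detail": {"jobId": [job_id], "status": ["SUCCEEDED", "FAILED"]},
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: each pattern field must exist in the event,
    nested objects recurse, and a leaf list enumerates the allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    pattern = batch_done_pattern("job-1234")
    running = {
        "source": "aws.batch",
        "detail-type": "Batch Job State Change",
        "detail": {"jobId": "job-1234", "status": "RUNNING"},
    }
    finished = dict(running, detail={"jobId": "job-1234", "status": "SUCCEEDED"})
    print(matches(pattern, running), matches(pattern, finished))
```

In the real setup the rule's target would be the SQS queue, and the long-poll on that queue is what turns a fire-and-forget API into something that looks synchronous.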
Once again, customers could previously have set this up for themselves (and in fact some have) but it just makes more sense to offer it as a built-in.
Parameterization · Step Functions has always had a tool, the “InputPath” field, to filter incoming input and select bits and pieces of it to feed to workers. When we started working on the first few Connectors, we realized it wasn’t quite up to the task of assembling the correct input for an arbitrary collection of API calls. We were at risk of replacing the dumb little Lambdas that existed just to call APIs with even dumber little Lambdas that just wrangled JSON into the right shape to call the API.
Thus the States Language’s brand-new “Parameters” field. To explain this, I’m going to re-use the example from Jeff’s blog linked above:
 1 "Read Next Message from DynamoDB": {
 2   "Type": "Task",
 3   "Resource": "arn:aws:states:::dynamodb:getItem",
 4   "Parameters": {
 5     "TableName": "StepDemoStack-DDBTable-1DKVAVTZ1QTSH",
 6     "Key": {
 7       "MessageId": {"S.$": "$.List[0]"}
 8     }
 9   },
10   "ResultPath": "$.DynamoDB",
11   "Next": "Send Message to SQS"
12 }
You can see the magic read-from-Dynamo ARN there in the Resource field on line 3. But it’s the Parameters field value that’s interesting. It has the right JSON shape to hold the GetItem API arguments, but buried down in the Key field on line 7 there’s a little weirdness going on. It turns out that DynamoDB wants you to pass a string argument by sending JSON that looks like:
"S": "My own personal string key"
In line 7, you see a field whose name is “S.$” and value is “$.List[0]”. That “.$” suffix is the new thing; whenever you see that in the Parameters block, it means that the value is to be interpreted as a JSONPath, applied to the state’s input, and then the whole field is replaced by one whose name is the same with the “.$” suffix removed, and whose value is whatever you got from the JSONPath.
[Tim, what if their API already uses a field whose name ends in “.$”? -Ed.]
[That would be unfortunate. -T]
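The “.$” rule is small enough to sketch in a few lines of Python. This is an illustration of the substitution as described above, not the service’s code: eval_path and resolve_parameters are hypothetical names, the path evaluator handles only a tiny JSONPath subset (“$”, “.name” steps, and “[n]” indexes), and only nested objects are walked. The sample input value is invented.

```python
import re
from typing import Any

# One path step: either ".name" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def eval_path(path: str, data: Any) -> Any:
    """Evaluate a tiny JSONPath subset against data (illustrative only)."""
    if not path.startswith("$"):
        raise ValueError(f"not a JSONPath: {path!r}")
    rest, pos, current = path[1:], 0, data
    while pos < len(rest):
        m = _STEP.match(rest, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax in {path!r}")
        if m.group(1) is not None:
            current = current[m.group(1)]      # ".name" step
        else:
            current = current[int(m.group(2))]  # "[n]" step
        pos = m.end()
    return current


def resolve_parameters(template: Any, state_input: Any) -> Any:
    """Replace each "name.$" field with "name" set to the path's result."""
    if not isinstance(template, dict):
        return template
    resolved = {}
    for name, value in template.items():
        if name.endswith(".$"):
            resolved[name[:-2]] = eval_path(value, state_input)
        else:
            resolved[name] = resolve_parameters(value, state_input)
    return resolved


if __name__ == "__main__":
    parameters = {
        "TableName": "StepDemoStack-DDBTable-1DKVAVTZ1QTSH",
        "Key": {"MessageId": {"S.$": "$.List[0]"}},
    }
    print(resolve_parameters(parameters, {"List": ["msg-001", "msg-002"]}))
```

With the input shown, the “S.$” field becomes “S” with the value “msg-001”, which is exactly the shape DynamoDB wants for a string key.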
We think this should give people most of what they need to compose the arguments for almost anything they might want to invoke. By the way, the Parameters-block idea wasn’t mine, it was cooked up by folks on the Step Functions team, notably Ali Baghani. And because it’s not mine, I can say: Way cool!
Permissions · If you go to the console and set up a state machine like the one in the example, we can do an extra trick, namely look at all the Connectors in your machine, figure out what permissions you need to make those calls, and synthesize a Role for you, designed to be used in running that state machine.
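I can’t show the console’s actual logic, but the idea can be sketched in Python under some loud assumptions: the ACTIONS table below is a hand-written, hypothetical mapping from a few Connector ARNs to the IAM actions I’d expect them to need, synthesize_policy is a made-up name, and the policy uses a wildcard Resource where a real role would presumably be scoped to specific tables, topics, and queues.

```python
import json

# Hypothetical mapping from Connector ARN to the IAM action it would need.
ACTIONS = {
    "arn:aws:states:::dynamodb:getItem": "dynamodb:GetItem",
    "arn:aws:states:::sqs:sendMessage": "sqs:SendMessage",
    "arn:aws:states:::sns:publish": "sns:Publish",
}


def synthesize_policy(machine: dict) -> dict:
    """Collect the actions implied by a state machine's Connector tasks
    and wrap them in an IAM policy document (illustrative sketch)."""
    actions = sorted(
        {
            ACTIONS[state["Resource"]]
            for state in machine.get("States", {}).values()
            if state.get("Type") == "Task" and state.get("Resource") in ACTIONS
        }
    )
    return {
        "Version": "2012-10-17",
        # Assumption: "*" keeps the sketch short; a real role should be narrower.
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


if __name__ == "__main__":
    machine = {
        "StartAt": "Read Next Message from DynamoDB",
        "States": {
            "Read Next Message from DynamoDB": {
                "Type": "Task",
                "Resource": "arn:aws:states:::dynamodb:getItem",
                "Next": "Send Message to SQS",
            },
            "Send Message to SQS": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sqs:sendMessage",
                "End": True,
            },
        },
    }
    print(json.dumps(synthesize_policy(machine), indent=2))
```

Lambda and Activity tasks fall through the ACTIONS lookup untouched, so the sketch only ever grants what the Connectors in the machine imply.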
API mapping? · Now, if you look at the way we’ve provided mappings for a few of the AWS APIs, you might reasonably wonder “Why not all of them?”. After all, I notice that Diego ZoracKy recently published a general-purpose Lambda function that does just that — give it the API name and the right arguments and it’ll do the call.
It’s not a crazy idea, but we’re not going to do it. Blindly calling APIs without having thought it through a little could be a recipe for unhappiness. We want to make sure that when we make those calls, we’re being sensible about buffering, polling mode, retrying, checking for impossible arguments, and so on. For example, we support sending a message to SQS, but receiving one will require some head-scratching about whether and when to delete after a successful read.
Also, in some cases we might want to do prep work, as we already do to make those asynchronous fire-and-forget APIs look synchronous when you put them in a state machine.
So, we’re going to reserve the right to add Connectors at our own speed and in a thoughtful way.
Future directions · The fact that we identify workers with URIs leaves the door open for any kind of future Connector you can think of. There are lots of obvious candidates in the AWS SDK, to start with. One especially obvious one is supporting the ARNs of other Step Functions state machines, giving you nested child-workflow invocations.
Another is allowing the use of any old HTTP endpoint URL, so your state machine could talk to any Web API in the universe. I suppose we’ll need to add an "HTTP-method" field or equivalent to specify whether you want GET or POST.
Anyhow, there are lots of Step Functions features on the road map that aren’t just Connectors; but I suspect that we’re going to come under pressure to keep adding them, starting today and going on more or less forever.
States-Language housekeeping · The States Language spec has been updated, and so has its source on GitHub. The statelint state-machine validator has been updated too, along with its Ruby Gem.
All these years into my career, I still get a little thrill being part of the Launch Day dance, a few little pushes with my own pinkies.
I tested my parts pretty hard but there might be bugs; I do look at pull requests and have taken some in the past. In particular I have to say thank-you for jezhiggins’ updates to the supporting j2119 parser generator that made it possible to add Parameters validation to statelint.
Comment feed for ongoing:
From: Dan Kibler (Dec 03 2018, at 15:06)
These are some welcome additions. I'm glad you took the trouble to make them reliable and easy to use (e.g., security).
An obvious extension useful to many would be to write to and from S3.
As I alluded to in our brief Twitter exchange, increasing the max size of the data that can be passed into or between steps would be very useful too.