Cognitive Artificial Intelligence

Cloud Vendor Based NoOps

Use Cases

Transcription
Diarization
Language Detection

AWS (Amazon Web Services)

Amazon Transcribe
Prerequisites: a valid, activated AWS account with permissions to use the Amazon Transcribe cognitive service. Transcribe reads its input media from Amazon S3, so S3 permissions are needed too.

Prepare to configure the AWS CLI
NB. Do not use the AWS account root user access key. It gives full access to all resources for all AWS services, including billing information, and its permissions cannot be reduced.
1. Create a GROUP in the IAM console, such as cognitive, and attach the AmazonTranscribeFullAccess and AmazonS3FullAccess managed policies to it (AWS docs: create-admin-group)
  The console lets you select one or more policies to attach; each group can have up to 10 managed policies attached.
2. Create a USER in the console, such as aiuser, add it to the GROUP, and save the credentials.csv file; store it securely and keep it secret (AWS docs: create-admin-user)
3. Set a console PASSWORD for the user (AWS docs: aws-password)
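Steps 1 and 2 can also be scripted with the IAM commands of the AWS CLI. This is a minimal sketch, not the procedure used above: setup_transcribe_user is a hypothetical function name, and it assumes it runs under an identity that already has IAM administration rights (never the root user's keys).

```shell
# Hypothetical helper: CLI equivalent of console steps 1 and 2.
# Assumes the calling identity may administer IAM (never the root user's keys).
setup_transcribe_user() {
  local group="${1:-cognitive}" user="${2:-aiuser}" policy
  aws iam create-group --group-name "$group" || return
  # Attach the two AWS managed policies named in step 1.
  for policy in AmazonTranscribeFullAccess AmazonS3FullAccess; do
    aws iam attach-group-policy --group-name "$group" \
      --policy-arn "arn:aws:iam::aws:policy/$policy" || return
  done
  aws iam create-user --user-name "$user" || return
  aws iam add-user-to-group --group-name "$group" --user-name "$user" || return
  # Prints the new access key ID and secret; keep them as secret as credentials.csv.
  aws iam create-access-key --user-name "$user"
}
```

For example: $ setup_transcribe_user cognitive aiuser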

Run the aws configure command to configure the AWS CLI with the access keys of the USER (aiuser).
NB. The command prompts for the access key ID, secret access key, default AWS Region, and output format, and stores them in a profile ("default" unless --profile is given). That profile is used whenever an AWS CLI command runs without explicitly specifying another profile.

$ aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key     ****************MYVZ shared-credentials-file    
secret_key     ****************nEac shared-credentials-file    
    region                <not set>             None    None
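A non-interactive alternative to the aws configure prompts, sketched under the assumption that the keys from credentials.csv have been exported as AI_KEY_ID and AI_SECRET (hypothetical variable names). It writes a named profile with aws configure set and also sets the default Region, which is <not set> in the listing above.

```shell
# Hypothetical helper: non-interactive alternative to the aws configure prompts.
# AI_KEY_ID and AI_SECRET are assumed variable names holding the keys
# from credentials.csv.
configure_profile() {
  local profile="${1:-aiuser}" region="${2:-us-east-2}"
  if [ -z "${AI_KEY_ID:-}" ] || [ -z "${AI_SECRET:-}" ]; then
    echo "AI_KEY_ID and AI_SECRET must be set" >&2
    return 1
  fi
  aws configure set aws_access_key_id "$AI_KEY_ID" --profile "$profile" || return
  aws configure set aws_secret_access_key "$AI_SECRET" --profile "$profile" || return
  # A default region replaces the "<not set>" region in the listing above.
  aws configure set region "$region" --profile "$profile" || return
  aws configure set output json --profile "$profile"
}
```

Afterwards, export AWS_PROFILE=aiuser makes that profile the default for the current shell.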

Create S3 Bucket

In this example the bucket is named blobbucket and is private, and its LocationConstraint is set to the Region given with --region. Bucket names are globally unique across all AWS accounts, so choose your own name.

$ aws s3api create-bucket --bucket blobbucket --acl private --region us-east-2 --create-bucket-configuration LocationConstraint=us-east-2
http://blobbucket.s3.amazonaws.com/
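A Region-aware variant of the bucket creation above, as a minimal sketch (create_private_bucket is a hypothetical name). Amazon S3 rejects an explicit LocationConstraint of us-east-1, so the create-bucket-configuration option is only passed for other Regions; put-public-access-block then blocks public ACLs and bucket policies in addition to the default private ACL.

```shell
# Hypothetical helper: create a bucket in any Region and block public access.
create_private_bucket() {
  local bucket="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 does not accept an explicit LocationConstraint.
    aws s3api create-bucket --bucket "$bucket" --region "$region" || return
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region" || return
  fi
  # Block public ACLs and bucket policies on top of the default private ACL.
  aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
}
```

For example: $ create_private_bucket blobbucket us-east-2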

Upload files to the S3 Bucket (s3 and s3api commands)

$ aws s3 cp --recursive ../data/ s3://blobbucket/

$ aws s3api put-object --bucket blobbucket --key texttyped1.png --body ../data/texttyped1.png --acl private

List objects (files) in the S3 Bucket (s3 and s3api commands)

$ aws s3 ls s3://blobbucket

$ aws s3api list-objects --bucket blobbucket --query 'Contents[].{Key: Key}' | jq -r '.[].Key'
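To upload only audio from ../data rather than every file, the recursive copy can be filtered. Filters that appear later on the command line take precedence, so excluding everything and then re-including the media extensions keeps just those. A minimal sketch with hypothetical helper names; list_keys uses the CLI's own --query and text output in place of jq.

```shell
# Hypothetical helper: upload only the media types listed for Transcribe.
upload_media() {
  local src="$1" bucket="$2"
  # Exclude everything first, then re-include the media extensions.
  aws s3 cp --recursive "$src" "s3://$bucket/" --exclude "*" \
    --include "*.flac" --include "*.mp3" --include "*.mp4" --include "*.wav"
}

# Hypothetical helper: print one object key per line without jq.
list_keys() {
  # Text output separates the keys with tabs; turn them into newlines.
  aws s3api list-objects-v2 --bucket "$1" --query 'Contents[].Key' --output text |
    tr '\t' '\n'
}
```

For example: $ upload_media ../data blobbucket && list_keys blobbucket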

Unauthenticated HTTP access to the bucket (http://blobbucket.s3.amazonaws.com/) is denied:

<Error>
   <Code>AccessDenied</Code>
   <Message>Access Denied</Message>
   <RequestId>090832BE4B92F4DC</RequestId>
   <HostId>
      27Ec+Sx6rPwGJFpWIQ4ktZrdlG5m710m+yUKjXJ9IfWE3GWXde6e2OdaY0OdKnV6Y3NEUSOI4iw=
   </HostId>
</Error>
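The same check can be scripted by looking only at the HTTP status code: an anonymous request to an existing private bucket should be answered with 403, as in the AccessDenied response above. A minimal sketch; assert_bucket_private is a hypothetical name, and it uses HTTPS rather than plain HTTP.

```shell
# Hypothetical check: an anonymous request to an existing private bucket
# should be answered with HTTP 403 (AccessDenied).
assert_bucket_private() {
  local code
  # -w prints only the status code; the response body is discarded.
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://$1.s3.amazonaws.com/")
  if [ "$code" = "403" ]; then
    echo "private: anonymous access denied"
  else
    echo "unexpected HTTP status: $code" >&2
    return 1
  fi
}
```

For example: $ assert_bucket_private blobbucket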

Transcription

References (AWS documentation): transcribe, transcribe-input, API_StartTranscriptionJob
Input media must be a FLAC, MP3, MP4, or WAV file.
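The MediaFormat field of the request must name the file's format. A minimal sketch of a hypothetical helper that derives it from the file extension and rejects anything outside the four formats listed above; later versions of the service accept additional formats, so extend the case list from the transcribe-input page if needed.

```shell
# Hypothetical helper: print the MediaFormat value for a file, based on its
# extension (case-insensitive), or fail for formats not in the list above.
media_format() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    flac|mp3|mp4|wav) printf '%s\n' "$ext" ;;
    *) echo "unsupported media format: $1" >&2; return 1 ;;
  esac
}
```

For example: $ media_format ../data/audio2.wav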

Verify that the file is in the S3 bucket; if it is not, copy it there

$ aws s3 ls s3://blobbucket/audio2.wav || aws s3 cp ../data/audio2.wav s3://blobbucket/audio2.wav
upload: ../data/audio2.wav to s3://blobbucket/audio2.wav
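aws s3 ls matches by key prefix, so the check above would also succeed if only, say, audio2.wav.bak existed. A minimal sketch that checks the exact key with s3api head-object instead; ensure_in_s3 is a hypothetical name, and the key defaults to the file's base name.

```shell
# Hypothetical helper: upload a file unless the exact key already exists.
# Any head-object failure (404, or 403 without read permission) leads to
# an upload attempt.
ensure_in_s3() {
  local file="$1" bucket="$2" key="${3:-$(basename "$1")}"
  if aws s3api head-object --bucket "$bucket" --key "$key" >/dev/null 2>&1; then
    echo "exists: s3://$bucket/$key"
  else
    aws s3 cp "$file" "s3://$bucket/$key"
  fi
}
```

For example: $ ensure_in_s3 ../data/audio2.wav blobbucket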

Create JSON formatted request file (request.json)

$ JOBNO=$RANDOM

$ cat <<-EOD > request.json
	{ "TranscriptionJobName": "job$JOBNO", "LanguageCode": "en-US", "MediaFormat": "wav", "Media": { "MediaFileUri": "s3://blobbucket/audio2.wav" } }
EOD

$ cat request.json
{ "TranscriptionJobName": "job26816", "LanguageCode": "en-US", "MediaFormat": "wav", "Media": { "MediaFileUri": "s3://blobbucket/audio2.wav" } }
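The same request file can be produced with printf, which makes the job name, media URI, format and language code easy to vary between runs. A minimal sketch; write_request is a hypothetical name, and the values are inserted without JSON escaping, so they must not contain double quotes or backslashes.

```shell
# Hypothetical helper: print a StartTranscriptionJob request like request.json.
# Values are not JSON-escaped: no double quotes or backslashes allowed.
write_request() {
  local job="$1" uri="$2" format="${3:-wav}" lang="${4:-en-US}"
  printf '{ "TranscriptionJobName": "%s", "LanguageCode": "%s", "MediaFormat": "%s", "Media": { "MediaFileUri": "%s" } }\n' \
    "$job" "$lang" "$format" "$uri"
}
```

For example: $ write_request "job$JOBNO" s3://blobbucket/audio2.wav wav > request.json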

Submit the job (input: JSON file "request.json"; output: JSON file "result-start-$JOBNO.json")

$ aws transcribe start-transcription-job --region us-east-2 --cli-input-json file://request.json | tee result-start-$JOBNO.json
{
    "TranscriptionJob": {
        "TranscriptionJobName": "job26816",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "en-US",
        "MediaFormat": "wav",
        "Media": {
            "MediaFileUri": "s3://blobbucket/audio2.wav"
        },
        "CreationTime": 1570858854.632
    }
}
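The fields of request.json can also be passed as individual start-transcription-job options instead of --cli-input-json. A minimal sketch; start_job is a hypothetical name. The optional fifth argument adds --output-bucket-name so the transcript is written to your own bucket rather than to the service bucket reported as OutputLocationType SERVICE_BUCKET.

```shell
# Hypothetical helper: the request.json fields passed as CLI options.
start_job() {
  local job="$1" uri="$2" format="$3" region="${4:-us-east-2}" out_bucket="${5:-}"
  # Build the argument list in the function's positional parameters.
  set -- --region "$region" --transcription-job-name "$job" \
    --language-code en-US --media-format "$format" --media "MediaFileUri=$uri"
  if [ -n "$out_bucket" ]; then
    # Store the transcript in your own bucket instead of the service bucket.
    set -- "$@" --output-bucket-name "$out_bucket"
  fi
  aws transcribe start-transcription-job "$@"
}
```

For example: $ start_job "job$JOBNO" s3://blobbucket/audio2.wav wav us-east-2 blobbucket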

Check the progress of the job (while it runs, it is listed under status IN_PROGRESS)

$ aws transcribe list-transcription-jobs --region us-east-2 --status IN_PROGRESS | tee result-list-$JOBNO.json
{
    "Status": "IN_PROGRESS",
    "TranscriptionJobSummaries": [
        {
            "TranscriptionJobName": "job26816",
            "CreationTime": 1570858854.632,
            "LanguageCode": "en-US",
            "TranscriptionJobStatus": "IN_PROGRESS",
            "OutputLocationType": "SERVICE_BUCKET"
        }
    ]
}

When the job has finished, it no longer appears in the IN_PROGRESS list:

$ aws transcribe list-transcription-jobs --region us-east-2 --status IN_PROGRESS | tee result-list-$JOBNO.json
{
    "Status": "IN_PROGRESS",
    "TranscriptionJobSummaries": []
}
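Instead of re-running list-transcription-jobs by hand, a small polling loop can query the status of the single job with get-transcription-job and --query. A minimal sketch; wait_for_job and its 15-second default interval are hypothetical choices, while QUEUED, IN_PROGRESS, FAILED and COMPLETED are the values of the TranscriptionJobStatus field.

```shell
# Hypothetical helper: poll one job until it leaves QUEUED/IN_PROGRESS.
# Prints the final status; returns 0 only if it is COMPLETED.
wait_for_job() {
  local job="$1" region="${2:-us-east-2}" interval="${3:-15}" status
  while :; do
    status=$(aws transcribe get-transcription-job --region "$region" \
      --transcription-job-name "$job" \
      --query 'TranscriptionJob.TranscriptionJobStatus' --output text) || return
    case "$status" in
      QUEUED|IN_PROGRESS) sleep "$interval" ;;
      *) echo "$status"; [ "$status" = COMPLETED ]; return ;;
    esac
  done
}
```

For example: $ wait_for_job "job$JOBNO" us-east-2 && echo ready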

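Get details about the job. The TranscriptFileUri in the response is a presigned S3 URL that is only valid briefly (X-Amz-Expires=899 seconds, about 15 minutes, in this example); running get-transcription-job again returns a fresh URL.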

$ aws transcribe get-transcription-job --region us-east-2 --transcription-job-name "job26816" | tee result-get-$JOBNO.json
{
    "TranscriptionJob": {
        "TranscriptionJobName": "job26816",
        "TranscriptionJobStatus": "COMPLETED",
        "LanguageCode": "en-US",
        "MediaSampleRateHertz": 44100,
        "MediaFormat": "wav",
        "Media": {
            "MediaFileUri": "s3://blobbucket/audio2.wav"
        },
        "Transcript": {
            "TranscriptFileUri": "https://s3.us-east-2.amazonaws.com/aws-transcribe-us-east-2-prod/598691507898/job26816/92e124ae-d054-480f-a850-72d68a61bbc0/asrOutput.json?X-Amz-Security-Token=AgoJb3JpZ2luX2VjEPz%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMiJHMEUCIQC%2FU4Y2jp7gWkaTY8JQztfwfXxeSNIcQdQOMFxl4IVFhgIgdfv%2FtLHAXXgOYGjwZdVsAngpjlpRWGIV0sfEbwbfVq4q4wMI5v%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARABGgwyNDYzNjEzMjI3NTIiDPMx2qiU32KAOSeN%2Byq3A8o1hjejbXr%2B0odnwNNLleH2ve9oqLvb3k8HBQIDr0Oh2X9h277vD%2BoXI6ZgfL2NF2rPw3NtFaj25OYBbWdcRYfHNel6uJD8wq49a8oGGPh8GblmvfgpW9kqzP82L1NJaTxOoKOYpHi9aIo16G7ygMjwqqeQgNk3JOuIm4J6YMzNs3Gyp7aLOd180JGGTjgzkJ%2BWFD%2BCj5EG%2BZjjDO%2BWz7G7jqdk7Md498bWa%2BVjkEo3a2Kytc9v5W3X2tpJT%2ByqS3o%2FuoFUJj2f%2FbVhOV%2BoPvch7UWwz9spi0kO1pqp%2FivmZ%2B2e3VxYrTTUwMIfssW7r%2FZe755sRlUcjcMNDZk0UTJHA7VIv63VGLpI7VtCt0nXLylq8Hre1Y479Y83mz4ZF3PvQ%2B4ms3HSm62XNlDxjqfXnqhXxU69YZlMHf%2FysaAqQZWAUrecIDaGgsDUm0g5yLOKTsEDIMpmCp9e4fsiWTI44gQo2fKoxgyaSRW9nTx%2B%2FcMCQiN2Iutpl7A%2BRXjFu5qJVxg1wr%2Bh5aOSaIq%2FLsFUBFLtTpWnggmLerbP3Hdv%2BnFTJkAYTkxbU79FbIkkCL91FTQn5fwwgauF7QU6tAGjF8Oe7uXrvHiac3gSGKNbpB2GKa%2FzGdbXMmIbCnkENx0aoRSaB2kqq3oVGeNF70XJoa1xvLzLrml2YYmLpUKFyeEH6segX%2F0hkhF0d2Haegw27do4rLyoLFRnub58M0zQCWLc5aYoDo2R9fYoxwR%2BOFdmJJk7%2BoI6R44vURaLnhoR%2FD1C3wkq0kfqnMIZ7i3TQVl%2BlaSPX6XTqJxHNqgYuypw6tPXiP9MQvTGNhbIuVFh1zo%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20191012T055032Z&X-Amz-SignedHeaders=host&X-Amz-Expires=899&X-Amz-Credential=ASIATSXCHOUAOYARCSPL%2F20191012%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Signature=0ffe7b88113dacb547f1861105c21fcebbf3022f70dc831f6a1193c544a4d732"
        },
        "CreationTime": 1570858854.632,
        "CompletionTime": 1570858902.423,
        "Settings": {
            "ChannelIdentification": false
        }
    }
}

Retrieve the JSON output file from the Job

$ wget "$(jq -r '.TranscriptionJob.Transcript.TranscriptFileUri' result-get-$JOBNO.json)" --output-document=results-output-job26816.json
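Because the presigned TranscriptFileUri expires after about 15 minutes, it can be safer to request a fresh URL and download it in one step. A minimal sketch; download_transcript is a hypothetical name, it uses curl instead of wget, and the default output file name follows the results-output-<job>.json pattern used above.

```shell
# Hypothetical helper: fetch a fresh presigned URL and download it at once.
download_transcript() {
  local job="$1" region="${2:-us-east-2}" out="${3:-results-output-$1.json}" uri
  uri=$(aws transcribe get-transcription-job --region "$region" \
    --transcription-job-name "$job" \
    --query 'TranscriptionJob.Transcript.TranscriptFileUri' --output text) || return
  # -f: fail on HTTP errors (e.g. 403 for an expired URL) instead of
  # saving the error document as if it were the transcript.
  curl -fsS -o "$out" "$uri"
}
```

For example: $ download_transcript job26816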

Review the transcription result from the job (job name, status, and transcript)

$ jq -r '.jobName,.status,.results.transcripts[0].transcript' results-output-job26816.json
job26816
COMPLETED
checking in with another show for H p. R. Um, In the car on my way to a client's gonna be a short show. I'm think I'm gonna be there in 10 minutes, but I want to do, you know, shoot something up the flagpole here, uh, wanted to talk about the state of podcasting these days. These days, I I sound old because in podcasting terms, I am. I've been around since 4 4000 Started producing shows since 2005. Have been listening to podcasts and daily since 2004. I came across, um, my own archives from shows that I used to download back then and listen to which I had burned to a CD, and I've put them on my nads. And I've started streaming them while at work the last couple of weeks and I've had a ball listening to old podcast episodes of
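Besides the full transcript, the output JSON holds one entry per recognised word under .results.items, with start and end times and a confidence value (punctuation entries have no timestamps). A minimal jq sketch that prints start time, confidence and word per line; word_timings is a hypothetical name.

```shell
# Hypothetical helper: one tab-separated line per recognised word
# (start time, confidence, word). Punctuation items are skipped.
word_timings() {
  jq -r '.results.items[]
         | select(.type == "pronunciation")
         | "\(.start_time)\t\(.alternatives[0].confidence)\t\(.alternatives[0].content)"' "$1"
}
```

For example: $ word_timings results-output-job26816.json | head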

Diarization

References (AWS documentation): diarization, API_StartTranscriptionJob
To turn on speaker identification, set the ShowSpeakerLabels and MaxSpeakerLabels fields of the Settings object when calling the StartTranscriptionJob operation.
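On the command line, the Settings object can also be given in shorthand syntax with --settings instead of a JSON file. A minimal sketch; start_diarization_job is a hypothetical name, and MaxSpeakerLabels defaults to 2 as in the request below.

```shell
# Hypothetical helper: start a job with speaker identification turned on.
start_diarization_job() {
  local job="$1" uri="$2" format="$3" speakers="${4:-2}" region="${5:-us-east-2}"
  aws transcribe start-transcription-job --region "$region" \
    --transcription-job-name "$job" --language-code en-US \
    --media-format "$format" --media "MediaFileUri=$uri" \
    --settings "ShowSpeakerLabels=true,MaxSpeakerLabels=$speakers"
}
```

For example: $ start_diarization_job job28912 s3://blobbucket/audio2.wav wav 2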

Verify that the file is in the S3 bucket; if it is not, copy it there

$ aws s3 ls s3://blobbucket/audio2.wav || aws s3 cp ../data/audio2.wav s3://blobbucket/audio2.wav
upload: ../data/audio2.wav to s3://blobbucket/audio2.wav

Create JSON formatted request file (request.json)

$ JOBNO=28912

$ cat <<-EOD > request.json
{ "TranscriptionJobName": "job$JOBNO", "LanguageCode": "en-US", "Settings": { "MaxSpeakerLabels": 2, "ShowSpeakerLabels": true }, "MediaFormat": "wav", "Media": { "MediaFileUri": "s3://blobbucket/audio2.wav" } }
EOD

Submit the job (input: JSON file "request.json"; output: JSON file "result-start-$JOBNO.json")

$ aws transcribe start-transcription-job --region us-east-2 --cli-input-json file://request.json | tee result-start-$JOBNO.json
{
    "TranscriptionJob": {
        "TranscriptionJobName": "job28912",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "en-US",
        "MediaFormat": "wav",
        "Media": {
            "MediaFileUri": "s3://blobbucket/audio2.wav"
        },
        "CreationTime": 1570871217.434,
        "Settings": {
            "ShowSpeakerLabels": true,
            "MaxSpeakerLabels": 2
        }
    }
}

Check the progress of the job (while it runs, it is listed under status IN_PROGRESS)

$ aws transcribe list-transcription-jobs --region us-east-2 --status IN_PROGRESS | tee result-list-$JOBNO.json
{
    "Status": "IN_PROGRESS",
    "TranscriptionJobSummaries": [
        {
            "TranscriptionJobName": "job28912",
            "CreationTime": 1570871217.434,
            "LanguageCode": "en-US",
            "TranscriptionJobStatus": "IN_PROGRESS",
            "OutputLocationType": "SERVICE_BUCKET"
        }
    ]
}

When the job has finished, it no longer appears in the IN_PROGRESS list:

$ aws transcribe list-transcription-jobs --region us-east-2 --status IN_PROGRESS | tee result-list-$JOBNO.json
{
    "Status": "IN_PROGRESS",
    "TranscriptionJobSummaries": []
}

Get details about the Job

$ aws transcribe get-transcription-job --region us-east-2 --transcription-job-name "job28912" | tee result-get-$JOBNO.json
{
    "TranscriptionJob": {
        "TranscriptionJobName": "job28912",
        "TranscriptionJobStatus": "COMPLETED",
        "LanguageCode": "en-US",
        "MediaSampleRateHertz": 44100,
        "MediaFormat": "wav",
        "Media": {
            "MediaFileUri": "s3://blobbucket/audio2.wav"
        },
        "Transcript": {
            "TranscriptFileUri": "https://s3.us-east-2.amazonaws.com/aws-transcribe-us-east-2-prod/598691507898/job28912/d18357e9-b7f8-419e-b6c8-c37516a66f8f/asrOutput.json?X-Amz-Security-Token=AgoJb3JpZ2luX2VjEAEaCXVzLWVhc3QtMiJGMEQCIFc8TsgswisbjUbQEAkddBvqUCfu%2BjBl%2B9o30RWxKZwhAiAqYZWg9g%2BxAMXw2yzI2JLABe4h%2BlS4Xc%2B%2B4jQvSFZvRyrjAwjq%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAEaDDI0NjM2MTMyMjc1MiIM7gm%2FH9KjxBU60vG2KrcDJFfi1ztE13grWZYWJStMu%2BotR%2FSk%2FlD%2Fqi0%2BCxyjnIx9oYKG2LCARSPZKprNpXZloFI0A7i6%2FBXrnGf6P%2F9zbTNzeWZE3rxr3GTUjq067HMpNOoZnx3kLDnjRk1NE90CN7XS3VcHKha8eFiBdhtTiMvBmN7rpS%2BdiWpQH6cD3UGAyXkr17jswn3hsV8yc9DkrEZ5sRzVqDEaWHuRp8JczkHV07wGIPKlbFF%2F%2Blw%2Fs401PWKJKncqtVhYuwG97rzhloifNAdVEgs7u5Lip4SfBpFV1Lr%2B3%2FKTT2azj%2FJDKSEfVJLnzGwmDcL34Z88efajRyTobFTbrSufkA7v0fPLBxSURD87YgN7bQh%2B3WB3pxWl4rkKcw8r6QJgKHgBskSMWC2uHWjfUVji5RcRsAuSeedJLcrUdQ1NSsEI13Vkzr7oR8WYsiHz3rPmVsKCLfMgfifPMmU0MNqAcPBZhi3UlCJ0bh5nM7Bb2m9nMiTHk7kRdg1mbra8eckKFXUcHwFtdIcQwxTPdiuxkiM5eMc%2BsWCyaDLUZ7VZE2w%2BfOv6PyMXYj%2BVQKax9c%2F4V9Jirlv7MtfbiB41czC8oYbtBTq1ARgc5qb%2ByLnlbtOCr7yYRsUorkH0iz5WHepFpB73AHDcttA%2BtN2Irkz2FPWfV%2BBUrKMU7G012niqsDu8qXBZeIxQ2ZA7z0sCdaZBaJqODywo4o5CeH173FKhmy00YFvfXXTtSZHOy3XYxuf%2BEDFix1q6bfiRe18eNA5mCR%2BPwLoFSEUdB5eLiJFvFpM6MXAUrWpEal3%2FAvIzXcEqqb0RPAb6YZid1%2BDV%2FzjBO%2B7W9JzVUrgKKdI%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20191012T092107Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIATSXCHOUAPLLBLSNZ%2F20191012%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Signature=ee56417aac64d5428616d74a93bd82bd6c7ec48bb77eec4a2611637203daaf48"
        },
        "CreationTime": 1570871217.434,
        "CompletionTime": 1570871343.764,
        "Settings": {
            "ShowSpeakerLabels": true,
            "MaxSpeakerLabels": 2,
            "ChannelIdentification": false
        }
    }
}

Retrieve the JSON output file from the Job

$ wget "$(jq -r '.TranscriptionJob.Transcript.TranscriptFileUri' result-get-$JOBNO.json)" --output-document=results-output-job28912.json
--2019-10-12 13:22:20--  https://s3.us-east-2.amazonaws.com/aws-transcribe-us-east-2-prod/598691507898/job28912/d18357e9-b7f8-419e-b6c8-c37516a66f8f/asrOutput.json?X-Amz-Security-Token=AgoJb3JpZ2luX2VjEAEaCXVzLWVhc3QtMiJGMEQCIFc8TsgswisbjUbQEAkddBvqUCfu%2BjBl%2B9o30RWxKZwhAiAqYZWg9g%2BxAMXw2yzI2JLABe4h%2BlS4Xc%2B%2B4jQvSFZvRyrjAwjq%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAEaDDI0NjM2MTMyMjc1MiIM7gm%2FH9KjxBU60vG2KrcDJFfi1ztE13grWZYWJStMu%2BotR%2FSk%2FlD%2Fqi0%2BCxyjnIx9oYKG2LCARSPZKprNpXZloFI0A7i6%2FBXrnGf6P%2F9zbTNzeWZE3rxr3GTUjq067HMpNOoZnx3kLDnjRk1NE90CN7XS3VcHKha8eFiBdhtTiMvBmN7rpS%2BdiWpQH6cD3UGAyXkr17jswn3hsV8yc9DkrEZ5sRzVqDEaWHuRp8JczkHV07wGIPKlbFF%2F%2Blw%2Fs401PWKJKncqtVhYuwG97rzhloifNAdVEgs7u5Lip4SfBpFV1Lr%2B3%2FKTT2azj%2FJDKSEfVJLnzGwmDcL34Z88efajRyTobFTbrSufkA7v0fPLBxSURD87YgN7bQh%2B3WB3pxWl4rkKcw8r6QJgKHgBskSMWC2uHWjfUVji5RcRsAuSeedJLcrUdQ1NSsEI13Vkzr7oR8WYsiHz3rPmVsKCLfMgfifPMmU0MNqAcPBZhi3UlCJ0bh5nM7Bb2m9nMiTHk7kRdg1mbra8eckKFXUcHwFtdIcQwxTPdiuxkiM5eMc%2BsWCyaDLUZ7VZE2w%2BfOv6PyMXYj%2BVQKax9c%2F4V9Jirlv7MtfbiB41czC8oYbtBTq1ARgc5qb%2ByLnlbtOCr7yYRsUorkH0iz5WHepFpB73AHDcttA%2BtN2Irkz2FPWfV%2BBUrKMU7G012niqsDu8qXBZeIxQ2ZA7z0sCdaZBaJqODywo4o5CeH173FKhmy00YFvfXXTtSZHOy3XYxuf%2BEDFix1q6bfiRe18eNA5mCR%2BPwLoFSEUdB5eLiJFvFpM6MXAUrWpEal3%2FAvIzXcEqqb0RPAb6YZid1%2BDV%2FzjBO%2B7W9JzVUrgKKdI%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20191012T092107Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIATSXCHOUAPLLBLSNZ%2F20191012%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Signature=ee56417aac64d5428616d74a93bd82bd6c7ec48bb77eec4a2611637203daaf48
Resolving s3.us-east-2.amazonaws.com (s3.us-east-2.amazonaws.com)... 52.219.104.114
Connecting to s3.us-east-2.amazonaws.com (s3.us-east-2.amazonaws.com)|52.219.104.114|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30007 (29K) [application/octet-stream]
Saving to: 'results-output-job28912.json'

results-output-job28912.json                  100%[==============================================================================================>]  29.30K   143KB/s    in 0.2s    

2019-10-12 13:22:22 (143 KB/s) - 'results-output-job28912.json' saved [30007/30007]

Review the transcription result from the job (job name, status, number of speakers identified, and transcript)

$ jq -r '.jobName,.status,.results.speaker_labels.speakers,.results.transcripts[0].transcript' results-output-job28912.json
job28912
COMPLETED
1
checking in with another show for H p. R. Um, In the car on my way to a client's gonna be a short show. I'm think I'm gonna be there in 10 minutes, but I want to do, you know, shoot something up the flagpole here, uh, wanted to talk about the state of podcasting these days. These days, I I sound old because in podcasting terms, I am. I've been around since 4 4000 Started producing shows since 2005. Have been listening to podcasts and daily since 2004. I came across, um, my own archives from shows that I used to download back then and listen to which I had burned to a CD, and I've put them on my nads. And I've started streaming them while at work the last couple of weeks and I've had a ball listening to old podcast episodes of
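In the result above Transcribe identified a single speaker, even though MaxSpeakerLabels allowed two. When more speakers are found, the per-speaker time ranges are in .results.speaker_labels.segments. A minimal jq sketch that lists them and sums the speaking time per label; speaker_segments and speaker_time are hypothetical names.

```shell
# Hypothetical helper: label, start and end time of each speaker turn.
speaker_segments() {
  jq -r '.results.speaker_labels.segments[]
         | "\(.speaker_label)\t\(.start_time)\t\(.end_time)"' "$1"
}

# Hypothetical helper: total seconds per speaker label
# (times are stored as strings, so they are converted with tonumber).
speaker_time() {
  jq -r '.results.speaker_labels.segments
         | group_by(.speaker_label)
         | .[]
         | "\(.[0].speaker_label)\t\(map((.end_time | tonumber) - (.start_time | tonumber)) | add)"' "$1"
}
```

For example: $ speaker_time results-output-job28912.json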

Language Detection

N/A (not available in Amazon Transcribe at the time of writing)