How to Automatically Archive Twitch Streams - The Hard Way
Fri Jul 20, 2018 · 1781 words

If you’re like me, simply downloading Twitch VODs isn’t good enough. The major problem for me is that VODs have copyrighted content muted, so if the stream has a copyrighted song playing, an entire 5+ minute chunk of the video will be muted. You can get around this by recording the live stream to a local file with streamlink.
Pretty simple, until you want it to not be. Let’s make it difficult, with PowerShell scripting, timed scripts, and error handling!


In this post I’ll be using PowerShell, because I’m on Windows and scripting with it is pretty easy. There are a few nags along the way, though, which I’ll address later.

Step 1: Installing the necessary stuff

We’re going to use Streamlink in order to stream the Twitch stream directly to a file. Streamlink gives us a lot of options in the stream, such as desired quality.
If you don’t already have Python installed, you can get the latest release here.
Pip is my preferred Python package installer, and you can install streamlink with pip install streamlink. Pip installs all the necessary dependencies as well, which is nice.

FFmpeg

I use FFmpeg here to copy the streamed file into a final mp4 container. This isn’t really necessary, but I like to do it because the streamed file doesn’t show its length in Windows File Explorer and that bothers me. It also has the odd benefit of slightly reducing the file size (around 3% in my observations). Not much of an impact (like 30GB saved over 1000GB), but hey.
Rambling aside (this is my blog, I’m allowed to ramble, yeah?), installing FFmpeg is simple. Download the binary zip, and extract the bin/ffmpeg.exe file to your C:\Windows\System32\. That folder is already on your PATH, so ffmpeg works from any prompt without having to extract it to some odd folder and add that folder to PATH yourself.
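Before going further, it can be worth checking that both tools are actually reachable from PowerShell. This is a minimal sketch that only uses the built-in Get-Command; the $REPORT and $TOOL variable names are mine, not part of the script in this post.

```powershell
# Look each tool up on PATH and collect one status line per tool
$REPORT = foreach ($TOOL in "streamlink", "ffmpeg") {
    if (Get-Command $TOOL -ErrorAction SilentlyContinue) { "$TOOL found" }
    else { "$TOOL is NOT on PATH" }
}

# Print the result
$REPORT
```

If either line says NOT on PATH, fix that first; everything later in the post assumes both commands can be called by name.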

Get a Twitch API token

So this isn’t technically anything you need to install, but you do need this to get Twitch data through API calls.
Go to dev.twitch.tv/dashboard and register a new app. You can name it whatever you want, with any category, it doesn’t really matter. All you care about is the client ID, which is what you use to authenticate basic usage of the Twitch API.

Step 2: Getting data from Twitch

When saving the file, we can name it however we want. However, for my use, I want to name it based on the day (20180720), the time of the stream (AM or PM), and the game (Hitman). The first two can be found from built-in PowerShell commands, but the game we can pull from Twitch APIs.

While getting the game from Twitch is cool and all, what I actually want is to see if the channel is live, keep checking until they are live, and then get the game they’re playing and start downloading the stream.
Channel ID: Twitch APIs use channel IDs instead of the channel name (login name). There isn’t an easy way to look one up… unless you use wonderful APIs!

Let’s begin making our PowerShell script:

$CLIENTID="13ll40uu06spkju4haat5eeqcpsp61" # Client ID from Twitch Dev
$STREAMID="" # Leave blank if you don't know it, or fill it in
$LOGINNAME="moonmoon_ow" # Login of the user you want to get data on (can be left blank if streamid is filled in)

Now, if we don’t know the stream ID, we can get that with a simple curl API call

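if ($STREAMID.Length.Equals(0)) {
    # curl.exe rather than curl: in Windows PowerShell, plain curl is an alias for Invoke-WebRequest
    $STREAMID=curl.exe -s -H "Accept: application/vnd.twitchtv.v5+json" -H "Client-ID: $CLIENTID" -X GET "https://api.twitch.tv/kraken/users?login=$LOGINNAME"
    # Delete everything up to and including id":"
    $STREAMID=$STREAMID -replace "^.*id\`":\`"",""
    # Delete everything from ","name" to the end, leaving only the ID
    $STREAMID=$STREAMID -replace "\`",\`"name\`".*$",""
}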

This only runs if the STREAMID variable is blank. All the -replace text that looks like actual garbage is regex: because the API returns a JSON string, we have to sift through what’s given to get what we want (the stream ID).
Here’s what the curl returns before formatting:

{"_total":1,"users":[{"display_name":"MOONMOON_OW","_id":"121059319","name":"moonmoon_ow","type":"user","bio":"hi i am a video gamer i play the video games | twitter.com/moonmoon_ow","created_at":"2016-04-06T04:12:40.993797Z","updated_at":"2018-07-20T13:02:14.113217Z","logo":"picture url i cut off"}]}

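The first -replace takes the parameters "^.*id\`":\`"","", which can be broken down into several pieces of regex: ^ anchors the match at the start of the string, .* matches any run of characters, and id\`":\`" matches the literal text id":" (each `" is how PowerShell escapes a double quote inside a double-quoted string). The second parameter, "", replaces the whole match with nothing, so everything in front of the ID is deleted.

The next -replace takes the parameters "\`",\`"name\`".*$","", which matches the literal text ","name" plus everything after it up to the end of the string ($), and deletes that too. What’s left is just the ID.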
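To see the two -replace calls above in action without hitting the API, here is a small sketch that runs them on a shortened copy of the sample response (I trimmed the fields after "type" to keep it readable). It also shows, for comparison, the same ID pulled out with ConvertFrom-Json; that second route is an alternative I'm adding here, not what the script in this post does. The $RESPONSE, $BYREGEX and $BYJSON names are made up for the example.

```powershell
# Shortened copy of the sample response from above
$RESPONSE = '{"_total":1,"users":[{"display_name":"MOONMOON_OW","_id":"121059319","name":"moonmoon_ow","type":"user"}]}'

# Regex route: the same two -replace calls as in the script
$BYREGEX = $RESPONSE -replace "^.*id\`":\`"",""
$BYREGEX = $BYREGEX -replace "\`",\`"name\`".*$",""

# JSON route: parse the string, then walk down to the field
$BYJSON = ($RESPONSE | ConvertFrom-Json).users[0]._id

echo "regex: $BYREGEX"
echo "json:  $BYJSON"
```

Both lines print 121059319. The JSON route is less hard-way, but it doesn't depend on the order of the fields in the response.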

This is what I meant when I said doing it the hard way. Still getting data btw, hang in there!

Step 2.5: Making sure the channel is live, and getting the game name

To check if a channel is live, an API call can be made:

$DATA=curl.exe -s -H "Accept: application/vnd.twitchtv.v5+json" -H "Client-ID: $CLIENTID" -X GET "https://api.twitch.tv/kraken/streams/$STREAMID"

Which returns:

{"stream":{"_id":29559308352,"game":"Bloons TD 6","broadcast_platform":"live","community_id":"","community_ids":[],"viewers":10368,"video_height":900,"average_fps":60,"delay":0,"created_at":"2018-07-20T14:07:47Z","is_playlist":false,"stream_type":"live","preview":{"small":"https://static-cdn.jtvnw.net/previews-ttv/live_user_moonmoon_ow-80x45.jpg","medium":"https://static-cdn.jtvnw.net/previews-ttv/live_user_moonmoon_ow-320x180.jpg","large":"https://static-cdn.jtvnw.net/previews-ttv/live_user_moonmoon_ow-640x360.jpg","template":"https://static-cdn.jtvnw.net/previews-ttv/live_user_moonmoon_ow-{width}x{height}.jpg"},"channel":{"mature":true,"status":"NEW bloons :)","broadcaster_language":"en","display_name":"MOONMOON_OW","game":"Bloons TD 6","language":"en","_id":121059319,"name":"moonmoon_ow","created_at":"2016-04-06T04:12:40.993797Z","updated_at":"2018-07-20T18:01:32.044163Z","partner":true,"logo":"https://static-cdn.jtvnw.net/jtv_user_pictures/3973e918fe7cc8c8-profile_image-300x300.png","video_banner":"https://static-cdn.jtvnw.net/jtv_user_pictures/moonmoon_ow-channel_offline_image-2b3302e20384eee8-1920x1080.png","profile_banner":"https://static-cdn.jtvnw.net/jtv_user_pictures/moonmoon_ow-profile_banner-13fbfa1ba07bcd8a-480.png","profile_banner_background_color":"#ffffff","url":"https://www.twitch.tv/moonmoon_ow","views":37775223,"followers":622895,"broadcaster_type":"","description":"hi i am a video gamer i play the video games | twitter.com/moonmoon_ow","private_video":false,"privacy_options_enabled":false}}}

And all we need is "game":"Bloons TD 6" (but we can also use this curl to check if the channel is live).

If the channel is not live, all that returns from the above API call is

{"stream":null}

And since we don’t want to get rate-limited by Twitch, if the channel is not live, sleep for 30 seconds before trying again

if ($DATA.Contains("null")) { echo "Stream is not live, sleeping for 30s"; Sleep 30 }

I made this into a one-liner because it’s essentially just for sleeping, with an extra echo output so the user knows what’s going on.
Wrap the entire request and sleep in a do-while loop on $DATA.Contains("null") to keep repeating the curl request until the streamer is live.
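Here is a rough sketch of that loop which can be run offline. Get-StreamData is a hypothetical stand-in for the curl request (it pretends the channel is offline twice and then live), and $WAIT is shortened to 1 second so the demo finishes quickly; the real script sleeps for 30.

```powershell
# Hypothetical stand-in for the curl request: offline twice, then live
$script:CALLS = 0
function Get-StreamData {
    $script:CALLS++
    if ($script:CALLS -lt 3) { return '{"stream":null}' }
    return '{"stream":{"game":"Bloons TD 6"}}'
}

$WAIT = 1 # seconds between checks; the post uses 30

Do {
    $DATA = Get-StreamData
    if ($DATA.Contains("null")) { echo "Stream is not live, sleeping for ${WAIT}s"; Sleep $WAIT }
} While ($DATA.Contains("null"))

echo "Stream is live after $script:CALLS checks"
```

A do-while fits here because the request has to run at least once before there is anything to test.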

Here are the next two bits of replacin’ regex:

$DATA=$DATA -replace ",\`"broadcast_platform\`".*$",""
$DATA=$DATA -replace "^.*,\`"game","{`"game"
$DATA=$DATA + "}"

These work the same way as the previous regex uses; the only thing that changes is where the text I keep begins and ends. It’s important to note here that switching the first two commands around would result in wrong output, because there are two game instances in the data. By removing everything from broadcast_platform onward, which comes right after the first game, we guarantee the right one is selected. (Probably.) I also put a { on the front and a } on the end, because without them the text wouldn’t be proper JSON, and our next step wouldn’t work out very well.

Finally, we can convert the JSON string into a PowerShell object, with

$DATA=$DATA | ConvertFrom-Json

Which allows us to select the game with

$GAME=$DATA.game -replace " ","_"

(And also replace the spaces with _, which is entirely optional and only changes how the files are named)
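For comparison, here is a sketch of the less hard way: handing the whole response to ConvertFrom-Json and reading stream.game directly. This is an alternative I'm assuming would fit, not what the script in this post does. Get-GameName is a hypothetical helper, and the live sample is a trimmed copy of the response shown earlier. One side benefit is that checking the parsed stream field for $null can't be fooled by the word "null" turning up somewhere else in the data, such as a stream title.

```powershell
# Trimmed copies of the two responses shown earlier
$LIVE    = '{"stream":{"_id":29559308352,"game":"Bloons TD 6","broadcast_platform":"live","channel":{"game":"Bloons TD 6"}}}'
$OFFLINE = '{"stream":null}'

# Hypothetical helper: returns the game with spaces replaced, or $null if offline
function Get-GameName($Json) {
    $Parsed = $Json | ConvertFrom-Json
    if ($null -eq $Parsed.stream) { return $null }
    return $Parsed.stream.game -replace " ","_"
}

echo "live:    $(Get-GameName $LIVE)"
echo "offline: $(Get-GameName $OFFLINE)"
```

The live line prints Bloons_TD_6 and the offline line prints nothing after the label.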

Step 3: What are we working on, again?

All this for the name of the channel’s game. Now that we have what we need for the file naming, we can actually get around to downloading, storing, and scheduling.

Step 3.33: Downloading

The filenames can (and should) be what you want. This is based on my extremely specific use case. I’m naming my files based on the date, time, and game (which we went through all that trouble to get programmatically).

$FILE=(Get-Date -UFormat "%Y%m%d_%p") + "_" + $GAME
$FILETEMP=$FILE + "_t.mp4"
$FILE=$FILE + ".mp4"

If you want to use different time formats (e.g. %Y%m%d_%p turns into 20180720_PM), consult this list for the UNIX time format codes.
The temp file name is for the ffmpeg copy, which has to write the video into a new file for a bit. Don’t worry, we’ll change it back!
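To get a feel for the format codes, here is a small sketch that formats one fixed timestamp a few different ways. The timestamp is the created_at time from the sample response, used here as if it were local time, and the $WHEN, $HALFDAY, $CLOCK and $DASHED names are made up for the example. The %p result depends on your Windows region settings, so I've only noted the output for the other two.

```powershell
# A fixed moment so the output is repeatable
$WHEN = Get-Date -Year 2018 -Month 7 -Day 20 -Hour 14 -Minute 7 -Second 47

# The format used in this post: date plus AM/PM
$HALFDAY = Get-Date -Date $WHEN -UFormat "%Y%m%d_%p"

# Date plus 24-hour time, in case two streams land in the same half of the day
$CLOCK = Get-Date -Date $WHEN -UFormat "%Y%m%d_%H%M"   # 20180720_1407

# Date only, with dashes
$DASHED = Get-Date -Date $WHEN -UFormat "%Y-%m-%d"     # 2018-07-20

echo $HALFDAY $CLOCK $DASHED
```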

Step 3.66: Storing

To store, we just have to go into the directory we want to save it in, start streaming the stream to a file, and once the stream is done, use ffmpeg to copy it for my inane reason (see preface).

Use the basic cd to change into whatever directory you want (can even use variables).
Start streamlink with

streamlink -o $FILE twitch.tv/moonmoon_ow best

Once the stream is over, copy it into a fresh mp4 with ffmpeg (-c copy doesn’t re-encode anything)

ffmpeg -i $FILE -c copy $FILETEMP

Then just change the temp filename to be the new filename (and delete the old one)

rm $FILE
ren $FILETEMP $FILE
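One thing the plain rm and ren above don't cover is ffmpeg failing, in which case the original recording gets deleted anyway. Below is a sketch of a guard, assuming ffmpeg signals failure through a non-zero exit code, which PowerShell exposes as $LASTEXITCODE. Complete-Remux is a hypothetical helper of mine; it uses Remove-Item and Rename-Item, which are what rm and ren are aliases for on Windows.

```powershell
# Hypothetical helper: swap the copy in only if ffmpeg reported success
function Complete-Remux($ExitCode, $File, $FileTemp) {
    if ($ExitCode -eq 0 -and (Test-Path $FileTemp)) {
        Remove-Item $File
        Rename-Item $FileTemp $File
        return $true
    }
    # Write-Host keeps the message out of the function's return value
    Write-Host "ffmpeg failed (exit code $ExitCode), keeping $File as it is"
    return $false
}

# Intended use, right after the ffmpeg line from above:
#   ffmpeg -i $FILE -c copy $FILETEMP
#   Complete-Remux $LASTEXITCODE $FILE $FILETEMP
```

If ffmpeg fails, the streamed file stays untouched, so the worst case is a video without a length in File Explorer.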

And that’s it! If you skipped all the way down here, that’s fine. I wanted to document what I did, and what I was thinking for each step, as well as trying to give more understanding of what the code does.

Step 3.99: Scheduling

If you want to run the script automatically based on time (or something more complicated), the Windows Task Scheduler is what I’d use. All you have to do is set up some rules based on, say, the streamer’s schedule, and have it run. For more detail, check out this spiceworks article.
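If you'd rather set the task up from PowerShell than click through the Task Scheduler UI, the built-in ScheduledTasks cmdlets can do it. This is a minimal sketch: the script path, task name, and 9:00AM start time are all placeholders I made up, so swap in your own. Depending on your setup it may need to be run from an elevated prompt.

```powershell
# Placeholder script path; point this at wherever you saved the script
$ACTION = New-ScheduledTaskAction -Execute "powershell.exe" -Argument '-NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\archive-stream.ps1"'

# Placeholder schedule: every day at 9:00AM, a bit before the stream usually starts
$TRIGGER = New-ScheduledTaskTrigger -Daily -At "9:00AM"

# Register the task under a placeholder name
Register-ScheduledTask -TaskName "Archive Twitch stream" -Action $ACTION -Trigger $TRIGGER
```

Since the script already loops until the channel is live, the trigger only needs to fire some time before the stream starts.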

The end result

# Required Variables
$CLIENTID="" # Twitch API ID
$STREAMID="" # Stream ID of the stream, can be left blank
$LOGINNAME="" # Twitch login name of the channel you want to download
$DIR="" # Base directory to save the files

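if ($STREAMID.Length.Equals(0)) {
    # curl.exe rather than curl: in Windows PowerShell, plain curl is an alias for Invoke-WebRequest
    $STREAMID=curl.exe -s -H "Accept: application/vnd.twitchtv.v5+json" -H "Client-ID: $CLIENTID" -X GET "https://api.twitch.tv/kraken/users?login=$LOGINNAME"
    # Delete everything up to and including id":"
    $STREAMID=$STREAMID -replace "^.*id\`":\`"",""
    # Delete everything from ","name" to the end, leaving only the ID
    $STREAMID=$STREAMID -replace "\`",\`"name\`".*$",""
}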

Do {

$DATA=curl.exe -s -H "Accept: application/vnd.twitchtv.v5+json" -H "Client-ID: $CLIENTID" -X GET "https://api.twitch.tv/kraken/streams/$STREAMID"
# Get json data on current stream. If offline, result is a null json.
# The URL ends with the streamer's channel ID

if ($DATA.Contains("null")) { echo "Stream is not live, sleeping for 30s"; Sleep 30 }

} While ($DATA.Contains("null"))

$DATA=$DATA -replace ",\`"broadcast_platform\`".*$",""
# Delete everything after (and including) the broadcast_platform. Only thing we're interested in is if they're live and what game they're playing
$DATA=$DATA -replace "^.*,\`"game","{`"game"
# Remove some prefix data so it's just the game name and live status

$DATA=$DATA + "}"
$DATA=$DATA | ConvertFrom-Json
# Converts json output to PS object

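$GAME=$DATA.game -replace " ","_"
# Create the folder under the same underscored name that cd uses below; -Force means no error if it already exists
New-Item -ItemType Directory -Force -Path "$DIR/$GAME" | Out-Null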

$FILE=(Get-Date -UFormat "%Y%m%d_%p") + "_" + $GAME
$FILETEMP=$FILE + "_t.mp4"
$FILE=$FILE + ".mp4"
cd "$DIR/$GAME"


streamlink -o $FILE twitch.tv/$LOGINNAME best
ffmpeg -i $FILE -c copy $FILETEMP
rm $FILE
ren $FILETEMP $FILE

View the up-to-date version on GitHub

Thanks for reading!

