Twitter Export Script

Adam Franco October 13th, 2008

I have been using Twitter as a log of my daily doings and wished to export my time-line for reformatting into a calendar format. Unfortunately, TweetDumpr just retrieves the list of Tweets using a single fetch request, which the Twitter API limits to a maximum of 200 Tweets. (Update: apparently TweetDumpr can get more than 200 Tweets. It just didn’t say so in its description.)

I wanted to export all 600+ of my tweets, so I wrote the following little PHP script to accomplish this. I have not yet tested it with many concurrent users or added a form to select which user to export. Until I do so, I won’t be providing it as an end-user service. You are free to put it on your own machine and use it, though.

twitter_export.php

<?php
/**
 * This script will allow the export of complete user time-lines from the twitter
 * service. It joins together all pages of status updates into one large XML block
 * that can then be reformatted/processed with other tools.
 *
 * @since 10/13/08
 *
 * @copyright Copyright © 2008, Adam Franco
 * @license http://www.gnu.org/copyleft/gpl.html GNU General Public License (GPL)
 */

$user = 'afranco_work';	// Replace this with your user name.

header('Content-type: text/plain');

$allDoc = new DOMDocument;
$root = $allDoc->appendChild($allDoc->createElement('statuses'));
$root->setAttribute('type', 'array');

$page = 1;
do {
	$numStatus = 0;

	$pageDoc = new DOMDocument;
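	// Fetch one page of the timeline. The @ suppresses PHP warnings so that
	// a failed request is reported by the check below instead.
	$res = @$pageDoc->load('http://twitter.com/statuses/user_timeline/'.$user.'.xml?page='.$page);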
	if (!$res) {
		print "\n\n**** Error loading page $page ****";
		exit;
	}
	foreach ($pageDoc->getElementsByTagName('status') as $status) {
		$root->appendChild($allDoc->createTextNode("\n"));
		$root->appendChild($allDoc->importNode($status, true));
		$numStatus++;
	}

	print "\nLoaded page $page with $numStatus status updates.";
	flush();

	$page++;
	sleep(1);	// Pause for a second between requests.

} while ($numStatus);	// Stop at the first page that returns no status updates.

print "\nDone loading timeline.";
print "\n\n\n";

$root->appendChild($allDoc->createTextNode("\n"));
print $allDoc->saveXml();
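
The merging step in the script can be tried without touching the network. The following is a minimal sketch, assuming PHP 7 or later; the function name mergeStatusPages() and the two sample pages are made up for illustration and are not part of the original script. It shows how importNode() copies each status element from a page document into the combined document.

```php
<?php
// Hypothetical helper (not part of the original script): merge several
// page documents, given as XML strings, into one <statuses> document.
function mergeStatusPages(array $pages): DOMDocument
{
    $allDoc = new DOMDocument;
    $root = $allDoc->appendChild($allDoc->createElement('statuses'));
    $root->setAttribute('type', 'array');

    foreach ($pages as $xml) {
        $pageDoc = new DOMDocument;
        if (!$pageDoc->loadXML($xml)) {
            throw new RuntimeException('Could not parse page XML');
        }
        foreach ($pageDoc->getElementsByTagName('status') as $status) {
            // A node belongs to the document that created it, so it has to
            // be imported into $allDoc before it can be appended there.
            // The second argument (true) makes the copy deep.
            $root->appendChild($allDoc->createTextNode("\n"));
            $root->appendChild($allDoc->importNode($status, true));
        }
    }
    $root->appendChild($allDoc->createTextNode("\n"));
    return $allDoc;
}

// Made-up sample data standing in for two fetched pages.
$page1 = '<statuses type="array"><status><id>2</id><text>second</text></status></statuses>';
$page2 = '<statuses type="array"><status><id>1</id><text>first</text></status></statuses>';

$merged = mergeStatusPages([$page1, $page2]);
print $merged->saveXML();
```

The statuses end up in the combined document in the order the pages are passed in, which for the script above means newest first.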



Usage (assuming PHP is installed)

  1. Save the code above on your machine as twitter_export.php
  2. Edit the code to change the $user variable to be your own Twitter username
  3. From the command line run php twitter_export.php
  4. Copy/paste the XML output into a file for safe keeping and further processing
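
As one idea for the further processing in step 4, here is a rough sketch of grouping the exported statuses by day, which is a first step toward a calendar layout. It assumes PHP 7.1 or later, and it assumes each status element has a created_at child formatted like "Mon Oct 13 14:52:03 +0000 2008" plus a text child; check your own export and adjust if it differs. The function names and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: text of the first direct child element with the given
// name. Only direct children are checked so that a nested element (such as
// <user>) carrying a child of the same name cannot be picked up by mistake.
function childText(DOMElement $parent, string $name): ?string
{
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === $name) {
            return trim($child->textContent);
        }
    }
    return null;
}

// Hypothetical helper: returns ['YYYY-MM-DD' => ['HH:MM text', ...], ...].
// Days and times use the UTC offset stored in each timestamp.
function groupStatusesByDay(string $xml): array
{
    $doc = new DOMDocument;
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Could not parse export XML');
    }

    $days = [];
    foreach ($doc->getElementsByTagName('status') as $status) {
        $created = childText($status, 'created_at');
        $text = childText($status, 'text');
        if ($created === null || $text === null) {
            continue; // skip entries missing the assumed fields
        }
        $when = DateTime::createFromFormat('D M d H:i:s O Y', $created);
        if (!$when) {
            continue; // skip timestamps that do not match the assumed format
        }
        $days[$when->format('Y-m-d')][] = $when->format('H:i') . ' ' . $text;
    }

    ksort($days); // oldest day first
    foreach ($days as &$entries) {
        sort($entries); // entries start with HH:MM, so this orders them by time
    }
    unset($entries);
    return $days;
}

// Made-up sample data in the assumed format.
$sample = '<statuses type="array">'
    . '<status><created_at>Mon Oct 13 14:52:03 +0000 2008</created_at><text>Wrote export script</text></status>'
    . '<status><created_at>Sun Oct 12 09:05:00 +0000 2008</created_at><text>Coffee</text></status>'
    . '</statuses>';

foreach (groupStatusesByDay($sample) as $day => $entries) {
    print $day . "\n";
    foreach ($entries as $entry) {
        print '  ' . $entry . "\n";
    }
}
```

To run this on a real export, pass the contents of the saved file to groupStatusesByDay() instead of $sample. The file should contain only the XML, without the "Loaded page" progress lines that the export script prints before it.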

6 Responses to “Twitter Export Script”

  1. Brad Kellett on 13 Oct 2008 at 10:52 am

    Actually Adam, TweetDumpr does as your method does and scrapes the HTML of a user’s timeline. Twitter now limits the number of pages you can go back in time, however, so it isn’t possible to get your entire timeline.

    I was thinking of doing something with the Twitter search API, but unfortunately that too is limited in how many pages of results you can grab.

  2. Adam Franco on 13 Oct 2008 at 11:06 am

    Thanks Brad,

    I’ve updated the post to reflect this. The last time I looked at TweetDumpr there was a message saying it was limited to 250 Tweets…

    Using the script in the post above I was able to retrieve all 34 pages of my Tweets. Maybe HTML scraping the twitter site is the only place where the page limitation exists.

  3. Chris on 04 Nov 2008 at 10:57 am

    Adam, this rocks. I was able to download all of my timeline, which has a grand total of 1,533 updates. The only issue with the directions was that I had to run the file in my browser, as I got several errors when I attempted to run it from the command line.

    Now, if only I can figure out a way to import all of these into Identi.ca, I will stop using Twitter.

  4. Simon Crowley on 18 Nov 2008 at 5:35 pm

    This is probably a thing with your script, not twitter, but it appeared to be working fine until page 100, when it threw an error—and now it throws an error loading page 1. Did I get throttled by Twitter, do you think?

  5. Simon Crowley on 18 Nov 2008 at 5:36 pm

    I meant “probably a problem with twitter, not your script”. Proofreading is important, kids.

  6. Adam Franco on 18 Nov 2008 at 10:20 pm

    Yes, Simon, you probably got throttled. I find that I can usually run the script on my 700 tweets two or three times before I get blocked. When I try it the next day, it works again.
