Twitter Export Script

Adam Franco October 13th, 2008

I have been using Twitter as a log of my daily doings and wished to export my time-line for reformatting into a calendar format. Unfortunately, TweetDumpr just retrieves the list of Tweets using a single fetch request, which is limited by the Twitter API to a maximum of 200 Tweets. (Update: apparently TweetDumpr can get more than 200 Tweets. It just didn’t say so in its description.)

I wanted to export all 600+ of my tweets, so I wrote the following little PHP script to accomplish this. I have not yet tested it with many concurrent users or added a form to select which user to export. Until I do so, I won’t be providing it as an end-user service. You are free to put it on your own machine and use it though.

TwitterExport.php

<?php
/**
 * This script will allow the export of complete user time-lines from the twitter
 * service. It joins together all pages of status updates into one large XML block
 * that can then be reformatted/processed with other tools.
 *
 * @since 10/13/08
 *
 * @copyright Copyright © 2008, Adam Franco
 * @license http://www.gnu.org/copyleft/gpl.html GNU General Public License (GPL)
 */

$user = 'afranco_work';	// Replace this with your user name.

header('Content-type: text/plain');

$allDoc = new DOMDocument;
$root = $allDoc->appendChild($allDoc->createElement('statuses'));
$root->setAttribute('type', 'array');

$page = 1;
do {
	$numStatus = 0;

	$pageDoc = new DOMDocument;
	$res = @$pageDoc->load('http://twitter.com/statuses/user_timeline/'.$user.'.xml?page='.$page);
	if (!$res) {
		print "\n\n**** Error loading page $page ****";
		exit;
	}
	foreach ($pageDoc->getElementsByTagName('status') as $status) {
		$root->appendChild($allDoc->createTextNode("\n"));
		$root->appendChild($allDoc->importNode($status, true));
		$numStatus++;
	}

	print "\nLoaded page $page with $numStatus status updates.";
	flush();

	$page++;
	sleep(1);

} while ($numStatus);

print "\nDone loading timeline.";
print "\n\n\n";

$root->appendChild($allDoc->createTextNode("\n"));
print $allDoc->saveXml();
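
The script above exits on the first page that fails to load, and several commenters below report failures from throttling or Twitter outages. The following is a minimal sketch of one way to retry a page a few times before giving up. The names fetchWithRetry(), $loader, $maxTries and $delay are hypothetical and not part of the original script, and the delays are guesses rather than documented Twitter limits.

```php
<?php
/**
 * Sketch only: call a loader up to $maxTries times, pausing between attempts.
 * All names here are illustrative assumptions, not part of the original script.
 *
 * @param callable $loader   Takes a URL, returns a DOMDocument or null on failure.
 * @param string   $url      The page URL to load.
 * @param int      $maxTries Maximum number of attempts.
 * @param int      $delay    Base pause in seconds; multiplied by the attempt number.
 * @return DOMDocument|null  The loaded document, or null if every attempt failed.
 */
function fetchWithRetry(callable $loader, $url, $maxTries = 3, $delay = 5)
{
    for ($try = 1; $try <= $maxTries; $try++) {
        $doc = $loader($url);
        if ($doc instanceof DOMDocument) {
            return $doc;
        }
        if ($try < $maxTries) {
            // Wait a little longer after each failed attempt.
            sleep($delay * $try);
        }
    }
    return null; // The caller decides whether to stop or skip the page.
}

// A loader that mirrors the DOMDocument::load() call used in the script above.
$httpLoader = function ($url) {
    $doc = new DOMDocument;
    return @$doc->load($url) ? $doc : null;
};
```

In the main loop, the load call could then become $pageDoc = fetchWithRetry($httpLoader, $url), with the existing error message printed only when the result is null. Passing the loader in as a parameter keeps the retry logic testable without touching the network.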



Usage (assuming PHP is installed)

  1. Save the code above on your machine as twitter_export.php
  2. Edit the code to change the $user variable to be your own Twitter username
  3. From the command line run php twitter_export.php
  4. Copy/paste the XML output into a file for safe keeping and further processing
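
Once the XML is saved, the "further processing" in step 4 could start with grouping tweets by day, which is a first step toward the calendar format mentioned at the top of the post. This is a rough sketch, not part of the original script: groupStatusesByDay() and childText() are hypothetical helpers, and the code assumes each status element has created_at and text children with dates shaped like "Mon Oct 13 14:52:00 +0000 2008".

```php
<?php
/**
 * Sketch only: return the text of the first direct child element named $name,
 * or null if there is none. Only direct children are checked so that a nested
 * element (for example inside <user>) is not picked up by mistake.
 */
function childText(DOMElement $parent, $name)
{
    foreach ($parent->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === $name) {
            return $child->textContent;
        }
    }
    return null;
}

/**
 * Sketch only: group exported statuses by calendar day.
 * The element names and the date format are assumptions about the exported XML.
 *
 * @param string $xml The <statuses> document produced by the export script.
 * @return array Map of 'YYYY-MM-DD' => list of tweet texts, sorted by day.
 */
function groupStatusesByDay($xml)
{
    $doc = new DOMDocument;
    if (!@$doc->loadXML($xml)) {
        return array();
    }

    $days = array();
    foreach ($doc->getElementsByTagName('status') as $status) {
        $created = childText($status, 'created_at');
        $text = childText($status, 'text');
        if ($created === null || $text === null) {
            continue; // Skip statuses missing either field.
        }
        $when = DateTime::createFromFormat('D M d H:i:s O Y', $created);
        if ($when === false) {
            continue; // Skip dates that do not match the assumed format.
        }
        $days[$when->format('Y-m-d')][] = $text;
    }

    ksort($days); // Oldest day first.
    return $days;
}
```

A call such as groupStatusesByDay(file_get_contents('tweets.xml')), where tweets.xml is a placeholder file name, returns an array that can be looped over to print one heading per day. Tweets within a day stay in the order they appear in the file.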

15 Responses to “Twitter Export Script”

  1. Brad Kellett on 13 Oct 2008 at 10:52 am

    Actually Adam, TweetDumpr does as your method does and scrapes the HTML of a user’s timeline. Twitter now limits the number of pages you can go back in time, however, so it isn’t possible to get your entire timeline.

    I was thinking of doing something with the Twitter search API, but unfortunately that too is limited to how many pages of results you can grab.

  2. Adam Franco on 13 Oct 2008 at 11:06 am

    Thanks Brad,

    I’ve updated the post to reflect this. The last time I looked at TweetDumpr there was a message saying it was limited to 250 Tweets…

    Using the script in the post above I was able to retrieve all 34 pages of my Tweets. Maybe HTML scraping the twitter site is the only place where the page limitation exists.

  3. Chris on 04 Nov 2008 at 10:57 am

    Adam, this rocks. I was able to download all of my timeline, which has a grand total of 1,533 updates. The only issue with the directions was that I had to run the file in my browser, as I got several errors when I attempted to run it from the command line.

    Now, if only I can figure out a way to import all of these into Identi.ca, I will stop using Twitter.

  4. Simon Crowley on 18 Nov 2008 at 5:35 pm

    This is probably a thing with your script, not twitter, but it appeared to be working fine until page 100, when it threw an error—and now it throws an error loading page 1. Did I get throttled by Twitter, do you think?

  5. Simon Crowley on 18 Nov 2008 at 5:36 pm

    I meant “probably a problem with twitter, not your script”. Proofreading is important, kids.

  6. Adam Franco on 18 Nov 2008 at 10:20 pm

    Yes Simon, you probably got throttled. I find that I can usually run the script on my 700 tweets two or three times before I get blocked. When I try it the next day, it works again.

  7. Greg Hollingsworth on 12 Feb 2009 at 2:21 pm

    Is there a way to modify this script so that it will work for twitter searches?

  8. Adam Franco on 12 Feb 2009 at 2:26 pm

    Greg, I’m sure it would be possible to do so. I haven’t been using Twitter recently — Yammer is the new thing at my workplace — so I will probably not be getting to that change myself. If you (or anyone else) does make those changes, please post them here for the benefit of others.

  9. [...] Adam Franco has created a great PHP script for saving his Twitter feeds locally as an XML archive. I like. [...]

  10. elly on 14 May 2009 at 5:38 pm

    I’m trying to run this sucker and getting errors. Now that twitter no longer has its next/prev buttons and instead has that big ajaxy MORE button, this script no longer functions, since it’s based on the pagination URL. Any idea how to get to the old pagination URLs? Are they gone forever?

    These are the errors:

    Warning: domdocument() expects at least 1 parameter, 0 given in /home/ellyjonez/twitter_export.php on line 17

    Fatal error: Call to undefined function: appendchild() in /home/ellyjonez/twitter_export.php on line 18

  11. Y.G. on 28 Oct 2009 at 1:09 am

    This script is fantastic! Thank you too much!

  12. Sam on 05 Jan 2010 at 6:47 pm

    Anyone know a way to perform a dump from a Twitter search result?

  13. sree on 10 Jan 2010 at 10:46 pm

    Hey. Will this script work now…… I want to export all my tweets (nearly 2000). Plz help. Plz give a step by step instruction, as I don’t have much exposure in PHP

  14. diamondTearz on 11 Jan 2010 at 11:18 am

    @sree- I’ve gotten TweetScan to export my user timeline- just trying to figure out how to extract the data now.

  15. netzturbine on 19 Jan 2010 at 5:28 pm

    big kudos for this simple solution, adam

    it works like a charm. unfortunately #failwhale can screw it up. I’m trying it the 3rd time on twitter + raised the timeout to 30 secs in hope it will not exit again ;)

    with a bit of tweaking it can b used on identi.ca too.

    one needs just to change in line #26

    code:http://twitter.com/statuses/user_timeline/

    to

    code:http://identi.ca/api/statuses/user_timeline/

    or the equivalent of any other status.net instance ;)

    luv it
    so long
    arnd
