Twitter Export Script

I have been using Twitter as a log of my daily doings and wanted to export my timeline for reformatting into a calendar format. Unfortunately, TweetDumpr just retrieves the list of Tweets using a single fetch request, which the Twitter API limits to a maximum of 200 Tweets. (Update: apparently TweetDumpr can get more than 200 Tweets. It just didn’t say so in its description.)

I wanted to export all 600+ of my tweets, so I wrote the following little PHP script to do it. I have not yet tested it with many concurrent users or added a form to select which user's timeline to export. Until I do so, I won’t be providing it as an end-user service. You are free to put it on your own machine and use it though.

twitter_export.php

<?php
/**
 * This script will allow the export of complete user time-lines from the twitter
 * service. It joins together all pages of status updates into one large XML block
 * that can then be reformatted/processed with other tools.
 *
 * @since 10/13/08
 *
 * @copyright Copyright © 2008, Adam Franco
 * @license http://www.gnu.org/copyleft/gpl.html GNU General Public License (GPL)
 */

$user = 'afranco_work';	// Replace this with your user name.


header('Content-type: text/plain');

$allDoc = new DOMDocument;
$root = $allDoc->appendChild($allDoc->createElement('statuses'));
$root->setAttribute('type', 'array');

$page = 1;
do {
	$numStatus = 0;

	$pageDoc = new DOMDocument;
	$res = @$pageDoc->load('http://twitter.com/statuses/user_timeline/'.$user.'.xml?page='.$page);
	if (!$res) {
		print "\n\n**** Error loading page $page ****";
		exit;
	}
	foreach ($pageDoc->getElementsByTagName('status') as $status) {
		$root->appendChild($allDoc->createTextNode("\n"));
		$root->appendChild($allDoc->importNode($status, true));
		$numStatus++;
	}

	print "\nLoaded page $page with $numStatus status updates.";
	flush();

	$page++;
	sleep(1);	// Pause briefly between requests.

} while ($numStatus);

print "\nDone loading timeline.";
print "\n\n\n";

$root->appendChild($allDoc->createTextNode("\n"));
print $allDoc->saveXml();
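
The script above exits as soon as one page fails to load, which discards everything fetched so far. One possible way to soften that is to retry a failed load a few times with a growing pause before giving up. The following is only a sketch: the helper name loadWithRetry and its parameters are hypothetical and not part of the original script, and it has not been run against Twitter itself.

```php
<?php
/**
 * Hypothetical helper (not part of the original script): call $loader up to
 * $maxAttempts times, waiting longer after each failure.
 *
 * @param callable $loader      Returns a DOMDocument on success, false on failure.
 * @param int      $maxAttempts Total number of attempts before giving up.
 * @param int      $baseDelay   Seconds to wait after the first failure; doubled each time.
 * @return DOMDocument|false    The loaded document, or false if every attempt failed.
 */
function loadWithRetry($loader, $maxAttempts = 3, $baseDelay = 5)
{
	for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
		$doc = call_user_func($loader);
		if ($doc !== false) {
			return $doc;
		}
		if ($attempt < $maxAttempts) {
			// Wait 1x, 2x, 4x, ... the base delay before the next attempt.
			sleep($baseDelay * pow(2, $attempt - 1));
		}
	}
	return false;
}
```

Inside the page loop, the call to $pageDoc->load() could then be wrapped in a closure that returns $pageDoc when the load succeeds and false otherwise, with the existing error message printed only when loadWithRetry() itself returns false.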

Usage (assuming PHP is installed)

  1. Save the code above on your machine as twitter_export.php
  2. Edit the code to change the $user variable to be your own Twitter username
  3. From the command line run php twitter_export.php
  4. Copy/paste the XML output into a file for safe keeping and further processing
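
As an example of the further processing mentioned in step 4, the saved XML can be reduced to just the date and text of each update. This is a minimal sketch, assuming each status element has created_at and text child elements (the names Twitter used in its XML at the time, but not shown in this post). The function name extractTweets is hypothetical.

```php
<?php
/**
 * Hypothetical helper: reduce an exported <statuses> document to an array of
 * date/text pairs. Assumes <created_at> and <text> are direct children of
 * each <status> element.
 *
 * @param string $xml The XML printed by the export script.
 * @return array      List of arrays with 'created_at' and 'text' keys.
 */
function extractTweets($xml)
{
	$doc = new DOMDocument;
	if (!@$doc->loadXML($xml)) {
		return array();
	}

	$tweets = array();
	foreach ($doc->getElementsByTagName('status') as $status) {
		$tweet = array('created_at' => null, 'text' => null);
		// Look only at direct children so that nested elements with the
		// same name (for example inside <user>) are ignored.
		foreach ($status->childNodes as $child) {
			if ($child->nodeName === 'created_at' || $child->nodeName === 'text') {
				$tweet[$child->nodeName] = $child->textContent;
			}
		}
		if ($tweet['created_at'] !== null && $tweet['text'] !== null) {
			$tweets[] = $tweet;
		}
	}
	return $tweets;
}
```

If the output was saved as, say, timeline.xml (a made-up filename), calling extractTweets(file_get_contents('timeline.xml')) would return the pairs, which can then be looped over and printed in whatever layout is needed. Note that only the XML portion of the script's output should be saved, not the progress messages printed before it.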

27 Comments

  1. Actually Adam, TweetDumpr does as your method does and scrapes the HTML of a user’s timeline. Twitter now limits the number of pages you can go back in time, however, so it isn’t possible to get your entire timeline.

    I was thinking of doing something with the Twitter search API, but unfortunately that too is limited in how many pages of results you can grab.

  2. Thanks Brad,

    I’ve updated the post to reflect this. The last time I looked at TweetDumpr there was a message saying it was limited to 250 Tweets…

    Using the script in the post above I was able to retrieve all 34 pages of my Tweets. Maybe HTML scraping the twitter site is the only place where the page limitation exists.

  3. Adam, this rocks. I was able to download all of my timeline, which has a grand total of 1,533 updates. The only issue with the directions was that I had to run the file in my browser, as I got several errors when I attempted to run it from the command line.

    Now, if only I can figure out a way to import all of these into Identi.ca, I will stop using Twitter.

  4. This is probably a thing with your script, not twitter, but it appeared to be working fine until page 100, when it threw an error—and now it throws an error loading page 1. Did I get throttled by Twitter, do you think?

  5. I meant “probably a problem with twitter, not your script”. Proofreading is important, kids.

  6. yes Simon, you probably got throttled. I find that I can usually run the script on my 700 tweets two or three times before I get blocked. When I try it the next day it works again.

  7. Is there a way to modify this script so that it will work for twitter searches?

  8. Greg, I’m sure it would be possible to do so. I haven’t been using Twitter recently (Yammer is the new thing at my workplace), so I will probably not be getting to that change myself. If you (or anyone else) do make those changes, please post them here for the benefit of others.

  9. Pingback: Twitterfeed XML Archiv Script | The Man In The Arena

  10. I’m trying to run this sucker and getting errors. Now that twitter no longer has its next/prev buttons and instead has that big ajaxy MORE button, this script no longer functions, since it’s based on the pagination URL. Any idea how to get to the old pagination URLs? Are they gone forever?

    These are the errors:

    Warning: domdocument() expects at least 1 parameter, 0 given in /home/ellyjonez/twitter_export.php on line 17

    Fatal error: Call to undefined function: appendchild() in /home/ellyjonez/twitter_export.php on line 18

  11. This script is fantastic! Thank you too much!

  12. Anyone know a way to perform a dump from a Twitter search result?

  13. Hey. Will this script work now…… I want to export all my tweets (nearly 2000). Plz help. Plz give a step by step instruction, as I don’t have much exposure in PHP

  14. @sree- I’ve gotten TweetScan to export my user timeline- just trying to figure out how to extract the data now.

  15. big kudos for this simple solution, adam

    it works like a charm. unfortunately #failwhale can screw it up. I’m trying it for the 3rd time on twitter + raised the timeout to 30 secs in hope it will not exit again 😉

    with a bit of tweaking it can be used on identi.ca too.

    one needs just to change in line #26

    code:http://twitter.com/statuses/user_timeline/

    to

    code:http://identi.ca/api/statuses/user_timeline/

    or the equivalent of any other status.net instance 😉

    luv it
    so long
    arnd

  16. Thank you very much. This is what I am looking for.

    My way is to save your script as a PHP file, say tweet.php. After that I put it on my web server on the internet. Finally I call it through the browser. It works. Thanks again.

  17. Hey Adam,
    i did like durahman and it works great. i want to add an xsl file to your script so that the xml i’ll get will be only the time and the status. i built an xsl file but i just don’t know where to put the lines in your script so the result will be xml (after transforming with xsl)
    plz help..
    thanks!

  18. Hi Gery,

    To output a transformed version, replace the last line
    print $allDoc->saveXml();
    with something close to this:
    $xslDoc = new DOMDocument();
    $xsl = new XSLTProcessor();
    $xslDoc->load($xsl_filename);
    $xsl->importStyleSheet($xslDoc);

    print $xsl->transformToXML($allDoc);

    I haven’t had a chance to test this, but it is based on this example of XSLTProcessor usage. The only thing you should need to change is to replace $xsl_filename with the path to your XSL file.

  19. Hey Adam,
    u helped me a lot!!!! thanks!!!!
    i have another little question if i may…
    i want to get only the last 3 statuses and not all the timeline… is there a little something to change in ur script so i’ll get only the last 3?
    thanks again,
    Gery.

  20. Gery,

    To fetch just one page, you could change the line
    } while ($numStatus);
    to
    } while ($numStatus && $page <= 1);
    or for 3 pages, change it to:
    } while ($numStatus && $page <= 3);
    ($page has already been incremented by the time this condition is checked, so it holds the number of the next page to fetch.)

    To fetch only 3 updates from the first page, make the one-page change above as well as change
    foreach ($pageDoc->getElementsByTagName('status') as $status) {
        $root->appendChild($allDoc->createTextNode("\n"));
        $root->appendChild($allDoc->importNode($status, true));
        $numStatus++;
    }
    to
    foreach ($pageDoc->getElementsByTagName('status') as $status) {
        $root->appendChild($allDoc->createTextNode("\n"));
        $root->appendChild($allDoc->importNode($status, true));
        $numStatus++;
        if ($numStatus >= 3) {
            break;
        }
    }

  21. Thank u very very much!!!

  22. Great script Adam! I just tried it and successfully pulled down my 21 pages of tweets into XML.

    What holds this script back from being used on the Internet (via HTTP).
    I think it’d be cool to run this script via http… and then in the background save the tweets to SQL. Any recs on which part of the script would need updating to get that going? Thanks!

  23. @Chris: Accessing this script via the browser instead of running it from the command line should be fine. The only real issue you might run into is timeouts due to your setting of max_execution_time (default 30s) or Apache’s Timeout directive (default 300s). These time limits generally don’t apply when running scripts on the command line, so the command line is the safer choice for long timelines.

  24. Hi Adam.

    I’ve just come across your script and was wondering how to change it to give just the tweets and their dates, and exclude all the other data.
    Is this possible, and if so, can you explain how?

    Looking forward to your reply.
    All The Best.
    Pablo

  25. P.S. Great script!

    Thanks.
    Pablo.

  26. Enjoyed reading this, very good stuff, regards .
