Twitter Export Script

October 13th, 2008

Filed under: Computers and Technology, Software

I have been using Twitter as a log of my daily doings and wished to export my timeline for reformatting into a calendar format. Unfortunately, TweetDumpr just retrieves the list of Tweets using a single fetch request, which the Twitter API limits to a maximum of 200 Tweets. (Update: apparently TweetDumpr can get more than 200 Tweets. It just didn’t say so in its description.)

I wanted to export all 600+ of my tweets, so I wrote the following little PHP script to accomplish this. I have not yet tested it with many concurrent users or added a form to select which user to export, so until I do, I won’t be providing it as an end-user service. You are free to put it on your own machine and use it, though.

twitter_export.php

<?php
/**
 * This script allows the export of complete user timelines from the Twitter
 * service. It joins all pages of status updates into one large XML block
 * that can then be reformatted/processed with other tools.
 *
 * @since 10/13/08
 *
 * @copyright Copyright © 2008, Adam Franco
 * @license http://www.gnu.org/copyleft/gpl.html GNU General Public License (GPL)
 */

$user = 'afranco_work';	// Replace this with your user name.


header('Content-type: text/plain');

// Build a single document that will collect the statuses from every page.
$allDoc = new DOMDocument;
$root = $allDoc->appendChild($allDoc->createElement('statuses'));
$root->setAttribute('type', 'array');

// Fetch pages of the timeline until an empty page comes back.
$page = 1;
do {
	$numStatus = 0;

	$pageDoc = new DOMDocument;
	$res = @$pageDoc->load('http://twitter.com/statuses/user_timeline/'.$user.'.xml?page='.$page);
	if (!$res) {
		print "\n\n**** Error loading page $page ****";
		exit;
	}

	// Copy each status element on this page into the combined document.
	foreach ($pageDoc->getElementsByTagName('status') as $status) {
		$root->appendChild($allDoc->createTextNode("\n"));
		$root->appendChild($allDoc->importNode($status, true));
		$numStatus++;
	}

	print "\nLoaded page $page with $numStatus status updates.";
	flush();

	$page++;
	sleep(1);	// Pause briefly between requests to be gentle on the API.

} while ($numStatus);

print "\nDone loading timeline.";
print "\n\n\n";

$root->appendChild($allDoc->createTextNode("\n"));
print $allDoc->saveXML();



Usage (assuming PHP is installed)

  1. Save the code above on your machine as twitter_export.php
  2. Edit the code to change the $user variable to be your own Twitter username
  3. From the command line run php twitter_export.php
  4. Copy/paste the XML output (the part starting at the <?xml line, without the progress messages) into a file for safekeeping and further processing; a quick sanity check is sketched below
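
If you want to sanity-check the export afterward, a few lines of PHP will count the status updates in the saved file. A minimal, untested sketch, assuming you saved the XML (without the progress messages) as tweets.xml:

<?php
// Count the <status> elements in an exported timeline file.
// Assumes the XML output was saved as tweets.xml in this directory.
$doc = new DOMDocument;
if (!@$doc->load('tweets.xml')) {
	die("Could not parse tweets.xml\n");
}
print $doc->getElementsByTagName('status')->length." status updates found.\n";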

27 Responses to “Twitter Export Script”

  1. Brad Kellett on 13 Oct 2008 at 10:52 am

    Actually Adam, TweetDumpr does much as your method does, scraping the HTML of a user’s timeline. Twitter now limits the number of pages you can go back in time, however, so it isn’t possible to get your entire timeline.

    I was thinking of doing something with the Twitter search API, but unfortunately that too is limited to how many pages of results you can grab.

  2. Adam Franco on 13 Oct 2008 at 11:06 am

    Thanks Brad,

    I’ve updated the post to reflect this. The last time I looked at TweetDumpr there was a message saying it was limited to 250 Tweets…

    Using the script in the post above I was able to retrieve all 34 pages of my Tweets. Maybe HTML scraping the Twitter site is the only place where the page limitation exists.

  3. Chris on 04 Nov 2008 at 10:57 am

    Adam, this rocks. I was able to download all of my timeline, which has a grand total of 1,533 updates. The only issue with the directions was that I had to run the file in my browser, as I got several errors when I attempted to run it from the command line.

    Now, if only I can figure out a way to import all of these into Identi.ca, I will stop using Twitter.

  4. Simon Crowley on 18 Nov 2008 at 5:35 pm

    This is probably a thing with your script, not twitter, but it appeared to be working fine until page 100, when it threw an error—and now it throws an error loading page 1. Did I get throttled by Twitter, do you think?

  5. Simon Crowley on 18 Nov 2008 at 5:36 pm

    I meant “probably a problem with twitter, not your script”. Proofreading is important, kids.

  6. Adam Franco on 18 Nov 2008 at 10:20 pm

    Yes Simon, you probably got throttled. I find that I can usually run the script on my 700 tweets two or three times before I get blocked. When I try it the next day, it works again.
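
    If the failures you see are transient rather than a real block, a small retry loop around the fetch can help. An untested sketch that would replace the load and error-check lines in the script; the 3 attempts and 10-second pause are arbitrary:

    // Retry a failed page fetch a few times before giving up. This helps
    // with transient errors, but won't get around a genuine rate-limit block.
    $res = false;
    for ($try = 0; !$res && $try < 3; $try++) {
    	if ($try) sleep(10);	// arbitrary pause before retrying
    	$res = @$pageDoc->load('http://twitter.com/statuses/user_timeline/'.$user.'.xml?page='.$page);
    }
    if (!$res) {
    	print "\n\n**** Error loading page $page ****";
    	exit;
    }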

  7. Greg Hollingsworth on 12 Feb 2009 at 2:21 pm

    Is there a way to modify this script so that it will work for twitter searches?

  8. Adam Franco on 12 Feb 2009 at 2:26 pm

    Greg, I’m sure it would be possible to do so. I haven’t been using Twitter recently — Yammer is the new thing at my workplace — so I will probably not be getting to that change myself. If you (or anyone else) does make those changes, please post them here for the benefit of others.

  9. [...] Adam Franco has created a great PHP script for saving your Twitter feeds locally as an XML archive. I like. Similar posts: New website that rates airlines [...]

  10. elly on 14 May 2009 at 5:38 pm

    I’m trying to run this sucker and getting errors. Now that twitter no longer has its next/prev buttons and instead has that big ajaxy MORE button, this script no longer functions, since it’s based on the pagination URL. Any idea how to get to the old pagination URLs? Are they gone forever?

    These are the errors:

    Warning: domdocument() expects at least 1 parameter, 0 given in /home/ellyjonez/twitter_export.php on line 17

    Fatal error: Call to undefined function: appendchild() in /home/ellyjonez/twitter_export.php on line 18

  11. Y.G. on 28 Oct 2009 at 1:09 am

    This script is fantastic! Thank you so much!

  12. Sam on 05 Jan 2010 at 6:47 pm

    Anyone know a way to perform a dump from a Twitter search result?

  13. sree on 10 Jan 2010 at 10:46 pm

    Hey. Will this script work now…… I want to export all my tweets (nearly 2000). Plz help. Plz give step-by-step instructions, as I don’t have much exposure to PHP.

  14. diamondTearz on 11 Jan 2010 at 11:18 am

    @sree: I’ve gotten TweetScan to export my user timeline; just trying to figure out how to extract the data now.

  15. netzturbine on 19 Jan 2010 at 5:28 pm

    big kudos for this simple solution, adam

    it works like a charm. unfortunately #failwhale can screw it up. I’m trying it the 3rd time on Twitter + raised the timeout to 30 secs in hope it will not exit again ;)

    with a bit of tweaking it can be used on identi.ca too.

    one needs just to change in line #26

    http://twitter.com/statuses/user_timeline/

    to

    http://identi.ca/api/statuses/user_timeline/

    or the equivalent of any other status.net instance ;)

    luv it
    so long
    arnd
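
    For anyone trying that variant, the two tweaks netzturbine describes would look roughly like this (untested; it assumes DOMDocument's HTTP fetches honor PHP's default_socket_timeout setting):

    // Allow up to 30 seconds per fetch before the request fails.
    ini_set('default_socket_timeout', 30);

    // ...and point the fetch at identi.ca (or any other status.net instance):
    $res = @$pageDoc->load('http://identi.ca/api/statuses/user_timeline/'.$user.'.xml?page='.$page);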

  16. durahman on 04 Mar 2010 at 11:32 pm

    Thank you very much. This is what I am looking for.

    My way is to save your script as PHP, say tweet.php. After that I put it on my web server on the internet. Finally I call it through the browser. It works. Thanks again.

  17. Gery on 13 Mar 2010 at 4:58 am

    Hey Adam,
    I did like durahman and it works great. I want to add an XSL file to your script so that the XML I’ll get will contain only the time and the status. I built an XSL file but I just don’t know where to put the lines in your script so the result will be XML (after transforming with XSL).
    plz help..
    thanks!

  18. Adam Franco on 13 Mar 2010 at 9:54 am

    Hi Gery,

    To output a transformed version, replace the last line
    print $allDoc->saveXML();
    with something close to this:
    $xslDoc = new DOMDocument();
    $xsl = new XSLTProcessor();
    $xslDoc->load($xsl_filename);
    $xsl->importStyleSheet($xslDoc);

    print $xsl->transformToXML($allDoc);

    I haven’t had a chance to test this, but it is based on this example of XSLTProcessor usage. The only thing you should need to change is to replace $xsl_filename with the path to your XSL file.
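
    For a fuller, self-contained sketch (also untested), the stylesheet can be embedded as a string instead of a separate file. The XSL below is only an illustration; it assumes the <created_at> and <text> elements found in each <status>:

    // Transform the collected timeline down to just the timestamp and text
    // of each update. Assumes $allDoc holds the <statuses> document built
    // by the export script above.
    $xslSource = '<?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:template match="/statuses">
        <statuses>
          <xsl:for-each select="status">
            <status>
              <created_at><xsl:value-of select="created_at"/></created_at>
              <text><xsl:value-of select="text"/></text>
            </status>
          </xsl:for-each>
        </statuses>
      </xsl:template>
    </xsl:stylesheet>';

    $xslDoc = new DOMDocument();
    $xslDoc->loadXML($xslSource);
    $xsl = new XSLTProcessor();
    $xsl->importStyleSheet($xslDoc);
    print $xsl->transformToXML($allDoc);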

  19. Gery on 13 Mar 2010 at 5:46 pm

    Hey Adam,
    u helped me a lot!!!! thanks!!!!
    i have another little question if i may…
    i want to get only the last 3 statuses and not all the timeline… is there a little something to change in ur script so i’ll get only the last 3?
    thanks again,
    Gery.

  20. Adam Franco on 14 Mar 2010 at 2:15 pm

    Gery,

    To fetch just one page, you could change the line
    } while ($numStatus);
    to
    } while ($numStatus && $page <= 1);
    or for 3 pages, change it to:
    } while ($numStatus && $page <= 3);
    (Since $page has already been incremented by the time the condition is checked, the comparison needs to include the last page you want.)

    To fetch only 3 updates from the first page, make the change above as well as change
    foreach ($pageDoc->getElementsByTagName('status') as $status) {
    	$root->appendChild($allDoc->createTextNode("\n"));
    	$root->appendChild($allDoc->importNode($status, true));
    	$numStatus++;
    }
    to
    foreach ($pageDoc->getElementsByTagName('status') as $status) {
    	$root->appendChild($allDoc->createTextNode("\n"));
    	$root->appendChild($allDoc->importNode($status, true));
    	$numStatus++;
    	if ($numStatus >= 3) {
    		break;
    	}
    }

  21. Gery on 15 Mar 2010 at 1:19 am

    Thank u very very much!!!

  22. Chris on 21 Apr 2010 at 12:26 am

    Great script Adam! I just tried it and successfully pulled down my 21 pages of tweets into XML.

    What holds this script back from being used on the Internet (via HTTP)? I think it’d be cool to run this script via HTTP… and then in the background save the tweets to SQL. Any recs on which part of the script would need updating to get that going? Thanks!

  23. Adam Franco on 21 Apr 2010 at 1:06 pm

    @Chris: Accessing this script via the browser instead of the command line should be fine. The only real issue you might run into is timeouts, due either to your max_execution_time setting (default 30s) or Apache’s Timeout directive (default 300s). These time limits generally don’t apply when running scripts on the command line, which makes command-line invocation a bit more broadly applicable.
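
    I haven’t tried the SQL half myself, but as a rough starting point, something along these lines should work with PDO and SQLite (untested; the database file and table/column names are just made-up examples):

    // Store each collected status in a local SQLite database.
    // Assumes $allDoc holds the <statuses> document built by the script above.
    $db = new PDO('sqlite:tweets.db');
    $db->exec('CREATE TABLE IF NOT EXISTS tweets (
        id TEXT PRIMARY KEY,
        created_at TEXT,
        text TEXT
    )');

    $insert = $db->prepare('INSERT OR REPLACE INTO tweets (id, created_at, text) VALUES (?, ?, ?)');
    foreach ($allDoc->getElementsByTagName('status') as $status) {
        // Collect the direct child elements (id, created_at, text, ...) by name.
        $fields = array();
        foreach ($status->childNodes as $child) {
            $fields[$child->nodeName] = $child->textContent;
        }
        $insert->execute(array($fields['id'], $fields['created_at'], $fields['text']));
    }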

  24. Skordahl on 19 Jan 2011 at 5:56 pm

    Great post!
    Thanks

  25. Pablo on 17 Sep 2011 at 8:01 am

    Hi Adam.

    I’ve just come across your script and was wondering how to change it to give just the tweets and the dates of the tweets, and exclude all the other data.
    Is this possible, and if so can you explain how?

    Looking forward to your reply.
    All the best,
    Pablo

  26. Pablo on 17 Sep 2011 at 8:02 am

    P.S. Great script!

    Thanks.
    Pablo.

  27. kołowrotki on 08 Nov 2011 at 10:21 am

    Enjoyed reading this, very good stuff, regards.
