Hack 21. Track Additions to Yahoo!

You're interested in trend tracking. Which categories are consistently busy? Which are all but dead? By watching how Yahoo! adds sites to categories, over time you'll get a sense of the rhythms and trends and detect when unusual activity occurs in a category.

This hack scrapes the recent counts of additions to Yahoo! categories and prints them out, providing an at-a-glance look at additions to various categories. You'll also get a tab-delimited table of how many sites have been added to each category for each day. A tab-delimited file is excellent for importing into a spreadsheet, where you can turn the count numbers into a chart.
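If you'd rather total the counts in Perl before reaching for a spreadsheet, a tab-delimited table is easy to parse. The following is a minimal sketch and not part of the original hack: the `total_counts` helper, the dates, and the sample rows are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of a tab-delimited table
# (header row first) and returns a hash of category => total.
sub total_counts {
    my @lines = @_;
    shift @lines;    # discard the "Category<TAB>date..." header row
    my %totals;
    foreach my $line (@lines) {
        chomp $line;
        next unless length $line;
        my ($category, @counts) = split /\t/, $line;
        my $sum = 0;
        $sum += $_ for @counts;
        $totals{$category} = $sum;
    }
    return %totals;
}

# Made-up sample rows in the shape of a category-by-day table.
my @sample = (
    "Category\t20240102\t20240101\n",
    "Health\t11\t4\n",
    "Science\t6\t0\n",
);

my %totals = total_counts(@sample);
print "$_\t$totals{$_}\n" for sort keys %totals;
```

Summing per category is only one option; the same loop could just as easily track the busiest day for each category.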



1.22.1. The Code

Save the following code to a file called hoocount.pl:

#!/usr/bin/perl -w
use strict;
use Date::Manip;
use LWP::Simple;
use Getopt::Long;

$ENV{TZ} = "GMT" if $^O eq "MSWin32";

# the home page for Yahoo!'s "What's New".
my $new_url = "http://dir.yahoo.com/new/";

# the major categories at Yahoo!. %final_counts is keyed
# on these names and holds each category's counts string.
my @categories = ("Arts & Humanities",    "Business & Economy",
                  "Computers & Internet", "Education",
                  "Entertainment",        "Government",
                  "Health",               "News & Media",
                  "Recreation & Sports",  "Reference",
                  "Regional",             "Science",
                  "Social Science",       "Society & Culture");
my %final_counts; # where we save our final readouts.

# load in our options from the command line.
my %opts; GetOptions(\%opts, "c|count=i");
die "usage: $0 --count <days>\n" unless $opts{c}; # count sites from past $i days.

# if we've been told to count the number of new sites,
# then we'll go through each of our main categories
# for the last $i days and collate a result.

# begin the header
# for our import file.
my $header = "Category";

# from today, going backwards, get $i days.
for (my $i = 1; $i <= $opts{c}; $i++) {

    # create a Date::Manip time that will
    # be used to construct the last $i days
    my $day; # query for Yahoo! retrieval.
    if ($i == 1) { $day = "yesterday"; }
    else { $day = "$i days ago"; }
    my $date = UnixDate($day, "%Y%m%d");

    # add this date to
    # our import file.
    $header .= "\t$date";

    # and download the day. LWP::Simple's get() returns
    # undef on failure, so report the URL that failed.
    my $url  = "$new_url$date.html";
    my $data = get($url) or die "Could not fetch $url\n";

    # and loop through each of our categories.
    foreach my $category (sort @categories) {
        # assumption: the source line is cut off after "my $count";
        # a category with no number found is recorded as 0.
        my $count = $data =~ /$category.*?(\d+)/ ? $1 : 0;
        $final_counts{$category} .= "\t$count";
    }
}

# with all our counts finished,
# print out our final file.
print $header . "\n";
foreach my $category (@categories) {
    print $category, $final_counts{$category}, "\n";
}
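
The category loop pulls each count out of the page with a regular expression. Perl does not reset `$1` when a match fails, so code that reads `$1` without checking the match result can pick up a value left by an earlier successful match in the same scope. A minimal sketch of a pattern that avoids this follows. The `count_for` helper is hypothetical, and the sample text is made up, since the real markup of the What's New page isn't shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first number that follows
# $category in $text, or 0 if the category isn't found.
sub count_for {
    my ($text, $category) = @_;
    # \Q...\E quotes any regex metacharacters in the name, and /s
    # lets .*? cross line breaks. In list context a failed match
    # returns an empty list, so $count is undef, never a stale $1.
    my ($count) = $text =~ /\Q$category\E.*?(\d+)/s;
    return defined $count ? $count : 0;
}

my $page = "Health (11)\nScience (6)\n";    # made-up sample text
print count_for($page, "Health"),    "\n";
print count_for($page, "Science"),   "\n";
print count_for($page, "Reference"), "\n";  # not in the sample text
```

Whether `/s` is appropriate depends on how far the count sits from the category name in the real page, which is an open question here.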



1.22.2. Running the Hack

The only argument you need to provide to the script is the number of days back you'd like it to travel in search of new additions. Since Yahoo! doesn't archive its "new pages added" indefinitely, a safe upper limit is around two weeks. Here, we're looking at the past two days:
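% perl hoocount.pl --count 2

[Sample output: a tab-delimited table. The header row reads Category, 20050711, 2005071 (the second date is cut off in the source), and the rows cover Arts & Humanities, Business & Economy, Computers & Internet, Education, Entertainment, Government, Health, News & Media, Recreation & Sports, Reference, Regional, Science, Social Science, and Society & Culture, with daily counts between 0 and 81.]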
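
The script needs a positive `--count`, and Yahoo! keeps only about two weeks of pages, so it can help to validate the argument up front. The sketch below assumes a cap of 14 days, taken from the "around two weeks" estimate rather than from any documented Yahoo! limit, and uses a hypothetical `parse_days` helper.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my $MAX_DAYS = 14;    # assumption: "around two weeks" of archives

# Hypothetical helper: parse -c/--count from an argument list and
# return a day count between 1 and $MAX_DAYS, or die with usage.
sub parse_days {
    my @args = @_;
    my %opts;
    GetOptionsFromArray(\@args, \%opts, "c|count=i")
        or die "usage: $0 --count <days>\n";
    die "usage: $0 --count <days>\n"
        unless defined $opts{c} && $opts{c} > 0;
    return $opts{c} > $MAX_DAYS ? $MAX_DAYS : $opts{c};
}

print parse_days("--count", 2), "\n";    # within the cap
print parse_days("-c", 30), "\n";        # clamped to $MAX_DAYS
```

Clamping silently is a design choice; dying with a message when the request exceeds the cap would be just as reasonable.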





1.22.3. Hacking the Hack

If you're not only a researcher but also a Yahoo! observer, you might be interested in how the number of sites added changes over time. To that end, you could run this script under cron or the Windows Scheduler and output the results to a file. After three months or so, you'd have a pretty interesting set of counts to manipulate with a spreadsheet program.
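
When the script runs unattended, each run's output needs somewhere to go. One option is to write every run to its own dated file. The sketch below is only an illustration: the `output_name` and `save_output` helpers and the file naming scheme are hypothetical, and the table it saves is made-up sample data.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::Temp qw(tempdir);

# Hypothetical naming scheme: one file per run, named after the
# GMT date of the run, e.g. "hoocount-20050712.txt".
sub output_name {
    my ($time) = @_;
    return strftime("hoocount-%Y%m%d.txt", gmtime($time));
}

# Write one run's tab-delimited table into $dir; returns the path.
sub save_output {
    my ($dir, $time, $table) = @_;
    my $file = "$dir/" . output_name($time);
    open my $fh, ">", $file or die "Cannot write $file: $!\n";
    print {$fh} $table;
    close $fh or die "Cannot close $file: $!\n";
    return $file;
}

# Demo with a made-up table and a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = save_output($dir, time, "Category\t20240101\nHealth\t11\n");
print "saved to $file\n";
```

Under cron or the Windows Scheduler you would pass the script's real output to `save_output` in place of the sample string and point `$dir` at a permanent directory.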

Kevin Hemenway and Tara Calishain

Hack 22. Yahoo! Directory Mindshare in Google

How does link popularity compare in Yahoo!'s searchable subject index versus Google's full-text index? Find out by calculating mindshare!

Yahoo! and Google are two very different animals. Yahoo! indexes only a site's main URL, title, and description, while Google builds full-text indexes of entire sites. Surely there's some interesting cross-pollination when you combine results from the two.

This hack scrapes all the URLs in a specified subcategory of the Yahoo! directory. It then takes each URL and gets its link count from Google. Each link count provides a nice snapshot of how a particular Yahoo! category and its listed sites stack up on the popularity scale.

What's a link count? It's simply the total number of pages in Google's index that link to a specific URL.
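
To make the idea concrete, here is a minimal sketch of ranking a category's sites once their link counts are in hand. The URLs and counts are made up and the `rank_by_links` helper is hypothetical. Showing each count as a percentage of the category total is an assumption about one reasonable way to express mindshare, not necessarily what the hack's script prints.

```perl
use strict;
use warnings;

# Hypothetical helper: given url => link count pairs, return rows of
# [url, count, percent of total], ordered from most to least linked.
sub rank_by_links {
    my %links = @_;
    my $total = 0;
    $total += $_ for values %links;
    my @rows;
    foreach my $url (sort { $links{$b} <=> $links{$a} || $a cmp $b }
                     keys %links) {
        my $share = $total ? 100 * $links{$url} / $total : 0;
        push @rows, [ $url, $links{$url}, $share ];
    }
    return @rows;
}

# Made-up link counts for three placeholder sites.
my %sample = (
    "http://example.com/" => 150,
    "http://example.org/" => 40,
    "http://example.net/" => 10,
);

printf "%-22s %5d %5.1f%%\n", @$_ for rank_by_links(%sample);
```

A category where one site holds most of the total looks very different from one where the links are spread evenly, which is the kind of contrast the hack is meant to surface.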



There are a couple of ways you can use your knowledge of a subcategory's link count. If you find a subcategory whose URLs have only a few links each in Google, you may have found a subcategory that isn't getting a lot of attention from Yahoo!'s editors. Consider going elsewhere for your research. If you're a webmaster and you're considering paying to have Yahoo! add you to its directory, run this hack on the category in which you want to be listed. Are most of the links really popular? If they are, are you sure your site will stand out and get clicks? Maybe you should choose a different category.

We got this idea from a similar experiment done by Jon Udell (http://weblog.infoworld.com/udell) in 2001. He used AltaVista instead of Google; see http://udell.roninhouse.com/download/mindshare-script.txt. We appreciate the inspiration, Jon!

1.23.1. The Code

You will need a Google API account (http://api.google.com) as well as the Perl modules SOAP::Lite (http://www.soaplite.com) and HTML::LinkExtor (http://search.cpan.org/author/GAAS/HTMLParser/lib/HTML/LinkExtor.pm) to run the following code. You'll also need a copy of the Google WSDL file in the same directory as the script (http://api.google.com/GoogleSearch.wsdl). Save the following code to a file called mindshare.pl:

#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use HTML::LinkExtor;
use SOAP::Lite;

my $google_key  = "your API key goes here";
my $google_wdsl = "GoogleSearch.wsdl";
my $yahoo_dir   = shift || "/Computers_and_Internet/Data_ [...]
                  "eXtensible_Markup_Language_/RS [...]

# download the Yahoo! directory.
my $data = get("http://dir.yahoo.com" . $yahoo_dir) or [...]

# create our Google object.
my $google_search = SOAP::Lite->service("file:$google_w [...]