Appendix A. The XML You Need for Office

A.1 What Is XML?

XML, the Extensible Markup Language, is an Internet-friendly format for
data and documents invented by the World Wide Web Consortium (W3C). The
"Markup" denotes a way of expressing the structure of a document within
the document itself. XML has its roots in a markup language called SGML
(Standard Generalized Markup Language), which is used in publishing.
HTML was an application of SGML to web publishing. XML was created to do
for machine-readable documents on the Web what HTML did for
human-readable documents: that is, to provide a commonly agreed-upon
syntax so that processing the underlying format becomes commonplace and
documents are made accessible to all users.

Unlike HTML, though, XML comes with very little predefined. HTML
developers are accustomed both to the notion of using angle brackets
(<>) for denoting elements, and also to the set of element names
themselves (such as head, body, etc.). XML shares only the former
feature (i.e., the notion of using angle brackets for denoting
elements). Unlike HTML, XML has no predefined elements, but is merely a
set of rules that lets you write other languages like HTML.

Because XML defines so little, it is easy for everyone to agree to use
the XML syntax, and then to build applications on top of it. It's like
agreeing to use a particular alphabet and set of punctuation symbols,
but not saying which language to use. This offers immense flexibility,
much like the flexibility you're used to having in creating your own
Word templates, Excel spreadsheets, or Access databases.



A.2 Anatomy of an XML Document

The best way to explain how an XML document is composed is to present
one. Example A-1 shows an XML document you might use to describe two
authors.



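Example A-1. A very simple XML document

    <?xml version="1.0" encoding="us-ascii"?>
    <authors>
        <person id="lear">
            <name>Edward Lear</name>
            <nationality>British</nationality>
        </person>
        <person id="asimov">
            <name>Isaac Asimov</name>
            <nationality>American</nationality>
        </person>
        <person id="mysteryperson"/>
    </authors>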

The first line of the document is known as the XML declaration. This
tells a processing application which version of XML you are using (the
version indicator is mandatory) and which character encoding you have
used for the document. In this example, the document is encoded in
ASCII. (The significance of character encoding is covered later in this
appendix.)

If the XML declaration is omitted, a processor will make certain
assumptions about your document. In particular, it will expect it to be
encoded in UTF-8, an encoding of the Unicode character set (or in
UTF-16, if the document begins with a byte order mark). However, it is
best to use the XML declaration wherever possible, both to avoid
confusion over the character encoding and to indicate to processors
which version of XML you're using. (1.0 is most common, but 1.1, which
makes relatively minor though potentially incompatible changes, was
published in 2004.) Encoding handling should be automatic with Office,
but you may need to watch for documents you import from other sources.
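
The effect of the declaration on decoding can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. This is only an illustration: the element name note and the sample text are made up, and the behavior shown is that of Python's bundled parser, which is one XML processor among many.

```python
import xml.etree.ElementTree as ET

# With a declaration, the parser decodes the bytes using the named encoding.
# "note" is an illustrative element name, not part of Example A-1.
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><note>caf\xe9</note>'
declared = ET.fromstring(latin1_doc).text

# Without a declaration, the bytes are assumed to be UTF-8.
utf8_doc = "<note>café</note>".encode("utf-8")
assumed_utf8 = ET.fromstring(utf8_doc).text

# Latin-1 bytes with no declaration are not valid UTF-8, so parsing fails.
try:
    ET.fromstring(b"<note>caf\xe9</note>")
    undeclared_latin1_ok = True
except ET.ParseError:
    undeclared_latin1_ok = False

print(declared, assumed_utf8, undeclared_latin1_ok)
```

The third case is the one to watch for when importing documents from other sources: the text looks fine in an editor that guesses the encoding, but a conforming parser rejects it.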



A.2.1 Elements and Attributes

The second line of Example A-1 begins an element, which has been named
authors. The contents of that element include everything between the
right angle bracket (>) in <authors> and the left angle bracket (<) in
</authors>. The actual syntactic constructs <authors> and </authors> are
often referred to as the element start tag and end tag, respectively. Do
not confuse tags with elements! Tags mark the boundaries of elements.
Note that elements, like the authors element here, may include other
elements, as well as text. An XML document must contain exactly one root
element, which contains all other content within the document. The name
of the root element defines the type of the XML document.

Elements that contain both text and other elements simultaneously are
classified as mixed content. Word supports the use of mixed content,
while the other applications in the Office suite generally do not.

The sample "authors" document uses elements named person to describe the
authors themselves. Each person element has an attribute named id.
Unlike elements, attributes can only contain textual content. Their
values must be surrounded by quotes. Either single quotes (') or double
quotes (") may be used, as long as you use the same kind of closing
quote as the opening one.

Within XML documents, attributes are frequently used for metadata (i.e.,
"data about data"), describing properties of the element's contents.
This is the case in our example, where id contains a unique identifier
for the person being described.

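As far as XML is concerned, it does not matter in what order attributes
are presented in the element start tag. For example, these two elements
contain exactly the same information as far as an XML 1.0 conformant
processing application is concerned:

    <animal name="dog" legs="4"></animal>
    <animal legs="4" name="dog"></animal>

On the other hand, the information presented to an application by an XML
processor on reading the following two lines will be different for each
animal element because the ordering of elements is significant:

    <animal><name>dog</name><legs>4</legs></animal>
    <animal><legs>4</legs><name>dog</name></animal>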



XML treats a set of attributes like a bunch of stuff in a bag (there is
no implicit ordering), while elements are treated like items on a list,
where ordering matters.
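
The bag-versus-list distinction can be observed from code. The following Python sketch uses the standard library's xml.etree.ElementTree; the animal documents are illustrative, and the attribute and child element names (name, legs) are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Attribute order: both start tags carry the same name/value pairs.
a = ET.fromstring('<animal name="dog" legs="4"></animal>')
b = ET.fromstring('<animal legs="4" name="dog"></animal>')
same_attributes = dict(a.attrib) == dict(b.attrib)

# Element order: the parser reports children in document order.
c = ET.fromstring("<animal><name>dog</name><legs>4</legs></animal>")
d = ET.fromstring("<animal><legs>4</legs><name>dog</name></animal>")
order_c = [child.tag for child in c]
order_d = [child.tag for child in d]

print(same_attributes)   # the two attribute sets compare equal
print(order_c, order_d)  # the two child sequences differ
```

Comparing the attributes as dictionaries reflects the fact that an application should never depend on the order in which a processor happens to report them.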

New XML developers frequently ask when it is best to use attributes to
represent information and when it is best to use elements. As you can
see from the "authors" example, if order is important to you, then
elements are a good choice. In general, there is no hard-and-fast best
practice for choosing whether to use attributes or elements, though
elements can contain other elements and attributes, while attributes can
contain only text.

The final author described in our document has no information available.
All we know about this person is his or her ID, mysteryperson. The
document uses the XML shortcut syntax for an empty element. The
following is a reasonable alternative:

    <person id="mysteryperson"></person>

A.2.2 Name Syntax

XML 1.0 has certain rules about element and attribute names. In
particular:

Names are case-sensitive; e.g., <person/> is not the same as <Person/>.

Names beginning with xml (in any permutation of uppercase or lowercase)
are reserved for use by XML 1.0 and its companion specifications.

A name must start with a letter or an underscore, not a digit, and may
continue with any letter, digit, underscore, hyphen, or period.
(Actually, a name may also contain a colon, but the colon is used to
delimit a namespace prefix and is not available for arbitrary use as of
the Second Edition of XML 1.0.)

A precise description of names can be found in Section 2.3 of the XML
1.0 specification, at http://www.w3.org/TR/REC-xml#sec-common-syn.
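
As a rough illustration of these rules, the Python sketch below checks names against a simplified pattern and uses the standard library parser to show case sensitivity. The pattern is an assumption made for brevity: it covers only ASCII characters, whereas the real grammar in the specification also admits many non-ASCII letters, and it leaves out the colon because of its namespace role. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of the XML 1.0 name rules:
# a letter or underscore first, then letters, digits, "_", "-", or ".".
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")

def is_simple_name(name: str) -> bool:
    """Return True if name fits the simplified ASCII-only pattern."""
    return SIMPLE_NAME.match(name) is not None

def tags_match(start: str, end: str) -> bool:
    """Parse <start></end> and report whether the parser accepts it."""
    try:
        ET.fromstring(f"<{start}></{end}>")
        return True
    except ET.ParseError:
        return False

print(is_simple_name("person"), is_simple_name("1person"))
print(tags_match("person", "person"), tags_match("Person", "person"))
```

Because names are case-sensitive, a start tag of Person cannot be closed by an end tag of person, so the second tags_match call reports a parse failure.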



A.2.3 XML Namespaces

XML 1.0 lets developers create their own elements and attributes, but
leaves open the potential for overlapping names. title in one context
may mean something entirely different than title in a different context.
The Namespaces in XML specification (which can be found at
http://www.w3.org/TR/REC-xml-names/) provides a mechanism developers can
use to identify particular vocabularies using Uniform Resource
Identifiers (URIs).

URIs are a combination of the familiar Uniform Resource Locators (URLs)
and Uniform Resource Names (URNs). From the perspective of XML
namespaces, URIs are convenient because they combine an easily used
syntax with a notion of ownership. While it's possible for me to create
namespace URIs that begin with http://microsoft.com, general practice
holds that it would be better for me to create URIs that begin with
http://simonstl.com, a domain I own, and leave http://microsoft.com to
Microsoft. In general, organizations and individuals who create XML
vocabularies should choose a namespace URI in a space they control. This
makes it possible (though it isn't required) to put information there
documenting the vocabulary, or other resources for processing the
vocabulary.



The rules for XML names don't permit developers to create elements with
names like http://simonstl.com/ns/mine:Title, and it's not clear that
working with names like that would be much fun anyway. To get around
these problems, the Namespaces in XML specification defines a mechanism
for associating URIs with element and attribute names through prefixes.
Instead of typing out the whole URI, developers can work with a much
shorter prefix, or even set a default URI that applies to names without
prefixes.

To create a prefix, you use a namespace declaration, which looks like an
attribute. For example, to create a prefix of xhtml associated with the
URI http://www.w3.org/1999/xhtml, you would use an xmlns:xhtml attribute
as shown below:

    <container xmlns:xhtml="http://www.w3.org/1999/xhtml">
    ....
    </container>

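To apply a prefix, you put it in front of the element or attribute name,
with a colon separating the prefix from the name. To put an XHTML p
element inside of that container, you could write:

    <container xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xhtml:p>This is an XHTML paragraph!</xhtml:p>
    </container>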

When a program encountered the xhtml:p, it would know that p was the
local name of the element, xhtml was the prefix, and
http://www.w3.org/1999/xhtml was the URI for that element. The namespace
declaration applies to all elements inside the element where it appears,
as well as the element containing the declaration. For example, the
xhtml prefix works for all three of these paragraphs:

    <container xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xhtml:p>This is XHTML paragraph 1!</xhtml:p>
    <xhtml:p>This is XHTML paragraph 2!</xhtml:p>
    <xhtml:p>This is XHTML paragraph 3!</xhtml:p>
    </container>
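
One way to see how a prefixed name is resolved is to parse a small document and inspect what the parser reports. The Python sketch below uses the standard library's xml.etree.ElementTree, which writes a resolved name as the URI in braces followed by the local name; that notation is a convention of this library, not part of XML. The second prefix, h, is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

doc = (
    f'<container xmlns:xhtml="{XHTML}">'
    "<xhtml:p>This is an XHTML paragraph!</xhtml:p>"
    "</container>"
)
root = ET.fromstring(doc)
para = root[0]

# The parser reports URI plus local name; the prefix itself is not kept.
expanded_name = para.tag

# A different prefix (h, chosen arbitrarily) bound to the same URI
# produces exactly the same resolved name.
other = ET.fromstring(
    f'<container xmlns:h="{XHTML}"><h:p>Same element type</h:p></container>'
)
same_name = other[0].tag == para.tag

print(expanded_name, same_name)
```

The container element has no prefix and no default namespace is declared, so its name is reported without any URI.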

In most XML processing, the prefix doesn't matter; the local name and
the URI are what count, and the prefix is just a mechanism for
associating them. (This is especially important in XSLT processing and
XML Schemas.) In some documents, especially documents that use only
structures from one namespace or where one vocabulary is dominant,
developers choose to use the default namespace rather than prefixes.
When the default namespace is used (assigned with an xmlns attribute),
elements without a prefix are associated with a given URI. In XHTML, an
XML derivative of HTML, this is the most typical path, since HTML
developers aren't used to putting prefixes on all of their element
names. A typical XHTML document might look like this:

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>My Document</title>
    </head>
    <body>
    <p>Could use some content here</p>
    </body>
    </html>

In this case, the URI http://www.w3.org/1999/xhtml applies to every
element in the document, including html, head, title, body, and p. The
default namespace has one quirk, though: it doesn't apply to attributes.
Attributes can be given a namespace by explicitly using a prefix in
their name, but unprefixed attributes have no namespace URI. This often
doesn't matter, but it can be important when writing XSLT stylesheets
and creating XML Schemas.
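
The attribute quirk is easy to confirm with a parser. In this Python sketch (standard library xml.etree.ElementTree), the id attribute and its value are illustrative additions, and the x prefix in the lookup is a local alias used only by the search call, not something present in the document.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# The id attribute and its value are illustrative additions.
doc = (
    f'<html xmlns="{XHTML}"><body>'
    '<p id="intro">Could use some content here</p>'
    "</body></html>"
)
root = ET.fromstring(doc)

# Map a lookup-only prefix to the namespace URI for the search path.
ns = {"x": XHTML}
p = root.find("x:body/x:p", ns)

element_name = p.tag              # carries the default namespace URI
attribute_names = list(p.attrib)  # the unprefixed attribute carries none

print(element_name, attribute_names)
```

This is why an XSLT pattern or schema that matches the p element must mention the namespace, while one that matches its id attribute must not.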

Typically, the namespaces used by a document are declared on the root
element of the document, which lets them apply to all the content inside
that document. They can, of course, also be declared throughout the
document, though this makes it more difficult to read. Declarations can
override each other as well, and the declaration closest to a given use
of a prefix in the hierarchy will be used. This lets developers mix and
match XML vocabularies even when they use the same prefix.
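
The override rule can be sketched as follows in Python, again with the standard library's xml.etree.ElementTree. The prefix v, the element names, and both example.com URIs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Both URIs and all element names below are hypothetical placeholders.
doc = """
<v:outer xmlns:v="http://example.com/ns/one">
  <v:item/>
  <v:inner xmlns:v="http://example.com/ns/two">
    <v:item/>
  </v:inner>
</v:outer>
"""
root = ET.fromstring(doc.strip())

# Walk the tree in document order and collect the resolved names.
names = [elem.tag for elem in root.iter()]

for name in names:
    print(name)
```

The two item elements share a prefix and a local name but end up in different namespaces, because the declaration on inner is the closest one for the second item (and for inner itself).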

Namespaces are very simple on the surface but are a well-known field of
combat in XML arcana. For more information on namespaces, see Tim Bray's
"XML Namespaces by Example," published at
http://www.xml.com/pub/a/1999/01/namespaces.html; XML In a Nutshell; or
Learning XML.



A.2.4 Well-Formedness

An XML document that conforms to the rules of XML syntax is known as
well-formed. At its most basic level, well-formedness means that
elements should be properly matched, and all opened elements should be
closed. A formal definition of well-formedness can be found in Section
2.1 of the XML 1.0 specification, at
http://www.w3.org/TR/REC-xml#sec-well-formed. Table A-1 shows some XML
documents that are not well-formed.

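Table A-1. Examples of poorly formed XML documents

Document                  Reason why it's not well-formed
------------------------  ----------------------------------------------
<foo><bar></foo></bar>    The elements are not properly nested because
                          foo is closed while inside its child element
                          bar.

<foo><bar></foo>          The bar element was not closed before its
                          parent, foo, was closed.

<foo baz>...</foo>        The baz attribute has no value. While this is
                          permissible in HTML (e.g., <table border>), it
                          is forbidden in XML.

<foo baz=23>...</foo>     The baz attribute value, 23, has no surrounding
                          quotes. Unlike HTML, all attribute values must
                          be quoted in XML.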
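
A quick way to test a document for well-formedness is to hand it to a parser and see whether it is accepted. The Python sketch below does this with the standard library's xml.etree.ElementTree; the sample strings are illustrative instances of the four errors described in Table A-1 plus one corrected document, and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

samples = [
    "<foo><bar></foo></bar>",      # improperly nested
    "<foo><bar></foo>",            # bar never closed
    "<foo baz>...</foo>",          # attribute without a value
    "<foo baz=23>...</foo>",       # unquoted attribute value
    '<foo baz="23"><bar/></foo>',  # corrected version
]
results = [is_well_formed(s) for s in samples]

for sample, ok in zip(samples, results):
    print(ok, sample)
```

A conforming XML processor must stop at the first well-formedness error, so there is no XML counterpart to the lenient recovery that HTML browsers perform.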



A.2.5 Comments and Processing Instructions

As in HTML, it is possible to include comments within XML documents. XML
comments are intended to be read only by people. With HTML, developers
have occasionally employed comments to add application-specific
functionality. For example, the server-side include functionality of
most web servers uses instructions embedded in HTML comments. In XML,
comments should not be used for any purpose other than those for which
they were intended, as they are usually stripped from the document
during parsing.
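
The stripping behavior can be seen with Python's standard xml.etree.ElementTree, which discards comments by default (other parsers, or other configurations of this one, may preserve them). The comment text and the lear identifier below are illustrative.

```python
import xml.etree.ElementTree as ET

# The comment text and the id value are illustrative.
doc = "<authors><!-- reviewed by the editor --><person id='lear'/></authors>"
root = ET.fromstring(doc)

# Only the person element survives; the comment is not part of the tree.
child_tags = [child.tag for child in root]

print(child_tags)
```

Any application logic that depended on the comment would silently stop working here, which is the practical reason not to put instructions in comments.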

The start of a comment is indicated with <!--, and the end of the
comment with -->. Any sequence of characters, aside from the string --,
may appear within a comment. Comments can appear at the start or end of
a document as well as inside elements. They cannot appear inside
attributes or inside of a tag. A comment might look like:

    <!-- This is a comment. -->

Comments tend to be used more in XML documents intended for human
consumption than those intended for machine consumption. If you want to
pass information to an XML application without affecting the structure
of the document, you can use processing instructions, or PIs. Processing
instructions use <? as an opening delimiter and ?> as a closing
delimiter, must contain a target conforming to the rules for XML names,
and may contain additional data. A typical PI might look like:

    <?xml-stylesheet type="text/css" href="style.css"?>
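
To show how an application receives a processing instruction, here is a small Python sketch using the standard library's xml.dom.minidom. The stylesheet file name and the authors root element are placeholders; xml-stylesheet is used only because it is a widely recognized target.

```python
from xml.dom import minidom

# The stylesheet file name is a hypothetical placeholder.
doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="style.css"?><authors/>'
)
pi = doc.firstChild

is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
target = pi.target  # the name immediately after "<?"
data = pi.data      # everything after the target, passed through as text
root_name = doc.documentElement.tagName

print(is_pi, target, data, root_name)
```

Note that the data arrives as a single uninterpreted string. It often looks like a list of attributes, but it is up to the receiving application to parse it.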
