Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification [2] of 1998 [3] and several other related specifications [4], all of them free open standards, define XML. [5] The design goals of XML emphasize simplicity, generality, and usability across the Internet. [6] It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures [7] such as those used in web services.
Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid the processing of XML data.
Overview
The main purpose of XML is serialization, i.e. storing, transmitting, and reconstructing arbitrary data. For two disparate systems to exchange information, they need to agree upon a file format. XML standardizes this process. It is therefore analogous to a lingua franca for representing information. [8]: 1 As a markup language, XML labels, categorizes, and structurally organizes information. [8]: 11 XML tags represent the data structure and contain metadata. What is within the tags is data, encoded in the way the XML standard specifies. [8]: 11 An additional XML schema (XSD) defines the necessary metadata for interpreting and validating XML. (This is also referred to as the canonical schema.) [8]: 135 An XML document that adheres to basic XML rules is "well-formed"; one that adheres to its schema is "valid". [8]: 135 IETF RFC 7303 (which supersedes the older RFC 3023) provides rules for the construction of media types for use in XML messages. It defines three media types:
application/xml (text/xml is an alias), application/xml-external-parsed-entity (text/xml-external-parsed-entity is an alias) and application/xml-dtd. They are used for transmitting raw XML files without exposing their internal semantics. RFC 7303 further recommends that XML-based languages be given media types ending in +xml, for example, image/svg+xml for SVG. Further guidelines for the use of XML in a networked context appear in RFC 3470, also known as IETF BCP 70, a document covering many aspects of designing and deploying an XML-based language.
Applications
XML has come into common use for the interchange of data over the Internet. Hundreds of document formats using XML syntax have been developed, [9] including RSS, Atom, Office Open XML, OpenDocument, SVG, and XHTML. XML also provides the base language for communication protocols such as SOAP and XMPP. It is the message exchange format for the Asynchronous JavaScript and XML (AJAX) programming technique. Many industry data standards, such as Health Level 7, OpenTravel Alliance, FpML, MISMO, and National Information Exchange Model, are based on XML and the rich features of the XML schema specification. In publishing, Darwin Information Typing Architecture is an XML industry data standard. XML is used extensively to underpin various publishing formats.
Key terminology
The material in this section is based on the XML Specification. This is not an exhaustive list of all the constructs that appear in XML; it provides an introduction to the key constructs most often encountered in day-to-day use.
Character
- An XML document is a string of characters. Every legal Unicode character (except Null) may appear in an (1.1) XML document (while some are discouraged).
Processor and application
- The processor analyzes the markup and passes structured information to an application. The specification places requirements on what an XML processor must do and not do, but the application is outside its scope. The processor (as the specification calls it) is often referred to colloquially as an XML parser.
Markup and content
- The characters making up an XML document are divided into markup and content, which may be distinguished by the application of simple syntactic rules. Generally, strings that constitute markup either begin with the character < and end with a >, or they begin with the character & and end with a ;. Strings of characters that are not markup are content. However, in a CDATA section, the delimiters <![CDATA[ and ]]> are classified as markup, while the text between them is classified as content. In addition, whitespace before and after the outermost element is classified as markup.
Tag
- A tag is a markup construct that begins with < and ends with >. There are three types of tag:
- start-tag, such as <section>;
- end-tag, such as </section>;
- empty-element tag, such as <line-break />.
Element
- An element is a logical document component that either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start-tag and end-tag, if any, are the element's content, and may contain markup, including other elements, which are called child elements. An example is <greeting>Hello, world!</greeting>. Another is <line-break />.
Attribute
- An attribute is a markup construct consisting of a name–value pair that exists within a start-tag or empty-element tag. An example is <img src="madonna.jpg" alt="Madonna" />, where the names of the attributes are "src" and "alt", and their values are "madonna.jpg" and "Madonna" respectively. Another example is <step number="3">Connect A to B.</step>, where the name of the attribute is "number" and its value is "3". An XML attribute can only have a single value and each attribute can appear at most once on each element. In the common situation where a list of multiple values is desired, this must be done by encoding the list into a well-formed XML attribute[i] with some format beyond what XML defines itself. Usually this is either a comma or semi-colon delimited list or, if the individual values are known not to contain spaces,[ii] a space-delimited list can be used. An example is <div class="inner greeting-box">Welcome!</div>, where the attribute "class" has both the value "inner greeting-box" and also indicates the two CSS class names "inner" and "greeting-box".
XML declaration
- XML documents may begin with an XML declaration that describes some information about themselves. An example is <?xml version="1.0" encoding="UTF-8"?>.
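The terms above can be seen together in a short Python sketch. It uses the standard library's xml.etree.ElementTree as the processor and a small script as the application. The sample document and the helper names describe and class_names are illustrative assumptions, not part of the XML specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, built only to exercise the terms defined above:
# an XML declaration, start-tags and end-tags, attributes, and an empty-element tag.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<steps>'
    '<step number="3">Connect A to B.</step>'
    '<div class="inner greeting-box">Welcome!</div>'
    '<line-break />'
    '</steps>'
)


def describe(xml_text):
    """Return (tag, attributes, text) for each child element of the root."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [(child.tag, dict(child.attrib), child.text) for child in root]


def class_names(element):
    """Split a space-delimited attribute value into a list.

    XML itself sees one attribute value; the list convention is applied
    by the application, not by the processor.
    """
    return element.get("class", "").split()


children = describe(SAMPLE)
root = ET.fromstring(SAMPLE.encode("utf-8"))
names = class_names(root.find("div"))

for tag, attributes, text in children:
    print(tag, attributes, text)
print(names)
```

The processor hands back each attribute as a single string; splitting "inner greeting-box" into two names is a choice made in application code.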
Characters and escaping
XML documents consist entirely of characters from the Unicode repertoire. Except for a small number of specifically excluded control characters, any character defined by Unicode may appear within the content of an XML document. XML includes facilities for identifying the encoding of the Unicode characters that make up the document, and for expressing characters that, for one reason or another, cannot be used directly.
Valid characters
Unicode code points in the following ranges are valid in XML 1.0 documents: [10]
- U+0009 (Horizontal Tab), U+000A (Line Feed), U+000D (Carriage Return): these are the only C0 controls accepted in XML 1.0;
- U+0020–U+D7FF, U+E000–U+FFFD: this excludes some noncharacters in the BMP (all surrogates, U+FFFE and U+FFFF are forbidden);
- U+10000–U+10FFFF: this includes all code points in supplementary planes, including noncharacters.
XML 1.1 extends the set of allowed characters to include all the above, plus the remaining characters in the range U+0001–U+001F. [11] At the same time, however, it restricts the use of C0 and C1 control characters other than U+0009 (Horizontal Tab), U+000A (Line Feed), U+000D (Carriage Return), and U+0085 (Next Line) by requiring them to be written in escaped form (for example U+0001 must be written as &#x01; or its equivalent). In the case of C1 characters, this restriction is a backwards incompatibility; it was introduced to allow common encoding errors to be detected. The code point U+0000 (Null) is the only character that is not permitted in any XML 1.1 document.
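The XML 1.0 ranges listed above translate directly into a membership test. The following Python sketch covers XML 1.0 only; the names XML10_RANGES, is_xml10_char, and first_invalid are hypothetical helpers chosen for illustration.

```python
# Code point ranges permitted in XML 1.0 documents, as listed above.
XML10_RANGES = (
    (0x9, 0xA),           # horizontal tab, line feed
    (0xD, 0xD),           # carriage return
    (0x20, 0xD7FF),       # stops before the surrogate block
    (0xE000, 0xFFFD),     # excludes U+FFFE and U+FFFF
    (0x10000, 0x10FFFF),  # supplementary planes
)


def is_xml10_char(ch):
    """Return True if the single character ch may appear in an XML 1.0 document."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in XML10_RANGES)


def first_invalid(text):
    """Return (index, character) of the first disallowed character, or None."""
    for index, ch in enumerate(text):
        if not is_xml10_char(ch):
            return index, ch
    return None


print(is_xml10_char("\t"))         # tab is one of the three accepted C0 controls
print(is_xml10_char("\x01"))       # other C0 controls are rejected in XML 1.0
print(first_invalid("ab\x0bcd"))   # vertical tab at index 2
```

A check like this is useful before serializing text from an untrusted source, since a single disallowed control character makes the whole document not well-formed.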
Encoding detection
The Unicode character set can be encoded into bytes for storage or transmission in a variety of different ways, called "encodings". Unicode itself defines encodings that cover the entire repertoire; well-known ones include UTF-8 (which the XML standard recommends using, without a BOM) and UTF-16. [12] There are many other text encodings that predate Unicode, such as ASCII and various ISO/IEC 8859; their character repertoires are in every case subsets of the Unicode character set. XML allows the use of any of the Unicode-defined encodings and any other encodings whose characters also appear in Unicode. XML also provides a mechanism whereby an XML processor can reliably, without any prior knowledge, determine which encoding is being used. [13] Encodings other than UTF-8 and UTF-16 are not necessarily recognized by every XML parser (and in some cases not even UTF-16, even though the standard mandates it to also be recognized).
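As a rough illustration of encoding detection, the Python sketch below feeds raw bytes to the standard library's xml.etree.ElementTree and lets the parser work out the encoding from the XML declaration. The sample documents and the helper name text_of are assumptions made for this example; whether a given encoding is recognized depends on the parser in use.

```python
import xml.etree.ElementTree as ET


def text_of(raw_bytes):
    """Parse raw bytes and return the text of the root element.

    No encoding is passed in: the parser is left to determine it
    from the document itself.
    """
    return ET.fromstring(raw_bytes).text


# The same logical document, serialized in two different encodings.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><name>Jörg</name>'.encode("utf-8")
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Jörg</name>'.encode("iso-8859-1")

print(utf8_doc == latin1_doc)   # the byte sequences differ
print(text_of(utf8_doc) == text_of(latin1_doc))
```

The two byte strings are different on disk, yet both parse to the same character data, because the declaration tells the processor how to decode what follows.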
Escaping
XML provides escape facilities for including characters that are problematic to include directly. For example:
- The characters "<" and "&" are key syntax markers and may never appear in content outside a CDATA section.
- Some character encodings support only a subset of Unicode. For example, it is legal to encode an XML document in ASCII, but ASCII lacks code points for Unicode characters such as "é".
- It might not be possible to type the character on the author's machine.
- Some characters have glyphs that cannot be visually distinguished from other characters, such as the nonbreaking space (&#xa0;) " " and the space (&#x20;) " ", and the Cyrillic capital letter A (&#x410;) "А" and the Latin capital letter A (&#x41;) "A".
There are five predefined entities:
- &lt; represents "<";
- &gt; represents ">";
- &amp; represents "&";
- &apos; represents "'";
- &quot; represents '"'.
All permitted Unicode characters may be represented with a numeric character reference. Consider the Chinese character "中", whose numeric code in Unicode is hexadecimal 4E2D, or decimal 20,013. A user whose keyboard offers no method for entering this character could still insert it in an XML document encoded either as &#20013; or &#x4e2d;. Similarly, the string "I <3 Jörg" could be encoded for inclusion in an XML document as I &lt;3 J&#xF6;rg.
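The escape facilities described in this section can be tried out with Python's standard library. This is a minimal sketch: the sample string in raw and the element names note and c are hypothetical, and xml.sax.saxutils.escape only handles "&", "<", and ">" unless told otherwise.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Hypothetical content containing both key syntax markers.
raw = "Fish & chips < 5"

# Replace "&", "<" and ">" with the predefined entities.
escaped = escape(raw)

# A parser turns the entities back into the original characters.
round_trip = ET.fromstring("<note>" + escaped + "</note>").text

# Decimal and hexadecimal numeric character references for U+4E2D.
decimal_ref = ET.fromstring("<c>&#20013;</c>").text
hex_ref = ET.fromstring("<c>&#x4e2d;</c>").text

# Writing to an ASCII-only encoding: characters outside ASCII
# become numeric character references.
ascii_bytes = "Jörg".encode("ascii", "xmlcharrefreplace")

print(escaped)
print(round_trip == raw, unescape(escaped) == raw)
print(decimal_ref, hex_ref)
print(ascii_bytes)
```

The last step shows how a document can stay legal in an encoding such as ASCII: characters the encoding cannot represent are written as numeric character references instead of raw bytes.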