The XMLStreamReader is designed to iterate over XML using next() and hasNext(). The data can be accessed using methods such as getEventType(), getNamespaceURI(), getLocalName() and getText().
The next() method causes the reader to read the next parse event. The next() method returns an integer which identifies the type of event just read.
The event type can be determined using getEventType().
Parsing events are defined as the XML Declaration, a DTD, start tag, character data, white space, end tag, comment, or processing instruction. An attribute or namespace event may be encountered at the root level of a document as the result of a query operation.
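The iteration described above can be sketched in Java. This is a minimal sketch that assumes the standard javax.xml.stream API as a stand-in for the reader documented here; the class name EventLoopSketch and the method startTagNames are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EventLoopSketch {

    /** Walks the whole document and returns the local name of every start tag, in document order. */
    static List<String> startTagNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // next() advances the cursor and returns the type of the event just read.
                    int eventType = reader.next();
                    if (eventType == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Assumption of this sketch: malformed input is reported as an unchecked exception.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(startTagNames("<order><item>pen</item><item>ink</item></order>"));
    }
}
```

Only the event type is inspected in the loop; the accessor getLocalName() is called after the type check confirms that the cursor is on a start tag.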
The following table describes which methods are valid in which state. If a method is called in an invalid state, the method throws a java.lang.IllegalStateException.
Valid methods for each state:

Event Type | Valid Methods |
---|---|
All States | getProperty(), hasNext(), require(), close(), getNamespaceURI(), isStartElement(), isEndElement(), isCharacters(), isWhiteSpace(), getNamespaceContext(), getEventType(), getLocation(), hasText(), hasName() |
START_ELEMENT | next(), getName(), getLocalName(), hasName(), getPrefix(), getAttributeXXX(), isAttributeSpecified(), getNamespaceXXX(), getElementText(), nextTag(), getXMLObject() |
ATTRIBUTE | next(), nextTag(), getAttributeXXX(), isAttributeSpecified() |
NAMESPACE | next(), nextTag(), getNamespaceXXX() |
END_ELEMENT | next(), getName(), getLocalName(), hasName(), getPrefix(), getNamespaceXXX(), nextTag() |
CHARACTERS | next(), getTextXXX(), nextTag() |
CDATA | next(), getTextXXX(), nextTag() |
COMMENT | next(), getTextXXX(), nextTag() |
SPACE | next(), getTextXXX(), nextTag() |
START_DOCUMENT | next(), getEncoding(), getVersion(), isStandalone(), standaloneSet(), getCharacterEncodingScheme(), nextTag() |
END_DOCUMENT | close() |
PROCESSING_INSTRUCTION | next(), getPITarget(), getPIData(), nextTag() |
ENTITY_REFERENCE | next(), getLocalName(), getText(), nextTag() |
DTD | next(), getText(), nextTag() |
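
One way to avoid the IllegalStateException is to guard state-dependent accessors with a method that is valid in every state. The sketch below assumes the standard javax.xml.stream API, where hasText() can be called in any state; StateGuardSketch, textOrNull and probe are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StateGuardSketch {

    /** Returns the text of the current event, or null when the current state carries no text. */
    static String textOrNull(XMLStreamReader reader) {
        // hasText() is valid in all states, so it is a safe guard for getText().
        return reader.hasText() ? reader.getText() : null;
    }

    /** Probes the root START_ELEMENT and the event directly after it. */
    static String[] probe(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag();                        // cursor on the root START_ELEMENT
                String onStartTag = textOrNull(reader);  // no text in this state
                reader.next();                           // cursor on the following event
                String onNextEvent = textOrNull(reader);
                return new String[] { onStartTag, onNextEvent };
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Assumption of this sketch: malformed input is reported as an unchecked exception.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String[] result = probe("<a>hi</a>");
        System.out.println(result[0] + " / " + result[1]);
    }
}
```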
The following is a code sample to read an XML file containing multiple "myobject" sub-elements. Only one myObject instance is kept in memory at any given time to keep memory consumption low:

    var fileReader : FileReader = new FileReader(file, "UTF-8");
    var xmlStreamReader : XMLStreamReader = new XMLStreamReader(fileReader);

    while (xmlStreamReader.hasNext())
    {
        if (xmlStreamReader.next() == XMLStreamConstants.START_ELEMENT)
        {
            var localElementName : String = xmlStreamReader.getLocalName();
            if (localElementName == "myobject")
            {
                // read single "myobject" as XML
                var myObject : XML = xmlStreamReader.getXMLObject();

                // process myObject
            }
        }
    }

    xmlStreamReader.close();
    fileReader.close();
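The same one-record-at-a-time pattern might look as follows in Java. This is a sketch under stated assumptions: it uses the standard javax.xml.stream API, which has no getXMLObject() method, so getElementText() stands in and each "myobject" element is assumed to contain text only. RecordStreamSketch and forEachRecord are hypothetical names.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class RecordStreamSketch {

    /** Hands the text of each "myobject" element to the handler, one at a time; returns the count. */
    static int forEachRecord(Reader source, Consumer<String> handler) {
        int count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "myobject".equals(reader.getLocalName())) {
                        // Stand-in for getXMLObject(): read only this element's text content.
                        // No reference to the record is retained after the handler returns.
                        handler.accept(reader.getElementText());
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Assumption of this sketch: malformed input is reported as an unchecked exception.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<objects><myobject>first</myobject><myobject>second</myobject></objects>";
        int records = forEachRecord(new StringReader(xml), System.out::println);
        System.out.println(records + " records");
    }
}
```

Passing the handler as a callback keeps the loop from accumulating records, which mirrors the memory argument made for the sample above.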
Precondition: the current event is START_ELEMENT.
Postcondition: the current event is the corresponding END_ELEMENT.
The getElementText() method does the following (implementations are free to be optimized but must do equivalent processing):

    if ( getEventType() != XMLStreamConstants.START_ELEMENT )
    {
        throw new XMLStreamException( "parser must be on START_ELEMENT to read next text", getLocation() );
    }
    int eventType = next();
    StringBuffer content = new StringBuffer();
    while ( eventType != XMLStreamConstants.END_ELEMENT )
    {
        if ( eventType == XMLStreamConstants.CHARACTERS
                || eventType == XMLStreamConstants.CDATA
                || eventType == XMLStreamConstants.SPACE
                || eventType == XMLStreamConstants.ENTITY_REFERENCE )
        {
            // text-like events contribute to the result
            content.append( getText() );
        }
        else if ( eventType == XMLStreamConstants.PROCESSING_INSTRUCTION
                || eventType == XMLStreamConstants.COMMENT )
        {
            // skipping
        }
        else if ( eventType == XMLStreamConstants.END_DOCUMENT )
        {
            throw new XMLStreamException( "unexpected end of document when reading element text content", getLocation() );
        }
        else if ( eventType == XMLStreamConstants.START_ELEMENT )
        {
            throw new XMLStreamException( "element text content may not contain START_ELEMENT", getLocation() );
        }
        else
        {
            throw new XMLStreamException( "Unexpected event type " + eventType, getLocation() );
        }
        eventType = next();
    }
    return content.toString();
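The precondition and postcondition can be observed with a small Java sketch. It assumes the standard javax.xml.stream API, whose getElementText() follows the same processing; ElementTextSketch and rootText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ElementTextSketch {

    /** Returns the text content of the root element, which must not contain child elements. */
    static String rootText(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag();                       // precondition: cursor on START_ELEMENT
                String text = reader.getElementText();  // comments and processing instructions are skipped
                if (!reader.isEndElement()) {           // postcondition: cursor on the matching END_ELEMENT
                    throw new IllegalStateException("expected END_ELEMENT after getElementText()");
                }
                return text;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Raised for malformed input and for element content that contains a START_ELEMENT.
            throw new IllegalArgumentException("root element is not text-only XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText("<a>hello<!--c--> <![CDATA[<b>]]>world</a>"));
    }
}
```

A nested element such as `<a>x<b/></a>` takes the START_ELEMENT branch of the processing above and surfaces here as an IllegalArgumentException.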
For getXMLObject(), the stream must be positioned on a START_ELEMENT. Do not call the method when the stream is positioned on the document's root element: the whole document would be parsed into a single XML object, which may lead to an out-of-memory condition. Instead, use next() to navigate to sub-elements and invoke getXMLObject() there. To keep memory consumption low, do not keep references to more than the XML object currently being processed. The method reads the stream up to the matching END_ELEMENT; when it returns, the current event is that END_ELEMENT.
NOTE: The 'xml' prefix is bound, as defined in the Namespaces in XML specification, to "http://www.w3.org/XML/1998/namespace".
NOTE: The 'xmlns' prefix must be resolved to the following namespace: "http://www.w3.org/2000/xmlns/".
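As an illustration of prefix resolution, the following Java sketch resolves a declared prefix on a start tag and prints the two reserved namespace URIs from the notes above. It assumes the standard javax.xml.stream and javax.xml.XMLConstants APIs; NamespaceSketch, resolveOnRoot and the urn:example:doc namespace are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class NamespaceSketch {

    /** Resolves a prefix against the namespace declarations in scope at the root start tag. */
    static String resolveOnRoot(String xml, String prefix) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag();  // cursor on the root START_ELEMENT
                return reader.getNamespaceURI(prefix);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Assumption of this sketch: malformed input is reported as an unchecked exception.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // A prefix declared in the document resolves to the URI given in its xmlns attribute.
        System.out.println(resolveOnRoot("<p:doc xmlns:p=\"urn:example:doc\"/>", "p"));
        // The reserved bindings are available as constants in javax.xml.XMLConstants.
        System.out.println(XMLConstants.XML_NS_URI);
        System.out.println(XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
    }
}
```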
Given the following XML:
<foo><!--description-->content text<![CDATA[<greeting>Hello</greeting>]]>other content</foo>
Calling next() repeatedly, starting on the foo START_ELEMENT, reports the following events in order:
1- the comment (COMMENT)
2- then the characters section (CHARACTERS)
3- then the CDATA section (another CHARACTERS)
4- then the next characters section (another CHARACTERS)
5- then the END_ELEMENT
NOTE: An empty element (such as <tag/>) is reported as two separate events, START_ELEMENT and END_ELEMENT. This preserves the parsing equivalence of an empty element and <tag></tag>. The next() method throws an IllegalStateException if it is called after hasNext() returns false.
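The event sequence above can be reproduced with a short Java sketch. It assumes the standard javax.xml.stream API, where an implementation may report the CDATA section with its own CDATA event type or split text into several CHARACTERS events, so the exact list can differ from the one shown here. NextEventsSketch and eventsInsideRoot are hypothetical names, and the helper assumes a root element without child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class NextEventsSketch {

    static final String FOO_XML =
            "<foo><!--description-->content text<![CDATA[<greeting>Hello</greeting>]]>other content</foo>";

    /** Returns the event types reported after the root start tag, up to and including its end tag. */
    static List<Integer> eventsInsideRoot(String xml) {
        List<Integer> events = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag();  // cursor on the root START_ELEMENT
                int eventType;
                do {
                    eventType = reader.next();
                    events.add(eventType);
                } while (eventType != XMLStreamConstants.END_ELEMENT);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Assumption of this sketch: malformed input is reported as an unchecked exception.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Prints the numeric XMLStreamConstants values of the reported events.
        System.out.println(eventsInsideRoot(FOO_XML));
        // An empty element still yields an END_ELEMENT after its START_ELEMENT.
        System.out.println(eventsInsideRoot("<tag/>"));
    }
}
```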
Precondition: none
Postcondition: the current event is START_ELEMENT or END_ELEMENT, and the cursor may have moved over any whitespace event.
Essentially, nextTag() does the following (implementations are free to be optimized but must do equivalent processing):

    int eventType = next();
    // skip whitespace, comments, and processing instructions
    while ( ( eventType == XMLStreamConstants.CHARACTERS && isWhiteSpace() )
            || ( eventType == XMLStreamConstants.CDATA && isWhiteSpace() )
            || eventType == XMLStreamConstants.SPACE
            || eventType == XMLStreamConstants.PROCESSING_INSTRUCTION
            || eventType == XMLStreamConstants.COMMENT )
    {
        eventType = next();
    }
    if ( eventType != XMLStreamConstants.START_ELEMENT && eventType != XMLStreamConstants.END_ELEMENT )
    {
        throw new XMLStreamException( "expected start or end tag", getLocation() );
    }
    return eventType;
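A brief Java sketch of this behavior follows, assuming the standard javax.xml.stream API, whose nextTag() performs the same processing. NextTagSketch and tagAfterRoot are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class NextTagSketch {

    /** Positions on the root start tag, calls nextTag() once, and returns the local name of the tag reached. */
    static String tagAfterRoot(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag();  // cursor on the root START_ELEMENT
                reader.nextTag();  // skips whitespace, comments, and processing instructions
                return reader.getLocalName();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Raised for malformed input and when non-whitespace text precedes the next tag.
            throw new IllegalArgumentException("no tag directly follows the root start tag", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tagAfterRoot("<root>\n  <!-- note -->\n  <?pi data?>\n  <child/>\n</root>"));
    }
}
```

When the root contains non-whitespace text before its first child, as in `<root>text<child/></root>`, the final check in the processing above fails and this sketch reports an IllegalArgumentException.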