W3CHINA.ORG Forum -- [Repost] Things to watch for when processing XML with JAXP

http://blogger.org.cn/blog/more.asp?name=hongrui&id=24292

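Processing XML is a routine task, and JAXP is a fairly general way to do it. This document introduces several useful methods and a few points to watch out for.
1. Loading an XML file with the default EntityResolver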
/**
* Creates and returns a Document from the given input source.
* @param inputSource the source of the document
* @param validating true if the parser should validate
* @param namespaceAware true if the parser should be namespaceAware
* @param errorHandler errorHandler for the DocumentBuilder
* @param coalescing true if the parser produced will convert CDATA
* nodes to Text nodes and append it to the adjacent (if any) text node; false otherwise.
* @param entityResolver the entity resolver
* @return the parsed and loaded DOM document.
*/
public static Document fromInputSource(InputSource inputSource, boolean validating, boolean namespaceAware, ErrorHandler errorHandler, boolean coalescing, EntityResolver entityResolver) throws XMLException
{
   DocumentBuilderFactory dbFact;
   // Must be initialized: every catch block falls through to "return doc"
   Document doc = null;
   try
   {
     dbFact = DocumentBuilderFactory.newInstance();
     dbFact.setNamespaceAware(namespaceAware);
     dbFact.setValidating(validating);
     dbFact.setCoalescing(coalescing);
     DocumentBuilder db = dbFact.newDocumentBuilder();
     if (errorHandler != null)
     {
       db.setErrorHandler(errorHandler);
     }
     if (entityResolver != null)
     {
       db.setEntityResolver(new EntityResolverDelegate( new EntityResolver[] { entityResolver, getDTDResolver() }));
     }
     else
     {
       db.setEntityResolver(getDTDResolver());
     }
     doc = db.parse(inputSource);
     // normalize() belongs to the parsed Document, not to the DocumentBuilder
     doc.normalize();
   }
   catch (IOException e)
   {
     e.printStackTrace();
   }
     catch (ParserConfigurationException e)
   {
     e.printStackTrace();
   }
   catch (SAXException e)
   {
     e.printStackTrace();
   }
   return doc;
}
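The listing above relies on two helpers that are not shown in this post, EntityResolverDelegate and getDTDResolver(). As a rough idea of what a delegating resolver might look like, here is a minimal sketch. The class name ChainedEntityResolver, the resolves helper, and the example.invalid system IDs are all made up for illustration; this is not the original class.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical stand-in for EntityResolverDelegate: asks each resolver in turn.
class ChainedEntityResolver implements EntityResolver {
    private final EntityResolver[] resolvers;

    ChainedEntityResolver(EntityResolver... resolvers) {
        this.resolvers = resolvers;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        for (EntityResolver resolver : resolvers) {
            InputSource source = resolver.resolveEntity(publicId, systemId);
            if (source != null) {
                return source; // the first resolver that recognizes the entity wins
            }
        }
        return null; // null tells the parser to fall back to its default lookup
    }

    // Demo helper: true if some resolver in the chain handled the system ID.
    static boolean resolves(EntityResolver chain, String systemId) {
        try {
            return chain.resolveEntity(null, systemId) != null;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Example resolver: serves an empty DTD for one made-up system ID.
        EntityResolver local = (publicId, systemId) ->
                systemId.endsWith("example.dtd") ? new InputSource(new StringReader("")) : null;
        EntityResolver never = (publicId, systemId) -> null;

        EntityResolver chain = new ChainedEntityResolver(never, local);
        System.out.println(resolves(chain, "http://example.invalid/example.dtd"));
        System.out.println(resolves(chain, "http://example.invalid/other.dtd"));
    }
}
```

Returning null from resolveEntity is the documented way to let the parser open the system ID itself, so the chain degrades gracefully when no resolver matches.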
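Note that parse(String) treats its argument as a URI, not as XML content. To parse XML held in a string, you usually write: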
Document doc = builder.parse(new InputSource(new StringReader(strXml)));
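For anyone who wants to try this in isolation, here is a small self-contained sketch of the same call. The class name ParseStringDemo and the sample XML are only for illustration. It also prints the number of child nodes of the root, which shows that indentation whitespace arrives in the DOM as text nodes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseStringDemo {
    // Parses XML held in memory; parse(String) would treat the argument as a URI.
    static Document parseXml(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml("<order>\n  <item>book</item>\n</order>");
        System.out.println(doc.getDocumentElement().getNodeName());
        // Children of <order>: whitespace text, <item>, whitespace text.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```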
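Calling normalize() on the Document merges adjacent text nodes and removes empty ones, so the DOM tree has the same shape it would have after being saved and reloaded. This is especially useful before output. Note that normalize() does not remove the whitespace-only text nodes that come from indentation; they remain in the DOM tree as text node objects, so the tree you get may not look the way you imagine. To have the parser drop them, call setIgnoringElementContentWhitespace(true) on the factory, which requires a validating parser and a DTD.
2. Creating a new document
/**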
* Creates a new XML document and returns it.
* @return a new XML document without any data
* @exception XMLException if an error occurred creating the new XML document
*/
public static Document newDocument() throws XMLException
{
   try
   {
     DocumentBuilderFactory factory;
     DocumentBuilder builder;
     factory = DocumentBuilderFactory.newInstance();
     factory.setNamespaceAware(false);
     factory.setValidating(false);
     builder = factory.newDocumentBuilder();
     return builder.newDocument();
   }
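   catch (ParserConfigurationException e)
   {
      e.printStackTrace();
   }
   // Reached only when the builder could not be created. Without this return the
   // method does not compile. Wrapping e in an XMLException would match the Javadoc,
   // but that class is not shown in this post.
   return null;
}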
3. Creating a child element with text
/**
* Creates a child element with a text value
* @param parent The parent element
* @param name The child's name (tag)
* @param textValue The text that goes in the child's element
* @return The newly created element
*/
public static Element createChildElementWithText(Element parent, String name, String textValue)

{
   Element child = parent.getOwnerDocument().createElement(name);
   parent.appendChild(child);
   setTextForNode(child, textValue);
   return child;
}
4. Setting the text of a node
/**
* Updates the text for the given node.
* @param node the node to update
* @param newValue the value to be placed at that node
*/
public static void setTextForNode(Node node, String newValue) {
   NodeList children;
   Node childNode;
   children = node.getChildNodes();
   boolean success = false;
   if (children != null)
   {
     for (int i = 0; i < children.getLength(); i++)
     {
       childNode = children.item(i);
       if ((childNode.getNodeType() == org.w3c.dom.Node.TEXT_NODE) || (childNode.getNodeType() == Node.CDATA_SECTION_NODE))
       {
         childNode.setNodeValue(newValue);
         success = true;
       }
     }
   }
   if (!success)
   {
     Text textNode = node.getOwnerDocument().createTextNode(newValue);
     node.appendChild(textNode);
   }
}
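Since DOM Level 3 (part of JAXP 1.3 and Java 5), Node also has setTextContent and getTextContent, which can stand in for the helpers in sections 3 and 4 in simple cases. The following is only a sketch; the class name TextContentDemo and the method addChildWithText are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TextContentDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same idea as createChildElementWithText, using the DOM Level 3 call.
    static Element addChildWithText(Element parent, String name, String text) {
        Element child = parent.getOwnerDocument().createElement(name);
        child.setTextContent(text); // replaces all children with a single text node
        parent.appendChild(child);
        return child;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("order");
        doc.appendChild(root);

        Element item = addChildWithText(root, "item", "book");
        item.setTextContent("pen"); // overwrite the previous value
        System.out.println(item.getTextContent());
        System.out.println(item.getChildNodes().getLength());
    }
}
```

One difference to keep in mind: setTextContent removes every child of the node, including child elements, whereas setTextForNode above only touches text and CDATA children.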
5. Finding an Element by name (single)
/**
* Gets the first child element with the given name or returns null if one can't be found.
* @param parent the parent element of the child element to search for
* @param childName the name of the child element
* @param deepSearch if true, the search is performed on all levels;
* if false, only direct children are searched
* @return the first child element with the given name or null if one can't be found.
*/
public static Element getFirstChildElementNamed(Element parent, String childName, boolean deepSearch)
{
   if (parent == null)
   {
     throw new NullPointerException("Parent element cannot be null");
   }
   NodeList children = parent.getChildNodes();
   Element child = null;
   for (int i = 0; i < children.getLength() && child == null; i++)
   {
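     // Check the node type first so that the cast to Element below cannot fail
     if (children.item(i).getNodeType() == Node.ELEMENT_NODE && children.item(i).getNodeName().equals(childName))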
     {
       child = (Element)children.item(i);
     }
     else if ((deepSearch) && (children.item(i).getNodeType() == Element.ELEMENT_NODE))
     {
       child = getFirstChildElementNamed((Element)children.item(i), childName, deepSearch);
     }
   }
   return child;
}
5. Finding Elements by name (multiple)
/**
* Gets all child elements with the given name, or an empty array if none are found.
* @param parent the parent element of the child elements to search for
* @param childName the name of the child elements
* @param deepSearch if true, the search is performed on all levels;
* if false, only direct children are searched
* @return all child elements with the given name, or an empty array if none are found.
*/
public static Element[] getAllChildElementNamed(Element parent, String childName, boolean deepSearch)
{
   if (parent == null)
   {
     throw new NullPointerException("Parent element cannot be null");
   }
   NodeList children = parent.getChildNodes();
   ArrayList child = new ArrayList();
   for (int i = 0; i < children.getLength(); i++)
   {
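     // Only elements may be added, otherwise toArray(new Element[0]) fails at runtime
     if (children.item(i).getNodeType() == Node.ELEMENT_NODE && children.item(i).getNodeName().equals(childName))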
     {
       child.add(children.item(i));
     }
     else if ((deepSearch) && (children.item(i).getNodeType() == Element.ELEMENT_NODE))
     {
       Element[] childs = getAllChildElementNamed((Element)children.item(i), childName, deepSearch);
       for (int j=0; j< childs.length; j++)
       {
         child.add(childs[j]);
       }
     }
   }
   return (Element[])child.toArray(new Element[0]);
}
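For comparison, the standard DOM API already covers the deep case with getElementsByTagName, and direct children can be collected by walking siblings. A minimal sketch follows; the class name ChildSearchDemo, its helper methods, and the sample XML are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ChildSearchDemo {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Direct children only; the node type is checked before the cast.
    static List<Element> directChildren(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot("<a><b/><c><b/></c><b/></a>");
        // Two <b> elements are direct children of <a> ...
        System.out.println(directChildren(root, "b").size());
        // ... while getElementsByTagName also finds the one nested inside <c>.
        System.out.println(root.getElementsByTagName("b").getLength());
    }
}
```

Unlike the deep search above, getElementsByTagName also descends into elements that already matched, so the two can return different results for nested elements of the same name.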
6. Getting the value of a node
/**
* Gets the String value of the node.
* If the node does not contain text then an empty String is returned
* @param node the node of interest
* @return the value of that node
*/
public static String getTextForNode(Node node)
{
   NodeList children;
   Node childNode;
   String returnString = "";
   children = node.getChildNodes();
   if (children != null)
   {
      for (int i = 0; i < children.getLength(); i++)
      {
         childNode = children.item(i);
         if ((childNode.getNodeType() == Node.TEXT_NODE) || (childNode.getNodeType() == Node.CDATA_SECTION_NODE))
         {
            returnString = childNode.getNodeValue();
            break;
         }
      }
   }
   return returnString;
}
/**
* Returns the text from all CDATA sections of the specified element.
* @param element the element
* @return the value of the CDATA sections or null
*/
public static String getCDATA(Element element)
{
   StringBuffer buffer = new StringBuffer();
   NodeList children = element.getChildNodes();
   for (int i = 0; i < children.getLength(); i++)
   {
     Node child = children.item(i);
     if (child.getNodeType() == Node.CDATA_SECTION_NODE)
     {
       CDATASection section = (CDATASection)child;
       buffer.append(section.getNodeValue());
     }
   }
   String returnValue = null;
   if (buffer.length() > 0)
   {
     returnValue = buffer.toString();
   }
   return returnValue;
}
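The two methods above treat text and CDATA children as separate nodes, which is how a parser delivers them unless coalescing is switched on (the coalescing parameter in section 1). The sketch below shows the difference; the class name CoalescingDemo and the sample XML are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CoalescingDemo {
    static Element parseRoot(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalescing); // true: CDATA is merged into adjacent text
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>x<![CDATA[y]]>z</a>";
        Element separate = parseRoot(xml, false);
        Element merged = parseRoot(xml, true);

        // Without coalescing: Text, CDATASection, Text.
        System.out.println(separate.getChildNodes().getLength());
        // With coalescing: one Text node holding everything.
        System.out.println(merged.getChildNodes().getLength());
        // getTextContent() concatenates all text either way.
        System.out.println(separate.getTextContent());
    }
}
```

For the non-coalesced tree in this example, getTextForNode would return only "x" (the first text child) and getCDATA would return "y", so pick the helper, or getTextContent(), according to what the caller actually needs.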
7. Converting an XML document to a string
/**
  * Serializes a Document to a string.
  * @param doc
  *            the Document to serialize
  * @return the XML as a string
  * @throws TransformerException
  */
public static String getStringFromDocument(Document doc)
   throws TransformerException {
  StringWriter writer = new StringWriter();
  Transformer transformer = null;
  transformer = TransformerFactory.newInstance().newTransformer();
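  // Get the Transformer's output properties, i.e. the XSLT engine's default output properties, as a java.util.Properties object
  Properties properties = transformer.getOutputProperties();
  // Set the output encoding to UTF-8 so that Chinese characters are supported
  // and XML output that contains them displays correctly.
  properties.setProperty(OutputKeys.ENCODING, "UTF-8");
  // Set the output method to XML; this is in fact the XSLT engine's default output format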
  properties.setProperty(OutputKeys.METHOD, "xml");
  transformer.setOutputProperties(properties);
  transformer.transform(new DOMSource(doc), new StreamResult(writer));

  return writer.toString();
}
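A couple of other output properties are often useful when turning a document into a string, for instance leaving out the XML declaration. The sketch below uses setOutputProperty directly; the class name SerializeDemo, the toXml and sample helpers, and the greeting element are made up for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SerializeDemo {
    static String toXml(Document doc, boolean omitDeclaration) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                    omitDeclaration ? "yes" : "no");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    // Builds <greeting> with two Chinese characters as its text.
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("greeting");
            root.setTextContent("\u4f60\u597d"); // written as Unicode escapes
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(sample(), true));
        System.out.println(toXml(sample(), false));
    }
}
```

When the result goes to a StringWriter, the ENCODING property only changes what the XML declaration says, because a Java String is made of characters rather than bytes. If the string is later written to a file, write it in the same encoding that the declaration names.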
8. Saving a document to a file
/**
* Writes the document passed to the file passed.
* @param fileName the name of the file to store the document to
* @param XMLdoc the source document to transform to the file
*/

public static void toFile(String fileName, Document XMLdoc)
{
   // try-with-resources (Java 7+) closes the stream even if the transform fails;
   // the original version never closed it.
   try (FileOutputStream fos = new FileOutputStream(new File(fileName)))
   {
      TransformerFactory tFactory = TransformerFactory.newInstance();
      Transformer transformer = tFactory.newTransformer();

      DOMSource source = new DOMSource(XMLdoc);
      StreamResult result = new StreamResult(fos);
      transformer.transform(source, result);
   }
   catch (TransformerException e)
   {
      e.printStackTrace();
   }
   catch (IOException e)
   {
      // Covers FileNotFoundException and failures while closing the stream
      e.printStackTrace();
   }
}
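9. Using XPath
XPath 1.0 is a W3C Recommendation that defines a language for extracting parts of an XML document. You can extract a set of elements together with all their descendants, or just a single attribute value. To select part of a document, you specify a starting node (called the context node) and a path from it to the content you want. A path can select a child element of the context node, or it can select, from the whole subtree rooted at the context node, every element that satisfies a complex expression (for example, one that has two child elements with particular attribute values).
It was not until JAXP 1.3, however, that this capability was finally brought into the Java platform. Unlike earlier XPath APIs, JAXP 1.3 is completely vendor-neutral, and, as in the parser and transformer areas, it provides the same kind of factory mechanism for discovering and creating compatible objects. The JAXP 1.3 API is also independent of the underlying data model. In theory, it can work with any data model, as long as the mapping to the simple data model defined by XPath 1.0 has been carefully designed (so that XPath expressions can be applied deterministically). The W3C Document Object Model (DOM) is the only data model that JAXP 1.3 implementations are required to support.
The javax.xml.xpath package contains all the interfaces and abstract classes of the new XPath API. The object used to create XPath evaluation objects is called XPathFactory, and the objects it creates are called XPath.
An XPathFactory only needs to know how to create XPath objects for one particular data model, so the data model must be specified when the XPathFactory is created. To keep the API unambiguous, each data model is assigned a URI. If no URI is specified, an XPathFactory for the DOM model is created. The same XPath object can be used with multiple DOM trees, but note that XPath objects are not thread-safe.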
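There are two main ways to use an XPath object:
To evaluate an XPath expression, pass the expression as a plain String, together with a node from an instance of a supported data model as the context node. This is called interpreting the XPath expression, because the String is applied directly to the data model.
Convert the XPath expression from a String into an XPathExpression object, then apply that object to any node in an instance of a supported data model. An XPathExpression is created by passing the String form of the expression to the compile method of XPath. It is the compiled form of the original XPath String: an internal, optimized representation of the expression. In fact, in many implementations an XPathExpression consists entirely of Java bytecode and is hard to optimize any further.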
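When both approaches are possible, consider which one is better. Keep in mind that compiling an XPath expression into an optimized representation, bytecode included, takes a lot of work, so an expression that is very simple or rarely used may not be worth compiling. For complex expressions, though, and especially for ones the application evaluates frequently, compilation can greatly improve performance.
XPath and XPathExpression each provide four ways to evaluate an XPath expression, implemented as overloads of evaluate. The corresponding method signatures are the same, except that the XPath methods take the String form of the expression as their first parameter, whereas XPathExpression omits it because the object already contains a specific expression. For brevity, this article discusses only the evaluate methods of XPathExpression.
The most commonly used form of evaluate is probably the one that takes an Object representing the context node and a QName indicating the return type of the expression. The return type can be one of the four basic XPath 1.0 data types: boolean, string, node-set, and number. The data model in use determines how these XPath types are represented in Java code; for the DOM model they are Boolean, String, org.w3c.dom.NodeList, and Double. So when deciding how to cast the return value of this method, you must consider both the expected return type and the data model in use. The object that represents the context node must also fit the data model; for DOM, any kind of org.w3c.dom.Node can be used. The XPathConstants class in this package defines the QNames for the four XPath data types.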
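The other forms of evaluate differ only slightly:
evaluate(Object item): String is shorthand for evaluate(item, XPathConstants.STRING); note that the return type is known in advance.
evaluate(InputSource source, QName returnType): Object parses the org.xml.sax.InputSource into an instance of the data model and uses its root node as the context node; otherwise it behaves like the evaluate(Object, QName) form described first.
evaluate(InputSource source): String is shorthand for evaluate(source, XPathConstants.STRING), the counterpart of evaluate(Object item).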
XPath xpath = XPathFactory.newInstance().newXPath();
   String expression = "/*[@action]";
   NodeList nameNodes = (NodeList) xpath.evaluate(expression, doc,
     XPathConstants.NODESET);
   if (nameNodes.getLength() > 0) {
    NamedNodeMap attributes = nameNodes.item(0).getAttributes();
    if (attributes != null) {
     for (int i = 0; i < attributes.getLength(); i++) {
      Node current = attributes.item(i);
      if (current.getNodeName().equalsIgnoreCase("action")) {
       cd.setAction(current.getNodeValue());
      }
     }
    }

}
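The example above interprets the expression directly through the XPath object. The compiled route and the four return types can be sketched as follows, assuming the DOM data model. The class name XPathDemo, its helper methods, and the orders sample data are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XPathDemo {
    static final String ORDERS =
            "<orders><order id=\"1\"><price>10</price></order>"
            + "<order id=\"2\"><price>5.5</price></order></orders>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Compile once; the result can be applied to any number of context nodes.
    static XPathExpression compile(String expression) {
        try {
            return XPathFactory.newInstance().newXPath().compile(expression);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    static Object eval(XPathExpression expression, Object context, QName returnType) {
        try {
            return expression.evaluate(context, returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(ORDERS);

        // One call per XPath 1.0 type; the cast follows the DOM mapping.
        NodeList orders = (NodeList) eval(compile("/orders/order"), doc, XPathConstants.NODESET);
        Double total = (Double) eval(compile("sum(/orders/order/price)"), doc, XPathConstants.NUMBER);
        String firstId = (String) eval(compile("/orders/order[1]/@id"), doc, XPathConstants.STRING);
        Boolean anyCheap = (Boolean) eval(compile("/orders/order/price < 6"), doc, XPathConstants.BOOLEAN);
        System.out.println(orders.getLength() + " " + total + " " + firstId + " " + anyCheap);

        // The same compiled expression reused on a second document.
        XPathExpression count = compile("count(/orders/order)");
        Document other = parse("<orders><order id=\"9\"/></orders>");
        System.out.println(eval(count, doc, XPathConstants.NUMBER));
        System.out.println(eval(count, other, XPathConstants.NUMBER));
    }
}
```

Casting to the wrong Java type for the requested QName fails at runtime with a ClassCastException, so it helps to keep the QName and the cast next to each other as in the calls above.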
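Xalan's org.apache.xpath.XPathAPI class has also been ported into JRE 1.5, repackaged as com.sun.org.apache.xpath.internal.XPathAPI, and is even simpler to use. Note that this is an internal JDK class rather than part of the public JAXP API, so code that depends on it may not work on other JREs or on later Java versions.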

