Shredding XML into HIERARCHYID – A New Look

In a previous blog post, I had discussed a method of shredding XML to a table with HIERARCHYID, and realized that it had a dependency that I was not too keen about: The XML data required an “id” attribute in order to create the hierarchy. I had sorted out a way to inject a unique attribute ID into all the nodes (I’ll discuss this in a follow up post), but having to modify the original XML didn’t have much appeal. But, upon reading another post by my fellow blogger, Adam Machanic, I realized it could be done without this requirement. Using the technique that Adam presented, I can generate unique paths to be parsed into a HIERARCHYID column.

SET @x = '<a someAttribute="1"><b><c>abc</c><c anotherAttribute="2">def</c></b><b><c>abc</c><c>def</c></b></a>'

DECLARE @T TABLE (NodeName VARCHAR(255), Attributes XML, NodeText VARCHAR(MAX), HierarchyNode HIERARCHYID)
;
WITH N (Node, NodeName, Attributes, NodeText, HierarchyPath
AS
SELECT
      
CAST(Expr.query('.') AS XML) -- Node
      
, CAST(Expr.value('local-name(.)', 'varchar(255)') AS VARCHAR(255)) -- NodeName
      
, CASE WHEN Expr.value('count(./@*)', 'INT') >
              
THEN Expr.query('<a>{for $a in ./@* return $a}</a>'
              
ELSE NULL END -- Attributes
      
, CAST(Expr.value('./text()[1]', 'varchar(max)') AS VARCHAR(MAX)) -- NodeText
      
, CAST('/' AS VARCHAR(1000)) -- HierarchyPath
  
FROM @x.nodes('/*[1]') AS Res(Expr
       
  
UNION ALL
   
  
SELECT  
      
Expr.query('.') -- Node
      
, CAST(Expr.value('local-name(.)', 'varchar(255)') AS VARCHAR(255)) -- NodeName
      
, CASE WHEN Expr.value('count(./@*)', 'INT') >
              
THEN Expr.query('<a>{for $a in ./@* return $a}</a>'
              
ELSE NULL END -- Attributes
      
, CAST(Expr.value('./text()[1]', 'varchar(max)') AS VARCHAR(MAX)) -- NodeText
      
, CAST(N.HierarchyPath 
             + CAST(DENSE_RANK() OVER (ORDER BY Expr) AS VARCHAR(1000)) 
             +
'/' AS VARCHAR(1000)) -- HierarchyPath
  
FROM N CROSS APPLY Node.nodes('*/*') AS Res(Expr)
)
INSERT INTO @T (NodeName, Attributes, NodeText, HierarchyNode)
SELECT NodeName, Attributes, NodeText, CAST(HierarchyPath AS HIERARCHYID)
FROM N
ORDER BY CAST(HierarchyPath AS HIERARCHYID)

SELECT * FROM @T

For this example, I simple grab the node name, the node text, and the attributes (when they exist) as a simple XML value of the format:

<a [attribute1=”attribute value” [attribute2=”attribute value”]…] />

Of course, these values could also be shredded into the hierarchy. One way of doing this would be to add an additional column to the results that represents the type of entry in the hierarchy (node versus attribute). My challenge to you is to create that solution.

Have fun!


Cross-posted from SQLBlog! – http://www.sqlblog.com

Advertisement
Explore posts in the same categories: Blogroll

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.