In a recent project I needed to import data from MS Word into XML so that it could easily be loaded into a database. Only the tables inside the Word document actually needed to be read. After a few hours of searching, googling and reading articles I found the basics of MS Word programming in .NET. The simplest algorithm for reading tables from a Word document seemed to be this: use the document's Tables collection (doc.Tables), which gives access to every table and is exactly what I needed, then read each table row by row, skipping unnecessary data such as header rows.
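Everything went fine until my program started throwing exceptions saying that it could not read rows containing merged cells. The table I needed to read looked something like this, with the Name and Some description cells merged vertically across all five rows: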
| Name | Some description | 1. Property |
|      |                  | 2. Property |
|      |                  | 3. Property |
|      |                  | 4. Property |
|      |                  | 5. Property |
Reading the merged cells is critical for me. The only solution I could think of was to walk the table column by column instead, but that forces me to pass over each table several times: first I read all the single (unmerged) cells into a DataTable, then I read the merged cells. Each table Cell object has a RowIndex property, which is the index of the row where the cell starts. I read all these indexes into an array, so I know where each cell starts and where it ends: a merged cell ends on the row just before the next cell starts. Using this information I could generate the following DataTable:
| Name | Some description | 1. Property |
| Name | Some description | 2. Property |
| Name | Some description | 3. Property |
| Name | Some description | 4. Property |
| Name | Some description | 5. Property |
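The core of this approach, turning the start row of each merged cell into the range of rows it covers, does not depend on Word at all. Below is a minimal sketch in plain C#, assuming the start rows (the RowIndex values of the column's cells, header excluded) and the cells' texts have already been collected into two lists. ExpandMergedColumn is a hypothetical helper name, and row numbers are 1-based like Word's.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: expands a vertically merged column into one value per data row.
// startRows[k] is the 1-based row where the k-th cell begins (ascending order),
// cellTexts[k] is that cell's text, firstDataRow is the first row after the header,
// lastRow is the last row of the table (tbl.Rows.Count in the Word code).
static string[] ExpandMergedColumn(IList<int> startRows, IList<string> cellTexts,
                                   int firstDataRow, int lastRow)
{
    var result = new string[lastRow - firstDataRow + 1];
    for (int k = 0; k < startRows.Count; k++)
    {
        // A cell ends just before the next cell starts; the last cell runs to lastRow.
        int end = k + 1 < startRows.Count ? startRows[k + 1] - 1 : lastRow;
        for (int row = startRows[k]; row <= end; row++)
        {
            result[row - firstDataRow] = cellTexts[k];
        }
    }
    return result;
}

// Hypothetical example: header in row 1, one merged cell starting at row 2
// and another at row 4, in a table with 6 rows.
string[] expanded = ExpandMergedColumn(new[] { 2, 4 }, new[] { "First", "Second" }, 2, 6);
Console.WriteLine(string.Join(", ", expanded));
// prints: First, First, Second, Second, Second
```

Treating the last cell specially (it runs to lastRow inclusive) instead of storing an end marker in the same array as the start rows makes it harder to drop the final row by an off-by-one mistake.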
I use the following code:
```csharp
foreach (Table tbl in doc.Tables)
{
    // Only three-column tables (number, description, step) are processed
    if (tbl.Columns.Count == 3)
    {
        DataTable dt = new DataTable();
        // Procedure that adds the Number, Descript and Step columns to the DataTable
        InitTable(ref dt);

        // Pass 1: column 3 has no merged cells, so read it row by row.
        // Word indexes are 1-based; row 1 is the header and is skipped.
        Console.WriteLine("Inserting steps to scenario " + nScenario);
        for (Int32 row = 2; row <= tbl.Rows.Count; row++)
        {
            DataRow dr = dt.NewRow();
            dr["Step"] = ReplaceWordTags(tbl.Cell(row, 3).Range.Text);
            dt.Rows.Add(dr);
        }

        // Pass 2: remember the row where each (possibly merged) cell of column 1 begins
        Console.WriteLine("Reading indexes");
        Int32 cellCount = tbl.Columns[1].Cells.Count;
        Int32[] BeginsOfCells = new Int32[cellCount];
        for (Int32 i = 2; i <= cellCount; i++)
        {
            Cell TmpCell = tbl.Columns[1].Cells[i];
            BeginsOfCells[i - 2] = TmpCell.RowIndex;
        }
        // End marker: one past the last row, so the last merged cell also covers the final row
        BeginsOfCells[cellCount - 1] = tbl.Rows.Count + 1;

        // Pass 3: copy each merged number/description into every row it spans
        Console.WriteLine("Inserting No. and Desc to scenario " + nScenario);
        for (Int32 i = 2; i <= cellCount; i++)
        {
            Cell TmpCell = tbl.Columns[1].Cells[i];
            // Assumes column 2 is merged exactly like column 1
            Cell TmpCellDesc = tbl.Columns[2].Cells[i];
            // Reading Range.Text directly avoids moving the user's selection
            string CellDesc = ReplaceWordTags(TmpCellDesc.Range.Text);
            string CellValue = ReplaceWordTags(TmpCell.Range.Text);
            for (Int32 f = BeginsOfCells[i - 2]; f < BeginsOfCells[i - 1]; f++)
            {
                // Word row f corresponds to DataTable row f - 2 (header skipped)
                dt.Rows[f - 2]["Number"] = CellValue;
                dt.Rows[f - 2]["Descript"] = CellDesc;
            }
        }

        nScenario = nScenario + 1;
        dt.TableName = "Scenario" + nScenario;
        ds.Tables.Add(dt);
    }
}
```
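The loop relies on two helpers that are not shown, InitTable and ReplaceWordTags. The versions below are hypothetical sketches, not the originals: InitTable simply adds the Number, Descript and Step columns the loop writes to, and ReplaceWordTags assumes the main clean-up needed is removing the end-of-cell marker that Word appends to a cell's Range.Text (a carriage return followed by the bell character, char 7).

```csharp
using System;
using System.Data;

// Hypothetical InitTable: adds the three columns used by the reading loop.
// 'ref' only mirrors the call site; DataTable is a reference type, so the
// columns would be added to the caller's table even without it.
static void InitTable(ref DataTable dt)
{
    dt.Columns.Add("Number", typeof(string));
    dt.Columns.Add("Descript", typeof(string));
    dt.Columns.Add("Step", typeof(string));
}

// Hypothetical ReplaceWordTags: strips Word's end-of-cell marker ("\r\a"),
// any stray bell characters, and surrounding whitespace.
static string ReplaceWordTags(string text)
{
    return text.Replace("\r\a", "").Replace("\a", "").Trim();
}

// Usage: build the table and add one step the way the reading loop does.
var table = new DataTable();
InitTable(ref table);
DataRow row = table.NewRow();
row["Step"] = ReplaceWordTags("1. Property\r\a");
table.Rows.Add(row);

string step = (string)table.Rows[0]["Step"];
Console.WriteLine($"Step = [{step}], columns = {table.Columns.Count}");
// prints: Step = [1. Property], columns = 3
```

Without this kind of clean-up, the marker characters would end up in every value copied into the DataTable and, from there, in the exported XML.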
Relevant links:
Understanding the Word Object Model from a .NET Developer's Perspective