I ran into a performance issue in a .Net remoting scenario. A WinForms app calls an application server for data, and passing a relatively large DataSet (more than 10,000 rows, fewer than 6 columns) over the wire was slow. The database and application servers processed the request quickly, and examining the transfer with Wireshark showed that the transfer itself wasn't bad either. There was a flurry of data, followed by a long wait on the client side, with client CPU usage around 50% for the entire wait.
It turns out there is a calculated column in one of the data tables. The column is not calculated on the application server, so that the extra data doesn't have to cross the wire; the calculation happens on the client instead. That was the source of the slowdown and the CPU usage.
In the end we solved the problem by dropping the calculated column and finding a different way to meet the business requirement. I suppose you could also perform the calculation in the SQL statement that ultimately fills the DataSet. That might take longer to transfer, but it wouldn't slow down the client app.
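To make the two options concrete, here is a minimal sketch. It assumes the calculated column was a DataColumn with an Expression, which is my guess at the setup and not something I verified against the original code. The table, the column names and the SQL in the comment are made up for illustration.

```csharp
using System;
using System.Data;

public static class CalculatedColumnSketch
{
    // Option 1: an expression column. The DataSet engine evaluates
    // "Price * Quantity" on whichever machine holds the table, which
    // in the remoting scenario is the client.
    public static DataTable WithExpressionColumn()
    {
        DataTable table = new DataTable("Orders");
        table.Columns.Add("Price", typeof(decimal));
        table.Columns.Add("Quantity", typeof(int));
        table.Columns.Add("Total", typeof(decimal), "Price * Quantity");
        return table;
    }

    // Option 2: a plain column. The values arrive precomputed, for example
    // from a (hypothetical) query such as
    //   SELECT Price, Quantity, Price * Quantity AS Total FROM Orders
    // so the client does no per-row evaluation.
    public static DataTable WithPlainColumn()
    {
        DataTable table = new DataTable("Orders");
        table.Columns.Add("Price", typeof(decimal));
        table.Columns.Add("Quantity", typeof(int));
        table.Columns.Add("Total", typeof(decimal));
        return table;
    }
}
```

With the first table, adding a row with a price and a quantity fills in Total automatically. With the second, Total is just another value that travels over the wire.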
There is an article in Dr. Dobb's that describes a survey performed by the author. The survey asked project managers, IT managers, IT staff and business stakeholders what defines the success of a project.
There were definitely some interesting findings:
- The IT industry has a long way to go to achieve a 100% success rate for projects
- Agile projects were more successful than traditional waterfall projects
- Off-shored projects were most likely to fail
- Respondents rated quality as the most important issue, and rated the importance of the following items this way when analyzed as a group:
Quality > Scope > Staff Health > Time of Completion > Money
- Project managers differed significantly from all other groups in their perception of success, rating time and money over quality.
- Business stakeholders placed a higher value on ROI and on shipping when ready than the rest of the groups
- A majority of respondents in all groups worked on projects they knew would fail from the start, but canceling a troubled project was not viewed as a successful outcome
I am not surprised that project managers have a different view, given that they are trained to value on-time and on-budget projects. Someone has to keep an eye on the bottom line, but I have experienced this gap firsthand when a project runs into trouble. I have also been thinking about the differences between Agile and Waterfall-style projects. I think there is some middle ground between the two, yet to be identified, that gives us both the benefits of heavier requirements and design work and the responsiveness to change.
I recently gave a talk about Open XML, and found that there were not many complete code samples out there which described how to build Office 2007 documents using .Net and SpreadsheetML. Most of the examples I ran into were snippets or functions, or just examples of the SpreadsheetML. As one of my demos, I created a C# class which builds a basic spreadsheet. This post describes that class.
There are prerequisite installs required to run this code:
- .Net 3.0 Framework (System.IO.Packaging is part of WPF)
- SDK for Open XML Formats, which is currently a CTP, so the code in this post may need to change if the object model changes in the final release.
- Code Snippets that are available for Open XML.
The class (called Spreadsheet) does two basic things:
- Create a spreadsheet package
- Insert data into a worksheet in the newly created package
The first step is creating the package, which consists of XML files containing the SpreadsheetML and XML files which manage the relationships between those files. In an Open XML spreadsheet, the minimal package requires three kinds of files:
- A workbook file
- A worksheet file
- A relationship file
Additionally, SpreadsheetML uses a concept called "Shared Strings". SpreadsheetML stores Shared Strings separately from the worksheet in their own document, so the package stores less data when the same string is used more than once. Strings can also be added to the spreadsheet "in-line", without using Shared Strings storage. For this example, labels are stored as Shared Strings to demonstrate the concept, so the spreadsheet package also requires a Shared Strings document.
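As a rough illustration of the idea (not the code used later in this post), a shared string table is just a de-duplicated list: each distinct string is stored once, and a cell refers to it by its zero-based position, e.g. <c r="A1" t="s"><v>0</v></c>. The class name and its members below are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class SharedStringTableSketch
{
    private const string MainNs =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    private readonly List<string> strings = new List<string>();
    private readonly Dictionary<string, int> index = new Dictionary<string, int>();

    public int Count { get { return strings.Count; } }

    // Returns the zero-based index of the string, storing it only
    // the first time it is seen.
    public int GetOrAdd(string text)
    {
        int position;
        if (!index.TryGetValue(text, out position))
        {
            position = strings.Count;
            strings.Add(text);
            index.Add(text, position);
        }
        return position;
    }

    // Writes the table as <sst><si><t>...</t></si>...</sst>.
    public string ToXml()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        StringWriter output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("sst", MainNs);
            foreach (string text in strings)
            {
                writer.WriteStartElement("si", MainNs);
                writer.WriteElementString("t", MainNs, text);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

Calling GetOrAdd("Red") twice returns the same index both times, so a label repeated on a thousand rows costs one entry in the table plus a small integer per cell.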
The SDK for Open XML Formats provides a new component, Microsoft.Office.DocumentFormat.OpenXml.dll, that wraps some of the functionality of creating an Open XML document with System.IO.Packaging. Essentially it manages creating the files and the relationships between the files in the package. Once you have created the files and relationships, you still need to create code to insert actual data into the documents. This example uses two steps:
- Create the basic XML document using a template of existing XML
- Insert data into the existing XML.
The following are the contents of three small XML files created and added to a Templates directory in the solution. These three files are the basis for the required parts of the package:
The workbook template
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<sheets>
<sheet name="{1}" sheetId="1" r:id="{0}" />
</sheets>
</workbook>
Notice that the XML contains .Net placeholders. Later on we can replace these with actual values that can vary at run time.
The worksheet template
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" >
<sheetData/>
</worksheet>
The shared strings template
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
</sst>
These XML templates make up the basic content of the package. The C# class contains a CreateSpreadsheet procedure which creates the basic pieces of the package. The main thing to notice is that by creating the part object (workbook, shared strings, worksheet), you are only creating the part file, not the content of that part file. The templates above become the content for the parts. There is no need to manage the relationship files directly, because the API does that automatically.
public void CreateSpreadsheet(string path, string firstSheetName)
{
    using (SpreadsheetDocument doc = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
    {
        //Add the workbook
        WorkbookPart workbook = doc.AddWorkbookPart();

        //Create the shared strings part
        SharedStringTablePart stringTable = workbook.AddNewPart<SharedStringTablePart>();
        this.AddPartXml(stringTable, this.ReadXML(@"Templates\SharedStringTemplate.xml"));

        //Create a worksheet
        WorksheetPart sheet = workbook.AddNewPart<WorksheetPart>();

        //Get the relationship id so the workbook and worksheet can be related
        string sheetId = workbook.GetIdOfPart(sheet);
        this.AddPartXml(workbook, this.WorkbookXml(sheetId, firstSheetName));
        this.AddPartXml(sheet, this.ReadXML(@"Templates\WorkSheetTemplate.xml"));

        doc.Close();
    }
}
The only interesting part is retrieving the ID of the worksheet part when building the workbook part. To create the content of each part the procedure opens an XML file and streams the content into the file. There are helper functions for this, which are really just standard ways of handling XML in .Net:
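//These helpers need: using System; using System.IO; using System.Text;

//Writes the XML string into the part as UTF-8 bytes
protected void AddPartXml(OpenXmlPart part, string xml)
{
    using (Stream stream = part.GetStream())
    {
        byte[] buffer = (new UTF8Encoding()).GetBytes(xml);
        stream.Write(buffer, 0, buffer.Length);
    }
}

//Reads a template file relative to the current directory
protected string ReadXML(string fileName)
{
    //The using block closes the file even if ReadToEnd throws
    using (StreamReader reader = new StreamReader(Path.Combine(Environment.CurrentDirectory, fileName)))
    {
        return reader.ReadToEnd();
    }
}

//Fills the workbook template placeholders: {0} = relationship id, {1} = sheet name
protected string WorkbookXml(string sheetId, string sheetName)
{
    string contents = this.ReadXML(@"Templates\WorkbookTemplate.xml");
    return string.Format(contents, sheetId, sheetName);
}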
Notice the WorkbookXml procedure has a call to string.Format to replace some placeholders with actual data: the ID of the worksheet part relationship and the name of the worksheet. The name of the worksheet is important later, when we want to add data to the worksheet.
The second step is to actually add data to the worksheet. The class uses two functions available as Code Snippets (XLInsertStringIntoCell, and XLInsertNumberIntoCell). I won't reproduce the code here as I don't own it, but essentially the functions open the proper parts and insert the data. These functions take in the file, the sheet name, cell reference and cell value as parameters.
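To give a feel for what inserting a number involves, here is a minimal sketch of my own. It is not the Code Snippets implementation: it works on the worksheet XML as a string instead of opening the package, and it assumes cells are added in row and column order. The class and method names are hypothetical.

```csharp
using System.Globalization;
using System.Xml;

public static class CellInsertSketch
{
    private const string MainNs =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Appends a numeric cell such as <c r="B2"><v>30</v></c> to the given
    // row of the worksheet XML and returns the updated XML.
    public static string InsertNumber(string worksheetXml, int rowNumber,
                                      string cellReference, double value)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(worksheetXml);

        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("d", MainNs);

        XmlNode sheetData = doc.SelectSingleNode("//d:sheetData", ns);

        // Reuse the row element if it already exists, otherwise create it.
        string rowId = rowNumber.ToString(CultureInfo.InvariantCulture);
        XmlNode row = sheetData.SelectSingleNode("d:row[@r='" + rowId + "']", ns);
        if (row == null)
        {
            XmlElement newRow = doc.CreateElement("row", MainNs);
            newRow.SetAttribute("r", rowId);
            row = sheetData.AppendChild(newRow);
        }

        // A numeric cell has no t attribute; the value goes in <v>.
        XmlElement cell = doc.CreateElement("c", MainNs);
        cell.SetAttribute("r", cellReference);
        XmlElement cellValue = doc.CreateElement("v", MainNs);
        cellValue.InnerText = value.ToString(CultureInfo.InvariantCulture);
        cell.AppendChild(cellValue);
        row.AppendChild(cell);

        return doc.OuterXml;
    }
}
```

A string cell follows the same pattern, except the cell gets t="s" and the value is the index of the string in the Shared Strings part.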
Lastly, I wrote a console app to exercise the Spreadsheet class:
class Program
{
    protected static readonly string fileName = "example.xlsx";
    protected static readonly string firstSheetName = "Sheet1";

    static void Main(string[] args)
    {
        string path = Environment.CurrentDirectory + @"\" + fileName;
        Spreadsheet file = new Spreadsheet();
        file.CreateSpreadsheet(path, firstSheetName);

        file.XLInsertStringIntoCell(fileName, firstSheetName, "A1", "Category");
        file.XLInsertStringIntoCell(fileName, firstSheetName, "B1", "Value");
        file.XLInsertStringIntoCell(fileName, firstSheetName, "A2", "Red");
        file.XLInsertNumberIntoCell(fileName, firstSheetName, "B2", 30);
        file.XLInsertStringIntoCell(fileName, firstSheetName, "A3", "Blue");
        file.XLInsertNumberIntoCell(fileName, firstSheetName, "B3", 60);
        file.XLInsertStringIntoCell(fileName, firstSheetName, "A4", "Green");
        file.XLInsertNumberIntoCell(fileName, firstSheetName, "B4", 10);

        Console.WriteLine("Workbook created at " + path);
        Console.ReadKey();
    }
}
Before the comments start to fly, I want to point out a couple things:
- This bit of code is not that efficient. I realize it opens and closes the package a bunch of times. This is really just to demonstrate what is possible, not what is necessarily the best practice. There are very few code samples available, and I am shooting for simplicity here.
- I know ExcelPackage is on CodePlex, does a better job of wrapping the APIs involved and is much easier to write code with. Once you have a basic understanding of these APIs you will appreciate the work being done on that project.
Download the VS 2005 project. Don't forget to install all the prerequisites listed above before trying the project. I didn't include the two functions from the Code Snippets in the project (since I didn't write that code), so you will have to put those in yourself.
Jeff Atwood shows us why we should consider better password policies when developing applications or setting company policy.
As we know, the biggest threat to security is not hackers, but the users themselves, who make it easy for someone to gain access to protected resources by choosing ridiculously easy-to-guess passwords. As developers we are just as much at fault for building applications that allow this behavior. Jeff recommends using pass phrases instead of passwords. A phrase is longer (and thus more resistant to brute force) and easier to remember than a mixed-up jumble of nonsensical characters. Adding an unusual word or character makes a pass phrase very difficult to break with dictionary attacks as well. Pass phrases are controversial too, see:
The Great Debates: Pass Phrases vs. Passwords. Part 1 of 3
The Great Debates: Pass Phrases vs. Passwords. Part 2 of 3
The Great Debates: Pass Phrases vs. Passwords. Part 3 of 3
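To put a rough number on the "longer is more resistant to brute force" point: the work needed to try every string of a given length grows as length * log2(alphabet size) bits. The sketch below is my own back-of-the-envelope arithmetic, not something from the linked articles, and it only covers blind brute force. It assumes every character is chosen independently and at random, which real phrases are not, so it says nothing about dictionary attacks.

```csharp
using System;

public static class KeyspaceSketch
{
    // Bits needed to enumerate every string of the given length over an
    // alphabet of the given size: length * log2(alphabetSize).
    public static double BruteForceBits(int alphabetSize, int length)
    {
        return length * Math.Log(alphabetSize, 2);
    }
}
```

For example, BruteForceBits(94, 8) (an 8-character password drawn from the 94 printable ASCII characters) comes to roughly 52 bits, while BruteForceBits(27, 25) (a 25-character phrase using only lowercase letters and spaces) comes to roughly 119 bits.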
Personally, I think the hard part is convincing users and business owners of an application that longer or more complicated is better. From my own experience I understand users want the simplest password policy possible. Often the business owners of an app don't feel the information being protected is important enough to justify such an imposition on the users, or feel that it becomes a support expense because users can't manage their own data or password very well (a great argument for using something like Windows CardSpace). I think they forget that users re-use the same password everywhere possible: a free e-mail account, network access at work, bank web sites, a blog, a MySpace account, etc. I would not want to be responsible for a malicious person gaining a password from my system and then using that password to systematically destroy someone else's life. Be strong, insist on good password policy.
I have a web project (the original 2005 web project type, not a web application project) and had a problem getting files copied to the bin directory. Essentially, one of the library projects referenced by the web project has an XML file in its project output, but when the solution is built, the XML file in the bin directory of the library project is not copied into the bin directory of the web project. Of course, a post-build event seemed like the thing to do, but web projects don't support build events.
A little digging and I found this post by Scott Guthrie that describes a "Build Helper Project". You simply add an empty class library project to your solution. You then use the build events of the empty project to do the work you would have put in build events on the web project. You just make sure the project build order is correct so the events get called when you need them.