Pages

Friday, December 12, 2014

SQL query on spreadsheet

Introduction

Sometimes I have some Excel spreadsheets and I need to some kind of query on them. Like extracting all unique values from a column, doing a join between two tabs or counting how many occurrences I have for some value.
Most of the time you can combine Excel functions to obtain what you need. There are Excel gurus that can solve almost anything with them.
But as a developer sometimes I'm in a hurry and would prefer to just be able to execute a SQL query on my sheets, avoiding to learn a new function to obtain what I need.
So I created a small Windows tool called ExcelSQL that allows you to load CSV or XLS files and then perform arbitrary SQL queries on them.
You can download it from here: DOWNLOAD

Features

  • Windows app to load CSV and XLS files (no XLSX for the moment) and perform any SQL query on those files.
  • You can do joins, use aggregation functions, sub-queries, and everything that is supported by ODBC Text driver
  • Save results to a file
  • Use results as a new table to join in another query

Examples

Let's suppose we have the following spreadsheet called "products" with a bunch of items and some attributes:

If for example we want to know how many different types of products we have we can open my app and load this file:



The app will show the file in the right panel and will open a new SQL sheet where we can write an arbitrary query. Instead of tables we have files and each file must be specified with a $ at the beginning. In this case: $products.
The app shows below all the rows and columns that satisfy the query we have written. The first row is always considered to be the header and the column names are extracted from it.



Then we can obtain the different types of products executing the following query:


We can also see how many products we have of each type:



More features

Imagine now that we have another spreadsheet called "references" with the description of each type.



We want to add those descriptions to our result. In that case we can also load the other file and execute a query with a JOIN statement:




We can then save this report by clicking on "Save results" or we can also create a temporary table to use these values in another query. To do that we click on "Create table" and a new file will be created containing the current result. Then those results can be used in another query.
For example we may want to list all the products that belongs to a category that has more than 25 products. In that case we can use the previous reports instead of re-computing it again.



Implementation

The application was developed in C# for Windows using Visual Studio 2010 and .NET framework 4.0.

File import

It supports two types of files: CSV and XLS. CSV are copied from their original location to a temporary folder. All files are stored in the same temporary folder.
XLS files are loaded using ODBC with Microsoft.Jet.OleDb driver. The first sheet is identified and then it's queried with a SELECT * sentence. Then its content is saved as CSV file in the same previous temporary  folder.

//Read file and convert to CSV
OleDbConnection conn = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0;Data Source=" + srcFilePath + ";Extended Properties=Excel 8.0");
conn.Open();

//Obtain first sheet
string firstSheet = null;
DataTable schemaDt = conn.GetSchema("Tables");
for (int i = 0; i < schemaDt.Rows.Count - 1; i++)
{
 string tableType = schemaDt.Rows[i].ItemArray[schemaDt.Columns.IndexOf("TABLE_TYPE")].ToString();
 string sheetName = schemaDt.Rows[i].ItemArray[schemaDt.Columns.IndexOf("TABLE_NAME")].ToString();
 if (tableType == "TABLE" && sheetName.Contains("$"))
 {
  firstSheet = sheetName;
  break;
 }
}
if (firstSheet == null)
{
 mainForm.showError("Import file error", "Could not find first sheet in excel file: " + srcFilePath);
 return false;
}

//Query sheet
OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [" + firstSheet + "]", conn);
OleDbCommandBuilder commandBuilder = new OleDbCommandBuilder(adapter);
DataTable dataTable = new DataTable();
adapter.Fill(dataTable);

//Create CSV file
using (var writer = new StreamWriter(dstFilePath))
{
 //columns
 for (int i = 0; i < dataTable.Columns.Count; i++)
 {
  string colName = dataTable.Columns[i].ColumnName;
  if (i != dataTable.Columns.Count - 1)
  {
   writer.Write(colName + ",");
  }
  else
  {
   writer.WriteLine(colName);
  }
 }
 writer.Flush();

 //rows
 for (int i = 0; i < dataTable.Rows.Count; i++)
 {
  object[] cells = dataTable.Rows[i].ItemArray;
  for (int j = 0; j < cells.Length; j++)
  {
   object value = cells[j];
   string cellValue = value != null ? value.ToString() : "";
   if (j != cells.Length - 1)
   {
    writer.Write(cellValue + ",");
   }
   else
   {
    writer.WriteLine(cellValue);
   }
  }
  writer.Flush();
 }
}

XLSX files cannot be opened with Microsoft.Jet.OleDb driver so they are not supported for now. You have to save them to a CSV or XLS format first.


Queries

CSV files are queried using ODBC with Microsoft Text Driver. All files must be in the same folder in order to perform joins.
We first adapt query to remove the "$" character replace each table with [table_filename]. Then we execute the query and store its result in a DataTable.

OdbcConnection connection = null;
try
{
 //Adapt table names
 queryText = adaptQueryToExecute(queryText, this.mainForm.TableFiles);

 //Open ODBC connection to temp folder
 string connectionStr = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + mainForm.TempTablesPath + ";Extensions=csv,txt";
 connection = new OdbcConnection(connectionStr);
 connection.Open();

 //Execute query
 OdbcDataAdapter odbcAdapter = new OdbcDataAdapter(queryText, connection);

 //Fill dataGrid
 DataTable dataTable = new DataTable();
 odbcAdapter.Fill(dataTable);
 dataGridViewResults.DataSource = dataTable;
 dataGridViewResults.Refresh();
 mainForm.log(dataGridViewResults.Rows.Count + " rows loaded");

 //Close connection
 connection.Close();
}
catch (Exception e)
{
 string error = "Error executing query:\n" + e.Message;
 if (e.InnerException != null)
 {
  error += Environment.NewLine + e.InnerException.Message;
 }
 mainForm.logError(error);
}
finally
{
 if (connection != null)
 {
  connection.Close();
 }
}


Conclusions

If you have to analyse some spreadsheets and don't want to deal with Excel functions then these tool maybe useful for you. Specially if you a developer with strong skills in SQL and almost no knowledge of Excel.
You can download it from here: DOWNLOAD
Some improvements that could be added:

  • ODBC does not support many advanced SQL features and its performance is not very good for large files. It maybe good to replace it with some embedded database, like SQLite or Firebird.
  • Support for XLSX files should also be added.
  • If a XLS files contains many tabs we should be able to load all or some of them instead of the first one.



















No comments:

Post a Comment