LINQ2SQL interesting facts
Something I discovered recently about using LINQ to query SQL databases that's worth taking note of.
Here was the scenario:
Ran a query on a large table (150,000 rows, 5 columns) and stored the result in the application cache.
The application then, when needed, ran a LINQ query on the primary key against that cached LINQ collection.
Result: LINQ re-ran the original query instead of going after the data already in the collection. Additionally, it was unable to close (close, not discard) the DB connection associated with that collection.
So I tried a few other things.
1. Ran the same LINQ query, then iterated (foreach) every row in that collection.
2. Iterated the collection a second time and verified that it re-ran the original SQL query and had to return all that data a second time.
LINQ2SQL is a form of late-bound (deferred) query: each access of an existing collection in code (a foreach or additional LINQ statements) causes a new query to run and all the data to transfer again. That makes it impractical when you want to cache infrequently updated tables in memory for higher performance.
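The deferred-execution pitfall above can be sketched roughly like this (MyDataContext, Product, and the cache key are hypothetical names for illustration, not from the original test):

```csharp
using System;
using System.Linq;
using System.Web;

var db = new MyDataContext();

// This does NOT execute any SQL yet -- it only builds a query expression.
IQueryable<Product> products = from p in db.Products select p;

// Storing the IQueryable caches the *query definition*, not the rows.
HttpContext.Current.Cache["products"] = products;

// Every enumeration re-executes the SQL and transfers all rows again:
foreach (var p in products) { /* round trip #1 */ }
foreach (var p in products) { /* round trip #2, full transfer again */ }

// A primary-key lookup against the cached object is also a new SQL query,
// not a search of data already in memory:
var one = products.FirstOrDefault(p => p.ProductID == 42); // round trip #3
```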
Additionally, LINQ2SQL does not close the DB connection, so the built-in .NET connection pooling doesn't function as it's supposed to. Each user or web session gets a different connection object, and those stay open until the end of the web page. If you put the collection into the application cache, that connection stays open until the server is reset.
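One way to limit the open-connection problem is to materialize the rows and dispose the DataContext before caching, so the connection goes back to the pool. A minimal sketch, again assuming the hypothetical MyDataContext:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Web;

List<Product> rows;
using (var db = new MyDataContext())
{
    // ToList() runs the SQL once and copies the rows into memory,
    // so the data survives disposing the context.
    rows = db.Products.ToList();
} // connection is closed and returned to the pool here

// Cache the materialized data, not a live query tied to a connection.
HttpContext.Current.Cache["products"] = rows;
```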
For high-performance applications that need to precache some tables, LINQ is not a good solution; you are better off using ADO.NET (System.Data.SqlClient) and caching the raw DataTable object. (Unfortunately, LINQ doesn't use the DataTable object behind the scenes, so you can't get at that with reflection.)
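A minimal sketch of that ADO.NET approach (the connection string, table, and column names are illustrative):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Web;

var table = new DataTable();
using (var conn = new SqlConnection("...connection string..."))
using (var adapter = new SqlDataAdapter(
    "SELECT ProductID, Name, Price FROM Products", conn))
{
    // Fill() opens and closes the connection itself,
    // so nothing is left holding a pooled connection open.
    adapter.Fill(table);
}

// Set the primary key so DataTable.Rows.Find() can do fast keyed lookups.
table.PrimaryKey = new[] { table.Columns["ProductID"] };

HttpContext.Current.Cache["products"] = table;

// Later: an in-memory primary-key fetch, no database round trip.
DataRow row = table.Rows.Find(42);
```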
Performance metrics on the above table (150K rows): LINQ followed by 100 primary-key sub-queries on that collection; then a DataTable doing the same (after manually setting the primary key field on the in-memory DataTable); then loading each into a generic Dictionary collection and doing the same 100 fetches.
LINQ: 9.87 seconds
DataTable: 0.0987 seconds
LINQ loading into Dictionary: 0.0104 seconds
Reading Dictionary: 0.0012 seconds
DataTable loading into Dictionary: 0.0367 seconds
Reading Dictionary: 0.0012 seconds
After running the metrics: for truly highest-performance needs, when those arise, reloading a DataTable's or a LINQ collection's data into a Dictionary is the best method of caching data. (Unless you need the table-updated event callbacks to reload from an updated table, in which case a DataTable would be the next best option.)
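Loading either source into a Dictionary keyed on the primary key might look like this (MyDataContext, Product, and ProductID are again illustrative names; the DataTable path assumes the table loaded above):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// From a LINQ2SQL source: materialize straight into a Dictionary,
// then let the context (and its connection) go.
Dictionary<int, Product> byId;
using (var db = new MyDataContext())
{
    byId = db.Products.ToDictionary(p => p.ProductID);
}

// From a cached DataTable (requires System.Data.DataSetExtensions):
Dictionary<int, DataRow> byKey = table.AsEnumerable()
    .ToDictionary(r => r.Field<int>("ProductID"));

// The 100 keyed fetches are now plain hash lookups -- no SQL,
// no DataTable index search:
Product p42 = byId[42];
```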
However, in the end, remember that LINQ2SQL collections have connection-pooling issues and will re-run the original queries instead of working on the data already in memory each time those collections are accessed. So while LINQ may be fine for smaller applications that don't get much use, for larger or heavily used applications, staying with the ADO.NET classes is the better option.
I also found that with complex joined queries (4 or more tables, inner or left joined), ADO.NET also significantly outperforms LINQ2SQL.
Please note, I'm not saying LINQ itself is bad, just that the LINQ2SQL interface has some issues that make it a poor fit for larger or high-use applications.