There is some talk about being able to store Files (or more specifically) access FileStreams within Sql Server 2008. How is this being handled internally, and in the case of multi-millions of files (which the current Windows File System has a hard time handling), how will this be managed? Does it involve replication, Specific FileGroups, a SAN, CIFS shares, etc. Any help or information you can provide would be useful.
In the case of Oracle, there you have access to RAC or to something like GFS. Can you highlight what makes the Sql 2008 implementation different, able to scale better, etc.
Thanks,
Andre' Hazelwood
What you're refferring to is the new FILESTREAM data type. As I've understood it, the data is stored in files (or filestreams) in the file system, but can be accessed from the database as if they were blobs. Actually, that's pretty neat, and something which MAY cause me to change my opinion regarding SharePoint, if or when it will start using this new feature. I'm really looking forward to work with this one. I'm afraid that's all i know (or think I know) so far.|||
Hi, thank you for this question. Can you please be more specific on which scenarios you are referring to when you say "Windows File system has hard time handling in the case of multi millions of files"? I need you to first clearly identify what issue(s) you have in mind (examples/perf numbers can help), since not all are related to the way we organize files. If you could provide any data on the specific issue you will refer to in your reply this would be very helpful to answer your question better. Typically, NTFS can scale up to a very large number (greater than million) files per directory as long as some conditions are met (filenames are small, short file name generation is disabled, etc.).
thanks,
I see the original poster did not followup on that question. I am interested in the topic, so I can answer from our point of view:
a) I have seen NT have real problems with too many files and directories. I don't remember exact numbers but there was millions of files and directories (never more than 5 levels deep) with short names for a total of about 800gigs. Opening a folder could take 5-20 minutes. Access through FTP/HTTP was much better of course. Some pretty large files but 90% of them were smaller than 1 mb (all zipped files). And lots of directories.
b) we have scenarios were we could store up to 4-5 terabytes but I expect to answer "up to 10-12" in 4-5 years at most.
c) I started looking at the CTP but haven't looked at Filestreams yet (lack of time) so I could ask a number of relevant questions that already have answers but I should say that the first things I will look for is how to manage that data (partition, backup/restore, mirror, etc) with flexibility/safety, how to access it (OLEDB is slow for BLOBS and giving HTTP/FTP access directly in the db is a must) and the licensing impact (most scenarios are to segregate that kind of services to a distinct server. Are filestreams part of the database size? If it is not, I can see lots of small DAM starting to appear, using Express, and proliferate through the enterprise. I would use such small DAMs in the DMZ for example and keep the big thing inside the LAN. But if Filestreams are part of the 4 gigs limit, we will stay with our current practice of storing in the file system and keep only a reference in the database.
d) How can this work with a mirror? We are currently using DoubleTake for its ability to mirror both the db and the file system. Could we now rely just on db mirroring?
e) can we store large UDT in there? I can see scenarios were I would like to store an Invoice as a large UDT. One of its property could be to expose itself as a PDF/XPS format, but keeping extra metadata as part of the Invoice is not all. Being an object, the Invoice can have some specific behaviors based on the context/user or from what is requested of it. mmm. Is SQL2008 bridging the object-sql gap by bringing objects as rich Relational Domains? Sure smells like it:-) Wow, I am impressed and exited by the possibilities!
No comments:
Post a Comment