My thoughts as an enterprise Java developer.

Thursday, December 31, 2009

Official Google Blog: Browser Size: a tool to see how others view your website

Official Google Blog: Browser Size: a tool to see how others view your website: "Browser Size is based on a sample of data from visitors to google.com. Special code collects data on the height and width of the browser for a sample of users. For a given point in the browser, the tool will tell you what percentage of users can see it. For example, if an important button is in the 80% region it means that 20% of users have to scroll in order to see it. If you're a web designer, you can use Browser Size to redesign your page to minimize scrolling and make sure that the important parts of the page are always prominent to your audience."

Thursday, November 05, 2009

TLS (SSL) compromised!

extendedsubset.com: "There are three general attacks against HTTPS discussed here, each with slightly different characteristics, all of which yield the same result: the attacker is able to execute an HTTP transaction of his choice, authenticated by a legitimate user (the victim of the MITM attack). Some attacks result in the attacker-supplied request generating a response document which is then presented to the client without any certificate warning or other indication to the user. Other techniques allow the attacker to forward or re-purpose client certificate authentication credentials."

Monday, November 02, 2009

Open source as an antitrust strategy | The Open Road - CNET News

Open source as an antitrust strategy | The Open Road - CNET News: "IBM, Intel, Red Hat, and others aren't investing in Linux because they're all chums at the country club together, but rather because they're looking for ways to reduce Microsoft's hold on their own businesses through its control of personal computer and server operating systems.
As an added benefit, it's a great way for companies to collaborate without running afoul of antitrust laws. It's collusion without the collusion."

Wednesday, October 21, 2009

Google Public Policy Blog: Vint Cerf on the importance of keeping the Internet open

Google Public Policy Blog: Vint Cerf on the importance of keeping the Internet open: "Earlier this week, Vint Cerf, one of the original architects of the Internet and our Chief Internet Evangelist, joined other pioneers in a letter to the FCC expressing support for the Commission's consideration of safeguards that would preserve the open Internet."

Government regulation will mean that it isn't open! Once the government takes control it will only increase its control and the regulation will limit good experimentation.

Tuesday, October 13, 2009

[JavaSpecialists 177] - Logging Part 3 of 3

[JavaSpecialists 177] - Logging Part 3 of 3: "For example, here is a Layout for Log4J that can be used to collapse lines with '|>>' instead of a newline character. For readability, it is trivial to then replace those characters with \n again."

import org.apache.log4j.*;
import org.apache.log4j.spi.*;

import java.io.*;

public class MultipleLineLayout extends Layout {
  public String format(LoggingEvent event) {
    Object o = event.getMessage();
    if (o instanceof Throwable) {
      return format((Throwable) o);
    }
    return format(String.valueOf(event.getMessage()));
  }

  private String format(Throwable t) {
    StringWriter out = new StringWriter();
    PrintWriter pw = new PrintWriter(out);
    t.printStackTrace(pw);
    pw.close();
    return format(out.toString());
  }

  private String format(String s) {
    return s.
        replaceAll("\r", ""). // remove Windows carriage returns
        replaceAll("\n*$", ""). // remove trailing newline chars
        replaceAll("\n", " |>> ") // replace newline with |>>
        + "\n";
  }

  public boolean ignoresThrowable() {
    return false;
  }

  public void activateOptions() {
    // not necessary
  }
}
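The reverse step mentioned in the quote can be sketched without Log4J at all. This is only my own minimal illustration, not code from the newsletter: the class and method names (LineCollapser, collapse, expand) are made up, and it assumes the " |>> " marker never appears in a real log message.

```java
class LineCollapser {
    static final String MARKER = " |>> ";

    // Same transformation as the layout's format(String): one log event per line.
    static String collapse(String s) {
        return s.replace("\r", "")        // remove Windows carriage returns
                .replaceAll("\n*$", "")   // remove trailing newline chars
                .replace("\n", MARKER)    // replace inner newlines with the marker
                + "\n";
    }

    // Undo the collapse for reading; the single trailing newline is kept.
    static String expand(String line) {
        return line.replace(MARKER, "\n");
    }

    public static void main(String[] args) {
        String collapsed = collapse("first line\r\nsecond line\n\n");
        System.out.print(collapsed);
        System.out.print(expand(collapsed));
    }
}
```

The expand step is lossy in one way: trailing blank lines and carriage returns removed by collapse do not come back.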

Thursday, September 24, 2009

Microsoft bashes Google's Chrome-in-IE plan | Beyond Binary - CNET News

Microsoft bashes Google's Chrome-in-IE plan | Beyond Binary - CNET News: "'Given the security issues with plug-ins in general and Google Chrome in particular, Google Chrome Frame running as a plug-in has doubled the attack area for malware and malicious scripts. This is not a risk we would recommend our friends and families take.'
However, some took Microsoft to task for criticizing plug-ins, noting that Redmond itself has more than a few.
'Microsoft scared of security of plug-ins. Uninstall Silverlight now,' Mozilla's Dion Almaer wrote in a Twitter posting."

It's not like MS or IE has a stellar security record.

Tuesday, September 08, 2009

Distributed Relational Database Architecture

Distributed Relational Database Architecture

James Stauffer

August 1, 2009

Prepared for CS425 Summer 2009

 

 

Table of Contents

 

        I. Introduction

                A.  Focus

                B.  Implementations

        II.  Relevant Features

                A.  Availability

                B.  Failover

                C.  Replication

                D.  Partitioning

                E.  Single Point of Failure

                F.  Shared Storage

                G.  Inter-node Communication

        III.  Shared Everything: Oracle RAC (Real Application Cluster)

                 A. Maintenance availability

                 B.  Inter-node communication

                 C.  Permanent Storage

                 D.  Client interaction

                 E.  Further information 

        IV.  Shared Nothing: MySQL Cluster

                 A. Maintenance availability

                 B.  Inter-node communication

                 C.  Permanent Storage

                 D.  Client interaction

                 E.  Further information                

        V.  Sharding      

                 A. Maintenance availability

                 B.  Inter-node communication

                 C.  Permanent Storage

                 D.  Client interaction

                 E.  Further information                 

         VI.  Conclusion

 

 

 

  I. Introduction
    A. Focus

This paper will cover distributed relational database architecture, primarily as it affects availability to the client.  Features that allow the database to run more complex queries or handle larger data sets (scalability) will not be covered.  All discussion will focus on systems with multiple servers (called nodes) involved.

    B. Implementations

This paper will cover three basic types of implementations: Shared Everything, Shared Nothing, and Sharding.  Shared Everything implementations don't actually share everything but just share the data storage between nodes, while shared nothing systems have nodes that are completely independent (including storage) and self-sufficient.  Systems that present the nodes as one or as interchangeable are generally called clusters and include shared everything and shared nothing.  Sharding is similar to shared nothing in that nothing in the nodes is shared, and the nodes are independent.  However, sharding nodes are seen as completely separate and independent (but related) by the client.  Shared everything systems need to synchronize data access and can do it either through a shared storage device or with direct communication.  Oracle RAC will be the example reviewed for shared everything, and MySQL will be the example reviewed for shared nothing.  Sharding doesn’t have specific support in any major database, so it will be generically reviewed with no example.  For each type of architecture, the following will be addressed: maintenance availability, inter-node communication, permanent storage, and client interaction.

  II. Relevant Features

Some major aspects relevant to highly available distributed relational database architecture include: availability, failover, replication, partitioning, single point of failure, shared storage, and inter-node communication.

    A. Availability

Availability refers to the percentage of time that a service is responsive and working correctly (synonymous with uptime).  It is usually expressed as a percentage, with 99.999% (5 nines) considered a very high level of availability.  Downtime refers to the time that a service is not available (i.e. the opposite of availability).  Both planned and unplanned events can affect availability, therefore both need to be addressed.  All types of database systems try to minimize unplanned downtime to some extent, but vary in how much maintenance can be done without downtime.  Techniques to minimize unplanned downtime include: automatic transferring of connections from failed to working nodes, automatic restart of failed nodes/services, and optimal checkpointing to minimize restart time.  Techniques to minimize planned downtime include supporting the following maintenance operations while the service is running: applying patches and updates, reconfiguration, adding or removing nodes, and adding or removing databases.
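To put the "nines" in perspective, here is a small sketch of the arithmetic that turns an availability percentage into allowed downtime per year. The class and method names are made up for illustration, and a year is assumed to be 365.25 days.

```java
class AvailabilityMath {
    static final double MINUTES_PER_YEAR = 365.25 * 24 * 60; // 525,960 minutes

    // Downtime budget implied by an availability percentage such as 99.999.
    static double downtimeMinutesPerYear(double availabilityPercent) {
        return (1.0 - availabilityPercent / 100.0) * MINUTES_PER_YEAR;
    }

    public static void main(String[] args) {
        for (double a : new double[] {99.0, 99.9, 99.99, 99.999}) {
            System.out.printf("%.3f%% -> %.1f minutes/year%n", a, downtimeMinutesPerYear(a));
        }
    }
}
```

Five nines works out to a little over five minutes of total downtime per year, planned and unplanned combined, which is why maintenance without downtime matters so much.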

    B. Failover

Failover refers to what happens when one node fails, and how another node (or nodes) starts handling the service that had been provided by the failed node.  Failover may be mostly transparent to the client, or can require the client to connect to another node.  The client can have a front-end, such as a JDBC driver, that can handle some of the cluster activity (e.g. connecting to another node if the cluster doesn't handle that automatically).  Whether or not the client is automatically redirected to another node, any transactions open on the failed node will also fail.
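A rough sketch of client-side failover follows. It is not tied to any real driver: FailoverClient and its node list are hypothetical, and the "work" function stands in for a whole transaction so that it can be restarted from the beginning on another node, since anything open on the failed node is lost.

```java
import java.util.List;
import java.util.function.Function;

class FailoverClient {
    private final List<String> nodes; // hypothetical node addresses, in preference order

    FailoverClient(List<String> nodes) {
        this.nodes = nodes;
    }

    // Runs the whole unit of work against the first node that completes it.
    <T> T execute(Function<String, T> work) {
        RuntimeException last = null;
        for (String node : nodes) {
            try {
                return work.apply(node);
            } catch (RuntimeException e) {
                last = e; // node down or transaction failed: redo the work on the next node
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }
}
```

Note that the work must be safe to repeat; a real client would also need to tell "node is down" apart from errors that will fail on every node.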

    C. Replication

Replication refers to the copying of data to another system or node, especially so that when a node fails, no data is lost and the data is immediately available.  Asynchronous replication happens in the background, so the client receives the result of its request much quicker, but there is a risk that the node will fail before all data is replicated.  Synchronous replication happens before the result is sent back to the client, which removes that risk but makes each client request take longer.  There can also be multiple levels of replication: replication can be done synchronously to the memory of two nodes to provide fast response (and minimize the chance of data loss), while replication to permanent storage is done asynchronously.

    D. Partitioning

Partitioning refers to splitting the data into parts, and distributing each part to a separate node.  This allows each node to have less data to handle.  When a node has less data to handle, it can cache data in memory more effectively and have more efficient indexes, which can increase performance.  One problem, however, is that a request may need to contact many nodes to access all of the data it needs.  Therefore, partitioning may not be appropriate for all types of databases.

    E. Single point of failure

A single point of failure is a single component of the system that is used by the whole system, and if that component fails, the whole system will fail.  A system with a single point of failure is much more at risk for failure, therefore a single point of failure should be avoided.  In a highly available system, as many components as possible are duplicated to help avoid as many single points of failure as possible.  Even when nodes are duplicates, components on each node are often duplicated to reduce the chance of failure on an individual node (e.g. network interface, storage interface, power supply, etc).  The single point of failure assessment is done on many levels, from the storage system (drives, connectors, controllers, power) all the way up to datacenter power and network connection.

    F. Shared storage

Shared storage is permanent database storage that is shared between all or many nodes.  Like anything else shared, shared storage can be a risk as a single point of failure.  Shared storage can also be used as a communication channel.  Types of shared storage include NAS (Network Attached Storage), SAN (Storage Area Network), external SCSI disk, cloud storage (such as Amazon S3), etc.  (A database could in principle serve as shared storage for another database, but that isn't done in practice.)  The connection to the shared storage is generally duplicated (e.g. two network cards, SCSI controllers, etc).  The single point of failure risk can be addressed by duplicating the data into two identical shared storage systems through a process called mirroring.

    G. Inter-node communication

Inter-node communication refers to how the nodes communicate with each other.  Nodes can communicate through shared storage, or over a network.  When communication is over a network, it is generally a fast network (gigabit or faster) that is private to the nodes (accessed only by the nodes); such a network is called an interconnect.  Each node generally has two network interfaces for improved fault tolerance.  Because the network is slower than memory, inter-node communication can make each action slower in a cluster than it would be on a single database server.

  III. Shared Everything: Oracle RAC (Real Application Cluster)
    A. Maintenance Availability

Adding a new node to the cluster doesn't involve cluster downtime; however, clients may not completely use the new node until they are told about it. Some actions, like parallel queries, will immediately take advantage of the new node. All cluster nodes can be managed as one, or managed separately, as deemed necessary. Some code upgrades (patches) to the DBMS (database management system) can be applied to the cluster one node at a time, in a rolling fashion, so that all nodes can be upgraded without service downtime.

    B. Inter-node Communication

The nodes communicate updates (for cache), locking, etc. between each other over an interconnect (older versions communicated over the file system).  When a node needs to write to a data block, it first sends a request across the cluster to transfer ownership of the data block to itself. It appears that each cluster has a master that tracks which node owns each block; therefore this design doesn't slow a single update as nodes are added to the cluster, because ownership transfer only involves three nodes: requester, master, and current owner.  Since only one node can own a block at one time, blocks that are updated often can cause ownership to jump around between nodes, often degrading performance.

    C. Permanent Storage

Oracle RAC uses shared storage.  The shared storage can be NAS, SAN, or SCSI disk.  ASM (Automatic Storage Management) can address the storage single point of failure risk by mirroring data across different failure groups (a set of disks sharing something in common, such as a controller).  Since the file system is shared, all volume management must be designed to work with the cluster.

    D. Client Interaction

Clients connect to the nodes with Virtual IP (VIP) addresses so that if a node fails, the VIP can be redirected to another node. The client needs to know the VIP for all nodes. Clients can use Fast Application Notification (FAN), Fast Connection Failover (FCF), and Transparent Application Failover (TAF) to detect and/or handle node failure.  However, these may require the client to re-do some work, and it may be hard to determine which options are current and which will work best.  It can be a huge code change to make a program detect failures and redo actions every place that the database is used.  Load balancing is supported by the client having the list of all nodes, and randomly choosing a node for a connection (so that connections are spread across nodes).  Alternatively, the client can get load information from the listener running on the chosen node, so that the listener can direct the client to another node that has more resources available.  Which of these applies depends on the connection option chosen.

    E. Further Information

http://en.wikipedia.org/wiki/Oracle_RAC

http://www.oracle.com/technology/products/database/clustering/index.html

  IV. Shared Nothing: MySQL Cluster

A MySQL cluster has 3 node types: data nodes to store the data, SQL nodes to run a MySQL server, and management nodes.  Therefore a minimum of five nodes is generally needed for high availability (1 management, 2 SQL, and 2 data each with 1 replica of the data).

    A. Maintenance Availability

Adding and dropping data nodes requires that the cluster be restarted.  One exception is that data nodes for new partitions can be added while the cluster is running.  If one node fails, failover automatically happens to another node, and any transaction information on the failed node is lost.  Failover takes less than a second.  Adding space to an existing database by adding a data node requires that the data be repartitioned to include the new node (so that the data is spread out over the new set of nodes). Therefore, adding and repartitioning causes downtime and uses a lot of resources. Rolling software updates are supported.

    B. Inter-node Communication

Nodes communicate on a private interconnect.  Because there is only one primary owner of each block (the primary replica), ownership of the block doesn’t need to be transferred, and updates only have to involve 2 nodes (the SQL node processing the request, and the master data node that owns the block).

    C. Permanent Storage

The data is partitioned across the data nodes.  Each piece of data can have multiple replicas (with one node being the master, and the others being slaves) so that the data exists on multiple nodes, and so that there is no single point of failure.  When a node needs to write a block, it can replicate the data either asynchronously or synchronously. If the node is configured to replicate synchronously, it first replicates the data to all other data nodes that have a replica, and then asks them if they can commit the change.  If all reply affirmatively, the node then sends another message to tell all other nodes to commit the change (two phase commit).  Since the data is replicated to other nodes, each node can replicate to permanent storage asynchronously.
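The two phase commit described above can be sketched roughly as follows. This is a generic illustration and not MySQL Cluster's actual protocol or API: the Replica class, its "healthy" flag, and the write method are all hypothetical, and real systems also have to handle timeouts and coordinator failure.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseCommitSketch {
    // Hypothetical replica: stages a value in phase one, makes it visible in phase two.
    static class Replica {
        private final boolean healthy;
        private String staged;
        private String committed;

        Replica(boolean healthy) { this.healthy = healthy; }

        // Phase 1: receive the data and vote on whether it can be committed.
        boolean prepare(String value) {
            if (!healthy) return false;
            staged = value;
            return true;
        }

        // Phase 2: make the staged value the committed value.
        void commit() { committed = staged; staged = null; }

        void rollback() { staged = null; }

        String committed() { return committed; }
    }

    // Returns true only if every replica voted yes and was then told to commit.
    static boolean write(List<Replica> replicas, String value) {
        List<Replica> prepared = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.prepare(value)) {
                prepared.add(r);
            } else {
                for (Replica p : prepared) p.rollback(); // one "no" vote aborts everyone
                return false;
            }
        }
        for (Replica r : replicas) r.commit();
        return true;
    }
}
```

The extra round trip before the commit message is the reason synchronous replication makes each write slower.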

    D. Client Interaction

Clients can connect to the cluster through a load balancer, or through a proxy, so that they don't have to be aware of the individual SQL nodes.  Client reads can be done on the replicated nodes to provide better performance because the access can be spread out over more data nodes.  Read and write lock conflicts can also be reduced if the masters are set up to only handle writes, and the slaves only handle reads. The reduced contention also increases performance.

    E. Further Information

http://en.wikipedia.org/wiki/MySQL_Cluster

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster.html

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-replication.html

http://dev.mysql.com/doc/refman/5.1/en/replication.html

  V. Sharding
    A. Maintenance Availability

Since each shard is independent, all maintenance on one shard has no effect on the rest of the shards.  The only exception is if the data needs to be repartitioned across the shards; then all shards can be affected.  However, repartitioning can be done without affecting availability.  Adding a shard can also require repartitioning.  Sharding takes more work to manage because of the work involved in repartitioning to keep the load balanced, or when adding or removing a shard.  Each shard could use many of the techniques used for normal clusters, but they are used independently, so any bottlenecks are for a smaller data set.  Balancing the load between servers can be difficult.  For example, some users may use more resources, thus using some shards much more than others.

    B. Inter-node Communication

There is no inter-node communication with sharding, and problems on one server can't affect other servers.

    C. Permanent Storage

Sharding splits the data into partitions/shards (e.g. by groups of users) and puts each portion on a completely separate DB system.  This removes write bottlenecks across shards without the potential for data inconsistency.  There is no specific storage architecture since each shard can use any storage architecture.

    D. Client Interaction

The client either has to know the sharding algorithm (so that the client knows which shard to contact), or there must be a shard lookup service that the client uses.  The shard lookup service will have the potential to be both a single point of failure, and a bottleneck (however, it should be used much less so that it isn't a bottleneck).  Determining the correct shard to contact can increase the complexity of the client code.  Also, queries across groups are more difficult.  Therefore, systems that require a lot of queries across shards may not work well with a sharding system.
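As an example of a sharding algorithm that the client can know, here is a minimal sketch that picks a shard from the user id. The ShardRouter class, the shard URLs, and the choice of "user id modulo shard count" are all assumptions for illustration; real deployments often use a lookup table or consistent hashing instead.

```java
import java.util.List;

class ShardRouter {
    private final List<String> shardUrls; // hypothetical connection strings, one per shard

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) throw new IllegalArgumentException("no shards");
        this.shardUrls = shardUrls;
    }

    // Sharding algorithm known to the client: user id modulo shard count.
    String shardFor(long userId) {
        int index = (int) Math.floorMod(userId, (long) shardUrls.size());
        return shardUrls.get(index);
    }
}
```

With this particular algorithm, changing the number of shards moves most users to a different shard, which is the repartitioning cost mentioned under Maintenance Availability.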

    E. Further Information

http://highscalability.com/unorthodox-approach-database-design-coming-shard

  VI. Conclusion

Sharding has the highest maintenance availability, and MySQL cluster has the lowest.  Sharding has the lowest inter-node communication, and Oracle RAC has the highest.  MySQL cluster has the most fault-tolerant permanent storage, and sharding has the least.  MySQL cluster has the least complex client interaction, and Oracle RAC has the most complex.  Depending on the exact features needed, and the type of data used, each could be the best solution.  However, for this focused comparison, MySQL cluster provides the best combination.

Friday, August 28, 2009

Bill would give president emergency control of Internet | Politics and Law - CNET News

Bill would give president emergency control of Internet | Politics and Law - CNET News: "The new version would allow the president to 'declare a cybersecurity emergency' relating to 'non-governmental' computer networks and do what's necessary to respond to the threat. Other sections of the proposal include a federal certification program for 'cybersecurity professionals,' and a requirement that certain computer systems and networks in the private sector be managed by people who have been awarded that license."

Why should we expect the executive branch to have the expertise and information to know how to best use that control?

Tuesday, August 04, 2009

Defeating Character Frequency Analysis of Substitution Ciphertext

(This is a paper I wrote in the fall of 2008 for CS461: Computer Security I at UIUC. If you are interested in the code, leave a comment and contact info.)

Introduction


Character frequency analysis is a common way to crack substitution ciphertext because of the uneven distribution of letters in languages. In order to defeat that analysis, the techniques considered were: synonyms, dropping letters or words, adding extra letters or words, alternate spelling (Leet, spam, etc), and two cipher letters per plaintext letter. Most of these techniques would be applied before a standard substitution cipher encryption. To evaluate the techniques, test data was obtained from Project Gutenberg (http://www.gutenberg.org/) and consisted of the texts of Psalms (Ps), Pride and Prejudice (P&P), Alice in Wonderland (AiW), and Sherlock Holmes (SH). After applying the technique, the output character frequency was compared to the standard character frequency (http://en.wikipedia.org/wiki/Letter_frequencies and http://www.data-compression.com/english.html), and then the standard deviation of the proportion of the changed frequency for each letter compared to the standard character frequency was computed. For example, if the letter "e" occurred 10% of the time in the changed text, then the proportion for "e" is .787 (.10/.12702). That gives an overall measure of how close the changed text character frequency is to the standard character frequency. A standard deviation of 0% would mean that it exactly matches the standard frequency. Initial standard deviations are: Ps: 25%, P&P: 28%, AiW: 25%, SH: 14%. For comparison, an equal distribution produces a standard deviation of 1,439%. In LetterFrequencies.zip there are programs for most of these options and for computing the standard deviation.
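The programs in LetterFrequencies.zip are not reproduced here, so the following is only a sketch of one way to compute the measure. It assumes the "standard deviation" is the root-mean-square distance of each letter's proportion from 1.0, averaged over all 26 letters, and it uses the Wikipedia frequency table. Under those assumptions a perfectly flat distribution comes out at roughly 1,440%, in line with the figure quoted above, but the original program may have been written differently. The class and method names are made up.

```java
import java.util.Locale;

class FrequencyDeviation {
    // English letter frequencies a..z from the Wikipedia table (e = 0.12702).
    static final double[] STANDARD = {
        0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094,
        0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929,
        0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150,
        0.01974, 0.00074
    };

    // Observed share of each letter a..z in the text; non-letters are ignored.
    static double[] frequencies(String text) {
        double[] counts = new double[26];
        int total = 0;
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
                total++;
            }
        }
        for (int i = 0; i < 26 && total > 0; i++) counts[i] /= total;
        return counts;
    }

    // Root-mean-square distance of observed/standard from 1.0 (a result of 1.0 means 100%).
    static double deviation(double[] observed) {
        double sumSquares = 0;
        for (int i = 0; i < 26; i++) {
            double proportion = observed[i] / STANDARD[i]; // e.g. 0.10 / 0.12702 = 0.787 for "e"
            sumSquares += (proportion - 1.0) * (proportion - 1.0);
        }
        return Math.sqrt(sumSquares / 26);
    }
}
```

Rare letters dominate this measure: a small absolute change in the share of "z" or "q" moves the result far more than the same change in "e".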

Synonyms


Replacing words with synonyms that have a higher standard deviation can help defeat character frequency analysis. The thesaurus used has synonyms in categories, so when a word was looked up, the first category was arbitrarily used. Within the first category, the word that had the largest standard deviation from the normal character frequency was chosen. The chosen word often didn't fit grammatically, so the results are skewed, but this is somewhat offset by only looking in the first category. To make this usable, grammar checking would have to be used. It would probably need a UI that would suggest alternatives (based on grammar rules and the standard deviation). Attacks against this technique could include creating a list of the best synonyms and creating a new standard character frequency distribution for those words, thereby largely defeating the benefit of this technique. Final standard deviation achieved: Ps: 182%, P&P: 180%, AiW: 135%, SH: 149%. The increase in standard deviation was very significant, but not enough to make a large impact on frequency analysis.

Dropping letters or words


Randomly dropping letters or words could easily hide the meaning from the intended recipient so it would be hard to keep the meaning while still randomly dropping enough to make an impact. An attempt to always drop vowels and rare letters (a,e,i,o,u,x,q) and words (I, the, is, was, of, a, & an) made an insignificant difference (Drop Words Ps: 8%, P&P: 6%, AiW: 8%, SH: 8%; Drop Letters: Ps: 73%, P&P: 91%, AiW: 69%, SH: 70%). Also, some of each letter would need to be dropped to prevent the non-dropped letters from serving as landmarks.

Adding extra letters or words


To be effective, extra letters and words must be randomly placed. They could be chosen based on the inverse of the distribution. However, additions that have a significant impact on the standard deviation may garble the message too much. Add letters results (based on inverse of distribution): P&P: 2,154%, SH: 2,795%. The result is readable, but it takes significantly more work to understand the message.

Alternate spelling (Leet, spam, etc)


Alternate spelling includes l33t (http://en.wikipedia.org/wiki/Leet) and spam spelling (http://en.wikipedia.org/wiki/E-mail_spam#Obfuscating_message_content) and is similar to synonyms, in that the possibility of a new standard character frequency has to be taken into account. This technique was not implemented or programmatically evaluated because it was non-obvious what to use as the standard distribution.

Two cipher letters per plaintext letter


Using two letters in the cipher text for each letter in the plaintext can be a good way to create a flat character distribution. The algorithm is to partition the 676 2-letter combinations based on the standard character frequency. i.e. if the standard frequency for a letter is 5% then it will get 5% of the 2-letter combinations (randomly selected). This doubles the size of the data, could include spaces & punctuation, and makes a much larger key. Note that some letters may get dropped because they occur less than 1/676 (0.15%) of the time. Both 1-gram and 2-gram frequency analysis produce a nearly uniform histogram (variation appears to only be caused by rounding). Two-gram results: P&P=5,117%; SH=5,013%. Therefore this technique was extremely effective with no obvious weaknesses.
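A sketch of the two-letter scheme is below. It is not the original program: the class name, the use of a random seed as a stand-in for the key, and the handling of rounding are my assumptions. In particular, this sketch gives every letter at least one pair so that nothing is dropped (unlike the behavior noted above), leaves any pairs lost to rounding unused, and drops spaces and punctuation. java.util.Random is used for brevity and is not suitable for real cryptography.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

class DigraphCipher {
    private final List<List<String>> codesByLetter = new ArrayList<>();
    private final Map<String, Character> letterByCode = new HashMap<>();
    private final Random random;

    // frequencies: standard share of each letter a..z; seed stands in for the key.
    DigraphCipher(double[] frequencies, long seed) {
        random = new Random(seed);
        List<String> digraphs = new ArrayList<>();
        for (char a = 'a'; a <= 'z'; a++) {
            for (char b = 'a'; b <= 'z'; b++) {
                digraphs.add("" + a + b);
            }
        }
        Collections.shuffle(digraphs, random); // random assignment of the 676 pairs

        int next = 0;
        for (int i = 0; i < 26; i++) {
            // Share of the 676 pairs, but at least one so no letter is dropped (assumption).
            int quota = Math.max(1, (int) Math.floor(frequencies[i] * digraphs.size()));
            if (next + quota > digraphs.size()) {
                throw new IllegalArgumentException("frequencies need more than 676 pairs");
            }
            List<String> codes = new ArrayList<>(digraphs.subList(next, next + quota));
            for (String code : codes) letterByCode.put(code, (char) ('a' + i));
            codesByLetter.add(codes);
            next += quota;
        }
    }

    String encrypt(String plaintext) {
        StringBuilder out = new StringBuilder();
        for (char c : plaintext.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z') continue; // this sketch drops spaces and punctuation
            List<String> codes = codesByLetter.get(c - 'a');
            out.append(codes.get(random.nextInt(codes.size()))); // any of the letter's pairs
        }
        return out.toString();
    }

    String decrypt(String ciphertext) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < ciphertext.length(); i += 2) {
            out.append(letterByCode.get(ciphertext.substring(i, i + 2)));
        }
        return out.toString();
    }
}
```

Because a common letter like "e" is spread over many pairs chosen at random, each individual pair shows up about equally often, which is what flattens the histogram.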

Conclusion


Changing the plaintext proved problematic because of inadvertent changes to the meaning, but it did make a significant impact on standard deviation even though it probably wasn't significant enough. Applying all of the plaintext changes to a short message while using safeguards to protect the meaning would probably be effective. Two cipher letters per plaintext letter appears to be the easiest way to defeat character frequency analysis.

Friday, July 31, 2009

Sample chapter from Don't Make Me Think

Sample chapter from Don't Make Me Think: "Why are things always in the last place you look for them?
Because you stop looking when you find them.
—Children’s riddle"

Great info on usability.

Thursday, July 30, 2009

Pressure sensitive keyboard for repeat rate

Would it be helpful to have a pressure sensitive keyboard where the pressure determines the repeat rate (harder=faster, etc)? It seems that it would be especially useful for navigation keys (arrow, page up/down) and keys commonly repeated (enter, backspace, delete, space, tab). Does such a thing exist? Was it tried and how well did it work?

Wednesday, July 29, 2009

Ex-Google CIO breaks his own security rules | InSecurity Complex - CNET News

Ex-Google CIO breaks his own security rules | InSecurity Complex - CNET News: "But 'it's not security's responsibility to go out there and say 'Users want to use Gmail. Let them use it,'' Johnson added. 'If we decide to use Gmail we need to have a project and treat it in a formal way and pay money to do it right.'"

So are you going to have a project for every website that users might find useful? There is no way that you will keep up and the barrier to using useful websites will significantly hurt productivity.

Beyond the hype: Where open source actually saves you money | The Open Road - CNET News

Beyond the hype: Where open source actually saves you money | The Open Road - CNET News: "One way, as Urlocker points out on his blog, is that open source allows enterprise IT projects to succeed or fail with little risk. You know before you pay anything--if you pay anything--that open-source software is going to work, or not."

"Open source tends to offer best-of-breed solutions that aim to do a limited range of functions well, rather than to be all things to all people."

Saturday, July 25, 2009

MIT OpenCourseWare | Electrical Engineering and Computer Science | 6.854J Advanced Algorithms, Fall 2008 | Home

MIT OpenCourseWare | Electrical Engineering and Computer Science | 6.854J Advanced Algorithms, Fall 2008 | Home: "This is a graduate course on the design and analysis of algorithms, covering several advanced topics not studied in typical introductory courses on algorithms. It is especially designed for doctoral students interested in theoretical computer science."

Monday, June 22, 2009

Pattern Formatter for java.util.logging

Pattern Formatter for java.util.logging: "The Log Formatter will use the following formatting tokens:

* LoggerName %LOGGER%
* Level %LEVEL%
* Time %TIME%
* Message %MESSAGE%
* SourceClassName %SOURCECLASS%
* SourceMethodName %SOURCEMETHOD%
* Exception Message %EXCEPTION%"
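I haven't copied the linked implementation, but a formatter along these lines can be sketched on top of the standard java.util.logging.Formatter class. The class name, the constructor that takes the pattern, the date format used for %TIME%, and the choice to print an empty string when there is no exception are all my assumptions.

```java
import java.util.Date;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

class TokenPatternFormatter extends Formatter {
    private final String pattern;

    TokenPatternFormatter(String pattern) {
        this.pattern = pattern;
    }

    @Override
    public String format(LogRecord record) {
        Throwable thrown = record.getThrown();
        // %MESSAGE% is substituted last so tokens inside a message are not expanded.
        return pattern
            .replace("%LOGGER%", String.valueOf(record.getLoggerName()))
            .replace("%LEVEL%", String.valueOf(record.getLevel()))
            .replace("%TIME%", new Date(record.getMillis()).toString())
            .replace("%SOURCECLASS%", String.valueOf(record.getSourceClassName()))
            .replace("%SOURCEMETHOD%", String.valueOf(record.getSourceMethodName()))
            .replace("%EXCEPTION%", thrown == null ? "" : String.valueOf(thrown.getMessage()))
            .replace("%MESSAGE%", String.valueOf(formatMessage(record)))
            + System.lineSeparator();
    }
}
```

It could be attached to a handler with something like handler.setFormatter(new TokenPatternFormatter("%TIME% %LEVEL% %LOGGER%: %MESSAGE%")).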

Thursday, May 07, 2009

Save Icon

The most common save icon is a picture of a floppy disk.

But with floppy disks not being used anymore, will most programs change the picture? In 1-2 decades, what picture will be used for save icons?

Thursday, April 16, 2009

FBI seizures highlight law as cloud impediment | The Wisdom of Clouds - CNET News

FBI seizures highlight law as cloud impediment | The Wisdom of Clouds - CNET News: "The articles report that the FBI raided at least two Texas data centers last week, serving search and seizure warrants for computing equipment, including servers, routers, and storage. The FBI was seeking equipment that may have been involved in fraudulent business practices by a handful of small VoIP vendors.
The problem is that they didn't just grab the systems belonging to the VoIP vendors, but grabbed hundreds of servers serving a wide variety of businesses, the vast majority of which had never dealt with or even heard of the companies under investigation, according to Threat Level. Company officials interviewed complained of losing millions of dollars in lost revenue and equipment with no warning whatsoever."

"If the court upholds that servers can be seized despite no direct warrants being served on the owners of those servers (or the owners of the software and data housed on those servers), then imagine what that means for hosting your business in a cloud shared by thousands or millions of other users."

"Here is what I argue must happen:
The law must respect digital assets in the same way that it respects physical assets. This means that search and seizure rules should apply to data and software run on third party infrastructure (or wholly owned infrastructure run in third party facilities) in the same way that they protect my home and personal property if I rent an apartment in a building housing hundreds of tenants. The fact that one tenant commits a crime is not enough for the civil liberties of all of the other tenants to be null and void. I argue the same goes for digital assets "renting" space in the cloud.
The federal government should adopt a cloud computing bill of rights. (Here is a rudimentary example.) Each state should as well. Declare loud and clear that you suffer little or no loss of rights if you choose to run your business in the cloud over running it within your own facilities. Repeal or revise the laws that make it impossible for foreign businesses and governments to allow communications and data to pass within U.S. borders (including relevant elements of the Patriot Act).
It is time for our policy makers to step up and really understand the influence that the Internet and cloud computing will have on the future growth of this country. It is scary how little technical understanding most congressional and senate members have. However, that alone is not an excuse for not grasping the policy gaps that are brought about as our commerce and society rely increasingly upon Internet-based services."

Wednesday, January 28, 2009

Caching strategy

Generally Least Recently Used (LRU) is considered a good strategy for removing items from the cache. Instead it might be useful to measure the cost (i.e. time) to produce a cache item, and how many hits that item gets and then remove the item that has the lowest value of hits * cost.
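Here is a rough sketch of what I have in mind. Everything in it is an assumption on my part: the class name, counting the initial load as the first hit (so a new item is not evicted immediately with a score of zero), measuring cost in nanoseconds, and the simple linear scan to find the item to evict.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class CostAwareCache<K, V> {
    private static class Item<V> {
        final V value;
        final long cost;   // time (or any other measure) needed to produce the value
        long hits = 1;     // the request that created the item counts as a hit

        Item(V value, long cost) {
            this.value = value;
            this.cost = cost;
        }

        double score() { return (double) hits * cost; }
    }

    private final int capacity;
    private final Map<K, Item<V>> items = new HashMap<>();

    CostAwareCache(int capacity) { this.capacity = capacity; }

    // Returns the cached value, or produces it while timing how long that takes.
    V get(K key, Supplier<V> producer) {
        Item<V> item = items.get(key);
        if (item != null) {
            item.hits++;
            return item.value;
        }
        long start = System.nanoTime();
        V value = producer.get();
        put(key, value, Math.max(1, System.nanoTime() - start));
        return value;
    }

    // Stores a value with a known cost, evicting the lowest hits * cost item if full.
    void put(K key, V value, long cost) {
        if (!items.containsKey(key) && items.size() >= capacity) evictLowestScore();
        items.put(key, new Item<>(value, cost));
    }

    boolean contains(K key) { return items.containsKey(key); }

    private void evictLowestScore() {
        K victim = null;
        double lowest = Double.MAX_VALUE;
        for (Map.Entry<K, Item<V>> e : items.entrySet()) {
            if (e.getValue().score() < lowest) {
                lowest = e.getValue().score();
                victim = e.getKey();
            }
        }
        items.remove(victim);
    }
}
```

One weakness: since hits only ever grow, an item that was popular long ago can stay forever, so some kind of aging of the hit count would probably be needed in practice.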

A quick Google search didn't find anything like this. Comments?