Constanza as CO, Mark as MA, Gabriel as GABI, David as DÁ. Throughout the semester we will be working in a project that is going to be an extension of a bike rebalancing predictive system for the Guadalajara Bike Sharing System (BSS) called “MIBICI”.
We will develop a bike user application where the user will be able to show himself as “connected” and enter the key of the bicycle he is using. Each time the user travels n meters away, he will send his location, latitude and longitude, in such a way that we will be able to know in real time where a determined bicycle is, and at the end of the travel we will also have the complete route that the user followed.
The recollection of this date will be useful to know which are the busiest routes, which can be very useful to make decisions regarding infrastructure improvements as creation of bicycle lanes, more bicycle collection or battery charging points, etc; or even lucrative purposes such as what is the best position for a point of sale of energy drinks, sportswear, and more others.
After discussing among the possibilities that we had for the development, we decided to make the native application for iOS and we use firebase to store the data.
At first we had considered that the best alternative was an application with react native, especially because our partner David does not have mac, but taking into account that working directly in the development of the application is not the only “chamba” in the project we went the other way.
This is first week so we will focus on downloading everything necessary, read documentation and learn the basics. Personally I am excited because it will be my first time developing a native app for iOS. I
Hello my name is Kevin Oswaldo Cabrera Navarro. In the next 3 months I’ll be writing about Software testing, verification and quality. So why does Testing and quality matter?
Quick example of why quality is important sometimes
I wonder if there is any quality department in Sony pictures that didn’t realized about how bad does the idea of this movie is. This is the equivalent to that story about how one small testing can save tons of money to a company. Probably the correlation is so tiny, but it is a good example of why everything should have a quality measurement (it doesn’t matter if it is a movie, a book or a software system)
No bugs in here!
One of the main goals in this blog is to prevent the usage of the word bug to refer to a fault, error or failure in a program. Maybe it is a word rich in history but in this blog we will try to follow the advices of Paul Ammann and Jeff Offutt writers of the Introduction to Software Testing book to don’t encapsulate all the different meanings into only one word.
But that’s another history for a different post. See you later
Clustering techniques apply when there is no class to be predicted but rather when the instances are to be divided into natural groups. These clusters presumably reflect some mechanism at work in the domain from which instances are drawn, a mechanism that causes some instances to bear a stronger resemblance to each other than they do to the remaining instances.
The groups that are identified may be exclusive so that any instance belongs in only one group. Or they may be overlapping so that an instance may fall into several groups. Or they may be probabilistic, whereby an instance belongs to each group with a certain probability.
Iterative distance-based clustering
The classic clustering technique is called k-means. First, you specify in advance how many clusters are being sought: this is the parameter k. Then k points are chosen at random as cluster centers. All instances are assigned to their closest cluster center according to the ordinary Euclidean distance metric. Next the centroid, or mean, of the instances in each cluster is calculated—this is the “means” part. These centroids are taken to be new center values for their respective clusters. Finally, the whole process is repeated with the new cluster centers. Iteration continues until the same points are assigned to each cluster in consecutive rounds, at which stage the cluster centers have stabilized and will remain the same forever.
Faster distance calculations
For example, you can project the dataset and make cuts along selected axes, instead of using the arbitrary hyperplane divisions that are implied by choosing the nearest cluster center. But this inevitably compromises the quality of the resulting clusters.
In instance-based learning the training examples are stored verbatim, and a distance function is used to determine which member of the training set is closest to an unknown test instance. Once the nearest training instance has been located, its class is predicted for the test instance.
Although there are other possible choices, most instance-based learners use Euclidean distance.
When comparing distances it is not necessary to perform the square root operation; the sums of squares can be compared directly.
Different attributes are measured on different scales, so if the Euclidean distance formula were used directly, the effects of some attributes might be completely dwarfed by others that had larger scales of measurement. Consequently, it is usual to normalize all attribute values to lie between 0 and 1, by calculating
Nearest-neighbor instance-based learning is simple and often works very well. In the method described previously each attribute has exactly the same influence on the decision, just as it does in the Naïve Bayes method. Another problem is that the database can easily become corrupted by noisy exemplars.
Ian H. Witten, Eibe Frank. (1999). Data mining practical machine learning tools and techniques. Elsevier
When the outcome, or class, is numeric, and all the attributes are numeric, linear regression is a natural technique to consider. This is a staple method in statistics. The idea is to express the class as a linear combination of the attributes, with predetermined weights:
Linear regression is an excellent, simple method for numeric prediction, and it has been widely used in statistical applications for decades. Of course, linear models suffer from the disadvantage of, well, linearity. If the data exhibits a nonlinear dependency, the best-fitting straight line will be found, where “best” is interpreted as the least mean-squared difference.
Linear classification: Logistic regression
we can use any regression technique, whether linear or nonlinear, for classification. The trick is to perform a regression for each class, setting the output equal to one for training instances that belong to the class and zero for those that do not. The result is a linear expression for the class. Then, given a test example of unknown class, calculate the value of each linear expression and choose the one that is largest. This method is sometimes
called multiresponse linear regression.
One way of looking at multiresponse linear regression is to imagine that it approximates a numeric membership function for each class. The membership function is 1 for instances that belong to that class and 0 for other instances. Given a new instance we calculate its membership for each class and select the biggest.
First, the membership values it produces are not proper probabilities because they can fall outside the range 0 to 1.
Second, leastsquares regression assumes that the errors are not only statistically independent, but are also normally distributed with the same standard deviation, an assumption that is blatantly violated
To find such rules, you would have to execute the rule-induction procedure once for every possible combination of attributes, with every possible combination of values, on the right-hand side. That would result in an enormous number of association rules, which would then have to be pruned down on the basis of their coverage (the number of instances that they predict correctly) and their accuracy (the same number expressed as a proportion of the number of instances to which the rule applies). This approach is quite infeasible. Instead, we capitalize on the fact that we are only interested in association rules with high coverage. We ignore, for the moment, the distinction between the left- and right-hand sides of a rule and seek combinations of attribute–value pairs that have a prespecified minimum coverage. These are called item sets: an attribute–value pair is an item.
Once all item sets with the required coverage have been generated, the next step is to turn each into a rule, or set of rules, with at least the specified minimum accuracy. Some item sets will produce more than one rule; others will produce none.
Generating rules efficiently
The first stage proceeds by generating all one-item sets with the given
minimum coverage and then using this to generate the two-item sets, three-item sets (third column), and so on.Each operation involves a pass through the dataset to count the items in each set, and after the pass the surviving item sets are stored in a hash table. From the one-item sets, candidate two-item sets are generated, and then a pass is made through the dataset, counting the coverage of each two-item set; at the end the candidate sets with less than minimum coverage Continue reading "Mining association rules"→
An alternative approach to a decision tree is to take each class in turn and seek a way of covering all instances in it, at the same time excluding instances not in the class. This is called a covering approach because at each stage you identify a rule that “covers” some of the instances. By its very nature, this covering approach leads to a set of rules rather than to a decision tree.
A difference between the covering algorithms in comparison with recursive divide and conquer decision trees is that, in the multi class case, a decision tree split takes all classes into account, trying to maximize the purity of the split, whereas the rule-generating method concentrates on one class at a time, disregarding what happens to the other classes.
A simple covering algorithm
Covering algorithms operate by adding tests to the rule that is under construction, always striving to create a rule with maximum accuracy. In contrast, divide and-conquer algorithms operate by adding tests to the tree that is under construction, always striving to maximize the separation among the classes. Each of these involves finding an attribute to split on. But the criterion for the best attribute is different in each case. Whereas divide-and-conquer algorithms such as ID3 choose an attribute to maximize the information gain, the covering algorithm we will describe chooses an attribute–value pair to maximize the probability of the desired classification.
The PRISM method for constructing rules. It generates only correct or “perfect” rules. It measures the success of a rule by the accuracy formula p/t. Any rule with accuracy less than 100% is “incorrect” in that it assigns cases to the class in question Continue reading "Covering algorithms: Constructing rules"→
The problem of constructing a decision tree can be expressed recursively. First, select an attribute to place at the root node and make one branch for each possible value. This splits up the example set into subsets, one for every value of the attribute. Now the process can be repeated recursively for each branch, using only those instances that actually reach the branch. If at any time all instances at a node have the same classification, stop developing that part of the tree.
The only thing left to decide is how to determine which attribute to split on, given a set of examples with different classes. If we had a measure of the purity of each node, we could choose the attribute that produces the purest daughter nodes. The measure of purity that we will use is called the information and is measured in units called bits. Associated with a node of the tree, it represents the expected amount of information that would be needed to specify whether a new instance should be classified yes or no, given that the example reached that node.
For outlook We can calculate the average information value of these, taking into account the number of instances that go down each branch—five down the first and third and four down the second:
This average represents the amount of information that we expect would be necessary to specify the class of a new instance, given the tree structure in Figure 4.2(a) Before we created any of the nascent tree structures in Figure 4.2, the training examples at the root comprised line yes and five no nodes, corresponding to an information value of