User talk:Rambot


Rambot FAQ

Below is a FAQ for general questions related to rambot. For information on the bot including IP address and known problems, see User:rambot. Please direct all talk about the bot to User talk:Ram-Man. See Wikipedia:FAQ for more FAQs.

What is rambot?

The rambot is really a custom front-end program written in the Java programming language. It has performed a number of tasks, from user-interactive spellchecking to automatically modifying many thousands of U.S. city/county articles, as well as creating them from pre-generated articles based on SQL queries. What most people call rambot articles are often just the result of copying a local article from one's own computer to Wikipedia without any knowledge of the article's content. The rambot is not so much an author as a copier or copyeditor. More broadly, "rambot" could be taken to include the background tasks used to harvest and process data into articles, but that work is technically done with human-assisted computing (and has nothing to do with the Java bot code).

What is the name of the bot?

Ortolan88 coined the name rambot (named after User:Ram-Man of course). The name is correctly spelled in all lowercase.

There is a problem with the bot. How do I block it?

The most recent IP address information is posted on the bot's user page, User:rambot.

Is your bot slowing down Wikipedia?

Probably not, but it is possible. Most bot owners, including this one, try to use bots during off-peak hours and implement features in the bots to back off when problems occur. The effect of bots on Wikipedia has been discussed at Wikipedia talk:Bots at great length and will continue to change as hardware and software change.
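As an illustration of what "backing off" can look like, here is a minimal Java sketch of exponential backoff. It is not the rambot's actual code; the class name, method names, and the delay values are all assumptions made up for this example.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of exponential backoff; not the rambot's real code.
final class BackoffSketch {

    /** Delay before retry number `attempt` (0-based): doubles each time, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Runs `edit`, sleeping for progressively longer periods between failed attempts. */
    static <T> T withBackoff(Callable<T> edit, int maxAttempts,
                             long baseMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return edit.call();
            } catch (Exception e) {
                last = e;
                // Wait before the next try, but not after the final failure.
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

With a base of one second and a cap of one minute, the waits would run 1 s, 2 s, 4 s and so on until they reach the cap, so a struggling server sees less and less traffic from the bot.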

Can I have the source code to your bot?

While I normally like to be open (otherwise I wouldn't be here), I don't want script kiddies ruining our experience here. Besides, an intelligent, well-meaning programmer can easily duplicate the work with little effort.

I hate bots, what do I do?

You're not the only one! See Wikipedia:Bots for a discussion on the benefits/disadvantages of bots.

How do we know the geographic data is accurate or correct?

Because the articles all share a similar structure, the data can be verified periodically by having the rambot check the article data against the source data. The same check can be used to update articles automatically as new data becomes available. The sources are listed at Geographic references. The articles can only be as good as their sources, but they are as good as we can get on this scale. In the articles themselves, the sources are referenced by numerical superscripts such as [[Geographic references|<sup>1</sup>]].
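The kind of periodic check described above could be sketched in Java roughly as follows. This is only an illustration: it assumes the source record and the article have each already been reduced to a map of field names to values, and the class, method, and field names are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the field names and the map representation are assumptions.
final class VerifySketch {

    /** Returns the names of source fields whose article value is missing or different. */
    static Set<String> staleFields(Map<String, String> source, Map<String, String> article) {
        Set<String> stale = new TreeSet<>();
        for (Map.Entry<String, String> field : source.entrySet()) {
            String inArticle = article.get(field.getKey());
            if (!Objects.equals(field.getValue(), inArticle)) {
                stale.add(field.getKey());
            }
        }
        return stale;
    }
}
```

An empty result would mean the article still matches its source; any names returned are the fields worth re-checking or updating.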

What good are these articles? They are just gazetteer entries and not encyclopedia articles. There is also too high of a percentage of these articles. We want variety!

It is true that these articles contain information found in a gazetteer, but they are more than that. They collect information from a variety of sources, along with individual edits from people who know the cities or counties the articles are about. Some people are uncomfortable with the idea of a bot generating articles, but these articles are often more complete than human-written stubs on other topics. A lot has been said here about the worth of the articles, but generally the best thing to do is not to complain about them but to add to them and make them better.
Over Thanksgiving I was with my wife's family, and we had a discussion about the demographics of my hometown and of my wife's family's town. They wanted to know the very information that I had added to the respective city articles a month earlier, so I went to the computer and had their answer in less than a minute. Needless to say, they were impressed, and once again I got to see how useful this information can be.
This view has been stated many times here, but few actually feel that the articles should not be added. (Un?)officially, the "Random Page" feature was designed to help people find stubs to add to. If not for that feature, no one would know or care about the percentage of city entries, so in essence the problem is not the articles themselves but a single feature that is biased towards them. It is true that there is a lack of balance, but that only implies that we need more people to add articles on a variety of topics, and we will always need that. The best thing to do is work harder. See Deaf Smith County, Texas and its talk page for an example of this in action.

My favorite city XXX is missing, where is it?

Believe it or not, even though 30,000 cities were entered, many are still missing. Over 1,000 entries could not be immediately automated and are still on the TODO list; they will be done as I find time to work on the list. If your city still does not show up after that, it is possible that the Census Bureau does not consider it an independent census location. Don't wait around for me to add an article; add it yourself!

If we're interested in adding further information about a city or town to its article, will the updates by the bot delete what we've added?

No. The bot is just like any other editor and will only modify existing pages. It is even subject to edit conflicts.

Some of the cities are just neighborhoods and not really cities

This is a known "problem". The solution is simply to update the article to replace the wrong name with the correct one (e.g. replacing "city" with "neighborhood" and rewording accordingly). One real example is Wheeler AFB, Hawaii, which is really a U.S. Air Force base and not a town. These will get fixed as people notice them and correct the inaccuracies.

Where did all the bot entries go from Recent Changes?

Access the recent changes page with bot entries here.

The rambot screwed up an accented character or some other character. What's up with that?

The rambot originally could handle only 7-bit ASCII characters, so all extended characters were messed up. This was then fixed by adding full 8-bit character support. More recently, partial support for Unicode has been added: characters that do not fit in 8 bits are converted into their HTML equivalents. If any errors still exist, they should be reported to User talk:Ram-Man so they can be fixed.
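A conversion of this kind might look like the Java sketch below. It is not the rambot's actual code, and it assumes that "HTML equivalents" means decimal numeric character references (such as `&#321;`); the class and method names are made up for the example.

```java
// Hypothetical sketch; assumes "HTML equivalents" means decimal numeric
// character references. Not the rambot's real code.
final class HtmlEscapeSketch {

    /** Leaves code points up to 0xFF alone and rewrites larger ones as &#NNN; references. */
    static String escapeAbove8Bit(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint <= 0xFF) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Step by one or two chars so surrogate pairs are handled as one character.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

Under these assumptions a name such as "Łódź" would keep its "ó", which fits in 8 bits, while "Ł" and "ź" would become `&#321;` and `&#378;`.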

There is a duplicate article on X
Why are there two articles for "Fooville (city), Some County, State" and "Fooville (town), Some County, State"?

The U.S. Census Bureau sometimes lists multiple entries for the same general place, but the entries do not always cover exactly the same area. A town may be smaller than a city (similar to a town and township relationship). Sometimes one is a subentity of the other and the "city" is the governing agent. Sometimes the two entries contain the same data. Such duplicates are either intentional or accidents that must be corrected; telling which requires your own knowledge of the place in question. Feel free to discuss it on the talk page or try to fix it yourself. (Large) PDF maps displaying the street-by-street boundaries of the census areas are available for download and can be useful for understanding how things are divided up.

Are the rambot entries available under alternate licenses?

All English Wikipedia main and main talk namespace articles or edits produced by the rambot are multi-licensed as described on the rambot user page. I encourage you to multi-license your contributions as well. Most changes are not in the public domain; however, the original data for most of the city/county articles is. (See: Geographic references)

How many county and city entries are in the rambot's database?

Approximately 3,141 counties and 33,832 cities (10,024 cities, 8,039 towns, 5,655 CDPs, 4,853 townships, 3,768 villages, 1,232 boroughs, 99 unorganized territories, 49 locations, 34 U.S. Air Force bases, 33 plantations, 16 Indian reservations, 14 balances, 9 counties, 3 gores, and 1 each of grant, municipality, purchase, and district). This results in a maximum of 36,973 articles created and maintained by the rambot.
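The totals can be checked by adding up the breakdown. The short Java sketch below does so, reading the last item in the list as one each of grant, municipality, purchase, and district; the class and method names are only for illustration.

```java
// Illustrative tally of the figures quoted above; names are hypothetical.
final class TallySketch {

    static final int COUNTIES = 3141;

    // Cities, towns, CDPs, townships, villages, boroughs, unorganized territories,
    // locations, Air Force bases, plantations, Indian reservations, balances,
    // counties, gores, then one each of grant, municipality, purchase, district.
    static final int[] PLACE_COUNTS = {
        10024, 8039, 5655, 4853, 3768, 1232, 99, 49, 34, 33, 16, 14, 9, 3, 1, 1, 1, 1
    };

    /** Sum of all place categories. */
    static int places() {
        int sum = 0;
        for (int count : PLACE_COUNTS) {
            sum += count;
        }
        return sum;
    }

    /** Counties plus places: the maximum number of articles. */
    static int total() {
        return COUNTIES + places();
    }
}
```

Here `places()` gives 33,832 and `total()` gives 36,973, matching the figures stated above.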